[ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats

ruby-core@ruby-lang.org archive (unofficial mirror)

* [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
@ 2023-12-15  2:36 jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-08 19:31 ` [ruby-core:116081] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2023-12-15  2:36 UTC
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been reported by jeremyevans0 (Jeremy Evans).

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal
----------------------------------------
I have submitted a pull request (https://github.com/ruby/ruby/pull/9247) to reduce implicit array and hash allocations for method calls involving splats.  The following optimizations are included:

VM_CALL_ARGS_SPLAT_MUT callinfo flag

This is similar to the VM_CALL_KW_SPLAT_MUT flag added in Ruby 3.0.  This makes it so if the caller-side allocates an array for the method call, the flag is used to signal to the callee that it can reuse the allocated array and does not need to duplicate it.

concattoarray VM instruction

This instruction is similar to concatarray, but assumes the object being concatenated to is already a mutable array (such as those created by the splatarray VM instruction).  This optimizes method calls with multiple splats such as `f(*a,*a,*a)` (which previously allocated 3 arrays), allocating a single array instead of an array per splatted array.

pushtoarray VM instruction

This is similar, but handles non-splat arguments after a splat.  Previously, the VM would wrap those arguments in an array using newarray, and then call concatarray, such that `f(*a, a)` allocated 3 arrays caller-side.  This instruction just appends to the mutable array, reducing the number of arrays allocated to 1.
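
To check which of these instructions a given callsite compiles to, CRuby can disassemble it. The following is only a sketch and assumes CRuby (`RubyVM::InstructionSequence` is implementation-specific). Which instructions are listed depends on whether the Ruby in use includes this patch, so no output is shown.

```ruby
# Assumes CRuby: RubyVM::InstructionSequence does not exist on other implementations.
callsites = ["f(*a, *a, *a)", "f(*a, a)"]
instructions = %w[splatarray newarray concatarray concattoarray pushtoarray]

# Compile each callsite (nothing is executed) and keep its disassembly listing.
listings = callsites.to_h do |src|
  [src, RubyVM::InstructionSequence.compile(src).disasm]
end

# Report which of the array-building instructions appear for each callsite.
listings.each do |src, listing|
  used = instructions.select { |insn| listing.match?(/\b#{insn}\b/) }
  puts "#{src}: #{used.join(', ')}"
end
```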

Allocationless Anonymous Splat Forwarding

This allows `def f(*, **) end` to not allocate an array or hash callee side.  This works because it is not possible to mutate the local variables, only pass them as splats to other methods.  This can make the following call chain allocationless:

```ruby
def f(a, b: 1) end
def g(*, **) f(*, **) end

a = [1]
kw = {b: 2}
g(*a, **kw) # No allocations in this call
```
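
A rough way to observe this is to count allocated objects around the call. The sketch below assumes Ruby 3.2 or later (for anonymous splat forwarding). `allocations` is a hypothetical helper name, and `GC.stat(:total_allocated_objects)` counts every object, not only arrays and hashes. The count depends on the Ruby version, so no expected number is given.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  already_disabled = GC.disable   # true if GC was already off
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable unless already_disabled
end

def f(a, b: 1)
  b   # returning an existing object avoids allocating in the callee
end

def g(*, **)
  f(*, **)
end

a = [1]
kw = { b: 2 }

g(*a, **kw)                             # warm up caches before measuring
count = allocations { g(*a, **kw) }
puts "objects allocated: #{count}"
```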

Switch ... argument forwards to not use ruby2_keywords

Using ruby2_keywords has probably been slower since Koichi's changes early in the Ruby 3.3 development cycle to not combine keyword splats into the positional splat array.  This removes the FORWARD_ARGS_WITH_RUBY2_KEYWORDS define, so that `def f(...) end` operates similarly to `def f(*, **) end`, allowing allocationless splat forwarding.
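
From the caller's point of view the two forwarding styles already behave the same, which is what makes the switch possible. A small sketch, assuming Ruby 3.2 or later; `target`, `via_dots`, and `via_anon` are made-up names for illustration:

```ruby
# Receives everything and reports what it saw.
def target(*args, **kw, &blk)
  [args, kw, blk ? blk.call : nil]
end

# Forwarding with `...`
def via_dots(...)
  target(...)
end

# Forwarding with anonymous splat, keyword splat, and block parameters
def via_anon(*, **, &)
  target(*, **, &)
end

p via_dots(1, 2, a: 3) { :blk }
p via_anon(1, 2, a: 3) { :blk }
p via_dots({ a: 3 })   # a positional hash stays positional
```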

Reduce array and hash allocations for nested argument forwarding calls

This uses a combination of frame flags and callinfo flags to track mutability of anonymous splat variables.  It can make it so the following call example only allocates 1 array and 1 hash:

```ruby
def m1(*args, **kw)
end

def m2(...)
  m1(...)
end

def m3(*, **)
  m2(*, **)
end

m3(1, a: 1) # 1 array and 1 hash allocated
```

In the above example, the call to `m3` allocates an array (`[1]`) and a hash (`{a: 1}`), but the call to `m2` passes them as mutable splats, `m2` treats them as mutable splats when calling `m1`, and `m1` reuses the array that `m3` allocated for `args` and the hash that `m3` allocated for `kw`.
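
Reuse is only safe for objects the VM itself allocated for the call. When the caller splats its own array or hash, the callee must still get a copy, however many forwarding layers are in between. The sketch below (assuming Ruby 3.2 or later) checks that invariant. Here `m1` is changed to mutate its parameters, and the names `list` and `opts` are illustrative.

```ruby
def m1(*args, **kw)
  args << :added        # mutate the callee's view of the arguments
  kw[:extra] = true
  [args, kw]
end

def m2(...)
  m1(...)
end

def m3(*, **)
  m2(*, **)
end

list = [1]
opts = { a: 1 }
seen_args, seen_kw = m3(*list, **opts)

p seen_args   # what m1 ended up with
p seen_kw
p list        # the caller's objects must be untouched
p opts
```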

I created a benchmark for all of these changes.  The method calls optimized by these changes are significantly faster:

```
named_multi_arg_splat
after:   5344097.6 i/s 
before:   3088134.0 i/s - 1.73x  slower

named_post_splat
after:   5401882.3 i/s 
before:   2629321.8 i/s - 2.05x  slower

anon_arg_splat
after:  12242780.9 i/s 
before:   6845413.2 i/s - 1.79x  slower

anon_arg_kw_splat
after:  11277398.7 i/s 
before:   4329509.4 i/s - 2.60x  slower

anon_multi_arg_splat
after:   5132699.5 i/s 
before:   3018103.7 i/s - 1.70x  slower

anon_post_splat
after:   5602915.1 i/s 
before:   2645185.5 i/s - 2.12x  slower

anon_kw_splat
after:  15403727.3 i/s 
before:   6249504.6 i/s - 2.46x  slower

anon_fw_to_named_splat
after:   2985715.3 i/s 
before:   2049159.9 i/s - 1.46x  slower

anon_fw_to_named_no_splat
after:   2941030.4 i/s 
before:   2100380.0 i/s - 1.40x  slower

fw_to_named_splat
after:   2801008.7 i/s 
before:   2012416.4 i/s - 1.39x  slower

fw_to_named_no_splat
after:   2742670.4 i/s 
before:   1957707.2 i/s - 1.40x  slower

fw_to_anon_to_named_splat
after:   2309246.6 i/s 
before:   1375924.6 i/s - 1.68x  slower

fw_to_anon_to_named_no_splat
after:   2193227.6 i/s 
before:   1351184.1 i/s - 1.62x  slower
```
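
The benchmark source is not included in this message. For readers who want a rough local comparison between two Ruby builds, here is a minimal timing sketch. The two case names are borrowed from the results above, but the method bodies are guesses at what those labels mean. `ips` is a hypothetical helper, and a proper benchmarking tool would give steadier numbers.

```ruby
def named(*args) end
def anon(*) end

arr = [1, 2, 3]

# Hypothetical helper: iterations per second for the given block.
def ips(iterations = 200_000)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  iterations / elapsed
end

results = {
  "named_multi_arg_splat" => ips { named(*arr, *arr) },  # guess: several splats, named rest
  "anon_post_splat"       => ips { anon(*arr, 1) },      # guess: argument after a splat, anonymous rest
}

results.each { |name, rate| puts format("%-22s %12.1f i/s", name, rate) }
```

Run the same script under the "before" and "after" builds and compare the printed rates.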

Only fallout from these changes:

* Minor change to AST output for `...` not using `ruby2_keywords`
* Prism and rbs need updating for `...` not using `ruby2_keywords`
* typeprof needs updating for new VM instructions (at least pushtoarray)

VM_CALL_ARGS_SPLAT_MUT, concattoarray, and pushtoarray only affect uncommon callsites (multiple splats, argument after splat).  Other commits only optimize calls to methods using anonymous splats or `...` argument forwarding.  Previously, there was no performance reason to use anonymous splats or `...` argument forwarding, but with this change, using them can be faster, and can offer a new way for users to optimize their code.

In my opinion, this is too late for consideration in Ruby 3.3, but it could be considered for Ruby 3.4.

-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

* [ruby-core:116081] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-08 19:31 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-08 20:02 ` [ruby-core:116082] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-01-08 19:31 UTC
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20066 has been updated by Dan0042 (Daniel DeLorme).

These are very nice optimizations, though they lead me to wonder: would it be possible to also optimize `def foo(**x) = bar(**x)` in the same way that `def foo(&x) = bar(&x)` is currently optimized? If a named block param can be optimized to avoid "materialization" in this case, I think it should be possible to do the same with a named splat param, and have it become equivalent to `def foo(**) = bar(**)`. This would have the nice benefit that existing forwarding code with named params would become faster, without having to convert to anonymous params.

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106076

* [ruby-core:116082] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-08 19:31 ` [ruby-core:116081] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-01-08 20:02 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-08 21:19 ` [ruby-core:116086] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-08 20:02 UTC
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

Dan0042 (Daniel DeLorme) wrote in #note-1:
> These are very nice optimizations, though they lead me to wonder: would it be possible to also optimize `def foo(**x) = bar(**x)` in the same way that `def foo(&x) = bar(&x)` is currently optimized? If a named block param can be optimized to avoid "materialization" in this case, I think it should be possible to do the same with a named splat param, and have it become equivalent to `def foo(**) = bar(**)`. This would have the nice benefit that existing forwarding code with named params would become faster, without having to convert to anonymous params.

See the commit message for why you cannot do this for named variables (at least, without escape analysis): https://github.com/ruby/ruby/commit/7d14dc8983b4d279d22bbbb779e94e7a01a7e88f.patch
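
As an illustration only (this is not a quotation of the linked commit message), one obstacle with named parameters is easy to demonstrate: a named splat is an ordinary local variable, so code the compiler cannot see, such as a string passed to `eval`, can read and mutate it. The method name `named` is made up for this sketch.

```ruby
def named(*args, **opts)
  # The string is opaque to the compiler, yet it reaches both named locals.
  eval("args << :extra; opts[:seen] = true")
  [args, opts]
end

list = [1]
kw = { b: 2 }
args, opts = named(*list, **kw)

p args   # the callee's array, mutated via eval
p opts
p list   # the caller's array must not change, so a copy was required
p kw
```

Anonymous `*` and `**` have no name that such code could refer to, which is the property the allocationless forwarding relies on.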

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106077

* [ruby-core:116086] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-08 19:31 ` [ruby-core:116081] " Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-08 20:02 ` [ruby-core:116082] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-08 21:19 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-08 21:32 ` [ruby-core:116087] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-01-08 21:19 UTC
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20066 has been updated by Dan0042 (Daniel DeLorme).

The thing is, I don't understand how this is different from the `def f(&b)` situation.

```ruby
def f(&block)
  foo(bar)
  g(&block)
end
```

Here above, `foo` may be `eval` and `bar` may be a string referencing `block`. So how does the `&block` optimization work in this case? Is a Proc object allocated from the start? Is it allocated lazily only if `foo(bar)` turns out to be `eval("block")`? How is this different from keyword arguments? I'd appreciate if you could elucidate.

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106078

* [ruby-core:116087] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (2 preceding siblings ...)
  2024-01-08 21:19 ` [ruby-core:116086] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-01-08 21:32 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-08 21:49 ` [ruby-core:116089] " Eregon (Benoit Daloze) via ruby-core
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-08 21:32 UTC
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

Dan0042 (Daniel DeLorme) wrote in #note-3:
> The thing is, I don't understand how this is different from the `def f(&b)` situation.
> 
> ```ruby
> def f(&block)
>   foo(bar)
>   g(&block)
> end
> ```
> 
> Here above, `foo` may be `eval` and `bar` may be a string referencing `block`. So how does the `&block` optimization work in this case? Is a Proc object allocated from the start? Is it allocated lazily only if `foo(bar)` turns out to be `eval("block")`? How is this different from keyword arguments? I'd appreciate if you could elucidate.

Proc-activation works differently.  Before Ruby 2.5, Ruby always allocated a Proc object in such cases.  Since Ruby 2.5, it does not, I believe using a block param proxy (`getblockparamproxy` VM instruction).  I don't know the details, though I'm sure @ko1 does.

The positional and keyword splats currently always allocate up front.  Maybe a similar approach could work for those (`getsplatparamproxy`/`getkwsplatparamproxy`), but it would assuredly be much more invasive.  The nice part of the Allocationless Anonymous Splat Forwarding optimization is that it recognizes that you cannot modify the anonymous parameters, and therefore they don't need to allocate, which takes minimal changes to the existing code.
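
The block-parameter case can be inspected the same way as any other compiled code. This sketch assumes CRuby 2.5 or later and only looks for the instruction named above. It does not show how the proxy works internally.

```ruby
# Assumes CRuby: RubyVM::InstructionSequence is implementation-specific.
source = <<~RUBY
  def f(&block)
    g(&block)
  end
RUBY

# Compiling does not run the code, so `g` does not need to exist.
listing = RubyVM::InstructionSequence.compile(source).disasm

# Print only the lines that touch the block parameter.
puts listing.lines.grep(/getblockparam/)
```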

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106079

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal
----------------------------------------
I have submitted a pull request (https://github.com/ruby/ruby/pull/9247) to reduce implicit array and hash allocations for method calls involving splats.  The following optimizations are included:

VM_CALL_ARGS_SPLAT_MUT callinfo flag

This is similar to the VM_CALL_KW_SPLAT_MUT flag added in Ruby 3.0.  This makes it so if the caller-side allocates an array for the method call, the flag is used to signal to the callee that it can reuse the allocated array and does not need to duplicate it.

concattoarray VM instruction

This instruction is similar to concatarray, but assumes the object being concatenated to is already a mutable array (such as those created by the splatarray VM instruction).  This optimizes method calls with multiple splats such as `f(*a,*a,*a)` (which previously allocated 3 arrays), allocating a single array instead of an array per splatted array.

pushtoarray VM instruction

This is similar, but handles non-splat arguments after a splat.  Previously, the VM would wrap those arguments in an array using newarray, and then call concatarray, such that `f(*a, a)` allocated 3 arrays caller-side.  This instruction just appends to the mutable array, reducing the number of arrays allocated to 1.

Allocationless Anonymous Splat Forwarding

This allows `def f(*, **) end` to not allocate an array or hash callee side.  This works because it is not possible to mutate the local variables, only pass them as splats to other methods.  This can make the following call chain allocation less:

```ruby
def f(a, b: 1) end
def g(*, **) f(*, **) end
ea

a = [1]
kw = {b: 2}
g(*a, **kw) # No allocations in this call
```

Switch ... argument forwards to not use ruby2_keywords

Using ruby2_keywords has probably been slower since Koichi's changes early in the Ruby 3.3 development cycle to not combine keyword splats into the positional splat array.  This removes the FORWARD_ARGS_WITH_RUBY2_KEYWORDS define, so that `def f(...) end` operates similarly to  `def f(*, **) end`, allowing allocationless splat forwarding

Reduce array and hash allocations for nested argument forwarding calls

This uses a combination of frame flags and callinfo flags to track mutability of anonymous splat variables.  It can make it so the following call example only allocates a 1 array and 1 hash:

```ruby
def m1(*args, **kw)
end

def m2(...)
  m1(...)
end

def m3(*, **)
  m2(*, **)
end

m3(1, a: 1) # 1 array and 1 hash allocated
```

In the above example, the call to `m3` allocates an array (`[1]`) and a hash (`{a: 1}`), but the call to `m2` passes them as mutable splats, `m2` treats them as mutable splats when calling `m1`, and `m1` reuses the array that `m3` allocated for `args` and the hash that `m3` allocated for `kw`.

I created a benchmark for all of these changes.  In the method calls optimized by these changes, it is significantly faster:

```
named_multi_arg_splat
after:   5344097.6 i/s 
before:   3088134.0 i/s - 1.73x  slower

named_post_splat
after:   5401882.3 i/s 
before:   2629321.8 i/s - 2.05x  slower

anon_arg_splat
after:  12242780.9 i/s 
before:   6845413.2 i/s - 1.79x  slower

anon_arg_kw_splat
after:  11277398.7 i/s 
before:   4329509.4 i/s - 2.60x  slower

anon_multi_arg_splat
after:   5132699.5 i/s 
before:   3018103.7 i/s - 1.70x  slower

anon_post_splat
after:   5602915.1 i/s 
before:   2645185.5 i/s - 2.12x  slower

anon_kw_splat
after:  15403727.3 i/s 
before:   6249504.6 i/s - 2.46x  slower

anon_fw_to_named_splat
after:   2985715.3 i/s 
before:   2049159.9 i/s - 1.46x  slower

anon_fw_to_named_no_splat
after:   2941030.4 i/s 
before:   2100380.0 i/s - 1.40x  slower

fw_to_named_splat
after:   2801008.7 i/s 
before:   2012416.4 i/s - 1.39x  slower

fw_to_named_no_splat
after:   2742670.4 i/s 
before:   1957707.2 i/s - 1.40x  slower

fw_to_anon_to_named_splat
after:   2309246.6 i/s 
before:   1375924.6 i/s - 1.68x  slower

fw_to_anon_to_named_no_splat
after:   2193227.6 i/s 
before:   1351184.1 i/s - 1.62x  slower
```
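
For anyone who wants to get comparable iterations-per-second figures on their own builds, a minimal timing loop can be sketched as follows. The actual benchmark definitions are not shown in this message, so the `named` method and the call under test are guesses based on the name `named_multi_arg_splat`, and `ips` is a hypothetical helper rather than the tool that produced the output above.

```ruby
# Hypothetical helper: run the block repeatedly for roughly `seconds`
# and return the measured iterations per second.
def ips(seconds = 0.2)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    yield
    count += 1
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  count / elapsed
end

def named(*args) end

a = [1, 2]
rate = ips { named(*a, *a, *a) } # guessed shape of named_multi_arg_splat
printf("named_multi_arg_splat: %.1f i/s\n", rate)
```

The loop overhead is included in the measurement, so the absolute numbers will be lower than those from a dedicated benchmarking tool; only the before/after ratio on the same machine is meaningful.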

Only fallout from these changes:

* Minor change to AST output for `...` not using `ruby2_keywords`
* Prism and rbs need updating for `...` not using `ruby2_keywords`
* typeprof needs updating for new VM instructions (at least pushtoarray)

VM_CALL_ARGS_SPLAT_MUT, concattoarray, and pushtoarray only affect uncommon callsites (multiple splats, argument after splat).  Other commits only optimize calls to methods using anonymous splats or `...` argument forwarding.  Previously, there was no performance reason to use anonymous splats or `...` argument forwarding, but with this change, using them can be faster, and can offer a new way for users to optimize their code.

In my opinion, this is too late for consideration in Ruby 3.3, but it could be considered for Ruby 3.4.

-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116089] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (3 preceding siblings ...)
  2024-01-08 21:32 ` [ruby-core:116087] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-08 21:49 ` Eregon (Benoit Daloze) via ruby-core
  2024-01-08 21:54 ` [ruby-core:116090] " Eregon (Benoit Daloze) via ruby-core
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-01-08 21:49 UTC (permalink / raw
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20066 has been updated by Eregon (Benoit Daloze).

jeremyevans0 (Jeremy Evans) wrote in #note-4:
> The nice part of the Allocationless Anonymous Splat Forwarding optimization is that it recognizes that you cannot modify the anonymous parameters

Interesting.
I thought it might not hold for:
```
$ ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
{:a=>2}
{:a=>1}
```
but it seems fine.
Which method does the copy there? #f when calling #m?

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106081

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116090] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (4 preceding siblings ...)
  2024-01-08 21:49 ` [ruby-core:116089] " Eregon (Benoit Daloze) via ruby-core
@ 2024-01-08 21:54 ` Eregon (Benoit Daloze) via ruby-core
  2024-01-08 21:57 ` [ruby-core:116091] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-01-08 21:54 UTC (permalink / raw
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20066 has been updated by Eregon (Benoit Daloze).

Another consideration: one might want to observe the value of `*`, `**`, or `&` in the debugger.
If that is somehow available, it becomes a way to mutate them, unless that access always makes a copy.
Notably, it's not uncommon for debuggers to allow evaluating code with any receiver, field, argument, etc.

What about the :call TracePoint, doesn't that provide access to arguments?
It seems not currently, only the binding and `parameters` (= Method#parameters).

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106082

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116091] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (5 preceding siblings ...)
  2024-01-08 21:54 ` [ruby-core:116090] " Eregon (Benoit Daloze) via ruby-core
@ 2024-01-08 21:57 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-08 22:04 ` [ruby-core:116092] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-01-08 21:57 UTC (permalink / raw
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20066 has been updated by Dan0042 (Daniel DeLorme).

jeremyevans0 (Jeremy Evans) wrote in #note-4:
> The positional and keyword splats currently always allocate up front.  Maybe a similar approach could work for those (`getsplatparamproxy`/`getkwsplatparamproxy`), but it would assuredly be much more invasive.

I was thinking that something similar would be possible, except that instead of lazily allocating a Proc object, we'd be lazily copying the kwrest argument. I wasn't aware there's a special getblockparamproxy VM instruction, and indeed that sounds fairly invasive.

Note that I'm only discussing an additional optimization related to rest/kwrest arguments, not something that would replace Allocationless Anonymous Splat Forwarding (which, it's worth repeating, is really quite nice).

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106083

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116092] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (6 preceding siblings ...)
  2024-01-08 21:57 ` [ruby-core:116091] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-01-08 22:04 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-08 22:19 ` [ruby-core:116093] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-08 22:04 UTC (permalink / raw
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

Eregon (Benoit Daloze) wrote in #note-5:
> jeremyevans0 (Jeremy Evans) wrote in #note-4:
> > The nice part of the Allocationless Anonymous Splat Forwarding optimization is that it recognizes that you cannot modify the anonymous parameters
> 
> Interesting.
> I thought it might not hold for:
> ```
> $ ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
> {:a=>2}
> {:a=>1}
> ```
> but it seems fine.
> Which method does the copy there? #f when calling #m?

Close, `m` duplicates the hash callee-side.  There was a bug in Ruby 3.0-3.2 where it would not (#20012), but I fixed it in Ruby 3.3 after finding it during this optimization work.

With the optimizations, I think `f(a: 1)` allocates 1 array and 2 hashes (I didn't double check):

* array - `f` (callee-side), empty array for anonymous positional splat parameter
* hash - `f` (callee-side), as `a: 1` keyword argument is converted to anonymous keyword splat parameter
* hash - `m` (callee-side), as anonymous keyword splat argument is duplicated for positional parameter `h`

Without the 6th optimization, there is an additional hash allocated:

* hash - `g` (callee-side), as anonymous keyword splat argument is duplicated for named keyword splat parameter `h`
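
The one-liner from the quoted message can be expanded into a script that makes the copy visible. This is a sketch assuming Ruby 3.3 or later (per the note above, Ruby 3.0-3.2 did not duplicate the hash in `m`); the `seen_by_m` and `seen_by_g` names are introduced here only for illustration.

```ruby
def g(*, **h)
  h
end

def m(h)
  h[:a] = 2 # mutates the hash that m received as a positional argument
  h
end

def f(*, **)
  seen_by_m = m(*, **)
  seen_by_g = g(*, **)
  [seen_by_m, seen_by_g]
end

seen_by_m, seen_by_g = f(a: 1)
p seen_by_m
p seen_by_g
puts "distinct hashes: #{!seen_by_m.equal?(seen_by_g)}"
```

If `m` received the same hash object that `f` holds for `**`, the mutation would leak into the later call to `g`; the copy in `m` is what keeps the two results independent.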

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106084

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116093] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (7 preceding siblings ...)
  2024-01-08 22:04 ` [ruby-core:116092] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-08 22:19 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-09  0:08 ` [ruby-core:116095] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-08 22:19 UTC (permalink / raw
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

Eregon (Benoit Daloze) wrote in #note-6:
> Another consideration: one might want to observe the value of `*`, `**`, or `&` in the debugger.
> If that is somehow available, it becomes a way to mutate them, unless that access always makes a copy.
> Notably, it's not uncommon for debuggers to allow evaluating code with any receiver, field, argument, etc.
> 
> What about the :call TracePoint, doesn't that provide access to arguments?
> It seems not currently, only the binding and `parameters` (= Method#parameters).

Honestly, I didn't consider either of these.  I don't see how you could do it with TracePoint, since even with the binding, you cannot get the values, only pass them as splats.

I don't know whether it is possible to change the values with a debugger API, but I would guess not.  The debugger has a method for getting local variables and their values (https://github.com/ruby/debug/blob/ab937ac49a4643f9302856c2da487ca113c16bd5/lib/debug/frame_info.rb#L140-L147), but it uses Binding#local_variable_get, which raises `NameError` for `*` and `**`, even inside a method with anonymous splats.  I tested, and it skips the anonymous splat parameters completely: they don't show up in the `i l` command inside a method, unlike named splat parameters.
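
The Binding behaviour mentioned above can be reproduced with a short script. This is a sketch; the `anon` method and its local variable names are made up for the example.

```ruby
def anon(*, **)
  b = binding
  error = begin
    b.local_variable_get(:*) # anonymous splats are not readable by name
    nil
  rescue NameError => e
    e
  end
  [b.local_variables, error]
end

names, error = anon(1, a: 2)
p names       # locals visible through the binding
p error.class # class of the exception raised by local_variable_get
```

Only the ordinary locals of `anon` are listed; the anonymous splat parameters are neither listed nor retrievable, which matches what the debugger shows.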

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106085

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal
----------------------------------------
I have submitted a pull request (https://github.com/ruby/ruby/pull/9247) to reduce implicit array and hash allocations for method calls involving splats.  The following optimizations are included:

VM_CALL_ARGS_SPLAT_MUT callinfo flag

This is similar to the VM_CALL_KW_SPLAT_MUT flag added in Ruby 3.0.  This makes it so if the caller-side allocates an array for the method call, the flag is used to signal to the callee that it can reuse the allocated array and does not need to duplicate it.

concattoarray VM instruction

This instruction is similar to concatarray, but assumes the object being concatenated to is already a mutable array (such as those created by the splatarray VM instruction).  This optimizes method calls with multiple splats such as `f(*a,*a,*a)` (which previously allocated 3 arrays), allocating a single array instead of an array per splatted array.

pushtoarray VM instruction

This is similar, but handles non-splat arguments after a splat.  Previously, the VM would wrap those arguments in an array using newarray, and then call concatarray, such that `f(*a, a)` allocated 3 arrays caller-side.  This instruction just appends to the mutable array, reducing the number of arrays allocated to 1.
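
Whichever instructions the compiler emits for these call sites, the caller's own array has to come out of the call unchanged.  A minimal sketch of that invariant follows; the `collect` helper is hypothetical and not part of the pull request:

```ruby
# Hypothetical helper (not from the pull request): returns its rest array.
def collect(*args)
  args
end

# Frozen, so an in-place append to `a` itself would raise FrozenError.
a = [1, 2].freeze

multi = collect(*a, *a, *a) # multiple splats
post  = collect(*a, 3)      # non-splat argument after a splat

p multi # => [1, 2, 1, 2, 1, 2]
p post  # => [1, 2, 3]
p a     # => [1, 2]
```

The array built for the call is a fresh, mutable one, which is what allows later splats and arguments to be appended to it rather than wrapped in further temporary arrays.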

Allocationless Anonymous Splat Forwarding

This allows `def f(*, **) end` to not allocate an array or hash callee-side.  This works because it is not possible to mutate the local variables, only pass them as splats to other methods.  This can make the following call chain allocationless:

```ruby
def f(a, b: 1) end
def g(*, **) f(*, **) end

a = [1]
kw = {b: 2}
g(*a, **kw) # No allocations in this call
```
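
One way to check a claim like this on a given build is to compare CRuby's allocation counter before and after the call.  This is only a sketch: the `allocations` helper is hypothetical, `GC.stat(:total_allocated_objects)` is CRuby-specific, and the number printed depends on the Ruby version and on whether the optimizations are present, so no expected value is shown:

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def f(a, b: 1) end
def g(*, **) f(*, **) end

a = [1]
kw = {b: 2}

g(*a, **kw) # warm up, so one-time costs are not counted
count = allocations { g(*a, **kw) }
puts "objects allocated: #{count}"
```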

Switch ... argument forwards to not use ruby2_keywords

Using ruby2_keywords has probably been slower since Koichi's changes early in the Ruby 3.3 development cycle to not combine keyword splats into the positional splat array.  This removes the FORWARD_ARGS_WITH_RUBY2_KEYWORDS define, so that `def f(...) end` operates similarly to `def f(*, **) end`, allowing allocationless splat forwarding.
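
The observable behavior of `...` forwarding should be the same under either implementation.  A small sketch of what has to be preserved, with hypothetical method names `target` and `forward`:

```ruby
# Hypothetical methods for illustration only.
def target(*args, **kw)
  [args, kw]
end

def forward(...)
  target(...)
end

with_keywords = forward(1, a: 2)  # keywords arrive as keywords
with_hash     = forward({a: 2})   # a positional hash stays positional

p with_keywords
p with_hash
```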

Reduce array and hash allocations for nested argument forwarding calls

This uses a combination of frame flags and callinfo flags to track mutability of anonymous splat variables.  It can make it so the following call example only allocates 1 array and 1 hash:

```ruby
def m1(*args, **kw)
end

def m2(...)
  m1(...)
end

def m3(*, **)
  m2(*, **)
end

m3(1, a: 1) # 1 array and 1 hash allocated
```

In the above example, the call to `m3` allocates an array (`[1]`) and a hash (`{a: 1}`), but the call to `m2` passes them as mutable splats, `m2` treats them as mutable splats when calling `m1`, and `m1` reuses the array that `m3` allocated for `args` and the hash that `m3` allocated for `kw`.

I created a benchmark for all of these changes.  In the method calls optimized by these changes, it is significantly faster:

```
named_multi_arg_splat
after:   5344097.6 i/s 
before:   3088134.0 i/s - 1.73x  slower

named_post_splat
after:   5401882.3 i/s 
before:   2629321.8 i/s - 2.05x  slower

anon_arg_splat
after:  12242780.9 i/s 
before:   6845413.2 i/s - 1.79x  slower

anon_arg_kw_splat
after:  11277398.7 i/s 
before:   4329509.4 i/s - 2.60x  slower

anon_multi_arg_splat
after:   5132699.5 i/s 
before:   3018103.7 i/s - 1.70x  slower

anon_post_splat
after:   5602915.1 i/s 
before:   2645185.5 i/s - 2.12x  slower

anon_kw_splat
after:  15403727.3 i/s 
before:   6249504.6 i/s - 2.46x  slower

anon_fw_to_named_splat
after:   2985715.3 i/s 
before:   2049159.9 i/s - 1.46x  slower

anon_fw_to_named_no_splat
after:   2941030.4 i/s 
before:   2100380.0 i/s - 1.40x  slower

fw_to_named_splat
after:   2801008.7 i/s 
before:   2012416.4 i/s - 1.39x  slower

fw_to_named_no_splat
after:   2742670.4 i/s 
before:   1957707.2 i/s - 1.40x  slower

fw_to_anon_to_named_splat
after:   2309246.6 i/s 
before:   1375924.6 i/s - 1.68x  slower

fw_to_anon_to_named_no_splat
after:   2193227.6 i/s 
before:   1351184.1 i/s - 1.62x  slower
```

Only fallout from these changes:

* Minor change to AST output for `...` not using `ruby2_keywords`
* Prism and rbs need updating for `...` not using `ruby2_keywords`
* typeprof needs updating for new VM instructions (at least pushtoarray)

VM_CALL_ARGS_SPLAT_MUT, concattoarray, and pushtoarray only affect uncommon callsites (multiple splats, argument after splat).  Other commits only optimize calls to methods using anonymous splats or `...` argument forwarding.  Previously, there was no performance reason to use anonymous splats or `...` argument forwarding, but with this change, using them can be faster, and can offer a new way for users to optimize their code.

In my opinion, this is too late for consideration in Ruby 3.3, but it could be considered for Ruby 3.4.

-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116095] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (8 preceding siblings ...)
  2024-01-08 22:19 ` [ruby-core:116093] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-09  0:08 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-09  0:25 ` [ruby-core:116096] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-01-09  0:08 UTC (permalink / raw
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20066 has been updated by Dan0042 (Daniel DeLorme).

> I thought it might not hold for:
> ```
> $ ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
> {:a=>2}
> {:a=>1}
> ```

Ok, I'm not sure what's going on, but I just tried compiling jeremy's branch and the above is not working for me

```
jeremy/bin/ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
{:a=>2}
{:a=>2}
```
???

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106091


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116096] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (9 preceding siblings ...)
  2024-01-09  0:08 ` [ruby-core:116095] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-01-09  0:25 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-09  0:37 ` [ruby-core:116097] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-09  0:25 UTC (permalink / raw
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

Dan0042 (Daniel DeLorme) wrote in #note-10:
> > I thought it might not hold for:
> > ```
> > $ ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
> > {:a=>2}
> > {:a=>1}
> > ```
> 
> Ok, I'm not sure what's going on, but I just tried compiling jeremy's branch and the above is not working for me
> 
> ```
> jeremy/bin/ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
> {:a=>2}
> {:a=>2}
> ```
> ???

Thanks for testing.  I'm fairly sure @Eregon was checking Ruby's current behavior, not the branch. That is a regression in the branch.  I'll fix it.

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106092


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116097] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (10 preceding siblings ...)
  2024-01-09  0:25 ` [ruby-core:116096] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-09  0:37 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-09  4:30 ` [ruby-core:116101] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-09  0:37 UTC (permalink / raw
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

jeremyevans0 (Jeremy Evans) wrote in #note-11:
> Dan0042 (Daniel DeLorme) wrote in #note-10:
> > > I thought it might not hold for:
> > > ```
> > > $ ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
> > > {:a=>2}
> > > {:a=>1}
> > > ```
> > 
> > Ok, I'm not sure what's going on, but I just tried compiling jeremy's branch and the above is not working for me
> > 
> > ```
> > jeremy/bin/ruby -e 'def g(*,**h); h; end; def m(h); h[:a] = 2; p h; end; def f(*, **); m(*,**); g(*,**); end; p f(a: 1)'
> > {:a=>2}
> > {:a=>2}
> > ```
> > ???
> 
> Thanks for testing.  I'm fairly sure @Eregon was checking Ruby's current behavior, not the branch. That is a regression in the branch.  I'll fix it.

It's related to the 6th optimization (the most invasive). Fix:

```
diff --git a/vm_args.c b/vm_args.c
index a19169de13..f22c3ad3db 100644
--- a/vm_args.c
+++ b/vm_args.c
@@ -581,7 +581,8 @@ setup_parameters_complex(rb_execution_context_t * const ec, const rb_iseq_t * co
         if (UNLIKELY((ci_flag & VM_CALL_ANON_SPLAT) && VM_FRAME_ANON_SPLAT_MUT_P(ec->cfp))) {
             args->rest_dupped = true;
         }
-        if (UNLIKELY((ci_flag & VM_CALL_ANON_KW_SPLAT) && VM_FRAME_ANON_KW_SPLAT_MUT_P(ec->cfp))) {
+        if (UNLIKELY((ci_flag & VM_CALL_ANON_KW_SPLAT) && VM_FRAME_ANON_KW_SPLAT_MUT_P(ec->cfp) &&
+                        ISEQ_BODY(iseq)->param.flags.has_kwrest)) {
             kw_flag |= VM_CALL_KW_SPLAT_MUT;
         }

@@ -659,7 +660,8 @@ setup_parameters_complex(rb_execution_context_t * const ec, const rb_iseq_t * co
             // f(**kw)
             VALUE last_arg = args->argv[args->argc-1];

-            if (UNLIKELY((ci_flag & VM_CALL_ANON_KW_SPLAT) && VM_FRAME_ANON_KW_SPLAT_MUT_P(ec->cfp))) {
+            if (UNLIKELY((ci_flag & VM_CALL_ANON_KW_SPLAT) && VM_FRAME_ANON_KW_SPLAT_MUT_P(ec->cfp) &&
+                          ISEQ_BODY(iseq)->param.flags.has_kwrest)) {
                 kw_flag |= VM_CALL_KW_SPLAT_MUT;
             }

```

I'll commit this with a test later today or tomorrow.
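
A test for this case could look roughly like the sketch below.  It is an expanded form of the one-liner quoted above, not the test that was actually committed, and the local names `mutated` and `fresh` are made up for illustration:

```ruby
def g(*, **h) # accepts keywords through a named keyword splat
  h
end

def m(h)      # no keyword parameters: keywords arrive as a positional hash
  h[:a] = 2
  h
end

def f(*, **)
  mutated = m(*, **) # m modifies the hash it received
  fresh = g(*, **)   # g must not observe that modification
  [mutated, fresh]
end

mutated, fresh = f(a: 1)
p mutated
p fresh
```

On a correct build, `fresh` still has `:a` mapped to `1`; on the branch with the regression it showed `2`.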

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106093


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [ruby-core:116101] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (11 preceding siblings ...)
  2024-01-09  0:37 ` [ruby-core:116097] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-09  4:30 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-09 13:17 ` [ruby-core:116116] " Eregon (Benoit Daloze) via ruby-core
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-09  4:30 UTC (permalink / raw
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

I've decided to remove the 6th optimization, and updated the pull request to do so.  In addition to being by far the most complex and invasive optimization, it's also incorrect if the anonymous splat is passed to multiple methods that accept named splats.  That's unfixable (without escape analysis/deoptimization), because `eval` and such can be used to pass the anonymous splat, so this isn't solvable in the compiler.
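
To illustrate the problem case, here is a sketch of the semantics that have to hold when one anonymous splat is forwarded to more than one method with named splats.  The method names are hypothetical:

```ruby
# Hypothetical methods for illustration only.
def take_named(*args, **kw)
  args << :extra   # mutating a named splat is allowed
  kw[:seen] = true
  [args, kw]
end

def forward_twice(*, **)
  [take_named(*, **), take_named(*, **)]
end

first, second = forward_twice(1, a: 1)
p first
p second
```

Each callee needs its own array and hash.  If the second call reused the objects handed to the first, `second` would contain `:extra` twice.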

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106098

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal
----------------------------------------
I have submitted a pull request (https://github.com/ruby/ruby/pull/9247) to reduce implicit array and hash allocations for method calls involving splats.  The following optimizations are included:

VM_CALL_ARGS_SPLAT_MUT callinfo flag

This is similar to the VM_CALL_KW_SPLAT_MUT flag added in Ruby 3.0.  This makes it so if the caller-side allocates an array for the method call, the flag is used to signal to the callee that it can reuse the allocated array and does not need to duplicate it.

concattoarray VM instruction

This instruction is similar to concatarray, but assumes the object being concatenated to is already a mutable array (such as those created by the splatarray VM instruction).  This optimizes method calls with multiple splats such as `f(*a,*a,*a)` (which previously allocated 3 arrays), allocating a single array instead of an array per splatted array.

pushtoarray VM instruction

This is similar, but handles non-splat arguments after a splat.  Previously, the VM would wrap those arguments in an array using newarray, and then call concatarray, such that `f(*a, a)` allocated 3 arrays caller-side.  This instruction just appends to the mutable array, reducing the number of arrays allocated to 1.

Allocationless Anonymous Splat Forwarding

This allows `def f(*, **) end` to not allocate an array or hash callee side.  This works because it is not possible to mutate the local variables, only pass them as splats to other methods.  This can make the following call chain allocation less:

```ruby
def f(a, b: 1) end
def g(*, **) f(*, **) end
ea

a = [1]
kw = {b: 2}
g(*a, **kw) # No allocations in this call
```

Switch ... argument forwards to not use ruby2_keywords

Using ruby2_keywords has probably been slower since Koichi's changes early in the Ruby 3.3 development cycle to not combine keyword splats into the positional splat array.  This removes the FORWARD_ARGS_WITH_RUBY2_KEYWORDS define, so that `def f(...) end` operates similarly to  `def f(*, **) end`, allowing allocationless splat forwarding

Reduce array and hash allocations for nested argument forwarding calls

This uses a combination of frame flags and callinfo flags to track mutability of anonymous splat variables.  It can make it so the following call example only allocates a 1 array and 1 hash:

```ruby
def m1(*args, **kw)
end

def m2(...)
  m1(...)
end

def m3(*, **)
  m2(*, **)
end

m3(1, a: 1) # 1 array and 1 hash allocated
```

In the above example, the call to `m3` allocates an array (`[1]`) and a hash (`{a: 1}`), but the call to `m2` passes them as mutable splats, `m2` treats them as mutable splats when calling `m1`, and `m1` reuses the array that `m3` allocated for `args` and the hash that `m3` allocated for `kw`.

I created a benchmark for all of these changes.  In the method calls optimized by these changes, it is significantly faster:

```
named_multi_arg_splat
after:   5344097.6 i/s 
before:   3088134.0 i/s - 1.73x  slower

named_post_splat
after:   5401882.3 i/s 
before:   2629321.8 i/s - 2.05x  slower

anon_arg_splat
after:  12242780.9 i/s 
before:   6845413.2 i/s - 1.79x  slower

anon_arg_kw_splat
after:  11277398.7 i/s 
before:   4329509.4 i/s - 2.60x  slower

anon_multi_arg_splat
after:   5132699.5 i/s 
before:   3018103.7 i/s - 1.70x  slower

anon_post_splat
after:   5602915.1 i/s 
before:   2645185.5 i/s - 2.12x  slower

anon_kw_splat
after:  15403727.3 i/s 
before:   6249504.6 i/s - 2.46x  slower

anon_fw_to_named_splat
after:   2985715.3 i/s 
before:   2049159.9 i/s - 1.46x  slower

anon_fw_to_named_no_splat
after:   2941030.4 i/s 
before:   2100380.0 i/s - 1.40x  slower

fw_to_named_splat
after:   2801008.7 i/s 
before:   2012416.4 i/s - 1.39x  slower

fw_to_named_no_splat
after:   2742670.4 i/s 
before:   1957707.2 i/s - 1.40x  slower

fw_to_anon_to_named_splat
after:   2309246.6 i/s 
before:   1375924.6 i/s - 1.68x  slower

fw_to_anon_to_named_no_splat
after:   2193227.6 i/s 
before:   1351184.1 i/s - 1.62x  slower
```
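The benchmark script itself is not included in this message. As a rough, stdlib-only sketch of how an iterations-per-second figure like those above could be measured for one call shape, consider the following. The `ips` helper, the `target` method, and the choice of a post-splat call are all assumptions for illustration and are not the author's benchmark.

```ruby
# Hypothetical helper: run the block repeatedly for roughly `seconds`
# and return the observed iterations per second.
def ips(seconds: 0.2)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    yield
    count += 1
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  count / elapsed
end

# Illustrative callee taking a named splat.
def target(*args) end

a = [1, 2]
rate = ips { target(*a, a) } # argument after a splat
puts format("post_splat: %.1f i/s", rate)
```

Running the same script on builds before and after the change and dividing the two rates gives a ratio comparable to the "x slower" column, although the timer overhead in this loop makes it less precise than a dedicated benchmarking tool.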

Only fallout from these changes:

* Minor change to AST output for `...` not using `ruby2_keywords`
* Prism and rbs need updating for `...` not using `ruby2_keywords`
* typeprof needs updating for new VM instructions (at least pushtoarray)

VM_CALL_ARGS_SPLAT_MUT, concattoarray, and pushtoarray only affect uncommon callsites (multiple splats, argument after splat).  Other commits only optimize calls to methods using anonymous splats or `...` argument forwarding.  Previously, there was no performance reason to use anonymous splats or `...` argument forwarding, but with this change, using them can be faster, and can offer a new way for users to optimize their code.

In my opinion, this is too late for consideration in Ruby 3.3, but it could be considered for Ruby 3.4.

-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116116] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (12 preceding siblings ...)
  2024-01-09  4:30 ` [ruby-core:116101] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-09 13:17 ` Eregon (Benoit Daloze) via ruby-core
  2024-01-09 16:39 ` [ruby-core:116123] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-01-09 13:17 UTC (permalink / raw
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20066 has been updated by Eregon (Benoit Daloze).

Happy I found an example that revealed an issue :)

> Switch ... argument forwards to not use ruby2_keywords
> Using ruby2_keywords has probably been slower since Koichi's changes early in the Ruby 3.3 development cycle to not combine keyword splats into the positional splat array. This removes the FORWARD_ARGS_WITH_RUBY2_KEYWORDS define, so that def f(...) end operates similarly to def f(*, **) end, allowing allocationless splat forwarding

Awesome, I think this is much cleaner, and it is what TruffleRuby already does.
(Unfortunately, this kind of detail tends to leak into parsers, notably into local tables, so it's valuable to be in sync.)

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106115

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116123] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (13 preceding siblings ...)
  2024-01-09 13:17 ` [ruby-core:116116] " Eregon (Benoit Daloze) via ruby-core
@ 2024-01-09 16:39 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-09 17:02 ` [ruby-core:116124] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-01-09 16:39 UTC (permalink / raw
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20066 has been updated by Dan0042 (Daniel DeLorme).

I'm not sure this is worth worrying about, but:

```ruby
def a(**)
  @opts[:x] = 2
  b(**)
end
def b(**kw)
  p kw
end
opts = @opts = {x: 1}
a(**opts) # normally prints {:x=>1}, now prints {:x=>2}
```

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106125

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116124] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (14 preceding siblings ...)
  2024-01-09 16:39 ` [ruby-core:116123] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-01-09 17:02 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-01-09 18:30 ` [ruby-core:116126] " Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-25  2:29 ` [ruby-core:116436] " ko1 (Koichi Sasada) via ruby-core
  17 siblings, 0 replies; 19+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-01-09 17:02 UTC (permalink / raw
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20066 has been updated by jeremyevans0 (Jeremy Evans).

Dan0042 (Daniel DeLorme) wrote in #note-15:
> I'm not sure this is worth worrying about, but:
> 
> ```ruby
> def a(**)
>   @opts[:x] = 2
>   b(**)
> end
> def b(**kw)
>   p kw
> end
> opts = @opts = {x: 1}
> a(**opts) # normally prints {:x=>1}, now prints {:x=>2}
> ```

That's a good point.  Note that similar things are already possible:

```ruby
def a(**)
  b(**)
end
def b(**kw)
  p kw
end
opts = @opts = {x: 1}
b = Object.new
b.define_singleton_method(:to_proc){opts[:x] = 2; proc{}}
a(**opts, &b)
```

This prints `{:x=>2}` even though you would expect `{:x=>1}`.  If you change the anonymous keyword splats to named keyword splats, this has been the case since Ruby 3.0, when the duplication of the keyword splat was moved from caller-side to callee-side.

This does expand the scope in which modifying the object affects things, so it's something that should be considered when deciding whether to merge.
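For callers that need isolation from such mutation, one option is to splat a copy. This is a sketch of a possible workaround rather than anything proposed in the patch; it reuses the example from #note-15, except that `b` returns the hash instead of printing it, and it assumes Ruby 3.2 or later for the anonymous `**`.

```ruby
def a(**)
  @opts[:x] = 2 # mutates the original hash, not the copy that was splatted
  b(**)
end

def b(**kw)
  kw
end

opts = @opts = {x: 1}
result = a(**opts.dup) # the copy is a distinct object from @opts
p result
p opts
```

Here `result` still has `x` set to 1 while `opts` has been changed to 2. The `dup` costs one hash allocation per call, so it gives back part of what the optimization saves.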

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106126

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116126] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (15 preceding siblings ...)
  2024-01-09 17:02 ` [ruby-core:116124] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-01-09 18:30 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-01-25  2:29 ` [ruby-core:116436] " ko1 (Koichi Sasada) via ruby-core
  17 siblings, 0 replies; 19+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-01-09 18:30 UTC (permalink / raw
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20066 has been updated by Dan0042 (Daniel DeLorme).

I'm not sure if the following is relevant, but maybe just food for thought...

Before Jeremy submitted this patch I had toyed with the idea of optimizing `*rest` and `**kwrest` arguments by freezing them. The idea is that with
```ruby
opts = {a: 1}.freeze
foo(**opts)
```
We don't need to make a copy of `opts` since it's frozen. And that frozen hash can be used directly in the callee depending on certain conditions:
```ruby
def foo1(**)   # use frozen hash directly since there's no variable to reference
  bar(**)      # frozen hash is passed to bar, so no copy needed
end
def foo2(**kw) # use frozen hash directly since kw is only used in splat
  bar(**kw)    # frozen hash is passed to bar, so no copy needed
end
def foo3(**kw) # use frozen hash directly since kw is only used in splat
  eval("kw").frozen? #=> true; IMHO this is an acceptable tradeoff; it should almost never cause issues, and can be worked around with kw = kw.dup
  bar(**kw)    # frozen hash is passed to bar, so no copy needed
end
def foo4(**kw) # copy hash since kw variable is used apart from splat
  kw.frozen?   #=> false
  bar(**kw)    # mutable hash is passed to bar
end
foo1 # no kwargs; in foo1 we can use a global frozen hash like Hash::EMPTYKW = {}.freeze
```
And then I toyed with the idea of introducing a `frozen_rest_arguments` pragma so that kw can be frozen explicitly in foo3 and foo4 above.

Jeremy's approach seems better because `foo(**opts)` with a long forward chain results in only 1 allocation at the end, whereas my idea requires an extra allocation at the beginning if `opts` is not frozen. But I think the extra allocation makes this freezing approach immune to the issue in #note-15. So, food for thought, maybe.
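For comparison with the freezing idea, this sketch shows what a named keyword splat receives today when the caller splats a frozen hash. The method name `foo` and the returned pair are illustrative only.

```ruby
# Returns whether the callee's hash is frozen, plus the hash itself.
def foo(**kw)
  [kw.frozen?, kw]
end

opts = {a: 1}.freeze
frozen, received = foo(**opts)

puts "callee hash frozen? #{frozen}"
puts "same object as the caller's? #{received.equal?(opts)}"
puts "equal contents? #{received == opts}"
```

In current Ruby the callee gets its own unfrozen hash with equal contents, which is the copy the freezing proposal would skip for callees that only forward the splat.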

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106131

* Author: jeremyevans0 (Jeremy Evans)
* Status: Open
* Priority: Normal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [ruby-core:116436] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
  2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (16 preceding siblings ...)
  2024-01-09 18:30 ` [ruby-core:116126] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-01-25  2:29 ` ko1 (Koichi Sasada) via ruby-core
  17 siblings, 0 replies; 19+ messages in thread
From: ko1 (Koichi Sasada) via ruby-core @ 2024-01-25  2:29 UTC (permalink / raw
  To: ruby-core; +Cc: ko1 (Koichi Sasada)

Issue #20066 has been updated by ko1 (Koichi Sasada).

Please try!

----------------------------------------
Feature #20066: Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats
https://bugs.ruby-lang.org/issues/20066#change-106459

* Author: jeremyevans0 (Jeremy Evans)
* Status: Closed
* Priority: Normal
----------------------------------------
I have submitted a pull request (https://github.com/ruby/ruby/pull/9247) to reduce implicit array and hash allocations for method calls involving splats.  The following optimizations are included:

VM_CALL_ARGS_SPLAT_MUT callinfo flag

This is similar to the VM_CALL_KW_SPLAT_MUT flag added in Ruby 3.0.  This makes it so if the caller-side allocates an array for the method call, the flag is used to signal to the callee that it can reuse the allocated array and does not need to duplicate it.

concattoarray VM instruction

This instruction is similar to concatarray, but assumes the object being concatenated to is already a mutable array (such as those created by the splatarray VM instruction).  This optimizes method calls with multiple splats such as `f(*a,*a,*a)` (which previously allocated 3 arrays), allocating a single array instead of an array per splatted array.
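
One way to see which instructions a given interpreter emits for such a callsite is to disassemble it.  This is a minimal sketch that assumes CRuby (`RubyVM::InstructionSequence` is not available elsewhere); the instruction names printed depend on the Ruby version, so no particular output is claimed here.

```ruby
# Compile (without running) a callsite with three splats and list the
# array-building instructions found in its disassembly.
source = "a = [1]; f(*a, *a, *a)"

disasm =
  if defined?(RubyVM::InstructionSequence)
    RubyVM::InstructionSequence.compile(source).disasm
  else
    "" # not CRuby: nothing to inspect
  end

splat_insns = disasm.scan(/\b(?:splatarray|concatarray|concattoarray|pushtoarray)\b/)
p splat_insns
```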

pushtoarray VM instruction

This is similar, but handles non-splat arguments after a splat.  Previously, the VM would wrap those arguments in an array using newarray, and then call concatarray, such that `f(*a, a)` allocated 3 arrays caller-side.  This instruction just appends to the mutable array, reducing the number of arrays allocated to 1.
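
The effect on a particular build can be checked by counting allocated objects around the call.  The `allocations` helper below is a hypothetical name, not part of the pull request, and the number it reports varies with the Ruby version, so treat it as a measurement sketch only.

```ruby
# Count objects allocated while the block runs once.
def allocations
  yield # warm up so one-time allocations are not counted
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

# Hypothetical callee with a named splat.
def f(*args) end

a = [1, 2]
count = allocations { f(*a, a) }
puts "objects allocated by f(*a, a): #{count}"
```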

Allocationless Anonymous Splat Forwarding

This allows `def f(*, **) end` to not allocate an array or hash callee-side.  This works because it is not possible to mutate the local variables, only pass them as splats to other methods.  This can make the following call chain allocationless:

```ruby
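# f takes one positional argument and one optional keyword.
def f(a, b: 1) end
# g forwards its anonymous splats directly to f.
def g(*, **) f(*, **) end

a = [1]
kw = {b: 2}
g(*a, **kw) # No allocations in this call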
```

Switch ... argument forwards to not use ruby2_keywords

Using ruby2_keywords has probably been slower since Koichi's changes early in the Ruby 3.3 development cycle to not combine keyword splats into the positional splat array.  This removes the FORWARD_ARGS_WITH_RUBY2_KEYWORDS define, so that `def f(...) end` operates similarly to `def f(*, **) end`, allowing allocationless splat forwarding.

Reduce array and hash allocations for nested argument forwarding calls

This uses a combination of frame flags and callinfo flags to track mutability of anonymous splat variables.  It can make it so the following call example only allocates 1 array and 1 hash:

```ruby
def m1(*args, **kw)
end

def m2(...)
  m1(...)
end

def m3(*, **)
  m2(*, **)
end

m3(1, a: 1) # 1 array and 1 hash allocated
```

In the above example, the call to `m3` allocates an array (`[1]`) and a hash (`{a: 1}`), but the call to `m2` passes them as mutable splats, `m2` treats them as mutable splats when calling `m1`, and `m1` reuses the array that `m3` allocated for `args` and the hash that `m3` allocated for `kw`.
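
Reuse of this kind is only safe when the array and hash were allocated for the call itself.  The sketch below is an assumption-laden variant of the example: it uses `...` for both forwarding layers so that it also runs on Rubies without anonymous splat forwarding, and it passes objects the caller still holds.  Whatever the VM reuses internally, mutations made in `m1` must not leak back to the caller.

```ruby
# m1 mutates the array and hash it receives.
def m1(*args, **kw)
  args << :added
  kw[:added] = true
  [args, kw]
end

def m2(...)
  m1(...)
end

def m3(...)
  m2(...)
end

a = [1]
kw = {a: 1}
args_seen, kw_seen = m3(*a, **kw)

p args_seen # => [1, :added]
p kw_seen   # => {a: 1, added: true} (older Rubies print {:a=>1, :added=>true})
p a         # => [1]
```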

I created a benchmark for all of these changes.  In the method calls optimized by these changes, it is significantly faster:

```
named_multi_arg_splat
after:   5344097.6 i/s 
before:   3088134.0 i/s - 1.73x  slower

named_post_splat
after:   5401882.3 i/s 
before:   2629321.8 i/s - 2.05x  slower

anon_arg_splat
after:  12242780.9 i/s 
before:   6845413.2 i/s - 1.79x  slower

anon_arg_kw_splat
after:  11277398.7 i/s 
before:   4329509.4 i/s - 2.60x  slower

anon_multi_arg_splat
after:   5132699.5 i/s 
before:   3018103.7 i/s - 1.70x  slower

anon_post_splat
after:   5602915.1 i/s 
before:   2645185.5 i/s - 2.12x  slower

anon_kw_splat
after:  15403727.3 i/s 
before:   6249504.6 i/s - 2.46x  slower

anon_fw_to_named_splat
after:   2985715.3 i/s 
before:   2049159.9 i/s - 1.46x  slower

anon_fw_to_named_no_splat
after:   2941030.4 i/s 
before:   2100380.0 i/s - 1.40x  slower

fw_to_named_splat
after:   2801008.7 i/s 
before:   2012416.4 i/s - 1.39x  slower

fw_to_named_no_splat
after:   2742670.4 i/s 
before:   1957707.2 i/s - 1.40x  slower

fw_to_anon_to_named_splat
after:   2309246.6 i/s 
before:   1375924.6 i/s - 1.68x  slower

fw_to_anon_to_named_no_splat
after:   2193227.6 i/s 
before:   1351184.1 i/s - 1.62x  slower
```
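
The benchmark script itself is not included in this message.  A rough way to get comparable iterations-per-second figures on a given build, using only the core library, might look like the following; the `ips` helper, the `target` method, and the two callsites chosen are assumptions for illustration and are not the benchmark that produced the numbers above.

```ruby
# Hypothetical callee with a named splat.
def target(*args) end

# Run the block `iterations` times and return iterations per second.
def ips(iterations = 200_000)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  iterations / elapsed
end

a = [1, 2]
multi = ips { target(*a, *a, *a) } # multiple splats in one call
post  = ips { target(*a, 3) }      # positional argument after a splat

printf("multiple splats:      %.1f i/s\n", multi)
printf("argument after splat: %.1f i/s\n", post)
```

Running the same script under two Ruby builds gives a before/after comparison; a dedicated benchmarking tool will give steadier numbers than this single timed loop.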

The only fallout from these changes:

* Minor change to AST output for `...` not using `ruby2_keywords`
* Prism and rbs need updating for `...` not using `ruby2_keywords`
* typeprof needs updating for new VM instructions (at least pushtoarray)

VM_CALL_ARGS_SPLAT_MUT, concattoarray, and pushtoarray only affect uncommon callsites (multiple splats, argument after splat).  Other commits only optimize calls to methods using anonymous splats or `...` argument forwarding.  Previously, there was no performance reason to use anonymous splats or `...` argument forwarding, but with this change, using them can be faster, and can offer a new way for users to optimize their code.

In my opinion, this is too late for consideration in Ruby 3.3, but it could be considered for Ruby 3.4.



end of thread, other threads:[~2024-01-25  2:29 UTC | newest]

Thread overview: 19+ messages
2023-12-15  2:36 [ruby-core:115749] [Ruby master Feature#20066] Reduce Implicit Array/Hash Allocations For Method Calls Involving Splats jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-08 19:31 ` [ruby-core:116081] " Dan0042 (Daniel DeLorme) via ruby-core
2024-01-08 20:02 ` [ruby-core:116082] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-08 21:19 ` [ruby-core:116086] " Dan0042 (Daniel DeLorme) via ruby-core
2024-01-08 21:32 ` [ruby-core:116087] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-08 21:49 ` [ruby-core:116089] " Eregon (Benoit Daloze) via ruby-core
2024-01-08 21:54 ` [ruby-core:116090] " Eregon (Benoit Daloze) via ruby-core
2024-01-08 21:57 ` [ruby-core:116091] " Dan0042 (Daniel DeLorme) via ruby-core
2024-01-08 22:04 ` [ruby-core:116092] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-08 22:19 ` [ruby-core:116093] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-09  0:08 ` [ruby-core:116095] " Dan0042 (Daniel DeLorme) via ruby-core
2024-01-09  0:25 ` [ruby-core:116096] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-09  0:37 ` [ruby-core:116097] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-09  4:30 ` [ruby-core:116101] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-09 13:17 ` [ruby-core:116116] " Eregon (Benoit Daloze) via ruby-core
2024-01-09 16:39 ` [ruby-core:116123] " Dan0042 (Daniel DeLorme) via ruby-core
2024-01-09 17:02 ` [ruby-core:116124] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-01-09 18:30 ` [ruby-core:116126] " Dan0042 (Daniel DeLorme) via ruby-core
2024-01-25  2:29 ` [ruby-core:116436] " ko1 (Koichi Sasada) via ruby-core
