ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:87845] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
@ 2018-07-07  9:47 ` janko.marohnic
  2018-07-07 20:48 ` [ruby-core:87856] " samuel
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: janko.marohnic @ 2018-07-07  9:47 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been reported by janko (Janko Marohnić).

----------------------------------------
Bug #14900: Extra allocation in String#byteslice
https://bugs.ruby-lang.org/issues/14900

* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
When executing `String#byteslice` with a range, I noticed that sometimes the original string is allocated again. When I run the following script:

~~~ ruby
require "objspace"

string = "a" * 100_000

GC.start
GC.disable
generation = GC.count

ObjectSpace.trace_object_allocations do
  string.byteslice(50_000..-1)

  ObjectSpace.each_object(String) do |string|
    p string.bytesize if ObjectSpace.allocation_generation(string) == generation
  end
end
~~~

it outputs

~~~
50000
100000
6
5
~~~

The one with 50000 bytes is the result of `String#byteslice`, but the one with 100000 bytes is the duplicated original string. I expected only the result of `String#byteslice` to be amongst new allocations.

If instead of the last 50000 bytes I slice the *first* 50000 bytes, the extra duplication doesn't occur.

~~~ ruby
# ...
  string.byteslice(0, 50_000)
# ...
~~~

~~~
50000
5
~~~

It's definitely OK if `String#byteslice` allocates extra strings as part of its implementation, but it would be nice if they were deallocated before returning the result.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87856] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
  2018-07-07  9:47 ` [ruby-core:87845] [Ruby trunk Bug#14900] Extra allocation in String#byteslice janko.marohnic
@ 2018-07-07 20:48 ` samuel
  2018-07-08  2:52 ` [ruby-core:87867] " samuel
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-07 20:48 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


Nice catch. I will try to verify this on my end too.

----------------------------------------
Bug #14900: Extra allocation in String#byteslice
https://bugs.ruby-lang.org/issues/14900#change-72874

* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
EDIT: It seems that `String#slice` has the same issue.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87867] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
  2018-07-07  9:47 ` [ruby-core:87845] [Ruby trunk Bug#14900] Extra allocation in String#byteslice janko.marohnic
  2018-07-07 20:48 ` [ruby-core:87856] " samuel
@ 2018-07-08  2:52 ` samuel
  2018-07-08  2:52 ` [ruby-core:87868] " samuel
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-08  2:52 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


Okay, I reproduced the issue and wrote a test case here:

https://github.com/ioquatix/ruby/commit/9fb5cd644209efc79378841e1b6eb644876393b0

It tests both the prefix and suffix cases discussed in your initial report.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87868] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-07-08  2:52 ` [ruby-core:87867] " samuel
@ 2018-07-08  2:52 ` samuel
  2018-07-08  3:22 ` [ruby-core:87871] " samuel
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-08  2:52 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


One thing I noticed: if I freeze the source string, the extra memory allocation goes away.
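
A minimal sketch of that comparison, reusing the counting approach from the original report. The helper name `new_string_sizes` is hypothetical (it is not part of the thread), and the sizes it reports depend on the Ruby version, so no expected output is shown.

```ruby
require "objspace"

# Hypothetical helper: returns the byte sizes of all String objects
# allocated while the block runs (same technique as the original report).
def new_string_sizes
  GC.start
  GC.disable
  generation = GC.count
  sizes = []
  ObjectSpace.trace_object_allocations do
    yield
    ObjectSpace.each_object(String) do |s|
      sizes << s.bytesize if ObjectSpace.allocation_generation(s) == generation
    end
  end
  sizes
ensure
  GC.enable
end

unfrozen = "a" * 100_000
frozen   = ("a" * 100_000).freeze

# Slice the last 50,000 bytes of each source and record what was allocated.
sizes_unfrozen = new_string_sizes { unfrozen.byteslice(50_000..-1) }
sizes_frozen   = new_string_sizes { frozen.byteslice(50_000..-1) }

puts "unfrozen source: #{sizes_unfrozen.inspect}"
puts "frozen source:   #{sizes_frozen.inspect}"
```

If the observation above holds on your Ruby, the first list should contain an extra 100000-byte entry that the second list lacks.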

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87871] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-07-08  2:52 ` [ruby-core:87868] " samuel
@ 2018-07-08  3:22 ` samuel
  2018-07-08  3:31 ` [ruby-core:87872] " samuel
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-08  3:22 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


Okay, I made an attempt to fix this: https://github.com/ruby/ruby/pull/1909

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87872] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2018-07-08  3:22 ` [ruby-core:87871] " samuel
@ 2018-07-08  3:31 ` samuel
  2018-07-08  9:46 ` [ruby-core:87877] " funny.falcon
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-08  3:31 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


I think there are several things to consider here:

- Even though it appears as two string allocations, the data is allocated only once; the second string shares the first one's data.
- I guess that subsequent slices would share the underlying frozen string?
- In some cases, byteslice might be less efficient. For example, with a 100 MB buffer, slicing the last 10 bytes makes an entire copy of the source string, even though all you were interested in was the 10 bytes at the end.
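
One way to probe the sharing described in these points is a rough sketch like the following. It assumes that `ObjectSpace.memsize_of` reports only the memory an object owns itself, which is my understanding of CRuby rather than something stated in this thread. The reported numbers vary by Ruby version, so none are given.

```ruby
require "objspace"

source = "a" * 1_000_000
tail = source.byteslice(500_000, 500_000)

# Reported sizes vary by Ruby version, so no expected values are given.
puts "source: #{ObjectSpace.memsize_of(source)} bytes"
puts "tail:   #{ObjectSpace.memsize_of(tail)} bytes"

# Sharing is an implementation detail: mutating the source afterwards
# must not change the slice.
source.setbyte(-1, "b".ord)
puts tail.getbyte(-1) == "a".ord
```

Whatever the interpreter does internally, the slice behaves as an independent string, which is why any sharing has to be backed by a copy somewhere once the source is mutable.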

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87877] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2018-07-08  3:31 ` [ruby-core:87872] " samuel
@ 2018-07-08  9:46 ` funny.falcon
  2018-07-08  9:47 ` [ruby-core:87878] " samuel
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: funny.falcon @ 2018-07-08  9:46 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by funny_falcon (Yura Sokolov).


@ioquatix, at first glance your patch doesn't seem correct to me.

Imagine a pipelined RPC server:
- we read data into a buffer
- while the buffer is larger than the request size
  - detect the first request and split the buffer into the request and the rest of the buffer

The same applies to any other binary parser.

With the current behavior, the "get rest of buffer" operation copies the buffer into a shared frozen string only once.
With your patch it will copy every time.
So instead of linear complexity we will have quadratic complexity.

On second thought, it is possible to use a frozen string explicitly for the buffer, it is just not so trivial: while there is not enough data for a request, the buffer should not be frozen and `<<` should be used; otherwise it should be frozen and `+` used.

Some programs will certainly become slower with this change until they are fixed.

I'm not against the patch, but the new behavior should be carefully documented and mentioned in the changelog as a change that could negatively affect performance if not accounted for.
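
The explicit-freeze strategy described above could look roughly like the sketch below. The class name `RequestBuffer`, its methods, and the fixed `REQUEST_SIZE` are all hypothetical and chosen only for illustration. The sketch shows the control flow (`<<` while unfrozen, `+` while frozen); whether it actually avoids copies depends on the Ruby version.

```ruby
REQUEST_SIZE = 4 # hypothetical fixed request size in bytes

class RequestBuffer
  def initialize
    @buffer = String.new # empty, unfrozen, binary
  end

  # Add incoming data to the buffer.
  def append(data)
    if @buffer.frozen?
      # A frozen buffer cannot be appended to in place; `+` builds a new string.
      @buffer = @buffer + data
    else
      @buffer << data
    end
  end

  # Return the next complete request, or nil if more data is needed.
  def next_request
    return nil if @buffer.bytesize < REQUEST_SIZE

    # Freeze before slicing so both slices come from a frozen source.
    @buffer.freeze
    request = @buffer.byteslice(0, REQUEST_SIZE)
    rest = @buffer.byteslice(REQUEST_SIZE, @buffer.bytesize - REQUEST_SIZE)

    # Keep the remainder frozen only while it still holds a complete request.
    rest.freeze if rest.bytesize >= REQUEST_SIZE
    @buffer = rest
    request
  end
end

buffer = RequestBuffer.new
buffer.append("aaaabb")
first  = buffer.next_request # one complete request, "bb" stays buffered
second = buffer.next_request # not enough data yet
buffer.append("bbcccc")
third  = buffer.next_request # remainder "cccc" is kept frozen
buffer.append("dd")          # takes the `+` branch
fourth = buffer.next_request
fifth  = buffer.next_request
```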

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87878] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2018-07-08  9:46 ` [ruby-core:87877] " funny.falcon
@ 2018-07-08  9:47 ` samuel
  2018-07-08 10:45 ` [ruby-core:87879] " samuel
  2018-07-08 11:35 ` [ruby-core:87880] " samuel
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-08  9:47 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


Yeah, I agree, this patch probably isn't right. I'm just trying to figure out what is going on and suggest a solution. The outcome may be that this is normal behaviour. Thanks for your feedback.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87879] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2018-07-08  9:47 ` [ruby-core:87878] " samuel
@ 2018-07-08 10:45 ` samuel
  2018-07-08 11:35 ` [ruby-core:87880] " samuel
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-08 10:45 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


The way I've implemented it now (as in your first example) is something like this:

```
@buffer = read_data
if @buffer.bytesize > REQUEST_SIZE
    # Freeze the source so the slices below can share it without a hidden copy.
    @buffer.freeze
    request_buffer = @buffer.byteslice(0, REQUEST_SIZE)
    # The remainder becomes the new buffer.
    @buffer = @buffer.byteslice(REQUEST_SIZE, @buffer.bytesize)
end
```

Because we recreate `@buffer` from the remainder, it makes sense to freeze the source to avoid generating a hidden copy. Does that make sense?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:87880] [Ruby trunk Bug#14900] Extra allocation in String#byteslice
       [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2018-07-08 10:45 ` [ruby-core:87879] " samuel
@ 2018-07-08 11:35 ` samuel
  9 siblings, 0 replies; 10+ messages in thread
From: samuel @ 2018-07-08 11:35 UTC (permalink / raw
  To: ruby-core

Issue #14900 has been updated by ioquatix (Samuel Williams).


I played around with my assumptions here. By far the worst from a memory point of view was `slice!`, which, given a string of 5MB, produces 7.5MB of allocations. The equivalent sequence of `byteslice` calls as above only allocates 2.5MB.

Here were my comparisons:

```
# Declared outside the blocks so that the later blocks can see it.
string = nil

measure_memory("Initial allocation") do
	string = "a" * 5*1024*1024
	string.freeze
end # => 5.0 MB

measure_memory("Byteslice from start to middle") do
	# Why does this need to allocate memory? Surely it can share the original allocation?
	x = string.byteslice(0, string.bytesize / 2)
end # => 2.5 MB

measure_memory("Byteslice from middle to end") do
	string.byteslice(string.bytesize / 2, string.bytesize)
end # => 0.0 MB

measure_memory("Slice! from start to middle") do
	string.dup.slice!(0, string.bytesize / 2) # dup doesn't make any difference to size of allocations
end # => 7.5 MB

measure_memory("Byte slice into two halves") do
	head = string.byteslice(0, string.bytesize / 2)
	remainder = string.byteslice(string.bytesize / 2, string.bytesize)
end # => 2.5 MB
```

(examples are also here: https://github.com/socketry/async-io/blob/master/examples/allocations/byteslice.rb)

In the best case, the last example should be able to reuse the source string entirely, but Ruby doesn't seem capable of doing that yet. Perhaps a specific implementation of `byteslice!` could address this use case with zero allocations?
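
The `measure_memory` helper itself is not shown in this message. A hypothetical stand-in, not the one from the linked script, could be sketched as follows. It assumes that the change in `ObjectSpace.memsize_of_all(String)` across the block is a reasonable proxy for string memory allocated by the block.

```ruby
require "objspace"

# Hypothetical stand-in for `measure_memory`: reports how much the total
# memory held by String objects grew while the block ran.
def measure_memory(label)
  GC.start   # begin from a freshly collected heap
  GC.disable # keep objects created in the block alive until measured
  before = ObjectSpace.memsize_of_all(String)
  yield
  after = ObjectSpace.memsize_of_all(String)
  delta = after - before
  puts format("%s: %.1f MB", label, delta / (1024.0 * 1024.0))
  delta
ensure
  GC.enable
end

allocated = measure_memory("Initial allocation") do
  "a" * (5 * 1024 * 1024)
end
```

Under that assumption a string that only shares another string's buffer contributes little to the total, which would be consistent with the 0.0 MB figure reported for the middle-to-end slice.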

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-07-08 11:35 UTC | newest]

Thread overview: 10+ messages
     [not found] <redmine.issue-14900.20180707094727@ruby-lang.org>
2018-07-07  9:47 ` [ruby-core:87845] [Ruby trunk Bug#14900] Extra allocation in String#byteslice janko.marohnic
2018-07-07 20:48 ` [ruby-core:87856] " samuel
2018-07-08  2:52 ` [ruby-core:87867] " samuel
2018-07-08  2:52 ` [ruby-core:87868] " samuel
2018-07-08  3:22 ` [ruby-core:87871] " samuel
2018-07-08  3:31 ` [ruby-core:87872] " samuel
2018-07-08  9:46 ` [ruby-core:87877] " funny.falcon
2018-07-08  9:47 ` [ruby-core:87878] " samuel
2018-07-08 10:45 ` [ruby-core:87879] " samuel
2018-07-08 11:35 ` [ruby-core:87880] " samuel
