ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
@ 2021-04-09 10:42 jean.boussier
  2021-04-09 11:30 ` [ruby-core:103344] " jean.boussier
                   ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: jean.boussier @ 2021-04-09 10:42 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
In some tight loops it can be useful to reuse a buffer string. For instance:

```ruby
buffer = String.new(encoding: Encoding::BINARY, capacity: 1024)

10.times do
  build_next_packet(buffer)
  udp_socket.send(buffer)
  buffer.clear
end
```

Currently `Array#clear` preserves the Array capacity, but `String#clear` doesn't:

```ruby
>> puts ObjectSpace.dump(Array.new(20).clear)
{"address":"0x7fd3260a1558", "type":"ARRAY", "class":"0x7fd3230972e0", "length":0, "memsize":200, "flags":{"wb_protected":true}}
>> puts ObjectSpace.dump(String.new(encoding: Encoding::BINARY, capacity: 1024).clear)
{"address":"0x7fd322a8a320", "type":"STRING", "class":"0x7fd3230b75b8", "embedded":true, "bytesize":0, "value":"", "memsize":40, "flags":{"wb_protected":true}}
```

It would be useful if `String#clear` didn't free the allocated memory, but if changing it is a backward compatibility concern, then maybe another method could make sense?
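
To reproduce the measurement above on another interpreter, a minimal sketch using `ObjectSpace.memsize_of` from the `objspace` extension could look like the following. It assumes CRuby; the reported numbers depend on the Ruby version and platform, so none are shown here, and the helper name `memsize_around_clear` is only illustrative.

```ruby
require "objspace"

# Reports the memory footprint of a preallocated String before and after #clear.
# `memsize_around_clear` is a hypothetical helper name, not part of Ruby.
def memsize_around_clear(capacity)
  buffer = String.new(encoding: Encoding::BINARY, capacity: capacity)
  before = ObjectSpace.memsize_of(buffer)
  buffer.clear
  after = ObjectSpace.memsize_of(buffer)
  { before: before, after: after }
end

sizes = memsize_around_clear(1024)
puts "before clear: #{sizes[:before]} bytes, after clear: #{sizes[:after]} bytes"
```

If `after` is much smaller than `before`, the interpreter released the buffer on `clear`; if the two are equal, the capacity was kept.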




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103344] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
@ 2021-04-09 11:30 ` jean.boussier
  2021-04-09 13:38 ` [ruby-core:103347] " marcandre-ruby-core
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: jean.boussier @ 2021-04-09 11:30 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by byroot (Jean Boussier).


Proposed patch: https://github.com/ruby/ruby/pull/4373

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91435

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103347] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
  2021-04-09 11:30 ` [ruby-core:103344] " jean.boussier
@ 2021-04-09 13:38 ` marcandre-ruby-core
  2021-04-09 17:53 ` [ruby-core:103349] " eregontp
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: marcandre-ruby-core @ 2021-04-09 13:38 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by marcandre (Marc-Andre Lafortune).


Looks good. I doubt very much that this would be a compatibility concern.

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91437

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103349] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
  2021-04-09 11:30 ` [ruby-core:103344] " jean.boussier
  2021-04-09 13:38 ` [ruby-core:103347] " marcandre-ruby-core
@ 2021-04-09 17:53 ` eregontp
  2021-04-09 18:39 ` [ruby-core:103352] " dylan.smith
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: eregontp @ 2021-04-09 17:53 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by Eregon (Benoit Daloze).


I think that some people and libraries might expect that the `#clear` method releases the allocated memory.
This might be useful when e.g. reusing a String as a large buffer and the new usage might need less memory.
Not saying that's a good pattern, because IMHO it would be better to allocate a new String, but I'd guess it's used in some cases.

In general I think it is surprising that after `#clear` the object might "leak" a significant amount of memory, not observable from the typical Ruby methods on that collection.

`#clear` feels a bit similar to `#close` to me.


----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91440

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103352] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (2 preceding siblings ...)
  2021-04-09 17:53 ` [ruby-core:103349] " eregontp
@ 2021-04-09 18:39 ` dylan.smith
  2021-04-09 18:44 ` [ruby-core:103353] " dylan.smith
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: dylan.smith @ 2021-04-09 18:39 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by dylants (Dylan Thacker-Smith).


What makes sense probably depends on how long-lived the String is and whether there is an upper bound on how much needs to be stored in it.

For instance, there may be a rare iteration of a loop that adds a lot to the String, which might be excessive to hold onto for most iterations.  As such, we may want to shrink the String back to the capacity we expect most iterations to use, such as the initial capacity.

It would be nice to have more control over the capacity of collections, such as Array or String.  If we know exactly how much memory is needed, then it would be useful to have `shrink(capacity = bytesize)` and `reserve(capacity)` for this purpose.  It is also common to know that at least a specific amount of memory needs to be reserved, but not to know exactly how much is needed, so providing a `capacity` method gives more control over how to expand memory (e.g. double the capacity until it is at least the minimum amount needed, then call `reserve` with that expanded capacity).  This would provide the primitives needed to avoid unnecessary reallocations, and convenience methods can always be built on top of them.
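
The doubling policy mentioned above can be sketched in plain Ruby. This is only an illustration of the arithmetic: `grown_capacity` is a hypothetical helper, and `String#capacity` and `String#reserve` are the methods proposed in this thread, not existing APIs, so they appear only in a comment.

```ruby
# Doubles `current` until it is at least `needed` and returns the result.
# Hypothetical helper; it only computes a number and touches no String.
def grown_capacity(current, needed)
  raise ArgumentError, "needed must be non-negative" if needed.negative?

  capacity = [current, 1].max # avoid looping forever when current is 0
  capacity *= 2 while capacity < needed
  capacity
end

grown_capacity(1024, 3000) # => 4096
grown_capacity(1024, 100)  # => 1024

# With the proposed (not existing) primitives, the call site might read:
#   buffer.reserve(grown_capacity(buffer.capacity, buffer.bytesize + extra))
```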

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91443

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103353] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (3 preceding siblings ...)
  2021-04-09 18:39 ` [ruby-core:103352] " dylan.smith
@ 2021-04-09 18:44 ` dylan.smith
  2021-04-09 21:28 ` [ruby-core:103356] " jean.boussier
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: dylan.smith @ 2021-04-09 18:44 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by dylants (Dylan Thacker-Smith).


If we want `clear` to shrink memory by default, a `shrink: true` keyword argument could be added so the user could override this default with `clear(shrink: false)`.  This would make the change less risky, since it wouldn't change the behaviour of existing code.

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91444

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103356] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (4 preceding siblings ...)
  2021-04-09 18:44 ` [ruby-core:103353] " dylan.smith
@ 2021-04-09 21:28 ` jean.boussier
  2021-04-09 21:42   ` [ruby-core:103357] " Eric Wong
  2021-04-10  9:28 ` [ruby-core:103369] " eregontp
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 14+ messages in thread
From: jean.boussier @ 2021-04-09 21:28 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by byroot (Jean Boussier).


> so providing a capacity method gives more control over how to expand memory 

Agreed. Without also exposing the capacity, my proposed change would be a big footgun. 

Maybe `String#capacity` and `String#capacity=` would make sense? But then there's the question of the behavior if you set the capacity to lower than the `size`. Should it truncate? (this could corrupt UTF-8 for instance) or should it raise?

Additionally I think `Array` and `Hash` should expose similar ways of querying and reserving capacity.
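
To make the truncation concern concrete, here is a small sketch using only existing methods (`String#byteslice`, `String#valid_encoding?`, `String#b`). It shows how cutting a UTF-8 string at an arbitrary byte offset can leave an invalid string, while the same cut on a binary string cannot.

```ruby
text = "h\u00E9llo"            # "héllo"; the "é" takes two bytes in UTF-8
cut  = text.byteslice(0, 2)    # "h" plus only the first byte of "é"

utf8_ok   = cut.valid_encoding?                    # false: dangling lead byte
binary_ok = text.b.byteslice(0, 2).valid_encoding? # true: any byte is valid in BINARY

puts "UTF-8 cut valid? #{utf8_ok}, binary cut valid? #{binary_ok}"
```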

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91447

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103357] Re: [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 21:28 ` [ruby-core:103356] " jean.boussier
@ 2021-04-09 21:42   ` Eric Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2021-04-09 21:42 UTC (permalink / raw)
  To: ruby-core

jean.boussier@gmail.com wrote:
> > so providing a capacity method gives more control over how to expand memory 
> 
> Agreed. Without also exposing the capacity, my proposed change would be a big footgun. 

Yes, rb_str_resize(str, 0) is commonly used to work around the lack of
escape analysis inside the core VM and some C exts.  I think
it's reasonable for Rubyists to use String#clear for the same
purpose.

> Maybe `String#capacity` and `String#capacity=` would make sense? But then there's the question of the behavior if you set the capacity to lower than the `size`. Should it truncate? (this could corrupt UTF-8 for instance) or should it raise?

Yes, but I don't know what it should do for corruption.  It
would also be useful for IO#read-like methods if/when that
supports destination buffer offsets.

> Additionally I think `Array` and `Hash` should expose similar ways of querying and reserving capacity.

Probably, yes.  It seems a bit low-level, but I've been favoring
"semi-automatic" memory management since we probably can't have
escape analysis due to the C API.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103369] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (5 preceding siblings ...)
  2021-04-09 21:28 ` [ruby-core:103356] " jean.boussier
@ 2021-04-10  9:28 ` eregontp
  2021-04-10  9:33 ` [ruby-core:103370] " eregontp
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: eregontp @ 2021-04-10  9:28 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by Eregon (Benoit Daloze).


byroot (Jean Boussier) wrote in #note-6:
> Maybe `String#capacity` and `String#capacity=` would make sense? But then there's the question of the behavior if you set the capacity to lower than the `size`. Should it truncate? (this could corrupt UTF-8 for instance) or should it raise?

I would say definitely not truncate.
So either raise an error, or do nothing, since the capacity is kind of a hint.

My feeling is handling the capacity in Ruby code feels wrong and like C++ code.
The example snippet above can just allocate a new String per call to `build_next_packet` and I doubt that would affect performance much:
```ruby
10.times do
  buffer = build_next_packet # build_next_packet can use String.new(capacity: 1024), it probably knows better about it anyway
  udp_socket.send(buffer)
end
```
That adds a bit more GC pressure, but I think there would be so many other allocations in `build_next_packet` anyway that this wouldn't matter much.
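
Whether the extra allocations matter is best measured rather than guessed. The sketch below compares the two styles by counting allocated objects with `GC.stat(:total_allocated_objects)`, which assumes CRuby. `fill_packet` is a hypothetical stand-in for `build_next_packet`, the socket is left out, and no counts are shown because they vary by Ruby version.

```ruby
PAYLOAD = "metric:1|c".freeze # 10 bytes, stands in for a real packet body

# Hypothetical stand-in for build_next_packet: appends into the given buffer.
def fill_packet(buffer)
  buffer << PAYLOAD
end

# Reuses one buffer across iterations, as in the original proposal.
def send_reusing(count)
  buffer = String.new(encoding: Encoding::BINARY, capacity: 1024)
  sent = 0
  count.times do
    fill_packet(buffer)
    sent += buffer.bytesize # a real client would write to a socket here
    buffer.clear
  end
  sent
end

# Allocates a fresh buffer per iteration, as suggested in this reply.
def send_allocating(count)
  sent = 0
  count.times do
    buffer = String.new(encoding: Encoding::BINARY, capacity: 1024)
    fill_packet(buffer)
    sent += buffer.bytesize
  end
  sent
end

# Number of objects allocated while the block runs (CRuby-specific counter).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

puts "reusing:    #{allocations { send_reusing(1_000) }} objects"
puts "allocating: #{allocations { send_allocating(1_000) }} objects"
```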

IMHO we should really have a separate Buffer class and String class here.
For instance if the String class uses a rope representation (e.g. on TruffleRuby) a capacity doesn't make much sense.

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91462

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103370] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (6 preceding siblings ...)
  2021-04-10  9:28 ` [ruby-core:103369] " eregontp
@ 2021-04-10  9:33 ` eregontp
  2021-04-10 15:17 ` [ruby-core:103373] " jean.boussier
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: eregontp @ 2021-04-10  9:33 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by Eregon (Benoit Daloze).


I think `clear(shrink: true/false)` would be fine to add.

I'm not sure if it's really needed in practice though.

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91463

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103373] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (7 preceding siblings ...)
  2021-04-10  9:33 ` [ruby-core:103370] " eregontp
@ 2021-04-10 15:17 ` jean.boussier
  2021-04-12  1:29 ` [ruby-core:103387] " daniel
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: jean.boussier @ 2021-04-10 15:17 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by byroot (Jean Boussier).


> My feeling is handling the capacity in Ruby code feels wrong and like C++ code.

This is really meant for the few low level places where it matters. For context this came up when trying to optimize a StatsD client, which is quite a hotspot. 

More generally there are cases when you know you'll return a large String/Array/Hash, and you know the size in advance, and it can make sense to pre-reserve capacity rather than having it resized a dozen times. Amusingly enough, the C API allows creating an array or hash with a specific capacity.
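
For Strings, pre-reserving is already possible today through `String.new(capacity:)`. A minimal sketch, assuming the total size can be computed up front (`join_preallocated` is a hypothetical helper name):

```ruby
# Builds one String from many parts, reserving the full size once up front
# so the appends below should not need to grow the buffer.
def join_preallocated(parts)
  total = parts.sum(&:bytesize)
  out = String.new(encoding: Encoding::UTF_8, capacity: total)
  parts.each { |part| out << part }
  out
end

join_preallocated(%w[alpha beta gamma]) # => "alphabetagamma"
```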

> IMHO we should really have a separate Buffer class and String class here.

Possibly yes. StringIO is kind of meant to be that buffer class, but it's not always easier to use, and often is slower than using a string.

> I think clear(shrink: true/false) would be fine to add.

I think it would be good, but it would also require a `String#resize(capacity)`.

So the pattern would be:

```ruby
buffer = String.new(encoding: Encoding::BINARY, capacity: 1024)
loop do
  # do your thing
  buffer.clear(shrink: false) # proposed keyword, not an existing API
  buffer.resize(1024)         # proposed method, not an existing API
end
```

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91466

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103387] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (8 preceding siblings ...)
  2021-04-10 15:17 ` [ruby-core:103373] " jean.boussier
@ 2021-04-12  1:29 ` daniel
  2021-04-14 18:28 ` [ruby-core:103451] " dylan.smith
  2021-04-21 23:27 ` [ruby-core:103546] " dsisnero
  11 siblings, 0 replies; 14+ messages in thread
From: daniel @ 2021-04-12  1:29 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by Dan0042 (Daniel DeLorme).


What about `buffer.clear(capacity: 1024)`
Or maybe even `buffer.clear(capacity: 1024..8192)`
I think that's more straightforward than separate `clear` and `resize` operations.
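
One possible reading of the range form, sketched with the existing `Comparable#clamp` (which accepts a Range since Ruby 2.7): keep the current capacity when it falls inside the range, otherwise move it to the nearest bound. This interpretation and the helper name `capacity_after_clear` are assumptions, not something the proposal spells out.

```ruby
# Computes the capacity a buffer would keep after clear(capacity: allowed),
# under the assumed "clamp into the range" semantics.
def capacity_after_clear(current_capacity, allowed)
  current_capacity.clamp(allowed)
end

capacity_after_clear(512, 1024..8192)     # => 1024 (grown to the minimum)
capacity_after_clear(4096, 1024..8192)    # => 4096 (kept as is)
capacity_after_clear(100_000, 1024..8192) # => 8192 (shrunk to the maximum)
```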

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91482

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103451] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (9 preceding siblings ...)
  2021-04-12  1:29 ` [ruby-core:103387] " daniel
@ 2021-04-14 18:28 ` dylan.smith
  2021-04-21 23:27 ` [ruby-core:103546] " dsisnero
  11 siblings, 0 replies; 14+ messages in thread
From: dylan.smith @ 2021-04-14 18:28 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by dylants (Dylan Thacker-Smith).


> Maybe String#capacity and String#capacity= would make sense?

Using `capacity=` for the method name would suggest that the capacity is exactly that value after the call.  However, with embedded strings, the capacity would be fixed until it grows larger than what can be embedded in the object struct.  That's why I suggested `shrink` as the name to shrink the capacity.

> But then there's the question of the behavior if you set the capacity to lower than the size. Should it truncate? (this could corrupt UTF-8 for instance) or should it raise?

I think that should raise, since it seems too implicit to have a call to set the capacity also truncate the contents.

I do think it would be useful to be able to efficiently truncate a string, but that could be done with a separate method. For example, `String#size=` could be provided and could efficiently truncate a binary string and would avoid corrupting UTF-8 strings.

There are limited String methods for working with byte offsets for variable-width encoded strings like UTF-8, so I'm actually surprised that there is already a `String#byteslice` method.  Nothing prevents that from creating an invalid UTF-8 string; however, I don't see the use case for using that with non-binary strings.  I think a way to truncate using a byte offset would be more useful as part of the C API for now.
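
For comparison, a byte-limited truncation that stays valid can be approximated today with `String#byteslice` followed by `String#scrub`. This is only a sketch: it copies the string rather than truncating in place, so it is not the efficient primitive discussed above, and it assumes the input was valid to begin with, since `scrub("")` would also drop invalid bytes elsewhere in the string. `truncate_bytes` is a hypothetical helper name.

```ruby
# Returns a copy of `str` that is at most `max_bytes` long, dropping a
# trailing partial character instead of leaving an invalid byte sequence.
def truncate_bytes(str, max_bytes)
  str.byteslice(0, max_bytes).scrub("")
end

truncate_bytes("h\u00E9llo", 2) # => "h"  (the half-cut "é" is dropped)
truncate_bytes("h\u00E9llo", 3) # => "hé" (the cut lands on a boundary)
```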

> My feeling is handling the capacity in Ruby code feels wrong and like C++ code.

Performance-sensitive code will naturally be written based on what is more efficient for the machine (the primary concern of C++), such as preferring mutations to avoid object allocations. Providing primitive low-level methods for performance-sensitive Ruby code will allow more pleasant optimization than forcing the code to be rewritten in a native extension to do the same optimization.

> String#resize

`size` refers to the size of the contents, so `resize` seems like it would affect that `size` (e.g. truncating or padding) instead of just the capacity.

> What about buffer.clear(capacity: 1024)
> Or maybe even buffer.clear(capacity: 1024..8192)
> I think that's more straightforward than separate clear and resize operations.

Coupling capacity control with clearing the buffer makes the capacity control less general.  For instance, it doesn't support shrinking the buffer to fit the contents or growing the buffer once before multiple appends.

----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91544

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [ruby-core:103546] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity
  2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
                   ` (10 preceding siblings ...)
  2021-04-14 18:28 ` [ruby-core:103451] " dylan.smith
@ 2021-04-21 23:27 ` dsisnero
  11 siblings, 0 replies; 14+ messages in thread
From: dsisnero @ 2021-04-21 23:27 UTC (permalink / raw)
  To: ruby-core

Issue #17790 has been updated by dsisnero (Dominic Sisneros).


That is what I was hoping the addition of memoryview would help with, but the only way to interact with a memoryview in Ruby is through Fiddle.
If we had a ByteArray class that implemented memoryview, we could write:

buffer = ByteArray.new('this is a string'.bytes)  
mv = Fiddle::MemoryView.new(buffer)
mv.byte_size # 16
first8 = mv[0:8]   # once Fiddle::MemoryView allows you to slice
socket.write(first8)   # once socket.write allows you to write memoryview objects without changing into string. 

What memoryview is supposed to do is allow reading and writing with zero copy, because it knows the offsets, strides, etc. of the underlying object in the buffer.

So, I think we should instead finish the parts of memoryview that are missing:

1) IO support for memoryview (read into a memoryview object and write from a memoryview object without converting to strings)
2) Add classes that implement the memoryview protocol
3) Change String#bytes to return a new ByteArray class that implements the memoryview protocol
4) Add a Ruby extension that allows you to use memoryview objects in Ruby (not just Fiddle)
   mv = MemoryView.new(obj)
   mv[offset:offset_size]   # slicing memory views
   mv.cast(format, shape)   # change format or shape of memoryview but keep data
   mv.format
   mv.strides
   mv.shape
   





----------------------------------------
Feature #17790: Have a way to clear a String without resetting its capacity
https://bugs.ruby-lang.org/issues/17790#change-91647

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-04-21 23:27 UTC | newest]

Thread overview: 14+ messages
2021-04-09 10:42 [ruby-core:103342] [Ruby master Feature#17790] Have a way to clear a String without resetting its capacity jean.boussier
2021-04-09 11:30 ` [ruby-core:103344] " jean.boussier
2021-04-09 13:38 ` [ruby-core:103347] " marcandre-ruby-core
2021-04-09 17:53 ` [ruby-core:103349] " eregontp
2021-04-09 18:39 ` [ruby-core:103352] " dylan.smith
2021-04-09 18:44 ` [ruby-core:103353] " dylan.smith
2021-04-09 21:28 ` [ruby-core:103356] " jean.boussier
2021-04-09 21:42   ` [ruby-core:103357] " Eric Wong
2021-04-10  9:28 ` [ruby-core:103369] " eregontp
2021-04-10  9:33 ` [ruby-core:103370] " eregontp
2021-04-10 15:17 ` [ruby-core:103373] " jean.boussier
2021-04-12  1:29 ` [ruby-core:103387] " daniel
2021-04-14 18:28 ` [ruby-core:103451] " dylan.smith
2021-04-21 23:27 ` [ruby-core:103546] " dsisnero
