* [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
[not found] <redmine.issue-14745.20180509125009@ruby-lang.org>
@ 2018-05-09 12:50 ` janko.marohnic
2018-05-11 3:36 ` [ruby-core:86983] " Eric Wong
` (2 more replies)
2018-05-16 9:33 ` [ruby-core:87075] " janko.marohnic
2018-05-17 12:28 ` [ruby-core:87136] " janko.marohnic
2 siblings, 3 replies; 8+ messages in thread
From: janko.marohnic @ 2018-05-09 12:50 UTC (permalink / raw)
To: ruby-core
Issue #14745 has been reported by janko (Janko Marohnić).
----------------------------------------
Bug #14745: High memory usage when using String#replace with IO.copy_stream
https://bugs.ruby-lang.org/issues/14745
* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I'm using custom IO-like objects that implement #read as the first argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer versus String#clear. Here is an example of a "fake IO" object where #read uses String#clear on the output buffer:
~~~ ruby
GC.disable
require "stringio"
class FakeIO
def initialize(content)
@io = StringIO.new(content)
end
def read(length, outbuf)
chunk = @io.read(length)
if chunk
outbuf.clear
outbuf << chunk
chunk.clear
else
outbuf.clear
end
outbuf unless outbuf.empty?
end
end
io = FakeIO.new("a" * 50*1024*1024) # 50MB
IO.copy_stream(io, File::NULL)
system "top -pid #{Process.pid}"
~~~
This program outputs memory usage of 50MB at the end, as expected – 50MB was loaded into memory at the beginning and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:
~~~ ruby
def read(length, outbuf)
chunk = @io.read(length)
if chunk
outbuf.replace chunk
chunk.clear
else
outbuf.clear
end
outbuf unless outbuf.empty?
end
~~~
the memory usage has now doubled to 100MB at the end of the program, indicating that some string bytes weren't successfully deallocated. So, it seems that String#replace has different behaviour compared to String#clear + String#<<.
I was *only* able to reproduce this with `IO.copy_stream`, the following program shows 50MB memory usage, regardless of whether the String#clear or String#replace approach is used:
~~~ ruby
GC.disable
buffer = "a" * 50*1024*1024
chunk = "b" * 50*1024*1024
if ARGV[0] == "clear"
buffer.clear
buffer << chunk
else
buffer.replace chunk
end
chunk.clear
system "top -pid #{Process.pid}"
~~~
With this program I also noticed one interesting thing. If I remove `chunk.clear`, then the "clear" version uses 100MB as expected (because both buffer and chunk strings are 50MB large), but the "replace" version uses only 50MB, which makes it appear that the `buffer` string doesn't use any memory when in fact it should use 50MB just like the `chunk` string. I found that odd, and I think it might be a clue to the memory bug with String#replace I experienced when using `IO.copy_stream`.
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 8+ messages in thread
* [ruby-core:86983] Re: [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
2018-05-09 12:50 ` [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream janko.marohnic
@ 2018-05-11 3:36 ` Eric Wong
2018-05-11 4:26 ` [ruby-core:86987] " Eric Wong
2018-05-11 5:43 ` [ruby-core:86988] " Eric Wong
2 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2018-05-11 3:36 UTC (permalink / raw)
To: ruby-core
janko.marohnic@gmail.com wrote:
> the memory usage has now doubled to 100MB at the end of the
> program, indicating that some string bytes weren't
> successfully deallocated. So, it seems that String#replace has
> different behaviour compared to String#clear + String#<<.
Yes, this is an unfortunate side effect because of copy-on-write
semantics of String#replace. If the arg (other_str) for
String#replace is non-frozen, a new frozen string is created with
using the existing malloc-ed pointer. Both the receiver string
and other_str point to that new, shared string.
> I was *only* able to reproduce this with `IO.copy_stream`
I suspect part of this is because outbuf becomes a long-lived object
with IO.copy_stream (not sure), and the hidden frozen string
becomes long-lived, as well.
So yeah; a combination of well-intentioned optimizations hurt
when combined together.
The other part could be anything using IO#write could create
massive amounts of garbage before 2.5:
https://bugs.ruby-lang.org/issues/13085 And your copy_stream use
hits the IO#write case. Unfortunately, the "fix" for [Bug
#13085] won't work effectively with shared strings, because we
can't free+recycle the a string which wasn't created internally
by the VM.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [ruby-core:86987] Re: [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
2018-05-09 12:50 ` [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream janko.marohnic
2018-05-11 3:36 ` [ruby-core:86983] " Eric Wong
@ 2018-05-11 4:26 ` Eric Wong
2018-05-11 5:43 ` [ruby-core:86988] " Eric Wong
2 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2018-05-11 4:26 UTC (permalink / raw)
To: ruby-core
janko.marohnic@gmail.com wrote:
> ~~~ ruby
> def read(length, outbuf)
> chunk = @io.read(length)
>
> if chunk
> outbuf.replace chunk
> chunk.clear
Elaborating on my previous comment, chunk.clear is a no-op in
this case because the shared frozen string is still in play and
now used by outbuf.
> else
> outbuf.clear
Likewise, the final #clear is also no-op because the
hidden+frozen string is still floating around waiting to be
GC-ed.
> end
>
> outbuf unless outbuf.empty?
> end
> ~~~
I don't think you need IO.copy_stream or IO#write to trigger, even.
~~~ ruby
io = FakeIO.new("a" * 50*1024*1024) # 50MB
buf = ''.b
while z = io.read(16384, buf)
end
~~~
I tried playing with shortening object lifetimes, but I guess GC
is not aggressive enough by default to matter.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [ruby-core:86988] Re: [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
2018-05-09 12:50 ` [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream janko.marohnic
2018-05-11 3:36 ` [ruby-core:86983] " Eric Wong
2018-05-11 4:26 ` [ruby-core:86987] " Eric Wong
@ 2018-05-11 5:43 ` Eric Wong
2 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2018-05-11 5:43 UTC (permalink / raw)
To: ruby-core
janko.marohnic@gmail.com wrote:
> def initialize(content)
> @io = StringIO.new(content)
> end
>
> def read(length, outbuf)
> chunk = @io.read(length)
Finally, I always assumed your example is a contrived case and
you're dealing with an interface somewhere (not StringIO) which
doesn't accept a destination buffer for .read.
Fwiw, I encountered a similar problem with Net::HTTP and
Zlib::GzipReader earlier this year and made r61665 to ensure
outbuf got used all the way down the call chain.
Sometimes I guess modifying/fixing interfaces is the only way to go.
(unless we can do escape analysis, but I'm not sure how feasible
it is with a dynamic language)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [ruby-core:87075] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
[not found] <redmine.issue-14745.20180509125009@ruby-lang.org>
2018-05-09 12:50 ` [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream janko.marohnic
@ 2018-05-16 9:33 ` janko.marohnic
2018-05-16 10:07 ` [ruby-core:87077] " Eric Wong
2018-05-17 12:28 ` [ruby-core:87136] " janko.marohnic
2 siblings, 1 reply; 8+ messages in thread
From: janko.marohnic @ 2018-05-16 9:33 UTC (permalink / raw)
To: ruby-core
Issue #14745 has been updated by janko (Janko Marohnić).
> Yes, this is an unfortunate side effect because of copy-on-write
> semantics of String#replace. If the arg (other_str) for
> String#replace is non-frozen, a new frozen string is created with
> using the existing malloc-ed pointer. Both the receiver string
> and other_str point to that new, shared string.
>
> So yeah; a combination of well-intentioned optimizations hurt
> when combined together.
That makes sense, thanks for the explanation.
> The other part could be anything using IO#write could create
> massive amounts of garbage before 2.5
This is on Ruby 2.5.1, so I'm guessing it doesn't apply here.
> Finally, I always assumed your example is a contrived case and
> you're dealing with an interface somewhere (not StringIO) which
> doesn't accept a destination buffer for .read.
The example was simplified for reproducing purposes. The place where I discovered this was in https://github.com/rocketjob/symmetric-encryption/pull/98 (I eventually managed to figure out `String#replace` was causing the high memory usage, so I switched to `String#clear`).
In short, the `SymmetricEncryption::Reader` object wraps an IO object with encrypted content, and when calling `#read` it reads data from the underlying IO object, decrypts it and returns the decrypted data. So, it's not patching the lack of outbuf argument (because the underlying IO object *should* accept the outbuf argument), rather it provides an `IO#read` interface over incrementally decrypting IO object content.
If this is necessary behaviour when having the copy-on-write optimization, feel free to close this ticket then.
----------------------------------------
Bug #14745: High memory usage when using String#replace with IO.copy_stream
https://bugs.ruby-lang.org/issues/14745#change-72040
* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I'm using custom IO-like objects that implement #read as the first argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer versus String#clear. Here is an example of a "fake IO" object where #read uses String#clear on the output buffer:
~~~ ruby
GC.disable
require "stringio"
class FakeIO
def initialize(content)
@io = StringIO.new(content)
end
def read(length, outbuf)
chunk = @io.read(length)
if chunk
outbuf.clear
outbuf << chunk
chunk.clear
else
outbuf.clear
end
outbuf unless outbuf.empty?
end
end
io = FakeIO.new("a" * 50*1024*1024) # 50MB
IO.copy_stream(io, File::NULL)
system "top -pid #{Process.pid}"
~~~
This program outputs memory usage of 50MB at the end, as expected – 50MB was loaded into memory at the beginning and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:
~~~ ruby
def read(length, outbuf)
chunk = @io.read(length)
if chunk
outbuf.replace chunk
chunk.clear
else
outbuf.clear
end
outbuf unless outbuf.empty?
end
~~~
the memory usage has now doubled to 100MB at the end of the program, indicating that some string bytes weren't successfully deallocated. So, it seems that String#replace has different behaviour compared to String#clear + String#<<.
I was *only* able to reproduce this with `IO.copy_stream`, the following program shows 50MB memory usage, regardless of whether the String#clear or String#replace approach is used:
~~~ ruby
GC.disable
buffer = "a" * 50*1024*1024
chunk = "b" * 50*1024*1024
if ARGV[0] == "clear"
buffer.clear
buffer << chunk
else
buffer.replace chunk
end
chunk.clear
system "top -pid #{Process.pid}"
~~~
With this program I also noticed one interesting thing. If I remove `chunk.clear`, then the "clear" version uses 100MB as expected (because both buffer and chunk strings are 50MB large), but the "replace" version uses only 50MB, which makes it appear that the `buffer` string doesn't use any memory when in fact it should use 50MB just like the `chunk` string. I found that odd, and I think it might be a clue to the memory bug with String#replace I experienced when using `IO.copy_stream`.
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 8+ messages in thread
* [ruby-core:87077] Re: [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
2018-05-16 9:33 ` [ruby-core:87075] " janko.marohnic
@ 2018-05-16 10:07 ` Eric Wong
0 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2018-05-16 10:07 UTC (permalink / raw)
To: ruby-core
janko.marohnic@gmail.com wrote:
> I wrote:
> > Finally, I always assumed your example is a contrived case and
> > you're dealing with an interface somewhere (not StringIO) which
> > doesn't accept a destination buffer for .read.
> The example was simplified for reproducing purposes. The place
> where I discovered this was in
> https://github.com/rocketjob/symmetric-encryption/pull/98 (I
> eventually managed to figure out `String#replace` was causing
> the high memory usage, so I switched to `String#clear`).
>
> In short, the `SymmetricEncryption::Reader` object wraps an IO
> object with encrypted content, and when calling `#read` it
> reads data from the underlying IO object, decrypts it and
> returns the decrypted data. So, it's not patching the lack of
> outbuf argument (because the underlying IO object *should*
> accept the outbuf argument), rather it provides an `IO#read`
> interface over incrementally decrypting IO object content.
I may be misreading(*), but it looks like @stream_cipher.update
can take a second destination arg (like IO#read and friends) and
maybe that helps... (that appears to be OpenSSL::Cipher#update)
If that's not usable somehow, I've sometimes wondered if adding a
String#exchange! method might help:
foo = "hello............................"
bar = "world............................"
foo.exchange!(bar)
p foo # => "world............................"
p bar # => "hello............................"
In your case, you could still use String#clear with this:
dst.clear
dst.exchange!(tmp)
# tmp is now '' because dst was cleared before
For non-embed strings, it would swap the pointers and be zero-copy,
but won't suffer the new-frozen-string-behind-your-back problem
of String#replace.
(*) I'm looking at your commit 8d41efacff26f3357016d6b611bee174802fba66
after git clone-ing (since I don't do JS-infested web UIs)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [ruby-core:87136] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
[not found] <redmine.issue-14745.20180509125009@ruby-lang.org>
2018-05-09 12:50 ` [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream janko.marohnic
2018-05-16 9:33 ` [ruby-core:87075] " janko.marohnic
@ 2018-05-17 12:28 ` janko.marohnic
2018-05-18 18:52 ` [ruby-core:87176] " Eric Wong
2 siblings, 1 reply; 8+ messages in thread
From: janko.marohnic @ 2018-05-17 12:28 UTC (permalink / raw)
To: ruby-core
Issue #14745 has been updated by janko (Janko Marohnić).
> it looks like @stream_cipher.update can take a second destination arg (like IO#read and friends) and maybe that helps... (that appears to be OpenSSL::Cipher#update)
Wow, I didn't know that. Thanks, this allowed me to greatly simplify the `#read` implementation, so now it's both simpler and has lower memory usage than the original implementation.
> I've sometimes wondered if adding a String#exchange! method might help
That sounds very useful to have in general. In my particular scenario (the PR) I wouldn't have use of it after all, this `String#clear` vs `String#replace` thing was just something I encountered during the optimization, the current version doesn't need `String#replace`. But I can imagine `String#exchange!` being a useful tool for diligent string management.
I'll close this ticket, as this seems to be a necessary consequence of the copy-on-write optimization, and I'm not having problems with it anymore.
----------------------------------------
Bug #14745: High memory usage when using String#replace with IO.copy_stream
https://bugs.ruby-lang.org/issues/14745#change-72123
* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I'm using custom IO-like objects that implement #read as the first argument to IO.copy_stream, and I noticed odd memory behaviour when using String#replace on the output buffer versus String#clear. Here is an example of a "fake IO" object where #read uses String#clear on the output buffer:
~~~ ruby
GC.disable
require "stringio"
class FakeIO
def initialize(content)
@io = StringIO.new(content)
end
def read(length, outbuf)
chunk = @io.read(length)
if chunk
outbuf.clear
outbuf << chunk
chunk.clear
else
outbuf.clear
end
outbuf unless outbuf.empty?
end
end
io = FakeIO.new("a" * 50*1024*1024) # 50MB
IO.copy_stream(io, File::NULL)
system "top -pid #{Process.pid}"
~~~
This program outputs memory usage of 50MB at the end, as expected – 50MB was loaded into memory at the beginning and any new strings are deallocated. However, if I modify the #read implementation to use String#replace instead of String#clear:
~~~ ruby
def read(length, outbuf)
chunk = @io.read(length)
if chunk
outbuf.replace chunk
chunk.clear
else
outbuf.clear
end
outbuf unless outbuf.empty?
end
~~~
the memory usage has now doubled to 100MB at the end of the program, indicating that some string bytes weren't successfully deallocated. So, it seems that String#replace has different behaviour compared to String#clear + String#<<.
I was *only* able to reproduce this with `IO.copy_stream`, the following program shows 50MB memory usage, regardless of whether the String#clear or String#replace approach is used:
~~~ ruby
GC.disable
buffer = "a" * 50*1024*1024
chunk = "b" * 50*1024*1024
if ARGV[0] == "clear"
buffer.clear
buffer << chunk
else
buffer.replace chunk
end
chunk.clear
system "top -pid #{Process.pid}"
~~~
With this program I also noticed one interesting thing. If I remove `chunk.clear`, then the "clear" version uses 100MB as expected (because both buffer and chunk strings are 50MB large), but the "replace" version uses only 50MB, which makes it appear that the `buffer` string doesn't use any memory when in fact it should use 50MB just like the `chunk` string. I found that odd, and I think it might be a clue to the memory bug with String#replace I experienced when using `IO.copy_stream`.
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 8+ messages in thread
* [ruby-core:87176] Re: [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream
2018-05-17 12:28 ` [ruby-core:87136] " janko.marohnic
@ 2018-05-18 18:52 ` Eric Wong
0 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2018-05-18 18:52 UTC (permalink / raw)
To: ruby-core
Good to know the destination buffer to #update helped :>
> But I can imagine `String#exchange!` being a useful tool for
> diligent string management.
Maybe... As it stands, destination buffers for #read and #update
are underutilized and I'm glad to see them being used more.
Thus I'm not hopeful of #exchange! being widely adopted...
When used, destination buffers would still offer better performance,
anyways, as there's no extra object slot used.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-05-18 18:52 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <redmine.issue-14745.20180509125009@ruby-lang.org>
2018-05-09 12:50 ` [ruby-core:86954] [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream janko.marohnic
2018-05-11 3:36 ` [ruby-core:86983] " Eric Wong
2018-05-11 4:26 ` [ruby-core:86987] " Eric Wong
2018-05-11 5:43 ` [ruby-core:86988] " Eric Wong
2018-05-16 9:33 ` [ruby-core:87075] " janko.marohnic
2018-05-16 10:07 ` [ruby-core:87077] " Eric Wong
2018-05-17 12:28 ` [ruby-core:87136] " janko.marohnic
2018-05-18 18:52 ` [ruby-core:87176] " Eric Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).