From: Eric Wong
To: ruby-core@ruby-lang.org
Date: Fri, 11 May 2018 03:36:41 +0000
Message-ID: <20180511033641.GA4459@dcvr>
Subject: [ruby-core:86983] Re: [Ruby trunk Bug#14745] High memory usage when using String#replace with IO.copy_stream

janko.marohnic@gmail.com wrote:
> the memory usage has now doubled to 100MB at the end of the
> program, indicating that some string bytes weren't
> successfully deallocated. So, it seems that String#replace has
> different behaviour compared to String#clear + String#<<.

Yes, this is an unfortunate side effect of the copy-on-write
semantics of String#replace. If the arg (other_str) for
String#replace is non-frozen, a new frozen string is created
using the existing malloc-ed pointer. Both the receiver string
and other_str point to that new, shared string.

> I was *only* able to reproduce this with `IO.copy_stream`

I suspect part of this is because outbuf becomes a long-lived
object with IO.copy_stream (not sure), and the hidden frozen
string becomes long-lived, as well. So yeah; a combination of
well-intentioned optimizations that hurt when combined.

The other part could be that anything using IO#write could
create massive amounts of garbage before 2.5:

  https://bugs.ruby-lang.org/issues/13085

And your copy_stream use hits the IO#write case.

Unfortunately, the "fix" for [Bug #13085] won't work effectively
with shared strings, because we can't free+recycle a string
which wasn't created internally by the VM.
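
To make the two approaches concrete, here is a minimal sketch of
how a reader might fill a caller-supplied outbuf. The method names
(fill_with_replace, fill_with_clear_append) are hypothetical and
not taken from the bug report, and the comments about sharing
assume CRuby behaves as described above. It only demonstrates that
both approaches are equivalent from the caller's point of view; it
does not measure memory.

```ruby
# Hypothetical helpers: two ways to hand a freshly read chunk back
# to the caller through a caller-supplied buffer (outbuf).

# String#replace: in CRuby, if chunk is not frozen, outbuf and chunk
# may end up pointing at a hidden, frozen shared string (copy-on-write).
def fill_with_replace(outbuf, chunk)
  outbuf.replace(chunk)
end

# String#clear + String#<<: the bytes are copied into outbuf's own
# buffer, so nothing is shared with chunk.
def fill_with_clear_append(outbuf, chunk)
  outbuf.clear
  outbuf << chunk
end

chunk_a = "a" * (1024 * 1024)
chunk_b = "a" * (1024 * 1024)
buf_a = +""
buf_b = +""

fill_with_replace(buf_a, chunk_a)
fill_with_clear_append(buf_b, chunk_b)

# Both buffers hold the same bytes.
puts buf_a == buf_b

# Copy-on-write keeps the strings logically independent: appending
# to the source chunk does not change the buffer filled by replace.
chunk_a << "!"
puts buf_a.bytesize
puts chunk_a.bytesize
```

If the explanation above is right, the clear + << variant trades an
extra copy per read for not leaving a hidden shared string behind.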