ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: dylan.smith@shopify.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:95124] [Ruby master Feature#16119] Optimize Array#flatten and flatten! for already flattened arrays
Date: Fri, 27 Sep 2019 03:42:47 +0000 (UTC)	[thread overview]
Message-ID: <redmine.journal-81758.20190927034246.cc86d70cd7424c21@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-16119.20190822235825@ruby-lang.org

Issue #16119 has been updated by dylants (Dylan Thacker-Smith).


It looks like I made a mistake in my benchmarking of non-flattened arrays, since flatten! from the first iteration would cause further iterations to actually test with a flat array.  It looks like the performance for arrays starting with a nested array are about the same with and without my patch.

The attached patch has a merge conflict.  I've opened https://github.com/ruby/ruby/pull/2495 with a rebase of my patch that resolves the merge conflict.  That PR also has an updated benchmark that uses the benchmark driver yaml format and includes the results from running that benchmark.

nobu (Nobuyoshi Nakada) wrote:
> But I could get only insignificant result, and a worse result for an unflattened array.

It looks like nobu's benchmark/array_flatten.yml has the same problem with the non-flat arrays being mutated and affecting following iterations.  That explains the improvement I am seeing locally with your benchmark, although I don't know why my results differ from what you posted

```
Comparison:
                         flatten!
          built-ruby:    246214.2 i/s
        compare-ruby:    193998.4 i/s - 1.27x  slower

                          flatten
          built-ruby:    214275.0 i/s
        compare-ruby:    188818.6 i/s - 1.13x  slower

                         flatten2
          built-ruby:    159840.8 i/s
        compare-ruby:    159546.4 i/s - 1.00x  slower
```

where only the last one actually benchmarks an unflattened array properly.

methodmissing (Lourens Naudé) wrote:
> Attached is a patch for early return and preventing additional allocs for the empty array case (however not very sure how often that happens in practice):

There is no need to special case the empty array case.  My patch would just treat it as a flat array, so would still avoid extra allocations and return early.

----------------------------------------
Feature #16119: Optimize Array#flatten and flatten! for already flattened arrays
https://bugs.ruby-lang.org/issues/16119#change-81758

* Author: dylants (Dylan Thacker-Smith)
* Status: Assigned
* Priority: Normal
* Assignee: nobu (Nobuyoshi Nakada)
* Target version: 
----------------------------------------
## Problem

When doing an object profile from stackprof, I noticed object allocations being made from `Array#flatten!` which was unlike other in-place methods like `Array#uniq!` and `Array#compact!`.  In this case, I wanted to optimize for the array already being flattened and found that `ary.flatten! if ary.any?(Array)` was significantly faster.  The code confirmed my suspicion that `ary.flatten!` was almost equivalent to `ary.replace(ary.flatten)` in implementation. The object allocations I noticed were from a temporary result array, a stack array to handle nesting and a hash table to prevent infinite recursion, all of which can be avoided in the simple case of an already flattened array.

## Solution

Iterate over the array to find the first nested array.  If no nested array is found, then `flatten!` just returns `nil` without creating additional objects and `flatten` returns a shared copy of the original array.  If a nested array is found, then it creates and initializes the temporary objects to resume with the existing algorithm for flattening the array.

## Benchmark

```ruby
require 'benchmark'

N = 100000

def report(x, name)
  x.report(name) do
    N.times do
      yield
    end
  end
end

arrays = {
  small_flat_ary: 5.times.to_a,
  large_flat_ary: 100.times.to_a,
  small_pairs_ary: [[1, 2]] * 5,
  large_pairs_ary: [[1, 2]] * 100,
}

Benchmark.bmbm do |x|
  arrays.each do |name, ary|
    report(x, "#{name}.flatten!") do
      ary.flatten!
    end
    report(x, "#{name}.flatten") do
      ary.flatten
    end
  end
end
```

results on the latest master (`ruby 2.7.0dev (2019-08-22T14:10:55Z master fd20b32130) [x86_64-darwin18]`)

```
                               user     system      total        real
small_flat_ary.flatten!    0.082001   0.000294   0.082295 (  0.082685)
small_flat_ary.flatten     0.078655   0.000211   0.078866 (  0.079176)
large_flat_ary.flatten!    0.552551   0.001643   0.554194 (  0.556166)
large_flat_ary.flatten     0.551520   0.001327   0.552847 (  0.554091)
small_pairs_ary.flatten!   0.100073   0.000302   0.100375 (  0.100687)
small_pairs_ary.flatten    0.109440   0.000232   0.109672 (  0.109850)
large_pairs_ary.flatten!   1.021200   0.001650   1.022850 (  1.024227)
large_pairs_ary.flatten    1.049046   0.002938   1.051984 (  1.055228)
```

results with the attached patch

```
                               user     system      total        real
small_flat_ary.flatten!    0.034868   0.000150   0.035018 (  0.035180)
small_flat_ary.flatten     0.040504   0.000148   0.040652 (  0.040914)
large_flat_ary.flatten!    0.458273   0.000786   0.459059 (  0.460005)
large_flat_ary.flatten     0.453846   0.000833   0.454679 (  0.455324)
small_pairs_ary.flatten!   0.055211   0.000205   0.055416 (  0.055673)
small_pairs_ary.flatten    0.060157   0.000226   0.060383 (  0.060540)
large_pairs_ary.flatten!   0.901698   0.001650   0.903348 (  0.905246)
large_pairs_ary.flatten    0.917180   0.001370   0.918550 (  0.920008)
```

---Files--------------------------------
0001-Optimize-Array-flatten-and-flatten-for-already-fl.diff.txt (2.79 KB)
0001-Let-empty-arrays-being-flattened-not-alloc-additiona.patch (791 Bytes)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

      parent reply	other threads:[~2019-09-27  3:42 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <redmine.issue-16119.20190822235825@ruby-lang.org>
2019-08-22 23:58 ` [ruby-core:94487] [Ruby master Feature#16119] Optimize Array#flatten and flatten! for already flattened arrays dylan.smith
2019-08-23  2:02 ` [ruby-core:94492] " duerst
2019-08-23  6:31 ` [ruby-core:94497] " shevegen
2019-08-23  7:41 ` [ruby-core:94498] " hanmac
2019-08-23 14:22   ` [ruby-core:94501] " Austin Ziegler
2019-08-23 15:32 ` [ruby-core:94504] " dylan.smith
2019-08-23 15:38 ` [ruby-core:94506] " dylan.smith
2019-09-19  8:31 ` [ruby-core:94984] " mame
2019-09-20  0:19 ` [ruby-core:94994] " nobu
2019-09-22 18:29 ` [ruby-core:95037] " lourens
2019-09-27  3:42 ` dylan.smith [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=redmine.journal-81758.20190927034246.cc86d70cd7424c21@ruby-lang.org \
    --to=ruby-core@ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).