ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: Eric Mahurin <eric_mahurin@yahoo.com>
To: ruby-core@ruby-lang.org
Subject: Re: another array patch - performance boosts all over the place
Date: Fri, 23 Sep 2005 00:33:20 +0900	[thread overview]
Message-ID: <20050922153302.80007.qmail@web36110.mail.mud.yahoo.com> (raw)
In-Reply-To: <TYOMLEM04FRaqbC8wSA000000fe@tyomlvem02.e2k.ad.ge.com>

--- nobuyoshi nakada <nobuyoshi.nakada@ge.com> wrote:
> Eric Mahurin wrote in [ruby-core:05861]:
> > - instead of shared arrays all pointing to a common frozen
> > array object, put shared arrays in a circular linked list
> (used
> > ary->shared).  The original master array will have
> > ELTS_SHARED=0 and the others will have ELTS_SHARED=1. 
> Unshared
> > arrays will have ELTS_SHARED=0 and ary->shared=0.
> 
> What kind of merit and necessity, compared with the current
> code?  It seems to be complicated.

The changes regarding element sharing were the largest, but I
believe they will be be most beneficial.  This was the best
solution I came up with.  Here are the issues with the
old/current way of element sharing that are solved by my new
way:

1. After a slice is taken from a large array and then that
array needs to be modified, the large array is copied to a new
area so that the shared slice is not disturbed.  This causes a
extra copy (O(n)) of this large array compared to not doing
element sharing (coping the small slice to begin with).  With
my solution, I copy all of the shared slices instead of the
master array when the master array needs to be modified.  Here
is a benchmark that demonstrates the problem (FIFO taking 2
elements in/out at a time):

ruby -e 'n=2**16; a=(0...n).to_a; n.times{a.push(*a.shift(2))};
puts(Process.times)'

old - O(n**2) time:

#<struct Struct::Tms utime=17.19, stime=20.14, cutime=0.0,
cstime=0.0>

new - O(n) time:

#<struct Struct::Tms utime=0.1, stime=0.0, cutime=0.0,
cstime=0.0>


2. If you take a small slice of a large array, modify the large
array, and keep the original, the small slice will maintain a
link to the original large array and keep all that space. 
You'll be maintaining about double the memory that you would
without element sharing.  Because my patch copies the slices
instead of the large array, this problem doesn't exist.  Here
is an example (works in linux using the /proc filesystem to get
memory usage):

ruby -e 'n=2**13; a=(0...n).to_a; n.times{a.push(a.pop(2))};
IO.readlines("/proc/#{Process.pid}/status").grep(/VmSize/).display;
puts(Process.times)'

old - O(n**2) memory and time:

VmSize:   199864 kB
#<struct Struct::Tms utime=4.02, stime=0.18, cutime=0.0,
cstime=0.0>

new - O(n) memory and time:

VmSize:     3360 kB
#<struct Struct::Tms utime=0.01, stime=0.0, cutime=0.0,
cstime=0.0>


3. If a slice is taken and held, references to all of the
elements of the original array are kept even after the original
array is not needed.  With the method I used, when the original
is GC'd, the slice is copied and all of the other elements of
the original array can be GC'd.  Here is an example that shows
the issue:

ruby -e 'n=2**13; a=[];
n.times{|i|a.push([i,(0..i).to_a][0,1])};
IO.readlines("/proc/#{Process.pid}/status").grep(/VmSize/).display;
puts(Process.times)'

old - O(n**2) memory, O(n**2) time:

VmSize:   166644 kB
#<struct Struct::Tms utime=14.9, stime=0.25, cutime=0.0,
cstime=0.0>

new - O(n) memory, O(n**2) time:

VmSize:     7252 kB
#<struct Struct::Tms utime=9.47, stime=0.03, cutime=0.0,
cstime=0.0>

I tried all of the above tests in perl and it has the
performance/memory characteristics of my patch.

If we are to have element sharing in ruby, I think the
delay/memory penalty should be no worse than the same order of
magnitude than if we didn't have it.  This is definitely not
the case currently.

This is only one part of my patch - element sharing.  The other
major part is doing all insertions/deletions toward the closest
end of the array (instead of always toward the right end). 
Looking at the performance of perl's unshift, they don't have
this capability.  Maybe this is the first time someone has done
something like this.

Eric



		
__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com

  reply	other threads:[~2005-09-22 15:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-21 16:56 another array patch - performance boosts all over the place Eric Mahurin
2005-09-21 17:36 ` Yukihiro Matsumoto
2005-09-22  4:19 ` nobuyoshi nakada
2005-09-22 15:33   ` Eric Mahurin [this message]
  -- strict thread matches above, loose matches on Subject: below --
2005-09-21 17:16 Berger, Daniel
2005-09-21 19:50 ` Eric Mahurin
2005-10-05 15:50 Eric Mahurin
2005-10-06 14:39 ` Matt Mower
2005-10-06 19:20   ` Eric Mahurin
2005-10-07  1:49 ` Gyoung-Yoon Noh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050922153302.80007.qmail@web36110.mail.mud.yahoo.com \
    --to=ruby-core@ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).