another array patch - performance boosts all over the place

ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed

* another array patch - performance boosts all over the place
@ 2005-09-21 16:56 Eric Mahurin
  2005-09-21 17:36 ` Yukihiro Matsumoto
  2005-09-22  4:19 ` nobuyoshi nakada
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Mahurin @ 2005-09-21 16:56 UTC (permalink / raw
  To: ruby-core

[-- Attachment #1: Type: text/plain, Size: 4416 bytes --]

Is anybody interested in the patches I'm doing to make arrays
(and someday strings) faster?  I didn't see a response for my
last patch (shift/unshift).  Now I have another patch
(fastarray.diff - attached) that should help out a lot more
code sequences.  Here are the results using the attached
benchmark (end_test.rb):

n = 32768 (2**15)

code                        old   new
------------------------- ----- -----
;                          0.01  0.01
shift                      0.02  0.02
shift(1)                   0.03  0.04
pop                        0.02  0.01
pop(1)                     0.04  0.05
shift;pop                  0.02  0.02
shift(1);pop(1)            0.07  0.08
unshift(i)                 2.34  0.02
push(i)                    0.02  0.02
unshift(i);push(i)         2.35  0.03
unshift(shift)            17.61  0.03
unshift(*shift(1))        17.42  0.07
push(pop)                  0.04  0.02
push(*pop(1))             13.83  0.08
push(shift)               14.64  0.03
push(*shift(1))           15.39  0.08
unshift(pop)               2.33  0.03
unshift(*pop(1))          16.47  0.08
slice!(1)                  2.29  0.03
delete_at(1)               2.30  0.02
slice!(1,1)               12.75  0.05
self[1,1]=[]               2.36  0.05
slice!(-2)                 0.02  0.03
delete_at(-2)              0.02  0.02
slice!(-2,1)              10.37  0.05
self[-2,1]=[]              0.04  0.04
self[1,0]=[i]              3.49  0.04
insert(1,i)                3.44  0.04
self[-1,0]=[i]             1.07  0.05
insert(-2,i)               1.02  0.05
self[1,0]=slice!(1,1)     16.40  0.06
insert(1,delete_at(1))     5.60  0.06
self[-1,0]=slice!(-2,1)   11.92  0.06
insert(-2,delete_at(-2))   1.00  0.06
self[-1,0]=slice!(1,1)    14.10  0.07
insert(-2,delete_at(1))    3.29  0.06
self[1,0]=slice!(-2,1)    14.06  0.07
insert(1,delete_at(-2))    3.30  0.06
self[i]=self[i]            0.03  0.02
self[i]=at(i)              0.03  0.02
self[i,1]=self[i,1]       11.72  0.06
self[-i-1]=self[-i-1]      0.05  0.05
self[-i-1]=at(-i-1)        0.04  0.05
self[-i-1,1]=self[-i-1,1] 11.74  0.07
self[-i-1]=self[i]         0.03  0.03
self[-i-1]=at(i)           0.03  0.04
self[-i-1,1]=self[i,1]    11.67  0.07
self[i]=self[-i-1]         0.04  0.04
self[i]=at(-i-1)           0.03  0.03
self[i,1]=self[-i-1,1]    11.70  0.06

In each of the tests above, it runs the code n times on an
array that has an average length of n.  The massive performance
boosts you see above is because the old (current) code was O(n)
in those cases and my patched code is O(1).

Here were the changes I made:

- added a bit (ELTS_LFREE=FL_USER3) to say whether there is
free space to the left of ary->ptr and put the number of free
elements in ary->ptr[-1].

- instead of ary->aux.capa, did the same for free space to the
right of ary->ptr+ary->len-1 (new flag: ELTS_RFREE=FL_USER3 and
amount stored in ary->ptr[ary->len]).  This freed up
ary->shared for dedicated used (instead of being a union with
capa).

- instead of shared arrays all pointing to a common frozen
array object, put shared arrays in a circular linked list (used
ary->shared).  The original master array will have
ELTS_SHARED=0 and the others will have ELTS_SHARED=1.  Unshared
arrays will have ELTS_SHARED=0 and ary->shared=0.

- when a shared array is to be modified, there are 2 cases:
slave - make copy and remove from shared circular linked list;
master - make copies of all slaves and terminate sharing
between this master and its slaves.  When a shared array is to
be freed, it operates in a similar way.

- during GC, shared arrays are treated independently (master
can be freed).  This works because of the above mechanism. 
Before, unused object references would persist because of array
sharing (keeping an array slice around would keep references to
all objects in the original array).

- modified various functions (shift, unshift, delete_at,
splice), to insert/delete toward the closest side to minimize
the number of elements needed to be moved.

Does anybody see a downside to this?  The main problem I see is
that I don't see the big picture when making these changes
(i.e. GC).  And this needs to be better tested.  Anybody know
of something good to do array testing?  I just used the
self-testing benchmark attached and what's in test/ruby.

Eric



		
__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 1280032582-fastarray.diff --]
[-- Type: text/x-patch; name="fastarray.diff", Size: 25939 bytes --]

Index: array.c
===================================================================
RCS file: /src/ruby/array.c,v
retrieving revision 1.179
diff -u -r1.179 array.c
--- array.c	12 Sep 2005 15:23:54 -0000	1.179
+++ array.c	21 Sep 2005 16:13:02 -0000
@@ -16,6 +16,7 @@
 #include "util.h"
 #include "st.h"
 #include "node.h"
+#include "rubysig.h"
 
 VALUE rb_cArray, rb_cValues;
 
@@ -52,20 +53,63 @@
 }
 
 static void
-rb_ary_modify(VALUE ary)
+rb_ary_unshare(VALUE ary, int freeit)
 {
+    VALUE shared = RARRAY(ary)->shared;
+    VALUE next = shared;
+    VALUE cur;
     VALUE *ptr;
-
-    rb_ary_modify_check(ary);
+    VALUE *freeptr = 0;
     if (FL_TEST(ary, ELTS_SHARED)) {
-	ptr = ALLOC_N(VALUE, RARRAY(ary)->len);
+        if (freeit) {
+            RARRAY(ary)->ptr = 0;
+        } else {
+            ptr = ALLOC_N(VALUE, RARRAY(ary)->len);
+            MEMCPY(ptr, RARRAY(ary)->ptr, VALUE, RARRAY(ary)->len);
+            RARRAY(ary)->ptr = ptr;
+        }
 	FL_UNSET(ary, ELTS_SHARED);
-	RARRAY(ary)->aux.capa = RARRAY(ary)->len;
-	MEMCPY(ptr, RARRAY(ary)->ptr, VALUE, RARRAY(ary)->len);
-	RARRAY(ary)->ptr = ptr;
+        RARRAY(ary)->shared = 0;
+        do {
+            cur = next;
+            next = RARRAY(cur)->shared;
+        } while (next!=ary);
+        if (cur==shared) {
+            RARRAY(cur)->shared = 0;
+        } else {
+            RARRAY(cur)->shared = shared;
+        }
+    } else if (RARRAY(ary)->ptr) {
+        if (freeit) {
+            long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
+            freeptr = RARRAY(ary)->ptr-lfree;
+            FL_UNSET(ary, ELTS_LFREE);
+            FL_UNSET(ary, ELTS_RFREE);
+            RARRAY(ary)->ptr = 0;
+        }
+        if (shared) for (;;) {
+            RARRAY(ary)->shared = 0;
+            ary = shared;
+            shared = RARRAY(ary)->shared;
+            if (!shared) break;
+            ptr = ALLOC_N(VALUE, RARRAY(ary)->len);
+	    MEMCPY(ptr, RARRAY(ary)->ptr, VALUE, RARRAY(ary)->len);
+	    FL_UNSET(ary, ELTS_SHARED);
+	    RARRAY(ary)->ptr = ptr;
+        }
+        if (freeptr) RUBY_CRITICAL(free(freeptr));
     }
 }
 
+static void
+rb_ary_modify(VALUE ary)
+{
+    VALUE *ptr;
+
+    rb_ary_modify_check(ary);
+    rb_ary_unshare(ary,0);
+}
+
 VALUE
 rb_ary_freeze(VALUE ary)
 {
@@ -96,7 +140,7 @@
 
     ary->len = 0;
     ary->ptr = 0;
-    ary->aux.capa = 0;
+    ary->shared = 0;
 
     return (VALUE)ary;
 }
@@ -116,7 +160,6 @@
     
     ary = ary_alloc(klass);
     RARRAY(ary)->ptr = ALLOC_N(VALUE, len);
-    RARRAY(ary)->aux.capa = len;
 
     return ary;
 }
@@ -146,7 +189,7 @@
     ary = rb_ary_new2(n);
 
     va_start(ar, n);
-    for (i=0; i<n; i++) {
+    for (i=0; i<n; i++) {               
 	RARRAY(ary)->ptr[i] = va_arg(ar, VALUE);
     }
     va_end(ar);
@@ -196,44 +239,62 @@
     if (n > 0 && elts) {
 	RARRAY(val)->len = n;
 	MEMCPY(RARRAY(val)->ptr, elts, VALUE, n);
+    } else if (n > 0) {
+        RARRAY(val)->ptr[0] = n;
+        FL_SET(val,ELTS_RFREE);
     }
 
     return val;
 }
 
 static VALUE
-ary_make_shared(VALUE ary)
+ary_share(VALUE val, VALUE ary)
 {
+    /*
     if (!FL_TEST(ary, ELTS_SHARED)) {
 	NEWOBJ(shared, struct RArray);
 	OBJSETUP(shared, rb_cArray, T_ARRAY);
 
 	shared->len = RARRAY(ary)->len;
 	shared->ptr = RARRAY(ary)->ptr;
-	shared->aux.capa = RARRAY(ary)->aux.capa;
-	RARRAY(ary)->aux.shared = (VALUE)shared;
+        if (FL_TEST(ary,ELTS_RFREE)) {
+            FL_SET(shared,ELTS_RFREE);
+            FL_UNSET(ary,ELTS_RFREE);
+        }
+        if (FL_TEST(ary,ELTS_LFREE)) {
+            FL_SET(shared,ELTS_LFREE);
+            FL_UNSET(ary,ELTS_LFREE);
+        }
+	RARRAY(ary)->shared = (VALUE)shared;
 	FL_SET(ary, ELTS_SHARED);
 	OBJ_FREEZE(shared);
-	return (VALUE)shared;
     }
-    else {
-	return RARRAY(ary)->aux.shared;
+    */
+    RARRAY(val)->ptr = RARRAY(ary)->ptr;
+    RARRAY(val)->len = RARRAY(ary)->len;
+    if (RARRAY(ary)->shared) {
+        RARRAY(val)->shared = RARRAY(ary)->shared;
+    } else {
+        RARRAY(val)->shared = ary;
     }
+    RARRAY(ary)->shared = val;
+    FL_SET(val, ELTS_SHARED);
+    return val;
 }
 
 static VALUE
 ary_shared_array(VALUE klass, VALUE ary)
 {
-    VALUE val = ary_alloc(klass);
+    return ary_share(ary_alloc(klass),ary);
+}
 
-    ary_make_shared(ary);
-    RARRAY(val)->ptr = RARRAY(ary)->ptr;
-    RARRAY(val)->len = RARRAY(ary)->len;
-    RARRAY(val)->aux.shared = RARRAY(ary)->aux.shared;
-    FL_SET(val, ELTS_SHARED);
-    return val;
+void
+ary_free(VALUE obj)
+{
+    rb_ary_unshare(obj,1);
 }
 
+
 VALUE
 rb_values_from_ary(VALUE ary)
 {
@@ -315,8 +376,17 @@
 {
     long len;
     VALUE size, val;
+    long capa;
 
     if (rb_scan_args(argc, argv, "02", &size, &val) == 0) {
+        if (!FL_TEST(ary,ELTS_SHARED)) {
+            rb_ary_modify(ary);
+            capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
+            if (capa) {
+                FL_SET(ary,ELTS_RFREE);
+                RARRAY(ary)->ptr[0] = capa;
+            }
+        }
 	RARRAY(ary)->len = 0;
 	if (rb_block_given_p()) {
 	    rb_warning("given block not used");
@@ -340,9 +410,13 @@
 	rb_raise(rb_eArgError, "array size too big");
     }
     rb_ary_modify(ary);
-    if (len > RARRAY(ary)->aux.capa) {
-	REALLOC_N(RARRAY(ary)->ptr, VALUE, len);
-	RARRAY(ary)->aux.capa = len;
+    capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
+    if (len > capa) {
+        long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
+        RARRAY(ary)->ptr -= lfree;
+	REALLOC_N(RARRAY(ary)->ptr, VALUE, len + lfree);
+        RARRAY(ary)->ptr += lfree;
+	capa = len;
     }
     if (rb_block_given_p()) {
 	long i;
@@ -359,6 +433,12 @@
 	memfill(RARRAY(ary)->ptr, len, val);
 	RARRAY(ary)->len = len;
     }
+    if (capa > len) {
+        FL_SET(ary,ELTS_RFREE);
+        RARRAY(ary)->ptr[len] = capa-len;
+    } else {
+        FL_UNSET(ary,ELTS_RFREE);
+    }
 
     return ary;
 }
@@ -381,7 +461,7 @@
 	RARRAY(ary)->ptr = ALLOC_N(VALUE, argc);
 	MEMCPY(RARRAY(ary)->ptr, argv, VALUE, argc);
     }
-    RARRAY(ary)->len = RARRAY(ary)->aux.capa = argc;
+    RARRAY(ary)->len = argc;
 
     return ary;
 }
@@ -389,6 +469,7 @@
 void
 rb_ary_store(VALUE ary, long idx, VALUE val)
 {
+    long capa;
     if (idx < 0) {
 	idx += RARRAY(ary)->len;
 	if (idx < 0) {
@@ -398,8 +479,10 @@
     }
 
     rb_ary_modify(ary);
-    if (idx >= RARRAY(ary)->aux.capa) {
-	long new_capa = RARRAY(ary)->aux.capa / 2;
+    capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
+    if (idx >= capa) {
+	long new_capa = capa / 2;
+        long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
 
 	if (new_capa < ARY_DEFAULT_SIZE) {
 	    new_capa = ARY_DEFAULT_SIZE;
@@ -408,8 +491,10 @@
 	if (new_capa * (long)sizeof(VALUE) <= new_capa) {
 	    rb_raise(rb_eArgError, "index too big");
 	}
-	REALLOC_N(RARRAY(ary)->ptr, VALUE, new_capa);
-	RARRAY(ary)->aux.capa = new_capa;
+        RARRAY(ary)->ptr -= lfree;
+	REALLOC_N(RARRAY(ary)->ptr, VALUE, new_capa + lfree);
+        RARRAY(ary)->ptr += lfree;
+	capa = new_capa;
     }
     if (idx > RARRAY(ary)->len) {
 	rb_mem_clear(RARRAY(ary)->ptr + RARRAY(ary)->len,
@@ -417,7 +502,14 @@
     }
 
     if (idx >= RARRAY(ary)->len) {
-	RARRAY(ary)->len = idx + 1;
+	long len = idx + 1;
+        RARRAY(ary)->len = len;
+        if (capa > len) {
+            FL_SET(ary,ELTS_RFREE);
+            RARRAY(ary)->ptr[len] = capa-len;
+        } else {
+            FL_UNSET(ary,ELTS_RFREE);
+        }
     }
     RARRAY(ary)->ptr[idx] = val;
 }
@@ -495,15 +587,30 @@
 VALUE
 rb_ary_pop(VALUE ary)
 {
-    rb_ary_modify_check(ary);
+    long capa;
+    VALUE item;
+    if (!FL_TEST(ary, ELTS_SHARED)) {
+        rb_ary_modify(ary);
+    } else {
+        rb_ary_modify_check(ary);
+    }
     if (RARRAY(ary)->len == 0) return Qnil;
+    capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
     if (!FL_TEST(ary, ELTS_SHARED) &&
-	    RARRAY(ary)->len * 2 < RARRAY(ary)->aux.capa &&
-	    RARRAY(ary)->aux.capa > ARY_DEFAULT_SIZE) {
-	RARRAY(ary)->aux.capa = RARRAY(ary)->len * 2;
-	REALLOC_N(RARRAY(ary)->ptr, VALUE, RARRAY(ary)->aux.capa);
+	    RARRAY(ary)->len * 2 < capa &&
+	    capa > ARY_DEFAULT_SIZE) {
+        long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
+	capa = RARRAY(ary)->len * 2;
+        RARRAY(ary)->ptr -= lfree;
+	REALLOC_N(RARRAY(ary)->ptr, VALUE, capa + lfree);
+        RARRAY(ary)->ptr += lfree;
+    }
+    item = RARRAY(ary)->ptr[--RARRAY(ary)->len];
+    if (!FL_TEST(ary, ELTS_SHARED)) {
+        RARRAY(ary)->ptr[RARRAY(ary)->len] = capa-RARRAY(ary)->len;
+        FL_SET(ary,ELTS_RFREE);
     }
-    return RARRAY(ary)->ptr[--RARRAY(ary)->len];
+    return item;
 }
 
 /*
@@ -523,15 +630,20 @@
 rb_ary_pop_m(int argc, VALUE *argv, VALUE ary)
 {
     VALUE result;
+    long n;
 
     if (argc == 0) {
 	return rb_ary_pop(ary);
     }
 
-    rb_ary_modify_check(ary);
-
     result = ary_shared_last(argc, argv, ary);
-    RARRAY(ary)->len -= RARRAY(result)->len;
+    rb_ary_modify(ary);
+    n = RARRAY(result)->len;
+    if (n) {
+        long rfree = FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0; 
+        RARRAY(ary)->ptr[RARRAY(ary)->len -= n] = rfree+n;
+        FL_SET(ary,ELTS_RFREE);
+    }
     return result;
 }
 
@@ -540,10 +652,20 @@
 {
     VALUE top;
 
-    rb_ary_modify_check(ary);
     if (RARRAY(ary)->len == 0) return Qnil;
     top = RARRAY(ary)->ptr[0];
-    ary_make_shared(ary);
+    if (!FL_TEST(ary,ELTS_SHARED)) {
+        rb_ary_modify(ary);
+        if (FL_TEST(ary,ELTS_LFREE)) {
+            RARRAY(ary)->ptr[0] = RARRAY(ary)->ptr[-1]+1;
+        } else {
+            FL_SET(ary, ELTS_LFREE);
+            RARRAY(ary)->ptr[0] = 1;
+        }
+    } else {
+        rb_ary_modify_check(ary);
+    }
+        
     RARRAY(ary)->ptr++;		/* shift ptr */
     RARRAY(ary)->len--;
 
@@ -577,36 +699,17 @@
 	return rb_ary_shift(ary);
     }
 
-    rb_ary_modify_check(ary);
-
     result = ary_shared_first(argc, argv, ary);
-    n = RARRAY(result)->len;
-    RARRAY(ary)->ptr += n;
-    RARRAY(ary)->len -= n;
-
-    return result;
-}
-
-VALUE
-rb_ary_unshift(VALUE ary, VALUE item)
-{
     rb_ary_modify(ary);
-    if (RARRAY(ary)->len == RARRAY(ary)->aux.capa) {
-	long capa_inc = RARRAY(ary)->aux.capa / 2;
-	if (capa_inc < ARY_DEFAULT_SIZE) {
-	    capa_inc = ARY_DEFAULT_SIZE;
-	}
-	RARRAY(ary)->aux.capa += capa_inc;
-	REALLOC_N(RARRAY(ary)->ptr, VALUE, RARRAY(ary)->aux.capa);
+    n = RARRAY(result)->len;
+    if (n) {
+        long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0; 
+        (RARRAY(ary)->ptr += n)[-1] = lfree+n;
+        RARRAY(ary)->len -= n;
+        FL_SET(ary,ELTS_LFREE);
     }
 
-    /* sliding items */
-    MEMMOVE(RARRAY(ary)->ptr + 1, RARRAY(ary)->ptr, VALUE, RARRAY(ary)->len);
-
-    RARRAY(ary)->len++;
-    RARRAY(ary)->ptr[0] = item;
-
-    return ary;
+    return result;
 }
 
 /*
@@ -624,20 +727,50 @@
 static VALUE
 rb_ary_unshift_m(int argc, VALUE *argv, VALUE ary)
 {
-    long len = RARRAY(ary)->len;
-
-    if (argc == 0) return ary;
-
-    /* make rooms by setting the last item */
-    rb_ary_store(ary, len + argc - 1, Qnil);
-
-    /* sliding items */
-    MEMMOVE(RARRAY(ary)->ptr + argc, RARRAY(ary)->ptr, VALUE, len);
-    MEMCPY(RARRAY(ary)->ptr, argv, VALUE, argc);
+    long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
     
+    rb_ary_modify_check(ary);
+    if (lfree<argc) {
+        long len = RARRAY(ary)->len;
+        long rfree = (FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0)/2;
+	long free2 = argc+(RARRAY(ary)->len+argc)/2;
+        VALUE *ptr;
+	if (free2 < ARY_DEFAULT_SIZE) {
+	    free2 = ARY_DEFAULT_SIZE;
+	}
+        ptr = ALLOC_N(VALUE,len+free2)+(free2-rfree);
+        MEMMOVE(ptr, RARRAY(ary)->ptr, VALUE, len);
+        ary_free(ary);
+        RARRAY(ary)->ptr = ptr;
+        if (rfree) {
+            RARRAY(ary)->ptr[len] = rfree;
+            FL_SET(ary,ELTS_RFREE);
+        } else {
+            FL_UNSET(ary,ELTS_RFREE);
+        }
+        lfree = free2-rfree;
+        FL_SET(ary, ELTS_LFREE);
+    }
+
+    RARRAY(ary)->ptr -= argc;
+    RARRAY(ary)->len += argc;
+    lfree -= argc;
+    if (lfree) {
+        RARRAY(ary)->ptr[-1] = lfree;
+    } else {
+        FL_UNSET(ary, ELTS_LFREE);
+    }
+    MEMCPY(RARRAY(ary)->ptr, argv, VALUE, argc);
+
     return ary;
 }
 
+VALUE
+rb_ary_unshift(VALUE ary, VALUE item)
+{
+    return rb_ary_unshift_m(1,&item,ary);
+}
+
 /* faster version - use this if you don't need to treat negative offset */
 static inline VALUE
 rb_ary_elt(VALUE ary, long offset)
@@ -661,7 +794,7 @@
 static VALUE
 rb_ary_subseq(VALUE ary, long beg, long len)
 {
-    VALUE klass, ary2, shared;
+    VALUE klass, ary2;
     VALUE *ptr;
 
     if (beg > RARRAY(ary)->len) return Qnil;
@@ -675,13 +808,9 @@
     klass = rb_obj_class(ary);
     if (len == 0) return ary_new(klass, 0);
 
-    shared = ary_make_shared(ary);
-    ptr = RARRAY(ary)->ptr;
-    ary2 = ary_alloc(klass);
-    RARRAY(ary2)->ptr = ptr + beg;
+    ary2 = ary_shared_array(klass,ary);
+    RARRAY(ary2)->ptr += beg;
     RARRAY(ary2)->len = len;
-    RARRAY(ary2)->aux.shared = shared;
-    FL_SET(ary2, ELTS_SHARED);
 
     return ary2;
 }
@@ -968,6 +1097,11 @@
 rb_ary_splice(VALUE ary, long beg, long len, VALUE rpl)
 {
     long rlen;
+    long alen;
+    long beg0;
+    long diff;
+    VALUE *ptr;
+    long alen0 = RARRAY(ary)->len;
 
     if (len < 0) rb_raise(rb_eIndexError, "negative length (%ld)", len);
     if (beg < 0) {
@@ -977,9 +1111,6 @@
 	    rb_raise(rb_eIndexError, "index %ld out of array", beg);
 	}
     }
-    if (beg + len > RARRAY(ary)->len) {
-	len = RARRAY(ary)->len - beg;
-    }
 
     if (rpl == Qundef) {
 	rlen = 0;
@@ -988,41 +1119,100 @@
 	rpl = rb_ary_to_ary(rpl);
 	rlen = RARRAY(rpl)->len;
     }
+
     rb_ary_modify(ary);
 
-    if (beg >= RARRAY(ary)->len) {
-	len = beg + rlen;
-	if (len >= RARRAY(ary)->aux.capa) {
-	    REALLOC_N(RARRAY(ary)->ptr, VALUE, len);
-	    RARRAY(ary)->aux.capa = len;
-	}
-	rb_mem_clear(RARRAY(ary)->ptr + RARRAY(ary)->len, beg - RARRAY(ary)->len);
-	if (rlen > 0) {
-	    MEMCPY(RARRAY(ary)->ptr + beg, RARRAY(rpl)->ptr, VALUE, rlen);
-	}
-	RARRAY(ary)->len = len;
-    }
-    else {
-	long alen;
+    beg0 = beg;
+    ptr = RARRAY(ary)->ptr;
 
-	if (beg + len > RARRAY(ary)->len) {
-	    len = RARRAY(ary)->len - beg;
-	}
+    if (beg + len > RARRAY(ary)->len) {
+        len = RARRAY(ary)->len - beg;
+        if (len<0) beg0 = beg + len;
+    }
 
-	alen = RARRAY(ary)->len + rlen - len;
-	if (alen >= RARRAY(ary)->aux.capa) {
-	    REALLOC_N(RARRAY(ary)->ptr, VALUE, alen);
-	    RARRAY(ary)->aux.capa = alen;
-	}
+    diff = rlen - len;
+    alen = alen0 + diff;
 
-	if (len != rlen) {
-	    MEMMOVE(RARRAY(ary)->ptr + beg + rlen, RARRAY(ary)->ptr + beg + len,
-		    VALUE, RARRAY(ary)->len - (beg + len));
-	    RARRAY(ary)->len = alen;
-	}
-	if (rlen > 0) {
-	    MEMMOVE(RARRAY(ary)->ptr + beg, RARRAY(rpl)->ptr, VALUE, rlen);
-	}
+    if (diff) {
+        long lfree = FL_TEST(ary,ELTS_LFREE) ? ptr[-1] : 0;
+        long rfree = FL_TEST(ary,ELTS_RFREE) ? ptr[alen0] : 0;
+        if (beg>=alen/2) {
+            rfree -= diff;
+            if (rfree<0) {
+                long free2 = alen/2;
+                if (lfree>free2/2) {
+                    lfree = free2/2;
+                }
+                rfree = free2-lfree;
+                ptr = ALLOC_N(VALUE, alen+free2) + lfree;
+                MEMCPY(ptr, RARRAY(ary)->ptr, VALUE, beg0);
+                MEMCPY(ptr+beg+rlen, RARRAY(ary)->ptr+beg+len, VALUE, alen0 - (beg+len));
+                if (rlen > 0) {
+                    MEMMOVE(ptr + beg, RARRAY(rpl)->ptr, VALUE, rlen);
+                }
+                ary_free(ary);
+                if (lfree) {
+                    ptr[-1] = lfree;
+                    FL_SET(ary,ELTS_LFREE);
+                } else {
+                    FL_UNSET(ary,ELTS_LFREE);
+                }
+                RARRAY(ary)->ptr = ptr;
+            } else {
+                MEMMOVE(ptr+beg+rlen, ptr+beg+len, VALUE, alen0 - (beg+len));
+                if (rlen > 0) {
+                    MEMMOVE(ptr + beg, RARRAY(rpl)->ptr, VALUE, rlen);
+                }
+            }
+            if (rfree) {
+                ptr[alen] = rfree;
+                FL_SET(ary, ELTS_RFREE);
+            } else {
+                FL_UNSET(ary, ELTS_RFREE);
+            }
+        } else {
+            ptr -= diff;
+            lfree -= diff;
+            if (lfree<0) {
+                long free2 = alen/2;
+                if (rfree>free2/2) {
+                    rfree = free2/2;
+                }
+                lfree = free2-rfree;
+                ptr = ALLOC_N(VALUE, alen+free2) + lfree;
+                MEMCPY(ptr, RARRAY(ary)->ptr, VALUE, beg0);
+                MEMCPY(ptr+beg+rlen, RARRAY(ary)->ptr+beg+len,
+                    VALUE, alen0 - (beg+len));
+                if (rlen > 0) {
+                    MEMMOVE(ptr + beg, RARRAY(rpl)->ptr, VALUE, rlen);
+                }
+                ary_free(ary);
+                if (rfree) {
+                    ptr[alen] = rfree;
+                    FL_SET(ary,ELTS_RFREE);
+                } else {
+                    FL_UNSET(ary,ELTS_RFREE);
+                }
+            } else {
+                MEMMOVE(ptr, ptr+diff, VALUE, beg0);
+                if (rlen > 0) {
+                    MEMMOVE(ptr + beg, RARRAY(rpl)->ptr, VALUE, rlen);
+                }
+            }
+            if (lfree) {
+                ptr[-1] = lfree;
+                FL_SET(ary, ELTS_LFREE);
+            } else {
+                FL_UNSET(ary, ELTS_LFREE);
+            }
+            RARRAY(ary)->ptr = ptr;
+        }
+        RARRAY(ary)->len = alen;
+    } else if (rlen > 0) {
+        MEMMOVE(ptr + beg, RARRAY(rpl)->ptr, VALUE, rlen);
+    }
+    if (len < 0) {
+        rb_mem_clear(ptr + beg0, -len);
     }
 }
 
@@ -1730,6 +1920,7 @@
 rb_ary_delete(VALUE ary, VALUE item)
 {
     long i1, i2;
+    long capa;
 
     for (i1 = i2 = 0; i1 < RARRAY(ary)->len; i1++) {
 	VALUE e = RARRAY(ary)->ptr[i1];
@@ -1748,13 +1939,23 @@
     }
 
     rb_ary_modify(ary);
+    capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
     if (RARRAY(ary)->len > i2) {
 	RARRAY(ary)->len = i2;
-	if (i2 * 2 < RARRAY(ary)->aux.capa &&
-	    RARRAY(ary)->aux.capa > ARY_DEFAULT_SIZE) {
-	    REALLOC_N(RARRAY(ary)->ptr, VALUE, i2 * 2);
-	    RARRAY(ary)->aux.capa = i2 * 2;
-	}
+	if (i2 * 2 < capa &&
+	    capa > ARY_DEFAULT_SIZE) {
+            long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
+            RARRAY(ary)->ptr -= lfree;
+	    REALLOC_N(RARRAY(ary)->ptr, VALUE, i2 * 2 + lfree);
+            RARRAY(ary)->ptr += lfree;
+	    capa = i2 * 2;
+	}
+        if (capa>i2) {
+            RARRAY(ary)->ptr[i2] = capa-i2;
+            FL_SET(ary,ELTS_RFREE);
+        } else {
+            FL_UNSET(ary,ELTS_RFREE);
+        }
     }
 
     return item;
@@ -1763,8 +1964,10 @@
 VALUE
 rb_ary_delete_at(VALUE ary, long pos)
 {
-    long i, len = RARRAY(ary)->len;
+    long len = RARRAY(ary)->len;
     VALUE del;
+    long rfree;
+    VALUE *ptr;
 
     if (pos >= len) return Qnil;
     if (pos < 0) {
@@ -1773,11 +1976,27 @@
     }
 
     rb_ary_modify(ary);
-    del = RARRAY(ary)->ptr[pos];
-    for (i = pos + 1; i < len; i++, pos++) {
-	RARRAY(ary)->ptr[pos] = RARRAY(ary)->ptr[i];
+    ptr = RARRAY(ary)->ptr;
+    del = ptr[pos];
+    if (pos>=len/2) {
+        VALUE *eptr = ptr+len;
+        long rfree = FL_TEST(ary,ELTS_RFREE) ? *eptr : 0;
+        --len;
+        --eptr;
+        ptr += pos;
+        MEMMOVE(ptr, ptr+1, VALUE, len-pos);
+        RARRAY(ary)->len = len;
+        *eptr = rfree+1;
+        FL_SET(ary,ELTS_RFREE);
+    } else {
+        long lfree = FL_TEST(ary,ELTS_LFREE) ? ptr[-1] : 0;
+        --len;
+        MEMMOVE(ptr+1, ptr, VALUE, pos);
+        *ptr = lfree+1;
+        RARRAY(ary)->len = len;
+        RARRAY(ary)->ptr = ptr+1;
+        FL_SET(ary,ELTS_LFREE);
     }
-    RARRAY(ary)->len = pos;
 
     return del;
 }
@@ -1866,9 +2085,11 @@
 rb_ary_reject_bang(VALUE ary)
 {
     long i1, i2;
+    long rfree;
 
     RETURN_ENUMERATOR(ary, 0, 0);
     rb_ary_modify(ary);
+    rfree = FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0;
     for (i1 = i2 = 0; i1 < RARRAY(ary)->len; i1++) {
 	VALUE v = RARRAY(ary)->ptr[i1];
 	if (RTEST(rb_yield(v))) continue;
@@ -1878,8 +2099,12 @@
 	i2++;
     }
     if (RARRAY(ary)->len == i2) return Qnil;
-    if (i2 < RARRAY(ary)->len)
+    if (i2 < RARRAY(ary)->len) {
+        rfree += RARRAY(ary)->len-i2;
+        RARRAY(ary)->ptr[i2] = rfree;
+        FL_SET(ary,ELTS_RFREE);
 	RARRAY(ary)->len = i2;
+    }
 
     return ary;
 }
@@ -2031,18 +2256,11 @@
 static VALUE
 rb_ary_replace(VALUE copy, VALUE orig)
 {
-    VALUE shared;
-
     rb_ary_modify(copy);
     orig = to_ary(orig);
     if (copy == orig) return copy;
-    shared = ary_make_shared(orig);
-    if (RARRAY(copy)->ptr && !FL_TEST(copy, ELTS_SHARED))
-	free(RARRAY(copy)->ptr);
-    RARRAY(copy)->ptr = RARRAY(orig)->ptr;
-    RARRAY(copy)->len = RARRAY(orig)->len;
-    RARRAY(copy)->aux.shared = shared;
-    FL_SET(copy, ELTS_SHARED);
+    ary_free(copy);
+    ary_share(copy,orig);
 
     return copy;
 }
@@ -2060,12 +2278,19 @@
 VALUE
 rb_ary_clear(VALUE ary)
 {
+    long capa;
     rb_ary_modify(ary);
+    capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
     RARRAY(ary)->len = 0;
-    if (ARY_DEFAULT_SIZE * 2 < RARRAY(ary)->aux.capa) {
-	REALLOC_N(RARRAY(ary)->ptr, VALUE, ARY_DEFAULT_SIZE * 2);
-	RARRAY(ary)->aux.capa = ARY_DEFAULT_SIZE * 2;
+    if (ARY_DEFAULT_SIZE * 2 < capa) {
+        long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
+        RARRAY(ary)->ptr -= lfree;
+	REALLOC_N(RARRAY(ary)->ptr, VALUE, ARY_DEFAULT_SIZE * 2 + lfree);
+        RARRAY(ary)->ptr += lfree;
+	capa = ARY_DEFAULT_SIZE * 2;
     }
+    RARRAY(ary)->ptr[0] = capa;
+    FL_SET(ary,ELTS_RFREE);
     return ary;
 }
 
@@ -2131,13 +2356,23 @@
     rb_ary_modify(ary);
     end = beg + len;
     if (end > RARRAY(ary)->len) {
-	if (end >= RARRAY(ary)->aux.capa) {
-	    REALLOC_N(RARRAY(ary)->ptr, VALUE, end);
-	    RARRAY(ary)->aux.capa = end;
+        long capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
+	if (end >= capa) {
+            long lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
+            RARRAY(ary)->ptr -= lfree;
+	    REALLOC_N(RARRAY(ary)->ptr, VALUE, end + lfree);
+            RARRAY(ary)->ptr += lfree;
+	    capa = end;
 	}
 	if (beg > RARRAY(ary)->len) {
 	    rb_mem_clear(RARRAY(ary)->ptr + RARRAY(ary)->len, end - RARRAY(ary)->len);
 	}
+        if (capa > end) {
+            RARRAY(ary)->ptr[end] = capa-end;
+            FL_SET(ary,ELTS_RFREE);
+        } else {
+            FL_UNSET(ary,ELTS_RFREE);
+        }
 	RARRAY(ary)->len = end;
     }
 
@@ -2611,6 +2846,7 @@
 {
     VALUE hash, v, vv;
     long i, j;
+    long capa;
 
     hash = ary_make_hash(ary, 0);
 
@@ -2623,7 +2859,14 @@
 	    rb_ary_store(ary, j++, v);
 	}
     }
+    capa = RARRAY(ary)->len+(FL_TEST(ary,ELTS_RFREE) ? RARRAY(ary)->ptr[RARRAY(ary)->len] : 0);
     RARRAY(ary)->len = j;
+    if (capa > j) {
+        RARRAY(ary)->ptr[j] = capa-j;
+        FL_SET(ary,ELTS_RFREE);
+    } else {
+        FL_UNSET(ary,ELTS_RFREE);
+    }
 
     return ary;
 }
@@ -2661,6 +2904,7 @@
 rb_ary_compact_bang(VALUE ary)
 {
     VALUE *p, *t, *end;
+    long lfree;
 
     rb_ary_modify(ary);
     p = t = RARRAY(ary)->ptr;
@@ -2673,8 +2917,12 @@
     if (RARRAY(ary)->len == (p - RARRAY(ary)->ptr)) {
 	return Qnil;
     }
-    RARRAY(ary)->len = RARRAY(ary)->aux.capa = (p - RARRAY(ary)->ptr);
-    REALLOC_N(RARRAY(ary)->ptr, VALUE, RARRAY(ary)->len);
+    RARRAY(ary)->len = (p - RARRAY(ary)->ptr);
+    lfree = FL_TEST(ary,ELTS_LFREE) ? RARRAY(ary)->ptr[-1] : 0;
+    RARRAY(ary)->ptr -= lfree;
+    REALLOC_N(RARRAY(ary)->ptr, VALUE, RARRAY(ary)->len + lfree);
+    RARRAY(ary)->ptr += lfree;
+    FL_UNSET(ary,ELTS_RFREE);
 
     return ary;
 }
Index: gc.c
===================================================================
RCS file: /src/ruby/gc.c,v
retrieving revision 1.209
diff -u -r1.209 gc.c
--- gc.c	14 Sep 2005 08:30:15 -0000	1.209
+++ gc.c	21 Sep 2005 16:13:05 -0000
@@ -711,7 +711,7 @@
     obj->as.basic.flags |= FL_MARK;
 
   marking:
-    if (FL_TEST(obj, FL_EXIVAR)) {
+    if (FL_TEST(obj, FL_EXIVAR)) {      
 	rb_mark_generic_ivar(ptr);
     }
 
@@ -869,11 +869,7 @@
 	goto again;
 
       case T_ARRAY:
-	if (FL_TEST(obj, ELTS_SHARED)) {
-	    ptr = obj->as.array.aux.shared;
-	    goto again;
-	}
-	else {
+	{
 	    long i, len = obj->as.array.len;
 	    VALUE *ptr = obj->as.array.ptr;
 
@@ -1110,9 +1106,7 @@
 	}
 	break;
       case T_ARRAY:
-	if (RANY(obj)->as.array.ptr && !FL_TEST(obj, ELTS_SHARED)) {
-	    RUBY_CRITICAL(free(RANY(obj)->as.array.ptr));
-	}
+	ary_free(obj);
 	break;
       case T_HASH:
 	if (RANY(obj)->as.hash.tbl) {
Index: ruby.h
===================================================================
RCS file: /src/ruby/ruby.h,v
retrieving revision 1.122
diff -u -r1.122 ruby.h
--- ruby.h	14 Sep 2005 13:40:43 -0000	1.122
+++ ruby.h	21 Sep 2005 16:13:05 -0000
@@ -346,7 +346,10 @@
     double value;
 };
 
+// SHARED implies not (LFREE or RFREE)
 #define ELTS_SHARED FL_USER2
+#define ELTS_LFREE FL_USER3
+#define ELTS_RFREE FL_USER4
 
 struct RString {
     struct RBasic basic;
@@ -361,10 +364,7 @@
 struct RArray {
     struct RBasic basic;
     long len;
-    union {
-	long capa;
 	VALUE shared;
-    } aux;
     VALUE *ptr;
 };
 

[-- Attachment #3: 1685833283-end_test.rb --]
[-- Type: application/octet-stream, Size: 3809 bytes --]


require 'benchmark'

#init final code element-assertion

tests = %w{

1   1   ;                               i
        
1.5 0.5 shift                           i+n
1.5 0.5 shift(1)                        i+n
1.5 0.5 pop                             i
1.5 0.5 pop(1)                          i
2   0   shift;pop                       nil
2   0   shift(1);pop(1)                 nil
        
0.5 1.5 unshift(i)                      (i>=n)&&(i-n)||(n-1-i)
0.5 1.5 push(i)                         (i>=n/2)&&(i-n/2)||i
0   2   unshift(i);push(i)              (i>=n)&&(i-n)||(n-1-i)
        
1   1   unshift(shift)                  i
1   1   unshift(*shift(1))              i
1   1   push(pop)                       i
1   1   push(*pop(1))                   i
1   1   push(shift)                     i
1   1   push(*shift(1))                 i
1   1   unshift(pop)                    i
1   1   unshift(*pop(1))                i
        
1.5 0.5 slice!(1)                       (i<1)&&i||(i+n)
1.5 0.5 delete_at(1)                    (i<1)&&i||(i+n)
1.5 0.5 slice!(1,1)                     (i<1)&&i||(i+n)
1.5 0.5 self[1,1]=[]                    (i<1)&&i||(i+n)
1.5 0.5 slice!(-2)                      (i>n/2-2)&&(i+n)||i
1.5 0.5 delete_at(-2)                   (i>n/2-2)&&(i+n)||i
1.5 0.5 slice!(-2,1)                    (i>n/2-2)&&(i+n)||i
1.5 0.5 self[-2,1]=[]                   (i>n/2-2)&&(i+n)||i
        
0.5 1.5 self[1,0]=[i]                   j=n-i;(j<0)&&(i-n)||(j>=n)&&i||j
0.5 1.5 insert(1,i)                     j=n-i;(j<0)&&(i-n)||(j>=n)&&i||j
0.5 1.5 self[-1,0]=[i]                  j=i-n/2+1;(j<0)&&i||(j>=n)&&(i-n)||j
0.5 1.5 insert(-2,i)                    j=i-n/2+1;(j<0)&&i||(j>=n)&&(i-n)||j

1   1   self[1,0]=slice!(1,1)           i
1   1   insert(1,delete_at(1))          i
1   1   self[-1,0]=slice!(-2,1)         i
1   1   insert(-2,delete_at(-2))        i

1   1   self[-1,0]=slice!(1,1)          ((i<1)||(i>=n-1))&&i||((i+1)%(n-2)+1)
1   1   insert(-2,delete_at(1))         ((i<1)||(i>=n-1))&&i||((i+1)%(n-2)+1)
1   1   self[1,0]=slice!(-2,1)          ((i<1)||(i>=n-1))&&i||((i-3)%(n-2)+1)
1   1   insert(1,delete_at(-2))         ((i<1)||(i>=n-1))&&i||((i-3)%(n-2)+1)

1   1   self[i]=self[i]                 i
1   1   self[i]=at(i)                   i
1   1   self[i,1]=self[i,1]             i
1   1   self[-i-1]=self[-i-1]           i
1   1   self[-i-1]=at(-i-1)             i
1   1   self[-i-1,1]=self[-i-1,1]       i
1   1   self[-i-1]=self[i]              (i>=n/2)&&(n-1-i)||i
1   1   self[-i-1]=at(i)                (i>=n/2)&&(n-1-i)||i
1   1   self[-i-1,1]=self[i,1]          (i>=n/2)&&(n-1-i)||i
1   1   self[i]=self[-i-1]              (i>=n/2)&&i||(n-1-i)
1   1   self[i]=at(-i-1)                (i>=n/2)&&i||(n-1-i)
1   1   self[i,1]=self[-i-1,1]          (i>=n/2)&&i||(n-1-i)

}

Fields = 4

ARGV.replace([12]) if ARGV.empty?

ARGV.each { |exp|
    n = 1 << exp.to_i
    puts
    puts("n = #{n}")
    an = (0...(n << 1)).to_a
    
    results = Hash.new;
    Benchmark.bmbm { |benchmark|
        entry = 0   
        while entry<tests.size
            init,final,code,assert = tests[entry,Fields]
            #puts(code)
            init = init.to_f
            final = final.to_f
            init+final==2 or raise(init.to_s+"+"+final.to_s+"!=2");
            results[entry/Fields] = final==0 ? [] :
                eval("Array.new(#{(n*final).to_i}) {|i|
                    #{assert}
                }")
            benchmark.report(code,&eval("lambda {
                a = an[0,#{(n*init).to_i}]
                a.instance_eval {#{n}.times{ |i|
                    #{code}
                }}
                result = results[#{entry/Fields}]
                a==result or raise(a.inspect+'!='+result.inspect)
            }"))
            entry += Fields
        end
    }
}



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
@ 2005-09-21 17:16 Berger, Daniel
  2005-09-21 19:50 ` Eric Mahurin
  0 siblings, 1 reply; 10+ messages in thread
From: Berger, Daniel @ 2005-09-21 17:16 UTC (permalink / raw
  To: ruby-core

> -----Original Message-----
> From: Eric Mahurin [mailto:eric_mahurin@yahoo.com] 
> Sent: Wednesday, September 21, 2005 10:56 AM
> To: ruby-core@ruby-lang.org
> Subject: another array patch - performance boosts all over the place
> 
> 
> Is anybody interested in the patches I'm doing to make arrays 
> (and someday strings) faster?  I didn't see a response for my 
> last patch (shift/unshift).

Oh, I'm paying attention. :)

I think the imminent release of 1.8.3 had people focused on other
things.

> Does anybody see a downside to this?  The main problem I see 
> is that I don't see the big picture when making these changes 
> (i.e. GC).  And this needs to be better tested.  Anybody know 
> of something good to do array testing?  I just used the 
> self-testing benchmark attached and what's in test/ruby.

Perhaps providing your own unit tests would be useful, or running it
through the latest rubicon test suite, would be a good idea.  I haven't
had a chance to tinker with it myself...yet.

Regards,

Dan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
  2005-09-21 16:56 Eric Mahurin
@ 2005-09-21 17:36 ` Yukihiro Matsumoto
  2005-09-22  4:19 ` nobuyoshi nakada
  1 sibling, 0 replies; 10+ messages in thread
From: Yukihiro Matsumoto @ 2005-09-21 17:36 UTC (permalink / raw
  To: ruby-core

Hi,

In message "Re: another array patch - performance boosts all over the place"
    on Thu, 22 Sep 2005 01:56:21 +0900, Eric Mahurin <eric_mahurin@yahoo.com> writes:

|Is anybody interested in the patches I'm doing to make arrays
|(and someday strings) faster?  I didn't see a response for my
|last patch (shift/unshift).  Now I have another patch
|(fastarray.diff - attached) that should help out a lot more
|code sequences.  Here are the results using the attached
|benchmark (end_test.rb):

I've been busy for a while to release 1.8.3, and still need some time
before investigating your patches, but you are not ignored.

							matz.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
  2005-09-21 17:16 Berger, Daniel
@ 2005-09-21 19:50 ` Eric Mahurin
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Mahurin @ 2005-09-21 19:50 UTC (permalink / raw
  To: ruby-core

[-- Attachment #1: Type: text/plain, Size: 1943 bytes --]

--- "Berger, Daniel" <Daniel.Berger@qwest.com> wrote:

> > -----Original Message-----
> > From: Eric Mahurin [mailto:eric_mahurin@yahoo.com] 
> > Sent: Wednesday, September 21, 2005 10:56 AM
> > To: ruby-core@ruby-lang.org
> > Subject: another array patch - performance boosts all over
> the place
> > 
> > 
> > Is anybody interested in the patches I'm doing to make
> arrays 
> > (and someday strings) faster?  I didn't see a response for
> my 
> > last patch (shift/unshift).
> 
> Oh, I'm paying attention. :)

Good.  I was just surprised by no response to having several
operations go from O(n) to O(1).

> I think the imminent release of 1.8.3 had people focused on
> other
> things.

OK.

> > Does anybody see a downside to this?  The main problem I
> see 
> > is that I don't see the big picture when making these
> changes 
> > (i.e. GC).  And this needs to be better tested.  Anybody
> know 
> > of something good to do array testing?  I just used the 
> > self-testing benchmark attached and what's in test/ruby.
> 
> Perhaps providing your own unit tests would be useful, or
> running it
> through the latest rubicon test suite, would be a good idea. 
> I haven't
> had a chance to tinker with it myself...yet.

I downloaded the rubicon test suite and first tried it out on
the existing v1.9.  There were failures all over the place. 
There was also a segmentation fault.  I made a patch (attached)
to at least fix this.  The problem was rb_range_beg_len
returning Qnil upon fail and several places were expecting 0
instead.  I made the same fix in my fast array ruby.

My fast array patch causes no additional failures in the
rubicon/builtin/TestArray.rb suite or the ruby tree test/ruby
test suite.

I'm surprised at how many failures there are in these test
suites for v1.9.


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 1636230909-rb_range_beg_len.diff --]
[-- Type: text/x-patch; name="rb_range_beg_len.diff", Size: 1603 bytes --]

Index: array.c
===================================================================
RCS file: /src/ruby/array.c,v
retrieving revision 1.179
diff -u -r1.179 array.c
--- array.c	12 Sep 2005 15:23:54 -0000	1.179
+++ array.c	21 Sep 2005 19:31:41 -0000
@@ -1070,7 +1070,7 @@
 	offset = FIX2LONG(argv[0]);
 	goto fixnum;
     }
-    if (rb_range_beg_len(argv[0], &beg, &len, RARRAY(ary)->len, 1)) {
+    if (RTEST(rb_range_beg_len(argv[0], &beg, &len, RARRAY(ary)->len, 1))) {
 	/* check if idx is Range */
 	rb_ary_splice(ary, beg, len, argv[1]);
 	return argv[1];
@@ -1845,7 +1845,7 @@
 	return arg2;
     }
 
-    if (!FIXNUM_P(arg1) && rb_range_beg_len(arg1, &pos, &len, RARRAY(ary)->len, 1)) {
+    if (!FIXNUM_P(arg1) && RTEST(rb_range_beg_len(arg1, &pos, &len, RARRAY(ary)->len, 1))) {
 	goto delete_pos_len;
     }
 
@@ -2115,7 +2115,7 @@
 	len = RARRAY(ary)->len;
 	break;
       case 2:
-	if (rb_range_beg_len(arg1, &beg, &len, RARRAY(ary)->len, 1)) {
+	if (RTEST(rb_range_beg_len(arg1, &beg, &len, RARRAY(ary)->len, 1))) {
 	    break;
 	}
 	/* fall through */
Index: string.c
===================================================================
RCS file: /src/ruby/string.c,v
retrieving revision 1.239
diff -u -r1.239 string.c
--- string.c	17 Sep 2005 14:40:05 -0000	1.239
+++ string.c	21 Sep 2005 19:31:44 -0000
@@ -1644,7 +1644,7 @@
 	/* check if indx is Range */
 	{
 	    long beg, len;
-	    if (rb_range_beg_len(indx, &beg, &len, RSTRING(str)->len, 2)) {
+	    if (RTEST(rb_range_beg_len(indx, &beg, &len, RSTRING(str)->len, 2))) {
 		rb_str_splice(str, beg, len, val);
 		return val;
 	    }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
  2005-09-21 16:56 Eric Mahurin
  2005-09-21 17:36 ` Yukihiro Matsumoto
@ 2005-09-22  4:19 ` nobuyoshi nakada
  2005-09-22 15:33   ` Eric Mahurin
  1 sibling, 1 reply; 10+ messages in thread
From: nobuyoshi nakada @ 2005-09-22  4:19 UTC (permalink / raw
  To: ruby-core

Hi,

At Thu, 22 Sep 2005 01:56:21 +0900,
Eric Mahurin wrote in [ruby-core:05861]:
> Here were the changes I made:

> - instead of shared arrays all pointing to a common frozen
> array object, put shared arrays in a circular linked list (used
> ary->shared).  The original master array will have
> ELTS_SHARED=0 and the others will have ELTS_SHARED=1.  Unshared
> arrays will have ELTS_SHARED=0 and ary->shared=0.

What kind of merit and necessity, compared with the current
code?  It seems to be complicated.

-- 
Nobu Nakada

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
  2005-09-22  4:19 ` nobuyoshi nakada
@ 2005-09-22 15:33   ` Eric Mahurin
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Mahurin @ 2005-09-22 15:33 UTC (permalink / raw
  To: ruby-core

--- nobuyoshi nakada <nobuyoshi.nakada@ge.com> wrote:
> Eric Mahurin wrote in [ruby-core:05861]:
> > - instead of shared arrays all pointing to a common frozen
> > array object, put shared arrays in a circular linked list
> (used
> > ary->shared).  The original master array will have
> > ELTS_SHARED=0 and the others will have ELTS_SHARED=1. 
> Unshared
> > arrays will have ELTS_SHARED=0 and ary->shared=0.
> 
> What kind of merit and necessity, compared with the current
> code?  It seems to be complicated.

The changes regarding element sharing were the largest, but I
believe they will be be most beneficial.  This was the best
solution I came up with.  Here are the issues with the
old/current way of element sharing that are solved by my new
way:

1. After a slice is taken from a large array and then that
array needs to be modified, the large array is copied to a new
area so that the shared slice is not disturbed.  This causes a
extra copy (O(n)) of this large array compared to not doing
element sharing (coping the small slice to begin with).  With
my solution, I copy all of the shared slices instead of the
master array when the master array needs to be modified.  Here
is a benchmark that demonstrates the problem (FIFO taking 2
elements in/out at a time):

ruby -e 'n=2**16; a=(0...n).to_a; n.times{a.push(*a.shift(2))};
puts(Process.times)'

old - O(n**2) time:

#<struct Struct::Tms utime=17.19, stime=20.14, cutime=0.0,
cstime=0.0>

new - O(n) time:

#<struct Struct::Tms utime=0.1, stime=0.0, cutime=0.0,
cstime=0.0>

2. If you take a small slice of a large array, modify the large
array, and keep the original, the small slice will maintain a
link to the original large array and keep all that space. 
You'll be maintaining about double the memory that you would
without element sharing.  Because my patch copies the slices
instead of the large array, this problem doesn't exist.  Here
is an example (works in linux using the /proc filesystem to get
memory usage):

ruby -e 'n=2**13; a=(0...n).to_a; n.times{a.push(a.pop(2))};
IO.readlines("/proc/#{Process.pid}/status").grep(/VmSize/).display;
puts(Process.times)'

old - O(n**2) memory and time:

VmSize:   199864 kB
#<struct Struct::Tms utime=4.02, stime=0.18, cutime=0.0,
cstime=0.0>

new - O(n) memory and time:

VmSize:     3360 kB
#<struct Struct::Tms utime=0.01, stime=0.0, cutime=0.0,
cstime=0.0>

3. If a slice is taken and held, references to all of the
elements of the original array are kept even after the original
array is not needed.  With the method I used, when the original
is GC'd, the slice is copied and all of the other elements of
the original array can be GC'd.  Here is an example that shows
the issue:

ruby -e 'n=2**13; a=[];
n.times{|i|a.push([i,(0..i).to_a][0,1])};
IO.readlines("/proc/#{Process.pid}/status").grep(/VmSize/).display;
puts(Process.times)'

old - O(n**2) memory, O(n**2) time:

VmSize:   166644 kB
#<struct Struct::Tms utime=14.9, stime=0.25, cutime=0.0,
cstime=0.0>

new - O(n) memory, O(n**2) time:

VmSize:     7252 kB
#<struct Struct::Tms utime=9.47, stime=0.03, cutime=0.0,
cstime=0.0>

I tried all of the above tests in perl and it has the
performance/memory characteristics of my patch.

If we are to have element sharing in ruby, I think the
delay/memory penalty should be no worse than the same order of
magnitude than if we didn't have it.  This is definitely not
the case currently.

This is only one part of my patch - element sharing.  The other
major part is doing all insertions/deletions toward the closest
end of the array (instead of always toward the right end). 
Looking at the performance of perl's unshift, they don't have
this capability.  Maybe this is the first time someone has done
something like this.

Eric

__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
@ 2005-10-05 15:50 Eric Mahurin
  2005-10-06 14:39 ` Matt Mower
  2005-10-07  1:49 ` Gyoung-Yoon Noh
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Mahurin @ 2005-10-05 15:50 UTC (permalink / raw
  To: ruby-core

Has anybody evaluated this patch?  In addition to these cases
where the benchmark time went from O(n**2) to O(n), I can show
other cases where the memory went from O(n**2) to O(n).

I also looked at equivalent perl for many of these.  I never
found perl to have the element sharing problem (not sure if it
does element sharing).  I also found that perl 5.6.1+ has a
fast unshift like below (always had a fast shift), but doesn't
have fast random insert/delete.  perl's splice always operates
toward the right instead of the closest of the right or left
like my patch.

We can't even make a fast FIFO out of an Array now (see
push(shift) benchmark).  The reason for this is element sharing
cause large copies when a modification is made (shift does
sharing and push does the unsharing).  This is exactly what
Queue uses and its operations are O(n) instead of O(1) as they
should be.  I have several applications in mind but the one
that I have immediate use for is a FIFO buffer where that
buffer could be an Array or String (need the String patch
also).  Another idea (that I don't need right now) is for a
text editor buffer that is an inside-out gap buffer (don't know
what to call it) and has the same performance attributes as a
gap buffer.

I would also like to do a similar patch for String, but I need
to know whether this will be accepted before I invest the time.
 These are not trivial patches - they are involved because of
the data structure changes.

I will be at rubyconf if this needs to be discussed in person.

--- Eric Mahurin <eric_mahurin@yahoo.com> wrote:

> Is anybody interested in the patches I'm doing to make arrays
> (and someday strings) faster?  I didn't see a response for my
> last patch (shift/unshift).  Now I have another patch
> (fastarray.diff - attached) that should help out a lot more
> code sequences.  Here are the results using the attached
> benchmark (end_test.rb):
> 
> n = 32768 (2**15)
> 
> code                        old   new
> ------------------------- ----- -----
> ;                          0.01  0.01
> shift                      0.02  0.02
> shift(1)                   0.03  0.04
> pop                        0.02  0.01
> pop(1)                     0.04  0.05
> shift;pop                  0.02  0.02
> shift(1);pop(1)            0.07  0.08
> unshift(i)                 2.34  0.02
> push(i)                    0.02  0.02
> unshift(i);push(i)         2.35  0.03
> unshift(shift)            17.61  0.03
> unshift(*shift(1))        17.42  0.07
> push(pop)                  0.04  0.02
> push(*pop(1))             13.83  0.08
> push(shift)               14.64  0.03
> push(*shift(1))           15.39  0.08
> unshift(pop)               2.33  0.03
> unshift(*pop(1))          16.47  0.08
> slice!(1)                  2.29  0.03
> delete_at(1)               2.30  0.02
> slice!(1,1)               12.75  0.05
> self[1,1]=[]               2.36  0.05
> slice!(-2)                 0.02  0.03
> delete_at(-2)              0.02  0.02
> slice!(-2,1)              10.37  0.05
> self[-2,1]=[]              0.04  0.04
> self[1,0]=[i]              3.49  0.04
> insert(1,i)                3.44  0.04
> self[-1,0]=[i]             1.07  0.05
> insert(-2,i)               1.02  0.05
> self[1,0]=slice!(1,1)     16.40  0.06
> insert(1,delete_at(1))     5.60  0.06
> self[-1,0]=slice!(-2,1)   11.92  0.06
> insert(-2,delete_at(-2))   1.00  0.06
> self[-1,0]=slice!(1,1)    14.10  0.07
> insert(-2,delete_at(1))    3.29  0.06
> self[1,0]=slice!(-2,1)    14.06  0.07
> insert(1,delete_at(-2))    3.30  0.06
> self[i]=self[i]            0.03  0.02
> self[i]=at(i)              0.03  0.02
> self[i,1]=self[i,1]       11.72  0.06
> self[-i-1]=self[-i-1]      0.05  0.05
> self[-i-1]=at(-i-1)        0.04  0.05
> self[-i-1,1]=self[-i-1,1] 11.74  0.07
> self[-i-1]=self[i]         0.03  0.03
> self[-i-1]=at(i)           0.03  0.04
> self[-i-1,1]=self[i,1]    11.67  0.07
> self[i]=self[-i-1]         0.04  0.04
> self[i]=at(-i-1)           0.03  0.03
> self[i,1]=self[-i-1,1]    11.70  0.06
> 
> In each of the tests above, it runs the code n times on an
> array that has an average length of n.  The massive
> performance
> boosts you see above is because the old (current) code was
> O(n)
> in those cases and my patched code is O(1).
> 
> Here were the changes I made:
> 
> - added a bit (ELTS_LFREE=FL_USER3) to say whether there is
> free space to the left of ary->ptr and put the number of free
> elements in ary->ptr[-1].
> 
> - instead of ary->aux.capa, did the same for free space to
> the
> right of ary->ptr+ary->len-1 (new flag: ELTS_RFREE=FL_USER3
> and
> amount stored in ary->ptr[ary->len]).  This freed up
> ary->shared for dedicated used (instead of being a union with
> capa).
> 
> - instead of shared arrays all pointing to a common frozen
> array object, put shared arrays in a circular linked list
> (used
> ary->shared).  The original master array will have
> ELTS_SHARED=0 and the others will have ELTS_SHARED=1. 
> Unshared
> arrays will have ELTS_SHARED=0 and ary->shared=0.
> 
> - when a shared array is to be modified, there are 2 cases:
> slave - make copy and remove from shared circular linked
> list;
> master - make copies of all slaves and terminate sharing
> between this master and its slaves.  When a shared array is
> to
> be freed, it operates in a similar way.
> 
> - during GC, shared arrays are treated independently (master
> can be freed).  This works because of the above mechanism. 
> Before, unused object references would persist because of
> array
> sharing (keeping an array slice around would keep references
> to
> all objects in the original array).
> 
> - modified various functions (shift, unshift, delete_at,
> splice), to insert/delete toward the closest side to minimize
> the number of elements needed to be moved.
> 
> Does anybody see a downside to this?  The main problem I see
> is
> that I don't see the big picture when making these changes
> (i.e. GC).  And this needs to be better tested.  Anybody know
> of something good to do array testing?  I just used the
> self-testing benchmark attached and what's in test/ruby.
> 
> Eric

look at previous message for patch and benchmark attachments



		
__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
  2005-10-05 15:50 another array patch - performance boosts all over the place Eric Mahurin
@ 2005-10-06 14:39 ` Matt Mower
  2005-10-06 19:20   ` Eric Mahurin
  2005-10-07  1:49 ` Gyoung-Yoon Noh
  1 sibling, 1 reply; 10+ messages in thread
From: Matt Mower @ 2005-10-06 14:39 UTC (permalink / raw
  To: ruby-core

Hi Eric,

On 05/10/05, Eric Mahurin <eric_mahurin@yahoo.com> wrote:
> Has anybody evaluated this patch?  In addition to these cases
> where the benchmark time went from O(n**2) to O(n), I can show
> other cases where the memory went from O(n**2) to O(n).
>

Sadly I'm not qualified to evaluate your patch but it seems like
you're doing very valuable work.  A big speed boost to something as
core as Array (or String) would be great!

Regards,

Matt

--
Matt Mower :: http://matt.blogs.it/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
  2005-10-06 14:39 ` Matt Mower
@ 2005-10-06 19:20   ` Eric Mahurin
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Mahurin @ 2005-10-06 19:20 UTC (permalink / raw
  To: ruby-core

--- Matt Mower <matt.mower@gmail.com> wrote:

> Hi Eric,
> 
> On 05/10/05, Eric Mahurin <eric_mahurin@yahoo.com> wrote:
> > Has anybody evaluated this patch?  In addition to these
> cases
> > where the benchmark time went from O(n**2) to O(n), I can
> show
> > other cases where the memory went from O(n**2) to O(n).
> >
> 
> Sadly I'm not qualified to evaluate your patch but it seems
> like
> you're doing very valuable work.  A big speed boost to
> something as
> core as Array (or String) would be great!

Thanks.  For now, it sounds like Evan Webb will integrate this
into the "Sydney" ruby fork.



		
__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: another array patch - performance boosts all over the place
  2005-10-05 15:50 another array patch - performance boosts all over the place Eric Mahurin
  2005-10-06 14:39 ` Matt Mower
@ 2005-10-07  1:49 ` Gyoung-Yoon Noh
  1 sibling, 0 replies; 10+ messages in thread
From: Gyoung-Yoon Noh @ 2005-10-07  1:49 UTC (permalink / raw
  To: ruby-core

Hi,

I've checked your patch, though I couldn't understand meaning of your
code well, result is really great.  After adapting your patch,
Array#unshift is even faster than Array#push. ;-) Especially on my amd64
machine, performance improvements achieved by the patch was much more
extreme.

I hope your patch will be merged into MAIN branch.

Regards,

On 10/6/05, Eric Mahurin <eric_mahurin@yahoo.com> wrote:
>

--
http://nohmad.sub-port.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-10-07  1:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-10-05 15:50 another array patch - performance boosts all over the place Eric Mahurin
2005-10-06 14:39 ` Matt Mower
2005-10-06 19:20   ` Eric Mahurin
2005-10-07  1:49 ` Gyoung-Yoon Noh
  -- strict thread matches above, loose matches on Subject: below --
2005-09-21 17:16 Berger, Daniel
2005-09-21 19:50 ` Eric Mahurin
2005-09-21 16:56 Eric Mahurin
2005-09-21 17:36 ` Yukihiro Matsumoto
2005-09-22  4:19 ` nobuyoshi nakada
2005-09-22 15:33   ` Eric Mahurin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).