ruby-core@ruby-lang.org archive (unofficial mirror)
From: Hongli Lai <hongli@plan99.net>
To: ruby-core@ruby-lang.org
Subject: Re: Copy-on-write friendly garbage collector
Date: Sat, 15 Mar 2008 00:53:39 +0900	[thread overview]
Message-ID: <47DA9F80.4050501@plan99.net> (raw)
In-Reply-To: <6037b70c0803140800r547c7cc2l8a6c003fc93dbb20@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1412 bytes --]

Daniel Berger wrote:
> I recommend against this. Most people are simply not going to know
> when they should invoke it and when they shouldn't. It's an
> implementation detail that should be hidden from the user IMHO.
> Besides, is it really worth splitting it out like this over a 5% speed
> hit? Or am I missing some other implementation detail?

Apparently people care about even a 5% speed hit, or we wouldn't still 
be discussing this.

However, it is true that most scripts will not benefit from a 
copy-on-write friendly garbage collector; only a few rely on it to save 
memory. As far as I know, the program that I'm developing is the first 
Ruby program that relies on copy-on-write friendliness for anything 
important. So there are legitimate reasons both for wanting the garbage 
collector to be fast in the common case, and for wanting it to be 
not-so-fast but copy-on-write friendly in some edge cases.

That said, exposing implementation details is generally not a nice thing 
to do. That's why I've written extensive documentation for the 
"copy_on_write_friendly?" and "copy_on_write_friendly=" methods, 
explaining that the behavior may differ across Ruby implementations 
and/or platforms, and that users should only call these methods if they 
know what they're doing.
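
To make the intended use concrete, here is a sketch of how a preforking 
server might call these methods. `load_application` is a hypothetical 
stand-in for preloading a large application, and the `respond_to?` guard 
keeps the snippet harmless on interpreters without this patch:

```ruby
# Hypothetical preforking-server setup. Opt in to the slower but
# copy-on-write friendly collector *before* allocating long-lived data,
# so that fork()ed workers can keep sharing the parent's heap pages.
def load_application
  # stand-in for loading a big application into memory
  $app_objects = Array.new(10_000) { |i| "object-#{i}" }
end

supported = GC.respond_to?(:copy_on_write_friendly=)
GC.copy_on_write_friendly = true if supported

load_application
puts "copy-on-write friendly GC requested: #{supported}"
# a real server would now fork its workers; until the GC or the program
# writes to a page, parent and children keep sharing it
```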

Please see the attached patch. This patch is made against Ruby 1.8, 
relative to my last patch.

[-- Attachment #2: ruby-pluggable-mark-tables.diff --]
[-- Type: text/x-patch, Size: 13932 bytes --]

(in /home/hongli/Projects/ruby/cowruby)
diff --git a/common.mk b/common.mk
index f57a0da..404cd9d 100644
--- a/common.mk
+++ b/common.mk
@@ -386,7 +386,8 @@ gc.$(OBJEXT): {$(VPATH)}gc.c {$(VPATH)}ruby.h config.h \
   {$(VPATH)}defines.h {$(VPATH)}intern.h {$(VPATH)}missing.h \
   {$(VPATH)}rubysig.h {$(VPATH)}st.h {$(VPATH)}node.h \
   {$(VPATH)}env.h {$(VPATH)}re.h {$(VPATH)}regex.h \
-  {$(VPATH)} pointerset.h {$(VPATH)} marktable.c
+  {$(VPATH)}pointerset.h {$(VPATH)}marktable.h \
+  {$(VPATH)}marktable.c {$(VPATH)}fastmarktable.c
 hash.$(OBJEXT): {$(VPATH)}hash.c {$(VPATH)}ruby.h config.h \
   {$(VPATH)}defines.h {$(VPATH)}intern.h {$(VPATH)}missing.h \
   {$(VPATH)}st.h {$(VPATH)}util.h {$(VPATH)}rubysig.h
@@ -415,7 +416,7 @@ parse.$(OBJEXT): {$(VPATH)}parse.c {$(VPATH)}ruby.h config.h \
   {$(VPATH)}defines.h {$(VPATH)}intern.h {$(VPATH)}missing.h \
   {$(VPATH)}env.h {$(VPATH)}node.h {$(VPATH)}st.h \
   {$(VPATH)}regex.h {$(VPATH)}util.h {$(VPATH)}lex.c
-pointerset.$(OBJEXT): {$(VPATH)}pointerset.c
+pointerset.$(OBJEXT): {$(VPATH)}pointerset.c {$(VPATH)}pointerset.h
 prec.$(OBJEXT): {$(VPATH)}prec.c {$(VPATH)}ruby.h config.h \
   {$(VPATH)}defines.h {$(VPATH)}intern.h {$(VPATH)}missing.h
 process.$(OBJEXT): {$(VPATH)}process.c {$(VPATH)}ruby.h config.h \
diff --git a/fastmarktable.c b/fastmarktable.c
new file mode 100644
index 0000000..8f8526d
--- /dev/null
+++ b/fastmarktable.c
@@ -0,0 +1,83 @@
+/**
+ * A mark table, used during a mark-and-sweep garbage collection cycle.
+ *
+ * This implementation is faster than marktable.c, but is *not*
+ * copy-on-write friendly. It stores mark information directly inside objects.
+ */
+#ifndef _FAST_MARK_TABLE_C_
+#define _FAST_MARK_TABLE_C_
+
+static void
+rb_fast_mark_table_init() {
+}
+
+static void
+rb_fast_mark_table_prepare() {
+}
+
+static void
+rb_fast_mark_table_finalize() {
+}
+
+static inline void
+rb_fast_mark_table_add(RVALUE *object) {
+	object->as.basic.flags |= FL_MARK;
+}
+
+static inline void
+rb_fast_mark_table_heap_add(struct heaps_slot *hs, RVALUE *object) {
+	object->as.basic.flags |= FL_MARK;
+}
+
+static inline int
+rb_fast_mark_table_contains(RVALUE *object) {
+	return object->as.basic.flags & FL_MARK;
+}
+
+static inline int
+rb_fast_mark_table_heap_contains(struct heaps_slot *hs, RVALUE *object) {
+	return object->as.basic.flags & FL_MARK;
+}
+
+static inline void
+rb_fast_mark_table_remove(RVALUE *object) {
+	object->as.basic.flags &= ~FL_MARK;
+}
+
+static inline void
+rb_fast_mark_table_heap_remove(struct heaps_slot *hs, RVALUE *object) {
+	object->as.basic.flags &= ~FL_MARK;
+}
+
+static inline void
+rb_fast_mark_table_add_filename(char *filename) {
+	filename[-1] = 1;
+}
+
+static inline int
+rb_fast_mark_table_contains_filename(const char *filename) {
+	return filename[-1];
+}
+
+static inline void
+rb_fast_mark_table_remove_filename(char *filename) {
+	filename[-1] = 0;
+}
+
+static void
+rb_use_fast_mark_table() {
+	rb_mark_table_init          = rb_fast_mark_table_init;
+	rb_mark_table_prepare       = rb_fast_mark_table_prepare;
+	rb_mark_table_finalize      = rb_fast_mark_table_finalize;
+	rb_mark_table_add           = rb_fast_mark_table_add;
+	rb_mark_table_heap_add      = rb_fast_mark_table_heap_add;
+	rb_mark_table_contains      = rb_fast_mark_table_contains;
+	rb_mark_table_heap_contains = rb_fast_mark_table_heap_contains;
+	rb_mark_table_remove        = rb_fast_mark_table_remove;
+	rb_mark_table_heap_remove   = rb_fast_mark_table_heap_remove;
+	rb_mark_table_add_filename  = rb_fast_mark_table_add_filename;
+	rb_mark_table_contains_filename = rb_fast_mark_table_contains_filename;
+	rb_mark_table_remove_filename   = rb_fast_mark_table_remove_filename;
+}
+
+#endif /* _FAST_MARK_TABLE_C_ */
diff --git a/gc.c b/gc.c
index 3ba5ec5..2843895 100644
--- a/gc.c
+++ b/gc.c
@@ -397,7 +397,9 @@ static int heap_slots = HEAP_MIN_SLOTS;
 
 static RVALUE *himem, *lomem;
 
+#include "marktable.h"
 #include "marktable.c"
+#include "fastmarktable.c"
 
 static void
 add_heap()
@@ -1718,6 +1720,7 @@ void ruby_init_stack(VALUE *addr
 void
 Init_heap()
 {
+    rb_use_fast_mark_table();
     rb_mark_table_init();
     if (!rb_gc_stack_start) {
 	Init_stack(0);
@@ -2209,17 +2212,52 @@ os_statistics()
 }
 
 /*
- * Returns whether this garbage collector is copy-on-write friendly.
+ *  call-seq:
+ *     GC.copy_on_write_friendly?     => true or false
+ *
+ *  Returns whether the garbage collector is copy-on-write friendly.
+ *
+ *  This method only has meaning on platforms that support the _fork_ system call.
+ *  Please consult the documentation for GC.copy_on_write_friendly= for additional
+ *  notes.
  */
 static VALUE
-rb_gc_cow_friendly()
+rb_gc_copy_on_write_friendly()
 {
-    return Qtrue;
+    if (rb_mark_table_init == rb_fast_mark_table_init) {
+	return Qfalse;
+    } else {
+	return Qtrue;
+    }
 }
 
+/*
+ *  call-seq:
+ *     GC.copy_on_write_friendly = _boolean_
+ *
+ *  Tell the garbage collector whether to be copy-on-write friendly.
+ *
+ *  Note that this is an implementation detail of the garbage collector. On some Ruby
+ *  implementations, the garbage collector may always be copy-on-write friendly. In that
+ *  case, this method will do nothing. Furthermore, copy-on-write friendliness has no
+ *  meaning on some platforms (such as Microsoft Windows), so setting this flag on those
+ *  platforms is futile.
+ *
+ *  Please keep in mind that this flag is only advisory. Do not rely on it for anything
+ *  truly important.
+ *
+ *  In the mainline Ruby implementation, the copy-on-write friendly garbage collector is
+ *  slightly slower than the non-copy-on-write friendly version.
+ */
 static VALUE
-rb_gc_test()
+rb_gc_set_copy_on_write_friendly(VALUE self, VALUE val)
 {
+    if (RTEST(val)) {
+	rb_use_bf_mark_table();
+    } else {
+	rb_use_fast_mark_table();
+    }
+    rb_mark_table_init();
     return Qnil;
 }
 
@@ -2239,8 +2277,8 @@ Init_GC()
     rb_define_singleton_method(rb_mGC, "enable", rb_gc_enable, 0);
     rb_define_singleton_method(rb_mGC, "disable", rb_gc_disable, 0);
     rb_define_method(rb_mGC, "garbage_collect", rb_gc_start, 0);
-    rb_define_singleton_method(rb_mGC, "cow_friendly?", rb_gc_cow_friendly, 0);
-    rb_define_singleton_method(rb_mGC, "test", rb_gc_test, 0);
+    rb_define_singleton_method(rb_mGC, "copy_on_write_friendly?", rb_gc_copy_on_write_friendly, 0);
+    rb_define_singleton_method(rb_mGC, "copy_on_write_friendly=", rb_gc_set_copy_on_write_friendly, 1);
 
     rb_mObSpace = rb_define_module("ObjectSpace");
     rb_define_module_function(rb_mObSpace, "each_object", os_each_obj, -1);
diff --git a/marktable.c b/marktable.c
index 412616f..84f88fe 100644
--- a/marktable.c
+++ b/marktable.c
@@ -1,3 +1,14 @@
+/**
+ * A mark table, used during a mark-and-sweep garbage collection cycle.
+ *
+ * This implementation is somewhat slower than fastmarktable.c, but is
+ * copy-on-write friendly. It stores mark information for objects in a bit
+ * field located at the beginning of the heap. Mark information for filenames
+ * is stored in a pointer set.
+ */
+#ifndef _MARK_TABLE_C_
+#define _MARK_TABLE_C_
+
 #include "pointerset.h"
 
 /* A mark table for filenames and objects that are not on the heap. */
@@ -58,19 +69,21 @@ find_position_in_bitfield(struct heaps_slot *hs, RVALUE *object,
 
 
 static void
-rb_mark_table_init()
+rb_bf_mark_table_init()
 {
-	mark_table = pointer_set_new();
+	if (mark_table == NULL) {
+		mark_table = pointer_set_new();
+	}
 }
 
 static void
-rb_mark_table_prepare()
+rb_bf_mark_table_prepare()
 {
 	last_heap = NULL;
 }
 
 static void
-rb_mark_table_finalize()
+rb_bf_mark_table_finalize()
 {
 	int i;
 
@@ -83,7 +96,7 @@ rb_mark_table_finalize()
 }
 
 static inline void
-rb_mark_table_add(RVALUE *object)
+rb_bf_mark_table_add(RVALUE *object)
 {
 	struct heaps_slot *hs;
 	unsigned int bitfield_index, bitfield_offset;
@@ -98,7 +111,7 @@ rb_mark_table_add(RVALUE *object)
 }
 
 static inline void
-rb_mark_table_heap_add(struct heaps_slot *hs, RVALUE *object)
+rb_bf_mark_table_heap_add(struct heaps_slot *hs, RVALUE *object)
 {
 	unsigned int bitfield_index, bitfield_offset;
 	find_position_in_bitfield(hs, object, &bitfield_index, &bitfield_offset);
@@ -106,7 +119,7 @@ rb_mark_table_heap_add(struct heaps_slot *hs, RVALUE *object)
 }
 
 static inline int
-rb_mark_table_contains(RVALUE *object)
+rb_bf_mark_table_contains(RVALUE *object)
 {
 	struct heaps_slot *hs;
 	unsigned int bitfield_index, bitfield_offset;
@@ -121,7 +134,7 @@ rb_mark_table_contains(RVALUE *object)
 }
 
 static inline int
-rb_mark_table_heap_contains(struct heaps_slot *hs, RVALUE *object)
+rb_bf_mark_table_heap_contains(struct heaps_slot *hs, RVALUE *object)
 {
 	unsigned int bitfield_index, bitfield_offset;
 	find_position_in_bitfield(hs, object, &bitfield_index, &bitfield_offset);
@@ -130,7 +143,7 @@ rb_mark_table_heap_contains(struct heaps_slot *hs, RVALUE *object)
 }
 
 static inline void
-rb_mark_table_remove(RVALUE *object)
+rb_bf_mark_table_remove(RVALUE *object)
 {
 	struct heaps_slot *hs;
 	unsigned int bitfield_index, bitfield_offset;
@@ -145,7 +158,7 @@ rb_mark_table_remove(RVALUE *object)
 }
 
 static inline void
-rb_mark_table_heap_remove(struct heaps_slot *hs, RVALUE *object)
+rb_bf_mark_table_heap_remove(struct heaps_slot *hs, RVALUE *object)
 {
 	unsigned int bitfield_index, bitfield_offset;
 	find_position_in_bitfield(hs, object, &bitfield_index, &bitfield_offset);
@@ -153,19 +166,37 @@ rb_mark_table_heap_remove(struct heaps_slot *hs, RVALUE *object)
 }
 
 static inline void
-rb_mark_table_add_filename(const char *filename)
+rb_bf_mark_table_add_filename(char *filename)
 {
 	pointer_set_insert(mark_table, (void *) filename);
 }
 
 static inline int
-rb_mark_table_contains_filename(const char *filename)
+rb_bf_mark_table_contains_filename(const char *filename)
 {
 	return pointer_set_contains(mark_table, (void *) filename);
 }
 
 static inline void
-rb_mark_table_remove_filename(const char *filename)
+rb_bf_mark_table_remove_filename(char *filename)
 {
 	pointer_set_delete(mark_table, (void *) filename);
 }
+
+static void
+rb_use_bf_mark_table() {
+	rb_mark_table_init          = rb_bf_mark_table_init;
+	rb_mark_table_prepare       = rb_bf_mark_table_prepare;
+	rb_mark_table_finalize      = rb_bf_mark_table_finalize;
+	rb_mark_table_add           = rb_bf_mark_table_add;
+	rb_mark_table_heap_add      = rb_bf_mark_table_heap_add;
+	rb_mark_table_contains      = rb_bf_mark_table_contains;
+	rb_mark_table_heap_contains = rb_bf_mark_table_heap_contains;
+	rb_mark_table_remove        = rb_bf_mark_table_remove;
+	rb_mark_table_heap_remove   = rb_bf_mark_table_heap_remove;
+	rb_mark_table_add_filename  = rb_bf_mark_table_add_filename;
+	rb_mark_table_contains_filename = rb_bf_mark_table_contains_filename;
+	rb_mark_table_remove_filename   = rb_bf_mark_table_remove_filename;
+}
+
+#endif /* _MARK_TABLE_C_ */
diff --git a/marktable.h b/marktable.h
new file mode 100644
index 0000000..3904fd6
--- /dev/null
+++ b/marktable.h
@@ -0,0 +1,17 @@
+#ifndef _MARK_TABLE_H_
+#define _MARK_TABLE_H_
+
+static void (*rb_mark_table_init)();
+static void (*rb_mark_table_prepare)();
+static void (*rb_mark_table_finalize)();
+static void (*rb_mark_table_add)(RVALUE *object);
+static void (*rb_mark_table_heap_add)(struct heaps_slot *hs, RVALUE *object);
+static int  (*rb_mark_table_contains)(RVALUE *object);
+static int  (*rb_mark_table_heap_contains)(struct heaps_slot *hs, RVALUE *object);
+static void (*rb_mark_table_remove)(RVALUE *object);
+static void (*rb_mark_table_heap_remove)(struct heaps_slot *hs, RVALUE *object);
+static void (*rb_mark_table_add_filename)(char *filename);
+static int  (*rb_mark_table_contains_filename)(const char *filename);
+static void (*rb_mark_table_remove_filename)(char *filename);
+
+#endif /* _MARK_TABLE_H_ */
diff --git a/pointerset.h b/pointerset.h
index 15ccab7..bc4733b 100644
--- a/pointerset.h
+++ b/pointerset.h
@@ -1,17 +1,56 @@
+/**
+ * A specialized set data structure, designed to only contain pointers.
+ * It will grow and shrink dynamically.
+ */
 #ifndef _POINTER_SET_H_
 #define _POINTER_SET_H_
 
 typedef void * PointerSetElement;
 typedef struct _PointerSet PointerSet;
 
+/**
+ * Create a new, empty pointer set.
+ */
 PointerSet   *pointer_set_new();
+
+/**
+ * Free the given pointer set.
+ */
 void         pointer_set_free(PointerSet *set);
 
+/**
+ * Insert the given pointer into the pointer set. The data that the
+ * pointer points to is not touched, so <tt>element</tt> may even be
+ * an invalid pointer.
+ */
 void         pointer_set_insert(PointerSet *set, PointerSetElement element);
+
+/**
+ * Remove the given pointer from the pointer set. Nothing will happen
+ * if the pointer isn't in the set.
+ */
 void         pointer_set_delete(PointerSet *set, PointerSetElement element);
+
+/**
+ * Check whether the given pointer is in the pointer set.
+ */
 int          pointer_set_contains(PointerSet *set, PointerSetElement element);
+
+/**
+ * Clear the pointer set.
+ */
 void         pointer_set_reset(PointerSet *set);
+
+/**
+ * Return the number of pointers in the pointer set.
+ */
 unsigned int pointer_set_get_size(PointerSet *set);
+
+/**
+ * Return the amount of space that is used to store the pointers in the set.
+ *
+ * @invariant pointer_set_get_capacity(set) >= pointer_set_get_size(set)
+ */
 unsigned int pointer_set_get_capacity(PointerSet *set);
 
 #endif /* _POINTER_SET_H_ */
diff --git a/ruby.h b/ruby.h
index 32ea164..c43164e 100644
--- a/ruby.h
+++ b/ruby.h
@@ -454,11 +454,12 @@ struct RBignum {
 #define FL_SINGLETON FL_USER0
 #define FL_ALLOCATED (1<<6)
 #define FL_FINALIZE  (1<<7)
+#define FL_MARK      (1<<11)
 #define FL_TAINT     (1<<8)
 #define FL_EXIVAR    (1<<9)
 #define FL_FREEZE    (1<<10)
 
-#define FL_USHIFT    11
+#define FL_USHIFT    12
 
 #define FL_USER0     (1<<(FL_USHIFT+0))
 #define FL_USER1     (1<<(FL_USHIFT+1))

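To illustrate the trade-off the two mark tables embody, here is a toy 
Ruby sketch (an illustration only, not the patch's actual data layout): 
in-object marking writes the FL_MARK bit into every live object each GC 
cycle, dirtying those pages in fork()ed children, while the bitfield 
scheme confines GC writes to a small side array.

```ruby
FL_MARK = 1 << 11   # the bit the patch reserves in ruby.h

# In-object marking (fastmarktable.c style): the mark lives in the
# object, so marking writes to the object's own memory page.
ToyObject = Struct.new(:flags)
obj = ToyObject.new(0)
obj.flags |= FL_MARK                         # cf. rb_fast_mark_table_add
marked = (obj.flags & FL_MARK) != 0          # cf. rb_fast_mark_table_contains
obj.flags &= ~FL_MARK                        # cf. rb_fast_mark_table_remove

# Bitfield marking (marktable.c style): one bit per heap slot, kept in a
# separate array, so the GC never writes into the objects themselves.
WORD_BITS = 64
bitfield = Array.new(4, 0)
mark_slot   = ->(slot) { bitfield[slot / WORD_BITS] |= 1 << (slot % WORD_BITS) }
slot_marked = ->(slot) { ((bitfield[slot / WORD_BITS] >> (slot % WORD_BITS)) & 1) == 1 }

mark_slot.call(70)
puts slot_marked.call(70)   # => true
puts slot_marked.call(71)   # => false
```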

Thread overview: 31+ messages
2008-03-03  9:48 Copy-on-write friendly garbage collector Hongli Lai
2008-03-03 12:38 ` Daniel DeLorme
2008-03-03 13:11 ` Yukihiro Matsumoto
2008-03-04 11:31   ` Gonzalo Garramuño
2008-03-07 12:04   ` Hongli Lai
2008-03-07 15:20     ` Paul Brannan
2008-03-07 16:22       ` Hongli Lai
2008-03-07 18:47         ` Joel VanderWerf
2008-03-08  5:34   ` Daniel DeLorme
2008-03-08  7:50     ` Daniel DeLorme
2008-03-08 10:01       ` Daniel DeLorme
2008-03-08 15:39         ` Yukihiro Matsumoto
2008-03-12 17:23       ` Hongli Lai
2008-03-12 17:38         ` Yukihiro Matsumoto
2008-03-13  0:48         ` Daniel DeLorme
2008-03-13 11:04           ` Hongli Lai
2008-03-15 16:15         ` Hongli Lai
2008-03-13 11:18     ` Hongli Lai
2008-03-14  3:20     ` Hongli Lai
2008-03-14  4:44       ` Daniel DeLorme
2008-03-14 11:25         ` Hongli Lai
2008-03-14 12:01           ` Meinrad Recheis
2008-03-14 15:00           ` Daniel Berger
2008-03-14 15:53             ` Hongli Lai [this message]
2008-03-14 17:34               ` Joel VanderWerf
2008-03-20 10:59     ` Hongli Lai
2008-03-20 11:55       ` Yukihiro Matsumoto
2008-03-20 12:54         ` Hongli Lai
2008-03-20 14:24           ` Yukihiro Matsumoto
2008-03-20 14:45             ` Hongli Lai
2008-03-20 15:28             ` Joel VanderWerf
