git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Jakub Narebski" <jnareb@gmail.com>, "Ted Ts'o" <tytso@mit.edu>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>
Subject: [PATCH 1/5] decorate: allow storing values instead of pointers
Date: Mon, 11 Jul 2011 12:16:50 -0400	[thread overview]
Message-ID: <20110711161649.GA10418@sigill.intra.peff.net> (raw)
In-Reply-To: <20110711161332.GA10057@sigill.intra.peff.net>

The decorate API provides a mapping of objects to arbitrary
values. Until now, it did this by allowing you to store a
void pointer, which could point to other storage. This has
two problems:

  1. It's inefficient. To store even a small value, you have
     to allocate persistent storage. So we end up storing
     both the value and a pointer to it.

  2. It's a pain to use. To avoid heap overhead for small
     values, you have to either use a custom allocater, or
     you have to shoe-horn the value into the void pointer
     (if it's small enough to fit).

This patch lets you store fixed-size values directly in the
hash table without allocating them elsewhere. This is a
definite win for any value smaller than or equal to a
pointer. It's probably a win for slightly larger values, but
may eventually be slower for storing large structs (because
the values are copied when the hash grows).

It also provides a more natural interface for storing small
values; see the changes in fast-export.c, which can now drop
its pointer arithmetic magic.

The original "store and retrieve a void pointer" API is easy
to implement on top of this (the values we store are the
pointers). The add_decoration and lookup_decoration
functions are kept for compatibility.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/api-decorate.txt |  170 +++++++++++++++++++++++++++++-
 builtin/fast-export.c                    |   32 ++----
 decorate.c                               |  102 ++++++++++++-------
 decorate.h                               |   22 ++++-
 4 files changed, 263 insertions(+), 63 deletions(-)

diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
index 1d52a6c..8d10b66 100644
--- a/Documentation/technical/api-decorate.txt
+++ b/Documentation/technical/api-decorate.txt
@@ -1,6 +1,172 @@
 decorate API
 ============
 
-Talk about <decorate.h>
+The decorate API is a system for efficiently mapping objects to values
+in memory. It is slightly slower than an actual member of an object
+struct (because it incurs a hash lookup), but it uses less memory when
+the mapping is not in use, or when the number of decorated objects is
+small compared to the total number of objects.
 
-(Linus)
+For efficiency, the mapping is capable of storing actual byte values, as
+long as the byte values for each element are of a fixed size. So one
+could, for example, map objects into 32-bit integers. For ease of use,
+functions are provided for storing the values of arbitrary pointers,
+which can point to strings or structs.
+
+Data Structures
+---------------
+
+`struct decoration`::
+
+	This structure represents a single mapping of objects to
+	values. Its fields are:
+
+	`name`:::
+		This field is not used by the decorate API itself, but
+		may be used by calling code.
+
+	`width`:::
+		This field specifies the width (in units of `char`) of
+		the values to be stored. This field must be set to its
+		final value before any objects are added to the mapping.
+		A width of `0` is equivalent to `sizeof(void *)`.
+
+	`nr`:::
+		The number of objects currently mapped by the
+		decoration.
+
+	`size`:::
+		The number of hash slots allocated; this is kept to at
+		least 3/2 of the number of actual slots used, to keep
+		the hash sparse.
+
+	`hash`:::
+		A pointer to an array of actual `object_decoration`
+		structs. Note that because the width of `struct
+		object_decoration` is not known until runtime, this
+		array is stored with type `unsigned char *`. To access
+		individual items, one must perform pointer arithmetic
+		using the `stride` field (see the description of
+		`stride` below, or the "Examples" section).
+
+	`stride`:::
+		The size of a single object_decoration record.
+		Technically this can be computed as a function of the
+		width, but it is here for ease-of-use.
+
+	`end`:::
+		A pointer to storage just past the end of the `hash`
+		array. This can be used as a boundary for iterating the
+		items in the hash (see "Examples" below). Technically
+		this can be computed as a function of the hash pointer,
+		the size field, and the stride, but is here for
+		ease-of-use.
+
+`struct object_decoration`::
+
+	A structure representing the decoration of a single object.
+	Callers will not normally need to use this object unless they
+	are iterating all elements in the decoration hash. The `base`
+	field points to the object being mapped (or `NULL` if it is
+	an empty hash slot). The `decoration` field stores the mapped
+	value as a sequence of bytes; use the `width` field in `struct
+	decoration` to know the exact size.
+
+
+Functions
+---------
+
+`add_decoration_value`::
+
+	Add a mapping from an object to a sequence of bytes. The number
+	of bytes pointed to by `decoration` should be equal to the
+	`width` field of the `struct decoration`. If the `old` parameter
+	is not NULL and a there was already a value for the object, the
+	bytes of the old value are copied into `old`.  The return value
+	is `1` if there was a previous value, or `0` otherwise. Note
+	that if there is no previous value, then `old` is left
+	untouched; it is the responsibility of the caller to either
+	check the return value or to set a sentinel value in `old`.
+
+`lookup_decoration_value`::
+
+	Retrieve a decoration from the mapping. The return value is a
+	pointer to the sequence of bytes representing the value (of
+	length `width`), or `NULL` if no value is found.
+
+`add_decoration`::
+
+	Add a mapping from an object to a void pointer. If a previous
+	pointer exists for the object, it is returned; otherwise, `NULL`
+	is returned.
+
+`lookup_decoration`::
+
+	Retrieve a void pointer from the mapping. The return value is
+	the stored pointer, or `NULL` if there is no stored pointer.
+
+
+Examples
+--------
+
+Store and retrieve pointers to structs:
+
+-------------------------------------------------------------------
+/* no need to set width parameter; it defaults to sizeof(void *) */
+static struct decoration commit_foos;
+
+void store_foo(const struct commit *c, const char *name)
+{
+	struct foo *value = alloc_foo(name);
+	struct foo *old;
+
+	old = add_decoration(&commit_foos, c->object, value);
+	free(old);
+}
+
+const struct foo *get_foo(const struct commit *c)
+{
+	return lookup_decoration(&commit_foos, c->object);
+}
+-------------------------------------------------------------------
+
+Store and retrieve `unsigned long` integers:
+
+-------------------------------------------------------------------
+static struct decoration longs = { "my longs", sizeof(unsigned long) };
+
+void store_long(const struct object *obj, unsigned long value)
+{
+	unsigned long old;
+	if (add_decoration_value(&longs, obj, &value, &old)
+		printf("old value was %lu\n", old);
+}
+
+void print_long(const struct object *obj)
+{
+	unsigned long *value = lookup_decoration_value(&longs, obj);
+	if (!value)
+		printf("no value\n");
+	else
+		printf("value is %lu\n", *value);
+}
+-------------------------------------------------------------------
+
+Iterate over all stored decorations:
+
+-------------------------------------------------------------------
+void dump_longs(void)
+{
+	unsigned char *p;
+	for (p = longs.hash; p < longs.end; p += longs.stride) {
+		struct object_decoration *e = (struct object_decoration *)p;
+		unsigned long *value = (unsigned long *)e->decoration;
+
+		/* empty hash slot */
+		if (!e->base)
+			continue;
+
+		printf("%s -> %lu\n", sha1_to_hex(e->base->sha1), *value);
+	}
+}
+-------------------------------------------------------------------
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index daf1945..c053f1f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -59,7 +59,7 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 	return 0;
 }
 
-static struct decoration idnums;
+static struct decoration idnums = { NULL, sizeof(uint32_t) };
 static uint32_t last_idnum;
 
 static int has_unshown_parent(struct commit *commit)
@@ -73,20 +73,9 @@ static int has_unshown_parent(struct commit *commit)
 	return 0;
 }
 
-/* Since intptr_t is C99, we do not use it here */
-static inline uint32_t *mark_to_ptr(uint32_t mark)
-{
-	return ((uint32_t *)NULL) + mark;
-}
-
-static inline uint32_t ptr_to_mark(void * mark)
-{
-	return (uint32_t *)mark - (uint32_t *)NULL;
-}
-
 static inline void mark_object(struct object *object, uint32_t mark)
 {
-	add_decoration(&idnums, object, mark_to_ptr(mark));
+	add_decoration_value(&idnums, object, &mark, NULL);
 }
 
 static inline void mark_next_object(struct object *object)
@@ -96,10 +85,10 @@ static inline void mark_next_object(struct object *object)
 
 static int get_object_mark(struct object *object)
 {
-	void *decoration = lookup_decoration(&idnums, object);
-	if (!decoration)
+	uint32_t *mark = lookup_decoration_value(&idnums, object);
+	if (!mark)
 		return 0;
-	return ptr_to_mark(decoration);
+	return *mark;
 }
 
 static void show_progress(void)
@@ -536,20 +525,19 @@ static void handle_tags_and_duplicates(struct string_list *extra_refs)
 
 static void export_marks(char *file)
 {
-	unsigned int i;
-	uint32_t mark;
-	struct object_decoration *deco = idnums.hash;
 	FILE *f;
 	int e = 0;
+	unsigned char *p;
 
 	f = fopen(file, "w");
 	if (!f)
 		die_errno("Unable to open marks file %s for writing.", file);
 
-	for (i = 0; i < idnums.size; i++) {
+	for (p = idnums.hash; p < idnums.end; p += idnums.stride) {
+		struct object_decoration *deco = (struct object_decoration *)p;
 		if (deco->base && deco->base->type == 1) {
-			mark = ptr_to_mark(deco->decoration);
-			if (fprintf(f, ":%"PRIu32" %s\n", mark,
+			uint32_t *mark = (uint32_t *)deco->decoration;
+			if (fprintf(f, ":%"PRIu32" %s\n", *mark,
 				sha1_to_hex(deco->base->sha1)) < 0) {
 			    e = 1;
 			    break;
diff --git a/decorate.c b/decorate.c
index 2f8a63e..55ebaee 100644
--- a/decorate.c
+++ b/decorate.c
@@ -6,6 +6,11 @@
 #include "object.h"
 #include "decorate.h"
 
+#define WIDTH(n) ((n)->width ? (n)->width : sizeof(void *))
+#define REF_OFFSET(n, i) ((n)->hash + ((i) * n->stride))
+#define REF_OBJECT(n, obj) (REF_OFFSET(n, hash_obj(obj, n->size)))
+#define OBJ(p) ((struct object_decoration *)(p))
+
 static unsigned int hash_obj(const struct object *obj, unsigned int n)
 {
 	unsigned int hash;
@@ -14,75 +19,100 @@ static unsigned int hash_obj(const struct object *obj, unsigned int n)
 	return hash % n;
 }
 
-static void *insert_decoration(struct decoration *n, const struct object *base, void *decoration)
+static int insert_decoration(struct decoration *n, const struct object *base,
+			     const void *decoration, void *old)
 {
-	int size = n->size;
-	struct object_decoration *hash = n->hash;
-	unsigned int j = hash_obj(base, size);
-
-	while (hash[j].base) {
-		if (hash[j].base == base) {
-			void *old = hash[j].decoration;
-			hash[j].decoration = decoration;
-			return old;
+	unsigned long width = WIDTH(n);
+	unsigned char *p = REF_OBJECT(n, base);
+
+	while (OBJ(p)->base) {
+		struct object_decoration *e = OBJ(p);
+		if (e->base == base) {
+			if (old)
+				memcpy(old, e->decoration, width);
+			memcpy(e->decoration, decoration, width);
+			return 1;
 		}
-		if (++j >= size)
-			j = 0;
+		p += n->stride;
+		if (p >= n->end)
+			p = n->hash;
 	}
-	hash[j].base = base;
-	hash[j].decoration = decoration;
+	OBJ(p)->base = base;
+	memcpy(OBJ(p)->decoration, decoration, width);
 	n->nr++;
-	return NULL;
+	return 0;
 }
 
 static void grow_decoration(struct decoration *n)
 {
-	int i;
-	int old_size = n->size;
-	struct object_decoration *old_hash = n->hash;
+	unsigned char *old_hash = n->hash;
+	unsigned char *old_end = n->end;
+	unsigned char *p;
 
-	n->size = (old_size + 1000) * 3 / 2;
-	n->hash = xcalloc(n->size, sizeof(struct object_decoration));
+	n->stride = sizeof(struct object_decoration) + WIDTH(n);
+	n->size = (n->size + 1000) * 3 / 2;
+	n->hash = xcalloc(n->size, n->stride);
+	n->end = n->hash + (n->size * n->stride);
 	n->nr = 0;
 
-	for (i = 0; i < old_size; i++) {
-		const struct object *base = old_hash[i].base;
-		void *decoration = old_hash[i].decoration;
-
-		if (!base)
+	for (p = old_hash; p < old_end; p += n->stride) {
+		if (!OBJ(p)->base)
 			continue;
-		insert_decoration(n, base, decoration);
+		insert_decoration(n, OBJ(p)->base, OBJ(p)->decoration, NULL);
 	}
 	free(old_hash);
 }
 
 /* Add a decoration pointer, return any old one */
 void *add_decoration(struct decoration *n, const struct object *obj,
-		void *decoration)
+		     void *decoration)
 {
-	int nr = n->nr + 1;
+	void *old = NULL;
 
-	if (nr > n->size * 2 / 3)
+	add_decoration_value(n, obj, &decoration, &old);
+	return old;
+}
+
+int add_decoration_value(struct decoration *n,
+			 const struct object *obj,
+			 const void *decoration,
+			 void *old)
+{
+	if (n->nr + 1 > n->size * 2 / 3)
 		grow_decoration(n);
-	return insert_decoration(n, obj, decoration);
+	return insert_decoration(n, obj, decoration, old);
 }
 
+
 /* Lookup a decoration pointer */
-void *lookup_decoration(struct decoration *n, const struct object *obj)
+void *lookup_decoration(const struct decoration *n, const struct object *obj)
 {
-	unsigned int j;
+	void **v;
+
+	v = lookup_decoration_value(n, obj);
+	if (!v)
+		return NULL;
+	return *v;
+}
+
+void *lookup_decoration_value(const struct decoration *n,
+			      const struct object *obj)
+{
+	const unsigned char *p;
 
 	/* nothing to lookup */
 	if (!n->size)
 		return NULL;
-	j = hash_obj(obj, n->size);
+
+	p = REF_OBJECT(n, obj);
 	for (;;) {
-		struct object_decoration *ref = n->hash + j;
+		struct object_decoration *ref = OBJ(p);
 		if (ref->base == obj)
 			return ref->decoration;
 		if (!ref->base)
 			return NULL;
-		if (++j == n->size)
-			j = 0;
+		p += n->stride;
+		if (p >= n->end)
+			p = n->hash;
 	}
 }
diff --git a/decorate.h b/decorate.h
index e732804..b75054a 100644
--- a/decorate.h
+++ b/decorate.h
@@ -3,16 +3,32 @@
 
 struct object_decoration {
 	const struct object *base;
-	void *decoration;
+	unsigned char decoration[FLEX_ARRAY];
 };
 
 struct decoration {
 	const char *name;
+	/* width of data we're holding; must be set before adding */
+	unsigned int width;
 	unsigned int size, nr;
-	struct object_decoration *hash;
+	/*
+	 * The hash contains object_decoration structs, but we don't know their
+	 * size until runtime. So we store is as a pointer to characters to
+	 * make pointer arithmetic easier.
+	 */
+	unsigned char *hash;
+	unsigned int stride; /* size of a single record */
+	unsigned char *end; /* end of the hash array */
 };
 
 extern void *add_decoration(struct decoration *n, const struct object *obj, void *decoration);
-extern void *lookup_decoration(struct decoration *n, const struct object *obj);
+extern void *lookup_decoration(const struct decoration *n, const struct object *obj);
+
+extern int add_decoration_value(struct decoration *n,
+				const struct object *obj,
+				const void *decoration,
+				void *old);
+extern void *lookup_decoration_value(const struct decoration *n,
+				     const struct object *obj);
 
 #endif
-- 
1.7.6.37.g989c6

  reply	other threads:[~2011-07-11 16:16 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-11 16:13 [RFC/PATCH 0/5] generation numbers for faster traversals Jeff King
2011-07-11 16:16 ` Jeff King [this message]
2011-07-11 16:39   ` [PATCH 1/5] decorate: allow storing values instead of pointers Jeff King
2011-07-11 19:06   ` Junio C Hamano
2011-07-11 21:20     ` Jeff King
2011-07-11 16:17 ` [PATCH 2/5] add object-cache infrastructure Jeff King
2011-07-11 16:46   ` Jeff King
2011-07-11 16:58     ` Shawn Pearce
2011-07-11 19:17   ` Junio C Hamano
2011-07-11 22:01     ` Jeff King
2011-07-11 23:21       ` Junio C Hamano
2011-07-11 23:42         ` Junio C Hamano
2011-07-12  0:03         ` Jeff King
2011-07-12 19:38           ` Clemens Buchacher
2011-07-12 19:45             ` Jeff King
2011-07-12 21:07               ` Clemens Buchacher
2011-07-12 21:15                 ` Clemens Buchacher
2011-07-12 21:36                 ` Jeff King
2011-07-14  8:04                   ` Clemens Buchacher
2011-07-14 16:26                     ` Illia Bobyr
2011-07-13  1:33                 ` John Szakmeister
2011-07-12  0:14         ` Illia Bobyr
2011-07-12  5:35           ` Jeff King
2011-07-12 21:52             ` Illia Bobyr
2011-07-12  6:36       ` Miles Bader
2011-07-12 10:41   ` Jakub Narebski
2011-07-12 17:57     ` Jeff King
2011-07-12 18:41       ` Junio C Hamano
2011-07-13  6:37         ` Jeff King
2011-07-13 17:49           ` Junio C Hamano
2011-07-11 16:18 ` [PATCH 3/5] commit: add commit_generation function Jeff King
2011-07-11 17:57   ` Clemens Buchacher
2011-07-11 21:10     ` Jeff King
2011-07-11 16:18 ` [PATCH 4/5] pretty: support %G to show the generation number of a commit Jeff King
2011-07-11 16:18 ` [PATCH 5/5] limit "contains" traversals based on commit generation Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110711161649.GA10418@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).