From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Jakub Narebski" <jnareb@gmail.com>, "Ted Ts'o" <tytso@mit.edu>,
"Jonathan Nieder" <jrnieder@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Clemens Buchacher" <drizzd@aon.at>
Subject: [PATCH 2/5] add object-cache infrastructure
Date: Mon, 11 Jul 2011 12:17:54 -0400 [thread overview]
Message-ID: <20110711161754.GB10418@sigill.intra.peff.net> (raw)
In-Reply-To: <20110711161332.GA10057@sigill.intra.peff.net>
There is sometimes a need to cache some information about an
object or set of objects persistently across git
invocations. The notes-cache interface can be used for this,
but it is very heavyweight and slow for storing small
values.
This patch introduces a new API, object-cache, which stores
a mapping of objects to values in a concise and efficient
form. See the added API documentation for details.
Signed-off-by: Jeff King <peff@peff.net>
---
As I mentioned earlier, I wanted this to be generic and size-agnostic,
because I'd also like to try caching patch-ids for git-cherry.
Documentation/technical/api-decorate.txt | 3 +
Documentation/technical/api-object-cache.txt | 101 +++++++++++++
Makefile | 2 +
object-cache.c | 204 ++++++++++++++++++++++++++
object-cache.h | 32 ++++
5 files changed, 342 insertions(+), 0 deletions(-)
create mode 100644 Documentation/technical/api-object-cache.txt
create mode 100644 object-cache.c
create mode 100644 object-cache.h
diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
index 8d10b66..c851f59 100644
--- a/Documentation/technical/api-decorate.txt
+++ b/Documentation/technical/api-decorate.txt
@@ -13,6 +13,9 @@ could, for example, map objects into 32-bit integers. For ease of use,
functions are provided for storing the values of arbitrary pointers,
which can point to strings or structs.
+Note that the decorate API only stores the mapping in memory. See the
+object-cache API for persistent storage.
+
Data Structures
---------------
diff --git a/Documentation/technical/api-object-cache.txt b/Documentation/technical/api-object-cache.txt
new file mode 100644
index 0000000..ecead4c
--- /dev/null
+++ b/Documentation/technical/api-object-cache.txt
@@ -0,0 +1,101 @@
+Object Cache API
+================
+
+The object cache API is meant to store a cache of per-object values. The
+aim is for robustness, speed, and simplicity. The stored data must be of
+a fixed size, and the only operations allowed are insertion and
+retrieval with a struct object as the key.
+
+This API is similar to the decorate API, but provides persistence of
+values across multiple invocations. It is also similar to the
+notes-cache API, but is much lighter weight and suitable for storing
+small values. If you are storing large, arbitrary data, consider using
+notes-cache.
+
+
+Storage
+-------
+
+Values are stored both on-disk and in-memory. Newly added values are
+initially stored in a hash table in memory, and written to disk only
+when `object_cache_write` is called.
+
+The disk storage consists of a single file per cache, located in the
+`$GIT_DIR/cache` directory. Each file contains a list of rows ordered by
+sha1. Each row contains a binary sha1, followed by the fixed-size data
+mapped to that sha1.
+
+When the cache is written to disk, the contents of the in-memory data
+and the disk data are merged, with in-memory values taking precedence
+over disk values. The data is written to a temporary file and atomically
+renamed into the new cache file. Thus there is no lock contention
+between competing processes on either reading or writing (though one
+process's updates may be lost).
+
+
+Speed
+-----
+
+Lookup in the cache requires `O(lg(n))` hash comparisons (via binary
+search of the disk contents, or the in-memory hash table).
+
+Insertion into the cache is amortized `O(1)` via the hash table. Writing
+the cache to disk entails `O(n*lg(n) + m)` hash comparisons, where `m`
+is the number of existing disk entries and `n` is the number of newly
+added entries.
+
+
+Data Structures
+---------------
+
+`struct object_cache`::
+
+ This structure represents a single object cache. Its contents
+ should be considered opaque by calling code.
++
+ The `fd`, `disk_entries`, `disk_nr`, and `record_size` fields
+ represent the mmap'd on-disk data. The `mem` field represents
+ the in-memory data (see the decorate API for details).
+
+
+Functions
+---------
+
+`object_cache_init`::
+
+ Initialize a new object cache; this must be called before any
+ other functions are called on the cache. The `name` parameter
+ specifies a human-readable name which will be used for storage
+ in `$GIT_DIR/cache/$name`. The `width` parameter specifies the
+ size, in units of `char`, of the data to be stored (e.g., pass
+ in `sizeof(uint32_t)` to store 32-bit integers).
+
+`object_cache_lookup`::
+
+ Retrieve a value from the object cache. A void pointer to the
+ stored value will be returned, or `NULL` if there is no value.
+
+`object_cache_add`::
+
+ Store a value in the object cache. The value pointer should
+ point to exactly `width` bytes of data.
+
+`object_cache_write`::
+
+ Merge the in-memory and existing disk entries, and write the
+ resulting cache to disk, under the name `$GIT_DIR/cache/$name`.
+ If no in-memory entries exist, the on-disk cache will be left
+ unchanged.
+
+`object_cache_lookup_uint32`::
+
+ Convenience wrapper for retrieving unsigned 32-bit integers. The
+ value will be returned via the `value` pointer. The return value
+ is `0` if a value was found, or negative otherwise (in which
+ case the contents of `value` will be unchanged).
+
+`object_cache_add_uint32`::
+
+ Convenience wrapper for storing unsigned 32-bit integers. Note
+ that integers are stored on disk in network-byte order, so it is
+ safe to access caches from any architecture.
diff --git a/Makefile b/Makefile
index f8c72e1..dd05579 100644
--- a/Makefile
+++ b/Makefile
@@ -535,6 +535,7 @@ LIB_H += merge-recursive.h
LIB_H += notes.h
LIB_H += notes-cache.h
LIB_H += notes-merge.h
+LIB_H += object-cache.h
LIB_H += object.h
LIB_H += pack.h
LIB_H += pack-refs.h
@@ -628,6 +629,7 @@ LIB_OBJS += name-hash.o
LIB_OBJS += notes.o
LIB_OBJS += notes-cache.o
LIB_OBJS += notes-merge.o
+LIB_OBJS += object-cache.o
LIB_OBJS += object.o
LIB_OBJS += pack-check.o
LIB_OBJS += pack-refs.o
diff --git a/object-cache.c b/object-cache.c
new file mode 100644
index 0000000..d019f0a
--- /dev/null
+++ b/object-cache.c
@@ -0,0 +1,204 @@
+#include "cache.h"
+#include "object-cache.h"
+#include "sha1-lookup.h"
+#include "object.h"
+
+static const char *object_cache_path(const char *name)
+{
+ return git_path("cache/%s", name);
+}
+
+static void open_disk_cache(struct object_cache *c, const char *path)
+{
+ struct stat sb;
+
+ c->fd = open(path, O_RDONLY);
+ if (c->fd < 0)
+ return;
+
+ if (fstat(c->fd, &sb) < 0) {
+ close(c->fd);
+ c->fd = -1;
+ return;
+ }
+
+ c->disk_entries = xmmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE,
+ c->fd, 0);
+ c->disk_nr = sb.st_size / c->record_size;
+}
+
+void object_cache_init(struct object_cache *c, const char *name, int width)
+{
+ memset(c, 0, sizeof(*c));
+ c->record_size = 20 + width;
+ c->mem.width = width;
+ open_disk_cache(c, object_cache_path(name));
+}
+
+static void *lookup_disk(const struct object_cache *c,
+ const struct object *obj)
+{
+ int pos;
+
+ pos = sha1_entry_pos(c->disk_entries, c->record_size, 0,
+ 0, c->disk_nr, c->disk_nr, obj->sha1);
+ if (pos < 0)
+ return NULL;
+
+ return c->disk_entries + (pos * c->record_size) + 20;
+}
+
+const void *object_cache_lookup(const struct object_cache *c,
+ const struct object *obj)
+{
+ void *r;
+
+ r = lookup_decoration_value(&c->mem, obj);
+ if (!r)
+ r = lookup_disk(c, obj);
+ return r;
+}
+
+void object_cache_add(struct object_cache *c, const struct object *obj,
+ const void *value)
+{
+ add_decoration_value(&c->mem, obj, value, NULL);
+}
+
+static unsigned char *flatten_mem_entries(struct object_cache *c)
+{
+ unsigned char *p;
+ unsigned char *ret;
+ int nr;
+
+ ret = xmalloc(c->mem.nr * c->record_size);
+ nr = 0;
+ for (p = c->mem.hash; p < c->mem.end; p += c->mem.stride) {
+ struct object_decoration *e = (struct object_decoration *)p;
+ unsigned char *out;
+
+ if (!e->base)
+ continue;
+
+ if (nr == c->mem.nr)
+ die("BUG: decorate hash contained extra values");
+
+ out = ret + (nr * c->record_size);
+ hashcpy(out, e->base->sha1);
+ out += 20;
+ memcpy(out, e->decoration, c->mem.width);
+ nr++;
+ }
+
+ return ret;
+}
+
+static int void_hashcmp(const void *a, const void *b)
+{
+ return hashcmp(a, b);
+}
+
+int object_cache_write(struct object_cache *c, const char *name)
+{
+ const char *path = object_cache_path(name);
+ struct strbuf tempfile = STRBUF_INIT;
+ int fd;
+ unsigned char *mem_entries;
+ int i, j;
+ int r;
+
+ if (!c->mem.nr)
+ return 0;
+
+ strbuf_addf(&tempfile, "%s.XXXXXX", path);
+ if (safe_create_leading_directories(tempfile.buf) < 0 ||
+ (fd = git_mkstemp_mode(tempfile.buf, 0755)) < 0) {
+ strbuf_release(&tempfile);
+ return -1;
+ }
+
+ mem_entries = flatten_mem_entries(c);
+ qsort(mem_entries, c->mem.nr, c->record_size, void_hashcmp);
+
+#define WRITE_ENTRY(e) \
+do { \
+ if (write_in_full(fd, e, c->record_size) < 0) { \
+ close(fd); \
+ unlink(tempfile.buf); \
+ strbuf_release(&tempfile); \
+ free(mem_entries); \
+ return -1; \
+ } \
+} while (0)
+
+ i = j = 0;
+ while (i < c->mem.nr && j < c->disk_nr) {
+ unsigned char *mem = mem_entries + (i * c->record_size);
+ unsigned char *disk = c->disk_entries + (j * c->record_size);
+ int cmp = hashcmp(mem, disk);
+
+ /* replace disk entries with in-memory ones */
+ if (cmp == 0)
+ j++;
+ else if (cmp < 0) {
+ WRITE_ENTRY(mem);
+ i++;
+ }
+ else {
+ WRITE_ENTRY(disk);
+ j++;
+ }
+ }
+ while (i < c->mem.nr) {
+ unsigned char *mem = mem_entries + (i * c->record_size);
+ WRITE_ENTRY(mem);
+ i++;
+ }
+ while (j < c->disk_nr) {
+ unsigned char *disk = c->disk_entries + (j * c->record_size);
+ WRITE_ENTRY(disk);
+ j++;
+ }
+#undef WRITE_ENTRY
+
+ free(mem_entries);
+
+ if (close(fd) < 0 || rename(tempfile.buf, path) < 0)
+ r = -1;
+ else
+ r = 0;
+
+ unlink(tempfile.buf);
+ strbuf_release(&tempfile);
+ return r;
+}
+
+int object_cache_lookup_uint32(const struct object_cache *c,
+ const struct object *obj,
+ uint32_t *value)
+{
+ const uint32_t *out;
+
+ if (c->record_size != 24)
+ die("BUG: size mismatch in object cache lookup (%d != 24)",
+ c->record_size);
+
+ out = object_cache_lookup(c, obj);
+ if (!out)
+ return -1;
+
+ *value = ntohl(*out);
+ return 0;
+}
+
+void object_cache_add_uint32(struct object_cache *c,
+ const struct object *obj,
+ uint32_t value)
+{
+ if (c->record_size != 24)
+ die("BUG: size mismatch in object cache add (%d != 24)",
+ c->record_size);
+
+ value = htonl(value);
+ object_cache_add(c, obj, &value);
+}
diff --git a/object-cache.h b/object-cache.h
new file mode 100644
index 0000000..b869f1e
--- /dev/null
+++ b/object-cache.h
@@ -0,0 +1,32 @@
+#ifndef OBJECT_CACHE_H
+#define OBJECT_CACHE_H
+
+#include "decorate.h"
+
+struct object_cache {
+ /* mmap'd disk entries */
+ int fd;
+ unsigned char *disk_entries;
+ int disk_nr;
+ int record_size;
+
+ /* in memory entries */
+ struct decoration mem;
+};
+
+void object_cache_init(struct object_cache *, const char *name, int width);
+const void *object_cache_lookup(const struct object_cache *,
+ const struct object *);
+void object_cache_add(struct object_cache *, const struct object *,
+ const void *value);
+int object_cache_write(struct object_cache *, const char *name);
+
+/* Convenience wrappers around object_cache_{lookup,add} */
+int object_cache_lookup_uint32(const struct object_cache *,
+ const struct object *,
+ uint32_t *value);
+void object_cache_add_uint32(struct object_cache *,
+ const struct object *,
+ uint32_t value);
+
+#endif /* OBJECT_CACHE_H */
--
1.7.6.37.g989c6
next prev parent reply other threads:[~2011-07-11 16:18 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-11 16:13 [RFC/PATCH 0/5] generation numbers for faster traversals Jeff King
2011-07-11 16:16 ` [PATCH 1/5] decorate: allow storing values instead of pointers Jeff King
2011-07-11 16:39 ` Jeff King
2011-07-11 19:06 ` Junio C Hamano
2011-07-11 21:20 ` Jeff King
2011-07-11 16:17 ` Jeff King [this message]
2011-07-11 16:46 ` [PATCH 2/5] add object-cache infrastructure Jeff King
2011-07-11 16:58 ` Shawn Pearce
2011-07-11 19:17 ` Junio C Hamano
2011-07-11 22:01 ` Jeff King
2011-07-11 23:21 ` Junio C Hamano
2011-07-11 23:42 ` Junio C Hamano
2011-07-12 0:03 ` Jeff King
2011-07-12 19:38 ` Clemens Buchacher
2011-07-12 19:45 ` Jeff King
2011-07-12 21:07 ` Clemens Buchacher
2011-07-12 21:15 ` Clemens Buchacher
2011-07-12 21:36 ` Jeff King
2011-07-14 8:04 ` Clemens Buchacher
2011-07-14 16:26 ` Illia Bobyr
2011-07-13 1:33 ` John Szakmeister
2011-07-12 0:14 ` Illia Bobyr
2011-07-12 5:35 ` Jeff King
2011-07-12 21:52 ` Illia Bobyr
2011-07-12 6:36 ` Miles Bader
2011-07-12 10:41 ` Jakub Narebski
2011-07-12 17:57 ` Jeff King
2011-07-12 18:41 ` Junio C Hamano
2011-07-13 6:37 ` Jeff King
2011-07-13 17:49 ` Junio C Hamano
2011-07-11 16:18 ` [PATCH 3/5] commit: add commit_generation function Jeff King
2011-07-11 17:57 ` Clemens Buchacher
2011-07-11 21:10 ` Jeff King
2011-07-11 16:18 ` [PATCH 4/5] pretty: support %G to show the generation number of a commit Jeff King
2011-07-11 16:18 ` [PATCH 5/5] limit "contains" traversals based on commit generation Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110711161754.GB10418@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=avarab@gmail.com \
--cc=drizzd@aon.at \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
--cc=jrnieder@gmail.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).