* [PATCH 0/7] tweaking the delta base cache
@ 2016-08-22 21:57 Jeff King
  2016-08-22 21:57 ` [PATCH 1/7] cache_or_unpack_entry: drop keep_cache parameter Jeff King
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Jeff King @ 2016-08-22 21:57 UTC
  To: git

After the experiments I did with --depth=50 recently, I noticed there
seemed to be a lot of room for improvement in the delta-base-cache (and
in particular, there seemed to be a lack of actual numbers).

So I tried a series of experiments, and these are the tweaks I came up
with. There are a lot of numbers and analysis in the commit messages
themselves. The most dramatic effect: before this series, bumping
core.deltaBaseCacheLimit for the kernel gets you basically nothing,
whereas with the series applied, I get:

  core.deltaBaseCacheLimit    time to run git log -Sfoo --raw
  ------------------------    -------------------------------
                      128m    4m56.486s
                      256m    4m33.769s
                      512m    4m12.968s
                     1024m    3m32.623s

Note that I don't actually propose bumping the memory limit in this
series. That's a bit more contentious, as it's a space/time tradeoff
that really does use more resources, and people may not want to spend
the RAM. This series just adjusts the underlying data structures so
that we use the RAM we are already allotted more efficiently.

The interesting changes are really patches 5 and 6, which adjust the LRU
management and the underlying hash structure.

There are a few ideas I thought of or saw in past threads but didn't
explore. I don't plan on digging further on them right now, so if
anybody else wants to do so, be my guest:

  - limiting the size of items entering the cache (e.g., to avoid a
    single giant blob blowing out all of the other entries; a sketch
    follows this list)

  - something more clever than LRU, like weighting by a mix of size and
    recency

  - I didn't look at the criteria for adding entries to the cache at all

  - we seem to drop cache entries as we use them in unpack_entry(); I'm
    not sure if we would do better to retain them and let them leave via
    LRU expiration
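
A minimal sketch of what that first item might look like, reusing the
add_delta_base_cache() and delta_base_cache_limit names from the
patches below (the cap fraction here is made up):

  static void maybe_cache_base(struct packed_git *p, off_t base_offset,
                               void *base, unsigned long base_size,
                               enum object_type type)
  {
  	/*
  	 * Hypothetical cap: refuse to cache any single entry bigger
  	 * than a quarter of the whole budget, so one giant blob cannot
  	 * evict everything else. The caller keeps ownership of "base"
  	 * when we decline.
  	 */
  	if (base_size > delta_base_cache_limit / 4)
  		return;
  	add_delta_base_cache(p, base_offset, base, base_size, type);
  }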

So there may be more work, but I think these improvements stand on their
own.

  [1/7]: cache_or_unpack_entry: drop keep_cache parameter
  [2/7]: clear_delta_base_cache_entry: use a more descriptive name
  [3/7]: release_delta_base_cache: reuse existing detach function
  [4/7]: delta_base_cache: use list.h for LRU
  [5/7]: delta_base_cache: drop special treatment of blobs
  [6/7]: delta_base_cache: use hashmap.h
  [7/7]: t/perf: add basic perf tests for delta base cache

-Peff


* [PATCH 1/7] cache_or_unpack_entry: drop keep_cache parameter
  2016-08-22 21:57 [PATCH 0/7] tweaking the delta base cache Jeff King
@ 2016-08-22 21:57 ` Jeff King
  2016-08-23 21:45   ` Junio C Hamano
  2016-08-22 21:57 ` [PATCH 2/7] clear_delta_base_cache_entry: use a more descriptive name Jeff King
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2016-08-22 21:57 UTC
  To: git

There is only one caller of cache_or_unpack_entry() and it
always passes 1 for the keep_cache parameter. We can
simplify it by dropping the "!keep_cache" case.

Another call, which did pass 0, was dropped in abe601b
(sha1_file: remove recursion in unpack_entry, 2013-03-27),
as unpack_entry() now does more complicated things than a
simple unpack when there is a cache miss.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 3045aea..2333911 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2129,25 +2129,18 @@ static void clear_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
-	unsigned long *base_size, enum object_type *type, int keep_cache)
+	unsigned long *base_size, enum object_type *type)
 {
 	struct delta_base_cache_entry *ent;
-	void *ret;
 
 	ent = get_delta_base_cache_entry(p, base_offset);
 
 	if (!eq_delta_base_cache_entry(ent, p, base_offset))
 		return unpack_entry(p, base_offset, type, base_size);
 
-	ret = ent->data;
-
-	if (!keep_cache)
-		clear_delta_base_cache_entry(ent);
-	else
-		ret = xmemdupz(ent->data, ent->size);
 	*type = ent->type;
 	*base_size = ent->size;
-	return ret;
+	return xmemdupz(ent->data, ent->size);
 }
 
 static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
@@ -2755,7 +2748,7 @@ static void *read_packed_sha1(const unsigned char *sha1,
 
 	if (!find_pack_entry(sha1, &e))
 		return NULL;
-	data = cache_or_unpack_entry(e.p, e.offset, size, type, 1);
+	data = cache_or_unpack_entry(e.p, e.offset, size, type);
 	if (!data) {
 		/*
 		 * We're probably in deep shit, but let's try to fetch
-- 
2.10.0.rc1.118.ge2299eb



* [PATCH 2/7] clear_delta_base_cache_entry: use a more descriptive name
  2016-08-22 21:57 [PATCH 0/7] tweaking the delta base cache Jeff King
  2016-08-22 21:57 ` [PATCH 1/7] cache_or_unpack_entry: drop keep_cache parameter Jeff King
@ 2016-08-22 21:57 ` Jeff King
  2016-08-23 21:47   ` Junio C Hamano
  2016-08-22 21:57 ` [PATCH 3/7] release_delta_base_cache: reuse existing detach function Jeff King
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2016-08-22 21:57 UTC
  To: git

The delta base cache entries are stored in a fixed-length
hash table. So the way to remove an entry is to "clear" the
slot in the table, and that is what this function does.

However, the name is a leaky abstraction. If we were to
change the hash table implementation, it would no longer be
about "clearing". We should name it after _what_ it does,
not _how_ it does it. I.e., something like "remove" instead
of "clear".

But that does not tell the whole story, either. The subtle
thing about this function is that it removes the entry, but
does not free the entry data. So a more descriptive name is
"detach"; we give ownership of the data buffer to the
caller, and release any other resources.

This patch uses the name detach_delta_base_cache_entry().
We could further model this after functions like
strbuf_detach(), which pass back all of the detached
information. However, since there are so many bits of
information in the struct (the data, the size, the type),
and so few callers (only one), it's not worth that
awkwardness. The name change and a comment can make the
intent clear.
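
For reference, such a strbuf_detach()-style variant would look
something like:

  static void *detach_delta_base_cache_entry(struct delta_base_cache_entry *ent,
                                             enum object_type *type,
                                             unsigned long *size);

returning the data buffer and copying the other fields out through
the pointers.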

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 2333911..1d0810c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2120,7 +2120,12 @@ static int in_delta_base_cache(struct packed_git *p, off_t base_offset)
 	return eq_delta_base_cache_entry(ent, p, base_offset);
 }
 
-static void clear_delta_base_cache_entry(struct delta_base_cache_entry *ent)
+/*
+ * Remove the entry from the cache, but do _not_ free the associated
+ * entry data. The caller takes ownership of the "data" buffer, and
+ * should copy out any fields it wants before detaching.
+ */
+static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 {
 	ent->data = NULL;
 	ent->lru.next->prev = ent->lru.prev;
@@ -2243,7 +2248,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 			type = ent->type;
 			data = ent->data;
 			size = ent->size;
-			clear_delta_base_cache_entry(ent);
+			detach_delta_base_cache_entry(ent);
 			base_from_cache = 1;
 			break;
 		}
-- 
2.10.0.rc1.118.ge2299eb



* [PATCH 3/7] release_delta_base_cache: reuse existing detach function
  2016-08-22 21:57 [PATCH 0/7] tweaking the delta base cache Jeff King
  2016-08-22 21:57 ` [PATCH 1/7] cache_or_unpack_entry: drop keep_cache parameter Jeff King
  2016-08-22 21:57 ` [PATCH 2/7] clear_delta_base_cache_entry: use a more descriptive name Jeff King
@ 2016-08-22 21:57 ` Jeff King
  2016-08-23 21:49   ` Junio C Hamano
  2016-08-22 21:59 ` [PATCH 4/7] delta_base_cache: use list.h for LRU Jeff King
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2016-08-22 21:57 UTC
  To: git

This function drops an entry entirely from the cache,
meaning that aside from the freeing of the buffer, it is
exactly equivalent to detach_delta_base_cache_entry(). Let's
build on top of the detach function, which shortens the code
and will make it simpler when we change out the underlying
storage in future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 1d0810c..8264b39 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2152,10 +2152,7 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
 {
 	if (ent->data) {
 		free(ent->data);
-		ent->data = NULL;
-		ent->lru.next->prev = ent->lru.prev;
-		ent->lru.prev->next = ent->lru.next;
-		delta_base_cached -= ent->size;
+		detach_delta_base_cache_entry(ent);
 	}
 }
 
-- 
2.10.0.rc1.118.ge2299eb



* [PATCH 4/7] delta_base_cache: use list.h for LRU
  2016-08-22 21:57 [PATCH 0/7] tweaking the delta base cache Jeff King
                   ` (2 preceding siblings ...)
  2016-08-22 21:57 ` [PATCH 3/7] release_delta_base_cache: reuse existing detach function Jeff King
@ 2016-08-22 21:59 ` Jeff King
  2016-08-22 23:18   ` Eric Wong
  2016-08-22 21:59 ` [PATCH 5/7] delta_base_cache: drop special treatment of blobs Jeff King
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2016-08-22 21:59 UTC
  To: git

We keep an LRU list of entries for when we need to drop
something from an over-full cache. The list is implemented
as a circular doubly-linked list, which is exactly what
list.h provides. We can save a few lines by using the list.h
macros and functions. More importantly, this makes the code
easier to follow, as the reader sees explicit concepts like
"list_add_tail()" instead of pointer manipulation.

As a bonus, the list_entry() macro lets us place the lru
pointers anywhere inside the delta_base_cache_entry struct
(as opposed to just casting the pointer, which requires it
at the front of the struct). This will be useful in later
patches when we need to place other items at the front of
the struct (e.g., our hashmap implementation requires this).

Signed-off-by: Jeff King <peff@peff.net>
---
I think the result is much nicer, but I found list_entry() a little
disappointing, because we lack typeof(). So you are stuck writing:

  struct delta_base_cache_entry *f =
    list_entry(p, struct delta_base_cache_entry, lru);

I waffled on adding something like:

  LIST_ITEM(struct delta_base_cache_entry, f, p, lru);

to declare "f" as above. But it's getting rather magical and un-C-like.
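
For concreteness, such a macro would be little more than:

  #define LIST_ITEM(type, var, pos, member) \
  	type *var = list_entry(pos, type, member)

i.e., a macro that hides a variable declaration, which is where the
magical, un-C-like feeling comes from.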

 sha1_file.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 8264b39..c02e785 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -24,6 +24,7 @@
 #include "streaming.h"
 #include "dir.h"
 #include "mru.h"
+#include "list.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -2077,13 +2078,10 @@ static void *unpack_compressed_entry(struct packed_git *p,
 
 static size_t delta_base_cached;
 
-static struct delta_base_cache_lru_list {
-	struct delta_base_cache_lru_list *prev;
-	struct delta_base_cache_lru_list *next;
-} delta_base_cache_lru = { &delta_base_cache_lru, &delta_base_cache_lru };
+static LIST_HEAD(delta_base_cache_lru);
 
 static struct delta_base_cache_entry {
-	struct delta_base_cache_lru_list lru;
+	struct list_head lru;
 	void *data;
 	struct packed_git *p;
 	off_t base_offset;
@@ -2128,8 +2126,7 @@ static int in_delta_base_cache(struct packed_git *p, off_t base_offset)
 static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 {
 	ent->data = NULL;
-	ent->lru.next->prev = ent->lru.prev;
-	ent->lru.prev->next = ent->lru.next;
+	list_del(&ent->lru);
 	delta_base_cached -= ent->size;
 }
 
@@ -2168,24 +2165,24 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 {
 	unsigned long hash = pack_entry_hash(p, base_offset);
 	struct delta_base_cache_entry *ent = delta_base_cache + hash;
-	struct delta_base_cache_lru_list *lru;
+	struct list_head *lru;
 
 	release_delta_base_cache(ent);
 	delta_base_cached += base_size;
 
-	for (lru = delta_base_cache_lru.next;
-	     delta_base_cached > delta_base_cache_limit
-	     && lru != &delta_base_cache_lru;
-	     lru = lru->next) {
-		struct delta_base_cache_entry *f = (void *)lru;
+	list_for_each(lru, &delta_base_cache_lru) {
+		struct delta_base_cache_entry *f =
+			list_entry(lru, struct delta_base_cache_entry, lru);
+		if (delta_base_cached <= delta_base_cache_limit)
+			break;
 		if (f->type == OBJ_BLOB)
 			release_delta_base_cache(f);
 	}
-	for (lru = delta_base_cache_lru.next;
-	     delta_base_cached > delta_base_cache_limit
-	     && lru != &delta_base_cache_lru;
-	     lru = lru->next) {
-		struct delta_base_cache_entry *f = (void *)lru;
+	list_for_each(lru, &delta_base_cache_lru) {
+		struct delta_base_cache_entry *f =
+			list_entry(lru, struct delta_base_cache_entry, lru);
+		if (delta_base_cached <= delta_base_cache_limit)
+			break;
 		release_delta_base_cache(f);
 	}
 
@@ -2194,10 +2191,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	ent->type = type;
 	ent->data = base;
 	ent->size = base_size;
-	ent->lru.next = &delta_base_cache_lru;
-	ent->lru.prev = delta_base_cache_lru.prev;
-	delta_base_cache_lru.prev->next = &ent->lru;
-	delta_base_cache_lru.prev = &ent->lru;
+	list_add_tail(&ent->lru, &delta_base_cache_lru);
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,
-- 
2.10.0.rc1.118.ge2299eb



* [PATCH 5/7] delta_base_cache: drop special treatment of blobs
  2016-08-22 21:57 [PATCH 0/7] tweaking the delta base cache Jeff King
                   ` (3 preceding siblings ...)
  2016-08-22 21:59 ` [PATCH 4/7] delta_base_cache: use list.h for LRU Jeff King
@ 2016-08-22 21:59 ` Jeff King
  2016-08-23 22:18   ` Junio C Hamano
  2016-08-22 22:00 ` [PATCH 6/7] delta_base_cache: use hashmap.h Jeff King
  2016-08-22 22:01 ` [PATCH 7/7] t/perf: add basic perf tests for delta base cache Jeff King
  6 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2016-08-22 21:59 UTC
  To: git

When the delta base cache runs out of allowed memory, it has
to drop entries. It does so by walking an LRU list, dropping
objects until we are under the memory limit. But we actually
walk the list twice: once to drop blobs, and then again to
drop other objects (which are generally trees). This comes
from 18bdec1 (Limit the size of the new delta_base_cache,
2007-03-19).

This performs poorly as the number of entries grows, because
any time dropping blobs does not satisfy the limit, we have
to walk the _entire_ list, trees included, looking for blobs
to drop, before starting to drop any trees.

It's not generally a problem now, as the cache is limited to
only 256 entries. But as we could benefit from increasing
that in a future patch, it's worth looking at how it
performs as the cache size grows. And the answer is "not
well".

The table below shows times for various operations with
different values of MAX_DELTA_CACHE (which is not a run-time
knob; I recompiled with -DMAX_DELTA_CACHE=$n for each).

I chose "git log --raw" ("log-raw" in the table) because it
will access all of the trees, but no blobs at all (so in a
sense it is a worst case for this problem, because we will
always walk over the entire list of trees once before
realizing there are no blobs to drop). This is also
representative of other tree-only operations like "rev-list
--objects" and "git log -- <path>".

I also timed "git log -Sfoo --raw" ("log-S" in the table).
It similarly accesses all of the trees, but also the blobs
for each commit. It's representative of "git log -p", though
it emphasizes the cost of blob access more, as "-S" is
cheaper than computing an actual blob diff.

All timings are best-of-3 wall-clock times (though they all
were CPU bound, so the user CPU times are similar). The
repositories were fully packed with --depth=50, and the
default core.deltaBaseCacheLimit of 96M was in effect.  The
current value of MAX_DELTA_CACHE is 256, so I started there
and worked up by factors of 2.

First, here are values for git.git (the asterisk signals the
fastest run for each operation):

    MAX_DELTA_CACHE    log-raw       log-S
    ---------------   ---------    ---------
                256   0m02.212s    0m12.634s
                512   0m02.136s*   0m10.614s
               1024   0m02.156s    0m08.614s
               2048   0m02.208s    0m07.062s
               4096   0m02.190s    0m06.484s*
               8192   0m02.176s    0m07.635s
              16384   0m02.913s    0m19.845s
              32768   0m03.617s    1m05.507s
              65536   0m04.031s    1m18.488s

You can see that for the tree-only log-raw case, we don't
actually benefit that much as the cache grows (all the
differences up through 8192 are basically just noise; this
is probably because we don't actually have that many
distinct trees in git.git). But for log-S, we see a definite
speed improvement as the cache grows, until the linear LRU
management starts to dominate and the gains are lost again.

Here's the same thing run against linux.git:

    MAX_DELTA_CACHE    log-raw       log-S
    ---------------   ---------    ----------
                256   0m40.987s     5m13.216s
                512   0m37.949s     5m03.243s
               1024   0m35.977s     4m50.580s
               2048   0m33.855s     4m39.818s
               4096   0m32.913s     4m47.299s*
               8192   0m32.176s*    5m14.650s
              16384   0m32.185s     6m31.625s
              32768   0m38.056s     9m31.136s
              65536   1m30.518s    17m38.549s

The pattern is similar, though the effect in log-raw is more
pronounced here. The times dip down in the middle, and then
go back up as we keep growing.

So we know there's a problem. What's the solution?

The obvious one is to improve the data structure to avoid
walking over tree entries during the looking-for-blobs
traversal. We can do this by keeping _two_ LRU lists: one
for blobs, and one for other objects. We drop items from the
blob LRU first, and then from the tree LRU (if necessary).

Here's git.git using that strategy:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ----------
                256   0m02.264s   0m12.830s
                512   0m02.201s   0m10.771s
               1024   0m02.181s   0m08.593s
               2048   0m02.205s   0m07.116s
               4096   0m02.158s   0m06.537s*
               8192   0m02.213s   0m07.246s
              16384   0m02.155s*  0m10.975s
              32768   0m02.159s   0m16.047s
              65536   0m02.181s   0m16.992s

The upswing on log-raw is gone completely. But log-S still
has it (albeit much better than without this strategy).
Let's see what linux.git shows:

    MAX_DELTA_CACHE    log-raw       log-S
    ---------------   ---------    ---------
                256   0m42.519s    5m14.654s
                512   0m39.106s    5m04.708s
               1024   0m36.802s    4m51.454s
               2048   0m34.685s    4m39.378s*
               4096   0m33.663s    4m44.047s
               8192   0m33.157s    4m50.644s
              16384   0m33.090s*   4m49.648s
              32768   0m33.458s    4m53.371s
              65536   0m33.563s    5m04.580s

The results are similar. The tree-only case again performs
well (not surprising; we're literally just dropping the one
useless walk, and not otherwise changing the cache eviction
strategy at all). But the log-S case again does a bit worse
as the cache grows (though possibly that's within the noise,
which is much larger for this case).

Perhaps this is an indication that the "remove blobs first"
strategy is not actually optimal. The intent of it is to
avoid blowing out the tree cache when we see large blobs,
but it also means we'll throw away useful, recent blobs in
favor of older trees.

Let's run the same numbers without caring about object type
at all (i.e., one LRU list, and always evicting whatever is
at the head, regardless of type).

Here's git.git:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m02.227s   0m12.821s
                512   0m02.143s   0m10.602s
               1024   0m02.127s   0m08.642s
               2048   0m02.148s   0m07.123s
               4096   0m02.194s   0m06.448s*
               8192   0m02.239s   0m06.504s
              16384   0m02.144s*  0m06.502s
              32768   0m02.202s   0m06.622s
              65536   0m02.230s   0m06.677s

Much smoother; there's no dramatic upswing as we increase
the cache size (some variation remains, though it's small
enough to be mostly run-to-run noise; e.g., in the log-raw
case, note how 8192 is 50-100ms higher than its neighbors). Note
also that we stop getting any real benefit for log-S after
about 4096 entries; that number will depend on the size of
the repository, the size of the blob entries, and the memory
limit of the cache.

Let's see what linux.git shows for the same strategy:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m41.661s   5m12.410s
                512   0m39.547s   5m07.920s
               1024   0m37.054s   4m54.666s
               2048   0m35.871s   4m41.194s*
               4096   0m34.646s   4m51.648s
               8192   0m33.881s   4m55.342s
              16384   0m35.190s   5m00.122s
              32768   0m35.060s   4m58.851s
              65536   0m33.311s*  4m51.420s

It's similarly good. As with the "separate blob LRU"
strategy, there's a lot of noise on the log-S run here. But
it's certainly no worse, possibly a bit better, and the
improvement over "separate blob LRU" in the git.git case is
dramatic.

So it seems like a clear winner, and that's what this patch
implements.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index c02e785..33564d6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2175,14 +2175,6 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 			list_entry(lru, struct delta_base_cache_entry, lru);
 		if (delta_base_cached <= delta_base_cache_limit)
 			break;
-		if (f->type == OBJ_BLOB)
-			release_delta_base_cache(f);
-	}
-	list_for_each(lru, &delta_base_cache_lru) {
-		struct delta_base_cache_entry *f =
-			list_entry(lru, struct delta_base_cache_entry, lru);
-		if (delta_base_cached <= delta_base_cache_limit)
-			break;
 		release_delta_base_cache(f);
 	}
 
-- 
2.10.0.rc1.118.ge2299eb



* [PATCH 6/7] delta_base_cache: use hashmap.h
  2016-08-22 21:57 [PATCH 0/7] tweaking the delta base cache Jeff King
                   ` (4 preceding siblings ...)
  2016-08-22 21:59 ` [PATCH 5/7] delta_base_cache: drop special treatment of blobs Jeff King
@ 2016-08-22 22:00 ` Jeff King
  2016-08-23 22:26   ` Junio C Hamano
  2016-08-22 22:01 ` [PATCH 7/7] t/perf: add basic perf tests for delta base cache Jeff King
  6 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2016-08-22 22:00 UTC
  To: git

The fundamental data structure of the delta base cache is a
hash table mapping pairs of "(packfile, offset)" into
structs containing the actual object data. The hash table
implementation dates back to e5e0161 (Implement a simple
delta_base cache, 2007-03-17), and uses a fixed-size table.
The current size is a hard-coded 256 entries.

Because we need to be able to remove objects from the hash
table, entry lookup does not do any kind of probing to
handle collisions. Colliding items simply replace whatever
is in their slot.  As a result, we have fewer usable slots
than even the 256 we allocate. At half full, each new item
has a 50% chance of displacing another one. Or another way
to think about it: every item has a 1/256 chance of being
ejected due to hash collision, without regard to our LRU
strategy.

So it would be interesting to see the effect of increasing
the cache size on the runtime for some common operations. As
with the previous patch, we'll measure "git log --raw" for
tree-only operations, and "git log -Sfoo --raw" for
operations that touch trees and blobs. All times are
wall-clock best-of-3, done against fully packed repos with
--depth=50, and the default core.deltaBaseCacheLimit of
96MB.

Here are timings for various values of MAX_DELTA_CACHE
against git.git (the asterisk marks the minimum time for
each operation):

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m02.227s   0m12.821s
                512   0m02.143s   0m10.602s
               1024   0m02.127s   0m08.642s
               2048   0m02.148s   0m07.123s
               4096   0m02.194s   0m06.448s*
               8192   0m02.239s   0m06.504s
              16384   0m02.144s*  0m06.502s
              32768   0m02.202s   0m06.622s
              65536   0m02.230s   0m06.677s

The log-raw case isn't changed much at all here (probably
because our trees just aren't that big in the first place,
or possibly because we have so _few_ trees in git.git that
the 256-entry cache is enough). But once we start putting
blobs in the cache, too, we see a big improvement (almost
50%). The curve levels off around 4096, which means that we
can hold about that many entries before hitting the 96MB
memory limit (or possibly that the workload is small enough
that there is simply no more work to be optimized out by
caching more).

(As a side note, I initially timed my existing git.git pack,
which was a base of --aggressive combined with some pulls on
top. So it had quite a few deeper delta chains. The
256-cache case was more like 15s, and it still dropped to
~6.5s in the same way).

Here are the timings for linux.git:

    MAX_DELTA_CACHE    log-raw      log-S
    ---------------   ---------   ---------
                256   0m41.661s   5m12.410s
                512   0m39.547s   5m07.920s
               1024   0m37.054s   4m54.666s
               2048   0m35.871s   4m41.194s*
               4096   0m34.646s   4m51.648s
               8192   0m33.881s   4m55.342s
              16384   0m35.190s   5m00.122s
              32768   0m35.060s   4m58.851s
              65536   0m33.311s*  4m51.420s

As we grow we see a nice 20% speedup in the tree traversal,
and a more modest 10% in log-S. This is probably an
indication that we are bound less by the number of entries,
and more by the memory limit (more on that below). What is
interesting is that the numbers bounce around a bit;
increasing the number of entries isn't always a strict
improvement.

Partially this is due to noise in the measurement. But it
may also be an indication that our LRU ejection scheme is
not optimal. The smaller cache sizes introduce some
randomness into the ejection (due to collisions), which may
sometimes work in our favor (and sometimes not!).

So what is the optimal setting of MAX_DELTA_CACHE? The
"bouncing" in the linux.git log-S numbers notwithstanding,
it mostly seems like bigger is better. And even if we were
to try to find a "sweet spot", these are just two
repositories, which are not necessarily representative. The
shape of history, the size of trees and blobs, the memory
limit configuration, etc, all will affect the outcome.

Rather than trying to find the "right" number, another
strategy is to just switch to a hash table that can actually
store collisions: namely our hashmap.h implementation.

Here are numbers for that compared to the "best" we saw from
adjusting MAX_DELTA_CACHE:

        |       log-raw        |       log-S
        | best       hashmap   | best       hashmap
        | ---------  --------- | ---------  ---------
  git   | 0m02.144s  0m02.144s | 0m06.448s  0m06.688s
  linux | 0m33.311s  0m33.092s | 4m41.194s  4m57.172s

We can see the results are similar in most cases, which is
what we'd expect. We're not ejecting due to collisions at
all, so this is purely representing the LRU. So really, we'd
expect this to model most closely the larger values of the
static MAX_DELTA_CACHE limit. And that does seem to be
what's happening, including the "bounce" in the linux log-S
case.

So while the value for that case _isn't_ as good as the
optimal one measured above (which was 2048 entries), given
the bouncing I'm hesitant to suggest that 2048 is any kind
of optimum (not even for linux.git, let alone as a general
rule). The generic hashmap has the appeal that it drops the
number of tweakable numbers by one, which means we can focus
on tuning other elements, like the LRU strategy or the
core.deltaBaseCacheLimit setting.

And indeed, if we bump the cache limit to 1G (which is
probably silly for general use, but maybe something people
with big workstations would want to do), the linux.git log-S
time drops to 3m32s. That's something you really _can't_ do
easily with the static hash table, because the number of
entries needs to grow in proportion to the memory limit (so
2048 is almost certainly not going to be the right value
there).
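
(With the series applied, that experiment needs no recompile at all;
it is just a config change, e.g.:

  git config core.deltaBaseCacheLimit 1g

on a repository where you are willing to spend the RAM.)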

This patch takes that direction, and drops the static hash
table entirely in favor of using the hashmap.h API.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 94 +++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 60 insertions(+), 34 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 33564d6..a57b71d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2074,48 +2074,69 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	return buffer;
 }
 
-#define MAX_DELTA_CACHE (256)
-
+static struct hashmap delta_base_cache;
 static size_t delta_base_cached;
 
 static LIST_HEAD(delta_base_cache_lru);
 
-static struct delta_base_cache_entry {
-	struct list_head lru;
-	void *data;
+struct delta_base_cache_key {
 	struct packed_git *p;
 	off_t base_offset;
+};
+
+struct delta_base_cache_entry {
+	struct hashmap_entry ent;
+	struct delta_base_cache_key key;
+	struct list_head lru;
+	void *data;
 	unsigned long size;
 	enum object_type type;
-} delta_base_cache[MAX_DELTA_CACHE];
+};
 
-static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset)
+static unsigned int pack_entry_hash(struct packed_git *p, off_t base_offset)
 {
-	unsigned long hash;
+	unsigned int hash;
 
-	hash = (unsigned long)(intptr_t)p + (unsigned long)base_offset;
+	hash = (unsigned int)(intptr_t)p + (unsigned int)base_offset;
 	hash += (hash >> 8) + (hash >> 16);
-	return hash % MAX_DELTA_CACHE;
+	return hash;
 }
 
 static struct delta_base_cache_entry *
 get_delta_base_cache_entry(struct packed_git *p, off_t base_offset)
 {
-	unsigned long hash = pack_entry_hash(p, base_offset);
-	return delta_base_cache + hash;
+	struct hashmap_entry entry;
+	struct delta_base_cache_key key;
+
+	if (!delta_base_cache.cmpfn)
+		return NULL;
+
+	hashmap_entry_init(&entry, pack_entry_hash(p, base_offset));
+	key.p = p;
+	key.base_offset = base_offset;
+	return hashmap_get(&delta_base_cache, &entry, &key);
 }
 
-static int eq_delta_base_cache_entry(struct delta_base_cache_entry *ent,
-				     struct packed_git *p, off_t base_offset)
+static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
+				   const struct delta_base_cache_key *b)
 {
-	return (ent->data && ent->p == p && ent->base_offset == base_offset);
+	return a->p == b->p && a->base_offset == b->base_offset;
+}
+
+static int delta_base_cache_hash_cmp(const void *va, const void *vb,
+				     const void *vkey)
+{
+	const struct delta_base_cache_entry *a = va, *b = vb;
+	const struct delta_base_cache_key *key = vkey;
+	if (key)
+		return !delta_base_cache_key_eq(&a->key, key);
+	else
+		return !delta_base_cache_key_eq(&a->key, &b->key);
 }
 
 static int in_delta_base_cache(struct packed_git *p, off_t base_offset)
 {
-	struct delta_base_cache_entry *ent;
-	ent = get_delta_base_cache_entry(p, base_offset);
-	return eq_delta_base_cache_entry(ent, p, base_offset);
+	return !!get_delta_base_cache_entry(p, base_offset);
 }
 
 /*
@@ -2125,9 +2146,10 @@ static int in_delta_base_cache(struct packed_git *p, off_t base_offset)
  */
 static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 {
-	ent->data = NULL;
+	hashmap_remove(&delta_base_cache, ent, &ent->key);
 	list_del(&ent->lru);
 	delta_base_cached -= ent->size;
+	free(ent);
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
@@ -2136,8 +2158,7 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 	struct delta_base_cache_entry *ent;
 
 	ent = get_delta_base_cache_entry(p, base_offset);
-
-	if (!eq_delta_base_cache_entry(ent, p, base_offset))
+	if (!ent)
 		return unpack_entry(p, base_offset, type, base_size);
 
 	*type = ent->type;
@@ -2147,27 +2168,27 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 
 static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
 {
-	if (ent->data) {
-		free(ent->data);
-		detach_delta_base_cache_entry(ent);
-	}
+	free(ent->data);
+	detach_delta_base_cache_entry(ent);
 }
 
 void clear_delta_base_cache(void)
 {
-	unsigned long p;
-	for (p = 0; p < MAX_DELTA_CACHE; p++)
-		release_delta_base_cache(&delta_base_cache[p]);
+	struct hashmap_iter iter;
+	struct delta_base_cache_entry *entry;
+	for (entry = hashmap_iter_first(&delta_base_cache, &iter);
+	     entry;
+	     entry = hashmap_iter_next(&iter)) {
+		release_delta_base_cache(entry);
+	}
 }
 
 static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	void *base, unsigned long base_size, enum object_type type)
 {
-	unsigned long hash = pack_entry_hash(p, base_offset);
-	struct delta_base_cache_entry *ent = delta_base_cache + hash;
+	struct delta_base_cache_entry *ent = xmalloc(sizeof(*ent));
 	struct list_head *lru;
 
-	release_delta_base_cache(ent);
 	delta_base_cached += base_size;
 
 	list_for_each(lru, &delta_base_cache_lru) {
@@ -2178,12 +2199,17 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 		release_delta_base_cache(f);
 	}
 
-	ent->p = p;
-	ent->base_offset = base_offset;
+	ent->key.p = p;
+	ent->key.base_offset = base_offset;
 	ent->type = type;
 	ent->data = base;
 	ent->size = base_size;
 	list_add_tail(&ent->lru, &delta_base_cache_lru);
+
+	if (!delta_base_cache.cmpfn)
+		hashmap_init(&delta_base_cache, delta_base_cache_hash_cmp, 0);
+	hashmap_entry_init(ent, pack_entry_hash(p, base_offset));
+	hashmap_add(&delta_base_cache, ent);
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,
@@ -2227,7 +2253,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 		struct delta_base_cache_entry *ent;
 
 		ent = get_delta_base_cache_entry(p, curpos);
-		if (eq_delta_base_cache_entry(ent, p, curpos)) {
+		if (ent) {
 			type = ent->type;
 			data = ent->data;
 			size = ent->size;
-- 
2.10.0.rc1.118.ge2299eb



* [PATCH 7/7] t/perf: add basic perf tests for delta base cache
  2016-08-22 21:57 [PATCH 0/7] tweaking the delta base cache Jeff King
                   ` (5 preceding siblings ...)
  2016-08-22 22:00 ` [PATCH 6/7] delta_base_cache: use hashmap.h Jeff King
@ 2016-08-22 22:01 ` Jeff King
  6 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2016-08-22 22:01 UTC
  To: git

This just shows off the improvements done by the last few
patches, and gives us a baseline for noticing regressions in
the future. Here are the results with linux.git as the perf
"large repo":

Test                origin                HEAD
-------------------------------------------------------------------
0003.1: log --raw   43.41(40.36+2.69)     33.86(30.96+2.41) -22.0%
0003.2: log -S      313.61(309.74+3.78)   298.75(295.58+3.00) -4.7%

(for a large repo, the "log -S" improvements are greater if
you bump the delta base cache limit, but I think it makes
sense to test the "stock" behavior, since that is what most
people will see).

Signed-off-by: Jeff King <peff@peff.net>
---
I also got good times for "log -S" when measuring git.git (but _not_
when measuring "log --raw", because of its size). So I guess perhaps the
script could measure both. I'm not sure if the perf framework is ready
for using both test_default_repo and test_large_repo, though.

 t/perf/p0003-delta-base-cache.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100755 t/perf/p0003-delta-base-cache.sh

diff --git a/t/perf/p0003-delta-base-cache.sh b/t/perf/p0003-delta-base-cache.sh
new file mode 100755
index 0000000..62369ea
--- /dev/null
+++ b/t/perf/p0003-delta-base-cache.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description='Test operations that emphasize the delta base cache.
+
+We look at both "log --raw", which should put only trees into the delta cache,
+and "log -Sfoo --raw", which should look at both trees and blobs.
+
+Any effects will be emphasized if the test repository is fully packed (loose
+objects obviously do not use the delta base cache at all). It is also
+emphasized if the pack has long delta chains (e.g., as produced by "gc
+--aggressive"), though the effect is still quite noticeable even with the default
+depth of 50.
+
+The setting of core.deltaBaseCacheLimit in the source repository is also
+relevant (depending on the size of your test repo), so be sure it is consistent
+between runs.
+'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+# puts mostly trees into the delta base cache
+test_perf 'log --raw' '
+	git log --raw >/dev/null
+'
+
+test_perf 'log -S' '
+	git log --raw -Sfoo >/dev/null
+'
+
+test_done
-- 
2.10.0.rc1.118.ge2299eb


* Re: [PATCH 4/7] delta_base_cache: use list.h for LRU
  2016-08-22 21:59 ` [PATCH 4/7] delta_base_cache: use list.h for LRU Jeff King
@ 2016-08-22 23:18   ` Eric Wong
  2016-08-23  0:48     ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2016-08-22 23:18 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> We keep an LRU list of entries for when we need to drop
> something from an over-full cache. The list is implemented
> as a circular doubly-linked list, which is exactly what
> list.h provides. We can save a few lines by using the list.h
> macros and functions. More importantly, this makes the code
> easier to follow, as the reader sees explicit concepts like
> "list_add_tail()" instead of pointer manipulation.

Yay!

> As a bonus, the list_entry() macro lets us place the lru
> pointers anywhere inside the delta_base_cache_entry struct
> (as opposed to just casting the pointer, which requires it
> at the front of the struct). This will be useful in later
> patches when we need to place other items at the front of
> the struct (e.g., our hashmap implementation requires this).

On a side note, I think we should s/list_entry/container_of/ and
use container_of for hashmap.

Linux kernel defines list_entry to use container_of,
but I'd rather avoid introducing the duality entirely.
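
For reference, the minimal portable form of container_of() is roughly:

  #define container_of(ptr, type, member) \
  	((type *)((char *)(ptr) - offsetof(type, member)))

with offsetof() coming from <stddef.h>; the kernel's version layers a
typeof-based type check on top of that.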

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I think the result is much nicer, but I found list_entry() a little
> disappointing, because we lack typeof(). So you are stuck writing:
> 
>   struct delta_base_cache_entry *f =
>     list_entry(p, struct delta_base_cache_entry, lru);
> 
> I waffled on adding something like:
> 
>   LIST_ITEM(struct delta_base_cache_entry, f, p, lru);
> 
> to declare "f" as above. But it's getting rather magical and un-C-like.

Right.  I'd rather keep the list_entry/container_of usage
identical to what developers familiar using userspace-rcu,
ccan/list, or the Linux kernel expect.

>  sha1_file.c | 38 ++++++++++++++++----------------------
>  1 file changed, 16 insertions(+), 22 deletions(-)

Good to see the code reduction.


* Re: [PATCH 4/7] delta_base_cache: use list.h for LRU
  2016-08-22 23:18   ` Eric Wong
@ 2016-08-23  0:48     ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2016-08-23  0:48 UTC
  To: Eric Wong; +Cc: git

On Mon, Aug 22, 2016 at 11:18:06PM +0000, Eric Wong wrote:

> > As a bonus, the list_entry() macro lets us place the lru
> > pointers anywhere inside the delta_base_cache_entry struct
> > (as opposed to just casting the pointer, which requires it
> > at the front of the struct). This will be useful in later
> > patches when we need to place other items at the front of
> > the struct (e.g., our hashmap implementation requires this).
> 
> On a side note, I think we should s/list_entry/container_of/ and
> use container_of for hashmap.
> 
> Linux kernel defines list_entry to use container_of,
> but I'd rather avoid introducing the duality entirely.

That does make it slightly more annoying to use, since container_of()
requires naming the original type again (because of our lack of typeof),
whereas now you can freely cast between "struct hashmap_entry" and the
actual item. But if you are proposing to actually fix the hashmap to
eliminate the requirement to put the entry at the front, that might have
value (I'm not sure how easy it is to do, though; a lot of those casts
happen inside the hashmap code which is type-agnostic).

> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > I think the result is much nicer, but I found list_entry() a little
> > disappointing, because we lack typeof(). So you are stuck writing:
> > 
> >   struct delta_base_cache_entry *f =
> >     list_entry(p, struct delta_base_cache_entry, lru);
> > 
> > I waffled on adding something like:
> > 
> >   LIST_ITEM(struct delta_base_cache_entry, f, p, lru);
> > 
> > to declare "f" as above. But it's getting rather magical and un-C-like.
> 
> Right.  I'd rather keep the list_entry/container_of usage
> identical to what developers familiar using userspace-rcu,
> ccan/list, or the Linux kernel expect.

Makes sense. I am not familiar with those APIs myself, but I do not see
any reason to deviate from them for no good reason.

> >  sha1_file.c | 38 ++++++++++++++++----------------------
> >  1 file changed, 16 insertions(+), 22 deletions(-)
> 
> Good to see the code reduction.

Yep. I realized the "hashmap must go at the front of the struct" thing
later. My initial reason for converting was actually to make the
"separate blob LRU" patch easier (since without this patch, it literally
doubles the amount of pointer manipulation required).

For reference, in case anybody wants to duplicate my experiments from
patch 5, that patch looks like this:

diff --git a/sha1_file.c b/sha1_file.c
index 2cd1b79..e82be30 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2076,7 +2076,8 @@ static void *unpack_compressed_entry(struct packed_git *p,
 
 static size_t delta_base_cached;
 
-static LIST_HEAD(delta_base_cache_lru);
+static LIST_HEAD(delta_base_cache_lru_blobs);
+static LIST_HEAD(delta_base_cache_lru_other);
 
 static struct delta_base_cache_entry {
 	struct list_head lru;
@@ -2170,15 +2171,14 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	release_delta_base_cache(ent);
 	delta_base_cached += base_size;
 
-	list_for_each(lru, &delta_base_cache_lru) {
+	list_for_each(lru, &delta_base_cache_lru_blobs) {
 		struct delta_base_cache_entry *f =
 			list_entry(lru, struct delta_base_cache_entry, lru);
 		if (delta_base_cached <= delta_base_cache_limit)
 			break;
-		if (f->type == OBJ_BLOB)
-			release_delta_base_cache(f);
+		release_delta_base_cache(f);
 	}
-	list_for_each(lru, &delta_base_cache_lru) {
+	list_for_each(lru, &delta_base_cache_lru_other) {
 		struct delta_base_cache_entry *f =
 			list_entry(lru, struct delta_base_cache_entry, lru);
 		if (delta_base_cached <= delta_base_cache_limit)
@@ -2191,7 +2191,10 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	ent->type = type;
 	ent->data = base;
 	ent->size = base_size;
-	list_add_tail(&ent->lru, &delta_base_cache_lru);
+	if (type == OBJ_BLOB)
+		list_add_tail(&ent->lru, &delta_base_cache_lru_blobs);
+	else
+		list_add_tail(&ent->lru, &delta_base_cache_lru_other);
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,

-Peff


* Re: [PATCH 1/7] cache_or_unpack_entry: drop keep_cache parameter
  2016-08-22 21:57 ` [PATCH 1/7] cache_or_unpack_entry: drop keep_cache parameter Jeff King
@ 2016-08-23 21:45   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2016-08-23 21:45 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> There is only one caller of cache_or_unpack_entry() and it
> always passes 1 for the keep_cache parameter. We can
> simplify it by dropping the "!keep_cache" case.
>
> Another call, which did pass 0, was dropped in abe601b
> (sha1_file: remove recursion in unpack_entry, 2013-03-27),
> as unpack_entry() now does more complicated things than a
> simple unpack when there is a cache miss.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_file.c | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)

Makes sense.  Thanks.


* Re: [PATCH 2/7] clear_delta_base_cache_entry: use a more descriptive name
  2016-08-22 21:57 ` [PATCH 2/7] clear_delta_base_cache_entry: use a more descriptive name Jeff King
@ 2016-08-23 21:47   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2016-08-23 21:47 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> The delta base cache entries are stored in a fixed-length
> hash table. So the way to remove an entry is to "clear" the
> slot in the table, and that is what this function does.
>
> However, the name is a leaky abstraction. If we were to
> change the hash table implementation, it would no longer be
> about "clearing". We should name it after _what_ it does,
> not _how_ it does it. I.e., something like "remove" instead
> of "clear".
>
> But that does not tell the whole story, either. The subtle
> thing about this function is that it removes the entry, but
> does not free the entry data. So a more descriptive name is
> "detach"; we give ownership of the data buffer to the
> caller, and remove any other resources.

OK.  Sounds sensible.


* Re: [PATCH 3/7] release_delta_base_cache: reuse existing detach function
  2016-08-22 21:57 ` [PATCH 3/7] release_delta_base_cache: reuse existing detach function Jeff King
@ 2016-08-23 21:49   ` Junio C Hamano
  2016-08-24 17:41     ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2016-08-23 21:49 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> This function drops an entry entirely from the cache,
> meaning that aside from the freeing of the buffer, it is
> exactly equivalent to detach_delta_base_cache_entry(). Let's
> build on top of the detach function, which shortens the code
> and will make it simpler when we change out the underlying
> storage in future patches.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_file.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 1d0810c..8264b39 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2152,10 +2152,7 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
>  {
>  	if (ent->data) {
>  		free(ent->data);
> -		ent->data = NULL;
> -		ent->lru.next->prev = ent->lru.prev;
> -		ent->lru.prev->next = ent->lru.next;
> -		delta_base_cached -= ent->size;
> +		detach_delta_base_cache_entry(ent);

If we were designing this from scratch, we might have made detach_*
return the pointer to minimize direct access to ent->data, but I
do not think it is worth it.  Looks very sensible.


* Re: [PATCH 5/7] delta_base_cache: drop special treatment of blobs
  2016-08-22 21:59 ` [PATCH 5/7] delta_base_cache: drop special treatment of blobs Jeff King
@ 2016-08-23 22:18   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2016-08-23 22:18 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Let's run the same numbers without caring about object type
> at all (i.e., one LRU list, and always evicting whatever is
> at the head, regardless of type).
> ...
> So it seems like a clear winner, and that's what this patch
> implements.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Nice work.  You make your readers expect some clever data structure
that may perform better than the obvious two-separate-list approach,
and then end up using the simplest way.  Quite nice.



>  sha1_file.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index c02e785..33564d6 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2175,14 +2175,6 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
>  			list_entry(lru, struct delta_base_cache_entry, lru);
>  		if (delta_base_cached <= delta_base_cache_limit)
>  			break;
> -		if (f->type == OBJ_BLOB)
> -			release_delta_base_cache(f);
> -	}
> -	list_for_each(lru, &delta_base_cache_lru) {
> -		struct delta_base_cache_entry *f =
> -			list_entry(lru, struct delta_base_cache_entry, lru);
> -		if (delta_base_cached <= delta_base_cache_limit)
> -			break;
>  		release_delta_base_cache(f);
>  	}


* Re: [PATCH 6/7] delta_base_cache: use hashmap.h
  2016-08-22 22:00 ` [PATCH 6/7] delta_base_cache: use hashmap.h Jeff King
@ 2016-08-23 22:26   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2016-08-23 22:26 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> So while the value for that case _isn't_ as good as the
> optimal one measured above (which was 2048 entries), given
> the bouncing I'm hesitant to suggest that 2048 is any kind
> of optimum (not even for linux.git, let alone as a general
> rule). The generic hashmap has the appeal that it drops the
> number of tweakable numbers by one, which means we can focus
> on tuning other elements, like the LRU strategy or the
> core.deltaBaseCacheLimit setting.
>
> And indeed, if we bump the cache limit to 1G (which is
> probably silly for general use, but maybe something people
> with big workstations would want to do), the linux.git log-S
> time drops to 3m32s. That's something you really _can't_ do
> easily with the static hash table, because the number of
> entries needs to grow in proportion to the memory limit (so
> 2048 is almost certainly not going to be the right value
> there).
>
> This patch takes that direction, and drops the static hash
> table entirely in favor of using the hashmap.h API.

Sounds very sensible.


* Re: [PATCH 3/7] release_delta_base_cache: reuse existing detach function
  2016-08-23 21:49   ` Junio C Hamano
@ 2016-08-24 17:41     ` Jeff King
  2016-08-24 17:59       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2016-08-24 17:41 UTC
  To: Junio C Hamano; +Cc: git

On Tue, Aug 23, 2016 at 02:49:15PM -0700, Junio C Hamano wrote:

> > diff --git a/sha1_file.c b/sha1_file.c
> > index 1d0810c..8264b39 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -2152,10 +2152,7 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
> >  {
> >  	if (ent->data) {
> >  		free(ent->data);
> > -		ent->data = NULL;
> > -		ent->lru.next->prev = ent->lru.prev;
> > -		ent->lru.prev->next = ent->lru.next;
> > -		delta_base_cached -= ent->size;
> > +		detach_delta_base_cache_entry(ent);
> 
> If we were designing this from scratch, we might have made detach_*
> to return the pointer to minimize direct access to ent->data, but I
> do not think it is worth it.  Looks very sensible.

I actually looked into that during the conversion in patch 2/7. I didn't
want to just return ent->data, because there are actually several bits
of information to "rescue" from the entry. So it looks more like:

  char *data = detach_delta_base_cache_entry(ent, NULL, NULL);
  free(data);

here, and

  data = detach_delta_base_cache_entry(ent, &type, &size);

in unpack_entry().

That is not too bad, I guess. I can switch it if you prefer that way.
Since there are only these two callers with two different sets of needs
(and the function is static), I just let them continue inspecting the
elements directly.

-Peff


* Re: [PATCH 3/7] release_delta_base_cache: reuse existing detach function
  2016-08-24 17:41     ` Jeff King
@ 2016-08-24 17:59       ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2016-08-24 17:59 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> That is not too bad, I guess. I can switch it if you prefer that way.
> Since there are only these two callers with two different sets of needs
> (and the function is static), I just let them continue inspecting the
> elements directly.

I do not think it is too hard to refactor later when the need arises; it
is not like we have too many callers and this is a file-scope static
helper to begin with.
