From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Ben Maurer <bmaurer@fb.com>,
Siddharth Agarwal <sid0@fb.com>
Subject: Re: [PATCH/RFC 0/6] reuse deltas found by bitmaps
Date: Wed, 26 Mar 2014 21:13:23 -0400 [thread overview]
Message-ID: <20140327011323.GA9266@sigill.intra.peff.net> (raw)
In-Reply-To: <20140326181300.GA7087@sigill.intra.peff.net>
On Wed, Mar 26, 2014 at 02:13:00PM -0400, Jeff King wrote:
> So I think the next steps are probably:
>
> 1. Measure the "all objects are preferred bases" approach and confirm
> that it is bad.
Below is a very rough patch to accomplish this. It just traverses the
"have" bitmap and adds every object with the "exclude" flag. The result
is as comically bad as I expected.
For a big fetch, it seems like it's working (numbers against v1.9.0):
5311.31: server (128 days) 4.49(7.35+0.23) 4.98(6.82+3.31) +10.9%
5311.32: size (128 days) 25.8M 32.0M +24.2%
5311.33: client (128 days) 7.17(7.38+0.20) 7.33(7.90+0.20) +2.2%
A modest increase in CPU time, and we get back most of our size
(remember that our "bad" case here is ~80M).
But for a small fetch...
5311.3: server (1 days) 0.20(0.17+0.03) 4.39(4.03+6.59) +2095.0%
5311.4: size (1 days) 57.2K 59.5K +4.1%
5311.5: client (1 days) 0.08(0.08+0.00) 0.08(0.08+0.00) +0.0%
Yikes. Besides spending lots of CPU on handling the enlarged object
list, notice that we still only dropped the size in the 128-day case to
32M. Which is almost exactly what the earlier "reuse on-disk deltas"
patch achieved.
What I think is happening is that we manage to reuse those on-disk
deltas (since they are now preferred bases, and we know we can). But we
never actually come up with any _new_ deltas, because the search window
is overwhelmed with candidates. So not only do we waste a huge amount of
CPU, but we just end up at the same not-quite-optimal result as before.
So this is a dead end, but I think it was good to double-check that.
The patch below is messy and would probably be better split into a few
patches, but I don't expect anyone to apply it (or even read it,
really). It's just for reference.
---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0ee5f1f..1a5d401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1026,7 +1026,7 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
if (have_duplicate_entry(sha1, 0, &index_pos))
return 0;
- create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset);
+ create_object_entry(sha1, type, name_hash, flags, 0, index_pos, pack, offset);
display_progress(progress_state, to_pack.nr_objects);
return 1;
@@ -2436,6 +2436,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
}
traverse_bitmap_commit_list(&add_object_entry_from_bitmap);
+ bitmap_have_foreach(&add_object_entry_from_bitmap);
return 0;
}
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1bae7e8..f4e30f5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -605,6 +605,7 @@ static void show_objects_for_type(
struct bitmap *objects,
struct ewah_bitmap *type_filter,
enum object_type object_type,
+ int flags,
show_reachable_fn show_reach)
{
size_t pos = 0, i = 0;
@@ -613,9 +614,6 @@ static void show_objects_for_type(
struct ewah_iterator it;
eword_t filter;
- if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects)
- return;
-
ewah_iterator_init(&it, type_filter);
while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
@@ -640,7 +638,7 @@ static void show_objects_for_type(
if (bitmap_git.hashes)
hash = ntohl(bitmap_git.hashes[entry->nr]);
- show_reach(sha1, object_type, 0, hash, bitmap_git.pack, entry->offset);
+ show_reach(sha1, object_type, flags, hash, bitmap_git.pack, entry->offset);
}
pos += BITS_IN_WORD;
@@ -816,14 +814,17 @@ void traverse_bitmap_commit_list(show_reachable_fn show_reachable)
{
assert(bitmap_git.result);
+ if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects)
+ return;
+
show_objects_for_type(bitmap_git.result, bitmap_git.commits,
- OBJ_COMMIT, show_reachable);
+ OBJ_COMMIT, 0, show_reachable);
show_objects_for_type(bitmap_git.result, bitmap_git.trees,
- OBJ_TREE, show_reachable);
+ OBJ_TREE, 0, show_reachable);
show_objects_for_type(bitmap_git.result, bitmap_git.blobs,
- OBJ_BLOB, show_reachable);
+ OBJ_BLOB, 0, show_reachable);
show_objects_for_type(bitmap_git.result, bitmap_git.tags,
- OBJ_TAG, show_reachable);
+ OBJ_TAG, 0, show_reachable);
show_extended_objects(bitmap_git.result, show_reachable);
@@ -1090,3 +1091,18 @@ int bitmap_have(const unsigned char *sha1)
return bitmap_get(bitmap_git.haves, pos);
}
+
+void bitmap_have_foreach(show_reachable_fn show_reachable)
+{
+ if (!bitmap_git.haves)
+ return;
+
+ show_objects_for_type(bitmap_git.haves, bitmap_git.commits,
+ OBJ_COMMIT, 1, show_reachable);
+ show_objects_for_type(bitmap_git.haves, bitmap_git.trees,
+ OBJ_TREE, 1, show_reachable);
+ show_objects_for_type(bitmap_git.haves, bitmap_git.blobs,
+ OBJ_BLOB, 1, show_reachable);
+ show_objects_for_type(bitmap_git.haves, bitmap_git.tags,
+ OBJ_TAG, 1, show_reachable);
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a63ee6b..02c08f8 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,6 +50,7 @@ int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t *e
int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bitmaps, int show_progress);
int bitmap_have(const unsigned char *sha1);
+void bitmap_have_foreach(show_reachable_fn);
void bitmap_writer_show_progress(int show);
void bitmap_writer_set_checksum(unsigned char *sha1);
next prev parent reply other threads:[~2014-03-27 1:13 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-26 7:22 [PATCH/RFC 0/6] reuse deltas found by bitmaps Jeff King
2014-03-26 7:22 ` [PATCH 1/6] t/perf-lib: factor boilerplate out of test_perf Jeff King
2014-03-26 22:34 ` Junio C Hamano
2014-03-26 7:22 ` [PATCH 2/6] t/perf/aggregate: factor our percent calculations Jeff King
2014-03-26 7:22 ` [PATCH 3/6] t/perf: add infrastructure for measuring sizes Jeff King
2014-03-26 7:22 ` [PATCH 4/6] t/perf: add perf tests for fetches from a bitmapped server Jeff King
2014-03-26 7:23 ` [PATCH 5/6] pack-bitmap: save "have" bitmap from walk Jeff King
2014-03-26 7:23 ` [PATCH 6/6] pack-objects: reuse deltas for thin "have" objects Jeff King
2014-03-28 4:23 ` Eric Sunshine
2014-03-26 17:33 ` [PATCH/RFC 0/6] reuse deltas found by bitmaps Junio C Hamano
2014-03-26 18:13 ` Jeff King
2014-03-26 22:31 ` Junio C Hamano
2014-03-26 22:36 ` Jeff King
2014-03-27 1:13 ` Jeff King [this message]
2014-03-27 16:36 ` Junio C Hamano
2014-03-26 22:40 ` Siddharth Agarwal
2014-03-27 14:09 ` Siddharth Agarwal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140327011323.GA9266@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=bmaurer@fb.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=sid0@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).