git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 2/3] prune: use bitmaps for reachability traversal
Date: Wed, 13 Feb 2019 23:37:43 -0500
Message-ID: <20190214043743.GB19183@sigill.intra.peff.net> (raw)
In-Reply-To: <20190214043127.GA19019@sigill.intra.peff.net>

Pruning generally has to traverse the whole commit graph in order to
see which objects are reachable. This is the exact problem that
reachability bitmaps were meant to solve, so let's use them (if they're
available, of course).

Here are timings on git.git:

  Test                            HEAD^             HEAD
  ------------------------------------------------------------------------
  5304.6: prune with bitmaps      3.65(3.56+0.09)   1.01(0.92+0.08) -72.3%

And on linux.git:

  Test                            HEAD^               HEAD
  --------------------------------------------------------------------------
  5304.6: prune with bitmaps      35.05(34.79+0.23)   3.00(2.78+0.21) -91.4%

The tests show a pretty optimal case, as we'll have just repacked and
should have pretty good coverage of all refs with our bitmaps. But
that's actually pretty realistic: normally prune is run via "gc" right
after repacking.

A few notes on the implementation:

  - the change is actually in reachable.c, so it would improve
    reachability traversals by "reflog expire --stale-fix", as well.
    Those aren't performed regularly, though (a normal "git gc" doesn't
    use --stale-fix), so they're not really worth measuring. There's a
    low chance of regressing that caller, since the use of bitmaps is
    totally transparent from the caller's perspective.

  - The bitmap case could actually get away without creating a "struct
    object", and instead the caller could just look up each object id in
    the bitmap result. However, this would be a marginal improvement in
    runtime, and it would make the callers much more complicated. They'd
    have to handle both the bitmap and non-bitmap cases separately, and
    in the case of git-prune, we'd also have to tweak prune_shallow(),
    which relies on our SEEN flags.

  - Because we do create real object structs, we go through a few
    contortions to create ones of the right type. This isn't strictly
    necessary (lookup_unknown_object() would suffice), but it's more
    memory efficient to use the correct types, since we already know
    them.

Signed-off-by: Jeff King <peff@peff.net>
---
 reachable.c           | 42 ++++++++++++++++++++++++++++++++++++++++++
 t/perf/p5304-prune.sh | 11 +++++++++++
 2 files changed, 53 insertions(+)

diff --git a/reachable.c b/reachable.c
index 6e9b810d2a..0d00a91de4 100644
--- a/reachable.c
+++ b/reachable.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "worktree.h"
 #include "object-store.h"
+#include "pack-bitmap.h"
 
 struct connectivity_progress {
 	struct progress *progress;
@@ -158,10 +159,44 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
 				      FOR_EACH_OBJECT_LOCAL_ONLY);
 }
 
+static void *lookup_object_by_type(struct repository *r,
+				   const struct object_id *oid,
+				   enum object_type type)
+{
+	switch (type) {
+	case OBJ_COMMIT:
+		return lookup_commit(r, oid);
+	case OBJ_TREE:
+		return lookup_tree(r, oid);
+	case OBJ_TAG:
+		return lookup_tag(r, oid);
+	case OBJ_BLOB:
+		return lookup_blob(r, oid);
+	default:
+		die("BUG: unknown object type %d", type);
+	}
+}
+
+static int mark_object_seen(const struct object_id *oid,
+			     enum object_type type,
+			     int exclude,
+			     uint32_t name_hash,
+			     struct packed_git *found_pack,
+			     off_t found_offset)
+{
+	struct object *obj = lookup_object_by_type(the_repository, oid, type);
+	if (!obj)
+		die("unable to create object '%s'", oid_to_hex(oid));
+
+	obj->flags |= SEEN;
+	return 0;
+}
+
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 			    timestamp_t mark_recent, struct progress *progress)
 {
 	struct connectivity_progress cp;
+	struct bitmap_index *bitmap_git;
 
 	/*
 	 * Set up revision parsing, and mark us as being interested
@@ -188,6 +223,13 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (bitmap_git) {
+		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+		free_bitmap_index(bitmap_git);
+		return;
+	}
+
 	/*
 	 * Set up the revision walk - this will move all commits
 	 * from the pending list to the commit walking list.
diff --git a/t/perf/p5304-prune.sh b/t/perf/p5304-prune.sh
index 3c852084eb..83baedb8a4 100755
--- a/t/perf/p5304-prune.sh
+++ b/t/perf/p5304-prune.sh
@@ -21,4 +21,15 @@ test_perf 'prune with no objects' '
 	git prune
 '
 
+test_expect_success 'repack with bitmaps' '
+	git repack -adb
+'
+
+# We have to create the object in each trial run, since otherwise
+# runs after the first see no object and just skip the traversal entirely!
+test_perf 'prune with bitmaps' '
+	echo "probably not present in repo" | git hash-object -w --stdin &&
+	git prune
+'
+
 test_done
-- 
2.21.0.rc0.586.gffba1126a0


  parent reply index

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
2019-02-14 10:54   ` Eric Sunshine
2019-02-14 11:07     ` Jeff King
2019-02-14  4:37 ` Jeff King [this message]
2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
2019-03-10 23:39     ` Jeff King
2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
2019-03-12 10:49         ` Jeff King
2019-03-12 12:05           ` Jeff King
2019-03-13  1:51           ` Eric Wong
2019-03-13 14:54             ` Jeff King
2019-03-14  9:12               ` [PATCH v3] " Eric Wong
2019-03-14 16:02                 ` Jeff King
2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
2019-03-15 13:25                       ` SZEDER Gábor
2019-03-15 18:36                         ` Jeff King
2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
2019-04-10 22:57                   ` Jeff King
2019-04-25  7:16                     ` Junio C Hamano
2019-05-04  1:37                       ` Jeff King
2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
2019-05-04 13:23                           ` SZEDER Gábor
2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
2019-05-09  4:24                               ` Junio C Hamano
2019-05-07  7:45                           ` Jeff King
2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
2019-05-08  7:11                               ` Jeff King
2019-05-08 14:20                                 ` Derrick Stolee
2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
2019-05-08 22:25                                   ` Jeff King
2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
2019-04-18 19:49     ` Jeff King
2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
2019-04-20  1:01         ` Derrick Stolee
2019-04-20  3:24           ` Jeff King
2019-04-20 21:01             ` Derrick Stolee
2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190214043743.GB19183@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox