git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 1/3] prune: lazily perform reachability traversal
Date: Wed, 13 Feb 2019 23:35:22 -0500
Message-ID: <20190214043522.GA19183@sigill.intra.peff.net> (raw)
In-Reply-To: <20190214043127.GA19019@sigill.intra.peff.net>

The general strategy of "git prune" is to do a full reachability walk,
then for each loose object see if we found it in our walk. But if we
don't have any loose objects, we don't need to do the expensive walk in
the first place.

This patch postpones that walk until the first time we need to see its
results.

Note that this is really a specific case of a more general optimization,
which is that we could traverse only far enough to find the object under
consideration (i.e., stop the traversal when we find it, then pick up
again when asked about the next object, etc). That could save us in some
instances from having to do a full walk. But it's actually a bit tricky
to do with our traversal code, and you'd need to do a full walk anyway
if you have even a single unreachable object (which you generally do, if
any objects are actually left after running git-repack).

So in practice this lazy-load of the full walk catches one easy but
common case (i.e., you've just repacked via git-gc, and there's nothing
unreachable).

The perf script is fairly contrived, but it does show off the
improvement:

  Test                            HEAD^             HEAD
  -------------------------------------------------------------------------
  5304.4: prune with no objects   3.66(3.60+0.05)   0.00(0.00+0.00) -100.0%

and would let us know if we accidentally regress this optimization.

Note also that we need to take special care with prune_shallow(), which
relies on us having performed the traversal. So this optimization can
only kick in for a non-shallow repository. Since this is easy to get
wrong and is not covered by existing tests, let's add an extra test to
t5304 that covers this case explicitly.

Signed-off-by: Jeff King <peff@peff.net>
---
The diff looks nice with --color-moved. I wish there was a way to
communicate that information in a plaintext email.

 builtin/prune.c       | 43 ++++++++++++++++++++++++++++++++-----------
 t/perf/p5304-prune.sh | 24 ++++++++++++++++++++++++
 t/t5304-prune.sh      | 12 ++++++++++++
 3 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100755 t/perf/p5304-prune.sh

diff --git a/builtin/prune.c b/builtin/prune.c
index 1ec9ddd751..04b6573945 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -31,16 +31,40 @@ static int prune_tmp_file(const char *fullpath)
 	return 0;
 }
 
-static int prune_object(const struct object_id *oid, const char *fullpath,
-			void *data)
+static void perform_reachability_traversal(struct rev_info *revs)
 {
-	struct stat st;
+	static int initialized;
+	struct progress *progress = NULL;
+
+	if (initialized)
+		return;
+
+	if (show_progress)
+		progress = start_delayed_progress(_("Checking connectivity"), 0);
+	mark_reachable_objects(revs, 1, expire, progress);
+	stop_progress(&progress);
+	initialized = 1;
+}
+
+static int is_object_reachable(const struct object_id *oid,
+			       struct rev_info *revs)
+{
+	perform_reachability_traversal(revs);
 
 	/*
 	 * Do we know about this object?
 	 * It must have been reachable
 	 */
-	if (lookup_object(the_repository, oid->hash))
+	return !!lookup_object(the_repository, oid->hash);
+}
+
+static int prune_object(const struct object_id *oid, const char *fullpath,
+			void *data)
+{
+	struct rev_info *revs = data;
+	struct stat st;
+
+	if (is_object_reachable(oid, revs))
 		return 0;
 
 	if (lstat(fullpath, &st)) {
@@ -102,7 +126,6 @@ static void remove_temporary_files(const char *path)
 int cmd_prune(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
-	struct progress *progress = NULL;
 	int exclude_promisor_objects = 0;
 	const struct option options[] = {
 		OPT__DRY_RUN(&show_only, N_("do not remove, show only")),
@@ -142,17 +165,13 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 
 	if (show_progress == -1)
 		show_progress = isatty(2);
-	if (show_progress)
-		progress = start_delayed_progress(_("Checking connectivity"), 0);
 	if (exclude_promisor_objects) {
 		fetch_if_missing = 0;
 		revs.exclude_promisor_objects = 1;
 	}
 
-	mark_reachable_objects(&revs, 1, expire, progress);
-	stop_progress(&progress);
 	for_each_loose_file_in_objdir(get_object_directory(), prune_object,
-				      prune_cruft, prune_subdir, NULL);
+				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
 	remove_temporary_files(get_object_directory());
@@ -160,8 +179,10 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	remove_temporary_files(s);
 	free(s);
 
-	if (is_repository_shallow(the_repository))
+	if (is_repository_shallow(the_repository)) {
+		perform_reachability_traversal(&revs);
 		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
+	}
 
 	return 0;
 }
diff --git a/t/perf/p5304-prune.sh b/t/perf/p5304-prune.sh
new file mode 100755
index 0000000000..3c852084eb
--- /dev/null
+++ b/t/perf/p5304-prune.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+test_description='performance tests of prune'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'remove reachable loose objects' '
+	git repack -ad
+'
+
+test_expect_success 'remove unreachable loose objects' '
+	git prune
+'
+
+test_expect_success 'confirm there are no loose objects' '
+	git count-objects | grep ^0
+'
+
+test_perf 'prune with no objects' '
+	git prune
+'
+
+test_done
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 270da21ac3..2c19a790c1 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -274,6 +274,18 @@ test_expect_success 'prune .git/shallow' '
 	test_path_is_missing .git/shallow
 '
 
+test_expect_success 'prune .git/shallow when there are no loose objects' '
+	SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
+	echo $SHA1 >.git/shallow &&
+	git update-ref refs/heads/shallow-tip $SHA1 &&
+	git repack -ad &&
+	# verify assumption that all loose objects are gone
+	git count-objects | grep ^0 &&
+	git prune &&
+	echo $SHA1 >expect &&
+	test_cmp expect .git/shallow
+'
+
 test_expect_success 'prune: handle alternate object database' '
 	test_create_repo A &&
 	git -C A commit --allow-empty -m "initial commit" &&
-- 
2.21.0.rc0.586.gffba1126a0


  reply index

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
2019-02-14  4:35 ` Jeff King [this message]
2019-02-14 10:54   ` [PATCH 1/3] prune: lazily perform reachability traversal Eric Sunshine
2019-02-14 11:07     ` Jeff King
2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
2019-03-10 23:39     ` Jeff King
2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
2019-03-12 10:49         ` Jeff King
2019-03-12 12:05           ` Jeff King
2019-03-13  1:51           ` Eric Wong
2019-03-13 14:54             ` Jeff King
2019-03-14  9:12               ` [PATCH v3] " Eric Wong
2019-03-14 16:02                 ` Jeff King
2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
2019-03-15 13:25                       ` SZEDER Gábor
2019-03-15 18:36                         ` Jeff King
2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
2019-04-10 22:57                   ` Jeff King
2019-04-25  7:16                     ` Junio C Hamano
2019-05-04  1:37                       ` Jeff King
2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
2019-05-04 13:23                           ` SZEDER Gábor
2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
2019-05-09  4:24                               ` Junio C Hamano
2019-05-07  7:45                           ` Jeff King
2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
2019-05-08  7:11                               ` Jeff King
2019-05-08 14:20                                 ` Derrick Stolee
2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
2019-05-08 22:25                                   ` Jeff King
2019-05-23 11:30                     ` Jeff King
2019-05-23 12:53                       ` Derrick Stolee
2019-05-24  7:24                         ` Jeff King
2019-05-24 10:33                           ` Derrick Stolee
2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
2019-05-24  7:27                         ` Jeff King
2019-05-24  7:55                           ` Ævar Arnfjörð Bjarmason
2019-05-24  8:26                             ` Jeff King
2019-05-24  9:01                               ` Ævar Arnfjörð Bjarmason
2019-05-24  9:29                                 ` SZEDER Gábor
2019-05-24 11:17                                   ` Ævar Arnfjörð Bjarmason
2019-05-24 11:41                                     ` SZEDER Gábor
2019-05-24 11:58                                       ` Ævar Arnfjörð Bjarmason
2019-05-24 12:34                                         ` SZEDER Gábor
2019-05-24 13:41                                           ` Ævar Arnfjörð Bjarmason
2019-05-24 11:31                       ` [PATCH] pack-bitmap: look for an uninteresting bitmap Derrick Stolee
2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
2019-04-18 19:49     ` Jeff King
2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
2019-04-20  1:01         ` Derrick Stolee
2019-04-20  3:24           ` Jeff King
2019-04-20 21:01             ` Derrick Stolee
2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190214043522.GA19183@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox