git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Rafael Silva <rafaeloliveira.cs@gmail.com>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Rafael Silva" <rafaeloliveira.cs@gmail.com>
Subject: [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones
Date: Sun, 18 Apr 2021 15:57:49 +0200	[thread overview]
Message-ID: <20210418135749.27152-2-rafaeloliveira.cs@gmail.com> (raw)
In-Reply-To: <20210418135749.27152-1-rafaeloliveira.cs@gmail.com>

When `git repack -A -d` is run in a partial clone, `pack-objects`
is invoked twice: once to repack all promisor objects, and once to
repack all non-promisor objects. The latter `pack-objects` invocation
is with --exclude-promisor-objects and --unpack-unreachable, which
loosens all unused objects. Unfortunately, this includes promisor
objects.

Because the -d argument to `git repack` subsequently deletes all loose
objects also in packs, these just-loosened promisor objects will be
immediately deleted. However, this extra disk churn is unnecessary in
the first place.  For example, a newly-clone partial repo that filters
all blob objects (e.g. `--filter=blob:none`), `repack` ends up
unpacking all trees and commits into the filesystem because every
object, in this particular case, is a promisor object. Depending on
the repo size, this increases the disk usage considerably: In my copy
of the linux.git, the object directory peaked 26GB of more disk usage.

In order to avoid this extra disk churn, pass the names of the promisor
packfiles as --keep-pack arguments to the second invocation of
`pack-objects`. This informs `pack-objects` that the promisor objects
are already in a safe packfile and, therefore, do not need to be
loosened. The --keep-pack option takes only a packfile name, but we
concatenate both the path and the name in a single string. Instead,
let's split them into separate string in order to easily pass the
packfile name later.

For testing, we need to validate whether any object was loosened.
However, the "evidence" (loosened objects) is deleted during the
process which prevents us from inspecting the object directory.
Instead, let's teach `pack-objects` to count loosened objects and
emit via trace2 thus allowing inspecting the debug events after the
process is finished. This new event is used on the added regression
test.

Lastly, add a new perf test to evaluate the performance impact
made by this changes (tested on git.git):

     Test          HEAD^                 HEAD
     ----------------------------------------------------------
     5600.3: gc    134.38(41.93+90.95)   7.80(6.72+1.35) -94.2%

For a bigger repository, such as linux.git, the improvement is
even bigger:

     Test          HEAD^                     HEAD
     -------------------------------------------------------------------
     5600.3: gc    6833.00(918.07+3162.74)   268.79(227.02+39.18) -96.1%

These improvements are particular big because every object in the
newly-cloned partial repository is a promisor object.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
---
 builtin/pack-objects.c        | 8 +++++++-
 builtin/repack.c              | 9 +++++++--
 t/perf/p5600-partial-clone.sh | 4 ++++
 t/t5616-partial-clone.sh      | 8 ++++++++
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 40ee6fa19f..73889cec95 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3479,6 +3479,7 @@ static void loosen_unused_packed_objects(void)
 {
 	struct packed_git *p;
 	uint32_t i;
+	uint32_t loosened_objects_nr = 0;
 	struct object_id oid;
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
@@ -3492,11 +3493,16 @@ static void loosen_unused_packed_objects(void)
 			nth_packed_object_id(&oid, p, i);
 			if (!packlist_find(&to_pack, &oid) &&
 			    !has_sha1_pack_kept_or_nonlocal(&oid) &&
-			    !loosened_object_can_be_discarded(&oid, p->mtime))
+			    !loosened_object_can_be_discarded(&oid, p->mtime)) {
 				if (force_object_loose(&oid, p->mtime))
 					die(_("unable to force loose object"));
+				loosened_objects_nr++;
+			}
 		}
 	}
+
+	trace2_data_intmax("pack-objects", the_repository,
+			   "loosen_unused_packed_objects/loosened", loosened_objects_nr);
 }
 
 /*
diff --git a/builtin/repack.c b/builtin/repack.c
index 2847fdfbab..5f9bc74adc 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -20,7 +20,7 @@ static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
 static int write_bitmaps = -1;
 static int use_delta_islands;
-static char *packdir, *packtmp;
+static char *packdir, *packtmp_name, *packtmp;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [<options>]"),
@@ -530,7 +530,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	}
 
 	packdir = mkpathdup("%s/pack", get_object_directory());
-	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
+	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
+	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
 
 	sigchain_push_common(remove_pack_on_signal);
 
@@ -573,6 +574,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		repack_promisor_objects(&po_args, &names);
 
 		if (existing_packs.nr && delete_redundant) {
+			for_each_string_list_item(item, &names) {
+				strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
+					     packtmp_name, item->string);
+			}
 			if (unpack_unreachable) {
 				strvec_pushf(&cmd.args,
 					     "--unpack-unreachable=%s",
diff --git a/t/perf/p5600-partial-clone.sh b/t/perf/p5600-partial-clone.sh
index ca785a3341..a965f2c4d6 100755
--- a/t/perf/p5600-partial-clone.sh
+++ b/t/perf/p5600-partial-clone.sh
@@ -35,4 +35,8 @@ test_perf 'count non-promisor commits' '
 	git -C bare.git rev-list --all --count --exclude-promisor-objects
 '
 
+test_perf 'gc' '
+	git -C bare.git gc
+'
+
 test_done
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 5cb415386e..6e3e7565d0 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -548,6 +548,14 @@ test_expect_success 'fetch from a partial clone, protocol v2' '
 	grep "version 2" trace
 '
 
+test_expect_success 'repack does not loosen promisor objects' '
+	rm -rf client trace &&
+	git clone --bare --filter=blob:none "file://$(pwd)/srv.bare" client &&
+	test_when_finished "rm -rf client trace" &&
+	GIT_TRACE2_PERF="$(pwd)/trace" git -C client repack -A -d &&
+	grep "loosen_unused_packed_objects/loosened:0" trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.31.0.699.g8849f49b87


  reply	other threads:[~2021-04-18 14:00 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-03  9:04 rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-05  1:02 ` Rafael Silva
2021-04-07 21:17   ` Jeff King
2021-04-08  0:02     ` Jonathan Tan
2021-04-08  0:35       ` Jeff King
2021-04-12  7:09     ` Rafael Silva
2021-04-12 21:36     ` SZEDER Gábor
2021-04-12 21:49       ` Bryan Turner
2021-04-12 23:51         ` Jeff King
2021-04-12 23:47       ` Jeff King
2021-04-13  7:12         ` [PATCH 0/3] low-hanging performance fruit with promisor packs Jeff King
2021-04-13  7:15           ` [PATCH 1/3] is_promisor_object(): free tree buffer after parsing Jeff King
2021-04-13 20:17             ` Junio C Hamano
2021-04-14  5:18               ` Jeff King
2021-04-13  7:16           ` [PATCH 2/3] lookup_unknown_object(): take a repository argument Jeff King
2021-04-13  7:17           ` [PATCH 3/3] revision: avoid parsing with --exclude-promisor-objects Jeff King
2021-04-13 20:22             ` Junio C Hamano
2021-04-13 18:10           ` [PATCH 0/3] low-hanging performance fruit with promisor packs SZEDER Gábor
2021-04-14 17:14           ` Jonathan Tan
2021-04-14 19:22           ` Rafael Silva
2021-04-13 18:05         ` rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-14  5:14           ` Jeff King
2021-04-11 10:59   ` SZEDER Gábor
2021-04-12  7:53     ` Rafael Silva
2021-04-14 19:14 ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Rafael Silva
2021-04-14 19:14   ` [PATCH 1/2] repack: teach --no-prune-packed to skip `git prune-packed` Rafael Silva
2021-04-14 23:50     ` Jonathan Tan
2021-04-18 14:15       ` Rafael Silva
2021-04-14 19:14   ` [PATCH 2/2] repack: avoid loosening promisor pack objects in partial clones Rafael Silva
2021-04-15  1:04     ` Jonathan Tan
2021-04-15  3:51       ` Junio C Hamano
2021-04-15  9:03         ` Jeff King
2021-04-15  9:05       ` Jeff King
2021-04-18  7:12       ` Rafael Silva
2021-04-15 18:06     ` Junio C Hamano
2021-04-18  8:40       ` Rafael Silva
2021-04-14 22:10   ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Junio C Hamano
2021-04-15  9:15   ` Jeff King
2021-04-18  8:20     ` Rafael Silva
2021-04-18 13:57   ` [PATCH v2 0/1] " Rafael Silva
2021-04-18 13:57     ` Rafael Silva [this message]
2021-04-19 19:15       ` [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones Jonathan Tan
2021-04-21 18:54         ` Rafael Silva
2021-04-19 23:09       ` Junio C Hamano
2021-04-21 19:25         ` Rafael Silva
2021-04-21 19:32     ` [PATCH v3] " Rafael Silva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210418135749.27152-2-rafaeloliveira.cs@gmail.com \
    --to=rafaeloliveira.cs@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).