git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net
Subject: [PATCH v1] builtin/pack-objects.c: ignore missing links with --stdin-packs
Date: Fri, 19 Mar 2021 11:40:52 -0400	[thread overview]
Message-ID: <815623da67d283e8509fc4ac67e939c6140fc39a.1616168441.git.me@ttaylorr.com> (raw)

When 'git pack-objects --stdin-packs' encounters a commit in a pack, it
marks it as a starting point of a best-effort reachability traversal
that is used to populate the name-hash of the objects listed in the
given packs.

The traversal expects that it should be able to walk the ancestors of
all commits in a pack without issue. Ordinarily this is the case, but it
is possible to having missing parents from an unreachable part of the
repository. In that case, we'd consider any missing objects in the
unreachable portion of the graph to be junk.

This should be handled gracefully: since the traversal is best-effort
(i.e., we don't strictly need to fill in all of the name-hash fields),
we should simply ignore any missing links.

This patch does that (by setting the 'ignore_missing_links' bit on the
rev_info struct), and ensures we don't regress in the future by adding a
test which demonstrates this case.

It is a little over-eager, since it will also ignore missing links in
reachable parts of the packs (which would indicate a corrupted
repository), but '--stdin-packs' is explicitly *not* about reachability.
So this step isn't making anything worse for a repository which contains
packs missing reachable objects (since we never drop objects with
'--stdin-packs').

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
I noticed a small number of repositories using geometric repacks
complained about missing links when doing the best-effort traversal to
fill in missing namehash fields with 'git pack-objects --stdin-packs'.

This patch takes care to handle that situation gracefully. It is based
on tb/geometric-repack, which is on 'next'.

 builtin/pack-objects.c |  1 +
 t/t5300-pack-object.sh | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8cb32763b7..f513138513 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3122,6 +3122,7 @@ static void read_packs_list_from_stdin(void)
 	revs.blob_objects = 1;
 	revs.tree_objects = 1;
 	revs.tag_objects = 1;
+	revs.ignore_missing_links = 1;

 	while (strbuf_getline(&buf, stdin) != EOF) {
 		if (!buf.len)
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 7138a54595..ab509e8c38 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -629,4 +629,42 @@ test_expect_success '--stdin-packs with loose objects' '
 	)
 '

+test_expect_success '--stdin-packs with broken links' '
+	(
+		cd stdin-packs &&
+
+		# make an unreachable object with a bogus parent
+		git cat-file -p HEAD >commit &&
+		sed "s/$(git rev-parse HEAD^)/$(test_oid zero)/" <commit |
+		git hash-object -w -t commit --stdin >in &&
+
+		git pack-objects .git/objects/pack/pack-D <in &&
+
+		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
+		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
+		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
+		PACK_D="$(basename .git/objects/pack/pack-D-*.pack)" &&
+
+		git pack-objects test3 --stdin-packs --unpacked <<-EOF &&
+		$PACK_A
+		^$PACK_B
+		$PACK_C
+		$PACK_D
+		EOF
+
+		(
+			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-D-*.idx) &&
+			git rev-list --objects --no-object-names \
+				refs/tags/C..refs/tags/D
+		) >expect.raw &&
+		git show-index <$(ls test3-*.idx) >actual.raw &&
+
+		cut -d" " -f2 <expect.raw | sort >expect &&
+		cut -d" " -f2 <actual.raw | sort >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
--
2.30.0.667.g81c0cbc6fd

             reply	other threads:[~2021-03-19 15:41 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-19 15:40 Taylor Blau [this message]
2021-03-19 18:19 ` [PATCH v1] builtin/pack-objects.c: ignore missing links with --stdin-packs Junio C Hamano
2021-03-19 19:31   ` Jeff King
2021-03-19 20:55     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=815623da67d283e8509fc4ac67e939c6140fc39a.1616168441.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).