git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: martin.agren@gmail.com, sandals@crustytoothpaste.net,
	me@ttaylorr.com, abhishekkumar8222@gmail.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 3/3] multi-pack-index: use hash version byte
Date: Fri, 14 Aug 2020 18:07:20 +0000	[thread overview]
Message-ID: <b4645789adf26e46cea721896867135941fb7654.1597428440.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.703.git.1597428440.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Similar to the commit-graph format, the multi-pack-index format has a
byte in the header intended to track the hash version used to write the
file. This allows one to interpret the hash length without having the
context of the repository config specifying the hash length. This was
not modified as part of the SHA-256 work because the hash length was
automatically up-shifted due to that config.

Since we have this byte available, we can make the file formats more
obviously incompatible instead of relying on other context from the
repository.

Add a new oid_version() method in midx.c similar to the one in
commit-graph.c. This is specifically made separate from that
implementation to avoid artificially linking the formats.

The test impact requires a few more things than the corresponding change
in the commit-graph format. Specifically, 'test-tool read-midx' was not
writing anything about this header value to output. Since the value
available in 'struct multi_pack_index' is hash_len instead of a version
value, we output "20" or "32" instead of "1" or "2".

Since we want a user to not have their Git commands fail if their
multi-pack-index has the incorrect hash version compared to the
repository's hash version, we relax the die() to an error() in
load_multi_pack_index(). This has some effect on 'git multi-pack-index
verify' as we need to check that a failed parse of a file that exists is
actually a verify error. For that test that checks the hash version
matches, we change the corrupted byte from "2" to "3" to ensure the test
fails for both hash algorithms.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/pack-format.txt |  7 +++-
 midx.c                                  | 32 ++++++++++++++----
 t/helper/test-read-midx.c               |  8 +++--
 t/t5319-multi-pack-index.sh             | 43 ++++++++++++++++++++++---
 4 files changed, 77 insertions(+), 13 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index d3a142c652..16cf7e83aa 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -273,7 +273,12 @@ HEADER:
 	    Git only writes or recognizes version 1.
 
 	1-byte Object Id Version
-	    Git only writes or recognizes version 1 (SHA1).
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	    If the hash type does not match the repository's hash algorithm,
+	    the multi-pack-index file should be ignored with a warning
+	    presented to the user.
 
 	1-byte number of "chunks"
 
diff --git a/midx.c b/midx.c
index a5fb797ede..0c165a40f5 100644
--- a/midx.c
+++ b/midx.c
@@ -17,7 +17,6 @@
 #define MIDX_BYTE_HASH_VERSION 5
 #define MIDX_BYTE_NUM_CHUNKS 6
 #define MIDX_BYTE_NUM_PACKS 8
-#define MIDX_HASH_VERSION 1
 #define MIDX_HEADER_SIZE 12
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
 
@@ -36,6 +35,15 @@
 
 #define PACK_EXPIRED UINT_MAX
 
+static uint8_t oid_version(void)
+{
+	if (the_hash_algo->rawsz == GIT_SHA1_RAWSZ)
+		return 1;
+	if (the_hash_algo->rawsz == GIT_SHA256_RAWSZ)
+		return 2;
+	die(_("invalid hash version"));
+}
+
 static char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
@@ -90,8 +98,11 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 		      m->version);
 
 	hash_version = m->data[MIDX_BYTE_HASH_VERSION];
-	if (hash_version != MIDX_HASH_VERSION)
-		die(_("hash version %u does not match"), hash_version);
+	if (hash_version != oid_version()) {
+		error(_("multi-pack-index hash version %u does not match version %u"),
+		      hash_version, oid_version());
+		goto cleanup_fail;
+	}
 	m->hash_len = the_hash_algo->rawsz;
 
 	m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
@@ -418,7 +429,7 @@ static size_t write_midx_header(struct hashfile *f,
 
 	hashwrite_be32(f, MIDX_SIGNATURE);
 	byte_values[0] = MIDX_VERSION;
-	byte_values[1] = MIDX_HASH_VERSION;
+	byte_values[1] = oid_version();
 	byte_values[2] = num_chunks;
 	byte_values[3] = 0; /* unused */
 	hashwrite(f, byte_values, sizeof(byte_values));
@@ -1105,8 +1116,17 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
 	verify_midx_error = 0;
 
-	if (!m)
-		return 0;
+	if (!m) {
+		int result = 0;
+		struct stat sb;
+		char *filename = get_midx_filename(object_dir);
+		if (!stat(filename, &sb)) {
+			error(_("multi-pack-index file exists, but failed to parse"));
+			result = 1;
+		}
+		free(filename);
+		return result;
+	}
 
 	if (flags & MIDX_PROGRESS)
 		progress = start_progress(_("Looking for referenced packfiles"),
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 831b586d02..2430880f78 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -7,14 +7,18 @@
 static int read_midx_file(const char *object_dir)
 {
 	uint32_t i;
-	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	struct multi_pack_index *m;
+
+	setup_git_directory();
+	m = load_multi_pack_index(object_dir, 1);
 
 	if (!m)
 		return 1;
 
-	printf("header: %08x %d %d %d\n",
+	printf("header: %08x %d %d %d %d\n",
 	       m->signature,
 	       m->version,
+	       m->hash_len,
 	       m->num_chunks,
 	       m->num_packs);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 7dfff0f8f4..09cbca4949 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -5,6 +5,8 @@ test_description='multi-pack-indexes'
 
 objdir=.git/objects
 
+HASH_LEN=$(test_oid rawsz)
+
 midx_read_expect () {
 	NUM_PACKS=$1
 	NUM_OBJECTS=$2
@@ -13,7 +15,7 @@ midx_read_expect () {
 	EXTRA_CHUNKS="$5"
 	{
 		cat <<-EOF &&
-		header: 4d494458 1 $NUM_CHUNKS $NUM_PACKS
+		header: 4d494458 1 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
 		chunks: pack-names oid-fanout oid-lookup object-offsets$EXTRA_CHUNKS
 		num_objects: $NUM_OBJECTS
 		packs:
@@ -46,7 +48,7 @@ test_expect_success "don't write midx with no packs" '
 	test_path_is_missing pack/multi-pack-index
 '
 
-test_expect_success "Warn if a midx contains no oid" '
+test_expect_success SHA1 'warn if a midx contains no oid' '
 	cp "$TEST_DIRECTORY"/t5319/no-objects.midx $objdir/pack/multi-pack-index &&
 	test_must_fail git multi-pack-index verify &&
 	rm $objdir/pack/multi-pack-index
@@ -198,6 +200,40 @@ test_expect_success 'write midx with twelve packs' '
 
 compare_results_with_midx "twelve packs"
 
+test_expect_success 'warn on improper hash version' '
+	git init --object-format=sha1 sha1 &&
+	(
+		cd sha1 &&
+		git config core.multiPackIndex true &&
+		test_commit 1 &&
+		git repack -a &&
+		git multi-pack-index write &&
+		mv .git/objects/pack/multi-pack-index ../mpi-sha1
+	) &&
+	git init --object-format=sha256 sha256 &&
+	(
+		cd sha256 &&
+		git config core.multiPackIndex true &&
+		test_commit 1 &&
+		git repack -a &&
+		git multi-pack-index write &&
+		mv .git/objects/pack/multi-pack-index ../mpi-sha256
+	) &&
+	(
+		cd sha1 &&
+		mv ../mpi-sha256 .git/objects/pack/multi-pack-index &&
+		git log -1 2>err &&
+		test_i18ngrep "multi-pack-index hash version 2 does not match version 1" err
+	) &&
+	(
+		cd sha256 &&
+		mv ../mpi-sha1 .git/objects/pack/multi-pack-index &&
+		git log -1 2>err &&
+		test_i18ngrep "multi-pack-index hash version 1 does not match version 2" err
+	)
+'
+
+
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
@@ -243,7 +279,6 @@ test_expect_success 'verify bad signature' '
 		"multi-pack-index signature"
 '
 
-HASH_LEN=$(test_oid rawsz)
 NUM_OBJECTS=74
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
@@ -272,7 +307,7 @@ test_expect_success 'verify bad version' '
 '
 
 test_expect_success 'verify bad OID version' '
-	corrupt_midx_and_verify $MIDX_BYTE_OID_VERSION "\02" $objdir \
+	corrupt_midx_and_verify $MIDX_BYTE_OID_VERSION "\03" $objdir \
 		"hash version"
 '
 
-- 
gitgitgadget

  parent reply	other threads:[~2020-08-14 18:07 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-14 18:07 [PATCH 0/3] SHA-256: Update commit-graph and multi-pack-index formats Derrick Stolee via GitGitGadget
2020-08-14 18:07 ` [PATCH 1/3] t/README: document GIT_TEST_DEFAULT_HASH Derrick Stolee via GitGitGadget
2020-08-14 19:02   ` Junio C Hamano
2020-08-14 20:39   ` Eric Sunshine
2020-08-14 20:41     ` Derrick Stolee
2020-08-14 18:07 ` [PATCH 2/3] commit-graph: use the hash version byte Derrick Stolee via GitGitGadget
2020-08-14 19:05   ` Junio C Hamano
2020-08-14 20:05     ` Taylor Blau
2020-08-14 20:11   ` brian m. carlson
2020-08-14 20:22     ` Junio C Hamano
2020-08-14 20:36     ` Derrick Stolee
2020-08-15 13:46       ` Martin Ågren
2020-08-14 18:07 ` Derrick Stolee via GitGitGadget [this message]
2020-08-14 20:14   ` [PATCH 3/3] multi-pack-index: use " brian m. carlson
2020-08-14 19:25 ` [PATCH 0/3] SHA-256: Update commit-graph and multi-pack-index formats Junio C Hamano
2020-08-14 20:34   ` Derrick Stolee
2020-08-14 21:41     ` Junio C Hamano
2020-08-17 14:04 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2020-08-17 14:04   ` [PATCH v2 1/3] t/README: document GIT_TEST_DEFAULT_HASH Derrick Stolee via GitGitGadget
2020-08-17 14:04   ` [PATCH v2 2/3] commit-graph: use the "hash version" byte Derrick Stolee via GitGitGadget
2020-08-17 14:04   ` [PATCH v2 3/3] multi-pack-index: use hash version byte Derrick Stolee via GitGitGadget
2020-08-17 23:12   ` [PATCH v2 0/3] SHA-256: Update commit-graph and multi-pack-index formats brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b4645789adf26e46cea721896867135941fb7654.1597428440.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=martin.agren@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=sandals@crustytoothpaste.net \
    --subject='Re: [PATCH 3/3] multi-pack-index: use hash version byte' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).