git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: sandals@crustytoothpaste.net, avarab@gmail.com,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 5/6] commit-graph: implement file format version 2
Date: Wed, 23 Jan 2019 13:59:19 -0800 (PST)
Message-ID: <c55e0a738c79c30affcf2cb68564c01c36c66dff.1548280753.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.112.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph file format had some shortcomings which we now
correct:

  1. The hash algorithm was determined by a single byte, instead
     of the 4-byte format identifier.

  2. There was no way to update the reachability index we used.
     We currently only support generation numbers, but that will
     change in the future.

  3. Git did not fail with error if the unused eighth byte was
     non-zero, so we could not use that to indicate an incremental
     file format without breaking compatibility across versions.

The new format modifies the header of the commit-graph to solve
these problems. We use the 4-byte hash format id, freeing up a byte
in our 32-bit alignment to introduce a reachability index version.
We can also fail to read the commit-graph if the eighth byte is
non-zero.

The 'git commit-graph read' subcommand needs updating to show the
new data.

Set the default file format version to 2, and adjust the tests to
expect the new 'git commit-graph read' output.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .../technical/commit-graph-format.txt         | 26 +++++++++-
 builtin/commit-graph.c                        |  9 ++++
 commit-graph.c                                | 47 ++++++++++++++++---
 commit-graph.h                                |  1 +
 t/t5318-commit-graph.sh                       |  9 +++-
 5 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 16452a0504..e367aa94b1 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -31,13 +31,22 @@ and hash type.
 
 All 4-byte numbers are in network order.
 
+There are two versions available, 1 and 2. These currently differ only in
+the header.
+
 HEADER:
 
+All commit-graph files use the first five bytes for the same purpose.
+
   4-byte signature:
       The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-      Currently, the only valid version is 1.
+      Currently, the valid version numbers are 1 and 2.
+
+The remainder of the header changes depending on the version.
+
+Version 1:
 
   1-byte Hash Version (1 = SHA-1)
       We infer the hash length (H) from this value.
@@ -47,6 +56,21 @@ HEADER:
   1-byte (reserved for later use)
      Current clients should ignore this value.
 
+Version 2:
+
+  1-byte number (C) of "chunks"
+
+  1-byte reachability index version number:
+      Currently, the only valid number is 1.
+
+  1-byte (reserved for later use)
+      Current clients expect this value to be zero, and will not
+      try to read the commit-graph file if it is non-zero.
+
+  4-byte format identifier for the hash algorithm:
+      If this identifier does not agree with the repository's current
+      hash algorithm, then the client will not read the commit graph.
+
 CHUNK LOOKUP:
 
   (C + 1) * 12 bytes listing the table of contents for the chunks:
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index b1bed84260..28787d0c9c 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -102,6 +102,11 @@ static int graph_read(int argc, const char **argv)
 		*(unsigned char*)(graph->data + 5),
 		*(unsigned char*)(graph->data + 6),
 		*(unsigned char*)(graph->data + 7));
+
+	if (*(unsigned char *)(graph->data + 4) == 2)
+		printf("hash algorithm: %X\n",
+		       get_be32(graph->data + 8));
+
 	printf("num_commits: %u\n", graph->num_commits);
 	printf("chunks:");
 
@@ -162,6 +167,10 @@ static int graph_write(int argc, const char **argv)
 	case 1:
 		flags |= COMMIT_GRAPH_VERSION_1;
 		break;
+
+	case 2:
+		flags |= COMMIT_GRAPH_VERSION_2;
+		break;
 	}
 
 	read_replace_refs = 0;
diff --git a/commit-graph.c b/commit-graph.c
index f7f45893fd..aeb6cae656 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -90,7 +90,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	uint64_t last_chunk_offset;
 	uint32_t last_chunk_id;
 	uint32_t graph_signature;
-	unsigned char graph_version, hash_version;
+	unsigned char graph_version, hash_version, reach_index_version;
+	uint32_t hash_id;
 
 	if (fd < 0)
 		return NULL;
@@ -115,9 +116,9 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != 1) {
-		error(_("graph version %X does not match version %X"),
-		      graph_version, 1);
+	if (!graph_version || graph_version > 2) {
+		error(_("unsupported graph version %X"),
+		      graph_version);
 		goto cleanup_fail;
 	}
 
@@ -135,6 +136,30 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 		graph->num_chunks = *(unsigned char*)(data + 6);
 		chunk_lookup = data + 8;
 		break;
+
+	case 2:
+		graph->num_chunks = *(unsigned char *)(data + 5);
+
+		reach_index_version = *(unsigned char *)(data + 6);
+		if (reach_index_version != 1) {
+			error(_("unsupported reachability index version %d"),
+			      reach_index_version);
+			goto cleanup_fail;
+		}
+
+		if (*(unsigned char*)(data + 7)) {
+			error(_("unsupported value in commit-graph header"));
+			goto cleanup_fail;
+		}
+
+		hash_id = get_be32(data + 8);
+		if (hash_id != the_hash_algo->format_id) {
+			error(_("commit-graph hash algorithm does not match current algorithm"));
+			goto cleanup_fail;
+		}
+
+		chunk_lookup = data + 12;
+		break;
 	}
 
 	graph->hash_len = the_hash_algo->rawsz;
@@ -802,9 +827,11 @@ int write_commit_graph(const char *obj_dir,
 
 	if (flags & COMMIT_GRAPH_VERSION_1)
 		version = 1;
+	if (flags & COMMIT_GRAPH_VERSION_2)
+		version = 2;
 	if (!version)
-		version = 1;
-	if (version != 1) {
+		version = 2;
+	if (version <= 0 || version > 2) {
 		error(_("unsupported commit-graph version %d"),
 		      version);
 		return 1;
@@ -1003,6 +1030,14 @@ int write_commit_graph(const char *obj_dir,
 		hashwrite_u8(f, 0); /* unused padding byte */
 		header_size = 8;
 		break;
+
+	case 2:
+		hashwrite_u8(f, num_chunks);
+		hashwrite_u8(f, 1); /* reachability index version */
+		hashwrite_u8(f, 0); /* unused padding byte */
+		hashwrite_be32(f, the_hash_algo->format_id);
+		header_size = 12;
+		break;
 	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
diff --git a/commit-graph.h b/commit-graph.h
index e03df54e33..050137063b 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -63,6 +63,7 @@ int generation_numbers_enabled(struct repository *r);
 #define COMMIT_GRAPH_APPEND     (1 << 0)
 #define COMMIT_GRAPH_PROGRESS   (1 << 1)
 #define COMMIT_GRAPH_VERSION_1  (1 << 2)
+#define COMMIT_GRAPH_VERSION_2  (1 << 3)
 
 int write_commit_graph_reachable(const char *obj_dir, int flags);
 int write_commit_graph(const char *obj_dir,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b79d6263e9..3ff5e3b48d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -65,7 +65,8 @@ graph_read_expect() {
 		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
 	fi
 	cat >expect <<- EOF
-	header: 43475048 1 1 $NUM_CHUNKS 0
+	header: 43475048 2 $NUM_CHUNKS 1 0
+	hash algorithm: 73686131
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
 	EOF
@@ -390,10 +391,14 @@ test_expect_success 'detect bad signature' '
 '
 
 test_expect_success 'detect bad version' '
-	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\03" \
 		"graph version"
 '
 
+test_expect_success 'detect version 2 with version 1 data' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"reachability index version"
+'
 test_expect_success 'detect bad hash version' '
 	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
 		"hash version"
-- 
gitgitgadget


  parent reply index

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 3/6] commit-graph: create new version flags Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-01-24  9:31   ` Ævar Arnfjörð Bjarmason
2019-01-23 21:59 ` Derrick Stolee via GitGitGadget [this message]
2019-01-23 23:56   ` [PATCH 5/6] commit-graph: implement file format version 2 Jonathan Tan
2019-01-24  9:40   ` Ævar Arnfjörð Bjarmason
2019-01-24 14:34     ` Derrick Stolee
2019-03-21  9:21   ` Ævar Arnfjörð Bjarmason
2019-01-23 21:59 ` [PATCH 6/6] commit-graph: test verifying a corrupt v2 header Derrick Stolee via GitGitGadget
2019-01-23 23:59   ` Jonathan Tan
2019-01-24 23:05 ` [PATCH 0/6] Create commit-graph file format v2 Junio C Hamano
2019-01-24 23:39 ` Junio C Hamano
2019-01-25 13:54   ` Derrick Stolee
2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 1/5] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 2/5] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-04-25  5:21     ` Junio C Hamano
2019-04-24 19:58   ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget
2019-04-25  5:29     ` Junio C Hamano
2019-04-25 11:09       ` Derrick Stolee
2019-04-25 21:31     ` Ævar Arnfjörð Bjarmason
2019-04-26  2:20       ` Junio C Hamano
2019-04-24 19:58   ` [PATCH v2 4/5] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 5/5] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-04-25 22:09   ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
2019-04-26  2:28     ` Junio C Hamano
2019-04-26  8:33       ` Ævar Arnfjörð Bjarmason
2019-04-26 12:06         ` Derrick Stolee
2019-04-26 13:55           ` Ævar Arnfjörð Bjarmason
2019-04-27 12:57     ` Ævar Arnfjörð Bjarmason
2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-01 14:46       ` Ævar Arnfjörð Bjarmason
2019-05-01 13:11     ` [PATCH v3 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 3/6] commit-graph: create new version parameter Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-05-01 19:12       ` Ævar Arnfjörð Bjarmason
2019-05-01 19:56         ` Derrick Stolee
2019-05-01 13:11     ` [PATCH v3 6/6] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-01 14:58       ` Ævar Arnfjörð Bjarmason
2019-05-01 19:59         ` Derrick Stolee
2019-05-01 20:25     ` [PATCH v3 0/6] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
2019-05-02 13:26       ` Derrick Stolee
2019-05-02 18:02         ` Ævar Arnfjörð Bjarmason
2019-05-03 12:47           ` Derrick Stolee
2019-05-03 13:41             ` Ævar Arnfjörð Bjarmason
2019-05-06  8:27               ` Christian Couder
2019-05-06 13:47                 ` Derrick Stolee
2019-05-03 14:16             ` SZEDER Gábor
2019-05-03 15:11               ` Derrick Stolee
2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-05-13  2:56         ` Junio C Hamano
2019-05-09 14:22       ` [PATCH v4 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-13  3:13         ` Junio C Hamano
2019-05-13 11:04           ` Derrick Stolee
2019-05-13 11:22             ` Derrick Stolee
2019-05-09 14:22       ` [PATCH v4 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-13  3:44         ` Junio C Hamano
2019-05-13 11:07           ` Derrick Stolee
2019-05-09 14:22       ` [PATCH v4 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-05-13  5:05         ` Junio C Hamano
2019-05-09 14:22       ` [PATCH v4 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
2019-05-13  5:09         ` Junio C Hamano
2019-05-09 17:58       ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Josh Steadmon
2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-06-29 17:23           ` SZEDER Gábor
2019-07-01 12:19             ` Derrick Stolee
2019-06-12 13:29         ` [PATCH v5 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c55e0a738c79c30affcf2cb68564c01c36c66dff.1548280753.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git