git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, abhishekkumar8222@gmail.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH 6/7] commit-graph: parse file format v2
Date: Thu, 24 Feb 2022 20:38:35 +0000	[thread overview]
Message-ID: <28fe8824ba71ff9cda5ec5c034b6a6fb3c857654.1645735117.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1163.git.1645735117.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The commit-graph file format v2 alters how it stores the corrected
commit date offsets within the Commit Data chunk instead of a separate
chunk. The idea is to significantly reduce the amount of data loaded
from disk while parsing the commit-graph.

We need to alter the error message when we see a file format version
outside of our range now that multiple are possible. This has a
non-functional side-effect of altering a use of GRAPH_VERSION within
write_commit_graph().

By storing the file format version in 'struct commit_graph', we can
alter the parsing code to depend on that version value. This involves
changing where we look for the corrected commit date offset, but also
which constants we use for jumping into the Generation Data Overflow
chunk. The Commit Data chunk only has 30 bits available for the offset
while the Generation Data chunk has 32 bits. This only makes a
meaningful difference in very malformed repositories.

Also, we need to be careful about how we enable using corrected commit
dates and generation numbers to rely upon the read_generation_data value
instead of a non-zero value in the Commit Date chunk. In
generation_numbers_enabled(), the first_generation variable is
attemptint to look for the first topological level stored to see that it
is nonzero. However, for a v2 commit-graph, this value is actually
likely to be zero because the corrected commit date offset is probably
zero.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c          | 43 +++++++++++++++++++++++++++++------------
 commit-graph.h          |  6 ++++++
 t/t5318-commit-graph.sh |  2 +-
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b86a6a634fe..366fc4d6e41 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -49,7 +49,9 @@ void git_test_write_commit_graph_or_die(void)
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
-#define GRAPH_VERSION GRAPH_VERSION_1
+#define GRAPH_VERSION_2 0x2
+#define GRAPH_VERSION_MIN GRAPH_VERSION_1
+#define GRAPH_VERSION_MAX GRAPH_VERSION_2
 
 #define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
@@ -63,6 +65,7 @@ void git_test_write_commit_graph_or_die(void)
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 #define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31)
+#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW_V3 (1ULL << 29)
 
 /* Remember to update object flag allocation in object.h */
 #define REACHABLE       (1u<<15)
@@ -358,9 +361,10 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != GRAPH_VERSION) {
-		error(_("commit-graph version %X does not match version %X"),
-		      graph_version, GRAPH_VERSION);
+	if (graph_version < GRAPH_VERSION_MIN ||
+	    graph_version > GRAPH_VERSION_MAX) {
+		error(_("commit-graph version %X is not understood"),
+		      graph_version);
 		return NULL;
 	}
 
@@ -375,6 +379,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 
 	graph = alloc_commit_graph();
 
+	graph->version = graph_version;
 	graph->hash_len = the_hash_algo->rawsz;
 	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->data = graph_map;
@@ -402,13 +407,17 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
 
-	if (get_configured_generation_version(r) >= 2) {
-		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
-			&graph->chunk_generation_data);
+	if (graph_version >= GRAPH_VERSION_2 ||
+	    get_configured_generation_version(r) >= 2) {
+		/* Skip this chunk if GRAPH_VERSION_2 or higher. */
+		if (graph_version == GRAPH_VERSION_1)
+			pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
+				   &graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
 			&graph->chunk_generation_data_overflow);
 
-		if (graph->chunk_generation_data)
+		if (graph_version >= GRAPH_VERSION_2 ||
+		    graph->chunk_generation_data)
 			graph->read_generation_data = 1;
 	}
 
@@ -683,6 +692,9 @@ int generation_numbers_enabled(struct repository *r)
 	if (!g->num_commits)
 		return 0;
 
+	if (g->version >= GRAPH_VERSION_2)
+		return g->read_generation_data;
+
 	first_generation = get_be32(g->chunk_commit_data +
 				    g->hash_len + 8) >> 2;
 
@@ -799,13 +811,20 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
 	if (g->read_generation_data) {
-		offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
+		timestamp_t overflow_bit;
+		if (g->version == GRAPH_VERSION_2) {
+			offset = (timestamp_t)(get_be32(commit_data + g->hash_len + 8) >> 2);
+			overflow_bit = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW_V3;
+		} else {
+			offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
+			overflow_bit = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW;
+		}
 
-		if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) {
+		if (offset & overflow_bit) {
 			if (!g->chunk_generation_data_overflow)
 				die(_("commit-graph requires overflow generation data but has none"));
 
-			offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW;
+			offset_pos = offset ^ overflow_bit;
 			graph_data->generation = item->date + get_be64(g->chunk_generation_data_overflow + 8 * offset_pos);
 		} else
 			graph_data->generation = item->date + offset;
@@ -1917,7 +1936,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
-	hashwrite_u8(f, GRAPH_VERSION);
+	hashwrite_u8(f, GRAPH_VERSION_1);
 	hashwrite_u8(f, oid_version());
 	hashwrite_u8(f, get_num_chunks(cf));
 	hashwrite_u8(f, ctx->num_commit_graphs_after - 1);
diff --git a/commit-graph.h b/commit-graph.h
index 04a94e18302..b379b8eae25 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -63,6 +63,12 @@ struct commit_graph {
 	const unsigned char *data;
 	size_t data_len;
 
+	/**
+	 * The 'version' byte mirrors the file format version. This is
+	 * necessary to consider when parsing commits.
+	 */
+	unsigned version;
+
 	unsigned char hash_len;
 	unsigned char num_chunks;
 	uint32_t num_commits;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5e4b0216fa6..a14a13e5f7b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -605,7 +605,7 @@ test_expect_success 'detect bad signature' '
 '
 
 test_expect_success 'detect bad version' '
-	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\03" \
 		"graph version"
 '
 
-- 
gitgitgadget


  parent reply	other threads:[~2022-02-24 20:39 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-24 20:38 [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 1/7] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 2/7] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-24 22:15   ` Junio C Hamano
2022-02-25 13:51     ` Derrick Stolee
2022-02-25 17:35       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 3/7] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:18   ` Patrick Steinhardt
2022-02-28 16:23     ` Derrick Stolee
2022-02-28 16:59       ` Patrick Steinhardt
2022-02-28 18:44         ` Derrick Stolee
2022-03-01  9:46           ` Patrick Steinhardt
2022-03-01 10:35             ` Patrick Steinhardt
2022-03-01 14:06               ` Derrick Stolee
2022-03-01 14:53                 ` Patrick Steinhardt
2022-03-01 15:25                   ` Derrick Stolee
2022-03-02 13:57                     ` Patrick Steinhardt
2022-03-02 14:57                       ` Derrick Stolee
2022-03-02 18:15                         ` Junio C Hamano
2022-03-02 18:46                           ` Derrick Stolee
2022-03-02 22:42                             ` Junio C Hamano
2022-03-03 11:19                         ` Patrick Steinhardt
2022-03-03 16:00                           ` Derrick Stolee
2022-03-04 14:03                             ` Derrick Stolee
2022-03-07 10:34                               ` Patrick Steinhardt
2022-03-07 13:45                                 ` Derrick Stolee
2022-03-07 17:22                                   ` Junio C Hamano
2022-03-10 13:58                                   ` Patrick Steinhardt
2022-03-10 17:18                                     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 4/7] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-24 22:35   ` Junio C Hamano
2022-02-25 13:53     ` Derrick Stolee
2022-02-25 17:38       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 5/7] commit-graph: document file format v2 Derrick Stolee via GitGitGadget
2022-02-24 22:55   ` Junio C Hamano
2022-02-25 22:31   ` Ævar Arnfjörð Bjarmason
2022-02-28 13:44     ` Derrick Stolee
2022-02-28 14:27       ` Ævar Arnfjörð Bjarmason
2022-02-28 16:39         ` Derrick Stolee
2022-02-28 21:14           ` Ævar Arnfjörð Bjarmason
2022-03-01 14:19             ` Derrick Stolee
2022-03-01 14:29               ` Ævar Arnfjörð Bjarmason
2022-03-01 15:59                 ` Derrick Stolee
2022-02-24 20:38 ` Derrick Stolee via GitGitGadget [this message]
2022-02-24 23:01   ` [PATCH 6/7] commit-graph: parse " Junio C Hamano
2022-02-25 13:54     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 7/7] commit-graph: write " Derrick Stolee via GitGitGadget
2022-02-24 21:42 ` [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Junio C Hamano
2022-02-24 23:06   ` Junio C Hamano
2022-02-25 13:55     ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Derrick Stolee via GitGitGadget
2022-02-28 13:53   ` [PATCH v2 1/4] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-28 15:22     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-28 15:25     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 3/4] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:30     ` Ævar Arnfjörð Bjarmason
2022-02-28 16:43       ` Derrick Stolee
2022-02-28 13:53   ` [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-28 15:40     ` Ævar Arnfjörð Bjarmason
2022-03-01 17:23   ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Ævar Arnfjörð Bjarmason
2022-03-01 19:48   ` [PATCH v3 0/5] " Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 1/5] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 2/5] t5318: extract helpers to lib-commit-graph.sh Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 3/5] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-03-01 20:13       ` Junio C Hamano
2022-03-01 20:30         ` Junio C Hamano
2022-03-02 14:13           ` Derrick Stolee
2022-03-01 19:48     ` [PATCH v3 4/5] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 5/5] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28fe8824ba71ff9cda5ec5c034b6a6fb3c857654.1645735117.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).