git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, szeder.dev@gmail.com, jnareb@gmail.com,
	peff@peff.net, garimasigit@gmail.com,
	"Derrick Stolee" <dstolee@microsoft.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH 02/10] commit-graph: fix parsing the Chunk Lookup table
Date: Fri, 05 Jun 2020 13:00:24 +0000	[thread overview]
Message-ID: <1e1671e7c699adf72fcf4ba8f03a7427ff30c9e7.1591362033.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.650.git.1591362032.gitgitgadget@gmail.com>

From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= <szeder.dev@gmail.com>

The commit-graph file format specifies that the chunks may be in any
order.  However, if the OID Lookup chunk happens to be the last one in
the file, then any command attempting to access the commit-graph data
will fail with:

  fatal: invalid commit position. commit-graph is likely corrupt

In this case the error is wrong, the commit-graph file does conform to
the specification, but the parsing of the Chunk Lookup table is a bit
buggy, and leaves the field holding the number of commits in the
commit-graph zero-initialized.

The number of commits in the commit-graph is determined while parsing
the Chunk Lookup table, by dividing the size of the OID Lookup chunk
with the hash size.  However, the Chunk Lookup table doesn't actually
store the size of the chunks, but it stores their starting offset.
Consequently, the size of a chunk can only be calculated by
subtracting the starting offsets of that chunk from the offset of the
subsequent chunk, or in case of the last chunk from the offset
recorded in the terminating label.  This is currenly implemented in a
bit complicated way: as we iterate over the entries of the Chunk
Lookup table, we check the ID of each chunk and store its starting
offset, then we check the ID of the last seen chunk and calculate its
size using its previously saved offset if necessary (at the moment
it's only necessary for the OID Lookup chunk).  Alas, while parsing
the Chunk Lookup table we only interate through the "real" chunks, but
never look at the terminating label, thus don't even check whether
it's necessary to calulate the size of the last chunk.  Consequently,
if the OID Lookup chunk is the last one, then we don't calculate its
size and turn don't run the piece of code determining the number of
commits in the commit graph, leaving the field holding that number
unchanged (i.e. zero-initialized), eventually triggering the sanity
check in load_oid_from_graph().

Fix this by iterating through all entries in the Chunk Lookup table,
including the terminating label.

Note that this is the minimal fix, suitable for the maintenance track.
A better fix would be to simplify how the chunk sizes are calculated,
but that is a more invasive change, less suitable for 'maint', so that
will be done in later patches.

This additional flexibility of scanning more chunks breaks a test for
"git commit-graph verify" so alter that test to mutate the commit-graph
to have an even lower chunk count.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 2 +-
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 196e817a84c..7807d945626 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -277,7 +277,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	last_chunk_id = 0;
 	last_chunk_offset = 8;
 	chunk_lookup = data + 8;
-	for (i = 0; i < graph->num_chunks; i++) {
+	for (i = 0; i <= graph->num_chunks; i++) {
 		uint32_t chunk_id;
 		uint64_t chunk_offset;
 		int chunk_repeated = 0;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 18304a65e4d..79e7fbcd40e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -488,7 +488,7 @@ test_expect_success 'detect bad hash version' '
 '
 
 test_expect_success 'detect low chunk count' '
-	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
 		"missing the .* chunk"
 '
 
-- 
gitgitgadget


  parent reply	other threads:[~2020-06-05 13:00 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
2020-06-05 13:00 ` [PATCH 01/10] tree-walk.c: don't match submodule entries for 'submod/anything' SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget [this message]
2020-06-05 13:00 ` [PATCH 03/10] commit-graph-format.txt: all multi-byte numbers are in network byte order SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 04/10] commit-slab: add a function to deep free entries on the slab SZEDER Gábor via GitGitGadget
2020-06-18 20:59   ` René Scharfe
2020-06-19 12:52     ` Derrick Stolee
2020-06-27 15:53   ` SZEDER Gábor
2020-06-05 13:00 ` [PATCH 05/10] diff.h: drop diff_tree_oid() & friends' return value SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 06/10] commit-graph: clean up #includes SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 07/10] commit-graph: simplify parse_commit_graph() #1 SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 08/10] commit-graph: simplify parse_commit_graph() #2 SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 09/10] commit-graph: simplify write_commit_graph_file() #1 SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 10/10] commit-graph: simplify write_commit_graph_file() #2 SZEDER Gábor via GitGitGadget
2020-06-08 17:39 ` [PATCH 00/10] Szeder's commit-graph cleanups Junio C Hamano
2020-06-18  1:48 ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1e1671e7c699adf72fcf4ba8f03a7427ff30c9e7.1591362033.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=garimasigit@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).