git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>
Subject: [PATCH 17/20] commit-graph: check bounds when accessing BDAT chunk
Date: Mon, 9 Oct 2023 17:05:50 -0400	[thread overview]
Message-ID: <20231009210550.GQ3282181@coredump.intra.peff.net> (raw)
In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>

When loading Bloom filters from a commit-graph file, we use the offset
values in the BIDX chunk to index into the memory mapped for the BDAT
chunk. But since we don't record how big the BDAT chunk is, we just
trust that the BIDX offsets won't cause us to read outside of the chunk
memory. A corrupted or malicious commit-graph file will cause us to
segfault (in practice this isn't a very interesting attack, since
commit-graph files are local-only, and the worst case is an
out-of-bounds read).

We can't fix this by checking the chunk size during parsing, since the
data in the BDAT chunk doesn't have a fixed size (that's why we need the
BIDX in the first place). So we'll fix it in two parts:

  1. Record the BDAT chunk size during parsing, and then later check
     that the BIDX offsets we look up are within bounds.

  2. Because the offsets are relative to the end of the BDAT header, we
     must also make sure that the BDAT chunk is at least as large as the
     expected header size. Otherwise, we overflow when trying to move
     past the header, even for an offset of "0". We can check this
     early, during the parsing stage.

The error messages are rather verbose, but since this is not something
you'd expect to see outside of severe bugs or corruption, it makes sense
to err on the side of too many details. Sadly we can't mention the
filename during the chunk-parsing stage, as we haven't set g->filename
at this point, nor passed it down through the stack.

Signed-off-by: Jeff King <peff@peff.net>
---
 bloom.c              | 24 ++++++++++++++++++++++++
 commit-graph.c       | 10 ++++++++++
 commit-graph.h       |  1 +
 t/t4216-log-bloom.sh | 28 ++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+)

diff --git a/bloom.c b/bloom.c
index aef6b5fea2..61abad7f8c 100644
--- a/bloom.c
+++ b/bloom.c
@@ -29,6 +29,26 @@ static inline unsigned char get_bitmask(uint32_t pos)
 	return ((unsigned char)1) << (pos & (BITS_PER_WORD - 1));
 }
 
+static int check_bloom_offset(struct commit_graph *g, uint32_t pos,
+			      uint32_t offset)
+{
+	/*
+	 * Note that we allow offsets equal to the data size, which would set
+	 * our pointers at one past the end of the chunk memory. This is
+	 * necessary because the on-disk index points to the end of the
+	 * entries (so we can compute size by comparing adjacent ones). And
+	 * naturally the final entry's end is one-past-the-end of the chunk.
+	 */
+	if (offset <= g->chunk_bloom_data_size - BLOOMDATA_CHUNK_HEADER_SIZE)
+		return 0;
+
+	warning("ignoring out-of-range offset (%"PRIuMAX") for changed-path"
+		" filter at pos %"PRIuMAX" of %s (chunk size: %"PRIuMAX")",
+		(uintmax_t)offset, (uintmax_t)pos,
+		g->filename, (uintmax_t)g->chunk_bloom_data_size);
+	return -1;
+}
+
 static int load_bloom_filter_from_graph(struct commit_graph *g,
 					struct bloom_filter *filter,
 					uint32_t graph_pos)
@@ -51,6 +71,10 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 	else
 		start_index = 0;
 
+	if (check_bloom_offset(g, lex_pos, end_index) < 0 ||
+	    check_bloom_offset(g, lex_pos - 1, start_index) < 0)
+		return 0;
+
 	filter->len = end_index - start_index;
 	filter->data = (unsigned char *)(g->chunk_bloom_data +
 					sizeof(unsigned char) * start_index +
diff --git a/commit-graph.c b/commit-graph.c
index f446e76c28..f7a42be6d0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -365,7 +365,17 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
 {
 	struct commit_graph *g = data;
 	uint32_t hash_version;
+
+	if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) {
+		warning("ignoring too-small changed-path chunk"
+			" (%"PRIuMAX" < %"PRIuMAX") in commit-graph file",
+			(uintmax_t)chunk_size,
+			(uintmax_t)BLOOMDATA_CHUNK_HEADER_SIZE);
+		return -1;
+	}
+
 	g->chunk_bloom_data = chunk_start;
+	g->chunk_bloom_data_size = chunk_size;
 	hash_version = get_be32(chunk_start);
 
 	if (hash_version != 1)
diff --git a/commit-graph.h b/commit-graph.h
index b373f15802..c6870274c5 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -101,6 +101,7 @@ struct commit_graph {
 	size_t chunk_base_graphs_size;
 	const unsigned char *chunk_bloom_indexes;
 	const unsigned char *chunk_bloom_data;
+	size_t chunk_bloom_data_size;
 
 	struct topo_level_slab *topo_levels;
 	struct bloom_filter_settings *bloom_filter_settings;
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index fa9d32facf..7a727bcddd 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -5,6 +5,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-chunk.sh
 
 GIT_TEST_COMMIT_GRAPH=0
 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
@@ -404,4 +405,31 @@ test_expect_success 'Bloom generation backfills empty commits' '
 	)
 '
 
+corrupt_graph () {
+	graph=.git/objects/info/commit-graph &&
+	test_when_finished "rm -rf $graph" &&
+	git commit-graph write --reachable --changed-paths &&
+	corrupt_chunk_file $graph "$@"
+}
+
+check_corrupt_graph () {
+	corrupt_graph "$@" &&
+	git -c core.commitGraph=false log -- A/B/file2 >expect.out &&
+	git -c core.commitGraph=true log -- A/B/file2 >out 2>err &&
+	test_cmp expect.out out
+}
+
+test_expect_success 'Bloom reader notices too-small data chunk' '
+	check_corrupt_graph BDAT clear 00000000 &&
+	echo "warning: ignoring too-small changed-path chunk" \
+		"(4 < 12) in commit-graph file" >expect.err &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'Bloom reader notices out-of-bounds filter offsets' '
+	check_corrupt_graph BIDX 12 FFFFFFFF &&
+	# use grep to avoid depending on exact chunk size
+	grep "warning: ignoring out-of-range offset (4294967295) for changed-path filter at pos 3 of .git/objects/info/commit-graph" err
+'
+
 test_done
-- 
2.42.0.884.g35e1fe1a6a


  parent reply	other threads:[~2023-10-09 21:06 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-09 20:55 [PATCH 0/20] bounds-checks for chunk-based files Jeff King
2023-10-09 20:58 ` [PATCH 01/20] chunk-format: note that pair_chunk() is unsafe Jeff King
2023-10-10 23:45   ` Taylor Blau
2023-10-11 22:49     ` Jeff King
2023-10-09 20:58 ` [PATCH 02/20] t: add library for munging chunk-format files Jeff King
2023-10-10 23:47   ` Taylor Blau
2023-10-09 20:59 ` [PATCH 03/20] midx: stop ignoring malformed oid fanout chunk Jeff King
2023-10-10 23:50   ` Taylor Blau
2023-10-11 22:52     ` Jeff King
2023-10-09 20:59 ` [PATCH 04/20] commit-graph: check size of " Jeff King
2023-10-11  0:08   ` Taylor Blau
2023-10-11  1:24     ` Taylor Blau
2023-10-11 23:01     ` Jeff King
2023-10-09 21:02 ` [PATCH 05/20] midx: check size of oid lookup chunk Jeff King
2023-10-09 21:04 ` [PATCH 06/20] commit-graph: check consistency of fanout table Jeff King
2023-10-11 14:45   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 07/20] midx: check size of pack names chunk Jeff King
2023-10-11 14:52   ` Taylor Blau
2023-10-11 23:06     ` Jeff King
2023-10-09 21:05 ` [PATCH 08/20] midx: enforce chunk alignment on reading Jeff King
2023-10-11 14:56   ` Taylor Blau
2023-10-11 15:01   ` Taylor Blau
2023-10-11 23:09     ` Jeff King
2023-10-09 21:05 ` [PATCH 09/20] midx: check size of object offset chunk Jeff King
2023-10-11 18:31   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 10/20] midx: bounds-check large " Jeff King
2023-10-11 18:38   ` Taylor Blau
2023-10-11 23:18     ` Jeff King
2023-10-09 21:05 ` [PATCH 11/20] midx: check size of revindex chunk Jeff King
2023-10-11 18:41   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 12/20] commit-graph: check size of commit data chunk Jeff King
2023-10-11 18:46   ` Taylor Blau
2023-10-11 23:22     ` Jeff King
2023-10-09 21:05 ` [PATCH 13/20] commit-graph: detect out-of-bounds extra-edges pointers Jeff King
2023-10-11 19:02   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 14/20] commit-graph: bounds-check base graphs chunk Jeff King
2023-10-11 19:05   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 15/20] commit-graph: check size of generations chunk Jeff King
2023-10-09 21:05 ` [PATCH 16/20] commit-graph: bounds-check generation overflow chunk Jeff King
2023-10-09 21:05 ` Jeff King [this message]
2023-10-11 19:11   ` [PATCH 17/20] commit-graph: check bounds when accessing BDAT chunk Taylor Blau
2023-10-11 23:27     ` Jeff King
2023-10-09 21:05 ` [PATCH 18/20] commit-graph: check bounds when accessing BIDX chunk Jeff King
2023-10-11 19:15   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 19/20] commit-graph: detect out-of-order BIDX offsets Jeff King
2023-10-11 19:16   ` Taylor Blau
2023-10-09 21:06 ` [PATCH 20/20] chunk-format: drop pair_chunk_unsafe() Jeff King
2023-10-11 19:19 ` [PATCH 0/20] bounds-checks for chunk-based files Taylor Blau
2023-10-11 23:31   ` Jeff King
2023-10-13 19:25 ` [PATCH 0/8] chunk-format: introduce `pair_chunk_expect()` convenience API Taylor Blau
2023-10-13 19:25   ` [PATCH 1/8] chunk-format: introduce `pair_chunk_expect()` helper Taylor Blau
2023-10-13 19:25   ` [PATCH 2/8] commit-graph: read `OIDF` chunk with `pair_chunk_expect()` Taylor Blau
2023-10-13 19:25   ` [PATCH 3/8] commit-graph: read `CDAT` " Taylor Blau
2023-10-13 19:25   ` [PATCH 4/8] commit-graph: read `GDAT` " Taylor Blau
2023-10-13 19:25   ` [PATCH 5/8] commit-graph: read `BIDX` " Taylor Blau
2023-10-13 19:49     ` Taylor Blau
2023-10-14 16:10     ` Junio C Hamano
2023-10-20 10:31       ` Jeff King
2023-10-13 19:25   ` [PATCH 6/8] midx: read `OIDF` " Taylor Blau
2023-10-13 21:04     ` Junio C Hamano
2023-10-13 19:25   ` [PATCH 7/8] midx: read `OIDL` " Taylor Blau
2023-10-13 19:25   ` [PATCH 8/8] midx: read `OOFF` " Taylor Blau
2023-10-20 10:23   ` [PATCH 0/8] chunk-format: introduce `pair_chunk_expect()` convenience API Jeff King
2023-10-14  0:43 ` [PATCH 21/20] t5319: make corrupted large-offset test more robust Jeff King
2023-10-14 19:42   ` Junio C Hamano
2023-10-15  3:17     ` Jeff King
2023-10-15 17:04       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231009210550.GQ3282181@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).