From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>
Subject: [PATCH 02/20] t: add library for munging chunk-format files
Date: Mon, 9 Oct 2023 16:58:38 -0400
Message-ID: <20231009205838.GB3282181@coredump.intra.peff.net>
In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>

When testing corruption of files using the chunk format (like
commit-graphs and midx files), it's helpful to be able to modify bytes
in specific chunks. This requires being able both to read the
table-of-contents (to find the chunk to modify) and to adjust it
(to account for size changes in the offsets of subsequent chunks).
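
As a rough illustration of the layout involved (not part of this patch),
the sketch below builds a toy chunk-format file and lists its
table-of-contents. It assumes the usual chunk-format layout: a
fixed-size header, then one 12-byte entry per chunk (4-byte id plus
8-byte big-endian offset), closed by an entry with a zero id whose
offset marks the end of the chunk data. The "TOYF" signature, the
"AAAA"/"BBBB" chunk ids, and the hex_at and list_toc helpers are all
made up for the example.

```shell
#!/bin/sh
# Sketch only: build a toy chunk-format file and list its table of
# contents. Signature, chunk ids and helper names are hypothetical.

toy=$(mktemp)
trap 'rm -f "$toy"' EXIT

{
	# 8-byte header (size modeled on commit-graph; contents are fake)
	printf 'TOYF\001\001\002\000'
	# ToC entries: 4-byte id, then 8-byte big-endian offset
	printf 'AAAA\000\000\000\000\000\000\000\054'	# data at byte 44
	printf 'BBBB\000\000\000\000\000\000\000\060'	# data at byte 48
	# terminating entry: zero id, offset is the end of the chunk data
	printf '\000\000\000\000\000\000\000\000\000\000\000\064'	# byte 52
	# chunk data: 4 bytes for AAAA, 4 bytes for BBBB
	printf 'aaaabbbb'
} >"$toy"

# hex_at <file> <pos> <count>: print <count> bytes at <pos> as hex digits
hex_at () {
	dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null |
	od -An -tx1 | tr -d ' \t\n'
}

# list_toc <file> <header-size>: print "<id-in-hex> <offset>" for each
# ToC entry, stopping after the terminating (zero id) entry
list_toc () {
	pos=$2
	while :
	do
		id=$(hex_at "$1" "$pos" 4)
		off=$(hex_at "$1" $((pos + 4)) 8)
		echo "$id $((0x$off))"
		test "$id" = 00000000 && break
		pos=$((pos + 12))
	done
}

list_toc "$toy" 8
```

For the toy file this lists three entries with offsets 44, 48 and 52.
The length of a chunk is never stored; it is the difference between its
offset and the next entry's offset, which is why changing a chunk's
size means rewriting every later offset.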

We have some tests already which corrupt chunk files, but they have some
downsides:

  1. They are very brittle, as they manually compute the expected size
     of a particular instance of the file (e.g., see the definitions
     starting with NUM_OBJECTS in t5319).

  2. Because they rely on manual offsets and don't read the
     table-of-contents, they're limited to overwriting bytes. But there
     are many interesting corruptions that involve changing the sizes of
     chunks (especially smaller-than-expected ones).

This patch adds a perl script which makes such corruptions easy. We'll
use it in subsequent patches.

Note that we could get by with just a big "perl -e" inside the helper
function. I chose to put it in a separate script for two reasons. One,
so we don't have to worry about the extra layer of shell quoting. And
two, the script is kind of big, and running the tests with "-x" would
repeatedly dump it into the log output.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-chunk.sh                    | 17 ++++++++
 t/lib-chunk/corrupt-chunk-file.pl | 66 +++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 t/lib-chunk.sh
 create mode 100644 t/lib-chunk/corrupt-chunk-file.pl

diff --git a/t/lib-chunk.sh b/t/lib-chunk.sh
new file mode 100644
index 0000000000..a7cd9c3c6d
--- /dev/null
+++ b/t/lib-chunk.sh
@@ -0,0 +1,17 @@
+# Shell library for working with "chunk" files (commit-graph, midx, etc).
+
+# corrupt_chunk_file <fn> <chunk> <offset> <bytes>
+#
+# Corrupt a chunk-based file (like a commit-graph) by overwriting the bytes
+# found in the chunk specified by the 4-byte <chunk> identifier. If <offset> is
+# "clear", replace the chunk entirely. Otherwise, overwrite data <offset> bytes
+# into the chunk.
+#
+# The <bytes> are interpreted as pairs of hex digits (so "000000FE" would be
+# big-endian 254).
+corrupt_chunk_file () {
+	fn=$1; shift
+	perl "$TEST_DIRECTORY"/lib-chunk/corrupt-chunk-file.pl \
+		"$@" <"$fn" >"$fn.tmp" &&
+	mv "$fn.tmp" "$fn"
+}
diff --git a/t/lib-chunk/corrupt-chunk-file.pl b/t/lib-chunk/corrupt-chunk-file.pl
new file mode 100644
index 0000000000..cd6d386fef
--- /dev/null
+++ b/t/lib-chunk/corrupt-chunk-file.pl
@@ -0,0 +1,66 @@
+#!/usr/bin/perl
+
+my ($chunk, $seek, $bytes) = @ARGV;
+$bytes =~ s/../chr(hex($&))/ge;
+
+binmode STDIN;
+binmode STDOUT;
+
+# A few helpers to read bytes, or read and copy them to the
+# output.
+sub get {
+	my $n = shift;
+	return unless $n;
+	read(STDIN, my $buf, $n)
+		or die "read error or eof: $!\n";
+	return $buf;
+}
+sub copy {
+	my $buf = get(@_);
+	print $buf;
+	return $buf;
+}
+
+# read until we find table-of-contents entry for chunk;
+# note that we cheat a bit by assuming 4-byte alignment and
+# that no ToC entry will accidentally look like a header.
+#
+# If we don't find the entry, copy() will hit EOF and exit
+# (which should cause the caller to fail the test).
+while (copy(4) ne $chunk) { }
+my $offset = unpack("Q>", copy(8));
+
+# In clear mode, our length will change. So figure out
+# the length by comparing to the offset of the next chunk, and
+# then adjust that offset (and all subsequent ones).
+my $len;
+if ($seek eq "clear") {
+	my $id;
+	do {
+		$id = copy(4);
+		my $next = unpack("Q>", get(8));
+		if (!defined $len) {
+			$len = $next - $offset;
+		}
+		print pack("Q>", $next - $len + length($bytes));
+	} while (unpack("N", $id));
+}
+
+# and now copy up to our existing chunk data
+copy($offset - tell(STDIN));
+if ($seek eq "clear") {
+	# if clearing, skip past existing data
+	get($len);
+} else {
+	# otherwise, copy up to the requested offset,
+	# and skip past the overwritten bytes
+	copy($seek);
+	get(length($bytes));
+}
+
+# now write out the requested bytes, along
+# with any other remaining data
+print $bytes;
+while (read(STDIN, my $buf, 4096)) {
+	print $buf;
+}
-- 
2.42.0.884.g35e1fe1a6a
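
For illustration only (this is not part of the patch): the offset
arithmetic in the "clear" branch above can be checked by hand. The
chunk's old length is the next ToC offset minus its own offset, and
every later offset (including the terminating entry) moves by the
difference between the new and old lengths. The adjust_offsets helper
and the offsets fed to it are hypothetical, chosen to match a toy file
with an 8-byte header and two 4-byte chunks.

```shell
#!/bin/sh
# Sketch only: mirror the "clear" mode offset arithmetic on a plain
# list of numbers. adjust_offsets is a hypothetical helper.

# adjust_offsets <index> <new-len> <offset>...
#
# <offset>... are the ToC offsets in file order, including the
# terminating entry. <index> is the zero-based position of the chunk
# whose data is replaced by <new-len> bytes.
adjust_offsets () {
	idx=$1
	new_len=$2
	shift 2
	i=0
	start=
	old_len=
	for off in "$@"
	do
		if test "$i" -lt "$idx"
		then
			# entries before the target are untouched
			echo "$off"
		elif test "$i" -eq "$idx"
		then
			# the target keeps its own offset
			start=$off
			echo "$off"
		else
			# first later entry tells us the old length
			if test -z "$old_len"
			then
				old_len=$(($off - $start))
			fi
			echo $(($off - $old_len + $new_len))
		fi
		i=$(($i + 1))
	done
}

# Replace the first chunk (4 bytes at offset 44) with 2 bytes.
adjust_offsets 0 2 44 48 52
```

With these inputs the later offsets shift from 48 and 52 to 46 and 50,
while the replaced chunk stays at 44.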

