git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 10/20] midx: bounds-check large offset chunk
Date: Wed, 11 Oct 2023 19:18:40 -0400	[thread overview]
Message-ID: <20231011231840.GK518221@coredump.intra.peff.net> (raw)
In-Reply-To: <ZSbrj3hmm8H7ce2l@nand.local>

On Wed, Oct 11, 2023 at 02:38:07PM -0400, Taylor Blau wrote:

> >  		offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
> > -		return get_be64(m->chunk_large_offsets +
> > -				st_mult(sizeof(uint64_t), offset32));
> > +		if (offset32 >= m->chunk_large_offsets_len / sizeof(uint64_t))
> > +			die(_("multi-pack-index large offset out of bounds"));
> > +		return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * offset32);
> 
> Makes sense, and this seems very reasonable. I had a couple of thoughts
> on this hunk, one nitpick, and one non-nitpick ;-).
> 
> The nitpick is the easy one, which is that I typically think of scaling
> some index to produce an offset into some chunk, instead of dividing and
> going the other way.
> 
> So I probably would have written something like:
> 
>     if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len)
>         die(_("multi-pack-index large offset out of bounds"));

Yeah, I admit that my inclination is to think of it that way, too, and
the division is a bit of an inversion. But I guess I am hard-wired to do
bounds checks with division, because I know it avoids overflow issues.
And the behavior is actually different, since st_mult() dies (with a
somewhat vague message) rather than returning with an error.

I was actually tempted to write a small inline helper to do
bounds-checks (since this pattern appears a few times in this series).
But of course it suffers from the same issues (it dies with a vague
message, or it returns a result code and then is awkward to call/use
because you have to stick the output in an out-parameter).

> But that is definitely a subjective/minor point, and I would not at all
> be sad if you felt differently about it. That said, I do wish for a
> little more information in the die() message, perhaps:
> 
>     if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len)
>         die(_("multi-pack-index large offset for %s out of bounds "
>               "(%"PRIuMAX" is beyond %"PRIuMAX")"),
>             (uintmax_t)(offset32 * sizeof(uint64_t)), /* checked earlier */
>             (uintmax_t)m->chunk_large_offsets_len);
> 
> I can imagine that for debugging corrupt MIDXs in the future, having
> some extra information like the above would give us a better starting
> point than popping into a debugger to get the same information.

As you'll see when you get to the BDAT/BIDX messages, I put in a load of
detail. That's because I did those ones first, and then after seeing the
terse "eh, it's the wrong size" messages in the rest of the commit-graph
and midx code, I just followed suit.

We can circle back and improve the detail in the messages. One slightly
annoying thing is dealing with it in the tests. We'd have to do one of:

  - make the tests more brittle by hard-coding positions and offsets we
    don't necessarily care about

  - loosen the tests by just matching with "grep", which can sometimes
    miss other unexpected output

  - do some post-processing on the output to massage out the details we
    don't care about; this in theory is the best of both worlds, but
    reliably post-processing is non-trivial.

So of the three I'd probably just loosen to use "grep" in most spots.

-Peff

  reply	other threads:[~2023-10-11 23:19 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-09 20:55 [PATCH 0/20] bounds-checks for chunk-based files Jeff King
2023-10-09 20:58 ` [PATCH 01/20] chunk-format: note that pair_chunk() is unsafe Jeff King
2023-10-10 23:45   ` Taylor Blau
2023-10-11 22:49     ` Jeff King
2023-10-09 20:58 ` [PATCH 02/20] t: add library for munging chunk-format files Jeff King
2023-10-10 23:47   ` Taylor Blau
2023-10-09 20:59 ` [PATCH 03/20] midx: stop ignoring malformed oid fanout chunk Jeff King
2023-10-10 23:50   ` Taylor Blau
2023-10-11 22:52     ` Jeff King
2023-10-09 20:59 ` [PATCH 04/20] commit-graph: check size of " Jeff King
2023-10-11  0:08   ` Taylor Blau
2023-10-11  1:24     ` Taylor Blau
2023-10-11 23:01     ` Jeff King
2023-10-09 21:02 ` [PATCH 05/20] midx: check size of oid lookup chunk Jeff King
2023-10-09 21:04 ` [PATCH 06/20] commit-graph: check consistency of fanout table Jeff King
2023-10-11 14:45   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 07/20] midx: check size of pack names chunk Jeff King
2023-10-11 14:52   ` Taylor Blau
2023-10-11 23:06     ` Jeff King
2023-10-09 21:05 ` [PATCH 08/20] midx: enforce chunk alignment on reading Jeff King
2023-10-11 14:56   ` Taylor Blau
2023-10-11 15:01   ` Taylor Blau
2023-10-11 23:09     ` Jeff King
2023-10-09 21:05 ` [PATCH 09/20] midx: check size of object offset chunk Jeff King
2023-10-11 18:31   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 10/20] midx: bounds-check large " Jeff King
2023-10-11 18:38   ` Taylor Blau
2023-10-11 23:18     ` Jeff King [this message]
2023-10-09 21:05 ` [PATCH 11/20] midx: check size of revindex chunk Jeff King
2023-10-11 18:41   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 12/20] commit-graph: check size of commit data chunk Jeff King
2023-10-11 18:46   ` Taylor Blau
2023-10-11 23:22     ` Jeff King
2023-10-09 21:05 ` [PATCH 13/20] commit-graph: detect out-of-bounds extra-edges pointers Jeff King
2023-10-11 19:02   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 14/20] commit-graph: bounds-check base graphs chunk Jeff King
2023-10-11 19:05   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 15/20] commit-graph: check size of generations chunk Jeff King
2023-10-09 21:05 ` [PATCH 16/20] commit-graph: bounds-check generation overflow chunk Jeff King
2023-10-09 21:05 ` [PATCH 17/20] commit-graph: check bounds when accessing BDAT chunk Jeff King
2023-10-11 19:11   ` Taylor Blau
2023-10-11 23:27     ` Jeff King
2023-10-09 21:05 ` [PATCH 18/20] commit-graph: check bounds when accessing BIDX chunk Jeff King
2023-10-11 19:15   ` Taylor Blau
2023-10-09 21:05 ` [PATCH 19/20] commit-graph: detect out-of-order BIDX offsets Jeff King
2023-10-11 19:16   ` Taylor Blau
2023-10-09 21:06 ` [PATCH 20/20] chunk-format: drop pair_chunk_unsafe() Jeff King
2023-10-11 19:19 ` [PATCH 0/20] bounds-checks for chunk-based files Taylor Blau
2023-10-11 23:31   ` Jeff King
2023-10-13 19:25 ` [PATCH 0/8] chunk-format: introduce `pair_chunk_expect()` convenience API Taylor Blau
2023-10-13 19:25   ` [PATCH 1/8] chunk-format: introduce `pair_chunk_expect()` helper Taylor Blau
2023-10-13 19:25   ` [PATCH 2/8] commit-graph: read `OIDF` chunk with `pair_chunk_expect()` Taylor Blau
2023-10-13 19:25   ` [PATCH 3/8] commit-graph: read `CDAT` " Taylor Blau
2023-10-13 19:25   ` [PATCH 4/8] commit-graph: read `GDAT` " Taylor Blau
2023-10-13 19:25   ` [PATCH 5/8] commit-graph: read `BIDX` " Taylor Blau
2023-10-13 19:49     ` Taylor Blau
2023-10-14 16:10     ` Junio C Hamano
2023-10-20 10:31       ` Jeff King
2023-10-13 19:25   ` [PATCH 6/8] midx: read `OIDF` " Taylor Blau
2023-10-13 21:04     ` Junio C Hamano
2023-10-13 19:25   ` [PATCH 7/8] midx: read `OIDL` " Taylor Blau
2023-10-13 19:25   ` [PATCH 8/8] midx: read `OOFF` " Taylor Blau
2023-10-20 10:23   ` [PATCH 0/8] chunk-format: introduce `pair_chunk_expect()` convenience API Jeff King
2023-10-14  0:43 ` [PATCH 21/20] t5319: make corrupted large-offset test more robust Jeff King
2023-10-14 19:42   ` Junio C Hamano
2023-10-15  3:17     ` Jeff King
2023-10-15 17:04       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231011231840.GK518221@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).