git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Erik Faye-Lund <kusmabite@gmail.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH v2 5/5] range-diff: fix integer overflow & segfault on cost[i + n * j]
Date: Tue, 14 Dec 2021 09:04:29 -0500	[thread overview]
Message-ID: <YbikbfdEH2EKZE1p@coredump.intra.peff.net> (raw)
In-Reply-To: <RFC-patch-v2-5.5-9194965635a-20211210T122901Z-avarab@gmail.com>

On Fri, Dec 10, 2021 at 01:30:42PM +0100, Ævar Arnfjörð Bjarmason wrote:

> in preceding commits the "column_count" and the "int*"'s we malloc()
> were changed to track their length with a size_t, so we're able to
> track as many "cost" items as malloc() will give us.
> 
> But we'd still segfault on relatively large range comparisons,
> e.g. this would segfault:
> 
>     git -P range-diff --creation-factor=50 origin/master...git-for-windows/main
> 
> The reason for that is that we'd still use integer types to compute an
> array index into the "cost" array, which would overflow. The result of
> a signed overflow in C is undefined, but on my system it'll result in
> a negative number, and a prompt segfault as we'll try to access a
> negative array index.

Note that this isn't just access. We'll first write to cost[i + n * j]
to start. In practice because of the iteration, and signed overflow
being implemented as it usually is, we'd always first write a single int
that's 2GB before the array. And that should segfault.

I do wonder if this can be turned into a heap overflow exploit. I think
you'd probably need to manage to get 2GB on the heap to avoid an
immediate segfault.

> Luckily we used the COST() macro in linear-assignment.c already for
> all of these lookups, and in a preceding commit we renamed "n" in
> "range-diff.c"'s get_correspondences() to "column_count" in
> preparation for using it here.
> 
> So let's use it for the three occurrences of "cost" indexing in
> range-diff.c, and have the COST() macro itself do overflow checking
> with st_mult() and st_add(). Due to the cast to "size_t" from "int"
> we'll avoid the segfault, and will end up correctly pointing to the
> relevant "int *".

Is it actually necessary to do bounds checking here? If we know the
arrays are sized correctly, and we use an appropriate integer type,
wouldn't we know that our computations are always in bounds?

(I saw your other discussion of the unreliability of ssize_t; if we
don't want to assume it's of the same magnitude as size_t, then intmax_t
would work).

The reason I ask in particular is that I wonder if these non-intrinsic
st_* helpers might introduce a measurable slowdown. When I suggested
them earlier it was because I was also suggesting that we'd have done
all of our bounds-checks up front, during the allocation.

> It's still possible for us to overflow even with this change, that's
> because the iteration variables (such as "i" and "j" in this diff
> context are all "int"), even if we changed those to "size_t" or
> "intmax_t" (not trivial, as we depend on them being negative in some
> places) the underlying "struct string_list"'s "nr" member is an
> "unsigned int", which would eventually overflow.

The string_list overflow is something I do think we ought to fix. But we
know from past experiments that it can't actually cause a heap overflow.
Can overflowing one of the ints, though?

If we're computing i*n+j, and j goes negative, then cast to a size_t
that will turn into a big number. But depending how negative it is, it
might not overflow a size_t. But it would still be well outside the
bounds of the allocated array. E.g., consider code like this:

        int j = INT_MAX;
        while (1) {
                printf("int = %d\n", j);
                printf("size_t = %"PRIuMAX"\n", (uintmax_t)st_add(0, j));
                j++;
        }

which shows what happens when i=0, but j approaches overflow. We wrap to
-2^31, which is a large but representable size_t. So st_add() does not
trigger, but I think we'd still be out of bounds.

I suspect it's OK in practice from a security perspective, because it's
so far out of bounds as to cause a segfault and not any kind of heap
overflow. But it really feels like the fix is incomplete. Whereas using
the correct types avoids the segfault.

> We're unlikely to encounter a 2-4 billion commit history on 32 bit
> platforms. Even if we did one of the types in the underlying object
> machinery would probably overflow before we overflowed here. So let's
> punt that for now. If we're ever going to solve that issue [1] to
> change the "struct string_list"'s "nr" member to a "size_t" might be a
> good start.

I'm less concerned about "unlikely" and more about "what can bad actors
trigger". 2 billion is probably out of reach in practice though
(typically I've seen things get untenable around a few hundred million
objects total).

Still, it feels a bit hand-wavy.

-Peff

  reply	other threads:[~2021-12-14 14:05 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 19:19 [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 01/10] string-list API: change "nr" and "alloc" to "size_t" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int" Ævar Arnfjörð Bjarmason
2021-12-10  3:39   ` Jeff King
2021-12-10 10:22     ` Ævar Arnfjörð Bjarmason
2021-12-10 11:41       ` Jeff King
2021-12-10 12:31         ` Ævar Arnfjörð Bjarmason
2021-12-10 19:24           ` Phillip Wood
2021-12-14 14:34           ` Jeff King
2021-12-10 14:27         ` Johannes Schindelin
2021-12-10 14:58           ` Ævar Arnfjörð Bjarmason
2021-12-11 14:01             ` Johannes Schindelin
2021-12-12 17:44               ` Ævar Arnfjörð Bjarmason
2021-12-14 14:42           ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 03/10] range-diff.c: use "size_t" to refer to "struct string_list"'s "nr" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 04/10] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 05/10] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 06/10] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 07/10] linear-assignment.c: convert a macro to a "static inline" function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 08/10] linear-assignment.c: detect signed add/mul on GCC and Clang Ævar Arnfjörð Bjarmason
2021-12-10  3:56   ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 09/10] linear-assignment.c: add and use intprops.h from Gnulib Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 10/10] linear-assignment.c: use "intmax_t" instead of "int" Ævar Arnfjörð Bjarmason
2021-12-10  4:00   ` Jeff King
2021-12-10 12:30 ` [RFC PATCH v2 0/5] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-10 12:30   ` [RFC PATCH v2 1/5] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-14 13:36     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 2/5] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-14 13:39     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 3/5] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-14 13:40     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 4/5] range-diff.c: rename "n" to "column_count" in get_correspondences() Ævar Arnfjörð Bjarmason
2021-12-14 13:42     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 5/5] range-diff: fix integer overflow & segfault on cost[i + n * j] Ævar Arnfjörð Bjarmason
2021-12-14 14:04     ` Jeff King [this message]
2021-12-10 14:31 ` [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Johannes Schindelin
2021-12-10 15:07   ` Ævar Arnfjörð Bjarmason
2021-12-21 23:22   ` Philip Oakley
2021-12-21 23:36     ` Ævar Arnfjörð Bjarmason
2021-12-22 20:50       ` Johannes Schindelin
2021-12-22 21:11         ` Jeff King
2021-12-24 11:15       ` Philip Oakley
2021-12-24 16:46         ` Ævar Arnfjörð Bjarmason
2021-12-24 18:31           ` Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbikbfdEH2EKZE1p@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=kusmabite@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).