Stack read out-of-bounds in parse_sha1_header

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
       [not found] <1399913289.8224468.1474810664933.JavaMail.zimbra@imag.fr>
@ 2016-09-25 14:12 ` Gustavo Grieco
  2016-09-26  0:10   ` Junio C Hamano
  2016-09-27  2:30   ` Possible integer overflow parsing malformed objects in " Gustavo Grieco
  0 siblings, 2 replies; 22+ messages in thread
From: Gustavo Grieco @ 2016-09-25 14:12 UTC (permalink / raw)
  To: git

Hi,

We found a stack read out-of-bounds parsing object files using git 2.10.0. It was tested on ArchLinux x86_64. To reproduce, first recompile git with ASAN support and then execute:

$ git init ; mkdir -p .git/objects/b2 ; printf 'x' > .git/objects/b2/93584ddd61af21260be75ee9f73e9d53f08cd0

Finally you can trigger the bug using several commands from git (other commands that parses all objects will work too), for instance:

$ git fsck

The ASAN report is here:

==2763==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe16e4a690 at pc 0x0000006fe5dc bp 0x7ffe16e4a530 sp 0x7ffe16e4a520
READ of size 1 at 0x7ffe16e4a690 thread T0
    #0 0x6fe5db in parse_sha1_header_extended /home/g/Work/Code/git-2.10.0/sha1_file.c:1684
    #1 0x702cd4 in sha1_loose_object_info /home/g/Work/Code/git-2.10.0/sha1_file.c:2660
    #2 0x70332c in sha1_object_info_extended /home/g/Work/Code/git-2.10.0/sha1_file.c:2696
    #3 0x7038e0 in sha1_object_info /home/g/Work/Code/git-2.10.0/sha1_file.c:2745
    #4 0x648498 in parse_object /home/g/Work/Code/git-2.10.0/object.c:260
    #5 0x48d46d in fsck_sha1 builtin/fsck.c:367
    #6 0x48da47 in fsck_loose builtin/fsck.c:493
    #7 0x707514 in for_each_file_in_obj_subdir /home/g/Work/Code/git-2.10.0/sha1_file.c:3477
    #8 0x70775b in for_each_loose_file_in_objdir_buf /home/g/Work/Code/git-2.10.0/sha1_file.c:3512
    #9 0x707885 in for_each_loose_file_in_objdir /home/g/Work/Code/git-2.10.0/sha1_file.c:3532
    #10 0x48dc1d in fsck_object_dir builtin/fsck.c:521
    #11 0x48e2e6 in cmd_fsck builtin/fsck.c:644
    #12 0x407a8f in run_builtin /home/g/Work/Code/git-2.10.0/git.c:352
    #13 0x407e35 in handle_builtin /home/g/Work/Code/git-2.10.0/git.c:539
    #14 0x408175 in run_argv /home/g/Work/Code/git-2.10.0/git.c:593
    #15 0x408458 in cmd_main /home/g/Work/Code/git-2.10.0/git.c:665
    #16 0x53fc70 in main /home/g/Work/Code/git-2.10.0/common-main.c:40
    #17 0x7f0f46d43290 in __libc_start_main (/usr/lib/libc.so.6+0x20290)
    #18 0x405209 in _start (/home/g/Work/Code/git-2.10.0/git+0x405209)

Address 0x7ffe16e4a690 is located in stack of thread T0 at offset 192 in frame
    #0 0x702834 in sha1_loose_object_info /home/g/Work/Code/git-2.10.0/sha1_file.c:2614

  This frame has 5 object(s):
    [32, 40) 'mapsize'
    [96, 120) 'hdrbuf'
    [160, 192) 'hdr' <== Memory access at offset 192 overflows this variable
    [224, 368) 'st'
    [416, 576) 'stream'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/g/Work/Code/git-2.10.0/sha1_file.c:1684 in parse_sha1_header_extended
Shadow bytes around the buggy address:
  0x100042dc1480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100042dc1490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100042dc14a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100042dc14b0: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4
  0x100042dc14c0: f4 f4 f2 f2 f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00
=>0x100042dc14d0: 00 00[f2]f2 f2 f2 00 00 00 00 00 00 00 00 00 00
  0x100042dc14e0: 00 00 00 00 00 00 00 00 f4 f4 f2 f2 f2 f2 00 00
  0x100042dc14f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100042dc1500: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  0x100042dc1510: 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2 f2 f2
  0x100042dc1520: 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb


Regards,
Gustavo.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-25 14:12 ` Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0 Gustavo Grieco
@ 2016-09-26  0:10   ` Junio C Hamano
  2016-09-26  4:29     ` [PATCH] unpack_sha1_header(): detect malformed object header Junio C Hamano
                       ` (2 more replies)
  2016-09-27  2:30   ` Possible integer overflow parsing malformed objects in " Gustavo Grieco
  1 sibling, 3 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26  0:10 UTC (permalink / raw)
  To: Gustavo Grieco; +Cc: git

Gustavo Grieco <gustavo.grieco@imag.fr> writes:

> We found a stack read out-of-bounds parsing object files using git 2.10.0. It was tested on ArchLinux x86_64. To reproduce, first recompile git with ASAN support and then execute:
>
> $ git init ; mkdir -p .git/objects/b2 ; printf 'x' > .git/objects/b2/93584ddd61af21260be75ee9f73e9d53f08cd0

Interesting.  If you prepare such a broken loose object file in your
local repository, I would expect that either unpack_sha1_header() or
unpack_sha1_header_to_strbuf() that sha1_loose_object_info() calls
would detect and barf by noticing that an error came from libz while
it attempts to inflate and would not even call parse_sha1_header.

But it is nevertheless bad to assume that whatever happens to
inflate without an error must be formatted correctly to allow
parsing (i.e. has ' ' and then NUL termination within the first 32
bytes after inflation), which is exactly what the hdr[32] is saying.

Perhaps we need something like the following to tighten the
codepath.

Note that this is totally unteseted and not thought through; I
briefly thought about what unpack_sha1_header_to_strbuf() does with
this change (it first lets unpack_sha1_header() to attempt with a
small buffer but it seems to discard the error code from it before
seeing if the returned buffer has NUL in it); there may be bad
interactions with it.

 sha1_file.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 60ff21f..dfcbd76 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1648,6 +1648,8 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,

 int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
 {
+	int status;
+
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
 	stream->next_in = map;
@@ -1656,7 +1658,15 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
 	stream->avail_out = bufsiz;

 	git_inflate_init(stream);
-	return git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
+	if (status)
+		return status;
+
+	/* Make sure we got the terminating NUL for the object header */
+	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
+		return -1;
+
+	return 0;
 }

 static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
@@ -1758,6 +1768,8 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
 		char c = *hdr++;
 		if (c == ' ')
 			break;
+		if (!c)
+			die("invalid object header");
 		type_len++;
 	}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26  0:10   ` Junio C Hamano
@ 2016-09-26  4:29     ` Junio C Hamano
  2016-09-26 14:03       ` Jeff King
  2016-09-26 13:50     ` Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0 Jeff King
  2016-09-26 17:48     ` Gustavo Grieco
  2 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26  4:29 UTC (permalink / raw)
  To: git, Karthik Nayak, Jeff King; +Cc: Gustavo Grieco

When opening a loose object file, we often do this sequence:

 - prepare a short buffer for the object header (on stack)

 - call unpack_sha1_header() and have early part of the object data
   inflated, enough to fill the buffer

 - parse that data in the short buffer, assuming that the first part
   of the object is <type> SP <length> NUL

Nobody in this sequence however actually verifies that the loop that
tries to find SP that must come after the typename or NUL that must
come after the length exist in the inflated data.  Because the
parsing function parse_sha1_header_extended() is not even given the
number of bytes inflated into the header buffer, it can easily read
past it, looking for the SP byte that may not even exist.

A variant recently introduced to support "--allow-unknown-type"
option of "git cat-file -t" changes the second step to use
unpack_sha1_header_to_strbuf(), but the story is essentially the
same.  It did check to see if it saw enough to include NUL, but
nobody checked for SP before calling the parsing function.

To correct this, do these three things:

 - rename unpack_sha1_header() to unpack_sha1_short_header() and
   have unpack_sha1_header_to_strbuf() keep calling that as its
   helper function.  This will detect and report zlib errors, but is
   not aware of the format of a loose object (as before).

 - introduce unpack_sha1_header() that calls the same helper
   function, and when zlib reports it inflated OK into the buffer,
   check if the buffer has both SP and NUL in this order.  This
   would ensure that parsing function will terminate within the
   buffer that holds the inflated header.

 - update unpack_sha1_header_to_strbuf() to check if the resulting
   buffer has both SP and NUL in this order for the same effect.

Reported-by: Gustavo Grieco <gustavo.grieco@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Unlike the "something like this" version, this does the "we got
   some data, does it look like an object header, safely parseable
   by our parser?" check in the unpack code, without touching the
   parser, as I think that division of labor between the unpacker
   and the parser makes more sense.

   The strbuf codepath came in 46f03448 ("sha1_file: support reading
   from a loose object of unknown type", 2015-05-03) by Karthik,
   whose log says it was written by me, and helped by Peff, so I'm
   asking these two to lend their eyes.

 sha1_file.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..445e763 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1646,7 +1646,9 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 	return used;
 }
 
-int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
+static int unpack_sha1_short_header(git_zstream *stream,
+				    unsigned char *map, unsigned long mapsize,
+				    void *buffer, unsigned long bufsiz)
 {
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1659,13 +1661,37 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
 	return git_inflate(stream, 0);
 }
 
+int unpack_sha1_header(git_zstream *stream,
+		       unsigned char *map, unsigned long mapsize,
+		       void *buffer, unsigned long bufsiz)
+{
+	const char *eoh;
+	int status = unpack_sha1_short_header(stream, map, mapsize,
+					      buffer, bufsiz);
+
+	if (status < Z_OK)
+		return status;
+
+	/* Make sure we have the terminating NUL */
+	eoh = memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer);
+	if (!eoh)
+		return -1;
+	/* Make sure we have ' ' at the end of type */
+	if (!memchr(buffer, ' ', eoh - (const char *)buffer))
+		return -1;
+	return 0;
+}
+
 static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
 					unsigned long mapsize, void *buffer,
 					unsigned long bufsiz, struct strbuf *header)
 {
+	const char *eoh;
 	int status;
 
-	status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
+	status = unpack_sha1_short_header(stream, map, mapsize, buffer, bufsiz);
+	if (status < Z_OK)
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
@@ -1686,11 +1712,19 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
 		status = git_inflate(stream, 0);
 		strbuf_add(header, buffer, stream->next_out - (unsigned char *)buffer);
 		if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-			return 0;
+			goto enough;
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
 	return -1;
+
+enough:
+	eoh = memchr(header->buf, '\0', header->len);
+	if (!eoh)
+		die("BUG: the NUL we earlier saw is gone???");
+	if (!memchr(header->buf, ' ', eoh - header->buf))
+		return -1;
+	return 0;
 }
 
 static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26  0:10   ` Junio C Hamano
  2016-09-26  4:29     ` [PATCH] unpack_sha1_header(): detect malformed object header Junio C Hamano
@ 2016-09-26 13:50     ` Jeff King
  2016-09-26 17:48     ` Gustavo Grieco
  2 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2016-09-26 13:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Gustavo Grieco, git

On Sun, Sep 25, 2016 at 05:10:31PM -0700, Junio C Hamano wrote:

> Gustavo Grieco <gustavo.grieco@imag.fr> writes:
> 
> > We found a stack read out-of-bounds parsing object files using git 2.10.0. It was tested on ArchLinux x86_64. To reproduce, first recompile git with ASAN support and then execute:
> >
> > $ git init ; mkdir -p .git/objects/b2 ; printf 'x' > .git/objects/b2/93584ddd61af21260be75ee9f73e9d53f08cd0
> 
> Interesting.  If you prepare such a broken loose object file in your
> local repository, I would expect that either unpack_sha1_header() or
> unpack_sha1_header_to_strbuf() that sha1_loose_object_info() calls
> would detect and barf by noticing that an error came from libz while
> it attempts to inflate and would not even call parse_sha1_header.
> 
> But it is nevertheless bad to assume that whatever happens to
> inflate without an error must be formatted correctly to allow
> parsing (i.e. has ' ' and then NUL termination within the first 32
> bytes after inflation), which is exactly what the hdr[32] is saying.

Yeah. I also was surprised that we didn't barf on a zlib failure. But
based on previous debugging of corrupted zlib data, my recollection
is that there are a large number of weird corruptions that zlib will
happily pass back and only later complain about a checksum error. So
presumably "x" is one of those, and it might not hold for other
corruptions (but I didn't try).

> Note that this is totally unteseted and not thought through; I
> briefly thought about what unpack_sha1_header_to_strbuf() does with
> this change (it first lets unpack_sha1_header() to attempt with a
> small buffer but it seems to discard the error code from it before
> seeing if the returned buffer has NUL in it); there may be bad
> interactions with it.

Yeah, that seems wrong. I don't think it would involve an out of bounds
read, but we probably could fail to correctly report zlib corruption.

> diff --git a/sha1_file.c b/sha1_file.c
> index 60ff21f..dfcbd76 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1648,6 +1648,8 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
>  
>  int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
>  {
> +	int status;
> +
>  	/* Get the data stream */
>  	memset(stream, 0, sizeof(*stream));
>  	stream->next_in = map;
> @@ -1656,7 +1658,15 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
>  	stream->avail_out = bufsiz;
>  
>  	git_inflate_init(stream);
> -	return git_inflate(stream, 0);
> +	status = git_inflate(stream, 0);
> +	if (status)
> +		return status;
> +
> +	/* Make sure we got the terminating NUL for the object header */
> +	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
> +		return -1;
> +
> +	return 0;

This doesn't look too invasive as an approach, though I would have done
it differently. We're making the assumption that once there is a NUL,
the header-parser won't do anything stupid, which creates a coupling
between those two bits of code. My inclination would have been to just
treat the header as a ptr/len pair, and make sure the parser never reads
past the end.

But I implemented that, and it _is_ rather invasive. And it's not like
coupling unpack_sha1_header() and parse_sha1_header() is all that
terrible; they are meant to be paired.

I haven't read through your follow-up yet; I'll do that before posting
my version.

>  static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
> @@ -1758,6 +1768,8 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
>  		char c = *hdr++;
>  		if (c == ' ')
>  			break;
> +		if (!c)
> +			die("invalid object header");
>  		type_len++;
>  	}

We keep reading from hdr after this, though I think those bits would all
bail correctly on seeing NUL.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26  4:29     ` [PATCH] unpack_sha1_header(): detect malformed object header Junio C Hamano
@ 2016-09-26 14:03       ` Jeff King
  2016-09-26 16:15         ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2016-09-26 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Karthik Nayak, Gustavo Grieco

On Sun, Sep 25, 2016 at 09:29:04PM -0700, Junio C Hamano wrote:

> To correct this, do these three things:
> 
>  - rename unpack_sha1_header() to unpack_sha1_short_header() and
>    have unpack_sha1_header_to_strbuf() keep calling that as its
>    helper function.  This will detect and report zlib errors, but is
>    not aware of the format of a loose object (as before).

This step makes sense to me, and is a problem in the original you posted
(i.e., we may not see all of the header in the strbuf variant). Your
refactor looks good.

>  - introduce unpack_sha1_header() that calls the same helper
>    function, and when zlib reports it inflated OK into the buffer,
>    check if the buffer has both SP and NUL in this order.  This
>    would ensure that parsing function will terminate within the
>    buffer that holds the inflated header.
> 
>  - update unpack_sha1_header_to_strbuf() to check if the resulting
>    buffer has both SP and NUL in this order for the same effect.

This part I don't understand, though. We clearly need to look for the
NUL. But why do we need to look for the space? The loop in
parse_sha1_header() can easily detect this as it looks for the end of
the type name (and if it hits the end-of-string, can bail as in your
original patch).

I.e., the root of the problem is that we pass parse_sha1_header() a the
"ptr" half of a ptr/len buffer, and it has no idea how much we read.
But once we get it that information (either by passing the length, or by
ensuring that the buffer is NUL-terminated, it should be easy for it to
do the right thing.

Anyway, here's my ptr/len version (which passes the length back out of
unpack_sha1_header via an in/out pointer). After thinking on it, though,
I'm of the opinion that we're better off just ensuring that "hdr" is
NUL-terminated. We end up assuming that anyway later, since we have to
know how much of the header buffer was consumed by parsing.

Do note the final call below in the streaming loose-open code, which
exhibits that, but also seems to call parse_sha1_header() without
checking its return value. I think that needs fixed regardless of the
approach.

---
diff --git a/cache.h b/cache.h
index d0494c8..e89dcff 100644
--- a/cache.h
+++ b/cache.h
@@ -1121,8 +1121,9 @@ extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned c
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern int git_open_noatime(const char *name);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
-extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
+extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned
+			      long mapsize, void *buffer, size_t *bufsiz);
+extern int parse_sha1_header(const char *hdr, size_t len, unsigned long *sizep);
 
 /* global flag to enable extra checks when accessing packed objects */
 extern int do_check_packed_object_crc;
diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..326593b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1646,41 +1646,50 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 	return used;
 }
 
-int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
+int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize,
+		       void *buffer, size_t *bufsiz)
 {
+	int ret;
+
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
 	stream->next_in = map;
 	stream->avail_in = mapsize;
 	stream->next_out = buffer;
-	stream->avail_out = bufsiz;
+	stream->avail_out = *bufsiz;
 
 	git_inflate_init(stream);
-	return git_inflate(stream, 0);
+	ret = git_inflate(stream, 0);
+	*bufsiz -= stream->avail_out;
+	return ret;
 }
 
-static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					unsigned long mapsize, void *buffer,
-					unsigned long bufsiz, struct strbuf *header)
+static int unpack_sha1_header_to_strbuf(git_zstream *stream,
+					unsigned char *map, unsigned long mapsize,
+					void *buffer, size_t *bufsiz,
+					struct strbuf *header)
 {
+	size_t initial_len = *bufsiz;
 	int status;
 
-	status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
+	status = unpack_sha1_header(stream, map, mapsize, buffer, &initial_len);
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
-	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
+	if (memchr(buffer, '\0', initial_len)) {
+		*bufsiz = initial_len;
 		return 0;
+	}
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
 	 * reading the stream.
 	 */
-	strbuf_add(header, buffer, stream->next_out - (unsigned char *)buffer);
+	strbuf_add(header, buffer, initial_len);
 	stream->next_out = buffer;
-	stream->avail_out = bufsiz;
+	stream->avail_out = *bufsiz;
 
 	do {
 		status = git_inflate(stream, 0);
@@ -1688,7 +1697,7 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
 		if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 			return 0;
 		stream->next_out = buffer;
-		stream->avail_out = bufsiz;
+		stream->avail_out = *bufsiz;
 	} while (status != Z_STREAM_END);
 	return -1;
 }
@@ -1743,9 +1752,11 @@ static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long s
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
-			       unsigned int flags)
+static int parse_sha1_header_extended(const char *hdr, size_t len,
+				      struct object_info *oi,
+				      unsigned int flags)
 {
+	const char *end = hdr + len;
 	const char *type_buf = hdr;
 	unsigned long size;
 	int type, type_len = 0;
@@ -1754,12 +1765,14 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
 	 * The type can be of any size but is followed by
 	 * a space.
 	 */
-	for (;;) {
+	while (hdr < end) {
 		char c = *hdr++;
 		if (c == ' ')
 			break;
 		type_len++;
 	}
+	if (hdr >= end)
+		return -1;
 
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->typename)
@@ -1781,10 +1794,10 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
 	 * decimal format (ie "010" is not valid).
 	 */
 	size = *hdr++ - '0';
-	if (size > 9)
+	if (size > 9 || hdr >= end)
 		return -1;
 	if (size) {
-		for (;;) {
+		while (hdr >= end) {
 			unsigned long c = *hdr - '0';
 			if (c > 9)
 				break;
@@ -1799,17 +1812,17 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	return hdr >= end || *hdr ? -1 : type;
 }
 
-int parse_sha1_header(const char *hdr, unsigned long *sizep)
+int parse_sha1_header(const char *hdr, size_t len, unsigned long *sizep)
 {
 	struct object_info oi;
 
 	oi.sizep = sizep;
 	oi.typename = NULL;
 	oi.typep = NULL;
-	return parse_sha1_header_extended(hdr, &oi, LOOKUP_REPLACE_OBJECT);
+	return parse_sha1_header_extended(hdr, len, &oi, LOOKUP_REPLACE_OBJECT);
 }
 
 static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type *type, unsigned long *size, const unsigned char *sha1)
@@ -1817,9 +1830,11 @@ static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type
 	int ret;
 	git_zstream stream;
 	char hdr[8192];
+	size_t hdr_len = sizeof(hdr);
 
-	ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
-	if (ret < Z_OK || (*type = parse_sha1_header(hdr, size)) < 0)
+	ret = unpack_sha1_header(&stream, map, mapsize, hdr, &hdr_len);
+	if (ret < Z_OK ||
+	    (*type = parse_sha1_header(hdr, hdr_len, size)) < 0)
 		return NULL;
 
 	return unpack_sha1_rest(&stream, hdr, *size, sha1);
@@ -2697,6 +2712,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	void *map;
 	git_zstream stream;
 	char hdr[32];
+	size_t hdr_len = sizeof(hdr);
 	struct strbuf hdrbuf = STRBUF_INIT;
 
 	if (oi->delta_base_sha1)
@@ -2725,19 +2741,19 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
 	if ((flags & LOOKUP_UNKNOWN_OBJECT)) {
-		if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
+		if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, hdr, &hdr_len, &hdrbuf) < 0)
 			status = error("unable to unpack %s header with --allow-unknown-type",
 				       sha1_to_hex(sha1));
-	} else if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_sha1_header(&stream, map, mapsize, hdr, &hdr_len) < 0)
 		status = error("unable to unpack %s header",
 			       sha1_to_hex(sha1));
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_sha1_header_extended(hdrbuf.buf, hdrbuf.len, oi, flags)) < 0)
 			status = error("unable to parse %s header with --allow-unknown-type",
 				       sha1_to_hex(sha1));
-	} else if ((status = parse_sha1_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_sha1_header_extended(hdr, hdr_len, oi, flags)) < 0)
 		status = error("unable to parse %s header", sha1_to_hex(sha1));
 	git_inflate_end(&stream);
 	munmap(map, mapsize);
diff --git a/streaming.c b/streaming.c
index 3c48f04..ee73544 100644
--- a/streaming.c
+++ b/streaming.c
@@ -334,6 +334,7 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
+	size_t len = sizeof(st->u.loose.hdr);
 	st->u.loose.mapped = map_sha1_file(sha1, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -341,13 +342,14 @@ static open_method_decl(loose)
 			       st->u.loose.mapped,
 			       st->u.loose.mapsize,
 			       st->u.loose.hdr,
-			       sizeof(st->u.loose.hdr)) < 0) {
+			       &len) < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
 	}
 
-	parse_sha1_header(st->u.loose.hdr, &st->size);
+	if (parse_sha1_header(st->u.loose.hdr, len, &st->size) < 0)
+		return -1;
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
 	st->z_state = z_used;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26 14:03       ` Jeff King
@ 2016-09-26 16:15         ` Junio C Hamano
  2016-09-26 17:33           ` Junio C Hamano
  2016-09-26 17:34           ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 16:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Karthik Nayak, Gustavo Grieco

Jeff King <peff@peff.net> writes:

> This part I don't understand, though. We clearly need to look for the
> NUL. But why do we need to look for the space? The loop in
> parse_sha1_header() can easily detect this as it looks for the end of
> the type name (and if it hits the end-of-string, can bail as in your
> original patch).
> I.e., the root of the problem is that we pass parse_sha1_header() a the
> "ptr" half of a ptr/len buffer, and it has no idea how much we read.
> But once we get it that information (either by passing the length, or by
> ensuring that the buffer is NUL-terminated, it should be easy for it to
> do the right thing.

Yup.

> Anyway, here's my ptr/len version (which passes the length back out of
> unpack_sha1_header via an in/out pointer). After thinking on it, though,
> I'm of the opinion that we're better off just ensuring that "hdr" is
> NUL-terminated. We end up assuming that anyway later, since we have to
> know how much of the header buffer was consumed by parsing.

I'd agree, not because I didn't first go in this <ptr,len> route
myself, but because the attached change does look quite invasive.
Also, I think it is OK to ask unpack_*_header() to fail if what it
turns can no way be a header, e.g. lacks NUL termination.

> Do note the final call below in the streaming loose-open code, which
> exhibits that, but also seems to call parse_sha1_header() without
> checking its return value. I think that needs fixed regardless of the
> approach.

Good that your attempt to signature-changing change caught it.  I'll
take a further look.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26 16:15         ` Junio C Hamano
@ 2016-09-26 17:33           ` Junio C Hamano
  2016-09-26 17:35             ` Jeff King
  2016-09-26 17:34           ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Karthik Nayak, Gustavo Grieco

Junio C Hamano <gitster@pobox.com> writes:

> Good that your attempt to signature-changing change caught it.  I'll
> take a further look.

So here are two patch series.  The first one makes sure all callers
of parse_sha1_header() check the returned status.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Mon, 26 Sep 2016 09:23:41 -0700
Subject: [PATCH 1/2] streaming: make sure to notice corrupt object

The streaming read interface from a loose object called
parse_sha1_header() but discarded its return value, without noticing
a potential error.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 streaming.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/streaming.c b/streaming.c
index 811fcc2..884a8f1 100644
--- a/streaming.c
+++ b/streaming.c
@@ -347,7 +347,8 @@ static open_method_decl(loose)
 		return -1;
 	}
 
-	parse_sha1_header(st->u.loose.hdr, &st->size);
+	if (parse_sha1_header(st->u.loose.hdr, &st->size) < 0)
+		return -1;
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
 	st->z_state = z_used;
-- 
2.10.0-533-ga18d90d


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26 16:15         ` Junio C Hamano
  2016-09-26 17:33           ` Junio C Hamano
@ 2016-09-26 17:34           ` Junio C Hamano
  2016-09-26 17:38             ` Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Karthik Nayak, Gustavo Grieco

And the second one, that no longer checks SP in unpacker, looks like
this.

-- >8 --
Subject: [PATCH] unpack_sha1_header(): detect malformed object header

When opening a loose object file, we often do this sequence:

 - prepare a short buffer for the object header (on stack)

 - call unpack_sha1_header() and have early part of the object data
   inflated, enough to fill the buffer

 - parse that data in the short buffer, assuming that the first part
   of the object is <typename> SP <length> NUL

Because the parsing function parse_sha1_header_extended() is not
given the number of bytes inflated into the header buffer, it you
craft a file whose early part inflates a garbage sequence without SP
or NUL, and replace a loose object with it, it will end up reading
past the end of the inflated data.

To correct this, do the following four things:

 - rename unpack_sha1_header() to unpack_sha1_short_header() and
   have unpack_sha1_header_to_strbuf() keep calling that as its
   helper function.  This will detect and report zlib errors, but is
   not aware of the format of a loose object (as before).

 - introduce unpack_sha1_header() that calls the same helper
   function, and when zlib reports it inflated OK into the buffer,
   check if the inflated data has NUL.  This would ensure that
   parsing function will terminate within the buffer that holds the
   inflated header.

 - update unpack_sha1_header_to_strbuf() to check if the resulting
   buffer has NUL for the same effect.

 - update parse_sha1_header_extended() to make sure that its loop to
   find the SP that terminates the <typename> stops at NUL.

Essentially, this makes unpack_*() functions that are asked to
unpack a loose object header to be a bit more strict and detect an
input that cannot possibly be a valid object header, even before the
parsing function kicks in.

Reported-by: Gustavo Grieco <gustavo.grieco@imag.fr>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sha1_file.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 17262e1..f7054d3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1566,7 +1566,9 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 	return used;
 }
 
-int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
+static int unpack_sha1_short_header(git_zstream *stream,
+				    unsigned char *map, unsigned long mapsize,
+				    void *buffer, unsigned long bufsiz)
 {
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1579,13 +1581,31 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
 	return git_inflate(stream, 0);
 }
 
+int unpack_sha1_header(git_zstream *stream,
+		       unsigned char *map, unsigned long mapsize,
+		       void *buffer, unsigned long bufsiz)
+{
+	int status = unpack_sha1_short_header(stream, map, mapsize,
+					      buffer, bufsiz);
+
+	if (status < Z_OK)
+		return status;
+
+	/* Make sure we have the terminating NUL */
+	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
+		return -1;
+	return 0;
+}
+
 static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
 					unsigned long mapsize, void *buffer,
 					unsigned long bufsiz, struct strbuf *header)
 {
 	int status;
 
-	status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
+	status = unpack_sha1_short_header(stream, map, mapsize, buffer, bufsiz);
+	if (status < Z_OK)
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
@@ -1676,6 +1696,8 @@ static int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
 	 */
 	for (;;) {
 		char c = *hdr++;
+		if (!c)
+			return -1;
 		if (c == ' ')
 			break;
 		type_len++;
-- 
2.10.0-533-ga18d90d


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26 17:33           ` Junio C Hamano
@ 2016-09-26 17:35             ` Jeff King
  2016-09-26 17:39               ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2016-09-26 17:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Karthik Nayak, Gustavo Grieco

On Mon, Sep 26, 2016 at 10:33:57AM -0700, Junio C Hamano wrote:

> -- >8 --
> From: Junio C Hamano <gitster@pobox.com>
> Date: Mon, 26 Sep 2016 09:23:41 -0700
> Subject: [PATCH 1/2] streaming: make sure to notice corrupt object
> 
> The streaming read interface from a loose object called
> parse_sha1_header() but discarded its return value, without noticing
> a potential error.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  streaming.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/streaming.c b/streaming.c
> index 811fcc2..884a8f1 100644
> --- a/streaming.c
> +++ b/streaming.c
> @@ -347,7 +347,8 @@ static open_method_decl(loose)
>  		return -1;
>  	}
>  
> -	parse_sha1_header(st->u.loose.hdr, &st->size);
> +	if (parse_sha1_header(st->u.loose.hdr, &st->size) < 0)
> +		return -1;

Do you have to git_inflate_end() and munmap() here, as the error path
above does (this was missing from my patch, too)?

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26 17:34           ` Junio C Hamano
@ 2016-09-26 17:38             ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2016-09-26 17:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Karthik Nayak, Gustavo Grieco

On Mon, Sep 26, 2016 at 10:34:32AM -0700, Junio C Hamano wrote:

> And the second one, that no longer checks SP in unpacker, looks like
> this.

This looks good from a cursory read (but I am about to go to sleep, so
might be a bit less careful than usual :) ).

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] unpack_sha1_header(): detect malformed object header
  2016-09-26 17:35             ` Jeff King
@ 2016-09-26 17:39               ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Karthik Nayak, Gustavo Grieco

Jeff King <peff@peff.net> writes:

>> diff --git a/streaming.c b/streaming.c
>> index 811fcc2..884a8f1 100644
>> --- a/streaming.c
>> +++ b/streaming.c
>> @@ -347,7 +347,8 @@ static open_method_decl(loose)
>>  		return -1;
>>  	}
>>  
>> -	parse_sha1_header(st->u.loose.hdr, &st->size);
>> +	if (parse_sha1_header(st->u.loose.hdr, &st->size) < 0)
>> +		return -1;
>
> Do you have to git_inflate_end() and munmap() here, as the error path
> above does (this was missing from my patch, too)?

Ah, definitely.  We'd need to be consistent; otherwise we'd be
either leaking resources (or existing one double-freeing).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26  0:10   ` Junio C Hamano
  2016-09-26  4:29     ` [PATCH] unpack_sha1_header(): detect malformed object header Junio C Hamano
  2016-09-26 13:50     ` Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0 Jeff King
@ 2016-09-26 17:48     ` Gustavo Grieco
  2016-09-26 17:55       ` Junio C Hamano
  2 siblings, 1 reply; 22+ messages in thread
From: Gustavo Grieco @ 2016-09-26 17:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hello,

Now that the cause of this issue is identified, i would like to know if there is an impact in the security, so i can request a CVE if necessary.

Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26 17:48     ` Gustavo Grieco
@ 2016-09-26 17:55       ` Junio C Hamano
  2016-09-26 18:01         ` Gustavo Grieco
  2016-09-26 18:10         ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:55 UTC (permalink / raw)
  To: Gustavo Grieco; +Cc: git

Gustavo Grieco <gustavo.grieco@imag.fr> writes:

> Now that the cause of this issue is identified, i would like to
> know if there is an impact in the security, so i can request a CVE
> if necessary.

I am inclined to say that it has no security implications.  You have
to be able to write a bogus loose object in an object store you
already have write access to in the first place, in order to cause
this read-only access that goes beyond what is allocated, so at the
worst, what you can do is to hurt yourself, and you can already hurt
yourself in various other ways.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26 17:55       ` Junio C Hamano
@ 2016-09-26 18:01         ` Gustavo Grieco
  2016-09-26 18:06           ` Junio C Hamano
  2016-09-26 18:10         ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Gustavo Grieco @ 2016-09-26 18:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Fair enough. We are testing our tool to try to find bugs/vulnerabilities in several git implementations. I will report here my results if i can find some other memory issue in this git client.

----- Original Message -----
> Gustavo Grieco <gustavo.grieco@imag.fr> writes:
> 
> > Now that the cause of this issue is identified, i would like to
> > know if there is an impact in the security, so i can request a CVE
> > if necessary.
> 
> I am inclined to say that it has no security implications.  You have
> to be able to write a bogus loose object in an object store you
> already have write access to in the first place, in order to cause
> this read-only access that goes beyond what is allocated, so at the
> worst, what you can do is to hurt yourself, and you can already hurt
> yourself in various other ways.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26 18:01         ` Gustavo Grieco
@ 2016-09-26 18:06           ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 18:06 UTC (permalink / raw)
  To: Gustavo Grieco; +Cc: git

Gustavo Grieco <gustavo.grieco@imag.fr> writes:

> Fair enough. We are testing our tool to try to find
> bugs/vulnerabilities in several git implementations. I will report
> here my results if i can find some other memory issue in this git
> client.

Thanks.  With or without security implications, it is basic codebase
hygiene to identify and correct these issues, and your help is
highly appreciated.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26 17:55       ` Junio C Hamano
  2016-09-26 18:01         ` Gustavo Grieco
@ 2016-09-26 18:10         ` Junio C Hamano
  2016-09-27  2:13           ` Gustavo Grieco
  2016-09-27  7:19           ` Jeff King
  1 sibling, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 18:10 UTC (permalink / raw)
  To: Gustavo Grieco; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> I am inclined to say that it has no security implications.  You have
> to be able to write a bogus loose object in an object store you
> already have write access to in the first place, in order to cause
> this ...

Note that you could social-engineer others to fetch from you and
feed a small enough update that results in loose objects created in
their repositories, without you having a direct write access to the
repository.

The codepath under discussion in this thread however cannot be used
as an attack vector via that route, because the "fetch from
elsewhere" codepath runs verification of the incoming data stream
before storing the results (either in loose object files, or in a
packfile) on disk.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26 18:10         ` Junio C Hamano
@ 2016-09-27  2:13           ` Gustavo Grieco
  2016-09-27  7:19           ` Jeff King
  1 sibling, 0 replies; 22+ messages in thread
From: Gustavo Grieco @ 2016-09-27  2:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Btw, this other test case will trigger a similar issue, but in another line of code:

To reproduce: 

$ git init ; mkdir -p .git/objects/b2 ; printf 'eJwNwoENgDAIBECkDsII5Z8CHagLGPePXu59zjHGRIOZG3OzI/lnRc4KemXDPdYSml6iQ+4ATIZ+nAEK4g==' | base64 -d > .git/objects/b2/93584ddd61af21260be75ee9f73e9d53f08cd0

Then:

$ git fsck

notice: HEAD points to an unborn branch (master)
=================================================================
==24569==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe7645fda0 at pc 0x0000006fe799 bp 0x7ffe7645fc40 sp 0x7ffe7645fc30
READ of size 1 at 0x7ffe7645fda0 thread T0
    #0 0x6fe798 in parse_sha1_header_extended /home/g/Work/Code/git-2.10.0/sha1_file.c:1714
...

It will be nice to test the current patch.

----- Original Message -----
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > I am inclined to say that it has no security implications.  You have
> > to be able to write a bogus loose object in an object store you
> > already have write access to in the first place, in order to cause
> > this ...
> 
> Note that you could social-engineer others to fetch from you and
> feed a small enough update that results in loose objects created in
> their repositories, without you having a direct write access to the
> repository.
> 
> The codepath under discussion in this thread however cannot be used
> as an attack vector via that route, because the "fetch from
> elsewhere" codepath runs verification of the incoming data stream
> before storing the results (either in loose object files, or in a
> packfile) on disk.
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Possible integer overflow parsing malformed objects in git 2.10.0
  2016-09-25 14:12 ` Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0 Gustavo Grieco
  2016-09-26  0:10   ` Junio C Hamano
@ 2016-09-27  2:30   ` Gustavo Grieco
  2016-09-27  8:07     ` Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Gustavo Grieco @ 2016-09-27  2:30 UTC (permalink / raw)
  To: git

Hi,

We found a malformed object file that triggers an allocation with a negative size when parsed in git 2.10.0. It can be caused by an integer overflow somewhere, so it is better to verify how the code got such value. It was tested on ArchLinux x86_64. To reproduce, first recompile git with ASAN support and then execute:

$ git init ; mkdir -p .git/objects/b2 ; printf 'eJyVT8ERAjEIXKiEBpyBHJdcCroGHAvQjyX49m1ZtmADQjL68uMnZFnYZU/HfRfb3Gtz17Y07etqXhX6ul9uAnCJh6DCAKxUCWABok9J2PN8jYn42iwqYA2OYoKRzVAY67mYgIOfQP8WOthUKubNt6V6/yn5YSPEowsxKGPk0Jdq6ZLKxJYX2LTjYTNi52WTAN4RVyPd' | base64 -d > .git/objects/b2/93584ddd61af21260be75ee9f73e9d53f08cd0

Finally you can trigger the bug using several commands from git (other commands that parses all objects will work too), for instance:

$ git fsck

The ASAN report is here:

==24709==WARNING: AddressSanitizer failed to allocate 0xffffffffffffff65 bytes
==24709==AddressSanitizer's allocator is terminating the process instead of returning 0
==24709==If you don't like this behavior set allocator_may_return_null=1
==24709==AddressSanitizer CHECK failed: /build/gcc-multilib/src/gcc/libsanitizer/sanitizer_common/sanitizer_allocator.cc:145 "((0)) != (0)" (0x0, 0x0)
    #0 0x7f571ae467aa in AsanCheckFailed /build/gcc-multilib/src/gcc/libsanitizer/asan/asan_rtl.cc:65
    #1 0x7f571ae4d163 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /build/gcc-multilib/src/gcc/libsanitizer/sanitizer_common/sanitizer_common.cc:157
    #2 0x7f571ae4b326 in __sanitizer::ReportAllocatorCannotReturnNull() /build/gcc-multilib/src/gcc/libsanitizer/sanitizer_common/sanitizer_allocator.cc:145
    #3 0x7f571ad9b2f4 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<105553116266496ul, 4398046511104ul, 0ul, __sanitizer::SizeClassMap<17ul, 128ul, 16ul>, __asan::AsanMapUnmapCallback>, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator64<105553116266496ul, 4398046511104ul, 0ul, __sanitizer::SizeClassMap<17ul, 128ul, 16ul>, __asan::AsanMapUnmapCallback> >, __sanitizer::LargeMmapAllocator<__asan::AsanMapUnmapCallback> >::ReturnNullOrDie() /build/gcc-multilib/src/gcc/libsanitizer/sanitizer_common/sanitizer_allocator.h:1315
    #4 0x7f571ad9b2f4 in __asan::Allocator::Allocate(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType, bool) /build/gcc-multilib/src/gcc/libsanitizer/asan/asan_allocator.cc:357
    #5 0x7f571ad9b2f4 in __asan::asan_malloc(unsigned long, __sanitizer::BufferedStackTrace*) /build/gcc-multilib/src/gcc/libsanitizer/asan/asan_allocator.cc:716
    #6 0x7f571ae3ce24 in __interceptor_malloc /build/gcc-multilib/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:63
    #7 0x767816 in do_xmalloc /home/g/Work/Code/git-2.10.0/wrapper.c:59
    #8 0x76794c in do_xmallocz /home/g/Work/Code/git-2.10.0/wrapper.c:99
    #9 0x7679bd in xmallocz /home/g/Work/Code/git-2.10.0/wrapper.c:107
    #10 0x6fe36c in unpack_sha1_rest /home/g/Work/Code/git-2.10.0/sha1_file.c:1625
    #11 0x6feb40 in unpack_sha1_file /home/g/Work/Code/git-2.10.0/sha1_file.c:1751
    #12 0x703fe0 in read_object /home/g/Work/Code/git-2.10.0/sha1_file.c:2811
    #13 0x70410a in read_sha1_file_extended /home/g/Work/Code/git-2.10.0/sha1_file.c:2834
    #14 0x647676 in read_sha1_file /home/g/Work/Code/git-2.10.0/cache.h:1056
    #15 0x648545 in parse_object /home/g/Work/Code/git-2.10.0/object.c:269
    #16 0x48d46d in fsck_sha1 builtin/fsck.c:367
    #17 0x48da47 in fsck_loose builtin/fsck.c:493
    #18 0x707514 in for_each_file_in_obj_subdir /home/g/Work/Code/git-2.10.0/sha1_file.c:3477
    #19 0x70775b in for_each_loose_file_in_objdir_buf /home/g/Work/Code/git-2.10.0/sha1_file.c:3512
    #20 0x707885 in for_each_loose_file_in_objdir /home/g/Work/Code/git-2.10.0/sha1_file.c:3532
    #21 0x48dc1d in fsck_object_dir builtin/fsck.c:521
    #22 0x48e2e6 in cmd_fsck builtin/fsck.c:644
    #23 0x407a8f in run_builtin /home/g/Work/Code/git-2.10.0/git.c:352
    #24 0x407e35 in handle_builtin /home/g/Work/Code/git-2.10.0/git.c:539
    #25 0x408175 in run_argv /home/g/Work/Code/git-2.10.0/git.c:593
    #26 0x408458 in cmd_main /home/g/Work/Code/git-2.10.0/git.c:665
    #27 0x53fc70 in main /home/g/Work/Code/git-2.10.0/common-main.c:40
    #28 0x7f5719f46290 in __libc_start_main (/usr/lib/libc.so.6+0x20290)
    #29 0x405209 in _start (/home/g/Work/Code/git-2.10.0/git+0x405209)


This test case was found using QuickFuzz.


Regards,
Gustavo.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0
  2016-09-26 18:10         ` Junio C Hamano
  2016-09-27  2:13           ` Gustavo Grieco
@ 2016-09-27  7:19           ` Jeff King
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff King @ 2016-09-27  7:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Gustavo Grieco, git

On Mon, Sep 26, 2016 at 11:10:54AM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > I am inclined to say that it has no security implications.  You have
> > to be able to write a bogus loose object in an object store you
> > already have write access to in the first place, in order to cause
> > this ...
> 
> Note that you could social-engineer others to fetch from you and
> feed a small enough update that results in loose objects created in
> their repositories, without you having a direct write access to the
> repository.
> 
> The codepath under discussion in this thread however cannot be used
> as an attack vector via that route, because the "fetch from
> elsewhere" codepath runs verification of the incoming data stream
> before storing the results (either in loose object files, or in a
> packfile) on disk.

I don't think it could be used at all for anything that speaks the git
protocol, because the object header is not present at all in a packfile.
So even if you hit unpack-objects, it would be writing the (correct)
loose object header itself.

But when we grab loose objects _directly_ from a remote, as in dumb-http
fetch, I'd suspect that the code doing the verification calls
unpack_sha1_header() as part of it. So I didn't test, but I'd strongly
suspect that's a viable attack vector.

I'm not sure what the actual attack would look like, though, aside from
locally accessing memory in a read-only way.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible integer overflow parsing malformed objects in git 2.10.0
  2016-09-27  2:30   ` Possible integer overflow parsing malformed objects in " Gustavo Grieco
@ 2016-09-27  8:07     ` Jeff King
  2016-09-27 15:57       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2016-09-27  8:07 UTC (permalink / raw)
  To: Gustavo Grieco; +Cc: git

On Tue, Sep 27, 2016 at 04:30:23AM +0200, Gustavo Grieco wrote:

> We found a malformed object file that triggers an allocation with a
> negative size when parsed in git 2.10.0. It can be caused by an
> integer overflow somewhere, so it is better to verify how the code got
> such value.

Are you sure this is triggering a negative integer?

The zlib-inflated contents for the object in your example look like:

  (gdb) print hdr
  $2 = "tree 18446744073709551460\000..."

IOW, this really _is_ a gigantic number, but still within 2^64. So when
we feed it to malloc, that really is correct. And we'd expect malloc to
return NULL, at which point we'll call die, which should look like this
(which I get when running without ASAN):

  $ git fsck
  fatal: Out of memory, malloc failed (tried to allocate 18446744073709551461 bytes)

You'll note that's 1 more than the value in the object; that addition
happens via xmallocz() and _is_ checked for integer overflow.

> The ASAN report is here:
> 
> ==24709==WARNING: AddressSanitizer failed to allocate 0xffffffffffffff65 bytes
> ==24709==AddressSanitizer's allocator is terminating the process instead of returning 0
> ==24709==If you don't like this behavior set allocator_may_return_null=1

I don't think this is an overflow at all. This is just ASAN having
really conservative debugging defaults. A real malloc would return NULL,
and git would notice and abort.

If you follow its suggestion, you get:

  $ ASAN_OPTIONS=allocator_may_return_null=1 git fsck
  ==19380==WARNING: AddressSanitizer failed to allocate 0xffffffffffffff65 bytes
  ==19380==WARNING: AddressSanitizer failed to allocate 0xffffffffffffff65 bytes
  fatal: Out of memory, malloc failed (tried to allocate 18446744073709551461 bytes)

as expected.  So I don't think there is any bug at all in the example
you gave, only a silly-sized object that we cannot handle.

That being said, the parse_sha1_header() function clearly does not
detect overflow at all when parsing the size. So on a 32-bit system, you
end up with:

  $ git fsck
  fatal: Out of memory, malloc failed (tried to allocate 4294967141 bytes)

which is not correct, but I'm not sure it's a security problem.  Integer
overflows are an issue if they cause us to under-allocate, and then to
write more bytes than we allocated. In this case, I would expect
unpack_sha1_rest() to never write more bytes than the "size" we parsed
and allocated (and to complain if the number of bytes we get from the
zlib sequence do not exactly match the claimed size).

So a more interesting example is more like "ULONG_MAX + 5", where we
would overflow to 5 bytes. And we'd hope that unpack_sha1_rest does not
ever write more than 5 bytes. From my reading and a few tests with gdb,
it does not. However, it also does not notice that there were more bytes
that we didn't use.

So I think there's room for improved diagnosis of bogus situations
(including integer overflows), but I don't see any actual security bugs.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible integer overflow parsing malformed objects in git 2.10.0
  2016-09-27  8:07     ` Jeff King
@ 2016-09-27 15:57       ` Junio C Hamano
  2016-09-27 19:14         ` Gustavo Grieco
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-09-27 15:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Gustavo Grieco, git

Jeff King <peff@peff.net> writes:

> That being said, the parse_sha1_header() function clearly does not
> detect overflow at all when parsing the size. So on a 32-bit system, you
> end up with:
>
>   $ git fsck
>   fatal: Out of memory, malloc failed (tried to allocate 4294967141 bytes)
>
> which is not correct, but I'm not sure it's a security problem.  Integer
> overflows are an issue if they cause us to under-allocate, and then to
> write more bytes than we allocated. In this case, I would expect
> unpack_sha1_rest() to never write more bytes than the "size" we parsed
> and allocated (and to complain if the number of bytes we get from the
> zlib sequence do not exactly match the claimed size).
>
> So a more interesting example is more like "ULONG_MAX + 5", where we
> would overflow to 5 bytes. And we'd hope that unpack_sha1_rest does not
> ever write more than 5 bytes. From my reading and a few tests with gdb,
> it does not. However, it also does not notice that there were more bytes
> that we didn't use.
>
> So I think there's room for improved diagnosis of bogus situations
> (including integer overflows), but I don't see any actual security bugs.

I agree with the overall conclusion.  This does look like an attempt
to throw random fuzz at Git and see if and how it breaks, and in this
particular one Git is simply doing the right thing (and the fault lies
in the way how ASAN is used and how its result was interpreted).

Throwing random fuzz to see what breaks is not a bad thing to do
per-se, but anybody who does so without wearing a black hat needs to
keep two things in mind:

 * When a random fuzz attempt does uncover a security issue,
   reporting it here on this list is a grossly irresponsible way to
   disclose the issue.  We have the git-security list for that.

 * A random fuzz may stop Git and that may be perfectly legit thing
   to happen, e.g. the data may request a large but still valid
   amount of memory to be allocated that happens not to fit in the
   hardware the fuzz attempt is being run, and xmalloc() may detect
   the situation and die, like the above example.  False positives
   are expected and you want to make sure you cull them before
   making your reports.  Otherwise, they will unnecessary burden
   people who are doing real work, i.e. reproduce and correct
   problems that may be security related that are irresponsibly
   disclosed here quickly enough to minimize damage.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible integer overflow parsing malformed objects in git 2.10.0
  2016-09-27 15:57       ` Junio C Hamano
@ 2016-09-27 19:14         ` Gustavo Grieco
  0 siblings, 0 replies; 22+ messages in thread
From: Gustavo Grieco @ 2016-09-27 19:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

----- Original Message -----
> Jeff King <peff@peff.net> writes:
> 
> > That being said, the parse_sha1_header() function clearly does not
> > detect overflow at all when parsing the size. So on a 32-bit system, you
> > end up with:
> >
> >   $ git fsck
> >   fatal: Out of memory, malloc failed (tried to allocate 4294967141 bytes)
> >
> > which is not correct, but I'm not sure it's a security problem.  Integer
> > overflows are an issue if they cause us to under-allocate, and then to
> > write more bytes than we allocated. In this case, I would expect
> > unpack_sha1_rest() to never write more bytes than the "size" we parsed
> > and allocated (and to complain if the number of bytes we get from the
> > zlib sequence do not exactly match the claimed size).
> >
> > So a more interesting example is more like "ULONG_MAX + 5", where we
> > would overflow to 5 bytes. And we'd hope that unpack_sha1_rest does not
> > ever write more than 5 bytes. From my reading and a few tests with gdb,
> > it does not. However, it also does not notice that there were more bytes
> > that we didn't use.
> >
> > So I think there's room for improved diagnosis of bogus situations
> > (including integer overflows), but I don't see any actual security bugs.

Great, it is exactly the type of analysis i was expecting. 

> 
> I agree with the overall conclusion.  This does look like an attempt
> to throw random fuzz at Git and see if and how it breaks, and in this
> particular one Git is simply doing the right thing (and the fault lies
> in the way how ASAN is used and how its result was interpreted).
> 
> Throwing random fuzz to see what breaks is not a bad thing to do
> per-se, but anybody who does so without wearing a black hat needs to
> keep two things in mind:
> 
>  * When a random fuzz attempt does uncover a security issue,
>    reporting it here on this list is a grossly irresponsible way to
>    disclose the issue.  We have the git-security list for that.

That is reasonable, indeed. As we discussed, this type of issues are very unlikely to be easily exploited (or even possible), so i did not think that it could be irresponsible to post this issue here. I will be happy to post only in git-security if you think my reports can uncover security issues (but so far, it was not the case). It is also interesting to mention that git-security is not linked anywhere in the official website (git-scm.com) or the github repository (github.com/git/git).

> 
>  * A random fuzz may stop Git and that may be perfectly legit thing
>    to happen, e.g. the data may request a large but still valid
>    amount of memory to be allocated that happens not to fit in the
>    hardware the fuzz attempt is being run, and xmalloc() may detect
>    the situation and die, like the above example.  False positives
>    are expected and you want to make sure you cull them before
>    making your reports.  Otherwise, they will unnecessary burden
>    people who are doing real work, i.e. reproduce and correct
>    problems that may be security related that are irresponsibly
>    disclosed here quickly enough to minimize damage.

I try to discard false positives as much as possible. Despite we just started adding the git generation module to our tool, we got hundreds of aborts caused by the allocation limits of AddressSanitizer: I only reported the one which specifically contains a size that could be interpreted as negative number (in fact, valgrind will also report this issue a 'fishy size' for malloc).
Also, I think my reports should are clear enough to reproduce any issue and i carefully check every test case for reproducibility, still i am open to suggestion on how to improve my bug reports! 

> 
> Thanks.
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-09-27 19:14 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1399913289.8224468.1474810664933.JavaMail.zimbra@imag.fr>
2016-09-25 14:12 ` Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0 Gustavo Grieco
2016-09-26  0:10   ` Junio C Hamano
2016-09-26  4:29     ` [PATCH] unpack_sha1_header(): detect malformed object header Junio C Hamano
2016-09-26 14:03       ` Jeff King
2016-09-26 16:15         ` Junio C Hamano
2016-09-26 17:33           ` Junio C Hamano
2016-09-26 17:35             ` Jeff King
2016-09-26 17:39               ` Junio C Hamano
2016-09-26 17:34           ` Junio C Hamano
2016-09-26 17:38             ` Jeff King
2016-09-26 13:50     ` Stack read out-of-bounds in parse_sha1_header_extended using git 2.10.0 Jeff King
2016-09-26 17:48     ` Gustavo Grieco
2016-09-26 17:55       ` Junio C Hamano
2016-09-26 18:01         ` Gustavo Grieco
2016-09-26 18:06           ` Junio C Hamano
2016-09-26 18:10         ` Junio C Hamano
2016-09-27  2:13           ` Gustavo Grieco
2016-09-27  7:19           ` Jeff King
2016-09-27  2:30   ` Possible integer overflow parsing malformed objects in " Gustavo Grieco
2016-09-27  8:07     ` Jeff King
2016-09-27 15:57       ` Junio C Hamano
2016-09-27 19:14         ` Gustavo Grieco

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).