git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Peter Oberndorfer <kumbayo84@arcor.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: crash on git diff-tree -Ganything <tree> for new files with textconv filter
Date: Mon, 29 Oct 2012 21:19:48 +0100	[thread overview]
Message-ID: <508EE4E4.1080407@arcor.de> (raw)
In-Reply-To: <20121029060524.GB4457@sigill.intra.peff.net>

On 2012-10-29 07:05, Jeff King wrote:
> On Sun, Oct 28, 2012 at 08:56:39PM +0100, Peter Oberndorfer wrote:
>
>>> The patch below should fix it. I added tests, but please try your
>>> real-world test case on it to double-check.
>> I tested your patch, but now it crashes for another reason :-)
> Well, that's progress, right? :)
Sure :-)
>
>> i have a file with exactly 12288(0x3000) bytes in the repository.
>> When the file is loaded, the data is placed luckily so the data end
>> falls at a page boundary.
>> Later diff_grep() calls regexec() which calls strlen() on the loaded buffer
>> and ends up reading beyond the actual data into the next page
>> which is not allocated and causes a pagefault.
>> Or it could possibly (randomly) match the regex on data that is not
>> actually part of a file...
> Yuck. For the most part, we treat blob content (and generally most
> object content) as a sized buffer. However, there are some spots which,
> either through laziness or because a code interface expects a string, we
> pass the value as a string. This works because the object-reading code
> puts an extra NUL at the end of our buffer to handle just such an
> instance. So we might prematurely end if the object contains embedded
> NULs, but we would never read past the end.
>
> The code to read the output of a textconv filter does not do this
> explicitly. I would think it would get it for free by virtue of reading
> into a strbuf, though. I'll try to investigate.
I could reproduce with my 0x3000 bytes file on linux. The buffer is not
read with a trailing null byte it is mapped by mmap in
diff_populate_filespec...
So i think we will not get away with expecting a trailing null :-/

For me the key to reproduce the problem was to have 2 commits.
Adding the file in the root commit it did not work. [1]

Greetings Peter
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

[1]
kumbayo@home:~/src$ mkdir git_mmap_crash2
kumbayo@home:~/src$ cd git_mmap_crash2
kumbayo@home:~/src/git_mmap_crash2$ git init
kumbayo@home:~/src/git_mmap_crash2$ echo blah>blah
kumbayo@home:~/src/git_mmap_crash2$ git add blah
kumbayo@home:~/src/git_mmap_crash2$ git commit -m blah
[master (Basis-Version) 3458422] blah
diff_populate_filespec -> xmmap for blah size:0x5 returned: 0xb7206000
 1 file changed, 1 insertion(+)
 create mode 100644 blah
kumbayo@home:~/src/git_mmap_crash2$ perl -e 'print "-" x 0x3000 '> asdf.txt
kumbayo@home:~/src/git_mmap_crash2$ git add asdf.txt
kumbayo@home:~/src/git_mmap_crash2$ git commit -m crashy
[master 5cf2c5f] crashy
diff_populate_filespec -> xmmap for asdf.txt size:0x3000 returned:
0xb771e000
 1 file changed, 1 insertion(+)
 create mode 100644 asdf.txt

kumbayo@soybean:~/src/git_mmap_crash2$ valgrind git diff-tree -Ganything HEAD
==8388== Memcheck, a memory error detector
==8388== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==8388== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==8388== Command: git diff-tree -Ganything HEAD
==8388==
==8388== Conditional jump or move depends on uninitialised value(s)
==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)
==8388==    by 0xA0: ???
==8388==
==8388== Conditional jump or move depends on uninitialised value(s)
==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)
==8388==    by 0x7F: ???
==8388==


==8388== Conditional jump or move depends on uninitialised value(s)


==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)


==8388==    by 0x30: ???


==8388==


==8388== Conditional jump or move depends on uninitialised value(s)


==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)


==8388==    by 0x50: ???


==8388==


diffcore_pickaxe_grep


diff_populate_filespec -> xmmap for asdf.txt size:0x3000 returned: 0x4035000


==8388== Invalid read of size 1


==8388==    at 0x402C683: __GI_strlen (in
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)


==8388==    by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245)


==8388==    by 0x814489D: diff_grep (diffcore-pickaxe.c:110)
==8388==    by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40)
==8388==    by 0x8144DCD: diffcore_pickaxe_grep (diffcore-pickaxe.c:155)
==8388==    by 0x80DCE64: diffcore_std (diff.c:4638)
==8388==    by 0x80F0B20: log_tree_diff_flush (log-tree.c:696)
==8388==  Address 0x4038000 is not stack'd, malloc'd or (recently) free'd
==8388==
==8388==
==8388== Process terminating with default action of signal 11 (SIGSEGV)
==8388==  Access not within mapped region at address 0x4038000
==8388==    at 0x402C683: __GI_strlen (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==8388==    by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245)
==8388==    by 0x814489D: diff_grep (diffcore-pickaxe.c:110)
==8388==    by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40)
==8388==    by 0x8144DCD: diffcore_pickaxe_grep (diffcore-pickaxe.c:155)
==8388==    by 0x80DCE64: diffcore_std (diff.c:4638)
==8388==    by 0x80F0B20: log_tree_diff_flush (log-tree.c:696)
==8388==  If you believe this happened as a result of a stack
==8388==  overflow in your program's main thread (unlikely but
==8388==  possible), you can try to increase the size of the
==8388==  main thread stack using the --main-stacksize= flag.
==8388==  The main thread stack size used in this run was 8388608.
==8388==
==8388== HEAP SUMMARY:
==8388==     in use at exit: 86,229 bytes in 69 blocks
==8388==   total heap usage: 193 allocs, 124 frees, 259,991 bytes allocated
==8388==
==8388== LEAK SUMMARY:
==8388==    definitely lost: 65 bytes in 1 blocks
==8388==    indirectly lost: 0 bytes in 0 blocks
==8388==      possibly lost: 0 bytes in 0 blocks
==8388==    still reachable: 86,164 bytes in 68 blocks
==8388==         suppressed: 0 bytes in 0 blocks
==8388== Rerun with --leak-check=full to see details of leaked memory
==8388==
==8388== For counts of detected and suppressed errors, rerun with: -v
==8388== Use --track-origins=yes to see where uninitialised values come from
==8388== ERROR SUMMARY: 7 errors from 5 contexts (suppressed: 0 from 0)

  parent reply	other threads:[~2012-10-29 20:20 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-27 18:37 crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-28 12:01 ` Jeff King
2012-10-28 12:45   ` [PATCH 0/2] textconv support for "log -S" Jeff King
2012-10-28 12:46     ` [PATCH 1/2] pickaxe: hoist empty needle check Jeff King
2012-10-28 12:47     ` [PATCH 2/2] pickaxe: use textconv for -S counting Jeff King
2012-11-13 23:13       ` Junio C Hamano
2012-11-15  1:21         ` Jeff King
2012-11-20  0:31           ` Junio C Hamano
2012-11-20  0:48             ` Junio C Hamano
2012-11-21 20:27               ` Jeff King
2012-10-28 19:56   ` crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-29  6:05     ` Jeff King
2012-10-29  6:18       ` Jeff King
2012-10-29 20:19       ` Peter Oberndorfer [this message]
2012-10-29 22:35         ` Jeff King
2012-10-29 22:47           ` Jeff King
2012-10-30 12:17             ` Jeff King
2012-10-30 12:46               ` Junio C Hamano
2012-10-30 13:12                 ` Jeff King
2012-11-01 19:19               ` Ramsay Jones
2012-11-07 21:10           ` Peter Oberndorfer
2012-11-07 21:13             ` Jeff King
2013-06-03 17:25               ` Peter Oberndorfer
2013-06-03 22:17                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=508EE4E4.1080407@arcor.de \
    --to=kumbayo84@arcor.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).