git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Peter Oberndorfer <kumbayo84@arcor.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: crash on git diff-tree -Ganything <tree> for new files with textconv filter
Date: Sun, 28 Oct 2012 08:01:04 -0400	[thread overview]
Message-ID: <20121028120104.GE11434@sigill.intra.peff.net> (raw)
In-Reply-To: <508C29E4.5000801@arcor.de>

On Sat, Oct 27, 2012 at 08:37:24PM +0200, Peter Oberndorfer wrote:

> It seems "git diff-tree -Ganything <tree>" crashes[1] with a null
> pointer dereference
> when run on a commit that adds a file (pdf) with a textconv filter.
> 
> It can be reproduced with vanilla git by having a commit on top that
> adds a file with a textconv filter and executing git diff-tree
> -Ganything HEAD
> But running git log -Ganything still works without a crash.
> This problem seems to exist since the feature was first added in f506b8e8b5.

Thanks for a thorough bug report. I didn't reproduce the crash, but I
can see how it happens (it happens with diff-tree because we will reuse
the working tree file in that instance; it could also happen if you
turned on textconv caching).

> While testing I also noticed the -S and -G act on the original file
> instead of the textconv munged data.
> Is this intentional or caused by accessing the wrong data?

Both, perhaps. :)

-G operates on the munged data; you can see it feed the munged data to
xdiff in diff_grep. But the optimization for handling added and removed
files accidentally fed the wrong pointer. Fixing that is a no-brainer,
since the optimization is inconsistent with the regular code path.

-S, however, predates the invention of textconv and has never used it.
It is a little less clear that textconv is the right thing here, because
it is not about grepping the diff, but about counting occurrences of the
string inside the file content. I tend to think that doing so on the
textconv'd data would be what people generally want, but it is a
behavior change.

> Wild guess: should we really access p->one->data and not mf1.ptr?

Precisely. Thanks for your wild guess; it made finding the bug very
easy. :)

> Is there some more information i should provide?

The patch below should fix it. I added tests, but please try your
real-world test case on it to double-check.

-- >8 --
Subject: [PATCH] diff_grep: use textconv buffers for add/deleted files

If you use "-G" to grep a diff, we will apply a configured
textconv filter to the data before generating the diff.
However, if the diff is an addition or deletion, we do not
bother running the diff at all, and just look for the token
in the added (or removed) content. This works because we
know that the diff must contain every line of content.

However, while we used the textconv-derived buffers in the
regular diff, we accidentally passed the original unmodified
buffers to regexec when checking the added or removed
content. This could lead to an incorrect answer.

Worse, in some cases we might have a textconv buffer but no
original buffer (e.g., if we pulled the textconv data from
cache, or if we reused a working tree file when generating
it). In that case, we could actually feed NULL to regexec
and segfault.

Reported-by: Peter Oberndorfer <kumbayo84@arcor.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 diffcore-pickaxe.c       |  4 ++--
 t/t4030-diff-textconv.sh | 12 ++++++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index ed23eb4..a209376 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -104,10 +104,10 @@ static int diff_grep(struct diff_filepair *p, struct diff_options *o,
 		if (!mf2.ptr)
 			return 0; /* ignore unmerged */
 		/* created "two" -- does it have what we are looking for? */
-		hit = !regexec(regexp, p->two->data, 1, &regmatch, 0);
+		hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
 	} else if (!mf2.ptr) {
 		/* removed "one" -- did it have what we are looking for? */
-		hit = !regexec(regexp, p->one->data, 1, &regmatch, 0);
+		hit = !regexec(regexp, mf1.ptr, 1, &regmatch, 0);
 	} else {
 		/*
 		 * We have both sides; need to run textual diff and see if
diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index eebb1ee..461d27a 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -84,6 +84,18 @@ test_expect_success 'status -v produces text' '
 	git reset --soft HEAD@{1}
 '
 
+test_expect_success 'grep-diff (-G) operates on textconv data (add)' '
+	echo one >expect &&
+	git log --root --format=%s -G0 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep-diff (-G) operates on textconv data (modification)' '
+	echo two >expect &&
+	git log --root --format=%s -G1 >actual &&
+	test_cmp expect actual
+'
+
 cat >expect.stat <<'EOF'
  file | Bin 2 -> 4 bytes
  1 file changed, 0 insertions(+), 0 deletions(-)
-- 
1.8.0.3.g3456896

  reply	other threads:[~2012-10-28 12:01 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-27 18:37 crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-28 12:01 ` Jeff King [this message]
2012-10-28 12:45   ` [PATCH 0/2] textconv support for "log -S" Jeff King
2012-10-28 12:46     ` [PATCH 1/2] pickaxe: hoist empty needle check Jeff King
2012-10-28 12:47     ` [PATCH 2/2] pickaxe: use textconv for -S counting Jeff King
2012-11-13 23:13       ` Junio C Hamano
2012-11-15  1:21         ` Jeff King
2012-11-20  0:31           ` Junio C Hamano
2012-11-20  0:48             ` Junio C Hamano
2012-11-21 20:27               ` Jeff King
2012-10-28 19:56   ` crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-29  6:05     ` Jeff King
2012-10-29  6:18       ` Jeff King
2012-10-29 20:19       ` Peter Oberndorfer
2012-10-29 22:35         ` Jeff King
2012-10-29 22:47           ` Jeff King
2012-10-30 12:17             ` Jeff King
2012-10-30 12:46               ` Junio C Hamano
2012-10-30 13:12                 ` Jeff King
2012-11-01 19:19               ` Ramsay Jones
2012-11-07 21:10           ` Peter Oberndorfer
2012-11-07 21:13             ` Jeff King
2013-06-03 17:25               ` Peter Oberndorfer
2013-06-03 22:17                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121028120104.GE11434@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kumbayo84@arcor.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).