git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git diff --word-diff problem
@ 2012-01-12  9:05 Ivan Shirokoff
  2012-01-12 11:15 ` [PATCH] word-diff: ignore '\ No newline at eof' marker Thomas Rast
  0 siblings, 1 reply; 3+ messages in thread
From: Ivan Shirokoff @ 2012-01-12  9:05 UTC (permalink / raw
  To: git

Hello.
I've got a couple of generated files with obfuscated code.
I want to word-diff them just to make sure that everything is right.

The thing is when word-diff gets oneline file without newline at the end 
it compares it with regular line by line diff.

Here is an example. Two one line files generated with perl -e "print 'a 
'x10" > file

git diff --word-diff=plain file1 file2
diff --git a/file1 b/file2
index 3526254..0515a63 100644
--- a/file1
+++ b/file2
@@ -1 +1 @@
[- a a a a a a a a a a-]
  No newline at end of file
  {+a a a a a ab a a a a+}

Git shows that the whole line is different.
And if I add newlines to that files everything works just as expected

git diff --word-diff=plain file1 file2
diff --git a/file1 b/file2
index 1756d83..1ec45b9 100644
--- a/file1
+++ b/file2
@@ -1,2 +1,2 @@
  a a a a a [-a -]{+ab +}a a a a

Is that a bug or I've missed explanation in docs?

-- 
Ivan Shirokoff

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH] word-diff: ignore '\ No newline at eof' marker
  2012-01-12  9:05 git diff --word-diff problem Ivan Shirokoff
@ 2012-01-12 11:15 ` Thomas Rast
  2012-01-12 19:23   ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Rast @ 2012-01-12 11:15 UTC (permalink / raw
  To: Ivan Shirokoff; +Cc: git, Junio C Hamano

The word-diff logic accumulates + and - lines until another line type
appears (normally [ @\]), at which point it generates the word diff.
This is usually correct, but it breaks when the preimage does not have
a newline at EOF:

  $ printf "%s" "a a a" >a
  $ printf "%s\n" "a ab a" >b
  $ git diff --no-index --word-diff a b
  diff --git 1/a 2/b
  index 9f68e94..6a7c02f 100644
  --- 1/a
  +++ 2/b
  @@ -1 +1 @@
  [-a a a-]
   No newline at end of file
  {+a ab a+}

Because of the order of the lines in a unified diff

  @@ -1 +1 @@
  -a a a
  \ No newline at end of file
  +a ab a

the '\' line flushed the buffers, and the - and + lines were never
matched with each other.

A proper fix would defer such markers until the end of the hunk.
However, word-diff is inherently whitespace-ignoring, so as a cheap
fix simply ignore the marker (and hide it from the output).

We use a prefix match for '\ ' to parallel the logic in
apply.c:parse_fragment().  We currently do not localize this string
(just accept other variants of it in git-apply), but this should be
future-proof.

Noticed-by: Ivan Shirokoff <shirokoff@yandex-team.ru>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 diff.c                |    8 ++++++++
 t/t4034-diff-words.sh |   14 ++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/diff.c b/diff.c
index a65223a..996cc60 100644
--- a/diff.c
+++ b/diff.c
@@ -1113,6 +1113,14 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			diff_words_append(line, len,
 					  &ecbdata->diff_words->plus);
 			return;
+		} else if (!prefixcmp(line, "\\ ")) {
+			/*
+			 * Silently eat the "no newline at eof" marker
+			 * (we are diffing without regard to
+			 * whitespace anyway), and defer processing:
+			 * more '+' lines could be after it.
+			 */
+			return;
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 6f1e5a2..5c20121 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -334,4 +334,18 @@ test_expect_success 'word-diff with diff.sbe' '
 	word_diff --word-diff=plain
 '
 
+test_expect_success 'word-diff with no newline at EOF' '
+	cat >expect <<-\EOF &&
+	diff --git a/pre b/post
+	index 7bf316e..3dd0303 100644
+	--- a/pre
+	+++ b/post
+	@@ -1 +1 @@
+	a a [-a-]{+ab+} a a
+	EOF
+	printf "%s" "a a a a a" >pre &&
+	printf "%s" "a a ab a a" >post &&
+	word_diff --word-diff=plain
+'
+
 test_done
-- 
1.7.9.rc0.168.g3847c

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] word-diff: ignore '\ No newline at eof' marker
  2012-01-12 11:15 ` [PATCH] word-diff: ignore '\ No newline at eof' marker Thomas Rast
@ 2012-01-12 19:23   ` Junio C Hamano
  0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2012-01-12 19:23 UTC (permalink / raw
  To: Thomas Rast; +Cc: Ivan Shirokoff, git

Thomas Rast <trast@student.ethz.ch> writes:

> A proper fix would defer such markers until the end of the hunk.
> However, word-diff is inherently whitespace-ignoring, so as a cheap
> fix simply ignore the marker (and hide it from the output).

Sounds like a very sensible simplification of the issue.

> We use a prefix match for '\ ' to parallel the logic in
> apply.c:parse_fragment().  We currently do not localize this string
> (just accept other variants of it in git-apply), but this should be
> future-proof.
>
> Noticed-by: Ivan Shirokoff <shirokoff@yandex-team.ru>
> Signed-off-by: Thomas Rast <trast@student.ethz.ch>
> ---
>
>  diff.c                |    8 ++++++++
>  t/t4034-diff-words.sh |   14 ++++++++++++++
>  2 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index a65223a..996cc60 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1113,6 +1113,14 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>  			diff_words_append(line, len,
>  					  &ecbdata->diff_words->plus);
>  			return;
> +		} else if (!prefixcmp(line, "\\ ")) {
> +			/*
> +			 * Silently eat the "no newline at eof" marker
> +			 * (we are diffing without regard to
> +			 * whitespace anyway), and defer processing:
> +			 * more '+' lines could be after it.
> +			 */
> +			return;
>  		}
>  		diff_words_flush(ecbdata);

It took me a while to realize "defer processing" in the comment was meant
to justify the placement of the new block _before_ this flush. Perhaps
rephrasing it to "return without calling diff_words_flush()" would make it
more readable?

Otherwise the patch looks good.

Thanks.

>  		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
> diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
> index 6f1e5a2..5c20121 100755
> --- a/t/t4034-diff-words.sh
> +++ b/t/t4034-diff-words.sh
> @@ -334,4 +334,18 @@ test_expect_success 'word-diff with diff.sbe' '
>  	word_diff --word-diff=plain
>  '
>  
> +test_expect_success 'word-diff with no newline at EOF' '
> +	cat >expect <<-\EOF &&
> +	diff --git a/pre b/post
> +	index 7bf316e..3dd0303 100644
> +	--- a/pre
> +	+++ b/post
> +	@@ -1 +1 @@
> +	a a [-a-]{+ab+} a a
> +	EOF
> +	printf "%s" "a a a a a" >pre &&
> +	printf "%s" "a a ab a a" >post &&
> +	word_diff --word-diff=plain
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-01-12 19:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-01-12  9:05 git diff --word-diff problem Ivan Shirokoff
2012-01-12 11:15 ` [PATCH] word-diff: ignore '\ No newline at eof' marker Thomas Rast
2012-01-12 19:23   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).