git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git diff --quiet exits with 1 on clean tree with CRLF conversions
@ 2017-02-17 21:26 Mike Crowe
  2017-02-17 22:05 ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Crowe @ 2017-02-17 21:26 UTC (permalink / raw)
  To: git

If "git diff --quiet" finds it necessary to compare actual file contents,
and a file requires CRLF conversion, then it incorrectly exits with an exit
code of 1 even if there have been no changes.

The patch below adds a test file that shows the problem.

The first test of diff without --quiet correctly has an exit status of zero
on both invocations.

The second test of diff with --quiet has an exit code of zero on the first
invocation, but an exit code of one on the second invocation. Further
invocations (not included in the test) may yield an exit code of 1. Calling
"git status" always fixes things.

(The touching with "tomorrow" was my attempt to avoid the sleep, but that
didn't seem to help - it appears that time must pass in order to ensure
that the cache is not used.)

The culprit would appear to be in diff_filespec_check_stat_unmatch where it
assumes that the files are different if their sizes are different:

	if (!DIFF_FILE_VALID(p->one) || /* (1) */
	    !DIFF_FILE_VALID(p->two) ||
	    (p->one->oid_valid && p->two->oid_valid) ||
	    (p->one->mode != p->two->mode) ||
	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
	    (p->one->size != p->two->size) ||
	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
		p->skip_stat_unmatch_result = 1;

In the failing case p->one->size == 14 and p->two->size == 12.

Mike.

diff --git a/t/t4063-diff-converted.sh b/t/t4063-diff-converted.sh
new file mode 100755
index 0000000..a108dfb
--- /dev/null
+++ b/t/t4063-diff-converted.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+
+test_description='git diff with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	echo "* text=auto" > .gitattributes &&
+	printf "Hello\r\nWorld\r\n" > crlf.txt &&
+	git add .gitattributes crlf.txt &&
+	git commit -m "initial"
+'
+test_expect_success 'noisy diff works on file that requires CRLF conversion' '
+	git status >/dev/null &&
+	git diff >/dev/null &&
+	sleep 1 &&
+	touch --date=tomorrow crlf.txt &&
+	git diff >/dev/null
+'
+# The sleep is required for reasons I don't fully understand
+test_expect_success 'quiet diff works on file that requires CRLF conversion with no changes' '
+	git status &&
+	git diff --quiet &&
+	sleep 1 &&
+	touch --date=tomorrow crlf.txt &&
+	git diff --quiet
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-17 21:26 git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
@ 2017-02-17 22:05 ` Junio C Hamano
  2017-02-17 22:19   ` Mike Crowe
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-02-17 22:05 UTC (permalink / raw)
  To: Mike Crowe; +Cc: git

Mike Crowe <mac@mcrowe.com> writes:

> If "git diff --quiet" finds it necessary to compare actual file contents,
> and a file requires CRLF conversion, then it incorrectly exits with an exit
> code of 1 even if there have been no changes.
>
> The patch below adds a test file that shows the problem.

If "git diff" does not show any output and "git diff --exit-code" or
"git diff --quiet" says there are differences, then it is a bug.

I would however have expected that any culprit would involve a code
that says "under QUICK option, we do not have to bother doing
this".  The part you quoted:

> 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
> 	    !DIFF_FILE_VALID(p->two) ||
> 	    (p->one->oid_valid && p->two->oid_valid) ||
> 	    (p->one->mode != p->two->mode) ||
> 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
> 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> 	    (p->one->size != p->two->size) ||
> 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
> 		p->skip_stat_unmatch_result = 1;

is used by "git diff" with and without "--quiet", afacr, so I
suspect that the bug lies somewhere else.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-17 22:05 ` Junio C Hamano
@ 2017-02-17 22:19   ` Mike Crowe
  2017-02-20 15:33     ` Mike Crowe
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Crowe @ 2017-02-17 22:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Friday 17 February 2017 at 14:05:17 -0800, Junio C Hamano wrote:
> Mike Crowe <mac@mcrowe.com> writes:
> 
> > If "git diff --quiet" finds it necessary to compare actual file contents,
> > and a file requires CRLF conversion, then it incorrectly exits with an exit
> > code of 1 even if there have been no changes.
> >
> > The patch below adds a test file that shows the problem.
> 
> If "git diff" does not show any output and "git diff --exit-code" or
> "git diff --quiet" says there are differences, then it is a bug.
> 
> I would however have expected that any culprit would involve a code
> that says "under QUICK option, we do not have to bother doing
> this".  The part you quoted:
> 
> > 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
> > 	    !DIFF_FILE_VALID(p->two) ||
> > 	    (p->one->oid_valid && p->two->oid_valid) ||
> > 	    (p->one->mode != p->two->mode) ||
> > 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
> > 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> > 	    (p->one->size != p->two->size) ||
> > 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
> > 		p->skip_stat_unmatch_result = 1;
> 
> is used by "git diff" with and without "--quiet", afacr, so I
> suspect that the bug lies somewhere else.

I can't say that I really understand the code fully, but it appears that
the first pass generates a queue of files that contain differences. The
result of the quiet diff comes from the size of that queue,
diff_queued_diff.nr, being non-zero in diffcore_std. I'm assuming that the
result of the noisy diff comes from actually comparing the files.

But, I've only spent a short while looking so I may have got the wrong end
of the stick.

Thanks.

Mike.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-17 22:19   ` Mike Crowe
@ 2017-02-20 15:33     ` Mike Crowe
  2017-02-20 21:25       ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Crowe @ 2017-02-20 15:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Friday 17 February 2017 at 22:19:58 +0000, Mike Crowe wrote:
> On Friday 17 February 2017 at 14:05:17 -0800, Junio C Hamano wrote:
> > Mike Crowe <mac@mcrowe.com> writes:
> > 
> > > If "git diff --quiet" finds it necessary to compare actual file contents,
> > > and a file requires CRLF conversion, then it incorrectly exits with an exit
> > > code of 1 even if there have been no changes.
> > >
> > > The patch below adds a test file that shows the problem.
> > 
> > If "git diff" does not show any output and "git diff --exit-code" or
> > "git diff --quiet" says there are differences, then it is a bug.
> > 
> > I would however have expected that any culprit would involve a code
> > that says "under QUICK option, we do not have to bother doing
> > this".  The part you quoted:
> > 
> > > 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
> > > 	    !DIFF_FILE_VALID(p->two) ||
> > > 	    (p->one->oid_valid && p->two->oid_valid) ||
> > > 	    (p->one->mode != p->two->mode) ||
> > > 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
> > > 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> > > 	    (p->one->size != p->two->size) ||
> > > 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
> > > 		p->skip_stat_unmatch_result = 1;
> > 
> > is used by "git diff" with and without "--quiet", afacr, so I
> > suspect that the bug lies somewhere else.
> 
> I can't say that I really understand the code fully, but it appears that
> the first pass generates a queue of files that contain differences. The
> result of the quiet diff comes from the size of that queue,
> diff_queued_diff.nr, being non-zero in diffcore_std. I'm assuming that the
> result of the noisy diff comes from actually comparing the files.
> 
> But, I've only spent a short while looking so I may have got the wrong end
> of the stick.

Tricking Git into checking the actual file contents (by passing
--ignore-space-change for example) is sufficient to ensure that the exit
status is never 1 when it should be zero. (Of course that option has other
unwanted effects in this case.)

I think that if there's a risk that file contents will undergo conversion
then this should force the diff to check the file contents. Something like:

diff --git a/diff.c b/diff.c
index 051761b..bee1662 100644
--- a/diff.c
+++ b/diff.c
@@ -3413,13 +3413,14 @@ void diff_setup_done(struct diff_options *options)
 	/*
 	 * Most of the time we can say "there are changes"
 	 * only by checking if there are changed paths, but
-	 * --ignore-whitespace* options force us to look
-	 * inside contents.
+	 * --ignore-whitespace* options or text conversion
+	 * force us to look inside contents.
 	 */
 
 	if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
 	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
-	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
+	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
+	    DIFF_OPT_TST(options, ALLOW_TEXTCONV))
 		DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
 	else
 		DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);

This solves the problem for me and my test case now passes. Unfortunately
it breaks the 'removing and adding subproject' test case in
t3040-subprojects-basic at the line:

 test_expect_code 1 git diff -M --name-status --exit-code HEAD^ HEAD

presumably because after the rename has been detected the file contents are
identical. :( A rename of a single file appears to still be handled
correctly.

Mike.

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-20 15:33     ` Mike Crowe
@ 2017-02-20 21:25       ` Junio C Hamano
  2017-02-25 15:32         ` Mike Crowe
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-02-20 21:25 UTC (permalink / raw)
  To: Mike Crowe; +Cc: git

Mike Crowe <mac@mcrowe.com> writes:

> I think that if there's a risk that file contents will undergo conversion
> then this should force the diff to check the file contents. Something like:
>
> diff --git a/diff.c b/diff.c
> index 051761b..bee1662 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3413,13 +3413,14 @@ void diff_setup_done(struct diff_options *options)
>  	/*
>  	 * Most of the time we can say "there are changes"
>  	 * only by checking if there are changed paths, but
> -	 * --ignore-whitespace* options force us to look
> -	 * inside contents.
> +	 * --ignore-whitespace* options or text conversion
> +	 * force us to look inside contents.
>  	 */
>  
>  	if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
>  	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
> -	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
> +	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
> +	    DIFF_OPT_TST(options, ALLOW_TEXTCONV))
>  		DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
>  	else
>  		DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);

Thanks.

You may be on the right track to find FROM_CONTENTS bit, but
I think TEXTCONV bit is a red-herring.

This part of diff.c caught my attention, while thinking about this
topic:

	if (output_format & DIFF_FORMAT_NO_OUTPUT &&
	    DIFF_OPT_TST(options, EXIT_WITH_STATUS) &&
	    DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
		/*
		 * run diff_flush_patch for the exit status. setting
		 * options->file to /dev/null should be safe, because we
		 * aren't supposed to produce any output anyway.
		 */

and the body of this "if" statement loops over q->queue[].  It is
about the "even though we prefer not having to format the patch
because we are doing --quiet, we need to see if contents of one and
two that we _know_ are different are made into the same thing when
we compare them while ignoring various forms of whitespace changes".
So one and two that are removed in earlier step because they were
truly identical may not be penalized if you flip FROM_CONTENTS bit
early on.

Having said that, DIFF_FROM_CONTENTS is about all paths this options
structure governs, not just paths that have eol conversion defined.
When you say "diff --ignore-whitespace-change", that applies to all
paths.  The eol conversion is specified per-path, so ideally the
FROM_CONTENTS bit should be flipped if and only if one or more of
the paths would need the conversion (i.e. does the helper function
would_convert_to_git() say "yes" to the path?).  I however suspect
that necessary information to do so (i.e. "which paths we are
looking at?") has not been generated yet at the point of the code
you quoted.  setup comes (and must come) very early, and then
q->queue[] is populated by different front-end functions that
compare trees, the index, and the working tree, depending on the
"git diff" option or "git diff-{tree,index,files}" plumbing command,
and you can ask "would one of these paths need conversion?" only
after q->queue[] is populated.  Hmm.....

Another thing is that ALLOW_TEXTCONV is not a right bit to check for
your example.  It is "do we use textconv filters to turn binary
files into a (phony) text representation before comparing?".  People
use the mechanism to throw JPEG photos in Git and have textconv
filter to extract only EXIF data, and "diff --textconv" would let us
textually compare only the EXIF data (which is the only human
readable part of the contents anyway).  

It might be a good idea to also flip FROM_CONTENTS bit under "diff
--textconv", and ...

> This solves the problem for me and my test case now passes.

... but I suspect that you were misled to think it fixes your issue,
only because "--textconv" is by default enabled for "git diff" and
"git log" (see "git diff --help").  I think you are saying that if
you always set FROM_CONTENTS bit on, you get what you want.  But
that is to be expected and it unfortunately penalizes everybody by
turning an obvious optimization permanently off.

Also "--textconv" is not on by default for the plumbing "git
diff-index" command and its friends, so it is likely that "git
diff-index HEAD" with your change will still not work as you expect.

A cheap (from code-change point of view) band-aid might be to flip
FROM_CONTENTS on if we know the repository has _some_ paths that
need eol conversion, even when the particular "diff" we are taking
does not involve any eol conversion (e.g. "is core.crlf set?").
While it may be better than "if we are porcelain (aka ALLOW_TEXTCONV
is set), unconditionally flip FROM_CONTENTS on", it is not ideal,
either.

This almost makes me suspect that the place that checks lengths of
one and two in order to refrain from running more expensive content
comparison you found earlier need to ask would_convert_to_git()
before taking the short-cut, something along the lines of this (for
illustration purposes only, not even compile-tested).  The "almost"
comes to me because I do not offhand know the performance implications
of making calls to would_convert_to_git() here.

 diff.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 051761be40..094d5913da 100644
--- a/diff.c
+++ b/diff.c
@@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	 *    differences.
 	 *
 	 * 2. At this point, the file is known to be modified,
-	 *    with the same mode and size, and the object
-	 *    name of one side is unknown.  Need to inspect
-	 *    the identical contents.
+	 *    with the same mode and size, the object
+	 *    name of one side is unknown, or size comparison
+	 *    cannot be depended upon.  Need to inspect the 
+	 *    contents.
 	 */
 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
 	    !DIFF_FILE_VALID(p->two) ||
@@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    (p->one->mode != p->two->mode) ||
 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
-	    (p->one->size != p->two->size) ||
+
+	    /* 
+	     * only if eol and other conversions are not involved,
+	     * we can say that two contents of different sizes
+	     * cannot be the same without checking their contents.
+	     */
+	    (!would_convert_to_git(p->one->path) &&
+	     !would_convert_to_git(p->two->path) &&
+	     (p->one->size != p->two->size)) ||
+
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
 	return p->skip_stat_unmatch_result;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-20 21:25       ` Junio C Hamano
@ 2017-02-25 15:32         ` Mike Crowe
  2017-02-27 20:17           ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Crowe @ 2017-02-25 15:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Monday 20 February 2017 at 13:25:01 -0800, Junio C Hamano wrote:
> This almost makes me suspect that the place that checks lengths of
> one and two in order to refrain from running more expensive content
> comparison you found earlier need to ask would_convert_to_git()
> before taking the short-cut, something along the lines of this (for
> illustration purposes only, not even compile-tested).  The "almost"
> comes to me because I do not offhand know the performance implications
> of making calls to would_convert_to_git() here.
> 
>  diff.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/diff.c b/diff.c
> index 051761be40..094d5913da 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>  	 *    differences.
>  	 *
>  	 * 2. At this point, the file is known to be modified,
> -	 *    with the same mode and size, and the object
> -	 *    name of one side is unknown.  Need to inspect
> -	 *    the identical contents.
> +	 *    with the same mode and size, the object
> +	 *    name of one side is unknown, or size comparison
> +	 *    cannot be depended upon.  Need to inspect the 
> +	 *    contents.
>  	 */
>  	if (!DIFF_FILE_VALID(p->one) || /* (1) */
>  	    !DIFF_FILE_VALID(p->two) ||
> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>  	    (p->one->mode != p->two->mode) ||
>  	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
>  	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> -	    (p->one->size != p->two->size) ||
> +
> +	    /* 
> +	     * only if eol and other conversions are not involved,
> +	     * we can say that two contents of different sizes
> +	     * cannot be the same without checking their contents.
> +	     */
> +	    (!would_convert_to_git(p->one->path) &&
> +	     !would_convert_to_git(p->two->path) &&
> +	     (p->one->size != p->two->size)) ||
> +
>  	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
>  		p->skip_stat_unmatch_result = 1;
>  	return p->skip_stat_unmatch_result;
> 
> 

Thanks for investigating this. I think you are correct that I was misguided
in my previous "fix". However, your change above does fix the problem for
me.

It looks like the main cost of convert_to_git is in convert_attrs which
ends up doing various path operations in attr.c. After that, both
apply_filter and crlf_to_git return straight away if there's nothing to do.

I experimented several times with running "git diff -quiet" after touching
every file in Git's own worktree and any difference in total time was lost
in the noise.

I've further improved my test case. Tests 3 and 4 fail without the above
change but pass with it. Unfortunately I'm still unable to get those tests
to fail without the above fix unless the sleeps are present. I've tried
using the "touch -r .datetime" technique from racy-git.txt but it doesn't
help. It seems that I'm unable to stop Git from using its cache without
sleeping. :(

diff --git a/t/t4063-diff-converted.sh b/t/t4063-diff-converted.sh
new file mode 100755
index 0000000..31a730d
--- /dev/null
+++ b/t/t4063-diff-converted.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+# These tests ensure that files changing line endings in the presence
+# of .gitattributes to indicate that line endings should be ignored
+# don't cause 'git diff' or 'git diff --quiet' to think that they have
+# been changed.
+#
+# The sleeps are necessary to reproduce the problem for reasons that I
+# don't understand.
+
+test_description='git diff with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	echo "* text=auto" > .gitattributes &&
+	printf "Hello\r\nWorld\r\n" > crlf.txt &&
+	printf "Hello\nWorld\n" > lf.txt &&
+	git add .gitattributes crlf.txt lf.txt &&
+	git commit -m "initial" && echo three
+'
+test_expect_success 'noisy diff works on file that requires CRLF conversion' '
+	git status >/dev/null &&
+	git diff >/dev/null &&
+	sleep 1 &&
+	touch crlf.txt lf.txt &&
+	git diff >/dev/null
+'
+test_expect_success 'quiet diff works on file that requires CRLF conversion with no changes' '
+	git status &&
+	git diff --quiet &&
+	sleep 1 &&
+	touch crlf.txt lf.txt &&
+	git diff --quiet
+'
+
+test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
+	printf "Hello\nWorld\n" > crlf.txt &&
+	printf "Hello\r\nWorld\r\n" > lf.txt &&
+	git diff --quiet
+'
+test_done


Mike.

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-25 15:32         ` Mike Crowe
@ 2017-02-27 20:17           ` Junio C Hamano
  2017-02-28 18:06             ` Torsten Bögershausen
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-02-27 20:17 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Mike Crowe

Torsten, you've been quite active in fixing various glitches around
the EOL conversion in the latter half of last year.  Have any
thoughts to share on this topic?

Thanks.

Mike Crowe <mac@mcrowe.com> writes:

> On Monday 20 February 2017 at 13:25:01 -0800, Junio C Hamano wrote:
>> This almost makes me suspect that the place that checks lengths of
>> one and two in order to refrain from running more expensive content
>> comparison you found earlier need to ask would_convert_to_git()
>> before taking the short-cut, something along the lines of this (for
>> illustration purposes only, not even compile-tested).  The "almost"
>> comes to me because I do not offhand know the performance implications
>> of making calls to would_convert_to_git() here.
>> 
>>  diff.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>> 
>> diff --git a/diff.c b/diff.c
>> index 051761be40..094d5913da 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>>  	 *    differences.
>>  	 *
>>  	 * 2. At this point, the file is known to be modified,
>> -	 *    with the same mode and size, and the object
>> -	 *    name of one side is unknown.  Need to inspect
>> -	 *    the identical contents.
>> +	 *    with the same mode and size, the object
>> +	 *    name of one side is unknown, or size comparison
>> +	 *    cannot be depended upon.  Need to inspect the 
>> +	 *    contents.
>>  	 */
>>  	if (!DIFF_FILE_VALID(p->one) || /* (1) */
>>  	    !DIFF_FILE_VALID(p->two) ||
>> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>>  	    (p->one->mode != p->two->mode) ||
>>  	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
>>  	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
>> -	    (p->one->size != p->two->size) ||
>> +
>> +	    /* 
>> +	     * only if eol and other conversions are not involved,
>> +	     * we can say that two contents of different sizes
>> +	     * cannot be the same without checking their contents.
>> +	     */
>> +	    (!would_convert_to_git(p->one->path) &&
>> +	     !would_convert_to_git(p->two->path) &&
>> +	     (p->one->size != p->two->size)) ||
>> +
>>  	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
>>  		p->skip_stat_unmatch_result = 1;
>>  	return p->skip_stat_unmatch_result;
>> 
>> 
>
> Thanks for investigating this. I think you are correct that I was misguided
> in my previous "fix". However, your change above does fix the problem for
> me.
>
> It looks like the main cost of convert_to_git is in convert_attrs which
> ends up doing various path operations in attr.c. After that, both
> apply_filter and crlf_to_git return straight away if there's nothing to do.
>
> I experimented several times with running "git diff -quiet" after touching
> every file in Git's own worktree and any difference in total time was lost
> in the noise.
>
> I've further improved my test case. Tests 3 and 4 fail without the above
> change but pass with it. Unfortunately I'm still unable to get those tests
> to fail without the above fix unless the sleeps are present. I've tried
> using the "touch -r .datetime" technique from racy-git.txt but it doesn't
> help. It seems that I'm unable to stop Git from using its cache without
> sleeping. :(
>
> diff --git a/t/t4063-diff-converted.sh b/t/t4063-diff-converted.sh
> new file mode 100755
> index 0000000..31a730d
> --- /dev/null
> +++ b/t/t4063-diff-converted.sh
> @@ -0,0 +1,44 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2017 Mike Crowe
> +#
> +# These tests ensure that files changing line endings in the presence
> +# of .gitattributes to indicate that line endings should be ignored
> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
> +# been changed.
> +#
> +# The sleeps are necessary to reproduce the problem for reasons that I
> +# don't understand.
> +
> +test_description='git diff with files that require CRLF conversion'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> +	echo "* text=auto" > .gitattributes &&
> +	printf "Hello\r\nWorld\r\n" > crlf.txt &&
> +	printf "Hello\nWorld\n" > lf.txt &&
> +	git add .gitattributes crlf.txt lf.txt &&
> +	git commit -m "initial" && echo three
> +'
> +test_expect_success 'noisy diff works on file that requires CRLF conversion' '
> +	git status >/dev/null &&
> +	git diff >/dev/null &&
> +	sleep 1 &&
> +	touch crlf.txt lf.txt &&
> +	git diff >/dev/null
> +'
> +test_expect_success 'quiet diff works on file that requires CRLF conversion with no changes' '
> +	git status &&
> +	git diff --quiet &&
> +	sleep 1 &&
> +	touch crlf.txt lf.txt &&
> +	git diff --quiet
> +'
> +
> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
> +	printf "Hello\nWorld\n" > crlf.txt &&
> +	printf "Hello\r\nWorld\r\n" > lf.txt &&
> +	git diff --quiet
> +'
> +test_done
>
>
> Mike.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-27 20:17           ` Junio C Hamano
@ 2017-02-28 18:06             ` Torsten Bögershausen
  2017-02-28 21:50               ` Junio C Hamano
  2017-03-02 15:38               ` git status reports file modified when only line-endings have changed (was git diff --quiet exits with 1 on clean tree with CRLF conversions) Mike Crowe
  0 siblings, 2 replies; 29+ messages in thread
From: Torsten Bögershausen @ 2017-02-28 18:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Mike Crowe

On 2017-02-27 21:17, Junio C Hamano wrote:

> Torsten, you've been quite active in fixing various glitches around
> the EOL conversion in the latter half of last year.  Have any
> thoughts to share on this topic?
> 
> Thanks.

Sorry for the delay, being too busy with other things.
I followed the discussion, but didn't have good things to contribute.
I am not an expert in diff.c, but there seems to be a bug, thanks everybody
for digging.



Back to business:

My understanding is that git diff --quiet should be quiet, when
git add will not do anything (but the file is "touched".
The touched means that Git will detect e.g a new mtime or inode
or file size when doing lstat().

mtime is tricky, inodes are problematic under Windows.
What is easy to change is the file length.
I don't thing that we need a test file with LF, nor do we need
the sleep, touch or anything.
Would the the following work ?
(This is copy-paste, so the TABs may be corrupted)


#!/bin/sh
#
# Copyright (c) 2017 Mike Crowe
#
# These tests ensure that files changing line endings in the presence
# of .gitattributes to indicate that line endings should be ignored
# don't cause 'git diff' or 'git diff --quiet' to think that they have
# been changed.

test_description='git diff with files that require CRLF conversion'

. ./test-lib.sh

test_expect_success setup '
	echo "* text=auto" > .gitattributes &&
	printf "Hello\r\nWorld\r\n" >crlf.txt &&
	git add .gitattributes crlf.txt lf.txt &&
	git commit -m "initial"
'

test_expect_success 'quiet diff works on file with line-ending change that has
no effect on repository' '
	printf "Hello\r\nWorld\n" >crlf.txt &&
	git status &&
	git diff --quiet
'

test_done






^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-28 18:06             ` Torsten Bögershausen
@ 2017-02-28 21:50               ` Junio C Hamano
  2017-03-01 17:04                 ` [PATCH v1 1/1] " tboegi
  2017-03-02 15:38               ` git status reports file modified when only line-endings have changed (was git diff --quiet exits with 1 on clean tree with CRLF conversions) Mike Crowe
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-02-28 21:50 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Mike Crowe

Torsten Bögershausen <tboegi@web.de> writes:

> On 2017-02-27 21:17, Junio C Hamano wrote:
>
>> Torsten, you've been quite active in fixing various glitches around
>> the EOL conversion in the latter half of last year.  Have any
>> thoughts to share on this topic?
>> 
>> Thanks.
>
> Sorry for the delay, being too busy with other things.
> I followed the discussion, but didn't have good things to contribute.
> I am not an expert in diff.c, but there seems to be a bug, thanks everybody
> for digging.
>
> Back to business:
>
> My understanding is that git diff --quiet should be quiet, when
> git add will not do anything.

Yes, I think that is a sensible criterion.  What I was interested to
hear from you the most was to double check if Mike's expectation is
reasonable.  Earlier we had a lengthy discussion on what to do when
convert-to-git and convert-to-working-tree conversions do not round
trip, and I was wondering if this was one of those cases.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-02-28 21:50               ` Junio C Hamano
@ 2017-03-01 17:04                 ` tboegi
  2017-03-01 21:14                   ` Junio C Hamano
  2017-03-01 21:25                   ` Mike Crowe
  0 siblings, 2 replies; 29+ messages in thread
From: tboegi @ 2017-03-01 17:04 UTC (permalink / raw)
  To: git, mac; +Cc: Junio C Hamano, Torsten Bögershausen

From: Junio C Hamano <gitster@pobox.com>

git diff --quiet may take a short-cut to see if a file is changed
in the working tree:
Whenever the file size differs from what is recorded in the index,
the file is assumed to be changed and git diff --quiet returns
exit with code 1

This shortcut must be suppressed whenever the line endings are converted
or a filter is in use.
The attributes say "* text=auto" and a file has
"Hello\nWorld\n" in the index with a length of 12.
The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
(Or even "Hello\r\nWorld\n").
In this case "git add" will not do any changes to the index, and
"git diff -quiet" should exit 0.

Add calls to would_convert_to_git() before blindly saying that a different
size means different content.

Reported-By: Mike Crowe <mac@mcrowe.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
This is what I can come up with, collecting all the loose ends.
I'm not sure if Mike wan't to have the Reported-By with a
Signed-off-by ?
The other question is, if the commit message summarizes the discussion
well enough ?

diff.c                    | 18 ++++++++++++++----
 t/t0028-diff-converted.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 4 deletions(-)
 create mode 100755 t/t0028-diff-converted.sh

diff --git a/diff.c b/diff.c
index 051761b..c264758 100644
--- a/diff.c
+++ b/diff.c
@@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	 *    differences.
 	 *
 	 * 2. At this point, the file is known to be modified,
-	 *    with the same mode and size, and the object
-	 *    name of one side is unknown.  Need to inspect
-	 *    the identical contents.
+	 *    with the same mode and size, the object
+	 *    name of one side is unknown, or size comparison
+	 *    cannot be depended upon.  Need to inspect the
+	 *    contents.
 	 */
 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
 	    !DIFF_FILE_VALID(p->two) ||
@@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    (p->one->mode != p->two->mode) ||
 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
-	    (p->one->size != p->two->size) ||
+
+	    /*
+	     * only if eol and other conversions are not involved,
+	     * we can say that two contents of different sizes
+	     * cannot be the same without checking their contents.
+	     */
+	    (!would_convert_to_git(p->one->path) &&
+	     !would_convert_to_git(p->two->path) &&
+	     (p->one->size != p->two->size)) ||
+
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
 	return p->skip_stat_unmatch_result;
diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
new file mode 100755
index 0000000..3d5ab95
--- /dev/null
+++ b/t/t0028-diff-converted.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+# These tests ensure that files changing line endings in the presence
+# of .gitattributes to indicate that line endings should be ignored
+# don't cause 'git diff' or 'git diff --quiet' to think that they have
+# been changed.
+
+test_description='git diff with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	echo "* text=auto" >.gitattributes &&
+	printf "Hello\r\nWorld\r\n" >crlf.txt &&
+	git add .gitattributes crlf.txt &&
+	git commit -m "initial"
+'
+
+test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
+	printf "Hello\r\nWorld\n" >crlf.txt &&
+	git status &&
+	git diff --quiet
+'
+
+test_done
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-01 17:04                 ` [PATCH v1 1/1] " tboegi
@ 2017-03-01 21:14                   ` Junio C Hamano
  2017-03-01 21:54                     ` Junio C Hamano
  2017-03-01 21:25                   ` Mike Crowe
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-03-01 21:14 UTC (permalink / raw)
  To: tboegi; +Cc: git, mac

tboegi@web.de writes:

> From: Junio C Hamano <gitster@pobox.com>
>
> git diff --quiet may take a short-cut to see if a file is changed
> in the working tree:
> Whenever the file size differs from what is recorded in the index,
> the file is assumed to be changed and git diff --quiet returns
> exit with code 1
>
> This shortcut must be suppressed whenever the line endings are converted
> or a filter is in use.
> The attributes say "* text=auto" and a file has
> "Hello\nWorld\n" in the index with a length of 12.
> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
> (Or even "Hello\r\nWorld\n").
> In this case "git add" will not do any changes to the index, and
> "git diff -quiet" should exit 0.

The thing I find the most disturbing is that at this point in the
flow, p->one->size and p->two->size are supposed to be the sizes of
the blob object, not the contents of the file on the working tree.
IOW, p->two->size being 14 in the above example sounds like pointing
at a different bug, if it is 14.  

The early return in diff_populate_filespec(), where it does

	s->size = xsize_t(st.st_size);
	...
	if (size_only)
		return 0;

way before it runs convert_to_git(), may be the real culprit.

I am wondering if the real fix would be to do this, instead of the
two extra would_convert_to_git() call there in the patch you sent.
The result seems to still pass the new test in your patch.

Thanks for helping.

 diff.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 8c78fce49d..dc51dceb44 100644
--- a/diff.c
+++ b/diff.c
@@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 			s->should_free = 1;
 			return 0;
 		}
-		if (size_only)
+
+		/*
+		 * Even if the caller would be happy with getting
+		 * only the size, we cannot return early at this
+		 * point if the path requires us to run the content
+		 * conversion.
+		 */
+		if (!would_convert_to_git(s->path) && size_only)
 			return 0;
+
+		/*
+		 * Note: this check uses xsize_t(st.st_size) that may
+		 * not be the true size of the blob after it goes
+		 * through convert_to_git().  This may not strictly be
+		 * correct, but the whole point of big_file_threashold
+		 * and is_binary check is that we want to avoid
+		 * opening the file and inspecting the contents, so
+		 * this is probably fine.
+		 */
 		if ((flags & CHECK_BINARY) &&
 		    s->size > big_file_threshold && s->is_binary == -1) {
 			s->is_binary = 1;

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-01 17:04                 ` [PATCH v1 1/1] " tboegi
  2017-03-01 21:14                   ` Junio C Hamano
@ 2017-03-01 21:25                   ` Mike Crowe
  2017-03-01 23:29                     ` Junio C Hamano
  2017-03-02 18:17                     ` Torsten Bögershausen
  1 sibling, 2 replies; 29+ messages in thread
From: Mike Crowe @ 2017-03-01 21:25 UTC (permalink / raw)
  To: tboegi; +Cc: git, Junio C Hamano

On Wednesday 01 March 2017 at 18:04:44 +0100, tboegi@web.de wrote:
> From: Junio C Hamano <gitster@pobox.com>
> 
> git diff --quiet may take a short-cut to see if a file is changed
> in the working tree:
> Whenever the file size differs from what is recorded in the index,
> the file is assumed to be changed and git diff --quiet returns
> exit with code 1
> 
> This shortcut must be suppressed whenever the line endings are converted
> or a filter is in use.
> The attributes say "* text=auto" and a file has
> "Hello\nWorld\n" in the index with a length of 12.
> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
> (Or even "Hello\r\nWorld\n").
> In this case "git add" will not do any changes to the index, and
> "git diff -quiet" should exit 0.
> 
> Add calls to would_convert_to_git() before blindly saying that a different
> size means different content.
> 
> Reported-By: Mike Crowe <mac@mcrowe.com>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---
> This is what I can come up with, collecting all the loose ends.
> I'm not sure if Mike wan't to have the Reported-By with a
> Signed-off-by ?
> The other question is, if the commit message summarizes the discussion
> well enough ?
> 
> diff.c                    | 18 ++++++++++++++----
>  t/t0028-diff-converted.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 4 deletions(-)
>  create mode 100755 t/t0028-diff-converted.sh
> 
> diff --git a/diff.c b/diff.c
> index 051761b..c264758 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>  	 *    differences.
>  	 *
>  	 * 2. At this point, the file is known to be modified,
> -	 *    with the same mode and size, and the object
> -	 *    name of one side is unknown.  Need to inspect
> -	 *    the identical contents.
> +	 *    with the same mode and size, the object
> +	 *    name of one side is unknown, or size comparison
> +	 *    cannot be depended upon.  Need to inspect the
> +	 *    contents.
>  	 */
>  	if (!DIFF_FILE_VALID(p->one) || /* (1) */
>  	    !DIFF_FILE_VALID(p->two) ||
> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>  	    (p->one->mode != p->two->mode) ||
>  	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
>  	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> -	    (p->one->size != p->two->size) ||
> +
> +	    /*
> +	     * only if eol and other conversions are not involved,
> +	     * we can say that two contents of different sizes
> +	     * cannot be the same without checking their contents.
> +	     */
> +	    (!would_convert_to_git(p->one->path) &&
> +	     !would_convert_to_git(p->two->path) &&
> +	     (p->one->size != p->two->size)) ||
> +
>  	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
>  		p->skip_stat_unmatch_result = 1;
>  	return p->skip_stat_unmatch_result;
> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
> new file mode 100755
> index 0000000..3d5ab95
> --- /dev/null
> +++ b/t/t0028-diff-converted.sh
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2017 Mike Crowe
> +#
> +# These tests ensure that files changing line endings in the presence
> +# of .gitattributes to indicate that line endings should be ignored
> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
> +# been changed.
> +
> +test_description='git diff with files that require CRLF conversion'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> +	echo "* text=auto" >.gitattributes &&
> +	printf "Hello\r\nWorld\r\n" >crlf.txt &&
> +	git add .gitattributes crlf.txt &&
> +	git commit -m "initial"
> +'
> +
> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
> +	printf "Hello\r\nWorld\n" >crlf.txt &&
> +	git status &&
> +	git diff --quiet
> +'
> +
> +test_done

Hi Torsten,

Thanks for investigating this.

I think that you've simplified the test to the point where it doesn't
entirely prove the fix. Although you test the case where the file has
changed size, you don't test the case where it hasn't.

Unfortunately that was the part of my test that could only reproduce the
problem with the sleeps. Maybe someone who understands how the cache works
fully could explain an alternative way to force the cache not to be used.

Also, I think I've found a behaviour change with this fix. Consider:

 echo "* text=auto" >.gitattributes
 printf "Hello\r\nWorld\r\n" >crlf.txt
 git add .gitattributes crlf.txt
 git commit -m "initial"

 printf "\r\n" >>crlf.txt

With the above patch, both "git diff" and "git diff --quiet" report that
there are no changes. Previously Git would report the extra newline
correctly.

Mike.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-01 21:14                   ` Junio C Hamano
@ 2017-03-01 21:54                     ` Junio C Hamano
  2017-03-02  8:53                       ` Jeff King
  2017-03-02 14:20                       ` [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
  0 siblings, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-03-01 21:54 UTC (permalink / raw)
  To: tboegi; +Cc: git, mac

Now I thought about it through a bit more thoroughly, I think this
is the right approach, so here is my (tenative) final version.

I seem to be getty really rusty---after all the codepaths involved
are practically all my code and I should have noticed the real
culprit during my first attempt X-<.

Thanks for helping.

-- >8 --
Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()

Callers of diff_populate_filespec() can choose to ask only for the
size of the blob without grabbing the blob data, and the function,
after running lstat() when the filespec points at a working tree
file, returns by copying the value in size field of the stat
structure into the size field of the filespec when this is the case.

However, this short-cut cannot be taken if the contents from the
path needs to go through convert_to_git(), whose resulting real blob
data may be different from what is in the working tree file.

As "git diff --quiet" compares the .size fields of filespec
structures to skip content comparison, this bug manifests as a
false "there are differences" for a file that needs eol conversion,
for example.

Reported-by: Mike Crowe <mac@mcrowe.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c                    | 19 ++++++++++++++++++-
 t/t0028-diff-converted.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100755 t/t0028-diff-converted.sh

diff --git a/diff.c b/diff.c
index 8c78fce49d..dc51dceb44 100644
--- a/diff.c
+++ b/diff.c
@@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 			s->should_free = 1;
 			return 0;
 		}
-		if (size_only)
+
+		/*
+		 * Even if the caller would be happy with getting
+		 * only the size, we cannot return early at this
+		 * point if the path requires us to run the content
+		 * conversion.
+		 */
+		if (!would_convert_to_git(s->path) && size_only)
 			return 0;
+
+		/*
+		 * Note: this check uses xsize_t(st.st_size) that may
+		 * not be the true size of the blob after it goes
+		 * through convert_to_git().  This may not strictly be
+		 * correct, but the whole point of big_file_threashold
+		 * and is_binary check being that we want to avoid
+		 * opening the file and inspecting the contents, this
+		 * is probably fine.
+		 */
 		if ((flags & CHECK_BINARY) &&
 		    s->size > big_file_threshold && s->is_binary == -1) {
 			s->is_binary = 1;
diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
new file mode 100755
index 0000000000..3d5ab9565b
--- /dev/null
+++ b/t/t0028-diff-converted.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+# These tests ensure that files changing line endings in the presence
+# of .gitattributes to indicate that line endings should be ignored
+# don't cause 'git diff' or 'git diff --quiet' to think that they have
+# been changed.
+
+test_description='git diff with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	echo "* text=auto" >.gitattributes &&
+	printf "Hello\r\nWorld\r\n" >crlf.txt &&
+	git add .gitattributes crlf.txt &&
+	git commit -m "initial"
+'
+
+test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
+	printf "Hello\r\nWorld\n" >crlf.txt &&
+	git status &&
+	git diff --quiet
+'
+
+test_done
-- 
2.12.0-319-gc5f21175ee


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-01 21:25                   ` Mike Crowe
@ 2017-03-01 23:29                     ` Junio C Hamano
  2017-03-02 18:17                     ` Torsten Bögershausen
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-03-01 23:29 UTC (permalink / raw)
  To: Mike Crowe; +Cc: tboegi, git

Mike Crowe <mac@mcrowe.com> writes:

> With the above patch, both "git diff" and "git diff --quiet" report that
> there are no changes. Previously Git would report the extra newline
> correctly.

I sent an updated one that (I think) fixes the real issue, which the
extra would_convert_to_git() calls added in the older iteration to
diff_filespec_check_stat_unmatch() were merely papering over.

It would be nice to see if it fixes the issue for you.

Thanks.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-01 21:54                     ` Junio C Hamano
@ 2017-03-02  8:53                       ` Jeff King
  2017-03-02 17:52                         ` Junio C Hamano
  2017-03-02 18:51                         ` [PATCH v2] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec() Junio C Hamano
  2017-03-02 14:20                       ` [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
  1 sibling, 2 replies; 29+ messages in thread
From: Jeff King @ 2017-03-02  8:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: tboegi, git, mac

On Wed, Mar 01, 2017 at 01:54:26PM -0800, Junio C Hamano wrote:

> -- >8 --
> Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()

Thanks, this is well-explained, and the new comments in the code really
help.

I wondered if we should be checking would_convert_to_git() in
reuse_worktree_file(), but we already do. It's just that we may still
end up in this code-path when we're _actually_ diffing the working tree
file, not just trying to optimize.

> diff --git a/diff.c b/diff.c
> index 8c78fce49d..dc51dceb44 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>  			s->should_free = 1;
>  			return 0;
>  		}
> -		if (size_only)
> +
> +		/*
> +		 * Even if the caller would be happy with getting
> +		 * only the size, we cannot return early at this
> +		 * point if the path requires us to run the content
> +		 * conversion.
> +		 */
> +		if (!would_convert_to_git(s->path) && size_only)
>  			return 0;

The would_convert_to_git() function is a little expensive (it may have
to do an attribute lookup). It may be worth swapping the two halves of
the conditional here to get the short-circuit.

It may not matter much in practice, though, because in the !size_only
case we'd make the same query lower a few lines later (and in theory
expensive bits of the attr lookup are cached).

> +
> +		/*
> +		 * Note: this check uses xsize_t(st.st_size) that may
> +		 * not be the true size of the blob after it goes
> +		 * through convert_to_git().  This may not strictly be
> +		 * correct, but the whole point of big_file_threashold

s/threashold/threshold/

> +		 * and is_binary check being that we want to avoid
> +		 * opening the file and inspecting the contents, this
> +		 * is probably fine.
> +		 */
>  		if ((flags & CHECK_BINARY) &&
>  		    s->size > big_file_threshold && s->is_binary == -1) {
>  			s->is_binary = 1;

I'm trying to think how this "not strictly correct" could bite us. For
line-ending conversion, I'd say that the before/after are going to be
approximately the same size. But what about something like LFS? If I
have a 600MB file that convert_to_git() filters into a short LFS
pointer, I think this changes the behavior. Before, we would diff the
pointer file, but now we'll get "binary file changed".

I wonder if we should take the opposite approach, and ignore
big_file_threshold for converted files. One assumes that such gigantic
files are binary, and therefore do not have line endings to convert. And
any filtering has a reasonable chance of condensing them to something
much smaller.

I dunno. I'm sure somebody has some horrific 500MB-filtering example
that can prove me wrong.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-01 21:54                     ` Junio C Hamano
  2017-03-02  8:53                       ` Jeff King
@ 2017-03-02 14:20                       ` Mike Crowe
  2017-03-02 18:20                         ` Torsten Bögershausen
  2017-03-02 18:33                         ` Junio C Hamano
  1 sibling, 2 replies; 29+ messages in thread
From: Mike Crowe @ 2017-03-02 14:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: tboegi, git, Jeff King

On Wednesday 01 March 2017 at 13:54:26 -0800, Junio C Hamano wrote:
> Now I thought about it through a bit more thoroughly, I think this
> is the right approach, so here is my (tenative) final version.
> 
> I seem to be getty really rusty---after all the codepaths involved
> are practically all my code and I should have noticed the real
> culprit during my first attempt X-<.
> 
> Thanks for helping.
> 
> -- >8 --
> Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()
> 
> Callers of diff_populate_filespec() can choose to ask only for the
> size of the blob without grabbing the blob data, and the function,
> after running lstat() when the filespec points at a working tree
> file, returns by copying the value in size field of the stat
> structure into the size field of the filespec when this is the case.
> 
> However, this short-cut cannot be taken if the contents from the
> path needs to go through convert_to_git(), whose resulting real blob
> data may be different from what is in the working tree file.
> 
> As "git diff --quiet" compares the .size fields of filespec
> structures to skip content comparison, this bug manifests as a
> false "there are differences" for a file that needs eol conversion,
> for example.
> 
> Reported-by: Mike Crowe <mac@mcrowe.com>
> Helped-by: Torsten Bögershausen <tboegi@web.de>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  diff.c                    | 19 ++++++++++++++++++-
>  t/t0028-diff-converted.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100755 t/t0028-diff-converted.sh
> 
> diff --git a/diff.c b/diff.c
> index 8c78fce49d..dc51dceb44 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>  			s->should_free = 1;
>  			return 0;
>  		}
> -		if (size_only)
> +
> +		/*
> +		 * Even if the caller would be happy with getting
> +		 * only the size, we cannot return early at this
> +		 * point if the path requires us to run the content
> +		 * conversion.
> +		 */
> +		if (!would_convert_to_git(s->path) && size_only)
>  			return 0;
> +
> +		/*
> +		 * Note: this check uses xsize_t(st.st_size) that may
> +		 * not be the true size of the blob after it goes
> +		 * through convert_to_git().  This may not strictly be
> +		 * correct, but the whole point of big_file_threashold
> +		 * and is_binary check being that we want to avoid
> +		 * opening the file and inspecting the contents, this
> +		 * is probably fine.
> +		 */
>  		if ((flags & CHECK_BINARY) &&
>  		    s->size > big_file_threshold && s->is_binary == -1) {
>  			s->is_binary = 1;

This patch solves the problem for me. Including my tests where the file
size doesn't change but the file has been touched. It also doesn't have the
side effect of failing to report the extra trailing newline that the
original fix suffered from.

All the solutions presented so far do cause a small change in behaviour
when using git diff --quiet: they may now cause warning messages like:

 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

to be emitted (unless of course core.safecrlf=false.) I think this is an
unavoidable side-effect of doing the job properly but it might be worth
mentioning.

> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
> new file mode 100755
> index 0000000000..3d5ab9565b
> --- /dev/null
> +++ b/t/t0028-diff-converted.sh
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2017 Mike Crowe
> +#
> +# These tests ensure that files changing line endings in the presence
> +# of .gitattributes to indicate that line endings should be ignored
> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
> +# been changed.
> +
> +test_description='git diff with files that require CRLF conversion'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> +	echo "* text=auto" >.gitattributes &&
> +	printf "Hello\r\nWorld\r\n" >crlf.txt &&
> +	git add .gitattributes crlf.txt &&
> +	git commit -m "initial"
> +'
> +
> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
> +	printf "Hello\r\nWorld\n" >crlf.txt &&
> +	git status &&
> +	git diff --quiet
> +'
> +
> +test_done

As I said before, this doesn't actually test the case when the file sizes
match. However, given the way that the code has changed the actual file
sizes are not compared, so perhaps this doesn't matter.

Thanks for all your help investigating this.

Mike.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* git status reports file modified when only line-endings have changed (was git diff --quiet exits with 1 on clean tree with CRLF conversions)
  2017-02-28 18:06             ` Torsten Bögershausen
  2017-02-28 21:50               ` Junio C Hamano
@ 2017-03-02 15:38               ` Mike Crowe
  1 sibling, 0 replies; 29+ messages in thread
From: Mike Crowe @ 2017-03-02 15:38 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Junio C Hamano, git

On Tuesday 28 February 2017 at 19:06:44 +0100, Torsten Bögershausen wrote:
> My understanding is that git diff --quiet should be quiet, when
> git add will not do anything (but the file is "touched".
> The touched means that Git will detect e.g a new mtime or inode
> or file size when doing lstat().

Does the same apply to "git status"?

If so, then whilst investigating the "git diff --quiet" problems in this
thread I've found a similar bug with "git status". It reports the file has
modifications even if only the line-endings have changed, and issuing "git
add" causes the perceived modification to disappear.

It can be very confusing for users if "git status" reports a modification
but for "git diff" to claim that the files are identical.

This bug is still reproducible even with the fix from
https://public-inbox.org/git/xmqqshmyhtnu.fsf@gitster.mtv.corp.google.com/T/#m67cbfad1f2efe721f0c2afac2a1523b743bb57ca

Here's the test case. Test 3 is the part that currently fails:

commit de5f3f1d9161cdd46342689abe38a046fc71850e
Author: Mike Crowe <mac@mcrowe.com>
Date:   Sat Feb 25 09:28:37 2017 +0000

    status: Add tests for status output when file line endings change

diff --git a/t/t7518-status-eol-change.sh b/t/t7518-status-eol-change.sh
new file mode 100755
index 0000000..e18186f
--- /dev/null
+++ b/t/t7518-status-eol-change.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+
+test_description='git status with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+cat >expected_no_change <<EOF
+On branch master
+nothing to commit, working tree clean
+EOF
+
+test_expect_success setup '
+	echo "* text=auto" > .gitattributes &&
+	printf "Hello\r\nWorld\r\n" > crlf.txt &&
+	printf "expected_no_change\nactual\n" > .gitignore &&
+	git add .gitignore .gitattributes crlf.txt &&
+	git commit -m "initial"
+'
+test_expect_success 'git status reports no change if file regenerated' '
+	printf "Hello\r\nWorld\r\n" > crlf.txt &&
+	git status >actual &&
+	test_cmp expected_no_change actual
+'
+test_expect_success 'git status reports no change if line endings change' '
+	printf "Hello\nWorld\n" > crlf.txt &&
+	git status >actual &&
+	test_cmp expected_no_change actual
+'
+test_expect_success 'git status reports no change if line ending change is staged' '
+	git add crlf.txt &&
+	git status >actual &&
+	test_cmp expected_no_change actual
+'
+test_done

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-02  8:53                       ` Jeff King
@ 2017-03-02 17:52                         ` Junio C Hamano
  2017-03-02 19:12                           ` Jeff King
  2017-03-02 18:51                         ` [PATCH v2] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec() Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-03-02 17:52 UTC (permalink / raw)
  To: Jeff King; +Cc: tboegi, git, mac

Jeff King <peff@peff.net> writes:

>> diff --git a/diff.c b/diff.c
>> index 8c78fce49d..dc51dceb44 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>>  			s->should_free = 1;
>>  			return 0;
>>  		}
>> -		if (size_only)
>> +
>> +		/*
>> +		 * Even if the caller would be happy with getting
>> +		 * only the size, we cannot return early at this
>> +		 * point if the path requires us to run the content
>> +		 * conversion.
>> +		 */
>> +		if (!would_convert_to_git(s->path) && size_only)
>>  			return 0;
>
> The would_convert_to_git() function is a little expensive (it may have
> to do an attribute lookup). It may be worth swapping the two halves of
> the conditional here to get the short-circuit.

Yes.  I think it makes sense.

>> +
>> +		/*
>> +		 * Note: this check uses xsize_t(st.st_size) that may
>> +		 * not be the true size of the blob after it goes
>> +		 * through convert_to_git().  This may not strictly be
>> +		 * correct, but the whole point of big_file_threashold
>
> s/threashold/threshold/

Thanks.  I felt there was something wrong and looked at the line
three times but somehow failed to spot exactly what was wrong ;-)

>
>> +		 * and is_binary check being that we want to avoid
>> +		 * opening the file and inspecting the contents, this
>> +		 * is probably fine.
>> +		 */
>>  		if ((flags & CHECK_BINARY) &&
>>  		    s->size > big_file_threshold && s->is_binary == -1) {
>>  			s->is_binary = 1;
>
> I'm trying to think how this "not strictly correct" could bite us. 

Note that the comment is just documenting what I learned and thought
while working on an unrelated thing that happened to be sitting next
to it.

Nobody asks "I am OK without the contents i.e. size-only" and "Can
you see if this is binary?" at the same time (and if a caller did,
it would never have got is_binary with the original code).  s->size
is still a copy of st.size at this point of the code (we have not
actually updated it to the size of the real blob, which happens a
bit later in the flow of this codepath where we actually slurp the
thing in and run the conversion).  So with or without this patch,
there shouldn't be any change in the behaviour wrt CHECK_BINARY.

> For
> line-ending conversion, I'd say that the before/after are going to be
> approximately the same size. But what about something like LFS? If I
> have a 600MB file that convert_to_git() filters into a short LFS
> pointer, I think this changes the behavior. Before, we would diff the
> pointer file, but now we'll get "binary file changed".

To be quite honest, I do not think this code should cater to LFS or
any other conversion hack.  They all install their own diff driver
and they can tell diff_filespec_is_binary() if the thing is binary
or not without falling back to this heuristics codepath.

What is done here *is* inconsistent with what is done on the other
side of if/else, that compares big_file_threshold with in-git size,
not in-working-tree size.  That may want to be addressed, but I am
not sure if it is worth it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-01 21:25                   ` Mike Crowe
  2017-03-01 23:29                     ` Junio C Hamano
@ 2017-03-02 18:17                     ` Torsten Bögershausen
  2017-03-03 17:01                       ` Mike Crowe
  1 sibling, 1 reply; 29+ messages in thread
From: Torsten Bögershausen @ 2017-03-02 18:17 UTC (permalink / raw)
  To: Mike Crowe; +Cc: git, Junio C Hamano

On 2017-03-01 22:25, Mike Crowe wrote:
> On Wednesday 01 March 2017 at 18:04:44 +0100, tboegi@web.de wrote:
>> From: Junio C Hamano <gitster@pobox.com>
>>
>> git diff --quiet may take a short-cut to see if a file is changed
>> in the working tree:
>> Whenever the file size differs from what is recorded in the index,
>> the file is assumed to be changed and git diff --quiet returns
>> exit with code 1
>>
>> This shortcut must be suppressed whenever the line endings are converted
>> or a filter is in use.
>> The attributes say "* text=auto" and a file has
>> "Hello\nWorld\n" in the index with a length of 12.
>> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
>> (Or even "Hello\r\nWorld\n").
>> In this case "git add" will not do any changes to the index, and
>> "git diff -quiet" should exit 0.
>>
>> Add calls to would_convert_to_git() before blindly saying that a different
>> size means different content.
>>
>> Reported-By: Mike Crowe <mac@mcrowe.com>
>> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
>> ---
>> This is what I can come up with, collecting all the loose ends.
>> I'm not sure if Mike wan't to have the Reported-By with a
>> Signed-off-by ?
>> The other question is, if the commit message summarizes the discussion
>> well enough ?
>>
>> diff.c                    | 18 ++++++++++++++----
>>  t/t0028-diff-converted.sh | 27 +++++++++++++++++++++++++++
>>  2 files changed, 41 insertions(+), 4 deletions(-)
>>  create mode 100755 t/t0028-diff-converted.sh
>>
>> diff --git a/diff.c b/diff.c
>> index 051761b..c264758 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>>  	 *    differences.
>>  	 *
>>  	 * 2. At this point, the file is known to be modified,
>> -	 *    with the same mode and size, and the object
>> -	 *    name of one side is unknown.  Need to inspect
>> -	 *    the identical contents.
>> +	 *    with the same mode and size, the object
>> +	 *    name of one side is unknown, or size comparison
>> +	 *    cannot be depended upon.  Need to inspect the
>> +	 *    contents.
>>  	 */
>>  	if (!DIFF_FILE_VALID(p->one) || /* (1) */
>>  	    !DIFF_FILE_VALID(p->two) ||
>> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
>>  	    (p->one->mode != p->two->mode) ||
>>  	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
>>  	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
>> -	    (p->one->size != p->two->size) ||
>> +
>> +	    /*
>> +	     * only if eol and other conversions are not involved,
>> +	     * we can say that two contents of different sizes
>> +	     * cannot be the same without checking their contents.
>> +	     */
>> +	    (!would_convert_to_git(p->one->path) &&
>> +	     !would_convert_to_git(p->two->path) &&
>> +	     (p->one->size != p->two->size)) ||
>> +
>>  	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
>>  		p->skip_stat_unmatch_result = 1;
>>  	return p->skip_stat_unmatch_result;
>> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
>> new file mode 100755
>> index 0000000..3d5ab95
>> --- /dev/null
>> +++ b/t/t0028-diff-converted.sh
>> @@ -0,0 +1,27 @@
>> +#!/bin/sh
>> +#
>> +# Copyright (c) 2017 Mike Crowe
>> +#
>> +# These tests ensure that files changing line endings in the presence
>> +# of .gitattributes to indicate that line endings should be ignored
>> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
>> +# been changed.
>> +
>> +test_description='git diff with files that require CRLF conversion'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success setup '
>> +	echo "* text=auto" >.gitattributes &&
>> +	printf "Hello\r\nWorld\r\n" >crlf.txt &&
>> +	git add .gitattributes crlf.txt &&
>> +	git commit -m "initial"
>> +'
>> +
>> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
>> +	printf "Hello\r\nWorld\n" >crlf.txt &&
>> +	git status &&
>> +	git diff --quiet
>> +'
>> +
>> +test_done
> 
> Hi Torsten,
> 
> Thanks for investigating this.
> 
> I think that you've simplified the test to the point where it doesn't
> entirely prove the fix. Although you test the case where the file has
> changed size, you don't test the case where it hasn't.
> 
> Unfortunately that was the part of my test that could only reproduce the
> problem with the sleeps. Maybe someone who understands how the cache works
> fully could explain an alternative way to force the cache not to be used.
> 
> Also, I think I've found a behaviour change with this fix. Consider:
> 
>  echo "* text=auto" >.gitattributes
>  printf "Hello\r\nWorld\r\n" >crlf.txt
That should give
"Hello\nWorld\n" in the index:

git add .gitattributes crlf.txt
warning: CRLF will be replaced by LF in ttt/crlf.txt.
The file will have its original line endings in your working directory.
tb@mac:/tmp/ttt> git commit -m "initial"
[master (root-commit) 354f657] initial
 2 files changed, 3 insertions(+)
 create mode 100644 ttt/.gitattributes
 create mode 100644 ttt/crlf.txt
tb@mac:/tmp/ttt> git ls-files --eol
i/lf    w/lf    attr/text=auto          .gitattributes
i/lf    w/crlf  attr/text=auto          crlf.txt
tb@mac:/tmp/ttt>

>  git add .gitattributes crlf.txt
>  git commit -m "initial"
> 
>  printf "\r\n" >>crlf.txt
> 
> With the above patch, both "git diff" and "git diff --quiet" report that
> there are no changes. Previously Git would report the extra newline
> correctly.
Wait a second.
Which extra newline "correctly" ?

The "git diff" command is about the changes which will be done to the index.
Regardless if you have any of these in the working tree on disk:

"Hello\nWorld\n"
"Hello\nWorld\r\n"
"Hello\r\nWorld\n"
"Hello\r\nWorld\r\n"

"git status" and "git diff --quiet"
should not report any changes.

So I don't know if there is a mis-understanding about "git diff" on your side,
or if I miss something.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-02 14:20                       ` [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
@ 2017-03-02 18:20                         ` Torsten Bögershausen
  2017-03-02 18:33                         ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Torsten Bögershausen @ 2017-03-02 18:20 UTC (permalink / raw)
  To: Mike Crowe, Junio C Hamano; +Cc: git, Jeff King

On 2017-03-02 15:20, Mike Crowe wrote:
> ll the solutions presented so far do cause a small change in behaviour
> when using git diff --quiet: they may now cause warning messages like:
> 
>  warning: CRLF will be replaced by LF in crlf.txt.
>  The file will have its original line endings in your working directory.
Ah,
that is not ideal.
I can have a look at it later (or due to the weekend)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-02 14:20                       ` [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
  2017-03-02 18:20                         ` Torsten Bögershausen
@ 2017-03-02 18:33                         ` Junio C Hamano
  2017-03-02 20:03                           ` Mike Crowe
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-03-02 18:33 UTC (permalink / raw)
  To: Mike Crowe; +Cc: tboegi, git, Jeff King

Mike Crowe <mac@mcrowe.com> writes:

> All the solutions presented so far do cause a small change in behaviour
> when using git diff --quiet: they may now cause warning messages like:
>
>  warning: CRLF will be replaced by LF in crlf.txt.
>  The file will have its original line endings in your working directory.

That is actually a good thing, I think.  As the test modifies a file
that originally has "Hello\r\nWorld\r\n" in it to this:

>> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
>> +	printf "Hello\r\nWorld\n" >crlf.txt &&

If you did "git add" at this point, you would get the same warning,
because the lack of CR on the second line could well be a mistake
you may want to notice and fix before going forward.  Otherwise
you'd be losing information that _might_ matter to you (i.e. the
fact that the first line had CRLF while the second had LF) and it is
the whole point of safe_crlf setting.

I also think it is a good thing if "git status" reported this path
as modified for the same reason (I didn't actually check if that is
the case).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()
  2017-03-02  8:53                       ` Jeff King
  2017-03-02 17:52                         ` Junio C Hamano
@ 2017-03-02 18:51                         ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-03-02 18:51 UTC (permalink / raw)
  To: git; +Cc: tboegi, Jeff King, mac

Callers of diff_populate_filespec() can choose to ask only for the
size of the blob without grabbing the blob data, and the function,
after running lstat() when the filespec points at a working tree
file, returns by copying the value in size field of the stat
structure into the size field of the filespec when this is the case.

However, this short-cut cannot be taken if the contents from the
path needs to go through convert_to_git(), whose resulting real blob
data may be different from what is in the working tree file.

As "git diff --quiet" compares the .size fields of filespec
structures to skip content comparison, this bug manifests as a
false "there are differences" for a file that needs eol conversion,
for example.

Reported-by: Mike Crowe <mac@mcrowe.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * With "test size_only to avoid more expensive would_convert call"
   fix applied.  Also the new test is now in t4xxx that it belongs
   to.

 diff.c                | 19 ++++++++++++++++++-
 t/t4035-diff-quiet.sh |  9 +++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 059123c5dc..37e60ca601 100644
--- a/diff.c
+++ b/diff.c
@@ -2783,8 +2783,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 			s->should_free = 1;
 			return 0;
 		}
-		if (size_only)
+
+		/*
+		 * Even if the caller would be happy with getting
+		 * only the size, we cannot return early at this
+		 * point if the path requires us to run the content
+		 * conversion.
+		 */
+		if (size_only && !would_convert_to_git(s->path))
 			return 0;
+
+		/*
+		 * Note: this check uses xsize_t(st.st_size) that may
+		 * not be the true size of the blob after it goes
+		 * through convert_to_git().  This may not strictly be
+		 * correct, but the whole point of big_file_threshold
+		 * and is_binary check being that we want to avoid
+		 * opening the file and inspecting the contents, this
+		 * is probably fine.
+		 */
 		if ((flags & CHECK_BINARY) &&
 		    s->size > big_file_threshold && s->is_binary == -1) {
 			s->is_binary = 1;
diff --git a/t/t4035-diff-quiet.sh b/t/t4035-diff-quiet.sh
index 461f4bb583..2f1737fcef 100755
--- a/t/t4035-diff-quiet.sh
+++ b/t/t4035-diff-quiet.sh
@@ -152,4 +152,13 @@ test_expect_success 'git diff --quiet ignores stat-change only entries' '
 	test_expect_code 1 git diff --quiet
 '
 
+test_expect_success 'git diff --quiet on a path that need conversion' '
+	echo "crlf.txt text=auto" >.gitattributes &&
+	printf "Hello\r\nWorld\r\n" >crlf.txt &&
+	git add .gitattributes crlf.txt &&
+
+	printf "Hello\r\nWorld\n" >crlf.txt &&
+	git diff --quiet crlf.txt
+'
+
 test_done
-- 
2.12.0-352-gb05ccab5eb

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-02 17:52                         ` Junio C Hamano
@ 2017-03-02 19:12                           ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2017-03-02 19:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: tboegi, git, mac

On Thu, Mar 02, 2017 at 09:52:21AM -0800, Junio C Hamano wrote:

> >> +		 * and is_binary check being that we want to avoid
> >> +		 * opening the file and inspecting the contents, this
> >> +		 * is probably fine.
> >> +		 */
> >>  		if ((flags & CHECK_BINARY) &&
> >>  		    s->size > big_file_threshold && s->is_binary == -1) {
> >>  			s->is_binary = 1;
> >
> > I'm trying to think how this "not strictly correct" could bite us. 
> 
> Note that the comment is just documenting what I learned and thought
> while working on an unrelated thing that happened to be sitting next
> to it.

Yeah, sorry, this is obviously not a blocker to your patch. I'm just
wondering if there is more work needed.

> To be quite honest, I do not think this code should cater to LFS or
> any other conversion hack.  They all install their own diff driver
> and they can tell diff_filespec_is_binary() if the thing is binary
> or not without falling back to this heuristics codepath.

Yeah, you're right, I was just being silly. Whatever configured the
filter already has an opportunity to give us this knowledge in a better
way, and we should rely on that.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-02 18:33                         ` Junio C Hamano
@ 2017-03-02 20:03                           ` Mike Crowe
  2017-03-03 17:02                             ` Torsten Bögershausen
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Crowe @ 2017-03-02 20:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: tboegi, git, Jeff King

On Thursday 02 March 2017 at 10:33:59 -0800, Junio C Hamano wrote:
> Mike Crowe <mac@mcrowe.com> writes:
> 
> > All the solutions presented so far do cause a small change in behaviour
> > when using git diff --quiet: they may now cause warning messages like:
> >
> >  warning: CRLF will be replaced by LF in crlf.txt.
> >  The file will have its original line endings in your working directory.
> 
> That is actually a good thing, I think.  As the test modifies a file
> that originally has "Hello\r\nWorld\r\n" in it to this:
> 
> >> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
> >> +	printf "Hello\r\nWorld\n" >crlf.txt &&
> 
> If you did "git add" at this point, you would get the same warning,
> because the lack of CR on the second line could well be a mistake
> you may want to notice and fix before going forward.  Otherwise
> you'd be losing information that _might_ matter to you (i.e. the
> fact that the first line had CRLF while the second had LF) and it is
> the whole point of safe_crlf setting.

Well, there is an argument that it's not very "--quiet" to emit this
message. Especially when it didn't used to come out. However, I can
understand that the message is useful if the line endings have changed
despite this.

However, I can make the message appear from "git diff --quiet" even if the
line endings have not changed. I merely need to touch a file where the line
endings do not match the canonical representation in the repository first.
Upon subsequent invocations of "git diff --quiet" the message does not come
out. (Note that in this may not be reproducible in a script without
sleeps.)

Perhaps this interactive log will make things clearer:

 $ git init
 Initialized empty Git repository in /tmp/test/.git/
 $ echo "* text=auto" >.gitattributes
 $ printf "Hello\r\nWorld\r\n" >crlf.txt
 $ git add .gitattributes crlf.txt
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

The message was expected and useful there.

 $ git commit -m "initial"
 [master (root-commit) c3fb5a5] initial
 2 files changed, 3 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 crlf.txt
 $ git diff --quiet
 $ touch crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

I didn't change the file - I just touched it. Why did the message come out here?

 $ git diff --quiet

But then it didn't here. Which is correct?

 $ printf "Hello\r\nWorld\n" >crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

If the line endings have genuinely changed then the message comes out every
time...

 $ git add crlf.txt
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

...until the file is added to the index. That's probably the right thing to
do.

 $ git diff --quiet
 $ touch crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet

But once the file has been added the previous behaviour of only emitting
the message on the first time after the touch occurs.

 $ printf "Hello\r\nWorld\n" >crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet
 $ printf "Hello\r\nWorld\r\n" >crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

Hopefully that makes things a bit clearer.

Mike.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-02 18:17                     ` Torsten Bögershausen
@ 2017-03-03 17:01                       ` Mike Crowe
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Crowe @ 2017-03-03 17:01 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Hi Torsten,

Your patch has been superseded, but I thought I ought to answer your
questions rather than leave them hanging.

On Thursday 02 March 2017 at 19:17:00 +0100, Torsten Bögershausen wrote:
> On 2017-03-01 22:25, Mike Crowe wrote:
> > On Wednesday 01 March 2017 at 18:04:44 +0100, tboegi@web.de wrote:
> >> From: Junio C Hamano <gitster@pobox.com>
> >>
> >> git diff --quiet may take a short-cut to see if a file is changed
> >> in the working tree:
> >> Whenever the file size differs from what is recorded in the index,
> >> the file is assumed to be changed and git diff --quiet returns
> >> exit with code 1
> >>
> >> This shortcut must be suppressed whenever the line endings are converted
> >> or a filter is in use.
> >> The attributes say "* text=auto" and a file has
> >> "Hello\nWorld\n" in the index with a length of 12.
> >> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
> >> (Or even "Hello\r\nWorld\n").
> >> In this case "git add" will not do any changes to the index, and
> >> "git diff -quiet" should exit 0.
> >>
> >> Add calls to would_convert_to_git() before blindly saying that a different
> >> size means different content.
> >>
> >> Reported-By: Mike Crowe <mac@mcrowe.com>
> >> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> >> ---
> >> This is what I can come up with, collecting all the loose ends.
> >> I'm not sure if Mike wan't to have the Reported-By with a
> >> Signed-off-by ?
> >> The other question is, if the commit message summarizes the discussion
> >> well enough ?
> >>
> >> diff.c                    | 18 ++++++++++++++----
> >>  t/t0028-diff-converted.sh | 27 +++++++++++++++++++++++++++
> >>  2 files changed, 41 insertions(+), 4 deletions(-)
> >>  create mode 100755 t/t0028-diff-converted.sh
> >>
> >> diff --git a/diff.c b/diff.c
> >> index 051761b..c264758 100644
> >> --- a/diff.c
> >> +++ b/diff.c
> >> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
> >>  	 *    differences.
> >>  	 *
> >>  	 * 2. At this point, the file is known to be modified,
> >> -	 *    with the same mode and size, and the object
> >> -	 *    name of one side is unknown.  Need to inspect
> >> -	 *    the identical contents.
> >> +	 *    with the same mode and size, the object
> >> +	 *    name of one side is unknown, or size comparison
> >> +	 *    cannot be depended upon.  Need to inspect the
> >> +	 *    contents.
> >>  	 */
> >>  	if (!DIFF_FILE_VALID(p->one) || /* (1) */
> >>  	    !DIFF_FILE_VALID(p->two) ||
> >> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
> >>  	    (p->one->mode != p->two->mode) ||
> >>  	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
> >>  	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> >> -	    (p->one->size != p->two->size) ||
> >> +
> >> +	    /*
> >> +	     * only if eol and other conversions are not involved,
> >> +	     * we can say that two contents of different sizes
> >> +	     * cannot be the same without checking their contents.
> >> +	     */
> >> +	    (!would_convert_to_git(p->one->path) &&
> >> +	     !would_convert_to_git(p->two->path) &&
> >> +	     (p->one->size != p->two->size)) ||
> >> +
> >>  	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
> >>  		p->skip_stat_unmatch_result = 1;
> >>  	return p->skip_stat_unmatch_result;
> >> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
> >> new file mode 100755
> >> index 0000000..3d5ab95
> >> --- /dev/null
> >> +++ b/t/t0028-diff-converted.sh
> >> @@ -0,0 +1,27 @@
> >> +#!/bin/sh
> >> +#
> >> +# Copyright (c) 2017 Mike Crowe
> >> +#
> >> +# These tests ensure that files changing line endings in the presence
> >> +# of .gitattributes to indicate that line endings should be ignored
> >> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
> >> +# been changed.
> >> +
> >> +test_description='git diff with files that require CRLF conversion'
> >> +
> >> +. ./test-lib.sh
> >> +
> >> +test_expect_success setup '
> >> +	echo "* text=auto" >.gitattributes &&
> >> +	printf "Hello\r\nWorld\r\n" >crlf.txt &&
> >> +	git add .gitattributes crlf.txt &&
> >> +	git commit -m "initial"
> >> +'
> >> +
> >> +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
> >> +	printf "Hello\r\nWorld\n" >crlf.txt &&
> >> +	git status &&
> >> +	git diff --quiet
> >> +'
> >> +
> >> +test_done
> >

[snip]

> > Also, I think I've found a behaviour change with this fix. Consider:
> > 
> >  echo "* text=auto" >.gitattributes
> >  printf "Hello\r\nWorld\r\n" >crlf.txt
> That should give
> "Hello\nWorld\n" in the index:
> 
> git add .gitattributes crlf.txt
> warning: CRLF will be replaced by LF in ttt/crlf.txt.
> The file will have its original line endings in your working directory.
> tb@mac:/tmp/ttt> git commit -m "initial"
> [master (root-commit) 354f657] initial
>  2 files changed, 3 insertions(+)
>  create mode 100644 ttt/.gitattributes
>  create mode 100644 ttt/crlf.txt
> tb@mac:/tmp/ttt> git ls-files --eol
> i/lf    w/lf    attr/text=auto          .gitattributes
> i/lf    w/crlf  attr/text=auto          crlf.txt
> tb@mac:/tmp/ttt>
> 
> >  git add .gitattributes crlf.txt
> >  git commit -m "initial"
> > 
> >  printf "\r\n" >>crlf.txt
> > 
> > With the above patch, both "git diff" and "git diff --quiet" report that
> > there are no changes. Previously Git would report the extra newline
> > correctly.
> Wait a second.
> Which extra newline "correctly" ?

The extra newline I appended with the printf "\r\n" >> crlf.txt

> The "git diff" command is about the changes which will be done to the index.
> Regardless if you have any of these in the working tree on disk:
> 
> "Hello\nWorld\n"
> "Hello\nWorld\r\n"
> "Hello\r\nWorld\n"
> "Hello\r\nWorld\r\n"
> 
> "git status" and "git diff --quiet"
> should not report any changes.

But I didn't have any of those. I ended up with:

 "Hello\nWorld\n"

in the index, and

 "Hello\r\nWorld\r\n\r\n"

in the working tree, but the extra newline was not reported by git diff.

> So I don't know if there is a mis-understanding about "git diff" on your side,
> or if I miss something.

I don't think it matters any more since Junio's patch didn't suffer from
this problem.

Thanks.

Mike.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-02 20:03                           ` Mike Crowe
@ 2017-03-03 17:02                             ` Torsten Bögershausen
  2017-03-03 17:47                               ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Torsten Bögershausen @ 2017-03-03 17:02 UTC (permalink / raw)
  To: Mike Crowe, Junio C Hamano; +Cc: git, Jeff King

Understood, thanks for the explanation.

quiet is not quite any more..

Does the following fix help ?

--- a/diff.c
+++ b/diff.c
@@ -2826,6 +2826,8 @@ int diff_populate_filespec(struct diff_filespec *s,
unsigned int flags)
        enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL
                                    ? SAFE_CRLF_WARN
                                    : safe_crlf);
+       if (size_only)
+               crlf_warn = SAFE_CRLF_FALSE;



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-03 17:02                             ` Torsten Bögershausen
@ 2017-03-03 17:47                               ` Junio C Hamano
  2017-03-04  6:25                                 ` Torsten Bögershausen
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2017-03-03 17:47 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Mike Crowe, git, Jeff King

Torsten Bögershausen <tboegi@web.de> writes:

> Understood, thanks for the explanation.
>
> quiet is not quite any more..
>
> Does the following fix help ?
>
> --- a/diff.c
> +++ b/diff.c
> @@ -2826,6 +2826,8 @@ int diff_populate_filespec(struct diff_filespec *s,
> unsigned int flags)
>         enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL
>                                     ? SAFE_CRLF_WARN
>                                     : safe_crlf);
> +       if (size_only)
> +               crlf_warn = SAFE_CRLF_FALSE;

If you were to go this route, it may be sufficient to change its
initialization from WARN to FALSE _unconditionally_, because this
function uses the convert_to_git() only to _show_ the differences by
computing canonical form out of working tree contents, and the
conversion is not done to _write_ into object database to create a
new object.

Having size_only here is not a sign of getting --quiet passed from
the command line, by the way.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-03 17:47                               ` Junio C Hamano
@ 2017-03-04  6:25                                 ` Torsten Bögershausen
  2017-03-04 19:59                                   ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Torsten Bögershausen @ 2017-03-04  6:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mike Crowe, git, Jeff King

On 2017-03-03 18:47, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
>> Understood, thanks for the explanation.
>>
>> quiet is not quite any more..
>>
>> Does the following fix help ?
>>
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -2826,6 +2826,8 @@ int diff_populate_filespec(struct diff_filespec *s,
>> unsigned int flags)
>>         enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL
>>                                     ? SAFE_CRLF_WARN
>>                                     : safe_crlf);
>> +       if (size_only)
>> +               crlf_warn = SAFE_CRLF_FALSE;
> 
> If you were to go this route, it may be sufficient to change its
> initialization from WARN to FALSE _unconditionally_, because this
> function uses the convert_to_git() only to _show_ the differences by
> computing canonical form out of working tree contents, and the
> conversion is not done to _write_ into object database to create a
> new object.
Hm, since when (is it not used) ?

I thought that it is needed to support the safecrlf handling introduced in
21e5ad50fc5e7277c74cfbb3cf6502468e840f86
Author: Steffen Prohaska <prohaska@zib.de>
Date:   Wed Feb 6 12:25:58 2008 +0100

    safecrlf: Add mechanism to warn about irreversible crlf conversions

-------------
The SAFE_CRLF_FAIL was converted into WARN here:
commit 5430bb283b478991a979437a79e10dcbb6f20e28
Author: Junio C Hamano <gitster@pobox.com>
Date:   Mon Jun 24 14:35:04 2013 -0700

    diff: demote core.safecrlf=true to core.safecrlf=warn

    Otherwise the user will not be able to start to guess where in the
    contents in the working tree the offending unsafe CR lies.
------------

My understanding is that we don't want to break the safecrlf feature,
but after applying

diff --git a/diff.c b/diff.c
index a628ac3a95..a05d88dd9f 100644
--- a/diff.c
+++ b/diff.c
@@ -2820,12 +2820,10 @@ int diff_populate_filespec(struct diff_filespec *s,
unsigned int flags)
        int size_only = flags & CHECK_SIZE_ONLY;
        int err = 0;
        /*
-        * demote FAIL to WARN to allow inspecting the situation
-        * instead of refusing.
+        * Don't use FAIL or WARN as this code is not called when _writing_
+        * into object database to create a new object.
         */
-       enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL
-                                   ? SAFE_CRLF_WARN
-                                   : safe_crlf);
+       enum safe_crlf crlf_warn = SAFE_CRLF_FALSE;


None of the test cases in t0020--t0027 fails or complain about missing warnings.
Does this all means that, looking back,  5430bb283b478991 could have been more
aggressive and could have used SAFE_CRLF_FALSE ?
And we can do this change now?

(If the answer is yes, we don't need to deal with the problem below)
> Having size_only here is not a sign of getting --quiet passed from
> the command line, by the way.
> 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
  2017-03-04  6:25                                 ` Torsten Bögershausen
@ 2017-03-04 19:59                                   ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-03-04 19:59 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Mike Crowe, git, Jeff King

Torsten Bögershausen <tboegi@web.de> writes:

>>>         enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL
>>>                                     ? SAFE_CRLF_WARN
>>>                                     : safe_crlf);
>>> +       if (size_only)
>>> +               crlf_warn = SAFE_CRLF_FALSE;
>> 
>> If you were to go this route, it may be sufficient to change its
>> initialization from WARN to FALSE _unconditionally_, because this
>> function uses the convert_to_git() only to _show_ the differences by
>> computing canonical form out of working tree contents, and the
>> conversion is not done to _write_ into object database to create a
>> new object.
>
> Hm, since when (is it not used) ?

Since forever, but my statement above said "this function", which may
have confused you, where it could have said diff_populate_filespec().

Surely it is possible for somebody to diff_populate_filespec(s, 0)
and then call hash_sha1_file(s->data, s->size, "blob", ...) to write
the data into the object database to create a new object.  But that
sounds really crazy, no?

> The SAFE_CRLF_FAIL was converted into WARN here:
> commit 5430bb283b478991a979437a79e10dcbb6f20e28
> Author: Junio C Hamano <gitster@pobox.com>
> Date:   Mon Jun 24 14:35:04 2013 -0700
>
>     diff: demote core.safecrlf=true to core.safecrlf=warn

Yes.

> Does this all means that, looking back,  5430bb283b478991 could have been more
> aggressive and could have used SAFE_CRLF_FALSE ?

That is pretty much the statement, to which you said "since when",
suspects.

> And we can do this change now?

I am not sure.  The conversion the safe-crlf code does is unsafe and
it is a disservice to users not to warn whenever we notice they are
risking information loss.  Maybe time they run "git diff" is not a
good time to warn, as they may not be actually adding the file
as-is, but if warning against information loss at "git diff" time is
important enough, the I think that should not be squelched by the
"--quiet" option, which is about "do not show the patch text
output".  It should not be taken as "do not diagnose any errors".


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-03-04 19:59 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17 21:26 git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
2017-02-17 22:05 ` Junio C Hamano
2017-02-17 22:19   ` Mike Crowe
2017-02-20 15:33     ` Mike Crowe
2017-02-20 21:25       ` Junio C Hamano
2017-02-25 15:32         ` Mike Crowe
2017-02-27 20:17           ` Junio C Hamano
2017-02-28 18:06             ` Torsten Bögershausen
2017-02-28 21:50               ` Junio C Hamano
2017-03-01 17:04                 ` [PATCH v1 1/1] " tboegi
2017-03-01 21:14                   ` Junio C Hamano
2017-03-01 21:54                     ` Junio C Hamano
2017-03-02  8:53                       ` Jeff King
2017-03-02 17:52                         ` Junio C Hamano
2017-03-02 19:12                           ` Jeff King
2017-03-02 18:51                         ` [PATCH v2] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec() Junio C Hamano
2017-03-02 14:20                       ` [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
2017-03-02 18:20                         ` Torsten Bögershausen
2017-03-02 18:33                         ` Junio C Hamano
2017-03-02 20:03                           ` Mike Crowe
2017-03-03 17:02                             ` Torsten Bögershausen
2017-03-03 17:47                               ` Junio C Hamano
2017-03-04  6:25                                 ` Torsten Bögershausen
2017-03-04 19:59                                   ` Junio C Hamano
2017-03-01 21:25                   ` Mike Crowe
2017-03-01 23:29                     ` Junio C Hamano
2017-03-02 18:17                     ` Torsten Bögershausen
2017-03-03 17:01                       ` Mike Crowe
2017-03-02 15:38               ` git status reports file modified when only line-endings have changed (was git diff --quiet exits with 1 on clean tree with CRLF conversions) Mike Crowe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).