git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Mike Crowe <mac@mcrowe.com>
Cc: git@vger.kernel.org
Subject: Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
Date: Mon, 20 Feb 2017 13:25:01 -0800	[thread overview]
Message-ID: <xmqqlgt0imhe.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170220153322.GA8352@mcrowe.com> (Mike Crowe's message of "Mon, 20 Feb 2017 15:33:22 +0000")

Mike Crowe <mac@mcrowe.com> writes:

> I think that if there's a risk that file contents will undergo conversion
> then this should force the diff to check the file contents. Something like:
>
> diff --git a/diff.c b/diff.c
> index 051761b..bee1662 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3413,13 +3413,14 @@ void diff_setup_done(struct diff_options *options)
>  	/*
>  	 * Most of the time we can say "there are changes"
>  	 * only by checking if there are changed paths, but
> -	 * --ignore-whitespace* options force us to look
> -	 * inside contents.
> +	 * --ignore-whitespace* options or text conversion
> +	 * force us to look inside contents.
>  	 */
>  
>  	if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
>  	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
> -	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
> +	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
> +	    DIFF_OPT_TST(options, ALLOW_TEXTCONV))
>  		DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
>  	else
>  		DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);

Thanks.

You may be on the right track to find FROM_CONTENTS bit, but
I think TEXTCONV bit is a red-herring.

This part of diff.c caught my attention, while thinking about this
topic:

	if (output_format & DIFF_FORMAT_NO_OUTPUT &&
	    DIFF_OPT_TST(options, EXIT_WITH_STATUS) &&
	    DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
		/*
		 * run diff_flush_patch for the exit status. setting
		 * options->file to /dev/null should be safe, because we
		 * aren't supposed to produce any output anyway.
		 */

and the body of this "if" statement loops over q->queue[].  It is
about the "even though we prefer not having to format the patch
because we are doing --quiet, we need to see if contents of one and
two that we _know_ are different are made into the same thing when
we compare them while ignoring various forms of whitespace changes".
So one and two that are removed in earlier step because they were
truly identical may not be penalized if you flip FROM_CONTENTS bit
early on.

Having said that, DIFF_FROM_CONTENTS is about all paths this options
structure governs, not just paths that have eol conversion defined.
When you say "diff --ignore-whitespace-change", that applies to all
paths.  The eol conversion is specified per-path, so ideally the
FROM_CONTENTS bit should be flipped if and only if one or more of
the paths would need the conversion (i.e. does the helper function
would_convert_to_git() say "yes" to the path?).  I however suspect
that necessary information to do so (i.e. "which paths we are
looking at?") has not been generated yet at the point of the code
you quoted.  setup comes (and must come) very early, and then
q->queue[] is populated by different front-end functions that
compare trees, the index, and the working tree, depending on the
"git diff" option or "git diff-{tree,index,files}" plumbing command,
and you can ask "would one of these paths need conversion?" only
after q->queue[] is populated.  Hmm.....

Another thing is that ALLOW_TEXTCONV is not a right bit to check for
your example.  It is "do we use textconv filters to turn binary
files into a (phony) text representation before comparing?".  People
use the mechanism to throw JPEG photos in Git and have textconv
filter to extract only EXIF data, and "diff --textconv" would let us
textually compare only the EXIF data (which is the only human
readable part of the contents anyway).  

It might be a good idea to also flip FROM_CONTENTS bit under "diff
--textconv", and ...

> This solves the problem for me and my test case now passes.

... but I suspect that you were misled to think it fixes your issue,
only because "--textconv" is by default enabled for "git diff" and
"git log" (see "git diff --help").  I think you are saying that if
you always set FROM_CONTENTS bit on, you get what you want.  But
that is to be expected and it unfortunately penalizes everybody by
turning an obvious optimization permanently off.

Also "--textconv" is not on by default for the plumbing "git
diff-index" command and its friends, so it is likely that "git
diff-index HEAD" with your change will still not work as you expect.

A cheap (from code-change point of view) band-aid might be to flip
FROM_CONTENTS on if we know the repository has _some_ paths that
need eol conversion, even when the particular "diff" we are taking
does not involve any eol conversion (e.g. "is core.crlf set?").
While it may be better than "if we are porcelain (aka ALLOW_TEXTCONV
is set), unconditionally flip FROM_CONTENTS on", it is not ideal,
either.

This almost makes me suspect that the place that checks lengths of
one and two in order to refrain from running more expensive content
comparison you found earlier need to ask would_convert_to_git()
before taking the short-cut, something along the lines of this (for
illustration purposes only, not even compile-tested).  The "almost"
comes to me because I do not offhand know the performance implications
of making calls to would_convert_to_git() here.

 diff.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 051761be40..094d5913da 100644
--- a/diff.c
+++ b/diff.c
@@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	 *    differences.
 	 *
 	 * 2. At this point, the file is known to be modified,
-	 *    with the same mode and size, and the object
-	 *    name of one side is unknown.  Need to inspect
-	 *    the identical contents.
+	 *    with the same mode and size, the object
+	 *    name of one side is unknown, or size comparison
+	 *    cannot be depended upon.  Need to inspect the 
+	 *    contents.
 	 */
 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
 	    !DIFF_FILE_VALID(p->two) ||
@@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    (p->one->mode != p->two->mode) ||
 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
-	    (p->one->size != p->two->size) ||
+
+	    /* 
+	     * only if eol and other conversions are not involved,
+	     * we can say that two contents of different sizes
+	     * cannot be the same without checking their contents.
+	     */
+	    (!would_convert_to_git(p->one->path) &&
+	     !would_convert_to_git(p->two->path) &&
+	     (p->one->size != p->two->size)) ||
+
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
 	return p->skip_stat_unmatch_result;



  reply	other threads:[~2017-02-20 21:25 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-17 21:26 git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
2017-02-17 22:05 ` Junio C Hamano
2017-02-17 22:19   ` Mike Crowe
2017-02-20 15:33     ` Mike Crowe
2017-02-20 21:25       ` Junio C Hamano [this message]
2017-02-25 15:32         ` Mike Crowe
2017-02-27 20:17           ` Junio C Hamano
2017-02-28 18:06             ` Torsten Bögershausen
2017-02-28 21:50               ` Junio C Hamano
2017-03-01 17:04                 ` [PATCH v1 1/1] " tboegi
2017-03-01 21:14                   ` Junio C Hamano
2017-03-01 21:54                     ` Junio C Hamano
2017-03-02  8:53                       ` Jeff King
2017-03-02 17:52                         ` Junio C Hamano
2017-03-02 19:12                           ` Jeff King
2017-03-02 18:51                         ` [PATCH v2] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec() Junio C Hamano
2017-03-02 14:20                       ` [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions Mike Crowe
2017-03-02 18:20                         ` Torsten Bögershausen
2017-03-02 18:33                         ` Junio C Hamano
2017-03-02 20:03                           ` Mike Crowe
2017-03-03 17:02                             ` Torsten Bögershausen
2017-03-03 17:47                               ` Junio C Hamano
2017-03-04  6:25                                 ` Torsten Bögershausen
2017-03-04 19:59                                   ` Junio C Hamano
2017-03-01 21:25                   ` Mike Crowe
2017-03-01 23:29                     ` Junio C Hamano
2017-03-02 18:17                     ` Torsten Bögershausen
2017-03-03 17:01                       ` Mike Crowe
2017-03-02 15:38               ` git status reports file modified when only line-endings have changed (was git diff --quiet exits with 1 on clean tree with CRLF conversions) Mike Crowe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqlgt0imhe.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=mac@mcrowe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).