git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Johannes Altmanninger <aclopte@gmail.com>
Cc: git@vger.kernel.org
Subject: Re* [PATCH] diff: suppress --name-only paths where all hunks are ignored
Date: Wed, 16 Dec 2020 17:27:13 -0800	[thread overview]
Message-ID: <xmqq4kkl1atq.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: 20201216231840.3163806-2-aclopte@gmail.com

Johannes Altmanninger <aclopte@gmail.com> writes:

> Diff options -w and the new -I<regex> can be used to suppress some hunks. Honor
> these ignores in combination with --name-only, --name-stat, and --raw,
> to not output files where all hunks are ignored.

Hmph, I am not sure if --raw should be affected by any of the
whitespace-difference suppression feature, though.

> Commit f245194f9a made "git diff -w --exit-code" exit with zero if all
> changed hunks are whitespace. This uses the diff_from_contents bit.
> Set that also when given -I<regex>, for consistent exit codes.

This one I can agree with.

> The diff_from_contents bit means that we have to look at content
> changes to know if a path has changed - modulo ignored hunks.  

I am not sure what you mean by " - modulo ignored hunks" here.  When
the diff_from_contents bit is not in effect, we can just rely on
byte-for-byte equality to determine if a path has changed, which
means we can say that two blobs are different by seeing that they
have different object names.  But when diff_from_contents bit is in
effect, two blobs that are not byte-for-byte equal could still be
considered the same.

But your sentence without " - modulo ignored hunks" says that very
clearly.  So perhaps these three words can just go away?

> Teach
> diff.c::flush_one_pair() to do so.  In the caller, reset the found_changes bit
> after each file pair, so we can test each file separately for content changes.

This part I am not sure (and later I will become even less sure it
is right).

It's not like we were buggy when diff_from_contents bit is in effect
for all output formats, is it?  How does this change interact with
the use of the found_changes bit in diffcore_flush() where NO_OUTPUT
format is given, the command wants to exit with status and
diff_from_contents is in effect?


> @@ -5967,6 +5966,26 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
>  {
>  	int fmt = opt->output_format;
>  
> +	if (opt->flags.diff_from_contents &&
> +	    (fmt & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME | DIFF_FORMAT_NAME_STATUS))) {
> +		static FILE *devnull;
> +		FILE *diff_file;
> +
> +		if (!devnull)
> +			devnull = xfopen("/dev/null", "w");
> +
> +		diff_file = opt->file;
> +		opt->file = devnull;
> +		opt->color_moved = 0;

Why is color_moved so special?  If we added some other new option,
how would we make sure that the person who is adding that new option
would remember to reset it here?  And how does that person decide if
his or her new option needs resetting or not in the first place?

It almost feels backwards to me, to be honest.  All this rigmarole
comes because opt that is used for the 'main' diff is reused by the
inner diff you run here.  You save away opt->file and swap in a new
value, and restore it before you leave.  You disable color_moved but
forget to restore it, which may be an indication of new bug.  If we
used a brand new diff_options instance and copied the options that
mattered from *opt to the new one, and used the new one to drive
this inner diff, at least we wouldn't have to worry about destroying
the outer 'opt' like this patch tries to avoid (and probably not
very successfully).

Also I am not sure if "caching" the file handle to /dev/null is
good idea or not.  When will it be closed?  Would repeated
invocation of the diff machinery work well with it (think: "git log
-w --name-only")

> +		if (check_pair_status(p))
> +			diff_flush_patch(p, opt);
> +
> +		opt->file = diff_file;
> +		if (!opt->found_changes)
> +			return;

This somehow feels wrong.  For one thing, RAW is about showing
object names of preimage and postimage, which means it shows the
preimage and postimage are either identical or different, and
options like -w, -I, etc. that inspect and hide some textual
differences should not affect its outcome at all.

The body of this function (without the above addition) looks like
this:

        static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
        {
                int fmt = opt->output_format;

                if (fmt & DIFF_FORMAT_CHECKDIFF)
                        diff_flush_checkdiff(p, opt);
                else if (fmt & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME_STATUS))
                        diff_flush_raw(p, opt);
                else if (fmt & DIFF_FORMAT_NAME) {
                        const char *name_a, *name_b;
                        name_a = p->two->path;
                        name_b = NULL;
                        strip_prefix(opt->prefix_length, &name_a, &name_b);
                        fprintf(opt->file, "%s", diff_line_prefix(opt));
                        write_name_quoted(name_a, opt->file, opt->line_termination);
                }
        }

and we notice that FORMAT_NAME case is an oddball among the three
with open-coded logic here without any helper function.  I would
have expected that FORMAT_NAME would be the only thing that should
be affected, and the way to do so would be to move what we see here
to a new diff_flush_name_only(p, opt) helper function, and have that
function pay the overhead of actually running a textual diff when
the diff_from_contents bit is in effect.

Having said that, after seeing RAW and NAME_STATUS are grouped into
the same group, I am having a second thought.

> @@ -6350,11 +6369,18 @@ void diff_flush(struct diff_options *options)
>  			     DIFF_FORMAT_NAME |
>  			     DIFF_FORMAT_NAME_STATUS |
>  			     DIFF_FORMAT_CHECKDIFF)) {
> +		int found_changes = 0;
>  		for (i = 0; i < q->nr; i++) {
>  			struct diff_filepair *p = q->queue[i];
> -			if (check_pair_status(p))
> -				flush_one_pair(p, options);
> +			if (!check_pair_status(p))
> +				continue;
> +			flush_one_pair(p, options);
> +			if (options->found_changes) {
> +				found_changes = 1;
> +				options->found_changes = 0;
> +			}
>  		}
> +		options->found_changes = found_changes;
>  		separator++;
>  	}

See above---this seems to be an unwanted side effect of an
unnecessary reuse of opt in flush_one_pair().

How about turning this into at least two patches?

 - We are not handling "-I" the same way as "-w", which is one
   issue.  I think I can agree with the patch to fix that bug.

 - And then there is an issue that we are not inspecting the
   contents when "--name-only" and "-w" is given together.  It is a
   separate issue, and I am not even sure if it is a bug.  The
   attempted "fix" we see here does not look so clean, either, and I
   am tempted to declare that just like "raw", "name-only" and
   "name-status" formats work with byte-for-byte equality and "-w"
   and friends are ignored just like in "diff --name-status --patch"
   the "--patch" option is ignored.

In any case, the first half is a lot more straightforward and it is
easy to convince any reader of its correctness.

Thanks.

--- >8 ------ >8 ------ >8 ------ >8 ------ >8 ------ >8 ------ >8 ---

Subject: [PATCH] diff: correct interaction between --exit-code and -I<pattern>

Just like "git diff -w --exit-code" should exit with 0 when ignoring
whitespace differences results in no changes shown, if ignoring
certain changes with "git diff -I<pattern> --exit-code" result in an
empty patch, we should exit with 0.

The test suite did not cover the interaction between "--exit-code"
and "-w"; add one while adding a new test for "--exit-code" + "-I".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c                     |  3 ++-
 t/t4015-diff-whitespace.sh | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 9768d8eab4..69e3bc00ed 100644
--- a/diff.c
+++ b/diff.c
@@ -4637,7 +4637,8 @@ void diff_setup_done(struct diff_options *options)
 	 * inside contents.
 	 */
 
-	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS))
+	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS) ||
+	    options->ignore_regex_nr)
 		options->flags.diff_from_contents = 1;
 	else
 		options->flags.diff_from_contents = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 47f0e2889d..8c574221b2 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -567,6 +567,30 @@ test_expect_success '--check and --quiet are not exclusive' '
 	git diff --check --quiet
 '
 
+test_expect_success '-w and --exit-code interact sensibly' '
+	test_when_finished "git checkout x" &&
+	{
+		test_seq 15 &&
+		echo " 16"
+	} >x &&
+	test_must_fail git diff --exit-code &&
+	git diff -w >actual &&
+	test_must_be_empty actual &&
+	git diff -w --exit-code
+'
+
+test_expect_success '-I and --exit-code interact sensibly' '
+	test_when_finished "git checkout x" &&
+	{
+		test_seq 15 &&
+		echo " 16"
+	} >x &&
+	test_must_fail git diff --exit-code &&
+	git diff -I. >actual &&
+	test_must_be_empty actual &&
+	git diff -I. --exit-code
+'
+
 test_expect_success 'check staged with no whitespace errors' '
 	echo "foo();" >x &&
 	git add x &&
-- 
2.30.0-rc0-217-gff80b81bc6


  reply	other threads:[~2020-12-17  1:30 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-14 19:00 Unexpected behavior on diff -I<regex> --name-only Johannes Altmanninger
2020-12-14 19:49 ` Junio C Hamano
2020-12-16 23:18   ` Johannes Altmanninger
2020-12-16 23:18     ` [PATCH] diff: suppress --name-only paths where all hunks are ignored Johannes Altmanninger
2020-12-17  1:27       ` Junio C Hamano [this message]
2020-12-20 22:34         ` Re* " Johannes Altmanninger
2020-12-21 19:29           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq4kkl1atq.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=aclopte@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).