git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Unexpected behavior on diff -I<regex> --name-only
@ 2020-12-14 19:00 Johannes Altmanninger
  2020-12-14 19:49 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Altmanninger @ 2020-12-14 19:00 UTC (permalink / raw)
  To: git

Since v2.28.0-2-g296d4a94e7 (diff: add -I<regex> that ignores matching changes)
diff -I<regex> can be used to suppress hunks of only matching lines.
This interacts in a surprising way with --name-only, which lists
all changed files, regardless of whether they are filtered out by -I<regex>:

	git diff HEAD~ -I ''			# always empty
	git diff HEAD~ -I '' --name-only	# not empty, "-I" does nothing

It could be nice to only show names of files with matching hunks
(or reject this combination of options?).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Unexpected behavior on diff -I<regex> --name-only
  2020-12-14 19:00 Unexpected behavior on diff -I<regex> --name-only Johannes Altmanninger
@ 2020-12-14 19:49 ` Junio C Hamano
  2020-12-16 23:18   ` Johannes Altmanninger
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2020-12-14 19:49 UTC (permalink / raw)
  To: Johannes Altmanninger; +Cc: git

Johannes Altmanninger <aclopte@gmail.com> writes:

> Since v2.28.0-2-g296d4a94e7 (diff: add -I<regex> that ignores matching changes)
> diff -I<regex> can be used to suppress hunks of only matching lines.
> This interacts in a surprising way with --name-only, which lists
> all changed files, regardless of whether they are filtered out by -I<regex>:
>
> 	git diff HEAD~ -I ''			# always empty
> 	git diff HEAD~ -I '' --name-only	# not empty, "-I" does nothing
>
> It could be nice to only show names of files with matching hunks
> (or reject this combination of options?).

Interesting.  

This is not a new issue limited to -I at all.  If you did this:

	$ echo "hello" >world
	$ git add world ; git commit -m 'add world'
	$ echo " hello" >world ; git add world
	$ git diff -w --cached
	$ git diff --name-only --cached
	world
	$ git diff --name-only -w --cached
	world

I think "--name-only", and perhaps other options, has too aggressive
an optimization that takes advantage of the fact that we can tell if
a path has changed or not without looking at the contents at all by
looking at the object name recorded.  That optimization may have
been valid until many newer and more expensive features came around,
but not anymore.

I think diff.c::flush_one_pair() needs to learn to pay attention to
opt->diff_from_contents in its third branch where DIFF_FORMAT_NAME
is handled.  I do not offhand remember if -I flips diff_from_contents
bit, but I wouldn't be surprised if the recent change added the
support for -I forgot to do so.

Thanks.










^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Unexpected behavior on diff -I<regex> --name-only
  2020-12-14 19:49 ` Junio C Hamano
@ 2020-12-16 23:18   ` Johannes Altmanninger
  2020-12-16 23:18     ` [PATCH] diff: suppress --name-only paths where all hunks are ignored Johannes Altmanninger
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Altmanninger @ 2020-12-16 23:18 UTC (permalink / raw)
  To: gitster; +Cc: aclopte, git


On Mon, Dec 14, 2020 at 11:49:06AM -0800, Junio C Hamano wrote:
> This is not a new issue limited to -I at all.  If you did this:
Right, I didn't think of -w at all!

> I do not offhand remember if -I flips diff_from_contents
> bit, but I wouldn't be surprised if the recent change added the
> support for -I forgot to do so.

Spot on ;)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] diff: suppress --name-only paths where all hunks are ignored
  2020-12-16 23:18   ` Johannes Altmanninger
@ 2020-12-16 23:18     ` Johannes Altmanninger
  2020-12-17  1:27       ` Re* " Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Altmanninger @ 2020-12-16 23:18 UTC (permalink / raw)
  To: gitster; +Cc: aclopte, git

Diff options -w and the new -I<regex> can be used to suppress some hunks. Honor
these ignores in combination with --name-only, --name-stat, and --raw,
to not output files where all hunks are ignored.

Commit f245194f9a made "git diff -w --exit-code" exit with zero if all
changed hunks are whitespace. This uses the diff_from_contents bit.
Set that also when given -I<regex>, for consistent exit codes.

The diff_from_contents bit means that we have to look at content
changes to know if a path has changed - modulo ignored hunks.  Teach
diff.c::flush_one_pair() to do so.  In the caller, reset the found_changes bit
after each file pair, so we can test each file separately for content changes.

Signed-off-by: Johannes Altmanninger <aclopte@gmail.com>
---
 diff.c                     | 36 +++++++++++++++++++++++++++++++-----
 t/t4013-diff-various.sh    | 13 +++++++++++++
 t/t4015-diff-whitespace.sh |  9 +++++++++
 3 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 643f4f3f6d..560f2d5fad 100644
--- a/diff.c
+++ b/diff.c
@@ -4630,11 +4630,10 @@ void diff_setup_done(struct diff_options *options)
 	/*
 	 * Most of the time we can say "there are changes"
 	 * only by checking if there are changed paths, but
-	 * --ignore-whitespace* options force us to look
-	 * inside contents.
+	 * --ignore-* options force us to look inside contents.
 	 */
 
-	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS))
+	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS) || options->ignore_regex)
 		options->flags.diff_from_contents = 1;
 	else
 		options->flags.diff_from_contents = 0;
@@ -5967,6 +5966,26 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
 {
 	int fmt = opt->output_format;
 
+	if (opt->flags.diff_from_contents &&
+	    (fmt & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME | DIFF_FORMAT_NAME_STATUS))) {
+		static FILE *devnull;
+		FILE *diff_file;
+
+		if (!devnull)
+			devnull = xfopen("/dev/null", "w");
+
+		diff_file = opt->file;
+		opt->file = devnull;
+		opt->color_moved = 0;
+
+		if (check_pair_status(p))
+			diff_flush_patch(p, opt);
+
+		opt->file = diff_file;
+		if (!opt->found_changes)
+			return;
+	}
+
 	if (fmt & DIFF_FORMAT_CHECKDIFF)
 		diff_flush_checkdiff(p, opt);
 	else if (fmt & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME_STATUS))
@@ -6350,11 +6369,18 @@ void diff_flush(struct diff_options *options)
 			     DIFF_FORMAT_NAME |
 			     DIFF_FORMAT_NAME_STATUS |
 			     DIFF_FORMAT_CHECKDIFF)) {
+		int found_changes = 0;
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
-			if (check_pair_status(p))
-				flush_one_pair(p, options);
+			if (!check_pair_status(p))
+				continue;
+			flush_one_pair(p, options);
+			if (options->found_changes) {
+				found_changes = 1;
+				options->found_changes = 0;
+			}
 		}
+		options->found_changes = found_changes;
 		separator++;
 	}
 
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index f72d456d3b..7cfd3a22d1 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -509,6 +509,19 @@ test_expect_success 'diff -I<regex> --stat' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff -I<regex> --name-only' '
+	git diff -I "" >actual --exit-code &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'diff -I<regex> --name-status' '
+	! git diff -I"[0-9]" --name-status  --exit-code >actual &&
+	cat >expect <<-\EOF &&
+	M	file0
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'diff -I<regex>: detect malformed regex' '
 	test_expect_code 129 git diff --ignore-matching-lines="^[124-9" 2>error &&
 	test_i18ngrep "invalid regex given to -I: " error
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 47f0e2889d..3c4941cf96 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -805,6 +805,15 @@ test_expect_success 'whitespace-only changes not reported (diffstat)' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'whitespace-only changes not reported (name-only)' '
+	# reuse state from previous test
+	! git diff --name-only >actual --exit-code &&
+	git diff --name-only -b >actual --exit-code &&
+	git diff --name-status -b >>actual &&
+	git diff --raw -b >>actual &&
+	test_must_be_empty actual
+'
+
 test_expect_success 'whitespace changes with modification reported (diffstat)' '
 	git reset --hard &&
 	echo >x "hello  world" &&
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re* [PATCH] diff: suppress --name-only paths where all hunks are ignored
  2020-12-16 23:18     ` [PATCH] diff: suppress --name-only paths where all hunks are ignored Johannes Altmanninger
@ 2020-12-17  1:27       ` Junio C Hamano
  2020-12-20 22:34         ` Johannes Altmanninger
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2020-12-17  1:27 UTC (permalink / raw)
  To: Johannes Altmanninger; +Cc: git

Johannes Altmanninger <aclopte@gmail.com> writes:

> Diff options -w and the new -I<regex> can be used to suppress some hunks. Honor
> these ignores in combination with --name-only, --name-stat, and --raw,
> to not output files where all hunks are ignored.

Hmph, I am not sure if --raw should be affected by any of the
whitespace-difference suppression feature, though.

> Commit f245194f9a made "git diff -w --exit-code" exit with zero if all
> changed hunks are whitespace. This uses the diff_from_contents bit.
> Set that also when given -I<regex>, for consistent exit codes.

This one I can agree with.

> The diff_from_contents bit means that we have to look at content
> changes to know if a path has changed - modulo ignored hunks.  

I am not sure what you mean by " - modulo ignored hunks" here.  When
the diff_from_contents bit is not in effect, we can just rely on
byte-for-byte equality to determine if a path has changed, which
means we can say that two blobs are different by seeing that they
have different object names.  But when diff_from_contents bit is in
effect, two blobs that are not byte-for-byte equal could still be
considered the same.

But your sentence without " - modulo ignored hunks" says that very
clearly.  So perhaps these three words can just go away?

> Teach
> diff.c::flush_one_pair() to do so.  In the caller, reset the found_changes bit
> after each file pair, so we can test each file separately for content changes.

This part I am not sure (and later I will become even less sure it
is right).

It's not like we were buggy when diff_from_contents bit is in effect
for all output formats, is it?  How does this change interact with
the use of the found_changes bit in diffcore_flush() where NO_OUTPUT
format is given, the command wants to exit with status and
diff_from_contents is in effect?


> @@ -5967,6 +5966,26 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
>  {
>  	int fmt = opt->output_format;
>  
> +	if (opt->flags.diff_from_contents &&
> +	    (fmt & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME | DIFF_FORMAT_NAME_STATUS))) {
> +		static FILE *devnull;
> +		FILE *diff_file;
> +
> +		if (!devnull)
> +			devnull = xfopen("/dev/null", "w");
> +
> +		diff_file = opt->file;
> +		opt->file = devnull;
> +		opt->color_moved = 0;

Why is color_moved so special?  If we added some other new option,
how would we make sure that the person who is adding that new option
would remember to reset it here?  And how does that person decide if
his or her new option needs resetting or not in the first place?

It almost feels backwards to me, to be honest.  All this rigmarole
comes because opt that is used for the 'main' diff is reused by the
inner diff you run here.  You save away opt->file and swap in a new
value, and restore it before you leave.  You disable color_moved but
forget to restore it, which may be an indication of new bug.  If we
used a brand new diff_options instance and copied the options that
mattered from *opt to the new one, and used the new one to drive
this inner diff, at least we wouldn't have to worry about destroying
the outer 'opt' like this patch tries to avoid (and probably not
very successfully).

Also I am not sure if "caching" the file handle to /dev/null is
good idea or not.  When will it be closed?  Would repeated
invocation of the diff machinery work well with it (think: "git log
-w --name-only")

> +		if (check_pair_status(p))
> +			diff_flush_patch(p, opt);
> +
> +		opt->file = diff_file;
> +		if (!opt->found_changes)
> +			return;

This somehow feels wrong.  For one thing, RAW is about showing
object names of preimage and postimage, which means it shows the
preimage and postimage are either identical or different, and
options like -w, -I, etc. that inspect and hide some textual
differences should not affect its outcome at all.

The body of this function (without the above addition) looks like
this:

        static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
        {
                int fmt = opt->output_format;

                if (fmt & DIFF_FORMAT_CHECKDIFF)
                        diff_flush_checkdiff(p, opt);
                else if (fmt & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME_STATUS))
                        diff_flush_raw(p, opt);
                else if (fmt & DIFF_FORMAT_NAME) {
                        const char *name_a, *name_b;
                        name_a = p->two->path;
                        name_b = NULL;
                        strip_prefix(opt->prefix_length, &name_a, &name_b);
                        fprintf(opt->file, "%s", diff_line_prefix(opt));
                        write_name_quoted(name_a, opt->file, opt->line_termination);
                }
        }

and we notice that FORMAT_NAME case is an oddball among the three
with open-coded logic here without any helper function.  I would
have expected that FORMAT_NAME would be the only thing that should
be affected, and the way to do so would be to move what we see here
to a new diff_flush_name_only(p, opt) helper function, and have that
function pay the overhead of actually running a textual diff when
the diff_from_contents bit is in effect.

Having said that, after seeing RAW and NAME_STATUS are grouped into
the same group, I am having a second thought.

> @@ -6350,11 +6369,18 @@ void diff_flush(struct diff_options *options)
>  			     DIFF_FORMAT_NAME |
>  			     DIFF_FORMAT_NAME_STATUS |
>  			     DIFF_FORMAT_CHECKDIFF)) {
> +		int found_changes = 0;
>  		for (i = 0; i < q->nr; i++) {
>  			struct diff_filepair *p = q->queue[i];
> -			if (check_pair_status(p))
> -				flush_one_pair(p, options);
> +			if (!check_pair_status(p))
> +				continue;
> +			flush_one_pair(p, options);
> +			if (options->found_changes) {
> +				found_changes = 1;
> +				options->found_changes = 0;
> +			}
>  		}
> +		options->found_changes = found_changes;
>  		separator++;
>  	}

See above---this seems to be an unwanted side effect of an
unnecessary reuse of opt in flush_one_pair().

How about turning this into at least two patches?

 - We are not handling "-I" the same way as "-w", which is one
   issue.  I think I can agree with the patch to fix that bug.

 - And then there is an issue that we are not inspecting the
   contents when "--name-only" and "-w" is given together.  It is a
   separate issue, and I am not even sure if it is a bug.  The
   attempted "fix" we see here does not look so clean, either, and I
   am tempted to declare that just like "raw", "name-only" and
   "name-status" formats work with byte-for-byte equality and "-w"
   and friends are ignored just like in "diff --name-status --patch"
   the "--patch" option is ignored.

In any case, the first half is a lot more straightforward and it is
easy to convince any reader of its correctness.

Thanks.

--- >8 ------ >8 ------ >8 ------ >8 ------ >8 ------ >8 ------ >8 ---

Subject: [PATCH] diff: correct interaction between --exit-code and -I<pattern>

Just like "git diff -w --exit-code" should exit with 0 when ignoring
whitespace differences results in no changes shown, if ignoring
certain changes with "git diff -I<pattern> --exit-code" result in an
empty patch, we should exit with 0.

The test suite did not cover the interaction between "--exit-code"
and "-w"; add one while adding a new test for "--exit-code" + "-I".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c                     |  3 ++-
 t/t4015-diff-whitespace.sh | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 9768d8eab4..69e3bc00ed 100644
--- a/diff.c
+++ b/diff.c
@@ -4637,7 +4637,8 @@ void diff_setup_done(struct diff_options *options)
 	 * inside contents.
 	 */
 
-	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS))
+	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS) ||
+	    options->ignore_regex_nr)
 		options->flags.diff_from_contents = 1;
 	else
 		options->flags.diff_from_contents = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 47f0e2889d..8c574221b2 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -567,6 +567,30 @@ test_expect_success '--check and --quiet are not exclusive' '
 	git diff --check --quiet
 '
 
+test_expect_success '-w and --exit-code interact sensibly' '
+	test_when_finished "git checkout x" &&
+	{
+		test_seq 15 &&
+		echo " 16"
+	} >x &&
+	test_must_fail git diff --exit-code &&
+	git diff -w >actual &&
+	test_must_be_empty actual &&
+	git diff -w --exit-code
+'
+
+test_expect_success '-I and --exit-code interact sensibly' '
+	test_when_finished "git checkout x" &&
+	{
+		test_seq 15 &&
+		echo " 16"
+	} >x &&
+	test_must_fail git diff --exit-code &&
+	git diff -I. >actual &&
+	test_must_be_empty actual &&
+	git diff -I. --exit-code
+'
+
 test_expect_success 'check staged with no whitespace errors' '
 	echo "foo();" >x &&
 	git add x &&
-- 
2.30.0-rc0-217-gff80b81bc6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re* [PATCH] diff: suppress --name-only paths where all hunks are ignored
  2020-12-17  1:27       ` Re* " Junio C Hamano
@ 2020-12-20 22:34         ` Johannes Altmanninger
  2020-12-21 19:29           ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Altmanninger @ 2020-12-20 22:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Dec 16, 2020 at 05:27:13PM -0800, Junio C Hamano wrote:
> Johannes Altmanninger <aclopte@gmail.com> writes:
> > The diff_from_contents bit means that we have to look at content
> > changes to know if a path has changed - modulo ignored hunks.  
>
> But your sentence without " - modulo ignored hunks" says that very
> clearly.  So perhaps these three words can just go away?

My bad, this should rather be

	The diff_from_contents bit means that we have to look at content
	changes (modulo ignored hunks) to know if a path has changed.

probably dropping the three words makes it even less confusing.

> It's not like we were buggy when diff_from_contents bit is in effect
> for all output formats, is it?

Right, it's useless to set the bit here.

> > +		opt->color_moved = 0;
> 
> Why is color_moved so special?  If we added some other new option,
> how would we make sure that the person who is adding that new option
> would remember to reset it here?  And how does that person decide if
> his or her new option needs resetting or not in the first place?

I copied color_moved=0 from the DIFF_FORMAT_NO_OUTPUT section of
diff_flush(). It happens to work correctly in both places because no diff
contents are printed, but yeah it's not robust to future changes.

> Also I am not sure if "caching" the file handle to /dev/null is
> good idea or not.  When will it be closed?  Would repeated
> invocation of the diff machinery work well with it (think: "git log
> -w --name-only")

I believe that repeated invocations work fine, because the file is never
closed (which may be a problem?). The file could be made nilable if we still
need something like that.

> This somehow feels wrong.  For one thing, RAW is about showing
> object names of preimage and postimage, which means it shows the
> preimage and postimage are either identical or different, and
> options like -w, -I, etc. that inspect and hide some textual
> differences should not affect its outcome at all.

Yeah, I realized late that -w --raw will have sort of unintuitive semantics.
This combination is probably rare but I guess users could set an alias for
diff -w.

>    I am tempted to declare that just like "raw", "name-only" and
>    "name-status" formats work with byte-for-byte equality


OK, I think it's better to keep the current (consistent) behavior.  I don't
think it's worth to special-case "--name-only".

It's easy to read the names from a "diff -I" output anyway.  This filter is
broken in various ways, but works well enough in practise:

	diff -I | perl -ne 'print "$1\n" if /\+{3} b\/(.*)/'

> and "-w" and friends are ignored just like in "diff --name-status --patch"
> the "--patch" option is ignored.

For interactive use, it would be more helpful to reject useless options
instead of ignoring them. Though ignoring is probably desirable to allow
liberal use of these options, I'm not sure.

> --- >8 ------ >8 ------ >8 ------ >8 ------ >8 ------ >8 ------ >8 ---
> 
> Subject: [PATCH] diff: correct interaction between --exit-code and -I<pattern>

Looks good.

> +test_expect_success '-w and --exit-code interact sensibly' '

Maybe 'exit with 0 when all changes are ignored by -w' though either version
is fine because I think the intention of the test is already obvious.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re* [PATCH] diff: suppress --name-only paths where all hunks are ignored
  2020-12-20 22:34         ` Johannes Altmanninger
@ 2020-12-21 19:29           ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2020-12-21 19:29 UTC (permalink / raw)
  To: Johannes Altmanninger; +Cc: git

Johannes Altmanninger <aclopte@gmail.com> writes:

>> +test_expect_success '-w and --exit-code interact sensibly' '
>
> Maybe 'exit with 0 when all changes are ignored by -w' though either version
> is fine because I think the intention of the test is already obvious.

Yeah, 'sensibly' is a zero-bit phrase and the letters are better
spent on describing what we deem sensible more clearly.  Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-12-21 19:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-14 19:00 Unexpected behavior on diff -I<regex> --name-only Johannes Altmanninger
2020-12-14 19:49 ` Junio C Hamano
2020-12-16 23:18   ` Johannes Altmanninger
2020-12-16 23:18     ` [PATCH] diff: suppress --name-only paths where all hunks are ignored Johannes Altmanninger
2020-12-17  1:27       ` Re* " Junio C Hamano
2020-12-20 22:34         ` Johannes Altmanninger
2020-12-21 19:29           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).