git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* gitk regression in version 2.36.0
@ 2022-04-23  5:25 Matthias Aßhauer
  2022-04-23  5:54 ` Junio C Hamano
  2022-04-23  9:27 ` gitk regression in version 2.36.0 René Scharfe
  0 siblings, 2 replies; 16+ messages in thread
From: Matthias Aßhauer @ 2022-04-23  5:25 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Elijah Newren,
	Johannes Schindelin

Git 2.36.0 (or more precisely 244c27242f (diff.[ch]: have diff_free() call 
clear_pathspec(opts.pathspec), 2022-02-16)) introduced some change in 
behaviour that causes gitks highlight feature not to work correctly 
anymore.

Here's a quick reproducer based on git.git:

git checkout 244c27242f44e6b88e3a381c90bde08d134c274b~1
make install
git checkout 244c27242f44e6b88e3a381c90bde08d134c274b
PATH=~/bin:$PATH ~/bin/gitk
# In commit 4c53a8c20f (Git 2.35.1, 2022-01-28) (2nd from the top)
# right click GIT-VERSION-GEN and select "Highlight this only".
# You'll see 4c53a8c20f (Git 2.35.1, 2022-01-28) and
# 89bece5c8c (Git 2.35, 2022-01-24) highlighted, but not the surrounding
# commits. Exit gitk.
make install
PATH=~/bin:$PATH ~/bin/gitk
# In commit 4c53a8c20f (Git 2.35.1, 2022-01-28) (2nd from the top)
# right click GIT-VERSION-GEN and select "Highlight this only".
# Almost every non-merge commmit will be highlighted.

I think this is a change in behaviour in `git diff-tree`, but I'm honestly 
not sure what arguments gitk passes to `git diff-tree`, so I'm struggling 
to figure out what exactly changed.

This issue was originally reported as a Git for Windows issue [1], but I 
can also reproduce it on Linux.

[1] https://github.com/git-for-windows/git/issues/3815

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: gitk regression in version 2.36.0
  2022-04-23  5:25 gitk regression in version 2.36.0 Matthias Aßhauer
@ 2022-04-23  5:54 ` Junio C Hamano
  2022-04-23  6:05   ` Junio C Hamano
  2022-04-23 10:13   ` René Scharfe
  2022-04-23  9:27 ` gitk regression in version 2.36.0 René Scharfe
  1 sibling, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-04-23  5:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Matthias Aßhauer
  Cc: git, Elijah Newren, Johannes Schindelin

Matthias Aßhauer <mha1993@live.de> writes:

> Git 2.36.0 (or more precisely 244c27242f (diff.[ch]: have diff_free()
> call clear_pathspec(opts.pathspec), 2022-02-16)) introduced some
> change in behaviour that causes gitks highlight feature not to work
> correctly anymore.

Nicely found.

I think what happens is that gitk REPEATEDLY calls log_tree_commit()
by opening a pipe to "git diff-tree -r -s --stdin $gdtargs" and feed
revisions from its standard input.  

    builtin/diff-tree.c::diff_tree_stdin()
    -> builtin/diff-tree.c::stdin_diff_commit()
       -> log-tree.c::log_tree_commit()

Now, log-tree.c::log_tree_commit() is responsible for formating the
difference between the commit and its parent, using the
opt->diffopt.  After computing the pairwise diff for one tree pair
(commit and its parent), of course it calls diff_free() to flush the
queued diff and release other resources we allocated while showing
that single diff.  Before the broken commit 244c27242f, diff_free()
released ONLY the newly allocated resources that we needed to
compute one pair of trees.  Most importantly, the settings used to
compute diffs that are reused to compute the next pair of trees need
to be kept, e.g. pathspec.

Remember, we are talking about log-tree, the upstream of the pipe
going from the tip of the history and traversing the parent chain to
ancestors, so once we are done with showing the diff between our
current commit and its parent, we will move to the parent and
compute the same diff with its parent (i.e. our grand-parent).

But with the regression, we mistakenly release the pathspec, so the
first commit may use the specified pathspec, but the second and
subsequent ones will lose the pathspec the user gave us, making the
comparison tree-wide one, not limited with any pathspec.

I am reasonably sure that reverting that commit will be the right
thing to do.  It is somewhat unfortunate that it would reintroduce
resource leaks that having clear_pathspec() in a wrong place (i.e.
diff_free()) was covering up.  We should instead need to find the
place where a diff_opt struct goes out of scope (after being used
for zero or more times, calling diff_free() after each iteration to
release resources consumed per-iteration) and call clear_pathspec().

Thanks for a report.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: gitk regression in version 2.36.0
  2022-04-23  5:54 ` Junio C Hamano
@ 2022-04-23  6:05   ` Junio C Hamano
  2022-04-23 10:13   ` René Scharfe
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-04-23  6:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Matthias Aßhauer, git, Elijah Newren, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> Matthias Aßhauer <mha1993@live.de> writes:
>
>> Git 2.36.0 (or more precisely 244c27242f (diff.[ch]: have diff_free()
>> call clear_pathspec(opts.pathspec), 2022-02-16)) introduced some
>> change in behaviour that causes gitks highlight feature not to work
>> correctly anymore.
>
> Nicely found.

A simple reproduction recipe without gitk is a command line
invocation like this:

$ git rev-list -10 --max-parents=1 v2.36.0 -- Documentation | 
  git diff-tree --stdin --stat -- Documentation

The upstream of the pipe lists 10 topmost non-merge commits, going
back from v2.36.0, that touch Documentation/ directory, and the
downstream "diff-tree" is told to show for each of these commits to
compute equivalent of "git show --stat -- Documentation", i.e. only
the Documentation directory.  But "diff-tree" loses pathspec and we
will see paths outside Documentation appearing in the output.

If I substitute "git diff-tree" on the downstream of the pipe with
the version from v2.35.0, of course the correct thing happens X-<.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: gitk regression in version 2.36.0
  2022-04-23  5:25 gitk regression in version 2.36.0 Matthias Aßhauer
  2022-04-23  5:54 ` Junio C Hamano
@ 2022-04-23  9:27 ` René Scharfe
  1 sibling, 0 replies; 16+ messages in thread
From: René Scharfe @ 2022-04-23  9:27 UTC (permalink / raw)
  To: Matthias Aßhauer, git
  Cc: Ævar Arnfjörð Bjarmason, Elijah Newren,
	Johannes Schindelin

Am 23.04.22 um 07:25 schrieb Matthias Aßhauer:
> Git 2.36.0 (or more precisely 244c27242f (diff.[ch]: have diff_free()
> call clear_pathspec(opts.pathspec), 2022-02-16)) introduced some
> change in behaviour that causes gitks highlight feature not to work
> correctly anymore.
>
> Here's a quick reproducer based on git.git:
>
> git checkout 244c27242f44e6b88e3a381c90bde08d134c274b~1
> make install
> git checkout 244c27242f44e6b88e3a381c90bde08d134c274b
> PATH=~/bin:$PATH ~/bin/gitk
> # In commit 4c53a8c20f (Git 2.35.1, 2022-01-28) (2nd from the top)
> # right click GIT-VERSION-GEN and select "Highlight this only".
> # You'll see 4c53a8c20f (Git 2.35.1, 2022-01-28) and
> # 89bece5c8c (Git 2.35, 2022-01-24) highlighted, but not the surrounding
> # commits. Exit gitk.
> make install
> PATH=~/bin:$PATH ~/bin/gitk
> # In commit 4c53a8c20f (Git 2.35.1, 2022-01-28) (2nd from the top)
> # right click GIT-VERSION-GEN and select "Highlight this only".
> # Almost every non-merge commmit will be highlighted.
>
> I think this is a change in behaviour in `git diff-tree`, but I'm
> honestly not sure what arguments gitk passes to `git diff-tree`, so
> I'm struggling to figure out what exactly changed.
>
> This issue was originally reported as a Git for Windows issue [1],
> but I can also reproduce it on Linux.
>
> [1] https://github.com/git-for-windows/git/issues/3815

gitk does something like this to find commits that touched that file
(just with more commits):

   # v2.25.3
   $ git rev-parse 4c53a8c20f ff5b7913f0 | git diff-tree -r -s --stdin GIT-VERSION-GEN
   4c53a8c20f8984adb226293a3ffd7b88c3f4ac1a

   # 244c27242f (diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16))
   $ git rev-parse 4c53a8c20f ff5b7913f0 | git diff-tree -r -s --stdin GIT-VERSION-GEN
   4c53a8c20f8984adb226293a3ffd7b88c3f4ac1a
   ff5b7913f0af62c26682b0376d0aa2d7f5d74b2e

Somewhere in diff-tree a struct diff_options is reused between commits,
and the caller expects its pathspec to be preserved, but 244c27242f
clears it.  With the path filter gone, the following commits match.

René

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: gitk regression in version 2.36.0
  2022-04-23  5:54 ` Junio C Hamano
  2022-04-23  6:05   ` Junio C Hamano
@ 2022-04-23 10:13   ` René Scharfe
  2022-04-23 16:00     ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: René Scharfe @ 2022-04-23 10:13 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Matthias Aßhauer
  Cc: git, Elijah Newren, Johannes Schindelin

Am 23.04.22 um 07:54 schrieb Junio C Hamano:
> Matthias Aßhauer <mha1993@live.de> writes:
>
>> Git 2.36.0 (or more precisely 244c27242f (diff.[ch]: have diff_free()
>> call clear_pathspec(opts.pathspec), 2022-02-16)) introduced some
>> change in behaviour that causes gitks highlight feature not to work
>> correctly anymore.
>
> I am reasonably sure that reverting that commit will be the right
> thing to do.  It is somewhat unfortunate that it would reintroduce
> resource leaks that having clear_pathspec() in a wrong place (i.e.
> diff_free()) was covering up.  We should instead need to find the
> place where a diff_opt struct goes out of scope (after being used
> for zero or more times, calling diff_free() after each iteration to
> release resources consumed per-iteration) and call clear_pathspec().

Right; a small memory leak is better than wrong output.

Finding those places is a bit complicated by diff_options often being
embedded in struct rev_info, though.

René


PS: And I need to learn to download new posts before hitting Send
(or become faster); sorry for my near-duplicate reply.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: gitk regression in version 2.36.0
  2022-04-23 10:13   ` René Scharfe
@ 2022-04-23 16:00     ` Junio C Hamano
  2022-04-25 17:45       ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2022-04-23 16:00 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, Matthias Aßhauer,
	git, Elijah Newren, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

>> I am reasonably sure that reverting that commit will be the right
>> thing to do.  It is somewhat unfortunate that it would reintroduce
>> resource leaks that having clear_pathspec() in a wrong place (i.e.
>> diff_free()) was covering up.  We should instead need to find the
>> place where a diff_opt struct goes out of scope (after being used
>> for zero or more times, calling diff_free() after each iteration to
>> release resources consumed per-iteration) and call clear_pathspec().
>
> Right; a small memory leak is better than wrong output.

Yeah, it is an absolute requirement to avoid producing wrong output,
and when producing output for 100 or 1000 commits, we should not
leak resources in proportion to the number of these commits
processed, so forgetting to call diff_free() that releases resources
that are required per diff-invocation is also a must.  Compared to
that, cleaning up the resource allocated just once and repeatedly
used while we handle these 100 or 1000 commits, while it is nice to
do so, is of much lower priority, certainly much lower than computing
the right result.

> Finding those places is a bit complicated by diff_options often being
> embedded in struct rev_info, though.

Perhaps, but we should have a resource-releasing helper for rev_info,
so that may be a good place to do so, hopefully.

Thanks

> PS: And I need to learn to download new posts before hitting Send
> (or become faster); sorry for my near-duplicate reply.

Actually this time I very much appreciated an independent validation
of my findings ;-)  Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-23 16:00     ` Junio C Hamano
@ 2022-04-25 17:45       ` Junio C Hamano
  2022-04-25 22:37         ` [PATCH] t4013: diff-tree --stdin with pathspec Junio C Hamano
  2022-04-26 10:09         ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Phillip Wood
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-04-25 17:45 UTC (permalink / raw)
  To: git
  Cc: Matthias Aßhauer, René Scharfe,
	Ævar Arnfjörð Bjarmason

This reverts commit 244c2724 (diff.[ch]: have diff_free() call
clear_pathspec(opts.pathspec), 2022-02-16).

The diff_free() call is to be used after a diffopt structure is used
to compare two sets of paths to release resources that were needed
only for that comparison, and keep the data such as pathspec that
are reused by the diffopt structure to make the next and subsequent
comparison (imagine "git log -p -<options> -- <pathspec>" where the
options and pathspec are kept in the diffopt structure, used to
compare HEAD and HEAD~, then used again when HEAD~ and HEAD~2 are
compared).

We by mistake started clearing the pathspec in diff_free(), so
programs like gitk that runs

    git diff-tree --stdin -- <pathspec>

downstream of a pipe, processing one commit after another, started
showing irrelevant comparison outside the given <pathspec> from the
second commit.

The buggy commit may have been hiding the places where diff
machinery is used only once and called diff_free() to release that
per-comparison resources, but forgetting to call clear_pathspec() to
release the resource held for the (potentially) repeated comparison,
and we eventually would want to add clear_pathspec() to clear
resources to be released after a (potentially repeated) diff session
is done (if there are similar resources other than pathspec that
need to be cleared at the end, we should then know where to clear
them), but that is "per program invocation" leak that will be
cleaned up by calling exit(3) and of lower priority than fixing this
behavior-breaking regression.

Reported-by: Matthias Aßhauer <mha1993@live.de>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 add-interactive.c | 6 +++---
 blame.c           | 3 +++

 builtin/reset.c   | 1 +
 diff.c            | 1 -
 notes-merge.c     | 2 ++
 5 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/add-interactive.c b/add-interactive.c
index e1ab39cce3..6498ae196f 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -797,14 +797,14 @@ static int run_revert(struct add_i_state *s, const struct pathspec *ps,
 	diffopt.flags.override_submodule_config = 1;
 	diffopt.repo = s->r;
 
-	if (do_diff_cache(&oid, &diffopt)) {
-		diff_free(&diffopt);
+	if (do_diff_cache(&oid, &diffopt))
 		res = -1;
-	} else {
+	else {
 		diffcore_std(&diffopt);
 		diff_flush(&diffopt);
 	}
 	free(paths);
+	clear_pathspec(&diffopt.pathspec);
 
 	if (!res && write_locked_index(s->r->index, &index_lock,
 				       COMMIT_LOCK) < 0)
diff --git a/blame.c b/blame.c
index 401990726e..206c295660 100644
--- a/blame.c
+++ b/blame.c
@@ -1403,6 +1403,7 @@ static struct blame_origin *find_origin(struct repository *r,
 		}
 	}
 	diff_flush(&diff_opts);
+	clear_pathspec(&diff_opts.pathspec);
 	return porigin;
 }
 
@@ -1446,6 +1447,7 @@ static struct blame_origin *find_rename(struct repository *r,
 		}
 	}
 	diff_flush(&diff_opts);
+	clear_pathspec(&diff_opts.pathspec);
 	return porigin;
 }
 
@@ -2326,6 +2328,7 @@ static void find_copy_in_parent(struct blame_scoreboard *sb,
 	} while (unblamed);
 	target->suspects = reverse_blame(leftover, NULL);
 	diff_flush(&diff_opts);
+	clear_pathspec(&diff_opts.pathspec);
 }
 
 /*
diff --git a/builtin/reset.c b/builtin/reset.c
index 24968dd628..b97745ee94 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -274,6 +274,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 		return 1;
 	diffcore_std(&opt);
 	diff_flush(&opt);
+	clear_pathspec(&opt.pathspec);
 
 	return 0;
 }
diff --git a/diff.c b/diff.c
index 0aef3db6e1..c862771a58 100644
--- a/diff.c
+++ b/diff.c
@@ -6345,7 +6345,6 @@ void diff_free(struct diff_options *options)
 
 	diff_free_file(options);
 	diff_free_ignore_regex(options);
-	clear_pathspec(&options->pathspec);
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/notes-merge.c b/notes-merge.c
index 7ba40cfb08..b4a3a903e8 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -175,6 +175,7 @@ static struct notes_merge_pair *diff_tree_remote(struct notes_merge_options *o,
 		       oid_to_hex(&mp->remote));
 	}
 	diff_flush(&opt);
+	clear_pathspec(&opt.pathspec);
 
 	*num_changes = len;
 	return changes;
@@ -260,6 +261,7 @@ static void diff_tree_local(struct notes_merge_options *o,
 		       oid_to_hex(&mp->local));
 	}
 	diff_flush(&opt);
+	clear_pathspec(&opt.pathspec);
 }
 
 static void check_notes_merge_worktree(struct notes_merge_options *o)
-- 
2.36.0-194-g075c326b54



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH] t4013: diff-tree --stdin with pathspec
  2022-04-25 17:45       ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Junio C Hamano
@ 2022-04-25 22:37         ` Junio C Hamano
  2022-04-26 10:09         ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Phillip Wood
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-04-25 22:37 UTC (permalink / raw)
  To: git
  Cc: Matthias Aßhauer, René Scharfe,
	Ævar Arnfjörð Bjarmason

Add a test that feeds "diff-tree --stdin <pathspec>" with list of
revisions, in a way very similar to how "gitk" drives it.  If we had
such a test, we would have caught the recent regression.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 * On top of that "revert" change.

 t/t4013-diff-various.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 750aee17ea..628b01f355 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -542,6 +542,20 @@ test_expect_success 'diff-tree --stdin with log formatting' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff-tree --stdin with pathspec' '
+	cat >expect <<-EOF &&
+	Third
+
+	dir/sub
+	Second
+
+	dir/sub
+	EOF
+	git rev-list master^ |
+	git diff-tree -r --stdin --name-only --format=%s dir >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'diff -I<regex>: setup' '
 	git checkout master &&
 	test_seq 50 >file0 &&
-- 
2.36.0-200-g3c19986797


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-25 17:45       ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Junio C Hamano
  2022-04-25 22:37         ` [PATCH] t4013: diff-tree --stdin with pathspec Junio C Hamano
@ 2022-04-26 10:09         ` Phillip Wood
  2022-04-26 13:45           ` Phillip Wood
  1 sibling, 1 reply; 16+ messages in thread
From: Phillip Wood @ 2022-04-26 10:09 UTC (permalink / raw)
  To: Junio C Hamano, git
  Cc: Matthias Aßhauer, René Scharfe,
	Ævar Arnfjörð Bjarmason

On 25/04/2022 18:45, Junio C Hamano wrote:
> This reverts commit 244c2724 (diff.[ch]: have diff_free() call
> clear_pathspec(opts.pathspec), 2022-02-16).
> 
> The diff_free() call is to be used after a diffopt structure is used
> to compare two sets of paths to release resources that were needed
> only for that comparison, and keep the data such as pathspec that
> are reused by the diffopt structure to make the next and subsequent
> comparison (imagine "git log -p -<options> -- <pathspec>" where the
> options and pathspec are kept in the diffopt structure, used to
> compare HEAD and HEAD~, then used again when HEAD~ and HEAD~2 are
> compared).
> 
> We by mistake started clearing the pathspec in diff_free(), so
> programs like gitk that runs
> 
>      git diff-tree --stdin -- <pathspec>
> 
> downstream of a pipe, processing one commit after another, started
> showing irrelevant comparison outside the given <pathspec> from the
> second commit.

I notice from the patch context that we are still calling 
diff_free_ignore_regex(options) which was added in c45dc9cf30 ("diff: 
plug memory leak from regcomp() on {log,diff} -I", 2021-02-11). I think 
that will need reverting as well as it freeing data that is needed when 
options is reused by "diff-tree --stdin" or "log -p".

Best Wishes

Phillip

> The buggy commit may have been hiding the places where diff
> machinery is used only once and called diff_free() to release that
> per-comparison resources, but forgetting to call clear_pathspec() to
> release the resource held for the (potentially) repeated comparison,
> and we eventually would want to add clear_pathspec() to clear
> resources to be released after a (potentially repeated) diff session
> is done (if there are similar resources other than pathspec that
> need to be cleared at the end, we should then know where to clear
> them), but that is "per program invocation" leak that will be
> cleaned up by calling exit(3) and of lower priority than fixing this
> behavior-breaking regression.
> 
> Reported-by: Matthias Aßhauer <mha1993@live.de>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>   add-interactive.c | 6 +++---
>   blame.c           | 3 +++
> 
>   builtin/reset.c   | 1 +
>   diff.c            | 1 -
>   notes-merge.c     | 2 ++
>   5 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/add-interactive.c b/add-interactive.c
> index e1ab39cce3..6498ae196f 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -797,14 +797,14 @@ static int run_revert(struct add_i_state *s, const struct pathspec *ps,
>   	diffopt.flags.override_submodule_config = 1;
>   	diffopt.repo = s->r;
>   
> -	if (do_diff_cache(&oid, &diffopt)) {
> -		diff_free(&diffopt);
> +	if (do_diff_cache(&oid, &diffopt))
>   		res = -1;
> -	} else {
> +	else {
>   		diffcore_std(&diffopt);
>   		diff_flush(&diffopt);
>   	}
>   	free(paths);
> +	clear_pathspec(&diffopt.pathspec);
>   
>   	if (!res && write_locked_index(s->r->index, &index_lock,
>   				       COMMIT_LOCK) < 0)
> diff --git a/blame.c b/blame.c
> index 401990726e..206c295660 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -1403,6 +1403,7 @@ static struct blame_origin *find_origin(struct repository *r,
>   		}
>   	}
>   	diff_flush(&diff_opts);
> +	clear_pathspec(&diff_opts.pathspec);
>   	return porigin;
>   }
>   
> @@ -1446,6 +1447,7 @@ static struct blame_origin *find_rename(struct repository *r,
>   		}
>   	}
>   	diff_flush(&diff_opts);
> +	clear_pathspec(&diff_opts.pathspec);
>   	return porigin;
>   }
>   
> @@ -2326,6 +2328,7 @@ static void find_copy_in_parent(struct blame_scoreboard *sb,
>   	} while (unblamed);
>   	target->suspects = reverse_blame(leftover, NULL);
>   	diff_flush(&diff_opts);
> +	clear_pathspec(&diff_opts.pathspec);
>   }
>   
>   /*
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 24968dd628..b97745ee94 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -274,6 +274,7 @@ static int read_from_tree(const struct pathspec *pathspec,
>   		return 1;
>   	diffcore_std(&opt);
>   	diff_flush(&opt);
> +	clear_pathspec(&opt.pathspec);
>   
>   	return 0;
>   }
> diff --git a/diff.c b/diff.c
> index 0aef3db6e1..c862771a58 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6345,7 +6345,6 @@ void diff_free(struct diff_options *options)
>   
>   	diff_free_file(options);
>   	diff_free_ignore_regex(options);
> -	clear_pathspec(&options->pathspec);
>   }
>   
>   void diff_flush(struct diff_options *options)
> diff --git a/notes-merge.c b/notes-merge.c
> index 7ba40cfb08..b4a3a903e8 100644
> --- a/notes-merge.c
> +++ b/notes-merge.c
> @@ -175,6 +175,7 @@ static struct notes_merge_pair *diff_tree_remote(struct notes_merge_options *o,
>   		       oid_to_hex(&mp->remote));
>   	}
>   	diff_flush(&opt);
> +	clear_pathspec(&opt.pathspec);
>   
>   	*num_changes = len;
>   	return changes;
> @@ -260,6 +261,7 @@ static void diff_tree_local(struct notes_merge_options *o,
>   		       oid_to_hex(&mp->local));
>   	}
>   	diff_flush(&opt);
> +	clear_pathspec(&opt.pathspec);
>   }
>   
>   static void check_notes_merge_worktree(struct notes_merge_options *o)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-26 10:09         ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Phillip Wood
@ 2022-04-26 13:45           ` Phillip Wood
  2022-04-26 15:16             ` Junio C Hamano
  2022-04-26 15:26             ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Phillip Wood @ 2022-04-26 13:45 UTC (permalink / raw)
  To: Junio C Hamano, git
  Cc: Matthias Aßhauer, René Scharfe,
	Ævar Arnfjörð Bjarmason

On 26/04/2022 11:09, Phillip Wood wrote:
> On 25/04/2022 18:45, Junio C Hamano wrote:
>> This reverts commit 244c2724 (diff.[ch]: have diff_free() call
>> clear_pathspec(opts.pathspec), 2022-02-16).
>>
>> The diff_free() call is to be used after a diffopt structure is used
>> to compare two sets of paths to release resources that were needed
>> only for that comparison, and keep the data such as pathspec that
>> are reused by the diffopt structure to make the next and subsequent
>> comparison (imagine "git log -p -<options> -- <pathspec>" where the
>> options and pathspec are kept in the diffopt structure, used to
>> compare HEAD and HEAD~, then used again when HEAD~ and HEAD~2 are
>> compared).
>>
>> We by mistake started clearing the pathspec in diff_free(), so
>> programs like gitk that runs
>>
>>      git diff-tree --stdin -- <pathspec>
>>
>> downstream of a pipe, processing one commit after another, started
>> showing irrelevant comparison outside the given <pathspec> from the
>> second commit.
> 
> I notice from the patch context that we are still calling 
> diff_free_ignore_regex(options) which was added in c45dc9cf30 ("diff: 
> plug memory leak from regcomp() on {log,diff} -I", 2021-02-11). I think 
> that will need reverting as well as it freeing data that is needed when 
> options is reused by "diff-tree --stdin" or "log -p".

On further inspection we have tests for "log -p -I<regex>" in t4013 and 
e900d494dc ("diff: add an API for deferred freeing", 2021-02-11) 
modified builtin/log.c to set the new no_free flag so "log" should be 
OK. However "diff-tree --stdin -p -I<regex>" is not as 
builtin/diff-tree.c is unchanged by e900d494dc so the no_free flag is 
not set which I think is the cause of the problems reported here.

I think the close_file changes in e900d494dc should be safe as far as 
"diff-tree" is concerned as it never sets that flag.

In retrospect the no_free flag is pretty ugly and fragile. If we really 
cannot do it another way at least requiring callers to set a flag when 
they want things freeing would avoid nasty surprises like this at the 
expense of leaking when the caller forgets to set it. Perhaps once 
2.36.1 is out we should step back and think about exactly what we're 
trying to achieve by removing these bounded leaks rather than annotating 
them with UNLEAK().

Best Wishes

Phillip

> Best Wishes
> 
> Phillip
> 
>> The buggy commit may have been hiding the places where diff
>> machinery is used only once and called diff_free() to release that
>> per-comparison resources, but forgetting to call clear_pathspec() to
>> release the resource held for the (potentially) repeated comparison,
>> and we eventually would want to add clear_pathspec() to clear
>> resources to be released after a (potentially repeated) diff session
>> is done (if there are similar resources other than pathspec that
>> need to be cleared at the end, we should then know where to clear
>> them), but that is "per program invocation" leak that will be
>> cleaned up by calling exit(3) and of lower priority than fixing this
>> behavior-breaking regression.
>>
>> Reported-by: Matthias Aßhauer <mha1993@live.de>
>> Helped-by: René Scharfe <l.s.r@web.de>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>   add-interactive.c | 6 +++---
>>   blame.c           | 3 +++
>>
>>   builtin/reset.c   | 1 +
>>   diff.c            | 1 -
>>   notes-merge.c     | 2 ++
>>   5 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/add-interactive.c b/add-interactive.c
>> index e1ab39cce3..6498ae196f 100644
>> --- a/add-interactive.c
>> +++ b/add-interactive.c
>> @@ -797,14 +797,14 @@ static int run_revert(struct add_i_state *s, 
>> const struct pathspec *ps,
>>       diffopt.flags.override_submodule_config = 1;
>>       diffopt.repo = s->r;
>> -    if (do_diff_cache(&oid, &diffopt)) {
>> -        diff_free(&diffopt);
>> +    if (do_diff_cache(&oid, &diffopt))
>>           res = -1;
>> -    } else {
>> +    else {
>>           diffcore_std(&diffopt);
>>           diff_flush(&diffopt);
>>       }
>>       free(paths);
>> +    clear_pathspec(&diffopt.pathspec);
>>       if (!res && write_locked_index(s->r->index, &index_lock,
>>                          COMMIT_LOCK) < 0)
>> diff --git a/blame.c b/blame.c
>> index 401990726e..206c295660 100644
>> --- a/blame.c
>> +++ b/blame.c
>> @@ -1403,6 +1403,7 @@ static struct blame_origin *find_origin(struct 
>> repository *r,
>>           }
>>       }
>>       diff_flush(&diff_opts);
>> +    clear_pathspec(&diff_opts.pathspec);
>>       return porigin;
>>   }
>> @@ -1446,6 +1447,7 @@ static struct blame_origin *find_rename(struct 
>> repository *r,
>>           }
>>       }
>>       diff_flush(&diff_opts);
>> +    clear_pathspec(&diff_opts.pathspec);
>>       return porigin;
>>   }
>> @@ -2326,6 +2328,7 @@ static void find_copy_in_parent(struct 
>> blame_scoreboard *sb,
>>       } while (unblamed);
>>       target->suspects = reverse_blame(leftover, NULL);
>>       diff_flush(&diff_opts);
>> +    clear_pathspec(&diff_opts.pathspec);
>>   }
>>   /*
>> diff --git a/builtin/reset.c b/builtin/reset.c
>> index 24968dd628..b97745ee94 100644
>> --- a/builtin/reset.c
>> +++ b/builtin/reset.c
>> @@ -274,6 +274,7 @@ static int read_from_tree(const struct pathspec 
>> *pathspec,
>>           return 1;
>>       diffcore_std(&opt);
>>       diff_flush(&opt);
>> +    clear_pathspec(&opt.pathspec);
>>       return 0;
>>   }
>> diff --git a/diff.c b/diff.c
>> index 0aef3db6e1..c862771a58 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -6345,7 +6345,6 @@ void diff_free(struct diff_options *options)
>>       diff_free_file(options);
>>       diff_free_ignore_regex(options);
>> -    clear_pathspec(&options->pathspec);
>>   }
>>   void diff_flush(struct diff_options *options)
>> diff --git a/notes-merge.c b/notes-merge.c
>> index 7ba40cfb08..b4a3a903e8 100644
>> --- a/notes-merge.c
>> +++ b/notes-merge.c
>> @@ -175,6 +175,7 @@ static struct notes_merge_pair 
>> *diff_tree_remote(struct notes_merge_options *o,
>>                  oid_to_hex(&mp->remote));
>>       }
>>       diff_flush(&opt);
>> +    clear_pathspec(&opt.pathspec);
>>       *num_changes = len;
>>       return changes;
>> @@ -260,6 +261,7 @@ static void diff_tree_local(struct 
>> notes_merge_options *o,
>>                  oid_to_hex(&mp->local));
>>       }
>>       diff_flush(&opt);
>> +    clear_pathspec(&opt.pathspec);
>>   }
>>   static void check_notes_merge_worktree(struct notes_merge_options *o)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-26 13:45           ` Phillip Wood
@ 2022-04-26 15:16             ` Junio C Hamano
  2022-04-26 15:26             ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-04-26 15:16 UTC (permalink / raw)
  To: Phillip Wood
  Cc: git, Matthias Aßhauer, René Scharfe,
	Ævar Arnfjörð Bjarmason

Phillip Wood <phillip.wood123@gmail.com> writes:

> On further inspection we have tests for "log -p -I<regex>" in t4013
> and e900d494dc ("diff: add an API for deferred freeing", 2021-02-11) 
> modified builtin/log.c to set the new no_free flag so "log" should be
> OK. However "diff-tree --stdin -p -I<regex>" is not as 
> builtin/diff-tree.c is unchanged by e900d494dc so the no_free flag is
> not set which I think is the cause of the problems reported here.

... reported here, meaning some reproduction exists?  It would be
good to have it in the test, next to the ones I added yesterday, I
think.

In any case, I think that is a much older breakage that can be left
after this "oops, where is my pathspec?" regression is dealt with.

> I think the close_file changes in e900d494dc should be safe as far as
> "diff-tree" is concerned as it never sets that flag.
>
> In retrospect the no_free flag is pretty ugly and fragile.

Yes.

> If we
> really cannot do it another way at least requiring callers to set a
> flag when they want things freeing would avoid nasty surprises like
> this at the expense of leaking when the caller forgets to set
> it. Perhaps once 2.36.1 is out we should step back and think about
> exactly what we're trying to achieve by removing these bounded leaks
> rather than annotating them with UNLEAK().

Doubly yes.

There is small per-task resources allocated that is not proportional
to the size of the task, i.e. "git log -p" may need more resource at
"peak" in a project with 100k files than in a project with 1k files,
and we do not want to leak these resources we use to compare two sets
of 100k (or 1k) files between "commit^" and "commit".  It may allocate
and deallocate more times in a project with 100k commits than in a
project with 1k commits, and we do not want to leak 100 times more
resources in the former project than the latter.  Aiming to reclaim
these resources needed proportinally to the size of the task is
absolutely a good thing to do.

But the final clean-up for the very top-level allocations that is
not proportional to the size of the task, like pathspec, regex, and
other result from parsing command line options and configuration
variables?  Using UNLEAK() to squelch the leak checker and letting
process termination to reclaim them is absolutely a no-cost solution
that is much better.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-26 13:45           ` Phillip Wood
  2022-04-26 15:16             ` Junio C Hamano
@ 2022-04-26 15:26             ` Junio C Hamano
  2022-04-26 16:11               ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2022-04-26 15:26 UTC (permalink / raw)
  To: Phillip Wood
  Cc: git, Matthias Aßhauer, René Scharfe,
	Ævar Arnfjörð Bjarmason

I wonder if we would have caught a regression like the one if we
used FREE_AND_NULL more sparingly.  For example, if we prematurely
called clear_pathspec(), the second iteration, because there is
free-and-null of pathspec->items and resetting pathspec->nr to 0,
would behave very normally as if there is no pathspec.  If we just
freed things, without NULLing them out or resetting .nr to 0, the
second iteration would try to access garbage and hopefully we will
catch a crash before such a code would have escaped the lab.

In any case, based on what I heard here, it appears that mimicking
"git log" does may probably be a better way to deal with this
regression?  As you said, all the other things diff_free() calls are
unwanted while "diff-tree --stdin" is still working, just like
"log"?

Thanks.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-26 15:26             ` Junio C Hamano
@ 2022-04-26 16:11               ` Junio C Hamano
  2022-04-27 16:42                 ` René Scharfe
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2022-04-26 16:11 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Matthias Aßhauer, René Scharfe,
	Ævar Arnfjörð Bjarmason

This only surfaced as a regression after 2.36 release, but the
breakage was already there with us for at least a year.

The diff_free() call is to be used after we completely finished with
a diffopt structure.  After "git diff A B" finishes producing
output, calling it before process exit is fine.  But there are
commands that prepares diff_options struct once, compares two sets
of paths, releases resources that were used to do the comparison,
then reuses the same diff_option struct to go on to compare the next
two sets of paths, like "git log -p".  

After "git log -p" finishes showing a single commit, calling it
before it goes on to the next commit is NOT fine.  There is a
mechanism, the .no_free member in diff_options struct, to help "git
log" to avoid calling diff_free() after showing each commit and
instead call it just one.  When the mechanism was introduced in
e900d494 (diff: add an API for deferred freeing, 2021-02-11),
however, we forgot to do the same to "diff-tree --stdin", which *is*
a moral equivalent to "git log".

During 2.36 release cycle, we started clearing the pathspec in
diff_free(), so programs like gitk that runs

    git diff-tree --stdin -- <pathspec>

downstream of a pipe, processing one commit after another, started
showing irrelevant comparison outside the given <pathspec> from the
second commit.  The same commit, by forgetting to teach the .no_free
mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed
it for over a year, presumably because it is so seldom used an
option.

But <pathspec> is a different story.  The breakage was very
prominently visible and was reported immediately after 2.36 was
released.

Fix this breakage by mimicking how "git log" utilizes the .no_free
member so that "diff-tree --stdin" behaves more similarly to "log".

Protect the fix with a few new tests.

Reported-by: Matthias Aßhauer <mha1993@live.de>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * I feel MUCH better with this than the revert, now Phillip helped
   me to get the root cause straight.  Addition of clear_pathspec()
   to diff_tree() was *not* a mistake but is quite reasonable thing
   to do.  Not using the .no_free hack in a code path that needed it
   was.



 builtin/diff-tree.c     |  3 +++
 log-tree.c              |  1 +
 t/t4013-diff-various.sh | 14 ++++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 0e0ac1f167..116097a404 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -195,6 +195,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 		int saved_dcctc = 0;
 
 		opt->diffopt.rotate_to_strict = 0;
+		opt->diffopt.no_free = 1;
 		if (opt->diffopt.detect_rename) {
 			if (!the_index.cache)
 				repo_read_index(the_repository);
@@ -217,6 +218,8 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 		}
 		opt->diffopt.degraded_cc_to_c = saved_dcctc;
 		opt->diffopt.needed_rename_limit = saved_nrl;
+		opt->diffopt.no_free = 0;
+		diff_free(&opt->diffopt);
 	}
 
 	return diff_result_code(&opt->diffopt, 0);
diff --git a/log-tree.c b/log-tree.c
index 25165e2a91..f8c18fd8b9 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -1098,6 +1098,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 	opt->loginfo = &log;
 	opt->diffopt.no_free = 1;
 
+	/* NEEDSWORK: no restoring of no_free?  Why? */
 	if (opt->line_level_traverse)
 		return line_log_print(opt, commit);
 
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 750aee17ea..628b01f355 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -542,6 +542,20 @@ test_expect_success 'diff-tree --stdin with log formatting' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff-tree --stdin with pathspec' '
+	cat >expect <<-EOF &&
+	Third
+
+	dir/sub
+	Second
+
+	dir/sub
+	EOF
+	git rev-list master^ |
+	git diff-tree -r --stdin --name-only --format=%s dir >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'diff -I<regex>: setup' '
 	git checkout master &&
 	test_seq 50 >file0 &&
-- 
2.36.0-202-g319c44b8f9


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-26 16:11               ` Junio C Hamano
@ 2022-04-27 16:42                 ` René Scharfe
  2022-04-27 18:06                   ` René Scharfe
  0 siblings, 1 reply; 16+ messages in thread
From: René Scharfe @ 2022-04-27 16:42 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason, git
  Cc: Phillip Wood, Matthias Aßhauer

Am 26.04.22 um 18:11 schrieb Junio C Hamano:
> This only surfaced as a regression after 2.36 release, but the
> breakage was already there with us for at least a year.
>
> The diff_free() call is to be used after we completely finished with
> a diffopt structure.  After "git diff A B" finishes producing
> output, calling it before process exit is fine.  But there are
> commands that prepares diff_options struct once, compares two sets
> of paths, releases resources that were used to do the comparison,
> then reuses the same diff_option struct to go on to compare the next
> two sets of paths, like "git log -p".
>
> After "git log -p" finishes showing a single commit, calling it
> before it goes on to the next commit is NOT fine.  There is a
> mechanism, the .no_free member in diff_options struct, to help "git
> log" to avoid calling diff_free() after showing each commit and
> instead call it just one.  When the mechanism was introduced in
> e900d494 (diff: add an API for deferred freeing, 2021-02-11),
> however, we forgot to do the same to "diff-tree --stdin", which *is*
> a moral equivalent to "git log".
>
> During 2.36 release cycle, we started clearing the pathspec in
> diff_free(), so programs like gitk that runs
>
>     git diff-tree --stdin -- <pathspec>
>
> downstream of a pipe, processing one commit after another, started
> showing irrelevant comparison outside the given <pathspec> from the
> second commit.  The same commit, by forgetting to teach the .no_free
> mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed
> it for over a year, presumably because it is so seldom used an
> option.
>
> But <pathspec> is a different story.  The breakage was very
> prominently visible and was reported immediately after 2.36 was
> released.
>
> Fix this breakage by mimicking how "git log" utilizes the .no_free
> member so that "diff-tree --stdin" behaves more similarly to "log".
>
> Protect the fix with a few new tests.

We could check where reused diffopts caused a pathspec loss at runtime,
like in the patch below.  Then we "just" need to get the relevant test
coverage to 100% and we'll find them all.

With your patch on top of main, "make test" passes for me.  With the
patch below added as well I get failures in three test scripts:

t3427-rebase-subtree.sh                          (Wstat: 256 Tests: 3 Failed: 2)
  Failed tests:  2-3
  Non-zero exit status: 1
t4014-format-patch.sh                            (Wstat: 256 Tests: 190 Failed: 1)
  Failed test:  73
  Non-zero exit status: 1
t9350-fast-export.sh                             (Wstat: 256 Tests: 50 Failed: 3)
  Failed tests:  30, 32, 43
  Non-zero exit status: 1

The format-patch is a bit surprising to me because it already sets
no_free conditionally.  t4014 is successful if no_free is set in all
cases, so the condition seems to be too narrow -- but I don't understand
it.  Didn't look at the other cases.

---
 diff.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/diff.c b/diff.c
index ef7159968b..b7c837aca8 100644
--- a/diff.c
+++ b/diff.c
@@ -6455,9 +6455,16 @@ static void diff_free_ignore_regex(struct diff_options *options)

 void diff_free(struct diff_options *options)
 {
+	static struct diff_options *prev_options_with_pathspec;
+
 	if (options->no_free)
 		return;

+	if (prev_options_with_pathspec == options && !options->pathspec.nr)
+		BUG("reused struct diff_options, potentially lost pathspec");
+	if (options->pathspec.nr)
+		prev_options_with_pathspec = options;
+
 	diff_free_file(options);
 	diff_free_ignore_regex(options);
 	clear_pathspec(&options->pathspec);
--
2.35.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-27 16:42                 ` René Scharfe
@ 2022-04-27 18:06                   ` René Scharfe
  2022-04-27 20:03                     ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: René Scharfe @ 2022-04-27 18:06 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason, git
  Cc: Phillip Wood, Matthias Aßhauer

Am 27.04.22 um 18:42 schrieb René Scharfe:
> Am 26.04.22 um 18:11 schrieb Junio C Hamano:
>> This only surfaced as a regression after 2.36 release, but the
>> breakage was already there with us for at least a year.
>>
>> The diff_free() call is to be used after we completely finished with
>> a diffopt structure.  After "git diff A B" finishes producing
>> output, calling it before process exit is fine.  But there are
>> commands that prepares diff_options struct once, compares two sets
>> of paths, releases resources that were used to do the comparison,
>> then reuses the same diff_option struct to go on to compare the next
>> two sets of paths, like "git log -p".
>>
>> After "git log -p" finishes showing a single commit, calling it
>> before it goes on to the next commit is NOT fine.  There is a
>> mechanism, the .no_free member in diff_options struct, to help "git
>> log" to avoid calling diff_free() after showing each commit and
>> instead call it just one.  When the mechanism was introduced in
>> e900d494 (diff: add an API for deferred freeing, 2021-02-11),
>> however, we forgot to do the same to "diff-tree --stdin", which *is*
>> a moral equivalent to "git log".
>>
>> During 2.36 release cycle, we started clearing the pathspec in
>> diff_free(), so programs like gitk that runs
>>
>>     git diff-tree --stdin -- <pathspec>
>>
>> downstream of a pipe, processing one commit after another, started
>> showing irrelevant comparison outside the given <pathspec> from the
>> second commit.  The same commit, by forgetting to teach the .no_free
>> mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed
>> it for over a year, presumably because it is so seldom used an
>> option.
>>
>> But <pathspec> is a different story.  The breakage was very
>> prominently visible and was reported immediately after 2.36 was
>> released.
>>
>> Fix this breakage by mimicking how "git log" utilizes the .no_free
>> member so that "diff-tree --stdin" behaves more similarly to "log".
>>
>> Protect the fix with a few new tests.
>
> We could check where reused diffopts caused a pathspec loss at runtime,
> like in the patch below.  Then we "just" need to get the relevant test
> coverage to 100% and we'll find them all.
>
> With your patch on top of main, "make test" passes for me.  With the
> patch below added as well I get failures in three test scripts:
>
> t3427-rebase-subtree.sh                          (Wstat: 256 Tests: 3 Failed: 2)
>   Failed tests:  2-3
>   Non-zero exit status: 1
> t4014-format-patch.sh                            (Wstat: 256 Tests: 190 Failed: 1)
>   Failed test:  73
>   Non-zero exit status: 1
> t9350-fast-export.sh                             (Wstat: 256 Tests: 50 Failed: 3)
>   Failed tests:  30, 32, 43
>   Non-zero exit status: 1
>
> The format-patch is a bit surprising to me because it already sets
> no_free conditionally.  t4014 is successful if no_free is set in all
> cases, so the condition seems to be too narrow -- but I don't understand
> it.  Didn't look at the other cases.
>
> ---
>  diff.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/diff.c b/diff.c
> index ef7159968b..b7c837aca8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6455,9 +6455,16 @@ static void diff_free_ignore_regex(struct diff_options *options)
>
>  void diff_free(struct diff_options *options)
>  {
> +	static struct diff_options *prev_options_with_pathspec;
> +
>  	if (options->no_free)
>  		return;
>
> +	if (prev_options_with_pathspec == options && !options->pathspec.nr)
> +		BUG("reused struct diff_options, potentially lost pathspec");
> +	if (options->pathspec.nr)
> +		prev_options_with_pathspec = options;

This can report a false positive if a diffopt is reused with different
pathspecs, and one of them is empty (match all).  Which could be countered
by using a fresh diffopt every time (e.g. pushing it into a loop).

> +
>  	diff_free_file(options);
>  	diff_free_ignore_regex(options);
>  	clear_pathspec(&options->pathspec);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] 2.36 gitk/diff-tree --stdin regression fix
  2022-04-27 18:06                   ` René Scharfe
@ 2022-04-27 20:03                     ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-04-27 20:03 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, git, Phillip Wood,
	Matthias Aßhauer

René Scharfe <l.s.r@web.de> writes:

>> +	if (prev_options_with_pathspec == options && !options->pathspec.nr)
>> +		BUG("reused struct diff_options, potentially lost pathspec");
>> +	if (options->pathspec.nr)
>> +		prev_options_with_pathspec = options;
>
> This can report a false positive if a diffopt is reused with different
> pathspecs, and one of them is empty (match all).  Which could be countered
> by using a fresh diffopt every time (e.g. pushing it into a loop).

The only use case to reset pathspec of a diffopt during iteration I
can think of is the hacky[*] version of "git log --follow" where the
pathspec is swapped when a rename of a single path being followed is
detected.

    Side note: hacky because the way it swaps a single pathspec upon
    seeing one rename means it does not work in a mergy-branchy
    history where one branch renames and there are still commits
    that need to be explored on the other branch that had the path
    under its original name.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-04-27 20:04 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-23  5:25 gitk regression in version 2.36.0 Matthias Aßhauer
2022-04-23  5:54 ` Junio C Hamano
2022-04-23  6:05   ` Junio C Hamano
2022-04-23 10:13   ` René Scharfe
2022-04-23 16:00     ` Junio C Hamano
2022-04-25 17:45       ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Junio C Hamano
2022-04-25 22:37         ` [PATCH] t4013: diff-tree --stdin with pathspec Junio C Hamano
2022-04-26 10:09         ` [PATCH] 2.36 gitk/diff-tree --stdin regression fix Phillip Wood
2022-04-26 13:45           ` Phillip Wood
2022-04-26 15:16             ` Junio C Hamano
2022-04-26 15:26             ` Junio C Hamano
2022-04-26 16:11               ` Junio C Hamano
2022-04-27 16:42                 ` René Scharfe
2022-04-27 18:06                   ` René Scharfe
2022-04-27 20:03                     ` Junio C Hamano
2022-04-23  9:27 ` gitk regression in version 2.36.0 René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).