git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug: `git show` honors path filters only for the first commit
@ 2022-04-30  2:22 Daniel Li
  2022-04-30  4:59 ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Li @ 2022-04-30  2:22 UTC (permalink / raw)
  To: git

git version: 2.36.0
OS: macOS Monterey 12.2.1
Installed via: homebrew

Bug summary: When `git show` is invoked with more than one commit, it
only respects the `<path>` filters for the first commit.

This is best illustrated with an example. In git's own source repo,
invoke `git show --oneline --name-only ecc7c8841d 961b130d20
d3115660b4` and observe the files these commits touch (the `--oneline
--name-only` arguments are to make this example short; they're not
relevant to the bug):

    $ git show --oneline --numstat ecc7c8841d 961b130d20 d3115660b4
    ecc7c8841d repo_read_index: add config to expect files outside
sparse patterns
    2       0       Documentation/config.txt
    27      0       Documentation/config/sparse.txt
    1       0       cache.h
    14      0       config.c
    1       0       environment.c
    2       1       sparse-index.c
    19      0       t/t1090-sparse-checkout-scope.sh
    961b130d20 branch: add --recurse-submodules option for branch creation
    3       0       Documentation/config/advice.txt
    26      11      Documentation/config/submodule.txt
    18      1       Documentation/git-branch.txt
    1       0       advice.c
    1       0       advice.h
    141     0       branch.c
    29      0       branch.h
    38      6       builtin/branch.c
    38      0       builtin/submodule--helper.c
    61      0       submodule-config.c
    34      0       submodule-config.h
    9       2       submodule.c
    3       0       submodule.h
    292     0       t/t3207-branch-submodule.sh
    d3115660b4 branch: add flags and config to inherit tracking
    2       1       Documentation/config/branch.txt
    17      7       Documentation/git-branch.txt
    1       1       Documentation/git-checkout.txt
    1       1       Documentation/git-switch.txt
    42      7       branch.c
    2       1       branch.h
    4       2       builtin/branch.c
    4       2       builtin/checkout.c
    3       0       config.c
    16      0       parse-options-cb.c
    2       0       parse-options.h
    10      1       t/t2017-checkout-orphan.sh
    23      0       t/t2027-checkout-track.sh
    28      0       t/t2060-switch.sh
    33      0       t/t3200-branch.sh
    17      0       t/t7201-co.sh

Now invoke the same command but filtered on files under the
`Documentation/` directory. Observe that this path is only respected
for the first commit:

    $ git show --oneline --numstat ecc7c8841d 961b130d20 d3115660b4 --
Documentation
    ecc7c8841d repo_read_index: add config to expect files outside
sparse patterns
    2       0       Documentation/config.txt
    27      0       Documentation/config/sparse.txt
    961b130d20 branch: add --recurse-submodules option for branch creation
    3       0       Documentation/config/advice.txt
    26      11      Documentation/config/submodule.txt
    18      1       Documentation/git-branch.txt
    1       0       advice.c
    1       0       advice.h
    141     0       branch.c
    29      0       branch.h
    38      6       builtin/branch.c
    38      0       builtin/submodule--helper.c
    61      0       submodule-config.c
    34      0       submodule-config.h
    9       2       submodule.c
    3       0       submodule.h
    292     0       t/t3207-branch-submodule.sh
    d3115660b4 branch: add flags and config to inherit tracking
    2       1       Documentation/config/branch.txt
    17      7       Documentation/git-branch.txt
    1       1       Documentation/git-checkout.txt
    1       1       Documentation/git-switch.txt
    42      7       branch.c
    2       1       branch.h
    4       2       builtin/branch.c
    4       2       builtin/checkout.c
    3       0       config.c
    16      0       parse-options-cb.c
    2       0       parse-options.h
    10      1       t/t2017-checkout-orphan.sh
    23      0       t/t2027-checkout-track.sh
    28      0       t/t2060-switch.sh
    33      0       t/t3200-branch.sh
    17      0       t/t7201-co.sh

The expected output should be:

    $ git show --oneline --numstat ecc7c8841d 961b130d20 d3115660b4 --
Documentation
    ecc7c8841d repo_read_index: add config to expect files outside
sparse patterns
    2       0       Documentation/config.txt
    27      0       Documentation/config/sparse.txt
    961b130d20 branch: add --recurse-submodules option for branch creation
    3       0       Documentation/config/advice.txt
    26      11      Documentation/config/submodule.txt
    18      1       Documentation/git-branch.txt
    d3115660b4 branch: add flags and config to inherit tracking
    2       1       Documentation/config/branch.txt
    17      7       Documentation/git-branch.txt
    1       1       Documentation/git-checkout.txt
    1       1       Documentation/git-switch.txt

Bonus: Surprisingly, the `<path>` *is* respected for commits that
don't have any files satisfying the `<path>`. For example, the
following command correctly excludes commit `f36d4f8316` from the
output because it doesn't contain any files under `Documentation/`:

    $ git show --oneline --numstat ecc7c8841d f36d4f8316 961b130d20
d3115660b4 -- Documentation
    ecc7c8841d repo_read_index: add config to expect files outside
sparse patterns
    2       0       Documentation/config.txt
    27      0       Documentation/config/sparse.txt
    961b130d20 branch: add --recurse-submodules option for branch creation
    3       0       Documentation/config/advice.txt
    26      11      Documentation/config/submodule.txt
    18      1       Documentation/git-branch.txt
    1       0       advice.c
    1       0       advice.h
    141     0       branch.c
    29      0       branch.h
    38      6       builtin/branch.c
    38      0       builtin/submodule--helper.c
    61      0       submodule-config.c
    34      0       submodule-config.h
    9       2       submodule.c
    3       0       submodule.h
    292     0       t/t3207-branch-submodule.sh
    d3115660b4 branch: add flags and config to inherit tracking
    2       1       Documentation/config/branch.txt
    17      7       Documentation/git-branch.txt
    1       1       Documentation/git-checkout.txt
    1       1       Documentation/git-switch.txt
    42      7       branch.c
    2       1       branch.h
    4       2       builtin/branch.c
    4       2       builtin/checkout.c
    3       0       config.c
    16      0       parse-options-cb.c
    2       0       parse-options.h
    10      1       t/t2017-checkout-orphan.sh
    23      0       t/t2027-checkout-track.sh
    28      0       t/t2060-switch.sh
    33      0       t/t3200-branch.sh
    17      0       t/t7201-co.sh

Cheers,

Dan Li

--

Daniel Li
dan@danielyli.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: `git show` honors path filters only for the first commit
  2022-04-30  2:22 Bug: `git show` honors path filters only for the first commit Daniel Li
@ 2022-04-30  4:59 ` Junio C Hamano
  2022-04-30  5:29   ` [PATCH] 2.36 show regression fix Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2022-04-30  4:59 UTC (permalink / raw)
  To: Daniel Li, Ævar Arnfjörð Bjarmason, Phillip Wood,
	René Scharfe
  Cc: git

Daniel Li <dan@danielyli.com> writes:

> git version: 2.36.0
> OS: macOS Monterey 12.2.1
> Installed via: homebrew

I think this is the same regression as the recently talked about
"diff-tree --stdin" aka "gitk" regression.

https://lore.kernel.org/git/xmqq7d7bsu2n.fsf@gitster.g/

e900d494 (diff: add an API for deferred freeing, 2021-02-11), broke
cmd_log_walk(), and we started to lose some setting that was parsed
from the command line and stored in the diff_options structure after
cmd_log_walk() runs just once.  But "git show A B" runs the function
once for each commit.   A recent change in 2.36.0 made it worse by
adding <pathspec> to the set of setting that gets lost after
cmd_log_walk() runs once.

Thanks for a report.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] 2.36 show regression fix
  2022-04-30  4:59 ` Junio C Hamano
@ 2022-04-30  5:29   ` Junio C Hamano
  2022-04-30 10:32     ` [PATCH] 2.36 format-patch " René Scharfe
  2022-04-30 14:31     ` [PATCH] 2.36 fast-export regression fix René Scharfe
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2022-04-30  5:29 UTC (permalink / raw)
  To: Daniel Li
  Cc: Ævar Arnfjörð Bjarmason, Phillip Wood,
	René Scharfe, git

This only surfaced as a regression after 2.36 release, but the
breakage was already there with us for at least a year.

e900d494 (diff: add an API for deferred freeing, 2021-02-11)
introduced a mechanism to delay freeing resources held in
diff_options struct that need to be kept as long as the struct will
be reused to compute diff.  "git log -p" was taught to utilize the
mechanism but it was done with an incorrect assumption that the
underlying helper function, cmd_log_walk(), is called only once,
and it is OK to do the freeing at the end of it.

Alas, for "git show A B", the function is called once for each
commit given, so it is not OK to free the resources until we finish
calling it for all the commits given from the command line.

During 2.36 release cycle, we started clearing the <pathspec> as
part of this freeing, which made the bug a lot more visible.

Fix this breakage by tweaking how cmd_log_walk() frees the resources
at the end and using a variant of it that does not immediately free
the resources to show each commit object from the command line in
"git show".

Protect the fix with a few new tests.

Reported-by: Daniel Li <dan@danielyli.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

Junio C Hamano <gitster@pobox.com> writes:

> Daniel Li <dan@danielyli.com> writes:
>
>> git version: 2.36.0
>> OS: macOS Monterey 12.2.1
>> Installed via: homebrew
>
> I think this is the same regression as the recently talked about
> "diff-tree --stdin" aka "gitk" regression.
>
> https://lore.kernel.org/git/xmqq7d7bsu2n.fsf@gitster.g/
>
> e900d494 (diff: add an API for deferred freeing, 2021-02-11), broke
> cmd_log_walk(), and we started to lose some setting that was parsed
> from the command line and stored in the diff_options structure after
> cmd_log_walk() runs just once.  But "git show A B" runs the function
> once for each commit.   A recent change in 2.36.0 made it worse by
> adding <pathspec> to the set of setting that gets lost after
> cmd_log_walk() runs once.
>
> Thanks for a report.

 builtin/log.c           | 23 ++++++++++++++++++-----
 t/t4013-diff-various.sh | 19 +++++++++++++++++++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index c211d66d1d..6696c4cfd0 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -417,7 +417,7 @@ static void finish_early_output(struct rev_info *rev)
 	show_early_header(rev, "done", n);
 }
 
-static int cmd_log_walk(struct rev_info *rev)
+static int cmd_log_walk_no_free(struct rev_info *rev)
 {
 	struct commit *commit;
 	int saved_nrl = 0;
@@ -444,7 +444,6 @@ static int cmd_log_walk(struct rev_info *rev)
 	 * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to
 	 * retain that state information if replacing rev->diffopt in this loop
 	 */
-	rev->diffopt.no_free = 1;
 	while ((commit = get_revision(rev)) != NULL) {
 		if (!log_tree_commit(rev, commit) && rev->max_count >= 0)
 			/*
@@ -469,8 +468,6 @@ static int cmd_log_walk(struct rev_info *rev)
 	}
 	rev->diffopt.degraded_cc_to_c = saved_dcctc;
 	rev->diffopt.needed_rename_limit = saved_nrl;
-	rev->diffopt.no_free = 0;
-	diff_free(&rev->diffopt);
 
 	if (rev->remerge_diff) {
 		tmp_objdir_destroy(rev->remerge_objdir);
@@ -484,6 +481,17 @@ static int cmd_log_walk(struct rev_info *rev)
 	return diff_result_code(&rev->diffopt, 0);
 }
 
+static int cmd_log_walk(struct rev_info *rev)
+{
+	int retval;
+
+	rev->diffopt.no_free = 1;
+	retval = cmd_log_walk_no_free(rev);
+	rev->diffopt.no_free = 0;
+	diff_free(&rev->diffopt);
+	return retval;
+}
+
 static int git_log_config(const char *var, const char *value, void *cb)
 {
 	const char *slot_name;
@@ -680,6 +688,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 
 	count = rev.pending.nr;
 	objects = rev.pending.objects;
+	rev.diffopt.no_free = 1;
 	for (i = 0; i < count && !ret; i++) {
 		struct object *o = objects[i].item;
 		const char *name = objects[i].name;
@@ -725,12 +734,16 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			rev.pending.nr = rev.pending.alloc = 0;
 			rev.pending.objects = NULL;
 			add_object_array(o, name, &rev.pending);
-			ret = cmd_log_walk(&rev);
+			ret = cmd_log_walk_no_free(&rev);
 			break;
 		default:
 			ret = error(_("unknown type: %d"), o->type);
 		}
 	}
+
+	rev.diffopt.no_free = 0;
+	diff_free(&rev.diffopt);
+
 	free(objects);
 	return ret;
 }
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 750aee17ea..7a44d5d595 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -542,6 +542,25 @@ test_expect_success 'diff-tree --stdin with log formatting' '
 	test_cmp expect actual
 '
 
+test_expect_success 'show A B ... -- <pathspec>' '
+	# side touches dir/sub, file0, and file3
+	# master^ touches dir/sub, and file1
+	# master^^ touches dir/sub, file0, and file2
+	git show --name-only --format="<%s>" side master^ master^^ -- dir >actual &&
+	cat >expect <<-\EOF &&
+	<Side>
+
+	dir/sub
+	<Third>
+
+	dir/sub
+	<Second>
+
+	dir/sub
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'diff -I<regex>: setup' '
 	git checkout master &&
 	test_seq 50 >file0 &&
-- 
2.36.0-256-g547811d5a1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] 2.36 format-patch regression fix
  2022-04-30  5:29   ` [PATCH] 2.36 show regression fix Junio C Hamano
@ 2022-04-30 10:32     ` René Scharfe
  2022-04-30 16:32       ` Carlo Marcelo Arenas Belón
  2022-04-30 14:31     ` [PATCH] 2.36 fast-export regression fix René Scharfe
  1 sibling, 1 reply; 10+ messages in thread
From: René Scharfe @ 2022-04-30 10:32 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Phillip Wood, Daniel Li, git

e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
that mechanism mandatory.

git format-patch only sets no_free when --output is given, causing it to
forget pathspecs after the first commit.  Set no_free unconditionally
instead.

The existing test was unable to detect this breakage because it checks
stderr for the absence of a certain string, but format-patch writes to
stdout.  Also the test was not checking the case of one commit modifying
multiple files and a pathspec limiting the diff.  Replace it with a more
thorough one.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/log.c           |  9 ++-------
 t/t4014-format-patch.sh | 33 +++++++++++++++++++++++++++++++--
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index c211d66d1d..9acc130594 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1883,6 +1883,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 	rev.diff = 1;
 	rev.max_parents = 1;
 	rev.diffopt.flags.recursive = 1;
+	rev.diffopt.no_free = 1;
 	rev.subject_prefix = fmt_patch_subject_prefix;
 	memset(&s_r_opt, 0, sizeof(s_r_opt));
 	s_r_opt.def = "HEAD";
@@ -2008,13 +2009,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)

 	if (use_stdout) {
 		setup_pager();
-	} else if (rev.diffopt.close_file) {
-		/*
-		 * The diff code parsed --output; it has already opened the
-		 * file, but we must instruct it not to close after each diff.
-		 */
-		rev.diffopt.no_free = 1;
-	} else {
+	} else if (!rev.diffopt.close_file) {
 		int saved;

 		if (!output_directory)
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 7dc5a5c736..fbec8ad2ef 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -926,11 +926,40 @@ test_expect_success 'format-patch --numstat should produce a patch' '
 '

 test_expect_success 'format-patch -- <path>' '
-	git format-patch main..side -- file 2>error &&
-	! grep "Use .--" error
+	rm -f *.patch &&
+	git checkout -b pathspec main &&
+
+	echo file_a 1 >file_a &&
+	echo file_b 1 >file_b &&
+	git add file_a file_b &&
+	git commit -m pathspec_initial &&
+
+	echo file_a 2 >>file_a &&
+	git add file_a &&
+	git commit -m pathspec_a &&
+
+	echo file_b 2 >>file_b &&
+	git add file_b &&
+	git commit -m pathspec_b &&
+
+	echo file_a 3 >>file_a &&
+	echo file_b 3 >>file_b &&
+	git add file_a file_b &&
+	git commit -m pathspec_ab &&
+
+	cat >expect <<-\EOF &&
+	0001-pathspec_initial.patch
+	0002-pathspec_a.patch
+	0003-pathspec_ab.patch
+	EOF
+
+	git format-patch main..pathspec -- file_a >output &&
+	test_cmp expect output &&
+	! grep file_b *.patch
 '

 test_expect_success 'format-patch --ignore-if-in-upstream HEAD' '
+	git checkout side &&
 	git format-patch --ignore-if-in-upstream HEAD
 '

--
2.35.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] 2.36 fast-export regression fix
  2022-04-30  5:29   ` [PATCH] 2.36 show regression fix Junio C Hamano
  2022-04-30 10:32     ` [PATCH] 2.36 format-patch " René Scharfe
@ 2022-04-30 14:31     ` René Scharfe
  2022-04-30 20:50       ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: René Scharfe @ 2022-04-30 14:31 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Phillip Wood, git, Daniel Li

e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
that mechanism mandatory.

git fast-export doesn't set no_free, so path limiting stopped working
after the first commit.  Set the flag and add a basic test to make sure
only changes to the specified files are exported.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/fast-export.c  | 1 +
 t/t9350-fast-export.sh | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index a7d72697fb..1355b5a96d 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1261,6 +1261,7 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 	revs.diffopt.format_callback = show_filemodify;
 	revs.diffopt.format_callback_data = &paths_of_changed_objects;
 	revs.diffopt.flags.recursive = 1;
+	revs.diffopt.no_free = 1;
 	while ((commit = get_revision(&revs)))
 		handle_commit(commit, &revs, &paths_of_changed_objects);

diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 7b7a18dd2c..fc99703fc5 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -500,6 +500,13 @@ test_expect_success 'path limiting with import-marks does not lose unmodified fi
 	grep file0 actual
 '

+test_expect_success 'path limiting works' '
+	git fast-export simple -- file >actual &&
+	sed -ne "s/^M .* //p" <actual | sort -u >actual.files &&
+	echo file >expect &&
+	test_cmp expect actual.files
+'
+
 test_expect_success 'avoid corrupt stream with non-existent mark' '
 	test_create_repo avoid_non_existent_mark &&
 	(
--
2.35.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] 2.36 format-patch regression fix
  2022-04-30 10:32     ` [PATCH] 2.36 format-patch " René Scharfe
@ 2022-04-30 16:32       ` Carlo Marcelo Arenas Belón
  2022-05-01  9:35         ` René Scharfe
  0 siblings, 1 reply; 10+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2022-04-30 16:32 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Daniel Li, git

On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote:
> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
> that mechanism mandatory.
> 
> git format-patch only sets no_free when --output is given, causing it to
> forget pathspecs after the first commit.  Set no_free unconditionally
> instead.

I remember when I saw the first commit long ago, and thought; well that is
very round about way to reintroduce the UNLEAK removal that might have made
it visible.

Haven't looked too closely, but considering that we were warned[1] the
interface was hard to use and might cause problems later and it did.

wouldn't it a better and more secure solution to UNLEAK and remove all this
code, at least until it could be refactored cleanly, of course?

Carlo

[1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 2.36 fast-export regression fix
  2022-04-30 14:31     ` [PATCH] 2.36 fast-export regression fix René Scharfe
@ 2022-04-30 20:50       ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2022-04-30 20:50 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, Phillip Wood, git,
	Daniel Li

René Scharfe <l.s.r@web.de> writes:

> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
> that mechanism mandatory.
>
> git fast-export doesn't set no_free, so path limiting stopped working
> after the first commit.  Set the flag and add a basic test to make sure
> only changes to the specified files are exported.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---

Both format-patch fix and this one look good to me.

We have UNLEAK() in format-patch so that the fix will not result in
additional false positives from the leak checker.  But this one, we
didn't have UNLEAK() so this may start tickling the leak checker
again.  We may need to add UNLEAK() at some point.

Stopping a leak checker from complaining about a known singleton
leak that is at the top-level (i.e. does not become bigger with the
data) by spending extra cycles is pointless, compared to using
UNLEAK() to do the same, and it is doubly misguided if it breaks the
behaviour X-<.

Will queue both, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 2.36 format-patch regression fix
  2022-04-30 16:32       ` Carlo Marcelo Arenas Belón
@ 2022-05-01  9:35         ` René Scharfe
  2022-05-20 15:23           ` the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix) Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 10+ messages in thread
From: René Scharfe @ 2022-05-01  9:35 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Daniel Li, git

Am 30.04.22 um 18:32 schrieb Carlo Marcelo Arenas Belón:
> On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote:
>> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
>> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
>> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
>> that mechanism mandatory.
>>
>> git format-patch only sets no_free when --output is given, causing it to
>> forget pathspecs after the first commit.  Set no_free unconditionally
>> instead.
>
> I remember when I saw the first commit long ago, and thought; well that is
> very round about way to reintroduce the UNLEAK removal that might have made
> it visible.
>
> Haven't looked too closely, but considering that we were warned[1] the
> interface was hard to use and might cause problems later and it did.
>
> wouldn't it a better and more secure solution to UNLEAK and remove all this
> code, at least until it could be refactored cleanly, of course?

Silently self-destructing pathspecs are a safety hazard indeed.

no_free also affects freeing ignore_regex and parseopts, and even
closing the output file.  I don't know about the file, but leaking the
first two is harmless.  So removing the flag is safe as long as we make
sure the output file is closed as needed.

A safe diff_free() would only be called a particular diffopt once, when
it's no longer needed.  It could check for reuse by setting a flag the
first time, like in the patch below.  1426 tests in 163 test scripts
fail for me with it applied on top of the regression fixes from this
thread.

Removing the diff_free() calls from diff.c::diff_flush() and
log-tree.c::log_tree_commit() reduces this to just one or two in t7527
(seems to be flaky).  Perhaps this is still salvageable?

>
> Carlo
>
> [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/


---
 diff.c | 3 +++
 diff.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/diff.c b/diff.c
index ef7159968b..01296829b5 100644
--- a/diff.c
+++ b/diff.c
@@ -6458,10 +6458,13 @@ void diff_free(struct diff_options *options)
 	if (options->no_free)
 		return;

+	if (options->is_dead)
+		BUG("double diff_free() on %p", (void *)options);
 	diff_free_file(options);
 	diff_free_ignore_regex(options);
 	clear_pathspec(&options->pathspec);
 	FREE_AND_NULL(options->parseopts);
+	options->is_dead = 1;
 }

 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 8ae18e5ab1..c31d32ba19 100644
--- a/diff.h
+++ b/diff.h
@@ -398,6 +398,7 @@ struct diff_options {
 	struct strmap *additional_path_headers;

 	int no_free;
+	int is_dead;
 };

 unsigned diff_filter_bit(char status);
--
2.35.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix)
  2022-05-01  9:35         ` René Scharfe
@ 2022-05-20 15:23           ` Ævar Arnfjörð Bjarmason
  2022-05-20 17:23             ` the state of diff_free() and release_revisions() Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-20 15:23 UTC (permalink / raw)
  To: René Scharfe
  Cc: Carlo Marcelo Arenas Belón, Junio C Hamano, Phillip Wood,
	Daniel Li, git


On Sun, May 01 2022, René Scharfe wrote:

> Am 30.04.22 um 18:32 schrieb Carlo Marcelo Arenas Belón:
>> On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote:
>>> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
>>> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
>>> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
>>> that mechanism mandatory.
>>>
>>> git format-patch only sets no_free when --output is given, causing it to
>>> forget pathspecs after the first commit.  Set no_free unconditionally
>>> instead.
>>
>> I remember when I saw the first commit long ago, and thought; well that is
>> very round about way to reintroduce the UNLEAK removal that might have made
>> it visible.
>>
>> Haven't looked too closely, but considering that we were warned[1] the
>> interface was hard to use and might cause problems later and it did.
>>
>> wouldn't it a better and more secure solution to UNLEAK and remove all this
>> code, at least until it could be refactored cleanly, of course?
>
> Silently self-destructing pathspecs are a safety hazard indeed.
>
> no_free also affects freeing ignore_regex and parseopts, and even
> closing the output file.  I don't know about the file, but leaking the
> first two is harmless.  So removing the flag is safe as long as we make
> sure the output file is closed as needed.
>
> A safe diff_free() would only be called a particular diffopt once, when
> it's no longer needed.  It could check for reuse by setting a flag the
> first time, like in the patch below.  1426 tests in 163 test scripts
> fail for me with it applied on top of the regression fixes from this
> thread.
>
> Removing the diff_free() calls from diff.c::diff_flush() and
> log-tree.c::log_tree_commit() reduces this to just one or two in t7527
> (seems to be flaky).  Perhaps this is still salvageable?

Thanks both for handling this, and sorry that I was away at the time.

AFAICT the current status in this area is that with 2cc712324d5 (Merge
branch 'rs/fast-export-pathspec-fix', 2022-05-04) and 5048b20d1c2 (Merge
branch 'rs/format-patch-pathspec-fix', 2022-05-04) merged the known bugs
related to this have been fixed, along with 3da993f2e63 (Merge branch
'jc/diff-tree-stdin-fix', 2022-04-28).

"This" being my e900d494dcf (diff: add an API for deferred freeing,
2021-02-11), and 244c27242f4 (diff.[ch]: have diff_free() call
clear_pathspec(opts.pathspec), 2022-02-16) for the diff-tree case.

Not coincidentally around the same time my ab/plug-leak-in-revisions got
un-marked for "next" from [1] to [2], and I'm looking for a path forward
for this whole thing...

1. https://lore.kernel.org/git/xmqqbkwyz78z.fsf@gitster.g/
2. https://lore.kernel.org/git/xmqqwnfcskw2.fsf@gitster.g/

>> [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/
>
>
> ---
>  diff.c | 3 +++
>  diff.h | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/diff.c b/diff.c
> index ef7159968b..01296829b5 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6458,10 +6458,13 @@ void diff_free(struct diff_options *options)
>  	if (options->no_free)
>  		return;
>
> +	if (options->is_dead)
> +		BUG("double diff_free() on %p", (void *)options);
>  	diff_free_file(options);
>  	diff_free_ignore_regex(options);
>  	clear_pathspec(&options->pathspec);
>  	FREE_AND_NULL(options->parseopts);
> +	options->is_dead = 1;
>  }
>
>  void diff_flush(struct diff_options *options)
> diff --git a/diff.h b/diff.h
> index 8ae18e5ab1..c31d32ba19 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -398,6 +398,7 @@ struct diff_options {
>  	struct strmap *additional_path_headers;
>
>  	int no_free;
> +	int is_dead;
>  };
>
>  unsigned diff_filter_bit(char status);

Yes, that's quite scary. It shows that in general diff_free() isn't
reentrant-safe, but that we do call it repeatedly again.

However if we patch it like this instead we can see that (gulp!) we just
barely putter along, according to our test coverage at least. I.e. we
don't end up calling the parts of it that would be unsafe to call again:
	
	diff --git a/diff.c b/diff.c
	index ef7159968b6..0fe8bc5fade 100644
	--- a/diff.c
	+++ b/diff.c
	@@ -6438,14 +6438,23 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
	 
	 static void diff_free_file(struct diff_options *options)
	 {
	-	if (options->close_file)
	+	if (options->close_file) {
	+		if (options->is_dead)
	+			BUG("double diff_free() on %p", (void *)options);
	 		fclose(options->file);
	+	}
	 }
	 
	 static void diff_free_ignore_regex(struct diff_options *options)
	 {
	 	int i;
	 
	+	if (!options->ignore_regex_nr && !options->ignore_regex)
	+		return;
	+
	+	if (options->is_dead)
	+		BUG("double diff_free() on %p", (void *)options);
	+
	 	for (i = 0; i < options->ignore_regex_nr; i++) {
	 		regfree(options->ignore_regex[i]);
	 		free(options->ignore_regex[i]);
	@@ -6462,6 +6471,7 @@ void diff_free(struct diff_options *options)
	 	diff_free_ignore_regex(options);
	 	clear_pathspec(&options->pathspec);
	 	FREE_AND_NULL(options->parseopts);
	+	options->is_dead = 1;
	 }
	 
	 void diff_flush(struct diff_options *options)
	@@ -6560,7 +6570,6 @@ void diff_flush(struct diff_options *options)
	 free_queue:
	 	free(q->queue);
	 	DIFF_QUEUE_CLEAR(q);
	-	diff_free(options);
	 
	 	/*
	 	 * Report the content-level differences with HAS_CHANGES;
	diff --git a/diff.h b/diff.h
	index 8ae18e5ab1e..c31d32ba192 100644
	--- a/diff.h
	+++ b/diff.h
	@@ -398,6 +398,7 @@ struct diff_options {
	 	struct strmap *additional_path_headers;
	 
	 	int no_free;
	+	int is_dead;
	 };
	 
	 unsigned diff_filter_bit(char status);

I'd really like to fix this properly, but AFAICT the best way to do that
is to:

 A. Get ab/plug-leak-in-revisions merged down
 B. Fix diff_free() on top of that

Before I knew of these bugs I'd already written patches to get rid of
that whole "no_free" business. In retrospect it was completely the wrong
thing to do, but in hindsight something like it was needed to fix those
leaks as long as we didn't have a revisions_release().

I.e. the tricky cases where I ended up needing to set "no_free" are ones
where all the complexity neatly goes away once we start releasing the
"struct rev_info" properly, as it contains the data we'd like to
diff_free() at the end.

How does that plan sound, and is there anything I've missed?

I could also re-roll ab/plug-leak-in-revisions to include a fix that
makes it safe in the interim, i.e.:

	diff --git a/diff.c b/diff.c
	index ef7159968b6..2bc7ee81e4e 100644
	--- a/diff.c
	+++ b/diff.c
	@@ -6438,8 +6438,12 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
	 
	 static void diff_free_file(struct diff_options *options)
	 {
	-	if (options->close_file)
	+	if (options->close_file) {
	 		fclose(options->file);
	+
	+		options->file = NULL;
	+		options->close_file = 0;
	+	}
	 }
	 
	 static void diff_free_ignore_regex(struct diff_options *options)
	@@ -6450,7 +6454,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
	 		regfree(options->ignore_regex[i]);
	 		free(options->ignore_regex[i]);
	 	}
	-	free(options->ignore_regex);
	+	options->ignore_regex_nr = 0;
	+	FREE_AND_NULL(options->ignore_regex);
	 }
	 
	 void diff_free(struct diff_options *options)

But as long as we're not adding new API users of it until the follow-up
after ab/plug-leak-in-revisions we should also be safe for now, but
perhaps it's prudent to do it anyway.

I *could* potentially produce a shorter series than
ab/plug-leak-in-revisions to narrowly try to remove "no_free" from
diff.c first, but it would basically need to first introduce a
release_revisions(), and without the other revisions API leaks being
fixed testing it would be much tricker. I'd really prefer not to do
that.

How does all that sound?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: the state of diff_free() and release_revisions()
  2022-05-20 15:23           ` the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix) Ævar Arnfjörð Bjarmason
@ 2022-05-20 17:23             ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2022-05-20 17:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: René Scharfe, Carlo Marcelo Arenas Belón, Phillip Wood,
	Daniel Li, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> AFAICT the current status in this area is that with 2cc712324d5 (Merge
> branch 'rs/fast-export-pathspec-fix', 2022-05-04) and 5048b20d1c2 (Merge
> branch 'rs/format-patch-pathspec-fix', 2022-05-04) merged the known bugs
> related to this have been fixed, along with 3da993f2e63 (Merge branch
> 'jc/diff-tree-stdin-fix', 2022-04-28).

There is another possible known breakage around "diff -I --cc"
reported, no?

https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/

> Not coincidentally around the same time my ab/plug-leak-in-revisions got
> un-marked for "next" from [1] to [2], and I'm looking for a path forward
> for this whole thing...

I think this was not because it had known bugs but only because we
wanted to avoid distractions and too much churns in the tree.

I think it is a good time to reactivate it; Phillip Wood took a
serious look at the latest round, if I am reading [*] correctly.

* https://lore.kernel.org/git/2cc2d026-1496-1ed9-838b-6fdf8cfba285@gmail.com/

Thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-05-20 17:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-30  2:22 Bug: `git show` honors path filters only for the first commit Daniel Li
2022-04-30  4:59 ` Junio C Hamano
2022-04-30  5:29   ` [PATCH] 2.36 show regression fix Junio C Hamano
2022-04-30 10:32     ` [PATCH] 2.36 format-patch " René Scharfe
2022-04-30 16:32       ` Carlo Marcelo Arenas Belón
2022-05-01  9:35         ` René Scharfe
2022-05-20 15:23           ` the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix) Ævar Arnfjörð Bjarmason
2022-05-20 17:23             ` the state of diff_free() and release_revisions() Junio C Hamano
2022-04-30 14:31     ` [PATCH] 2.36 fast-export regression fix René Scharfe
2022-04-30 20:50       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).