git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] revisions API: fix more memory leaks
@ 2022-07-13 13:10 Ævar Arnfjörð Bjarmason
  2022-07-13 13:10 ` [PATCH 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
                   ` (7 more replies)
  0 siblings, 8 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13 13:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

This short series fixes a couple of mistakes in 1-2/6 that snuck in in
the revisions_release() series, those compliment the already queued
jk/diff-files-cleanup-fix.

The rest of this fixes a few tricky remaining memory leaks, allowing
us to mark the tests seen in the diffstat as passing with
SANITIZE=leak.

The "git show" leak in 4/6 in particular showed up in a lot of places
in our test suite, so fixing it really helps us to accelerate towards
marking more entire tests as leak-free.

Passing CI for this series can be found at:
https://github.com/avar/git/tree/avar/follow-up-release-revisions-fixes

Ævar Arnfjörð Bjarmason (6):
  bisect.c: add missing "goto" for release_revisions()
  test-fast-rebase helper: use release_revisions() (again)
  log: make the intent of cmd_show()'s "rev.pending" juggling clearer
  log: fix common "rev.pending" memory leak in "git show"
  bisect.c: partially fix bisect_rev_setup() memory leak
  revisions API: don't leak memory on argv elements that need free()-ing

 bisect.c                                      | 28 +++++++++++--------
 builtin/log.c                                 | 22 ++++++++-------
 builtin/submodule--helper.c                   |  5 +++-
 remote.c                                      |  5 +++-
 revision.c                                    |  2 ++
 revision.h                                    |  3 +-
 t/helper/test-fast-rebase.c                   |  2 --
 t/t0203-gettext-setlocale-sanity.sh           |  1 +
 t/t1020-subdirectory.sh                       |  1 +
 t/t2020-checkout-detach.sh                    |  1 +
 t/t3307-notes-man.sh                          |  1 +
 t/t3920-crlf-messages.sh                      |  2 ++
 t/t4069-remerge-diff.sh                       |  1 +
 t/t7007-show.sh                               |  1 +
 ...3-pre-commit-and-pre-merge-commit-hooks.sh |  1 +
 t/t9122-git-svn-author.sh                     |  1 -
 t/t9162-git-svn-dcommit-interactive.sh        |  1 -
 17 files changed, 50 insertions(+), 28 deletions(-)

-- 
2.37.0.932.g7b7031e73bc


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 1/6] bisect.c: add missing "goto" for release_revisions()
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
@ 2022-07-13 13:10 ` Ævar Arnfjörð Bjarmason
  2022-07-13 13:10 ` [PATCH 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13 13:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Add a missing "goto cleanup", this fixes a bug in
f196c1e908d (revisions API users: use release_revisions() needing
REV_INFO_INIT, 2022-04-13).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bisect.c b/bisect.c
index b63669cc9d7..421470bfa59 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1054,7 +1054,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 		 */
 		res = error_if_skipped_commits(tried, NULL);
 		if (res < 0)
-			return res;
+			goto cleanup;
 		printf(_("%s was both %s and %s\n"),
 		       oid_to_hex(current_bad_oid),
 		       term_good,
-- 
2.37.0.932.g7b7031e73bc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 2/6] test-fast-rebase helper: use release_revisions() (again)
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
  2022-07-13 13:10 ` [PATCH 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
@ 2022-07-13 13:10 ` Ævar Arnfjörð Bjarmason
  2022-07-13 13:10 ` [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13 13:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Fix a bug in 0139c58ab95 (revisions API users: add "goto cleanup" for
release_revisions(), 2022-04-13), in that commit a release_revisions()
call was added to this function, but it never did anything due to this
TODO memset() added in fe1a21d5267 (fast-rebase: demonstrate
merge-ort's API via new test-tool command, 2020-10-29).

Simply removing the memset() will fix the "cmdline" which can be seen
when running t5520-pull.sh.

This sort of thing could be detected automatically with a rule similar
to the unused.cocci merged in 7fa60d2a5b6 (Merge branch
'ab/cocci-unused' into next, 2022-07-11). The following rule on top
would catch the case being fixed here:

	@@
	type T;
	identifier I;
	identifier REL1 =~ "^[a-z_]*_(release|reset|clear|free)$";
	identifier REL2 =~ "^(release|clear|free)_[a-z_]*$";
	@@

	- memset(\( I \| &I \), 0, ...);
	  ... when != \( I \| &I \)
	(
	  \( REL1 \| REL2 \)( \( I \| &I \), ...);
	|
	  \( REL1 \| REL2 \)( \( &I \| I \) );
	)
	  ... when != \( I \| &I \)

That rule should arguably use only &I, not I (as we might be passed a
pointer). He distinction would matter if anyone cared about the
side-effects of a memset() followed by release() of a pointer to a
variable passed into the function.

As such a pattern would be at best very confusing, and most likely
point to buggy code as in this case, the above rule is probably fine
as-is.

But as this rule only found one such bug in the entire codebase let's
not add it to contrib/coccinelle/unused.cocci for now, we can always
dig it up in the future if it's deemed useful.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/helper/test-fast-rebase.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 4e5553e2024..45665ec19a5 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -184,8 +184,6 @@ int cmd__fast_rebase(int argc, const char **argv)
 		last_picked_commit = commit;
 		last_commit = create_commit(result.tree, commit, last_commit);
 	}
-	/* TODO: There should be some kind of rev_info_free(&revs) call... */
-	memset(&revs, 0, sizeof(revs));
 
 	merge_switch_to_result(&merge_opt, head_tree, &result, 1, !result.clean);
 
-- 
2.37.0.932.g7b7031e73bc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
  2022-07-13 13:10 ` [PATCH 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
  2022-07-13 13:10 ` [PATCH 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
@ 2022-07-13 13:10 ` Ævar Arnfjörð Bjarmason
  2022-07-18 14:51   ` Jeff King
  2022-07-13 13:10 ` [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13 13:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Adjust code added in 5d7eeee2ac6 (git-show: grok blobs, trees and
tags, too, 2006-12-14) to use the "memcpy a &blank" idiom introduced
in 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
macro, 2021-07-01).

Now the types in play are guaranteed to correspond, i.e. we used "int"
here for the "count" before, but the corresponding "nr" is an
"unsigned int". By using a "blank" object we almost entirely bypass
that, we'll only need to declare our own "unsigned int i".

There are no functional changes here aside from potential overflow
guard rails, the structure only has these three members ("nr", "alloc"
and "objects"), but now we're obviously future-proof against assuming
that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/log.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 88a5e98875a..e0f40798d45 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -668,10 +668,12 @@ static void show_setup_revisions_tweak(struct rev_info *rev,
 int cmd_show(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info rev;
-	struct object_array_entry *objects;
+	struct object_array blank = OBJECT_ARRAY_INIT;
+	struct object_array cp = OBJECT_ARRAY_INIT;
+	unsigned int i;
 	struct setup_revision_opt opt;
 	struct pathspec match_all;
-	int i, count, ret = 0;
+	int ret = 0;
 
 	init_log_defaults();
 	git_config(git_log_config, NULL);
@@ -698,12 +700,11 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	if (!rev.no_walk)
 		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
 
-	count = rev.pending.nr;
-	objects = rev.pending.objects;
+	memcpy(&cp, &rev.pending, sizeof(rev.pending));
 	rev.diffopt.no_free = 1;
-	for (i = 0; i < count && !ret; i++) {
-		struct object *o = objects[i].item;
-		const char *name = objects[i].name;
+	for (i = 0; i < cp.nr && !ret; i++) {
+		struct object *o = cp.objects[i].item;
+		const char *name = cp.objects[i].name;
 		switch (o->type) {
 		case OBJ_BLOB:
 			ret = show_blob_object(&o->oid, &rev, name);
@@ -726,7 +727,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			if (!o)
 				ret = error(_("could not read object %s"),
 					    oid_to_hex(oid));
-			objects[i].item = o;
+			cp.objects[i].item = o;
 			i--;
 			break;
 		}
@@ -743,8 +744,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			rev.shown_one = 1;
 			break;
 		case OBJ_COMMIT:
-			rev.pending.nr = rev.pending.alloc = 0;
-			rev.pending.objects = NULL;
+			memcpy(&rev.pending, &blank, sizeof(rev.pending));
 			add_object_array(o, name, &rev.pending);
 			ret = cmd_log_walk_no_free(&rev);
 			break;
-- 
2.37.0.932.g7b7031e73bc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show"
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2022-07-13 13:10 ` [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
@ 2022-07-13 13:10 ` Ævar Arnfjörð Bjarmason
  2022-07-18 14:57   ` Jeff King
  2022-07-13 13:10 ` [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13 13:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Fix a very common memory leak introduced in 5d7eeee2ac6 (git-show: grok blobs,
trees and tags, too, 2006-12-14).

When "git show" displays commits it needs to temporarily clobbers the
"rev.pending" array, but by doing so we'll fail to
release_revisions(), as we have for most other uses of "struct
rev_info" since 2da81d1efb0 (Merge branch 'ab/plug-leak-in-revisions',
2022-06-07).

In the preceding commit this code was made to use a more extendable
pattern, which we can now complete. Once we've clobbered our
"rev.pending" and invoked "cmd_log_walk_no_free()" we need to
"object_array_clear()" our newly created "rev.pending" to avoid
leaking the memory related to the one member array we've created.

But more importantly we need to set "rev.pending" back to the original
we squirreled away in the "cp" variable, so that we'll make use of the
release_revisions() added in f6bfea0ad01 (revisions API users: use
release_revisions() in builtin/log.c, 2022-04-13). In f6bfea0ad01 this
memory leak was noted as an outstanding TODO, but it's now been fixed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/log.c                                    | 2 ++
 t/t0203-gettext-setlocale-sanity.sh              | 1 +
 t/t1020-subdirectory.sh                          | 1 +
 t/t3307-notes-man.sh                             | 1 +
 t/t3920-crlf-messages.sh                         | 2 ++
 t/t4069-remerge-diff.sh                          | 1 +
 t/t7007-show.sh                                  | 1 +
 t/t7503-pre-commit-and-pre-merge-commit-hooks.sh | 1 +
 t/t9122-git-svn-author.sh                        | 1 -
 t/t9162-git-svn-dcommit-interactive.sh           | 1 -
 10 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index e0f40798d45..77ec256a8ae 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -747,6 +747,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			memcpy(&rev.pending, &blank, sizeof(rev.pending));
 			add_object_array(o, name, &rev.pending);
 			ret = cmd_log_walk_no_free(&rev);
+			object_array_clear(&rev.pending);
+			memcpy(&rev.pending, &cp, sizeof(rev.pending));
 			break;
 		default:
 			ret = error(_("unknown type: %d"), o->type);
diff --git a/t/t0203-gettext-setlocale-sanity.sh b/t/t0203-gettext-setlocale-sanity.sh
index 0ce1f22eff6..86cff324ff1 100755
--- a/t/t0203-gettext-setlocale-sanity.sh
+++ b/t/t0203-gettext-setlocale-sanity.sh
@@ -5,6 +5,7 @@
 
 test_description="The Git C functions aren't broken by setlocale(3)"
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./lib-gettext.sh
 
 test_expect_success 'git show a ISO-8859-1 commit under C locale' '
diff --git a/t/t1020-subdirectory.sh b/t/t1020-subdirectory.sh
index 9fdbb2af80e..45eef9457fe 100755
--- a/t/t1020-subdirectory.sh
+++ b/t/t1020-subdirectory.sh
@@ -6,6 +6,7 @@
 test_description='Try various core-level commands in subdirectory.
 '
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-read-tree.sh
 
diff --git a/t/t3307-notes-man.sh b/t/t3307-notes-man.sh
index 1aa366a410e..ae316502c45 100755
--- a/t/t3307-notes-man.sh
+++ b/t/t3307-notes-man.sh
@@ -4,6 +4,7 @@ test_description='Examples from the git-notes man page
 
 Make sure the manual is not full of lies.'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 0276edbe3d3..4c661d4d54a 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 LIB_CRLF_BRANCHES=""
diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
index 35f94957fce..9e7cac68b1c 100755
--- a/t/t4069-remerge-diff.sh
+++ b/t/t4069-remerge-diff.sh
@@ -2,6 +2,7 @@
 
 test_description='remerge-diff handling'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 # This test is ort-specific
diff --git a/t/t7007-show.sh b/t/t7007-show.sh
index d6cc69e0f2c..f908a4d1abc 100755
--- a/t/t7007-show.sh
+++ b/t/t7007-show.sh
@@ -2,6 +2,7 @@
 
 test_description='git show'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success setup '
diff --git a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
index ad1eb64ba0d..aa004b70a8d 100755
--- a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
+++ b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
@@ -5,6 +5,7 @@ test_description='pre-commit and pre-merge-commit hooks'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'root commit' '
diff --git a/t/t9122-git-svn-author.sh b/t/t9122-git-svn-author.sh
index 527ba3d2932..0fc289ae0f0 100755
--- a/t/t9122-git-svn-author.sh
+++ b/t/t9122-git-svn-author.sh
@@ -2,7 +2,6 @@
 
 test_description='git svn authorship'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 test_expect_success 'setup svn repository' '
diff --git a/t/t9162-git-svn-dcommit-interactive.sh b/t/t9162-git-svn-dcommit-interactive.sh
index e2aa8ed88a9..b3ce033a0d3 100755
--- a/t/t9162-git-svn-dcommit-interactive.sh
+++ b/t/t9162-git-svn-dcommit-interactive.sh
@@ -4,7 +4,6 @@
 
 test_description='git svn dcommit --interactive series'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 test_expect_success 'initialize repo' '
-- 
2.37.0.932.g7b7031e73bc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2022-07-13 13:10 ` [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
@ 2022-07-13 13:10 ` Ævar Arnfjörð Bjarmason
  2022-07-18 15:05   ` Jeff King
  2022-07-13 13:10 ` [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13 13:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Partially fix the memory leak noted in in 8a534b61241 (bisect: use
argv_array API, 2011-09-13), which added the "XXX" comment seen in the
context. We can partially fix it by having the bisect_rev_setup()
function take a "struct strvec", rather than constructing it.

As the comment notes we need to keep the construct "rev_argv" around
while the "struct rev_info" is around, which as seen in the newly
added "strvec_clear()" calls here we do after "release_revisions()".

This "partially" fixes the memory leak because we're leaking the "--"
added to the "rev_argv" here still, which will be addressed in a
subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/bisect.c b/bisect.c
index 421470bfa59..6afb98be7a1 100644
--- a/bisect.c
+++ b/bisect.c
@@ -648,11 +648,11 @@ static struct commit_list *managed_skipped(struct commit_list *list,
 }
 
 static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
+			     struct strvec *rev_argv,
 			     const char *prefix,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
 {
-	struct strvec rev_argv = STRVEC_INIT;
 	int i;
 
 	repo_init_revisions(r, revs, prefix);
@@ -660,16 +660,16 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 	revs->commit_format = CMIT_FMT_UNSPECIFIED;
 
 	/* rev_argv.argv[0] will be ignored by setup_revisions */
-	strvec_push(&rev_argv, "bisect_rev_setup");
-	strvec_pushf(&rev_argv, bad_format, oid_to_hex(current_bad_oid));
+	strvec_push(rev_argv, "bisect_rev_setup");
+	strvec_pushf(rev_argv, bad_format, oid_to_hex(current_bad_oid));
 	for (i = 0; i < good_revs.nr; i++)
-		strvec_pushf(&rev_argv, good_format,
+		strvec_pushf(rev_argv, good_format,
 			     oid_to_hex(good_revs.oid + i));
-	strvec_push(&rev_argv, "--");
+	strvec_push(rev_argv, "--");
 	if (read_paths)
-		read_bisect_paths(&rev_argv);
+		read_bisect_paths(rev_argv);
 
-	setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
+	setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
 	/* XXX leak rev_argv, as "revs" may still be pointing to it */
 }
 
@@ -873,10 +873,11 @@ static enum bisect_error check_merge_bases(int rev_nr, struct commit **rev, int
 static int check_ancestors(struct repository *r, int rev_nr,
 			   struct commit **rev, const char *prefix)
 {
+	struct strvec rev_argv = STRVEC_INIT;
 	struct rev_info revs;
 	int res;
 
-	bisect_rev_setup(r, &revs, prefix, "^%s", "%s", 0);
+	bisect_rev_setup(r, &revs, &rev_argv, prefix, "^%s", "%s", 0);
 
 	bisect_common(&revs);
 	res = (revs.commits != NULL);
@@ -885,6 +886,7 @@ static int check_ancestors(struct repository *r, int rev_nr,
 	clear_commit_marks_many(rev_nr, rev, ALL_REV_FLAGS);
 
 	release_revisions(&revs);
+	strvec_clear(&rev_argv);
 	return res;
 }
 
@@ -1010,6 +1012,7 @@ void read_bisect_terms(const char **read_bad, const char **read_good)
  */
 enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 {
+	struct strvec rev_argv = STRVEC_INIT;
 	struct rev_info revs = REV_INFO_INIT;
 	struct commit_list *tried;
 	int reaches = 0, all = 0, nr, steps;
@@ -1037,7 +1040,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	if (res)
 		goto cleanup;
 
-	bisect_rev_setup(r, &revs, prefix, "%s", "^%s", 1);
+	bisect_rev_setup(r, &revs, &rev_argv, prefix, "%s", "^%s", 1);
 
 	revs.first_parent_only = !!(bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY);
 	revs.limited = 1;
@@ -1112,6 +1115,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	res = bisect_checkout(bisect_rev, no_checkout);
 cleanup:
 	release_revisions(&revs);
+	strvec_clear(&rev_argv);
 	return res;
 }
 
-- 
2.37.0.932.g7b7031e73bc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2022-07-13 13:10 ` [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
@ 2022-07-13 13:10 ` Ævar Arnfjörð Bjarmason
  2022-07-18 15:11   ` Jeff King
  2022-07-18 15:14 ` [PATCH 0/6] revisions API: fix more memory leaks Jeff King
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
  7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13 13:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Add a "free_removed_argv_elements" member to "struct
setup_revision_opt", and use it to fix several memory leaks, e.g. the
one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array
API, 2011-09-13).

We have various memory leaks in APIs that take and munge "const
char **argv", e.g. parse_options(). Sometimes these APIs are given the
"argv" we get to the "main" function, in which case we don't leak
memory, but other times we're giving it the "v" member of a "struct
strvec" we created.

There's several potential ways to fix those sort of leaks, we could
add a "nodup" mode to "struct strvec", which would work for the cases
where we push constant strings to it. But that wouldn't work as soon
as we used strvec_pushf(), or otherwise needed to duplicate or create
a string for that "struct strvec".

Let's instead make it the responsibility of the revisions API. If it's
going to clobber elements of argv it can also free() them, which it
will now do if instructed to do so via "free_removed_argv_elements".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c                    | 6 ++++--
 builtin/submodule--helper.c | 5 ++++-
 remote.c                    | 5 ++++-
 revision.c                  | 2 ++
 revision.h                  | 3 ++-
 t/t2020-checkout-detach.sh  | 1 +
 6 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/bisect.c b/bisect.c
index 6afb98be7a1..38b3891f3a6 100644
--- a/bisect.c
+++ b/bisect.c
@@ -653,6 +653,9 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
 {
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	int i;
 
 	repo_init_revisions(r, revs, prefix);
@@ -669,8 +672,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 	if (read_paths)
 		read_bisect_paths(rev_argv);
 
-	setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
-	/* XXX leak rev_argv, as "revs" may still be pointing to it */
+	setup_revisions(rev_argv->nr, rev_argv->v, revs, &opt);
 }
 
 static void bisect_common(struct rev_info *revs)
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c597df7528e..c4e47c1b15a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1114,6 +1114,9 @@ static int compute_summary_module_list(struct object_id *head_oid,
 {
 	struct strvec diff_args = STRVEC_INIT;
 	struct rev_info rev;
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	struct module_cb_list list = MODULE_CB_LIST_INIT;
 	int ret = 0;
 
@@ -1131,7 +1134,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
 	init_revisions(&rev, info->prefix);
 	rev.abbrev = 0;
 	precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
-	setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
+	setup_revisions(diff_args.nr, diff_args.v, &rev, &opt);
 	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = submodule_summary_callback;
 	rev.diffopt.format_callback_data = &list;
diff --git a/remote.c b/remote.c
index b19e3a2f015..f05b50b1dd1 100644
--- a/remote.c
+++ b/remote.c
@@ -2169,6 +2169,9 @@ static int stat_branch_pair(const char *branch_name, const char *base,
 	struct object_id oid;
 	struct commit *ours, *theirs;
 	struct rev_info revs;
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	struct strvec argv = STRVEC_INIT;
 
 	/* Cannot stat if what we used to build on no longer exists */
@@ -2203,7 +2206,7 @@ static int stat_branch_pair(const char *branch_name, const char *base,
 	strvec_push(&argv, "--");
 
 	repo_init_revisions(the_repository, &revs, NULL);
-	setup_revisions(argv.nr, argv.v, &revs, NULL);
+	setup_revisions(argv.nr, argv.v, &revs, &opt);
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
diff --git a/revision.c b/revision.c
index 211352795c5..3fa84247d33 100644
--- a/revision.c
+++ b/revision.c
@@ -2748,6 +2748,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			const char *arg = argv[i];
 			if (strcmp(arg, "--"))
 				continue;
+			if (opt && opt->free_removed_argv_elements)
+				free((char *)argv[i]);
 			argv[i] = NULL;
 			argc = i;
 			if (argv[i + 1])
diff --git a/revision.h b/revision.h
index e576845cdd1..bb91e7ed914 100644
--- a/revision.h
+++ b/revision.h
@@ -375,7 +375,8 @@ struct setup_revision_opt {
 	const char *def;
 	void (*tweak)(struct rev_info *, struct setup_revision_opt *);
 	unsigned int	assume_dashdash:1,
-			allow_exclude_promisor_objects:1;
+			allow_exclude_promisor_objects:1,
+			free_removed_argv_elements:1;
 	unsigned revarg_opt;
 };
 int setup_revisions(int argc, const char **argv, struct rev_info *revs,
diff --git a/t/t2020-checkout-detach.sh b/t/t2020-checkout-detach.sh
index bc46713a43e..2eab6474f8d 100755
--- a/t/t2020-checkout-detach.sh
+++ b/t/t2020-checkout-detach.sh
@@ -4,6 +4,7 @@ test_description='checkout into detached HEAD state'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 check_detached () {
-- 
2.37.0.932.g7b7031e73bc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer
  2022-07-13 13:10 ` [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
@ 2022-07-18 14:51   ` Jeff King
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-18 14:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Wed, Jul 13, 2022 at 03:10:32PM +0200, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/builtin/log.c b/builtin/log.c
> index 88a5e98875a..e0f40798d45 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -668,10 +668,12 @@ static void show_setup_revisions_tweak(struct rev_info *rev,
>  int cmd_show(int argc, const char **argv, const char *prefix)
>  {
>  	struct rev_info rev;
> -	struct object_array_entry *objects;
> +	struct object_array blank = OBJECT_ARRAY_INIT;
> +	struct object_array cp = OBJECT_ARRAY_INIT;

I'm not sure what "cp" stands for. Maybe just "pending" would be a more
descriptive name?

> @@ -698,12 +700,11 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  	if (!rev.no_walk)
>  		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
>  
> -	count = rev.pending.nr;
> -	objects = rev.pending.objects;
> +	memcpy(&cp, &rev.pending, sizeof(rev.pending));

OK, so now "cp" is a copy of the original "rev.pending". But that
original is still in place. If I understand the intent of this code
correctly, we'd never want to look at it again. The only place that
should do so is the call to cmd_log_walk_no_free():

>  		case OBJ_COMMIT:
> -			rev.pending.nr = rev.pending.alloc = 0;
> -			rev.pending.objects = NULL;
> +			memcpy(&rev.pending, &blank, sizeof(rev.pending));
>  			add_object_array(o, name, &rev.pending);
>  			ret = cmd_log_walk_no_free(&rev);
>  			break;

but both before and after your patch, we clear rev.pending before doing
so.

So perhaps it would make the intent more clear if we fully transferred
ownership out of the rev struct? I.e., something like:

  memcpy(&cp, &rev.pending, sizeof(rev.pending));
  memcpy(&rev.pending, &blank, sizeof(rev.pending));
  for (i = 0; i < cp.nr; i++) {
     ...stuff...
  }
  object_array_clear(&cp);

> @@ -726,7 +727,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  			if (!o)
>  				ret = error(_("could not read object %s"),
>  					    oid_to_hex(oid));
> -			objects[i].item = o;
> +			cp.objects[i].item = o;
>  			i--;
>  			break;
>  		}

Wow, this "overwrite the current item and back up one" strategy is truly
horrific. But it's neither here nor there for your series; you don't
make it any worse, and because "item" is not a free-able pointer, you
don't need to worry about it for leaking.

I suspect the cleaner way of doing it would be to push all of this
switch logic into a function, and then call the function recursively
when dereferencing a tag. But let's put that aside so as not to distract
from your goal.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show"
  2022-07-13 13:10 ` [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
@ 2022-07-18 14:57   ` Jeff King
  2022-07-18 15:08     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-07-18 14:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Wed, Jul 13, 2022 at 03:10:33PM +0200, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/builtin/log.c b/builtin/log.c
> index e0f40798d45..77ec256a8ae 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -747,6 +747,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  			memcpy(&rev.pending, &blank, sizeof(rev.pending));
>  			add_object_array(o, name, &rev.pending);
>  			ret = cmd_log_walk_no_free(&rev);
> +			object_array_clear(&rev.pending);
> +			memcpy(&rev.pending, &cp, sizeof(rev.pending));
>  			break;

OK, so we clear the fake one-entry pending array we just created. That
make sense.

And then we have to restore rev.pending, because we don't otherwise
clean up cp. That makes sense in the context of your previous patch, but
if you take my suggestion there to separate "cp" from rev.pending
entirely, and clean it up explicitly, then this second memcpy() goes
away. IMHO that leaves the resulting code much easier to follow, as the
lifetimes are clearly distinct.

Two alternatives I briefly considered that don't work:

  - you can't just leave the one-item rev.pending in place to get
    cleaned up by release_revisions(), because we may show multiple
    commits. We have to clean each one as we go.

  - you can't just object_array_clear(&rev.pending) _before_ clearing
    it, because in the initial state it's still a copy of "cp", and thus
    would hose the array we're iterating over.

This whole "reuse rev, but tweak its pending array" feels a bit sketchy
to me in general. But it has been this way since 2006, and anyway is
completely out of scope for your series, so let's hold our nose and
continue. ;)

> diff --git a/t/t7007-show.sh b/t/t7007-show.sh
> index d6cc69e0f2c..f908a4d1abc 100755
> --- a/t/t7007-show.sh
> +++ b/t/t7007-show.sh
> @@ -2,6 +2,7 @@
>  
>  test_description='git show'
>  
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh

Lots of stuff now passes, which is good, but....

> diff --git a/t/t9122-git-svn-author.sh b/t/t9122-git-svn-author.sh
> index 527ba3d2932..0fc289ae0f0 100755
> --- a/t/t9122-git-svn-author.sh
> +++ b/t/t9122-git-svn-author.sh
> @@ -2,7 +2,6 @@
>  
>  test_description='git svn authorship'
>  
> -TEST_FAILS_SANITIZE_LEAK=true
>  . ./lib-git-svn.sh

...this one (and t9162) don't anymore? Are these hunks a mistake? If
not, this feels like something that needs to go into the commit message.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak
  2022-07-13 13:10 ` [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
@ 2022-07-18 15:05   ` Jeff King
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-18 15:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Wed, Jul 13, 2022 at 03:10:34PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Partially fix the memory leak noted in in 8a534b61241 (bisect: use
> argv_array API, 2011-09-13), which added the "XXX" comment seen in the
> context. We can partially fix it by having the bisect_rev_setup()
> function take a "struct strvec", rather than constructing it.
> 
> As the comment notes we need to keep the construct "rev_argv" around
> while the "struct rev_info" is around, which as seen in the newly
> added "strvec_clear()" calls here we do after "release_revisions()".

This seems like not too bad a solution on its face. The caller is
responsible for keeping rev_argv around as long as rev is valid.

This would be simpler if setup_revisions() made its own copy of the argv
strings and dropped them via release_revisions(). That has a small cost,
but I suspect it doesn't matter much in practice.

I thought that would be too much of a pain to add, but it kind of looks
like you went there anyway in the next patch. I'll comment further
there.

> This "partially" fixes the memory leak because we're leaking the "--"
> added to the "rev_argv" here still, which will be addressed in a
> subsequent commit.

This part confused me, because the "--" is in the strvec which gets
cleared. I'd guess that the revisions code just overwrites it with NULL,
and we lose access to it. But it should become clear in the next commit.

> -	setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
> +	setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
>  	/* XXX leak rev_argv, as "revs" may still be pointing to it */

I wouldn't really call this a "leak" anymore; the caller now owns it.
Maybe nit-picking, since this comment goes away later.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show"
  2022-07-18 14:57   ` Jeff King
@ 2022-07-18 15:08     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-18 15:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano


On Mon, Jul 18 2022, Jeff King wrote:

> On Wed, Jul 13, 2022 at 03:10:33PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> diff --git a/builtin/log.c b/builtin/log.c
>> index e0f40798d45..77ec256a8ae 100644
>> --- a/builtin/log.c
>> +++ b/builtin/log.c
>> @@ -747,6 +747,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>>  			memcpy(&rev.pending, &blank, sizeof(rev.pending));
>>  			add_object_array(o, name, &rev.pending);
>>  			ret = cmd_log_walk_no_free(&rev);
>> +			object_array_clear(&rev.pending);
>> +			memcpy(&rev.pending, &cp, sizeof(rev.pending));
>>  			break;
>
> OK, so we clear the fake one-entry pending array we just created. That
> make sense.
>
> And then we have to restore rev.pending, because we don't otherwise
> clean up cp. That makes sense in the context of your previous patch, but
> if you take my suggestion there to separate "cp" from rev.pending
> entirely, and clean it up explicitly, then this second memcpy() goes
> away. IMHO that leaves the resulting code much easier to follow, as the
> lifetimes are clearly distinct.

Thanks, I'll mull that over, and didn't have time to do it now. FWIW I
was aiming to end up with the least bit of invasiveness into the private
bits of revision.[ch], i.e. we know we can fiddle with rev.pending
independently of other members, but it's rather nasty to assume that. So
restoring the previous state and having release_revisions() handle it is
a bit less nasty.

Of course we assume things about its internals here anyway, but this way
it's only for the narrow scope of when we're clobbering it for "show".

> Two alternatives I briefly considered that don't work:
>
>   - you can't just leave the one-item rev.pending in place to get
>     cleaned up by release_revisions(), because we may show multiple
>     commits. We have to clean each one as we go.
>
>   - you can't just object_array_clear(&rev.pending) _before_ clearing
>     it, because in the initial state it's still a copy of "cp", and thus
>     would hose the array we're iterating over.
>
> This whole "reuse rev, but tweak its pending array" feels a bit sketchy
> to me in general. But it has been this way since 2006, and anyway is
> completely out of scope for your series, so let's hold our nose and
> continue. ;)

*nod*

>> diff --git a/t/t7007-show.sh b/t/t7007-show.sh
>> index d6cc69e0f2c..f908a4d1abc 100755
>> --- a/t/t7007-show.sh
>> +++ b/t/t7007-show.sh
>> @@ -2,6 +2,7 @@
>>  
>>  test_description='git show'
>>  
>> +TEST_PASSES_SANITIZE_LEAK=true
>>  . ./test-lib.sh
>
> Lots of stuff now passes, which is good, but....
>
>> diff --git a/t/t9122-git-svn-author.sh b/t/t9122-git-svn-author.sh
>> index 527ba3d2932..0fc289ae0f0 100755
>> --- a/t/t9122-git-svn-author.sh
>> +++ b/t/t9122-git-svn-author.sh
>> @@ -2,7 +2,6 @@
>>  
>>  test_description='git svn authorship'
>>  
>> -TEST_FAILS_SANITIZE_LEAK=true
>>  . ./lib-git-svn.sh
>
> ...this one (and t9162) don't anymore? Are these hunks a mistake? If
> not, this feels like something that needs to go into the commit message.

I can add an explanation for this, this too passes, note that it's
"TEST_FAILS..." not "TEST_PASSES...".

I..e. in an earlier series I whitelisted all of lib-git-svn.sh, unless
that variable was specificed, which was less churn than adding
"TEST_PASSES_SANITIZE_LEAK=true" to all of them (as more than 1/2 of
them passed already).

So there's no new leaks here, and only new passing tests.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-13 13:10 ` [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-07-18 15:11   ` Jeff King
  2022-07-18 15:13     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-07-18 15:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Wed, Jul 13, 2022 at 03:10:35PM +0200, Ævar Arnfjörð Bjarmason wrote:

> There's several potential ways to fix those sort of leaks, we could
> add a "nodup" mode to "struct strvec", which would work for the cases
> where we push constant strings to it. But that wouldn't work as soon
> as we used strvec_pushf(), or otherwise needed to duplicate or create
> a string for that "struct strvec".

Right, I think this falls down when you have mixed const/non-const
entries. You have to know which is which for strvec_clear().

> Let's instead make it the responsibility of the revisions API. If it's
> going to clobber elements of argv it can also free() them, which it
> will now do if instructed to do so via "free_removed_argv_elements".

OK, I think this is a reasonable and minimal fix for the "--" problem on
its own.

But if you went just a little further and made the option "rev should
own its own argv", then I think you can simplify life for callers even
more. They could construct a strvec themselves and then hand it off to
the rev_info, to be cleaned up when release_revisions() is called (and
of course freeing the "--" when we overwrite it in the interim, as you
do here).

Then all of the bisect callers from the previous patch could avoid
having to deal with the strvec at all. They'd call bisect_rev_setup(),
which would internally attach the memory to rev_info.

>  bisect.c                    | 6 ++++--
>  builtin/submodule--helper.c | 5 ++++-
>  remote.c                    | 5 ++++-
>  revision.c                  | 2 ++
>  revision.h                  | 3 ++-
>  t/t2020-checkout-detach.sh  | 1 +
>  6 files changed, 17 insertions(+), 5 deletions(-)

The patch itself works as advertised, I think.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-18 15:11   ` Jeff King
@ 2022-07-18 15:13     ` Ævar Arnfjörð Bjarmason
  2022-07-18 15:45       ` Jeff King
  0 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-18 15:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano


On Mon, Jul 18 2022, Jeff King wrote:

> On Wed, Jul 13, 2022 at 03:10:35PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> There's several potential ways to fix those sort of leaks, we could
>> add a "nodup" mode to "struct strvec", which would work for the cases
>> where we push constant strings to it. But that wouldn't work as soon
>> as we used strvec_pushf(), or otherwise needed to duplicate or create
>> a string for that "struct strvec".
>
> Right, I think this falls down when you have mixed const/non-const
> entries. You have to know which is which for strvec_clear().

Yes, that aspect of it sucks, such a caller will need to do its own
bespoke free-ing.

In practice I haven't really found such callers, API users tend to use
one or the other. Either they're using strvec (needs free-ing) or
main()'s argv (no free-ing).

To the extent that that wasn't the case I figured the easiest fix for
those is probably to fully convert them to strvec, the copies being
trivial in the larger context...

>> Let's instead make it the responsibility of the revisions API. If it's
>> going to clobber elements of argv it can also free() them, which it
>> will now do if instructed to do so via "free_removed_argv_elements".
>
> OK, I think this is a reasonable and minimal fix for the "--" problem on
> its own.
>
> But if you went just a little further and made the option "rev should
> own its own argv", then I think you can simplify life for callers even
> more. They could construct a strvec themselves and then hand it off to
> the rev_info, to be cleaned up when release_revisions() is called (and
> of course freeing the "--" when we overwrite it in the interim, as you
> do here).
>
> Then all of the bisect callers from the previous patch could avoid
> having to deal with the strvec at all. They'd call bisect_rev_setup(),
> which would internally attach the memory to rev_info.

Yes, I experimented with this, and it's a solid approach.

But it's a much larger change, particularly since we'd also want to
update the API itself to take take "const" in the appropriate places to
do it properly.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/6] revisions API: fix more memory leaks
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2022-07-13 13:10 ` [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-07-18 15:14 ` Jeff King
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-18 15:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Wed, Jul 13, 2022 at 03:10:29PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Ævar Arnfjörð Bjarmason (6):
>   bisect.c: add missing "goto" for release_revisions()
>   test-fast-rebase helper: use release_revisions() (again)
>   log: make the intent of cmd_show()'s "rev.pending" juggling clearer
>   log: fix common "rev.pending" memory leak in "git show"
>   bisect.c: partially fix bisect_rev_setup() memory leak
>   revisions API: don't leak memory on argv elements that need free()-ing

This looks correct overall to me. I left a few comments:

  - the readability tweaks in 3 and 4 I think are worth doing. If you
    agree, I'd like to see that in a re-roll.

  - the bigger "rev_info owns its argv" option for patches 5 and 6 is a
    good suggestion, I think, but maybe is a bigger change (though if
    it's optional, it shouldn't need to affect any other callers). But
    if it derails things, it may not be worth doing now. I can live
    without it.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-18 15:13     ` Ævar Arnfjörð Bjarmason
@ 2022-07-18 15:45       ` Jeff King
  2022-07-29  7:07         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-07-18 15:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Mon, Jul 18, 2022 at 05:13:05PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > But if you went just a little further and made the option "rev should
> > own its own argv", then I think you can simplify life for callers even
> > more. They could construct a strvec themselves and then hand it off to
> > the rev_info, to be cleaned up when release_revisions() is called (and
> > of course freeing the "--" when we overwrite it in the interim, as you
> > do here).
> >
> > Then all of the bisect callers from the previous patch could avoid
> > having to deal with the strvec at all. They'd call bisect_rev_setup(),
> > which would internally attach the memory to rev_info.
> 
> Yes, I experimented with this, and it's a solid approach.
> 
> But it's a much larger change, particularly since we'd also want to
> update the API itself to take take "const" in the appropriate places to
> do it properly.

Hmm. I was thinking that we'd just have rev_info.we_own_our_argv, and
then release it in release_revisions(). But actually, rev_info does not
hold onto the argv at all! It's totally processed in setup_revisions().

And there it either:

  - becomes part of prune_data (in which case a copy is made)

  - is passed to handle_revisions_opt(), etc, in which case it is parsed
    but not held onto (it's possible that some string option holds onto
    a pointer, but the only one I found is --filter, which makes a
    copy).

  - is passed to handle_revision_arg(), but these days
    add_rev_cmdline(), etc, make copies. As they must, otherwise --stdin
    would be totally buggy.

So I actually wonder if the comment in bisect_rev_setup() is simply
wrong. It was correct in 2011 when I wrote it, but things changed in
df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
2013-05-25), etc.

In that case, we could replace your patch 5 in favor of just calling
strvec_clear() at the end of bisect_rev_setup(). It's possible there's a
case I'm missing that makes this generally not-safe, but in the case of
bisect_rev_setup() there's a very limited set of items in our argv in
the first place. Doing so also passes the test suite with
SANITIZE=address, though again, this is just exercising the very limited
bisect options here.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-18 15:45       ` Jeff King
@ 2022-07-29  7:07         ` Ævar Arnfjörð Bjarmason
  2022-07-29 18:03           ` Jeff King
  0 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  7:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano


On Mon, Jul 18 2022, Jeff King wrote:

> On Mon, Jul 18, 2022 at 05:13:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > But if you went just a little further and made the option "rev should
>> > own its own argv", then I think you can simplify life for callers even
>> > more. They could construct a strvec themselves and then hand it off to
>> > the rev_info, to be cleaned up when release_revisions() is called (and
>> > of course freeing the "--" when we overwrite it in the interim, as you
>> > do here).
>> >
>> > Then all of the bisect callers from the previous patch could avoid
>> > having to deal with the strvec at all. They'd call bisect_rev_setup(),
>> > which would internally attach the memory to rev_info.
>> 
>> Yes, I experimented with this, and it's a solid approach.
>> 
>> But it's a much larger change, particularly since we'd also want to
>> update the API itself to take take "const" in the appropriate places to
>> do it properly.
>
> Hmm. I was thinking that we'd just have rev_info.we_own_our_argv, and
> then release it in release_revisions(). But actually, rev_info does not
> hold onto the argv at all! It's totally processed in setup_revisions().
>
> And there it either:
>
>   - becomes part of prune_data (in which case a copy is made)
>
>   - is passed to handle_revisions_opt(), etc, in which case it is parsed
>     but not held onto (it's possible that some string option holds onto
>     a pointer, but the only one I found is --filter, which makes a
>     copy).
>
>   - is passed to handle_revision_arg(), but these days
>     add_rev_cmdline(), etc, make copies. As they must, otherwise --stdin
>     would be totally buggy.
>
> So I actually wonder if the comment in bisect_rev_setup() is simply
> wrong. It was correct in 2011 when I wrote it, but things changed in
> df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> 2013-05-25), etc.
>
> In that case, we could replace your patch 5 in favor of just calling
> strvec_clear() at the end of bisect_rev_setup().

5/6 is doing that, or rather at the end of check_ancestors() and
bisect_next_all(), but those call bisect_rev_setup() and pass it the
strvec, so in terms of memory management it amounts to the same thing.

> It's possible there's a
> case I'm missing that makes this generally not-safe, but in the case of
> bisect_rev_setup() there's a very limited set of items in our argv in
> the first place. Doing so also passes the test suite with
> SANITIZE=address, though again, this is just exercising the very limited
> bisect options here.

I think what you're missing is that this code is basically doing this,
which is a common pattern in various parts of our codebase:

	struct strvec sv = STRVEC_INIT;
	strvec_push(&sv, "foo"); // sv.v[0] 
	strvec_push(&sv, "bar"); // sv.v[1] 
	sv.v[1] = NULL; // the code in revisions.c
	strvec_clear(&sv);

I.e. you can't fix this by simply having revision.c having its own
strvec, because it would just .. proceed to do the same thing.

Of course we could alter its argv-iterating state machine to not clobber
that "argv" element to NULL, and have other subsequent code know that it
should stop at a new lower "argc" etc. etc., but that's getting at the
much more invasive API changes 6/6 notes as the path not taken.

And, in the general case for things that do this to the strvec what
we're explicitly doing is having the caller then operate on that munged
argv, i.e. it's important that we change *its* argv. That's not going on
here, but e.g. various parse_options() callers rely on that.

I've experimented a bit with how to solve this more generally, the below
is some very rough WIP code I have laying aroun to try to do that.

IIRC this fails SANITIZE=address or was otherwise broken, I didn't test
it now, but that's not the point...

... I'm just including it by way of explanation. I.e. for at least
*some* callers (IIRC the below mostly works, and I can't remember where
it's broken) it would suffice to just keep around a list of "here's
stuff we should free later".

In case below I opted to do that with a string_list, but it could be
another strvec or whatever.

-- >8 --
From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>
Date: Fri, 10 Jun 2022 13:05:05 +0200
Subject: [PATCH] string-list API: add utility to help free "struct strvec"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "struct strvec" has its own strvec_clear(), but unfortunately it's
a very common pattern to pass its "v" member to parse_options(),
setup_revisions() etc.

Those APIs will then change the "argv" so that strvec_clear() can't do
anything useful with its members, e.g. parse_options_end() will NULL
out the dereferenced elements of the "argv" that's passed in to
indicate how much of it was consumed.

One way to deal with that would be to change all of those APIs to
accept some "in_argv" which they'd then copy, but that would be a very
large change.

This is a less invasive way of doing it, we'll now maintain a "struct
string_list" on the side, whose "util" members point to the elements
of the "struct strvec" that we should free() later, which we can
helpfully do with string_list_clear(list, 1). See the added API
documentation for more details.

As a result we can mark more tests as passing with SANITIZE=leak using
"TEST_PASSES_SANITIZE_LEAK=true". We'll be able to use this API for a
lot more callers, but let's start with "am"'s run_apply().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/am.c                  |  3 +++
 string-list.c                 | 16 ++++++++++++++++
 string-list.h                 | 15 +++++++++++++++
 t/t0023-crlf-am.sh            |  1 +
 t/t4152-am-subjects.sh        |  2 ++
 t/t4254-am-corrupt.sh         |  2 ++
 t/t4256-am-format-flowed.sh   |  1 +
 t/t5403-post-checkout-hook.sh |  1 +
 8 files changed, 41 insertions(+)

diff --git a/builtin/am.c b/builtin/am.c
index 93bec62afa9..d927fea128a 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	int res, opts_left;
 	int force_apply = 0;
 	int options = 0;
+	struct string_list to_free = STRING_LIST_INIT_NODUP;
 
 	if (init_apply_state(&apply_state, the_repository, NULL))
 		BUG("init_apply_state() failed");
@@ -1483,6 +1484,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	strvec_push(&apply_opts, "apply");
 	strvec_pushv(&apply_opts, state->git_apply_opts.v);
 
+	string_list_append_strvec_nodup(&to_free, &apply_opts);
 	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
 					&apply_state, &force_apply, &options,
 					NULL);
@@ -1512,6 +1514,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 
 	strvec_clear(&apply_paths);
 	strvec_clear(&apply_opts);
+	string_list_clear(&to_free, 1);
 	clear_apply_state(&apply_state);
 
 	if (res)
diff --git a/string-list.c b/string-list.c
index 549fc416d68..ecbfb3065a8 100644
--- a/string-list.c
+++ b/string-list.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "string-list.h"
+#include "strvec.h"
 
 void string_list_init_nodup(struct string_list *list)
 {
@@ -325,3 +326,18 @@ int string_list_split_in_place(struct string_list *list, char *string,
 		}
 	}
 }
+
+void string_list_append_strvec_nodup(struct string_list *list,
+				     const struct strvec *vec)
+{
+	size_t i;
+
+	if (list->strdup_strings)
+		BUG("need .strdup_strings = 0 for string_list_append_strvec_nodup()");
+
+	for (i = 0; i < vec->nr; i++) {
+		const char *p = vec->v[i];
+
+		string_list_append_nodup(list, (char *)p)->util = (void *)p;
+	}
+}
diff --git a/string-list.h b/string-list.h
index d5a744e1438..e740f3f29be 100644
--- a/string-list.h
+++ b/string-list.h
@@ -266,4 +266,19 @@ int string_list_split(struct string_list *list, const char *string,
  */
 int string_list_split_in_place(struct string_list *list, char *string,
 			       int delim, int maxsplit);
+
+/**
+ * Given a STRING_LIST_INIT_NODUP-initialized "struct string_list"
+ * takes a "struct strvec" and "copies" it, the "string" members will
+ * be the elements of the "struct strvec", and more importantly so
+ * will the "util" members.
+ *
+ * This is intended to squirrel away pointers to "argv" to
+ * string_list_clear(list, 1) them later, in cases where a "struct
+ * strvec" is passed to an API like parse_options(), setup_revisions()
+ * etc.
+ */
+struct strvec;
+void string_list_append_strvec_nodup(struct string_list *list,
+				     const struct strvec *vec);
 #endif /* STRING_LIST_H */
diff --git a/t/t0023-crlf-am.sh b/t/t0023-crlf-am.sh
index f9bbb91f64e..575805513a3 100755
--- a/t/t0023-crlf-am.sh
+++ b/t/t0023-crlf-am.sh
@@ -2,6 +2,7 @@
 
 test_description='Test am with auto.crlf'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 cat >patchfile <<\EOF
diff --git a/t/t4152-am-subjects.sh b/t/t4152-am-subjects.sh
index 4c68245acad..9f2edba1f83 100755
--- a/t/t4152-am-subjects.sh
+++ b/t/t4152-am-subjects.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='test subject preservation with format-patch | am'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 make_patches() {
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index 54be7da1611..45f1d4f95e5 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='git am with corrupt input'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 make_mbox_with_nul () {
diff --git a/t/t4256-am-format-flowed.sh b/t/t4256-am-format-flowed.sh
index 2369c4e17ad..1015273bc82 100755
--- a/t/t4256-am-format-flowed.sh
+++ b/t/t4256-am-format-flowed.sh
@@ -2,6 +2,7 @@
 
 test_description='test format=flowed support of git am'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t5403-post-checkout-hook.sh b/t/t5403-post-checkout-hook.sh
index 978f240cdac..cfaae547398 100755
--- a/t/t5403-post-checkout-hook.sh
+++ b/t/t5403-post-checkout-hook.sh
@@ -7,6 +7,7 @@ test_description='Test the post-checkout hook.'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success setup '



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 0/6] revisions API: fix more memory leaks
  2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2022-07-18 15:14 ` [PATCH 0/6] revisions API: fix more memory leaks Jeff King
@ 2022-07-29  8:31 ` Ævar Arnfjörð Bjarmason
  2022-07-29  8:31   ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
                     ` (7 more replies)
  7 siblings, 8 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  8:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

A late re-roll of this set of revisions API and API user memory leak
fixes. I think the much simpler code here in 4/6 should address the
points Jeff King brought up in the v1 review.

Other than that I renamed the variable in 3/6 s/cp/pending/g, per
Jeff's suggestion (FWIW "cp" = "copy").

1. https://lore.kernel.org/git/cover-0.6-00000000000-20220713T130511Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (6):
  bisect.c: add missing "goto" for release_revisions()
  test-fast-rebase helper: use release_revisions() (again)
  log: make the intent of cmd_show()'s "rev.pending" juggling clearer
  log: fix common "rev.pending" memory leak in "git show"
  bisect.c: partially fix bisect_rev_setup() memory leak
  revisions API: don't leak memory on argv elements that need free()-ing

 bisect.c                                      | 28 +++++++++++--------
 builtin/log.c                                 | 21 +++++++-------
 builtin/submodule--helper.c                   |  5 +++-
 remote.c                                      |  5 +++-
 revision.c                                    |  2 ++
 revision.h                                    |  3 +-
 t/helper/test-fast-rebase.c                   |  2 --
 t/t0203-gettext-setlocale-sanity.sh           |  1 +
 t/t1020-subdirectory.sh                       |  1 +
 t/t2020-checkout-detach.sh                    |  1 +
 t/t3307-notes-man.sh                          |  1 +
 t/t3920-crlf-messages.sh                      |  2 ++
 t/t4069-remerge-diff.sh                       |  1 +
 t/t7007-show.sh                               |  1 +
 ...3-pre-commit-and-pre-merge-commit-hooks.sh |  1 +
 t/t9122-git-svn-author.sh                     |  1 -
 t/t9162-git-svn-dcommit-interactive.sh        |  1 -
 17 files changed, 49 insertions(+), 28 deletions(-)

Range-diff against v1:
1:  6624eb315a5 = 1:  12a4a20c59f bisect.c: add missing "goto" for release_revisions()
2:  0da0acba8b0 = 2:  bbd3a7e5ecc test-fast-rebase helper: use release_revisions() (again)
3:  f7cc2f6a381 ! 3:  1629299f883 log: make the intent of cmd_show()'s "rev.pending" juggling clearer
    @@ builtin/log.c: static void show_setup_revisions_tweak(struct rev_info *rev,
      	struct rev_info rev;
     -	struct object_array_entry *objects;
     +	struct object_array blank = OBJECT_ARRAY_INIT;
    -+	struct object_array cp = OBJECT_ARRAY_INIT;
    ++	struct object_array pending = OBJECT_ARRAY_INIT;
     +	unsigned int i;
      	struct setup_revision_opt opt;
      	struct pathspec match_all;
    @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
      
     -	count = rev.pending.nr;
     -	objects = rev.pending.objects;
    -+	memcpy(&cp, &rev.pending, sizeof(rev.pending));
    ++	memcpy(&pending, &rev.pending, sizeof(rev.pending));
      	rev.diffopt.no_free = 1;
     -	for (i = 0; i < count && !ret; i++) {
     -		struct object *o = objects[i].item;
     -		const char *name = objects[i].name;
    -+	for (i = 0; i < cp.nr && !ret; i++) {
    -+		struct object *o = cp.objects[i].item;
    -+		const char *name = cp.objects[i].name;
    ++	for (i = 0; i < pending.nr && !ret; i++) {
    ++		struct object *o = pending.objects[i].item;
    ++		const char *name = pending.objects[i].name;
      		switch (o->type) {
      		case OBJ_BLOB:
      			ret = show_blob_object(&o->oid, &rev, name);
    @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
      				ret = error(_("could not read object %s"),
      					    oid_to_hex(oid));
     -			objects[i].item = o;
    -+			cp.objects[i].item = o;
    ++			pending.objects[i].item = o;
      			i--;
      			break;
      		}
4:  2c5b41d2b24 ! 4:  54b632c1124 log: fix common "rev.pending" memory leak in "git show"
    @@ Commit message
         In the preceding commit this code was made to use a more extendable
         pattern, which we can now complete. Once we've clobbered our
         "rev.pending" and invoked "cmd_log_walk_no_free()" we need to
    -    "object_array_clear()" our newly created "rev.pending" to avoid
    -    leaking the memory related to the one member array we've created.
    +    "object_array_clear()" our old "rev.pending".
     
    -    But more importantly we need to set "rev.pending" back to the original
    -    we squirreled away in the "cp" variable, so that we'll make use of the
    +    So in addition to the now-populated &pending array, to avoid leaking
    +    the memory related to the one member array we've created. The
    +    &rev.pending was already being object_array_clear()'d by the
         release_revisions() added in f6bfea0ad01 (revisions API users: use
    -    release_revisions() in builtin/log.c, 2022-04-13). In f6bfea0ad01 this
    -    memory leak was noted as an outstanding TODO, but it's now been fixed.
    +    release_revisions() in builtin/log.c, 2022-04-13), now we'll also
    +    correctly free the previous data (f6bfea0ad01 noted this memory leak
    +    as an outstanding TODO).
    +
    +    This way of doing it is making assumptions about the internals of
    +    "struct rev_info", in particular that the "pending" member is
    +    stand-alone, and that no other state is referring to it. But we've
    +    been making those assumptions ever since 5d7eeee2ac6 (git-show: grok
    +    blobs, trees and tags, too, 2006-12-14).
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/log.c ##
     @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
    - 			memcpy(&rev.pending, &blank, sizeof(rev.pending));
    + 		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
    + 
    + 	memcpy(&pending, &rev.pending, sizeof(rev.pending));
    ++	memcpy(&rev.pending, &blank, sizeof(rev.pending));
    + 	rev.diffopt.no_free = 1;
    + 	for (i = 0; i < pending.nr && !ret; i++) {
    + 		struct object *o = pending.objects[i].item;
    +@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
    + 			rev.shown_one = 1;
    + 			break;
    + 		case OBJ_COMMIT:
    +-			memcpy(&rev.pending, &blank, sizeof(rev.pending));
      			add_object_array(o, name, &rev.pending);
      			ret = cmd_log_walk_no_free(&rev);
    -+			object_array_clear(&rev.pending);
    -+			memcpy(&rev.pending, &cp, sizeof(rev.pending));
      			break;
    - 		default:
    +@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
      			ret = error(_("unknown type: %d"), o->type);
    + 		}
    + 	}
    ++	object_array_clear(&pending);
    + 
    + 	rev.diffopt.no_free = 0;
    + 	diff_free(&rev.diffopt);
     
      ## t/t0203-gettext-setlocale-sanity.sh ##
     @@
5:  525e3427396 = 5:  2af2bce2d17 bisect.c: partially fix bisect_rev_setup() memory leak
6:  291e660a2bc = 6:  3c57e126554 revisions API: don't leak memory on argv elements that need free()-ing
-- 
2.37.1.1196.g8af3636bc64


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions()
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2022-07-29  8:31   ` Ævar Arnfjörð Bjarmason
  2022-07-29  8:31   ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  8:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Add a missing "goto cleanup", this fixes a bug in
f196c1e908d (revisions API users: use release_revisions() needing
REV_INFO_INIT, 2022-04-13).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bisect.c b/bisect.c
index b63669cc9d7..421470bfa59 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1054,7 +1054,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 		 */
 		res = error_if_skipped_commits(tried, NULL);
 		if (res < 0)
-			return res;
+			goto cleanup;
 		printf(_("%s was both %s and %s\n"),
 		       oid_to_hex(current_bad_oid),
 		       term_good,
-- 
2.37.1.1196.g8af3636bc64


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again)
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
  2022-07-29  8:31   ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
@ 2022-07-29  8:31   ` Ævar Arnfjörð Bjarmason
  2022-07-30  5:22     ` Eric Sunshine
  2022-07-29  8:31   ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  8:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Fix a bug in 0139c58ab95 (revisions API users: add "goto cleanup" for
release_revisions(), 2022-04-13), in that commit a release_revisions()
call was added to this function, but it never did anything due to this
TODO memset() added in fe1a21d5267 (fast-rebase: demonstrate
merge-ort's API via new test-tool command, 2020-10-29).

Simply removing the memset() will fix the "cmdline" which can be seen
when running t5520-pull.sh.

This sort of thing could be detected automatically with a rule similar
to the unused.cocci merged in 7fa60d2a5b6 (Merge branch
'ab/cocci-unused' into next, 2022-07-11). The following rule on top
would catch the case being fixed here:

	@@
	type T;
	identifier I;
	identifier REL1 =~ "^[a-z_]*_(release|reset|clear|free)$";
	identifier REL2 =~ "^(release|clear|free)_[a-z_]*$";
	@@

	- memset(\( I \| &I \), 0, ...);
	  ... when != \( I \| &I \)
	(
	  \( REL1 \| REL2 \)( \( I \| &I \), ...);
	|
	  \( REL1 \| REL2 \)( \( &I \| I \) );
	)
	  ... when != \( I \| &I \)

That rule should arguably use only &I, not I (as we might be passed a
pointer). He distinction would matter if anyone cared about the
side-effects of a memset() followed by release() of a pointer to a
variable passed into the function.

As such a pattern would be at best very confusing, and most likely
point to buggy code as in this case, the above rule is probably fine
as-is.

But as this rule only found one such bug in the entire codebase let's
not add it to contrib/coccinelle/unused.cocci for now, we can always
dig it up in the future if it's deemed useful.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/helper/test-fast-rebase.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 4e5553e2024..45665ec19a5 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -184,8 +184,6 @@ int cmd__fast_rebase(int argc, const char **argv)
 		last_picked_commit = commit;
 		last_commit = create_commit(result.tree, commit, last_commit);
 	}
-	/* TODO: There should be some kind of rev_info_free(&revs) call... */
-	memset(&revs, 0, sizeof(revs));
 
 	merge_switch_to_result(&merge_opt, head_tree, &result, 1, !result.clean);
 
-- 
2.37.1.1196.g8af3636bc64


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
  2022-07-29  8:31   ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
  2022-07-29  8:31   ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
@ 2022-07-29  8:31   ` Ævar Arnfjörð Bjarmason
  2022-07-29 18:44     ` Junio C Hamano
  2022-07-29  8:31   ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  8:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Adjust code added in 5d7eeee2ac6 (git-show: grok blobs, trees and
tags, too, 2006-12-14) to use the "memcpy a &blank" idiom introduced
in 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
macro, 2021-07-01).

Now the types in play are guaranteed to correspond, i.e. we used "int"
here for the "count" before, but the corresponding "nr" is an
"unsigned int". By using a "blank" object we almost entirely bypass
that, we'll only need to declare our own "unsigned int i".

There are no functional changes here aside from potential overflow
guard rails, the structure only has these three members ("nr", "alloc"
and "objects"), but now we're obviously future-proof against assuming
that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/log.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 88a5e98875a..6135f8191a9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -668,10 +668,12 @@ static void show_setup_revisions_tweak(struct rev_info *rev,
 int cmd_show(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info rev;
-	struct object_array_entry *objects;
+	struct object_array blank = OBJECT_ARRAY_INIT;
+	struct object_array pending = OBJECT_ARRAY_INIT;
+	unsigned int i;
 	struct setup_revision_opt opt;
 	struct pathspec match_all;
-	int i, count, ret = 0;
+	int ret = 0;
 
 	init_log_defaults();
 	git_config(git_log_config, NULL);
@@ -698,12 +700,11 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	if (!rev.no_walk)
 		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
 
-	count = rev.pending.nr;
-	objects = rev.pending.objects;
+	memcpy(&pending, &rev.pending, sizeof(rev.pending));
 	rev.diffopt.no_free = 1;
-	for (i = 0; i < count && !ret; i++) {
-		struct object *o = objects[i].item;
-		const char *name = objects[i].name;
+	for (i = 0; i < pending.nr && !ret; i++) {
+		struct object *o = pending.objects[i].item;
+		const char *name = pending.objects[i].name;
 		switch (o->type) {
 		case OBJ_BLOB:
 			ret = show_blob_object(&o->oid, &rev, name);
@@ -726,7 +727,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			if (!o)
 				ret = error(_("could not read object %s"),
 					    oid_to_hex(oid));
-			objects[i].item = o;
+			pending.objects[i].item = o;
 			i--;
 			break;
 		}
@@ -743,8 +744,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			rev.shown_one = 1;
 			break;
 		case OBJ_COMMIT:
-			rev.pending.nr = rev.pending.alloc = 0;
-			rev.pending.objects = NULL;
+			memcpy(&rev.pending, &blank, sizeof(rev.pending));
 			add_object_array(o, name, &rev.pending);
 			ret = cmd_log_walk_no_free(&rev);
 			break;
-- 
2.37.1.1196.g8af3636bc64


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
                     ` (2 preceding siblings ...)
  2022-07-29  8:31   ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
@ 2022-07-29  8:31   ` Ævar Arnfjörð Bjarmason
  2022-07-29 18:55     ` Jeff King
  2022-07-29 18:59     ` Jeff King
  2022-07-29  8:31   ` [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
                     ` (3 subsequent siblings)
  7 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  8:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Fix a very common memory leak introduced in 5d7eeee2ac6 (git-show: grok blobs,
trees and tags, too, 2006-12-14).

When "git show" displays commits it needs to temporarily clobbers the
"rev.pending" array, but by doing so we'll fail to
release_revisions(), as we have for most other uses of "struct
rev_info" since 2da81d1efb0 (Merge branch 'ab/plug-leak-in-revisions',
2022-06-07).

In the preceding commit this code was made to use a more extendable
pattern, which we can now complete. Once we've clobbered our
"rev.pending" and invoked "cmd_log_walk_no_free()" we need to
"object_array_clear()" our old "rev.pending".

So in addition to the now-populated &pending array, to avoid leaking
the memory related to the one member array we've created. The
&rev.pending was already being object_array_clear()'d by the
release_revisions() added in f6bfea0ad01 (revisions API users: use
release_revisions() in builtin/log.c, 2022-04-13), now we'll also
correctly free the previous data (f6bfea0ad01 noted this memory leak
as an outstanding TODO).

This way of doing it is making assumptions about the internals of
"struct rev_info", in particular that the "pending" member is
stand-alone, and that no other state is referring to it. But we've
been making those assumptions ever since 5d7eeee2ac6 (git-show: grok
blobs, trees and tags, too, 2006-12-14).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/log.c                                    | 3 ++-
 t/t0203-gettext-setlocale-sanity.sh              | 1 +
 t/t1020-subdirectory.sh                          | 1 +
 t/t3307-notes-man.sh                             | 1 +
 t/t3920-crlf-messages.sh                         | 2 ++
 t/t4069-remerge-diff.sh                          | 1 +
 t/t7007-show.sh                                  | 1 +
 t/t7503-pre-commit-and-pre-merge-commit-hooks.sh | 1 +
 t/t9122-git-svn-author.sh                        | 1 -
 t/t9162-git-svn-dcommit-interactive.sh           | 1 -
 10 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 6135f8191a9..5c7e254e994 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -701,6 +701,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
 
 	memcpy(&pending, &rev.pending, sizeof(rev.pending));
+	memcpy(&rev.pending, &blank, sizeof(rev.pending));
 	rev.diffopt.no_free = 1;
 	for (i = 0; i < pending.nr && !ret; i++) {
 		struct object *o = pending.objects[i].item;
@@ -744,7 +745,6 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			rev.shown_one = 1;
 			break;
 		case OBJ_COMMIT:
-			memcpy(&rev.pending, &blank, sizeof(rev.pending));
 			add_object_array(o, name, &rev.pending);
 			ret = cmd_log_walk_no_free(&rev);
 			break;
@@ -752,6 +752,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			ret = error(_("unknown type: %d"), o->type);
 		}
 	}
+	object_array_clear(&pending);
 
 	rev.diffopt.no_free = 0;
 	diff_free(&rev.diffopt);
diff --git a/t/t0203-gettext-setlocale-sanity.sh b/t/t0203-gettext-setlocale-sanity.sh
index 0ce1f22eff6..86cff324ff1 100755
--- a/t/t0203-gettext-setlocale-sanity.sh
+++ b/t/t0203-gettext-setlocale-sanity.sh
@@ -5,6 +5,7 @@
 
 test_description="The Git C functions aren't broken by setlocale(3)"
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./lib-gettext.sh
 
 test_expect_success 'git show a ISO-8859-1 commit under C locale' '
diff --git a/t/t1020-subdirectory.sh b/t/t1020-subdirectory.sh
index 9fdbb2af80e..45eef9457fe 100755
--- a/t/t1020-subdirectory.sh
+++ b/t/t1020-subdirectory.sh
@@ -6,6 +6,7 @@
 test_description='Try various core-level commands in subdirectory.
 '
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-read-tree.sh
 
diff --git a/t/t3307-notes-man.sh b/t/t3307-notes-man.sh
index 1aa366a410e..ae316502c45 100755
--- a/t/t3307-notes-man.sh
+++ b/t/t3307-notes-man.sh
@@ -4,6 +4,7 @@ test_description='Examples from the git-notes man page
 
 Make sure the manual is not full of lies.'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 0276edbe3d3..4c661d4d54a 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 LIB_CRLF_BRANCHES=""
diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
index 35f94957fce..9e7cac68b1c 100755
--- a/t/t4069-remerge-diff.sh
+++ b/t/t4069-remerge-diff.sh
@@ -2,6 +2,7 @@
 
 test_description='remerge-diff handling'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 # This test is ort-specific
diff --git a/t/t7007-show.sh b/t/t7007-show.sh
index d6cc69e0f2c..f908a4d1abc 100755
--- a/t/t7007-show.sh
+++ b/t/t7007-show.sh
@@ -2,6 +2,7 @@
 
 test_description='git show'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success setup '
diff --git a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
index ad1eb64ba0d..aa004b70a8d 100755
--- a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
+++ b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
@@ -5,6 +5,7 @@ test_description='pre-commit and pre-merge-commit hooks'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'root commit' '
diff --git a/t/t9122-git-svn-author.sh b/t/t9122-git-svn-author.sh
index 527ba3d2932..0fc289ae0f0 100755
--- a/t/t9122-git-svn-author.sh
+++ b/t/t9122-git-svn-author.sh
@@ -2,7 +2,6 @@
 
 test_description='git svn authorship'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 test_expect_success 'setup svn repository' '
diff --git a/t/t9162-git-svn-dcommit-interactive.sh b/t/t9162-git-svn-dcommit-interactive.sh
index e2aa8ed88a9..b3ce033a0d3 100755
--- a/t/t9162-git-svn-dcommit-interactive.sh
+++ b/t/t9162-git-svn-dcommit-interactive.sh
@@ -4,7 +4,6 @@
 
 test_description='git svn dcommit --interactive series'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 test_expect_success 'initialize repo' '
-- 
2.37.1.1196.g8af3636bc64


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
                     ` (3 preceding siblings ...)
  2022-07-29  8:31   ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
@ 2022-07-29  8:31   ` Ævar Arnfjörð Bjarmason
  2022-07-29  8:31   ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  8:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Partially fix the memory leak noted in in 8a534b61241 (bisect: use
argv_array API, 2011-09-13), which added the "XXX" comment seen in the
context. We can partially fix it by having the bisect_rev_setup()
function take a "struct strvec", rather than constructing it.

As the comment notes we need to keep the construct "rev_argv" around
while the "struct rev_info" is around, which as seen in the newly
added "strvec_clear()" calls here we do after "release_revisions()".

This "partially" fixes the memory leak because we're leaking the "--"
added to the "rev_argv" here still, which will be addressed in a
subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/bisect.c b/bisect.c
index 421470bfa59..6afb98be7a1 100644
--- a/bisect.c
+++ b/bisect.c
@@ -648,11 +648,11 @@ static struct commit_list *managed_skipped(struct commit_list *list,
 }
 
 static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
+			     struct strvec *rev_argv,
 			     const char *prefix,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
 {
-	struct strvec rev_argv = STRVEC_INIT;
 	int i;
 
 	repo_init_revisions(r, revs, prefix);
@@ -660,16 +660,16 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 	revs->commit_format = CMIT_FMT_UNSPECIFIED;
 
 	/* rev_argv.argv[0] will be ignored by setup_revisions */
-	strvec_push(&rev_argv, "bisect_rev_setup");
-	strvec_pushf(&rev_argv, bad_format, oid_to_hex(current_bad_oid));
+	strvec_push(rev_argv, "bisect_rev_setup");
+	strvec_pushf(rev_argv, bad_format, oid_to_hex(current_bad_oid));
 	for (i = 0; i < good_revs.nr; i++)
-		strvec_pushf(&rev_argv, good_format,
+		strvec_pushf(rev_argv, good_format,
 			     oid_to_hex(good_revs.oid + i));
-	strvec_push(&rev_argv, "--");
+	strvec_push(rev_argv, "--");
 	if (read_paths)
-		read_bisect_paths(&rev_argv);
+		read_bisect_paths(rev_argv);
 
-	setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
+	setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
 	/* XXX leak rev_argv, as "revs" may still be pointing to it */
 }
 
@@ -873,10 +873,11 @@ static enum bisect_error check_merge_bases(int rev_nr, struct commit **rev, int
 static int check_ancestors(struct repository *r, int rev_nr,
 			   struct commit **rev, const char *prefix)
 {
+	struct strvec rev_argv = STRVEC_INIT;
 	struct rev_info revs;
 	int res;
 
-	bisect_rev_setup(r, &revs, prefix, "^%s", "%s", 0);
+	bisect_rev_setup(r, &revs, &rev_argv, prefix, "^%s", "%s", 0);
 
 	bisect_common(&revs);
 	res = (revs.commits != NULL);
@@ -885,6 +886,7 @@ static int check_ancestors(struct repository *r, int rev_nr,
 	clear_commit_marks_many(rev_nr, rev, ALL_REV_FLAGS);
 
 	release_revisions(&revs);
+	strvec_clear(&rev_argv);
 	return res;
 }
 
@@ -1010,6 +1012,7 @@ void read_bisect_terms(const char **read_bad, const char **read_good)
  */
 enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 {
+	struct strvec rev_argv = STRVEC_INIT;
 	struct rev_info revs = REV_INFO_INIT;
 	struct commit_list *tried;
 	int reaches = 0, all = 0, nr, steps;
@@ -1037,7 +1040,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	if (res)
 		goto cleanup;
 
-	bisect_rev_setup(r, &revs, prefix, "%s", "^%s", 1);
+	bisect_rev_setup(r, &revs, &rev_argv, prefix, "%s", "^%s", 1);
 
 	revs.first_parent_only = !!(bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY);
 	revs.limited = 1;
@@ -1112,6 +1115,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	res = bisect_checkout(bisect_rev, no_checkout);
 cleanup:
 	release_revisions(&revs);
+	strvec_clear(&rev_argv);
 	return res;
 }
 
-- 
2.37.1.1196.g8af3636bc64


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
                     ` (4 preceding siblings ...)
  2022-07-29  8:31   ` [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
@ 2022-07-29  8:31   ` Ævar Arnfjörð Bjarmason
  2022-07-29 19:00     ` Jeff King
  2022-07-29 19:02   ` [PATCH v2 0/6] revisions API: fix more memory leaks Jeff King
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
  7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29  8:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Add a "free_removed_argv_elements" member to "struct
setup_revision_opt", and use it to fix several memory leaks, e.g. the
one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array
API, 2011-09-13).

We have various memory leaks in APIs that take and munge "const
char **argv", e.g. parse_options(). Sometimes these APIs are given the
"argv" we get to the "main" function, in which case we don't leak
memory, but other times we're giving it the "v" member of a "struct
strvec" we created.

There's several potential ways to fix those sort of leaks, we could
add a "nodup" mode to "struct strvec", which would work for the cases
where we push constant strings to it. But that wouldn't work as soon
as we used strvec_pushf(), or otherwise needed to duplicate or create
a string for that "struct strvec".

Let's instead make it the responsibility of the revisions API. If it's
going to clobber elements of argv it can also free() them, which it
will now do if instructed to do so via "free_removed_argv_elements".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c                    | 6 ++++--
 builtin/submodule--helper.c | 5 ++++-
 remote.c                    | 5 ++++-
 revision.c                  | 2 ++
 revision.h                  | 3 ++-
 t/t2020-checkout-detach.sh  | 1 +
 6 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/bisect.c b/bisect.c
index 6afb98be7a1..38b3891f3a6 100644
--- a/bisect.c
+++ b/bisect.c
@@ -653,6 +653,9 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
 {
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	int i;
 
 	repo_init_revisions(r, revs, prefix);
@@ -669,8 +672,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 	if (read_paths)
 		read_bisect_paths(rev_argv);
 
-	setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
-	/* XXX leak rev_argv, as "revs" may still be pointing to it */
+	setup_revisions(rev_argv->nr, rev_argv->v, revs, &opt);
 }
 
 static void bisect_common(struct rev_info *revs)
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fac52ade5e1..b63f420ecef 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1104,6 +1104,9 @@ static int compute_summary_module_list(struct object_id *head_oid,
 {
 	struct strvec diff_args = STRVEC_INIT;
 	struct rev_info rev;
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	struct module_cb_list list = MODULE_CB_LIST_INIT;
 	int ret = 0;
 
@@ -1121,7 +1124,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
 	init_revisions(&rev, info->prefix);
 	rev.abbrev = 0;
 	precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
-	setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
+	setup_revisions(diff_args.nr, diff_args.v, &rev, &opt);
 	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = submodule_summary_callback;
 	rev.diffopt.format_callback_data = &list;
diff --git a/remote.c b/remote.c
index 1ee2b145d07..d66064118cb 100644
--- a/remote.c
+++ b/remote.c
@@ -2169,6 +2169,9 @@ static int stat_branch_pair(const char *branch_name, const char *base,
 	struct object_id oid;
 	struct commit *ours, *theirs;
 	struct rev_info revs;
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	struct strvec argv = STRVEC_INIT;
 
 	/* Cannot stat if what we used to build on no longer exists */
@@ -2203,7 +2206,7 @@ static int stat_branch_pair(const char *branch_name, const char *base,
 	strvec_push(&argv, "--");
 
 	repo_init_revisions(the_repository, &revs, NULL);
-	setup_revisions(argv.nr, argv.v, &revs, NULL);
+	setup_revisions(argv.nr, argv.v, &revs, &opt);
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
diff --git a/revision.c b/revision.c
index 0c6e26cd9c8..35d24e4fd3e 100644
--- a/revision.c
+++ b/revision.c
@@ -2784,6 +2784,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			const char *arg = argv[i];
 			if (strcmp(arg, "--"))
 				continue;
+			if (opt && opt->free_removed_argv_elements)
+				free((char *)argv[i]);
 			argv[i] = NULL;
 			argc = i;
 			if (argv[i + 1])
diff --git a/revision.h b/revision.h
index e576845cdd1..bb91e7ed914 100644
--- a/revision.h
+++ b/revision.h
@@ -375,7 +375,8 @@ struct setup_revision_opt {
 	const char *def;
 	void (*tweak)(struct rev_info *, struct setup_revision_opt *);
 	unsigned int	assume_dashdash:1,
-			allow_exclude_promisor_objects:1;
+			allow_exclude_promisor_objects:1,
+			free_removed_argv_elements:1;
 	unsigned revarg_opt;
 };
 int setup_revisions(int argc, const char **argv, struct rev_info *revs,
diff --git a/t/t2020-checkout-detach.sh b/t/t2020-checkout-detach.sh
index bc46713a43e..2eab6474f8d 100755
--- a/t/t2020-checkout-detach.sh
+++ b/t/t2020-checkout-detach.sh
@@ -4,6 +4,7 @@ test_description='checkout into detached HEAD state'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 check_detached () {
-- 
2.37.1.1196.g8af3636bc64


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-29  7:07         ` Ævar Arnfjörð Bjarmason
@ 2022-07-29 18:03           ` Jeff King
  2022-07-29 18:37             ` Jeff King
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-07-29 18:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Jul 29, 2022 at 09:07:40AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > In that case, we could replace your patch 5 in favor of just calling
> > strvec_clear() at the end of bisect_rev_setup().
> 
> 5/6 is doing that, or rather at the end of check_ancestors() and
> bisect_next_all(), but those call bisect_rev_setup() and pass it the
> strvec, so in terms of memory management it amounts to the same thing.

Right, but my point is that we do not need to complicate the interface
and ownership rules for bisect_rev_setup() by passing around the extra
variable.

> > It's possible there's a
> > case I'm missing that makes this generally not-safe, but in the case of
> > bisect_rev_setup() there's a very limited set of items in our argv in
> > the first place. Doing so also passes the test suite with
> > SANITIZE=address, though again, this is just exercising the very limited
> > bisect options here.
> 
> I think what you're missing is that this code is basically doing this,
> which is a common pattern in various parts of our codebase:
> 
> 	struct strvec sv = STRVEC_INIT;
> 	strvec_push(&sv, "foo"); // sv.v[0] 
> 	strvec_push(&sv, "bar"); // sv.v[1] 
> 	sv.v[1] = NULL; // the code in revisions.c
> 	strvec_clear(&sv);
> 
> I.e. you can't fix this by simply having revision.c having its own
> strvec, because it would just .. proceed to do the same thing.

Right, none of what I was suggesting above gets rid of the flag to tell
revisions.c that it should free elements it removes from argv. We still
have to do that. My point was just that if we can assume that
setup_revisions() does not need for the passed-in argv to exist after it
exits (which _used_ to not be true, but I think is these days), then
that can simplify a few of the callers.

> Of course we could alter its argv-iterating state machine to not clobber
> that "argv" element to NULL, and have other subsequent code know that it
> should stop at a new lower "argc" etc. etc., but that's getting at the
> much more invasive API changes 6/6 notes as the path not taken.

Yeah, I don't think that's at all worth it. If anything, it could
perhaps hold on to the "removed" pointer and restore it at the end, but
I wouldn't be surprised if that gets ugly, too.

> And, in the general case for things that do this to the strvec what
> we're explicitly doing is having the caller then operate on that munged
> argv, i.e. it's important that we change *its* argv. That's not going on
> here, but e.g. various parse_options() callers rely on that.

Right, agreed.

> IIRC this fails SANITIZE=address or was otherwise broken, I didn't test
> it now, but that's not the point...
> 
> ... I'm just including it by way of explanation. I.e. for at least
> *some* callers (IIRC the below mostly works, and I can't remember where
> it's broken) it would suffice to just keep around a list of "here's
> stuff we should free later".
> 
> In case below I opted to do that with a string_list, but it could be
> another strvec or whatever.

FWIW, I don't really like this direction. It feels like a huge band-aid,
and the right solution is either having functions _not_ munge strvecs
too much, or be told that they can take ownership of removed elements
and free them (i.e., your current patch 6).

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-29 18:03           ` Jeff King
@ 2022-07-29 18:37             ` Jeff King
  2022-07-29 18:52               ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-07-29 18:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Jul 29, 2022 at 02:03:16PM -0400, Jeff King wrote:

> On Fri, Jul 29, 2022 at 09:07:40AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > > In that case, we could replace your patch 5 in favor of just calling
> > > strvec_clear() at the end of bisect_rev_setup().
> > 
> > 5/6 is doing that, or rather at the end of check_ancestors() and
> > bisect_next_all(), but those call bisect_rev_setup() and pass it the
> > strvec, so in terms of memory management it amounts to the same thing.
> 
> Right, but my point is that we do not need to complicate the interface
> and ownership rules for bisect_rev_setup() by passing around the extra
> variable.

Just to be clear, what I'm proposing is to replace your 5/6 with the
patch below.

It passes the leak-checker, but I don't think that should be a surprise.
The end result should be the same as your patch, and anyway I don't
think your series actually marks any scripts as PASSING_SANITIZE_LEAK
as a result of fixing this (presumably because there are other leaks in
those scripts, too).

The more interesting question is whether it causes any use-after-free
bugs. I don't think it does, and certainly SANITIZE=address agrees.

-- >8 --
Subject: [PATCH] bisect: stop leaking strvec in bisect_rev_setup()

Back when 8a534b6124 (bisect: use argv_array API, 2011-09-13) was
written, it was not safe to free the argv we had passed to
setup_revisions() until the actual traversal was done.

But since then, we've had many cleanups that makes this safe; e.g.,
df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
2013-05-25) and 31faeb2088 (object_array_entry: fix memory handling of
the name field, 2013-05-25). The comment here is now out-dated; we can
just clear the strvec rather than leaking.

Note that there's still a small leak because of the way
setup_revisions() handles removed elements internally. That will be
fixed in a subsequent patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 bisect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bisect.c b/bisect.c
index 421470bfa5..9cce23e929 100644
--- a/bisect.c
+++ b/bisect.c
@@ -670,7 +670,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 		read_bisect_paths(&rev_argv);
 
 	setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
-	/* XXX leak rev_argv, as "revs" may still be pointing to it */
+	strvec_clear(&rev_argv);
 }
 
 static void bisect_common(struct rev_info *revs)
-- 
2.37.1.804.g1775fa20e0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer
  2022-07-29  8:31   ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
@ 2022-07-29 18:44     ` Junio C Hamano
  0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2022-07-29 18:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Adjust code added in 5d7eeee2ac6 (git-show: grok blobs, trees and
> tags, too, 2006-12-14) to use the "memcpy a &blank" idiom introduced
> in 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
> macro, 2021-07-01).

Once we start using the "copy blank object to clear" pattern, we may
want to go all the way to use structure assignment instead of
memcpy().


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-29 18:37             ` Jeff King
@ 2022-07-29 18:52               ` Junio C Hamano
  2022-07-29 19:04                 ` Jeff King
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2022-07-29 18:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, git

Jeff King <peff@peff.net> writes:

> The more interesting question is whether it causes any use-after-free
> bugs.

Thanks for mentioning this.  All the "plug more leaks" patches make
me worried for exactly that.  Another is a potential subtle breakage
hidden by use of FREE_AND_NULL() and friends, which the sanitizers
would probably not see, but can appear as behaviour change.

> I don't think it does, and certainly SANITIZE=address agrees.

;-)

> -- >8 --
> Subject: [PATCH] bisect: stop leaking strvec in bisect_rev_setup()
>
> Back when 8a534b6124 (bisect: use argv_array API, 2011-09-13) was
> written, it was not safe to free the argv we had passed to
> setup_revisions() until the actual traversal was done.
>
> But since then, we've had many cleanups that makes this safe; e.g.,
> df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> 2013-05-25) and 31faeb2088 (object_array_entry: fix memory handling of
> the name field, 2013-05-25). The comment here is now out-dated; we can
> just clear the strvec rather than leaking.
>
> Note that there's still a small leak because of the way
> setup_revisions() handles removed elements internally. That will be
> fixed in a subsequent patch.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  bisect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/bisect.c b/bisect.c
> index 421470bfa5..9cce23e929 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -670,7 +670,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
>  		read_bisect_paths(&rev_argv);
>  
>  	setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
> -	/* XXX leak rev_argv, as "revs" may still be pointing to it */
> +	strvec_clear(&rev_argv);
>  }
>  
>  static void bisect_common(struct rev_info *revs)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
  2022-07-29  8:31   ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
@ 2022-07-29 18:55     ` Jeff King
  2022-07-29 19:07       ` Jeff King
  2022-07-29 18:59     ` Jeff King
  1 sibling, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-07-29 18:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Jul 29, 2022 at 10:31:06AM +0200, Ævar Arnfjörð Bjarmason wrote:

> @@ -701,6 +701,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
>  
>  	memcpy(&pending, &rev.pending, sizeof(rev.pending));
> +	memcpy(&rev.pending, &blank, sizeof(rev.pending));
>  	rev.diffopt.no_free = 1;
>  	for (i = 0; i < pending.nr && !ret; i++) {
>  		struct object *o = pending.objects[i].item;

OK, so now "pending" is completely separate from what's in "rev". And
then in the later hunk we clear it:

> @@ -752,6 +752,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  			ret = error(_("unknown type: %d"), o->type);
>  		}
>  	}
> +	object_array_clear(&pending);

Good. But in the middle hunk...

> @@ -744,7 +745,6 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  			rev.shown_one = 1;
>  			break;
>  		case OBJ_COMMIT:
> -			memcpy(&rev.pending, &blank, sizeof(rev.pending));
>  			add_object_array(o, name, &rev.pending);
>  			ret = cmd_log_walk_no_free(&rev);
>  			break;

We now do not do anything to clean up rev.pending. On the first pass,
we'd see our blank pending array and add to it. But on a subsequent pass
(i.e., because we are showing two or more commits), what will we see?

My initial assumption was we'd have the last pass's commit in "pending"
here, so we'd be leaking it. But I think in practice it's OK. We end up
in prepare_revision_walk(), which blanks the list again, and then
processes each element. Non-commits _do_ end up back in the pending
list, which would be a leak. But we know that this code triggers only
for commits, which are placed only in the "commits" list (and that's
cleaned up as we walk it via get_revision()).

That might be worth mentioning in the commit message, or even including
a comment like:

  /*
   * No need to clean up rev.pending here; commits are removed from it
   * as part of prepare_revision_walk().
   */

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
  2022-07-29  8:31   ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
  2022-07-29 18:55     ` Jeff King
@ 2022-07-29 18:59     ` Jeff King
  1 sibling, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 18:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Jul 29, 2022 at 10:31:06AM +0200, Ævar Arnfjörð Bjarmason wrote:

> So in addition to the now-populated &pending array, to avoid leaking
> the memory related to the one member array we've created. The
> &rev.pending was already being object_array_clear()'d by the
> release_revisions() added in f6bfea0ad01 (revisions API users: use
> release_revisions() in builtin/log.c, 2022-04-13), now we'll also
> correctly free the previous data (f6bfea0ad01 noted this memory leak
> as an outstanding TODO).

I couldn't quite parse the first sentence. But I think this explanation
is proceeding along a wrong assumption (which is the same one I had when
reviewing v1) that we are leaking the one-member array. I think this
paragraph can be replaced with something along the lines of what I
mentioned in my other reply.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-29  8:31   ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-07-29 19:00     ` Jeff King
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 19:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Jul 29, 2022 at 10:31:08AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Add a "free_removed_argv_elements" member to "struct
> setup_revision_opt", and use it to fix several memory leaks, e.g. the
> one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array
> API, 2011-09-13).

I think the mention of this "XXX" comment is a little misleading. That
comment is not talking about leaking the "--" entry. It is talking about
leaking the _entire_ strvec, because of memory-lifetime issues. And we
fixed those already in 5.

It's a small point, and the code here is fine. But if you take my
suggested replacement for 5, then it becomes even more nonsensical. ;)

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 0/6] revisions API: fix more memory leaks
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
                     ` (5 preceding siblings ...)
  2022-07-29  8:31   ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-07-29 19:02   ` Jeff King
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 19:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Jul 29, 2022 at 10:31:02AM +0200, Ævar Arnfjörð Bjarmason wrote:

> A late re-roll of this set of revisions API and API user memory leak
> fixes. I think the much simpler code here in 4/6 should address the
> points Jeff King brought up in the v1 review.
> 
> Other than that I renamed the variable in 3/6 s/cp/pending/g, per
> Jeff's suggestion (FWIW "cp" = "copy").

Thanks (and for the explanation regarding t91xx; I did totally miss in
the v1 review that those were removing "FAIL" lines, not "PASS" ones).

I don't think there's anything incorrect here, but I did make a few
cosmetic / explanation suggestions.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-07-29 18:52               ` Junio C Hamano
@ 2022-07-29 19:04                 ` Jeff King
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 19:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, git

On Fri, Jul 29, 2022 at 11:52:02AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The more interesting question is whether it causes any use-after-free
> > bugs.
> 
> Thanks for mentioning this.  All the "plug more leaks" patches make
> me worried for exactly that.  Another is a potential subtle breakage
> hidden by use of FREE_AND_NULL() and friends, which the sanitizers
> would probably not see, but can appear as behaviour change.

Yeah, agreed that is a potential pitfall.

> > I don't think it does, and certainly SANITIZE=address agrees.
> 
> ;-)

Just to expound a bit: I'm quite sure _this_ call site is fine, because
we are passing pretty vanilla options to setup_revisions(), and I looked
at how we handle the strings there. I'd worry a little more that there's
some dark corner of setup_revisions() which relies on memory lifetimes,
and so this isn't a safe thing to do in the general case.

But IMHO we should assume that setup_revisions() makes copies of strings
it needs, and if there is a dark corner where it doesn't, we should find
and fix that.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
  2022-07-29 18:55     ` Jeff King
@ 2022-07-29 19:07       ` Jeff King
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 19:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Jul 29, 2022 at 02:55:35PM -0400, Jeff King wrote:

> > @@ -744,7 +745,6 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> >  			rev.shown_one = 1;
> >  			break;
> >  		case OBJ_COMMIT:
> > -			memcpy(&rev.pending, &blank, sizeof(rev.pending));
> >  			add_object_array(o, name, &rev.pending);
> >  			ret = cmd_log_walk_no_free(&rev);
> >  			break;
> 
> We now do not do anything to clean up rev.pending. On the first pass,
> we'd see our blank pending array and add to it. But on a subsequent pass
> (i.e., because we are showing two or more commits), what will we see?
> 
> My initial assumption was we'd have the last pass's commit in "pending"
> here, so we'd be leaking it. But I think in practice it's OK. We end up
> in prepare_revision_walk(), which blanks the list again, and then
> processes each element. Non-commits _do_ end up back in the pending
> list, which would be a leak. But we know that this code triggers only
> for commits, which are placed only in the "commits" list (and that's
> cleaned up as we walk it via get_revision()).

Sorry, just one more clarification here.

The "so we'd be leaking it" in the second paragraph is really: so before
this patch, we'd have been leaking it writing over it with blank. After
this patch, we'd be building up an ever-growing list of pending objects,
and showing a quadratic number of entries. ;)

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again)
  2022-07-29  8:31   ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
@ 2022-07-30  5:22     ` Eric Sunshine
  0 siblings, 0 replies; 52+ messages in thread
From: Eric Sunshine @ 2022-07-30  5:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Jeff King

On Fri, Jul 29, 2022 at 4:33 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Fix a bug in 0139c58ab95 (revisions API users: add "goto cleanup" for
> release_revisions(), 2022-04-13), in that commit a release_revisions()
> call was added to this function, but it never did anything due to this
> TODO memset() added in fe1a21d5267 (fast-rebase: demonstrate
> merge-ort's API via new test-tool command, 2020-10-29).
>
> Simply removing the memset() will fix the "cmdline" which can be seen
> when running t5520-pull.sh.
>
> This sort of thing could be detected automatically with a rule similar
> to the unused.cocci merged in 7fa60d2a5b6 (Merge branch
> 'ab/cocci-unused' into next, 2022-07-11). The following rule on top
> would catch the case being fixed here:
>
>         @@
>         type T;
>         identifier I;
>         identifier REL1 =~ "^[a-z_]*_(release|reset|clear|free)$";
>         identifier REL2 =~ "^(release|clear|free)_[a-z_]*$";
>         @@
>
>         - memset(\( I \| &I \), 0, ...);
>           ... when != \( I \| &I \)
>         (
>           \( REL1 \| REL2 \)( \( I \| &I \), ...);
>         |
>           \( REL1 \| REL2 \)( \( &I \| I \) );
>         )
>           ... when != \( I \| &I \)
>
> That rule should arguably use only &I, not I (as we might be passed a
> pointer). He distinction would matter if anyone cared about the

s/He/The/

> side-effects of a memset() followed by release() of a pointer to a
> variable passed into the function.
>
> As such a pattern would be at best very confusing, and most likely
> point to buggy code as in this case, the above rule is probably fine
> as-is.
>
> But as this rule only found one such bug in the entire codebase let's
> not add it to contrib/coccinelle/unused.cocci for now, we can always
> dig it up in the future if it's deemed useful.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 0/6] revisions API: fix more memory leaks
  2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
                     ` (6 preceding siblings ...)
  2022-07-29 19:02   ` [PATCH v2 0/6] revisions API: fix more memory leaks Jeff King
@ 2022-08-02 15:33   ` Ævar Arnfjörð Bjarmason
  2022-08-02 15:33     ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
                       ` (6 more replies)
  7 siblings, 7 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

This series fixes various leaking users of the revisions API.

Changes since v2:

* I think the rewrite here of fixing the leak should address Jeff
  King's concerns. I was somewhat mentally stuck on trying to change
  the code to make the leak fix easier, and then having the leak fix.

  But I think as the new 3/6 shows fixing the leak first is much
  easier, and more straightforward to explain.

* A new 4/6 then follows-up and rewrites the variable clobbering that
  was needed for the pre-image of 3/6.

* Don't reference the "XXX" comment in 6/6.

* A typo fix, pointed out by Eric Sunshine.

Thanks all for the review on the v2!

Ævar Arnfjörð Bjarmason (6):
  bisect.c: add missing "goto" for release_revisions()
  test-fast-rebase helper: use release_revisions() (again)
  log: fix a memory leak in "git show <revision>..."
  log: refactor "rev.pending" code in cmd_show()
  bisect.c: partially fix bisect_rev_setup() memory leak
  revisions API: don't leak memory on argv elements that need free()-ing

 bisect.c                                      | 28 ++++++++++-------
 builtin/log.c                                 | 31 +++++++++++++------
 builtin/submodule--helper.c                   |  5 ++-
 remote.c                                      |  5 ++-
 revision.c                                    |  2 ++
 revision.h                                    |  3 +-
 t/helper/test-fast-rebase.c                   |  2 --
 t/t0203-gettext-setlocale-sanity.sh           |  1 +
 t/t1020-subdirectory.sh                       |  1 +
 t/t2020-checkout-detach.sh                    |  1 +
 t/t3307-notes-man.sh                          |  1 +
 t/t3920-crlf-messages.sh                      |  2 ++
 t/t4069-remerge-diff.sh                       |  1 +
 t/t7007-show.sh                               |  1 +
 ...3-pre-commit-and-pre-merge-commit-hooks.sh |  1 +
 t/t9122-git-svn-author.sh                     |  1 -
 t/t9162-git-svn-dcommit-interactive.sh        |  1 -
 17 files changed, 59 insertions(+), 28 deletions(-)

Range-diff against v2:
1:  12a4a20c59f = 1:  12a4a20c59f bisect.c: add missing "goto" for release_revisions()
2:  bbd3a7e5ecc ! 2:  a04fade9d9d test-fast-rebase helper: use release_revisions() (again)
    @@ Commit message
                   ... when != \( I \| &I \)
     
         That rule should arguably use only &I, not I (as we might be passed a
    -    pointer). He distinction would matter if anyone cared about the
    +    pointer). The distinction would matter if anyone cared about the
         side-effects of a memset() followed by release() of a pointer to a
         variable passed into the function.
     
4:  54b632c1124 ! 3:  83fc1835fe5 log: fix common "rev.pending" memory leak in "git show"
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    log: fix common "rev.pending" memory leak in "git show"
    -
    -    Fix a very common memory leak introduced in 5d7eeee2ac6 (git-show: grok blobs,
    -    trees and tags, too, 2006-12-14).
    -
    -    When "git show" displays commits it needs to temporarily clobbers the
    -    "rev.pending" array, but by doing so we'll fail to
    -    release_revisions(), as we have for most other uses of "struct
    -    rev_info" since 2da81d1efb0 (Merge branch 'ab/plug-leak-in-revisions',
    -    2022-06-07).
    -
    -    In the preceding commit this code was made to use a more extendable
    -    pattern, which we can now complete. Once we've clobbered our
    -    "rev.pending" and invoked "cmd_log_walk_no_free()" we need to
    -    "object_array_clear()" our old "rev.pending".
    -
    -    So in addition to the now-populated &pending array, to avoid leaking
    -    the memory related to the one member array we've created. The
    -    &rev.pending was already being object_array_clear()'d by the
    -    release_revisions() added in f6bfea0ad01 (revisions API users: use
    -    release_revisions() in builtin/log.c, 2022-04-13), now we'll also
    -    correctly free the previous data (f6bfea0ad01 noted this memory leak
    -    as an outstanding TODO).
    -
    -    This way of doing it is making assumptions about the internals of
    -    "struct rev_info", in particular that the "pending" member is
    -    stand-alone, and that no other state is referring to it. But we've
    -    been making those assumptions ever since 5d7eeee2ac6 (git-show: grok
    -    blobs, trees and tags, too, 2006-12-14).
    +    log: fix a memory leak in "git show <revision>..."
    +
    +    Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs,
    +    trees and tags, too, 2006-12-14). As we iterate over a "<revision>..."
    +    command-line and encounter ad OBJ_COMMIT we want to use our "struct
    +    rev_info", but with a "pending" array of one element: the one commit
    +    we're showing in the loop.
    +
    +    To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and
    +    rev.pending.nr for its iteration. We'd then clobber those (and alloc)
    +    when we needed to show an OBJ_COMMIT.
    +
    +    We'd therefore leak the "rev.pending" we started out with, and only
    +    free the new "rev.pending" in the "OBJ_COMMIT" case arm as
    +    prepare_revision_walk() would draw it down.
    +
    +    Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
    +    save away the "rev.pending" before clearing it. We then add a single
    +    commit to it, which our indirect invocation of prepare_revision_walk()
    +    will remove. After that we restore the "rev.pending".
    +
    +    Our "rev.pending" will then get free'd by the release_revisions()
    +    added in f6bfea0ad01 (revisions API users: use release_revisions() in
    +    builtin/log.c, 2022-04-13)
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/log.c ##
    -@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
    - 		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
    - 
    - 	memcpy(&pending, &rev.pending, sizeof(rev.pending));
    -+	memcpy(&rev.pending, &blank, sizeof(rev.pending));
    - 	rev.diffopt.no_free = 1;
    - 	for (i = 0; i < pending.nr && !ret; i++) {
    - 		struct object *o = pending.objects[i].item;
     @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
      			rev.shown_one = 1;
      			break;
      		case OBJ_COMMIT:
    --			memcpy(&rev.pending, &blank, sizeof(rev.pending));
    ++		{
    ++			struct object_array old;
    ++
    ++			memcpy(&old, &rev.pending, sizeof(old));
    + 			rev.pending.nr = rev.pending.alloc = 0;
    + 			rev.pending.objects = NULL;
      			add_object_array(o, name, &rev.pending);
      			ret = cmd_log_walk_no_free(&rev);
    ++			memcpy(&rev.pending, &old, sizeof(rev.pending));
      			break;
    -@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
    ++		}
    + 		default:
      			ret = error(_("unknown type: %d"), o->type);
      		}
    - 	}
    -+	object_array_clear(&pending);
    - 
    - 	rev.diffopt.no_free = 0;
    - 	diff_free(&rev.diffopt);
     
      ## t/t0203-gettext-setlocale-sanity.sh ##
     @@
3:  1629299f883 ! 4:  fd474666e7c log: make the intent of cmd_show()'s "rev.pending" juggling clearer
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    log: make the intent of cmd_show()'s "rev.pending" juggling clearer
    +    log: refactor "rev.pending" code in cmd_show()
     
    -    Adjust code added in 5d7eeee2ac6 (git-show: grok blobs, trees and
    -    tags, too, 2006-12-14) to use the "memcpy a &blank" idiom introduced
    -    in 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
    -    macro, 2021-07-01).
    +    Refactor the juggling of "rev.pending" and our replacement for it
    +    amended in the preceding commit so that:
     
    -    Now the types in play are guaranteed to correspond, i.e. we used "int"
    -    here for the "count" before, but the corresponding "nr" is an
    -    "unsigned int". By using a "blank" object we almost entirely bypass
    -    that, we'll only need to declare our own "unsigned int i".
    +     * We use an "unsigned int" instead of an "int" for "i", this matches
    +       the types of "struct rev_info" itself.
     
    -    There are no functional changes here aside from potential overflow
    -    guard rails, the structure only has these three members ("nr", "alloc"
    -    and "objects"), but now we're obviously future-proof against assuming
    -    that.
    +     * We don't need the "count" and "objects" variables introduced in
    +       5d7eeee2ac6 (git-show: grok blobs, trees and tags, too, 2006-12-14).
    +
    +       They were originally added since we'd clobber rev.pending in the
    +       loop without restoring it. Since the preceding commit we are
    +       restoring it when we handle OBJ_COMMIT, so the main for-loop can
    +       refer to "rev.pending" didrectly.
    +
    +     * We use the "memcpy a &blank" idiom introduced in
    +       5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
    +       macro, 2021-07-01).
    +
    +       This is more obvious than relying on us enumerating all of the
    +       relevant members of the "struct object_array" that we need to
    +       clear.
    +
    +     * We comment on why we don't need an object_array_clear() here, see
    +       the analysis in [1].
    +
    +    1. https://lore.kernel.org/git/YuQtJ2DxNKX%2Fy70N@coredump.intra.peff.net/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    +    Helped-by: Jeff King <peff@peff.net>
     
      ## builtin/log.c ##
     @@ builtin/log.c: static void show_setup_revisions_tweak(struct rev_info *rev,
    @@ builtin/log.c: static void show_setup_revisions_tweak(struct rev_info *rev,
      {
      	struct rev_info rev;
     -	struct object_array_entry *objects;
    -+	struct object_array blank = OBJECT_ARRAY_INIT;
    -+	struct object_array pending = OBJECT_ARRAY_INIT;
     +	unsigned int i;
      	struct setup_revision_opt opt;
      	struct pathspec match_all;
    @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
      
     -	count = rev.pending.nr;
     -	objects = rev.pending.objects;
    -+	memcpy(&pending, &rev.pending, sizeof(rev.pending));
      	rev.diffopt.no_free = 1;
     -	for (i = 0; i < count && !ret; i++) {
     -		struct object *o = objects[i].item;
     -		const char *name = objects[i].name;
    -+	for (i = 0; i < pending.nr && !ret; i++) {
    -+		struct object *o = pending.objects[i].item;
    -+		const char *name = pending.objects[i].name;
    ++	for (i = 0; i < rev.pending.nr && !ret; i++) {
    ++		struct object *o = rev.pending.objects[i].item;
    ++		const char *name = rev.pending.objects[i].name;
      		switch (o->type) {
      		case OBJ_BLOB:
      			ret = show_blob_object(&o->oid, &rev, name);
    @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
      				ret = error(_("could not read object %s"),
      					    oid_to_hex(oid));
     -			objects[i].item = o;
    -+			pending.objects[i].item = o;
    ++			rev.pending.objects[i].item = o;
      			i--;
      			break;
      		}
     @@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
    - 			rev.shown_one = 1;
    - 			break;
      		case OBJ_COMMIT:
    + 		{
    + 			struct object_array old;
    ++			struct object_array blank = OBJECT_ARRAY_INIT;
    + 
    + 			memcpy(&old, &rev.pending, sizeof(old));
     -			rev.pending.nr = rev.pending.alloc = 0;
     -			rev.pending.objects = NULL;
     +			memcpy(&rev.pending, &blank, sizeof(rev.pending));
    ++
      			add_object_array(o, name, &rev.pending);
      			ret = cmd_log_walk_no_free(&rev);
    ++
    ++			/*
    ++			 * No need for
    ++			 * object_array_clear(&pending). It was
    ++			 * cleared already in prepare_revision_walk()
    ++			 */
    + 			memcpy(&rev.pending, &old, sizeof(rev.pending));
      			break;
    + 		}
5:  2af2bce2d17 = 5:  dae5872e722 bisect.c: partially fix bisect_rev_setup() memory leak
6:  3c57e126554 ! 6:  5e637f55ff1 revisions API: don't leak memory on argv elements that need free()-ing
    @@ Commit message
         revisions API: don't leak memory on argv elements that need free()-ing
     
         Add a "free_removed_argv_elements" member to "struct
    -    setup_revision_opt", and use it to fix several memory leaks, e.g. the
    -    one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array
    -    API, 2011-09-13).
    +    setup_revision_opt", and use it to fix several memory leaks.
     
         We have various memory leaks in APIs that take and munge "const
         char **argv", e.g. parse_options(). Sometimes these APIs are given the
-- 
2.37.1.1233.ge8b09efaedc


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions()
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33     ` Ævar Arnfjörð Bjarmason
  2022-08-03 17:13       ` Junio C Hamano
  2022-08-02 15:33     ` [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Add a missing "goto cleanup", this fixes a bug in
f196c1e908d (revisions API users: use release_revisions() needing
REV_INFO_INIT, 2022-04-13).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bisect.c b/bisect.c
index b63669cc9d7..421470bfa59 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1054,7 +1054,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 		 */
 		res = error_if_skipped_commits(tried, NULL);
 		if (res < 0)
-			return res;
+			goto cleanup;
 		printf(_("%s was both %s and %s\n"),
 		       oid_to_hex(current_bad_oid),
 		       term_good,
-- 
2.37.1.1233.ge8b09efaedc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again)
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
  2022-08-02 15:33     ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33     ` Ævar Arnfjörð Bjarmason
  2022-08-02 15:33     ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Fix a bug in 0139c58ab95 (revisions API users: add "goto cleanup" for
release_revisions(), 2022-04-13), in that commit a release_revisions()
call was added to this function, but it never did anything due to this
TODO memset() added in fe1a21d5267 (fast-rebase: demonstrate
merge-ort's API via new test-tool command, 2020-10-29).

Simply removing the memset() will fix the "cmdline" which can be seen
when running t5520-pull.sh.

This sort of thing could be detected automatically with a rule similar
to the unused.cocci merged in 7fa60d2a5b6 (Merge branch
'ab/cocci-unused' into next, 2022-07-11). The following rule on top
would catch the case being fixed here:

	@@
	type T;
	identifier I;
	identifier REL1 =~ "^[a-z_]*_(release|reset|clear|free)$";
	identifier REL2 =~ "^(release|clear|free)_[a-z_]*$";
	@@

	- memset(\( I \| &I \), 0, ...);
	  ... when != \( I \| &I \)
	(
	  \( REL1 \| REL2 \)( \( I \| &I \), ...);
	|
	  \( REL1 \| REL2 \)( \( &I \| I \) );
	)
	  ... when != \( I \| &I \)

That rule should arguably use only &I, not I (as we might be passed a
pointer). The distinction would matter if anyone cared about the
side-effects of a memset() followed by release() of a pointer to a
variable passed into the function.

As such a pattern would be at best very confusing, and most likely
point to buggy code as in this case, the above rule is probably fine
as-is.

But as this rule only found one such bug in the entire codebase let's
not add it to contrib/coccinelle/unused.cocci for now, we can always
dig it up in the future if it's deemed useful.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/helper/test-fast-rebase.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 4e5553e2024..45665ec19a5 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -184,8 +184,6 @@ int cmd__fast_rebase(int argc, const char **argv)
 		last_picked_commit = commit;
 		last_commit = create_commit(result.tree, commit, last_commit);
 	}
-	/* TODO: There should be some kind of rev_info_free(&revs) call... */
-	memset(&revs, 0, sizeof(revs));
 
 	merge_switch_to_result(&merge_opt, head_tree, &result, 1, !result.clean);
 
-- 
2.37.1.1233.ge8b09efaedc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
  2022-08-02 15:33     ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
  2022-08-02 15:33     ` [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33     ` Ævar Arnfjörð Bjarmason
  2022-08-03 17:27       ` Jeff King
  2022-08-03 17:54       ` Junio C Hamano
  2022-08-02 15:33     ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
                       ` (3 subsequent siblings)
  6 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs,
trees and tags, too, 2006-12-14). As we iterate over a "<revision>..."
command-line and encounter ad OBJ_COMMIT we want to use our "struct
rev_info", but with a "pending" array of one element: the one commit
we're showing in the loop.

To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and
rev.pending.nr for its iteration. We'd then clobber those (and alloc)
when we needed to show an OBJ_COMMIT.

We'd therefore leak the "rev.pending" we started out with, and only
free the new "rev.pending" in the "OBJ_COMMIT" case arm as
prepare_revision_walk() would draw it down.

Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
save away the "rev.pending" before clearing it. We then add a single
commit to it, which our indirect invocation of prepare_revision_walk()
will remove. After that we restore the "rev.pending".

Our "rev.pending" will then get free'd by the release_revisions()
added in f6bfea0ad01 (revisions API users: use release_revisions() in
builtin/log.c, 2022-04-13)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/log.c                                    | 6 ++++++
 t/t0203-gettext-setlocale-sanity.sh              | 1 +
 t/t1020-subdirectory.sh                          | 1 +
 t/t3307-notes-man.sh                             | 1 +
 t/t3920-crlf-messages.sh                         | 2 ++
 t/t4069-remerge-diff.sh                          | 1 +
 t/t7007-show.sh                                  | 1 +
 t/t7503-pre-commit-and-pre-merge-commit-hooks.sh | 1 +
 t/t9122-git-svn-author.sh                        | 1 -
 t/t9162-git-svn-dcommit-interactive.sh           | 1 -
 10 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 88a5e98875a..b4b1d974617 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			rev.shown_one = 1;
 			break;
 		case OBJ_COMMIT:
+		{
+			struct object_array old;
+
+			memcpy(&old, &rev.pending, sizeof(old));
 			rev.pending.nr = rev.pending.alloc = 0;
 			rev.pending.objects = NULL;
 			add_object_array(o, name, &rev.pending);
 			ret = cmd_log_walk_no_free(&rev);
+			memcpy(&rev.pending, &old, sizeof(rev.pending));
 			break;
+		}
 		default:
 			ret = error(_("unknown type: %d"), o->type);
 		}
diff --git a/t/t0203-gettext-setlocale-sanity.sh b/t/t0203-gettext-setlocale-sanity.sh
index 0ce1f22eff6..86cff324ff1 100755
--- a/t/t0203-gettext-setlocale-sanity.sh
+++ b/t/t0203-gettext-setlocale-sanity.sh
@@ -5,6 +5,7 @@
 
 test_description="The Git C functions aren't broken by setlocale(3)"
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./lib-gettext.sh
 
 test_expect_success 'git show a ISO-8859-1 commit under C locale' '
diff --git a/t/t1020-subdirectory.sh b/t/t1020-subdirectory.sh
index 9fdbb2af80e..45eef9457fe 100755
--- a/t/t1020-subdirectory.sh
+++ b/t/t1020-subdirectory.sh
@@ -6,6 +6,7 @@
 test_description='Try various core-level commands in subdirectory.
 '
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-read-tree.sh
 
diff --git a/t/t3307-notes-man.sh b/t/t3307-notes-man.sh
index 1aa366a410e..ae316502c45 100755
--- a/t/t3307-notes-man.sh
+++ b/t/t3307-notes-man.sh
@@ -4,6 +4,7 @@ test_description='Examples from the git-notes man page
 
 Make sure the manual is not full of lies.'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 0276edbe3d3..4c661d4d54a 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 LIB_CRLF_BRANCHES=""
diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
index 35f94957fce..9e7cac68b1c 100755
--- a/t/t4069-remerge-diff.sh
+++ b/t/t4069-remerge-diff.sh
@@ -2,6 +2,7 @@
 
 test_description='remerge-diff handling'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 # This test is ort-specific
diff --git a/t/t7007-show.sh b/t/t7007-show.sh
index d6cc69e0f2c..f908a4d1abc 100755
--- a/t/t7007-show.sh
+++ b/t/t7007-show.sh
@@ -2,6 +2,7 @@
 
 test_description='git show'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success setup '
diff --git a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
index ad1eb64ba0d..aa004b70a8d 100755
--- a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
+++ b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
@@ -5,6 +5,7 @@ test_description='pre-commit and pre-merge-commit hooks'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'root commit' '
diff --git a/t/t9122-git-svn-author.sh b/t/t9122-git-svn-author.sh
index 527ba3d2932..0fc289ae0f0 100755
--- a/t/t9122-git-svn-author.sh
+++ b/t/t9122-git-svn-author.sh
@@ -2,7 +2,6 @@
 
 test_description='git svn authorship'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 test_expect_success 'setup svn repository' '
diff --git a/t/t9162-git-svn-dcommit-interactive.sh b/t/t9162-git-svn-dcommit-interactive.sh
index e2aa8ed88a9..b3ce033a0d3 100755
--- a/t/t9162-git-svn-dcommit-interactive.sh
+++ b/t/t9162-git-svn-dcommit-interactive.sh
@@ -4,7 +4,6 @@
 
 test_description='git svn dcommit --interactive series'
 
-TEST_FAILS_SANITIZE_LEAK=true
 . ./lib-git-svn.sh
 
 test_expect_success 'initialize repo' '
-- 
2.37.1.1233.ge8b09efaedc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
                       ` (2 preceding siblings ...)
  2022-08-02 15:33     ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33     ` Ævar Arnfjörð Bjarmason
  2022-08-03 17:32       ` Jeff King
  2022-08-03 17:59       ` Junio C Hamano
  2022-08-02 15:33     ` [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  6 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Refactor the juggling of "rev.pending" and our replacement for it
amended in the preceding commit so that:

 * We use an "unsigned int" instead of an "int" for "i", this matches
   the types of "struct rev_info" itself.

 * We don't need the "count" and "objects" variables introduced in
   5d7eeee2ac6 (git-show: grok blobs, trees and tags, too, 2006-12-14).

   They were originally added since we'd clobber rev.pending in the
   loop without restoring it. Since the preceding commit we are
   restoring it when we handle OBJ_COMMIT, so the main for-loop can
   refer to "rev.pending" didrectly.

 * We use the "memcpy a &blank" idiom introduced in
   5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
   macro, 2021-07-01).

   This is more obvious than relying on us enumerating all of the
   relevant members of the "struct object_array" that we need to
   clear.

 * We comment on why we don't need an object_array_clear() here, see
   the analysis in [1].

1. https://lore.kernel.org/git/YuQtJ2DxNKX%2Fy70N@coredump.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Jeff King <peff@peff.net>
---
 builtin/log.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index b4b1d974617..9b937d59b83 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -668,10 +668,10 @@ static void show_setup_revisions_tweak(struct rev_info *rev,
 int cmd_show(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info rev;
-	struct object_array_entry *objects;
+	unsigned int i;
 	struct setup_revision_opt opt;
 	struct pathspec match_all;
-	int i, count, ret = 0;
+	int ret = 0;
 
 	init_log_defaults();
 	git_config(git_log_config, NULL);
@@ -698,12 +698,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	if (!rev.no_walk)
 		return cmd_log_deinit(cmd_log_walk(&rev), &rev);
 
-	count = rev.pending.nr;
-	objects = rev.pending.objects;
 	rev.diffopt.no_free = 1;
-	for (i = 0; i < count && !ret; i++) {
-		struct object *o = objects[i].item;
-		const char *name = objects[i].name;
+	for (i = 0; i < rev.pending.nr && !ret; i++) {
+		struct object *o = rev.pending.objects[i].item;
+		const char *name = rev.pending.objects[i].name;
 		switch (o->type) {
 		case OBJ_BLOB:
 			ret = show_blob_object(&o->oid, &rev, name);
@@ -726,7 +724,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			if (!o)
 				ret = error(_("could not read object %s"),
 					    oid_to_hex(oid));
-			objects[i].item = o;
+			rev.pending.objects[i].item = o;
 			i--;
 			break;
 		}
@@ -745,12 +743,19 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 		case OBJ_COMMIT:
 		{
 			struct object_array old;
+			struct object_array blank = OBJECT_ARRAY_INIT;
 
 			memcpy(&old, &rev.pending, sizeof(old));
-			rev.pending.nr = rev.pending.alloc = 0;
-			rev.pending.objects = NULL;
+			memcpy(&rev.pending, &blank, sizeof(rev.pending));
+
 			add_object_array(o, name, &rev.pending);
 			ret = cmd_log_walk_no_free(&rev);
+
+			/*
+			 * No need for
+			 * object_array_clear(&pending). It was
+			 * cleared already in prepare_revision_walk()
+			 */
 			memcpy(&rev.pending, &old, sizeof(rev.pending));
 			break;
 		}
-- 
2.37.1.1233.ge8b09efaedc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
                       ` (3 preceding siblings ...)
  2022-08-02 15:33     ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33     ` Ævar Arnfjörð Bjarmason
  2022-08-02 15:33     ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
  2022-08-03 17:33     ` [PATCH v3 0/6] revisions API: fix more memory leaks Jeff King
  6 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Partially fix the memory leak noted in in 8a534b61241 (bisect: use
argv_array API, 2011-09-13), which added the "XXX" comment seen in the
context. We can partially fix it by having the bisect_rev_setup()
function take a "struct strvec", rather than constructing it.

As the comment notes we need to keep the construct "rev_argv" around
while the "struct rev_info" is around, which as seen in the newly
added "strvec_clear()" calls here we do after "release_revisions()".

This "partially" fixes the memory leak because we're leaking the "--"
added to the "rev_argv" here still, which will be addressed in a
subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/bisect.c b/bisect.c
index 421470bfa59..6afb98be7a1 100644
--- a/bisect.c
+++ b/bisect.c
@@ -648,11 +648,11 @@ static struct commit_list *managed_skipped(struct commit_list *list,
 }
 
 static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
+			     struct strvec *rev_argv,
 			     const char *prefix,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
 {
-	struct strvec rev_argv = STRVEC_INIT;
 	int i;
 
 	repo_init_revisions(r, revs, prefix);
@@ -660,16 +660,16 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 	revs->commit_format = CMIT_FMT_UNSPECIFIED;
 
 	/* rev_argv.argv[0] will be ignored by setup_revisions */
-	strvec_push(&rev_argv, "bisect_rev_setup");
-	strvec_pushf(&rev_argv, bad_format, oid_to_hex(current_bad_oid));
+	strvec_push(rev_argv, "bisect_rev_setup");
+	strvec_pushf(rev_argv, bad_format, oid_to_hex(current_bad_oid));
 	for (i = 0; i < good_revs.nr; i++)
-		strvec_pushf(&rev_argv, good_format,
+		strvec_pushf(rev_argv, good_format,
 			     oid_to_hex(good_revs.oid + i));
-	strvec_push(&rev_argv, "--");
+	strvec_push(rev_argv, "--");
 	if (read_paths)
-		read_bisect_paths(&rev_argv);
+		read_bisect_paths(rev_argv);
 
-	setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
+	setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
 	/* XXX leak rev_argv, as "revs" may still be pointing to it */
 }
 
@@ -873,10 +873,11 @@ static enum bisect_error check_merge_bases(int rev_nr, struct commit **rev, int
 static int check_ancestors(struct repository *r, int rev_nr,
 			   struct commit **rev, const char *prefix)
 {
+	struct strvec rev_argv = STRVEC_INIT;
 	struct rev_info revs;
 	int res;
 
-	bisect_rev_setup(r, &revs, prefix, "^%s", "%s", 0);
+	bisect_rev_setup(r, &revs, &rev_argv, prefix, "^%s", "%s", 0);
 
 	bisect_common(&revs);
 	res = (revs.commits != NULL);
@@ -885,6 +886,7 @@ static int check_ancestors(struct repository *r, int rev_nr,
 	clear_commit_marks_many(rev_nr, rev, ALL_REV_FLAGS);
 
 	release_revisions(&revs);
+	strvec_clear(&rev_argv);
 	return res;
 }
 
@@ -1010,6 +1012,7 @@ void read_bisect_terms(const char **read_bad, const char **read_good)
  */
 enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 {
+	struct strvec rev_argv = STRVEC_INIT;
 	struct rev_info revs = REV_INFO_INIT;
 	struct commit_list *tried;
 	int reaches = 0, all = 0, nr, steps;
@@ -1037,7 +1040,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	if (res)
 		goto cleanup;
 
-	bisect_rev_setup(r, &revs, prefix, "%s", "^%s", 1);
+	bisect_rev_setup(r, &revs, &rev_argv, prefix, "%s", "^%s", 1);
 
 	revs.first_parent_only = !!(bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY);
 	revs.limited = 1;
@@ -1112,6 +1115,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	res = bisect_checkout(bisect_rev, no_checkout);
 cleanup:
 	release_revisions(&revs);
+	strvec_clear(&rev_argv);
 	return res;
 }
 
-- 
2.37.1.1233.ge8b09efaedc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
                       ` (4 preceding siblings ...)
  2022-08-02 15:33     ` [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33     ` Ævar Arnfjörð Bjarmason
  2022-08-03 18:30       ` Junio C Hamano
  2022-08-03 17:33     ` [PATCH v3 0/6] revisions API: fix more memory leaks Jeff King
  6 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Add a "free_removed_argv_elements" member to "struct
setup_revision_opt", and use it to fix several memory leaks.

We have various memory leaks in APIs that take and munge "const
char **argv", e.g. parse_options(). Sometimes these APIs are given the
"argv" we get to the "main" function, in which case we don't leak
memory, but other times we're giving it the "v" member of a "struct
strvec" we created.

There's several potential ways to fix those sort of leaks, we could
add a "nodup" mode to "struct strvec", which would work for the cases
where we push constant strings to it. But that wouldn't work as soon
as we used strvec_pushf(), or otherwise needed to duplicate or create
a string for that "struct strvec".

Let's instead make it the responsibility of the revisions API. If it's
going to clobber elements of argv it can also free() them, which it
will now do if instructed to do so via "free_removed_argv_elements".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c                    | 6 ++++--
 builtin/submodule--helper.c | 5 ++++-
 remote.c                    | 5 ++++-
 revision.c                  | 2 ++
 revision.h                  | 3 ++-
 t/t2020-checkout-detach.sh  | 1 +
 6 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/bisect.c b/bisect.c
index 6afb98be7a1..38b3891f3a6 100644
--- a/bisect.c
+++ b/bisect.c
@@ -653,6 +653,9 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 			     const char *bad_format, const char *good_format,
 			     int read_paths)
 {
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	int i;
 
 	repo_init_revisions(r, revs, prefix);
@@ -669,8 +672,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
 	if (read_paths)
 		read_bisect_paths(rev_argv);
 
-	setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
-	/* XXX leak rev_argv, as "revs" may still be pointing to it */
+	setup_revisions(rev_argv->nr, rev_argv->v, revs, &opt);
 }
 
 static void bisect_common(struct rev_info *revs)
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fac52ade5e1..b63f420ecef 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1104,6 +1104,9 @@ static int compute_summary_module_list(struct object_id *head_oid,
 {
 	struct strvec diff_args = STRVEC_INIT;
 	struct rev_info rev;
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	struct module_cb_list list = MODULE_CB_LIST_INIT;
 	int ret = 0;
 
@@ -1121,7 +1124,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
 	init_revisions(&rev, info->prefix);
 	rev.abbrev = 0;
 	precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
-	setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
+	setup_revisions(diff_args.nr, diff_args.v, &rev, &opt);
 	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = submodule_summary_callback;
 	rev.diffopt.format_callback_data = &list;
diff --git a/remote.c b/remote.c
index 1ee2b145d07..d66064118cb 100644
--- a/remote.c
+++ b/remote.c
@@ -2169,6 +2169,9 @@ static int stat_branch_pair(const char *branch_name, const char *base,
 	struct object_id oid;
 	struct commit *ours, *theirs;
 	struct rev_info revs;
+	struct setup_revision_opt opt = {
+		.free_removed_argv_elements = 1,
+	};
 	struct strvec argv = STRVEC_INIT;
 
 	/* Cannot stat if what we used to build on no longer exists */
@@ -2203,7 +2206,7 @@ static int stat_branch_pair(const char *branch_name, const char *base,
 	strvec_push(&argv, "--");
 
 	repo_init_revisions(the_repository, &revs, NULL);
-	setup_revisions(argv.nr, argv.v, &revs, NULL);
+	setup_revisions(argv.nr, argv.v, &revs, &opt);
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
diff --git a/revision.c b/revision.c
index 0c6e26cd9c8..35d24e4fd3e 100644
--- a/revision.c
+++ b/revision.c
@@ -2784,6 +2784,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			const char *arg = argv[i];
 			if (strcmp(arg, "--"))
 				continue;
+			if (opt && opt->free_removed_argv_elements)
+				free((char *)argv[i]);
 			argv[i] = NULL;
 			argc = i;
 			if (argv[i + 1])
diff --git a/revision.h b/revision.h
index e576845cdd1..bb91e7ed914 100644
--- a/revision.h
+++ b/revision.h
@@ -375,7 +375,8 @@ struct setup_revision_opt {
 	const char *def;
 	void (*tweak)(struct rev_info *, struct setup_revision_opt *);
 	unsigned int	assume_dashdash:1,
-			allow_exclude_promisor_objects:1;
+			allow_exclude_promisor_objects:1,
+			free_removed_argv_elements:1;
 	unsigned revarg_opt;
 };
 int setup_revisions(int argc, const char **argv, struct rev_info *revs,
diff --git a/t/t2020-checkout-detach.sh b/t/t2020-checkout-detach.sh
index bc46713a43e..2eab6474f8d 100755
--- a/t/t2020-checkout-detach.sh
+++ b/t/t2020-checkout-detach.sh
@@ -4,6 +4,7 @@ test_description='checkout into detached HEAD state'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 check_detached () {
-- 
2.37.1.1233.ge8b09efaedc


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions()
  2022-08-02 15:33     ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:13       ` Junio C Hamano
  0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 17:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a missing "goto cleanup", this fixes a bug in
> f196c1e908d (revisions API users: use release_revisions() needing
> REV_INFO_INIT, 2022-04-13).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  bisect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/bisect.c b/bisect.c
> index b63669cc9d7..421470bfa59 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -1054,7 +1054,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
>  		 */
>  		res = error_if_skipped_commits(tried, NULL);
>  		if (res < 0)
> -			return res;
> +			goto cleanup;
>  		printf(_("%s was both %s and %s\n"),
>  		       oid_to_hex(current_bad_oid),
>  		       term_good,

OK, from the cleanup: label we call release_revisions(&rev) which
was used in the "failed" attempt to find bisection.  Looks good.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
  2022-08-02 15:33     ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:27       ` Jeff King
  2022-08-03 17:29         ` Jeff King
  2022-08-03 17:54       ` Junio C Hamano
  1 sibling, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine

On Tue, Aug 02, 2022 at 05:33:13PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
> save away the "rev.pending" before clearing it. We then add a single
> commit to it, which our indirect invocation of prepare_revision_walk()
> will remove. After that we restore the "rev.pending".
> 
> Our "rev.pending" will then get free'd by the release_revisions()
> added in f6bfea0ad01 (revisions API users: use release_revisions() in
> builtin/log.c, 2022-04-13)

OK, I agree this is doing the correct thing. I actually found the
earlier "let's dissociate rev.pending from rev entirely" approach easier
to reason about, though.

> diff --git a/builtin/log.c b/builtin/log.c
> index 88a5e98875a..b4b1d974617 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  			rev.shown_one = 1;
>  			break;
>  		case OBJ_COMMIT:
> +		{
> +			struct object_array old;
> +
> +			memcpy(&old, &rev.pending, sizeof(old));
>  			rev.pending.nr = rev.pending.alloc = 0;
>  			rev.pending.objects = NULL;
>  			add_object_array(o, name, &rev.pending);
>  			ret = cmd_log_walk_no_free(&rev);
> +			memcpy(&rev.pending, &old, sizeof(rev.pending));
>  			break;
> +		}

Here we overwrite the one-item rev.pending without freeing it, but just
immediately after instead of before. It's a little subtle, but your
comment in the commit message:

  [...] and only free the new "rev.pending" in the "OBJ_COMMIT" case arm
  as prepare_revision_walk() would draw it down.

covers that. IMHO that could be spelled out a bit more (particularly
that this only works for OBJ_COMMIT, but that's OK because that's the
only type we're adding), but I can live with it.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
  2022-08-03 17:27       ` Jeff King
@ 2022-08-03 17:29         ` Jeff King
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine

On Wed, Aug 03, 2022 at 01:27:36PM -0400, Jeff King wrote:

> > diff --git a/builtin/log.c b/builtin/log.c
> > index 88a5e98875a..b4b1d974617 100644
> > --- a/builtin/log.c
> > +++ b/builtin/log.c
> > @@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> >  			rev.shown_one = 1;
> >  			break;
> >  		case OBJ_COMMIT:
> > +		{
> > +			struct object_array old;
> > +
> > +			memcpy(&old, &rev.pending, sizeof(old));
> >  			rev.pending.nr = rev.pending.alloc = 0;
> >  			rev.pending.objects = NULL;
> >  			add_object_array(o, name, &rev.pending);
> >  			ret = cmd_log_walk_no_free(&rev);
> > +			memcpy(&rev.pending, &old, sizeof(rev.pending));
> >  			break;
> > +		}
> 
> Here we overwrite the one-item rev.pending without freeing it, but just
> immediately after instead of before. It's a little subtle, but your
> comment in the commit message:
> 
>   [...] and only free the new "rev.pending" in the "OBJ_COMMIT" case arm
>   as prepare_revision_walk() would draw it down.
> 
> covers that. IMHO that could be spelled out a bit more (particularly
> that this only works for OBJ_COMMIT, but that's OK because that's the
> only type we're adding), but I can live with it.

Ah, I see that you did add a comment in the next commit. That's
sufficient, though I really think it makes more sense here, where we're
actually dealing with leaks.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
  2022-08-02 15:33     ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:32       ` Jeff King
  2022-08-03 17:59       ` Junio C Hamano
  1 sibling, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine

On Tue, Aug 02, 2022 at 05:33:14PM +0200, Ævar Arnfjörð Bjarmason wrote:

>  * We don't need the "count" and "objects" variables introduced in
>    5d7eeee2ac6 (git-show: grok blobs, trees and tags, too, 2006-12-14).
> 
>    They were originally added since we'd clobber rev.pending in the
>    loop without restoring it. Since the preceding commit we are
>    restoring it when we handle OBJ_COMMIT, so the main for-loop can
>    refer to "rev.pending" didrectly.

I think this is accurate, though it does feel a bit weird that we are
iterating over rev.pending, and we clobber and restore it mid-loop.

It's correct because of the restore, but I think that's why my gut
feeling favored the earlier approach to completely dissociate the
iteration variables from "rev.pending" before the loop even starts.

That said, it seems like we're spending a lot more time going back and
forth on this topic than it is really worth, so I can live with any of
the versions.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 0/6] revisions API: fix more memory leaks
  2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
                       ` (5 preceding siblings ...)
  2022-08-02 15:33     ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:33     ` Jeff King
  6 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine

On Tue, Aug 02, 2022 at 05:33:10PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Changes since v2:
> 
> * I think the rewrite here of fixing the leak should address Jeff
>   King's concerns. I was somewhat mentally stuck on trying to change
>   the code to make the leak fix easier, and then having the leak fix.
> 
>   But I think as the new 3/6 shows fixing the leak first is much
>   easier, and more straightforward to explain.

Yeah, I think fixing the leak first is fine. I did prefer the end result
of the earlier round, but I can live with it either way.

> * A new 4/6 then follows-up and rewrites the variable clobbering that
>   was needed for the pre-image of 3/6.
> 
> * Don't reference the "XXX" comment in 6/6.

I'd hoped you'd take my replacement 5/6, but again, I can live with it
either way.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
  2022-08-02 15:33     ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
  2022-08-03 17:27       ` Jeff King
@ 2022-08-03 17:54       ` Junio C Hamano
  1 sibling, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 17:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs,
> trees and tags, too, 2006-12-14). As we iterate over a "<revision>..."
> command-line and encounter ad OBJ_COMMIT we want to use our "struct

"... encounter an OBJ_COMMIT, we want to ..."?

> rev_info", but with a "pending" array of one element: the one commit
> we're showing in the loop.
>
> To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and
> rev.pending.nr for its iteration. We'd then clobber those (and alloc)
> when we needed to show an OBJ_COMMIT.
>
> We'd therefore leak the "rev.pending" we started out with, and only
> free the new "rev.pending" in the "OBJ_COMMIT" case arm as
> prepare_revision_walk() would draw it down.
>
> Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
> save away the "rev.pending" before clearing it. We then add a single
> commit to it, which our indirect invocation of prepare_revision_walk()
> will remove. After that we restore the "rev.pending".
>
> Our "rev.pending" will then get free'd by the release_revisions()
> added in f6bfea0ad01 (revisions API users: use release_revisions() in
> builtin/log.c, 2022-04-13)

Hmph, this gives me a strange sense of deja-vu that I saw a better
solution in a separate patch from you, strange because I do not see
anything at the tip of 'seen' like what I thought you did elsewhere.

We do need to reuse "rev_info" we got from outside the loop so that
we will have to consistently apply diffopt and other things we got
in there from the configuration and the command line.  But after we
decide to go to "each object is shown individually" mode, the
contents of the pending array (rather, what we got from the command
line in cmdline member) is only relevant to enumerate which
individual objects are shown in the loop, and should not even be
visible to the code that handles each individual object, e.g. even
we pass &rev to this code when we see a blob:

		switch (o->type) {
		case OBJ_BLOB:
			ret = show_blob_object(&o->oid, &rev, name);
			break;

there is no point in exposing the rev.pending to show_blob_object()
at al.  The same is true for codepaths used to show a tag or a tree.
When showing a commit, we do not even want the codepath that shows a
single-commit range to touch and destroy the original rev.pending;
we instead want to give a single-element pending array.

So instead of keeping rev.pending when we enter the loop and saving
it away and restoring it, it feels a lot cleaner to invent/use an
interface to "clone" the settings in an existing rev_info by:

 * Grab rev.pending into a separate "struct object_array" that is a
   local variable in cmd_show() and clear rev.pending, immediately
   after we decide to go to "show individually" mode.

 * Iterate over the objects in that local variable.  For each object
   to be shown, prepare a rev_info, clone the setting from the
   original rev_info, put that single object to the pending member
   of the clone, use that cloned rev_info, and release the resources
   held by the cloned rev_info once we are done.

> diff --git a/builtin/log.c b/builtin/log.c
> index 88a5e98875a..b4b1d974617 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  			rev.shown_one = 1;
>  			break;
>  		case OBJ_COMMIT:
> +		{
> +			struct object_array old;
> +
> +			memcpy(&old, &rev.pending, sizeof(old));
>  			rev.pending.nr = rev.pending.alloc = 0;
>  			rev.pending.objects = NULL;
>  			add_object_array(o, name, &rev.pending);
>  			ret = cmd_log_walk_no_free(&rev);
> +			memcpy(&rev.pending, &old, sizeof(rev.pending));
>  			break;
> +		}

The reason why I find this approach a bit disturbing is that we
pretend to know for certain that pending is the only thing we need
to protect across multiple calls to the log_walk(), but I suspect
that such an overconfidence will come back and bite us.  We are not
re-initializing other states reachable from the rev_info (e.g. the
diff_options struct has various members that records what happened
in an invocation, like needed_rename_limit, found_follow, and
found_changes, that would want to be reinitialized if we are
starting a new and totally independent traversal after we are done
one traversal).

But I do not mind at all to leave such a fundamental clean-up to a
totally separate topic, and keep this patch only about "we are
clobbering and leaking rev.pending, let's plug the leak".  In fact,
I would prefer it that way.  So take all of the above as material
for possible NEEDSWORK comment, food for further thought.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
  2022-08-02 15:33     ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
  2022-08-03 17:32       ` Jeff King
@ 2022-08-03 17:59       ` Junio C Hamano
  2022-08-04  7:51         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 17:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Refactor the juggling of "rev.pending" and our replacement for it
> amended in the preceding commit so that:
> ...
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Helped-by: Jeff King <peff@peff.net>

List trailer lines chronologically, please.

This does not look wrong per-se, but given that we might want to
rethink the way the original rev_info is reused inside the loop,
as I mentioned in my review of the previous step, this change may be
irrelevant in the longer term.

I may have said this earlier, but once we start using the "prepare a
blank one, copy it to clear another" pattern, we should stop using
memcpy() and use structure assignment, especially if we are trying to
make our intent clear.

> @@ -745,12 +743,19 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  		case OBJ_COMMIT:
>  		{
>  			struct object_array old;
> +			struct object_array blank = OBJECT_ARRAY_INIT;
>  
>  			memcpy(&old, &rev.pending, sizeof(old));
> -			rev.pending.nr = rev.pending.alloc = 0;
> -			rev.pending.objects = NULL;
> +			memcpy(&rev.pending, &blank, sizeof(rev.pending));
> +
>  			add_object_array(o, name, &rev.pending);
>  			ret = cmd_log_walk_no_free(&rev);
> +
> +			/*
> +			 * No need for
> +			 * object_array_clear(&pending). It was
> +			 * cleared already in prepare_revision_walk()
> +			 */
>  			memcpy(&rev.pending, &old, sizeof(rev.pending));
>  			break;
>  		}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing
  2022-08-02 15:33     ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-08-03 18:30       ` Junio C Hamano
  0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 18:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> @@ -2784,6 +2784,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
>  			const char *arg = argv[i];
>  			if (strcmp(arg, "--"))
>  				continue;
> +			if (opt && opt->free_removed_argv_elements)
> +				free((char *)argv[i]);

We have "arg", let's free that instead.

>  			argv[i] = NULL;
>  			argc = i;

>  	unsigned int	assume_dashdash:1,
> -			allow_exclude_promisor_objects:1;
> +			allow_exclude_promisor_objects:1,
> +			free_removed_argv_elements:1;

It would be nicer to pick a name that explains why we want to "free
removed argv[] elements" than "if you remove argv[] elements, then
please free them", because the explanation on the reason why we
request the API to free would help future developers to decide if
they need to free in some situations where they do not "remove" but
do something else in their updates to the revisions API.

If we cannot come up with a better name, at least we should add a
comment that says that the caller owns the **argv strings, and it
asks the API to free those if reference to them are lost in any way.

Thanks.  Overall the series was a pleasant read.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
  2022-08-03 17:59       ` Junio C Hamano
@ 2022-08-04  7:51         ` Ævar Arnfjörð Bjarmason
  2022-08-04 15:59           ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04  7:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Eric Sunshine


On Wed, Aug 03 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Refactor the juggling of "rev.pending" and our replacement for it
>> amended in the preceding commit so that:
>> ...
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Helped-by: Jeff King <peff@peff.net>
>
> List trailer lines chronologically, please.

Willdo.

> I may have said this earlier, but once we start using the "prepare a
> blank one, copy it to clear another" pattern, we should stop using
> memcpy() and use structure assignment, especially if we are trying to
> make our intent clear.

Yeah, I saw that. I took it as we should consider changing this more
generally (e.g. with coccicheck etc.).

This was mentioned in one of the original threads about the memcpy()
idiom, but IIRC there was some reason to think that it wasn't as widely
supported, or in any case we'd want to re-rest that the compilers we
care about similarly optimize it.

So I think it's best to just use the more widely used pattern for now,
and if we'd like change them all in some follow-up change...


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
  2022-08-04  7:51         ` Ævar Arnfjörð Bjarmason
@ 2022-08-04 15:59           ` Junio C Hamano
  2022-08-05  9:01             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2022-08-04 15:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Yeah, I saw that. I took it as we should consider changing this more
> generally (e.g. with coccicheck etc.).

To make things easier in the future, for the record, I in general do
not suggest such a bulk rewrite for the sake of rewrite, whether
driven with Coccinelle or something else, and I did not in this
case.

> This was mentioned in one of the original threads about the memcpy()
> idiom, but IIRC there was some reason to think that it wasn't as widely
> supported, ...

I somehow thought that we had that stage too long ago; I recall we
spotted struct assignment in a patch post release and left it there
without reverting.

> ... or in any case we'd want to re-rest that the compilers we
> care about similarly optimize it.

Perhaps.  Using struct assignment only when we feel an urge to use
memcpy() in a new code (or in the postimage of a newly rewritten
code), instead of doing a bulk update, would give us a chance to
start small and vet the result with compilers of such a small scale
rewrite carefully to build confidence, hopefully?

Thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
  2022-08-04 15:59           ` Junio C Hamano
@ 2022-08-05  9:01             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-05  9:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Eric Sunshine


On Thu, Aug 04 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> [...]
>> ... or in any case we'd want to re-rest that the compilers we
>> care about similarly optimize it.
>
> Perhaps.  Using struct assignment only when we feel an urge to use
> memcpy() in a new code (or in the postimage of a newly rewritten
> code), instead of doing a bulk update, would give us a chance to
> start small and vet the result with compilers of such a small scale
> rewrite carefully to build confidence, hopefully?

I think it would make sense to set out a baloon for it, but as some of
it (and I'll note this in a re-roll) involves some new C99 support I'd
prefer not to conflate that with this series.

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2022-08-05  9:04 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
2022-07-18 14:51   ` Jeff King
2022-07-13 13:10 ` [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
2022-07-18 14:57   ` Jeff King
2022-07-18 15:08     ` Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-07-18 15:05   ` Jeff King
2022-07-13 13:10 ` [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-07-18 15:11   ` Jeff King
2022-07-18 15:13     ` Ævar Arnfjörð Bjarmason
2022-07-18 15:45       ` Jeff King
2022-07-29  7:07         ` Ævar Arnfjörð Bjarmason
2022-07-29 18:03           ` Jeff King
2022-07-29 18:37             ` Jeff King
2022-07-29 18:52               ` Junio C Hamano
2022-07-29 19:04                 ` Jeff King
2022-07-18 15:14 ` [PATCH 0/6] revisions API: fix more memory leaks Jeff King
2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-07-30  5:22     ` Eric Sunshine
2022-07-29  8:31   ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
2022-07-29 18:44     ` Junio C Hamano
2022-07-29  8:31   ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
2022-07-29 18:55     ` Jeff King
2022-07-29 19:07       ` Jeff King
2022-07-29 18:59     ` Jeff King
2022-07-29  8:31   ` [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-07-29 19:00     ` Jeff King
2022-07-29 19:02   ` [PATCH v2 0/6] revisions API: fix more memory leaks Jeff King
2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-08-03 17:13       ` Junio C Hamano
2022-08-02 15:33     ` [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
2022-08-03 17:27       ` Jeff King
2022-08-03 17:29         ` Jeff King
2022-08-03 17:54       ` Junio C Hamano
2022-08-02 15:33     ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
2022-08-03 17:32       ` Jeff King
2022-08-03 17:59       ` Junio C Hamano
2022-08-04  7:51         ` Ævar Arnfjörð Bjarmason
2022-08-04 15:59           ` Junio C Hamano
2022-08-05  9:01             ` Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-08-03 18:30       ` Junio C Hamano
2022-08-03 17:33     ` [PATCH v3 0/6] revisions API: fix more memory leaks Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).