git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing
Date: Fri, 29 Jul 2022 09:07:40 +0200	[thread overview]
Message-ID: <220729.86pmhoidsc.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YtWAMP0ROFseFs6B@coredump.intra.peff.net>


On Mon, Jul 18 2022, Jeff King wrote:

> On Mon, Jul 18, 2022 at 05:13:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > But if you went just a little further and made the option "rev should
>> > own its own argv", then I think you can simplify life for callers even
>> > more. They could construct a strvec themselves and then hand it off to
>> > the rev_info, to be cleaned up when release_revisions() is called (and
>> > of course freeing the "--" when we overwrite it in the interim, as you
>> > do here).
>> >
>> > Then all of the bisect callers from the previous patch could avoid
>> > having to deal with the strvec at all. They'd call bisect_rev_setup(),
>> > which would internally attach the memory to rev_info.
>> 
>> Yes, I experimented with this, and it's a solid approach.
>> 
>> But it's a much larger change, particularly since we'd also want to
>> update the API itself to take take "const" in the appropriate places to
>> do it properly.
>
> Hmm. I was thinking that we'd just have rev_info.we_own_our_argv, and
> then release it in release_revisions(). But actually, rev_info does not
> hold onto the argv at all! It's totally processed in setup_revisions().
>
> And there it either:
>
>   - becomes part of prune_data (in which case a copy is made)
>
>   - is passed to handle_revisions_opt(), etc, in which case it is parsed
>     but not held onto (it's possible that some string option holds onto
>     a pointer, but the only one I found is --filter, which makes a
>     copy).
>
>   - is passed to handle_revision_arg(), but these days
>     add_rev_cmdline(), etc, make copies. As they must, otherwise --stdin
>     would be totally buggy.
>
> So I actually wonder if the comment in bisect_rev_setup() is simply
> wrong. It was correct in 2011 when I wrote it, but things changed in
> df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> 2013-05-25), etc.
>
> In that case, we could replace your patch 5 in favor of just calling
> strvec_clear() at the end of bisect_rev_setup().

5/6 is doing that, or rather at the end of check_ancestors() and
bisect_next_all(), but those call bisect_rev_setup() and pass it the
strvec, so in terms of memory management it amounts to the same thing.

> It's possible there's a
> case I'm missing that makes this generally not-safe, but in the case of
> bisect_rev_setup() there's a very limited set of items in our argv in
> the first place. Doing so also passes the test suite with
> SANITIZE=address, though again, this is just exercising the very limited
> bisect options here.

I think what you're missing is that this code is basically doing this,
which is a common pattern in various parts of our codebase:

	struct strvec sv = STRVEC_INIT;
	strvec_push(&sv, "foo"); // sv.v[0] 
	strvec_push(&sv, "bar"); // sv.v[1] 
	sv.v[1] = NULL; // the code in revisions.c
	strvec_clear(&sv);

I.e. you can't fix this by simply having revision.c having its own
strvec, because it would just .. proceed to do the same thing.

Of course we could alter its argv-iterating state machine to not clobber
that "argv" element to NULL, and have other subsequent code know that it
should stop at a new lower "argc" etc. etc., but that's getting at the
much more invasive API changes 6/6 notes as the path not taken.

And, in the general case for things that do this to the strvec what
we're explicitly doing is having the caller then operate on that munged
argv, i.e. it's important that we change *its* argv. That's not going on
here, but e.g. various parse_options() callers rely on that.

I've experimented a bit with how to solve this more generally, the below
is some very rough WIP code I have laying aroun to try to do that.

IIRC this fails SANITIZE=address or was otherwise broken, I didn't test
it now, but that's not the point...

... I'm just including it by way of explanation. I.e. for at least
*some* callers (IIRC the below mostly works, and I can't remember where
it's broken) it would suffice to just keep around a list of "here's
stuff we should free later".

In case below I opted to do that with a string_list, but it could be
another strvec or whatever.

-- >8 --
From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>
Date: Fri, 10 Jun 2022 13:05:05 +0200
Subject: [PATCH] string-list API: add utility to help free "struct strvec"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "struct strvec" has its own strvec_clear(), but unfortunately it's
a very common pattern to pass its "v" member to parse_options(),
setup_revisions() etc.

Those APIs will then change the "argv" so that strvec_clear() can't do
anything useful with its members, e.g. parse_options_end() will NULL
out the dereferenced elements of the "argv" that's passed in to
indicate how much of it was consumed.

One way to deal with that would be to change all of those APIs to
accept some "in_argv" which they'd then copy, but that would be a very
large change.

This is a less invasive way of doing it, we'll now maintain a "struct
string_list" on the side, whose "util" members point to the elements
of the "struct strvec" that we should free() later, which we can
helpfully do with string_list_clear(list, 1). See the added API
documentation for more details.

As a result we can mark more tests as passing with SANITIZE=leak using
"TEST_PASSES_SANITIZE_LEAK=true". We'll be able to use this API for a
lot more callers, but let's start with "am"'s run_apply().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/am.c                  |  3 +++
 string-list.c                 | 16 ++++++++++++++++
 string-list.h                 | 15 +++++++++++++++
 t/t0023-crlf-am.sh            |  1 +
 t/t4152-am-subjects.sh        |  2 ++
 t/t4254-am-corrupt.sh         |  2 ++
 t/t4256-am-format-flowed.sh   |  1 +
 t/t5403-post-checkout-hook.sh |  1 +
 8 files changed, 41 insertions(+)

diff --git a/builtin/am.c b/builtin/am.c
index 93bec62afa9..d927fea128a 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	int res, opts_left;
 	int force_apply = 0;
 	int options = 0;
+	struct string_list to_free = STRING_LIST_INIT_NODUP;
 
 	if (init_apply_state(&apply_state, the_repository, NULL))
 		BUG("init_apply_state() failed");
@@ -1483,6 +1484,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	strvec_push(&apply_opts, "apply");
 	strvec_pushv(&apply_opts, state->git_apply_opts.v);
 
+	string_list_append_strvec_nodup(&to_free, &apply_opts);
 	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
 					&apply_state, &force_apply, &options,
 					NULL);
@@ -1512,6 +1514,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 
 	strvec_clear(&apply_paths);
 	strvec_clear(&apply_opts);
+	string_list_clear(&to_free, 1);
 	clear_apply_state(&apply_state);
 
 	if (res)
diff --git a/string-list.c b/string-list.c
index 549fc416d68..ecbfb3065a8 100644
--- a/string-list.c
+++ b/string-list.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "string-list.h"
+#include "strvec.h"
 
 void string_list_init_nodup(struct string_list *list)
 {
@@ -325,3 +326,18 @@ int string_list_split_in_place(struct string_list *list, char *string,
 		}
 	}
 }
+
+void string_list_append_strvec_nodup(struct string_list *list,
+				     const struct strvec *vec)
+{
+	size_t i;
+
+	if (list->strdup_strings)
+		BUG("need .strdup_strings = 0 for string_list_append_strvec_nodup()");
+
+	for (i = 0; i < vec->nr; i++) {
+		const char *p = vec->v[i];
+
+		string_list_append_nodup(list, (char *)p)->util = (void *)p;
+	}
+}
diff --git a/string-list.h b/string-list.h
index d5a744e1438..e740f3f29be 100644
--- a/string-list.h
+++ b/string-list.h
@@ -266,4 +266,19 @@ int string_list_split(struct string_list *list, const char *string,
  */
 int string_list_split_in_place(struct string_list *list, char *string,
 			       int delim, int maxsplit);
+
+/**
+ * Given a STRING_LIST_INIT_NODUP-initialized "struct string_list"
+ * takes a "struct strvec" and "copies" it, the "string" members will
+ * be the elements of the "struct strvec", and more importantly so
+ * will the "util" members.
+ *
+ * This is intended to squirrel away pointers to "argv" to
+ * string_list_clear(list, 1) them later, in cases where a "struct
+ * strvec" is passed to an API like parse_options(), setup_revisions()
+ * etc.
+ */
+struct strvec;
+void string_list_append_strvec_nodup(struct string_list *list,
+				     const struct strvec *vec);
 #endif /* STRING_LIST_H */
diff --git a/t/t0023-crlf-am.sh b/t/t0023-crlf-am.sh
index f9bbb91f64e..575805513a3 100755
--- a/t/t0023-crlf-am.sh
+++ b/t/t0023-crlf-am.sh
@@ -2,6 +2,7 @@
 
 test_description='Test am with auto.crlf'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 cat >patchfile <<\EOF
diff --git a/t/t4152-am-subjects.sh b/t/t4152-am-subjects.sh
index 4c68245acad..9f2edba1f83 100755
--- a/t/t4152-am-subjects.sh
+++ b/t/t4152-am-subjects.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='test subject preservation with format-patch | am'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 make_patches() {
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index 54be7da1611..45f1d4f95e5 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='git am with corrupt input'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 make_mbox_with_nul () {
diff --git a/t/t4256-am-format-flowed.sh b/t/t4256-am-format-flowed.sh
index 2369c4e17ad..1015273bc82 100755
--- a/t/t4256-am-format-flowed.sh
+++ b/t/t4256-am-format-flowed.sh
@@ -2,6 +2,7 @@
 
 test_description='test format=flowed support of git am'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t5403-post-checkout-hook.sh b/t/t5403-post-checkout-hook.sh
index 978f240cdac..cfaae547398 100755
--- a/t/t5403-post-checkout-hook.sh
+++ b/t/t5403-post-checkout-hook.sh
@@ -7,6 +7,7 @@ test_description='Test the post-checkout hook.'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success setup '



  reply	other threads:[~2022-07-29  7:22 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
2022-07-18 14:51   ` Jeff King
2022-07-13 13:10 ` [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
2022-07-18 14:57   ` Jeff King
2022-07-18 15:08     ` Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-07-18 15:05   ` Jeff King
2022-07-13 13:10 ` [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-07-18 15:11   ` Jeff King
2022-07-18 15:13     ` Ævar Arnfjörð Bjarmason
2022-07-18 15:45       ` Jeff King
2022-07-29  7:07         ` Ævar Arnfjörð Bjarmason [this message]
2022-07-29 18:03           ` Jeff King
2022-07-29 18:37             ` Jeff King
2022-07-29 18:52               ` Junio C Hamano
2022-07-29 19:04                 ` Jeff King
2022-07-18 15:14 ` [PATCH 0/6] revisions API: fix more memory leaks Jeff King
2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-07-30  5:22     ` Eric Sunshine
2022-07-29  8:31   ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
2022-07-29 18:44     ` Junio C Hamano
2022-07-29  8:31   ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
2022-07-29 18:55     ` Jeff King
2022-07-29 19:07       ` Jeff King
2022-07-29 18:59     ` Jeff King
2022-07-29  8:31   ` [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-07-29 19:00     ` Jeff King
2022-07-29 19:02   ` [PATCH v2 0/6] revisions API: fix more memory leaks Jeff King
2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-08-03 17:13       ` Junio C Hamano
2022-08-02 15:33     ` [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
2022-08-03 17:27       ` Jeff King
2022-08-03 17:29         ` Jeff King
2022-08-03 17:54       ` Junio C Hamano
2022-08-02 15:33     ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
2022-08-03 17:32       ` Jeff King
2022-08-03 17:59       ` Junio C Hamano
2022-08-04  7:51         ` Ævar Arnfjörð Bjarmason
2022-08-04 15:59           ` Junio C Hamano
2022-08-05  9:01             ` Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-08-03 18:30       ` Junio C Hamano
2022-08-03 17:33     ` [PATCH v3 0/6] revisions API: fix more memory leaks Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220729.86pmhoidsc.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).