git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] am: don't pass strvec to apply_parse_options()
@ 2022-12-13  6:47 René Scharfe
  2022-12-13  8:37 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 24+ messages in thread
From: René Scharfe @ 2022-12-13  6:47 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

apply_parse_options() passes the array of argument strings to
parse_options(), which removes recognized options.  The removed strings
are not freed, though.

Make a copy of the strvec to pass to the function to retain the pointers
of its strings, so we release them all at the end.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/am.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd..dddf1b9af0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	int res, opts_left;
 	int force_apply = 0;
 	int options = 0;
+	const char **apply_argv;

 	if (init_apply_state(&apply_state, the_repository, NULL))
 		BUG("init_apply_state() failed");
@@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	strvec_push(&apply_opts, "apply");
 	strvec_pushv(&apply_opts, state->git_apply_opts.v);

-	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
+	/*
+	 * Build a copy that apply_parse_options() can rearrange.
+	 * apply_opts.v keeps referencing the allocated strings for
+	 * strvec_clear() to release.
+	 */
+	ALLOC_ARRAY(apply_argv, apply_opts.nr);
+	COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr);
+
+	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
 					&apply_state, &force_apply, &options,
 					NULL);

@@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	strvec_clear(&apply_paths);
 	strvec_clear(&apply_opts);
 	clear_apply_state(&apply_state);
+	free(apply_argv);

 	if (res)
 		return res;
--
2.38.2

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-13  6:47 [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
@ 2022-12-13  8:37 ` Ævar Arnfjörð Bjarmason
  2022-12-13 18:31   ` René Scharfe
  0 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-13  8:37 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano


On Tue, Dec 13 2022, René Scharfe wrote:

> apply_parse_options() passes the array of argument strings to
> parse_options(), which removes recognized options.  The removed strings
> are not freed, though.
>
> Make a copy of the strvec to pass to the function to retain the pointers
> of its strings, so we release them all at the end.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  builtin/am.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/am.c b/builtin/am.c
> index 30c9b3a9cd..dddf1b9af0 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>  	int res, opts_left;
>  	int force_apply = 0;
>  	int options = 0;
> +	const char **apply_argv;
>
>  	if (init_apply_state(&apply_state, the_repository, NULL))
>  		BUG("init_apply_state() failed");
> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file)
>  	strvec_push(&apply_opts, "apply");
>  	strvec_pushv(&apply_opts, state->git_apply_opts.v);
>
> -	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
> +	/*
> +	 * Build a copy that apply_parse_options() can rearrange.
> +	 * apply_opts.v keeps referencing the allocated strings for
> +	 * strvec_clear() to release.
> +	 */
> +	ALLOC_ARRAY(apply_argv, apply_opts.nr);
> +	COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr);
> +
> +	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
>  					&apply_state, &force_apply, &options,
>  					NULL);
>
> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>  	strvec_clear(&apply_paths);
>  	strvec_clear(&apply_opts);
>  	clear_apply_state(&apply_state);
> +	free(apply_argv);
>
>  	if (res)
>  		return res;

I don't mind this going in, but it really feels like a bit of a dirty
hack.

We have widespread leaks all over the place due to this
idiom. I.e. parse_options() and a couple of other APIs expect that they
can munge the "argv", which is fine if it arrives via main(), but not if
we're editing our own strvecs.

I think less of a hack is to teach the eventual parse_options() that
when it munges it it should free() it. I did that for the revisions API
in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
need free()-ing, 2022-08-02).

What do you think?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-13  8:37 ` Ævar Arnfjörð Bjarmason
@ 2022-12-13 18:31   ` René Scharfe
  2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
  2022-12-17 13:24     ` Jeff King
  0 siblings, 2 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-13 18:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git List, Junio C Hamano

Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>
> On Tue, Dec 13 2022, René Scharfe wrote:
>
>> apply_parse_options() passes the array of argument strings to
>> parse_options(), which removes recognized options.  The removed strings
>> are not freed, though.
>>
>> Make a copy of the strvec to pass to the function to retain the pointers
>> of its strings, so we release them all at the end.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  builtin/am.c | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/am.c b/builtin/am.c
>> index 30c9b3a9cd..dddf1b9af0 100644
>> --- a/builtin/am.c
>> +++ b/builtin/am.c
>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>  	int res, opts_left;
>>  	int force_apply = 0;
>>  	int options = 0;
>> +	const char **apply_argv;
>>
>>  	if (init_apply_state(&apply_state, the_repository, NULL))
>>  		BUG("init_apply_state() failed");
>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>  	strvec_push(&apply_opts, "apply");
>>  	strvec_pushv(&apply_opts, state->git_apply_opts.v);
>>
>> -	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
>> +	/*
>> +	 * Build a copy that apply_parse_options() can rearrange.
>> +	 * apply_opts.v keeps referencing the allocated strings for
>> +	 * strvec_clear() to release.
>> +	 */
>> +	ALLOC_ARRAY(apply_argv, apply_opts.nr);
>> +	COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr);
>> +
>> +	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
>>  					&apply_state, &force_apply, &options,
>>  					NULL);
>>
>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>  	strvec_clear(&apply_paths);
>>  	strvec_clear(&apply_opts);
>>  	clear_apply_state(&apply_state);
>> +	free(apply_argv);
>>
>>  	if (res)
>>  		return res;
>
> I don't mind this going in, but it really feels like a bit of a dirty
> hack.
>
> We have widespread leaks all over the place due to this
> idiom. I.e. parse_options() and a couple of other APIs expect that they
> can munge the "argv", which is fine if it arrives via main(), but not if
> we're editing our own strvecs.

Where?  A quick "git grep 'parse_options.*nr'" turns up only this place
as one that passes a strvec to parse_options.

> I think less of a hack is to teach the eventual parse_options() that
> when it munges it it should free() it. I did that for the revisions API
> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
> need free()-ing, 2022-08-02).
>
> What do you think?

Generating string lists and then parsing them is weird.  When calls have
to cross a process boundary then we have no choice, but in-process we
shouldn't have to lower our request to an intermediate text format.  git
am does it anyway because it writes its options to a file and reads them
back when it resumes with --continue, IIUC.

I hope that is and will be the only place that uses parse_options() with
a strvec -- and then we don't have to change that function.

If this pattern is used more widely then we could package the copying
done by this patch somehow, e.g. by adding a strvec_parse_options()
that wraps the real thing.

If we have to change parse_options() at all then I'd prefer it to not
free() anything (to keep it usable with main()'s parameters), but to
reorder in a non-destructive way.  That would mean keeping the NULL
sentinel where it is, and making sure all callers use only the returned
argc to determine which arguments parse_options() didn't recognize.

René

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-13 18:31   ` René Scharfe
@ 2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
  2022-12-17 12:46       ` [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
  2022-12-17 13:24     ` Jeff King
  1 sibling, 2 replies; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-14  8:44 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano


On Tue, Dec 13 2022, René Scharfe wrote:

> Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Tue, Dec 13 2022, René Scharfe wrote:
>>
>>> apply_parse_options() passes the array of argument strings to
>>> parse_options(), which removes recognized options.  The removed strings
>>> are not freed, though.
>>>
>>> Make a copy of the strvec to pass to the function to retain the pointers
>>> of its strings, so we release them all at the end.
>>>
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>>  builtin/am.c | 12 +++++++++++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/builtin/am.c b/builtin/am.c
>>> index 30c9b3a9cd..dddf1b9af0 100644
>>> --- a/builtin/am.c
>>> +++ b/builtin/am.c
>>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>  	int res, opts_left;
>>>  	int force_apply = 0;
>>>  	int options = 0;
>>> +	const char **apply_argv;
>>>
>>>  	if (init_apply_state(&apply_state, the_repository, NULL))
>>>  		BUG("init_apply_state() failed");
>>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>  	strvec_push(&apply_opts, "apply");
>>>  	strvec_pushv(&apply_opts, state->git_apply_opts.v);
>>>
>>> -	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
>>> +	/*
>>> +	 * Build a copy that apply_parse_options() can rearrange.
>>> +	 * apply_opts.v keeps referencing the allocated strings for
>>> +	 * strvec_clear() to release.
>>> +	 */
>>> +	ALLOC_ARRAY(apply_argv, apply_opts.nr);
>>> +	COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr);
>>> +
>>> +	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
>>>  					&apply_state, &force_apply, &options,
>>>  					NULL);
>>>
>>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>  	strvec_clear(&apply_paths);
>>>  	strvec_clear(&apply_opts);
>>>  	clear_apply_state(&apply_state);
>>> +	free(apply_argv);
>>>
>>>  	if (res)
>>>  		return res;
>>
>> I don't mind this going in, but it really feels like a bit of a dirty
>> hack.
>>
>> We have widespread leaks all over the place due to this
>> idiom. I.e. parse_options() and a couple of other APIs expect that they
>> can munge the "argv", which is fine if it arrives via main(), but not if
>> we're editing our own strvecs.
>
> Where?  A quick "git grep 'parse_options.*nr'" turns up only this place
> as one that passes a strvec to parse_options.

Sorry about the vagueness, this was from memory and I didn't have a full
list handy.

First, that grep isn't going to give you what you want, since it only
catches cases where we're passing the "strvec" to the "parse_options()"
directly.

But if you look at e.g. how cmd_stash() invokes push_stash() or
cmd_describe() invoking cmd_name_rev() you'll see callers where we're
potentially losing the strvec due to parse_options().

"Potentially" because in both those cases we call that API, but in the
latter case we've got a missing strvec_clear(), so maybe that's all
that's needed (it doesn't always munge the argv).

>> I think less of a hack is to teach the eventual parse_options() that
>> when it munges it it should free() it. I did that for the revisions API
>> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
>> need free()-ing, 2022-08-02).
>>
>> What do you think?
>
> Generating string lists and then parsing them is weird.  When calls have
> to cross a process boundary then we have no choice, but in-process we
> shouldn't have to lower our request to an intermediate text format.  git
> am does it anyway because it writes its options to a file and reads them
> back when it resumes with --continue, IIUC.

I agree, but making all those use a nicer API would be a much bigger
change.

> I hope that is and will be the only place that uses parse_options() with
> a strvec -- and then we don't have to change that function.
>
> If this pattern is used more widely then we could package the copying
> done by this patch somehow, e.g. by adding a strvec_parse_options()
> that wraps the real thing.

The reason I'm encouraging you to look at some of the other strvec leaks
is to see if you can think of a pattern that fits these various leaky
callers.

So e.g. a strvec_parse_options() won't work for the ones noted above
where we've lost track of it being a "struct strvec" in the first place.

I have a local version somewhere where I did (pseudocode):

	struct strvec vec = STRVEC_INIT;
	struct string_list to_free = STRING_LIST_INIT_DUP;

        // populate vec...
        for (size_t i = 0; i < vec.nr; i++)
		string_list_append_nodup(&to_free, v[i]):
	// call parse_options(), or otherwise munge "vec"...
	free(strvec_detach(&vec));
        string_list_clear(&to_free, 0);

I.e. you snapshot the members of the "vec" before it's munged, and
free() those via a "struct string_list".

Then you don't strvec_clear(), but instead free() the container, its
members being free'd by the string_list.

Here's a (not yet in-tree) patch that uses that:
https://lore.kernel.org/git/patch-10.10-81f138e460c-20221017T115544Z-avarab@gmail.com/

I think that's more reliable & general than the pattern you're adding
here.

In your version the only reason we're not segfaulting is because the
parse_options() consumes all the arguments, but that's not something you
can generally rely on with parse_options().

Also, and maybe I'm missing something, but wouldn't this be functionally
the same as your implementation:
	
	diff --git a/builtin/am.c b/builtin/am.c
	index 30c9b3a9cd7..e65262cbb21 100644
	--- a/builtin/am.c
	+++ b/builtin/am.c
	@@ -1476,11 +1476,13 @@ static int run_apply(const struct am_state *state, const char *index_file)
	 	int res, opts_left;
	 	int force_apply = 0;
	 	int options = 0;
	+	char *to_free;
	 
	 	if (init_apply_state(&apply_state, the_repository, NULL))
	 		BUG("init_apply_state() failed");
	 
	 	strvec_push(&apply_opts, "apply");
	+	to_free = (char *)apply_opts.v[0];
	 	strvec_pushv(&apply_opts, state->git_apply_opts.v);
	 
	 	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
	@@ -1489,6 +1491,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
	 
	 	if (opts_left != 0)
	 		die("unknown option passed through to git apply");
	+	free(to_free);
	 
	 	if (index_file) {
	 		apply_state.index_file = index_file;

I.e. we leak the strvec_push() of the "apply" literal, but for the rest
we append the "state->git_apply_opts.v", which we already free()
elsewhere.

So actually, aren't we really just fumbling our way towards the "nodup"
interface that the "struct string_list" has, and we should add the same
to the "struct strvec".

This seems to also fix all the leaks you fixed, but without any of the
copying:

	diff --git a/builtin/am.c b/builtin/am.c
	index 30c9b3a9cd7..691ec1d152d 100644
	--- a/builtin/am.c
	+++ b/builtin/am.c
	@@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
	 static int run_apply(const struct am_state *state, const char *index_file)
	 {
	 	struct strvec apply_paths = STRVEC_INIT;
	-	struct strvec apply_opts = STRVEC_INIT;
	+	struct strvec apply_opts = STRVEC_INIT_NODUP;
	 	struct apply_state apply_state;
	 	int res, opts_left;
	 	int force_apply = 0;
	diff --git a/strvec.c b/strvec.c
	index 61a76ce6cb9..efdc47a5854 100644
	--- a/strvec.c
	+++ b/strvec.c
	@@ -22,7 +22,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
	 
	 const char *strvec_push(struct strvec *array, const char *value)
	 {
	-	strvec_push_nodup(array, xstrdup(value));
	+	const char *to_push = array->nodup_strings ? value : xstrdup(value);
	+
	+	strvec_push_nodup(array, to_push);
	 	return array->v[array->nr - 1];
	 }
	 
	@@ -31,6 +33,9 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
	 	va_list ap;
	 	struct strbuf v = STRBUF_INIT;
	 
	+	if (array->nodup_strings)
	+		BUG("cannot strvec_pushf() on a 'nodup' strvec");
	+
	 	va_start(ap, fmt);
	 	strbuf_vaddf(&v, fmt, ap);
	 	va_end(ap);
	@@ -67,6 +72,9 @@ void strvec_pop(struct strvec *array)
	 
	 void strvec_split(struct strvec *array, const char *to_split)
	 {
	+	if (array->nodup_strings)
	+		BUG("cannot strvec_split() on a 'nodup' strvec");
	+
	 	while (isspace(*to_split))
	 		to_split++;
	 	for (;;) {
	@@ -90,7 +98,7 @@ void strvec_clear(struct strvec *array)
	 	if (array->v != empty_strvec) {
	 		int i;
	 		for (i = 0; i < array->nr; i++)
	-			free((char *)array->v[i]);
	+			free(array->nodup_strings ? NULL : (char *)array->v[i]);
	 		free(array->v);
	 	}
	 	strvec_init(array);
	diff --git a/strvec.h b/strvec.h
	index 9f55c8766ba..ac6e2c04cea 100644
	--- a/strvec.h
	+++ b/strvec.h
	@@ -31,12 +31,18 @@ struct strvec {
	 	const char **v;
	 	size_t nr;
	 	size_t alloc;
	+	unsigned int nodup_strings:1;
	 };
	 
	 #define STRVEC_INIT { \
	 	.v = empty_strvec, \
	 }
	 
	+#define STRVEC_INIT_NODUP { \
	+	.v = empty_strvec, \
	+	.nodup_strings = 1, \
	+}
	+
	 /**
	  * Initialize an array. This is no different than assigning from
	  * `STRVEC_INIT`.

In any case, for this change (and leak fixes in general), could you
please also mark the now-passing leak-free tests. This fails after your
patch, but passes before.

The failure is a "good" one, as they're now leak-free, but should be
marked as such:

	GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true \
	make SANITIZE=leak test T="t4256-am-format-flowed.sh t0023-crlf-am.sh t4254-am-corrupt.sh t4257-am-interactive.sh t5403-post-checkout-hook.sh t4152-am-subjects.sh"

> If we have to change parse_options() at all then I'd prefer it to not
> free() anything (to keep it usable with main()'s parameters),...

I'm suggesting passing a flag to it to have it free() things it
NULL's.

Maybe that's a bad idea, but I don't see the incompatibility with
main(), for those we just wouldn't pass the flag.

It does have the inherent limitation that you couldn't mix strvec and
non-strvec parameters, i.e. if some of your elements are dup'd but
others aren't you'd need to arrange to have them all dup'd.

But I don't think that's an issue in practice, either we pass the fully
dup'd version, or main() parameters.

> but to
> reorder in a non-destructive way.  That would mean keeping the NULL
> sentinel where it is, and making sure all callers use only the returned
> argc to determine which arguments parse_options() didn't recognize.

I think this would work if parse_options() wasn't clobbering those
elements in some cases, and replacing them with new ones, but doesn't it
do that e.g. in parse_options_step()?

It also seems like a much more fragile change to try to ensure that
nothing uses the "argv" again, but only looks at the updated
"argc".

Both that & adding a "const" to it would be a huge change across the
codebase, so I'd like to avoid them, but at least the "const" method
would be helped along by the compiler.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
  2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
@ 2022-12-15  9:11       ` Ævar Arnfjörð Bjarmason
  2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
                           ` (6 more replies)
  2022-12-17 12:46       ` [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
  1 sibling, 7 replies; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15  9:11 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

This is an alternative to René's [1], his already fixes a leak in "git
am", and this could be done later, so I'm submitting it as RFC, but it
could also replace it.

I think as this series shows extending the "strvec" API to get a
feature that works like the existing "strdup_strings" that the "struct
string_list" has can make memory management much simpler.

The 4/5 here shows cases where we were leaking because our "v" was
clobbered, but where all the strings we were pushing to the "strvec"
were fixed strings, so we could skip xstrdup()-ing them.

The 5/5 then shows more complex cases where we have mixed-use,
i.e. most strings are fixed, but some are not. For those we use a
"struct string_list to_free = STRING_LIST_INIT_DUP", which we then
push to the "to_free" list with "string_list_append_nodup()".

This does make the API slightly more dangerous to use, as it's no
longer guaranteed that it owns all the members it points to. But as
the "struct string_list" has shown this isn't an issue in practice,
and e.g. SANITIZE=address et al are good about finding double-frees,
or frees of fixed strings.

A branch & CI for this are found at [2].

1. https://lore.kernel.org/git/baf93e4a-7f05-857c-e551-09675496c03c@web.de/
2. https://github.com/avar/git/tree/avar/add-and-use-strvec-init-nodup

Ævar Arnfjörð Bjarmason (5):
  builtin/annotate.c: simplify for strvec API
  various: add missing strvec_clear()
  strvec API: add a "STRVEC_INIT_NODUP"
  strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
  strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"

 builtin/am.c                  |  2 +-
 builtin/annotate.c            | 17 ++++++++---------
 builtin/describe.c            | 28 +++++++++++++++++++++-------
 builtin/stash.c               |  8 ++++++--
 builtin/upload-archive.c      | 16 ++++++++++++----
 strvec.c                      | 20 ++++++++++++++++++--
 strvec.h                      | 30 +++++++++++++++++++++++++++++-
 t/t0023-crlf-am.sh            |  1 +
 t/t4152-am-subjects.sh        |  2 ++
 t/t4254-am-corrupt.sh         |  2 ++
 t/t4256-am-format-flowed.sh   |  1 +
 t/t4257-am-interactive.sh     |  2 ++
 t/t5003-archive-zip.sh        |  1 +
 t/t5403-post-checkout-hook.sh |  1 +
 14 files changed, 105 insertions(+), 26 deletions(-)

-- 
2.39.0.rc2.1048.g0e5493b8d5b


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
@ 2022-12-15  9:11         ` Ævar Arnfjörð Bjarmason
  2022-12-17 12:45           ` René Scharfe
  2022-12-15  9:11         ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
                           ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15  9:11 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

When this code was migrated to "struct strvec" (or rather, its
predecessor API) in 8c2cfa55446 (annotate: use argv_array, 2014-07-16)
it didn't take full advantage of what we were given:

* We are always passed the name "annotate" as argv[0] here, so we
  don't need to re-hardcode it. We've already declared it in "struct
  cmd_struct commands" in git.c.

* We are guaranteed to get at least one argument, so rather than
  looping here ourselves let's have strvec_pushv() handle that. If we
  only have one argument we'll pass the terminating NULL to it, making
  it a NOOP.

This change helps to make the subsequent commit smaller, and as a
bonus removes the type discrepancy between "int" and "size_t".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/annotate.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/builtin/annotate.c b/builtin/annotate.c
index 58ff977a231..e37b269196f 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -7,16 +7,12 @@
 #include "builtin.h"
 #include "strvec.h"
 
-int cmd_annotate(int argc, const char **argv, const char *prefix)
+int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
 {
 	struct strvec args = STRVEC_INIT;
-	int i;
 
-	strvec_pushl(&args, "annotate", "-c", NULL);
-
-	for (i = 1; i < argc; i++) {
-		strvec_push(&args, argv[i]);
-	}
+	strvec_pushl(&args, argv[0], "-c", NULL);
+	strvec_pushv(&args, &argv[1]);
 
 	return cmd_blame(args.nr, args.v, prefix);
 }
-- 
2.39.0.rc2.1048.g0e5493b8d5b


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 2/5] various: add missing strvec_clear()
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
  2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
@ 2022-12-15  9:11         ` Ævar Arnfjörð Bjarmason
  2022-12-15  9:11         ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15  9:11 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Fix or partially fix memory leaks that happened because a
strvec_clear() was missing. The "partially" will be addressed in a
subsequent commit, we'll still leak in cases where the function we're
calling munges our "v" member.

In the case of "builtin/describe.c" let's change it to use the macro
initializer rather than the strvec_init() while we're at it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/annotate.c       | 5 ++++-
 builtin/describe.c       | 8 +++++---
 builtin/stash.c          | 6 +++++-
 builtin/upload-archive.c | 7 +++++--
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/builtin/annotate.c b/builtin/annotate.c
index e37b269196f..de58deadfc7 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -10,9 +10,12 @@
 int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
 {
 	struct strvec args = STRVEC_INIT;
+	int ret;
 
 	strvec_pushl(&args, argv[0], "-c", NULL);
 	strvec_pushv(&args, &argv[1]);
 
-	return cmd_blame(args.nr, args.v, prefix);
+	ret = cmd_blame(args.nr, args.v, prefix);
+	strvec_clear(&args);
+	return ret;
 }
diff --git a/builtin/describe.c b/builtin/describe.c
index eea1e330c00..cb205f6b561 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -597,9 +597,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 
 	if (contains) {
 		struct string_list_item *item;
-		struct strvec args;
+		struct strvec args = STRVEC_INIT;
+		int ret;
 
-		strvec_init(&args);
 		strvec_pushl(&args, "name-rev",
 			     "--peel-tag", "--name-only", "--no-undefined",
 			     NULL);
@@ -616,7 +616,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 			strvec_pushv(&args, argv);
 		else
 			strvec_push(&args, "HEAD");
-		return cmd_name_rev(args.nr, args.v, prefix);
+		ret = cmd_name_rev(args.nr, args.v, prefix);
+		strvec_clear(&args);
+		return ret;
 	}
 
 	hashmap_init(&names, commit_name_neq, NULL, 0);
diff --git a/builtin/stash.c b/builtin/stash.c
index bb0fd861434..e504e22e0b9 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -961,6 +961,7 @@ static int show_stash(int argc, const char **argv, const char *prefix)
 	ret = diff_result_code(&rev.diffopt, 0);
 cleanup:
 	strvec_clear(&stash_args);
+	strvec_clear(&revision_args);
 	free_stash_info(&info);
 	release_revisions(&rev);
 	if (do_usage)
@@ -1838,6 +1839,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 		OPT_SUBCOMMAND_F("save", &fn, save_stash, PARSE_OPT_NOCOMPLETE),
 		OPT_END()
 	};
+	int ret;
 
 	git_config(git_stash_config, NULL);
 
@@ -1861,5 +1863,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 	/* Assume 'stash push' */
 	strvec_push(&args, "push");
 	strvec_pushv(&args, argv);
-	return !!push_stash(args.nr, args.v, prefix, 1);
+	ret = !!push_stash(args.nr, args.v, prefix, 1);
+	strvec_clear(&args);
+	return ret;
 }
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 945ee2b4126..6ef0d06ee8b 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -21,6 +21,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 {
 	struct strvec sent_argv = STRVEC_INIT;
 	const char *arg_cmd = "argument ";
+	int ret;
 
 	if (argc != 2 || !strcmp(argv[1], "-h"))
 		usage(upload_archive_usage);
@@ -45,8 +46,10 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 	}
 
 	/* parse all options sent by the client */
-	return write_archive(sent_argv.nr, sent_argv.v, prefix,
-			     the_repository, NULL, 1);
+	ret = write_archive(sent_argv.nr, sent_argv.v, prefix,
+			    the_repository, NULL, 1);
+	strvec_clear(&sent_argv);
+	return ret;
 }
 
 __attribute__((format (printf, 1, 2)))
-- 
2.39.0.rc2.1048.g0e5493b8d5b


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP"
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
  2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
  2022-12-15  9:11         ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
@ 2022-12-15  9:11         ` Ævar Arnfjörð Bjarmason
  2022-12-17 12:45           ` René Scharfe
  2022-12-15  9:11         ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15  9:11 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

We have various tricky cases where we'll leak a "struct strvec" even
when we call strvec_clear(), these happen because we'll call
setup_revisions(), parse_options() etc., which will munge our "v"
member.

There's various potential ways to deal with that, see the extensive
on-list discussion at [1]. One way would be to pass a flag to ask the
underlying API to free() these, as was done for setup_revisions() in
[2].

But we don't need that complexity for many common cases, which are
pushing fixed strings to the "struct strvec". Let's instead add a flag
analogous to the "strdup_strings" flag in the "struct string_list". A
subsequent commit will make use of this API.

Implementation notes: The BUG_unless_dup() is implemented as a macro
so we'll report the correct line number on BUG(). The "nodup_strings"
flag could have been named a "strdup_strings" for consistency with the
"struct string_list" API, but to do so we'd have to be confident that
we've spotted all callers that assume that they can memset() a "struct
strvec" to zero.

1. https://lore.kernel.org/git/221214.86ilie48cv.gmgdl@evledraar.gmail.com/
2. f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
   need free()-ing, 2022-08-02)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 strvec.c | 20 ++++++++++++++++++--
 strvec.h | 30 +++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..721f8e94a50 100644
--- a/strvec.c
+++ b/strvec.c
@@ -10,6 +10,16 @@ void strvec_init(struct strvec *array)
 	memcpy(array, &blank, sizeof(*array));
 }
 
+void strvec_init_nodup(struct strvec *array)
+{
+	struct strvec blank = STRVEC_INIT_NODUP;
+	memcpy(array, &blank, sizeof(*array));
+}
+
+#define BUG_unless_dup(array, fn) \
+	if ((array)->nodup_strings) \
+		BUG("cannot %s() on a 'STRVEC_INIT_NODUP' strvec", (fn))
+
 static void strvec_push_nodup(struct strvec *array, const char *value)
 {
 	if (array->v == empty_strvec)
@@ -22,7 +32,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
 
 const char *strvec_push(struct strvec *array, const char *value)
 {
-	strvec_push_nodup(array, xstrdup(value));
+	const char *to_push = array->nodup_strings ? value : xstrdup(value);
+
+	strvec_push_nodup(array, to_push);
 	return array->v[array->nr - 1];
 }
 
@@ -31,6 +43,8 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
 	va_list ap;
 	struct strbuf v = STRBUF_INIT;
 
+	BUG_unless_dup(array, "strvec_pushf");
+
 	va_start(ap, fmt);
 	strbuf_vaddf(&v, fmt, ap);
 	va_end(ap);
@@ -67,6 +81,8 @@ void strvec_pop(struct strvec *array)
 
 void strvec_split(struct strvec *array, const char *to_split)
 {
+	BUG_unless_dup(array, "strvec_pushf");
+
 	while (isspace(*to_split))
 		to_split++;
 	for (;;) {
@@ -89,7 +105,7 @@ void strvec_clear(struct strvec *array)
 {
 	if (array->v != empty_strvec) {
 		int i;
-		for (i = 0; i < array->nr; i++)
+		for (i = 0; !array->nodup_strings && i < array->nr; i++)
 			free((char *)array->v[i]);
 		free(array->v);
 	}
diff --git a/strvec.h b/strvec.h
index 9f55c8766ba..b122b87b369 100644
--- a/strvec.h
+++ b/strvec.h
@@ -26,29 +26,51 @@ extern const char *empty_strvec[];
  * member contains the actual array; the `nr` member contains the
  * number of elements in the array, not including the terminating
  * NULL.
+ *
+ * When using `STRVEC_INIT_NODUP` to initialize it the `nodup_strings'
+ * member is set, and individual members of the "struct strvec" will
+ * not be free()'d by strvec_clear(). This is for fixed string
+ * arguments to parse_options() and others that might munge the "v"
+ * itself.
  */
 struct strvec {
 	const char **v;
 	size_t nr;
 	size_t alloc;
+	unsigned int nodup_strings:1;
 };
 
 #define STRVEC_INIT { \
 	.v = empty_strvec, \
 }
 
+#define STRVEC_INIT_NODUP { \
+	.v = empty_strvec, \
+	.nodup_strings = 1, \
+}
+
 /**
  * Initialize an array. This is no different than assigning from
  * `STRVEC_INIT`.
  */
 void strvec_init(struct strvec *);
 
+/**
+ * Initialize a "nodup" array. This is no different than assigning from
+ * `STRVEC_INIT_NODUP`.
+ */
+void strvec_init_nodup(struct strvec *);
+
 /* Push a copy of a string onto the end of the array. */
 const char *strvec_push(struct strvec *, const char *);
 
 /**
  * Format a string and push it onto the end of the array. This is a
  * convenience wrapper combining `strbuf_addf` and `strvec_push`.
+ *
+ * This is incompatible with arrays initialized with
+ * `STRVEC_INIT_NODUP`, as pushing the formatted string requires the
+ * equivalent of an xstrfmt().
  */
 __attribute__((format (printf,2,3)))
 const char *strvec_pushf(struct strvec *, const char *fmt, ...);
@@ -70,7 +92,13 @@ void strvec_pushv(struct strvec *, const char **);
  */
 void strvec_pop(struct strvec *);
 
-/* Splits by whitespace; does not handle quoted arguments! */
+/**
+ * Splits by whitespace; does not handle quoted arguments!
+ *
+ * This is incompatible with arrays initialized with
+ * `STRVEC_INIT_NODUP`, as pushing the elements requires an xstrndup()
+ * call.
+ */
 void strvec_split(struct strvec *, const char *);
 
 /**
-- 
2.39.0.rc2.1048.g0e5493b8d5b


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
                           ` (2 preceding siblings ...)
  2022-12-15  9:11         ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-15  9:11         ` Ævar Arnfjörð Bjarmason
  2022-12-17 12:45           ` René Scharfe
  2022-12-15  9:11         ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15  9:11 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

For these cases where all of the strings we're pushing to the "struct
strvec" are fixed strings we can fix widespread memory leaks by
skipping the xstrdup() on strvec_push().

More in-tree users could benefit from this to save needless
xstrdup()-ing, but for all of these we were munging the "v" member, so
the subsequent strvec_clear() wouldn't free the memory.

Now there's no need to free the individual elements, but we'll still
need to free the container with the strvec_clear().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/am.c                  | 2 +-
 builtin/annotate.c            | 2 +-
 builtin/stash.c               | 2 +-
 t/t0023-crlf-am.sh            | 1 +
 t/t4152-am-subjects.sh        | 2 ++
 t/t4254-am-corrupt.sh         | 2 ++
 t/t4256-am-format-flowed.sh   | 1 +
 t/t4257-am-interactive.sh     | 2 ++
 t/t5403-post-checkout-hook.sh | 1 +
 9 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd7..691ec1d152d 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
 static int run_apply(const struct am_state *state, const char *index_file)
 {
 	struct strvec apply_paths = STRVEC_INIT;
-	struct strvec apply_opts = STRVEC_INIT;
+	struct strvec apply_opts = STRVEC_INIT_NODUP;
 	struct apply_state apply_state;
 	int res, opts_left;
 	int force_apply = 0;
diff --git a/builtin/annotate.c b/builtin/annotate.c
index de58deadfc7..99d97c1a8c0 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -9,7 +9,7 @@
 
 int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
 {
-	struct strvec args = STRVEC_INIT;
+	struct strvec args = STRVEC_INIT_NODUP;
 	int ret;
 
 	strvec_pushl(&args, argv[0], "-c", NULL);
diff --git a/builtin/stash.c b/builtin/stash.c
index e504e22e0b9..b15dd2ebb3c 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1823,7 +1823,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 {
 	pid_t pid = getpid();
 	const char *index_file;
-	struct strvec args = STRVEC_INIT;
+	struct strvec args = STRVEC_INIT_NODUP;
 	parse_opt_subcommand_fn *fn = NULL;
 	struct option options[] = {
 		OPT_SUBCOMMAND("apply", &fn, apply_stash),
diff --git a/t/t0023-crlf-am.sh b/t/t0023-crlf-am.sh
index f9bbb91f64e..575805513a3 100755
--- a/t/t0023-crlf-am.sh
+++ b/t/t0023-crlf-am.sh
@@ -2,6 +2,7 @@
 
 test_description='Test am with auto.crlf'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 cat >patchfile <<\EOF
diff --git a/t/t4152-am-subjects.sh b/t/t4152-am-subjects.sh
index 4c68245acad..9f2edba1f83 100755
--- a/t/t4152-am-subjects.sh
+++ b/t/t4152-am-subjects.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='test subject preservation with format-patch | am'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 make_patches() {
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index 54be7da1611..45f1d4f95e5 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='git am with corrupt input'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 make_mbox_with_nul () {
diff --git a/t/t4256-am-format-flowed.sh b/t/t4256-am-format-flowed.sh
index 2369c4e17ad..1015273bc82 100755
--- a/t/t4256-am-format-flowed.sh
+++ b/t/t4256-am-format-flowed.sh
@@ -2,6 +2,7 @@
 
 test_description='test format=flowed support of git am'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
diff --git a/t/t4257-am-interactive.sh b/t/t4257-am-interactive.sh
index aed8f4de3d6..f26d7fd2dbd 100755
--- a/t/t4257-am-interactive.sh
+++ b/t/t4257-am-interactive.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
 test_description='am --interactive tests'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'set up patches to apply' '
diff --git a/t/t5403-post-checkout-hook.sh b/t/t5403-post-checkout-hook.sh
index 978f240cdac..cfaae547398 100755
--- a/t/t5403-post-checkout-hook.sh
+++ b/t/t5403-post-checkout-hook.sh
@@ -7,6 +7,7 @@ test_description='Test the post-checkout hook.'
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success setup '
-- 
2.39.0.rc2.1048.g0e5493b8d5b


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 5/5] strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
                           ` (3 preceding siblings ...)
  2022-12-15  9:11         ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-15  9:11         ` Ævar Arnfjörð Bjarmason
  2022-12-17 12:45           ` René Scharfe
  2022-12-17 12:45         ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
  2022-12-17 13:13         ` Jeff King
  6 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-15  9:11 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

For these cases where most of the the strings we were pushing to the
"struct strvec" were fixed strings, but others are dynamically
allocated. For the latter we keep around a "to_free" list of them.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/describe.c       | 22 +++++++++++++++++-----
 builtin/upload-archive.c |  9 +++++++--
 t/t5003-archive-zip.sh   |  1 +
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index cb205f6b561..eda59ebb19a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -596,8 +596,10 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 		die(_("options '%s' and '%s' cannot be used together"), "--long", "--abbrev=0");
 
 	if (contains) {
+		struct string_list to_free = STRING_LIST_INIT_DUP;
 		struct string_list_item *item;
-		struct strvec args = STRVEC_INIT;
+		struct strvec args = STRVEC_INIT_NODUP;
+
 		int ret;
 
 		strvec_pushl(&args, "name-rev",
@@ -607,10 +609,19 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 			strvec_push(&args, "--always");
 		if (!all) {
 			strvec_push(&args, "--tags");
-			for_each_string_list_item(item, &patterns)
-				strvec_pushf(&args, "--refs=refs/tags/%s", item->string);
-			for_each_string_list_item(item, &exclude_patterns)
-				strvec_pushf(&args, "--exclude=refs/tags/%s", item->string);
+			for_each_string_list_item(item, &patterns) {
+				char *str = xstrfmt("--refs=refs/tags/%s", item->string);
+				const char *item = string_list_append_nodup(&to_free, str)->string;
+
+				strvec_push(&args, item);
+			}
+			for_each_string_list_item(item, &exclude_patterns) {
+				char *str = xstrfmt("--exclude=refs/tags/%s", item->string);
+
+				const char *item = string_list_append_nodup(&to_free, str)->string;
+
+				strvec_push(&args, item);
+			}
 		}
 		if (argc)
 			strvec_pushv(&args, argv);
@@ -618,6 +629,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 			strvec_push(&args, "HEAD");
 		ret = cmd_name_rev(args.nr, args.v, prefix);
 		strvec_clear(&args);
+		string_list_clear(&to_free, 0);
 		return ret;
 	}
 
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 6ef0d06ee8b..95c6cdc4be0 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -19,7 +19,8 @@ static const char deadchild[] =
 
 int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 {
-	struct strvec sent_argv = STRVEC_INIT;
+	struct string_list to_free = STRING_LIST_INIT_DUP;
+	struct strvec sent_argv = STRVEC_INIT_NODUP;
 	const char *arg_cmd = "argument ";
 	int ret;
 
@@ -34,6 +35,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 	/* put received options in sent_argv[] */
 	strvec_push(&sent_argv, "git-upload-archive");
 	for (;;) {
+		struct string_list_item *item;
 		char *buf = packet_read_line(0, NULL);
 		if (!buf)
 			break;	/* got a flush */
@@ -42,13 +44,16 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 
 		if (!starts_with(buf, arg_cmd))
 			die("'argument' token or flush expected");
-		strvec_push(&sent_argv, buf + strlen(arg_cmd));
+
+		item = string_list_append(&to_free, buf + strlen(arg_cmd));
+		strvec_push(&sent_argv, item->string);
 	}
 
 	/* parse all options sent by the client */
 	ret = write_archive(sent_argv.nr, sent_argv.v, prefix,
 			    the_repository, NULL, 1);
 	strvec_clear(&sent_argv);
+	string_list_clear(&to_free, 0);
 	return ret;
 }
 
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index fc499cdff01..cc1ce060558 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -2,6 +2,7 @@
 
 test_description='git archive --format=zip test'
 
+TEST_PASSES_SANITIZE_LEAK=true
 TEST_CREATE_REPO_NO_TEMPLATE=1
 . ./test-lib.sh
 
-- 
2.39.0.rc2.1048.g0e5493b8d5b


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 5/5] strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"
  2022-12-15  9:11         ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45           ` René Scharfe
  0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> For these cases where most of the the strings we were pushing to the
> "struct strvec" were fixed strings, but others are dynamically
> allocated. For the latter we keep around a "to_free" list of them.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/describe.c       | 22 +++++++++++++++++-----
>  builtin/upload-archive.c |  9 +++++++--
>  t/t5003-archive-zip.sh   |  1 +
>  3 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index cb205f6b561..eda59ebb19a 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -596,8 +596,10 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>  		die(_("options '%s' and '%s' cannot be used together"), "--long", "--abbrev=0");
>
>  	if (contains) {
> +		struct string_list to_free = STRING_LIST_INIT_DUP;
>  		struct string_list_item *item;
> -		struct strvec args = STRVEC_INIT;
> +		struct strvec args = STRVEC_INIT_NODUP;
> +
>  		int ret;
>
>  		strvec_pushl(&args, "name-rev",
> @@ -607,10 +609,19 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>  			strvec_push(&args, "--always");
>  		if (!all) {
>  			strvec_push(&args, "--tags");
> -			for_each_string_list_item(item, &patterns)
> -				strvec_pushf(&args, "--refs=refs/tags/%s", item->string);
> -			for_each_string_list_item(item, &exclude_patterns)
> -				strvec_pushf(&args, "--exclude=refs/tags/%s", item->string);
> +			for_each_string_list_item(item, &patterns) {
> +				char *str = xstrfmt("--refs=refs/tags/%s", item->string);
> +				const char *item = string_list_append_nodup(&to_free, str)->string;
> +
> +				strvec_push(&args, item);
> +			}
> +			for_each_string_list_item(item, &exclude_patterns) {
> +				char *str = xstrfmt("--exclude=refs/tags/%s", item->string);
> +
> +				const char *item = string_list_append_nodup(&to_free, str)->string;
> +
> +				strvec_push(&args, item);
> +			}

Having to track allocations separately for each option is tedious, as
we see here.

>  		}
>  		if (argc)
>  			strvec_pushv(&args, argv);
> @@ -618,6 +629,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>  			strvec_push(&args, "HEAD");
>  		ret = cmd_name_rev(args.nr, args.v, prefix);
>  		strvec_clear(&args);
> +		string_list_clear(&to_free, 0);
>  		return ret;
>  	}
>
> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 6ef0d06ee8b..95c6cdc4be0 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -19,7 +19,8 @@ static const char deadchild[] =
>
>  int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
>  {
> -	struct strvec sent_argv = STRVEC_INIT;
> +	struct string_list to_free = STRING_LIST_INIT_DUP;
> +	struct strvec sent_argv = STRVEC_INIT_NODUP;
>  	const char *arg_cmd = "argument ";
>  	int ret;
>
> @@ -34,6 +35,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
>  	/* put received options in sent_argv[] */
>  	strvec_push(&sent_argv, "git-upload-archive");
>  	for (;;) {
> +		struct string_list_item *item;
>  		char *buf = packet_read_line(0, NULL);
>  		if (!buf)
>  			break;	/* got a flush */
> @@ -42,13 +44,16 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
>
>  		if (!starts_with(buf, arg_cmd))
>  			die("'argument' token or flush expected");
> -		strvec_push(&sent_argv, buf + strlen(arg_cmd));
> +
> +		item = string_list_append(&to_free, buf + strlen(arg_cmd));
> +		strvec_push(&sent_argv, item->string);
>  	}
>
>  	/* parse all options sent by the client */
>  	ret = write_archive(sent_argv.nr, sent_argv.v, prefix,
>  			    the_repository, NULL, 1);
>  	strvec_clear(&sent_argv);
> +	string_list_clear(&to_free, 0);
>  	return ret;
>  }
>
> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index fc499cdff01..cc1ce060558 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -2,6 +2,7 @@
>
>  test_description='git archive --format=zip test'
>
> +TEST_PASSES_SANITIZE_LEAK=true
>  TEST_CREATE_REPO_NO_TEMPLATE=1
>  . ./test-lib.sh
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
                           ` (4 preceding siblings ...)
  2022-12-15  9:11         ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45         ` René Scharfe
  2022-12-17 13:13         ` Jeff King
  6 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> This is an alternative to René's [1], his already fixes a leak in "git
> am", and this could be done later, so I'm submitting it as RFC, but it
> could also replace it.
>
> I think as this series shows extending the "strvec" API to get a
> feature that works like the existing "strdup_strings" that the "struct
> string_list" has can make memory management much simpler.
>
> The 4/5 here shows cases where we were leaking because our "v" was
> clobbered, but where all the strings we were pushing to the "strvec"
> were fixed strings, so we could skip xstrdup()-ing them.
>
> The 5/5 then shows more complex cases where we have mixed-use,
> i.e. most strings are fixed, but some are not. For those we use a
> "struct string_list to_free = STRING_LIST_INIT_DUP", which we then
> push to the "to_free" list with "string_list_append_nodup()".
>
> This does make the API slightly more dangerous to use, as it's no
> longer guaranteed that it owns all the members it points to. But as
> the "struct string_list" has shown this isn't an issue in practice,
> and e.g. SANITIZE=address et al are good about finding double-frees,
> or frees of fixed strings.
>
> A branch & CI for this are found at [2].
>
> 1. https://lore.kernel.org/git/baf93e4a-7f05-857c-e551-09675496c03c@web.de/
> 2. https://github.com/avar/git/tree/avar/add-and-use-strvec-init-nodup
>
> Ævar Arnfjörð Bjarmason (5):
>   builtin/annotate.c: simplify for strvec API
>   various: add missing strvec_clear()
>   strvec API: add a "STRVEC_INIT_NODUP"
>   strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
>   strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"
>
>  builtin/am.c                  |  2 +-
>  builtin/annotate.c            | 17 ++++++++---------
>  builtin/describe.c            | 28 +++++++++++++++++++++-------
>  builtin/stash.c               |  8 ++++++--
>  builtin/upload-archive.c      | 16 ++++++++++++----
>  strvec.c                      | 20 ++++++++++++++++++--
>  strvec.h                      | 30 +++++++++++++++++++++++++++++-
>  t/t0023-crlf-am.sh            |  1 +
>  t/t4152-am-subjects.sh        |  2 ++
>  t/t4254-am-corrupt.sh         |  2 ++
>  t/t4256-am-format-flowed.sh   |  1 +
>  t/t4257-am-interactive.sh     |  2 ++
>  t/t5003-archive-zip.sh        |  1 +
>  t/t5403-post-checkout-hook.sh |  1 +
>  14 files changed, 105 insertions(+), 26 deletions(-)
>

This complicates strvec to gain functionality that we basically already
have with REALLOC_ARRAY.  It allows for nice conversions in some cases
(patch 4), but places that require some kind of double accounting become
quite nasty (patch 5).  I'd prefer to keep strvec simple.  Avoiding
allocations isn't necessary because the number of options to parse is
low -- we just need to keep track of and release them at the end.

The problem here is that parse_options() takes a string list in the form
of an int paired with a const char ** and modifies this list in place.
That's fine if we don't care about the memory ownership of the list and
its elements, e.g. because we get it from the OS (main() style) or
accept leakage.

It's not compatible with the list being a strvec that we don't want to
leak, though.  What we need is something that translates from strvec to
the argc/argv convention.  If we allow leaks this is trivial: Just use
the .v member directly.  If we don't then it is still simple: Make a
shallow copy of the .v member, release it after use.  Demo patch below.

 builtin/am.c             |  5 ++++-
 builtin/annotate.c       |  9 ++++++++-
 builtin/describe.c       |  8 +++++++-
 builtin/stash.c          | 10 +++++++++-
 builtin/upload-archive.c | 11 +++++++++--
 strvec.c                 |  9 +++++++++
 strvec.h                 |  7 +++++++
 7 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd..a80d40f4be 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1476,14 +1476,16 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	int res, opts_left;
 	int force_apply = 0;
 	int options = 0;
+	const char **apply_argv;

 	if (init_apply_state(&apply_state, the_repository, NULL))
 		BUG("init_apply_state() failed");

 	strvec_push(&apply_opts, "apply");
 	strvec_pushv(&apply_opts, state->git_apply_opts.v);
+	apply_argv = strvec_clone_nodup(&apply_opts);

-	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
+	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
 					&apply_state, &force_apply, &options,
 					NULL);

@@ -1513,6 +1515,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	strvec_clear(&apply_paths);
 	strvec_clear(&apply_opts);
 	clear_apply_state(&apply_state);
+	free(apply_argv);

 	if (res)
 		return res;
diff --git a/builtin/annotate.c b/builtin/annotate.c
index 58ff977a23..7e75f0f48d 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -10,6 +10,8 @@
 int cmd_annotate(int argc, const char **argv, const char *prefix)
 {
 	struct strvec args = STRVEC_INIT;
+	const char **blame_argv;
+	int ret;
 	int i;

 	strvec_pushl(&args, "annotate", "-c", NULL);
@@ -17,6 +19,11 @@ int cmd_annotate(int argc, const char **argv, const char *prefix)
 	for (i = 1; i < argc; i++) {
 		strvec_push(&args, argv[i]);
 	}
+	blame_argv = strvec_clone_nodup(&args);

-	return cmd_blame(args.nr, args.v, prefix);
+	ret = cmd_blame(args.nr, blame_argv, prefix);
+
+	strvec_clear(&args);
+	free(blame_argv);
+	return ret;
 }
diff --git a/builtin/describe.c b/builtin/describe.c
index eea1e330c0..2ef2fbc0d4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -598,6 +598,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 	if (contains) {
 		struct string_list_item *item;
 		struct strvec args;
+		const char **name_rev_argv;
+		int ret;

 		strvec_init(&args);
 		strvec_pushl(&args, "name-rev",
@@ -616,7 +618,11 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 			strvec_pushv(&args, argv);
 		else
 			strvec_push(&args, "HEAD");
-		return cmd_name_rev(args.nr, args.v, prefix);
+		name_rev_argv = strvec_clone_nodup(&args);
+		ret = cmd_name_rev(args.nr, name_rev_argv, prefix);
+		strvec_clear(&args);
+		free(name_rev_argv);
+		return ret;
 	}

 	hashmap_init(&names, commit_name_neq, NULL, 0);
diff --git a/builtin/stash.c b/builtin/stash.c
index bb0fd86143..864b7c573a 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1820,6 +1820,8 @@ static int save_stash(int argc, const char **argv, const char *prefix)

 int cmd_stash(int argc, const char **argv, const char *prefix)
 {
+	int ret;
+	const char **push_stash_argv;
 	pid_t pid = getpid();
 	const char *index_file;
 	struct strvec args = STRVEC_INIT;
@@ -1861,5 +1863,11 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 	/* Assume 'stash push' */
 	strvec_push(&args, "push");
 	strvec_pushv(&args, argv);
-	return !!push_stash(args.nr, args.v, prefix, 1);
+	push_stash_argv = strvec_clone_nodup(&args);
+
+	ret = !!push_stash(args.nr, push_stash_argv, prefix, 1);
+
+	strvec_clear(&args);
+	free(push_stash_argv);
+	return ret;
 }
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 945ee2b412..36ed85e10a 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -19,6 +19,8 @@ static const char deadchild[] =

 int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 {
+	int ret;
+	const char **upload_archive_argv;
 	struct strvec sent_argv = STRVEC_INIT;
 	const char *arg_cmd = "argument ";

@@ -43,10 +45,15 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 			die("'argument' token or flush expected");
 		strvec_push(&sent_argv, buf + strlen(arg_cmd));
 	}
+	upload_archive_argv = strvec_clone_nodup(&sent_argv);

 	/* parse all options sent by the client */
-	return write_archive(sent_argv.nr, sent_argv.v, prefix,
-			     the_repository, NULL, 1);
+	ret = write_archive(sent_argv.nr, upload_archive_argv, prefix,
+			    the_repository, NULL, 1);
+
+	strvec_clear(&sent_argv);
+	free(upload_archive_argv);
+	return ret;
 }

 __attribute__((format (printf, 1, 2)))
diff --git a/strvec.c b/strvec.c
index 61a76ce6cb..5c41c8c506 100644
--- a/strvec.c
+++ b/strvec.c
@@ -106,3 +106,12 @@ const char **strvec_detach(struct strvec *array)
 		return ret;
 	}
 }
+
+const char **strvec_clone_nodup(const struct strvec *sv)
+{
+	const char **ret;
+
+	CALLOC_ARRAY(ret, sv->nr + 1);
+	COPY_ARRAY(ret, sv->v, sv->nr);
+	return ret;
+}
diff --git a/strvec.h b/strvec.h
index 9f55c8766b..6234634b00 100644
--- a/strvec.h
+++ b/strvec.h
@@ -88,4 +88,11 @@ void strvec_clear(struct strvec *);
  */
 const char **strvec_detach(struct strvec *);

+/**
+ * Create a copy of the array that references the original strings.
+ * The caller is responsible for freeing the memory of that copy,
+ * but must not free the individual strings.
+ */
+const char **strvec_clone_nodup(const struct strvec *);
+
 #endif /* STRVEC_H */

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
  2022-12-15  9:11         ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45           ` René Scharfe
  0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> For these cases where all of the strings we're pushing to the "struct
> strvec" are fixed strings we can fix widespread memory leaks by
> skipping the xstrdup() on strvec_push().
>
> More in-tree users could benefit from this to save needless
> xstrdup()-ing, but for all of these we were munging the "v" member, so
> the subsequent strvec_clear() wouldn't free the memory.
>
> Now there's no need to free the individual elements, but we'll still
> need to free the container with the strvec_clear().
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/am.c                  | 2 +-
>  builtin/annotate.c            | 2 +-
>  builtin/stash.c               | 2 +-
>  t/t0023-crlf-am.sh            | 1 +
>  t/t4152-am-subjects.sh        | 2 ++
>  t/t4254-am-corrupt.sh         | 2 ++
>  t/t4256-am-format-flowed.sh   | 1 +
>  t/t4257-am-interactive.sh     | 2 ++
>  t/t5403-post-checkout-hook.sh | 1 +
>  9 files changed, 12 insertions(+), 3 deletions(-)

Looks nice and easy, though the test updates are a bit distracting.
They are all enabled by the change to builtin/am.c, right?

>
> diff --git a/builtin/am.c b/builtin/am.c
> index 30c9b3a9cd7..691ec1d152d 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
>  static int run_apply(const struct am_state *state, const char *index_file)
>  {
>  	struct strvec apply_paths = STRVEC_INIT;
> -	struct strvec apply_opts = STRVEC_INIT;
> +	struct strvec apply_opts = STRVEC_INIT_NODUP;
>  	struct apply_state apply_state;
>  	int res, opts_left;
>  	int force_apply = 0;
> diff --git a/builtin/annotate.c b/builtin/annotate.c
> index de58deadfc7..99d97c1a8c0 100644
> --- a/builtin/annotate.c
> +++ b/builtin/annotate.c
> @@ -9,7 +9,7 @@
>
>  int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
>  {
> -	struct strvec args = STRVEC_INIT;
> +	struct strvec args = STRVEC_INIT_NODUP;
>  	int ret;
>
>  	strvec_pushl(&args, argv[0], "-c", NULL);
> diff --git a/builtin/stash.c b/builtin/stash.c
> index e504e22e0b9..b15dd2ebb3c 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -1823,7 +1823,7 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
>  {
>  	pid_t pid = getpid();
>  	const char *index_file;
> -	struct strvec args = STRVEC_INIT;
> +	struct strvec args = STRVEC_INIT_NODUP;
>  	parse_opt_subcommand_fn *fn = NULL;
>  	struct option options[] = {
>  		OPT_SUBCOMMAND("apply", &fn, apply_stash),
> diff --git a/t/t0023-crlf-am.sh b/t/t0023-crlf-am.sh
> index f9bbb91f64e..575805513a3 100755
> --- a/t/t0023-crlf-am.sh
> +++ b/t/t0023-crlf-am.sh
> @@ -2,6 +2,7 @@
>
>  test_description='Test am with auto.crlf'
>
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  cat >patchfile <<\EOF
> diff --git a/t/t4152-am-subjects.sh b/t/t4152-am-subjects.sh
> index 4c68245acad..9f2edba1f83 100755
> --- a/t/t4152-am-subjects.sh
> +++ b/t/t4152-am-subjects.sh
> @@ -1,6 +1,8 @@
>  #!/bin/sh
>
>  test_description='test subject preservation with format-patch | am'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  make_patches() {
> diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
> index 54be7da1611..45f1d4f95e5 100755
> --- a/t/t4254-am-corrupt.sh
> +++ b/t/t4254-am-corrupt.sh
> @@ -1,6 +1,8 @@
>  #!/bin/sh
>
>  test_description='git am with corrupt input'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  make_mbox_with_nul () {
> diff --git a/t/t4256-am-format-flowed.sh b/t/t4256-am-format-flowed.sh
> index 2369c4e17ad..1015273bc82 100755
> --- a/t/t4256-am-format-flowed.sh
> +++ b/t/t4256-am-format-flowed.sh
> @@ -2,6 +2,7 @@
>
>  test_description='test format=flowed support of git am'
>
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  test_expect_success 'setup' '
> diff --git a/t/t4257-am-interactive.sh b/t/t4257-am-interactive.sh
> index aed8f4de3d6..f26d7fd2dbd 100755
> --- a/t/t4257-am-interactive.sh
> +++ b/t/t4257-am-interactive.sh
> @@ -1,6 +1,8 @@
>  #!/bin/sh
>
>  test_description='am --interactive tests'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  test_expect_success 'set up patches to apply' '
> diff --git a/t/t5403-post-checkout-hook.sh b/t/t5403-post-checkout-hook.sh
> index 978f240cdac..cfaae547398 100755
> --- a/t/t5403-post-checkout-hook.sh
> +++ b/t/t5403-post-checkout-hook.sh
> @@ -7,6 +7,7 @@ test_description='Test the post-checkout hook.'
>  GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>  export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>
> +TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
>  test_expect_success setup '

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API
  2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45           ` René Scharfe
  0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> When this code was migrated to "struct strvec" (or rather, its
> predecessor API) in 8c2cfa55446 (annotate: use argv_array, 2014-07-16)
> it didn't take full advantage of what we were given:
>
> * We are always passed the name "annotate" as argv[0] here, so we
>   don't need to re-hardcode it. We've already declared it in "struct
>   cmd_struct commands" in git.c.

This change has nothing to do with strvec; it could have been done by
e68989a739 (annotate: fix for cvsserver., 2007-02-06) already.  I don't
see much of a benefit to it because the string is already there, but if
it's needed then I think it deserves its own patch.

> * We are guaranteed to get at least one argument, so rather than
>   looping here ourselves let's have strvec_pushv() handle that. If we
>   only have one argument we'll pass the terminating NULL to it, making
>   it a NOOP.

85b343245b (argv-array: implement argv_array_pushv(), 2015-06-14) added
the _pushv() function, so 8c2cfa55446 could not have used it.

> This change helps to make the subsequent commit smaller, and as a
> bonus removes the type discrepancy between "int" and "size_t".

This type mismatch is benign as long as we get a valid argc value,
as positive int fit into size_t.  Getting rid of the loop is nice,
though.

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/annotate.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/annotate.c b/builtin/annotate.c
> index 58ff977a231..e37b269196f 100644
> --- a/builtin/annotate.c
> +++ b/builtin/annotate.c
> @@ -7,16 +7,12 @@
>  #include "builtin.h"
>  #include "strvec.h"
>
> -int cmd_annotate(int argc, const char **argv, const char *prefix)
> +int cmd_annotate(int argc UNUSED, const char **argv, const char *prefix)
>  {
>  	struct strvec args = STRVEC_INIT;
> -	int i;
>
> -	strvec_pushl(&args, "annotate", "-c", NULL);
> -
> -	for (i = 1; i < argc; i++) {
> -		strvec_push(&args, argv[i]);
> -	}
> +	strvec_pushl(&args, argv[0], "-c", NULL);
> +	strvec_pushv(&args, &argv[1]);
                            ^^^^^^^^
"argv + 1" would be easier to read.

>
>  	return cmd_blame(args.nr, args.v, prefix);
>  }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP"
  2022-12-15  9:11         ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:45           ` René Scharfe
  0 siblings, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> We have various tricky cases where we'll leak a "struct strvec" even
> when we call strvec_clear(), these happen because we'll call
> setup_revisions(), parse_options() etc., which will munge our "v"
> member.
>
> There's various potential ways to deal with that, see the extensive
> on-list discussion at [1]. One way would be to pass a flag to ask the
> underlying API to free() these, as was done for setup_revisions() in
> [2].
>
> But we don't need that complexity for many common cases, which are
> pushing fixed strings to the "struct strvec". Let's instead add a flag
> analogous to the "strdup_strings" flag in the "struct string_list". A
> subsequent commit will make use of this API.
>
> Implementation notes: The BUG_unless_dup() is implemented as a macro
> so we'll report the correct line number on BUG(). The "nodup_strings"
> flag could have been named a "strdup_strings" for consistency with the
> "struct string_list" API, but to do so we'd have to be confident that
> we've spotted all callers that assume that they can memset() a "struct
> strvec" to zero.

If the two variants would get separate types then the compiler would
take care of these checks.  Conceptually they are different enough and
have different operations; they only share the same struct layout.

But I doubt a _nodup variant as its own type would pull its weight.
It would basically be a trivial wrapper around REALLOC_ARRAY..

>
> 1. https://lore.kernel.org/git/221214.86ilie48cv.gmgdl@evledraar.gmail.com/
> 2. f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
>    need free()-ing, 2022-08-02)
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  strvec.c | 20 ++++++++++++++++++--
>  strvec.h | 30 +++++++++++++++++++++++++++++-
>  2 files changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/strvec.c b/strvec.c
> index 61a76ce6cb9..721f8e94a50 100644
> --- a/strvec.c
> +++ b/strvec.c
> @@ -10,6 +10,16 @@ void strvec_init(struct strvec *array)
>  	memcpy(array, &blank, sizeof(*array));
>  }
>
> +void strvec_init_nodup(struct strvec *array)
> +{
> +	struct strvec blank = STRVEC_INIT_NODUP;
> +	memcpy(array, &blank, sizeof(*array));
> +}
> +
> +#define BUG_unless_dup(array, fn) \
> +	if ((array)->nodup_strings) \
> +		BUG("cannot %s() on a 'STRVEC_INIT_NODUP' strvec", (fn))
> +
>  static void strvec_push_nodup(struct strvec *array, const char *value)
>  {
>  	if (array->v == empty_strvec)
> @@ -22,7 +32,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
>
>  const char *strvec_push(struct strvec *array, const char *value)
>  {
> -	strvec_push_nodup(array, xstrdup(value));
> +	const char *to_push = array->nodup_strings ? value : xstrdup(value);
> +
> +	strvec_push_nodup(array, to_push);
>  	return array->v[array->nr - 1];
>  }
>
> @@ -31,6 +43,8 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
>  	va_list ap;
>  	struct strbuf v = STRBUF_INIT;
>
> +	BUG_unless_dup(array, "strvec_pushf");
> +
>  	va_start(ap, fmt);
>  	strbuf_vaddf(&v, fmt, ap);
>  	va_end(ap);
> @@ -67,6 +81,8 @@ void strvec_pop(struct strvec *array)
>
>  void strvec_split(struct strvec *array, const char *to_split)
>  {
> +	BUG_unless_dup(array, "strvec_pushf");
> +
>  	while (isspace(*to_split))
>  		to_split++;
>  	for (;;) {
> @@ -89,7 +105,7 @@ void strvec_clear(struct strvec *array)
>  {
>  	if (array->v != empty_strvec) {
>  		int i;
> -		for (i = 0; i < array->nr; i++)
> +		for (i = 0; !array->nodup_strings && i < array->nr; i++)
>  			free((char *)array->v[i]);
>  		free(array->v);
>  	}
> diff --git a/strvec.h b/strvec.h
> index 9f55c8766ba..b122b87b369 100644
> --- a/strvec.h
> +++ b/strvec.h
> @@ -26,29 +26,51 @@ extern const char *empty_strvec[];
>   * member contains the actual array; the `nr` member contains the
>   * number of elements in the array, not including the terminating
>   * NULL.
> + *
> + * When using `STRVEC_INIT_NODUP` to initialize it the `nodup_strings'
> + * member is set, and individual members of the "struct strvec" will
> + * not be free()'d by strvec_clear(). This is for fixed string
> + * arguments to parse_options() and others that might munge the "v"
> + * itself.
>   */
>  struct strvec {
>  	const char **v;
>  	size_t nr;
>  	size_t alloc;
> +	unsigned int nodup_strings:1;
>  };
>
>  #define STRVEC_INIT { \
>  	.v = empty_strvec, \
>  }
>
> +#define STRVEC_INIT_NODUP { \
> +	.v = empty_strvec, \
> +	.nodup_strings = 1, \
> +}
> +
>  /**
>   * Initialize an array. This is no different than assigning from
>   * `STRVEC_INIT`.
>   */
>  void strvec_init(struct strvec *);
>
> +/**
> + * Initialize a "nodup" array. This is no different than assigning from
> + * `STRVEC_INIT_NODUP`.
> + */
> +void strvec_init_nodup(struct strvec *);
> +
>  /* Push a copy of a string onto the end of the array. */
>  const char *strvec_push(struct strvec *, const char *);
>
>  /**
>   * Format a string and push it onto the end of the array. This is a
>   * convenience wrapper combining `strbuf_addf` and `strvec_push`.
> + *
> + * This is incompatible with arrays initialized with
> + * `STRVEC_INIT_NODUP`, as pushing the formatted string requires the
> + * equivalent of an xstrfmt().
>   */
>  __attribute__((format (printf,2,3)))
>  const char *strvec_pushf(struct strvec *, const char *fmt, ...);
> @@ -70,7 +92,13 @@ void strvec_pushv(struct strvec *, const char **);
>   */
>  void strvec_pop(struct strvec *);
>
> -/* Splits by whitespace; does not handle quoted arguments! */
> +/**
> + * Splits by whitespace; does not handle quoted arguments!
> + *
> + * This is incompatible with arrays initialized with
> + * `STRVEC_INIT_NODUP`, as pushing the elements requires an xstrndup()
> + * call.
> + */
>  void strvec_split(struct strvec *, const char *);
>
>  /**

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
@ 2022-12-17 12:46       ` René Scharfe
  1 sibling, 0 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 12:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git List, Junio C Hamano

Am 14.12.22 um 09:44 schrieb Ævar Arnfjörð Bjarmason:
>
> On Tue, Dec 13 2022, René Scharfe wrote:
>
>> Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Tue, Dec 13 2022, René Scharfe wrote:
>>>
>>>> apply_parse_options() passes the array of argument strings to
>>>> parse_options(), which removes recognized options.  The removed strings
>>>> are not freed, though.
>>>>
>>>> Make a copy of the strvec to pass to the function to retain the pointers
>>>> of its strings, so we release them all at the end.
>>>>
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>>  builtin/am.c | 12 +++++++++++-
>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/builtin/am.c b/builtin/am.c
>>>> index 30c9b3a9cd..dddf1b9af0 100644
>>>> --- a/builtin/am.c
>>>> +++ b/builtin/am.c
>>>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>>  	int res, opts_left;
>>>>  	int force_apply = 0;
>>>>  	int options = 0;
>>>> +	const char **apply_argv;
>>>>
>>>>  	if (init_apply_state(&apply_state, the_repository, NULL))
>>>>  		BUG("init_apply_state() failed");
>>>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>>  	strvec_push(&apply_opts, "apply");
>>>>  	strvec_pushv(&apply_opts, state->git_apply_opts.v);
>>>>
>>>> -	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
>>>> +	/*
>>>> +	 * Build a copy that apply_parse_options() can rearrange.
>>>> +	 * apply_opts.v keeps referencing the allocated strings for
>>>> +	 * strvec_clear() to release.
>>>> +	 */
>>>> +	ALLOC_ARRAY(apply_argv, apply_opts.nr);
>>>> +	COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr);
>>>> +
>>>> +	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
>>>>  					&apply_state, &force_apply, &options,
>>>>  					NULL);
>>>>
>>>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>>>>  	strvec_clear(&apply_paths);
>>>>  	strvec_clear(&apply_opts);
>>>>  	clear_apply_state(&apply_state);
>>>> +	free(apply_argv);
>>>>
>>>>  	if (res)
>>>>  		return res;
>>>
>>> I don't mind this going in, but it really feels like a bit of a dirty
>>> hack.

What's so dirty about it, by the way?

>>>
>>> We have widespread leaks all over the place due to this
>>> idiom. I.e. parse_options() and a couple of other APIs expect that they
>>> can munge the "argv", which is fine if it arrives via main(), but not if
>>> we're editing our own strvecs.
>>
>> Where?  A quick "git grep 'parse_options.*nr'" turns up only this place
>> as one that passes a strvec to parse_options.
>
> Sorry about the vagueness, this was from memory and I didn't have a full
> list handy.
>
> First, that grep isn't going to give you what you want, since it only
> catches cases where we're passing the "strvec" to the "parse_options()"
> directly.
>
> But if you look at e.g. how cmd_stash() invokes push_stash() or
> cmd_describe() invoking cmd_name_rev() you'll see callers where we're
> potentially losing the strvec due to parse_options().
>
> "Potentially" because in both those cases we call that API, but in the
> latter case we've got a missing strvec_clear(), so maybe that's all
> that's needed (it doesn't always munge the argv).

OK.

>>> I think less of a hack is to teach the eventual parse_options() that
>>> when it munges it it should free() it. I did that for the revisions API
>>> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
>>> need free()-ing, 2022-08-02).
>>>
>>> What do you think?
>>
>> Generating string lists and then parsing them is weird.  When calls have
>> to cross a process boundary then we have no choice, but in-process we
>> shouldn't have to lower our request to an intermediate text format.  git
>> am does it anyway because it writes its options to a file and reads them
>> back when it resumes with --continue, IIUC.
>
> I agree, but making all those use a nicer API would be a much bigger
> change.

Indeed.

>> I hope that is and will be the only place that uses parse_options() with
>> a strvec -- and then we don't have to change that function.
>>
>> If this pattern is used more widely then we could package the copying
>> done by this patch somehow, e.g. by adding a strvec_parse_options()
>> that wraps the real thing.
>
> The reason I'm encouraging you to look at some of the other strvec leaks
> is to see if you can think of a pattern that fits these various leaky
> callers.
>
> So e.g. a strvec_parse_options() won't work for the ones noted above
> where we've lost track of it being a "struct strvec" in the first place.

Right.  It would work if we used a strvec in place of *every* argc/argv
pair, but that seems a but heavy a change.

> I have a local version somewhere where I did (pseudocode):
>
> 	struct strvec vec = STRVEC_INIT;
> 	struct string_list to_free = STRING_LIST_INIT_DUP;
>
>         // populate vec...
>         for (size_t i = 0; i < vec.nr; i++)
> 		string_list_append_nodup(&to_free, v[i]):
> 	// call parse_options(), or otherwise munge "vec"...
> 	free(strvec_detach(&vec));
>         string_list_clear(&to_free, 0);
>
> I.e. you snapshot the members of the "vec" before it's munged, and
> free() those via a "struct string_list".
>
> Then you don't strvec_clear(), but instead free() the container, its
> members being free'd by the string_list.

So the same as my patch, just with a string_list instead of a string
array and ownership juggling.

> Here's a (not yet in-tree) patch that uses that:
> https://lore.kernel.org/git/patch-10.10-81f138e460c-20221017T115544Z-avarab@gmail.com/
>
> I think that's more reliable & general than the pattern you're adding
> here.

How so?

> In your version the only reason we're not segfaulting is because the
> parse_options() consumes all the arguments, but that's not something you
> can generally rely on with parse_options().

Could you please elaborate?  What do you mean with "consume"?  Or could
you give an example of a segfault do to partial consumption?

I see one caveat: We should keep a NULL sentinel at the end when we
hand the copy to arbitrary functions, as some of them don't even look
at argc.  Is that what you meant?

> Also, and maybe I'm missing something, but wouldn't this be functionally
> the same as your implementation:
>
> 	diff --git a/builtin/am.c b/builtin/am.c
> 	index 30c9b3a9cd7..e65262cbb21 100644
> 	--- a/builtin/am.c
> 	+++ b/builtin/am.c
> 	@@ -1476,11 +1476,13 @@ static int run_apply(const struct am_state *state, const char *index_file)
> 	 	int res, opts_left;
> 	 	int force_apply = 0;
> 	 	int options = 0;
> 	+	char *to_free;
>
> 	 	if (init_apply_state(&apply_state, the_repository, NULL))
> 	 		BUG("init_apply_state() failed");
>
> 	 	strvec_push(&apply_opts, "apply");
> 	+	to_free = (char *)apply_opts.v[0];
> 	 	strvec_pushv(&apply_opts, state->git_apply_opts.v);
>
> 	 	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
> 	@@ -1489,6 +1491,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
>
> 	 	if (opts_left != 0)
> 	 		die("unknown option passed through to git apply");
> 	+	free(to_free);
>
> 	 	if (index_file) {
> 	 		apply_state.index_file = index_file;
>
> I.e. we leak the strvec_push() of the "apply" literal, but for the rest
> we append the "state->git_apply_opts.v", which we already free()
> elsewhere.

state->git_apply_opts is released elsewhere, but the copies of its elements
in apply_opts are leaked here.

> So actually, aren't we really just fumbling our way towards the "nodup"
> interface that the "struct string_list" has, and we should add the same
> to the "struct strvec".
>
> This seems to also fix all the leaks you fixed, but without any of the
> copying:
>
> 	diff --git a/builtin/am.c b/builtin/am.c
> 	index 30c9b3a9cd7..691ec1d152d 100644
> 	--- a/builtin/am.c
> 	+++ b/builtin/am.c
> 	@@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail)
> 	 static int run_apply(const struct am_state *state, const char *index_file)
> 	 {
> 	 	struct strvec apply_paths = STRVEC_INIT;
> 	-	struct strvec apply_opts = STRVEC_INIT;
> 	+	struct strvec apply_opts = STRVEC_INIT_NODUP;
> 	 	struct apply_state apply_state;
> 	 	int res, opts_left;
> 	 	int force_apply = 0;
> 	diff --git a/strvec.c b/strvec.c
> 	index 61a76ce6cb9..efdc47a5854 100644
> 	--- a/strvec.c
> 	+++ b/strvec.c
> 	@@ -22,7 +22,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value)
>
> 	 const char *strvec_push(struct strvec *array, const char *value)
> 	 {
> 	-	strvec_push_nodup(array, xstrdup(value));
> 	+	const char *to_push = array->nodup_strings ? value : xstrdup(value);
> 	+
> 	+	strvec_push_nodup(array, to_push);
> 	 	return array->v[array->nr - 1];
> 	 }
>
> 	@@ -31,6 +33,9 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...)
> 	 	va_list ap;
> 	 	struct strbuf v = STRBUF_INIT;
>
> 	+	if (array->nodup_strings)
> 	+		BUG("cannot strvec_pushf() on a 'nodup' strvec");
> 	+
> 	 	va_start(ap, fmt);
> 	 	strbuf_vaddf(&v, fmt, ap);
> 	 	va_end(ap);
> 	@@ -67,6 +72,9 @@ void strvec_pop(struct strvec *array)
>
> 	 void strvec_split(struct strvec *array, const char *to_split)
> 	 {
> 	+	if (array->nodup_strings)
> 	+		BUG("cannot strvec_split() on a 'nodup' strvec");
> 	+
> 	 	while (isspace(*to_split))
> 	 		to_split++;
> 	 	for (;;) {
> 	@@ -90,7 +98,7 @@ void strvec_clear(struct strvec *array)
> 	 	if (array->v != empty_strvec) {
> 	 		int i;
> 	 		for (i = 0; i < array->nr; i++)
> 	-			free((char *)array->v[i]);
> 	+			free(array->nodup_strings ? NULL : (char *)array->v[i]);
> 	 		free(array->v);
> 	 	}
> 	 	strvec_init(array);
> 	diff --git a/strvec.h b/strvec.h
> 	index 9f55c8766ba..ac6e2c04cea 100644
> 	--- a/strvec.h
> 	+++ b/strvec.h
> 	@@ -31,12 +31,18 @@ struct strvec {
> 	 	const char **v;
> 	 	size_t nr;
> 	 	size_t alloc;
> 	+	unsigned int nodup_strings:1;
> 	 };
>
> 	 #define STRVEC_INIT { \
> 	 	.v = empty_strvec, \
> 	 }
>
> 	+#define STRVEC_INIT_NODUP { \
> 	+	.v = empty_strvec, \
> 	+	.nodup_strings = 1, \
> 	+}
> 	+
> 	 /**
> 	  * Initialize an array. This is no different than assigning from
> 	  * `STRVEC_INIT`.

We can do it, but it's quite complicated overall.  strvec is with its
"duplicate everything" stance is easy to understand and reason about.
Adding a nodup variant will remove that feature.

> In any case, for this change (and leak fixes in general), could you
> please also mark the now-passing leak-free tests. This fails after your
> patch, but passes before.
>
> The failure is a "good" one, as they're now leak-free, but should be
> marked as such:
>
> 	GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true \
> 	make SANITIZE=leak test T="t4256-am-format-flowed.sh t0023-crlf-am.sh t4254-am-corrupt.sh t4257-am-interactive.sh t5403-post-checkout-hook.sh t4152-am-subjects.sh"
>

Well, it might be my laziness talking, but can we avoid that?  There are
lots of leaks present in tests and plugging the last one for a particular
test script holds no particular significance.  Wouldn't it be better to
update the annotations at the end of the release cycle, similar to
translations?  Or automate it via some kind of CI job?

In any case: Requiring to run the test suite with LeakSanitizer for leak
fixes would be a hurdle that I'd jump only reluctantly.  (LeakSanitizer
is not supported on my development system, an M1 Mac.)

>> If we have to change parse_options() at all then I'd prefer it to not
>> free() anything (to keep it usable with main()'s parameters),...
>
> I'm suggesting passing a flag to it to have it free() things it
> NULL's.
>
> Maybe that's a bad idea, but I don't see the incompatibility with
> main(), for those we just wouldn't pass the flag.
>
> It does have the inherent limitation that you couldn't mix strvec and
> non-strvec parameters, i.e. if some of your elements are dup'd but
> others aren't you'd need to arrange to have them all dup'd.
>
> But I don't think that's an issue in practice, either we pass the fully
> dup'd version, or main() parameters.

Yes, but your examples at the top show that sometimes the parse_options()
call is hidden by several layers of other functions.  We better have a
more general solution.

>> but to
>> reorder in a non-destructive way.  That would mean keeping the NULL
>> sentinel where it is, and making sure all callers use only the returned
>> argc to determine which arguments parse_options() didn't recognize.
>
> I think this would work if parse_options() wasn't clobbering those
> elements in some cases, and replacing them with new ones, but doesn't it
> do that e.g. in parse_options_step()?

You mean the "fake a short option thing", right?  That would have to be
addressed somehow.

> It also seems like a much more fragile change to try to ensure that
> nothing uses the "argv" again, but only looks at the updated
> "argc".
>
> Both that & adding a "const" to it would be a huge change across the
> codebase, so I'd like to avoid them, but at least the "const" method
> would be helped along by the compiler.

Agreed.

René

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
  2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
                           ` (5 preceding siblings ...)
  2022-12-17 12:45         ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
@ 2022-12-17 13:13         ` Jeff King
  2022-12-19  9:20           ` Ævar Arnfjörð Bjarmason
  6 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2022-12-17 13:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, René Scharfe, Junio C Hamano

On Thu, Dec 15, 2022 at 10:11:06AM +0100, Ævar Arnfjörð Bjarmason wrote:

> This is an alternative to René's [1], his already fixes a leak in "git
> am", and this could be done later, so I'm submitting it as RFC, but it
> could also replace it.
> 
> I think as this series shows extending the "strvec" API to get a
> feature that works like the existing "strdup_strings" that the "struct
> string_list" has can make memory management much simpler.

I know this is kind of a surface level review, but...please don't do
this. We have chased so many bugs over the years due to string-list's
"maybe this is allocated and maybe not", in both directions (accidental
leaks and double-frees).

One of the reasons I advocated for strvec in the first place is so that
it would have consistent memory management semantics, at the minor cost
of sometimes duplicating them when we don't need to.

And having a nodup form doesn't even save you from having to call
strvec_clear(); you still need to do so to avoid leaking the array
itself. It only helps in the weird parse-options case, where we don't
handle ownership of the array very well (the strvec owns it, but
parse-options wants to modify it).

> This does make the API slightly more dangerous to use, as it's no
> longer guaranteed that it owns all the members it points to. But as
> the "struct string_list" has shown this isn't an issue in practice,
> and e.g. SANITIZE=address et al are good about finding double-frees,
> or frees of fixed strings.

I would disagree that this hasn't been an issue in practice. A few
recent examples:

  - 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
    2022-11-17)
  - 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
    strings, 2022-09-08)
  - 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)

Now you could argue that those leaks might still exist if we only had a
duplicating version of string-list (after all, the problem in a leak is
an extra duplication). But IMHO it is the ambiguity and the games we
play with setting/unsetting the strdup_strings field that lead to these
errors.

And yes, leak-checking and sanitizers can sometimes find these bugs. But
that implies triggering the bug in the test suite. And it implies extra
time to track and fix them. An interface which is harder to get wrong in
the first place is preferable.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-13 18:31   ` René Scharfe
  2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
@ 2022-12-17 13:24     ` Jeff King
  2022-12-17 16:07       ` René Scharfe
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2022-12-17 13:24 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano

On Tue, Dec 13, 2022 at 07:31:13PM +0100, René Scharfe wrote:

> > I think less of a hack is to teach the eventual parse_options() that
> > when it munges it it should free() it. I did that for the revisions API
> > in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
> > need free()-ing, 2022-08-02).
> >
> > What do you think?
> 
> Generating string lists and then parsing them is weird.  When calls have
> to cross a process boundary then we have no choice, but in-process we
> shouldn't have to lower our request to an intermediate text format.  git
> am does it anyway because it writes its options to a file and reads them
> back when it resumes with --continue, IIUC.

The argument has been made[1] in the past that the public API for the
revision machinery is not poking bits in the rev_info struct yourself,
but passing the appropriate options to setup_revisions().

And I can see the point; if option "--foo" requires twiddling both
revs.foo and revs.bar, then we have no way to enforce that callers have
remembered to do both. But if the only code which handles this is the
parser for "--foo", then we're OK.

In a non-C language, we'd probably mark "foo" and "bar" as private and
provide a method to enable the option. We could provide a function, but
we'd have no compiler help to ensure that it was used over fiddling the
bits. Possibly calling them "foo_private" would be sufficient to make
people think twice about tweaking them, though.

[1] Apologies for the passive voice; I didn't want to muddle the
    paragraph by discussing who and when. But I know this is an argument
    Junio has made; I don't know if he still stands by it these days
    (there is quite a lot of field-twiddling of rev_info by now). I'm
    not sure to what degree I agree with it myself, but I thought it was
    worth mentioning here.

> If we have to change parse_options() at all then I'd prefer it to not
> free() anything (to keep it usable with main()'s parameters), but to
> reorder in a non-destructive way.  That would mean keeping the NULL
> sentinel where it is, and making sure all callers use only the returned
> argc to determine which arguments parse_options() didn't recognize.

My findings with -Wunused-parameter show that there are quite a lot of
places that take argv/argc but assume the NULL-termination of the argv.

If we are just re-ordering argv, though, it feels like this could still
work with a strvec. Right now a strvec with "nr" items will free items 0
through nr-1, assuming that v[nr] is its NULL invariant. But it could
free v[nr], too, in case the NULL was swapped into an earlier position.

It's a little weird already, because that swap has violated the
invariant, so trying to strvec_push() onto it would cause confusing
results. But if the general use case is to pass the strvec to
parse_options(), get reordered, and then clear() it, it should work. If
you wanted to get really fancy, push() et al could double-check the
invariant and BUG().

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-17 13:24     ` Jeff King
@ 2022-12-17 16:07       ` René Scharfe
  2022-12-17 21:53         ` Jeff King
  2022-12-20  1:29         ` Junio C Hamano
  0 siblings, 2 replies; 24+ messages in thread
From: René Scharfe @ 2022-12-17 16:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano

Am 17.12.22 um 14:24 schrieb Jeff King:
> On Tue, Dec 13, 2022 at 07:31:13PM +0100, René Scharfe wrote:
>
>>> I think less of a hack is to teach the eventual parse_options() that
>>> when it munges it it should free() it. I did that for the revisions API
>>> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that
>>> need free()-ing, 2022-08-02).
>>>
>>> What do you think?
>>
>> Generating string lists and then parsing them is weird.  When calls have
>> to cross a process boundary then we have no choice, but in-process we
>> shouldn't have to lower our request to an intermediate text format.  git
>> am does it anyway because it writes its options to a file and reads them
>> back when it resumes with --continue, IIUC.
>
> The argument has been made[1] in the past that the public API for the
> revision machinery is not poking bits in the rev_info struct yourself,
> but passing the appropriate options to setup_revisions().

My point was that we are stuck with the way it's done in git am
specifically because of its on-disk parameter store.  It forces us to
be able to handle a parameter list that we allocate ourselves instead
of being given one by the OS like argc/argv in main().  And this
justifies a bespoke solution instead of changing parse_options().

In the meantime Ævar showed enough examples to convince me that a more
general solution is indeed needed, but I still think we should keep
parse_options() unchanged and add something like strvec_clone_nodup()
instead.

Regarding using textual interfaces internally in general: It's the
easiest method, as we have to provide it for users anyway.  It's not
simple, though -- generating strings and managing their ownership adds
overhead like this patch, which we wouldn't have in an struct
interface.

> And I can see the point; if option "--foo" requires twiddling both
> revs.foo and revs.bar, then we have no way to enforce that callers have
> remembered to do both. But if the only code which handles this is the
> parser for "--foo", then we're OK.

Depends on the specifics.  Perhaps .bar is redundant and could be
inferred from .foo.  Or they could be replaced by an enum.

> In a non-C language, we'd probably mark "foo" and "bar" as private and
> provide a method to enable the option. We could provide a function, but
> we'd have no compiler help to ensure that it was used over fiddling the
> bits. Possibly calling them "foo_private" would be sufficient to make
> people think twice about tweaking them, though.

Or normalize the struct to avoid dependencies between fields.

> [1] Apologies for the passive voice; I didn't want to muddle the
>     paragraph by discussing who and when. But I know this is an argument
>     Junio has made; I don't know if he still stands by it these days
>     (there is quite a lot of field-twiddling of rev_info by now). I'm
>     not sure to what degree I agree with it myself, but I thought it was
>     worth mentioning here.
>
>> If we have to change parse_options() at all then I'd prefer it to not
>> free() anything (to keep it usable with main()'s parameters), but to
>> reorder in a non-destructive way.  That would mean keeping the NULL
>> sentinel where it is, and making sure all callers use only the returned
>> argc to determine which arguments parse_options() didn't recognize.
>
> My findings with -Wunused-parameter show that there are quite a lot of
> places that take argv/argc but assume the NULL-termination of the argv.

Right.

> If we are just re-ordering argv, though, it feels like this could still
> work with a strvec. Right now a strvec with "nr" items will free items 0
> through nr-1, assuming that v[nr] is its NULL invariant. But it could
> free v[nr], too, in case the NULL was swapped into an earlier position.
>
> It's a little weird already, because that swap has violated the
> invariant, so trying to strvec_push() onto it would cause confusing
> results. But if the general use case is to pass the strvec to
> parse_options(), get reordered, and then clear() it, it should work. If
> you wanted to get really fancy, push() et al could double-check the
> invariant and BUG().

Yes, parse_options() and strvec are not fitting perfectly.  Changing the
former to reorder the elements and keeping them all would require
checking that all callers use the return value.  Feels like a lot of work.

A variant that takes a strvec and removes and frees parsed items from it
would be clean, but would require refactoring indirect callers like
apply_parse_options() users.  Busywork.

Making a shallow copy to give to parse_options() in callers that currently
pass a strvec directly or indirectly seems like the simplest solution to
me for now.

René

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-17 16:07       ` René Scharfe
@ 2022-12-17 21:53         ` Jeff King
  2022-12-18  2:42           ` Junio C Hamano
  2022-12-20  1:29         ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2022-12-17 21:53 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano

On Sat, Dec 17, 2022 at 05:07:12PM +0100, René Scharfe wrote:

> > If we are just re-ordering argv, though, it feels like this could still
> > work with a strvec. Right now a strvec with "nr" items will free items 0
> > through nr-1, assuming that v[nr] is its NULL invariant. But it could
> > free v[nr], too, in case the NULL was swapped into an earlier position.
> >
> > It's a little weird already, because that swap has violated the
> > invariant, so trying to strvec_push() onto it would cause confusing
> > results. But if the general use case is to pass the strvec to
> > parse_options(), get reordered, and then clear() it, it should work. If
> > you wanted to get really fancy, push() et al could double-check the
> > invariant and BUG().
> 
> Yes, parse_options() and strvec are not fitting perfectly.  Changing the
> former to reorder the elements and keeping them all would require
> checking that all callers use the return value.  Feels like a lot of work.

I think we're already munging the strvec arrays in the option parser,
though. I'm just suggesting that parse_options() swap arguments to the
end instead of overwriting a NULL (actually, I'm not even sure it
doesn't do that already), and strvec_clear() checking the final element.
The end state is not necessarily safe, but it's no worse than what we
have today.

That said...

> A variant that takes a strvec and removes and frees parsed items from it
> would be clean, but would require refactoring indirect callers like
> apply_parse_options() users.  Busywork.
> 
> Making a shallow copy to give to parse_options() in callers that currently
> pass a strvec directly or indirectly seems like the simplest solution to
> me for now.

Yes, I thought your original patch actually got to the root of the
problem. strvec owns the array and its elements, and parse-options wants
to munge the array itself (but not the elements). Making a shallow copy
is eliminates the conflict over ownership.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-17 21:53         ` Jeff King
@ 2022-12-18  2:42           ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2022-12-18  2:42 UTC (permalink / raw)
  To: Jeff King
  Cc: René Scharfe, Ævar Arnfjörð Bjarmason,
	Git List

Jeff King <peff@peff.net> writes:

>> Making a shallow copy to give to parse_options() in callers that currently
>> pass a strvec directly or indirectly seems like the simplest solution to
>> me for now.
>
> Yes, I thought your original patch actually got to the root of the
> problem. strvec owns the array and its elements, and parse-options wants
> to munge the array itself (but not the elements). Making a shallow copy
> is eliminates the conflict over ownership.

Yup, it did look very sensible to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
  2022-12-17 13:13         ` Jeff King
@ 2022-12-19  9:20           ` Ævar Arnfjörð Bjarmason
  2023-01-07 13:21             ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-19  9:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Junio C Hamano


On Sat, Dec 17 2022, Jeff King wrote:

> On Thu, Dec 15, 2022 at 10:11:06AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This is an alternative to René's [1], his already fixes a leak in "git
>> am", and this could be done later, so I'm submitting it as RFC, but it
>> could also replace it.
>> 
>> I think as this series shows extending the "strvec" API to get a
>> feature that works like the existing "strdup_strings" that the "struct
>> string_list" has can make memory management much simpler.
>
> I know this is kind of a surface level review, but...please don't do
> this. We have chased so many bugs over the years due to string-list's
> "maybe this is allocated and maybe not", in both directions (accidental
> leaks and double-frees).
>
> One of the reasons I advocated for strvec in the first place is so that
> it would have consistent memory management semantics, at the minor cost
> of sometimes duplicating them when we don't need to.
>
> And having a nodup form doesn't even save you from having to call
> strvec_clear(); you still need to do so to avoid leaking the array
> itself. It only helps in the weird parse-options case, where we don't
> handle ownership of the array very well (the strvec owns it, but
> parse-options wants to modify it).

Yes, just like "struct string_list" in the "nodup" mode.

I hear you, but I also think you're implicitly conflating two things
here.

There's the question of whether we should in general optimize for safety
over more optimila memory use. I.e. if we simply have every strvec,
string_list etc. own its memory fully we don't need to think as much
about allocation or ownership.

I think we should do that in general, but we also have cases where we'd
like to not do that, e.g. where we're adding thousands of strings to a
string_list, which are all borrewed from elsewhere, except for a few
we'd like to xstrdup().

Such API use *is* tricky, but I think formalizing it as the
"string_list" does is making it better, not worse. In particular...:

>> This does make the API slightly more dangerous to use, as it's no
>> longer guaranteed that it owns all the members it points to. But as
>> the "struct string_list" has shown this isn't an issue in practice,
>> and e.g. SANITIZE=address et al are good about finding double-frees,
>> or frees of fixed strings.
>
> I would disagree that this hasn't been an issue in practice. A few
> recent examples:
>
>   - 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
>     2022-11-17)
>   - 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
>     strings, 2022-09-08)
>   - 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)

...it's funny that those are the examples I would have dug up to argue
that this is a good idea, and to add some:

	- 4a0479086a9 (commit-graph: fix memory leak in misused
          string_list API, 2022-03-04)
	- b202e51b154 (grep: fix a "path_list" memory leak, 2021-10-22)

I.e. above you note "in both directions [...] leaks [...] and double
frees", but these (and the ones I added) are all in the second category.

Which is why I don't think it's an issue in practice. The leaks have
been a non-issue, and to the extent that we care the SANITIZE=leak
testing is closing that gap.

The dangerous issue is the double-free, but (and over the years I have
looked at pretty much every caller) I can't imagine a string_list
use-case where we:

 a) Actually still want to keep that memory optimization, i.e. it
    wouldn't be better by just saying "screw it, let's dup it".
 b) Given "a", we'd be better off with some bespoke custom pattern over
    the facility to do this with the "string_list".

So I really think we're in agreement about 99% of this, I just don't see
how *if* we want to do this why we're better of re-inventing this
particular wheel for every caller.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] am: don't pass strvec to apply_parse_options()
  2022-12-17 16:07       ` René Scharfe
  2022-12-17 21:53         ` Jeff King
@ 2022-12-20  1:29         ` Junio C Hamano
  1 sibling, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2022-12-20  1:29 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Git List

René Scharfe <l.s.r@web.de> writes:

> Depends on the specifics.  Perhaps .bar is redundant and could be
> inferred from .foo.  Or they could be replaced by an enum.
> ...
> Or normalize the struct to avoid dependencies between fields.

I think the example Peff brought up was mostly from the revisions
API, where there are two vastly different traversal behaviour
(i.e. limited and !limited).  With .limited bit set, we first
fully enumerate the potential commits, before showing a single
entry in the output, and without, we can stream the output.
In general, we prefer !limited traversal because that would give us
better interactive latency, but some options only can work with
the limited bit set (e.g. --ancestry-path, --cherry).

We _could_ probably have do without the .limited bit and instead
wrote each and every check we currently make to the revs->limited
bit as something like 

#define is_limited(revs) (revs->cherry_mark || \
			  revs->ancestry_path || \
			  ...)

so that the "struct members" API users do not have to know that they
have to set the .limit bit when they set .cherry_mark bit.  There
are many others even in the revisions API alone.

IOW, yes, this depends on the specifics, and "normalize" is easier
said than done, unfortunately.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
  2022-12-19  9:20           ` Ævar Arnfjörð Bjarmason
@ 2023-01-07 13:21             ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2023-01-07 13:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, René Scharfe, Junio C Hamano

On Mon, Dec 19, 2022 at 10:20:00AM +0100, Ævar Arnfjörð Bjarmason wrote:

> I hear you, but I also think you're implicitly conflating two things
> here.
> 
> There's the question of whether we should in general optimize for safety
> over more optimila memory use. I.e. if we simply have every strvec,
> string_list etc. own its memory fully we don't need to think as much
> about allocation or ownership.
> 
> I think we should do that in general, but we also have cases where we'd
> like to not do that, e.g. where we're adding thousands of strings to a
> string_list, which are all borrewed from elsewhere, except for a few
> we'd like to xstrdup().
> 
> Such API use *is* tricky, but I think formalizing it as the
> "string_list" does is making it better, not worse. In particular...:

I think the two things are related, though, because "safety" is "people
not mis-using the API". Once you give the API the option to do the
trickier thing, people will do it, and they will do it badly. That has
been my experience with the string-list API (especially that people try
to do clever things with transferring ownership by using a
non-duplicating list and then setting strdup_strings midway through the
function).

> > I would disagree that this hasn't been an issue in practice. A few
> > recent examples:
> >
> >   - 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
> >     2022-11-17)
> >   - 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
> >     strings, 2022-09-08)
> >   - 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)
> 
> ...it's funny that those are the examples I would have dug up to argue
> that this is a good idea, and to add some:
> 
> 	- 4a0479086a9 (commit-graph: fix memory leak in misused
>           string_list API, 2022-03-04)
> 	- b202e51b154 (grep: fix a "path_list" memory leak, 2021-10-22)
> 
> I.e. above you note "in both directions [...] leaks [...] and double
> frees", but these (and the ones I added) are all in the second category.

I almost didn't give a list, because it's hard to dig into all of the
details. For example the second one in my list, which I worked on, did
fix things by consistently setting strdup_strings, and it was "just" a
leak fix. But it took me a full day to untangle the mess of that code,
and there were intermediate states were we did access dangling pointers
before I got to a working order of the series.  And all of it was
complicated by the fact that string_list is configurable, so different
bits of the code expected different things. Even after that commit,
there's a horrible hack to set the strdup field and hope it's enough.

If that were the only time I'd wrestled with string_list, I'd be less
concerned. But (subjectively) this feels like the latest in a long line
of bugs and irritations.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-01-07 13:21 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-13  6:47 [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-13  8:37 ` Ævar Arnfjörð Bjarmason
2022-12-13 18:31   ` René Scharfe
2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-17 12:45         ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
2022-12-17 13:13         ` Jeff King
2022-12-19  9:20           ` Ævar Arnfjörð Bjarmason
2023-01-07 13:21             ` Jeff King
2022-12-17 12:46       ` [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-17 13:24     ` Jeff King
2022-12-17 16:07       ` René Scharfe
2022-12-17 21:53         ` Jeff King
2022-12-18  2:42           ` Junio C Hamano
2022-12-20  1:29         ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).