git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
@ 2017-12-07 20:22 René Scharfe
  2017-12-07 21:27 ` Jeff King
  2017-12-07 21:47 ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: René Scharfe @ 2017-12-07 20:22 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

Use string_list_append_nodup() instead of string_list_append() to hand
over ownership of a detached strbuf and thus avoid leaking its memory.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 builtin/fmt-merge-msg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 22034f87e7..8e8a15ea4a 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -377,7 +377,8 @@ static void shortlog(const char *name,
 			string_list_append(&subjects,
 					   oid_to_hex(&commit->object.oid));
 		else
-			string_list_append(&subjects, strbuf_detach(&sb, NULL));
+			string_list_append_nodup(&subjects,
+						 strbuf_detach(&sb, NULL));
 	}
 
 	if (opts->credit_people)
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-07 20:22 [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() René Scharfe
@ 2017-12-07 21:27 ` Jeff King
  2017-12-08 17:29   ` René Scharfe
  2017-12-07 21:47 ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff King @ 2017-12-07 21:27 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano

On Thu, Dec 07, 2017 at 09:22:49PM +0100, René Scharfe wrote:

> Use string_list_append_nodup() instead of string_list_append() to hand
> over ownership of a detached strbuf and thus avoid leaking its memory.

Looks obviously correct (though one thing missing from the diff context
is whether "subjects" is set to DUP -- it is, which is good).

Grepping for "list_append.*detach" shows a few other possible cases in
transport-helper.c, which I think are leaks.

I wondered if it would be possible to write a coccinelle rule for this,
but I think it's not possible. Whether this is right depends on the
strdup_strings flag, which could change at runtime (though in practice
it doesn't).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-07 20:22 [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() René Scharfe
  2017-12-07 21:27 ` Jeff King
@ 2017-12-07 21:47 ` Junio C Hamano
  2017-12-08 10:14   ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2017-12-07 21:47 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List

René Scharfe <l.s.r@web.de> writes:

> Use string_list_append_nodup() instead of string_list_append() to hand
> over ownership of a detached strbuf and thus avoid leaking its memory.
>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
>  builtin/fmt-merge-msg.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
> index 22034f87e7..8e8a15ea4a 100644
> --- a/builtin/fmt-merge-msg.c
> +++ b/builtin/fmt-merge-msg.c
> @@ -377,7 +377,8 @@ static void shortlog(const char *name,
>  			string_list_append(&subjects,
>  					   oid_to_hex(&commit->object.oid));
>  		else
> -			string_list_append(&subjects, strbuf_detach(&sb, NULL));
> +			string_list_append_nodup(&subjects,
> +						 strbuf_detach(&sb, NULL));
>  	}
>  
>  	if (opts->credit_people)

What is leaked comes from strbuf, so the title is not a lie, but I
tend to think that this leak is caused by a somewhat strange
string_list API.  The subjects string-list is initialized as a "dup"
kind, but a caller that wants to avoid leaking can (and should) use
_nodup() call to add a string without duping.  It all feels a bit
too convoluted.

The patch looks good.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-07 21:47 ` Junio C Hamano
@ 2017-12-08 10:14   ` Jeff King
  2017-12-08 17:29     ` René Scharfe
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2017-12-08 10:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Git List

On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote:

> > diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
> > index 22034f87e7..8e8a15ea4a 100644
> > --- a/builtin/fmt-merge-msg.c
> > +++ b/builtin/fmt-merge-msg.c
> > @@ -377,7 +377,8 @@ static void shortlog(const char *name,
> >  			string_list_append(&subjects,
> >  					   oid_to_hex(&commit->object.oid));
> >  		else
> > -			string_list_append(&subjects, strbuf_detach(&sb, NULL));
> > +			string_list_append_nodup(&subjects,
> > +						 strbuf_detach(&sb, NULL));
> >  	}
> >  
> >  	if (opts->credit_people)
> 
> What is leaked comes from strbuf, so the title is not a lie, but I
> tend to think that this leak is caused by a somewhat strange
> string_list API.  The subjects string-list is initialized as a "dup"
> kind, but a caller that wants to avoid leaking can (and should) use
> _nodup() call to add a string without duping.  It all feels a bit
> too convoluted.

I'm not sure it's string-list's fault. Many callers (including this one)
have _some_ entries whose strings must be duplicated and others which do
not.

So either:

  1. The list gets marked as "nodup", and we add an extra xstrdup() to the
     oid_to_hex call above. And also need to remember to free() the
     strings later, since the list does not own them.

or

  2. We mark it as "dup" and incur an extra allocation and copy, like:

       string_list_append(&subjects, sb.buf);
       strbuf_release(&buf);

So I'd really blame the caller, which doesn't want to do (2) out of a
sense of optimization. It could also perhaps write it as:

  while (commit = get_revision(rev)) {
	strbuf_reset(&sb);
	... maybe put some stuff in sb ...
	if (!sb.len)
		string_list_append(&subjects, oid_to_hex(obj));
	else
		string_list_append(&subjects, sb.buf);
  }
  strbuf_release(&sb);

which at least avoids the extra allocations.

By the way, I think there's another quite subtle leak in this function.
We do this:

  format_commit_message(commit, "%s", &sb, &ctx);
  strbuf_ltrim(&sb);

and then only use "sb" if sb.len is non-zero. But we may have actually
allocated to create our zero-length string (e.g., if we had a strbuf
full of spaces and trimmed them all off). Since we reuse "sb" over and
over as we loop, this will actually only leak once for the whole loop,
not once per iteration. So it's probably not a big deal, but writing it
with the explicit reset/release pattern fixes that (and is more
idiomatic for our code base, I think).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-07 21:27 ` Jeff King
@ 2017-12-08 17:29   ` René Scharfe
  2017-12-08 18:44     ` Junio C Hamano
  2017-12-08 21:11     ` Jeff King
  0 siblings, 2 replies; 16+ messages in thread
From: René Scharfe @ 2017-12-08 17:29 UTC (permalink / raw)
  To: Jeff King, Git List; +Cc: Junio C Hamano

Am 07.12.2017 um 22:27 schrieb Jeff King:
> Grepping for "list_append.*detach" shows a few other possible cases in
> transport-helper.c, which I think are leaks.

-- >8 --
Subject: [PATCH] transport-helper: plug strbuf and string_list leaks

Transfer ownership of detached strbufs to string_lists of the
duplicating variety by calling string_list_append_nodup() instead of
string_list_append() to avoid duplicating and then leaking the buffer.

While at it make sure to release the string_list when done;
push_refs_with_export() already does that.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 transport-helper.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index bf05a2dcf1..f682e7c534 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport,
 			struct strbuf cas = STRBUF_INIT;
 			strbuf_addf(&cas, "%s:%s",
 				    ref->name, oid_to_hex(&ref->old_oid_expect));
-			string_list_append(&cas_options, strbuf_detach(&cas, NULL));
+			string_list_append_nodup(&cas_options,
+						 strbuf_detach(&cas, NULL));
 		}
 	}
 	if (buf.len == 0) {
@@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport,
 	strbuf_addch(&buf, '\n');
 	sendline(data, &buf);
 	strbuf_release(&buf);
+	string_list_release(&cas_options, 0);
 
 	return push_update_refs_status(data, remote_refs, flags);
 }
@@ -930,7 +932,8 @@ static int push_refs_with_export(struct transport *transport,
 		private = apply_refspecs(data->refspecs, data->refspec_nr, ref->name);
 		if (private && !get_oid(private, &oid)) {
 			strbuf_addf(&buf, "^%s", private);
-			string_list_append(&revlist_args, strbuf_detach(&buf, NULL));
+			string_list_append_nodup(&revlist_args,
+						 strbuf_detach(&buf, NULL));
 			oidcpy(&ref->old_oid, &oid);
 		}
 		free(private);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 10:14   ` Jeff King
@ 2017-12-08 17:29     ` René Scharfe
  2017-12-08 18:37       ` Junio C Hamano
  2017-12-08 21:17       ` Jeff King
  0 siblings, 2 replies; 16+ messages in thread
From: René Scharfe @ 2017-12-08 17:29 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Git List

Am 08.12.2017 um 11:14 schrieb Jeff King:
> On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote:
> 
>>> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
>>> index 22034f87e7..8e8a15ea4a 100644
>>> --- a/builtin/fmt-merge-msg.c
>>> +++ b/builtin/fmt-merge-msg.c
>>> @@ -377,7 +377,8 @@ static void shortlog(const char *name,
>>>   			string_list_append(&subjects,
>>>   					   oid_to_hex(&commit->object.oid));
>>>   		else
>>> -			string_list_append(&subjects, strbuf_detach(&sb, NULL));
>>> +			string_list_append_nodup(&subjects,
>>> +						 strbuf_detach(&sb, NULL));
>>>   	}
>>>   
>>>   	if (opts->credit_people)
>>
>> What is leaked comes from strbuf, so the title is not a lie, but I
>> tend to think that this leak is caused by a somewhat strange
>> string_list API.  The subjects string-list is initialized as a "dup"
>> kind, but a caller that wants to avoid leaking can (and should) use
>> _nodup() call to add a string without duping.  It all feels a bit
>> too convoluted.
> 
> I'm not sure it's string-list's fault. Many callers (including this one)
> have _some_ entries whose strings must be duplicated and others which do
> not.
> 
> So either:
> 
>    1. The list gets marked as "nodup", and we add an extra xstrdup() to the
>       oid_to_hex call above. And also need to remember to free() the
>       strings later, since the list does not own them.
> 
> or
> 
>    2. We mark it as "dup" and incur an extra allocation and copy, like:
> 
>         string_list_append(&subjects, sb.buf);
>         strbuf_release(&buf);

The two modes (dup/nodup) make string_list code tricky.  Not sure
how far we'd get with something simpler (e.g. an array of char pointers),
but having the caller do all string allocations would make the code
easier to analyze.

> So I'd really blame the caller, which doesn't want to do (2) out of a
> sense of optimization. It could also perhaps write it as:
> 
>    while (commit = get_revision(rev)) {
> 	strbuf_reset(&sb);
> 	... maybe put some stuff in sb ...
> 	if (!sb.len)
> 		string_list_append(&subjects, oid_to_hex(obj));
> 	else
> 		string_list_append(&subjects, sb.buf);
>    }
>    strbuf_release(&sb);
> 
> which at least avoids the extra allocations.

Right, we'd just have extra string copies in that case.

> By the way, I think there's another quite subtle leak in this function.
> We do this:
> 
>    format_commit_message(commit, "%s", &sb, &ctx);
>    strbuf_ltrim(&sb);
> 
> and then only use "sb" if sb.len is non-zero. But we may have actually
> allocated to create our zero-length string (e.g., if we had a strbuf
> full of spaces and trimmed them all off). Since we reuse "sb" over and
> over as we loop, this will actually only leak once for the whole loop,
> not once per iteration. So it's probably not a big deal, but writing it
> with the explicit reset/release pattern fixes that (and is more
> idiomatic for our code base, I think).

It's subtle, but I think it's not leaking, at least not in your example
case (and I can't think of another way).  IIUC format_subject(), which
handles the "%s" part, doesn't touch sb if the subject is made up only
of whitespace.

René

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 17:29     ` René Scharfe
@ 2017-12-08 18:37       ` Junio C Hamano
  2017-12-08 21:28         ` Jeff King
  2017-12-08 21:17       ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2017-12-08 18:37 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Git List

René Scharfe <l.s.r@web.de> writes:

>> I'm not sure it's string-list's fault. Many callers (including this one)
>> ...
> The two modes (dup/nodup) make string_list code tricky.  Not sure
> how far we'd get with something simpler (e.g. an array of char pointers),
> but having the caller do all string allocations would make the code
> easier to analyze.

Yes.

It probably would have been more sensible if the API did not have
two modes (instead, have the caller pass whatever string to be
stored, *and* make the caller responsible for freeing them *if* it
passed an allocated string).

For the push_refs_with_push() patch you sent, another possible fix
would be to make cas_options a nodup kind so that the result of
strbuf_detach() does not get an extra strdup to be lost when placed
in cas_options.  With the current string-list API that would not
quite work, because freeing done in _release() is tied to the
"dup/nodup" ness of the string list.  I think there even is a
codepath that initializes a string_list as nodup kind, stuffs string
in it giving the ownership, and then flips it into dup kind just
before calling _release() only to have it free the strings, or
something silly/ugly like that.

In any case, the patch looks sensible.  Thanks for plugging the
leaks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 17:29   ` René Scharfe
@ 2017-12-08 18:44     ` Junio C Hamano
  2017-12-08 20:10       ` René Scharfe
  2017-12-08 21:11     ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2017-12-08 18:44 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Git List

René Scharfe <l.s.r@web.de> writes:

> Am 07.12.2017 um 22:27 schrieb Jeff King:
>> Grepping for "list_append.*detach" shows a few other possible cases in
>> transport-helper.c, which I think are leaks.
>
> -- >8 --
> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
>
> Transfer ownership of detached strbufs to string_lists of the
> duplicating variety by calling string_list_append_nodup() instead of
> string_list_append() to avoid duplicating and then leaking the buffer.
>
> While at it make sure to release the string_list when done;
> push_refs_with_export() already does that.
>
> Reported-by: Jeff King <peff@peff.net>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
>  transport-helper.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/transport-helper.c b/transport-helper.c
> index bf05a2dcf1..f682e7c534 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport,
>  			struct strbuf cas = STRBUF_INIT;
>  			strbuf_addf(&cas, "%s:%s",
>  				    ref->name, oid_to_hex(&ref->old_oid_expect));
> -			string_list_append(&cas_options, strbuf_detach(&cas, NULL));
> +			string_list_append_nodup(&cas_options,
> +						 strbuf_detach(&cas, NULL));
>  		}
>  	}
>  	if (buf.len == 0) {
> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport,
>  	strbuf_addch(&buf, '\n');
>  	sendline(data, &buf);
>  	strbuf_release(&buf);
> +	string_list_release(&cas_options, 0);

There is no such function; you meant _clear() perhaps?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 18:44     ` Junio C Hamano
@ 2017-12-08 20:10       ` René Scharfe
  0 siblings, 0 replies; 16+ messages in thread
From: René Scharfe @ 2017-12-08 20:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git List

Am 08.12.2017 um 19:44 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
> 
>> Am 07.12.2017 um 22:27 schrieb Jeff King:
>>> Grepping for "list_append.*detach" shows a few other possible cases in
>>> transport-helper.c, which I think are leaks.
>>
>> -- >8 --
>> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
>>
>> Transfer ownership of detached strbufs to string_lists of the
>> duplicating variety by calling string_list_append_nodup() instead of
>> string_list_append() to avoid duplicating and then leaking the buffer.
>>
>> While at it make sure to release the string_list when done;
>> push_refs_with_export() already does that.
>>
>> Reported-by: Jeff King <peff@peff.net>
>> Signed-off-by: Rene Scharfe <l.s.r@web.de>
>> ---
>>   transport-helper.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/transport-helper.c b/transport-helper.c
>> index bf05a2dcf1..f682e7c534 100644
>> --- a/transport-helper.c
>> +++ b/transport-helper.c
>> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport,
>>   			struct strbuf cas = STRBUF_INIT;
>>   			strbuf_addf(&cas, "%s:%s",
>>   				    ref->name, oid_to_hex(&ref->old_oid_expect));
>> -			string_list_append(&cas_options, strbuf_detach(&cas, NULL));
>> +			string_list_append_nodup(&cas_options,
>> +						 strbuf_detach(&cas, NULL));
>>   		}
>>   	}
>>   	if (buf.len == 0) {
>> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport,
>>   	strbuf_addch(&buf, '\n');
>>   	sendline(data, &buf);
>>   	strbuf_release(&buf);
>> +	string_list_release(&cas_options, 0);
> 
> There is no such function; you meant _clear() perhaps?

Yes, of course, I'm sorry.  Not sure what happened there. O_o

René

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 17:29   ` René Scharfe
  2017-12-08 18:44     ` Junio C Hamano
@ 2017-12-08 21:11     ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2017-12-08 21:11 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano

On Fri, Dec 08, 2017 at 06:29:31PM +0100, René Scharfe wrote:

> Am 07.12.2017 um 22:27 schrieb Jeff King:
> > Grepping for "list_append.*detach" shows a few other possible cases in
> > transport-helper.c, which I think are leaks.
> 
> -- >8 --
> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
> 
> Transfer ownership of detached strbufs to string_lists of the
> duplicating variety by calling string_list_append_nodup() instead of
> string_list_append() to avoid duplicating and then leaking the buffer.

Thanks, this part looks obviously correct.

> While at it make sure to release the string_list when done;
> push_refs_with_export() already does that.

This one takes a bit more digging. I've been bitten before in Git's code
by freeing what appeared to be a leak only to find out that we had
passed the pointers off to some other data structure which expected them
to persist.

Here we feed them to set_helper_option(), which passes them to
quote_c_style(), which makes a copy into a strbuf. So I think all is
well.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 17:29     ` René Scharfe
  2017-12-08 18:37       ` Junio C Hamano
@ 2017-12-08 21:17       ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2017-12-08 21:17 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git List

On Fri, Dec 08, 2017 at 06:29:34PM +0100, René Scharfe wrote:

> > By the way, I think there's another quite subtle leak in this function.
> > We do this:
> > 
> >    format_commit_message(commit, "%s", &sb, &ctx);
> >    strbuf_ltrim(&sb);
> > 
> > and then only use "sb" if sb.len is non-zero. But we may have actually
> > allocated to create our zero-length string (e.g., if we had a strbuf
> > full of spaces and trimmed them all off). Since we reuse "sb" over and
> > over as we loop, this will actually only leak once for the whole loop,
> > not once per iteration. So it's probably not a big deal, but writing it
> > with the explicit reset/release pattern fixes that (and is more
> > idiomatic for our code base, I think).
> 
> It's subtle, but I think it's not leaking, at least not in your example
> case (and I can't think of another way).  IIUC format_subject(), which
> handles the "%s" part, doesn't touch sb if the subject is made up only
> of whitespace.

Yeah, I suspected that may be the case. But IMHO it is a poor use of
strbufs if you have to dig that far to see whether the code leaks or
not. The whole point of strbufs is to make string handling and memory
ownership more obviously correct.

Just skimming the history, I think it's mostly an artifact of the
function was slowly converted over the years.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 18:37       ` Junio C Hamano
@ 2017-12-08 21:28         ` Jeff King
  2017-12-18 19:18           ` René Scharfe
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2017-12-08 21:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Git List

On Fri, Dec 08, 2017 at 10:37:08AM -0800, Junio C Hamano wrote:

> > The two modes (dup/nodup) make string_list code tricky.  Not sure
> > how far we'd get with something simpler (e.g. an array of char pointers),
> > but having the caller do all string allocations would make the code
> > easier to analyze.
> 
> Yes.
> 
> It probably would have been more sensible if the API did not have
> two modes (instead, have the caller pass whatever string to be
> stored, *and* make the caller responsible for freeing them *if* it
> passed an allocated string).

I'd actually argue the other way: the simplest interface is one where
the string list owns all of its pointers. That keeps the
ownership/lifetime issues clear, and it's one less step for the caller
to have to remember to do at the end (they do have to clear() the list,
but they must do that anyway to free the array of items).

It does mean that some callers may have to remember to free a temporary
buffer right after adding its contents to the list. But that's a lesser
evil, I think, since the memory ownership issues are all clearly
resolved at the time of add.

The big cost is just extra copies/allocations.

I dunno. I actually do not mind the "nodup" version of append being used
on a "dup" list. It is just a way of letting each call decide on whether
to hand over ownership, and I think most sites are pretty clear.

> For the push_refs_with_push() patch you sent, another possible fix
> would be to make cas_options a nodup kind so that the result of
> strbuf_detach() does not get an extra strdup to be lost when placed
> in cas_options.  With the current string-list API that would not
> quite work, because freeing done in _release() is tied to the
> "dup/nodup" ness of the string list.  I think there even is a
> codepath that initializes a string_list as nodup kind, stuffs string
> in it giving the ownership, and then flips it into dup kind just
> before calling _release() only to have it free the strings, or
> something silly/ugly like that.

Yes, the first grep hit for NODUP is bisect_clean_state(), which does
this. I think it would be more clear if we could do:

diff --git a/bisect.c b/bisect.c
index 0fca17c02b..7c59408a13 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1060,8 +1060,7 @@ static int mark_for_removal(const char *refname, const struct object_id *oid,
 			    int flag, void *cb_data)
 {
 	struct string_list *refs = cb_data;
-	char *ref = xstrfmt("refs/bisect%s", refname);
-	string_list_append(refs, ref);
+	string_list_appendf(refs, "refs/bisect%s", refname);
 	return 0;
 }
 
@@ -1070,11 +1069,10 @@ int bisect_clean_state(void)
 	int result = 0;
 
 	/* There may be some refs packed during bisection */
-	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
+	struct string_list refs_for_removal = STRING_LIST_INIT_DUP;
 	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
 	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
 	result = delete_refs("bisect: remove", &refs_for_removal, REF_NO_DEREF);
-	refs_for_removal.strdup_strings = 1;
 	string_list_clear(&refs_for_removal, 0);
 	unlink_or_warn(git_path_bisect_expected_rev());
 	unlink_or_warn(git_path_bisect_ancestors_ok());


Having a "format into a string" wrapper doesn't cover _every_ string you
might want to add to a list, but my experience with argv_array_pushf
leads me to believe that it covers quite a lot of cases.

I dunno. I am not so bothered by the current dual-nature that I think it
is worth going on a crusade to eliminate one side. But I'm OK if
somebody else wants to do so.

-Peff

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-08 21:28         ` Jeff King
@ 2017-12-18 19:18           ` René Scharfe
  2017-12-19 11:38             ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: René Scharfe @ 2017-12-18 19:18 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Git List

Am 08.12.2017 um 22:28 schrieb Jeff King:
> On Fri, Dec 08, 2017 at 10:37:08AM -0800, Junio C Hamano wrote:
> 
>>> The two modes (dup/nodup) make string_list code tricky.  Not sure
>>> how far we'd get with something simpler (e.g. an array of char pointers),
>>> but having the caller do all string allocations would make the code
>>> easier to analyze.
>>
>> Yes.
>>
>> It probably would have been more sensible if the API did not have
>> two modes (instead, have the caller pass whatever string to be
>> stored, *and* make the caller responsible for freeing them *if* it
>> passed an allocated string).
> 
> I'd actually argue the other way: the simplest interface is one where
> the string list owns all of its pointers. That keeps the
> ownership/lifetime issues clear, and it's one less step for the caller
> to have to remember to do at the end (they do have to clear() the list,
> but they must do that anyway to free the array of items).
> 
> It does mean that some callers may have to remember to free a temporary
> buffer right after adding its contents to the list. But that's a lesser
> evil, I think, since the memory ownership issues are all clearly
> resolved at the time of add.
> 
> The big cost is just extra copies/allocations.

An interface requiring callers to allocate can be used to implement a
wrapper that does all allocations for them -- the other way around is
harder.  It can be used to avoid object duplication, but duplicates
functions.  No idea if that's worth it.
 
>> For the push_refs_with_push() patch you sent, another possible fix
>> would be to make cas_options a nodup kind so that the result of
>> strbuf_detach() does not get an extra strdup to be lost when placed
>> in cas_options.  With the current string-list API that would not
>> quite work, because freeing done in _release() is tied to the
>> "dup/nodup" ness of the string list.  I think there even is a
>> codepath that initializes a string_list as nodup kind, stuffs string
>> in it giving the ownership, and then flips it into dup kind just
>> before calling _release() only to have it free the strings, or
>> something silly/ugly like that.
> 
> Yes, the first grep hit for NODUP is bisect_clean_state(), which does
> this. I think it would be more clear if we could do:
> 
> diff --git a/bisect.c b/bisect.c
> index 0fca17c02b..7c59408a13 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -1060,8 +1060,7 @@ static int mark_for_removal(const char *refname, const struct object_id *oid,
>   			    int flag, void *cb_data)
>   {
>   	struct string_list *refs = cb_data;
> -	char *ref = xstrfmt("refs/bisect%s", refname);
> -	string_list_append(refs, ref);
> +	string_list_appendf(refs, "refs/bisect%s", refname);
>   	return 0;
>   }
>   
> @@ -1070,11 +1069,10 @@ int bisect_clean_state(void)
>   	int result = 0;
>   
>   	/* There may be some refs packed during bisection */
> -	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
> +	struct string_list refs_for_removal = STRING_LIST_INIT_DUP;
>   	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
>   	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));

The xstrdup() here would have to go.

>   	result = delete_refs("bisect: remove", &refs_for_removal, REF_NO_DEREF);
> -	refs_for_removal.strdup_strings = 1;
>   	string_list_clear(&refs_for_removal, 0);
>   	unlink_or_warn(git_path_bisect_expected_rev());
>   	unlink_or_warn(git_path_bisect_ancestors_ok());
> 
> 
> Having a "format into a string" wrapper doesn't cover _every_ string you
> might want to add to a list, but my experience with argv_array_pushf
> leads me to believe that it covers quite a lot of cases.

It would fit in with the rest of the API -- we have string_list_append()
as a wrapper for string_list_append_nodup()+xstrdup() already.  We also
have similar functions for strbuf and argv_array.  I find it a bit sad
to reimplement xstrfmt() yet again instead of using it directly, though.

René

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-18 19:18           ` René Scharfe
@ 2017-12-19 11:38             ` Jeff King
  2017-12-19 18:26               ` René Scharfe
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2017-12-19 11:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git List

On Mon, Dec 18, 2017 at 08:18:17PM +0100, René Scharfe wrote:

> > I'd actually argue the other way: the simplest interface is one where
> > the string list owns all of its pointers. That keeps the
> > ownership/lifetime issues clear, and it's one less step for the caller
> > to have to remember to do at the end (they do have to clear() the list,
> > but they must do that anyway to free the array of items).
> > 
> > It does mean that some callers may have to remember to free a temporary
> > buffer right after adding its contents to the list. But that's a lesser
> > evil, I think, since the memory ownership issues are all clearly
> > resolved at the time of add.
> > 
> > The big cost is just extra copies/allocations.
> 
> An interface requiring callers to allocate can be used to implement a
> wrapper that does all allocations for them -- the other way around is
> harder.  It can be used to avoid object duplication, but duplicates
> functions.  No idea if that's worth it.

Sure, but would anybody actually want to _use_ the non-wrapped version?
That's the same duality we have now with string_list.

> > Having a "format into a string" wrapper doesn't cover _every_ string you
> > might want to add to a list, but my experience with argv_array_pushf
> > leads me to believe that it covers quite a lot of cases.
> 
> It would fit in with the rest of the API -- we have string_list_append()
> as a wrapper for string_list_append_nodup()+xstrdup() already.  We also
> have similar functions for strbuf and argv_array.  I find it a bit sad
> to reimplement xstrfmt() yet again instead of using it directly, though.

I dunno, I think could provide some safety and some clarity. IOW, why
_don't_ we like:

  string_list_append_nodup(list, xstrfmt(fmt, ...));

? I think because:

  1. It's a bit long and ugly.

  2. It requires a magic "nodup", because we're violating the memory
     ownership boundary.

  3. It doesn't provide any safety for the case where strdup_strings is
     not set, making it easy to leak accidentally.

Doing:

  string_list_appendf(list, fmt, ...);

pushes the memory ownership semantics "under the hood" of the
string_list API. And as opposed to being a simple wrapper, it could
assert that strdup_strings is set (we already do some similar assertions
in the split functions).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-19 11:38             ` Jeff King
@ 2017-12-19 18:26               ` René Scharfe
  2017-12-20 13:05                 ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: René Scharfe @ 2017-12-19 18:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List

Am 19.12.2017 um 12:38 schrieb Jeff King:
> On Mon, Dec 18, 2017 at 08:18:17PM +0100, René Scharfe wrote:
> 
>>> I'd actually argue the other way: the simplest interface is one where
>>> the string list owns all of its pointers. That keeps the
>>> ownership/lifetime issues clear, and it's one less step for the caller
>>> to have to remember to do at the end (they do have to clear() the list,
>>> but they must do that anyway to free the array of items).
>>>
>>> It does mean that some callers may have to remember to free a temporary
>>> buffer right after adding its contents to the list. But that's a lesser
>>> evil, I think, since the memory ownership issues are all clearly
>>> resolved at the time of add.
>>>
>>> The big cost is just extra copies/allocations.
>>
>> An interface requiring callers to allocate can be used to implement a
>> wrapper that does all allocations for them -- the other way around is
>> harder.  It can be used to avoid object duplication, but duplicates
>> functions.  No idea if that's worth it.
> 
> Sure, but would anybody actually want to _use_ the non-wrapped version?

Not sure, but cases that currently use STRING_LIST_INIT_NODUP probably
apply.  Apropos: apply.c::write_out_results() looks like it might, too.

Another question is how much it would cost to let them duplicate strings
as well in order to simplify the code.

> That's the same duality we have now with string_list.

Hmm, I thought we *were* discussing string_list?

>>> Having a "format into a string" wrapper doesn't cover _every_ string you
>>> might want to add to a list, but my experience with argv_array_pushf
>>> leads me to believe that it covers quite a lot of cases.
>>
>> It would fit in with the rest of the API -- we have string_list_append()
>> as a wrapper for string_list_append_nodup()+xstrdup() already.  We also
>> have similar functions for strbuf and argv_array.  I find it a bit sad
>> to reimplement xstrfmt() yet again instead of using it directly, though.
> 
> I dunno, I think could provide some safety and some clarity. IOW, why
> _don't_ we like:
> 
>    string_list_append_nodup(list, xstrfmt(fmt, ...));
> 
> ? I think because:
> 
>    1. It's a bit long and ugly.
> 
>    2. It requires a magic "nodup", because we're violating the memory
>       ownership boundary.
> 
>    3. It doesn't provide any safety for the case where strdup_strings is
>       not set, making it easy to leak accidentally.

Right, and at least 2 and 3 would be solved by having distinct types for
the plain and the duplicating variants.  The plain one would always
"nodup" and would have no flags that need to be checked.

> Doing:
> 
>    string_list_appendf(list, fmt, ...);
> 
> pushes the memory ownership semantics "under the hood" of the
> string_list API. And as opposed to being a simple wrapper, it could
> assert that strdup_strings is set (we already do some similar assertions
> in the split functions).

Yes, that check would guard against leaks.

Having few functions that can be combined looks like a cleaner interface
to me than having additional shortcuts for specific combinations -- less
duplication, smaller surface.

That said I'm not against adding string_list_appendf(); we already have
similar functions for other types.

René

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
  2017-12-19 18:26               ` René Scharfe
@ 2017-12-20 13:05                 ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2017-12-20 13:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git List

On Tue, Dec 19, 2017 at 07:26:06PM +0100, René Scharfe wrote:

> > That's the same duality we have now with string_list.
> 
> Hmm, I thought we *were* discussing string_list?

Right, I guess what I was wondering is if a wrapper over string_list
really ends up any better than having the dual-natured string_list.

If they both use the same struct, then your wrappers are all just
functions. And isn't that more or less what we have now?

If they're actually different structs, then that complicates call
signatures for functions that take a list (unless we are getting into
polymorphism, they need to specify one of the types, even if they don't
particularly care whether it's an allocated list or not).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-12-20 13:05 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-07 20:22 [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() René Scharfe
2017-12-07 21:27 ` Jeff King
2017-12-08 17:29   ` René Scharfe
2017-12-08 18:44     ` Junio C Hamano
2017-12-08 20:10       ` René Scharfe
2017-12-08 21:11     ` Jeff King
2017-12-07 21:47 ` Junio C Hamano
2017-12-08 10:14   ` Jeff King
2017-12-08 17:29     ` René Scharfe
2017-12-08 18:37       ` Junio C Hamano
2017-12-08 21:28         ` Jeff King
2017-12-18 19:18           ` René Scharfe
2017-12-19 11:38             ` Jeff King
2017-12-19 18:26               ` René Scharfe
2017-12-20 13:05                 ` Jeff King
2017-12-08 21:17       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).