* [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() @ 2017-12-07 20:22 René Scharfe 2017-12-07 21:27 ` Jeff King 2017-12-07 21:47 ` Junio C Hamano 0 siblings, 2 replies; 16+ messages in thread From: René Scharfe @ 2017-12-07 20:22 UTC (permalink / raw) To: Git List; +Cc: Junio C Hamano Use string_list_append_nodup() instead of string_list_append() to hand over ownership of a detached strbuf and thus avoid leaking its memory. Signed-off-by: Rene Scharfe <l.s.r@web.de> --- builtin/fmt-merge-msg.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c index 22034f87e7..8e8a15ea4a 100644 --- a/builtin/fmt-merge-msg.c +++ b/builtin/fmt-merge-msg.c @@ -377,7 +377,8 @@ static void shortlog(const char *name, string_list_append(&subjects, oid_to_hex(&commit->object.oid)); else - string_list_append(&subjects, strbuf_detach(&sb, NULL)); + string_list_append_nodup(&subjects, + strbuf_detach(&sb, NULL)); } if (opts->credit_people) -- 2.15.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-07 20:22 [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() René Scharfe @ 2017-12-07 21:27 ` Jeff King 2017-12-08 17:29 ` René Scharfe 2017-12-07 21:47 ` Junio C Hamano 1 sibling, 1 reply; 16+ messages in thread From: Jeff King @ 2017-12-07 21:27 UTC (permalink / raw) To: René Scharfe; +Cc: Git List, Junio C Hamano On Thu, Dec 07, 2017 at 09:22:49PM +0100, René Scharfe wrote: > Use string_list_append_nodup() instead of string_list_append() to hand > over ownership of a detached strbuf and thus avoid leaking its memory. Looks obviously correct (though one thing missing from the diff context is whether "subjects" is set to DUP -- it is, which is good). Grepping for "list_append.*detach" shows a few other possible cases in transport-helper.c, which I think are leaks. I wondered if it would be possible to write a coccinelle rule for this, but I think it's not possible. Whether this is right depends on the strdup_strings flag, which could change at runtime (though in practice it doesn't). -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-07 21:27 ` Jeff King @ 2017-12-08 17:29 ` René Scharfe 2017-12-08 18:44 ` Junio C Hamano 2017-12-08 21:11 ` Jeff King 0 siblings, 2 replies; 16+ messages in thread From: René Scharfe @ 2017-12-08 17:29 UTC (permalink / raw) To: Jeff King, Git List; +Cc: Junio C Hamano Am 07.12.2017 um 22:27 schrieb Jeff King: > Grepping for "list_append.*detach" shows a few other possible cases in > transport-helper.c, which I think are leaks. -- >8 -- Subject: [PATCH] transport-helper: plug strbuf and string_list leaks Transfer ownership of detached strbufs to string_lists of the duplicating variety by calling string_list_append_nodup() instead of string_list_append() to avoid duplicating and then leaking the buffer. While at it make sure to release the string_list when done; push_refs_with_export() already does that. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> --- transport-helper.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/transport-helper.c b/transport-helper.c index bf05a2dcf1..f682e7c534 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport, struct strbuf cas = STRBUF_INIT; strbuf_addf(&cas, "%s:%s", ref->name, oid_to_hex(&ref->old_oid_expect)); - string_list_append(&cas_options, strbuf_detach(&cas, NULL)); + string_list_append_nodup(&cas_options, + strbuf_detach(&cas, NULL)); } } if (buf.len == 0) { @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport, strbuf_addch(&buf, '\n'); sendline(data, &buf); strbuf_release(&buf); + string_list_release(&cas_options, 0); return push_update_refs_status(data, remote_refs, flags); } @@ -930,7 +932,8 @@ static int push_refs_with_export(struct transport *transport, private = apply_refspecs(data->refspecs, data->refspec_nr, ref->name); if (private && !get_oid(private, &oid)) { strbuf_addf(&buf, "^%s", private); - string_list_append(&revlist_args, strbuf_detach(&buf, NULL)); + string_list_append_nodup(&revlist_args, + strbuf_detach(&buf, NULL)); oidcpy(&ref->old_oid, &oid); } free(private); -- 2.15.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 17:29 ` René Scharfe @ 2017-12-08 18:44 ` Junio C Hamano 2017-12-08 20:10 ` René Scharfe 2017-12-08 21:11 ` Jeff King 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2017-12-08 18:44 UTC (permalink / raw) To: René Scharfe; +Cc: Jeff King, Git List René Scharfe <l.s.r@web.de> writes: > Am 07.12.2017 um 22:27 schrieb Jeff King: >> Grepping for "list_append.*detach" shows a few other possible cases in >> transport-helper.c, which I think are leaks. > > -- >8 -- > Subject: [PATCH] transport-helper: plug strbuf and string_list leaks > > Transfer ownership of detached strbufs to string_lists of the > duplicating variety by calling string_list_append_nodup() instead of > string_list_append() to avoid duplicating and then leaking the buffer. > > While at it make sure to release the string_list when done; > push_refs_with_export() already does that. > > Reported-by: Jeff King <peff@peff.net> > Signed-off-by: Rene Scharfe <l.s.r@web.de> > --- > transport-helper.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/transport-helper.c b/transport-helper.c > index bf05a2dcf1..f682e7c534 100644 > --- a/transport-helper.c > +++ b/transport-helper.c > @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport, > struct strbuf cas = STRBUF_INIT; > strbuf_addf(&cas, "%s:%s", > ref->name, oid_to_hex(&ref->old_oid_expect)); > - string_list_append(&cas_options, strbuf_detach(&cas, NULL)); > + string_list_append_nodup(&cas_options, > + strbuf_detach(&cas, NULL)); > } > } > if (buf.len == 0) { > @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport, > strbuf_addch(&buf, '\n'); > sendline(data, &buf); > strbuf_release(&buf); > + string_list_release(&cas_options, 0); There is no such function; you meant _clear() perhaps? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 18:44 ` Junio C Hamano @ 2017-12-08 20:10 ` René Scharfe 0 siblings, 0 replies; 16+ messages in thread From: René Scharfe @ 2017-12-08 20:10 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Git List Am 08.12.2017 um 19:44 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> Am 07.12.2017 um 22:27 schrieb Jeff King: >>> Grepping for "list_append.*detach" shows a few other possible cases in >>> transport-helper.c, which I think are leaks. >> >> -- >8 -- >> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks >> >> Transfer ownership of detached strbufs to string_lists of the >> duplicating variety by calling string_list_append_nodup() instead of >> string_list_append() to avoid duplicating and then leaking the buffer. >> >> While at it make sure to release the string_list when done; >> push_refs_with_export() already does that. >> >> Reported-by: Jeff King <peff@peff.net> >> Signed-off-by: Rene Scharfe <l.s.r@web.de> >> --- >> transport-helper.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/transport-helper.c b/transport-helper.c >> index bf05a2dcf1..f682e7c534 100644 >> --- a/transport-helper.c >> +++ b/transport-helper.c >> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport, >> struct strbuf cas = STRBUF_INIT; >> strbuf_addf(&cas, "%s:%s", >> ref->name, oid_to_hex(&ref->old_oid_expect)); >> - string_list_append(&cas_options, strbuf_detach(&cas, NULL)); >> + string_list_append_nodup(&cas_options, >> + strbuf_detach(&cas, NULL)); >> } >> } >> if (buf.len == 0) { >> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport, >> strbuf_addch(&buf, '\n'); >> sendline(data, &buf); >> strbuf_release(&buf); >> + string_list_release(&cas_options, 0); > > There is no such function; you meant _clear() perhaps? Yes, of course, I'm sorry. Not sure what happened there. O_o René ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 17:29 ` René Scharfe 2017-12-08 18:44 ` Junio C Hamano @ 2017-12-08 21:11 ` Jeff King 1 sibling, 0 replies; 16+ messages in thread From: Jeff King @ 2017-12-08 21:11 UTC (permalink / raw) To: René Scharfe; +Cc: Git List, Junio C Hamano On Fri, Dec 08, 2017 at 06:29:31PM +0100, René Scharfe wrote: > Am 07.12.2017 um 22:27 schrieb Jeff King: > > Grepping for "list_append.*detach" shows a few other possible cases in > > transport-helper.c, which I think are leaks. > > -- >8 -- > Subject: [PATCH] transport-helper: plug strbuf and string_list leaks > > Transfer ownership of detached strbufs to string_lists of the > duplicating variety by calling string_list_append_nodup() instead of > string_list_append() to avoid duplicating and then leaking the buffer. Thanks, this part looks obviously correct. > While at it make sure to release the string_list when done; > push_refs_with_export() already does that. This one takes a bit more digging. I've been bitten before in Git's code by freeing what appeared to be a leak only to find out that we had passed the pointers off to some other data structure which expected them to persist. Here we feed them to set_helper_option(), which passes them to quote_c_style(), which makes a copy into a strbuf. So I think all is well. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-07 20:22 [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() René Scharfe 2017-12-07 21:27 ` Jeff King @ 2017-12-07 21:47 ` Junio C Hamano 2017-12-08 10:14 ` Jeff King 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2017-12-07 21:47 UTC (permalink / raw) To: René Scharfe; +Cc: Git List René Scharfe <l.s.r@web.de> writes: > Use string_list_append_nodup() instead of string_list_append() to hand > over ownership of a detached strbuf and thus avoid leaking its memory. > > Signed-off-by: Rene Scharfe <l.s.r@web.de> > --- > builtin/fmt-merge-msg.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c > index 22034f87e7..8e8a15ea4a 100644 > --- a/builtin/fmt-merge-msg.c > +++ b/builtin/fmt-merge-msg.c > @@ -377,7 +377,8 @@ static void shortlog(const char *name, > string_list_append(&subjects, > oid_to_hex(&commit->object.oid)); > else > - string_list_append(&subjects, strbuf_detach(&sb, NULL)); > + string_list_append_nodup(&subjects, > + strbuf_detach(&sb, NULL)); > } > > if (opts->credit_people) What is leaked comes from strbuf, so the title is not a lie, but I tend to think that this leak is caused by a somewhat strange string_list API. The subjects string-list is initialized as a "dup" kind, but a caller that wants to avoid leaking can (and should) use _nodup() call to add a string without duping. It all feels a bit too convoluted. The patch looks good. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-07 21:47 ` Junio C Hamano @ 2017-12-08 10:14 ` Jeff King 2017-12-08 17:29 ` René Scharfe 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2017-12-08 10:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, Git List On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote: > > diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c > > index 22034f87e7..8e8a15ea4a 100644 > > --- a/builtin/fmt-merge-msg.c > > +++ b/builtin/fmt-merge-msg.c > > @@ -377,7 +377,8 @@ static void shortlog(const char *name, > > string_list_append(&subjects, > > oid_to_hex(&commit->object.oid)); > > else > > - string_list_append(&subjects, strbuf_detach(&sb, NULL)); > > + string_list_append_nodup(&subjects, > > + strbuf_detach(&sb, NULL)); > > } > > > > if (opts->credit_people) > > What is leaked comes from strbuf, so the title is not a lie, but I > tend to think that this leak is caused by a somewhat strange > string_list API. The subjects string-list is initialized as a "dup" > kind, but a caller that wants to avoid leaking can (and should) use > _nodup() call to add a string without duping. It all feels a bit > too convoluted. I'm not sure it's string-list's fault. Many callers (including this one) have _some_ entries whose strings must be duplicated and others which do not. So either: 1. The list gets marked as "nodup", and we add an extra xstrdup() to the oid_to_hex call above. And also need to remember to free() the strings later, since the list does not own them. or 2. We mark it as "dup" and incur an extra allocation and copy, like: string_list_append(&subjects, sb.buf); strbuf_release(&buf); So I'd really blame the caller, which doesn't want to do (2) out of a sense of optimization. It could also perhaps write it as: while (commit = get_revision(rev)) { strbuf_reset(&sb); ... maybe put some stuff in sb ... if (!sb.len) string_list_append(&subjects, oid_to_hex(obj)); else string_list_append(&subjects, sb.buf); } strbuf_release(&sb); which at least avoids the extra allocations. By the way, I think there's another quite subtle leak in this function. We do this: format_commit_message(commit, "%s", &sb, &ctx); strbuf_ltrim(&sb); and then only use "sb" if sb.len is non-zero. But we may have actually allocated to create our zero-length string (e.g., if we had a strbuf full of spaces and trimmed them all off). Since we reuse "sb" over and over as we loop, this will actually only leak once for the whole loop, not once per iteration. So it's probably not a big deal, but writing it with the explicit reset/release pattern fixes that (and is more idiomatic for our code base, I think). -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 10:14 ` Jeff King @ 2017-12-08 17:29 ` René Scharfe 2017-12-08 18:37 ` Junio C Hamano 2017-12-08 21:17 ` Jeff King 0 siblings, 2 replies; 16+ messages in thread From: René Scharfe @ 2017-12-08 17:29 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: Git List Am 08.12.2017 um 11:14 schrieb Jeff King: > On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote: > >>> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c >>> index 22034f87e7..8e8a15ea4a 100644 >>> --- a/builtin/fmt-merge-msg.c >>> +++ b/builtin/fmt-merge-msg.c >>> @@ -377,7 +377,8 @@ static void shortlog(const char *name, >>> string_list_append(&subjects, >>> oid_to_hex(&commit->object.oid)); >>> else >>> - string_list_append(&subjects, strbuf_detach(&sb, NULL)); >>> + string_list_append_nodup(&subjects, >>> + strbuf_detach(&sb, NULL)); >>> } >>> >>> if (opts->credit_people) >> >> What is leaked comes from strbuf, so the title is not a lie, but I >> tend to think that this leak is caused by a somewhat strange >> string_list API. The subjects string-list is initialized as a "dup" >> kind, but a caller that wants to avoid leaking can (and should) use >> _nodup() call to add a string without duping. It all feels a bit >> too convoluted. > > I'm not sure it's string-list's fault. Many callers (including this one) > have _some_ entries whose strings must be duplicated and others which do > not. > > So either: > > 1. The list gets marked as "nodup", and we add an extra xstrdup() to the > oid_to_hex call above. And also need to remember to free() the > strings later, since the list does not own them. > > or > > 2. We mark it as "dup" and incur an extra allocation and copy, like: > > string_list_append(&subjects, sb.buf); > strbuf_release(&buf); The two modes (dup/nodup) make string_list code tricky. Not sure how far we'd get with something simpler (e.g. an array of char pointers), but having the caller do all string allocations would make the code easier to analyze. > So I'd really blame the caller, which doesn't want to do (2) out of a > sense of optimization. It could also perhaps write it as: > > while (commit = get_revision(rev)) { > strbuf_reset(&sb); > ... maybe put some stuff in sb ... > if (!sb.len) > string_list_append(&subjects, oid_to_hex(obj)); > else > string_list_append(&subjects, sb.buf); > } > strbuf_release(&sb); > > which at least avoids the extra allocations. Right, we'd just have extra string copies in that case. > By the way, I think there's another quite subtle leak in this function. > We do this: > > format_commit_message(commit, "%s", &sb, &ctx); > strbuf_ltrim(&sb); > > and then only use "sb" if sb.len is non-zero. But we may have actually > allocated to create our zero-length string (e.g., if we had a strbuf > full of spaces and trimmed them all off). Since we reuse "sb" over and > over as we loop, this will actually only leak once for the whole loop, > not once per iteration. So it's probably not a big deal, but writing it > with the explicit reset/release pattern fixes that (and is more > idiomatic for our code base, I think). It's subtle, but I think it's not leaking, at least not in your example case (and I can't think of another way). IIUC format_subject(), which handles the "%s" part, doesn't touch sb if the subject is made up only of whitespace. René ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 17:29 ` René Scharfe @ 2017-12-08 18:37 ` Junio C Hamano 2017-12-08 21:28 ` Jeff King 2017-12-08 21:17 ` Jeff King 1 sibling, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2017-12-08 18:37 UTC (permalink / raw) To: René Scharfe; +Cc: Jeff King, Git List René Scharfe <l.s.r@web.de> writes: >> I'm not sure it's string-list's fault. Many callers (including this one) >> ... > The two modes (dup/nodup) make string_list code tricky. Not sure > how far we'd get with something simpler (e.g. an array of char pointers), > but having the caller do all string allocations would make the code > easier to analyze. Yes. It probably would have been more sensible if the API did not have two modes (instead, have the caller pass whatever string to be stored, *and* make the caller responsible for freeing them *if* it passed an allocated string). For the push_refs_with_push() patch you sent, another possible fix would be to make cas_options a nodup kind so that the result of strbuf_detach() does not get an extra strdup to be lost when placed in cas_options. With the current string-list API that would not quite work, because freeing done in _release() is tied to the "dup/nodup" ness of the string list. I think there even is a codepath that initializes a string_list as nodup kind, stuffs string in it giving the ownership, and then flips it into dup kind just before calling _release() only to have it free the strings, or something silly/ugly like that. In any case, the patch looks sensible. Thanks for plugging the leaks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 18:37 ` Junio C Hamano @ 2017-12-08 21:28 ` Jeff King 2017-12-18 19:18 ` René Scharfe 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2017-12-08 21:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, Git List On Fri, Dec 08, 2017 at 10:37:08AM -0800, Junio C Hamano wrote: > > The two modes (dup/nodup) make string_list code tricky. Not sure > > how far we'd get with something simpler (e.g. an array of char pointers), > > but having the caller do all string allocations would make the code > > easier to analyze. > > Yes. > > It probably would have been more sensible if the API did not have > two modes (instead, have the caller pass whatever string to be > stored, *and* make the caller responsible for freeing them *if* it > passed an allocated string). I'd actually argue the other way: the simplest interface is one where the string list owns all of its pointers. That keeps the ownership/lifetime issues clear, and it's one less step for the caller to have to remember to do at the end (they do have to clear() the list, but they must do that anyway to free the array of items). It does mean that some callers may have to remember to free a temporary buffer right after adding its contents to the list. But that's a lesser evil, I think, since the memory ownership issues are all clearly resolved at the time of add. The big cost is just extra copies/allocations. I dunno. I actually do not mind the "nodup" version of append being used on a "dup" list. It is just a way of letting each call decide on whether to hand over ownership, and I think most sites are pretty clear. > For the push_refs_with_push() patch you sent, another possible fix > would be to make cas_options a nodup kind so that the result of > strbuf_detach() does not get an extra strdup to be lost when placed > in cas_options. With the current string-list API that would not > quite work, because freeing done in _release() is tied to the > "dup/nodup" ness of the string list. I think there even is a > codepath that initializes a string_list as nodup kind, stuffs string > in it giving the ownership, and then flips it into dup kind just > before calling _release() only to have it free the strings, or > something silly/ugly like that. Yes, the first grep hit for NODUP is bisect_clean_state(), which does this. I think it would be more clear if we could do: diff --git a/bisect.c b/bisect.c index 0fca17c02b..7c59408a13 100644 --- a/bisect.c +++ b/bisect.c @@ -1060,8 +1060,7 @@ static int mark_for_removal(const char *refname, const struct object_id *oid, int flag, void *cb_data) { struct string_list *refs = cb_data; - char *ref = xstrfmt("refs/bisect%s", refname); - string_list_append(refs, ref); + string_list_appendf(refs, "refs/bisect%s", refname); return 0; } @@ -1070,11 +1069,10 @@ int bisect_clean_state(void) int result = 0; /* There may be some refs packed during bisection */ - struct string_list refs_for_removal = STRING_LIST_INIT_NODUP; + struct string_list refs_for_removal = STRING_LIST_INIT_DUP; for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal); string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD")); result = delete_refs("bisect: remove", &refs_for_removal, REF_NO_DEREF); - refs_for_removal.strdup_strings = 1; string_list_clear(&refs_for_removal, 0); unlink_or_warn(git_path_bisect_expected_rev()); unlink_or_warn(git_path_bisect_ancestors_ok()); Having a "format into a string" wrapper doesn't cover _every_ string you might want to add to a list, but my experience with argv_array_pushf leads me to believe that it covers quite a lot of cases. I dunno. I am not so bothered by the current dual-nature that I think it is worth going on a crusade to eliminate one side. But I'm OK if somebody else wants to do so. -Peff ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 21:28 ` Jeff King @ 2017-12-18 19:18 ` René Scharfe 2017-12-19 11:38 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: René Scharfe @ 2017-12-18 19:18 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: Git List Am 08.12.2017 um 22:28 schrieb Jeff King: > On Fri, Dec 08, 2017 at 10:37:08AM -0800, Junio C Hamano wrote: > >>> The two modes (dup/nodup) make string_list code tricky. Not sure >>> how far we'd get with something simpler (e.g. an array of char pointers), >>> but having the caller do all string allocations would make the code >>> easier to analyze. >> >> Yes. >> >> It probably would have been more sensible if the API did not have >> two modes (instead, have the caller pass whatever string to be >> stored, *and* make the caller responsible for freeing them *if* it >> passed an allocated string). > > I'd actually argue the other way: the simplest interface is one where > the string list owns all of its pointers. That keeps the > ownership/lifetime issues clear, and it's one less step for the caller > to have to remember to do at the end (they do have to clear() the list, > but they must do that anyway to free the array of items). > > It does mean that some callers may have to remember to free a temporary > buffer right after adding its contents to the list. But that's a lesser > evil, I think, since the memory ownership issues are all clearly > resolved at the time of add. > > The big cost is just extra copies/allocations. An interface requiring callers to allocate can be used to implement a wrapper that does all allocations for them -- the other way around is harder. It can be used to avoid object duplication, but duplicates functions. No idea if that's worth it. >> For the push_refs_with_push() patch you sent, another possible fix >> would be to make cas_options a nodup kind so that the result of >> strbuf_detach() does not get an extra strdup to be lost when placed >> in cas_options. With the current string-list API that would not >> quite work, because freeing done in _release() is tied to the >> "dup/nodup" ness of the string list. I think there even is a >> codepath that initializes a string_list as nodup kind, stuffs string >> in it giving the ownership, and then flips it into dup kind just >> before calling _release() only to have it free the strings, or >> something silly/ugly like that. > > Yes, the first grep hit for NODUP is bisect_clean_state(), which does > this. I think it would be more clear if we could do: > > diff --git a/bisect.c b/bisect.c > index 0fca17c02b..7c59408a13 100644 > --- a/bisect.c > +++ b/bisect.c > @@ -1060,8 +1060,7 @@ static int mark_for_removal(const char *refname, const struct object_id *oid, > int flag, void *cb_data) > { > struct string_list *refs = cb_data; > - char *ref = xstrfmt("refs/bisect%s", refname); > - string_list_append(refs, ref); > + string_list_appendf(refs, "refs/bisect%s", refname); > return 0; > } > > @@ -1070,11 +1069,10 @@ int bisect_clean_state(void) > int result = 0; > > /* There may be some refs packed during bisection */ > - struct string_list refs_for_removal = STRING_LIST_INIT_NODUP; > + struct string_list refs_for_removal = STRING_LIST_INIT_DUP; > for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal); > string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD")); The xstrdup() here would have to go. > result = delete_refs("bisect: remove", &refs_for_removal, REF_NO_DEREF); > - refs_for_removal.strdup_strings = 1; > string_list_clear(&refs_for_removal, 0); > unlink_or_warn(git_path_bisect_expected_rev()); > unlink_or_warn(git_path_bisect_ancestors_ok()); > > > Having a "format into a string" wrapper doesn't cover _every_ string you > might want to add to a list, but my experience with argv_array_pushf > leads me to believe that it covers quite a lot of cases. It would fit in with the rest of the API -- we have string_list_append() as a wrapper for string_list_append_nodup()+xstrdup() already. We also have similar functions for strbuf and argv_array. I find it a bit sad to reimplement xstrfmt() yet again instead of using it directly, though. René ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-18 19:18 ` René Scharfe @ 2017-12-19 11:38 ` Jeff King 2017-12-19 18:26 ` René Scharfe 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2017-12-19 11:38 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, Git List On Mon, Dec 18, 2017 at 08:18:17PM +0100, René Scharfe wrote: > > I'd actually argue the other way: the simplest interface is one where > > the string list owns all of its pointers. That keeps the > > ownership/lifetime issues clear, and it's one less step for the caller > > to have to remember to do at the end (they do have to clear() the list, > > but they must do that anyway to free the array of items). > > > > It does mean that some callers may have to remember to free a temporary > > buffer right after adding its contents to the list. But that's a lesser > > evil, I think, since the memory ownership issues are all clearly > > resolved at the time of add. > > > > The big cost is just extra copies/allocations. > > An interface requiring callers to allocate can be used to implement a > wrapper that does all allocations for them -- the other way around is > harder. It can be used to avoid object duplication, but duplicates > functions. No idea if that's worth it. Sure, but would anybody actually want to _use_ the non-wrapped version? That's the same duality we have now with string_list. > > Having a "format into a string" wrapper doesn't cover _every_ string you > > might want to add to a list, but my experience with argv_array_pushf > > leads me to believe that it covers quite a lot of cases. > > It would fit in with the rest of the API -- we have string_list_append() > as a wrapper for string_list_append_nodup()+xstrdup() already. We also > have similar functions for strbuf and argv_array. I find it a bit sad > to reimplement xstrfmt() yet again instead of using it directly, though. I dunno, I think could provide some safety and some clarity. IOW, why _don't_ we like: string_list_append_nodup(list, xstrfmt(fmt, ...)); ? I think because: 1. It's a bit long and ugly. 2. It requires a magic "nodup", because we're violating the memory ownership boundary. 3. It doesn't provide any safety for the case where strdup_strings is not set, making it easy to leak accidentally. Doing: string_list_appendf(list, fmt, ...); pushes the memory ownership semantics "under the hood" of the string_list API. And as opposed to being a simple wrapper, it could assert that strdup_strings is set (we already do some similar assertions in the split functions). -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-19 11:38 ` Jeff King @ 2017-12-19 18:26 ` René Scharfe 2017-12-20 13:05 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: René Scharfe @ 2017-12-19 18:26 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Git List Am 19.12.2017 um 12:38 schrieb Jeff King: > On Mon, Dec 18, 2017 at 08:18:17PM +0100, René Scharfe wrote: > >>> I'd actually argue the other way: the simplest interface is one where >>> the string list owns all of its pointers. That keeps the >>> ownership/lifetime issues clear, and it's one less step for the caller >>> to have to remember to do at the end (they do have to clear() the list, >>> but they must do that anyway to free the array of items). >>> >>> It does mean that some callers may have to remember to free a temporary >>> buffer right after adding its contents to the list. But that's a lesser >>> evil, I think, since the memory ownership issues are all clearly >>> resolved at the time of add. >>> >>> The big cost is just extra copies/allocations. >> >> An interface requiring callers to allocate can be used to implement a >> wrapper that does all allocations for them -- the other way around is >> harder. It can be used to avoid object duplication, but duplicates >> functions. No idea if that's worth it. > > Sure, but would anybody actually want to _use_ the non-wrapped version? Not sure, but cases that currently use STRING_LIST_INIT_NODUP probably apply. Apropos: apply.c::write_out_results() looks like it might, too. Another question is how much it would cost to let them duplicate strings as well in order to simplify the code. > That's the same duality we have now with string_list. Hmm, I thought we *were* discussing string_list? >>> Having a "format into a string" wrapper doesn't cover _every_ string you >>> might want to add to a list, but my experience with argv_array_pushf >>> leads me to believe that it covers quite a lot of cases. >> >> It would fit in with the rest of the API -- we have string_list_append() >> as a wrapper for string_list_append_nodup()+xstrdup() already. We also >> have similar functions for strbuf and argv_array. I find it a bit sad >> to reimplement xstrfmt() yet again instead of using it directly, though. > > I dunno, I think could provide some safety and some clarity. IOW, why > _don't_ we like: > > string_list_append_nodup(list, xstrfmt(fmt, ...)); > > ? I think because: > > 1. It's a bit long and ugly. > > 2. It requires a magic "nodup", because we're violating the memory > ownership boundary. > > 3. It doesn't provide any safety for the case where strdup_strings is > not set, making it easy to leak accidentally. Right, and at least 2 and 3 would be solved by having distinct types for the plain and the duplicating variants. The plain one would always "nodup" and would have no flags that need to be checked. > Doing: > > string_list_appendf(list, fmt, ...); > > pushes the memory ownership semantics "under the hood" of the > string_list API. And as opposed to being a simple wrapper, it could > assert that strdup_strings is set (we already do some similar assertions > in the split functions). Yes, that check would guard against leaks. Having few functions that can be combined looks like a cleaner interface to me than having additional shortcuts for specific combinations -- less duplication, smaller surface. That said I'm not against adding string_list_appendf(); we already have similar functions for other types. René ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-19 18:26 ` René Scharfe @ 2017-12-20 13:05 ` Jeff King 0 siblings, 0 replies; 16+ messages in thread From: Jeff King @ 2017-12-20 13:05 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, Git List On Tue, Dec 19, 2017 at 07:26:06PM +0100, René Scharfe wrote: > > That's the same duality we have now with string_list. > > Hmm, I thought we *were* discussing string_list? Right, I guess what I was wondering is if a wrapper over string_list really ends up any better than having the dual-natured string_list. If they both use the same struct, then your wrappers are all just functions. And isn't that more or less what we have now? If they're actually different structs, then that complicates call signatures for functions that take a list (unless we are getting into polymorphism, they need to specify one of the types, even if they don't particularly care whether it's an allocated list or not). -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() 2017-12-08 17:29 ` René Scharfe 2017-12-08 18:37 ` Junio C Hamano @ 2017-12-08 21:17 ` Jeff King 1 sibling, 0 replies; 16+ messages in thread From: Jeff King @ 2017-12-08 21:17 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, Git List On Fri, Dec 08, 2017 at 06:29:34PM +0100, René Scharfe wrote: > > By the way, I think there's another quite subtle leak in this function. > > We do this: > > > > format_commit_message(commit, "%s", &sb, &ctx); > > strbuf_ltrim(&sb); > > > > and then only use "sb" if sb.len is non-zero. But we may have actually > > allocated to create our zero-length string (e.g., if we had a strbuf > > full of spaces and trimmed them all off). Since we reuse "sb" over and > > over as we loop, this will actually only leak once for the whole loop, > > not once per iteration. So it's probably not a big deal, but writing it > > with the explicit reset/release pattern fixes that (and is more > > idiomatic for our code base, I think). > > It's subtle, but I think it's not leaking, at least not in your example > case (and I can't think of another way). IIUC format_subject(), which > handles the "%s" part, doesn't touch sb if the subject is made up only > of whitespace. Yeah, I suspected that may be the case. But IMHO it is a poor use of strbufs if you have to dig that far to see whether the code leaks or not. The whole point of strbufs is to make string handling and memory ownership more obviously correct. Just skimming the history, I think it's mostly an artifact of the function was slowly converted over the years. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-12-20 13:05 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-12-07 20:22 [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog() René Scharfe 2017-12-07 21:27 ` Jeff King 2017-12-08 17:29 ` René Scharfe 2017-12-08 18:44 ` Junio C Hamano 2017-12-08 20:10 ` René Scharfe 2017-12-08 21:11 ` Jeff King 2017-12-07 21:47 ` Junio C Hamano 2017-12-08 10:14 ` Jeff King 2017-12-08 17:29 ` René Scharfe 2017-12-08 18:37 ` Junio C Hamano 2017-12-08 21:28 ` Jeff King 2017-12-18 19:18 ` René Scharfe 2017-12-19 11:38 ` Jeff King 2017-12-19 18:26 ` René Scharfe 2017-12-20 13:05 ` Jeff King 2017-12-08 21:17 ` Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).