git@vger.kernel.org mailing list mirror (one of many)
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
Date: Fri, 8 Dec 2017 18:29:34 +0100
Message-ID: <1654a696-73d5-c9ef-0fc2-bd82aaf2cabb@web.de>
In-Reply-To: <20171208101455.GC1899@sigill.intra.peff.net>

On 08.12.2017 at 11:14, Jeff King wrote:
> On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote:
> 
>>> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
>>> index 22034f87e7..8e8a15ea4a 100644
>>> --- a/builtin/fmt-merge-msg.c
>>> +++ b/builtin/fmt-merge-msg.c
>>> @@ -377,7 +377,8 @@ static void shortlog(const char *name,
>>>   			string_list_append(&subjects,
>>>   					   oid_to_hex(&commit->object.oid));
>>>   		else
>>> -			string_list_append(&subjects, strbuf_detach(&sb, NULL));
>>> +			string_list_append_nodup(&subjects,
>>> +						 strbuf_detach(&sb, NULL));
>>>   	}
>>>   
>>>   	if (opts->credit_people)
>>
>> What is leaked comes from strbuf, so the title is not a lie, but I
>> tend to think that this leak is caused by a somewhat strange
>> string_list API.  The subjects string-list is initialized as a "dup"
>> kind, but a caller that wants to avoid leaking can (and should) use
>> _nodup() call to add a string without duping.  It all feels a bit
>> too convoluted.
> 
> I'm not sure it's string-list's fault. Many callers (including this one)
> have _some_ entries whose strings must be duplicated and others which do
> not.
> 
> So either:
> 
>    1. The list gets marked as "nodup", and we add an extra xstrdup() to the
>       oid_to_hex call above. And also need to remember to free() the
>       strings later, since the list does not own them.
> 
> or
> 
>    2. We mark it as "dup" and incur an extra allocation and copy, like:
> 
>         string_list_append(&subjects, sb.buf);
>         strbuf_release(&sb);

The two modes (dup/nodup) make string_list code tricky.  Not sure
how far we'd get with something simpler (e.g. an array of char pointers),
but having the caller do all string allocations would make the code
easier to analyze.
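
A minimal sketch of that simpler shape, assuming a plain growable array
that owns every pointer it is handed.  The names str_array, dup_or_die()
and add_entry() are made up for illustration and are not part of git:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: an array that owns every string pushed into it. */
struct str_array {
	char **items;
	size_t nr, alloc;
};

/* Hypothetical helper: heap copy of s, aborting on allocation failure. */
static char *dup_or_die(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

/* Take ownership of a heap-allocated string; the array never copies. */
static void str_array_push(struct str_array *a, char *owned)
{
	assert(a && owned);
	if (a->nr == a->alloc) {
		size_t new_alloc = a->alloc ? 2 * a->alloc : 8;
		char **p = realloc(a->items, new_alloc * sizeof(*p));

		if (!p)
			abort();
		a->items = p;
		a->alloc = new_alloc;
	}
	a->items[a->nr++] = owned;
}

/* Free every string and the array itself. */
static void str_array_clear(struct str_array *a)
{
	size_t i;

	for (i = 0; i < a->nr; i++)
		free(a->items[i]);
	free(a->items);
	a->items = NULL;
	a->nr = a->alloc = 0;
}

/*
 * Call-site shape: pass a detached buffer if there is one, otherwise
 * copy the fallback (e.g. a static hex buffer) before pushing.
 */
static void add_entry(struct str_array *a, char *detached, const char *fallback)
{
	str_array_push(a, detached ? detached : dup_or_die(fallback));
}
```

With only one ownership rule, the copy is visible at the call site, so
a reader can check each entry without knowing how the list was set up.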

> So I'd really blame the caller, which doesn't want to do (2) out of a
> sense of optimization. It could also perhaps write it as:
> 
>    while ((commit = get_revision(rev)) != NULL) {
> 	strbuf_reset(&sb);
> 	... maybe put some stuff in sb ...
> 	if (!sb.len)
> 		string_list_append(&subjects, oid_to_hex(obj));
> 	else
> 		string_list_append(&subjects, sb.buf);
>    }
>    strbuf_release(&sb);
> 
> which at least avoids the extra allocations.

Right, we'd just have extra string copies in that case.
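
A rough model of that loop, assuming a toy buffer in place of strbuf and
a plain output array in place of the "dup" list.  struct buf,
dup_or_die() and collect_subjects() are illustrative names, not git
code:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable buffer; a stand-in for illustration, not git's strbuf. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Forget the contents but keep the storage for the next round. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

/* Append a NUL-terminated string, growing the storage when needed. */
static void buf_addstr(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t new_alloc = (b->len + n + 1) * 2;
		char *p = realloc(b->data, new_alloc);

		if (!p)
			abort();
		b->data = p;
		b->alloc = new_alloc;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->alloc = 0;
}

/* Hypothetical helper: heap copy of s, aborting on allocation failure. */
static char *dup_or_die(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

/*
 * Fill out[0..n-1] with fresh copies: the subject with leading
 * whitespace skipped, or the fallback id when nothing is left.
 */
static void collect_subjects(const char *const *subjects,
			     const char *const *ids, size_t n, char **out)
{
	struct buf sb = { NULL, 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		const char *s = subjects[i];

		buf_reset(&sb);		/* reuse storage from earlier rounds */
		while (isspace((unsigned char)*s))
			s++;
		buf_addstr(&sb, s);
		/* the extra copy: the result never aliases sb.data */
		out[i] = dup_or_die(sb.len ? sb.data : ids[i]);
	}
	buf_release(&sb);		/* one release for the whole loop */
}
```

The scratch buffer only grows when a subject is longer than any seen
before; what remains per entry is the copy made for the list.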

> By the way, I think there's another quite subtle leak in this function.
> We do this:
> 
>    format_commit_message(commit, "%s", &sb, &ctx);
>    strbuf_ltrim(&sb);
> 
> and then only use "sb" if sb.len is non-zero. But we may have actually
> allocated to create our zero-length string (e.g., if we had a strbuf
> full of spaces and trimmed them all off). Since we reuse "sb" over and
> over as we loop, this will actually only leak once for the whole loop,
> not once per iteration. So it's probably not a big deal, but writing it
> with the explicit reset/release pattern fixes that (and is more
> idiomatic for our code base, I think).

It's subtle, but I think it's not leaking, at least not in your example
case (and I can't think of another way).  IIUC format_subject(), which
handles the "%s" part, doesn't touch sb if the subject is made up only
of whitespace.
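
To make the argument concrete, here is a toy model of that behavior.  It
only mirrors my reading above; struct buf, buf_add() and add_subject()
are invented for this sketch and are not the real strbuf or
format_subject():

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffer that allocates lazily; a stand-in, not git's strbuf. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Append n bytes of s and keep the contents NUL-terminated. */
static void buf_add(struct buf *b, const char *s, size_t n)
{
	assert(b && s);
	if (b->len + n + 1 > b->alloc) {
		size_t new_alloc = (b->len + n + 1) * 2;
		char *p = realloc(b->data, new_alloc);

		if (!p)
			abort();
		b->data = p;
		b->alloc = new_alloc;
	}
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/*
 * Hypothetical subject formatter: skip leading whitespace, then append
 * the first line.  A message with nothing but whitespace returns before
 * the buffer is touched, so nothing is allocated.
 */
static void add_subject(struct buf *b, const char *msg)
{
	while (isspace((unsigned char)*msg))
		msg++;
	if (!*msg)
		return;
	buf_add(b, msg, strcspn(msg, "\n"));
}
```

In this model a whitespace-only message leaves data at NULL and alloc at
0, so skipping the release in that case cannot leak.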

René
