git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Eric Sunshine <sunshine@sunshineco.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 04/21] strbuf: fix leak when `appendwholeline()` fails with EOF
Date: Wed, 29 May 2024 05:16:33 -0400	[thread overview]
Message-ID: <20240529091633.GB1098944@coredump.intra.peff.net> (raw)
In-Reply-To: <ZlQr3tsDTSOGvFUQ@tanuki>

On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote:

> > diff --git a/strbuf.c b/strbuf.c
> > diff --git a/strbuf.c b/strbuf.c
> > index e1076c9891..aed699c6bf 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
> >  	 * we can just re-init, but otherwise we should make sure that our
> >  	 * length is empty, and that the result is NUL-terminated.
> >  	 */
> > -	if (!sb->buf)
> > -		strbuf_init(sb, 0);
> > -	else
> > -		strbuf_reset(sb);
> > +	FREE_AND_NULL(sb->buf);
> > +	strbuf_init(sb, 0);
> >  	return EOF;
> >  }
> >  #else
> > 
> > But I think either of those would solve your leak, _and_ would help with
> > similar leaks of strbuf_getwholeline() and friends.
> 
> I'm not quite convinced that `strbuf_getwholeline()` should deallocate
> the buffer for the caller, I think that makes for quite a confusing
> calling convention. The caller may want to reuse the buffer for other
> operations, and it feels hostile to release the buffer under their feet.
> 
> The only edge case where I think it would make sense to free allocated
> data is when being passed a not-yet-allocated strbuf. But I wonder
> whether the added complexity would be worth it.

I'm not sure what they'd reuse it for. We necessarily have to reset it
before reading, so the contents are now garbage. The allocated buffer
could be reused, but since everybody has to call strbuf_grow() before
assuming they can write, it's not a correctness issue, but only an
optimization. But that optimization is pretty unlikely to matter. Since
we hit this code only on EOF or error, it's generally going to happen
once in a program, and not in a tight loop.

If we really cared, though, I think you could check sb->alloc before the
call to getdelim(), and then we'd know whether the original held an
allocation or not (and we could restore its state). That's what other
syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd()
do.

That said, I agree that leaks here are not going to be common. Most
callers are going to call it in a loop and unconditionally release at
the end, whether they get multiple lines or not. The "append" function
is the odd man out by reading a single line into a new buffer[1].

Looking through the results of:

  git grep -P '(?<!while) \(!?strbuf_get(whole)?line'

I saw only one questionable case. builtin/difftool.c does:

  if (strbuf_getline_nul(&lpath, fp))
	break;

without freeing lpath. But then...it does not free it in the case that
we got a value, either! So I think it is leaking either way, and the
solution, to strbuf_release(&lpath) outside of the loop, would fix both
cases.

> I've been going through all callsites and couldn't spot any that doesn't
> free the buffer on EOF. So I'd propose to leave this as-is and revisit
> if we eventually see that this is causing more memory leaks.

OK. I don't feel too strongly about it, but mostly thought it seemed
inconsistent with the philosophy of those other strbuf functions.

-Peff

[1] I was surprised at first that strbuf_appendwholeline() uses that
    extra copy, as the obvious implementation is just to skip the
    strbuf_reset() call in getwholeline(), and then implement "get" as a
    wrapper around "append" to add back in the reset call.

    But that only works for the slower getc() implementation. For the
    getdelim()-based one, there is no notion of append (the interface
    would have to take an offset into the buffer).  We could probably
    optimize this at the cost of repeating ourselves, but given that
    there's only one probably-not-very-hot call to appendwholeline, it's
    likely not worth it.


  reply	other threads:[~2024-05-29  9:16 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-23 12:25 [PATCH 00/20] Various memory leak fixes Patrick Steinhardt
2024-05-23 12:25 ` [PATCH 01/20] t: mark a bunch of tests as leak-free Patrick Steinhardt
2024-05-23 17:44   ` Junio C Hamano
2024-05-24  6:56     ` Patrick Steinhardt
2024-05-24 16:05       ` Junio C Hamano
2024-05-24 17:53         ` Junio C Hamano
2024-05-24 20:34   ` Karthik Nayak
2024-05-23 12:25 ` [PATCH 02/20] transport-helper: fix leaking helper name Patrick Steinhardt
2024-05-23 17:36   ` Junio C Hamano
2024-05-24 20:38   ` Karthik Nayak
2024-05-23 12:25 ` [PATCH 03/20] strbuf: fix leak when `appendwholeline()` fails with EOF Patrick Steinhardt
2024-05-23 12:25 ` [PATCH 04/20] checkout: clarify memory ownership in `unique_tracking_name()` Patrick Steinhardt
2024-05-23 12:25 ` [PATCH 05/20] http: refactor code to clarify memory ownership Patrick Steinhardt
2024-05-23 12:25 ` [PATCH 06/20] config: clarify memory ownership in `git_config_pathname()` Patrick Steinhardt
2024-05-23 12:25 ` [PATCH 07/20] diff: refactor code to clarify memory ownership of prefixes Patrick Steinhardt
2024-05-23 16:59   ` Eric Sunshine
2024-05-23 12:25 ` [PATCH 08/20] convert: refactor code to clarify ownership of check_roundtrip_encoding Patrick Steinhardt
2024-05-23 12:25 ` [PATCH 09/20] builtin/log: stop using globals for log config Patrick Steinhardt
2024-05-23 12:25 ` [PATCH 10/20] builtin/log: stop using globals for format config Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 11/20] config: clarify memory ownership in `git_config_string()` Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 12/20] config: plug various memory leaks Patrick Steinhardt
2024-05-23 17:13   ` Junio C Hamano
2024-05-24  6:58     ` Patrick Steinhardt
2024-05-24  8:55       ` Patrick Steinhardt
2024-05-24 16:12         ` Junio C Hamano
2024-05-24 16:11       ` Junio C Hamano
2024-05-23 12:26 ` [PATCH 13/20] builtin/credential: clear credential before exit Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 14/20] commit-reach: fix memory leak in `ahead_behind()` Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 15/20] submodule: fix leaking memory for submodule entries Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 16/20] strvec: add functions to replace and remove strings Patrick Steinhardt
2024-05-23 17:09   ` Eric Sunshine
2024-05-24  6:56     ` Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 17/20] builtin/mv: refactor `add_slash()` to always return allocated strings Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 18/20] builtin/mv duplicate string list memory Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 19/20] builtin/mv: refactor to use `struct strvec` Patrick Steinhardt
2024-05-23 12:26 ` [PATCH 20/20] builtin/mv: fix leaks for submodule gitfile paths Patrick Steinhardt
2024-05-23 16:45 ` [PATCH 00/20] Various memory leak fixes Junio C Hamano
2024-05-24  6:56   ` Patrick Steinhardt
2024-05-24 10:03 ` [PATCH v2 00/21] " Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 01/21] ci: add missing dependency for TTY prereq Patrick Steinhardt
2024-05-24 16:31     ` Junio C Hamano
2024-05-24 10:03   ` [PATCH v2 02/21] t: mark a bunch of tests as leak-free Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 03/21] transport-helper: fix leaking helper name Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 04/21] strbuf: fix leak when `appendwholeline()` fails with EOF Patrick Steinhardt
2024-05-25  4:46     ` Jeff King
2024-05-27  6:44       ` Patrick Steinhardt
2024-05-29  9:16         ` Jeff King [this message]
2024-05-29 11:25           ` Patrick Steinhardt
2024-05-30  7:16             ` Jeff King
2024-05-24 10:03   ` [PATCH v2 05/21] checkout: clarify memory ownership in `unique_tracking_name()` Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 06/21] http: refactor code to clarify memory ownership Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 07/21] config: clarify memory ownership in `git_config_pathname()` Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 08/21] diff: refactor code to clarify memory ownership of prefixes Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 09/21] convert: refactor code to clarify ownership of check_roundtrip_encoding Patrick Steinhardt
2024-05-24 10:03   ` [PATCH v2 10/21] builtin/log: stop using globals for log config Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 11/21] builtin/log: stop using globals for format config Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 12/21] config: clarify memory ownership in `git_config_string()` Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 13/21] config: plug various memory leaks Patrick Steinhardt
2024-05-24 10:13     ` Patrick Steinhardt
2024-05-25  4:33     ` Jeff King
2024-05-27  6:46       ` Patrick Steinhardt
2024-05-29  9:20         ` Jeff King
2024-05-24 10:04   ` [PATCH v2 14/21] builtin/credential: clear credential before exit Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 15/21] commit-reach: fix memory leak in `ahead_behind()` Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 16/21] submodule: fix leaking memory for submodule entries Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 17/21] strvec: add functions to replace and remove strings Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 18/21] builtin/mv: refactor `add_slash()` to always return allocated strings Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 19/21] builtin/mv duplicate string list memory Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 20/21] builtin/mv: refactor to use `struct strvec` Patrick Steinhardt
2024-05-24 10:04   ` [PATCH v2 21/21] builtin/mv: fix leaks for submodule gitfile paths Patrick Steinhardt
2024-05-25  2:10   ` [PATCH v2 00/21] Various memory leak fixes Junio C Hamano
2024-05-27  6:44     ` Patrick Steinhardt
2024-05-27 17:38       ` Junio C Hamano
2024-05-27 18:02         ` Junio C Hamano
2024-05-28  5:09         ` Patrick Steinhardt
2024-05-29  8:25       ` Karthik Nayak
2024-05-27 11:45 ` [PATCH v3 " Patrick Steinhardt
2024-05-27 11:45   ` [PATCH v3 01/21] ci: add missing dependency for TTY prereq Patrick Steinhardt
2024-05-27 11:45   ` [PATCH v3 02/21] t: mark a bunch of tests as leak-free Patrick Steinhardt
2024-05-27 11:45   ` [PATCH v3 03/21] transport-helper: fix leaking helper name Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 04/21] strbuf: fix leak when `appendwholeline()` fails with EOF Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 05/21] checkout: clarify memory ownership in `unique_tracking_name()` Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 06/21] http: refactor code to clarify memory ownership Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 07/21] config: clarify memory ownership in `git_config_pathname()` Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 08/21] diff: refactor code to clarify memory ownership of prefixes Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 09/21] convert: refactor code to clarify ownership of check_roundtrip_encoding Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 10/21] builtin/log: stop using globals for log config Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 11/21] builtin/log: stop using globals for format config Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 12/21] config: clarify memory ownership in `git_config_string()` Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 13/21] config: plug various memory leaks Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 14/21] builtin/credential: clear credential before exit Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 15/21] commit-reach: fix memory leak in `ahead_behind()` Patrick Steinhardt
2024-05-27 11:46   ` [PATCH v3 16/21] submodule: fix leaking memory for submodule entries Patrick Steinhardt
2024-05-27 11:47   ` [PATCH v3 17/21] strvec: add functions to replace and remove strings Patrick Steinhardt
2024-05-27 11:47   ` [PATCH v3 18/21] builtin/mv: refactor `add_slash()` to always return allocated strings Patrick Steinhardt
2024-05-27 11:47   ` [PATCH v3 19/21] builtin/mv duplicate string list memory Patrick Steinhardt
2024-05-27 11:47   ` [PATCH v3 20/21] builtin/mv: refactor to use `struct strvec` Patrick Steinhardt
2024-05-27 11:47   ` [PATCH v3 21/21] builtin/mv: fix leaks for submodule gitfile paths Patrick Steinhardt
2024-05-27 17:52   ` [PATCH v3 00/21] Various memory leak fixes Junio C Hamano
2024-05-30  6:38   ` [PATCH 0/5] add-ons for ps/leakfixes Jeff King
2024-05-30  6:39     ` [PATCH 1/5] t-strvec: use va_end() to match va_start() Jeff King
2024-05-30  6:39     ` [PATCH 2/5] t-strvec: mark variable-arg helper with LAST_ARG_MUST_BE_NULL Jeff King
2024-05-30  6:44     ` [PATCH 3/5] mv: move src_dir cleanup to end of cmd_mv() Jeff King
2024-05-30  7:04       ` Patrick Steinhardt
2024-05-30  7:21         ` Jeff King
2024-05-30  7:24           ` Patrick Steinhardt
2024-05-30  8:15             ` Jeff King
2024-05-30  8:19               ` Patrick Steinhardt
2024-05-30  8:28                 ` Jeff King
2024-05-30  6:45     ` [PATCH 4/5] mv: factor out empty src_dir removal Jeff King
2024-05-30  6:46     ` [PATCH 5/5] mv: replace src_dir with a strvec Jeff King
2024-05-30 15:36       ` Junio C Hamano
2024-05-31 11:12         ` Jeff King
2024-05-31 14:56           ` Junio C Hamano
2024-05-30  7:05     ` [PATCH 0/5] add-ons for ps/leakfixes Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240529091633.GB1098944@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ps@pks.im \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).