git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, "René Scharfe" <l.s.r@web.de>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks
Date: Sat, 7 Jan 2023 08:21:26 -0500	[thread overview]
Message-ID: <Y7lx1hUpZ7zOP1Lo@coredump.intra.peff.net> (raw)
In-Reply-To: <221219.867cyn20zj.gmgdl@evledraar.gmail.com>

On Mon, Dec 19, 2022 at 10:20:00AM +0100, Ævar Arnfjörð Bjarmason wrote:

> I hear you, but I also think you're implicitly conflating two things
> here.
> 
> There's the question of whether we should in general optimize for safety
> over more optimila memory use. I.e. if we simply have every strvec,
> string_list etc. own its memory fully we don't need to think as much
> about allocation or ownership.
> 
> I think we should do that in general, but we also have cases where we'd
> like to not do that, e.g. where we're adding thousands of strings to a
> string_list, which are all borrewed from elsewhere, except for a few
> we'd like to xstrdup().
> 
> Such API use *is* tricky, but I think formalizing it as the
> "string_list" does is making it better, not worse. In particular...:

I think the two things are related, though, because "safety" is "people
not mis-using the API". Once you give the API the option to do the
trickier thing, people will do it, and they will do it badly. That has
been my experience with the string-list API (especially that people try
to do clever things with transferring ownership by using a
non-duplicating list and then setting strdup_strings midway through the
function).

> > I would disagree that this hasn't been an issue in practice. A few
> > recent examples:
> >
> >   - 5eeb9aa208 (refs: fix memory leak when parsing hideRefs config,
> >     2022-11-17)
> >   - 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
> >     strings, 2022-09-08)
> >   - 4c81ee9669 (submodule--helper: fix "reference" leak, 2022-09-01)
> 
> ...it's funny that those are the examples I would have dug up to argue
> that this is a good idea, and to add some:
> 
> 	- 4a0479086a9 (commit-graph: fix memory leak in misused
>           string_list API, 2022-03-04)
> 	- b202e51b154 (grep: fix a "path_list" memory leak, 2021-10-22)
> 
> I.e. above you note "in both directions [...] leaks [...] and double
> frees", but these (and the ones I added) are all in the second category.

I almost didn't give a list, because it's hard to dig into all of the
details. For example the second one in my list, which I worked on, did
fix things by consistently setting strdup_strings, and it was "just" a
leak fix. But it took me a full day to untangle the mess of that code,
and there were intermediate states were we did access dangling pointers
before I got to a working order of the series.  And all of it was
complicated by the fact that string_list is configurable, so different
bits of the code expected different things. Even after that commit,
there's a horrible hack to set the strdup field and hope it's enough.

If that were the only time I'd wrestled with string_list, I'd be less
concerned. But (subjectively) this feels like the latest in a long line
of bugs and irritations.

-Peff

  reply	other threads:[~2023-01-07 13:21 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13  6:47 [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-13  8:37 ` Ævar Arnfjörð Bjarmason
2022-12-13 18:31   ` René Scharfe
2022-12-14  8:44     ` Ævar Arnfjörð Bjarmason
2022-12-15  9:11       ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 1/5] builtin/annotate.c: simplify for strvec API Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 2/5] various: add missing strvec_clear() Ævar Arnfjörð Bjarmason
2022-12-15  9:11         ` [RFC PATCH 3/5] strvec API: add a "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 4/5] strvec API users: fix leaks by using "STRVEC_INIT_NODUP" Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-15  9:11         ` [RFC PATCH 5/5] strvec API users: fix more " Ævar Arnfjörð Bjarmason
2022-12-17 12:45           ` René Scharfe
2022-12-17 12:45         ` [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks René Scharfe
2022-12-17 13:13         ` Jeff King
2022-12-19  9:20           ` Ævar Arnfjörð Bjarmason
2023-01-07 13:21             ` Jeff King [this message]
2022-12-17 12:46       ` [PATCH] am: don't pass strvec to apply_parse_options() René Scharfe
2022-12-17 13:24     ` Jeff King
2022-12-17 16:07       ` René Scharfe
2022-12-17 21:53         ` Jeff King
2022-12-18  2:42           ` Junio C Hamano
2022-12-20  1:29         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7lx1hUpZ7zOP1Lo@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).