git@vger.kernel.org mailing list mirror (one of many)
From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Martin Ågren" <martin.agren@gmail.com>,
	"Andrzej Hunt" <ajrhunt@google.com>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH 04/10] unpack-trees API: don't have clear_unpack_trees_porcelain() reset
Date: Mon, 4 Oct 2021 09:28:18 -0700	[thread overview]
Message-ID: <CABPp-BGN_cyLVRRcz_BfriK5Gw=3mdUoSmePT4qFSmV6uYgJ3Q@mail.gmail.com> (raw)
In-Reply-To: <87lf38n6e4.fsf@evledraar.gmail.com>

On Mon, Oct 4, 2021 at 8:42 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Mon, Oct 04 2021, Elijah Newren wrote:
>
> > On Sun, Oct 3, 2021 at 5:46 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> >>
> >> Change the clear_unpack_trees_porcelain() to be like a *_release()
> >> function, not a *_reset() (in strbuf.c terms). Let's move the only API
> >> user that relied on the latter to doing its own
> >> unpack_trees_options_init(). See the commit that introduced
> >> unpack_trees_options_init() for details on the control flow involved
> >> here.
> >>
> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >> ---
> >>  merge-recursive.c | 1 +
> >>  unpack-trees.c    | 1 -
> >>  2 files changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/merge-recursive.c b/merge-recursive.c
> >> index d24a4903f1d..a77f66b006c 100644
> >> --- a/merge-recursive.c
> >> +++ b/merge-recursive.c
> >> @@ -442,6 +442,7 @@ static void unpack_trees_finish(struct merge_options *opt)
> >>  {
> >>         discard_index(&opt->priv->orig_index);
> >>         clear_unpack_trees_porcelain(&opt->priv->unpack_opts);
> >> +       unpack_trees_options_init(&opt->priv->unpack_opts);
> >
> > This is wrong.  It suggests that unpack_opts is used after
> > unpack_trees_finish() (other than an outer merge first calling
> > unpack_trees_start() again), which can only serve to greatly confuse
> > future readers.  Drop this hunk.
>
> Sure, but (and also re:
> https://lore.kernel.org/git/CABPp-BEA2myh2Np_YpFWnE+jqmT5vz7ohigZ0=2tL-wizgYQmg@mail.gmail.com/)
> if you'd like not initialize things in merge_start() just for good
> measure wouldn't the diff-at-the-end on top of your 5bf7e5779ec
> (merge-recursive: split internal fields into a separate struct,
> 2019-08-17) also make sense?

Sorry, I can't parse this sentence.  Could you retry?

> I.e. the reason I entered this particular rabbit hole was in looking at
> existing members of "struct merge_options_internal" & past commits and
> seeing how we did its initialization. That canary on top passes all our
> tests, and per my reading we also don't use "df_conflict_file_set" until
> as late as the things we setup in unpack_trees_start(). Should those be
> moved to do the post-merge_start() setup at the same time?

It appears df_conflict_file_set has some theoretical memory leaks
(though unlikely in practice, and quite small in the few cases that
could be constructed to trigger them).  Initializing it nearer to use
and freeing it when done (in merge_trees_internal()) would make more
sense, yes.

But merge-recursive.c right now is supposed to be the stable fallback
in case someone runs into an issue with merge-ort.  I'd rather keep it
stable in preparation for deleting it than churn its code
unnecessarily.

> >>  }
> >>
> >>  static int save_files_dirs(const struct object_id *oid,
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index 94767d3f96f..e7365322e82 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -197,7 +197,6 @@ void clear_unpack_trees_porcelain(struct unpack_trees_options *opts)
> >>  {
> >>         strvec_clear(&opts->msgs_to_free);
> >>         dir_clear(&opts->dir);
> >> -       memset(opts->msgs, 0, sizeof(opts->msgs));
> >
> > This seems like a very dangerous change.  You want to leave opts->msgs
> > pointing at freed memory?
>
> Yes, as argued in
> http://lore.kernel.org/git/87bl45niqs.fsf@evledraar.gmail.com; In this
> series we can see that nothing re-uses it, so it's as safe as our
> strbuf_release(), or a plain free().

strbuf_release() sets sb->buf to strbuf_slopbuf, and sets sb->len =
sb->alloc = 0.  The strbuf can thus be reused after calling
strbuf_release().

strvec_clear() also calls strvec_init() afterwards to set the vector
to be usable though 0-sized.

hashmap_clear() also clears out existing data, but makes it ready for
reuse (as per 6da1a25814).

strmap_clear(), strintmap_clear(), and strset_clear() also set up the
data structure for reuse.

There's a longstanding presumption that something named `*_clear()`
will leave its argument usable afterwards.  Rename it to end with
`_free` if you want it to be an analogy to free(), where usage
afterwards would cause use-after-free errors.
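
To make the distinction concrete, here is a minimal sketch of the two
flavors.  Everything in it (struct toy_opts, toy_opts_init(),
toy_opts_set_msg(), toy_opts_clear(), toy_opts_free()) is a
hypothetical stand-in made up for illustration; it is not the
unpack-trees code, only a toy with the same shape: a msgs[] array of
pointers into strings that the struct owns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_MSGS 2

/* Hypothetical toy struct; NOT git's struct unpack_trees_options. */
struct toy_opts {
	const char *msgs[TOY_NR_MSGS]; /* views into owned[] */
	char *owned[TOY_NR_MSGS];      /* heap strings this struct must free */
};

/* Zero every field so the struct is in a known, usable state. */
void toy_opts_init(struct toy_opts *o)
{
	memset(o, 0, sizeof(*o));
}

/* Store a private copy of text in slot i; returns 0 on success, -1 on OOM. */
int toy_opts_set_msg(struct toy_opts *o, size_t i, const char *text)
{
	char *copy;

	assert(o && text && i < TOY_NR_MSGS);
	copy = malloc(strlen(text) + 1);
	if (!copy)
		return -1;
	strcpy(copy, text);
	free(o->owned[i]); /* free(NULL) is a no-op */
	o->owned[i] = copy;
	o->msgs[i] = copy;
	return 0;
}

/* Shared helper: free the owned strings, touch nothing else. */
static void toy_opts_free_owned(struct toy_opts *o)
{
	size_t i;

	for (i = 0; i < TOY_NR_MSGS; i++)
		free(o->owned[i]);
}

/* "_clear" flavor: free, then reinitialize, so the struct can be reused. */
void toy_opts_clear(struct toy_opts *o)
{
	toy_opts_free_owned(o);
	toy_opts_init(o);
}

/*
 * "_free" flavor: free only.  msgs[] and owned[] are left pointing at
 * freed memory, so the caller must not touch the struct again without
 * calling toy_opts_init() first.
 */
void toy_opts_free(struct toy_opts *o)
{
	toy_opts_free_owned(o);
}
```

After toy_opts_clear() every msgs[] slot is NULL and
toy_opts_set_msg() can be called again right away; after
toy_opts_free() the same call would pass a dangling pointer to free().
The only thing that tells a caller which contract applies is the
function name.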

> Maybe I'm misunderstanding what you're getting at, and I could
> understand a "let's just reset it for good measure" POV. But I can't
> square your view that we shouldn't do setup in merge_start() for good
> measure in case some new future code accidentally uses the data earlier
> (which I'm fine with), but then also not finding it OK to skip the
> memset() here ...

No existing caller needs to make use of the fact that it's a `_clear`
function rather than a `_free` function, but if you want to take
advantage of that to do less work, you should both call it out in your
commit message and rename the function.  You didn't do either.  In
fact, your existing commit message mentions strbuf_release(), which
reinforces the `_clear` presumption of reusability and thus makes me
flag the change as dangerous.
