git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
Date: Mon, 17 Aug 2020 10:19:03 -0700	[thread overview]
Message-ID: <CABPp-BHyCVdb5AueF+tTwTsgAA5LkPEj-mPLoX_F+WYgPqFcNw@mail.gmail.com> (raw)
In-Reply-To: <20200816091154.GC1221900@coredump.intra.peff.net>

On Sun, Aug 16, 2020 at 2:11 AM Jeff King <peff@peff.net> wrote:
>
> On Sun, Aug 16, 2020 at 06:59:11AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Digging further, I found that despite the pretty clear documentation
> > near the top of dir.h that folks were supposed to call clear_directory()
> > when the user no longer needed the dir_struct, there were four callers
> > that didn't bother doing that at all.  However, two of them clearly
> > thought about leaks since they had an UNLEAK(dir) directive, which to me
> > suggests that the method to free the data was too unclear.  I suspect
> > the non-obviousness of the API and its holes led folks to avoid it,
> > which then snowballed into further problems with the entries[],
> > ignored[], parent_hashmap, and recursive_hashmap problems.
>
> The UNLEAK() ones are sort-of my fault, and are a combination of:
>
>   - The commit adding them says "in some cases (e.g., dir_struct) we
>     don't even have a function which knows how to free all of the struct
>     members". I'm not sure if why I didn't fix clear_directory() as you
>     did. I may not have known about it, or I may have been lazy. Or more
>     likely I was simply holding the UNLEAK() hammer, so it looked like a
>     nail. ;)
>
>   - My focus was on making leak-checker output cleaner. And it wasn't
>     clear for cases where we're about to exit the program whether we
>     should be actually freeing (which has small but non-zero cost) or
>     just annotating (which is zero-cost, but clearly has confused some
>     people about how UNLEAK() was meant to be used). I think I'm leaning
>     these days towards "free if it is easy to do so".
>
> So this definitely seems like a good direction to me.
>
> > Rename clear_directory() to dir_free() to be more in line with other
> > data structures in git, and introduce a dir_init() to handle the
> > suggested memsetting of dir_struct to all zeroes.  I hope that a name
> > like "dir_free()" is more clear, and that the presence of dir_init()
> > will provide a hint to those looking at the code that there may be a
> > corresponding dir_free() that they need to call.
>
> I think having a pair of matched calls is good. I _thought_ we had
> established a pattern that we should prefer "clear" to "free" for cases
> where the struct itself its not freed. But grepping around, I see there
> are a few exceptions (hashmap_free() is the big one, and then
> oidmap_free() which is built around it seems to have inherited it).
>
> The rest seem to follow that pattern, though: attr_check_free,
> cache_tree_free, and submodule_cache_free all actually free the pointer
> passed in. And "git grep '_clear(' *.h" shows lots of
> clear-but-don't-free examples.
>
> Would dir_clear() make the pairing more obvious, but keep the clear
> name?

Sure, that works for me.  The case where having an actual _free()
makes sense to me -- possibly in addition to a _clear() -- is when
some memory has to be allocated before first use, and thus a
foo_clear() would leave that memory allocated.  Then you really do
need a foo_free() for use when the data structure won't be used again.

> (I also wouldn't be opposed to changing hashmap and oidmap to use the
> name "clear", but that's obviously a separate patch).

hashmap is one of the cases that needs to have a free construct,
because the table in which to stuff the entries has to be allocated
and thus a hashmap_clear() would have to leave the table allocated if
it wants to be ready for re-use.  If someone really is done with a
hashmap, then to avoid leaking, both the entries and the table need to
be deallocated.

I keep getting confused by the hashmap API, and what pieces it frees
-- it looks like my earlier comments today were wrong and
hashmap_free_entries() does free the table.  So...perhaps I should
create a patch to make that clearer, and also submit the patch I've
had for a while to introduce a hashmap_clear() function (which is
similar to hashmap_free_entries, in that it frees the entries and
zeros out most of the map, but it leaves the table allocated and ready
for use).

I really wish hashmap_free() did what hashmap_free_entries() did.  So
annoying and counter-intuitive...

> >  builtin/add.c          |  4 ++--
> >  builtin/check-ignore.c |  4 ++--
> >  builtin/clean.c        |  8 ++++----
> >  builtin/grep.c         |  3 ++-
> >  builtin/ls-files.c     |  4 ++--
> >  builtin/stash.c        |  4 ++--
> >  dir.c                  |  7 ++++++-
> >  dir.h                  | 19 ++++++++++---------
> >  merge.c                |  3 ++-
> >  wt-status.c            |  4 ++--
> >  10 files changed, 34 insertions(+), 26 deletions(-)
>
> That patch itself looks good except for two minor points:
>
> >  /* Frees memory within dir which was allocated.  Does not free dir itself. */
> > -void clear_directory(struct dir_struct *dir)
> > +void dir_free(struct dir_struct *dir)
> >  {
> >       int i, j;
> >       struct exclude_list_group *group;
>
> As I mentioned in my previous email, I think it would be nice if this
> called dir_init() at the end, so that the struct is always in a
> consistent state.
>
> > diff --git a/dir.h b/dir.h
> > index 7d76d0644f..7c55c1a460 100644
> > --- a/dir.h
> > +++ b/dir.h
> > [...]
> > - * - Use `dir.entries[]`.
> > + * - Use `dir.entries[]` and `dir.ignored[]`.
> >   *
> >   * - Call `clear_directory()` when the contained elements are no longer in use.
> >   *
>
> This last line should become dir_free() / dir_clear().

I'll fix these up.

Elijah

  reply	other threads:[~2020-08-17 17:27 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
2020-08-16  8:43   ` Jeff King
2020-08-17 16:57     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-16  8:54   ` Jeff King
2020-08-17 16:58     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-16  9:11   ` Jeff King
2020-08-17 17:19     ` Elijah Newren [this message]
2020-08-17 19:00       ` Jeff King
2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-19 13:51   ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BHyCVdb5AueF+tTwTsgAA5LkPEj-mPLoX_F+WYgPqFcNw@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).