git@vger.kernel.org list mirror (unofficial, one of many)
From: Jeff King <peff@peff.net>
To: Elijah Newren <newren@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
Date: Mon, 17 Aug 2020 15:00:31 -0400
Message-ID: <20200817190031.GB1278968@coredump.intra.peff.net>
In-Reply-To: <CABPp-BHyCVdb5AueF+tTwTsgAA5LkPEj-mPLoX_F+WYgPqFcNw@mail.gmail.com>

On Mon, Aug 17, 2020 at 10:19:03AM -0700, Elijah Newren wrote:

> > (I also wouldn't be opposed to changing hashmap and oidmap to use the
> > name "clear", but that's obviously a separate patch).
> 
> hashmap is one of the cases that needs to have a free construct,
> because the table in which to stuff the entries has to be allocated
> and thus a hashmap_clear() would have to leave the table allocated if
> it wants to be ready for re-use.  If someone really is done with a
> hashmap, then to avoid leaking, both the entries and the table need to
> be deallocated.

Hmm, you're right. oidmap will lazy-initialize the table, but hashmap
will not. You _do_ have to initialize a hashmap because somebody has to
set the comparison function. But that could be fixed, and would make it
more like the rest of our code. I.e., you should be able to do:

  struct hashmap foo = HASHMAP_INIT(my_cmpfn);

  if (bar)
	return; /* no leak, because we never put anything in the map! */

  ... add some stuff to the map ...

  hashmap_clear(&foo);

  ... now it's empty and nothing allocated; we could return
      here without a leak or we could add more stuff to it ...

I don't even think it would be that big a change. Just translate a NULL
table to "not found" on the read side, and lazily call alloc_table() on
the write side. And have hashmap_free() not zero the _whole_ struct, but
leave the cmpfn in place.
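
To make that concrete, here is a minimal sketch of the lazy-table idea.
It is a toy, not git's hashmap.h: every toy_* name, TOY_MAP_INIT, and
the fixed table size are assumptions made for illustration, and it does
no resizing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for illustration only; not git's actual hashmap API. */
struct toy_entry {
	struct toy_entry *next;
	unsigned int hash;
	const char *key;
};

typedef int (*toy_cmp_fn)(const struct toy_entry *a,
			  const struct toy_entry *b);

struct toy_map {
	struct toy_entry **table;	/* stays NULL until the first add */
	size_t tablesize;
	size_t nr;
	toy_cmp_fn cmpfn;
};

/* Static initializer: only the comparison function is set. */
#define TOY_MAP_INIT(fn) { NULL, 0, 0, (fn) }
#define TOY_INITIAL_SIZE 64

static int toy_key_cmp(const struct toy_entry *a, const struct toy_entry *b)
{
	return strcmp(a->key, b->key);
}

static unsigned int toy_hash(const char *s)
{
	unsigned int h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Read side: a NULL table just means "not found". */
static struct toy_entry *toy_map_get(const struct toy_map *map,
				     const struct toy_entry *key)
{
	struct toy_entry *e;

	if (!map->table)
		return NULL;
	for (e = map->table[key->hash % map->tablesize]; e; e = e->next)
		if (e->hash == key->hash && !map->cmpfn(e, key))
			return e;
	return NULL;
}

/* Write side: allocate the table lazily on first use. */
static void toy_map_add(struct toy_map *map, struct toy_entry *e)
{
	size_t bucket;

	assert(map->cmpfn);
	if (!map->table) {
		map->table = calloc(TOY_INITIAL_SIZE, sizeof(*map->table));
		if (!map->table)
			abort();
		map->tablesize = TOY_INITIAL_SIZE;
	}
	bucket = e->hash % map->tablesize;
	e->next = map->table[bucket];
	map->table[bucket] = e;
	map->nr++;
}

/*
 * Release the table but keep cmpfn, so the map holds no memory and is
 * still ready for re-use. The entries themselves are not freed here.
 */
static void toy_map_clear(struct toy_map *map)
{
	free(map->table);
	map->table = NULL;
	map->tablesize = 0;
	map->nr = 0;
}
```

With this shape, a map that was initialized but never written to owns
no memory, and a cleared map can either be dropped or have more
entries added.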

> I keep getting confused by the hashmap API, and what pieces it frees
> -- it looks like my earlier comments today were wrong and
> hashmap_free_entries() does free the table.  So...perhaps I should
> create a patch to make that clearer, and also submit the patch I've
> had for a while to introduce a hashmap_clear() function (which is
> similar to hashmap_free_entries, in that it frees the entries and
> zeros out most of the map, but it leaves the table allocated and ready
> for use).
> 
> I really wish hashmap_free() did what hashmap_free_entries() did.  So
> annoying and counter-intuitive...

I left some comments in my other reply, but in case you do pursue this:
the obvious thing to have is a free_entries boolean parameter to the
function, so that each caller is clear about what they want. And we used
to have that. But it's awkward because "free the entries" isn't a
boolean anymore; it's "free the entries you can find by moving backwards
to this offset from the hashmap_entry pointer". So callers who don't
want to free them have to pass some sentinel value there. And that's how
we ended up with two separate wrapper functions.
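
The offset-plus-sentinel shape described above can be sketched roughly
like this. All names here (toy_map_free_, TOY_KEEP_ENTRIES, struct
toy_item) are made up for the example and are not meant to mirror the
real declarations; the point is only that one worker function takes an
offset, and two thin wrappers hide the sentinel from callers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy types for illustration only. */
struct toy_entry {
	struct toy_entry *next;
};

struct toy_map {
	struct toy_entry **table;
	size_t tablesize;
};

/* A caller's struct, with the entry embedded away from offset zero. */
struct toy_item {
	int payload;
	struct toy_entry ent;
};

/* Sentinel meaning "the caller keeps ownership of the entries". */
#define TOY_KEEP_ENTRIES ((ptrdiff_t)-1)

/*
 * Free the table. If entry_offset is non-negative, also free each
 * containing struct, found by stepping back entry_offset bytes from
 * the embedded entry pointer. Returns how many entries were freed.
 */
static size_t toy_map_free_(struct toy_map *map, ptrdiff_t entry_offset)
{
	size_t i, freed = 0;

	assert(map);
	for (i = 0; i < map->tablesize; i++) {
		struct toy_entry *e = map->table[i];
		while (e) {
			struct toy_entry *next = e->next;
			if (entry_offset >= 0) {
				free((char *)e - entry_offset);
				freed++;
			}
			e = next;
		}
	}
	free(map->table);
	map->table = NULL;
	map->tablesize = 0;
	return freed;
}

/* The two wrappers, so no caller has to spell out the sentinel. */
#define toy_map_free(map) \
	toy_map_free_((map), TOY_KEEP_ENTRIES)
#define toy_map_free_entries(map, type, member) \
	toy_map_free_((map), (ptrdiff_t)offsetof(type, member))
```

A caller that embeds the entry would write something like
toy_map_free_entries(&map, struct toy_item, ent), while a caller whose
entries live elsewhere would use toy_map_free(&map).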

I think your main complaint is just about the naming though. If we had:

  /* drop all entries, freeing any hashmap-specific memory */
  hashmap_clear();

  /* ditto, but also free the entries themselves */
  hashmap_clear_and_free_entries();

that would be a bit more obvious (though I imagine it would still be
easy to forget that "clear" doesn't free the entries). Another approach
would be to have a flag in the map for "do I own the entry memory". Most
callers are happy to hand off ownership of the entries when they're
added. And it may even be that this would open up the possibility of
more convenience functions on the adding/allocation side. I didn't think
it through carefully, though.
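
For what it's worth, a rough sketch of the ownership-flag idea might
look like the following. This is purely hypothetical: the own_* names,
the two initializer macros, and the single-bucket list (used only to
keep the example short) are all assumptions, and it ignores the
adding/allocation side entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types; a single bucket stands in for the real table. */
struct own_entry {
	struct own_entry *next;
};

struct own_map {
	struct own_entry *head;
	int owns_entries;	/* "do I own the entry memory?" */
	size_t entry_offset;	/* only meaningful when owns_entries is set */
};

struct own_item {
	int payload;
	struct own_entry ent;
};

/* Ownership is decided once, at initialization time. */
#define OWN_MAP_INIT_OWNING(type, member) { NULL, 1, offsetof(type, member) }
#define OWN_MAP_INIT_BORROWING { NULL, 0, 0 }

static void own_map_add(struct own_map *map, struct own_entry *e)
{
	assert(map && e);
	e->next = map->head;
	map->head = e;
}

/*
 * One clear function for every caller: it consults the flag instead
 * of asking the caller which variant they meant. Returns the number
 * of entries it freed.
 */
static size_t own_map_clear(struct own_map *map)
{
	size_t freed = 0;
	struct own_entry *e = map->head;

	while (e) {
		struct own_entry *next = e->next;
		if (map->owns_entries) {
			free((char *)e - map->entry_offset);
			freed++;
		}
		e = next;
	}
	map->head = NULL;
	return freed;
}
```

The trade-off is that the decision moves from each free/clear call
site to the point where the map is set up, which is where callers
handing off ownership would naturally state it.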

-Peff
