git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "René Scharfe" <l.s.r@web.de>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
Date: Tue, 17 Nov 2020 00:39:02 -0800	[thread overview]
Message-ID: <CABPp-BGNws7mdnBL2j-hvh1fVwxNjJ1scynzUZtMWtEtJTdkDA@mail.gmail.com> (raw)
In-Reply-To: <20201117065304.GA41136@coredump.intra.peff.net>

On Mon, Nov 16, 2020 at 10:53 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Nov 16, 2020 at 08:49:23PM -0800, Elijah Newren wrote:
>
> > > Right, avoiding list.h like below lets Valgrind classify the memory as
> > > "still reachable", without UNLEAK.  Not sure if it's worth it, though.
> > >
> > > Note my somewhat concerning knee-jerk reaction to write some macros. ;)
> >
> > I've got a local patch that changes these to actually be freed as yet
> > another route we could go.  It's mixed with a few (half-baked)
> > revision-related memory cleanups that should be separated, and I'm
> > unsure if there are other places that would need a finalize_globals()
> > call too, but it's another approach to consider...
>
> Here are a few thoughts...
>
> > diff --git a/builtin/fast-rebase.c b/builtin/fast-rebase.c
> > index 19333ca42d..32cab36ad3 100644
> > --- a/builtin/fast-rebase.c
> > +++ b/builtin/fast-rebase.c
> > @@ -173,6 +173,8 @@ int cmd_fast_rebase(int argc, const char **argv, const char *prefix)
> >               last_commit = create_commit(result.tree, commit, last_commit);
> >       }
> >       fprintf(stderr, "\nDone.\n");
> > +     rev_info_free(&revs);
> > +     memset(&revs, 0, sizeof(revs));
>
> This one could easily be an UNLEAK(), since we're presumably exiting
> from cmd_fast_rebase() soon after. But I certainly don't mind a real
> cleanup.
>
> Probably rev_info_free() should leave the rev_info in a reasonable state
> (so callers don't have to re-zero it).
>
> > diff --git a/cache.h b/cache.h
> > index c0072d43b1..b9b543f75b 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -604,6 +604,9 @@ const char *setup_git_directory(void);
> >  char *prefix_path(const char *prefix, int len, const char *path);
> >  char *prefix_path_gently(const char *prefix, int len, int *remaining, const char *path);
> >
> > +/* Release resources related to global config. */
> > +void finalize_globals(void);
>
> Yuck. Remembering to put things in this, and remembering to call it in
> all of the right places seems ugly.
>
> If we want to go this route, then I think individual pieces of code
> could register their own atexit() handlers to clean up. But...
>
> > +void chdir_deregister_all(void)
> > +{
> > +     struct list_head *pos, *tmp;
> > +
> > +     if (list_empty(&chdir_notify_entries))
> > +             return;
> > +
> > +     list_for_each_safe(pos, tmp, &chdir_notify_entries) {
> > +             struct chdir_notify_entry *e =
> > +                     list_entry(pos, struct chdir_notify_entry, list);
> > +             free(e);
> > +     }
> > +     memset(&chdir_notify_entries, 0, sizeof(chdir_notify_entries));
> > +}
>
> I disagree that this is actually a leak. The memory is still referenced
> by the list, just like a ton of other global variables (e.g., quite a
> lot of config or environment variables that are loaded throughout a
> normal program).
>
> So my inclination is to leave referenced globals like this. If there are
> tools that are confused by our linked list structure, then we should
> adjust our tools (either using ASan, or turning off valgrind's "possibly
> lost" report).
>
> (My preference is to use ASan in general, because it's _so_ much
> faster. Running the whole suite under valgrind is pretty painful, and
> ideally we'd eventually hit a point where it was possible to run the
> whole suite with leak-checking enabled).

I'm aware of the differences between "definitely lossed" and "still
reachable".  And yes, the definitely lost are more important to clean
up.  But I'm not convinced that "still reachable" are always okay; I
think they sometimes point to latent bugs and other issues that might
manifest as leaks with slightly different workloads.  I also think
that sometimes these are easier to track down than some of the actual
leaks, but the work to free them and their associated structures will
naturally clear up some of the actual leaks.  Neither of those two
cases apply here, of course, because we've done the auditing to know
why these particular "possibly lost"/"still reachable" references
exist.  But if you have a lot of noise in the "possibly lost" and
"still reachable" categories, it makes it unlikely you'll want to go
through the remainder to find other useful signal.

All that said, I've been sitting on this and a bunch of other patches
for months, because I agree with some of your comments on the ugliness
of the code, and because when I started to go down the rabbit hole for
cleaning up rev_info I found it was not only ugly but also a very deep
hole...and it was measurably slowing down my fast-rebase testcases too
which was demotivating.


Running the whole testsuite under ASan sounds great, btw.

> > diff --git a/git.c b/git.c
> > index af84f11e69..4b04988909 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -358,6 +358,7 @@ static int handle_alias(int *argcp, const char ***argv)
> >                       trace2_cmd_name("_run_shell_alias_");
> >
> >                       ret = run_command(&child);
> > +                     finalize_globals();
> >                       if (ret >= 0)   /* normal exit */
> >                               exit(ret);
>
> I thought common-main.c might be a better place to put something like
> this, but any time we exit() it gets weird (of course, atexit() would
> take care of this).
>
> OTOH, if we call exit() then most things are still on the stack (or in
> globals), so we're not actually leaking those things. I think the root
> of this chdir-notify issue is not that we don't still have a handle to
> the memory, but that the linked list confuses valgrind.  But the same
> would be true if we were holding another linked list (or hashmap) in a
> stack variable, referenced via the heap, etc, and called exit().
>
> > +void rev_info_free(struct rev_info *revs)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < revs->cmdline.nr; i++)
> > +             free((char*)revs->cmdline.rev[i].name);
> > +     free(revs->cmdline.rev);
> > +}
>
> I suspect there's quite a bit more needed here (René already found a
> case where pathspecs needed to be freed). But I don't mind if we start
> small and add to it as leak-checking notices items.

Yeah, I had other patches that added on to it; I didn't have tests
quite passing so I #ifdef'd out some blocks while testing to return
some leaks but undo some of the breaks.  Anyway, I had something that
looked like

+void rev_info_free(struct rev_info *revs)
+{
+       int i;
+
+#if 0
+       if (revs->commits)
+               free_commit_list(revs->commits);
+#endif
+
+       object_array_clear(&revs->boundary_commits);
+
+       for (i = 0; i < revs->cmdline.nr; i++) {
+#if 0
+               /* This block causes double free for 'git fetch origin' */
+               struct object *obj = revs->cmdline.rev[i].item;
+               if (obj->type == OBJ_COMMIT)
+                       free_commit_list(((struct commit *)obj)->parents);
+#endif
+               free((char*)revs->cmdline.rev[i].name);
+       }
+       free(revs->cmdline.rev);
+
+       clear_pathspec(&revs->prune_data);
+       clear_pathspec(&revs->pruning.pathspec);
+       if (revs->graph)
+               graph_free(revs->graph);
+       free(revs->graph);
+       clear_pathspec(&revs->diffopt.pathspec);
+
+       if (revs->reflog_info)
+               reflog_walk_free(&revs->reflog_info);
+
+       clear_decorations(&revs->children, 1);
+       clear_decorations(&revs->merge_simplification, 1);
+       clear_decorations(&revs->treesame, 1);
+       free_saved_parents(revs);
+       free(revs->saved_parents_slab);
+}

but, not only was it not passing tests, I knew there were other things
to free still that I hadn't gotten too.  Also, you'll note that a
number of functions listed above don't exist; I had to add them.


So yeah, trying to clean up rev_info will be a bit of work.  I'd be
happy to hand off what I do have if anyone is interested; I doubt I'll
get back to it soon.

  reply	other threads:[~2020-11-17  8:41 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14 21:40 [PATCH] chdir-notify: UNLEAK registrated callback entries René Scharfe
2020-11-14 21:53 ` René Scharfe
2020-11-16 21:59   ` Junio C Hamano
2020-11-17  4:49   ` Elijah Newren
2020-11-17  6:53     ` Jeff King
2020-11-17  8:39       ` Elijah Newren [this message]
2020-11-17  0:24 ` Jeff King
2020-11-17 18:45   ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGNws7mdnBL2j-hvh1fVwxNjJ1scynzUZtMWtEtJTdkDA@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).