git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] chdir-notify: UNLEAK registrated callback entries
@ 2020-11-14 21:40 René Scharfe
  2020-11-14 21:53 ` René Scharfe
  2020-11-17  0:24 ` Jeff King
  0 siblings, 2 replies; 8+ messages in thread
From: René Scharfe @ 2020-11-14 21:40 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Jeff King

chdir_notify_register() allows registering functions to notify when
chdir() is called.  There is no way to unsubscribe or shut this
mechanism down, so these entries are present until the program ends.

Valgrind reports allocations for these registrations as "possibly lost",
probably because it doesn't see through list.h's offsetof tricks.
Annotate them using UNLEAK, which causes Valgrind to report them as
"still reachable" instead.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 chdir-notify.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/chdir-notify.c b/chdir-notify.c
index 5f7f2c2ac2..b236288416 100644
--- a/chdir-notify.c
+++ b/chdir-notify.c
@@ -16,6 +16,7 @@ void chdir_notify_register(const char *name,
 			   void *data)
 {
 	struct chdir_notify_entry *e = xmalloc(sizeof(*e));
+	UNLEAK(e);
 	e->name = name;
 	e->cb = cb;
 	e->data = data;
--
2.29.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
  2020-11-14 21:40 [PATCH] chdir-notify: UNLEAK registrated callback entries René Scharfe
@ 2020-11-14 21:53 ` René Scharfe
  2020-11-16 21:59   ` Junio C Hamano
  2020-11-17  4:49   ` Elijah Newren
  2020-11-17  0:24 ` Jeff King
  1 sibling, 2 replies; 8+ messages in thread
From: René Scharfe @ 2020-11-14 21:53 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Jeff King

Am 14.11.20 um 22:40 schrieb René Scharfe:
> chdir_notify_register() allows registering functions to notify when
> chdir() is called.  There is no way to unsubscribe or shut this
> mechanism down, so these entries are present until the program ends.
>
> Valgrind reports allocations for these registrations as "possibly lost",
> probably because it doesn't see through list.h's offsetof tricks.

Right, avoiding list.h like below lets Valgrind classify the memory as
"still reachable", without UNLEAK.  Not sure if it's worth it, though.

Note my somewhat concerning knee-jerk reaction to write some macros. ;)

---
 chdir-notify.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/chdir-notify.c b/chdir-notify.c
index b236288416..4ce1f1120b 100644
--- a/chdir-notify.c
+++ b/chdir-notify.c
@@ -1,26 +1,32 @@
 #include "cache.h"
 #include "chdir-notify.h"
-#include "list.h"
 #include "strbuf.h"

 struct chdir_notify_entry {
 	const char *name;
 	chdir_notify_callback cb;
 	void *data;
-	struct list_head list;
 };
-static LIST_HEAD(chdir_notify_entries);
+
+#define VECTOR_TYPE(elemtype) struct { elemtype *v; size_t alloc, nr; }
+#define VECTOR_FOR_EACH(x, p) for (p = (x)->v; p < (x)->v + (x)->nr; p++)
+#define VECTOR_NEW_ENTRY(x, p) do {			\
+	ALLOC_GROW_BY((x)->v, (x)->nr, 1, (x)->alloc);	\
+	p = (x)->v + (x)->nr - 1;			\
+} while (0)
+
+static VECTOR_TYPE(struct chdir_notify_entry) chdir_notify_entries;

 void chdir_notify_register(const char *name,
 			   chdir_notify_callback cb,
 			   void *data)
 {
-	struct chdir_notify_entry *e = xmalloc(sizeof(*e));
-	UNLEAK(e);
+	struct chdir_notify_entry *e;
+
+	VECTOR_NEW_ENTRY(&chdir_notify_entries, e);
 	e->name = name;
 	e->cb = cb;
 	e->data = data;
-	list_add_tail(&e->list, &chdir_notify_entries);
 }

 static void reparent_cb(const char *name,
@@ -52,7 +58,7 @@ void chdir_notify_reparent(const char *name, char **path)
 int chdir_notify(const char *new_cwd)
 {
 	struct strbuf old_cwd = STRBUF_INIT;
-	struct list_head *pos;
+	struct chdir_notify_entry *e;

 	if (strbuf_getcwd(&old_cwd) < 0)
 		return -1;
@@ -67,11 +73,8 @@ int chdir_notify(const char *new_cwd)
 			 "setup: chdir from '%s' to '%s'",
 			 old_cwd.buf, new_cwd);

-	list_for_each(pos, &chdir_notify_entries) {
-		struct chdir_notify_entry *e =
-			list_entry(pos, struct chdir_notify_entry, list);
+	VECTOR_FOR_EACH(&chdir_notify_entries, e)
 		e->cb(e->name, old_cwd.buf, new_cwd, e->data);
-	}

 	strbuf_release(&old_cwd);
 	return 0;
--
2.29.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
  2020-11-14 21:53 ` René Scharfe
@ 2020-11-16 21:59   ` Junio C Hamano
  2020-11-17  4:49   ` Elijah Newren
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2020-11-16 21:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git Mailing List, Jeff King

René Scharfe <l.s.r@web.de> writes:

> Right, avoiding list.h like below lets Valgrind classify the memory as
> "still reachable", without UNLEAK.  Not sure if it's worth it, though.
>
> Note my somewhat concerning knee-jerk reaction to write some macros. ;)

We have used such 3-tuple of <vector, alloc, nr> in many places from
very early time, and these macros might be worth using if we were to
rewrite all of them.

As to this particular instance, given that the entries is either a
list or a vector of at most one element in practice, I do not think
it matters very much either way.  Just a single UNLEAK is certainly
simpler ;-)

Thanks.




>  struct chdir_notify_entry {
>  	const char *name;
>  	chdir_notify_callback cb;
>  	void *data;
> -	struct list_head list;
>  };
> -static LIST_HEAD(chdir_notify_entries);
> +
> +#define VECTOR_TYPE(elemtype) struct { elemtype *v; size_t alloc, nr; }
> +#define VECTOR_FOR_EACH(x, p) for (p = (x)->v; p < (x)->v + (x)->nr; p++)
> +#define VECTOR_NEW_ENTRY(x, p) do {			\
> +	ALLOC_GROW_BY((x)->v, (x)->nr, 1, (x)->alloc);	\
> +	p = (x)->v + (x)->nr - 1;			\
> +} while (0)
> +
> +static VECTOR_TYPE(struct chdir_notify_entry) chdir_notify_entries;
>
>  void chdir_notify_register(const char *name,
>  			   chdir_notify_callback cb,
>  			   void *data)
>  {
> -	struct chdir_notify_entry *e = xmalloc(sizeof(*e));
> -	UNLEAK(e);
> +	struct chdir_notify_entry *e;
> +
> +	VECTOR_NEW_ENTRY(&chdir_notify_entries, e);
>  	e->name = name;
>  	e->cb = cb;
>  	e->data = data;
> -	list_add_tail(&e->list, &chdir_notify_entries);
>  }
>
>  static void reparent_cb(const char *name,
> @@ -52,7 +58,7 @@ void chdir_notify_reparent(const char *name, char **path)
>  int chdir_notify(const char *new_cwd)
>  {
>  	struct strbuf old_cwd = STRBUF_INIT;
> -	struct list_head *pos;
> +	struct chdir_notify_entry *e;
>
>  	if (strbuf_getcwd(&old_cwd) < 0)
>  		return -1;
> @@ -67,11 +73,8 @@ int chdir_notify(const char *new_cwd)
>  			 "setup: chdir from '%s' to '%s'",
>  			 old_cwd.buf, new_cwd);
>
> -	list_for_each(pos, &chdir_notify_entries) {
> -		struct chdir_notify_entry *e =
> -			list_entry(pos, struct chdir_notify_entry, list);
> +	VECTOR_FOR_EACH(&chdir_notify_entries, e)
>  		e->cb(e->name, old_cwd.buf, new_cwd, e->data);
> -	}
>
>  	strbuf_release(&old_cwd);
>  	return 0;
> --
> 2.29.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
  2020-11-14 21:40 [PATCH] chdir-notify: UNLEAK registrated callback entries René Scharfe
  2020-11-14 21:53 ` René Scharfe
@ 2020-11-17  0:24 ` Jeff King
  2020-11-17 18:45   ` René Scharfe
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2020-11-17  0:24 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git Mailing List, Junio C Hamano

On Sat, Nov 14, 2020 at 10:40:01PM +0100, René Scharfe wrote:

> chdir_notify_register() allows registering functions to notify when
> chdir() is called.  There is no way to unsubscribe or shut this
> mechanism down, so these entries are present until the program ends.
> 
> Valgrind reports allocations for these registrations as "possibly lost",
> probably because it doesn't see through list.h's offsetof tricks.
> Annotate them using UNLEAK, which causes Valgrind to report them as
> "still reachable" instead.

I can't say I'm excited to see UNLEAK used here. It was really intended
for items going out of scope that weren't worth cleaning up. But here
we're papering over a failure in the memory checking tool for something
that _is_ in scope.

I guess I'm not too surprised that valgrind has trouble with list.h. We
have pointers into a heap-allocated block, but not the start of it.
Curiously, ASan/LSan get this case right. So my first instinct is: use
those tools, they're better. :)

If we did want to paper over this case for valgrind, I think this is a
better way to do so:

diff --git a/chdir-notify.c b/chdir-notify.c
index 5f7f2c2ac2..ddfe703b1a 100644
--- a/chdir-notify.c
+++ b/chdir-notify.c
@@ -4,10 +4,10 @@
 #include "strbuf.h"
 
 struct chdir_notify_entry {
+	struct list_head list;
 	const char *name;
 	chdir_notify_callback cb;
 	void *data;
-	struct list_head list;
 };
 static LIST_HEAD(chdir_notify_entries);
 

I also wonder if valgrind _is_ aware of the distinction, and that's why
these show up as only "possibly lost". And indeed, the faq[1] says:

 - "possibly lost" means your program is leaking memory, unless you're
   doing unusual things with pointers that could cause them to point
   into the middle of an allocated block; see the user manual for some
   possible causes. Use --show-possibly-lost=no if you don't want to see
   these reports.

and the user manual[2] has a more elaborate example that calls these
"interior pointers". So I think that's exactly what is going on here.

But then I'm not sure why we'd want this patch. List pointers (and now
hashmap entries, which also contain a linked-list chain) are used in
lots of data structures. Fixing this one case manually is not that
interesting. If we're going to use valgrind, we probably need to accept
that its "possibly lost" distinction is not useful for our code and turn
it off.

-Peff

[1] https://valgrind.org/docs/manual/faq.html#faq.deflost

[2] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
  2020-11-14 21:53 ` René Scharfe
  2020-11-16 21:59   ` Junio C Hamano
@ 2020-11-17  4:49   ` Elijah Newren
  2020-11-17  6:53     ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Elijah Newren @ 2020-11-17  4:49 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, Junio C Hamano, Jeff King, Elijah Newren

Hi,

On Sat, Nov 14, 2020 at 1:55 PM René Scharfe <l.s.r@web.de> wrote:
>
> Am 14.11.20 um 22:40 schrieb René Scharfe:
> > chdir_notify_register() allows registering functions to notify when
> > chdir() is called.  There is no way to unsubscribe or shut this
> > mechanism down, so these entries are present until the program ends.
> >
> > Valgrind reports allocations for these registrations as "possibly lost",
> > probably because it doesn't see through list.h's offsetof tricks.
>
> Right, avoiding list.h like below lets Valgrind classify the memory as
> "still reachable", without UNLEAK.  Not sure if it's worth it, though.
>
> Note my somewhat concerning knee-jerk reaction to write some macros. ;)

I've got a local patch that changes these to actually be freed as yet
another route we could go.  It's mixed with a few (half-baked)
revision-related memory cleanups that should be separated, and I'm
unsure if there are other places that would need a finalize_globals()
call too, but it's another approach to consider...


-- >8 --
Subject: [RFC PATCH] Make merge-ort run leak check free

Tweaks to revision walking, adding a chdir_deregister_all() (so that all
watchers can be cleared as well as free'ing their associated memory),
and maybe other stuff.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/fast-rebase.c |  2 ++
 cache.h               |  3 +++
 chdir-notify.c        | 15 +++++++++++++++
 chdir-notify.h        |  5 +++++
 git.c                 | 10 ++++++++--
 revision.c            | 11 +++++++++++
 revision.h            |  6 ++++++
 setup.c               |  5 +++++
 8 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/builtin/fast-rebase.c b/builtin/fast-rebase.c
index 19333ca42d..32cab36ad3 100644
--- a/builtin/fast-rebase.c
+++ b/builtin/fast-rebase.c
@@ -173,6 +173,8 @@ int cmd_fast_rebase(int argc, const char **argv, const char *prefix)
 		last_commit = create_commit(result.tree, commit, last_commit);
 	}
 	fprintf(stderr, "\nDone.\n");
+	rev_info_free(&revs);
+	memset(&revs, 0, sizeof(revs));
 
 	merge_switch_to_result(&merge_opt, head_tree, &result, 1, !result.clean);
 
diff --git a/cache.h b/cache.h
index c0072d43b1..b9b543f75b 100644
--- a/cache.h
+++ b/cache.h
@@ -604,6 +604,9 @@ const char *setup_git_directory(void);
 char *prefix_path(const char *prefix, int len, const char *path);
 char *prefix_path_gently(const char *prefix, int len, int *remaining, const char *path);
 
+/* Release resources related to global config. */
+void finalize_globals(void);
+
 /*
  * Concatenate "prefix" (if len is non-zero) and "path", with no
  * connecting characters (so "prefix" should end with a "/").
diff --git a/chdir-notify.c b/chdir-notify.c
index 5f7f2c2ac2..ea7e52cfea 100644
--- a/chdir-notify.c
+++ b/chdir-notify.c
@@ -91,3 +91,18 @@ char *reparent_relative_path(const char *old_cwd,
 
 	return ret;
 }
+
+void chdir_deregister_all(void)
+{
+	struct list_head *pos, *tmp;
+
+	if (list_empty(&chdir_notify_entries))
+		return;
+
+	list_for_each_safe(pos, tmp, &chdir_notify_entries) {
+		struct chdir_notify_entry *e =
+			list_entry(pos, struct chdir_notify_entry, list);
+		free(e);
+	}
+	memset(&chdir_notify_entries, 0, sizeof(chdir_notify_entries));
+}
diff --git a/chdir-notify.h b/chdir-notify.h
index 366e4c1ee9..0cd3e1f633 100644
--- a/chdir-notify.h
+++ b/chdir-notify.h
@@ -70,4 +70,9 @@ char *reparent_relative_path(const char *old_cwd,
 			     const char *new_cwd,
 			     const char *path);
 
+/*
+ * Deregister all subscribers and free any associated memory.
+ */
+void chdir_deregister_all(void);
+
 #endif /* CHDIR_NOTIFY_H */
diff --git a/git.c b/git.c
index af84f11e69..4b04988909 100644
--- a/git.c
+++ b/git.c
@@ -358,6 +358,7 @@ static int handle_alias(int *argcp, const char ***argv)
 			trace2_cmd_name("_run_shell_alias_");
 
 			ret = run_command(&child);
+			finalize_globals();
 			if (ret >= 0)   /* normal exit */
 				exit(ret);
 
@@ -697,8 +698,11 @@ static void handle_builtin(int argc, const char **argv)
 	}
 
 	builtin = get_builtin(cmd);
-	if (builtin)
-		exit(run_builtin(builtin, argc, argv));
+	if (builtin) {
+		int result = run_builtin(builtin, argc, argv);
+		finalize_globals();
+		exit(result);
+	}
 	strvec_clear(&args);
 }
 
@@ -742,6 +746,7 @@ static void execv_dashed_external(const char **argv)
 	 * generic string as our trace2 command verb to indicate that we
 	 * launched a dashed command.
 	 */
+	finalize_globals();
 	if (status >= 0)
 		exit(status);
 	else if (errno != ENOENT)
@@ -796,6 +801,7 @@ static int run_argv(int *argcp, const char ***argv)
 			 */
 			i = run_command_v_opt_tr2(args.v, RUN_SILENT_EXEC_FAILURE |
 						  RUN_CLEAN_ON_EXIT | RUN_WAIT_AFTER_CLEAN, "git_alias");
+			finalize_globals();
 			if (i >= 0 || errno != ENOENT)
 				exit(i);
 			die("could not execute builtin %s", **argv);
diff --git a/revision.c b/revision.c
index c6e169e3eb..b4f276d854 100644
--- a/revision.c
+++ b/revision.c
@@ -1419,6 +1419,8 @@ static int limit_list(struct rev_info *revs)
 			slop = still_interesting(list, date, slop, &interesting_cache);
 			if (slop)
 				continue;
+			free_commit_list(list);
+			revs->commits = NULL;
 			break;
 		}
 		if (revs->min_age != -1 && (commit->date > revs->min_age) &&
@@ -3281,6 +3283,15 @@ static void set_children(struct rev_info *revs)
 	}
 }
 
+void rev_info_free(struct rev_info *revs)
+{
+	int i;
+
+	for (i = 0; i < revs->cmdline.nr; i++)
+		free((char*)revs->cmdline.rev[i].name);
+	free(revs->cmdline.rev);
+}
+
 void reset_revision_walk(void)
 {
 	clear_object_flags(SEEN | ADDED | SHOWN | TOPO_WALK_EXPLORED | TOPO_WALK_INDEGREE);
diff --git a/revision.h b/revision.h
index f6bf860d19..5dd98f2f6d 100644
--- a/revision.h
+++ b/revision.h
@@ -362,6 +362,12 @@ void repo_init_revisions(struct repository *r,
 int setup_revisions(int argc, const char **argv, struct rev_info *revs,
 		    struct setup_revision_opt *);
 
+/**
+ * Free information associated with a rev_info once no longer needed.
+ * Will not free revs itself.
+ */
+void rev_info_free(struct rev_info *revs);
+
 void parse_revision_opt(struct rev_info *revs, struct parse_opt_ctx_t *ctx,
 			const struct option *options,
 			const char * const usagestr[]);
diff --git a/setup.c b/setup.c
index c04cd25a30..cd958f5f5c 100644
--- a/setup.c
+++ b/setup.c
@@ -1411,6 +1411,11 @@ void sanitize_stdfds(void)
 		close(fd);
 }
 
+void finalize_globals(void)
+{
+	chdir_deregister_all();
+}
+
 int daemonize(void)
 {
 #ifdef NO_POSIX_GOODIES
-- 
2.29.2.529.gc090dab8c5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
  2020-11-17  4:49   ` Elijah Newren
@ 2020-11-17  6:53     ` Jeff King
  2020-11-17  8:39       ` Elijah Newren
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2020-11-17  6:53 UTC (permalink / raw)
  To: Elijah Newren; +Cc: René Scharfe, Git Mailing List, Junio C Hamano

On Mon, Nov 16, 2020 at 08:49:23PM -0800, Elijah Newren wrote:

> > Right, avoiding list.h like below lets Valgrind classify the memory as
> > "still reachable", without UNLEAK.  Not sure if it's worth it, though.
> >
> > Note my somewhat concerning knee-jerk reaction to write some macros. ;)
> 
> I've got a local patch that changes these to actually be freed as yet
> another route we could go.  It's mixed with a few (half-baked)
> revision-related memory cleanups that should be separated, and I'm
> unsure if there are other places that would need a finalize_globals()
> call too, but it's another approach to consider...

Here are a few thoughts...

> diff --git a/builtin/fast-rebase.c b/builtin/fast-rebase.c
> index 19333ca42d..32cab36ad3 100644
> --- a/builtin/fast-rebase.c
> +++ b/builtin/fast-rebase.c
> @@ -173,6 +173,8 @@ int cmd_fast_rebase(int argc, const char **argv, const char *prefix)
>  		last_commit = create_commit(result.tree, commit, last_commit);
>  	}
>  	fprintf(stderr, "\nDone.\n");
> +	rev_info_free(&revs);
> +	memset(&revs, 0, sizeof(revs));

This one could easily be an UNLEAK(), since we're presumably exiting
from cmd_fast_rebase() soon after. But I certainly don't mind a real
cleanup.

Probably rev_info_free() should leave the rev_info in a reasonable state
(so callers don't have to re-zero it).

> diff --git a/cache.h b/cache.h
> index c0072d43b1..b9b543f75b 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -604,6 +604,9 @@ const char *setup_git_directory(void);
>  char *prefix_path(const char *prefix, int len, const char *path);
>  char *prefix_path_gently(const char *prefix, int len, int *remaining, const char *path);
>  
> +/* Release resources related to global config. */
> +void finalize_globals(void);

Yuck. Remembering to put things in this, and remembering to call it in
all of the right places seems ugly.

If we want to go this route, then I think individual pieces of code
could register their own atexit() handlers to clean up. But...

> +void chdir_deregister_all(void)
> +{
> +	struct list_head *pos, *tmp;
> +
> +	if (list_empty(&chdir_notify_entries))
> +		return;
> +
> +	list_for_each_safe(pos, tmp, &chdir_notify_entries) {
> +		struct chdir_notify_entry *e =
> +			list_entry(pos, struct chdir_notify_entry, list);
> +		free(e);
> +	}
> +	memset(&chdir_notify_entries, 0, sizeof(chdir_notify_entries));
> +}

I disagree that this is actually a leak. The memory is still referenced
by the list, just like a ton of other global variables (e.g., quite a
lot of config or environment variables that are loaded throughout a
normal program).

So my inclination is to leave referenced globals like this. If there are
tools that are confused by our linked list structure, then we should
adjust our tools (either using ASan, or turning off valgrind's "possibly
lost" report).

(My preference is to use ASan in general, because it's _so_ much
faster. Running the whole suite under valgrind is pretty painful, and
ideally we'd eventually hit a point where it was possible to run the
whole suite with leak-checking enabled).

> diff --git a/git.c b/git.c
> index af84f11e69..4b04988909 100644
> --- a/git.c
> +++ b/git.c
> @@ -358,6 +358,7 @@ static int handle_alias(int *argcp, const char ***argv)
>  			trace2_cmd_name("_run_shell_alias_");
>  
>  			ret = run_command(&child);
> +			finalize_globals();
>  			if (ret >= 0)   /* normal exit */
>  				exit(ret);

I thought common-main.c might be a better place to put something like
this, but any time we exit() it gets weird (of course, atexit() would
take care of this).

OTOH, if we call exit() then most things are still on the stack (or in
globals), so we're not actually leaking those things. I think the root
of this chdir-notify issue is not that we don't still have a handle to
the memory, but that the linked list confuses valgrind.  But the same
would be true if we were holding another linked list (or hashmap) in a
stack variable, referenced via the heap, etc, and called exit().

> +void rev_info_free(struct rev_info *revs)
> +{
> +	int i;
> +
> +	for (i = 0; i < revs->cmdline.nr; i++)
> +		free((char*)revs->cmdline.rev[i].name);
> +	free(revs->cmdline.rev);
> +}

I suspect there's quite a bit more needed here (René already found a
case where pathspecs needed to be freed). But I don't mind if we start
small and add to it as leak-checking notices items.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
  2020-11-17  6:53     ` Jeff King
@ 2020-11-17  8:39       ` Elijah Newren
  0 siblings, 0 replies; 8+ messages in thread
From: Elijah Newren @ 2020-11-17  8:39 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Git Mailing List, Junio C Hamano

On Mon, Nov 16, 2020 at 10:53 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Nov 16, 2020 at 08:49:23PM -0800, Elijah Newren wrote:
>
> > > Right, avoiding list.h like below lets Valgrind classify the memory as
> > > "still reachable", without UNLEAK.  Not sure if it's worth it, though.
> > >
> > > Note my somewhat concerning knee-jerk reaction to write some macros. ;)
> >
> > I've got a local patch that changes these to actually be freed as yet
> > another route we could go.  It's mixed with a few (half-baked)
> > revision-related memory cleanups that should be separated, and I'm
> > unsure if there are other places that would need a finalize_globals()
> > call too, but it's another approach to consider...
>
> Here are a few thoughts...
>
> > diff --git a/builtin/fast-rebase.c b/builtin/fast-rebase.c
> > index 19333ca42d..32cab36ad3 100644
> > --- a/builtin/fast-rebase.c
> > +++ b/builtin/fast-rebase.c
> > @@ -173,6 +173,8 @@ int cmd_fast_rebase(int argc, const char **argv, const char *prefix)
> >               last_commit = create_commit(result.tree, commit, last_commit);
> >       }
> >       fprintf(stderr, "\nDone.\n");
> > +     rev_info_free(&revs);
> > +     memset(&revs, 0, sizeof(revs));
>
> This one could easily be an UNLEAK(), since we're presumably exiting
> from cmd_fast_rebase() soon after. But I certainly don't mind a real
> cleanup.
>
> Probably rev_info_free() should leave the rev_info in a reasonable state
> (so callers don't have to re-zero it).
>
> > diff --git a/cache.h b/cache.h
> > index c0072d43b1..b9b543f75b 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -604,6 +604,9 @@ const char *setup_git_directory(void);
> >  char *prefix_path(const char *prefix, int len, const char *path);
> >  char *prefix_path_gently(const char *prefix, int len, int *remaining, const char *path);
> >
> > +/* Release resources related to global config. */
> > +void finalize_globals(void);
>
> Yuck. Remembering to put things in this, and remembering to call it in
> all of the right places seems ugly.
>
> If we want to go this route, then I think individual pieces of code
> could register their own atexit() handlers to clean up. But...
>
> > +void chdir_deregister_all(void)
> > +{
> > +     struct list_head *pos, *tmp;
> > +
> > +     if (list_empty(&chdir_notify_entries))
> > +             return;
> > +
> > +     list_for_each_safe(pos, tmp, &chdir_notify_entries) {
> > +             struct chdir_notify_entry *e =
> > +                     list_entry(pos, struct chdir_notify_entry, list);
> > +             free(e);
> > +     }
> > +     memset(&chdir_notify_entries, 0, sizeof(chdir_notify_entries));
> > +}
>
> I disagree that this is actually a leak. The memory is still referenced
> by the list, just like a ton of other global variables (e.g., quite a
> lot of config or environment variables that are loaded throughout a
> normal program).
>
> So my inclination is to leave referenced globals like this. If there are
> tools that are confused by our linked list structure, then we should
> adjust our tools (either using ASan, or turning off valgrind's "possibly
> lost" report).
>
> (My preference is to use ASan in general, because it's _so_ much
> faster. Running the whole suite under valgrind is pretty painful, and
> ideally we'd eventually hit a point where it was possible to run the
> whole suite with leak-checking enabled).

I'm aware of the differences between "definitely lossed" and "still
reachable".  And yes, the definitely lost are more important to clean
up.  But I'm not convinced that "still reachable" are always okay; I
think they sometimes point to latent bugs and other issues that might
manifest as leaks with slightly different workloads.  I also think
that sometimes these are easier to track down than some of the actual
leaks, but the work to free them and their associated structures will
naturally clear up some of the actual leaks.  Neither of those two
cases apply here, of course, because we've done the auditing to know
why these particular "possibly lost"/"still reachable" references
exist.  But if you have a lot of noise in the "possibly lost" and
"still reachable" categories, it makes it unlikely you'll want to go
through the remainder to find other useful signal.

All that said, I've been sitting on this and a bunch of other patches
for months, because I agree with some of your comments on the ugliness
of the code, and because when I started to go down the rabbit hole for
cleaning up rev_info I found it was not only ugly but also a very deep
hole...and it was measurably slowing down my fast-rebase testcases too
which was demotivating.


Running the whole testsuite under ASan sounds great, btw.

> > diff --git a/git.c b/git.c
> > index af84f11e69..4b04988909 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -358,6 +358,7 @@ static int handle_alias(int *argcp, const char ***argv)
> >                       trace2_cmd_name("_run_shell_alias_");
> >
> >                       ret = run_command(&child);
> > +                     finalize_globals();
> >                       if (ret >= 0)   /* normal exit */
> >                               exit(ret);
>
> I thought common-main.c might be a better place to put something like
> this, but any time we exit() it gets weird (of course, atexit() would
> take care of this).
>
> OTOH, if we call exit() then most things are still on the stack (or in
> globals), so we're not actually leaking those things. I think the root
> of this chdir-notify issue is not that we don't still have a handle to
> the memory, but that the linked list confuses valgrind.  But the same
> would be true if we were holding another linked list (or hashmap) in a
> stack variable, referenced via the heap, etc, and called exit().
>
> > +void rev_info_free(struct rev_info *revs)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < revs->cmdline.nr; i++)
> > +             free((char*)revs->cmdline.rev[i].name);
> > +     free(revs->cmdline.rev);
> > +}
>
> I suspect there's quite a bit more needed here (René already found a
> case where pathspecs needed to be freed). But I don't mind if we start
> small and add to it as leak-checking notices items.

Yeah, I had other patches that added on to it; I didn't have tests
quite passing so I #ifdef'd out some blocks while testing to return
some leaks but undo some of the breaks.  Anyway, I had something that
looked like

+void rev_info_free(struct rev_info *revs)
+{
+       int i;
+
+#if 0
+       if (revs->commits)
+               free_commit_list(revs->commits);
+#endif
+
+       object_array_clear(&revs->boundary_commits);
+
+       for (i = 0; i < revs->cmdline.nr; i++) {
+#if 0
+               /* This block causes double free for 'git fetch origin' */
+               struct object *obj = revs->cmdline.rev[i].item;
+               if (obj->type == OBJ_COMMIT)
+                       free_commit_list(((struct commit *)obj)->parents);
+#endif
+               free((char*)revs->cmdline.rev[i].name);
+       }
+       free(revs->cmdline.rev);
+
+       clear_pathspec(&revs->prune_data);
+       clear_pathspec(&revs->pruning.pathspec);
+       if (revs->graph)
+               graph_free(revs->graph);
+       free(revs->graph);
+       clear_pathspec(&revs->diffopt.pathspec);
+
+       if (revs->reflog_info)
+               reflog_walk_free(&revs->reflog_info);
+
+       clear_decorations(&revs->children, 1);
+       clear_decorations(&revs->merge_simplification, 1);
+       clear_decorations(&revs->treesame, 1);
+       free_saved_parents(revs);
+       free(revs->saved_parents_slab);
+}

but, not only was it not passing tests, I knew there were other things
to free still that I hadn't gotten too.  Also, you'll note that a
number of functions listed above don't exist; I had to add them.


So yeah, trying to clean up rev_info will be a bit of work.  I'd be
happy to hand off what I do have if anyone is interested; I doubt I'll
get back to it soon.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] chdir-notify: UNLEAK registrated callback entries
  2020-11-17  0:24 ` Jeff King
@ 2020-11-17 18:45   ` René Scharfe
  0 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2020-11-17 18:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Junio C Hamano, Elijah Newren

Am 17.11.20 um 01:24 schrieb Jeff King:
> On Sat, Nov 14, 2020 at 10:40:01PM +0100, René Scharfe wrote:
>
>> chdir_notify_register() allows registering functions to notify when
>> chdir() is called.  There is no way to unsubscribe or shut this
>> mechanism down, so these entries are present until the program ends.
>>
>> Valgrind reports allocations for these registrations as "possibly lost",
>> probably because it doesn't see through list.h's offsetof tricks.
>> Annotate them using UNLEAK, which causes Valgrind to report them as
>> "still reachable" instead.
>
> I can't say I'm excited to see UNLEAK used here. It was really intended
> for items going out of scope that weren't worth cleaning up. But here
> we're papering over a failure in the memory checking tool for something
> that _is_ in scope.

Right, though I'd consider every use of UNLEAK as "papering over".  And
not being able to distinguish with certainty between pointer trickery
and accidents is more of a missing (or impossible) feature than a
failure in my book.

> I guess I'm not too surprised that valgrind has trouble with list.h. We
> have pointers into a heap-allocated block, but not the start of it.
> Curiously, ASan/LSan get this case right. So my first instinct is: use
> those tools, they're better. :)

Does Leak Sanitizer get it right or is it just as unsure, but reports
its findings more cautiously?

> If we did want to paper over this case for valgrind, I think this is a
> better way to do so:
>
> diff --git a/chdir-notify.c b/chdir-notify.c
> index 5f7f2c2ac2..ddfe703b1a 100644
> --- a/chdir-notify.c
> +++ b/chdir-notify.c
> @@ -4,10 +4,10 @@
>  #include "strbuf.h"
>
>  struct chdir_notify_entry {
> +	struct list_head list;
>  	const char *name;
>  	chdir_notify_callback cb;
>  	void *data;
> -	struct list_head list;
>  };
>  static LIST_HEAD(chdir_notify_entries);

A trick to hide the trick -- I like it.

> I also wonder if valgrind _is_ aware of the distinction, and that's why
> these show up as only "possibly lost". And indeed, the faq[1] says:
>
>  - "possibly lost" means your program is leaking memory, unless you're
>    doing unusual things with pointers that could cause them to point
>    into the middle of an allocated block; see the user manual for some
>    possible causes. Use --show-possibly-lost=no if you don't want to see
>    these reports.
>
> and the user manual[2] has a more elaborate example that calls these
> "interior pointers". So I think that's exactly what is going on here.
>
> But then I'm not sure why we'd want this patch. List pointers (and now
> hashmap entries, which also contain a linked-list chain) are used in
> lots of data structures. Fixing this one case manually is not that
> interesting. If we're going to use valgrind, we probably need to accept
> that its "possibly lost" distinction is not useful for our code and turn
> it off.

That would probably disregard real issues as well -- but it's hard to
know how many without finding and classifying them.

Lists (and hashmaps) have cleanup methods that we could call, as Elijah
noted.  We can skip that in main() functions for production builds, but
cutting out the noise when SUPPRESS_ANNOTATED_LEAKS is set would surely
make finding real leaks easier.

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-17 18:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-14 21:40 [PATCH] chdir-notify: UNLEAK registrated callback entries René Scharfe
2020-11-14 21:53 ` René Scharfe
2020-11-16 21:59   ` Junio C Hamano
2020-11-17  4:49   ` Elijah Newren
2020-11-17  6:53     ` Jeff King
2020-11-17  8:39       ` Elijah Newren
2020-11-17  0:24 ` Jeff King
2020-11-17 18:45   ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).