git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] Clean up some memory leaks in and around dir.c
@ 2020-08-16  6:59 Elijah Newren via GitGitGadget
  2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-16  6:59 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

Some memory leaks in dir.c were making it hard for me to find my own leaks
in merge-ort. I followed a few threads and tried to both clean up the leaks
and make the API less prone to incorrect use.

Elijah Newren (3):
  dir: fix leak of parent_hashmap and recursive_hashmap
  dir: make clear_directory() free all relevant memory
  dir: fix problematic API to avoid memory leaks

 builtin/add.c          |  4 ++--
 builtin/check-ignore.c |  4 ++--
 builtin/clean.c        | 12 ++++--------
 builtin/grep.c         |  3 ++-
 builtin/ls-files.c     |  4 ++--
 builtin/stash.c        |  7 ++-----
 dir.c                  | 23 ++++++++++++++++++-----
 dir.h                  | 21 +++++++++++----------
 merge.c                |  3 ++-
 wt-status.c            |  8 ++------
 10 files changed, 47 insertions(+), 42 deletions(-)


base-commit: 878e727637ec5815ccb3301eb994a54df95b21b8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-831%2Fnewren%2Fdir_memory_cleanup-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-831/newren/dir_memory_cleanup-v1
Pull-Request: https://github.com/git/git/pull/831
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap
  2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
@ 2020-08-16  6:59 ` Elijah Newren via GitGitGadget
  2020-08-16  8:43   ` Jeff King
  2020-08-16  6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-16  6:59 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Commit 96cc8ab531 ("sparse-checkout: use hashmaps for cone patterns",
2019-11-21) added a parent_hashmap and recursive_hashmap to each struct
pattern_list and initialized these in add_patterns_from_buffer() but did
not make sure to add necessary code to clear_pattern_list() to free
these new structures.  Call hashmap_free_() on each to plug this memory
leak.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/dir.c b/dir.c
index fe64be30ed..08df469bf7 100644
--- a/dir.c
+++ b/dir.c
@@ -916,6 +916,10 @@ void clear_pattern_list(struct pattern_list *pl)
 		free(pl->patterns[i]);
 	free(pl->patterns);
 	free(pl->filebuf);
+	hashmap_free_(&pl->parent_hashmap,
+		      offsetof(struct pattern_entry, ent));
+	hashmap_free_(&pl->recursive_hashmap,
+		      offsetof(struct pattern_entry, ent));
 
 	memset(pl, 0, sizeof(*pl));
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] dir: make clear_directory() free all relevant memory
  2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
  2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
@ 2020-08-16  6:59 ` Elijah Newren via GitGitGadget
  2020-08-16  8:54   ` Jeff King
  2020-08-16  6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
  2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
  3 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-16  6:59 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory.  However,
clear_directory() didn't free dir->entries or dir->ignored.  I believe
this was oversight, but a number of callers noticed memory leaks and
started free'ing these, but often somewhat haphazardly (sometimes
freeing the entries in the arrays, and sometimes only free'ing the
arrays themselves).  This suggests the callers weren't trying to make
sure any possible memory used might be free'd, but just the memory they
noticed their usecase definitely had allocated.  This also caused the
extra memory deallocations to be duplicated in many places.

Fix this mess by moving all the duplicated free'ing logic into
clear_directory().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c |  6 +-----
 builtin/stash.c |  3 ---
 dir.c           | 12 ++++++++----
 dir.h           |  2 +-
 wt-status.c     |  4 ----
 5 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 5a9c29a558..4ffe00dd7f 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -1021,11 +1021,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		string_list_append(&del_list, rel);
 	}
 
-	for (i = 0; i < dir.nr; i++)
-		free(dir.entries[i]);
-
-	for (i = 0; i < dir.ignored_nr; i++)
-		free(dir.ignored[i]);
+	clear_directory(&dir);
 
 	if (interactive && del_list.nr > 0)
 		interactive_main_loop();
diff --git a/builtin/stash.c b/builtin/stash.c
index 10d87630cd..da48533d49 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -875,11 +875,8 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 		strbuf_addstr(untracked_files, ent->name);
 		/* NUL-terminate: will be fed to update-index -z */
 		strbuf_addch(untracked_files, '\0');
-		free(ent);
 	}
 
-	free(dir.entries);
-	free(dir.ignored);
 	clear_directory(&dir);
 	return found;
 }
diff --git a/dir.c b/dir.c
index 08df469bf7..b136c037d9 100644
--- a/dir.c
+++ b/dir.c
@@ -3012,10 +3012,7 @@ int remove_path(const char *name)
 	return 0;
 }
 
-/*
- * Frees memory within dir which was allocated for exclude lists and
- * the exclude_stack.  Does not free dir itself.
- */
+/* Frees memory within dir which was allocated.  Does not free dir itself. */
 void clear_directory(struct dir_struct *dir)
 {
 	int i, j;
@@ -3034,6 +3031,13 @@ void clear_directory(struct dir_struct *dir)
 		free(group->pl);
 	}
 
+	for (i = 0; i < dir->ignored_nr; i++)
+		free(dir->ignored[i]);
+	for (i = 0; i < dir->nr; i++)
+		free(dir->entries[i]);
+	free(dir->ignored);
+	free(dir->entries);
+
 	stk = dir->exclude_stack;
 	while (stk) {
 		struct exclude_stack *prev = stk->prev;
diff --git a/dir.h b/dir.h
index 5855c065a6..7d76d0644f 100644
--- a/dir.h
+++ b/dir.h
@@ -36,7 +36,7 @@
  *
  * - Use `dir.entries[]`.
  *
- * - Call `clear_directory()` when none of the contained elements are no longer in use.
+ * - Call `clear_directory()` when the contained elements are no longer in use.
  *
  */
 
diff --git a/wt-status.c b/wt-status.c
index d75399085d..c00ea3e06a 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -724,18 +724,14 @@ static void wt_status_collect_untracked(struct wt_status *s)
 		struct dir_entry *ent = dir.entries[i];
 		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->untracked, ent->name);
-		free(ent);
 	}
 
 	for (i = 0; i < dir.ignored_nr; i++) {
 		struct dir_entry *ent = dir.ignored[i];
 		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->ignored, ent->name);
-		free(ent);
 	}
 
-	free(dir.entries);
-	free(dir.ignored);
 	clear_directory(&dir);
 
 	if (advice_status_u_option)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/3] dir: fix problematic API to avoid memory leaks
  2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
  2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
  2020-08-16  6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
@ 2020-08-16  6:59 ` Elijah Newren via GitGitGadget
  2020-08-16  9:11   ` Jeff King
  2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
  3 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-16  6:59 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Two commits ago, we noticed that parent_hashmap and recursive_hashmap
were being leaked and fixed it by making clear_pattern_list() free those
hashmaps.  One commit ago, we noticed that clear_directory() was only
taking responsibility for a subset of fields within dir_struct, despite
the fact that entries[] and ignored[] we allocated internally to dir.c,
resulting in many callers either leaking or haphazardly trying to free
these arrays and their contents.

Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
when the user no longer needed the dir_struct, there were four callers
that didn't bother doing that at all.  However, two of them clearly
thought about leaks since they had an UNLEAK(dir) directive, which to me
suggests that the method to free the data was too unclear.  I suspect
the non-obviousness of the API and its holes led folks to avoid it,
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.

Rename clear_directory() to dir_free() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes.  I hope that a name
like "dir_free()" is more clear, and that the presence of dir_init()
will provide a hint to those looking at the code that there may be a
corresponding dir_free() that they need to call.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/add.c          |  4 ++--
 builtin/check-ignore.c |  4 ++--
 builtin/clean.c        |  8 ++++----
 builtin/grep.c         |  3 ++-
 builtin/ls-files.c     |  4 ++--
 builtin/stash.c        |  4 ++--
 dir.c                  |  7 ++++++-
 dir.h                  | 19 ++++++++++---------
 merge.c                |  3 ++-
 wt-status.c            |  4 ++--
 10 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index ab39a60a0d..46c9a0f552 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -534,11 +534,11 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	die_in_unpopulated_submodule(&the_index, prefix);
 	die_path_inside_submodule(&the_index, &pathspec);
 
+	dir_init(&dir);
 	if (add_new_files) {
 		int baselen;
 
 		/* Set up the default git porcelain excludes */
-		memset(&dir, 0, sizeof(dir));
 		if (!ignored_too) {
 			dir.flags |= DIR_COLLECT_IGNORED;
 			setup_standard_excludes(&dir);
@@ -611,7 +611,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 		die(_("Unable to write new index file"));
 
+	dir_free(&dir);
 	UNLEAK(pathspec);
-	UNLEAK(dir);
 	return exit_status;
 }
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index ea5d0ae3a6..29b464a1fa 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -180,7 +180,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 	if (!no_index && read_cache() < 0)
 		die(_("index file corrupt"));
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	setup_standard_excludes(&dir);
 
 	if (stdin_paths) {
@@ -190,7 +190,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 		maybe_flush_or_die(stdout, "ignore to stdout");
 	}
 
-	clear_directory(&dir);
+	dir_free(&dir);
 
 	return !num_ignored;
 }
diff --git a/builtin/clean.c b/builtin/clean.c
index 4ffe00dd7f..a4cf58af74 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -667,7 +667,7 @@ static int filter_by_patterns_cmd(void)
 		if (!confirm.len)
 			break;
 
-		memset(&dir, 0, sizeof(dir));
+		dir_init(&dir);
 		pl = add_pattern_list(&dir, EXC_CMDL, "manual exclude");
 		ignore_list = strbuf_split_max(&confirm, ' ', 0);
 
@@ -698,7 +698,7 @@ static int filter_by_patterns_cmd(void)
 		}
 
 		strbuf_list_free(ignore_list);
-		clear_directory(&dir);
+		dir_free(&dir);
 	}
 
 	strbuf_release(&confirm);
@@ -923,7 +923,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, options, builtin_clean_usage,
 			     0);
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (!interactive && !dry_run && !force) {
 		if (config_set)
 			die(_("clean.requireForce set to true and neither -i, -n, nor -f given; "
@@ -1021,7 +1021,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		string_list_append(&del_list, rel);
 	}
 
-	clear_directory(&dir);
+	dir_free(&dir);
 
 	if (interactive && del_list.nr > 0)
 		interactive_main_loop();
diff --git a/builtin/grep.c b/builtin/grep.c
index cee9db3477..fef2cc05ca 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -693,7 +693,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 	struct dir_struct dir;
 	int i, hit = 0;
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (!use_index)
 		dir.flags |= DIR_NO_GITLINKS;
 	if (exc_std)
@@ -705,6 +705,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (hit && opt->status_only)
 			break;
 	}
+	dir_free(&dir);
 	return hit;
 }
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 30a4c10334..740feff53e 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -584,7 +584,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(ls_files_usage, builtin_ls_files_options);
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	prefix = cmd_prefix;
 	if (prefix)
 		prefix_len = strlen(prefix);
@@ -688,6 +688,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		return bad ? 1 : 0;
 	}
 
-	UNLEAK(dir);
+	dir_free(&dir);
 	return 0;
 }
diff --git a/builtin/stash.c b/builtin/stash.c
index da48533d49..85b646b7ad 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -864,7 +864,7 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 	int found = 0;
 	struct dir_struct dir;
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (include_untracked != INCLUDE_ALL_FILES)
 		setup_standard_excludes(&dir);
 
@@ -877,7 +877,7 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 		strbuf_addch(untracked_files, '\0');
 	}
 
-	clear_directory(&dir);
+	dir_free(&dir);
 	return found;
 }
 
diff --git a/dir.c b/dir.c
index b136c037d9..e3c8e7ffd6 100644
--- a/dir.c
+++ b/dir.c
@@ -54,6 +54,11 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 static int resolve_dtype(int dtype, struct index_state *istate,
 			 const char *path, int len);
 
+void dir_init(struct dir_struct *dir)
+{
+	memset(dir, 0, sizeof(*dir));
+}
+
 int count_slashes(const char *s)
 {
 	int cnt = 0;
@@ -3013,7 +3018,7 @@ int remove_path(const char *name)
 }
 
 /* Frees memory within dir which was allocated.  Does not free dir itself. */
-void clear_directory(struct dir_struct *dir)
+void dir_free(struct dir_struct *dir)
 {
 	int i, j;
 	struct exclude_list_group *group;
diff --git a/dir.h b/dir.h
index 7d76d0644f..7c55c1a460 100644
--- a/dir.h
+++ b/dir.h
@@ -19,22 +19,21 @@
  * CE_SKIP_WORKTREE marked. If you want to exclude files, make sure you have
  * loaded the index first.
  *
- * - Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0,
- * sizeof(dir))`.
+ * - Prepare `struct dir_struct dir` using `dir_init()` function.
  *
  * - To add single exclude pattern, call `add_pattern_list()` and then
  *   `add_pattern()`.
  *
  * - To add patterns from a file (e.g. `.git/info/exclude`), call
- *   `add_patterns_from_file()` , and/or set `dir.exclude_per_dir`.  A
- *   short-hand function `setup_standard_excludes()` can be used to set
- *   up the standard set of exclude settings.
+ *   `add_patterns_from_file()` , and/or set `dir.exclude_per_dir`.
  *
- * - Set options described in the Data Structure section above.
+ * - A short-hand function `setup_standard_excludes()` can be used to set
+ *   up the standard set of exclude settings, instead of manually calling
+ *   the add_pattern*() family of functions.
  *
- * - Call `read_directory()`.
+ * - Call `fill_directory()`.
  *
- * - Use `dir.entries[]`.
+ * - Use `dir.entries[]` and `dir.ignored[]`.
  *
  * - Call `clear_directory()` when the contained elements are no longer in use.
  *
@@ -362,6 +361,8 @@ int match_pathspec(const struct index_state *istate,
 int report_path_error(const char *ps_matched, const struct pathspec *pathspec);
 int within_depth(const char *name, int namelen, int depth, int max_depth);
 
+void dir_init(struct dir_struct *dir);
+
 int fill_directory(struct dir_struct *dir,
 		   struct index_state *istate,
 		   const struct pathspec *pathspec);
@@ -428,7 +429,7 @@ void parse_path_pattern(const char **string, int *patternlen, unsigned *flags, i
 void add_pattern(const char *string, const char *base,
 		 int baselen, struct pattern_list *pl, int srcpos);
 void clear_pattern_list(struct pattern_list *pl);
-void clear_directory(struct dir_struct *dir);
+void dir_free(struct dir_struct *dir);
 
 int repo_file_exists(struct repository *repo, const char *path);
 int file_exists(const char *);
diff --git a/merge.c b/merge.c
index 753e461659..ee149b87ea 100644
--- a/merge.c
+++ b/merge.c
@@ -80,8 +80,8 @@ int checkout_fast_forward(struct repository *r,
 	}
 
 	memset(&opts, 0, sizeof(opts));
+	dir_init(&dir);
 	if (overwrite_ignore) {
-		memset(&dir, 0, sizeof(dir));
 		dir.flags |= DIR_SHOW_IGNORED;
 		setup_standard_excludes(&dir);
 		opts.dir = &dir;
@@ -102,6 +102,7 @@ int checkout_fast_forward(struct repository *r,
 		clear_unpack_trees_porcelain(&opts);
 		return -1;
 	}
+	dir_free(&dir);
 	clear_unpack_trees_porcelain(&opts);
 
 	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
diff --git a/wt-status.c b/wt-status.c
index c00ea3e06a..5624a8ff7f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -703,7 +703,7 @@ static void wt_status_collect_untracked(struct wt_status *s)
 	if (!s->show_untracked_files)
 		return;
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
 		dir.flags |=
 			DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
@@ -732,7 +732,7 @@ static void wt_status_collect_untracked(struct wt_status *s)
 			string_list_insert(&s->ignored, ent->name);
 	}
 
-	clear_directory(&dir);
+	dir_free(&dir);
 
 	if (advice_status_u_option)
 		s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap
  2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
@ 2020-08-16  8:43   ` Jeff King
  2020-08-17 16:57     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2020-08-16  8:43 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Sun, Aug 16, 2020 at 06:59:09AM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Commit 96cc8ab531 ("sparse-checkout: use hashmaps for cone patterns",
> 2019-11-21) added a parent_hashmap and recursive_hashmap to each struct
> pattern_list and initialized these in add_patterns_from_buffer() but did
> not make sure to add necessary code to clear_pattern_list() to free
> these new structures.  Call hashmap_free_() on each to plug this memory
> leak.

Beat you to it. :)

See: https://lore.kernel.org/git/20200814111049.GA4101811@coredump.intra.peff.net/

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] dir: make clear_directory() free all relevant memory
  2020-08-16  6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
@ 2020-08-16  8:54   ` Jeff King
  2020-08-17 16:58     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2020-08-16  8:54 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Sun, Aug 16, 2020 at 06:59:10AM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> The calling convention for the dir API is supposed to end with a call to
> clear_directory() to free up no longer needed memory.  However,
> clear_directory() didn't free dir->entries or dir->ignored.  I believe
> this was oversight, but a number of callers noticed memory leaks and
> started free'ing these, but often somewhat haphazardly (sometimes
> freeing the entries in the arrays, and sometimes only free'ing the
> arrays themselves).  This suggests the callers weren't trying to make
> sure any possible memory used might be free'd, but just the memory they
> noticed their usecase definitely had allocated.  This also caused the
> extra memory deallocations to be duplicated in many places.
> 
> Fix this mess by moving all the duplicated free'ing logic into
> clear_directory().

Makes sense. I don't know the dir.c code very well, so my worry would be
that some caller really wanted the other fields left untouched. But
looking over the callers, it seems they're all clearing it before the
struct goes out of scope. It's possible that they could have created
other pointer references, but it seems unlikely (and I'd argue they
should stop doing that and make their own copies of the data). E.g.,
wt_status_collect_untracked() sticks names into a string_list, but it
sets strdup_strings to make its own copy, so it's good.

> @@ -3034,6 +3031,13 @@ void clear_directory(struct dir_struct *dir)
>  		free(group->pl);
>  	}
>  
> +	for (i = 0; i < dir->ignored_nr; i++)
> +		free(dir->ignored[i]);
> +	for (i = 0; i < dir->nr; i++)
> +		free(dir->entries[i]);
> +	free(dir->ignored);
> +	free(dir->entries);
> +
>  	stk = dir->exclude_stack;
>  	while (stk) {
>  		struct exclude_stack *prev = stk->prev;

In most of our "clear" functions, the struct is ready for use again
(i.e., fields are set back to the initialized state). I don't think any
caller cares at this point, but it may be worth doing while we are here
as a least-surprise thing. That would mean setting these pointers to
NULL, and probably a few others that you aren't touching here.

Perhaps even easier would be to add a dir_init() call at the end after
your next patch adds that function.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
  2020-08-16  6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
@ 2020-08-16  9:11   ` Jeff King
  2020-08-17 17:19     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2020-08-16  9:11 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Sun, Aug 16, 2020 at 06:59:11AM +0000, Elijah Newren via GitGitGadget wrote:

> Digging further, I found that despite the pretty clear documentation
> near the top of dir.h that folks were supposed to call clear_directory()
> when the user no longer needed the dir_struct, there were four callers
> that didn't bother doing that at all.  However, two of them clearly
> thought about leaks since they had an UNLEAK(dir) directive, which to me
> suggests that the method to free the data was too unclear.  I suspect
> the non-obviousness of the API and its holes led folks to avoid it,
> which then snowballed into further problems with the entries[],
> ignored[], parent_hashmap, and recursive_hashmap problems.

The UNLEAK() ones are sort-of my fault, and are a combination of:

  - The commit adding them says "in some cases (e.g., dir_struct) we
    don't even have a function which knows how to free all of the struct
    members". I'm not sure if why I didn't fix clear_directory() as you
    did. I may not have known about it, or I may have been lazy. Or more
    likely I was simply holding the UNLEAK() hammer, so it looked like a
    nail. ;)

  - My focus was on making leak-checker output cleaner. And it wasn't
    clear for cases where we're about to exit the program whether we
    should be actually freeing (which has small but non-zero cost) or
    just annotating (which is zero-cost, but clearly has confused some
    people about how UNLEAK() was meant to be used). I think I'm leaning
    these days towards "free if it is easy to do so".

So this definitely seems like a good direction to me.

> Rename clear_directory() to dir_free() to be more in line with other
> data structures in git, and introduce a dir_init() to handle the
> suggested memsetting of dir_struct to all zeroes.  I hope that a name
> like "dir_free()" is more clear, and that the presence of dir_init()
> will provide a hint to those looking at the code that there may be a
> corresponding dir_free() that they need to call.

I think having a pair of matched calls is good. I _thought_ we had
established a pattern that we should prefer "clear" to "free" for cases
where the struct itself its not freed. But grepping around, I see there
are a few exceptions (hashmap_free() is the big one, and then
oidmap_free() which is built around it seems to have inherited it).

The rest seem to follow that pattern, though: attr_check_free,
cache_tree_free, and submodule_cache_free all actually free the pointer
passed in. And "git grep '_clear(' *.h" shows lots of
clear-but-don't-free examples.

Would dir_clear() make the pairing more obvious, but keep the clear
name?

(I also wouldn't be opposed to changing hashmap and oidmap to use the
name "clear", but that's obviously a separate patch).

>  builtin/add.c          |  4 ++--
>  builtin/check-ignore.c |  4 ++--
>  builtin/clean.c        |  8 ++++----
>  builtin/grep.c         |  3 ++-
>  builtin/ls-files.c     |  4 ++--
>  builtin/stash.c        |  4 ++--
>  dir.c                  |  7 ++++++-
>  dir.h                  | 19 ++++++++++---------
>  merge.c                |  3 ++-
>  wt-status.c            |  4 ++--
>  10 files changed, 34 insertions(+), 26 deletions(-)

That patch itself looks good except for two minor points:

>  /* Frees memory within dir which was allocated.  Does not free dir itself. */
> -void clear_directory(struct dir_struct *dir)
> +void dir_free(struct dir_struct *dir)
>  {
>  	int i, j;
>  	struct exclude_list_group *group;

As I mentioned in my previous email, I think it would be nice if this
called dir_init() at the end, so that the struct is always in a
consistent state.

> diff --git a/dir.h b/dir.h
> index 7d76d0644f..7c55c1a460 100644
> --- a/dir.h
> +++ b/dir.h
> [...]
> - * - Use `dir.entries[]`.
> + * - Use `dir.entries[]` and `dir.ignored[]`.
>   *
>   * - Call `clear_directory()` when the contained elements are no longer in use.
>   *

This last line should become dir_free() / dir_clear().

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap
  2020-08-16  8:43   ` Jeff King
@ 2020-08-17 16:57     ` Elijah Newren
  0 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2020-08-17 16:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sun, Aug 16, 2020 at 1:43 AM Jeff King <peff@peff.net> wrote:
>
> On Sun, Aug 16, 2020 at 06:59:09AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Commit 96cc8ab531 ("sparse-checkout: use hashmaps for cone patterns",
> > 2019-11-21) added a parent_hashmap and recursive_hashmap to each struct
> > pattern_list and initialized these in add_patterns_from_buffer() but did
> > not make sure to add necessary code to clear_pattern_list() to free
> > these new structures.  Call hashmap_free_() on each to plug this memory
> > leak.
>
> Beat you to it. :)
>
> See: https://lore.kernel.org/git/20200814111049.GA4101811@coredump.intra.peff.net/

Doh!  However, you do need to take care to free the hash table in
addition to the hash entries, or you'll still have a leak (I actually
made the same mistake)...  I'll add those comments on your patch,
though.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] dir: make clear_directory() free all relevant memory
  2020-08-16  8:54   ` Jeff King
@ 2020-08-17 16:58     ` Elijah Newren
  0 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2020-08-17 16:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sun, Aug 16, 2020 at 1:54 AM Jeff King <peff@peff.net> wrote:
>
> On Sun, Aug 16, 2020 at 06:59:10AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > The calling convention for the dir API is supposed to end with a call to
> > clear_directory() to free up no longer needed memory.  However,
> > clear_directory() didn't free dir->entries or dir->ignored.  I believe
> > this was oversight, but a number of callers noticed memory leaks and
> > started free'ing these, but often somewhat haphazardly (sometimes
> > freeing the entries in the arrays, and sometimes only free'ing the
> > arrays themselves).  This suggests the callers weren't trying to make
> > sure any possible memory used might be free'd, but just the memory they
> > noticed their usecase definitely had allocated.  This also caused the
> > extra memory deallocations to be duplicated in many places.
> >
> > Fix this mess by moving all the duplicated free'ing logic into
> > clear_directory().
>
> Makes sense. I don't know the dir.c code very well, so my worry would be
> that some caller really wanted the other fields left untouched. But
> looking over the callers, it seems they're all clearing it before the
> struct goes out of scope. It's possible that they could have created
> other pointer references, but it seems unlikely (and I'd argue they
> should stop doing that and make their own copies of the data). E.g.,
> wt_status_collect_untracked() sticks names into a string_list, but it
> sets strdup_strings to make its own copy, so it's good.
>
> > @@ -3034,6 +3031,13 @@ void clear_directory(struct dir_struct *dir)
> >               free(group->pl);
> >       }
> >
> > +     for (i = 0; i < dir->ignored_nr; i++)
> > +             free(dir->ignored[i]);
> > +     for (i = 0; i < dir->nr; i++)
> > +             free(dir->entries[i]);
> > +     free(dir->ignored);
> > +     free(dir->entries);
> > +
> >       stk = dir->exclude_stack;
> >       while (stk) {
> >               struct exclude_stack *prev = stk->prev;
>
> In most of our "clear" functions, the struct is ready for use again
> (i.e., fields are set back to the initialized state). I don't think any
> caller cares at this point, but it may be worth doing while we are here
> as a least-surprise thing. That would mean setting these pointers to
> NULL, and probably a few others that you aren't touching here.
>
> Perhaps even easier would be to add a dir_init() call at the end after
> your next patch adds that function.

Make sense; I'll fix it up.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
  2020-08-16  9:11   ` Jeff King
@ 2020-08-17 17:19     ` Elijah Newren
  2020-08-17 19:00       ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2020-08-17 17:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sun, Aug 16, 2020 at 2:11 AM Jeff King <peff@peff.net> wrote:
>
> On Sun, Aug 16, 2020 at 06:59:11AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Digging further, I found that despite the pretty clear documentation
> > near the top of dir.h that folks were supposed to call clear_directory()
> > when the user no longer needed the dir_struct, there were four callers
> > that didn't bother doing that at all.  However, two of them clearly
> > thought about leaks since they had an UNLEAK(dir) directive, which to me
> > suggests that the method to free the data was too unclear.  I suspect
> > the non-obviousness of the API and its holes led folks to avoid it,
> > which then snowballed into further problems with the entries[],
> > ignored[], parent_hashmap, and recursive_hashmap problems.
>
> The UNLEAK() ones are sort-of my fault, and are a combination of:
>
>   - The commit adding them says "in some cases (e.g., dir_struct) we
>     don't even have a function which knows how to free all of the struct
>     members". I'm not sure if why I didn't fix clear_directory() as you
>     did. I may not have known about it, or I may have been lazy. Or more
>     likely I was simply holding the UNLEAK() hammer, so it looked like a
>     nail. ;)
>
>   - My focus was on making leak-checker output cleaner. And it wasn't
>     clear for cases where we're about to exit the program whether we
>     should be actually freeing (which has small but non-zero cost) or
>     just annotating (which is zero-cost, but clearly has confused some
>     people about how UNLEAK() was meant to be used). I think I'm leaning
>     these days towards "free if it is easy to do so".
>
> So this definitely seems like a good direction to me.
>
> > Rename clear_directory() to dir_free() to be more in line with other
> > data structures in git, and introduce a dir_init() to handle the
> > suggested memsetting of dir_struct to all zeroes.  I hope that a name
> > like "dir_free()" is more clear, and that the presence of dir_init()
> > will provide a hint to those looking at the code that there may be a
> > corresponding dir_free() that they need to call.
>
> I think having a pair of matched calls is good. I _thought_ we had
> established a pattern that we should prefer "clear" to "free" for cases
> where the struct itself its not freed. But grepping around, I see there
> are a few exceptions (hashmap_free() is the big one, and then
> oidmap_free() which is built around it seems to have inherited it).
>
> The rest seem to follow that pattern, though: attr_check_free,
> cache_tree_free, and submodule_cache_free all actually free the pointer
> passed in. And "git grep '_clear(' *.h" shows lots of
> clear-but-don't-free examples.
>
> Would dir_clear() make the pairing more obvious, but keep the clear
> name?

Sure, that works for me.  The case where having an actual _free()
makes sense to me -- possibly in addition to a _clear() -- is when
some memory has to be allocated before first use, and thus a
foo_clear() would leave that memory allocated.  Then you really do
need a foo_free() for use when the data structure won't be used again.

> (I also wouldn't be opposed to changing hashmap and oidmap to use the
> name "clear", but that's obviously a separate patch).

hashmap is one of the cases that needs to have a free construct,
because the table in which to stuff the entries has to be allocated
and thus a hashmap_clear() would have to leave the table allocated if
it wants to be ready for re-use.  If someone really is done with a
hashmap, then to avoid leaking, both the entries and the table need to
be deallocated.

I keep getting confused by the hashmap API, and what pieces it frees
-- it looks like my earlier comments today were wrong and
hashmap_free_entries() does free the table.  So...perhaps I should
create a patch to make that clearer, and also submit the patch I've
had for a while to introduce a hashmap_clear() function (which is
similar to hashmap_free_entries, in that it frees the entries and
zeros out most of the map, but it leaves the table allocated and ready
for use).

I really wish hashmap_free() did what hashmap_free_entries() did.  So
annoying and counter-intuitive...

> >  builtin/add.c          |  4 ++--
> >  builtin/check-ignore.c |  4 ++--
> >  builtin/clean.c        |  8 ++++----
> >  builtin/grep.c         |  3 ++-
> >  builtin/ls-files.c     |  4 ++--
> >  builtin/stash.c        |  4 ++--
> >  dir.c                  |  7 ++++++-
> >  dir.h                  | 19 ++++++++++---------
> >  merge.c                |  3 ++-
> >  wt-status.c            |  4 ++--
> >  10 files changed, 34 insertions(+), 26 deletions(-)
>
> That patch itself looks good except for two minor points:
>
> >  /* Frees memory within dir which was allocated.  Does not free dir itself. */
> > -void clear_directory(struct dir_struct *dir)
> > +void dir_free(struct dir_struct *dir)
> >  {
> >       int i, j;
> >       struct exclude_list_group *group;
>
> As I mentioned in my previous email, I think it would be nice if this
> called dir_init() at the end, so that the struct is always in a
> consistent state.
>
> > diff --git a/dir.h b/dir.h
> > index 7d76d0644f..7c55c1a460 100644
> > --- a/dir.h
> > +++ b/dir.h
> > [...]
> > - * - Use `dir.entries[]`.
> > + * - Use `dir.entries[]` and `dir.ignored[]`.
> >   *
> >   * - Call `clear_directory()` when the contained elements are no longer in use.
> >   *
>
> This last line should become dir_free() / dir_clear().

I'll fix these up.

Elijah

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
  2020-08-17 17:19     ` Elijah Newren
@ 2020-08-17 19:00       ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2020-08-17 19:00 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Mon, Aug 17, 2020 at 10:19:03AM -0700, Elijah Newren wrote:

> > (I also wouldn't be opposed to changing hashmap and oidmap to use the
> > name "clear", but that's obviously a separate patch).
> 
> hashmap is one of the cases that needs to have a free construct,
> because the table in which to stuff the entries has to be allocated
> and thus a hashmap_clear() would have to leave the table allocated if
> it wants to be ready for re-use.  If someone really is done with a
> hashmap, then to avoid leaking, both the entries and the table need to
> be deallocated.

Hmm, you're right. oidmap() will lazy-initialize the table, but hashmap
will not. You _do_ have to initialize a hashmap because somebody has to
set the comparison function. But that could be fixed, and would make it
more like the rest of our code. I.e., you should be able to do:

  struct hashmap foo = HASHMAP_INIT(my_cmpfn);

  if (bar)
	return; /* no leak, because we never put anything in the map! */

  ... add some stuff to the map ...

  hashmap_clear(&foo);

  ... now it's empty and nothing allocated; we could return
      here without a leak or we could add more stuff to it ...

I don't even think it would be that big a change. Just translate a NULL
table to "not found" on the read side, and lazily call alloc_table() on
the write side. And have hashmap_free() not very the _whole_ struct, but
leave the cmpfn in place.

> I keep getting confused by the hashmap API, and what pieces it frees
> -- it looks like my earlier comments today were wrong and
> hashmap_free_entries() does free the table.  So...perhaps I should
> create a patch to make that clearer, and also submit the patch I've
> had for a while to introduce a hashmap_clear() function (which is
> similar to hashmap_free_entries, in that it frees the entries and
> zeros out most of the map, but it leaves the table allocated and ready
> for use).
> 
> I really wish hashmap_free() did what hashmap_free_entries() did.  So
> annoying and counter-intuitive...

I left some comments in my other reply, but in case you do pursue this:
the obvious thing to have is a free_entries boolean parameter to the
function, so that each caller is clear about what they want. And we used
to have that. But it's awkward because "free the entries" isn't a
boolean anymore; it's "free the entries you can find by moving backwards
to this offset from the hashmap_entry pointer". So callers who don't
want to free them have to pass some sentinel value there. And that's how
we ended up with two separate wrapper functions.

I think your main complaint is just about the naming though. If we had:

  /* drop all entries, freeing any hashmap-specific memory */
  hashmap_clear();

  /* ditto, but also free the entries themselves */
  hashmap_clear_and_free_entires();

that would be a bit more obvious (though I imagine it would still be
easy to forget that "clear" doesn't drop the entries). Another approach
would be to have a flag in the map for "do I own the entry memory". Most
callers are happy to hand off ownership of the entries when they're
added. And it may even be that this would open up the possibility of
more convenience functions on the adding/allocation side. I didn't think
it through carefully, though.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 0/2] Clean up some memory leaks in and around dir.c
  2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-08-16  6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
@ 2020-08-18 22:58 ` Elijah Newren via GitGitGadget
  2020-08-18 22:58   ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
                     ` (2 more replies)
  3 siblings, 3 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-18 22:58 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

Some memory leaks in dir.c were making it hard for me to find my own leaks
in merge-ort. I followed a few threads and tried to both clean up the leaks
and make the API less prone to incorrect use.

Changes since v1:

 * dropped the first patch, since Peff submitted an identical patch a day
   before me
 * implemented Peff's suggestions to rename dir_free() to dir_clear() and
   have it call dir_init() at the end so that folks can re-use the struct
   after dir_clear() if they so choose.

Elijah Newren (2):
  dir: make clear_directory() free all relevant memory
  dir: fix problematic API to avoid memory leaks

 builtin/add.c          |  4 ++--
 builtin/check-ignore.c |  4 ++--
 builtin/clean.c        | 12 ++++--------
 builtin/grep.c         |  3 ++-
 builtin/ls-files.c     |  4 ++--
 builtin/stash.c        |  7 ++-----
 dir.c                  | 20 +++++++++++++++++---
 dir.h                  | 21 +++++++++++----------
 merge.c                |  3 ++-
 wt-status.c            |  8 ++------
 10 files changed, 46 insertions(+), 40 deletions(-)


base-commit: 878e727637ec5815ccb3301eb994a54df95b21b8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-831%2Fnewren%2Fdir_memory_cleanup-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-831/newren/dir_memory_cleanup-v2
Pull-Request: https://github.com/git/git/pull/831

Range-diff vs v1:

 1:  932741d759 < -:  ---------- dir: fix leak of parent_hashmap and recursive_hashmap
 2:  068e097e22 ! 1:  f890ac0279 dir: make clear_directory() free all relevant memory
     @@ Commit message
          The calling convention for the dir API is supposed to end with a call to
          clear_directory() to free up no longer needed memory.  However,
          clear_directory() didn't free dir->entries or dir->ignored.  I believe
     -    this was oversight, but a number of callers noticed memory leaks and
     -    started free'ing these, but often somewhat haphazardly (sometimes
     -    freeing the entries in the arrays, and sometimes only free'ing the
     -    arrays themselves).  This suggests the callers weren't trying to make
     -    sure any possible memory used might be free'd, but just the memory they
     -    noticed their usecase definitely had allocated.  This also caused the
     -    extra memory deallocations to be duplicated in many places.
     +    this was an oversight, but a number of callers noticed memory leaks and
     +    started free'ing these.  Unfortunately, they did so somewhat haphazardly
     +    (sometimes freeing the entries in the arrays, and sometimes only
     +    free'ing the arrays themselves).  This suggests the callers weren't
     +    trying to make sure any possible memory used might be free'd, but just
     +    the memory they noticed their usecase definitely had allocated.
      
          Fix this mess by moving all the duplicated free'ing logic into
     -    clear_directory().
     +    clear_directory().  End by resetting dir to a pristine state so it could
     +    be reused if desired.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
      
       ## dir.c ##
      @@ dir.c: int remove_path(const char *name)
     - 	return 0;
       }
       
     --/*
     + /*
      - * Frees memory within dir which was allocated for exclude lists and
      - * the exclude_stack.  Does not free dir itself.
     -- */
     -+/* Frees memory within dir which was allocated.  Does not free dir itself. */
     ++ * Frees memory within dir which was allocated, and resets fields for further
     ++ * use.  Does not free dir itself.
     +  */
       void clear_directory(struct dir_struct *dir)
       {
     - 	int i, j;
      @@ dir.c: void clear_directory(struct dir_struct *dir)
       		free(group->pl);
       	}
     @@ dir.c: void clear_directory(struct dir_struct *dir)
       	stk = dir->exclude_stack;
       	while (stk) {
       		struct exclude_stack *prev = stk->prev;
     +@@ dir.c: void clear_directory(struct dir_struct *dir)
     + 		stk = prev;
     + 	}
     + 	strbuf_release(&dir->basebuf);
     ++
     ++	memset(&dir, 0, sizeof(*dir));
     + }
     + 
     + struct ondisk_untracked_cache {
      
       ## dir.h ##
      @@
 3:  b9310e9941 ! 2:  dcd72eef48 dir: fix problematic API to avoid memory leaks
     @@ Metadata
       ## Commit message ##
          dir: fix problematic API to avoid memory leaks
      
     -    Two commits ago, we noticed that parent_hashmap and recursive_hashmap
     -    were being leaked and fixed it by making clear_pattern_list() free those
     -    hashmaps.  One commit ago, we noticed that clear_directory() was only
     -    taking responsibility for a subset of fields within dir_struct, despite
     -    the fact that entries[] and ignored[] we allocated internally to dir.c,
     -    resulting in many callers either leaking or haphazardly trying to free
     -    these arrays and their contents.
     +    The dir structure seemed to have a number of leaks and problems around
     +    it.  First I noticed that parent_hashmap and recursive_hashmap were
     +    being leaked (though Peff noticed and submitted fixes before me).  Then
     +    I noticed in the previous commit that clear_directory() was only taking
     +    responsibility for a subset of fields within dir_struct, despite the
     +    fact that entries[] and ignored[] we allocated internally to dir.c.
     +    That, of course, resulted in many callers either leaking or haphazardly
     +    trying to free these arrays and their contents.
      
          Digging further, I found that despite the pretty clear documentation
          near the top of dir.h that folks were supposed to call clear_directory()
     @@ Commit message
          which then snowballed into further problems with the entries[],
          ignored[], parent_hashmap, and recursive_hashmap problems.
      
     -    Rename clear_directory() to dir_free() to be more in line with other
     +    Rename clear_directory() to dir_clear() to be more in line with other
          data structures in git, and introduce a dir_init() to handle the
          suggested memsetting of dir_struct to all zeroes.  I hope that a name
     -    like "dir_free()" is more clear, and that the presence of dir_init()
     -    will provide a hint to those looking at the code that there may be a
     -    corresponding dir_free() that they need to call.
     +    like "dir_clear()" is more clear, and that the presence of dir_init()
     +    will provide a hint to those looking at the code that they need to look
     +    for either a dir_clear() or a dir_free() and lead them to find
     +    dir_clear().
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
       			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
       		die(_("Unable to write new index file"));
       
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	UNLEAK(pathspec);
      -	UNLEAK(dir);
       	return exit_status;
     @@ builtin/check-ignore.c: int cmd_check_ignore(int argc, const char **argv, const
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       
       	return !num_ignored;
       }
     @@ builtin/clean.c: static int filter_by_patterns_cmd(void)
       
       		strbuf_list_free(ignore_list);
      -		clear_directory(&dir);
     -+		dir_free(&dir);
     ++		dir_clear(&dir);
       	}
       
       	strbuf_release(&confirm);
     @@ builtin/clean.c: int cmd_clean(int argc, const char **argv, const char *prefix)
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       
       	if (interactive && del_list.nr > 0)
       		interactive_main_loop();
     @@ builtin/grep.c: static int grep_directory(struct grep_opt *opt, const struct pat
       		if (hit && opt->status_only)
       			break;
       	}
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	return hit;
       }
       
     @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm
       	}
       
      -	UNLEAK(dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	return 0;
       }
      
     @@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	return found;
       }
       
     @@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di
       {
       	int cnt = 0;
      @@ dir.c: int remove_path(const char *name)
     - }
     - 
     - /* Frees memory within dir which was allocated.  Does not free dir itself. */
     +  * Frees memory within dir which was allocated, and resets fields for further
     +  * use.  Does not free dir itself.
     +  */
      -void clear_directory(struct dir_struct *dir)
     -+void dir_free(struct dir_struct *dir)
     ++void dir_clear(struct dir_struct *dir)
       {
       	int i, j;
       	struct exclude_list_group *group;
     +@@ dir.c: void clear_directory(struct dir_struct *dir)
     + 	}
     + 	strbuf_release(&dir->basebuf);
     + 
     +-	memset(&dir, 0, sizeof(*dir));
     ++	dir_init(dir);
     + }
     + 
     + struct ondisk_untracked_cache {
      
       ## dir.h ##
      @@
     @@ dir.h
      - * - Use `dir.entries[]`.
      + * - Use `dir.entries[]` and `dir.ignored[]`.
        *
     -  * - Call `clear_directory()` when the contained elements are no longer in use.
     +- * - Call `clear_directory()` when the contained elements are no longer in use.
     ++ * - Call `dir_clear()` when the contained elements are no longer in use.
        *
     +  */
     + 
      @@ dir.h: int match_pathspec(const struct index_state *istate,
       int report_path_error(const char *ps_matched, const struct pathspec *pathspec);
       int within_depth(const char *name, int namelen, int depth, int max_depth);
     @@ dir.h: void parse_path_pattern(const char **string, int *patternlen, unsigned *f
       		 int baselen, struct pattern_list *pl, int srcpos);
       void clear_pattern_list(struct pattern_list *pl);
      -void clear_directory(struct dir_struct *dir);
     -+void dir_free(struct dir_struct *dir);
     ++void dir_clear(struct dir_struct *dir);
       
       int repo_file_exists(struct repository *repo, const char *path);
       int file_exists(const char *);
     @@ merge.c: int checkout_fast_forward(struct repository *r,
       		clear_unpack_trees_porcelain(&opts);
       		return -1;
       	}
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       	clear_unpack_trees_porcelain(&opts);
       
       	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
     @@ wt-status.c: static void wt_status_collect_untracked(struct wt_status *s)
       	}
       
      -	clear_directory(&dir);
     -+	dir_free(&dir);
     ++	dir_clear(&dir);
       
       	if (advice_status_u_option)
       		s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/2] dir: make clear_directory() free all relevant memory
  2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
@ 2020-08-18 22:58   ` Elijah Newren via GitGitGadget
  2020-08-18 22:58   ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
  2020-08-19 13:51   ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-18 22:58 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory.  However,
clear_directory() didn't free dir->entries or dir->ignored.  I believe
this was an oversight, but a number of callers noticed memory leaks and
started free'ing these.  Unfortunately, they did so somewhat haphazardly
(sometimes freeing the entries in the arrays, and sometimes only
free'ing the arrays themselves).  This suggests the callers weren't
trying to make sure any possible memory used might be free'd, but just
the memory they noticed their usecase definitely had allocated.

Fix this mess by moving all the duplicated free'ing logic into
clear_directory().  End by resetting dir to a pristine state so it could
be reused if desired.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c |  6 +-----
 builtin/stash.c |  3 ---
 dir.c           | 13 +++++++++++--
 dir.h           |  2 +-
 wt-status.c     |  4 ----
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 5a9c29a558..4ffe00dd7f 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -1021,11 +1021,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		string_list_append(&del_list, rel);
 	}
 
-	for (i = 0; i < dir.nr; i++)
-		free(dir.entries[i]);
-
-	for (i = 0; i < dir.ignored_nr; i++)
-		free(dir.ignored[i]);
+	clear_directory(&dir);
 
 	if (interactive && del_list.nr > 0)
 		interactive_main_loop();
diff --git a/builtin/stash.c b/builtin/stash.c
index 10d87630cd..da48533d49 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -875,11 +875,8 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 		strbuf_addstr(untracked_files, ent->name);
 		/* NUL-terminate: will be fed to update-index -z */
 		strbuf_addch(untracked_files, '\0');
-		free(ent);
 	}
 
-	free(dir.entries);
-	free(dir.ignored);
 	clear_directory(&dir);
 	return found;
 }
diff --git a/dir.c b/dir.c
index fe64be30ed..aa96030031 100644
--- a/dir.c
+++ b/dir.c
@@ -3009,8 +3009,8 @@ int remove_path(const char *name)
 }
 
 /*
- * Frees memory within dir which was allocated for exclude lists and
- * the exclude_stack.  Does not free dir itself.
+ * Frees memory within dir which was allocated, and resets fields for further
+ * use.  Does not free dir itself.
  */
 void clear_directory(struct dir_struct *dir)
 {
@@ -3030,6 +3030,13 @@ void clear_directory(struct dir_struct *dir)
 		free(group->pl);
 	}
 
+	for (i = 0; i < dir->ignored_nr; i++)
+		free(dir->ignored[i]);
+	for (i = 0; i < dir->nr; i++)
+		free(dir->entries[i]);
+	free(dir->ignored);
+	free(dir->entries);
+
 	stk = dir->exclude_stack;
 	while (stk) {
 		struct exclude_stack *prev = stk->prev;
@@ -3037,6 +3044,8 @@ void clear_directory(struct dir_struct *dir)
 		stk = prev;
 	}
 	strbuf_release(&dir->basebuf);
+
+	memset(&dir, 0, sizeof(*dir));
 }
 
 struct ondisk_untracked_cache {
diff --git a/dir.h b/dir.h
index 5855c065a6..7d76d0644f 100644
--- a/dir.h
+++ b/dir.h
@@ -36,7 +36,7 @@
  *
  * - Use `dir.entries[]`.
  *
- * - Call `clear_directory()` when none of the contained elements are no longer in use.
+ * - Call `clear_directory()` when the contained elements are no longer in use.
  *
  */
 
diff --git a/wt-status.c b/wt-status.c
index d75399085d..c00ea3e06a 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -724,18 +724,14 @@ static void wt_status_collect_untracked(struct wt_status *s)
 		struct dir_entry *ent = dir.entries[i];
 		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->untracked, ent->name);
-		free(ent);
 	}
 
 	for (i = 0; i < dir.ignored_nr; i++) {
 		struct dir_entry *ent = dir.ignored[i];
 		if (index_name_is_other(istate, ent->name, ent->len))
 			string_list_insert(&s->ignored, ent->name);
-		free(ent);
 	}
 
-	free(dir.entries);
-	free(dir.ignored);
 	clear_directory(&dir);
 
 	if (advice_status_u_option)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks
  2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
  2020-08-18 22:58   ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
@ 2020-08-18 22:58   ` Elijah Newren via GitGitGadget
  2020-08-19 13:51   ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-08-18 22:58 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The dir structure seemed to have a number of leaks and problems around
it.  First I noticed that parent_hashmap and recursive_hashmap were
being leaked (though Peff noticed and submitted fixes before me).  Then
I noticed in the previous commit that clear_directory() was only taking
responsibility for a subset of fields within dir_struct, despite the
fact that entries[] and ignored[] we allocated internally to dir.c.
That, of course, resulted in many callers either leaking or haphazardly
trying to free these arrays and their contents.

Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
when the user no longer needed the dir_struct, there were four callers
that didn't bother doing that at all.  However, two of them clearly
thought about leaks since they had an UNLEAK(dir) directive, which to me
suggests that the method to free the data was too unclear.  I suspect
the non-obviousness of the API and its holes led folks to avoid it,
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.

Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes.  I hope that a name
like "dir_clear()" is more clear, and that the presence of dir_init()
will provide a hint to those looking at the code that they need to look
for either a dir_clear() or a dir_free() and lead them to find
dir_clear().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/add.c          |  4 ++--
 builtin/check-ignore.c |  4 ++--
 builtin/clean.c        |  8 ++++----
 builtin/grep.c         |  3 ++-
 builtin/ls-files.c     |  4 ++--
 builtin/stash.c        |  4 ++--
 dir.c                  |  9 +++++++--
 dir.h                  | 21 +++++++++++----------
 merge.c                |  3 ++-
 wt-status.c            |  4 ++--
 10 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index ab39a60a0d..b36a99eb7c 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -534,11 +534,11 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	die_in_unpopulated_submodule(&the_index, prefix);
 	die_path_inside_submodule(&the_index, &pathspec);
 
+	dir_init(&dir);
 	if (add_new_files) {
 		int baselen;
 
 		/* Set up the default git porcelain excludes */
-		memset(&dir, 0, sizeof(dir));
 		if (!ignored_too) {
 			dir.flags |= DIR_COLLECT_IGNORED;
 			setup_standard_excludes(&dir);
@@ -611,7 +611,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 		die(_("Unable to write new index file"));
 
+	dir_clear(&dir);
 	UNLEAK(pathspec);
-	UNLEAK(dir);
 	return exit_status;
 }
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index ea5d0ae3a6..3c652748d5 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -180,7 +180,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 	if (!no_index && read_cache() < 0)
 		die(_("index file corrupt"));
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	setup_standard_excludes(&dir);
 
 	if (stdin_paths) {
@@ -190,7 +190,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 		maybe_flush_or_die(stdout, "ignore to stdout");
 	}
 
-	clear_directory(&dir);
+	dir_clear(&dir);
 
 	return !num_ignored;
 }
diff --git a/builtin/clean.c b/builtin/clean.c
index 4ffe00dd7f..e53ea52d89 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -667,7 +667,7 @@ static int filter_by_patterns_cmd(void)
 		if (!confirm.len)
 			break;
 
-		memset(&dir, 0, sizeof(dir));
+		dir_init(&dir);
 		pl = add_pattern_list(&dir, EXC_CMDL, "manual exclude");
 		ignore_list = strbuf_split_max(&confirm, ' ', 0);
 
@@ -698,7 +698,7 @@ static int filter_by_patterns_cmd(void)
 		}
 
 		strbuf_list_free(ignore_list);
-		clear_directory(&dir);
+		dir_clear(&dir);
 	}
 
 	strbuf_release(&confirm);
@@ -923,7 +923,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, options, builtin_clean_usage,
 			     0);
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (!interactive && !dry_run && !force) {
 		if (config_set)
 			die(_("clean.requireForce set to true and neither -i, -n, nor -f given; "
@@ -1021,7 +1021,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		string_list_append(&del_list, rel);
 	}
 
-	clear_directory(&dir);
+	dir_clear(&dir);
 
 	if (interactive && del_list.nr > 0)
 		interactive_main_loop();
diff --git a/builtin/grep.c b/builtin/grep.c
index cee9db3477..f58979bc3f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -693,7 +693,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 	struct dir_struct dir;
 	int i, hit = 0;
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (!use_index)
 		dir.flags |= DIR_NO_GITLINKS;
 	if (exc_std)
@@ -705,6 +705,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (hit && opt->status_only)
 			break;
 	}
+	dir_clear(&dir);
 	return hit;
 }
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 30a4c10334..c8eae899b8 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -584,7 +584,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(ls_files_usage, builtin_ls_files_options);
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	prefix = cmd_prefix;
 	if (prefix)
 		prefix_len = strlen(prefix);
@@ -688,6 +688,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		return bad ? 1 : 0;
 	}
 
-	UNLEAK(dir);
+	dir_clear(&dir);
 	return 0;
 }
diff --git a/builtin/stash.c b/builtin/stash.c
index da48533d49..4bdfaf8397 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -864,7 +864,7 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 	int found = 0;
 	struct dir_struct dir;
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (include_untracked != INCLUDE_ALL_FILES)
 		setup_standard_excludes(&dir);
 
@@ -877,7 +877,7 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked,
 		strbuf_addch(untracked_files, '\0');
 	}
 
-	clear_directory(&dir);
+	dir_clear(&dir);
 	return found;
 }
 
diff --git a/dir.c b/dir.c
index aa96030031..8d6c59a9dc 100644
--- a/dir.c
+++ b/dir.c
@@ -54,6 +54,11 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 static int resolve_dtype(int dtype, struct index_state *istate,
 			 const char *path, int len);
 
+void dir_init(struct dir_struct *dir)
+{
+	memset(dir, 0, sizeof(*dir));
+}
+
 int count_slashes(const char *s)
 {
 	int cnt = 0;
@@ -3012,7 +3017,7 @@ int remove_path(const char *name)
  * Frees memory within dir which was allocated, and resets fields for further
  * use.  Does not free dir itself.
  */
-void clear_directory(struct dir_struct *dir)
+void dir_clear(struct dir_struct *dir)
 {
 	int i, j;
 	struct exclude_list_group *group;
@@ -3045,7 +3050,7 @@ void clear_directory(struct dir_struct *dir)
 	}
 	strbuf_release(&dir->basebuf);
 
-	memset(&dir, 0, sizeof(*dir));
+	dir_init(dir);
 }
 
 struct ondisk_untracked_cache {
diff --git a/dir.h b/dir.h
index 7d76d0644f..a3c40dec51 100644
--- a/dir.h
+++ b/dir.h
@@ -19,24 +19,23 @@
  * CE_SKIP_WORKTREE marked. If you want to exclude files, make sure you have
  * loaded the index first.
  *
- * - Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0,
- * sizeof(dir))`.
+ * - Prepare `struct dir_struct dir` using `dir_init()` function.
  *
  * - To add single exclude pattern, call `add_pattern_list()` and then
  *   `add_pattern()`.
  *
  * - To add patterns from a file (e.g. `.git/info/exclude`), call
- *   `add_patterns_from_file()` , and/or set `dir.exclude_per_dir`.  A
- *   short-hand function `setup_standard_excludes()` can be used to set
- *   up the standard set of exclude settings.
+ *   `add_patterns_from_file()` , and/or set `dir.exclude_per_dir`.
  *
- * - Set options described in the Data Structure section above.
+ * - A short-hand function `setup_standard_excludes()` can be used to set
+ *   up the standard set of exclude settings, instead of manually calling
+ *   the add_pattern*() family of functions.
  *
- * - Call `read_directory()`.
+ * - Call `fill_directory()`.
  *
- * - Use `dir.entries[]`.
+ * - Use `dir.entries[]` and `dir.ignored[]`.
  *
- * - Call `clear_directory()` when the contained elements are no longer in use.
+ * - Call `dir_clear()` when the contained elements are no longer in use.
  *
  */
 
@@ -362,6 +361,8 @@ int match_pathspec(const struct index_state *istate,
 int report_path_error(const char *ps_matched, const struct pathspec *pathspec);
 int within_depth(const char *name, int namelen, int depth, int max_depth);
 
+void dir_init(struct dir_struct *dir);
+
 int fill_directory(struct dir_struct *dir,
 		   struct index_state *istate,
 		   const struct pathspec *pathspec);
@@ -428,7 +429,7 @@ void parse_path_pattern(const char **string, int *patternlen, unsigned *flags, i
 void add_pattern(const char *string, const char *base,
 		 int baselen, struct pattern_list *pl, int srcpos);
 void clear_pattern_list(struct pattern_list *pl);
-void clear_directory(struct dir_struct *dir);
+void dir_clear(struct dir_struct *dir);
 
 int repo_file_exists(struct repository *repo, const char *path);
 int file_exists(const char *);
diff --git a/merge.c b/merge.c
index 753e461659..5fb88af102 100644
--- a/merge.c
+++ b/merge.c
@@ -80,8 +80,8 @@ int checkout_fast_forward(struct repository *r,
 	}
 
 	memset(&opts, 0, sizeof(opts));
+	dir_init(&dir);
 	if (overwrite_ignore) {
-		memset(&dir, 0, sizeof(dir));
 		dir.flags |= DIR_SHOW_IGNORED;
 		setup_standard_excludes(&dir);
 		opts.dir = &dir;
@@ -102,6 +102,7 @@ int checkout_fast_forward(struct repository *r,
 		clear_unpack_trees_porcelain(&opts);
 		return -1;
 	}
+	dir_clear(&dir);
 	clear_unpack_trees_porcelain(&opts);
 
 	if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
diff --git a/wt-status.c b/wt-status.c
index c00ea3e06a..7ce58b8aae 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -703,7 +703,7 @@ static void wt_status_collect_untracked(struct wt_status *s)
 	if (!s->show_untracked_files)
 		return;
 
-	memset(&dir, 0, sizeof(dir));
+	dir_init(&dir);
 	if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
 		dir.flags |=
 			DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
@@ -732,7 +732,7 @@ static void wt_status_collect_untracked(struct wt_status *s)
 			string_list_insert(&s->ignored, ent->name);
 	}
 
-	clear_directory(&dir);
+	dir_clear(&dir);
 
 	if (advice_status_u_option)
 		s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] Clean up some memory leaks in and around dir.c
  2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
  2020-08-18 22:58   ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
  2020-08-18 22:58   ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
@ 2020-08-19 13:51   ` Jeff King
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2020-08-19 13:51 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Aug 18, 2020 at 10:58:24PM +0000, Elijah Newren via GitGitGadget wrote:

> Some memory leaks in dir.c were making it hard for me to find my own leaks
> in merge-ort. I followed a few threads and tried to both clean up the leaks
> and make the API less prone to incorrect use.
> 
> Changes since v1:
> 
>  * dropped the first patch, since Peff submitted an identical patch a day
>    before me
>  * implemented Peff's suggestions to rename dir_free() to dir_clear() and
>    have it call dir_init() at the end so that folks can re-use the struct
>    after dir_clear() if they so choose.

Thanks, both patches look good to me.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-19 13:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
2020-08-16  8:43   ` Jeff King
2020-08-17 16:57     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-16  8:54   ` Jeff King
2020-08-17 16:58     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-16  9:11   ` Jeff King
2020-08-17 17:19     ` Elijah Newren
2020-08-17 19:00       ` Jeff King
2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-19 13:51   ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git