From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 0/2] Clean up some memory leaks in and around dir.c
Date: Tue, 18 Aug 2020 22:58:24 +0000 [thread overview]
Message-ID: <pull.831.v2.git.git.1597791506.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.831.git.git.1597561152.gitgitgadget@gmail.com>
Some memory leaks in dir.c were making it hard for me to find my own leaks
in merge-ort. I followed a few threads and tried to both clean up the leaks
and make the API less prone to incorrect use.
Changes since v1:
* dropped the first patch, since Peff submitted an identical patch a day
before me
* implemented Peff's suggestions to rename dir_free() to dir_clear() and
have it call dir_init() at the end so that folks can re-use the struct
after dir_clear() if they so choose.
Elijah Newren (2):
dir: make clear_directory() free all relevant memory
dir: fix problematic API to avoid memory leaks
builtin/add.c | 4 ++--
builtin/check-ignore.c | 4 ++--
builtin/clean.c | 12 ++++--------
builtin/grep.c | 3 ++-
builtin/ls-files.c | 4 ++--
builtin/stash.c | 7 ++-----
dir.c | 20 +++++++++++++++++---
dir.h | 21 +++++++++++----------
merge.c | 3 ++-
wt-status.c | 8 ++------
10 files changed, 46 insertions(+), 40 deletions(-)
base-commit: 878e727637ec5815ccb3301eb994a54df95b21b8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-831%2Fnewren%2Fdir_memory_cleanup-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-831/newren/dir_memory_cleanup-v2
Pull-Request: https://github.com/git/git/pull/831
Range-diff vs v1:
1: 932741d759 < -: ---------- dir: fix leak of parent_hashmap and recursive_hashmap
2: 068e097e22 ! 1: f890ac0279 dir: make clear_directory() free all relevant memory
@@ Commit message
The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory. However,
clear_directory() didn't free dir->entries or dir->ignored. I believe
- this was oversight, but a number of callers noticed memory leaks and
- started free'ing these, but often somewhat haphazardly (sometimes
- freeing the entries in the arrays, and sometimes only free'ing the
- arrays themselves). This suggests the callers weren't trying to make
- sure any possible memory used might be free'd, but just the memory they
- noticed their usecase definitely had allocated. This also caused the
- extra memory deallocations to be duplicated in many places.
+ this was an oversight, but a number of callers noticed memory leaks and
+ started free'ing these. Unfortunately, they did so somewhat haphazardly
+ (sometimes freeing the entries in the arrays, and sometimes only
+ free'ing the arrays themselves). This suggests the callers weren't
+ trying to make sure any possible memory used might be free'd, but just
+ the memory they noticed their usecase definitely had allocated.
Fix this mess by moving all the duplicated free'ing logic into
- clear_directory().
+ clear_directory(). End by resetting dir to a pristine state so it could
+ be reused if desired.
Signed-off-by: Elijah Newren <newren@gmail.com>
@@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
## dir.c ##
@@ dir.c: int remove_path(const char *name)
- return 0;
}
--/*
+ /*
- * Frees memory within dir which was allocated for exclude lists and
- * the exclude_stack. Does not free dir itself.
-- */
-+/* Frees memory within dir which was allocated. Does not free dir itself. */
++ * Frees memory within dir which was allocated, and resets fields for further
++ * use. Does not free dir itself.
+ */
void clear_directory(struct dir_struct *dir)
{
- int i, j;
@@ dir.c: void clear_directory(struct dir_struct *dir)
free(group->pl);
}
@@ dir.c: void clear_directory(struct dir_struct *dir)
stk = dir->exclude_stack;
while (stk) {
struct exclude_stack *prev = stk->prev;
+@@ dir.c: void clear_directory(struct dir_struct *dir)
+ stk = prev;
+ }
+ strbuf_release(&dir->basebuf);
++
++ memset(&dir, 0, sizeof(*dir));
+ }
+
+ struct ondisk_untracked_cache {
## dir.h ##
@@
3: b9310e9941 ! 2: dcd72eef48 dir: fix problematic API to avoid memory leaks
@@ Metadata
## Commit message ##
dir: fix problematic API to avoid memory leaks
- Two commits ago, we noticed that parent_hashmap and recursive_hashmap
- were being leaked and fixed it by making clear_pattern_list() free those
- hashmaps. One commit ago, we noticed that clear_directory() was only
- taking responsibility for a subset of fields within dir_struct, despite
- the fact that entries[] and ignored[] we allocated internally to dir.c,
- resulting in many callers either leaking or haphazardly trying to free
- these arrays and their contents.
+ The dir structure seemed to have a number of leaks and problems around
+ it. First I noticed that parent_hashmap and recursive_hashmap were
+ being leaked (though Peff noticed and submitted fixes before me). Then
+ I noticed in the previous commit that clear_directory() was only taking
+ responsibility for a subset of fields within dir_struct, despite the
+ fact that entries[] and ignored[] we allocated internally to dir.c.
+ That, of course, resulted in many callers either leaking or haphazardly
+ trying to free these arrays and their contents.
Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
@@ Commit message
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.
- Rename clear_directory() to dir_free() to be more in line with other
+ Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes. I hope that a name
- like "dir_free()" is more clear, and that the presence of dir_init()
- will provide a hint to those looking at the code that there may be a
- corresponding dir_free() that they need to call.
+ like "dir_clear()" is more clear, and that the presence of dir_init()
+ will provide a hint to those looking at the code that they need to look
+ for either a dir_clear() or a dir_free() and lead them to find
+ dir_clear().
Signed-off-by: Elijah Newren <newren@gmail.com>
@@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
COMMIT_LOCK | SKIP_IF_UNCHANGED))
die(_("Unable to write new index file"));
-+ dir_free(&dir);
++ dir_clear(&dir);
UNLEAK(pathspec);
- UNLEAK(dir);
return exit_status;
@@ builtin/check-ignore.c: int cmd_check_ignore(int argc, const char **argv, const
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
return !num_ignored;
}
@@ builtin/clean.c: static int filter_by_patterns_cmd(void)
strbuf_list_free(ignore_list);
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
}
strbuf_release(&confirm);
@@ builtin/clean.c: int cmd_clean(int argc, const char **argv, const char *prefix)
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
if (interactive && del_list.nr > 0)
interactive_main_loop();
@@ builtin/grep.c: static int grep_directory(struct grep_opt *opt, const struct pat
if (hit && opt->status_only)
break;
}
-+ dir_free(&dir);
++ dir_clear(&dir);
return hit;
}
@@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm
}
- UNLEAK(dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
return 0;
}
@@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
return found;
}
@@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di
{
int cnt = 0;
@@ dir.c: int remove_path(const char *name)
- }
-
- /* Frees memory within dir which was allocated. Does not free dir itself. */
+ * Frees memory within dir which was allocated, and resets fields for further
+ * use. Does not free dir itself.
+ */
-void clear_directory(struct dir_struct *dir)
-+void dir_free(struct dir_struct *dir)
++void dir_clear(struct dir_struct *dir)
{
int i, j;
struct exclude_list_group *group;
+@@ dir.c: void clear_directory(struct dir_struct *dir)
+ }
+ strbuf_release(&dir->basebuf);
+
+- memset(&dir, 0, sizeof(*dir));
++ dir_init(dir);
+ }
+
+ struct ondisk_untracked_cache {
## dir.h ##
@@
@@ dir.h
- * - Use `dir.entries[]`.
+ * - Use `dir.entries[]` and `dir.ignored[]`.
*
- * - Call `clear_directory()` when the contained elements are no longer in use.
+- * - Call `clear_directory()` when the contained elements are no longer in use.
++ * - Call `dir_clear()` when the contained elements are no longer in use.
*
+ */
+
@@ dir.h: int match_pathspec(const struct index_state *istate,
int report_path_error(const char *ps_matched, const struct pathspec *pathspec);
int within_depth(const char *name, int namelen, int depth, int max_depth);
@@ dir.h: void parse_path_pattern(const char **string, int *patternlen, unsigned *f
int baselen, struct pattern_list *pl, int srcpos);
void clear_pattern_list(struct pattern_list *pl);
-void clear_directory(struct dir_struct *dir);
-+void dir_free(struct dir_struct *dir);
++void dir_clear(struct dir_struct *dir);
int repo_file_exists(struct repository *repo, const char *path);
int file_exists(const char *);
@@ merge.c: int checkout_fast_forward(struct repository *r,
clear_unpack_trees_porcelain(&opts);
return -1;
}
-+ dir_free(&dir);
++ dir_clear(&dir);
clear_unpack_trees_porcelain(&opts);
if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
@@ wt-status.c: static void wt_status_collect_untracked(struct wt_status *s)
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
if (advice_status_u_option)
s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;
--
gitgitgadget
next prev parent reply other threads:[~2020-08-18 22:58 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-16 6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-16 6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
2020-08-16 8:43 ` Jeff King
2020-08-17 16:57 ` Elijah Newren
2020-08-16 6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-16 8:54 ` Jeff King
2020-08-17 16:58 ` Elijah Newren
2020-08-16 6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-16 9:11 ` Jeff King
2020-08-17 17:19 ` Elijah Newren
2020-08-17 19:00 ` Jeff King
2020-08-18 22:58 ` Elijah Newren via GitGitGadget [this message]
2020-08-18 22:58 ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-18 22:58 ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-19 13:51 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.831.v2.git.git.1597791506.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).