git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Fix a couple small leaks in merge-ort
@ 2022-02-19 17:09 Elijah Newren via GitGitGadget
  2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-02-19 17:09 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Elijah Newren

Off-list, Ævar reported a few small leaks in merge-ort to me that I missed
previously. Here's a couple fixes.

Elijah Newren (2):
  merge-ort: fix small memory leak in detect_and_process_renames()
  merge-ort: fix small memory leak in unique_path()

 merge-ort.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)


base-commit: e2ac9141e64e2cd3e690d1b5fc848949827c09b4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1152%2Fnewren%2Fmerge-ort-leak-fixes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1152/newren/merge-ort-leak-fixes-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1152
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames()
  2022-02-19 17:09 [PATCH 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
@ 2022-02-19 17:09 ` Elijah Newren via GitGitGadget
  2022-02-19 21:44   ` Ævar Arnfjörð Bjarmason
  2022-02-19 17:10 ` [PATCH 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
  2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
  2 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-02-19 17:09 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

detect_and_process_renames() detects renames on both sides of history
and then combines these into a single diff_queue_struct.  The combined
diff_queue_struct needs to be able to hold the renames found on either
side, and since it knows the (maximum) size it needs, it pre-emptively
grows the array to the appropriate size:

	ALLOC_GROW(combined.queue,
		   renames->pairs[1].nr + renames->pairs[2].nr,
		   combined.alloc);

It then collects the items from each side:

	collect_renames(opt, &combined, MERGE_SIDE1, ...)
	collect_renames(opt, &combined, MERGE_SIDE2, ...)

Note, though, that collect_renames() sometimes determines that some
pairs are unnecessary and does not include them in the combined array.
When it is done, detect_and_process_renames() frees this memory:

	if (combined.nr) {
                ...
		free(combined.queue);
        }

The problem is that sometimes even when there are pairs, none of them are
necessary.  Instead of checking combined.nr, we should check
combined.alloc.  Doing so fixes the following memory leak, as reported
by valgrind:

==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
==PID==    at 0xADDRESS: malloc
==PID==    by 0xADDRESS: realloc
==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
==PID==    by 0xADDRESS: run_builtin (git.c:461)
==PID==    by 0xADDRESS: handle_builtin (git.c:713)
==PID==    by 0xADDRESS: run_argv (git.c:780)
==PID==    by 0xADDRESS: cmd_main (git.c:911)
==PID==    by 0xADDRESS: main (common-main.c:52)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index d85b1cd99e9..4f5abc558c5 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3175,7 +3175,7 @@ simple_cleanup:
 		free(renames->pairs[s].queue);
 		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
 	}
-	if (combined.nr) {
+	if (combined.alloc) {
 		int i;
 		for (i = 0; i < combined.nr; i++)
 			pool_diff_free_filepair(&opt->priv->pool,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] merge-ort: fix small memory leak in unique_path()
  2022-02-19 17:09 [PATCH 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
  2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
@ 2022-02-19 17:10 ` Elijah Newren via GitGitGadget
  2022-02-19 22:22   ` Ævar Arnfjörð Bjarmason
  2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
  2 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-02-19 17:10 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

The struct strmap paths member of merge_options_internal is perhaps the
most central data structure to all of merge-ort.  Because all the paths
involved in the merge need to be kept until the merge is complete, this
"paths" data structure traditionally took responsibility for owning all
the allocated paths.  When the merge is over, those paths were free()d
as part of free()ing this strmap.

In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
memory pools, 2021-07-30), we changed the allocations for pathnames to
come from a memory pool.  That meant the ownership changed slightly;
there were no individual free() calls to make, instead the memory pool
owned all those paths and they were free()d all at once.

Unfortunately unique_path() was written presuming the pre-memory-pool
model, and allocated a path on the heap and left it in the strmap for
later free()ing.  Modify it to return a path allocated from the memory
pool instead.

Note that there's one instance -- in record_conflicted_index_entries()
-- where the returned string from unique_path() was only used very
temporarily and thus had been immediately free()'d.  This codepath was
associated with an ugly skip-worktree workaround that has since been
better fixed by the in-flight en/present-despite-skipped topic.  This
workaround probably makes sense to excise once that topic merges down,
but for now, just remove the immediate free() and allow the returned
string to be free()d when the memory pool is released.

This fixes the following memory leak as reported by valgrind:

==PID== 65 bytes in 1 blocks are definitely lost in loss record 79 of 134
==PID==    at 0xADDRESS: malloc
==PID==    by 0xADDRESS: realloc
==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
==PID==    by 0xADDRESS: strbuf_grow (strbuf.c:98)
==PID==    by 0xADDRESS: strbuf_vaddf (strbuf.c:394)
==PID==    by 0xADDRESS: strbuf_addf (strbuf.c:335)
==PID==    by 0xADDRESS: unique_path (merge-ort.c:733)
==PID==    by 0xADDRESS: process_entry (merge-ort.c:3678)
==PID==    by 0xADDRESS: process_entries (merge-ort.c:4037)
==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4621)
==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 4f5abc558c5..40ae4dc4e92 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -722,13 +722,15 @@ static void add_flattened_path(struct strbuf *out, const char *s)
 			out->buf[i] = '_';
 }
 
-static char *unique_path(struct strmap *existing_paths,
+static char *unique_path(struct merge_options *opt,
 			 const char *path,
 			 const char *branch)
 {
+	char *ret = NULL;
 	struct strbuf newpath = STRBUF_INIT;
 	int suffix = 0;
 	size_t base_len;
+	struct strmap *existing_paths = &opt->priv->paths;
 
 	strbuf_addf(&newpath, "%s~", path);
 	add_flattened_path(&newpath, branch);
@@ -739,7 +741,11 @@ static char *unique_path(struct strmap *existing_paths,
 		strbuf_addf(&newpath, "_%d", suffix++);
 	}
 
-	return strbuf_detach(&newpath, NULL);
+	/* Track the new path in our memory pool */
+	ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
+	memcpy(ret, newpath.buf, newpath.len + 1);
+	strbuf_release(&newpath);
+	return ret;
 }
 
 /*** Function Grouping: functions related to collect_merge_info() ***/
@@ -3679,7 +3685,7 @@ static void process_entry(struct merge_options *opt,
 		 */
 		df_file_index = (ci->dirmask & (1 << 1)) ? 2 : 1;
 		branch = (df_file_index == 1) ? opt->branch1 : opt->branch2;
-		path = unique_path(&opt->priv->paths, path, branch);
+		path = unique_path(opt, path, branch);
 		strmap_put(&opt->priv->paths, path, new_ci);
 
 		path_msg(opt, path, 0,
@@ -3804,14 +3810,12 @@ static void process_entry(struct merge_options *opt,
 			/* Insert entries into opt->priv_paths */
 			assert(rename_a || rename_b);
 			if (rename_a) {
-				a_path = unique_path(&opt->priv->paths,
-						     path, opt->branch1);
+				a_path = unique_path(opt, path, opt->branch1);
 				strmap_put(&opt->priv->paths, a_path, ci);
 			}
 
 			if (rename_b)
-				b_path = unique_path(&opt->priv->paths,
-						     path, opt->branch2);
+				b_path = unique_path(opt, path, opt->branch2);
 			else
 				b_path = path;
 			strmap_put(&opt->priv->paths, b_path, new_ci);
@@ -4199,7 +4203,7 @@ static int record_conflicted_index_entries(struct merge_options *opt)
 				struct stat st;
 
 				if (!lstat(path, &st)) {
-					char *new_name = unique_path(&opt->priv->paths,
+					char *new_name = unique_path(opt,
 								     path,
 								     "cruft");
 
@@ -4207,7 +4211,6 @@ static int record_conflicted_index_entries(struct merge_options *opt)
 						 _("Note: %s not up to date and in way of checking out conflicted version; old copy renamed to %s"),
 						 path, new_name);
 					errs |= rename(path, new_name);
-					free(new_name);
 				}
 				errs |= checkout_entry(ce, &state, NULL, NULL);
 			}
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames()
  2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
@ 2022-02-19 21:44   ` Ævar Arnfjörð Bjarmason
  2022-02-20  0:26     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-19 21:44 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren


On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> detect_and_process_renames() detects renames on both sides of history
> and then combines these into a single diff_queue_struct.  The combined
> diff_queue_struct needs to be able to hold the renames found on either
> side, and since it knows the (maximum) size it needs, it pre-emptively
> grows the array to the appropriate size:
>
> 	ALLOC_GROW(combined.queue,
> 		   renames->pairs[1].nr + renames->pairs[2].nr,
> 		   combined.alloc);
>
> It then collects the items from each side:
>
> 	collect_renames(opt, &combined, MERGE_SIDE1, ...)
> 	collect_renames(opt, &combined, MERGE_SIDE2, ...)
>
> Note, though, that collect_renames() sometimes determines that some
> pairs are unnecessary and does not include them in the combined array.
> When it is done, detect_and_process_renames() frees this memory:
>
> 	if (combined.nr) {
>                 ...
> 		free(combined.queue);
>         }
>
> The problem is that sometimes even when there are pairs, none of them are
> necessary.  Instead of checking combined.nr, we should check
> combined.alloc.  Doing so fixes the following memory leak, as reported
> by valgrind:
>
> ==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
> ==PID==    at 0xADDRESS: malloc
> ==PID==    by 0xADDRESS: realloc
> ==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
> ==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
> ==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
> ==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
> ==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
> ==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
> ==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
> ==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
> ==PID==    by 0xADDRESS: run_builtin (git.c:461)
> ==PID==    by 0xADDRESS: handle_builtin (git.c:713)
> ==PID==    by 0xADDRESS: run_argv (git.c:780)
> ==PID==    by 0xADDRESS: cmd_main (git.c:911)
> ==PID==    by 0xADDRESS: main (common-main.c:52)
>
> Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index d85b1cd99e9..4f5abc558c5 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3175,7 +3175,7 @@ simple_cleanup:
>  		free(renames->pairs[s].queue);
>  		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
>  	}
> -	if (combined.nr) {
> +	if (combined.alloc) {
>  		int i;
>  		for (i = 0; i < combined.nr; i++)
>  			pool_diff_free_filepair(&opt->priv->pool,

This looks correct in that it'll work, but I still don't get why the
pre-image or post-image is going through this indirection of checking
"alloc" at all. Wouldn't this be correct & more straightforward (the
memset part is optional, just did it for good measure)?:

diff --git a/merge-ort.c b/merge-ort.c
index 40ae4dc4e92..a01f28586a1 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3092,12 +3092,12 @@ static int detect_and_process_renames(struct merge_options *opt,
 				      struct tree *side1,
 				      struct tree *side2)
 {
-	struct diff_queue_struct combined;
+	struct diff_queue_struct combined = { 0 };
 	struct rename_info *renames = &opt->priv->renames;
 	int need_dir_renames, s, clean = 1;
 	unsigned detection_run = 0;
+	int i;
 
-	memset(&combined, 0, sizeof(combined));
 	if (!possible_renames(renames))
 		goto cleanup;
 
@@ -3181,13 +3181,10 @@ static int detect_and_process_renames(struct merge_options *opt,
 		free(renames->pairs[s].queue);
 		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
 	}
-	if (combined.alloc) {
-		int i;
-		for (i = 0; i < combined.nr; i++)
-			pool_diff_free_filepair(&opt->priv->pool,
-						combined.queue[i]);
-		free(combined.queue);
-	}
+
+	for (i = 0; i < combined.nr; i++)
+		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
+	free(combined.queue);
 
 	return clean;
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] merge-ort: fix small memory leak in unique_path()
  2022-02-19 17:10 ` [PATCH 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
@ 2022-02-19 22:22   ` Ævar Arnfjörð Bjarmason
  2022-02-20  0:37     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-19 22:22 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren


On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> The struct strmap paths member of merge_options_internal is perhaps the
> most central data structure to all of merge-ort.  Because all the paths
> involved in the merge need to be kept until the merge is complete, this
> "paths" data structure traditionally took responsibility for owning all
> the allocated paths.  When the merge is over, those paths were free()d
> as part of free()ing this strmap.
>
> In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
> memory pools, 2021-07-30), we changed the allocations for pathnames to
> come from a memory pool.  That meant the ownership changed slightly;
> there were no individual free() calls to make, instead the memory pool
> owned all those paths and they were free()d all at once.
>
> Unfortunately unique_path() was written presuming the pre-memory-pool
> model, and allocated a path on the heap and left it in the strmap for
> later free()ing.  Modify it to return a path allocated from the memory
> pool instead.

This seems like a rather obvious fix to the leak, as the other side
wasn't ready to have the detached strbuf handed to it, and instead is
assuming everything is mempools.

The downside is a bit of heap churn here since you malloc() & use the
strbuf just to ask for that size from the mempool, and then free() the
strbuf (of course we had that before, we just weren't free-ing).

So this is just an aside & I have no idea if it's worth it, but FWIW you
can have your cake & eat it too here memory-allocation wise and avoid
the strbuf allocation entirely, and just use your mem-pool.

Like this:

diff --git a/merge-ort.c b/merge-ort.c
index 40ae4dc4e92..1111916d5cb 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -731,6 +731,16 @@ static char *unique_path(struct merge_options *opt,
 	int suffix = 0;
 	size_t base_len;
 	struct strmap *existing_paths = &opt->priv->paths;
+	/*
+	 * pre-size path + ~ + branch + _%d + "\0". Hopefully 6 digits
+	 * of suffix is enough for everyone?
+	 */
+	const size_t max_suffix = 6;
+	const size_t expected_len = strlen(path) + 1 + strlen(branch) + 1 +
+		max_suffix + 1;
+
+	ret = mem_pool_alloc(&opt->priv->pool, expected_len);
+	strbuf_attach(&newpath, ret, 0, expected_len);
 
 	strbuf_addf(&newpath, "%s~", path);
 	add_flattened_path(&newpath, branch);
@@ -741,10 +751,10 @@ static char *unique_path(struct merge_options *opt,
 		strbuf_addf(&newpath, "_%d", suffix++);
 	}
 
-	/* Track the new path in our memory pool */
-	ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
-	memcpy(ret, newpath.buf, newpath.len + 1);
-	strbuf_release(&newpath);
+	if (newpath.alloc > expected_len)
+		BUG("we assumed too much thinking '%s~%s' would fit in %lu, ended up %lu ('%s')",
+		    path, branch, expected_len, newpath.alloc, newpath.buf);
+
 	return ret;
 }
 

A bit nasty for sure, but if you're willing to BUG() out if we ever go
above 999999 suffix tries or whatever (which would be trivial to add to
the loop there) it's rather straightforward.

I.e. we know the size of the buffer ahead of time, except for that loop
that'll add "_%d" to the end, and that can be made bounded.

Obviously your solution's a lot simpler, so I think this is only
something you should consider if you think it matters for the
performance numbers linked to from 6697ee01b5d3. I'm not familiar enough
with merge-ort.c to know if it is in this case, or if this would be
pointless micro-optimization on a non-hot codepath.


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames()
  2022-02-19 21:44   ` Ævar Arnfjörð Bjarmason
@ 2022-02-20  0:26     ` Elijah Newren
  0 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2022-02-20  0:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sat, Feb 19, 2022 at 1:45 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > detect_and_process_renames() detects renames on both sides of history
> > and then combines these into a single diff_queue_struct.  The combined
> > diff_queue_struct needs to be able to hold the renames found on either
> > side, and since it knows the (maximum) size it needs, it pre-emptively
> > grows the array to the appropriate size:
> >
> >       ALLOC_GROW(combined.queue,
> >                  renames->pairs[1].nr + renames->pairs[2].nr,
> >                  combined.alloc);
> >
> > It then collects the items from each side:
> >
> >       collect_renames(opt, &combined, MERGE_SIDE1, ...)
> >       collect_renames(opt, &combined, MERGE_SIDE2, ...)
> >
> > Note, though, that collect_renames() sometimes determines that some
> > pairs are unnecessary and does not include them in the combined array.
> > When it is done, detect_and_process_renames() frees this memory:
> >
> >       if (combined.nr) {
> >                 ...
> >               free(combined.queue);
> >         }
> >
> > The problem is that sometimes even when there are pairs, none of them are
> > necessary.  Instead of checking combined.nr, we should check
> > combined.alloc.  Doing so fixes the following memory leak, as reported
> > by valgrind:
> >
> > ==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
> > ==PID==    at 0xADDRESS: malloc
> > ==PID==    by 0xADDRESS: realloc
> > ==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
> > ==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
> > ==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
> > ==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
> > ==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
> > ==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
> > ==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
> > ==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
> > ==PID==    by 0xADDRESS: run_builtin (git.c:461)
> > ==PID==    by 0xADDRESS: handle_builtin (git.c:713)
> > ==PID==    by 0xADDRESS: run_argv (git.c:780)
> > ==PID==    by 0xADDRESS: cmd_main (git.c:911)
> > ==PID==    by 0xADDRESS: main (common-main.c:52)
> >
> > Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index d85b1cd99e9..4f5abc558c5 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3175,7 +3175,7 @@ simple_cleanup:
> >               free(renames->pairs[s].queue);
> >               DIFF_QUEUE_CLEAR(&renames->pairs[s]);
> >       }
> > -     if (combined.nr) {
> > +     if (combined.alloc) {
> >               int i;
> >               for (i = 0; i < combined.nr; i++)
> >                       pool_diff_free_filepair(&opt->priv->pool,
>
> This looks correct in that it'll work, but I still don't get why the
> pre-image or post-image is going through this indirection of checking
> "alloc" at all. Wouldn't this be correct & more straightforward (the
> memset part is optional, just did it for good measure)?:
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 40ae4dc4e92..a01f28586a1 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3092,12 +3092,12 @@ static int detect_and_process_renames(struct merge_options *opt,
>                                       struct tree *side1,
>                                       struct tree *side2)
>  {
> -       struct diff_queue_struct combined;
> +       struct diff_queue_struct combined = { 0 };
>         struct rename_info *renames = &opt->priv->renames;
>         int need_dir_renames, s, clean = 1;
>         unsigned detection_run = 0;
> +       int i;
>
> -       memset(&combined, 0, sizeof(combined));
>         if (!possible_renames(renames))
>                 goto cleanup;
>
> @@ -3181,13 +3181,10 @@ static int detect_and_process_renames(struct merge_options *opt,
>                 free(renames->pairs[s].queue);
>                 DIFF_QUEUE_CLEAR(&renames->pairs[s]);
>         }
> -       if (combined.alloc) {
> -               int i;
> -               for (i = 0; i < combined.nr; i++)
> -                       pool_diff_free_filepair(&opt->priv->pool,
> -                                               combined.queue[i]);
> -               free(combined.queue);
> -       }
> +
> +       for (i = 0; i < combined.nr; i++)
> +               pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
> +       free(combined.queue);
>
>         return clean;
>  }

Hmm, good point; I like your version better.  I'll change it, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] merge-ort: fix small memory leak in unique_path()
  2022-02-19 22:22   ` Ævar Arnfjörð Bjarmason
@ 2022-02-20  0:37     ` Elijah Newren
  0 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2022-02-20  0:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sat, Feb 19, 2022 at 2:31 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > The struct strmap paths member of merge_options_internal is perhaps the
> > most central data structure to all of merge-ort.  Because all the paths
> > involved in the merge need to be kept until the merge is complete, this
> > "paths" data structure traditionally took responsibility for owning all
> > the allocated paths.  When the merge is over, those paths were free()d
> > as part of free()ing this strmap.
> >
> > In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
> > memory pools, 2021-07-30), we changed the allocations for pathnames to
> > come from a memory pool.  That meant the ownership changed slightly;
> > there were no individual free() calls to make, instead the memory pool
> > owned all those paths and they were free()d all at once.
> >
> > Unfortunately unique_path() was written presuming the pre-memory-pool
> > model, and allocated a path on the heap and left it in the strmap for
> > later free()ing.  Modify it to return a path allocated from the memory
> > pool instead.
>
> This seems like a rather obvious fix to the leak, as the other side
> wasn't ready to have the detached strbuf handed to it, and instead is
> assuming everything is mempools.
>
> The downside is a bit of heap churn here since you malloc() & use the
> strbuf just to ask for that size from the mempool, and then free() the
> strbuf (of course we had that before, we just weren't free-ing).
>
> So this is just an aside & I have no idea if it's worth it, but FWIW you
> can have your cake & eat it too here memory-allocation wise and avoid
> the strbuf allocation entirely, and just use your mem-pool.
>
> Like this:
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 40ae4dc4e92..1111916d5cb 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -731,6 +731,16 @@ static char *unique_path(struct merge_options *opt,
>         int suffix = 0;
>         size_t base_len;
>         struct strmap *existing_paths = &opt->priv->paths;
> +       /*
> +        * pre-size path + ~ + branch + _%d + "\0". Hopefully 6 digits
> +        * of suffix is enough for everyone?
> +        */
> +       const size_t max_suffix = 6;
> +       const size_t expected_len = strlen(path) + 1 + strlen(branch) + 1 +
> +               max_suffix + 1;
> +
> +       ret = mem_pool_alloc(&opt->priv->pool, expected_len);
> +       strbuf_attach(&newpath, ret, 0, expected_len);
>
>         strbuf_addf(&newpath, "%s~", path);
>         add_flattened_path(&newpath, branch);
> @@ -741,10 +751,10 @@ static char *unique_path(struct merge_options *opt,
>                 strbuf_addf(&newpath, "_%d", suffix++);
>         }
>
> -       /* Track the new path in our memory pool */
> -       ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
> -       memcpy(ret, newpath.buf, newpath.len + 1);
> -       strbuf_release(&newpath);
> +       if (newpath.alloc > expected_len)
> +               BUG("we assumed too much thinking '%s~%s' would fit in %lu, ended up %lu ('%s')",
> +                   path, branch, expected_len, newpath.alloc, newpath.buf);
> +
>         return ret;
>  }
>
>
> A bit nasty for sure, but if you're willing to BUG() out if we ever go
> above 999999 suffix tries or whatever (which would be trivial to add to
> the loop there) it's rather straightforward.
>
> I.e. we know the size of the buffer ahead of time, except for that loop
> that'll add "_%d" to the end, and that can be made bounded.
>
> Obviously your solution's a lot simpler, so I think this is only
> something you should consider if you think it matters for the
> performance numbers linked to from 6697ee01b5d3. I'm not familiar enough
> with merge-ort.c to know if it is in this case, or if this would be
> pointless micro-optimization on a non-hot codepath.

That's an interesting idea, but it's a micro-optimization on a very
cold path.  You need to either have a D/F conflict that persists or a
mode-type conflict (e.g. an add/add conflict with symlink vs. file
types).  Those tend to be quite rare; in fact, the testcase with the
performance numbers in 6697ee01b5d3 didn't have any of these and never
even triggered this codepath (otherwise I would have caught the leak
in my earlier leak testing).  So I think simple and robust makes more
sense here.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 0/2] Fix a couple small leaks in merge-ort
  2022-02-19 17:09 [PATCH 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
  2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
  2022-02-19 17:10 ` [PATCH 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
@ 2022-02-20  1:29 ` Elijah Newren via GitGitGadget
  2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
                     ` (2 more replies)
  2 siblings, 3 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-02-20  1:29 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Elijah Newren

Off-list, Ævar reported a few small leaks in merge-ort to me that I missed
previously. Here's a couple fixes.

Changes since v1:

 * Simplified patch 1 a bit as per Ævar's suggestion

Elijah Newren (2):
  merge-ort: fix small memory leak in detect_and_process_renames()
  merge-ort: fix small memory leak in unique_path()

 merge-ort.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)


base-commit: e2ac9141e64e2cd3e690d1b5fc848949827c09b4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1152%2Fnewren%2Fmerge-ort-leak-fixes-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1152/newren/merge-ort-leak-fixes-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1152

Range-diff vs v1:

 1:  f0308de28e4 ! 1:  f1f7fc97fe2 merge-ort: fix small memory leak in detect_and_process_renames()
     @@ Commit message
                          free(combined.queue);
                  }
      
     -    The problem is that sometimes even when there are pairs, none of them are
     -    necessary.  Instead of checking combined.nr, we should check
     -    combined.alloc.  Doing so fixes the following memory leak, as reported
     -    by valgrind:
     +    The problem is that sometimes even when there are pairs, none of them
     +    are necessary.  Instead of checking combined.nr, just remove the
     +    if-check; free() knows to skip NULL pointers.  This change fixes the
     +    following memory leak, as reported by valgrind:
      
          ==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
          ==PID==    at 0xADDRESS: malloc
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     +@@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
     + 				      struct tree *side1,
     + 				      struct tree *side2)
     + {
     +-	struct diff_queue_struct combined;
     ++	struct diff_queue_struct combined = { 0 };
     + 	struct rename_info *renames = &opt->priv->renames;
     +-	int need_dir_renames, s, clean = 1;
     ++	int need_dir_renames, s, i, clean = 1;
     + 	unsigned detection_run = 0;
     + 
     +-	memset(&combined, 0, sizeof(combined));
     + 	if (!possible_renames(renames))
     + 		goto cleanup;
     + 
      @@ merge-ort.c: simple_cleanup:
       		free(renames->pairs[s].queue);
       		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
       	}
      -	if (combined.nr) {
     -+	if (combined.alloc) {
     - 		int i;
     - 		for (i = 0; i < combined.nr; i++)
     - 			pool_diff_free_filepair(&opt->priv->pool,
     +-		int i;
     +-		for (i = 0; i < combined.nr; i++)
     +-			pool_diff_free_filepair(&opt->priv->pool,
     +-						combined.queue[i]);
     +-		free(combined.queue);
     +-	}
     ++	for (i = 0; i < combined.nr; i++)
     ++		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
     ++	free(combined.queue);
     + 
     + 	return clean;
     + }
 2:  73bc1e5c5df = 2:  69fb932c21d merge-ort: fix small memory leak in unique_path()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()
  2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
@ 2022-02-20  1:29   ` Elijah Newren via GitGitGadget
  2022-02-21  2:35     ` Taylor Blau
                       ` (2 more replies)
  2022-02-20  1:29   ` [PATCH v2 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
  2022-02-21  2:43   ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Taylor Blau
  2 siblings, 3 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-02-20  1:29 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

detect_and_process_renames() detects renames on both sides of history
and then combines these into a single diff_queue_struct.  The combined
diff_queue_struct needs to be able to hold the renames found on either
side, and since it knows the (maximum) size it needs, it pre-emptively
grows the array to the appropriate size:

	ALLOC_GROW(combined.queue,
		   renames->pairs[1].nr + renames->pairs[2].nr,
		   combined.alloc);

It then collects the items from each side:

	collect_renames(opt, &combined, MERGE_SIDE1, ...)
	collect_renames(opt, &combined, MERGE_SIDE2, ...)

Note, though, that collect_renames() sometimes determines that some
pairs are unnecessary and does not include them in the combined array.
When it is done, detect_and_process_renames() frees this memory:

	if (combined.nr) {
                ...
		free(combined.queue);
        }

The problem is that sometimes even when there are pairs, none of them
are necessary.  Instead of checking combined.nr, just remove the
if-check; free() knows to skip NULL pointers.  This change fixes the
following memory leak, as reported by valgrind:

==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
==PID==    at 0xADDRESS: malloc
==PID==    by 0xADDRESS: realloc
==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
==PID==    by 0xADDRESS: run_builtin (git.c:461)
==PID==    by 0xADDRESS: handle_builtin (git.c:713)
==PID==    by 0xADDRESS: run_argv (git.c:780)
==PID==    by 0xADDRESS: cmd_main (git.c:911)
==PID==    by 0xADDRESS: main (common-main.c:52)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index d85b1cd99e9..3d7f9feb6f7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
 				      struct tree *side1,
 				      struct tree *side2)
 {
-	struct diff_queue_struct combined;
+	struct diff_queue_struct combined = { 0 };
 	struct rename_info *renames = &opt->priv->renames;
-	int need_dir_renames, s, clean = 1;
+	int need_dir_renames, s, i, clean = 1;
 	unsigned detection_run = 0;
 
-	memset(&combined, 0, sizeof(combined));
 	if (!possible_renames(renames))
 		goto cleanup;
 
@@ -3175,13 +3174,9 @@ simple_cleanup:
 		free(renames->pairs[s].queue);
 		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
 	}
-	if (combined.nr) {
-		int i;
-		for (i = 0; i < combined.nr; i++)
-			pool_diff_free_filepair(&opt->priv->pool,
-						combined.queue[i]);
-		free(combined.queue);
-	}
+	for (i = 0; i < combined.nr; i++)
+		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
+	free(combined.queue);
 
 	return clean;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/2] merge-ort: fix small memory leak in unique_path()
  2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
  2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
@ 2022-02-20  1:29   ` Elijah Newren via GitGitGadget
  2022-02-21  2:43   ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Taylor Blau
  2 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-02-20  1:29 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

The struct strmap paths member of merge_options_internal is perhaps the
most central data structure to all of merge-ort.  Because all the paths
involved in the merge need to be kept until the merge is complete, this
"paths" data structure traditionally took responsibility for owning all
the allocated paths.  When the merge is over, those paths were free()d
as part of free()ing this strmap.

In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
memory pools, 2021-07-30), we changed the allocations for pathnames to
come from a memory pool.  That meant the ownership changed slightly;
there were no individual free() calls to make, instead the memory pool
owned all those paths and they were free()d all at once.

Unfortunately unique_path() was written presuming the pre-memory-pool
model, and allocated a path on the heap and left it in the strmap for
later free()ing.  Modify it to return a path allocated from the memory
pool instead.

Note that there's one instance -- in record_conflicted_index_entries()
-- where the returned string from unique_path() was only used very
temporarily and thus had been immediately free()'d.  This codepath was
associated with an ugly skip-worktree workaround that has since been
better fixed by the in-flight en/present-despite-skipped topic.  This
workaround probably makes sense to excise once that topic merges down,
but for now, just remove the immediate free() and allow the returned
string to be free()d when the memory pool is released.

This fixes the following memory leak as reported by valgrind:

==PID== 65 bytes in 1 blocks are definitely lost in loss record 79 of 134
==PID==    at 0xADDRESS: malloc
==PID==    by 0xADDRESS: realloc
==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
==PID==    by 0xADDRESS: strbuf_grow (strbuf.c:98)
==PID==    by 0xADDRESS: strbuf_vaddf (strbuf.c:394)
==PID==    by 0xADDRESS: strbuf_addf (strbuf.c:335)
==PID==    by 0xADDRESS: unique_path (merge-ort.c:733)
==PID==    by 0xADDRESS: process_entry (merge-ort.c:3678)
==PID==    by 0xADDRESS: process_entries (merge-ort.c:4037)
==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4621)
==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 3d7f9feb6f7..71134d51502 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -722,13 +722,15 @@ static void add_flattened_path(struct strbuf *out, const char *s)
 			out->buf[i] = '_';
 }
 
-static char *unique_path(struct strmap *existing_paths,
+static char *unique_path(struct merge_options *opt,
 			 const char *path,
 			 const char *branch)
 {
+	char *ret = NULL;
 	struct strbuf newpath = STRBUF_INIT;
 	int suffix = 0;
 	size_t base_len;
+	struct strmap *existing_paths = &opt->priv->paths;
 
 	strbuf_addf(&newpath, "%s~", path);
 	add_flattened_path(&newpath, branch);
@@ -739,7 +741,11 @@ static char *unique_path(struct strmap *existing_paths,
 		strbuf_addf(&newpath, "_%d", suffix++);
 	}
 
-	return strbuf_detach(&newpath, NULL);
+	/* Track the new path in our memory pool */
+	ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
+	memcpy(ret, newpath.buf, newpath.len + 1);
+	strbuf_release(&newpath);
+	return ret;
 }
 
 /*** Function Grouping: functions related to collect_merge_info() ***/
@@ -3674,7 +3680,7 @@ static void process_entry(struct merge_options *opt,
 		 */
 		df_file_index = (ci->dirmask & (1 << 1)) ? 2 : 1;
 		branch = (df_file_index == 1) ? opt->branch1 : opt->branch2;
-		path = unique_path(&opt->priv->paths, path, branch);
+		path = unique_path(opt, path, branch);
 		strmap_put(&opt->priv->paths, path, new_ci);
 
 		path_msg(opt, path, 0,
@@ -3799,14 +3805,12 @@ static void process_entry(struct merge_options *opt,
 			/* Insert entries into opt->priv_paths */
 			assert(rename_a || rename_b);
 			if (rename_a) {
-				a_path = unique_path(&opt->priv->paths,
-						     path, opt->branch1);
+				a_path = unique_path(opt, path, opt->branch1);
 				strmap_put(&opt->priv->paths, a_path, ci);
 			}
 
 			if (rename_b)
-				b_path = unique_path(&opt->priv->paths,
-						     path, opt->branch2);
+				b_path = unique_path(opt, path, opt->branch2);
 			else
 				b_path = path;
 			strmap_put(&opt->priv->paths, b_path, new_ci);
@@ -4194,7 +4198,7 @@ static int record_conflicted_index_entries(struct merge_options *opt)
 				struct stat st;
 
 				if (!lstat(path, &st)) {
-					char *new_name = unique_path(&opt->priv->paths,
+					char *new_name = unique_path(opt,
 								     path,
 								     "cruft");
 
@@ -4202,7 +4206,6 @@ static int record_conflicted_index_entries(struct merge_options *opt)
 						 _("Note: %s not up to date and in way of checking out conflicted version; old copy renamed to %s"),
 						 path, new_name);
 					errs |= rename(path, new_name);
-					free(new_name);
 				}
 				errs |= checkout_entry(ce, &state, NULL, NULL);
 			}
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()
  2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
@ 2022-02-21  2:35     ` Taylor Blau
  2022-02-23  7:57       ` Elijah Newren
  2022-06-01 10:00     ` Ævar Arnfjörð Bjarmason
  2022-06-01 10:09     ` Flaky SANITIZE=leak test "regression" in v2.36.0 (was: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()) Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2022-02-21  2:35 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Elijah Newren

On Sun, Feb 20, 2022 at 01:29:50AM +0000, Elijah Newren via GitGitGadget wrote:
>  merge-ort.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)

Both versions of this patch look good to me (in fact, I appreciated
seeing both v1 and v2, since v1 makes it more obvious what is changing,
but v2 does the whole thing in a little bit of a cleaner way).

> diff --git a/merge-ort.c b/merge-ort.c
> index d85b1cd99e9..3d7f9feb6f7 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
>  				      struct tree *side1,
>  				      struct tree *side2)
>  {
> -	struct diff_queue_struct combined;
> +	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> -	int need_dir_renames, s, clean = 1;
> +	int need_dir_renames, s, i, clean = 1;

And this entire patch looks good to me, but I did wonder about why "i"
is an int here. Shouldn't it be a size_t instead? Looking at
diff_queue_struct's definition, both "alloc" and "nr" are signed ints,
when they should almost certainly be unsigned to avoid overflow.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] Fix a couple small leaks in merge-ort
  2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
  2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
  2022-02-20  1:29   ` [PATCH v2 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
@ 2022-02-21  2:43   ` Taylor Blau
  2 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2022-02-21  2:43 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Elijah Newren

On Sun, Feb 20, 2022 at 01:29:49AM +0000, Elijah Newren via GitGitGadget wrote:
> Off-list, Ævar reported a few small leaks in merge-ort to me that I missed
> previously. Here's a couple fixes.
>
> Changes since v1:
>
>  * Simplified patch 1 a bit as per Ævar's suggestion

Thanks; both look good. I left a comment about an issue unrelated to
this series on the first patch. But this looks ready to go to my eyes.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()
  2022-02-21  2:35     ` Taylor Blau
@ 2022-02-23  7:57       ` Elijah Newren
  0 siblings, 0 replies; 15+ messages in thread
From: Elijah Newren @ 2022-02-23  7:57 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Ævar Arnfjörð Bjarmason

On Sun, Feb 20, 2022 at 6:35 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Sun, Feb 20, 2022 at 01:29:50AM +0000, Elijah Newren via GitGitGadget wrote:
> >  merge-ort.c | 15 +++++----------
> >  1 file changed, 5 insertions(+), 10 deletions(-)
>
> Both versions of this patch look good to me (in fact, I appreciated
> seeing both v1 and v2, since v1 makes it more obvious what is changing,
> but v2 does the whole thing in a little bit of a cleaner way).

Thanks for taking a look!

> > diff --git a/merge-ort.c b/merge-ort.c
> > index d85b1cd99e9..3d7f9feb6f7 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
> >                                     struct tree *side1,
> >                                     struct tree *side2)
> >  {
> > -     struct diff_queue_struct combined;
> > +     struct diff_queue_struct combined = { 0 };
> >       struct rename_info *renames = &opt->priv->renames;
> > -     int need_dir_renames, s, clean = 1;
> > +     int need_dir_renames, s, i, clean = 1;
>
> And this entire patch looks good to me, but I did wonder about why "i"
> is an int here. Shouldn't it be a size_t instead? Looking at
> diff_queue_struct's definition, both "alloc" and "nr" are signed ints,
> when they should almost certainly be unsigned to avoid overflow.

You may be right, but I'm not sure we're too worried right now about
folks having billions of paths involved in renames; such a repo would
make even the Microsoft monorepos look miniscule.  ;-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()
  2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
  2022-02-21  2:35     ` Taylor Blau
@ 2022-06-01 10:00     ` Ævar Arnfjörð Bjarmason
  2022-06-01 10:09     ` Flaky SANITIZE=leak test "regression" in v2.36.0 (was: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()) Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-01 10:00 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren


On Sun, Feb 20 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> detect_and_process_renames() detects renames on both sides of history
> and then combines these into a single diff_queue_struct.  The combined
> diff_queue_struct needs to be able to hold the renames found on either
> side, and since it knows the (maximum) size it needs, it pre-emptively
> grows the array to the appropriate size:
>
> 	ALLOC_GROW(combined.queue,
> 		   renames->pairs[1].nr + renames->pairs[2].nr,
> 		   combined.alloc);
>
> It then collects the items from each side:
>
> 	collect_renames(opt, &combined, MERGE_SIDE1, ...)
> 	collect_renames(opt, &combined, MERGE_SIDE2, ...)
>
> Note, though, that collect_renames() sometimes determines that some
> pairs are unnecessary and does not include them in the combined array.
> When it is done, detect_and_process_renames() frees this memory:
>
> 	if (combined.nr) {
>                 ...
> 		free(combined.queue);
>         }
>
> The problem is that sometimes even when there are pairs, none of them
> are necessary.  Instead of checking combined.nr, just remove the
> if-check; free() knows to skip NULL pointers.  This change fixes the
> following memory leak, as reported by valgrind:
>
> ==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
> ==PID==    at 0xADDRESS: malloc
> ==PID==    by 0xADDRESS: realloc
> ==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
> ==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
> ==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
> ==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
> ==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
> ==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
> ==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
> ==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
> ==PID==    by 0xADDRESS: run_builtin (git.c:461)
> ==PID==    by 0xADDRESS: handle_builtin (git.c:713)
> ==PID==    by 0xADDRESS: run_argv (git.c:780)
> ==PID==    by 0xADDRESS: cmd_main (git.c:911)
> ==PID==    by 0xADDRESS: main (common-main.c:52)
>
> Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index d85b1cd99e9..3d7f9feb6f7 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
>  				      struct tree *side1,
>  				      struct tree *side2)
>  {
> -	struct diff_queue_struct combined;
> +	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> -	int need_dir_renames, s, clean = 1;
> +	int need_dir_renames, s, i, clean = 1;
>  	unsigned detection_run = 0;
>  
> -	memset(&combined, 0, sizeof(combined));
>  	if (!possible_renames(renames))
>  		goto cleanup;
>  
> @@ -3175,13 +3174,9 @@ simple_cleanup:
>  		free(renames->pairs[s].queue);
>  		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
>  	}
> -	if (combined.nr) {
> -		int i;
> -		for (i = 0; i < combined.nr; i++)
> -			pool_diff_free_filepair(&opt->priv->pool,
> -						combined.queue[i]);
> -		free(combined.queue);
> -	}
> +	for (i = 0; i < combined.nr; i++)
> +		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
> +	free(combined.queue);
>  
>  	return clean;
>  }


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Flaky SANITIZE=leak test "regression" in v2.36.0 (was: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames())
  2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
  2022-02-21  2:35     ` Taylor Blau
  2022-06-01 10:00     ` Ævar Arnfjörð Bjarmason
@ 2022-06-01 10:09     ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-01 10:09 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren, Derrick Stolee


On Sun, Feb 20 2022, Elijah Newren via GitGitGadget wrote:

[I fat-fingered an empty E-Mail reply to this earlier in
https://lore.kernel.org/git/220601.86h7541yqj.gmgdl@evledraar.gmail.com/,
sorry!]

> From: Elijah Newren <newren@gmail.com>
>
> detect_and_process_renames() detects renames on both sides of history
> and then combines these into a single diff_queue_struct.  The combined
> diff_queue_struct needs to be able to hold the renames found on either
> side, and since it knows the (maximum) size it needs, it pre-emptively
> grows the array to the appropriate size:
> [...]
> diff --git a/merge-ort.c b/merge-ort.c
> index d85b1cd99e9..3d7f9feb6f7 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
>  				      struct tree *side1,
>  				      struct tree *side2)
>  {
> -	struct diff_queue_struct combined;
> +	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> -	int need_dir_renames, s, clean = 1;
> +	int need_dir_renames, s, i, clean = 1;
>  	unsigned detection_run = 0;
>  
> -	memset(&combined, 0, sizeof(combined));
>  	if (!possible_renames(renames))
>  		goto cleanup;
>  
> @@ -3175,13 +3174,9 @@ simple_cleanup:
>  		free(renames->pairs[s].queue);
>  		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
>  	}
> -	if (combined.nr) {
> -		int i;
> -		for (i = 0; i < combined.nr; i++)
> -			pool_diff_free_filepair(&opt->priv->pool,
> -						combined.queue[i]);
> -		free(combined.queue);
> -	}
> +	for (i = 0; i < combined.nr; i++)
> +		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
> +	free(combined.queue);
>  
>  	return clean;
>  }

I haven't been able to find whether this actually causes anything bad,
but when I started digging into SANITIZE=leak failures I found that this
change made t6435-merge-sparse.sh flaky in the v2.36.0 release when
compiled with SANITIZE=leak.

I.e. it "should" fail, but with 8d60e9d2010 (merge-ort: fix small memory
leak in detect_and_process_renames(), 2022-02-20) it will sometimes
succeed.

I bisected it between 2.35.0..2.36.0 with:

	git bisect run sh -c 'make SANITIZE=leak && (cd t && ! (for i in $(seq 1 20); do ./t6435-merge-sparse.sh >/dev/null && echo $? || echo $?; done | grep -q 0))'

I.e. "fail if we succeed" (there's some redundancy in the ad-hoc
one-liner). Manually debugging it with:
	
	diff --git a/dir.c b/dir.c
	index d91295f2bcd..3e860769c26 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -878,6 +878,8 @@ void add_pattern(const char *string, const char *base,
	 	unsigned flags;
	 	int nowildcardlen;
	 
	+	fprintf(stderr, "adding pattern %s\n", string);
	+
	 	parse_path_pattern(&string, &patternlen, &flags, &nowildcardlen);
	 	if (flags & PATTERN_FLAG_MUSTBEDIR) {
	 		FLEXPTR_ALLOC_MEM(pattern, pattern, string, patternlen);

Turns up this odd case, i.e. we "should" fail at the end of the "setup"
in this bit of test code:

	[...]
        test_commit_this ours &&
        git config core.sparseCheckout true &&
        echo "/checked-out" >.git/info/sparse-checkout &&
        git reset --hard &&
        test_must_fail git merge their

Here we leak in "git reset", so we "should" get a leak like:
	
	[...]	
	+ git tag ours
	+ git config core.sparseCheckout true
	+ echo /checked-out
	+ git reset --hard
	adding pattern /checked-out
	adding pattern /checked-out
	adding pattern /checked-out
	HEAD is now at 3dfe889 ours
	
	=================================================================
	==25385==ERROR: LeakSanitizer: detected memory leaks
	
	Indirect leak of 512 byte(s) in 1 object(s) allocated from:
	    #0 0x7f9652ef50aa in __interceptor_calloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:90
	    #1 0x6ab6dc in xcalloc /home/avar/g/git/wrapper.c:140
	    #2 0x58e5ac in alloc_table /home/avar/g/git/hashmap.c:79
	    #3 0x58e8e9 in hashmap_init /home/avar/g/git/hashmap.c:168
	    #4 0x569808 in add_patterns_from_buffer /home/avar/g/git/dir.c:1136
	    #5 0x5697a1 in add_patterns /home/avar/g/git/dir.c:1124
	    #6 0x56996a in add_patterns_from_file_to_list /home/avar/g/git/dir.c:1164
	    #7 0x56e0b2 in get_sparse_checkout_patterns /home/avar/g/git/dir.c:3273
	    #8 0x56a240 in init_sparse_checkout_patterns /home/avar/g/git/dir.c:1451
	[...]
	
That's consistent with what I see before, but sometimes it'll succeed
like this:

	[...]		
	+ git tag ours
	+ git config core.sparseCheckout true
	+ echo /checked-out
	+ git reset --hard
	adding pattern /checked-out   
	HEAD is now at 3dfe889 ours   
        [...]

I.e. for some reason the same "reset --hard" now adds one, not three
patterns (which before were all identical).

Now, the "flaky success" with SANITIZE=leak does appear to be new in
8d60e9d2010, but before that running this in a loop reveals that the 2nd
test sometimes succeeds, so perhaps the underlying "issue" isn't new.

I say "issue" because I haven't dug enough to see if this has any impact
on anything, and the failure I discovered doesn't per-se matter now.

But perhaps this observed indeterminism is pointing the way to some
deeper bug here? Or maybe it isn't...
	

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-06-01 10:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-19 17:09 [PATCH 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-19 21:44   ` Ævar Arnfjörð Bjarmason
2022-02-20  0:26     ` Elijah Newren
2022-02-19 17:10 ` [PATCH 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-19 22:22   ` Ævar Arnfjörð Bjarmason
2022-02-20  0:37     ` Elijah Newren
2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-21  2:35     ` Taylor Blau
2022-02-23  7:57       ` Elijah Newren
2022-06-01 10:00     ` Ævar Arnfjörð Bjarmason
2022-06-01 10:09     ` Flaky SANITIZE=leak test "regression" in v2.36.0 (was: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()) Ævar Arnfjörð Bjarmason
2022-02-20  1:29   ` [PATCH v2 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-21  2:43   ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).