git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/2] merge-ort: fix small memory leak in unique_path()
Date: Sat, 19 Feb 2022 23:22:44 +0100	[thread overview]
Message-ID: <220219.86o832cwup.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <73bc1e5c5dffbe9c132ea786dd414ef2159967e3.1645290601.git.gitgitgadget@gmail.com>


On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> The struct strmap paths member of merge_options_internal is perhaps the
> most central data structure to all of merge-ort.  Because all the paths
> involved in the merge need to be kept until the merge is complete, this
> "paths" data structure traditionally took responsibility for owning all
> the allocated paths.  When the merge is over, those paths were free()d
> as part of free()ing this strmap.
>
> In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
> memory pools, 2021-07-30), we changed the allocations for pathnames to
> come from a memory pool.  That meant the ownership changed slightly;
> there were no individual free() calls to make, instead the memory pool
> owned all those paths and they were free()d all at once.
>
> Unfortunately unique_path() was written presuming the pre-memory-pool
> model, and allocated a path on the heap and left it in the strmap for
> later free()ing.  Modify it to return a path allocated from the memory
> pool instead.

This seems like a rather obvious fix to the leak, as the other side
wasn't ready to have the detached strbuf handed to it, and instead is
assuming everything is mempools.

The downside is a bit of heap churn here since you malloc() & use the
strbuf just to ask for that size from the mempool, and then free() the
strbuf (of course we had that before, we just weren't free-ing).

So this is just an aside & I have no idea if it's worth it, but FWIW you
can have your cake & eat it too here memory-allocation wise and avoid
the strbuf allocation entirely, and just use your mem-pool.

Like this:

diff --git a/merge-ort.c b/merge-ort.c
index 40ae4dc4e92..1111916d5cb 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -731,6 +731,16 @@ static char *unique_path(struct merge_options *opt,
 	int suffix = 0;
 	size_t base_len;
 	struct strmap *existing_paths = &opt->priv->paths;
+	/*
+	 * pre-size path + ~ + branch + _%d + "\0". Hopefully 6 digits
+	 * of suffix is enough for everyone?
+	 */
+	const size_t max_suffix = 6;
+	const size_t expected_len = strlen(path) + 1 + strlen(branch) + 1 +
+		max_suffix + 1;
+
+	ret = mem_pool_alloc(&opt->priv->pool, expected_len);
+	strbuf_attach(&newpath, ret, 0, expected_len);
 
 	strbuf_addf(&newpath, "%s~", path);
 	add_flattened_path(&newpath, branch);
@@ -741,10 +751,10 @@ static char *unique_path(struct merge_options *opt,
 		strbuf_addf(&newpath, "_%d", suffix++);
 	}
 
-	/* Track the new path in our memory pool */
-	ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
-	memcpy(ret, newpath.buf, newpath.len + 1);
-	strbuf_release(&newpath);
+	if (newpath.alloc > expected_len)
+		BUG("we assumed too much thinking '%s~%s' would fit in %lu, ended up %lu ('%s')",
+		    path, branch, expected_len, newpath.alloc, newpath.buf);
+
 	return ret;
 }
 

A bit nasty for sure, but if you're willing to BUG() out if we ever go
above 999999 suffix tries or whatever (which would be trivial to add to
the loop there) it's rather straightforward.

I.e. we know the size of the buffer ahead of time, except for that loop
that'll add "_%d" to the end, and that can be made bounded.

Obviously your solution's a lot simpler, so I think this is only
something you should consider if you think it matters for the
performance numbers linked to from 6697ee01b5d3. I'm not familiar enough
with merge-ort.c to know if it is in this case, or if this would be
pointless micro-optimization on a non-hot codepath.


  reply	other threads:[~2022-02-19 22:31 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-19 17:09 [PATCH 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-19 21:44   ` Ævar Arnfjörð Bjarmason
2022-02-20  0:26     ` Elijah Newren
2022-02-19 17:10 ` [PATCH 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-19 22:22   ` Ævar Arnfjörð Bjarmason [this message]
2022-02-20  0:37     ` Elijah Newren
2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-21  2:35     ` Taylor Blau
2022-02-23  7:57       ` Elijah Newren
2022-06-01 10:00     ` Ævar Arnfjörð Bjarmason
2022-06-01 10:09     ` Flaky SANITIZE=leak test "regression" in v2.36.0 (was: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()) Ævar Arnfjörð Bjarmason
2022-02-20  1:29   ` [PATCH v2 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-21  2:43   ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220219.86o832cwup.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).