From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] merge-ort: fix small memory leak in unique_path()
Date: Sat, 19 Feb 2022 16:37:50 -0800 [thread overview]
Message-ID: <CABPp-BEpSEmndTHOLrdTGmcf_+5uoR6_7fy58KXXktV4tcTXpA@mail.gmail.com> (raw)
In-Reply-To: <220219.86o832cwup.gmgdl@evledraar.gmail.com>
On Sat, Feb 19, 2022 at 2:31 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > The struct strmap paths member of merge_options_internal is perhaps the
> > most central data structure to all of merge-ort. Because all the paths
> > involved in the merge need to be kept until the merge is complete, this
> > "paths" data structure traditionally took responsibility for owning all
> > the allocated paths. When the merge is over, those paths were free()d
> > as part of free()ing this strmap.
> >
> > In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
> > memory pools, 2021-07-30), we changed the allocations for pathnames to
> > come from a memory pool. That meant the ownership changed slightly;
> > there were no individual free() calls to make, instead the memory pool
> > owned all those paths and they were free()d all at once.
> >
> > Unfortunately unique_path() was written presuming the pre-memory-pool
> > model, and allocated a path on the heap and left it in the strmap for
> > later free()ing. Modify it to return a path allocated from the memory
> > pool instead.
>
> This seems like a rather obvious fix to the leak, as the other side
> wasn't ready to have the detached strbuf handed to it, and instead is
> assuming everything is mempools.
>
> The downside is a bit of heap churn here since you malloc() & use the
> strbuf just to ask for that size from the mempool, and then free() the
> strbuf (of course we had that before, we just weren't free-ing).
>
> So this is just an aside & I have no idea if it's worth it, but FWIW you
> can have your cake & eat it too here memory-allocation wise and avoid
> the strbuf allocation entirely, and just use your mem-pool.
>
> Like this:
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 40ae4dc4e92..1111916d5cb 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -731,6 +731,16 @@ static char *unique_path(struct merge_options *opt,
> int suffix = 0;
> size_t base_len;
> struct strmap *existing_paths = &opt->priv->paths;
> + /*
> + * pre-size path + ~ + branch + _%d + "\0". Hopefully 6 digits
> + * of suffix is enough for everyone?
> + */
> + const size_t max_suffix = 6;
> + const size_t expected_len = strlen(path) + 1 + strlen(branch) + 1 +
> + max_suffix + 1;
> +
> + ret = mem_pool_alloc(&opt->priv->pool, expected_len);
> + strbuf_attach(&newpath, ret, 0, expected_len);
>
> strbuf_addf(&newpath, "%s~", path);
> add_flattened_path(&newpath, branch);
> @@ -741,10 +751,10 @@ static char *unique_path(struct merge_options *opt,
> strbuf_addf(&newpath, "_%d", suffix++);
> }
>
> - /* Track the new path in our memory pool */
> - ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
> - memcpy(ret, newpath.buf, newpath.len + 1);
> - strbuf_release(&newpath);
> + if (newpath.alloc > expected_len)
> + BUG("we assumed too much thinking '%s~%s' would fit in %lu, ended up %lu ('%s')",
> + path, branch, expected_len, newpath.alloc, newpath.buf);
> +
> return ret;
> }
>
>
> A bit nasty for sure, but if you're willing to BUG() out if we ever go
> above 999999 suffix tries or whatever (which would be trivial to add to
> the loop there) it's rather straightforward.
>
> I.e. we know the size of the buffer ahead of time, except for that loop
> that'll add "_%d" to the end, and that can be made bounded.
>
> Obviously your solution's a lot simpler, so I think this is only
> something you should consider if you think it matters for the
> performance numbers linked to from 6697ee01b5d3. I'm not familiar enough
> with merge-ort.c to know if it is in this case, or if this would be
> pointless micro-optimization on a non-hot codepath.
That's an interesting idea, but it's a micro-optimization on a very
cold path. You need to either have a D/F conflict that persists or a
mode-type conflict (e.g. an add/add conflict with symlink vs. file
types). Those tend to be quite rare; in fact, the testcase with the
performance numbers in 6697ee01b5d3 didn't have any of these and never
even triggered this codepath (otherwise I would have caught the leak
in my earlier leak testing). So I think simple and robust makes more
sense here.
next prev parent reply other threads:[~2022-02-20 0:38 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-19 17:09 [PATCH 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-19 21:44 ` Ævar Arnfjörð Bjarmason
2022-02-20 0:26 ` Elijah Newren
2022-02-19 17:10 ` [PATCH 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-19 22:22 ` Ævar Arnfjörð Bjarmason
2022-02-20 0:37 ` Elijah Newren [this message]
2022-02-20 1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-20 1:29 ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-21 2:35 ` Taylor Blau
2022-02-23 7:57 ` Elijah Newren
2022-06-01 10:00 ` Ævar Arnfjörð Bjarmason
2022-06-01 10:09 ` Flaky SANITIZE=leak test "regression" in v2.36.0 (was: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()) Ævar Arnfjörð Bjarmason
2022-02-20 1:29 ` [PATCH v2 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-21 2:43 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CABPp-BEpSEmndTHOLrdTGmcf_+5uoR6_7fy58KXXktV4tcTXpA@mail.gmail.com \
--to=newren@gmail.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).