git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] merge-ort: fix small memory leak in unique_path()
Date: Sat, 19 Feb 2022 16:37:50 -0800	[thread overview]
Message-ID: <CABPp-BEpSEmndTHOLrdTGmcf_+5uoR6_7fy58KXXktV4tcTXpA@mail.gmail.com> (raw)
In-Reply-To: <220219.86o832cwup.gmgdl@evledraar.gmail.com>

On Sat, Feb 19, 2022 at 2:31 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > The struct strmap paths member of merge_options_internal is perhaps the
> > most central data structure to all of merge-ort.  Because all the paths
> > involved in the merge need to be kept until the merge is complete, this
> > "paths" data structure traditionally took responsibility for owning all
> > the allocated paths.  When the merge is over, those paths were free()d
> > as part of free()ing this strmap.
> >
> > In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
> > memory pools, 2021-07-30), we changed the allocations for pathnames to
> > come from a memory pool.  That meant the ownership changed slightly;
> > there were no individual free() calls to make, instead the memory pool
> > owned all those paths and they were free()d all at once.
> >
> > Unfortunately unique_path() was written presuming the pre-memory-pool
> > model, and allocated a path on the heap and left it in the strmap for
> > later free()ing.  Modify it to return a path allocated from the memory
> > pool instead.
>
> This seems like a rather obvious fix to the leak, as the other side
> wasn't ready to have the detached strbuf handed to it, and instead is
> assuming everything is mempools.
>
> The downside is a bit of heap churn here since you malloc() & use the
> strbuf just to ask for that size from the mempool, and then free() the
> strbuf (of course we had that before, we just weren't free-ing).
>
> So this is just an aside & I have no idea if it's worth it, but FWIW you
> can have your cake & eat it too here memory-allocation wise and avoid
> the strbuf allocation entirely, and just use your mem-pool.
>
> Like this:
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 40ae4dc4e92..1111916d5cb 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -731,6 +731,16 @@ static char *unique_path(struct merge_options *opt,
>         int suffix = 0;
>         size_t base_len;
>         struct strmap *existing_paths = &opt->priv->paths;
> +       /*
> +        * pre-size path + ~ + branch + _%d + "\0". Hopefully 6 digits
> +        * of suffix is enough for everyone?
> +        */
> +       const size_t max_suffix = 6;
> +       const size_t expected_len = strlen(path) + 1 + strlen(branch) + 1 +
> +               max_suffix + 1;
> +
> +       ret = mem_pool_alloc(&opt->priv->pool, expected_len);
> +       strbuf_attach(&newpath, ret, 0, expected_len);
>
>         strbuf_addf(&newpath, "%s~", path);
>         add_flattened_path(&newpath, branch);
> @@ -741,10 +751,10 @@ static char *unique_path(struct merge_options *opt,
>                 strbuf_addf(&newpath, "_%d", suffix++);
>         }
>
> -       /* Track the new path in our memory pool */
> -       ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
> -       memcpy(ret, newpath.buf, newpath.len + 1);
> -       strbuf_release(&newpath);
> +       if (newpath.alloc > expected_len)
> +               BUG("we assumed too much thinking '%s~%s' would fit in %lu, ended up %lu ('%s')",
> +                   path, branch, expected_len, newpath.alloc, newpath.buf);
> +
>         return ret;
>  }
>
>
> A bit nasty for sure, but if you're willing to BUG() out if we ever go
> above 999999 suffix tries or whatever (which would be trivial to add to
> the loop there) it's rather straightforward.
>
> I.e. we know the size of the buffer ahead of time, except for that loop
> that'll add "_%d" to the end, and that can be made bounded.
>
> Obviously your solution's a lot simpler, so I think this is only
> something you should consider if you think it matters for the
> performance numbers linked to from 6697ee01b5d3. I'm not familiar enough
> with merge-ort.c to know if it is in this case, or if this would be
> pointless micro-optimization on a non-hot codepath.

That's an interesting idea, but it's a micro-optimization on a very
cold path.  You need to either have a D/F conflict that persists or a
mode-type conflict (e.g. an add/add conflict with symlink vs. file
types).  Those tend to be quite rare; in fact, the testcase with the
performance numbers in 6697ee01b5d3 didn't have any of these and never
even triggered this codepath (otherwise I would have caught the leak
in my earlier leak testing).  So I think simple and robust makes more
sense here.

  reply	other threads:[~2022-02-20  0:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-19 17:09 [PATCH 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-19 17:09 ` [PATCH 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-19 21:44   ` Ævar Arnfjörð Bjarmason
2022-02-20  0:26     ` Elijah Newren
2022-02-19 17:10 ` [PATCH 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-19 22:22   ` Ævar Arnfjörð Bjarmason
2022-02-20  0:37     ` Elijah Newren [this message]
2022-02-20  1:29 ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Elijah Newren via GitGitGadget
2022-02-20  1:29   ` [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames() Elijah Newren via GitGitGadget
2022-02-21  2:35     ` Taylor Blau
2022-02-23  7:57       ` Elijah Newren
2022-06-01 10:00     ` Ævar Arnfjörð Bjarmason
2022-06-01 10:09     ` Flaky SANITIZE=leak test "regression" in v2.36.0 (was: [PATCH v2 1/2] merge-ort: fix small memory leak in detect_and_process_renames()) Ævar Arnfjörð Bjarmason
2022-02-20  1:29   ` [PATCH v2 2/2] merge-ort: fix small memory leak in unique_path() Elijah Newren via GitGitGadget
2022-02-21  2:43   ` [PATCH v2 0/2] Fix a couple small leaks in merge-ort Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BEpSEmndTHOLrdTGmcf_+5uoR6_7fy58KXXktV4tcTXpA@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).