git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
Date: Wed, 03 Aug 2022 10:54:09 -0700	[thread overview]
Message-ID: <xmqqv8r99pry.fsf@gitster.g> (raw)
In-Reply-To: <patch-v3-3.6-83fc1835fe5-20220802T152925Z-avarab@gmail.com> ("Ævar Arnfjörð Bjarmason"'s message of "Tue, 2 Aug 2022 17:33:13 +0200")

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs,
> trees and tags, too, 2006-12-14). As we iterate over a "<revision>..."
> command-line and encounter ad OBJ_COMMIT we want to use our "struct

"... encounter an OBJ_COMMIT, we want to ..."?

> rev_info", but with a "pending" array of one element: the one commit
> we're showing in the loop.
>
> To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and
> rev.pending.nr for its iteration. We'd then clobber those (and alloc)
> when we needed to show an OBJ_COMMIT.
>
> We'd therefore leak the "rev.pending" we started out with, and only
> free the new "rev.pending" in the "OBJ_COMMIT" case arm as
> prepare_revision_walk() would draw it down.
>
> Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
> save away the "rev.pending" before clearing it. We then add a single
> commit to it, which our indirect invocation of prepare_revision_walk()
> will remove. After that we restore the "rev.pending".
>
> Our "rev.pending" will then get free'd by the release_revisions()
> added in f6bfea0ad01 (revisions API users: use release_revisions() in
> builtin/log.c, 2022-04-13)

Hmph, this gives me a strange sense of deja-vu that I saw a better
solution in a separate patch from you, strange because I do not see
anything at the tip of 'seen' like what I thought you did elsewhere.

We do need to reuse "rev_info" we got from outside the loop so that
we will have to consistently apply diffopt and other things we got
in there from the configuration and the command line.  But after we
decide to go to "each object is shown individually" mode, the
contents of the pending array (rather, what we got from the command
line in cmdline member) is only relevant to enumerate which
individual objects are shown in the loop, and should not even be
visible to the code that handles each individual object, e.g. even
we pass &rev to this code when we see a blob:

		switch (o->type) {
		case OBJ_BLOB:
			ret = show_blob_object(&o->oid, &rev, name);
			break;

there is no point in exposing the rev.pending to show_blob_object()
at al.  The same is true for codepaths used to show a tag or a tree.
When showing a commit, we do not even want the codepath that shows a
single-commit range to touch and destroy the original rev.pending;
we instead want to give a single-element pending array.

So instead of keeping rev.pending when we enter the loop and saving
it away and restoring it, it feels a lot cleaner to invent/use an
interface to "clone" the settings in an existing rev_info by:

 * Grab rev.pending into a separate "struct object_array" that is a
   local variable in cmd_show() and clear rev.pending, immediately
   after we decide to go to "show individually" mode.

 * Iterate over the objects in that local variable.  For each object
   to be shown, prepare a rev_info, clone the setting from the
   original rev_info, put that single object to the pending member
   of the clone, use that cloned rev_info, and release the resources
   held by the cloned rev_info once we are done.

> diff --git a/builtin/log.c b/builtin/log.c
> index 88a5e98875a..b4b1d974617 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  			rev.shown_one = 1;
>  			break;
>  		case OBJ_COMMIT:
> +		{
> +			struct object_array old;
> +
> +			memcpy(&old, &rev.pending, sizeof(old));
>  			rev.pending.nr = rev.pending.alloc = 0;
>  			rev.pending.objects = NULL;
>  			add_object_array(o, name, &rev.pending);
>  			ret = cmd_log_walk_no_free(&rev);
> +			memcpy(&rev.pending, &old, sizeof(rev.pending));
>  			break;
> +		}

The reason why I find this approach a bit disturbing is that we
pretend to know for certain that pending is the only thing we need
to protect across multiple calls to the log_walk(), but I suspect
that such an overconfidence will come back and bite us.  We are not
re-initializing other states reachable from the rev_info (e.g. the
diff_options struct has various members that records what happened
in an invocation, like needed_rename_limit, found_follow, and
found_changes, that would want to be reinitialized if we are
starting a new and totally independent traversal after we are done
one traversal).

But I do not mind at all to leave such a fundamental clean-up to a
totally separate topic, and keep this patch only about "we are
clobbering and leaking rev.pending, let's plug the leak".  In fact,
I would prefer it that way.  So take all of the above as material
for possible NEEDSWORK comment, food for further thought.


  parent reply	other threads:[~2022-08-03 17:54 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-13 13:10 [PATCH 0/6] revisions API: fix more memory leaks Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
2022-07-18 14:51   ` Jeff King
2022-07-13 13:10 ` [PATCH 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
2022-07-18 14:57   ` Jeff King
2022-07-18 15:08     ` Ævar Arnfjörð Bjarmason
2022-07-13 13:10 ` [PATCH 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-07-18 15:05   ` Jeff King
2022-07-13 13:10 ` [PATCH 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-07-18 15:11   ` Jeff King
2022-07-18 15:13     ` Ævar Arnfjörð Bjarmason
2022-07-18 15:45       ` Jeff King
2022-07-29  7:07         ` Ævar Arnfjörð Bjarmason
2022-07-29 18:03           ` Jeff King
2022-07-29 18:37             ` Jeff King
2022-07-29 18:52               ` Junio C Hamano
2022-07-29 19:04                 ` Jeff King
2022-07-18 15:14 ` [PATCH 0/6] revisions API: fix more memory leaks Jeff King
2022-07-29  8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-07-30  5:22     ` Eric Sunshine
2022-07-29  8:31   ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
2022-07-29 18:44     ` Junio C Hamano
2022-07-29  8:31   ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
2022-07-29 18:55     ` Jeff King
2022-07-29 19:07       ` Jeff King
2022-07-29 18:59     ` Jeff King
2022-07-29  8:31   ` [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-07-29  8:31   ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-07-29 19:00     ` Jeff King
2022-07-29 19:02   ` [PATCH v2 0/6] revisions API: fix more memory leaks Jeff King
2022-08-02 15:33   ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-08-03 17:13       ` Junio C Hamano
2022-08-02 15:33     ` [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
2022-08-03 17:27       ` Jeff King
2022-08-03 17:29         ` Jeff King
2022-08-03 17:54       ` Junio C Hamano [this message]
2022-08-02 15:33     ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
2022-08-03 17:32       ` Jeff King
2022-08-03 17:59       ` Junio C Hamano
2022-08-04  7:51         ` Ævar Arnfjörð Bjarmason
2022-08-04 15:59           ` Junio C Hamano
2022-08-05  9:01             ` Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
2022-08-02 15:33     ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-08-03 18:30       ` Junio C Hamano
2022-08-03 17:33     ` [PATCH v3 0/6] revisions API: fix more memory leaks Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqv8r99pry.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).