git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: John Cai <johncai86@gmail.com>
Cc: git <git@vger.kernel.org>, Christian Couder <christian.couder@gmail.com>
Subject: [PATCH 3/3] parse_object_buffer(): respect save_commit_buffer
Date: Thu, 22 Sep 2022 06:15:38 -0400	[thread overview]
Message-ID: <Yyw1ytQv35+m76VC@coredump.intra.peff.net> (raw)
In-Reply-To: <Yyw0PSVe3YTQGgRS@coredump.intra.peff.net>

If the global variable "save_commit_buffer" is set to 0, then
parse_commit() will throw away the commit object data after parsing it,
rather than sticking it into a commit slab. This goes all the way back
to 60ab26de99 ([PATCH] Avoid wasting memory in git-rev-list,
2005-09-15).

But there's another code path which may similarly stash the buffer:
parse_object_buffer(). This is where we end up if we parse a commit via
parse_object(), and it's used directly in a few other code paths like
git-fsck.

The original goal of 60ab26de99 was avoiding extra memory usage for
rev-list. And there it's not all that important to catch parse_object().
We use that function only for looking at the tips of the traversal, and
the majority of the commits are parsed by following parent links, where
we use parse_commit() directly. So we were wasting some memory, but only
a small portion.

It's much easier to see the effect with fsck. Since we now turn off
save_commit_buffer by default there, we _should_ be able to drop the
freeing of the commit buffer in fsck_obj(). But if we do so (taking the
first hunk of this patch without the rest), then the peak heap of "git
fsck" in a clone of git.git goes from 136MB to 194MB. Teaching
parse_object_buffer() to respect save_commit_buffer brings that down to
134.5MB (it's hard to tell from massif's output, but I suspect the
savings comes from avoiding the overhead of the mostly-empty commit
slab).

Other programs should see a small improvement. Both "rev-list --all" and
"fsck --connectivity-only" improve by a few hundred kilobytes, as they'd
avoid loading the tip objects of their traversals.

Most importantly, no code should be hurt by doing this. Any program that
turns off save_commit_buffer is already making the assumption that any
commit it sees may need to have its object data loaded on demand, as it
doesn't know which ones were parsed by parse_commit() versus
parse_object(). Not to mention that anything parsed by the commit graph
may be in the same boat, even if save_commit_buffer was not disabled.

This should be the only spot that needs to be fixed. Grepping for
set_commit_buffer() shows that this and parse_commit() are the only
relevant calls.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c | 3 ---
 object.c       | 3 ++-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b45de003d4..41acbc229e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -439,9 +439,6 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
 out:
 	if (obj->type == OBJ_TREE)
 		free_tree_buffer((struct tree *)obj);
-	if (obj->type == OBJ_COMMIT)
-		free_commit_buffer(the_repository->parsed_objects,
-				   (struct commit *)obj);
 	return err;
 }
 
diff --git a/object.c b/object.c
index 2e4589bae5..8a74eb85e9 100644
--- a/object.c
+++ b/object.c
@@ -233,7 +233,8 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 		if (commit) {
 			if (parse_commit_buffer(r, commit, buffer, size, 1))
 				return NULL;
-			if (!get_cached_commit_buffer(r, commit, NULL)) {
+			if (save_commit_buffer &&
+			    !get_cached_commit_buffer(r, commit, NULL)) {
 				set_commit_buffer(r, commit, buffer, size);
 				*eaten_p = 1;
 			}
-- 
2.38.0.rc1.583.ga560cd8328

      parent reply	other threads:[~2022-09-22 10:18 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-20 19:27 [INVESTIGATION] why is fsck --connectivity-only so much more expensive than rev-list --objects --all? John Cai
2022-09-20 20:41 ` Jeff King
2022-09-22 10:09   ` [PATCH 0/3] reducing fsck memory usage Jeff King
2022-09-22 10:11     ` [PATCH 1/3] fsck: free tree buffers after walking unreachable objects Jeff King
2022-09-22 18:40       ` Junio C Hamano
2022-09-22 18:58         ` Jeff King
2022-09-22 19:27           ` Junio C Hamano
2022-09-22 22:16             ` Jeff King
2022-09-22 10:13     ` [PATCH 2/3] fsck: turn off save_commit_buffer Jeff King
2022-09-22 10:15     ` Jeff King [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yyw1ytQv35+m76VC@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johncai86@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).