git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Taylor Blau" <me@ttaylorr.com>
Subject: [PATCH 2/2] commit-graph: turn off save_commit_buffer
Date: Sat, 7 Sep 2019 01:04:40 -0400
Message-ID: <20190907050439.GB23904@sigill.intra.peff.net>
In-Reply-To: <20190907045848.GA24515@sigill.intra.peff.net>

The commit-graph tool may read a lot of commits, but it only cares about
parsing their metadata (parents, trees, etc.) and doesn't ever show the
messages to the user. And so it should not need save_commit_buffer,
which is meant for holding onto the object data of parsed commits so
that we can show them later. In fact, it's quite harmful to do so.
According to massif, the max heap of "git commit-graph write
--reachable" in linux.git before/after this patch (removing the commit
graph file in between) goes from ~1.1GB to ~270MB.

That isn't surprising: the difference is about the sum of the
uncompressed sizes of all commits in the repository, so holding onto
those buffers was equivalent to leaking them.

This obviously helps if you're under memory pressure, but even without
it, things go faster. My before/after times for that command (without
massif) went from 12.521s to 11.874s, a speedup of ~5%.

Signed-off-by: Jeff King <peff@peff.net>
---
We didn't actually notice this on linux.git, but rather on a repository
with 130 million commits (don't ask). With this patch, I was able to
generate the commit-graph file with a peak heap of ~25GB, which is ~200
bytes per commit.

I'll bet we could do better with some effort, but obviously this case
was just pathological. For most cases this should be cheaper than a
normal repack (which probably spends that much memory on each object,
not just commits).

 builtin/commit-graph.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 57863619b7..052696f1af 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     builtin_commit_graph_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	save_commit_buffer = 0;
+
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);
-- 
2.23.0.474.gb1abd76f7a
