From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Taylor Blau" <me@ttaylorr.com>
Subject: Re: [PATCH 2/2] commit-graph: turn off save_commit_buffer
Date: Sat, 7 Sep 2019 14:56:36 -0400
Message-ID: <20190907185636.GB32028@syl.local>
In-Reply-To: <20190907050439.GB23904@sigill.intra.peff.net>

On Sat, Sep 07, 2019 at 01:04:40AM -0400, Jeff King wrote:
> The commit-graph tool may read a lot of commits, but it only cares about
> parsing their metadata (parents, trees, etc) and doesn't ever show the
> messages to the user. And so it should not need save_commit_buffer,
> which is meant for holding onto the object data of parsed commits so
> that we can show them later. In fact, it's quite harmful to do so.
> According to massif, the max heap of "git commit-graph write
> --reachable" in linux.git before/after this patch (removing the commit
> graph file in between) goes from ~1.1GB to ~270MB.
>
> Which isn't surprising, since the difference is about the sum of the
> uncompressed sizes of all commits in the repository, and this was
> equivalent to leaking them.
>
> This obviously helps if you're under memory pressure, but even without
> it, things go faster. My before/after times for that command (without
> massif) went from 12.521s to 11.874s, a speedup of ~5%.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> We didn't actually notice this on linux.git, but rather on a repository
> with 130 million commits (don't ask). With this patch, I was able to
> generate the commit-graph file with a peak heap of ~25GB, which is ~200
> bytes per commit.
>
> I'll bet we could do better with some effort, but obviously this case
> was just pathological. For most cases this should be cheaper than a
> normal repack (which probably spends that much memory on each object,
> not just commits).
>
>  builtin/commit-graph.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 57863619b7..052696f1af 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>  			     builtin_commit_graph_usage,
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
>
> +	save_commit_buffer = 0;
> +

This looks exactly right to me. We had discussed off-list where you
might place this line, and I think that the spot you picked is perfect:
as late as possible.

Thankfully, the option parsing code here doesn't load any commits
(though even if it did, I don't think that turning on/off
'save_commit_buffer' would really make much of a difference).

So, the patch here looks obviously correct, and I don't think it needs a
test or anything like that... besides: what is there to test? :).
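
For anyone following along, here is a rough toy model of why the flag
matters. It is only a sketch under my own assumptions, not git's actual
code: 'toy_save_buffer', 'struct toy_commit', 'toy_parse_commit()' and
'toy_retained_bytes()' are hypothetical names made up for illustration.
The point is that a parser which only needs the headers can drop the raw
object data right away, whereas keeping it costs roughly the sum of the
uncompressed commit sizes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a global "keep the raw data" flag; not git's real code. */
static int toy_save_buffer = 1;

struct toy_commit {
	size_t parent_count; /* metadata extracted while parsing */
	char *buffer;        /* copy of the raw object data, or NULL */
};

/*
 * Count the "parent " header lines of a raw commit, then keep a copy of
 * the raw data only if toy_save_buffer is set. Returns the number of
 * heap bytes retained for this commit.
 */
static size_t toy_parse_commit(struct toy_commit *c, const char *raw)
{
	size_t len = strlen(raw);
	const char *line = raw;

	c->parent_count = 0;
	c->buffer = NULL;

	/* The header section ends at the first empty line. */
	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');

		if (!strncmp(line, "parent ", 7))
			c->parent_count++;
		if (!eol)
			break;
		line = eol + 1;
	}

	if (!toy_save_buffer)
		return 0;

	c->buffer = malloc(len + 1);
	assert(c->buffer);
	memcpy(c->buffer, raw, len + 1);
	return len + 1;
}

/*
 * Parse every commit and report how many bytes would stay on the heap
 * if each retained buffer were held until exit.
 */
static size_t toy_retained_bytes(const char **raws, size_t nr, int save)
{
	size_t total = 0, i;

	toy_save_buffer = save;
	for (i = 0; i < nr; i++) {
		struct toy_commit c;

		total += toy_parse_commit(&c, raws[i]);
		free(c.buffer); /* freed here only to keep the sketch leak-free */
	}
	return total;
}
```

With the flag off, 'toy_retained_bytes()' reports zero while the parent
counts are still available, which is all a metadata-only walk needs.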

>  	if (argc > 0) {
>  		if (!strcmp(argv[0], "read"))
>  			return graph_read(argc, argv);
> --
> 2.23.0.474.gb1abd76f7a

Thanks,
Taylor

