git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/2] a few commit-graph improvements
@ 2019-09-07  4:58 Jeff King
  2019-09-07  5:01 ` [PATCH 1/2] commit-graph: don't show progress percentages while expanding reachable commits Jeff King
  2019-09-07  5:04 ` [PATCH 2/2] commit-graph: turn off save_commit_buffer Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2019-09-07  4:58 UTC
  To: git
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Taylor Blau

We've been playing with commit graphs at GitHub and found a few bits of
low-hanging fruit (one-liners -- it doesn't get any lower than that).

The first one is actually a resurrection of a patch from March:

  https://public-inbox.org/git/20190322102817.19708-1-szeder.dev@gmail.com/

where the progress bar sometimes prints nonsense. There's some
discussion in that thread about how we could sometimes show a real
percentage instead of a counting-up progress meter. But given the number
of corner cases discussed, and the fact that nothing has happened for 6
months, I think we should first make sure we're always doing the
_correct_ thing, and then people can build a nicer meter on top if they
want to.

The second is a fix for a small memory "leak", but it makes a big
difference.

  [1/2]: commit-graph: don't show progress percentages while expanding reachable commits
  [2/2]: commit-graph: turn off save_commit_buffer

 builtin/commit-graph.c | 2 ++
 commit-graph.c         | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

-Peff


* [PATCH 1/2] commit-graph: don't show progress percentages while expanding reachable commits
  2019-09-07  4:58 [PATCH 0/2] a few commit-graph improvements Jeff King
@ 2019-09-07  5:01 ` Jeff King
  2019-09-07 10:34   ` SZEDER Gábor
  2019-09-07  5:04 ` [PATCH 2/2] commit-graph: turn off save_commit_buffer Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2019-09-07  5:01 UTC
  To: git
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Taylor Blau

From: SZEDER Gábor <szeder.dev@gmail.com>

Commit 49bbc57a57 (commit-graph write: emit a percentage for all
progress, 2019-01-19) was a bit overeager when it added progress
percentages to the "Expanding reachable commits in commit graph" phase
as well, because most of the time the number of commits that phase has
to iterate over is not known in advance and grows significantly, and,
consequently, we end up with nonsensical numbers:

  $ git commit-graph write --reachable
  Expanding reachable commits in commit graph: 138606% (824706/595), done.
  [...]

  $ git rev-parse v5.0 | git commit-graph write --stdin-commits
  Expanding reachable commits in commit graph: 81264400% (812644/1), done.
  [...]

Even worse, because the percentage grows so quickly, the progress code
outputs much more often than it should (because it ticks every second,
or every 1%), slowing the whole process down. My time for "git
commit-graph write --reachable" on linux.git went from 13.463s to
12.521s with this patch, ~7% savings.

Therefore, don't show progress percentages in the "Expanding reachable
commits in commit graph" phase.

Note that the current code does sometimes do the right thing, if we
picked up all commits initially (e.g., omitting "--reachable" in a
fully-packed repository would get the correct count without any parent
traversal). So it may be possible to come up with a way to tell when we
could use a percentage here. But in the meantime, let's make sure we
robustly avoid printing nonsense.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
Compared to the original from:

  https://public-inbox.org/git/20190322102817.19708-1-szeder.dev@gmail.com/

I rebased it to handle code movement, added in the timing data, and
tried to summarize the discussion from the thread.

 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index f2888c203b..d6a5c8cf1c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1050,7 +1050,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 	if (ctx->report_progress)
 		ctx->progress = start_delayed_progress(
 					_("Expanding reachable commits in commit graph"),
-					ctx->oids.nr);
+					0);
 	for (i = 0; i < ctx->oids.nr; i++) {
 		display_progress(ctx->progress, i + 1);
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
-- 
2.23.0.474.gb1abd76f7a


* [PATCH 2/2] commit-graph: turn off save_commit_buffer
  2019-09-07  4:58 [PATCH 0/2] a few commit-graph improvements Jeff King
  2019-09-07  5:01 ` [PATCH 1/2] commit-graph: don't show progress percentages while expanding reachable commits Jeff King
@ 2019-09-07  5:04 ` Jeff King
  2019-09-07 18:56   ` Taylor Blau
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2019-09-07  5:04 UTC
  To: git
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Taylor Blau

The commit-graph tool may read a lot of commits, but it only cares about
parsing their metadata (parents, trees, etc.) and doesn't ever show the
messages to the user. And so it should not need save_commit_buffer,
which is meant for holding onto the object data of parsed commits so
that we can show them later. In fact, it's quite harmful to do so.
According to massif, the max heap of "git commit-graph write
--reachable" in linux.git before/after this patch (removing the commit
graph file in between) goes from ~1.1GB to ~270MB.

Which isn't surprising, since the difference is about the sum of the
uncompressed sizes of all commits in the repository, and this was
equivalent to leaking them.

This obviously helps if you're under memory pressure, but even without
it, things go faster. My before/after times for that command (without
massif) went from 12.521s to 11.874s, a speedup of ~5%.

Signed-off-by: Jeff King <peff@peff.net>
---
We didn't actually notice this on linux.git, but rather on a repository
with 130 million commits (don't ask). With this patch, I was able to
generate the commit-graph file with a peak heap of ~25GB, which is ~200
bytes per commit.

I'll bet we could do better with some effort, but obviously this case
was just pathological. For most cases this should be cheaper than a
normal repack (which probably spends that much memory on each object,
not just commits).

 builtin/commit-graph.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 57863619b7..052696f1af 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     builtin_commit_graph_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	save_commit_buffer = 0;
+
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);
-- 
2.23.0.474.gb1abd76f7a


* Re: [PATCH 1/2] commit-graph: don't show progress percentages while expanding reachable commits
  2019-09-07  5:01 ` [PATCH 1/2] commit-graph: don't show progress percentages while expanding reachable commits Jeff King
@ 2019-09-07 10:34   ` SZEDER Gábor
  2019-09-07 18:54     ` Taylor Blau
  0 siblings, 1 reply; 7+ messages in thread
From: SZEDER Gábor @ 2019-09-07 10:34 UTC
  To: Jeff King
  Cc: git, Ævar Arnfjörð Bjarmason, Derrick Stolee, Taylor Blau

On Sat, Sep 07, 2019 at 01:01:33AM -0400, Jeff King wrote:
> From: SZEDER Gábor <szeder.dev@gmail.com>
> 
> Commit 49bbc57a57 (commit-graph write: emit a percentage for all
> progress, 2019-01-19) was a bit overeager when it added progress
> percentages to the "Expanding reachable commits in commit graph" phase
> as well, because most of the time the number of commits that phase has
> to iterate over is not known in advance and grows significantly, and,
> consequently, we end up with nonsensical numbers:
> 
>   $ git commit-graph write --reachable
>   Expanding reachable commits in commit graph: 138606% (824706/595), done.
>   [...]
> 
>   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
>   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
>   [...]
> 
> Even worse, because the percentage grows so quickly, the progress code
> outputs much more often than it should (because it ticks every second,
> or every 1%), slowing the whole process down. My time for "git
> commit-graph write --reachable" on linux.git went from 13.463s to
> 12.521s with this patch, ~7% savings.

Oh, interesting.

> Therefore, don't show progress percentages in the "Expanding reachable
> commits in commit graph" phase.
> 
> Note that the current code does sometimes do the right thing, if we
> picked up all commits initially (e.g., omitting "--reachable" in a
> fully-packed repository would get the correct count without any parent
> traversal). So it may be possible to come up with a way to tell when we
> could use a percentage here. But in the meantime, let's make sure we
> robustly avoid printing nonsense.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Compared to the original from:
> 
>   https://public-inbox.org/git/20190322102817.19708-1-szeder.dev@gmail.com/
> 
> I rebased it to handle code movement, added in the timing data, and
> tried to summarize the discussion from the thread.

Thanks for resurrecting this patch and for the summary paragraph.



* Re: [PATCH 1/2] commit-graph: don't show progress percentages while expanding reachable commits
  2019-09-07 10:34   ` SZEDER Gábor
@ 2019-09-07 18:54     ` Taylor Blau
  0 siblings, 0 replies; 7+ messages in thread
From: Taylor Blau @ 2019-09-07 18:54 UTC
  To: SZEDER Gábor
  Cc: Jeff King, git, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Taylor Blau

On Sat, Sep 07, 2019 at 12:34:07PM +0200, SZEDER Gábor wrote:
> On Sat, Sep 07, 2019 at 01:01:33AM -0400, Jeff King wrote:
> > From: SZEDER Gábor <szeder.dev@gmail.com>
> >
> > Commit 49bbc57a57 (commit-graph write: emit a percentage for all
> > progress, 2019-01-19) was a bit overeager when it added progress
> > percentages to the "Expanding reachable commits in commit graph" phase
> > as well, because most of the time the number of commits that phase has
> > to iterate over is not known in advance and grows significantly, and,
> > consequently, we end up with nonsensical numbers:
> >
> >   $ git commit-graph write --reachable
> >   Expanding reachable commits in commit graph: 138606% (824706/595), done.
> >   [...]
> >
> >   $ git rev-parse v5.0 | git commit-graph write --stdin-commits
> >   Expanding reachable commits in commit graph: 81264400% (812644/1), done.
> >   [...]
> >
> > Even worse, because the percentage grows so quickly, the progress code
> > outputs much more often than it should (because it ticks every second,
> > or every 1%), slowing the whole process down. My time for "git
> > commit-graph write --reachable" on linux.git went from 13.463s to
> > 12.521s with this patch, ~7% savings.
>
> Oh, interesting.
>
> > Therefore, don't show progress percentages in the "Expanding reachable
> > commits in commit graph" phase.
> >
> > Note that the current code does sometimes do the right thing, if we
> > picked up all commits initially (e.g., omitting "--reachable" in a
> > fully-packed repository would get the correct count without any parent
> > traversal). So it may be possible to come up with a way to tell when we
> > could use a percentage here. But in the meantime, let's make sure we
> > robustly avoid printing nonsense.
> >
> > Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > Compared to the original from:
> >
> >   https://public-inbox.org/git/20190322102817.19708-1-szeder.dev@gmail.com/
> >
> > I rebased it to handle code movement, added in the timing data, and
> > tried to summarize the discussion from the thread.
>
> Thanks for resurrecting this patch and for the summary paragraph.

Thanks from me, as well. I noticed that we had achieved three billion
percent progress on the repository that brought this to our attention,
but didn't notice that you had already written these patches.

So, I am glad that they are getting the attribution that they deserve.
Thanks again both.

Thanks,
Taylor


* Re: [PATCH 2/2] commit-graph: turn off save_commit_buffer
  2019-09-07  5:04 ` [PATCH 2/2] commit-graph: turn off save_commit_buffer Jeff King
@ 2019-09-07 18:56   ` Taylor Blau
  2019-09-08 10:31     ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Taylor Blau @ 2019-09-07 18:56 UTC
  To: Jeff King
  Cc: git, SZEDER Gábor, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Taylor Blau

On Sat, Sep 07, 2019 at 01:04:40AM -0400, Jeff King wrote:
> The commit-graph tool may read a lot of commits, but it only cares about
> parsing their metadata (parents, trees, etc) and doesn't ever show the
> messages to the user. And so it should not need save_commit_buffer,
> which is meant for holding onto the object data of parsed commits so
> that we can show them later. In fact, it's quite harmful to do so.
> According to massif, the max heap of "git commit-graph write
> --reachable" in linux.git before/after this patch (removing the commit
> graph file in between) goes from ~1.1GB to ~270MB.
>
> Which isn't surprising, since the difference is about the sum of the
> uncompressed sizes of all commits in the repository, and this was
> equivalent to leaking them.
>
> This obviously helps if you're under memory pressure, but even without
> it, things go faster. My before/after times for that command (without
> massif) went from 12.521s to 11.874s, a speedup of ~5%.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> We didn't actually notice this on linux.git, but rather on a repository
> with 130 million commits (don't ask). With this patch, I was able to
> generate the commit-graph file with a peak heap of ~25GB, which is ~200
> bytes per commit.
>
> I'll bet we could do better with some effort, but obviously this case
> was just pathological. For most cases this should be cheaper than a
> normal repack (which probably spends that much memory on each object,
> not just commits).
>
>  builtin/commit-graph.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 57863619b7..052696f1af 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>  			     builtin_commit_graph_usage,
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
>
> +	save_commit_buffer = 0;
> +

This looks exactly right to me. We had discussed a little bit off-list
about where you might place this line, but I think that the spot you
picked is perfect: as late as possible.

Thankfully, the option parsing code here doesn't load any commits
(though even if it did, I don't think that turning on/off
'save_commit_buffer' would really make much of a difference).

So, the patch here looks obviously correct, and I don't think it needs a
test or anything like that... besides: what is there to test? :).

>  	if (argc > 0) {
>  		if (!strcmp(argv[0], "read"))
>  			return graph_read(argc, argv);
> --
> 2.23.0.474.gb1abd76f7a

Thanks,
Taylor


* Re: [PATCH 2/2] commit-graph: turn off save_commit_buffer
  2019-09-07 18:56   ` Taylor Blau
@ 2019-09-08 10:31     ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2019-09-08 10:31 UTC
  To: Taylor Blau
  Cc: git, SZEDER Gábor, Ævar Arnfjörð Bjarmason,
	Derrick Stolee

On Sat, Sep 07, 2019 at 02:56:36PM -0400, Taylor Blau wrote:

> So, the patch here looks obviously correct, and I don't think it needs a
> test or anything like that... besides: what is there to test? :).

There's no functional change, so as long as this has coverage in the
regular suite (and I think it does), there's no point in adding a new
functional test.

However, it may make sense to get some coverage of commit graphs in the
perf suite, so that we could detect regressions there. I think that
would make sense as a separate series, though, not attached to this
particular fix. (It would also be cool if the perf suite could record
peak memory usage, but that would take a fair bit of refactoring).

-Peff

