git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] pretty: lazy-load commit data when expanding user-format
@ 2021-01-28 19:57 Jeff King
  2021-01-28 22:36 ` Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jeff King @ 2021-01-28 19:57 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty

When we expand a user-format, we try to avoid work that isn't necessary
for the output. For instance, we don't bother parsing the commit header
until we know we need the author, subject, etc.

But we do always load the commit object's contents from disk, even if
the format doesn't require it (e.g., just "%H"). Traditionally this
didn't matter much, because we'd have loaded it as part of the traversal
anyway, and we'd typically have those bytes attached to the commit
struct (or these days, cached in a commit-slab).

But when we have a commit-graph, we might easily get to the point of
pretty-printing a commit without ever having looked at the actual object
contents. We should push off that load (and reencoding) until we're
certain that it's needed.

I think the results of p4205 show the advantage pretty clearly (we serve
parent and tree oids out of the commit struct itself, so they benefit as
well):

  # using git.git as the test repo
  Test                          HEAD^             HEAD
  ----------------------------------------------------------------------
  4205.1: log with %H           0.40(0.39+0.01)   0.03(0.02+0.01) -92.5%
  4205.2: log with %h           0.45(0.44+0.01)   0.09(0.09+0.00) -80.0%
  4205.3: log with %T           0.40(0.39+0.00)   0.04(0.04+0.00) -90.0%
  4205.4: log with %t           0.46(0.46+0.00)   0.09(0.08+0.01) -80.4%
  4205.5: log with %P           0.39(0.39+0.00)   0.03(0.03+0.00) -92.3%
  4205.6: log with %p           0.46(0.46+0.00)   0.10(0.09+0.00) -78.3%
  4205.7: log with %h-%h-%h     0.52(0.51+0.01)   0.15(0.14+0.00) -71.2%
  4205.8: log with %an-%ae-%s   0.42(0.41+0.00)   0.42(0.41+0.01) +0.0%

  # using linux.git as the test repo
  Test                          HEAD^             HEAD
  ----------------------------------------------------------------------
  4205.1: log with %H           7.12(6.97+0.14)   0.76(0.65+0.11) -89.3%
  4205.2: log with %h           7.35(7.19+0.16)   1.30(1.19+0.11) -82.3%
  4205.3: log with %T           7.58(7.42+0.15)   1.02(0.94+0.08) -86.5%
  4205.4: log with %t           8.05(7.89+0.15)   1.55(1.41+0.13) -80.7%
  4205.5: log with %P           7.12(7.01+0.10)   0.76(0.69+0.07) -89.3%
  4205.6: log with %p           7.38(7.27+0.10)   1.32(1.20+0.12) -82.1%
  4205.7: log with %h-%h-%h     7.81(7.67+0.13)   1.79(1.67+0.12) -77.1%
  4205.8: log with %an-%ae-%s   7.90(7.74+0.15)   7.81(7.66+0.15) -1.1%

I added the final test to show where we don't improve (the 1% there is
just lucky noise), but also as a regression test to make sure we're not
doing anything stupid like loading the commit multiple times when there
are several placeholders that need it.

Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
---
This benefits "rev-list --format=", as well, though there you can also
use things like "--parents" instead, which are already fast.

 pretty.c                           | 23 ++++++++++++-----------
 t/perf/p4205-log-pretty-formats.sh |  2 +-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/pretty.c b/pretty.c
index 3922f6f9f2..b4ff3f602f 100644
--- a/pretty.c
+++ b/pretty.c
@@ -783,6 +783,7 @@ enum trunc_type {
 };
 
 struct format_commit_context {
+	struct repository *repository;
 	const struct commit *commit;
 	const struct pretty_print_context *pretty_ctx;
 	unsigned commit_header_parsed:1;
@@ -1373,10 +1374,13 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		return 2;
 	}
 
-
 	/* For the rest we have to parse the commit header. */
-	if (!c->commit_header_parsed)
+	if (!c->commit_header_parsed) {
+		msg = c->message =
+			repo_logmsg_reencode(c->repository, commit,
+					     &c->commit_encoding, "UTF-8");
 		parse_commit_header(c);
+	}
 
 	switch (placeholder[0]) {
 	case 'a':	/* author ... */
@@ -1667,25 +1671,22 @@ void repo_format_commit_message(struct repository *r,
 				const struct pretty_print_context *pretty_ctx)
 {
 	struct format_commit_context context = {
+		.repository = r,
 		.commit = commit,
 		.pretty_ctx = pretty_ctx,
 		.wrap_start = sb->len
 	};
 	const char *output_enc = pretty_ctx->output_encoding;
 	const char *utf8 = "UTF-8";
 
-	/*
-	 * convert a commit message to UTF-8 first
-	 * as far as 'format_commit_item' assumes it in UTF-8
-	 */
-	context.message = repo_logmsg_reencode(r, commit,
-					       &context.commit_encoding,
-					       utf8);
-
 	strbuf_expand(sb, format, format_commit_item, &context);
 	rewrap_message_tail(sb, &context, 0, 0, 0);
 
-	/* then convert a commit message to an actual output encoding */
+	/*
+	 * Convert output to an actual output encoding; note that
+	 * format_commit_item() will always use UTF-8, so we don't
+	 * have to bother if that's what the output wants.
+	 */
 	if (output_enc) {
 		if (same_encoding(utf8, output_enc))
 			output_enc = NULL;
diff --git a/t/perf/p4205-log-pretty-formats.sh b/t/perf/p4205-log-pretty-formats.sh
index 7c26f4f337..609fecd65d 100755
--- a/t/perf/p4205-log-pretty-formats.sh
+++ b/t/perf/p4205-log-pretty-formats.sh
@@ -6,7 +6,7 @@ test_description='Tests the performance of various pretty format placeholders'
 
 test_perf_default_repo
 
-for format in %H %h %T %t %P %p %h-%h-%h
+for format in %H %h %T %t %P %p %h-%h-%h %an-%ae-%s
 do
 	test_perf "log with $format" "
 		git log --format=\"$format\" >/dev/null
-- 
2.30.0.757.g41d9b3dd9b

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] pretty: lazy-load commit data when expanding user-format
  2021-01-28 19:57 [PATCH] pretty: lazy-load commit data when expanding user-format Jeff King
@ 2021-01-28 22:36 ` Ævar Arnfjörð Bjarmason
  2021-01-29  1:11   ` Jeff King
  2021-01-28 22:58 ` Taylor Blau
  2021-01-28 23:25 ` Junio C Hamano
  2 siblings, 1 reply; 5+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-28 22:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty


On Thu, Jan 28 2021, Jeff King wrote:

>   # using git.git as the test repo
>   Test                          HEAD^             HEAD
>   ----------------------------------------------------------------------
>   4205.1: log with %H           0.40(0.39+0.01)   0.03(0.02+0.01) -92.5%
>   4205.2: log with %h           0.45(0.44+0.01)   0.09(0.09+0.00) -80.0%
>   4205.3: log with %T           0.40(0.39+0.00)   0.04(0.04+0.00) -90.0%
>   4205.4: log with %t           0.46(0.46+0.00)   0.09(0.08+0.01) -80.4%
>   4205.5: log with %P           0.39(0.39+0.00)   0.03(0.03+0.00) -92.3%
>   4205.6: log with %p           0.46(0.46+0.00)   0.10(0.09+0.00) -78.3%
>   4205.7: log with %h-%h-%h     0.52(0.51+0.01)   0.15(0.14+0.00) -71.2%
>   4205.8: log with %an-%ae-%s   0.42(0.41+0.00)   0.42(0.41+0.01) +0.0%

Looks nice!

> diff --git a/t/perf/p4205-log-pretty-formats.sh b/t/perf/p4205-log-pretty-formats.sh
> index 7c26f4f337..609fecd65d 100755
> --- a/t/perf/p4205-log-pretty-formats.sh
> +++ b/t/perf/p4205-log-pretty-formats.sh
> @@ -6,7 +6,7 @@ test_description='Tests the performance of various pretty format placeholders'
>  
>  test_perf_default_repo
>  
> -for format in %H %h %T %t %P %p %h-%h-%h
> +for format in %H %h %T %t %P %p %h-%h-%h %an-%ae-%s
>  do
>  	test_perf "log with $format" "
>  		git log --format=\"$format\" >/dev/null


While we're at it it would be nice to have a few more formats that have
to do with the body in some way in those tests, and stess things like
mailmap/trailers etc.

    %s
    %b
    %B
    %N
    %aN-%aE
    %cn-%ce
    %cN-%cE
    %d
    %D
    %(trailers)

Just paging over the git-log manpage, that seems to stress most of the
codepaths, i.e. subject/body, but also things like notes, .mailmap, ref
names, and body parsing (trailers).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] pretty: lazy-load commit data when expanding user-format
  2021-01-28 19:57 [PATCH] pretty: lazy-load commit data when expanding user-format Jeff King
  2021-01-28 22:36 ` Ævar Arnfjörð Bjarmason
@ 2021-01-28 22:58 ` Taylor Blau
  2021-01-28 23:25 ` Junio C Hamano
  2 siblings, 0 replies; 5+ messages in thread
From: Taylor Blau @ 2021-01-28 22:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty

On Thu, Jan 28, 2021 at 02:57:39PM -0500, Jeff King wrote:
> Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
> Signed-off-by: Jeff King <peff@peff.net>

This is delightful.

As I was reading the discussion you and Michael had off-list, I was
worried that this would be more complicated than it ended up being. But,
it's terrific to see that this change appeared to be relatively easy,
and the performance speaks for itself.

Thanks for such a pleasant read!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] pretty: lazy-load commit data when expanding user-format
  2021-01-28 19:57 [PATCH] pretty: lazy-load commit data when expanding user-format Jeff King
  2021-01-28 22:36 ` Ævar Arnfjörð Bjarmason
  2021-01-28 22:58 ` Taylor Blau
@ 2021-01-28 23:25 ` Junio C Hamano
  2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-01-28 23:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty

Jeff King <peff@peff.net> writes:

> I added the final test to show where we don't improve (the 1% there is
> just lucky noise), but also as a regression test to make sure we're not
> doing anything stupid like loading the commit multiple times when there
> are several placeholders that need it.
>
> Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This benefits "rev-list --format=", as well, though there you can also
> use things like "--parents" instead, which are already fast.

Quite pleased to see more work lazily done (and personally I am
happy to see Michael's name on the list---say Hi, I missed him).


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] pretty: lazy-load commit data when expanding user-format
  2021-01-28 22:36 ` Ævar Arnfjörð Bjarmason
@ 2021-01-29  1:11   ` Jeff King
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2021-01-29  1:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Michael Haggerty

On Thu, Jan 28, 2021 at 11:36:23PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > -for format in %H %h %T %t %P %p %h-%h-%h
> > +for format in %H %h %T %t %P %p %h-%h-%h %an-%ae-%s
> >  do
> >  	test_perf "log with $format" "
> >  		git log --format=\"$format\" >/dev/null
> 
> 
> While we're at it it would be nice to have a few more formats that have
> to do with the body in some way in those tests, and stess things like
> mailmap/trailers etc.
> 
>     %s
>     %b
>     %B
>     %N
>     %aN-%aE
>     %cn-%ce
>     %cN-%cE
>     %d
>     %D
>     %(trailers)
> 
> Just paging over the git-log manpage, that seems to stress most of the
> codepaths, i.e. subject/body, but also things like notes, .mailmap, ref
> names, and body parsing (trailers).

I'd prefer not to do so in this patch, since most of those aren't
providing any new data.

I don't mind _too_ much if you'd like to do so on top for general
regression-testing, but I'm generally a bit hesitant to throw a lot of
stuff into the perf suite without a sense of what it's measuring, just
because it already takes forever to run. So for example, is the
difference between %aN-%aE and %cN-%cE worth spending 21 seconds of CPU
(3 times the 7 seconds it takes to run on the kernel)?

One test to check mailmap performance, or one for trailers, seems like
it might be more directed. I think the existing test is likewise a bit
wasteful in checking %h _and_ %t _and_ %p), though at least it is now
much faster after my patch. ;)

(In the regular test suite, we of course should be covering all these
for correctness already, and I think we do).

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-29  1:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-28 19:57 [PATCH] pretty: lazy-load commit data when expanding user-format Jeff King
2021-01-28 22:36 ` Ævar Arnfjörð Bjarmason
2021-01-29  1:11   ` Jeff King
2021-01-28 22:58 ` Taylor Blau
2021-01-28 23:25 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).