git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>,
	Hariom Verma <hariom18599@gmail.com>,
	ZheNing Hu <adlternative@gmail.com>
Subject: Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
Date: Tue, 6 Apr 2021 20:34:21 +0200	[thread overview]
Message-ID: <c70a7c17-650a-ae4d-9a90-66c3511f8371@web.de> (raw)
In-Reply-To: <pull.927.git.1617631280402.gitgitgadget@gmail.com>

Am 05.04.21 um 16:01 schrieb ZheNing Hu via GitGitGadget:
> From: ZheNing Hu <adlternative@gmail.com>
>
> When we use `git for-each-ref`, every ref will call
> `show_ref_array_item()` and allocate its own final strbuf
> and error strbuf. Instead, we can provide two single strbuf:
> final_buf and error_buf that get reused for each output.
>
> When run it 100 times:
>
> $ git for-each-ref
>
> on git.git :
>
> 3.19s user
> 3.88s system
> 35% cpu
> 20.199 total
>
> to:
>
> 2.89s user
> 4.00s system
> 34% cpu
> 19.741 total
>
> The performance has been slightly improved.

I like to use hyperfine (https://github.com/sharkdp/hyperfine) to get
more stable benchmark numbers, incl. standard deviation.  With three
warmup runs I get the following results for running git for-each-ref on
Git's own repo with the current master (2e36527f23):

  Benchmark #1: ./git for-each-ref
    Time (mean ± σ):      18.8 ms ±   0.3 ms    [User: 12.7 ms, System: 5.6 ms]
    Range (min … max):    18.2 ms …  19.8 ms    148 runs

With your patch on top I get this:

  Benchmark #1: ./git for-each-ref
    Time (mean ± σ):      18.5 ms ±   0.4 ms    [User: 12.3 ms, System: 5.6 ms]
    Range (min … max):    17.8 ms …  19.6 ms    147 runs

So there seems to be a slight improvement here, but it is within the
noise.

I'm quite surprised how much longer this takes on your machine, however,
and (like Peff already mentioned) how much of the total time it spends
in system calls.  Is an antivirus program or similar interferring?  Or
some kind of emulator or similar, e.g. Valgrind?  Or has it been a long
time since you ran "git gc"?

The benchmark certainly depends on the number of local and remote
branches in the repo; my copy currently has 4304 according to
"git for-each-ref | wc -l".

>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>     [GSOC] ref-filter: use single strbuf for all output
>
>     This patch learned Jeff King's optimization measures in git
>     cat-file(79ed0a5): using a single strbuf for all objects output Instead
>     of allocating a large number of small strbuf for every object.
>
>     So ref-filter can learn same thing: use single buffer: final_buf and
>     error_buf for all refs output.
>
>     Thanks.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-927%2Fadlternative%2Fref-filter-single-buf-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-927/adlternative/ref-filter-single-buf-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/927
>
>  builtin/for-each-ref.c |  4 +++-
>  builtin/tag.c          |  4 +++-
>  ref-filter.c           | 21 ++++++++++++---------
>  ref-filter.h           |  5 ++++-
>  4 files changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index cb9c81a04606..9dc41f48bfa0 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	struct ref_array array;
>  	struct ref_filter filter;
>  	struct ref_format format = REF_FORMAT_INIT;
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
>
>  	struct option opts[] = {
>  		OPT_BIT('s', "shell", &format.quote_style,
> @@ -81,7 +83,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	if (!maxcount || array.nr < maxcount)
>  		maxcount = array.nr;
>  	for (i = 0; i < maxcount; i++)
> -		show_ref_array_item(array.items[i], &format);
> +		show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);

This user of show_ref_array_item() calls it in a loop on an array.

>  	ref_array_clear(&array);
>  	return 0;
>  }
> diff --git a/builtin/tag.c b/builtin/tag.c
> index d403417b5625..8a38b3e2de34 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>  		     struct ref_format *format)
>  {
>  	struct ref_array array;
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
>  	char *to_free = NULL;
>  	int i;
>
> @@ -64,7 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>  	ref_array_sort(sorting, &array);
>
>  	for (i = 0; i < array.nr; i++)
> -		show_ref_array_item(array.items[i], format);
> +		show_ref_array_item(array.items[i], format, &final_buf, &error_buf);

Dito.

>  	ref_array_clear(&array);
>  	free(to_free);
>
> diff --git a/ref-filter.c b/ref-filter.c
> index f0bd32f71416..51ff6af64ebc 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
>  }
>
>  void show_ref_array_item(struct ref_array_item *info,
> -			 const struct ref_format *format)
> +			 const struct ref_format *format,
> +			 struct strbuf *final_buf,
> +			 struct strbuf *error_buf)
>  {
> -	struct strbuf final_buf = STRBUF_INIT;
> -	struct strbuf error_buf = STRBUF_INIT;
>
> -	if (format_ref_array_item(info, format, &final_buf, &error_buf))
> -		die("%s", error_buf.buf);
> -	fwrite(final_buf.buf, 1, final_buf.len, stdout);
> -	strbuf_release(&error_buf);
> -	strbuf_release(&final_buf);
> +	if (format_ref_array_item(info, format, final_buf, error_buf))
> +		die("%s", error_buf->buf);
> +	fwrite(final_buf->buf, 1, final_buf->len, stdout);
> +	strbuf_reset(error_buf);
> +	strbuf_reset(final_buf);
>  	putchar('\n');
>  }
>
> @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
>  		      const struct ref_format *format)
>  {
>  	struct ref_array_item *ref_item;
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
> +
>  	ref_item = new_ref_array_item(name, oid);
>  	ref_item->kind = ref_kind_from_refname(name);
> -	show_ref_array_item(ref_item, format);
> +	show_ref_array_item(ref_item, format, &final_buf, &error_buf);

This third and final caller works with a single item; there is no loop.

>  	free_array_item(ref_item);
>  }
>
> diff --git a/ref-filter.h b/ref-filter.h
> index 19ea4c413409..95498c9f4467 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
>  			  struct strbuf *final_buf,
>  			  struct strbuf *error_buf);
>  /*  Print the ref using the given format and quote_style */
> -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> +void show_ref_array_item(struct ref_array_item *info,
> +			 const struct ref_format *format,
> +			 struct strbuf *final_buf,
> +			 struct strbuf *error_buf);

This bring-your-own-buffer approach pushes responsibilities back to
the callers, in exchange for improved performance.  The number of
users of this interface is low, so that's defensible.  But that added
effort is also non-trivial -- as you demonstrated by leaking the
allocated memory. ;-)

How about offering to do more instead?  In particular you could add
a count parameter and have show_ref_array_item() handle an array of
struct ref_array_item objects.  It could reuse the buffers internally
to get the same performance benefit, and would free callers from
having to iterate loops themselves.  Something like:

	void show_ref_array_items(struct ref_array_item **info,
				  size_t n,
				  const struct ref_format *format);

Callers that deal with a single element can pass n = 1.

Perhaps the "format" parameter should go first, like with printf.

The double reference in "**info" is a bit ugly, though (array of
pointers instead of a simple array of objects).  That's dictated
by struct ref_array_item containing a flexible array member, which
seems to be hard to change.

>  /*  Parse a single sort specifier and add it to the list */
>  void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
>  /*  Callback function for parsing the sort option */
>
> base-commit: 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48
>


  parent reply	other threads:[~2021-04-06 18:34 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
2021-04-05 17:05 ` Eric Sunshine
2021-04-06  8:53   ` ZheNing Hu
2021-04-05 21:02 ` Derrick Stolee
2021-04-06  8:58   ` ZheNing Hu
2021-04-05 22:17 ` Jeff King
2021-04-06  9:49   ` ZheNing Hu
2021-04-06 10:35     ` ZheNing Hu
2021-04-06 14:00       ` Jeff King
2021-04-06 14:35         ` ZheNing Hu
2021-04-06 18:34 ` René Scharfe [this message]
2021-04-07 13:57   ` ZheNing Hu
2021-04-07 15:26 ` [PATCH v2] " ZheNing Hu via GitGitGadget
2021-04-07 20:31   ` Junio C Hamano
2021-04-08 12:05     ` ZheNing Hu
2021-04-07 21:27   ` Jeff King
2021-04-08 12:18     ` ZheNing Hu
2021-04-08 14:32       ` Jeff King
2021-04-08 14:43         ` ZheNing Hu
2021-04-08 14:51           ` Jeff King
2021-04-08 15:12             ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c70a7c17-650a-ae4d-9a90-66c3511f8371@web.de \
    --to=l.s.r@web.de \
    --cc=adlternative@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).