git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH] [GSOC] ref-filter: use single strbuf for all output
@ 2021-04-05 14:01 ZheNing Hu via GitGitGadget
  2021-04-05 17:05 ` Eric Sunshine
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-04-05 14:01 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Junio C Hamano, Christian Couder, Hariom Verma,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

When we use `git for-each-ref`, every ref will call
`show_ref_array_item()` and allocate its own final strbuf
and error strbuf. Instead, we can provide two single strbuf:
final_buf and error_buf that get reused for each output.

When run it 100 times:

$ git for-each-ref

on git.git :

3.19s user
3.88s system
35% cpu
20.199 total

to:

2.89s user
4.00s system
34% cpu
19.741 total

The performance has been slightly improved.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    [GSOC] ref-filter: use single strbuf for all output
    
    This patch learned Jeff King's optimization measures in git
    cat-file(79ed0a5): using a single strbuf for all objects output Instead
    of allocating a large number of small strbuf for every object.
    
    So ref-filter can learn same thing: use single buffer: final_buf and
    error_buf for all refs output.
    
    Thanks.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-927%2Fadlternative%2Fref-filter-single-buf-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-927/adlternative/ref-filter-single-buf-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/927

 builtin/for-each-ref.c |  4 +++-
 builtin/tag.c          |  4 +++-
 ref-filter.c           | 21 ++++++++++++---------
 ref-filter.h           |  5 ++++-
 4 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index cb9c81a04606..9dc41f48bfa0 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct ref_array array;
 	struct ref_filter filter;
 	struct ref_format format = REF_FORMAT_INIT;
+	struct strbuf final_buf = STRBUF_INIT;
+	struct strbuf error_buf = STRBUF_INIT;
 
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
@@ -81,7 +83,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	if (!maxcount || array.nr < maxcount)
 		maxcount = array.nr;
 	for (i = 0; i < maxcount; i++)
-		show_ref_array_item(array.items[i], &format);
+		show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
 	ref_array_clear(&array);
 	return 0;
 }
diff --git a/builtin/tag.c b/builtin/tag.c
index d403417b5625..8a38b3e2de34 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		     struct ref_format *format)
 {
 	struct ref_array array;
+	struct strbuf final_buf = STRBUF_INIT;
+	struct strbuf error_buf = STRBUF_INIT;
 	char *to_free = NULL;
 	int i;
 
@@ -64,7 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++)
-		show_ref_array_item(array.items[i], format);
+		show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
 	ref_array_clear(&array);
 	free(to_free);
 
diff --git a/ref-filter.c b/ref-filter.c
index f0bd32f71416..51ff6af64ebc 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
 }
 
 void show_ref_array_item(struct ref_array_item *info,
-			 const struct ref_format *format)
+			 const struct ref_format *format,
+			 struct strbuf *final_buf,
+			 struct strbuf *error_buf)
 {
-	struct strbuf final_buf = STRBUF_INIT;
-	struct strbuf error_buf = STRBUF_INIT;
 
-	if (format_ref_array_item(info, format, &final_buf, &error_buf))
-		die("%s", error_buf.buf);
-	fwrite(final_buf.buf, 1, final_buf.len, stdout);
-	strbuf_release(&error_buf);
-	strbuf_release(&final_buf);
+	if (format_ref_array_item(info, format, final_buf, error_buf))
+		die("%s", error_buf->buf);
+	fwrite(final_buf->buf, 1, final_buf->len, stdout);
+	strbuf_reset(error_buf);
+	strbuf_reset(final_buf);
 	putchar('\n');
 }
 
@@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
 		      const struct ref_format *format)
 {
 	struct ref_array_item *ref_item;
+	struct strbuf final_buf = STRBUF_INIT;
+	struct strbuf error_buf = STRBUF_INIT;
+
 	ref_item = new_ref_array_item(name, oid);
 	ref_item->kind = ref_kind_from_refname(name);
-	show_ref_array_item(ref_item, format);
+	show_ref_array_item(ref_item, format, &final_buf, &error_buf);
 	free_array_item(ref_item);
 }
 
diff --git a/ref-filter.h b/ref-filter.h
index 19ea4c413409..95498c9f4467 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
 			  struct strbuf *final_buf,
 			  struct strbuf *error_buf);
 /*  Print the ref using the given format and quote_style */
-void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
+void show_ref_array_item(struct ref_array_item *info,
+			 const struct ref_format *format,
+			 struct strbuf *final_buf,
+			 struct strbuf *error_buf);
 /*  Parse a single sort specifier and add it to the list */
 void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
 /*  Callback function for parsing the sort option */

base-commit: 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
@ 2021-04-05 17:05 ` Eric Sunshine
  2021-04-06  8:53   ` ZheNing Hu
  2021-04-05 21:02 ` Derrick Stolee
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Eric Sunshine @ 2021-04-05 17:05 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: Git List, Jeff King, Junio C Hamano, Christian Couder,
	Hariom Verma, ZheNing Hu

On Mon, Apr 5, 2021 at 10:01 AM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> When we use `git for-each-ref`, every ref will call
> `show_ref_array_item()` and allocate its own final strbuf
> and error strbuf. Instead, we can provide two single strbuf:
> final_buf and error_buf that get reused for each output.
> [...]
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---

Was there a discussion leading up to this change? If so, it may be a
good idea to provide a link to it in the mailing list here under the
"---" line.

Some comments below...

> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>         struct ref_format format = REF_FORMAT_INIT;
> +       struct strbuf final_buf = STRBUF_INIT;
> +       struct strbuf error_buf = STRBUF_INIT;
>
>         for (i = 0; i < maxcount; i++)
> -               show_ref_array_item(array.items[i], &format);
> +               show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
>         ref_array_clear(&array);
>         return 0;
>  }

The call to strbuf_reset() within show_ref_array_item() does not free
the memory from the strbufs, so the memory is being leaked. Therefore,
at the end of this function, you should have:

    strbuf_release(final_buf);
    strbuf_release(error_buf);

> diff --git a/builtin/tag.c b/builtin/tag.c
> @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>         struct ref_array array;
> +       struct strbuf final_buf = STRBUF_INIT;
> +       struct strbuf error_buf = STRBUF_INIT;
>
>         for (i = 0; i < array.nr; i++)
> -               show_ref_array_item(array.items[i], format);
> +               show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
>         ref_array_clear(&array);
>         free(to_free);

Leaking `final_buf` and `error_buf`.

> diff --git a/ref-filter.c b/ref-filter.c
> @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
>  void show_ref_array_item(struct ref_array_item *info,
> -                        const struct ref_format *format)
> +                        const struct ref_format *format,
> +                        struct strbuf *final_buf,
> +                        struct strbuf *error_buf)
>  {
> -       struct strbuf final_buf = STRBUF_INIT;
> -       struct strbuf error_buf = STRBUF_INIT;
>
> -       if (format_ref_array_item(info, format, &final_buf, &error_buf))
> -               die("%s", error_buf.buf);
> -       fwrite(final_buf.buf, 1, final_buf.len, stdout);
> -       strbuf_release(&error_buf);
> -       strbuf_release(&final_buf);
> +       if (format_ref_array_item(info, format, final_buf, error_buf))
> +               die("%s", error_buf->buf);
> +       fwrite(final_buf->buf, 1, final_buf->len, stdout);
> +       strbuf_reset(error_buf);
> +       strbuf_reset(final_buf);
>         putchar('\n');
>  }

A couple comments:

It is especially ugly that `error_buf` needs to be passed in by the
caller since it is only ever used in case of an error, at which point
the program will die() anyhow, so it's not on a critical,
speed-sensitive path. The initialization with STRBUF_INIT is
practically cost-free, so this variable could easily stay local to
this function without cost-penalty rather than forcing the caller to
pass it in. (This assumes that none of the consumers of `error_buf`
down the line insert into the buffer unnecessarily, which is probably
a reasonable assumption.)

It is an unsafe assumption to only call strbuf_reset() at the end of
the function. For this to be robust, you can't assume that the caller
has given you an empty strbuf. Instead, you must ensure it by calling
strbuf_reset() at the start. (It doesn't hurt to also call
strbuf_reset() at the end, but that is not critical to correct
operation, so could be omitted.)

> @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
>         struct ref_array_item *ref_item;
> +       struct strbuf final_buf = STRBUF_INIT;
> +       struct strbuf error_buf = STRBUF_INIT;
> +
>         ref_item = new_ref_array_item(name, oid);
>         ref_item->kind = ref_kind_from_refname(name);
> -       show_ref_array_item(ref_item, format);
> +       show_ref_array_item(ref_item, format, &final_buf, &error_buf);
>         free_array_item(ref_item);
>  }

Leaking `final_buf` and `error_buf`.

> diff --git a/ref-filter.h b/ref-filter.h
> @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
>  /*  Print the ref using the given format and quote_style */
> -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> +void show_ref_array_item(struct ref_array_item *info,
> +                        const struct ref_format *format,
> +                        struct strbuf *final_buf,
> +                        struct strbuf *error_buf);

It is not clear to the person reading this what these two new
arguments are for or what should be provided, so the comment above the
function deserves an update explaining what these arguments are for
and how to provide them. This is especially important since this is a
public function.

If this function was merely internal to some builtin command, this
sort of change for the sake of optimization might not be so bad, but
as a public function, it comes across as rather ugly. In general, we
don't necessarily want to pollute an otherwise clean API with changes
which make the API uglier merely for the sake of tiny optimizations
like this (IMHO). The extra burden placed on callers by this change,
coupled with the ugliness this introduces into the API, makes the
change seem less than desirable.

One way you might be able to mitigate the undesirableness would be to
have two versions of this function. For instance:

    /* Print the ref using the given format and quote_style */
    show_ref_array_item(...);
   /* This is like show_ref_array_item() but ... */
    show_ref_array_item_optim(...);

The comment of the second function would, of course, need to explain
that it is similar to show_ref_array_item() but more optimal because
<some reasons> and that the caller is responsible for allocating and
releasing the strbufs, and that the strbufs are used only for
temporary storage, thus the caller should not assume anything about
them.

This way, callers which don't invoke show_ref_array_item() in a tight
loop don't need to be burdened by the new arguments (and won't have to
remember to release them), and callers with a tight loop can take
advantage of the optimization with the bit of extra work of having to
declare and release the strbufs.

So, having said all that, it's not clear that the ugliness and extra
work are worth the gain. However, if it is decided that the change is
worthwhile, then the commit message probably should explain cases in
which such an optimization will be really beneficial.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
  2021-04-05 17:05 ` Eric Sunshine
@ 2021-04-05 21:02 ` Derrick Stolee
  2021-04-06  8:58   ` ZheNing Hu
  2021-04-05 22:17 ` Jeff King
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Derrick Stolee @ 2021-04-05 21:02 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget, git
  Cc: Jeff King, Junio C Hamano, Christian Couder, Hariom Verma, ZheNing Hu

On 4/5/2021 10:01 AM, ZheNing Hu via GitGitGadget wrote:
> From: ZheNing Hu <adlternative@gmail.com>
> 
> When we use `git for-each-ref`, every ref will call
> `show_ref_array_item()` and allocate its own final strbuf
> and error strbuf. Instead, we can provide two single strbuf:
> final_buf and error_buf that get reused for each output.

s/two single strbuf/two buffers/

> When run it 100 times:
> 
> $ git for-each-ref
> 
> on git.git :
> 
> 3.19s user
> 3.88s system
> 35% cpu
> 20.199 total
> 
> to:
> 
> 2.89s user
> 4.00s system
> 34% cpu
> 19.741 total
> 
> The performance has been slightly improved.

This is a nice amount of time! Perhaps simplify the
presentation:

  The performance for 'git for-each-ref' on the Git
  repository itself with X references changes from
  3.2 seconds to 2.9 seconds.
 
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>     [GSOC] ref-filter: use single strbuf for all output
>     
>     This patch learned Jeff King's optimization measures in git
>     cat-file(79ed0a5): using a single strbuf for all objects output Instead
>     of allocating a large number of small strbuf for every object.

This part of the cover letter could be put into the
commit message itself. Use the correct formatting,
though:

  This approach is similar to the one used by 79ed0a5
  (cat-file: use a single strbuf for all output, 2018-08-14)
  to speed up the cat-file builtin.

I found the code change to be correct. Thanks!
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
  2021-04-05 17:05 ` Eric Sunshine
  2021-04-05 21:02 ` Derrick Stolee
@ 2021-04-05 22:17 ` Jeff King
  2021-04-06  9:49   ` ZheNing Hu
  2021-04-06 18:34 ` René Scharfe
  2021-04-07 15:26 ` [PATCH v2] " ZheNing Hu via GitGitGadget
  4 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2021-04-05 22:17 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Junio C Hamano, Christian Couder, Hariom Verma, ZheNing Hu

On Mon, Apr 05, 2021 at 02:01:19PM +0000, ZheNing Hu via GitGitGadget wrote:

> When we use `git for-each-ref`, every ref will call
> `show_ref_array_item()` and allocate its own final strbuf
> and error strbuf. Instead, we can provide two single strbuf:
> final_buf and error_buf that get reused for each output.
> 
> When run it 100 times:
> 
> $ git for-each-ref
> 
> on git.git :
> 
> 3.19s user
> 3.88s system
> 35% cpu
> 20.199 total
> 
> to:
> 
> 2.89s user
> 4.00s system
> 34% cpu
> 19.741 total

That's a bigger performance improvement than I'd expect from this. I'm
having trouble reproducing it here (I get the same time before and
after). I also notice that you don't seem to be CPU bound, and we spend
most of our time on system CPU (so program startup stuff, not the loop
you're optimizing).

I think a more interesting test is timing a single invocation with a
large number of refs. If you are planning to do a lot of work on the
formatting code, it might be worth adding such a test into t/perf (both
to show off results, but also to catch any regressions after we make
things faster).

>     This patch learned Jeff King's optimization measures in git
>     cat-file(79ed0a5): using a single strbuf for all objects output Instead
>     of allocating a large number of small strbuf for every object.

I do think this is a good direction (for all the reasons laid out in
79ed0a5), though it wasn't actually the part I was most worried about
for ref-filter performance. The bigger issue in ref-filter is that each
individual atom will generally have its own allocated string, and then
we'll format all of the atoms into the final strbuf output. In most
cases we could avoid those intermediate copies entirely.

I do think this would be a useful optimization to have in addition,
though. As for the patch itself, I looked over the review that Eric
gave, and he already said everything I would have. :)

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 17:05 ` Eric Sunshine
@ 2021-04-06  8:53   ` ZheNing Hu
  0 siblings, 0 replies; 21+ messages in thread
From: ZheNing Hu @ 2021-04-06  8:53 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: ZheNing Hu via GitGitGadget, Git List, Jeff King, Junio C Hamano,
	Christian Couder, Hariom Verma

Eric Sunshine <sunshine@sunshineco.com> 于2021年4月6日周二 上午1:05写道:
>
> On Mon, Apr 5, 2021 at 10:01 AM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
> > [...]
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
>
> Was there a discussion leading up to this change? If so, it may be a
> good idea to provide a link to it in the mailing list here under the
> "---" line.
>

Okay, I will add them in cover-letter next time.

> Some comments below...
>
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >         struct ref_format format = REF_FORMAT_INIT;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> >
> >         for (i = 0; i < maxcount; i++)
> > -               show_ref_array_item(array.items[i], &format);
> > +               show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
> >         ref_array_clear(&array);
> >         return 0;
> >  }
>
> The call to strbuf_reset() within show_ref_array_item() does not free
> the memory from the strbufs, so the memory is being leaked. Therefore,
> at the end of this function, you should have:
>
>     strbuf_release(final_buf);
>     strbuf_release(error_buf);
>

Thanks, I ignored this point.

> > diff --git a/builtin/tag.c b/builtin/tag.c
> > @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >         struct ref_array array;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> >
> >         for (i = 0; i < array.nr; i++)
> > -               show_ref_array_item(array.items[i], format);
> > +               show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
> >         ref_array_clear(&array);
> >         free(to_free);
>
> Leaking `final_buf` and `error_buf`.
>
> > diff --git a/ref-filter.c b/ref-filter.c
> > @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
> >  void show_ref_array_item(struct ref_array_item *info,
> > -                        const struct ref_format *format)
> > +                        const struct ref_format *format,
> > +                        struct strbuf *final_buf,
> > +                        struct strbuf *error_buf)
> >  {
> > -       struct strbuf final_buf = STRBUF_INIT;
> > -       struct strbuf error_buf = STRBUF_INIT;
> >
> > -       if (format_ref_array_item(info, format, &final_buf, &error_buf))
> > -               die("%s", error_buf.buf);
> > -       fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > -       strbuf_release(&error_buf);
> > -       strbuf_release(&final_buf);
> > +       if (format_ref_array_item(info, format, final_buf, error_buf))
> > +               die("%s", error_buf->buf);
> > +       fwrite(final_buf->buf, 1, final_buf->len, stdout);
> > +       strbuf_reset(error_buf);
> > +       strbuf_reset(final_buf);
> >         putchar('\n');
> >  }
>
> A couple comments:
>
> It is especially ugly that `error_buf` needs to be passed in by the
> caller since it is only ever used in case of an error, at which point
> the program will die() anyhow, so it's not on a critical,
> speed-sensitive path. The initialization with STRBUF_INIT is
> practically cost-free, so this variable could easily stay local to
> this function without cost-penalty rather than forcing the caller to
> pass it in. (This assumes that none of the consumers of `error_buf`
> down the line insert into the buffer unnecessarily, which is probably
> a reasonable assumption.)
>

What you said makes sense. The `error_buf` may not need to be passed
as a parameter, because errors are generally less.

> It is an unsafe assumption to only call strbuf_reset() at the end of
> the function. For this to be robust, you can't assume that the caller
> has given you an empty strbuf. Instead, you must ensure it by calling
> strbuf_reset() at the start. (It doesn't hurt to also call
> strbuf_reset() at the end, but that is not critical to correct
> operation, so could be omitted.)
>

Well, indeed, it would be better to use `strbuf_reset()` first, as it do in
`cat-file.c`.

> > @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
> >         struct ref_array_item *ref_item;
> > +       struct strbuf final_buf = STRBUF_INIT;
> > +       struct strbuf error_buf = STRBUF_INIT;
> > +
> >         ref_item = new_ref_array_item(name, oid);
> >         ref_item->kind = ref_kind_from_refname(name);
> > -       show_ref_array_item(ref_item, format);
> > +       show_ref_array_item(ref_item, format, &final_buf, &error_buf);
> >         free_array_item(ref_item);
> >  }
>
> Leaking `final_buf` and `error_buf`.
>
> > diff --git a/ref-filter.h b/ref-filter.h
> > @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
> >  /*  Print the ref using the given format and quote_style */
> > -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +void show_ref_array_item(struct ref_array_item *info,
> > +                        const struct ref_format *format,
> > +                        struct strbuf *final_buf,
> > +                        struct strbuf *error_buf);
>
> It is not clear to the person reading this what these two new
> arguments are for or what should be provided, so the comment above the
> function deserves an update explaining what these arguments are for
> and how to provide them. This is especially important since this is a
> public function.
>
> If this function was merely internal to some builtin command, this
> sort of change for the sake of optimization might not be so bad, but
> as a public function, it comes across as rather ugly. In general, we
> don't necessarily want to pollute an otherwise clean API with changes
> which make the API uglier merely for the sake of tiny optimizations
> like this (IMHO). The extra burden placed on callers by this change,
> coupled with the ugliness this introduces into the API, makes the
> change seem less than desirable.
>

Well, for the time being, there are relatively few places
where`show_ref_array_item()`
is used in the entire git repository. I may need to pay attention to this later:
the ease of use of public interfaces is also important.

> One way you might be able to mitigate the undesirableness would be to
> have two versions of this function. For instance:
>
>     /* Print the ref using the given format and quote_style */
>     show_ref_array_item(...);
>    /* This is like show_ref_array_item() but ... */
>     show_ref_array_item_optim(...);
>
> The comment of the second function would, of course, need to explain
> that it is similar to show_ref_array_item() but more optimal because
> <some reasons> and that the caller is responsible for allocating and
> releasing the strbufs, and that the strbufs are used only for
> temporary storage, thus the caller should not assume anything about
> them.
>

Yes, this will ensure that this new public interface will not be misused.

> This way, callers which don't invoke show_ref_array_item() in a tight
> loop don't need to be burdened by the new arguments (and won't have to
> remember to release them), and callers with a tight loop can take
> advantage of the optimization with the bit of extra work of having to
> declare and release the strbufs.
>

For external calls, reasonable release of strbuf does require attention,
which is indeed a disadvantage at some time.

> So, having said all that, it's not clear that the ugliness and extra
> work are worth the gain. However, if it is decided that the change is
> worthwhile, then the commit message probably should explain cases in
> which such an optimization will be really beneficial.

I now estimate that the optimization brought here may appear in a more refs
git repo like `linux.git`. I have to experiment first.

Thanks, Eric.

--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 21:02 ` Derrick Stolee
@ 2021-04-06  8:58   ` ZheNing Hu
  0 siblings, 0 replies; 21+ messages in thread
From: ZheNing Hu @ 2021-04-06  8:58 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: ZheNing Hu via GitGitGadget, Git List, Jeff King, Junio C Hamano,
	Christian Couder, Hariom Verma

Derrick Stolee <stolee@gmail.com> 于2021年4月6日周二 上午5:02写道:
>
> On 4/5/2021 10:01 AM, ZheNing Hu via GitGitGadget wrote:
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
>
> s/two single strbuf/two buffers/
>
> > When run it 100 times:
> >
> > $ git for-each-ref
> >
> > on git.git :
> >
> > 3.19s user
> > 3.88s system
> > 35% cpu
> > 20.199 total
> >
> > to:
> >
> > 2.89s user
> > 4.00s system
> > 34% cpu
> > 19.741 total
> >
> > The performance has been slightly improved.
>
> This is a nice amount of time! Perhaps simplify the
> presentation:
>

Thanks! But now it may still need more experimental examples
to prove that this optimization is visible.

>   The performance for 'git for-each-ref' on the Git
>   repository itself with X references changes from
>   3.2 seconds to 2.9 seconds.
>
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [GSOC] ref-filter: use single strbuf for all output
> >
> >     This patch learned Jeff King's optimization measures in git
> >     cat-file(79ed0a5): using a single strbuf for all objects output Instead
> >     of allocating a large number of small strbuf for every object.
>
> This part of the cover letter could be put into the
> commit message itself. Use the correct formatting,
> though:
>
>   This approach is similar to the one used by 79ed0a5
>   (cat-file: use a single strbuf for all output, 2018-08-14)
>   to speed up the cat-file builtin.
>

OK, I will do it.

> I found the code change to be correct. Thanks!
> -Stolee

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 22:17 ` Jeff King
@ 2021-04-06  9:49   ` ZheNing Hu
  2021-04-06 10:35     ` ZheNing Hu
  0 siblings, 1 reply; 21+ messages in thread
From: ZheNing Hu @ 2021-04-06  9:49 UTC (permalink / raw)
  To: Jeff King
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma

Jeff King <peff@peff.net> 于2021年4月6日周二 上午6:17写道:
>
> On Mon, Apr 05, 2021 at 02:01:19PM +0000, ZheNing Hu via GitGitGadget wrote:
>
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
> >
> > When run it 100 times:
> >
> > $ git for-each-ref
> >
> > on git.git :
> >
> > 3.19s user
> > 3.88s system
> > 35% cpu
> > 20.199 total
> >
> > to:
> >
> > 2.89s user
> > 4.00s system
> > 34% cpu
> > 19.741 total
>
> That's a bigger performance improvement than I'd expect from this. I'm
> having trouble reproducing it here (I get the same time before and
> after). I also notice that you don't seem to be CPU bound, and we spend
> most of our time on system CPU (so program startup stuff, not the loop
> you're optimizing).
>
> I think a more interesting test is timing a single invocation with a
> large number of refs. If you are planning to do a lot of work on the
> formatting code, it might be worth adding such a test into t/perf (both
> to show off results, but also to catch any regressions after we make
> things faster).
>

It makes sense. A lot of refs can be convincing. Just like the number of
objects measured in `cat-files` is large enough.

But this is the first time I use `t/perf/*` and there is a little problem.
It seem like whatever I run single script like `sh ./p0007-write-cache.sh`
or just `make` or `./run ${HOME}/git -- ./p0002-read-cache.sh` , these
tests will fail.

> >     This patch learned Jeff King's optimization measures in git
> >     cat-file(79ed0a5): using a single strbuf for all objects output Instead
> >     of allocating a large number of small strbuf for every object.
>
> I do think this is a good direction (for all the reasons laid out in
> 79ed0a5), though it wasn't actually the part I was most worried about
> for ref-filter performance. The bigger issue in ref-filter is that each
> individual atom will generally have its own allocated string, and then
> we'll format all of the atoms into the final strbuf output. In most
> cases we could avoid those intermediate copies entirely.
>

Yes! In `ref-filter` we set object info in `v->s` and then append them to
current `stack` buffer, and finally set in `final_buf`, the copy of the string
is expensive. I don’t know if the optimization should start by removing the
stack buffer?

> I do think this would be a useful optimization to have in addition,
> though. As for the patch itself, I looked over the review that Eric
> gave, and he already said everything I would have. :)
>

I think it should be optimized, It will reduce the overhead of malloc and
free, but it is not obvious enough.

Yes, there is a lot of bad code in my patch.

> -Peff

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-06  9:49   ` ZheNing Hu
@ 2021-04-06 10:35     ` ZheNing Hu
  2021-04-06 14:00       ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: ZheNing Hu @ 2021-04-06 10:35 UTC (permalink / raw)
  To: Jeff King
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma

ZheNing Hu <adlternative@gmail.com> 于2021年4月6日周二 下午5:49写道:
> But this is the first time I use `t/perf/*` and there is a little problem.
> It seem like whatever I run single script like `sh ./p0007-write-cache.sh`
> or just `make` or `./run ${HOME}/git -- ./p0002-read-cache.sh` , these
> tests will fail.
>
It's because I don't have /usr/bin/time, solved after installation.
So best have this:

--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -152,6 +152,10 @@ immediate=t
 # Perf tests require GNU time
 case "$(uname -s)" in Darwin) GTIME="${GTIME:-gtime}";; esac
 GTIME="${GTIME:-/usr/bin/time}"
+if ! test -f "$GTIME"
+then
+       error "command not found: "$GTIME""
+fi

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-06 10:35     ` ZheNing Hu
@ 2021-04-06 14:00       ` Jeff King
  2021-04-06 14:35         ` ZheNing Hu
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2021-04-06 14:00 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma

On Tue, Apr 06, 2021 at 06:35:57PM +0800, ZheNing Hu wrote:

> ZheNing Hu <adlternative@gmail.com> 于2021年4月6日周二 下午5:49写道:
> > But this is the first time I use `t/perf/*` and there is a little problem.
> > It seem like whatever I run single script like `sh ./p0007-write-cache.sh`
> > or just `make` or `./run ${HOME}/git -- ./p0002-read-cache.sh` , these
> > tests will fail.
> >
> It's because I don't have /usr/bin/time, solved after installation.
> So best have this:
> 
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -152,6 +152,10 @@ immediate=t
>  # Perf tests require GNU time
>  case "$(uname -s)" in Darwin) GTIME="${GTIME:-gtime}";; esac
>  GTIME="${GTIME:-/usr/bin/time}"
> +if ! test -f "$GTIME"
> +then
> +       error "command not found: "$GTIME""
> +fi

This patch would create problems when we expect to find the value of
$GTIME in the $PATH e.g., you can see in the Darwin case it is set to
just "gtime", not an absolute path).

I am sympathetic to helping people see what's wrong, but I think in this
case we're better off pointing people to using "-v". E.g.:

  $ GTIME=pretend-we-do-not-have-gtime ./p0001-rev-list.sh 
  perf 1 - rev-list --all:
  not ok 1 - rev-list --all
  #	
  #		git rev-list --all >/dev/null
  #	

Uh oh, that wasn't very informative. But how about this:

  $ GTIME=pretend-we-do-not-have-gtime ./p0001-rev-list.sh -v
  [...]
  perf 1 - rev-list --all:
  running: 
  	git rev-list --all >/dev/null
  
  ./p0001-rev-list.sh: 160: pretend-we-do-not-have-gtime: not found
  not ok 1 - rev-list --all
  #	
  #		git rev-list --all >/dev/null
  #	

which I think makes it reasonably clear.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-06 14:00       ` Jeff King
@ 2021-04-06 14:35         ` ZheNing Hu
  0 siblings, 0 replies; 21+ messages in thread
From: ZheNing Hu @ 2021-04-06 14:35 UTC (permalink / raw)
  To: Jeff King
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma

Jeff King <peff@peff.net> 于2021年4月6日周二 下午10:00写道:
>
> On Tue, Apr 06, 2021 at 06:35:57PM +0800, ZheNing Hu wrote:
>
> > ZheNing Hu <adlternative@gmail.com> 于2021年4月6日周二 下午5:49写道:
> > > But this is the first time I use `t/perf/*` and there is a little problem.
> > > It seem like whatever I run single script like `sh ./p0007-write-cache.sh`
> > > or just `make` or `./run ${HOME}/git -- ./p0002-read-cache.sh` , these
> > > tests will fail.
> > >
> > It's because I don't have /usr/bin/time, solved after installation.
> > So best have this:
> >
> > --- a/t/perf/perf-lib.sh
> > +++ b/t/perf/perf-lib.sh
> > @@ -152,6 +152,10 @@ immediate=t
> >  # Perf tests require GNU time
> >  case "$(uname -s)" in Darwin) GTIME="${GTIME:-gtime}";; esac
> >  GTIME="${GTIME:-/usr/bin/time}"
> > +if ! test -f "$GTIME"
> > +then
> > +       error "command not found: "$GTIME""
> > +fi
>
> This patch would create problems when we expect to find the value of
> $GTIME in the $PATH e.g., you can see in the Darwin case it is set to
> just "gtime", not an absolute path).
>
> I am sympathetic to helping people see what's wrong, but I think in this
> case we're better off pointing people to using "-v". E.g.:
>
>   $ GTIME=pretend-we-do-not-have-gtime ./p0001-rev-list.sh
>   perf 1 - rev-list --all:
>   not ok 1 - rev-list --all
>   #
>   #             git rev-list --all >/dev/null
>   #
>
> Uh oh, that wasn't very informative. But how about this:
>
>   $ GTIME=pretend-we-do-not-have-gtime ./p0001-rev-list.sh -v
>   [...]
>   perf 1 - rev-list --all:
>   running:
>         git rev-list --all >/dev/null
>
>   ./p0001-rev-list.sh: 160: pretend-we-do-not-have-gtime: not found
>   not ok 1 - rev-list --all
>   #
>   #             git rev-list --all >/dev/null
>   #
>
> which I think makes it reasonably clear.
>
> -Peff

I just make a small suggestion. ;)
You are right, "-v" is enough.

In addition, I found that the performance was basically
unchanged after testing. It seems that this optimization is
indeed too small, not as practical as in `cat-file`.

This shows that the performance bottleneck of `ref-filter`
lies elsewhere. E.g. you mentioned "intermediate copies".

--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-04-05 22:17 ` Jeff King
@ 2021-04-06 18:34 ` René Scharfe
  2021-04-07 13:57   ` ZheNing Hu
  2021-04-07 15:26 ` [PATCH v2] " ZheNing Hu via GitGitGadget
  4 siblings, 1 reply; 21+ messages in thread
From: René Scharfe @ 2021-04-06 18:34 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget, git
  Cc: Jeff King, Junio C Hamano, Christian Couder, Hariom Verma, ZheNing Hu

Am 05.04.21 um 16:01 schrieb ZheNing Hu via GitGitGadget:
> From: ZheNing Hu <adlternative@gmail.com>
>
> When we use `git for-each-ref`, every ref will call
> `show_ref_array_item()` and allocate its own final strbuf
> and error strbuf. Instead, we can provide two single strbuf:
> final_buf and error_buf that get reused for each output.
>
> When run it 100 times:
>
> $ git for-each-ref
>
> on git.git :
>
> 3.19s user
> 3.88s system
> 35% cpu
> 20.199 total
>
> to:
>
> 2.89s user
> 4.00s system
> 34% cpu
> 19.741 total
>
> The performance has been slightly improved.

I like to use hyperfine (https://github.com/sharkdp/hyperfine) to get
more stable benchmark numbers, incl. standard deviation.  With three
warmup runs I get the following results for running git for-each-ref on
Git's own repo with the current master (2e36527f23):

  Benchmark #1: ./git for-each-ref
    Time (mean ± σ):      18.8 ms ±   0.3 ms    [User: 12.7 ms, System: 5.6 ms]
    Range (min … max):    18.2 ms …  19.8 ms    148 runs

With your patch on top I get this:

  Benchmark #1: ./git for-each-ref
    Time (mean ± σ):      18.5 ms ±   0.4 ms    [User: 12.3 ms, System: 5.6 ms]
    Range (min … max):    17.8 ms …  19.6 ms    147 runs

So there seems to be a slight improvement here, but it is within the
noise.

I'm quite surprised how much longer this takes on your machine, however,
and (like Peff already mentioned) how much of the total time it spends
in system calls.  Is an antivirus program or similar interferring?  Or
some kind of emulator or similar, e.g. Valgrind?  Or has it been a long
time since you ran "git gc"?

The benchmark certainly depends on the number of local and remote
branches in the repo; my copy currently has 4304 according to
"git for-each-ref | wc -l".

>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>     [GSOC] ref-filter: use single strbuf for all output
>
>     This patch learned Jeff King's optimization measures in git
>     cat-file(79ed0a5): using a single strbuf for all objects output Instead
>     of allocating a large number of small strbuf for every object.
>
>     So ref-filter can learn same thing: use single buffer: final_buf and
>     error_buf for all refs output.
>
>     Thanks.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-927%2Fadlternative%2Fref-filter-single-buf-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-927/adlternative/ref-filter-single-buf-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/927
>
>  builtin/for-each-ref.c |  4 +++-
>  builtin/tag.c          |  4 +++-
>  ref-filter.c           | 21 ++++++++++++---------
>  ref-filter.h           |  5 ++++-
>  4 files changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index cb9c81a04606..9dc41f48bfa0 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	struct ref_array array;
>  	struct ref_filter filter;
>  	struct ref_format format = REF_FORMAT_INIT;
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
>
>  	struct option opts[] = {
>  		OPT_BIT('s', "shell", &format.quote_style,
> @@ -81,7 +83,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	if (!maxcount || array.nr < maxcount)
>  		maxcount = array.nr;
>  	for (i = 0; i < maxcount; i++)
> -		show_ref_array_item(array.items[i], &format);
> +		show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);

This user of show_ref_array_item() calls it in a loop on an array.

>  	ref_array_clear(&array);
>  	return 0;
>  }
> diff --git a/builtin/tag.c b/builtin/tag.c
> index d403417b5625..8a38b3e2de34 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>  		     struct ref_format *format)
>  {
>  	struct ref_array array;
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
>  	char *to_free = NULL;
>  	int i;
>
> @@ -64,7 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>  	ref_array_sort(sorting, &array);
>
>  	for (i = 0; i < array.nr; i++)
> -		show_ref_array_item(array.items[i], format);
> +		show_ref_array_item(array.items[i], format, &final_buf, &error_buf);

Dito.

>  	ref_array_clear(&array);
>  	free(to_free);
>
> diff --git a/ref-filter.c b/ref-filter.c
> index f0bd32f71416..51ff6af64ebc 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
>  }
>
>  void show_ref_array_item(struct ref_array_item *info,
> -			 const struct ref_format *format)
> +			 const struct ref_format *format,
> +			 struct strbuf *final_buf,
> +			 struct strbuf *error_buf)
>  {
> -	struct strbuf final_buf = STRBUF_INIT;
> -	struct strbuf error_buf = STRBUF_INIT;
>
> -	if (format_ref_array_item(info, format, &final_buf, &error_buf))
> -		die("%s", error_buf.buf);
> -	fwrite(final_buf.buf, 1, final_buf.len, stdout);
> -	strbuf_release(&error_buf);
> -	strbuf_release(&final_buf);
> +	if (format_ref_array_item(info, format, final_buf, error_buf))
> +		die("%s", error_buf->buf);
> +	fwrite(final_buf->buf, 1, final_buf->len, stdout);
> +	strbuf_reset(error_buf);
> +	strbuf_reset(final_buf);
>  	putchar('\n');
>  }
>
> @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
>  		      const struct ref_format *format)
>  {
>  	struct ref_array_item *ref_item;
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
> +
>  	ref_item = new_ref_array_item(name, oid);
>  	ref_item->kind = ref_kind_from_refname(name);
> -	show_ref_array_item(ref_item, format);
> +	show_ref_array_item(ref_item, format, &final_buf, &error_buf);

This third and final caller works with a single item; there is no loop.

>  	free_array_item(ref_item);
>  }
>
> diff --git a/ref-filter.h b/ref-filter.h
> index 19ea4c413409..95498c9f4467 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
>  			  struct strbuf *final_buf,
>  			  struct strbuf *error_buf);
>  /*  Print the ref using the given format and quote_style */
> -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> +void show_ref_array_item(struct ref_array_item *info,
> +			 const struct ref_format *format,
> +			 struct strbuf *final_buf,
> +			 struct strbuf *error_buf);

This bring-your-own-buffer approach pushes responsibilities back to
the callers, in exchange for improved performance.  The number of
users of this interface is low, so that's defensible.  But that added
effort is also non-trivial -- as you demonstrated by leaking the
allocated memory. ;-)

How about offering to do more instead?  In particular you could add
a count parameter and have show_ref_array_item() handle an array of
struct ref_array_item objects.  It could reuse the buffers internally
to get the same performance benefit, and would free callers from
having to iterate loops themselves.  Something like:

	void show_ref_array_items(struct ref_array_item **info,
				  size_t n,
				  const struct ref_format *format);

Callers that deal with a single element can pass n = 1.

Perhaps the "format" parameter should go first, like with printf.

The double reference in "**info" is a bit ugly, though (array of
pointers instead of a simple array of objects).  That's dictated
by struct ref_array_item containing a flexible array member, which
seems to be hard to change.

>  /*  Parse a single sort specifier and add it to the list */
>  void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
>  /*  Callback function for parsing the sort option */
>
> base-commit: 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
  2021-04-06 18:34 ` René Scharfe
@ 2021-04-07 13:57   ` ZheNing Hu
  0 siblings, 0 replies; 21+ messages in thread
From: ZheNing Hu @ 2021-04-07 13:57 UTC (permalink / raw)
  To: René Scharfe
  Cc: ZheNing Hu via GitGitGadget, Git List, Jeff King, Junio C Hamano,
	Christian Couder, Hariom Verma

René Scharfe <l.s.r@web.de> 于2021年4月7日周三 上午2:34写道:
>
> Am 05.04.21 um 16:01 schrieb ZheNing Hu via GitGitGadget:
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
> >
> > When run it 100 times:
> >
> > $ git for-each-ref
> >
> > on git.git :
> >
> > 3.19s user
> > 3.88s system
> > 35% cpu
> > 20.199 total
> >
> > to:
> >
> > 2.89s user
> > 4.00s system
> > 34% cpu
> > 19.741 total
> >
> > The performance has been slightly improved.
>
> I like to use hyperfine (https://github.com/sharkdp/hyperfine) to get
> more stable benchmark numbers, incl. standard deviation.  With three
> warmup runs I get the following results for running git for-each-ref on
> Git's own repo with the current master (2e36527f23):
>

Yes, hyperfine is really easy to use!

>   Benchmark #1: ./git for-each-ref
>     Time (mean ± σ):      18.8 ms ±   0.3 ms    [User: 12.7 ms, System: 5.6 ms]
>     Range (min … max):    18.2 ms …  19.8 ms    148 runs
>
> With your patch on top I get this:
>
>   Benchmark #1: ./git for-each-ref
>     Time (mean ± σ):      18.5 ms ±   0.4 ms    [User: 12.3 ms, System: 5.6 ms]
>     Range (min … max):    17.8 ms …  19.6 ms    147 runs
>
> So there seems to be a slight improvement here, but it is within the
> noise.
>

Yeah. I meet same noise when I do such test.

> I'm quite surprised how much longer this takes on your machine, however,
> and (like Peff already mentioned) how much of the total time it spends
> in system calls.  Is an antivirus program or similar interferring?  Or
> some kind of emulator or similar, e.g. Valgrind?  Or has it been a long
> time since you ran "git gc"?
>

Yes, I haven't used `git gc` for a long time.
In addition, when I did the test before, I ran the network proxy software,
so there have a bit notice.

> The benchmark certainly depends on the number of local and remote
> branches in the repo; my copy currently has 4304 according to
> "git for-each-ref | wc -l".
>

Yes i understand this point.
But In my git.git, the result of "git for-each-ref | wc -l" is 8716 refs.

> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [GSOC] ref-filter: use single strbuf for all output
> >
> >     This patch learned Jeff King's optimization measures in git
> >     cat-file(79ed0a5): using a single strbuf for all objects output Instead
> >     of allocating a large number of small strbuf for every object.
> >
> >     So ref-filter can learn same thing: use single buffer: final_buf and
> >     error_buf for all refs output.
> >
> >     Thanks.
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-927%2Fadlternative%2Fref-filter-single-buf-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-927/adlternative/ref-filter-single-buf-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/927
> >
> >  builtin/for-each-ref.c |  4 +++-
> >  builtin/tag.c          |  4 +++-
> >  ref-filter.c           | 21 ++++++++++++---------
> >  ref-filter.h           |  5 ++++-
> >  4 files changed, 22 insertions(+), 12 deletions(-)
> >
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > index cb9c81a04606..9dc41f48bfa0 100644
> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >       struct ref_array array;
> >       struct ref_filter filter;
> >       struct ref_format format = REF_FORMAT_INIT;
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> >
> >       struct option opts[] = {
> >               OPT_BIT('s', "shell", &format.quote_style,
> > @@ -81,7 +83,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >       if (!maxcount || array.nr < maxcount)
> >               maxcount = array.nr;
> >       for (i = 0; i < maxcount; i++)
> > -             show_ref_array_item(array.items[i], &format);
> > +             show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
>
> This user of show_ref_array_item() calls it in a loop on an array.
>
> >       ref_array_clear(&array);
> >       return 0;
> >  }
> > diff --git a/builtin/tag.c b/builtin/tag.c
> > index d403417b5625..8a38b3e2de34 100644
> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >                    struct ref_format *format)
> >  {
> >       struct ref_array array;
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> >       char *to_free = NULL;
> >       int i;
> >
> > @@ -64,7 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >       ref_array_sort(sorting, &array);
> >
> >       for (i = 0; i < array.nr; i++)
> > -             show_ref_array_item(array.items[i], format);
> > +             show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
>
> Dito.
>
> >       ref_array_clear(&array);
> >       free(to_free);
> >
> > diff --git a/ref-filter.c b/ref-filter.c
> > index f0bd32f71416..51ff6af64ebc 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
> >  }
> >
> >  void show_ref_array_item(struct ref_array_item *info,
> > -                      const struct ref_format *format)
> > +                      const struct ref_format *format,
> > +                      struct strbuf *final_buf,
> > +                      struct strbuf *error_buf)
> >  {
> > -     struct strbuf final_buf = STRBUF_INIT;
> > -     struct strbuf error_buf = STRBUF_INIT;
> >
> > -     if (format_ref_array_item(info, format, &final_buf, &error_buf))
> > -             die("%s", error_buf.buf);
> > -     fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > -     strbuf_release(&error_buf);
> > -     strbuf_release(&final_buf);
> > +     if (format_ref_array_item(info, format, final_buf, error_buf))
> > +             die("%s", error_buf->buf);
> > +     fwrite(final_buf->buf, 1, final_buf->len, stdout);
> > +     strbuf_reset(error_buf);
> > +     strbuf_reset(final_buf);
> >       putchar('\n');
> >  }
> >
> > @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
> >                     const struct ref_format *format)
> >  {
> >       struct ref_array_item *ref_item;
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> > +
> >       ref_item = new_ref_array_item(name, oid);
> >       ref_item->kind = ref_kind_from_refname(name);
> > -     show_ref_array_item(ref_item, format);
> > +     show_ref_array_item(ref_item, format, &final_buf, &error_buf);
>
> This third and final caller works with a single item; there is no loop.
>
> >       free_array_item(ref_item);
> >  }
> >
> > diff --git a/ref-filter.h b/ref-filter.h
> > index 19ea4c413409..95498c9f4467 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
> >                         struct strbuf *final_buf,
> >                         struct strbuf *error_buf);
> >  /*  Print the ref using the given format and quote_style */
> > -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +void show_ref_array_item(struct ref_array_item *info,
> > +                      const struct ref_format *format,
> > +                      struct strbuf *final_buf,
> > +                      struct strbuf *error_buf);
>
> This bring-your-own-buffer approach pushes responsibilities back to
> the callers, in exchange for improved performance.  The number of
> users of this interface is low, so that's defensible.  But that added
> effort is also non-trivial -- as you demonstrated by leaking the
> allocated memory. ;-)
>

Yes, this may be burden for the function caller.

> How about offering to do more instead?  In particular you could add
> a count parameter and have show_ref_array_item() handle an array of
> struct ref_array_item objects.  It could reuse the buffers internally
> to get the same performance benefit, and would free callers from
> having to iterate loops themselves.  Something like:
>
>         void show_ref_array_items(struct ref_array_item **info,
>                                   size_t n,
>                                   const struct ref_format *format);
>
> Callers that deal with a single element can pass n = 1.
>
> Perhaps the "format" parameter should go first, like with printf.
>
> The double reference in "**info" is a bit ugly, though (array of
> pointers instead of a simple array of objects).  That's dictated
> by struct ref_array_item containing a flexible array member, which
> seems to be hard to change.
>

I personally think this idea is great.
In this way, there is no need to pass in two strbuf from the outside.

+void show_ref_array_items(struct ref_array_item **info,
+                        const struct ref_format *format,
+                        size_t n)
+{
+       struct strbuf final_buf = STRBUF_INIT;
+       struct strbuf error_buf = STRBUF_INIT;
+       size_t i;
+
+       for (i = 0; i < n; i++) {
+               if (format_ref_array_item(info[i], format, &final_buf,
&error_buf))
+                       die("%s", error_buf.buf);
+               fwrite(final_buf.buf, 1, final_buf.len, stdout);
+               strbuf_reset(&error_buf);
+               strbuf_reset(&final_buf);
+               putchar('\n');
+       }
+       strbuf_release(&error_buf);
+       strbuf_release(&final_buf);
+}
+

And the result is here(close the network proxy program):

HEAD~ result :

(git)-[heads/master] % hyperfine "./bin-wrappers/git for-each-ref"
--warmup=10
  Benchmark #1: ./bin-wrappers/git for-each-ref
    Time (mean ± σ):      18.7 ms ±   0.4 ms    [User: 14.9 ms,
System: 3.9 ms]
    Range (min … max):    18.1 ms …  19.8 ms    141 runs

With the new patch :
 (git)-[ref-filter-single-buf] % hyperfine "./bin-wrappers/git
for-each-ref" --warmup=10
  Benchmark #1: ./bin-wrappers/git for-each-ref
   Time (mean ± σ):      18.2 ms ±   0.3 ms    [User: 14.1 ms, System:
4.2 ms]
   Range (min … max):    17.4 ms …  19.2 ms    140 runs

Seem like it does have some small advantages ;-)

> >  /*  Parse a single sort specifier and add it to the list */
> >  void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
> >  /*  Callback function for parsing the sort option */
> >
> > base-commit: 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48
> >
>

A new iteration will be sent later.

Thanks!
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-04-06 18:34 ` René Scharfe
@ 2021-04-07 15:26 ` ZheNing Hu via GitGitGadget
  2021-04-07 20:31   ` Junio C Hamano
  2021-04-07 21:27   ` Jeff King
  4 siblings, 2 replies; 21+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-04-07 15:26 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Junio C Hamano, Christian Couder, Hariom Verma,
	Eric Sunshine, Derrick Stolee, René Scharfe, ZheNing Hu,
	ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

When we use `git for-each-ref`, every ref will call
`show_ref_array_item()` and allocate its own final strbuf
and error strbuf. Instead, we can reuse these two strbuf
for each step ref's output.

The performance for `git for-each-ref` on the Git repository
itself with performance testing tool `hyperfine` changes from
18.7 ms ± 0.4 ms to 18.2 ms ± 0.3 ms.

This approach is similar to the one used by 79ed0a5
(cat-file: use a single strbuf for all output, 2018-08-14)
to speed up the cat-file builtin.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    [GSOC] ref-filter: use single strbuf for all output
    
    Now git for-each-ref can reuse two buffers for all refs output, the
    performance is slightly improved.
    
    Now there may be a question : Should the original interface
    show_ref_array_items be retained?
    
    Thanks.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-927%2Fadlternative%2Fref-filter-single-buf-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-927/adlternative/ref-filter-single-buf-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/927

Range-diff vs v1:

 1:  bcc3feb4de7c ! 1:  3aed12c4f5a8 [GSOC] ref-filter: use single strbuf for all output
     @@ Commit message
      
          When we use `git for-each-ref`, every ref will call
          `show_ref_array_item()` and allocate its own final strbuf
     -    and error strbuf. Instead, we can provide two single strbuf:
     -    final_buf and error_buf that get reused for each output.
     +    and error strbuf. Instead, we can reuse these two strbuf
     +    for each step ref's output.
      
     -    When run it 100 times:
     +    The performance for `git for-each-ref` on the Git repository
     +    itself with performance testing tool `hyperfine` changes from
     +    18.7 ms ± 0.4 ms to 18.2 ms ± 0.3 ms.
      
     -    $ git for-each-ref
     -
     -    on git.git :
     -
     -    3.19s user
     -    3.88s system
     -    35% cpu
     -    20.199 total
     -
     -    to:
     -
     -    2.89s user
     -    4.00s system
     -    34% cpu
     -    19.741 total
     -
     -    The performance has been slightly improved.
     +    This approach is similar to the one used by 79ed0a5
     +    (cat-file: use a single strbuf for all output, 2018-08-14)
     +    to speed up the cat-file builtin.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
      
       ## builtin/for-each-ref.c ##
     -@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     - 	struct ref_array array;
     - 	struct ref_filter filter;
     - 	struct ref_format format = REF_FORMAT_INIT;
     -+	struct strbuf final_buf = STRBUF_INIT;
     -+	struct strbuf error_buf = STRBUF_INIT;
     +@@ builtin/for-each-ref.c: static char const * const for_each_ref_usage[] = {
       
     - 	struct option opts[] = {
     - 		OPT_BIT('s', "shell", &format.quote_style,
     + int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     + {
     +-	int i;
     + 	struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
     + 	int maxcount = 0, icase = 0;
     + 	struct ref_array array;
      @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     + 
       	if (!maxcount || array.nr < maxcount)
       		maxcount = array.nr;
     - 	for (i = 0; i < maxcount; i++)
     +-	for (i = 0; i < maxcount; i++)
      -		show_ref_array_item(array.items[i], &format);
     -+		show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
     ++	show_ref_array_items(array.items, &format, maxcount);
       	ref_array_clear(&array);
       	return 0;
       }
      
     - ## builtin/tag.c ##
     -@@ builtin/tag.c: static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
     - 		     struct ref_format *format)
     - {
     - 	struct ref_array array;
     -+	struct strbuf final_buf = STRBUF_INIT;
     -+	struct strbuf error_buf = STRBUF_INIT;
     - 	char *to_free = NULL;
     - 	int i;
     - 
     -@@ builtin/tag.c: static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
     - 	ref_array_sort(sorting, &array);
     - 
     - 	for (i = 0; i < array.nr; i++)
     --		show_ref_array_item(array.items[i], format);
     -+		show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
     - 	ref_array_clear(&array);
     - 	free(to_free);
     - 
     -
       ## ref-filter.c ##
      @@ ref-filter.c: int format_ref_array_item(struct ref_array_item *info,
     + 	return 0;
       }
       
     - void show_ref_array_item(struct ref_array_item *info,
     --			 const struct ref_format *format)
     ++void show_ref_array_items(struct ref_array_item **info,
      +			 const struct ref_format *format,
     -+			 struct strbuf *final_buf,
     -+			 struct strbuf *error_buf)
     - {
     --	struct strbuf final_buf = STRBUF_INIT;
     --	struct strbuf error_buf = STRBUF_INIT;
     - 
     --	if (format_ref_array_item(info, format, &final_buf, &error_buf))
     --		die("%s", error_buf.buf);
     --	fwrite(final_buf.buf, 1, final_buf.len, stdout);
     --	strbuf_release(&error_buf);
     --	strbuf_release(&final_buf);
     -+	if (format_ref_array_item(info, format, final_buf, error_buf))
     -+		die("%s", error_buf->buf);
     -+	fwrite(final_buf->buf, 1, final_buf->len, stdout);
     -+	strbuf_reset(error_buf);
     -+	strbuf_reset(final_buf);
     - 	putchar('\n');
     - }
     - 
     -@@ ref-filter.c: void pretty_print_ref(const char *name, const struct object_id *oid,
     - 		      const struct ref_format *format)
     - {
     - 	struct ref_array_item *ref_item;
     ++			 size_t n)
     ++{
      +	struct strbuf final_buf = STRBUF_INIT;
      +	struct strbuf error_buf = STRBUF_INIT;
     ++	size_t i;
      +
     - 	ref_item = new_ref_array_item(name, oid);
     - 	ref_item->kind = ref_kind_from_refname(name);
     --	show_ref_array_item(ref_item, format);
     -+	show_ref_array_item(ref_item, format, &final_buf, &error_buf);
     - 	free_array_item(ref_item);
     - }
     - 
     ++	for (i = 0; i < n; i++) {
     ++		if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
     ++			die("%s", error_buf.buf);
     ++		fwrite(final_buf.buf, 1, final_buf.len, stdout);
     ++		strbuf_reset(&error_buf);
     ++		strbuf_reset(&final_buf);
     ++		putchar('\n');
     ++	}
     ++	strbuf_release(&error_buf);
     ++	strbuf_release(&final_buf);
     ++}
     ++
     + void show_ref_array_item(struct ref_array_item *info,
     + 			 const struct ref_format *format)
     + {
      
       ## ref-filter.h ##
      @@ ref-filter.h: int format_ref_array_item(struct ref_array_item *info,
     - 			  struct strbuf *final_buf,
       			  struct strbuf *error_buf);
       /*  Print the ref using the given format and quote_style */
     --void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
     -+void show_ref_array_item(struct ref_array_item *info,
     + void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
     ++/*  Print the refs using the given format and quote_style and maxcount */
     ++void show_ref_array_items(struct ref_array_item **info,
      +			 const struct ref_format *format,
     -+			 struct strbuf *final_buf,
     -+			 struct strbuf *error_buf);
     ++			 size_t n);
     ++
       /*  Parse a single sort specifier and add it to the list */
       void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
       /*  Callback function for parsing the sort option */


 builtin/for-each-ref.c |  4 +---
 ref-filter.c           | 20 ++++++++++++++++++++
 ref-filter.h           |  5 +++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index cb9c81a04606..d630402230f3 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -16,7 +16,6 @@ static char const * const for_each_ref_usage[] = {
 
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 {
-	int i;
 	struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
 	int maxcount = 0, icase = 0;
 	struct ref_array array;
@@ -80,8 +79,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	if (!maxcount || array.nr < maxcount)
 		maxcount = array.nr;
-	for (i = 0; i < maxcount; i++)
-		show_ref_array_item(array.items[i], &format);
+	show_ref_array_items(array.items, &format, maxcount);
 	ref_array_clear(&array);
 	return 0;
 }
diff --git a/ref-filter.c b/ref-filter.c
index f0bd32f71416..27bbf9b6c8ac 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2435,6 +2435,26 @@ int format_ref_array_item(struct ref_array_item *info,
 	return 0;
 }
 
+void show_ref_array_items(struct ref_array_item **info,
+			 const struct ref_format *format,
+			 size_t n)
+{
+	struct strbuf final_buf = STRBUF_INIT;
+	struct strbuf error_buf = STRBUF_INIT;
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+		if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
+			die("%s", error_buf.buf);
+		fwrite(final_buf.buf, 1, final_buf.len, stdout);
+		strbuf_reset(&error_buf);
+		strbuf_reset(&final_buf);
+		putchar('\n');
+	}
+	strbuf_release(&error_buf);
+	strbuf_release(&final_buf);
+}
+
 void show_ref_array_item(struct ref_array_item *info,
 			 const struct ref_format *format)
 {
diff --git a/ref-filter.h b/ref-filter.h
index 19ea4c413409..eb7e79a6676d 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -121,6 +121,11 @@ int format_ref_array_item(struct ref_array_item *info,
 			  struct strbuf *error_buf);
 /*  Print the ref using the given format and quote_style */
 void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
+/*  Print the refs using the given format and quote_style and maxcount */
+void show_ref_array_items(struct ref_array_item **info,
+			 const struct ref_format *format,
+			 size_t n);
+
 /*  Parse a single sort specifier and add it to the list */
 void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
 /*  Callback function for parsing the sort option */

base-commit: 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-07 15:26 ` [PATCH v2] " ZheNing Hu via GitGitGadget
@ 2021-04-07 20:31   ` Junio C Hamano
  2021-04-08 12:05     ` ZheNing Hu
  2021-04-07 21:27   ` Jeff King
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2021-04-07 20:31 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Jeff King, Christian Couder, Hariom Verma, Eric Sunshine,
	Derrick Stolee, René Scharfe, ZheNing Hu

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output

The implementation changed so much from the initial attempt, for
which the above title may have been appropriate, that reusing single
strbuf over and over is not the most important part of the change
anymore, I am afraid.  Besides, it uses TWO strbufs ;-)

Subject: [PATCH] ref-filter: introduce show_ref_array_items() helper

or something like that?

> From: ZheNing Hu <adlternative@gmail.com>
>
> When we use `git for-each-ref`, every ref will call
> `show_ref_array_item()` and allocate its own final strbuf
> and error strbuf. Instead, we can reuse these two strbuf
> for each step ref's output.
>
> The performance for `git for-each-ref` on the Git repository
> itself with performance testing tool `hyperfine` changes from
> 18.7 ms ± 0.4 ms to 18.2 ms ± 0.3 ms.
>
> This approach is similar to the one used by 79ed0a5
> (cat-file: use a single strbuf for all output, 2018-08-14)
> to speed up the cat-file builtin.
>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>     [GSOC] ref-filter: use single strbuf for all output
>     
>     Now git for-each-ref can reuse two buffers for all refs output, the
>     performance is slightly improved.
>     
>     Now there may be a question : Should the original interface
>     show_ref_array_items be retained?
> ...
>        /*  Callback function for parsing the sort option */

Again, not a very useful range-diff as the implementation changed so much.


>  builtin/for-each-ref.c |  4 +---
>  ref-filter.c           | 20 ++++++++++++++++++++
>  ref-filter.h           |  5 +++++
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index cb9c81a04606..d630402230f3 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -16,7 +16,6 @@ static char const * const for_each_ref_usage[] = {
>  
>  int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  {
> -	int i;
>  	struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
>  	int maxcount = 0, icase = 0;
>  	struct ref_array array;
> @@ -80,8 +79,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  
>  	if (!maxcount || array.nr < maxcount)
>  		maxcount = array.nr;
> -	for (i = 0; i < maxcount; i++)
> -		show_ref_array_item(array.items[i], &format);
> +	show_ref_array_items(array.items, &format, maxcount);

The intention of this call is to pass an array and the number of
elements in the array as a pair to the function, right?  When you
design the API for a new helper function, do not split them apart by
inserting an unrelated parameter in the middle.

> diff --git a/ref-filter.c b/ref-filter.c
> index f0bd32f71416..27bbf9b6c8ac 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2435,6 +2435,26 @@ int format_ref_array_item(struct ref_array_item *info,
>  	return 0;
>  }
>  
> +void show_ref_array_items(struct ref_array_item **info,
> +			 const struct ref_format *format,
> +			 size_t n)

IOW,

	void show_ref_array_items(const struct ref_format *format,
				  struct ref_array_item *info[], size_t n)

> +{
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
> +	size_t i;
> +
> +	for (i = 0; i < n; i++) {
> +		if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
> +			die("%s", error_buf.buf);

OK, the contents of error_buf is already localized, so it is correct
not to have _() around the "%s" here.

> +		fwrite(final_buf.buf, 1, final_buf.len, stdout);
> +		strbuf_reset(&error_buf);
> +		strbuf_reset(&final_buf);
> +		putchar('\n');

This is inherited code, but splitting fwrite() and putchar() apart
like this makes the code hard to follow.  Perhaps clean it up later
when nothing else is going on in the code as leftoverbits, outside
the topic.

> +	}
> +	strbuf_release(&error_buf);
> +	strbuf_release(&final_buf);
> +}
> +
>  void show_ref_array_item(struct ref_array_item *info,
>  			 const struct ref_format *format)
>  {

Isn't the point of the new helper function so that this can become a
thin wrapper around it, i.e.

	void show_ref_array_item(...)
        {
		show_ref_array_items(format, &info, 1);
	}

> diff --git a/ref-filter.h b/ref-filter.h
> index 19ea4c413409..eb7e79a6676d 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -121,6 +121,11 @@ int format_ref_array_item(struct ref_array_item *info,
>  			  struct strbuf *error_buf);
>  /*  Print the ref using the given format and quote_style */
>  void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> +/*  Print the refs using the given format and quote_style and maxcount */
> +void show_ref_array_items(struct ref_array_item **info,
> +			 const struct ref_format *format,
> +			 size_t n);

The inconsistency between "maxcount" vs "n" is irritating.  Calling
the parameter with a name that has the word "info" (because the new
parameter is about that array) and a word like "nelem" to hint that
it is the number of elements in the array) would be sensible.

void show_ref_array_items(const struct ref_format *format,
			  struct ref_array_item *info[], size_t info_count);

or something along the line, perhaps?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-07 15:26 ` [PATCH v2] " ZheNing Hu via GitGitGadget
  2021-04-07 20:31   ` Junio C Hamano
@ 2021-04-07 21:27   ` Jeff King
  2021-04-08 12:18     ` ZheNing Hu
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2021-04-07 21:27 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Junio C Hamano, Christian Couder, Hariom Verma,
	Eric Sunshine, Derrick Stolee, René Scharfe, ZheNing Hu

On Wed, Apr 07, 2021 at 03:26:48PM +0000, ZheNing Hu via GitGitGadget wrote:

> diff --git a/ref-filter.c b/ref-filter.c
> index f0bd32f71416..27bbf9b6c8ac 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2435,6 +2435,26 @@ int format_ref_array_item(struct ref_array_item *info,
>  	return 0;
>  }
>  
> +void show_ref_array_items(struct ref_array_item **info,
> +			 const struct ref_format *format,
> +			 size_t n)
> +{
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
> +	size_t i;
> +
> +	for (i = 0; i < n; i++) {
> +		if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
> +			die("%s", error_buf.buf);
> +		fwrite(final_buf.buf, 1, final_buf.len, stdout);
> +		strbuf_reset(&error_buf);
> +		strbuf_reset(&final_buf);
> +		putchar('\n');
> +	}
> +	strbuf_release(&error_buf);
> +	strbuf_release(&final_buf);
> +}

I think this is a reasonable direction to take the solution: wrapping
the loop so that the reuse of the buffers can be included there.

But I do wonder if we should go the opposite direction, and get rid of
show_ref_array_item() entirely. It only has two callers, both of which
could just write the loop themselves. That is more code, but perhaps it
would make it more clear what is going on in those callers, and to give
them more flexibility.

I notice there's a third user of the ref-filter.c code in
builtin/branch.c that does not use show_ref_array_item(). Instead, it
loops itself and calls format_ref_array_item(). I think this is because
it is sometimes columnizing the results. Perhaps git-tag and
for-each-ref would want to learn the same trick, in which case they'd be
happy to have the open-coded loop.

Either way, it probably makes sense to introduce the same optimization
to the case in builtin/branch.c.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-07 20:31   ` Junio C Hamano
@ 2021-04-08 12:05     ` ZheNing Hu
  0 siblings, 0 replies; 21+ messages in thread
From: ZheNing Hu @ 2021-04-08 12:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Jeff King,
	Christian Couder, Hariom Verma, Eric Sunshine, Derrick Stolee,
	René Scharfe

Junio C Hamano <gitster@pobox.com> 于2021年4月8日周四 上午4:31写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Subject: Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
>
> The implementation changed so much from the initial attempt, for
> which the above title may have been appropriate, that reusing single
> strbuf over and over is not the most important part of the change
> anymore, I am afraid.  Besides, it uses TWO strbufs ;-)
>
> Subject: [PATCH] ref-filter: introduce show_ref_array_items() helper
>
> or something like that?
>

Yep, I may think that its core is still reusing strbufs, but
"introduce show_ref_array_items()"  will be more accurate.

> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can reuse these two strbuf
> > for each step ref's output.
> >
> > The performance for `git for-each-ref` on the Git repository
> > itself with performance testing tool `hyperfine` changes from
> > 18.7 ms ± 0.4 ms to 18.2 ms ± 0.3 ms.
> >
> > This approach is similar to the one used by 79ed0a5
> > (cat-file: use a single strbuf for all output, 2018-08-14)
> > to speed up the cat-file builtin.
> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [GSOC] ref-filter: use single strbuf for all output
> >
> >     Now git for-each-ref can reuse two buffers for all refs output, the
> >     performance is slightly improved.
> >
> >     Now there may be a question : Should the original interface
> >     show_ref_array_items be retained?
> > ...
> >        /*  Callback function for parsing the sort option */
>
> Again, not a very useful range-diff as the implementation changed so much.
>

This makes me wonder if I should give up GGG in the future.
I also don’t want a rang-diff with a big difference.

>
> >  builtin/for-each-ref.c |  4 +---
> >  ref-filter.c           | 20 ++++++++++++++++++++
> >  ref-filter.h           |  5 +++++
> >  3 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > index cb9c81a04606..d630402230f3 100644
> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -16,7 +16,6 @@ static char const * const for_each_ref_usage[] = {
> >
> >  int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >  {
> > -     int i;
> >       struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
> >       int maxcount = 0, icase = 0;
> >       struct ref_array array;
> > @@ -80,8 +79,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >
> >       if (!maxcount || array.nr < maxcount)
> >               maxcount = array.nr;
> > -     for (i = 0; i < maxcount; i++)
> > -             show_ref_array_item(array.items[i], &format);
> > +     show_ref_array_items(array.items, &format, maxcount);
>
> The intention of this call is to pass an array and the number of
> elements in the array as a pair to the function, right?  When you
> design the API for a new helper function, do not split them apart by
> inserting an unrelated parameter in the middle.
>

Eh, are you saying that `maxcount` is irrelevant here? There should be
`maxcount`, because we need to limit the number of iterations here.

> > diff --git a/ref-filter.c b/ref-filter.c
> > index f0bd32f71416..27bbf9b6c8ac 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2435,6 +2435,26 @@ int format_ref_array_item(struct ref_array_item *info,
> >       return 0;
> >  }
> >
> > +void show_ref_array_items(struct ref_array_item **info,
> > +                      const struct ref_format *format,
> > +                      size_t n)
>
> IOW,
>
>         void show_ref_array_items(const struct ref_format *format,
>                                   struct ref_array_item *info[], size_t n)
>

Yes, it will be more obvious in the form of an array.

> > +{
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> > +     size_t i;
> > +
> > +     for (i = 0; i < n; i++) {
> > +             if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
> > +                     die("%s", error_buf.buf);
>
> OK, the contents of error_buf is already localized, so it is correct
> not to have _() around the "%s" here.
>
> > +             fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > +             strbuf_reset(&error_buf);
> > +             strbuf_reset(&final_buf);
> > +             putchar('\n');
>
> This is inherited code, but splitting fwrite() and putchar() apart
> like this makes the code hard to follow.  Perhaps clean it up later
> when nothing else is going on in the code as leftoverbits, outside
> the topic.
>

Ok, swap the position of reset and putchar.

> > +     }
> > +     strbuf_release(&error_buf);
> > +     strbuf_release(&final_buf);
> > +}
> > +
> >  void show_ref_array_item(struct ref_array_item *info,
> >                        const struct ref_format *format)
> >  {
>
> Isn't the point of the new helper function so that this can become a
> thin wrapper around it, i.e.
>
>         void show_ref_array_item(...)
>         {
>                 show_ref_array_items(format, &info, 1);
>         }
>

Maybe it makes sense. But as Peff said, Maybe we can just delete it.

> > diff --git a/ref-filter.h b/ref-filter.h
> > index 19ea4c413409..eb7e79a6676d 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -121,6 +121,11 @@ int format_ref_array_item(struct ref_array_item *info,
> >                         struct strbuf *error_buf);
> >  /*  Print the ref using the given format and quote_style */
> >  void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +/*  Print the refs using the given format and quote_style and maxcount */
> > +void show_ref_array_items(struct ref_array_item **info,
> > +                      const struct ref_format *format,
> > +                      size_t n);
>
> The inconsistency between "maxcount" vs "n" is irritating.  Calling
> the parameter with a name that has the word "info" (because the new
> parameter is about that array) and a word like "nelem" to hint that
> it is the number of elements in the array) would be sensible.
>
> void show_ref_array_items(const struct ref_format *format,
>                           struct ref_array_item *info[], size_t info_count);
>
> or something along the line, perhaps?
>

Aha, I guess this is the reason for the misunderstanding above.
Yes, `info_count` is the correct meaning and the meaning of `n` is
wrong.

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-07 21:27   ` Jeff King
@ 2021-04-08 12:18     ` ZheNing Hu
  2021-04-08 14:32       ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: ZheNing Hu @ 2021-04-08 12:18 UTC (permalink / raw)
  To: Jeff King
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Eric Sunshine, Derrick Stolee,
	René Scharfe

Jeff King <peff@peff.net> 于2021年4月8日周四 上午5:27写道:
>
> On Wed, Apr 07, 2021 at 03:26:48PM +0000, ZheNing Hu via GitGitGadget wrote:
>
> > diff --git a/ref-filter.c b/ref-filter.c
> > index f0bd32f71416..27bbf9b6c8ac 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2435,6 +2435,26 @@ int format_ref_array_item(struct ref_array_item *info,
> >       return 0;
> >  }
> >
> > +void show_ref_array_items(struct ref_array_item **info,
> > +                      const struct ref_format *format,
> > +                      size_t n)
> > +{
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> > +     size_t i;
> > +
> > +     for (i = 0; i < n; i++) {
> > +             if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
> > +                     die("%s", error_buf.buf);
> > +             fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > +             strbuf_reset(&error_buf);
> > +             strbuf_reset(&final_buf);
> > +             putchar('\n');
> > +     }
> > +     strbuf_release(&error_buf);
> > +     strbuf_release(&final_buf);
> > +}
>
> I think this is a reasonable direction to take the solution: wrapping
> the loop so that the reuse of the buffers can be included there.
>
> But I do wonder if we should go the opposite direction, and get rid of
> show_ref_array_item() entirely. It only has two callers, both of which
> could just write the loop themselves. That is more code, but perhaps it
> would make it more clear what is going on in those callers, and to give
> them more flexibility.
>

Indeed. I think `pretty_print_ref()` is proof that we may need to keep
`show_ref_array_item()` because If it modified to `show_ref_array_items(...,1);`
it will look very strange.

> I notice there's a third user of the ref-filter.c code in
> builtin/branch.c that does not use show_ref_array_item(). Instead, it
> loops itself and calls format_ref_array_item(). I think this is because
> it is sometimes columnizing the results. Perhaps git-tag and
> for-each-ref would want to learn the same trick, in which case they'd be
> happy to have the open-coded loop.
>

Yes, a special judgment about colopts is used in the `print_ref_list()` of
`branch.c`.

> Either way, it probably makes sense to introduce the same optimization
> to the case in builtin/branch.c.
>

I agree in `branch.c` here can learn this reusing bufs optimization.

> -Peff

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-08 12:18     ` ZheNing Hu
@ 2021-04-08 14:32       ` Jeff King
  2021-04-08 14:43         ` ZheNing Hu
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2021-04-08 14:32 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Eric Sunshine, Derrick Stolee,
	René Scharfe

On Thu, Apr 08, 2021 at 08:18:59PM +0800, ZheNing Hu wrote:

> > I think this is a reasonable direction to take the solution: wrapping
> > the loop so that the reuse of the buffers can be included there.
> >
> > But I do wonder if we should go the opposite direction, and get rid of
> > show_ref_array_item() entirely. It only has two callers, both of which
> > could just write the loop themselves. That is more code, but perhaps it
> > would make it more clear what is going on in those callers, and to give
> > them more flexibility.
> >
> 
> Indeed. I think `pretty_print_ref()` is proof that we may need to keep
> `show_ref_array_item()` because If it modified to `show_ref_array_items(...,1);`
> it will look very strange.

What I meant was that we should get rid of show_ref_array_items(), as
well, and just use format_ref_array_item() everywhere. This whole
wrapper is only saving us a few lines, and it makes it harder to see
what the function is doing. Likewise for pretty-print ref. But I dunno.
Maybe that is all going too far.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-08 14:32       ` Jeff King
@ 2021-04-08 14:43         ` ZheNing Hu
  2021-04-08 14:51           ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: ZheNing Hu @ 2021-04-08 14:43 UTC (permalink / raw)
  To: Jeff King
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Eric Sunshine, Derrick Stolee,
	René Scharfe

Jeff King <peff@peff.net> 于2021年4月8日周四 下午10:32写道:
>
> On Thu, Apr 08, 2021 at 08:18:59PM +0800, ZheNing Hu wrote:
>
> > > I think this is a reasonable direction to take the solution: wrapping
> > > the loop so that the reuse of the buffers can be included there.
> > >
> > > But I do wonder if we should go the opposite direction, and get rid of
> > > show_ref_array_item() entirely. It only has two callers, both of which
> > > could just write the loop themselves. That is more code, but perhaps it
> > > would make it more clear what is going on in those callers, and to give
> > > them more flexibility.
> > >
> >
> > Indeed. I think `pretty_print_ref()` is proof that we may need to keep
> > `show_ref_array_item()` because If it modified to `show_ref_array_items(...,1);`
> > it will look very strange.
>
> What I meant was that we should get rid of show_ref_array_items(), as
> well, and just use format_ref_array_item() everywhere. This whole
> wrapper is only saving us a few lines, and it makes it harder to see
> what the function is doing. Likewise for pretty-print ref. But I dunno.
> Maybe that is all going too far.
>

Ok... so you mean we just use a loop like in branch.c, and get rid of
show_ref_array_items() and show_ref_array_item().
(We can still use the optimization of reuse bufs)

> -Peff

--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-08 14:43         ` ZheNing Hu
@ 2021-04-08 14:51           ` Jeff King
  2021-04-08 15:12             ` ZheNing Hu
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2021-04-08 14:51 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Eric Sunshine, Derrick Stolee,
	René Scharfe

On Thu, Apr 08, 2021 at 10:43:54PM +0800, ZheNing Hu wrote:

> > What I meant was that we should get rid of show_ref_array_items(), as
> > well, and just use format_ref_array_item() everywhere. This whole
> > wrapper is only saving us a few lines, and it makes it harder to see
> > what the function is doing. Likewise for pretty-print ref. But I dunno.
> > Maybe that is all going too far.
> >
> 
> Ok... so you mean we just use a loop like in branch.c, and get rid of
> show_ref_array_items() and show_ref_array_item().
> (We can still use the optimization of reuse bufs)

Yes, something like this:

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index cb9c81a046..55297fe297 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct ref_array array;
 	struct ref_filter filter;
 	struct ref_format format = REF_FORMAT_INIT;
+	struct strbuf output = STRBUF_INIT;
+	struct strbuf err = STRBUF_INIT;
 
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
@@ -80,8 +82,16 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	if (!maxcount || array.nr < maxcount)
 		maxcount = array.nr;
-	for (i = 0; i < maxcount; i++)
-		show_ref_array_item(array.items[i], &format);
+
+	for (i = 0; i < maxcount; i++) {
+		strbuf_reset(&output);
+		if (format_ref_array_item(array.items[i], &format, &output, &err))
+			die("%s", err.buf);
+		fwrite(output.buf, 1, output.len, stdout);
+		putchar('\n');
+	}
+
+	strbuf_release(&output);
 	ref_array_clear(&array);
 	return 0;
 }

It is dropping a few lines by assuming that the error buf is only
touched when we return an error (which IMHO is a reasonable assumption
to make).

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-08 14:51           ` Jeff King
@ 2021-04-08 15:12             ` ZheNing Hu
  0 siblings, 0 replies; 21+ messages in thread
From: ZheNing Hu @ 2021-04-08 15:12 UTC (permalink / raw)
  To: Jeff King
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Eric Sunshine, Derrick Stolee,
	René Scharfe

Jeff King <peff@peff.net> 于2021年4月8日周四 下午10:51写道:
>
> On Thu, Apr 08, 2021 at 10:43:54PM +0800, ZheNing Hu wrote:
>
> > > What I meant was that we should get rid of show_ref_array_items(), as
> > > well, and just use format_ref_array_item() everywhere. This whole
> > > wrapper is only saving us a few lines, and it makes it harder to see
> > > what the function is doing. Likewise for pretty-print ref. But I dunno.
> > > Maybe that is all going too far.
> > >
> >
> > Ok... so you mean we just use a loop like in branch.c, and get rid of
> > show_ref_array_items() and show_ref_array_item().
> > (We can still use the optimization of reuse bufs)
>
> Yes, something like this:
>
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index cb9c81a046..55297fe297 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>         struct ref_array array;
>         struct ref_filter filter;
>         struct ref_format format = REF_FORMAT_INIT;
> +       struct strbuf output = STRBUF_INIT;
> +       struct strbuf err = STRBUF_INIT;
>
>         struct option opts[] = {
>                 OPT_BIT('s', "shell", &format.quote_style,
> @@ -80,8 +82,16 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>
>         if (!maxcount || array.nr < maxcount)
>                 maxcount = array.nr;
> -       for (i = 0; i < maxcount; i++)
> -               show_ref_array_item(array.items[i], &format);
> +
> +       for (i = 0; i < maxcount; i++) {
> +               strbuf_reset(&output);
> +               if (format_ref_array_item(array.items[i], &format, &output, &err))
> +                       die("%s", err.buf);
> +               fwrite(output.buf, 1, output.len, stdout);
> +               putchar('\n');
> +       }
> +
> +       strbuf_release(&output);
>         ref_array_clear(&array);
>         return 0;
>  }
>
> It is dropping a few lines by assuming that the error buf is only
> touched when we return an error (which IMHO is a reasonable assumption
> to make).
>

I agree, err buffer does not need to be reused.

> -Peff

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-04-08 15:12 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
2021-04-05 17:05 ` Eric Sunshine
2021-04-06  8:53   ` ZheNing Hu
2021-04-05 21:02 ` Derrick Stolee
2021-04-06  8:58   ` ZheNing Hu
2021-04-05 22:17 ` Jeff King
2021-04-06  9:49   ` ZheNing Hu
2021-04-06 10:35     ` ZheNing Hu
2021-04-06 14:00       ` Jeff King
2021-04-06 14:35         ` ZheNing Hu
2021-04-06 18:34 ` René Scharfe
2021-04-07 13:57   ` ZheNing Hu
2021-04-07 15:26 ` [PATCH v2] " ZheNing Hu via GitGitGadget
2021-04-07 20:31   ` Junio C Hamano
2021-04-08 12:05     ` ZheNing Hu
2021-04-07 21:27   ` Jeff King
2021-04-08 12:18     ` ZheNing Hu
2021-04-08 14:32       ` Jeff King
2021-04-08 14:43         ` ZheNing Hu
2021-04-08 14:51           ` Jeff King
2021-04-08 15:12             ` ZheNing Hu

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git