git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	Git List <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>,
	Hariom Verma <hariom18599@gmail.com>
Subject: Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
Date: Wed, 7 Apr 2021 21:57:00 +0800	[thread overview]
Message-ID: <CAOLTT8S4-ZAjU5qcfep8-bbw+BNM3f-khMXJvQP+an3H6emp8g@mail.gmail.com> (raw)
In-Reply-To: <c70a7c17-650a-ae4d-9a90-66c3511f8371@web.de>

René Scharfe <l.s.r@web.de> 于2021年4月7日周三 上午2:34写道:
>
> Am 05.04.21 um 16:01 schrieb ZheNing Hu via GitGitGadget:
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
> >
> > When run it 100 times:
> >
> > $ git for-each-ref
> >
> > on git.git :
> >
> > 3.19s user
> > 3.88s system
> > 35% cpu
> > 20.199 total
> >
> > to:
> >
> > 2.89s user
> > 4.00s system
> > 34% cpu
> > 19.741 total
> >
> > The performance has been slightly improved.
>
> I like to use hyperfine (https://github.com/sharkdp/hyperfine) to get
> more stable benchmark numbers, incl. standard deviation.  With three
> warmup runs I get the following results for running git for-each-ref on
> Git's own repo with the current master (2e36527f23):
>

Yes, hyperfine is really easy to use!

>   Benchmark #1: ./git for-each-ref
>     Time (mean ± σ):      18.8 ms ±   0.3 ms    [User: 12.7 ms, System: 5.6 ms]
>     Range (min … max):    18.2 ms …  19.8 ms    148 runs
>
> With your patch on top I get this:
>
>   Benchmark #1: ./git for-each-ref
>     Time (mean ± σ):      18.5 ms ±   0.4 ms    [User: 12.3 ms, System: 5.6 ms]
>     Range (min … max):    17.8 ms …  19.6 ms    147 runs
>
> So there seems to be a slight improvement here, but it is within the
> noise.
>

Yeah. I meet same noise when I do such test.

> I'm quite surprised how much longer this takes on your machine, however,
> and (like Peff already mentioned) how much of the total time it spends
> in system calls.  Is an antivirus program or similar interferring?  Or
> some kind of emulator or similar, e.g. Valgrind?  Or has it been a long
> time since you ran "git gc"?
>

Yes, I haven't used `git gc` for a long time.
In addition, when I did the test before, I ran the network proxy software,
so there have a bit notice.

> The benchmark certainly depends on the number of local and remote
> branches in the repo; my copy currently has 4304 according to
> "git for-each-ref | wc -l".
>

Yes i understand this point.
But In my git.git, the result of "git for-each-ref | wc -l" is 8716 refs.

> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [GSOC] ref-filter: use single strbuf for all output
> >
> >     This patch learned Jeff King's optimization measures in git
> >     cat-file(79ed0a5): using a single strbuf for all objects output Instead
> >     of allocating a large number of small strbuf for every object.
> >
> >     So ref-filter can learn same thing: use single buffer: final_buf and
> >     error_buf for all refs output.
> >
> >     Thanks.
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-927%2Fadlternative%2Fref-filter-single-buf-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-927/adlternative/ref-filter-single-buf-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/927
> >
> >  builtin/for-each-ref.c |  4 +++-
> >  builtin/tag.c          |  4 +++-
> >  ref-filter.c           | 21 ++++++++++++---------
> >  ref-filter.h           |  5 ++++-
> >  4 files changed, 22 insertions(+), 12 deletions(-)
> >
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > index cb9c81a04606..9dc41f48bfa0 100644
> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >       struct ref_array array;
> >       struct ref_filter filter;
> >       struct ref_format format = REF_FORMAT_INIT;
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> >
> >       struct option opts[] = {
> >               OPT_BIT('s', "shell", &format.quote_style,
> > @@ -81,7 +83,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >       if (!maxcount || array.nr < maxcount)
> >               maxcount = array.nr;
> >       for (i = 0; i < maxcount; i++)
> > -             show_ref_array_item(array.items[i], &format);
> > +             show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
>
> This user of show_ref_array_item() calls it in a loop on an array.
>
> >       ref_array_clear(&array);
> >       return 0;
> >  }
> > diff --git a/builtin/tag.c b/builtin/tag.c
> > index d403417b5625..8a38b3e2de34 100644
> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >                    struct ref_format *format)
> >  {
> >       struct ref_array array;
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> >       char *to_free = NULL;
> >       int i;
> >
> > @@ -64,7 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >       ref_array_sort(sorting, &array);
> >
> >       for (i = 0; i < array.nr; i++)
> > -             show_ref_array_item(array.items[i], format);
> > +             show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
>
> Dito.
>
> >       ref_array_clear(&array);
> >       free(to_free);
> >
> > diff --git a/ref-filter.c b/ref-filter.c
> > index f0bd32f71416..51ff6af64ebc 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
> >  }
> >
> >  void show_ref_array_item(struct ref_array_item *info,
> > -                      const struct ref_format *format)
> > +                      const struct ref_format *format,
> > +                      struct strbuf *final_buf,
> > +                      struct strbuf *error_buf)
> >  {
> > -     struct strbuf final_buf = STRBUF_INIT;
> > -     struct strbuf error_buf = STRBUF_INIT;
> >
> > -     if (format_ref_array_item(info, format, &final_buf, &error_buf))
> > -             die("%s", error_buf.buf);
> > -     fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > -     strbuf_release(&error_buf);
> > -     strbuf_release(&final_buf);
> > +     if (format_ref_array_item(info, format, final_buf, error_buf))
> > +             die("%s", error_buf->buf);
> > +     fwrite(final_buf->buf, 1, final_buf->len, stdout);
> > +     strbuf_reset(error_buf);
> > +     strbuf_reset(final_buf);
> >       putchar('\n');
> >  }
> >
> > @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
> >                     const struct ref_format *format)
> >  {
> >       struct ref_array_item *ref_item;
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> > +
> >       ref_item = new_ref_array_item(name, oid);
> >       ref_item->kind = ref_kind_from_refname(name);
> > -     show_ref_array_item(ref_item, format);
> > +     show_ref_array_item(ref_item, format, &final_buf, &error_buf);
>
> This third and final caller works with a single item; there is no loop.
>
> >       free_array_item(ref_item);
> >  }
> >
> > diff --git a/ref-filter.h b/ref-filter.h
> > index 19ea4c413409..95498c9f4467 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
> >                         struct strbuf *final_buf,
> >                         struct strbuf *error_buf);
> >  /*  Print the ref using the given format and quote_style */
> > -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +void show_ref_array_item(struct ref_array_item *info,
> > +                      const struct ref_format *format,
> > +                      struct strbuf *final_buf,
> > +                      struct strbuf *error_buf);
>
> This bring-your-own-buffer approach pushes responsibilities back to
> the callers, in exchange for improved performance.  The number of
> users of this interface is low, so that's defensible.  But that added
> effort is also non-trivial -- as you demonstrated by leaking the
> allocated memory. ;-)
>

Yes, this may be burden for the function caller.

> How about offering to do more instead?  In particular you could add
> a count parameter and have show_ref_array_item() handle an array of
> struct ref_array_item objects.  It could reuse the buffers internally
> to get the same performance benefit, and would free callers from
> having to iterate loops themselves.  Something like:
>
>         void show_ref_array_items(struct ref_array_item **info,
>                                   size_t n,
>                                   const struct ref_format *format);
>
> Callers that deal with a single element can pass n = 1.
>
> Perhaps the "format" parameter should go first, like with printf.
>
> The double reference in "**info" is a bit ugly, though (array of
> pointers instead of a simple array of objects).  That's dictated
> by struct ref_array_item containing a flexible array member, which
> seems to be hard to change.
>

I personally think this idea is great.
In this way, there is no need to pass in two strbuf from the outside.

+void show_ref_array_items(struct ref_array_item **info,
+                        const struct ref_format *format,
+                        size_t n)
+{
+       struct strbuf final_buf = STRBUF_INIT;
+       struct strbuf error_buf = STRBUF_INIT;
+       size_t i;
+
+       for (i = 0; i < n; i++) {
+               if (format_ref_array_item(info[i], format, &final_buf,
&error_buf))
+                       die("%s", error_buf.buf);
+               fwrite(final_buf.buf, 1, final_buf.len, stdout);
+               strbuf_reset(&error_buf);
+               strbuf_reset(&final_buf);
+               putchar('\n');
+       }
+       strbuf_release(&error_buf);
+       strbuf_release(&final_buf);
+}
+

And the result is here(close the network proxy program):

HEAD~ result :

(git)-[heads/master] % hyperfine "./bin-wrappers/git for-each-ref"
--warmup=10
  Benchmark #1: ./bin-wrappers/git for-each-ref
    Time (mean ± σ):      18.7 ms ±   0.4 ms    [User: 14.9 ms,
System: 3.9 ms]
    Range (min … max):    18.1 ms …  19.8 ms    141 runs

With the new patch :
 (git)-[ref-filter-single-buf] % hyperfine "./bin-wrappers/git
for-each-ref" --warmup=10
  Benchmark #1: ./bin-wrappers/git for-each-ref
   Time (mean ± σ):      18.2 ms ±   0.3 ms    [User: 14.1 ms, System:
4.2 ms]
   Range (min … max):    17.4 ms …  19.2 ms    140 runs

Seem like it does have some small advantages ;-)

> >  /*  Parse a single sort specifier and add it to the list */
> >  void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
> >  /*  Callback function for parsing the sort option */
> >
> > base-commit: 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48
> >
>

A new iteration will be sent later.

Thanks!
--
ZheNing Hu

  reply	other threads:[~2021-04-07 13:57 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-05 14:01 [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
2021-04-05 17:05 ` Eric Sunshine
2021-04-06  8:53   ` ZheNing Hu
2021-04-05 21:02 ` Derrick Stolee
2021-04-06  8:58   ` ZheNing Hu
2021-04-05 22:17 ` Jeff King
2021-04-06  9:49   ` ZheNing Hu
2021-04-06 10:35     ` ZheNing Hu
2021-04-06 14:00       ` Jeff King
2021-04-06 14:35         ` ZheNing Hu
2021-04-06 18:34 ` René Scharfe
2021-04-07 13:57   ` ZheNing Hu [this message]
2021-04-07 15:26 ` [PATCH v2] " ZheNing Hu via GitGitGadget
2021-04-07 20:31   ` Junio C Hamano
2021-04-08 12:05     ` ZheNing Hu
2021-04-07 21:27   ` Jeff King
2021-04-08 12:18     ` ZheNing Hu
2021-04-08 14:32       ` Jeff King
2021-04-08 14:43         ` ZheNing Hu
2021-04-08 14:51           ` Jeff King
2021-04-08 15:12             ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOLTT8S4-ZAjU5qcfep8-bbw+BNM3f-khMXJvQP+an3H6emp8g@mail.gmail.com \
    --to=adlternative@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).