From: ZheNing Hu <adlternative@gmail.com>
To: "René Scharfe." <l.s.r@web.de>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	Git List <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>,
	Hariom Verma <hariom18599@gmail.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH] [GSOC] ref-filter: get rid of show_ref_array_item
Date: Sun, 18 Apr 2021 19:22:05 +0800
Message-ID: <CAOLTT8RY_8Ui7AAfU6K7O03DqbMTh0m5GJpmuwOyiV8Ghn+cQg@mail.gmail.com>
In-Reply-To: <4c4eded7-3bb3-7ae9-6455-468b9522978c@web.de>

René Scharfe. <l.s.r@web.de> wrote on Sat, Apr 17, 2021 at 5:11 PM:
>
> On 09.04.21 at 15:35, ZheNing Hu via GitGitGadget wrote:
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use `git for-each-ref`, `show_ref_array_item()` is called
> > for every ref and allocates its own final strbuf each time.
> > But we can reuse the final strbuf for each ref's output.
> > Since `show_ref_array_item()` is not used in many places,
> > we can delete it and inline the same logic at its call sites.
> > In this way, the caller can clearly see how the loop works.
>
> Inlining an exported function that is not providing the right level of
> abstraction is a bold move that simplifies the API and can unlock
> improvements at the former call sites, like the possibility to reuse an
> allocated buffer in this case.  OK.
>
> > The performance for `git for-each-ref` on the Git repository
> > itself with performance testing tool `hyperfine` changes from
> > 23.7 ms ± 0.9 ms to 22.2 ms ± 1.0 ms.
>
> I see a speedup as well, but it's within the noise.
>

Yes, the performance improvement is very small with a large number
of refs. It is almost completely drowned out by the noise.

> > We also apply this optimization to `git tag -l` and
> > `git branch -l`: `git branch -l` improves from
> > 5.8 ms ± 0.8 ms to 2.7 ms ± 0.2 ms and `git tag -l`
> > from 5.9 ms ± 0.4 ms to 5.4 ms ± 0.4 ms.
>
> On my system there's no measurable change with these commands.
>

In my measurements, `git branch -l` improved noticeably, but that may be
because there are far fewer branches than tags.

> Nevertheless I think reusing the buffer across the loops is a good
> idea.
>
> > Since git.git has many more tags than branches, this suggests
> > that the optimization is more noticeable in repositories
> > that contain a small number of objects.
> >
> > This approach is similar to the one used by 79ed0a5
> > (cat-file: use a single strbuf for all output, 2018-08-14)
> > to speed up the cat-file builtin.
> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [GSOC] ref-filter: get rid of show_ref_array_item
> >
> >     Now git for-each-ref can reuse the final buf for the output of all
> >     refs, so performance is slightly improved. This optimization is also
> >     applied to git tag -l and git branch -l.
> >
> >     Thanks.
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-928%2Fadlternative%2Fref-filter-reuse-buf-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-928/adlternative/ref-filter-reuse-buf-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/928
> >
> >  builtin/branch.c       |  8 ++++----
> >  builtin/for-each-ref.c | 13 +++++++++++--
> >  builtin/tag.c          | 13 +++++++++++--
> >  ref-filter.c           | 24 +++++++++---------------
> >  ref-filter.h           |  2 --
> >  5 files changed, 35 insertions(+), 25 deletions(-)
> >
> > diff --git a/builtin/branch.c b/builtin/branch.c
> > index bcc00bcf182d..5c797e992aa4 100644
> > --- a/builtin/branch.c
> > +++ b/builtin/branch.c
> > @@ -411,6 +411,8 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
> >  {
> >       int i;
> >       struct ref_array array;
> > +     struct strbuf out = STRBUF_INIT;
> > +     struct strbuf err = STRBUF_INIT;
> >       int maxwidth = 0;
> >       const char *remote_prefix = "";
> >       char *to_free = NULL;
> > @@ -440,8 +442,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
> >       ref_array_sort(sorting, &array);
> >
> >       for (i = 0; i < array.nr; i++) {
> > -             struct strbuf out = STRBUF_INIT;
> > -             struct strbuf err = STRBUF_INIT;
> > +             strbuf_reset(&out);
> >               if (format_ref_array_item(array.items[i], format, &out, &err))
>
> This function didn't call show_ref_array_item() to begin with, so
> strictly speaking it's not related to the change in the title.  It is a
> preexisting example of show_ref_array_item() not being flexible enough,
> though.  I think it makes sense to have separate patches for inlining
> the function verbatim and reusing the output buffer when
> format_ref_array_item() is called in a loop.
>

I agree with you. I will split this into a separate patch.

> >                       die("%s", err.buf);
> >               if (column_active(colopts)) {
> > @@ -452,10 +453,9 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
> >                       fwrite(out.buf, 1, out.len, stdout);
> >                       putchar('\n');
> >               }
> > -             strbuf_release(&err);
> > -             strbuf_release(&out);
> >       }
> >
> > +     strbuf_release(&out);
>
> err is no longer released, and it is also not reset in the loop.
> That change is not mentioned in the commit message, but it should be.
> Why is it safe?  Probably because format_ref_array_item() only
> populates it if it also returns non-zero and then we end up dying
> anyway.
>
> That makes leak checking harder, though -- it's not easy to tell
> whether releasing err was simply forgotten.  I'd just retain
> the strbuf_release() call at the end of the function -- it
> shouldn't have a measurable performance impact and documents that
> this function is cleaning up after itself.  Thoughts?
>

Makes sense. Otherwise a future change might forget to release the err
buffer. I will add `strbuf_release()` here.
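
For illustration, here is a minimal sketch of the pattern discussed above:
one output buffer reset on each iteration, with both buffers released at the
end of the function. It does not use Git's real strbuf API. `struct buf`,
`buf_reset()`, `buf_addstr()`, `buf_release()`, `format_item()` and
`print_items()` are hypothetical stand-ins, and where the patch calls die()
the sketch returns -1 so that the cleanup path stays visible.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's struct strbuf; not the real API. */
struct buf {
	char *data;
	size_t len, alloc;
};
#define BUF_INIT { NULL, 0, 0 }

/* Drop the contents but keep the allocation for the next iteration. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

/* Append a NUL-terminated string, growing the allocation if needed. */
static void buf_addstr(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t want = (b->len + n + 1) * 2;
		char *p = realloc(b->data, want);

		if (!p) {
			perror("realloc");
			exit(1);
		}
		b->data = p;
		b->alloc = want;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/* Free the allocation and return the buffer to its initial state. */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->alloc = 0;
}

/* Hypothetical formatter: fills err only when it returns non-zero. */
static int format_item(const char *name, struct buf *out, struct buf *err)
{
	if (!*name) {
		buf_addstr(err, "empty ref name");
		return -1;
	}
	buf_addstr(out, "refs/heads/");
	buf_addstr(out, name);
	return 0;
}

/* Print every item; returns the number printed, or -1 on error. */
static int print_items(const char **names, int nr)
{
	struct buf out = BUF_INIT;
	struct buf err = BUF_INIT;
	int i, ret = 0;

	for (i = 0; i < nr; i++) {
		buf_reset(&out);	/* reuse the buffer from the last round */
		if (format_item(names[i], &out, &err)) {
			fprintf(stderr, "error: %s\n", err.data);
			ret = -1;
			break;
		}
		fwrite(out.data, 1, out.len, stdout);
		putchar('\n');
		ret++;
	}

	/* Release both buffers so the function cleans up after itself. */
	buf_release(&err);
	buf_release(&out);
	return ret;
}
```

Releasing err unconditionally costs one free() per call rather than one per
ref, so it should not show up in measurements, and it keeps the function
leak-free even if the error path later stops being fatal.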

Thanks.
--
ZheNing Hu
