From: ZheNing Hu <adlternative@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
Git List <git@vger.kernel.org>, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>,
Christian Couder <chriscool@tuxfamily.org>,
Hariom Verma <hariom18599@gmail.com>
Subject: Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output
Date: Wed, 7 Apr 2021 21:57:00 +0800
Message-ID: <CAOLTT8S4-ZAjU5qcfep8-bbw+BNM3f-khMXJvQP+an3H6emp8g@mail.gmail.com>
In-Reply-To: <c70a7c17-650a-ae4d-9a90-66c3511f8371@web.de>
René Scharfe <l.s.r@web.de> wrote on Wed, Apr 7, 2021 at 2:34 AM:
>
> Am 05.04.21 um 16:01 schrieb ZheNing Hu via GitGitGadget:
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can provide two single strbuf:
> > final_buf and error_buf that get reused for each output.
> >
> > When run it 100 times:
> >
> > $ git for-each-ref
> >
> > on git.git :
> >
> > 3.19s user
> > 3.88s system
> > 35% cpu
> > 20.199 total
> >
> > to:
> >
> > 2.89s user
> > 4.00s system
> > 34% cpu
> > 19.741 total
> >
> > The performance has been slightly improved.
>
> I like to use hyperfine (https://github.com/sharkdp/hyperfine) to get
> more stable benchmark numbers, incl. standard deviation. With three
> warmup runs I get the following results for running git for-each-ref on
> Git's own repo with the current master (2e36527f23):
>
Yes, hyperfine is really easy to use!
> Benchmark #1: ./git for-each-ref
> Time (mean ± σ): 18.8 ms ± 0.3 ms [User: 12.7 ms, System: 5.6 ms]
> Range (min … max): 18.2 ms … 19.8 ms 148 runs
>
> With your patch on top I get this:
>
> Benchmark #1: ./git for-each-ref
> Time (mean ± σ): 18.5 ms ± 0.4 ms [User: 12.3 ms, System: 5.6 ms]
> Range (min … max): 17.8 ms … 19.6 ms 147 runs
>
> So there seems to be a slight improvement here, but it is within the
> noise.
>
Yeah, I see the same noise when I run this kind of test.
> I'm quite surprised how much longer this takes on your machine, however,
> and (like Peff already mentioned) how much of the total time it spends
> in system calls. Is an antivirus program or similar interfering? Or
> some kind of emulator or similar, e.g. Valgrind? Or has it been a long
> time since you ran "git gc"?
>
Yes, I haven't run `git gc` for a long time.
In addition, network proxy software was running when I did the earlier test,
so there was some extra noise.
> The benchmark certainly depends on the number of local and remote
> branches in the repo; my copy currently has 4304 according to
> "git for-each-ref | wc -l".
>
Yes, I understand this point.
In my git.git, though, "git for-each-ref | wc -l" reports 8716 refs.
> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> > [GSOC] ref-filter: use single strbuf for all output
> >
> > This patch learned Jeff King's optimization measures in git
> > cat-file(79ed0a5): using a single strbuf for all objects output Instead
> > of allocating a large number of small strbuf for every object.
> >
> > So ref-filter can learn same thing: use single buffer: final_buf and
> > error_buf for all refs output.
> >
> > Thanks.
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-927%2Fadlternative%2Fref-filter-single-buf-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-927/adlternative/ref-filter-single-buf-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/927
> >
> > builtin/for-each-ref.c | 4 +++-
> > builtin/tag.c | 4 +++-
> > ref-filter.c | 21 ++++++++++++---------
> > ref-filter.h | 5 ++++-
> > 4 files changed, 22 insertions(+), 12 deletions(-)
> >
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > index cb9c81a04606..9dc41f48bfa0 100644
> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -22,6 +22,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> > struct ref_array array;
> > struct ref_filter filter;
> > struct ref_format format = REF_FORMAT_INIT;
> > + struct strbuf final_buf = STRBUF_INIT;
> > + struct strbuf error_buf = STRBUF_INIT;
> >
> > struct option opts[] = {
> > OPT_BIT('s', "shell", &format.quote_style,
> > @@ -81,7 +83,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> > if (!maxcount || array.nr < maxcount)
> > maxcount = array.nr;
> > for (i = 0; i < maxcount; i++)
> > - show_ref_array_item(array.items[i], &format);
> > + show_ref_array_item(array.items[i], &format, &final_buf, &error_buf);
>
> This user of show_ref_array_item() calls it in a loop on an array.
>
> > ref_array_clear(&array);
> > return 0;
> > }
> > diff --git a/builtin/tag.c b/builtin/tag.c
> > index d403417b5625..8a38b3e2de34 100644
> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -39,6 +39,8 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> > struct ref_format *format)
> > {
> > struct ref_array array;
> > + struct strbuf final_buf = STRBUF_INIT;
> > + struct strbuf error_buf = STRBUF_INIT;
> > char *to_free = NULL;
> > int i;
> >
> > @@ -64,7 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> > ref_array_sort(sorting, &array);
> >
> > for (i = 0; i < array.nr; i++)
> > - show_ref_array_item(array.items[i], format);
> > + show_ref_array_item(array.items[i], format, &final_buf, &error_buf);
>
> Ditto.
>
> > ref_array_clear(&array);
> > free(to_free);
> >
> > diff --git a/ref-filter.c b/ref-filter.c
> > index f0bd32f71416..51ff6af64ebc 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2436,16 +2436,16 @@ int format_ref_array_item(struct ref_array_item *info,
> > }
> >
> > void show_ref_array_item(struct ref_array_item *info,
> > - const struct ref_format *format)
> > + const struct ref_format *format,
> > + struct strbuf *final_buf,
> > + struct strbuf *error_buf)
> > {
> > - struct strbuf final_buf = STRBUF_INIT;
> > - struct strbuf error_buf = STRBUF_INIT;
> >
> > - if (format_ref_array_item(info, format, &final_buf, &error_buf))
> > - die("%s", error_buf.buf);
> > - fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > - strbuf_release(&error_buf);
> > - strbuf_release(&final_buf);
> > + if (format_ref_array_item(info, format, final_buf, error_buf))
> > + die("%s", error_buf->buf);
> > + fwrite(final_buf->buf, 1, final_buf->len, stdout);
> > + strbuf_reset(error_buf);
> > + strbuf_reset(final_buf);
> > putchar('\n');
> > }
> >
> > @@ -2453,9 +2453,12 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
> > const struct ref_format *format)
> > {
> > struct ref_array_item *ref_item;
> > + struct strbuf final_buf = STRBUF_INIT;
> > + struct strbuf error_buf = STRBUF_INIT;
> > +
> > ref_item = new_ref_array_item(name, oid);
> > ref_item->kind = ref_kind_from_refname(name);
> > - show_ref_array_item(ref_item, format);
> > + show_ref_array_item(ref_item, format, &final_buf, &error_buf);
>
> This third and final caller works with a single item; there is no loop.
>
> > free_array_item(ref_item);
> > }
> >
> > diff --git a/ref-filter.h b/ref-filter.h
> > index 19ea4c413409..95498c9f4467 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -120,7 +120,10 @@ int format_ref_array_item(struct ref_array_item *info,
> > struct strbuf *final_buf,
> > struct strbuf *error_buf);
> > /* Print the ref using the given format and quote_style */
> > -void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +void show_ref_array_item(struct ref_array_item *info,
> > + const struct ref_format *format,
> > + struct strbuf *final_buf,
> > + struct strbuf *error_buf);
>
> This bring-your-own-buffer approach pushes responsibilities back to
> the callers, in exchange for improved performance. The number of
> users of this interface is low, so that's defensible. But that added
> effort is also non-trivial -- as you demonstrated by leaking the
> allocated memory. ;-)
>
Yes, this may be a burden for the callers.
> How about offering to do more instead? In particular you could add
> a count parameter and have show_ref_array_item() handle an array of
> struct ref_array_item objects. It could reuse the buffers internally
> to get the same performance benefit, and would free callers from
> having to iterate loops themselves. Something like:
>
> void show_ref_array_items(struct ref_array_item **info,
> size_t n,
> const struct ref_format *format);
>
> Callers that deal with a single element can pass n = 1.
>
> Perhaps the "format" parameter should go first, like with printf.
>
> The double reference in "**info" is a bit ugly, though (array of
> pointers instead of a simple array of objects). That's dictated
> by struct ref_array_item containing a flexible array member, which
> seems to be hard to change.
>
I personally think this idea is great.
This way, there is no need to pass two strbufs in from the outside.
+void show_ref_array_items(struct ref_array_item **info,
+			  const struct ref_format *format,
+			  size_t n)
+{
+	struct strbuf final_buf = STRBUF_INIT;
+	struct strbuf error_buf = STRBUF_INIT;
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+		if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
+			die("%s", error_buf.buf);
+		fwrite(final_buf.buf, 1, final_buf.len, stdout);
+		strbuf_reset(&error_buf);
+		strbuf_reset(&final_buf);
+		putchar('\n');
+	}
+	strbuf_release(&error_buf);
+	strbuf_release(&final_buf);
+}
+
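
To illustrate the reset-in-the-loop, release-once pattern outside of git, here is a minimal standalone sketch. It does not use the real strbuf API: `struct buf`, `buf_add()`, `buf_reset()`, `buf_release()` and `show_items()` are hypothetical stand-ins I made up for this example, and the "ref: " prefix is just placeholder formatting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf; NOT the real API. */
struct buf {
	char *data;
	size_t len, cap;
};

/* Append a NUL-terminated string, growing the allocation if needed. */
static void buf_add(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->cap) {
		b->cap = (b->len + n + 1) * 2;
		b->data = realloc(b->data, b->cap);
		if (!b->data)
			abort();
	}
	memcpy(b->data + b->len, s, n + 1); /* copies the trailing NUL too */
	b->len += n;
}

/* Drop the contents but keep the allocation (the role of strbuf_reset). */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

/* Free the allocation (the role of strbuf_release). */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->cap = 0;
}

/*
 * Print n items, one per line, reusing a single buffer for all of them.
 * Returns the number of payload bytes written (newlines not counted).
 */
static size_t show_items(const char **items, size_t n, FILE *out)
{
	struct buf line = { NULL, 0, 0 };
	size_t i, written = 0;

	assert(items || !n);
	for (i = 0; i < n; i++) {
		buf_add(&line, "ref: ");
		buf_add(&line, items[i]);
		written += fwrite(line.data, 1, line.len, out);
		fputc('\n', out);
		buf_reset(&line); /* reuse the memory for the next item */
	}
	buf_release(&line); /* free once, after the loop */
	return written;
}
```

A caller would invoke it as `show_items(names, count, stdout)`. Because the buffer lives inside the function, the single release after the loop is the only cleanup needed, and callers cannot forget it.
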
And here are the results (with the network proxy program closed):
HEAD~ result:
(git)-[heads/master] % hyperfine "./bin-wrappers/git for-each-ref" --warmup=10
Benchmark #1: ./bin-wrappers/git for-each-ref
Time (mean ± σ): 18.7 ms ± 0.4 ms [User: 14.9 ms, System: 3.9 ms]
Range (min … max): 18.1 ms … 19.8 ms 141 runs
With the new patch:
(git)-[ref-filter-single-buf] % hyperfine "./bin-wrappers/git for-each-ref" --warmup=10
Benchmark #1: ./bin-wrappers/git for-each-ref
Time (mean ± σ): 18.2 ms ± 0.3 ms [User: 14.1 ms, System: 4.2 ms]
Range (min … max): 17.4 ms … 19.2 ms 140 runs
Seems like it does have a small advantage ;-)
> > /* Parse a single sort specifier and add it to the list */
> > void parse_ref_sorting(struct ref_sorting **sorting_tail, const char *atom);
> > /* Callback function for parsing the sort option */
> >
> > base-commit: 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48
> >
>
A new iteration will be sent later.
Thanks!
--
ZheNing Hu