From: Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org>
To: "naohirot@fujitsu.com" <naohirot@fujitsu.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Subject: Re: [PATCH v2 2/5] benchtests: Add memset zero fill benchtest
Date: Mon, 26 Jul 2021 13:22:11 -0400
Message-ID: <CAFUsyfJgPUNBK0tW6LRJbUz+ewVLyzALcMGdE2LoXZ5=+2s0Ww@mail.gmail.com>
In-Reply-To: <TYAPR01MB6025720E6684DFC3BBEC7474DFE89@TYAPR01MB6025.jpnprd01.prod.outlook.com>

On Mon, Jul 26, 2021 at 4:39 AM naohirot@fujitsu.com <naohirot@fujitsu.com> wrote:
> Hi Noah,
>
> > I see. I think 16 for the inner loop makes sense. From the x86_64
> > perspective this will keep the loop from running out of the LSD, which
> > is necessary for accurate benchmarking. I guess then somewhere between
> > [2, 8] is reasonable for the outer loop?
> >
> >
> > > #define START_SIZE (16 * 1024)
> > > ...
> > > static void
> > > __attribute__((noinline, noclone))
> > > do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
> > >              int c1 __attribute ((unused)), int c2 __attribute ((unused)),
> > >              size_t n)
> > > {
> > >   size_t i, j, iters = INNER_LOOP_ITERS; // 32;
> > >   timing_t start, stop, cur, latency = 0;
> > >
> > >   for (i = 0; i < 512; i++) // for (i = 0; i < 2; i++)
> > >     {
> > >       CALL (impl, s, c1, n * 16);
> > >       TIMING_NOW (start);
> > >       for (j = 0; j < 16; j++)
> > >         CALL (impl, s + n * j, c2, n);
> > >       TIMING_NOW (stop);
> > >       TIMING_DIFF (cur, start, stop);
> > >       TIMING_ACCUM (latency, cur);
> > >     }
> > >
> > This looks good. But as you said, use a much smaller value for the outer loop.
>
> I made one improvement, replacing
>   CALL (impl, s, c1, n * 16);
> with
>   __builtin_memset (s, c1, n * 16);
> and tentatively set the outer loop to two iterations, as follows:
>
> -----
> static void
> __attribute__((noinline, noclone))
> do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
>              int c1 __attribute ((unused)), int c2 __attribute ((unused)),
>              size_t n)
> {
>   size_t i, j, iters = 32;
>   timing_t start, stop, cur, latency = 0;
>
>   for (i = 0; i < 2; i++)
>     {
>       __builtin_memset (s, c1, n * 16);
>       TIMING_NOW (start);
>       for (j = 0; j < 16; j++)
>         CALL (impl, s + n * j, c2, n);
>       TIMING_NOW (stop);
>       TIMING_DIFF (cur, start, stop);
>       TIMING_ACCUM (latency, cur);
>     }
>
>   json_element_double (json_ctx, (double) latency / (double) iters);
> }
>
Looks good!
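
For anyone reading along outside the glibc tree, the timing pattern quoted
above can be sketched as a standalone fragment. This is only a rough
illustration under stated assumptions, not the benchtest itself:
clock_gettime stands in for TIMING_NOW/TIMING_DIFF/TIMING_ACCUM, and
bench_zero_fill, memset_fn, now_ns, INNER_ITERS and the 0x5a fill byte are
made-up names and values that do not appear in the patch.

```c
/* Sketch only: mimics the shape of do_one_test with portable POSIX timing.
   All names here are hypothetical and not part of glibc's benchtests.  */
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define INNER_ITERS 16

typedef void *(*memset_fn) (void *, int, size_t);

/* Monotonic timestamp in nanoseconds (stand-in for TIMING_NOW).  */
static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

/* Return the mean time in nanoseconds of one N-byte zero fill by IMPL.  */
static double
bench_zero_fill (memset_fn impl, size_t n, size_t outer_iters)
{
  unsigned char *buf = malloc (n * INNER_ITERS);
  assert (buf != NULL);
  uint64_t total = 0;

  for (size_t i = 0; i < outer_iters; i++)
    {
      /* Untimed: refill the whole buffer with a nonzero byte so each
         timed call below overwrites nonzero data with zeros.  */
      memset (buf, 0x5a, n * INNER_ITERS);

      uint64_t start = now_ns ();
      for (size_t j = 0; j < INNER_ITERS; j++)
        impl (buf + n * j, 0, n);
      total += now_ns () - start;
    }

  /* Read one byte back so the buffer is not entirely unused.  */
  volatile unsigned char sink = buf[n * INNER_ITERS - 1];
  (void) sink;
  free (buf);

  /* Divide by the total number of timed calls (outer * inner).  */
  return (double) total / (double) (outer_iters * INNER_ITERS);
}
```

Calling bench_zero_fill (memset, 16 * 1024, 2) would mirror START_SIZE and
the two outer iterations quoted above, with 2 * 16 = 32 timed calls matching
iters = 32. The sketch does not guard against the compiler eliding or
inlining the calls; the quoted code marks do_one_test noinline and noclone.
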
> -----
>
> In the case of __memset_generic on a64fx, running the outer loop 8 times
> and 2 times took the following:
>
> 8times
> real 0m26.236s
> user 0m18.806s
> sys 0m6.562s
>
> 2times
> real 0m12.956s
> user 0m5.081s
> sys 0m6.594s
>
> The performance difference is shown in a comparison graph [1];
> there is a difference at 16KB.
> This difference would not be critical if we use the performance data
> mainly to compare "before" with "after", such as the master version of
> memset against the patched version of memset.
>
>
> This graph [1] can be drawn as follows:
>
> $ cat 2times/bench-memset-zerofill.out 8times/bench-memset-zerofill.out | \
> > merge_strings4graph.sh __memset_generic 2times 8times | \
> > plot_strings.py -l -p thru -v -
>
>
> In order to use __builtin_memset() and create the comparison graph [1],
> I submitted two groundwork patches [2][3].
>
> [1] https://drive.google.com/file/d/1vD1VE3pdHLoYdaAMWXtImvDlGFDHYkyx/view?usp=sharing
> [2] https://sourceware.org/pipermail/libc-alpha/2021-July/129459.html
> [3] https://sourceware.org/pipermail/libc-alpha/2021-July/129460.html
>
> Thanks.
> Naohiro
>