unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Improve string benchtest timing
@ 2019-05-08 15:39 Wilco Dijkstra
  2019-05-16 14:57 ` Wilco Dijkstra
  2019-05-20 18:44 ` Adhemerval Zanella
  0 siblings, 2 replies; 12+ messages in thread
From: Wilco Dijkstra @ 2019-05-08 15:39 UTC (permalink / raw)
  To: libc-alpha@sourceware.org; +Cc: nd

Improve string benchtest timing.  Many tests run for 0.01s which is way too
short to give accurate results.  Other tests take over 40 seconds which is
way too long.  Significantly increase the iterations of the short running
tests.  Reduce number of alignment variations in the long running memcpy walk
tests so they take less than 5 seconds.

As a result most tests take at least 0.1s and all finish within 5 seconds.

OK for commit?

2019-05-08  Wilco Dijkstra  <wdijkstr@arm.com>

        * benchtests/bench-memcpy-random.c (do_one_test): Use medium iterations.
        * benchtests/bench-memcpy-walk.c (test_main): Reduce alignment tests.
        * benchtests/bench-memmem.c (do_one_test): Use small iterations.
        * benchtests/bench-memmove-walk.c (test_main): Reduce alignment tests.
        * benchtests/bench-memset-walk.c (test_main): Reduce alignment tests.
        * benchtests/bench-strcasestr.c (do_one_test): Use small iterations.
        * benchtests/bench-string.h (INNER_LOOP_ITERS): Increase iterations.
        (INNER_LOOP_ITERS_MEDIUM): New define.
        (INNER_LOOP_ITERS_SMALL): New define.
        * benchtests/bench-strpbrk.c (do_one_test): Use medium iterations.
        * benchtests/bench-strsep.c (do_one_test): Use small iterations.
        * benchtests/bench-strspn.c (do_one_test): Use medium iterations.
        * benchtests/bench-strstr.c (do_one_test): Use small iterations.
        * benchtests/bench-strtok.c (do_one_test): Use small iterations.

--

diff --git a/benchtests/bench-memcpy-random.c b/benchtests/bench-memcpy-random.c
index 873133d9a6988af8a251e1035ab8bded277d5775..f2c2e9dc3db29ca269514874a4976eec13d9d685 100644
--- a/benchtests/bench-memcpy-random.c
+++ b/benchtests/bench-memcpy-random.c
@@ -101,7 +101,7 @@ do_one_test (json_ctx_t *json_ctx, impl_t *impl, char *dst, char *src,
 	     copy_t *copy, size_t n)
 {
   timing_t start, stop, cur;
-  size_t iters = INNER_LOOP_ITERS * 20;
+  size_t iters = INNER_LOOP_ITERS_MEDIUM;
 
   TIMING_NOW (start);
   for (int i = 0; i < iters; ++i)
diff --git a/benchtests/bench-memcpy-walk.c b/benchtests/bench-memcpy-walk.c
index 16b455f7796b9b3b9288fb6e5245bed006967970..397a1e98df4f7bd069865262470c40a58665bcd6 100644
--- a/benchtests/bench-memcpy-walk.c
+++ b/benchtests/bench-memcpy-walk.c
@@ -103,12 +103,8 @@ test_main (void)
   json_array_begin (&json_ctx, "results");
   for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
     {
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < 8; j++)
-	{
-	  do_test (&json_ctx, i + j);
-	  do_test (&json_ctx, i + 16 - j);
-	}
+      do_test (&json_ctx, i);
+      do_test (&json_ctx, i + 1);
     }
 
   json_array_end (&json_ctx);
diff --git a/benchtests/bench-memmem.c b/benchtests/bench-memmem.c
index b6b97f3d1f5f94a71c177cc516cba81c869d3865..42778b07658efbe9d8027e04dc4682aba5ab8797 100644
--- a/benchtests/bench-memmem.c
+++ b/benchtests/bench-memmem.c
@@ -105,7 +105,7 @@ static void
 do_one_test (impl_t *impl, const void *haystack, size_t haystack_len,
 	     const void *needle, size_t needle_len, const void *expected)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
 
   TIMING_NOW (start);
diff --git a/benchtests/bench-memmove-walk.c b/benchtests/bench-memmove-walk.c
index 7b4c31e63f6f3d39beb8ec8da8bfffc82f24d1ac..2b4a7e4f2966b930a8488dd02647639ffbfddc73 100644
--- a/benchtests/bench-memmove-walk.c
+++ b/benchtests/bench-memmove-walk.c
@@ -107,23 +107,15 @@ test_main (void)
   /* Non-overlapping buffers.  */
   for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
     {
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < 8; j++)
-	{
-	  do_test (&json_ctx, i + j, false);
-	  do_test (&json_ctx, i + 16 - j, false);
-	}
+      do_test (&json_ctx, i, false);
+      do_test (&json_ctx, i + 1, false);
     }
 
   /* Overlapping buffers.  */
   for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
     {
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < 8; j++)
-	{
-	  do_test (&json_ctx, i + j, true);
-	  do_test (&json_ctx, i + 16 - j, true);
-	}
+      do_test (&json_ctx, i, true);
+      do_test (&json_ctx, i + 1, true);
     }
 
   json_array_end (&json_ctx);
diff --git a/benchtests/bench-memset-walk.c b/benchtests/bench-memset-walk.c
index 5f43671216e2294856df35a865061740de7d7d67..930aae34d10603c80a7c40d123fb0376cde56436 100644
--- a/benchtests/bench-memset-walk.c
+++ b/benchtests/bench-memset-walk.c
@@ -112,13 +112,16 @@ test_main (void)
 
   json_array_begin (&json_ctx, "results");
   for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < i && j < 16; j++)
-	do_test (&json_ctx, 65, i + j);
+    {
+      do_test (&json_ctx, 65, i);
+      do_test (&json_ctx, 65, i + 1);
+    }
 
   for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
-      for (int j = 0; j < i && j < 16; j++)
-	do_test (&json_ctx, 0, i + j);
+    {
+      do_test (&json_ctx, 0, i);
+      do_test (&json_ctx, 0, i + 1);
+    }
 
   json_array_end (&json_ctx);
   json_attr_object_end (&json_ctx);
diff --git a/benchtests/bench-strcasestr.c b/benchtests/bench-strcasestr.c
index 1458070d47cbfe5e27d40288f8ac7f83df735d08..d701ff4f364b35bd5223831fc8df7b468e183ceb 100644
--- a/benchtests/bench-strcasestr.c
+++ b/benchtests/bench-strcasestr.c
@@ -36,7 +36,7 @@ IMPL (strcasestr, 1)
 static void
 do_one_test (impl_t *impl, const char *s1, const char *s2, char *exp_result)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
 
   TIMING_NOW (start);
diff --git a/benchtests/bench-string.h b/benchtests/bench-string.h
index 9d3332a624c54b22884b2f90444c841a67f4baeb..e7fb015bc639d97e702e2c0e3c9aee86c9537e2c 100644
--- a/benchtests/bench-string.h
+++ b/benchtests/bench-string.h
@@ -124,7 +124,9 @@ extern impl_t __start_impls[], __stop_impls[];
 # define OPT_RANDOM 10001
 # define OPT_SEED 10002
 
-# define INNER_LOOP_ITERS 64
+# define INNER_LOOP_ITERS 8192
+# define INNER_LOOP_ITERS_MEDIUM 1024
+# define INNER_LOOP_ITERS_SMALL 32
 
 int ret, do_srandom;
 unsigned int seed;
diff --git a/benchtests/bench-strpbrk.c b/benchtests/bench-strpbrk.c
index 14e1bcb873fa86caa1f50a9013f52cd422e74185..e484ebd0c08df802e67578e613535f65ad233bdc 100644
--- a/benchtests/bench-strpbrk.c
+++ b/benchtests/bench-strpbrk.c
@@ -66,7 +66,7 @@ static void
 do_one_test (impl_t *impl, const CHAR *s, const CHAR *rej, RES_TYPE exp_res)
 {
   RES_TYPE res = CALL (impl, s, rej);
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_MEDIUM;
   timing_t start, stop, cur;
 
   if (res != exp_res)
diff --git a/benchtests/bench-strsep.c b/benchtests/bench-strsep.c
index fea7fbdfa6f2dcc8601354f56558dc3d83f07eef..96d3be52221f40ed1afeda1896921a04215a6b85 100644
--- a/benchtests/bench-strsep.c
+++ b/benchtests/bench-strsep.c
@@ -103,7 +103,7 @@ IMPL (oldstrsep, 2)
 static void
 do_one_test (impl_t * impl, const char *s1, const char *s2)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
 
   TIMING_NOW (start);
diff --git a/benchtests/bench-strspn.c b/benchtests/bench-strspn.c
index ac874fe234af66ad98a6188099f32c10484452db..ed97c5fc6205825531cb7e0b6f349223be0c3f66 100644
--- a/benchtests/bench-strspn.c
+++ b/benchtests/bench-strspn.c
@@ -60,7 +60,7 @@ SIMPLE_STRSPN (const CHAR *s, const CHAR *acc)
 static void
 do_one_test (impl_t *impl, const CHAR *s, const CHAR *acc, size_t exp_res)
 {
-  size_t res = CALL (impl, s, acc), i, iters = INNER_LOOP_ITERS;
+  size_t res = CALL (impl, s, acc), i, iters = INNER_LOOP_ITERS_MEDIUM;
   timing_t start, stop, cur;
 
   if (res != exp_res)
diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c
index 6e45cb7c901fef4319ad62c8ba8ebb9109f12c28..1769dc2a102b3978ea51130e2dc7bb91c8b8947c 100644
--- a/benchtests/bench-strstr.c
+++ b/benchtests/bench-strstr.c
@@ -131,7 +131,7 @@ IMPL (basic_strstr, 0)
 static void
 do_one_test (impl_t *impl, const char *s1, const char *s2, char *exp_result)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
   char *res;
 
diff --git a/benchtests/bench-strtok.c b/benchtests/bench-strtok.c
index 15ae76e2373792781930a3d73418ef81fecfe645..f5ab587dd94ab5035632fbf78f4058cc62d72acb 100644
--- a/benchtests/bench-strtok.c
+++ b/benchtests/bench-strtok.c
@@ -60,7 +60,7 @@ IMPL (strtok, 1)
 static void
 do_one_test (impl_t * impl, const char *s1, const char *s2)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
   TIMING_NOW (start);
   for (i = 0; i < iters; ++i)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-08 15:39 [PATCH] Improve string benchtest timing Wilco Dijkstra
@ 2019-05-16 14:57 ` Wilco Dijkstra
  2019-05-20 18:44 ` Adhemerval Zanella
  1 sibling, 0 replies; 12+ messages in thread
From: Wilco Dijkstra @ 2019-05-16 14:57 UTC (permalink / raw)
  To: libc-alpha@sourceware.org; +Cc: nd


ping
  

Improve string benchtest timing.  Many tests run for 0.01s which is way too
short to give accurate results.  Other tests take over 40 seconds which is
way too long.  Significantly increase the iterations of the short running
tests.  Reduce number of alignment variations in the long running memcpy walk
tests so they take less than 5 seconds.

As a result most tests take at least 0.1s and all finish within 5 seconds.

OK for commit?

2019-05-08  Wilco Dijkstra  <wdijkstr@arm.com>

        * benchtests/bench-memcpy-random.c (do_one_test): Use medium iterations.
        * benchtests/bench-memcpy-walk.c (test_main): Reduce alignment tests.
        * benchtests/bench-memmem.c (do_one_test): Use small iterations.
        * benchtests/bench-memmove-walk.c (test_main): Reduce alignment tests.
        * benchtests/bench-memset-walk.c (test_main): Reduce alignment tests.
        * benchtests/bench-strcasestr.c (do_one_test): Use small iterations.
        * benchtests/bench-string.h (INNER_LOOP_ITERS): Increase iterations.
        (INNER_LOOP_ITERS_MEDIUM): New define.
        (INNER_LOOP_ITERS_SMALL): New define.
        * benchtests/bench-strpbrk.c (do_one_test): Use medium iterations.
        * benchtests/bench-strsep.c (do_one_test): Use small iterations.
        * benchtests/bench-strspn.c (do_one_test): Use medium iterations.
        * benchtests/bench-strstr.c (do_one_test): Use small iterations.
        * benchtests/bench-strtok.c (do_one_test): Use small iterations.

--

diff --git a/benchtests/bench-memcpy-random.c b/benchtests/bench-memcpy-random.c
index 873133d9a6988af8a251e1035ab8bded277d5775..f2c2e9dc3db29ca269514874a4976eec13d9d685 100644
--- a/benchtests/bench-memcpy-random.c
+++ b/benchtests/bench-memcpy-random.c
@@ -101,7 +101,7 @@ do_one_test (json_ctx_t *json_ctx, impl_t *impl, char *dst, char *src,
              copy_t *copy, size_t n)
 {
   timing_t start, stop, cur;
-  size_t iters = INNER_LOOP_ITERS * 20;
+  size_t iters = INNER_LOOP_ITERS_MEDIUM;
 
   TIMING_NOW (start);
   for (int i = 0; i < iters; ++i)
diff --git a/benchtests/bench-memcpy-walk.c b/benchtests/bench-memcpy-walk.c
index 16b455f7796b9b3b9288fb6e5245bed006967970..397a1e98df4f7bd069865262470c40a58665bcd6 100644
--- a/benchtests/bench-memcpy-walk.c
+++ b/benchtests/bench-memcpy-walk.c
@@ -103,12 +103,8 @@ test_main (void)
   json_array_begin (&json_ctx, "results");
   for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
     {
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < 8; j++)
-       {
-         do_test (&json_ctx, i + j);
-         do_test (&json_ctx, i + 16 - j);
-       }
+      do_test (&json_ctx, i);
+      do_test (&json_ctx, i + 1);
     }
 
   json_array_end (&json_ctx);
diff --git a/benchtests/bench-memmem.c b/benchtests/bench-memmem.c
index b6b97f3d1f5f94a71c177cc516cba81c869d3865..42778b07658efbe9d8027e04dc4682aba5ab8797 100644
--- a/benchtests/bench-memmem.c
+++ b/benchtests/bench-memmem.c
@@ -105,7 +105,7 @@ static void
 do_one_test (impl_t *impl, const void *haystack, size_t haystack_len,
              const void *needle, size_t needle_len, const void *expected)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
 
   TIMING_NOW (start);
diff --git a/benchtests/bench-memmove-walk.c b/benchtests/bench-memmove-walk.c
index 7b4c31e63f6f3d39beb8ec8da8bfffc82f24d1ac..2b4a7e4f2966b930a8488dd02647639ffbfddc73 100644
--- a/benchtests/bench-memmove-walk.c
+++ b/benchtests/bench-memmove-walk.c
@@ -107,23 +107,15 @@ test_main (void)
   /* Non-overlapping buffers.  */
   for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
     {
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < 8; j++)
-       {
-         do_test (&json_ctx, i + j, false);
-         do_test (&json_ctx, i + 16 - j, false);
-       }
+      do_test (&json_ctx, i, false);
+      do_test (&json_ctx, i + 1, false);
     }
 
   /* Overlapping buffers.  */
   for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
     {
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < 8; j++)
-       {
-         do_test (&json_ctx, i + j, true);
-         do_test (&json_ctx, i + 16 - j, true);
-       }
+      do_test (&json_ctx, i, true);
+      do_test (&json_ctx, i + 1, true);
     }
 
   json_array_end (&json_ctx);
diff --git a/benchtests/bench-memset-walk.c b/benchtests/bench-memset-walk.c
index 5f43671216e2294856df35a865061740de7d7d67..930aae34d10603c80a7c40d123fb0376cde56436 100644
--- a/benchtests/bench-memset-walk.c
+++ b/benchtests/bench-memset-walk.c
@@ -112,13 +112,16 @@ test_main (void)
 
   json_array_begin (&json_ctx, "results");
   for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
-      /* Test length alignments from 0-16 bytes.  */
-      for (int j = 0; j < i && j < 16; j++)
-       do_test (&json_ctx, 65, i + j);
+    {
+      do_test (&json_ctx, 65, i);
+      do_test (&json_ctx, 65, i + 1);
+    }
 
   for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
-      for (int j = 0; j < i && j < 16; j++)
-       do_test (&json_ctx, 0, i + j);
+    {
+      do_test (&json_ctx, 0, i);
+      do_test (&json_ctx, 0, i + 1);
+    }
 
   json_array_end (&json_ctx);
   json_attr_object_end (&json_ctx);
diff --git a/benchtests/bench-strcasestr.c b/benchtests/bench-strcasestr.c
index 1458070d47cbfe5e27d40288f8ac7f83df735d08..d701ff4f364b35bd5223831fc8df7b468e183ceb 100644
--- a/benchtests/bench-strcasestr.c
+++ b/benchtests/bench-strcasestr.c
@@ -36,7 +36,7 @@ IMPL (strcasestr, 1)
 static void
 do_one_test (impl_t *impl, const char *s1, const char *s2, char *exp_result)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
 
   TIMING_NOW (start);
diff --git a/benchtests/bench-string.h b/benchtests/bench-string.h
index 9d3332a624c54b22884b2f90444c841a67f4baeb..e7fb015bc639d97e702e2c0e3c9aee86c9537e2c 100644
--- a/benchtests/bench-string.h
+++ b/benchtests/bench-string.h
@@ -124,7 +124,9 @@ extern impl_t __start_impls[], __stop_impls[];
 # define OPT_RANDOM 10001
 # define OPT_SEED 10002
 
-# define INNER_LOOP_ITERS 64
+# define INNER_LOOP_ITERS 8192
+# define INNER_LOOP_ITERS_MEDIUM 1024
+# define INNER_LOOP_ITERS_SMALL 32
 
 int ret, do_srandom;
 unsigned int seed;
diff --git a/benchtests/bench-strpbrk.c b/benchtests/bench-strpbrk.c
index 14e1bcb873fa86caa1f50a9013f52cd422e74185..e484ebd0c08df802e67578e613535f65ad233bdc 100644
--- a/benchtests/bench-strpbrk.c
+++ b/benchtests/bench-strpbrk.c
@@ -66,7 +66,7 @@ static void
 do_one_test (impl_t *impl, const CHAR *s, const CHAR *rej, RES_TYPE exp_res)
 {
   RES_TYPE res = CALL (impl, s, rej);
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_MEDIUM;
   timing_t start, stop, cur;
 
   if (res != exp_res)
diff --git a/benchtests/bench-strsep.c b/benchtests/bench-strsep.c
index fea7fbdfa6f2dcc8601354f56558dc3d83f07eef..96d3be52221f40ed1afeda1896921a04215a6b85 100644
--- a/benchtests/bench-strsep.c
+++ b/benchtests/bench-strsep.c
@@ -103,7 +103,7 @@ IMPL (oldstrsep, 2)
 static void
 do_one_test (impl_t * impl, const char *s1, const char *s2)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
 
   TIMING_NOW (start);
diff --git a/benchtests/bench-strspn.c b/benchtests/bench-strspn.c
index ac874fe234af66ad98a6188099f32c10484452db..ed97c5fc6205825531cb7e0b6f349223be0c3f66 100644
--- a/benchtests/bench-strspn.c
+++ b/benchtests/bench-strspn.c
@@ -60,7 +60,7 @@ SIMPLE_STRSPN (const CHAR *s, const CHAR *acc)
 static void
 do_one_test (impl_t *impl, const CHAR *s, const CHAR *acc, size_t exp_res)
 {
-  size_t res = CALL (impl, s, acc), i, iters = INNER_LOOP_ITERS;
+  size_t res = CALL (impl, s, acc), i, iters = INNER_LOOP_ITERS_MEDIUM;
   timing_t start, stop, cur;
 
   if (res != exp_res)
diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c
index 6e45cb7c901fef4319ad62c8ba8ebb9109f12c28..1769dc2a102b3978ea51130e2dc7bb91c8b8947c 100644
--- a/benchtests/bench-strstr.c
+++ b/benchtests/bench-strstr.c
@@ -131,7 +131,7 @@ IMPL (basic_strstr, 0)
 static void
 do_one_test (impl_t *impl, const char *s1, const char *s2, char *exp_result)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
   char *res;
 
diff --git a/benchtests/bench-strtok.c b/benchtests/bench-strtok.c
index 15ae76e2373792781930a3d73418ef81fecfe645..f5ab587dd94ab5035632fbf78f4058cc62d72acb 100644
--- a/benchtests/bench-strtok.c
+++ b/benchtests/bench-strtok.c
@@ -60,7 +60,7 @@ IMPL (strtok, 1)
 static void
 do_one_test (impl_t * impl, const char *s1, const char *s2)
 {
-  size_t i, iters = INNER_LOOP_ITERS;
+  size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
   TIMING_NOW (start);
   for (i = 0; i < iters; ++i)
    

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-08 15:39 [PATCH] Improve string benchtest timing Wilco Dijkstra
  2019-05-16 14:57 ` Wilco Dijkstra
@ 2019-05-20 18:44 ` Adhemerval Zanella
  2019-05-21  7:01   ` Siddhesh Poyarekar
  1 sibling, 1 reply; 12+ messages in thread
From: Adhemerval Zanella @ 2019-05-20 18:44 UTC (permalink / raw)
  To: libc-alpha; +Cc: Siddhesh Poyarekar, Wilco Dijkstra



On 08/05/2019 12:39, Wilco Dijkstra wrote:
> Improve string benchtest timing.  Many tests run for 0.01s which is way too
> short to give accurate results.  Other tests take over 40 seconds which is
> way too long.  Significantly increase the iterations of the short running
> tests.  Reduce number of alignment variations in the long running memcpy walk
> tests so they take less than 5 seconds.
> 
> As a result most tests take at least 0.1s and all finish within 5 seconds.
> 
> OK for commit?
> 
> 2019-05-08  Wilco Dijkstra  <wdijkstr@arm.com>
> 
>         * benchtests/bench-memcpy-random.c (do_one_test): Use medium iterations.
>         * benchtests/bench-memcpy-walk.c (test_main): Reduce alignment tests.
>         * benchtests/bench-memmem.c (do_one_test): Use small iterations.
>         * benchtests/bench-memmove-walk.c (test_main): Reduce alignment tests.
>         * benchtests/bench-memset-walk.c (test_main): Reduce alignment tests.
>         * benchtests/bench-strcasestr.c (do_one_test): Use small iterations.
>         * benchtests/bench-string.h (INNER_LOOP_ITERS): Increase iterations.
>         (INNER_LOOP_ITERS_MEDIUM): New define.
>         (INNER_LOOP_ITERS_SMALL): New define.
>         * benchtests/bench-strpbrk.c (do_one_test): Use medium iterations.
>         * benchtests/bench-strsep.c (do_one_test): Use small iterations.
>         * benchtests/bench-strspn.c (do_one_test): Use medium iterations.
>         * benchtests/bench-strstr.c (do_one_test): Use small iterations.
>         * benchtests/bench-strtok.c (do_one_test): Use small iterations.
> 
> --
> 
> diff --git a/benchtests/bench-memcpy-random.c b/benchtests/bench-memcpy-random.c
> index 873133d9a6988af8a251e1035ab8bded277d5775..f2c2e9dc3db29ca269514874a4976eec13d9d685 100644
> --- a/benchtests/bench-memcpy-random.c
> +++ b/benchtests/bench-memcpy-random.c
> @@ -101,7 +101,7 @@ do_one_test (json_ctx_t *json_ctx, impl_t *impl, char *dst, char *src,
>  	     copy_t *copy, size_t n)
>  {
>    timing_t start, stop, cur;
> -  size_t iters = INNER_LOOP_ITERS * 20;
> +  size_t iters = INNER_LOOP_ITERS_MEDIUM;
>  
>    TIMING_NOW (start);
>    for (int i = 0; i < iters; ++i)

Ok.

> diff --git a/benchtests/bench-memcpy-walk.c b/benchtests/bench-memcpy-walk.c
> index 16b455f7796b9b3b9288fb6e5245bed006967970..397a1e98df4f7bd069865262470c40a58665bcd6 100644
> --- a/benchtests/bench-memcpy-walk.c
> +++ b/benchtests/bench-memcpy-walk.c
> @@ -103,12 +103,8 @@ test_main (void)
>    json_array_begin (&json_ctx, "results");
>    for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
>      {
> -      /* Test length alignments from 0-16 bytes.  */
> -      for (int j = 0; j < 8; j++)
> -	{
> -	  do_test (&json_ctx, i + j);
> -	  do_test (&json_ctx, i + 16 - j);
> -	}
> +      do_test (&json_ctx, i);
> +      do_test (&json_ctx, i + 1);
>      }
>  
>    json_array_end (&json_ctx);

Reading the initial benchmark comment, it seems the original intention is
indeed to measure some sizes different than the multiple of START_SIZE
(my guess is to measure some tail handling on memcpy).  So either adjust
the comment accordingly to reflect what the benchmark is now measuring
or maybe add some sizes non multiple of START_SIZE.

Siddhesh, do you recall why did you added the 'misaligning' sizes?

> diff --git a/benchtests/bench-memmem.c b/benchtests/bench-memmem.c
> index b6b97f3d1f5f94a71c177cc516cba81c869d3865..42778b07658efbe9d8027e04dc4682aba5ab8797 100644
> --- a/benchtests/bench-memmem.c
> +++ b/benchtests/bench-memmem.c
> @@ -105,7 +105,7 @@ static void
>  do_one_test (impl_t *impl, const void *haystack, size_t haystack_len,
>  	     const void *needle, size_t needle_len, const void *expected)
>  {
> -  size_t i, iters = INNER_LOOP_ITERS;
> +  size_t i, iters = INNER_LOOP_ITERS_SMALL;
>    timing_t start, stop, cur;
>  
>    TIMING_NOW (start);

Ok.

> diff --git a/benchtests/bench-memmove-walk.c b/benchtests/bench-memmove-walk.c
> index 7b4c31e63f6f3d39beb8ec8da8bfffc82f24d1ac..2b4a7e4f2966b930a8488dd02647639ffbfddc73 100644
> --- a/benchtests/bench-memmove-walk.c
> +++ b/benchtests/bench-memmove-walk.c
> @@ -107,23 +107,15 @@ test_main (void)
>    /* Non-overlapping buffers.  */
>    for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
>      {
> -      /* Test length alignments from 0-16 bytes.  */
> -      for (int j = 0; j < 8; j++)
> -	{
> -	  do_test (&json_ctx, i + j, false);
> -	  do_test (&json_ctx, i + 16 - j, false);
> -	}
> +      do_test (&json_ctx, i, false);
> +      do_test (&json_ctx, i + 1, false);
>      }
>  
>    /* Overlapping buffers.  */
>    for (size_t i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
>      {
> -      /* Test length alignments from 0-16 bytes.  */
> -      for (int j = 0; j < 8; j++)
> -	{
> -	  do_test (&json_ctx, i + j, true);
> -	  do_test (&json_ctx, i + 16 - j, true);
> -	}
> +      do_test (&json_ctx, i, true);
> +      do_test (&json_ctx, i + 1, true);
>      }
>  
>    json_array_end (&json_ctx);

Same comment from bench-memcpy-walk.c.

> diff --git a/benchtests/bench-memset-walk.c b/benchtests/bench-memset-walk.c
> index 5f43671216e2294856df35a865061740de7d7d67..930aae34d10603c80a7c40d123fb0376cde56436 100644
> --- a/benchtests/bench-memset-walk.c
> +++ b/benchtests/bench-memset-walk.c
> @@ -112,13 +112,16 @@ test_main (void)
>  
>    json_array_begin (&json_ctx, "results");
>    for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
> -      /* Test length alignments from 0-16 bytes.  */
> -      for (int j = 0; j < i && j < 16; j++)
> -	do_test (&json_ctx, 65, i + j);
> +    {
> +      do_test (&json_ctx, 65, i);
> +      do_test (&json_ctx, 65, i + 1);
> +    }
>  
>    for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
> -      for (int j = 0; j < i && j < 16; j++)
> -	do_test (&json_ctx, 0, i + j);
> +    {
> +      do_test (&json_ctx, 0, i);
> +      do_test (&json_ctx, 0, i + 1);
> +    }
>  
>    json_array_end (&json_ctx);
>    json_attr_object_end (&json_ctx);

I don't have a strong opinion here, but maybe use the size of the uintptr_t
to add some 'misaligning' to size?

> diff --git a/benchtests/bench-strcasestr.c b/benchtests/bench-strcasestr.c
> index 1458070d47cbfe5e27d40288f8ac7f83df735d08..d701ff4f364b35bd5223831fc8df7b468e183ceb 100644
> --- a/benchtests/bench-strcasestr.c
> +++ b/benchtests/bench-strcasestr.c
> @@ -36,7 +36,7 @@ IMPL (strcasestr, 1)
>  static void
>  do_one_test (impl_t *impl, const char *s1, const char *s2, char *exp_result)
>  {
> -  size_t i, iters = INNER_LOOP_ITERS;
> +  size_t i, iters = INNER_LOOP_ITERS_SMALL;
>    timing_t start, stop, cur;
>  
>    TIMING_NOW (start);

Ok.

> diff --git a/benchtests/bench-string.h b/benchtests/bench-string.h
> index 9d3332a624c54b22884b2f90444c841a67f4baeb..e7fb015bc639d97e702e2c0e3c9aee86c9537e2c 100644
> --- a/benchtests/bench-string.h
> +++ b/benchtests/bench-string.h
> @@ -124,7 +124,9 @@ extern impl_t __start_impls[], __stop_impls[];
>  # define OPT_RANDOM 10001
>  # define OPT_SEED 10002
>  
> -# define INNER_LOOP_ITERS 64
> +# define INNER_LOOP_ITERS 8192
> +# define INNER_LOOP_ITERS_MEDIUM 1024
> +# define INNER_LOOP_ITERS_SMALL 32
>  
>  int ret, do_srandom;
>  unsigned int seed;

Ok.

> diff --git a/benchtests/bench-strpbrk.c b/benchtests/bench-strpbrk.c
> index 14e1bcb873fa86caa1f50a9013f52cd422e74185..e484ebd0c08df802e67578e613535f65ad233bdc 100644
> --- a/benchtests/bench-strpbrk.c
> +++ b/benchtests/bench-strpbrk.c
> @@ -66,7 +66,7 @@ static void
>  do_one_test (impl_t *impl, const CHAR *s, const CHAR *rej, RES_TYPE exp_res)
>  {
>    RES_TYPE res = CALL (impl, s, rej);
> -  size_t i, iters = INNER_LOOP_ITERS;
> +  size_t i, iters = INNER_LOOP_ITERS_MEDIUM;
>    timing_t start, stop, cur;
>  
>    if (res != exp_res)

Ok.

> diff --git a/benchtests/bench-strsep.c b/benchtests/bench-strsep.c
> index fea7fbdfa6f2dcc8601354f56558dc3d83f07eef..96d3be52221f40ed1afeda1896921a04215a6b85 100644
> --- a/benchtests/bench-strsep.c
> +++ b/benchtests/bench-strsep.c
> @@ -103,7 +103,7 @@ IMPL (oldstrsep, 2)
>  static void
>  do_one_test (impl_t * impl, const char *s1, const char *s2)
>  {
> -  size_t i, iters = INNER_LOOP_ITERS;
> +  size_t i, iters = INNER_LOOP_ITERS_SMALL;
>    timing_t start, stop, cur;
>  
>    TIMING_NOW (start);

Ok.

> diff --git a/benchtests/bench-strspn.c b/benchtests/bench-strspn.c
> index ac874fe234af66ad98a6188099f32c10484452db..ed97c5fc6205825531cb7e0b6f349223be0c3f66 100644
> --- a/benchtests/bench-strspn.c
> +++ b/benchtests/bench-strspn.c
> @@ -60,7 +60,7 @@ SIMPLE_STRSPN (const CHAR *s, const CHAR *acc)
>  static void
>  do_one_test (impl_t *impl, const CHAR *s, const CHAR *acc, size_t exp_res)
>  {
> -  size_t res = CALL (impl, s, acc), i, iters = INNER_LOOP_ITERS;
> +  size_t res = CALL (impl, s, acc), i, iters = INNER_LOOP_ITERS_MEDIUM;
>    timing_t start, stop, cur;
>  
>    if (res != exp_res)

Ok.

> diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c
> index 6e45cb7c901fef4319ad62c8ba8ebb9109f12c28..1769dc2a102b3978ea51130e2dc7bb91c8b8947c 100644
> --- a/benchtests/bench-strstr.c
> +++ b/benchtests/bench-strstr.c
> @@ -131,7 +131,7 @@ IMPL (basic_strstr, 0)
>  static void
>  do_one_test (impl_t *impl, const char *s1, const char *s2, char *exp_result)
>  {
> -  size_t i, iters = INNER_LOOP_ITERS;
> +  size_t i, iters = INNER_LOOP_ITERS_SMALL;
>    timing_t start, stop, cur;
>    char *res;
>  

Ok.

> diff --git a/benchtests/bench-strtok.c b/benchtests/bench-strtok.c
> index 15ae76e2373792781930a3d73418ef81fecfe645..f5ab587dd94ab5035632fbf78f4058cc62d72acb 100644
> --- a/benchtests/bench-strtok.c
> +++ b/benchtests/bench-strtok.c
> @@ -60,7 +60,7 @@ IMPL (strtok, 1)
>  static void
>  do_one_test (impl_t * impl, const char *s1, const char *s2)
>  {
> -  size_t i, iters = INNER_LOOP_ITERS;
> +  size_t i, iters = INNER_LOOP_ITERS_SMALL;
>    timing_t start, stop, cur;
>    TIMING_NOW (start);
>    for (i = 0; i < iters; ++i)
> 

Ok.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-20 18:44 ` Adhemerval Zanella
@ 2019-05-21  7:01   ` Siddhesh Poyarekar
  2019-05-21 13:58     ` Wilco Dijkstra
  0 siblings, 1 reply; 12+ messages in thread
From: Siddhesh Poyarekar @ 2019-05-21  7:01 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha; +Cc: Wilco Dijkstra

On 21/05/19 12:14 AM, Adhemerval Zanella wrote:
> Reading the initial benchmark comment, it seems the original intention is
> indeed to measure some sizes different than the multiple of START_SIZE
> (my guess is to measure some tail handling on memcpy).  So either adjust
> the comment accordingly to reflect what the benchmark is now measuring
> or maybe add some sizes non multiple of START_SIZE.
> 
> Siddhesh, do you recall why did you added the 'misaligning' sizes?

The misalignment was a simulation of a proprietary workload that was
used to tune memcpy when I wrote it.  This proposed change skews the
numbers in favour of aligned inputs, which is not really a bad thing.
I'm OK with this bit.

Siddhesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-21  7:01   ` Siddhesh Poyarekar
@ 2019-05-21 13:58     ` Wilco Dijkstra
  2019-05-21 14:11       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2019-05-21 13:58 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Adhemerval Zanella, libc-alpha@sourceware.org; +Cc: nd

Hi Siddhesh,

>> Reading the initial benchmark comment, it seems the original intention is
>> indeed to measure some sizes different than the multiple of START_SIZE
>> (my guess is to measure some tail handling on memcpy).  So either adjust
>> the comment accordingly to reflect what the benchmark is now measuring
>> or maybe add some sizes non multiple of START_SIZE.
>> 
>> Siddhesh, do you recall why did you added the 'misaligning' sizes?
>
> The misalignment was a simulation of a proprietary workload that was
> used to tune memcpy when I wrote it.  This proposed change skews the
> numbers in favour of aligned inputs, which is not really a bad thing.
> I'm OK with this bit.

Well the test doesn't actually test misaligned copies - both source and
destination are always mutually aligned, so any memcpy implementation which
aligns either source or destination will only do aligned copies.

In any case I'm not sure what the test is supposed to measure - the scores are
identical across all memcpy implementations. The time taken for double the
copy size is exactly twice as much.

Wilco

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-21 13:58     ` Wilco Dijkstra
@ 2019-05-21 14:11       ` Siddhesh Poyarekar
  2019-05-21 15:11         ` Patrick McGehearty
  2019-05-22 11:11         ` Wilco Dijkstra
  0 siblings, 2 replies; 12+ messages in thread
From: Siddhesh Poyarekar @ 2019-05-21 14:11 UTC (permalink / raw)
  To: Wilco Dijkstra, Adhemerval Zanella, libc-alpha@sourceware.org; +Cc: nd

On 21/05/19 7:28 PM, Wilco Dijkstra wrote:
> Well the test doesn't actually test misaligned copies - both source and
> destination are always mutually aligned, so any memcpy implementation which
> aligns either source or destination will only do aligned copies.

They were not intended to be mutually misaligned, that was not the
intent of the benchmark since the target application it modeled did not
have such inputs.

> In any case I'm not sure what the test is supposed to measure - the scores are
> identical across all memcpy implementations. The time taken for double the
> copy size is exactly twice as much.

Right, you'll probably only see differences in case of mutually
misaligned inputs.

Siddhesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-21 14:11       ` Siddhesh Poyarekar
@ 2019-05-21 15:11         ` Patrick McGehearty
  2019-05-22 11:11         ` Wilco Dijkstra
  1 sibling, 0 replies; 12+ messages in thread
From: Patrick McGehearty @ 2019-05-21 15:11 UTC (permalink / raw)
  To: libc-alpha

General comment on performance test design:

The idea that it is worthwhile to measure memcpy with
misaligned src/dest that are mutually aligned shows
how exposure to particular implementations can
create blind spots in implementation. I'm speaking
of my own blind spot in this issue as I've been
involved in tuning platform specific memcpy
on Sparc/Solaris for over a decade and it never
occurred to me to test that particular case.
That's because the first memcpy code I worked from
[optimized for the Sparc Cheetah processor, first
available almost twenty years ago] had an early
test to determine if the src/dest were mutually
aligned and if so then move a few bytes until
they were long word aligned. Thus, for medium length
or longer copies, the performance was essentially
the same for aligned copies with aligned starting points
and aligned copies with misaligned starting points.

Full coverage performance testing of library code can be
challenging, especially when the "typical" usage pattern
is not predictable in advance. Even something as well
defined as memcpy can have very different usage patterns.
Examples:
Machine characteristic differences:
target machine supports/does not support misaligned access at full speed
target machine supports/does not support misaligned access at cost
     similar to or less than mispredicted branch
target machine imposes major overhead for misaligned access
     equivalent to a segfault, handled by the kernel
Even within a single architectural family, these details can
change with implementations. Across architectures like x86, sparc,
power, arm, etc. it can be very challenging to write 'general'
fast code.

Data characteristic differences also can be challenging:
    Data is overwhelmingly aligned but often short
       [Need to minimize testing before moving data]
    Data is often misaligned sometimes aligned and large
       [Best to special case various alignments with large copies
        handled differently]
    Data frequently overlaps
       [Need to watch out for HW limitations on load/store timing]
and so on...

In writing tests, if we are buried in irrelevant data, we are more
likely to miss important changes while tuning. But if we trim the
tests too much, we may miss important limitations of the implementation.
Another opportunity comes in how the data is presented. If the
data to be compared is spread in an irregular way, it is more
difficult to identify trends as opposed to random variations.
I find the current standard glibc mem* performance tests difficult to
quickly skim/compare for possible performance regressions.

I've just touched on a few of the issues involved for memcpy perf work.
malloc is even more complex which is why we are still discussing
how to performance test glibc malloc.

- Patrick McGehearty (patrick.mcgehearty@oracle.com)


On 5/21/2019 9:11 AM, Siddhesh Poyarekar wrote:
> On 21/05/19 7:28 PM, Wilco Dijkstra wrote:
>> Well the test doesn't actually test misaligned copies - both source and
>> destination are always mutually aligned, so any memcpy implementation which
>> aligns either source or destination will only do aligned copies.
> They were not intended to be mutually misaligned, that was not the
> intent of the benchmark since the target application it modeled did not
> have such inputs.
>
>> In any case I'm not sure what the test is supposed to measure - the scores are
>> identical across all memcpy implementations. The time taken for double the
>> copy size is exactly twice as much.
> Right, you'll probably only see differences in case of mutually
> misaligned inputs.
>
> Siddhesh


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-21 14:11       ` Siddhesh Poyarekar
  2019-05-21 15:11         ` Patrick McGehearty
@ 2019-05-22 11:11         ` Wilco Dijkstra
  2019-05-22 11:27           ` Siddhesh Poyarekar
  1 sibling, 1 reply; 12+ messages in thread
From: Wilco Dijkstra @ 2019-05-22 11:11 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Adhemerval Zanella, libc-alpha@sourceware.org; +Cc: nd

Hi Siddhesh,
  
>On 21/05/19 7:28 PM, Wilco Dijkstra wrote:
>> Well the test doesn't actually test misaligned copies - both source and
>> destination are always mutually aligned, so any memcpy implementation which
>> aligns either source or destination will only do aligned copies.
>
> They were not intended to be mutually misaligned, that was not the
> intent of the benchmark since the target application it modeled did not
> have such inputs.
>
>> In any case I'm not sure what the test is supposed to measure - the scores are
>> identical across all memcpy implementations. The time taken for double the
>> copy size is exactly twice as much.
>
> Right, you'll probably only see differences in case of mutually
> misaligned inputs.

Well if I force the copies to be mutually unaligned, there is only about 1% difference
for a few of the memcpy implementations compared to them being always aligned
The others show identical performance whether aligned or not. This is not too
surprising since the test is basically waiting for DRAM most of the time.

So if we wanted to measure something useful we'd need to do it differently. Maybe
the goal was to measure DRAM bandwidth? If so we could modify it to compare
copy bandwidth for just a few different sizes (corresponding with typical L1/L2/L3 sizes).

Wilco

    

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-22 11:11         ` Wilco Dijkstra
@ 2019-05-22 11:27           ` Siddhesh Poyarekar
  2019-05-22 12:38             ` Anton Youdkevitch
  0 siblings, 1 reply; 12+ messages in thread
From: Siddhesh Poyarekar @ 2019-05-22 11:27 UTC (permalink / raw)
  To: Wilco Dijkstra, Adhemerval Zanella, libc-alpha@sourceware.org; +Cc: nd

On 22/05/19 4:41 PM, Wilco Dijkstra wrote:
> Well if I force the copies to be mutually unaligned, there is only about 1% difference
> for a few of the memcpy implementations compared to them being always aligned
> The others show identical performance whether aligned or not. This is not too
> surprising since the test is basically waiting for DRAM most of the time.

That's a good point.  Is that the case for thunderx as well?  IIRC they
perform particularly badly with misaligned code but I don't know if
they're bad enough to be significant in the face of DRAM waits.

> So if we wanted to measure something useful we'd need to do it differently. Maybe
> the goal was to measure DRAM bandwidth? If so we could modify it to compare
> copy bandwidth for just a few different sizes (corresponding with typical L1/L2/L3 sizes).

That sounds like a good idea.

Siddhesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-22 11:27           ` Siddhesh Poyarekar
@ 2019-05-22 12:38             ` Anton Youdkevitch
  2019-05-22 13:24               ` Siddhesh Poyarekar
  0 siblings, 1 reply; 12+ messages in thread
From: Anton Youdkevitch @ 2019-05-22 12:38 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Wilco Dijkstra, Adhemerval Zanella,
	libc-alpha@sourceware.org
  Cc: nd

Siddhesh,

On 22.5.2019 14:27 , Siddhesh Poyarekar wrote:
> On 22/05/19 4:41 PM, Wilco Dijkstra wrote:
>> Well if I force the copies to be mutually unaligned, there is only about 1% difference
>> for a few of the memcpy implementations compared to them being always aligned
>> The others show identical performance whether aligned or not. This is not too
>> surprising since the test is basically waiting for DRAM most of the time.
>
> That's a good point.  Is that the case for thunderx as well?  IIRC they
> perform particularly badly with misaligned code but I don't know if
> they're bad enough to be significant in the face of DRAM waits.
They were. The difference was up to 50% in some cases. But this is the
data for bench-memcpy. Reports for bench-memcpy-walk and
bench-memcpy-random do not have alignment info in them.

So, unaligned accesses can be much slower that aligned ones. Or, at least,
this is how the benchmarks measure them. We already know that the results
are not very stable, though.

P.S. To be precise I'm speaking about TX2 here.

-- 
   Thanks,
   Anton

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-22 12:38             ` Anton Youdkevitch
@ 2019-05-22 13:24               ` Siddhesh Poyarekar
  2019-05-22 13:59                 ` Anton Youdkevitch
  0 siblings, 1 reply; 12+ messages in thread
From: Siddhesh Poyarekar @ 2019-05-22 13:24 UTC (permalink / raw)
  To: Anton Youdkevitch, Wilco Dijkstra, Adhemerval Zanella,
	libc-alpha@sourceware.org
  Cc: nd

On 22/05/19 6:08 PM, Anton Youdkevitch wrote:
> They were. The difference was up to 50% in some cases. But this is the
> data for bench-memcpy. Reports for bench-memcpy-walk and
> bench-memcpy-random do not have alignment info in them.
> 
> So, unaligned accesses can be much slower that aligned ones. Or, at least,
> this is how the benchmarks measure them. We already know that the results
> are not very stable, though.

bench-memcpy-walk mixes in the misaligned sizes, but it shouldn't have a
big impact because 1) the source and destination are not mutually
misaligned and 2) the alignment code is usually a tiny portion of the
execution time and hence shouldn't show up in memcpy-walk.

Siddhesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Improve string benchtest timing
  2019-05-22 13:24               ` Siddhesh Poyarekar
@ 2019-05-22 13:59                 ` Anton Youdkevitch
  0 siblings, 0 replies; 12+ messages in thread
From: Anton Youdkevitch @ 2019-05-22 13:59 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Wilco Dijkstra, Adhemerval Zanella,
	libc-alpha@sourceware.org
  Cc: nd

On 22.5.2019 16:24 , Siddhesh Poyarekar wrote:
> On 22/05/19 6:08 PM, Anton Youdkevitch wrote:
>> They were. The difference was up to 50% in some cases. But this is the
>> data for bench-memcpy. Reports for bench-memcpy-walk and
>> bench-memcpy-random do not have alignment info in them.
>>
>> So, unaligned accesses can be much slower that aligned ones. Or, at least,
>> this is how the benchmarks measure them. We already know that the results
>> are not very stable, though.
>
> bench-memcpy-walk mixes in the misaligned sizes, but it shouldn't have a
> big impact because 1) the source and destination are not mutually
> misaligned and 2) the alignment code is usually a tiny portion of the
> execution time and hence shouldn't show up in memcpy-walk.
OK, sorry, I probably misunderstood the purpose of memcpy-walk.
If it has a specific access pattern intentionally (for whatever reason)
then my objection does not apply.

-- 
   Thanks,
   Anton

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-05-22 13:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-08 15:39 [PATCH] Improve string benchtest timing Wilco Dijkstra
2019-05-16 14:57 ` Wilco Dijkstra
2019-05-20 18:44 ` Adhemerval Zanella
2019-05-21  7:01   ` Siddhesh Poyarekar
2019-05-21 13:58     ` Wilco Dijkstra
2019-05-21 14:11       ` Siddhesh Poyarekar
2019-05-21 15:11         ` Patrick McGehearty
2019-05-22 11:11         ` Wilco Dijkstra
2019-05-22 11:27           ` Siddhesh Poyarekar
2019-05-22 12:38             ` Anton Youdkevitch
2019-05-22 13:24               ` Siddhesh Poyarekar
2019-05-22 13:59                 ` Anton Youdkevitch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).