From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Subject: [PATCH 03/28] powerpc: Remove power4 mpa optimization
Date: Fri, 29 Mar 2019 10:35:04 -0300
Message-ID: <20190329133529.22523-4-adhemerval.zanella@linaro.org>
In-Reply-To: <20190329133529.22523-1-adhemerval.zanella@linaro.org>
This patch removes the POWER4-optimized mpa implementation used for
power4+ builds. On newer chips, GCC generates *worse* code for it than
for the generic implementation, as the benchmark results below show. One
possibility would be to add IFUNC variants for the mpa routines (as
x86_64 does), but that would add complexity only for older chips (and one
would need to check whether power5, power5+, and power6 actually benefit
from this optimization), and only for specific implementations (since the
most used ones, such as sin, cos, exp, and pow, already avoid calling the
slow multiprecision path).
* POWER9 patched
  "atan": {
    "": {
      "duration": 5.12565e+09,
      "iterations": 1.552e+08,
      "max": 100.552,
      "min": 7.799,
      "mean": 33.0261
    },
    "144bits": {
      "duration": 5.12745e+09,
      "iterations": 825000,
      "max": 7517.17,
      "min": 6186.3,
      "mean": 6215.09
    }
  }
  "acos": {
    "": {
      "duration": 5.21741e+09,
      "iterations": 1.269e+08,
      "max": 191.738,
      "min": 7.931,
      "mean": 41.1144
    },
    "slow": {
      "duration": 5.25999e+09,
      "iterations": 198000,
      "max": 26681.7,
      "min": 26463.6,
      "mean": 26565.6
    }
  }
* POWER9 master
  "atan": {
    "": {
      "duration": 5.12815e+09,
      "iterations": 1.552e+08,
      "max": 134.788,
      "min": 7.803,
      "mean": 33.0422
    },
    "144bits": {
      "duration": 5.1209e+09,
      "iterations": 447000,
      "max": 11615.8,
      "min": 11301.8,
      "mean": 11456.2
    }
  }
  "acos": {
    "": {
      "duration": 5.22272e+09,
      "iterations": 1.269e+08,
      "max": 115.981,
      "min": 7.931,
      "mean": 41.1562
    },
    "slow": {
      "duration": 5.28723e+09,
      "iterations": 96000,
      "max": 55434.1,
      "min": 54820.6,
      "mean": 55075.3
    }
  }
* POWER8 patched
  "acos": {
    "": {
      "duration": 5.16398e+09,
      "iterations": 9.99e+07,
      "max": 174.408,
      "min": 8.645,
      "mean": 51.6915
    },
    "slow": {
      "duration": 5.16982e+09,
      "iterations": 96000,
      "max": 54830.5,
      "min": 53703.8,
      "mean": 53852.3
    }
  }
* POWER8 master
  "acos": {
    "": {
      "duration": 5.17019e+09,
      "iterations": 9.99e+07,
      "max": 186.127,
      "min": 8.633,
      "mean": 51.7537
    },
    "slow": {
      "duration": 5.34225e+09,
      "iterations": 90000,
      "max": 60353.2,
      "min": 59155.3,
      "mean": 59358.4
    }
  }
* POWER7 patched
  "asin": {
    "": {
      "duration": 5.15559e+09,
      "iterations": 6.5e+07,
      "max": 193.335,
      "min": 12.227,
      "mean": 79.3168
    },
    "slow": {
      "duration": 5.20538e+09,
      "iterations": 80000,
      "max": 65705.2,
      "min": 64299.4,
      "mean": 65067.3
    }
  }
* POWER7 master
  "asin": {
    "": {
      "duration": 5.15446e+09,
      "iterations": 6.5e+07,
      "max": 184.575,
      "min": 12.226,
      "mean": 79.2994
    },
    "slow": {
      "duration": 5.20616e+09,
      "iterations": 80000,
      "max": 65705.1,
      "min": 64336.6,
      "mean": 65076.9
    }
  }
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4, and with --with-cpu=power5+ and --disable-multi-arch)
and on powerpc64-linux-gnu (built without --with-cpu, and with
--with-cpu=power5+ and --disable-multi-arch).
* sysdeps/powerpc/power4/fpu/Makefile: Remove file.
* sysdeps/powerpc/power4/fpu/mpa-arch.h: Likewise.
* sysdeps/powerpc/power4/fpu/mpa.c: Likewise.
---
sysdeps/powerpc/power4/fpu/Makefile | 5 -
sysdeps/powerpc/power4/fpu/mpa-arch.h | 56 -------
sysdeps/powerpc/power4/fpu/mpa.c | 214 --------------------------
3 files changed, 275 deletions(-)
delete mode 100644 sysdeps/powerpc/power4/fpu/Makefile
delete mode 100644 sysdeps/powerpc/power4/fpu/mpa-arch.h
delete mode 100644 sysdeps/powerpc/power4/fpu/mpa.c
diff --git a/sysdeps/powerpc/power4/fpu/Makefile b/sysdeps/powerpc/power4/fpu/Makefile
deleted file mode 100644
index f487ed6014..0000000000
--- a/sysdeps/powerpc/power4/fpu/Makefile
+++ /dev/null
@@ -1,5 +0,0 @@
-# Makefile fragment for POWER4/5/5+ with FPU.
-
-ifeq ($(subdir),math)
-CFLAGS-mpa.c += --param max-unroll-times=4 -funroll-loops -fpeel-loops
-endif
diff --git a/sysdeps/powerpc/power4/fpu/mpa-arch.h b/sysdeps/powerpc/power4/fpu/mpa-arch.h
deleted file mode 100644
index 929c60b314..0000000000
--- a/sysdeps/powerpc/power4/fpu/mpa-arch.h
+++ /dev/null
@@ -1,56 +0,0 @@
-/* Overridable constants and operations.
- Copyright (C) 2013-2019 Free Software Foundation, Inc.
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU Lesser General Public License as published by
- the Free Software Foundation; either version 2.1 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public License
- along with this program; if not, see <http://www.gnu.org/licenses/>. */
-
-typedef double mantissa_t;
-typedef double mantissa_store_t;
-
-#define TWOPOW(i) (0x1.0p##i)
-
-#define RADIX TWOPOW (24) /* 2^24 */
-#define CUTTER TWOPOW (76) /* 2^76 */
-#define RADIXI 0x1.0p-24 /* 2^-24 */
-#define TWO52 TWOPOW (52) /* 2^52 */
-
-/* Divide D by RADIX and put the remainder in R. */
-#define DIV_RADIX(d,r) \
- ({ \
- double u = ((d) + CUTTER) - CUTTER; \
- if (u > (d)) \
- u -= RADIX; \
- r = (d) - u; \
- (d) = u * RADIXI; \
- })
-
-/* Put the integer component of a double X in R and retain the fraction in
- X. */
-#define INTEGER_OF(x, r) \
- ({ \
- double u = ((x) + TWO52) - TWO52; \
- if (u > (x)) \
- u -= 1; \
- (r) = u; \
- (x) -= u; \
- })
-
-/* Align IN down to a multiple of F, where F is a power of two. */
-#define ALIGN_DOWN_TO(in, f) \
- ({ \
- double factor = f * TWO52; \
- double u = (in + factor) - factor; \
- if (u > in) \
- u -= f; \
- u; \
- })
diff --git a/sysdeps/powerpc/power4/fpu/mpa.c b/sysdeps/powerpc/power4/fpu/mpa.c
deleted file mode 100644
index 1be2e93cb7..0000000000
--- a/sysdeps/powerpc/power4/fpu/mpa.c
+++ /dev/null
@@ -1,214 +0,0 @@
-
-/*
- * IBM Accurate Mathematical Library
- * written by International Business Machines Corp.
- * Copyright (C) 2001-2019 Free Software Foundation, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU Lesser General Public License as published by
- * the Free Software Foundation; either version 2.1 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-
-/* Define __mul and __sqr and use the rest from generic code. */
-#define NO__MUL
-#define NO__SQR
-
-#include <sysdeps/ieee754/dbl-64/mpa.c>
-
-/* Multiply *X and *Y and store result in *Z. X and Y may overlap but not X
- and Z or Y and Z. For P in [1, 2, 3], the exact result is truncated to P
- digits. In case P > 3 the error is bounded by 1.001 ULP. */
-void
-__mul (const mp_no *x, const mp_no *y, mp_no *z, int p)
-{
- long i, i1, i2, j, k, k2;
- long p2 = p;
- double u, zk, zk2;
-
- /* Is z=0? */
- if (__glibc_unlikely (X[0] * Y[0] == 0))
- {
- Z[0] = 0;
- return;
- }
-
- /* Multiply, add and carry */
- k2 = (p2 < 3) ? p2 + p2 : p2 + 3;
- zk = Z[k2] = 0;
- for (k = k2; k > 1;)
- {
- if (k > p2)
- {
- i1 = k - p2;
- i2 = p2 + 1;
- }
- else
- {
- i1 = 1;
- i2 = k;
- }
-#if 1
- /* Rearrange this inner loop to allow the fmadd instructions to be
- independent and execute in parallel on processors that have
- dual symmetrical FP pipelines. */
- if (i1 < (i2 - 1))
- {
- /* Make sure we have at least 2 iterations. */
- if (((i2 - i1) & 1L) == 1L)
- {
- /* Handle the odd iterations case. */
- zk2 = x->d[i2 - 1] * y->d[i1];
- }
- else
- zk2 = 0.0;
- /* Do two multiply/adds per loop iteration, using independent
- accumulators; zk and zk2. */
- for (i = i1, j = i2 - 1; i < i2 - 1; i += 2, j -= 2)
- {
- zk += x->d[i] * y->d[j];
- zk2 += x->d[i + 1] * y->d[j - 1];
- }
- zk += zk2; /* Final sum. */
- }
- else
- {
- /* Special case when iterations is 1. */
- zk += x->d[i1] * y->d[i1];
- }
-#else
- /* The original code. */
- for (i = i1, j = i2 - 1; i < i2; i++, j--)
- zk += X[i] * Y[j];
-#endif
-
- u = (zk + CUTTER) - CUTTER;
- if (u > zk)
- u -= RADIX;
- Z[k] = zk - u;
- zk = u * RADIXI;
- --k;
- }
- Z[k] = zk;
-
- int e = EX + EY;
- /* Is there a carry beyond the most significant digit? */
- if (Z[1] == 0)
- {
- for (i = 1; i <= p2; i++)
- Z[i] = Z[i + 1];
- e--;
- }
-
- EZ = e;
- Z[0] = X[0] * Y[0];
-}
-
-/* Square *X and store result in *Y. X and Y may not overlap. For P in
- [1, 2, 3], the exact result is truncated to P digits. In case P > 3 the
- error is bounded by 1.001 ULP. This is a faster special case of
- multiplication. */
-void
-__sqr (const mp_no *x, mp_no *y, int p)
-{
- long i, j, k, ip;
- double u, yk;
-
- /* Is z=0? */
- if (__glibc_unlikely (X[0] == 0))
- {
- Y[0] = 0;
- return;
- }
-
- /* We need not iterate through all X's since it's pointless to
- multiply zeroes. */
- for (ip = p; ip > 0; ip--)
- if (X[ip] != 0)
- break;
-
- k = (__glibc_unlikely (p < 3)) ? p + p : p + 3;
-
- while (k > 2 * ip + 1)
- Y[k--] = 0;
-
- yk = 0;
-
- while (k > p)
- {
- double yk2 = 0.0;
- long lim = k / 2;
-
- if (k % 2 == 0)
- {
- yk += X[lim] * X[lim];
- lim--;
- }
-
- /* In __mul, this loop (and the one within the next while loop) run
- between a range to calculate the mantissa as follows:
-
- Z[k] = X[k] * Y[n] + X[k+1] * Y[n-1] ... + X[n-1] * Y[k+1]
- + X[n] * Y[k]
-
- For X == Y, we can get away with summing halfway and doubling the
- result. For cases where the range size is even, the mid-point needs
- to be added separately (above). */
- for (i = k - p, j = p; i <= lim; i++, j--)
- yk2 += X[i] * X[j];
-
- yk += 2.0 * yk2;
-
- u = (yk + CUTTER) - CUTTER;
- if (u > yk)
- u -= RADIX;
- Y[k--] = yk - u;
- yk = u * RADIXI;
- }
-
- while (k > 1)
- {
- double yk2 = 0.0;
- long lim = k / 2;
-
- if (k % 2 == 0)
- {
- yk += X[lim] * X[lim];
- lim--;
- }
-
- /* Likewise for this loop. */
- for (i = 1, j = k - 1; i <= lim; i++, j--)
- yk2 += X[i] * X[j];
-
- yk += 2.0 * yk2;
-
- u = (yk + CUTTER) - CUTTER;
- if (u > yk)
- u -= RADIX;
- Y[k--] = yk - u;
- yk = u * RADIXI;
- }
- Y[k] = yk;
-
- /* Squares are always positive. */
- Y[0] = 1.0;
-
- int e = EX * 2;
- /* Is there a carry beyond the most significant digit? */
- if (__glibc_unlikely (Y[1] == 0))
- {
- for (i = 1; i <= p; i++)
- Y[i] = Y[i + 1];
- e--;
- }
- EY = e;
-}
--
2.17.1