From: "H.J. Lu"
Date: Wed, 27 Mar 2024 13:14:32 -0700
Subject: Re: [PATCH v2 06/10] i386: Use generic exp10
To: Adhemerval Zanella
Cc: libc-alpha@sourceware.org, Joseph Myers, Florian Weimer
References: <20240327194024.1409677-1-adhemerval.zanella@linaro.org>
 <20240327194024.1409677-7-adhemerval.zanella@linaro.org>
In-Reply-To: <20240327194024.1409677-7-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list

On Wed, Mar 27, 2024 at 12:40 PM Adhemerval Zanella wrote:
>
> The resulting performance is slightly better (Ryzen 5900, gcc 13.2.1):
>
> * master
>   "exp10": {
>    "": {
>     "duration": 3.70091e+09,
>     "iterations": 5.8534e+07,
>     "max": 91.279,
>     "min": 62.6225,
>     "mean": 63.2267
>    }
>   }
>
> * patch
>   "exp10": {
>    "": {
>     "duration": 3.70793e+09,
>     "iterations": 6.328e+07,
>     "max": 259.592,
>     "min": 52.1145,
>     "mean": 58.5957
>    }
>   }
>
> Checked on i686-linux-gnu.
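
Taking the quoted figures at face value, the mean drops from 63.2267 to
58.5957 (roughly 7%), while the reported max rises from 91.279 to 259.592.
A minimal sketch of that arithmetic follows. The helper names are
hypothetical and are not part of glibc or its benchtests.

```c
#include <assert.h>

/* Hypothetical helper (not a glibc or benchtests API): fractional
   reduction of a timing when moving from BEFORE to AFTER.  A positive
   result means AFTER is faster.  */
double
relative_reduction (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before;
}

/* Hypothetical helper: how many times larger AFTER is than BEFORE.  */
double
growth_factor (double before, double after)
{
  assert (before > 0.0);
  return after / before;
}
```

With the numbers above, relative_reduction (63.2267, 58.5957) is about
0.073 and growth_factor (91.279, 259.592) is about 2.84.
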
> ---
>  sysdeps/i386/fpu/Versions                 |  1 +
>  sysdeps/i386/fpu/e_exp10.S                | 51 -----------------------
>  sysdeps/i386/fpu/e_exp10.c                |  2 +
>  sysdeps/i386/fpu/e_exp_data.c             |  1 -
>  sysdeps/i386/fpu/w_exp10_compat.c         |  8 ----
>  sysdeps/ieee754/dbl-64/e_exp10.c          |  7 +++-
>  sysdeps/mach/hurd/i386/libm.abilist       |  1 +
>  sysdeps/unix/sysv/linux/i386/libm.abilist |  1 +
>  8 files changed, 10 insertions(+), 62 deletions(-)
>  delete mode 100644 sysdeps/i386/fpu/e_exp10.S
>  create mode 100644 sysdeps/i386/fpu/e_exp10.c
>  delete mode 100644 sysdeps/i386/fpu/e_exp_data.c
>  delete mode 100644 sysdeps/i386/fpu/w_exp10_compat.c
>
> diff --git a/sysdeps/i386/fpu/Versions b/sysdeps/i386/fpu/Versions
> index 9509f9b7c7..7326f25583 100644
> --- a/sysdeps/i386/fpu/Versions
> +++ b/sysdeps/i386/fpu/Versions
> @@ -5,6 +5,7 @@ libm {
>    }
>    GLIBC_2.40 {
>      # No SVID compatible error handling.
> +    exp10;
>      fmod; fmodf;
>    }
>  }
> diff --git a/sysdeps/i386/fpu/e_exp10.S b/sysdeps/i386/fpu/e_exp10.S
> deleted file mode 100644
> index 902f70b77f..0000000000
> --- a/sysdeps/i386/fpu/e_exp10.S
> +++ /dev/null
> @@ -1,51 +0,0 @@
> -
> -#include
> -#include
> -#include
> -
> -DEFINE_DBL_MIN
> -
> -#ifdef PIC
> -# define MO(op) op##@GOTOFF(%ecx)
> -#else
> -# define MO(op) op
> -#endif
> -
> -        .text
> -/* 10^x = 2^(x * log2(10)) */
> -ENTRY(__ieee754_exp10)
> -#ifdef PIC
> -        LOAD_PIC_REG (cx)
> -#endif
> -        fldl    4(%esp)
> -/* I added the following ugly construct because exp(+-Inf) resulted
> -   in NaN.  The ugliness results from the bright minds at Intel.
> -   For the i686 the code can be written better.
> -   -- drepper@cygnus.com.  */
> -        fxam                            /* Is NaN or +-Inf?  */
> -        fstsw   %ax
> -        movb    $0x45, %dh
> -        andb    %ah, %dh
> -        cmpb    $0x05, %dh
> -        je      1f                      /* Is +-Inf, jump.  */
> -        fldl2t
> -        fmulp                           /* x * log2(10) */
> -        fld     %st
> -        frndint                         /* int(x * log2(10)) */
> -        fsubr   %st,%st(1)              /* fract(x * log2(10)) */
> -        fxch
> -        f2xm1                           /* 2^(fract(x * log2(10))) - 1 */
> -        fld1
> -        faddp                           /* 2^(fract(x * log2(10))) */
> -        fscale                          /* e^x */
> -        fstp    %st(1)
> -        DBL_NARROW_EVAL_UFLOW_NONNEG_NAN
> -        ret
> -
> -1:      testl   $0x200, %eax            /* Test sign.  */
> -        jz      2f                      /* If positive, jump.  */
> -        fstp    %st
> -        fldz                            /* Set result to 0.  */
> -2:      ret
> -END (__ieee754_exp10)
> -libm_alias_finite (__ieee754_exp10, __exp10)
> diff --git a/sysdeps/i386/fpu/e_exp10.c b/sysdeps/i386/fpu/e_exp10.c
> new file mode 100644
> index 0000000000..340254fc6e
> --- /dev/null
> +++ b/sysdeps/i386/fpu/e_exp10.c
> @@ -0,0 +1,2 @@
> +#define EXP10_VERSION GLIBC_2_40
> +#include
> diff --git a/sysdeps/i386/fpu/e_exp_data.c b/sysdeps/i386/fpu/e_exp_data.c
> deleted file mode 100644
> index 1cc8931700..0000000000
> --- a/sysdeps/i386/fpu/e_exp_data.c
> +++ /dev/null
> @@ -1 +0,0 @@
> -/* Not needed.  */
> diff --git a/sysdeps/i386/fpu/w_exp10_compat.c b/sysdeps/i386/fpu/w_exp10_compat.c
> deleted file mode 100644
> index 49a0e03385..0000000000
> --- a/sysdeps/i386/fpu/w_exp10_compat.c
> +++ /dev/null
> @@ -1,8 +0,0 @@
> -/* i386 provides an optimized __ieee754_exp10.  */
> -#ifdef SHARED
> -# define NO_COMPAT_NEEDED 1
> -# include
> -#else
> -# include
> -# include
> -#endif
> diff --git a/sysdeps/ieee754/dbl-64/e_exp10.c b/sysdeps/ieee754/dbl-64/e_exp10.c
> index 225fc74c4c..c63b852f72 100644
> --- a/sysdeps/ieee754/dbl-64/e_exp10.c
> +++ b/sysdeps/ieee754/dbl-64/e_exp10.c
> @@ -99,7 +99,7 @@ __exp10 (double x)
>
>    /* Reduce x: z = x * N / log10(2), k = round(z).  */
>    double_t z = __exp_data.invlog10_2N * x;
> -  double_t kd;
> +  double kd;
>    int64_t ki;
>  #if TOINT_INTRINSICS
>    kd = roundtoint (z);
> @@ -147,7 +147,10 @@ __exp10 (double x)
>  strong_alias (__exp10, __ieee754_exp10)
>  libm_alias_finite (__ieee754_exp10, __exp10)
>  #if LIBM_SVID_COMPAT
> -versioned_symbol (libm, __exp10, exp10, GLIBC_2_39);
> +# ifndef EXP10_VERSION
> +#  define EXP10_VERSION GLIBC_2_39
> +# endif
> +versioned_symbol (libm, __exp10, exp10, EXP10_VERSION);
>  libm_alias_double_other (__exp10, exp10)
>  #else
>  libm_alias_double (__exp10, exp10)
> diff --git a/sysdeps/mach/hurd/i386/libm.abilist b/sysdeps/mach/hurd/i386/libm.abilist
> index 88e7538e51..01c5633663 100644
> --- a/sysdeps/mach/hurd/i386/libm.abilist
> +++ b/sysdeps/mach/hurd/i386/libm.abilist
> @@ -1181,5 +1181,6 @@ GLIBC_2.35 fsqrt F
>  GLIBC_2.35 fsqrtl F
>  GLIBC_2.35 hypot F
>  GLIBC_2.35 hypotf F
> +GLIBC_2.40 exp10 F
>  GLIBC_2.40 fmod F
>  GLIBC_2.40 fmodf F
> diff --git a/sysdeps/unix/sysv/linux/i386/libm.abilist b/sysdeps/unix/sysv/linux/i386/libm.abilist
> index c99c60161d..3413cfdbe7 100644
> --- a/sysdeps/unix/sysv/linux/i386/libm.abilist
> +++ b/sysdeps/unix/sysv/linux/i386/libm.abilist
> @@ -1188,5 +1188,6 @@ GLIBC_2.35 fsqrt F
>  GLIBC_2.35 fsqrtl F
>  GLIBC_2.35 hypot F
>  GLIBC_2.35 hypotf F
> +GLIBC_2.40 exp10 F
>  GLIBC_2.40 fmod F
>  GLIBC_2.40 fmodf F
> --
> 2.34.1
>

Also need a bug report.

-- 
H.J.