From: Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org>
To: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
Date: Mon, 13 Sep 2021 18:22:10 -0500
Message-ID: <CAFUsyfLZnBra19JkFjV3Kkx55kqEEgGNA1vNRgK2v4F11WMM2g@mail.gmail.com>
In-Reply-To: <20210913230506.546749-1-goldstein.w.n@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 24661 bytes --]
On Mon, Sep 13, 2021 at 6:21 PM Noah Goldstein <goldstein.w.n@gmail.com>
wrote:
> No bug. This commit adds support for an optimized bcmp implementation.
> Support is for sse2, sse4_1, avx2, and evex.
>
> All string tests passing and build succeeding.
> ---
> The motivation for this commit is that compilers will optimize the
> idiomatic use of memcmp's return value as a boolean into a call to
> bcmp:
>
> https://godbolt.org/z/Tbhefh6cv
>
> so it seems reasonable to have an optimized bcmp implementation as we
> can get ~0-25% improvement (generally larger improvement for the
> smaller size ranges which ultimately are the most important to optimize
> for).
>
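To make the idiom concrete, here is a minimal sketch of the pattern the godbolt link refers to. The helper names are hypothetical and are not part of the patch. Because only the zero/non-zero outcome of memcmp is consumed, the first helper is the kind of call a compiler may lower to bcmp.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: only the zero/non-zero outcome of memcmp is
   used, so the ordering information memcmp computes is discarded.  */
int
keys_equal (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}

/* The same test written against bcmp directly.  */
int
keys_equal_bcmp (const void *a, const void *b, size_t n)
{
  return bcmp (a, b, n) == 0;
}
```

Both helpers return 1 for identical buffers and 0 otherwise, so callers cannot tell them apart.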
> Numbers for new implementations attached in reply.
>
Numbers in this email.
>
> Tests were run on the following CPUs:
>
> Tigerlake:
> https://ark.intel.com/content/www/us/en/ark/products/208921/intel-core-i7-1165g7-processor-12m-cache-up-to-4-70-ghz-with-ipu.html
> Skylake:
> https://ark.intel.com/content/www/us/en/ark/products/149091/intel-core-i7-8565u-processor-8m-cache-up-to-4-60-ghz.html
>
> Some notes on the numbers.
>
> There are some regressions in the sse2/sse4_1 versions. I didn't
> optimize these versions beyond defining out obviously irrelevant code
> for bcmp. My intuition is that the slowdowns are alignment related. I
> am not sure if these issues would translate to architectures that
> would actually use sse2/sse4_1.
>
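As a rough illustration of what counts as irrelevant for bcmp, the sketch below is a plain C reference routine, not the assembly in this series, and sketch_bcmp is a hypothetical name. It reports a mismatch as soon as any chunk differs and never works out which byte differed or in which direction, which is the work memcmp cannot skip.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference routine: returns 0 if the buffers are equal
   and 1 otherwise.  It never computes an ordering.  */
int
sketch_bcmp (const void *s1, const void *s2, size_t n)
{
  const unsigned char *p1 = s1;
  const unsigned char *p2 = s2;

  /* Compare 8 bytes at a time.  memcpy keeps the loads safe for any
     alignment.  */
  while (n >= sizeof (uint64_t))
    {
      uint64_t a, b;
      memcpy (&a, p1, sizeof a);
      memcpy (&b, p2, sizeof b);
      if (a != b)
	return 1;	/* Any difference will do.  */
      p1 += sizeof a;
      p2 += sizeof b;
      n -= sizeof a;
    }

  /* Compare the remaining tail bytes one at a time.  */
  while (n-- > 0)
    if (*p1++ != *p2++)
      return 1;

  return 0;
}
```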
> I added the sse2/sse4_1 implementations mostly so that the ifunc would
> have something to fall back on. With the lackluster numbers it may not
> be worth it, especially factoring in code size costs. Thoughts?
>
> The Tigerlake and Skylake versions are basically universal
> improvements for evex and avx2. I opted to align bcmp to 64 bytes as
> opposed to 16. The rationale is that a 16 byte guarantee is not enough
> to optimize for frontend behavior on either machine. I think good
> frontend behavior is important in any function where throughput
> might matter (which I think is the case for bcmp).
>
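For readers who want to see what a 64 byte entry alignment means outside of assembly, here is a small sketch. It assumes GCC or Clang, since the aligned attribute on functions is a compiler extension, and both function names are hypothetical. It only shows how to request and check the alignment. It says nothing about the performance effect.

```c
#include <stdint.h>

/* Hypothetical function whose first instruction is requested to sit
   on a 64 byte boundary (GCC/Clang extension).  */
__attribute__ ((aligned (64)))
int
entry_aligned_to_64 (int x)
{
  return x + 1;
}

/* Returns 1 if the entry address above is a multiple of 64.  */
int
entry_is_64_byte_aligned (void)
{
  return ((uintptr_t) &entry_aligned_to_64 & 63u) == 0;
}
```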
>
> benchtests/Makefile | 2 +-
> benchtests/bench-bcmp.c | 20 ++++++++
> benchtests/bench-memcmp.c | 4 +-
> string/Makefile | 4 +-
> string/test-bcmp.c | 21 +++++++++
> string/test-memcmp.c | 27 +++++++----
> sysdeps/x86_64/memcmp.S | 2 -
> sysdeps/x86_64/multiarch/Makefile | 3 ++
> sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S | 12 +++++
> sysdeps/x86_64/multiarch/bcmp-avx2.S | 23 ++++++++++
> sysdeps/x86_64/multiarch/bcmp-evex.S | 23 ++++++++++
> sysdeps/x86_64/multiarch/bcmp-sse2.S | 23 ++++++++++
> sysdeps/x86_64/multiarch/bcmp-sse4.S | 23 ++++++++++
> sysdeps/x86_64/multiarch/bcmp.c | 35 ++++++++++++++
> sysdeps/x86_64/multiarch/ifunc-bcmp.h | 53 ++++++++++++++++++++++
> sysdeps/x86_64/multiarch/ifunc-impl-list.c | 23 ++++++++++
> sysdeps/x86_64/multiarch/memcmp-sse2.S | 4 +-
> sysdeps/x86_64/multiarch/memcmp.c | 2 -
> 18 files changed, 286 insertions(+), 18 deletions(-)
> create mode 100644 benchtests/bench-bcmp.c
> create mode 100644 string/test-bcmp.c
> create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2.S
> create mode 100644 sysdeps/x86_64/multiarch/bcmp-evex.S
> create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse2.S
> create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse4.S
> create mode 100644 sysdeps/x86_64/multiarch/bcmp.c
> create mode 100644 sysdeps/x86_64/multiarch/ifunc-bcmp.h
>
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 1530939a8c..5fc495eb57 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -47,7 +47,7 @@ bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}})
> endif
>
> # String function benchmarks.
> -string-benchset := memccpy memchr memcmp memcpy memmem memmove \
> +string-benchset := bcmp memccpy memchr memcmp memcpy memmem memmove \
> mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
> strcat strchr strchrnul strcmp strcpy strcspn strlen \
> strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
> diff --git a/benchtests/bench-bcmp.c b/benchtests/bench-bcmp.c
> new file mode 100644
> index 0000000000..1023639787
> --- /dev/null
> +++ b/benchtests/bench-bcmp.c
> @@ -0,0 +1,20 @@
> +/* Measure bcmp functions.
> + Copyright (C) 2015-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#define TEST_BCMP 1
> +#include "bench-memcmp.c"
> diff --git a/benchtests/bench-memcmp.c b/benchtests/bench-memcmp.c
> index 744c7ec5ba..4d5f8fb766 100644
> --- a/benchtests/bench-memcmp.c
> +++ b/benchtests/bench-memcmp.c
> @@ -17,7 +17,9 @@
> <https://www.gnu.org/licenses/>. */
>
> #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
> # define TEST_NAME "wmemcmp"
> #else
> # define TEST_NAME "memcmp"
> diff --git a/string/Makefile b/string/Makefile
> index f0fce2a0b8..f1f67ee157 100644
> --- a/string/Makefile
> +++ b/string/Makefile
> @@ -35,7 +35,7 @@ routines := strcat strchr strcmp strcoll strcpy strcspn \
> strncat strncmp strncpy \
> strrchr strpbrk strsignal strspn strstr strtok \
> strtok_r strxfrm memchr memcmp memmove memset \
> - mempcpy bcopy bzero ffs ffsll stpcpy stpncpy \
> + mempcpy bcmp bcopy bzero ffs ffsll stpcpy stpncpy \
> strcasecmp strncase strcasecmp_l strncase_l \
> memccpy memcpy wordcopy strsep strcasestr \
> swab strfry memfrob memmem rawmemchr strchrnul \
> @@ -52,7 +52,7 @@ strop-tests := memchr memcmp memcpy memmove mempcpy memset memccpy \
> stpcpy stpncpy strcat strchr strcmp strcpy strcspn \
> strlen strncmp strncpy strpbrk strrchr strspn memmem \
> strstr strcasestr strnlen strcasecmp strncasecmp \
> - strncat rawmemchr strchrnul bcopy bzero memrchr \
> + strncat rawmemchr strchrnul bcmp bcopy bzero memrchr \
> explicit_bzero
> tests := tester inl-tester noinl-tester testcopy test-ffs \
> tst-strlen stratcliff tst-svc tst-inlcall \
> diff --git a/string/test-bcmp.c b/string/test-bcmp.c
> new file mode 100644
> index 0000000000..6d19a4a87c
> --- /dev/null
> +++ b/string/test-bcmp.c
> @@ -0,0 +1,21 @@
> +/* Test and measure bcmp functions.
> + Copyright (C) 2012-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#define BAD_RESULT(result, expec) ((!(result)) != (!(expec)))
> +#define TEST_BCMP 1
> +#include "test-memcmp.c"
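The BAD_RESULT override above is what lets the memcmp test be reused for bcmp. A sketch of the two acceptance rules as plain functions may make the difference easier to see. The function names are hypothetical; the expressions follow the macro above and the default BAD_RESULT that this patch adds to test-memcmp.c.

```c
/* bcmp rule: only the zero/non-zero status has to agree with the
   reference result.  Returns non-zero when the result is wrong.  */
int
bcmp_result_is_bad (int result, int expected)
{
  return (!result) != (!expected);
}

/* memcmp rule: the sign has to agree with the reference result as
   well.  Returns non-zero when the result is wrong.  */
int
memcmp_result_is_bad (int result, int expected)
{
  return ((result == 0 && expected != 0)
	  || (result < 0 && expected >= 0)
	  || (result > 0 && expected <= 0));
}
```

For example, a result of 1 where the reference returned -5 is acceptable for bcmp but is an error for memcmp.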
> diff --git a/string/test-memcmp.c b/string/test-memcmp.c
> index 6ddbc05d2f..c630e6799d 100644
> --- a/string/test-memcmp.c
> +++ b/string/test-memcmp.c
> @@ -17,11 +17,14 @@
> <https://www.gnu.org/licenses/>. */
>
> #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
> # define TEST_NAME "wmemcmp"
> #else
> # define TEST_NAME "memcmp"
> #endif
> +
> #include "test-string.h"
> #ifdef WIDE
> # include <inttypes.h>
> @@ -35,6 +38,7 @@
> # define CHARBYTES 4
> # define CHAR__MIN WCHAR_MIN
> # define CHAR__MAX WCHAR_MAX
> +
> int
> simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
> {
> @@ -48,8 +52,11 @@ simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
> }
> #else
> # include <limits.h>
> -
> -# define MEMCMP memcmp
> +# ifdef TEST_BCMP
> +# define MEMCMP bcmp
> +# else
> +# define MEMCMP memcmp
> +# endif
> # define MEMCPY memcpy
> # define SIMPLE_MEMCMP simple_memcmp
> # define CHAR char
> @@ -69,6 +76,12 @@ simple_memcmp (const char *s1, const char *s2, size_t n)
> }
> #endif
>
> +# ifndef BAD_RESULT
> +# define BAD_RESULT(result, expec) \
> + (((result) == 0 && (expec)) || ((result) < 0 && (expec) >= 0) || \
> + ((result) > 0 && (expec) <= 0))
> +# endif
> +
> typedef int (*proto_t) (const CHAR *, const CHAR *, size_t);
>
> IMPL (SIMPLE_MEMCMP, 0)
> @@ -79,9 +92,7 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
> int exp_result)
> {
> int result = CALL (impl, s1, s2, len);
> - if ((exp_result == 0 && result != 0)
> - || (exp_result < 0 && result >= 0)
> - || (exp_result > 0 && result <= 0))
> + if (BAD_RESULT(result, exp_result))
> {
> error (0, 0, "Wrong result in function %s %d %d", impl->name,
> result, exp_result);
> @@ -186,9 +197,7 @@ do_random_tests (void)
> {
> r = CALL (impl, (CHAR *) p1 + align1, (const CHAR *) p2 + align2,
> len);
> - if ((r == 0 && result)
> - || (r < 0 && result >= 0)
> - || (r > 0 && result <= 0))
> + if (BAD_RESULT(r, result))
> {
> error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
> n, impl->name, align1 * CHARBYTES & 63, align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
> diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
> index 870e15c5a0..dfd0269db2 100644
> --- a/sysdeps/x86_64/memcmp.S
> +++ b/sysdeps/x86_64/memcmp.S
> @@ -356,6 +356,4 @@ L(ATR32res):
> .p2align 4,, 4
> END(memcmp)
>
> -#undef bcmp
> -weak_alias (memcmp, bcmp)
> libc_hidden_builtin_def (memcmp)
> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
> index 26be40959c..9dd0d8c3ff 100644
> --- a/sysdeps/x86_64/multiarch/Makefile
> +++ b/sysdeps/x86_64/multiarch/Makefile
> @@ -1,6 +1,7 @@
> ifeq ($(subdir),string)
>
> sysdep_routines += strncat-c stpncpy-c strncpy-c \
> + bcmp-sse2 bcmp-sse4 bcmp-avx2 \
> strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3 \
> strcmp-sse4_2 strcmp-avx2 \
> strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
> @@ -40,6 +41,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
> memset-sse2-unaligned-erms \
> memset-avx2-unaligned-erms \
> memset-avx512-unaligned-erms \
> + bcmp-avx2-rtm \
> memchr-avx2-rtm \
> memcmp-avx2-movbe-rtm \
> memmove-avx-unaligned-erms-rtm \
> @@ -59,6 +61,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
> strncpy-avx2-rtm \
> strnlen-avx2-rtm \
> strrchr-avx2-rtm \
> + bcmp-evex \
> memchr-evex \
> memcmp-evex-movbe \
> memmove-evex-unaligned-erms \
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> new file mode 100644
> index 0000000000..d742257e4e
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> @@ -0,0 +1,12 @@
> +#ifndef MEMCMP
> +# define MEMCMP __bcmp_avx2_rtm
> +#endif
> +
> +#define ZERO_UPPER_VEC_REGISTERS_RETURN \
> + ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
> +
> +#define VZEROUPPER_RETURN jmp L(return_vzeroupper)
> +
> +#define SECTION(p) p##.avx.rtm
> +
> +#include "bcmp-avx2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> new file mode 100644
> index 0000000000..93a9a20b17
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with AVX2.
> + Copyright (C) 2017-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#ifndef MEMCMP
> +# define MEMCMP __bcmp_avx2
> +#endif
> +
> +#include "memcmp-avx2-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
> new file mode 100644
> index 0000000000..ade52e8c68
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with EVEX.
> + Copyright (C) 2017-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#ifndef MEMCMP
> +# define MEMCMP __bcmp_evex
> +#endif
> +
> +#include "memcmp-evex-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse2.S b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> new file mode 100644
> index 0000000000..b18d570386
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE2
> + Copyright (C) 2017-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +# ifndef memcmp
> +# define memcmp __bcmp_sse2
> +# endif
> +# define USE_AS_BCMP 1
> +#include "memcmp-sse2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse4.S b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> new file mode 100644
> index 0000000000..ed9804053f
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE4.1
> + Copyright (C) 2017-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +# ifndef MEMCMP
> +# define MEMCMP __bcmp_sse4_1
> +# endif
> +# define USE_AS_BCMP 1
> +#include "memcmp-sse4.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp.c b/sysdeps/x86_64/multiarch/bcmp.c
> new file mode 100644
> index 0000000000..6e26b73ecc
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp.c
> @@ -0,0 +1,35 @@
> +/* Multiple versions of bcmp.
> + All versions must be listed in ifunc-impl-list.c.
> + Copyright (C) 2017-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +/* Define multiple versions only for the definition in libc. */
> +#if IS_IN (libc)
> +# define bcmp __redirect_bcmp
> +# include <string.h>
> +# undef bcmp
> +
> +# define SYMBOL_NAME bcmp
> +# include "ifunc-bcmp.h"
> +
> +libc_ifunc_redirected (__redirect_bcmp, bcmp, IFUNC_SELECTOR ());
> +
> +# ifdef SHARED
> +__hidden_ver1 (bcmp, __GI_bcmp, __redirect_bcmp)
> + __attribute__ ((visibility ("hidden"))) __attribute_copy__ (bcmp);
> +# endif
> +#endif
> diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> new file mode 100644
> index 0000000000..b0dacd8526
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> @@ -0,0 +1,53 @@
> +/* Common definition for bcmp ifunc selections.
> + All versions must be listed in ifunc-impl-list.c.
> + Copyright (C) 2017-2021 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +# include <init-arch.h>
> +
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> +
> +static inline void *
> +IFUNC_SELECTOR (void)
> +{
> + const struct cpu_features* cpu_features = __get_cpu_features ();
> +
> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> + && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> + && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
> + && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
> + {
> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> + && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> + return OPTIMIZE (evex);
> +
> + if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> + return OPTIMIZE (avx2_rtm);
> +
> + if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
> + return OPTIMIZE (avx2);
> + }
> +
> + if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
> + return OPTIMIZE (sse4_1);
> +
> + return OPTIMIZE (sse2);
> +}
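The selection order above can be restated as a small standalone function, which may help when reasoning about which variant a given machine ends up with. This is only a sketch: struct cpu_caps and pick_bcmp_variant are hypothetical stand-ins for glibc's internal cpu_features queries, and the returned strings are just the OPTIMIZE suffixes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cpu_features bits the selector
   consults.  */
struct cpu_caps
{
  bool avx2, bmi2, movbe, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm, prefer_no_vzeroupper;
  bool sse4_1;
};

/* Returns the suffix of the variant the selector above would pick.  */
const char *
pick_bcmp_variant (const struct cpu_caps *c)
{
  if (c->avx2 && c->bmi2 && c->movbe && c->avx_fast_unaligned_load)
    {
      if (c->avx512vl && c->avx512bw)
	return "evex";

      if (c->rtm)
	return "avx2_rtm";

      if (!c->prefer_no_vzeroupper)
	return "avx2";

      /* Falls through: AVX2 is usable but VZEROUPPER is to be
	 avoided, so a non-AVX variant is chosen below.  */
    }

  if (c->sse4_1)
    return "sse4_1";

  return "sse2";
}
```

One consequence worth noting is that a machine with usable AVX2, no AVX512VL/AVX512BW, no RTM and Prefer_No_VZEROUPPER set does not get the avx2 variant and lands on sse4_1 instead.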
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 39ab10613b..dd0c393c7d 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -38,6 +38,29 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
> size_t i = 0;
>
> + /* Support sysdeps/x86_64/multiarch/bcmp.c. */
> + IFUNC_IMPL (i, name, bcmp,
> + IFUNC_IMPL_ADD (array, i, bcmp,
> + (CPU_FEATURE_USABLE (AVX2)
> + && CPU_FEATURE_USABLE (MOVBE)
> + && CPU_FEATURE_USABLE (BMI2)),
> + __bcmp_avx2)
> + IFUNC_IMPL_ADD (array, i, bcmp,
> + (CPU_FEATURE_USABLE (AVX2)
> + && CPU_FEATURE_USABLE (BMI2)
> + && CPU_FEATURE_USABLE (MOVBE)
> + && CPU_FEATURE_USABLE (RTM)),
> + __bcmp_avx2_rtm)
> + IFUNC_IMPL_ADD (array, i, bcmp,
> + (CPU_FEATURE_USABLE (AVX512VL)
> + && CPU_FEATURE_USABLE (AVX512BW)
> + && CPU_FEATURE_USABLE (MOVBE)
> + && CPU_FEATURE_USABLE (BMI2)),
> + __bcmp_evex)
> + IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
> + __bcmp_sse4_1)
> + IFUNC_IMPL_ADD (array, i, bcmp, 1, __bcmp_sse2))
> +
> /* Support sysdeps/x86_64/multiarch/memchr.c. */
> IFUNC_IMPL (i, name, memchr,
> IFUNC_IMPL_ADD (array, i, memchr,
> diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> index b135fa2d40..2a4867ad18 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> @@ -17,7 +17,9 @@
> <https://www.gnu.org/licenses/>. */
>
> #if IS_IN (libc)
> -# define memcmp __memcmp_sse2
> +# ifndef memcmp
> +# define memcmp __memcmp_sse2
> +# endif
>
> # ifdef SHARED
> # undef libc_hidden_builtin_def
> diff --git a/sysdeps/x86_64/multiarch/memcmp.c b/sysdeps/x86_64/multiarch/memcmp.c
> index fe725f3563..1760e045df 100644
> --- a/sysdeps/x86_64/multiarch/memcmp.c
> +++ b/sysdeps/x86_64/multiarch/memcmp.c
> @@ -27,8 +27,6 @@
> # include "ifunc-memcmp.h"
>
> libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
> -# undef bcmp
> -weak_alias (memcmp, bcmp)
>
> # ifdef SHARED
> __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
> --
> 2.25.1
>
>
[-- Attachment #2: bcmp-skl.pdf --]
[-- Type: application/pdf, Size: 195097 bytes --]
[-- Attachment #3: bcmp-tgl.pdf --]
[-- Type: application/pdf, Size: 223172 bytes --]