From: Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org>
To: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
Date: Mon, 13 Sep 2021 18:22:10 -0500
Message-ID: <CAFUsyfLZnBra19JkFjV3Kkx55kqEEgGNA1vNRgK2v4F11WMM2g@mail.gmail.com>
In-Reply-To: <20210913230506.546749-1-goldstein.w.n@gmail.com>

On Mon, Sep 13, 2021 at 6:21 PM Noah Goldstein <goldstein.w.n@gmail.com>
wrote:

> No bug. This commit adds support for an optimized bcmp implementation.
> Support is for sse2, sse4_1, avx2, and evex.
>
> All string tests passing and build succeeding.
> ---
> The motivation for this commit is that compilers will optimize the
> idiomatic use of memcmp's return value as a boolean:
>
> https://godbolt.org/z/Tbhefh6cv
>
> so it seems reasonable to have an optimized bcmp implementation, as we
> can get ~0-25% improvement (generally larger for the smaller size
> ranges, which are ultimately the most important to optimize for).
>
> Numbers for new implementations attached in reply.
>

The numbers are attached to this email.
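
For readers without the Godbolt link to hand, the idiom in question
looks roughly like the sketch below. The names buffers_equal and
buffers_order are made up for illustration and are not part of the
patch. In the first function only the zero/non-zero outcome is
observed, which is all bcmp promises, so a compiler may emit a bcmp
call there. The second function needs the sign of the result and has
to remain a memcmp call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Equality-only use: the sign of memcmp's result is discarded, so a
   bcmp-style comparison would be sufficient here.  */
int
buffers_equal (const void *a, const void *b, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL));
  return memcmp (a, b, n) == 0;
}

/* Ordering use: the sign of the result matters, so this needs the full
   memcmp contract.  Returns -1, 0 or 1.  */
int
buffers_order (const void *a, const void *b, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL));
  int r = memcmp (a, b, n);
  return (r > 0) - (r < 0);
}
```

Callers that only ever write `if (memcmp (p, q, n) == 0)` fall into the
first category, which is why a dedicated bcmp entry point can pay off.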


>
> Tests were run on the following CPUs:
>
> Tigerlake:
> https://ark.intel.com/content/www/us/en/ark/products/208921/intel-core-i7-1165g7-processor-12m-cache-up-to-4-70-ghz-with-ipu.html
> Skylake:
> https://ark.intel.com/content/www/us/en/ark/products/149091/intel-core-i7-8565u-processor-8m-cache-up-to-4-60-ghz.html
>
> Some notes on the numbers.
>
> There are some regressions in the sse2/sse4_1 versions. I didn't
> optimize these versions beyond defining out obviously irrelevant code
> for bcmp. My intuition is that the slowdowns are alignment related. I
> am not sure if these issues would translate to architectures that
> would actually use sse2/sse4_1.
>
> I added the sse2/sse4_1 implementations mostly so that the ifunc would
> have something to fall back on. With the lackluster numbers it may not
> be worth it, especially factoring in code size costs. Thoughts?
>
> The Tigerlake and Skylake versions are basically universal
> improvements for evex and avx2. I opted to align bcmp to 64 bytes as
> opposed to 16. The rationale is that a 16-byte alignment guarantee is
> not enough to optimize for frontend behavior on either machine. I
> think good frontend behavior is important in any function where
> throughput might matter (which I think is the case for bcmp).
>
>
>  benchtests/Makefile                        |  2 +-
>  benchtests/bench-bcmp.c                    | 20 ++++++++
>  benchtests/bench-memcmp.c                  |  4 +-
>  string/Makefile                            |  4 +-
>  string/test-bcmp.c                         | 21 +++++++++
>  string/test-memcmp.c                       | 27 +++++++----
>  sysdeps/x86_64/memcmp.S                    |  2 -
>  sysdeps/x86_64/multiarch/Makefile          |  3 ++
>  sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S   | 12 +++++
>  sysdeps/x86_64/multiarch/bcmp-avx2.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp-evex.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp-sse2.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp-sse4.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp.c            | 35 ++++++++++++++
>  sysdeps/x86_64/multiarch/ifunc-bcmp.h      | 53 ++++++++++++++++++++++
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 23 ++++++++++
>  sysdeps/x86_64/multiarch/memcmp-sse2.S     |  4 +-
>  sysdeps/x86_64/multiarch/memcmp.c          |  2 -
>  18 files changed, 286 insertions(+), 18 deletions(-)
>  create mode 100644 benchtests/bench-bcmp.c
>  create mode 100644 string/test-bcmp.c
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-evex.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse2.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse4.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp.c
>  create mode 100644 sysdeps/x86_64/multiarch/ifunc-bcmp.h
>
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 1530939a8c..5fc495eb57 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -47,7 +47,7 @@ bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}})
>  endif
>
>  # String function benchmarks.
> -string-benchset := memccpy memchr memcmp memcpy memmem memmove \
> +string-benchset := bcmp memccpy memchr memcmp memcpy memmem memmove \
>                    mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
>                    strcat strchr strchrnul strcmp strcpy strcspn strlen \
>                    strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
> diff --git a/benchtests/bench-bcmp.c b/benchtests/bench-bcmp.c
> new file mode 100644
> index 0000000000..1023639787
> --- /dev/null
> +++ b/benchtests/bench-bcmp.c
> @@ -0,0 +1,20 @@
> +/* Measure bcmp functions.
> +   Copyright (C) 2015-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define TEST_BCMP 1
> +#include "bench-memcmp.c"
> diff --git a/benchtests/bench-memcmp.c b/benchtests/bench-memcmp.c
> index 744c7ec5ba..4d5f8fb766 100644
> --- a/benchtests/bench-memcmp.c
> +++ b/benchtests/bench-memcmp.c
> @@ -17,7 +17,9 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
>  # define TEST_NAME "wmemcmp"
>  #else
>  # define TEST_NAME "memcmp"
> diff --git a/string/Makefile b/string/Makefile
> index f0fce2a0b8..f1f67ee157 100644
> --- a/string/Makefile
> +++ b/string/Makefile
> @@ -35,7 +35,7 @@ routines      := strcat strchr strcmp strcoll strcpy strcspn          \
>                    strncat strncmp strncpy                              \
>                    strrchr strpbrk strsignal strspn strstr strtok       \
>                    strtok_r strxfrm memchr memcmp memmove memset        \
> -                  mempcpy bcopy bzero ffs ffsll stpcpy stpncpy         \
> +                  mempcpy bcmp bcopy bzero ffs ffsll stpcpy stpncpy    \
>                    strcasecmp strncase strcasecmp_l strncase_l          \
>                    memccpy memcpy wordcopy strsep strcasestr            \
>                    swab strfry memfrob memmem rawmemchr strchrnul       \
> @@ -52,7 +52,7 @@ strop-tests   := memchr memcmp memcpy memmove mempcpy memset memccpy  \
>                    stpcpy stpncpy strcat strchr strcmp strcpy strcspn   \
>                    strlen strncmp strncpy strpbrk strrchr strspn memmem \
>                    strstr strcasestr strnlen strcasecmp strncasecmp     \
> -                  strncat rawmemchr strchrnul bcopy bzero memrchr      \
> +                  strncat rawmemchr strchrnul bcmp bcopy bzero memrchr \
>                    explicit_bzero
>  tests          := tester inl-tester noinl-tester testcopy test-ffs     \
>                    tst-strlen stratcliff tst-svc tst-inlcall            \
> diff --git a/string/test-bcmp.c b/string/test-bcmp.c
> new file mode 100644
> index 0000000000..6d19a4a87c
> --- /dev/null
> +++ b/string/test-bcmp.c
> @@ -0,0 +1,21 @@
> +/* Test and measure bcmp functions.
> +   Copyright (C) 2012-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define BAD_RESULT(result, expec) ((!(result)) != (!(expec)))
> +#define TEST_BCMP 1
> +#include "test-memcmp.c"
> diff --git a/string/test-memcmp.c b/string/test-memcmp.c
> index 6ddbc05d2f..c630e6799d 100644
> --- a/string/test-memcmp.c
> +++ b/string/test-memcmp.c
> @@ -17,11 +17,14 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
>  # define TEST_NAME "wmemcmp"
>  #else
>  # define TEST_NAME "memcmp"
>  #endif
> +
>  #include "test-string.h"
>  #ifdef WIDE
>  # include <inttypes.h>
> @@ -35,6 +38,7 @@
>  # define CHARBYTES 4
>  # define CHAR__MIN WCHAR_MIN
>  # define CHAR__MAX WCHAR_MAX
> +
>  int
>  simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
>  {
> @@ -48,8 +52,11 @@ simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
>  }
>  #else
>  # include <limits.h>
> -
> -# define MEMCMP memcmp
> +# ifdef TEST_BCMP
> +#  define MEMCMP bcmp
> +# else
> +#  define MEMCMP memcmp
> +# endif
>  # define MEMCPY memcpy
>  # define SIMPLE_MEMCMP simple_memcmp
>  # define CHAR char
> @@ -69,6 +76,12 @@ simple_memcmp (const char *s1, const char *s2, size_t n)
>  }
>  #endif
>
> +#ifndef BAD_RESULT
> +# define BAD_RESULT(result, expec)                                     \
> +    (((result) == 0 && (expec)) || ((result) < 0 && (expec) >= 0) ||    \
> +     ((result) > 0 && (expec) <= 0))
> +#endif
> +
>  typedef int (*proto_t) (const CHAR *, const CHAR *, size_t);
>
>  IMPL (SIMPLE_MEMCMP, 0)
> @@ -79,9 +92,7 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
>               int exp_result)
>  {
>    int result = CALL (impl, s1, s2, len);
> -  if ((exp_result == 0 && result != 0)
> -      || (exp_result < 0 && result >= 0)
> -      || (exp_result > 0 && result <= 0))
> +  if (BAD_RESULT(result, exp_result))
>      {
>        error (0, 0, "Wrong result in function %s %d %d", impl->name,
>              result, exp_result);
> @@ -186,9 +197,7 @@ do_random_tests (void)
>         {
>           r = CALL (impl, (CHAR *) p1 + align1, (const CHAR *) p2 + align2,
>                     len);
> -         if ((r == 0 && result)
> -             || (r < 0 && result >= 0)
> -             || (r > 0 && result <= 0))
> +         if (BAD_RESULT(r, result))
>             {
>               error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
>                      n, impl->name, align1 * CHARBYTES & 63,  align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
> diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
> index 870e15c5a0..dfd0269db2 100644
> --- a/sysdeps/x86_64/memcmp.S
> +++ b/sysdeps/x86_64/memcmp.S
> @@ -356,6 +356,4 @@ L(ATR32res):
>         .p2align 4,, 4
>  END(memcmp)
>
> -#undef bcmp
> -weak_alias (memcmp, bcmp)
>  libc_hidden_builtin_def (memcmp)
> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
> index 26be40959c..9dd0d8c3ff 100644
> --- a/sysdeps/x86_64/multiarch/Makefile
> +++ b/sysdeps/x86_64/multiarch/Makefile
> @@ -1,6 +1,7 @@
>  ifeq ($(subdir),string)
>
>  sysdep_routines += strncat-c stpncpy-c strncpy-c \
> +                  bcmp-sse2 bcmp-sse4 bcmp-avx2 \
>                    strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3  \
>                    strcmp-sse4_2 strcmp-avx2 \
>                    strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
> @@ -40,6 +41,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>                    memset-sse2-unaligned-erms \
>                    memset-avx2-unaligned-erms \
>                    memset-avx512-unaligned-erms \
> +                  bcmp-avx2-rtm \
>                    memchr-avx2-rtm \
>                    memcmp-avx2-movbe-rtm \
>                    memmove-avx-unaligned-erms-rtm \
> @@ -59,6 +61,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>                    strncpy-avx2-rtm \
>                    strnlen-avx2-rtm \
>                    strrchr-avx2-rtm \
> +                  bcmp-evex \
>                    memchr-evex \
>                    memcmp-evex-movbe \
>                    memmove-evex-unaligned-erms \
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> new file mode 100644
> index 0000000000..d742257e4e
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> @@ -0,0 +1,12 @@
> +#ifndef MEMCMP
> +# define MEMCMP __bcmp_avx2_rtm
> +#endif
> +
> +#define ZERO_UPPER_VEC_REGISTERS_RETURN \
> +  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
> +
> +#define VZEROUPPER_RETURN jmp   L(return_vzeroupper)
> +
> +#define SECTION(p) p##.avx.rtm
> +
> +#include "bcmp-avx2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> new file mode 100644
> index 0000000000..93a9a20b17
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with AVX2.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP        __bcmp_avx2
> +#endif
> +
> +#include "memcmp-avx2-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
> new file mode 100644
> index 0000000000..ade52e8c68
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with EVEX.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP        __bcmp_evex
> +#endif
> +
> +#include "memcmp-evex-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse2.S b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> new file mode 100644
> index 0000000000..b18d570386
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE2
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# ifndef memcmp
> +#  define memcmp       __bcmp_sse2
> +# endif
> +# define USE_AS_BCMP   1
> +#include "memcmp-sse2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse4.S b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> new file mode 100644
> index 0000000000..ed9804053f
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE4.1
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# ifndef MEMCMP
> +#  define MEMCMP       __bcmp_sse4_1
> +# endif
> +# define USE_AS_BCMP   1
> +#include "memcmp-sse4.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp.c b/sysdeps/x86_64/multiarch/bcmp.c
> new file mode 100644
> index 0000000000..6e26b73ecc
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp.c
> @@ -0,0 +1,35 @@
> +/* Multiple versions of bcmp.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +#if IS_IN (libc)
> +# define bcmp __redirect_bcmp
> +# include <string.h>
> +# undef bcmp
> +
> +# define SYMBOL_NAME bcmp
> +# include "ifunc-bcmp.h"
> +
> +libc_ifunc_redirected (__redirect_bcmp, bcmp, IFUNC_SELECTOR ());
> +
> +# ifdef SHARED
> +__hidden_ver1 (bcmp, __GI_bcmp, __redirect_bcmp)
> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (bcmp);
> +# endif
> +#endif
> diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> new file mode 100644
> index 0000000000..b0dacd8526
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> @@ -0,0 +1,53 @@
> +/* Common definition for bcmp ifunc selections.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# include <init-arch.h>
> +
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> +
> +static inline void *
> +IFUNC_SELECTOR (void)
> +{
> +  const struct cpu_features* cpu_features = __get_cpu_features ();
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
> +      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
> +    {
> +      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> +         && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> +       return OPTIMIZE (evex);
> +
> +      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> +       return OPTIMIZE (avx2_rtm);
> +
> +      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
> +       return OPTIMIZE (avx2);
> +    }
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
> +    return OPTIMIZE (sse4_1);
> +
> +  return OPTIMIZE (sse2);
> +}
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 39ab10613b..dd0c393c7d 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -38,6 +38,29 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
>    size_t i = 0;
>
> +  /* Support sysdeps/x86_64/multiarch/bcmp.c.  */
> +  IFUNC_IMPL (i, name, bcmp,
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (BMI2)),
> +                             __bcmp_avx2)
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)
> +                              && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (RTM)),
> +                             __bcmp_avx2_rtm)
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX512VL)
> +                              && CPU_FEATURE_USABLE (AVX512BW)
> +                              && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (BMI2)),
> +                             __bcmp_evex)
> +             IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
> +                             __bcmp_sse4_1)
> +             IFUNC_IMPL_ADD (array, i, bcmp, 1, __bcmp_sse2))
> +
>    /* Support sysdeps/x86_64/multiarch/memchr.c.  */
>    IFUNC_IMPL (i, name, memchr,
>               IFUNC_IMPL_ADD (array, i, memchr,
> diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> index b135fa2d40..2a4867ad18 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> @@ -17,7 +17,9 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #if IS_IN (libc)
> -# define memcmp __memcmp_sse2
> +# ifndef memcmp
> +#  define memcmp __memcmp_sse2
> +# endif
>
>  # ifdef SHARED
>  #  undef libc_hidden_builtin_def
> diff --git a/sysdeps/x86_64/multiarch/memcmp.c b/sysdeps/x86_64/multiarch/memcmp.c
> index fe725f3563..1760e045df 100644
> --- a/sysdeps/x86_64/multiarch/memcmp.c
> +++ b/sysdeps/x86_64/multiarch/memcmp.c
> @@ -27,8 +27,6 @@
>  # include "ifunc-memcmp.h"
>
>  libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
> -# undef bcmp
> -weak_alias (memcmp, bcmp)
>
>  # ifdef SHARED
>  __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
> --
> 2.25.1
>
>
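
The BAD_RESULT split between string/test-bcmp.c and string/test-memcmp.c
in the patch above comes down to the two predicates sketched below. The
function names are hypothetical and exist only for this illustration;
the logic mirrors the two macro definitions. A memcmp result has to
agree in sign with the reference result, while a bcmp result only has
to agree on zero versus non-zero.

```c
#include <assert.h>

/* memcmp contract: the result must have the same sign as the
   reference result (negative, zero or positive).  */
int
memcmp_result_is_bad (int result, int expec)
{
  return (result == 0 && expec != 0)
	 || (result < 0 && expec >= 0)
	 || (result > 0 && expec <= 0);
}

/* bcmp contract: only "equal" (zero) versus "different" (non-zero)
   is checked; the sign is not significant.  */
int
bcmp_result_is_bad (int result, int expec)
{
  return (!result) != (!expec);
}

/* A few worked cases.  A result of 1 where the reference says -1 is
   acceptable for bcmp but not for memcmp.  */
void
worked_cases (void)
{
  assert (!bcmp_result_is_bad (1, -1));
  assert (memcmp_result_is_bad (1, -1));
  assert (bcmp_result_is_bad (0, 5));
  assert (!bcmp_result_is_bad (0, 0));
  assert (!memcmp_result_is_bad (-3, -1));
}
```

This is why test-bcmp.c defines BAD_RESULT before including
test-memcmp.c: the stricter sign check would reject valid bcmp
implementations.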

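The selection order in ifunc-bcmp.h can be modelled in plain C as in
the sketch below. struct cpu_model, enum bcmp_impl and select_bcmp are
hypothetical stand-ins invented for this illustration and are not
glibc interfaces; the real selector queries cpu_features through the
CPU_FEATURE_USABLE_P and CPU_FEATURES_ARCH_P macros and returns
function pointers via OPTIMIZE ().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat view of the CPU properties the selector tests.  */
struct cpu_model
{
  bool avx2, bmi2, movbe, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm, prefer_no_vzeroupper, sse4_1;
};

/* Hypothetical tags standing in for the __bcmp_* entry points.  */
enum bcmp_impl
{
  BCMP_SSE2, BCMP_SSE4_1, BCMP_AVX2, BCMP_AVX2_RTM, BCMP_EVEX
};

/* Same precedence as IFUNC_SELECTOR in the patch: evex, then
   avx2_rtm, then avx2 (unless VZEROUPPER should be avoided), then
   sse4_1, with sse2 as the unconditional fallback.  */
enum bcmp_impl
select_bcmp (const struct cpu_model *cpu)
{
  assert (cpu != NULL);

  if (cpu->avx2 && cpu->bmi2 && cpu->movbe
      && cpu->avx_fast_unaligned_load)
    {
      if (cpu->avx512vl && cpu->avx512bw)
	return BCMP_EVEX;

      if (cpu->rtm)
	return BCMP_AVX2_RTM;

      if (!cpu->prefer_no_vzeroupper)
	return BCMP_AVX2;
    }

  if (cpu->sse4_1)
    return BCMP_SSE4_1;

  return BCMP_SSE2;
}
```

Note that a CPU with AVX2 but with Prefer_No_VZEROUPPER set and without
RTM or AVX512VL/BW falls through to the sse4_1 or sse2 version, which is
one reason the SSE fallbacks are still needed.
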
[-- Attachment #2: bcmp-skl.pdf --]
[-- Type: application/pdf, Size: 195097 bytes --]

[-- Attachment #3: bcmp-tgl.pdf --]
[-- Type: application/pdf, Size: 223172 bytes --]
