* [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
@ 2021-09-13 23:05 Noah Goldstein via Libc-alpha
  2021-09-13 23:05 ` [PATCH 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
                   ` (6 more replies)
  0 siblings, 7 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-13 23:05 UTC
  To: libc-alpha

No bug. This commit adds support for an optimized bcmp implementation.
Support is for sse2, sse4_1, avx2, and evex.

All string tests passing and build succeeding.
---
The motivation for this commit is that compilers will optimize the
idiomatic use of the memcmp return value as a boolean into a call to
bcmp:
    
https://godbolt.org/z/Tbhefh6cv
    
so it seems reasonable to have an optimized bcmp implementation, as we
can get ~0-25% improvement (generally a larger improvement for the
smaller size ranges, which ultimately are the most important to
optimize for).
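
As a minimal sketch of the idiom in question (not part of the patch;
the helper names keys_equal and keys_equal_bcmp are hypothetical), the
caller below only tests the result against zero, so the ordering
information that memcmp computes is never used:

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: only asks "equal or not", so the sign of the
   memcmp result is never observed by the caller.  */
static int
keys_equal (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}

/* The same predicate written against bcmp, which only promises a
   zero / nonzero result.  */
static int
keys_equal_bcmp (const void *a, const void *b, size_t n)
{
  return bcmp (a, b, n) == 0;
}
```

Both helpers return the same value for any pair of buffers, which is
what allows a compiler to substitute one call for the other.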
    
Numbers for new implementations attached in reply.

Tests were run on the following CPUs:

Tigerlake: https://ark.intel.com/content/www/us/en/ark/products/208921/intel-core-i7-1165g7-processor-12m-cache-up-to-4-70-ghz-with-ipu.html
Skylake: https://ark.intel.com/content/www/us/en/ark/products/149091/intel-core-i7-8565u-processor-8m-cache-up-to-4-60-ghz.html

Some notes on the numbers.

There are some regressions in the sse2/sse4_1 versions. I didn't
optimize these versions beyond defining out obviously irrelevant code
for bcmp. My intuition is that the slowdowns are alignment related. I
am not sure if these issues would translate to architectures that
would actually use sse2/sse4_1.

I added the sse2/sse4_1 implementations mostly so that the ifunc would
have something to fall back on. With the lackluster numbers it may not
be worth it, especially factoring in code size costs. Thoughts?
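
For reference, the fallback order can be sketched in plain C. This is
only an illustration that mirrors the checks in ifunc-bcmp.h from this
patch: the struct cpu_flags_sketch and the function select_bcmp_sketch
are hypothetical stand-ins for glibc's cpu_features machinery, and the
selector returns the implementation name rather than a function
pointer.

```c
#include <stdbool.h>

/* Hypothetical feature flags; glibc reads the real ones from
   struct cpu_features via CPU_FEATURE_USABLE_P and friends.  */
struct cpu_flags_sketch
{
  bool avx2, bmi2, movbe, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm, prefer_no_vzeroupper, sse4_1;
};

/* Returns the name of the implementation that would be selected,
   following the same order of checks as the selector in this patch.  */
static const char *
select_bcmp_sketch (const struct cpu_flags_sketch *f)
{
  if (f->avx2 && f->bmi2 && f->movbe && f->avx_fast_unaligned_load)
    {
      if (f->avx512vl && f->avx512bw)
        return "__bcmp_evex";
      if (f->rtm)
        return "__bcmp_avx2_rtm";
      if (!f->prefer_no_vzeroupper)
        return "__bcmp_avx2";
    }
  /* Reached when the AVX2 path is unavailable or not preferred.  */
  if (f->sse4_1)
    return "__bcmp_sse4_1";
  return "__bcmp_sse2";
}
```

Without the sse2/sse4_1 versions, the last two returns would have
nothing bcmp-specific to select.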

The Tigerlake and Skylake results are basically universal
improvements for evex and avx2. I opted to align bcmp to 64 bytes as
opposed to 16. The rationale is that a 16 byte alignment guarantee
alone is not enough to optimize for frontend behavior on either
machine. I think good frontend behavior matters in any function where
throughput might be important, which I think is the case for bcmp.

    
 benchtests/Makefile                        |  2 +-
 benchtests/bench-bcmp.c                    | 20 ++++++++
 benchtests/bench-memcmp.c                  |  4 +-
 string/Makefile                            |  4 +-
 string/test-bcmp.c                         | 21 +++++++++
 string/test-memcmp.c                       | 27 +++++++----
 sysdeps/x86_64/memcmp.S                    |  2 -
 sysdeps/x86_64/multiarch/Makefile          |  3 ++
 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S   | 12 +++++
 sysdeps/x86_64/multiarch/bcmp-avx2.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-evex.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-sse2.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-sse4.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp.c            | 35 ++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-bcmp.h      | 53 ++++++++++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 23 ++++++++++
 sysdeps/x86_64/multiarch/memcmp-sse2.S     |  4 +-
 sysdeps/x86_64/multiarch/memcmp.c          |  2 -
 18 files changed, 286 insertions(+), 18 deletions(-)
 create mode 100644 benchtests/bench-bcmp.c
 create mode 100644 string/test-bcmp.c
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse2.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse4.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp.c
 create mode 100644 sysdeps/x86_64/multiarch/ifunc-bcmp.h

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 1530939a8c..5fc495eb57 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -47,7 +47,7 @@ bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}})
 endif
 
 # String function benchmarks.
-string-benchset := memccpy memchr memcmp memcpy memmem memmove \
+string-benchset := bcmp memccpy memchr memcmp memcpy memmem memmove \
 		   mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
 		   strcat strchr strchrnul strcmp strcpy strcspn strlen \
 		   strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
diff --git a/benchtests/bench-bcmp.c b/benchtests/bench-bcmp.c
new file mode 100644
index 0000000000..1023639787
--- /dev/null
+++ b/benchtests/bench-bcmp.c
@@ -0,0 +1,20 @@
+/* Measure bcmp functions.
+   Copyright (C) 2015-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define TEST_BCMP 1
+#include "bench-memcmp.c"
diff --git a/benchtests/bench-memcmp.c b/benchtests/bench-memcmp.c
index 744c7ec5ba..4d5f8fb766 100644
--- a/benchtests/bench-memcmp.c
+++ b/benchtests/bench-memcmp.c
@@ -17,7 +17,9 @@
    <https://www.gnu.org/licenses/>.  */
 
 #define TEST_MAIN
-#ifdef WIDE
+#ifdef TEST_BCMP
+# define TEST_NAME "bcmp"
+#elif defined WIDE
 # define TEST_NAME "wmemcmp"
 #else
 # define TEST_NAME "memcmp"
diff --git a/string/Makefile b/string/Makefile
index f0fce2a0b8..f1f67ee157 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -35,7 +35,7 @@ routines	:= strcat strchr strcmp strcoll strcpy strcspn		\
 		   strncat strncmp strncpy				\
 		   strrchr strpbrk strsignal strspn strstr strtok	\
 		   strtok_r strxfrm memchr memcmp memmove memset	\
-		   mempcpy bcopy bzero ffs ffsll stpcpy stpncpy		\
+		   mempcpy bcmp bcopy bzero ffs ffsll stpcpy stpncpy		\
 		   strcasecmp strncase strcasecmp_l strncase_l		\
 		   memccpy memcpy wordcopy strsep strcasestr		\
 		   swab strfry memfrob memmem rawmemchr strchrnul	\
@@ -52,7 +52,7 @@ strop-tests	:= memchr memcmp memcpy memmove mempcpy memset memccpy	\
 		   stpcpy stpncpy strcat strchr strcmp strcpy strcspn	\
 		   strlen strncmp strncpy strpbrk strrchr strspn memmem	\
 		   strstr strcasestr strnlen strcasecmp strncasecmp	\
-		   strncat rawmemchr strchrnul bcopy bzero memrchr	\
+		   strncat rawmemchr strchrnul bcmp bcopy bzero memrchr	\
 		   explicit_bzero
 tests		:= tester inl-tester noinl-tester testcopy test-ffs	\
 		   tst-strlen stratcliff tst-svc tst-inlcall		\
diff --git a/string/test-bcmp.c b/string/test-bcmp.c
new file mode 100644
index 0000000000..6d19a4a87c
--- /dev/null
+++ b/string/test-bcmp.c
@@ -0,0 +1,21 @@
+/* Test and measure bcmp functions.
+   Copyright (C) 2012-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define BAD_RESULT(result, expec) ((!(result)) != (!(expec)))
+#define TEST_BCMP 1
+#include "test-memcmp.c"
diff --git a/string/test-memcmp.c b/string/test-memcmp.c
index 6ddbc05d2f..c630e6799d 100644
--- a/string/test-memcmp.c
+++ b/string/test-memcmp.c
@@ -17,11 +17,14 @@
    <https://www.gnu.org/licenses/>.  */
 
 #define TEST_MAIN
-#ifdef WIDE
+#ifdef TEST_BCMP
+# define TEST_NAME "bcmp"
+#elif defined WIDE
 # define TEST_NAME "wmemcmp"
 #else
 # define TEST_NAME "memcmp"
 #endif
+
 #include "test-string.h"
 #ifdef WIDE
 # include <inttypes.h>
@@ -35,6 +38,7 @@
 # define CHARBYTES 4
 # define CHAR__MIN WCHAR_MIN
 # define CHAR__MAX WCHAR_MAX
+
 int
 simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
 {
@@ -48,8 +52,11 @@ simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
 }
 #else
 # include <limits.h>
-
-# define MEMCMP memcmp
+# ifdef TEST_BCMP
+#  define MEMCMP bcmp
+# else
+#  define MEMCMP memcmp
+# endif
 # define MEMCPY memcpy
 # define SIMPLE_MEMCMP simple_memcmp
 # define CHAR char
@@ -69,6 +76,12 @@ simple_memcmp (const char *s1, const char *s2, size_t n)
 }
 #endif
 
+#ifndef BAD_RESULT
+# define BAD_RESULT(result, expec)                                      \
+    (((result) == 0 && (expec)) || ((result) < 0 && (expec) >= 0) ||    \
+     ((result) > 0 && (expec) <= 0))
+#endif
+
 typedef int (*proto_t) (const CHAR *, const CHAR *, size_t);
 
 IMPL (SIMPLE_MEMCMP, 0)
@@ -79,9 +92,7 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
 	      int exp_result)
 {
   int result = CALL (impl, s1, s2, len);
-  if ((exp_result == 0 && result != 0)
-      || (exp_result < 0 && result >= 0)
-      || (exp_result > 0 && result <= 0))
+  if (BAD_RESULT (result, exp_result))
     {
       error (0, 0, "Wrong result in function %s %d %d", impl->name,
 	     result, exp_result);
@@ -186,9 +197,7 @@ do_random_tests (void)
 	{
 	  r = CALL (impl, (CHAR *) p1 + align1, (const CHAR *) p2 + align2,
 		    len);
-	  if ((r == 0 && result)
-	      || (r < 0 && result >= 0)
-	      || (r > 0 && result <= 0))
+	  if (BAD_RESULT (r, result))
 	    {
 	      error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
 		     n, impl->name, align1 * CHARBYTES & 63,  align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
index 870e15c5a0..dfd0269db2 100644
--- a/sysdeps/x86_64/memcmp.S
+++ b/sysdeps/x86_64/memcmp.S
@@ -356,6 +356,4 @@ L(ATR32res):
 	.p2align 4,, 4
 END(memcmp)
 
-#undef bcmp
-weak_alias (memcmp, bcmp)
 libc_hidden_builtin_def (memcmp)
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 26be40959c..9dd0d8c3ff 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -1,6 +1,7 @@
 ifeq ($(subdir),string)
 
 sysdep_routines += strncat-c stpncpy-c strncpy-c \
+		   bcmp-sse2 bcmp-sse4 bcmp-avx2 \
 		   strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3  \
 		   strcmp-sse4_2 strcmp-avx2 \
 		   strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
@@ -40,6 +41,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
 		   memset-sse2-unaligned-erms \
 		   memset-avx2-unaligned-erms \
 		   memset-avx512-unaligned-erms \
+		   bcmp-avx2-rtm \
 		   memchr-avx2-rtm \
 		   memcmp-avx2-movbe-rtm \
 		   memmove-avx-unaligned-erms-rtm \
@@ -59,6 +61,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
 		   strncpy-avx2-rtm \
 		   strnlen-avx2-rtm \
 		   strrchr-avx2-rtm \
+		   bcmp-evex \
 		   memchr-evex \
 		   memcmp-evex-movbe \
 		   memmove-evex-unaligned-erms \
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
new file mode 100644
index 0000000000..d742257e4e
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
@@ -0,0 +1,12 @@
+#ifndef MEMCMP
+# define MEMCMP __bcmp_avx2_rtm
+#endif
+
+#define ZERO_UPPER_VEC_REGISTERS_RETURN \
+  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
+
+#define VZEROUPPER_RETURN jmp	 L(return_vzeroupper)
+
+#define SECTION(p) p##.avx.rtm
+
+#include "bcmp-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
new file mode 100644
index 0000000000..93a9a20b17
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with AVX2.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MEMCMP
+# define MEMCMP	__bcmp_avx2
+#endif
+
+#include "memcmp-avx2-movbe.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
new file mode 100644
index 0000000000..ade52e8c68
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with EVEX.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MEMCMP
+# define MEMCMP	__bcmp_evex
+#endif
+
+#include "memcmp-evex-movbe.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-sse2.S b/sysdeps/x86_64/multiarch/bcmp-sse2.S
new file mode 100644
index 0000000000..b18d570386
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-sse2.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with SSE2
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# ifndef memcmp
+#  define memcmp	__bcmp_sse2
+# endif
+# define USE_AS_BCMP	1
+#include "memcmp-sse2.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-sse4.S b/sysdeps/x86_64/multiarch/bcmp-sse4.S
new file mode 100644
index 0000000000..ed9804053f
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-sse4.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with SSE4.1
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# ifndef MEMCMP
+#  define MEMCMP	__bcmp_sse4_1
+# endif
+# define USE_AS_BCMP	1
+#include "memcmp-sse4.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp.c b/sysdeps/x86_64/multiarch/bcmp.c
new file mode 100644
index 0000000000..6e26b73ecc
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp.c
@@ -0,0 +1,35 @@
+/* Multiple versions of bcmp.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define bcmp __redirect_bcmp
+# include <string.h>
+# undef bcmp
+
+# define SYMBOL_NAME bcmp
+# include "ifunc-bcmp.h"
+
+libc_ifunc_redirected (__redirect_bcmp, bcmp, IFUNC_SELECTOR ());
+
+# ifdef SHARED
+__hidden_ver1 (bcmp, __GI_bcmp, __redirect_bcmp)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (bcmp);
+# endif
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
new file mode 100644
index 0000000000..b0dacd8526
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
@@ -0,0 +1,53 @@
+/* Common definition for bcmp ifunc selections.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# include <init-arch.h>
+
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+  const struct cpu_features* cpu_features = __get_cpu_features ();
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
+      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
+      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
+    {
+      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
+	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
+	return OPTIMIZE (evex);
+
+      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+	return OPTIMIZE (avx2_rtm);
+
+      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
+	return OPTIMIZE (avx2);
+    }
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
+    return OPTIMIZE (sse4_1);
+
+  return OPTIMIZE (sse2);
+}
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 39ab10613b..dd0c393c7d 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -38,6 +38,29 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   size_t i = 0;
 
+  /* Support sysdeps/x86_64/multiarch/bcmp.c.  */
+  IFUNC_IMPL (i, name, bcmp,
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX2)
+			       && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (BMI2)),
+			      __bcmp_avx2)
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX2)
+			       && CPU_FEATURE_USABLE (BMI2)
+			       && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (RTM)),
+			      __bcmp_avx2_rtm)
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX512VL)
+			       && CPU_FEATURE_USABLE (AVX512BW)
+			       && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (BMI2)),
+			      __bcmp_evex)
+	      IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
+			      __bcmp_sse4_1)
+	      IFUNC_IMPL_ADD (array, i, bcmp, 1, __bcmp_sse2))
+
   /* Support sysdeps/x86_64/multiarch/memchr.c.  */
   IFUNC_IMPL (i, name, memchr,
 	      IFUNC_IMPL_ADD (array, i, memchr,
diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S b/sysdeps/x86_64/multiarch/memcmp-sse2.S
index b135fa2d40..2a4867ad18 100644
--- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
+++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
@@ -17,7 +17,9 @@
    <https://www.gnu.org/licenses/>.  */
 
 #if IS_IN (libc)
-# define memcmp __memcmp_sse2
+# ifndef memcmp
+#  define memcmp __memcmp_sse2
+# endif
 
 # ifdef SHARED
 #  undef libc_hidden_builtin_def
diff --git a/sysdeps/x86_64/multiarch/memcmp.c b/sysdeps/x86_64/multiarch/memcmp.c
index fe725f3563..1760e045df 100644
--- a/sysdeps/x86_64/multiarch/memcmp.c
+++ b/sysdeps/x86_64/multiarch/memcmp.c
@@ -27,8 +27,6 @@
 # include "ifunc-memcmp.h"
 
 libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
-# undef bcmp
-weak_alias (memcmp, bcmp)
 
 # ifdef SHARED
 __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
-- 
2.25.1



* [PATCH 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S
  2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
@ 2021-09-13 23:05 ` Noah Goldstein via Libc-alpha
  2021-09-13 23:05 ` [PATCH 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S Noah Goldstein via Libc-alpha
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-13 23:05 UTC
  To: libc-alpha

No bug. This commit does not modify any of the memcmp
implementation. It just adds bcmp ifdefs to skip obvious cases
where computing the proper 1/-1 required by memcmp is not needed.
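
To illustrate what can be skipped, here is a rough C rendering of the
idea behind the 8 byte tail case (L(sub_return8) in the diff below).
It is a sketch under assumptions, not the implementation: the name
bcmp8_sketch is hypothetical, and unlike the assembly, which returns
the folded value directly, it normalizes the result to 0 or 1 to stay
within well-defined C conversions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare exactly 8 bytes and report only
   equal (0) or not equal (nonzero).  */
static int
bcmp8_sketch (const void *p1, const void *p2)
{
  uint64_t a, b;
  /* memcpy keeps the loads valid for unaligned pointers.  */
  memcpy (&a, p1, sizeof a);
  memcpy (&b, p2, sizeof b);

  uint64_t diff = a - b;
  /* Fold both 32-bit halves together so that a difference confined
     to the upper half is not lost when narrowing to 32 bits.  */
  uint32_t folded = (uint32_t) diff | (uint32_t) (diff >> 32);
  return folded != 0;
}
```

There is no need to locate the first differing byte or to derive a
sign, which is the work the memcmp path has to do in L(fin2_7).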

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
 sysdeps/x86_64/memcmp.S | 55 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
index dfd0269db2..21607e7c91 100644
--- a/sysdeps/x86_64/memcmp.S
+++ b/sysdeps/x86_64/memcmp.S
@@ -49,34 +49,63 @@ L(s2b):
 	movzwl	(%rdi),	%eax
 	movzwl	(%rdi, %rsi), %edx
 	subq    $2, %r10
+#ifdef USE_AS_BCMP
+	je	L(finz1)
+#else
 	je	L(fin2_7)
+#endif
 	addq	$2, %rdi
 	cmpl	%edx, %eax
+#ifdef USE_AS_BCMP
+	jnz	L(neq_early)
+#else
 	jnz	L(fin2_7)
+#endif
 L(s4b):
 	testq	$4, %r10
 	jz	L(s8b)
 	movl	(%rdi),	%eax
 	movl	(%rdi, %rsi), %edx
 	subq    $4, %r10
+#ifdef USE_AS_BCMP
+	je	L(finz1)
+#else
 	je	L(fin2_7)
+#endif
 	addq	$4, %rdi
 	cmpl	%edx, %eax
+#ifdef USE_AS_BCMP
+	jnz	L(neq_early)
+#else
 	jnz	L(fin2_7)
+#endif
 L(s8b):
 	testq	$8, %r10
 	jz	L(s16b)
 	movq	(%rdi),	%rax
 	movq	(%rdi, %rsi), %rdx
 	subq    $8, %r10
+#ifdef USE_AS_BCMP
+	je	L(sub_return8)
+#else
 	je	L(fin2_7)
+#endif
 	addq	$8, %rdi
 	cmpq	%rdx, %rax
+#ifdef USE_AS_BCMP
+	jnz	L(neq_early)
+#else
 	jnz	L(fin2_7)
+#endif
 L(s16b):
 	movdqu    (%rdi), %xmm1
 	movdqu    (%rdi, %rsi), %xmm0
 	pcmpeqb   %xmm0, %xmm1
+#ifdef USE_AS_BCMP
+	pmovmskb  %xmm1, %eax
+	subl      $0xffff, %eax
+	ret
+#else
 	pmovmskb  %xmm1, %edx
 	xorl	  %eax, %eax
 	subl      $0xffff, %edx
@@ -86,7 +115,7 @@ L(s16b):
 	movzbl	 (%rcx), %eax
 	movzbl	 (%rsi, %rcx), %edx
 	jmp	 L(finz1)
-
+#endif
 	.p2align 4,, 4
 L(finr1b):
 	movzbl	(%rdi), %eax
@@ -95,7 +124,15 @@ L(finz1):
 	subl	%edx, %eax
 L(exit):
 	ret
-
+#ifdef USE_AS_BCMP
+	.p2align 4,, 4
+L(sub_return8):
+	subq	%rdx, %rax
+	movl	%eax, %edx
+	shrq	$32, %rax
+	orl	%edx, %eax
+	ret
+#else
 	.p2align 4,, 4
 L(fin2_7):
 	cmpq	%rdx, %rax
@@ -111,12 +148,17 @@ L(fin2_7):
 	movzbl  %dl, %edx
 	subl	%edx, %eax
 	ret
-
+#endif
 	.p2align 4,, 4
 L(finz):
 	xorl	%eax, %eax
 	ret
-
+#ifdef USE_AS_BCMP
+	.p2align 4,, 4
+L(neq_early):
+	movl	$1, %eax
+	ret
+#endif
 	/* For blocks bigger than 32 bytes
 	   1. Advance one of the addr pointer to be 16B aligned.
 	   2. Treat the case of both addr pointers aligned to 16B
@@ -246,11 +288,16 @@ L(mt16):
 
 	.p2align 4,, 4
 L(neq):
+#ifdef USE_AS_BCMP
+	movl	$1, %eax
+	ret
+#else
 	bsfl      %edx, %ecx
 	movzbl	 (%rdi, %rcx), %eax
 	addq	 %rdi, %rsi
 	movzbl	 (%rsi,%rcx), %edx
 	jmp	 L(finz1)
+#endif
 
 	.p2align 4,, 4
 L(ATR):
-- 
2.25.1



* [PATCH 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S
  2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
  2021-09-13 23:05 ` [PATCH 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
@ 2021-09-13 23:05 ` Noah Goldstein via Libc-alpha
  2021-09-13 23:05 ` [PATCH 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S Noah Goldstein via Libc-alpha
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-13 23:05 UTC
  To: libc-alpha

No bug. This commit does not modify any of the memcmp
implementation. It just adds bcmp ifdefs to skip obvious cases
where computing the proper 1/-1 required by memcmp is not needed.
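
The shape of the change can be sketched in portable C. This is an
illustration only, with hypothetical names, and it uses plain 64-bit
integer operations where the assembly uses pxor/ptest on 16 byte
vectors: as soon as any 16 byte block differs, the function returns
nonzero without working out which byte differed or in which direction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a block-wise equality check.  Returns 0 if
   the first N bytes of S1 and S2 are equal, 1 otherwise.  */
static int
bcmp_blocks_sketch (const unsigned char *s1, const unsigned char *s2,
		    size_t n)
{
  size_t i = 0;

  /* Compare 16 bytes at a time: XOR the blocks and test whether any
     bit is set.  A set bit means "not equal"; nothing else about the
     mismatch is computed.  */
  for (; i + 16 <= n; i += 16)
    {
      uint64_t a[2], b[2];
      memcpy (a, s1 + i, 16);
      memcpy (b, s2 + i, 16);
      if (((a[0] ^ b[0]) | (a[1] ^ b[1])) != 0)
	return 1;
    }

  /* Remaining tail bytes, one at a time.  */
  for (; i < n; i++)
    if (s1[i] != s2[i])
      return 1;

  return 0;
}
```

In the diff this corresponds to every mismatch branch jumping to the
single L(return_not_equals) label instead of a per-offset label such
as L(16bytesin256).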

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
 sysdeps/x86_64/multiarch/memcmp-sse4.S | 761 ++++++++++++++++++++++++-
 1 file changed, 746 insertions(+), 15 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcmp-sse4.S b/sysdeps/x86_64/multiarch/memcmp-sse4.S
index b82adcd5fa..b9528ed58e 100644
--- a/sysdeps/x86_64/multiarch/memcmp-sse4.S
+++ b/sysdeps/x86_64/multiarch/memcmp-sse4.S
@@ -72,7 +72,11 @@ L(79bytesormore):
 	movdqu	(%rdi), %xmm2
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 	mov	%rsi, %rcx
 	and	$-16, %rsi
 	add	$16, %rsi
@@ -91,34 +95,58 @@ L(less128bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqu	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqu	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 	cmp	$32, %rdx
 	jb	L(less32bytesin64)
 
 	movdqu	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqu	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -140,42 +168,74 @@ L(less256bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqu	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqu	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqu	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqu	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqu	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqu	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	add	$128, %rsi
 	add	$128, %rdi
@@ -189,12 +249,20 @@ L(less256bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -208,82 +276,146 @@ L(less512bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqu	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqu	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqu	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqu	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqu	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqu	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	movdqu	128(%rdi), %xmm2
 	pxor	128(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(144bytesin256)
+# endif
 
 	movdqu	144(%rdi), %xmm2
 	pxor	144(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(160bytesin256)
+# endif
 
 	movdqu	160(%rdi), %xmm2
 	pxor	160(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(176bytesin256)
+# endif
 
 	movdqu	176(%rdi), %xmm2
 	pxor	176(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(192bytesin256)
+# endif
 
 	movdqu	192(%rdi), %xmm2
 	pxor	192(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(208bytesin256)
+# endif
 
 	movdqu	208(%rdi), %xmm2
 	pxor	208(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(224bytesin256)
+# endif
 
 	movdqu	224(%rdi), %xmm2
 	pxor	224(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(240bytesin256)
+# endif
 
 	movdqu	240(%rdi), %xmm2
 	pxor	240(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(256bytesin256)
+# endif
 
 	add	$256, %rsi
 	add	$256, %rdi
@@ -300,12 +432,20 @@ L(less512bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -346,7 +486,11 @@ L(64bytesormore_loop):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -380,7 +524,11 @@ L(L2_L3_unaligned_128bytes_loop):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -404,34 +552,58 @@ L(less128bytesin2aligned):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqa	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqa	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 	cmp	$32, %rdx
 	jb	L(less32bytesin64in2alinged)
 
 	movdqa	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqa	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -454,42 +626,74 @@ L(less256bytesin2alinged):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqa	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqa	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqa	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqa	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqa	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqa	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	add	$128, %rsi
 	add	$128, %rdi
@@ -503,12 +707,20 @@ L(less256bytesin2alinged):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -524,82 +736,146 @@ L(256bytesormorein2aligned):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqa	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqa	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqa	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqa	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqa	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqa	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	movdqa	128(%rdi), %xmm2
 	pxor	128(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(144bytesin256)
+# endif
 
 	movdqa	144(%rdi), %xmm2
 	pxor	144(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(160bytesin256)
+# endif
 
 	movdqa	160(%rdi), %xmm2
 	pxor	160(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(176bytesin256)
+# endif
 
 	movdqa	176(%rdi), %xmm2
 	pxor	176(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(192bytesin256)
+# endif
 
 	movdqa	192(%rdi), %xmm2
 	pxor	192(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(208bytesin256)
+# endif
 
 	movdqa	208(%rdi), %xmm2
 	pxor	208(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(224bytesin256)
+# endif
 
 	movdqa	224(%rdi), %xmm2
 	pxor	224(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(240bytesin256)
+# endif
 
 	movdqa	240(%rdi), %xmm2
 	pxor	240(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(256bytesin256)
+# endif
 
 	add	$256, %rsi
 	add	$256, %rdi
@@ -616,12 +892,20 @@ L(256bytesormorein2aligned):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -663,7 +947,11 @@ L(64bytesormore_loopin2aligned):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -697,7 +985,11 @@ L(L2_L3_aligned_128bytes_loop):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -708,7 +1000,7 @@ L(L2_L3_aligned_128bytes_loop):
 	add	%rdx, %rdi
 	BRANCH_TO_JMPTBL_ENTRY(L(table_64bytes), %rdx, 4)
 
-
+# ifndef USE_AS_BCMP
 	.p2align 4
 L(64bytesormore_loop_end):
 	add	$16, %rdi
@@ -791,17 +1083,29 @@ L(32bytesin256):
 L(16bytesin256):
 	add	$16, %rdi
 	add	$16, %rsi
+# endif
 L(16bytes):
 	mov	-16(%rdi), %rax
 	mov	-16(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 L(8bytes):
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+# ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+# else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+# endif
 	ret
 
 	.p2align 4
@@ -809,16 +1113,26 @@ L(12bytes):
 	mov	-12(%rdi), %rax
 	mov	-12(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 L(4bytes):
 	mov	-4(%rsi), %ecx
-# ifndef USE_AS_WMEMCMP
+# ifdef USE_AS_BCMP
 	mov	-4(%rdi), %eax
-	cmp	%eax, %ecx
+	sub	%ecx, %eax
+	ret
 # else
+#  ifndef USE_AS_WMEMCMP
+	mov	-4(%rdi), %eax
+	cmp	%eax, %ecx
+#  else
 	cmp	-4(%rdi), %ecx
-# endif
+#  endif
 	jne	L(diffin4bytes)
+# endif
 L(0bytes):
 	xor	%eax, %eax
 	ret
@@ -832,31 +1146,51 @@ L(65bytes):
 	mov	$-65, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(49bytes):
 	movdqu	-49(%rdi), %xmm1
 	movdqu	-49(%rsi), %xmm2
 	mov	$-49, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(33bytes):
 	movdqu	-33(%rdi), %xmm1
 	movdqu	-33(%rsi), %xmm2
 	mov	$-33, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(17bytes):
 	mov	-17(%rdi), %rax
 	mov	-17(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 L(9bytes):
 	mov	-9(%rdi), %rax
 	mov	-9(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %edx
 	sub	%edx, %eax
@@ -867,12 +1201,23 @@ L(13bytes):
 	mov	-13(%rdi), %rax
 	mov	-13(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+#  ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+#  else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -880,7 +1225,11 @@ L(5bytes):
 	mov	-5(%rdi), %eax
 	mov	-5(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin4bytes)
+#  endif
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %edx
 	sub	%edx, %eax
@@ -893,37 +1242,59 @@ L(66bytes):
 	mov	$-66, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(50bytes):
 	movdqu	-50(%rdi), %xmm1
 	movdqu	-50(%rsi), %xmm2
 	mov	$-50, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(34bytes):
 	movdqu	-34(%rdi), %xmm1
 	movdqu	-34(%rsi), %xmm2
 	mov	$-34, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(18bytes):
 	mov	-18(%rdi), %rax
 	mov	-18(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 L(10bytes):
 	mov	-10(%rdi), %rax
 	mov	-10(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzwl	-2(%rdi), %eax
 	movzwl	-2(%rsi), %ecx
+#  ifndef USE_AS_BCMP
 	cmp	%cl, %al
 	jne	L(end)
 	and	$0xffff, %eax
 	and	$0xffff, %ecx
+#  endif
 	sub	%ecx, %eax
 	ret
 
@@ -932,12 +1303,23 @@ L(14bytes):
 	mov	-14(%rdi), %rax
 	mov	-14(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+#  ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+#  else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -945,14 +1327,20 @@ L(6bytes):
 	mov	-6(%rdi), %eax
 	mov	-6(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin4bytes)
+#  endif
 L(2bytes):
 	movzwl	-2(%rsi), %ecx
 	movzwl	-2(%rdi), %eax
+#  ifndef USE_AS_BCMP
 	cmp	%cl, %al
 	jne	L(end)
 	and	$0xffff, %eax
 	and	$0xffff, %ecx
+#  endif
 	sub	%ecx, %eax
 	ret
 
@@ -963,36 +1351,60 @@ L(67bytes):
 	mov	$-67, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(51bytes):
 	movdqu	-51(%rdi), %xmm2
 	movdqu	-51(%rsi), %xmm1
 	mov	$-51, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(35bytes):
 	movdqu	-35(%rsi), %xmm1
 	movdqu	-35(%rdi), %xmm2
 	mov	$-35, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(19bytes):
 	mov	-19(%rdi), %rax
 	mov	-19(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 L(11bytes):
 	mov	-11(%rdi), %rax
 	mov	-11(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-4(%rdi), %eax
 	mov	-4(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+#  else
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -1000,12 +1412,23 @@ L(15bytes):
 	mov	-15(%rdi), %rax
 	mov	-15(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+#  ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+#  else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -1013,12 +1436,20 @@ L(7bytes):
 	mov	-7(%rdi), %eax
 	mov	-7(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin4bytes)
+#  endif
 	mov	-4(%rdi), %eax
 	mov	-4(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+#  else
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -1026,7 +1457,11 @@ L(3bytes):
 	movzwl	-3(%rdi), %eax
 	movzwl	-3(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin2bytes)
+#  endif
 L(1bytes):
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %ecx
@@ -1041,38 +1476,58 @@ L(68bytes):
 	mov	$-68, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(52bytes):
 	movdqu	-52(%rdi), %xmm2
 	movdqu	-52(%rsi), %xmm1
 	mov	$-52, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(36bytes):
 	movdqu	-36(%rdi), %xmm2
 	movdqu	-36(%rsi), %xmm1
 	mov	$-36, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(20bytes):
 	movdqu	-20(%rdi), %xmm2
 	movdqu	-20(%rsi), %xmm1
 	mov	$-20, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 	mov	-4(%rsi), %ecx
-
-# ifndef USE_AS_WMEMCMP
+# ifdef USE_AS_BCMP
 	mov	-4(%rdi), %eax
-	cmp	%eax, %ecx
+	sub	%ecx, %eax
 # else
+#  ifndef USE_AS_WMEMCMP
+	mov	-4(%rdi), %eax
+	cmp	%eax, %ecx
+#  else
 	cmp	-4(%rdi), %ecx
-# endif
+#  endif
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+# endif
 	ret
 
 # ifndef USE_AS_WMEMCMP
@@ -1084,32 +1539,52 @@ L(69bytes):
 	mov	$-69, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(53bytes):
 	movdqu	-53(%rsi), %xmm1
 	movdqu	-53(%rdi), %xmm2
 	mov	$-53, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(37bytes):
 	movdqu	-37(%rsi), %xmm1
 	movdqu	-37(%rdi), %xmm2
 	mov	$-37, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(21bytes):
 	movdqu	-21(%rsi), %xmm1
 	movdqu	-21(%rdi), %xmm2
 	mov	$-21, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1120,32 +1595,52 @@ L(70bytes):
 	mov	$-70, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(54bytes):
 	movdqu	-54(%rsi), %xmm1
 	movdqu	-54(%rdi), %xmm2
 	mov	$-54, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(38bytes):
 	movdqu	-38(%rsi), %xmm1
 	movdqu	-38(%rdi), %xmm2
 	mov	$-38, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(22bytes):
 	movdqu	-22(%rsi), %xmm1
 	movdqu	-22(%rdi), %xmm2
 	mov	$-22, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1156,32 +1651,52 @@ L(71bytes):
 	mov	$-71, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(55bytes):
 	movdqu	-55(%rdi), %xmm2
 	movdqu	-55(%rsi), %xmm1
 	mov	$-55, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(39bytes):
 	movdqu	-39(%rdi), %xmm2
 	movdqu	-39(%rsi), %xmm1
 	mov	$-39, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(23bytes):
 	movdqu	-23(%rdi), %xmm2
 	movdqu	-23(%rsi), %xmm1
 	mov	$-23, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 # endif
@@ -1193,33 +1708,53 @@ L(72bytes):
 	mov	$-72, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(56bytes):
 	movdqu	-56(%rdi), %xmm2
 	movdqu	-56(%rsi), %xmm1
 	mov	$-56, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(40bytes):
 	movdqu	-40(%rdi), %xmm2
 	movdqu	-40(%rsi), %xmm1
 	mov	$-40, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(24bytes):
 	movdqu	-24(%rdi), %xmm2
 	movdqu	-24(%rsi), %xmm1
 	mov	$-24, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 
 	mov	-8(%rsi), %rcx
 	mov	-8(%rdi), %rax
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 	xor	%eax, %eax
 	ret
 
@@ -1232,32 +1767,52 @@ L(73bytes):
 	mov	$-73, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(57bytes):
 	movdqu	-57(%rdi), %xmm2
 	movdqu	-57(%rsi), %xmm1
 	mov	$-57, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(41bytes):
 	movdqu	-41(%rdi), %xmm2
 	movdqu	-41(%rsi), %xmm1
 	mov	$-41, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(25bytes):
 	movdqu	-25(%rdi), %xmm2
 	movdqu	-25(%rsi), %xmm1
 	mov	$-25, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-9(%rdi), %rax
 	mov	-9(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %ecx
 	sub	%ecx, %eax
@@ -1270,35 +1825,60 @@ L(74bytes):
 	mov	$-74, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(58bytes):
 	movdqu	-58(%rdi), %xmm2
 	movdqu	-58(%rsi), %xmm1
 	mov	$-58, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(42bytes):
 	movdqu	-42(%rdi), %xmm2
 	movdqu	-42(%rsi), %xmm1
 	mov	$-42, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(26bytes):
 	movdqu	-26(%rdi), %xmm2
 	movdqu	-26(%rsi), %xmm1
 	mov	$-26, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-10(%rdi), %rax
 	mov	-10(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzwl	-2(%rdi), %eax
 	movzwl	-2(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+	ret
+#  else
 	jmp	L(diffin2bytes)
+#  endif
 
 	.p2align 4
 L(75bytes):
@@ -1307,37 +1887,61 @@ L(75bytes):
 	mov	$-75, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(59bytes):
 	movdqu	-59(%rdi), %xmm2
 	movdqu	-59(%rsi), %xmm1
 	mov	$-59, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(43bytes):
 	movdqu	-43(%rdi), %xmm2
 	movdqu	-43(%rsi), %xmm1
 	mov	$-43, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(27bytes):
 	movdqu	-27(%rdi), %xmm2
 	movdqu	-27(%rsi), %xmm1
 	mov	$-27, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-11(%rdi), %rax
 	mov	-11(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-4(%rdi), %eax
 	mov	-4(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+#  else
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 # endif
 	.p2align 4
@@ -1347,41 +1951,66 @@ L(76bytes):
 	mov	$-76, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(60bytes):
 	movdqu	-60(%rdi), %xmm2
 	movdqu	-60(%rsi), %xmm1
 	mov	$-60, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(44bytes):
 	movdqu	-44(%rdi), %xmm2
 	movdqu	-44(%rsi), %xmm1
 	mov	$-44, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(28bytes):
 	movdqu	-28(%rdi), %xmm2
 	movdqu	-28(%rsi), %xmm1
 	mov	$-28, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 	mov	-12(%rdi), %rax
 	mov	-12(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 	mov	-4(%rsi), %ecx
-# ifndef USE_AS_WMEMCMP
+# ifdef USE_AS_BCMP
 	mov	-4(%rdi), %eax
-	cmp	%eax, %ecx
+	sub	%ecx, %eax
 # else
+#  ifndef USE_AS_WMEMCMP
+	mov	-4(%rdi), %eax
+	cmp	%eax, %ecx
+#  else
 	cmp	-4(%rdi), %ecx
-# endif
+#  endif
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+# endif
 	ret
 
 # ifndef USE_AS_WMEMCMP
@@ -1393,38 +2022,62 @@ L(77bytes):
 	mov	$-77, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(61bytes):
 	movdqu	-61(%rdi), %xmm2
 	movdqu	-61(%rsi), %xmm1
 	mov	$-61, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(45bytes):
 	movdqu	-45(%rdi), %xmm2
 	movdqu	-45(%rsi), %xmm1
 	mov	$-45, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(29bytes):
 	movdqu	-29(%rdi), %xmm2
 	movdqu	-29(%rsi), %xmm1
 	mov	$-29, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 
 	mov	-13(%rdi), %rax
 	mov	-13(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1435,36 +2088,60 @@ L(78bytes):
 	mov	$-78, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(62bytes):
 	movdqu	-62(%rdi), %xmm2
 	movdqu	-62(%rsi), %xmm1
 	mov	$-62, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(46bytes):
 	movdqu	-46(%rdi), %xmm2
 	movdqu	-46(%rsi), %xmm1
 	mov	$-46, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(30bytes):
 	movdqu	-30(%rdi), %xmm2
 	movdqu	-30(%rsi), %xmm1
 	mov	$-30, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-14(%rdi), %rax
 	mov	-14(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1475,36 +2152,60 @@ L(79bytes):
 	mov	$-79, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(63bytes):
 	movdqu	-63(%rdi), %xmm2
 	movdqu	-63(%rsi), %xmm1
 	mov	$-63, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(47bytes):
 	movdqu	-47(%rdi), %xmm2
 	movdqu	-47(%rsi), %xmm1
 	mov	$-47, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(31bytes):
 	movdqu	-31(%rdi), %xmm2
 	movdqu	-31(%rsi), %xmm1
 	mov	$-31, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-15(%rdi), %rax
 	mov	-15(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 # endif
@@ -1515,37 +2216,58 @@ L(64bytes):
 	mov	$-64, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(48bytes):
 	movdqu	-48(%rdi), %xmm2
 	movdqu	-48(%rsi), %xmm1
 	mov	$-48, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(32bytes):
 	movdqu	-32(%rdi), %xmm2
 	movdqu	-32(%rsi), %xmm1
 	mov	$-32, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 
 	mov	-16(%rdi), %rax
 	mov	-16(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 	xor	%eax, %eax
 	ret
 
 /*
  * Aligned 8 bytes to avoid 2 branch "taken" in one 16 alinged code block.
  */
+# ifndef USE_AS_BCMP
 	.p2align 3
 L(less16bytes):
 	movsbq	%dl, %rdx
@@ -1561,16 +2283,16 @@ L(diffin8bytes):
 	shr	$32, %rcx
 	shr	$32, %rax
 
-# ifdef USE_AS_WMEMCMP
+#  ifdef USE_AS_WMEMCMP
 /* for wmemcmp */
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
 	ret
-# endif
+#  endif
 
 L(diffin4bytes):
-# ifndef USE_AS_WMEMCMP
+#  ifndef USE_AS_WMEMCMP
 	cmp	%cx, %ax
 	jne	L(diffin2bytes)
 	shr	$16, %ecx
@@ -1589,7 +2311,7 @@ L(end):
 	and	$0xff, %ecx
 	sub	%ecx, %eax
 	ret
-# else
+#  else
 
 /* for wmemcmp */
 	mov	$1, %eax
@@ -1601,6 +2323,15 @@ L(end):
 L(nequal_bigger):
 	ret
 
+L(unreal_case):
+	xor	%eax, %eax
+	ret
+#  endif
+# else
+	.p2align 4
+L(return_not_equals):
+	mov	$1, %eax
+	ret
 L(unreal_case):
 	xor	%eax, %eax
 	ret
-- 
2.25.1



* [PATCH 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S
  2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
  2021-09-13 23:05 ` [PATCH 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
  2021-09-13 23:05 ` [PATCH 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S Noah Goldstein via Libc-alpha
@ 2021-09-13 23:05 ` Noah Goldstein via Libc-alpha
  2021-09-13 23:05 ` [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S Noah Goldstein via Libc-alpha
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-13 23:05 UTC (permalink / raw)
  To: libc-alpha

No bug. This commit adds a new optimized bcmp implementation for avx2.

The primary optimizations are 1) skipping the logic to find the
difference of the first mismatched byte and 2) not updating src/dst
addresses as the non-equals logic does not need to be reused by
different areas.
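
A minimal C sketch of optimization 1, not part of the patch: the helper
name bcmp8_sketch is hypothetical and exists only for illustration. It
mirrors the sub/shr/or sequence used in the USE_AS_BCMP 8-byte tail
paths earlier in this thread. Since bcmp only has to report zero versus
nonzero, the 64-bit difference can be folded into 32 bits instead of
searching for the first mismatched byte.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): bcmp-style comparison of
   exactly 8 bytes.  Returns 0 if equal, some nonzero value otherwise.
   The sign and magnitude of a nonzero result carry no meaning.  */
static int
bcmp8_sketch (const void *s1, const void *s2)
{
  uint64_t a, b;

  /* memcpy expresses an unaligned 8-byte load without undefined
     behavior.  */
  memcpy (&a, s1, sizeof a);
  memcpy (&b, s2, sizeof b);

  /* Nonzero if and only if the two words differ.  */
  uint64_t diff = a - b;

  /* Fold the upper half into the lower half so that truncating to
     32 bits cannot turn a mismatch into 0.  */
  uint32_t folded = (uint32_t) (diff >> 32) | (uint32_t) diff;

  /* Conversion of a large uint32_t to int is implementation-defined
     in ISO C; this sketch assumes the usual wrapping behavior, which
     in any case never maps a nonzero value to 0.  */
  return (int) folded;
}
```

A memcmp-style routine would instead have to locate the first differing
byte and return its signed difference, which is the work the bcmp paths
skip.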

The entry alignment has been fixed at 64 bytes. In throughput-sensitive
functions, which bcmp can potentially be, frontend loop performance is
important to optimize for. This is impossible or difficult to do and
maintain with only 16-byte fixed alignment.

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
 sysdeps/x86/sysdep.h                       |   6 +-
 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S   |   4 +-
 sysdeps/x86_64/multiarch/bcmp-avx2.S       | 304 ++++++++++++++++++++-
 sysdeps/x86_64/multiarch/ifunc-bcmp.h      |   4 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |   2 -
 5 files changed, 308 insertions(+), 12 deletions(-)

diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
index cac1d762fb..4895179c10 100644
--- a/sysdeps/x86/sysdep.h
+++ b/sysdeps/x86/sysdep.h
@@ -78,15 +78,17 @@ enum cf_protection_level
 #define ASM_SIZE_DIRECTIVE(name) .size name,.-name;
 
 /* Define an entry point visible from C.  */
-#define	ENTRY(name)							      \
+#define	ENTRY_P2ALIGN(name, alignment)					      \
   .globl C_SYMBOL_NAME(name);						      \
   .type C_SYMBOL_NAME(name),@function;					      \
-  .align ALIGNARG(4);							      \
+  .align ALIGNARG(alignment);						      \
   C_LABEL(name)								      \
   cfi_startproc;							      \
   _CET_ENDBR;								      \
   CALL_MCOUNT
 
+#define ENTRY(name) ENTRY_P2ALIGN (name, 4)
+
 #undef	END
 #define END(name)							      \
   cfi_endproc;								      \
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
index d742257e4e..28976daff0 100644
--- a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
@@ -1,5 +1,5 @@
-#ifndef MEMCMP
-# define MEMCMP __bcmp_avx2_rtm
+#ifndef BCMP
+# define BCMP __bcmp_avx2_rtm
 #endif
 
 #define ZERO_UPPER_VEC_REGISTERS_RETURN \
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
index 93a9a20b17..eb77ae5c4a 100644
--- a/sysdeps/x86_64/multiarch/bcmp-avx2.S
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
@@ -16,8 +16,304 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#ifndef MEMCMP
-# define MEMCMP	__bcmp_avx2
-#endif
+#if IS_IN (libc)
+
+/* bcmp is implemented as:
+   1. Use ymm vector compares when possible. The only case where
+      vector compares are not possible is when size < VEC_SIZE
+      and loading from either s1 or s2 would cause a page cross.
+   2. Use xmm vector compare when size >= 8 bytes.
+   3. Optimistically compare up to first 4 * VEC_SIZE one at a
+      time to check for early mismatches. Only do this if it is
+      guaranteed the work is not wasted.
+   4. If size is 8 * VEC_SIZE or less, unroll the loop.
+   5. Compare 4 * VEC_SIZE at a time with the aligned first memory
+      area.
+   6. Use 2 vector compares when size is 2 * VEC_SIZE or less.
+   7. Use 4 vector compares when size is 4 * VEC_SIZE or less.
+   8. Use 8 vector compares when size is 8 * VEC_SIZE or less.  */
+
+# include <sysdep.h>
+
+# ifndef BCMP
+#  define BCMP	__bcmp_avx2
+# endif
+
+# define VPCMPEQ	vpcmpeqb
+
+# ifndef VZEROUPPER
+#  define VZEROUPPER	vzeroupper
+# endif
+
+# ifndef SECTION
+#  define SECTION(p)	p##.avx
+# endif
+
+# define VEC_SIZE 32
+# define PAGE_SIZE	4096
+
+	.section SECTION(.text), "ax", @progbits
+ENTRY_P2ALIGN (BCMP, 6)
+# ifdef __ILP32__
+	/* Clear the upper 32 bits.  */
+	movl	%edx, %edx
+# endif
+	cmp	$VEC_SIZE, %RDX_LP
+	jb	L(less_vec)
+
+	/* From VEC to 2 * VEC.  No branch when size == VEC_SIZE.  */
+	vmovdqu	(%rsi), %ymm1
+	VPCMPEQ	(%rdi), %ymm1, %ymm1
+	vpmovmskb %ymm1, %eax
+	incl	%eax
+	jnz	L(return_neq0)
+	cmpq	$(VEC_SIZE * 2), %rdx
+	jbe	L(last_1x_vec)
+
+	/* Check second VEC no matter what.  */
+	vmovdqu	VEC_SIZE(%rsi), %ymm2
+	VPCMPEQ	VEC_SIZE(%rdi), %ymm2, %ymm2
+	vpmovmskb %ymm2, %eax
+	/* If the VEC was equal eax will be all 1s so incl will overflow
+	   and set zero flag.  */
+	incl	%eax
+	jnz	L(return_neq0)
+
+	/* Less than 4 * VEC.  */
+	cmpq	$(VEC_SIZE * 4), %rdx
+	jbe	L(last_2x_vec)
+
+	/* Check third and fourth VEC no matter what.  */
+	vmovdqu	(VEC_SIZE * 2)(%rsi), %ymm3
+	VPCMPEQ	(VEC_SIZE * 2)(%rdi), %ymm3, %ymm3
+	vpmovmskb %ymm3, %eax
+	incl	%eax
+	jnz	L(return_neq0)
+
+	vmovdqu	(VEC_SIZE * 3)(%rsi), %ymm4
+	VPCMPEQ	(VEC_SIZE * 3)(%rdi), %ymm4, %ymm4
+	vpmovmskb %ymm4, %eax
+	incl	%eax
+	jnz	L(return_neq0)
+
+	/* Go to 4x VEC loop.  */
+	cmpq	$(VEC_SIZE * 8), %rdx
+	ja	L(more_8x_vec)
+
+	/* Handle remainder of size = 4 * VEC + 1 to 8 * VEC without any
+	   branches.  */
+
+	/* Adjust rsi and rdi to avoid indexed address mode. This ends up
+	   saving 16 bytes of code, prevents unlamination, and avoids
+	   bottlenecks in the AGU.  */
+	addq	%rdx, %rsi
+	vmovdqu	-(VEC_SIZE * 4)(%rsi), %ymm1
+	vmovdqu	-(VEC_SIZE * 3)(%rsi), %ymm2
+	addq	%rdx, %rdi
+
+	VPCMPEQ	-(VEC_SIZE * 4)(%rdi), %ymm1, %ymm1
+	VPCMPEQ	-(VEC_SIZE * 3)(%rdi), %ymm2, %ymm2
+
+	vmovdqu	-(VEC_SIZE * 2)(%rsi), %ymm3
+	VPCMPEQ	-(VEC_SIZE * 2)(%rdi), %ymm3, %ymm3
+	vmovdqu	-VEC_SIZE(%rsi), %ymm4
+	VPCMPEQ	-VEC_SIZE(%rdi), %ymm4, %ymm4
 
-#include "bcmp-avx2.S"
+	/* Reduce VEC1 - VEC4.  */
+	vpand	%ymm1, %ymm2, %ymm2
+	vpand	%ymm3, %ymm4, %ymm4
+	vpand	%ymm2, %ymm4, %ymm4
+	vpmovmskb %ymm4, %eax
+	incl	%eax
+L(return_neq0):
+L(return_vzeroupper):
+	ZERO_UPPER_VEC_REGISTERS_RETURN
+
+	/* NB: p2align 5 here will ensure the L(loop_4x_vec) is also 32 byte
+	   aligned.  */
+	.p2align 5
+L(less_vec):
+	/* Check if one or less char. This is necessary for size = 0 but is
+	   also faster for size = 1.  */
+	cmpl	$1, %edx
+	jbe	L(one_or_less)
+
+	/* Check if loading one VEC from either s1 or s2 could cause a page
+	   cross. This can have false positives but is by far the fastest
+	   method.  */
+	movl	%edi, %eax
+	orl	%esi, %eax
+	andl	$(PAGE_SIZE - 1), %eax
+	cmpl	$(PAGE_SIZE - VEC_SIZE), %eax
+	jg	L(page_cross_less_vec)
+
+	/* No page cross possible.  */
+	vmovdqu	(%rsi), %ymm2
+	VPCMPEQ	(%rdi), %ymm2, %ymm2
+	vpmovmskb %ymm2, %eax
+	incl	%eax
+	/* Result will be zero if s1 and s2 match. Otherwise first set bit
+	   will be first mismatch.  */
+	bzhil	%edx, %eax, %eax
+	VZEROUPPER_RETURN
+
+	/* Relatively cold but placing close to L(less_vec) for 2 byte jump
+	   encoding.  */
+	.p2align 4
+L(one_or_less):
+	jb	L(zero)
+	movzbl	(%rsi), %ecx
+	movzbl	(%rdi), %eax
+	subl	%ecx, %eax
+	/* No ymm register was touched.  */
+	ret
+	/* Within the same 16 byte block as L(one_or_less).  */
+L(zero):
+	xorl	%eax, %eax
+	ret
+
+	.p2align 4
+L(last_1x_vec):
+	vmovdqu	-(VEC_SIZE * 1)(%rsi, %rdx), %ymm1
+	VPCMPEQ	-(VEC_SIZE * 1)(%rdi, %rdx), %ymm1, %ymm1
+	vpmovmskb %ymm1, %eax
+	incl	%eax
+	VZEROUPPER_RETURN
+
+	.p2align 4
+L(last_2x_vec):
+	vmovdqu	-(VEC_SIZE * 2)(%rsi, %rdx), %ymm1
+	VPCMPEQ	-(VEC_SIZE * 2)(%rdi, %rdx), %ymm1, %ymm1
+	vmovdqu	-(VEC_SIZE * 1)(%rsi, %rdx), %ymm2
+	VPCMPEQ	-(VEC_SIZE * 1)(%rdi, %rdx), %ymm2, %ymm2
+	vpand	%ymm1, %ymm2, %ymm2
+	vpmovmskb %ymm2, %eax
+	incl	%eax
+	VZEROUPPER_RETURN
+
+	.p2align 4
+L(more_8x_vec):
+	/* Set end of s1 in rdx.  */
+	leaq	-(VEC_SIZE * 4)(%rdi, %rdx), %rdx
+	/* rsi stores s2 - s1. This allows loop to only update one pointer.
+	 */
+	subq	%rdi, %rsi
+	/* Align s1 pointer.  */
+	andq	$-VEC_SIZE, %rdi
+	/* Adjust because first 4x vec were checked already.  */
+	subq	$-(VEC_SIZE * 4), %rdi
+	.p2align 4
+L(loop_4x_vec):
+	/* rsi has s2 - s1 so get correct address by adding s1 (in rdi).  */
+	vmovdqu	(%rsi, %rdi), %ymm1
+	VPCMPEQ	(%rdi), %ymm1, %ymm1
+
+	vmovdqu	VEC_SIZE(%rsi, %rdi), %ymm2
+	VPCMPEQ	VEC_SIZE(%rdi), %ymm2, %ymm2
+
+	vmovdqu	(VEC_SIZE * 2)(%rsi, %rdi), %ymm3
+	VPCMPEQ	(VEC_SIZE * 2)(%rdi), %ymm3, %ymm3
+
+	vmovdqu	(VEC_SIZE * 3)(%rsi, %rdi), %ymm4
+	VPCMPEQ	(VEC_SIZE * 3)(%rdi), %ymm4, %ymm4
+
+	vpand	%ymm1, %ymm2, %ymm2
+	vpand	%ymm3, %ymm4, %ymm4
+	vpand	%ymm2, %ymm4, %ymm4
+	vpmovmskb %ymm4, %eax
+	incl	%eax
+	jnz	L(return_neq1)
+	subq	$-(VEC_SIZE * 4), %rdi
+	/* Check if s1 pointer at end.  */
+	cmpq	%rdx, %rdi
+	jb	L(loop_4x_vec)
+
+	vmovdqu	(VEC_SIZE * 3)(%rsi, %rdx), %ymm4
+	VPCMPEQ	(VEC_SIZE * 3)(%rdx), %ymm4, %ymm4
+	subq	%rdx, %rdi
+	/* rdi has 4 * VEC_SIZE - remaining length.  */
+	cmpl	$(VEC_SIZE * 3), %edi
+	jae	L(8x_last_1x_vec)
+	/* Load regardless of branch.  */
+	vmovdqu	(VEC_SIZE * 2)(%rsi, %rdx), %ymm3
+	VPCMPEQ	(VEC_SIZE * 2)(%rdx), %ymm3, %ymm3
+	cmpl	$(VEC_SIZE * 2), %edi
+	jae	L(8x_last_2x_vec)
+	/* Check last 4 VEC.  */
+	vmovdqu	VEC_SIZE(%rsi, %rdx), %ymm1
+	VPCMPEQ	VEC_SIZE(%rdx), %ymm1, %ymm1
+
+	vmovdqu	(%rsi, %rdx), %ymm2
+	VPCMPEQ	(%rdx), %ymm2, %ymm2
+
+	vpand	%ymm3, %ymm4, %ymm4
+	vpand	%ymm1, %ymm2, %ymm3
+L(8x_last_2x_vec):
+	vpand	%ymm3, %ymm4, %ymm4
+L(8x_last_1x_vec):
+	vpmovmskb %ymm4, %eax
+	/* incl sets eax to zero iff all compared VEC were equal.  */
+	incl	%eax
+L(return_neq1):
+	VZEROUPPER_RETURN
+
+	/* Relatively cold case as page crosses are unexpected.  */
+	.p2align 4
+L(page_cross_less_vec):
+	cmpl	$16, %edx
+	jae	L(between_16_31)
+	cmpl	$8, %edx
+	ja	L(between_9_15)
+	cmpl	$4, %edx
+	jb	L(between_2_3)
+	/* From 4 to 8 bytes.  No branch when size == 4.  */
+	movl	(%rdi), %eax
+	movl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movl	-4(%rdi, %rdx), %ecx
+	movl	-4(%rsi, %rdx), %esi
+	subl	%esi, %ecx
+	orl	%ecx, %eax
+	ret
+
+	.p2align 4,, 8
+L(between_9_15):
+	vmovq	(%rdi), %xmm1
+	vmovq	(%rsi), %xmm2
+	VPCMPEQ	%xmm1, %xmm2, %xmm3
+	vmovq	-8(%rdi, %rdx), %xmm1
+	vmovq	-8(%rsi, %rdx), %xmm2
+	VPCMPEQ	%xmm1, %xmm2, %xmm2
+	vpand	%xmm2, %xmm3, %xmm3
+	vpmovmskb %xmm3, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_16_31):
+	/* From 16 to 31 bytes.  No branch when size == 16.  */
+	vmovdqu	(%rsi), %xmm1
+	VPCMPEQ	(%rdi), %xmm1, %xmm1
+	vmovdqu	-16(%rsi, %rdx), %xmm2
+	VPCMPEQ	-16(%rdi, %rdx), %xmm2, %xmm2
+	vpand	%xmm1, %xmm2, %xmm2
+	vpmovmskb %xmm2, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_2_3):
+	/* From 2 to 3 bytes.  No branch when size == 2.  */
+	movzwl	(%rdi), %eax
+	movzwl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movzbl	-1(%rdi, %rdx), %edi
+	movzbl	-1(%rsi, %rdx), %esi
+	subl	%edi, %esi
+	orl	%esi, %eax
+	/* No ymm register was touched.  */
+	ret
+END (BCMP)
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
index b0dacd8526..f94516e5ee 100644
--- a/sysdeps/x86_64/multiarch/ifunc-bcmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
@@ -32,11 +32,11 @@ IFUNC_SELECTOR (void)
 
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
       && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
-      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
       && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
     {
       if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
+	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
+	  && CPU_FEATURE_USABLE_P (cpu_features, MOVBE))
 	return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index dd0c393c7d..cda0316928 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -42,13 +42,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, bcmp,
 	      IFUNC_IMPL_ADD (array, i, bcmp,
 			      (CPU_FEATURE_USABLE (AVX2)
-                   && CPU_FEATURE_USABLE (MOVBE)
 			       && CPU_FEATURE_USABLE (BMI2)),
 			      __bcmp_avx2)
 	      IFUNC_IMPL_ADD (array, i, bcmp,
 			      (CPU_FEATURE_USABLE (AVX2)
 			       && CPU_FEATURE_USABLE (BMI2)
-                   && CPU_FEATURE_USABLE (MOVBE)
 			       && CPU_FEATURE_USABLE (RTM)),
 			      __bcmp_avx2_rtm)
 	      IFUNC_IMPL_ADD (array, i, bcmp,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
                   ` (2 preceding siblings ...)
  2021-09-13 23:05 ` [PATCH 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S Noah Goldstein via Libc-alpha
@ 2021-09-13 23:05 ` Noah Goldstein via Libc-alpha
  2021-09-14  1:18   ` Carlos O'Donell via Libc-alpha
  2021-09-13 23:22 ` [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-13 23:05 UTC (permalink / raw)
  To: libc-alpha

No bug. This commit adds new optimized bcmp implementation for evex.

The primary optimizations are 1) skipping the logic to find the
difference of the first mismatched byte and 2) not updating src/dst
addresses as the non-equals logic does not need to be reused by
different areas.

The entry alignment has been fixed at 64. In throughput sensitive
functions, which bcmp can potentially be, frontend loop performance is
important to optimize for. This is impossible/difficult to do/maintain
with only 16 byte fixed alignment.

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
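The vpternlogd immediates used in the assembly below (0xfe, 0xde and
0xf6) are 8-entry truth tables. A rough C model of how they decode is
sketched here; `ternlog` is a hypothetical helper written only for
illustration, and it assumes the usual operand roles where, in AT&T
order `vpternlogd $imm, C, B, A`, the destination A supplies the high
index bit, B the middle bit and C the low bit.

```c
#include <stdint.h>

/* Hypothetical bitwise model of vpternlogd on a 32 bit lane.  For each
   bit position the result bit is imm[(a << 2) | (b << 1) | c], where
   a is the bit from the destination operand.  */
static uint32_t
ternlog (uint8_t imm, uint32_t a, uint32_t b, uint32_t c)
{
  uint32_t result = 0;
  for (int i = 0; i < 32; i++)
    {
      unsigned int idx = (((a >> i) & 1u) << 2)
			 | (((b >> i) & 1u) << 1)
			 | ((c >> i) & 1u);
      result |= (uint32_t) ((imm >> idx) & 1u) << i;
    }
  return result;
}

/* Under that model:
     0xfe  ->  a | b | c        (or three xor results together)
     0xde  ->  b | (a ^ c)      (xor dest with memory, or with b)
     0xf6  ->  a | (b ^ c)      (xor b with memory, or into dest)  */
```

Read this way, each `vpternlogd` folds one xor and one or into a single
instruction, so the final VPTEST only has to check whether any bit
survived.
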
 sysdeps/x86_64/multiarch/bcmp-evex.S       | 305 ++++++++++++++++++++-
 sysdeps/x86_64/multiarch/ifunc-bcmp.h      |   3 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |   1 -
 3 files changed, 302 insertions(+), 7 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
index ade52e8c68..1bfe824eb4 100644
--- a/sysdeps/x86_64/multiarch/bcmp-evex.S
+++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
@@ -16,8 +16,305 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#ifndef MEMCMP
-# define MEMCMP	__bcmp_evex
-#endif
+#if IS_IN (libc)
+
+/* bcmp is implemented as:
+   1. Use ymm vector compares when possible. The only case where
+      vector compares are not possible is when size < VEC_SIZE
+      and loading from either s1 or s2 would cause a page cross.
+   2. Use xmm vector compare when size >= 8 bytes.
+   3. Optimistically compare up to first 4 * VEC_SIZE one at a
+      time to check for early mismatches. Only do this if it is
+      guaranteed the work is not wasted.
+   4. If size is 8 * VEC_SIZE or less, unroll the loop.
+   5. Compare 4 * VEC_SIZE at a time with the aligned first memory
+      area.
+   6. Use 2 vector compares when size is 2 * VEC_SIZE or less.
+   7. Use 4 vector compares when size is 4 * VEC_SIZE or less.
+   8. Use 8 vector compares when size is 8 * VEC_SIZE or less.  */
+
+# include <sysdep.h>
+
+# ifndef BCMP
+#  define BCMP	__bcmp_evex
+# endif
+
+# define VMOVU	vmovdqu64
+# define VPCMP	vpcmpub
+# define VPTEST	vptestmb
+
+# define VEC_SIZE	32
+# define PAGE_SIZE	4096
+
+# define YMM0		ymm16
+# define YMM1		ymm17
+# define YMM2		ymm18
+# define YMM3		ymm19
+# define YMM4		ymm20
+# define YMM5		ymm21
+# define YMM6		ymm22
+
+
+	.section .text.evex, "ax", @progbits
+ENTRY_P2ALIGN (BCMP, 6)
+# ifdef __ILP32__
+	/* Clear the upper 32 bits.  */
+	movl	%edx, %edx
+# endif
+	cmp	$VEC_SIZE, %RDX_LP
+	jb	L(less_vec)
+
+	/* From VEC to 2 * VEC.  No branch when size == VEC_SIZE.  */
+	VMOVU	(%rsi), %YMM1
+	/* Use compare not equals to directly check for mismatch.  */
+	VPCMP	$4, (%rdi), %YMM1, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	cmpq	$(VEC_SIZE * 2), %rdx
+	jbe	L(last_1x_vec)
+
+	/* Check second VEC no matter what.  */
+	VMOVU	VEC_SIZE(%rsi), %YMM2
+	VPCMP	$4, VEC_SIZE(%rdi), %YMM2, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	/* Less than 4 * VEC.  */
+	cmpq	$(VEC_SIZE * 4), %rdx
+	jbe	L(last_2x_vec)
+
+	/* Check third and fourth VEC no matter what.  */
+	VMOVU	(VEC_SIZE * 2)(%rsi), %YMM3
+	VPCMP	$4, (VEC_SIZE * 2)(%rdi), %YMM3, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	VMOVU	(VEC_SIZE * 3)(%rsi), %YMM4
+	VPCMP	$4, (VEC_SIZE * 3)(%rdi), %YMM4, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	/* Go to 4x VEC loop.  */
+	cmpq	$(VEC_SIZE * 8), %rdx
+	ja	L(more_8x_vec)
+
+	/* Handle remainder of size = 4 * VEC + 1 to 8 * VEC without any
+	   branches.  */
+
+	VMOVU	-(VEC_SIZE * 4)(%rsi, %rdx), %YMM1
+	VMOVU	-(VEC_SIZE * 3)(%rsi, %rdx), %YMM2
+	addq	%rdx, %rdi
+
+	/* Wait to load from s1 until the address is adjusted, due to
+	   unlamination.  */
+
+	/* vpxor will be all 0s if s1 and s2 are equal. Otherwise it will
+	   have some 1s.  */
+	vpxorq	-(VEC_SIZE * 4)(%rdi), %YMM1, %YMM1
+	vpxorq	-(VEC_SIZE * 3)(%rdi), %YMM2, %YMM2
+
+	VMOVU	-(VEC_SIZE * 2)(%rsi, %rdx), %YMM3
+	vpxorq	-(VEC_SIZE * 2)(%rdi), %YMM3, %YMM3
+	/* Or together YMM1, YMM2, and YMM3 into YMM3.  */
+	vpternlogd $0xfe, %YMM1, %YMM2, %YMM3
 
-#include "memcmp-evex-movbe.S"
+	VMOVU	-(VEC_SIZE)(%rsi, %rdx), %YMM4
+	/* Ternary logic to xor -(VEC_SIZE)(%rdi) with YMM4 while oring
+	   with YMM3. Result is stored in YMM4.  */
+	vpternlogd $0xde, -(VEC_SIZE)(%rdi), %YMM3, %YMM4
+	/* Compare YMM4 with 0. If any 1s s1 and s2 don't match.  */
+	VPTEST	%YMM4, %YMM4, %k1
+	kmovd	%k1, %eax
+L(return_neq0):
+	ret
+
+	/* Fits in padding needed to .p2align 5 L(less_vec).  */
+L(last_1x_vec):
+	VMOVU	-(VEC_SIZE * 1)(%rsi, %rdx), %YMM1
+	VPCMP	$4, -(VEC_SIZE * 1)(%rdi, %rdx), %YMM1, %k1
+	kmovd	%k1, %eax
+	ret
+
+	/* NB: p2align 5 here will ensure the L(loop_4x_vec) is also 32 byte
+	   aligned.  */
+	.p2align 5
+L(less_vec):
+	/* Check if one or less char. This is necessary for size = 0 but is
+	   also faster for size = 1.  */
+	cmpl	$1, %edx
+	jbe	L(one_or_less)
+
+	/* Check if loading one VEC from either s1 or s2 could cause a page
+	   cross. This can have false positives but is by far the fastest
+	   method.  */
+	movl	%edi, %eax
+	orl	%esi, %eax
+	andl	$(PAGE_SIZE - 1), %eax
+	cmpl	$(PAGE_SIZE - VEC_SIZE), %eax
+	jg	L(page_cross_less_vec)
+
+	/* No page cross possible.  */
+	VMOVU	(%rsi), %YMM2
+	VPCMP	$4, (%rdi), %YMM2, %k1
+	kmovd	%k1, %eax
+	/* Result will be zero if s1 and s2 match. Otherwise first set bit
+	   will be first mismatch.  */
+	bzhil	%edx, %eax, %eax
+	ret
+
+	/* Relatively cold but placing close to L(less_vec) for 2 byte jump
+	   encoding.  */
+	.p2align 4
+L(one_or_less):
+	jb	L(zero)
+	movzbl	(%rsi), %ecx
+	movzbl	(%rdi), %eax
+	subl	%ecx, %eax
+	/* No ymm register was touched.  */
+	ret
+	/* Within the same 16 byte block as L(one_or_less).  */
+L(zero):
+	xorl	%eax, %eax
+	ret
+
+	.p2align 4
+L(last_2x_vec):
+	VMOVU	-(VEC_SIZE * 2)(%rsi, %rdx), %YMM1
+	vpxorq	-(VEC_SIZE * 2)(%rdi, %rdx), %YMM1, %YMM1
+	VMOVU	-(VEC_SIZE * 1)(%rsi, %rdx), %YMM2
+	vpternlogd $0xde, -(VEC_SIZE * 1)(%rdi, %rdx), %YMM1, %YMM2
+	VPTEST	%YMM2, %YMM2, %k1
+	kmovd	%k1, %eax
+	ret
+
+	.p2align 4
+L(more_8x_vec):
+	/* Set end of s1 in rdx.  */
+	leaq	-(VEC_SIZE * 4)(%rdi, %rdx), %rdx
+	/* rsi stores s2 - s1. This allows loop to only update one pointer.
+	 */
+	subq	%rdi, %rsi
+	/* Align s1 pointer.  */
+	andq	$-VEC_SIZE, %rdi
+	/* Adjust because first 4x vec were checked already.  */
+	subq	$-(VEC_SIZE * 4), %rdi
+	.p2align 4
+L(loop_4x_vec):
+	VMOVU	(%rsi, %rdi), %YMM1
+	vpxorq	(%rdi), %YMM1, %YMM1
+
+	VMOVU	VEC_SIZE(%rsi, %rdi), %YMM2
+	vpxorq	VEC_SIZE(%rdi), %YMM2, %YMM2
+
+	VMOVU	(VEC_SIZE * 2)(%rsi, %rdi), %YMM3
+	vpxorq	(VEC_SIZE * 2)(%rdi), %YMM3, %YMM3
+	vpternlogd $0xfe, %YMM1, %YMM2, %YMM3
+
+	VMOVU	(VEC_SIZE * 3)(%rsi, %rdi), %YMM4
+	vpternlogd $0xde, (VEC_SIZE * 3)(%rdi), %YMM3, %YMM4
+	VPTEST	%YMM4, %YMM4, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq2)
+	subq	$-(VEC_SIZE * 4), %rdi
+	cmpq	%rdx, %rdi
+	jb	L(loop_4x_vec)
+
+	subq	%rdx, %rdi
+	VMOVU	(VEC_SIZE * 3)(%rsi, %rdx), %YMM4
+	vpxorq	(VEC_SIZE * 3)(%rdx), %YMM4, %YMM4
+	/* rdi has 4 * VEC_SIZE - remaining length.  */
+	cmpl	$(VEC_SIZE * 3), %edi
+	jae	L(8x_last_1x_vec)
+	/* Load regardless of branch.  */
+	VMOVU	(VEC_SIZE * 2)(%rsi, %rdx), %YMM3
+	/* Ternary logic to xor (VEC_SIZE * 2)(%rdx) with YMM3 while oring
+	   with YMM4. Result is stored in YMM4.  */
+	vpternlogd $0xf6, (VEC_SIZE * 2)(%rdx), %YMM3, %YMM4
+	cmpl	$(VEC_SIZE * 2), %edi
+	jae	L(8x_last_2x_vec)
+
+	VMOVU	VEC_SIZE(%rsi, %rdx), %YMM2
+	vpxorq	VEC_SIZE(%rdx), %YMM2, %YMM2
+
+	VMOVU	(%rsi, %rdx), %YMM1
+	vpxorq	(%rdx), %YMM1, %YMM1
+
+	vpternlogd $0xfe, %YMM1, %YMM2, %YMM4
+L(8x_last_1x_vec):
+L(8x_last_2x_vec):
+	VPTEST	%YMM4, %YMM4, %k1
+	kmovd	%k1, %eax
+L(return_neq2):
+	ret
+
+	/* Relatively cold case as page crosses are unexpected.  */
+	.p2align 4
+L(page_cross_less_vec):
+	cmpl	$16, %edx
+	jae	L(between_16_31)
+	cmpl	$8, %edx
+	ja	L(between_9_15)
+	cmpl	$4, %edx
+	jb	L(between_2_3)
+	/* From 4 to 8 bytes.  No branch when size == 4.  */
+	movl	(%rdi), %eax
+	movl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movl	-4(%rdi, %rdx), %ecx
+	movl	-4(%rsi, %rdx), %esi
+	subl	%esi, %ecx
+	orl	%ecx, %eax
+	ret
+
+	.p2align 4,, 8
+L(between_9_15):
+	/* Safe to use xmm[0, 15] as no vzeroupper is needed so RTM safe.
+	 */
+	vmovq	(%rdi), %xmm1
+	vmovq	(%rsi), %xmm2
+	vpcmpeqb %xmm1, %xmm2, %xmm3
+	vmovq	-8(%rdi, %rdx), %xmm1
+	vmovq	-8(%rsi, %rdx), %xmm2
+	vpcmpeqb %xmm1, %xmm2, %xmm2
+	vpand	%xmm2, %xmm3, %xmm3
+	vpmovmskb %xmm3, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_16_31):
+	/* From 16 to 31 bytes.  No branch when size == 16.  */
+
+	/* Safe to use xmm[0, 15] as no vzeroupper is needed so RTM safe.
+	 */
+	vmovdqu	(%rsi), %xmm1
+	vpcmpeqb (%rdi), %xmm1, %xmm1
+	vmovdqu	-16(%rsi, %rdx), %xmm2
+	vpcmpeqb -16(%rdi, %rdx), %xmm2, %xmm2
+	vpand	%xmm1, %xmm2, %xmm2
+	vpmovmskb %xmm2, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_2_3):
+	/* From 2 to 3 bytes.  No branch when size == 2.  */
+	movzwl	(%rdi), %eax
+	movzwl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movzbl	-1(%rdi, %rdx), %edi
+	movzbl	-1(%rsi, %rdx), %esi
+	subl	%edi, %esi
+	orl	%esi, %eax
+	/* No ymm register was touched.  */
+	ret
+END (BCMP)
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
index f94516e5ee..51f251d0c9 100644
--- a/sysdeps/x86_64/multiarch/ifunc-bcmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
@@ -35,8 +35,7 @@ IFUNC_SELECTOR (void)
       && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
     {
       if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-	  && CPU_FEATURE_USABLE_P (cpu_features, MOVBE))
+	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
 	return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index cda0316928..abbb4e407f 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -52,7 +52,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, bcmp,
 			      (CPU_FEATURE_USABLE (AVX512VL)
 			       && CPU_FEATURE_USABLE (AVX512BW)
-                   && CPU_FEATURE_USABLE (MOVBE)
 			       && CPU_FEATURE_USABLE (BMI2)),
 			      __bcmp_evex)
 	      IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
                   ` (3 preceding siblings ...)
  2021-09-13 23:05 ` [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S Noah Goldstein via Libc-alpha
@ 2021-09-13 23:22 ` Noah Goldstein via Libc-alpha
  2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
  2021-09-15  0:00 ` [PATCH " Joseph Myers
  6 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-13 23:22 UTC (permalink / raw)
  To: GNU C Library

[-- Attachment #1: Type: text/plain, Size: 24661 bytes --]

On Mon, Sep 13, 2021 at 6:21 PM Noah Goldstein <goldstein.w.n@gmail.com>
wrote:

> No bug. This commit adds support for an optimized bcmp implementation.
> Support is for sse2, sse4_1, avx2, and evex.
>
> All string tests passing and build succeeding.
> ---
> This commit is essentially because compilers will optimize the
> idiomatic use of memcmp return as a boolean:
>
> https://godbolt.org/z/Tbhefh6cv
>
> so it seems reasonable to have an optimized bcmp implementation as we
> can get ~0-25% improvement (generally larger improvement for the
> smaller size ranges which ultimately are the most important to optimize
> for).
>
> Numbers for new implementations attached in reply.
>

Numbers in this email.


>
> Tests were run on the following CPUs:
>
> Tigerlake:
> https://ark.intel.com/content/www/us/en/ark/products/208921/intel-core-i7-1165g7-processor-12m-cache-up-to-4-70-ghz-with-ipu.html
> Skylake:
> https://ark.intel.com/content/www/us/en/ark/products/149091/intel-core-i7-8565u-processor-8m-cache-up-to-4-60-ghz.html
>
> Some notes on the numbers.
>
> There are some regressions in the sse2/sse4_1 versions. I didn't
> optimize these versions beyond defining out obviously irrelevant code
> for bcmp. My intuition is that the slowdowns are alignment related. I
> am not sure if these issues would translate to architectures that
> would actually use sse2/sse4_1.
>
> I add the sse2/sse4_1 implementations mostly so that the ifunc would
> have something to fallback on. With the lackluster numbers it may not
> be worth it, especially factoring in code size costs. Thoughts?
>
> The Tigerlake and Skylake versions are basically universal
> improvements for evex and avx2. I opted to align bcmp to 64 byte as
> opposed to 16. The rationale is that to optimize for frontend behavior
> on either machine, only a 16 byte guarantee is not enough. I think in
> any function where throughput might be important (which I think bcmp
> can be), good frontend behavior is important.
>
>
>  benchtests/Makefile                        |  2 +-
>  benchtests/bench-bcmp.c                    | 20 ++++++++
>  benchtests/bench-memcmp.c                  |  4 +-
>  string/Makefile                            |  4 +-
>  string/test-bcmp.c                         | 21 +++++++++
>  string/test-memcmp.c                       | 27 +++++++----
>  sysdeps/x86_64/memcmp.S                    |  2 -
>  sysdeps/x86_64/multiarch/Makefile          |  3 ++
>  sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S   | 12 +++++
>  sysdeps/x86_64/multiarch/bcmp-avx2.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp-evex.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp-sse2.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp-sse4.S       | 23 ++++++++++
>  sysdeps/x86_64/multiarch/bcmp.c            | 35 ++++++++++++++
>  sysdeps/x86_64/multiarch/ifunc-bcmp.h      | 53 ++++++++++++++++++++++
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 23 ++++++++++
>  sysdeps/x86_64/multiarch/memcmp-sse2.S     |  4 +-
>  sysdeps/x86_64/multiarch/memcmp.c          |  2 -
>  18 files changed, 286 insertions(+), 18 deletions(-)
>  create mode 100644 benchtests/bench-bcmp.c
>  create mode 100644 string/test-bcmp.c
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-evex.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse2.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse4.S
>  create mode 100644 sysdeps/x86_64/multiarch/bcmp.c
>  create mode 100644 sysdeps/x86_64/multiarch/ifunc-bcmp.h
>
> diff --git a/benchtests/Makefile b/benchtests/Makefile
> index 1530939a8c..5fc495eb57 100644
> --- a/benchtests/Makefile
> +++ b/benchtests/Makefile
> @@ -47,7 +47,7 @@ bench := $(foreach B,$(filter bench-%,${BENCHSET}),
> ${${B}})
>  endif
>
>  # String function benchmarks.
> -string-benchset := memccpy memchr memcmp memcpy memmem memmove \
> +string-benchset := bcmp memccpy memchr memcmp memcpy memmem memmove \
>                    mempcpy memset rawmemchr stpcpy stpncpy strcasecmp
> strcasestr \
>                    strcat strchr strchrnul strcmp strcpy strcspn strlen \
>                    strncasecmp strncat strncmp strncpy strnlen strpbrk
> strrchr \
> diff --git a/benchtests/bench-bcmp.c b/benchtests/bench-bcmp.c
> new file mode 100644
> index 0000000000..1023639787
> --- /dev/null
> +++ b/benchtests/bench-bcmp.c
> @@ -0,0 +1,20 @@
> +/* Measure bcmp functions.
> +   Copyright (C) 2015-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define TEST_BCMP 1
> +#include "bench-memcmp.c"
> diff --git a/benchtests/bench-memcmp.c b/benchtests/bench-memcmp.c
> index 744c7ec5ba..4d5f8fb766 100644
> --- a/benchtests/bench-memcmp.c
> +++ b/benchtests/bench-memcmp.c
> @@ -17,7 +17,9 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
>  # define TEST_NAME "wmemcmp"
>  #else
>  # define TEST_NAME "memcmp"
> diff --git a/string/Makefile b/string/Makefile
> index f0fce2a0b8..f1f67ee157 100644
> --- a/string/Makefile
> +++ b/string/Makefile
> @@ -35,7 +35,7 @@ routines      := strcat strchr strcmp strcoll strcpy
> strcspn          \
>                    strncat strncmp strncpy                              \
>                    strrchr strpbrk strsignal strspn strstr strtok       \
>                    strtok_r strxfrm memchr memcmp memmove memset        \
> -                  mempcpy bcopy bzero ffs ffsll stpcpy stpncpy         \
> +                  mempcpy bcmp bcopy bzero ffs ffsll stpcpy stpncpy
>       \
>                    strcasecmp strncase strcasecmp_l strncase_l          \
>                    memccpy memcpy wordcopy strsep strcasestr            \
>                    swab strfry memfrob memmem rawmemchr strchrnul       \
> @@ -52,7 +52,7 @@ strop-tests   := memchr memcmp memcpy memmove mempcpy
> memset memccpy  \
>                    stpcpy stpncpy strcat strchr strcmp strcpy strcspn   \
>                    strlen strncmp strncpy strpbrk strrchr strspn memmem \
>                    strstr strcasestr strnlen strcasecmp strncasecmp     \
> -                  strncat rawmemchr strchrnul bcopy bzero memrchr      \
> +                  strncat rawmemchr strchrnul bcmp bcopy bzero memrchr \
>                    explicit_bzero
>  tests          := tester inl-tester noinl-tester testcopy test-ffs     \
>                    tst-strlen stratcliff tst-svc tst-inlcall            \
> diff --git a/string/test-bcmp.c b/string/test-bcmp.c
> new file mode 100644
> index 0000000000..6d19a4a87c
> --- /dev/null
> +++ b/string/test-bcmp.c
> @@ -0,0 +1,21 @@
> +/* Test and measure bcmp functions.
> +   Copyright (C) 2012-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define BAD_RESULT(result, expec) ((!(result)) != (!(expec)))
> +#define TEST_BCMP 1
> +#include "test-memcmp.c"
> diff --git a/string/test-memcmp.c b/string/test-memcmp.c
> index 6ddbc05d2f..c630e6799d 100644
> --- a/string/test-memcmp.c
> +++ b/string/test-memcmp.c
> @@ -17,11 +17,14 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #define TEST_MAIN
> -#ifdef WIDE
> +#ifdef TEST_BCMP
> +# define TEST_NAME "bcmp"
> +#elif defined WIDE
>  # define TEST_NAME "wmemcmp"
>  #else
>  # define TEST_NAME "memcmp"
>  #endif
> +
>  #include "test-string.h"
>  #ifdef WIDE
>  # include <inttypes.h>
> @@ -35,6 +38,7 @@
>  # define CHARBYTES 4
>  # define CHAR__MIN WCHAR_MIN
>  # define CHAR__MAX WCHAR_MAX
> +
>  int
>  simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
>  {
> @@ -48,8 +52,11 @@ simple_wmemcmp (const wchar_t *s1, const wchar_t *s2,
> size_t n)
>  }
>  #else
>  # include <limits.h>
> -
> -# define MEMCMP memcmp
> +# ifdef TEST_BCMP
> +#  define MEMCMP bcmp
> +# else
> +#  define MEMCMP memcmp
> +# endif
>  # define MEMCPY memcpy
>  # define SIMPLE_MEMCMP simple_memcmp
>  # define CHAR char
> @@ -69,6 +76,12 @@ simple_memcmp (const char *s1, const char *s2, size_t n)
>  }
>  #endif
>
> +# ifndef BAD_RESULT
> +#  define BAD_RESULT(result, expec)                                     \
> +    (((result) == 0 && (expec)) || ((result) < 0 && (expec) >= 0) ||    \
> +     ((result) > 0 && (expec) <= 0))
> +#  endif
> +
>  typedef int (*proto_t) (const CHAR *, const CHAR *, size_t);
>
>  IMPL (SIMPLE_MEMCMP, 0)
> @@ -79,9 +92,7 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR
> *s2, size_t len,
>               int exp_result)
>  {
>    int result = CALL (impl, s1, s2, len);
> -  if ((exp_result == 0 && result != 0)
> -      || (exp_result < 0 && result >= 0)
> -      || (exp_result > 0 && result <= 0))
> +  if (BAD_RESULT(result, exp_result))
>      {
>        error (0, 0, "Wrong result in function %s %d %d", impl->name,
>              result, exp_result);
> @@ -186,9 +197,7 @@ do_random_tests (void)
>         {
>           r = CALL (impl, (CHAR *) p1 + align1, (const CHAR *) p2 + align2,
>                     len);
> -         if ((r == 0 && result)
> -             || (r < 0 && result >= 0)
> -             || (r > 0 && result <= 0))
> +         if (BAD_RESULT(r, result))
>             {
>               error (0, 0, "Iteration %zd - wrong result in function %s
> (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
>                      n, impl->name, align1 * CHARBYTES & 63,  align2 *
> CHARBYTES & 63, len, pos, r, result, p1, p2);
> diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
> index 870e15c5a0..dfd0269db2 100644
> --- a/sysdeps/x86_64/memcmp.S
> +++ b/sysdeps/x86_64/memcmp.S
> @@ -356,6 +356,4 @@ L(ATR32res):
>         .p2align 4,, 4
>  END(memcmp)
>
> -#undef bcmp
> -weak_alias (memcmp, bcmp)
>  libc_hidden_builtin_def (memcmp)
> diff --git a/sysdeps/x86_64/multiarch/Makefile
> b/sysdeps/x86_64/multiarch/Makefile
> index 26be40959c..9dd0d8c3ff 100644
> --- a/sysdeps/x86_64/multiarch/Makefile
> +++ b/sysdeps/x86_64/multiarch/Makefile
> @@ -1,6 +1,7 @@
>  ifeq ($(subdir),string)
>
>  sysdep_routines += strncat-c stpncpy-c strncpy-c \
> +                  bcmp-sse2 bcmp-sse4 bcmp-avx2 \
>                    strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3  \
>                    strcmp-sse4_2 strcmp-avx2 \
>                    strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
> @@ -40,6 +41,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>                    memset-sse2-unaligned-erms \
>                    memset-avx2-unaligned-erms \
>                    memset-avx512-unaligned-erms \
> +                  bcmp-avx2-rtm \
>                    memchr-avx2-rtm \
>                    memcmp-avx2-movbe-rtm \
>                    memmove-avx-unaligned-erms-rtm \
> @@ -59,6 +61,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>                    strncpy-avx2-rtm \
>                    strnlen-avx2-rtm \
>                    strrchr-avx2-rtm \
> +                  bcmp-evex \
>                    memchr-evex \
>                    memcmp-evex-movbe \
>                    memmove-evex-unaligned-erms \
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> new file mode 100644
> index 0000000000..d742257e4e
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
> @@ -0,0 +1,12 @@
> +#ifndef MEMCMP
> +# define MEMCMP __bcmp_avx2_rtm
> +#endif
> +
> +#define ZERO_UPPER_VEC_REGISTERS_RETURN \
> +  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
> +
> +#define VZEROUPPER_RETURN jmp   L(return_vzeroupper)
> +
> +#define SECTION(p) p##.avx.rtm
> +
> +#include "bcmp-avx2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S
> b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> new file mode 100644
> index 0000000000..93a9a20b17
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with AVX2.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP        __bcmp_avx2
> +#endif
> +
> +#include "memcmp-avx2-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S
> b/sysdeps/x86_64/multiarch/bcmp-evex.S
> new file mode 100644
> index 0000000000..ade52e8c68
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with EVEX.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP        __bcmp_evex
> +#endif
> +
> +#include "memcmp-evex-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse2.S
> b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> new file mode 100644
> index 0000000000..b18d570386
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse2.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE2
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# ifndef memcmp
> +#  define memcmp       __bcmp_sse2
> +# endif
> +# define USE_AS_BCMP   1
> +#include "memcmp-sse2.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp-sse4.S
> b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> new file mode 100644
> index 0000000000..ed9804053f
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp-sse4.S
> @@ -0,0 +1,23 @@
> +/* bcmp optimized with SSE4.1
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# ifndef MEMCMP
> +#  define MEMCMP       __bcmp_sse4_1
> +# endif
> +# define USE_AS_BCMP   1
> +#include "memcmp-sse4.S"
> diff --git a/sysdeps/x86_64/multiarch/bcmp.c
> b/sysdeps/x86_64/multiarch/bcmp.c
> new file mode 100644
> index 0000000000..6e26b73ecc
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/bcmp.c
> @@ -0,0 +1,35 @@
> +/* Multiple versions of bcmp.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +#if IS_IN (libc)
> +# define bcmp __redirect_bcmp
> +# include <string.h>
> +# undef bcmp
> +
> +# define SYMBOL_NAME bcmp
> +# include "ifunc-bcmp.h"
> +
> +libc_ifunc_redirected (__redirect_bcmp, bcmp, IFUNC_SELECTOR ());
> +
> +# ifdef SHARED
> +__hidden_ver1 (bcmp, __GI_bcmp, __redirect_bcmp)
> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (bcmp);
> +# endif
> +#endif
> diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> new file mode 100644
> index 0000000000..b0dacd8526
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
> @@ -0,0 +1,53 @@
> +/* Common definition for bcmp ifunc selections.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +# include <init-arch.h>
> +
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> +
> +static inline void *
> +IFUNC_SELECTOR (void)
> +{
> +  const struct cpu_features* cpu_features = __get_cpu_features ();
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
> +      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
> +    {
> +      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> +         && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> +       return OPTIMIZE (evex);
> +
> +      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> +       return OPTIMIZE (avx2_rtm);
> +
> +      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
> +       return OPTIMIZE (avx2);
> +    }
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
> +    return OPTIMIZE (sse4_1);
> +
> +  return OPTIMIZE (sse2);
> +}
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 39ab10613b..dd0c393c7d 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -38,6 +38,29 @@ __libc_ifunc_impl_list (const char *name, struct
> libc_ifunc_impl *array,
>
>    size_t i = 0;
>
> +  /* Support sysdeps/x86_64/multiarch/bcmp.c.  */
> +  IFUNC_IMPL (i, name, bcmp,
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                   && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (BMI2)),
> +                             __bcmp_avx2)
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)
> +                   && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (RTM)),
> +                             __bcmp_avx2_rtm)
> +             IFUNC_IMPL_ADD (array, i, bcmp,
> +                             (CPU_FEATURE_USABLE (AVX512VL)
> +                              && CPU_FEATURE_USABLE (AVX512BW)
> +                   && CPU_FEATURE_USABLE (MOVBE)
> +                              && CPU_FEATURE_USABLE (BMI2)),
> +                             __bcmp_evex)
> +             IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
> +                             __bcmp_sse4_1)
> +             IFUNC_IMPL_ADD (array, i, bcmp, 1, __bcmp_sse2))
> +
>    /* Support sysdeps/x86_64/multiarch/memchr.c.  */
>    IFUNC_IMPL (i, name, memchr,
>               IFUNC_IMPL_ADD (array, i, memchr,
> diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S
> b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> index b135fa2d40..2a4867ad18 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> @@ -17,7 +17,9 @@
>     <https://www.gnu.org/licenses/>.  */
>
>  #if IS_IN (libc)
> -# define memcmp __memcmp_sse2
> +# ifndef memcmp
> +#  define memcmp __memcmp_sse2
> +# endif
>
>  # ifdef SHARED
>  #  undef libc_hidden_builtin_def
> diff --git a/sysdeps/x86_64/multiarch/memcmp.c
> b/sysdeps/x86_64/multiarch/memcmp.c
> index fe725f3563..1760e045df 100644
> --- a/sysdeps/x86_64/multiarch/memcmp.c
> +++ b/sysdeps/x86_64/multiarch/memcmp.c
> @@ -27,8 +27,6 @@
>  # include "ifunc-memcmp.h"
>
>  libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
> -# undef bcmp
> -weak_alias (memcmp, bcmp)
>
>  # ifdef SHARED
>  __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
> --
> 2.25.1
>
>

[-- Attachment #2: bcmp-skl.pdf --]
[-- Type: application/pdf, Size: 195097 bytes --]

[-- Attachment #3: bcmp-tgl.pdf --]
[-- Type: application/pdf, Size: 223172 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-13 23:05 ` [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S Noah Goldstein via Libc-alpha
@ 2021-09-14  1:18   ` Carlos O'Donell via Libc-alpha
  2021-09-14  2:05     ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Carlos O'Donell via Libc-alpha @ 2021-09-14  1:18 UTC (permalink / raw)
  To: Noah Goldstein, libc-alpha

On 9/13/21 7:05 PM, Noah Goldstein via Libc-alpha wrote:
> No bug. This commit adds new optimized bcmp implementation for evex.
> 
> The primary optimizations are 1) skipping the logic to find the
> difference of the first mismatched byte and 2) not updating src/dst
> addresses as the non-equals logic does not need to be reused by
> different areas.
> 
> The entry alignment has been fixed at 64. In throughput-sensitive
> functions, which bcmp can potentially be, frontend loop performance is
> important to optimize for. This is impossible/difficult to do/maintain
> with only 16 byte fixed alignment.
> 
> test-memcmp, test-bcmp, and test-wmemcmp are all passing.
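
As an illustration of optimization 1) in the quoted description, here is a minimal C sketch, not the EVEX assembly from the patch. The name `sketch_bcmp` and the 8-byte chunk size are assumptions made for the example. The point it shows is that an equality-only compare can return as soon as any chunk differs, without locating the first mismatched byte or computing its difference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a glibc function.  Returns 0 if the first
   n bytes of s1 and s2 are equal and 1 otherwise.  */
int
sketch_bcmp (const void *s1, const void *s2, size_t n)
{
  const unsigned char *p1 = s1;
  const unsigned char *p2 = s2;

  /* Compare 8 bytes at a time.  memcpy keeps the loads alignment-safe.  */
  while (n >= sizeof (uint64_t))
    {
      uint64_t a, b;
      memcpy (&a, p1, sizeof a);
      memcpy (&b, p2, sizeof b);
      /* Any difference is enough; which byte differs is never computed.  */
      if (a != b)
        return 1;
      p1 += sizeof a;
      p2 += sizeof b;
      n -= sizeof a;
    }

  /* Remaining tail bytes, one at a time.  */
  while (n-- > 0)
    if (*p1++ != *p2++)
      return 1;

  return 0;
}
```

A memcmp-style function would need extra work on the mismatch path to find the first differing byte and derive a signed result from it.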

This series fails in the containerized 32-bit x86 CI/CD regression tester.
https://patchwork.sourceware.org/project/glibc/patch/20210913230506.546749-5-goldstein.w.n@gmail.com/

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  1:18   ` Carlos O'Donell via Libc-alpha
@ 2021-09-14  2:05     ` Noah Goldstein via Libc-alpha
  2021-09-14  2:35       ` Carlos O'Donell via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  2:05 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

On Mon, Sep 13, 2021 at 8:18 PM Carlos O'Donell <carlos@redhat.com> wrote:

> On 9/13/21 7:05 PM, Noah Goldstein via Libc-alpha wrote:
> > No bug. This commit adds new optimized bcmp implementation for evex.
> >
> > The primary optimizations are 1) skipping the logic to find the
> > difference of the first mismatched byte and 2) not updating src/dst
> > addresses as the non-equals logic does not need to be reused by
> > different areas.
> >
> > The entry alignment has been fixed at 64. In throughput-sensitive
> > functions, which bcmp can potentially be, frontend loop performance is
> > important to optimize for. This is impossible/difficult to do/maintain
> > with only 16 byte fixed alignment.
> >
> > test-memcmp, test-bcmp, and test-wmemcmp are all passing.
>
> This series fails in the containerized 32-bit x86 CI/CD regression tester.
>
> https://patchwork.sourceware.org/project/glibc/patch/20210913230506.546749-5-goldstein.w.n@gmail.com/


Shoot.

AFAICT the first error is:
*** No rule to make target '/build/string/stamp.os', needed by
'/build/libc_pic.a'.

I saw that issue earlier when I was working on just supporting bcmp for the
first commit:

[PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex

So I think I missed/messed up something there regarding the necessary
changes to the Makefile/build infrastructure to support the change.

While it doesn't appear to be an issue on my local machine, I left the
redirect in string/memcmp.c:

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memcmp.c;h=9b46d7a905c8b7886f046b7660f63df10dc4573c;hb=HEAD#l360

But that was one area where I didn't really know the right answer.


Does anyone know if there is anything special that needs to be done for the
32-bit build when adding a new implementation?

Also, does anyone know what make/configure commands I need to reproduce this
on an x86_64-Linux machine? The build log doesn't appear to have the command.

For my completely fresh build / testing I ran:

rm -rf /path/to/build/glibc; mkdir -p /path/to/build/glibc; (cd
/path/to/build/glibc/; unset LD_LIBRARY_PATH; /path/to/src/glibc/configure
--prefix=/usr; make --silent; make xcheck; make -r -C
/path/to/src/glibc/string/ objdir=`pwd` check; make -r -C
/path/to/src/glibc/wcsmbs/ objdir=`pwd` check)

which doesn't appear to have cut it.


>
> --
> Cheers,
> Carlos.
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  2:05     ` Noah Goldstein via Libc-alpha
@ 2021-09-14  2:35       ` Carlos O'Donell via Libc-alpha
  2021-09-14  2:55         ` DJ Delorie via Libc-alpha
  2021-09-14  3:40         ` Noah Goldstein via Libc-alpha
  0 siblings, 2 replies; 51+ messages in thread
From: Carlos O'Donell via Libc-alpha @ 2021-09-14  2:35 UTC (permalink / raw)
  To: Noah Goldstein, DJ Delorie; +Cc: GNU C Library

On 9/13/21 10:05 PM, Noah Goldstein wrote:
> On Mon, Sep 13, 2021 at 8:18 PM Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> On 9/13/21 7:05 PM, Noah Goldstein via Libc-alpha wrote:
>>> No bug. This commit adds new optimized bcmp implementation for evex.
>>>
>>> The primary optimizations are 1) skipping the logic to find the
>>> difference of the first mismatched byte and 2) not updating src/dst
>>> addresses as the non-equals logic does not need to be reused by
>>> different areas.
>>>
>>> The entry alignment has been fixed at 64. In throughput-sensitive
>>> functions, which bcmp can potentially be, frontend loop performance is
>>> important to optimize for. This is impossible/difficult to do/maintain
>>> with only 16 byte fixed alignment.
>>>
>>> test-memcmp, test-bcmp, and test-wmemcmp are all passing.
>>
>> This series fails in the containerized 32-bit x86 CI/CD regression tester.
>>
>> https://patchwork.sourceware.org/project/glibc/patch/20210913230506.546749-5-goldstein.w.n@gmail.com/
> 
> 
> Shoot.

No worries! That's what the CI/CD system is there for :-)
 
> AFAICT the first error is:
> *** No rule to make target '/build/string/stamp.os', needed by
> '/build/libc_pic.a'.
 
I think a normal 32-bit x86 build should show this issue.

You need a gcc that accepts -m32.

I minimally set:
export CC="gcc -m32 -Wl,--build-id=none"
export CXX="g++ -m32 -Wl,--build-id=none"
export CFLAGS="-g -O2 -march=i686 -Wl,--build-id=none"
export CXXFLAGS="-g -O2 -march=i686 -Wl,--build-id=none"
export CPPFLAGS="-g -O2 -march=i686 -Wl,--build-id=none"

Then build with --host.

e.g.

/home/carlos/src/glibc-work/configure --host i686-pc-linux-gnu \
  CC="gcc -m32 -Wl,--build-id=none" \
  CFLAGS="-g -O2 -march=i686 -Wl,--build-id=none" \
  CPPFLAGS="-g -O2 -march=i686 -Wl,--build-id=none" \
  CXX="g++ -m32 -Wl,--build-id=none" \
  CXXFLAGS="-g -O2 -march=i686 -Wl,--build-id=none" \
  --prefix=/usr \
  --with-headers=/home/carlos/build/glibc-headers-work-i686/include \
  --with-selinux --disable-nss-crypt --enable-bind-now --enable-static-pie \
  --enable-systemtap --enable-hardcoded-path-in-tests --enable-tunables=yes \
  --enable-add-ons

> Also, does anyone know what make/configure commands I need to reproduce
> this on a x86_64-Linux machine? The build log doesn't appear to have the command.

DJ, Should the trybot log the configure step?

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  2:35       ` Carlos O'Donell via Libc-alpha
@ 2021-09-14  2:55         ` DJ Delorie via Libc-alpha
  2021-09-14  3:24           ` Noah Goldstein via Libc-alpha
  2021-09-14  3:40         ` Noah Goldstein via Libc-alpha
  1 sibling, 1 reply; 51+ messages in thread
From: DJ Delorie via Libc-alpha @ 2021-09-14  2:55 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

"Carlos O'Donell" <carlos@redhat.com> writes:
>> Also, does anyone know what make/configure commands I need to reproduce
>> this on a x86_64-Linux machine? The build log doesn't appear to have the command.
>
> DJ, Should the trybot log the configure step?

Perhaps.  It's in the stdout that gets added to the trybot's general log
file, rather than a per-series log (and in the git repo's sample script
;).  It's:

/glibc/configure CC="gcc -m32" CXX="g++ -m32" --prefix=/usr \
   --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu

However, this doesn't smell like a 64-vs-32 bug, but a x86-64 vs
anything-else bug.

(It's also in build-many-glibcs.py)


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  2:55         ` DJ Delorie via Libc-alpha
@ 2021-09-14  3:24           ` Noah Goldstein via Libc-alpha
  0 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  3:24 UTC (permalink / raw)
  To: DJ Delorie; +Cc: GNU C Library

On Mon, Sep 13, 2021 at 9:55 PM DJ Delorie <dj@redhat.com> wrote:

> "Carlos O'Donell" <carlos@redhat.com> writes:
> >> Also, does anyone know what make/configure commands I need to reproduce
> >> this on a x86_64-Linux machine? The build log doesn't appear to have
> the command.
> >
> > DJ, Should the trybot log the configure step?
>
> Perhaps.  It's in the stdout that gets added to the trybot's general log
> file, rather than a per-series log (and in the git repo's sample script
> ;).  It's:
>
> /glibc/configure CC="gcc -m32" CXX="g++ -m32" --prefix=/usr \
>    --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu
>

Thanks, I was able to reproduce the bug with that.

>
> However, this doesn't smell like a 64-vs-32 bug, but a x86-64 vs
> anything-else bug.
>

That makes sense.


>
> (It's also in build-many-glibcs.py)
>

Thanks!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  2:35       ` Carlos O'Donell via Libc-alpha
  2021-09-14  2:55         ` DJ Delorie via Libc-alpha
@ 2021-09-14  3:40         ` Noah Goldstein via Libc-alpha
  2021-09-14  4:21           ` DJ Delorie via Libc-alpha
  1 sibling, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  3:40 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: GNU C Library

On Mon, Sep 13, 2021 at 9:35 PM Carlos O'Donell <carlos@redhat.com> wrote:

> On 9/13/21 10:05 PM, Noah Goldstein wrote:
> > On Mon, Sep 13, 2021 at 8:18 PM Carlos O'Donell <carlos@redhat.com>
> wrote:
> >
> >> On 9/13/21 7:05 PM, Noah Goldstein via Libc-alpha wrote:
> >>> No bug. This commit adds new optimized bcmp implementation for evex.
> >>>
> >>> The primary optimizations are 1) skipping the logic to find the
> >>> difference of the first mismatched byte and 2) not updating src/dst
> >>> addresses as the non-equals logic does not need to be reused by
> >>> different areas.
> >>>
> >>> The entry alignment has been fixed at 64. In throughput-sensitive
> >>> functions, which bcmp can potentially be, frontend loop performance is
> >>> important to optimize for. This is impossible/difficult to do/maintain
> >>> with only 16 byte fixed alignment.
> >>>
> >>> test-memcmp, test-bcmp, and test-wmemcmp are all passing.
> >>
> >> This series fails in the containerized 32-bit x86 CI/CD regression
> tester.
> >>
> >>
> https://patchwork.sourceware.org/project/glibc/patch/20210913230506.546749-5-goldstein.w.n@gmail.com/
> >
> >
> > Shoot.
>
> No worries! That's what the CI/CD system is there for :-)
>
> > AFAICT the first error is:
> > *** No rule to make target '/build/string/stamp.os', needed by
> > '/build/libc_pic.a'.
>
> I think a normal 32-bit x86 build should show this issue.
>
> You need a gcc that accepts -m32.
>

Was able to get it with DJ's command.

>
> I minimally set:
> export CC="gcc -m32 -Wl,--build-id=none"
> export CXX="g++ -m32 -Wl,--build-id=none"
> export CFLAGS="-g -O2 -march=i686 -Wl,--build-id=none"
> export CXXFLAGS="-g -O2 -march=i686 -Wl,--build-id=none"
> export CPPFLAGS="-g -O2 -march=i686 -Wl,--build-id=none"
>
> Then build with --host.
>
> e.g.
>
> /home/carlos/src/glibc-work/configure --host i686-pc-linux-gnu \
>   CC="gcc -m32 -Wl,--build-id=none" \
>   CFLAGS="-g -O2 -march=i686 -Wl,--build-id=none" \
>   CPPFLAGS="-g -O2 -march=i686 -Wl,--build-id=none" \
>   CXX="g++ -m32 -Wl,--build-id=none" \
>   CXXFLAGS="-g -O2 -march=i686 -Wl,--build-id=none" \
>   --prefix=/usr \
>   --with-headers=/home/carlos/build/glibc-headers-work-i686/include \
>   --with-selinux --disable-nss-crypt --enable-bind-now --enable-static-pie \
>   --enable-systemtap --enable-hardcoded-path-in-tests --enable-tunables=yes \
>   --enable-add-ons


Thanks for the help!


>


> > Also, does anyone know what make/configure commands I need to reproduce
> > this on a x86_64-Linux machine? The build log doesn't appear to have the
> command.
>
> DJ, Should the trybot log the configure step?
>
>
So I think I was able to fix the build by making a new file in
glibc/string/bcmp.c and just having bcmp call memcmp.

Is there another/better way to fix the build?  I don't think it's really
fair that every arch other than x86_64 should have to pay an extra function
call cost to use bcmp.


> --
> Cheers,
> Carlos.
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  3:40         ` Noah Goldstein via Libc-alpha
@ 2021-09-14  4:21           ` DJ Delorie via Libc-alpha
  2021-09-14  5:29             ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: DJ Delorie via Libc-alpha @ 2021-09-14  4:21 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

Noah Goldstein <goldstein.w.n@gmail.com> writes:
> So I think I was able to fix the build by making a new file in glibc/string/bcmp.c
> and just having bcmp call memcmp
>
> Is there another/better way to fix the build?  I don't think it's really fair that every 
> arch other than x86_64 should have to pay an extra function call cost to use bcmp. 

There are at least three...

First, note that bcmp is a weak alias to memcmp already - see
string/memcmp.c - which avoids the extra call you mention.

So, you could either move that weak alias into bcmp.c, or arrange for
bcmp.c to not be needed by the Makefile for non-x86_64 platforms.
Lastly, an empty bcmp.c wouldn't override the alias in memcmp.c.  I
think the first would be easiest, although it may be tricky to compile a
source file that seems to do "nothing".  Also, I suspect liberal use of
comments would be beneficial for the unsuspecting reader ;-)
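
For readers unfamiliar with the mechanism, a minimal sketch of what such a weak alias amounts to, assuming GCC or Clang targeting ELF. The names `demo_memcmp` and `demo_bcmp` are hypothetical stand-ins so the example does not clash with libc; glibc's own `weak_alias` macro is only approximated here.

```c
#include <stddef.h>

/* Hypothetical stand-in for memcmp, defined in this translation unit.  */
int
demo_memcmp (const void *s1, const void *s2, size_t n)
{
  const unsigned char *p1 = s1;
  const unsigned char *p2 = s2;

  for (size_t i = 0; i < n; i++)
    if (p1[i] != p2[i])
      return p1[i] < p2[i] ? -1 : 1;
  return 0;
}

/* demo_bcmp is a second, weak symbol name for the same code, so a call
   to demo_bcmp lands directly in demo_memcmp with no wrapper call.
   This is roughly what weak_alias (memcmp, bcmp) arranges.  */
extern __typeof (demo_memcmp) demo_bcmp
  __attribute__ ((weak, alias ("demo_memcmp")));
```

Note that the compiler's alias attribute requires the alias target to be defined in the same translation unit as the alias declaration.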

Alternately, you could change your patch to provide alternate versions
of memcmp() instead of bcmp(), as glibc's bcmp *is* memcmp.  This is
what other arches (and x86_64) do:

$ find . -name 'memcmp*' -print


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  4:21           ` DJ Delorie via Libc-alpha
@ 2021-09-14  5:29             ` Noah Goldstein via Libc-alpha
  2021-09-14  5:42               ` DJ Delorie via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  5:29 UTC (permalink / raw)
  To: DJ Delorie; +Cc: GNU C Library

On Mon, Sep 13, 2021 at 11:21 PM DJ Delorie <dj@redhat.com> wrote:

> Noah Goldstein <goldstein.w.n@gmail.com> writes:
> > So I think I was able to fix the build by making a new file in
> > glibc/string/bcmp.c and just having bcmp call memcmp
> >
> > Is there another/better way to fix the build?  I don't think it's really
> > fair that every arch other than x86_64 should have to pay an extra
> > function call cost to use bcmp.
>
> There are at least three...
>
> First, note that bcmp is a weak alias to memcmp already - see
> string/memcmp.c - which avoids the extra call you mention.
>
> So, you could either move that weak alias into bcmp.c, or arrange for
> bcmp.c to not be needed by the Makefile for non-x86_64 platforms.
> Lastly, an empty bcmp.c wouldn't override the alias in memcmp.c.  I
> think the first would be easiest, although it may be tricky to compile a
> source file that seems to do "nothing".  Also, I suspect liberal use of
> comments would be beneficial for the unsuspecting reader ;-)
>
>
I see.

I was able to get it working with just an empty bcmp.c file, but was not
able to move the weak_alias from memcmp.c to bcmp.c.

Adding:
```
#ifdef weak_alias
# undef bcmp
weak_alias (memcmp, bcmp)
#endif
```

to bcmp.c gets me the following compiler error:

```
bcmp.c:24:21: error: ‘bcmp’ aliased to undefined symbol ‘memcmp’
```

irrespective of the ifdef/undef, and regardless of whether I include
string.h or manually add a prototype for memcmp.
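
For readers following along: the error is consistent with a documented GCC
restriction that an alias target must be *defined* in the same translation
unit as the alias, so a declaration or prototype of memcmp is not enough.
Below is a minimal sketch, assuming GCC or Clang on an ELF target. The names
my_memcmp, my_bcmp and check_alias are hypothetical, and the attribute line
only approximates what glibc's weak_alias macro expands to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memcmp.  It is defined in this file, which
   is what makes the alias below legal.  */
int
my_memcmp (const void *s1, const void *s2, size_t n)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;

  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}

/* Weak alias: my_bcmp becomes a second, overridable name for the same
   code.  If my_memcmp were only declared here, GCC would reject this
   with "aliased to undefined symbol".  */
extern __typeof (my_memcmp) my_bcmp
  __attribute__ ((weak, alias ("my_memcmp")));

/* Small self-check showing the alias behaves like its target.  */
void
check_alias (void)
{
  assert (my_bcmp ("abc", "abc", 3) == 0);
  assert (my_bcmp ("abc", "abd", 3) != 0);
}
```

This would explain why the alias works in memcmp.c, where memcmp is
defined, but not in a bcmp.c that contains no definition of memcmp.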

Sorry for the hassle. Build infrastructure, especially in a project as
complex as this, is a bit out of my domain.


> Alternately, you could change your patch to provide alternate versions
> of memcmp() instead of bcmp(), as glibc's bcmp *is* memcmp.  This is
> what other arches (and x86_64) do:
>

I'm not 100% sure what you mean. memcmp can correctly implement bcmp,
but not vice versa.


>
> $ find . -name 'memcmp*' -print
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  5:29             ` Noah Goldstein via Libc-alpha
@ 2021-09-14  5:42               ` DJ Delorie via Libc-alpha
  2021-09-14  5:55                 ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: DJ Delorie via Libc-alpha @ 2021-09-14  5:42 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

Noah Goldstein <goldstein.w.n@gmail.com> writes:
> I'm not 100% sure what you mean. memcmp can correctly implement bcmp,
> but not vice versa.

glibc does not have a separate implementation of bcmp().  Any calls to
bcmp() end up calling memcmp() (through that weak alias).  So your patch
is not *optimizing* bcmp, it is *adding* bcmp.  The new version you are
adding is no longer using the optimized versions of memcmp, so you'd
have to either (1) be very careful to not introduce a performance
regression, or (2) optimize the existing memcmp()s further instead.
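
To make the asymmetry under discussion concrete: bcmp only has to report
"equal" (zero) or "not equal" (nonzero), whereas memcmp must also report the
ordering through the sign of its result. The following is an illustrative
sketch only, not code from the patch; simple_bcmp and check_simple_bcmp are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference bcmp: the only guarantee is "zero iff equal".  */
int
simple_bcmp (const void *s1, const void *s2, size_t n)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;
  unsigned char diff = 0;

  /* Accumulate differences.  There is no need to locate the first
     mismatching byte or to work out which side is larger.  */
  for (size_t i = 0; i < n; i++)
    diff |= a[i] ^ b[i];
  return diff;
}

void
check_simple_bcmp (void)
{
  /* The equality answer always agrees with memcmp ...  */
  assert ((simple_bcmp ("abcd", "abcd", 4) == 0)
          == (memcmp ("abcd", "abcd", 4) == 0));
  assert ((simple_bcmp ("abcd", "abce", 4) == 0)
          == (memcmp ("abcd", "abce", 4) == 0));

  /* ... but the sign carries no ordering information: both calls are
     positive here, while memcmp must return results of opposite sign.  */
  assert (simple_bcmp ("a", "b", 1) > 0);
  assert (simple_bcmp ("b", "a", 1) > 0);
}
```

Any correct memcmp therefore satisfies the bcmp contract, while a function
like the one above could not be used as memcmp.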


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  5:42               ` DJ Delorie via Libc-alpha
@ 2021-09-14  5:55                 ` Noah Goldstein via Libc-alpha
  0 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  5:55 UTC (permalink / raw)
  To: DJ Delorie; +Cc: GNU C Library

On Tue, Sep 14, 2021 at 12:42 AM DJ Delorie <dj@redhat.com> wrote:

> Noah Goldstein <goldstein.w.n@gmail.com> writes:
> > I'm not 100% sure what you mean. memcmp can correctly implement bcmp,
> > but not vice versa.
>
> glibc does not have a separate implementation of bcmp().  Any calls to
> bcmp() end up calling memcmp() (through that weak alias).  So your patch
> is not *optimizing* bcmp, it is *adding* bcmp.  The new version you are
> adding is no longer using the optimized versions of memcmp, so you'd
> have to either (1) be very careful to not introduce a performance
> regression, or (2) optimize the existing memcmp()s further instead.
>

Ah, got it.

In the first patch of the set:
[PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex

I have some performance numbers. Seems to be an improvement for avx2/evex.
The sse2/sse4 stuff is a bit more iffy. I don't really have the hardware to
properly test those versions.

Thank you for all the help!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
                   ` (4 preceding siblings ...)
  2021-09-13 23:22 ` [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
@ 2021-09-14  6:30 ` Noah Goldstein via Libc-alpha
  2021-09-14  6:30   ` [PATCH v2 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
                     ` (4 more replies)
  2021-09-15  0:00 ` [PATCH " Joseph Myers
  6 siblings, 5 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  6:30 UTC (permalink / raw)
  To: libc-alpha

No bug. This commit adds support for an optimized bcmp implementation.
Support is for sse2, sse4_1, avx2, and evex.

All string tests passing and build succeeding.
---
 benchtests/Makefile                        |  2 +-
 benchtests/bench-bcmp.c                    | 20 ++++++++
 benchtests/bench-memcmp.c                  |  4 +-
 string/Makefile                            |  4 +-
 string/bcmp.c                              | 25 ++++++++++
 string/test-bcmp.c                         | 21 +++++++++
 string/test-memcmp.c                       | 27 +++++++----
 sysdeps/x86_64/memcmp.S                    |  2 -
 sysdeps/x86_64/multiarch/Makefile          |  3 ++
 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S   | 12 +++++
 sysdeps/x86_64/multiarch/bcmp-avx2.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-evex.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-sse2.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp-sse4.S       | 23 ++++++++++
 sysdeps/x86_64/multiarch/bcmp.c            | 35 ++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-bcmp.h      | 53 ++++++++++++++++++++++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 23 ++++++++++
 sysdeps/x86_64/multiarch/memcmp-sse2.S     |  4 +-
 sysdeps/x86_64/multiarch/memcmp.c          |  2 -
 19 files changed, 311 insertions(+), 18 deletions(-)
 create mode 100644 benchtests/bench-bcmp.c
 create mode 100644 string/bcmp.c
 create mode 100644 string/test-bcmp.c
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse2.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp-sse4.S
 create mode 100644 sysdeps/x86_64/multiarch/bcmp.c
 create mode 100644 sysdeps/x86_64/multiarch/ifunc-bcmp.h
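
The priority order implemented by IFUNC_SELECTOR in ifunc-bcmp.h (further
down in this patch) can be summarised by the sketch below. It is a
simplified model for discussion, not the glibc code: struct features,
select_bcmp and check_select_bcmp are hypothetical, and plain booleans stand
in for the CPU_FEATURE_USABLE_P / CPU_FEATURES_ARCH_P checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for glibc's struct cpu_features.  */
struct features
{
  bool avx2, bmi2, movbe, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm, prefer_no_vzeroupper, sse4_1;
};

/* Returns the name of the implementation the selector would pick.  */
const char *
select_bcmp (const struct features *f)
{
  if (f->avx2 && f->bmi2 && f->movbe && f->avx_fast_unaligned_load)
    {
      if (f->avx512vl && f->avx512bw)
        return "__bcmp_evex";
      if (f->rtm)
        return "__bcmp_avx2_rtm";
      if (!f->prefer_no_vzeroupper)
        return "__bcmp_avx2";
      /* Otherwise fall through to the SSE versions.  */
    }
  if (f->sse4_1)
    return "__bcmp_sse4_1";
  return "__bcmp_sse2";
}

void
check_select_bcmp (void)
{
  struct features f = { .avx2 = true, .bmi2 = true, .movbe = true,
                        .avx_fast_unaligned_load = true, .sse4_1 = true };

  assert (strcmp (select_bcmp (&f), "__bcmp_avx2") == 0);
  f.avx2 = false;
  assert (strcmp (select_bcmp (&f), "__bcmp_sse4_1") == 0);
  f.sse4_1 = false;
  assert (strcmp (select_bcmp (&f), "__bcmp_sse2") == 0);
}
```

Note that a CPU with AVX2 that prefers no VZEROUPPER and has neither
AVX512VL/BW nor RTM ends up on the SSE4.1 or SSE2 version.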

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 1530939a8c..5fc495eb57 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -47,7 +47,7 @@ bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}})
 endif
 
 # String function benchmarks.
-string-benchset := memccpy memchr memcmp memcpy memmem memmove \
+string-benchset := bcmp memccpy memchr memcmp memcpy memmem memmove \
 		   mempcpy memset rawmemchr stpcpy stpncpy strcasecmp strcasestr \
 		   strcat strchr strchrnul strcmp strcpy strcspn strlen \
 		   strncasecmp strncat strncmp strncpy strnlen strpbrk strrchr \
diff --git a/benchtests/bench-bcmp.c b/benchtests/bench-bcmp.c
new file mode 100644
index 0000000000..1023639787
--- /dev/null
+++ b/benchtests/bench-bcmp.c
@@ -0,0 +1,20 @@
+/* Measure bcmp functions.
+   Copyright (C) 2015-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define TEST_BCMP 1
+#include "bench-memcmp.c"
diff --git a/benchtests/bench-memcmp.c b/benchtests/bench-memcmp.c
index 744c7ec5ba..4d5f8fb766 100644
--- a/benchtests/bench-memcmp.c
+++ b/benchtests/bench-memcmp.c
@@ -17,7 +17,9 @@
    <https://www.gnu.org/licenses/>.  */
 
 #define TEST_MAIN
-#ifdef WIDE
+#ifdef TEST_BCMP
+# define TEST_NAME "bcmp"
+#elif defined WIDE
 # define TEST_NAME "wmemcmp"
 #else
 # define TEST_NAME "memcmp"
diff --git a/string/Makefile b/string/Makefile
index f0fce2a0b8..f1f67ee157 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -35,7 +35,7 @@ routines	:= strcat strchr strcmp strcoll strcpy strcspn		\
 		   strncat strncmp strncpy				\
 		   strrchr strpbrk strsignal strspn strstr strtok	\
 		   strtok_r strxfrm memchr memcmp memmove memset	\
-		   mempcpy bcopy bzero ffs ffsll stpcpy stpncpy		\
+		   mempcpy bcmp bcopy bzero ffs ffsll stpcpy stpncpy		\
 		   strcasecmp strncase strcasecmp_l strncase_l		\
 		   memccpy memcpy wordcopy strsep strcasestr		\
 		   swab strfry memfrob memmem rawmemchr strchrnul	\
@@ -52,7 +52,7 @@ strop-tests	:= memchr memcmp memcpy memmove mempcpy memset memccpy	\
 		   stpcpy stpncpy strcat strchr strcmp strcpy strcspn	\
 		   strlen strncmp strncpy strpbrk strrchr strspn memmem	\
 		   strstr strcasestr strnlen strcasecmp strncasecmp	\
-		   strncat rawmemchr strchrnul bcopy bzero memrchr	\
+		   strncat rawmemchr strchrnul bcmp bcopy bzero memrchr	\
 		   explicit_bzero
 tests		:= tester inl-tester noinl-tester testcopy test-ffs	\
 		   tst-strlen stratcliff tst-svc tst-inlcall		\
diff --git a/string/bcmp.c b/string/bcmp.c
new file mode 100644
index 0000000000..2f5c446124
--- /dev/null
+++ b/string/bcmp.c
@@ -0,0 +1,25 @@
+/* Copyright (C) 1991-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+
+/* This file is intentionally left empty. It exists so that both
+   architectures which implement bcmp separately from memcmp and
+   architectures which implement bcmp by having it alias memcmp will
+   build.
+
+   The alias for bcmp to memcmp for the C implementation is in
+   memcmp.c.  */
diff --git a/string/test-bcmp.c b/string/test-bcmp.c
new file mode 100644
index 0000000000..6d19a4a87c
--- /dev/null
+++ b/string/test-bcmp.c
@@ -0,0 +1,21 @@
+/* Test and measure bcmp functions.
+   Copyright (C) 2012-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define BAD_RESULT(result, expec) ((!(result)) != (!(expec)))
+#define TEST_BCMP 1
+#include "test-memcmp.c"
diff --git a/string/test-memcmp.c b/string/test-memcmp.c
index 6ddbc05d2f..c630e6799d 100644
--- a/string/test-memcmp.c
+++ b/string/test-memcmp.c
@@ -17,11 +17,14 @@
    <https://www.gnu.org/licenses/>.  */
 
 #define TEST_MAIN
-#ifdef WIDE
+#ifdef TEST_BCMP
+# define TEST_NAME "bcmp"
+#elif defined WIDE
 # define TEST_NAME "wmemcmp"
 #else
 # define TEST_NAME "memcmp"
 #endif
+
 #include "test-string.h"
 #ifdef WIDE
 # include <inttypes.h>
@@ -35,6 +38,7 @@
 # define CHARBYTES 4
 # define CHAR__MIN WCHAR_MIN
 # define CHAR__MAX WCHAR_MAX
+
 int
 simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
 {
@@ -48,8 +52,11 @@ simple_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
 }
 #else
 # include <limits.h>
-
-# define MEMCMP memcmp
+# ifdef TEST_BCMP
+#  define MEMCMP bcmp
+# else
+#  define MEMCMP memcmp
+# endif
 # define MEMCPY memcpy
 # define SIMPLE_MEMCMP simple_memcmp
 # define CHAR char
@@ -69,6 +76,12 @@ simple_memcmp (const char *s1, const char *s2, size_t n)
 }
 #endif
 
+# ifndef BAD_RESULT
+#  define BAD_RESULT(result, expec)                                     \
+    (((result) == 0 && (expec)) || ((result) < 0 && (expec) >= 0) ||    \
+     ((result) > 0 && (expec) <= 0))
+#  endif
+
 typedef int (*proto_t) (const CHAR *, const CHAR *, size_t);
 
 IMPL (SIMPLE_MEMCMP, 0)
@@ -79,9 +92,7 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
 	      int exp_result)
 {
   int result = CALL (impl, s1, s2, len);
-  if ((exp_result == 0 && result != 0)
-      || (exp_result < 0 && result >= 0)
-      || (exp_result > 0 && result <= 0))
+  if (BAD_RESULT(result, exp_result))
     {
       error (0, 0, "Wrong result in function %s %d %d", impl->name,
 	     result, exp_result);
@@ -186,9 +197,7 @@ do_random_tests (void)
 	{
 	  r = CALL (impl, (CHAR *) p1 + align1, (const CHAR *) p2 + align2,
 		    len);
-	  if ((r == 0 && result)
-	      || (r < 0 && result >= 0)
-	      || (r > 0 && result <= 0))
+	  if (BAD_RESULT(r, result))
 	    {
 	      error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
 		     n, impl->name, align1 * CHARBYTES & 63,  align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
index 870e15c5a0..dfd0269db2 100644
--- a/sysdeps/x86_64/memcmp.S
+++ b/sysdeps/x86_64/memcmp.S
@@ -356,6 +356,4 @@ L(ATR32res):
 	.p2align 4,, 4
 END(memcmp)
 
-#undef bcmp
-weak_alias (memcmp, bcmp)
 libc_hidden_builtin_def (memcmp)
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 26be40959c..9dd0d8c3ff 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -1,6 +1,7 @@
 ifeq ($(subdir),string)
 
 sysdep_routines += strncat-c stpncpy-c strncpy-c \
+		   bcmp-sse2 bcmp-sse4 bcmp-avx2 \
 		   strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3  \
 		   strcmp-sse4_2 strcmp-avx2 \
 		   strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
@@ -40,6 +41,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
 		   memset-sse2-unaligned-erms \
 		   memset-avx2-unaligned-erms \
 		   memset-avx512-unaligned-erms \
+		   bcmp-avx2-rtm \
 		   memchr-avx2-rtm \
 		   memcmp-avx2-movbe-rtm \
 		   memmove-avx-unaligned-erms-rtm \
@@ -59,6 +61,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
 		   strncpy-avx2-rtm \
 		   strnlen-avx2-rtm \
 		   strrchr-avx2-rtm \
+		   bcmp-evex \
 		   memchr-evex \
 		   memcmp-evex-movbe \
 		   memmove-evex-unaligned-erms \
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
new file mode 100644
index 0000000000..d742257e4e
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
@@ -0,0 +1,12 @@
+#ifndef MEMCMP
+# define MEMCMP __bcmp_avx2_rtm
+#endif
+
+#define ZERO_UPPER_VEC_REGISTERS_RETURN \
+  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
+
+#define VZEROUPPER_RETURN jmp	 L(return_vzeroupper)
+
+#define SECTION(p) p##.avx.rtm
+
+#include "bcmp-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
new file mode 100644
index 0000000000..93a9a20b17
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with AVX2.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MEMCMP
+# define MEMCMP	__bcmp_avx2
+#endif
+
+#include "bcmp-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
new file mode 100644
index 0000000000..ade52e8c68
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with EVEX.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MEMCMP
+# define MEMCMP	__bcmp_evex
+#endif
+
+#include "memcmp-evex-movbe.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-sse2.S b/sysdeps/x86_64/multiarch/bcmp-sse2.S
new file mode 100644
index 0000000000..b18d570386
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-sse2.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with SSE2
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# ifndef memcmp
+#  define memcmp	__bcmp_sse2
+# endif
+# define USE_AS_BCMP	1
+#include "memcmp-sse2.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp-sse4.S b/sysdeps/x86_64/multiarch/bcmp-sse4.S
new file mode 100644
index 0000000000..ed9804053f
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp-sse4.S
@@ -0,0 +1,23 @@
+/* bcmp optimized with SSE4.1
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# ifndef MEMCMP
+#  define MEMCMP	__bcmp_sse4_1
+# endif
+# define USE_AS_BCMP	1
+#include "memcmp-sse4.S"
diff --git a/sysdeps/x86_64/multiarch/bcmp.c b/sysdeps/x86_64/multiarch/bcmp.c
new file mode 100644
index 0000000000..6e26b73ecc
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bcmp.c
@@ -0,0 +1,35 @@
+/* Multiple versions of bcmp.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define bcmp __redirect_bcmp
+# include <string.h>
+# undef bcmp
+
+# define SYMBOL_NAME bcmp
+# include "ifunc-bcmp.h"
+
+libc_ifunc_redirected (__redirect_bcmp, bcmp, IFUNC_SELECTOR ());
+
+# ifdef SHARED
+__hidden_ver1 (bcmp, __GI_bcmp, __redirect_bcmp)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (bcmp);
+# endif
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
new file mode 100644
index 0000000000..b0dacd8526
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
@@ -0,0 +1,53 @@
+/* Common definition for bcmp ifunc selections.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+# include <init-arch.h>
+
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+  const struct cpu_features* cpu_features = __get_cpu_features ();
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
+      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
+      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
+    {
+      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
+	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
+	return OPTIMIZE (evex);
+
+      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+	return OPTIMIZE (avx2_rtm);
+
+      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
+	return OPTIMIZE (avx2);
+    }
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
+    return OPTIMIZE (sse4_1);
+
+  return OPTIMIZE (sse2);
+}
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 39ab10613b..dd0c393c7d 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -38,6 +38,29 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   size_t i = 0;
 
+  /* Support sysdeps/x86_64/multiarch/bcmp.c.  */
+  IFUNC_IMPL (i, name, bcmp,
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX2)
+                   && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (BMI2)),
+			      __bcmp_avx2)
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX2)
+			       && CPU_FEATURE_USABLE (BMI2)
+                   && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (RTM)),
+			      __bcmp_avx2_rtm)
+	      IFUNC_IMPL_ADD (array, i, bcmp,
+			      (CPU_FEATURE_USABLE (AVX512VL)
+			       && CPU_FEATURE_USABLE (AVX512BW)
+                   && CPU_FEATURE_USABLE (MOVBE)
+			       && CPU_FEATURE_USABLE (BMI2)),
+			      __bcmp_evex)
+	      IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
+			      __bcmp_sse4_1)
+	      IFUNC_IMPL_ADD (array, i, bcmp, 1, __bcmp_sse2))
+
   /* Support sysdeps/x86_64/multiarch/memchr.c.  */
   IFUNC_IMPL (i, name, memchr,
 	      IFUNC_IMPL_ADD (array, i, memchr,
diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S b/sysdeps/x86_64/multiarch/memcmp-sse2.S
index b135fa2d40..2a4867ad18 100644
--- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
+++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
@@ -17,7 +17,9 @@
    <https://www.gnu.org/licenses/>.  */
 
 #if IS_IN (libc)
-# define memcmp __memcmp_sse2
+# ifndef memcmp
+#  define memcmp __memcmp_sse2
+# endif
 
 # ifdef SHARED
 #  undef libc_hidden_builtin_def
diff --git a/sysdeps/x86_64/multiarch/memcmp.c b/sysdeps/x86_64/multiarch/memcmp.c
index fe725f3563..1760e045df 100644
--- a/sysdeps/x86_64/multiarch/memcmp.c
+++ b/sysdeps/x86_64/multiarch/memcmp.c
@@ -27,8 +27,6 @@
 # include "ifunc-memcmp.h"
 
 libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
-# undef bcmp
-weak_alias (memcmp, bcmp)
 
 # ifdef SHARED
 __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S
  2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
@ 2021-09-14  6:30   ` Noah Goldstein via Libc-alpha
  2021-09-14  6:30   ` [PATCH v2 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S Noah Goldstein via Libc-alpha
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  6:30 UTC (permalink / raw)
  To: libc-alpha

No bug. This commit does not modify the memcmp implementation itself.
It just adds bcmp ifdefs to skip obvious cases where computing the
properly signed result required by memcmp is not needed.

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
 sysdeps/x86_64/memcmp.S | 55 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 4 deletions(-)
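
As an illustration of what the L(sub_return8) path in the diff below
appears to do: for the final 8-byte chunk it subtracts the two 64-bit
values and ORs the upper and lower halves of the difference into the 32-bit
return register. A rough C equivalent, offered as a sketch under that
reading, is shown here; bcmp8 and check_bcmp8 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare exactly 8 bytes, return zero iff equal.  */
int
bcmp8 (const void *s1, const void *s2)
{
  uint64_t a, b;

  /* memcpy is the portable way to do an unaligned 8-byte load.  */
  memcpy (&a, s1, sizeof a);
  memcpy (&b, s2, sizeof b);

  uint64_t diff = a - b;

  /* Truncating diff to 32 bits would be wrong: a difference confined
     to the upper half would read as "equal".  ORing the two halves
     keeps the result nonzero whenever any bit of diff is set.  */
  return (int) ((uint32_t) diff | (uint32_t) (diff >> 32));
}

void
check_bcmp8 (void)
{
  const unsigned char x[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  const unsigned char first[8] = { 9, 2, 3, 4, 5, 6, 7, 8 };
  const unsigned char last[8] = { 1, 2, 3, 4, 5, 6, 7, 9 };

  assert (bcmp8 (x, x) == 0);
  assert (bcmp8 (x, first) != 0);  /* Mismatch in the first byte.  */
  assert (bcmp8 (x, last) != 0);   /* Mismatch in the last byte.  */
}
```

The sign of the result is meaningless, which is fine for bcmp but is
exactly the work the memcmp path (L(fin2_7)) still has to do.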

diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
index dfd0269db2..21607e7c91 100644
--- a/sysdeps/x86_64/memcmp.S
+++ b/sysdeps/x86_64/memcmp.S
@@ -49,34 +49,63 @@ L(s2b):
 	movzwl	(%rdi),	%eax
 	movzwl	(%rdi, %rsi), %edx
 	subq    $2, %r10
+#ifdef USE_AS_BCMP
+	je	L(finz1)
+#else
 	je	L(fin2_7)
+#endif
 	addq	$2, %rdi
 	cmpl	%edx, %eax
+#ifdef USE_AS_BCMP
+	jnz	L(neq_early)
+#else
 	jnz	L(fin2_7)
+#endif
 L(s4b):
 	testq	$4, %r10
 	jz	L(s8b)
 	movl	(%rdi),	%eax
 	movl	(%rdi, %rsi), %edx
 	subq    $4, %r10
+#ifdef USE_AS_BCMP
+	je	L(finz1)
+#else
 	je	L(fin2_7)
+#endif
 	addq	$4, %rdi
 	cmpl	%edx, %eax
+#ifdef USE_AS_BCMP
+	jnz	L(neq_early)
+#else
 	jnz	L(fin2_7)
+#endif
 L(s8b):
 	testq	$8, %r10
 	jz	L(s16b)
 	movq	(%rdi),	%rax
 	movq	(%rdi, %rsi), %rdx
 	subq    $8, %r10
+#ifdef USE_AS_BCMP
+	je	L(sub_return8)
+#else
 	je	L(fin2_7)
+#endif
 	addq	$8, %rdi
 	cmpq	%rdx, %rax
+#ifdef USE_AS_BCMP
+	jnz	L(neq_early)
+#else
 	jnz	L(fin2_7)
+#endif
 L(s16b):
 	movdqu    (%rdi), %xmm1
 	movdqu    (%rdi, %rsi), %xmm0
 	pcmpeqb   %xmm0, %xmm1
+#ifdef USE_AS_BCMP
+	pmovmskb  %xmm1, %eax
+	subl      $0xffff, %eax
+	ret
+#else
 	pmovmskb  %xmm1, %edx
 	xorl	  %eax, %eax
 	subl      $0xffff, %edx
@@ -86,7 +115,7 @@ L(s16b):
 	movzbl	 (%rcx), %eax
 	movzbl	 (%rsi, %rcx), %edx
 	jmp	 L(finz1)
-
+#endif
 	.p2align 4,, 4
 L(finr1b):
 	movzbl	(%rdi), %eax
@@ -95,7 +124,15 @@ L(finz1):
 	subl	%edx, %eax
 L(exit):
 	ret
-
+#ifdef USE_AS_BCMP
+	.p2align 4,, 4
+L(sub_return8):
+	subq	%rdx, %rax
+	movl	%eax, %edx
+	shrq	$32, %rax
+	orl	%edx, %eax
+	ret
+#else
 	.p2align 4,, 4
 L(fin2_7):
 	cmpq	%rdx, %rax
@@ -111,12 +148,17 @@ L(fin2_7):
 	movzbl  %dl, %edx
 	subl	%edx, %eax
 	ret
-
+#endif
 	.p2align 4,, 4
 L(finz):
 	xorl	%eax, %eax
 	ret
-
+#ifdef USE_AS_BCMP
+	.p2align 4,, 4
+L(neq_early):
+	movl	$1, %eax
+	ret
+#endif
 	/* For blocks bigger than 32 bytes
 	   1. Advance one of the addr pointer to be 16B aligned.
 	   2. Treat the case of both addr pointers aligned to 16B
@@ -246,11 +288,16 @@ L(mt16):
 
 	.p2align 4,, 4
 L(neq):
+#ifdef USE_AS_BCMP
+	movl	$1, %eax
+    ret
+#else
 	bsfl      %edx, %ecx
 	movzbl	 (%rdi, %rcx), %eax
 	addq	 %rdi, %rsi
 	movzbl	 (%rsi,%rcx), %edx
 	jmp	 L(finz1)
+#endif
 
 	.p2align 4,, 4
 L(ATR):
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S
  2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
  2021-09-14  6:30   ` [PATCH v2 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
@ 2021-09-14  6:30   ` Noah Goldstein via Libc-alpha
  2021-09-14  6:30   ` [PATCH v2 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S Noah Goldstein via Libc-alpha
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  6:30 UTC (permalink / raw)
  To: libc-alpha

No bug. This commit does not modify the memcmp implementation itself.
It just adds bcmp ifdefs to skip obvious cases where computing the
properly signed result required by memcmp is not needed.

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
 sysdeps/x86_64/multiarch/memcmp-sse4.S | 761 ++++++++++++++++++++++++-
 1 file changed, 746 insertions(+), 15 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcmp-sse4.S b/sysdeps/x86_64/multiarch/memcmp-sse4.S
index b82adcd5fa..b9528ed58e 100644
--- a/sysdeps/x86_64/multiarch/memcmp-sse4.S
+++ b/sysdeps/x86_64/multiarch/memcmp-sse4.S
@@ -72,7 +72,11 @@ L(79bytesormore):
 	movdqu	(%rdi), %xmm2
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 	mov	%rsi, %rcx
 	and	$-16, %rsi
 	add	$16, %rsi
@@ -91,34 +95,58 @@ L(less128bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqu	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqu	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 	cmp	$32, %rdx
 	jb	L(less32bytesin64)
 
 	movdqu	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqu	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -140,42 +168,74 @@ L(less256bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqu	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqu	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqu	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqu	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqu	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqu	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	add	$128, %rsi
 	add	$128, %rdi
@@ -189,12 +249,20 @@ L(less256bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -208,82 +276,146 @@ L(less512bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqu	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqu	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqu	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqu	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqu	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqu	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	movdqu	128(%rdi), %xmm2
 	pxor	128(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(144bytesin256)
+# endif
 
 	movdqu	144(%rdi), %xmm2
 	pxor	144(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(160bytesin256)
+# endif
 
 	movdqu	160(%rdi), %xmm2
 	pxor	160(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(176bytesin256)
+# endif
 
 	movdqu	176(%rdi), %xmm2
 	pxor	176(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(192bytesin256)
+# endif
 
 	movdqu	192(%rdi), %xmm2
 	pxor	192(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(208bytesin256)
+# endif
 
 	movdqu	208(%rdi), %xmm2
 	pxor	208(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(224bytesin256)
+# endif
 
 	movdqu	224(%rdi), %xmm2
 	pxor	224(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(240bytesin256)
+# endif
 
 	movdqu	240(%rdi), %xmm2
 	pxor	240(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(256bytesin256)
+# endif
 
 	add	$256, %rsi
 	add	$256, %rdi
@@ -300,12 +432,20 @@ L(less512bytes):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -346,7 +486,11 @@ L(64bytesormore_loop):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -380,7 +524,11 @@ L(L2_L3_unaligned_128bytes_loop):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -404,34 +552,58 @@ L(less128bytesin2aligned):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqa	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqa	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 	cmp	$32, %rdx
 	jb	L(less32bytesin64in2alinged)
 
 	movdqa	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqa	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -454,42 +626,74 @@ L(less256bytesin2alinged):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqa	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqa	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqa	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqa	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqa	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqa	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	add	$128, %rsi
 	add	$128, %rdi
@@ -503,12 +707,20 @@ L(less256bytesin2alinged):
 	movdqu	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqu	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -524,82 +736,146 @@ L(256bytesormorein2aligned):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 
 	movdqa	32(%rdi), %xmm2
 	pxor	32(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(48bytesin256)
+# endif
 
 	movdqa	48(%rdi), %xmm2
 	pxor	48(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesin256)
+# endif
 
 	movdqa	64(%rdi), %xmm2
 	pxor	64(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(80bytesin256)
+# endif
 
 	movdqa	80(%rdi), %xmm2
 	pxor	80(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(96bytesin256)
+# endif
 
 	movdqa	96(%rdi), %xmm2
 	pxor	96(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(112bytesin256)
+# endif
 
 	movdqa	112(%rdi), %xmm2
 	pxor	112(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(128bytesin256)
+# endif
 
 	movdqa	128(%rdi), %xmm2
 	pxor	128(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(144bytesin256)
+# endif
 
 	movdqa	144(%rdi), %xmm2
 	pxor	144(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(160bytesin256)
+# endif
 
 	movdqa	160(%rdi), %xmm2
 	pxor	160(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(176bytesin256)
+# endif
 
 	movdqa	176(%rdi), %xmm2
 	pxor	176(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(192bytesin256)
+# endif
 
 	movdqa	192(%rdi), %xmm2
 	pxor	192(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(208bytesin256)
+# endif
 
 	movdqa	208(%rdi), %xmm2
 	pxor	208(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(224bytesin256)
+# endif
 
 	movdqa	224(%rdi), %xmm2
 	pxor	224(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(240bytesin256)
+# endif
 
 	movdqa	240(%rdi), %xmm2
 	pxor	240(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(256bytesin256)
+# endif
 
 	add	$256, %rsi
 	add	$256, %rdi
@@ -616,12 +892,20 @@ L(256bytesormorein2aligned):
 	movdqa	(%rdi), %xmm2
 	pxor	(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(16bytesin256)
+# endif
 
 	movdqa	16(%rdi), %xmm2
 	pxor	16(%rsi), %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(32bytesin256)
+# endif
 	sub	$32, %rdx
 	add	$32, %rdi
 	add	$32, %rsi
@@ -663,7 +947,11 @@ L(64bytesormore_loopin2aligned):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -697,7 +985,11 @@ L(L2_L3_aligned_128bytes_loop):
 	por	%xmm5, %xmm1
 
 	ptest	%xmm1, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(64bytesormore_loop_end)
+# endif
 	add	$64, %rsi
 	add	$64, %rdi
 	sub	$64, %rdx
@@ -708,7 +1000,7 @@ L(L2_L3_aligned_128bytes_loop):
 	add	%rdx, %rdi
 	BRANCH_TO_JMPTBL_ENTRY(L(table_64bytes), %rdx, 4)
 
-
+# ifndef USE_AS_BCMP
 	.p2align 4
 L(64bytesormore_loop_end):
 	add	$16, %rdi
@@ -791,17 +1083,29 @@ L(32bytesin256):
 L(16bytesin256):
 	add	$16, %rdi
 	add	$16, %rsi
+# endif
 L(16bytes):
 	mov	-16(%rdi), %rax
 	mov	-16(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 L(8bytes):
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+# ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+# else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+# endif
 	ret
 
 	.p2align 4
@@ -809,16 +1113,26 @@ L(12bytes):
 	mov	-12(%rdi), %rax
 	mov	-12(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 L(4bytes):
 	mov	-4(%rsi), %ecx
-# ifndef USE_AS_WMEMCMP
+# ifdef USE_AS_BCMP
 	mov	-4(%rdi), %eax
-	cmp	%eax, %ecx
+	sub	%ecx, %eax
+	ret
 # else
+#  ifndef USE_AS_WMEMCMP
+	mov	-4(%rdi), %eax
+	cmp	%eax, %ecx
+#  else
 	cmp	-4(%rdi), %ecx
-# endif
+#  endif
 	jne	L(diffin4bytes)
+# endif
 L(0bytes):
 	xor	%eax, %eax
 	ret
@@ -832,31 +1146,51 @@ L(65bytes):
 	mov	$-65, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(49bytes):
 	movdqu	-49(%rdi), %xmm1
 	movdqu	-49(%rsi), %xmm2
 	mov	$-49, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(33bytes):
 	movdqu	-33(%rdi), %xmm1
 	movdqu	-33(%rsi), %xmm2
 	mov	$-33, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(17bytes):
 	mov	-17(%rdi), %rax
 	mov	-17(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 L(9bytes):
 	mov	-9(%rdi), %rax
 	mov	-9(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %edx
 	sub	%edx, %eax
@@ -867,12 +1201,23 @@ L(13bytes):
 	mov	-13(%rdi), %rax
 	mov	-13(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+#  ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+#  else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -880,7 +1225,11 @@ L(5bytes):
 	mov	-5(%rdi), %eax
 	mov	-5(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin4bytes)
+#  endif
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %edx
 	sub	%edx, %eax
@@ -893,37 +1242,59 @@ L(66bytes):
 	mov	$-66, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(50bytes):
 	movdqu	-50(%rdi), %xmm1
 	movdqu	-50(%rsi), %xmm2
 	mov	$-50, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(34bytes):
 	movdqu	-34(%rdi), %xmm1
 	movdqu	-34(%rsi), %xmm2
 	mov	$-34, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(18bytes):
 	mov	-18(%rdi), %rax
 	mov	-18(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 L(10bytes):
 	mov	-10(%rdi), %rax
 	mov	-10(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzwl	-2(%rdi), %eax
 	movzwl	-2(%rsi), %ecx
+#  ifndef USE_AS_BCMP
 	cmp	%cl, %al
 	jne	L(end)
 	and	$0xffff, %eax
 	and	$0xffff, %ecx
+#  endif
 	sub	%ecx, %eax
 	ret
 
@@ -932,12 +1303,23 @@ L(14bytes):
 	mov	-14(%rdi), %rax
 	mov	-14(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+#  ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+#  else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -945,14 +1327,20 @@ L(6bytes):
 	mov	-6(%rdi), %eax
 	mov	-6(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin4bytes)
+#  endif
 L(2bytes):
 	movzwl	-2(%rsi), %ecx
 	movzwl	-2(%rdi), %eax
+#  ifndef USE_AS_BCMP
 	cmp	%cl, %al
 	jne	L(end)
 	and	$0xffff, %eax
 	and	$0xffff, %ecx
+#  endif
 	sub	%ecx, %eax
 	ret
 
@@ -963,36 +1351,60 @@ L(67bytes):
 	mov	$-67, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(51bytes):
 	movdqu	-51(%rdi), %xmm2
 	movdqu	-51(%rsi), %xmm1
 	mov	$-51, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(35bytes):
 	movdqu	-35(%rsi), %xmm1
 	movdqu	-35(%rdi), %xmm2
 	mov	$-35, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(19bytes):
 	mov	-19(%rdi), %rax
 	mov	-19(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 L(11bytes):
 	mov	-11(%rdi), %rax
 	mov	-11(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-4(%rdi), %eax
 	mov	-4(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+#  else
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -1000,12 +1412,23 @@ L(15bytes):
 	mov	-15(%rdi), %rax
 	mov	-15(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
+#  ifdef USE_AS_BCMP
+	sub	%rcx, %rax
+	mov	%rax, %rcx
+	shr	$32, %rcx
+	or	%ecx, %eax
+#  else
 	cmp	%rax, %rcx
 	jne	L(diffin8bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -1013,12 +1436,20 @@ L(7bytes):
 	mov	-7(%rdi), %eax
 	mov	-7(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin4bytes)
+#  endif
 	mov	-4(%rdi), %eax
 	mov	-4(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+#  else
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 
 	.p2align 4
@@ -1026,7 +1457,11 @@ L(3bytes):
 	movzwl	-3(%rdi), %eax
 	movzwl	-3(%rsi), %ecx
 	cmp	%eax, %ecx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin2bytes)
+#  endif
 L(1bytes):
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %ecx
@@ -1041,38 +1476,58 @@ L(68bytes):
 	mov	$-68, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(52bytes):
 	movdqu	-52(%rdi), %xmm2
 	movdqu	-52(%rsi), %xmm1
 	mov	$-52, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(36bytes):
 	movdqu	-36(%rdi), %xmm2
 	movdqu	-36(%rsi), %xmm1
 	mov	$-36, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(20bytes):
 	movdqu	-20(%rdi), %xmm2
 	movdqu	-20(%rsi), %xmm1
 	mov	$-20, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 	mov	-4(%rsi), %ecx
-
-# ifndef USE_AS_WMEMCMP
+# ifdef USE_AS_BCMP
 	mov	-4(%rdi), %eax
-	cmp	%eax, %ecx
+	sub	%ecx, %eax
 # else
+#  ifndef USE_AS_WMEMCMP
+	mov	-4(%rdi), %eax
+	cmp	%eax, %ecx
+#  else
 	cmp	-4(%rdi), %ecx
-# endif
+#  endif
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+# endif
 	ret
 
 # ifndef USE_AS_WMEMCMP
@@ -1084,32 +1539,52 @@ L(69bytes):
 	mov	$-69, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(53bytes):
 	movdqu	-53(%rsi), %xmm1
 	movdqu	-53(%rdi), %xmm2
 	mov	$-53, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(37bytes):
 	movdqu	-37(%rsi), %xmm1
 	movdqu	-37(%rdi), %xmm2
 	mov	$-37, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(21bytes):
 	movdqu	-21(%rsi), %xmm1
 	movdqu	-21(%rdi), %xmm2
 	mov	$-21, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1120,32 +1595,52 @@ L(70bytes):
 	mov	$-70, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(54bytes):
 	movdqu	-54(%rsi), %xmm1
 	movdqu	-54(%rdi), %xmm2
 	mov	$-54, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(38bytes):
 	movdqu	-38(%rsi), %xmm1
 	movdqu	-38(%rdi), %xmm2
 	mov	$-38, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(22bytes):
 	movdqu	-22(%rsi), %xmm1
 	movdqu	-22(%rdi), %xmm2
 	mov	$-22, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1156,32 +1651,52 @@ L(71bytes):
 	mov	$-71, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(55bytes):
 	movdqu	-55(%rdi), %xmm2
 	movdqu	-55(%rsi), %xmm1
 	mov	$-55, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(39bytes):
 	movdqu	-39(%rdi), %xmm2
 	movdqu	-39(%rsi), %xmm1
 	mov	$-39, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(23bytes):
 	movdqu	-23(%rdi), %xmm2
 	movdqu	-23(%rsi), %xmm1
 	mov	$-23, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 # endif
@@ -1193,33 +1708,53 @@ L(72bytes):
 	mov	$-72, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(56bytes):
 	movdqu	-56(%rdi), %xmm2
 	movdqu	-56(%rsi), %xmm1
 	mov	$-56, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(40bytes):
 	movdqu	-40(%rdi), %xmm2
 	movdqu	-40(%rsi), %xmm1
 	mov	$-40, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(24bytes):
 	movdqu	-24(%rdi), %xmm2
 	movdqu	-24(%rsi), %xmm1
 	mov	$-24, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 
 	mov	-8(%rsi), %rcx
 	mov	-8(%rdi), %rax
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 	xor	%eax, %eax
 	ret
 
@@ -1232,32 +1767,52 @@ L(73bytes):
 	mov	$-73, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(57bytes):
 	movdqu	-57(%rdi), %xmm2
 	movdqu	-57(%rsi), %xmm1
 	mov	$-57, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(41bytes):
 	movdqu	-41(%rdi), %xmm2
 	movdqu	-41(%rsi), %xmm1
 	mov	$-41, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(25bytes):
 	movdqu	-25(%rdi), %xmm2
 	movdqu	-25(%rsi), %xmm1
 	mov	$-25, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-9(%rdi), %rax
 	mov	-9(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzbl	-1(%rdi), %eax
 	movzbl	-1(%rsi), %ecx
 	sub	%ecx, %eax
@@ -1270,35 +1825,60 @@ L(74bytes):
 	mov	$-74, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(58bytes):
 	movdqu	-58(%rdi), %xmm2
 	movdqu	-58(%rsi), %xmm1
 	mov	$-58, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(42bytes):
 	movdqu	-42(%rdi), %xmm2
 	movdqu	-42(%rsi), %xmm1
 	mov	$-42, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(26bytes):
 	movdqu	-26(%rdi), %xmm2
 	movdqu	-26(%rsi), %xmm1
 	mov	$-26, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-10(%rdi), %rax
 	mov	-10(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	movzwl	-2(%rdi), %eax
 	movzwl	-2(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+	ret
+#  else
 	jmp	L(diffin2bytes)
+#  endif
 
 	.p2align 4
 L(75bytes):
@@ -1307,37 +1887,61 @@ L(75bytes):
 	mov	$-75, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(59bytes):
 	movdqu	-59(%rdi), %xmm2
 	movdqu	-59(%rsi), %xmm1
 	mov	$-59, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(43bytes):
 	movdqu	-43(%rdi), %xmm2
 	movdqu	-43(%rsi), %xmm1
 	mov	$-43, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(27bytes):
 	movdqu	-27(%rdi), %xmm2
 	movdqu	-27(%rsi), %xmm1
 	mov	$-27, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-11(%rdi), %rax
 	mov	-11(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-4(%rdi), %eax
 	mov	-4(%rsi), %ecx
+#  ifdef USE_AS_BCMP
+	sub	%ecx, %eax
+#  else
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+#  endif
 	ret
 # endif
 	.p2align 4
@@ -1347,41 +1951,66 @@ L(76bytes):
 	mov	$-76, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(60bytes):
 	movdqu	-60(%rdi), %xmm2
 	movdqu	-60(%rsi), %xmm1
 	mov	$-60, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(44bytes):
 	movdqu	-44(%rdi), %xmm2
 	movdqu	-44(%rsi), %xmm1
 	mov	$-44, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(28bytes):
 	movdqu	-28(%rdi), %xmm2
 	movdqu	-28(%rsi), %xmm1
 	mov	$-28, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 	mov	-12(%rdi), %rax
 	mov	-12(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 	mov	-4(%rsi), %ecx
-# ifndef USE_AS_WMEMCMP
+# ifdef USE_AS_BCMP
 	mov	-4(%rdi), %eax
-	cmp	%eax, %ecx
+	sub	%ecx, %eax
 # else
+#  ifndef USE_AS_WMEMCMP
+	mov	-4(%rdi), %eax
+	cmp	%eax, %ecx
+#  else
 	cmp	-4(%rdi), %ecx
-# endif
+#  endif
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
+# endif
 	ret
 
 # ifndef USE_AS_WMEMCMP
@@ -1393,38 +2022,62 @@ L(77bytes):
 	mov	$-77, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(61bytes):
 	movdqu	-61(%rdi), %xmm2
 	movdqu	-61(%rsi), %xmm1
 	mov	$-61, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(45bytes):
 	movdqu	-45(%rdi), %xmm2
 	movdqu	-45(%rsi), %xmm1
 	mov	$-45, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(29bytes):
 	movdqu	-29(%rdi), %xmm2
 	movdqu	-29(%rsi), %xmm1
 	mov	$-29, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 
 	mov	-13(%rdi), %rax
 	mov	-13(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1435,36 +2088,60 @@ L(78bytes):
 	mov	$-78, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(62bytes):
 	movdqu	-62(%rdi), %xmm2
 	movdqu	-62(%rsi), %xmm1
 	mov	$-62, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(46bytes):
 	movdqu	-46(%rdi), %xmm2
 	movdqu	-46(%rsi), %xmm1
 	mov	$-46, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(30bytes):
 	movdqu	-30(%rdi), %xmm2
 	movdqu	-30(%rsi), %xmm1
 	mov	$-30, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-14(%rdi), %rax
 	mov	-14(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 
@@ -1475,36 +2152,60 @@ L(79bytes):
 	mov	$-79, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(63bytes):
 	movdqu	-63(%rdi), %xmm2
 	movdqu	-63(%rsi), %xmm1
 	mov	$-63, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(47bytes):
 	movdqu	-47(%rdi), %xmm2
 	movdqu	-47(%rsi), %xmm1
 	mov	$-47, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 L(31bytes):
 	movdqu	-31(%rdi), %xmm2
 	movdqu	-31(%rsi), %xmm1
 	mov	$-31, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+#  ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+#  else
 	jnc	L(less16bytes)
+#  endif
 	mov	-15(%rdi), %rax
 	mov	-15(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+#  ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+#  else
 	jne	L(diffin8bytes)
+#  endif
 	xor	%eax, %eax
 	ret
 # endif
@@ -1515,37 +2216,58 @@ L(64bytes):
 	mov	$-64, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(48bytes):
 	movdqu	-48(%rdi), %xmm2
 	movdqu	-48(%rsi), %xmm1
 	mov	$-48, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 L(32bytes):
 	movdqu	-32(%rdi), %xmm2
 	movdqu	-32(%rsi), %xmm1
 	mov	$-32, %dl
 	pxor	%xmm1, %xmm2
 	ptest	%xmm2, %xmm0
+# ifdef USE_AS_BCMP
+	jnc	L(return_not_equals)
+# else
 	jnc	L(less16bytes)
+# endif
 
 	mov	-16(%rdi), %rax
 	mov	-16(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 
 	mov	-8(%rdi), %rax
 	mov	-8(%rsi), %rcx
 	cmp	%rax, %rcx
+# ifdef USE_AS_BCMP
+	jne	L(return_not_equals)
+# else
 	jne	L(diffin8bytes)
+# endif
 	xor	%eax, %eax
 	ret
 
 /*
  * Aligned 8 bytes to avoid 2 branch "taken" in one 16 alinged code block.
  */
+# ifndef USE_AS_BCMP
 	.p2align 3
 L(less16bytes):
 	movsbq	%dl, %rdx
@@ -1561,16 +2283,16 @@ L(diffin8bytes):
 	shr	$32, %rcx
 	shr	$32, %rax
 
-# ifdef USE_AS_WMEMCMP
+#  ifdef USE_AS_WMEMCMP
 /* for wmemcmp */
 	cmp	%eax, %ecx
 	jne	L(diffin4bytes)
 	xor	%eax, %eax
 	ret
-# endif
+#  endif
 
 L(diffin4bytes):
-# ifndef USE_AS_WMEMCMP
+#  ifndef USE_AS_WMEMCMP
 	cmp	%cx, %ax
 	jne	L(diffin2bytes)
 	shr	$16, %ecx
@@ -1589,7 +2311,7 @@ L(end):
 	and	$0xff, %ecx
 	sub	%ecx, %eax
 	ret
-# else
+#  else
 
 /* for wmemcmp */
 	mov	$1, %eax
@@ -1601,6 +2323,15 @@ L(end):
 L(nequal_bigger):
 	ret
 
+L(unreal_case):
+	xor	%eax, %eax
+	ret
+#  endif
+# else
+	.p2align 4
+L(return_not_equals):
+	mov	$1, %eax
+	ret
 L(unreal_case):
 	xor	%eax, %eax
 	ret
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v2 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S
  2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
  2021-09-14  6:30   ` [PATCH v2 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
  2021-09-14  6:30   ` [PATCH v2 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S Noah Goldstein via Libc-alpha
@ 2021-09-14  6:30   ` Noah Goldstein via Libc-alpha
  2021-09-14  6:30   ` [PATCH v2 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S Noah Goldstein via Libc-alpha
  2021-09-14 14:40   ` [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex H.J. Lu via Libc-alpha
  4 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  6:30 UTC (permalink / raw)
  To: libc-alpha

No bug. This commit adds new optimized bcmp implementation for avx2.

The primary optimizations are 1) skipping the logic to find the
difference of the first mismatched byte and 2) not updating src/dst
addresses as the non-equals logic does not need to be reused by
different areas.

The entry alignment has been fixed at 64 bytes. In throughput-sensitive
functions, which bcmp can potentially be, frontend loop performance is
important to optimize for. This is impossible/difficult to do/maintain
with only 16-byte fixed alignment.

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
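Editorial note, not part of the patch: the two scalar tricks the AVX2
code below relies on can be modelled in plain C. This is only a sketch
under stated assumptions; the helper names are hypothetical and do not
exist in glibc. The first two helpers model `vpmovmskb` followed by
`incl` (and `bzhil` for the short case); the third models the
page-cross heuristic in L(less_vec).

```c
#include <assert.h>
#include <stdint.h>

#define VEC_SIZE 32u
#define PAGE_SIZE 4096u

/* Hypothetical model of `vpmovmskb; incl %eax`.  The mask has one bit
   per byte and is 0xffffffff when all 32 bytes compared equal, so
   adding 1 wraps to 0.  Any nonzero value means "not equal", which is
   all bcmp has to report.  */
static inline uint32_t
mask_to_result (uint32_t equal_mask)
{
  return equal_mask + 1u;
}

/* Hypothetical model of the size < VEC_SIZE tail: after the increment,
   keep only the low LEN bits (what bzhi does) so that bytes past the
   requested length are ignored.  LEN must be below 32.  */
static inline uint32_t
mask_to_result_len (uint32_t equal_mask, unsigned len)
{
  return (equal_mask + 1u) & ((1u << len) - 1u);
}

/* Hypothetical model of the page-cross check: OR the two addresses and
   test the page offset once.  This can report a possible cross when
   neither load would actually cross (a false positive), but never
   misses a real one.  */
static inline int
may_cross_page (uintptr_t s1, uintptr_t s2)
{
  return ((s1 | s2) & (PAGE_SIZE - 1u)) > (PAGE_SIZE - VEC_SIZE);
}

static inline void
avx2_sketch_self_check (void)
{
  /* All bytes equal: result is zero.  */
  assert (mask_to_result (0xffffffffu) == 0);
  /* A mismatch beyond LEN is masked away.  */
  assert (mask_to_result_len (0x00000007u, 3) == 0);
  /* Offsets 0x800 and 0x7e1 are both far from the page end, yet their
     OR is 0xfe1, which trips the check.  */
  assert (may_cross_page (0x800, 0x7e1) == 1);
}
```

The single OR-based test trades a few false positives for one compare
and branch instead of two, which matches the "by far the fastest
method" comment in the code.
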
 sysdeps/x86/sysdep.h                       |   6 +-
 sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S   |   4 +-
 sysdeps/x86_64/multiarch/bcmp-avx2.S       | 304 ++++++++++++++++++++-
 sysdeps/x86_64/multiarch/ifunc-bcmp.h      |   4 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |   2 -
 5 files changed, 308 insertions(+), 12 deletions(-)

diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
index cac1d762fb..4895179c10 100644
--- a/sysdeps/x86/sysdep.h
+++ b/sysdeps/x86/sysdep.h
@@ -78,15 +78,17 @@ enum cf_protection_level
 #define ASM_SIZE_DIRECTIVE(name) .size name,.-name;
 
 /* Define an entry point visible from C.  */
-#define	ENTRY(name)							      \
+#define	ENTRY_P2ALIGN(name, alignment)					      \
   .globl C_SYMBOL_NAME(name);						      \
   .type C_SYMBOL_NAME(name),@function;					      \
-  .align ALIGNARG(4);							      \
+  .align ALIGNARG(alignment);						      \
   C_LABEL(name)								      \
   cfi_startproc;							      \
   _CET_ENDBR;								      \
   CALL_MCOUNT
 
+#define ENTRY(name) ENTRY_P2ALIGN (name, 4)
+
 #undef	END
 #define END(name)							      \
   cfi_endproc;								      \
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
index d742257e4e..28976daff0 100644
--- a/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2-rtm.S
@@ -1,5 +1,5 @@
-#ifndef MEMCMP
-# define MEMCMP __bcmp_avx2_rtm
+#ifndef BCMP
+# define BCMP __bcmp_avx2_rtm
 #endif
 
 #define ZERO_UPPER_VEC_REGISTERS_RETURN \
diff --git a/sysdeps/x86_64/multiarch/bcmp-avx2.S b/sysdeps/x86_64/multiarch/bcmp-avx2.S
index 93a9a20b17..eb77ae5c4a 100644
--- a/sysdeps/x86_64/multiarch/bcmp-avx2.S
+++ b/sysdeps/x86_64/multiarch/bcmp-avx2.S
@@ -16,8 +16,304 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#ifndef MEMCMP
-# define MEMCMP	__bcmp_avx2
-#endif
+#if IS_IN (libc)
+
+/* bcmp is implemented as:
+   1. Use ymm vector compares when possible. The only case where
+      vector compares are not possible is when size < VEC_SIZE
+      and loading from either s1 or s2 would cause a page cross.
+   2. Use xmm vector compare when size >= 8 bytes.
+   3. Optimistically compare up to first 4 * VEC_SIZE one at a
+      time to check for early mismatches. Only do this if it's
+      guaranteed the work is not wasted.
+   4. If size is 8 * VEC_SIZE or less, unroll the loop.
+   5. Compare 4 * VEC_SIZE at a time with the aligned first memory
+      area.
+   6. Use 2 vector compares when size is 2 * VEC_SIZE or less.
+   7. Use 4 vector compares when size is 4 * VEC_SIZE or less.
+   8. Use 8 vector compares when size is 8 * VEC_SIZE or less.  */
+
+# include <sysdep.h>
+
+# ifndef BCMP
+#  define BCMP	__bcmp_avx2
+# endif
+
+# define VPCMPEQ	vpcmpeqb
+
+# ifndef VZEROUPPER
+#  define VZEROUPPER	vzeroupper
+# endif
+
+# ifndef SECTION
+#  define SECTION(p)	p##.avx
+# endif
+
+# define VEC_SIZE 32
+# define PAGE_SIZE	4096
+
+	.section SECTION(.text), "ax", @progbits
+ENTRY_P2ALIGN (BCMP, 6)
+# ifdef __ILP32__
+	/* Clear the upper 32 bits.  */
+	movl	%edx, %edx
+# endif
+	cmp	$VEC_SIZE, %RDX_LP
+	jb	L(less_vec)
+
+	/* From VEC to 2 * VEC.  No branch when size == VEC_SIZE.  */
+	vmovdqu	(%rsi), %ymm1
+	VPCMPEQ	(%rdi), %ymm1, %ymm1
+	vpmovmskb %ymm1, %eax
+	incl	%eax
+	jnz	L(return_neq0)
+	cmpq	$(VEC_SIZE * 2), %rdx
+	jbe	L(last_1x_vec)
+
+	/* Check second VEC no matter what.  */
+	vmovdqu	VEC_SIZE(%rsi), %ymm2
+	VPCMPEQ	VEC_SIZE(%rdi), %ymm2, %ymm2
+	vpmovmskb %ymm2, %eax
+	/* If the VEC was equal eax will be all 1s so incl will overflow
+	   and set zero flag.  */
+	incl	%eax
+	jnz	L(return_neq0)
+
+	/* Less than 4 * VEC.  */
+	cmpq	$(VEC_SIZE * 4), %rdx
+	jbe	L(last_2x_vec)
+
+	/* Check third and fourth VEC no matter what.  */
+	vmovdqu	(VEC_SIZE * 2)(%rsi), %ymm3
+	VPCMPEQ	(VEC_SIZE * 2)(%rdi), %ymm3, %ymm3
+	vpmovmskb %ymm3, %eax
+	incl	%eax
+	jnz	L(return_neq0)
+
+	vmovdqu	(VEC_SIZE * 3)(%rsi), %ymm4
+	VPCMPEQ	(VEC_SIZE * 3)(%rdi), %ymm4, %ymm4
+	vpmovmskb %ymm4, %eax
+	incl	%eax
+	jnz	L(return_neq0)
+
+	/* Go to 4x VEC loop.  */
+	cmpq	$(VEC_SIZE * 8), %rdx
+	ja	L(more_8x_vec)
+
+	/* Handle remainder of size = 4 * VEC + 1 to 8 * VEC without any
+	   branches.  */
+
+	/* Adjust rsi and rdi to avoid indexed address mode. This ends up
+	   saving 16 bytes of code, prevents unlamination, and avoids
+	   bottlenecks in the AGU.  */
+	addq	%rdx, %rsi
+	vmovdqu	-(VEC_SIZE * 4)(%rsi), %ymm1
+	vmovdqu	-(VEC_SIZE * 3)(%rsi), %ymm2
+	addq	%rdx, %rdi
+
+	VPCMPEQ	-(VEC_SIZE * 4)(%rdi), %ymm1, %ymm1
+	VPCMPEQ	-(VEC_SIZE * 3)(%rdi), %ymm2, %ymm2
+
+	vmovdqu	-(VEC_SIZE * 2)(%rsi), %ymm3
+	VPCMPEQ	-(VEC_SIZE * 2)(%rdi), %ymm3, %ymm3
+	vmovdqu	-VEC_SIZE(%rsi), %ymm4
+	VPCMPEQ	-VEC_SIZE(%rdi), %ymm4, %ymm4
 
-#include "bcmp-avx2.S"
+	/* Reduce VEC0 - VEC4.  */
+	vpand	%ymm1, %ymm2, %ymm2
+	vpand	%ymm3, %ymm4, %ymm4
+	vpand	%ymm2, %ymm4, %ymm4
+	vpmovmskb %ymm4, %eax
+	incl	%eax
+L(return_neq0):
+L(return_vzeroupper):
+	ZERO_UPPER_VEC_REGISTERS_RETURN
+
+	/* NB: p2align 5 here will ensure the L(loop_4x_vec) is also 32 byte
+	   aligned.  */
+	.p2align 5
+L(less_vec):
+	/* Check if one or less char. This is necessary for size = 0 but is
+	   also faster for size = 1.  */
+	cmpl	$1, %edx
+	jbe	L(one_or_less)
+
+	/* Check if loading one VEC from either s1 or s2 could cause a page
+	   cross. This can have false positives but is by far the fastest
+	   method.  */
+	movl	%edi, %eax
+	orl	%esi, %eax
+	andl	$(PAGE_SIZE - 1), %eax
+	cmpl	$(PAGE_SIZE - VEC_SIZE), %eax
+	jg	L(page_cross_less_vec)
+
+	/* No page cross possible.  */
+	vmovdqu	(%rsi), %ymm2
+	VPCMPEQ	(%rdi), %ymm2, %ymm2
+	vpmovmskb %ymm2, %eax
+	incl	%eax
+	/* Result will be zero if s1 and s2 match. Otherwise first set bit
+	   will be first mismatch.  */
+	bzhil	%edx, %eax, %eax
+	VZEROUPPER_RETURN
+
+	/* Relatively cold but placing close to L(less_vec) for 2 byte jump
+	   encoding.  */
+	.p2align 4
+L(one_or_less):
+	jb	L(zero)
+	movzbl	(%rsi), %ecx
+	movzbl	(%rdi), %eax
+	subl	%ecx, %eax
+	/* No ymm register was touched.  */
+	ret
+	/* Within the same 16 byte block is L(one_or_less).  */
+L(zero):
+	xorl	%eax, %eax
+	ret
+
+	.p2align 4
+L(last_1x_vec):
+	vmovdqu	-(VEC_SIZE * 1)(%rsi, %rdx), %ymm1
+	VPCMPEQ	-(VEC_SIZE * 1)(%rdi, %rdx), %ymm1, %ymm1
+	vpmovmskb %ymm1, %eax
+	incl	%eax
+	VZEROUPPER_RETURN
+
+	.p2align 4
+L(last_2x_vec):
+	vmovdqu	-(VEC_SIZE * 2)(%rsi, %rdx), %ymm1
+	VPCMPEQ	-(VEC_SIZE * 2)(%rdi, %rdx), %ymm1, %ymm1
+	vmovdqu	-(VEC_SIZE * 1)(%rsi, %rdx), %ymm2
+	VPCMPEQ	-(VEC_SIZE * 1)(%rdi, %rdx), %ymm2, %ymm2
+	vpand	%ymm1, %ymm2, %ymm2
+	vpmovmskb %ymm2, %eax
+	incl	%eax
+	VZEROUPPER_RETURN
+
+	.p2align 4
+L(more_8x_vec):
+	/* Set end of s1 in rdx.  */
+	leaq	-(VEC_SIZE * 4)(%rdi, %rdx), %rdx
+	/* rsi stores s2 - s1. This allows loop to only update one pointer.
+	 */
+	subq	%rdi, %rsi
+	/* Align s1 pointer.  */
+	andq	$-VEC_SIZE, %rdi
+	/* Adjust because first 4x vec were checked already.  */
+	subq	$-(VEC_SIZE * 4), %rdi
+	.p2align 4
+L(loop_4x_vec):
+	/* rsi has s2 - s1 so get correct address by adding s1 (in rdi).  */
+	vmovdqu	(%rsi, %rdi), %ymm1
+	VPCMPEQ	(%rdi), %ymm1, %ymm1
+
+	vmovdqu	VEC_SIZE(%rsi, %rdi), %ymm2
+	VPCMPEQ	VEC_SIZE(%rdi), %ymm2, %ymm2
+
+	vmovdqu	(VEC_SIZE * 2)(%rsi, %rdi), %ymm3
+	VPCMPEQ	(VEC_SIZE * 2)(%rdi), %ymm3, %ymm3
+
+	vmovdqu	(VEC_SIZE * 3)(%rsi, %rdi), %ymm4
+	VPCMPEQ	(VEC_SIZE * 3)(%rdi), %ymm4, %ymm4
+
+	vpand	%ymm1, %ymm2, %ymm2
+	vpand	%ymm3, %ymm4, %ymm4
+	vpand	%ymm2, %ymm4, %ymm4
+	vpmovmskb %ymm4, %eax
+	incl	%eax
+	jnz	L(return_neq1)
+	subq	$-(VEC_SIZE * 4), %rdi
+	/* Check if s1 pointer at end.  */
+	cmpq	%rdx, %rdi
+	jb	L(loop_4x_vec)
+
+	vmovdqu	(VEC_SIZE * 3)(%rsi, %rdx), %ymm4
+	VPCMPEQ	(VEC_SIZE * 3)(%rdx), %ymm4, %ymm4
+	subq	%rdx, %rdi
+	/* rdi has 4 * VEC_SIZE - remaining length.  */
+	cmpl	$(VEC_SIZE * 3), %edi
+	jae	L(8x_last_1x_vec)
+	/* Load regardless of branch.  */
+	vmovdqu	(VEC_SIZE * 2)(%rsi, %rdx), %ymm3
+	VPCMPEQ	(VEC_SIZE * 2)(%rdx), %ymm3, %ymm3
+	cmpl	$(VEC_SIZE * 2), %edi
+	jae	L(8x_last_2x_vec)
+	/* Check last 4 VEC.  */
+	vmovdqu	VEC_SIZE(%rsi, %rdx), %ymm1
+	VPCMPEQ	VEC_SIZE(%rdx), %ymm1, %ymm1
+
+	vmovdqu	(%rsi, %rdx), %ymm2
+	VPCMPEQ	(%rdx), %ymm2, %ymm2
+
+	vpand	%ymm3, %ymm4, %ymm4
+	vpand	%ymm1, %ymm2, %ymm3
+L(8x_last_2x_vec):
+	vpand	%ymm3, %ymm4, %ymm4
+L(8x_last_1x_vec):
+	vpmovmskb %ymm4, %eax
+	/* Restore s1 pointer to rdi.  */
+	incl	%eax
+L(return_neq1):
+	VZEROUPPER_RETURN
+
+	/* Relatively cold case as page crosses are unexpected.  */
+	.p2align 4
+L(page_cross_less_vec):
+	cmpl	$16, %edx
+	jae	L(between_16_31)
+	cmpl	$8, %edx
+	ja	L(between_9_15)
+	cmpl	$4, %edx
+	jb	L(between_2_3)
+	/* From 4 to 8 bytes.  No branch when size == 4.  */
+	movl	(%rdi), %eax
+	movl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movl	-4(%rdi, %rdx), %ecx
+	movl	-4(%rsi, %rdx), %esi
+	subl	%esi, %ecx
+	orl	%ecx, %eax
+	ret
+
+	.p2align 4,, 8
+L(between_9_15):
+	vmovq	(%rdi), %xmm1
+	vmovq	(%rsi), %xmm2
+	VPCMPEQ	%xmm1, %xmm2, %xmm3
+	vmovq	-8(%rdi, %rdx), %xmm1
+	vmovq	-8(%rsi, %rdx), %xmm2
+	VPCMPEQ	%xmm1, %xmm2, %xmm2
+	vpand	%xmm2, %xmm3, %xmm3
+	vpmovmskb %xmm3, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_16_31):
+	/* From 16 to 31 bytes.  No branch when size == 16.  */
+	vmovdqu	(%rsi), %xmm1
+	VPCMPEQ	(%rdi), %xmm1, %xmm1
+	vmovdqu	-16(%rsi, %rdx), %xmm2
+	VPCMPEQ	-16(%rdi, %rdx), %xmm2, %xmm2
+	vpand	%xmm1, %xmm2, %xmm2
+	vpmovmskb %xmm2, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_2_3):
+	/* From 2 to 3 bytes.  No branch when size == 2.  */
+	movzwl	(%rdi), %eax
+	movzwl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movzbl	-1(%rdi, %rdx), %edi
+	movzbl	-1(%rsi, %rdx), %esi
+	subl	%edi, %esi
+	orl	%esi, %eax
+	/* No ymm register was touched.  */
+	ret
+END (BCMP)
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
index b0dacd8526..f94516e5ee 100644
--- a/sysdeps/x86_64/multiarch/ifunc-bcmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
@@ -32,11 +32,11 @@ IFUNC_SELECTOR (void)
 
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
       && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
-      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
       && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
     {
       if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
+	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
+	  && CPU_FEATURE_USABLE_P (cpu_features, MOVBE))
 	return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index dd0c393c7d..cda0316928 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -42,13 +42,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, bcmp,
 	      IFUNC_IMPL_ADD (array, i, bcmp,
 			      (CPU_FEATURE_USABLE (AVX2)
-                   && CPU_FEATURE_USABLE (MOVBE)
 			       && CPU_FEATURE_USABLE (BMI2)),
 			      __bcmp_avx2)
 	      IFUNC_IMPL_ADD (array, i, bcmp,
 			      (CPU_FEATURE_USABLE (AVX2)
 			       && CPU_FEATURE_USABLE (BMI2)
-                   && CPU_FEATURE_USABLE (MOVBE)
 			       && CPU_FEATURE_USABLE (RTM)),
 			      __bcmp_avx2_rtm)
 	      IFUNC_IMPL_ADD (array, i, bcmp,
-- 
2.25.1



* [PATCH v2 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S
  2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
                     ` (2 preceding siblings ...)
  2021-09-14  6:30   ` [PATCH v2 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S Noah Goldstein via Libc-alpha
@ 2021-09-14  6:30   ` Noah Goldstein via Libc-alpha
  2021-09-14 14:40   ` [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex H.J. Lu via Libc-alpha
  4 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14  6:30 UTC (permalink / raw)
  To: libc-alpha

No bug. This commit adds new optimized bcmp implementation for evex.

The primary optimizations are 1) skipping the logic to find the
difference of the first mismatched byte and 2) not updating src/dst
addresses as the non-equals logic does not need to be reused by
different areas.

The entry alignment has been fixed at 64 bytes. In throughput-sensitive
functions, which bcmp can potentially be, frontend loop performance is
important to optimize for. This is impossible/difficult to do/maintain
with only 16-byte fixed alignment.

test-memcmp, test-bcmp, and test-wmemcmp are all passing.
---
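Editorial note, not part of the patch: the EVEX code below folds its
xor/or reductions into `vpternlogd` with the immediates 0xfe, 0xde and
0xf6. The following C model is only a sketch, assuming the usual
encoding in which the destination bit is the most significant index bit
and the last source (the memory operand in the code below) is the least
significant. The function name `ternlog` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitwise model of vpternlogd on one 32-bit lane.  For
   each bit position the three input bits (dest, src1, src2) form a
   3-bit index that selects one bit of imm8.  */
static inline uint32_t
ternlog (uint8_t imm8, uint32_t dest, uint32_t src1, uint32_t src2)
{
  uint32_t out = 0;
  for (unsigned bit = 0; bit < 32; bit++)
    {
      unsigned idx = (((dest >> bit) & 1u) << 2)
		     | (((src1 >> bit) & 1u) << 1)
		     | ((src2 >> bit) & 1u);
      out |= (uint32_t) ((imm8 >> idx) & 1u) << bit;
    }
  return out;
}

static inline void
ternlog_self_check (void)
{
  uint32_t a = 0x12345678u, b = 0x0f0f0f0fu, c = 0xdeadbeefu;

  /* 0xfe: plain three-way OR.  */
  assert (ternlog (0xfe, a, b, c) == (a | b | c));
  /* 0xde: xor dest with src2, then OR in src1.  */
  assert (ternlog (0xde, a, b, c) == ((a ^ c) | b));
  /* 0xf6: xor src1 with src2, then OR into dest.  */
  assert (ternlog (0xf6, a, b, c) == (a | (b ^ c)));
}
```

Under this model a zero result means every xor was zero, that is, every
compared byte matched, which is why a final `vptestmb` of the
accumulator against itself is enough to produce the bcmp return value.
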
 sysdeps/x86_64/multiarch/bcmp-evex.S       | 305 ++++++++++++++++++++-
 sysdeps/x86_64/multiarch/ifunc-bcmp.h      |   3 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |   1 -
 3 files changed, 302 insertions(+), 7 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/bcmp-evex.S b/sysdeps/x86_64/multiarch/bcmp-evex.S
index ade52e8c68..1bfe824eb4 100644
--- a/sysdeps/x86_64/multiarch/bcmp-evex.S
+++ b/sysdeps/x86_64/multiarch/bcmp-evex.S
@@ -16,8 +16,305 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#ifndef MEMCMP
-# define MEMCMP	__bcmp_evex
-#endif
+#if IS_IN (libc)
+
+/* bcmp is implemented as:
+   1. Use ymm vector compares when possible. The only case where
+      vector compares are not possible is when size < VEC_SIZE
+      and loading from either s1 or s2 would cause a page cross.
+   2. Use xmm vector compare when size >= 8 bytes.
+   3. Optimistically compare up to first 4 * VEC_SIZE one at a
+      time to check for early mismatches. Only do this if it's
+      guaranteed the work is not wasted.
+   4. If size is 8 * VEC_SIZE or less, unroll the loop.
+   5. Compare 4 * VEC_SIZE at a time with the aligned first memory
+      area.
+   6. Use 2 vector compares when size is 2 * VEC_SIZE or less.
+   7. Use 4 vector compares when size is 4 * VEC_SIZE or less.
+   8. Use 8 vector compares when size is 8 * VEC_SIZE or less.  */
+
+# include <sysdep.h>
+
+# ifndef BCMP
+#  define BCMP	__bcmp_evex
+# endif
+
+# define VMOVU	vmovdqu64
+# define VPCMP	vpcmpub
+# define VPTEST	vptestmb
+
+# define VEC_SIZE	32
+# define PAGE_SIZE	4096
+
+# define YMM0		ymm16
+# define YMM1		ymm17
+# define YMM2		ymm18
+# define YMM3		ymm19
+# define YMM4		ymm20
+# define YMM5		ymm21
+# define YMM6		ymm22
+
+
+	.section .text.evex, "ax", @progbits
+ENTRY_P2ALIGN (BCMP, 6)
+# ifdef __ILP32__
+	/* Clear the upper 32 bits.  */
+	movl	%edx, %edx
+# endif
+	cmp	$VEC_SIZE, %RDX_LP
+	jb	L(less_vec)
+
+	/* From VEC to 2 * VEC.  No branch when size == VEC_SIZE.  */
+	VMOVU	(%rsi), %YMM1
+	/* Use compare not equals to directly check for mismatch.  */
+	VPCMP	$4, (%rdi), %YMM1, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	cmpq	$(VEC_SIZE * 2), %rdx
+	jbe	L(last_1x_vec)
+
+	/* Check second VEC no matter what.  */
+	VMOVU	VEC_SIZE(%rsi), %YMM2
+	VPCMP	$4, VEC_SIZE(%rdi), %YMM2, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	/* Less than 4 * VEC.  */
+	cmpq	$(VEC_SIZE * 4), %rdx
+	jbe	L(last_2x_vec)
+
+	/* Check third and fourth VEC no matter what.  */
+	VMOVU	(VEC_SIZE * 2)(%rsi), %YMM3
+	VPCMP	$4, (VEC_SIZE * 2)(%rdi), %YMM3, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	VMOVU	(VEC_SIZE * 3)(%rsi), %YMM4
+	VPCMP	$4, (VEC_SIZE * 3)(%rdi), %YMM4, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq0)
+
+	/* Go to 4x VEC loop.  */
+	cmpq	$(VEC_SIZE * 8), %rdx
+	ja	L(more_8x_vec)
+
+	/* Handle remainder of size = 4 * VEC + 1 to 8 * VEC without any
+	   branches.  */
+
+	VMOVU	-(VEC_SIZE * 4)(%rsi, %rdx), %YMM1
+	VMOVU	-(VEC_SIZE * 3)(%rsi, %rdx), %YMM2
+	addq	%rdx, %rdi
+
+	/* Wait to load from s1 until the address is adjusted, due to
+	   unlamination.  */
+
+	/* vpxor will be all 0s if s1 and s2 are equal. Otherwise it will
+	   have some 1s.  */
+	vpxorq	-(VEC_SIZE * 4)(%rdi), %YMM1, %YMM1
+	vpxorq	-(VEC_SIZE * 3)(%rdi), %YMM2, %YMM2
+
+	VMOVU	-(VEC_SIZE * 2)(%rsi, %rdx), %YMM3
+	vpxorq	-(VEC_SIZE * 2)(%rdi), %YMM3, %YMM3
+	/* Or together YMM1, YMM2, and YMM3 into YMM3.  */
+	vpternlogd $0xfe, %YMM1, %YMM2, %YMM3
 
-#include "memcmp-evex-movbe.S"
+	VMOVU	-(VEC_SIZE)(%rsi, %rdx), %YMM4
+	/* Ternary logic to xor -(VEC_SIZE)(%rdi) with YMM4 while oring
+	   with YMM3. Result is stored in YMM4.  */
+	vpternlogd $0xde, -(VEC_SIZE)(%rdi), %YMM3, %YMM4
+	/* Compare YMM4 with 0. If any 1s s1 and s2 don't match.  */
+	VPTEST	%YMM4, %YMM4, %k1
+	kmovd	%k1, %eax
+L(return_neq0):
+	ret
+
+	/* Fits in padding needed to .p2align 5 L(less_vec).  */
+L(last_1x_vec):
+	VMOVU	-(VEC_SIZE * 1)(%rsi, %rdx), %YMM1
+	VPCMP	$4, -(VEC_SIZE * 1)(%rdi, %rdx), %YMM1, %k1
+	kmovd	%k1, %eax
+	ret
+
+	/* NB: p2align 5 here will ensure the L(loop_4x_vec) is also 32 byte
+	   aligned.  */
+	.p2align 5
+L(less_vec):
+	/* Check if one or less char. This is necessary for size = 0 but is
+	   also faster for size = 1.  */
+	cmpl	$1, %edx
+	jbe	L(one_or_less)
+
+	/* Check if loading one VEC from either s1 or s2 could cause a page
+	   cross. This can have false positives but is by far the fastest
+	   method.  */
+	movl	%edi, %eax
+	orl	%esi, %eax
+	andl	$(PAGE_SIZE - 1), %eax
+	cmpl	$(PAGE_SIZE - VEC_SIZE), %eax
+	jg	L(page_cross_less_vec)
+
+	/* No page cross possible.  */
+	VMOVU	(%rsi), %YMM2
+	VPCMP	$4, (%rdi), %YMM2, %k1
+	kmovd	%k1, %eax
+	/* Result will be zero if s1 and s2 match. Otherwise first set bit
+	   will be first mismatch.  */
+	bzhil	%edx, %eax, %eax
+	ret
+
+	/* Relatively cold but placing close to L(less_vec) for 2 byte jump
+	   encoding.  */
+	.p2align 4
+L(one_or_less):
+	jb	L(zero)
+	movzbl	(%rsi), %ecx
+	movzbl	(%rdi), %eax
+	subl	%ecx, %eax
+	/* No ymm register was touched.  */
+	ret
+	/* Within the same 16 byte block is L(one_or_less).  */
+L(zero):
+	xorl	%eax, %eax
+	ret
+
+	.p2align 4
+L(last_2x_vec):
+	VMOVU	-(VEC_SIZE * 2)(%rsi, %rdx), %YMM1
+	vpxorq	-(VEC_SIZE * 2)(%rdi, %rdx), %YMM1, %YMM1
+	VMOVU	-(VEC_SIZE * 1)(%rsi, %rdx), %YMM2
+	vpternlogd $0xde, -(VEC_SIZE * 1)(%rdi, %rdx), %YMM1, %YMM2
+	VPTEST	%YMM2, %YMM2, %k1
+	kmovd	%k1, %eax
+	ret
+
+	.p2align 4
+L(more_8x_vec):
+	/* Set end of s1 in rdx.  */
+	leaq	-(VEC_SIZE * 4)(%rdi, %rdx), %rdx
+	/* rsi stores s2 - s1. This allows loop to only update one pointer.
+	 */
+	subq	%rdi, %rsi
+	/* Align s1 pointer.  */
+	andq	$-VEC_SIZE, %rdi
+	/* Adjust because first 4x vec were checked already.  */
+	subq	$-(VEC_SIZE * 4), %rdi
+	.p2align 4
+L(loop_4x_vec):
+	VMOVU	(%rsi, %rdi), %YMM1
+	vpxorq	(%rdi), %YMM1, %YMM1
+
+	VMOVU	VEC_SIZE(%rsi, %rdi), %YMM2
+	vpxorq	VEC_SIZE(%rdi), %YMM2, %YMM2
+
+	VMOVU	(VEC_SIZE * 2)(%rsi, %rdi), %YMM3
+	vpxorq	(VEC_SIZE * 2)(%rdi), %YMM3, %YMM3
+	vpternlogd $0xfe, %YMM1, %YMM2, %YMM3
+
+	VMOVU	(VEC_SIZE * 3)(%rsi, %rdi), %YMM4
+	vpternlogd $0xde, (VEC_SIZE * 3)(%rdi), %YMM3, %YMM4
+	VPTEST	%YMM4, %YMM4, %k1
+	kmovd	%k1, %eax
+	testl	%eax, %eax
+	jnz	L(return_neq2)
+	subq	$-(VEC_SIZE * 4), %rdi
+	cmpq	%rdx, %rdi
+	jb	L(loop_4x_vec)
+
+	subq	%rdx, %rdi
+	VMOVU	(VEC_SIZE * 3)(%rsi, %rdx), %YMM4
+	vpxorq	(VEC_SIZE * 3)(%rdx), %YMM4, %YMM4
+	/* rdi has 4 * VEC_SIZE - remaining length.  */
+	cmpl	$(VEC_SIZE * 3), %edi
+	jae	L(8x_last_1x_vec)
+	/* Load regardless of branch.  */
+	VMOVU	(VEC_SIZE * 2)(%rsi, %rdx), %YMM3
+	/* Ternary logic to xor (VEC_SIZE * 2)(%rdx) with YMM3 while oring
+	   with YMM4. Result is stored in YMM4.  */
+	vpternlogd $0xf6, (VEC_SIZE * 2)(%rdx), %YMM3, %YMM4
+	cmpl	$(VEC_SIZE * 2), %edi
+	jae	L(8x_last_2x_vec)
+
+	VMOVU	VEC_SIZE(%rsi, %rdx), %YMM2
+	vpxorq	VEC_SIZE(%rdx), %YMM2, %YMM2
+
+	VMOVU	(%rsi, %rdx), %YMM1
+	vpxorq	(%rdx), %YMM1, %YMM1
+
+	vpternlogd $0xfe, %YMM1, %YMM2, %YMM4
+L(8x_last_1x_vec):
+L(8x_last_2x_vec):
+	VPTEST	%YMM4, %YMM4, %k1
+	kmovd	%k1, %eax
+L(return_neq2):
+	ret
+
+	/* Relatively cold case as page crosses are unexpected.  */
+	.p2align 4
+L(page_cross_less_vec):
+	cmpl	$16, %edx
+	jae	L(between_16_31)
+	cmpl	$8, %edx
+	ja	L(between_9_15)
+	cmpl	$4, %edx
+	jb	L(between_2_3)
+	/* From 4 to 8 bytes.  No branch when size == 4.  */
+	movl	(%rdi), %eax
+	movl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movl	-4(%rdi, %rdx), %ecx
+	movl	-4(%rsi, %rdx), %esi
+	subl	%esi, %ecx
+	orl	%ecx, %eax
+	ret
+
+	.p2align 4,, 8
+L(between_9_15):
+	/* Safe to use xmm[0, 15] as no vzeroupper is needed so RTM safe.
+	 */
+	vmovq	(%rdi), %xmm1
+	vmovq	(%rsi), %xmm2
+	vpcmpeqb %xmm1, %xmm2, %xmm3
+	vmovq	-8(%rdi, %rdx), %xmm1
+	vmovq	-8(%rsi, %rdx), %xmm2
+	vpcmpeqb %xmm1, %xmm2, %xmm2
+	vpand	%xmm2, %xmm3, %xmm3
+	vpmovmskb %xmm3, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_16_31):
+	/* From 16 to 31 bytes.  No branch when size == 16.  */
+
+	/* Safe to use xmm[0, 15] as no vzeroupper is needed so RTM safe.
+	 */
+	vmovdqu	(%rsi), %xmm1
+	vpcmpeqb (%rdi), %xmm1, %xmm1
+	vmovdqu	-16(%rsi, %rdx), %xmm2
+	vpcmpeqb -16(%rdi, %rdx), %xmm2, %xmm2
+	vpand	%xmm1, %xmm2, %xmm2
+	vpmovmskb %xmm2, %eax
+	subl	$0xffff, %eax
+	/* No ymm register was touched.  */
+	ret
+
+	.p2align 4,, 8
+L(between_2_3):
+	/* From 2 to 3 bytes.  No branch when size == 2.  */
+	movzwl	(%rdi), %eax
+	movzwl	(%rsi), %ecx
+	subl	%ecx, %eax
+	movzbl	-1(%rdi, %rdx), %edi
+	movzbl	-1(%rsi, %rdx), %esi
+	subl	%edi, %esi
+	orl	%esi, %eax
+	/* No ymm register was touched.  */
+	ret
+END (BCMP)
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-bcmp.h b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
index f94516e5ee..51f251d0c9 100644
--- a/sysdeps/x86_64/multiarch/ifunc-bcmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-bcmp.h
@@ -35,8 +35,7 @@ IFUNC_SELECTOR (void)
       && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
     {
       if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-	  && CPU_FEATURE_USABLE_P (cpu_features, MOVBE))
+	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
 	return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index cda0316928..abbb4e407f 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -52,7 +52,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, bcmp,
 			      (CPU_FEATURE_USABLE (AVX512VL)
 			       && CPU_FEATURE_USABLE (AVX512BW)
-                   && CPU_FEATURE_USABLE (MOVBE)
 			       && CPU_FEATURE_USABLE (BMI2)),
 			      __bcmp_evex)
 	      IFUNC_IMPL_ADD (array, i, bcmp, CPU_FEATURE_USABLE (SSE4_1),
-- 
2.25.1



* Re: [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
                     ` (3 preceding siblings ...)
  2021-09-14  6:30   ` [PATCH v2 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S Noah Goldstein via Libc-alpha
@ 2021-09-14 14:40   ` H.J. Lu via Libc-alpha
  2021-09-14 19:23     ` Noah Goldstein via Libc-alpha
  2021-09-14 20:30     ` Florian Weimer via Libc-alpha
  4 siblings, 2 replies; 51+ messages in thread
From: H.J. Lu via Libc-alpha @ 2021-09-14 14:40 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: GNU C Library

On Mon, Sep 13, 2021 at 11:30 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> No bug. This commit adds support for an optimized bcmp implementation.
> Support is for sse2, sse4_1, avx2, and evex.
>
> All string tests passing and build succeeding.

memcmp can be a little slower than bcmp.  But bcmp isn't a standard C function.
All new code should use memcmp.  Can you improve memcmp instead?

Thanks.


-- 
H.J.


* Re: [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-14 14:40   ` [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex H.J. Lu via Libc-alpha
@ 2021-09-14 19:23     ` Noah Goldstein via Libc-alpha
  2021-09-14 20:30     ` Florian Weimer via Libc-alpha
  1 sibling, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-14 19:23 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library

On Tue, Sep 14, 2021 at 9:40 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> On Mon, Sep 13, 2021 at 11:30 PM Noah Goldstein <goldstein.w.n@gmail.com>
> wrote:
> >
> > No bug. This commit adds support for an optimized bcmp implementation.
> > Support is for sse2, sse4_1, avx2, and evex.
> >
> > All string tests passing and build succeeding.
>
> memcmp can be a little slower than bcmp.  But bcmp isn't a standard C
> function.
> All new code should use memcmp.  Can you improve memcmp instead?


> Thanks.
>


There are some small improvements to memcmp I could imagine.

Use vptestm{b|d} instead of vpcmp for zero tests.
Aligning to 64 bytes so that target alignments/placements can be better
optimized.

But for the most part the biggest optimization is just reducing all the
work to compute 1/-1 for the result.

I think, however, that since GLIBC supports bcmp it makes sense to offer
the best version we can.  Especially since compilers (Clang at least)
will use it when possible to optimize memcmp usage.

I don't fully understand the concern with adding it. AFAICT if GLIBC
decides it no longer wants to support bcmp we can remove it, but
essentially the same work would need to be done regardless. Can you
elaborate on why?


>
>
> --
> H.J.
>


* Re: [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-14 14:40   ` [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex H.J. Lu via Libc-alpha
  2021-09-14 19:23     ` Noah Goldstein via Libc-alpha
@ 2021-09-14 20:30     ` Florian Weimer via Libc-alpha
  1 sibling, 0 replies; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-14 20:30 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha

* H. J. Lu via Libc-alpha:

> On Mon, Sep 13, 2021 at 11:30 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>
>> No bug. This commit adds support for an optimized bcmp implementation.
>> Support is for sse2, sse4_1, avx2, and evex.
>>
>> All string tests passing and build succeeding.
>
> memcmp can be a little slower than bcmp.  But bcmp isn't a standard C
> function.  All new code should use memcmp.  Can you improve memcmp
> instead?

I looked at this from the angle of a timing-insensitive memcmp
implementation a while back.  Computing the ordering (in addition to
equality) is actually fairly costly because you have to find the
position of the mismatch and, in the vector case, load the bytes
once more from memory.  In the non-vector case, a saturating subtraction
is needed, along with correction for endianness.  Skipping that work has
a real benefit.  It also makes it easier to argue that the
implementations are quasi-constant-time, which could be considered
security hardening.  It had not occurred to me to reuse the bcmp symbol
for that.
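
Editorial note, not part of the original message: the cost difference
described above can be sketched for a single 8-byte word in C. This is
an illustration only. The helper names are hypothetical, and the
ordering version uses a byte-by-byte big-endian load and a
compare-based result as a portable stand-in for the byte swap and
saturating subtraction mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Equality only (the bcmp contract): any nonzero xor means the two
   words differ.  No byte order handling is needed.  */
static inline int
differ8 (const void *s1, const void *s2)
{
  uint64_t a, b;
  memcpy (&a, s1, sizeof a);
  memcpy (&b, s2, sizeof b);
  return (a ^ b) != 0;
}

/* Build a value in which the first byte in memory is the most
   significant, so integer comparison matches byte-wise ordering.  */
static inline uint64_t
load_be64 (const void *p)
{
  const unsigned char *bytes = p;
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | bytes[i];
  return v;
}

/* Ordering (the memcmp contract): needs the endianness correction
   plus an extra step to turn the comparison into -1, 0 or 1.  */
static inline int
order8 (const void *s1, const void *s2)
{
  uint64_t a = load_be64 (s1), b = load_be64 (s2);
  return (a > b) - (a < b);
}

static inline void
word_compare_self_check (void)
{
  const unsigned char x[8] = { 1, 0, 0, 0, 0, 0, 0, 2 };
  const unsigned char y[8] = { 0, 0, 0, 0, 0, 0, 0, 3 };

  assert (differ8 (x, y) == 1);
  /* x sorts after y because its first byte is larger.  */
  assert (order8 (x, y) == 1);
  assert (memcmp (x, y, 8) > 0);
}
```

The equality path is a load, an xor and a test; the ordering path adds
the byte-order fix and the sign computation, which is the extra work
the message refers to.
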

The Clang optimization is dubious because it replaces memcmp with bcmp
even if there is no suitable bcmp declaration in scope (which is the
hack used by GCC to guide rewriting in similar cases).  I do not know
for sure if GCC implements similar out-of-standard symbol replacements.
I think it can synthesize mempcpy calls in some cases.  That would be a
precedent for the bcmp rewriting that Clang does.

Thanks,
Florian



* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
                   ` (5 preceding siblings ...)
  2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
@ 2021-09-15  0:00 ` Joseph Myers
  2021-09-15 13:37   ` Zack Weinberg via Libc-alpha
  6 siblings, 1 reply; 51+ messages in thread
From: Joseph Myers @ 2021-09-15  0:00 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha

bcmp is an obsolescent function that no modern programs should be using, 
and it's not in the implementation namespace either so compilers shouldn't 
translate memcmp calls to bcmp.

If you want to define memcmp ABI variants optimized for particular usages, 
I suggest the following:

1. Add reserved-namespace names for such variants to the x86_64 psABI 
document (working with the ABI mailing list).  The 32-bit Arm RTABI 
<https://github.com/ARM-software/abi-aa/blob/main/rtabi32/rtabi32.rst> 
provides a precedent for defining such function variants in a psABI (it 
includes various __aeabi_mem*, though no memcmp variants).

2. Add those names to glibc, as well as teaching compilers to generate 
calls to them (with appropriate conditionals for whether the functions are 
known to be available in the target libc; in GCC, that would be based on 
GCC_GLIBC_VERSION_GTE_IFELSE configure tests for targets using glibc).


As a variant, you could define such names as architecture-independent GNU 
extensions rather than in a psABI, especially if there's nothing 
architecture-specific about the variants you think are useful (e.g. no use 
for having changes to calling conventions / call-clobbered registers for 
the variants).  But what should not be done in any case is tying an 
optimization to an obsolescent non-reserved name - any such optimized 
variants should use only implementation-namespace names.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2,  sse4_1, avx2, and evex
  2021-09-15  0:00 ` [PATCH " Joseph Myers
@ 2021-09-15 13:37   ` Zack Weinberg via Libc-alpha
  2021-09-15 14:01     ` Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, " Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Zack Weinberg via Libc-alpha @ 2021-09-15 13:37 UTC (permalink / raw)
  To: libc-alpha

On Tue, Sep 14, 2021, at 8:00 PM, Joseph Myers wrote:
> bcmp is an obsolescent function that no modern programs should be using, 
> and it's not in the implementation namespace either so compilers shouldn't 
> translate memcmp calls to bcmp.

I want to add that glibc has made bcmp an alias for memcmp for many years, which means that Linux- or Hurd-specific programs that are still using bcmp may have come to depend on its return value indicating ordering rather than just equality.  I myself had been under the impression that they were *specified* exactly the same, until this thread prompted me to double-check the specifications.  As such I don't think it's safe for *glibc* to accept patches that optimize bcmp separately from memcmp.
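
A small sketch of the kind of dependence meant here (all names are
hypothetical; eq_only_cmp stands in for a bcmp that reports only equality,
which is all the specification promises).  A caller that tests the sign of
the result behaves correctly with memcmp semantics and silently changes
behavior with an equality-only function:

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn) (const void *, const void *, size_t);

/* Hypothetical equality-only comparison: 0 if equal, 1 otherwise.
   This is a valid bcmp by specification, but not a valid memcmp.  */
int
eq_only_cmp (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) != 0;
}

/* A caller that assumes the result carries ordering information.  */
int
sorts_before (cmp_fn cmp, const char *a, const char *b, size_t n)
{
  return cmp (a, b, n) < 0;
}
```

With memcmp passed as cmp, sorts_before ("abc", "abd", 3) is true; with
eq_only_cmp it is false, because the result is never negative.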

I do rather like the idea of a __gnu_memeq() that compilers could optimize memcmp calls to, when they can prove that the result is used only for its truth value.

zw

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2,  sse 4_1, avx2, and evex
  2021-09-15 13:37   ` Zack Weinberg via Libc-alpha
@ 2021-09-15 14:01     ` Florian Weimer via Libc-alpha
  2021-09-15 18:06       ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-15 14:01 UTC (permalink / raw)
  To: Zack Weinberg via Libc-alpha; +Cc: Zack Weinberg

* Zack Weinberg via Libc-alpha:

> On Tue, Sep 14, 2021, at 8:00 PM, Joseph Myers wrote:
>> bcmp is an obsolescent function that no modern programs should be using, 
>> and it's not in the implementation namespace either so compilers shouldn't 
>> translate memcmp calls to bcmp.
>
> I want to add that glibc has made bcmp an alias for memcmp for many
> years, which means that Linux- or Hurd-specific programs that are
> still using bcmp may have come to depend on its return value
> indicating ordering rather than just equality.  I myself had been
> under the impression that they were *specified* exactly the same,
> until this thread prompted me to double-check the specifications.  As
> such I don't think it's safe for *glibc* to accept patches that
> optimize bcmp separately from memcmp.

That's a very good point.

> I do rather like the idea of a __gnu_memeq() that compilers could
> optimize memcmp calls to, when they can prove that the result is used
> only for its truth value.

Yes, we should use a name in the implementation namespace because even
if we pick an obvious name like memequal, it will probably come back under a
different name from the C committee.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1,  avx2, and evex
  2021-09-15 14:01     ` Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, " Florian Weimer via Libc-alpha
@ 2021-09-15 18:06       ` Noah Goldstein via Libc-alpha
  2021-09-15 18:30         ` Joseph Myers
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-15 18:06 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha

On Wed, Sep 15, 2021 at 9:02 AM Florian Weimer via Libc-alpha <
libc-alpha@sourceware.org> wrote:

> * Zack Weinberg via Libc-alpha:
>
> > On Tue, Sep 14, 2021, at 8:00 PM, Joseph Myers wrote:
> >> bcmp is an obsolescent function that no modern programs should be using,
> >> and it's not in the implementation namespace either so compilers shouldn't
> >> translate memcmp calls to bcmp.
> >
> > I want to add that glibc has made bcmp an alias for memcmp for many
> > years, which means that Linux- or Hurd-specific programs that are
> > still using bcmp may have come to depend on its return value
> > indicating ordering rather than just equality.  I myself had been
> > under the impression that they were *specified* exactly the same,
> > until this thread prompted me to double-check the specifications.  As
> > such I don't think it's safe for *glibc* to accept patches that
> > optimize bcmp separately from memcmp.
>
> That's a very good point.
>
> > I do rather like the idea of a __gnu_memeq() that compilers could
> > optimize memcmp calls to, when they can prove that the result is used
> > only for its truth value.
>
> Yes, we should use a name in the implementation namespace because even
> if we pick an obvious name like memequal, it will probably come back under a
> different name from the C committee.
>

+1

What would be the steps for getting that into GLIBC?


>
> Thanks,
> Florian
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1,  avx2, and evex
  2021-09-15 18:06       ` Noah Goldstein via Libc-alpha
@ 2021-09-15 18:30         ` Joseph Myers
  2021-09-27  1:35           ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Joseph Myers @ 2021-09-15 18:30 UTC (permalink / raw)
  To: Noah Goldstein
  Cc: Florian Weimer, Zack Weinberg, Zack Weinberg via Libc-alpha

On Wed, 15 Sep 2021, Noah Goldstein via Libc-alpha wrote:

> > > I do rather like the idea of a __gnu_memeq() that compilers could
> > > optimize memcmp calls to, when they can prove that the result is used
> > > only for its truth value.
> >
> > Yes, we should use a name in the implementation namespace because even
> > if we pick an obvious name like memequal, it will probably come back under a
> > different name from the C committee.
> >
> 
> +1
> 
> What would be the steps for getting that into GLIBC?

Define what the exact interface you want is (the exact function type and 
(reserved) name and semantics of the return value and arguments; 
explicitly including details such as whether the full n bytes of each 
argument are required to be mapped into memory even if they compare 
unequal before n bytes).
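
As an illustration of what such an interface definition might pin down,
here is a sketch.  The name example_memeq and the wording of the contract
are assumptions made for this example only; nothing here has been agreed
anywhere, and a real proposal would use a reserved name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface, for illustration only.

   Return zero if the N bytes at S1 and S2 are identical, and some
   nonzero value otherwise.  The sign and magnitude of a nonzero result
   carry no meaning.

   Assumed here (one possible answer to the open question above): all N
   bytes of both objects must be readable, even if an earlier byte
   already differs.  Under that assumption memcmp itself satisfies the
   contract, so forwarding to it is a valid implementation.  */
int
example_memeq (const void *s1, const void *s2, size_t n)
{
  return memcmp (s1, s2, n);
}
```

If the contract instead allowed callers to pass partially unmapped
buffers, forwarding to memcmp would no longer be valid, which is why that
detail needs to be settled up front.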

Discuss it on the libc-coord mailing list (probably include compiler 
mailing lists as well) to get agreement on semantics that are good for 
both libc implementations and for compilers to generate; it's best if this 
interface is acceptable to multiple libc implementations and suitable for 
multiple compilers to generate calls to (when available in libc).

Implement in glibc, across all glibc ports and including all the ABI test 
baseline updates.  If the semantics are such that an alias to memcmp is a 
valid implementation, that probably means adding such an alias to every 
memcmp implementation in glibc (and verifying with build-many-glibcs.py 
that they all build and pass the ABI tests), as well as allowing for 
architectures to add their own separate implementation of the new function 
if they wish.  There should also be execution tests that the new function 
works correctly at runtime (with different alignment, arguments just 
before unmapped pages, etc., as with other string function tests).  If the 
new function is purely an ABI, not an API, it doesn't need user manual 
documentation, however (although there will at least need to be a comment 
giving the detailed semantics that were agreed on libc-coord).
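
A rough sketch of the "arguments just before unmapped pages" style of
test, assuming a POSIX system with mmap and mprotect.  This is not the
glibc string test harness; map_with_guard and check_at_page_end are names
invented for this example, and memcmp stands in for the new function:

```c
#define _DEFAULT_SOURCE 1	/* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map two pages and make the second one inaccessible.  Returns the start
   of the mapping (and the page size via PAGE_SIZE), or NULL on failure.  */
unsigned char *
map_with_guard (size_t *page_size)
{
  long ps = sysconf (_SC_PAGESIZE);
  if (ps <= 0)
    return NULL;

  unsigned char *p = mmap (NULL, 2 * (size_t) ps, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mprotect (p + ps, (size_t) ps, PROT_NONE) != 0)
    {
      munmap (p, 2 * (size_t) ps);
      return NULL;
    }
  *page_size = (size_t) ps;
  return p;
}

/* Compare two LEN-byte buffers that both end exactly at a guard page, so
   any read past the end faults.  Returns the number of wrong results, or
   -1 if the setup failed.  Varying LEN also varies the start alignment.  */
int
check_at_page_end (size_t len)
{
  size_t ps = 0;
  unsigned char *a = map_with_guard (&ps);
  unsigned char *b = map_with_guard (&ps);
  int result = -1;

  if (a != NULL && b != NULL && len > 0 && len <= ps)
    {
      unsigned char *ea = a + ps - len;
      unsigned char *eb = b + ps - len;

      memset (ea, 0x5a, len);
      memset (eb, 0x5a, len);
      result = 0;
      if (memcmp (ea, eb, len) != 0)	/* equal buffers */
	result++;
      eb[len - 1] ^= 1;			/* differ in the last byte */
      if (memcmp (ea, eb, len) == 0)
	result++;
    }

  if (a != NULL)
    munmap (a, 2 * ps);
  if (b != NULL)
    munmap (b, 2 * ps);
  return result;
}
```

Calling check_at_page_end for every length from 1 up to a few vector
widths covers the small-size paths, where an over-read is most likely.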

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1,  avx2, and evex
  2021-09-15 18:30         ` Joseph Myers
@ 2021-09-27  1:35           ` Noah Goldstein via Libc-alpha
  2021-09-27  7:29             ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27  1:35 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Florian Weimer, Zack Weinberg, Zack Weinberg via Libc-alpha

On Wed, Sep 15, 2021 at 1:30 PM Joseph Myers <joseph@codesourcery.com>
wrote:

> On Wed, 15 Sep 2021, Noah Goldstein via Libc-alpha wrote:
>
> > > > I do rather like the idea of a __gnu_memeq() that compilers could
> > > > optimize memcmp calls to, when they can prove that the result is used
> > > > only for its truth value.
> > >
> > > Yes, we should use a name in the implementation namespace because even
> > > > if we pick an obvious name like memequal, it will probably come back under a
> > > different name from the C committee.
> > >
> >
> > +1
> >
> > What would be the steps for getting that into GLIBC?
>
> Define what the exact interface you want is (the exact function type and
> (reserved) name and semantics of the return value and arguments;
> explicitly including details such as whether the full n bytes of each
> argument are required to be mapped into memory even if they compare
> unequal before n bytes).
>
> Discuss it on the libc-coord mailing list (probably include compiler
> mailing lists as well) to get agreement on semantics that are good for
> both libc implementations and for compilers to generate; it's best if this
> interface is acceptable to multiple libc implementations and suitable for
> multiple compilers to generate calls to (when available in libc).
>
> Implement in glibc, across all glibc ports and including all the ABI test
> baseline updates.  If the semantics are such that an alias to memcmp is a
> valid implementation, that probably means adding such an alias to every
> memcmp implementation in glibc (and verifying with build-many-glibcs.py
> that they all build and pass the ABI tests), as well as allowing for
> architectures to add their own separate implementation of the new function
> if they wish.  There should also be execution tests that the new function
> works correctly at runtime (with different alignment, arguments just
> before unmapped pages, etc., as with other string function tests).  If the
> new function is purely an ABI, not an API, it doesn't need user manual
> documentation, however (although there will at least need to be a comment
> giving the detailed semantics that were agreed on libc-coord).
>
>
Is there some documentation for how to effectively use build-many-glibcs.py?

I've tried:

$> python3 src/glibc/scripts/build-many-glibcs.py /some/were checkout gcc-vcs-11
$> python3 src/glibc/scripts/build-many-glibcs.py /some/were host-libraries
$> python3 src/glibc/scripts/build-many-glibcs.py /some/were compilers
$> python3 src/glibc/scripts/build-many-glibcs.py /some/were glibcs

With GLIBC master I'm seeing a ton of failures so I'm not sure how I'm
supposed to actually test my patches.

I've also tried with gcc-vcs-mainline although my guess is that it will
just be a less stable version that could cause unrelated failures.



> --
> Joseph S. Myers
> joseph@codesourcery.com
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27  1:35           ` Noah Goldstein via Libc-alpha
@ 2021-09-27  7:29             ` Florian Weimer via Libc-alpha
  2021-09-27 16:49               ` Noah Goldstein via Libc-alpha
  2021-09-27 17:42               ` Joseph Myers
  0 siblings, 2 replies; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27  7:29 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

> Is there some documentation for how to effectively use build-many-glibcs.py?
>
> I've tried:
>
> $> python3 src/glibc/scripts/build-many-glibcs.py /some/were checkout gcc-vcs-11

(gcc-vcs-11 is actually the current default.)

> $> python3 src/glibc/scripts/build-many-glibcs.py /some/were host-libraries
> $> python3 src/glibc/scripts/build-many-glibcs.py /some/were compilers
> $> python3 src/glibc/scripts/build-many-glibcs.py /some/were glibcs
>
> With GLIBC master I'm seeing a ton of failures so I'm not sure how I'm
> supposed to actually test my patches.

Running build-many-glibcs.py is not a requirement for patch submission.
(One run that completes in a somewhat reasonable amount of time costs
10 USD to 20 USD in the public cloud, after all.)

I just tried a run, and it passes for me without errors.  Joseph's
tester also produces clean reports for GCC 11.

Which errors do you encounter?  For investigation, it may be prudent to
build with “--keep failed”.  Logs are always preserved.  For the Linux
targets, they can be found in logs/glibcs/…/004*.log.txt for the build
phase, and logs/glibcs/…/007*.log.txt for the check phase.

Common sources of errors are lack of disk space or memory.  I think
below 1 GiB RAM per core, it gets a bit tight, and you may have to
reduce parallelism using -j.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27  7:29             ` Florian Weimer via Libc-alpha
@ 2021-09-27 16:49               ` Noah Goldstein via Libc-alpha
  2021-09-27 16:54                 ` Florian Weimer via Libc-alpha
  2021-09-27 17:42               ` Joseph Myers
  1 sibling, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 16:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 2:29 AM Florian Weimer <fweimer@redhat.com> wrote:

> * Noah Goldstein:
>
> > Is there some documentation for how to effectively use build-many-glibcs.py?
> >
> > I've tried:
> >
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were checkout gcc-vcs-11
>
> (gcc-vcs-11 is actually the current default.)
>
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were host-libraries
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were compilers
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were glibcs
> >
> > With GLIBC master I'm seeing a ton of failures so I'm not sure how I'm
> > supposed to actually test my patches.
>
> Running build-many-glibcs.py is not a requirement for patch submission.
> (One run that completes in a somewhat reasonable amount of time costs
> 10 USD to 20 USD in the public cloud, after all.)
>
> I just tried a run, and it passes for me without errors.  Joseph's
> tester also produces clean reports for GCC 11.
>
> Which errors do you encounter?  For investigation, it may be prudent to
> build with “--keep failed”.  Logs are always preserved.  For the Linux
> targets, they can be found in logs/glibcs/…/004*.log.txt for the build
> phase, and logs/glibcs/…/007*.log.txt for the check phase.
>
> Common sources of errors are lack of disk space or memory.  I think
> below 1 GiB RAM per core, it gets a bit tight, and you may have to
> reduce parallelism using -j.
>

So I essentially get an error at the first build step in compilers (same
for any target)

i.e.

FAIL: compilers-arc-linux-gnu binutils build

Looking inside the log I see:

```
/some/were/src/binutils/gas/as.c:110:31: error: ‘DEFAULT_GENERATE_ELF_STT_COMMON’ undeclared here (not in a function)
  110 | int flag_use_elf_stt_common = DEFAULT_GENERATE_ELF_STT_COMMON;
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/some/were/src/binutils/gas/as.c:111:34: error: ‘DEFAULT_GENERATE_BUILD_NOTES’ undeclared here (not in a function)
  111 | bool flag_generate_build_notes = DEFAULT_GENERATE_BUILD_NOTES;
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/some/were/src/binutils/gas/as.c: In function ‘print_version_id’:
/some/were/src/binutils/gas/as.c:237:21: error: ‘TARGET_ALIAS’ undeclared (first use in this function); did you mean ‘TARGET_ARCH’?
  237 |            VERSION, TARGET_ALIAS, BFD_VERSION_STRING);
      |                     ^~~~~~~~~~~~
      |                     TARGET_ARCH
/some/were/src/binutils/gas/as.c:237:21: note: each undeclared identifier is reported only once for each function it appears in
/some/were/src/binutils/gas/as.c: In function ‘parse_args’:
/some/were/src/binutils/gas/as.c:700:19: error: ‘TARGET_ALIAS’ undeclared (first use in this function); did you mean ‘TARGET_ARCH’?
  700 |                   TARGET_ALIAS);
      |                   ^~~~~~~~~~~~
      |                   TARGET_ARCH
/some/were/src/binutils/gas/as.c:715:51: error: ‘TARGET_CANONICAL’ undeclared (first use in this function)
  715 |           fprintf (stderr, _("canonical = %s\n"), TARGET_CANONICAL);
      |                                                   ^~~~~~~~~~~~~~~~
/some/were/src/binutils/gas/as.c:716:50: error: ‘TARGET_CPU’ undeclared (first use in this function)
  716 |           fprintf (stderr, _("cpu-type = %s\n"), TARGET_CPU);
      |                                                  ^~~~~~~~~~
make[5]: *** [Makefile:1238: as.o] Error 1
make[5]: Leaving directory '/some/were/build/compilers/arc-linux-gnu/binutils/gas'
make[4]: *** [Makefile:1283: all-recursive] Error 1
make[3]: *** [Makefile:819: all] Error 2
make[2]: *** [Makefile:4990: all-gas] Error 2
make[1]: *** [Makefile:903: all] Error 2

FAIL: compilers-arc-linux-gnu binutils build
```

All the ensuing GLIBC builds result in UNRESOLVED, i.e.:

UNRESOLVED: glibcs-x86_64-linux-gnu build

```
Description: glibcs-x86_64-linux-gnu build
Command: make
Directory: /some/were/build/glibcs/x86_64-linux-gnu/glibc
Path addition: /some/were/install/compilers/x86_64-linux-gnu/bin


UNRESOLVED: glibcs-x86_64-linux-gnu build
```


> Thanks,
> Florian
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 16:49               ` Noah Goldstein via Libc-alpha
@ 2021-09-27 16:54                 ` Florian Weimer via Libc-alpha
  2021-09-27 17:54                   ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 16:54 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

> So I essentially get an error at the first build step in compilers (same for any target)
>
> i.e.
>
> FAIL: compilers-arc-linux-gnu binutils build
>
> Looking inside the log I see:
>
> ```
> /some/were/src/binutils/gas/as.c:110:31: error: ‘DEFAULT_GENERATE_ELF_STT_COMMON’
> undeclared here (not in a function)
>   110 | int flag_use_elf_stt_common = DEFAULT_GENERATE_ELF_STT_COMMON;
>       |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is quite weird.  It means that something went wrong when running
configure because it should be defined unconditionally.  I haven't seen
a report of this error before.

Do you use site defaults for autoconf or something like that?

Could you please check what's in config.log for the binutils build?
(You may have to pass --keep failed to the Python script.)

Thanks,
Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27  7:29             ` Florian Weimer via Libc-alpha
  2021-09-27 16:49               ` Noah Goldstein via Libc-alpha
@ 2021-09-27 17:42               ` Joseph Myers
  2021-09-27 17:48                 ` Noah Goldstein via Libc-alpha
  1 sibling, 1 reply; 51+ messages in thread
From: Joseph Myers @ 2021-09-27 17:42 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg via Libc-alpha, Zack Weinberg

On Mon, 27 Sep 2021, Florian Weimer via Libc-alpha wrote:

> * Noah Goldstein:
> 
> > Is there some documentation for how to effectively use build-many-glibcs.py?
> >
> > I've tried:
> >
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were checkout gcc-vcs-11
> 
> (gcc-vcs-11 is actually the current default.)
> 
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were host-libraries
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were compilers
> > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were glibcs
> >
> > With GLIBC master I'm seeing a ton of failures so I'm not sure how I'm
> > supposed to actually test my patches.
> 
> Running build-many-glibcs.py is not a requirement for patch submission.
> (One run that completes in a somewhat reasonable amount of time costs
> 10 USD to 20 USD in the public cloud, after all.)

However, it's certainly a good idea, when proposing a patch changing all 
the architecture-specific memcmp implementations to add a new alias, to 
test building at least one configuration using each such implementation 
(which is a smaller set than the full set of build-many-glibcs.py 
configurations).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 17:42               ` Joseph Myers
@ 2021-09-27 17:48                 ` Noah Goldstein via Libc-alpha
  0 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 17:48 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Florian Weimer, Zack Weinberg, Zack Weinberg via Libc-alpha

On Mon, Sep 27, 2021 at 12:42 PM Joseph Myers <joseph@codesourcery.com>
wrote:

> On Mon, 27 Sep 2021, Florian Weimer via Libc-alpha wrote:
>
> > * Noah Goldstein:
> >
> > > Is there some documentation for how to effectively use build-many-glibcs.py?
> > >
> > > I've tried:
> > >
> > > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were checkout gcc-vcs-11
> >
> > (gcc-vcs-11 is actually the current default.)
> >
> > > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were host-libraries
> > > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were compilers
> > > $> python3 src/glibc/scripts/build-many-glibcs.py /some/were glibcs
> > >
> > > With GLIBC master I'm seeing a ton of failures so I'm not sure how I'm
> > > supposed to actually test my patches.
> >
> > Running build-many-glibcs.py is not a requirement for patch submission.
> > (One run that completes in a somewhat reasonable amount of time costs
> > 10 USD to 20 USD in the public cloud, after all.)
>
> However, it's certainly a good idea, when proposing a patch changing all
> the architecture-specific memcmp implementations to add a new alias, to
> test building at least one configuration using each such implementation
> (which is a smaller set than the full set of build-many-glibcs.py
> configurations).
>

Agreed. Hence I'm trying to get it working before posting my patch.


>
> --
> Joseph S. Myers
> joseph@codesourcery.com
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 16:54                 ` Florian Weimer via Libc-alpha
@ 2021-09-27 17:54                   ` Noah Goldstein via Libc-alpha
  2021-09-27 17:56                     ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 17:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 11:54 AM Florian Weimer <fweimer@redhat.com> wrote:

> * Noah Goldstein:
>
> > So I essentially get an error at the first build step in compilers (same for any target)
> >
> > i.e.
> >
> > FAIL: compilers-arc-linux-gnu binutils build
> >
> > Looking inside the log I see:
> >
> > ```
> > /some/were/src/binutils/gas/as.c:110:31: error: ‘DEFAULT_GENERATE_ELF_STT_COMMON’
> > undeclared here (not in a function)
> >   110 | int flag_use_elf_stt_common = DEFAULT_GENERATE_ELF_STT_COMMON;
> >       |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This is quite weird.  It means that something went wrong when running
> configure because it should be defined unconditionally.  I haven't seen
> a report of this error before.
>
> Do you use site defaults for autoconf or something like that?
>
> Could you please check what's in config.log for the binutils build?
> (You may have to pass --keep failed to the Python script.)
>

Full command I'm using:

$> python3 src/glibc/scripts/build-many-glibcs.py build-many/ checkout gcc-vcs-11; \
   echo "Host Libraries"; \
   python3 src/glibc/scripts/build-many-glibcs.py build-many host-libraries --keep=all; \
   echo "Compilers"; \
   python3 src/glibc/scripts/build-many-glibcs.py build-many compilers x86_64-linux-gnu --keep=all; \
   echo "GLIBC"; \
   python3 src/glibc/scripts/build-many-glibcs.py build-many glibcs --keep=all

The binutils configure step seems to PASS. Here is the full log file for
x86_64-linux-gnu (my host).
Note that the binutils build for x86_64 still fails with the same error as arc.


```
$> cat build-many/logs/compilers/x86_64-linux-gnu/003-compilers-x86_64-linux-gnu-binutils-configure-log.txt

Mon 27 Sep 2021 12:44:53 PM CDT

Description: compilers-x86_64-linux-gnu binutils configure
Command: /some/were/src/binutils/configure
'--prefix=/some/were/install/compilers/x86_64-linux-gnu'
'--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu'
'--target=x86_64-glibc-linux-gnu'
'--with-sysroot=/some/were/install/compilers/x86_64-linux-gnu/sysroot'
--disable-gdb --disable-gdbserver --disable-libdecnumber --disable-readline
--disable-sim
Directory: /some/were/build/compilers/x86_64-linux-gnu/binutils
Path addition: /some/were/install/compilers/x86_64-linux-gnu/bin

checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-glibc-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln works... yes
checking whether ln -s works... yes
checking for a sed that does not truncate output... /usr/bin/sed
checking for gawk... gawk
checking for x86_64-pc-linux-gnu-gcc... no
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for gcc option to accept ISO C99... none needed
checking for x86_64-pc-linux-gnu-g++... no
checking for x86_64-pc-linux-gnu-c++... no
checking for x86_64-pc-linux-gnu-gpp... no
checking for x86_64-pc-linux-gnu-aCC... no
checking for x86_64-pc-linux-gnu-CC... no
checking for x86_64-pc-linux-gnu-cxx... no
checking for x86_64-pc-linux-gnu-cc++... no
checking for x86_64-pc-linux-gnu-cl.exe... no
checking for x86_64-pc-linux-gnu-FCC... no
checking for x86_64-pc-linux-gnu-KCC... no
checking for x86_64-pc-linux-gnu-RCC... no
checking for x86_64-pc-linux-gnu-xlC_r... no
checking for x86_64-pc-linux-gnu-xlC... no
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking whether g++ accepts -static-libstdc++ -static-libgcc... yes
checking for x86_64-pc-linux-gnu-gnatbind... no
checking for gnatbind... no
checking for x86_64-pc-linux-gnu-gnatmake... no
checking for gnatmake... no
checking whether compiler driver understands Ada... no
checking how to compare bootstrapped objects... cmp --ignore-initial=16 $$f1 $$f2
checking for objdir... .libs
checking for isl 0.15 or later... no
required isl version is 0.15 or later
*** This configuration is not supported in the following subdirectories:
     readline libdecnumber sim gdb gdbserver
    (Any other directories should still work fine.)
checking for default BUILD_CONFIG...
checking for --enable-vtable-verify... no
checking for bison... bison -y
checking for bison... bison
checking for gm4... no
checking for gnum4... no
checking for m4... m4
checking for flex... flex
checking for flex... flex
checking for makeinfo... makeinfo
checking for expect... no
checking for runtest... no
checking for x86_64-pc-linux-gnu-ar... no
checking for ar... ar
checking for x86_64-pc-linux-gnu-as... no
checking for as... as
checking for x86_64-pc-linux-gnu-dlltool... no
checking for dlltool... no
checking for x86_64-pc-linux-gnu-ld... no
checking for ld... ld
checking for x86_64-pc-linux-gnu-lipo... no
checking for lipo... no
checking for x86_64-pc-linux-gnu-nm... no
checking for nm... nm
checking for x86_64-pc-linux-gnu-ranlib... no
checking for ranlib... ranlib
checking for x86_64-pc-linux-gnu-strip... no
checking for strip... strip
checking for x86_64-pc-linux-gnu-windres... no
checking for windres... no
checking for x86_64-pc-linux-gnu-windmc... no
checking for windmc... no
checking for x86_64-pc-linux-gnu-objcopy... no
checking for objcopy... objcopy
checking for x86_64-pc-linux-gnu-objdump... no
checking for objdump... objdump
checking for x86_64-pc-linux-gnu-readelf... no
checking for readelf... readelf
checking for -plugin option... checking for x86_64-pc-linux-gnu-ar...
(cached) ar
--plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so
checking for x86_64-glibc-linux-gnu-cc... no
checking for x86_64-glibc-linux-gnu-gcc... no
checking for x86_64-glibc-linux-gnu-c++... no
checking for x86_64-glibc-linux-gnu-g++... no
checking for x86_64-glibc-linux-gnu-cxx... no
checking for x86_64-glibc-linux-gnu-gxx... no
checking for x86_64-glibc-linux-gnu-gcc... no
checking for x86_64-glibc-linux-gnu-gfortran... no
checking for x86_64-glibc-linux-gnu-gccgo... no
checking for x86_64-glibc-linux-gnu-ar... no
checking for x86_64-glibc-linux-gnu-as... no
checking for x86_64-glibc-linux-gnu-dlltool... no
checking for x86_64-glibc-linux-gnu-ld... no
checking for x86_64-glibc-linux-gnu-lipo... no
checking for x86_64-glibc-linux-gnu-nm... no
checking for x86_64-glibc-linux-gnu-objcopy... no
checking for x86_64-glibc-linux-gnu-objdump... no
checking for x86_64-glibc-linux-gnu-ranlib... no
checking for x86_64-glibc-linux-gnu-readelf... no
checking for x86_64-glibc-linux-gnu-strip... no
checking for x86_64-glibc-linux-gnu-windres... no
checking for x86_64-glibc-linux-gnu-windmc... no
checking where to find the target ar... just compiled
checking where to find the target as... just compiled
checking where to find the target cc... pre-installed
checking where to find the target c++... pre-installed
checking where to find the target c++ for libstdc++... pre-installed
checking where to find the target dlltool... just compiled
checking where to find the target gcc... pre-installed
checking where to find the target gfortran... pre-installed
checking where to find the target gccgo... pre-installed
checking where to find the target ld... just compiled
checking where to find the target lipo... pre-installed
checking where to find the target nm... just compiled
checking where to find the target objcopy... just compiled
checking where to find the target objdump... just compiled
checking where to find the target ranlib... just compiled
checking where to find the target readelf... just compiled
checking where to find the target strip... just compiled
checking where to find the target windres... just compiled
checking where to find the target windmc... just compiled
checking whether to enable maintainer-specific portions of Makefiles... no
configure: creating ./config.status
config.status: creating Makefile

PASS: compilers-x86_64-linux-gnu binutils configure

Mon 27 Sep 2021 12:44:54 PM CDT
```



> Thanks,
> Florian
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-27 17:54                   ` Noah Goldstein via Libc-alpha
@ 2021-09-27 17:56                     ` Florian Weimer via Libc-alpha
  2021-09-27 18:05                       ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 17:56 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

> $> cat
> build-many/logs/compilers/x86_64-linux-gnu/003-compilers-x86_64-linux-gnu-binutils-configure-log.txt

There should be a config.log file in the binutils build directory (under
build-many/build/compilers).  I hope this file contains illuminating
data.
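
As a rough illustration of how one might skim such a config.log, the sketch below lists the commands that configure logged with a nonzero exit status. The names `failed_commands` and `SAMPLE` are hypothetical helpers made up for this example, not part of autoconf or glibc. It assumes each command sits on a single `configure:NNNN: ... >&5` line, so commands that a mail client has wrapped across lines are reported without their text.

```python
import re

# Matches autoconf trace lines such as "configure:4337: $? = 1".
STATUS_RE = re.compile(r"^configure:(\d+): \$\? = (\d+)$")
# Matches commands autoconf logs before running them, e.g.
# "configure:4326: gcc -V >&5".
COMMAND_RE = re.compile(r"^configure:(\d+): (.+) >&5$")


def failed_commands(log_text):
    """Return (configure_line, command, exit_status) for nonzero statuses.

    Hypothetical helper: assumes unwrapped, single-line command entries.
    """
    failures = []
    last_command = None
    for line in log_text.splitlines():
        status = STATUS_RE.match(line)
        if status:
            code = int(status.group(2))
            if code != 0:
                # Fall back to a placeholder when no command line was seen.
                command = last_command or "<unrecognized command>"
                failures.append((int(status.group(1)), command, code))
            last_command = None
            continue
        command_match = COMMAND_RE.match(line)
        if command_match:
            last_command = command_match.group(2).strip()
    return failures


# Small excerpt in the same shape as a real config.log.
SAMPLE = """\
configure:4326: gcc --version >&5
configure:4337: $? = 0
configure:4326: gcc -V >&5
gcc: error: unrecognized command-line option '-V'
configure:4337: $? = 1
"""

for line_no, command, code in failed_commands(SAMPLE):
    print(f"configure:{line_no}: `{command}` exited with {code}")
```

To run it on a real log, read the file (for example the config.log under build-many/build/compilers) and pass its text to `failed_commands`. A nonzero status is not necessarily a problem: probes such as `gcc -V` are expected to fail on GCC, so the output is only a starting point for reading the log.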

Thanks,
Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-27 17:56                     ` Florian Weimer via Libc-alpha
@ 2021-09-27 18:05                       ` Noah Goldstein via Libc-alpha
  2021-09-27 18:10                         ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 18:05 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 12:56 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> > $> cat
> > build-many/logs/compilers/x86_64-linux-gnu/003-compilers-x86_64-linux-gnu-binutils-configure-log.txt
>
> There should be a config.log file in the binutils build directory (under
> build-many/build/compilers).  I hope this file contains illuminating
> data.

Oh, sorry. Here is the output of config.log:

```
$> cat build-many/build/compilers/x86_64-linux-gnu/binutils/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by configure, which was
generated by GNU Autoconf 2.69.  Invocation command line was

  $ /some/were/build-many/src/binutils/configure
--prefix=/some/were/build-many/install/compilers/x86_64-linux-gnu
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-glibc-linux-gnu
--with-sysroot=/some/were/build-many/install/compilers/x86_64-linux-gnu/sysroot
--disable-gdb --disable-gdbserver --disable-libdecnumber
--disable-readline --disable-sim

## --------- ##
## Platform. ##
## --------- ##

hostname = noah-tigerlake
uname -m = x86_64
uname -r = 5.11.0-27-generic
uname -s = Linux
uname -v = #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021

/usr/bin/uname -p = x86_64
/bin/uname -X     = unknown

/bin/arch              = x86_64
/usr/bin/arch -k       = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo      = unknown
/bin/machine           = unknown
/usr/bin/oslevel       = unknown
/bin/universe          = unknown

PATH: /some/were/build-many/install/compilers/x86_64-linux-gnu/bin
PATH: /home/noah/programs/libraries/
PATH: /home/noah/.local/bin
PATH: /home/noah/programs/pyscripts
PATH: /home/noah/scripts
PATH: /home/noah/programs/libraries/
PATH: /home/noah/.local/bin
PATH: /home/noah/programs/pyscripts
PATH: /home/noah/scripts
PATH: /home/noah/.local/bin
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /sbin
PATH: /bin
PATH: /usr/games
PATH: /usr/local/games
PATH: /snap/bin


## ----------- ##
## Core tests. ##
## ----------- ##

configure:2348: checking build system type
configure:2362: result: x86_64-pc-linux-gnu
configure:2409: checking host system type
configure:2422: result: x86_64-pc-linux-gnu
configure:2442: checking target system type
configure:2455: result: x86_64-glibc-linux-gnu
configure:2509: checking for a BSD-compatible install
configure:2577: result: /usr/bin/install -c
configure:2588: checking whether ln works
configure:2610: result: yes
configure:2614: checking whether ln -s works
configure:2618: result: yes
configure:2625: checking for a sed that does not truncate output
configure:2689: result: /usr/bin/sed
configure:2698: checking for gawk
configure:2714: found /usr/bin/gawk
configure:2725: result: gawk
configure:4021: checking for x86_64-pc-linux-gnu-gcc
configure:4051: result: no
configure:4061: checking for gcc
configure:4077: found /usr/bin/gcc
configure:4088: result: gcc
configure:4317: checking for C compiler version
configure:4326: gcc --version >&5
gcc (Ubuntu 11.1.0-1ubuntu1~20.04) 11.1.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

configure:4337: $? = 0
configure:4326: gcc -v >&5
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.1.0-1ubuntu1~20.04'
--with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2
--prefix=/usr --with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto
--enable-multiarch --disable-werror --disable-cet --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (Ubuntu 11.1.0-1ubuntu1~20.04)
... rest of stderr output deleted ...
configure:4337: $? = 0
configure:4326: gcc -V >&5
gcc: error: unrecognized command-line option '-V'
gcc: fatal error: no input files
compilation terminated.
configure:4337: $? = 1
configure:4326: gcc -qversion >&5
gcc: error: unrecognized command-line option '-qversion'; did you mean
'--version'?
gcc: fatal error: no input files
compilation terminated.
configure:4337: $? = 1
configure:4357: checking whether the C compiler works
configure:4379: gcc    conftest.c  >&5
configure:4383: $? = 0
configure:4431: result: yes
configure:4434: checking for C compiler default output file name
configure:4436: result: a.out
configure:4442: checking for suffix of executables
configure:4449: gcc -o conftest    conftest.c  >&5
configure:4453: $? = 0
configure:4475: result:
configure:4497: checking whether we are cross compiling
configure:4505: gcc -o conftest    conftest.c  >&5
configure:4509: $? = 0
configure:4516: ./conftest
configure:4520: $? = 0
configure:4508: result: no
configure:4513: checking for suffix of object files
configure:4535: gcc -c   conftest.c >&5
configure:4539: $? = 0
configure:4560: result: o
configure:4564: checking whether we are using the GNU C compiler
configure:4583: gcc -c   conftest.c >&5
configure:4583: $? = 0
configure:4592: result: yes
configure:4601: checking whether gcc accepts -g
configure:4621: gcc -c -g  conftest.c >&5
configure:4621: $? = 0
configure:4662: result: yes
configure:4679: checking for gcc option to accept ISO C89
configure:4742: gcc  -c -g -O2  conftest.c >&5
configure:4742: $? = 0
configure:4755: result: none needed
configure:4775: checking for gcc option to accept ISO C99
configure:4924: gcc  -c -g -O2  conftest.c >&5
configure:4924: $? = 0
configure:4937: result: none needed
configure:4966: checking for x86_64-pc-linux-gnu-g++
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-c++
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-gpp
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-aCC
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-CC
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-cxx
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-cc++
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-cl.exe
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-FCC
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-KCC
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-RCC
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-xlC_r
configure:4996: result: no
configure:4966: checking for x86_64-pc-linux-gnu-xlC
configure:4996: result: no
configure:5010: checking for g++
configure:5026: found /usr/bin/g++
configure:5037: result: g++
configure:5064: checking for C++ compiler version
configure:5073: g++ --version >&5
g++ (Ubuntu 11.1.0-1ubuntu1~20.04) 11.1.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

configure:5084: $? = 0
configure:5073: g++ -v >&5
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.1.0-1ubuntu1~20.04'
--with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2
--prefix=/usr --with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto
--enable-multiarch --disable-werror --disable-cet --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (Ubuntu 11.1.0-1ubuntu1~20.04)
... rest of stderr output deleted ...
configure:5084: $? = 0
configure:5073: g++ -V >&5
g++: error: unrecognized command-line option '-V'
g++: fatal error: no input files
compilation terminated.
configure:5084: $? = 1
configure:5073: g++ -qversion >&5
g++: error: unrecognized command-line option '-qversion'; did you mean
'--version'?
g++: fatal error: no input files
compilation terminated.
configure:5084: $? = 1
configure:5088: checking whether we are using the GNU C++ compiler
configure:5107: g++ -c   conftest.cpp >&5
configure:5107: $? = 0
configure:5116: result: yes
configure:5125: checking whether g++ accepts -g
configure:5145: g++ -c -g  conftest.cpp >&5
configure:5145: $? = 0
configure:5186: result: yes
configure:5235: checking whether g++ accepts -static-libstdc++ -static-libgcc
configure:5252: g++ -o conftest -g -O2   -static-libstdc++
-static-libgcc conftest.cpp  >&5
configure:5252: $? = 0
configure:5253: result: yes
configure:5277: checking for x86_64-pc-linux-gnu-gnatbind
configure:5307: result: no
configure:5317: checking for gnatbind
configure:5347: result: no
configure:5369: checking for x86_64-pc-linux-gnu-gnatmake
configure:5399: result: no
configure:5409: checking for gnatmake
configure:5439: result: no
configure:5458: checking whether compiler driver understands Ada
configure:5481: result: no
configure:5490: checking how to compare bootstrapped objects
configure:5515: result: cmp --ignore-initial=16 $$f1 $$f2
configure:5660: checking for objdir
configure:5675: result: .libs
configure:6240: checking for isl 0.15 or later
configure:6253: gcc -o conftest -g -O2      -lisl -lmpc -lmpfr -lgmp
conftest.c  -lisl -lgmp >&5
conftest.c:10:10: fatal error: isl/schedule.h: No such file or directory
   10 | #include <isl/schedule.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
configure:6253: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME ""
| #define PACKAGE_TARNAME ""
| #define PACKAGE_VERSION ""
| #define PACKAGE_STRING ""
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define LT_OBJDIR ".libs/"
| /* end confdefs.h.  */
| #include <isl/schedule.h>
| int
| main ()
| {
| isl_options_set_schedule_serialize_sccs (NULL, 0);
|   ;
|   return 0;
| }
configure:6260: result: no
configure:6264: result: required isl version is 0.15 or later
configure:7357: checking for default BUILD_CONFIG
configure:7389: result:
configure:7394: checking for --enable-vtable-verify
configure:7407: result: no
configure:8028: checking for bison
configure:8044: found /usr/bin/bison
configure:8055: result: bison -y
configure:8075: checking for bison
configure:8091: found /usr/bin/bison
configure:8102: result: bison
configure:8122: checking for gm4
configure:8152: result: no
configure:8122: checking for gnum4
configure:8152: result: no
configure:8122: checking for m4
configure:8138: found /usr/bin/m4
configure:8149: result: m4
configure:8169: checking for flex
configure:8185: found /usr/bin/flex
configure:8196: result: flex
configure:8217: checking for flex
configure:8233: found /usr/bin/flex
configure:8244: result: flex
configure:8264: checking for makeinfo
configure:8280: found /usr/bin/makeinfo
configure:8291: result: makeinfo
configure:8325: checking for expect
configure:8355: result: no
configure:8374: checking for runtest
configure:8404: result: no
configure:8480: checking for x86_64-pc-linux-gnu-ar
configure:8510: result: no
configure:8519: checking for ar
configure:8535: found /usr/bin/ar
configure:8546: result: ar
configure:8621: checking for x86_64-pc-linux-gnu-as
configure:8651: result: no
configure:8660: checking for as
configure:8676: found /usr/bin/as
configure:8687: result: as
configure:8762: checking for x86_64-pc-linux-gnu-dlltool
configure:8792: result: no
configure:8801: checking for dlltool
configure:8831: result: no
configure:8903: checking for x86_64-pc-linux-gnu-ld
configure:8933: result: no
configure:8942: checking for ld
configure:8958: found /usr/bin/ld
configure:8969: result: ld
configure:9044: checking for x86_64-pc-linux-gnu-lipo
configure:9074: result: no
configure:9083: checking for lipo
configure:9113: result: no
configure:9185: checking for x86_64-pc-linux-gnu-nm
configure:9215: result: no
configure:9224: checking for nm
configure:9240: found /usr/bin/nm
configure:9251: result: nm
configure:9326: checking for x86_64-pc-linux-gnu-ranlib
configure:9356: result: no
configure:9365: checking for ranlib
configure:9381: found /usr/bin/ranlib
configure:9392: result: ranlib
configure:9462: checking for x86_64-pc-linux-gnu-strip
configure:9492: result: no
configure:9501: checking for strip
configure:9517: found /usr/bin/strip
configure:9528: result: strip
configure:9598: checking for x86_64-pc-linux-gnu-windres
configure:9628: result: no
configure:9637: checking for windres
configure:9667: result: no
configure:9739: checking for x86_64-pc-linux-gnu-windmc
configure:9769: result: no
configure:9778: checking for windmc
configure:9808: result: no
configure:9880: checking for x86_64-pc-linux-gnu-objcopy
configure:9910: result: no
configure:9919: checking for objcopy
configure:9935: found /usr/bin/objcopy
configure:9946: result: objcopy
configure:10021: checking for x86_64-pc-linux-gnu-objdump
configure:10051: result: no
configure:10060: checking for objdump
configure:10076: found /usr/bin/objdump
configure:10087: result: objdump
configure:10162: checking for x86_64-pc-linux-gnu-readelf
configure:10192: result: no
configure:10201: checking for readelf
configure:10217: found /usr/bin/readelf
configure:10228: result: readelf
configure:10254: checking for -plugin option
configure:10272: checking for x86_64-pc-linux-gnu-ar
configure:10299: result: ar
configure:10374: result: --plugin
/usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so
configure:10486: checking for x86_64-glibc-linux-gnu-cc
configure:10516: result: no
configure:10486: checking for x86_64-glibc-linux-gnu-gcc
configure:10516: result: no
configure:10647: checking for x86_64-glibc-linux-gnu-c++
configure:10677: result: no
configure:10647: checking for x86_64-glibc-linux-gnu-g++
configure:10677: result: no
configure:10647: checking for x86_64-glibc-linux-gnu-cxx
configure:10677: result: no
configure:10647: checking for x86_64-glibc-linux-gnu-gxx
configure:10677: result: no
configure:10808: checking for x86_64-glibc-linux-gnu-gcc
configure:10838: result: no
configure:10964: checking for x86_64-glibc-linux-gnu-gfortran
configure:10994: result: no
configure:11125: checking for x86_64-glibc-linux-gnu-gccgo
configure:11155: result: no
configure:11366: checking for x86_64-glibc-linux-gnu-ar
configure:11396: result: no
configure:11596: checking for x86_64-glibc-linux-gnu-as
configure:11626: result: no
configure:11826: checking for x86_64-glibc-linux-gnu-dlltool
configure:11856: result: no
configure:12056: checking for x86_64-glibc-linux-gnu-ld
configure:12086: result: no
configure:12286: checking for x86_64-glibc-linux-gnu-lipo
configure:12316: result: no
configure:12516: checking for x86_64-glibc-linux-gnu-nm
configure:12546: result: no
configure:12746: checking for x86_64-glibc-linux-gnu-objcopy
configure:12776: result: no
configure:12976: checking for x86_64-glibc-linux-gnu-objdump
configure:13006: result: no
configure:13206: checking for x86_64-glibc-linux-gnu-ranlib
configure:13236: result: no
configure:13436: checking for x86_64-glibc-linux-gnu-readelf
configure:13466: result: no
configure:13666: checking for x86_64-glibc-linux-gnu-strip
configure:13696: result: no
configure:13896: checking for x86_64-glibc-linux-gnu-windres
configure:13926: result: no
configure:14126: checking for x86_64-glibc-linux-gnu-windmc
configure:14156: result: no
configure:14223: checking where to find the target ar
configure:14246: result: just compiled
configure:14265: checking where to find the target as
configure:14288: result: just compiled
configure:14307: checking where to find the target cc
configure:14344: result: pre-installed
configure:14349: checking where to find the target c++
configure:14389: result: pre-installed
configure:14394: checking where to find the target c++ for libstdc++
configure:14434: result: pre-installed
configure:14439: checking where to find the target dlltool
configure:14462: result: just compiled
configure:14481: checking where to find the target gcc
configure:14518: result: pre-installed
configure:14523: checking where to find the target gfortran
configure:14563: result: pre-installed
configure:14568: checking where to find the target gccgo
configure:14608: result: pre-installed
configure:14613: checking where to find the target ld
configure:14636: result: just compiled
configure:14655: checking where to find the target lipo
configure:14681: result: pre-installed
configure:14686: checking where to find the target nm
configure:14709: result: just compiled
configure:14728: checking where to find the target objcopy
configure:14751: result: just compiled
configure:14770: checking where to find the target objdump
configure:14793: result: just compiled
configure:14812: checking where to find the target ranlib
configure:14835: result: just compiled
configure:14854: checking where to find the target readelf
configure:14877: result: just compiled
configure:14896: checking where to find the target strip
configure:14919: result: just compiled
configure:14938: checking where to find the target windres
configure:14961: result: just compiled
configure:14980: checking where to find the target windmc
configure:15003: result: just compiled
configure:15050: checking whether to enable maintainer-specific
portions of Makefiles
configure:15059: result: no
configure:15294: creating ./config.status

## ---------------------- ##
## Running config.status. ##
## ---------------------- ##

This file was extended by config.status, which was
generated by GNU Autoconf 2.69.  Invocation command line was

  CONFIG_FILES    =
  CONFIG_HEADERS  =
  CONFIG_LINKS    =
  CONFIG_COMMANDS =
  $ ./config.status

on noah-tigerlake

config.status:983: creating Makefile

## ---------------- ##
## Cache variables. ##
## ---------------- ##

ac_cv_build=x86_64-pc-linux-gnu
ac_cv_c_compiler_gnu=yes
ac_cv_cxx_compiler_gnu=yes
ac_cv_env_AR_FOR_TARGET_set=
ac_cv_env_AR_FOR_TARGET_value=
ac_cv_env_AR_set=
ac_cv_env_AR_value=
ac_cv_env_AS_FOR_TARGET_set=
ac_cv_env_AS_FOR_TARGET_value=
ac_cv_env_AS_set=
ac_cv_env_AS_value=
ac_cv_env_CCC_set=
ac_cv_env_CCC_value=
ac_cv_env_CC_FOR_TARGET_set=
ac_cv_env_CC_FOR_TARGET_value=
ac_cv_env_CC_set=
ac_cv_env_CC_value=
ac_cv_env_CFLAGS_set=
ac_cv_env_CFLAGS_value=
ac_cv_env_CPPFLAGS_set=
ac_cv_env_CPPFLAGS_value=
ac_cv_env_CXXFLAGS_set=
ac_cv_env_CXXFLAGS_value=
ac_cv_env_CXX_FOR_TARGET_set=
ac_cv_env_CXX_FOR_TARGET_value=
ac_cv_env_CXX_set=
ac_cv_env_CXX_value=
ac_cv_env_DLLTOOL_FOR_TARGET_set=
ac_cv_env_DLLTOOL_FOR_TARGET_value=
ac_cv_env_DLLTOOL_set=
ac_cv_env_DLLTOOL_value=
ac_cv_env_GCC_FOR_TARGET_set=
ac_cv_env_GCC_FOR_TARGET_value=
ac_cv_env_GFORTRAN_FOR_TARGET_set=
ac_cv_env_GFORTRAN_FOR_TARGET_value=
ac_cv_env_GOC_FOR_TARGET_set=
ac_cv_env_GOC_FOR_TARGET_value=
ac_cv_env_LDFLAGS_set=
ac_cv_env_LDFLAGS_value=
ac_cv_env_LD_FOR_TARGET_set=
ac_cv_env_LD_FOR_TARGET_value=
ac_cv_env_LD_set=
ac_cv_env_LD_value=
ac_cv_env_LIBS_set=
ac_cv_env_LIBS_value=
ac_cv_env_LIPO_FOR_TARGET_set=
ac_cv_env_LIPO_FOR_TARGET_value=
ac_cv_env_LIPO_set=
ac_cv_env_LIPO_value=
ac_cv_env_NM_FOR_TARGET_set=
ac_cv_env_NM_FOR_TARGET_value=
ac_cv_env_NM_set=
ac_cv_env_NM_value=
ac_cv_env_OBJCOPY_FOR_TARGET_set=
ac_cv_env_OBJCOPY_FOR_TARGET_value=
ac_cv_env_OBJCOPY_set=
ac_cv_env_OBJCOPY_value=
ac_cv_env_OBJDUMP_FOR_TARGET_set=
ac_cv_env_OBJDUMP_FOR_TARGET_value=
ac_cv_env_OBJDUMP_set=
ac_cv_env_OBJDUMP_value=
ac_cv_env_RANLIB_FOR_TARGET_set=
ac_cv_env_RANLIB_FOR_TARGET_value=
ac_cv_env_RANLIB_set=
ac_cv_env_RANLIB_value=
ac_cv_env_READELF_FOR_TARGET_set=
ac_cv_env_READELF_FOR_TARGET_value=
ac_cv_env_READELF_set=
ac_cv_env_READELF_value=
ac_cv_env_STRIP_FOR_TARGET_set=
ac_cv_env_STRIP_FOR_TARGET_value=
ac_cv_env_STRIP_set=
ac_cv_env_STRIP_value=
ac_cv_env_WINDMC_FOR_TARGET_set=
ac_cv_env_WINDMC_FOR_TARGET_value=
ac_cv_env_WINDMC_set=
ac_cv_env_WINDMC_value=
ac_cv_env_WINDRES_FOR_TARGET_set=
ac_cv_env_WINDRES_FOR_TARGET_value=
ac_cv_env_WINDRES_set=
ac_cv_env_WINDRES_value=
ac_cv_env_build_alias_set=set
ac_cv_env_build_alias_value=x86_64-pc-linux-gnu
ac_cv_env_build_configargs_set=
ac_cv_env_build_configargs_value=
ac_cv_env_host_alias_set=set
ac_cv_env_host_alias_value=x86_64-pc-linux-gnu
ac_cv_env_host_configargs_set=
ac_cv_env_host_configargs_value=
ac_cv_env_target_alias_set=set
ac_cv_env_target_alias_value=x86_64-glibc-linux-gnu
ac_cv_env_target_configargs_set=
ac_cv_env_target_configargs_value=
ac_cv_host=x86_64-pc-linux-gnu
ac_cv_objext=o
ac_cv_path_SED=/usr/bin/sed
ac_cv_path_install='/usr/bin/install -c'
ac_cv_prog_AR=ar
ac_cv_prog_AS=as
ac_cv_prog_AWK=gawk
ac_cv_prog_BISON=bison
ac_cv_prog_FLEX=flex
ac_cv_prog_LD=ld
ac_cv_prog_LEX=flex
ac_cv_prog_M4=m4
ac_cv_prog_MAKEINFO=makeinfo
ac_cv_prog_NM=nm
ac_cv_prog_OBJCOPY=objcopy
ac_cv_prog_OBJDUMP=objdump
ac_cv_prog_RANLIB=ranlib
ac_cv_prog_READELF=readelf
ac_cv_prog_STRIP=strip
ac_cv_prog_YACC='bison -y'
ac_cv_prog_ac_ct_CC=gcc
ac_cv_prog_ac_ct_CXX=g++
ac_cv_prog_cc_c89=
ac_cv_prog_cc_c99=
ac_cv_prog_cc_g=yes
ac_cv_prog_cxx_g=yes
ac_cv_target=x86_64-glibc-linux-gnu
acx_cv_cc_gcc_supports_ada=no
acx_cv_prog_LN=ln
gcc_cv_isl=no
gcc_cv_prog_cmp_skip='cmp --ignore-initial=16 $$f1 $$f2'
gcc_cv_tool_dirs=
gcc_cv_tool_prefix=/some/were/build-many/install/compilers/x86_64-linux-gnu
lt_cv_objdir=.libs

## ----------------- ##
## Output variables. ##
## ----------------- ##

AR='ar'
AR_FOR_BUILD='$(AR)'
AR_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/ar'
AR_PLUGIN_OPTION='--plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so'
AS='as'
AS_FOR_BUILD='$(AS)'
AS_FOR_TARGET='$$r/$(HOST_SUBDIR)/gas/as-new'
AWK='gawk'
BISON='bison'
BUILD_CONFIG=''
CC='gcc'
CC_FOR_BUILD='$(CC)'
CC_FOR_TARGET='x86_64-glibc-linux-gnu-cc'
CFLAGS='-g -O2'
CFLAGS_FOR_BUILD='-g -O2'
CFLAGS_FOR_TARGET='-g -O2'
COMPILER_AS_FOR_TARGET='$(AS_FOR_TARGET)'
COMPILER_LD_FOR_TARGET='$(LD_FOR_TARGET)'
COMPILER_NM_FOR_TARGET='$(NM_FOR_TARGET)'
CONFIGURE_GDB_TK=''
CPPFLAGS=''
CXX='g++'
CXXFLAGS='-g -O2'
CXXFLAGS_FOR_BUILD='-g -O2'
CXXFLAGS_FOR_TARGET='-g -O2'
CXX_FOR_BUILD='$(CXX)'
CXX_FOR_TARGET='x86_64-glibc-linux-gnu-c++'
DEBUG_PREFIX_CFLAGS_FOR_TARGET=''
DEFS='-DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\"
-DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\"
-DPACKAGE_URL=\"\" -DLT_OBJDIR=\".libs/\"'
DLLTOOL='dlltool'
DLLTOOL_FOR_BUILD='$(DLLTOOL)'
DLLTOOL_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/dlltool'
ECHO_C=''
ECHO_N='-n'
ECHO_T=''
EXEEXT=''
EXPECT='expect'
FLAGS_FOR_TARGET=' -L$$r/$(HOST_SUBDIR)/ld'
FLEX='flex'
GCC_FOR_TARGET='x86_64-glibc-linux-gnu-gcc'
GCC_SHLIB_SUBDIR=''
GDB_TK=''
GFORTRAN_FOR_BUILD='$(GFORTRAN)'
GFORTRAN_FOR_TARGET='x86_64-glibc-linux-gnu-gfortran'
GNATBIND='no'
GNATMAKE='no'
GOC_FOR_BUILD='$(GOC)'
GOC_FOR_TARGET='x86_64-glibc-linux-gnu-gccgo'
INSTALL_DATA='${INSTALL} -m 644'
INSTALL_GDB_TK=''
INSTALL_PROGRAM='${INSTALL}'
INSTALL_SCRIPT='${INSTALL}'
LD='ld'
LDFLAGS=''
LDFLAGS_FOR_BUILD=''
LDFLAGS_FOR_TARGET=''
LD_FOR_BUILD='$(LD)'
LD_FOR_TARGET='$$r/$(HOST_SUBDIR)/ld/ld-new'
LEX='flex'
LIBOBJS=''
LIBS=''
LIPO='lipo'
LIPO_FOR_TARGET='x86_64-glibc-linux-gnu-lipo'
LN='ln'
LN_S='ln -s'
LTLIBOBJS=''
M4='m4'
MAINT='#'
MAINTAINER_MODE_FALSE=''
MAINTAINER_MODE_TRUE='#'
MAKEINFO='makeinfo'
NM='nm'
NM_FOR_BUILD='$(NM)'
NM_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/nm-new'
OBJCOPY='objcopy'
OBJCOPY_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/objcopy'
OBJDUMP='objdump'
OBJDUMP_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/objdump'
OBJEXT='o'
PACKAGE_BUGREPORT=''
PACKAGE_NAME=''
PACKAGE_STRING=''
PACKAGE_TARNAME=''
PACKAGE_URL=''
PACKAGE_VERSION=''
PATH_SEPARATOR=':'
PGO_BUILD_GEN_CFLAGS=''
PGO_BUILD_LTO_CFLAGS=''
PGO_BUILD_USE_CFLAGS=''
RANLIB='ranlib'
RANLIB_FOR_BUILD='$(RANLIB)'
RANLIB_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/ranlib'
RANLIB_PLUGIN_OPTION='--plugin
/usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so'
RAW_CXX_FOR_TARGET='x86_64-glibc-linux-gnu-c++'
READELF='readelf'
READELF_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/readelf'
RPATH_ENVVAR='LD_LIBRARY_PATH'
RUNTEST='runtest'
SED='/usr/bin/sed'
SHELL='/bin/bash'
STRIP='strip'
STRIP_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/strip-new'
SYSROOT_CFLAGS_FOR_TARGET=''
TOPLEVEL_CONFIGURE_ARGUMENTS='/some/were/build-many/src/binutils/configure
--prefix=/some/were/build-many/install/compilers/x86_64-linux-gnu
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-glibc-linux-gnu
--with-sysroot=/some/were/build-many/install/compilers/x86_64-linux-gnu/sysroot
--disable-gdb --disable-gdbserver --disable-libdecnumber
--disable-readline --disable-sim'
WINDMC='windmc'
WINDMC_FOR_BUILD='$(WINDMC)'
WINDMC_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/windmc'
WINDRES='windres'
WINDRES_FOR_BUILD='$(WINDRES)'
WINDRES_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/windres'
YACC='bison -y'
ac_ct_CC='gcc'
ac_ct_CXX='g++'
bindir='${exec_prefix}/bin'
build='x86_64-pc-linux-gnu'
build_alias='x86_64-pc-linux-gnu'
build_configargs=' --cache-file=./config.cache
'\''--prefix=/some/were/build-many/install/compilers/x86_64-linux-gnu'\''
'\''--with-sysroot=/some/were/build-many/install/compilers/x86_64-linux-gnu/sysroot'\''
'\''--disable-gdb'\'' '\''--disable-gdbserver'\''
'\''--disable-libdecnumber'\'' '\''--disable-readline'\''
'\''--disable-sim'\''
--program-transform-name='\''s&^&x86_64-glibc-linux-gnu-&'\''
--disable-option-checking'
build_configdirs=' libiberty'
build_cpu='x86_64'
build_libsubdir='build-x86_64-pc-linux-gnu'
build_noncanonical='x86_64-pc-linux-gnu'
build_os='linux-gnu'
build_subdir='build-x86_64-pc-linux-gnu'
build_tooldir='${exec_prefix}/x86_64-glibc-linux-gnu'
build_vendor='pc'
compare_exclusions='gcc/cc*-checksum$(objext) | gcc/ada/*tools/*'
configdirs=' intl libiberty opcodes bfd zlib libctf binutils gas ld gprof etc'
datadir='${datarootdir}'
datarootdir='${prefix}/share'
do_compare='cmp --ignore-initial=16 $$f1 $$f2'
docdir='${datarootdir}/doc/${PACKAGE}'
dvidir='${docdir}'
exec_prefix='${prefix}'
extra_host_libiberty_configure_flags=''
extra_host_zlib_configure_flags=''
extra_isl_gmp_configure_flags=''
extra_liboffloadmic_configure_flags=''
extra_linker_plugin_configure_flags=''
extra_linker_plugin_flags=''
extra_mpc_gmp_configure_flags=''
extra_mpc_mpfr_configure_flags=''
extra_mpfr_configure_flags=''
get_gcc_base_ver='cat'
gmpinc=''
gmplibs='-lmpc -lmpfr -lgmp'
host='x86_64-pc-linux-gnu'
host_alias='x86_64-pc-linux-gnu'
host_configargs=' --cache-file=./config.cache  --with-gnu-as
--with-gnu-ld '\''--prefix=/some/were/build-many/install/compilers/x86_64-linux-gnu'\''
'\''--with-sysroot=/some/were/build-many/install/compilers/x86_64-linux-gnu/sysroot'\''
'\''--disable-gdb'\'' '\''--disable-gdbserver'\''
'\''--disable-libdecnumber'\'' '\''--disable-readline'\''
'\''--disable-sim'\''
--program-transform-name='\''s&^&x86_64-glibc-linux-gnu-&'\''
--disable-option-checking'
host_cpu='x86_64'
host_noncanonical='x86_64-pc-linux-gnu'
host_os='linux-gnu'
host_shared='no'
host_subdir='.'
host_vendor='pc'
htmldir='${docdir}'
includedir='${prefix}/include'
infodir='${datarootdir}/info'
islinc=''
isllibs=''
libdir='${exec_prefix}/lib'
libexecdir='${exec_prefix}/libexec'
localedir='${datarootdir}/locale'
localstatedir='${prefix}/var'
mandir='${datarootdir}/man'
oldincludedir='/usr/include'
pdfdir='${docdir}'
poststage1_ldflags='-static-libstdc++ -static-libgcc'
poststage1_libs=''
prefix='/some/were/build-many/install/compilers/x86_64-linux-gnu'
program_transform_name='s&^&x86_64-glibc-linux-gnu-&'
psdir='${docdir}'
sbindir='${exec_prefix}/sbin'
sharedstatedir='${prefix}/com'
stage1_cflags='-g'
stage1_checking='--enable-checking=yes,types'
stage1_languages=',c,'
stage1_ldflags=''
stage1_libs=''
stage2_werror_flag=''
sysconfdir='${prefix}/etc'
target='x86_64-glibc-linux-gnu'
target_alias='x86_64-glibc-linux-gnu'
target_configargs='--cache-file=./config.cache --enable-multilib
--with-cross-host=x86_64-pc-linux-gnu
'\''--prefix=/some/were/build-many/install/compilers/x86_64-linux-gnu'\''
'\''--with-sysroot=/some/were/build-many/install/compilers/x86_64-linux-gnu/sysroot'\''
'\''--disable-gdb'\'' '\''--disable-gdbserver'\''
'\''--disable-libdecnumber'\'' '\''--disable-readline'\''
'\''--disable-sim'\''
--program-transform-name='\''s&^&x86_64-glibc-linux-gnu-&'\''
--disable-option-checking'
target_configdirs=''
target_cpu='x86_64'
target_noncanonical='x86_64-glibc-linux-gnu'
target_os='linux-gnu'
target_subdir='x86_64-glibc-linux-gnu'
target_vendor='glibc'
tooldir='${exec_prefix}/x86_64-glibc-linux-gnu'

## ------------------- ##
## File substitutions. ##
## ------------------- ##

alphaieee_frag='/dev/null'
host_makefile_frag='/dev/null'
ospace_frag='/dev/null'
serialization_dependencies='serdep.tmp'
target_makefile_frag='/some/were/build-many/src/binutils/config/mt-gnu'

## ----------- ##
## confdefs.h. ##
## ----------- ##

/* confdefs.h */
#define PACKAGE_NAME ""
#define PACKAGE_TARNAME ""
#define PACKAGE_VERSION ""
#define PACKAGE_STRING ""
#define PACKAGE_BUGREPORT ""
#define PACKAGE_URL ""
#define LT_OBJDIR ".libs/"

configure: exit 0
```
>
> Thanks,
> Florian
>


* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-27 18:05                       ` Noah Goldstein via Libc-alpha
@ 2021-09-27 18:10                         ` Florian Weimer via Libc-alpha
  2021-09-27 18:15                           ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 18:10 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

> On Mon, Sep 27, 2021 at 12:56 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Noah Goldstein:
>>
>> > $> cat
>> > build-many/logs/compilers/x86_64-linux-gnu/003-compilers-x86_64-linux-gnu-binutils-configure-log.txt
>>
>> There should be a config.log file in the binutils build directory (under
>> build-many/build/compilers).  I hope this file contains illuminating
>> data.
>
> Oh sorry. Here is the output of config.log
>
> ```
> $> cat build-many/build/compilers/x86_64-linux-gnu/binutils/config.log

Hmm, is there a binutils/gas/config.log as well? 

Thanks,
Florian



* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex
  2021-09-27 18:10                         ` Florian Weimer via Libc-alpha
@ 2021-09-27 18:15                           ` Noah Goldstein via Libc-alpha
  2021-09-27 18:22                             ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 18:15 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 1:11 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> > On Mon, Sep 27, 2021 at 12:56 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Noah Goldstein:
> >>
> >> > $> cat
> >> > build-many/logs/compilers/x86_64-linux-gnu/003-compilers-x86_64-linux-gnu-binutils-configure-log.txt
> >>
> >> There should be a config.log file in the binutils build directory (under
> >> build-many/build/compilers).  I hope this file contains illuminating
> >> data.
> >
> > Oh sorry. Here is the output of config.log
> >
> > ```
> > $> cat build-many/build/compilers/x86_64-linux-gnu/binutils/config.log
>
> Hmm, is there a binutils/gas/config.log as well?

Here is the dump. Thanks for the help!

```
$> cat build-many/build/compilers/x86_64-linux-gnu/binutils/gas/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by gas configure 2.37, which was
generated by GNU Autoconf 2.69.  Invocation command line was

  $ /some/were/build-many/src/binutils/gas/configure
--srcdir=/some/were/build-many/src/binutils/gas
--cache-file=./config.cache --with-gnu-as --with-gnu-ld
--prefix=/some/were/build-many/install/compilers/x86_64-linux-gnu
--with-sysroot=/some/were/build-many/install/compilers/x86_64-linux-gnu/sysroot
--disable-gdb --disable-gdbserver --disable-libdecnumber
--disable-readline --disable-sim
--program-transform-name=s&^&x86_64-glibc-linux-gnu-&
--disable-option-checking --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=x86_64-glibc-linux-gnu

## --------- ##
## Platform. ##
## --------- ##

hostname = noah-tigerlake
uname -m = x86_64
uname -r = 5.11.0-27-generic
uname -s = Linux
uname -v = #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021

/usr/bin/uname -p = x86_64
/bin/uname -X     = unknown

/bin/arch              = x86_64
/usr/bin/arch -k       = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo      = unknown
/bin/machine           = unknown
/usr/bin/oslevel       = unknown
/bin/universe          = unknown

PATH: /some/were/build-many/install/compilers/x86_64-linux-gnu/bin
PATH: /home/noah/programs/libraries/
PATH: /home/noah/.local/bin
PATH: /home/noah/programs/pyscripts
PATH: /home/noah/scripts
PATH: /home/noah/programs/libraries/
PATH: /home/noah/.local/bin
PATH: /home/noah/programs/pyscripts
PATH: /home/noah/scripts
PATH: /home/noah/.local/bin
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /sbin
PATH: /bin
PATH: /usr/games
PATH: /usr/local/games
PATH: /snap/bin


## ----------- ##
## Core tests. ##
## ----------- ##

configure:2261: creating cache ./config.cache
configure:2372: checking build system type
configure:2386: result: x86_64-pc-linux-gnu
configure:2406: checking host system type
configure:2419: result: x86_64-pc-linux-gnu
configure:2439: checking target system type
configure:2452: result: x86_64-glibc-linux-gnu
configure:2495: checking for a BSD-compatible install
configure:2563: result: /usr/bin/install -c
configure:2574: checking whether build environment is sane
configure:2629: result: yes
configure:2778: checking for a thread-safe mkdir -p
configure:2817: result: /usr/bin/mkdir -p
configure:2824: checking for gawk
configure:2851: result: gawk
configure:2862: checking whether make sets $(MAKE)
configure:2884: result: yes
configure:2913: checking whether make supports nested variables
configure:2930: result: yes
configure:3065: checking for x86_64-pc-linux-gnu-gcc
configure:3092: result: gcc
configure:3361: checking for C compiler version
configure:3370: gcc --version >&5
gcc (Ubuntu 11.1.0-1ubuntu1~20.04) 11.1.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

configure:3381: $? = 0
configure:3370: gcc -v >&5
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.1.0-1ubuntu1~20.04'
--with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2
--prefix=/usr --with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto
--enable-multiarch --disable-werror --disable-cet --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (Ubuntu 11.1.0-1ubuntu1~20.04)
... rest of stderr output deleted ...
configure:3381: $? = 0
configure:3370: gcc -V >&5
gcc: error: unrecognized command-line option '-V'
gcc: fatal error: no input files
compilation terminated.
configure:3381: $? = 1
configure:3370: gcc -qversion >&5
gcc: error: unrecognized command-line option '-qversion'; did you mean '--version'?
gcc: fatal error: no input files
compilation terminated.
configure:3381: $? = 1
configure:3401: checking whether the C compiler works
configure:3423: gcc -g -O2        conftest.c  >&5
configure:3427: $? = 0
configure:3475: result: yes
configure:3478: checking for C compiler default output file name
configure:3480: result: a.out
configure:3486: checking for suffix of executables
configure:3493: gcc -o conftest -g -O2        conftest.c  >&5
configure:3497: $? = 0
configure:3519: result:
configure:3541: checking whether we are cross compiling
configure:3549: gcc -o conftest -g -O2        conftest.c  >&5
configure:3553: $? = 0
configure:3560: ./conftest
configure:3564: $? = 0
configure:3552: result: no
configure:3557: checking for suffix of object files
configure:3579: gcc -c -g -O2      conftest.c >&5
configure:3583: $? = 0
configure:3604: result: o
configure:3608: checking whether we are using the GNU C compiler
configure:3627: gcc -c -g -O2      conftest.c >&5
configure:3627: $? = 0
configure:3636: result: yes
configure:3645: checking whether gcc accepts -g
configure:3665: gcc -c -g  conftest.c >&5
configure:3665: $? = 0
configure:3706: result: yes
configure:3723: checking for gcc option to accept ISO C89
configure:3786: gcc  -c -g -O2      conftest.c >&5
configure:3786: $? = 0
configure:3799: result: none needed
configure:3824: checking whether gcc understands -c and -o together
configure:3846: gcc -c conftest.c -o conftest2.o
configure:3849: $? = 0
configure:3846: gcc -c conftest.c -o conftest2.o
configure:3849: $? = 0
configure:3861: result: yes
configure:3889: checking for style of include used by make
configure:3917: result: GNU
configure:3943: checking dependency style of gcc
configure:4054: result: gcc3
configure:4075: checking how to run the C preprocessor
configure:4106: gcc -E  conftest.c
configure:4106: $? = 0
configure:4120: gcc -E  conftest.c
conftest.c:11:10: fatal error: ac_nonexistent.h: No such file or directory
   11 | #include <ac_nonexistent.h>
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.
configure:4120: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| /* end confdefs.h.  */
| #include <ac_nonexistent.h>
configure:4145: result: gcc -E
configure:4165: gcc -E  conftest.c
configure:4165: $? = 0
configure:4179: gcc -E  conftest.c
conftest.c:11:10: fatal error: ac_nonexistent.h: No such file or directory
   11 | #include <ac_nonexistent.h>
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.
configure:4179: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| /* end confdefs.h.  */
| #include <ac_nonexistent.h>
configure:4208: checking for grep that handles long lines and -e
configure:4266: result: /usr/bin/grep
configure:4271: checking for egrep
configure:4333: result: /usr/bin/grep -E
configure:4338: checking for ANSI C header files
configure:4358: gcc -c -g -O2      conftest.c >&5
configure:4358: $? = 0
configure:4431: gcc -o conftest -g -O2        conftest.c  >&5
configure:4431: $? = 0
configure:4431: ./conftest
configure:4431: $? = 0
configure:4442: result: yes
configure:4455: checking for sys/types.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for sys/stat.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for stdlib.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for string.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for memory.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for strings.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for inttypes.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for stdint.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4455: checking for unistd.h
configure:4455: gcc -c -g -O2      conftest.c >&5
configure:4455: $? = 0
configure:4455: result: yes
configure:4468: checking minix/config.h usability
configure:4468: gcc -c -g -O2      conftest.c >&5
conftest.c:54:10: fatal error: minix/config.h: No such file or directory
   54 | #include <minix/config.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
configure:4468: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| /* end confdefs.h.  */
| #include <stdio.h>
| #ifdef HAVE_SYS_TYPES_H
| # include <sys/types.h>
| #endif
| #ifdef HAVE_SYS_STAT_H
| # include <sys/stat.h>
| #endif
| #ifdef STDC_HEADERS
| # include <stdlib.h>
| # include <stddef.h>
| #else
| # ifdef HAVE_STDLIB_H
| #  include <stdlib.h>
| # endif
| #endif
| #ifdef HAVE_STRING_H
| # if !defined STDC_HEADERS && defined HAVE_MEMORY_H
| #  include <memory.h>
| # endif
| # include <string.h>
| #endif
| #ifdef HAVE_STRINGS_H
| # include <strings.h>
| #endif
| #ifdef HAVE_INTTYPES_H
| # include <inttypes.h>
| #endif
| #ifdef HAVE_STDINT_H
| # include <stdint.h>
| #endif
| #ifdef HAVE_UNISTD_H
| # include <unistd.h>
| #endif
| #include <minix/config.h>
configure:4468: result: no
configure:4468: checking minix/config.h presence
configure:4468: gcc -E  conftest.c
conftest.c:21:10: fatal error: minix/config.h: No such file or directory
   21 | #include <minix/config.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
configure:4468: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| /* end confdefs.h.  */
| #include <minix/config.h>
configure:4468: result: no
configure:4468: checking for minix/config.h
configure:4468: result: no
configure:4489: checking whether it is safe to define __EXTENSIONS__
configure:4507: gcc -c -g -O2      conftest.c >&5
configure:4507: $? = 0
configure:4514: result: yes
configure:4577: checking how to print strings
configure:4604: result: printf
configure:4625: checking for a sed that does not truncate output
configure:4689: result: /usr/bin/sed
configure:4707: checking for fgrep
configure:4769: result: /usr/bin/grep -F
configure:4804: checking for ld used by gcc
configure:4871: result: ld
configure:4878: checking if the linker (ld) is GNU ld
configure:4893: result: yes
configure:4905: checking for BSD- or MS-compatible name lister (nm)
configure:4954: result: nm
configure:5084: checking the name lister (nm) interface
configure:5091: gcc -c -g -O2      conftest.c >&5
configure:5094: nm "conftest.o"
configure:5097: output
0000000000000000 B some_variable
configure:5098: result: BSD nm
configure:5101: checking whether ln -s works
configure:5105: result: yes
configure:5113: checking the maximum length of command line arguments
configure:5238: result: 1879296
configure:5255: checking whether the shell understands some XSI constructs
configure:5264: result: yes
configure:5268: checking whether the shell understands "+="
configure:5272: result: yes
configure:5307: checking for ld option to reload object files
configure:5314: result: -r
configure:5343: checking for x86_64-pc-linux-gnu-objdump
configure:5370: result: objdump
configure:5442: checking how to recognize dependent libraries
configure:5643: result: pass_all
configure:5676: checking for x86_64-pc-linux-gnu-ar
configure:5703: result: ar --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so
configure:5794: checking for x86_64-pc-linux-gnu-strip
configure:5824: result: no
configure:5834: checking for strip
configure:5850: found /usr/bin/strip
configure:5861: result: strip
configure:5893: checking for x86_64-pc-linux-gnu-ranlib
configure:5920: result: ranlib --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so
configure:6067: checking command to parse nm output from gcc object
configure:6185: gcc -c -g -O2      conftest.c >&5
configure:6188: $? = 0
configure:6192: nm conftest.o \| sed -n -e 's/^.*[
]\([ABCDGIRSTW][ABCDGIRSTW]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2
\2/p' \> conftest.nm
configure:6195: $? = 0
configure:6249: gcc -o conftest -g -O2        conftest.c conftstm.o >&5
configure:6252: $? = 0
configure:6290: result: ok
configure:6385: gcc -c -g -O2      conftest.c >&5
configure:6388: $? = 0
configure:7138: checking for dlfcn.h
configure:7138: gcc -c -g -O2      conftest.c >&5
configure:7138: $? = 0
configure:7138: result: yes
configure:7325: checking for objdir
configure:7340: result: .libs
configure:7611: checking if gcc supports -fno-rtti -fno-exceptions
configure:7629: gcc -c -g -O2      -fno-rtti -fno-exceptions conftest.c >&5
cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
configure:7633: $? = 0
configure:7646: result: no
configure:7666: checking for gcc option to produce PIC
configure:7952: result: -fPIC -DPIC
configure:7964: checking if gcc PIC flag -fPIC -DPIC works
configure:7982: gcc -c -g -O2      -fPIC -DPIC -DPIC conftest.c >&5
configure:7986: $? = 0
configure:7999: result: yes
configure:8023: checking if gcc static flag -static works
configure:8051: result: yes
configure:8066: checking if gcc supports -c -o file.o
configure:8087: gcc -c -g -O2      -o out/conftest2.o conftest.c >&5
configure:8091: $? = 0
configure:8113: result: yes
configure:8121: checking if gcc supports -c -o file.o
configure:8168: result: yes
configure:8201: checking whether the gcc linker (ld -m elf_x86_64) supports shared libraries
configure:9272: result: yes
configure:9309: checking whether -lc should be explicitly linked in
configure:9317: gcc -c -g -O2      conftest.c >&5
configure:9320: $? = 0
configure:9335: gcc -shared  -fPIC -DPIC conftest.o  -v -Wl,-soname
-Wl,conftest -o conftest 2\>\&1 \| /usr/bin/grep  -lc  \>/dev/null
2\>\&1
configure:9338: $? = 0
configure:9352: result: no
configure:9517: checking dynamic linker characteristics
configure:9958: gcc -o conftest -g -O2        -Wl,-rpath -Wl,/foo conftest.c  >&5
configure:9958: $? = 0
configure:10180: result: GNU/Linux ld.so
configure:10287: checking how to hardcode library paths into programs
configure:10312: result: immediate
configure:10852: checking whether stripping libraries is possible
configure:10857: result: yes
configure:10892: checking if libtool supports shared libraries
configure:10894: result: yes
configure:10897: checking whether to build shared libraries
configure:10918: result: yes
configure:10921: checking whether to build static libraries
configure:10925: result: yes
configure:8130: checking for dlfcn.h
configure:8130: result: yes
configure:8143: checking for windows.h
configure:8143: gcc -c -g -O2      conftest.c >&5
conftest.c:63:10: fatal error: windows.h: No such file or directory
   63 | #include <windows.h>
      |          ^~~~~~~~~~~
compilation terminated.
configure:8143: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define __EXTENSIONS__ 1
| #define _ALL_SOURCE 1
| #define _GNU_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define _TANDEM_SOURCE 1
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define HAVE_DLFCN_H 1
| /* end confdefs.h.  */
| #include <stdio.h>
| #ifdef HAVE_SYS_TYPES_H
| # include <sys/types.h>
| #endif
| #ifdef HAVE_SYS_STAT_H
| # include <sys/stat.h>
| #endif
| #ifdef STDC_HEADERS
| # include <stdlib.h>
| # include <stddef.h>
| #else
| # ifdef HAVE_STDLIB_H
| #  include <stdlib.h>
| # endif
| #endif
| #ifdef HAVE_STRING_H
| # if !defined STDC_HEADERS && defined HAVE_MEMORY_H
| #  include <memory.h>
| # endif
| # include <string.h>
| #endif
| #ifdef HAVE_STRINGS_H
| # include <strings.h>
| #endif
| #ifdef HAVE_INTTYPES_H
| # include <inttypes.h>
| #endif
| #ifdef HAVE_STDINT_H
| # include <stdint.h>
| #endif
| #ifdef HAVE_UNISTD_H
| # include <unistd.h>
| #endif
|
| #include <windows.h>
configure:8143: result: no
configure:8170: checking for library containing dlsym
configure:8201: gcc -o conftest -g -O2        conftest.c  >&5
/usr/bin/ld: /tmp/ccdjFoMs.o: in function `main':
/some/were/build-many/build/compilers/x86_64-linux-gnu/binutils/gas/conftest.c:40:
undefined reference to `dlsym'
collect2: error: ld returned 1 exit status
configure:8201: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define __EXTENSIONS__ 1
| #define _ALL_SOURCE 1
| #define _GNU_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define _TANDEM_SOURCE 1
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define HAVE_DLFCN_H 1
| /* end confdefs.h.  */
|
| /* Override any GCC internal prototype to avoid an error.
|    Use char because int might match the return type of a GCC
|    builtin and then its argument prototype would still apply.  */
| #ifdef __cplusplus
| extern "C"
| #endif
| char dlsym ();
| int
| main ()
| {
| return dlsym ();
|   ;
|   return 0;
| }
configure:8201: gcc -o conftest -g -O2        conftest.c -ldl   >&5
configure:8201: $? = 0
configure:8218: result: -ldl
configure:8290: checking for special C compiler options needed for large files
configure:8335: result: no
configure:8341: checking for _FILE_OFFSET_BITS value needed for large files
configure:8366: gcc -c -g -O2      conftest.c >&5
configure:8366: $? = 0
configure:8398: result: no
configure:8484: checking how to compare bootstrapped objects
configure:8509: result: cmp --ignore-initial=16 $$f1 $$f2
configure:8814: checking whether byte ordering is bigendian
configure:8829: gcc -c -g -O2      conftest.c >&5
conftest.c:31:16: error: unknown type name 'not'
   31 |                not a universal capable compiler
      |                ^~~
conftest.c:31:22: error: expected '=', ',', ';', 'asm' or
'__attribute__' before 'universal'
   31 |                not a universal capable compiler
      |                      ^~~~~~~~~
conftest.c:31:22: error: unknown type name 'universal'
configure:8829: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define __EXTENSIONS__ 1
| #define _ALL_SOURCE 1
| #define _GNU_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define _TANDEM_SOURCE 1
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define HAVE_DLFCN_H 1
| #define ENABLE_CHECKING 1
| /* end confdefs.h.  */
| #ifndef __APPLE_CC__
|        not a universal capable compiler
|      #endif
|      typedef int dummy;
|
configure:8874: gcc -c -g -O2      conftest.c >&5
configure:8874: $? = 0
configure:8892: gcc -c -g -O2      conftest.c >&5
conftest.c: In function 'main':
conftest.c:37:18: error: unknown type name 'not'; did you mean 'ino_t'?
   37 |                  not big endian
      |                  ^~~
      |                  ino_t
conftest.c:37:26: error: expected '=', ',', ';', 'asm' or
'__attribute__' before 'endian'
   37 |                  not big endian
      |                          ^~~~~~
configure:8892: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define __EXTENSIONS__ 1
| #define _ALL_SOURCE 1
| #define _GNU_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define _TANDEM_SOURCE 1
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define HAVE_DLFCN_H 1
| #define ENABLE_CHECKING 1
| /* end confdefs.h.  */
| #include <sys/types.h>
| #include <sys/param.h>
|
| int
| main ()
| {
| #if BYTE_ORDER != BIG_ENDIAN
| not big endian
| #endif
|
|   ;
|   return 0;
| }
configure:9020: result: no
configure:9959: checking for bison
configure:9986: result: bison -y
configure:10002: checking for flex
configure:10029: result: flex
configure:10068: flex conftest.l
configure:10072: $? = 0
configure:10074: checking lex output file root
configure:10088: result: lex.yy
configure:10093: checking lex library
configure:10107: gcc -o conftest -g -O2        conftest.c  -ldl  >&5
/usr/bin/ld: /tmp/cc2Y69vh.o: in function `input':
/some/were/build-many/build/compilers/x86_64-linux-gnu/binutils/gas/lex.yy.c:1180:
undefined reference to `yywrap'
/usr/bin/ld: /tmp/cc2Y69vh.o: in function `yylex':
/some/were/build-many/build/compilers/x86_64-linux-gnu/binutils/gas/lex.yy.c:871:
undefined reference to `yywrap'
/usr/bin/ld: /tmp/cc2Y69vh.o: in function `main':
/some/were/build-many/build/compilers/x86_64-linux-gnu/binutils/gas/conftest.l:17:
undefined reference to `yywrap'
collect2: error: ld returned 1 exit status
configure:10107: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "gas"
| #define PACKAGE_TARNAME "gas"
| #define PACKAGE_VERSION "2.37"
| #define PACKAGE_STRING "gas 2.37"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define PACKAGE "gas"
| #define VERSION "2.37"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define __EXTENSIONS__ 1
| #define _ALL_SOURCE 1
| #define _GNU_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define _TANDEM_SOURCE 1
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define HAVE_DLFCN_H 1
| #define ENABLE_CHECKING 1
| #define DEFAULT_ARCH "x86_64"
| #define DEFAULT_GENERATE_X86_RELAX_RELOCATIONS 1
| #define DEFAULT_GENERATE_ELF_STT_COMMON 0
| #define DEFAULT_GENERATE_BUILD_NOTES 0
| #define DEFAULT_X86_USED_NOTE 1
| #define DEFAULT_RISCV_ATTR 0
| #define DEFAULT_MIPS_FIX_LOONGSON3_LLSC 0
| #define DEFAULT_FLAG_COMPRESS_DEBUG 1
| #define EMULATIONS  &i386elf,
| #define DEFAULT_EMULATION "i386elf"
| #define TARGET_ALIAS "x86_64-glibc-linux-gnu"
| #define TARGET_CANONICAL "x86_64-glibc-linux-gnu"
| #define TARGET_CPU "x86_64"
| #define TARGET_VENDOR "glibc"
| #define TARGET_OS "linux-gnu"
| /* end confdefs.h.  */
|
| #line 3 "lex.yy.c"
|
| #define  YY_INT_ALIGNED short int
|
| /* A lexical scanner generated by flex */
|
| #define FLEX_SCANNER
| #define YY_FLEX_MAJOR_VERSION 2
| #define YY_FLEX_MINOR_VERSION 6
| #define YY_FLEX_SUBMINOR_VERSION 4
| #if YY_FLEX_SUBMINOR_VERSION > 0
| #define FLEX_BETA
| #endif
|
| /* First, we deal with  platform-specific or compiler-specific issues. */
|
| /* begin standard C headers. */
| #include <stdio.h>
| #include <string.h>
| #include <errno.h>
| #include <stdlib.h>
|
| /* end standard C headers. */
|
| /* flex integer type definitions */
|
| #ifndef FLEXINT_H
| #define FLEXINT_H
|
| /* C99 systems have <inttypes.h>. Non-C99 systems may or may not. */
|
| #if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
|
| /* C99 says to define __STDC_LIMIT_MACROS before including stdint.h,
|  * if you want the limit (max/min) macros for int types.
|  */
| #ifndef __STDC_LIMIT_MACROS
| #define __STDC_LIMIT_MACROS 1
| #endif
|
| #include <inttypes.h>
| typedef int8_t flex_int8_t;
| typedef uint8_t flex_uint8_t;
| typedef int16_t flex_int16_t;
| typedef uint16_t flex_uint16_t;
| typedef int32_t flex_int32_t;
| typedef uint32_t flex_uint32_t;
| #else
| typedef signed char flex_int8_t;
| typedef short int flex_int16_t;
| typedef int flex_int32_t;
| typedef unsigned char flex_uint8_t;
| typedef unsigned short int flex_uint16_t;
| typedef unsigned int flex_uint32_t;
|
| /* Limits of integral types. */
| #ifndef INT8_MIN
| #define INT8_MIN               (-128)
| #endif
| #ifndef INT16_MIN
| #define INT16_MIN              (-32767-1)
| #endif
| #ifndef INT32_MIN
| #define INT32_MIN              (-2147483647-1)
| #endif
| #ifndef INT8_MAX
| #define INT8_MAX               (127)
| #endif
| #ifndef INT16_MAX
| #define INT16_MAX              (32767)
| #endif
| #ifndef INT32_MAX
| #define INT32_MAX              (2147483647)
| #endif
| #ifndef UINT8_MAX
| #define UINT8_MAX              (255U)
| #endif
| #ifndef UINT16_MAX
| #define UINT16_MAX             (65535U)
| #endif
| #ifndef UINT32_MAX
| #define UINT32_MAX             (4294967295U)
| #endif
|
| #ifndef SIZE_MAX
| #define SIZE_MAX               (~(size_t)0)
| #endif
|
| #endif /* ! C99 */
|
| #endif /* ! FLEXINT_H */
|
| /* begin standard C++ headers. */
|
| /* TODO: this is always defined, so inline it */
| #define yyconst const
|
| #if defined(__GNUC__) && __GNUC__ >= 3
| #define yynoreturn __attribute__((__noreturn__))
| #else
| #define yynoreturn
| #endif
|
| /* Returned upon end-of-file. */
| #define YY_NULL 0
|
| /* Promotes a possibly negative, possibly signed char to an
|  *   integer in range [0..255] for use as an array index.
|  */
| #define YY_SC_TO_UI(c) ((YY_CHAR) (c))
|
| /* Enter a start condition.  This macro really ought to take a parameter,
|  * but we do it the disgusting crufty way forced on us by the ()-less
|  * definition of BEGIN.
|  */
| #define BEGIN (yy_start) = 1 + 2 *
| /* Translate the current start state into a value that can be later handed
|  * to BEGIN to return to the state.  The YYSTATE alias is for lex
|  * compatibility.
|  */
| #define YY_START (((yy_start) - 1) / 2)
| #define YYSTATE YY_START
| /* Action number for EOF rule of a given start state. */
| #define YY_STATE_EOF(state) (YY_END_OF_BUFFER + state + 1)
| /* Special action meaning "start processing a new file". */
| #define YY_NEW_FILE yyrestart( yyin  )
| #define YY_END_OF_BUFFER_CHAR 0
|
| /* Size of default input buffer. */
| #ifndef YY_BUF_SIZE
| #ifdef __ia64__
| /* On IA-64, the buffer size is 16k, not 8k.
|  * Moreover, YY_BUF_SIZE is 2*YY_READ_BUF_SIZE in the general case.
|  * Ditto for the __ia64__ case accordingly.
|  */
| #define YY_BUF_SIZE 32768
| #else
| #define YY_BUF_SIZE 16384
| #endif /* __ia64__ */
| #endif
|
| /* The state buf must be large enough to hold one state per character in the main buffer.
|  */
| #define YY_STATE_BUF_SIZE   ((YY_BUF_SIZE + 2) * sizeof(yy_state_type))
|
| #ifndef YY_TYPEDEF_YY_BUFFER_STATE
| #define YY_TYPEDEF_YY_BUFFER_STATE
| typedef struct yy_buffer_state *YY_BUFFER_STATE;
| #endif
|
| #ifndef YY_TYPEDEF_YY_SIZE_T
| #define YY_TYPEDEF_YY_SIZE_T
| typedef size_t yy_size_t;
| #endif
|
| extern int yyleng;
|
| extern FILE *yyin, *yyout;
|
| #define EOB_ACT_CONTINUE_SCAN 0
| #define EOB_ACT_END_OF_FILE 1
| #define EOB_ACT_LAST_MATCH 2
|
|     #define YY_LESS_LINENO(n)
|     #define YY_LINENO_REWIND_TO(ptr)
|
| /* Return all but the first "n" matched characters back to the input stream. */
| #define yyless(n) \
| do \
| { \
| /* Undo effects of setting up yytext. */ \
|         int yyless_macro_arg = (n); \
|         YY_LESS_LINENO(yyless_macro_arg);\
| *yy_cp = (yy_hold_char); \
| YY_RESTORE_YY_MORE_OFFSET \
| (yy_c_buf_p) = yy_cp = yy_bp + yyless_macro_arg - YY_MORE_ADJ; \
| YY_DO_BEFORE_ACTION; /* set up yytext again */ \
| } \
| while ( 0 )
| #define unput(c) yyunput( c, (yytext_ptr)  )
|
| #ifndef YY_STRUCT_YY_BUFFER_STATE
| #define YY_STRUCT_YY_BUFFER_STATE
| struct yy_buffer_state
| {
| FILE *yy_input_file;
|
| char *yy_ch_buf; /* input buffer */
| char *yy_buf_pos; /* current position in input buffer */
|
| /* Size of input buffer in bytes, not including room for EOB
| * characters.
| */
| int yy_buf_size;
|
| /* Number of characters read into yy_ch_buf, not including EOB
| * characters.
| */
| int yy_n_chars;
|
| /* Whether we "own" the buffer - i.e., we know we created it,
| * and can realloc() it to grow it, and should free() it to
| * delete it.
| */
| int yy_is_our_buffer;
|
| /* Whether this is an "interactive" input source; if so, and
| * if we're using stdio for input, then we want to use getc()
| * instead of fread(), to make sure we stop fetching input after
| * each newline.
| */
| int yy_is_interactive;
|
| /* Whether we're considered to be at the beginning of a line.
| * If so, '^' rules will be active on the next match, otherwise
| * not.
| */
| int yy_at_bol;
|
|     int yy_bs_lineno; /**< The line count. */
|     int yy_bs_column; /**< The column count. */
|
| /* Whether to try to fill the input buffer when we reach the
| * end of it.
| */
| int yy_fill_buffer;
|
| int yy_buffer_status;
|
| #define YY_BUFFER_NEW 0
| #define YY_BUFFER_NORMAL 1
| /* When an EOF's been seen but there's still some text to process
| * then we mark the buffer as YY_EOF_PENDING, to indicate that we
| * shouldn't try reading from the input source any more.  We might
| * still have a bunch of tokens to match, though, because of
| * possible backing-up.
| *
| * When we actually see the EOF, we change the status to "new"
| * (via yyrestart()), so that the user can continue scanning by
| * just pointing yyin at a new input file.
| */
| #define YY_BUFFER_EOF_PENDING 2
|
| };
| #endif /* !YY_STRUCT_YY_BUFFER_STATE */
|
| /* Stack of input buffers. */
| static size_t yy_buffer_stack_top = 0; /**< index of top of stack. */
| static size_t yy_buffer_stack_max = 0; /**< capacity of stack. */
| static YY_BUFFER_STATE * yy_buffer_stack = NULL; /**< Stack as an array. */
|
| /* We provide macros for accessing buffer states in case in the
|  * future we want to put the buffer states in a more general
|  * "scanner state".
|  *
|  * Returns the top of the stack, or NULL.
|  */
| #define YY_CURRENT_BUFFER ( (yy_buffer_stack) \
|                           ? (yy_buffer_stack)[(yy_buffer_stack_top)] \
|                           : NULL)
| /* Same as previous macro, but useful when we know that the buffer stack is not
|  * NULL or when we need an lvalue. For internal use only.
|  */
| #define YY_CURRENT_BUFFER_LVALUE (yy_buffer_stack)[(yy_buffer_stack_top)]
|
| /* yy_hold_char holds the character lost when yytext is formed. */
| static char yy_hold_char;
| static int yy_n_chars; /* number of characters read into yy_ch_buf */
| int yyleng;
|
| /* Points to current character in buffer. */
| static char *yy_c_buf_p = NULL;
| static int yy_init = 0; /* whether we need to initialize */
| static int yy_start = 0; /* start state number */
|
| /* Flag which is used to allow yywrap()'s to do buffer switches
|  * instead of setting up a fresh yyin.  A bit of a hack ...
|  */
| static int yy_did_buffer_switch_on_eof;
|
| void yyrestart ( FILE *input_file  );
| void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer  );
| YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size  );
| void yy_delete_buffer ( YY_BUFFER_STATE b  );
| void yy_flush_buffer ( YY_BUFFER_STATE b  );
| void yypush_buffer_state ( YY_BUFFER_STATE new_buffer  );
| void yypop_buffer_state ( void );
|
| static void yyensure_buffer_stack ( void );
| static void yy_load_buffer_state ( void );
| static void yy_init_buffer ( YY_BUFFER_STATE b, FILE *file  );
| #define YY_FLUSH_BUFFER yy_flush_buffer( YY_CURRENT_BUFFER )
|
| YY_BUFFER_STATE yy_scan_buffer ( char *base, yy_size_t size  );
| YY_BUFFER_STATE yy_scan_string ( const char *yy_str  );
| YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len  );
|
| void *yyalloc ( yy_size_t  );
| void *yyrealloc ( void *, yy_size_t  );
| void yyfree ( void *  );
|
| #define yy_new_buffer yy_create_buffer
| #define yy_set_interactive(is_interactive) \
| { \
| if ( ! YY_CURRENT_BUFFER ){ \
|         yyensure_buffer_stack (); \
| YY_CURRENT_BUFFER_LVALUE =    \
|             yy_create_buffer( yyin, YY_BUF_SIZE ); \
| } \
| YY_CURRENT_BUFFER_LVALUE->yy_is_interactive = is_interactive; \
| }
| #define yy_set_bol(at_bol) \
| { \
| if ( ! YY_CURRENT_BUFFER ){\
|         yyensure_buffer_stack (); \
| YY_CURRENT_BUFFER_LVALUE =    \
|             yy_create_buffer( yyin, YY_BUF_SIZE ); \
| } \
| YY_CURRENT_BUFFER_LVALUE->yy_at_bol = at_bol; \
| }
| #define YY_AT_BOL() (YY_CURRENT_BUFFER_LVALUE->yy_at_bol)
|
| /* Begin user sect3 */
| typedef flex_uint8_t YY_CHAR;
|
| FILE *yyin = NULL, *yyout = NULL;
|
| typedef int yy_state_type;
|
| extern int yylineno;
| int yylineno = 1;
|
| extern char *yytext;
| #ifdef yytext_ptr
| #undef yytext_ptr
| #endif
| #define yytext_ptr yytext
|
| static yy_state_type yy_get_previous_state ( void );
| static yy_state_type yy_try_NUL_trans ( yy_state_type current_state  );
| static int yy_get_next_buffer ( void );
| static void yynoreturn yy_fatal_error ( const char* msg  );
|
| /* Done after the current pattern has been matched and before the
|  * corresponding action - sets up yytext.
|  */
| #define YY_DO_BEFORE_ACTION \
| (yytext_ptr) = yy_bp; \
| (yytext_ptr) -= (yy_more_len); \
| yyleng = (int) (yy_cp - (yytext_ptr)); \
| (yy_hold_char) = *yy_cp; \
| *yy_cp = '\0'; \
| (yy_c_buf_p) = yy_cp;
| #define YY_NUM_RULES 8
| #define YY_END_OF_BUFFER 9
| /* This struct is not used in this scanner,
|    but its presence is necessary. */
| struct yy_trans_info
| {
| flex_int32_t yy_verify;
| flex_int32_t yy_nxt;
| };
| static const flex_int16_t yy_acclist[23] =
|     {   0,
|         9,    7,    8,    8,    1,    7,    8,    2,    7,    8,
|         3,    7,    8,    4,    7,    8,    5,    7,    8,    6,
|         7,    8
|     } ;
|
| static const flex_int16_t yy_accept[14] =
|     {   0,
|         1,    1,    1,    2,    4,    5,    8,   11,   14,   17,
|        20,   23,   23
|     } ;
|
| static const YY_CHAR yy_ec[256] =
|     {   0,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    2,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    3,    4,    5,    6,
|
|         7,    8,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
|         1,    1,    1,    1,    1
|     } ;
|
| static const YY_CHAR yy_meta[9] =
|     {   0,
|         1,    1,    1,    1,    1,    1,    1,    1
|     } ;
|
| static const flex_int16_t yy_base[13] =
|     {   0,
|         0,    0,    9,   10,   10,   10,   10,   10,   10,   10,
|        10,   10
|     } ;
|
| static const flex_int16_t yy_def[13] =
|     {   0,
|        12,    1,   12,   12,   12,   12,   12,   12,   12,   12,
|        12,    0
|     } ;
|
| static const flex_int16_t yy_nxt[19] =
|     {   0,
|         4,    5,    6,    7,    8,    9,   10,   11,   12,    3,
|        12,   12,   12,   12,   12,   12,   12,   12
|     } ;
|
| static const flex_int16_t yy_chk[19] =
|     {   0,
|         1,    1,    1,    1,    1,    1,    1,    1,    3,   12,
|        12,   12,   12,   12,   12,   12,   12,   12
|     } ;
|
| extern int yy_flex_debug;
| int yy_flex_debug = 0;
|
| static yy_state_type *yy_state_buf=0, *yy_state_ptr=0;
| static char *yy_full_match;
| static int yy_lp;
| #define REJECT \
| { \
| *yy_cp = (yy_hold_char); /* undo effects of setting up yytext */ \
| yy_cp = (yy_full_match); /* restore poss. backed-over text */ \
| ++(yy_lp); \
| goto find_rule; \
| }
|
| static int yy_more_flag = 0;
| static int yy_more_len = 0;
| #define yymore() ((yy_more_flag) = 1)
| #define YY_MORE_ADJ (yy_more_len)
| #define YY_RESTORE_YY_MORE_OFFSET
| char *yytext;
| #line 1 "conftest.l"
| #line 460 "lex.yy.c"
|
| #define INITIAL 0
|
| #ifndef YY_NO_UNISTD_H
| /* Special case for "unistd.h", since it is non-ANSI. We include it way
|  * down here because we want the user's section 1 to have been scanned first.
|  * The user has a chance to override it with an option.
|  */
| #include <unistd.h>
| #endif
|
| #ifndef YY_EXTRA_TYPE
| #define YY_EXTRA_TYPE void *
| #endif
|
| static int yy_init_globals ( void );
|
| /* Accessor methods to globals.
|    These are made visible to non-reentrant scanners for convenience. */
|
| int yylex_destroy ( void );
|
| int yyget_debug ( void );
|
| void yyset_debug ( int debug_flag  );
|
| YY_EXTRA_TYPE yyget_extra ( void );
|
| void yyset_extra ( YY_EXTRA_TYPE user_defined  );
|
| FILE *yyget_in ( void );
|
| void yyset_in  ( FILE * _in_str  );
|
| FILE *yyget_out ( void );
|
| void yyset_out  ( FILE * _out_str  );
|
| int yyget_leng ( void );
|
| char *yyget_text ( void );
|
| int yyget_lineno ( void );
|
| void yyset_lineno ( int _line_number  );
|
| /* Macros after this point can all be overridden by user definitions in
|  * section 1.
|  */
|
| #ifndef YY_SKIP_YYWRAP
| #ifdef __cplusplus
| extern "C" int yywrap ( void );
| #else
| extern int yywrap ( void );
| #endif
| #endif
|
| #ifndef YY_NO_UNPUT
|
|     static void yyunput ( int c, char *buf_ptr  );
|
| #endif
|
| #ifndef yytext_ptr
| static void yy_flex_strncpy ( char *, const char *, int );
| #endif
|
| #ifdef YY_NEED_STRLEN
| static int yy_flex_strlen ( const char * );
| #endif
|
| #ifndef YY_NO_INPUT
| #ifdef __cplusplus
| static int yyinput ( void );
| #else
| static int input ( void );
| #endif
|
| #endif
|
| /* Amount of stuff to slurp up with each read. */
| #ifndef YY_READ_BUF_SIZE
| #ifdef __ia64__
| /* On IA-64, the buffer size is 16k, not 8k */
| #define YY_READ_BUF_SIZE 16384
| #else
| #define YY_READ_BUF_SIZE 8192
| #endif /* __ia64__ */
| #endif
|
| /* Copy whatever the last rule matched to the standard output. */
| #ifndef ECHO
| /* This used to be an fputs(), but since the string might contain NUL's,
|  * we now use fwrite().
|  */
| #define ECHO do { if (fwrite( yytext, (size_t) yyleng, 1, yyout )) {} } while (0)
| #endif
|
| /* Gets input and stuffs it into "buf".  number of characters read, or YY_NULL,
|  * is returned in "result".
|  */
| #ifndef YY_INPUT
| #define YY_INPUT(buf,result,max_size) \
| if ( YY_CURRENT_BUFFER_LVALUE->yy_is_interactive ) \
| { \
| int c = '*'; \
| int n; \
| for ( n = 0; n < max_size && \
|      (c = getc( yyin )) != EOF && c != '\n'; ++n ) \
| buf[n] = (char) c; \
| if ( c == '\n' ) \
| buf[n++] = (char) c; \
| if ( c == EOF && ferror( yyin ) ) \
| YY_FATAL_ERROR( "input in flex scanner failed" ); \
| result = n; \
| } \
| else \
| { \
| errno=0; \
| while ( (result = (int) fread(buf, 1, (yy_size_t) max_size, yyin)) == 0 && ferror(yyin)) \
| { \
| if( errno != EINTR) \
| { \
| YY_FATAL_ERROR( "input in flex scanner failed" ); \
| break; \
| } \
| errno=0; \
| clearerr(yyin); \
| } \
| }\
| \
|
| #endif
|
| /* No semi-colon after return; correct usage is to write "yyterminate();" -
|  * we don't want an extra ';' after the "return" because that will cause
|  * some compilers to complain about unreachable statements.
|  */
| #ifndef yyterminate
| #define yyterminate() return YY_NULL
| #endif
|
| /* Number of entries by which start-condition stack grows. */
| #ifndef YY_START_STACK_INCR
| #define YY_START_STACK_INCR 25
| #endif
|
| /* Report a fatal error. */
| #ifndef YY_FATAL_ERROR
| #define YY_FATAL_ERROR(msg) yy_fatal_error( msg )
| #endif
|
| /* end tables serialization structures and prototypes */
|
| /* Default declaration of generated scanner - a define so the user can
|  * easily add parameters.
|  */
| #ifndef YY_DECL
| #define YY_DECL_IS_OURS 1
|
| extern int yylex (void);
|
| #define YY_DECL int yylex (void)
| #endif /* !YY_DECL */
|
| /* Code executed at the beginning of each rule, after yytext and yyleng
|  * have been set up.
|  */
| #ifndef YY_USER_ACTION
| #define YY_USER_ACTION
| #endif
|
| /* Code executed at the end of each rule. */
| #ifndef YY_BREAK
| #define YY_BREAK /*LINTED*/break;
| #endif
|
| #define YY_RULE_SETUP \
| YY_USER_ACTION
|
| /** The main scanner function which does all the work.
|  */
| YY_DECL
| {
| yy_state_type yy_current_state;
| char *yy_cp, *yy_bp;
| int yy_act;
|
| if ( !(yy_init) )
| {
| (yy_init) = 1;
|
| #ifdef YY_USER_INIT
| YY_USER_INIT;
| #endif
|
|         /* Create the reject buffer large enough to save one state per allowed character. */
|         if ( ! (yy_state_buf) )
|             (yy_state_buf) = (yy_state_type *)yyalloc(YY_STATE_BUF_SIZE  );
|             if ( ! (yy_state_buf) )
|                 YY_FATAL_ERROR( "out of dynamic memory in yylex()" );
|
| if ( ! (yy_start) )
| (yy_start) = 1; /* first start state */
|
| if ( ! yyin )
| yyin = stdin;
|
| if ( ! yyout )
| yyout = stdout;
|
| if ( ! YY_CURRENT_BUFFER ) {
| yyensure_buffer_stack ();
| YY_CURRENT_BUFFER_LVALUE =
| yy_create_buffer( yyin, YY_BUF_SIZE );
| }
|
| yy_load_buffer_state(  );
| }
|
| {
| #line 1 "conftest.l"
|
| #line 685 "lex.yy.c"
|
| while ( /*CONSTCOND*/1 ) /* loops until end-of-file is reached */
| {
| (yy_more_len) = 0;
| if ( (yy_more_flag) )
| {
| (yy_more_len) = (int) ((yy_c_buf_p) - (yytext_ptr));
| (yy_more_flag) = 0;
| }
| yy_cp = (yy_c_buf_p);
|
| /* Support of yytext. */
| *yy_cp = (yy_hold_char);
|
| /* yy_bp points to the position in yy_ch_buf of the start of
| * the current run.
| */
| yy_bp = yy_cp;
|
| yy_current_state = (yy_start);
|
| (yy_state_ptr) = (yy_state_buf);
| *(yy_state_ptr)++ = yy_current_state;
|
| yy_match:
| do
| {
| YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_cp)] ;
| while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state )
| {
| yy_current_state = (int) yy_def[yy_current_state];
| if ( yy_current_state >= 13 )
| yy_c = yy_meta[yy_c];
| }
| yy_current_state = yy_nxt[yy_base[yy_current_state] + yy_c];
| *(yy_state_ptr)++ = yy_current_state;
| ++yy_cp;
| }
| while ( yy_base[yy_current_state] != 10 );
|
| yy_find_action:
| yy_current_state = *--(yy_state_ptr);
| (yy_lp) = yy_accept[yy_current_state];
|
| find_rule: /* we branch to this label when backing up */
|
| for ( ; ; ) /* until we find what rule we matched */
| {
| if ( (yy_lp) && (yy_lp) < yy_accept[yy_current_state + 1] )
| {
| yy_act = yy_acclist[(yy_lp)];
| {
| (yy_full_match) = yy_cp;
| break;
| }
| }
| --yy_cp;
| yy_current_state = *--(yy_state_ptr);
| (yy_lp) = yy_accept[yy_current_state];
| }
|
| YY_DO_BEFORE_ACTION;
|
| do_action: /* This label is used only to access EOF actions. */
|
| switch ( yy_act )
| { /* beginning of action switch */
| case 1:
| YY_RULE_SETUP
| #line 2 "conftest.l"
| { ECHO; }
| YY_BREAK
| case 2:
| YY_RULE_SETUP
| #line 3 "conftest.l"
| { REJECT; }
| YY_BREAK
| case 3:
| YY_RULE_SETUP
| #line 4 "conftest.l"
| { yymore (); }
| YY_BREAK
| case 4:
| YY_RULE_SETUP
| #line 5 "conftest.l"
| { yyless (1); }
| YY_BREAK
| case 5:
| YY_RULE_SETUP
| #line 6 "conftest.l"
| { /* IRIX 6.5 flex 2.5.4 underquotes its yyless argument.  */
|     yyless ((input () != 0)); }
| YY_BREAK
| case 6:
| YY_RULE_SETUP
| #line 8 "conftest.l"
| { unput (yytext[0]); }
| YY_BREAK
| case 7:
| YY_RULE_SETUP
| #line 9 "conftest.l"
| { BEGIN INITIAL; }
| YY_BREAK
| case 8:
| YY_RULE_SETUP
| #line 10 "conftest.l"
| ECHO;
| YY_BREAK
| #line 794 "lex.yy.c"
| case YY_STATE_EOF(INITIAL):
| yyterminate();
|
| case YY_END_OF_BUFFER:
| {
| /* Amount of text matched not including the EOB char. */
| int yy_amount_of_matched_text = (int) (yy_cp - (yytext_ptr)) - 1;
|
| /* Undo the effects of YY_DO_BEFORE_ACTION. */
| *yy_cp = (yy_hold_char);
| YY_RESTORE_YY_MORE_OFFSET
|
| if ( YY_CURRENT_BUFFER_LVALUE->yy_buffer_status == YY_BUFFER_NEW )
| {
| /* We're scanning a new file or input source.  It's
| * possible that this happened because the user
| * just pointed yyin at a new source and called
| * yylex().  If so, then we have to assure
| * consistency between YY_CURRENT_BUFFER and our
| * globals.  Here is the right place to do so, because
| * this is the first action (other than possibly a
| * back-up) that will match for the new input source.
| */
| (yy_n_chars) = YY_CURRENT_BUFFER_LVALUE->yy_n_chars;
| YY_CURRENT_BUFFER_LVALUE->yy_input_file = yyin;
| YY_CURRENT_BUFFER_LVALUE->yy_buffer_status = YY_BUFFER_NORMAL;
| }
|
| /* Note that here we test for yy_c_buf_p "<=" to the position
| * of the first EOB in the buffer, since yy_c_buf_p will
| * already have been incremented past the NUL character
| * (since all states make transitions on EOB to the
| * end-of-buffer state).  Contrast this with the test
| * in input().
| */
| if ( (yy_c_buf_p) <= &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[(yy_n_chars)] )
| { /* This was really a NUL. */
| yy_state_type yy_next_state;
|
| (yy_c_buf_p) = (yytext_ptr) + yy_amount_of_matched_text;
|
| yy_current_state = yy_get_previous_state(  );
|
| /* Okay, we're now positioned to make the NUL
| * transition.  We couldn't have
| * yy_get_previous_state() go ahead and do it
| * for us because it doesn't know how to deal
| * with the possibility of jamming (and we don't
| * want to build jamming into it because then it
| * will run more slowly).
| */
|
| yy_next_state = yy_try_NUL_trans( yy_current_state );
|
| yy_bp = (yytext_ptr) + YY_MORE_ADJ;
|
| if ( yy_next_state )
| {
| /* Consume the NUL. */
| yy_cp = ++(yy_c_buf_p);
| yy_current_state = yy_next_state;
| goto yy_match;
| }
|
| else
| {
| yy_cp = (yy_c_buf_p);
| goto yy_find_action;
| }
| }
|
| else switch ( yy_get_next_buffer(  ) )
| {
| case EOB_ACT_END_OF_FILE:
| {
| (yy_did_buffer_switch_on_eof) = 0;
|
| if ( yywrap(  ) )
| {
| /* Note: because we've taken care in
| * yy_get_next_buffer() to have set up
| * yytext, we can now set up
| * yy_c_buf_p so that if some total
| * hoser (like flex itself) wants to
| * call the scanner after we return the
| * YY_NULL, it'll still work - another
| * YY_NULL will get returned.
| */
| (yy_c_buf_p) = (yytext_ptr) + YY_MORE_ADJ;
|
| yy_act = YY_STATE_EOF(YY_START);
| goto do_action;
| }
|
| else
| {
| if ( ! (yy_did_buffer_switch_on_eof) )
| YY_NEW_FILE;
| }
| break;
| }
|
| case EOB_ACT_CONTINUE_SCAN:
| (yy_c_buf_p) =
| (yytext_ptr) + yy_amount_of_matched_text;
|
| yy_current_state = yy_get_previous_state(  );
|
| yy_cp = (yy_c_buf_p);
| yy_bp = (yytext_ptr) + YY_MORE_ADJ;
| goto yy_match;
|
| case EOB_ACT_LAST_MATCH:
| (yy_c_buf_p) =
| &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[(yy_n_chars)];
|
| yy_current_state = yy_get_previous_state(  );
|
| yy_cp = (yy_c_buf_p);
| yy_bp = (yytext_ptr) + YY_MORE_ADJ;
| goto yy_find_action;
| }
| break;
| }
|
| default:
| YY_FATAL_ERROR(
| "fatal flex scanner internal error--no action found" );
| } /* end of action switch */
| } /* end of scanning one token */
| } /* end of user's declarations */
| } /* end of yylex */
|
| /* yy_get_next_buffer - try to read in a new buffer
|  *
|  * Returns a code representing an action:
|  * EOB_ACT_LAST_MATCH -
|  * EOB_ACT_CONTINUE_SCAN - continue scanning from current position
|  * EOB_ACT_END_OF_FILE - end of file
|  */
| static int yy_get_next_buffer (void)
| {
|      char *dest = YY_CURRENT_BUFFER_LVALUE->yy_ch_buf;
| char *source = (yytext_ptr);
| int number_to_move, i;
| int ret_val;
|
| if ( (yy_c_buf_p) > &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[(yy_n_chars) + 1] )
| YY_FATAL_ERROR(
| "fatal flex scanner internal error--end of buffer missed" );
|
| if ( YY_CURRENT_BUFFER_LVALUE->yy_fill_buffer == 0 )
| { /* Don't try to fill the buffer, so this is an EOF. */
| if ( (yy_c_buf_p) - (yytext_ptr) - YY_MORE_ADJ == 1 )
| {
| /* We matched a single character, the EOB, so
| * treat this as a final EOF.
| */
| return EOB_ACT_END_OF_FILE;
| }
|
| else
| {
| /* We matched some text prior to the EOB, first
| * process it.
| */
| return EOB_ACT_LAST_MATCH;
| }
| }
|
| /* Try to read more data. */
|
| /* First move last chars to start of buffer. */
| number_to_move = (int) ((yy_c_buf_p) - (yytext_ptr) - 1);
|
| for ( i = 0; i < number_to_move; ++i )
| *(dest++) = *(source++);
|
| if ( YY_CURRENT_BUFFER_LVALUE->yy_buffer_status == YY_BUFFER_EOF_PENDING )
| /* don't do the read, it's not guaranteed to return an EOF,
| * just force an EOF
| */
| YY_CURRENT_BUFFER_LVALUE->yy_n_chars = (yy_n_chars) = 0;
|
| else
| {
| int num_to_read =
| YY_CURRENT_BUFFER_LVALUE->yy_buf_size - number_to_move - 1;
|
| while ( num_to_read <= 0 )
| { /* Not enough room in the buffer - grow it. */
|
| YY_FATAL_ERROR(
| "input buffer overflow, can't enlarge buffer because scanner uses REJECT" );
|
| }
|
| if ( num_to_read > YY_READ_BUF_SIZE )
| num_to_read = YY_READ_BUF_SIZE;
|
| /* Read in more data. */
| YY_INPUT( (&YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[number_to_move]),
| (yy_n_chars), num_to_read );
|
| YY_CURRENT_BUFFER_LVALUE->yy_n_chars = (yy_n_chars);
| }
|
| if ( (yy_n_chars) == 0 )
| {
| if ( number_to_move == YY_MORE_ADJ )
| {
| ret_val = EOB_ACT_END_OF_FILE;
| yyrestart( yyin  );
| }
|
| else
| {
| ret_val = EOB_ACT_LAST_MATCH;
| YY_CURRENT_BUFFER_LVALUE->yy_buffer_status =
| YY_BUFFER_EOF_PENDING;
| }
| }
|
| else
| ret_val = EOB_ACT_CONTINUE_SCAN;
|
| if (((yy_n_chars) + number_to_move) > YY_CURRENT_BUFFER_LVALUE->yy_buf_size) {
| /* Extend the array by 50%, plus the number we really need. */
| int new_size = (yy_n_chars) + number_to_move + ((yy_n_chars) >> 1);
| YY_CURRENT_BUFFER_LVALUE->yy_ch_buf = (char *) yyrealloc(
| (void *) YY_CURRENT_BUFFER_LVALUE->yy_ch_buf, (yy_size_t) new_size  );
| if ( ! YY_CURRENT_BUFFER_LVALUE->yy_ch_buf )
| YY_FATAL_ERROR( "out of dynamic memory in yy_get_next_buffer()" );
| /* "- 2" to take care of EOB's */
| YY_CURRENT_BUFFER_LVALUE->yy_buf_size = (int) (new_size - 2);
| }
|
| (yy_n_chars) += number_to_move;
| YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[(yy_n_chars)] = YY_END_OF_BUFFER_CHAR;
| YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[(yy_n_chars) + 1] = YY_END_OF_BUFFER_CHAR;
|
| (yytext_ptr) = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[0];
|
| return ret_val;
| }
|
| /* yy_get_previous_state - get the state just before the EOB char was reached */
|
|     static yy_state_type yy_get_previous_state (void)
| {
| yy_state_type yy_current_state;
| char *yy_cp;
|
| yy_current_state = (yy_start);
|
| (yy_state_ptr) = (yy_state_buf);
| *(yy_state_ptr)++ = yy_current_state;
|
| for ( yy_cp = (yytext_ptr) + YY_MORE_ADJ; yy_cp < (yy_c_buf_p); ++yy_cp )
| {
| YY_CHAR yy_c = (*yy_cp ? yy_ec[YY_SC_TO_UI(*yy_cp)] : 1);
| while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state )
| {
| yy_current_state = (int) yy_def[yy_current_state];
| if ( yy_current_state >= 13 )
| yy_c = yy_meta[yy_c];
| }
| yy_current_state = yy_nxt[yy_base[yy_current_state] + yy_c];
| *(yy_state_ptr)++ = yy_current_state;
| }
|
| return yy_current_state;
| }
|
| /* yy_try_NUL_trans - try to make a transition on the NUL character
|  *
|  * synopsis
|  * next_state = yy_try_NUL_trans( current_state );
|  */
|     static yy_state_type yy_try_NUL_trans  (yy_state_type yy_current_state )
| {
| int yy_is_jam;
|
| YY_CHAR yy_c = 1;
| while ( yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state )
| {
| yy_current_state = (int) yy_def[yy_current_state];
| if ( yy_current_state >= 13 )
| yy_c = yy_meta[yy_c];
| }
| yy_current_state = yy_nxt[yy_base[yy_current_state] + yy_c];
| yy_is_jam = (yy_current_state == 12);
| if ( ! yy_is_jam )
| *(yy_state_ptr)++ = yy_current_state;
|
| return yy_is_jam ? 0 : yy_current_state;
| }
|
| #ifndef YY_NO_UNPUT
|
|     static void yyunput (int c, char * yy_bp )
| {
| char *yy_cp;
|
|     yy_cp = (yy_c_buf_p);
|
| /* undo effects of setting up yytext */
| *yy_cp = (yy_hold_char);
|
| if ( yy_cp < YY_CURRENT_BUFFER_LVALUE->yy_ch_buf + 2 )
| { /* need to shift things up to make room */
| /* +2 for EOB chars. */
| int number_to_move = (yy_n_chars) + 2;
| char *dest = &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[
| YY_CURRENT_BUFFER_LVALUE->yy_buf_size + 2];
| char *source =
| &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[number_to_move];
|
| while ( source > YY_CURRENT_BUFFER_LVALUE->yy_ch_buf )
| *--dest = *--source;
|
| yy_cp += (int) (dest - source);
| yy_bp += (int) (dest - source);
| YY_CURRENT_BUFFER_LVALUE->yy_n_chars =
| (yy_n_chars) = (int) YY_CURRENT_BUFFER_LVALUE->yy_buf_size;
|
| if ( yy_cp < YY_CURRENT_BUFFER_LVALUE->yy_ch_buf + 2 )
| YY_FATAL_ERROR( "flex scanner push-back overflow" );
| }
|
| *--yy_cp = (char) c;
|
| (yytext_ptr) = yy_bp;
| (yy_hold_char) = *yy_cp;
| (yy_c_buf_p) = yy_cp;
| }
|
| #endif
|
| #ifndef YY_NO_INPUT
| #ifdef __cplusplus
|     static int yyinput (void)
| #else
|     static int input  (void)
| #endif
|
| {
| int c;
|
| *(yy_c_buf_p) = (yy_hold_char);
|
| if ( *(yy_c_buf_p) == YY_END_OF_BUFFER_CHAR )
| {
| /* yy_c_buf_p now points to the character we want to return.
| * If this occurs *before* the EOB characters, then it's a
| * valid NUL; if not, then we've hit the end of the buffer.
| */
| if ( (yy_c_buf_p) < &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[(yy_n_chars)] )
| /* This was really a NUL. */
| *(yy_c_buf_p) = '\0';
|
| else
| { /* need more input */
| int offset = (int) ((yy_c_buf_p) - (yytext_ptr));
| ++(yy_c_buf_p);
|
| switch ( yy_get_next_buffer(  ) )
| {
| case EOB_ACT_LAST_MATCH:
| /* This happens because yy_g_n_b()
| * sees that we've accumulated a
| * token and flags that we need to
| * try matching the token before
| * proceeding.  But for input(),
| * there's no matching to consider.
| * So convert the EOB_ACT_LAST_MATCH
| * to EOB_ACT_END_OF_FILE.
| */
|
| /* Reset buffer status. */
| yyrestart( yyin );
|
| /*FALLTHROUGH*/
|
| case EOB_ACT_END_OF_FILE:
| {
| if ( yywrap(  ) )
| return 0;
|
| if ( ! (yy_did_buffer_switch_on_eof) )
| YY_NEW_FILE;
| #ifdef __cplusplus
| return yyinput();
| #else
| return input();
| #endif
| }
|
| case EOB_ACT_CONTINUE_SCAN:
| (yy_c_buf_p) = (yytext_ptr) + offset;
| break;
| }
| }
| }
|
| c = *(unsigned char *) (yy_c_buf_p); /* cast for 8-bit char's */
| *(yy_c_buf_p) = '\0'; /* preserve yytext */
| (yy_hold_char) = *++(yy_c_buf_p);
|
| return c;
| }
| #endif /* ifndef YY_NO_INPUT */
|
| /** Immediately switch to a different input stream.
|  * @param input_file A readable stream.
|  *
|  * @note This function does not reset the start condition to @c INITIAL .
|  */
|     void yyrestart  (FILE * input_file )
| {
|
| if ( ! YY_CURRENT_BUFFER ){
|         yyensure_buffer_stack ();
| YY_CURRENT_BUFFER_LVALUE =
|             yy_create_buffer( yyin, YY_BUF_SIZE );
| }
|
| yy_init_buffer( YY_CURRENT_BUFFER, input_file );
| yy_load_buffer_state(  );
| }
|
| /** Switch to a different input buffer.
|  * @param new_buffer The new input buffer.
|  *
|  */
|     void yy_switch_to_buffer  (YY_BUFFER_STATE  new_buffer )
| {
|
| /* TODO. We should be able to replace this entire function body
| * with
| * yypop_buffer_state();
| * yypush_buffer_state(new_buffer);
|      */
| yyensure_buffer_stack ();
| if ( YY_CURRENT_BUFFER == new_buffer )
| return;
|
| if ( YY_CURRENT_BUFFER )
| {
| /* Flush out information for old buffer. */
| *(yy_c_buf_p) = (yy_hold_char);
| YY_CURRENT_BUFFER_LVALUE->yy_buf_pos = (yy_c_buf_p);
| YY_CURRENT_BUFFER_LVALUE->yy_n_chars = (yy_n_chars);
| }
|
| YY_CURRENT_BUFFER_LVALUE = new_buffer;
| yy_load_buffer_state(  );
|
| /* We don't actually know whether we did this switch during
| * EOF (yywrap()) processing, but the only time this flag
| * is looked at is after yywrap() is called, so it's safe
| * to go ahead and always set it.
| */
| (yy_did_buffer_switch_on_eof) = 1;
| }
|
| static void yy_load_buffer_state  (void)
| {
|      (yy_n_chars) = YY_CURRENT_BUFFER_LVALUE->yy_n_chars;
| (yytext_ptr) = (yy_c_buf_p) = YY_CURRENT_BUFFER_LVALUE->yy_buf_pos;
| yyin = YY_CURRENT_BUFFER_LVALUE->yy_input_file;
| (yy_hold_char) = *(yy_c_buf_p);
| }
|
| /** Allocate and initialize an input buffer state.
|  * @param file A readable stream.
|  * @param size The character buffer size in bytes. When in doubt, use @c YY_BUF_SIZE.
|  *
|  * @return the allocated buffer state.
|  */
|     YY_BUFFER_STATE yy_create_buffer  (FILE * file, int  size )
| {
| YY_BUFFER_STATE b;
|
| b = (YY_BUFFER_STATE) yyalloc( sizeof( struct yy_buffer_state )  );
| if ( ! b )
| YY_FATAL_ERROR( "out of dynamic memory in yy_create_buffer()" );
|
| b->yy_buf_size = size;
|
| /* yy_ch_buf has to be 2 characters longer than the size given because
| * we need to put in 2 end-of-buffer characters.
| */
| b->yy_ch_buf = (char *) yyalloc( (yy_size_t) (b->yy_buf_size + 2)  );
| if ( ! b->yy_ch_buf )
| YY_FATAL_ERROR( "out of dynamic memory in yy_create_buffer()" );
|
| b->yy_is_our_buffer = 1;
|
| yy_init_buffer( b, file );
|
| return b;
| }
|
| /** Destroy the buffer.
|  * @param b a buffer created with yy_create_buffer()
|  *
|  */
|     void yy_delete_buffer (YY_BUFFER_STATE  b )
| {
|
| if ( ! b )
| return;
|
| if ( b == YY_CURRENT_BUFFER ) /* Not sure if we should pop here. */
| YY_CURRENT_BUFFER_LVALUE = (YY_BUFFER_STATE) 0;
|
| if ( b->yy_is_our_buffer )
| yyfree( (void *) b->yy_ch_buf  );
|
| yyfree( (void *) b  );
| }
|
| /* Initializes or reinitializes a buffer.
|  * This function is sometimes called more than once on the same buffer,
|  * such as during a yyrestart() or at EOF.
|  */
|     static void yy_init_buffer  (YY_BUFFER_STATE  b, FILE * file )
|
| {
| int oerrno = errno;
|
| yy_flush_buffer( b );
|
| b->yy_input_file = file;
| b->yy_fill_buffer = 1;
|
|     /* If b is the current buffer, then yy_init_buffer was _probably_
|      * called from yyrestart() or through yy_get_next_buffer.
|      * In that case, we don't want to reset the lineno or column.
|      */
|     if (b != YY_CURRENT_BUFFER){
|         b->yy_bs_lineno = 1;
|         b->yy_bs_column = 0;
|     }
|
|         b->yy_is_interactive = file ? (isatty( fileno(file) ) > 0) : 0;
|
| errno = oerrno;
| }
|
| /** Discard all buffered characters. On the next scan, YY_INPUT will be called.
|  * @param b the buffer state to be flushed, usually @c YY_CURRENT_BUFFER.
|  *
|  */
|     void yy_flush_buffer (YY_BUFFER_STATE  b )
| {
|      if ( ! b )
| return;
|
| b->yy_n_chars = 0;
|
| /* We always need two end-of-buffer characters.  The first causes
| * a transition to the end-of-buffer state.  The second causes
| * a jam in that state.
| */
| b->yy_ch_buf[0] = YY_END_OF_BUFFER_CHAR;
| b->yy_ch_buf[1] = YY_END_OF_BUFFER_CHAR;
|
| b->yy_buf_pos = &b->yy_ch_buf[0];
|
| b->yy_at_bol = 1;
| b->yy_buffer_status = YY_BUFFER_NEW;
|
| if ( b == YY_CURRENT_BUFFER )
| yy_load_buffer_state(  );
| }
|
| /** Pushes the new state onto the stack. The new state becomes
|  *  the current state. This function will allocate the stack
|  *  if necessary.
|  *  @param new_buffer The new state.
|  *
|  */
| void yypush_buffer_state (YY_BUFFER_STATE new_buffer )
| {
|      if (new_buffer == NULL)
| return;
|
| yyensure_buffer_stack();
|
| /* This block is copied from yy_switch_to_buffer. */
| if ( YY_CURRENT_BUFFER )
| {
| /* Flush out information for old buffer. */
| *(yy_c_buf_p) = (yy_hold_char);
| YY_CURRENT_BUFFER_LVALUE->yy_buf_pos = (yy_c_buf_p);
| YY_CURRENT_BUFFER_LVALUE->yy_n_chars = (yy_n_chars);
| }
|
| /* Only push if top exists. Otherwise, replace top. */
| if (YY_CURRENT_BUFFER)
| (yy_buffer_stack_top)++;
| YY_CURRENT_BUFFER_LVALUE = new_buffer;
|
| /* copied from yy_switch_to_buffer. */
| yy_load_buffer_state(  );
| (yy_did_buffer_switch_on_eof) = 1;
| }
|
| /** Removes and deletes the top of the stack, if present.
|  *  The next element becomes the new top.
|  *
|  */
| void yypop_buffer_state (void)
| {
|      if (!YY_CURRENT_BUFFER)
| return;
|
| yy_delete_buffer(YY_CURRENT_BUFFER );
| YY_CURRENT_BUFFER_LVALUE = NULL;
| if ((yy_buffer_stack_top) > 0)
| --(yy_buffer_stack_top);
|
| if (YY_CURRENT_BUFFER) {
| yy_load_buffer_state(  );
| (yy_did_buffer_switch_on_eof) = 1;
| }
| }
|
| /* Allocates the stack if it does not exist.
|  *  Guarantees space for at least one push.
|  */
| static void yyensure_buffer_stack (void)
| {
| yy_size_t num_to_alloc;
|
| if (!(yy_buffer_stack)) {
|
| /* First allocation is just for 2 elements, since we don't know if this
| * scanner will even need a stack. We use 2 instead of 1 to avoid an
| * immediate realloc on the next call.
|          */
|       num_to_alloc = 1; /* After all that talk, this was set to 1 anyways... */
| (yy_buffer_stack) = (struct yy_buffer_state**)yyalloc
| (num_to_alloc * sizeof(struct yy_buffer_state*)
| );
| if ( ! (yy_buffer_stack) )
| YY_FATAL_ERROR( "out of dynamic memory in yyensure_buffer_stack()" );
|
| memset((yy_buffer_stack), 0, num_to_alloc * sizeof(struct yy_buffer_state*));
|
| (yy_buffer_stack_max) = num_to_alloc;
| (yy_buffer_stack_top) = 0;
| return;
| }
|
| if ((yy_buffer_stack_top) >= ((yy_buffer_stack_max)) - 1){
|
| /* Increase the buffer to prepare for a possible push. */
| yy_size_t grow_size = 8 /* arbitrary grow size */;
|
| num_to_alloc = (yy_buffer_stack_max) + grow_size;
| (yy_buffer_stack) = (struct yy_buffer_state**)yyrealloc
| ((yy_buffer_stack),
| num_to_alloc * sizeof(struct yy_buffer_state*)
| );
| if ( ! (yy_buffer_stack) )
| YY_FATAL_ERROR( "out of dynamic memory in yyensure_buffer_stack()" );
|
| /* zero only the new slots.*/
| memset((yy_buffer_stack) + (yy_buffer_stack_max), 0, grow_size * sizeof(struct yy_buffer_state*));
| (yy_buffer_stack_max) = num_to_alloc;
| }
| }
|
| /** Setup the input buffer state to scan directly from a user-specified character buffer.
|  * @param base the character buffer
|  * @param size the size in bytes of the character buffer
|  *
|  * @return the newly allocated buffer state object.
|  */
| YY_BUFFER_STATE yy_scan_buffer  (char * base, yy_size_t  size )
| {
| YY_BUFFER_STATE b;
|
| if ( size < 2 ||
|      base[size-2] != YY_END_OF_BUFFER_CHAR ||
|      base[size-1] != YY_END_OF_BUFFER_CHAR )
| /* They forgot to leave room for the EOB's. */
| return NULL;
|
| b = (YY_BUFFER_STATE) yyalloc( sizeof( struct yy_buffer_state )  );
| if ( ! b )
| YY_FATAL_ERROR( "out of dynamic memory in yy_scan_buffer()" );
|
| b->yy_buf_size = (int) (size - 2); /* "- 2" to take care of EOB's */
| b->yy_buf_pos = b->yy_ch_buf = base;
| b->yy_is_our_buffer = 0;
| b->yy_input_file = NULL;
| b->yy_n_chars = b->yy_buf_size;
| b->yy_is_interactive = 0;
| b->yy_at_bol = 1;
| b->yy_fill_buffer = 0;
| b->yy_buffer_status = YY_BUFFER_NEW;
|
| yy_switch_to_buffer( b  );
|
| return b;
| }
|
| /** Setup the input buffer state to scan a string. The next call to yylex() will
|  * scan from a @e copy of @a str.
|  * @param yystr a NUL-terminated string to scan
|  *
|  * @return the newly allocated buffer state object.
|  * @note If you want to scan bytes that may contain NUL values, then use
|  *       yy_scan_bytes() instead.
|  */
| YY_BUFFER_STATE yy_scan_string (const char * yystr )
| {
|
| return yy_scan_bytes( yystr, (int) strlen(yystr) );
| }
|
| /** Setup the input buffer state to scan the given bytes. The next call to yylex() will
|  * scan from a @e copy of @a bytes.
|  * @param yybytes the byte buffer to scan
|  * @param _yybytes_len the number of bytes in the buffer pointed to by @a bytes.
|  *
|  * @return the newly allocated buffer state object.
|  */
| YY_BUFFER_STATE yy_scan_bytes  (const char * yybytes, int  _yybytes_len )
| {
| YY_BUFFER_STATE b;
| char *buf;
| yy_size_t n;
| int i;
|
| /* Get memory for full buffer, including space for trailing EOB's. */
| n = (yy_size_t) (_yybytes_len + 2);
| buf = (char *) yyalloc( n  );
| if ( ! buf )
| YY_FATAL_ERROR( "out of dynamic memory in yy_scan_bytes()" );
|
| for ( i = 0; i < _yybytes_len; ++i )
| buf[i] = yybytes[i];
|
| buf[_yybytes_len] = buf[_yybytes_len+1] = YY_END_OF_BUFFER_CHAR;
|
| b = yy_scan_buffer( buf, n );
| if ( ! b )
| YY_FATAL_ERROR( "bad buffer in yy_scan_bytes()" );
|
| /* It's okay to grow etc. this buffer, and we should throw it
| * away when we're done.
| */
| b->yy_is_our_buffer = 1;
|
| return b;
| }
|
| #ifndef YY_EXIT_FAILURE
| #define YY_EXIT_FAILURE 2
| #endif
|
| static void yynoreturn yy_fatal_error (const char* msg )
| {
| fprintf( stderr, "%s\n", msg );
| exit( YY_EXIT_FAILURE );
| }
|
| /* Redefine yyless() so it works in section 3 code. */
|
| #undef yyless
| #define yyless(n) \
| do \
| { \
| /* Undo effects of setting up yytext. */ \
|         int yyless_macro_arg = (n); \
|         YY_LESS_LINENO(yyless_macro_arg);\
| yytext[yyleng] = (yy_hold_char); \
| (yy_c_buf_p) = yytext + yyless_macro_arg; \
| (yy_hold_char) = *(yy_c_buf_p); \
| *(yy_c_buf_p) = '\0'; \
| yyleng = yyless_macro_arg; \
| } \
| while ( 0 )
|
| /* Accessor  methods (get/set functions) to struct members. */
|
| /** Get the current line number.
|  *
|  */
| int yyget_lineno  (void)
| {
|
|     return yylineno;
| }
|
| /** Get the input stream.
|  *
|  */
| FILE *yyget_in  (void)
| {
|         return yyin;
| }
|
| /** Get the output stream.
|  *
|  */
| FILE *yyget_out  (void)
| {
|         return yyout;
| }
|
| /** Get the length of the current token.
|  *
|  */
| int yyget_leng  (void)
| {
|         return yyleng;
| }
|
| /** Get the current token.
|  *
|  */
|
| char *yyget_text  (void)
| {
|         return yytext;
| }
|
| /** Set the current line number.
|  * @param _line_number line number
|  *
|  */
| void yyset_lineno (int  _line_number )
| {
|
|     yylineno = _line_number;
| }
|
| /** Set the input stream. This does not discard the current
|  * input buffer.
|  * @param _in_str A readable stream.
|  *
|  * @see yy_switch_to_buffer
|  */
| void yyset_in (FILE *  _in_str )
| {
|         yyin = _in_str ;
| }
|
| void yyset_out (FILE *  _out_str )
| {
|         yyout = _out_str ;
| }
|
| int yyget_debug  (void)
| {
|         return yy_flex_debug;
| }
|
| void yyset_debug (int  _bdebug )
| {
|         yy_flex_debug = _bdebug ;
| }
|
| static int yy_init_globals (void)
| {
|         /* Initialization is the same as for the non-reentrant scanner.
|      * This function is called from yylex_destroy(), so don't allocate here.
|      */
|
|     (yy_buffer_stack) = NULL;
|     (yy_buffer_stack_top) = 0;
|     (yy_buffer_stack_max) = 0;
|     (yy_c_buf_p) = NULL;
|     (yy_init) = 0;
|     (yy_start) = 0;
|
|     (yy_state_buf) = 0;
|     (yy_state_ptr) = 0;
|     (yy_full_match) = 0;
|     (yy_lp) = 0;
|
| /* Defined in main.c */
| #ifdef YY_STDINIT
|     yyin = stdin;
|     yyout = stdout;
| #else
|     yyin = NULL;
|     yyout = NULL;
| #endif
|
|     /* For future reference: Set errno on error, since we are called by
|      * yylex_init()
|      */
|     return 0;
| }
|
| /* yylex_destroy is for both reentrant and non-reentrant scanners. */
| int yylex_destroy  (void)
| {
|
|     /* Pop the buffer stack, destroying each element. */
| while(YY_CURRENT_BUFFER){
| yy_delete_buffer( YY_CURRENT_BUFFER  );
| YY_CURRENT_BUFFER_LVALUE = NULL;
| yypop_buffer_state();
| }
|
| /* Destroy the stack itself. */
| yyfree((yy_buffer_stack) );
| (yy_buffer_stack) = NULL;
|
|     yyfree ( (yy_state_buf) );
|     (yy_state_buf)  = NULL;
|
|     /* Reset the globals. This is important in a non-reentrant scanner so the next time
|      * yylex() is called, initialization will occur. */
|     yy_init_globals( );
|
|     return 0;
| }
|
| /*
|  * Internal utility routines.
|  */
|
| #ifndef yytext_ptr
| static void yy_flex_strncpy (char* s1, const char * s2, int n )
| {
|
| int i;
| for ( i = 0; i < n; ++i )
| s1[i] = s2[i];
| }
| #endif
|
| #ifdef YY_NEED_STRLEN
| static int yy_flex_strlen (const char * s )
| {
| int n;
| for ( n = 0; s[n]; ++n )
| ;
|
| return n;
| }
| #endif
|
| void *yyalloc (yy_size_t  size )
| {
| return malloc(size);
| }
|
| void *yyrealloc  (void * ptr, yy_size_t  size )
| {
|
| /* The cast to (char *) in the following accommodates both
| * implementations that use char* generic pointers, and those
| * that use void* generic pointers.  It works with the latter
| * because both ANSI C and C++ allow castless assignment from
| * any pointer type to void*, and deal with argument conversions
| * as though doing an assignment.
| */
| return realloc(ptr, size);
| }
|
| void yyfree (void * ptr )
| {
| free( (char *) ptr ); /* see yyrealloc() for (char *) cast */
| }
|
| #define YYTABLES_NAME "yytables"
|
| #line 10 "conftest.l"
|
| #ifdef YYTEXT_POINTER
| extern char *yytext;
| #endif
| int
| main (void)
| {
|   return ! yylex () + ! yywrap ();
| }
configure:10107: gcc -o conftest -g -O2        conftest.c -lfl -ldl  >&5
configure:10107: $? = 0
configure:10117: result: -lfl
configure:10123: checking whether yytext is a pointer
configure:10140: gcc -o conftest -g -O2        conftest.c -lfl -ldl  >&5
configure:10140: $? = 0
configure:10148: result: yes
configure:10176: checking whether NLS is requested
configure:10182: result: yes
configure:10188: checking for catalogs to be installed
configure:10220: result:  es fi fr id ja ru rw sv tr uk zh_CN
configure:10247: checking whether NLS is requested
configure:10256: result: yes
configure:10294: checking for msgfmt
configure:10325: result: /usr/bin/msgfmt
configure:10334: checking for gmsgfmt
configure:10365: result: /usr/bin/msgfmt
configure:10405: checking for xgettext
configure:10436: result: /usr/bin/xgettext
configure:10476: checking for msgmerge
configure:10506: result: /usr/bin/msgmerge
configure:10543: checking whether to enable maintainer-specific portions of Makefiles
configure:10552: result: no
configure:10578: checking for memory.h
configure:10578: result: yes
configure:10578: checking for sys/stat.h
configure:10578: result: yes
configure:10578: checking for sys/types.h
configure:10578: result: yes
configure:10578: checking for unistd.h
configure:10578: result: yes
configure:10591: checking whether compiling a cross-assembler
configure:10601: result: yes
configure:10606: checking for strsignal
configure:10606: gcc -o conftest -g -O2        conftest.c -ldl  >&5
configure:10606: $? = 0
configure:10606: result: yes
configure:10617: checking for LC_MESSAGES
configure:10633: gcc -o conftest -g -O2        conftest.c -ldl  >&5
configure:10633: $? = 0
configure:10641: result: yes
configure:10792: checking for working assert macro
configure:10817: gcc -o conftest -g -O2        conftest.c -ldl  >&5
In file included from conftest.c:54:
conftest.c: In function 'main':
conftest.c:63:10: warning: implicit declaration of function 'strcmp' [-Wimplicit-function-declaration]
   63 | assert (!strcmp(s, "foo bar baz quux"));
      |          ^~~~~~
conftest.c:56:1: note: include '<string.h>' or provide a declaration of 'strcmp'
   55 | #include <stdio.h>
   56 | int
In file included from conftest.c:54:
conftest.c:63:10: warning: argument 1 null where non-null expected [-Wnonnull]
   63 | assert (!strcmp(s, "foo bar baz quux"));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
conftest.c:63:10: note: in a call to built-in function 'strcmp'
configure:10817: $? = 0
configure:10825: result: yes
configure:10846: checking for a known getopt prototype in unistd.h
configure:10862: gcc -c -g -O2      conftest.c >&5
configure:10862: $? = 0
configure:10870: result: yes
configure:10879: checking whether declaration is required for environ
configure:10899: gcc -o conftest -g -O2        conftest.c -ldl  >&5
configure:10899: $? = 0
configure:10907: result: no
configure:10916: checking whether declaration is required for ffs
configure:10936: gcc -o conftest -g -O2        conftest.c -ldl  >&5
configure:10936: $? = 0
configure:10944: result: no
configure:10953: checking whether asprintf is declared
configure:10953: gcc -c -g -O2      conftest.c >&5
configure:10953: $? = 0
configure:10953: result: yes
configure:10963: checking whether mempcpy is declared
configure:10963: gcc -c -g -O2      conftest.c >&5
configure:10963: $? = 0
configure:10963: result: yes
configure:10973: checking whether stpcpy is declared
configure:10973: gcc -c -g -O2      conftest.c >&5
configure:10973: $? = 0
configure:10973: result: yes
configure:11053: checking for struct stat.st_mtim.tv_sec in sys/stat.h
configure:11071: gcc -c -g -O2      conftest.c >&5
In file included from /usr/include/x86_64-linux-gnu/sys/stat.h:25,
                 from conftest.c:60:
/usr/include/features.h:187:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
  187 | # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
      |   ^~~~~~~
configure:11071: $? = 0
configure:11085: result: yes
configure:11088: checking for struct stat.st_mtim.tv_nsec in sys/stat.h
configure:11106: gcc -c -g -O2      conftest.c >&5
In file included from /usr/include/x86_64-linux-gnu/sys/stat.h:25,
                 from conftest.c:61:
/usr/include/features.h:187:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
  187 | # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
      |   ^~~~~~~
configure:11106: $? = 0
configure:11120: result: yes
configure:11195: updating cache ./config.cache
configure:11239: checking that generated files are newer than configure
configure:11245: result: done
configure:11277: creating ./config.status

## ---------------------- ##
## Running config.status. ##
## ---------------------- ##

This file was extended by gas config.status 2.37, which was
generated by GNU Autoconf 2.69.  Invocation command line was

  CONFIG_FILES    =
  CONFIG_HEADERS  =
  CONFIG_LINKS    =
  CONFIG_COMMANDS =
  $ ./config.status

on noah-tigerlake

config.status:1164: creating .gdbinit
config.status:1164: creating Makefile
config.status:1164: creating doc/Makefile
config.status:1164: creating po/Makefile.in
config.status:1164: creating config.h
config.status:1378: executing depfiles commands
config.status:1378: executing libtool commands
config.status:1378: executing default-1 commands
config.status:1378: executing default commands

## ---------------- ##
## Cache variables. ##
## ---------------- ##

ac_cv_build=x86_64-pc-linux-gnu
ac_cv_c_bigendian=no
ac_cv_c_compiler_gnu=yes
ac_cv_env_CC_set=set
ac_cv_env_CC_value=gcc
ac_cv_env_CFLAGS_set=set
ac_cv_env_CFLAGS_value='-g -O2    '
ac_cv_env_CPPFLAGS_set=
ac_cv_env_CPPFLAGS_value=
ac_cv_env_CPP_set=
ac_cv_env_CPP_value=
ac_cv_env_LDFLAGS_set=set
ac_cv_env_LDFLAGS_value=' '
ac_cv_env_LIBS_set=
ac_cv_env_LIBS_value=
ac_cv_env_YACC_set=set
ac_cv_env_YACC_value='bison -y'
ac_cv_env_YFLAGS_set=
ac_cv_env_YFLAGS_value=
ac_cv_env_build_alias_set=set
ac_cv_env_build_alias_value=x86_64-pc-linux-gnu
ac_cv_env_host_alias_set=set
ac_cv_env_host_alias_value=x86_64-pc-linux-gnu
ac_cv_env_target_alias_set=set
ac_cv_env_target_alias_value=x86_64-glibc-linux-gnu
ac_cv_func_strsignal=yes
ac_cv_have_decl_asprintf=yes
ac_cv_have_decl_mempcpy=yes
ac_cv_have_decl_stpcpy=yes
ac_cv_header_dlfcn_h=yes
ac_cv_header_inttypes_h=yes
ac_cv_header_memory_h=yes
ac_cv_header_minix_config_h=no
ac_cv_header_stdc=yes
ac_cv_header_stdint_h=yes
ac_cv_header_stdlib_h=yes
ac_cv_header_string_h=yes
ac_cv_header_strings_h=yes
ac_cv_header_sys_stat_h=yes
ac_cv_header_sys_types_h=yes
ac_cv_header_unistd_h=yes
ac_cv_header_windows_h=no
ac_cv_host=x86_64-pc-linux-gnu
ac_cv_lib_lex=-lfl
ac_cv_objext=o
ac_cv_path_EGREP='/usr/bin/grep -E'
ac_cv_path_FGREP='/usr/bin/grep -F'
ac_cv_path_GMSGFMT=/usr/bin/msgfmt
ac_cv_path_GREP=/usr/bin/grep
ac_cv_path_MSGFMT=/usr/bin/msgfmt
ac_cv_path_MSGMERGE=/usr/bin/msgmerge
ac_cv_path_SED=/usr/bin/sed
ac_cv_path_XGETTEXT=/usr/bin/xgettext
ac_cv_path_mkdir=/usr/bin/mkdir
ac_cv_prog_AR='ar --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so'
ac_cv_prog_AWK=gawk
ac_cv_prog_CC=gcc
ac_cv_prog_CPP='gcc -E'
ac_cv_prog_LEX=flex
ac_cv_prog_OBJDUMP=objdump
ac_cv_prog_RANLIB='ranlib --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so'
ac_cv_prog_YACC='bison -y'
ac_cv_prog_ac_ct_STRIP=strip
ac_cv_prog_cc_c89=
ac_cv_prog_cc_g=yes
ac_cv_prog_lex_root=lex.yy
ac_cv_prog_lex_yytext_pointer=yes
ac_cv_prog_make_make_set=yes
ac_cv_safe_to_define___extensions__=yes
ac_cv_search_dlsym=-ldl
ac_cv_sys_file_offset_bits=no
ac_cv_sys_largefile_CC=no
ac_cv_target=x86_64-glibc-linux-gnu
am_cv_CC_dependencies_compiler_type=gcc3
am_cv_make_support_nested_variables=yes
am_cv_prog_cc_c_o=yes
am_cv_val_LC_MESSAGES=yes
gas_cv_assert_ok=yes
gas_cv_decl_getopt_unistd_h=yes
gas_cv_decl_needed_environ=no
gas_cv_decl_needed_ffs=no
gas_cv_have_sys_stat_type_member_st_mtim_tv_nsec=yes
gas_cv_have_sys_stat_type_member_st_mtim_tv_sec=yes
gcc_cv_prog_cmp_skip='cmp --ignore-initial=16 $$f1 $$f2'
lt_cv_archive_cmds_need_lc=no
lt_cv_deplibs_check_method=pass_all
lt_cv_file_magic_cmd='$MAGIC_CMD'
lt_cv_file_magic_test_file=
lt_cv_ld_reload_flag=-r
lt_cv_nm_interface='BSD nm'
lt_cv_objdir=.libs
lt_cv_path_LD=ld
lt_cv_path_NM=nm
lt_cv_prog_compiler_c_o=yes
lt_cv_prog_compiler_pic_works=yes
lt_cv_prog_compiler_rtti_exceptions=no
lt_cv_prog_compiler_static_works=yes
lt_cv_prog_gnu_ld=yes
lt_cv_shlibpath_overrides_runpath=yes
lt_cv_sys_global_symbol_pipe='sed -n -e '\''s/^.*[
]\([ABCDGIRSTW][ABCDGIRSTW]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2
\2/p'\'''
lt_cv_sys_global_symbol_to_c_name_address='sed -n -e '\''s/^: \([^
]*\) $/  {\"\1\", (void *) 0},/p'\'' -e '\''s/^[ABCDGIRSTW]* \([^ ]*\)
\([^ ]*\)$/  {"\2", (void *) \&\2},/p'\'''
lt_cv_sys_global_symbol_to_c_name_address_lib_prefix='sed -n -e
'\''s/^: \([^ ]*\) $/  {\"\1\", (void *) 0},/p'\'' -e
'\''s/^[ABCDGIRSTW]* \([^ ]*\) \(lib[^ ]*\)$/  {"\2", (void *)
\&\2},/p'\'' -e '\''s/^[ABCDGIRSTW]* \([^ ]*\) \([^ ]*\)$/  {"lib\2",
(void *) \&\2},/p'\'''
lt_cv_sys_global_symbol_to_cdecl='sed -n -e '\''s/^T .* \(.*\)$/extern
int \1();/p'\'' -e '\''s/^[ABCDGIRSTW]* .* \(.*\)$/extern char
\1;/p'\'''
lt_cv_sys_max_cmd_len=1879296

## ----------------- ##
## Output variables. ##
## ----------------- ##

ACLOCAL='${SHELL} /some/were/build-many/src/binutils/missing aclocal-1.15'
AMDEPBACKSLASH='\'
AMDEP_FALSE='#'
AMDEP_TRUE=''
AMTAR='$${TAR-tar}'
AM_BACKSLASH='\'
AM_DEFAULT_V='$(AM_DEFAULT_VERBOSITY)'
AM_DEFAULT_VERBOSITY='1'
AM_V='$(V)'
AR='ar --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so'
AUTOCONF='${SHELL} /some/were/build-many/src/binutils/missing autoconf'
AUTOHEADER='${SHELL} /some/were/build-many/src/binutils/missing autoheader'
AUTOMAKE='${SHELL} /some/were/build-many/src/binutils/missing automake-1.15'
AWK='gawk'
CATALOGS=' es.gmo fi.gmo fr.gmo id.gmo ja.gmo ru.gmo rw.gmo sv.gmo tr.gmo uk.gmo zh_CN.gmo'
CATOBJEXT='.gmo'
CC='gcc'
CCDEPMODE='depmode=gcc3'
CFLAGS='-g -O2    '
CPP='gcc -E'
CPPFLAGS=''
CYGPATH_W='echo'
DATADIRNAME='share'
DEFS='-DHAVE_CONFIG_H'
DEPDIR='.deps'
DSYMUTIL=''
DUMPBIN=''
ECHO_C=''
ECHO_N='-n'
ECHO_T=''
EGREP='/usr/bin/grep -E'
EXEEXT=''
FGREP='/usr/bin/grep -F'
GDBINIT='.gdbinit'
GENCAT='gencat'
GENINSRC_NEVER_FALSE=''
GENINSRC_NEVER_TRUE='#'
GMSGFMT='/usr/bin/msgfmt'
GREP='/usr/bin/grep'
INCINTL=''
INSTALL_DATA='/usr/bin/install -c -m 644'
INSTALL_PROGRAM='/usr/bin/install -c'
INSTALL_SCRIPT='/usr/bin/install -c'
INSTALL_STRIP_PROGRAM='$(install_sh) -c -s'
INSTOBJEXT='.mo'
LARGEFILE_CPPFLAGS=''
LD='ld -m elf_x86_64'
LDFLAGS=' '
LEX='flex'
LEXLIB='-lfl'
LEX_OUTPUT_ROOT='lex.yy'
LIBINTL=''
LIBINTL_DEP=''
LIBM=''
LIBOBJS=''
LIBS='-ldl '
LIBTOOL='$(SHELL) $(top_builddir)/libtool'
LIPO=''
LN_S='ln -s'
LTLIBOBJS=''
MAINT='#'
MAINTAINER_MODE_FALSE=''
MAINTAINER_MODE_TRUE='#'
MAKEINFO='makeinfo --split-size=5000000'
MKDIR_P='/usr/bin/mkdir -p'
MKINSTALLDIRS='/some/were/build-many/src/binutils/gas/../mkinstalldirs'
MSGFMT='/usr/bin/msgfmt'
MSGMERGE='/usr/bin/msgmerge'
NM='nm'
NMEDIT=''
NO_WERROR='-Wno-error'
OBJDUMP='objdump'
OBJEXT='o'
OPCODES_LIB='../opcodes/libopcodes.la'
OTOOL64=''
OTOOL=''
PACKAGE='gas'
PACKAGE_BUGREPORT=''
PACKAGE_NAME='gas'
PACKAGE_STRING='gas 2.37'
PACKAGE_TARNAME='gas'
PACKAGE_URL=''
PACKAGE_VERSION='2.37'
PATH_SEPARATOR=':'
POSUB='po'
RANLIB='ranlib --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so --plugin /usr/lib/gcc/x86_64-linux-gnu/11/liblto_plugin.so'
SED='/usr/bin/sed'
SET_MAKE=''
SHELL='/bin/bash'
STRIP='strip'
USE_NLS='yes'
VERSION='2.37'
WARN_CFLAGS='-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 -Werror'
WARN_CFLAGS_FOR_BUILD='-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 -Werror'
WARN_WRITE_STRINGS='-Wwrite-strings'
XGETTEXT='/usr/bin/xgettext'
YACC='bison -y'
YFLAGS=''
ac_ct_CC=''
ac_ct_DUMPBIN=''
am__EXEEXT_FALSE=''
am__EXEEXT_TRUE='#'
am__fastdepCC_FALSE='#'
am__fastdepCC_TRUE=''
am__include='include'
am__isrc=' -I$(srcdir)'
am__leading_dot='.'
am__nodep='_no'
am__quote=''
am__tar='$${TAR-tar} chof - "$$tardir"'
am__untar='$${TAR-tar} xf -'
atof='ieee'
bindir='${exec_prefix}/bin'
build='x86_64-pc-linux-gnu'
build_alias='x86_64-pc-linux-gnu'
build_cpu='x86_64'
build_os='linux-gnu'
build_vendor='pc'
cgen_cpu_prefix=''
datadir='${datarootdir}'
datarootdir='${prefix}/share'
do_compare='cmp --ignore-initial=16 $$f1 $$f2'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
dvidir='${docdir}'
exec_prefix='${prefix}'
extra_objects=''
host='x86_64-pc-linux-gnu'
host_alias='x86_64-pc-linux-gnu'
host_cpu='x86_64'
host_os='linux-gnu'
host_vendor='pc'
htmldir='${docdir}'
includedir='${prefix}/include'
infodir='${datarootdir}/info'
install_sh='${SHELL} /some/were/build-many/src/binutils/install-sh'
install_tooldir='install-exec-tooldir'
libdir='${exec_prefix}/lib'
libexecdir='${exec_prefix}/libexec'
localedir='${datarootdir}/locale'
localstatedir='${prefix}/var'
mandir='${datarootdir}/man'
mkdir_p='$(MKDIR_P)'
obj_format='elf'
oldincludedir='/usr/include'
pdfdir='${docdir}'
prefix='/some/were/build-many/install/compilers/x86_64-linux-gnu'
program_transform_name='s&^&x86_64-glibc-linux-gnu-&'
psdir='${docdir}'
sbindir='${exec_prefix}/sbin'
sharedstatedir='${prefix}/com'
sysconfdir='${prefix}/etc'
target='x86_64-glibc-linux-gnu'
target_alias='x86_64-glibc-linux-gnu'
target_cpu='x86_64'
target_cpu_type='i386'
target_os='linux-gnu'
target_vendor='glibc'
te_file='linux'
zlibdir='-L$(top_builddir)/../zlib'
zlibinc='-I$(top_srcdir)/../zlib'

## ----------- ##
## confdefs.h. ##
## ----------- ##

/* confdefs.h */
#define PACKAGE_NAME "gas"
#define PACKAGE_TARNAME "gas"
#define PACKAGE_VERSION "2.37"
#define PACKAGE_STRING "gas 2.37"
#define PACKAGE_BUGREPORT ""
#define PACKAGE_URL ""
#define PACKAGE "gas"
#define VERSION "2.37"
#define STDC_HEADERS 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define __EXTENSIONS__ 1
#define _ALL_SOURCE 1
#define _GNU_SOURCE 1
#define _POSIX_PTHREAD_SEMANTICS 1
#define _TANDEM_SOURCE 1
#define HAVE_DLFCN_H 1
#define LT_OBJDIR ".libs/"
#define HAVE_DLFCN_H 1
#define ENABLE_CHECKING 1
#define DEFAULT_ARCH "x86_64"
#define DEFAULT_GENERATE_X86_RELAX_RELOCATIONS 1
#define DEFAULT_GENERATE_ELF_STT_COMMON 0
#define DEFAULT_GENERATE_BUILD_NOTES 0
#define DEFAULT_X86_USED_NOTE 1
#define DEFAULT_RISCV_ATTR 0
#define DEFAULT_MIPS_FIX_LOONGSON3_LLSC 0
#define DEFAULT_FLAG_COMPRESS_DEBUG 1
#define EMULATIONS  &i386elf,
#define DEFAULT_EMULATION "i386elf"
#define TARGET_ALIAS "x86_64-glibc-linux-gnu"
#define TARGET_CANONICAL "x86_64-glibc-linux-gnu"
#define TARGET_CPU "x86_64"
#define TARGET_VENDOR "glibc"
#define TARGET_OS "linux-gnu"
#define YYTEXT_POINTER 1
#define ENABLE_NLS 1
#define HAVE_MEMORY_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_UNISTD_H 1
#define CROSS_COMPILE 1
#define HAVE_STRSIGNAL 1
#define HAVE_LC_MESSAGES 1
#define HAVE_DECL_GETOPT 1
#define HAVE_DECL_ASPRINTF 1
#define HAVE_DECL_MEMPCPY 1
#define HAVE_DECL_STPCPY 1
#define HAVE_ST_MTIM_TV_SEC 1
#define HAVE_ST_MTIM_TV_NSEC 1

configure: exit 0
```
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 18:15                           ` Noah Goldstein via Libc-alpha
@ 2021-09-27 18:22                             ` Florian Weimer via Libc-alpha
  2021-09-27 18:34                               ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 18:22 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

> On Mon, Sep 27, 2021 at 1:11 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Noah Goldstein:
>>
>> > On Mon, Sep 27, 2021 at 12:56 PM Florian Weimer <fweimer@redhat.com> wrote:
>> >>
>> >> * Noah Goldstein:
>> >>
>> >> > $> cat
>> >> > build-many/logs/compilers/x86_64-linux-gnu/003-compilers-x86_64-linux-gnu-binutils-configure-log.txt
>> >>
>> >> There should be a config.log file in the binutils build directory (under
>> >> build-many/build/compilers).  I hope this file contains illuminating
>> >> data.
>> >
>> > Oh sorry. Here is the output of config.log
>> >
>> > ```
>> > $> cat build-many/build/compilers/x86_64-linux-gnu/binutils/config.log
>>
>> Hmm, is there a binutils/gas/config.log as well?
>
> Here is the dump. Thanks for the help!

So we clearly have:

> #define DEFAULT_GENERATE_ELF_STT_COMMON 0

And yet the build fails with the indicated error?

Could you check what gas/config.h contains?  This is very odd.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 18:22                             ` Florian Weimer via Libc-alpha
@ 2021-09-27 18:34                               ` Noah Goldstein via Libc-alpha
  2021-09-27 18:56                                 ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 18:34 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 47299 bytes --]

On Mon, Sep 27, 2021 at 1:22 PM Florian Weimer <fweimer@redhat.com> wrote:

> * Noah Goldstein:
>
> > On Mon, Sep 27, 2021 at 1:11 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Noah Goldstein:
> >>
> >> > On Mon, Sep 27, 2021 at 12:56 PM Florian Weimer <fweimer@redhat.com> wrote:
> >> >>
> >> >> * Noah Goldstein:
> >> >>
> >> >> > $> cat
> >> >> > build-many/logs/compilers/x86_64-linux-gnu/003-compilers-x86_64-linux-gnu-binutils-configure-log.txt
> >> >>
> >> >> There should be a config.log file in the binutils build directory (under
> >> >> build-many/build/compilers).  I hope this file contains illuminating
> >> >> data.
> >> >> data.
> >> >
> >> > Oh sorry. Here is the output of config.log
> >> >
> >> > ```
> >> > $> cat build-many/build/compilers/x86_64-linux-gnu/binutils/config.log
> >>
> >> Hmm, is there a binutils/gas/config.log as well?
> >
> > Here is the dump. Thanks for the help!
>
> So we clearly have:
>
> > #define DEFAULT_GENERATE_ELF_STT_COMMON 0
>
> And yet the build fails with the indicated error?
>
> Could you check what as/config.h contains?  This is very odd.
>

/* Define to 1 if you want to generate ELF common symbols with the STT_COMMON
   type by default. */
#define DEFAULT_GENERATE_ELF_STT_COMMON 0

is in gas/config.h
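
For anyone who wants to repeat this check across several build directories, here is a minimal sketch in Python. The helper name `find_define` is hypothetical, and the throwaway header stands in for the real gas/config.h path; neither is part of binutils or build-many-glibcs.py.

```python
import re
import tempfile
from pathlib import Path


def find_define(header_path, macro):
    """Return the value of the last '#define <macro>' in header_path, or None."""
    # Hypothetical helper: matches '#define NAME value' with optional spacing.
    pattern = re.compile(r"^\s*#\s*define\s+" + re.escape(macro) + r"\b[ \t]*(.*)$")
    value = None
    for line in Path(header_path).read_text().splitlines():
        match = pattern.match(line)
        if match:
            value = match.group(1).strip()  # a later definition overrides an earlier one
    return value


# Demo on a throwaway header; point the path at gas/config.h in a real build tree.
with tempfile.TemporaryDirectory() as tmp:
    header = Path(tmp) / "config.h"
    header.write_text("#define DEFAULT_GENERATE_ELF_STT_COMMON 0\n")
    print(find_define(header, "DEFAULT_GENERATE_ELF_STT_COMMON"))
```

The function returns the raw text after the macro name, so a result of `None` means the macro is not defined at all, which is a different situation from it being defined as 0.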

I have included a tar file of all logs for the build.

Full command / stdout was:

```
rm -rf build-many; \
python3 src/glibc/scripts/build-many-glibcs.py build-many/ checkout gcc-vcs-11; \
echo "Host Libraries"; \
python3 src/glibc/scripts/build-many-glibcs.py build-many host-libraries --keep=all; \
echo "Compilers"; \
python3 src/glibc/scripts/build-many-glibcs.py build-many compilers x86_64-linux-gnu --keep=all; \
python3 src/glibc/scripts/build-many-glibcs.py build-many glibcs --keep=all
configure.ac:83: installing 'build-aux/compile'
configure.ac:46: installing 'build-aux/config.guess'
configure.ac:46: installing 'build-aux/config.sub'
configure.ac:26: installing 'build-aux/install-sh'
configure.ac:26: installing 'build-aux/missing'
Makefile.am: installing './INSTALL'
Makefile.am: installing 'build-aux/depcomp'
Makefile.am:32: installing 'build-aux/mdate-sh'
doc/Makefrag.am:106: warning: user target '$(srcdir)/doc/version.texi' defined here ...
Makefile.am:155:   'doc/Makefrag.am' included from here
/usr/share/automake-1.16/am/texi-vers.am: ... overrides Automake target '$(srcdir)/doc/version.texi' defined here
Makefile.am:32: installing 'build-aux/texinfo.tex'
parallel-tests: installing 'build-aux/test-driver'
configure.ac:25: installing 'build-aux/compile'
configure.ac:9: installing 'build-aux/config.guess'
configure.ac:9: installing 'build-aux/config.sub'
configure.ac:14: installing 'build-aux/install-sh'
configure.ac:14: installing 'build-aux/missing'
Makefile.am: installing './INSTALL'
Makefile.am: installing 'build-aux/depcomp'
configure.ac: installing 'build-aux/ylwrap'
parallel-tests: installing 'build-aux/test-driver'
Host Libraries
PASS: host-libraries gmp rm
PASS: host-libraries gmp mkdir
PASS: host-libraries gmp configure
PASS: host-libraries gmp build
PASS: host-libraries gmp check
PASS: host-libraries gmp install
PASS: host-libraries mpfr rm
PASS: host-libraries mpfr mkdir
PASS: host-libraries mpfr configure
PASS: host-libraries mpfr build
PASS: host-libraries mpfr check
PASS: host-libraries mpfr install
PASS: host-libraries mpc rm
PASS: host-libraries mpc mkdir
PASS: host-libraries mpc configure
PASS: host-libraries mpc build
PASS: host-libraries mpc check
PASS: host-libraries mpc install
PASS: host-libraries done
Compilers
PASS: compilers-x86_64-linux-gnu check-host-libraries
PASS: compilers-x86_64-linux-gnu binutils rm
PASS: compilers-x86_64-linux-gnu binutils mkdir
PASS: compilers-x86_64-linux-gnu binutils configure
FAIL: compilers-x86_64-linux-gnu binutils build
UNRESOLVED: compilers-x86_64-linux-gnu binutils install
UNRESOLVED: compilers-x86_64-linux-gnu linux rm
UNRESOLVED: compilers-x86_64-linux-gnu linux mkdir
UNRESOLVED: compilers-x86_64-linux-gnu linux install-headers
UNRESOLVED: compilers-x86_64-linux-gnu gcc-first rm
UNRESOLVED: compilers-x86_64-linux-gnu gcc-first mkdir
UNRESOLVED: compilers-x86_64-linux-gnu gcc-first configure
UNRESOLVED: compilers-x86_64-linux-gnu gcc-first build
UNRESOLVED: compilers-x86_64-linux-gnu gcc-first install
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu rm
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu mkdir
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu configure
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu build
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu install
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu mkdir-lib
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu-x32 rm
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu-x32 mkdir
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu-x32 configure
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu-x32 build
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu-x32 install
UNRESOLVED: compilers-x86_64-linux-gnu glibc x86_64-linux-gnu-x32 mkdir-lib
UNRESOLVED: compilers-x86_64-linux-gnu glibc i686-linux-gnu rm
UNRESOLVED: compilers-x86_64-linux-gnu glibc i686-linux-gnu mkdir
UNRESOLVED: compilers-x86_64-linux-gnu glibc i686-linux-gnu configure
UNRESOLVED: compilers-x86_64-linux-gnu glibc i686-linux-gnu build
UNRESOLVED: compilers-x86_64-linux-gnu glibc i686-linux-gnu install
UNRESOLVED: compilers-x86_64-linux-gnu glibc i686-linux-gnu mkdir-lib
UNRESOLVED: compilers-x86_64-linux-gnu gcc rm
UNRESOLVED: compilers-x86_64-linux-gnu gcc mkdir
UNRESOLVED: compilers-x86_64-linux-gnu gcc configure
UNRESOLVED: compilers-x86_64-linux-gnu gcc build
UNRESOLVED: compilers-x86_64-linux-gnu gcc install
UNRESOLVED: compilers-x86_64-linux-gnu done
FAIL: glibcs-arm-linux-gnueabi check-compilers
FAIL: glibcs-aarch64-linux-gnu check-compilers
FAIL: glibcs-alpha-linux-gnu check-compilers
FAIL: glibcs-arceb-linux-gnu check-compilers
FAIL: glibcs-aarch64_be-linux-gnu check-compilers
FAIL: glibcs-aarch64-linux-gnu-disable-multi-arch check-compilers
FAIL: glibcs-arc-linux-gnu check-compilers
FAIL: glibcs-arc-linux-gnuhf check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabi rm
UNRESOLVED: glibcs-alpha-linux-gnu rm
UNRESOLVED: glibcs-aarch64_be-linux-gnu rm
UNRESOLVED: glibcs-arc-linux-gnu rm
UNRESOLVED: glibcs-arceb-linux-gnu rm
UNRESOLVED: glibcs-aarch64-linux-gnu-disable-multi-arch rm
UNRESOLVED: glibcs-aarch64-linux-gnu rm
UNRESOLVED: glibcs-arm-linux-gnueabi mkdir
UNRESOLVED: glibcs-alpha-linux-gnu mkdir
UNRESOLVED: glibcs-arc-linux-gnu mkdir
UNRESOLVED: glibcs-arc-linux-gnuhf rm
UNRESOLVED: glibcs-arceb-linux-gnu mkdir
UNRESOLVED: glibcs-aarch64_be-linux-gnu mkdir
UNRESOLVED: glibcs-aarch64-linux-gnu-disable-multi-arch mkdir
UNRESOLVED: glibcs-arc-linux-gnuhf mkdir
UNRESOLVED: glibcs-aarch64-linux-gnu mkdir
UNRESOLVED: glibcs-arm-linux-gnueabi configure
UNRESOLVED: glibcs-arceb-linux-gnu configure
UNRESOLVED: glibcs-arm-linux-gnueabi build
UNRESOLVED: glibcs-arceb-linux-gnu build
UNRESOLVED: glibcs-arc-linux-gnu configure
UNRESOLVED: glibcs-arc-linux-gnu build
UNRESOLVED: glibcs-arm-linux-gnueabi install
UNRESOLVED: glibcs-aarch64-linux-gnu-disable-multi-arch configure
UNRESOLVED: glibcs-arceb-linux-gnu install
UNRESOLVED: glibcs-alpha-linux-gnu configure
UNRESOLVED: glibcs-arc-linux-gnu install
UNRESOLVED: glibcs-alpha-linux-gnu build
UNRESOLVED: glibcs-aarch64-linux-gnu-disable-multi-arch build
UNRESOLVED: glibcs-arm-linux-gnueabi mkdir-lib
UNRESOLVED: glibcs-arceb-linux-gnu mkdir-lib
UNRESOLVED: glibcs-alpha-linux-gnu install
UNRESOLVED: glibcs-aarch64-linux-gnu configure
UNRESOLVED: glibcs-aarch64-linux-gnu-disable-multi-arch install
UNRESOLVED: glibcs-arc-linux-gnuhf configure
UNRESOLVED: glibcs-aarch64_be-linux-gnu configure
UNRESOLVED: glibcs-arc-linux-gnu mkdir-lib
UNRESOLVED: glibcs-aarch64-linux-gnu build
UNRESOLVED: glibcs-arceb-linux-gnu check
UNRESOLVED: glibcs-arc-linux-gnuhf build
UNRESOLVED: glibcs-arm-linux-gnueabi check
PASS: glibcs-arm-linux-gnueabi save-logs
UNRESOLVED: glibcs-aarch64_be-linux-gnu build
UNRESOLVED: glibcs-aarch64-linux-gnu install
UNRESOLVED: glibcs-arc-linux-gnu check
FAIL: glibcs-arm-linux-gnueabi-v4t check-compilers
PASS: glibcs-arceb-linux-gnu save-logs
UNRESOLVED: glibcs-alpha-linux-gnu mkdir-lib
UNRESOLVED: glibcs-aarch64-linux-gnu-disable-multi-arch mkdir-lib
UNRESOLVED: glibcs-arc-linux-gnuhf install
FAIL: glibcs-arm-linux-gnueabihf check-compilers
PASS: glibcs-arc-linux-gnu save-logs
UNRESOLVED: glibcs-arm-linux-gnueabi-v4t rm
UNRESOLVED: glibcs-alpha-linux-gnu check
UNRESOLVED: glibcs-aarch64-linux-gnu mkdir-lib
UNRESOLVED: glibcs-aarch64-linux-gnu-disable-multi-arch check
FAIL: glibcs-arm-linux-gnueabihf-thumb check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabihf rm
UNRESOLVED: glibcs-aarch64_be-linux-gnu install
UNRESOLVED: glibcs-arm-linux-gnueabihf mkdir
UNRESOLVED: glibcs-arm-linux-gnueabihf-thumb rm
PASS: glibcs-alpha-linux-gnu save-logs
UNRESOLVED: glibcs-arm-linux-gnueabi-v4t mkdir
UNRESOLVED: glibcs-arc-linux-gnuhf mkdir-lib
PASS: glibcs-aarch64-linux-gnu-disable-multi-arch save-logs
UNRESOLVED: glibcs-aarch64-linux-gnu check
UNRESOLVED: glibcs-aarch64_be-linux-gnu mkdir-lib
UNRESOLVED: glibcs-arc-linux-gnuhf check
UNRESOLVED: glibcs-arm-linux-gnueabihf-thumb mkdir
UNRESOLVED: glibcs-aarch64_be-linux-gnu check
PASS: glibcs-arc-linux-gnuhf save-logs
FAIL: glibcs-arm-linux-gnueabihf-v7a check-compilers
PASS: glibcs-aarch64-linux-gnu save-logs
FAIL: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch check-compilers
PASS: glibcs-aarch64_be-linux-gnu save-logs
FAIL: glibcs-armeb-linux-gnueabi check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch rm
UNRESOLVED: glibcs-armeb-linux-gnueabi rm
FAIL: glibcs-armeb-linux-gnueabi-be8 check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch mkdir
FAIL: glibcs-armeb-linux-gnueabihf check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a rm
UNRESOLVED: glibcs-armeb-linux-gnueabi mkdir
UNRESOLVED: glibcs-armeb-linux-gnueabi-be8 rm
UNRESOLVED: glibcs-armeb-linux-gnueabihf rm
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a mkdir
UNRESOLVED: glibcs-arm-linux-gnueabihf configure
UNRESOLVED: glibcs-armeb-linux-gnueabi-be8 mkdir
UNRESOLVED: glibcs-armeb-linux-gnueabihf mkdir
UNRESOLVED: glibcs-arm-linux-gnueabihf build
UNRESOLVED: glibcs-arm-linux-gnueabihf install
UNRESOLVED: glibcs-armeb-linux-gnueabi-be8 configure
UNRESOLVED: glibcs-arm-linux-gnueabi-v4t configure
UNRESOLVED: glibcs-armeb-linux-gnueabi-be8 build
UNRESOLVED: glibcs-arm-linux-gnueabihf mkdir-lib
UNRESOLVED: glibcs-armeb-linux-gnueabihf configure
UNRESOLVED: glibcs-arm-linux-gnueabi-v4t build
UNRESOLVED: glibcs-armeb-linux-gnueabi-be8 install
UNRESOLVED: glibcs-arm-linux-gnueabihf-thumb configure
UNRESOLVED: glibcs-armeb-linux-gnueabihf build
UNRESOLVED: glibcs-arm-linux-gnueabihf check
UNRESOLVED: glibcs-armeb-linux-gnueabi-be8 mkdir-lib
UNRESOLVED: glibcs-armeb-linux-gnueabihf install
UNRESOLVED: glibcs-arm-linux-gnueabihf-thumb build
UNRESOLVED: glibcs-armeb-linux-gnueabi-be8 check
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch configure
PASS: glibcs-arm-linux-gnueabihf save-logs
PASS: glibcs-armeb-linux-gnueabi-be8 save-logs
UNRESOLVED: glibcs-arm-linux-gnueabi-v4t install
UNRESOLVED: glibcs-arm-linux-gnueabihf-thumb install
UNRESOLVED: glibcs-armeb-linux-gnueabi configure
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch build
FAIL: glibcs-csky-linux-gnuabiv2 check-compilers
FAIL: glibcs-armeb-linux-gnueabihf-be8 check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabi-v4t mkdir-lib
UNRESOLVED: glibcs-armeb-linux-gnueabi build
UNRESOLVED: glibcs-arm-linux-gnueabihf-thumb mkdir-lib
UNRESOLVED: glibcs-armeb-linux-gnueabihf mkdir-lib
UNRESOLVED: glibcs-arm-linux-gnueabi-v4t check
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a configure
UNRESOLVED: glibcs-arm-linux-gnueabihf-thumb check
PASS: glibcs-arm-linux-gnueabi-v4t save-logs
UNRESOLVED: glibcs-csky-linux-gnuabiv2 rm
UNRESOLVED: glibcs-armeb-linux-gnueabihf-be8 rm
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch install
UNRESOLVED: glibcs-armeb-linux-gnueabihf check
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a build
PASS: glibcs-arm-linux-gnueabihf-thumb save-logs
FAIL: glibcs-csky-linux-gnuabiv2-soft check-compilers
PASS: glibcs-armeb-linux-gnueabihf save-logs
UNRESOLVED: glibcs-armeb-linux-gnueabi install
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch mkdir-lib
UNRESOLVED: glibcs-csky-linux-gnuabiv2 mkdir
UNRESOLVED: glibcs-csky-linux-gnuabiv2-soft rm
UNRESOLVED: glibcs-armeb-linux-gnueabihf-be8 mkdir
FAIL: glibcs-i486-linux-gnu check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a install
FAIL: glibcs-hppa-linux-gnu check-compilers
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch check
UNRESOLVED: glibcs-armeb-linux-gnueabi mkdir-lib
PASS: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch save-logs
UNRESOLVED: glibcs-hppa-linux-gnu rm
FAIL: glibcs-i586-linux-gnu check-compilers
UNRESOLVED: glibcs-csky-linux-gnuabiv2-soft mkdir
UNRESOLVED: glibcs-i486-linux-gnu rm
UNRESOLVED: glibcs-hppa-linux-gnu mkdir
UNRESOLVED: glibcs-armeb-linux-gnueabi check
UNRESOLVED: glibcs-i586-linux-gnu rm
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a mkdir-lib
PASS: glibcs-armeb-linux-gnueabi save-logs
UNRESOLVED: glibcs-i486-linux-gnu mkdir
FAIL: glibcs-i686-gnu check-compilers
UNRESOLVED: glibcs-i586-linux-gnu mkdir
UNRESOLVED: glibcs-arm-linux-gnueabihf-v7a check
PASS: glibcs-arm-linux-gnueabihf-v7a save-logs
UNRESOLVED: glibcs-i686-gnu rm
UNRESOLVED: glibcs-csky-linux-gnuabiv2 configure
FAIL: glibcs-i686-linux-gnu check-compilers
UNRESOLVED: glibcs-csky-linux-gnuabiv2 build
UNRESOLVED: glibcs-armeb-linux-gnueabihf-be8 configure
UNRESOLVED: glibcs-i686-gnu mkdir
UNRESOLVED: glibcs-csky-linux-gnuabiv2 install
UNRESOLVED: glibcs-armeb-linux-gnueabihf-be8 build
UNRESOLVED: glibcs-i686-linux-gnu rm
UNRESOLVED: glibcs-csky-linux-gnuabiv2 mkdir-lib
UNRESOLVED: glibcs-armeb-linux-gnueabihf-be8 install
UNRESOLVED: glibcs-csky-linux-gnuabiv2 check
UNRESOLVED: glibcs-i686-linux-gnu mkdir
PASS: glibcs-csky-linux-gnuabiv2 save-logs
UNRESOLVED: glibcs-armeb-linux-gnueabihf-be8 mkdir-lib
UNRESOLVED: glibcs-hppa-linux-gnu configure
FAIL: glibcs-i686-linux-gnu-disable-multi-arch check-compilers
UNRESOLVED: glibcs-i486-linux-gnu configure
UNRESOLVED: glibcs-i486-linux-gnu build
UNRESOLVED: glibcs-hppa-linux-gnu build
UNRESOLVED: glibcs-armeb-linux-gnueabihf-be8 check
UNRESOLVED: glibcs-csky-linux-gnuabiv2-soft configure
UNRESOLVED: glibcs-i686-linux-gnu-disable-multi-arch rm
UNRESOLVED: glibcs-csky-linux-gnuabiv2-soft build
PASS: glibcs-armeb-linux-gnueabihf-be8 save-logs
UNRESOLVED: glibcs-i486-linux-gnu install
UNRESOLVED: glibcs-csky-linux-gnuabiv2-soft install
UNRESOLVED: glibcs-i686-gnu configure
UNRESOLVED: glibcs-i586-linux-gnu configure
UNRESOLVED: glibcs-hppa-linux-gnu install
FAIL: glibcs-i686-linux-gnu-static-pie check-compilers
UNRESOLVED: glibcs-i686-linux-gnu-disable-multi-arch mkdir
UNRESOLVED: glibcs-i586-linux-gnu build
UNRESOLVED: glibcs-i686-gnu build
UNRESOLVED: glibcs-i486-linux-gnu mkdir-lib
UNRESOLVED: glibcs-csky-linux-gnuabiv2-soft mkdir-lib
UNRESOLVED: glibcs-i686-linux-gnu-static-pie rm
UNRESOLVED: glibcs-i686-gnu install
UNRESOLVED: glibcs-i586-linux-gnu install
UNRESOLVED: glibcs-i486-linux-gnu check
UNRESOLVED: glibcs-hppa-linux-gnu mkdir-lib
UNRESOLVED: glibcs-csky-linux-gnuabiv2-soft check
UNRESOLVED: glibcs-i686-gnu mkdir-lib
UNRESOLVED: glibcs-i686-linux-gnu-static-pie mkdir
UNRESOLVED: glibcs-i686-gnu check
PASS: glibcs-i486-linux-gnu save-logs
PASS: glibcs-i686-gnu save-logs
PASS: glibcs-csky-linux-gnuabiv2-soft save-logs
UNRESOLVED: glibcs-hppa-linux-gnu check
UNRESOLVED: glibcs-i686-linux-gnu-disable-multi-arch configure
UNRESOLVED: glibcs-i586-linux-gnu mkdir-lib
FAIL: glibcs-m68k-linux-gnu-coldfire check-compilers
PASS: glibcs-hppa-linux-gnu save-logs
UNRESOLVED: glibcs-i686-linux-gnu configure
FAIL: glibcs-m68k-linux-gnu check-compilers
FAIL: glibcs-ia64-linux-gnu check-compilers
UNRESOLVED: glibcs-i686-linux-gnu-disable-multi-arch build
UNRESOLVED: glibcs-i586-linux-gnu check
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire rm
FAIL: glibcs-m68k-linux-gnu-coldfire-soft check-compilers
UNRESOLVED: glibcs-i686-linux-gnu-disable-multi-arch install
UNRESOLVED: glibcs-i686-linux-gnu build
UNRESOLVED: glibcs-m68k-linux-gnu rm
UNRESOLVED: glibcs-ia64-linux-gnu rm
PASS: glibcs-i586-linux-gnu save-logs
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire-soft rm
UNRESOLVED: glibcs-i686-linux-gnu-disable-multi-arch mkdir-lib
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire mkdir
UNRESOLVED: glibcs-m68k-linux-gnu mkdir
FAIL: glibcs-microblaze-linux-gnu check-compilers
UNRESOLVED: glibcs-i686-linux-gnu install
UNRESOLVED: glibcs-i686-linux-gnu-disable-multi-arch check
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire-soft mkdir
UNRESOLVED: glibcs-ia64-linux-gnu mkdir
UNRESOLVED: glibcs-i686-linux-gnu mkdir-lib
PASS: glibcs-i686-linux-gnu-disable-multi-arch save-logs
UNRESOLVED: glibcs-microblaze-linux-gnu rm
UNRESOLVED: glibcs-i686-linux-gnu check
FAIL: glibcs-microblazeel-linux-gnu check-compilers
UNRESOLVED: glibcs-microblaze-linux-gnu mkdir
PASS: glibcs-i686-linux-gnu save-logs
UNRESOLVED: glibcs-i686-linux-gnu-static-pie configure
UNRESOLVED: glibcs-m68k-linux-gnu configure
UNRESOLVED: glibcs-microblazeel-linux-gnu rm
UNRESOLVED: glibcs-m68k-linux-gnu build
FAIL: glibcs-mips-linux-gnu check-compilers
UNRESOLVED: glibcs-i686-linux-gnu-static-pie build
UNRESOLVED: glibcs-m68k-linux-gnu install
UNRESOLVED: glibcs-microblazeel-linux-gnu mkdir
UNRESOLVED: glibcs-m68k-linux-gnu mkdir-lib
UNRESOLVED: glibcs-mips-linux-gnu rm
UNRESOLVED: glibcs-microblaze-linux-gnu configure
UNRESOLVED: glibcs-i686-linux-gnu-static-pie install
UNRESOLVED: glibcs-microblaze-linux-gnu build
UNRESOLVED: glibcs-m68k-linux-gnu check
UNRESOLVED: glibcs-mips-linux-gnu mkdir
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire-soft configure
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire configure
PASS: glibcs-m68k-linux-gnu save-logs
UNRESOLVED: glibcs-i686-linux-gnu-static-pie mkdir-lib
UNRESOLVED: glibcs-ia64-linux-gnu configure
FAIL: glibcs-mips-linux-gnu-nan2008 check-compilers
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire-soft build
UNRESOLVED: glibcs-ia64-linux-gnu build
UNRESOLVED: glibcs-microblaze-linux-gnu install
UNRESOLVED: glibcs-mips-linux-gnu-nan2008 rm
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire build
UNRESOLVED: glibcs-i686-linux-gnu-static-pie check
UNRESOLVED: glibcs-ia64-linux-gnu install
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire-soft install
PASS: glibcs-i686-linux-gnu-static-pie save-logs
UNRESOLVED: glibcs-microblaze-linux-gnu mkdir-lib
UNRESOLVED: glibcs-mips-linux-gnu-nan2008 mkdir
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire install
UNRESOLVED: glibcs-mips-linux-gnu configure
FAIL: glibcs-mips-linux-gnu-nan2008-soft check-compilers
UNRESOLVED: glibcs-mips-linux-gnu build
UNRESOLVED: glibcs-microblaze-linux-gnu check
UNRESOLVED: glibcs-ia64-linux-gnu mkdir-lib
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire-soft mkdir-lib
UNRESOLVED: glibcs-microblazeel-linux-gnu configure
UNRESOLVED: glibcs-mips-linux-gnu install
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire-soft check
PASS: glibcs-microblaze-linux-gnu save-logs
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire mkdir-lib
UNRESOLVED: glibcs-mips-linux-gnu-nan2008-soft rm
UNRESOLVED: glibcs-microblazeel-linux-gnu build
UNRESOLVED: glibcs-ia64-linux-gnu check
PASS: glibcs-m68k-linux-gnu-coldfire-soft save-logs
UNRESOLVED: glibcs-m68k-linux-gnu-coldfire check
FAIL: glibcs-mips-linux-gnu-soft check-compilers
FAIL: glibcs-mips64-linux-gnu-n32 check-compilers
UNRESOLVED: glibcs-mips-linux-gnu-nan2008-soft mkdir
PASS: glibcs-ia64-linux-gnu save-logs
PASS: glibcs-m68k-linux-gnu-coldfire save-logs
UNRESOLVED: glibcs-mips-linux-gnu mkdir-lib
UNRESOLVED: glibcs-microblazeel-linux-gnu install
UNRESOLVED: glibcs-microblazeel-linux-gnu mkdir-lib
FAIL: glibcs-mips64-linux-gnu-n32-nan2008 check-compilers
FAIL: glibcs-mips64-linux-gnu-n32-nan2008-soft check-compilers
UNRESOLVED: glibcs-mips-linux-gnu check
UNRESOLVED: glibcs-mips-linux-gnu-soft rm
UNRESOLVED: glibcs-mips64-linux-gnu-n32 rm
UNRESOLVED: glibcs-microblazeel-linux-gnu check
PASS: glibcs-microblazeel-linux-gnu save-logs
UNRESOLVED: glibcs-mips-linux-gnu-soft mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008 rm
PASS: glibcs-mips-linux-gnu save-logs
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008-soft rm
FAIL: glibcs-mips64-linux-gnu-n32-soft check-compilers
FAIL: glibcs-mips64-linux-gnu-n64 check-compilers
UNRESOLVED: glibcs-mips64-linux-gnu-n32-soft rm
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008-soft mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n32 mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008 mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n64 rm
UNRESOLVED: glibcs-mips64-linux-gnu-n32-soft mkdir
UNRESOLVED: glibcs-mips-linux-gnu-nan2008 configure
UNRESOLVED: glibcs-mips-linux-gnu-nan2008 build
UNRESOLVED: glibcs-mips-linux-gnu-nan2008-soft configure
UNRESOLVED: glibcs-mips-linux-gnu-nan2008 install
UNRESOLVED: glibcs-mips64-linux-gnu-n64 mkdir
UNRESOLVED: glibcs-mips-linux-gnu-nan2008 mkdir-lib
UNRESOLVED: glibcs-mips-linux-gnu-nan2008-soft build
UNRESOLVED: glibcs-mips-linux-gnu-nan2008 check
PASS: glibcs-mips-linux-gnu-nan2008 save-logs
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008 configure
FAIL: glibcs-mips64-linux-gnu-n64-nan2008 check-compilers
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008 build
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008 rm
UNRESOLVED: glibcs-mips-linux-gnu-nan2008-soft install
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008 install
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008 mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008 mkdir-lib
UNRESOLVED: glibcs-mips-linux-gnu-nan2008-soft mkdir-lib
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008 check
UNRESOLVED: glibcs-mips-linux-gnu-soft configure
UNRESOLVED: glibcs-mips-linux-gnu-nan2008-soft check
PASS: glibcs-mips64-linux-gnu-n32-nan2008 save-logs
PASS: glibcs-mips-linux-gnu-nan2008-soft save-logs
UNRESOLVED: glibcs-mips-linux-gnu-soft build
FAIL: glibcs-mips64-linux-gnu-n64-nan2008-soft check-compilers
UNRESOLVED: glibcs-mips64-linux-gnu-n32 configure
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008 configure
FAIL: glibcs-mips64-linux-gnu-n64-soft check-compilers
UNRESOLVED: glibcs-mips64-linux-gnu-n64 configure
UNRESOLVED: glibcs-mips64-linux-gnu-n64 build
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008-soft configure
UNRESOLVED: glibcs-mips64-linux-gnu-n32-soft configure
UNRESOLVED: glibcs-mips64-linux-gnu-n64-soft rm
UNRESOLVED: glibcs-mips64-linux-gnu-n32 build
UNRESOLVED: glibcs-mips-linux-gnu-soft install
UNRESOLVED: glibcs-mips64-linux-gnu-n64 install
UNRESOLVED: glibcs-mips64-linux-gnu-n64-soft mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008 build
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008-soft rm
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008-soft build
UNRESOLVED: glibcs-mips64-linux-gnu-n32-soft build
UNRESOLVED: glibcs-mips64-linux-gnu-n64 mkdir-lib
UNRESOLVED: glibcs-mips64-linux-gnu-n64 check
UNRESOLVED: glibcs-mips64-linux-gnu-n32-soft install
UNRESOLVED: glibcs-mips-linux-gnu-soft mkdir-lib
PASS: glibcs-mips64-linux-gnu-n64 save-logs
UNRESOLVED: glibcs-mips64-linux-gnu-n32 install
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008-soft install
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008 install
UNRESOLVED: glibcs-mips64-linux-gnu-n32 mkdir-lib
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008-soft mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n32-soft mkdir-lib
FAIL: glibcs-mips64el-linux-gnu-n32 check-compilers
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008-soft mkdir-lib
UNRESOLVED: glibcs-mips64-linux-gnu-n32 check
UNRESOLVED: glibcs-mips-linux-gnu-soft check
UNRESOLVED: glibcs-mips64-linux-gnu-n32-nan2008-soft check
PASS: glibcs-mips64-linux-gnu-n32 save-logs
PASS: glibcs-mips-linux-gnu-soft save-logs
PASS: glibcs-mips64-linux-gnu-n32-nan2008-soft save-logs
FAIL: glibcs-mips64el-linux-gnu-n32-nan2008-soft check-compilers
UNRESOLVED: glibcs-mips64-linux-gnu-n32-soft check
FAIL: glibcs-mips64el-linux-gnu-n32-soft check-compilers
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008 mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n32 rm
FAIL: glibcs-mips64el-linux-gnu-n32-nan2008 check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-soft rm
PASS: glibcs-mips64-linux-gnu-n32-soft save-logs
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008 rm
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008-soft rm
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008 check
UNRESOLVED: glibcs-mips64el-linux-gnu-n32 mkdir
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008 mkdir
PASS: glibcs-mips64-linux-gnu-n64-nan2008 save-logs
FAIL: glibcs-mips64el-linux-gnu-n64 check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-soft mkdir
FAIL: glibcs-mips64el-linux-gnu-n64-nan2008 check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n64 rm
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008-soft mkdir
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008 rm
UNRESOLVED: glibcs-mips64el-linux-gnu-n64 mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n64-soft configure
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008 mkdir
UNRESOLVED: glibcs-mips64-linux-gnu-n64-soft build
UNRESOLVED: glibcs-mips64-linux-gnu-n64-soft install
UNRESOLVED: glibcs-mips64el-linux-gnu-n64 configure
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008 configure
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008-soft configure
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008 build
UNRESOLVED: glibcs-mips64-linux-gnu-n64-soft mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n64 build
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008 install
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008 configure
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008-soft build
UNRESOLVED: glibcs-mips64-linux-gnu-n64-soft check
UNRESOLVED: glibcs-mips64el-linux-gnu-n64 install
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008 build
PASS: glibcs-mips64-linux-gnu-n64-soft save-logs
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008 mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-soft configure
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008-soft install
UNRESOLVED: glibcs-mips64el-linux-gnu-n32 configure
UNRESOLVED: glibcs-mips64el-linux-gnu-n32 build
FAIL: glibcs-mips64el-linux-gnu-n64-nan2008-soft check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008-soft configure
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008-soft mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008 check
UNRESOLVED: glibcs-mips64el-linux-gnu-n32 install
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-soft build
PASS: glibcs-mips64el-linux-gnu-n64-nan2008 save-logs
UNRESOLVED: glibcs-mips64el-linux-gnu-n64 mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008 install
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008-soft rm
UNRESOLVED: glibcs-mips64-linux-gnu-n64-nan2008-soft check
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008-soft build
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008 mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n32 mkdir-lib
FAIL: glibcs-mips64el-linux-gnu-n64-soft check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n64 check
UNRESOLVED: glibcs-mips64el-linux-gnu-n32 check
PASS: glibcs-mips64-linux-gnu-n64-nan2008-soft save-logs
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008-soft mkdir
PASS: glibcs-mips64el-linux-gnu-n64 save-logs
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-soft install
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008-soft install
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008 check
FAIL: glibcs-mipsel-linux-gnu-nan2008 check-compilers
PASS: glibcs-mips64el-linux-gnu-n32 save-logs
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008 rm
FAIL: glibcs-mipsel-linux-gnu check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-soft rm
PASS: glibcs-mips64el-linux-gnu-n32-nan2008 save-logs
UNRESOLVED: glibcs-mipsel-linux-gnu rm
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008 mkdir
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-soft mkdir-lib
FAIL: glibcs-mipsel-linux-gnu-nan2008-soft check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008-soft mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-soft check
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-soft mkdir
FAIL: glibcs-mipsel-linux-gnu-soft check-compilers
UNRESOLVED: glibcs-mips64el-linux-gnu-n32-nan2008-soft check
PASS: glibcs-mips64el-linux-gnu-n32-soft save-logs
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008-soft rm
UNRESOLVED: glibcs-mipsel-linux-gnu mkdir
UNRESOLVED: glibcs-mipsel-linux-gnu-soft rm
FAIL: glibcs-mipsisa32r6el-linux-gnu check-compilers
PASS: glibcs-mips64el-linux-gnu-n32-nan2008-soft save-logs
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008-soft configure
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008-soft mkdir
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008-soft build
FAIL: glibcs-mipsisa64r6el-linux-gnu-n32 check-compilers
UNRESOLVED: glibcs-mipsel-linux-gnu-soft mkdir
UNRESOLVED: glibcs-mipsisa32r6el-linux-gnu rm
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008-soft install
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008 configure
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n32 rm
UNRESOLVED: glibcs-mipsisa32r6el-linux-gnu mkdir
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-soft configure
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008 build
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n32 mkdir
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-soft build
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008 install
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-soft install
UNRESOLVED: glibcs-mipsel-linux-gnu-soft configure
UNRESOLVED: glibcs-mipsel-linux-gnu-soft build
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n32 configure
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008 mkdir-lib
UNRESOLVED: glibcs-mipsel-linux-gnu-soft install
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n32 build
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008-soft mkdir-lib
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-soft mkdir-lib
UNRESOLVED: glibcs-mipsel-linux-gnu-soft mkdir-lib
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n32 install
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008 check
UNRESOLVED: glibcs-mipsel-linux-gnu configure
UNRESOLVED: glibcs-mipsel-linux-gnu-soft check
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-nan2008-soft check
UNRESOLVED: glibcs-mipsel-linux-gnu build
UNRESOLVED: glibcs-mips64el-linux-gnu-n64-soft check
PASS: glibcs-mipsel-linux-gnu-soft save-logs
PASS: glibcs-mipsel-linux-gnu-nan2008 save-logs
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n32 mkdir-lib
FAIL: glibcs-nios2-linux-gnu check-compilers
PASS: glibcs-mips64el-linux-gnu-n64-nan2008-soft save-logs
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008-soft configure
UNRESOLVED: glibcs-mipsel-linux-gnu install
PASS: glibcs-mips64el-linux-gnu-n64-soft save-logs
FAIL: glibcs-mipsisa64r6el-linux-gnu-n64 check-compilers
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008-soft build
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n32 check
FAIL: glibcs-powerpc-linux-gnu check-compilers
UNRESOLVED: glibcs-nios2-linux-gnu rm
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n64 rm
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008-soft install
FAIL: glibcs-powerpc-linux-gnu-power4 check-compilers
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n64 mkdir
PASS: glibcs-mipsisa64r6el-linux-gnu-n32 save-logs
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008-soft mkdir-lib
UNRESOLVED: glibcs-mipsel-linux-gnu mkdir-lib
UNRESOLVED: glibcs-mipsel-linux-gnu-nan2008-soft check
UNRESOLVED: glibcs-powerpc-linux-gnu rm
UNRESOLVED: glibcs-nios2-linux-gnu mkdir
PASS: glibcs-mipsel-linux-gnu-nan2008-soft save-logs
FAIL: glibcs-powerpc-linux-gnu-soft check-compilers
UNRESOLVED: glibcs-powerpc-linux-gnu-power4 rm
FAIL: glibcs-powerpc64-linux-gnu check-compilers
UNRESOLVED: glibcs-mipsel-linux-gnu check
UNRESOLVED: glibcs-mipsisa32r6el-linux-gnu configure
UNRESOLVED: glibcs-powerpc-linux-gnu mkdir
UNRESOLVED: glibcs-powerpc64-linux-gnu rm
UNRESOLVED: glibcs-mipsisa32r6el-linux-gnu build
PASS: glibcs-mipsel-linux-gnu save-logs
UNRESOLVED: glibcs-powerpc-linux-gnu-soft rm
UNRESOLVED: glibcs-powerpc-linux-gnu-power4 mkdir
FAIL: glibcs-powerpc64le-linux-gnu check-compilers
UNRESOLVED: glibcs-powerpc64-linux-gnu mkdir
UNRESOLVED: glibcs-powerpc-linux-gnu-soft mkdir
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n64 configure
UNRESOLVED: glibcs-mipsisa32r6el-linux-gnu install
UNRESOLVED: glibcs-powerpc64le-linux-gnu rm
UNRESOLVED: glibcs-powerpc-linux-gnu configure
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n64 build
UNRESOLVED: glibcs-mipsisa32r6el-linux-gnu mkdir-lib
UNRESOLVED: glibcs-powerpc-linux-gnu build
UNRESOLVED: glibcs-mipsisa32r6el-linux-gnu check
UNRESOLVED: glibcs-powerpc64le-linux-gnu mkdir
PASS: glibcs-mipsisa32r6el-linux-gnu save-logs
UNRESOLVED: glibcs-nios2-linux-gnu configure
UNRESOLVED: glibcs-nios2-linux-gnu build
FAIL: glibcs-powerpc64le-linux-gnu-disable-multi-arch check-compilers
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n64 install
UNRESOLVED: glibcs-powerpc64le-linux-gnu-disable-multi-arch rm
UNRESOLVED: glibcs-nios2-linux-gnu install
UNRESOLVED: glibcs-powerpc-linux-gnu install
UNRESOLVED: glibcs-powerpc64le-linux-gnu-disable-multi-arch mkdir
UNRESOLVED: glibcs-nios2-linux-gnu mkdir-lib
UNRESOLVED: glibcs-nios2-linux-gnu check
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n64 mkdir-lib
PASS: glibcs-nios2-linux-gnu save-logs
UNRESOLVED: glibcs-powerpc-linux-gnu mkdir-lib
UNRESOLVED: glibcs-powerpc64-linux-gnu configure
UNRESOLVED: glibcs-mipsisa64r6el-linux-gnu-n64 check
UNRESOLVED: glibcs-powerpc-linux-gnu-power4 configure
FAIL: glibcs-riscv32-linux-gnu-rv32imac-ilp32 check-compilers
UNRESOLVED: glibcs-powerpc64-linux-gnu build
UNRESOLVED: glibcs-powerpc-linux-gnu check
PASS: glibcs-mipsisa64r6el-linux-gnu-n64 save-logs
UNRESOLVED: glibcs-powerpc64-linux-gnu install
UNRESOLVED: glibcs-powerpc-linux-gnu-power4 build
PASS: glibcs-powerpc-linux-gnu save-logs
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imac-ilp32 rm
UNRESOLVED: glibcs-powerpc64-linux-gnu mkdir-lib
UNRESOLVED: glibcs-powerpc-linux-gnu-soft configure
FAIL: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 check-compilers
UNRESOLVED: glibcs-powerpc64le-linux-gnu-disable-multi-arch configure
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imac-ilp32 mkdir
FAIL: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d check-compilers
UNRESOLVED: glibcs-powerpc-linux-gnu-power4 install
UNRESOLVED: glibcs-powerpc-linux-gnu-soft build
UNRESOLVED: glibcs-powerpc64-linux-gnu check
UNRESOLVED: glibcs-powerpc64le-linux-gnu-disable-multi-arch build
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d rm
UNRESOLVED: glibcs-powerpc64le-linux-gnu configure
PASS: glibcs-powerpc64-linux-gnu save-logs
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 rm
UNRESOLVED: glibcs-powerpc-linux-gnu-soft install
UNRESOLVED: glibcs-powerpc64le-linux-gnu-disable-multi-arch install
UNRESOLVED: glibcs-powerpc-linux-gnu-soft mkdir-lib
UNRESOLVED: glibcs-powerpc64le-linux-gnu build
UNRESOLVED: glibcs-powerpc64le-linux-gnu-disable-multi-arch mkdir-lib
UNRESOLVED: glibcs-powerpc-linux-gnu-power4 mkdir-lib
FAIL: glibcs-riscv64-linux-gnu-rv64imac-lp64 check-compilers
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d mkdir
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 mkdir
UNRESOLVED: glibcs-powerpc-linux-gnu-power4 check
UNRESOLVED: glibcs-powerpc64le-linux-gnu-disable-multi-arch check
UNRESOLVED: glibcs-powerpc-linux-gnu-soft check
UNRESOLVED: glibcs-powerpc64le-linux-gnu install
PASS: glibcs-powerpc-linux-gnu-soft save-logs
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imac-lp64 rm
PASS: glibcs-powerpc-linux-gnu-power4 save-logs
PASS: glibcs-powerpc64le-linux-gnu-disable-multi-arch save-logs
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imac-lp64 mkdir
FAIL: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 check-compilers
FAIL: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d check-compilers
FAIL: glibcs-s390-linux-gnu check-compilers
UNRESOLVED: glibcs-powerpc64le-linux-gnu mkdir-lib
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 rm
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d rm
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d configure
UNRESOLVED: glibcs-s390-linux-gnu rm
UNRESOLVED: glibcs-powerpc64le-linux-gnu check
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d mkdir
UNRESOLVED: glibcs-s390-linux-gnu mkdir
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d build
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 mkdir
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imac-lp64 configure
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imac-ilp32 configure
PASS: glibcs-powerpc64le-linux-gnu save-logs
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imac-lp64 build
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 configure
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imac-ilp32 build
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 build
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d install
FAIL: glibcs-s390x-linux-gnu check-compilers
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imac-lp64 install
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 install
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 mkdir-lib
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imac-ilp32 install
UNRESOLVED: glibcs-s390x-linux-gnu rm
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d mkdir-lib
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imac-lp64 mkdir-lib
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 check
PASS: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32 save-logs
UNRESOLVED: glibcs-s390x-linux-gnu mkdir
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imac-ilp32 mkdir-lib
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d check
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imac-lp64 check
FAIL: glibcs-s390x-linux-gnu-O3 check-compilers
UNRESOLVED: glibcs-riscv32-linux-gnu-rv32imac-ilp32 check
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 configure
PASS: glibcs-riscv64-linux-gnu-rv64imac-lp64 save-logs
UNRESOLVED: glibcs-s390x-linux-gnu-O3 rm
PASS: glibcs-riscv32-linux-gnu-rv32imafdc-ilp32d save-logs
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d configure
PASS: glibcs-riscv32-linux-gnu-rv32imac-ilp32 save-logs
UNRESOLVED: glibcs-s390-linux-gnu configure
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 build
UNRESOLVED: glibcs-s390x-linux-gnu-O3 mkdir
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d build
FAIL: glibcs-sh3eb-linux-gnu check-compilers
UNRESOLVED: glibcs-sh3eb-linux-gnu rm
UNRESOLVED: glibcs-s390-linux-gnu build
FAIL: glibcs-sh3-linux-gnu check-compilers
FAIL: glibcs-sh4-linux-gnu check-compilers
UNRESOLVED: glibcs-sh3-linux-gnu rm
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 install
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d install
UNRESOLVED: glibcs-sh3-linux-gnu mkdir
UNRESOLVED: glibcs-sh3eb-linux-gnu mkdir
UNRESOLVED: glibcs-sh4-linux-gnu rm
UNRESOLVED: glibcs-s390-linux-gnu install
UNRESOLVED: glibcs-s390x-linux-gnu configure
UNRESOLVED: glibcs-sh4-linux-gnu mkdir
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d mkdir-lib
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 mkdir-lib
UNRESOLVED: glibcs-s390x-linux-gnu build
UNRESOLVED: glibcs-s390-linux-gnu mkdir-lib
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 check
UNRESOLVED: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d check
UNRESOLVED: glibcs-s390x-linux-gnu install
UNRESOLVED: glibcs-s390x-linux-gnu-O3 configure
UNRESOLVED: glibcs-s390x-linux-gnu-O3 build
PASS: glibcs-riscv64-linux-gnu-rv64imafdc-lp64d save-logs
UNRESOLVED: glibcs-s390-linux-gnu check
PASS: glibcs-riscv64-linux-gnu-rv64imafdc-lp64 save-logs
UNRESOLVED: glibcs-s390x-linux-gnu-O3 install
UNRESOLVED: glibcs-sh3-linux-gnu configure
PASS: glibcs-s390-linux-gnu save-logs
FAIL: glibcs-sh4eb-linux-gnu check-compilers
FAIL: glibcs-sh4-linux-gnu-soft check-compilers
UNRESOLVED: glibcs-s390x-linux-gnu mkdir-lib
FAIL: glibcs-sh4eb-linux-gnu-soft check-compilers
UNRESOLVED: glibcs-sh3-linux-gnu build
UNRESOLVED: glibcs-s390x-linux-gnu-O3 mkdir-lib
UNRESOLVED: glibcs-sh4eb-linux-gnu rm
UNRESOLVED: glibcs-sh4-linux-gnu-soft rm
UNRESOLVED: glibcs-sh4eb-linux-gnu mkdir
UNRESOLVED: glibcs-sh4-linux-gnu-soft mkdir
UNRESOLVED: glibcs-s390x-linux-gnu-O3 check
UNRESOLVED: glibcs-s390x-linux-gnu check
UNRESOLVED: glibcs-sh4eb-linux-gnu-soft rm
UNRESOLVED: glibcs-sh3-linux-gnu install
UNRESOLVED: glibcs-sh3-linux-gnu mkdir-lib
UNRESOLVED: glibcs-sh3eb-linux-gnu configure
PASS: glibcs-s390x-linux-gnu save-logs
UNRESOLVED: glibcs-sh3-linux-gnu check
UNRESOLVED: glibcs-sh3eb-linux-gnu build
UNRESOLVED: glibcs-sh4-linux-gnu configure
PASS: glibcs-sh3-linux-gnu save-logs
PASS: glibcs-s390x-linux-gnu-O3 save-logs
UNRESOLVED: glibcs-sh4eb-linux-gnu-soft mkdir
FAIL: glibcs-sparc64-linux-gnu-disable-multi-arch check-compilers
FAIL: glibcs-sparc64-linux-gnu check-compilers
UNRESOLVED: glibcs-sh4-linux-gnu build
UNRESOLVED: glibcs-sh3eb-linux-gnu install
FAIL: glibcs-sparcv8-linux-gnu-leon3 check-compilers
UNRESOLVED: glibcs-sparc64-linux-gnu-disable-multi-arch rm
UNRESOLVED: glibcs-sparc64-linux-gnu rm
UNRESOLVED: glibcs-sparcv8-linux-gnu-leon3 rm
UNRESOLVED: glibcs-sh3eb-linux-gnu mkdir-lib
UNRESOLVED: glibcs-sh4-linux-gnu install
UNRESOLVED: glibcs-sh4-linux-gnu-soft configure
UNRESOLVED: glibcs-sparc64-linux-gnu mkdir
UNRESOLVED: glibcs-sparc64-linux-gnu-disable-multi-arch mkdir
UNRESOLVED: glibcs-sh4eb-linux-gnu configure
UNRESOLVED: glibcs-sh4-linux-gnu-soft build
UNRESOLVED: glibcs-sh3eb-linux-gnu check
UNRESOLVED: glibcs-sparcv8-linux-gnu-leon3 mkdir
UNRESOLVED: glibcs-sh4-linux-gnu-soft install
UNRESOLVED: glibcs-sh4eb-linux-gnu build
UNRESOLVED: glibcs-sh4-linux-gnu mkdir-lib
PASS: glibcs-sh3eb-linux-gnu save-logs
UNRESOLVED: glibcs-sh4-linux-gnu-soft mkdir-lib
UNRESOLVED: glibcs-sh4-linux-gnu-soft check
UNRESOLVED: glibcs-sh4-linux-gnu check
PASS: glibcs-sh4-linux-gnu-soft save-logs
UNRESOLVED: glibcs-sh4eb-linux-gnu install
FAIL: glibcs-sparcv9-linux-gnu check-compilers
UNRESOLVED: glibcs-sh4eb-linux-gnu mkdir-lib
UNRESOLVED: glibcs-sparc64-linux-gnu configure
PASS: glibcs-sh4-linux-gnu save-logs
FAIL: glibcs-sparcv9-linux-gnu-disable-multi-arch check-compilers
UNRESOLVED: glibcs-sparc64-linux-gnu build
UNRESOLVED: glibcs-sh4eb-linux-gnu check
UNRESOLVED: glibcs-sparcv9-linux-gnu-disable-multi-arch rm
UNRESOLVED: glibcs-sparcv9-linux-gnu rm
FAIL: glibcs-x86_64-linux-gnu check-compilers
PASS: glibcs-sh4eb-linux-gnu save-logs
UNRESOLVED: glibcs-sparc64-linux-gnu install
FAIL: glibcs-x86_64-linux-gnu-disable-multi-arch check-compilers
UNRESOLVED: glibcs-sparcv9-linux-gnu mkdir
UNRESOLVED: glibcs-sparcv9-linux-gnu-disable-multi-arch mkdir
UNRESOLVED: glibcs-sh4eb-linux-gnu-soft configure
UNRESOLVED: glibcs-x86_64-linux-gnu rm
UNRESOLVED: glibcs-sparc64-linux-gnu mkdir-lib
UNRESOLVED: glibcs-sparc64-linux-gnu check
UNRESOLVED: glibcs-sh4eb-linux-gnu-soft build
UNRESOLVED: glibcs-x86_64-linux-gnu-disable-multi-arch rm
UNRESOLVED: glibcs-x86_64-linux-gnu mkdir
PASS: glibcs-sparc64-linux-gnu save-logs
UNRESOLVED: glibcs-sparc64-linux-gnu-disable-multi-arch configure
UNRESOLVED: glibcs-sh4eb-linux-gnu-soft install
UNRESOLVED: glibcs-sparc64-linux-gnu-disable-multi-arch build
UNRESOLVED: glibcs-sh4eb-linux-gnu-soft mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-disable-multi-arch mkdir
UNRESOLVED: glibcs-sparc64-linux-gnu-disable-multi-arch install
UNRESOLVED: glibcs-sh4eb-linux-gnu-soft check
UNRESOLVED: glibcs-sparcv8-linux-gnu-leon3 configure
FAIL: glibcs-x86_64-linux-gnu-minimal check-compilers
UNRESOLVED: glibcs-sparcv8-linux-gnu-leon3 build
PASS: glibcs-sh4eb-linux-gnu-soft save-logs
FAIL: glibcs-x86_64-linux-gnu-static-pie check-compilers
UNRESOLVED: glibcs-sparcv8-linux-gnu-leon3 install
UNRESOLVED: glibcs-x86_64-linux-gnu-static-pie rm
UNRESOLVED: glibcs-sparc64-linux-gnu-disable-multi-arch mkdir-lib
UNRESOLVED: glibcs-sparcv8-linux-gnu-leon3 mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-static-pie mkdir
UNRESOLVED: glibcs-x86_64-linux-gnu-minimal rm
UNRESOLVED: glibcs-sparcv8-linux-gnu-leon3 check
PASS: glibcs-sparcv8-linux-gnu-leon3 save-logs
UNRESOLVED: glibcs-sparc64-linux-gnu-disable-multi-arch check
FAIL: glibcs-x86_64-linux-gnu-x32 check-compilers
UNRESOLVED: glibcs-x86_64-linux-gnu-x32 rm
PASS: glibcs-sparc64-linux-gnu-disable-multi-arch save-logs
UNRESOLVED: glibcs-x86_64-linux-gnu-x32 mkdir
UNRESOLVED: glibcs-x86_64-linux-gnu-minimal mkdir
FAIL: glibcs-x86_64-linux-gnu-x32-static-pie check-compilers
UNRESOLVED: glibcs-sparcv9-linux-gnu configure
UNRESOLVED: glibcs-sparcv9-linux-gnu-disable-multi-arch configure
UNRESOLVED: glibcs-x86_64-linux-gnu-static-pie configure
UNRESOLVED: glibcs-sparcv9-linux-gnu build
UNRESOLVED: glibcs-x86_64-linux-gnu-disable-multi-arch configure
UNRESOLVED: glibcs-x86_64-linux-gnu-x32-static-pie rm
UNRESOLVED: glibcs-x86_64-linux-gnu-disable-multi-arch build
UNRESOLVED: glibcs-sparcv9-linux-gnu-disable-multi-arch build
UNRESOLVED: glibcs-x86_64-linux-gnu-static-pie build
UNRESOLVED: glibcs-x86_64-linux-gnu-x32-static-pie mkdir
UNRESOLVED: glibcs-sparcv9-linux-gnu-disable-multi-arch install
UNRESOLVED: glibcs-x86_64-linux-gnu configure
UNRESOLVED: glibcs-sparcv9-linux-gnu-disable-multi-arch mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-disable-multi-arch install
UNRESOLVED: glibcs-sparcv9-linux-gnu install
UNRESOLVED: glibcs-sparcv9-linux-gnu-disable-multi-arch check
UNRESOLVED: glibcs-x86_64-linux-gnu build
UNRESOLVED: glibcs-x86_64-linux-gnu install
UNRESOLVED: glibcs-x86_64-linux-gnu-static-pie install
PASS: glibcs-sparcv9-linux-gnu-disable-multi-arch save-logs
UNRESOLVED: glibcs-x86_64-linux-gnu-x32 configure
UNRESOLVED: glibcs-x86_64-linux-gnu mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-disable-multi-arch mkdir-lib
UNRESOLVED: glibcs-sparcv9-linux-gnu mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-x32 build
UNRESOLVED: glibcs-x86_64-linux-gnu-static-pie mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu check
UNRESOLVED: glibcs-x86_64-linux-gnu-static-pie check
UNRESOLVED: glibcs-x86_64-linux-gnu-disable-multi-arch check
UNRESOLVED: glibcs-sparcv9-linux-gnu check
UNRESOLVED: glibcs-x86_64-linux-gnu-x32 install
PASS: glibcs-x86_64-linux-gnu save-logs
PASS: glibcs-x86_64-linux-gnu-static-pie save-logs
UNRESOLVED: glibcs-x86_64-linux-gnu-x32-static-pie configure
PASS: glibcs-sparcv9-linux-gnu save-logs
UNRESOLVED: glibcs-x86_64-linux-gnu-x32-static-pie build
PASS: glibcs-x86_64-linux-gnu-disable-multi-arch save-logs
UNRESOLVED: glibcs-x86_64-linux-gnu-x32 mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-x32-static-pie install
UNRESOLVED: glibcs-x86_64-linux-gnu-x32 check
UNRESOLVED: glibcs-x86_64-linux-gnu-minimal configure
PASS: glibcs-x86_64-linux-gnu-x32 save-logs
UNRESOLVED: glibcs-x86_64-linux-gnu-x32-static-pie mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-minimal build
UNRESOLVED: glibcs-x86_64-linux-gnu-x32-static-pie check
UNRESOLVED: glibcs-x86_64-linux-gnu-minimal install
PASS: glibcs-x86_64-linux-gnu-x32-static-pie save-logs
UNRESOLVED: glibcs-x86_64-linux-gnu-minimal mkdir-lib
UNRESOLVED: glibcs-x86_64-linux-gnu-minimal check
PASS: glibcs-x86_64-linux-gnu-minimal save-logs
```


> Thanks,
> Florian
>
>

[-- Attachment #2: log-x86-64-linux-gnu.tar.gz --]
[-- Type: application/gzip, Size: 31096 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 18:34                               ` Noah Goldstein via Libc-alpha
@ 2021-09-27 18:56                                 ` Florian Weimer via Libc-alpha
  2021-09-27 19:20                                   ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 18:56 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

>  And yet the build fails with the indicated error?
>
>  Could you check what as/config.h contains?  This is very odd.
>
> /* Define to 1 if you want to generate ELF common symbols with the STT_COMMON
>    type by default. */
> #define DEFAULT_GENERATE_ELF_STT_COMMON 0
>
> is in gas/config.h
>
> Include tar file of all logs for the build.

I must say I'm stumped, sorry.

Do you use ccache?  NFS or some other unusual file system?

I remember seeing something like that before, but I can't remember what
the cause was and what I did to fix it.

Ohhhh, wait, is . on your path by any chance?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 18:56                                 ` Florian Weimer via Libc-alpha
@ 2021-09-27 19:20                                   ` Noah Goldstein via Libc-alpha
  2021-09-27 19:34                                     ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 19:20 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 1:56 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> >  And yet the build fails with the indicated error?
> >
> >  Could you check what as/config.h contains?  This is very odd.
> >
> > /* Define to 1 if you want to generate ELF common symbols with the STT_COMMON
> >    type by default. */
> > #define DEFAULT_GENERATE_ELF_STT_COMMON 0
> >
> > is in gas/config.h
> >
> > Include tar file of all logs for the build.
>
> I must say I'm stumped, sorry.

No problem. Thank you for taking the time!
>
> Do you use ccache?  NFS or some other unusual file system?

Using ext4:

/dev/nvme0n1p4 ext4

>
> I remember seeing something like that before, but I can't remember what
> the cause was and what I did to fix it.
>
> Ohhhh, wait, is . on your path by any chance?

"." is not a complete path entry anywhere, but it shows up, for example, in:
/home/noah/.local/bin

Could the issue be related to LD_LIBRARY_PATH picking up something
from my system maybe?

>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 19:20                                   ` Noah Goldstein via Libc-alpha
@ 2021-09-27 19:34                                     ` Florian Weimer via Libc-alpha
  2021-09-27 19:43                                       ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 19:34 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

>> I remember seeing something like that before, but I can't remember what
>> the cause was and what I did to fix it.
>>
>> Ohhhh, wait, is . on your path by any chance?
>
> "." is not a complete path entry anywhere, but it shows up, for example, in:
> /home/noah/.local/bin

Hmm, okay, and that failure is during the gcc stage.

> Could the issue be related to LD_LIBRARY_PATH picking up something
> from my system maybe?

Seems unlikely, but possible.  Looking at the environment settings is
probably a good idea.

Have you installed any compiler wrappers, e.g. ccache, or hardening
wrappers?

You could also take this command:

depbase=`echo as.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`; gcc -DHAVE_CONFIG_H -I. -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas  -I. -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas -I../bfd -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/config -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/../include -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/.. -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/../bfd -DLOCALEDIR="\"/home/noah/programs/opensource/glibc-dev/build-many/install/compilers/x86_64-linux-gnu/share/locale\""  -W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 -Werror -Wwrite-strings -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/../zlib -g -O2     -MT as.o -MD -MP -MF $depbase.Tpo -c -o as.o /home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c

add --save-temps, and invoke it in the directory
/home/noah/programs/opensource/glibc-dev/build-many/build/compilers/x86_64-linux-gnu/binutils/gas.
This should produce an as.i file, and from the # lines in it, it should
be possible to glean from where the config.h file actually used comes
from.  Or perhaps use the -H option and hope that the include nesting
reveals what's going on.
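
A minimal sketch of the linemarker approach, in Python: it scans a
preprocessed file such as as.i for GCC linemarkers of the form
`# <line> "<file>" <flags>` and reports which files with a given name
the preprocessor actually entered (flag 1).  The function names
`entered_files` and `find_header` are hypothetical helpers, not part of
binutils or glibc.

```python
import re

# GCC linemarker: '# <line> "<file>" <flags...>'.
# Flag 1 means "start of a new file", flag 2 means "returning to a file".
LINEMARKER = re.compile(r'^# (\d+) "([^"]*)"((?: \d+)*)\s*$')


def entered_files(preprocessed_text):
    """Return the files the preprocessor entered (flag 1), in order."""
    files = []
    for line in preprocessed_text.splitlines():
        match = LINEMARKER.match(line)
        if match and "1" in match.group(3).split():
            files.append(match.group(2))
    return files


def find_header(preprocessed_text, basename):
    """Return entered files whose last path component equals basename."""
    return [name for name in entered_files(preprocessed_text)
            if name.rsplit("/", 1)[-1] == basename]
```

Run against the as.i produced by --save-temps, for example
`find_header(open("as.i").read(), "config.h")`; a result that points
outside the gas build directory would suggest the wrong header is being
picked up.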

Thanks,
Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 19:34                                     ` Florian Weimer via Libc-alpha
@ 2021-09-27 19:43                                       ` Noah Goldstein via Libc-alpha
  2021-09-27 19:59                                         ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 19:43 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 2:34 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> >> I remember seeing something like that before, but I can't remember what
> >> the cause was and what I did to fix it.
> >>
> >> Ohhhh, wait, is . on your path by any chance?
> >
> > "." is not a complete path entry anywhere, but it shows up, for example, in:
> > /home/noah/.local/bin
>
> Hmm, okay, and that failure is during the gcc stage.
>
> > Could the issue be related to LD_LIBRARY_PATH picking up something
> > from my system maybe?
>
> Seems unlikely, but possible.  Looking at the environment settings is
> probably a good idea.
>
> Have you installed any compiler wrappers, e.g. ccache, or hardening
> wrappers?

No.

>
> You could also take this command:
>
> depbase=`echo as.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`; gcc -DHAVE_CONFIG_H -I. -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas  -I. -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas -I../bfd -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/config -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/../include -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/.. -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/../bfd -DLOCALEDIR="\"/home/noah/programs/opensource/glibc-dev/build-many/install/compilers/x86_64-linux-gnu/share/locale\""  -W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 -Werror -Wwrite-strings -I/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/../zlib -g -O2     -MT as.o -MD -MP -MF $depbase.Tpo -c -o as.o /home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c
>
> add --save-temps, and invoke it in the directory
> /home/noah/programs/opensource/glibc-dev/build-many/build/compilers/x86_64-linux-gnu/binutils/gas.
> This should produce an as.i file, and from the # lines in it, it should
> be possible to glean from where the config.h file actually used comes
> from.  Or perhaps use the -H option and hope that the include nesting
> reveals what's going on.

From as.i

# 0 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
# 1 "/home/noah/programs/opensource/glibc-dev/build-many/build/compilers/x86_64-linux-gnu/binutils/gas//"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
# 38 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
# 1 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h" 1
# 37 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h"
# 1 "../bfd/config.h" 1
# 38 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h" 2

Seems to be using "../bfd/config.h", which does not contain a define for
DEFAULT_GENERATE_ELF_STT_COMMON.


>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 19:43                                       ` Noah Goldstein via Libc-alpha
@ 2021-09-27 19:59                                         ` Florian Weimer via Libc-alpha
  2021-09-27 20:22                                           ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 19:59 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

> From as.i
>
> # 0 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
> # 1 "/home/noah/programs/opensource/glibc-dev/build-many/build/compilers/x86_64-linux-gnu/binutils/gas//"
> # 0 "<built-in>"
> # 0 "<command-line>"
> # 1 "/usr/include/stdc-predef.h" 1 3 4
> # 0 "<command-line>" 2
> # 1 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
> # 38 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
> # 1 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h" 1
> # 37 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h"
> # 1 "../bfd/config.h" 1

The above corresponds to this source line: #include "config.h"

> # 38 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h" 2
>
> Seems to be using "../bfd/config.h", which does not contain a define for
> DEFAULT_GENERATE_ELF_STT_COMMON.

Right, that's wrong.  We saw something like this before:

  <https://sourceware.org/legacy-ml/gdb-patches/2020-01/msg00417.html>

But the command line seems to be right this time: -I. should pick up

/home/noah/programs/opensource/glibc-dev/build-many/build/compilers/x86_64-linux-gnu/binutils/gas/config.h

which hopefully exists on your system.

I still haven't got a clue what is going on here.

Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 19:59                                         ` Florian Weimer via Libc-alpha
@ 2021-09-27 20:22                                           ` Noah Goldstein via Libc-alpha
  2021-09-27 20:24                                             ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 20:22 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 2:59 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> > From as.i
> >
> > # 0 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
> > # 1 "/home/noah/programs/opensource/glibc-dev/build-many/build/compilers/x86_64-linux-gnu/binutils/gas//"
> > # 0 "<built-in>"
> > # 0 "<command-line>"
> > # 1 "/usr/include/stdc-predef.h" 1 3 4
> > # 0 "<command-line>" 2
> > # 1 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
> > # 38 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.c"
> > # 1 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h" 1
> > # 37 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h"
> > # 1 "../bfd/config.h" 1
>
> The above corresponds to this source line: #include "config.h"
>
> > # 38 "/home/noah/programs/opensource/glibc-dev/build-many/src/binutils/gas/as.h" 2
> >
> > Seems to be using "../bfd/config.h", which does not contain a define for
> > DEFAULT_GENERATE_ELF_STT_COMMON.
>
> Right, that's wrong.  We saw something like this before:
>
>   <https://sourceware.org/legacy-ml/gdb-patches/2020-01/msg00417.html>
>
> But the command line seems to be right this time: -I. should pick up
>
> /home/noah/programs/opensource/glibc-dev/build-many/build/compilers/x86_64-linux-gnu/binutils/gas/config.h
>
> which hopefully exists on your system.

It does.

>
> I still haven't got a clue what is going on here.

:/

Either way, thank you for the help!

I'm going to post the patch without the full build-many-glibcs.py tests.


>
> Florian
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 20:22                                           ` Noah Goldstein via Libc-alpha
@ 2021-09-27 20:24                                             ` Florian Weimer via Libc-alpha
  2021-09-27 20:38                                               ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Florian Weimer via Libc-alpha @ 2021-09-27 20:24 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

* Noah Goldstein:

> I'm going to post the patch without the full build-many-glibcs.py tests.

Right.  I can run it through the build for you, I've still got a lab
machine checked out.

Florian


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 20:24                                             ` Florian Weimer via Libc-alpha
@ 2021-09-27 20:38                                               ` Noah Goldstein via Libc-alpha
  2021-09-28  0:07                                                 ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-27 20:38 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 3:24 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> > I'm going to post the patch without the full build-many-glibcs.py tests.
>
> Right.  I can run it through the build for you, I've still got a lab
> machine checked out.

Thanks :)

I just posted the patch.
>
> Florian
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse 4_1, avx2, and evex
  2021-09-27 20:38                                               ` Noah Goldstein via Libc-alpha
@ 2021-09-28  0:07                                                 ` Noah Goldstein via Libc-alpha
  0 siblings, 0 replies; 51+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-09-28  0:07 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Zack Weinberg, Zack Weinberg via Libc-alpha, Joseph Myers

On Mon, Sep 27, 2021 at 3:38 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Sep 27, 2021 at 3:24 PM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Noah Goldstein:
> >
> > > I'm going to post the patch without the full build-many-glibcs.py tests.
> >
> > Right.  I can run it through the build for you, I've still got a lab
> > machine checked out.
>
> Thanks :)
>
> I just posted the patch.
> >
> > Florian
> >


Figured it out.

I had $C_INCLUDE_PATH as ":" instead of unset.

I see PASS for the build now :)
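
This is consistent with GCC's documented handling of CPATH,
C_INCLUDE_PATH and related variables: an empty element means the current
working directory, so a value of ":" adds "." as a system include
directory, and a -I option that names a system include directory is
ignored.  That would explain why -I. did not take precedence over
-I../bfd.  A small sketch for spotting such values follows; the helper
names are made up for illustration, and treating a set-but-empty
variable as harmless is an assumption of this sketch.

```python
import os

# Environment variables GCC consults for extra include directories.
INCLUDE_VARS = ("CPATH", "C_INCLUDE_PATH", "CPLUS_INCLUDE_PATH",
                "OBJC_INCLUDE_PATH")
SEPARATOR = ":"  # path-list separator on Linux hosts


def empty_elements(value):
    """Count empty elements in a path-list value.

    None (unset) and "" (set but empty) are both counted as zero here.
    """
    if not value:
        return 0
    return sum(1 for part in value.split(SEPARATOR) if part == "")


def suspicious_include_vars(environ=None):
    """Return the include variables whose value has an empty element."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in INCLUDE_VARS
            if empty_elements(environ.get(name)) > 0}
```

Calling `suspicious_include_vars()` before starting build-many-glibcs.py
and unsetting anything it reports is one way to rule this out early.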

Thank you for your help!

I'll run the ABI tests overnight.

Will:

$> python3 src/glibc/scripts/build-many-glibcs.py build-many/ checkout gcc-vcs-11;
   echo "Host Libraries";
   python3 src/glibc/scripts/build-many-glibcs.py build-many host-libraries --keep=failed;
   echo "Compilers";
   python3 src/glibc/scripts/build-many-glibcs.py build-many compilers x86_64-linux-gnu --keep=failed;
   python3 src/glibc/scripts/build-many-glibcs.py build-many glibcs --keep=failed

be sufficient to test all GLIBC abi builds?

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2021-09-28  0:08 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
2021-09-13 23:05 [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
2021-09-13 23:05 ` [PATCH 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
2021-09-13 23:05 ` [PATCH 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S Noah Goldstein via Libc-alpha
2021-09-13 23:05 ` [PATCH 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S Noah Goldstein via Libc-alpha
2021-09-13 23:05 ` [PATCH 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S Noah Goldstein via Libc-alpha
2021-09-14  1:18   ` Carlos O'Donell via Libc-alpha
2021-09-14  2:05     ` Noah Goldstein via Libc-alpha
2021-09-14  2:35       ` Carlos O'Donell via Libc-alpha
2021-09-14  2:55         ` DJ Delorie via Libc-alpha
2021-09-14  3:24           ` Noah Goldstein via Libc-alpha
2021-09-14  3:40         ` Noah Goldstein via Libc-alpha
2021-09-14  4:21           ` DJ Delorie via Libc-alpha
2021-09-14  5:29             ` Noah Goldstein via Libc-alpha
2021-09-14  5:42               ` DJ Delorie via Libc-alpha
2021-09-14  5:55                 ` Noah Goldstein via Libc-alpha
2021-09-13 23:22 ` [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex Noah Goldstein via Libc-alpha
2021-09-14  6:30 ` [PATCH v2 " Noah Goldstein via Libc-alpha
2021-09-14  6:30   ` [PATCH v2 2/5] x86_64: Add sse2 optimized bcmp implementation in memcmp.S Noah Goldstein via Libc-alpha
2021-09-14  6:30   ` [PATCH v2 3/5] x86_64: Add sse4_1 optimized bcmp implementation in memcmp-sse4.S Noah Goldstein via Libc-alpha
2021-09-14  6:30   ` [PATCH v2 4/5] x86_64: Add avx2 optimized bcmp implementation in bcmp-avx2.S Noah Goldstein via Libc-alpha
2021-09-14  6:30   ` [PATCH v2 5/5] x86_64: Add evex optimized bcmp implementation in bcmp-evex.S Noah Goldstein via Libc-alpha
2021-09-14 14:40   ` [PATCH v2 1/5] x86_64: Add support for bcmp using sse2, sse4_1, avx2, and evex H.J. Lu via Libc-alpha
2021-09-14 19:23     ` Noah Goldstein via Libc-alpha
2021-09-14 20:30     ` Florian Weimer via Libc-alpha
2021-09-15  0:00 ` [PATCH " Joseph Myers
2021-09-15 13:37   ` Zack Weinberg via Libc-alpha
2021-09-15 14:01     ` Re: [PATCH 1/5] x86_64: Add support for bcmp using sse2, sse4_1, " Florian Weimer via Libc-alpha
2021-09-15 18:06       ` Noah Goldstein via Libc-alpha
2021-09-15 18:30         ` Joseph Myers
2021-09-27  1:35           ` Noah Goldstein via Libc-alpha
2021-09-27  7:29             ` Florian Weimer via Libc-alpha
2021-09-27 16:49               ` Noah Goldstein via Libc-alpha
2021-09-27 16:54                 ` Florian Weimer via Libc-alpha
2021-09-27 17:54                   ` Noah Goldstein via Libc-alpha
2021-09-27 17:56                     ` Florian Weimer via Libc-alpha
2021-09-27 18:05                       ` Noah Goldstein via Libc-alpha
2021-09-27 18:10                         ` Florian Weimer via Libc-alpha
2021-09-27 18:15                           ` Noah Goldstein via Libc-alpha
2021-09-27 18:22                             ` Florian Weimer via Libc-alpha
2021-09-27 18:34                               ` Noah Goldstein via Libc-alpha
2021-09-27 18:56                                 ` Florian Weimer via Libc-alpha
2021-09-27 19:20                                   ` Noah Goldstein via Libc-alpha
2021-09-27 19:34                                     ` Florian Weimer via Libc-alpha
2021-09-27 19:43                                       ` Noah Goldstein via Libc-alpha
2021-09-27 19:59                                         ` Florian Weimer via Libc-alpha
2021-09-27 20:22                                           ` Noah Goldstein via Libc-alpha
2021-09-27 20:24                                             ` Florian Weimer via Libc-alpha
2021-09-27 20:38                                               ` Noah Goldstein via Libc-alpha
2021-09-28  0:07                                                 ` Noah Goldstein via Libc-alpha
2021-09-27 17:42               ` Joseph Myers
2021-09-27 17:48                 ` Noah Goldstein via Libc-alpha
