* [PATCH] totalorder, totalorderf, totalorderl: new modules
@ 2023-10-02 5:21 Paul Eggert
2023-10-02 11:33 ` Bruno Haible
` (4 more replies)
0 siblings, 5 replies; 33+ messages in thread
From: Paul Eggert @ 2023-10-02 5:21 UTC (permalink / raw)
To: bug-gnulib; +Cc: Paul Eggert
* lib/math.in.h: Declare totalorderf, totalorder, totalorderl.
* lib/totalorder.c, lib/totalorderf.c, lib/totalorderl.c:
* m4/totalorder.m4, modules/totalorder, modules/totalorder-tests:
* modules/totalorderf, modules/totalorderf-tests:
* modules/totalorderl, modules/totalorderl-tests:
* tests/test-totalorder.c, tests/test-totalorderf.c:
* tests/test-totalorderl.c: New files.
* m4/math_h.m4 (gl_MATH_H, gl_MATH_H_REQUIRE_DEFAULTS)
(gl_MATH_H_DEFAULTS):
* modules/math (math.h): Set up totalorder, totalorderf, totalorderl.
* m4/mathfunc.m4 (gl_MATHFUNC): Also support pointer-to-const.
---
ChangeLog | 15 +++++
doc/posix-functions/totalorder.texi | 12 ++--
doc/posix-functions/totalorderf.texi | 12 ++--
doc/posix-functions/totalorderl.texi | 12 ++--
lib/math.in.h | 76 ++++++++++++++++++++++++
lib/totalorder.c | 45 +++++++++++++++
lib/totalorderf.c | 45 +++++++++++++++
lib/totalorderl.c | 63 ++++++++++++++++++++
m4/math_h.m4 | 11 +++-
m4/mathfunc.m4 | 11 ++--
m4/totalorder.m4 | 86 ++++++++++++++++++++++++++++
modules/math | 9 +++
modules/totalorder | 36 ++++++++++++
modules/totalorder-tests | 15 +++++
modules/totalorderf | 36 ++++++++++++
modules/totalorderf-tests | 16 ++++++
modules/totalorderl | 37 ++++++++++++
modules/totalorderl-tests | 16 ++++++
tests/test-totalorder.c | 57 ++++++++++++++++++
tests/test-totalorderf.c | 6 ++
tests/test-totalorderl.c | 6 ++
21 files changed, 598 insertions(+), 24 deletions(-)
create mode 100644 lib/totalorder.c
create mode 100644 lib/totalorderf.c
create mode 100644 lib/totalorderl.c
create mode 100644 m4/totalorder.m4
create mode 100644 modules/totalorder
create mode 100644 modules/totalorder-tests
create mode 100644 modules/totalorderf
create mode 100644 modules/totalorderf-tests
create mode 100644 modules/totalorderl
create mode 100644 modules/totalorderl-tests
create mode 100644 tests/test-totalorder.c
create mode 100644 tests/test-totalorderf.c
create mode 100644 tests/test-totalorderl.c
diff --git a/ChangeLog b/ChangeLog
index 463a44ffbb..19e1a21d72 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@
+2023-10-01 Paul Eggert <eggert@cs.ucla.edu>
+
+ totalorder, totalorderf, totalorderl: new modules
+ * lib/math.in.h: Declare totalorderf, totalorder, totalorderl.
+ * lib/totalorder.c, lib/totalorderf.c, lib/totalorderl.c:
+ * m4/totalorder.m4, modules/totalorder, modules/totalorder-tests:
+ * modules/totalorderf, modules/totalorderf-tests:
+ * modules/totalorderl, modules/totalorderl-tests:
+ * tests/test-totalorder.c, tests/test-totalorderf.c:
+ * tests/test-totalorderl.c: New files.
+ * m4/math_h.m4 (gl_MATH_H, gl_MATH_H_REQUIRE_DEFAULTS)
+ (gl_MATH_H_DEFAULTS):
+ * modules/math (math.h): Set up totalorder, totalorderf, totalorderl.
+ * m4/mathfunc.m4 (gl_MATHFUNC): Also support pointer-to-const.
+
2023-09-30 Paul Eggert <eggert@cs.ucla.edu>
regex-quote: fix recently-introduced typo
diff --git a/doc/posix-functions/totalorder.texi b/doc/posix-functions/totalorder.texi
index 4687d9ba90..a7f8524718 100644
--- a/doc/posix-functions/totalorder.texi
+++ b/doc/posix-functions/totalorder.texi
@@ -10,18 +10,18 @@ Documentation:@*
@url{https://www.gnu.org/software/libc/manual/html_node/FP-Comparison-Functions.html}.
@end ifnotinfo
-Gnulib module: ---
+Gnulib module: totalorder
Portability problems fixed by Gnulib:
@itemize
-@end itemize
-
-Portability problems not fixed by Gnulib:
-@itemize
@item
-This function is missing on all non-glibc platforms:
+This function is missing on many platforms:
glibc 2.24, macOS 11.1, FreeBSD 13.0, NetBSD 9.0, OpenBSD 6.7, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.4, Cygwin 2.9, mingw, MSVC 14, Android 9.0.
@item
This function has a different signature on some platforms:
glibc 2.30.
@end itemize
+
+Portability problems not fixed by Gnulib:
+@itemize
+@end itemize
diff --git a/doc/posix-functions/totalorderf.texi b/doc/posix-functions/totalorderf.texi
index 4e3928c1d8..115fbfdf5b 100644
--- a/doc/posix-functions/totalorderf.texi
+++ b/doc/posix-functions/totalorderf.texi
@@ -10,18 +10,18 @@ Documentation:@*
@url{https://www.gnu.org/software/libc/manual/html_node/FP-Comparison-Functions.html}.
@end ifnotinfo
-Gnulib module: ---
+Gnulib module: totalorderf
Portability problems fixed by Gnulib:
@itemize
-@end itemize
-
-Portability problems not fixed by Gnulib:
-@itemize
@item
-This function is missing on all non-glibc platforms:
+This function is missing on many platforms:
glibc 2.24, macOS 11.1, FreeBSD 13.0, NetBSD 9.0, OpenBSD 6.7, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.4, Cygwin 2.9, mingw, MSVC 14, Android 9.0.
@item
This function has a different signature on some platforms:
glibc 2.30.
@end itemize
+
+Portability problems not fixed by Gnulib:
+@itemize
+@end itemize
diff --git a/doc/posix-functions/totalorderl.texi b/doc/posix-functions/totalorderl.texi
index 927597882e..cc07d0e080 100644
--- a/doc/posix-functions/totalorderl.texi
+++ b/doc/posix-functions/totalorderl.texi
@@ -10,18 +10,18 @@ Documentation:@*
@url{https://www.gnu.org/software/libc/manual/html_node/FP-Comparison-Functions.html}.
@end ifnotinfo
-Gnulib module: ---
+Gnulib module: totalorderl
Portability problems fixed by Gnulib:
@itemize
-@end itemize
-
-Portability problems not fixed by Gnulib:
-@itemize
@item
-This function is missing on all non-glibc platforms:
+This function is missing on many platforms:
glibc 2.24, macOS 11.1, FreeBSD 13.0, NetBSD 9.0, OpenBSD 6.7, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.4, Cygwin 2.9, mingw, MSVC 14, Android 9.0.
@item
This function has a different signature on some platforms:
glibc 2.30.
@end itemize
+
+Portability problems not fixed by Gnulib:
+@itemize
+@end itemize
diff --git a/lib/math.in.h b/lib/math.in.h
index 75d5013cf1..b18627ae0a 100644
--- a/lib/math.in.h
+++ b/lib/math.in.h
@@ -2761,6 +2761,82 @@ _GL_WARN_REAL_FLOATING_DECL (signbit);
# endif
#endif
+
+#if @GNULIB_TOTALORDERF@
+# if @REPLACE_TOTALORDERF@
+# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef totalorderf
+# define totalorderf rpl_totalorderf
+# endif
+_GL_FUNCDECL_RPL (totalorderf, int, (float const *, float const *));
+_GL_CXXALIAS_RPL (totalorderf, int, (float const *, float const *));
+# else
+# if !@HAVE_TOTALORDERF@
+_GL_FUNCDECL_SYS (totalorderf, int, (float const *, float const *));
+# endif
+_GL_CXXALIAS_SYS (totalorderf, int, (float const *, float const *));
+# endif
+_GL_CXXALIASWARN (totalorderf);
+#elif defined GNULIB_POSIXCHECK
+# undef totalorderf
+# if HAVE_RAW_DECL_TOTALORDERF
+_GL_WARN_ON_USE (totalorderf, "totalorderf is unportable - "
+ "use gnulib module totalorderf for portability");
+# endif
+#endif
+
+#if @GNULIB_TOTALORDER@
+# if @REPLACE_TOTALORDER@
+# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef totalorder
+# define totalorder rpl_totalorder
+# endif
+_GL_FUNCDECL_RPL (totalorder, int, (double const *, double const *));
+_GL_CXXALIAS_RPL (totalorder, int, (double const *, double const *));
+# else
+# if !@HAVE_TOTALORDER@
+_GL_FUNCDECL_SYS (totalorder, int, (double const *, double const *));
+# endif
+_GL_CXXALIAS_SYS (totalorder, int, (double const *, double const *));
+# endif
+# if __GLIBC__ >= 2
+_GL_CXXALIASWARN1 (totalorder, int, (double const *, double const *));
+# endif
+#elif defined GNULIB_POSIXCHECK
+# undef totalorder
+# if HAVE_RAW_DECL_TOTALORDER
+_GL_WARN_ON_USE (totalorder, "totalorder is unportable - "
+ "use gnulib module totalorder for portability");
+# endif
+#endif
+
+#if @GNULIB_TOTALORDERL@
+# if @REPLACE_TOTALORDERL@
+# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef totalorderl
+# define totalorderl rpl_totalorderl
+# endif
+_GL_FUNCDECL_RPL (totalorderl, int,
+ (long double const *, long double const *));
+_GL_CXXALIAS_RPL (totalorderl, int,
+ (long double const *, long double const *));
+# else
+# if !@HAVE_TOTALORDERL@
+_GL_FUNCDECL_SYS (totalorderl, int,
+ (long double const *, long double const *));
+# endif
+_GL_CXXALIAS_SYS (totalorderl, int,
+ (long double const *, long double const *));
+# endif
+_GL_CXXALIASWARN (totalorderl);
+#elif defined GNULIB_POSIXCHECK
+# undef totalorderl
+# if HAVE_RAW_DECL_TOTALORDERL
+_GL_WARN_ON_USE (totalorderl, "totalorderl is unportable - "
+ "use gnulib module totalorderl for portability");
+# endif
+#endif
+
_GL_INLINE_HEADER_END
#endif /* _@GUARD_PREFIX@_MATH_H */
diff --git a/lib/totalorder.c b/lib/totalorder.c
new file mode 100644
index 0000000000..4c861f971c
--- /dev/null
+++ b/lib/totalorder.c
@@ -0,0 +1,45 @@
+/* Total order for 'double'
+ Copyright 2023 Free Software Foundation, Inc.
+
+ This file is free software: you can redistribute it and/or modify
+ it under the terms of the GNU Lesser General Public License as
+ published by the Free Software Foundation, either version 3 of the
+ License, or (at your option) any later version.
+
+ This file is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>. */
+
+/* Written by Paul Eggert. */
+
+#include <config.h>
+
+#include <math.h>
+
+int
+totalorder (double const *x, double const *y)
+{
+ /* Although the following is not strictly portable, and won't work
+ on some obsolete platforms (e.g., PA-RISC, MIPS before alignment
+ to IEEE 754-2008), it should be good enough nowadays. */
+ int xs = signbit (*x);
+ int ys = signbit (*y);
+ if (!xs != !ys)
+ return xs;
+ int xn = isnand (*x);
+ int yn = isnand (*y);
+ if (!xn != !yn)
+ return !xn == !xs;
+ if (!xn)
+ return *x <= *y;
+
+ unsigned long long extended_sign = -!!xs;
+ union { unsigned long long i; double f; } volatile xu = {0}, yu = {0};
+ xu.f = *x;
+ yu.f = *y;
+ return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+}
diff --git a/lib/totalorderf.c b/lib/totalorderf.c
new file mode 100644
index 0000000000..d86299ff09
--- /dev/null
+++ b/lib/totalorderf.c
@@ -0,0 +1,45 @@
+/* Total order for 'float'
+ Copyright 2023 Free Software Foundation, Inc.
+
+ This file is free software: you can redistribute it and/or modify
+ it under the terms of the GNU Lesser General Public License as
+ published by the Free Software Foundation, either version 3 of the
+ License, or (at your option) any later version.
+
+ This file is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>. */
+
+/* Written by Paul Eggert. */
+
+#include <config.h>
+
+#include <math.h>
+
+int
+totalorderf (float const *x, float const *y)
+{
+ /* Although the following is not strictly portable, and won't work
+ on some obsolete platforms (e.g., PA-RISC, MIPS before alignment
+ to IEEE 754-2008), it should be good enough nowadays. */
+ int xs = signbit (*x);
+ int ys = signbit (*y);
+ if (!xs != !ys)
+ return xs;
+ int xn = isnanf (*x);
+ int yn = isnanf (*y);
+ if (!xn != !yn)
+ return !xn == !xs;
+ if (!xn)
+ return *x <= *y;
+
+ unsigned long extended_sign = -!!xs;
+ union { unsigned long i; float f; } volatile xu = {0}, yu = {0};
+ xu.f = *x;
+ yu.f = *y;
+ return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+}
diff --git a/lib/totalorderl.c b/lib/totalorderl.c
new file mode 100644
index 0000000000..03aba3adf7
--- /dev/null
+++ b/lib/totalorderl.c
@@ -0,0 +1,63 @@
+/* Total order for 'long double'
+ Copyright 2023 Free Software Foundation, Inc.
+
+ This file is free software: you can redistribute it and/or modify
+ it under the terms of the GNU Lesser General Public License as
+ published by the Free Software Foundation, either version 3 of the
+ License, or (at your option) any later version.
+
+ This file is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>. */
+
+/* Written by Paul Eggert. */
+
+#include <config.h>
+
+#include <math.h>
+
+int
+totalorderl (long double const *x, long double const *y)
+{
+ /* Although the following is not strictly portable, and won't work
+ on some obsolete platforms (e.g., PA-RISC, MIPS before alignment
+ to IEEE 754-2008), it should be good enough nowadays. */
+ int xs = signbit (*x);
+ int ys = signbit (*y);
+ if (!xs != !ys)
+ return xs;
+ int xn = isnanl (*x);
+ int yn = isnanl (*y);
+ if (!xn != !yn)
+ return !xn == !xs;
+ if (!xn)
+ return *x <= *y;
+
+ unsigned long long extended_sign = -!!xs;
+
+ if (sizeof (long double) <= sizeof (unsigned long long))
+ {
+ union { unsigned long long i; long double f; } volatile
+ xu = {0}, yu = {0};
+ xu.f = *x;
+ yu.f = *y;
+ return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+ }
+
+ union { unsigned long long i[2]; long double f; } volatile
+ xu = {0}, yu = {0}, zu = {0};
+ xu.f = *x;
+ yu.f = *y;
+ zu.f = -zu.f;
+ bool bigendian = !!zu.i[0];
+ unsigned long long
+ xhi = xu.i[!bigendian] ^ extended_sign,
+ yhi = yu.i[!bigendian] ^ extended_sign,
+ xlo = xu.i[ bigendian] ^ extended_sign,
+ ylo = yu.i[ bigendian] ^ extended_sign;
+ return (xhi < yhi) | ((xhi == yhi) & (xlo <= ylo));
+}
diff --git a/m4/math_h.m4 b/m4/math_h.m4
index c214f8efa8..008dd6fa3b 100644
--- a/m4/math_h.m4
+++ b/m4/math_h.m4
@@ -50,7 +50,7 @@ AC_DEFUN_ONCE([gl_MATH_H],
modf modff modfl powf
remainder remainderf remainderl
rint rintf rintl round roundf roundl sinf sinl sinhf sqrtf sqrtl
- tanf tanl tanhf trunc truncf truncl])
+ tanf tanl tanhf totalorder totalorderf totalorderl trunc truncf truncl])
])
# gl_MATH_MODULE_INDICATOR([modulename])
@@ -165,6 +165,9 @@ AC_DEFUN([gl_MATH_H_REQUIRE_DEFAULTS],
gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TANF])
gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TANL])
gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TANHF])
+ gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TOTALORDER])
+ gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TOTALORDERF])
+ gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TOTALORDERL])
gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TRUNC])
gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TRUNCF])
gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_TRUNCL])
@@ -243,6 +246,9 @@ AC_DEFUN([gl_MATH_H_DEFAULTS],
HAVE_TANF=1; AC_SUBST([HAVE_TANF])
HAVE_TANL=1; AC_SUBST([HAVE_TANL])
HAVE_TANHF=1; AC_SUBST([HAVE_TANHF])
+ HAVE_TOTALORDER=1; AC_SUBST([HAVE_TOTALORDER])
+ HAVE_TOTALORDERF=1; AC_SUBST([HAVE_TOTALORDERF])
+ HAVE_TOTALORDERL=1; AC_SUBST([HAVE_TOTALORDERL])
HAVE_DECL_ACOSL=1; AC_SUBST([HAVE_DECL_ACOSL])
HAVE_DECL_ASINL=1; AC_SUBST([HAVE_DECL_ASINL])
HAVE_DECL_ATANL=1; AC_SUBST([HAVE_DECL_ATANL])
@@ -356,6 +362,9 @@ AC_DEFUN([gl_MATH_H_DEFAULTS],
REPLACE_SQRTL=0; AC_SUBST([REPLACE_SQRTL])
REPLACE_TANF=0; AC_SUBST([REPLACE_TANF])
REPLACE_TANHF=0; AC_SUBST([REPLACE_TANHF])
+ REPLACE_TOTALORDER=0; AC_SUBST([REPLACE_TOTALORDER])
+ REPLACE_TOTALORDERF=0; AC_SUBST([REPLACE_TOTALORDERF])
+ REPLACE_TOTALORDERL=0; AC_SUBST([REPLACE_TOTALORDERL])
REPLACE_TRUNC=0; AC_SUBST([REPLACE_TRUNC])
REPLACE_TRUNCF=0; AC_SUBST([REPLACE_TRUNCF])
REPLACE_TRUNCL=0; AC_SUBST([REPLACE_TRUNCL])
diff --git a/m4/mathfunc.m4 b/m4/mathfunc.m4
index 4cdd0a9001..00216ceae6 100644
--- a/m4/mathfunc.m4
+++ b/m4/mathfunc.m4
@@ -1,4 +1,4 @@
-# mathfunc.m4 serial 12
+# mathfunc.m4 serial 13
dnl Copyright (C) 2010-2023 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -40,10 +40,11 @@ AC_DEFUN([gl_MATHFUNC],
[m4_bpatsubst(
[m4_bpatsubst(
[$3],
- [int \*], [&i_ret])],
- [float \*], [&f_ret])],
- [double \*], [&d_ret])],
- [long double \*], [&l_ret])],
+ [int\( const\)? \*],
+ [&i_ret])],
+ [float\( const\)? \*], [&f_ret])],
+ [double\( const\)? \*], [&d_ret])],
+ [long double\( const\)? \*], [&l_ret])],
[int], [2])],
[float], [1.618034f])],
[long double], [1.618033988749894848L])],
diff --git a/m4/totalorder.m4 b/m4/totalorder.m4
new file mode 100644
index 0000000000..4d065dc0eb
--- /dev/null
+++ b/m4/totalorder.m4
@@ -0,0 +1,86 @@
+# totalorder.m4
+dnl Copyright 2023 Free Software Foundation, Inc.
+dnl This file is free software; the Free Software Foundation
+dnl gives unlimited permission to copy and/or distribute it,
+dnl with or without modifications, as long as this notice is preserved.
+
+AC_DEFUN([gl_FUNC_TOTALORDERF],
+[
+ AC_REQUIRE([gl_MATH_H_DEFAULTS])
+ AC_REQUIRE([gl_USE_SYSTEM_EXTENSIONS])
+
+ gl_MATHFUNC([totalorderf], [int],
+ [(float const *, float const *)],
+ [extern
+ #ifdef __cplusplus
+ "C"
+ #endif
+ int totalorderf (float const *, float const *);
+ ])
+ AS_IF([test $gl_cv_func_totalorderf_no_libm != yes &&
+ test $gl_cv_func_totalorderf_in_libm != yes],
+ [gl_saved_LIBS=$LIBS
+ AC_SEARCH_LIBS([totalorderf], [m])
+ LIBS=$gl_saved_LIBS
+ if test "$ac_cv_search_totalorderf" = no; then
+ HAVE_TOTALORDERF=0
+ else
+ REPLACE_TOTALORDERF=1
+ fi
+ TOTALORDER_LIBM='$(ISNANF_LIBM)'])
+ AC_SUBST([TOTALORDERF_LIBM])
+])
+
+AC_DEFUN([gl_FUNC_TOTALORDER],
+[
+ AC_REQUIRE([gl_MATH_H_DEFAULTS])
+ AC_REQUIRE([gl_USE_SYSTEM_EXTENSIONS])
+
+ gl_MATHFUNC([totalorder], [int],
+ [(double const *, double const *)],
+ [extern
+ #ifdef __cplusplus
+ "C"
+ #endif
+ int totalorder (double const *, double const *);
+ ])
+ AS_IF([test $gl_cv_func_totalorder_no_libm != yes &&
+ test $gl_cv_func_totalorder_in_libm != yes],
+ [gl_saved_LIBS=$LIBS
+ AC_SEARCH_LIBS([totalorder], [m])
+ LIBS=$gl_saved_LIBS
+ if test "$ac_cv_search_totalorder" = no; then
+ HAVE_TOTALORDER=0
+ else
+ REPLACE_TOTALORDER=1
+ fi
+ TOTALORDER_LIBM='$(ISNAND_LIBM)'])
+ AC_SUBST([TOTALORDER_LIBM])
+])
+
+AC_DEFUN([gl_FUNC_TOTALORDERL],
+[
+ AC_REQUIRE([gl_MATH_H_DEFAULTS])
+ AC_REQUIRE([gl_USE_SYSTEM_EXTENSIONS])
+
+ gl_MATHFUNC([totalorderl], [int],
+ [(long double const *, long double const *)],
+ [extern
+ #ifdef __cplusplus
+ "C"
+ #endif
+ int totalorderl (long double const *, long double const *);
+ ])
+ AS_IF([test $gl_cv_func_totalorderl_no_libm != yes &&
+ test $gl_cv_func_totalorderl_in_libm != yes],
+ [gl_saved_LIBS=$LIBS
+ AC_SEARCH_LIBS([totalorderl], [m])
+ LIBS=$gl_saved_LIBS
+ if test "$ac_cv_search_totalorderl" = no; then
+ HAVE_TOTALORDERL=0
+ else
+ REPLACE_TOTALORDERL=1
+ fi
+ TOTALORDERL_LIBM='$(ISNANL_LIBM)'])
+ AC_SUBST([TOTALORDERL_LIBM])
+])
diff --git a/modules/math b/modules/math
index f107141533..f816d64cd2 100644
--- a/modules/math
+++ b/modules/math
@@ -132,6 +132,9 @@ math.h: math.in.h $(top_builddir)/config.status $(CXXDEFS_H) $(ARG_NONNULL_H) $(
-e 's/@''GNULIB_TRUNC''@/$(GNULIB_TRUNC)/g' \
-e 's/@''GNULIB_TRUNCF''@/$(GNULIB_TRUNCF)/g' \
-e 's/@''GNULIB_TRUNCL''@/$(GNULIB_TRUNCL)/g' \
+ -e 's/@''GNULIB_TOTALORDER''@/$(GNULIB_TOTALORDER)/g' \
+ -e 's/@''GNULIB_TOTALORDERF''@/$(GNULIB_TOTALORDERF)/g' \
+ -e 's/@''GNULIB_TOTALORDERL''@/$(GNULIB_TOTALORDERL)/g' \
-e 's/@''GNULIB_MDA_J0''@/$(GNULIB_MDA_J0)/g' \
-e 's/@''GNULIB_MDA_J1''@/$(GNULIB_MDA_J1)/g' \
-e 's/@''GNULIB_MDA_JN''@/$(GNULIB_MDA_JN)/g' \
@@ -200,6 +203,9 @@ math.h: math.in.h $(top_builddir)/config.status $(CXXDEFS_H) $(ARG_NONNULL_H) $(
-e 's|@''HAVE_TANF''@|$(HAVE_TANF)|g' \
-e 's|@''HAVE_TANL''@|$(HAVE_TANL)|g' \
-e 's|@''HAVE_TANHF''@|$(HAVE_TANHF)|g' \
+ -e 's|@''HAVE_TOTALORDER''@|$(HAVE_TOTALORDER)|g' \
+ -e 's|@''HAVE_TOTALORDERF''@|$(HAVE_TOTALORDERF)|g' \
+ -e 's|@''HAVE_TOTALORDERL''@|$(HAVE_TOTALORDERL)|g' \
< $@-t2 > $@-t3
$(AM_V_at)sed \
-e 's|@''HAVE_DECL_ACOSL''@|$(HAVE_DECL_ACOSL)|g' \
@@ -320,6 +326,9 @@ math.h: math.in.h $(top_builddir)/config.status $(CXXDEFS_H) $(ARG_NONNULL_H) $(
-e 's|@''REPLACE_SQRTL''@|$(REPLACE_SQRTL)|g' \
-e 's|@''REPLACE_TANF''@|$(REPLACE_TANF)|g' \
-e 's|@''REPLACE_TANHF''@|$(REPLACE_TANHF)|g' \
+ -e 's|@''REPLACE_TOTALORDER''@|$(REPLACE_TOTALORDER)|g' \
+ -e 's|@''REPLACE_TOTALORDERF''@|$(REPLACE_TOTALORDERF)|g' \
+ -e 's|@''REPLACE_TOTALORDERL''@|$(REPLACE_TOTALORDERL)|g' \
-e 's|@''REPLACE_TRUNC''@|$(REPLACE_TRUNC)|g' \
-e 's|@''REPLACE_TRUNCF''@|$(REPLACE_TRUNCF)|g' \
-e 's|@''REPLACE_TRUNCL''@|$(REPLACE_TRUNCL)|g' \
diff --git a/modules/totalorder b/modules/totalorder
new file mode 100644
index 0000000000..c74fa383b2
--- /dev/null
+++ b/modules/totalorder
@@ -0,0 +1,36 @@
+Description:
+totalorder function: total order on float
+
+Files:
+lib/totalorder.c
+m4/mathfunc.m4
+m4/totalorder.m4
+
+Depends-on:
+math
+extensions
+isnand [test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 0]
+signbit [test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 0]
+
+configure.ac:
+gl_FUNC_TOTALORDER
+gl_CONDITIONAL([GL_COND_OBJ_TOTALORDER],
+ [test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 1])
+gl_MATH_MODULE_INDICATOR([totalorder])
+
+Makefile.am:
+if GL_COND_OBJ_TOTALORDER
+lib_SOURCES += totalorder.c
+endif
+
+Include:
+<math.h>
+
+Link:
+$(TOTALORDER_LIBM)
+
+License:
+LGPL
+
+Maintainer:
+all
diff --git a/modules/totalorder-tests b/modules/totalorder-tests
new file mode 100644
index 0000000000..fbcfb11a99
--- /dev/null
+++ b/modules/totalorder-tests
@@ -0,0 +1,15 @@
+Files:
+tests/test-totalorder.c
+tests/minus-zero.h
+tests/infinity.h
+tests/nan.h
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-totalorder
+check_PROGRAMS += test-totalorder
+test_totalorder_LDADD = $(LDADD) @TOTALORDER_LIBM@
diff --git a/modules/totalorderf b/modules/totalorderf
new file mode 100644
index 0000000000..b3fb5d99e2
--- /dev/null
+++ b/modules/totalorderf
@@ -0,0 +1,36 @@
+Description:
+totalorderf function: total order on float
+
+Files:
+lib/totalorderf.c
+m4/mathfunc.m4
+m4/totalorder.m4
+
+Depends-on:
+math
+extensions
+isnanf [test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 0]
+signbit [test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 0]
+
+configure.ac:
+gl_FUNC_TOTALORDERF
+gl_CONDITIONAL([GL_COND_OBJ_TOTALORDERF],
+ [test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 1])
+gl_MATH_MODULE_INDICATOR([totalorderf])
+
+Makefile.am:
+if GL_COND_OBJ_TOTALORDERF
+lib_SOURCES += totalorderf.c
+endif
+
+Include:
+<math.h>
+
+Link:
+$(TOTALORDERF_LIBM)
+
+License:
+LGPL
+
+Maintainer:
+all
diff --git a/modules/totalorderf-tests b/modules/totalorderf-tests
new file mode 100644
index 0000000000..e76c9fda38
--- /dev/null
+++ b/modules/totalorderf-tests
@@ -0,0 +1,16 @@
+Files:
+tests/test-totalorderf.c
+tests/test-totalorder.c
+tests/minus-zero.h
+tests/infinity.h
+tests/nan.h
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-totalorderf
+check_PROGRAMS += test-totalorderf
+test_totalorderf_LDADD = $(LDADD) @TOTALORDERF_LIBM@
diff --git a/modules/totalorderl b/modules/totalorderl
new file mode 100644
index 0000000000..407b243bb3
--- /dev/null
+++ b/modules/totalorderl
@@ -0,0 +1,37 @@
+Description:
+totalorderl function: total order on long double
+
+Files:
+lib/totalorderl.c
+m4/mathfunc.m4
+m4/totalorder.m4
+
+Depends-on:
+math
+extensions
+stdbool [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 0]
+isnanl [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 0]
+signbit [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 0]
+
+configure.ac:
+gl_FUNC_TOTALORDERL
+gl_CONDITIONAL([GL_COND_OBJ_TOTALORDERL],
+ [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 1])
+gl_MATH_MODULE_INDICATOR([totalorderl])
+
+Makefile.am:
+if GL_COND_OBJ_TOTALORDERL
+lib_SOURCES += totalorderl.c
+endif
+
+Include:
+<math.h>
+
+Link:
+$(TOTALORDERL_LIBM)
+
+License:
+LGPL
+
+Maintainer:
+all
diff --git a/modules/totalorderl-tests b/modules/totalorderl-tests
new file mode 100644
index 0000000000..cfc50e21e9
--- /dev/null
+++ b/modules/totalorderl-tests
@@ -0,0 +1,16 @@
+Files:
+tests/test-totalorderl.c
+tests/test-totalorder.c
+tests/minus-zero.h
+tests/infinity.h
+tests/nan.h
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-totalorderl
+check_PROGRAMS += test-totalorderl
+test_totalorderl_LDADD = $(LDADD) @TOTALORDERL_LIBM@
diff --git a/tests/test-totalorder.c b/tests/test-totalorder.c
new file mode 100644
index 0000000000..1fc1b84568
--- /dev/null
+++ b/tests/test-totalorder.c
@@ -0,0 +1,57 @@
+/* Test totalorder.
+ Copyright 2023 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <math.h>
+
+#include "infinity.h"
+#include "macros.h"
+#include "minus-zero.h"
+#include "nan.h"
+
+#ifndef TOTALORDER
+# define TOTALORDER totalorder
+# define TOTALORDER_INF Infinityd
+# define TOTALORDER_MINUS_ZERO minus_zerod
+# define TOTALORDER_NAN NaNd
+# define TOTALORDER_TYPE double
+#endif
+
+int
+main ()
+{
+ TOTALORDER_TYPE x[] =
+ {
+ -TOTALORDER_NAN (), -TOTALORDER_INF (), -1e37, -1, -1e-5,
+ TOTALORDER_MINUS_ZERO, 0,
+ 1e-5, 1, 1e37, TOTALORDER_INF (), TOTALORDER_NAN ()
+ };
+ int n = sizeof x / sizeof *x;
+
+ /* TOTALORDER_NAN () yields a NaN of unknown sign, so fix the
+ first NaN to be negative and the last NaN to be nonnegative.
+ Do this via volatile accesses, to work around GCC bug 111655. */
+ TOTALORDER_TYPE volatile a = x[0], z = x[n - 1];
+ if (! signbit (a)) a = -a;
+ if ( signbit (z)) z = -z;
+ x[0] = a, x[n - 1] = z;
+
+ for (int i = 0; i < n; i++)
+ for (int j = 0; j < n; j++)
+ ASSERT (!!TOTALORDER (&x[i], &x[j]) == (i <= j));
+ return 0;
+}
diff --git a/tests/test-totalorderf.c b/tests/test-totalorderf.c
new file mode 100644
index 0000000000..2dcb47aca4
--- /dev/null
+++ b/tests/test-totalorderf.c
@@ -0,0 +1,6 @@
+#define TOTALORDER totalorderf
+#define TOTALORDER_INF Infinityf
+#define TOTALORDER_MINUS_ZERO minus_zerof
+#define TOTALORDER_NAN NaNf
+#define TOTALORDER_TYPE float
+#include "test-totalorder.c"
diff --git a/tests/test-totalorderl.c b/tests/test-totalorderl.c
new file mode 100644
index 0000000000..ac454ad848
--- /dev/null
+++ b/tests/test-totalorderl.c
@@ -0,0 +1,6 @@
+#define TOTALORDER totalorderl
+#define TOTALORDER_INF Infinityl
+#define TOTALORDER_MINUS_ZERO minus_zerol
+#define TOTALORDER_NAN NaNl
+#define TOTALORDER_TYPE long double
+#include "test-totalorder.c"
--
2.41.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH] totalorder, totalorderf, totalorderl: new modules
2023-10-02 5:21 [PATCH] totalorder, totalorderf, totalorderl: new modules Paul Eggert
@ 2023-10-02 11:33 ` Bruno Haible
2023-10-07 1:40 ` Paul Eggert
2023-10-02 11:39 ` [PATCH] totalorder, totalorderf, totalorderl: new modules Bruno Haible
` (3 subsequent siblings)
4 siblings, 1 reply; 33+ messages in thread
From: Bruno Haible @ 2023-10-02 11:33 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
Note: it looks like the libstdc++ has similar functions, with a templated
implementation:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96526
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9805965e3551b66b5bd751d6076791d00cdeb137
Here's a discussion how to get nearly optimal code for this function on x86_64:
https://stackoverflow.com/questions/59348310/
But your portable code appears to produce fairly decent x86_64 assembly code
as well.
There were a few typos, which I'm correcting like this:
2023-10-02 Bruno Haible <bruno@clisp.org>
totalorder, totalorderf, totalorderl: Fix some typos.
* m4/totalorder.m4 (gl_FUNC_TOTALORDERF): Assign TOTALORDERF_LIBM, not
TOTALORDER_LIBM.
* modules/totalorder (Description): Fix copy&paste mistake.
(Depends-on): Fix conditions.
* modules/totalorderf (Depends-on): Likewise.
* modules/totalorderl (Depends-on): Likewise.
diff --git a/m4/totalorder.m4 b/m4/totalorder.m4
index 4d065dc0eb..0bbd252675 100644
--- a/m4/totalorder.m4
+++ b/m4/totalorder.m4
@@ -27,7 +27,7 @@ AC_DEFUN([gl_FUNC_TOTALORDERF]
else
REPLACE_TOTALORDERF=1
fi
- TOTALORDER_LIBM='$(ISNANF_LIBM)'])
+ TOTALORDERF_LIBM='$(ISNANF_LIBM)'])
AC_SUBST([TOTALORDERF_LIBM])
])
diff --git a/modules/totalorder b/modules/totalorder
index c74fa383b2..00f190c42e 100644
--- a/modules/totalorder
+++ b/modules/totalorder
@@ -1,5 +1,5 @@
Description:
-totalorder function: total order on float
+totalorder function: total order on double
Files:
lib/totalorder.c
@@ -9,8 +9,8 @@ m4/totalorder.m4
Depends-on:
math
extensions
-isnand [test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 0]
-signbit [test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 0]
+isnand [test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 1]
+signbit [test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 1]
configure.ac:
gl_FUNC_TOTALORDER
diff --git a/modules/totalorderf b/modules/totalorderf
index b3fb5d99e2..ab1569c9cc 100644
--- a/modules/totalorderf
+++ b/modules/totalorderf
@@ -9,8 +9,8 @@ m4/totalorder.m4
Depends-on:
math
extensions
-isnanf [test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 0]
-signbit [test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 0]
+isnanf [test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 1]
+signbit [test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 1]
configure.ac:
gl_FUNC_TOTALORDERF
diff --git a/modules/totalorderl b/modules/totalorderl
index 407b243bb3..d684c8c69c 100644
--- a/modules/totalorderl
+++ b/modules/totalorderl
@@ -9,9 +9,9 @@ m4/totalorder.m4
Depends-on:
math
extensions
-stdbool [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 0]
-isnanl [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 0]
-signbit [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 0]
+stdbool [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 1]
+isnanl [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 1]
+signbit [test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 1]
configure.ac:
gl_FUNC_TOTALORDERL
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH] totalorder, totalorderf, totalorderl: new modules
2023-10-02 5:21 [PATCH] totalorder, totalorderf, totalorderl: new modules Paul Eggert
2023-10-02 11:33 ` Bruno Haible
@ 2023-10-02 11:39 ` Bruno Haible
2023-10-04 10:59 ` Bruno Haible
2023-10-04 11:39 ` totalorder* tests: Refactor Bruno Haible
` (2 subsequent siblings)
4 siblings, 1 reply; 33+ messages in thread
From: Bruno Haible @ 2023-10-02 11:39 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
Since this is obviously CPU dependent code, I tested it on various platforms.
There were failures only on two platforms (see 'FAIL' below); the tests
passed on all other platforms.
* Linux/glibc
x86_64: Ubuntu 22.04, Fedora 31, Debian 9.6
x86: Ubuntu 22.04, Debian 9.1
x86_64-x32: Ubuntu 22.04
alpha: Debian 5
arm: Arch
arm64: CentOS 8 Stream
hppa: Debian 12
ia64: RHEL 5
loongarch64: Debian 12
mips64 (big-endian): ABIs 32, 64
mips64 (little-endian): ABIs 32, n32, 64
powerpc64 (big-endian): ABIs 32, 64
powerpc64le: CentOS 7
riscv64: Fedora 28
s390x: ABIs 32, 64 on Debian 8
sparc64: ABIs 32, 64 on Debian 9
* Linux/glibc via QEMU user-mode
- alpha
- arm, armhf
- arm64
- hppa
- m68k: FAIL
- mips32, mips64
- powerpc, powerpc64, powerpc64-elfv2
- riscv64
- s390x
- sh4
- sparc64
* Linux/musl libc
x86_64: Alpine 3.18
x86: Alpine 3.18
arm64: Alpine 3.13
powerpc64le: Alpine 3.13
s390x: Alpine 3.13
* macOS
arm64: macOS 12.5
* FreeBSD
x86_64: FreeBSD 11.0
x86: FreeBSD 11.0
arm: FreeBSD 12.2
arm64: FreeBSD 12.2
sparc64: FreeBSD 12.2
* NetBSD
x86_64: NetBSD 8.0
x86: NetBSD 8.0
sparc: NetBSD 7.1
sparc64: ABIs 32, 64 on NetBSD 8.0
* OpenBSD
x86_64: OpenBSD 7.2
* AIX
powerpc64: ABIs 32, 64 on AIX 7.1, AIX 7.3.1
* Solaris
x86_64: ABIs 32, 64 on Solaris 10..., Solaris 11.4
except ABI 64 with cc on Solaris 10: FAIL: test-totalorderl
sparc64: ABIs 32, 64 on Solaris 10, Solaris 11.3
* Cygwin
x86_64: Cygwin 2.9.0
x86: Cygwin 2.9.0
* Native Windows
x86_64: mingw 5, MSVC 14
x86: mingw 5, MSVC 14
* Haiku
x86_64: Haiku r1beta4
* Android
arm: Android 11 (Termux)
Bruno
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] totalorder, totalorderf, totalorderl: new modules
2023-10-02 11:39 ` [PATCH] totalorder, totalorderf, totalorderl: new modules Bruno Haible
@ 2023-10-04 10:59 ` Bruno Haible
2023-10-04 11:21 ` totalorderl: Optimize Bruno Haible
0 siblings, 1 reply; 33+ messages in thread
From: Bruno Haible @ 2023-10-04 10:59 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
> * Solaris
> x86_64: ABIs 32, 64 on Solaris 10..., Solaris 11.4
> except ABI 64 with cc on Solaris 10: FAIL: test-totalorderl
The cause of this test failure is that the initialization of xu, yu, zu
in
union { unsigned long long i[2]; long double f; } volatile
xu = {0}, yu = {0}, zu = {0};
does not happen as expected.
The compiler is
cc: Sun C 5.8 Patch 121016-08 2009/04/20
and I'm using
CC="cc -D_STDC_C99= -xarch=generic64 -O"
I tracked this down
1) by printing which (i,j) pairs in the loop fail the assertion,
namely usually i=11, j=11,
2) by adding an assertion before the nested loop:
ASSERT (!!TOTALORDER (&x[11], &x[11]));
3) by single-stepping this invocation at the instruction level.
This piece of code:
union { unsigned long long i[2]; long double f; } volatile
xu = {0}, yu = {0}, zu = {0};
xu.f = *x;
yu.f = *y;
produces this sequence of instructions:
401132: db 2d 88 02 00 00 fldt 0x288(%rip) # 4013c0 <Drodata.rodata+0x20>
401138: db 7c 24 40 fstpt 0x40(%rsp)
40113c: db 2d 6e 02 00 00 fldt 0x26e(%rip) # 4013b0 <Drodata.rodata+0x10>
401142: db 7c 24 50 fstpt 0x50(%rsp)
401146: db 2d 54 02 00 00 fldt 0x254(%rip) # 4013a0 <Drodata.rodata>
40114c: db 7c 24 60 fstpt 0x60(%rsp)
401150: db 7c 24 40 fstpt 0x40(%rsp)
401154: db 7c 24 50 fstpt 0x50(%rsp)
and after their execution, I see that the upper 48 bits of each of xu, yu, zu
are not the expected zero bits:
(gdb) print *(void*(*)[2])($rsp+0x40)
$10 = {0xc000000000000000, 0xfffffd7fff217fff}
(gdb) print *(void*(*)[2])($rsp+0x50)
$11 = {0xc000000000000000, 0x407fff}
(gdb) print *(void*(*)[2])($rsp+0x60)
$12 = {0x0, 0xbee4f8b588e30000}
This patch makes the test succeed, and shouldn't slow it down on the other
platforms.
2023-10-04 Bruno Haible <bruno@clisp.org>
totalorderl: Work around Solaris cc bug.
* lib/totalorderl.c (totalorderl): Initialize xu, yu, zu using a
different syntax.
diff --git a/lib/totalorderl.c b/lib/totalorderl.c
index 03aba3adf7..84fed3c4d1 100644
--- a/lib/totalorderl.c
+++ b/lib/totalorderl.c
@@ -26,17 +26,28 @@ totalorderl (long double const *x, long double const *y)
/* Although the following is not strictly portable, and won't work
on some obsolete platforms (e.g., PA-RISC, MIPS before alignment
to IEEE 754-2008), it should be good enough nowadays. */
+
+ /* If the sign bits of *X and *Y differ, the one with sign bit == 1
+ is "smaller" than the one with sign bit == 0. */
int xs = signbit (*x);
int ys = signbit (*y);
if (!xs != !ys)
return xs;
+
+ /* If one of *X, *Y is a NaN and the other isn't, the answer is easy
+ as well: the negative NaN is "smaller", the positive NaN is "greater"
+ than the other argument. */
int xn = isnanl (*x);
int yn = isnanl (*y);
if (!xn != !yn)
return !xn == !xs;
+ /* If none of *X, *Y is a NaN, the '<=' operator does the job, including
+ for -Infinity and +Infinity. */
if (!xn)
return *x <= *y;
+ /* At this point, *X and *Y are NaNs with the same sign bit. */
+
unsigned long long extended_sign = -!!xs;
if (sizeof (long double) <= sizeof (unsigned long long))
@@ -48,12 +59,23 @@ totalorderl (long double const *x, long double const *y)
return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
}
- union { unsigned long long i[2]; long double f; } volatile
- xu = {0}, yu = {0}, zu = {0};
+ union { unsigned long long i[2]; long double f; } volatile xu, yu, zu;
+ /* It would be tempting to initialize xu, yu, zu with {0}. But
+ Solaris cc (Sun C 5.8) on x86_64 miscompiles it: It initializes
+ only the lower 80 bits, not the entire 128 bits, of each of the
+ three variables. */
+ xu.i[0] = 0; xu.i[1] = 0;
+ yu.i[0] = 0; yu.i[1] = 0;
+ zu.i[0] = 0; zu.i[1] = 0;
+
xu.f = *x;
yu.f = *y;
+
+ /* Determine in which of the two 'unsigned long long' words the sign bit
+ is located. */
zu.f = -zu.f;
bool bigendian = !!zu.i[0];
+
unsigned long long
xhi = xu.i[!bigendian] ^ extended_sign,
yhi = yu.i[!bigendian] ^ extended_sign,
^ permalink raw reply related [flat|nested] 33+ messages in thread
* totalorderl: Optimize
2023-10-04 10:59 ` Bruno Haible
@ 2023-10-04 11:21 ` Bruno Haible
0 siblings, 0 replies; 33+ messages in thread
From: Bruno Haible @ 2023-10-04 11:21 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
The variable 'bigendian' denotes whether the sign bit is in the first or
second 64-bit word of the 'long double'. But in the usual case (namely,
when not cross-compiling), the 'signbit' module already has determined at
configure time the location of the sign bit. With this knowledge, it is
possible to optimize the code somewhat.
2023-10-04 Bruno Haible <bruno@clisp.org>
totalorderl: Optimize.
* modules/totalorderl (Files): Add m4/signbit.m4.
* m4/totalorder.m4 (gl_FUNC_TOTALORDERL): Invoke
gl_LONG_DOUBLE_SIGN_LOCATION.
* lib/totalorderl.c (totalorderl): If LDBL_SIGNBIT_WORD is known,
use it, so that 'bigendian' becomes a constant.
diff --git a/lib/totalorderl.c b/lib/totalorderl.c
index 84fed3c4d1..cf7da088ac 100644
--- a/lib/totalorderl.c
+++ b/lib/totalorderl.c
@@ -73,8 +73,14 @@ totalorderl (long double const *x, long double const *y)
/* Determine in which of the two 'unsigned long long' words the sign bit
is located. */
+ bool bigendian;
+#if defined LDBL_SIGNBIT_WORD
+ /* We have already determined the sign bit location at configure time. */
+ bigendian = (LDBL_SIGNBIT_WORD < 2);
+#else
zu.f = -zu.f;
- bool bigendian = !!zu.i[0];
+ bigendian = !!zu.i[0];
+#endif
unsigned long long
xhi = xu.i[!bigendian] ^ extended_sign,
diff --git a/m4/totalorder.m4 b/m4/totalorder.m4
index 0bbd252675..95aab65f19 100644
--- a/m4/totalorder.m4
+++ b/m4/totalorder.m4
@@ -81,6 +81,9 @@ AC_DEFUN([gl_FUNC_TOTALORDERL]
else
REPLACE_TOTALORDERL=1
fi
- TOTALORDERL_LIBM='$(ISNANL_LIBM)'])
+ TOTALORDERL_LIBM='$(ISNANL_LIBM)'
+ dnl Prerequisite of lib/totalorderl.c.
+ gl_LONG_DOUBLE_SIGN_LOCATION
+ ])
AC_SUBST([TOTALORDERL_LIBM])
])
diff --git a/modules/totalorderl b/modules/totalorderl
index d684c8c69c..ed6f801130 100644
--- a/modules/totalorderl
+++ b/modules/totalorderl
@@ -5,6 +5,7 @@ Files:
lib/totalorderl.c
m4/mathfunc.m4
m4/totalorder.m4
+m4/signbit.m4
Depends-on:
math
^ permalink raw reply related [flat|nested] 33+ messages in thread
* totalorder* tests: Refactor
2023-10-02 5:21 [PATCH] totalorder, totalorderf, totalorderl: new modules Paul Eggert
2023-10-02 11:33 ` Bruno Haible
2023-10-02 11:39 ` [PATCH] totalorder, totalorderf, totalorderl: new modules Bruno Haible
@ 2023-10-04 11:39 ` Bruno Haible
2023-10-05 19:59 ` totalorderl: minor porting fixes Bruno Haible
2023-10-07 11:02 ` [PATCH] totalorder, totalorderf, totalorderl: new modules Bruno Haible
4 siblings, 0 replies; 33+ messages in thread
From: Bruno Haible @ 2023-10-04 11:39 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
In the test-totalorder.c file, the destructive array manipulations are a bit
tedious to understand. With a functional style, the code becomes clearer
(and possibly reusable in other tests).
2023-10-04 Bruno Haible <bruno@clisp.org>
totalorder* tests: Refactor.
* tests/test-totalorder.c (positive_nan, negative_nan): New functions,
extracted from main.
(main): Use them when initializing the array.
diff --git a/tests/test-totalorder.c b/tests/test-totalorder.c
index 1fc1b84568..b4a121c2a2 100644
--- a/tests/test-totalorder.c
+++ b/tests/test-totalorder.c
@@ -31,25 +31,37 @@
# define TOTALORDER_TYPE double
#endif
+/* Return a quiet NaN with sign bit == 0. */
+static TOTALORDER_TYPE
+positive_nan ()
+{
+ /* 'volatile' works around a GCC bug:
+ <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655> */
+ TOTALORDER_TYPE volatile nan = TOTALORDER_NAN ();
+ return (signbit (nan) ? - nan : nan);
+}
+
+/* Return a quiet NaN with sign bit == 1. */
+static TOTALORDER_TYPE
+negative_nan ()
+{
+ /* 'volatile' works around a GCC bug:
+ <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655> */
+ TOTALORDER_TYPE volatile nan = TOTALORDER_NAN ();
+ return (signbit (nan) ? nan : - nan);
+}
+
int
main ()
{
TOTALORDER_TYPE x[] =
{
- -TOTALORDER_NAN (), -TOTALORDER_INF (), -1e37, -1, -1e-5,
+ negative_nan (), -TOTALORDER_INF (), -1e37, -1, -1e-5,
TOTALORDER_MINUS_ZERO, 0,
- 1e-5, 1, 1e37, TOTALORDER_INF (), TOTALORDER_NAN ()
+ 1e-5, 1, 1e37, TOTALORDER_INF (), positive_nan ()
};
int n = sizeof x / sizeof *x;
- /* TOTALORDER_NAN () yields a NaN of unknown sign, so fix the
- first NaN to be negative and the last NaN to be nonnegative.
- Do this via volatile accesses, to work around GCC bug 111655. */
- TOTALORDER_TYPE volatile a = x[0], z = x[n - 1];
- if (! signbit (a)) a = -a;
- if ( signbit (z)) z = -z;
- x[0] = a, x[n - 1] = z;
-
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
ASSERT (!!TOTALORDER (&x[i], &x[j]) == (i <= j));
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: totalorderl: minor porting fixes
2023-10-02 5:21 [PATCH] totalorder, totalorderf, totalorderl: new modules Paul Eggert
` (2 preceding siblings ...)
2023-10-04 11:39 ` totalorder* tests: Refactor Bruno Haible
@ 2023-10-05 19:59 ` Bruno Haible
2023-10-07 11:02 ` [PATCH] totalorder, totalorderf, totalorderl: new modules Bruno Haible
4 siblings, 0 replies; 33+ messages in thread
From: Bruno Haible @ 2023-10-05 19:59 UTC (permalink / raw)
To: bug-gnulib; +Cc: Paul Eggert
> Don’t assume sizeof (unsigned long long) == 2 * sizeof (unsigned).
Thanks. Today 'unsigned int' is uint32_t, and 'unsigned long long' is uint64_t,
but who knows whether this will still be the case in 10 years...
Bruno
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] totalorder, totalorderf, totalorderl: new modules
2023-10-02 11:33 ` Bruno Haible
@ 2023-10-07 1:40 ` Paul Eggert
2023-10-07 2:33 ` sort and -lm Bruno Haible
0 siblings, 1 reply; 33+ messages in thread
From: Paul Eggert @ 2023-10-07 1:40 UTC (permalink / raw)
To: Bruno Haible, bug-gnulib
On 2023-10-02 04:33, Bruno Haible wrote:
> But your portable code appears to produce fairly decent x86_64 assembly
> code as well.
Thanks for your fixes. Yes, the goal was portability rather than saving
a few instructions. A high-performance implementation of totalorder
would need help from the compiler anyway, just as signbit gets help from
__builtin_signbit with GCC.
The win for GNU sort, once I get around to fixing totalorder's -lm
dependency, is that it can call totalorderl instead of doing what it
does now, which is to call sprintf twice and then strcmp.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort and -lm
2023-10-07 1:40 ` Paul Eggert
@ 2023-10-07 2:33 ` Bruno Haible
2023-10-07 3:38 ` Paul Eggert
0 siblings, 1 reply; 33+ messages in thread
From: Bruno Haible @ 2023-10-07 2:33 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
Paul Eggert wrote:
> The win for GNU sort, once I get around to fixing totalorder's -lm
> dependency
Why would that be a problem? The average run time of a 'sort' invocation
is long enough that an additional link dependency to libm shouldn't matter.
'sort' already links against libcrypto:
$ nm sort | grep ' U ' | grep -v @GLIBC
U MD5_Final@OPENSSL_3.0.0
U MD5_Init@OPENSSL_3.0.0
U MD5_Update@OPENSSL_3.0.0
If you really want to minimize the startup time, you could load the libcrypto,
needed for the option '--random', using dlopen() ? I wouldn't advocate it as
a solution on all platforms, since libltdl surely has its own set of
portability problems. But guarded by a '#ifdef __GLIBC__', why not?
Bruno
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort and -lm
2023-10-07 2:33 ` sort and -lm Bruno Haible
@ 2023-10-07 3:38 ` Paul Eggert
2023-10-07 11:42 ` Pádraig Brady
2023-10-07 13:20 ` sort and -lm Bruno Haible
0 siblings, 2 replies; 33+ messages in thread
From: Paul Eggert @ 2023-10-07 3:38 UTC (permalink / raw)
To: Bruno Haible, bug-gnulib
On 2023-10-06 19:33, Bruno Haible wrote:
> Why would that be a problem? The average run time of a 'sort' invocation
> is long enough that an additional link dependency to libm shouldn't matter.
We've avoided -lm in the past, I think more to avoid the dependency.
> If you really want to minimize the startup time, you could load the libcrypto,
> needed for the option '--random', using dlopen() ? I wouldn't advocate it as
> a solution on all platforms, since libltdl surely has its own set of
> portability problems. But guarded by a '#ifdef __GLIBC__', why not?
Yes, that sounds like it might be a win too. I hadn't noticed the
dependency on libcrypto (and libz).
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] totalorder, totalorderf, totalorderl: new modules
2023-10-02 5:21 [PATCH] totalorder, totalorderf, totalorderl: new modules Paul Eggert
` (3 preceding siblings ...)
2023-10-05 19:59 ` totalorderl: minor porting fixes Bruno Haible
@ 2023-10-07 11:02 ` Bruno Haible
4 siblings, 0 replies; 33+ messages in thread
From: Bruno Haible @ 2023-10-07 11:02 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
The gnulib CI failed today, with this error:
depbase=`echo c-vsnprintf.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
ccache gcc -DHAVE_CONFIG_H -DEXEEXT=\"\" -DEXEEXT=\"\" -DNO_XMALLOC -DEXEEXT=\"\" -I. -I.. -DGNULIB_STRICT_CHECKING=1 -fvisibility=hidden -g -O2 -MT c-vsnprintf.o -MD -MP -MF $depbase.Tpo -c -o c-vsnprintf.o c-vsnprintf.c &&\
mv -f $depbase.Tpo $depbase.Po
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:196,
from ./sys/types.h:46,
from ./stdio.h:76,
from c-vasnprintf.h:32,
from c-vasnprintf.c:20:
./math.h:3288:1: error: conflicting types for 'totalorderf'
_GL_FUNCDECL_SYS (totalorderf, int, (float const *, float const *));
^~~~~~~~~~~~~~~~
In file included from ./math.h:46,
from vasnprintf.c:107,
from c-vasnprintf.c:46:
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:371:1: note: previous declaration of 'totalorderf' was here
__MATHDECL_1 (int, totalorder,, (_Mdouble_ __x, _Mdouble_ __y))
^~~~~~~~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:196,
from ./sys/types.h:46,
from ./stdio.h:76,
from c-vasnprintf.h:32,
from c-vasnprintf.c:20:
./math.h:3311:1: error: conflicting types for 'totalorder'
_GL_FUNCDECL_SYS (totalorder, int, (double const *, double const *));
^~~~~~~~~~~~~~~~
In file included from /usr/include/features.h:424,
from /usr/include/assert.h:35,
from ./assert.h:26,
from ../config.h:8358,
from c-vasnprintf.c:17:
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:371:1: note: previous declaration of 'totalorder' was here
__MATHDECL_1 (int, totalorder,, (_Mdouble_ __x, _Mdouble_ __y))
^~~~~~~~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:196,
from ./sys/types.h:46,
from ./stdio.h:76,
from c-vasnprintf.h:32,
from c-vasnprintf.c:20:
./math.h:3338:1: error: conflicting types for 'totalorderl'
_GL_FUNCDECL_SYS (totalorderl, int,
^~~~~~~~~~~~~~~~
In file included from ./math.h:46,
from vasnprintf.c:107,
from c-vasnprintf.c:46:
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:371:1: note: previous declaration of 'totalorderl' was here
__MATHDECL_1 (int, totalorder,, (_Mdouble_ __x, _Mdouble_ __y))
^~~~~~~~~~~~
make[4]: *** [Makefile:11173: c-vasnprintf.o] Error 1
Our documentation says
"This function has a different signature on some platforms:
glibc 2.30."
The gnulib CI happens to be running on a Debian 10 system, that is, with
glibc 2.28. And indeed, I reproduce the compilation error on Debian 10
and CentOS 8; both have glibc 2.28.
The cause is the configuration code in m4/totalorder.m4. It attempts
to decide based on the *_no_libm and *_in_libm variables whether to
set REPLACE_TOTALORDER=1 or not. But that is not possible; it requires
a test of the declaration in <math.h>, not a link test, to determine this.
This patch fixes it.
2023-10-07 Bruno Haible <bruno@clisp.org>
totalorder*: Fix compilation error on glibc 2.25..2.30.
* m4/totalorder.m4 (gl_FUNC_TOTALORDERF): Test whether <math.h> has an
incompatible declaration of totalorderf, and set REPLACE_TOTALORDERF
to 1 if so.
(gl_FUNC_TOTALORDER): Test whether <math.h> has an incompatible
declaration of totalorder, and set REPLACE_TOTALORDER to 1 if so.
(gl_FUNC_TOTALORDERL): Test whether <math.h> has an incompatible
declaration of totalorderl, and set REPLACE_TOTALORDERL to 1 if so.
diff --git a/m4/totalorder.m4 b/m4/totalorder.m4
index 02c01fcd1a..ce264f4c65 100644
--- a/m4/totalorder.m4
+++ b/m4/totalorder.m4
@@ -9,22 +9,36 @@ AC_DEFUN([gl_FUNC_TOTALORDERF]
AC_REQUIRE([gl_MATH_H_DEFAULTS])
AC_REQUIRE([gl_USE_SYSTEM_EXTENSIONS])
- gl_MATHFUNC([totalorderf], [int],
- [(float const *, float const *)],
- [extern
- #ifdef __cplusplus
- "C"
- #endif
- int totalorderf (float const *, float const *);
+ dnl glibc versions < 2.31 had an incompatible declaration of this function,
+ dnl see <https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=42760d764649ad82f5fe45a26cbdf2c2500409f7>
+ AC_CACHE_CHECK([whether totalorderf has a non-standard declaration],
+ [gl_cv_func_totalorderf_incompatible],
+ [AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM(
+ [[#include <math.h>
+ ]],
+ [[extern
+ #ifdef __cplusplus
+ "C"
+ #endif
+ int totalorderf (float const *, float const *);
+ ]])
+ ],
+ [gl_cv_func_totalorderf_incompatible=no],
+ [gl_cv_func_totalorderf_incompatible=yes])
])
- AS_IF([test $gl_cv_func_totalorderf_no_libm != yes &&
- test $gl_cv_func_totalorderf_in_libm != yes],
- [if test $gl_cv_func_totalorderf_in_libm != yes; then
- HAVE_TOTALORDERF=0
- else
- REPLACE_TOTALORDERF=1
- fi
- TOTALORDERF_LIBM='$(ISNANF_LIBM)'])
+ if test $gl_cv_func_totalorderf_incompatible = yes; then
+ REPLACE_TOTALORDERF=1
+ else
+ gl_MATHFUNC([totalorderf], [int], [(float const *, float const *)])
+ if test $gl_cv_func_totalorderf_no_libm != yes \
+ && test $gl_cv_func_totalorderf_in_libm != yes; then
+ HAVE_TOTALORDERF=0
+ fi
+ fi
+ if test $HAVE_TOTALORDERF = 0 || test $REPLACE_TOTALORDERF = 1; then
+ TOTALORDERF_LIBM='$(ISNANF_LIBM)'
+ fi
AC_SUBST([TOTALORDERF_LIBM])
])
@@ -33,22 +47,36 @@ AC_DEFUN([gl_FUNC_TOTALORDER]
AC_REQUIRE([gl_MATH_H_DEFAULTS])
AC_REQUIRE([gl_USE_SYSTEM_EXTENSIONS])
- gl_MATHFUNC([totalorder], [int],
- [(double const *, double const *)],
- [extern
- #ifdef __cplusplus
- "C"
- #endif
- int totalorder (double const *, double const *);
+ dnl glibc versions < 2.31 had an incompatible declaration of this function,
+ dnl see <https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=42760d764649ad82f5fe45a26cbdf2c2500409f7>
+ AC_CACHE_CHECK([whether totalorder has a non-standard declaration],
+ [gl_cv_func_totalorder_incompatible],
+ [AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM(
+ [[#include <math.h>
+ ]],
+ [[extern
+ #ifdef __cplusplus
+ "C"
+ #endif
+ int totalorder (double const *, double const *);
+ ]])
+ ],
+ [gl_cv_func_totalorder_incompatible=no],
+ [gl_cv_func_totalorder_incompatible=yes])
])
- AS_IF([test $gl_cv_func_totalorder_no_libm != yes &&
- test $gl_cv_func_totalorder_in_libm != yes],
- [if test $gl_cv_func_totalorder_in_libm != yes; then
- HAVE_TOTALORDER=0
- else
- REPLACE_TOTALORDER=1
- fi
- TOTALORDER_LIBM='$(ISNAND_LIBM)'])
+ if test $gl_cv_func_totalorder_incompatible = yes; then
+ REPLACE_TOTALORDER=1
+ else
+ gl_MATHFUNC([totalorder], [int], [(double const *, double const *)])
+ if test $gl_cv_func_totalorder_no_libm != yes \
+ && test $gl_cv_func_totalorder_in_libm != yes; then
+ HAVE_TOTALORDER=0
+ fi
+ fi
+ if test $HAVE_TOTALORDER = 0 || test $REPLACE_TOTALORDER = 1; then
+ TOTALORDER_LIBM='$(ISNAND_LIBM)'
+ fi
AC_SUBST([TOTALORDER_LIBM])
])
@@ -57,24 +85,38 @@ AC_DEFUN([gl_FUNC_TOTALORDERL]
AC_REQUIRE([gl_MATH_H_DEFAULTS])
AC_REQUIRE([gl_USE_SYSTEM_EXTENSIONS])
- gl_MATHFUNC([totalorderl], [int],
- [(long double const *, long double const *)],
- [extern
- #ifdef __cplusplus
- "C"
- #endif
- int totalorderl (long double const *, long double const *);
- ])
- AS_IF([test $gl_cv_func_totalorderl_no_libm != yes &&
- test $gl_cv_func_totalorderl_in_libm != yes],
- [if test $gl_cv_func_totalorderl_in_libm != yes; then
- HAVE_TOTALORDERL=0
- else
- REPLACE_TOTALORDERL=1
- fi
- TOTALORDERL_LIBM='$(ISNANL_LIBM)'
- dnl Prerequisite of lib/totalorderl.c.
- gl_LONG_DOUBLE_SIGN_LOCATION
+ dnl glibc versions < 2.31 had an incompatible declaration of this function,
+ dnl see <https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=42760d764649ad82f5fe45a26cbdf2c2500409f7>
+ AC_CACHE_CHECK([whether totalorderl has a non-standard declaration],
+ [gl_cv_func_totalorderl_incompatible],
+ [AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM(
+ [[#include <math.h>
+ ]],
+ [[extern
+ #ifdef __cplusplus
+ "C"
+ #endif
+ int totalorderl (long double const *, long double const *);
+ ]])
+ ],
+ [gl_cv_func_totalorderl_incompatible=no],
+ [gl_cv_func_totalorderl_incompatible=yes])
])
+ if test $gl_cv_func_totalorderl_incompatible = yes; then
+ REPLACE_TOTALORDERL=1
+ else
+ gl_MATHFUNC([totalorderl], [int],
+ [(long double const *, long double const *)])
+ if test $gl_cv_func_totalorderl_no_libm != yes \
+ && test $gl_cv_func_totalorderl_in_libm != yes; then
+ HAVE_TOTALORDERL=0
+ fi
+ fi
+ if test $HAVE_TOTALORDERL = 0 || test $REPLACE_TOTALORDERL = 1; then
+ TOTALORDERL_LIBM='$(ISNANL_LIBM)'
+ dnl Prerequisite of lib/totalorderl.c.
+ gl_LONG_DOUBLE_SIGN_LOCATION
+ fi
AC_SUBST([TOTALORDERL_LIBM])
])
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: sort and -lm
2023-10-07 3:38 ` Paul Eggert
@ 2023-10-07 11:42 ` Pádraig Brady
2023-10-07 21:29 ` Paul Eggert
2023-10-07 13:20 ` sort and -lm Bruno Haible
1 sibling, 1 reply; 33+ messages in thread
From: Pádraig Brady @ 2023-10-07 11:42 UTC (permalink / raw)
To: Paul Eggert, Bruno Haible, bug-gnulib
On 07/10/2023 04:38, Paul Eggert wrote:
> On 2023-10-06 19:33, Bruno Haible wrote:
>> Why would that be a problem? The average run time of a 'sort' invocation
>> is long enough that an additional link dependency to libm shouldn't matter.
>
> We've avoided -lm in the past, I think more to avoid the dependency.
>
>> If you really want to minimize the startup time, you could load the libcrypto,
>> needed for the option '--random', using dlopen() ? I wouldn't advocate it as
>> a solution on all platforms, since libltdl surely has its own set of
>> portability problems. But guarded by a '#ifdef __GLIBC__', why not?
>
> Yes, that sounds like it might be a win too. I hadn't noticed the
> dependency on libcrypto (and libz).
>
FYI.
coreutils auto links with libcrypto >=3 since it's GPL compat,
and quite a bit faster than the fallback routines.
libz is a transitive dependency of libcrypto.
The auto linking is globally controlled with the --with-openssl
cofigure option, but you could build sort (and md5sum)
without that dependency with:
./configure ac_cv_lib_crypto_MD5=no
cheers,
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort and -lm
2023-10-07 3:38 ` Paul Eggert
2023-10-07 11:42 ` Pádraig Brady
@ 2023-10-07 13:20 ` Bruno Haible
1 sibling, 0 replies; 33+ messages in thread
From: Bruno Haible @ 2023-10-07 13:20 UTC (permalink / raw)
To: bug-gnulib, Paul Eggert
> > Why would that be a problem? The average run time of a 'sort' invocation
> > is long enough that an additional link dependency to libm shouldn't matter.
>
> We've avoided -lm in the past, I think more to avoid the dependency.
You could alternatively link with libm.a instead of -lm. This won't increase
the startup time of 'sort'.
Bruno
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort and -lm
2023-10-07 11:42 ` Pádraig Brady
@ 2023-10-07 21:29 ` Paul Eggert
2023-10-08 13:36 ` Pádraig Brady
0 siblings, 1 reply; 33+ messages in thread
From: Paul Eggert @ 2023-10-07 21:29 UTC (permalink / raw)
To: Pádraig Brady, Bruno Haible, bug-gnulib
On 2023-10-07 04:42, Pádraig Brady wrote:
>
> The auto linking is globally controlled with the --with-openssl
> cofigure option, but you could build sort (and md5sum)
> without that dependency with:
>
> ./configure ac_cv_lib_crypto_MD5=no
Thanks, I was thinking more along the lines that Bruno suggested, which
to continue to link to libcrypto, but do it with dlopen/dlsym in 'sort'
only when need_random is true.
It's not clear to me offhand whether this should be done entirely in
Coreutils, or whether we should add some Gnulib support to make it
easier to do this sort of lazier linking.
Also, why doesn't coreutils/configure's ---with-linux-crypto option
cause 'sort' to use Linux kernel crypto? Is this because the syscall
overhead would be too high and it's significantly faster to do it in
user space?
I naively thought that --with-linux-crypto would mean we wouldn't need
to link to libcrypto. Evidently not.....
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort and -lm
2023-10-07 21:29 ` Paul Eggert
@ 2023-10-08 13:36 ` Pádraig Brady
2023-10-08 20:53 ` sort dynamic linking overhead Pádraig Brady
0 siblings, 1 reply; 33+ messages in thread
From: Pádraig Brady @ 2023-10-08 13:36 UTC (permalink / raw)
To: Paul Eggert, Bruno Haible, bug-gnulib
On 07/10/2023 22:29, Paul Eggert wrote:
> On 2023-10-07 04:42, Pádraig Brady wrote:
>>
>> The auto linking is globally controlled with the --with-openssl
>> cofigure option, but you could build sort (and md5sum)
>> without that dependency with:
>>
>> ./configure ac_cv_lib_crypto_MD5=no
>
> Thanks, I was thinking more along the lines that Bruno suggested, which
> to continue to link to libcrypto, but do it with dlopen/dlsym in 'sort'
> only when need_random is true.
>
> It's not clear to me offhand whether this should be done entirely in
> Coreutils, or whether we should add some Gnulib support to make it
> easier to do this sort of lazier linking.
I was wondering if this was worth worrying about at all,
but it is a significant overhead that's worth improving.
To quantify the overhead I compared optimized builds,
with and without the above configure option, giving:
$ time seq 10000 | xargs -I'{}' src/sort /dev/null -k'{}'
real 0m7.009s
user 0m3.462s
sys 0m3.578s
$ time seq 10000 | xargs -I'{}' src/sort-lc /dev/null -k'{}'
real 0m12.950s
user 0m3.754s
sys 0m9.200s
So we should do something. Now dlopening libcrypto on demand
would work, but there may be better solutions.
sort doesn't have to use md5. It could use blake2 routines
already in coreutils to avoid the issue (and get some speed ups).
Alternatively it might use some other hash function.
For example see the other 128 bit functions compared at:
https://github.com/Cyan4973/xxHash
BTW there was mention of static linking as an option in this thread.
That's is an option to provide better speed an isolation for binaries,
however it's best left to the system builders to use this for their builds.
There can be security implications for prompt library updating,
and libcrypto is particularly sensitive in this regard.
> Also, why doesn't coreutils/configure's ---with-linux-crypto option
> cause 'sort' to use Linux kernel crypto? Is this because the syscall
> overhead would be too high and it's significantly faster to do it in
> user space?
>
> I naively thought that --with-linux-crypto would mean we wouldn't need
> to link to libcrypto. Evidently not.....
Well --with-linux-crypto is for specialized setups
and not really for general usage, due to syscall overhead,
differing network syscall class (consider locked down environments),
and generally slower than libcrypto. That was detailed at:
https://git.sv.gnu.org/cgit/gnulib.git/commit/?id=71cc633a0f
cheers,
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
* sort dynamic linking overhead
2023-10-08 13:36 ` Pádraig Brady
@ 2023-10-08 20:53 ` Pádraig Brady
2023-10-09 13:48 ` Pádraig Brady
0 siblings, 1 reply; 33+ messages in thread
From: Pádraig Brady @ 2023-10-08 20:53 UTC (permalink / raw)
To: Paul Eggert, Bruno Haible, bug-gnulib, Coreutils
[-- Attachment #1: Type: text/plain, Size: 2745 bytes --]
On 08/10/2023 14:36, Pádraig Brady wrote:
> On 07/10/2023 22:29, Paul Eggert wrote:
>> On 2023-10-07 04:42, Pádraig Brady wrote:
>>>
>>> The auto linking is globally controlled with the --with-openssl
>>> cofigure option, but you could build sort (and md5sum)
>>> without that dependency with:
>>>
>>> ./configure ac_cv_lib_crypto_MD5=no
>>
>> Thanks, I was thinking more along the lines that Bruno suggested, which
>> to continue to link to libcrypto, but do it with dlopen/dlsym in 'sort'
>> only when need_random is true.
>>
>> It's not clear to me offhand whether this should be done entirely in
>> Coreutils, or whether we should add some Gnulib support to make it
>> easier to do this sort of lazier linking.
>
> I was wondering if this was worth worrying about at all,
> but it is a significant overhead that's worth improving.
> To quantify the overhead I compared optimized builds,
> with and without the above configure option, giving:
>
> $ time seq 10000 | xargs -I'{}' src/sort /dev/null -k'{}'
> real 0m7.009s
> user 0m3.462s
> sys 0m3.578s
>
> $ time seq 10000 | xargs -I'{}' src/sort-lc /dev/null -k'{}'
> real 0m12.950s
> user 0m3.754s
> sys 0m9.200s
>
>
> So we should do something. Now dlopening libcrypto on demand
> would work, but there may be better solutions.
> sort doesn't have to use md5. It could use blake2 routines
> already in coreutils to avoid the issue (and get some speed ups).
> Alternatively it might use some other hash function.
> For example see the other 128 bit functions compared at:
> https://github.com/Cyan4973/xxHash
>
> BTW there was mention of static linking as an option in this thread.
> That's is an option to provide better speed an isolation for binaries,
> however it's best left to the system builders to use this for their builds.
> There can be security implications for prompt library updating,
> and libcrypto is particularly sensitive in this regard.
Adding coreutils list...
So above we've demonstrated that sort dynamically loading libcrypto
does nearly double the startup time for the process.
Attached is a patch to use the coreutils reference blake2b hash instead
of the optimized libcrypto md5 routines.
$ seq 1000000 > 1.txt
$ time src/sort-md5-lc -R < 1.txt > /dev/null
real 0m6.734s
user 0m23.258s
sys 0m0.047s
$ time src/sort-blake2 -R < 1.txt > /dev/null
real 0m7.215s
user 0m25.683s
sys 0m0.043s
$ grep 'model name' /proc/cpuinfo | head -n1
model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
$ rpm -q openssl-libs
openssl-libs-3.0.9-2.fc38.x86_64
So while this avoids the startup overhead,
the reference blake2 routines are a little less efficient
than the optimized md5 libcrypto routines.
cheers,
Pádraig
[-- Attachment #2: sort-blake2.patch --]
[-- Type: text/x-patch, Size: 5642 bytes --]
From 7b6765771a235849f424cb6a884cb8237de8380d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P@draigBrady.com>
Date: Sun, 8 Oct 2023 21:17:09 +0100
Subject: [PATCH] sort: avoid linking with libcrypto
Use blake2b (128 bit) instead of md5
---
src/blake2/blake2.h | 2 +-
src/local.mk | 10 ++++++----
src/sort.c | 35 +++++++++++++++++++----------------
3 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/src/blake2/blake2.h b/src/blake2/blake2.h
index be8b176d7..73bd171b7 100644
--- a/src/blake2/blake2.h
+++ b/src/blake2/blake2.h
@@ -23,7 +23,7 @@
#ifdef _MSC_VER
# define BLAKE2_PACKED(x) __pragma (pack (push, 1)) x __pragma (pack (pop))
#else
-# define BLAKE2_PACKED(x) x _GL_ATTRIBUTE_PACKED
+# define BLAKE2_PACKED(x) x /* _GL_ATTRIBUTE_PACKED */
#endif
#if defined(__cplusplus)
diff --git a/src/local.mk b/src/local.mk
index ed5d46ddb..35f4d0a35 100644
--- a/src/local.mk
+++ b/src/local.mk
@@ -302,7 +302,6 @@ src_printf_LDADD += $(LIBICONV)
# for libcrypto hash routines
src_md5sum_LDADD += $(LIB_CRYPTO)
-src_sort_LDADD += $(LIB_CRYPTO)
src_sha1sum_LDADD += $(LIB_CRYPTO)
src_sha224sum_LDADD += $(LIB_CRYPTO)
src_sha256sum_LDADD += $(LIB_CRYPTO)
@@ -409,6 +408,9 @@ src_tac_SOURCES = src/tac.c src/temp-stream.c
src_tail_SOURCES = src/tail.c src/iopoll.c
src_tee_SOURCES = src/tee.c src/iopoll.c
+src_sort_CPPFLAGS = -DHASH_ALGO_BLAKE2=1 -DHAVE_CONFIG_H $(AM_CPPFLAGS)
+src_sort_SOURCES = src/sort.c $(src_blake2_SOURCES)
+
src_sum_SOURCES = src/sum.c src/sum.h src/digest.c
src_sum_CPPFLAGS = -DHASH_ALGO_SUM=1 $(AM_CPPFLAGS)
@@ -425,9 +427,9 @@ src_sha384sum_CPPFLAGS = -DHASH_ALGO_SHA384=1 $(AM_CPPFLAGS)
src_sha512sum_SOURCES = src/digest.c
src_sha512sum_CPPFLAGS = -DHASH_ALGO_SHA512=1 $(AM_CPPFLAGS)
src_b2sum_CPPFLAGS = -DHASH_ALGO_BLAKE2=1 -DHAVE_CONFIG_H $(AM_CPPFLAGS)
-src_b2sum_SOURCES = src/digest.c \
- src/blake2/blake2.h src/blake2/blake2-impl.h \
- src/blake2/blake2b-ref.c \
+src_blake2_SOURCES = src/blake2/blake2.h src/blake2/blake2-impl.h \
+ src/blake2/blake2b-ref.c
+src_b2sum_SOURCES = src/digest.c $(src_blake2_SOURCES) \
src/blake2/b2sum.c src/blake2/b2sum.h
src_cksum_SOURCES = $(src_b2sum_SOURCES) src/sum.c src/sum.h \
diff --git a/src/sort.c b/src/sort.c
index 5c86b8332..14c3951ce 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -31,6 +31,7 @@
#include "system.h"
#include "argmatch.h"
#include "assure.h"
+#include "blake2/blake2.h"
#include "fadvise.h"
#include "filevercmp.h"
#include "flexmember.h"
@@ -38,7 +39,6 @@
#include "hash.h"
#include "heap.h"
#include "ignore-value.h"
-#include "md5.h"
#include "mbswidth.h"
#include "nproc.h"
#include "physmem.h"
@@ -2084,23 +2084,24 @@ getmonth (char const *month, char **ea)
return 0;
}
-/* A randomly chosen MD5 state, used for random comparison. */
-static struct md5_ctx random_md5_state;
+/* A randomly chosen state, used for random comparison. */
+static blake2b_state random_hash_state;
+#define RANDOM_HASH_BYTES 16
-/* Initialize the randomly chosen MD5 state. */
+/* Initialize the randomly chosen hash state. */
static void
-random_md5_state_init (char const *random_source)
+random_hash_state_init (char const *random_source)
{
- unsigned char buf[MD5_DIGEST_SIZE];
+ unsigned char buf[RANDOM_HASH_BYTES];
struct randread_source *r = randread_new (random_source, sizeof buf);
if (! r)
sort_die (_("open failed"), random_source ? random_source : "getrandom");
randread (r, buf, sizeof buf);
if (randread_free (r) != 0)
sort_die (_("close failed"), random_source);
- md5_init_ctx (&random_md5_state);
- md5_process_bytes (buf, sizeof buf, &random_md5_state);
+ blake2b_init (&random_hash_state, RANDOM_HASH_BYTES);
+ blake2b_update (&random_hash_state, buf, sizeof buf);
}
/* This is like strxfrm, except it reports any error and exits. */
@@ -2141,9 +2142,9 @@ compare_random (char *restrict texta, size_t lena,
char *buf = stackbuf;
size_t bufsize = sizeof stackbuf;
void *allocated = nullptr;
- uint32_t dig[2][MD5_DIGEST_SIZE / sizeof (uint32_t)];
- struct md5_ctx s[2];
- s[0] = s[1] = random_md5_state;
+ uint32_t dig[2][RANDOM_HASH_BYTES / sizeof (uint32_t)];
+ blake2b_state s[2];
+ s[0] = s[1] = random_hash_state;
if (hard_LC_COLLATE)
{
@@ -2220,8 +2221,8 @@ compare_random (char *restrict texta, size_t lena,
/* Accumulate the transformed data in the corresponding
checksums. */
- md5_process_bytes (buf, sizea, &s[0]);
- md5_process_bytes (buf + sizea, sizeb, &s[1]);
+ blake2b_update (&s[0], buf, sizea);
+ blake2b_update (&s[1], buf + sizea, sizeb);
/* Update the tiebreaker comparison of the transformed data. */
if (! xfrm_diff)
@@ -2234,8 +2235,10 @@ compare_random (char *restrict texta, size_t lena,
}
/* Compute and compare the checksums. */
- md5_process_bytes (texta, lena, &s[0]); md5_finish_ctx (&s[0], dig[0]);
- md5_process_bytes (textb, lenb, &s[1]); md5_finish_ctx (&s[1], dig[1]);
+ blake2b_update (&s[0], texta, lena);
+ blake2b_final (&s[0], dig[0], RANDOM_HASH_BYTES);
+ blake2b_update (&s[1], textb, lenb);
+ blake2b_final (&s[1], dig[1], RANDOM_HASH_BYTES);
int diff = memcmp (dig[0], dig[1], sizeof dig[0]);
/* Fall back on the tiebreaker if the checksums collide. */
@@ -4771,7 +4774,7 @@ main (int argc, char **argv)
reverse = gkey.reverse;
if (need_random)
- random_md5_state_init (random_source);
+ random_hash_state_init (random_source);
if (temp_dir_count == 0)
{
--
2.41.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2023-10-08 20:53 ` sort dynamic linking overhead Pádraig Brady
@ 2023-10-09 13:48 ` Pádraig Brady
2024-02-26 6:25 ` Paul Eggert
0 siblings, 1 reply; 33+ messages in thread
From: Pádraig Brady @ 2023-10-09 13:48 UTC (permalink / raw)
To: Paul Eggert, Bruno Haible, bug-gnulib, Coreutils; +Cc: cyan
[-- Attachment #1: Type: text/plain, Size: 3217 bytes --]
On 08/10/2023 21:53, Pádraig Brady wrote:
> On 08/10/2023 14:36, Pádraig Brady wrote:
>> On 07/10/2023 22:29, Paul Eggert wrote:
>>> On 2023-10-07 04:42, Pádraig Brady wrote:
>>>>
>>>> The auto linking is globally controlled with the --with-openssl
>>>> cofigure option, but you could build sort (and md5sum)
>>>> without that dependency with:
>>>>
>>>> ./configure ac_cv_lib_crypto_MD5=no
>>>
>>> Thanks, I was thinking more along the lines that Bruno suggested, which
>>> to continue to link to libcrypto, but do it with dlopen/dlsym in 'sort'
>>> only when need_random is true.
>>>
>>> It's not clear to me offhand whether this should be done entirely in
>>> Coreutils, or whether we should add some Gnulib support to make it
>>> easier to do this sort of lazier linking.
>>
>> I was wondering if this was worth worrying about at all,
>> but it is a significant overhead that's worth improving.
>> To quantify the overhead I compared optimized builds,
>> with and without the above configure option, giving:
>>
>> $ time seq 10000 | xargs -I'{}' src/sort /dev/null -k'{}'
>> real 0m7.009s
>> user 0m3.462s
>> sys 0m3.578s
>>
>> $ time seq 10000 | xargs -I'{}' src/sort-lc /dev/null -k'{}'
>> real 0m12.950s
>> user 0m3.754s
>> sys 0m9.200s
>>
>>
>> So we should do something. Now dlopening libcrypto on demand
>> would work, but there may be better solutions.
>> sort doesn't have to use md5. It could use blake2 routines
>> already in coreutils to avoid the issue (and get some speed ups).
>> Alternatively it might use some other hash function.
>> For example see the other 128 bit functions compared at:
>> https://github.com/Cyan4973/xxHash
>>
>> BTW there was mention of static linking as an option in this thread.
>> That's is an option to provide better speed an isolation for binaries,
>> however it's best left to the system builders to use this for their builds.
>> There can be security implications for prompt library updating,
>> and libcrypto is particularly sensitive in this regard.
>
> Adding coreutils list...
>
> So above we've demonstrated that sort dynamically loading libcrypto
> does nearly double the startup time for the process.
>
> Attached is a patch to use the coreutils reference blake2b hash instead
> of the optimized libcrypto md5 routines.
>
> $ seq 1000000 > 1.txt
>
> $ time src/sort-md5-lc -R < 1.txt > /dev/null
> real 0m6.734s
> user 0m23.258s
> sys 0m0.047s
>
> $ time src/sort-blake2 -R < 1.txt > /dev/null
> real 0m7.215s
> user 0m25.683s
> sys 0m0.043s
>
> $ grep 'model name' /proc/cpuinfo | head -n1
> model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
> $ rpm -q openssl-libs
> openssl-libs-3.0.9-2.fc38.x86_64
>
> So while this avoids the startup overhead,
> the reference blake2 routines are a little less efficient
> than the optimized md5 libcrypto routines.
An incremental patch attached to use xxhash128 (0.8.2)
shows a good improvement (note avx2 being used on this cpu):
$ time src/sort-xxh -R < 1.txt > /dev/null
real 0m4.111s
user 0m14.429s
sys 0m0.058s
I'm not sure how best to avail of it though.
Perhaps embed, or maybe link statically if available?
cheers,
Pádraig
[-- Attachment #2: sort-xxhash.diff --]
[-- Type: text/x-patch, Size: 3911 bytes --]
From 912c5d139c24bf0ad5aa5459bd02506be30ab206 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P@draigBrady.com>
Date: Mon, 9 Oct 2023 14:45:01 +0100
Subject: [PATCH] sort: use XXH128 rather than blake2b
$ seq 1000000 > 1.txt
$ time src/sort-md5-lc -R < 1.txt > /dev/null
real 0m6.734s
user 0m23.258s
sys 0m0.047s
$ time src/sort-blake2 -R < 1.txt > /dev/null
real 0m7.215s
user 0m25.683s
sys 0m0.043s
Using xxhash128 (0.8.2) shows a good improvement
(note avx2 being used on this cpu):
$ time src/sort-xxh -R < 1.txt > /dev/null
real 0m4.111s
user 0m14.429s
sys 0m0.058s
$ grep 'model name' /proc/cpuinfo | head -n1
model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
$ rpm -q openssl-libs
openssl-libs-3.0.9-2.fc38.x86_64
* src/sort.c: Use xxh128 routines rather than blake2b.
* src/xxhash.h: Include xxhash appropriately.
---
src/sort.c | 26 +++++++++++++-------------
src/xxhash.h | 3 +++
2 files changed, 16 insertions(+), 13 deletions(-)
create mode 100644 src/xxhash.h
diff --git a/src/sort.c b/src/sort.c
index 14c3951ce..5f04d6921 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -31,7 +31,7 @@
#include "system.h"
#include "argmatch.h"
#include "assure.h"
-#include "blake2/blake2.h"
+#include "xxhash.h"
#include "fadvise.h"
#include "filevercmp.h"
#include "flexmember.h"
@@ -2085,7 +2085,7 @@ getmonth (char const *month, char **ea)
}
/* A randomly chosen state, used for random comparison. */
-static blake2b_state random_hash_state;
+static XXH3_state_t random_hash_state;
#define RANDOM_HASH_BYTES 16
/* Initialize the randomly chosen hash state. */
@@ -2100,8 +2100,8 @@ random_hash_state_init (char const *random_source)
randread (r, buf, sizeof buf);
if (randread_free (r) != 0)
sort_die (_("close failed"), random_source);
- blake2b_init (&random_hash_state, RANDOM_HASH_BYTES);
- blake2b_update (&random_hash_state, buf, sizeof buf);
+ (void)XXH3_128bits_reset(&random_hash_state);
+ (void)XXH3_128bits_update(&random_hash_state, buf, sizeof buf);
}
/* This is like strxfrm, except it reports any error and exits. */
@@ -2142,8 +2142,8 @@ compare_random (char *restrict texta, size_t lena,
char *buf = stackbuf;
size_t bufsize = sizeof stackbuf;
void *allocated = nullptr;
- uint32_t dig[2][RANDOM_HASH_BYTES / sizeof (uint32_t)];
- blake2b_state s[2];
+ XXH128_hash_t dig[2];
+ XXH3_state_t s[2];
s[0] = s[1] = random_hash_state;
if (hard_LC_COLLATE)
@@ -2221,8 +2221,8 @@ compare_random (char *restrict texta, size_t lena,
/* Accumulate the transformed data in the corresponding
checksums. */
- blake2b_update (&s[0], buf, sizea);
- blake2b_update (&s[1], buf + sizea, sizeb);
+ (void)XXH3_128bits_update (&s[0], buf, sizea);
+ (void)XXH3_128bits_update (&s[1], buf + sizea, sizeb);
/* Update the tiebreaker comparison of the transformed data. */
if (! xfrm_diff)
@@ -2235,11 +2235,11 @@ compare_random (char *restrict texta, size_t lena,
}
/* Compute and compare the checksums. */
- blake2b_update (&s[0], texta, lena);
- blake2b_final (&s[0], dig[0], RANDOM_HASH_BYTES);
- blake2b_update (&s[1], textb, lenb);
- blake2b_final (&s[1], dig[1], RANDOM_HASH_BYTES);
- int diff = memcmp (dig[0], dig[1], sizeof dig[0]);
+ (void)XXH3_128bits_update (&s[0], texta, lena);
+ dig[0] = XXH3_128bits_digest(&s[0]);
+ (void)XXH3_128bits_update (&s[1], textb, lenb);
+ dig[1] = XXH3_128bits_digest(&s[1]);
+ int diff = XXH128_cmp (&dig[0], &dig[1]);
/* Fall back on the tiebreaker if the checksums collide. */
if (! diff)
diff --git a/src/xxhash.h b/src/xxhash.h
new file mode 100644
index 000000000..2802f6917
--- /dev/null
+++ b/src/xxhash.h
@@ -0,0 +1,3 @@
+#define XXH_INLINE_ALL
+#define XXH_STATIC_LINKING_ONLY
+#include "xxHash/xxhash.h"
--
2.41.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2023-10-09 13:48 ` Pádraig Brady
@ 2024-02-26 6:25 ` Paul Eggert
2024-02-26 6:44 ` Yann Collet
` (2 more replies)
0 siblings, 3 replies; 33+ messages in thread
From: Paul Eggert @ 2024-02-26 6:25 UTC (permalink / raw)
To: Pádraig Brady, Bruno Haible, bug-gnulib, Coreutils; +Cc: cyan
[-- Attachment #1: Type: text/plain, Size: 1269 bytes --]
On 2023-10-09 06:48, Pádraig Brady wrote:
> An incremental patch attached to use xxhash128 (0.8.2)
> shows a good improvement (note avx2 being used on this cpu):
xxhash128 is not a cryptographic hash function, so it doesn't attempt to
be random. Of course most people won't care - it's random "enough" - but
it would be a functionality change.
blake2 is cryptographic and would be random, but would bloat the 'sort'
executable with code that's hardly ever used.
To attack the problem in a more conservative way, I installed the
attached patch into coreutils. With it, 'sort -R' continues to use MD5
but on GNUish platforms 'sort' links libcrypto dynamically only if -R is
used (Bruno's suggestion). This doesn't significantly affect 'sort -R'
performance, and reduces the startup overhead of plain 'sort' to be what
it was before we started passing -lcrypto to gcc by default (in
coreutils 8.32).
I also toyed with changing MD5 to SHA512, but that hurt performance. For
what it's worth, although I tested with an Intel Xeon W-1350, which
supports SHA-NI as well as various AVX-512 options, I didn't see where
libcrypto (at least on Ubuntu 23.10, which has OpenSSL 3.0.10) takes
advantage of these special-purpose instructions.
[-- Attachment #2: 0001-sort-dynamically-link-lcrypto-if-R.patch --]
[-- Type: text/x-patch, Size: 4556 bytes --]
From 7f57ac2d20c144242953a8dc7d95b02df0244751 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 25 Feb 2024 17:13:12 -0800
Subject: [PATCH] sort: dynamically link -lcrypto if -R
This saves time in the usual case, which does not need -lcrypto.
* configure.ac (DLOPEN_LIBCRYPTO): New macro.
* src/sort.c [DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5]: New macros
MD5_Init, MD5_Update, MD5_Final. Include "md5.h" after defining
them. Include <dlfcn.h>, and define new functions link_failure
and symbol_address.
(link_libcrypto): New function.
(random_md5_state_init): Call it before using crypto functions.
---
configure.ac | 29 +++++++++++++++++++++++++++++
src/sort.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 80 insertions(+), 1 deletion(-)
diff --git a/configure.ac b/configure.ac
index c7eca1b8d..043081b90 100644
--- a/configure.ac
+++ b/configure.ac
@@ -351,6 +351,35 @@ if test $utils_cv_localtime_cache = yes; then
AC_DEFINE([LOCALTIME_CACHE], [1], [FIXME])
fi
+# Should 'sort' link libcrypto dynamically?
+AS_CASE([$LIB_CRYPTO],
+ [-lcrypto],
+ [# Check for dlopen and libcrypto dynamic linking in one program,
+ # as there's little point to checking them separately.
+ AC_CACHE_CHECK([for dlopen and whether libcrypto is linked dynamically],
+ [utils_cv_dlopen_libcrypto],
+ [utils_cv_dlopen_libcrypto=no
+ saved_LIBS=$LIBS
+ LIBS="$LIBS $LIB_CRYPTO"
+ AC_LINK_IFELSE(
+ [AC_LANG_PROGRAM(
+ [[#include <dlfcn.h>
+ #include <openssl/sha.h>
+ ]],
+ [[return !(dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL)
+ && SHA512 (0, 0, 0));]])],
+ [# readelf works with cross-builds; ldd works on more platforms.
+ AS_CASE([`(readelf -d conftest$EXEEXT || ldd conftest$EXEEXT
+ ) 2>/dev/null`],
+ [*libcrypto*],
+ [utils_cv_dlopen_libcrypto=yes])])
+ LIBS=$saved_LIBS])
+ AS_CASE([$utils_cv_dlopen_libcrypto],
+ [yes],
+ [AC_DEFINE([DLOPEN_LIBCRYPTO], [1],
+ [Define to 1 if dlopen exists and libcrypto is
+ linked dynamically.])])])
+
# macOS >= 10.12
AC_CHECK_FUNCS([fclonefileat])
diff --git a/src/sort.c b/src/sort.c
index dea7be45d..cefe381bf 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -39,7 +39,6 @@
#include "hash.h"
#include "heap.h"
#include "ignore-value.h"
-#include "md5.h"
#include "mbswidth.h"
#include "nproc.h"
#include "physmem.h"
@@ -2085,6 +2084,56 @@ getmonth (char const *month, char **ea)
return 0;
}
+/* When using the OpenSSL implementation, dynamically link only if -R.
+ This saves startup time in the usual (sans -R) case. */
+
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+/* In the typical case where md5.h does not #undef HAVE_OPENSSL_MD5,
+ trick md5.h into declaring and using pointers to functions not functions.
+ This causes the compiler's -lcrypto option to have no effect,
+ as sort.o no longer uses any crypto symbols statically. */
+# define MD5_Init (*ptr_MD5_Init)
+# define MD5_Update (*ptr_MD5_Update)
+# define MD5_Final (*ptr_MD5_Final)
+#endif
+
+#include "md5.h"
+
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+# include <dlfcn.h>
+
+/* Diagnose a dynamic linking failure. */
+static void
+link_failure (void)
+{
+ error (SORT_FAILURE, 0, "%s", dlerror ());
+}
+
+/* Return a function pointer in HANDLE for SYMBOL. */
+static void *
+symbol_address (void *handle, char const *symbol)
+{
+ void *address = dlsym (handle, symbol);
+ if (!address)
+ link_failure ();
+ return address;
+}
+#endif
+
+/* Dynamically link the crypto library, if it needs linking. */
+static void
+link_libcrypto (void)
+{
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+ void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);
+ if (!handle)
+ link_failure ();
+ ptr_MD5_Init = symbol_address (handle, "MD5_Init");
+ ptr_MD5_Update = symbol_address (handle, "MD5_Update");
+ ptr_MD5_Final = symbol_address (handle, "MD5_Final");
+#endif
+}
+
/* A randomly chosen MD5 state, used for random comparison. */
static struct md5_ctx random_md5_state;
@@ -2100,6 +2149,7 @@ random_md5_state_init (char const *random_source)
randread (r, buf, sizeof buf);
if (randread_free (r) != 0)
sort_die (_("close failed"), random_source);
+ link_libcrypto ();
md5_init_ctx (&random_md5_state);
md5_process_bytes (buf, sizeof buf, &random_md5_state);
}
--
2.40.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 6:25 ` Paul Eggert
@ 2024-02-26 6:44 ` Yann Collet
2024-02-26 14:12 ` Pádraig Brady
2024-02-26 17:11 ` Andreas Schwab
2024-02-27 11:15 ` Bruno Haible
2 siblings, 1 reply; 33+ messages in thread
From: Yann Collet @ 2024-02-26 6:44 UTC (permalink / raw)
To: Paul Eggert, Pádraig Brady, Bruno Haible, bug-gnulib@gnu.org,
Coreutils
[-- Attachment #1: Type: text/plain, Size: 2269 bytes --]
* xxhash128 is not a cryptographic hash function, so it doesn't attempt to be random.
Just a correction : xxh128 does try to be random. And quite hardly: a significant amount of development is spent on ensuring this property.
It’s even tested with PractRand, and it could be used as a good random number generator.
Being non-cryptographic means that what it doesn’t try is to make sure no one can intentionally forge a hash collision from 2 different files (other than brute-forcing, which is impractical).
But that’s different, and I wouldn’t call this property “randomness”, even though randomness is a pre-requisite (but not sufficient in itself) to collision resistance.
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sunday, February 25, 2024 at 10:25 PM
To: Pádraig Brady <P@draigBrady.com>, Bruno Haible <bruno@clisp.org>, bug-gnulib@gnu.org <bug-gnulib@gnu.org>, Coreutils <coreutils@gnu.org>
Cc: Yann Collet <cyan@meta.com>
Subject: Re: sort dynamic linking overhead
On 2023-10-09 06:48, Pádraig Brady wrote:
> An incremental patch attached to use xxhash128 (0.8.2)
> shows a good improvement (note avx2 being used on this cpu):
xxhash128 is not a cryptographic hash function, so it doesn't attempt to
be random. Of course most people won't care - it's random "enough" - but
it would be a functionality change.
blake2 is cryptographic and would be random, but would bloat the 'sort'
executable with code that's hardly ever used.
To attack the problem in a more conservative way, I installed the
attached patch into coreutils. With it, 'sort -R' continues to use MD5
but on GNUish platforms 'sort' links libcrypto dynamically only if -R is
used (Bruno's suggestion). This doesn't significantly affect 'sort -R'
performance, and reduces the startup overhead of plain 'sort' to be what
it was before we started passing -lcrypto to gcc by default (in
coreutils 8.32).
I also toyed with changing MD5 to SHA512, but that hurt performance. For
what it's worth, although I tested with an Intel Xeon W-1350, which
supports SHA-NI as well as various AVX-512 options, I didn't see where
libcrypto (at least on Ubuntu 23.10, which has OpenSSL 3.0.10) takes
advantage of these special-purpose instructions.
[-- Attachment #2: Type: text/html, Size: 7680 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 6:44 ` Yann Collet
@ 2024-02-26 14:12 ` Pádraig Brady
2024-02-26 18:54 ` Paul Eggert
0 siblings, 1 reply; 33+ messages in thread
From: Pádraig Brady @ 2024-02-26 14:12 UTC (permalink / raw)
To: Yann Collet, Paul Eggert, Bruno Haible, bug-gnulib@gnu.org,
Coreutils
On 26/02/2024 06:44, Yann Collet wrote:
> * xxhash128 is not a cryptographic hash function, so it doesn't attempt tobe random.
>
> Just a correction : xxh128 does try to be random. And quite hardly: a significant amount of development is spent on ensuring this property.
>
> It’s even tested with PractRand, and it could be used as a good random number generator.
>
> Being non-cryptographic means that what it doesn’t try is to make sure no one can intentionally forge a hash collision from 2 different files (other than brute-forcing, which is impractical).
>
> But that’s different, and I wouldn’t call this property “randomness”, even though randomness is a pre-requisite (but not sufficient in itself) to collision resistance.
Right. I was looking at both md5 and xxhash128 having a 10 quality score in the SMHasher metric.
I even saw a comment from you Yann that xxhas128 may have slightly better dispersion than md5.
Also md5 shouldn't be considered as cryptographic anyway since it's broken.
I.e. I don't think users would need to be informed of this change if made.
Re Paul's committed patch, it's a good improvement, and does not add a new (xxhash) dependency.
Paul, should the configure check, be testing for the MD5 routines rather than SHA512?
Also an entry in the Improvements section of NEWS would be appropriate.
thanks!
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 6:25 ` Paul Eggert
2024-02-26 6:44 ` Yann Collet
@ 2024-02-26 17:11 ` Andreas Schwab
2024-02-26 17:37 ` Pádraig Brady
2024-02-27 11:15 ` Bruno Haible
2 siblings, 1 reply; 33+ messages in thread
From: Andreas Schwab @ 2024-02-26 17:11 UTC (permalink / raw)
To: Paul Eggert; +Cc: Pádraig Brady, Bruno Haible, bug-gnulib, Coreutils, cyan
On Feb 25 2024, Paul Eggert wrote:
> +/* Dynamically link the crypto library, if it needs linking. */
> +static void
> +link_libcrypto (void)
> +{
> +#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
> + void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);
That only works if libopenssl-devel is installed.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 17:11 ` Andreas Schwab
@ 2024-02-26 17:37 ` Pádraig Brady
2024-02-26 17:39 ` Bruno Haible
2024-02-26 18:06 ` Andreas Schwab
0 siblings, 2 replies; 33+ messages in thread
From: Pádraig Brady @ 2024-02-26 17:37 UTC (permalink / raw)
To: Andreas Schwab, Paul Eggert; +Cc: Bruno Haible, bug-gnulib, Coreutils, cyan
On 26/02/2024 17:11, Andreas Schwab wrote:
> On Feb 25 2024, Paul Eggert wrote:
>
>> +/* Dynamically link the crypto library, if it needs linking. */
>> +static void
>> +link_libcrypto (void)
>> +{
>> +#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
>> + void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);
>
> That only works if libopenssl-devel is installed.
Good spot.
I'd already pushed a fix for this at:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938
thanks,
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 17:37 ` Pádraig Brady
@ 2024-02-26 17:39 ` Bruno Haible
2024-02-26 17:43 ` Pádraig Brady
2024-02-26 18:06 ` Andreas Schwab
1 sibling, 1 reply; 33+ messages in thread
From: Bruno Haible @ 2024-02-26 17:39 UTC (permalink / raw)
To: Andreas Schwab, Paul Eggert, Pádraig Brady
Cc: bug-gnulib, Coreutils, cyan
Pádraig Brady wrote:
> >> + void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);
> >
> > That only works if libopenssl-devel is installed.
>
> Good spot.
> I'd already pushed a fix for this at:
> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938
Does this work for all the various names of libcrypto in various distros?
Debian 12 libcrypto.so.3
Ubuntu 22.04 libcrypto.so.1.1 libcrypto.so.3
Slackware 15 libcrypto.so.1.1
openSUSE 15.5 libcrypto.so.1.1
CentOS Stream 9 libcrypto.so.3
Guix 1.4 libcrypto.so.1.1
Alpine 3.19 libcrypto.so.3
FreeBSD 14.0 libcrypto.so.38
NetBSD 9.3 libcrypto.so.14
OpenBSD 7.4 libcrypto.so.52.0
Bruno
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 17:39 ` Bruno Haible
@ 2024-02-26 17:43 ` Pádraig Brady
2024-02-27 21:36 ` Bruno Haible
0 siblings, 1 reply; 33+ messages in thread
From: Pádraig Brady @ 2024-02-26 17:43 UTC (permalink / raw)
To: Bruno Haible, Andreas Schwab, Paul Eggert; +Cc: bug-gnulib, Coreutils, cyan
On 26/02/2024 17:39, Bruno Haible wrote:
> Pádraig Brady wrote:
>>>> + void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);
>>>
>>> That only works if libopenssl-devel is installed.
>>
>> Good spot.
>> I'd already pushed a fix for this at:
>> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938
>
> Does this work for all the various names of libcrypto in various distros?
>
> Debian 12 libcrypto.so.3
> Ubuntu 22.04 libcrypto.so.1.1 libcrypto.so.3
> Slackware 15 libcrypto.so.1.1
> openSUSE 15.5 libcrypto.so.1.1
> CentOS Stream 9 libcrypto.so.3
> Guix 1.4 libcrypto.so.1.1
> Alpine 3.19 libcrypto.so.3
> FreeBSD 14.0 libcrypto.so.38
> NetBSD 9.3 libcrypto.so.14
> OpenBSD 7.4 libcrypto.so.52.0
>
I only tested with libcrypto.so.3, but it should match all of the above.
It matches libcrypto.so.[.0-9]*
cheers,
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 17:37 ` Pádraig Brady
2024-02-26 17:39 ` Bruno Haible
@ 2024-02-26 18:06 ` Andreas Schwab
2024-02-26 19:13 ` Pádraig Brady
1 sibling, 1 reply; 33+ messages in thread
From: Andreas Schwab @ 2024-02-26 18:06 UTC (permalink / raw)
To: Pádraig Brady; +Cc: Paul Eggert, Bruno Haible, bug-gnulib, Coreutils, cyan
On Feb 26 2024, Pádraig Brady wrote:
> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938
It's still bad as it adds a hidden dependency that is invisible to rpm.
Also, the regexp in the sed command contains unquoted uses of '.' that
are supposed to be matched literally.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 14:12 ` Pádraig Brady
@ 2024-02-26 18:54 ` Paul Eggert
0 siblings, 0 replies; 33+ messages in thread
From: Paul Eggert @ 2024-02-26 18:54 UTC (permalink / raw)
To: Pádraig Brady, Yann Collet, Bruno Haible, bug-gnulib@gnu.org,
Coreutils
On 2024-02-26 06:12, Pádraig Brady wrote:
> On 26/02/2024 06:44, Yann Collet wrote:
>> * xxhash128 is not a cryptographic hash function, so it doesn't
>> attempt tobe random.
>>
>> Just a correction : xxh128 does try to be random. And quite hardly: a
>> significant amount of development is spent on ensuring this property.
>>
>> It’s even tested with PractRand, and it could be used as a good random
>> number generator.
>>
>> Being non-cryptographic means that what it doesn’t try is to make sure
>> no one can intentionally forge a hash collision from 2 different files
>> (other than brute-forcing, which is impractical).
>>
>> But that’s different, and I wouldn’t call this property “randomness”,
>> even though randomness is a pre-requisite (but not sufficient in
>> itself) to collision resistance.
Fair enough, I should have called it a different name since it's not
really random. However, 'sort -R' does have problems when hashes have
collisions, as it falls back on ordinary comparisons and thus ceases to
be a "random" sort, so collision resistance is a good property to have
if 'sort -R' is given adversarial input.
> md5 shouldn't be considered as cryptographic anyway since it's broken.
Although MD5 is broken as a hash for a string, it's not clear to me that
it's broken in the way that GNU 'sort -R' uses MD5. GNU 'sort -R' uses a
random salt, does not report hashes to the user, and does not output
anything until it's read all its input. It seems to me that breaking gnu
'sort -R' would be harder than breaking HMAC-MD5, which itself is
significantly harder than breaking MD5.
That being said, if MD5 is broken for GNU 'sort' then we should use a
better hash.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 18:06 ` Andreas Schwab
@ 2024-02-26 19:13 ` Pádraig Brady
0 siblings, 0 replies; 33+ messages in thread
From: Pádraig Brady @ 2024-02-26 19:13 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Coreutils, bug-gnulib
On 26/02/2024 18:06, Andreas Schwab wrote:
> On Feb 26 2024, Pádraig Brady wrote:
>
>> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938
>
> It's still bad as it adds a hidden dependency that is invisible to rpm.
Right. In practice though since coreutils also links libcrypto
for cksum and the separate digest utilities, this should be OK.
In edge cases with a separate package per util, the dependency
can be added manually.
> Also, the regexp in the sed command contains unquoted uses of '.' that
> are supposed to be matched literally.
Fixed.
thanks,
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 6:25 ` Paul Eggert
2024-02-26 6:44 ` Yann Collet
2024-02-26 17:11 ` Andreas Schwab
@ 2024-02-27 11:15 ` Bruno Haible
2024-02-27 14:14 ` Pádraig Brady
2024-02-27 18:31 ` Paul Eggert
2 siblings, 2 replies; 33+ messages in thread
From: Bruno Haible @ 2024-02-27 11:15 UTC (permalink / raw)
To: Pádraig Brady, bug-gnulib, coreutils, Paul Eggert; +Cc: cyan
[-- Attachment #1: Type: text/plain, Size: 1755 bytes --]
Paul Eggert wrote:
> I installed the
> attached patch into coreutils. With it, 'sort -R' continues to use MD5
> but on GNUish platforms 'sort' links libcrypto dynamically only if -R is
> used (Bruno's suggestion). This doesn't significantly affect 'sort -R'
> performance, and reduces the startup overhead of plain 'sort'
The patch has no effect on openSUSE 15.5: DLOPEN_LIBCRYPTO is not defined
in config.h.
config.log shows a link error of the test program:
undefined reference to symbol 'dlopen@@GLIBC_2.2.5'
Both the test program and 'sort' need to link with -ldl.
This is generally true on the following platforms:
- glibc < 2.34,
- Android (with some compilers, not with Termux (*)).
(*) In the Android Termux environment, the compiler is configured to pass
the options '-ldl -lc', rather than just '-lc', to the linker. Thus,
we don't need to pass '-ldl' to the compiler. But the 'sort' program will
be linked with -ldl. In other Android environments, things may be different,
though.
The proposed attached patch fixes the problem: It defines a variable LIB_DL
that contains '-ldl' where needed or '' if not needed, and uses it with
the test program and with 'sort'.
You might think that this patch is no win, because it trades one link
dependency for another link dependency? But that's not what it does:
'ldd' shows that without the patch, 'sort' loads the libraries
linux-vdso.so.1
libcrypto.so.3
libpthread.so.0
libc.so.6
libz.so.1
libdl.so.2
/lib64/ld-linux-x86-64.so.2
and with the patch, 'sort' loads the libraries
linux-vdso.so.1
libdl.so.2
libpthread.so.0
libc.so.6
/lib64/ld-linux-x86-64.so.2
— that is, libdl.so.2 is getting loaded anyway.
[-- Attachment #2: 0001-sort-Make-the-startup-time-optimization-effective-on.patch --]
[-- Type: text/x-patch, Size: 1789 bytes --]
From c30a0e55c95e0ae7062ee2ececf85cd0dbfe49fb Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Tue, 27 Feb 2024 12:12:59 +0100
Subject: [PATCH] sort: Make the startup time optimization effective on glibc <
2.34
* configure.ac: Test where to find the dlopen function. Set LIB_DL.
Use it in the DLOPEN_LIBCRYPTO test.
* src/local.mk (src_sort_LDADD): Add $(LIB_DL).
---
configure.ac | 11 ++++++++++-
src/local.mk | 2 +-
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/configure.ac b/configure.ac
index fe8408a06..248e30ca2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -351,6 +351,15 @@ if test $utils_cv_localtime_cache = yes; then
AC_DEFINE([LOCALTIME_CACHE], [1], [FIXME])
fi
+# Find the library for dynamic loading of shared libraries.
+AC_SEARCH_LIBS([dlopen], [dl])
+AS_CASE([$ac_cv_search_dlopen],
+ [no | 'none required'],
+ [LIB_DL=],
+ [*],
+ [LIB_DL="$ac_cv_search_dlopen"])
+AC_SUBST([LIB_DL])
+
# Should 'sort' link libcrypto dynamically?
AS_CASE([$LIB_CRYPTO],
[-lcrypto],
@@ -360,7 +369,7 @@ AS_CASE([$LIB_CRYPTO],
[utils_cv_dlopen_libcrypto],
[utils_cv_dlopen_libcrypto=no
saved_LIBS=$LIBS
- LIBS="$LIBS $LIB_CRYPTO"
+ LIBS="$LIBS $LIB_DL $LIB_CRYPTO"
AC_LINK_IFELSE(
[AC_LANG_PROGRAM(
[[#include <dlfcn.h>
diff --git a/src/local.mk b/src/local.mk
index 7bc5ba5bc..96ee941ca 100644
--- a/src/local.mk
+++ b/src/local.mk
@@ -304,7 +304,7 @@ src_printf_LDADD += $(LIBICONV)
# for libcrypto hash routines
src_md5sum_LDADD += $(LIB_CRYPTO)
-src_sort_LDADD += $(LIB_CRYPTO)
+src_sort_LDADD += $(LIB_DL) $(LIB_CRYPTO)
src_sha1sum_LDADD += $(LIB_CRYPTO)
src_sha224sum_LDADD += $(LIB_CRYPTO)
src_sha256sum_LDADD += $(LIB_CRYPTO)
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-27 11:15 ` Bruno Haible
@ 2024-02-27 14:14 ` Pádraig Brady
2024-02-27 18:31 ` Paul Eggert
1 sibling, 0 replies; 33+ messages in thread
From: Pádraig Brady @ 2024-02-27 14:14 UTC (permalink / raw)
To: Bruno Haible, bug-gnulib, coreutils, Paul Eggert; +Cc: cyan
On 27/02/2024 11:15, Bruno Haible wrote:
> Paul Eggert wrote:
>> I installed the
>> attached patch into coreutils. With it, 'sort -R' continues to use MD5
>> but on GNUish platforms 'sort' links libcrypto dynamically only if -R is
>> used (Bruno's suggestion). This doesn't significantly affect 'sort -R'
>> performance, and reduces the startup overhead of plain 'sort'
>
> The patch has no effect on openSUSE 15.5: DLOPEN_LIBCRYPTO is not defined
> in config.h.
>
> config.log shows a link error of the test program:
> undefined reference to symbol 'dlopen@@GLIBC_2.2.5'
> Both the test program and 'sort' need to link with -ldl.
>
> This is generally true on the following platforms:
> - glibc < 2.34,
> - Android (with some compilers, not with Termux (*)).
>
> (*) In the Android Termux environment, the compiler is configured to pass
> the options '-ldl -lc', rather than just '-lc', to the linker. Thus,
> we don't need to pass '-ldl' to the compiler. But the 'sort' program will
> be linked with -ldl. In other Android environments, things may be different,
> though.
>
> The proposed attached patch fixes the problem: It defines a variable LIB_DL
> that contains '-ldl' where needed or '' if not needed, and uses it with
> the test program and with 'sort'.
>
> You might think that this patch is no win, because it trades one link
> dependency for another link dependency? But that's not what it does:
> 'ldd' shows that without the patch, 'sort' loads the libraries
>
> linux-vdso.so.1
> libcrypto.so.3
> libpthread.so.0
> libc.so.6
> libz.so.1
> libdl.so.2
> /lib64/ld-linux-x86-64.so.2
>
> and with the patch, 'sort' loads the libraries
>
> linux-vdso.so.1
> libdl.so.2
> libpthread.so.0
> libc.so.6
> /lib64/ld-linux-x86-64.so.2
>
> — that is, libdl.so.2 is getting loaded anyway.
Applied.
thanks,
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-27 11:15 ` Bruno Haible
2024-02-27 14:14 ` Pádraig Brady
@ 2024-02-27 18:31 ` Paul Eggert
2024-02-27 19:01 ` Bruno Haible
1 sibling, 1 reply; 33+ messages in thread
From: Paul Eggert @ 2024-02-27 18:31 UTC (permalink / raw)
To: Bruno Haible, Pádraig Brady, bug-gnulib, coreutils; +Cc: cyan
Thanks for the patch. I was hoping that we didn't need to worry about
older platforms needing -ldl. Oh well.
The patch causes 'configure' to search for dlopen even when there's no
crypto library. 'configure' could instead use AC_SEARCH_LIBS only if the
AC_LINK_IFELSE fails (or simply put AC_LINK_ELSE in an 'for LIB_DL in ""
-ldl' loop). But perhaps it's better to leave things be, in case we ever
need dlopen for something else.
Also, if I understand things correctly, with this patch it's
theoretically possible to pass -ldl to gcc even when 'sort' doesn't do
dynamic linking, and we could complicate configure.ac further to omit
-ldl in that case. But I doubt whether it's worth worrying about this.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-27 18:31 ` Paul Eggert
@ 2024-02-27 19:01 ` Bruno Haible
0 siblings, 0 replies; 33+ messages in thread
From: Bruno Haible @ 2024-02-27 19:01 UTC (permalink / raw)
To: Pádraig Brady, bug-gnulib, coreutils, Paul Eggert; +Cc: cyan
Paul Eggert wrote:
> Thanks for the patch. I was hoping that we didn't need to worry about
> older platforms needing -ldl. Oh well.
Distros with glibc < 2.34 include:
Debian 11
CentOS 8 stream
Fedora 34
openSUSE 15.5
Slackware 15
These are not obsolete, so far.
> The patch causes 'configure' to search for dlopen even when there's no
> crypto library. 'configure' could instead use AC_SEARCH_LIBS only if the
> AC_LINK_IFELSE fails (or simply put AC_LINK_ELSE in an 'for LIB_DL in ""
> -ldl' loop). But perhaps it's better to leave things be, in case we ever
> need dlopen for something else.
If we were to start optimizing configure.ac scripts like this, they would
quickly become hard to maintain, because here and there we would be
accessing a shell variable that has not been initialized (because the
initialization was in an 'if' branch). (There's a reason why AC_REQUIRE
has been invented...)
The configure tests which nag me the most are those which take one second
of time or more: the fcntl test, the sleep test, the strstr test, and a few
others. If someone is starting to optimize, here would be the starting point :)
> Also, if I understand things correctly, with this patch it's
> theoretically possible to pass -ldl to gcc even when 'sort' doesn't do
> dynamic linking, and we could complicate configure.ac further to omit
> -ldl in that case. But I doubt whether it's worth worrying about this.
Correct, but this applies only to glibc and Android systems, and on these
systems configure finds
checking for C compiler flag to ignore unused libraries... -Wl,--as-needed
So, if $(LIB_DL) is -ldl and this linker option gets used for 'sort', but
'sort' does not use dlopen(), the linker will eliminate it.
Bruno
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-26 17:43 ` Pádraig Brady
@ 2024-02-27 21:36 ` Bruno Haible
2024-02-28 14:30 ` Pádraig Brady
0 siblings, 1 reply; 33+ messages in thread
From: Bruno Haible @ 2024-02-27 21:36 UTC (permalink / raw)
To: Andreas Schwab, Paul Eggert, Pádraig Brady
Cc: bug-gnulib, coreutils, cyan
Pádraig Brady wrote:
> > Does this work for all the various names of libcrypto in various distros?
> >
> > Debian 12 libcrypto.so.3
> > Ubuntu 22.04 libcrypto.so.1.1 libcrypto.so.3
> > Slackware 15 libcrypto.so.1.1
> > openSUSE 15.5 libcrypto.so.1.1
> > CentOS Stream 9 libcrypto.so.3
> > Guix 1.4 libcrypto.so.1.1
> > Alpine 3.19 libcrypto.so.3
> > FreeBSD 14.0 libcrypto.so.38
> > NetBSD 9.3 libcrypto.so.14
> > OpenBSD 7.4 libcrypto.so.52.0
> >
>
> I only tested with libcrypto.so.3, but it should match all of the above.
> It matches libcrypto.so.[.0-9]*
Here are my testing results (with the LIB_DL fix):
* On some machines, I had to install the packages with the <openssl/*>
header files first:
- Debian 12:
# apt install libssl-dev
- openSUSE 15.5:
YaST software > install libopenssl-3-devel
- Slackware 15:
Download and unpack the openssl-1.1.1w binary packages
- Alpine Linux 3.19:
# apk add openssl openssl-dev
* On some platforms, the configure test gave
checking whether openssl is GPL compatible... no
due to the <openssl/*> header files being absent:
- Guix 1.4
- macOS 12.5
- Cygwin 2.9.0
* On some platforms, the configure test gave
checking whether openssl is GPL compatible... no
because the openssl version is not >= 3.
- Slackware 15 (has libcrypto.so.1.1)
- NetBSD 9.3 (has libcrypto.so.14)
- OpenBSD 7.4 (has libcrypto.so.52.0)
- Solaris 11.4 (has libcrypto.so.1.0.0)
* On these platforms, the configure test gave
checking whether openssl is GPL compatible... yes
and LIBCRYPTO_SONAME got defined.
- Debian 12
- Ubuntu 22.04
- openSUSE 15.5
- CentOS Stream 9
- Alpine Linux 3.19
- FreeBSD 14.0
- Android
The value of LIBCRYPTO_SONAME is
"libcrypto.so.30" on Android,
"libcrypto.so.3" on the other platforms.
Bruno
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: sort dynamic linking overhead
2024-02-27 21:36 ` Bruno Haible
@ 2024-02-28 14:30 ` Pádraig Brady
0 siblings, 0 replies; 33+ messages in thread
From: Pádraig Brady @ 2024-02-28 14:30 UTC (permalink / raw)
To: Bruno Haible; +Cc: bug-gnulib, coreutils
On 27/02/2024 21:36, Bruno Haible wrote:
> Pádraig Brady wrote:
>>> Does this work for all the various names of libcrypto in various distros?
>>>
>>> Debian 12 libcrypto.so.3
>>> Ubuntu 22.04 libcrypto.so.1.1 libcrypto.so.3
>>> Slackware 15 libcrypto.so.1.1
>>> openSUSE 15.5 libcrypto.so.1.1
>>> CentOS Stream 9 libcrypto.so.3
>>> Guix 1.4 libcrypto.so.1.1
>>> Alpine 3.19 libcrypto.so.3
>>> FreeBSD 14.0 libcrypto.so.38
>>> NetBSD 9.3 libcrypto.so.14
>>> OpenBSD 7.4 libcrypto.so.52.0
>>>
>>
>> I only tested with libcrypto.so.3, but it should match all of the above.
>> It matches libcrypto.so.[.0-9]*
>
> Here are my testing results (with the LIB_DL fix):
>
> * On some machines, I had to install the packages with the <openssl/*>
> header files first:
>
> - Debian 12:
> # apt install libssl-dev
> - openSUSE 15.5:
> YaST software > install libopenssl-3-devel
> - Slackware 15:
> Download and unpack the openssl-1.1.1w binary packages
> - Alpine Linux 3.19:
> # apk add openssl openssl-dev
>
> * On some platforms, the configure test gave
>
> checking whether openssl is GPL compatible... no
>
> due to the <openssl/*> header files being absent:
>
> - Guix 1.4
> - macOS 12.5
> - Cygwin 2.9.0
>
> * On some platforms, the configure test gave
>
> checking whether openssl is GPL compatible... no
>
> because the openssl version is not >= 3.
>
> - Slackware 15 (has libcrypto.so.1.1)
> - NetBSD 9.3 (has libcrypto.so.14)
> - OpenBSD 7.4 (has libcrypto.so.52.0)
> - Solaris 11.4 (has libcrypto.so.1.0.0)
>
> * On these platforms, the configure test gave
>
> checking whether openssl is GPL compatible... yes
>
> and LIBCRYPTO_SONAME got defined.
>
> - Debian 12
> - Ubuntu 22.04
> - openSUSE 15.5
> - CentOS Stream 9
> - Alpine Linux 3.19
> - FreeBSD 14.0
> - Android
>
> The value of LIBCRYPTO_SONAME is
> "libcrypto.so.30" on Android,
> "libcrypto.so.3" on the other platforms.
Thanks for all the testing / info.
FTR some platforms may deem openssl < 3 as GPL compat,
and they can `configure --with-openssl=auto`,
in which case this optimization would be enabled.
thanks!
Pádraig
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2024-02-28 14:30 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-02 5:21 [PATCH] totalorder, totalorderf, totalorderl: new modules Paul Eggert
2023-10-02 11:33 ` Bruno Haible
2023-10-07 1:40 ` Paul Eggert
2023-10-07 2:33 ` sort and -lm Bruno Haible
2023-10-07 3:38 ` Paul Eggert
2023-10-07 11:42 ` Pádraig Brady
2023-10-07 21:29 ` Paul Eggert
2023-10-08 13:36 ` Pádraig Brady
2023-10-08 20:53 ` sort dynamic linking overhead Pádraig Brady
2023-10-09 13:48 ` Pádraig Brady
2024-02-26 6:25 ` Paul Eggert
2024-02-26 6:44 ` Yann Collet
2024-02-26 14:12 ` Pádraig Brady
2024-02-26 18:54 ` Paul Eggert
2024-02-26 17:11 ` Andreas Schwab
2024-02-26 17:37 ` Pádraig Brady
2024-02-26 17:39 ` Bruno Haible
2024-02-26 17:43 ` Pádraig Brady
2024-02-27 21:36 ` Bruno Haible
2024-02-28 14:30 ` Pádraig Brady
2024-02-26 18:06 ` Andreas Schwab
2024-02-26 19:13 ` Pádraig Brady
2024-02-27 11:15 ` Bruno Haible
2024-02-27 14:14 ` Pádraig Brady
2024-02-27 18:31 ` Paul Eggert
2024-02-27 19:01 ` Bruno Haible
2023-10-07 13:20 ` sort and -lm Bruno Haible
2023-10-02 11:39 ` [PATCH] totalorder, totalorderf, totalorderl: new modules Bruno Haible
2023-10-04 10:59 ` Bruno Haible
2023-10-04 11:21 ` totalorderl: Optimize Bruno Haible
2023-10-04 11:39 ` totalorder* tests: Refactor Bruno Haible
2023-10-05 19:59 ` totalorderl: minor porting fixes Bruno Haible
2023-10-07 11:02 ` [PATCH] totalorder, totalorderf, totalorderl: new modules Bruno Haible
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).