bug-gnulib@gnu.org mirror (unofficial)
* localename: Fix test failure on OpenBSD >= 6.2
@ 2018-12-16  6:12 Bruno Haible
  2018-12-16 18:04 ` Ingo Schwarze
  0 siblings, 1 reply; 6+ messages in thread
From: Bruno Haible @ 2018-12-16  6:12 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Ingo Schwarze

While testing a grep snapshot on OpenBSD 6.3, I see a test failure of
test-localename. The cause is that in OpenBSD >= 6.2, the locale_t
type and uselocale() etc. are now available, but with a terribly dumbed-
down implementation that makes it impossible to use these per-thread
locales for anything real (including gettext()).

This patch adjusts the gnulib code so that it treats OpenBSD >= 6.2 like
the platforms without locale_t and uselocale().


2018-12-16  Bruno Haible  <bruno@clisp.org>

	localename: Fix test failure on OpenBSD >= 6.2.
	* m4/intl-thread-locale.m4 (gt_INTL_THREAD_LOCALE_NAME): Test for fake
	locale system. Define HAVE_FAKE_LOCALES in this case.
	* lib/localename.c (HAVE_GOOD_USELOCALE): New macro. Use it instead of
	HAVE_USELOCALE.
	* tests/test-localename.c (HAVE_GOOD_USELOCALE): New macro. Use it
	instead of HAVE_NEWLOCALE && HAVE_USELOCALE.
	* doc/posix-functions/uselocale.texi: Mention OpenBSD problem. Update
	platforms list.
	* doc/posix-functions/newlocale.texi: Likewise.
	* doc/posix-functions/duplocale.texi: Update platforms list.
	* doc/posix-functions/freelocale.texi: Likewise.

diff --git a/doc/posix-functions/uselocale.texi b/doc/posix-functions/uselocale.texi
index 07711df..2d8b947 100644
--- a/doc/posix-functions/uselocale.texi
+++ b/doc/posix-functions/uselocale.texi
@@ -14,5 +14,9 @@ Portability problems not fixed by Gnulib:
 @itemize
 @item
 This function is missing on many platforms:
-Mac OS X 10.3, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
+Mac OS X 10.3, FreeBSD 9.0, NetBSD 5.0, OpenBSD 6.1, Minix 3.1.8, AIX 6.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
+@item
+This function is useless because the @code{locale_t} type contains basically
+no information on some platforms:
+OpenBSD 6.3.
 @end itemize
diff --git a/doc/posix-functions/newlocale.texi b/doc/posix-functions/newlocale.texi
index c15b1bd..376f7e8 100644
--- a/doc/posix-functions/newlocale.texi
+++ b/doc/posix-functions/newlocale.texi
@@ -14,5 +14,9 @@ Portability problems not fixed by Gnulib:
 @itemize
 @item
 This function is missing on many platforms:
-Mac OS X 10.3, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
+Mac OS X 10.3, FreeBSD 9.0, NetBSD 5.0, OpenBSD 6.1, Minix 3.1.8, AIX 6.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
+@item
+This function is useless because the @code{locale_t} type contains basically
+no information on some platforms:
+OpenBSD 6.3.
 @end itemize
diff --git a/doc/posix-functions/duplocale.texi b/doc/posix-functions/duplocale.texi
index a328a67..000300b 100644
--- a/doc/posix-functions/duplocale.texi
+++ b/doc/posix-functions/duplocale.texi
@@ -21,5 +21,5 @@ Portability problems not fixed by Gnulib:
 @itemize
 @item
 This function is missing on many platforms:
-Mac OS X 10.3, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
+Mac OS X 10.3, FreeBSD 9.0, NetBSD 5.0, OpenBSD 6.1, Minix 3.1.8, AIX 6.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
 @end itemize
diff --git a/doc/posix-functions/freelocale.texi b/doc/posix-functions/freelocale.texi
index 50e448e..e4ff00f 100644
--- a/doc/posix-functions/freelocale.texi
+++ b/doc/posix-functions/freelocale.texi
@@ -14,5 +14,5 @@ Portability problems not fixed by Gnulib:
 @itemize
 @item
 This function is missing on many platforms:
-Mac OS X 10.3, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 5.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
+Mac OS X 10.3, FreeBSD 9.0, NetBSD 5.0, OpenBSD 6.1, Minix 3.1.8, AIX 6.1, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11.3, Cygwin, mingw, MSVC 14, Interix 3.5, BeOS, Android 4.4.
 @end itemize
diff --git a/m4/intl-thread-locale.m4 b/m4/intl-thread-locale.m4
index d7ad0ac..0666e39 100644
--- a/m4/intl-thread-locale.m4
+++ b/m4/intl-thread-locale.m4
@@ -1,4 +1,4 @@
-# intl-thread-locale.m4 serial 2
+# intl-thread-locale.m4 serial 3
 dnl Copyright (C) 2015-2018 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -24,6 +24,57 @@ AC_DEFUN([gt_INTL_THREAD_LOCALE_NAME],
 
   AC_CHECK_FUNCS_ONCE([uselocale])
 
+  dnl On OpenBSD >= 6.2, the locale_t type and the uselocale(), newlocale(),
+  dnl duplocale(), freelocale() functions exist but are effectively useless,
+  dnl because the locale_t value depends only on the LC_CTYPE category of the
+  dnl locale and furthermore contains only one bit of information (it
+  dnl distinguishes the "C" locale from the *.UTF-8 locales). See
+  dnl <https://cvsweb.openbsd.org/src/lib/libc/locale/newlocale.c?rev=1.1&content-type=text/x-cvsweb-markup>.
+  dnl In the setlocale() implementation they have thought about the programs
+  dnl that use the API ("Even though only LC_CTYPE has any effect in the
+  dnl OpenBSD base system, store complete information about the global locale,
+  dnl such that third-party software can access it"), but for uselocale()
+  dnl they did not think about the programs.
+  dnl In this situation, even the HAVE_NAMELESS_LOCALES support does not work.
+  dnl So, define HAVE_FAKE_LOCALES and disable all locale_t support.
+  if test $ac_cv_func_uselocale = yes; then
+    AC_CHECK_HEADERS_ONCE([xlocale.h])
+    AC_CACHE_CHECK([for fake locale system (OpenBSD)],
+      [gt_cv_locale_fake],
+      [AC_RUN_IFELSE(
+         [AC_LANG_SOURCE([[
+#include <locale.h>
+#if HAVE_XLOCALE_H
+# include <xlocale.h>
+#endif
+int main ()
+{
+  locale_t loc1, loc2;
+  if (setlocale (LC_ALL, "de_DE.UTF-8") == NULL) return 1;
+  if (setlocale (LC_ALL, "fr_FR.UTF-8") == NULL) return 1;
+  loc1 = newlocale (LC_ALL_MASK, "de_DE.UTF-8", (locale_t)0);
+  loc2 = newlocale (LC_ALL_MASK, "fr_FR.UTF-8", (locale_t)0);
+  return !(loc1 == loc2);
+}]])],
+         [gt_cv_locale_fake=yes],
+         [gt_cv_locale_fake=no],
+         [dnl Guess the locale system is fake only on OpenBSD.
+          case "$host_os" in
+            openbsd*) gt_cv_locale_fake="guessing yes" ;;
+            *)        gt_cv_locale_fake="guessing no" ;;
+          esac
+         ])
+      ])
+  else
+    gt_cv_locale_fake=no
+  fi
+  case "$gt_cv_locale_fake" in
+    *yes)
+      AC_DEFINE([HAVE_FAKE_LOCALES], [1],
+        [Define if the locale_t type contains insufficient information, as on OpenBSD.])
+      ;;
+  esac
+
   if test $ac_cv_func_uselocale = yes; then
     AC_CACHE_CHECK([for Solaris 11.4 locale system],
       [gt_cv_locale_solaris114],
diff --git a/lib/localename.c b/lib/localename.c
index aa3cc13..28b3d61 100644
--- a/lib/localename.c
+++ b/lib/localename.c
@@ -35,7 +35,13 @@
 
 #include "flexmember.h"
 
-#if HAVE_USELOCALE
+/* We cannot support uselocale() on platforms where the locale_t type is fake.
+   See intl-thread-locale.m4 for details.  */
+#if HAVE_USELOCALE && !HAVE_FAKE_LOCALES
+# define HAVE_GOOD_USELOCALE 1
+#endif
+
+#if HAVE_GOOD_USELOCALE
 /* Mac OS X 10.5 defines the locale_t type in <xlocale.h>.  */
 # if defined __APPLE__ && defined __MACH__
 #  include <xlocale.h>
@@ -2623,8 +2629,8 @@ get_lcid (const char *locale_name)
 #endif
 
 
-#if HAVE_USELOCALE /* glibc, Mac OS X, FreeBSD >= 9.1, AIX >= 7,
-                      Solaris 11 OpenIndiana, or Solaris >= 11.4  */
+#if HAVE_GOOD_USELOCALE /* glibc, Mac OS X, FreeBSD >= 9.1, AIX >= 7,
+                           Solaris 11 OpenIndiana, or Solaris >= 11.4  */
 
 /* Simple hash set of strings.  We don't want to drag in lots of hash table
    code here.  */
@@ -2709,7 +2715,7 @@ struniq (const char *string)
 #endif
 
 
-#if HAVE_USELOCALE && HAVE_NAMELESS_LOCALES
+#if HAVE_GOOD_USELOCALE && HAVE_NAMELESS_LOCALES
 
 /* The 'locale_t' object does not contain the names of the locale categories.
    We have to associate them with the object through a hash table.
@@ -3089,7 +3095,7 @@ freelocale (locale_t locale)
 #endif
 
 
-#if defined IN_LIBINTL || HAVE_USELOCALE
+#if defined IN_LIBINTL || HAVE_GOOD_USELOCALE
 
 /* Like gl_locale_name_thread, except that the result is not in storage of
    indefinite extent.  */
@@ -3099,7 +3105,7 @@ static
 const char *
 gl_locale_name_thread_unsafe (int category, const char *categoryname)
 {
-# if HAVE_USELOCALE
+# if HAVE_GOOD_USELOCALE
   {
     locale_t thread_locale = uselocale (NULL);
     if (thread_locale != LC_GLOBAL_LOCALE)
@@ -3212,7 +3218,7 @@ gl_locale_name_thread_unsafe (int category, const char *categoryname)
 const char *
 gl_locale_name_thread (int category, const char *categoryname)
 {
-#if HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   const char *name = gl_locale_name_thread_unsafe (category, categoryname);
   if (name != NULL)
     return struniq (name);
diff --git a/tests/test-localename.c b/tests/test-localename.c
index 4e8d146..8c3c425 100644
--- a/tests/test-localename.c
+++ b/tests/test-localename.c
@@ -26,8 +26,12 @@
 
 #include "macros.h"
 
+#if HAVE_NEWLOCALE && HAVE_USELOCALE && !HAVE_FAKE_LOCALES
+# define HAVE_GOOD_USELOCALE 1
+#endif
+
 
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
 
 static struct { int cat; int mask; const char *string; } const categories[] =
   {
@@ -70,7 +74,7 @@ test_locale_name (void)
 
   /* Get into a defined state,  */
   setlocale (LC_ALL, "en_US.UTF-8");
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   uselocale (LC_GLOBAL_LOCALE);
 #endif
 
@@ -181,7 +185,7 @@ test_locale_name (void)
       ASSERT (strcmp (name, "fr_FR.UTF-8") == 0);
     }
 
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   /* Check that gl_locale_name considers the thread locale.  */
   {
     locale_t locale = newlocale (LC_ALL_MASK, "fr_FR.UTF-8", NULL);
@@ -241,7 +245,7 @@ test_locale_name_thread (void)
   /* Get into a defined state,  */
   setlocale (LC_ALL, "en_US.UTF-8");
 
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   /* Check that gl_locale_name_thread returns NULL when no thread locale is
      set.  */
   uselocale (LC_GLOBAL_LOCALE);
@@ -496,7 +500,7 @@ test_locale_name_posix (void)
 
   /* Get into a defined state,  */
   setlocale (LC_ALL, "en_US.UTF-8");
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   uselocale (LC_GLOBAL_LOCALE);
 #endif
 
@@ -605,7 +609,7 @@ test_locale_name_posix (void)
       ASSERT (strcmp (name, "fr_FR.UTF-8") == 0);
     }
 
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   /* Check that gl_locale_name_posix ignores the thread locale.  */
   {
     locale_t locale = newlocale (LC_ALL_MASK, "fr_FR.UTF-8", NULL);
@@ -634,7 +638,7 @@ test_locale_name_environ (void)
 
   /* Get into a defined state,  */
   setlocale (LC_ALL, "en_US.UTF-8");
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   uselocale (LC_GLOBAL_LOCALE);
 #endif
 
@@ -719,7 +723,7 @@ test_locale_name_environ (void)
   name = gl_locale_name_environ (LC_MESSAGES, "LC_MESSAGES");
   ASSERT (strcmp (name, "fr_FR.UTF-8") == 0);
 
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   /* Check that gl_locale_name_environ ignores the thread locale.  */
   {
     locale_t locale = newlocale (LC_ALL_MASK, "fr_FR.UTF-8", NULL);
@@ -754,7 +758,7 @@ test_locale_name_default (void)
   ASSERT (strcmp (name, "C") == 0);
 #endif
 
-#if HAVE_NEWLOCALE && HAVE_USELOCALE
+#if HAVE_GOOD_USELOCALE
   /* Check that gl_locale_name_default ignores the thread locale.  */
   {
     locale_t locale = newlocale (LC_ALL_MASK, "fr_FR.UTF-8", NULL);




* Re: localename: Fix test failure on OpenBSD >= 6.2
  2018-12-16  6:12 localename: Fix test failure on OpenBSD >= 6.2 Bruno Haible
@ 2018-12-16 18:04 ` Ingo Schwarze
  2018-12-16 19:01   ` OpenBSD locale system Bruno Haible
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2018-12-16 18:04 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib

Hi Bruno,

Bruno Haible wrote on Sun, Dec 16, 2018 at 07:12:44AM +0100:

> While testing a grep snapshot on OpenBSD 6.3, I see a test failure of
> test-localename. The cause is that in OpenBSD >= 6.2, the locale_t
> type and uselocale() etc. are now available, but with a terribly dumbed-
> down implementation that makes it impossible to use these per-thread
> locales for anything real (including gettext()).

Well, you *can* use newlocale()/uselocale() to switch the charset
between the two supported character sets ASCII and UTF-8, on a
per-thread basis.
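
For illustration, the per-thread character set switch described above might look roughly like the following. This is only a sketch: the helper name mb_cur_max_in_locale is hypothetical, the locale names are whatever the platform accepts (for example "C" and "en_US.UTF-8"), and some systems such as macOS need <xlocale.h> in addition to <locale.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: returns MB_CUR_MAX as seen by the calling thread
   while the LC_CTYPE locale NAME is installed with uselocale(), or 0 if
   newlocale() rejects NAME.  The previous thread locale is restored
   before returning.  */
size_t
mb_cur_max_in_locale (const char *name)
{
  assert (name != NULL);

  locale_t loc = newlocale (LC_CTYPE_MASK, name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return 0;

  locale_t previous = uselocale (loc);  /* affects only this thread */
  size_t width = MB_CUR_MAX;            /* 1 in the "C" locale */
  uselocale (previous);
  freelocale (loc);
  return width;
}
```

With a UTF-8 locale the returned value is expected to be larger than 1, which is the kind of behaviour change in the *mb*(3) functions that LC_CTYPE controls.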

The OpenBSD C library intentionally doesn't implement any other
locale(1) categories except LC_CTYPE because many here regard the
other categories as overengineering and as detrimental to system
security, so in that sense, xlocale support is complete.  Yes, you
are right that locale_t contains exactly one bit of information,
intentionally so.  POSIX does not require that "de_DE.UTF-8" and
"fr_FR.UTF-8" must be different locales, or that they behave
differently from each other in any way.

Not sure what the conclusion should be for gnulib - it probably
depends on what gnulib suggests the application build system should
do when newlocale(3) is not available, or what it should do when
the target system intentionally refrains from implementing LC_*
categories other than LC_CTYPE.

Yours,
  Ingo



* Re: OpenBSD locale system
  2018-12-16 18:04 ` Ingo Schwarze
@ 2018-12-16 19:01   ` Bruno Haible
  2018-12-16 23:16     ` Ingo Schwarze
  0 siblings, 1 reply; 6+ messages in thread
From: Bruno Haible @ 2018-12-16 19:01 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: bug-gnulib

Hi Ingo,

> The OpenBSD C library intentionally doesn't implement any other
> locale(1) categories except LC_CTYPE because many here regard the
> other categories as overengineering and as detrimental to system
> security

I partially agree with this, regarding specific categories, such as

  - LC_MONETARY: The main API function for this category, strfmon(),
    is defined in such a way that, if implemented correctly, it
    produces misleading results.
    <http://austingroupbugs.net/view.php?id=1199>

  - LC_PAPER: Any software which wants to print something should
    better ask the attached printer, rather than make assumptions
    about the printer device based on the locale.

However, locale categories such as LC_NUMERIC and LC_MESSAGES
are useful when you assume that your software does have end-users
that are not sysadmins.

For Android, the choice to use dumbed-down locales in libc was made
because 99% of the software visible to the end user is written in
another programming language (Java) that comes with its own, elaborate,
locale system. The remaining 1% is visible only to developers via 'adb',
and thus needs no locales.

Regarding OpenBSD, the uselocale support is useful for adding a checkmark
to the checkbox "We support POSIX locale_t API", but is not useful, for
example, to have a multithreaded web server honor the Accept-Language
settings given by a browser user, other than by reimplementing all
needed locale-dependent behaviour.

> POSIX does not require that "de_DE.UTF-8" and
> "fr_FR.UTF-8" must be different locales, or that they behave
> differently from each other in any way.

Here you need to distinguish
  - locale-dependent behaviour defined by POSIX functions and
  - locale-dependent behaviour defined by the application.

In setlocale.c you made this distinction, as witnessed by the comment in
https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/locale/setlocale.c?annotate=1.29
lines 72..75.

Why not also for the per-thread locales? By implementing the FreeBSD
querylocale API (the equivalent of setlocale(category,NULL) for locale_t
objects), you would make it possible for applications to pull out
German versus French messages, depending whether the per-thread locale
is "de_DE.UTF-8" or "fr_FR.UTF-8".

> Not sure what the conclusion should be for gnulib - it probably
> depends on what gnulib suggests the application build system should
> do when newlocale(3) is not available

Exactly. Gnulib now treats such a platform like one which does not
have locale_t, uselocale(), newlocale().

Bruno




* Re: OpenBSD locale system
  2018-12-16 19:01   ` OpenBSD locale system Bruno Haible
@ 2018-12-16 23:16     ` Ingo Schwarze
  2018-12-20 20:55       ` Bruno Haible
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2018-12-16 23:16 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib

Hi Bruno,

Bruno Haible wrote on Sun, Dec 16, 2018 at 08:01:04PM +0100:
> Ingo Schwarze wrote:

>> The OpenBSD C library intentionally doesn't implement any other
>> locale(1) categories except LC_CTYPE because many here regard the
>> other categories as overengineering and as detrimental to system
>> security

> I partially agree with this, regarding specific categories, such as
> 
>   - LC_MONETARY: The main API function for this category, strfmon(),
>     is defined in such a way that, if implemented correctly, it
>     produces misleading results.
>     <http://austingroupbugs.net/view.php?id=1199>
> 
>   - LC_PAPER: Any software which wants to print something should
>     better ask the attached printer, rather than make assumptions
>     about the printer device based on the locale.
> 
> However, locale categories such as LC_NUMERIC and LC_MESSAGES
> are useful when you assume that your software does have end-users
> that are not sysadmins.

Probably, you are right that LC_MESSAGES is not dangerous as long
as the C library doesn't actually attempt to translate system
error strings.  But LC_NUMERIC is certainly dangerous, it can
break parsers in subtle and surprising ways, whereas it doesn't
really matter all that much for end users in the first place.

But i guess discussing such considerations in detail would be
off-topic on this mailing list; i merely mentioned them to provide
minimal context regarding why certain decisions were made; so let's
focus on the consequences of the decisions, how gnulib should best
deal with them, and possibly identify parts that might need revisiting,
see below.

[...]
> Regarding OpenBSD, the uselocale support is useful for adding a checkmark
> to the checkbox "We support POSIX locale_t API", but is not useful, for
> example, to have a multithreaded web server honor the Accept-Language
> settings given by a browser user, other than by reimplementing all
> needed locale-dependent behaviour.

The "all needed" in this sentence sounds like it were a big deal;
but all that is needed here is storing one language code per user,
right?  Why would any programmer call a library API for that rather
than simply storing the selected language in a variable?

For comparison, the point of using {set,new,use}locale(3) with
LC_CTYPE is not merely remembering which character set the user
asked for, but also changing the behaviour of many *wc*(3) and
*mb*(3) library functions.  LC_MESSAGES, on the other hand, will
never have any effect on the behaviour of any library function
in the OpenBSD libc.

Also, in your web server example, you certainly don't want syslog
messages in languages requested by clients, so calling uselocale(3)
would merely be asking for trouble...  (Of course it's still possible
to write correct code, but harder.)

>> POSIX does not require that "de_DE.UTF-8" and
>> "fr_FR.UTF-8" must be different locales, or that they behave
>> differently from each other in any way.

> Here you need to distinguish
>   - locale-dependent behaviour defined by POSIX functions and
>   - locale-dependent behaviour defined by the application.
> 
> In setlocale.c you made this distinction, as witnessed by the
> comment in
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/locale/setlocale.c?annotate=1.29
> lines 72..75.

Actually, originally i proposed to delete that behaviour for
consistency with {new,use}locale(3), but no consensus was reached
on that point - some argued: given that it is already implemented,
why not simply keep it in setlocale(3)?  It may be useful in some
situations.  So it was kept.

But i consider setlocale(3) the odd one out here rather than
{new,use}locale(3), because setlocale(3) supports storing a string
in the library that the application program could just as easily,
or arguably even more easily, store itself.

> Why not also for the per-thread locales? By implementing the FreeBSD
> querylocale API (the equivalent of setlocale(category,NULL) for locale_t
> objects), you would make it possible for applications to pull out
> German versus French messages, depending whether the per-thread locale
> is "de_DE.UTF-8" or "fr_FR.UTF-8".

So, you suggest to store this string in the library (where it has
no effect) even though POSIX does not define a method to retrieve
it again once it is stored?  I don't quite see yet how that might
be useful - not even for your webserver example, because the webserver
couldn't portably retrieve the string, or could it?


I hoped to understand better what your point is by looking at the
HEAD of the master branch of the git repo of GNU grep because you
mentioned a test failure there - but grepping the grep repo, i can't
even seem to find any usage of newlocale(3) or setlocale(3) in
there, so i'm not quite sure what you are actually trying to
achieve.  Also, you mentioned "a test failure of test-localename",
but "grep -RF localename *" returns nothing for me in the grep repo
either...

I also tried running the build myself in order to reproduce your
issue on OpenBSD-current.  Here are the findings:
 1. ./bootstrap appears to run wget(1), unconditionally, which didn't
    exist on my system.  On OpenBSD, the program for that purpose
    is called ftp(1) - even for https:// URIs.
 2. make check yields only two failures:
XFAIL: equiv-classes
XFAIL: triple-backref
============================================================================
Testsuite summary for GNU grep 3.1.51-e767
============================================================================
# TOTAL: 109
# PASS:  80
# SKIP:  27
# XFAIL: 2
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================
============================================================================
Testsuite summary for GNU grep 3.1.51-e767
============================================================================
# TOTAL: 173
# PASS:  157
# SKIP:  16
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

In particular, i see:
PASS: test-localename

Do you need more info?  If so, what exactly?  Better on or off list?


Of course, asking for querylocale(3) support - as opposed to
questioning the implementation of uselocale(3) - would be a rather
different matter.  But while i did hear from porters that the lack
of {new,use}locale(3) and the related interfaces did cause porting
trouble in the past, i didn't hear about trouble that would go away
by implementing querylocale(3) so far, and given that it isn't
standardized, that doesn't seem very surprising.  Of course, i may
simply have missed such trouble.

Anyway, in case what you really ask for is implementing querylocale(3),
then i no longer understand what is broken about {new,use}locale(3)
as long as querylocale(3) does not exist, so why exactly it needs
to be marked as non-working...

Yours,
  Ingo



* Re: OpenBSD locale system
  2018-12-16 23:16     ` Ingo Schwarze
@ 2018-12-20 20:55       ` Bruno Haible
  2018-12-21  0:37         ` Ingo Schwarze
  0 siblings, 1 reply; 6+ messages in thread
From: Bruno Haible @ 2018-12-20 20:55 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: bug-gnulib

Hi Ingo,

Thanks for your thoughts. I appreciate that you give true consideration
to arguments.

> But i guess discussing such considerations in detail would be
> off-topic on this mailing list

We can stay on this mailing list. I'm not going to go deep into OpenBSD
specific system design arguments.

> So, you suggest to store this string in the library (where it has
> no effect) even though POSIX does not define a method to retrieve
> it again once it is stored?

I have now submitted a request to add such a method to POSIX. Here:
http://austingroupbugs.net/view.php?id=1220

I used OpenBSD 6.2 as an example in there, not to bash OpenBSD, but
to prove that POSIX is incomplete so far. Which I should probably
have done as early as 2005, when I noticed that the API is incomplete
regarding GNU libc:
https://sourceware.org/ml/libc-alpha/2005-03/msg00125.html

> Why would any programmer call a library API for that rather
> than simply storing the selected language in a variable?
...
> setlocale(3) supports storing a string
> in the library that the application program could just as easily,
> or arguably even more easily, store itself.

Many application programs are not small pieces of code, written by
a small group of programmers, but are rather assembled through
libraries, written by different groups of programmers.

The following libc APIs exist, not primarily in order to make system calls
to the kernel, but to let information flow from one place of the
application to another place of the application:
  <locale.h>   setlocale, uselocale
  <setjmp.h>   setjmp, longjmp
  <stdio.h>    setbuf, setvbuf, clearerr
  <syslog.h>   setlogmask
  <libintl.h>  textdomain, bindtextdomain

Going even further, applications can even dynamically load libraries,
through <dlfcn.h>.

For example, 'ldd /usr/bin/emacs' displays 110 libraries on my system,
and a running 'kate' process has 46 dynamically loaded .so files open.
That's where libc (or libstdc++, in the second case) as information
dispatcher between different parts of the application becomes important.
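
As a rough sketch of this "information dispatcher" role, assume two independently written parts of a program: an application that selects the locale once, and a library that only queries it. The function names below are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* "Application" side (hypothetical): selects the locale once at startup.
   Returns 0 on success, -1 if the C library rejects NAME.  */
int
application_select_locale (const char *name)
{
  assert (name != NULL);
  return setlocale (LC_ALL, name) != NULL ? 0 : -1;
}

/* "Library" side (hypothetical): knows nothing about the application and
   asks libc which LC_MESSAGES locale is in effect.  Returns nonzero if
   translated messages appear to be wanted.  */
int
library_wants_translations (void)
{
  const char *name = setlocale (LC_MESSAGES, NULL);  /* query only */
  if (name == NULL)
    return 0;
  return strcmp (name, "C") != 0 && strcmp (name, "POSIX") != 0;
}
```

The two functions never call each other; the only thing they share is the string that setlocale() stores inside the C library.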

> For comparison, the point of using {set,new,use}locale(3) with
> LC_CTYPE is not merely remembering which character set the user
> asked for, but also changing the behaviour of many *wc*(3) and
> *mb*(3) library functions.  LC_MESSAGES, on the other hand, will
> never have any effect on the behaviour of any library function
> in the OpenBSD libc.

The *mb* and *wc* functions are only one consumer of the information
(the name of the locale_t category). Other parts of the application
want to consume this information as well.

> Also, in your web server example, you certainly don't want syslog
> messages in languages requested by clients, so calling uselocale(3)
> would merely be asking for trouble...  (Of course it's still possible
> to write correct code, but harder.)

You make a good (and often overlooked) point: In a properly internationalized
system, when a message is generated, the audience of the message (which
user? the web server administrator? the database administrator? the
browser user?) needs to be considered *already* at the point where the
message is generated. In my web server example, one could use
  - the global locale, set by setlocale(), for messages that go to the
    administrator,
  - a '__thread locale_t browser_use_locale;' object, or uselocale(),
    for messages that go to the browser user.
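
One possible shape for this separation is sketched below. It assumes that the application keeps the locale name next to the locale_t handle, since POSIX currently has no function to read the name back out of the handle. The struct and function names are hypothetical; some systems need <xlocale.h> as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical per-request state: the handle passed to uselocale() plus
   the name it was created from.  */
struct request_locale
{
  locale_t handle;
  const char *name;
};

/* Returns the locale name to use for a message generated right now in
   this thread: the browser user's name while REQ's handle is installed
   as the thread locale, otherwise the global LC_MESSAGES name that the
   administrator configured through setlocale().  */
const char *
current_message_locale (const struct request_locale *req)
{
  locale_t active = uselocale ((locale_t) 0);  /* query, no change */

  if (req != NULL && active != LC_GLOBAL_LOCALE && active == req->handle)
    return req->name;
  return setlocale (LC_MESSAGES, NULL);
}
```

A request handler would call uselocale (req->handle) before producing output for the browser user and restore the previous thread locale afterwards; log messages written outside that window then follow the global locale.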

> But LC_NUMERIC is certainly dangerous, it can
> break parsers in subtle and surprising ways

Yes, here too, consideration needs to be given to the question: who
will parse the decimal number? A human user (supposed to be using
which locale?) or a language neutral parser?

The fact that such distinctions can be made is shown by the
category LC_TIME. Here, parsing locale-dependent output is so
complex and buggy (see e.g. the ill-designed attempt in
https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdate.html)
that real-world software is forced to make the distinction between
localized and not localized time representations. For the not localized
representations, software usually standardizes on the
  date +"%Y-%m-%dT%H:%M:%S"
format (with Gregorian calendar).
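
In C, the same non-localized representation can be produced with strftime(). The following is a minimal sketch; the function name is hypothetical, and it formats in UTC, which is an assumption made here and not something the date command above does.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: formats T (interpreted as UTC) as
   YYYY-MM-DDTHH:MM:SS into BUF.  Returns the number of characters
   written, or 0 if BUF is too small or T cannot be converted.  Only
   numeric conversions are used, so no month or weekday names from
   LC_TIME appear in the result.  */
size_t
format_timestamp_utc (time_t t, char *buf, size_t size)
{
  assert (buf != NULL);

  /* gmtime() uses static storage; threaded code would rather use the
     POSIX function gmtime_r().  */
  const struct tm *tm = gmtime (&t);
  if (tm == NULL)
    return 0;
  return strftime (buf, size, "%Y-%m-%dT%H:%M:%S", tm);
}
```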

When you apply similar thought to LC_NUMERIC functionality, you can
achieve good results. But I agree it's easy to introduce bugs in this
area. Just last week, by mistake, I wrote code that prints a port number
in a localized way: 8,080 or 8.080 depending on locale. Ouch.
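
The code that had the bug is not shown here, so the following is only an illustration of the distinction. One way in which digit grouping can enter C output is the POSIX ' flag of printf(); a plain %d conversion never applies it. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* For machine consumption: plain %d never inserts LC_NUMERIC grouping
   characters, so a port is always printed as, e.g., 8080.  */
int
format_port (char *buf, size_t size, int port)
{
  assert (buf != NULL && port >= 0 && port <= 65535);
  return snprintf (buf, size, "%d", port);
}

/* For human readers: the POSIX ' flag asks for the thousands grouping
   of the current locale, if that locale defines one.  */
int
format_count_for_display (char *buf, size_t size, long count)
{
  assert (buf != NULL);
  return snprintf (buf, size, "%'ld", count);
}
```

Keeping the two cases in separate functions forces the caller to decide, at the point where the number is formatted, whether a human or a parser will read it.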

> I hoped to understand better what your point is by looking at the
> HEAD of the master branch of the git repo of GNU grep because you
> mentioned a test failure there

You would better look at a GNU gettext release:
https://ftp.gnu.org/gnu/gettext/gettext-0.19.8.1.tar.gz
There, in the gettext-runtime/intl/ directory, you will find the
localename.c file - which is my attempt at overcoming the lack of
a locale name getter function in POSIX - and its use for gettext().

GNU grep indeed happens to include the localename test, but since
'grep' is not a multithreaded program, inspection of this code will
not give you insights on this issue.

Bruno




* Re: OpenBSD locale system
  2018-12-20 20:55       ` Bruno Haible
@ 2018-12-21  0:37         ` Ingo Schwarze
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Schwarze @ 2018-12-21  0:37 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib

Hi Bruno,

Bruno Haible wrote on Thu, Dec 20, 2018 at 09:55:25PM +0100:

> I have now submitted a request to add such a method to POSIX. Here:
> http://austingroupbugs.net/view.php?id=1220

Yes, i had seen that request.

> I used OpenBSD 6.2 as an example in there, not to bash OpenBSD,

Nothing is wrong with bashing OpenBSD when it is at fault.  ;-)

> but to prove that POSIX is incomplete so far. Which I should probably
> have done as early as 2005, when I noticed that the API is incomplete
> regarding GNU libc:
> https://sourceware.org/ml/libc-alpha/2005-03/msg00125.html

There can be no doubt that the POSIX locale system is incomplete in
many ways.  I do doubt that making it complete is feasible or even
desirable - trying to seems more likely to result in bloat than to
result in anything that might actually become complete in the end.

That said, i like the general approach of the ticket you submitted:
Do one very specific thing that can be argued to be particularly
important and that can also be argued to be a gap, because there
already is related functionality on more than one side:
setlocale(NULL) on one side, uselocale() on another side.

I'm not completely convinced the interface you propose is the best
design possible or really urgently needed, but i don't see off the
top of my head how to make it better (i.e. simpler), either.


I think i now see your point why setlocale(3) is not only useful to
configure the behaviour of *wc* and *mb* functions, but also for
standardized communication among various parts of a program, and
why something similar for uselocale(3) might possibly make sense.
Though i don't claim to anticipate all the dragons one might
encounter going that way.

[...]
> When you apply similar thought to LC_NUMERIC functionality, you can
> achieve good results. But I agree it's easy to introduce bugs in this
> area. Just last week, by mistake, I wrote code that prints a port number
> in a localized way: 8,080 or 8.080 depending on locale. Ouch.

Ouch indeed.  You strengthen my determination to oppose any support
for LC_NUMERIC and the like in OpenBSD.  We really don't want
functionality that makes bugs and vulnerabilities more likely, and
even less so when it only serves relatively arcane, rarely needed
purposes.  The unavoidable complexity of internationalized UI
programming must not introduce additional risks into pure-libc
systems programming.  Such stuff belongs in GUI programming kits,
not in libc.
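
The port number mishap quoted above can be sketched as follows,
assuming it came from a printf-style grouping flag (the original code
is not shown in this thread, and both function names are made up for
illustration).  Plain %d does not insert grouping characters, whereas
the ' flag, a POSIX extension to printf, asks for the grouping of the
current LC_NUMERIC locale:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper for protocol-level numbers: plain %d, so no
   thousands separator is inserted. */
int format_port(char *buf, size_t size, int port)
{
    assert(port >= 0 && port <= 65535);
    return snprintf(buf, size, "%d", port);
}

/* Hypothetical helper for human-facing counts: the ' flag requests
   locale-dependent digit grouping, so the result varies with
   LC_NUMERIC. */
int format_count_for_display(char *buf, size_t size, int count)
{
    return snprintf(buf, size, "%'d", count);
}
```

In the "C" locale both helpers write 8080 for the value 8080.  Only
the second one can change once a program calls setlocale() with a
locale that defines a thousands separator, which is the failure mode
described in the quoted text.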

[...]
> You would better look at a GNU gettext release:
> https://ftp.gnu.org/gnu/gettext/gettext-0.19.8.1.tar.gz
> There, in the gettext-runtime/intl/ directory, you will find the
> localename.c file - which is my attempt at overcoming the lack of
> a locale name getter function in POSIX - and its use for gettext().

Ah.  I hope you aren't offended that i won't study that code right
now in detail unless the following conclusion happens to be mistaken.
I suspect that i understand the situation well enough from your
short description to conclude that the bug might not be in gnulib
at all, but possibly rather in gettext.

On the one hand, gnulib seems right - without the patch you submitted
to this list - to detect that OpenBSD now provides uselocale(3).
So i'm not so sure the patch you submitted here is a good idea.

On the other hand, if i understand correctly what you are saying,
gettext expects some behaviour of newlocale(3) that isn't really
required by POSIX, and then jumps to conclusions about how the
locale_t objects returned from uselocale(3) can be used.  It seems
to me that, if gettext wants to use non-standard features of newlocale,
it is gettext that should test whether the specific extension features
it wants to use are available.  I'm not sure that is gnulib's task.
And even if people here think that doing *something* in gnulib to
help people deal with this situation makes sense, then disabling
the interface altogether looks like throwing the baby out with the
bathwater, when only specific non-standard ways of using the interface
are unsupported...

Also, a very brief look at localename.c in gettext gives me the
impression that it might be using an inconsistent mixture of (good)
feature tests and (always somewhat fragile) operating system name
tests.  For example, the function gl_locale_name_thread_unsafe()
appears to be compiled in and used without further conditions when
autoconf concludes HAVE_USELOCALE, but then the function doesn't
appear to do anything useful unless __GLIBC__, __FreeBSD__, or
__APPLE__ are defined.
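
A minimal sketch of what a pure feature test could look like, as
opposed to an operating system name test.  This is an illustration
only, not the code in localename.c; the function name
locale_object_name is made up, and the sketch covers just one of the
existing extensions, the _NL_LOCALE_NAME macro that glibc provides in
<langinfo.h>.  Where that macro is absent, the function reports that
it cannot tell rather than guessing:

```c
#define _GNU_SOURCE
#include <locale.h>
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper: name of a locale object for one category
   (for example LC_MESSAGES), or NULL if this libc offers no known
   way to ask.  The test is on the feature itself, not on the name
   of the operating system. */
const char *locale_object_name(locale_t loc, int category)
{
#ifdef _NL_LOCALE_NAME
    /* glibc extension: query the name stored in the locale object. */
    return nl_langinfo_l(_NL_LOCALE_NAME(category), loc);
#else
    /* Extension not available: say so and let the caller fall back,
       for instance to setlocale(category, NULL). */
    (void) loc;
    (void) category;
    return NULL;
#endif
}
```

A caller that gets NULL back knows that only this specific extension
is missing and can still use newlocale(3) and uselocale(3) in the
ways POSIX does specify.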


Given that at least three different APIs to deal with the task you
are describing in the Austin group ticket are already implemented
in various operating systems, this is likely not the best time to
implement any additional functionality in OpenBSD.  A better time
for doing so might be when the dust has settled and the Austin group
has made a decision which among the several options will be
considered the standard one.  In particular since your goal of
having a standard way to communicate among various parts of a program
can hardly be reached anyway, as long as there are several competing
tools and none of them generally available.  Besides, locale support
certainly isn't the focus of OpenBSD development goals, so a "wait
and see what other free projects more invested in this particular
business can agree on" looks like a somewhat reasonable approach
to me.

Yours,
  Ingo


Thread overview: 6+ messages
2018-12-16  6:12 localename: Fix test failure on OpenBSD >= 6.2 Bruno Haible
2018-12-16 18:04 ` Ingo Schwarze
2018-12-16 19:01   ` OpenBSD locale system Bruno Haible
2018-12-16 23:16     ` Ingo Schwarze
2018-12-20 20:55       ` Bruno Haible
2018-12-21  0:37         ` Ingo Schwarze