bug-gnulib@gnu.org mirror (unofficial)
* new module 'c32rtomb'
@ 2020-01-09  1:14 Bruno Haible
  2020-01-21  1:00 ` Bruno Haible
  0 siblings, 1 reply; 2+ messages in thread
From: Bruno Haible @ 2020-01-09  1:14 UTC (permalink / raw)
  To: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 4169 bytes --]

The function c32rtomb() is like wcrtomb(), except that it takes a 32-bit wide
character (char32_t) as argument, not a wchar_t.
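
As a hedged illustration of the calling convention (same as wcrtomb(), but with a char32_t argument), the sketch below encodes a whole NUL-terminated char32_t string in the current locale. The helper name encode_c32_string is hypothetical and not part of gnulib; the sketch assumes a C11 <uchar.h> that declares c32rtomb(), either from libc or from the gnulib replacement.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper (not part of gnulib): convert the NUL-terminated
   char32_t string IN to a multibyte string in the current locale.
   Writes at most OUTSIZE bytes (including the terminating NUL) to OUT.
   Returns the number of bytes written, excluding the NUL, or (size_t) -1
   if a character is not representable or OUT is too small.  */
size_t
encode_c32_string (char *out, size_t outsize, const char32_t *in)
{
  assert (out != NULL && outsize > 0);
  mbstate_t state;
  memset (&state, '\0', sizeof state);
  size_t len = 0;
  for (;; in++)
    {
      char buf[MB_LEN_MAX];
      /* Like wcrtomb(), but the argument is a char32_t.  */
      size_t n = c32rtomb (buf, *in, &state);
      if (n == (size_t) -1)
        return (size_t) -1;        /* EILSEQ: not representable here.  */
      if (len + n > outsize)
        return (size_t) -1;        /* Output buffer too small.  */
      memcpy (out + len, buf, n);
      if (*in == 0)
        return len + n - 1;        /* N counted the final NUL byte.  */
      len += n;
    }
}
```

In a stateless encoding, c32rtomb (NULL, 0, &state) is expected to return 1, the length of the NUL byte; the AIX 7.2 deviation from this is what the C32RTOMB_RETVAL_BUG workaround in the second patch addresses.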

While implementing this module, I noticed a mistake in the 'mbrtoc32' module:
It assumed that when wchar_t is 32-bit and mbrtoc32() exists in libc,
mbrtoc32() is equivalent to mbrtowc(); in other words, that a given multibyte
sequence decodes to the same value as a char32_t and as a wchar_t. But this
is not the case! On FreeBSD 12 and Solaris 11.4, the two encodings differ.
The FreeBSD 12 wchar_t encoding is apparently based on ISO 2022 (very old).

The fix is to use mbrtoc32() on platforms where this is possible, namely
on FreeBSD.

On Solaris 11.4 and native Windows, however, the system's mbrtoc32() should
not be used, because it refuses to convert some multibyte sequences that
mbrtowc() accepts!

So, we end up using the system's mbrtoc32() and c32rtomb() functions on
  - glibc,
  - FreeBSD,
  - AIX,
and not using them on
  - Solaris 11.4,
  - mingw,
  - MSVC.
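
The deciding criterion, whether the system's mbrtoc32() accepts every sequence that mbrtowc() accepts, is tested at configure time by gl_MBRTOC32_SANITYCHECK in the first patch. A rough run-time analogue for a single sequence might look like the sketch below; the function name is hypothetical, and it compares only byte counts, because the char32_t and wchar_t values may legitimately differ (as on FreeBSD 12).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: return false if mbrtowc() accepts the N-byte
   sequence S as a complete character but mbrtoc32() does not consume
   the same number of bytes (the Solaris 11.4 / native Windows failure).  */
bool
mbrtoc32_not_worse_than_mbrtowc (const char *s, size_t n)
{
  assert (s != NULL && n > 0);
  mbstate_t state;
  wchar_t wc;
  char32_t c32;

  memset (&state, '\0', sizeof state);
  size_t ret_wc = mbrtowc (&wc, s, n, &state);
  if (ret_wc == (size_t) -1 || ret_wc == (size_t) -2)
    return true;   /* mbrtowc rejects it too; nothing to compare.  */

  memset (&state, '\0', sizeof state);
  size_t ret_c32 = mbrtoc32 (&c32, s, n, &state);
  /* Compare byte counts only; the decoded values may differ.  */
  return ret_c32 == ret_wc;
}
```

The real configure test uses specific French (ISO-8859-1) and Chinese (GB18030) locales; this sketch simply checks whatever locale is current.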


2020-01-08  Bruno Haible  <bruno@clisp.org>

	mbrtoc32: Use the system's mbrtoc32 if it exists and basically works.
	* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
	(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
	but is not working.
	* lib/mbrtoc32.c: Include hard-locale.h, <locale.h>.
	(mbrtoc32): If the char32_t encoding and the wchar_t encoding may
	differ, use the system's mbrtoc32, adding workarounds.
	* modules/mbrtoc32 (Depends-on): Add hard-locale.
	* doc/posix-functions/mbrtoc32.texi: Mention the Solaris and native
	Windows problem.
	* lib/btoc32.c: Include <stdio.h>, <string.h>.
	(btoc32): If the char32_t encoding and the wchar_t encoding may differ,
	use mbrtoc32, not btowc.
	* modules/btoc32 (Depends-on): Add mbrtoc32.
	* lib/mbsrtoc32s.c (mbsrtoc32s): If the char32_t encoding and the
	wchar_t encoding may differ, use mbrtoc32, not mbsrtowcs.
	* modules/mbsrtoc32s (Depends-on): Update conditions.
	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
	* lib/mbsnrtoc32s.c (mbsnrtoc32s): If the char32_t encoding and the
	wchar_t encoding may differ, use mbrtoc32, not mbsnrtowcs.
	* modules/mbsnrtoc32s (Depends-on): Update conditions.
	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
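
The btoc32 change described above (use mbrtoc32, not btowc, when the two encodings may differ) can be sketched as follows. The name my_btoc32 is hypothetical; the body mirrors the new branch in lib/btoc32.c from the first patch, but it is an illustration, not the gnulib source.

```c
#include <assert.h>
#include <stdio.h>    /* EOF */
#include <string.h>
#include <uchar.h>
#include <wchar.h>    /* wint_t, WEOF */

/* Hypothetical btoc32()-style conversion: map the single byte C to a
   char32_t via mbrtoc32(), so the result uses the char32_t encoding.
   Returns WEOF if C is EOF or is not a complete character by itself.  */
wint_t
my_btoc32 (int c)
{
  if (c != EOF)
    {
      mbstate_t state;
      char s[1];
      char32_t c32;

      memset (&state, '\0', sizeof state);
      s[0] = (char) (unsigned char) c;
      /* 0: the NUL character; 1: one byte consumed.  */
      if (mbrtoc32 (&c32, s, 1, &state) <= 1)
        return c32;
    }
  return WEOF;
}
```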

2020-01-08  Bruno Haible  <bruno@clisp.org>

	c32rtomb: Add tests.
	* tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c.
	* tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh.
	* tests/test-c32rtomb-w32.c: New file, based on
	tests/test-wcrtomb-w32.c.
	* tests/test-c32rtomb-w32-1.sh: New file, based on
	tests/test-wcrtomb-w32-1.sh.
	* tests/test-c32rtomb-w32-2.sh: New file, based on
	tests/test-wcrtomb-w32-2.sh.
	* tests/test-c32rtomb-w32-3.sh: New file, based on
	tests/test-wcrtomb-w32-3.sh.
	* tests/test-c32rtomb-w32-4.sh: New file, based on
	tests/test-wcrtomb-w32-4.sh.
	* tests/test-c32rtomb-w32-5.sh: New file, based on
	tests/test-wcrtomb-w32-5.sh.
	* tests/test-c32rtomb-w32-6.sh: New file, based on
	tests/test-wcrtomb-w32-6.sh.
	* tests/test-c32rtomb-w32-7.sh: New file, based on
	tests/test-wcrtomb-w32-7.sh.
	* modules/c32rtomb-tests: New file.
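
A property worth checking alongside such tests is that c32rtomb() inverts mbrtoc32(); this is why m4/c32rtomb.m4 replaces c32rtomb whenever mbrtoc32 is replaced. A minimal round-trip check, assuming a stateless locale encoding, could look like this (c32_roundtrip_ok is a hypothetical name, not one of the new test files):

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stdbool.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical check: decode one character from S with mbrtoc32(),
   re-encode it with c32rtomb(), and verify that the same bytes come back.
   This holds only if both functions agree on the char32_t encoding.  */
bool
c32_roundtrip_ok (const char *s, size_t n)
{
  assert (s != NULL && n > 0);
  mbstate_t state;
  char32_t c32;
  char buf[MB_LEN_MAX];

  memset (&state, '\0', sizeof state);
  size_t consumed = mbrtoc32 (&c32, s, n, &state);
  if (consumed == (size_t) -1 || consumed == (size_t) -2
      || consumed == (size_t) -3)
    return false;
  if (consumed == 0)
    consumed = 1;   /* The NUL character occupies one byte.  */

  memset (&state, '\0', sizeof state);
  size_t produced = c32rtomb (buf, c32, &state);
  return produced == consumed && memcmp (buf, s, consumed) == 0;
}
```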

	c32rtomb: New module.
	* lib/uchar.in.h (c32rtomb): New declaration.
	* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
	* m4/c32rtomb.m4: New file.
	* m4/uchar.m4 (gl_UCHAR_H): Test whether c32rtomb is declared.
	(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_C32RTOMB, HAVE_C32RTOMB,
	REPLACE_C32RTOMB.
	* modules/uchar (Makefile.am): Substitute GNULIB_C32RTOMB,
	HAVE_C32RTOMB, REPLACE_C32RTOMB.
	* modules/c32rtomb: New file.
	* tests/test-uchar-c++.cc: Test the signature of c32rtomb.
	* doc/posix-functions/c32rtomb.texi: Document the new module.
	* doc/posix-functions/wcrtomb.texi: Mention the new module.
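
When char32_t is wider than wchar_t (_GL_LARGE_CHAR32_T) and the locale encoding is UTF-8, the new lib/c32rtomb.c encodes the code point directly instead of going through wcrtomb(). The standalone sketch below follows the same case analysis (reject surrogates and values above U+10FFFF with EILSEQ), but it uses a lead-byte table instead of the fall-through switch. The function name utf8_from_c32 is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <uchar.h>

/* Encode the code point WC as UTF-8 into S (room for 4 bytes needed).
   Returns the number of bytes written, or (size_t) -1 with errno set to
   EILSEQ for surrogates (U+D800..U+DFFF) and values above U+10FFFF.  */
size_t
utf8_from_c32 (char *s, char32_t wc)
{
  /* Lead-byte marker indexed by sequence length: 0xxxxxxx, 110xxxxx,
     1110xxxx, 11110xxx.  Index 0 is unused.  */
  static const unsigned char lead[5] = { 0x00, 0x00, 0xc0, 0xe0, 0xf0 };
  int count;

  if (wc < 0x80)
    count = 1;
  else if (wc < 0x800)
    count = 2;
  else if (wc < 0x10000)
    {
      if (wc >= 0xd800 && wc < 0xe000)
        {
          errno = EILSEQ;
          return (size_t) -1;
        }
      count = 3;
    }
  else if (wc < 0x110000)
    count = 4;
  else
    {
      errno = EILSEQ;
      return (size_t) -1;
    }

  /* Fill continuation bytes (10xxxxxx) from the end, six bits each.  */
  for (int i = count - 1; i > 0; i--)
    {
      s[i] = (char) (0x80 | (wc & 0x3f));
      wc >>= 6;
    }
  s[0] = (char) (lead[count] | wc);
  return (size_t) count;
}
```

For example, U+20AC (the euro sign) yields the three bytes E2 82 AC.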

2020-01-08  Bruno Haible  <bruno@clisp.org>

	c32tob: Make consistent with mbrtoc32.
	* lib/c32tob.c: Include <stdio.h>, <string.h>, <wchar.h>.
	(c32tob): If the char32_t encoding and the wchar_t encoding may differ,
	use c32rtomb, not wctob.
	* modules/c32tob (Files): Add m4/mbrtoc32.m4.
	(Depends-on): Add c32rtomb.
	(configure.ac): Require gl_MBRTOC32_SANITYCHECK.
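
The c32tob change (use c32rtomb, not wctob, when the encodings may differ) amounts to asking whether c32rtomb() produces exactly one byte. A hedged sketch, with the hypothetical name my_c32tob:

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* EOF */
#include <string.h>
#include <uchar.h>

/* Hypothetical c32tob()-style conversion: return the single-byte
   representation of WC in the current locale, or EOF if it has none.
   Going through c32rtomb() keeps the result consistent with the
   char32_t values that mbrtoc32() produces.  */
int
my_c32tob (char32_t wc)
{
  mbstate_t state;
  char buf[MB_LEN_MAX];

  memset (&state, '\0', sizeof state);
  if (c32rtomb (buf, wc, &state) == 1)
    return (unsigned char) buf[0];
  return EOF;
}
```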


[-- Attachment #2: 0001-mbrtoc32-Use-the-system-s-mbrtoc32-if-it-exists-and-.patch --]
[-- Type: text/x-patch, Size: 14544 bytes --]

From 9be236d67f3d78235c5cbe4381c5dd7b3cddb179 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Thu, 9 Jan 2020 01:47:17 +0100
Subject: [PATCH 1/4] mbrtoc32: Use the system's mbrtoc32 if it exists and
 basically works.

* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
but is not working.
* lib/mbrtoc32.c: Include hard-locale.h, <locale.h>.
(mbrtoc32): If the char32_t encoding and the wchar_t encoding may
differ, use the system's mbrtoc32, adding workarounds.
* modules/mbrtoc32 (Depends-on): Add hard-locale.
* doc/posix-functions/mbrtoc32.texi: Mention the Solaris and native
Windows problem.
* lib/btoc32.c: Include <stdio.h>, <string.h>.
(btoc32): If the char32_t encoding and the wchar_t encoding may differ,
use mbrtoc32, not btowc.
* modules/btoc32 (Depends-on): Add mbrtoc32.
* lib/mbsrtoc32s.c (mbsrtoc32s): If the char32_t encoding and the
wchar_t encoding may differ, use mbrtoc32, not mbsrtowcs.
* modules/mbsrtoc32s (Depends-on): Update conditions.
(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
* lib/mbsnrtoc32s.c (mbsnrtoc32s): If the char32_t encoding and the
wchar_t encoding may differ, use mbrtoc32, not mbsnrtowcs.
* modules/mbsnrtoc32s (Depends-on): Update conditions.
(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
---
 ChangeLog                         |  25 ++++++++++
 doc/posix-functions/mbrtoc32.texi |   4 ++
 lib/btoc32.c                      |  20 ++++++++
 lib/mbrtoc32.c                    |  53 ++++++++++++++------
 lib/mbsnrtoc32s.c                 |   4 +-
 lib/mbsrtoc32s.c                  |   4 +-
 m4/mbrtoc32.m4                    | 102 +++++++++++++++++++++++++++++++++++++-
 modules/btoc32                    |   1 +
 modules/mbrtoc32                  |   1 +
 modules/mbsnrtoc32s               |  10 ++--
 modules/mbsrtoc32s                |   8 ++-
 11 files changed, 204 insertions(+), 28 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index ea35e7e..4b5a419 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,28 @@
+2020-01-08  Bruno Haible  <bruno@clisp.org>
+
+	mbrtoc32: Use the system's mbrtoc32 if it exists and basically works.
+	* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
+	(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
+	but is not working.
+	* lib/mbrtoc32.c: Include hard-locale.h, <locale.h>.
+	(mbrtoc32): If the char32_t encoding and the wchar_t encoding may
+	differ, use the system's mbrtoc32, adding workarounds.
+	* modules/mbrtoc32 (Depends-on): Add hard-locale.
+	* doc/posix-functions/mbrtoc32.texi: Mention the Solaris and native
+	Windows problem.
+	* lib/btoc32.c: Include <stdio.h>, <string.h>.
+	(btoc32): If the char32_t encoding and the wchar_t encoding may differ,
+	use mbrtoc32, not btowc.
+	* modules/btoc32 (Depends-on): Add mbrtoc32.
+	* lib/mbsrtoc32s.c (mbsrtoc32s): If the char32_t encoding and the
+	wchar_t encoding may differ, use mbrtoc32, not mbsrtowcs.
+	* modules/mbsrtoc32s (Depends-on): Update conditions.
+	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
+	* lib/mbsnrtoc32s.c (mbsnrtoc32s): If the char32_t encoding and the
+	wchar_t encoding may differ, use mbrtoc32, not mbsnrtowcs.
+	* modules/mbsnrtoc32s (Depends-on): Update conditions.
+	(configure.ac): Compile mbsrtoc32s-state.c unconditionally.
+
 2020-01-07  Bruno Haible  <bruno@clisp.org>
 
 	wcrtomb: Make multithread-safe, except possibly on IRIX.
diff --git a/doc/posix-functions/mbrtoc32.texi b/doc/posix-functions/mbrtoc32.texi
index 1aa15a3..9789bef 100644
--- a/doc/posix-functions/mbrtoc32.texi
+++ b/doc/posix-functions/mbrtoc32.texi
@@ -17,6 +17,10 @@ glibc 2.23.
 This function returns 0 instead of @code{(size_t) -2} when the input
 is empty:
 glibc 2.19.
+@item
+This function does not recognize multibyte sequences that @code{mbrtowc}
+recognizes on some platforms:
+Solaris 11.4, mingw, MSVC 14.
 @end itemize
 
 Portability problems not fixed by Gnulib:
diff --git a/lib/btoc32.c b/lib/btoc32.c
index 8b27875..d8ce087 100644
--- a/lib/btoc32.c
+++ b/lib/btoc32.c
@@ -21,10 +21,30 @@
 /* Specification.  */
 #include <uchar.h>
 
+#include <stdio.h>
+#include <string.h>
+
 wint_t
 btoc32 (int c)
 {
+#if HAVE_WORKING_MBRTOC32 && !defined __GLIBC__
+  /* The char32_t encoding of a multibyte character may be different than its
+     wchar_t encoding.  */
+  if (c != EOF)
+    {
+      mbstate_t state;
+      char s[1];
+      char32_t wc;
+
+      memset (&state, '\0', sizeof (mbstate_t));
+      s[0] = (unsigned char) c;
+      if (mbrtoc32 (&wc, s, 1, &state) <= 1)
+        return wc;
+    }
+  return WEOF;
+#else
   /* In all known locale encodings, unibyte characters correspond only to
      characters in the BMP.  */
   return btowc (c);
+#endif
 }
diff --git a/lib/mbrtoc32.c b/lib/mbrtoc32.c
index f2cf71e..facf28b 100644
--- a/lib/mbrtoc32.c
+++ b/lib/mbrtoc32.c
@@ -24,13 +24,13 @@
 #include <errno.h>
 #include <stdlib.h>
 
-# ifndef FALLTHROUGH
-#  if __GNUC__ < 7
-#   define FALLTHROUGH ((void) 0)
-#  else
-#   define FALLTHROUGH __attribute__ ((__fallthrough__))
-#  endif
+#ifndef FALLTHROUGH
+# if __GNUC__ < 7
+#  define FALLTHROUGH ((void) 0)
+# else
+#  define FALLTHROUGH __attribute__ ((__fallthrough__))
 # endif
+#endif
 
 #if GNULIB_defined_mbstate_t /* AIX, IRIX */
 /* Implement mbrtoc32() on top of mbtowc() for the non-UTF-8 locales
@@ -74,17 +74,23 @@ mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
 
 #else /* glibc, macOS, FreeBSD, NetBSD, OpenBSD, HP-UX, Solaris, Cygwin, mingw, MSVC, Minix, Android */
 
-/* Implement mbrtoc32() based on mbrtowc().  */
+/* Implement mbrtoc32() based on the original mbrtoc32() or on mbrtowc().  */
 
 # include <wchar.h>
 
 # include "localcharset.h"
 # include "streq.h"
 
+# if MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ
+#  include "hard-locale.h"
+#  include <locale.h>
+# endif
+
 static mbstate_t internal_state;
 
 size_t
 mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
+# undef mbrtoc32
 {
   /* It's simpler to handle the case s == NULL upfront, than to worry about
      this case later, before every test of pwc and n.  */
@@ -103,7 +109,31 @@ mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
   if (ps == NULL)
     ps = &internal_state;
 
-# if _GL_LARGE_CHAR32_T
+# if HAVE_WORKING_MBRTOC32
+  /* mbrtoc32() may produce different values for wc than mbrtowc().  Therefore
+     use mbrtoc32().  */
+
+#  if defined _WIN32 && !defined __CYGWIN__
+  char32_t wc;
+  size_t ret = mbrtoc32 (&wc, s, n, ps);
+  if (ret < (size_t) -2 && pwc != NULL)
+    *pwc = wc;
+#  else
+  size_t ret = mbrtoc32 (pwc, s, n, ps);
+#  endif
+
+#  if MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ
+  if ((size_t) -2 <= ret && n != 0 && ! hard_locale (LC_CTYPE))
+    {
+      if (pwc != NULL)
+        *pwc = (unsigned char) *s;
+      return 1;
+    }
+#  endif
+
+  return ret;
+
+# elif _GL_LARGE_CHAR32_T
 
   /* Special-case all encodings that may produce wide character values
      > WCHAR_MAX.  */
@@ -209,12 +239,7 @@ mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
 
 # else
 
-  /* char32_t and wchar_t are equivalent.
-     Two implementations are possible:
-       - We can call the original mbrtoc32 (if it exists) and handle
-         MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ.
-       - We can call mbrtowc.
-     The latter is simpler.   */
+  /* char32_t and wchar_t are equivalent.  Use mbrtowc().  */
   wchar_t wc;
   size_t ret = mbrtowc (&wc, s, n, ps);
   if (ret < (size_t) -2 && pwc != NULL)
diff --git a/lib/mbsnrtoc32s.c b/lib/mbsnrtoc32s.c
index 7ba0415..c0f6e1f 100644
--- a/lib/mbsnrtoc32s.c
+++ b/lib/mbsnrtoc32s.c
@@ -22,7 +22,9 @@
 
 #include <wchar.h>
 
-#if _GL_LARGE_CHAR32_T
+#if (HAVE_WORKING_MBRTOC32 && !defined __GLIBC__) || _GL_LARGE_CHAR32_T
+/* The char32_t encoding of a multibyte character may be different than its
+   wchar_t encoding, or char32_t is wider than wchar_t.  */
 
 /* For Cygwin >= 1.7 it would be possible to speed this up a bit by cutting
    the source into chunks, calling mbsnrtowcs on a chunk, then u16_to_u32 on
diff --git a/lib/mbsrtoc32s.c b/lib/mbsrtoc32s.c
index 432ffaf..8887ddf 100644
--- a/lib/mbsrtoc32s.c
+++ b/lib/mbsrtoc32s.c
@@ -22,7 +22,9 @@
 
 #include <wchar.h>
 
-#if _GL_LARGE_CHAR32_T
+#if (HAVE_WORKING_MBRTOC32 && !defined __GLIBC__) || _GL_LARGE_CHAR32_T
+/* The char32_t encoding of a multibyte character may be different than its
+   wchar_t encoding, or char32_t is wider than wchar_t.  */
 
 # include <errno.h>
 # include <limits.h>
diff --git a/m4/mbrtoc32.m4 b/m4/mbrtoc32.m4
index 5039fc7..3dee900 100644
--- a/m4/mbrtoc32.m4
+++ b/m4/mbrtoc32.m4
@@ -1,4 +1,4 @@
-# mbrtoc32.m4 serial 1
+# mbrtoc32.m4 serial 2
 dnl Copyright (C) 2014-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -11,6 +11,8 @@ AC_DEFUN([gl_FUNC_MBRTOC32],
   AC_REQUIRE([AC_TYPE_MBSTATE_T])
   gl_MBSTATE_T_BROKEN
 
+  AC_REQUIRE([gl_MBRTOC32_SANITYCHECK])
+
   AC_CHECK_FUNCS_ONCE([mbrtoc32])
   if test $ac_cv_func_mbrtoc32 = no; then
     HAVE_MBRTOC32=0
@@ -35,6 +37,9 @@ AC_DEFUN([gl_FUNC_MBRTOC32],
            ;;
       esac
     fi
+    if test $HAVE_WORKING_MBRTOC32 = 0; then
+      REPLACE_MBRTOC32=1
+    fi
   fi
 ])
 
@@ -111,6 +116,101 @@ AC_DEFUN([gl_MBRTOC32_C_LOCALE],
     ])
 ])
 
+dnl Test whether mbrtoc32 works at least as well as mbrtowc.
+dnl Result is HAVE_WORKING_MBRTOC32.
+
+AC_DEFUN([gl_MBRTOC32_SANITYCHECK],
+[
+  AC_REQUIRE([AC_PROG_CC])
+  AC_CHECK_FUNCS_ONCE([mbrtoc32])
+  AC_REQUIRE([gt_LOCALE_FR])
+  AC_REQUIRE([gt_LOCALE_ZH_CN])
+  AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles
+  if test $ac_cv_func_mbrtoc32 = no; then
+    HAVE_WORKING_MBRTOC32=0
+  else
+    AC_CACHE_CHECK([whether mbrtoc32 works as well as mbrtowc],
+      [gl_cv_func_mbrtoc32_sanitycheck],
+      [
+        dnl Initial guess, used when cross-compiling or when no suitable locale
+        dnl is present.
+changequote(,)dnl
+        case "$host_os" in
+                             # Guess no on Solaris, native Windows.
+          solaris* | mingw*) gl_cv_func_mbrtoc32_sanitycheck="guessing no" ;;
+                             # Guess yes otherwise.
+          *)                 gl_cv_func_mbrtoc32_sanitycheck="guessing yes" ;;
+        esac
+changequote([,])dnl
+        if test $LOCALE_FR != none || test $LOCALE_ZH_CN != none; then
+          AC_RUN_IFELSE(
+            [AC_LANG_SOURCE([[
+#include <locale.h>
+#include <stdlib.h>
+#include <string.h>
+/* Tru64 with Desktop Toolkit C has a bug: <stdio.h> must be included before
+   <wchar.h>.
+   BSD/OS 4.0.1 has a bug: <stddef.h>, <stdio.h> and <time.h> must be
+   included before <wchar.h>.  */
+#include <stddef.h>
+#include <stdio.h>
+#include <time.h>
+#include <wchar.h>
+#include <uchar.h>
+int main ()
+{
+  int result = 0;
+  /* This fails on native Windows:
+     mbrtoc32 returns (size_t)-1.
+     mbrtowc returns 1 (correct).  */
+  if (setlocale (LC_ALL, "$LOCALE_FR") != NULL)
+    {
+      mbstate_t state;
+      wchar_t wc = (wchar_t) 0xBADFACE;
+      memset (&state, '\0', sizeof (mbstate_t));
+      if (mbrtowc (&wc, "\374", 1, &state) == 1)
+        {
+          char32_t c32 = (char32_t) 0xBADFACE;
+          memset (&state, '\0', sizeof (mbstate_t));
+          if (mbrtoc32 (&c32, "\374", 1, &state) != 1)
+            result |= 1;
+        }
+    }
+  /* This fails on Solaris 11.4:
+     mbrtoc32 returns (size_t)-1.
+     mbrtowc returns 4 (correct).  */
+  if (setlocale (LC_ALL, "$LOCALE_ZH_CN") != NULL)
+    {
+      mbstate_t state;
+      wchar_t wc = (wchar_t) 0xBADFACE;
+      memset (&state, '\0', sizeof (mbstate_t));
+      if (mbrtowc (&wc, "\224\071\375\067", 4, &state) == 4)
+        {
+          char32_t c32 = (char32_t) 0xBADFACE;
+          memset (&state, '\0', sizeof (mbstate_t));
+          if (mbrtoc32 (&c32, "\224\071\375\067", 4, &state) != 4)
+            result |= 2;
+        }
+    }
+  return result;
+}]])],
+            [gl_cv_func_mbrtoc32_sanitycheck=yes],
+            [gl_cv_func_mbrtoc32_sanitycheck=no],
+            [:])
+        fi
+      ])
+    case "$gl_cv_func_mbrtoc32_sanitycheck" in
+      *yes)
+        HAVE_WORKING_MBRTOC32=1
+        AC_DEFINE([HAVE_WORKING_MBRTOC32], [1],
+          [Define if the mbrtoc32 function basically works.])
+        ;;
+      *) HAVE_WORKING_MBRTOC32=0 ;;
+    esac
+  fi
+  AC_SUBST([HAVE_WORKING_MBRTOC32])
+])
+
 # Prerequisites of lib/mbrtoc32.c and lib/lc-charset-dispatch.c.
 AC_DEFUN([gl_PREREQ_MBRTOC32], [
   :
diff --git a/modules/btoc32 b/modules/btoc32
index 5e5d4a9..caf36d3 100644
--- a/modules/btoc32
+++ b/modules/btoc32
@@ -6,6 +6,7 @@ lib/btoc32.c
 
 Depends-on:
 uchar
+mbrtoc32
 btowc
 
 configure.ac:
diff --git a/modules/mbrtoc32 b/modules/mbrtoc32
index 2575394..cf41846 100644
--- a/modules/mbrtoc32
+++ b/modules/mbrtoc32
@@ -18,6 +18,7 @@ m4/visibility.m4
 
 Depends-on:
 uchar
+hard-locale     [{ test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1; } && test $REPLACE_MBSTATE_T = 0]
 mbrtowc         [{ test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1; } && test $REPLACE_MBSTATE_T = 0]
 localcharset    [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
 streq           [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
diff --git a/modules/mbsnrtoc32s b/modules/mbsnrtoc32s
index 44784d8..ac464a8 100644
--- a/modules/mbsnrtoc32s
+++ b/modules/mbsnrtoc32s
@@ -10,16 +10,14 @@ Depends-on:
 uchar
 wchar
 verify
-mbrtoc32        [test $SMALL_WCHAR_T = 1]
-minmax          [test $SMALL_WCHAR_T = 1]
-strnlen1        [test $SMALL_WCHAR_T = 1]
+mbrtoc32
+minmax
+strnlen1
 mbsnrtowcs      [test $SMALL_WCHAR_T = 0]
 
 configure.ac:
 AC_REQUIRE([gl_UCHAR_H])
-if test $SMALL_WCHAR_T = 1; then
-  AC_LIBOBJ([mbsrtoc32s-state])
-fi
+AC_LIBOBJ([mbsrtoc32s-state])
 gl_UCHAR_MODULE_INDICATOR([mbsnrtoc32s])
 
 Makefile.am:
diff --git a/modules/mbsrtoc32s b/modules/mbsrtoc32s
index e7e5ee2..64892cf 100644
--- a/modules/mbsrtoc32s
+++ b/modules/mbsrtoc32s
@@ -10,15 +10,13 @@ Depends-on:
 uchar
 wchar
 verify
-mbrtoc32        [test $SMALL_WCHAR_T = 1]
-strnlen1        [test $SMALL_WCHAR_T = 1]
+mbrtoc32
+strnlen1
 mbsrtowcs       [test $SMALL_WCHAR_T = 0]
 
 configure.ac:
 AC_REQUIRE([gl_UCHAR_H])
-if test $SMALL_WCHAR_T = 1; then
-  AC_LIBOBJ([mbsrtoc32s-state])
-fi
+AC_LIBOBJ([mbsrtoc32s-state])
 gl_UCHAR_MODULE_INDICATOR([mbsrtoc32s])
 
 Makefile.am:
-- 
2.7.4


[-- Attachment #3: 0002-c32rtomb-New-module.patch --]
[-- Type: text/x-patch, Size: 14833 bytes --]

From 4ec96253823bde7488bfee4ee5d890792d6b555b Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Thu, 9 Jan 2020 01:56:35 +0100
Subject: [PATCH 2/4] c32rtomb: New module.

* lib/uchar.in.h (c32rtomb): New declaration.
* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
* m4/c32rtomb.m4: New file.
* m4/uchar.m4 (gl_UCHAR_H): Test whether c32rtomb is declared.
(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_C32RTOMB, HAVE_C32RTOMB,
REPLACE_C32RTOMB.
* modules/uchar (Makefile.am): Substitute GNULIB_C32RTOMB,
HAVE_C32RTOMB, REPLACE_C32RTOMB.
* modules/c32rtomb: New file.
* tests/test-uchar-c++.cc: Test the signature of c32rtomb.
* doc/posix-functions/c32rtomb.texi: Document the new module.
* doc/posix-functions/wcrtomb.texi: Mention the new module.
---
 ChangeLog                         |  16 +++++
 doc/posix-functions/c32rtomb.texi |  11 ++--
 doc/posix-functions/wcrtomb.texi  |   7 ++-
 lib/c32rtomb.c                    | 124 ++++++++++++++++++++++++++++++++++++++
 lib/uchar.in.h                    |  25 ++++++++
 m4/c32rtomb.m4                    |  55 +++++++++++++++++
 m4/uchar.m4                       |   7 ++-
 modules/c32rtomb                  |  32 ++++++++++
 modules/uchar                     |   3 +
 tests/test-uchar-c++.cc           |   5 ++
 10 files changed, 277 insertions(+), 8 deletions(-)
 create mode 100644 lib/c32rtomb.c
 create mode 100644 m4/c32rtomb.m4
 create mode 100644 modules/c32rtomb

diff --git a/ChangeLog b/ChangeLog
index 4b5a419..3ad99ff 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,21 @@
 2020-01-08  Bruno Haible  <bruno@clisp.org>
 
+	c32rtomb: New module.
+	* lib/uchar.in.h (c32rtomb): New declaration.
+	* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
+	* m4/c32rtomb.m4: New file.
+	* m4/uchar.m4 (gl_UCHAR_H): Test whether c32rtomb is declared.
+	(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_C32RTOMB, HAVE_C32RTOMB,
+	REPLACE_C32RTOMB.
+	* modules/uchar (Makefile.am): Substitute GNULIB_C32RTOMB,
+	HAVE_C32RTOMB, REPLACE_C32RTOMB.
+	* modules/c32rtomb: New file.
+	* tests/test-uchar-c++.cc: Test the signature of c32rtomb.
+	* doc/posix-functions/c32rtomb.texi: Document the new module.
+	* doc/posix-functions/wcrtomb.texi: Mention the new module.
+
+2020-01-08  Bruno Haible  <bruno@clisp.org>
+
 	mbrtoc32: Use the system's mbrtoc32 if it exists and basically works.
 	* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): New macro.
 	(gl_FUNC_MBRTOC32): Require it. Set REPLACE_MBRTOC32 if mbrtoc32 exists
diff --git a/doc/posix-functions/c32rtomb.texi b/doc/posix-functions/c32rtomb.texi
index 392bbe9..4a1a617 100644
--- a/doc/posix-functions/c32rtomb.texi
+++ b/doc/posix-functions/c32rtomb.texi
@@ -2,15 +2,18 @@
 @section @code{c32rtomb}
 @findex c32rtomb
 
-Gnulib module: ---
+Gnulib module: c32rtomb
 
 Portability problems fixed by Gnulib:
 @itemize
+@item
+This function is missing on most non-glibc platforms:
+glibc 2.15, Mac OS X 10.5, FreeBSD 6.4, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin, mingw, MSVC 9, Android 4.4.
+@item
+This function returns 0 when the first argument is NULL in some locales on some platforms:
+AIX 7.2.
 @end itemize
 
 Portability problems not fixed by Gnulib:
 @itemize
-@item
-This function is missing on most non-glibc platforms:
-glibc 2.15, Mac OS X 10.5, FreeBSD 6.4, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin, mingw, MSVC 9, Android 4.4.
 @end itemize
diff --git a/doc/posix-functions/wcrtomb.texi b/doc/posix-functions/wcrtomb.texi
index 232bea4..28b8dfe 100644
--- a/doc/posix-functions/wcrtomb.texi
+++ b/doc/posix-functions/wcrtomb.texi
@@ -25,6 +25,9 @@ MSVC 14.
 Portability problems not fixed by Gnulib:
 @itemize
 @item
-On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot
-accommodate all Unicode characters.
+On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and
+therefore cannot accommodate all Unicode characters.
+However, the ISO C11 function @code{c32rtomb}, provided by Gnulib module
+@code{c32rtomb}, operates on 32-bit wide characters and therefore does not have
+this limitation.
 @end itemize
diff --git a/lib/c32rtomb.c b/lib/c32rtomb.c
new file mode 100644
index 0000000..ba39929
--- /dev/null
+++ b/lib/c32rtomb.c
@@ -0,0 +1,124 @@
+/* Convert 32-bit wide character to multibyte character.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <bruno@clisp.org>, 2020.  */
+
+#include <config.h>
+
+/* Specification.  */
+#include <uchar.h>
+
+#include <errno.h>
+#include <wchar.h>
+
+#include "localcharset.h"
+#include "streq.h"
+
+#ifndef FALLTHROUGH
+# if __GNUC__ < 7
+#  define FALLTHROUGH ((void) 0)
+# else
+#  define FALLTHROUGH __attribute__ ((__fallthrough__))
+# endif
+#endif
+
+size_t
+c32rtomb (char *s, char32_t wc, mbstate_t *ps)
+#undef c32rtomb
+{
+#if HAVE_WORKING_MBRTOC32
+
+# if C32RTOMB_RETVAL_BUG
+  if (s == NULL)
+    /* We know the NUL wide character corresponds to the NUL character.  */
+    return 1;
+# endif
+
+  return c32rtomb (s, wc, ps);
+
+#elif _GL_LARGE_CHAR32_T
+
+  if (s == NULL)
+    return wcrtomb (NULL, 0, ps);
+  else
+    {
+      /* Special-case all encodings that may produce wide character values
+         > WCHAR_MAX.  */
+      const char *encoding = locale_charset ();
+      if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0, 0))
+        {
+          /* Special-case the UTF-8 encoding.  Assume that the wide-character
+             encoding in a UTF-8 locale is UCS-2 or, equivalently, UTF-16.  */
+          if (wc < 0x80)
+            {
+              s[0] = (unsigned char) wc;
+              return 1;
+            }
+          else
+            {
+              int count;
+
+              if (wc < 0x800)
+                count = 2;
+              else if (wc < 0x10000)
+                {
+                  if (wc < 0xd800 || wc >= 0xe000)
+                    count = 3;
+                  else
+                    {
+                      errno = EILSEQ;
+                      return (size_t)(-1);
+                    }
+                }
+              else if (wc < 0x110000)
+                count = 4;
+              else
+                {
+                  errno = EILSEQ;
+                  return (size_t)(-1);
+                }
+
+              switch (count) /* note: code falls through cases! */
+                {
+                case 4: s[3] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0x10000;
+                  FALLTHROUGH;
+                case 3: s[2] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0x800;
+                  FALLTHROUGH;
+                case 2: s[1] = 0x80 | (wc & 0x3f); wc = wc >> 6; wc |= 0xc0;
+              /*case 1:*/ s[0] = wc;
+                }
+              return count;
+            }
+        }
+      else
+        {
+          if ((wchar_t) wc == wc)
+            return wcrtomb (s, (wchar_t) wc, ps);
+          else
+            {
+              errno = EILSEQ;
+              return (size_t)(-1);
+            }
+        }
+    }
+
+#else
+
+  /* char32_t and wchar_t are equivalent.  */
+  return wcrtomb (s, (wchar_t) wc, ps);
+
+#endif
+}
diff --git a/lib/uchar.in.h b/lib/uchar.in.h
index 513fa8c..dbbfc30 100644
--- a/lib/uchar.in.h
+++ b/lib/uchar.in.h
@@ -68,6 +68,31 @@ _GL_CXXALIASWARN (btoc32);
 #endif
 
 
+/* Converts a 32-bit wide character to a multibyte character.  */
+#if @GNULIB_C32RTOMB@
+# if @REPLACE_C32RTOMB@
+#  if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+#   undef c32rtomb
+#   define c32rtomb rpl_c32rtomb
+#  endif
+_GL_FUNCDECL_RPL (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+_GL_CXXALIAS_RPL (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+# else
+#  if !@HAVE_C32RTOMB@
+_GL_FUNCDECL_SYS (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+#  endif
+_GL_CXXALIAS_SYS (c32rtomb, size_t, (char *s, char32_t wc, mbstate_t *ps));
+# endif
+_GL_CXXALIASWARN (c32rtomb);
+#elif defined GNULIB_POSIXCHECK
+# undef c32rtomb
+# if HAVE_RAW_DECL_C32RTOMB
+_GL_WARN_ON_USE (c32rtomb, "c32rtomb is not portable - "
+                 "use gnulib module c32rtomb for portability");
+# endif
+#endif
+
+
 /* Converts a 32-bit wide character to unibyte character.
    Returns the single-byte representation of WC if it exists,
    or EOF otherwise.  */
diff --git a/m4/c32rtomb.m4 b/m4/c32rtomb.m4
new file mode 100644
index 0000000..4cf0e4d
--- /dev/null
+++ b/m4/c32rtomb.m4
@@ -0,0 +1,55 @@
+# c32rtomb.m4 serial 1
+dnl Copyright (C) 2020 Free Software Foundation, Inc.
+dnl This file is free software; the Free Software Foundation
+dnl gives unlimited permission to copy and/or distribute it,
+dnl with or without modifications, as long as this notice is preserved.
+
+AC_DEFUN([gl_FUNC_C32RTOMB],
+[
+  AC_REQUIRE([gl_UCHAR_H_DEFAULTS])
+
+  AC_REQUIRE([gl_MBRTOC32_SANITYCHECK])
+
+  AC_CHECK_FUNCS_ONCE([c32rtomb])
+  if test $ac_cv_func_c32rtomb = no; then
+    HAVE_C32RTOMB=0
+  else
+    dnl When we override mbrtoc32, redefining the meaning of the char32_t
+    dnl values, we need to override c32rtomb as well, for consistency.
+    if test $HAVE_WORKING_MBRTOC32 = 0; then
+      REPLACE_C32RTOMB=1
+    fi
+    AC_CACHE_CHECK([whether c32rtomb return value is correct],
+      [gl_cv_func_c32rtomb_retval],
+      [
+        dnl Initial guess, used when cross-compiling.
+changequote(,)dnl
+        case "$host_os" in
+          # Guess no on AIX.
+          aix*) gl_cv_func_c32rtomb_retval="guessing no" ;;
+          # Guess yes otherwise.
+          *)    gl_cv_func_c32rtomb_retval="guessing yes" ;;
+        esac
+changequote([,])dnl
+        AC_RUN_IFELSE(
+          [AC_LANG_SOURCE([[
+#include <uchar.h>
+int main ()
+{
+  int result = 0;
+  if (c32rtomb (NULL, 0, NULL) != 1)
+    result |= 1;
+  return result;
+}]])],
+          [gl_cv_func_c32rtomb_retval=yes],
+          [gl_cv_func_c32rtomb_retval=no],
+          [:])
+      ])
+    case "$gl_cv_func_c32rtomb_retval" in
+      *yes) ;;
+      *) AC_DEFINE([C32RTOMB_RETVAL_BUG], [1],
+           [Define if the c32rtomb function has an incorrect return value.])
+         REPLACE_C32RTOMB=1 ;;
+    esac
+  fi
+])
diff --git a/m4/uchar.m4 b/m4/uchar.m4
index 0b5c662..be71196 100644
--- a/m4/uchar.m4
+++ b/m4/uchar.m4
@@ -1,4 +1,4 @@
-# uchar.m4 serial 8
+# uchar.m4 serial 9
 dnl Copyright (C) 2019-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -33,7 +33,7 @@ AC_DEFUN_ONCE([gl_UCHAR_H],
   dnl corresponding gnulib module is not in use, and which is not
   dnl guaranteed by C11.
   gl_WARN_ON_USE_PREPARE([[#include <uchar.h>
-    ]], [mbrtoc32])
+    ]], [c32rtomb mbrtoc32])
 ])
 
 AC_DEFUN([gl_UCHAR_MODULE_INDICATOR],
@@ -48,12 +48,15 @@ AC_DEFUN([gl_UCHAR_MODULE_INDICATOR],
 AC_DEFUN([gl_UCHAR_H_DEFAULTS],
 [
   GNULIB_BTOC32=0;           AC_SUBST([GNULIB_BTOC32])
+  GNULIB_C32RTOMB=0;         AC_SUBST([GNULIB_C32RTOMB])
   GNULIB_C32TOB=0;           AC_SUBST([GNULIB_C32TOB])
   GNULIB_MBRTOC32=0;         AC_SUBST([GNULIB_MBRTOC32])
   GNULIB_MBSNRTOC32S=0;      AC_SUBST([GNULIB_MBSNRTOC32S])
   GNULIB_MBSRTOC32S=0;       AC_SUBST([GNULIB_MBSRTOC32S])
   GNULIB_MBSTOC32S=0;        AC_SUBST([GNULIB_MBSTOC32S])
   dnl Assume proper GNU behavior unless another module says otherwise.
+  HAVE_C32RTOMB=1;           AC_SUBST([HAVE_C32RTOMB])
   HAVE_MBRTOC32=1;           AC_SUBST([HAVE_MBRTOC32])
+  REPLACE_C32RTOMB=0;        AC_SUBST([REPLACE_C32RTOMB])
   REPLACE_MBRTOC32=0;        AC_SUBST([REPLACE_MBRTOC32])
 ])
diff --git a/modules/c32rtomb b/modules/c32rtomb
new file mode 100644
index 0000000..ea227df
--- /dev/null
+++ b/modules/c32rtomb
@@ -0,0 +1,32 @@
+Description:
+c32rtomb() function: convert 32-bit wide character to multibyte character.
+
+Files:
+lib/c32rtomb.c
+m4/c32rtomb.m4
+m4/mbrtoc32.m4
+
+Depends-on:
+uchar
+wchar           [test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1]
+wcrtomb         [test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1]
+localcharset    [{ test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1; } && test $SMALL_WCHAR_T = 1]
+streq           [{ test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1; } && test $SMALL_WCHAR_T = 1]
+
+configure.ac:
+gl_FUNC_C32RTOMB
+if test $HAVE_C32RTOMB = 0 || test $REPLACE_C32RTOMB = 1; then
+  AC_LIBOBJ([c32rtomb])
+fi
+gl_UCHAR_MODULE_INDICATOR([c32rtomb])
+
+Makefile.am:
+
+Include:
+<uchar.h>
+
+License:
+LGPLv2+
+
+Maintainer:
+Bruno Haible
diff --git a/modules/uchar b/modules/uchar
index 29bc7ae..cab4518 100644
--- a/modules/uchar
+++ b/modules/uchar
@@ -29,12 +29,15 @@ uchar.h: uchar.in.h $(top_builddir)/config.status $(CXXDEFS_H)
 	      -e 's|@''NEXT_UCHAR_H''@|$(NEXT_UCHAR_H)|g' \
 	      -e 's|@''SMALL_WCHAR_T''@|$(SMALL_WCHAR_T)|g' \
 	      -e 's/@''GNULIB_BTOC32''@/$(GNULIB_BTOC32)/g' \
+	      -e 's/@''GNULIB_C32RTOMB''@/$(GNULIB_C32RTOMB)/g' \
 	      -e 's/@''GNULIB_C32TOB''@/$(GNULIB_C32TOB)/g' \
 	      -e 's/@''GNULIB_MBRTOC32''@/$(GNULIB_MBRTOC32)/g' \
 	      -e 's/@''GNULIB_MBSNRTOC32S''@/$(GNULIB_MBSNRTOC32S)/g' \
 	      -e 's/@''GNULIB_MBSRTOC32S''@/$(GNULIB_MBSRTOC32S)/g' \
 	      -e 's/@''GNULIB_MBSTOC32S''@/$(GNULIB_MBSTOC32S)/g' \
+	      -e 's|@''HAVE_C32RTOMB''@|$(HAVE_C32RTOMB)|g' \
 	      -e 's|@''HAVE_MBRTOC32''@|$(HAVE_MBRTOC32)|g' \
+	      -e 's|@''REPLACE_C32RTOMB''@|$(REPLACE_C32RTOMB)|g' \
 	      -e 's|@''REPLACE_MBRTOC32''@|$(REPLACE_MBRTOC32)|g' \
 	      -e '/definitions of _GL_FUNCDECL_RPL/r $(CXXDEFS_H)' \
 	      < $(srcdir)/uchar.in.h; \
diff --git a/tests/test-uchar-c++.cc b/tests/test-uchar-c++.cc
index 3e71c89..ed45da2 100644
--- a/tests/test-uchar-c++.cc
+++ b/tests/test-uchar-c++.cc
@@ -28,6 +28,11 @@
 SIGNATURE_CHECK (GNULIB_NAMESPACE::btoc32, wint_t, (int));
 #endif
 
+#if GNULIB_TEST_C32RTOMB
+SIGNATURE_CHECK (GNULIB_NAMESPACE::c32rtomb, size_t,
+                 (char *, char32_t , mbstate_t *));
+#endif
+
 #if GNULIB_TEST_C32TOB
 SIGNATURE_CHECK (GNULIB_NAMESPACE::c32tob, int, (wint_t));
 #endif
-- 
2.7.4

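The module description says that c32rtomb() converts one 32-bit wide character to a multibyte character in the locale's encoding, just as wcrtomb() does for a wchar_t. Here is a rough illustration of calling it in a loop, assuming a C11 <uchar.h> from either libc or gnulib. The sketch encodes a NUL-terminated char32_t string into a caller-supplied buffer. The helper c32s_to_mbs is hypothetical and is not part of gnulib.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <uchar.h>    /* char32_t, c32rtomb, mbstate_t */
#include <wchar.h>

/* Hypothetical helper: encode the NUL-terminated char32_t string SRC into
   DST (capacity DSTSIZE bytes, including the terminating NUL), using the
   current locale's multibyte encoding.  Returns the number of bytes
   written, not counting the terminating NUL, or (size_t) -1 if a
   character is not representable or DST is too small.  */
static size_t
c32s_to_mbs (char *dst, size_t dstsize, const char32_t *src)
{
  mbstate_t state;
  size_t used = 0;

  assert (dst != NULL && src != NULL);
  memset (&state, '\0', sizeof state);
  for (;; src++)
    {
      /* c32rtomb writes at most MB_CUR_MAX <= MB_LEN_MAX bytes.  */
      char tmp[MB_LEN_MAX];
      size_t n = c32rtomb (tmp, *src, &state);

      if (n == (size_t) -1)
        return (size_t) -1;   /* EILSEQ: not representable here.  */
      if (n > dstsize - used)
        return (size_t) -1;   /* Output buffer too small.  */
      memcpy (dst + used, tmp, n);
      used += n;
      if (*src == 0)
        /* For U'\0', c32rtomb stores any shift sequence followed by a
           NUL byte; exclude that NUL from the count.  */
        return used - 1;
    }
}
```

The code converts into a temporary MB_LEN_MAX buffer first, so dst never receives a truncated multibyte sequence when the output buffer turns out to be too short.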

[-- Attachment #4: 0003-c32rtomb-Add-tests.patch --]
[-- Type: text/x-patch, Size: 23903 bytes --]

From 18f05ac59765d532823d48c061d0dcac7c55007e Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Thu, 9 Jan 2020 02:00:19 +0100
Subject: [PATCH 3/4] c32rtomb: Add tests.

* tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c.
* tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh.
* tests/test-c32rtomb-w32.c: New file, based on
tests/test-wcrtomb-w32.c.
* tests/test-c32rtomb-w32-1.sh: New file, based on
tests/test-wcrtomb-w32-1.sh.
* tests/test-c32rtomb-w32-2.sh: New file, based on
tests/test-wcrtomb-w32-2.sh.
* tests/test-c32rtomb-w32-3.sh: New file, based on
tests/test-wcrtomb-w32-3.sh.
* tests/test-c32rtomb-w32-4.sh: New file, based on
tests/test-wcrtomb-w32-4.sh.
* tests/test-c32rtomb-w32-5.sh: New file, based on
tests/test-wcrtomb-w32-5.sh.
* tests/test-c32rtomb-w32-6.sh: New file, based on
tests/test-wcrtomb-w32-6.sh.
* tests/test-c32rtomb-w32-7.sh: New file, based on
tests/test-wcrtomb-w32-7.sh.
* modules/c32rtomb-tests: New file.
---
 ChangeLog                    |  21 +++
 modules/c32rtomb-tests       |  43 ++++++
 tests/test-c32rtomb-w32-1.sh |   4 +
 tests/test-c32rtomb-w32-2.sh |   4 +
 tests/test-c32rtomb-w32-3.sh |   4 +
 tests/test-c32rtomb-w32-4.sh |   4 +
 tests/test-c32rtomb-w32-5.sh |   4 +
 tests/test-c32rtomb-w32-6.sh |   4 +
 tests/test-c32rtomb-w32-7.sh |   4 +
 tests/test-c32rtomb-w32.c    | 349 +++++++++++++++++++++++++++++++++++++++++++
 tests/test-c32rtomb.c        | 170 +++++++++++++++++++++
 tests/test-c32rtomb.sh       |  39 +++++
 12 files changed, 650 insertions(+)
 create mode 100644 modules/c32rtomb-tests
 create mode 100755 tests/test-c32rtomb-w32-1.sh
 create mode 100755 tests/test-c32rtomb-w32-2.sh
 create mode 100755 tests/test-c32rtomb-w32-3.sh
 create mode 100755 tests/test-c32rtomb-w32-4.sh
 create mode 100755 tests/test-c32rtomb-w32-5.sh
 create mode 100755 tests/test-c32rtomb-w32-6.sh
 create mode 100755 tests/test-c32rtomb-w32-7.sh
 create mode 100644 tests/test-c32rtomb-w32.c
 create mode 100644 tests/test-c32rtomb.c
 create mode 100755 tests/test-c32rtomb.sh

diff --git a/ChangeLog b/ChangeLog
index 3ad99ff..c303d41 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,26 @@
 2020-01-08  Bruno Haible  <bruno@clisp.org>
 
+	c32rtomb: Add tests.
+	* tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c.
+	* tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh.
+	* tests/test-c32rtomb-w32.c: New file, based on
+	tests/test-wcrtomb-w32.c.
+	* tests/test-c32rtomb-w32-1.sh: New file, based on
+	tests/test-wcrtomb-w32-1.sh.
+	* tests/test-c32rtomb-w32-2.sh: New file, based on
+	tests/test-wcrtomb-w32-2.sh.
+	* tests/test-c32rtomb-w32-3.sh: New file, based on
+	tests/test-wcrtomb-w32-3.sh.
+	* tests/test-c32rtomb-w32-4.sh: New file, based on
+	tests/test-wcrtomb-w32-4.sh.
+	* tests/test-c32rtomb-w32-5.sh: New file, based on
+	tests/test-wcrtomb-w32-5.sh.
+	* tests/test-c32rtomb-w32-6.sh: New file, based on
+	tests/test-wcrtomb-w32-6.sh.
+	* tests/test-c32rtomb-w32-7.sh: New file, based on
+	tests/test-wcrtomb-w32-7.sh.
+	* modules/c32rtomb-tests: New file.
+
 	c32rtomb: New module.
 	* lib/uchar.in.h (c32rtomb): New declaration.
 	* lib/c32rtomb.c: New file, based on lib/unistr/u8-uctomb-aux.c.
diff --git a/modules/c32rtomb-tests b/modules/c32rtomb-tests
new file mode 100644
index 0000000..a8d2bee
--- /dev/null
+++ b/modules/c32rtomb-tests
@@ -0,0 +1,43 @@
+Files:
+tests/test-c32rtomb.sh
+tests/test-c32rtomb.c
+tests/test-c32rtomb-w32-1.sh
+tests/test-c32rtomb-w32-2.sh
+tests/test-c32rtomb-w32-3.sh
+tests/test-c32rtomb-w32-4.sh
+tests/test-c32rtomb-w32-5.sh
+tests/test-c32rtomb-w32-6.sh
+tests/test-c32rtomb-w32-7.sh
+tests/test-c32rtomb-w32.c
+tests/signature.h
+tests/macros.h
+m4/locale-fr.m4
+m4/locale-ja.m4
+m4/locale-zh.m4
+m4/codeset.m4
+
+Depends-on:
+btoc32
+mbrtoc32
+setlocale
+localcharset
+
+configure.ac:
+gt_LOCALE_FR
+gt_LOCALE_FR_UTF8
+gt_LOCALE_JA
+gt_LOCALE_ZH_CN
+
+Makefile.am:
+TESTS += \
+  test-c32rtomb.sh \
+  test-c32rtomb-w32-1.sh test-c32rtomb-w32-2.sh test-c32rtomb-w32-3.sh \
+  test-c32rtomb-w32-4.sh test-c32rtomb-w32-5.sh test-c32rtomb-w32-6.sh \
+  test-c32rtomb-w32-7.sh
+TESTS_ENVIRONMENT += \
+  LOCALE_FR='@LOCALE_FR@' \
+  LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+  LOCALE_JA='@LOCALE_JA@' \
+  LOCALE_ZH_CN='@LOCALE_ZH_CN@'
+check_PROGRAMS += test-c32rtomb test-c32rtomb-w32
+test_c32rtomb_LDADD = $(LDADD) $(LIB_SETLOCALE)
diff --git a/tests/test-c32rtomb-w32-1.sh b/tests/test-c32rtomb-w32-1.sh
new file mode 100755
index 0000000..e797d0e
--- /dev/null
+++ b/tests/test-c32rtomb-w32-1.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP1252 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} French_France 1252
diff --git a/tests/test-c32rtomb-w32-2.sh b/tests/test-c32rtomb-w32-2.sh
new file mode 100755
index 0000000..1b63d47
--- /dev/null
+++ b/tests/test-c32rtomb-w32-2.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP1256 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} "Arabic_Saudi Arabia" 1256
diff --git a/tests/test-c32rtomb-w32-3.sh b/tests/test-c32rtomb-w32-3.sh
new file mode 100755
index 0000000..ff59a87
--- /dev/null
+++ b/tests/test-c32rtomb-w32-3.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP932 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} Japanese_Japan 932
diff --git a/tests/test-c32rtomb-w32-4.sh b/tests/test-c32rtomb-w32-4.sh
new file mode 100755
index 0000000..3cf3406
--- /dev/null
+++ b/tests/test-c32rtomb-w32-4.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP950 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} Chinese_Taiwan 950
diff --git a/tests/test-c32rtomb-w32-5.sh b/tests/test-c32rtomb-w32-5.sh
new file mode 100755
index 0000000..2174c0b
--- /dev/null
+++ b/tests/test-c32rtomb-w32-5.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP936 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} Chinese_China 936
diff --git a/tests/test-c32rtomb-w32-6.sh b/tests/test-c32rtomb-w32-6.sh
new file mode 100755
index 0000000..b7e77b2
--- /dev/null
+++ b/tests/test-c32rtomb-w32-6.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a GB18030 locale.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} Chinese_China 54936
diff --git a/tests/test-c32rtomb-w32-7.sh b/tests/test-c32rtomb-w32-7.sh
new file mode 100755
index 0000000..3c0f3db
--- /dev/null
+++ b/tests/test-c32rtomb-w32-7.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test some UTF-8 locales.
+${CHECKER} ./test-c32rtomb-w32${EXEEXT} French_France Japanese_Japan Chinese_Taiwan Chinese_China 65001
diff --git a/tests/test-c32rtomb-w32.c b/tests/test-c32rtomb-w32.c
new file mode 100644
index 0000000..18630c7
--- /dev/null
+++ b/tests/test-c32rtomb-w32.c
@@ -0,0 +1,349 @@
+/* Test of conversion of wide character to multibyte character.
+   Copyright (C) 2008-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#include <uchar.h>
+
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "localcharset.h"
+#include "macros.h"
+
+#if defined _WIN32 && !defined __CYGWIN__
+
+static int
+test_one_locale (const char *name, int codepage)
+{
+  char buf[64];
+  size_t ret;
+
+# if 1
+  /* Portable code to set the locale.  */
+  {
+    char name_with_codepage[1024];
+
+    sprintf (name_with_codepage, "%s.%d", name, codepage);
+
+    /* Set the locale.  */
+    if (setlocale (LC_ALL, name_with_codepage) == NULL)
+      return 77;
+  }
+# else
+  /* Hacky way to set a locale.codepage combination that setlocale() refuses
+     to set.  */
+  {
+    /* Codepage of the current locale, set with setlocale().
+       Not necessarily the same as GetACP().  */
+    extern __declspec(dllimport) unsigned int __lc_codepage;
+
+    /* Set the locale.  */
+    if (setlocale (LC_ALL, name) == NULL)
+      return 77;
+
+    /* Clobber the codepage and MB_CUR_MAX, both set by setlocale().  */
+    __lc_codepage = codepage;
+    switch (codepage)
+      {
+      case 1252:
+      case 1256:
+        MB_CUR_MAX = 1;
+        break;
+      case 932:
+      case 950:
+      case 936:
+        MB_CUR_MAX = 2;
+        break;
+      case 54936:
+      case 65001:
+        MB_CUR_MAX = 4;
+        break;
+      }
+
+    /* Test whether the codepage is really available.  */
+    {
+      mbstate_t state;
+      wchar_t wc;
+
+      memset (&state, '\0', sizeof (mbstate_t));
+      if (mbrtowc (&wc, " ", 1, &state) == (size_t)(-1))
+        return 77;
+    }
+  }
+# endif
+
+  /* Test NUL character.  */
+  {
+    buf[0] = 'x';
+    ret = c32rtomb (buf, 0, NULL);
+    ASSERT (ret == 1);
+    ASSERT (buf[0] == '\0');
+  }
+
+  /* Test single bytes.  */
+  {
+    int c;
+
+    for (c = 0; c < 0x100; c++)
+      switch (c)
+        {
+        case '\t': case '\v': case '\f':
+        case ' ': case '!': case '"': case '#': case '%':
+        case '&': case '\'': case '(': case ')': case '*':
+        case '+': case ',': case '-': case '.': case '/':
+        case '0': case '1': case '2': case '3': case '4':
+        case '5': case '6': case '7': case '8': case '9':
+        case ':': case ';': case '<': case '=': case '>':
+        case '?':
+        case 'A': case 'B': case 'C': case 'D': case 'E':
+        case 'F': case 'G': case 'H': case 'I': case 'J':
+        case 'K': case 'L': case 'M': case 'N': case 'O':
+        case 'P': case 'Q': case 'R': case 'S': case 'T':
+        case 'U': case 'V': case 'W': case 'X': case 'Y':
+        case 'Z':
+        case '[': case '\\': case ']': case '^': case '_':
+        case 'a': case 'b': case 'c': case 'd': case 'e':
+        case 'f': case 'g': case 'h': case 'i': case 'j':
+        case 'k': case 'l': case 'm': case 'n': case 'o':
+        case 'p': case 'q': case 'r': case 's': case 't':
+        case 'u': case 'v': case 'w': case 'x': case 'y':
+        case 'z': case '{': case '|': case '}': case '~':
+          /* c is in the ISO C "basic character set".  */
+          ret = c32rtomb (buf, btoc32 (c), NULL);
+          ASSERT (ret == 1);
+          ASSERT (buf[0] == (char) c);
+          break;
+        }
+  }
+
+  /* Test special calling convention, passing a NULL pointer.  */
+  {
+    ret = c32rtomb (NULL, '\0', NULL);
+    ASSERT (ret == 1);
+    ret = c32rtomb (NULL, btoc32 ('x'), NULL);
+    ASSERT (ret == 1);
+  }
+
+  switch (codepage)
+    {
+    case 1252:
+      /* Locale encoding is CP1252, an extension of ISO-8859-1.  */
+      {
+        /* Convert "B\374\337er": "Büßer" */
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x00FC, NULL);
+        ASSERT (ret == 1);
+        ASSERT (memcmp (buf, "\374", 1) == 0);
+        ASSERT (buf[1] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x00DF, NULL);
+        ASSERT (ret == 1);
+        ASSERT (memcmp (buf, "\337", 1) == 0);
+        ASSERT (buf[1] == 'x');
+      }
+      return 0;
+
+    case 1256:
+      /* Locale encoding is CP1256, not the same as ISO-8859-6.  */
+      {
+        /* Convert "x\302\341\346y": "xآلوy" */
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x0622, NULL);
+        ASSERT (ret == 1);
+        ASSERT (memcmp (buf, "\302", 1) == 0);
+        ASSERT (buf[1] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x0644, NULL);
+        ASSERT (ret == 1);
+        ASSERT (memcmp (buf, "\341", 1) == 0);
+        ASSERT (buf[1] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x0648, NULL);
+        ASSERT (ret == 1);
+        ASSERT (memcmp (buf, "\346", 1) == 0);
+        ASSERT (buf[1] == 'x');
+      }
+      return 0;
+
+    case 932:
+      /* Locale encoding is CP932, similar to Shift_JIS.  */
+      {
+        /* Convert "<\223\372\226\173\214\352>": "<日本語>" */
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x65E5, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\223\372", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x672C, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\226\173", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x8A9E, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\214\352", 2) == 0);
+        ASSERT (buf[2] == 'x');
+      }
+      return 0;
+
+    case 950:
+      /* Locale encoding is CP950, similar to Big5.  */
+      {
+        /* Convert "<\244\351\245\273\273\171>": "<日本語>" */
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x65E5, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\244\351", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x672C, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\245\273", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x8A9E, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\273\171", 2) == 0);
+        ASSERT (buf[2] == 'x');
+      }
+      return 0;
+
+    case 936:
+      /* Locale encoding is CP936 = GBK, an extension of GB2312.  */
+      {
+        /* Convert "<\310\325\261\276\325\132>": "<日本語>" */
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x65E5, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\310\325", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x672C, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\261\276", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x8A9E, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\325\132", 2) == 0);
+        ASSERT (buf[2] == 'x');
+      }
+      return 0;
+
+    case 54936:
+      /* Locale encoding is CP54936 = GB18030.  */
+      if (strcmp (locale_charset (), "GB18030") != 0)
+        return 77;
+      {
+        /* Convert "s\250\271\201\060\211\070\224\071\375\067!": "süß😋!" */
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x00FC, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\250\271", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x00DF, NULL);
+        ASSERT (ret == 4);
+        ASSERT (memcmp (buf, "\201\060\211\070", 4) == 0);
+        ASSERT (buf[4] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x1F60B, NULL);
+        ASSERT (ret == 4);
+        ASSERT (memcmp (buf, "\224\071\375\067", 4) == 0);
+        ASSERT (buf[4] == 'x');
+      }
+      return 0;
+
+    case 65001:
+      /* Locale encoding is CP65001 = UTF-8.  */
+      if (strcmp (locale_charset (), "UTF-8") != 0)
+        return 77;
+      {
+        /* Convert "s\303\274\303\237\360\237\230\213!": "süß😋!" */
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x00FC, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\303\274", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x00DF, NULL);
+        ASSERT (ret == 2);
+        ASSERT (memcmp (buf, "\303\237", 2) == 0);
+        ASSERT (buf[2] == 'x');
+
+        memset (buf, 'x', 8);
+        ret = c32rtomb (buf, 0x1F60B, NULL);
+        ASSERT (ret == 4);
+        ASSERT (memcmp (buf, "\360\237\230\213", 4) == 0);
+        ASSERT (buf[4] == 'x');
+      }
+      return 0;
+
+    default:
+      return 1;
+    }
+}
+
+int
+main (int argc, char *argv[])
+{
+  int codepage = atoi (argv[argc - 1]);
+  int result;
+  int i;
+
+  result = 77;
+  for (i = 1; i < argc - 1; i++)
+    {
+      int ret = test_one_locale (argv[i], codepage);
+
+      if (ret != 77)
+        result = ret;
+    }
+
+  if (result == 77)
+    {
+      fprintf (stderr, "Skipping test: found no locale with codepage %d\n",
+               codepage);
+    }
+  return result;
+}
+
+#else
+
+int
+main (int argc, char *argv[])
+{
+  fputs ("Skipping test: not a native Windows system\n", stderr);
+  return 77;
+}
+
+#endif
diff --git a/tests/test-c32rtomb.c b/tests/test-c32rtomb.c
new file mode 100644
index 0000000..108efe3
--- /dev/null
+++ b/tests/test-c32rtomb.c
@@ -0,0 +1,170 @@
+/* Test of conversion of wide character to multibyte character.
+   Copyright (C) 2008-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <bruno@clisp.org>, 2008.  */
+
+#include <config.h>
+
+#include <uchar.h>
+
+#include "signature.h"
+SIGNATURE_CHECK (c32rtomb, size_t, (char *, char32_t, mbstate_t *));
+
+#include <locale.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "macros.h"
+
+/* Check the multibyte character s[0..n-1].  */
+static void
+check_character (const char *s, size_t n)
+{
+  mbstate_t state;
+  char32_t wc;
+  char buf[64];
+  int iret;
+  size_t ret;
+
+  memset (&state, '\0', sizeof (mbstate_t));
+  wc = (char32_t) 0xBADFACE;
+  iret = mbrtoc32 (&wc, s, n, &state);
+  ASSERT (iret == n);
+
+  ret = c32rtomb (buf, wc, NULL);
+  ASSERT (ret == n);
+  ASSERT (memcmp (buf, s, n) == 0);
+
+  /* Test special calling convention, passing a NULL pointer.  */
+  ret = c32rtomb (NULL, wc, NULL);
+  ASSERT (ret == 1);
+}
+
+int
+main (int argc, char *argv[])
+{
+  char buf[64];
+  size_t ret;
+
+  /* configure should already have checked that the locale is supported.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  /* Test NUL character.  */
+  {
+    buf[0] = 'x';
+    ret = c32rtomb (buf, 0, NULL);
+    ASSERT (ret == 1);
+    ASSERT (buf[0] == '\0');
+  }
+
+  /* Test single bytes.  */
+  {
+    int c;
+
+    for (c = 0; c < 0x100; c++)
+      switch (c)
+        {
+        case '\t': case '\v': case '\f':
+        case ' ': case '!': case '"': case '#': case '%':
+        case '&': case '\'': case '(': case ')': case '*':
+        case '+': case ',': case '-': case '.': case '/':
+        case '0': case '1': case '2': case '3': case '4':
+        case '5': case '6': case '7': case '8': case '9':
+        case ':': case ';': case '<': case '=': case '>':
+        case '?':
+        case 'A': case 'B': case 'C': case 'D': case 'E':
+        case 'F': case 'G': case 'H': case 'I': case 'J':
+        case 'K': case 'L': case 'M': case 'N': case 'O':
+        case 'P': case 'Q': case 'R': case 'S': case 'T':
+        case 'U': case 'V': case 'W': case 'X': case 'Y':
+        case 'Z':
+        case '[': case '\\': case ']': case '^': case '_':
+        case 'a': case 'b': case 'c': case 'd': case 'e':
+        case 'f': case 'g': case 'h': case 'i': case 'j':
+        case 'k': case 'l': case 'm': case 'n': case 'o':
+        case 'p': case 'q': case 'r': case 's': case 't':
+        case 'u': case 'v': case 'w': case 'x': case 'y':
+        case 'z': case '{': case '|': case '}': case '~':
+          /* c is in the ISO C "basic character set".  */
+          ret = c32rtomb (buf, btoc32 (c), NULL);
+          ASSERT (ret == 1);
+          ASSERT (buf[0] == (char) c);
+          break;
+        }
+  }
+
+  /* Test special calling convention, passing a NULL pointer.  */
+  {
+    ret = c32rtomb (NULL, '\0', NULL);
+    ASSERT (ret == 1);
+    ret = c32rtomb (NULL, btoc32 ('x'), NULL);
+    ASSERT (ret == 1);
+  }
+
+  if (argc > 1)
+    switch (argv[1][0])
+      {
+      case '1':
+        /* Locale encoding is ISO-8859-1 or ISO-8859-15.  */
+        {
+          const char input[] = "B\374\337er"; /* "Büßer" */
+
+          check_character (input + 1, 1);
+          check_character (input + 2, 1);
+        }
+        return 0;
+
+      case '2':
+        /* Locale encoding is UTF-8.  */
+        {
+          const char input[] = "s\303\274\303\237\360\237\230\213!"; /* "süß😋!" */
+
+          check_character (input + 1, 2);
+          check_character (input + 3, 2);
+          check_character (input + 5, 4);
+        }
+        return 0;
+
+      case '3':
+        /* Locale encoding is EUC-JP.  */
+        {
+          const char input[] = "<\306\374\313\334\270\354>"; /* "<日本語>" */
+
+          check_character (input + 1, 2);
+          check_character (input + 3, 2);
+          check_character (input + 5, 2);
+        }
+        return 0;
+
+      case '4':
+        /* Locale encoding is GB18030.  */
+        {
+          const char input[] = "s\250\271\201\060\211\070\224\071\375\067!"; /* "süß😋!" */
+
+          check_character (input + 1, 2);
+          check_character (input + 3, 4);
+          check_character (input + 7, 4);
+        }
+        return 0;
+
+      case '5':
+        /* C locale; tested above.  */
+        return 0;
+      }
+
+  return 1;
+}
diff --git a/tests/test-c32rtomb.sh b/tests/test-c32rtomb.sh
new file mode 100755
index 0000000..2899297
--- /dev/null
+++ b/tests/test-c32rtomb.sh
@@ -0,0 +1,39 @@
+#!/bin/sh
+
+# Test in an ISO-8859-1 or ISO-8859-15 locale.
+: ${LOCALE_FR=fr_FR}
+if test $LOCALE_FR != none; then
+  LC_ALL=$LOCALE_FR \
+  ${CHECKER} ./test-c32rtomb${EXEEXT} 1 \
+  || exit 1
+fi
+
+# Test whether a specific UTF-8 locale is installed.
+: ${LOCALE_FR_UTF8=fr_FR.UTF-8}
+if test $LOCALE_FR_UTF8 != none; then
+  LC_ALL=$LOCALE_FR_UTF8 \
+  ${CHECKER} ./test-c32rtomb${EXEEXT} 2 \
+  || exit 1
+fi
+
+# Test whether a specific EUC-JP locale is installed.
+: ${LOCALE_JA=ja_JP}
+if test $LOCALE_JA != none; then
+  LC_ALL=$LOCALE_JA \
+  ${CHECKER} ./test-c32rtomb${EXEEXT} 3 \
+  || exit 1
+fi
+
+# Test whether a specific GB18030 locale is installed.
+: ${LOCALE_ZH_CN=zh_CN.GB18030}
+if test $LOCALE_ZH_CN != none; then
+  LC_ALL=$LOCALE_ZH_CN \
+  ${CHECKER} ./test-c32rtomb${EXEEXT} 4 \
+  || exit 1
+fi
+
+# Test in the POSIX locale.
+LC_ALL=C     ${CHECKER} ./test-c32rtomb${EXEEXT} 5 || exit 1
+LC_ALL=POSIX ${CHECKER} ./test-c32rtomb${EXEEXT} 5 || exit 1
+
+exit 0
-- 
2.7.4

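check_character() in test-c32rtomb.c decodes one multibyte character with mbrtoc32(), re-encodes it with c32rtomb(), and compares the bytes. It also exercises the rule that c32rtomb (NULL, c, ps) behaves like a call on an internal buffer with U'\0', so in a stateless encoding it returns 1 regardless of c. Below is a standalone sketch of the same round trip that reports a result instead of aborting. The name roundtrip_ok is hypothetical. Unlike the test, the sketch passes an explicit mbstate_t instead of relying on the functions' internal state.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: return true if the N bytes at S form exactly one
   complete multibyte character in the current locale, and converting it
   with mbrtoc32() and back with c32rtomb() reproduces the same N bytes.
   mbrtoc32() returns 0 for a NUL character, so "\0" is reported as
   false.  */
static bool
roundtrip_ok (const char *s, size_t n)
{
  mbstate_t state;
  char32_t c32;
  char buf[MB_LEN_MAX];
  size_t ret;

  assert (n > 0);
  memset (&state, '\0', sizeof state);
  ret = mbrtoc32 (&c32, s, n, &state);
  if (ret != n)
    return false;             /* Invalid, incomplete, or shorter than N.  */

  memset (&state, '\0', sizeof state);
  ret = c32rtomb (buf, c32, &state);
  if (ret != n || memcmp (buf, s, n) != 0)
    return false;

  /* With a NULL buffer, c32rtomb behaves as if converting U'\0' into an
     internal buffer, so in a stateless encoding the result is 1.  */
  return c32rtomb (NULL, c32, &state) == 1;
}
```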

[-- Attachment #5: 0004-c32tob-Make-consistent-with-mbrtoc32.patch --]
[-- Type: text/x-patch, Size: 2585 bytes --]

From d6f8671505956401691e3c35d19499470f582a88 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Thu, 9 Jan 2020 02:04:07 +0100
Subject: [PATCH 4/4] c32tob: Make consistent with mbrtoc32.

* lib/c32tob.c: Include <stdio.h>, <string.h>, <wchar.h>.
(c32tob): If the char32_t encoding and the wchar_t encoding may differ,
use c32rtomb, not wctob.
* modules/c32tob (Files): Add m4/mbrtoc32.m4.
(Depends-on): Add c32rtomb.
(configure.ac): Require gl_MBRTOC32_SANITYCHECK.
---
 ChangeLog      | 10 ++++++++++
 lib/c32tob.c   | 19 ++++++++++++++++++-
 modules/c32tob |  3 +++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index c303d41..9c3f603 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2020-01-08  Bruno Haible  <bruno@clisp.org>
 
+	c32tob: Make consistent with mbrtoc32.
+	* lib/c32tob.c: Include <stdio.h>, <string.h>, <wchar.h>.
+	(c32tob): If the char32_t encoding and the wchar_t encoding may differ,
+	use c32rtomb, not wctob.
+	* modules/c32tob (Files): Add m4/mbrtoc32.m4.
+	(Depends-on): Add c32rtomb.
+	(configure.ac): Require gl_MBRTOC32_SANITYCHECK.
+
+2020-01-08  Bruno Haible  <bruno@clisp.org>
+
 	c32rtomb: Add tests.
 	* tests/test-c32rtomb.c: New file, based on tests/test-wcrtomb.c.
 	* tests/test-c32rtomb.sh: New file, based on tests/test-wcrtomb.sh.
diff --git a/lib/c32tob.c b/lib/c32tob.c
index 4da438f..55f61c7 100644
--- a/lib/c32tob.c
+++ b/lib/c32tob.c
@@ -21,10 +21,27 @@
 /* Specification.  */
 #include <uchar.h>
 
+#include <stdio.h>
+#include <string.h>
+#include <wchar.h>
+
 int
 c32tob (wint_t wc)
 {
-#if _GL_LARGE_CHAR32_T
+#if HAVE_WORKING_MBRTOC32 && !defined __GLIBC__
+  /* The char32_t encoding of a multibyte character may be different than its
+     wchar_t encoding.  */
+  if (wc != WEOF)
+    {
+      mbstate_t state;
+      char buf[8];
+
+      memset (&state, '\0', sizeof (mbstate_t));
+      if (c32rtomb (buf, wc, &state) == 1)
+        return (unsigned char) buf[0];
+    }
+  return EOF;
+#elif _GL_LARGE_CHAR32_T
   /* In all known encodings, unibyte characters correspond only to
      characters in the BMP.  */
   if (wc != WEOF && (wchar_t) wc == wc)
diff --git a/modules/c32tob b/modules/c32tob
index 3ef42ba..42e18a9 100644
--- a/modules/c32tob
+++ b/modules/c32tob
@@ -3,12 +3,15 @@ c32tob() function: convert 32-bit wide character to unibyte character.
 
 Files:
 lib/c32tob.c
+m4/mbrtoc32.m4
 
 Depends-on:
 uchar
+c32rtomb
 wctob
 
 configure.ac:
+AC_REQUIRE([gl_MBRTOC32_SANITYCHECK])
 gl_UCHAR_MODULE_INDICATOR([c32tob])
 
 Makefile.am:
-- 
2.7.4

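Patch 4 makes c32tob() go through c32rtomb() when the char32_t and wchar_t encodings may differ. In that branch, a character is a single-byte character exactly when c32rtomb() produces one byte for it. A minimal sketch of that branch follows. It takes a char32_t instead of gnulib's wint_t parameter and so has no WEOF check. my_c32tob is a hypothetical stand-in, not the gnulib function.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* EOF */
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical stand-in for the c32rtomb-based branch of c32tob():
   return the single byte that encodes C in the current locale, as an
   unsigned char converted to int, or EOF if C needs more than one byte
   or is not representable at all.  */
static int
my_c32tob (char32_t c)
{
  mbstate_t state;
  char buf[MB_LEN_MAX];

  memset (&state, '\0', sizeof state);
  if (c32rtomb (buf, c, &state) == 1)
    return (unsigned char) buf[0];
  return EOF;
}
```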


* Re: new module 'c32rtomb'
  2020-01-09  1:14 new module 'c32rtomb' Bruno Haible
@ 2020-01-21  1:00 ` Bruno Haible
  0 siblings, 0 replies; 2+ messages in thread
From: Bruno Haible @ 2020-01-21  1:00 UTC (permalink / raw)
  To: bug-gnulib

I wrote:
> On FreeBSD 12 and Solaris 11.4, the
> two encodings are different. The FreeBSD 12 wchar_t encoding is apparently
> based on ISO 2022 (very old).
> 
> The fix is to use mbrtoc32() on platforms where this is possible, namely
> on FreeBSD.

Actually, FreeBSD 12 has a bug similar to the one in Solaris 11.4: its
mbrtoc32() also rejects some multibyte sequences that mbrtowc() accepts.
Therefore it is better NOT to use the system's mbrtoc32() on FreeBSD 12.
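The configure test gl_MBRTOC32_SANITYCHECK detects this class of bug by comparing mbrtoc32() with mbrtowc() on input that mbrtowc() is known to accept. According to the m4 comment in the patch below, in a GB18030 locale mbrtowc returns 4, while mbrtoc32 on FreeBSD 12 and Solaris 11.4 returns (size_t)-2 or (size_t)-1. The sketch below applies the same idea to an arbitrary byte sequence. It is not the actual configure program, and the function name is hypothetical. It compares only the byte counts because, as the first message explains, the char32_t and wchar_t values may legitimately differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: return true if mbrtoc32() and mbrtowc(), each
   started from the initial conversion state, report the same result
   (bytes consumed, or the same error code) for the N bytes at S in the
   current locale.  A system whose mbrtoc32() rejects sequences that
   mbrtowc() accepts makes this return false.  */
static bool
mbrtoc32_agrees_with_mbrtowc (const char *s, size_t n)
{
  mbstate_t st1, st2;
  char32_t c32;
  wchar_t wc;

  assert (s != NULL);
  memset (&st1, '\0', sizeof st1);
  memset (&st2, '\0', sizeof st2);
  return mbrtoc32 (&c32, s, n, &st1) == mbrtowc (&wc, s, n, &st2);
}
```

One would call it after setlocale(), for example with the GB18030 bytes "\224\071\375\067" (U+1F60B) from the tests above and n = 4.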


2020-01-20  Bruno Haible  <bruno@clisp.org>

	mbrtoc32: Add note about FreeBSD 12.
	* m4/mbrtoc32.m4 (gl_MBRTOC32_SANITYCHECK): Guess no also on FreeBSD.
	* doc/posix-functions/mbrtoc32.texi: Mention that FreeBSD 12 is also
	affected.

diff --git a/doc/posix-functions/mbrtoc32.texi b/doc/posix-functions/mbrtoc32.texi
index 9789bef..3151a09 100644
--- a/doc/posix-functions/mbrtoc32.texi
+++ b/doc/posix-functions/mbrtoc32.texi
@@ -20,7 +20,7 @@ glibc 2.19.
 @item
 This function does not recognize multibyte sequences that @code{mbrtowc}
 recognizes on some platforms:
-Solaris 11.4, mingw, MSVC 14.
+FreeBSD 12, Solaris 11.4, mingw, MSVC 14.
 @end itemize
 
 Portability problems not fixed by Gnulib:
diff --git a/m4/mbrtoc32.m4 b/m4/mbrtoc32.m4
index 3dee900..a5dc51a 100644
--- a/m4/mbrtoc32.m4
+++ b/m4/mbrtoc32.m4
@@ -1,4 +1,4 @@
-# mbrtoc32.m4 serial 2
+# mbrtoc32.m4 serial 3
 dnl Copyright (C) 2014-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -136,10 +136,14 @@ AC_DEFUN([gl_MBRTOC32_SANITYCHECK],
         dnl is present.
 changequote(,)dnl
         case "$host_os" in
-                             # Guess no on Solaris, native Windows.
-          solaris* | mingw*) gl_cv_func_mbrtoc32_sanitycheck="guessing no" ;;
-                             # Guess yes otherwise.
-          *)                 gl_cv_func_mbrtoc32_sanitycheck="guessing yes" ;;
+          # Guess no on FreeBSD, Solaris, native Windows.
+          freebsd* | solaris* | mingw*)
+            gl_cv_func_mbrtoc32_sanitycheck="guessing no"
+            ;;
+          # Guess yes otherwise.
+          *)
+            gl_cv_func_mbrtoc32_sanitycheck="guessing yes"
+            ;;
         esac
 changequote([,])dnl
         if test $LOCALE_FR != none || test $LOCALE_ZH_CN != none; then
@@ -176,8 +180,8 @@ int main ()
             result |= 1;
         }
     }
-  /* This fails on Solaris 11.4:
-     mbrtoc32 returns (size_t)-1.
+  /* This fails on FreeBSD 12 and Solaris 11.4:
+     mbrtoc32 returns (size_t)-2 or (size_t)-1.
      mbrtowc returns 4 (correct).  */
   if (setlocale (LC_ALL, "$LOCALE_ZH_CN") != NULL)
     {


