* [PATCH] IBM z/OS + EBCDIC support
@ 2015-09-22 2:28 Daniel Richard G.
From: Daniel Richard G. @ 2015-09-22 2:28 UTC
To: bug-gnulib
Hello list,
The attached patch, against Git master, addresses numerous
incompatibilities in Gnulib with IBM z/OS (a mainframe operating system)
and the EBCDIC encoding.
With my changes, Gnulib builds successfully, and most of the tests
succeed. The remaining failures are as follows.
These appear to expose bugs in the system implementation and have been
reported to IBM (a few others have already received APAR fixes):
FAIL: test-fdopendir
FAIL: test-getopt
FAIL: test-mbsrtowcs1.sh
A number of floating-point tests appear to be in the same boat. These
failure modes have yet to be evaluated:
FAIL: test-fma2
FAIL: test-fmaf2
FAIL: test-fmodl-ieee
FAIL: test-isinf
FAIL: test-isnan
FAIL: test-isnanl-nolibm
FAIL: test-isnanl
FAIL: test-ldexpf
FAIL: test-remainderl-ieee
FAIL: test-truncl-ieee
These require more investigation and/or discussion on this list:
FAIL: test-perror.sh
FAIL: test-poll
FAIL: test-select-in.sh
FAIL: test-select-out.sh
FAIL: test-sigpipe.sh
FAIL: test-symlink
FAIL: test-symlinkat
One more issue for now: In order to build Gnulib on this system, it is
necessary to use a compiler wrapper script, due to the inexplicably
broken way xlc handles #include paths. I recently submitted some changes
to Gawk to work around this (look in the feature/zOS-try2 branch,
m4/arch.m4 file; search for "zos-cc"). It's possible that a similar
workaround will need to be bundled here.
In any event, below is a walk-through of my changes in the patch.
Comments and questions are welcome.
+++ lib/alloca.in.h
* z/OS has the alloca() definitions in stdlib.h.
+++ lib/c-ctype.c
* Implementing ctype functions that support EBCDIC from scratch is not
feasible, not least because there isn't even one specific EBCDIC
variant that should be targeted. So I just call through to the system
routines, while ensuring that the compile-time environment is set
correctly, and working around the system routines' input-range issues
with signed chars.
* In EBCDIC, normal chars like 'A' occur in the upper half of the 8-bit
range. This interferes with the idiom of using "switch (c)" and then
"case 'A':" et al., because c can have two distinct values (-63 and
193) that should both match 'A'.
My fix, then, is a macro which converts the input codepoint to the
range that will match literal chars, when necessary. (Obviously, in
ASCII, it's a no-op.) Any takers on a better name for this macro than
CHAR_LITERAL()?
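To make the two-values problem concrete, here is a minimal sketch of the two range adjustments. The helper names `to_signed_range` and `to_unsigned_range` are hypothetical and not part of the patch; the patch's CHAR_LITERAL() macro picks one of these directions at compile time, depending on whether 'A' is negative.

```c
/* Hypothetical helpers, not part of the patch.  Each one folds a
   codepoint in the range -128..255 onto one side of the "fence". */

/* Map 128..255 down to -128..-1 (for builds where 'A' < 0). */
static int to_signed_range (int c)
{
  return (c >= 128 && c < 256) ? c - 256 : c;
}

/* Map -128..-1 up to 128..255 (for builds where 'A' > 0). */
static int to_unsigned_range (int c)
{
  return (c >= -128 && c < 0) ? c + 256 : c;
}
```

EBCDIC 'A' is 0xC1, so to_signed_range (193) and to_signed_range (-63) both give -63, while to_unsigned_range maps both inputs to 193. Values outside -128..255, such as EOF, pass through unchanged.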
+++ lib/c-ctype.h
* Ensure that ASCII optimizations are applied only when building in
ASCII.
+++ lib/fnmatch.c
* Fixed an error from __GNUC__ not being defined.
+++ lib/get-rusage-as.c
* Added z/OS awareness.
+++ lib/glob.c
* Avoid this #define on z/OS, because...
$ grep alloca /usr/include/stdlib.h
#ifndef alloca
#define alloca(x) __alloca(x)
#pragma linkage(__alloca,builtin)
void *__alloca(unsigned int x);
+++ lib/glthread/thread.c
* Added z/OS awareness. pthread_t does not have a .p field on z/OS, but
the PTW32_VERSION branch (the gl_null_thread definition) does otherwise
seem to apply.
For what it's worth, this is pthread_t, from /usr/include/sys/types.h:
typedef struct {
char __[0x08];
} pthread_t;
+++ lib/glthread/thread.h
* Best guess at a gl_thread implementation for z/OS.
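As a rough illustration of treating such an opaque struct as an identifier, the sketch below copies the leading bytes of the field into a pointer-sized value. `zos_like_pthread_t`, its `opaque` member (standing in for `__`), and `thread_id_as_pointer` are hypothetical names; the patch itself uses a cast rather than memcpy, so this is an alternative under stated assumptions, not the patch's method.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the z/OS pthread_t layout quoted above. */
typedef struct {
  char opaque[0x08];
} zos_like_pthread_t;

/* Copy the leading bytes of the opaque field into a pointer-sized
   value.  memcpy sidesteps the alignment and aliasing assumptions
   that a (void **) cast would make. */
static void *thread_id_as_pointer (zos_like_pthread_t t)
{
  void *p = NULL;
  size_t n = sizeof p < sizeof t.opaque ? sizeof p : sizeof t.opaque;
  memcpy (&p, t.opaque, n);
  return p;
}
```

The result is only useful for comparing or printing; it should not be dereferenced, since the bytes do not necessarily represent a pointer.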
+++ lib/math.in.h
* The system defines these functions as macros, and the compiler did not
like seeing them redefined.
+++ lib/ptsname_r.c
* Likewise.
+++ lib/regex.h
* Ensure that "__string" does not expand to "1" when it is used as a
formal parameter name.
+++ lib/string.in.h
* Likewise.
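The self-referential #define used for this can be sketched as follows. `demo_string` is a hypothetical stand-in for `__string` (which is a reserved identifier), and the first #define simulates what the system header does.

```c
#include <stddef.h>
#include <string.h>

/* Simulate a system header that uses a macro as an inclusion guard. */
#define demo_string 1

/* Workaround in the style of the patch: give the guard a
   self-referential definition.  A macro is not re-expanded inside its
   own expansion, so the name now expands to itself, while #ifdef
   checks on it still succeed. */
#if defined(demo_string)
# undef demo_string
# define demo_string demo_string
#endif

/* Without the workaround, the parameter would expand to "1" and the
   declaration would not compile. */
static size_t demo_length (const char *demo_string)
{
  return strlen (demo_string);
}
```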
+++ lib/strtod.c
* The system strtod() sets errno to ERANGE for some reason when parsing "0x".
* It also returns a value of 0.0 for "nan()".
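For the "0x" case, the idea is roughly the following. This is a simplified sketch, assuming input with no leading whitespace or sign; `strtod_0x_safe` is a hypothetical wrapper, not the Gnulib replacement function.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper, loosely modeled on the "0x" branch touched by
   the patch.  When the input starts with "0x" but no hex digit
   follows, only the "0" is a valid number: point the end pointer just
   past it and discard a spurious ERANGE. */
static double strtod_0x_safe (const char *s, char **endptr)
{
  char *end;
  int saved_errno = errno;
  double num;

  errno = 0;
  num = strtod (s, &end);

  if (s[0] == '0' && tolower ((unsigned char) s[1]) == 'x'
      && !isxdigit ((unsigned char) s[2 + (s[2] == '.')]))
    {
      end = (char *) s + 1;
      if (errno == ERANGE)
        errno = 0;
    }

  /* Leave errno untouched unless a real error was reported. */
  if (errno == 0)
    errno = saved_errno;
  if (endptr)
    *endptr = end;
  return num;
}
```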
+++ m4/fclose.m4
* This system has a broken fclose(); without this bit, the test-fclose
test fails:
$ ./test-fclose
/path/to/gltests/test-fclose.c:74: assertion 'lseek (fd, 0, SEEK_CUR) == 3' failed
CEE5207E The signal SIGABRT was received.
ABORT instruction
However, the existing conditions didn't enable it, so I added a
host-platform check.
+++ m4/strstr.m4
* The IBM runtime sucks; signal delivery is delayed until strstr()
exits, so this test results in a hang that can only be SIGKILL'ed.
+++ m4/wchar_h.m4
* The linker on this system cares way too much about the object file's
original name.
Slightly longer explanation: In 64-bit builds, the toolchain uses the
XPLINK object format (as opposed to GOFF for 31-bit builds). XPLINK
has the notion of CSECTs, and these are named. By default, the main
code CSECT is named after the source-file basename. If the linker
encounters two CSECTs with the same name, it will consider them to be
duplicates, and discard one---even if they contain completely
orthogonal definitions.
This can be worked around by specifying the CSECT names explicitly
with -qcsect=foobaz (using different values of "foobaz" for the two
files), but IMO it is easier just to compile the two source files for
these tests from differently-named source files in the first place.
+++ tests/infinity.h
* xlc doesn't like constant div-by-zero expressions.
+++ tests/nan.h
* z/OS, in addition to supporting IEEE floating-point, also supports an
older "hexadecimal" format that does not support NaN. Bomb out if this
is in use.
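Both test headers avoid a constant division by zero by dividing by a variable at run time. A minimal sketch of that pattern follows, assuming IEEE floating-point; the names `runtime_infinity` and `runtime_nan` are hypothetical.

```c
#include <math.h>

/* Dividing by a variable instead of a literal keeps the division out
   of the constant expressions that some compilers reject. */
static double runtime_infinity (void)
{
  static double zero = 0.0;
  return 1.0 / zero;
}

static double runtime_nan (void)
{
  static double zero = 0.0;
  return zero / zero;
}
```

With a floating-point format that has no NaN, the second helper cannot produce one, which is why nan.h refuses to compile in that mode.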
+++ tests/test-c-ctype.c
* We need the same CHAR_LITERAL() hack here as in c-ctype.c.
+++ tests/test-c-strcasecmp.c
* In EBCDIC-1047, the tests
ASSERT (c_strcasecmp ("turkish", "TURK\304\260SH") < 0);
ASSERT (c_strcasecmp ("TURK\304\260SH", "turkish") > 0);
are actually
ASSERT (c_strcasecmp ("turkish", "TURKD¬SH") < 0);
ASSERT (c_strcasecmp ("TURKD¬SH", "turkish") > 0);
which, of course, fail.
+++ tests/test-c-strncasecmp.c
* Likewise.
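The underlying reason is that octal escapes specify byte values directly, whatever the execution character set. The sketch below (with a hypothetical helper, `byte_at`) shows the two bytes in question: 0xC4 0xB0 is the UTF-8 encoding of U+0130, but read as EBCDIC-1047 the same bytes are the 'D' and '¬' seen above.

```c
/* Hypothetical helper: return the raw byte value at index i of s. */
static unsigned int byte_at (const char *s, unsigned int i)
{
  return (unsigned char) s[i];
}
```

byte_at ("TURK\304\260SH", 4) is 0xC4 and byte_at ("TURK\304\260SH", 5) is 0xB0 in both ASCII and EBCDIC builds; only the surrounding literal letters change their byte values.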
+++ tests/test-canonicalize-lgpl.c
* Addressed a strange z/OS corner case. This system has
DOUBLE_SLASH_IS_DISTINCT_ROOT, yet the dev/ino numbers are the same.
+++ tests/test-iconv-utf.c
* When compiling in (normal) EBCDIC mode on z/OS, the compiler
translates char and string literals to EBCDIC. (Numerical escapes like
"\346" are not remapped.) This messes up the test, because the input
strings are supposed to have their literal characters represented in
ASCII. So I moved all the input strings to the top of the file, added
an appropriate compiler #pragma to change the conversion behavior, and
modified the tests to refer to these.
(Note that a #define would not work for the input strings, because the
text is converted at the point of use, not the point of definition.)
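A reduced sketch of that arrangement is shown here. The guard and the two pragmas are the ones used in the patch; `DEMO_CONVERT_ENABLED`, `demo_ascii_text`, and `first_byte_is_ascii_J` are hypothetical names for illustration. On an ASCII host the guard is false and the pragmas are never seen.

```c
/* Switch literal conversion to ISO 8859-1 only on an EBCDIC build
   with IBM's compiler. */
#if 'A' != 0x41 && defined(__IBMC__)
# pragma convert("ISO8859-1")
# define DEMO_CONVERT_ENABLED
#endif

/* Defined as an array, not a macro, so that the literal is translated
   here, while the pragma is in effect. */
static const char demo_ascii_text[] = "Japanese";

#ifdef DEMO_CONVERT_ENABLED
# pragma convert(pop)
#endif

/* Returns 1 if the first byte is ASCII 'J' (0x4A). */
static int first_byte_is_ascii_J (void)
{
  return (unsigned char) demo_ascii_text[0] == 0x4A;
}
```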
+++ tests/test-iconv.c
* The system iconv implementation does not recognize "ISO-8859-1", but
it does recognize "ISO8859-1".
* Similar issue with converting input strings. (This leaves open the
possibility that any ASSERT() failures will be reported in ISO 8859-1,
not EBCDIC, thus resulting in gibberish on the user's terminal. But I
kept the changes to the minimum needed to get this test to pass. I can
go the whole nine yards if desired.)
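One way to cope with the differing encoding names, sketched here as an assumption rather than what the patch does, is to try each spelling in turn. `open_latin1_to_utf8` is a hypothetical helper; iconv_open() and iconv_close() are the standard POSIX calls.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: try several spellings of the Latin-1 encoding
   name and return the first descriptor that iconv_open() accepts, or
   (iconv_t) -1 if none is recognized. */
static iconv_t open_latin1_to_utf8 (void)
{
  static const char *const names[] = { "ISO-8859-1", "ISO8859-1" };
  size_t i;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      iconv_t cd = iconv_open ("UTF-8", names[i]);
      if (cd != (iconv_t) -1)
        return cd;
    }
  return (iconv_t) -1;
}
```

The caller is responsible for passing the returned descriptor to iconv_close() when done.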
+++ tests/test-nonblocking-pipe.h
* Added z/OS awareness. (I tested this and found that exact
boundary value; the test fails with 131072.)
+++ tests/test-nonblocking-reader.h
* Nonblocking read() returns EWOULDBLOCK on this system.
+++ tests/test-nonblocking-writer.h
* Nonblocking write() returns EWOULDBLOCK on this system.
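POSIX allows EAGAIN and EWOULDBLOCK to be distinct values, so a portable check accepts either. The sketch below uses hypothetical helper names and exercises the check by reading from an empty non-blocking pipe.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Accept both error codes, since they may or may not be the same value. */
static int is_would_block (int err)
{
  return err == EAGAIN || err == EWOULDBLOCK;
}

/* Hypothetical demo: read from an empty non-blocking pipe.
   Returns 1 if the read failed with a "would block" error, else 0. */
static int empty_pipe_read_would_block (void)
{
  int fd[2];
  char byte;
  int result = 0;

  if (pipe (fd) != 0)
    return 0;
  if (fcntl (fd[0], F_SETFL, O_NONBLOCK) == 0)
    {
      ssize_t n = read (fd[0], &byte, 1);
      result = (n == -1 && is_would_block (errno));
    }
  close (fd[0]);
  close (fd[1]);
  return result;
}
```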
+++ tests/test-sigpipe.sh
* Fixed an apparent typo.
+++ tests/test-wcwidth.c
* Only run ASCII-specific tests in ASCII mode.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
[-- Attachment #2: gnulib-zos-v1.patch --]
[-- Type: text/x-patch; name="gnulib-zos-v1.patch", Size: 37481 bytes --]
diff --git a/lib/alloca.in.h b/lib/alloca.in.h
index d5664b6..6606984 100644
--- a/lib/alloca.in.h
+++ b/lib/alloca.in.h
@@ -51,6 +51,8 @@ extern "C"
void *_alloca (unsigned short);
# pragma intrinsic (_alloca)
# define alloca _alloca
+# elif defined __MVS__
+# include <stdlib.h>
# else
# include <stddef.h>
# ifdef __cplusplus
diff --git a/lib/c-ctype.c b/lib/c-ctype.c
index 6635d34..bbc543f 100644
--- a/lib/c-ctype.c
+++ b/lib/c-ctype.c
@@ -17,16 +17,54 @@ along with this program; if not, see <http://www.gnu.org/licenses/>. */
#include <config.h>
+/* On z/OS with EBCDIC, we punt and just use the system functions.
+ IBM created this mess; let them deal with it.
+
+ Note that if we are not building with -D_ALL_SOURCE, then isascii()
+ interprets its input as an ASCII codepoint, even in an EBCDIC build.
+
+ Also, the z/OS ctype functions do not handle negative-valued chars
+ at all (especially helpful when signed EBCDIC 'A' == -63), so we
+ adjust their arguments accordingly. */
+#if defined __MVS__ && !C_CTYPE_ASCII
+# ifndef _ALL_SOURCE
+# error "Please compile me with -D_ALL_SOURCE, or else isascii() will not work correctly with EBCDIC input."
+# endif
+# include <ctype.h>
+# define USE_SYSTEM_CTYPE
+# define SYSTEM_CTYPE_CHAR(C) ((C) & 0xff)
+#endif
+
/* Specification. */
#define NO_C_CTYPE_MACROS
#include "c-ctype.h"
+/* In EBCDIC, literal chars like 'A' may be represented by a signed
+ negative (-63) as well as unsigned positive (193) value. If we are
+ comparing a char integer value to a literal, then we want the
+ former to be on the same side of the "fence" as the latter. */
+#if C_CTYPE_ASCII
+# define CHAR_LITERAL(C) (C)
+#elif 'A' < 0
+# define CHAR_LITERAL(C) ((C) >= 128 && (C) < 256 ? (C) - 256 : (C))
+#else
+# define CHAR_LITERAL(C) ((C) >= -128 && (C) < 0 ? (C) + 256 : (C))
+#endif
+
/* The function isascii is not locale dependent. Its use in EBCDIC is
questionable. */
bool
c_isascii (int c)
{
+#if C_CTYPE_ASCII
return (c >= 0x00 && c <= 0x7f);
+#elif defined USE_SYSTEM_CTYPE
+ /* On z/OS, the ctype functions return zero or non-zero,
+ not necessarily 0 or 1. */
+ return isascii (SYSTEM_CTYPE_CHAR (c)) != 0;
+#else
+# error "No suitable implementation for c_isascii()"
+#endif
}
bool
@@ -42,8 +80,8 @@ c_isalnum (int c)
|| (c >= 'A' && c <= 'Z')
|| (c >= 'a' && c <= 'z'));
#endif
-#else
- switch (c)
+#else /* Non-consecutive alphanumerics */
+ switch (CHAR_LITERAL (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -74,7 +112,7 @@ c_isalpha (int c)
return ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
#endif
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -104,8 +142,10 @@ c_iscntrl (int c)
{
#if C_CTYPE_ASCII
return ((c & ~0x1f) == 0 || c == 0x7f);
+#elif defined USE_SYSTEM_CTYPE
+ return iscntrl(SYSTEM_CTYPE_CHAR (c)) != 0;
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case ' ': case '!': case '"': case '#': case '$': case '%':
case '&': case '\'': case '(': case ')': case '*': case '+':
@@ -137,9 +177,9 @@ bool
c_isdigit (int c)
{
#if C_CTYPE_CONSECUTIVE_DIGITS
- return (c >= '0' && c <= '9');
+ return (CHAR_LITERAL (c) >= '0' && CHAR_LITERAL (c) <= '9');
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -156,7 +196,7 @@ c_islower (int c)
#if C_CTYPE_CONSECUTIVE_LOWERCASE
return (c >= 'a' && c <= 'z');
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
@@ -175,8 +215,10 @@ c_isgraph (int c)
{
#if C_CTYPE_ASCII
return (c >= '!' && c <= '~');
+#elif defined USE_SYSTEM_CTYPE
+ return isgraph(SYSTEM_CTYPE_CHAR (c)) != 0;
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case '!': case '"': case '#': case '$': case '%': case '&':
case '\'': case '(': case ')': case '*': case '+': case ',':
@@ -209,8 +251,10 @@ c_isprint (int c)
{
#if C_CTYPE_ASCII
return (c >= ' ' && c <= '~');
+#elif defined USE_SYSTEM_CTYPE
+ return isprint(SYSTEM_CTYPE_CHAR (c)) != 0;
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case ' ': case '!': case '"': case '#': case '$': case '%':
case '&': case '\'': case '(': case ')': case '*': case '+':
@@ -245,8 +289,10 @@ c_ispunct (int c)
return ((c >= '!' && c <= '~')
&& !((c >= '0' && c <= '9')
|| ((c & ~0x20) >= 'A' && (c & ~0x20) <= 'Z')));
+#elif defined USE_SYSTEM_CTYPE
+ return ispunct(SYSTEM_CTYPE_CHAR (c)) != 0;
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case '!': case '"': case '#': case '$': case '%': case '&':
case '\'': case '(': case ')': case '*': case '+': case ',':
@@ -275,7 +321,7 @@ c_isupper (int c)
#if C_CTYPE_CONSECUTIVE_UPPERCASE
return (c >= 'A' && c <= 'Z');
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -303,7 +349,7 @@ c_isxdigit (int c)
|| (c >= 'a' && c <= 'f'));
#endif
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -322,7 +368,7 @@ c_tolower (int c)
#if C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
return (c >= 'A' && c <= 'Z' ? c - 'A' + 'a' : c);
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'A': return 'a';
case 'B': return 'b';
@@ -361,7 +407,7 @@ c_toupper (int c)
#if C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
return (c >= 'a' && c <= 'z' ? c - 'a' + 'A' : c);
#else
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'a': return 'A';
case 'b': return 'B';
diff --git a/lib/c-ctype.h b/lib/c-ctype.h
index d622973..d94f526 100644
--- a/lib/c-ctype.h
+++ b/lib/c-ctype.h
@@ -141,11 +141,13 @@ extern int c_toupper (int c) _GL_ATTRIBUTE_CONST;
/* ASCII optimizations. */
+#ifdef C_CTYPE_ASCII
#undef c_isascii
#define c_isascii(c) \
({ int __c = (c); \
(__c >= 0x00 && __c <= 0x7f); \
})
+#endif
#if C_CTYPE_CONSECUTIVE_DIGITS \
&& C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
diff --git a/lib/fnmatch.c b/lib/fnmatch.c
index a607672..58754fa 100644
--- a/lib/fnmatch.c
+++ b/lib/fnmatch.c
@@ -22,7 +22,7 @@
# define _GNU_SOURCE 1
#endif
-#if ! defined __builtin_expect && __GNUC__ < 3
+#if ! defined __builtin_expect && defined __GNUC__ && __GNUC__ < 3
# define __builtin_expect(expr, expected) (expr)
#endif
diff --git a/lib/get-rusage-as.c b/lib/get-rusage-as.c
index 2bad20a..4db1596 100644
--- a/lib/get-rusage-as.c
+++ b/lib/get-rusage-as.c
@@ -355,7 +355,7 @@ get_rusage_as_via_iterator (void)
uintptr_t
get_rusage_as (void)
{
-#if (defined __APPLE__ && defined __MACH__) || defined _AIX || defined __CYGWIN__ /* Mac OS X, AIX, Cygwin */
+#if (defined __APPLE__ && defined __MACH__) || defined _AIX || defined __CYGWIN__ || defined __MVS__ /* Mac OS X, AIX, Cygwin, z/OS */
/* get_rusage_as_via_setrlimit() does not work.
Prefer get_rusage_as_via_iterator(). */
return get_rusage_as_via_iterator ();
diff --git a/lib/glob.c b/lib/glob.c
index ed49a9d..9fd6482 100644
--- a/lib/glob.c
+++ b/lib/glob.c
@@ -144,7 +144,9 @@
# define __stat64(fname, buf) stat (fname, buf)
# define __fxstatat64(_, d, f, st, flag) fstatat (d, f, st, flag)
# define struct_stat64 struct stat
-# define __alloca alloca
+# ifndef __MVS__
+# define __alloca alloca
+# endif
# define __readdir readdir
# define __glob_pattern_p glob_pattern_p
#endif /* _LIBC */
diff --git a/lib/glthread/thread.c b/lib/glthread/thread.c
index d4e2921..28a2797 100644
--- a/lib/glthread/thread.c
+++ b/lib/glthread/thread.c
@@ -33,7 +33,7 @@
#include <pthread.h>
-#ifdef PTW32_VERSION
+#if defined(PTW32_VERSION) || defined(__MVS__)
const gl_thread_t gl_null_thread /* = { .p = NULL } */;
diff --git a/lib/glthread/thread.h b/lib/glthread/thread.h
index 2febe34..01ec45b 100644
--- a/lib/glthread/thread.h
+++ b/lib/glthread/thread.h
@@ -172,6 +172,15 @@ typedef pthread_t gl_thread_t;
# define gl_thread_self_pointer() \
(pthread_in_use () ? pthread_self ().p : NULL)
extern const gl_thread_t gl_null_thread;
+# elif defined(__MVS__)
+ /* On IBM z/OS, pthread_t is a struct with an 8-byte '__' field.
+ The first three bytes of this field appear to uniquely identify a
+ pthread_t, though not necessarily representing a pointer. */
+# define gl_thread_self() \
+ (pthread_in_use () ? pthread_self () : gl_null_thread)
+# define gl_thread_self_pointer() \
+ (pthread_in_use () ? *((void **) pthread_self ().__) : NULL)
+extern const gl_thread_t gl_null_thread;
# else
# define gl_thread_self() \
(pthread_in_use () ? pthread_self () : (pthread_t) NULL)
diff --git a/lib/math.in.h b/lib/math.in.h
index 62a089a..59293fd 100644
--- a/lib/math.in.h
+++ b/lib/math.in.h
@@ -406,6 +406,7 @@ _GL_WARN_ON_USE (ceilf, "ceilf is unportable - "
#if @GNULIB_CEIL@
# if @REPLACE_CEIL@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef ceil
# define ceil rpl_ceil
# endif
_GL_FUNCDECL_RPL (ceil, double, (double x));
@@ -753,6 +754,7 @@ _GL_WARN_ON_USE (floorf, "floorf is unportable - "
#if @GNULIB_FLOOR@
# if @REPLACE_FLOOR@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef floor
# define floor rpl_floor
# endif
_GL_FUNCDECL_RPL (floor, double, (double x));
@@ -973,6 +975,7 @@ _GL_WARN_ON_USE (frexpf, "frexpf is unportable - "
#if @GNULIB_FREXP@
# if @REPLACE_FREXP@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef frexp
# define frexp rpl_frexp
# endif
_GL_FUNCDECL_RPL (frexp, double, (double x, int *expptr) _GL_ARG_NONNULL ((2)));
@@ -1958,6 +1961,7 @@ _GL_WARN_ON_USE (tanhf, "tanhf is unportable - "
#if @GNULIB_TRUNCF@
# if @REPLACE_TRUNCF@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef truncf
# define truncf rpl_truncf
# endif
_GL_FUNCDECL_RPL (truncf, float, (float x));
@@ -1980,6 +1984,7 @@ _GL_WARN_ON_USE (truncf, "truncf is unportable - "
#if @GNULIB_TRUNC@
# if @REPLACE_TRUNC@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef trunc
# define trunc rpl_trunc
# endif
_GL_FUNCDECL_RPL (trunc, double, (double x));
diff --git a/lib/ptsname_r.c b/lib/ptsname_r.c
index faa33fb..809388a 100644
--- a/lib/ptsname_r.c
+++ b/lib/ptsname_r.c
@@ -34,6 +34,11 @@
# define _PATH_DEV "/dev/"
# endif
+# undef __set_errno
+# undef __stat
+# undef __ttyname_r
+# undef __ptsname_r
+
# define __set_errno(e) errno = (e)
# define __isatty isatty
# define __stat stat
diff --git a/lib/regex.h b/lib/regex.h
index 6f3bae3..64d7a43 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -23,6 +23,12 @@
#include <sys/types.h>
+/* IBM z/OS uses -D__string=1 as an inclusion guard. */
+#if defined(__MVS__) && defined(__string)
+# undef __string
+# define __string __string
+#endif
+
/* Allow the use in C++ code. */
#ifdef __cplusplus
extern "C" {
diff --git a/lib/string.in.h b/lib/string.in.h
index b3356bb..6359ea3 100644
--- a/lib/string.in.h
+++ b/lib/string.in.h
@@ -44,6 +44,12 @@
#ifndef _@GUARD_PREFIX@_STRING_H
#define _@GUARD_PREFIX@_STRING_H
+/* IBM z/OS uses -D__string=1 as an inclusion guard. */
+#if defined(__MVS__) && defined(__string)
+# undef __string
+# define __string __string
+#endif
+
/* NetBSD 5.0 mis-defines NULL. */
#include <stddef.h>
diff --git a/lib/strtod.c b/lib/strtod.c
index 9fd0170..09bc76a 100644
--- a/lib/strtod.c
+++ b/lib/strtod.c
@@ -239,7 +239,13 @@ strtod (const char *nptr, char **endptr)
if (*s == '0' && c_tolower (s[1]) == 'x')
{
if (! c_isxdigit (s[2 + (s[2] == '.')]))
- end = s + 1;
+ {
+ end = s + 1;
+
+ /* strtod() on z/OS is confused by "0x". */
+ if (errno == ERANGE)
+ errno = 0;
+ }
else if (end <= s + 2)
{
num = parse_number (s + 2, 16, 2, 4, 'p', &endbuf);
@@ -321,7 +327,7 @@ strtod (const char *nptr, char **endptr)
better to use the underlying implementation's result, since a
nice implementation populates the bits of the NaN according
to interpreting n-char-sequence as a hexadecimal number. */
- if (s != end)
+ if (s != end || !isnand(num))
num = NAN;
errno = saved_errno;
}
diff --git a/m4/fclose.m4 b/m4/fclose.m4
index 6bd1ad8..92ba457 100644
--- a/m4/fclose.m4
+++ b/m4/fclose.m4
@@ -17,4 +17,8 @@ AC_DEFUN([gl_FUNC_FCLOSE],
if test $REPLACE_CLOSE = 1; then
REPLACE_FCLOSE=1
fi
+
+ case "$host" in
+ *-ibm-openedition) REPLACE_FCLOSE=1 ;;
+ esac
])
diff --git a/m4/strstr.m4 b/m4/strstr.m4
index 040c0b9..e623e28 100644
--- a/m4/strstr.m4
+++ b/m4/strstr.m4
@@ -79,6 +79,11 @@ static void quit (int sig) { exit (sig + 128); }
char *needle = (char *) malloc (m + 2);
/* Failure to compile this test due to missing alarm is okay,
since all such platforms (mingw) also have quadratic strstr. */
+#ifdef __MVS__
+ /* Except for z/OS, which does not deliver signals while strstr()
+ is running (thanks to restrictions on its LE runtime). */
+ return 1;
+#endif
signal (SIGALRM, quit);
alarm (5);
/* Check for quadratic performance. */
diff --git a/m4/wchar_h.m4 b/m4/wchar_h.m4
index 9d1b0f8..35ece60 100644
--- a/m4/wchar_h.m4
+++ b/m4/wchar_h.m4
@@ -81,8 +81,14 @@ AC_DEFUN([gl_WCHAR_H_INLINE_OK],
extern int zero (void);
int main () { return zero(); }
]])])
+ dnl Do not rename the object file from conftest.$ac_objext to
+ dnl conftest1.$ac_objext, as this will cause the link to fail on
+ dnl z/OS when using the XPLINK object format (due to duplicate
+ dnl CSECT names). Instead, we temporarily redefine $ac_compile so
+ dnl that the object file has the latter name from the start.
+ save_ac_compile="$ac_compile"
+ ac_compile=`echo "$save_ac_compile" | sed s/conftest/conftest1/`
if AC_TRY_EVAL([ac_compile]); then
- mv conftest.$ac_objext conftest1.$ac_objext
AC_LANG_CONFTEST([
AC_LANG_SOURCE([[#define wcstod renamed_wcstod
/* Tru64 with Desktop Toolkit C has a bug: <stdio.h> must be included before
@@ -95,8 +101,9 @@ int main () { return zero(); }
#include <wchar.h>
int zero (void) { return 0; }
]])])
+ dnl See note above about renaming object files.
+ ac_compile=`echo "$save_ac_compile" | sed s/conftest/conftest2/`
if AC_TRY_EVAL([ac_compile]); then
- mv conftest.$ac_objext conftest2.$ac_objext
if $CC -o conftest$ac_exeext $CFLAGS $LDFLAGS conftest1.$ac_objext conftest2.$ac_objext $LIBS >&AS_MESSAGE_LOG_FD 2>&1; then
:
else
@@ -104,6 +111,7 @@ int zero (void) { return 0; }
fi
fi
fi
+ ac_compile="$save_ac_compile"
rm -f conftest1.$ac_objext conftest2.$ac_objext conftest$ac_exeext
])
if test $gl_cv_header_wchar_h_correct_inline = no; then
diff --git a/tests/infinity.h b/tests/infinity.h
index 45c30bd..4e8a755 100644
--- a/tests/infinity.h
+++ b/tests/infinity.h
@@ -17,8 +17,9 @@
/* Infinityf () returns a 'float' +Infinity. */
-/* The Microsoft MSVC 9 compiler chokes on the expression 1.0f / 0.0f. */
-#if defined _MSC_VER
+/* The Microsoft MSVC 9 compiler chokes on the expression 1.0f / 0.0f.
+ The IBM XL C compiler on z/OS complains. */
+#if defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static float
Infinityf ()
{
@@ -32,8 +33,9 @@ Infinityf ()
/* Infinityd () returns a 'double' +Infinity. */
-/* The Microsoft MSVC 9 compiler chokes on the expression 1.0 / 0.0. */
-#if defined _MSC_VER
+/* The Microsoft MSVC 9 compiler chokes on the expression 1.0 / 0.0.
+ The IBM XL C compiler on z/OS complains. */
+#if defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static double
Infinityd ()
{
@@ -47,9 +49,10 @@ Infinityd ()
/* Infinityl () returns a 'long double' +Infinity. */
-/* The Microsoft MSVC 9 compiler chokes on the expression 1.0L / 0.0L. */
-#if defined _MSC_VER
-static double
+/* The Microsoft MSVC 9 compiler chokes on the expression 1.0L / 0.0L.
+ The IBM XL C compiler on z/OS complains. */
+#if defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
+static long double
Infinityl ()
{
static long double zero = 0.0L;
diff --git a/tests/nan.h b/tests/nan.h
index 9f6819c..10b393e 100644
--- a/tests/nan.h
+++ b/tests/nan.h
@@ -15,11 +15,18 @@
along with this program. If not, see <http://www.gnu.org/licenses/>. */
+/* IBM z/OS supports both hexadecimal and IEEE floating-point formats. The
+ former does not support NaN and its isnan() implementation returns zero
+ for all values. */
+#if defined __MVS__ && defined __IBMC__ && !defined __BFP__
+# error "NaN is not supported with IBM's hexadecimal floating-point format; please re-compile with -qfloat=ieee"
+#endif
+
/* NaNf () returns a 'float' not-a-number. */
/* The Compaq (ex-DEC) C 6.4 compiler and the Microsoft MSVC 9 compiler choke
- on the expression 0.0 / 0.0. */
-#if defined __DECC || defined _MSC_VER
+ on the expression 0.0 / 0.0. The IBM XL C compiler on z/OS complains. */
+#if defined __DECC || defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static float
NaNf ()
{
@@ -34,8 +41,8 @@ NaNf ()
/* NaNd () returns a 'double' not-a-number. */
/* The Compaq (ex-DEC) C 6.4 compiler and the Microsoft MSVC 9 compiler choke
- on the expression 0.0 / 0.0. */
-#if defined __DECC || defined _MSC_VER
+ on the expression 0.0 / 0.0. The IBM XL C compiler on z/OS complains. */
+#if defined __DECC || defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static double
NaNd ()
{
@@ -51,14 +58,15 @@ NaNd ()
/* On Irix 6.5, gcc 3.4.3 can't compute compile-time NaN, and needs the
runtime type conversion.
- The Microsoft MSVC 9 compiler chokes on the expression 0.0L / 0.0L. */
+ The Microsoft MSVC 9 compiler chokes on the expression 0.0L / 0.0L.
+ The IBM XL C compiler on z/OS complains. */
#ifdef __sgi
static long double NaNl ()
{
double zero = 0.0;
return zero / zero;
}
-#elif defined _MSC_VER
+#elif defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static long double
NaNl ()
{
diff --git a/tests/test-c-ctype.c b/tests/test-c-ctype.c
index 81fe936..f7a2e39 100644
--- a/tests/test-c-ctype.c
+++ b/tests/test-c-ctype.c
@@ -24,6 +24,12 @@
#include "macros.h"
+#if 'A' < 0
+# define CHAR_LITERAL(C) ((C) >= 128 && (C) < 256 ? (C) - 256 : (C))
+#else
+# define CHAR_LITERAL(C) ((C) >= -128 && (C) < 0 ? (C) + 256 : (C))
+#endif
+
static void
test_all (void)
{
@@ -31,9 +37,11 @@ test_all (void)
for (c = -0x80; c < 0x100; c++)
{
+#ifdef C_CTYPE_ASCII
ASSERT (c_isascii (c) == (c >= 0 && c < 0x80));
+#endif
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -54,7 +62,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -73,7 +81,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case '\t': case ' ':
ASSERT (c_isblank (c) == 1);
@@ -83,9 +91,13 @@ test_all (void)
break;
}
+#ifdef C_CTYPE_ASCII
ASSERT (c_iscntrl (c) == ((c >= 0 && c < 0x20) || c == 0x7f));
+#else
+ ASSERT (!! c_iscntrl (c) == !! iscntrl (c & 0xff));
+#endif
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -96,7 +108,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
@@ -110,13 +122,27 @@ test_all (void)
break;
}
+#ifdef C_CTYPE_ASCII
ASSERT (c_isgraph (c) == ((c >= 0x20 && c < 0x7f) && c != ' '));
+#else
+ ASSERT (!! c_isgraph (c) == !! isgraph (c & 0xff));
+#endif
+#ifdef C_CTYPE_ASCII
ASSERT (c_isprint (c) == (c >= 0x20 && c < 0x7f));
+#else
+ ASSERT (!! c_isprint (c) == !! isprint (c & 0xff));
+#endif
+#ifdef C_CTYPE_ASCII
ASSERT (c_ispunct (c) == (c_isgraph (c) && !c_isalnum (c)));
+#else
+ /* EBCDIC contains characters like accented letters, which fail
+ the above test. */
+ ASSERT (!! c_ispunct (c) == !! ispunct (c & 0xff));
+#endif
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
ASSERT (c_isspace (c) == 1);
@@ -126,7 +152,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -140,7 +166,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -153,7 +179,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (CHAR_LITERAL (c))
{
case 'A':
ASSERT (c_tolower (c) == 'a');
diff --git a/tests/test-c-strcasecmp.c b/tests/test-c-strcasecmp.c
index f7f6b43..47feac8 100644
--- a/tests/test-c-strcasecmp.c
+++ b/tests/test-c-strcasecmp.c
@@ -19,6 +19,7 @@
#include <config.h>
#include "c-strcase.h"
+#include "c-ctype.h"
#include <locale.h>
#include <string.h>
@@ -57,9 +58,11 @@ main (int argc, char *argv[])
ASSERT (c_strcasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R") > 0); /* özgür */
ASSERT (c_strcasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r") < 0); /* özgür */
+#if C_CTYPE_ASCII
/* This test shows how strings of different size cannot compare equal. */
ASSERT (c_strcasecmp ("turkish", "TURK\304\260SH") < 0);
ASSERT (c_strcasecmp ("TURK\304\260SH", "turkish") > 0);
+#endif
return 0;
}
diff --git a/tests/test-c-strncasecmp.c b/tests/test-c-strncasecmp.c
index 4027b5b..20c64e3 100644
--- a/tests/test-c-strncasecmp.c
+++ b/tests/test-c-strncasecmp.c
@@ -19,6 +19,7 @@
#include <config.h>
#include "c-strcase.h"
+#include "c-ctype.h"
#include <locale.h>
#include <string.h>
@@ -71,9 +72,11 @@ main (int argc, char *argv[])
ASSERT (c_strncasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R", 99) > 0); /* özgür */
ASSERT (c_strncasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r", 99) < 0); /* özgür */
+#if C_CTYPE_ASCII
/* This test shows how strings of different size cannot compare equal. */
ASSERT (c_strncasecmp ("turkish", "TURK\304\260SH", 7) < 0);
ASSERT (c_strncasecmp ("TURK\304\260SH", "turkish", 7) > 0);
+#endif
return 0;
}
diff --git a/tests/test-canonicalize-lgpl.c b/tests/test-canonicalize-lgpl.c
index 12d2bb0..49c0221 100644
--- a/tests/test-canonicalize-lgpl.c
+++ b/tests/test-canonicalize-lgpl.c
@@ -191,12 +191,16 @@ main (void)
ASSERT (result2);
ASSERT (stat ("/", &st1) == 0);
ASSERT (stat ("//", &st2) == 0);
+ /* On IBM z/OS, "/" and "//" are distinct, yet they both have
+ st_dev == st_ino == 1. */
+#ifndef __MVS__
if (SAME_INODE (st1, st2))
{
ASSERT (strcmp (result1, "/") == 0);
ASSERT (strcmp (result2, "/") == 0);
}
else
+#endif
{
ASSERT (strcmp (result1, "//") == 0);
ASSERT (strcmp (result2, "//") == 0);
diff --git a/tests/test-iconv-utf.c b/tests/test-iconv-utf.c
index c1589f6..547e859 100644
--- a/tests/test-iconv-utf.c
+++ b/tests/test-iconv-utf.c
@@ -27,20 +27,39 @@
#include "macros.h"
+/* If we're compiling on an EBCDIC-based system, we need the test strings
+ to remain in ASCII. */
+#if 'A' != 0x41 && defined(__IBMC__)
+# pragma convert("ISO8859-1")
+# define CONVERT_ENABLED
+#endif
+
+/* The text is "Japanese (日本語) [\U0001D50D\U0001D51E\U0001D52D]". */
+
+const char test_utf8_string[] = "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+
+const char test_utf16be_string[] = "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]";
+
+const char test_utf16le_string[] = "J\000a\000p\000a\000n\000e\000s\000e\000 \000(\000\345\145\054\147\236\212)\000 \000[\000\065\330\015\335\065\330\036\335\065\330\055\335]\000";
+
+const char test_utf32be_string[] = "\000\000\000J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\145\345\000\000\147\054\000\000\212\236\000\000\000)\000\000\000 \000\000\000[\000\001\325\015\000\001\325\036\000\001\325\055\000\000\000]";
+
+const char test_utf32le_string[] = "J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\000\345\145\000\000\054\147\000\000\236\212\000\000)\000\000\000 \000\000\000[\000\000\000\015\325\001\000\036\325\001\000\055\325\001\000]\000\000\000";
+
+#ifdef CONVERT_ENABLED
+# pragma convert(pop)
+#endif
+
int
main ()
{
#if HAVE_ICONV
/* Assume that iconv() supports at least the encoding UTF-8. */
- /* The text is "Japanese (日本語) [\U0001D50D\U0001D51E\U0001D52D]". */
-
/* Test conversion from UTF-8 to UTF-16BE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]";
+#define input test_utf8_string
+#define expected test_utf16be_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -64,14 +83,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-8 to UTF-16LE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "J\000a\000p\000a\000n\000e\000s\000e\000 \000(\000\345\145\054\147\236\212)\000 \000[\000\065\330\015\335\065\330\036\335\065\330\055\335]\000";
+#define input test_utf8_string
+#define expected test_utf16le_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -95,14 +115,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-8 to UTF-32BE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "\000\000\000J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\145\345\000\000\147\054\000\000\212\236\000\000\000)\000\000\000 \000\000\000[\000\001\325\015\000\001\325\036\000\001\325\055\000\000\000]";
+#define input test_utf8_string
+#define expected test_utf32be_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -126,14 +147,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-8 to UTF-32LE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\000\345\145\000\000\054\147\000\000\236\212\000\000)\000\000\000 \000\000\000[\000\000\000\015\325\001\000\036\325\001\000\055\325\001\000]\000\000\000";
+#define input test_utf8_string
+#define expected test_utf32le_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -157,14 +179,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-16BE to UTF-8 with no errors. */
{
- static const char input[] =
- "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf16be_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -188,14 +211,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-16LE to UTF-8 with no errors. */
{
- static const char input[] =
- "J\000a\000p\000a\000n\000e\000s\000e\000 \000(\000\345\145\054\147\236\212)\000 \000[\000\065\330\015\335\065\330\036\335\065\330\055\335]\000";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf16le_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -219,14 +243,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-32BE to UTF-8 with no errors. */
{
- static const char input[] =
- "\000\000\000J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\145\345\000\000\147\054\000\000\212\236\000\000\000)\000\000\000 \000\000\000[\000\001\325\015\000\001\325\036\000\001\325\055\000\000\000]";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf32be_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -250,14 +275,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-32LE to UTF-8 with no errors. */
{
- static const char input[] =
- "J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\000\345\145\000\000\054\147\000\000\236\212\000\000)\000\000\000 \000\000\000[\000\000\000\015\325\001\000\036\325\001\000\055\325\001\000]\000\000\000";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf32le_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -281,6 +307,9 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
#endif
diff --git a/tests/test-iconv.c b/tests/test-iconv.c
index ed715bd..a64c6dd 100644
--- a/tests/test-iconv.c
+++ b/tests/test-iconv.c
@@ -44,8 +44,14 @@ main ()
#if HAVE_ICONV
/* Assume that iconv() supports at least the encodings ASCII, ISO-8859-1,
and UTF-8. */
- iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO-8859-1");
- iconv_t cd_utf8_to_88591 = iconv_open ("ISO-8859-1", "UTF-8");
+ iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO8859-1");
+ iconv_t cd_utf8_to_88591 = iconv_open ("ISO8859-1", "UTF-8");
+
+#if defined __MVS__ && defined __IBMC__
+ /* String literals below are in ASCII, not EBCDIC. */
+# pragma convert("ISO8859-1")
+# define CONVERT_ENABLED
+#endif
ASSERT (cd_88591_to_utf8 != (iconv_t)(-1));
ASSERT (cd_utf8_to_88591 != (iconv_t)(-1));
@@ -142,7 +148,12 @@ main ()
iconv_close (cd_88591_to_utf8);
iconv_close (cd_utf8_to_88591);
+
+#ifdef CONVERT_ENABLED
+# pragma convert(pop)
#endif
+#endif /* HAVE_ICONV */
+
return 0;
}
diff --git a/tests/test-nonblocking-pipe.h b/tests/test-nonblocking-pipe.h
index 5b3646e..01c992c 100644
--- a/tests/test-nonblocking-pipe.h
+++ b/tests/test-nonblocking-pipe.h
@@ -31,10 +31,11 @@
OSF/1 >= 262145
Solaris <= 7 >= 10241
Solaris >= 8 >= 20481
+ z/OS >= 131073
Cygwin >= 65537
native Windows >= 4097 (depends on the _pipe argument)
*/
-#if defined __osf__ || (defined __linux__ && (defined __ia64__ || defined __mips__))
+#if defined __MVS__ || defined __osf__ || (defined __linux__ && (defined __ia64__ || defined __mips__))
# define PIPE_DATA_BLOCK_SIZE 270000
#elif defined __linux__ && defined __sparc__
# define PIPE_DATA_BLOCK_SIZE 140000
diff --git a/tests/test-nonblocking-reader.h b/tests/test-nonblocking-reader.h
index 8cba131..d8eaa32 100644
--- a/tests/test-nonblocking-reader.h
+++ b/tests/test-nonblocking-reader.h
@@ -110,7 +110,7 @@ full_read_from_nonblocking_fd (size_t fd, void *buf, size_t count)
ASSERT (spent_time < 0.5);
if (ret < 0)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
usleep (SMALL_DELAY);
}
else
diff --git a/tests/test-nonblocking-writer.h b/tests/test-nonblocking-writer.h
index 0ecf996..ff148dc 100644
--- a/tests/test-nonblocking-writer.h
+++ b/tests/test-nonblocking-writer.h
@@ -124,7 +124,7 @@ main_writer_loop (int test, size_t data_block_size, int fd,
(long) ret, dbgstrerror (ret < 0, saved_errno));
if (ret < 0 && bytes_written >= data_block_size)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
ASSERT (spent_time < 0.5);
break;
}
@@ -133,7 +133,7 @@ main_writer_loop (int test, size_t data_block_size, int fd,
ASSERT (spent_time < 0.5);
if (ret < 0)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
usleep (SMALL_DELAY);
}
else
@@ -165,7 +165,7 @@ main_writer_loop (int test, size_t data_block_size, int fd,
ASSERT (spent_time < 0.5);
if (ret < 0)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
usleep (SMALL_DELAY);
}
else
diff --git a/tests/test-sigpipe.sh b/tests/test-sigpipe.sh
index bc2baf2..6cf3242 100755
--- a/tests/test-sigpipe.sh
+++ b/tests/test-sigpipe.sh
@@ -21,7 +21,7 @@ fi
# Test signal's behaviour when a handler is installed.
tmpfiles="$tmpfiles t-sigpipeC.tmp"
-./test-sigpipe${EXEEXT} B 2> t-sigpipeC.tmp | head -1 > /dev/null
+./test-sigpipe${EXEEXT} C 2> t-sigpipeC.tmp | head -1 > /dev/null
if test -s t-sigpipeC.tmp; then
LC_ALL=C tr -d '\r' < t-sigpipeC.tmp
rm -fr $tmpfiles; exit 1
diff --git a/tests/test-wcwidth.c b/tests/test-wcwidth.c
index 9fad785..fdbecc3 100644
--- a/tests/test-wcwidth.c
+++ b/tests/test-wcwidth.c
@@ -26,6 +26,7 @@ SIGNATURE_CHECK (wcwidth, int, (wchar_t));
#include <locale.h>
#include <string.h>
+#include "c-ctype.h"
#include "localcharset.h"
#include "macros.h"
@@ -34,9 +35,11 @@ main ()
{
wchar_t wc;
+#ifdef C_CTYPE_ASCII
/* Test width of ASCII characters. */
for (wc = 0x20; wc < 0x7F; wc++)
ASSERT (wcwidth (wc) == 1);
+#endif
/* Switch to an UTF-8 locale. */
if (setlocale (LC_ALL, "fr_FR.UTF-8") != NULL
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 2:28 [PATCH] IBM z/OS + EBCDIC support Daniel Richard G.
@ 2015-09-22 15:23 ` Eric Blake
2015-09-22 19:27 ` Daniel Richard G.
2015-09-22 19:32 ` Paul Eggert
2015-09-22 19:50 ` [PATCH] IBM z/OS + EBCDIC support Paul Eggert
2 siblings, 1 reply; 49+ messages in thread
From: Eric Blake @ 2015-09-22 15:23 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 3950 bytes --]
On 09/21/2015 08:28 PM, Daniel Richard G. wrote:
> Hello list,
>
> The attached patch, against Git master, addresses numerous
> incompatibilities in Gnulib with IBM z/OS (a mainframe operating system)
> and the EBCDIC encoding.
>
> With my changes, Gnulib builds successfully, and most of the tests
> succeed. The remaining failures are as follows.
Thanks for the work. Can you please split the patch into a series of
multiple pieces, one patch per issue, so that we can apply the
obviously-correct ones while still discussing the other pieces, rather
than holding the entire large patch hostage to review?
Also, while I see you have copyright assignment on file for Gawk, I
don't see it for gnulib. You'll want to repeat the assignment process
for gnulib before we can take more than the most trivial patches.
Some quick comments, without having reviewed any code:
>
> * In EBCDIC, normal chars like 'A' occur in the upper half of the 8-bit
> range. This interferes with the idiom of using "switch (c)" and then
> "case 'A':" et al. because c can have two distinct values (-63 and
> 193) that should match to 'A'.
>
> My fix, then, is a macro which converts the input codepoint to the
> range that will match literal chars, when necessary. (Obviously, in
> ASCII, it's a no-op.) Any takers on a better name for this macro than
> CHAR_LITERAL()?
coreutils uses to_uchar() to force the conversion of a byte to an
unsigned character, useful for cases where sign extension of a byte is
not desired. Sounds like it does the same thing as what you are doing here.
> +++ lib/math.in.h
>
> * The system defines these functions as macros, and the compiler did not
> like seeing them redefined.
No underlying functions with linkage? POSIX generally requires that, so
you may want to submit a bug, but it's certainly not the first time
we've worked around that.
>
> +++ lib/regex.h
>
> * Ensure that "__string" does not expand to "1" when it is used as a
> formal parameter name.
Sounds like we shouldn't be naming our formal parameter __string, since
that's a name reserved to the internal implementation namespace.
>
> +++ m4/strstr.m4
>
> * The IBM runtime sucks; signal delivery is delayed until strstr()
> exits, so this test results in a hang that can only be SIGKILL'ed.
Not a hang, just a reallllllly long execution time; and all because the
libc implementation is O(n^2) instead of O(n). But they really block
signals during the call? Ouch.
> +++ tests/nan.h
>
> * z/OS, in addition to supporting IEEE floating-point, also supports an
> older "hexadecimal" format that does not support NaN. Bomb out if this
> is in use.
C, and POSIX, allow for platforms without NaN (in part because of cases
like the z/OS non-IEEE mode). I'm not surprised if we have baked in
assumptions that don't hold when IEEE is not around.
> +++ tests/test-c-strcasecmp.c
>
> * In EBCDIC-1047, the tests
>
> ASSERT (c_strcasecmp ("turkish", "TURK\304\260SH") < 0);
> ASSERT (c_strcasecmp ("TURK\304\260SH", "turkish") > 0);
>
> are actually
>
> ASSERT (c_strcasecmp ("turkish", "TURKD¬SH") < 0);
> ASSERT (c_strcasecmp ("TURKD¬SH", "turkish") > 0);
>
> which, of course, fail.
Basically, EBCDIC lacks the Turkish i, and since it is not a UTF-8
locale, we should probably be skipping the test in that environment.
> +++ tests/test-canonicalize-lgpl.c
>
> * Addressed a strange z/OS corner case. This system has
> DOUBLE_SLASH_IS_DISTINCT_ROOT, yet the dev/ino numbers are the same.
What? Does that mean 'ls -a /' and 'ls -a //' see different contents?
If they do, then sharing dev/ino is a bug; if they are identical, then
DOUBLE_SLASH_IS_DISTINCT_ROOT is defined incorrectly.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 15:23 ` Eric Blake
@ 2015-09-22 19:27 ` Daniel Richard G.
2015-09-22 20:00 ` Paul Eggert
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-22 19:27 UTC (permalink / raw)
To: Eric Blake, bug-gnulib
Hi Eric,
On Tue, 2015 Sep 22 09:23-0600, Eric Blake wrote:
>
> Thanks for the work. Can you please split the patch into a series of
> multiple pieces, one patch per issue, so that we can apply the obviously-
> correct ones while still discussing the other pieces, rather than
> holding the entire large patch hostage to review?
Wouldn't it be easier to apply everything to a feature branch, and
integrate it bit by bit? I can split up the patch, but my idea of a
sensible partitioning might not agree with yours...
> Also, while I see you have copyright assignment on file for Gawk, I
> don't see it for gnulib. You'll want to repeat the assignment process
> for gnulib before we can take more than the most trivial patches.
Okay, I've sent in the request form.
> coreutils uses to_uchar() to force the conversion of a byte to an
> unsigned character, useful for cases where sign extension of a byte
> is not desired. Sounds like it does the same thing as what you are
> doing here.
Kind of, except that character literals may be signed, and thus
potentially have a negative value. to_uchar() could be applied to both
sides, but then that wouldn't work in a "switch" block.
What my macro does, then, is "convert to the value that a matching char
literal would have, if it's not there already."
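As a rough illustration of that idea (this is not the macro from the patch; the names char_literal_value and is_letter_a are made up for this sketch), the conversion could look something like this in plain C, assuming 8-bit bytes:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of Gnulib: map an int that was promoted
   from either a signed or an unsigned char to the value that a plain
   char literal for the same byte would have.  */
static int
char_literal_value (int c)
{
  enum { nchars = UCHAR_MAX + 1 };

  /* Only values that can come from some kind of char are meaningful.  */
  assert (SCHAR_MIN <= c && c <= UCHAR_MAX);

  /* Plain char is signed: fold 128..255 down to -128..-1.  */
  if (CHAR_MIN < 0 && CHAR_MAX < c && c < nchars)
    return c - nchars;

  /* Plain char is unsigned: fold -128..-1 up to 128..255.  */
  if (CHAR_MIN == 0 && SCHAR_MIN <= c && c < 0)
    return c + nchars;

  return c;
}

/* Example use: a switch that matches no matter how the caller
   obtained the byte.  */
static int
is_letter_a (int c)
{
  switch (char_literal_value (c))
    {
    case 'A': case 'a':
      return 1;
    default:
      return 0;
    }
}
```

On an ASCII system the helper changes nothing for letters, since they all sit below 128; the folding only matters where letters live in the upper half of the byte range, as in EBCDIC.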
> > +++ lib/math.in.h
> >
> > * The system defines these functions as macros, and the compiler did
> > not like seeing them redefined.
>
> No underlying functions with linkage? POSIX generally requires that,
> so you may want to submit a bug, but it's certainly not the first time
> we've worked around that.
It wasn't that the functions had no linkage (though that may or may not
be the case), just that the compiler borked on the macro redefinition.
> > +++ lib/regex.h
> >
> > * Ensure that "__string" does not expand to "1" when it is used as a
> > formal parameter name.
>
> Sounds like we shouldn't be naming our formal parameter __string,
> since that's a name reserved to the internal implementation namespace.
That would be a better fix, yes. But doesn't this file come from glibc?
Is it feasible to make such a change happen there?
(The Gawk maintainer was reluctant to make local changes to that
project's regex files.)
> > +++ m4/strstr.m4
> >
> > * The IBM runtime sucks; signal delivery is delayed until strstr()
> > exits, so this test results in a hang that can only be SIGKILL'ed.
>
> Not a hang, just a reallllllly long execution time; and all because
> the libc implementation is O(n^2) instead of O(n). But they really
> block signals during the call? Ouch.
Yes, that was one of many forehead-slapping moments in this work :>
And the silly thing is, if you provide your own implementation of
strstr(), it won't have this problem, because it won't be in the
system runtime where signal delivery is verboten!
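For reference, here is a minimal sketch of the kind of worst-case input that makes a quadratic strstr() crawl, guarded by alarm(). The function name strstr_worst_case and the 5-second timeout are assumptions of this sketch, not text copied from m4/strstr.m4:

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical probe: search for M 'A's followed by 'B' inside 2*M 'A's.
   The needle never matches, and a quadratic strstr needs roughly M*M
   comparisons to find that out.  Returns 1 if strstr reports "not
   found", 0 if it reports a match, -1 on allocation failure.  */
static int
strstr_worst_case (size_t m)
{
  char *haystack;
  char *needle;
  int result = -1;

  assert (m > 0);
  haystack = malloc (2 * m + 1);
  needle = malloc (m + 2);

  if (haystack != NULL && needle != NULL)
    {
      memset (haystack, 'A', 2 * m);
      haystack[2 * m] = '\0';
      memset (needle, 'A', m);
      needle[m] = 'B';
      needle[m + 1] = '\0';

      /* Kill the process if the search takes too long.  This only helps
         if the signal can be delivered while strstr is running.  */
      signal (SIGALRM, SIG_DFL);
      alarm (5);
      result = strstr (haystack, needle) == NULL;
      alarm (0);
    }

  free (haystack);
  free (needle);
  return result;
}
```

With a linear-time strstr() even a large m returns almost at once. On a runtime that holds signals until strstr() returns, the alarm cannot cut the search short, which matches the behavior described above.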
> > +++ tests/nan.h
> >
> > * z/OS, in addition to supporting IEEE floating-point, also supports
> > an older "hexadecimal" format that does not support NaN. Bomb out
> > if this is in use.
>
> C, and POSIX, allow for platforms without NaN (in part because of
> cases like the z/OS non-IEEE mode). I'm not surprised if we have
> baked in assumptions that don't hold when IEEE is not around.
If you'd like to disable the NaN stuff cleanly when IEEE is not in use,
I'd be happy to help make that happen. But for my purposes, I want the
same floating-point gestalt as all other platforms of interest have, so
I punted on the hex floats.
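A portable way to notice the situation, sketched here with made-up function names and using only standard C facilities (not the z/OS-specific check in the patch), is to look at FLT_RADIX and at whether <math.h> defines NAN:

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* FLT_RADIX is 2 for IEEE 754 binary formats and 16 for IBM
   hexadecimal floating point.  */
static int
float_radix_is_binary (void)
{
  return FLT_RADIX == 2;
}

/* C99 defines the NAN macro only if the implementation supports quiet
   NaNs for float.  Returns 1 if a NaN is available and compares
   unequal to itself, 0 otherwise.  */
static int
quiet_nan_available (void)
{
#ifdef NAN
  volatile float x = NAN;
  return x != x;
#else
  return 0;
#endif
}

/* Hypothetical run-time equivalent of "bomb out": abort when the
   floating-point format cannot represent NaN.  */
static void
check_nan_prerequisites (void)
{
  assert (float_radix_is_binary ());
  assert (quiet_nan_available ());
}
```

A test that needs NaN could call check_nan_prerequisites() first, or use the two predicates to skip itself instead of failing.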
> > ASSERT (c_strcasecmp ("turkish", "TURKD¬SH") < 0);
> > ASSERT (c_strcasecmp ("TURKD¬SH", "turkish") > 0);
> >
> > which, of course, fail.
>
> Basically, EBCDIC lacks the Turkish i, and since it is not a UTF-8
> locale, we should probably be skipping the test in that environment.
I see no harm in checking for unexpected-UTF-8 behavior; it's just the
fact that this is not ASCII that is throwing things off.
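To make the byte-level point concrete: the escapes \304\260 are the UTF-8 encoding of U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE), and they only mean that when the bytes are read as UTF-8. A small sketch, with hypothetical function names, that decodes the pair:

```c
#include <assert.h>

/* Decode a two-byte UTF-8 sequence into its code point.  Returns -1 if
   the bytes are not a well-formed two-byte sequence.  */
static long
decode_utf8_2 (unsigned char b0, unsigned char b1)
{
  long cp;

  if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
    return -1;
  cp = ((long) (b0 & 0x1F) << 6) | (b1 & 0x3F);
  return cp < 0x80 ? -1 : cp;   /* reject overlong encodings */
}

/* The two bytes written as "\304\260" in the test strings.  */
static long
turkish_test_code_point (void)
{
  long cp = decode_utf8_2 (0304, 0260);
  assert (cp == 0x130);   /* U+0130 */
  return cp;
}
```

Because the octal escapes are raw byte values, they stay 0xC4 0xB0 on an EBCDIC compiler too; it is the surrounding literal letters such as "TURK" and "SH" whose byte values change.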
> > +++ tests/test-canonicalize-lgpl.c
> >
> > * Addressed a strange z/OS corner case. This system has
> > DOUBLE_SLASH_IS_DISTINCT_ROOT, yet the dev/ino numbers are the
> > same.
>
> What? Does that mean 'ls -a /' and 'ls -a //' see different contents?
> If they do, then sharing dev/ino is a bug; if they are identical, then
> DOUBLE_SLASH_IS_DISTINCT_ROOT is defined incorrectly.
Well... those two "ls" commands give the same output, but the
double-slash is used within z/OS Unix System Services to refer to a
different sort of file space:
http://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxa500/mvsds.htm?lang=en
Some commands support that syntax, but not all. As for what the
configure test does...
$ ls -di / //
1 / 1 //
$ wc /dev/null
0 0 0 /dev/null
$ wc //dev/null
wc: file "//dev/null": EDC5047I An invalid file name was specified as a function parameter.
And yet...
$ cd //dev
$ ls -l null
crwxrwxrwx 1 BPXROOT SYS1 4, 0 Sep 3 2013 null
It's not clear exactly how this "alternate root" is implemented---
possibly by intercepting pathnames in open(). Might be worth
special-casing the DOUBLE_SLASH test for this platform...
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 2:28 [PATCH] IBM z/OS + EBCDIC support Daniel Richard G.
2015-09-22 15:23 ` Eric Blake
@ 2015-09-22 19:32 ` Paul Eggert
2015-09-22 19:46 ` Paul Eggert
2015-09-22 20:37 ` Daniel Richard G.
2015-09-22 19:50 ` [PATCH] IBM z/OS + EBCDIC support Paul Eggert
2 siblings, 2 replies; 49+ messages in thread
From: Paul Eggert @ 2015-09-22 19:32 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]
Thanks for looking into this. I have some questions about the c-ctype
changes. It appears that the proposed patch defers to the system
functions (which use the current locale), but that's not the intent of
c-ctype: it's supposed to correspond to a stripped down POSIX "C" locale
regardless of the current locale settings. Is there something special
in z/OS that requires using the system functions? (E.g., does the "C"
locale behave differently depending on some *other* setting regarding
character set?)
With the above in mind, it's not clear what c_isascii should do. Should
it return 1 for bytes in the range 0..127, or for bytes that correspond
to ASCII bytes if one assumes the standard translation from EBCDIC code
page 037 to ASCII? (Is there a standard?) If the former, the current
code is OK; if the latter, does the system isascii always return the
same results regardless of locale and do these results make sense?
Anyway, in looking through the code I see that it's hard to test a port
to EBCDIC because the code uses ifdef rather than if. I do see some of
the promotion bugs that you noted, but we can fix these with inline
functions rather than macros (cleaner and safer nowadays). There are
also a few other style glitches (e.g., boolean values, overuse of >=),
so I installed the attached patch. This patch assumes EBCDIC control
characters are either less than ' ' or are all 1 bits, which I think is
right. The patch also tightens up the tests a bit.
This patch doesn't address the isascii problem, nor the "something
special in z/OS" problem, so quite possibly further patches will be
needed to this module.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-c-ctype-port-better-to-EBCDIC.patch --]
[-- Type: text/x-patch; name="0001-c-ctype-port-better-to-EBCDIC.patch", Size: 20517 bytes --]
>From 1b0f778e32f73c8601e7c517a0b83098996363a9 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 22 Sep 2015 12:17:06 -0700
Subject: [PATCH] c-ctype: port better to EBCDIC
Problems reported by Daniel Richard G. in
http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00020.html
* lib/c-ctype.c: Include <limits.h>, for CHAR_MIN and CHAR_MAX.
Include "verify.h".
(C_CTYPE_ASCII, C_CTYPE_CONSECUTIVE_DIGITS)
(C_CTYPE_CONSECUTIVE_LOWERCASE, C_CTYPE_CONSECUTIVE_UPPERCASE):
Define as enum constants with value false, if not defined, so that
code can use 'if' instead of 'ifdef'. Using 'if' helps make the
code more portable, as both branches of the 'if' are compiled on
all platforms.
(C_CTYPE_EBCDIC): New constant.
(to_char): New static function.
(c_isalnum, c_isalpha, c_isdigit, c_islower, c_isgraph, c_isprint)
(c_ispunct, c_isupper, c_isxdigit, c_tolower, c_toupper):
Rewrite to use 'if' instead of 'ifdef'.
Use to_char if non-ASCII. Prefer <= to >=.
Prefer true and false to 1 and 0, for booleans.
(c_iscntrl): Use 'if', not 'ifdef'. Special case for EBCDIC.
Verify that the character set is either ASCII or EBCDIC.
* tests/test-c-ctype.c: Include <limits.h>, for CHAR_MIN.
(to_char): New function.
(test_all): Port to EBCDIC. Add some more tests, e.g., for c_ispunct.
---
ChangeLog | 26 ++++++
lib/c-ctype.c | 253 ++++++++++++++++++++++++++-------------------------
tests/test-c-ctype.c | 106 +++++++++++----------
3 files changed, 216 insertions(+), 169 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index c552225..8723b38 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
+2015-09-22 Paul Eggert <eggert@cs.ucla.edu>
+
+ c-ctype: port better to EBCDIC
+ Problems reported by Daniel Richard G. in
+ http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00020.html
+ * lib/c-ctype.c: Include <limits.h>, for CHAR_MIN and CHAR_MAX.
+ Include "verify.h".
+ (C_CTYPE_ASCII, C_CTYPE_CONSECUTIVE_DIGITS)
+ (C_CTYPE_CONSECUTIVE_LOWERCASE, C_CTYPE_CONSECUTIVE_UPPERCASE):
+ Define as enum constants with value false, if not defined, so that
+ code can use 'if' instead of 'ifdef'. Using 'if' helps make the
+ code more portable, as both branches of the 'if' are compiled on
+ all platforms.
+ (C_CTYPE_EBCDIC): New constant.
+ (to_char): New static function.
+ (c_isalnum, c_isalpha, c_isdigit, c_islower, c_isgraph, c_isprint)
+ (c_ispunct, c_isupper, c_isxdigit, c_tolower, c_toupper):
+ Rewrite to use 'if' instead of 'ifdef'.
+ Use to_char if non-ASCII. Prefer <= to >=.
+ Prefer true and false to 1 and 0, for booleans.
+ (c_iscntrl): Use 'if', not 'ifdef'. Special case for EBCDIC.
+ Verify that the character set is either ASCII or EBCDIC.
+ * tests/test-c-ctype.c: Include <limits.h>, for CHAR_MIN.
+ (to_char): New function.
+ (test_all): Port to EBCDIC. Add some more tests, e.g., for c_ispunct.
+
2015-09-21 Pádraig Brady <P@draigBrady.com>
nanosleep: fix return code for interrupted replacement
diff --git a/lib/c-ctype.c b/lib/c-ctype.c
index 6635d34..916d46e 100644
--- a/lib/c-ctype.c
+++ b/lib/c-ctype.c
@@ -21,6 +21,34 @@ along with this program; if not, see <http://www.gnu.org/licenses/>. */
#define NO_C_CTYPE_MACROS
#include "c-ctype.h"
+#include <limits.h>
+#include "verify.h"
+
+#ifndef C_CTYPE_ASCII
+enum { C_CTYPE_ASCII = false };
+#endif
+#ifndef C_CTYPE_CONSECUTIVE_DIGITS
+enum { C_CTYPE_CONSECUTIVE_DIGITS = false };
+#endif
+#ifndef C_CTYPE_CONSECUTIVE_LOWERCASE
+enum { C_CTYPE_CONSECUTIVE_LOWERCASE = false };
+#endif
+#ifndef C_CTYPE_CONSECUTIVE_UPPERCASE
+enum { C_CTYPE_CONSECUTIVE_UPPERCASE = false };
+#endif
+
+/* Convert an int, which may be promoted from either an unsigned or a
+ signed char, to the corresponding char. */
+
+static char
+to_char (int c)
+{
+ enum { nchars = CHAR_MAX - CHAR_MIN + 1 };
+ if (CHAR_MIN < 0 && CHAR_MAX < c && c < nchars)
+ return c - nchars;
+ return c;
+}
+
/* The function isascii is not locale dependent. Its use in EBCDIC is
questionable. */
bool
@@ -32,18 +60,20 @@ c_isascii (int c)
bool
c_isalnum (int c)
{
-#if C_CTYPE_CONSECUTIVE_DIGITS \
- && C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
-#if C_CTYPE_ASCII
- return ((c >= '0' && c <= '9')
- || ((c & ~0x20) >= 'A' && (c & ~0x20) <= 'Z'));
-#else
- return ((c >= '0' && c <= '9')
- || (c >= 'A' && c <= 'Z')
- || (c >= 'a' && c <= 'z'));
-#endif
-#else
- switch (c)
+ if (C_CTYPE_CONSECUTIVE_DIGITS
+ && C_CTYPE_CONSECUTIVE_UPPERCASE
+ && C_CTYPE_CONSECUTIVE_LOWERCASE)
+ {
+ if (C_CTYPE_ASCII)
+ return (('0' <= c && c <= '9')
+ || ('A' <= (c & ~0x20) && (c & ~0x20) <= 'Z'));
+ else
+ return (('0' <= c && c <= '9')
+ || ('A' <= c && c <= 'Z')
+ || ('a' <= c && c <= 'z'));
+ }
+
+ switch (to_char (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -57,24 +87,24 @@ c_isalnum (int c)
case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
case 's': case 't': case 'u': case 'v': case 'w': case 'x':
case 'y': case 'z':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
c_isalpha (int c)
{
-#if C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
-#if C_CTYPE_ASCII
- return ((c & ~0x20) >= 'A' && (c & ~0x20) <= 'Z');
-#else
- return ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
-#endif
-#else
- switch (c)
+ if (C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE)
+ {
+ if (C_CTYPE_ASCII)
+ return 'A' <= (c & ~0x20) && (c & ~0x20) <= 'Z';
+ else
+ return ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z');
+ }
+
+ switch (to_char (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -86,11 +116,10 @@ c_isalpha (int c)
case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
case 's': case 't': case 'u': case 'v': case 'w': case 'x':
case 'y': case 'z':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
@@ -102,81 +131,65 @@ c_isblank (int c)
bool
c_iscntrl (int c)
{
-#if C_CTYPE_ASCII
- return ((c & ~0x1f) == 0 || c == 0x7f);
-#else
- switch (c)
- {
- case ' ': case '!': case '"': case '#': case '$': case '%':
- case '&': case '\'': case '(': case ')': case '*': case '+':
- case ',': case '-': case '.': case '/':
- case '0': case '1': case '2': case '3': case '4': case '5':
- case '6': case '7': case '8': case '9':
- case ':': case ';': case '<': case '=': case '>': case '?':
- case '@':
- case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
- case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
- case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
- case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
- case 'Y': case 'Z':
- case '[': case '\\': case ']': case '^': case '_': case '`':
- case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
- case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
- case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
- case 's': case 't': case 'u': case 'v': case 'w': case 'x':
- case 'y': case 'z':
- case '{': case '|': case '}': case '~':
- return 0;
- default:
- return 1;
- }
-#endif
+ enum { C_CTYPE_EBCDIC = (' ' == 64 && '0' == 240
+ && 'A' == 193 && 'J' == 209 && 'S' == 226
+ && 'a' == 129 && 'j' == 145 && 's' == 162) };
+ verify (C_CTYPE_ASCII || C_CTYPE_EBCDIC);
+
+ if (0 <= c && c < ' ')
+ return true;
+ if (C_CTYPE_ASCII)
+ return c == 0x7f;
+ else
+ return c == 0xff || c == -1;
}
bool
c_isdigit (int c)
{
-#if C_CTYPE_CONSECUTIVE_DIGITS
- return (c >= '0' && c <= '9');
-#else
+ if (C_CTYPE_ASCII)
+ return '0' <= c && c <= '9';
+
+ c = to_char (c);
+ if (C_CTYPE_CONSECUTIVE_DIGITS)
+ return '0' <= c && c <= '9';
+
switch (c)
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
c_islower (int c)
{
-#if C_CTYPE_CONSECUTIVE_LOWERCASE
- return (c >= 'a' && c <= 'z');
-#else
- switch (c)
+ if (C_CTYPE_CONSECUTIVE_LOWERCASE)
+ return 'a' <= c && c <= 'z';
+
+ switch (to_char (c))
{
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
case 's': case 't': case 'u': case 'v': case 'w': case 'x':
case 'y': case 'z':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
c_isgraph (int c)
{
-#if C_CTYPE_ASCII
- return (c >= '!' && c <= '~');
-#else
- switch (c)
+ if (C_CTYPE_ASCII)
+ return '!' <= c && c <= '~';
+
+ switch (to_char (c))
{
case '!': case '"': case '#': case '$': case '%': case '&':
case '\'': case '(': case ')': case '*': case '+': case ',':
@@ -197,20 +210,19 @@ c_isgraph (int c)
case 's': case 't': case 'u': case 'v': case 'w': case 'x':
case 'y': case 'z':
case '{': case '|': case '}': case '~':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
c_isprint (int c)
{
-#if C_CTYPE_ASCII
- return (c >= ' ' && c <= '~');
-#else
- switch (c)
+ if (C_CTYPE_ASCII)
+ return ' ' <= c && c <= '~';
+
+ switch (to_char (c))
{
case ' ': case '!': case '"': case '#': case '$': case '%':
case '&': case '\'': case '(': case ')': case '*': case '+':
@@ -231,22 +243,21 @@ c_isprint (int c)
case 's': case 't': case 'u': case 'v': case 'w': case 'x':
case 'y': case 'z':
case '{': case '|': case '}': case '~':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
c_ispunct (int c)
{
-#if C_CTYPE_ASCII
- return ((c >= '!' && c <= '~')
- && !((c >= '0' && c <= '9')
- || ((c & ~0x20) >= 'A' && (c & ~0x20) <= 'Z')));
-#else
- switch (c)
+ if (C_CTYPE_ASCII)
+ return (('!' <= c && c <= '~')
+ && !(('0' <= c && c <= '9')
+ || ('A' <= (c & ~0x20) && (c & ~0x20) <= 'Z')));
+
+ switch (to_char (c))
{
case '!': case '"': case '#': case '$': case '%': case '&':
case '\'': case '(': case ')': case '*': case '+': case ',':
@@ -255,11 +266,10 @@ c_ispunct (int c)
case '@':
case '[': case '\\': case ']': case '^': case '_': case '`':
case '{': case '|': case '}': case '~':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
@@ -272,57 +282,56 @@ c_isspace (int c)
bool
c_isupper (int c)
{
-#if C_CTYPE_CONSECUTIVE_UPPERCASE
- return (c >= 'A' && c <= 'Z');
-#else
- switch (c)
+ if (C_CTYPE_CONSECUTIVE_UPPERCASE)
+ return 'A' <= c && c <= 'Z';
+
+ switch (to_char (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
case 'Y': case 'Z':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
bool
c_isxdigit (int c)
{
-#if C_CTYPE_CONSECUTIVE_DIGITS \
- && C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
-#if C_CTYPE_ASCII
- return ((c >= '0' && c <= '9')
- || ((c & ~0x20) >= 'A' && (c & ~0x20) <= 'F'));
-#else
- return ((c >= '0' && c <= '9')
- || (c >= 'A' && c <= 'F')
- || (c >= 'a' && c <= 'f'));
-#endif
-#else
- switch (c)
+ if (C_CTYPE_CONSECUTIVE_DIGITS
+ && C_CTYPE_CONSECUTIVE_UPPERCASE
+ && C_CTYPE_CONSECUTIVE_LOWERCASE)
+ {
+ if ('0' <= c && c <= '9')
+ return true;
+ if (C_CTYPE_ASCII)
+ return 'A' <= (c & ~0x20) && (c & ~0x20) <= 'F';
+ return (('A' <= c && c <= 'F')
+ || ('a' <= c && c <= 'f'));
+ }
+
+ switch (to_char (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
-#endif
}
int
c_tolower (int c)
{
-#if C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
- return (c >= 'A' && c <= 'Z' ? c - 'A' + 'a' : c);
-#else
- switch (c)
+ if (C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE)
+ return c_isupper (c) ? c - 'A' + 'a' : c;
+
+ switch (to_char (c))
{
case 'A': return 'a';
case 'B': return 'b';
@@ -352,16 +361,15 @@ c_tolower (int c)
case 'Z': return 'z';
default: return c;
}
-#endif
}
int
c_toupper (int c)
{
-#if C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
- return (c >= 'a' && c <= 'z' ? c - 'a' + 'A' : c);
-#else
- switch (c)
+ if (C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE)
+ return c_islower (c) ? c - 'a' + 'A' : c;
+
+ switch (to_char (c))
{
case 'a': return 'A';
case 'b': return 'B';
@@ -391,5 +399,4 @@ c_toupper (int c)
case 'z': return 'Z';
default: return c;
}
-#endif
}
diff --git a/tests/test-c-ctype.c b/tests/test-c-ctype.c
index 81fe936..63d0af9 100644
--- a/tests/test-c-ctype.c
+++ b/tests/test-c-ctype.c
@@ -20,10 +20,19 @@
#include "c-ctype.h"
+#include <limits.h>
#include <locale.h>
#include "macros.h"
+static char
+to_char (int c)
+{
+ if (CHAR_MIN < 0 && CHAR_MAX < c)
+ return c - CHAR_MAX - 1 + CHAR_MIN;
+ return c;
+}
+
static void
test_all (void)
{
@@ -31,49 +40,32 @@ test_all (void)
for (c = -0x80; c < 0x100; c++)
{
- ASSERT (c_isascii (c) == (c >= 0 && c < 0x80));
-
- switch (c)
+ if (c < 0)
{
- case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
- case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
- case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
- case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
- case 'Y': case 'Z':
- case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
- case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
- case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
- case 's': case 't': case 'u': case 'v': case 'w': case 'x':
- case 'y': case 'z':
- case '0': case '1': case '2': case '3': case '4': case '5':
- case '6': case '7': case '8': case '9':
- ASSERT (c_isalnum (c) == 1);
- break;
- default:
- ASSERT (c_isalnum (c) == 0);
- break;
+ ASSERT (c_isascii (c) == c_isascii (c + 0x100));
+ ASSERT (c_isalnum (c) == c_isalnum (c + 0x100));
+ ASSERT (c_isalpha (c) == c_isalpha (c + 0x100));
+ ASSERT (c_isblank (c) == c_isblank (c + 0x100));
+ ASSERT (c_iscntrl (c) == c_iscntrl (c + 0x100));
+ ASSERT (c_isdigit (c) == c_isdigit (c + 0x100));
+ ASSERT (c_islower (c) == c_islower (c + 0x100));
+ ASSERT (c_isgraph (c) == c_isgraph (c + 0x100));
+ ASSERT (c_isprint (c) == c_isprint (c + 0x100));
+ ASSERT (c_ispunct (c) == c_ispunct (c + 0x100));
+ ASSERT (c_isspace (c) == c_isspace (c + 0x100));
+ ASSERT (c_isupper (c) == c_isupper (c + 0x100));
+ ASSERT (c_isxdigit (c) == c_isxdigit (c + 0x100));
+ ASSERT (to_char (c_tolower (c)) == to_char (c_tolower (c + 0x100)));
+ ASSERT (to_char (c_toupper (c)) == to_char (c_toupper (c + 0x100)));
}
- switch (c)
- {
- case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
- case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
- case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
- case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
- case 'Y': case 'Z':
- case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
- case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
- case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
- case 's': case 't': case 'u': case 'v': case 'w': case 'x':
- case 'y': case 'z':
- ASSERT (c_isalpha (c) == 1);
- break;
- default:
- ASSERT (c_isalpha (c) == 0);
- break;
- }
+ ASSERT (c_isascii (c) == (c >= 0 && c < 0x80));
+
+ ASSERT (c_isalnum (c) == (c_isalpha (c) || c_isdigit (c)));
+
+ ASSERT (c_isalpha (c) == (c_islower (c) || c_isupper (c)));
- switch (c)
+ switch (to_char (c))
{
case '\t': case ' ':
ASSERT (c_isblank (c) == 1);
@@ -83,9 +75,13 @@ test_all (void)
break;
}
+#ifdef C_CTYPE_ASCII
ASSERT (c_iscntrl (c) == ((c >= 0 && c < 0x20) || c == 0x7f));
+#endif
- switch (c)
+ ASSERT (! (c_iscntrl (c) && c_isprint (c)));
+
+ switch (to_char (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -96,7 +92,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (to_char (c))
{
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
@@ -110,13 +106,31 @@ test_all (void)
break;
}
+#ifdef C_CTYPE_ASCII
ASSERT (c_isgraph (c) == ((c >= 0x20 && c < 0x7f) && c != ' '));
ASSERT (c_isprint (c) == (c >= 0x20 && c < 0x7f));
+#endif
+
+ ASSERT (c_isgraph (c) == (c_isalnum (c) || c_ispunct (c)));
+
+ ASSERT (c_isprint (c) == (c_isgraph (c) || c == ' '));
- ASSERT (c_ispunct (c) == (c_isgraph (c) && !c_isalnum (c)));
+ switch (to_char (c))
+ {
+ case '!': case '"': case '#': case '$': case '%': case '&': case '\'':
+ case '(': case ')': case '*': case '+': case ',': case '-': case '.':
+ case '/': case ':': case ';': case '<': case '=': case '>': case '?':
+ case '@': case '[': case'\\': case ']': case '^': case '_': case '`':
+ case '{': case '|': case '}': case '~':
+ ASSERT (c_ispunct (c) == 1);
+ break;
+ default:
+ ASSERT (c_ispunct (c) == 0);
+ break;
+ }
- switch (c)
+ switch (to_char (c))
{
case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
ASSERT (c_isspace (c) == 1);
@@ -126,7 +140,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (to_char (c))
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -140,7 +154,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (to_char (c))
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -153,7 +167,7 @@ test_all (void)
break;
}
- switch (c)
+ switch (to_char (c))
{
case 'A':
ASSERT (c_tolower (c) == 'a');
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 19:32 ` Paul Eggert
@ 2015-09-22 19:46 ` Paul Eggert
2015-09-22 20:37 ` Daniel Richard G.
1 sibling, 0 replies; 49+ messages in thread
From: Paul Eggert @ 2015-09-22 19:46 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 80 bytes --]
Oops, I forgot to add a dependency. I installed the attached followup
patch.
[-- Attachment #2: 0001-modules-c-ctype-Depends-on-Add-verify.patch --]
[-- Type: text/x-patch, Size: 1002 bytes --]
From 07ed58f3c5b54fe3935ce522dbc1c1a716185e67 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 22 Sep 2015 12:44:25 -0700
Subject: [PATCH] * modules/c-ctype (Depends-on): Add verify.
---
ChangeLog | 1 +
modules/c-ctype | 1 +
2 files changed, 2 insertions(+)
diff --git a/ChangeLog b/ChangeLog
index 8723b38..4ae3a57 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -20,6 +20,7 @@
Prefer true and false to 1 and 0, for booleans.
(c_iscntrl): Use 'if', not 'ifdef'. Special case for EBCDIC.
Verify that the character set is either ASCII or EBCDIC.
+ * modules/c-ctype (Depends-on): Add verify.
* tests/test-c-ctype.c: Include <limits.h>, for CHAR_MIN
(to_char): New function.
(test_all): Port to EBCDIC. Add some more tests, e.g., for c_ispunct.
diff --git a/modules/c-ctype b/modules/c-ctype
index 7be209f..b172d13 100644
--- a/modules/c-ctype
+++ b/modules/c-ctype
@@ -7,6 +7,7 @@ lib/c-ctype.c
Depends-on:
stdbool
+verify
configure.ac:
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 2:28 [PATCH] IBM z/OS + EBCDIC support Daniel Richard G.
2015-09-22 15:23 ` Eric Blake
2015-09-22 19:32 ` Paul Eggert
@ 2015-09-22 19:50 ` Paul Eggert
2015-09-22 20:47 ` Daniel Richard G.
2 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2015-09-22 19:50 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
A few non-ctype-related comments:
Omit parens around arguments of 'defined', e.g., say "defined __MVS__"
not "defined (__MVS__)".
I agree with Eric that we should rename "__string" rather than fiddle
with #undefing it. It's just a placeholder name. I suggest renaming it
to "__str". We can backport this to glibc eventually.
In strtod.c, don't bother with "if (errno == ERANGE) errno = 0;". Just
do "errno = 0;".
Also in strtod.c, don't assume isnand exists. That is, replace
"!isnand(num)" with "num == num".
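The "num == num" idiom works because, under IEEE 754 comparison rules, a NaN compares unequal to everything, itself included. A minimal sketch, assuming IEEE arithmetic; the helper name is_not_nan is made up for illustration and is not something in strtod.c:

```c
#include <math.h>      /* For NAN and INFINITY, used when exercising the helper.  */
#include <stdbool.h>

/* Hypothetical helper, for illustration only: return true unless NUM
   is a NaN.  It relies on NaN != NaN, so it needs neither isnand()
   nor isnan() from the platform.  */
static bool
is_not_nan (double num)
{
  return num == num;
}
```

With this, is_not_nan (1.0) and is_not_nan (INFINITY) hold, while is_not_nan (NAN) does not. Aggressive floating-point optimization flags can defeat the idiom, so it is only a sketch of the idea.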
Update serial numbers in changed .m4 files.
In m4/fclose.m4, gl_FUNC_FCLOSE should AC_REQUIRE([AC_CANONICAL_HOST]).
Also, it should test $host_os rather than $host.
In m4/strstr.m4, the __MVS__ failure should be at compile-time, with
#error, rather than at run-time. That's better if cross-compiling.
In comments, prefer imperatives, e.g., "Instead, temporarily redefine
..." rather than "Instead, we temporarily redefine ...". This is
standard GNU style and is shorter.
The get_rusage_as code has duplications. Simpler would be:
uintptr_t
get_rusage_as (void)
{
/* On Mac OS X, AIX, Cygwin, and z/OS, get_rusage_as_via_setrlimit
exists but does not work. */
#if (! ((defined __APPLE__ && defined __MACH__) \
|| defined _AIX || defined __CYGWIN__ || defined __MVS__) \
&& HAVE_SETRLIMIT && defined RLIMIT_AS && HAVE_SYS_MMAN_H \
&& HAVE_MPROTECT)
/* Prefer get_rusage_as_via_setrlimit() if it succeeds,
because the caller may want to use the result with setrlimit(). */
uintptr_t result = get_rusage_as_via_setrlimit ();
if (result != 0)
return result;
#endif
return get_rusage_as_via_iterator ();
}
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 19:27 ` Daniel Richard G.
@ 2015-09-22 20:00 ` Paul Eggert
2015-09-22 20:08 ` Eric Blake
0 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2015-09-22 20:00 UTC (permalink / raw)
To: Daniel Richard G., Eric Blake, bug-gnulib
On 09/22/2015 12:27 PM, Daniel Richard G. wrote:
> Wouldn't it be easier to apply everything to a feature branch, and
> integrate it bit by bit?
You can do that in your own repository if it makes things simpler, but
for something this small I'd rather just see patches via email.
> wc: file "//dev/null": EDC5047I An invalid file name was specified as
> a function parameter
How about if we add //dev/null to the configure-time test as to whether
/ and // are the same? If //dev/null doesn't work, then / and // are
not the same.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 20:00 ` Paul Eggert
@ 2015-09-22 20:08 ` Eric Blake
2015-09-22 20:51 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Eric Blake @ 2015-09-22 20:08 UTC (permalink / raw)
To: Paul Eggert, Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 722 bytes --]
On 09/22/2015 02:00 PM, Paul Eggert wrote:
>> wc: file "//dev/null": EDC5047I An invalid file name was specified as
>> a function parameter
>
> How about if we add //dev/null to the configure-time test as to whether
> / and // are the same? If //dev/null doesn't work, then / and // are
> not the same.
Rather, it sounds like configure is already correct, and we are
correctly deducing that // is different; but that the difference is odd
on this platform in that it is not distinguishable via dev/ino (every
other platform with distinct // at least has the decency to give a
distinct dev/ino).
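For context, a check based on device and inode numbers can be sketched roughly as follows. This is an illustration that assumes POSIX stat(); it is not the actual gnulib configure test, and same_file is a made-up name:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: return true if paths A and B name the same
   file according to stat(), i.e. they have the same device and inode
   numbers.  Return false if either stat() call fails.  */
static bool
same_file (const char *a, const char *b)
{
  struct stat sa, sb;
  if (stat (a, &sa) != 0 || stat (b, &sb) != 0)
    return false;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Going by the report above, same_file ("/", "//") would presumably be true on z/OS even though "//dev/null" cannot be opened, which is why dev/ino alone cannot tell the two roots apart there.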
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 19:32 ` Paul Eggert
2015-09-22 19:46 ` Paul Eggert
@ 2015-09-22 20:37 ` Daniel Richard G.
2015-09-22 22:03 ` Paul Eggert
1 sibling, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-22 20:37 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
Hi Paul,
On Tue, 2015 Sep 22 12:32-0700, Paul Eggert wrote:
> Thanks for looking into this. I have some questions about the c-ctype
> changes. It appears that the proposed patch defers to the system
> functions (which use the current locale), but that's not the intent of
> c-ctype: it's supposed to correspond to a stripped down POSIX "C"
> locale regardless of the current locale settings. Is there something
> special in z/OS that requires using the system functions? (E.g., does
> the "C" locale behave differently depending on some *other* setting
> regarding character set?)
Mainly, it was the attempt to answer the question "so what specific
variant of EBCDIC are we going to target here?" that led me to use
the system functions. EBCDIC-1047 is favored in z/OS, but EBCDIC-037
is also popular, and then there are the Russian/Japanese/etc. code
pages that some far-flung users might want. However, unlike "normal"
8-bit encodings like ISO 8859-#, KOI8-R et al., there is no agreement
in the 7-bit range, and even ASCII characters like "[" and "]" are
not consistently encoded between EBCDIC variants. We don't have the
option of saying, "Okay, screw all that, we'll just limit ourselves
to this common subset," unless said subset excludes things like
punctuation marks.
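To make the bracket problem concrete, here is a small illustrative table. The code page 1047 values match the ones checked in the c-ctype patch in this thread; the code page 037 values are as commonly tabulated and should be double-checked against IBM's tables before relying on them. The struct and function names are made up:

```c
#include <stddef.h>

/* Illustrative only: a few punctuation characters and their code
   points in two EBCDIC code pages.  */
struct ebcdic_point
{
  unsigned char ascii;   /* ASCII code point of the character.  */
  unsigned char cp1047;  /* Code point in EBCDIC code page 1047.  */
  unsigned char cp037;   /* Code point in EBCDIC code page 037 (assumed).  */
};

static const struct ebcdic_point sample_points[] =
{
  { 0x5b, 0xad, 0xba },  /* [ */
  { 0x5d, 0xbd, 0xbb },  /* ] */
  { 0x5e, 0x5f, 0xb0 },  /* ^ */
  { 0x21, 0x5a, 0x5a },  /* ! */
};

/* Return how many sample characters are encoded differently in the
   two code pages.  */
static size_t
count_mismatches (void)
{
  size_t n = 0;
  for (size_t i = 0; i < sizeof sample_points / sizeof sample_points[0]; i++)
    if (sample_points[i].cp1047 != sample_points[i].cp037)
      n++;
  return n;
}
```

Three of the four sample characters land on different bytes, so a byte-level test compiled for one code page gives wrong answers for data in the other.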
My view is, it's not worth the hassle. Yes, c-ctype is not supposed to
be locale-dependent. It's going to be a lot more work, and a lot more
code to maintain to overcome that, and it's not likely the users of
these systems will see a corresponding benefit. I think it would be
better to have this for now---it's better than nothing---and if a clear
need arises in the future for locale-independent behavior on z/OS
(possibly by selecting an EBCDIC variant at compile time), we can cross
that bridge then.
> With the above in mind, it's not clear what c_isascii should do.
> Should it return 1 for bytes in the range 0..127, or for bytes that
> correspond to ASCII bytes if one assumes the standard translation
> from EBCDIC code page 037 to ASCII? (Is there a standard?) If the
> former, the current code is OK; if the latter, does the system
> isascii always return the same results regardless of locale and do
> these results make sense?
The latter behavior is the right one, IMO. If the former, there wouldn't
even be a point to having an isascii() function at all; you would just
do a range check.
Yes, there's a standard... a whole smorgasbord to choose from ^_^
The system isascii() function is locale-dependent. With "[" and "]"
depending on that, I don't see a way to get around this, unless you
deliberately support one EBCDIC variant at the expense of all others.
http://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm?lang=en
> Anyway, in looking through the code I see that it's hard to test a port
> to EBCDIC because it uses ifdef rather than if, and I do see some
> promotion bugs that you noted but we can fix these with inline functions
> rather than macros (cleaner and safer nowadays), and there are a few
> other style glitches (e.g., boolean values, overuse of >=) so I
> installed the attached patch. This patch assumes EBCDIC control
> characters are either less than ' ' or are all 1 bits, which I think is
> right. The patch also tightens up the tests a bit.
Yes, all control characters appear to be in [\x00-\x3F], but not
everything in that range is a control character. (I remember 0x04 was
not.) I tried making c_iscntrl() a simple range check at first, but that
did not agree with the system iscntrl().
> This patch doesn't address the isascii problem, nor the "something
> special in z/OS" problem, so quite possibly further patches will be
> needed to this module.
> Email had 1 attachment:
> + 0001-c-ctype-port-better-to-EBCDIC.patch
> 21k (text/x-patch)
I'll be happy to test your [revised] patch this evening.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 19:50 ` [PATCH] IBM z/OS + EBCDIC support Paul Eggert
@ 2015-09-22 20:47 ` Daniel Richard G.
0 siblings, 0 replies; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-22 20:47 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
On Tue, 2015 Sep 22 12:50-0700, Paul Eggert wrote:
> A few non-ctype-related comments:
>
> Omit parens around arguments of 'defined', e.g., say "defined __MVS__"
> not "defined (__MVS__)".
Understood.
> I agree with Eric that we should rename "__string" rather than fiddle
> with #undefing it. It's just a placeholder name. I suggest renaming
> it to "__str". We can backport this to glibc eventually.
Might it be better to forgo the double-underscore prefix? (Why not
just use "s"?)
> In strtod.c, don't bother with "if (errno == ERANGE) errno = 0;". Just
> do "errno = 0;".
Understood. I was after a narrow fix to this issue.
> Also in strtod.c, don't assume isnand exists. That is, replace
> "!isnand(num)" with "num == num".
>
> Update serial numbers in changed .m4 files.
>
> In m4/fclose.m4, gl_FUNC_FCLOSE should
> AC_REQUIRE([AC_CANONICAL_HOST]). Also, it should test $host_os rather
> than $host.
>
> In m4/strstr.m4, the __MVS__ failure should be at compile-time, with
> #error, rather than at run-time. That's better if cross-compiling.
>
> In comments, prefer imperatives, e.g., "Instead, temporarily redefine
> ..." rather than "Instead, we temporarily redefine ...". This is
> standard GNU style and is shorter.
Roger all that.
> The get_rusage_as code has duplications. Simpler would be:
Agreed, but that's in the "general clean-up" category :)
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 20:08 ` Eric Blake
@ 2015-09-22 20:51 ` Daniel Richard G.
0 siblings, 0 replies; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-22 20:51 UTC (permalink / raw)
To: Eric Blake, Paul Eggert, bug-gnulib
On Tue, 2015 Sep 22 14:08-0600, Eric Blake wrote:
> On 09/22/2015 02:00 PM, Paul Eggert wrote:
> >>
> >> wc: file "//dev/null": EDC5047I An invalid file name was specified
> >> as a function parameter
> >
> > How about if we add //dev/null to the configure-time test as to
> > whether / and // are the same? If //dev/null doesn't work, then /
> > and // are not the same.
>
> Rather, it sounds like configure is already correct, and we are
> correctly deducing that // is different; but that the difference is
> odd on this platform in that it is not distinguishable via dev/ino
> (every other platform with distinct // at least has the decency to
> give a distinct dev/ino).
Yes, my intent was to show why the DOUBLE_SLASH_IS_DISTINCT_ROOT test
was giving the result that it had.
There's certainly much about this platform that is... odd.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 20:37 ` Daniel Richard G.
@ 2015-09-22 22:03 ` Paul Eggert
2015-09-22 23:44 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2015-09-22 22:03 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 1445 bytes --]
Thanks for explaining. I still see a problem with the proposed patch,
though, in that (if I'm understanding it correctly) it would cause
c_isalpha (120) to succeed, even though EBCDIC 120 corresponds to U+00CC
LATIN CAPITAL LETTER I WITH GRAVE, and that is not supposed to be an
alphabetic character in the stripped-down C locale. Code that uses
c-ctype wants only ASCII letters, and departing from this would likely
break things.
Worse, the C expression "c_ispunct ('[')" might return false, as the
library may be in a locale that's incompatible with the mode the
compiler was in when it compiled the '['.
Looking at the web page you mentioned, it appears that one approach is
to assume EBCDIC 1047 (this seems to be the default and typical setting
for C programs) at both compile-time and run-time. We can check the
compile-time assumption without any code overhead. The proposed patch
does that. If someone really wants to use a different code page, either
at compile-time or at run-time, more code will need to be written (most
likely by the poor soul who actually needs that feature).
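The idea of checking the compile-time assumption at no run-time cost can be sketched as below. This is a simplified stand-in, not the installed code: it uses C11 _Static_assert in place of gnulib's verify, checks only a handful of characters, and the enum and function names are made up:

```c
#include <stdbool.h>

/* Each constant is 1 if the compiler's execution character set
   matches the named encoding for the sampled characters, else 0.
   '\x..' escapes are used so the comparison works whether plain
   char is signed or unsigned.  */
enum
{
  LOOKS_ASCII = (' ' == '\x20' && '0' == '\x30' && 'A' == '\x41'
                 && 'a' == '\x61' && '[' == '\x5b' && ']' == '\x5d'),
  LOOKS_EBCDIC_1047 = (' ' == '\x40' && '0' == '\xf0' && 'A' == '\xc1'
                       && 'a' == '\x81' && '[' == '\xad' && ']' == '\xbd')
};

/* Refuse to compile on any other character set.  */
_Static_assert (LOOKS_ASCII || LOOKS_EBCDIC_1047,
                "character set is neither ASCII nor EBCDIC 1047");

/* Hypothetical accessor: report which branch the compiler took.  */
static bool
compiled_for_ascii (void)
{
  return LOOKS_ASCII;
}
```

Because everything is an integer constant expression, the check costs nothing in the generated code; a build with an unexpected code page simply fails.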
> Yes, all control characters appear to be in [\x00-\x3F], but not
> everything in that range is a control character. (I remember 0x04 was
> not.) I tried making c_iscntrl() a simple range check at first, but
> that did not agree with the system iscntrl().
Thanks, this should be fixed in the attached patch, which I've installed.
[-- Attachment #2: 0001-c-ctype-assume-EBCDIC-1047-for-c_iscntrl.patch --]
[-- Type: text/x-patch, Size: 2531 bytes --]
From a92ab221b5cad8a5c1a5ca1fc1823d1f3fe4a24b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 22 Sep 2015 14:47:06 -0700
Subject: [PATCH] c-ctype: assume EBCDIC 1047 for c_iscntrl
* lib/c-ctype.c (c_iscntrl): When EBCDIC, assume code page 1047 at
both compile-time and at run-time. Check it at compile-time. We can
worry about other code pages later, if the topic ever comes up.
Fix typo in C_CTYPE_EBCDIC.
---
lib/c-ctype.c | 38 +++++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/lib/c-ctype.c b/lib/c-ctype.c
index 916d46e..558c4af 100644
--- a/lib/c-ctype.c
+++ b/lib/c-ctype.c
@@ -131,17 +131,37 @@ c_isblank (int c)
bool
c_iscntrl (int c)
{
- enum { C_CTYPE_EBCDIC = (' ' == 64 && '0' == 240
- && 'A' == 193 && 'J' == 209 && 'S' == 226
- && 'A' == 129 && 'J' == 145 && 'S' == 162) };
- verify (C_CTYPE_ASCII || C_CTYPE_EBCDIC);
-
- if (0 <= c && c < ' ')
- return true;
+ enum { C_CTYPE_EBCDIC = (' ' == '\x40' && '0' == '\xf0'
+ && 'A' == '\xc1' && 'J' == '\xd1' && 'S' == '\xe2'
+ && 'a' == '\x81' && 'j' == '\x91' && 's' == '\xa2') };
if (C_CTYPE_ASCII)
- return c == 0x7f;
+ return (0 <= c && c < ' ') || c == 0x7f;
else
- return c == 0xff || c == -1;
+ {
+ /* Return true if C corresponds to an ASCII control character.
+ Assume EBCDIC code page 1047, and verify that the compiler
+ agrees with this. */
+ verify (C_CTYPE_ASCII
+ || (C_CTYPE_EBCDIC
+ && '!' == '\x5a' && '#' == '\x7b' && '$' == '\x5b'
+ && '@' == '\x7c' && '[' == '\xad' && '\\' == '\xe0'
+ && ']' == '\xbd' && '^' == '\x5f' && '_' == '\x6d'
+ && '`' == '\x79'));
+ switch (c)
+ {
+ case '\x00': case '\x01': case '\x02': case '\x03': case '\x05':
+ case '\x0b': case '\x0c': case '\x0d': case '\x0e': case '\x0f':
+ case '\x10': case '\x11': case '\x12': case '\x13': case '\x15':
+ case '\x16': case '\x18': case '\x19': case '\x1c': case '\x1d':
+ case '\x1e': case '\x1f': case '\x26': case '\x27': case '\x2d':
+ case '\x2e': case '\x2f': case '\x32': case '\x37': case '\x3c':
+ case '\x3d': case '\x3f': case '\xff':
+ case '\xff' < 0 ? 0xff : -1:
+ return true;
+ default:
+ return false;
+ }
+ }
}
bool
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 22:03 ` Paul Eggert
@ 2015-09-22 23:44 ` Daniel Richard G.
2015-09-23 2:02 ` Paul Eggert
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-22 23:44 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 2660 bytes --]
On Tue, 2015 Sep 22 15:03-0700, Paul Eggert wrote:
> Thanks for explaining. I still see a problem with the proposed patch,
> though, in that (if I'm understanding it correctly) it would cause
> c_isalpha (120) to succeed, even though EBCDIC 120 corresponds to
> U+00CC LATIN CAPITAL LETTER I WITH GRAVE, and that is not supposed to
> be an alphabetic character in the stripped-down C locale. Code that
> uses c-ctype wants only ASCII letters, and departing from this would
> likely break things.
How would that match occur? c_isalpha() was/is using a "switch"
statement for EBCDIC.
> Worse, the C expression "c_ispunct ('[')" might return false, as the
> library may be in a locale that's incompatible with the mode the
> compiler was in when it compiled the '['.
If the user builds in one locale and runs in another, they're going to
have bigger problems (e.g. garbled program messages). As far as I've
seen, this is considered "out of bounds" in z/OS usage.
> Looking at the web page you mentioned, it appears that one approach is
> to assume EBCDIC 1047 (this seems to be the default and typical
> setting for C programs) at both compile-time and run-time. We can
> check the compile-time assumption without any code overhead. The
> proposed patch does that. If someone really wants to use a different
> code page, either at compile-time or at run-time, more code will need
> to be written (most likely by the poor soul who actually needs that
> feature).
A different code page at run time, I think, is not feasible. But
international users will at least want a different code page at
compile time.
A simple program could generate tables for all the isxxxxx() functions
(see below) at compile time. Would you be inclined to do it that way?
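A rough sketch of what such a generator might look like follows. It is hypothetical: the CT_* bit names, the function names, and the emitted array name are all made up, only a few character classes are covered, and it records whatever the system <ctype.h> functions report in the locale in effect when it runs:

```c
#include <ctype.h>
#include <stdio.h>

/* Made-up class bits for the generated table.  */
enum
{
  CT_ALPHA = 1 << 0,
  CT_DIGIT = 1 << 1,
  CT_PUNCT = 1 << 2,
  CT_SPACE = 1 << 3,
  CT_CNTRL = 1 << 4
};

/* Fill TABLE with class bits for every byte value, as reported by
   the system <ctype.h> functions in the current locale.  */
static void
build_ctype_table (unsigned char table[256])
{
  for (int c = 0; c < 256; c++)
    table[c] = ((isalpha (c) ? CT_ALPHA : 0)
                | (isdigit (c) ? CT_DIGIT : 0)
                | (ispunct (c) ? CT_PUNCT : 0)
                | (isspace (c) ? CT_SPACE : 0)
                | (iscntrl (c) ? CT_CNTRL : 0));
}

/* Write TABLE to OUT as a C array definition, eight entries per
   line, that a build step could compile into the library.  */
static void
emit_ctype_table (FILE *out, const unsigned char table[256])
{
  fputs ("static const unsigned char ctype_table[256] =\n{\n", out);
  for (int c = 0; c < 256; c++)
    fprintf (out, "0x%02x,%c", (unsigned int) table[c],
             c % 8 == 7 ? '\n' : ' ');
  fputs ("};\n", out);
}
```

A build would run this once on the target system and compile the emitted array, so the result still reflects the build-time code page and locale rather than a fixed one.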
> > Yes, all control characters appear to be in [\x00-\x3F], but not
> > everything in that range is a control character. (I remember 0x04
> > was not.) I tried making c_iscntrl() a simple range check at first,
> > but that did not agree with the system iscntrl().
>
> Thanks, this should be fixed in the attached patch, which I've
> installed.
> Email had 1 attachment:
> + 0001-c-ctype-assume-EBCDIC-1047-for-c_iscntrl.patch
> 3k (text/x-patch)
I'll try that out. I wasn't expecting you to all but rewrite c-ctype!
Just to help inform the discussion, I've attached a small program that
shows the output of the various isxxxxx() functions for all values in
[0, 255], and its output on z/OS with EBCDIC-1047 and -D_ALL_SOURCE.
It goes to show: where mainframes are concerned, nothing's easy :]
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ctype.c --]
[-- Type: text/x-csrc; name="ctype.c", Size: 878 bytes --]
#include <stdio.h>
#include <ctype.h>
int main(void)
{
int c;
puts("A. isalnum");
puts("B. isalpha");
puts("C. isascii");
puts("D. isblank");
puts("E. iscntrl");
puts("F. isdigit");
puts("G. isgraph");
puts("H. islower");
puts("I. isprint");
puts("J. ispunct");
puts("K. isspace");
puts("L. isupper");
puts("M. isxdigit");
puts(" char\tA B C D E F G H I J K L M");
for (c = 0; c < 256; c++)
printf("%3d %c\t%d %d %d %d %d %d %d %d %d %d %d %d %d\n",
c, isprint(c) ? c : ' ',
isalnum(c) ? 1 : 0,
isalpha(c) ? 1 : 0,
isascii(c) ? 1 : 0,
isblank(c) ? 1 : 0,
iscntrl(c) ? 1 : 0,
isdigit(c) ? 1 : 0,
isgraph(c) ? 1 : 0,
islower(c) ? 1 : 0,
isprint(c) ? 1 : 0,
ispunct(c) ? 1 : 0,
isspace(c) ? 1 : 0,
isupper(c) ? 1 : 0,
isxdigit(c) ? 1 : 0);
return 0;
}
[-- Attachment #3: ctype-ebcdic1047.txt --]
[-- Type: text/plain, Size: 8368 bytes --]
A. isalnum
B. isalpha
C. isascii
D. isblank
E. iscntrl
F. isdigit
G. isgraph
H. islower
I. isprint
J. ispunct
K. isspace
L. isupper
M. isxdigit
char A B C D E F G H I J K L M
0 0 0 1 0 1 0 0 0 0 0 0 0 0
1 0 0 1 0 1 0 0 0 0 0 0 0 0
2 0 0 1 0 1 0 0 0 0 0 0 0 0
3 0 0 1 0 1 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 1 1 1 0 0 0 0 0 1 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 1 0 1 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 1 0 1 0 0 0 0 0 1 0 0
12 0 0 1 0 1 0 0 0 0 0 1 0 0
13 0 0 1 0 1 0 0 0 0 0 1 0 0
14 0 0 1 0 1 0 0 0 0 0 0 0 0
15 0 0 1 0 1 0 0 0 0 0 0 0 0
16 0 0 1 0 1 0 0 0 0 0 0 0 0
17 0 0 1 0 1 0 0 0 0 0 0 0 0
18 0 0 1 0 1 0 0 0 0 0 0 0 0
19 0 0 1 0 1 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0
21 0 0 1 0 1 0 0 0 0 0 1 0 0
22 0 0 1 0 1 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 1 0 1 0 0 0 0 0 0 0 0
25 0 0 1 0 1 0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 0 0
28 0 0 1 0 1 0 0 0 0 0 0 0 0
29 0 0 1 0 1 0 0 0 0 0 0 0 0
30 0 0 1 0 1 0 0 0 0 0 0 0 0
31 0 0 1 0 1 0 0 0 0 0 0 0 0
32 0 0 0 0 0 0 0 0 0 0 0 0 0
33 0 0 0 0 0 0 0 0 0 0 0 0 0
34 0 0 0 0 0 0 0 0 0 0 0 0 0
35 0 0 0 0 0 0 0 0 0 0 0 0 0
36 0 0 0 0 0 0 0 0 0 0 0 0 0
37 0 0 0 0 0 0 0 0 0 0 0 0 0
38 0 0 1 0 1 0 0 0 0 0 0 0 0
39 0 0 1 0 1 0 0 0 0 0 0 0 0
40 0 0 0 0 0 0 0 0 0 0 0 0 0
41 0 0 0 0 0 0 0 0 0 0 0 0 0
42 0 0 0 0 0 0 0 0 0 0 0 0 0
43 0 0 0 0 0 0 0 0 0 0 0 0 0
44 0 0 0 0 0 0 0 0 0 0 0 0 0
45 0 0 1 0 1 0 0 0 0 0 0 0 0
46 0 0 1 0 1 0 0 0 0 0 0 0 0
47 0 0 1 0 1 0 0 0 0 0 0 0 0
48 0 0 0 0 0 0 0 0 0 0 0 0 0
49 0 0 0 0 0 0 0 0 0 0 0 0 0
50 0 0 1 0 1 0 0 0 0 0 0 0 0
51 0 0 0 0 0 0 0 0 0 0 0 0 0
52 0 0 0 0 0 0 0 0 0 0 0 0 0
53 0 0 0 0 0 0 0 0 0 0 0 0 0
54 0 0 0 0 0 0 0 0 0 0 0 0 0
55 0 0 1 0 1 0 0 0 0 0 0 0 0
56 0 0 0 0 0 0 0 0 0 0 0 0 0
57 0 0 0 0 0 0 0 0 0 0 0 0 0
58 0 0 0 0 0 0 0 0 0 0 0 0 0
59 0 0 0 0 0 0 0 0 0 0 0 0 0
60 0 0 1 0 1 0 0 0 0 0 0 0 0
61 0 0 1 0 1 0 0 0 0 0 0 0 0
62 0 0 0 0 0 0 0 0 0 0 0 0 0
63 0 0 1 0 1 0 0 0 0 0 0 0 0
64 0 0 1 1 0 0 0 0 1 0 1 0 0
65 0 0 0 0 0 0 0 0 0 0 0 0 0
66 0 0 0 0 0 0 0 0 0 0 0 0 0
67 0 0 0 0 0 0 0 0 0 0 0 0 0
68 0 0 0 0 0 0 0 0 0 0 0 0 0
69 0 0 0 0 0 0 0 0 0 0 0 0 0
70 0 0 0 0 0 0 0 0 0 0 0 0 0
71 0 0 0 0 0 0 0 0 0 0 0 0 0
72 0 0 0 0 0 0 0 0 0 0 0 0 0
73 0 0 0 0 0 0 0 0 0 0 0 0 0
74 0 0 0 0 0 0 0 0 0 0 0 0 0
75 . 0 0 1 0 0 0 1 0 1 1 0 0 0
76 < 0 0 1 0 0 0 1 0 1 1 0 0 0
77 ( 0 0 1 0 0 0 1 0 1 1 0 0 0
78 + 0 0 1 0 0 0 1 0 1 1 0 0 0
79 | 0 0 1 0 0 0 1 0 1 1 0 0 0
80 & 0 0 1 0 0 0 1 0 1 1 0 0 0
81 0 0 0 0 0 0 0 0 0 0 0 0 0
82 0 0 0 0 0 0 0 0 0 0 0 0 0
83 0 0 0 0 0 0 0 0 0 0 0 0 0
84 0 0 0 0 0 0 0 0 0 0 0 0 0
85 0 0 0 0 0 0 0 0 0 0 0 0 0
86 0 0 0 0 0 0 0 0 0 0 0 0 0
87 0 0 0 0 0 0 0 0 0 0 0 0 0
88 0 0 0 0 0 0 0 0 0 0 0 0 0
89 0 0 0 0 0 0 0 0 0 0 0 0 0
90 ! 0 0 1 0 0 0 1 0 1 1 0 0 0
91 $ 0 0 1 0 0 0 1 0 1 1 0 0 0
92 * 0 0 1 0 0 0 1 0 1 1 0 0 0
93 ) 0 0 1 0 0 0 1 0 1 1 0 0 0
94 ; 0 0 1 0 0 0 1 0 1 1 0 0 0
95 ^ 0 0 1 0 0 0 1 0 1 1 0 0 0
96 - 0 0 1 0 0 0 1 0 1 1 0 0 0
97 / 0 0 1 0 0 0 1 0 1 1 0 0 0
98 0 0 0 0 0 0 0 0 0 0 0 0 0
99 0 0 0 0 0 0 0 0 0 0 0 0 0
100 0 0 0 0 0 0 0 0 0 0 0 0 0
101 0 0 0 0 0 0 0 0 0 0 0 0 0
102 0 0 0 0 0 0 0 0 0 0 0 0 0
103 0 0 0 0 0 0 0 0 0 0 0 0 0
104 0 0 0 0 0 0 0 0 0 0 0 0 0
105 0 0 0 0 0 0 0 0 0 0 0 0 0
106 0 0 0 0 0 0 0 0 0 0 0 0 0
107 , 0 0 1 0 0 0 1 0 1 1 0 0 0
108 % 0 0 1 0 0 0 1 0 1 1 0 0 0
109 _ 0 0 1 0 0 0 1 0 1 1 0 0 0
110 > 0 0 1 0 0 0 1 0 1 1 0 0 0
111 ? 0 0 1 0 0 0 1 0 1 1 0 0 0
112 0 0 0 0 0 0 0 0 0 0 0 0 0
113 0 0 0 0 0 0 0 0 0 0 0 0 0
114 0 0 0 0 0 0 0 0 0 0 0 0 0
115 0 0 0 0 0 0 0 0 0 0 0 0 0
116 0 0 0 0 0 0 0 0 0 0 0 0 0
117 0 0 0 0 0 0 0 0 0 0 0 0 0
118 0 0 0 0 0 0 0 0 0 0 0 0 0
119 0 0 0 0 0 0 0 0 0 0 0 0 0
120 0 0 0 0 0 0 0 0 0 0 0 0 0
121 ` 0 0 1 0 0 0 1 0 1 1 0 0 0
122 : 0 0 1 0 0 0 1 0 1 1 0 0 0
123 # 0 0 1 0 0 0 1 0 1 1 0 0 0
124 @ 0 0 1 0 0 0 1 0 1 1 0 0 0
125 ' 0 0 1 0 0 0 1 0 1 1 0 0 0
126 = 0 0 1 0 0 0 1 0 1 1 0 0 0
127 " 0 0 1 0 0 0 1 0 1 1 0 0 0
128 0 0 0 0 0 0 0 0 0 0 0 0 0
129 a 1 1 1 0 0 0 1 1 1 0 0 0 1
130 b 1 1 1 0 0 0 1 1 1 0 0 0 1
131 c 1 1 1 0 0 0 1 1 1 0 0 0 1
132 d 1 1 1 0 0 0 1 1 1 0 0 0 1
133 e 1 1 1 0 0 0 1 1 1 0 0 0 1
134 f 1 1 1 0 0 0 1 1 1 0 0 0 1
135 g 1 1 1 0 0 0 1 1 1 0 0 0 0
136 h 1 1 1 0 0 0 1 1 1 0 0 0 0
137 i 1 1 1 0 0 0 1 1 1 0 0 0 0
138 0 0 0 0 0 0 0 0 0 0 0 0 0
139 0 0 0 0 0 0 0 0 0 0 0 0 0
140 0 0 0 0 0 0 0 0 0 0 0 0 0
141 0 0 0 0 0 0 0 0 0 0 0 0 0
142 0 0 0 0 0 0 0 0 0 0 0 0 0
143 0 0 0 0 0 0 0 0 0 0 0 0 0
144 0 0 0 0 0 0 0 0 0 0 0 0 0
145 j 1 1 1 0 0 0 1 1 1 0 0 0 0
146 k 1 1 1 0 0 0 1 1 1 0 0 0 0
147 l 1 1 1 0 0 0 1 1 1 0 0 0 0
148 m 1 1 1 0 0 0 1 1 1 0 0 0 0
149 n 1 1 1 0 0 0 1 1 1 0 0 0 0
150 o 1 1 1 0 0 0 1 1 1 0 0 0 0
151 p 1 1 1 0 0 0 1 1 1 0 0 0 0
152 q 1 1 1 0 0 0 1 1 1 0 0 0 0
153 r 1 1 1 0 0 0 1 1 1 0 0 0 0
154 0 0 0 0 0 0 0 0 0 0 0 0 0
155 0 0 0 0 0 0 0 0 0 0 0 0 0
156 0 0 0 0 0 0 0 0 0 0 0 0 0
157 0 0 0 0 0 0 0 0 0 0 0 0 0
158 0 0 0 0 0 0 0 0 0 0 0 0 0
159 0 0 0 0 0 0 0 0 0 0 0 0 0
160 0 0 0 0 0 0 0 0 0 0 0 0 0
161 ~ 0 0 1 0 0 0 1 0 1 1 0 0 0
162 s 1 1 1 0 0 0 1 1 1 0 0 0 0
163 t 1 1 1 0 0 0 1 1 1 0 0 0 0
164 u 1 1 1 0 0 0 1 1 1 0 0 0 0
165 v 1 1 1 0 0 0 1 1 1 0 0 0 0
166 w 1 1 1 0 0 0 1 1 1 0 0 0 0
167 x 1 1 1 0 0 0 1 1 1 0 0 0 0
168 y 1 1 1 0 0 0 1 1 1 0 0 0 0
169 z 1 1 1 0 0 0 1 1 1 0 0 0 0
170 0 0 0 0 0 0 0 0 0 0 0 0 0
171 0 0 0 0 0 0 0 0 0 0 0 0 0
172 0 0 0 0 0 0 0 0 0 0 0 0 0
173 [ 0 0 1 0 0 0 1 0 1 1 0 0 0
174 0 0 0 0 0 0 0 0 0 0 0 0 0
175 0 0 0 0 0 0 0 0 0 0 0 0 0
176 0 0 0 0 0 0 0 0 0 0 0 0 0
177 0 0 0 0 0 0 0 0 0 0 0 0 0
178 0 0 0 0 0 0 0 0 0 0 0 0 0
179 0 0 0 0 0 0 0 0 0 0 0 0 0
180 0 0 0 0 0 0 0 0 0 0 0 0 0
181 0 0 0 0 0 0 0 0 0 0 0 0 0
182 0 0 0 0 0 0 0 0 0 0 0 0 0
183 0 0 0 0 0 0 0 0 0 0 0 0 0
184 0 0 0 0 0 0 0 0 0 0 0 0 0
185 0 0 0 0 0 0 0 0 0 0 0 0 0
186 0 0 0 0 0 0 0 0 0 0 0 0 0
187 0 0 0 0 0 0 0 0 0 0 0 0 0
188 0 0 0 0 0 0 0 0 0 0 0 0 0
189 ] 0 0 1 0 0 0 1 0 1 1 0 0 0
190 0 0 0 0 0 0 0 0 0 0 0 0 0
191 0 0 0 0 0 0 0 0 0 0 0 0 0
192 { 0 0 1 0 0 0 1 0 1 1 0 0 0
193 A 1 1 1 0 0 0 1 0 1 0 0 1 1
194 B 1 1 1 0 0 0 1 0 1 0 0 1 1
195 C 1 1 1 0 0 0 1 0 1 0 0 1 1
196 D 1 1 1 0 0 0 1 0 1 0 0 1 1
197 E 1 1 1 0 0 0 1 0 1 0 0 1 1
198 F 1 1 1 0 0 0 1 0 1 0 0 1 1
199 G 1 1 1 0 0 0 1 0 1 0 0 1 0
200 H 1 1 1 0 0 0 1 0 1 0 0 1 0
201 I 1 1 1 0 0 0 1 0 1 0 0 1 0
202 0 0 0 0 0 0 0 0 0 0 0 0 0
203 0 0 0 0 0 0 0 0 0 0 0 0 0
204 0 0 0 0 0 0 0 0 0 0 0 0 0
205 0 0 0 0 0 0 0 0 0 0 0 0 0
206 0 0 0 0 0 0 0 0 0 0 0 0 0
207 0 0 0 0 0 0 0 0 0 0 0 0 0
208 } 0 0 1 0 0 0 1 0 1 1 0 0 0
209 J 1 1 1 0 0 0 1 0 1 0 0 1 0
210 K 1 1 1 0 0 0 1 0 1 0 0 1 0
211 L 1 1 1 0 0 0 1 0 1 0 0 1 0
212 M 1 1 1 0 0 0 1 0 1 0 0 1 0
213 N 1 1 1 0 0 0 1 0 1 0 0 1 0
214 O 1 1 1 0 0 0 1 0 1 0 0 1 0
215 P 1 1 1 0 0 0 1 0 1 0 0 1 0
216 Q 1 1 1 0 0 0 1 0 1 0 0 1 0
217 R 1 1 1 0 0 0 1 0 1 0 0 1 0
218 0 0 0 0 0 0 0 0 0 0 0 0 0
219 0 0 0 0 0 0 0 0 0 0 0 0 0
220 0 0 0 0 0 0 0 0 0 0 0 0 0
221 0 0 0 0 0 0 0 0 0 0 0 0 0
222 0 0 0 0 0 0 0 0 0 0 0 0 0
223 0 0 0 0 0 0 0 0 0 0 0 0 0
224 \ 0 0 1 0 0 0 1 0 1 1 0 0 0
225 0 0 0 0 0 0 0 0 0 0 0 0 0
226 S 1 1 1 0 0 0 1 0 1 0 0 1 0
227 T 1 1 1 0 0 0 1 0 1 0 0 1 0
228 U 1 1 1 0 0 0 1 0 1 0 0 1 0
229 V 1 1 1 0 0 0 1 0 1 0 0 1 0
230 W 1 1 1 0 0 0 1 0 1 0 0 1 0
231 X 1 1 1 0 0 0 1 0 1 0 0 1 0
232 Y 1 1 1 0 0 0 1 0 1 0 0 1 0
233 Z 1 1 1 0 0 0 1 0 1 0 0 1 0
234 0 0 0 0 0 0 0 0 0 0 0 0 0
235 0 0 0 0 0 0 0 0 0 0 0 0 0
236 0 0 0 0 0 0 0 0 0 0 0 0 0
237 0 0 0 0 0 0 0 0 0 0 0 0 0
238 0 0 0 0 0 0 0 0 0 0 0 0 0
239 0 0 0 0 0 0 0 0 0 0 0 0 0
240 0 1 0 1 0 0 1 1 0 1 0 0 0 1
241 1 1 0 1 0 0 1 1 0 1 0 0 0 1
242 2 1 0 1 0 0 1 1 0 1 0 0 0 1
243 3 1 0 1 0 0 1 1 0 1 0 0 0 1
244 4 1 0 1 0 0 1 1 0 1 0 0 0 1
245 5 1 0 1 0 0 1 1 0 1 0 0 0 1
246 6 1 0 1 0 0 1 1 0 1 0 0 0 1
247 7 1 0 1 0 0 1 1 0 1 0 0 0 1
248 8 1 0 1 0 0 1 1 0 1 0 0 0 1
249 9 1 0 1 0 0 1 1 0 1 0 0 0 1
250 0 0 0 0 0 0 0 0 0 0 0 0 0
251 0 0 0 0 0 0 0 0 0 0 0 0 0
252 0 0 0 0 0 0 0 0 0 0 0 0 0
253 0 0 0 0 0 0 0 0 0 0 0 0 0
254 0 0 0 0 0 0 0 0 0 0 0 0 0
255 0 0 0 0 0 0 0 0 0 0 0 0 0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-22 23:44 ` Daniel Richard G.
@ 2015-09-23 2:02 ` Paul Eggert
2015-09-23 6:58 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2015-09-23 2:02 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]
>> Code that
>> uses c-ctype wants only ASCII letters, and departing from this would
>> likely break things.
>
> How would that match occur? c_isalpha() was/is using a "switch"
> statement for EBCDIC.
Oh, sorry, I was assuming that the substitution was being proposed for all the
functions; but it's being proposed only for c_isascii, c_iscntrl, c_isgraph,
c_isprint, and c_ispunct. These functions are so rarely used that it probably
doesn't matter that much what we do....
> If the user builds in one locale and runs in another, they're going to
> have bigger problems (e.g. garbled program messages). As far as I've
> seen, this is considered "out of bounds" in z/OS usage.
Excellent; that simplifies things.
> A different code page at run time, I think, is not feasible. But
> international users will at least want a different code page at
> compile time.
>
> A simple program could generate tables for all the isxxxxx() functions
> (see below) at compile time. Would you be inclined to do it that way?
I think we can do it without that kind of compile-time hassle, if we can assume
that the compile-time locale is the same as the run-time. I installed the
attached patch, which makes that assumption, and which I hope does the right thing.
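A minimal sketch of the idea, assuming that character constants in the
source are encoded in the same code page the program runs under. The
helper names below are hypothetical; the attached patch expresses the
first test as the enum constant C_CTYPE_EBCDIC instead.

```c
#include <stdbool.h>

/* Hypothetical helper: true if the compiler's execution character set
   looks like EBCDIC.  Every comparison is a constant expression, so the
   compiler folds the result at build time and no generated table or
   helper program is needed.  */
static bool
host_charset_looks_ebcdic (void)
{
  return (' ' == '\x40' && '0' == '\xf0'
          && 'A' == '\xc1' && 'J' == '\xd1' && 'S' == '\xe2'
          && 'a' == '\x81' && 'j' == '\x91' && 's' == '\xa2');
}

/* Hypothetical companion: true if a few basic characters have their
   ASCII values.  */
static bool
host_charset_looks_ascii (void)
{
  return ' ' == 0x20 && '0' == 0x30 && 'A' == 0x41 && 'a' == 0x61;
}
```

On a host that is neither ASCII nor EBCDIC both helpers would return
false, which is the case the patch rejects with verify().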
[-- Attachment #2: 0001-c-ctype-support-EBCDIC-style-c_isascii.patch --]
[-- Type: text/plain, Size: 5407 bytes --]
From a5ce2c8c0b604a86fd575c6f80384e3189703546 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 22 Sep 2015 18:59:28 -0700
Subject: [PATCH] c-ctype: support EBCDIC-style c_isascii
* lib/c-ctype.c (C_CTYPE_EBCDIC): Move to top level.
(c_isascii, c_iscntrl): Assume EBCDIC code page 1047 for control
characters, if EBCDIC.
---
lib/c-ctype.c | 93 +++++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 65 insertions(+), 28 deletions(-)
diff --git a/lib/c-ctype.c b/lib/c-ctype.c
index 558c4af..a3913a1 100644
--- a/lib/c-ctype.c
+++ b/lib/c-ctype.c
@@ -37,6 +37,17 @@ enum { C_CTYPE_CONSECUTIVE_LOWERCASE = false };
enum { C_CTYPE_CONSECUTIVE_UPPERCASE = false };
#endif
+enum
+ {
+ /* True if this appears to be a host using EBCDIC. */
+ C_CTYPE_EBCDIC = (' ' == '\x40' && '0' == '\xf0'
+ && 'A' == '\xc1' && 'J' == '\xd1' && 'S' == '\xe2'
+ && 'a' == '\x81' && 'j' == '\x91' && 's' == '\xa2')
+ };
+
+/* The implementation currently supports ASCII and EBCDIC. */
+verify (C_CTYPE_ASCII || C_CTYPE_EBCDIC);
+
/* Convert an int, which may be promoted from either an unsigned or a
signed char, to the corresponding char. */
@@ -54,7 +65,45 @@ to_char (int c)
bool
c_isascii (int c)
{
- return (c >= 0x00 && c <= 0x7f);
+ if (C_CTYPE_ASCII)
+ return 0 <= c && c <= 0x7f;
+
+ /* Use EBCDIC code page 1047's assignments for ASCII control chars;
+ assume all EBCDIC code pages agree about these assignments. */
+ switch (to_char (c))
+ {
+ case '\x00': case '\x01': case '\x02': case '\x03': case '\x05':
+ case '\x0b': case '\x0c': case '\x0d': case '\x0e': case '\x0f':
+ case '\x10': case '\x11': case '\x12': case '\x13': case '\x15':
+ case '\x16': case '\x18': case '\x19': case '\x1c': case '\x1d':
+ case '\x1e': case '\x1f': case '\x26': case '\x27': case '\x2d':
+ case '\x2e': case '\x2f': case '\x32': case '\x37': case '\x3c':
+ case '\x3d': case '\x3f': case '\xff':
+ case '\xff' < 0 ? 0xff : -1:
+
+ case ' ': case '!': case '"': case '#': case '$': case '%':
+ case '&': case '\'': case '(': case ')': case '*': case '+':
+ case ',': case '-': case '.': case '/':
+ case '0': case '1': case '2': case '3': case '4': case '5':
+ case '6': case '7': case '8': case '9':
+ case ':': case ';': case '<': case '=': case '>': case '?':
+ case '@':
+ case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+ case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+ case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+ case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+ case 'Y': case 'Z':
+ case '[': case '\\': case ']': case '^': case '_': case '`':
+ case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+ case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+ case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+ case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+ case 'y': case 'z':
+ case '{': case '|': case '}': case '~':
+ return true;
+ default:
+ return false;
+ }
}
bool
@@ -131,36 +180,24 @@ c_isblank (int c)
bool
c_iscntrl (int c)
{
- enum { C_CTYPE_EBCDIC = (' ' == '\x40' && '0' == '\xf0'
- && 'A' == '\xc1' && 'J' == '\xd1' && 'S' == '\xe2'
- && 'a' == '\x81' && 'j' == '\x91' && 's' == '\xa2') };
if (C_CTYPE_ASCII)
return (0 <= c && c < ' ') || c == 0x7f;
- else
+
+ /* Use EBCDIC code page 1047's assignments for ASCII control chars;
+ assume all EBCDIC code pages agree about these assignments. */
+ switch (c)
{
- /* Return true if C corresponds to an ASCII control character.
- Assume EBCDIC code page 1047, and verify that the compiler
- agrees with this. */
- verify (C_CTYPE_ASCII
- || (C_CTYPE_EBCDIC
- && '!' == '\x5a' && '#' == '\x7b' && '$' == '\x5b'
- && '@' == '\x7c' && '[' == '\xad' && '\\' == '\xe0'
- && ']' == '\xbd' && '^' == '\x5f' && '_' == '\x6d'
- && '`' == '\x79'));
- switch (c)
- {
- case '\x00': case '\x01': case '\x02': case '\x03': case '\x05':
- case '\x0b': case '\x0c': case '\x0d': case '\x0e': case '\x0f':
- case '\x10': case '\x11': case '\x12': case '\x13': case '\x15':
- case '\x16': case '\x18': case '\x19': case '\x1c': case '\x1d':
- case '\x1e': case '\x1f': case '\x26': case '\x27': case '\x2d':
- case '\x2e': case '\x2f': case '\x32': case '\x37': case '\x3c':
- case '\x3d': case '\x3f': case '\xff':
- case '\xff' < 0 ? 0xff : -1:
- return true;
- default:
- return false;
- }
+ case '\x00': case '\x01': case '\x02': case '\x03': case '\x05':
+ case '\x0b': case '\x0c': case '\x0d': case '\x0e': case '\x0f':
+ case '\x10': case '\x11': case '\x12': case '\x13': case '\x15':
+ case '\x16': case '\x18': case '\x19': case '\x1c': case '\x1d':
+ case '\x1e': case '\x1f': case '\x26': case '\x27': case '\x2d':
+ case '\x2e': case '\x2f': case '\x32': case '\x37': case '\x3c':
+ case '\x3d': case '\x3f': case '\xff':
+ case '\xff' < 0 ? 0xff : -1:
+ return true;
+ default:
+ return false;
}
}
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-23 2:02 ` Paul Eggert
@ 2015-09-23 6:58 ` Daniel Richard G.
2015-09-23 19:05 ` Paul Eggert
2015-09-23 19:29 ` Paul Eggert
0 siblings, 2 replies; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-23 6:58 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
Okay, I tested your latest changes (git 4d83e798). There is one
assertion that needs to be #ifdef'ed out for EBCDIC:
$ ./test-c-ctype
/path/to/gltests/test-c-ctype.c:62: assertion 'c_isascii (c) == (c >= 0 && c < 0x80)' failed
CEE5207E The signal SIGABRT was received.
ABORT instruction
With that change, test-c-ctype passes.
I also tried a run with signed characters (-qchars=signed), and while
test-c-ctype passed, a number of other things broke. I'll be preparing
and submitting patches for those as well.
On Tue, 2015 Sep 22 19:02-0700, Paul Eggert wrote:
>
> > How would that match occur? c_isalpha() was/is using a "switch"
> > statement for EBCDIC.
>
> Oh, sorry, I was assuming that the substitution was being proposed
> for all the functions; but it's being proposed only for c_isascii,
> c_iscntrl, c_isgraph, c_isprint, and c_ispunct. These functions
> are so rarely used that it probably doesn't matter that much what
> we do....
Okay, I understand. The functions that already had a complete "switch"
implementation, I left alone; that approach will work pretty much
regardless of encoding.
> > A simple program could generate tables for all the isxxxxx()
> > functions (see below) at compile time. Would you be inclined to do
> > it that way?
>
> I think we can do it without that kind of compile-time hassle, if we
> can assume that the compile-time locale is the same as the run-time.
> I installed the attached patch, which makes that assumption, and which
> I hope does the right thing.
I'm a bit uneasy about hard-coding the list of control characters for
c_iscntrl() like that.
What about having a check in test-c-ctype that compares c_iscntrl() with
its system counterpart? If the assumption is that alternate EBCDIC
encodings used with Gnulib will agree with EBCDIC-1047 on these
characters, then that should be checked.
Also, perhaps the test should check that any character for which
c_iscntrl() is true returns false from most of the other functions...
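The two checks suggested above could be sketched roughly as follows.
This assumes an ASCII host running in the "C" locale, and
ascii_iscntrl_sketch() is a hypothetical stand-in for c_iscntrl(), not
the Gnulib function itself.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for c_iscntrl() on an ASCII host.  */
static bool
ascii_iscntrl_sketch (int c)
{
  return (0 <= c && c < ' ') || c == 0x7f;
}

/* Count the ASCII-range characters on which the stand-in and the
   system iscntrl() disagree.  */
static int
count_cntrl_disagreements (void)
{
  int c, n = 0;
  for (c = 0; c <= 0x7f; c++)
    if (ascii_iscntrl_sketch (c) != (iscntrl (c) != 0))
      n++;
  return n;
}

/* Count the characters that the system classifies as both control and
   printable; in the "C" locale there should be none.  */
static int
count_cntrl_print_overlap (void)
{
  int c, n = 0;
  for (c = 0; c <= UCHAR_MAX; c++)
    if (iscntrl (c) && isprint (c))
      n++;
  return n;
}
```

Both counters are expected to return zero; a nonzero result would point
at a code page that disagrees with the hard-coded list.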
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-23 6:58 ` Daniel Richard G.
@ 2015-09-23 19:05 ` Paul Eggert
2015-09-23 19:29 ` Paul Eggert
1 sibling, 0 replies; 49+ messages in thread
From: Paul Eggert @ 2015-09-23 19:05 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 223 bytes --]
On 09/22/2015 11:58 PM, Daniel Richard G. wrote:
> There is one
> assertion that needs to be #ifdef'ed out for EBCDIC:
Better than that, let's improve the assertion so that it works for
EBCDIC. I installed the attached.
[-- Attachment #2: 0001-c-ctype-improve-c_isascii-testing.patch --]
[-- Type: text/x-patch, Size: 1625 bytes --]
From a7a072a14945dfbe5fdd207926846b2c286b4b83 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 23 Sep 2015 12:02:35 -0700
Subject: [PATCH] c-ctype: improve c_isascii testing
* tests/test-c-ctype.c (test_all): Port c_isascii test to EBCDIC.
Add a test to count the number of ASCII characters.
---
ChangeLog | 6 ++++++
tests/test-c-ctype.c | 8 +++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/ChangeLog b/ChangeLog
index 7f7910b..493c915 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2015-09-23 Paul Eggert <eggert@cs.ucla.edu>
+
+ c-ctype: improve c_isascii testing
+ * tests/test-c-ctype.c (test_all): Port c_isascii test to EBCDIC.
+ Add a test to count the number of ASCII characters.
+
2015-09-22 Paul Eggert <eggert@cs.ucla.edu>
savewd: remove SAVEWD_CHDIR_READABLE
diff --git a/tests/test-c-ctype.c b/tests/test-c-ctype.c
index 63d0af9..80eb69d 100644
--- a/tests/test-c-ctype.c
+++ b/tests/test-c-ctype.c
@@ -37,6 +37,7 @@ static void
test_all (void)
{
int c;
+ int n_isascii = 0;
for (c = -0x80; c < 0x100; c++)
{
@@ -59,7 +60,10 @@ test_all (void)
ASSERT (to_char (c_toupper (c)) == to_char (c_toupper (c + 0x100)));
}
- ASSERT (c_isascii (c) == (c >= 0 && c < 0x80));
+ if (0 <= c)
+ n_isascii += c_isascii (c);
+
+ ASSERT (c_isascii (c) == (c_isprint (c) || c_iscntrl (c)));
ASSERT (c_isalnum (c) == (c_isalpha (c) || c_isdigit (c)));
@@ -383,6 +387,8 @@ test_all (void)
break;
}
}
+
+ ASSERT (n_isascii == 128);
}
int
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-23 6:58 ` Daniel Richard G.
2015-09-23 19:05 ` Paul Eggert
@ 2015-09-23 19:29 ` Paul Eggert
2015-09-23 21:57 ` Daniel Richard G.
1 sibling, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2015-09-23 19:29 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 543 bytes --]
On 09/22/2015 11:58 PM, Daniel Richard G. wrote:
> What about having a check in test-c-ctype that compares c_iscntrl() with
> its system counterpart? If the assumption is that alternate EBCDIC
> encodings used with Gnulib will agree with EBCDIC-1047 on these
> characters, then that should be checked.
Good idea. Done in the attached patch.
> Also, perhaps, that any character for which c_iscntrl() is true should
> return false from most of the other functions...
That's already tested by "ASSERT (! (c_iscntrl (c) && c_isprint (c)));".
[-- Attachment #2: 0001-Test-that-c_iscntrl-agrees-with-iscntrl-etc.patch --]
[-- Type: text/x-patch, Size: 5384 bytes --]
From 54237dcc0ce907673513e4812a1bb270d54737fb Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 23 Sep 2015 12:26:38 -0700
Subject: [PATCH] Test that c_iscntrl agrees with iscntrl, etc.
Suggested by Daniel Richard G. in:
http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00034.html
* modules/c-ctype-tests (Depends-on): Add ctype.
* tests/test-c-ctype.c: Include <ctype.h>.
(NCHARS): New constant.
(test_agree_with_C_locale): New function.
(main): Use it.
(test_all): Use named constants.
---
ChangeLog | 10 ++++++++
modules/c-ctype-tests | 2 +-
tests/test-c-ctype.c | 65 ++++++++++++++++++++++++++++++++++++++-------------
3 files changed, 60 insertions(+), 17 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 493c915..7eca2c4 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
2015-09-23 Paul Eggert <eggert@cs.ucla.edu>
+ Test that c_iscntrl agrees with iscntrl, etc.
+ Suggested by Daniel Richard G. in:
+ http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00034.html
+ * modules/c-ctype-tests (Depends-on): Add ctype.
+ * tests/test-c-ctype.c: Include <ctype.h>.
+ (NCHARS): New constant.
+ (test_agree_with_C_locale): New function.
+ (main): Use it.
+ (test_all): Use named constants.
+
c-ctype: improve c_isascii testing
* tests/test-c-ctype.c (test_all): Port c_isascii test to EBCDIC.
Add a test to count the number of ASCII characters.
diff --git a/modules/c-ctype-tests b/modules/c-ctype-tests
index 196f529..cb65ee3 100644
--- a/modules/c-ctype-tests
+++ b/modules/c-ctype-tests
@@ -3,10 +3,10 @@ tests/test-c-ctype.c
tests/macros.h
Depends-on:
+ctype
configure.ac:
Makefile.am:
TESTS += test-c-ctype
check_PROGRAMS += test-c-ctype
-
diff --git a/tests/test-c-ctype.c b/tests/test-c-ctype.c
index 80eb69d..481cbbb 100644
--- a/tests/test-c-ctype.c
+++ b/tests/test-c-ctype.c
@@ -20,11 +20,14 @@
#include "c-ctype.h"
+#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include "macros.h"
+enum { NCHARS = UCHAR_MAX + 1 };
+
static char
to_char (int c)
{
@@ -34,30 +37,58 @@ to_char (int c)
}
static void
+test_agree_with_C_locale (void)
+{
+ int c;
+
+ for (c = 0; c <= UCHAR_MAX; c++)
+ {
+ ASSERT (c_isascii (c) == (isascii (c) != 0));
+ if (c_isascii (c))
+ {
+ ASSERT (c_isalnum (c) == (isalnum (c) != 0));
+ ASSERT (c_isalpha (c) == (isalpha (c) != 0));
+ ASSERT (c_isblank (c) == (isblank (c) != 0));
+ ASSERT (c_iscntrl (c) == (iscntrl (c) != 0));
+ ASSERT (c_isdigit (c) == (isdigit (c) != 0));
+ ASSERT (c_islower (c) == (islower (c) != 0));
+ ASSERT (c_isgraph (c) == (isgraph (c) != 0));
+ ASSERT (c_isprint (c) == (isprint (c) != 0));
+ ASSERT (c_ispunct (c) == (ispunct (c) != 0));
+ ASSERT (c_isspace (c) == (isspace (c) != 0));
+ ASSERT (c_isupper (c) == (isupper (c) != 0));
+ ASSERT (c_isxdigit (c) == (isxdigit (c) != 0));
+ ASSERT (c_tolower (c) == tolower (c));
+ ASSERT (c_toupper (c) == toupper (c));
+ }
+ }
+}
+
+static void
test_all (void)
{
int c;
int n_isascii = 0;
- for (c = -0x80; c < 0x100; c++)
+ for (c = SCHAR_MIN; c <= UCHAR_MAX; c++)
{
if (c < 0)
{
- ASSERT (c_isascii (c) == c_isascii (c + 0x100));
- ASSERT (c_isalnum (c) == c_isalnum (c + 0x100));
- ASSERT (c_isalpha (c) == c_isalpha (c + 0x100));
- ASSERT (c_isblank (c) == c_isblank (c + 0x100));
- ASSERT (c_iscntrl (c) == c_iscntrl (c + 0x100));
- ASSERT (c_isdigit (c) == c_isdigit (c + 0x100));
- ASSERT (c_islower (c) == c_islower (c + 0x100));
- ASSERT (c_isgraph (c) == c_isgraph (c + 0x100));
- ASSERT (c_isprint (c) == c_isprint (c + 0x100));
- ASSERT (c_ispunct (c) == c_ispunct (c + 0x100));
- ASSERT (c_isspace (c) == c_isspace (c + 0x100));
- ASSERT (c_isupper (c) == c_isupper (c + 0x100));
- ASSERT (c_isxdigit (c) == c_isxdigit (c + 0x100));
- ASSERT (to_char (c_tolower (c)) == to_char (c_tolower (c + 0x100)));
- ASSERT (to_char (c_toupper (c)) == to_char (c_toupper (c + 0x100)));
+ ASSERT (c_isascii (c) == c_isascii (c + NCHARS));
+ ASSERT (c_isalnum (c) == c_isalnum (c + NCHARS));
+ ASSERT (c_isalpha (c) == c_isalpha (c + NCHARS));
+ ASSERT (c_isblank (c) == c_isblank (c + NCHARS));
+ ASSERT (c_iscntrl (c) == c_iscntrl (c + NCHARS));
+ ASSERT (c_isdigit (c) == c_isdigit (c + NCHARS));
+ ASSERT (c_islower (c) == c_islower (c + NCHARS));
+ ASSERT (c_isgraph (c) == c_isgraph (c + NCHARS));
+ ASSERT (c_isprint (c) == c_isprint (c + NCHARS));
+ ASSERT (c_ispunct (c) == c_ispunct (c + NCHARS));
+ ASSERT (c_isspace (c) == c_isspace (c + NCHARS));
+ ASSERT (c_isupper (c) == c_isupper (c + NCHARS));
+ ASSERT (c_isxdigit (c) == c_isxdigit (c + NCHARS));
+ ASSERT (to_char (c_tolower (c)) == to_char (c_tolower (c + NCHARS)));
+ ASSERT (to_char (c_toupper (c)) == to_char (c_toupper (c + NCHARS)));
}
if (0 <= c)
@@ -394,6 +425,8 @@ test_all (void)
int
main ()
{
+ test_agree_with_C_locale ();
+
test_all ();
setlocale (LC_ALL, "de_DE");
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-23 19:29 ` Paul Eggert
@ 2015-09-23 21:57 ` Daniel Richard G.
2015-09-25 7:29 ` Paul Eggert
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-23 21:57 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
Hi Paul,
I tested your changes in git a406de9c. A handful of fixes are needed:
* c_isascii(): Add \x07 (DEL) as an ASCII character.
* c_isascii(): Drop \xFF (EO), as this is not ASCII.
* c_iscntrl(): Add \x07 (DEL) as a control character.
* c_iscntrl(): Drop \xFF (EO), as apparently this is not a control
character.
* c_tolower(): In order to agree with tolower(), it needs to return the
_unsigned_ promoted form of the character. (Returning identity in the
default: case seems fine.)
* c_toupper(): Likewise.
* test-c-ctype.c: test_all(): As a result of the preceding two changes,
this is needed:
- ASSERT (c_tolower (c) == 'a');
+ ASSERT (to_char (c_tolower (c)) == 'a');
- ASSERT (c_toupper (c) == 'A');
+ ASSERT (to_char (c_toupper (c)) == 'A');
Signed characters are a real PITA in EBCDIC. Even something like
wchar_t buf[] = { 'a', 'b', 'c', '\0' };
doesn't work properly in that case.
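To illustrate why signed characters are awkward here, the following
sketch assumes 8-bit chars and the EBCDIC value 0x81 for 'a'. The names
EBCDIC_A_SIGNED and to_uchar_value() are hypothetical and exist only for
this example.

```c
#include <limits.h>

/* Value that the EBCDIC byte 0x81 ('a') takes when plain char is signed
   and 8 bits wide: the high bit makes it negative.  */
enum { EBCDIC_A_SIGNED = 0x81 - (UCHAR_MAX + 1) };

/* Hypothetical helper: map a value promoted from signed char back into
   the unsigned char range, which is the range that tolower() and
   toupper() return.  Conversion to unsigned char is modular, so the
   result is well defined for negative input.  */
static int
to_uchar_value (int c)
{
  return (unsigned char) c;
}
```

With these assumptions EBCDIC_A_SIGNED is -127, while tolower() would
hand back 129 for the same character, so a direct comparison fails
unless both sides are normalized first. Widening -127 to wchar_t
likewise yields a negative or very large value rather than 0x81.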
--Daniel
On Wed, 2015 Sep 23 12:29-0700, Paul Eggert wrote:
> On 09/22/2015 11:58 PM, Daniel Richard G. wrote:
> >
> > What about having a check in test-c-ctype that compares c_iscntrl()
> > with its system counterpart? If the assumption is that alternate
> > EBCDIC encodings used with Gnulib will agree with EBCDIC-1047 on
> > these characters, then that should be checked.
>
> Good idea. Done in the attached patch.
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-23 21:57 ` Daniel Richard G.
@ 2015-09-25 7:29 ` Paul Eggert
2015-09-26 0:25 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2015-09-25 7:29 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
Thanks for checking it. On further thought, I'd rather that we went to inline
functions, as that would have made ironing out all these glitches easier, and
anyway inline functions are typically the way to go for this sort of thing
nowadays. I installed a further patch to do that (see URL below); it should
also fix the c-ctype bugs you mentioned.
http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=43a090ce05f7046457be302ae4a17e83351968b0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-25 7:29 ` Paul Eggert
@ 2015-09-26 0:25 ` Daniel Richard G.
2015-09-26 2:49 ` Paul Eggert
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-26 0:25 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
Hi Paul,
On Fri, 2015 Sep 25 00:29-0700, Paul Eggert wrote:
> Thanks for checking it. On further thought, I'd rather that we went
> to inline functions, as that would have made ironing out all these
> glitches easier, and anyway inline functions are typically the way to
> go for this sort of thing nowadays. I installed a further patch to do
> that (see URL below); it should also fix the c-ctype bugs you
> mentioned.
>
> http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=43a090ce05f7046457be302ae4a17e83351968b0
When I run test-c-ctype with unsigned chars, a number of assertions trip
starting at c == -127. (-127 + NCHARS == 129 == 'a'). Here is the
complete list for that value, after removing the abort() from ASSERT():
.../test-c-ctype.c:82: assertion 'c_isascii (c) == c_isascii (c + NCHARS)' failed
.../test-c-ctype.c:83: assertion 'c_isalnum (c) == c_isalnum (c + NCHARS)' failed
.../test-c-ctype.c:84: assertion 'c_isalpha (c) == c_isalpha (c + NCHARS)' failed
.../test-c-ctype.c:88: assertion 'c_islower (c) == c_islower (c + NCHARS)' failed
.../test-c-ctype.c:89: assertion 'c_isgraph (c) == c_isgraph (c + NCHARS)' failed
.../test-c-ctype.c:90: assertion 'c_isprint (c) == c_isprint (c + NCHARS)' failed
.../test-c-ctype.c:94: assertion 'c_isxdigit (c) == c_isxdigit (c + NCHARS)' failed
.../test-c-ctype.c:96: assertion 'to_char (c_toupper (c)) == to_char (c_toupper (c + NCHARS))' failed
.../test-c-ctype.c:142: assertion 'c_islower (c) == 1' failed
.../test-c-ctype.c:203: assertion 'c_isxdigit (c) == 1' failed
.../test-c-ctype.c:243: assertion 'to_char (c_toupper (c)) == 'A'' failed
(line numbers will have minor deltas due to printf() debugging)
The way the c_isxxxxx() functions are written now makes it a little
difficult for me to determine what's going on, but it should be
clearer to you.
When I run the test with signed chars, there are only a couple failures,
and they represent an odd corner case of EBCDIC.
So in z/OS, '\n' == 0x15, and that is the normal end-of-line marker:
$ echo x | od -t x1
0000000000 A7 15
0000000002
According to
https://www-304.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm?lang=en
ISO 8859-1 codepoint 0x0A (LF) corresponds to IBM-1047 codepoint
0x15 (NL/newline).
IBM-1047 does contain LF, at 0x25. But per IBM, that does not map to
anything in ISO 8859-1.
(IANA disagrees, of course: EBCDIC 0x15 == U+0085 and
EBCDIC 0x25 == U+000A. But that does you little good in z/OS.)
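The two competing mappings can be written down as a small table. This is
only a sketch of the values quoted above; the struct and function names
are hypothetical, and -1 is used here to mean "no counterpart".

```c
#include <stddef.h>

/* One row per IBM-1047 code point discussed above.  */
struct nl_lf_mapping
{
  unsigned char ibm1047;  /* EBCDIC code point */
  int ibm_latin1;         /* ISO 8859-1 code point per IBM, or -1 */
  int iana_unicode;       /* Unicode code point per IANA */
};

static const struct nl_lf_mapping nl_lf_table[] =
{
  { 0x15, 0x0A, 0x0085 },  /* EBCDIC NL: LF per IBM, U+0085 per IANA */
  { 0x25, -1,   0x000A },  /* EBCDIC LF: unmapped per IBM, U+000A per IANA */
};

/* Return the table row for EBCDIC, or NULL if it is not one of the two
   code points covered here.  */
static const struct nl_lf_mapping *
lookup_nl_lf (unsigned char ebcdic)
{
  size_t i;
  for (i = 0; i < sizeof nl_lf_table / sizeof nl_lf_table[0]; i++)
    if (nl_lf_table[i].ibm1047 == ebcdic)
      return &nl_lf_table[i];
  return NULL;
}
```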
What's more, all the system isxxxxx() functions---including isascii(),
iscntrl() and isspace()---return false for 0x25.
There is probably some ancient history behind the NL<->LF mapping,
seeing as EBCDIC has both characters and ASCII only has the latter. My
hypothesis is that UNIX decided to "emulate" NL using LF, and as UNIX
became popular and linefeeds became standardized as an end-of-line
marker, IBM figured it made more sense to map it to NL (as a functional
equivalent) than to LF (as a pedantically-correct translation).
EBCDIC LF being classified as neither control nor space looks dodgy. But as
it appears that all control and space characters are also isascii()
characters, I suspect IBM for whatever reason did not want to have a
codepoint that would be an exception to that rule.
So to make a long story short: After I add \x15 and remove \x25 to/from
_C_CTYPE_CNTRL for EBCDIC, the test passes in the signed-char case.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-26 0:25 ` Daniel Richard G.
@ 2015-09-26 2:49 ` Paul Eggert
2015-09-26 4:39 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2015-09-26 2:49 UTC (permalink / raw)
To: Daniel Richard G., bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 432 bytes --]
Daniel Richard G. wrote:
> So to make a long story short: After I add \x15 and remove \x25 to/from
> _C_CTYPE_CNTRL for EBCDIC, the test passes in the signed-char case.
Thanks. Given all that history, let's rewrite it so that the compiler decides
what '\n' maps to; that way it'll work even in EBCDIC environments that agree
with IANA instead of IBM. I installed the attached patch, which should fix the
bugs you mentioned.
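In reduced form, the approach looks like the sketch below: the case
labels use the C standard escapes, so the compiler supplies whatever
code points the execution character set assigns to them. The function
name is hypothetical; the attached patch puts the same labels in the
_C_CTYPE_CNTRL macro.

```c
#include <stdbool.h>

/* Hypothetical helper: true if C is one of the control characters that
   have a backslash-letter escape in C.  No numeric code points appear,
   so the same source is correct whether '\n' is 0x0A, 0x15 or 0x25.  */
static bool
is_escape_cntrl (int c)
{
  switch (c)
    {
    case '\a': case '\b': case '\f': case '\n':
    case '\r': case '\t': case '\v':
      return true;
    default:
      return false;
    }
}
```

The remaining control characters have no portable spelling, which is why
the patch still lists them by value in _C_CTYPE_OTHER_CNTRL.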
[-- Attachment #2: 0001-c-ctype-port-better-to-z-OS-EBCDIC.patch --]
[-- Type: text/plain, Size: 4416 bytes --]
From b3807b62cc5e4e06a74c69665cb171ef51b40567 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 25 Sep 2015 19:45:59 -0700
Subject: [PATCH] c-ctype: port better to z/OS EBCDIC
Problems reported by Daniel Richard G. in:
http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00050.html
* lib/c-ctype.h (_C_CTYPE_CNTRL): Rewrite in terms of
the C standard escapes and _C_CTYPE_OTHER_CNTRL.
(_C_CTYPE_OTHER_CNTRL): New macro.
* tests/test-c-ctype.c (test_all): Test from CHAR_MIN, not
from SCHAR_MIN, as the functions are defined only from values
promoted from char or from unsigned char, not necessarily from
signed char.
---
ChangeLog | 13 +++++++++++++
lib/c-ctype.h | 41 +++++++++++++++++++++++------------------
tests/test-c-ctype.c | 2 +-
3 files changed, 37 insertions(+), 19 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 3b3f101..a347908 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2015-09-25 Paul Eggert <eggert@cs.ucla.edu>
+
+ c-ctype: port better to z/OS EBCDIC
+ Problems reported by Daniel Richard G. in:
+ http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00050.html
+ * lib/c-ctype.h (_C_CTYPE_CNTRL): Rewrite in terms of
+ the C standard escapes and _C_CTYPE_OTHER_CNTRL.
+ (_C_CTYPE_OTHER_CNTRL): New macro.
+ * tests/test-c-ctype.c (test_all): Test from CHAR_MIN, not
+ from SCHAR_MIN, as the functions are defined only from values
+ promoted from char or from unsigned char, not necessarily from
+ signed char.
+
2015-09-25 Pavel Raiskup <praiskup@redhat.com>
gnulib-common.m4: fix gl_PROG_AR_RANLIB/AM_PROG_AR clash
diff --git a/lib/c-ctype.h b/lib/c-ctype.h
index 1292fc8..88e001f 100644
--- a/lib/c-ctype.h
+++ b/lib/c-ctype.h
@@ -80,30 +80,35 @@ extern "C" {
#define _C_CTYPE_SIGNED_EBCDIC ('A' < 0)
+/* Cases for control characters. */
+
+#define _C_CTYPE_CNTRL \
+ case '\a': case '\b': case '\f': case '\n': \
+ case '\r': case '\t': case '\v': \
+ _C_CTYPE_OTHER_CNTRL
+
+/* ASCII control characters other than those with \-letter escapes. */
+
#if C_CTYPE_ASCII
-# define _C_CTYPE_CNTRL \
+# define _C_CTYPE_OTHER_CNTRL \
case '\x00': case '\x01': case '\x02': case '\x03': \
- case '\x04': case '\x05': case '\x06': case '\x07': \
- case '\x08': case '\x09': case '\x0a': case '\x0b': \
- case '\x0c': case '\x0d': case '\x0e': case '\x0f': \
- case '\x10': case '\x11': case '\x12': case '\x13': \
- case '\x14': case '\x15': case '\x16': case '\x17': \
- case '\x18': case '\x19': case '\x1a': case '\x1b': \
- case '\x1c': case '\x1d': case '\x1e': case '\x1f': \
- case '\x7f'
+ case '\x04': case '\x05': case '\x06': case '\x0e': \
+ case '\x0f': case '\x10': case '\x11': case '\x12': \
+ case '\x13': case '\x14': case '\x15': case '\x16': \
+ case '\x17': case '\x18': case '\x19': case '\x1a': \
+ case '\x1b': case '\x1c': case '\x1d': case '\x1e': \
+ case '\x1f': case '\x7f'
#else
/* Use EBCDIC code page 1047's assignments for ASCII control chars;
assume all EBCDIC code pages agree about these assignments. */
-# define _C_CTYPE_CNTRL \
+# define _C_CTYPE_OTHER_CNTRL \
case '\x00': case '\x01': case '\x02': case '\x03': \
- case '\x05': case '\x07': case '\x0b': case '\x0c': \
- case '\x0d': case '\x0e': case '\x0f': case '\x10': \
- case '\x11': case '\x12': case '\x13': case '\x16': \
- case '\x18': case '\x19': case '\x1c': case '\x1d': \
- case '\x1e': case '\x1f': case '\x25': case '\x26': \
- case '\x27': case '\x2d': case '\x2e': case '\x2f': \
- case '\x32': case '\x37': case '\x3c': case '\x3d': \
- case '\x3f'
+ case '\x07': case '\x0e': case '\x0f': case '\x10': \
+ case '\x11': case '\x12': case '\x13': case '\x18': \
+ case '\x19': case '\x1c': case '\x1d': case '\x1e': \
+ case '\x1f': case '\x26': case '\x27': case '\x2d': \
+ case '\x2e': case '\x32': case '\x37': case '\x3c': \
+ case '\x3d': case '\x3f'
#endif
/* Cases for hex letter digits, digits, lower, and upper, offset by N. */
diff --git a/tests/test-c-ctype.c b/tests/test-c-ctype.c
index d25dc03..544adeb 100644
--- a/tests/test-c-ctype.c
+++ b/tests/test-c-ctype.c
@@ -70,7 +70,7 @@ test_all (void)
int c;
int n_isascii = 0;
- for (c = SCHAR_MIN; c <= UCHAR_MAX; c++)
+ for (c = CHAR_MIN; c <= UCHAR_MAX; c++)
{
if (c < 0)
{
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-26 2:49 ` Paul Eggert
@ 2015-09-26 4:39 ` Daniel Richard G.
2015-09-26 16:08 ` Ben Pfaff
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-26 4:39 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
On Fri, 2015 Sep 25 19:49-0700, Paul Eggert wrote:
>
> Thanks. Given all that history, let's rewrite it so that the compiler
> can decide what '\n' maps to; that way it'll work even in EBCDIC
> environments that agree with IANA instead of IBM. I installed the
> attached patch, which should fix the bugs you mentioned.
I'm happy to report that test-c-ctype in Git ff1ef114 now passes with
both signed and unsigned EBCDIC chars on z/OS. Thank you for chasing
this down!
I will investigate further some of the issues that have been uncovered
in this thread, and return to this list with my findings in a few days.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-26 4:39 ` Daniel Richard G.
@ 2015-09-26 16:08 ` Ben Pfaff
2015-09-27 6:31 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Ben Pfaff @ 2015-09-26 16:08 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: Paul Eggert, bug-gnulib
On Sat, Sep 26, 2015 at 12:39:52AM -0400, Daniel Richard G. wrote:
> I'm happy to report that test-c-ctype in Git ff1ef114 now passes with
> both signed and unsigned EBCDIC chars on z/OS. Thank you for chasing
> this down!
A "char" configured as signed in EBCDIC violates the ANSI C standard,
which says:
If a member of the basic execution character set is stored in a
char object, its value is guaranteed to be positive.
where the "basic execution character set" is defined as:
Both the basic source and basic execution character sets shall have
the following members: the 26 uppercase letters of the Latin
alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
the space character, and control characters representing horizontal
tab, vertical tab, and form feed.
Do people actually use signed "char" with EBCDIC?
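The guarantee quoted above can be checked mechanically. What follows is only a minimal sketch, not anything from Gnulib: the helper name count_negative_basic_chars is made up for illustration, and the character list is typed in by hand from the C89 text quoted above. It counts how many members of the basic execution character set are negative when stored in a plain char; on a conforming implementation the count should be 0.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Gnulib): count members of the basic
   execution character set whose value is negative when stored in a
   plain char.  */
static int
count_negative_basic_chars (void)
{
  static const char basic[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    " \t\v\f";
  int negative = 0;
  size_t i;

  /* sizeof includes the terminating NUL, which is skipped here.  */
  for (i = 0; i < sizeof basic - 1; i++)
    if (basic[i] < 0)
      negative++;
  return negative;
}
```

A build with EBCDIC and signed char would presumably report a nonzero count here, since the EBCDIC letters and digits all have the top bit set.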
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-26 16:08 ` Ben Pfaff
@ 2015-09-27 6:31 ` Daniel Richard G.
2015-09-27 6:59 ` Paul Eggert
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-27 6:31 UTC (permalink / raw)
To: Ben Pfaff; +Cc: Paul Eggert, bug-gnulib
On Sat, 2015 Sep 26 09:08-0700, Ben Pfaff wrote:
>
> A "char" configured as signed in EBCDIC violates the ANSI C standard,
> which says:
>
> If a member of the basic execution character set is stored in a
> char object, its value is guaranteed to be positive.
Now _that's_ a welcome bit of clarity.
While (IMO) it is reasonable to support the oddball case of negative
basic chars where the logic can be centralized, there are numerous
instances where problems arise in common C idioms that are less cleanly
addressable.
Examples that I've found so far in Gnulib include:
if (getc(f) == 'x') { ... }
wchar_t buf[] = { 'a', 'b', 'c', '\0' };
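To spell out why the first idiom misbehaves, here is a rough model that runs on any host. It is a sketch under stated assumptions: the function names are hypothetical, 0xA7 is taken as the EBCDIC code point of 'x', and char is assumed to be 8 bits wide. getc returns the byte as an unsigned char converted to int, while a signed-char build would give the literal 'x' a negative value, so the two never compare equal.

```c
/* EBCDIC code point of 'x' (assumed: 0xA7, as in the common code pages).  */
enum { EBCDIC_X = 0xA7 };

/* Value getc would return for that byte: an unsigned char converted
   to int, so always in the range 0..255.  */
static int
value_from_getc (unsigned char byte)
{
  return byte;
}

/* Value the character literal would have if char were signed and 8 bits
   wide: bytes with the top bit set wrap to negative values.  The
   conversion is modelled arithmetically so the code runs on any host.  */
static int
value_of_signed_literal (unsigned char byte)
{
  return byte < 0x80 ? byte : byte - 0x100;
}

/* Nonzero if "getc (f) == 'x'" could still match under this model.  */
static int
comparison_matches (unsigned char byte)
{
  return value_from_getc (byte) == value_of_signed_literal (byte);
}
```

In this model comparison_matches (EBCDIC_X) is 0, whereas the ASCII code point 0x78 still matches. The wchar_t initializer has the same root cause: each negative literal would be converted to wchar_t rather than to the intended code point.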
> Do people actually use signed "char" with EBCDIC?
It's certainly not the default, but given the sort of history and
longevity that surround many mainframe installations, I wouldn't
be surprised if some folks do. Not that the xlc man page gives
any hint why:
-qchars={signed|unsigned}
Determines whether all variables of type char are
treated as either signed or unsigned.
The default is -qchars=unsigned.
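For what it's worth, code can tell which of the two settings it was built with by looking at CHAR_MIN from <limits.h>. A minimal sketch follows; the helper name char_is_signed is made up for illustration.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed in this
   build, 0 if it is unsigned.  CHAR_MIN is 0 exactly when char is
   unsigned.  */
static int
char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```

With the default -qchars=unsigned this should presumably return 0.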
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-27 6:31 ` Daniel Richard G.
@ 2015-09-27 6:59 ` Paul Eggert
2015-09-28 2:09 ` Daniel Richard G.
2015-10-15 4:49 ` Daniel Richard G.
0 siblings, 2 replies; 49+ messages in thread
From: Paul Eggert @ 2015-09-27 6:59 UTC (permalink / raw)
To: Daniel Richard G., Ben Pfaff; +Cc: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 691 bytes --]
Daniel Richard G. wrote:
> It's certainly not the default, but given the sort of history and
> longevity that surround many mainframe installations, I wouldn't
> be surprised if some folks do.
Given all the problems mentioned (including some in the proposed patches), let's
give up on trying to support any such folks. If they want to build gnulib-using
software on z/OS, they'll have to build with the default configuration in which
char is unsigned. It wouldn't be practical for us to try to support char being
signed when standard chars have the top bit set. With that in mind I installed
the attached further patch, which simplifies the recent changes to c-ctype quite
a bit.
[-- Attachment #2: 0001-c-ctype-do-not-worry-about-EBCDIC-char-signed.patch --]
[-- Type: text/plain, Size: 22483 bytes --]
From d25768c6961d9d94492c328035a35ceb140dec6f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 26 Sep 2015 23:55:07 -0700
Subject: [PATCH] c-ctype: do not worry about EBCDIC + char signed
Drop support for EBCDIC with char being signed, as this breaks too
many programs. Problem reported by Ben Pfaff in:
http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00053.html
* lib/c-ctype.h: Verify that we are not using EBCDIC with
char being signed.
(_C_CTYPE_LOWER_A_THRU_F_N): New macro.
(_C_CTYPE_LOWER_N, _C_CTYPE_A_THRU_F): Use it.
(_C_CTYPE_DIGIT, _C_CTYPE_LOWER, _C_CTYPE_PUNCT, _C_CTYPE_UPPER):
(c_isascii, c_isgraph, c_isprint, c_ispunct, c_tolower, c_toupper):
* tests/test-c-ctype.c (test_all):
Simplify by assuming standard char values cannot be negative.
* tests/test-c-ctype.c (NCHARS, to_char): Remove; all uses removed.
---
ChangeLog | 16 ++
lib/c-ctype.h | 490 ++++-----------------------------------------------
tests/test-c-ctype.c | 137 +++++---------
3 files changed, 88 insertions(+), 555 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index a347908..1584e29 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2015-09-26 Paul Eggert <eggert@cs.ucla.edu>
+
+ c-ctype: do not worry about EBCDIC + char signed
+ Drop support for EBCDIC with char being signed, as this breaks too
+ many programs. Problem reported by Ben Pfaff in:
+ http://lists.gnu.org/archive/html/bug-gnulib/2015-09/msg00053.html
+ * lib/c-ctype.h: Verify that we are not using EBCDIC with
+ char being signed.
+ (_C_CTYPE_LOWER_A_THRU_F_N): New macro.
+ (_C_CTYPE_LOWER_N, _C_CTYPE_A_THRU_F): Use it.
+ (_C_CTYPE_DIGIT, _C_CTYPE_LOWER, _C_CTYPE_PUNCT, _C_CTYPE_UPPER):
+ (c_isascii, c_isgraph, c_isprint, c_ispunct, c_tolower, c_toupper):
+ * tests/test-c-ctype.c (test_all):
+ Simplify by assuming standard char values cannot be negative.
+ * tests/test-c-ctype.c (NCHARS, to_char): Remove; all uses removed.
+
2015-09-25 Paul Eggert <eggert@cs.ucla.edu>
c-ctype: port better to z/OS EBCDIC
diff --git a/lib/c-ctype.h b/lib/c-ctype.h
index 88e001f..907e1e2 100644
--- a/lib/c-ctype.h
+++ b/lib/c-ctype.h
@@ -78,7 +78,9 @@ extern "C" {
# error "Only ASCII and EBCDIC are supported"
#endif
-#define _C_CTYPE_SIGNED_EBCDIC ('A' < 0)
+#if 'A' < 0
+# error "EBCDIC and char is signed -- not supported"
+#endif
/* Cases for control characters. */
@@ -111,54 +113,30 @@ extern "C" {
case '\x3d': case '\x3f'
#endif
-/* Cases for hex letter digits, digits, lower, and upper, offset by N. */
+/* Cases for lowercase hex letters, and lowercase letters, all offset by N. */
-#define _C_CTYPE_A_THRU_F_N(n) \
+#define _C_CTYPE_LOWER_A_THRU_F_N(n) \
case 'a' + (n): case 'b' + (n): case 'c' + (n): case 'd' + (n): \
- case 'e' + (n): case 'f' + (n): \
- case 'A' + (n): case 'B' + (n): case 'C' + (n): case 'D' + (n): \
- case 'E' + (n): case 'F' + (n)
-#define _C_CTYPE_DIGIT_N(n) \
- case '0' + (n): case '1' + (n): case '2' + (n): case '3' + (n): \
- case '4' + (n): case '5' + (n): case '6' + (n): case '7' + (n): \
- case '8' + (n): case '9' + (n)
+ case 'e' + (n): case 'f' + (n)
#define _C_CTYPE_LOWER_N(n) \
- case 'a' + (n): case 'b' + (n): case 'c' + (n): case 'd' + (n): \
- case 'e' + (n): case 'f' + (n): case 'g' + (n): case 'h' + (n): \
- case 'i' + (n): case 'j' + (n): case 'k' + (n): case 'l' + (n): \
- case 'm' + (n): case 'n' + (n): case 'o' + (n): case 'p' + (n): \
- case 'q' + (n): case 'r' + (n): case 's' + (n): case 't' + (n): \
- case 'u' + (n): case 'v' + (n): case 'w' + (n): case 'x' + (n): \
- case 'y' + (n): case 'z' + (n)
-#define _C_CTYPE_UPPER_N(n) \
- case 'A' + (n): case 'B' + (n): case 'C' + (n): case 'D' + (n): \
- case 'E' + (n): case 'F' + (n): case 'G' + (n): case 'H' + (n): \
- case 'I' + (n): case 'J' + (n): case 'K' + (n): case 'L' + (n): \
- case 'M' + (n): case 'N' + (n): case 'O' + (n): case 'P' + (n): \
- case 'Q' + (n): case 'R' + (n): case 'S' + (n): case 'T' + (n): \
- case 'U' + (n): case 'V' + (n): case 'W' + (n): case 'X' + (n): \
- case 'Y' + (n): case 'Z' + (n)
-
-/* Given MACRO_N, expand to all the cases for the corresponding class. */
-#if _C_CTYPE_SIGNED_EBCDIC
-# define _C_CTYPE_CASES(macro_n) macro_n (0): macro_n (256)
-#else
-# define _C_CTYPE_CASES(macro_n) macro_n (0)
-#endif
-
-/* Cases for hex letter digits, digits, lower, and upper, with another
- case for unsigned char if the original char is negative. */
-
-#define _C_CTYPE_A_THRU_F _C_CTYPE_CASES (_C_CTYPE_A_THRU_F_N)
-#define _C_CTYPE_DIGIT _C_CTYPE_CASES (_C_CTYPE_DIGIT_N)
-#define _C_CTYPE_LOWER _C_CTYPE_CASES (_C_CTYPE_LOWER_N)
-#define _C_CTYPE_UPPER _C_CTYPE_CASES (_C_CTYPE_UPPER_N)
-
-/* The punct class differs because some punctuation characters may be
- negative while others are nonnegative. Instead of attempting to
- define _C_CTYPE_PUNCT, define just the plain chars here, and do any
- cases-plus-256 by hand after using this macro. */
-#define _C_CTYPE_PUNCT_PLAIN \
+ _C_CTYPE_LOWER_A_THRU_F_N(n): \
+ case 'g' + (n): case 'h' + (n): case 'i' + (n): case 'j' + (n): \
+ case 'k' + (n): case 'l' + (n): case 'm' + (n): case 'n' + (n): \
+ case 'o' + (n): case 'p' + (n): case 'q' + (n): case 'r' + (n): \
+ case 's' + (n): case 't' + (n): case 'u' + (n): case 'v' + (n): \
+ case 'w' + (n): case 'x' + (n): case 'y' + (n): case 'z' + (n)
+
+/* Cases for hex letters, digits, lower, punct, and upper. */
+
+#define _C_CTYPE_A_THRU_F \
+ _C_CTYPE_LOWER_A_THRU_F_N (0): \
+ _C_CTYPE_LOWER_A_THRU_F_N ('A' - 'a')
+#define _C_CTYPE_DIGIT \
+ case '0': case '1': case '2': case '3': \
+ case '4': case '5': case '6': case '7': \
+ case '8': case '9'
+#define _C_CTYPE_LOWER _C_CTYPE_LOWER_N (0)
+#define _C_CTYPE_PUNCT \
case '!': case '"': case '#': case '$': \
case '%': case '&': case '\'': case '(': \
case ')': case '*': case '+': case ',': \
@@ -167,6 +145,8 @@ extern "C" {
case '?': case '@': case '[': case '\\': \
case ']': case '^': case '_': case '`': \
case '{': case '|': case '}': case '~'
+#define _C_CTYPE_UPPER _C_CTYPE_LOWER_N ('A' - 'a')
+
/* Function definitions. */
@@ -194,7 +174,6 @@ c_isalnum (int c)
_C_CTYPE_LOWER:
_C_CTYPE_UPPER:
return true;
-
default:
return false;
}
@@ -208,7 +187,6 @@ c_isalpha (int c)
_C_CTYPE_LOWER:
_C_CTYPE_UPPER:
return true;
-
default:
return false;
}
@@ -225,107 +203,9 @@ c_isascii (int c)
_C_CTYPE_CNTRL:
_C_CTYPE_DIGIT:
_C_CTYPE_LOWER:
+ _C_CTYPE_PUNCT:
_C_CTYPE_UPPER:
-
- _C_CTYPE_PUNCT_PLAIN:
-#if '!' < 0
- case '!' + 256:
-#endif
-#if '"' < 0
- case '"' + 256:
-#endif
-#if '#' < 0
- case '#' + 256:
-#endif
-#if '$' < 0
- case '$' + 256:
-#endif
-#if '%' < 0
- case '%' + 256:
-#endif
-#if '&' < 0
- case '&' + 256:
-#endif
-#if '\'' < 0
- case '\'' + 256:
-#endif
-#if '(' < 0
- case '(' + 256:
-#endif
-#if ')' < 0
- case ')' + 256:
-#endif
-#if '*' < 0
- case '*' + 256:
-#endif
-#if '+' < 0
- case '+' + 256:
-#endif
-#if ',' < 0
- case ',' + 256:
-#endif
-#if '-' < 0
- case '-' + 256:
-#endif
-#if '.' < 0
- case '.' + 256:
-#endif
-#if '/' < 0
- case '/' + 256:
-#endif
-#if ':' < 0
- case ':' + 256:
-#endif
-#if ';' < 0
- case ';' + 256:
-#endif
-#if '<' < 0
- case '<' + 256:
-#endif
-#if '=' < 0
- case '=' + 256:
-#endif
-#if '>' < 0
- case '>' + 256:
-#endif
-#if '?' < 0
- case '?' + 256:
-#endif
-#if '@' < 0
- case '@' + 256:
-#endif
-#if '[' < 0
- case '[' + 256:
-#endif
-#if '\\' < 0
- case '\\' + 256:
-#endif
-#if ']' < 0
- case ']' + 256:
-#endif
-#if '^' < 0
- case '^' + 256:
-#endif
-#if '_' < 0
- case '_' + 256:
-#endif
-#if '`' < 0
- case '`' + 256:
-#endif
-#if '{' < 0
- case '{' + 256:
-#endif
-#if '|' < 0
- case '|' + 256:
-#endif
-#if '}' < 0
- case '}' + 256:
-#endif
-#if '~' < 0
- case '~' + 256:
-#endif
return true;
-
default:
return false;
}
@@ -368,107 +248,9 @@ c_isgraph (int c)
{
_C_CTYPE_DIGIT:
_C_CTYPE_LOWER:
+ _C_CTYPE_PUNCT:
_C_CTYPE_UPPER:
-
- _C_CTYPE_PUNCT_PLAIN:
-#if '!' < 0
- case '!' + 256:
-#endif
-#if '"' < 0
- case '"' + 256:
-#endif
-#if '#' < 0
- case '#' + 256:
-#endif
-#if '$' < 0
- case '$' + 256:
-#endif
-#if '%' < 0
- case '%' + 256:
-#endif
-#if '&' < 0
- case '&' + 256:
-#endif
-#if '\'' < 0
- case '\'' + 256:
-#endif
-#if '(' < 0
- case '(' + 256:
-#endif
-#if ')' < 0
- case ')' + 256:
-#endif
-#if '*' < 0
- case '*' + 256:
-#endif
-#if '+' < 0
- case '+' + 256:
-#endif
-#if ',' < 0
- case ',' + 256:
-#endif
-#if '-' < 0
- case '-' + 256:
-#endif
-#if '.' < 0
- case '.' + 256:
-#endif
-#if '/' < 0
- case '/' + 256:
-#endif
-#if ':' < 0
- case ':' + 256:
-#endif
-#if ';' < 0
- case ';' + 256:
-#endif
-#if '<' < 0
- case '<' + 256:
-#endif
-#if '=' < 0
- case '=' + 256:
-#endif
-#if '>' < 0
- case '>' + 256:
-#endif
-#if '?' < 0
- case '?' + 256:
-#endif
-#if '@' < 0
- case '@' + 256:
-#endif
-#if '[' < 0
- case '[' + 256:
-#endif
-#if '\\' < 0
- case '\\' + 256:
-#endif
-#if ']' < 0
- case ']' + 256:
-#endif
-#if '^' < 0
- case '^' + 256:
-#endif
-#if '_' < 0
- case '_' + 256:
-#endif
-#if '`' < 0
- case '`' + 256:
-#endif
-#if '{' < 0
- case '{' + 256:
-#endif
-#if '|' < 0
- case '|' + 256:
-#endif
-#if '}' < 0
- case '}' + 256:
-#endif
-#if '~' < 0
- case '~' + 256:
-#endif
return true;
-
default:
return false;
}
@@ -494,107 +276,9 @@ c_isprint (int c)
case ' ':
_C_CTYPE_DIGIT:
_C_CTYPE_LOWER:
+ _C_CTYPE_PUNCT:
_C_CTYPE_UPPER:
-
- _C_CTYPE_PUNCT_PLAIN:
-#if '!' < 0
- case '!' + 256:
-#endif
-#if '"' < 0
- case '"' + 256:
-#endif
-#if '#' < 0
- case '#' + 256:
-#endif
-#if '$' < 0
- case '$' + 256:
-#endif
-#if '%' < 0
- case '%' + 256:
-#endif
-#if '&' < 0
- case '&' + 256:
-#endif
-#if '\'' < 0
- case '\'' + 256:
-#endif
-#if '(' < 0
- case '(' + 256:
-#endif
-#if ')' < 0
- case ')' + 256:
-#endif
-#if '*' < 0
- case '*' + 256:
-#endif
-#if '+' < 0
- case '+' + 256:
-#endif
-#if ',' < 0
- case ',' + 256:
-#endif
-#if '-' < 0
- case '-' + 256:
-#endif
-#if '.' < 0
- case '.' + 256:
-#endif
-#if '/' < 0
- case '/' + 256:
-#endif
-#if ':' < 0
- case ':' + 256:
-#endif
-#if ';' < 0
- case ';' + 256:
-#endif
-#if '<' < 0
- case '<' + 256:
-#endif
-#if '=' < 0
- case '=' + 256:
-#endif
-#if '>' < 0
- case '>' + 256:
-#endif
-#if '?' < 0
- case '?' + 256:
-#endif
-#if '@' < 0
- case '@' + 256:
-#endif
-#if '[' < 0
- case '[' + 256:
-#endif
-#if '\\' < 0
- case '\\' + 256:
-#endif
-#if ']' < 0
- case ']' + 256:
-#endif
-#if '^' < 0
- case '^' + 256:
-#endif
-#if '_' < 0
- case '_' + 256:
-#endif
-#if '`' < 0
- case '`' + 256:
-#endif
-#if '{' < 0
- case '{' + 256:
-#endif
-#if '|' < 0
- case '|' + 256:
-#endif
-#if '}' < 0
- case '}' + 256:
-#endif
-#if '~' < 0
- case '~' + 256:
-#endif
return true;
-
default:
return false;
}
@@ -605,105 +289,8 @@ c_ispunct (int c)
{
switch (c)
{
- _C_CTYPE_PUNCT_PLAIN:
-#if '!' < 0
- case '!' + 256:
-#endif
-#if '"' < 0
- case '"' + 256:
-#endif
-#if '#' < 0
- case '#' + 256:
-#endif
-#if '$' < 0
- case '$' + 256:
-#endif
-#if '%' < 0
- case '%' + 256:
-#endif
-#if '&' < 0
- case '&' + 256:
-#endif
-#if '\'' < 0
- case '\'' + 256:
-#endif
-#if '(' < 0
- case '(' + 256:
-#endif
-#if ')' < 0
- case ')' + 256:
-#endif
-#if '*' < 0
- case '*' + 256:
-#endif
-#if '+' < 0
- case '+' + 256:
-#endif
-#if ',' < 0
- case ',' + 256:
-#endif
-#if '-' < 0
- case '-' + 256:
-#endif
-#if '.' < 0
- case '.' + 256:
-#endif
-#if '/' < 0
- case '/' + 256:
-#endif
-#if ':' < 0
- case ':' + 256:
-#endif
-#if ';' < 0
- case ';' + 256:
-#endif
-#if '<' < 0
- case '<' + 256:
-#endif
-#if '=' < 0
- case '=' + 256:
-#endif
-#if '>' < 0
- case '>' + 256:
-#endif
-#if '?' < 0
- case '?' + 256:
-#endif
-#if '@' < 0
- case '@' + 256:
-#endif
-#if '[' < 0
- case '[' + 256:
-#endif
-#if '\\' < 0
- case '\\' + 256:
-#endif
-#if ']' < 0
- case ']' + 256:
-#endif
-#if '^' < 0
- case '^' + 256:
-#endif
-#if '_' < 0
- case '_' + 256:
-#endif
-#if '`' < 0
- case '`' + 256:
-#endif
-#if '{' < 0
- case '{' + 256:
-#endif
-#if '|' < 0
- case '|' + 256:
-#endif
-#if '}' < 0
- case '}' + 256:
-#endif
-#if '~' < 0
- case '~' + 256:
-#endif
+ _C_CTYPE_PUNCT:
return true;
-
default:
return false;
}
@@ -741,7 +328,6 @@ c_isxdigit (int c)
_C_CTYPE_DIGIT:
_C_CTYPE_A_THRU_F:
return true;
-
default:
return false;
}
@@ -752,14 +338,8 @@ c_tolower (int c)
{
switch (c)
{
- _C_CTYPE_UPPER_N (0):
-#if _C_CTYPE_SIGNED_EBCDIC
- c += 256;
- /* Fall through. */
- _C_CTYPE_UPPER_N (256):
-#endif
+ _C_CTYPE_UPPER:
return c - 'A' + 'a';
-
default:
return c;
}
@@ -770,14 +350,8 @@ c_toupper (int c)
{
switch (c)
{
- _C_CTYPE_LOWER_N (0):
-#if _C_CTYPE_SIGNED_EBCDIC
- c += 256;
- /* Fall through. */
- _C_CTYPE_LOWER_N (256):
-#endif
+ _C_CTYPE_LOWER:
return c - 'a' + 'A';
-
default:
return c;
}
diff --git a/tests/test-c-ctype.c b/tests/test-c-ctype.c
index 544adeb..9780554 100644
--- a/tests/test-c-ctype.c
+++ b/tests/test-c-ctype.c
@@ -26,16 +26,6 @@
#include "macros.h"
-enum { NCHARS = UCHAR_MAX + 1 };
-
-static char
-to_char (int c)
-{
- if (CHAR_MIN < 0 && CHAR_MAX < c)
- return c - CHAR_MAX - 1 + CHAR_MIN;
- return c;
-}
-
static void
test_agree_with_C_locale (void)
{
@@ -72,27 +62,30 @@ test_all (void)
for (c = CHAR_MIN; c <= UCHAR_MAX; c++)
{
- if (c < 0)
+ if (! (0 <= c && c <= CHAR_MAX))
{
- ASSERT (c_isascii (c) == c_isascii (c + NCHARS));
- ASSERT (c_isalnum (c) == c_isalnum (c + NCHARS));
- ASSERT (c_isalpha (c) == c_isalpha (c + NCHARS));
- ASSERT (c_isblank (c) == c_isblank (c + NCHARS));
- ASSERT (c_iscntrl (c) == c_iscntrl (c + NCHARS));
- ASSERT (c_isdigit (c) == c_isdigit (c + NCHARS));
- ASSERT (c_islower (c) == c_islower (c + NCHARS));
- ASSERT (c_isgraph (c) == c_isgraph (c + NCHARS));
- ASSERT (c_isprint (c) == c_isprint (c + NCHARS));
- ASSERT (c_ispunct (c) == c_ispunct (c + NCHARS));
- ASSERT (c_isspace (c) == c_isspace (c + NCHARS));
- ASSERT (c_isupper (c) == c_isupper (c + NCHARS));
- ASSERT (c_isxdigit (c) == c_isxdigit (c + NCHARS));
- ASSERT (to_char (c_tolower (c)) == to_char (c_tolower (c + NCHARS)));
- ASSERT (to_char (c_toupper (c)) == to_char (c_toupper (c + NCHARS)));
+ ASSERT (! c_isascii (c));
+ ASSERT (! c_isalnum (c));
+ ASSERT (! c_isalpha (c));
+ ASSERT (! c_isblank (c));
+ ASSERT (! c_iscntrl (c));
+ ASSERT (! c_isdigit (c));
+ ASSERT (! c_islower (c));
+ ASSERT (! c_isgraph (c));
+ ASSERT (! c_isprint (c));
+ ASSERT (! c_ispunct (c));
+ ASSERT (! c_isspace (c));
+ ASSERT (! c_isupper (c));
+ ASSERT (! c_isxdigit (c));
+ ASSERT (c_tolower (c) == c);
+ ASSERT (c_toupper (c) == c);
}
- if (0 <= c)
- n_isascii += c_isascii (c);
+ n_isascii += c_isascii (c);
+
+#ifdef C_CTYPE_ASCII
+ ASSERT (c_isascii (c) == (0 <= c && c <= 0x7f));
+#endif
ASSERT (c_isascii (c) == (c_isprint (c) || c_iscntrl (c)));
@@ -100,7 +93,7 @@ test_all (void)
ASSERT (c_isalpha (c) == (c_islower (c) || c_isupper (c)));
- switch (to_char (c))
+ switch (c)
{
case '\t': case ' ':
ASSERT (c_isblank (c) == 1);
@@ -114,9 +107,17 @@ test_all (void)
ASSERT (c_iscntrl (c) == ((c >= 0 && c < 0x20) || c == 0x7f));
#endif
+ switch (c)
+ {
+ case '\a': case '\b': case '\f': case '\n':
+ case '\r': case '\t': case '\v':
+ ASSERT (c_iscntrl (c));
+ break;
+ }
+
ASSERT (! (c_iscntrl (c) && c_isprint (c)));
- switch (to_char (c))
+ switch (c)
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -127,7 +128,7 @@ test_all (void)
break;
}
- switch (to_char (c))
+ switch (c)
{
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
@@ -135,9 +136,11 @@ test_all (void)
case 's': case 't': case 'u': case 'v': case 'w': case 'x':
case 'y': case 'z':
ASSERT (c_islower (c) == 1);
+ ASSERT (c_toupper (c) == c - 'a' + 'A');
break;
default:
ASSERT (c_islower (c) == 0);
+ ASSERT (c_toupper (c) == c);
break;
}
@@ -151,7 +154,7 @@ test_all (void)
ASSERT (c_isprint (c) == (c_isgraph (c) || c == ' '));
- switch (to_char (c))
+ switch (c)
{
case '!': case '"': case '#': case '$': case '%': case '&': case '\'':
case '(': case ')': case '*': case '+': case ',': case '-': case '.':
@@ -165,7 +168,7 @@ test_all (void)
break;
}
- switch (to_char (c))
+ switch (c)
{
case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
ASSERT (c_isspace (c) == 1);
@@ -175,7 +178,7 @@ test_all (void)
break;
}
- switch (to_char (c))
+ switch (c)
{
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
@@ -183,13 +186,15 @@ test_all (void)
case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
case 'Y': case 'Z':
ASSERT (c_isupper (c) == 1);
+ ASSERT (c_tolower (c) == c - 'A' + 'a');
break;
default:
ASSERT (c_isupper (c) == 0);
+ ASSERT (c_tolower (c) == c);
break;
}
- switch (to_char (c))
+ switch (c)
{
case '0': case '1': case '2': case '3': case '4': case '5':
case '6': case '7': case '8': case '9':
@@ -201,68 +206,6 @@ test_all (void)
ASSERT (c_isxdigit (c) == 0);
break;
}
-
- switch (to_char (c))
- {
- case 'A': ASSERT (to_char (c_tolower (c)) == 'a'); break;
- case 'B': ASSERT (to_char (c_tolower (c)) == 'b'); break;
- case 'C': ASSERT (to_char (c_tolower (c)) == 'c'); break;
- case 'D': ASSERT (to_char (c_tolower (c)) == 'd'); break;
- case 'E': ASSERT (to_char (c_tolower (c)) == 'e'); break;
- case 'F': ASSERT (to_char (c_tolower (c)) == 'f'); break;
- case 'G': ASSERT (to_char (c_tolower (c)) == 'g'); break;
- case 'H': ASSERT (to_char (c_tolower (c)) == 'h'); break;
- case 'I': ASSERT (to_char (c_tolower (c)) == 'i'); break;
- case 'J': ASSERT (to_char (c_tolower (c)) == 'j'); break;
- case 'K': ASSERT (to_char (c_tolower (c)) == 'k'); break;
- case 'L': ASSERT (to_char (c_tolower (c)) == 'l'); break;
- case 'M': ASSERT (to_char (c_tolower (c)) == 'm'); break;
- case 'N': ASSERT (to_char (c_tolower (c)) == 'n'); break;
- case 'O': ASSERT (to_char (c_tolower (c)) == 'o'); break;
- case 'P': ASSERT (to_char (c_tolower (c)) == 'p'); break;
- case 'Q': ASSERT (to_char (c_tolower (c)) == 'q'); break;
- case 'R': ASSERT (to_char (c_tolower (c)) == 'r'); break;
- case 'S': ASSERT (to_char (c_tolower (c)) == 's'); break;
- case 'T': ASSERT (to_char (c_tolower (c)) == 't'); break;
- case 'U': ASSERT (to_char (c_tolower (c)) == 'u'); break;
- case 'V': ASSERT (to_char (c_tolower (c)) == 'v'); break;
- case 'W': ASSERT (to_char (c_tolower (c)) == 'w'); break;
- case 'X': ASSERT (to_char (c_tolower (c)) == 'x'); break;
- case 'Y': ASSERT (to_char (c_tolower (c)) == 'y'); break;
- case 'Z': ASSERT (to_char (c_tolower (c)) == 'z'); break;
- default: ASSERT (c_tolower (c) == c); break;
- }
-
- switch (to_char (c))
- {
- case 'a': ASSERT (to_char (c_toupper (c)) == 'A'); break;
- case 'b': ASSERT (to_char (c_toupper (c)) == 'B'); break;
- case 'c': ASSERT (to_char (c_toupper (c)) == 'C'); break;
- case 'd': ASSERT (to_char (c_toupper (c)) == 'D'); break;
- case 'e': ASSERT (to_char (c_toupper (c)) == 'E'); break;
- case 'f': ASSERT (to_char (c_toupper (c)) == 'F'); break;
- case 'g': ASSERT (to_char (c_toupper (c)) == 'G'); break;
- case 'h': ASSERT (to_char (c_toupper (c)) == 'H'); break;
- case 'i': ASSERT (to_char (c_toupper (c)) == 'I'); break;
- case 'j': ASSERT (to_char (c_toupper (c)) == 'J'); break;
- case 'k': ASSERT (to_char (c_toupper (c)) == 'K'); break;
- case 'l': ASSERT (to_char (c_toupper (c)) == 'L'); break;
- case 'm': ASSERT (to_char (c_toupper (c)) == 'M'); break;
- case 'n': ASSERT (to_char (c_toupper (c)) == 'N'); break;
- case 'o': ASSERT (to_char (c_toupper (c)) == 'O'); break;
- case 'p': ASSERT (to_char (c_toupper (c)) == 'P'); break;
- case 'q': ASSERT (to_char (c_toupper (c)) == 'Q'); break;
- case 'r': ASSERT (to_char (c_toupper (c)) == 'R'); break;
- case 's': ASSERT (to_char (c_toupper (c)) == 'S'); break;
- case 't': ASSERT (to_char (c_toupper (c)) == 'T'); break;
- case 'u': ASSERT (to_char (c_toupper (c)) == 'U'); break;
- case 'v': ASSERT (to_char (c_toupper (c)) == 'V'); break;
- case 'w': ASSERT (to_char (c_toupper (c)) == 'W'); break;
- case 'x': ASSERT (to_char (c_toupper (c)) == 'X'); break;
- case 'y': ASSERT (to_char (c_toupper (c)) == 'Y'); break;
- case 'z': ASSERT (to_char (c_toupper (c)) == 'Z'); break;
- default: ASSERT (c_toupper (c) == c); break;
- }
}
ASSERT (n_isascii == 128);
--
2.1.0
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-27 6:59 ` Paul Eggert
@ 2015-09-28 2:09 ` Daniel Richard G.
2015-10-15 4:49 ` Daniel Richard G.
1 sibling, 0 replies; 49+ messages in thread
From: Daniel Richard G. @ 2015-09-28 2:09 UTC (permalink / raw)
To: Paul Eggert, Ben Pfaff; +Cc: bug-gnulib
On Sat, 2015 Sep 26 23:59-0700, Paul Eggert wrote:
>
> Given all the problems mentioned (including some in the proposed
> patches), let's give up on trying to support any such folks. If they
> want to build gnulib-using software on z/OS, they'll have to build
> with the default configuration in which char is unsigned. It wouldn't
> be practical for us to try to support char being signed when standard
> chars have the top bit set.
I wasn't quite sure where to draw the "not worth the trouble" line,
but I think this is defensible.
> With that in mind I installed the attached further patch, which
> simplifies the recent changes to c-ctype quite a bit.
It's a shame; I was impressed by the work you had done to support the
signed chars. But in any event, test-c-ctype in Git d2de2a91 passes in
an unsigned-char EBCDIC build on z/OS, and that will hopefully remain
the case for a good long while.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-09-27 6:59 ` Paul Eggert
2015-09-28 2:09 ` Daniel Richard G.
@ 2015-10-15 4:49 ` Daniel Richard G.
2016-08-18 0:47 ` Paul Eggert
` (2 more replies)
1 sibling, 3 replies; 49+ messages in thread
From: Daniel Richard G. @ 2015-10-15 4:49 UTC (permalink / raw)
To: Paul Eggert; +Cc: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 1633 bytes --]
Okay, I've split my changes into a set of patches, attached. These
patches are orthogonal and may be applied in any order:
gnulib-zos-ascii.patch: When in a non-ASCII environment, disable tests
that assume ASCII.
gnulib-zos-charset.patch: Added appropriately conditional #pragmas so
that the test strings in test-iconv-utf.c are correctly interpreted in
ASCII instead of EBCDIC (i.e. 'J' == 0x4A and not 0xD1). This issue
could be addressed in a more portable way by simply rewriting all the
ASCII literal characters as octal escapes, but then you would lose the
partial readability that the strings have now. Also, iconv_open() on
z/OS does not recognize "ISO-8859-1", but "ISO8859-1" works.
gnulib-zos-configure.patch: Changes to the Autoconf M4 code to support
z/OS. Note that fclose() is broken in a different way on z/OS than it is
on other systems, thus the special-case in fclose.m4.
gnulib-zos-cpp.patch: General preprocessor-level changes to
support z/OS.
gnulib-zos-errno.patch: Accommodate z/OS errno code preferences. (I
believe this should still be within spec; IBM is good at following the
letter if not the spirit of such things.)
gnulib-zos-pthread.patch: Rudimentary gl_thread support for z/OS.
gnulib-zos-regex-argname.patch: "__string" is not a good name to use as
an identifier on this system. A better fix would be to use a different
name (why not just "s"?), provided this can be pushed to upstream glibc.
gnulib-zos-strtod.patch: Address a couple of quirks in the z/OS
implementation of strtod().
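To illustrate the octal-escape alternative mentioned under gnulib-zos-charset.patch: the sketch below spells "Japanese" so that the bytes are ASCII code points whatever execution character set the compiler uses. It is not taken from the patch, and the names japanese_ascii and japanese_bytes_are_ascii are invented for the example.

```c
#include <string.h>

/* "Japanese" spelled with octal escapes, so the bytes are the ASCII
   code points regardless of the compiler's execution character set.  */
static const char japanese_ascii[] = "\112\141\160\141\156\145\163\145";

/* Returns 1 if the string above holds the expected ASCII bytes.  */
static int
japanese_bytes_are_ascii (void)
{
  static const unsigned char expected[] =
    { 0x4A, 0x61, 0x70, 0x61, 0x6E, 0x65, 0x73, 0x65 };

  return (sizeof japanese_ascii - 1 == sizeof expected
          && memcmp (japanese_ascii, expected, sizeof expected) == 0);
}
```

This needs no compiler-specific #pragma, at the cost of the readability concern noted above: the word is no longer visible in the source.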
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
[-- Attachment #2: gnulib-zos-ascii.patch --]
[-- Type: text/x-patch; name="gnulib-zos-ascii.patch", Size: 1363 bytes --]
diff --git a/tests/test-c-strcasecmp.c b/tests/test-c-strcasecmp.c
index f7f6b43..47feac8 100644
--- a/tests/test-c-strcasecmp.c
+++ b/tests/test-c-strcasecmp.c
@@ -19,6 +19,7 @@
#include <config.h>
#include "c-strcase.h"
+#include "c-ctype.h"
#include <locale.h>
#include <string.h>
@@ -57,9 +58,11 @@ main (int argc, char *argv[])
ASSERT (c_strcasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R") > 0); /* özgür */
ASSERT (c_strcasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r") < 0); /* özgür */
+#if C_CTYPE_ASCII
/* This test shows how strings of different size cannot compare equal. */
ASSERT (c_strcasecmp ("turkish", "TURK\304\260SH") < 0);
ASSERT (c_strcasecmp ("TURK\304\260SH", "turkish") > 0);
+#endif
return 0;
}
diff --git a/tests/test-wcwidth.c b/tests/test-wcwidth.c
index 9fad785..fdbecc3 100644
--- a/tests/test-wcwidth.c
+++ b/tests/test-wcwidth.c
@@ -26,6 +26,7 @@ SIGNATURE_CHECK (wcwidth, int, (wchar_t));
#include <locale.h>
#include <string.h>
+#include "c-ctype.h"
#include "localcharset.h"
#include "macros.h"
@@ -34,9 +35,11 @@ main ()
{
wchar_t wc;
+#ifdef C_CTYPE_ASCII
/* Test width of ASCII characters. */
for (wc = 0x20; wc < 0x7F; wc++)
ASSERT (wcwidth (wc) == 1);
+#endif
/* Switch to an UTF-8 locale. */
if (setlocale (LC_ALL, "fr_FR.UTF-8") != NULL
[-- Attachment #3: gnulib-zos-charset.patch --]
[-- Type: text/x-patch; name="gnulib-zos-charset.patch", Size: 9066 bytes --]
diff --git a/tests/test-iconv-utf.c b/tests/test-iconv-utf.c
index c1589f6..a769bee 100644
--- a/tests/test-iconv-utf.c
+++ b/tests/test-iconv-utf.c
@@ -27,20 +27,38 @@
#include "macros.h"
+/* If compiling on an EBCDIC system, keep the test strings in ASCII. */
+#if defined __IBMC__ && 'A' != 0x41
+# pragma convert("ISO8859-1")
+# define CONVERT_ENABLED
+#endif
+
+/* The text is "Japanese (日本語) [\U0001D50D\U0001D51E\U0001D52D]". */
+
+const char test_utf8_string[] = "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+
+const char test_utf16be_string[] = "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]";
+
+const char test_utf16le_string[] = "J\000a\000p\000a\000n\000e\000s\000e\000 \000(\000\345\145\054\147\236\212)\000 \000[\000\065\330\015\335\065\330\036\335\065\330\055\335]\000";
+
+const char test_utf32be_string[] = "\000\000\000J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\145\345\000\000\147\054\000\000\212\236\000\000\000)\000\000\000 \000\000\000[\000\001\325\015\000\001\325\036\000\001\325\055\000\000\000]";
+
+const char test_utf32le_string[] = "J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\000\345\145\000\000\054\147\000\000\236\212\000\000)\000\000\000 \000\000\000[\000\000\000\015\325\001\000\036\325\001\000\055\325\001\000]\000\000\000";
+
+#ifdef CONVERT_ENABLED
+# pragma convert(pop)
+#endif
+
int
main ()
{
#if HAVE_ICONV
/* Assume that iconv() supports at least the encoding UTF-8. */
- /* The text is "Japanese (日本語) [\U0001D50D\U0001D51E\U0001D52D]". */
-
/* Test conversion from UTF-8 to UTF-16BE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]";
+#define input test_utf8_string
+#define expected test_utf16be_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -64,14 +82,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-8 to UTF-16LE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "J\000a\000p\000a\000n\000e\000s\000e\000 \000(\000\345\145\054\147\236\212)\000 \000[\000\065\330\015\335\065\330\036\335\065\330\055\335]\000";
+#define input test_utf8_string
+#define expected test_utf16le_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -95,14 +114,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-8 to UTF-32BE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "\000\000\000J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\145\345\000\000\147\054\000\000\212\236\000\000\000)\000\000\000 \000\000\000[\000\001\325\015\000\001\325\036\000\001\325\055\000\000\000]";
+#define input test_utf8_string
+#define expected test_utf32be_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -126,14 +146,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-8 to UTF-32LE with no errors. */
{
- static const char input[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
- static const char expected[] =
- "J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\000\345\145\000\000\054\147\000\000\236\212\000\000)\000\000\000 \000\000\000[\000\000\000\015\325\001\000\036\325\001\000\055\325\001\000]\000\000\000";
+#define input test_utf8_string
+#define expected test_utf32le_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -157,14 +178,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-16BE to UTF-8 with no errors. */
{
- static const char input[] =
- "\000J\000a\000p\000a\000n\000e\000s\000e\000 \000(\145\345\147\054\212\236\000)\000 \000[\330\065\335\015\330\065\335\036\330\065\335\055\000]";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf16be_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -188,14 +210,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-16LE to UTF-8 with no errors. */
{
- static const char input[] =
- "J\000a\000p\000a\000n\000e\000s\000e\000 \000(\000\345\145\054\147\236\212)\000 \000[\000\065\330\015\335\065\330\036\335\065\330\055\335]\000";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf16le_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -219,14 +242,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-32BE to UTF-8 with no errors. */
{
- static const char input[] =
- "\000\000\000J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\145\345\000\000\147\054\000\000\212\236\000\000\000)\000\000\000 \000\000\000[\000\001\325\015\000\001\325\036\000\001\325\055\000\000\000]";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf32be_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -250,14 +274,15 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
/* Test conversion from UTF-32LE to UTF-8 with no errors. */
{
- static const char input[] =
- "J\000\000\000a\000\000\000p\000\000\000a\000\000\000n\000\000\000e\000\000\000s\000\000\000e\000\000\000 \000\000\000(\000\000\000\345\145\000\000\054\147\000\000\236\212\000\000)\000\000\000 \000\000\000[\000\000\000\015\325\001\000\036\325\001\000\055\325\001\000]\000\000\000";
- static const char expected[] =
- "Japanese (\346\227\245\346\234\254\350\252\236) [\360\235\224\215\360\235\224\236\360\235\224\255]";
+#define input test_utf32le_string
+#define expected test_utf8_string
iconv_t cd;
char buf[100];
const char *inptr;
@@ -281,6 +306,9 @@ main ()
ASSERT (memcmp (buf, expected, sizeof (expected) - 1) == 0);
ASSERT (iconv_close (cd) == 0);
+
+#undef input
+#undef expected
}
#endif
diff --git a/tests/test-iconv.c b/tests/test-iconv.c
index ed715bd..a64c6dd 100644
--- a/tests/test-iconv.c
+++ b/tests/test-iconv.c
@@ -44,8 +44,14 @@ main ()
#if HAVE_ICONV
/* Assume that iconv() supports at least the encodings ASCII, ISO-8859-1,
and UTF-8. */
- iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO-8859-1");
- iconv_t cd_utf8_to_88591 = iconv_open ("ISO-8859-1", "UTF-8");
+ iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO8859-1");
+ iconv_t cd_utf8_to_88591 = iconv_open ("ISO8859-1", "UTF-8");
+
+#if defined __MVS__ && defined __IBMC__
+ /* String literals below are in ASCII, not EBCDIC. */
+# pragma convert("ISO8859-1")
+# define CONVERT_ENABLED
+#endif
ASSERT (cd_88591_to_utf8 != (iconv_t)(-1));
ASSERT (cd_utf8_to_88591 != (iconv_t)(-1));
@@ -142,7 +148,12 @@ main ()
iconv_close (cd_88591_to_utf8);
iconv_close (cd_utf8_to_88591);
+
+#ifdef CONVERT_ENABLED
+# pragma convert(pop)
#endif
+#endif /* HAVE_ICONV */
+
return 0;
}
[-- Attachment #4: gnulib-zos-configure.patch --]
[-- Type: text/x-patch; name="gnulib-zos-configure.patch", Size: 3598 bytes --]
diff --git a/m4/fclose.m4 b/m4/fclose.m4
index 6bd1ad8..e939d30 100644
--- a/m4/fclose.m4
+++ b/m4/fclose.m4
@@ -1,4 +1,4 @@
-# fclose.m4 serial 6
+# fclose.m4 serial 7
dnl Copyright (C) 2008-2015 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -7,6 +7,7 @@ dnl with or without modifications, as long as this notice is preserved.
AC_DEFUN([gl_FUNC_FCLOSE],
[
AC_REQUIRE([gl_STDIO_H_DEFAULTS])
+ AC_REQUIRE([AC_CANONICAL_HOST])
gl_FUNC_FFLUSH_STDIN
if test $gl_cv_func_fflush_stdin != yes; then
@@ -17,4 +18,8 @@ AC_DEFUN([gl_FUNC_FCLOSE],
if test $REPLACE_CLOSE = 1; then
REPLACE_FCLOSE=1
fi
+
+ case "$host_os" in
+ openedition) REPLACE_FCLOSE=1 ;;
+ esac
])
diff --git a/m4/strstr.m4 b/m4/strstr.m4
index 040c0b9..e3e528d 100644
--- a/m4/strstr.m4
+++ b/m4/strstr.m4
@@ -1,4 +1,4 @@
-# strstr.m4 serial 16
+# strstr.m4 serial 17
dnl Copyright (C) 2008-2015 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -67,6 +67,12 @@ AC_DEFUN([gl_FUNC_STRSTR],
AC_CACHE_CHECK([whether strstr works in linear time],
[gl_cv_func_strstr_linear],
[AC_RUN_IFELSE([AC_LANG_PROGRAM([[
+#ifdef __MVS__
+/* z/OS does not deliver signals while strstr() is running (thanks to
+ restrictions on its LE runtime), which prevents us from limiting the
+ running time of this test. */
+# error "This test does not work properly on z/OS"
+#endif
#include <signal.h> /* for signal */
#include <string.h> /* for strstr */
#include <stdlib.h> /* for malloc */
diff --git a/m4/wchar_h.m4 b/m4/wchar_h.m4
index 9d1b0f8..c926c4b 100644
--- a/m4/wchar_h.m4
+++ b/m4/wchar_h.m4
@@ -7,7 +7,7 @@ dnl with or without modifications, as long as this notice is preserved.
dnl Written by Eric Blake.
-# wchar_h.m4 serial 39
+# wchar_h.m4 serial 40
AC_DEFUN([gl_WCHAR_H],
[
@@ -81,8 +81,14 @@ AC_DEFUN([gl_WCHAR_H_INLINE_OK],
extern int zero (void);
int main () { return zero(); }
]])])
+ dnl Do not rename the object file from conftest.$ac_objext to
+ dnl conftest1.$ac_objext, as this will cause the link to fail on
+ dnl z/OS when using the XPLINK object format (due to duplicate
+ dnl CSECT names). Instead, temporarily redefine $ac_compile so
+ dnl that the object file has the latter name from the start.
+ save_ac_compile="$ac_compile"
+ ac_compile=`echo "$save_ac_compile" | sed s/conftest/conftest1/`
if AC_TRY_EVAL([ac_compile]); then
- mv conftest.$ac_objext conftest1.$ac_objext
AC_LANG_CONFTEST([
AC_LANG_SOURCE([[#define wcstod renamed_wcstod
/* Tru64 with Desktop Toolkit C has a bug: <stdio.h> must be included before
@@ -95,8 +101,9 @@ int main () { return zero(); }
#include <wchar.h>
int zero (void) { return 0; }
]])])
+ dnl See note above about renaming object files.
+ ac_compile=`echo "$save_ac_compile" | sed s/conftest/conftest2/`
if AC_TRY_EVAL([ac_compile]); then
- mv conftest.$ac_objext conftest2.$ac_objext
if $CC -o conftest$ac_exeext $CFLAGS $LDFLAGS conftest1.$ac_objext conftest2.$ac_objext $LIBS >&AS_MESSAGE_LOG_FD 2>&1; then
:
else
@@ -104,6 +111,7 @@ int zero (void) { return 0; }
fi
fi
fi
+ ac_compile="$save_ac_compile"
rm -f conftest1.$ac_objext conftest2.$ac_objext conftest$ac_exeext
])
if test $gl_cv_header_wchar_h_correct_inline = no; then
[-- Attachment #5: gnulib-zos-cpp.patch --]
[-- Type: text/x-patch; name="gnulib-zos-cpp.patch", Size: 8471 bytes --]
diff --git a/lib/alloca.in.h b/lib/alloca.in.h
index d5664b6..6606984 100644
--- a/lib/alloca.in.h
+++ b/lib/alloca.in.h
@@ -51,6 +51,8 @@ extern "C"
void *_alloca (unsigned short);
# pragma intrinsic (_alloca)
# define alloca _alloca
+# elif defined __MVS__
+# include <stdlib.h>
# else
# include <stddef.h>
# ifdef __cplusplus
diff --git a/lib/fnmatch.c b/lib/fnmatch.c
index a607672..58754fa 100644
--- a/lib/fnmatch.c
+++ b/lib/fnmatch.c
@@ -22,7 +22,7 @@
# define _GNU_SOURCE 1
#endif
-#if ! defined __builtin_expect && __GNUC__ < 3
+#if ! defined __builtin_expect && defined __GNUC__ && __GNUC__ < 3
# define __builtin_expect(expr, expected) (expr)
#endif
diff --git a/lib/get-rusage-as.c b/lib/get-rusage-as.c
index 2bad20a..4db1596 100644
--- a/lib/get-rusage-as.c
+++ b/lib/get-rusage-as.c
@@ -355,7 +355,7 @@ get_rusage_as_via_iterator (void)
uintptr_t
get_rusage_as (void)
{
-#if (defined __APPLE__ && defined __MACH__) || defined _AIX || defined __CYGWIN__ /* Mac OS X, AIX, Cygwin */
+#if (defined __APPLE__ && defined __MACH__) || defined _AIX || defined __CYGWIN__ || defined __MVS__ /* Mac OS X, AIX, Cygwin, z/OS */
/* get_rusage_as_via_setrlimit() does not work.
Prefer get_rusage_as_via_iterator(). */
return get_rusage_as_via_iterator ();
diff --git a/lib/glob.c b/lib/glob.c
index ed49a9d..9fd6482 100644
--- a/lib/glob.c
+++ b/lib/glob.c
@@ -144,7 +144,9 @@
# define __stat64(fname, buf) stat (fname, buf)
# define __fxstatat64(_, d, f, st, flag) fstatat (d, f, st, flag)
# define struct_stat64 struct stat
-# define __alloca alloca
+# ifndef __MVS__
+# define __alloca alloca
+# endif
# define __readdir readdir
# define __glob_pattern_p glob_pattern_p
#endif /* _LIBC */
diff --git a/lib/math.in.h b/lib/math.in.h
index 62a089a..59293fd 100644
--- a/lib/math.in.h
+++ b/lib/math.in.h
@@ -406,6 +406,7 @@ _GL_WARN_ON_USE (ceilf, "ceilf is unportable - "
#if @GNULIB_CEIL@
# if @REPLACE_CEIL@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef ceil
# define ceil rpl_ceil
# endif
_GL_FUNCDECL_RPL (ceil, double, (double x));
@@ -753,6 +754,7 @@ _GL_WARN_ON_USE (floorf, "floorf is unportable - "
#if @GNULIB_FLOOR@
# if @REPLACE_FLOOR@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef floor
# define floor rpl_floor
# endif
_GL_FUNCDECL_RPL (floor, double, (double x));
@@ -973,6 +975,7 @@ _GL_WARN_ON_USE (frexpf, "frexpf is unportable - "
#if @GNULIB_FREXP@
# if @REPLACE_FREXP@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef frexp
# define frexp rpl_frexp
# endif
_GL_FUNCDECL_RPL (frexp, double, (double x, int *expptr) _GL_ARG_NONNULL ((2)));
@@ -1958,6 +1961,7 @@ _GL_WARN_ON_USE (tanhf, "tanhf is unportable - "
#if @GNULIB_TRUNCF@
# if @REPLACE_TRUNCF@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef truncf
# define truncf rpl_truncf
# endif
_GL_FUNCDECL_RPL (truncf, float, (float x));
@@ -1980,6 +1984,7 @@ _GL_WARN_ON_USE (truncf, "truncf is unportable - "
#if @GNULIB_TRUNC@
# if @REPLACE_TRUNC@
# if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+# undef trunc
# define trunc rpl_trunc
# endif
_GL_FUNCDECL_RPL (trunc, double, (double x));
diff --git a/lib/ptsname_r.c b/lib/ptsname_r.c
index faa33fb..809388a 100644
--- a/lib/ptsname_r.c
+++ b/lib/ptsname_r.c
@@ -34,6 +34,11 @@
# define _PATH_DEV "/dev/"
# endif
+# undef __set_errno
+# undef __stat
+# undef __ttyname_r
+# undef __ptsname_r
+
# define __set_errno(e) errno = (e)
# define __isatty isatty
# define __stat stat
diff --git a/tests/infinity.h b/tests/infinity.h
index 45c30bd..4e8a755 100644
--- a/tests/infinity.h
+++ b/tests/infinity.h
@@ -17,8 +17,9 @@
/* Infinityf () returns a 'float' +Infinity. */
-/* The Microsoft MSVC 9 compiler chokes on the expression 1.0f / 0.0f. */
-#if defined _MSC_VER
+/* The Microsoft MSVC 9 compiler chokes on the expression 1.0f / 0.0f.
+ The IBM XL C compiler on z/OS complains. */
+#if defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static float
Infinityf ()
{
@@ -32,8 +33,9 @@ Infinityf ()
/* Infinityd () returns a 'double' +Infinity. */
-/* The Microsoft MSVC 9 compiler chokes on the expression 1.0 / 0.0. */
-#if defined _MSC_VER
+/* The Microsoft MSVC 9 compiler chokes on the expression 1.0 / 0.0.
+ The IBM XL C compiler on z/OS complains. */
+#if defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static double
Infinityd ()
{
@@ -47,9 +49,10 @@ Infinityd ()
/* Infinityl () returns a 'long double' +Infinity. */
-/* The Microsoft MSVC 9 compiler chokes on the expression 1.0L / 0.0L. */
-#if defined _MSC_VER
-static double
+/* The Microsoft MSVC 9 compiler chokes on the expression 1.0L / 0.0L.
+ The IBM XL C compiler on z/OS complains. */
+#if defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
+static long double
Infinityl ()
{
static long double zero = 0.0L;
diff --git a/tests/nan.h b/tests/nan.h
index 9f6819c..10b393e 100644
--- a/tests/nan.h
+++ b/tests/nan.h
@@ -15,11 +15,18 @@
along with this program. If not, see <http://www.gnu.org/licenses/>. */
+/* IBM z/OS supports both hexadecimal and IEEE floating-point formats. The
+ former does not support NaN and its isnan() implementation returns zero
+ for all values. */
+#if defined __MVS__ && defined __IBMC__ && !defined __BFP__
+# error "NaN is not supported with IBM's hexadecimal floating-point format; please re-compile with -qfloat=ieee"
+#endif
+
/* NaNf () returns a 'float' not-a-number. */
/* The Compaq (ex-DEC) C 6.4 compiler and the Microsoft MSVC 9 compiler choke
- on the expression 0.0 / 0.0. */
-#if defined __DECC || defined _MSC_VER
+ on the expression 0.0 / 0.0. The IBM XL C compiler on z/OS complains. */
+#if defined __DECC || defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static float
NaNf ()
{
@@ -34,8 +41,8 @@ NaNf ()
/* NaNd () returns a 'double' not-a-number. */
/* The Compaq (ex-DEC) C 6.4 compiler and the Microsoft MSVC 9 compiler choke
- on the expression 0.0 / 0.0. */
-#if defined __DECC || defined _MSC_VER
+ on the expression 0.0 / 0.0. The IBM XL C compiler on z/OS complains. */
+#if defined __DECC || defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static double
NaNd ()
{
@@ -51,14 +58,15 @@ NaNd ()
/* On Irix 6.5, gcc 3.4.3 can't compute compile-time NaN, and needs the
runtime type conversion.
- The Microsoft MSVC 9 compiler chokes on the expression 0.0L / 0.0L. */
+ The Microsoft MSVC 9 compiler chokes on the expression 0.0L / 0.0L.
+ The IBM XL C compiler on z/OS complains. */
#ifdef __sgi
static long double NaNl ()
{
double zero = 0.0;
return zero / zero;
}
-#elif defined _MSC_VER
+#elif defined _MSC_VER || (defined __MVS__ && defined __IBMC__)
static long double
NaNl ()
{
diff --git a/tests/test-canonicalize-lgpl.c b/tests/test-canonicalize-lgpl.c
index 12d2bb0..49c0221 100644
--- a/tests/test-canonicalize-lgpl.c
+++ b/tests/test-canonicalize-lgpl.c
@@ -191,12 +191,16 @@ main (void)
ASSERT (result2);
ASSERT (stat ("/", &st1) == 0);
ASSERT (stat ("//", &st2) == 0);
+ /* On IBM z/OS, "/" and "//" are distinct, yet they both have
+ st_dev == st_ino == 1. */
+#ifndef __MVS__
if (SAME_INODE (st1, st2))
{
ASSERT (strcmp (result1, "/") == 0);
ASSERT (strcmp (result2, "/") == 0);
}
else
+#endif
{
ASSERT (strcmp (result1, "//") == 0);
ASSERT (strcmp (result2, "//") == 0);
diff --git a/tests/test-nonblocking-pipe.h b/tests/test-nonblocking-pipe.h
index 5b3646e..01c992c 100644
--- a/tests/test-nonblocking-pipe.h
+++ b/tests/test-nonblocking-pipe.h
@@ -31,10 +31,11 @@
OSF/1 >= 262145
Solaris <= 7 >= 10241
Solaris >= 8 >= 20481
+ z/OS >= 131073
Cygwin >= 65537
native Windows >= 4097 (depends on the _pipe argument)
*/
-#if defined __osf__ || (defined __linux__ && (defined __ia64__ || defined __mips__))
+#if defined __MVS__ || defined __osf__ || (defined __linux__ && (defined __ia64__ || defined __mips__))
# define PIPE_DATA_BLOCK_SIZE 270000
#elif defined __linux__ && defined __sparc__
# define PIPE_DATA_BLOCK_SIZE 140000
[-- Attachment #6: gnulib-zos-errno.patch --]
[-- Type: text/x-patch; name="gnulib-zos-errno.patch", Size: 1871 bytes --]
diff --git a/tests/test-nonblocking-reader.h b/tests/test-nonblocking-reader.h
index 8cba131..d8eaa32 100644
--- a/tests/test-nonblocking-reader.h
+++ b/tests/test-nonblocking-reader.h
@@ -110,7 +110,7 @@ full_read_from_nonblocking_fd (size_t fd, void *buf, size_t count)
ASSERT (spent_time < 0.5);
if (ret < 0)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
usleep (SMALL_DELAY);
}
else
diff --git a/tests/test-nonblocking-writer.h b/tests/test-nonblocking-writer.h
index 0ecf996..ff148dc 100644
--- a/tests/test-nonblocking-writer.h
+++ b/tests/test-nonblocking-writer.h
@@ -124,7 +124,7 @@ main_writer_loop (int test, size_t data_block_size, int fd,
(long) ret, dbgstrerror (ret < 0, saved_errno));
if (ret < 0 && bytes_written >= data_block_size)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
ASSERT (spent_time < 0.5);
break;
}
@@ -133,7 +133,7 @@ main_writer_loop (int test, size_t data_block_size, int fd,
ASSERT (spent_time < 0.5);
if (ret < 0)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
usleep (SMALL_DELAY);
}
else
@@ -165,7 +165,7 @@ main_writer_loop (int test, size_t data_block_size, int fd,
ASSERT (spent_time < 0.5);
if (ret < 0)
{
- ASSERT (saved_errno == EAGAIN);
+ ASSERT (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK);
usleep (SMALL_DELAY);
}
else
[-- Attachment #7: gnulib-zos-pthread.patch --]
[-- Type: text/x-patch; name="gnulib-zos-pthread.patch", Size: 1223 bytes --]
diff --git a/lib/glthread/thread.c b/lib/glthread/thread.c
index d4e2921..5923ea2 100644
--- a/lib/glthread/thread.c
+++ b/lib/glthread/thread.c
@@ -33,7 +33,7 @@
#include <pthread.h>
-#ifdef PTW32_VERSION
+#if defined PTW32_VERSION || defined __MVS__
const gl_thread_t gl_null_thread /* = { .p = NULL } */;
diff --git a/lib/glthread/thread.h b/lib/glthread/thread.h
index 2febe34..36a9521 100644
--- a/lib/glthread/thread.h
+++ b/lib/glthread/thread.h
@@ -172,6 +172,15 @@ typedef pthread_t gl_thread_t;
# define gl_thread_self_pointer() \
(pthread_in_use () ? pthread_self ().p : NULL)
extern const gl_thread_t gl_null_thread;
+# elif defined __MVS__
+ /* On IBM z/OS, pthread_t is a struct with an 8-byte '__' field.
+ The first three bytes of this field appear to uniquely identify a
+ pthread_t, though not necessarily representing a pointer. */
+# define gl_thread_self() \
+ (pthread_in_use () ? pthread_self () : gl_null_thread)
+# define gl_thread_self_pointer() \
+ (pthread_in_use () ? *((void **) pthread_self ().__) : NULL)
+extern const gl_thread_t gl_null_thread;
# else
# define gl_thread_self() \
(pthread_in_use () ? pthread_self () : (pthread_t) NULL)
[-- Attachment #8: gnulib-zos-regex-argname.patch --]
[-- Type: text/x-patch; name="gnulib-zos-regex-argname.patch", Size: 804 bytes --]
diff --git a/lib/regex.h b/lib/regex.h
index 6f3bae3..21fc00c 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -23,6 +23,12 @@
#include <sys/types.h>
+/* IBM z/OS uses -D__string=1 as an inclusion guard. */
+#if defined __MVS__ && defined(__string)
+# undef __string
+# define __string __string
+#endif
+
/* Allow the use in C++ code. */
#ifdef __cplusplus
extern "C" {
diff --git a/lib/string.in.h b/lib/string.in.h
index b3356bb..fa438a4 100644
--- a/lib/string.in.h
+++ b/lib/string.in.h
@@ -44,6 +44,12 @@
#ifndef _@GUARD_PREFIX@_STRING_H
#define _@GUARD_PREFIX@_STRING_H
+/* IBM z/OS uses -D__string=1 as an inclusion guard. */
+#if defined __MVS__ && defined(__string)
+# undef __string
+# define __string __string
+#endif
+
/* NetBSD 5.0 mis-defines NULL. */
#include <stddef.h>
[-- Attachment #9: gnulib-zos-strtod.patch --]
[-- Type: text/x-patch; name="gnulib-zos-strtod.patch", Size: 961 bytes --]
diff --git a/lib/strtod.c b/lib/strtod.c
index 9fd0170..9dc6eeb 100644
--- a/lib/strtod.c
+++ b/lib/strtod.c
@@ -239,7 +239,12 @@ strtod (const char *nptr, char **endptr)
if (*s == '0' && c_tolower (s[1]) == 'x')
{
if (! c_isxdigit (s[2 + (s[2] == '.')]))
- end = s + 1;
+ {
+ end = s + 1;
+
+ /* strtod() on z/OS returns ERANGE for "0x". */
+ errno = 0;
+ }
else if (end <= s + 2)
{
num = parse_number (s + 2, 16, 2, 4, 'p', &endbuf);
@@ -321,7 +326,7 @@ strtod (const char *nptr, char **endptr)
better to use the underlying implementation's result, since a
nice implementation populates the bits of the NaN according
to interpreting n-char-sequence as a hexadecimal number. */
- if (s != end)
+ if (s != end || num == num)
num = NAN;
errno = saved_errno;
}
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH] IBM z/OS + EBCDIC support
2015-10-15 4:49 ` Daniel Richard G.
@ 2016-08-18 0:47 ` Paul Eggert
2016-08-18 8:24 ` Daniel Richard G.
2019-12-19 4:57 ` z/OS configure triple Bruno Haible
2019-12-19 5:16 ` z/OS, iconv, and charset aliases Bruno Haible
2 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2016-08-18 0:47 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Daniel Richard G. wrote:
> Okay, I've split my changes into a set of patches, attached. These
> patches are orthogonal and may be applied in any order:
Thanks, I finally installed those into the main repository on Savannah. I had to
write ChangeLog entries, which I took from your email. I also fixed a couple of
minor glitches that I noticed.
For future patches, could you please email the output of "git format-patch"? Or
just use "git send-email". Please use the typical format for ChangeLog entries;
I did that for your first patch but ran out of time to do that for later ones.
Thanks again.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] IBM z/OS + EBCDIC support
2016-08-18 0:47 ` Paul Eggert
@ 2016-08-18 8:24 ` Daniel Richard G.
2016-08-18 8:53 ` Paul Eggert
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2016-08-18 8:24 UTC (permalink / raw)
To: Paul Eggert; +Cc: bug-gnulib
On Wed, 2016 Aug 17 17:47-0700, Paul Eggert wrote:
>
> Thanks, I finally installed those into the main repository on
> Savannah. I had to write ChangeLog entries, which I took from your
> email. I also fixed a couple of minor glitches that I noticed.
Much appreciated, Paul.
(Did you get the minor changes to test-c-strncasecmp.c and
test-sigpipe.sh? Those are the only salient ones still outstanding.)
I have a few more fixes to send in, but those are not yet ready as I am
still working with IBM on various z/OS issues exposed by the gnulib test
suite. The process has been, to say the least, frustratingly slow.
> For future patches, could you please email the output of "git format-
> patch"? Or just use "git send-email". Please use the typical format
> for ChangeLog entries; I did that for your first patch but ran out of
> time to do that for later ones.
Understood. I will keep this in mind for the next patch.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] IBM z/OS + EBCDIC support
2016-08-18 8:24 ` Daniel Richard G.
@ 2016-08-18 8:53 ` Paul Eggert
2016-08-19 8:20 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2016-08-18 8:53 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Daniel Richard G. wrote:
> (Did you get the minor changes to test-c-strncasecmp.c and
> test-sigpipe.sh? Those are the only salient ones still outstanding)
Hmm, sorry, I don't seem to have them. Could you please resend them, in 'git
format-patch' format? Thanks.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] IBM z/OS + EBCDIC support
2016-08-18 8:53 ` Paul Eggert
@ 2016-08-19 8:20 ` Daniel Richard G.
2016-08-19 11:03 ` Bruno Haible
2016-08-19 19:28 ` Paul Eggert
0 siblings, 2 replies; 49+ messages in thread
From: Daniel Richard G. @ 2016-08-19 8:20 UTC (permalink / raw)
To: Paul Eggert; +Cc: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 726 bytes --]
On Thu, 2016 Aug 18 01:53-0700, Paul Eggert wrote:
>
> Hmm, sorry, I don't seem to have them. Could you please resend them,
> in 'git format-patch' format? Thanks.
They are attached. I needed to make some minor edits on the format-patch
output, hopefully harmless. The ChangeLog-style entries are at the top,
though you'll need to add the "section" lines.
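The first change, to test-c-strncasecmp.c, disables two string comparisons
that fail in EBCDIC. (\304 is 'D' in the 1047 code page, so it case-folds
to 'd', which sorts below 'i'. In ASCII, \304 is not a letter and sorts
above 'i'.)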
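As a rough illustration of why the result changes sign, the sketch below
replays the comparison with the byte values spelled out for both
encodings, so that it runs on any host. It is a simplified stand-in, not
gnulib's c_strncasecmp(): the fold functions, the byte tables and the
function names are all assumptions made for this example. The letter
ranges are those of ASCII and of EBCDIC code page 1047.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char (*fold_fn) (unsigned char);

/* Lower-case folding for ASCII: 'A'..'Z' are 0x41..0x5A.  */
static unsigned char
ascii_tolower (unsigned char c)
{
  return (c >= 0x41 && c <= 0x5A) ? (unsigned char) (c + 0x20) : c;
}

/* Lower-case folding for EBCDIC 1047: 'A'..'I' are 0xC1..0xC9,
   'J'..'R' are 0xD1..0xD9, 'S'..'Z' are 0xE2..0xE9.  */
static unsigned char
ebcdic1047_tolower (unsigned char c)
{
  int upper = (c >= 0xC1 && c <= 0xC9)
              || (c >= 0xD1 && c <= 0xD9)
              || (c >= 0xE2 && c <= 0xE9);
  return upper ? (unsigned char) (c - 0x40) : c;
}

/* Case-insensitive comparison of at most N bytes, as unsigned chars.  */
static int
ncasecmp_bytes (const unsigned char *a, const unsigned char *b, size_t n,
                fold_fn fold)
{
  size_t i;
  assert (a != NULL && b != NULL);
  for (i = 0; i < n; i++)
    {
      unsigned char ca = fold (a[i]);
      unsigned char cb = fold (b[i]);
      if (ca != cb)
        return (int) ca - (int) cb;
      if (ca == 0)
        break;
    }
  return 0;
}

/* "turkish" and "TURK\304\260SH" as a compiler would store them in
   each encoding.  The octal escapes stay 0xC4 0xB0 in both cases.  */
static const unsigned char ascii_lhs[] =
  { 0x74, 0x75, 0x72, 0x6B, 0x69, 0x73, 0x68, 0 };
static const unsigned char ascii_rhs[] =
  { 0x54, 0x55, 0x52, 0x4B, 0xC4, 0xB0, 0x53, 0x48, 0 };
static const unsigned char ebcdic_lhs[] =
  { 0xA3, 0xA4, 0x99, 0x92, 0x89, 0xA2, 0x88, 0 };
static const unsigned char ebcdic_rhs[] =
  { 0xE3, 0xE4, 0xD9, 0xD2, 0xC4, 0xB0, 0xE2, 0xC8, 0 };

/* Negative: 'i' (0x69) is below the non-letter byte 0xC4.  */
static int
turkish_cmp_ascii (void)
{
  return ncasecmp_bytes (ascii_lhs, ascii_rhs, 7, ascii_tolower);
}

/* Positive: 0xC4 is 'D', folds to 'd' (0x84), below 'i' (0x89).  */
static int
turkish_cmp_ebcdic (void)
{
  return ncasecmp_bytes (ebcdic_lhs, ebcdic_rhs, 7, ebcdic1047_tolower);
}
```

The first mismatch is at the fifth byte in both encodings; only the
ordering of the two bytes differs, which is why the test is now guarded
rather than rewritten.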
The second change, to test-sigpipe.sh, fixes what looked like a typo.
(A goes with A, B goes with B, so what should go with C...)
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
[-- Attachment #2: 0001-Changes-for-z-OS.patch --]
[-- Type: text/x-patch; name="0001-Changes-for-z-OS.patch", Size: 1616 bytes --]
2016-08-19 Daniel Richard G. <skunk@iSKUNK.ORG>
* tests/test-c-strncasecmp.c: Allow two c_strncasecmp() calls
which assume ASCII encoding semantics to run only in ASCII
mode, as they fail in EBCDIC
* tests/test-sigpipe.sh: Fixed typo
---
tests/test-c-strncasecmp.c | 3 +++
tests/test-sigpipe.sh | 2 +-
diff --git a/tests/test-c-strncasecmp.c b/tests/test-c-strncasecmp.c
index 1ca42d8..349f6b3 100644
--- a/tests/test-c-strncasecmp.c
+++ b/tests/test-c-strncasecmp.c
@@ -19,6 +19,7 @@
#include <config.h>
#include "c-strcase.h"
+#include "c-ctype.h"
#include <locale.h>
#include <string.h>
@@ -71,9 +72,11 @@ main (int argc, char *argv[])
ASSERT (c_strncasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R", 99) > 0); /* özgür */
ASSERT (c_strncasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r", 99) < 0); /* özgür */
+#if C_CTYPE_ASCII
/* This test shows how strings of different size cannot compare equal. */
ASSERT (c_strncasecmp ("turkish", "TURK\304\260SH", 7) < 0);
ASSERT (c_strncasecmp ("TURK\304\260SH", "turkish", 7) > 0);
+#endif
return 0;
}
diff --git a/tests/test-sigpipe.sh b/tests/test-sigpipe.sh
index bc2baf2..6cf3242 100755
--- a/tests/test-sigpipe.sh
+++ b/tests/test-sigpipe.sh
@@ -21,7 +21,7 @@ fi
# Test signal's behaviour when a handler is installed.
tmpfiles="$tmpfiles t-sigpipeC.tmp"
-./test-sigpipe${EXEEXT} B 2> t-sigpipeC.tmp | head -1 > /dev/null
+./test-sigpipe${EXEEXT} C 2> t-sigpipeC.tmp | head -1 > /dev/null
if test -s t-sigpipeC.tmp; then
LC_ALL=C tr -d '\r' < t-sigpipeC.tmp
rm -fr $tmpfiles; exit 1
--
2.9.0
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH] IBM z/OS + EBCDIC support
2016-08-19 8:20 ` Daniel Richard G.
@ 2016-08-19 11:03 ` Bruno Haible
2016-08-19 19:28 ` Paul Eggert
1 sibling, 0 replies; 49+ messages in thread
From: Bruno Haible @ 2016-08-19 11:03 UTC (permalink / raw)
To: bug-gnulib; +Cc: Daniel Richard G.
Daniel Richard G. wrote:
> The second change, to test-sigpipe.sh, fixes what looked like a typo.
> (A goes with A, B goes with B, so what should go with C...)
Thanks. This change is definitely good.
Bruno
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] IBM z/OS + EBCDIC support
2016-08-19 8:20 ` Daniel Richard G.
2016-08-19 11:03 ` Bruno Haible
@ 2016-08-19 19:28 ` Paul Eggert
2016-08-19 20:38 ` Daniel Richard G.
1 sibling, 1 reply; 49+ messages in thread
From: Paul Eggert @ 2016-08-19 19:28 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Thanks, I installed those changes in your name, after reformatting the ChangeLog
file and commit message to fit Gnulib style. You can see what that format is
like by doing "git pull; git format-patch --stdout -1": the ChangeLog entry's
contents duplicate the commit message, except that they're indented a tab and
the 2nd (empty) line is omitted.
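As an illustration of the rule described above, here is a minimal C sketch
that derives the body of a ChangeLog entry from a commit message. The function
name is hypothetical and not part of Gnulib. The sketch assumes that blank
lines other than the second one are kept without a tab, and it does not
produce the date/author header line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Gnulib.
   Writes into OUT (capacity OUTSIZE) the ChangeLog body for the commit
   message MSG: each non-empty line gets a leading tab, and the second
   line is dropped when it is empty.
   Returns 0 on success, -1 if OUT is too small.  */
int
commit_message_to_changelog_body (const char *msg, char *out, size_t outsize)
{
  size_t used = 0;
  int lineno = 0;

  assert (msg != NULL && out != NULL);
  if (outsize == 0)
    return -1;

  while (*msg != '\0')
    {
      const char *eol = strchr (msg, '\n');
      size_t len = eol != NULL ? (size_t) (eol - msg) : strlen (msg);

      lineno++;
      if (!(lineno == 2 && len == 0))
        {
          /* Optional tab + text + newline; keep room for the final NUL.  */
          size_t need = (len > 0 ? 1 : 0) + len + 1;
          if (used + need + 1 > outsize)
            return -1;
          if (len > 0)
            out[used++] = '\t';
          memcpy (out + used, msg, len);
          used += len;
          out[used++] = '\n';
        }

      /* Advance past this line and its newline, if any.  */
      msg += len;
      if (*msg == '\n')
        msg++;
    }

  out[used] = '\0';
  return 0;
}
```

Passing a subject line, an empty second line, and one "* file (function): ..."
line yields two tab-indented lines with the empty line removed.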
* Re: [PATCH] IBM z/OS + EBCDIC support
2016-08-19 19:28 ` Paul Eggert
@ 2016-08-19 20:38 ` Daniel Richard G.
0 siblings, 0 replies; 49+ messages in thread
From: Daniel Richard G. @ 2016-08-19 20:38 UTC (permalink / raw)
To: Paul Eggert; +Cc: bug-gnulib
On Fri, 2016 Aug 19 12:28-0700, Paul Eggert wrote:
> Thanks, I installed those changes in your name, after reformatting the
> ChangeLog file and commit message to fit Gnulib style. You can see
> what that format is like by doing "git pull; git format-patch --stdout
> -1": the ChangeLog entry's contents duplicate the commit message,
> except that they're indented a tab and the 2nd (empty) line is
> omitted.
I see the formatting, but admit to uncertainty on the precise
wording desired.
Nevertheless, thanks for curating that and getting these changes in.
That wraps it up for my initial submission. Here's hoping the next batch
won't be terribly far off.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: z/OS configure triple
2015-10-15 4:49 ` Daniel Richard G.
2016-08-18 0:47 ` Paul Eggert
@ 2019-12-19 4:57 ` Bruno Haible
2019-12-20 0:22 ` Daniel Richard G.
2019-12-19 5:16 ` z/OS, iconv, and charset aliases Bruno Haible
2 siblings, 1 reply; 49+ messages in thread
From: Bruno Haible @ 2019-12-19 4:57 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Hi Daniel,
In <https://lists.gnu.org/archive/html/bug-gnulib/2015-10/msg00020.html>
you wrote:
> gnulib-zos-configure.patch: Changes to the Autoconf M4 code to support
> z/OS.
What is the host_os of the canonical triple in that environment?
Is it 'mvs' or 'openedition'?
I know that at the C preprocessor level, the test is 'defined __MVS__'.
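For illustration, the preprocessor-level test mentioned above could look like
the following minimal C sketch. Both function names are hypothetical; only the
`__MVS__` macro comes from this thread. The second function relies on the fact
that 'A' is 0xC1 in EBCDIC code pages and 0x41 in ASCII.

```c
/* Hypothetical helper: returns 1 when compiled for z/OS, where the
   compiler predefines __MVS__, and 0 otherwise.  */
int
compiled_for_zos (void)
{
#if defined __MVS__
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: returns 1 when the execution character set is
   EBCDIC ('A' == 0xC1), and 0 otherwise (e.g. ASCII, 'A' == 0x41).  */
int
execution_charset_is_ebcdic (void)
{
  return 'A' == 0xC1;
}
```

The two checks are independent: the first identifies the platform, the second
the character set the compiler used for character constants.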
Bruno
* Re: z/OS, iconv, and charset aliases
2015-10-15 4:49 ` Daniel Richard G.
2016-08-18 0:47 ` Paul Eggert
2019-12-19 4:57 ` z/OS configure triple Bruno Haible
@ 2019-12-19 5:16 ` Bruno Haible
2019-12-19 5:21 ` Bruno Haible
2019-12-20 4:38 ` Daniel Richard G.
2 siblings, 2 replies; 49+ messages in thread
From: Bruno Haible @ 2019-12-19 5:16 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Hi Daniel,
In <https://lists.gnu.org/archive/html/bug-gnulib/2015-10/msg00020.html>
you submitted this patch, which Paul committed on 2016-08-18:
> Also, iconv_open() on
> z/OS does not recognize "ISO-8859-1", but "ISO8859-1" works.
diff --git a/tests/test-iconv.c b/tests/test-iconv.c
index ed715bd..a64c6dd 100644
--- a/tests/test-iconv.c
+++ b/tests/test-iconv.c
@@ -44,8 +44,14 @@ main ()
#if HAVE_ICONV
/* Assume that iconv() supports at least the encodings ASCII, ISO-8859-1,
and UTF-8. */
- iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO-8859-1");
- iconv_t cd_utf8_to_88591 = iconv_open ("ISO-8859-1", "UTF-8");
+ iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO8859-1");
+ iconv_t cd_utf8_to_88591 = iconv_open ("ISO8859-1", "UTF-8");
+
This part is not right. The approach we take regarding charset/encoding
aliases is that
- locale_charset() produces canonicalized charset names
(see localcharset.h for the precise list, e.g. "UTF-8" not "UTF8",
"ISO-8859-1" not "ISO8859-1", "CP1252" not "WINDOWS-1252", etc.),
- glibc is known to support these canonicalized charset names,
- All functions are supposed to receive canonical, not system-dependent
charset names.
In particular, the gnulib iconv_open module is supposed to receive an
encoding name such as "ISO-8859-1" as argument and, on platforms which
don't understand it, pass "ISO8859-1" (or whatever the platform likes)
to the platform's iconv_open() function. The way this is done is by
adding a gperf-syntax data file to the 'iconv_open' module. To create
such a file, I would need from you the list of encoding names, as z/OS
lists them. You can also take lib/iconv_open-aix.gperf as a template.
Packages such as gettext are passing an encoding name "ISO-8859-1" to
iconv_open, and the unit test is supposed to verify that this works.
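The name translation described above can be sketched in C as follows. This is
only an illustration, not the Gnulib implementation: a linear search stands in
for the gperf-generated lookup, the function and table names are hypothetical,
and the table holds just a few of the canonical/vendor pairs from the z/OS
mapping worked out later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct alias
{
  const char *canonical;   /* name that callers pass in */
  const char *vendor;      /* name the platform's iconv_open() accepts */
};

/* A small subset of the z/OS pairs; the full list is in the thread.  */
static const struct alias zos_aliases[] =
  {
    { "ASCII",      "00367"     },
    { "ISO-8859-1", "ISO8859-1" },
    { "CP1252",     "IBM-1252"  },
    { "EUC-JP",     "EUCJP"     }
  };

/* Hypothetical helper: returns the vendor spelling of CANONICAL, or
   CANONICAL itself when the table has no entry for it.  */
const char *
vendor_encoding_name (const char *canonical)
{
  size_t i;

  assert (canonical != NULL);
  for (i = 0; i < sizeof zos_aliases / sizeof zos_aliases[0]; i++)
    if (strcmp (canonical, zos_aliases[i].canonical) == 0)
      return zos_aliases[i].vendor;
  return canonical;
}
```

A wrapper around the platform's iconv_open() would apply such a lookup to both
of its arguments, so that callers only ever deal with canonical names.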
2019-12-19 Bruno Haible <bruno@clisp.org>
iconv tests: Test canonicalized, not system-dependent, encoding names.
* tests/test-iconv.c (main): Revert part of the 2016-08-17 patch.
diff --git a/tests/test-iconv.c b/tests/test-iconv.c
index 3fe9e30..ef1e681 100644
--- a/tests/test-iconv.c
+++ b/tests/test-iconv.c
@@ -44,8 +44,8 @@ main ()
#if HAVE_ICONV
/* Assume that iconv() supports at least the encodings ASCII, ISO-8859-1,
and UTF-8. */
- iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO8859-1");
- iconv_t cd_utf8_to_88591 = iconv_open ("ISO8859-1", "UTF-8");
+ iconv_t cd_88591_to_utf8 = iconv_open ("UTF-8", "ISO-8859-1");
+ iconv_t cd_utf8_to_88591 = iconv_open ("ISO-8859-1", "UTF-8");
#if defined __MVS__ && defined __IBMC__
/* String literals below are in ASCII, not EBCDIC. */
* Re: z/OS, iconv, and charset aliases
2019-12-19 5:16 ` z/OS, iconv, and charset aliases Bruno Haible
@ 2019-12-19 5:21 ` Bruno Haible
2019-12-20 4:38 ` Daniel Richard G.
1 sibling, 0 replies; 49+ messages in thread
From: Bruno Haible @ 2019-12-19 5:21 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
> 2019-12-19 Bruno Haible <bruno@clisp.org>
>
> iconv tests: Test canonicalized, not system-dependent, encoding names.
> * tests/test-iconv.c (main): Revert part of the 2016-08-17 patch.
Addendum:
* modules/iconv-tests (Depends-on): Add iconv_open.
diff --git a/modules/iconv-tests b/modules/iconv-tests
index 91e17f0..c1709f9 100644
--- a/modules/iconv-tests
+++ b/modules/iconv-tests
@@ -4,6 +4,7 @@ tests/signature.h
tests/macros.h
Depends-on:
+iconv_open
configure.ac:
* Re: z/OS configure triple
2019-12-19 4:57 ` z/OS configure triple Bruno Haible
@ 2019-12-20 0:22 ` Daniel Richard G.
2019-12-20 6:29 ` Bruno Haible
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2019-12-20 0:22 UTC (permalink / raw)
To: Bruno Haible; +Cc: bug-gnulib
On Wed, 2019 Dec 18 23:57-05:00, Bruno Haible wrote:
> Hi Daniel,
>
> > gnulib-zos-configure.patch: Changes to the Autoconf M4 code to support
> > z/OS.
>
> What is the host_os of the canonical triple in that environment?
> Is it 'mvs' or 'openedition'?
>
> I know that at the C preprocessor level, the test is 'defined __MVS__'.
It's the latter; I haven't seen "mvs" used much by third parties, aside
from older folks at my org :)
$ /tmp/testdir/build-aux/config.guess
trap: /tmp/testdir/build-aux/config.guess 99: FSUM7327 signal number 13 not conventional
i370-ibm-openedition
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: z/OS, iconv, and charset aliases
2019-12-19 5:16 ` z/OS, iconv, and charset aliases Bruno Haible
2019-12-19 5:21 ` Bruno Haible
@ 2019-12-20 4:38 ` Daniel Richard G.
2019-12-20 8:19 ` Bruno Haible
1 sibling, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2019-12-20 4:38 UTC (permalink / raw)
To: Bruno Haible; +Cc: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 1915 bytes --]
On Thu, 2019 Dec 19 00:16-05:00, Bruno Haible wrote:
> Hi Daniel,
>
> In <https://lists.gnu.org/archive/html/bug-gnulib/2015-10/msg00020.html>
> you submitted this patch, which Paul committed on 2016-08-18:
>
> > Also, iconv_open() on
> > z/OS does not recognize "ISO-8859-1", but "ISO8859-1" works.
>
> [...]
>
> This part is not right. The approach we take regarding charset/encoding
> aliases is that
> - locale_charset() produces canonicalized charset names
> (see localcharset.h for the precise list, e.g. "UTF-8" not "UTF8",
> "ISO-8859-1" not "ISO8859-1", "CP1252" not "WINDOWS-1252", etc.),
> - glibc is known to support these canonicalized charset names,
> - All functions are supposed to receive canonical, not system-dependent
> charset names.
Understood.
> In particular, the gnulib iconv_open module is supposed to receive an
> encoding name such as "ISO-8859-1" as argument and, on platforms which
> don't understand it, pass "ISO8859-1" (or whatever the platform likes)
> to the platform's iconv_open() function. The way this is done is by
> adding a gperf-syntax data file to the 'iconv_open' module. To create
> such a file, I would need from you the list of encoding names, as z/OS
> lists them. You can also take lib/iconv_open-aix.gperf as a template.
I've attached a file with the output of "iconv -l". The names appear
consistent with what's in iconv_open-aix.gperf.
> Packages such as gettext are passing an encoding name "ISO-8859-1" to
> iconv_open, and the unit test is supposed to verify that this works.
>
> 2019-12-19 Bruno Haible <bruno@clisp.org>
>
> iconv tests: Test canonicalized, not system-dependent, encoding names.
> * tests/test-iconv.c (main): Revert part of the 2016-08-17 patch.
So *that's* why test-iconv was breaking for me again :]
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
[-- Attachment #2: zos-iconv-output.txt --]
[-- Type: text/plain, Size: 7708 bytes --]
$ iconv -l
Character sets:
37 IBM-037
256 00256
259 00259
273 IBM-273
274 IBM-274
275 IBM-275
277 IBM-277
278 IBM-278
280 IBM-280
281 IBM-281
282 IBM-282
284 IBM-284
285 IBM-285
286 00286
290 IBM-290
293 00293
297 IBM-297
300 IBM-300
301 IBM-301
367 00367
420 IBM-420
421 00421
423 00423
424 IBM-424
425 IBM-425
437 IBM-437
500 IBM-500
720 00720
737 00737
775 00775
803 00803
806 00806
808 IBM-808
813 ISO8859-7
819 ISO8859-1
833 IBM-833
834 IBM-834
835 IBM-835
836 IBM-836
837 IBM-837
838 IBM-838
848 IBM-848
849 00849
850 IBM-850
851 00851
852 IBM-852
853 00853
855 IBM-855
856 IBM-856
857 00857
858 IBM-858
859 IBM-859
860 00860
861 IBM-861
862 IBM-862
863 00863
864 IBM-864
865 00865
866 IBM-866
867 IBM-867
868 00868
869 IBM-869
870 IBM-870
871 IBM-871
872 IBM-872
874 TIS-620
875 IBM-875
876 00876
878 00878
880 IBM-880
891 00891
895 00895
896 00896
897 00897
899 00899
901 IBM-901
902 IBM-902
903 00903
904 IBM-904
905 00905
912 ISO8859-2
913 00913
914 ISO8859-4
915 ISO8859-5
916 ISO8859-8
918 00918
920 ISO8859-9
921 ISO8859-13
922 IBM-922
923 ISO8859-15
924 IBM-924
926 00926
927 IBM-927
928 IBM-928
930 IBM-930
931 00931
932 IBM-eucJC
933 IBM-933
934 00934
935 IBM-935
936 IBM-936
937 IBM-937
938 IBM-938
939 IBM-939
941 00941
942 IBM-942
943 IBM-943
944 00944
946 IBM-946
947 IBM-947
948 IBM-948
949 IBM-949
950 BIG5
951 IBM-951
952 00952
953 00953
954 00954
955 00955
956 IBM-956
957 IBM-957
958 IBM-958
959 IBM-959
960 00960
961 00961
963 00963
964 IBM-eucTW
965 00965
966 00966
970 IBM-eucKR
971 00971
1002 01002
1004 01004
1006 01006
1008 01008
1009 01009
1010 01010
1011 01011
1012 01012
1013 01013
1014 01014
1015 01015
1016 01016
1017 01017
1018 01018
1019 01019
1020 01020
1021 01021
1023 01023
1025 IBM-1025
1026 IBM-1026
1027 IBM-1027
1040 01040
1041 01041
1042 01042
1043 01043
1046 IBM-1046
1047 IBM-1047
1051 01051
1088 IBM-1088
1089 ISO8859-6
1097 01097
1098 01098
1100 01100
1101 01101
1102 01102
1103 01103
1104 01104
1105 01105
1106 01106
1107 01107
1112 IBM-1112
1114 01114
1115 IBM-1115
1122 IBM-1122
1123 IBM-1123
1124 IBM-1124
1125 IBM-1125
1126 IBM-1126
1129 01129
1130 01130
1131 01131
1132 01132
1133 01133
1137 01137
1140 IBM-1140
1141 IBM-1141
1142 IBM-1142
1143 IBM-1143
1144 IBM-1144
1145 IBM-1145
1146 IBM-1146
1147 IBM-1147
1148 IBM-1148
1149 IBM-1149
1153 IBM-1153
1154 IBM-1154
1155 IBM-1155
1156 IBM-1156
1157 IBM-1157
1158 IBM-1158
1159 IBM-1159
1160 IBM-1160
1161 IBM-1161
1162 01162
1163 01163
1164 01164
1165 IBM-1165
1166 01166
1167 01167
1168 01168
1200 01200
1202 01202
1208 UTF-8
1210 01210
1232 01232
1250 IBM-1250
1251 IBM-1251
1252 IBM-1252
1253 IBM-1253
1254 IBM-1254
1255 IBM-1255
1256 IBM-1256
1257 01257
1258 01258
1275 01275
1276 01276
1277 01277
1280 01280
1281 01281
1282 01282
1283 01283
1284 01284
1285 01285
1287 01287
1288 01288
1350 01350
1351 01351
1362 IBM-1362
1363 IBM-1363
1364 IBM-1364
1370 IBM-1370
1371 IBM-1371
1374 01374
1375 01375
1380 IBM-1380
1381 IBM-1381
1382 01382
1383 IBM-eucCN
1385 01385
1386 IBM-1386
1388 IBM-1388
1390 IBM-1390
1391 01391
1392 01392
1399 IBM-1399
4133 04133
4369 04369
4370 04370
4371 04371
4373 04373
4374 04374
4376 04376
4378 04378
4380 04380
4381 04381
4386 04386
4393 04393
4396 IBM-4396
4397 04397
4516 04516
4517 04517
4519 04519
4520 04520
4533 04533
4596 04596
4899 04899
4904 04904
4909 IBM-4909
4929 04929
4930 IBM-4930
4931 04931
4932 04932
4933 IBM-4933
4934 04934
4944 04944
4945 04945
4946 IBM-4946
4947 04947
4948 04948
4949 04949
4951 04951
4952 04952
4953 04953
4954 04954
4955 04955
4956 04956
4957 04957
4958 04958
4959 04959
4960 04960
4961 04961
4962 04962
4963 04963
4964 04964
4965 04965
4966 04966
4967 04967
4970 04970
4971 IBM-4971
4976 04976
4992 04992
4993 04993
5012 05012
5014 05014
5023 05023
5026 IBM-5026
5028 05028
5029 05029
5031 IBM-5031
5033 05033
5035 IBM-5035
5038 05038
5039 05039
5043 05043
5045 05045
5046 05046
5047 05047
5048 05048
5049 05049
5050 05050
5052 ISO-2022-JP
5053 IBM-5053
5054 IBM-5054
5055 IBM-5055
5056 05056
5067 05067
5100 05100
5104 05104
5123 IBM-5123
5137 05137
5142 05142
5143 05143
5210 05210
5211 05211
5346 IBM-5346
5347 IBM-5347
5348 IBM-5348
5349 IBM-5349
5350 IBM-5350
5351 IBM-5351
5352 IBM-5352
5353 05353
5354 05354
5470 05470
5471 05471
5472 05472
5473 05473
5476 05476
5477 05477
5478 05478
5479 05479
5486 05486
5487 05487
5488 IBM-5488
5495 05495
8229 08229
8448 08448
8482 IBM-8482
8492 08492
8493 08493
8612 08612
8629 08629
8692 08692
9025 09025
9026 09026
9027 IBM-9027
9028 09028
9030 09030
9042 09042
9044 IBM-9044
9047 09047
9048 09048
9049 09049
9056 09056
9060 09060
9061 IBM-9061
9064 09064
9066 09066
9088 09088
9089 09089
9122 09122
9124 09124
9125 09125
9127 09127
9131 09131
9139 09139
9142 09142
9144 09144
9145 09145
9146 09146
9163 09163
9238 IBM-9238
9306 09306
9444 09444
9447 09447
9448 09448
9449 09449
9572 09572
9574 09574
9575 09575
9577 09577
9580 09580
12544 12544
12588 12588
12712 IBM-12712
12725 12725
12788 12788
13121 IBM-13121
13124 IBM-13124
13125 13125
13140 13140
13143 13143
13145 13145
13152 13152
13156 13156
13157 13157
13162 13162
13184 13184
13185 13185
13218 13218
13219 13219
13221 13221
13223 13223
13235 13235
13238 13238
13240 13240
13241 13241
13242 13242
13488 UCS-2
13671 13671
13676 13676
16421 16421
16684 IBM-16684
16804 IBM-16804
16821 16821
16884 16884
17221 17221
17240 17240
17248 IBM-17248
17314 17314
17331 17331
17337 17337
17354 17354
17584 17584
20517 20517
20780 20780
20917 20917
20980 20980
21314 21314
21317 21317
21344 21344
21427 21427
21433 21433
21450 21450
21680 21680
24613 24613
24876 24876
24877 24877
25013 25013
25076 25076
25426 25426
25427 25427
25428 25428
25429 25429
25431 25431
25432 25432
25433 25433
25436 25436
25437 25437
25438 25438
25439 25439
25440 25440
25441 25441
25442 25442
25444 25444
25445 25445
25450 25450
25467 25467
25473 25473
25479 25479
25480 25480
25502 25502
25503 25503
25504 25504
25508 25508
25510 25510
25512 25512
25514 25514
25518 25518
25520 25520
25522 25522
25524 25524
25525 25525
25527 25527
25546 25546
25580 25580
25616 25616
25617 25617
25618 25618
25619 25619
25664 25664
25690 25690
25691 25691
28709 IBM-28709
29109 29109
29172 29172
29522 29522
29523 29523
29524 29524
29525 29525
29527 29527
29528 29528
29529 29529
29532 29532
29533 29533
29534 29534
29535 29535
29536 29536
29537 29537
29540 29540
29541 29541
29546 29546
29614 29614
29616 29616
29618 29618
29620 29620
29621 29621
29623 29623
29712 29712
29713 29713
29714 29714
29715 29715
29760 29760
32805 32805
33058 33058
33205 33205
33268 33268
33618 33618
33619 33619
33620 33620
33621 33621
33623 33623
33624 33624
33632 33632
33636 33636
33637 33637
33665 33665
33698 33698
33699 33699
33700 33700
33717 33717
33722 EUCJP
37301 37301
37719 37719
37728 37728
37732 37732
37761 37761
37813 37813
41397 41397
41460 41460
41824 41824
41828 41828
45493 45493
45556 45556
45920 45920
49589 49589
49652 49652
53668 IBM-53668
53685 53685
53748 53748
54189 54189
54191 IBM-54191
54289 54289
61696 61696
61697 61697
61698 61698
61699 61699
61700 61700
61710 61710
61711 61711
61712 61712
61953 61953
61956 61956
62337 62337
62381 62381
62383 IBM-62383
* Re: z/OS configure triple
2019-12-20 0:22 ` Daniel Richard G.
@ 2019-12-20 6:29 ` Bruno Haible
0 siblings, 0 replies; 49+ messages in thread
From: Bruno Haible @ 2019-12-20 6:29 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Hi Daniel,
> $ /tmp/testdir/build-aux/config.guess
> trap: /tmp/testdir/build-aux/config.guess 99: FSUM7327 signal number 13 not conventional
> i370-ibm-openedition
Thanks. So the wiki is right, and the m4/intl-thread-locale.m4 file is wrong.
2019-12-20 Bruno Haible <bruno@clisp.org>
localename, gettext: Fix host_os value for z/OS.
* m4/intl-thread-locale.m4 (gt_FUNC_USELOCALE): Fix host_os value in
cross-configuration code.
diff --git a/m4/intl-thread-locale.m4 b/m4/intl-thread-locale.m4
index f74f116..7f8817f 100644
--- a/m4/intl-thread-locale.m4
+++ b/m4/intl-thread-locale.m4
@@ -1,4 +1,4 @@
-# intl-thread-locale.m4 serial 6
+# intl-thread-locale.m4 serial 7
dnl Copyright (C) 2015-2019 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -171,8 +171,8 @@ int main ()
[gt_cv_func_uselocale_works=no],
[# Guess no on AIX and z/OS, yes otherwise.
case "$host_os" in
- aix* | mvs*) gt_cv_func_uselocale_works="guessing no" ;;
- *) gt_cv_func_uselocale_works="guessing yes" ;;
+ aix* | openedition*) gt_cv_func_uselocale_works="guessing no" ;;
+ *) gt_cv_func_uselocale_works="guessing yes" ;;
esac
])
])
* Re: z/OS, iconv, and charset aliases
2019-12-20 4:38 ` Daniel Richard G.
@ 2019-12-20 8:19 ` Bruno Haible
2019-12-20 18:23 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Bruno Haible @ 2019-12-20 8:19 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 2498 bytes --]
Hi Daniel,
> I've attached a file with the output of "iconv -l". The names appear
> consistent with what's in iconv_open-aix.gperf.
Thanks. From this, I think we can equate the following vendor names with
GNU canonical names:
Vendor name   Canonical name          References
00367         ASCII, ANSI_X3.4-1968   https://en.wikipedia.org/wiki/Code_page_367
                                      https://haible.de/bruno/charsets/conversion-tables/ASCII.html
ISO8859-1     ISO-8859-1
ISO8859-2     ISO-8859-2
ISO8859-4     ISO-8859-4
ISO8859-5     ISO-8859-5
ISO8859-6     ISO-8859-6
ISO8859-7     ISO-8859-7
ISO8859-8     ISO-8859-8
ISO8859-9     ISO-8859-9
ISO8859-13    ISO-8859-13
ISO8859-15    ISO-8859-15
IBM-437       CP437
IBM-850       CP850
IBM-852       CP852
IBM-855       CP855
IBM-856       CP856
IBM-861       CP861
IBM-862       CP862
IBM-864       CP864
IBM-866       CP866
IBM-869       CP869
TIS-620       CP874                   https://haible.de/bruno/charsets/conversion-tables/Thai.html
IBM-922       CP922
IBM-eucJC     CP932
IBM-943       CP943
IBM-949       CP949
IBM-1046      CP1046
IBM-1124      CP1124
IBM-1125      CP1125
IBM-1250      CP1250
IBM-1251      CP1251
IBM-1252      CP1252
IBM-1253      CP1253
IBM-1254      CP1254
IBM-1255      CP1255
IBM-1256      CP1256
IBM-eucCN     GB2312
EUCJP         EUC-JP
IBM-eucKR     EUC-KR
IBM-eucTW     EUC-TW
BIG5          BIG5
IBM-936       GBK
TIS-620       TIS-620
UTF-8         UTF-8
Fortunately, all encodings listed as locale encodings in
"Table 3. Supported language-territory names and LT codes for ASCII locales"
of https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbcpx01/locnamc.htm
are in this list.
Omitting identical names on both sides (e.g. BIG5 BIG5), I arrive at the two
attached patches.
2019-12-20 Bruno Haible <bruno@clisp.org>
iconv_open: Add support for z/OS encoding names.
Reported by Daniel Richard G. in
<https://lists.gnu.org/archive/html/bug-gnulib/2019-12/msg00172.html>.
* lib/iconv_open-zos.gperf: New file.
* modules/iconv_open (Files): Add iconv_open-zos.gperf.
(Makefile.am): Add rules for generating iconv_open-zos.h from it.
* lib/iconv_open.c (ICONV_FLAVOR_ZOS): New macro.
* m4/iconv_open.m4 (gl_FUNC_ICONV_OPEN): On z/OS, use ICONV_FLAVOR_ZOS.
* doc/posix-functions/iconv_open.texi: Mention z/OS.
2019-12-20 Bruno Haible <bruno@clisp.org>
localcharset: Add support for z/OS encoding names.
* lib/localcharset.h: Mention which encodings are used as locale
encodings on z/OS.
[-- Attachment #2: 0001-iconv_open-Add-support-for-z-OS-encoding-names.patch --]
[-- Type: text/x-patch, Size: 8174 bytes --]
From 49e78fcade5457b00b877fa7f7309056076a9b53 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Fri, 20 Dec 2019 09:12:37 +0100
Subject: [PATCH 1/2] iconv_open: Add support for z/OS encoding names.
Reported by Daniel Richard G. in
<https://lists.gnu.org/archive/html/bug-gnulib/2019-12/msg00172.html>.
* lib/iconv_open-zos.gperf: New file.
* modules/iconv_open (Files): Add iconv_open-zos.gperf.
(Makefile.am): Add rules for generating iconv_open-zos.h from it.
* lib/iconv_open.c (ICONV_FLAVOR_ZOS): New macro.
* m4/iconv_open.m4 (gl_FUNC_ICONV_OPEN): On z/OS, use ICONV_FLAVOR_ZOS.
* doc/posix-functions/iconv_open.texi: Mention z/OS.
---
ChangeLog | 12 +++++++
doc/posix-functions/iconv_open.texi | 2 +-
lib/iconv_open-zos.gperf | 68 +++++++++++++++++++++++++++++++++++++
lib/iconv_open.c | 1 +
m4/iconv_open.m4 | 13 +++----
modules/iconv_open | 12 ++++---
6 files changed, 97 insertions(+), 11 deletions(-)
create mode 100644 lib/iconv_open-zos.gperf
diff --git a/ChangeLog b/ChangeLog
index dece371..9d6d06b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,17 @@
2019-12-20 Bruno Haible <bruno@clisp.org>
+ iconv_open: Add support for z/OS encoding names.
+ Reported by Daniel Richard G. in
+ <https://lists.gnu.org/archive/html/bug-gnulib/2019-12/msg00172.html>.
+ * lib/iconv_open-zos.gperf: New file.
+ * modules/iconv_open (Files): Add iconv_open-zos.gperf.
+ (Makefile.am): Add rules for generating iconv_open-zos.h from it.
+ * lib/iconv_open.c (ICONV_FLAVOR_ZOS): New macro.
+ * m4/iconv_open.m4 (gl_FUNC_ICONV_OPEN): On z/OS, use ICONV_FLAVOR_ZOS.
+ * doc/posix-functions/iconv_open.texi: Mention z/OS.
+
+2019-12-20 Bruno Haible <bruno@clisp.org>
+
doc: Document the problem of the per-thread locale functions on z/OS.
* doc/posix-functions/uselocale.texi: Document the z/OS problem.
* doc/posix-functions/newlocale.texi: Likewise.
diff --git a/doc/posix-functions/iconv_open.texi b/doc/posix-functions/iconv_open.texi
index a70e1f5..d5f05ee 100644
--- a/doc/posix-functions/iconv_open.texi
+++ b/doc/posix-functions/iconv_open.texi
@@ -20,7 +20,7 @@ Portability problems fixed by Gnulib module @code{iconv_open}:
@item
This function recognizes only non-standard aliases for many encodings (not
the IANA registered encoding names) on many platforms:
-AIX 5.1, HP-UX 11, IRIX 6.5, Solaris 11 2010-11.
+AIX 5.1, HP-UX 11, IRIX 6.5, Solaris 11 2010-11, z/OS.
@end itemize
Portability problems fixed by Gnulib module @code{iconv_open-utf}:
diff --git a/lib/iconv_open-zos.gperf b/lib/iconv_open-zos.gperf
new file mode 100644
index 0000000..d44b5d7
--- /dev/null
+++ b/lib/iconv_open-zos.gperf
@@ -0,0 +1,68 @@
+/* Character set conversion.
+ Copyright (C) 2019 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License along
+ with this program; if not, see <https://www.gnu.org/licenses/>. */
+
+struct mapping { int standard_name; const char vendor_name[10 + 1]; };
+%struct-type
+%language=ANSI-C
+%define slot-name standard_name
+%define hash-function-name mapping_hash
+%define lookup-function-name mapping_lookup
+%readonly-tables
+%global-table
+%define word-array-name mappings
+%pic
+%%
+ASCII, "00367"
+ISO-8859-1, "ISO8859-1"
+ISO-8859-2, "ISO8859-2"
+ISO-8859-4, "ISO8859-4"
+ISO-8859-5, "ISO8859-5"
+ISO-8859-6, "ISO8859-6"
+ISO-8859-7, "ISO8859-7"
+ISO-8859-8, "ISO8859-8"
+ISO-8859-9, "ISO8859-9"
+ISO-8859-13, "ISO8859-13"
+ISO-8859-15, "ISO8859-15"
+CP437, "IBM-437"
+CP850, "IBM-850"
+CP852, "IBM-852"
+CP855, "IBM-855"
+CP856, "IBM-856"
+CP861, "IBM-861"
+CP862, "IBM-862"
+CP864, "IBM-864"
+CP866, "IBM-866"
+CP869, "IBM-869"
+CP874, "TIS-620"
+CP922, "IBM-922"
+CP932, "IBM-eucJC"
+CP943, "IBM-943"
+CP949, "IBM-949"
+CP1046, "IBM-1046"
+CP1124, "IBM-1124"
+CP1125, "IBM-1125"
+CP1250, "IBM-1250"
+CP1251, "IBM-1251"
+CP1252, "IBM-1252"
+CP1253, "IBM-1253"
+CP1254, "IBM-1254"
+CP1255, "IBM-1255"
+CP1256, "IBM-1256"
+GB2312, "IBM-eucCN"
+EUC-JP, "EUCJP"
+EUC-KR, "IBM-eucKR"
+EUC-TW, "IBM-eucTW"
+GBK, "IBM-936"
diff --git a/lib/iconv_open.c b/lib/iconv_open.c
index 928ccf2..918b89c 100644
--- a/lib/iconv_open.c
+++ b/lib/iconv_open.c
@@ -36,6 +36,7 @@
#define ICONV_FLAVOR_IRIX "iconv_open-irix.h"
#define ICONV_FLAVOR_OSF "iconv_open-osf.h"
#define ICONV_FLAVOR_SOLARIS "iconv_open-solaris.h"
+#define ICONV_FLAVOR_ZOS "iconv_open-zos.h"
#ifdef ICONV_FLAVOR
# include ICONV_FLAVOR
diff --git a/m4/iconv_open.m4 b/m4/iconv_open.m4
index bfcd354..b4730a9 100644
--- a/m4/iconv_open.m4
+++ b/m4/iconv_open.m4
@@ -1,4 +1,4 @@
-# iconv_open.m4 serial 15
+# iconv_open.m4 serial 16
dnl Copyright (C) 2007-2019 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -23,11 +23,12 @@ AC_DEFUN([gl_FUNC_ICONV_OPEN],
if test $gl_func_iconv_gnu = no; then
iconv_flavor=
case "$host_os" in
- aix*) iconv_flavor=ICONV_FLAVOR_AIX ;;
- irix*) iconv_flavor=ICONV_FLAVOR_IRIX ;;
- hpux*) iconv_flavor=ICONV_FLAVOR_HPUX ;;
- osf*) iconv_flavor=ICONV_FLAVOR_OSF ;;
- solaris*) iconv_flavor=ICONV_FLAVOR_SOLARIS ;;
+ aix*) iconv_flavor=ICONV_FLAVOR_AIX ;;
+ irix*) iconv_flavor=ICONV_FLAVOR_IRIX ;;
+ hpux*) iconv_flavor=ICONV_FLAVOR_HPUX ;;
+ osf*) iconv_flavor=ICONV_FLAVOR_OSF ;;
+ solaris*) iconv_flavor=ICONV_FLAVOR_SOLARIS ;;
+ openedition*) iconv_flavor=ICONV_FLAVOR_ZOS ;;
esac
if test -n "$iconv_flavor"; then
AC_DEFINE_UNQUOTED([ICONV_FLAVOR], [$iconv_flavor],
diff --git a/modules/iconv_open b/modules/iconv_open
index 7032dca..3486901 100644
--- a/modules/iconv_open
+++ b/modules/iconv_open
@@ -8,6 +8,7 @@ lib/iconv_open-hpux.gperf
lib/iconv_open-irix.gperf
lib/iconv_open-osf.gperf
lib/iconv_open-solaris.gperf
+lib/iconv_open-zos.gperf
lib/iconv.c
lib/iconv_close.c
m4/iconv_open.m4
@@ -48,10 +49,13 @@ $(srcdir)/iconv_open-osf.h: $(srcdir)/iconv_open-osf.gperf
$(srcdir)/iconv_open-solaris.h: $(srcdir)/iconv_open-solaris.gperf
$(V_GPERF)$(GPERF) -m 10 $(srcdir)/iconv_open-solaris.gperf > $(srcdir)/iconv_open-solaris.h-t && \
mv $(srcdir)/iconv_open-solaris.h-t $(srcdir)/iconv_open-solaris.h
-BUILT_SOURCES += iconv_open-aix.h iconv_open-hpux.h iconv_open-irix.h iconv_open-osf.h iconv_open-solaris.h
-MOSTLYCLEANFILES += iconv_open-aix.h-t iconv_open-hpux.h-t iconv_open-irix.h-t iconv_open-osf.h-t iconv_open-solaris.h-t
-MAINTAINERCLEANFILES += iconv_open-aix.h iconv_open-hpux.h iconv_open-irix.h iconv_open-osf.h iconv_open-solaris.h
-EXTRA_DIST += iconv_open-aix.h iconv_open-hpux.h iconv_open-irix.h iconv_open-osf.h iconv_open-solaris.h
+$(srcdir)/iconv_open-zos.h: $(srcdir)/iconv_open-zos.gperf
+ $(V_GPERF)$(GPERF) -m 10 $(srcdir)/iconv_open-zos.gperf > $(srcdir)/iconv_open-zos.h-t && \
+ mv $(srcdir)/iconv_open-zos.h-t $(srcdir)/iconv_open-zos.h
+BUILT_SOURCES += iconv_open-aix.h iconv_open-hpux.h iconv_open-irix.h iconv_open-osf.h iconv_open-solaris.h iconv_open-zos.h
+MOSTLYCLEANFILES += iconv_open-aix.h-t iconv_open-hpux.h-t iconv_open-irix.h-t iconv_open-osf.h-t iconv_open-solaris.h-t iconv_open-zos.h-t
+MAINTAINERCLEANFILES += iconv_open-aix.h iconv_open-hpux.h iconv_open-irix.h iconv_open-osf.h iconv_open-solaris.h iconv_open-zos.h
+EXTRA_DIST += iconv_open-aix.h iconv_open-hpux.h iconv_open-irix.h iconv_open-osf.h iconv_open-solaris.h iconv_open-zos.h
Include:
<iconv.h>
--
2.7.4
[-- Attachment #3: 0002-localcharset-Add-support-for-z-OS-encoding-names.patch --]
[-- Type: text/x-patch, Size: 5308 bytes --]
From 3f7d8da2ee9e513a9db318dc9c4aa91ca6ed8b3b Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Fri, 20 Dec 2019 09:17:20 +0100
Subject: [PATCH 2/2] localcharset: Add support for z/OS encoding names.
* lib/localcharset.h: Mention which encodings are used as locale
encodings on z/OS.
---
ChangeLog | 6 ++++++
lib/localcharset.h | 24 ++++++++++++------------
2 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 9d6d06b..9a47dbc 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
2019-12-20 Bruno Haible <bruno@clisp.org>
+ localcharset: Add support for z/OS encoding names.
+ * lib/localcharset.h: Mention which encodings are used as locale
+ encodings on z/OS.
+
+2019-12-20 Bruno Haible <bruno@clisp.org>
+
iconv_open: Add support for z/OS encoding names.
Reported by Daniel Richard G. in
<https://lists.gnu.org/archive/html/bug-gnulib/2019-12/msg00172.html>.
diff --git a/lib/localcharset.h b/lib/localcharset.h
index 5897140..81ebfae 100644
--- a/lib/localcharset.h
+++ b/lib/localcharset.h
@@ -48,15 +48,15 @@ extern const char * locale_charset (void);
(darwin = Mac OS X, windows = native Windows)
ASCII, ANSI_X3.4-1968 glibc solaris freebsd netbsd darwin minix cygwin
- ISO-8859-1 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin
- ISO-8859-2 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin
+ ISO-8859-1 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin zos
+ ISO-8859-2 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin zos
ISO-8859-3 Y glibc solaris cygwin
ISO-8859-4 Y hpux osf solaris freebsd netbsd openbsd darwin
- ISO-8859-5 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin
+ ISO-8859-5 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin zos
ISO-8859-6 Y glibc aix hpux solaris cygwin
- ISO-8859-7 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin
- ISO-8859-8 Y glibc aix hpux osf solaris cygwin
- ISO-8859-9 Y glibc aix hpux irix osf solaris freebsd darwin cygwin
+ ISO-8859-7 Y glibc aix hpux irix osf solaris freebsd netbsd openbsd darwin cygwin zos
+ ISO-8859-8 Y glibc aix hpux osf solaris cygwin zos
+ ISO-8859-9 Y glibc aix hpux irix osf solaris freebsd darwin cygwin zos
ISO-8859-13 glibc hpux solaris freebsd netbsd openbsd darwin cygwin
ISO-8859-14 glibc cygwin
ISO-8859-15 glibc aix irix osf solaris freebsd netbsd openbsd darwin cygwin
@@ -79,7 +79,7 @@ extern const char * locale_charset (void);
CP874 windows dos
CP922 aix
CP932 aix cygwin windows dos
- CP943 aix
+ CP943 aix zos
CP949 osf darwin windows dos
CP950 windows dos
CP1046 aix
@@ -95,17 +95,17 @@ extern const char * locale_charset (void);
CP1255 glibc windows
CP1256 windows
CP1257 windows
- GB2312 Y glibc aix hpux irix solaris freebsd netbsd darwin cygwin
+ GB2312 Y glibc aix hpux irix solaris freebsd netbsd darwin cygwin zos
EUC-JP Y glibc aix hpux irix osf solaris freebsd netbsd darwin cygwin
- EUC-KR Y glibc aix hpux irix osf solaris freebsd netbsd darwin cygwin
+ EUC-KR Y glibc aix hpux irix osf solaris freebsd netbsd darwin cygwin zos
EUC-TW glibc aix hpux irix osf solaris netbsd
- BIG5 Y glibc aix hpux osf solaris freebsd netbsd darwin cygwin
+ BIG5 Y glibc aix hpux osf solaris freebsd netbsd darwin cygwin zos
BIG5-HKSCS glibc hpux solaris netbsd darwin
GBK glibc aix osf solaris freebsd darwin cygwin windows dos
GB18030 glibc hpux solaris freebsd netbsd darwin
SHIFT_JIS Y hpux osf solaris freebsd netbsd darwin
JOHAB glibc solaris windows
- TIS-620 glibc aix hpux osf solaris cygwin
+ TIS-620 glibc aix hpux osf solaris cygwin zos
VISCII Y glibc
TCVN5712-1 glibc
ARMSCII-8 glibc freebsd netbsd darwin
@@ -119,7 +119,7 @@ extern const char * locale_charset (void);
HP-KANA8 hpux
DEC-KANJI osf
DEC-HANYU osf
- UTF-8 Y glibc aix hpux osf solaris netbsd darwin cygwin
+ UTF-8 Y glibc aix hpux osf solaris netbsd darwin cygwin zos
Note: Names which are not marked as being a MIME name should not be used in
Internet protocols for information interchange (mail, news, etc.).
--
2.7.4
* Re: z/OS, iconv, and charset aliases
2019-12-20 8:19 ` Bruno Haible
@ 2019-12-20 18:23 ` Daniel Richard G.
2019-12-21 5:49 ` z/OS, iconv, and gperf Bruno Haible
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2019-12-20 18:23 UTC (permalink / raw)
To: Bruno Haible; +Cc: bug-gnulib
On Fri, 2019 Dec 20 03:19-05:00, Bruno Haible wrote:
>
> Thanks. From this, I think we can equate the following vendor names
> with GNU canonical names:
Is there a good test to ensure that the conversions are as expected? I
wouldn't put it past IBM to use a strange variant of some of these otherwise-
familiar encodings...
> Omitting identical names on both sides (e.g. BIG5 BIG5), I arrive at
> the two attached patches.
Git 3f7d8da2 gives me this build error:
make[3]: Entering directory `/tmp/gnulib-build/gllib'
source='/tmp/testdir/gllib/iconv_open.c' object='iconv_open.o' libtool=no \
DEPDIR=.deps depmode=aix /bin/sh /tmp/testdir/build-aux/depcomp \
xlc-wrap -DHAVE_CONFIG_H -I. -I/tmp/testdir/gllib -I.. -DGNULIB_STRICT_CHECKING=1 -D_UNIX95_THREADS -D_XOPEN_SOURCE=600 -DNSIG=39 -qhaltonmsg=CCN3296 -g -q64 -qfloat=ieee -qlanglvl=extc99 -qenumsize=4 -c -o iconv_open.o /tmp/testdir/gllib/iconv_open.c
ERROR CCN3205 /tmp/testdir/gllib/iconv_open-zos.h:29 "gperf generated tables don't work with this execution character set. Please report a bug to <bug-gperf@gnu.org>."
CCN0793(I) Compilation failed for file /tmp/testdir/gllib/iconv_open.c. Object file not created.
make[3]: *** [iconv_open.o] Error 12
Normally, everything builds using EBCDIC on this system. (There are ways
of compiling ASCII source, but that's not the usual way of working.)
There isn't a way to compile gperf tables in an encoding-agnostic
manner?
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: z/OS, iconv, and gperf
2019-12-20 18:23 ` Daniel Richard G.
@ 2019-12-21 5:49 ` Bruno Haible
2020-01-09 5:48 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Bruno Haible @ 2019-12-21 5:49 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Hi Daniel,
> > Thanks. From this, I think we can equate the following vendor names
> > with GNU canonical names:
>
> Is there a good test to ensure that the conversions are as expected? I
> wouldn't put it past IBM to use a strange variant of some of these otherwise-
> familiar encodings...
Oh, certainly many of the IBM-nnn encodings are variants of what Microsoft
and the rest of the world do regarding codepage nnn. Find an extensive
comparison at https://haible.de/bruno/charsets/conversion-tables/index.html .
You find the tools to extract the conversion tables and compare them here:
https://haible.de/bruno/charsets/conversion-tables/tools.html
> > Omitting identical names on both sides (e.g. BIG5 BIG5), I arrive at
> > the two attached patches.
>
> Git 3f7d8da2 gives me this build error:
>
> make[3]: Entering directory `/tmp/gnulib-build/gllib'
> source='/tmp/testdir/gllib/iconv_open.c' object='iconv_open.o' libtool=no \
> DEPDIR=.deps depmode=aix /bin/sh /tmp/testdir/build-aux/depcomp \
> xlc-wrap -DHAVE_CONFIG_H -I. -I/tmp/testdir/gllib -I.. -DGNULIB_STRICT_CHECKING=1 -D_UNIX95_THREADS -D_XOPEN_SOURCE=600 -DNSIG=39 -qhaltonmsg=CCN3296 -g -q64 -qfloat=ieee -qlanglvl=extc99 -qenumsize=4 -c -o iconv_open.o /tmp/testdir/gllib/iconv_open.c
> ERROR CCN3205 /tmp/testdir/gllib/iconv_open-zos.h:29 "gperf generated tables don't work with this execution character set. Please report a bug to <bug-gperf@gnu.org>."
> CCN0793(I) Compilation failed for file /tmp/testdir/gllib/iconv_open.c. Object file not created.
> make[3]: *** [iconv_open.o] Error 12
>
> Normally, everything builds using EBCDIC on this system. (There are ways
> of compiling ASCII source, but that's not the usual way of working.)
>
> There isn't a way to compile gperf tables in an encoding-agnostic
> manner?
No. gperf works by using character values as indices into arrays; the
arrays are filled by gperf at code generation time.
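To make this concrete, here is a minimal sketch of a lookup in the style of
gperf output. It is not actual gperf output: the two-entry table, the names
asso_values, wordlist and lookup, and the keyword "BAZ" are hypothetical and
chosen for illustration only. The association values sit at the ASCII codes
of 'B' and 'F', so the same name spelled in EBCDIC bytes indexes an empty
slot and is not found.

```c
#include <stddef.h>
#include <string.h>

struct mapping { const char *standard_name; const char *vendor_name; };

/* Hypothetical association table, fixed at code generation time.
   Only the ASCII codes of 'B' (0x42) and 'F' (0x46) are populated;
   every other byte value maps to slot 0, which holds no keyword. */
static const unsigned char asso_values[256] = { [0x42] = 1, [0x46] = 2 };

/* Keywords are spelled as explicit ASCII bytes so that the sketch does
   not depend on the compiler's execution character set. */
static const struct mapping wordlist[3] = {
  { NULL, NULL },                 /* slot 0: no keyword */
  { "\x42\x41\x5A", "QUX" },      /* "BAZ" in ASCII */
  { "\x46\x4F\x4F", "BAR" },      /* "FOO" in ASCII */
};

/* Hash on the first byte, then confirm with a full comparison. */
static const struct mapping *lookup (const char *str)
{
  unsigned int key = asso_values[(unsigned char) str[0]];
  const char *s = wordlist[key].standard_name;
  if (s != NULL && strcmp (str, s) == 0)
    return &wordlist[key];
  return NULL;
}
```

With this table, the ASCII bytes 0x46 0x4F 0x4F ("FOO") are found, while the
EBCDIC spelling of the same name, 0xC6 0xD6 0xD6, is rejected because 0xC6
was never assigned an association value.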
Can you experiment with the pragmas to resolve this? For this, you
best take the gperf source distribution, remove the part that emits
the error message in gperf/src/output.cc:2103, and then work with
"make check" to get things working.
Bruno
* Re: z/OS, iconv, and gperf
2019-12-21 5:49 ` z/OS, iconv, and gperf Bruno Haible
@ 2020-01-09 5:48 ` Daniel Richard G.
2020-01-19 21:52 ` Bruno Haible
2020-01-19 21:59 ` Bruno Haible
0 siblings, 2 replies; 49+ messages in thread
From: Daniel Richard G. @ 2020-01-09 5:48 UTC (permalink / raw)
To: Bruno Haible; +Cc: bug-gnulib
On Sat, 2019 Dec 21 00:49-05:00, Bruno Haible wrote:
>
> Oh, certainly many of the IBM-nnn encodings are variants of what
> Microsoft and the rest of the world do regarding codepage nnn. Find an
> extensive comparison at
> https://haible.de/bruno/charsets/conversion-tables/index.html .
>
> You find the tools to extract the conversion tables and compare
> them here:
> https://haible.de/bruno/charsets/conversion-tables/tools.html
I downloaded the tools, and gave them a try. I will discuss sending you
the resulting information in a private message, as it is fairly large.
> > There isn't a way to compile gperf tables in an encoding-agnostic
> > manner?
>
> No. gperf works by using character values as indices into arrays; the
> arrays are filled by gperf at code generation time.
>
> Can you experiment with the pragmas to resolve this? For this, you
> best take the gperf source distribution, remove the part that emits
> the error message in gperf/src/output.cc:2103, and then work with
> "make check" to get things working.
What is the intended outcome, however? There are pragmas to change the
encoding assumed by the compiler in character/string literals, but if
that is set to ASCII, then the compiled code will also assume ASCII
input, which would typically not be the case on this system.
I suppose in theory, gperf could be given an option to generate
code that expects EBCDIC instead of ASCII, and that source could be
used on this system. However, gperf has no such encoding-related
option, probably because anything other than ASCII is too niche for
their purposes.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: z/OS, iconv, and gperf
2020-01-09 5:48 ` Daniel Richard G.
@ 2020-01-19 21:52 ` Bruno Haible
2020-01-19 21:59 ` Bruno Haible
1 sibling, 0 replies; 49+ messages in thread
From: Bruno Haible @ 2020-01-19 21:52 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Hi Daniel,
> > Oh, certainly many of the IBM-nnn encodings are variants of what
> > Microsoft and the rest of the world do regarding codepage nnn. Find an
> > extensive comparison at
> > https://haible.de/bruno/charsets/conversion-tables/index.html .
> >
> > You find the tools to extract the conversion tables and compare
> > them here:
> > https://haible.de/bruno/charsets/conversion-tables/tools.html
>
> I downloaded the tools, and gave them a try. I will discuss sending you
> the resulting information in a private message, as it is fairly large.
Thank you. With this information, I updated the charsets comparison
site at https://haible.de/bruno/charsets/conversion-tables/ . It turns
out that z/OS has a couple of encodings under names that we did not
guess. Also, for some encodings a non-intuitive encoding name is closer
to what one would expect. For example, "04962" is better than "IBM-866"
(see https://haible.de/bruno/charsets/conversion-tables/CP866.html).
Also, for EUC-TW there is no really suitable z/OS encoding; "IBM-eucTW"
differs too much from the standard (as measured by 'table-diff').
2020-01-19 Bruno Haible <bruno@clisp.org>
iconv_open: Improve z/OS support.
* lib/iconv_open-zos.gperf: Choose better aliases. Add mapping for
ISO-8859-3, KOI8-R, KOI8-U, CP775, CP857, CP865, CP1129, CP1131, CP1257.
Remove mapping for EUC-TW.
diff --git a/lib/iconv_open-zos.gperf b/lib/iconv_open-zos.gperf
index 00e696e..918fdb9 100644
--- a/lib/iconv_open-zos.gperf
+++ b/lib/iconv_open-zos.gperf
@@ -28,41 +28,49 @@ struct mapping { int standard_name; const char vendor_name[10 + 1]; };
ASCII, "00367"
ISO-8859-1, "ISO8859-1"
ISO-8859-2, "ISO8859-2"
+ISO-8859-3, "00913"
ISO-8859-4, "ISO8859-4"
ISO-8859-5, "ISO8859-5"
ISO-8859-6, "ISO8859-6"
ISO-8859-7, "ISO8859-7"
-ISO-8859-8, "ISO8859-8"
+ISO-8859-8, "05012"
ISO-8859-9, "ISO8859-9"
ISO-8859-13, "ISO8859-13"
ISO-8859-15, "ISO8859-15"
+KOI8-R, "00878"
+KOI8-U, "01168"
CP437, "IBM-437"
-CP850, "IBM-850"
+CP775, "00775"
+CP850, "09042"
CP852, "IBM-852"
-CP855, "IBM-855"
+CP855, "13143"
CP856, "IBM-856"
+CP857, "00857"
CP861, "IBM-861"
CP862, "IBM-862"
CP864, "IBM-864"
-CP866, "IBM-866"
+CP865, "00865"
+CP866, "04962"
CP869, "IBM-869"
CP874, "TIS-620"
CP922, "IBM-922"
-CP932, "IBM-eucJC"
+CP932, "IBM-943"
CP943, "IBM-943"
-CP949, "IBM-949"
+CP949, "IBM-1363"
CP1046, "IBM-1046"
CP1124, "IBM-1124"
CP1125, "IBM-1125"
-CP1250, "IBM-1250"
-CP1251, "IBM-1251"
-CP1252, "IBM-1252"
-CP1253, "IBM-1253"
-CP1254, "IBM-1254"
-CP1255, "IBM-1255"
-CP1256, "IBM-1256"
+CP1129, "01129"
+CP1131, "01131"
+CP1250, "IBM-5346"
+CP1251, "IBM-5347"
+CP1252, "IBM-5348"
+CP1253, "IBM-5349"
+CP1254, "IBM-5350"
+CP1255, "09447"
+CP1256, "09448"
+CP1257, "09449"
GB2312, "IBM-eucCN"
-EUC-JP, "EUCJP"
+EUC-JP, "01350"
EUC-KR, "IBM-eucKR"
-EUC-TW, "IBM-eucTW"
-GBK, "IBM-936"
+GBK, "IBM-1386"
* Re: z/OS, iconv, and gperf
2020-01-09 5:48 ` Daniel Richard G.
2020-01-19 21:52 ` Bruno Haible
@ 2020-01-19 21:59 ` Bruno Haible
2020-01-19 22:32 ` Daniel Richard G.
1 sibling, 1 reply; 49+ messages in thread
From: Bruno Haible @ 2020-01-19 21:59 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Hi Daniel,
> > > There isn't a way to compile gperf tables in an encoding-agnostic
> > > manner?
> >
> > No. gperf works by using character values as indices into arrays; the
> > arrays are filled by gperf at code generation time.
> >
> > Can you experiment with the pragmas to resolve this? For this, you
> > best take the gperf source distribution, remove the part that emits
> > the error message in gperf/src/output.cc:2103, and then work with
> > "make check" to get things working.
>
> What is the intended outcome, however? There are pragmas to change the
> encoding assumed by the compiler in character/string literals, but if
> that is set to ASCII, then the compiled code will also assume ASCII
> input, which would typically not be the case on this system.
>
> I suppose in theory, gperf could be given an option to generate
> code that expects EBCDIC instead of ASCII, and that source could be
> used on this system. However, gperf has no such encoding-related
> option, probably because anything other than ASCII is too niche for
> their purposes.
The intended outcome is that a gperf-generated mapping function, say,
for
FOO, "BAR"
performs equivalently to
if (strcmp (arg, "FOO") == 0)
return "BAR";
just faster. Can you find the suitable compiler settings and #pragmas
- inside and outside the gperf-generated code - to make this happen?
In theory, it would be possible to introduce an option to gperf that
makes it generate ASCII- and EBCDIC-based tables in the same output
file, and let the compiler pick the right one at compile-time. But
this is a *lot* of work. Therefore, if you can get the same result
through compiler settings and #pragmas, that will be the way to go.
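As a rough sketch of what such dual-table output could look like (this is
not an existing gperf feature; the table contents and the name map_name are
hypothetical), the generated file would carry one table per character set
and select between them with a preprocessor test on 'A'. This assumes the
preprocessor evaluates character constants in the execution character set,
which the C standard leaves implementation-defined.

```c
#include <stddef.h>
#include <string.h>

#if 'A' == 0x41
/* ASCII-based execution character set: 'F' is 0x46. */
static const unsigned char asso_values[256] = { [0x46] = 1 };
#elif 'A' == 0xC1
/* EBCDIC-based execution character set: 'F' is 0xC6. */
static const unsigned char asso_values[256] = { [0xC6] = 1 };
#else
# error "unknown execution character set"
#endif

/* Slot 0 is empty; slot 1 maps FOO to BAR.  The literals are compiled
   in whatever execution character set the compiler uses. */
static const char *const wordlist[2][2] = {
  { NULL, NULL },
  { "FOO", "BAR" },
};

/* Intended to behave like
     if (strcmp (arg, "FOO") == 0) return "BAR";
   with a table lookup in front of the comparison. */
static const char *map_name (const char *arg)
{
  unsigned int key = asso_values[(unsigned char) arg[0]];
  if (wordlist[key][0] != NULL && strcmp (arg, wordlist[key][0]) == 0)
    return wordlist[key][1];
  return NULL;
}
```

Because both the table index and the string literals follow the execution
character set, a caller passing an ordinary literal such as "FOO" gets the
same answer on either kind of host.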
Bruno
* Re: z/OS, iconv, and gperf
2020-01-19 21:59 ` Bruno Haible
@ 2020-01-19 22:32 ` Daniel Richard G.
2020-01-20 0:13 ` Bruno Haible
0 siblings, 1 reply; 49+ messages in thread
From: Daniel Richard G. @ 2020-01-19 22:32 UTC (permalink / raw)
To: Bruno Haible; +Cc: bug-gnulib
On Sun, 2020 Jan 19 16:59-05:00, Bruno Haible wrote:
>
> The intended outcome is that a gperf-generated mapping function, say,
> for
> FOO, "BAR"
> performs equivalently to
> if (strcmp (arg, "FOO") == 0)
> return "BAR";
> just faster. Can you find the suitable compiler settings and #pragmas
> - inside and outside the gperf-generated code - to make this happen?
But what good will that do, if the (ASCII-consuming) gperf code receives
e.g. the EBCDIC form of "ISO-8859-1"? "#pragma convert" only works at
compile time, not run time.
In order for a literal string in a user program to get passed in as
ASCII, the user program itself would also need to be compiled with this
#pragma surrounding the string. (I don't think that is a reasonable thing
to ask of user programs, and it doesn't address non-literal strings in
any event.)
There would need to be an additional step of explicit run-time EBCDIC-to-
ASCII conversion of the input in order for ASCII-based gperf code to
work on this platform. Is that feasible?
> In theory, it would be possible to introduce an option to gperf that
> makes it generate ASCII- and EBCDIC-based tables in the same output
> file, and let the compiler pick the right one at compile-time. But
> this is a *lot* of work. Therefore, if you can get the same result
> through compiler settings and #pragmas, that will be the way to go.
It's not the same result. Only the former would allow "char c = 0xC1" to
be recognized as a letter "A". The latter just makes 'A' == 0x41.
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
* Re: z/OS, iconv, and gperf
2020-01-19 22:32 ` Daniel Richard G.
@ 2020-01-20 0:13 ` Bruno Haible
2020-01-22 6:38 ` Daniel Richard G.
0 siblings, 1 reply; 49+ messages in thread
From: Bruno Haible @ 2020-01-20 0:13 UTC (permalink / raw)
To: Daniel Richard G.; +Cc: bug-gnulib
Daniel Richard G. wrote:
> But what good will that do, if the (ASCII-consuming) gperf code receives
> e.g. the EBCDIC form of "ISO-8859-1"? "#pragma convert" only works at
> compile time, not run time.
OK, then we'll need
a) for the short-term: in lib/iconv_open.c, apply an EBCDIC -> ASCII
conversion to the 'from' and the 'to' strings. Can you implement that?
And also a rule that removes the anti-EBCDIC guard from the gperf
generated output (in modules/iconv_open).
b) a feature request for the 'gperf' program, to generate two code
bodies, one for ASCII and one for EBCDIC.
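For (a), the conversion step could look roughly like the sketch below. It is
a portable stand-in written for illustration: the function names
ebcdic_to_ascii_byte and name_etoa are hypothetical, the byte values assume
the IBM-1047 layout, and only upper-case letters, digits, '-' and '_' are
handled, on the assumption that the names have already been upper-cased
before the lookup. A real implementation on z/OS would more likely call a
conversion routine provided by the platform.

```c
#include <stddef.h>

/* Map one EBCDIC (IBM-1047 layout assumed) byte to its ASCII code.
   Returns -1 for bytes outside the small supported subset. */
static int ebcdic_to_ascii_byte (unsigned char c)
{
  if (c >= 0xC1 && c <= 0xC9) return 0x41 + (c - 0xC1);  /* A..I */
  if (c >= 0xD1 && c <= 0xD9) return 0x4A + (c - 0xD1);  /* J..R */
  if (c >= 0xE2 && c <= 0xE9) return 0x53 + (c - 0xE2);  /* S..Z */
  if (c >= 0xF0 && c <= 0xF9) return 0x30 + (c - 0xF0);  /* 0..9 */
  if (c == 0x60) return 0x2D;                            /* '-' */
  if (c == 0x6D) return 0x5F;                            /* '_' */
  return -1;
}

/* Convert a NUL-terminated encoding name in place.
   Returns 0 on success, or -1 at the first unmappable byte, in which
   case the bytes before it have already been converted. */
static int name_etoa (char *s)
{
  for (; *s != '\0'; s++)
    {
      int a = ebcdic_to_ascii_byte ((unsigned char) *s);
      if (a < 0)
        return -1;
      *s = (char) a;
    }
  return 0;
}
```

The caller would run this over the 'from' and 'to' buffers before the gperf
lookup and treat a -1 result as an unknown encoding name.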
Bruno
* Re: z/OS, iconv, and gperf
2020-01-20 0:13 ` Bruno Haible
@ 2020-01-22 6:38 ` Daniel Richard G.
0 siblings, 0 replies; 49+ messages in thread
From: Daniel Richard G. @ 2020-01-22 6:38 UTC (permalink / raw)
To: Bruno Haible; +Cc: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 2112 bytes --]
Hi Bruno,
On Sun, 2020 Jan 19 19:13-05:00, Bruno Haible wrote:
>
> OK, then we'll need
> a) for the short-term: in lib/iconv_open.c, apply an EBCDIC -> ASCII
> conversion to the 'from' and the 'to' strings. Can you implement that?
> And also a rule that removes the anti-EBCDIC guard from the gperf
> generated output (in modules/iconv_open).
Please see the attached patch to iconv_open.c. I'll leave the makefile
rule to you, as that is less straightforward for me. The patch, plus a
disabled #error in iconv_open-zos.h, gets test-iconv to build and pass.
However, the following test failures are new to me:
$ /tmp/testdir/gltests/test-btoc32-1.sh
/tmp/testdir/gltests/test-btoc32.c:49: assertion 'btoc32 (c) == c' failed
CEE5207E The signal SIGABRT was received.
$ /tmp/testdir/gltests/test-mbrtoc32-1.sh
/tmp/testdir/gltests/test-mbrtoc32.c:108: assertion 'wc == c' failed
CEE5207E The signal SIGABRT was received.
$ /tmp/testdir/gltests/test-mbrtoc32-5.sh
/tmp/testdir/gltests/test-mbrtoc32.c:115: assertion 'mbsinit (&state)' failed
CEE5207E The signal SIGABRT was received.
I tested using Git 49c6f78c. Poking a bit into the
test-btoc32-1.sh failure, I saw that it occurred when btoc32(4) yielded
156, which seems consistent with an IBM-1047-to-ASCII mapping. (Per
Wikipedia, 0-3 is the same as ASCII, but 4 is a "SEL" character. And
btoc32(5) returns 9.)
> b) a feature request for the 'gperf' program, to generate two code
> bodies, one for ASCII and one for EBCDIC.
What about generating a translation table at compile/run time, that is
used if ASCII is unavailable? Something like
xlate['A'] = 65;
xlate['B'] = 66;
...
xlate['Z'] = 90;
...
c = xlate[c];
As I recall, there are EBCDIC variants with minor differences in the
positions of certain punctuation marks, and while they may or may not be
commonly used on z/OS, it would be desirable to remain robust against
that possibility.
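A slightly fuller sketch of that idea follows. The names xlate, map_run and
xlate_init are hypothetical, and only upper-case letters, digits, '-' and
'_' are covered. The table is indexed by character literals, which the
compiler encodes in the execution character set, and the stored values are
written as numeric ASCII codes, so the same source should build the right
table on an ASCII host or on any EBCDIC variant.

```c
#include <stddef.h>
#include <string.h>

/* xlate[c] is the ASCII code for execution-charset byte c,
   or 0 if c has no mapping in this small subset. */
static unsigned char xlate[256];

/* Map each character of CHARS (given in the execution character set)
   to consecutive ASCII codes starting at FIRST_ASCII. */
static void map_run (const char *chars, unsigned char first_ascii)
{
  size_t i;
  for (i = 0; chars[i] != '\0'; i++)
    xlate[(unsigned char) chars[i]] = (unsigned char) (first_ascii + i);
}

/* Build the table at run time; call once before using xlate[]. */
static void xlate_init (void)
{
  memset (xlate, 0, sizeof xlate);
  map_run ("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0x41);
  map_run ("0123456789", 0x30);
  map_run ("-", 0x2D);
  map_run ("_", 0x5F);
}
```

After xlate_init(), c = xlate[(unsigned char) c] yields the ASCII code for a
supported character, and a result of 0 signals a byte the table does not
know about.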
--Daniel
--
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
[-- Attachment #2: zos-iconv-fix.patch.txt --]
[-- Type: text/plain, Size: 2198 bytes --]
diff --git a/lib/iconv_open.c b/lib/iconv_open.c
index 989bd9d57..72276b1c3 100644
--- a/lib/iconv_open.c
+++ b/lib/iconv_open.c
@@ -38,10 +38,25 @@
#define ICONV_FLAVOR_SOLARIS "iconv_open-solaris.h"
#define ICONV_FLAVOR_ZOS "iconv_open-zos.h"
+#if defined __MVS__ && defined __IBMC__ && 'A' != 0x41
+/* On IBM z/OS, the encoding names are in EBCDIC, but the gperf source still
+ expects and returns ASCII. We need to convert between the two. */
+# define EBCDIC_CONVERT
+#endif
+
+#ifdef EBCDIC_CONVERT
+/* Ensure that the gperf source is compiled as ASCII. */
+# pragma convert("ISO8859-1")
+#endif
+
#ifdef ICONV_FLAVOR
# include ICONV_FLAVOR
#endif
+#ifdef EBCDIC_CONVERT
+# pragma convert(pop)
+#endif
+
iconv_t
rpl_iconv_open (const char *tocode, const char *fromcode)
#undef iconv_open
@@ -50,6 +65,10 @@ rpl_iconv_open (const char *tocode, const char *fromcode)
char tocode_upper[32];
char *fromcode_upper_end;
char *tocode_upper_end;
+#ifdef EBCDIC_CONVERT
+ char fromcode_ae[32];
+ char tocode_ae[32];
+#endif
#if REPLACE_ICONV_UTF
/* Special handling of conversion between UTF-8 and UTF-{16,32}{BE,LE}.
@@ -150,6 +169,15 @@ rpl_iconv_open (const char *tocode, const char *fromcode)
tocode_upper_end = q;
}
+#ifdef EBCDIC_CONVERT
+ /* Convert the encodings from EBCDIC to ASCII, as gperf expects the latter. */
+ if (__etoa (fromcode_upper) < 0 || __etoa (tocode_upper) < 0)
+ {
+ errno = EINVAL;
+ return (iconv_t)(-1);
+ }
+#endif
+
#ifdef ICONV_FLAVOR
/* Apply the mappings. */
{
@@ -169,5 +197,20 @@ rpl_iconv_open (const char *tocode, const char *fromcode)
tocode = tocode_upper;
#endif
+#ifdef EBCDIC_CONVERT
+ /* Convert the encodings back to EBCDIC for iconv_open(). */
+ strncpy (fromcode_ae, fromcode, sizeof(fromcode_ae));
+ strncpy (tocode_ae, tocode, sizeof(tocode_ae));
+ fromcode_ae[SIZEOF (fromcode_ae) - 1] = '\0';
+ tocode_ae[SIZEOF (tocode_ae) - 1] = '\0';
+ if (__atoe (fromcode_ae) < 0 || __atoe (tocode_ae) < 0)
+ {
+ errno = EINVAL;
+ return (iconv_t)(-1);
+ }
+ fromcode = fromcode_ae;
+ tocode = tocode_ae;
+#endif
+
return iconv_open (tocode, fromcode);
}
Thread overview: 49+ messages
2015-09-22 2:28 [PATCH] IBM z/OS + EBCDIC support Daniel Richard G.
2015-09-22 15:23 ` Eric Blake
2015-09-22 19:27 ` Daniel Richard G.
2015-09-22 20:00 ` Paul Eggert
2015-09-22 20:08 ` Eric Blake
2015-09-22 20:51 ` Daniel Richard G.
2015-09-22 19:32 ` Paul Eggert
2015-09-22 19:46 ` Paul Eggert
2015-09-22 20:37 ` Daniel Richard G.
2015-09-22 22:03 ` Paul Eggert
2015-09-22 23:44 ` Daniel Richard G.
2015-09-23 2:02 ` Paul Eggert
2015-09-23 6:58 ` Daniel Richard G.
2015-09-23 19:05 ` Paul Eggert
2015-09-23 19:29 ` Paul Eggert
2015-09-23 21:57 ` Daniel Richard G.
2015-09-25 7:29 ` Paul Eggert
2015-09-26 0:25 ` Daniel Richard G.
2015-09-26 2:49 ` Paul Eggert
2015-09-26 4:39 ` Daniel Richard G.
2015-09-26 16:08 ` Ben Pfaff
2015-09-27 6:31 ` Daniel Richard G.
2015-09-27 6:59 ` Paul Eggert
2015-09-28 2:09 ` Daniel Richard G.
2015-10-15 4:49 ` Daniel Richard G.
2016-08-18 0:47 ` Paul Eggert
2016-08-18 8:24 ` Daniel Richard G.
2016-08-18 8:53 ` Paul Eggert
2016-08-19 8:20 ` Daniel Richard G.
2016-08-19 11:03 ` Bruno Haible
2016-08-19 19:28 ` Paul Eggert
2016-08-19 20:38 ` Daniel Richard G.
2019-12-19 4:57 ` z/OS configure triple Bruno Haible
2019-12-20 0:22 ` Daniel Richard G.
2019-12-20 6:29 ` Bruno Haible
2019-12-19 5:16 ` z/OS, iconv, and charset aliases Bruno Haible
2019-12-19 5:21 ` Bruno Haible
2019-12-20 4:38 ` Daniel Richard G.
2019-12-20 8:19 ` Bruno Haible
2019-12-20 18:23 ` Daniel Richard G.
2019-12-21 5:49 ` z/OS, iconv, and gperf Bruno Haible
2020-01-09 5:48 ` Daniel Richard G.
2020-01-19 21:52 ` Bruno Haible
2020-01-19 21:59 ` Bruno Haible
2020-01-19 22:32 ` Daniel Richard G.
2020-01-20 0:13 ` Bruno Haible
2020-01-22 6:38 ` Daniel Richard G.
2015-09-22 19:50 ` [PATCH] IBM z/OS + EBCDIC support Paul Eggert
2015-09-22 20:47 ` Daniel Richard G.