From mboxrd@z Thu Jan  1 00:00:00 1970
Message-Id: <1442954222.2595506.390816073.4DC03D94@webmail.messagingengine.com>
X-Sasl-Enc: yNI3jmL3S1iICBUjGe7Dm/yzDuWXzvW30aZs0jJiqTw5 1442954222
From: "Daniel Richard G." <skunk@iSKUNK.ORG>
To: Paul Eggert <eggert@cs.ucla.edu>, bug-gnulib@gnu.org
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
X-Mailer: MessagingEngine.com Webmail Interface - html
In-Reply-To: <5601ACB5.3010005@cs.ucla.edu>
References: <1442888927.2328038.389926169.50DB0133@webmail.messagingengine.com>
	<5601ACB5.3010005@cs.ucla.edu>
Subject: Re: [PATCH] IBM z/OS + EBCDIC support
Date: Tue, 22 Sep 2015 16:37:02 -0400

Hi Paul,

On Tue, 2015 Sep 22 12:32-0700, Paul Eggert wrote:
> Thanks for looking into this.  I have some questions about the c-ctype
> changes.  It appears that the proposed patch defers to the system
> functions (which use the current locale), but that's not the intent of
> c-ctype: it's supposed to correspond to a stripped down POSIX "C"
> locale regardless of the current locale settings.  Is there something
> special in z/OS that requires using the system functions?  (E.g., does
> the "C" locale behave differently depending on some *other* setting
> regarding character set?)

Mainly, it was the attempt to answer the question "so what specific
variant of EBCDIC are we going to target here?" that led me to use
the system functions. EBCDIC-1047 is favored in z/OS, but EBCDIC-037
is also popular, and then there are the Russian/Japanese/etc. code
pages that some far-flung users might want. However, unlike "normal"
8-bit encodings like ISO 8859-#, KOI8-R et al., there is no agreement
in the 7-bit range, and even ASCII characters like "[" and "]" are
not consistently encoded between EBCDIC variants. We don't have the
option of saying, "Okay, screw all that, we'll just limit ourselves
to this common subset," unless said subset excludes things like
punctuation marks.
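
To make the variant problem concrete, here is a rough sketch (not part of
the patch; the sketch_* names are hypothetical) that reports which code
points the compiler assigned to "[" and "]". It assumes only that
character literals are translated to the execution character set, and
that 'A' is 0xC1 under EBCDIC, which holds for the variants mentioned
above as far as I know.

```c
#include <stdio.h>

/* Hypothetical helper: nonzero when the execution character set looks
   like EBCDIC.  'A' is 0xC1 in EBCDIC and 0x41 in ASCII.  */
static int
sketch_host_is_ebcdic (void)
{
  return 'A' == 0xC1;
}

/* Print the code points the compiler assigned to the two bracket
   characters, whose encoding varies between EBCDIC code pages.  */
static void
sketch_print_bracket_codes (FILE *out)
{
  static const char probes[] = "[]";
  size_t i;

  for (i = 0; probes[i] != '\0'; i++)
    fprintf (out, "'%c' = 0x%02X\n", probes[i],
             (unsigned int) (unsigned char) probes[i]);
}
```

Building this once per code page (however the compiler selects it) and
comparing the output would show where the variants disagree.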

My view is that it's not worth the hassle. Yes, c-ctype is not supposed
to be locale-dependent. But making it locale-independent on z/OS is
going to be a lot more work, and a lot more code to maintain, and it's
not likely the users of these systems will see a corresponding benefit.
I think it would be better to have this for now (it's better than
nothing), and if a clear need arises in the future for locale-independent
behavior on z/OS (possibly by selecting an EBCDIC variant at compile
time), we can cross that bridge then.
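
For reference, the compile-time approach could look something like the
sketch below. It is only an illustration with hypothetical sketch_*
names, not the c-ctype code. It assumes the compiler translates
character literals to whichever EBCDIC variant the build targets, so
the variant is fixed when the object file is produced and the current
locale is never consulted.

```c
#include <stdbool.h>

/* Locale-independent uppercase test.  A switch on literals is used
   instead of a range check because the EBCDIC letters are not
   contiguous (A-I, J-R and S-Z sit in separate runs).  */
static bool
sketch_c_isupper (int c)
{
  switch (c)
    {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
    case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
    case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
    case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
    case 'Y': case 'Z':
      return true;
    default:
      return false;
    }
}

/* The brackets get whatever code points the compile-time character
   set assigns to them; no table of variants is needed in the source.  */
static bool
sketch_is_bracket (int c)
{
  return c == '[' || c == ']';
}
```

The cost is that a binary built for one variant gives wrong answers for
data in another, which is the trade-off described above.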

> With the above in mind, it's not clear what c_isascii should do.
> Should it return 1 for bytes in the range 0..127, or for bytes that
> correspond to ASCII bytes if one assumes the standard translation
> from EBCDIC code page 037 to ASCII?  (Is there a standard?)  If the
> former, the current code is OK; if the latter, does the system
> isascii always return the same results regardless of locale and do
> these results make sense?

The latter behavior is the right one, IMO. With the former, there would
be no point in having an isascii() function at all; you would just do a
range check.
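
The two readings can be sketched as follows. The function names are
hypothetical, and the second function is deliberately incomplete: it
covers only the printable ASCII repertoire, since most ASCII control
characters have no portable literal form. It also assumes the compiler
maps the literals to the target character set.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Former reading: a plain range check on the byte value.  */
static bool
sketch_isascii_by_range (int c)
{
  return 0 <= c && c <= 127;
}

/* Latter reading (partial): does the byte encode a character that
   belongs to the printable ASCII repertoire?  Control characters,
   including NUL, are not handled and yield false.  */
static bool
sketch_isascii_by_repertoire (int c)
{
  static const char printable[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
    "abcdefghijklmnopqrstuvwxyz{|}~";

  if (c <= 0 || c > UCHAR_MAX)
    return false;
  return strchr (printable, c) != NULL;
}
```

On an ASCII host the two agree for printable characters; on an EBCDIC
host they would not, because letters and digits sit above 127 there.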

Yes, there's a standard... a whole smorgasbord to choose from ^_^

The system isascii() function is locale-dependent. Since the encoding
of "[" and "]" varies with the code page, I don't see a way around this,
unless you deliberately support one EBCDIC variant at the expense of all
others.

    http://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm?lang=en

> Anyway, in looking through the code I see that it's hard to test a port 
> to EBCDIC because it uses ifdef rather than if, and I do see some 
> promotion bugs that you noted but we can fix these with inline functions 
> rather than macros (cleaner and safer nowadays), and there are a few 
> other style glitches (e.g., boolean values, overuse of >=) so I 
> installed the attached patch.  This patch assumes EBCDIC control 
> characters are either less than ' ' or are all 1 bits, which I think is 
> right.  The patch also tightens up the tests a bit.

Yes, all control characters appear to be in [\x00-\x3F], but not
everything in that range is a control character. (As I recall, 0x04 was
not.) I first tried making c_iscntrl() a simple range check, but the
result did not agree with the system iscntrl().
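
The comparison I mean is roughly the following sketch (the function
name is hypothetical). It counts the byte values for which a candidate
range check disagrees with the system iscntrl() under whatever locale
is in effect; the program starts in the "C" locale unless setlocale()
has been called.

```c
#include <ctype.h>
#include <limits.h>

/* Count the byte values in 0..UCHAR_MAX for which the range check
   lo <= c <= hi disagrees with the system iscntrl().  */
static int
sketch_count_cntrl_mismatches (int lo, int hi)
{
  int c;
  int mismatches = 0;

  for (c = 0; c <= UCHAR_MAX; c++)
    {
      int in_range = lo <= c && c <= hi;
      int sys = iscntrl (c) != 0;

      if (in_range != sys)
        mismatches++;
    }
  return mismatches;
}
```

Calling sketch_count_cntrl_mismatches (0x00, 0x3F) on the z/OS box is
the quick way to see whether a pure range check can stand in for the
system function; any nonzero result means it cannot.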

> This patch doesn't address the isascii problem, nor the "something 
> special in z/OS" problem, so quite possibly further patches will be 
> needed to this module.
> Email had 1 attachment:
> + 0001-c-ctype-port-better-to-EBCDIC.patch
>   21k (text/x-patch)

I'll be happy to test your [revised] patch this evening.


--Daniel


-- 
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.