Message-Id: <1442954222.2595506.390816073.4DC03D94@webmail.messagingengine.com>
From: "Daniel Richard G."
To: Paul Eggert, bug-gnulib@gnu.org
In-Reply-To: <5601ACB5.3010005@cs.ucla.edu>
References: <1442888927.2328038.389926169.50DB0133@webmail.messagingengine.com> <5601ACB5.3010005@cs.ucla.edu>
Subject: Re: [PATCH] IBM z/OS + EBCDIC support
Date: Tue, 22 Sep 2015 16:37:02 -0400

Hi Paul,

On Tue, 2015 Sep 22 12:32-0700, Paul Eggert wrote:
> Thanks for looking into this. I have some questions about the c-ctype
> changes. It appears that the proposed patch defers to the system
> functions (which use the current locale), but that's not the intent of
> c-ctype: it's supposed to correspond to a stripped down POSIX "C"
> locale regardless of the current locale settings. Is there something
> special in z/OS that requires using the system functions? (E.g., does
> the "C" locale behave differently depending on some *other* setting
> regarding character set?)

Mainly, it was the attempt to answer the question "so what specific
variant of EBCDIC are we going to target here?" that led me to use the
system functions.
EBCDIC-1047 is favored in z/OS, but EBCDIC-037 is also popular, and
then there are the Russian/Japanese/etc. code pages that some far-flung
users might want. However, unlike "normal" 8-bit encodings like ISO
8859-#, KOI8-R et al., the EBCDIC variants do not agree even on the
7-bit ASCII repertoire: ASCII characters like "[" and "]" are not
consistently encoded between them. We don't have the option of saying,
"Okay, screw all that, we'll just limit ourselves to this common
subset," unless said subset excludes things like punctuation marks.

My view is, it's not worth the hassle. Yes, c-ctype is not supposed to
be locale-dependent. Overcoming that is going to be a lot more work,
and a lot more code to maintain, and it's not likely the users of these
systems will see a corresponding benefit. I think it would be better to
have this for now---it's better than nothing---and if a clear need
arises in the future for locale-independent behavior on z/OS (possibly
by selecting an EBCDIC variant at compile time), we can cross that
bridge then.

> With the above in mind, it's not clear what c_isascii should do.
> Should it return 1 for bytes in the range 0..127, or for bytes that
> correspond to ASCII bytes if one assumes the standard translation
> from EBCDIC code page 037 to ASCII? (Is there a standard?) If the
> former, the current code is OK; if the latter, does the system
> isascii always return the same results regardless of locale and do
> these results make sense?

The latter behavior is the right one, IMO. If it were the former, there
wouldn't even be a point to having an isascii() function at all; you
would just do a range check. Yes, there's a standard... a whole
smorgasbord to choose from ^_^

The system isascii() function is locale-dependent. With "[" and "]"
depending on that, I don't see a way to get around this, unless you
deliberately support one EBCDIC variant at the expense of all others.
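
To make the bracket problem concrete, here is a minimal sketch in C. It
is not gnulib code: the struct, the table, and both function names are
hypothetical, and the byte values are assumed from the published IBM-037
and IBM-1047 code page tables. It only shows that a single compiled-in
byte value for "[" or "]" cannot be right for both variants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration only; none of these names exist in gnulib.
   Byte values are assumed from the IBM-037 and IBM-1047 code page
   tables.  */
struct ebcdic_brackets
{
  const char *name;
  unsigned char lbracket;   /* encoding of '[' in this variant */
  unsigned char rbracket;   /* encoding of ']' in this variant */
};

static const struct ebcdic_brackets variants[] =
{
  { "IBM-037",  0xBA, 0xBB },
  { "IBM-1047", 0xAD, 0xBD },
};

/* Return true if BYTE encodes '[' or ']' in the given variant.  */
static bool
is_bracket (const struct ebcdic_brackets *v, unsigned char byte)
{
  return byte == v->lbracket || byte == v->rbracket;
}

/* Return true only if every listed variant encodes both brackets
   identically, i.e. if one hard-coded table would serve them all.  */
static bool
brackets_agree (void)
{
  size_t n = sizeof variants / sizeof variants[0];
  for (size_t i = 1; i < n; i++)
    if (variants[i].lbracket != variants[0].lbracket
        || variants[i].rbracket != variants[0].rbracket)
      return false;
  return true;
}
```

With this table, brackets_agree() is false, and byte 0xAD counts as a
bracket under IBM-1047 but not under IBM-037. A locale-independent
c-ctype would have to commit to one row of such a table at compile time.
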

http://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm?lang=en

> Anyway, in looking through the code I see that it's hard to test a port
> to EBCDIC because it uses ifdef rather than if, and I do see some
> promotion bugs that you noted but we can fix these with inline functions
> rather than macros (cleaner and safer nowadays), and there are a few
> other style glitches (e.g., boolean values, overuse of >=) so I
> installed the attached patch. This patch assumes EBCDIC control
> characters are either less than ' ' or are all 1 bits, which I think is
> right. The patch also tightens up the tests a bit.

Yes, all control characters appear to be in [\x00-\x3F], but not
everything in that range is a control character. (I remember 0x04 was
not.) I tried making c_iscntrl() a simple range check at first, but
that did not agree with the system iscntrl().

> This patch doesn't address the isascii problem, nor the "something
> special in z/OS" problem, so quite possibly further patches will be
> needed to this module.

> Email had 1 attachment:
> + 0001-c-ctype-port-better-to-EBCDIC.patch
>   21k (text/x-patch)

I'll be happy to test your [revised] patch this evening.

--Daniel

-- 
Daniel Richard G. || skunk@iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.