From: Akim Demaille
Subject: portability issues with unicodeio (was: [GNU Bison 3.6.90] testsuite: 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 196 220 221 228 244 245 246 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 549 555 562 567 577 failed)
Date: Wed, 8 Jul 2020 06:45:35 +0200
To: Gnulib bugs
In-Reply-To: <1053773524.1039079.1594067726913.JavaMail.yahoo@mail.yahoo.co.jp>
Cc: Kiyoshi KANAZAWA, Bison Bugs

Hi!

Bison uses gnulib's unicodeio module to emit bullets (•) portably, with a
fallback to '.'.  It's implemented this way (src/gram.h):

> /* Fallback in case we can't print "•".  */
> static inline long
> print_dot_fallback (unsigned int code _GL_UNUSED,
>                     const char *msg _GL_UNUSED,
>                     void *callback_arg)
> {
>   FILE *out = (FILE *) callback_arg;
>   putc ('.', out);
>   return -1;
> }
>
> /* Print "•", the symbol used to represent a point in an item (aka, a
>    dotted rule).  */
> static inline void
> print_dot (FILE *out)
> {
>   unicode_to_mb (0x2022, fwrite_success_callback, print_dot_fallback, out);
> }

Unfortunately, in Kiyoshi's environment (SunOS hidden 5.11 11.3 i86pc i386
i86pc, GCC 9.3.0) we get '?' instead of '.' in the C locale.  This is a
genuine ASCII '?' (byte 0x3f), not a substitute shown by a terminal that
fails to display the character.  With en_US.UTF-8 we properly get the
bullet.

Kiyoshi can reproduce the problem with GNU Coreutils' printf, where he
also gets a '?', even though the fallback there should display the escape
sequence (i.e., it should print '\u2022' back):

> /* Simple failure callback that displays a fallback representation in plain
>    ASCII, using the same notation as ISO C99 strings.  */
> static long
> fallback_failure_callback (unsigned int code,
>                            const char *msg _GL_UNUSED,
>                            void *callback_arg)
> {
>   FILE *stream = (FILE *) callback_arg;
>
>   if (code < 0x10000)
>     fprintf (stream, "\\u%04X", code);
>   else
>     fprintf (stream, "\\U%08X", code);
>   return -1;
> }
>
> /* Outputs the Unicode character CODE to the output stream STREAM.
>    Upon failure, exit if exit_on_error is true, otherwise output a fallback
>    notation.  */
> void
> print_unicode_char (FILE *stream, unsigned int code, int exit_on_error)
> {
>   unicode_to_mb (code, fwrite_success_callback,
>                  exit_on_error
>                  ? exit_failure_callback
>                  : fallback_failure_callback,
>                  stream);
> }

In both cases the failure callback does not appear to be invoked at all.

Kiyoshi's messages start here:
https://lists.gnu.org/r/bug-bison/2020-07/msg00001.html

The latest:

> On 6 Jul 2020, at 22:35, Kiyoshi KANAZAWA wrote:
>
> Hi Akim,
>
> $ LC_ALL=C $coreutilsbin/printf '\u2022\n' | od -t x1
> 0000000 3f 0a
> 0000002
>
> $ LC_ALL=en_US.UTF-8 $coreutilsbin/printf '\u2022\n' | od -t x1
> 0000000 e2 80 a2 0a
> 0000004
>
>
> FYI, I have very limited locale.
> $ locale -a
> C
> POSIX
> en_US.ISO8859-1
> en_US.ISO8859-15
> en_US.ISO8859-15@euro
> en_US.UTF-8
> ja_JP.PCK
> ja_JP.UTF-8
> ja_JP.UTF-8@cldr
> ja_JP.eucJP

I'm unsure what the next steps would be from here.

Thanks in advance!
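
P.S. One possible way to narrow this down, as a minimal sketch and not a diagnosis: since neither fallback runs, my guess (unverified) is that the platform's iconv() reports success for U+2022 and silently writes '?' instead of failing with EILSEQ. The helpers below are hypothetical (utf8_encode, looks_substituted and probe_conversion are names made up for this mail, not gnulib API); they convert one code point from UTF-8 to a given charset and report what iconv() returned and which bytes it produced. The charset names are assumptions too: "UTF-8" and "ASCII" work with glibc, other systems may spell them differently, and some declare iconv()'s second argument as const char **.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Encode CODE as UTF-8 into BUF (at least 4 bytes).  Return the number
   of bytes written, or 0 for surrogates and values above 0x10FFFF.  */
static size_t
utf8_encode (unsigned int code, char *buf)
{
  assert (buf != NULL);
  if (code < 0x80)
    {
      buf[0] = (char) code;
      return 1;
    }
  if (code < 0x800)
    {
      buf[0] = (char) (0xC0 | (code >> 6));
      buf[1] = (char) (0x80 | (code & 0x3F));
      return 2;
    }
  if (code < 0x10000)
    {
      if (code >= 0xD800 && code <= 0xDFFF)
        return 0;
      buf[0] = (char) (0xE0 | (code >> 12));
      buf[1] = (char) (0x80 | ((code >> 6) & 0x3F));
      buf[2] = (char) (0x80 | (code & 0x3F));
      return 3;
    }
  if (code > 0x10FFFF)
    return 0;
  buf[0] = (char) (0xF0 | (code >> 18));
  buf[1] = (char) (0x80 | ((code >> 12) & 0x3F));
  buf[2] = (char) (0x80 | ((code >> 6) & 0x3F));
  buf[3] = (char) (0x80 | (code & 0x3F));
  return 4;
}

/* Heuristic (an assumption, not a documented rule): iconv() did not
   fail, yet it either counted a non-identical conversion (RES > 0) or
   produced a lone '?' for a character that is not '?'.  */
static int
looks_substituted (unsigned int code, size_t res,
                   const char *out, size_t outlen)
{
  if (res == (size_t) -1)
    return 0;
  return res > 0 || (code != '?' && outlen == 1 && out[0] == '?');
}

/* Convert CODE from UTF-8 to TOCODE and describe the result on REPORT.
   Return 0 if the conversion looks faithful, 1 if iconv() reported an
   error, 2 if a silent substitution is suspected, -1 on setup failure.  */
static int
probe_conversion (const char *tocode, unsigned int code, FILE *report)
{
  char in[4];
  char out[16];
  size_t inlen = utf8_encode (code, in);
  if (inlen == 0)
    return -1;

  iconv_t cd = iconv_open (tocode, "UTF-8");
  if (cd == (iconv_t) -1)
    {
      fprintf (report, "iconv_open to %s: %s\n", tocode, strerror (errno));
      return -1;
    }

  char *inptr = in;
  char *outptr = out;
  size_t inleft = inlen;
  size_t outleft = sizeof out;
  errno = 0;
  size_t res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  int err = errno;              /* Save before iconv_close can clobber it.  */
  iconv_close (cd);

  if (res == (size_t) -1)
    {
      fprintf (report, "U+%04X -> %s: %s\n", code, tocode, strerror (err));
      return 1;
    }
  size_t outlen = sizeof out - outleft;
  fprintf (report, "U+%04X -> %s: res=%lu, bytes:", code, tocode,
           (unsigned long) res);
  for (size_t i = 0; i < outlen; i++)
    fprintf (report, " %02x", (unsigned char) out[i]);
  fputc ('\n', report);
  return looks_substituted (code, res, out, outlen) ? 2 : 0;
}
```

Calling probe_conversion with the C locale's charset name (whatever nl_langinfo (CODESET) returns on that machine after setlocale) and 0x2022 would show whether iconv() fails cleanly or hands back 3f with a success status. If it is the latter, the failure callbacks above would never get a chance to run.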