From: Corinna Vinschen
To: Bruno Haible
Cc: Eric Blake, bug-gnulib@gnu.org, Jim Meyering, grep-devel@gnu.org
Subject: Re: [Grep-devel] handling of non-BMP characters
Date: Wed, 19 Dec 2018 15:44:14 +0100
Message-ID: <20181219144414.GN28727@calimero.vinschen.de>
In-Reply-To: <20181219144157.GM28727@calimero.vinschen.de>

On Dec 19 15:41, Corinna Vinschen wrote:
> On Dec 19 08:51, Bruno Haible wrote:
> > Corinna Vinschen wrote in
> > :
> > > it would be
> > > pretty nice if that code could get reverted back in to support
> > > non-BMP charsets even on Cygwin.
> >
> > I agree that support for beyond-BMP characters should be added back to 'grep'.
> >
> > Your earlier fix from 2013-08-16 (and the fact that the test failure is
> > occurring exactly on Windows and AIX platforms) shows that the problem is
> > with wchar_t being only 16-bit wide on these platforms.
> >
> > The type 'char32_t' has been introduced in C11 to overcome this limitation.[1]
> >
> > I propose to
> >
> > 1) introduce in gnulib support for , char32_t, and mbrtoc32, so
> > that we can use these instead of , wchar_t, and mbrtowc
> > portably,
> >
> > 2) change those gnulib modules that don't behave well with beyond-BMP
> > characters on Windows and AIX to use char32_t instead of wchar_t.
> >
> > Then the 'grep' code can be changed in a similar way, and this will
> > fix the bug on Cygwin and AIX (though not on native Windows [2]).
> >
> > The advantages of this approach are minimal code changes in 'grep': just
> > change some type and function names here and there, and add code for
> > the additional (size_t)(-3) return value of mbrtoc32.
>
> IIUC this would also drop the requirement for #ifdef CYGWIN'ed code.

... in grep.

> Sounds like a great idea to me!
>
>
> Corinna
>
> >
> >
> > Bruno
> >
> > [1] https://stackoverflow.com/questions/21264035/why-did-c11-introduce-the-char16-t-and-char32-t-types
> > [2] https://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00175.html