bug-gnulib@gnu.org mirror (unofficial)
From: Corinna Vinschen <vinschen@redhat.com>
To: Bruno Haible <bruno@clisp.org>
Cc: Eric Blake <eblake@redhat.com>,
	bug-gnulib@gnu.org, Jim Meyering <jim@meyering.net>,
	grep-devel@gnu.org
Subject: Re: [Grep-devel] handling of non-BMP characters
Date: Wed, 19 Dec 2018 15:41:57 +0100
Message-ID: <20181219144157.GM28727@calimero.vinschen.de>
In-Reply-To: <2767188.vsDAfJlR39@omega>

On Dec 19 08:51, Bruno Haible wrote:
> Corinna Vinschen wrote in
> <https://lists.gnu.org/archive/html/grep-devel/2018-12/msg00039.html>:
> > it would be
> > pretty nice if that code could get reverted back in to support
> > non-BMP charsets even on Cygwin.
> 
> I agree that support for beyond-BMP characters should be added back to 'grep'.
> 
> Your earlier fix from 2013-08-16 (and the fact that the test failure is
> occurring exactly on Windows and AIX platforms) shows that the problem is
> with wchar_t being only 16-bit wide on these platforms.
> 
> The type 'char32_t' has been introduced in C11 to overcome this limitation.[1]
> 
> I propose to
> 
>   1) introduce in gnulib support for <uchar.h>, char32_t, and mbrtoc32, so
>      that we can use these instead of <wchar.h>, wchar_t, and mbrtowc
>      portably,
> 
>   2) change those gnulib modules that don't behave well with beyond-BMP
>      characters on Windows and AIX to use char32_t instead of wchar_t.
> 
> Then the 'grep' code can be changed in a similar way, and this will
> fix the bug on Cygwin and AIX (though not on native Windows [2]).
> 
> The advantage of this approach is minimal code changes in 'grep': just
> change some type and function names here and there, and add code for
> the additional (size_t)(-3) return value of mbrtoc32.

IIUC this would also drop the requirement for #ifdef CYGWIN'ed code.
Sounds like a great idea to me!


Corinna



> 
> Bruno
> 
> [1] https://stackoverflow.com/questions/21264035/why-did-c11-introduce-the-char16-t-and-char32-t-types
> [2] https://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00175.html
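For illustration, the kind of change Bruno describes (swapping mbrtowc for
mbrtoc32 and handling the extra (size_t)(-3) return value) might look roughly
like the sketch below. The helper name count_code_points is hypothetical and
does not come from grep or gnulib; the code assumes a C11 <uchar.h> is
available and decodes in whatever locale is currently active.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>   /* mbstate_t */

/* Hypothetical helper (not from grep or gnulib): count the characters in
   the first N bytes of S, decoding with mbrtoc32 in the current locale.
   Returns (size_t) -1 on an invalid or truncated multibyte sequence.  */
size_t
count_code_points (const char *s, size_t n)
{
  assert (s != NULL);

  mbstate_t state;
  memset (&state, 0, sizeof state);   /* initial conversion state */

  char32_t c;
  size_t count = 0;
  size_t i = 0;

  while (i < n)
    {
      size_t r = mbrtoc32 (&c, s + i, n - i, &state);

      if (r == (size_t) -1 || r == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete sequence */

      if (r == (size_t) -3)
        {
          /* A character from previously consumed input was stored;
             no new bytes were consumed.  mbrtowc has no such case.  */
          count++;
          continue;
        }

      if (r == 0)
        r = 1;                        /* a null character occupies one byte */

      count++;
      i += r;
    }

  /* Drain any characters still pending in the conversion state.  */
  while (mbrtoc32 (&c, s + n, 0, &state) == (size_t) -3)
    count++;

  return count;
}
```

With plain ASCII input the result does not depend on the locale: for
example, count_code_points ("grep", 4) returns 4. For beyond-BMP input the
caller would first need to select a UTF-8 locale with setlocale.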

