bug-gnulib@gnu.org mirror (unofficial)
* Re: new snapshot available: grep-3.1.46-504af
       [not found]   ` <CA+8g5KGumQSMO82BKDsYUAuTzzkAAAZ+H1qqzy1-HiU0AOxbaA@mail.gmail.com>
@ 2018-12-15 22:08     ` Bruno Haible
  2018-12-15 23:32       ` Jim Meyering
  0 siblings, 1 reply; 10+ messages in thread
From: Bruno Haible @ 2018-12-15 22:08 UTC
  To: Jim Meyering; +Cc: bug-gnulib, GNU grep developers

Hi Jim,

> > I guess the fix should be to detect the glibc bug in m4/regex.m4 ?
> 
> Exactly. That's what I'm doing now.
> I'm expecting to insert something like this, probably reusing the
> result value of "64":
> 
>             /* Matching with the compiled form of this regexp would
> provoke
>                an assertion failure prior to glibc-2.28:
>                  regexec.c:1375: pop_fail_stack: Assertion `num >= 0'
> failed
>                With glibc-2.28, compilation fails and reports the
> invalid
>                back reference.  */
>             re_set_syntax (RE_SYNTAX_POSIX_EGREP);
>             memset (&regex, 0, sizeof regex);
>             s = re_compile_pattern ("0|()0|\\1|0", 10, &regex);
>             if (!s || strcmp (s, "Invalid back reference"))
>               result |= 64;
> 

Looks good to me (modulo the line breaks that are probably caused
by your MUA).

Yes, we have to reuse some of the bits, because a program's return
code (> 0, < 126) has only room for 7 bits.
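
A minimal sketch of this bit packing, assuming each probe is assigned one
of the seven available bits and that a later probe may share a bit with an
earlier one. The names `struct probe` and `pack_status` are hypothetical
and do not come from regex.m4.

```c
#include <stddef.h>

/* Hypothetical description of one configure-time probe: whether it
   failed, and which bit of the exit status it reports through.  */
struct probe
{
  int failed;
  int bit;      /* one of 1, 2, 4, 8, 16, 32, 64 */
};

/* OR together the bits of all failed probes.  Only the low 7 bits are
   kept, so the result stays within what an exit status can carry.  */
int
pack_status (const struct probe *p, size_t n)
{
  int result = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (p[i].failed)
      result |= p[i].bit & 0x7f;
  return result;
}
```

When two probes share a bit such as 64, a set bit only says that at least
one of them failed, which is enough for a configure test that just needs
"working" versus "not working".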

This test should also be added to tests/test-regex.c, so that we
verify that the choices made by regex.m4 have really achieved their
objective.
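
A standalone form of the quoted check might look as follows. This is only
a sketch: it assumes glibc's GNU regex interface (`re_set_syntax`,
`re_compile_pattern`) from `<regex.h>` with `_GNU_SOURCE` defined, the
function name `backref_probe` is hypothetical, and the text actually
committed to regex.m4 and tests/test-regex.c may differ.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* needed for the GNU regex API in <regex.h> */
#endif
#include <regex.h>
#include <string.h>

/* Return 64 if compiling the problematic pattern does not fail with
   "Invalid back reference", and 0 otherwise.  The pattern is only
   compiled, never matched, so the old assertion failure in regexec.c
   is not triggered by this function.  */
int
backref_probe (void)
{
  struct re_pattern_buffer regex;
  const char *s;
  int result = 0;

  re_set_syntax (RE_SYNTAX_POSIX_EGREP);
  memset (&regex, 0, sizeof regex);
  s = re_compile_pattern ("0|()0|\\1|0", 10, &regex);
  if (!s || strcmp (s, "Invalid back reference"))
    result |= 64;
  if (!s)
    regfree (&regex);   /* compilation succeeded, release the buffer */
  return result;
}
```

A caller would OR the return value into its overall result, in the same
way as the quoted fragment does with `result |= 64`.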

Bruno




* Re: new snapshot available: grep-3.1.46-504af
  2018-12-15 22:08     ` new snapshot available: grep-3.1.46-504af Bruno Haible
@ 2018-12-15 23:32       ` Jim Meyering
  0 siblings, 0 replies; 10+ messages in thread
From: Jim Meyering @ 2018-12-15 23:32 UTC
  To: Bruno Haible; +Cc: bug-gnulib@gnu.org List, GNU grep developers

Thanks.
I've just pushed this:
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=55a4abd92a0a8fa0a9d9aff3892505f7b0c6d73c
I tested both with
./gnulib-tool --create-testdir --test --dir /t/x --with-tests regex
and by building and testing grep on a Debian system.
Before the change, both failed; after it, both pass.



* Re: grep-3.1.46-504af on Minix
       [not found] <lubpf3h8feabk5.fsf@meyering.net>
       [not found] ` <57223855.0cMppWhKHm@omega>
@ 2018-12-16 22:52 ` Bruno Haible
       [not found] ` <20181216204837.GM28727@calimero.vinschen.de>
  2 siblings, 0 replies; 10+ messages in thread
From: Bruno Haible @ 2018-12-16 22:52 UTC
  To: grep-devel; +Cc: bug-gnulib

On Minix 3.3, several tests fail:

* stackoverflow - this is expected, because Minix does not have the
  necessary support for libsigsegv.

* include-exclude, rdot, symlink, word-multi-file - apparently "grep -r DIRECTORY"
  produces an "Invalid argument" error.

A couple of gnulib tests fail as well:
test-cloexec
test-dup2
test-fchdir
test-fcntl
test-lseek.sh
test-select-in.sh
test-select-out.sh
test-dup-safer

But this is low priority (at least for me).

Bruno




* Re: [Grep-devel] handling of non-BMP characters
       [not found]   ` <20181216205140.GN28727@calimero.vinschen.de>
@ 2018-12-19  7:51     ` Bruno Haible
  2018-12-19 14:41       ` Corinna Vinschen
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Bruno Haible @ 2018-12-19  7:51 UTC
  To: Corinna Vinschen, bug-gnulib; +Cc: Eric Blake, Jim Meyering, grep-devel

Corinna Vinschen wrote in
<https://lists.gnu.org/archive/html/grep-devel/2018-12/msg00039.html>:
> it would be
> pretty nice if that code could get reverted back in to support
> non-BMP charsets even on Cygwin.

I agree that support for beyond-BMP characters should be added back to 'grep'.

Your earlier fix from 2013-08-16, and the fact that the test failure
occurs precisely on Windows and AIX platforms, show that the problem is
that wchar_t is only 16 bits wide on these platforms.
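
To illustrate why a 16-bit unit is not enough, here is a small sketch of
the standard UTF-16 surrogate arithmetic. It shows that a code point above
U+FFFF cannot be held in a single 16-bit value. The function name
`utf16_encode` is hypothetical and is not part of grep or gnulib.

```c
#include <stdint.h>

/* Encode code point CP as UTF-16 into OUT.  Return the number of 16-bit
   units written (1 or 2), or 0 if CP is not a valid Unicode scalar
   value (a surrogate or a value above U+10FFFF).  */
int
utf16_encode (uint32_t cp, uint16_t out[2])
{
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  if (cp < 0x10000)
    {
      /* BMP character: fits in one 16-bit unit.  */
      out[0] = (uint16_t) cp;
      return 1;
    }
  /* Beyond-BMP character: split the 20-bit offset into two halves.  */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 + (cp >> 10));     /* high surrogate */
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));   /* low surrogate */
  return 2;
}
```

Code that assumes one wchar_t per character therefore sees two separate
values for such a character when wchar_t is 16 bits wide.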

The type 'char32_t' was introduced in C11 to overcome this limitation.[1]

I propose to

  1) introduce in gnulib support for <uchar.h>, char32_t, and mbrtoc32, so
     that we can use these instead of <wchar.h>, wchar_t, and mbrtowc
     portably,

  2) change those gnulib modules that don't behave well with beyond-BMP
     characters on Windows and AIX to use char32_t instead of wchar_t.

Then the 'grep' code can be changed in a similar way, and this will
fix the bug on Cygwin and AIX (though not on native Windows [2]).

The advantage of this approach is that it needs only minimal code changes
in 'grep': just change some type and function names here and there, and
add code for the additional (size_t)(-3) return value of mbrtoc32.
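
As a rough sketch of what such a change could look like, the loop below
decodes a byte string with C11's mbrtoc32 from <uchar.h> and handles the
(size_t)(-3) case. The function name `count_chars32` is hypothetical, the
result depends on the current locale's encoding, and this is not code
from grep or gnulib.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Count the characters in the first N bytes of S, stopping early at a
   null character.  Return (size_t) -1 if the input is invalid or ends
   in the middle of a character.  */
size_t
count_chars32 (const char *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  memset (&state, 0, sizeof state);
  while (n > 0)
    {
      char32_t c;
      size_t r = mbrtoc32 (&c, s, n, &state);

      if (r == (size_t) -1 || r == (size_t) -2)
        return (size_t) -1;     /* invalid or incomplete sequence */
      if (r == 0)
        break;                  /* null character: end of string */
      count++;
      if (r == (size_t) -3)
        continue;               /* character stored, no bytes consumed */
      s += r;
      n -= r;
    }
  return count;
}
```

Compared with an mbrtowc loop, the only structural addition is the branch
for (size_t)(-3), where a character is produced without advancing the
input pointer.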

Bruno

[1] https://stackoverflow.com/questions/21264035/why-did-c11-introduce-the-char16-t-and-char32-t-types
[2] https://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00175.html




* Re: [Grep-devel] handling of non-BMP characters
  2018-12-19  7:51     ` [Grep-devel] handling of non-BMP characters Bruno Haible
@ 2018-12-19 14:41       ` Corinna Vinschen
  2018-12-19 14:44         ` Corinna Vinschen
  2018-12-19 17:21       ` Jim Meyering
  2018-12-19 22:54       ` Paul Eggert
  2 siblings, 1 reply; 10+ messages in thread
From: Corinna Vinschen @ 2018-12-19 14:41 UTC
  To: Bruno Haible; +Cc: Eric Blake, bug-gnulib, Jim Meyering, grep-devel

IIUC this would also drop the requirement for #ifdef CYGWIN'ed code.
Sounds like a great idea to me!


Corinna




* Re: [Grep-devel] handling of non-BMP characters
  2018-12-19 14:41       ` Corinna Vinschen
@ 2018-12-19 14:44         ` Corinna Vinschen
  0 siblings, 0 replies; 10+ messages in thread
From: Corinna Vinschen @ 2018-12-19 14:44 UTC
  To: Bruno Haible; +Cc: Eric Blake, bug-gnulib, Jim Meyering, grep-devel

On Dec 19 15:41, Corinna Vinschen wrote:
> IIUC this would also drop the requirement for #ifdef CYGWIN'ed code.

  ... in grep.

> Sounds like a great idea to me!

* Re: [Grep-devel] handling of non-BMP characters
  2018-12-19  7:51     ` [Grep-devel] handling of non-BMP characters Bruno Haible
  2018-12-19 14:41       ` Corinna Vinschen
@ 2018-12-19 17:21       ` Jim Meyering
  2018-12-19 22:54       ` Paul Eggert
  2 siblings, 0 replies; 10+ messages in thread
From: Jim Meyering @ 2018-12-19 17:21 UTC
  To: Bruno Haible
  Cc: bug-gnulib@gnu.org List, Corinna Vinschen, Eric Blake,
	GNU grep developers

Sounds perfect. Thank you!



* Re: [Grep-devel] handling of non-BMP characters
  2018-12-19  7:51     ` [Grep-devel] handling of non-BMP characters Bruno Haible
  2018-12-19 14:41       ` Corinna Vinschen
  2018-12-19 17:21       ` Jim Meyering
@ 2018-12-19 22:54       ` Paul Eggert
  2018-12-20  6:49         ` arnold
  2 siblings, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2018-12-19 22:54 UTC
  To: Bruno Haible, Corinna Vinschen, bug-gnulib; +Cc: Eric Blake, grep-devel

On 12/18/18 11:51 PM, Bruno Haible wrote:
>    2) change those gnulib modules that don't behave well with beyond-BMP
>       characters on Windows and AIX to use char32_t instead of wchar_t.

This sounds good to me. I assume the regexp code will need to be changed 
accordingly, and if so I can volunteer to coordinate that with glibc 
(we're close to a freeze in Glibc, but we can install into Gnulib first).




* Re: [Grep-devel] handling of non-BMP characters
  2018-12-19 22:54       ` Paul Eggert
@ 2018-12-20  6:49         ` arnold
  2018-12-20 11:30           ` Bruno Haible
  0 siblings, 1 reply; 10+ messages in thread
From: arnold @ 2018-12-20  6:49 UTC
  To: vinschen, eggert, bug-gnulib, bruno; +Cc: eblake, grep-devel

Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 12/18/18 11:51 PM, Bruno Haible wrote:
> >    2) change those gnulib modules that don't behave well with beyond-BMP
> >       characters on Windows and AIX to use char32_t instead of wchar_t.
>
> This sounds good to me. I assume the regexp code will need to be changed 
> accordingly, and if so I can volunteer to coordinate that with glibc 
> (we're close to a freeze in Glibc, but we can install into Gnulib first).
>

I assume you'll make parallel changes in dfa.c at the same time?

Thanks,

Arnold



* Re: [Grep-devel] handling of non-BMP characters
  2018-12-20  6:49         ` arnold
@ 2018-12-20 11:30           ` Bruno Haible
  0 siblings, 0 replies; 10+ messages in thread
From: Bruno Haible @ 2018-12-20 11:30 UTC
  To: arnold; +Cc: bug-gnulib, eggert, vinschen, eblake, grep-devel

Hi Arnold,

> > >    2) change those gnulib modules that don't behave well with beyond-BMP
> > >       characters on Windows and AIX to use char32_t instead of wchar_t.
> ...
> I assume you'll make parallel changes in dfa.c at the same time?

If dfa.c does have bugs w.r.t. beyond-BMP characters, yes.

Bruno



