* Re: new snapshot available: grep-3.1.46-504af
[not found] ` <CA+8g5KGumQSMO82BKDsYUAuTzzkAAAZ+H1qqzy1-HiU0AOxbaA@mail.gmail.com>
@ 2018-12-15 22:08 ` Bruno Haible
2018-12-15 23:32 ` Jim Meyering
0 siblings, 1 reply; 10+ messages in thread
From: Bruno Haible @ 2018-12-15 22:08 UTC
To: Jim Meyering; +Cc: bug-gnulib, GNU grep developers
Hi Jim,
> > I guess the fix should be to detect the glibc bug in m4/regex.m4 ?
>
> Exactly. That's what I'm doing now.
> I'm expecting to insert something like this, probably reusing the
> result value of "64":
>
> /* Matching with the compiled form of this regexp would
> provoke
> an assertion failure prior to glibc-2.28:
> regexec.c:1375: pop_fail_stack: Assertion `num >= 0'
> failed
> With glibc-2.28, compilation fails and reports the
> invalid
> back reference. */
> re_set_syntax (RE_SYNTAX_POSIX_EGREP);
> memset (&regex, 0, sizeof regex);
> s = re_compile_pattern ("0|()0|\\1|0", 10, &regex);
> if (!s || strcmp (s, "Invalid back reference"))
> result |= 64;
>
Looks good to me (modulo the line breaks that are probably caused
by your MUA).
Yes, we have to reuse some of the bits, because a program's return
code (> 0, < 126) has only room for 7 bits.
This test should also be added to tests/test-regex.c, so that we
verify that the choices made by regex.m4 have really achieved their
objective.
Bruno
* Re: new snapshot available: grep-3.1.46-504af
2018-12-15 22:08 ` new snapshot available: grep-3.1.46-504af Bruno Haible
@ 2018-12-15 23:32 ` Jim Meyering
0 siblings, 0 replies; 10+ messages in thread
From: Jim Meyering @ 2018-12-15 23:32 UTC
To: Bruno Haible; +Cc: bug-gnulib@gnu.org List, GNU grep developers
On Sat, Dec 15, 2018 at 2:08 PM Bruno Haible <bruno@clisp.org> wrote:
> Hi Jim,
> > > I guess the fix should be to detect the glibc bug in m4/regex.m4 ?
> >
> > Exactly. That's what I'm doing now.
> > I'm expecting to insert something like this, probably reusing the
> > result value of "64":
> >
> > /* Matching with the compiled form of this regexp would
> > provoke
> > an assertion failure prior to glibc-2.28:
> > regexec.c:1375: pop_fail_stack: Assertion `num >= 0'
> > failed
> > With glibc-2.28, compilation fails and reports the
> > invalid
> > back reference. */
> > re_set_syntax (RE_SYNTAX_POSIX_EGREP);
> > > memset (&regex, 0, sizeof regex);
> > > s = re_compile_pattern ("0|()0|\\1|0", 10, &regex);
> > if (!s || strcmp (s, "Invalid back reference"))
> > result |= 64;
> >
>
> Looks good to me (modulo the line breaks that are probably caused
> by your MUA).
>
> Yes, we have to reuse some of the bits, because a program's return
> code (> 0, < 126) has only room for 7 bits.
>
> This test should also be added to tests/test-regex.c, so that we
> verify that the choices made by regex.m4 have really achieved their
> objective.
Thanks.
I've just pushed this:
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=55a4abd92a0a8fa0a9d9aff3892505f7b0c6d73c
Tested both with this
./gnulib-tool --create-testdir --test --dir /t/x --with-tests regex
and by building and testing grep on a Debian system.
Before, both would fail. After, they both pass.
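For anyone who wants to try the check outside of configure, here is a minimal standalone sketch of the test quoted above. It assumes glibc's GNU regex interface (re_set_syntax and re_compile_pattern, visible under _GNU_SOURCE). The function name backref_check is made up for illustration and is not taken from the pushed commit.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* expose the GNU regex API in <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not the name used in regex.m4): return 64 if
   compiling the problematic pattern does NOT fail with
   "Invalid back reference", and 0 if it fails as expected.  */
int
backref_check (void)
{
  static const char pattern[] = "0|()0|\\1|0";
  struct re_pattern_buffer regex;
  const char *s;
  int result = 0;

  /* The thread passes the length explicitly as 10.  */
  assert (sizeof pattern - 1 == 10);

  re_set_syntax (RE_SYNTAX_POSIX_EGREP);
  memset (&regex, 0, sizeof regex);
  s = re_compile_pattern (pattern, sizeof pattern - 1, &regex);
  if (!s || strcmp (s, "Invalid back reference"))
    result |= 64;
  if (!s)
    regfree (&regex);   /* compilation succeeded, so release the buffer */
  return result;
}
```

A caller could return this value from main, so that a nonzero exit status flags a regex implementation that accepts the pattern.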
* Re: grep-3.1.46-504af on Minix
[not found] <lubpf3h8feabk5.fsf@meyering.net>
[not found] ` <57223855.0cMppWhKHm@omega>
@ 2018-12-16 22:52 ` Bruno Haible
[not found] ` <20181216204837.GM28727@calimero.vinschen.de>
2 siblings, 0 replies; 10+ messages in thread
From: Bruno Haible @ 2018-12-16 22:52 UTC
To: grep-devel; +Cc: bug-gnulib
On Minix 3.3, several tests fail:
* stackoverflow - this is expected, because Minix does not have the
necessary support for libsigsegv.
* include-exclude, rdot, symlink, word-multi-file - apparently "grep -r DIRECTORY"
produces an "Invalid argument" error.
A couple of gnulib tests fail as well:
test-cloexec
test-dup2
test-fchdir
test-fcntl
test-lseek.sh
test-select-in.sh
test-select-out.sh
test-dup-safer
But this is low priority (at least for me).
Bruno
* Re: [Grep-devel] handling of non-BMP characters
[not found] ` <20181216205140.GN28727@calimero.vinschen.de>
@ 2018-12-19 7:51 ` Bruno Haible
2018-12-19 14:41 ` Corinna Vinschen
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Bruno Haible @ 2018-12-19 7:51 UTC
To: Corinna Vinschen, bug-gnulib; +Cc: Eric Blake, Jim Meyering, grep-devel
Corinna Vinschen wrote in
<https://lists.gnu.org/archive/html/grep-devel/2018-12/msg00039.html>:
> it would be
> pretty nice if that code could get reverted back in to support
> non-BMP charsets even on Cygwin.
I agree that support for beyond-BMP characters should be added back to 'grep'.
Your earlier fix from 2013-08-16 (and the fact that the test failure is
occurring exactly on Windows and AIX platforms) shows that the problem is
with wchar_t being only 16-bit wide on these platforms.
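To illustrate the limitation: a code point beyond the BMP does not fit in one 16-bit unit, so a platform whose wchar_t holds UTF-16 code units (as on Windows) has to split it into a surrogate pair. The sketch below shows that arithmetic only. The function name to_surrogates is hypothetical and does not come from grep or gnulib.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split the beyond-BMP code point CP
   (0x10000..0x10FFFF) into a UTF-16 high/low surrogate pair.  */
void
to_surrogates (uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  assert (0x10000 <= cp && cp <= 0x10FFFF);
  cp -= 0x10000;                              /* 20 significant bits remain */
  *hi = (uint16_t) (0xD800 + (cp >> 10));     /* upper 10 bits */
  *lo = (uint16_t) (0xDC00 + (cp & 0x3FF));   /* lower 10 bits */
}
```

Code that inspects one wchar_t at a time sees two separate units here rather than one character, which is the kind of failure described above.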
The type 'char32_t' has been introduced in C11 to overcome this limitation.[1]
I propose to
1) introduce in gnulib support for <uchar.h>, char32_t, and mbrtoc32, so
that we can use these instead of <wchar.h>, wchar_t, and mbrtowc
portably,
2) change those gnulib modules that don't behave well with beyond-BMP
characters on Windows and AIX to use char32_t instead of wchar_t.
Then the 'grep' code can be changed in a similar way, and this will
fix the bug on Cygwin and AIX (though not on native Windows [2]).
The advantage of this approach is that the code changes in 'grep' are
minimal: just change some type and function names here and there, and add
code for the additional (size_t)(-3) return value of mbrtoc32.
Bruno
[1] https://stackoverflow.com/questions/21264035/why-did-c11-introduce-the-char16-t-and-char32-t-types
[2] https://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00175.html
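A rough sketch of what a decoding loop looks like once mbrtowc is swapped for mbrtoc32, including the extra (size_t)(-3) case mentioned above. It assumes a C11 <uchar.h> as provided by glibc. The function name count_chars32 and its error convention are invented for illustration and are not proposed gnulib API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: count the characters in the N bytes at S, decoded
   with mbrtoc32 in the current locale.  Return (size_t) -1 if the input
   is invalid or ends in an incomplete character.  */
size_t
count_chars32 (const char *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  assert (s != NULL);
  memset (&state, 0, sizeof state);
  while (n > 0)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, s, n, &state);

      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;       /* invalid or incomplete sequence */
      if (ret == (size_t) -3)
        {
          /* A character produced from earlier input; no bytes consumed.  */
          count++;
          continue;
        }
      if (ret == 0)
        ret = 1;                  /* a null character occupies one byte */
      s += ret;
      n -= ret;
      count++;
    }
  return count;
}
```

Apart from the (size_t)(-3) branch, the loop has the same shape as one written against mbrtowc, which matches the point about small code changes.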
* Re: [Grep-devel] handling of non-BMP characters
2018-12-19 7:51 ` [Grep-devel] handling of non-BMP characters Bruno Haible
@ 2018-12-19 14:41 ` Corinna Vinschen
2018-12-19 14:44 ` Corinna Vinschen
2018-12-19 17:21 ` Jim Meyering
2018-12-19 22:54 ` Paul Eggert
2 siblings, 1 reply; 10+ messages in thread
From: Corinna Vinschen @ 2018-12-19 14:41 UTC
To: Bruno Haible; +Cc: Eric Blake, bug-gnulib, Jim Meyering, grep-devel
On Dec 19 08:51, Bruno Haible wrote:
> Corinna Vinschen wrote in
> <https://lists.gnu.org/archive/html/grep-devel/2018-12/msg00039.html>:
> > it would be
> > pretty nice if that code could get reverted back in to support
> > non-BMP charsets even on Cygwin.
>
> I agree that support for beyond-BMP characters should be added back to 'grep'.
>
> Your earlier fix from 2013-08-16 (and the fact that the test failure is
> occurring exactly on Windows and AIX platforms) shows that the problem is
> with wchar_t being only 16-bit wide on these platforms.
>
> The type 'char32_t' has been introduced in C11 to overcome this limitation.[1]
>
> I propose to
>
> 1) introduce in gnulib support for <uchar.h>, char32_t, and mbrtoc32, so
> that we can use these instead of <wchar.h>, wchar_t, and mbrtowc
> portably,
>
> 2) change those gnulib modules that don't behave well with beyond-BMP
> characters on Windows and AIX to use char32_t instead of wchar_t.
>
> Then the 'grep' code can be changed in a similar way, and this will
> fix the bug on Cygwin and AIX (though not on native Windows [2]).
>
> The advantage of this approach is that the code changes in 'grep' are
> minimal: just change some type and function names here and there, and add
> code for the additional (size_t)(-3) return value of mbrtoc32.
IIUC this would also drop the requirement for #ifdef CYGWIN'ed code.
Sounds like a great idea to me!
Corinna
>
> Bruno
>
> [1] https://stackoverflow.com/questions/21264035/why-did-c11-introduce-the-char16-t-and-char32-t-types
> [2] https://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00175.html
* Re: [Grep-devel] handling of non-BMP characters
2018-12-19 14:41 ` Corinna Vinschen
@ 2018-12-19 14:44 ` Corinna Vinschen
0 siblings, 0 replies; 10+ messages in thread
From: Corinna Vinschen @ 2018-12-19 14:44 UTC
To: Bruno Haible; +Cc: Eric Blake, bug-gnulib, Jim Meyering, grep-devel
On Dec 19 15:41, Corinna Vinschen wrote:
> On Dec 19 08:51, Bruno Haible wrote:
> > Corinna Vinschen wrote in
> > <https://lists.gnu.org/archive/html/grep-devel/2018-12/msg00039.html>:
> > > it would be
> > > pretty nice if that code could get reverted back in to support
> > > non-BMP charsets even on Cygwin.
> >
> > I agree that support for beyond-BMP characters should be added back to 'grep'.
> >
> > Your earlier fix from 2013-08-16 (and the fact that the test failure is
> > occurring exactly on Windows and AIX platforms) shows that the problem is
> > with wchar_t being only 16-bit wide on these platforms.
> >
> > The type 'char32_t' has been introduced in C11 to overcome this limitation.[1]
> >
> > I propose to
> >
> > 1) introduce in gnulib support for <uchar.h>, char32_t, and mbrtoc32, so
> > that we can use these instead of <wchar.h>, wchar_t, and mbrtowc
> > portably,
> >
> > 2) change those gnulib modules that don't behave well with beyond-BMP
> > characters on Windows and AIX to use char32_t instead of wchar_t.
> >
> > Then the 'grep' code can be changed in a similar way, and this will
> > fix the bug on Cygwin and AIX (though not on native Windows [2]).
> >
> > The advantage of this approach is that the code changes in 'grep' are
> > minimal: just change some type and function names here and there, and add
> > code for the additional (size_t)(-3) return value of mbrtoc32.
>
> IIUC this would also drop the requirement for #ifdef CYGWIN'ed code.
... in grep.
> Sounds like a great idea to me!
>
>
> Corinna
>
>
>
> >
> > Bruno
> >
> > [1] https://stackoverflow.com/questions/21264035/why-did-c11-introduce-the-char16-t-and-char32-t-types
> > [2] https://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00175.html
* Re: [Grep-devel] handling of non-BMP characters
2018-12-19 7:51 ` [Grep-devel] handling of non-BMP characters Bruno Haible
2018-12-19 14:41 ` Corinna Vinschen
@ 2018-12-19 17:21 ` Jim Meyering
2018-12-19 22:54 ` Paul Eggert
2 siblings, 0 replies; 10+ messages in thread
From: Jim Meyering @ 2018-12-19 17:21 UTC
To: Bruno Haible
Cc: bug-gnulib@gnu.org List, Corinna Vinschen, Eric Blake,
GNU grep developers
On Tue, Dec 18, 2018 at 11:51 PM Bruno Haible <bruno@clisp.org> wrote:
> Corinna Vinschen wrote in
> <https://lists.gnu.org/archive/html/grep-devel/2018-12/msg00039.html>:
> > it would be
> > pretty nice if that code could get reverted back in to support
> > non-BMP charsets even on Cygwin.
>
> I agree that support for beyond-BMP characters should be added back to 'grep'.
>
> Your earlier fix from 2013-08-16 (and the fact that the test failure is
> occurring exactly on Windows and AIX platforms) shows that the problem is
> with wchar_t being only 16-bit wide on these platforms.
>
> The type 'char32_t' has been introduced in C11 to overcome this limitation.[1]
>
> I propose to
>
> 1) introduce in gnulib support for <uchar.h>, char32_t, and mbrtoc32, so
> that we can use these instead of <wchar.h>, wchar_t, and mbrtowc
> portably,
>
> 2) change those gnulib modules that don't behave well with beyond-BMP
> characters on Windows and AIX to use char32_t instead of wchar_t.
>
> Then the 'grep' code can be changed in a similar way, and this will
> fix the bug on Cygwin and AIX (though not on native Windows [2]).
Sounds perfect. Thank you!
* Re: [Grep-devel] handling of non-BMP characters
2018-12-19 7:51 ` [Grep-devel] handling of non-BMP characters Bruno Haible
2018-12-19 14:41 ` Corinna Vinschen
2018-12-19 17:21 ` Jim Meyering
@ 2018-12-19 22:54 ` Paul Eggert
2018-12-20 6:49 ` arnold
2 siblings, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2018-12-19 22:54 UTC
To: Bruno Haible, Corinna Vinschen, bug-gnulib; +Cc: Eric Blake, grep-devel
On 12/18/18 11:51 PM, Bruno Haible wrote:
> 2) change those gnulib modules that don't behave well with beyond-BMP
> characters on Windows and AIX to use char32_t instead of wchar_t.
This sounds good to me. I assume the regexp code will need to be changed
accordingly, and if so I can volunteer to coordinate that with glibc
(we're close to a freeze in Glibc, but we can install into Gnulib first).
* Re: [Grep-devel] handling of non-BMP characters
2018-12-19 22:54 ` Paul Eggert
@ 2018-12-20 6:49 ` arnold
2018-12-20 11:30 ` Bruno Haible
0 siblings, 1 reply; 10+ messages in thread
From: arnold @ 2018-12-20 6:49 UTC
To: vinschen, eggert, bug-gnulib, bruno; +Cc: eblake, grep-devel
Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 12/18/18 11:51 PM, Bruno Haible wrote:
> > 2) change those gnulib modules that don't behave well with beyond-BMP
> > characters on Windows and AIX to use char32_t instead of wchar_t.
>
> This sounds good to me. I assume the regexp code will need to be changed
> accordingly, and if so I can volunteer to coordinate that with glibc
> (we're close to a freeze in Glibc, but we can install into Gnulib first).
>
I assume you'll make parallel changes in dfa.c at the same time?
Thanks,
Arnold
* Re: [Grep-devel] handling of non-BMP characters
2018-12-20 6:49 ` arnold
@ 2018-12-20 11:30 ` Bruno Haible
0 siblings, 0 replies; 10+ messages in thread
From: Bruno Haible @ 2018-12-20 11:30 UTC
To: arnold; +Cc: bug-gnulib, eggert, vinschen, eblake, grep-devel
Hi Arnold,
> > > 2) change those gnulib modules that don't behave well with beyond-BMP
> > > characters on Windows and AIX to use char32_t instead of wchar_t.
> ...
> I assume you'll make parallel changes in dfa.c at the same time?
If dfa.c does have bugs w.r.t. beyond-BMP characters, yes.
Bruno