bug-gnulib@gnu.org mirror (unofficial)
* regex.m4 fails in all glibc
@ 2021-07-14 13:51 林宏雄
  2021-07-15  5:08 ` Paul Eggert
  0 siblings, 1 reply; 3+ messages in thread
From: 林宏雄 @ 2021-07-14 13:51 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Jim Meyering

[-- Attachment #1: Type: text/plain, Size: 2034 bytes --]

I'm importing the regex module of Gnulib into Universal Ctags.
https://github.com/universal-ctags/ctags/pull/3054

The comment in regex.m4 says:
    # If the system regex support is good enough that it passes the
    # following run test, then default to *not* using the included regex.c.

We expect that configure should not use the included regex.c
on the system with up-to-date glibc.
But we found that it uses the included regex.c even with (at least)
glibc-2.31 and glibc-2.33.

The test added by the following commit fails.
-----------------------------------------
commit 55a4abd92a0a8fa0a9d9aff3892505f7b0c6d73c
Author: Jim Meyering <meyering@fb.com>
Date: Sat Dec 15 15:24:21 2018 -0800

regex: work around a bug in glibc-2.27 and prior

* m4/regex.m4 (gl_REGEX): Reject any system regexp that gets a failed
assertion for /0|()0|\1|0/.
* tests/test-regex.c (main): Add the same test here.
...
-----------------------------------------

The test should fail only with glibc-2.27 and prior, but it fails with
every glibc version, if I understand correctly.

Here is the test that fails:
-----------------------------------------
            ...
            /* Matching with the compiled form of this regexp would provoke
               an assertion failure prior to glibc-2.28:
                 regexec.c:1375: pop_fail_stack: Assertion 'num >= 0' failed
               With glibc-2.28, compilation fails and reports the invalid
               back reference.  */
            re_set_syntax (RE_SYNTAX_POSIX_EGREP);
            memset (&regex, 0, sizeof regex);
            s = re_compile_pattern ("0|()0|\\1|0", 10, &regex);
            if (!s)
              result |= 64;     // !!! FAILED HERE !!!
            else
              {
                if (strcmp (s, "Invalid back reference"))
                  result |= 64;
                regfree (&regex);
              }
            ...
-----------------------------------------

Returning 0 by re_compile_pattern() is correct behavior. It should not fail.

I'm using the latest Gnulib.

Regards,
-- 
Hiroo Hayashi


* Re: regex.m4 fails in all glibc
  2021-07-14 13:51 regex.m4 fails in all glibc 林宏雄
@ 2021-07-15  5:08 ` Paul Eggert
  2021-07-18  8:53   ` Bruno Haible
  0 siblings, 1 reply; 3+ messages in thread
From: Paul Eggert @ 2021-07-15  5:08 UTC (permalink / raw)
  To: 林宏雄, bug-gnulib; +Cc: Jim Meyering

[-- Attachment #1: Type: text/plain, Size: 1133 bytes --]

On 7/14/21 8:51 AM, 林宏雄 wrote:
> We expect that configure should not use the included regex.c
> on the system with up-to-date glibc.

Recently that has not been the case, as glibc has been trailing behind 
Gnulib a bit. At some point I hope they will become more in sync.

> Returning 0 by re_compile_pattern() is correct behavior. It should not fail.

Thank you for reporting the problem. There is no formal spec for 
re_compile_pattern and the regular expression is in some sense invalid 
and in another sense valid, so it's not clear which is the correct 
behavior here. However, Gnulib should allow the latest glibc behavior, 
as that behavior is arguably correct.

I installed the attached patches. The first patch merely fixes some 
longstanding quoting problems. The second patch updates the Gnulib tests 
to match the latest Gnulib, and to allow glibc implementations that 
behave as you describe. However, glibc 2.33 (the current version) still 
fails to pass the configure-time test, due to glibc bug 11053 which is 
fixed in Gnulib; see:

https://sourceware.org/bugzilla/show_bug.cgi?id=11053


[-- Attachment #2: 0001-regex-fix-shell-quoting-problem-in-configuration.patch --]
[-- Type: text/x-patch, Size: 1919 bytes --]

From 11e6fc32e9c5c770150b47bc0260b1b91110eba0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 14 Jul 2021 23:23:20 -0500
Subject: [PATCH 1/2] regex: fix shell quoting problem in configuration

* m4/regex.m4 (gl_REGEX): Fix quoting problems.
These C programs are put into unquoted here-documents,
so $ and \ need to be quoted.
---
 ChangeLog   | 7 +++++++
 m4/regex.m4 | 6 +++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 25d85aa4a..00d31cdc7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2021-07-14  Paul Eggert  <eggert@cs.ucla.edu>
+
+	regex: fix shell quoting problem in configuration
+	* m4/regex.m4 (gl_REGEX): Fix quoting problems.
+	These C programs are put into unquoted here-documents,
+	so $ and \ need to be quoted.
+
 2021-07-08  Paul Eggert  <eggert@cs.ucla.edu>
 
 	select: port better to MinGW
diff --git a/m4/regex.m4 b/m4/regex.m4
index 850c57222..0e1bafef2 100644
--- a/m4/regex.m4
+++ b/m4/regex.m4
@@ -1,4 +1,4 @@
-# serial 71
+# serial 72
 
 # Copyright (C) 1996-2001, 2003-2021 Free Software Foundation, Inc.
 #
@@ -246,7 +246,7 @@ AC_DEFUN([gl_REGEX],
                            & ~RE_CONTEXT_INVALID_DUP
                            & ~RE_NO_EMPTY_RANGES);
             memset (&regex, 0, sizeof regex);
-            s = re_compile_pattern ("[[:alnum:]_-]\\\\+$", 16, &regex);
+            s = re_compile_pattern ("[[:alnum:]_-]\\\\+\$", 16, &regex);
             if (s)
               result |= 32;
             else
@@ -264,7 +264,7 @@ AC_DEFUN([gl_REGEX],
                back reference.  */
             re_set_syntax (RE_SYNTAX_POSIX_EGREP);
             memset (&regex, 0, sizeof regex);
-            s = re_compile_pattern ("0|()0|\\1|0", 10, &regex);
+            s = re_compile_pattern ("0|()0|\\\\1|0", 10, &regex);
             if (!s)
               result |= 64;
             else
-- 
2.25.1


[-- Attachment #3: 0002-regex-modernize-to-newer-regex-bugset.patch --]
[-- Type: text/x-patch, Size: 5111 bytes --]

From e707dbe7c6da9dd8300cb3d60141f144a7a5d5b1 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 14 Jul 2021 23:55:30 -0500
Subject: [PATCH 2/2] regex: modernize to newer regex bugset

Problem reported by Hiroo Hayashi in:
https://lists.gnu.org/r/bug-gnulib/2021-07/msg00024.html
* m4/regex.m4 (gl_REGEX): Allow newer glibc behavior for ()0|\1,
behavior where the regex compiles but does not match.
Test for glibc bug 11053.
* tests/test-regex.c (bug_regex11, main): Add casts needed
for printf portability.
(main): Allow newer glibc behavior for ()0|\1.
---
 ChangeLog          | 10 ++++++++++
 m4/regex.m4        | 40 ++++++++++++++++++++++++++++++++++++++--
 tests/test-regex.c | 11 ++++++-----
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 00d31cdc7..78590feae 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2021-07-14  Paul Eggert  <eggert@cs.ucla.edu>
 
+	regex: modernize to newer regex bugset
+	Problem reported by Hiroo Hayashi in:
+	https://lists.gnu.org/r/bug-gnulib/2021-07/msg00024.html
+	* m4/regex.m4 (gl_REGEX): Allow newer glibc behavior for ()0|\1,
+	behavior where the regex compiles but does not match.
+	Test for glibc bug 11053.
+	* tests/test-regex.c (bug_regex11, main): Add casts needed
+	for printf portability.
+	(main): Allow newer glibc behavior for ()0|\1.
+
 	regex: fix shell quoting problem in configuration
 	* m4/regex.m4 (gl_REGEX): Fix quoting problems.
 	These C programs are put into unquoted here-documents,
diff --git a/m4/regex.m4 b/m4/regex.m4
index 0e1bafef2..1c7e562f6 100644
--- a/m4/regex.m4
+++ b/m4/regex.m4
@@ -1,4 +1,4 @@
-# serial 72
+# serial 73
 
 # Copyright (C) 1996-2001, 2003-2021 Free Software Foundation, Inc.
 #
@@ -266,12 +266,48 @@ AC_DEFUN([gl_REGEX],
             memset (&regex, 0, sizeof regex);
             s = re_compile_pattern ("0|()0|\\\\1|0", 10, &regex);
             if (!s)
-              result |= 64;
+              {
+                memset (&regs, 0, sizeof regs);
+                i = re_search (&regex, "x", 1, 0, 1, &regs);
+                if (i != -1)
+                  result |= 64;
+                if (0 <= i)
+                  {
+                    free (regs.start);
+                    free (regs.end);
+                  }
+                regfree (&regex);
+              }
             else
               {
                 if (strcmp (s, "Invalid back reference"))
                   result |= 64;
+              }
+
+            /* glibc bug 11053.  */
+            re_set_syntax (RE_SYNTAX_POSIX_BASIC);
+            memset (&regex, 0, sizeof regex);
+            static char const pat_sub2[] = "\\\\(a*\\\\)*a*\\\\1";
+            s = re_compile_pattern (pat_sub2, sizeof pat_sub2 - 1, &regex);
+            if (s)
+              result |= 64;
+            else
+              {
+                memset (&regs, 0, sizeof regs);
+                static char const data[] = "a";
+                int datalen = sizeof data - 1;
+                i = re_search (&regex, data, datalen, 0, datalen, &regs);
+                if (i != 0)
+                  result |= 64;
+                else if (regs.num_regs < 2)
+                  result |= 64;
+                else if (! (regs.start[0] == 0 && regs.end[0] == 1))
+                  result |= 64;
+                else if (! (regs.start[1] == 0 && regs.end[1] == 0))
+                  result |= 64;
                 regfree (&regex);
+                free (regs.start);
+                free (regs.end);
               }
 
 #if 0
diff --git a/tests/test-regex.c b/tests/test-regex.c
index 7ea73cfb6..ed4ca64c0 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -155,7 +155,8 @@ bug_regex11 (void)
 	    if (tests[i].rm[n].rm_so == -1 && tests[i].rm[n].rm_eo == -1)
 	      break;
 	    report_error ("%s: regexec %zd match failure rm[%d] %d..%d",
-                          tests[i].pattern, i, n, rm[n].rm_so, rm[n].rm_eo);
+                          tests[i].pattern, i, n,
+                          (int) rm[n].rm_so, (int) rm[n].rm_eo);
 	    break;
 	  }
 
@@ -433,7 +434,7 @@ main (void)
                       pat_sub2, data, (int) regs.start[0], (int) regs.end[0]);
       else if (! (regs.start[1] == 0 && regs.end[1] == 0))
         report_error ("re_search '%s' on '%s' returned wrong submatch [%d,%d)",
-                      pat_sub2, data, regs.start[1], regs.end[1]);
+                      pat_sub2, data, (int) regs.start[1], (int) regs.end[1]);
       regfree (&regex);
       free (regs.start);
       free (regs.end);
@@ -466,9 +467,9 @@ main (void)
   memset (&regex, 0, sizeof regex);
   static char const pat_badback[] = "0|()0|\\1|0";
   s = re_compile_pattern (pat_badback, sizeof pat_badback, &regex);
-  if (!s)
-    s = "failed to report invalid back reference";
-  if (strcmp (s, "Invalid back reference") != 0)
+  if (!s && re_search (&regex, "x", 1, 0, 1, &regs) != -1)
+    s = "mishandled invalid back reference";
+  if (s && strcmp (s, "Invalid back reference") != 0)
     report_error ("%s: %s", pat_badback, s);
 
 #if 0
-- 
2.25.1



* Re: regex.m4 fails in all glibc
  2021-07-15  5:08 ` Paul Eggert
@ 2021-07-18  8:53   ` Bruno Haible
  0 siblings, 0 replies; 3+ messages in thread
From: Bruno Haible @ 2021-07-18  8:53 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Paul Eggert

Paul Eggert wrote:
> I installed the attached patches. The first patch merely fixes some 
> longstanding quoting problems.

Oh, one needs to look into the generated configure script, in order to
understand the problem.

I view this as an Autoconf documentation bug:
https://savannah.gnu.org/support/index.php?110518

Bruno


