Subject: Re: regex.m4 fails in all glibc
To: 林宏雄, bug-gnulib@gnu.org
From: Paul Eggert
Date: Thu, 15 Jul 2021 00:08:09 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------4E66B65BF23EC797B35D7ADD"
List-Id: Gnulib discussion list
Cc: Jim Meyering

This is a multi-part message in MIME format.
--------------4E66B65BF23EC797B35D7ADD
Content-Type: text/plain; charset=utf-8; format=flowed

On 7/14/21 8:51 AM, 林宏雄 wrote:

> We expect that configure should not use the included regex.c
> on the system with up-to-date glibc.

Recently that has not been the case, as glibc has been trailing behind
Gnulib a bit. At some point I hope they will become more in sync.

> Returning 0 by re_compile_pattern() is correct behavior. It should not fail.

Thank you for reporting the problem.
There is no formal spec for re_compile_pattern and the regular
expression is in some sense invalid and in another sense valid, so it's
not clear which is the correct behavior here. However, Gnulib should
allow the latest glibc behavior, as that behavior is arguably correct.

I installed the attached patches. The first patch merely fixes some
longstanding quoting problems. The second patch updates the Gnulib tests
to match the latest Gnulib, and to allow glibc implementations that
behave as you describe. However, glibc 2.33 (the current version) still
fails to pass the configure-time test, due to glibc bug 11053 which is
fixed in Gnulib; see:

https://sourceware.org/bugzilla/show_bug.cgi?id=11053

--------------4E66B65BF23EC797B35D7ADD
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-regex-fix-shell-quoting-problem-in-configuration.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-regex-fix-shell-quoting-problem-in-configuration.patch"

From 11e6fc32e9c5c770150b47bc0260b1b91110eba0 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Wed, 14 Jul 2021 23:23:20 -0500
Subject: [PATCH 1/2] regex: fix shell quoting problem in configuration

* m4/regex.m4 (gl_REGEX): Fix quoting problems.
These C programs are put into unquoted here-documents,
so $ and \ need to be quoted.
---
 ChangeLog   | 7 +++++++
 m4/regex.m4 | 6 +++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 25d85aa4a..00d31cdc7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2021-07-14  Paul Eggert
+
+	regex: fix shell quoting problem in configuration
+	* m4/regex.m4 (gl_REGEX): Fix quoting problems.
+	These C programs are put into unquoted here-documents,
+	so $ and \ need to be quoted.
+
 2021-07-08  Paul Eggert
 
 	select: port better to MinGW
diff --git a/m4/regex.m4 b/m4/regex.m4
index 850c57222..0e1bafef2 100644
--- a/m4/regex.m4
+++ b/m4/regex.m4
@@ -1,4 +1,4 @@
-# serial 71
+# serial 72
 
 # Copyright (C) 1996-2001, 2003-2021 Free Software Foundation, Inc.
 #
@@ -246,7 +246,7 @@ AC_DEFUN([gl_REGEX],
                           & ~RE_CONTEXT_INVALID_DUP
                           & ~RE_NO_EMPTY_RANGES);
             memset (&regex, 0, sizeof regex);
-            s = re_compile_pattern ("[[:alnum:]_-]\\\\+$", 16, &regex);
+            s = re_compile_pattern ("[[:alnum:]_-]\\\\+\$", 16, &regex);
             if (s)
               result |= 32;
             else
@@ -264,7 +264,7 @@ AC_DEFUN([gl_REGEX],
                back reference.  */
             re_set_syntax (RE_SYNTAX_POSIX_EGREP);
             memset (&regex, 0, sizeof regex);
-            s = re_compile_pattern ("0|()0|\\1|0", 10, &regex);
+            s = re_compile_pattern ("0|()0|\\\\1|0", 10, &regex);
             if (!s)
               result |= 64;
             else
-- 
2.25.1

--------------4E66B65BF23EC797B35D7ADD
Content-Type: text/x-patch; charset=UTF-8;
 name="0002-regex-modernize-to-newer-regex-bugset.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0002-regex-modernize-to-newer-regex-bugset.patch"

From e707dbe7c6da9dd8300cb3d60141f144a7a5d5b1 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Wed, 14 Jul 2021 23:55:30 -0500
Subject: [PATCH 2/2] regex: modernize to newer regex bugset

Problem reported by Hiroo Hayashi in:
https://lists.gnu.org/r/bug-gnulib/2021-07/msg00024.html
* m4/regex.m4 (gl_REGEX): Allow newer glibc behavior for ()0|\1,
behavior where the regex compiles but does not match.
Test for glibc bug 11053.
* tests/test-regex.c (bug_regex11, main): Add casts needed
for printf portability.
(main): Allow newer glibc behavior for ()0|\1.
---
 ChangeLog          | 10 ++++++++++
 m4/regex.m4        | 40 ++++++++++++++++++++++++++++++++++++++--
 tests/test-regex.c | 11 ++++++-----
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 00d31cdc7..78590feae 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2021-07-14  Paul Eggert
 
+	regex: modernize to newer regex bugset
+	Problem reported by Hiroo Hayashi in:
+	https://lists.gnu.org/r/bug-gnulib/2021-07/msg00024.html
+	* m4/regex.m4 (gl_REGEX): Allow newer glibc behavior for ()0|\1,
+	behavior where the regex compiles but does not match.
+	Test for glibc bug 11053.
+	* tests/test-regex.c (bug_regex11, main): Add casts needed
+	for printf portability.
+	(main): Allow newer glibc behavior for ()0|\1.
+
 	regex: fix shell quoting problem in configuration
 	* m4/regex.m4 (gl_REGEX): Fix quoting problems.
 	These C programs are put into unquoted here-documents,
diff --git a/m4/regex.m4 b/m4/regex.m4
index 0e1bafef2..1c7e562f6 100644
--- a/m4/regex.m4
+++ b/m4/regex.m4
@@ -1,4 +1,4 @@
-# serial 72
+# serial 73
 
 # Copyright (C) 1996-2001, 2003-2021 Free Software Foundation, Inc.
 #
@@ -266,12 +266,48 @@ AC_DEFUN([gl_REGEX],
             memset (&regex, 0, sizeof regex);
             s = re_compile_pattern ("0|()0|\\\\1|0", 10, &regex);
             if (!s)
-              result |= 64;
+              {
+                memset (&regs, 0, sizeof regs);
+                i = re_search (&regex, "x", 1, 0, 1, &regs);
+                if (i != -1)
+                  result |= 64;
+                if (0 <= i)
+                  {
+                    free (regs.start);
+                    free (regs.end);
+                  }
+                regfree (&regex);
+              }
             else
               {
                 if (strcmp (s, "Invalid back reference"))
                   result |= 64;
+              }
+
+            /* glibc bug 11053.  */
+            re_set_syntax (RE_SYNTAX_POSIX_BASIC);
+            memset (&regex, 0, sizeof regex);
+            static char const pat_sub2[] = "\\\\(a*\\\\)*a*\\\\1";
+            s = re_compile_pattern (pat_sub2, sizeof pat_sub2 - 1, &regex);
+            if (s)
+              result |= 64;
+            else
+              {
+                memset (&regs, 0, sizeof regs);
+                static char const data[] = "a";
+                int datalen = sizeof data - 1;
+                i = re_search (&regex, data, datalen, 0, datalen, &regs);
+                if (i != 0)
+                  result |= 64;
+                else if (regs.num_regs < 2)
+                  result |= 64;
+                else if (! (regs.start[0] == 0 && regs.end[0] == 1))
+                  result |= 64;
+                else if (! (regs.start[1] == 0 && regs.end[1] == 0))
+                  result |= 64;
                 regfree (&regex);
+                free (regs.start);
+                free (regs.end);
               }
 
 #if 0
diff --git a/tests/test-regex.c b/tests/test-regex.c
index 7ea73cfb6..ed4ca64c0 100644
--- a/tests/test-regex.c
+++ b/tests/test-regex.c
@@ -155,7 +155,8 @@ bug_regex11 (void)
               if (tests[i].rm[n].rm_so == -1 && tests[i].rm[n].rm_eo == -1)
                 break;
               report_error ("%s: regexec %zd match failure rm[%d] %d..%d",
-                            tests[i].pattern, i, n, rm[n].rm_so, rm[n].rm_eo);
+                            tests[i].pattern, i, n,
+                            (int) rm[n].rm_so, (int) rm[n].rm_eo);
               break;
             }
 
@@ -433,7 +434,7 @@ main (void)
                         pat_sub2, data, (int) regs.start[0], (int) regs.end[0]);
       else if (! (regs.start[1] == 0 && regs.end[1] == 0))
         report_error ("re_search '%s' on '%s' returned wrong submatch [%d,%d)",
-                      pat_sub2, data, regs.start[1], regs.end[1]);
+                      pat_sub2, data, (int) regs.start[1], (int) regs.end[1]);
       regfree (&regex);
       free (regs.start);
       free (regs.end);
@@ -466,9 +467,9 @@ main (void)
   memset (&regex, 0, sizeof regex);
   static char const pat_badback[] = "0|()0|\\1|0";
   s = re_compile_pattern (pat_badback, sizeof pat_badback, &regex);
-  if (!s)
-    s = "failed to report invalid back reference";
-  if (strcmp (s, "Invalid back reference") != 0)
+  if (!s && re_search (&regex, "x", 1, 0, 1, &regs) != -1)
+    s = "mishandled invalid back reference";
+  if (s && strcmp (s, "Invalid back reference") != 0)
     report_error ("%s: %s", pat_badback, s);
 
 #if 0
-- 
2.25.1

--------------4E66B65BF23EC797B35D7ADD--
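
For readers who want to try the relaxed back-reference check outside of
configure, here is a minimal standalone sketch in C. It assumes a
glibc-style <regex.h> that exposes the GNU API when _GNU_SOURCE is
defined. The helper names backref_behavior_ok and probe_bad_backref are
hypothetical and appear in neither patch; only the pattern, the syntax
setting, and the two accepted outcomes are taken from the patches above.

```c
#define _GNU_SOURCE 1   /* expose the GNU regex API in <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in Gnulib): return 1 if the observed
   behavior is one of the two outcomes the patched test accepts.
   COMPILE_ERR is the value returned by re_compile_pattern (NULL on
   success).  SEARCH_RESULT is the value returned by re_search and is
   consulted only when compilation succeeded.  */
static int
backref_behavior_ok (char const *compile_err, long search_result)
{
  /* re_search returns -1 for no match and -2 for an internal error.  */
  assert (-2 <= search_result);
  if (compile_err)
    return strcmp (compile_err, "Invalid back reference") == 0;
  return search_result == -1;
}

/* Hypothetical probe (not in Gnulib): run the check against whatever
   regex implementation this program is linked with.  */
static int
probe_bad_backref (void)
{
  static char const pat[] = "0|()0|\\1|0";
  struct re_pattern_buffer regex;
  struct re_registers regs;
  long i = -1;

  re_set_syntax (RE_SYNTAX_POSIX_EGREP);
  memset (&regex, 0, sizeof regex);
  char const *s = re_compile_pattern (pat, sizeof pat - 1, &regex);
  if (!s)
    {
      /* The pattern compiled, so it must at least not match "x".  */
      memset (&regs, 0, sizeof regs);
      i = re_search (&regex, "x", 1, 0, 1, &regs);
      if (0 <= i)
        {
          /* Registers are allocated only when a match was found.  */
          free (regs.start);
          free (regs.end);
        }
      regfree (&regex);
    }
  return backref_behavior_ok (s, i);
}
```

Calling probe_bad_backref from a small main should yield 1 on a library
that either rejects the pattern with "Invalid back reference" or
compiles it and then fails to match "x"; which of the two paths is
taken depends on the library version. Keeping the acceptance rule in a
separate function makes it checkable without depending on that version.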