bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed
From: Paul Eggert <eggert@cs.ucla.edu>
To: arnold@skeeve.com, sam@gentoo.org
Cc: bug-gnulib@gnu.org, concord@gentoo.org, bug-gawk@gnu.org
Subject: Re: Clang-built Gawk 5.2.1 regex oddity
Date: Sun, 1 Jan 2023 22:10:28 -0800	[thread overview]
Message-ID: <1cdb3d41-d675-cca1-7498-c3840f2ee8ad@cs.ucla.edu> (raw)
In-Reply-To: <202301011906.301J6ROQ018104@freefriends.org>

[-- Attachment #1: Type: text/plain, Size: 2142 bytes --]

This is a serious bug in Clang: it generates incorrect machine code.

The code that Clang generates for the following (gawk/support/dfa.c 
lines 1141-1143):

     ((dfa->syntax.dfaopts & DFA_CONFUSING_BRACKETS_ERROR
       ? dfaerror : dfawarn)
      (_("character class syntax is [[:space:]], not [:space:]")));

is immediately followed by the code generated for the following 
(gawk/support/dfa.c line 1015):

                     dfaerror (_("invalid character class"));

and this is incorrect because the two source code regions are not 
connected with each other.

You can see the bug in the attached (compressed) file dfa.s which 
contains the assembly language output. Here's the dfa.s file starting 
with line 6975:

   6975		testb	$4, 456(%r12)
   6976		movl	$dfawarn, %eax
   6977		movl	$dfaerror, %ebx
   6978		cmoveq	%rax, %rbx
   6979		movl	$.L.str.26, %esi
   6980		xorl	%edi, %edi
   6981		movl	$5, %edx
   6982		callq	dcgettext
   6983		movq	%rax, %rdi
   6984		callq	*%rbx
   6985	.LBB34_144:
   6986		movl	$.L.str.25, %esi
   6987		xorl	%edi, %edi
   6988		movl	$5, %edx
   6989		callq	dcgettext
   6990		movq	%rax, %rdi
   6991		callq	dfaerror

Line 6984, which is source lines 1141-1143 call to either dfaerror or 
dfawarn, is immediately followed by the code for source line 1015. This 
means that at runtime when dfawarn returns the code immediately calls 
dfaerror, which is incorrect.

My guess is that Clang got confused because dfaerror is declared 
_Noreturn, so Clang mistakenly assumed that dfawarn is also _Noreturn, 
which it is not.

I worked around the Clang bug by installed the attached patch into 
Gnulib. Please give it a try with Gawk.

Incorrect code generation is a serious bug in Clang; can you please 
report it to the Clang folks? I am considering using a bigger hammer, 
and doing this:

    #define _Noreturn /*empty*/

whenever Clang is used, until the bug is fixed.

This is because if the bug occurs here it's likely that similar bugs 
will occur elsewhere and this sort of thing can be really subtle and 
hard to catch or work around in general. Clang really needs to get this 
fixed.

Thanks.

[-- Attachment #2: dfa.s.gz --]
[-- Type: application/gzip, Size: 37979 bytes --]

[-- Attachment #3: 0001-dfa-work-around-Clang-15-bug.patch --]
[-- Type: text/x-patch, Size: 2032 bytes --]

From 8805a44cf04253f63bce160054e2fbf21ab9beb1 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 1 Jan 2023 22:06:10 -0800
Subject: [PATCH] dfa: work around Clang 15 bug

Problem reported by Kenton Groombridge in:
https://lists.gnu.org/archive/html/bug-gawk/2022-12/msg00010.html
On x86-64, Clang 15 gets confused by a call (X ? dfaerror :
dfawarn) (Y) and generates the wrong code, presumably because
dfaerror is _Noreturn and dfawarn is not.
* lib/dfa.c (parse_bracket_exp): Reword to have one call for
dfaerror, the other for dfawarn.
---
 ChangeLog | 11 +++++++++++
 lib/dfa.c | 11 ++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 59500558e4..ac3d388c2b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2023-01-01  Paul Eggert  <eggert@cs.ucla.edu>
+
+	dfa: work around Clang 15 bug
+	Problem reported by Kenton Groombridge in:
+	https://lists.gnu.org/archive/html/bug-gawk/2022-12/msg00010.html
+	On x86-64, Clang 15 gets confused by a call (X ? dfaerror :
+	dfawarn) (Y) and generates the wrong code, presumably because
+	dfaerror is _Noreturn and dfawarn is not.
+	* lib/dfa.c (parse_bracket_exp): Reword to have one call for
+	dfaerror, the other for dfawarn.
+
 2023-01-01  Bruno Haible  <bruno@clisp.org>
 
 	doc: Update regarding stable branches.
diff --git a/lib/dfa.c b/lib/dfa.c
index 57df1e0421..211e1ed18f 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -1138,9 +1138,14 @@ parse_bracket_exp (struct dfa *dfa)
   while ((wc = wc1, (c = c1) != ']'));
 
   if (colon_warning_state == 7)
-    ((dfa->syntax.dfaopts & DFA_CONFUSING_BRACKETS_ERROR
-      ? dfaerror : dfawarn)
-     (_("character class syntax is [[:space:]], not [:space:]")));
+    {
+      char const *msg
+        = _("character class syntax is [[:space:]], not [:space:]");
+      if (dfa->syntax.dfaopts & DFA_CONFUSING_BRACKETS_ERROR)
+        dfaerror (msg);
+      else
+        dfawarn (msg);
+    }
 
   if (! known_bracket_exp)
     return BACKREF;
-- 
2.37.2


  reply	other threads:[~2023-01-02  6:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20221230000119.hyui6umnspuyzqum@bubbles>
     [not found] ` <202212300913.2BU9DV6V030160@freefriends.org>
     [not found]   ` <6DADB0FC-87EE-4028-91DF-C93A968A8982@gentoo.org>
2023-01-01 19:06     ` Clang-built Gawk 5.2.1 regex oddity arnold
2023-01-02  6:10       ` Paul Eggert [this message]
2023-01-03  2:14         ` Sam James
2023-01-03  2:43           ` Sam James
2023-01-05 23:06         ` Arsen Arsenović
2023-01-06 12:21           ` arnold
2023-01-13  7:03           ` Sam James

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.gnu.org/mailman/listinfo/bug-gnulib

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1cdb3d41-d675-cca1-7498-c3840f2ee8ad@cs.ucla.edu \
    --to=eggert@cs.ucla.edu \
    --cc=arnold@skeeve.com \
    --cc=bug-gawk@gnu.org \
    --cc=bug-gnulib@gnu.org \
    --cc=concord@gentoo.org \
    --cc=sam@gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).