bug-gnulib@gnu.org mirror (unofficial)
* Re: [PATCH] regex: port to Gawk on nonstandard platforms
@ 2020-01-26  9:42 arnold
  2020-01-27 21:09 ` Paul Eggert
  0 siblings, 1 reply; 3+ messages in thread
From: arnold @ 2020-01-26  9:42 UTC
  To: eggert; +Cc: bug-gnulib

Hi, Paul.

> diff --git a/lib/regex_internal.h b/lib/regex_internal.h
> index 13e15e21e..6d436fde1 100644
> --- a/lib/regex_internal.h
> +++ b/lib/regex_internal.h
> @@ -141,6 +141,9 @@
>  #ifndef SSIZE_MAX
>  # define SSIZE_MAX ((ssize_t) (SIZE_MAX / 2))
>  #endif
> +#ifndef ULONG_WIDTH
> +# define ULONG_WIDTH (CHAR_BIT * sizeof (unsigned long int))
> +#endif
>  
>  /* The type of indexes into strings.  This is signed, not size_t,
>     since the API requires indexes to fit in regoff_t anyway, and using

This change is problematic.  Further on in regex_internal.h we
have

	#define BITSET_WORD_BITS ULONG_WIDTH

And then in places in regcomp.c BITSET_WORD_BITS is tested in
several #if/#elif statements.

Thus on systems that don't provide ULONG_WIDTH, we end up with
expressions in #if/#elif that want to use sizeof, which the
preprocessor cannot evaluate.

Needless to say, this fails spectacularly. :-(
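
A minimal sketch of the failure described above, for illustration only.
In an #if expression, identifiers that are not macros (including
keywords such as sizeof) are replaced by 0 before evaluation, so a
sizeof-based width is a syntax error there, while ULONG_MAX from
<limits.h> remains a usable integer constant. The names
SIZEOF_BASED_WIDTH, DEMO_WORD_BITS and demo_check are hypothetical and
appear in neither gnulib nor gawk.

```c
#include <assert.h>
#include <limits.h>

/* Same shape as the fallback in the quoted hunk: fine in ordinary
   code, but not usable in #if.  (Hypothetical name.)  */
#define SIZEOF_BASED_WIDTH (CHAR_BIT * sizeof (unsigned long int))

/* Uncommenting the next two lines reproduces the problem: the
   preprocessor sees roughly "(8 * 0 (0 0 0)) == 64" and rejects it.  */
/* #if SIZEOF_BASED_WIDTH == 64 */
/* #endif */

/* ULONG_MAX can be tested in #if.  The shift is split as ">> 31 >> 1"
   so that no single shift count reaches the operand's width.  */
#if ULONG_MAX == 0xFFFFFFFFUL
# define DEMO_WORD_BITS 32
#elif ULONG_MAX >> 31 >> 1 == 0xFFFFFFFFUL
# define DEMO_WORD_BITS 64
#else
# error "this sketch only handles 32-bit and 64-bit unsigned long"
#endif

/* At run time both definitions should agree, assuming unsigned long
   has no padding bits.  (Hypothetical helper.)  */
void
demo_check (void)
{
  assert (DEMO_WORD_BITS == SIZEOF_BASED_WIDTH);
}
```

This only handles two widths; it is meant to show why the definition
must be a preprocessor-evaluable constant, not to suggest a fix.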

Can you revert to the original code or to something else that
will compile on systems where ULONG_WIDTH is not defined?

Much thanks,

Arnold



* Re: [PATCH] regex: port to Gawk on nonstandard platforms
  2020-01-26  9:42 [PATCH] regex: port to Gawk on nonstandard platforms arnold
@ 2020-01-27 21:09 ` Paul Eggert
  2020-01-28  7:41   ` arnold
  0 siblings, 1 reply; 3+ messages in thread
From: Paul Eggert @ 2020-01-27 21:09 UTC
  To: arnold; +Cc: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

On 1/26/20 1:42 AM, arnold@skeeve.com wrote:
> And then in places in regcomp.c BITSET_WORD_BITS is tested in
> several #if/#elif statements.

Ouch, I hadn't noticed that. It's exercised only on non-GCC platforms 
that don't support INT_WIDTH etc., which is why I didn't see it in my 
testing. I installed the first attached patch, which should fix it. 
Thanks for reporting it.

While I was at it I also installed the second attached patch, since the 
regex code no longer depends on the limits-h module. This second patch 
shouldn't affect Awk.

[-- Attachment #2: 0001-regex-port-to-non-GCC-pre-IEC-60559.patch --]
[-- Type: text/x-patch, Size: 2344 bytes --]

From cc27f179a6f3c17bda8c8bad5fa125864603bae2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 27 Jan 2020 13:00:57 -0800
Subject: [PATCH 1/2] regex: port to non-GCC pre-IEC-60559

Problem reported by Arnold Robbins in:
https://lists.gnu.org/r/bug-gnulib/2020-01/msg00154.html
* lib/regex_internal.h (ULONG_WIDTH): Make this usable in #if.
---
 ChangeLog            |  7 +++++++
 lib/regex_internal.h | 17 ++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index a4ea8009b..d3d1942a1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2020-01-27  Paul Eggert  <eggert@cs.ucla.edu>
+
+	regex: port to non-GCC pre-IEC-60559
+	Problem reported by Arnold Robbins in:
+	https://lists.gnu.org/r/bug-gnulib/2020-01/msg00154.html
+	* lib/regex_internal.h (ULONG_WIDTH): Make this usable in #if.
+
 2020-01-25  Bruno Haible  <bruno@clisp.org>
 
 	c32isxdigit: Add tests.
diff --git a/lib/regex_internal.h b/lib/regex_internal.h
index 6d436fde1..8c42586c4 100644
--- a/lib/regex_internal.h
+++ b/lib/regex_internal.h
@@ -142,7 +142,22 @@
 # define SSIZE_MAX ((ssize_t) (SIZE_MAX / 2))
 #endif
 #ifndef ULONG_WIDTH
-# define ULONG_WIDTH (CHAR_BIT * sizeof (unsigned long int))
+# define ULONG_WIDTH REGEX_UINTEGER_WIDTH (ULONG_MAX)
+/* The number of usable bits in an unsigned integer type with maximum
+   value MAX, as an int expression suitable in #if.  Cover all known
+   practical hosts.  This implementation exploits the fact that MAX is
+   1 less than a power of 2, and merely counts the number of 1 bits in
+   MAX; "COBn" means "count the number of 1 bits in the low-order n bits".  */
+# define REGEX_UINTEGER_WIDTH(max) REGEX_COB128 (max)
+# define REGEX_COB128(n) (REGEX_COB64 ((n) >> 31 >> 31 >> 2) + REGEX_COB64 (n))
+# define REGEX_COB64(n) (REGEX_COB32 ((n) >> 31 >> 1) + REGEX_COB32 (n))
+# define REGEX_COB32(n) (REGEX_COB16 ((n) >> 16) + REGEX_COB16 (n))
+# define REGEX_COB16(n) (REGEX_COB8 ((n) >> 8) + REGEX_COB8 (n))
+# define REGEX_COB8(n) (REGEX_COB4 ((n) >> 4) + REGEX_COB4 (n))
+# define REGEX_COB4(n) (!!((n) & 8) + !!((n) & 4) + !!((n) & 2) + ((n) & 1))
+# if ULONG_MAX / 2 + 1 != 1ul << (ULONG_WIDTH - 1)
+#  error "ULONG_MAX out of range"
+# endif
 #endif
 
 /* The type of indexes into strings.  This is signed, not size_t,
-- 
2.24.1
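
A standalone sketch of how the bit-counting macros in the patch above
behave, assuming a platform where unsigned long has no padding bits.
The REGEX_* macros are copied from the patch; DEMO_ULONG_WIDTH and
demo_ulong_width are hypothetical names used here so the example does
not clash with a ULONG_WIDTH that <limits.h> may already define. The
split shifts (">> 31 >> 1" rather than ">> 32") presumably keep every
shift count below the width of the operand.

```c
#include <assert.h>
#include <limits.h>

/* Copied from the patch: count the 1 bits in the low-order n bits.  */
#define REGEX_UINTEGER_WIDTH(max) REGEX_COB128 (max)
#define REGEX_COB128(n) (REGEX_COB64 ((n) >> 31 >> 31 >> 2) + REGEX_COB64 (n))
#define REGEX_COB64(n) (REGEX_COB32 ((n) >> 31 >> 1) + REGEX_COB32 (n))
#define REGEX_COB32(n) (REGEX_COB16 ((n) >> 16) + REGEX_COB16 (n))
#define REGEX_COB16(n) (REGEX_COB8 ((n) >> 8) + REGEX_COB8 (n))
#define REGEX_COB8(n) (REGEX_COB4 ((n) >> 4) + REGEX_COB4 (n))
#define REGEX_COB4(n) (!!((n) & 8) + !!((n) & 4) + !!((n) & 2) + ((n) & 1))

/* Hypothetical stand-in for the patch's ULONG_WIDTH fallback.  */
#define DEMO_ULONG_WIDTH REGEX_UINTEGER_WIDTH (ULONG_MAX)

/* Unlike the sizeof-based definition, this works in #if.  */
#if DEMO_ULONG_WIDTH < 32
# error "unsigned long narrower than the C standard permits"
#endif

/* Same sanity check as the patch: ULONG_MAX must be 2**width - 1.  */
#if ULONG_MAX / 2 + 1 != 1ul << (DEMO_ULONG_WIDTH - 1)
# error "ULONG_MAX out of range"
#endif

/* The macro is also an ordinary expression at run time.  */
int
demo_ulong_width (void)
{
  return DEMO_ULONG_WIDTH;
}
```

Because ULONG_MAX is one less than a power of 2, counting its 1 bits
gives the width directly; for example, REGEX_COB8 (0xFF) is 8 and
REGEX_COB16 (0x7FFF) is 15.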


[-- Attachment #3: 0002-regex-remove-limits-h-dependency.patch --]
[-- Type: text/x-patch, Size: 1478 bytes --]

From 55cb9de6ff5f6d382da4efe6c47a0fad5b00c4cf Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 27 Jan 2020 13:07:22 -0800
Subject: [PATCH 2/2] regex: remove limits-h dependency

* modules/regex (Depends-on): Remove limits-h, since the
code no longer depends on ULONG_WIDTH already being defined.
---
 ChangeLog     | 4 ++++
 modules/regex | 1 -
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index d3d1942a1..a861f4996 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
 2020-01-27  Paul Eggert  <eggert@cs.ucla.edu>
 
+	regex: remove limits-h dependency
+	* modules/regex (Depends-on): Remove limits-h, since the
+	code no longer depends on ULONG_WIDTH already being defined.
+
 	regex: port to non-GCC pre-IEC-60559
 	Problem reported by Arnold Robbins in:
 	https://lists.gnu.org/r/bug-gnulib/2020-01/msg00154.html
diff --git a/modules/regex b/modules/regex
index bd38cd2d4..9d77df7ae 100644
--- a/modules/regex
+++ b/modules/regex
@@ -22,7 +22,6 @@ builtin-expect  [test $ac_use_included_regex = yes]
 intprops        [test $ac_use_included_regex = yes]
 langinfo        [test $ac_use_included_regex = yes]
 libc-config     [test $ac_use_included_regex = yes]
-limits-h        [test $ac_use_included_regex = yes]
 lock      [test "$ac_cv_gnu_library_2_1:$ac_use_included_regex" = no:yes]
 memcmp          [test $ac_use_included_regex = yes]
 memmove         [test $ac_use_included_regex = yes]
-- 
2.24.1



* Re: [PATCH] regex: port to Gawk on nonstandard platforms
  2020-01-27 21:09 ` Paul Eggert
@ 2020-01-28  7:41   ` arnold
  0 siblings, 0 replies; 3+ messages in thread
From: arnold @ 2020-01-28  7:41 UTC
  To: eggert, arnold; +Cc: bug-gnulib

Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 1/26/20 1:42 AM, arnold@skeeve.com wrote:
> > And then in places in regcomp.c BITSET_WORD_BITS is tested in
> > several #if/#elif statements.
>
> Ouch, I hadn't noticed that. It's exercised only on non-GCC platforms 
> that don't support INT_WIDTH etc., which is why I didn't see it in my 
> testing. I installed the first attached patch, which should fix it. 
> Thanks for reporting it.
>
> While I was at it I also installed the second attached patch, since the 
> regex code no longer depends on the limits-h module. This second patch 
> shouldn't affect Awk.

Much thanks for the fix. I have pulled it into gawk and we'll see
what my testers report.

Thanks,

Arnold

