bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed
* Re: [bug-libunistring] Unistring 0.9.10 and Undefined Behavior sanitizer findings
       [not found] <CAH8yC8kNZpPFZQdUD_brywwH75hyae=_Xsw=5MWgVQNjYfqMpg@mail.gmail.com>
@ 2019-03-08 23:36 ` Bruno Haible
  2019-03-09  1:29   ` Jeffrey Walton
  0 siblings, 1 reply; 3+ messages in thread
From: Bruno Haible @ 2019-03-08 23:36 UTC (permalink / raw)
  To: bug-libunistring, noloader; +Cc: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 4360 bytes --]

Hi,

Jeffrey Walton wrote in
<https://lists.gnu.org/archive/html/bug-libunistring/2019-03/msg00002.html>:
> I used CFLAGS="-DNDEBUG -g2 -O1 -fsanitize=undefined". I built the
> library, then ran a 'make test V=1'. The output presented to the user
> looks OK. However, it looks like UBsan found some undefined behavior:
> 
> libunistring-0.9.10$ grep -IR 'runtime error'
> tests/test-u8-width-linebreaks.log:unilbrk/u8-possible-linebreaks.c:49:3:
> runtime error: null pointer passed as argument 1, which is declared to
> never be null
> tests/test-u32-to-u8.log:unistr/u8-cmp.c:38:10: runtime error: null
> pointer passed as argument 2, which is declared to never be null
> tests/test-numeric.log:unictype/numeric.c:52:54: runtime error: left
> shift of 34927 by 16 places cannot be represented in type 'int'
> tests/test-nfkc.log:unistr/u8-cmp.c:38:10: runtime error: null pointer
> passed as argument 2, which is declared to never be null
> tests/test-u8-tolower.log:unistr/u8-cmp.c:38:10: runtime error: null
> pointer passed as argument 2, which is declared to never be null
> tests/test-u16-to-u8.log:unistr/u8-cmp.c:38:10: runtime error: null
> pointer passed as argument 2, which is declared to never be null
> tests/test-nfkd.log:unistr/u8-cmp.c:38:10: runtime error: null pointer
> passed as argument 2, which is declared to never be null
> tests/test-u8-casefold.log:unistr/u8-cmp.c:38:10: runtime error: null
> pointer passed as argument 2, which is declared to never be null
> tests/unicase/test-ulc-casecmp2.sh.log:uniconv/u8-conv-from-enc.c:89:7:
> runtime error: null pointer passed as argument 2, which is declared to
> never be null
> tests/unicase/test-ulc-casecoll2.sh.log:uniconv/u8-conv-from-enc.c:89:7:
> runtime error: null pointer passed as argument 2, which is declared to
> never be null
> tests/test-nfc.log:unistr/u8-cmp.c:38:10: runtime error: null pointer
> passed as argument 2, which is declared to never be null
> tests/test-u8-totitle.log:unistr/u8-cmp.c:38:10: runtime error: null
> pointer passed as argument 2, which is declared to never be null
> tests/test-nfd.log:unistr/u8-cmp.c:38:10: runtime error: null pointer
> passed as argument 2, which is declared to never be null
> tests/test-u16-possible-linebreaks.log:unilbrk/u16-possible-linebreaks.c:49:3:
> runtime error: null pointer passed as argument 1, which is declared to
> never be null
> tests/test-u16-width-linebreaks.log:unilbrk/u16-possible-linebreaks.c:49:3:
> runtime error: null pointer passed as argument 1, which is declared to
> never be null
> tests/test-u8-toupper.log:unistr/u8-cmp.c:38:10: runtime error: null
> pointer passed as argument 2, which is declared to never be null
> tests/test-u8-possible-linebreaks.log:unilbrk/u8-possible-linebreaks.c:49:3:
> runtime error: null pointer passed as argument 1, which is declared to
> never be null

Thanks for these reports. Indeed, use of CC="gcc -fsanitize=undefined"
produces error messages in the 'make check' logs.

1) unictype/numeric.c:52:54: runtime error: left shift of 34927 by 16 places cannot be represented in type 'int'

The code was implicitly casting an 'unsigned short' to 'int' and doing a
shift of the result. It is better to case from 'unsigned short' to
'unsigned int' explicitly.

2) unistr/u8-cmp.c:38:10: runtime error: null pointer passed as argument 2, which is declared to never be null

memcmp(NULL, NULL, 0) is invalid in ISO C and POSIX. ISO C11 7.24.1.(2)
states: "Where an argument declared as size_t n specifies the length of
the array for a function, n can have the value zero on a call to that function.
Unless explicitly stated otherwise in the description of a particular function
in this subclause, pointer arguments on such a call shall still have valid
values, as described in 7.1.4."

References:
https://lists.gnu.org/archive/html/bug-gnulib/2009-05/msg00156.html
https://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp

3) uniconv/u8-conv-from-enc.c:89:7: runtime error: null pointer passed as argument 2, which is declared to never be null

memcpy(dest, NULL, 0) is invalid in ISO C and POSIX, similarly.

4) unilbrk/u8-possible-linebreaks.c:49:3: runtime error: null pointer passed as argument 1, which is declared to never be null

memset(NULL, c, 0) is invalid in ISO C and POSIX, similarly.

I'm pushing the attached patches.



[-- Attachment #2: 0001-unictype-numeric-Fix-undefined-behaviour.patch --]
[-- Type: text/x-patch, Size: 4224 bytes --]

From 01ec92af9fda3163f3379d5932991b47f011aa33 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Fri, 8 Mar 2019 19:17:37 +0100
Subject: [PATCH 1/4] unictype/numeric: Fix undefined behaviour.

Reported by Jeffrey Walton <noloader@gmail.com>.

* lib/unictype/numeric.c (uc_numeric_value): Avoid undefined behaviour
on shift overflow, caught by "gcc -fsanitize=undefined".
* lib/unictype/bidi_of.c (uc_bidi_class): Add cast, for clarity.
* lib/unictype/categ_of.c (lookup_withtable): Likewise.
* lib/unictype/joininggroup_of.c (uc_joining_group): Likewise.
---
 ChangeLog                      | 10 ++++++++++
 lib/unictype/bidi_of.c         |  2 +-
 lib/unictype/categ_of.c        |  2 +-
 lib/unictype/joininggroup_of.c |  2 +-
 lib/unictype/numeric.c         |  4 ++--
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index e5ba157..2e61261 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2019-03-08  Bruno Haible  <bruno@clisp.org>
+
+	unictype/numeric: Fix undefined behaviour.
+	Reported by Jeffrey Walton <noloader@gmail.com>.
+	* lib/unictype/numeric.c (uc_numeric_value): Avoid undefined behaviour
+	on shift overflow, caught by "gcc -fsanitize=undefined".
+	* lib/unictype/bidi_of.c (uc_bidi_class): Add cast, for clarity.
+	* lib/unictype/categ_of.c (lookup_withtable): Likewise.
+	* lib/unictype/joininggroup_of.c (uc_joining_group): Likewise.
+
 2019-03-05  Paul Eggert  <eggert@cs.ucla.edu>
 
 	git-version-gen: fix --version copyright year
diff --git a/lib/unictype/bidi_of.c b/lib/unictype/bidi_of.c
index 7e52117..3db1f3a 100644
--- a/lib/unictype/bidi_of.c
+++ b/lib/unictype/bidi_of.c
@@ -39,7 +39,7 @@ uc_bidi_class (ucs4_t uc)
               unsigned int index3 = ((uc & bidi_category_header_4) + lookup2) * 5;
               /* level3 contains 5-bit values, packed into 16-bit words.  */
               unsigned int lookup3 =
-                ((u_bidi_category.level3[index3>>4]
+                (((unsigned int) u_bidi_category.level3[index3>>4]
                   | ((unsigned int) u_bidi_category.level3[(index3>>4)+1] << 16))
                  >> (index3 % 16))
                 & 0x1f;
diff --git a/lib/unictype/categ_of.c b/lib/unictype/categ_of.c
index a1e6762..4c3f127 100644
--- a/lib/unictype/categ_of.c
+++ b/lib/unictype/categ_of.c
@@ -39,7 +39,7 @@ lookup_withtable (ucs4_t uc)
               unsigned int index3 = ((uc & category_header_4) + lookup2) * 5;
               /* level3 contains 5-bit values, packed into 16-bit words.  */
               unsigned int lookup3 =
-                ((u_category.level3[index3>>4]
+                (((unsigned int) u_category.level3[index3>>4]
                   | ((unsigned int) u_category.level3[(index3>>4)+1] << 16))
                  >> (index3 % 16))
                 & 0x1f;
diff --git a/lib/unictype/joininggroup_of.c b/lib/unictype/joininggroup_of.c
index db254a9..3b2a75e 100644
--- a/lib/unictype/joininggroup_of.c
+++ b/lib/unictype/joininggroup_of.c
@@ -39,7 +39,7 @@ uc_joining_group (ucs4_t uc)
               unsigned int index3 = ((uc & joining_group_header_4) + lookup2) * 7;
               /* level3 contains 7-bit values, packed into 16-bit words.  */
               unsigned int lookup3 =
-                ((u_joining_group.level3[index3>>4]
+                (((unsigned int) u_joining_group.level3[index3>>4]
                   | ((unsigned int) u_joining_group.level3[(index3>>4)+1] << 16))
                  >> (index3 % 16))
                 & 0x7f;
diff --git a/lib/unictype/numeric.c b/lib/unictype/numeric.c
index fd6eab9..a999183 100644
--- a/lib/unictype/numeric.c
+++ b/lib/unictype/numeric.c
@@ -39,8 +39,8 @@ uc_numeric_value (ucs4_t uc)
               unsigned int index3 = ((uc & numeric_header_4) + lookup2) * 8;
               /* level3 contains 8-bit values, packed into 16-bit words.  */
               unsigned int lookup3 =
-                ((u_numeric.level3[index3>>4]
-                  | (u_numeric.level3[(index3>>4)+1] << 16))
+                (((unsigned int) u_numeric.level3[index3>>4]
+                  | ((unsigned int) u_numeric.level3[(index3>>4)+1] << 16))
                  >> (index3 % 16))
                 & 0xff;
 
-- 
2.7.4


[-- Attachment #3: 0002-unistr-u8-cmp-Fix-undefined-behaviour.patch --]
[-- Type: text/x-patch, Size: 1421 bytes --]

From 7e7501fa24bcd2e1484fee5648d88b8d94f0bf38 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Fri, 8 Mar 2019 20:38:22 +0100
Subject: [PATCH 2/4] unistr/u8-cmp: Fix undefined behaviour.

Reported by Jeffrey Walton <noloader@gmail.com>.

* lib/unistr/u8-cmp.c (u8_cmp): Don't invoke memcmp if n is zero.
---
 ChangeLog           | 6 ++++++
 lib/unistr/u8-cmp.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 2e61261..074b1ca 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2019-03-08  Bruno Haible  <bruno@clisp.org>
 
+	unistr/u8-cmp: Fix undefined behaviour.
+	Reported by Jeffrey Walton <noloader@gmail.com>.
+	* lib/unistr/u8-cmp.c (u8_cmp): Don't invoke memcmp if n is zero.
+
+2019-03-08  Bruno Haible  <bruno@clisp.org>
+
 	unictype/numeric: Fix undefined behaviour.
 	Reported by Jeffrey Walton <noloader@gmail.com>.
 	* lib/unictype/numeric.c (uc_numeric_value): Avoid undefined behaviour
diff --git a/lib/unistr/u8-cmp.c b/lib/unistr/u8-cmp.c
index bd2f11b..5349ec6 100644
--- a/lib/unistr/u8-cmp.c
+++ b/lib/unistr/u8-cmp.c
@@ -26,5 +26,5 @@ int
 u8_cmp (const uint8_t *s1, const uint8_t *s2, size_t n)
 {
   /* Use the fact that the UTF-8 encoding preserves lexicographic order.  */
-  return memcmp ((const char *) s1, (const char *) s2, n);
+  return n == 0 ? 0 : memcmp ((const char *) s1, (const char *) s2, n);
 }
-- 
2.7.4


[-- Attachment #4: 0003-unistr-uniconv-Fix-undefined-behaviour.patch --]
[-- Type: text/x-patch, Size: 3270 bytes --]

From f6f567a94d1f34b96ae05e2354bfaa994c9de840 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 9 Mar 2019 00:01:47 +0100
Subject: [PATCH 3/4] unistr/*, uniconv/*: Fix undefined behaviour.

Reported by Jeffrey Walton <noloader@gmail.com>.

* lib/unistr/u-cpy.h (FUNC): Don't invoke memcpy with a zero size.
* lib/unistr/u-cpy-alloc.h (FUNC): Likewise.
* lib/uniconv/u8-conv-from-enc.c (u8_conv_from_encoding): Likewise.
* lib/uniconv/u8-conv-to-enc.c (u8_conv_to_encoding): Likewise.
---
 ChangeLog                      | 9 +++++++++
 lib/uniconv/u8-conv-from-enc.c | 3 ++-
 lib/uniconv/u8-conv-to-enc.c   | 3 ++-
 lib/unistr/u-cpy-alloc.h       | 3 ++-
 lib/unistr/u-cpy.h             | 3 ++-
 5 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 074b1ca..687357e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,14 @@
 2019-03-08  Bruno Haible  <bruno@clisp.org>
 
+	unistr/*, uniconv/*: Fix undefined behaviour.
+	Reported by Jeffrey Walton <noloader@gmail.com>.
+	* lib/unistr/u-cpy.h (FUNC): Don't invoke memcpy with a zero size.
+	* lib/unistr/u-cpy-alloc.h (FUNC): Likewise.
+	* lib/uniconv/u8-conv-from-enc.c (u8_conv_from_encoding): Likewise.
+	* lib/uniconv/u8-conv-to-enc.c (u8_conv_to_encoding): Likewise.
+
+2019-03-08  Bruno Haible  <bruno@clisp.org>
+
 	unistr/u8-cmp: Fix undefined behaviour.
 	Reported by Jeffrey Walton <noloader@gmail.com>.
 	* lib/unistr/u8-cmp.c (u8_cmp): Don't invoke memcmp if n is zero.
diff --git a/lib/uniconv/u8-conv-from-enc.c b/lib/uniconv/u8-conv-from-enc.c
index 623dc1f..f8e9bd9 100644
--- a/lib/uniconv/u8-conv-from-enc.c
+++ b/lib/uniconv/u8-conv-from-enc.c
@@ -77,7 +77,8 @@ u8_conv_from_encoding (const char *fromcode,
             }
         }
 
-      memcpy ((char *) result, src, srclen);
+      if (srclen > 0)
+        memcpy ((char *) result, src, srclen);
       *lengthp = srclen;
       return result;
     }
diff --git a/lib/uniconv/u8-conv-to-enc.c b/lib/uniconv/u8-conv-to-enc.c
index ff0bc64..56e3121 100644
--- a/lib/uniconv/u8-conv-to-enc.c
+++ b/lib/uniconv/u8-conv-to-enc.c
@@ -60,7 +60,8 @@ u8_conv_to_encoding (const char *tocode,
             }
         }
 
-      memcpy (result, (const char *) src, srclen);
+      if (srclen > 0)
+        memcpy (result, (const char *) src, srclen);
       *lengthp = srclen;
       return result;
     }
diff --git a/lib/unistr/u-cpy-alloc.h b/lib/unistr/u-cpy-alloc.h
index 8afa66b..55fef32 100644
--- a/lib/unistr/u-cpy-alloc.h
+++ b/lib/unistr/u-cpy-alloc.h
@@ -33,7 +33,8 @@ FUNC (const UNIT *s, size_t n)
       for (; n > 0; n--)
         *destptr++ = *s++;
 #else
-      memcpy ((char *) dest, (const char *) s, n * sizeof (UNIT));
+      if (n > 0)
+        memcpy ((char *) dest, (const char *) s, n * sizeof (UNIT));
 #endif
     }
   return dest;
diff --git a/lib/unistr/u-cpy.h b/lib/unistr/u-cpy.h
index 1dc2129..c023a27 100644
--- a/lib/unistr/u-cpy.h
+++ b/lib/unistr/u-cpy.h
@@ -26,7 +26,8 @@ FUNC (UNIT *dest, const UNIT *src, size_t n)
   for (; n > 0; n--)
     *destptr++ = *src++;
 #else
-  memcpy ((char *) dest, (const char *) src, n * sizeof (UNIT));
+  if (n > 0)
+    memcpy ((char *) dest, (const char *) src, n * sizeof (UNIT));
 #endif
   return dest;
 }
-- 
2.7.4


[-- Attachment #5: 0004-unilbrk-u-possible-linebreaks-Fix-undefined-behaviou.patch --]
[-- Type: text/x-patch, Size: 28950 bytes --]

From fa418ebbdf307d0efc1e61677045f48de5a93398 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 9 Mar 2019 00:27:19 +0100
Subject: [PATCH 4/4] unilbrk/u*-possible-linebreaks: Fix undefined behaviour.

Reported by Jeffrey Walton <noloader@gmail.com>.

* lib/unilbrk/u8-possible-linebreaks.c (u8_possible_linebreaks): Don't
invoke memset with a zero size.
* lib/unilbrk/u16-possible-linebreaks.c (u16_possible_linebreaks):
Likewise.
* lib/unilbrk/u32-possible-linebreaks.c (u32_possible_linebreaks):
Adjust accordingly.
---
 ChangeLog                             |  11 ++
 lib/unilbrk/u16-possible-linebreaks.c | 212 +++++++++++++++++-----------------
 lib/unilbrk/u32-possible-linebreaks.c | 204 ++++++++++++++++----------------
 lib/unilbrk/u8-possible-linebreaks.c  | 212 +++++++++++++++++-----------------
 4 files changed, 331 insertions(+), 308 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 687357e..a67ed92 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,16 @@
 2019-03-08  Bruno Haible  <bruno@clisp.org>
 
+	unilbrk/u*-possible-linebreaks: Fix undefined behaviour.
+	Reported by Jeffrey Walton <noloader@gmail.com>.
+	* lib/unilbrk/u8-possible-linebreaks.c (u8_possible_linebreaks): Don't
+	invoke memset with a zero size.
+	* lib/unilbrk/u16-possible-linebreaks.c (u16_possible_linebreaks):
+	Likewise.
+	* lib/unilbrk/u32-possible-linebreaks.c (u32_possible_linebreaks):
+	Adjust accordingly.
+
+2019-03-08  Bruno Haible  <bruno@clisp.org>
+
 	unistr/*, uniconv/*: Fix undefined behaviour.
 	Reported by Jeffrey Walton <noloader@gmail.com>.
 	* lib/unistr/u-cpy.h (FUNC): Don't invoke memcpy with a zero size.
diff --git a/lib/unilbrk/u16-possible-linebreaks.c b/lib/unilbrk/u16-possible-linebreaks.c
index 3ee91ad..0a91368 100644
--- a/lib/unilbrk/u16-possible-linebreaks.c
+++ b/lib/unilbrk/u16-possible-linebreaks.c
@@ -30,135 +30,139 @@
 void
 u16_possible_linebreaks (const uint16_t *s, size_t n, const char *encoding, char *p)
 {
-  int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
-  const uint16_t *s_end = s + n;
-  int last_prop = LBP_BK; /* line break property of last non-space character */
-  char *seen_space = NULL; /* Was a space seen after the last non-space character? */
-  char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
-
-  /* Don't break inside multibyte characters.  */
-  memset (p, UC_BREAK_PROHIBITED, n);
-
-  while (s < s_end)
+  if (n > 0)
     {
-      ucs4_t uc;
-      int count = u16_mbtouc_unsafe (&uc, s, s_end - s);
-      int prop = unilbrkprop_lookup (uc);
+      int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
+      const uint16_t *s_end = s + n;
+      int last_prop = LBP_BK; /* line break property of last non-space character */
+      char *seen_space = NULL; /* Was a space seen after the last non-space character? */
+      char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
 
-      if (prop == LBP_BK)
-        {
-          /* Mandatory break.  */
-          *p = UC_BREAK_MANDATORY;
-          last_prop = LBP_BK;
-          seen_space = NULL;
-          seen_space2 = NULL;
-        }
-      else
-        {
-          char *q;
+      /* Don't break inside multibyte characters.  */
+      memset (p, UC_BREAK_PROHIBITED, n);
 
-          /* Resolve property values whose behaviour is not fixed.  */
-          switch (prop)
-            {
-            case LBP_AI:
-              /* Resolve ambiguous.  */
-              prop = LBP_AI_REPLACEMENT;
-              break;
-            case LBP_CB:
-              /* This is arbitrary.  */
-              prop = LBP_ID;
-              break;
-            case LBP_SA:
-              /* We don't handle complex scripts yet.
-                 Treat LBP_SA like LBP_XX.  */
-            case LBP_XX:
-              /* This is arbitrary.  */
-              prop = LBP_AL;
-              break;
-            }
+      do
+        {
+          ucs4_t uc;
+          int count = u16_mbtouc_unsafe (&uc, s, s_end - s);
+          int prop = unilbrkprop_lookup (uc);
 
-          /* Deal with spaces and combining characters.  */
-          q = p;
-          if (prop == LBP_SP)
-            {
-              /* Don't break just before a space.  */
-              *p = UC_BREAK_PROHIBITED;
-              seen_space2 = seen_space;
-              seen_space = p;
-            }
-          else if (prop == LBP_ZW)
+          if (prop == LBP_BK)
             {
-              /* Don't break just before a zero-width space.  */
-              *p = UC_BREAK_PROHIBITED;
-              last_prop = LBP_ZW;
+              /* Mandatory break.  */
+              *p = UC_BREAK_MANDATORY;
+              last_prop = LBP_BK;
               seen_space = NULL;
               seen_space2 = NULL;
             }
-          else if (prop == LBP_CM)
+          else
             {
-              /* Don't break just before a combining character, except immediately after a
-                 zero-width space.  */
-              if (last_prop == LBP_ZW)
+              char *q;
+
+              /* Resolve property values whose behaviour is not fixed.  */
+              switch (prop)
                 {
-                  /* Break after zero-width space.  */
-                  *p = UC_BREAK_POSSIBLE;
-                  /* A combining character turns a preceding space into LBP_ID.  */
-                  last_prop = LBP_ID;
+                case LBP_AI:
+                  /* Resolve ambiguous.  */
+                  prop = LBP_AI_REPLACEMENT;
+                  break;
+                case LBP_CB:
+                  /* This is arbitrary.  */
+                  prop = LBP_ID;
+                  break;
+                case LBP_SA:
+                  /* We don't handle complex scripts yet.
+                     Treat LBP_SA like LBP_XX.  */
+                case LBP_XX:
+                  /* This is arbitrary.  */
+                  prop = LBP_AL;
+                  break;
                 }
-              else
+
+              /* Deal with spaces and combining characters.  */
+              q = p;
+              if (prop == LBP_SP)
                 {
+                  /* Don't break just before a space.  */
                   *p = UC_BREAK_PROHIBITED;
-                  /* A combining character turns a preceding space into LBP_ID.  */
-                  if (seen_space != NULL)
-                    {
-                      q = seen_space;
-                      seen_space = seen_space2;
-                      prop = LBP_ID;
-                      goto lookup_via_table;
-                    }
+                  seen_space2 = seen_space;
+                  seen_space = p;
                 }
-            }
-          else
-            {
-             lookup_via_table:
-              /* prop must be usable as an index for table 7.3 of UTR #14.  */
-              if (!(prop >= 0 && prop < sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
-                abort ();
-
-              if (last_prop == LBP_BK)
+              else if (prop == LBP_ZW)
                 {
-                  /* Don't break at the beginning of a line.  */
-                  *q = UC_BREAK_PROHIBITED;
+                  /* Don't break just before a zero-width space.  */
+                  *p = UC_BREAK_PROHIBITED;
+                  last_prop = LBP_ZW;
+                  seen_space = NULL;
+                  seen_space2 = NULL;
                 }
-              else if (last_prop == LBP_ZW)
+              else if (prop == LBP_CM)
                 {
-                  /* Break after zero-width space.  */
-                  *q = UC_BREAK_POSSIBLE;
+                  /* Don't break just before a combining character, except immediately
+                     after a zero-width space.  */
+                  if (last_prop == LBP_ZW)
+                    {
+                      /* Break after zero-width space.  */
+                      *p = UC_BREAK_POSSIBLE;
+                      /* A combining character turns a preceding space into LBP_ID.  */
+                      last_prop = LBP_ID;
+                    }
+                  else
+                    {
+                      *p = UC_BREAK_PROHIBITED;
+                      /* A combining character turns a preceding space into LBP_ID.  */
+                      if (seen_space != NULL)
+                        {
+                          q = seen_space;
+                          seen_space = seen_space2;
+                          prop = LBP_ID;
+                          goto lookup_via_table;
+                        }
+                    }
                 }
               else
                 {
-                  switch (unilbrk_table [last_prop] [prop])
+                 lookup_via_table:
+                  /* prop must be usable as an index for table 7.3 of UTR #14.  */
+                  if (!(prop >= 0 && prop < sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
+                    abort ();
+
+                  if (last_prop == LBP_BK)
                     {
-                    case D:
-                      *q = UC_BREAK_POSSIBLE;
-                      break;
-                    case I:
-                      *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
-                      break;
-                    case P:
+                      /* Don't break at the beginning of a line.  */
                       *q = UC_BREAK_PROHIBITED;
-                      break;
-                    default:
-                      abort ();
                     }
+                  else if (last_prop == LBP_ZW)
+                    {
+                      /* Break after zero-width space.  */
+                      *q = UC_BREAK_POSSIBLE;
+                    }
+                  else
+                    {
+                      switch (unilbrk_table [last_prop] [prop])
+                        {
+                        case D:
+                          *q = UC_BREAK_POSSIBLE;
+                          break;
+                        case I:
+                          *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
+                          break;
+                        case P:
+                          *q = UC_BREAK_PROHIBITED;
+                          break;
+                        default:
+                          abort ();
+                        }
+                    }
+                  last_prop = prop;
+                  seen_space = NULL;
+                  seen_space2 = NULL;
                 }
-              last_prop = prop;
-              seen_space = NULL;
-              seen_space2 = NULL;
             }
-        }
 
-      s += count;
-      p += count;
+          s += count;
+          p += count;
+        }
+      while (s < s_end);
     }
 }
diff --git a/lib/unilbrk/u32-possible-linebreaks.c b/lib/unilbrk/u32-possible-linebreaks.c
index cdd00c8..9a0bafa 100644
--- a/lib/unilbrk/u32-possible-linebreaks.c
+++ b/lib/unilbrk/u32-possible-linebreaks.c
@@ -28,131 +28,135 @@
 void
 u32_possible_linebreaks (const uint32_t *s, size_t n, const char *encoding, char *p)
 {
-  int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
-  const uint32_t *s_end = s + n;
-  int last_prop = LBP_BK; /* line break property of last non-space character */
-  char *seen_space = NULL; /* Was a space seen after the last non-space character? */
-  char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
-
-  while (s < s_end)
+  if (n > 0)
     {
-      ucs4_t uc = *s;
-      int prop = unilbrkprop_lookup (uc);
+      int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
+      const uint32_t *s_end = s + n;
+      int last_prop = LBP_BK; /* line break property of last non-space character */
+      char *seen_space = NULL; /* Was a space seen after the last non-space character? */
+      char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
 
-      if (prop == LBP_BK)
-        {
-          /* Mandatory break.  */
-          *p = UC_BREAK_MANDATORY;
-          last_prop = LBP_BK;
-          seen_space = NULL;
-          seen_space2 = NULL;
-        }
-      else
+      do
         {
-          char *q;
+          ucs4_t uc = *s;
+          int prop = unilbrkprop_lookup (uc);
 
-          /* Resolve property values whose behaviour is not fixed.  */
-          switch (prop)
+          if (prop == LBP_BK)
             {
-            case LBP_AI:
-              /* Resolve ambiguous.  */
-              prop = LBP_AI_REPLACEMENT;
-              break;
-            case LBP_CB:
-              /* This is arbitrary.  */
-              prop = LBP_ID;
-              break;
-            case LBP_SA:
-              /* We don't handle complex scripts yet.
-                 Treat LBP_SA like LBP_XX.  */
-            case LBP_XX:
-              /* This is arbitrary.  */
-              prop = LBP_AL;
-              break;
-            }
-
-          /* Deal with spaces and combining characters.  */
-          q = p;
-          if (prop == LBP_SP)
-            {
-              /* Don't break just before a space.  */
-              *p = UC_BREAK_PROHIBITED;
-              seen_space2 = seen_space;
-              seen_space = p;
-            }
-          else if (prop == LBP_ZW)
-            {
-              /* Don't break just before a zero-width space.  */
-              *p = UC_BREAK_PROHIBITED;
-              last_prop = LBP_ZW;
+              /* Mandatory break.  */
+              *p = UC_BREAK_MANDATORY;
+              last_prop = LBP_BK;
               seen_space = NULL;
               seen_space2 = NULL;
             }
-          else if (prop == LBP_CM)
+          else
             {
-              /* Don't break just before a combining character, except immediately after a
-                 zero-width space.  */
-              if (last_prop == LBP_ZW)
+              char *q;
+
+              /* Resolve property values whose behaviour is not fixed.  */
+              switch (prop)
                 {
-                  /* Break after zero-width space.  */
-                  *p = UC_BREAK_POSSIBLE;
-                  /* A combining character turns a preceding space into LBP_ID.  */
-                  last_prop = LBP_ID;
+                case LBP_AI:
+                  /* Resolve ambiguous.  */
+                  prop = LBP_AI_REPLACEMENT;
+                  break;
+                case LBP_CB:
+                  /* This is arbitrary.  */
+                  prop = LBP_ID;
+                  break;
+                case LBP_SA:
+                  /* We don't handle complex scripts yet.
+                     Treat LBP_SA like LBP_XX.  */
+                case LBP_XX:
+                  /* This is arbitrary.  */
+                  prop = LBP_AL;
+                  break;
                 }
-              else
+
+              /* Deal with spaces and combining characters.  */
+              q = p;
+              if (prop == LBP_SP)
                 {
+                  /* Don't break just before a space.  */
                   *p = UC_BREAK_PROHIBITED;
-                  /* A combining character turns a preceding space into LBP_ID.  */
-                  if (seen_space != NULL)
-                    {
-                      q = seen_space;
-                      seen_space = seen_space2;
-                      prop = LBP_ID;
-                      goto lookup_via_table;
-                    }
+                  seen_space2 = seen_space;
+                  seen_space = p;
                 }
-            }
-          else
-            {
-             lookup_via_table:
-              /* prop must be usable as an index for table 7.3 of UTR #14.  */
-              if (!(prop >= 0 && prop < sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
-                abort ();
-
-              if (last_prop == LBP_BK)
+              else if (prop == LBP_ZW)
                 {
-                  /* Don't break at the beginning of a line.  */
-                  *q = UC_BREAK_PROHIBITED;
+                  /* Don't break just before a zero-width space.  */
+                  *p = UC_BREAK_PROHIBITED;
+                  last_prop = LBP_ZW;
+                  seen_space = NULL;
+                  seen_space2 = NULL;
                 }
-              else if (last_prop == LBP_ZW)
+              else if (prop == LBP_CM)
                 {
-                  /* Break after zero-width space.  */
-                  *q = UC_BREAK_POSSIBLE;
+                  /* Don't break just before a combining character, except immediately
+                     after a zero-width space.  */
+                  if (last_prop == LBP_ZW)
+                    {
+                      /* Break after zero-width space.  */
+                      *p = UC_BREAK_POSSIBLE;
+                      /* A combining character turns a preceding space into LBP_ID.  */
+                      last_prop = LBP_ID;
+                    }
+                  else
+                    {
+                      *p = UC_BREAK_PROHIBITED;
+                      /* A combining character turns a preceding space into LBP_ID.  */
+                      if (seen_space != NULL)
+                        {
+                          q = seen_space;
+                          seen_space = seen_space2;
+                          prop = LBP_ID;
+                          goto lookup_via_table;
+                        }
+                    }
                 }
               else
                 {
-                  switch (unilbrk_table [last_prop] [prop])
+                 lookup_via_table:
+                  /* prop must be usable as an index for table 7.3 of UTR #14.  */
+                  if (!(prop >= 0 && prop < sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
+                    abort ();
+
+                  if (last_prop == LBP_BK)
                     {
-                    case D:
-                      *q = UC_BREAK_POSSIBLE;
-                      break;
-                    case I:
-                      *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
-                      break;
-                    case P:
+                      /* Don't break at the beginning of a line.  */
                       *q = UC_BREAK_PROHIBITED;
-                      break;
-                    default:
-                      abort ();
                     }
+                  else if (last_prop == LBP_ZW)
+                    {
+                      /* Break after zero-width space.  */
+                      *q = UC_BREAK_POSSIBLE;
+                    }
+                  else
+                    {
+                      switch (unilbrk_table [last_prop] [prop])
+                        {
+                        case D:
+                          *q = UC_BREAK_POSSIBLE;
+                          break;
+                        case I:
+                          *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
+                          break;
+                        case P:
+                          *q = UC_BREAK_PROHIBITED;
+                          break;
+                        default:
+                          abort ();
+                        }
+                    }
+                  last_prop = prop;
+                  seen_space = NULL;
+                  seen_space2 = NULL;
                 }
-              last_prop = prop;
-              seen_space = NULL;
-              seen_space2 = NULL;
             }
-        }
 
-      s++;
-      p++;
+          s++;
+          p++;
+        }
+      while (s < s_end);
     }
 }
diff --git a/lib/unilbrk/u8-possible-linebreaks.c b/lib/unilbrk/u8-possible-linebreaks.c
index 1e2224b..79efcb2 100644
--- a/lib/unilbrk/u8-possible-linebreaks.c
+++ b/lib/unilbrk/u8-possible-linebreaks.c
@@ -30,136 +30,140 @@
 void
 u8_possible_linebreaks (const uint8_t *s, size_t n, const char *encoding, char *p)
 {
-  int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
-  const uint8_t *s_end = s + n;
-  int last_prop = LBP_BK; /* line break property of last non-space character */
-  char *seen_space = NULL; /* Was a space seen after the last non-space character? */
-  char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
-
-  /* Don't break inside multibyte characters.  */
-  memset (p, UC_BREAK_PROHIBITED, n);
-
-  while (s < s_end)
+  if (n > 0)
     {
-      ucs4_t uc;
-      int count = u8_mbtouc_unsafe (&uc, s, s_end - s);
-      int prop = unilbrkprop_lookup (uc);
+      int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
+      const uint8_t *s_end = s + n;
+      int last_prop = LBP_BK; /* line break property of last non-space character */
+      char *seen_space = NULL; /* Was a space seen after the last non-space character? */
+      char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
 
-      if (prop == LBP_BK)
-        {
-          /* Mandatory break.  */
-          *p = UC_BREAK_MANDATORY;
-          last_prop = LBP_BK;
-          seen_space = NULL;
-          seen_space2 = NULL;
-        }
-      else
-        {
-          char *q;
+      /* Don't break inside multibyte characters.  */
+      memset (p, UC_BREAK_PROHIBITED, n);
 
-          /* Resolve property values whose behaviour is not fixed.  */
-          switch (prop)
-            {
-            case LBP_AI:
-              /* Resolve ambiguous.  */
-              prop = LBP_AI_REPLACEMENT;
-              break;
-            case LBP_CB:
-              /* This is arbitrary.  */
-              prop = LBP_ID;
-              break;
-            case LBP_SA:
-              /* We don't handle complex scripts yet.
-                 Treat LBP_SA like LBP_XX.  */
-            case LBP_XX:
-              /* This is arbitrary.  */
-              prop = LBP_AL;
-              break;
-            }
+      do
+        {
+          ucs4_t uc;
+          int count = u8_mbtouc_unsafe (&uc, s, s_end - s);
+          int prop = unilbrkprop_lookup (uc);
 
-          /* Deal with spaces and combining characters.  */
-          q = p;
-          if (prop == LBP_SP)
+          if (prop == LBP_BK)
             {
-              /* Don't break just before a space.  */
-              *p = UC_BREAK_PROHIBITED;
-              seen_space2 = seen_space;
-              seen_space = p;
-            }
-          else if (prop == LBP_ZW)
-            {
-              /* Don't break just before a zero-width space.  */
-              *p = UC_BREAK_PROHIBITED;
-              last_prop = LBP_ZW;
+              /* Mandatory break.  */
+              *p = UC_BREAK_MANDATORY;
+              last_prop = LBP_BK;
               seen_space = NULL;
               seen_space2 = NULL;
             }
-          else if (prop == LBP_CM)
+          else
             {
-              /* Don't break just before a combining character, except immediately after a
-                 zero-width space.  */
-              if (last_prop == LBP_ZW)
+              char *q;
+
+              /* Resolve property values whose behaviour is not fixed.  */
+              switch (prop)
                 {
-                  /* Break after zero-width space.  */
-                  *p = UC_BREAK_POSSIBLE;
-                  /* A combining character turns a preceding space into LBP_ID.  */
-                  last_prop = LBP_ID;
+                case LBP_AI:
+                  /* Resolve ambiguous.  */
+                  prop = LBP_AI_REPLACEMENT;
+                  break;
+                case LBP_CB:
+                  /* This is arbitrary.  */
+                  prop = LBP_ID;
+                  break;
+                case LBP_SA:
+                  /* We don't handle complex scripts yet.
+                     Treat LBP_SA like LBP_XX.  */
+                case LBP_XX:
+                  /* This is arbitrary.  */
+                  prop = LBP_AL;
+                  break;
                 }
-              else
+
+              /* Deal with spaces and combining characters.  */
+              q = p;
+              if (prop == LBP_SP)
                 {
+                  /* Don't break just before a space.  */
                   *p = UC_BREAK_PROHIBITED;
-                  /* A combining character turns a preceding space into LBP_ID.  */
-                  if (seen_space != NULL)
-                    {
-                      q = seen_space;
-                      seen_space = seen_space2;
-                      prop = LBP_ID;
-                      goto lookup_via_table;
-                    }
+                  seen_space2 = seen_space;
+                  seen_space = p;
                 }
-            }
-          else
-            {
-             lookup_via_table:
-              /* prop must be usable as an index for table 7.3 of UTR #14.  */
-              if (!(prop >= 0 && prop < sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
-                abort ();
-
-              if (last_prop == LBP_BK)
+              else if (prop == LBP_ZW)
                 {
-                  /* Don't break at the beginning of a line.  */
-                  *q = UC_BREAK_PROHIBITED;
+                  /* Don't break just before a zero-width space.  */
+                  *p = UC_BREAK_PROHIBITED;
+                  last_prop = LBP_ZW;
+                  seen_space = NULL;
+                  seen_space2 = NULL;
                 }
-              else if (last_prop == LBP_ZW)
+              else if (prop == LBP_CM)
                 {
-                  /* Break after zero-width space.  */
-                  *q = UC_BREAK_POSSIBLE;
+                  /* Don't break just before a combining character, except immediately
+                     after a zero-width space.  */
+                  if (last_prop == LBP_ZW)
+                    {
+                      /* Break after zero-width space.  */
+                      *p = UC_BREAK_POSSIBLE;
+                      /* A combining character turns a preceding space into LBP_ID.  */
+                      last_prop = LBP_ID;
+                    }
+                  else
+                    {
+                      *p = UC_BREAK_PROHIBITED;
+                      /* A combining character turns a preceding space into LBP_ID.  */
+                      if (seen_space != NULL)
+                        {
+                          q = seen_space;
+                          seen_space = seen_space2;
+                          prop = LBP_ID;
+                          goto lookup_via_table;
+                        }
+                    }
                 }
               else
                 {
-                  switch (unilbrk_table [last_prop] [prop])
+                 lookup_via_table:
+                  /* prop must be usable as an index for table 7.3 of UTR #14.  */
+                  if (!(prop >= 0 && prop < sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
+                    abort ();
+
+                  if (last_prop == LBP_BK)
                     {
-                    case D:
-                      *q = UC_BREAK_POSSIBLE;
-                      break;
-                    case I:
-                      *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
-                      break;
-                    case P:
+                      /* Don't break at the beginning of a line.  */
                       *q = UC_BREAK_PROHIBITED;
-                      break;
-                    default:
-                      abort ();
                     }
+                  else if (last_prop == LBP_ZW)
+                    {
+                      /* Break after zero-width space.  */
+                      *q = UC_BREAK_POSSIBLE;
+                    }
+                  else
+                    {
+                      switch (unilbrk_table [last_prop] [prop])
+                        {
+                        case D:
+                          *q = UC_BREAK_POSSIBLE;
+                          break;
+                        case I:
+                          *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
+                          break;
+                        case P:
+                          *q = UC_BREAK_PROHIBITED;
+                          break;
+                        default:
+                          abort ();
+                        }
+                    }
+                  last_prop = prop;
+                  seen_space = NULL;
+                  seen_space2 = NULL;
                 }
-              last_prop = prop;
-              seen_space = NULL;
-              seen_space2 = NULL;
             }
-        }
 
-      s += count;
-      p += count;
+          s += count;
+          p += count;
+        }
+      while (s < s_end);
     }
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [bug-libunistring] Unistring 0.9.10 and Undefined Behavior sanitizer findings
  2019-03-08 23:36 ` [bug-libunistring] Unistring 0.9.10 and Undefined Behavior sanitizer findings Bruno Haible
@ 2019-03-09  1:29   ` Jeffrey Walton
  2019-03-10  4:47     ` Jim Meyering
  0 siblings, 1 reply; 3+ messages in thread
From: Jeffrey Walton @ 2019-03-09  1:29 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-libunistring, bug-gnulib

On Fri, Mar 8, 2019 at 6:36 PM Bruno Haible <bruno@clisp.org> wrote:
>  ...
>
> I'm pushing the attached patches.

Thanks Bruno, got them. 0.9.10 tested good with them.

(I should be starting on -fsanitize=address this weekend. Address
Sanitizer is a little trickier because I've seen findings that I could
not explain or repro with Valgrind).

Jeff


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bug-libunistring] Unistring 0.9.10 and Undefined Behavior sanitizer findings
  2019-03-09  1:29   ` Jeffrey Walton
@ 2019-03-10  4:47     ` Jim Meyering
  0 siblings, 0 replies; 3+ messages in thread
From: Jim Meyering @ 2019-03-10  4:47 UTC (permalink / raw)
  To: noloader; +Cc: bug-libunistring, bug-gnulib@gnu.org List, Bruno Haible

On Fri, Mar 8, 2019 at 5:30 PM Jeffrey Walton <noloader@gmail.com> wrote:

Thanks for the UBSAN testing and report, and to Bruno for all the fixes.

> On Fri, Mar 8, 2019 at 6:36 PM Bruno Haible <bruno@clisp.org> wrote:
> >  ...
> > I'm pushing the attached patches.
>
> Thanks Bruno, got them. 0.9.10 tested good with them.
>
> (I should be starting on -fsanitize=address this weekend. Address
> Sanitizer is a little trickier because I've seen findings that I could
> not explain or repro with Valgrind).

For some types of problems, that is expected: valgrind detects few
things (UMR is the one I miss the most. use MSAN for those if you can
manage to instrument all dependent libraries) that ASAN does not
catch. However, ASAN catches many types of bugs that valgrind cannot
detect.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-03-10  4:47 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAH8yC8kNZpPFZQdUD_brywwH75hyae=_Xsw=5MWgVQNjYfqMpg@mail.gmail.com>
2019-03-08 23:36 ` [bug-libunistring] Unistring 0.9.10 and Undefined Behavior sanitizer findings Bruno Haible
2019-03-09  1:29   ` Jeffrey Walton
2019-03-10  4:47     ` Jim Meyering

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).