bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed
* Re: undefined-behavior obstack.c:139
@ 2023-12-01 13:14 Bruno Haible
  2023-12-01 13:34 ` Jeffrey Walton
  0 siblings, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-01 13:14 UTC (permalink / raw)
  To: Alexey Palienko; +Cc: bug-m4, bug-gnulib

[CCing bug-gnulib because obstack.c comes from gnulib.]

Alexey Palienko <Alexey.Palienko@cma.se> wrote in
<https://lists.gnu.org/archive/html/bug-m4/2023-02/msg00000.html>:

> It has been built by clang 13 with "-g -fsanitize=address,undefined -fno-omit-frame-pointer -fsanitize-address-use-after-scope -fno-sanitize-recover=all"
> And we have an error:
>
> # /home/docker/.conan/data/m4/latest/_/_/package/3421fde5744f1eadef515027cbcbb9a8fbcd667c/bin/m4
> obstack.c:139:35: runtime error: applying non-zero offset 107820858999056 to null pointer
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior obstack.c:139:35 in

I reproduce the issue with the newest m4 snapshot and with clang 17.

However, it's not a bug in m4, but rather a false alarm from clang's
UndefinedBehaviorSanitizer.

Namely, when I use the modified CC value
  CC="clang -fsanitize=address,undefined,signed-integer-overflow,shift,integer-divide-by-zero -fno-sanitize=pointer-overflow -fsanitize-address-use-after-scope -fno-sanitize-recover=all"
there are no issues.

The 'pointer-overflow' sanitizer considers adding NULL + 0 as undefined:
========================
#include <stdlib.h>
int main ()
{
  char *p = NULL;
  char *q = p + 0;
  return 0;
}
========================

I don't see wording to this effect in ISO C 23 § 6.5.6.(9).
Many programs use NULL + 0 in some cases, because it avoids a gratuitous
test against NULL.

So, I recommend turning off the 'pointer-overflow' sanitizer.

Bruno

[1] https://gitlab.com/gnu-m4/ci-distcheck





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 13:14 undefined-behavior obstack.c:139 Bruno Haible
@ 2023-12-01 13:34 ` Jeffrey Walton
  2023-12-01 16:01   ` Bruno Haible
  0 siblings, 1 reply; 29+ messages in thread
From: Jeffrey Walton @ 2023-12-01 13:34 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Alexey Palienko, bug-m4, bug-gnulib

On Fri, Dec 1, 2023 at 8:15 AM Bruno Haible <bruno@clisp.org> wrote:
>
> [CCing bug-gnulib because obstack.c comes from gnulib.]
>
> Alexey Palienko <Alexey.Palienko@cma.se> wrote in
> <https://lists.gnu.org/archive/html/bug-m4/2023-02/msg00000.html>:
>
> > It has been built by clang 13 with "-g -fsanitize=address,undefined -fno-omit-frame-pointer -fsanitize-address-use-after-scope -fno-sanitize-recover=all"
> > And we have an error:
> >
> > # /home/docker/.conan/data/m4/latest/_/_/package/3421fde5744f1eadef515027cbcbb9a8fbcd667c/bin/m4
> > obstack.c:139:35: runtime error: applying non-zero offset 107820858999056 to null pointer
> > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior obstack.c:139:35 in
>
> I reproduce the issue with the newest m4 snapshot and with clang 17.
>
> However, it's not a bug in m4, but rather a false alarm from clang's
> UndefinedBehaviorSanitizer.
>
> Namely, when I use the modified CC value
>   CC="clang -fsanitize=address,undefined,signed-integer-overflow,shift,integer-divide-by-zero -fno-sanitize=pointer-overflow -fsanitize-address-use-after-scope -fno-sanitize-recover=all"
> there are no issues.
>
> The 'pointer-overflow' sanitizer considers adding NULL + 0 as undefined:
> ========================
> #include <stdlib.h>
> int main ()
> {
>   char *p = NULL;
>   char *q = p + 0;
>   return 0;
> }
> ========================
>
> I don't see wording to this effect in ISO C 23 § 6.5.6.(9).
> Many programs use NULL + 0 in some cases, because it avoids a gratuitous
> test against NULL.
>
> So, I recommend turning off the 'pointer-overflow' sanitizer.
>
> Bruno
>
> [1] https://gitlab.com/gnu-m4/ci-distcheck

I think that's a valid finding. NULL is not a valid address. You can't
add anything to it.

Jeff


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 13:34 ` Jeffrey Walton
@ 2023-12-01 16:01   ` Bruno Haible
  2023-12-01 16:14     ` Marc Nieper-Wißkirchen
  2023-12-01 18:36     ` Andreas F. Borchert
  0 siblings, 2 replies; 29+ messages in thread
From: Bruno Haible @ 2023-12-01 16:01 UTC (permalink / raw)
  To: Jeffrey Walton; +Cc: Alexey Palienko, bug-m4, bug-gnulib

Jeffrey Walton wrote:
> I think that's a valid finding. NULL is not a valid address. You can't
> add anything to it.

Can you back this opinion with citations from ISO C 23 [1] ?

Bruno

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 16:01   ` Bruno Haible
@ 2023-12-01 16:14     ` Marc Nieper-Wißkirchen
  2023-12-01 18:40       ` Bruno Haible
  2023-12-01 18:36     ` Andreas F. Borchert
  1 sibling, 1 reply; 29+ messages in thread
From: Marc Nieper-Wißkirchen @ 2023-12-01 16:14 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Jeffrey Walton, Alexey Palienko, bug-m4, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 818 bytes --]

By 6.5.6 "Additive Operators":

(2) "... one operator shall be a pointer to a complete object type..."

NULL, which is a null pointer constant, is not necessarily a pointer to a
complete object type.

(9) "... If the pointer operand and the result do not point to elements of
the same array object or one past the last element of the array object, the
behavior is undefined..."

NULL does not have to point to an element of an array object (or any
object; see (8)).


Am Fr., 1. Dez. 2023 um 17:02 Uhr schrieb Bruno Haible <bruno@clisp.org>:

> Jeffrey Walton wrote:
> > I think that's a valid finding. NULL is not a valid address. You can't
> > add anything to it.
>
> Can you back this opinion with citations from ISO C 23 [1] ?
>
> Bruno
>
> [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf
>
>
>
>
>

[-- Attachment #2: Type: text/html, Size: 1930 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 16:01   ` Bruno Haible
  2023-12-01 16:14     ` Marc Nieper-Wißkirchen
@ 2023-12-01 18:36     ` Andreas F. Borchert
  2023-12-02  8:50       ` Bruno Haible
  1 sibling, 1 reply; 29+ messages in thread
From: Andreas F. Borchert @ 2023-12-01 18:36 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Jeffrey Walton, Alexey Palienko, bug-m4, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 1866 bytes --]

On Fri, Dec 01, 2023 at 05:01:45PM +0100, Bruno Haible wrote:
> Jeffrey Walton wrote:
> > I think that's a valid finding. NULL is not a valid address. You can't
> > add anything to it.
> 
> Can you back this opinion with citations from ISO C 23 [1] ?

See N3096 (most recent draft of C23 [1]), quotes from § 6.5.6, (2), (8) and (9):

   For addition, either both operands shall have arithmetic type,
   or one operand shall be a pointer to a complete object type
   and the other shall have integer type.
   [..]
   For the purposes of these operators, a pointer to an object that
   is not an element of an array behaves the same as a pointer to the
   first elemenet of an array of length one with the type of
   the object as its element type.
   [..]
   If the pointer operand points to an element of an array object,
   and the array is large enough, the result points to an element
   offset from the original element such that the difference of
   the subscripts of the resulting and original array elements
   equals the integer expression.

NULL per <stddef.h> is defined per § 7.21, (4):

   The macros are NULL which expands to an implementation-defined
   null pointer constant; [..]

The definition of null pointer constant is given at § 6.3.2.3 (3):

   An integer constant expression with the value 0, such an
   expression cast to type void *, or the predefined constant nullptr
   is called a null pointer constant.

In summary, null pointer constants are not pointers to an object
or an element of an array object and thereby must not be used
for pointer arithmetic.

Andreas.

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf

-- 
Dr. Andreas F. Borchert,  Institut für Numerische Mathematik, Universität Ulm
Helmholtzstr. 20, 89081 Ulm, +49 7315023572 https://mathematik.uni-ulm.de/afb

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6013 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 16:14     ` Marc Nieper-Wißkirchen
@ 2023-12-01 18:40       ` Bruno Haible
  2023-12-01 19:08         ` Marc Nieper-Wißkirchen
  2023-12-01 20:01         ` Paul Eggert
  0 siblings, 2 replies; 29+ messages in thread
From: Bruno Haible @ 2023-12-01 18:40 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen
  Cc: Jeffrey Walton, Alexey Palienko, bug-m4, bug-gnulib

Marc Nieper-Wißkirchen wrote:
> By 6.5.6 "Additive Operators":
> 
> (2) "... one operator shall be a pointer to a complete object type..."
> 
> NULL, which is a null pointer constant, is not necessarily a pointer to a
> complete object type.

In my test program, I used a variable of type 'char *'. Which is a pointer
to a complete object type.

> (9) "... If the pointer operand and the result do not point to elements of
> the same array object or one past the last element of the array object, the
> behavior is undefined..."
> 
> NULL does not have to point to an element of an array object (or any
> object; see (8)).

Indeed, this sentence appears to forbid ((char *) NULL) + something.
Thanks for highlighting it; I had read this paragraph too quickly.

I'm therefore applying this fix.


2023-12-01  Bruno Haible  <bruno@clisp.org>

	obstack: Avoid undefined behaviour.
	Reported by Alexey Palienko <Alexey.Palienko@cma.se> in
	<https://lists.gnu.org/archive/html/bug-m4/2023-02/msg00000.html>.
	* lib/obstack.in.h: Include <stdint.h>.
	(__BPTR_ALIGN): Remove macro.
	(__PTR_ALIGN): For the optimized case, compute the alignment through
	uintptr_t, instead of computing NULL + something.

diff --git a/lib/obstack.in.h b/lib/obstack.in.h
index 265203b6e2..468a797341 100644
--- a/lib/obstack.in.h
+++ b/lib/obstack.in.h
@@ -111,6 +111,7 @@
 #endif
 
 #include <stddef.h>             /* For size_t and ptrdiff_t.  */
+#include <stdint.h>             /* For uintptr_t.  */
 #include <string.h>             /* For memcpy.  */
 
 #if __STDC_VERSION__ < 199901L || defined __HP_cc
@@ -134,20 +135,15 @@
 
 /* If B is the base of an object addressed by P, return the result of
    aligning P to the next multiple of A + 1.  B and P must be of type
-   char *.  A + 1 must be a power of 2.  */
-
-#define __BPTR_ALIGN(B, P, A) ((B) + (((P) - (B) + (A)) & ~(A)))
-
-/* Similar to __BPTR_ALIGN (B, P, A), except optimize the common case
-   where pointers can be converted to integers, aligned as integers,
-   and converted back again.  If ptrdiff_t is narrower than a
-   pointer (e.g., the AS/400), play it safe and compute the alignment
-   relative to B.  Otherwise, use the faster strategy of computing the
-   alignment relative to 0.  */
-
-#define __PTR_ALIGN(B, P, A)						      \
-  __BPTR_ALIGN (sizeof (ptrdiff_t) < sizeof (void *) ? (B) : (char *) 0,      \
-                P, A)
+   char *.  A + 1 must be a power of 2.
+   If ptrdiff_t is narrower than a pointer (e.g., the AS/400), play it
+   safe and compute the alignment relative to B.  Otherwise, use the
+   faster strategy of computing the alignment through uintptr_t.  */
+
+#define __PTR_ALIGN(B, P, A) \
+  (sizeof (ptrdiff_t) < sizeof (void *) \
+   ? (B) + (((P) - (B) + (A)) & ~(A))   \
+   : (P) + ((- (uintptr_t) (P)) & (A)))
 
 #ifndef __attribute_pure__
 # define __attribute_pure__ _GL_ATTRIBUTE_PURE





^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 18:40       ` Bruno Haible
@ 2023-12-01 19:08         ` Marc Nieper-Wißkirchen
  2023-12-01 20:01         ` Paul Eggert
  1 sibling, 0 replies; 29+ messages in thread
From: Marc Nieper-Wißkirchen @ 2023-12-01 19:08 UTC (permalink / raw)
  To: Bruno Haible
  Cc: Marc Nieper-Wißkirchen, Jeffrey Walton, Alexey Palienko,
	bug-m4, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 530 bytes --]

Am Fr., 1. Dez. 2023 um 19:42 Uhr schrieb Bruno Haible <bruno@clisp.org>:

> Marc Nieper-Wißkirchen wrote:
> > By 6.5.6 "Additive Operators":
> >
> > (2) "... one operator shall be a pointer to a complete object type..."
> >
> > NULL, which is a null pointer constant, is not necessarily a pointer to a
> > complete object type.
>
> In my test program, I used a variable of type 'char *'. Which is a pointer
> to a complete object type.
>

Yes, I was referring to the expression "NULL + 0" you mentioned.

[...]

[-- Attachment #2: Type: text/html, Size: 1091 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 18:40       ` Bruno Haible
  2023-12-01 19:08         ` Marc Nieper-Wißkirchen
@ 2023-12-01 20:01         ` Paul Eggert
  2023-12-01 20:36           ` Marc Nieper-Wißkirchen
  2023-12-02  9:04           ` Bruno Haible
  1 sibling, 2 replies; 29+ messages in thread
From: Paul Eggert @ 2023-12-01 20:01 UTC (permalink / raw)
  To: Bruno Haible, Marc Nieper-Wißkirchen
  Cc: Jeffrey Walton, Alexey Palienko, bug-m4, bug-gnulib

On 2023-12-01 10:40, Bruno Haible wrote:

> Indeed, this sentence appears to forbid ((char *) NULL) + something.

Yes. However, Gnulib code can still use ((char *) NULL) + something) 
because the Gnulib portability guidelines allow it.

The issue with clang false positives is covered here:

https://www.gnu.org/software/gnulib/manual/html_node/Unsupported-Platforms.html

which lists "clang -fsanitize=undefined" as an unsupported platform 
unless you also specify "-fno-sanitize=pointer-overflow".

The obstack patch you installed is fine, as it's clearer and just as 
fast as the original. However, we needn't go through Gnulib and change 
other code merely because it runs afoul of this false alarm from clang.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 20:01         ` Paul Eggert
@ 2023-12-01 20:36           ` Marc Nieper-Wißkirchen
  2023-12-01 22:23             ` Paul Eggert
  2023-12-02  9:04           ` Bruno Haible
  1 sibling, 1 reply; 29+ messages in thread
From: Marc Nieper-Wißkirchen @ 2023-12-01 20:36 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Bruno Haible, Marc Nieper-Wißkirchen, Jeffrey Walton,
	Alexey Palienko, bug-m4, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 1067 bytes --]

Am Fr., 1. Dez. 2023 um 21:01 Uhr schrieb Paul Eggert <eggert@cs.ucla.edu>:

> On 2023-12-01 10:40, Bruno Haible wrote:
>
> > Indeed, this sentence appears to forbid ((char *) NULL) + something.
>
> Yes. However, Gnulib code can still use ((char *) NULL) + something)
> because the Gnulib portability guidelines allow it.
>
> The issue with clang false positives is covered here:
>
>
> https://www.gnu.org/software/gnulib/manual/html_node/Unsupported-Platforms.html
>
> which lists "clang -fsanitize=undefined" as an unsupported platform
> unless you also specify "-fno-sanitize=pointer-overflow".
>
> The obstack patch you installed is fine, as it's clearer and just as
> fast as the original. However, we needn't go through Gnulib and change
> other code merely because it runs afoul of this false alarm from clang.
>

It may not be a false alarm in future versions of the compiler.  At any
time, the compiler may decide to replace ((char *) NULL + 0) by
__builtin_unreachable.  It can make sense to find an ISO C-compliant
replacement of this idiom at some point.

[-- Attachment #2: Type: text/html, Size: 1639 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 20:36           ` Marc Nieper-Wißkirchen
@ 2023-12-01 22:23             ` Paul Eggert
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Eggert @ 2023-12-01 22:23 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen
  Cc: Bruno Haible, Jeffrey Walton, Alexey Palienko, bug-m4, bug-gnulib

On 2023-12-01 12:36, Marc Nieper-Wißkirchen wrote:
> It may not be a false alarm in future versions of the compiler.

If that happens, Gnulib (and lots of other software) won't work in those 
future versions. This is so unlikely that we needn't worry about it now.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 18:36     ` Andreas F. Borchert
@ 2023-12-02  8:50       ` Bruno Haible
  2023-12-02 17:16         ` Andreas F. Borchert
  0 siblings, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-02  8:50 UTC (permalink / raw)
  To: Andreas F. Borchert; +Cc: Jeffrey Walton, Alexey Palienko, bug-m4, bug-gnulib

Andreas F. Borchert wrote:
> In summary, null pointer constants are not pointers to an object
> or an element of an array object and thereby must not be used
> for pointer arithmetic.

Thanks for the explanations.

I still don't know whether it's OK to have pointers to arrays with
0 elements (which are not "array objects", since "objects" are non-
empty (§ 6.2.6.1.(2))). And if so, whether adding 0 to such a pointer
would be valid. For the moment, it's not relevant, fortunately.

> [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf

Thanks for this new URL as well. I missed it.

Bruno





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-01 20:01         ` Paul Eggert
  2023-12-01 20:36           ` Marc Nieper-Wißkirchen
@ 2023-12-02  9:04           ` Bruno Haible
  2023-12-03  9:04             ` Paul Eggert
  1 sibling, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-02  9:04 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Marc Nieper-Wißkirchen, Jeffrey Walton, Alexey Palienko,
	bug-m4, bug-gnulib

Paul Eggert wrote:
> > Indeed, this sentence appears to forbid ((char *) NULL) + something.
> 
> Yes. However, Gnulib code can still use ((char *) NULL) + something) 
> because the Gnulib portability guidelines allow it.
> 
> The issue with clang false positives is covered here:
> 
> https://www.gnu.org/software/gnulib/manual/html_node/Unsupported-Platforms.html
> 
> which lists "clang -fsanitize=undefined" as an unsupported platform 
> unless you also specify "-fno-sanitize=pointer-overflow".

But, as the Gnulib documentation says, "this may also disable some unrelated
and useful pointer checks". And indeed, the runtime error from the obstack
module was hiding two other runtime errors (undefined behaviour) in GNU m4:
<https://lists.gnu.org/archive/html/bug-m4/2023-12/msg00011.html>.
These two other cases became visible only after the 'obstack' one was
eliminated (due to the option '-fno-sanitize-recover=all', which is generally
reasonable to use).

So, while we are unhappy about changing our code, it allows more people
to make good use of clang's UB sanitizer.

Much like the problem we had with
  memset (NULL, c, 0);
or
  memcpy (dest, NULL, 0);
There also I hated to change source code, to accommodate a picky sanitizer.
But the benefit is that the sanitizer can be applied during development
and won't give a false alarm.

> However, we needn't go through Gnulib and change 
> other code merely because it runs afoul of this false alarm from clang.

On the contrary, I will try to find and eliminate such alarms.

Bruno





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-02  8:50       ` Bruno Haible
@ 2023-12-02 17:16         ` Andreas F. Borchert
  2023-12-03  7:41           ` pointer addition and arrays Bruno Haible
  0 siblings, 1 reply; 29+ messages in thread
From: Andreas F. Borchert @ 2023-12-02 17:16 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Jeffrey Walton, Alexey Palienko, bug-m4, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 836 bytes --]

On Sat, Dec 02, 2023 at 09:50:33AM +0100, Bruno Haible wrote:
> I still don't know whether it's OK to have pointers to arrays with
> 0 elements (which are not "array objects", since "objects" are non-
> empty (§ 6.2.6.1.(2))).

From § 6.7.6.2 about array declarators:

   If the expression is a constant expression, it shall have a
   value greater than zero.

Hence, arrays must have at least one element. And in case of
memory management functions, the behaviour is implementation-defined
for a requested size of zero (see § 7.24.3) and null
pointers are possible. Hence pointer arithmetic is not supported
for pointers returned by malloc(0).

Andreas

-- 
Dr. Andreas F. Borchert,  Institut für Numerische Mathematik, Universität Ulm
Helmholtzstr. 20, 89081 Ulm, +49 7315023572 https://mathematik.uni-ulm.de/afb

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6013 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-02 17:16         ` Andreas F. Borchert
@ 2023-12-03  7:41           ` Bruno Haible
  2023-12-03  9:25             ` Andreas F. Borchert
  0 siblings, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-03  7:41 UTC (permalink / raw)
  To: Andreas F. Borchert, bug-gnulib; +Cc: Jeffrey Walton, Alexey Palienko

Andreas F. Borchert wrote:
> > I still don't know whether it's OK to have pointers to arrays with
> > 0 elements (which are not "array objects", since "objects" are non-
> > empty (§ 6.2.6.1.(2))).
> 
> From § 6.7.6.2 about array declarators:
> 
>    If the expression is a constant expression, it shall have a
>    value greater than zero.
> 
> Hence, arrays must have at least one element. And in case of
> memory management functions, the behaviour is implementation-defined
> for a requested size of zero (see § 7.24.3) and null
> pointers are possible. Hence pointer arithmetic is not supported
> for pointers returned by malloc(0).

Thanks for the answer. But I'm talking about programs that access arrays
through pointers, not through array declarators. Such as this one:

==================================== foo.c ====================================
#include <stddef.h>
#include <stdlib.h>
#ifdef __GNUC__
# define unreachable() __builtin_unreachable ()
#endif
typedef struct { int a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t; } foo;
int main ()
{
  char *m = malloc (1);
  if (m == NULL)
    unreachable ();
  foo *p = (foo *) m;     /* CAST */
  ptrdiff_t d = p - p;    /* DIFFERENCE */
  foo *q = p + 0;         /* ADDITION */
  free (m);
}
/* gcc -fsanitize=address,undefined,signed-integer-overflow,shift,integer-divide-by-zero foo.c */
/* clang -fsanitize=address,undefined,signed-integer-overflow,shift,integer-divide-by-zero foo.c */
===============================================================================

gcc's and clang's sanitizers let this program pass at runtime.

The line /* CAST */ is probably OK because ISO C 23 § 6.5.4 specifies no
undefined behaviour.

The lines /* DIFFERENCE */ and /* ADDITION */ could be construed as undefined
behaviour according to ISO C 23 § 6.5.6.(9) and § 6.5.6.(10), but the sanitizers
don't flag it. Why?

Bruno





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-02  9:04           ` Bruno Haible
@ 2023-12-03  9:04             ` Paul Eggert
  2023-12-03  9:17               ` Marc Nieper-Wißkirchen
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2023-12-03  9:04 UTC (permalink / raw)
  To: Bruno Haible
  Cc: Marc Nieper-Wißkirchen, Jeffrey Walton, Alexey Palienko,
	bug-m4, bug-gnulib

On 2023-12-02 01:04, Bruno Haible wrote:
> On the contrary, I will try to find and eliminate such alarms.

Please don't complicate and/or slow down shared Gnulib code just to 
pacify this particular false alarm from Clang. The obstack fix was fine, 
because it made obstack clearer and no slower. But we shouldn't have to 
insert unnecessary "is the size zero?" checks merely to pacify the false 
alarms.

The right place to fix this problem is in Clang, not in Gnulib.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-03  9:04             ` Paul Eggert
@ 2023-12-03  9:17               ` Marc Nieper-Wißkirchen
  2023-12-04 13:57                 ` Bruno Haible
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Nieper-Wißkirchen @ 2023-12-03  9:17 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Bruno Haible, Marc Nieper-Wißkirchen, Jeffrey Walton,
	Alexey Palienko, bug-m4, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 1164 bytes --]

Am So., 3. Dez. 2023 um 10:05 Uhr schrieb Paul Eggert <eggert@cs.ucla.edu>:

> On 2023-12-02 01:04, Bruno Haible wrote:
> > On the contrary, I will try to find and eliminate such alarms.
>
> Please don't complicate and/or slow down shared Gnulib code just to
> pacify this particular false alarm from Clang. The obstack fix was fine,
> because it made obstack clearer and no slower. But we shouldn't have to
> insert unnecessary "is the size zero?" checks merely to pacify the false
> alarms.
>

I would suggest writing macros that encapsulate code like "NULL + 0" and
replace the code with macro invocations.  On legacy systems, the macro
would expand into the legacy code; on modern systems where checks like "is
this zero size?" should be eliminated by the compiler, the macro would
expand into the correct ISO C code.  This can also make the code more
transparent as long as the macro has a self-describing name.


> The right place to fix this problem is in Clang, not in Gnulib.
>

I don't think so.  As long as Clang implements ISO C, it is not Clang's
problem but a problem of code that produces UB (in the technical sense).

Just my two cents, of course.

[-- Attachment #2: Type: text/html, Size: 1963 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-03  7:41           ` pointer addition and arrays Bruno Haible
@ 2023-12-03  9:25             ` Andreas F. Borchert
  2023-12-03 10:03               ` Bruno Haible
  0 siblings, 1 reply; 29+ messages in thread
From: Andreas F. Borchert @ 2023-12-03  9:25 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib, Jeffrey Walton, Alexey Palienko

[-- Attachment #1: Type: text/plain, Size: 2121 bytes --]

> ==================================== foo.c ====================================
> #include <stddef.h>
> #include <stdlib.h>
> #ifdef __GNUC__
> # define unreachable() __builtin_unreachable ()
> #endif
> typedef struct { int a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t; } foo;
> int main ()
> {
>   char *m = malloc (1);
>   if (m == NULL)
>     unreachable ();
>   foo *p = (foo *) m;     /* CAST */
>   ptrdiff_t d = p - p;    /* DIFFERENCE */
>   foo *q = p + 0;         /* ADDITION */
>   free (m);
> }
> /* gcc -fsanitize=address,undefined,signed-integer-overflow,shift,integer-divide-by-zero foo.c */
> /* clang -fsanitize=address,undefined,signed-integer-overflow,shift,integer-divide-by-zero foo.c */
> ===============================================================================
> 
> gcc's and clang's sanitizers let this program pass at runtime.
> 
> The line /* CAST */ is probably OK because ISO C 23 § 6.5.4 specifies no
> undefined behaviour.

This line is not guaranteed to be ok due to § 6.3.2.3, (7):

   A pointer to an object type may be converted to a pointer to a different
   object type. If the resulting pointer is not correctly aligned for the
   referenced type, the behaviour is undefined.

alignof(char) will be 1 and the alignment of foo will probably be 4, hence
there is no guarantee that the pointer value of p is properly aligned.
You have just the exception that you are free to convert a (foo *) to (char *)
and to even convert a (char *) back to (foo *) if it was originally a (foo *).

See also § 6.2.5, (33) according to which pointers to structure types
(as foo in the example) can have possibly a different representation
than a pointer to char. To see the problem, assume a word addressed
machine where char pointers need extra representation for the byte
offset within a word. How could you convert a non-word-aligned (char *)
to a (foo *) on such a machine?

Andreas.

-- 
Dr. Andreas F. Borchert,  Institut für Numerische Mathematik, Universität Ulm
Helmholtzstr. 20, 89081 Ulm, +49 7315023572 https://mathematik.uni-ulm.de/afb

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6013 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-03  9:25             ` Andreas F. Borchert
@ 2023-12-03 10:03               ` Bruno Haible
  2023-12-03 13:15                 ` Andreas F. Borchert
  0 siblings, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-03 10:03 UTC (permalink / raw)
  To: Andreas F. Borchert; +Cc: bug-gnulib, Jeffrey Walton, Alexey Palienko

Andreas F. Borchert wrote:
> >   foo *p = (foo *) m;     /* CAST */
> ...
> This line is not guaranteed to be ok due to § 6.3.2.3, (7):
> 
>    A pointer to an object type may be converted to a pointer to a different
>    object type. If the resulting pointer is not correctly aligned for the
>    referenced type, the behaviour is undefined.
> 
> alignof(char) will be 1 and the alignment of foo will probably be 4, hence
> there is no guarantee that the pointer value of p is properly aligned.

There is no guarantee, but the malloc() result has an alignment of at least 4
on all of today's platforms. [1]

Bruno

[1] see gnulib/lib/memalign.c and gnulib/m4/malloc-align.m4.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-03 10:03               ` Bruno Haible
@ 2023-12-03 13:15                 ` Andreas F. Borchert
  2023-12-15  1:00                   ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Andreas F. Borchert @ 2023-12-03 13:15 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib, Jeffrey Walton, Alexey Palienko

[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]

On Sun, Dec 03, 2023 at 11:03:30AM +0100, Bruno Haible wrote:
> There is no guarantee, but the malloc() result has an alignment of at least 4
> on all of today's platforms. [1]

It may be legitimate to have such assumptions. But it is likewise
legitimate that compilers and the associated libraries adhere strictly
to the standard without accommodating such assumptions. You will have
then to possibly live with annoying warnings and odd behaviours. Here is a
classical example where common assumptions about integer arithmetic led
to trouble when a newer version of gcc changed the game in a
standard-conforming way:
   https://bugzilla.redhat.com/show_bug.cgi?id=175462
In case of malloc the game could be changed through LD_PRELOAD which
is also perfectly legitimate as long as the replacement conforms
to § 7.24.3.

Andreas.

-- 
Dr. Andreas F. Borchert,  Institut für Numerische Mathematik, Universität Ulm
Helmholtzstr. 20, 89081 Ulm, Raumnummer: 1.23, Telefon: +49 7315023572

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6013 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-03  9:17               ` Marc Nieper-Wißkirchen
@ 2023-12-04 13:57                 ` Bruno Haible
  2023-12-15  0:32                   ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-04 13:57 UTC (permalink / raw)
  To: Paul Eggert, Marc Nieper-Wißkirchen, bug-gnulib
  Cc: Jeffrey Walton, Pádraig Brady

[-- Attachment #1: Type: text/plain, Size: 2265 bytes --]

Paul Eggert wrote:
> > On the contrary, I will try to find and eliminate such alarms.
> 
> Please don't complicate and/or slow down shared Gnulib code just to 
> pacify this particular false alarm from Clang. The obstack fix was fine, 
> because it made obstack clearer and no slower. But we shouldn't have to 
> insert unnecessary "is the size zero?" checks merely to pacify the false 
> alarms.

Building a gnulib testdir with these settings:
  CC="clang -fsanitize=address,undefined,signed-integer-overflow,shift,integer-divide-by-zero"
produced no relevant findings (other than 'argp' things).

But building coreutils with these settings gives two findings:
  lib/linebuffer.c:65:22: runtime error: applying zero offset to null pointer
  lib/mpsort.c:155:34: runtime error: applying zero offset to null pointer

In both cases, the ability to add NULL + 0 unifies code paths for
empty arrays and non-empty arrays. Such unified code paths are desirable:
they make the code look simpler.

Marc Nieper-Wißkirchen wrote:
> I would suggest writing macros that encapsulate code like "NULL + 0" and
> replace the code with macro invocations.  On legacy systems, the macro
> would expand into the legacy code; on modern systems where checks like "is
> this zero size?" should be eliminated by the compiler, the macro would
> expand into the correct ISO C code.

That's an interesting idea. It applies to both lib/linebuffer.c and
lib/mpsort.c.

Paul, is the attached patch OK with you? It allows coreutils to build and
"make check" with clang's ASAN+UBSAN, without taking out some categories
of UBSAN.
The downside is an additional macro. Which I could move to a separate
include file.

The macro does not produce slower code:
  - clang optimizes the conditional expression nicely.
  - With gcc, which does not yet optimize the conditional expression well,
    we continue to use the simple but not ISO C compliant code.

It is quite possible that newer GCCs will, in the future, trigger the same
  runtime error: applying zero offset to null pointer
as clang does. For this reason, I would report a GCC bug, asking them to
optimize
   ((n) ? (array) + (n) : (array))
into identical code as
   ((array) + (n))

Bruno


[-- Attachment #2: ubsan.diff --]
[-- Type: text/x-patch, Size: 2375 bytes --]

diff --git a/lib/linebuffer.c b/lib/linebuffer.c
index 4124bbcb37..9c8e12ab32 100644
--- a/lib/linebuffer.c
+++ b/lib/linebuffer.c
@@ -31,6 +31,19 @@
 # include "unlocked-io.h"
 #endif
 
+/* END_OF_ARRAY(array,n) returns a pointer past the last element of the array
+   ARRAY that has N elements.
+   For clang, it uses a conditional expression that avoids adding 0 to a null
+   pointer, which is undefined according to ISO C 23 § 6.5.6.(9) and which
+   triggers an error in clang's undefined-behaviour sanitizer.  When no
+   sanitizer is in effect, clang optimizes this conditional expression to
+   exactly the same code.  */
+#if defined __clang__
+# define END_OF_ARRAY(array,n) ((n) ? (array) + (n) : (array))
+#else
+# define END_OF_ARRAY(array,n) ((array) + (n))
+#endif
+
 /* Initialize linebuffer LINEBUFFER for use. */
 
 void
@@ -62,7 +75,7 @@ readlinebuffer_delim (struct linebuffer *linebuffer, FILE *stream,
   int c;
   char *buffer = linebuffer->buffer;
   char *p = linebuffer->buffer;
-  char *end = buffer + linebuffer->size; /* Sentinel. */
+  char *end = END_OF_ARRAY (buffer, linebuffer->size); /* Sentinel. */
 
   if (feof (stream))
     return NULL;
diff --git a/lib/mpsort.c b/lib/mpsort.c
index 32cc5e1f82..2741d7199d 100644
--- a/lib/mpsort.c
+++ b/lib/mpsort.c
@@ -23,6 +23,19 @@
 
 #include <string.h>
 
+/* END_OF_ARRAY(array,n) returns a pointer past the last element of the array
+   ARRAY that has N elements.
+   For clang, it uses a conditional expression that avoids adding 0 to a null
+   pointer, which is undefined according to ISO C 23 § 6.5.6.(9) and which
+   triggers an error in clang's undefined-behaviour sanitizer.  When no
+   sanitizer is in effect, clang optimizes this conditional expression to
+   exactly the same code.  */
+#if defined __clang__
+# define END_OF_ARRAY(array,n) ((n) ? (array) + (n) : (array))
+#else
+# define END_OF_ARRAY(array,n) ((array) + (n))
+#endif
+
 /* The type of qsort-style comparison functions.  */
 
 typedef int (*comparison_function) (void const *, void const *);
@@ -152,5 +165,5 @@ mpsort_with_tmp (void const **restrict base, size_t n,
 void
 mpsort (void const **base, size_t n, comparison_function cmp)
 {
-  mpsort_with_tmp (base, n, base + n, cmp);
+  mpsort_with_tmp (base, n, END_OF_ARRAY (base, n), cmp);
 }

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-04 13:57                 ` Bruno Haible
@ 2023-12-15  0:32                   ` Paul Eggert
  2023-12-16 21:09                     ` Bruno Haible
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2023-12-15  0:32 UTC (permalink / raw)
  To: Bruno Haible, Marc Nieper-Wißkirchen, bug-gnulib
  Cc: Jeffrey Walton, Pádraig Brady

On 12/4/23 05:57, Bruno Haible wrote:

> In both cases, the ability to add NULL + 0 unifies code paths for
> empty arrays and non-empty arrays. Such unified code paths are desirable:
> they make the code look simpler.
> ...
> Paul, is the attached patch OK with you?

Thanks, but I'm of two minds about that sort of thing. On the one hand, 
they document the assumption that (char *) NULL + 0 == NULL. On the 
other, they are still clutter and this assumption is safe on all but 
pedantic platforms -- and we've long not worried about pedantic 
platforms (e.g., gcc -pedantic), for good reason.

Of course, the fact that you found these issues with clang's runtime 
checking does not mean that these are all the issues; I fully expect 
that there are others.

All in all, it may be better to leave the code alone.

PS. The name END_OF_ARRAY is not the best, since a null pointer does not 
point to an array, and even when the pointer is non-null the macro can 
be used to point elsewhere than to the end of an array. A better name 
would be something like PTR_ADD, since the first arg is the pointer, the 
second arg the amount to add.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-03 13:15                 ` Andreas F. Borchert
@ 2023-12-15  1:00                   ` Paul Eggert
  2023-12-16 12:42                     ` Marc Nieper-Wißkirchen
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2023-12-15  1:00 UTC (permalink / raw)
  To: Andreas F. Borchert; +Cc: bug-gnulib

On 12/3/23 05:15, Andreas F. Borchert wrote:
> You will have
> then to possibly live with annoying warnings and odd behaviours.

The C standard says that when P is a null pointer, P==P must be true 
whereas P<=P has undefined behavior. However, in practice any 
implementation where P==P succeeds but P<=P fails is a pedantic 
implementation that is merely enforcing what are arguably bugs in the 
standard. Let's not waste time worrying about implementations like that.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-15  1:00                   ` Paul Eggert
@ 2023-12-16 12:42                     ` Marc Nieper-Wißkirchen
  2023-12-17  8:50                       ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Nieper-Wißkirchen @ 2023-12-16 12:42 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Andreas F. Borchert, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 1368 bytes --]

Am Fr., 15. Dez. 2023 um 02:00 Uhr schrieb Paul Eggert <eggert@cs.ucla.edu>:

> On 12/3/23 05:15, Andreas F. Borchert wrote:
> > You will have
> > then to possibly live with annoying warnings and odd behaviours.
>
> The C standard says that when P is a null pointer, P==P must be true
> whereas P<=P has undefined behavior. However, in practice any
> implementation where P==P succeeds but P<=P fails is a pedantic
> implementation that is merely enforcing what are arguably bugs in the
> standard. Let's not waste time worrying about implementations like that.
>

I would agree if implementations did this because of pedantry, just
because.  But the problem is if further down in the compiler pipeline, the
P <= P expression produces some intermediate code that becomes equal to
code the optimizer should rightfully optimize away (or replace by the
equivalent of __builtin_unreachable).

Gnulib should lead by example and follow the standard (on systems that
implement the standard.  I like Example 2 on this page [1] very much.  It
shows that well-justified optimizations can easily lead to consequences of
UB that make the compiler look evil (or pedantic).  We cannot circumvent
the principle of explosion ([2]).

Thanks,

Marc

--

[1] https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-In-Action/
[2] https://en.wikipedia.org/wiki/Principle_of_explosion

[-- Attachment #2: Type: text/html, Size: 2653 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-15  0:32                   ` Paul Eggert
@ 2023-12-16 21:09                     ` Bruno Haible
  2023-12-17  8:40                       ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-16 21:09 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen, bug-gnulib, Paul Eggert
  Cc: Jeffrey Walton, Pádraig Brady

[-- Attachment #1: Type: text/plain, Size: 2692 bytes --]

Hi Paul,

Paul Eggert wrote:
> PS. The name END_OF_ARRAY is not the best, since a null pointer does not 
> point to an array, and even when the pointer is non-null the macro can 
> be used to point elsewhere than to the end of an array. A better name 
> would be something like PTR_ADD, since the first arg is the pointer, the 
> second arg the amount to add.

Good point. In the new proposed patch (attached), the macro is called PTR_ADD.

> Thanks, but I'm of two minds about that sort of thing. On the one hand, 
> they document the assumption that (char *) NULL + 0 == NULL. On the 
> other, they are still clutter

I see it differently. We should abandon the thinking that adding
  (char *) NULL + 0
is well-defined. It was probably well-defined in the past, but is not
any more. The standard and its common interpretation are no more like
it was 20 years ago. Together with the standard, the tools change and
will change. The clang UBSAN has changed. GCC's UBSAN may change as well.
Compilers and static analysis tools may follow suit (i.e. give warnings
like "pointer operation on ptr, but ptr may be null").

It's like with other changes we saw in the past:
  - strict aliasing rules - forbid to cast (int**) to (char**) etc.
  - signed integer overflow.
What was once a common style is now outlawed.

Gnulib should not contain outlawed code; otherwise it becomes a bad
citizen that people (rightfully) complain about.

> and this assumption is safe on all but 
> pedantic platforms -- and we've long not worried about pedantic 
> platforms (e.g., gcc -pedantic), for good reason.

My point is not about pedantic platforms. I don't care about 'gcc -pedantic'
since it produces so many pointless diagnostics.

What I want is to be able to use (and that people are able to use)
clang + ASAN + UBSAN as a general compilation environment, since it
provides good runtime checking, better than CHERI and valgrind,
with zero false positives. And the
  (char *) NULL + 0
is the _last_ obstacle; it hinders us from achieving this goal.

> Of course, the fact that you found these issues with clang's runtime 
> checking does not mean that these are all the issues; I fully expect 
> that there are others.

If there are some issues left, there will not be many. By running
clang + UBSAN I found only two issues:
  - obstack,
  - (char *) NULL + 0

> All in all, it may be better to leave the code alone.

I disagree. The goals of
  - making full use of UBSAN,
  - avoid future compiler warnings as compilers get stricter on this issue
are more important than one (one!) additional macro that is used in very few
places. The maintenance cost of having one additional macro is near zero.

Bruno

[-- Attachment #2: 0001-linebuffer-mpsort-Avoid-undefined-behaviour-adding-p.patch --]
[-- Type: text/x-patch, Size: 5337 bytes --]

From ce227dca145176a17342a52a798459fcb37e0bfd Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Sat, 16 Dec 2023 21:40:33 +0100
Subject: [PATCH] linebuffer, mpsort: Avoid undefined behaviour adding pointer
 + integer.

* lib/pointer-op.h: New file.
* lib/linebuffer.c: Include pointer-op.h.
(readlinebuffer_delim): Use PTR_ADD.
* lib/mpsort.c: Include pointer-op.h.
(mpsort): Use PTR_ADD.
* modules/linebuffer (Files): Add lib/pointer-op.h.
* modules/mpsort (Files): Likewise.
---
 ChangeLog          | 11 +++++++++++
 lib/linebuffer.c   |  8 ++++++--
 lib/mpsort.c       |  5 ++++-
 lib/pointer-op.h   | 40 ++++++++++++++++++++++++++++++++++++++++
 modules/linebuffer |  1 +
 modules/mpsort     |  1 +
 6 files changed, 63 insertions(+), 3 deletions(-)
 create mode 100644 lib/pointer-op.h

diff --git a/ChangeLog b/ChangeLog
index 13e80727ba..06ac734286 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2023-12-16  Bruno Haible  <bruno@clisp.org>
+
+	linebuffer, mpsort: Avoid undefined behaviour adding pointer + integer.
+	* lib/pointer-op.h: New file.
+	* lib/linebuffer.c: Include pointer-op.h.
+	(readlinebuffer_delim): Use PTR_ADD.
+	* lib/mpsort.c: Include pointer-op.h.
+	(mpsort): Use PTR_ADD.
+	* modules/linebuffer (Files): Add lib/pointer-op.h.
+	* modules/mpsort (Files): Likewise.
+
 2023-12-14  Paul Eggert  <eggert@cs.ucla.edu>
 
 	mcel-tests: fix thinko in test
diff --git a/lib/linebuffer.c b/lib/linebuffer.c
index 4124bbcb37..19b7325a52 100644
--- a/lib/linebuffer.c
+++ b/lib/linebuffer.c
@@ -20,11 +20,15 @@
 
 #include <config.h>
 
+/* Specification.  */
+#include "linebuffer.h"
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/types.h>
-#include "linebuffer.h"
+
+#include "pointer-op.h"
 #include "xalloc.h"
 
 #if USE_UNLOCKED_IO
@@ -62,7 +66,7 @@ readlinebuffer_delim (struct linebuffer *linebuffer, FILE *stream,
   int c;
   char *buffer = linebuffer->buffer;
   char *p = linebuffer->buffer;
-  char *end = buffer + linebuffer->size; /* Sentinel. */
+  char *end = PTR_ADD (buffer, linebuffer->size); /* Sentinel. */
 
   if (feof (stream))
     return NULL;
diff --git a/lib/mpsort.c b/lib/mpsort.c
index 32cc5e1f82..4e0c3ac70f 100644
--- a/lib/mpsort.c
+++ b/lib/mpsort.c
@@ -19,10 +19,13 @@
 
 #include <config.h>
 
+/* Specification.  */
 #include "mpsort.h"
 
 #include <string.h>
 
+#include "pointer-op.h"
+
 /* The type of qsort-style comparison functions.  */
 
 typedef int (*comparison_function) (void const *, void const *);
@@ -152,5 +155,5 @@ mpsort_with_tmp (void const **restrict base, size_t n,
 void
 mpsort (void const **base, size_t n, comparison_function cmp)
 {
-  mpsort_with_tmp (base, n, base + n, cmp);
+  mpsort_with_tmp (base, n, PTR_ADD (base, n), cmp);
 }
diff --git a/lib/pointer-op.h b/lib/pointer-op.h
new file mode 100644
index 0000000000..4167502a66
--- /dev/null
+++ b/lib/pointer-op.h
@@ -0,0 +1,40 @@
+/* Pointer operations.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is free software: you can redistribute it and/or modify
+   it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   This file is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _POINTER_OP_H
+#define _POINTER_OP_H
+
+/* PTR_ADD(POINTER,N) returns a generalization of the ISO C definition of
+   POINTER + N.
+   Namely, when POINTER + N is well-defined according to ISO C 23 § 6.5.6.(9),
+   this macro returns it.  Otherwise, when N is zero, it merely returns the
+   POINTER.
+
+   If POINTER points to an array of N elements, this macro returns a pointer
+   past the last element of the array.
+
+   For clang, it uses a conditional expression that avoids adding 0 to a null
+   pointer, which is undefined according to ISO C 23 § 6.5.6.(9) and which
+   triggers an error in clang's undefined-behaviour sanitizer.  When no
+   sanitizer is in effect, clang optimizes this conditional expression to
+   exactly the same code.  */
+#if defined __clang__
+# define PTR_ADD(pointer,n) ((n) ? (pointer) + (n) : (pointer))
+#else
+# define PTR_ADD(pointer,n) ((pointer) + (n))
+#endif
+
+#endif /* _POINTER_OP_H */
diff --git a/modules/linebuffer b/modules/linebuffer
index 4ddd3c38ed..d142eeee7b 100644
--- a/modules/linebuffer
+++ b/modules/linebuffer
@@ -4,6 +4,7 @@ Read a line from a stream.
 Files:
 lib/linebuffer.h
 lib/linebuffer.c
+lib/pointer-op.h
 
 Depends-on:
 idx
diff --git a/modules/mpsort b/modules/mpsort
index 054e979c2e..aef0432ce3 100644
--- a/modules/mpsort
+++ b/modules/mpsort
@@ -4,6 +4,7 @@ Sort a vector of pointers to data.
 Files:
 lib/mpsort.h
 lib/mpsort.c
+lib/pointer-op.h
 m4/mpsort.m4
 
 Depends-on:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-16 21:09                     ` Bruno Haible
@ 2023-12-17  8:40                       ` Paul Eggert
  2023-12-18 13:27                         ` Bruno Haible
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2023-12-17  8:40 UTC (permalink / raw)
  To: Bruno Haible, Marc Nieper-Wißkirchen, bug-gnulib
  Cc: Jeffrey Walton, Pádraig Brady

On 2023-12-16 13:09, Bruno Haible wrote:
> It's like with other changes we saw in the past:
>    - strict aliasing rules - forbid to cast (int**) to (char**) etc.
>    - signed integer overflow.
> What was once a common style is now outlawed.

It's not like those changes, because those other changes were put in for 
good reasons - to improve performance and/or catch real bugs. This 
change, in practice, makes performance worse, and it doesn't catch real 
bugs. It's pure pedantry.

The bad citizen here is clang, not Gnulib or Coreutils. Or maybe it's 
the C standard; it doesn't matter. Clang needn't (and doesn't) complain 
about every violation of the C standard, and it should not be 
complaining about this one.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-16 12:42                     ` Marc Nieper-Wißkirchen
@ 2023-12-17  8:50                       ` Paul Eggert
  2023-12-17 10:16                         ` Marc Nieper-Wißkirchen
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2023-12-17  8:50 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen; +Cc: Andreas F. Borchert, bug-gnulib

On 2023-12-16 04:42, Marc Nieper-Wißkirchen wrote:
> the problem is if further down in the compiler pipeline, the
> P <= P expression produces some intermediate code that becomes equal to
> code the optimizer should rightfully optimize away

I'm skeptical that such optimizations would be worth the trouble in 
practice, because any optimization where P==P does not imply P<=P would 
run contrary to common sense and programmer intuition. In practice, 
compiler writers who'd employ such optimizations would likely cause more 
trouble than they'd cure, and I wouldn't want to encourage them.

If I'm wrong, and in the future these optimizations become valuable and 
break programs like Coreutils, we can revisit the issue then. In the 
meantime let's leave these sleeping dogs lie. I have little doubt that 
Bruno has missed many of the issues and it'd be a waste of our time to 
try to find the rest of them.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: pointer addition and arrays
  2023-12-17  8:50                       ` Paul Eggert
@ 2023-12-17 10:16                         ` Marc Nieper-Wißkirchen
  0 siblings, 0 replies; 29+ messages in thread
From: Marc Nieper-Wißkirchen @ 2023-12-17 10:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Marc Nieper-Wißkirchen, Andreas F. Borchert, bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]

Am So., 17. Dez. 2023 um 09:50 Uhr schrieb Paul Eggert <eggert@cs.ucla.edu>:

> On 2023-12-16 04:42, Marc Nieper-Wißkirchen wrote:
> > the problem is if further down in the compiler pipeline, the
> > P <= P expression produces some intermediate code that becomes equal to
> > code the optimizer should rightfully optimize away
>
> I'm skeptical that such optimizations would be worth the trouble in
> practice, because any optimization where P==P does not imply P<=P would
> run contrary to common sense and programmer intuition.


The C standard is full of such examples "contrary to common sense", and in
related areas (e.g. comparison/overflow of (un)signed integers) optimizers
are already using these.


> In practice,
> compiler writers who'd employ such optimizations would likely cause more
> trouble than they'd cure, and I wouldn't want to encourage them.
>
> If I'm wrong, and in the future these optimizations become valuable and
> break programs like Coreutils, we can revisit the issue then. In the
> meantime let's leave these sleeping dogs lie. I have little doubt that
> Bruno has missed many of the issues and it'd be a waste of our time to
> try to find the rest of them.
>

We don't have to hunt all these issues, but we should correct those we find.

The C standard has always favoured the freedom of compiler writers for
efficiency, so it is definitely not wrong by its own measures.  It may not
be the best decision for every application area, but there are other
programming languages besides C.

[-- Attachment #2: Type: text/html, Size: 2362 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-17  8:40                       ` Paul Eggert
@ 2023-12-18 13:27                         ` Bruno Haible
  2023-12-22  4:58                           ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Bruno Haible @ 2023-12-18 13:27 UTC (permalink / raw)
  To: bug-gnulib, Paul Eggert
  Cc: Marc Nieper-Wißkirchen, Jeffrey Walton, Pádraig Brady

Paul Eggert wrote:
> The bad citizen here is clang, not Gnulib or Coreutils. Or maybe it's 
> the C standard; it doesn't matter. Clang needn't (and doesn't) complain 
> about every violation of the C standard, and it should not be 
> complaining about this one.

I would be willing to
  1) open a bug report with the Clang ASAN people, asking them to move
     the   (char *) NULL + 0   detection to a separate, "pedantic"
     category.

But this would not be enough. Marc has pointed us to
  https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-In-Action/
example 2, which highlights that compiler writers exploit undefined behaviour
knowledge in order to strengthen optimizations. The same kind of optimization
is possible here: When a compiler sees
   <pointer> + <integer>
it can, by ISO C 23, deduce that the <pointer> is non-NULL. 

We would thus also need to
  2) ask the clang people whether they can promise that they will never make
     this optimization,
  3) ask the GCC people the same thing.

Do you think there's a chance they will say "yes, we promise"? I don't think
so. Rather they will probably say "interesting, here's another possible
optimization; let's use it!".

> > It's like with other changes we saw in the past:
> >    - strict aliasing rules - forbid to cast (int**) to (char**) etc.
> >    - signed integer overflow.
> > What was once a common style is now outlawed.
> 
> It's not like those changes, because those other changes were put in for 
> good reasons - to improve performance and/or catch real bugs. This 
> change, in practice, makes performance worse, and it doesn't catch real 
> bugs. It's pure pedantry.

I disagree. It's very much like the strict aliasing and the signed integer
overflow. Like these, the   <pointer> + <integer>   scenario with <pointer>
being NULL is an undefined-behaviour scenario that compiler writers will
be eager to exploit for optimization. And like in those other two cases,
our only choice is to be ISO C compliant, otherwise we risk "shocking"
optimizations.

Bruno





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: undefined-behavior obstack.c:139
  2023-12-18 13:27                         ` Bruno Haible
@ 2023-12-22  4:58                           ` Paul Eggert
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Eggert @ 2023-12-22  4:58 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib
  Cc: Marc Nieper-Wißkirchen, Jeffrey Walton, Pádraig Brady

On 2023-12-18 05:27, Bruno Haible wrote:
> I would be willing to
>    1) open a bug report with the Clang ASAN people, asking them to move
>       the   (char *) NULL + 0   detection to a separate, "pedantic"
>       category.
> ...
> We would thus also need to
>    2) ask the clang people whether they can promise that they will never make
>       this optimization,
>    3) ask the GCC people the same thing.
> 
> Do you think there's a chance they will say "yes, we promise"?

Yes, of course, that's the point. Lots of real code makes this 
assumption, and the clang folks need to know about that (if they don't 
know already) so that they don't put in optimizations that would break 
real code. This sort of tradeoff is routine for compiler-writers.

This is what makes it different from the example 2 in 
<https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-In-Action/>. 
That's a contrived example, where the application writer "wants" the 
program to dump core via dereferencing a null pointer, which is 
obviously bogus behavior.

In contrast, adding 0 to a null pointer is not obviously bogus behavior; 
it works on all non-pedantic platforms and it's realistic to expect it 
to work.


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-12-22  4:58 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-01 13:14 undefined-behavior obstack.c:139 Bruno Haible
2023-12-01 13:34 ` Jeffrey Walton
2023-12-01 16:01   ` Bruno Haible
2023-12-01 16:14     ` Marc Nieper-Wißkirchen
2023-12-01 18:40       ` Bruno Haible
2023-12-01 19:08         ` Marc Nieper-Wißkirchen
2023-12-01 20:01         ` Paul Eggert
2023-12-01 20:36           ` Marc Nieper-Wißkirchen
2023-12-01 22:23             ` Paul Eggert
2023-12-02  9:04           ` Bruno Haible
2023-12-03  9:04             ` Paul Eggert
2023-12-03  9:17               ` Marc Nieper-Wißkirchen
2023-12-04 13:57                 ` Bruno Haible
2023-12-15  0:32                   ` Paul Eggert
2023-12-16 21:09                     ` Bruno Haible
2023-12-17  8:40                       ` Paul Eggert
2023-12-18 13:27                         ` Bruno Haible
2023-12-22  4:58                           ` Paul Eggert
2023-12-01 18:36     ` Andreas F. Borchert
2023-12-02  8:50       ` Bruno Haible
2023-12-02 17:16         ` Andreas F. Borchert
2023-12-03  7:41           ` pointer addition and arrays Bruno Haible
2023-12-03  9:25             ` Andreas F. Borchert
2023-12-03 10:03               ` Bruno Haible
2023-12-03 13:15                 ` Andreas F. Borchert
2023-12-15  1:00                   ` Paul Eggert
2023-12-16 12:42                     ` Marc Nieper-Wißkirchen
2023-12-17  8:50                       ` Paul Eggert
2023-12-17 10:16                         ` Marc Nieper-Wißkirchen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).