From: Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org>
To: Paul Eggert <eggert@cs.ucla.edu>, libc-alpha@sourceware.org
Cc: gcc@gcc.gnu.org, A <amit234234234234@gmail.com>
Subject: Using size_t to crash on off-by-one errors (was: size_t vs long.)
Date: Wed, 23 Nov 2022 21:08:17 +0100
Message-ID: <148dc963-1d9c-b7d8-e5bf-6843b4b36882@gmail.com>
In-Reply-To: <683baaee-f3dc-bc13-c303-8fb0df0d0a36@gmail.com>
Hi,
On 11/18/22 00:04, Alejandro Colomar wrote:
>>> The main advantage of this code compared to the equivalent ssize_t or
>>> ptrdiff_t or idx_t code is that if you somehow write an off-by-one error, and
>>> manage to access the array at [-1], if i is unsigned you'll access
>>> [SIZE_MAX], which will definitely crash your program.
>>
>> That's not true on the vast majority of today's platforms, which don't have
>> subscript checking, and for which a[-1] is treated the same way a[SIZE_MAX]
>> is. On my platform (Fedora 36 x86-64) the same machine code is generated for
>> 'a' and 'b' for the following C code.
>>
>> #include <stdint.h>
>> int a(int *p) { return p[-1]; }
>> int b(int *p) { return p[SIZE_MAX]; }
>
> Hmm, this seems to be true in my platform (amd64) per the experiment I just did:
>
> $ cat s.c
> #include <sys/types.h>
>
> char
> f(char *p, ssize_t i)
> {
>     return p[i];
> }
> $ cat u.c
> #include <stddef.h>
>
> char
> f(char *p, size_t i)
> {
>     return p[i];
> }
> $ cc -Wall -Wextra -Werror -S -O3 s.c u.c
> $ diff -u u.s s.s
> --- u.s 2022-11-17 23:41:47.773805041 +0100
> +++ s.s 2022-11-17 23:41:47.761805265 +0100
> @@ -1,15 +1,15 @@
> - .file "u.c"
> + .file "s.c"
> .text
> .p2align 4
> .globl f
> .type f, @function
> f:
> -.LFB0:
> +.LFB6:
> .cfi_startproc
> movzbl (%rdi,%rsi), %eax
> ret
> .cfi_endproc
> -.LFE0:
> +.LFE6:
> .size f, .-f
> .ident "GCC: (Debian 12.2.0-9) 12.2.0"
> .section .note.GNU-stack,"",@progbits
>
>
> It seems a violation of the standard, isn't it?
>
> The operator [] doesn't have a type, and an argument to it should be treated
> with whatever type it has after default promotions. If I pass a size_t to it,
> the type should be unsigned, and that should be preserved, by accessing the
> array at a high value, which the compiler has no way to know if it will exist or
> not, by that function definition. The extreme of -1 and SIZE_MAX might be not
> the best one, since we would need a pointer to be 0 to be accessible at
> [SIZE_MAX], but if you replace those by -RANDOM, and (size_t)-RANDOM, then the
> compiler definitely needs to generate different code, yet it doesn't.
>
> I'm guessing this is an optimization by GCC knowing that we will never be close
> to using the whole 64-bit address space. If we use int and unsigned, things
> change:
>
> $ cat s.c
> char
> f(char *p, int i)
> {
>     return p[i];
> }
> $ cat u.c
> char
> f(char *p, unsigned i)
> {
>     return p[i];
> }
> $ cc -Wall -Wextra -Werror -S -O3 s.c u.c
> $ diff -u u.s s.s
> --- u.s 2022-11-17 23:44:54.446318186 +0100
> +++ s.s 2022-11-17 23:44:54.434318409 +0100
> @@ -1,4 +1,4 @@
> - .file "u.c"
> + .file "s.c"
> .text
> .p2align 4
> .globl f
> @@ -6,7 +6,7 @@
> f:
> .LFB0:
> .cfi_startproc
> - movl %esi, %esi
> + movslq %esi, %rsi
> movzbl (%rdi,%rsi), %eax
> ret
> .cfi_endproc
>
>
> I'm guessing that GCC doesn't do the assumption here, and I guess the unsigned
> version would crash, while the signed version would cause nasal demons. Anyway,
> now that I'm here, I'll test it:
>
>
> $ cat s.c
> [[gnu::noipa]]
> char
> f(char *p, int i)
> {
>     return p[i];
> }
>
> int main(void)
> {
>     int i = -1;
>     char c[4];
>
>     return f(c, i);
> }
> $ cc -Wall -Wextra -Werror -O3 s.c
> $ ./a.out
> $ echo $?
> 0
>
>
> $ cat u.c
> [[gnu::noipa]]
> char
> f(char *p, unsigned i)
> {
>     return p[i];
> }
>
> int main(void)
> {
>     unsigned i = -1;
>     char c[4];
>
>     return f(c, i);
> }
> $ cc -Wall -Wextra -Werror -O3 u.c
> $ ./a.out
> Segmentation fault
>
>
> I get this SEGV difference consistently. I CCed gcc@ in case they consider this
> to be something they want to address. Maybe the optimization is important for
> size_t-sized indices, but if it is not, I'd prefer getting the SEGV for SIZE_MAX.
>
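After some thought, of course the compiler can't produce any different code
for size_t and ssize_t indices, since pointers are 64 bits too: the address
computation wraps modulo 2^64, so p[-1] and p[SIZE_MAX] end up at the same
address. A different story would be if pointers were 128 bits, but that might
cause its own issues; should sizes still be 64 bits, or 128 bits? Maybe using
a configurable size_t would be interesting for debugging. Anyway, it's good to
know that tweaking size_t to be 32 bits in some debug builds might help catch
some off-by-one errors, since an unsigned 32-bit index is zero-extended and
lands about 4 GiB past the array instead of one element before it.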
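
To make the arithmetic concrete, here is a minimal sketch that models the
address computation with plain 64-bit integers instead of real pointers, so
nothing in it is undefined behavior. The function names and the base address
0x1000 are hypothetical, made up for illustration; the sketch assumes a flat
64-bit address space like the amd64 one used above, and it is not what GCC
does internally, only the same wraparound written out by hand.

```c
#include <stdint.h>

/* Model of &p[i] for a char array at address 'base' in a flat 64-bit
 * address space.  All arithmetic is unsigned, so it wraps modulo 2^64. */

/* Index as wide as a pointer, signed (think ssize_t). */
uint64_t
addr_index64_signed(uint64_t base, int64_t i)
{
    return base + (uint64_t) i;
}

/* Index as wide as a pointer, unsigned (think size_t). */
uint64_t
addr_index64_unsigned(uint64_t base, uint64_t i)
{
    return base + i;
}

/* 32-bit signed index: sign-extended first (the movslq case). */
uint64_t
addr_index32_signed(uint64_t base, int32_t i)
{
    return base + (uint64_t) (int64_t) i;
}

/* 32-bit unsigned index: zero-extended first (the movl case). */
uint64_t
addr_index32_unsigned(uint64_t base, uint32_t i)
{
    return base + (uint64_t) i;
}
```

With base = 0x1000, the two 64-bit variants called with -1 and UINT64_MAX both
give 0xfff, which is why s.s and u.s were identical. The 32-bit signed variant
with -1 also gives 0xfff, while the 32-bit unsigned variant with UINT32_MAX
gives 0x100000fff, far enough from a small stack array that it is likely to be
unmapped, which matches the SEGV I saw.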
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>