From: Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org>
To: Paul Eggert <eggert@cs.ucla.edu>, A <amit234234234234@gmail.com>,
libc-alpha@sourceware.org
Subject: Re: size_t vs long.
Date: Thu, 17 Nov 2022 21:27:25 +0100
Message-ID: <380b196e-b78e-3b0e-7399-ee106b0e716c@gmail.com>
In-Reply-To: <dd16db9e-bdfe-901d-9b9f-c0aa2836e55e@cs.ucla.edu>
Hi Paul,
On 11/17/22 20:17, Paul Eggert wrote:
> On 2022-11-17 01:21, Alejandro Colomar via Libc-alpha wrote:
>
>> I'd like to change your opinion. Please read this excellent article by Jens
>> Gustedt (member of WG14, the group that develops the ISO C standard) which
>> explains why size_t is better:
>>
>> <https://gustedt.wordpress.com/2013/07/15/a-praise-of-size_t-and-other-unsigned-types/>
>
> Sorry, but that article is not excellent: it's mostly wrong. Among other things
> it says size_t is better because it lets you write code like this:
>
>> for (size_t i = 41; i < sizeof A / sizeof A[0]; --i) {
>>     A[i] = something_nice;
>> }
>
> and that there will be "No traps, no signals, no exceptions".
>
> First, Gustedt is technically incorrect, because the code *can* trap on platforms
> where SIZE_MAX <= INT_MAX,
First of all, let me suggest that this is not a problem of that kind of code,
but rather a bug in the language: default promotion to int is the underlying
problem, and root of much evil.
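A minimal sketch of what promotion to int does to a "zero minus one" on an unsigned type. The helper names are hypothetical, and the two static assertions spell out the assumptions: unsigned char fits in int (so it gets promoted), and size_t is at least as wide as unsigned int (so it does not).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions of this sketch, checked at compile time. */
static_assert(UCHAR_MAX <= INT_MAX, "sketch assumes unsigned char promotes to int");
static_assert(SIZE_MAX >= UINT_MAX, "sketch assumes size_t is not promoted to int");

/* Hypothetical helper: returns 1 if 0 - 1 on an unsigned char is negative.
 * The operand is promoted to int first, so the subtraction is signed. */
int narrow_zero_minus_one_is_negative(void)
{
    unsigned char c = 0;
    return (c - 1) < 0;
}

/* Hypothetical helper: returns 1 if 0 - 1 on a size_t wraps to SIZE_MAX.
 * No promotion to int happens, so the arithmetic stays unsigned. */
int size_zero_minus_one_wraps(void)
{
    size_t i = 0;
    return (i - 1) == SIZE_MAX;
}
```

On a platform where SIZE_MAX <= INT_MAX, size_t would behave like the unsigned char case here, which is the situation being discussed.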
But let's continue. SIZE_MAX <= INT_MAX really amounts to platforms where
sizeof(size_t) < sizeof(int).
I honestly don't know of any existing platform where that is true; I've
searched, but couldn't find one. It's possible that one of those very old
unicorn platforms makes this true, and if you know of any, I'm curious to know
which it is.
For future platforms, since we've learnt that we want size_t to be at least 64
bits, I guess this can happen in a hypothetical platform where size_t is 64
bits, and int is 128. I hope we've also learnt that default promotion to int is
bad. And so I hope that no one develops such an arch, and that we all do our
best to minimize the damage that default promotion to int can do, by keeping
int smaller than most useful sizes, be it by increasing size_t, or by not
increasing int.
For the time being, and while no one points me to an existing platform where
sizeof(size_t) < sizeof(int) (and even if it exists, I only care about POSIX
platforms, where we can probably assume that's not going to happen ever, if only
for not breaking existing code), I'll assume such a platform doesn't exist.
If this ever becomes a real concern:
    _Static_assert(sizeof(size_t) >= sizeof(int), "This platform is out of luck.");
> because on such a platform when i is zero, '--i' can
> store a trap value into i.
So many things need to be broken in an arch for that to happen. BTW, C23 will
require that signed integers are 2's complement, which I guess removes the
possibility of a trap, IIRC. But I, like you, would prefer the trap if I ever
met an arch where size_t gets promoted to int.
>
> Second and more important, that code is bogus. Nobody should ever write code
> like that. If I wrote code like that, I'd *want* a trap.
    for (size_t i = 41; i < sizeof A / sizeof A[0]; --i) {
        A[i] = something_nice;
    }
The code above looks like a bug only if you're not used to it. Once you get
used to it, it can become natural, but let's go for the more natural:
    for (size_t i = 0; i < sizeof A / sizeof A[0]; ++i) {
        A[i] = something_nice;
    }
The main advantage of this code compared to the equivalent ssize_t or ptrdiff_t
or idx_t code is that if you somehow write an off-by-one error, and manage to
access the array at [-1], if i is unsigned you'll access [SIZE_MAX], which will
definitely crash your program. An access to [-1] might instead overwrite some
valuable data. This is an important point for unsigned types.
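The wrap-around behind both loops can be sketched as follows. All function names are hypothetical, and the sketch assumes size_t is not promoted to int. It only shows that an index one below zero becomes SIZE_MAX and therefore fails a single `i < n` test; whether an unchecked access at that index faults depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the index one below i; wraps to SIZE_MAX at zero. */
size_t index_before(size_t i)
{
    return i - 1;
}

/* Hypothetical helper: store value at a[i] only if i is within [0, n).
 * Returns 1 on success, 0 if the index was rejected.  One unsigned
 * comparison rejects both too-large and wrapped "negative" indices. */
int checked_store(int *a, size_t n, size_t i, int value)
{
    assert(a != NULL);
    if (i >= n)
        return 0;
    a[i] = value;
    return 1;
}

/* Descending loop in the style quoted above: it stops when --i wraps
 * past zero, because SIZE_MAX is not less than n.  Returns the number
 * of elements written. */
size_t fill_downwards(int *a, size_t n, size_t start, int value)
{
    size_t count = 0;

    for (size_t i = start; i < n; --i) {
        a[i] = value;
        ++count;
    }
    return count;
}
```

For an array of 42 elements, `fill_downwards(a, 42, 41, v)` writes every element from a[41] down to a[0] and then stops, while a start index of 42 or more writes nothing.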
> Traps are *good* when
> they prevent buggy code from doing further damage.
We seem to agree on this sentence. It's actually the main reason I like size_t
for indices, as explained in my paragraph above.
>
> For what it's worth, in Gnulib's more recent code we've been using the type
> "idx_t". It is a signed type, thus avoiding C's bug-inducing comparison rules,
> where most size_t values compare to be less than -1.
> However, by convention idx_t contains only nonnegative values.
I agree that one should try to avoid comparing signed and unsigned integers.
That's actually very doable. The main issue against using unsigned indices is
'argc', where I can't use unsigned. For the rest of the code, it is usually
easy to keep the separation between signed and unsigned types, and not mix them.
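For reference, a small sketch of the comparison rule being discussed and one way to keep the two worlds apart at a boundary such as argc. The function names are hypothetical. The first helper writes the conversion explicitly; it mirrors what the usual arithmetic conversions do silently for `u < n` when size_t is at least as wide as int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: what a plain `u < n` evaluates to on common
 * platforms.  The int operand is converted to size_t, so -1 becomes
 * SIZE_MAX and almost every u compares as smaller. */
int converted_less(size_t u, int n)
{
    return u < (size_t)n;
}

/* Hypothetical helper: compares by mathematical value.  A negative n
 * is handled first, so the cast is only applied to nonnegative values. */
int value_less(size_t u, int n)
{
    return n >= 0 && u < (size_t)n;
}
```

With u = 5 and n = -1, the first helper reports "less" and the second does not; with n = 6 both agree.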
>
> The idx_t type is *much* better than size_t, both because we can tell the
> compiler to do some overflow checking on it, and because it compares nicely to
> ordinary integers. This overcomes two major disadvantages of size_t.
Ignoring odd platforms, idx_t loses much of its advantage. With idx_t you may
be able to use sanitizers to check overflow (if you don't, it's likely that
you'll invoke critical UB, since [-1] is unlikely to crash). With size_t you
get a crash for free if you go off-by-minus-one.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>