Re: size_t vs long. - Alejandro Colomar via Libc-alpha

unofficial mirror of libc-alpha@sourceware.org

From: Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org>
To: Paul Eggert <eggert@cs.ucla.edu>, A <amit234234234234@gmail.com>,
	libc-alpha@sourceware.org
Subject: Re: size_t vs long.
Date: Thu, 17 Nov 2022 21:27:25 +0100
Message-ID: <380b196e-b78e-3b0e-7399-ee106b0e716c@gmail.com>
In-Reply-To: <dd16db9e-bdfe-901d-9b9f-c0aa2836e55e@cs.ucla.edu>


Hi Paul,

On 11/17/22 20:17, Paul Eggert wrote:
> On 2022-11-17 01:21, Alejandro Colomar via Libc-alpha wrote:
> 
>> I'd like to change your opinion.  Please read this excellent article by Jens 
>> Gustedt (member of WG14, the group that develops the ISO C standard) which 
>> explains why size_t is better:
>>
>> <https://gustedt.wordpress.com/2013/07/15/a-praise-of-size_t-and-other-unsigned-types/>
> 
> Sorry, but that article is not excellent: it's mostly wrong. Among other things 
> it says size_t is better because it lets you write code like this:
> 
>> for (size_t i = 41; i < sizeof A / sizeof A[0]; --i) {
>>    A[i] = something_nice;
>> }
> 
> and that there will be "No traps, no signals, no exceptions".
> 
> First, Gustedt is technically incorrect, because the code *can* trap on platforms 
> where SIZE_MAX <= INT_MAX,

First of all, let me suggest that this is not a problem with that kind of code, 
but rather a bug in the language: default promotion to int is the underlying 
problem, and the root of much evil.

But let's continue.  SIZE_MAX <= INT_MAX in practice (barring padding bits) 
amounts to platforms where sizeof(size_t) < sizeof(int).

I honestly don't know of any existing platform where that is true; I've been 
searching, but couldn't find any.  I expect it's possible that one of those very 
old unicorn platforms makes this true, and if you know of any, I'm curious to 
know which one it is.

For future platforms, since we've learnt that we want size_t to be at least 64 
bits, I guess this could happen on a hypothetical platform where size_t is 64 
bits and int is 128.  I hope we've also learnt that default promotion to int is 
bad.  And so I hope that no-one develops such an arch, and that we all do our 
best to minimize the damage that default promotion to int can do, by keeping int 
smaller than most useful sizes, be it by increasing size_t or by not increasing 
int.

For the time being, and until someone points me to an existing platform where 
sizeof(size_t) < sizeof(int), I'll assume such a platform doesn't exist.  (Even 
if one exists, I only care about POSIX platforms, where we can probably assume 
it's never going to happen, if only to avoid breaking existing code.)

If this ever becomes a real concern:

_Static_assert(sizeof(size_t) >= sizeof(int), "This platform is out of luck.");

> because on such a platform when i is zero, '--i' can 
> store a trap value into i.

So many things need to be broken in an arch for that to happen.  BTW, C23 will 
require that signed integers are two's complement, which I guess removes the 
possibility of a trap, IIRC.  But, like you, I'd prefer the trap if I ever meet 
an arch where size_t is promoted to int.

> 
> Second and more important, that code is bogus. Nobody should ever write code 
> like that. If I wrote code like that, I'd *want* a trap.

for (size_t i = 41; i < sizeof A / sizeof A[0]; --i) {
    A[i] = something_nice;
}

The code above looks like a bug only because we're not used to it.  Once you get 
used to it, it can become natural, but let's go with the more familiar form:

for (size_t i = 0; i < sizeof A / sizeof A[0]; ++i) {
    A[i] = something_nice;
}

The main advantage of this code compared to the equivalent ssize_t, ptrdiff_t, 
or idx_t code is that if you somehow write an off-by-one error and manage to 
access the array at [-1], with an unsigned i you'll access [SIZE_MAX] instead, 
which will definitely crash your program.  An access to [-1] might instead 
overwrite some valuable data.  This is an important point in favor of unsigned 
types.

> Traps are *good* when 
> they prevent buggy code from doing further damage.

We seem to agree on this sentence.  It's actually the main reason I like size_t 
for indices, as explained in my paragraph above.

> 
> For what it's worth, in Gnulib's more recent code we've been using the type 
> "idx_t". It is a signed type, thus avoiding C's bug-inducing comparison rules, 
> where most size_t values compare to be less than -1.
> However, by convention idx_t contains only nonnegative values.

I agree that one should try to avoid comparing signed and unsigned integers. 
That's actually very doable.  The main obstacle to using unsigned indices is 
'argc', which is an int and can't be made unsigned.  For the rest of the code, 
it is usually easy to keep signed and unsigned types separate and not mix them.

> 
> The idx_t type is *much* better than size_t, both because we can tell the 
> compiler to do some overflow checking on it, and because it compares nicely to 
> ordinary integers. This overcomes two major disadvantages of size_t.

Ignoring odd platforms, idx_t loses much of its advantage.  With idx_t you may 
be able to use sanitizers to check for overflow (and if you don't, it's likely 
that you'll invoke critical UB, since [-1] is unlikely to crash).  With size_t 
you get a crash for free if you go off by minus one.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>
