bug-gnulib@gnu.org mirror (unofficial)
* preferring ptrdiff_t to size_t for object counts
@ 2017-06-05  6:45 Paul Eggert
  2017-06-05  9:57 ` Bruno Haible
  2017-06-05 10:07 ` Bruno Haible
  0 siblings, 2 replies; 6+ messages in thread
From: Paul Eggert @ 2017-06-05  6:45 UTC
  To: Gnulib bugs

GNU Emacs has long been using signed types (typically ptrdiff_t) to count 
objects. This has the advantage that signed integer overflow can be detected 
automatically on some platforms (unfortunately, size_t arithmetic silently wraps 
around). I would like to change the Gnulib modules that GNU Emacs uses, to use 
this style. The main effect on these modules' non-Emacs users would be:

* They accept ptrdiff_t counts, not size_t counts. Normally sizes are computed 
by new functions like xwgrowalloc. When the caller computes sizes by hand, it is 
the caller's responsibility to check for integer overflow.

* They report errors via xwalloc_die, not xalloc_die.
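
As an illustration of the caller-side overflow check mentioned in the first 
item, here is a minimal sketch. The helper name checked_nbytes is 
hypothetical; it is not part of Gnulib or of the proposed patches, and it only 
shows one way a caller might compute a byte count from ptrdiff_t values 
without overflowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Gnulib: store n * size in *result
   and return true, or return false if the product would exceed
   PTRDIFF_MAX.  Both arguments must be nonnegative.  */
static bool
checked_nbytes (ptrdiff_t n, ptrdiff_t size, ptrdiff_t *result)
{
  assert (0 <= n && 0 <= size);
  /* Test by division, so that the check itself cannot overflow.  */
  if (size != 0 && PTRDIFF_MAX / size < n)
    return false;
  *result = n * size;
  return true;
}
```

A caller would test the return value and report an allocation failure (for 
example via xwalloc_die) when it is false.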

I've also changed the modules that GNU grep uses, as a test that this idea works 
on non-Emacs applications.

As this is a nontrivial change, I'll post the Gnulib patches first without 
installing them, for discussion.



* Re: preferring ptrdiff_t to size_t for object counts
  2017-06-05  6:45 preferring ptrdiff_t to size_t for object counts Paul Eggert
@ 2017-06-05  9:57 ` Bruno Haible
  2017-06-07 21:53   ` Bruno Haible
  2017-06-05 10:07 ` Bruno Haible
  1 sibling, 1 reply; 6+ messages in thread
From: Bruno Haible @ 2017-06-05  9:57 UTC
  To: bug-gnulib; +Cc: Paul Eggert

Hi Paul,

> GNU Emacs has long been using signed types (typically ptrdiff_t) to count 
> objects. This has the advantage that signed integer overflow can be detected 
> automatically on some platforms (unfortunately, size_t arithmetic silently wraps 
> around).

I have one objection, but a big one: The direct use of ptrdiff_t.

Reasons:

1) Like you, I spend time reviewing code other people have written. In these
   code reviews, it is important to know whether a variable is known to always
   be >= 0 or not.

   For example, when we have
     int n = ...;
     for (int i = 0; i < n; i++) ...
   I always have to spend brain cycles on the question "what if n < 0?
   Does the code still achieve its goal in this case?"

   Whereas if the type clearly states the intent to store only values >= 0,
   there is no issue; no extra brain cycles required.

2) Standards change, and the considerations behind 'walloc' may also change.

   Do you want, 5 or 10 years from now, to go through hundreds of uses of
   'ptrdiff_t' and separate those uses with values >= 0 from those with values
   that can be negative? I certainly don't want to.

3) GCC has range types for Ada. I would hope that someday it also has range
   types for C or C++. Then, it would be very useful to express the fact that
   the values are in the range [0..PTRDIFF_MAX], so that GCC can use it for
   optimization.

4) For static analysis tools (gnulib now uses Coverity in particular), I can
   imagine that an unsigned type is easier to work with than a signed type
   (i.e. that the tool can make more inferences and therefore detect more bugs
   when using unsigned types).
   To this end, it would be useful to use an unsigned type for those counters /
   size_t objects, *just* for the static analysis tool.

To fix all of these issues, I suggest using a typedef'ed type instead. For
example:

  typedef ptrdiff_t wsize_t;

And then use wsize_t everywhere.

This solves problems 1), 2), 3), and 4) (the last one through an #ifdef'ed
definition of wsize_t).

Yes it means that people reading the code will have to memorize one more type
identifier. But it is to their benefit: they will know the values are >= 0
(see point 1).
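
The #ifdef'ed definition could look roughly like the following sketch. The
macro name STATIC_ANALYSIS and the function sum are assumptions made up for
illustration; they are not existing Gnulib or Coverity symbols.

```c
#include <assert.h>
#include <stddef.h>

/* STATIC_ANALYSIS is a hypothetical macro that a build for a static
   analysis tool would define.  */
#ifdef STATIC_ANALYSIS
typedef size_t wsize_t;     /* unsigned, only for the analyzer */
#else
typedef ptrdiff_t wsize_t;  /* signed in ordinary builds; values are >= 0 */
#endif

/* Sum the first n elements of a.  The parameter type documents that
   n is never negative, so the reader need not consider n < 0.  */
static long
sum (const int *a, wsize_t n)
{
  assert (a != NULL || n == 0);
  long total = 0;
  for (wsize_t i = 0; i < n; i++)
    total += a[i];
  return total;
}
```

Code written against wsize_t compiles with either definition as long as it
never relies on negative values, which is exactly the property the typedef is
meant to document.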

Bruno




* Re: preferring ptrdiff_t to size_t for object counts
  2017-06-05  6:45 preferring ptrdiff_t to size_t for object counts Paul Eggert
  2017-06-05  9:57 ` Bruno Haible
@ 2017-06-05 10:07 ` Bruno Haible
  1 sibling, 0 replies; 6+ messages in thread
From: Bruno Haible @ 2017-06-05 10:07 UTC
  To: bug-gnulib; +Cc: Paul Eggert

Hi Paul,

I'd like to understand how much better this "ptrdiff_t world" is.

> This has the advantage that signed integer overflow can be detected 
> automatically on some platforms

You mean "-fsanitize=undefined", right?

Does this also catch the following situations?

  a) Pointer subtraction. ISO C11 § J.2 says:
     "The behavior is undefined in the following circumstances: ...
      The result of subtracting two pointers is not representable in an object
      of type ptrdiff_t (6.5.6)."

  b) When assigning a 'size_t' value > PTRDIFF_MAX to a 'ptrdiff_t' variable,
     is that undefined behaviour? Is that caught by "-fsanitize=undefined"?
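
To make case b) concrete, here is a minimal sketch of a conversion that checks
the range explicitly instead of relying on the compiler or a sanitizer. The
helper name size_to_ptrdiff is hypothetical. For background, and subject to
checking the standard text: C11 § 6.3.1.3 describes an out-of-range conversion
to a signed integer type as producing an implementation-defined result (or
raising an implementation-defined signal), not as undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert s to ptrdiff_t.  Return false, leaving
   *result untouched, if s does not fit.  */
static bool
size_to_ptrdiff (size_t s, ptrdiff_t *result)
{
  assert (result != NULL);
  /* Compare in uintmax_t, which can hold both operands exactly.  */
  if ((uintmax_t) PTRDIFF_MAX < s)
    return false;
  *result = (ptrdiff_t) s;
  return true;
}
```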

Bruno




* Re: preferring ptrdiff_t to size_t for object counts
  2017-06-05  9:57 ` Bruno Haible
@ 2017-06-07 21:53   ` Bruno Haible
  2017-06-07 22:12     ` Paul Eggert
  0 siblings, 1 reply; 6+ messages in thread
From: Bruno Haible @ 2017-06-07 21:53 UTC
  To: bug-gnulib; +Cc: Paul Eggert

I wrote:
>   typedef ptrdiff_t wsize_t;

'wsize_t' or 'wcount_t'. I don't really mind the name of the type - as
long as it's a typedef.

Bruno




* Re: preferring ptrdiff_t to size_t for object counts
  2017-06-07 21:53   ` Bruno Haible
@ 2017-06-07 22:12     ` Paul Eggert
  2017-06-08  0:36       ` Bruno Haible
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Eggert @ 2017-06-07 22:12 UTC
  To: Bruno Haible, bug-gnulib

On 06/07/2017 02:53 PM, Bruno Haible wrote:
> I don't really mind the name of the type - as
> long as it's a typedef.

I've been leaning towards a name that doesn't start with 'w', since the 
type is not specific to the walloc module family. The name I'm currently 
thinking of is 'in_t', short for "index type". That's an 
easy-to-remember name (the type is like 'int', but possibly wider).

One other advantage of having our own signed type is that we can 
guarantee that it's at least as wide as int (something that is not true 
for ptrdiff_t). That way, some of my current code that says 'MIN 
(INT_MAX, PTRDIFF_MAX)' can be simplified to the more-natural INT_MAX. 
This is helpful for traditional interfaces that use int counters.
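
A rough sketch of how such a type could be defined follows. The actual 
definition is not given in this thread, so the preprocessor test, the macro 
name IN_T_MAX and the function count_for_int_api are all assumptions used 
only to illustrate the "at least as wide as int" guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: pick whichever of int and ptrdiff_t is wider.
   IN_T_MAX is a hypothetical name.  */
#if PTRDIFF_MAX < INT_MAX
typedef int in_t;
# define IN_T_MAX INT_MAX
#else
typedef ptrdiff_t in_t;
# define IN_T_MAX PTRDIFF_MAX
#endif

static_assert (INT_MAX <= IN_T_MAX, "in_t must be at least as wide as int");

/* Clamp a nonnegative count for a traditional interface that takes an
   int.  Because in_t is at least as wide as int, the limit is simply
   INT_MAX rather than MIN (INT_MAX, PTRDIFF_MAX).  */
static int
count_for_int_api (in_t n)
{
  return n < INT_MAX ? (int) n : INT_MAX;
}
```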




* Re: preferring ptrdiff_t to size_t for object counts
  2017-06-07 22:12     ` Paul Eggert
@ 2017-06-08  0:36       ` Bruno Haible
  0 siblings, 0 replies; 6+ messages in thread
From: Bruno Haible @ 2017-06-08  0:36 UTC
  To: Paul Eggert; +Cc: bug-gnulib

Hi Paul,

> The name I'm currently 
> thinking of is 'in_t', short for "index type". That's an 
> easy-to-remember name (the type is like 'int', but possibly wider).

Fine with me.

It doesn't collide: Only very few packages use this identifier 'in_t', and
only in isolated places.

> One other advantage of having our own signed type is that we can 
> guarantee that it's at least as wide as int (something that is not true 
> for ptrdiff_t). That way, some of my current code that says 'MIN 
> (INT_MAX, PTRDIFF_MAX)' can be simplified to the more-natural INT_MAX. 
> This is helpful for traditional interfaces that use int counters.

Indeed. (Although portability to Windows 3.1 is no longer a focus of gnulib
or of GNU programs.)

Bruno



