bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed
* new module suggestion: fprintftime-check
@ 2018-12-28 10:49 Assaf Gordon
  2018-12-28 16:34 ` Bruno Haible
  0 siblings, 1 reply; 12+ messages in thread
From: Assaf Gordon @ 2018-12-28 10:49 UTC (permalink / raw)
  To: bug-gnulib@gnu.org List

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

Hello,

I'd like to suggest the following new module: fprintftime-check.
It uses the same infrastructure as fprintftime
(i.e. #include "nstrtime,c") to implement a new function:

   int fprintftime_check (const char *format, const char** err_ptr);

This function enables syntax-check of the format string.
Current implementation of nstrtime/fprintftime silently ignore
bad formats.

With this module, it will be possible to detect/warn/reject
invalid format strings in coreutils' date(1)
( https://bugs.gnu.org/16782 ).

This patch is only a rough draft - everything can be changed (including 
the name).

Comments and feedback welcomed,
  - assaf

[-- Attachment #2: 0001-fprintftime-check-new-module-for-fprintftime-format-.patch.gz --]
[-- Type: application/gzip, Size: 5030 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-28 10:49 new module suggestion: fprintftime-check Assaf Gordon
@ 2018-12-28 16:34 ` Bruno Haible
  2018-12-28 20:09   ` Assaf Gordon
  0 siblings, 1 reply; 12+ messages in thread
From: Bruno Haible @ 2018-12-28 16:34 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Assaf Gordon

Hi Assaf,

> This function enables syntax-check of the format string.

First question: Should this syntax-check be integrated into the
nstrftime() and fprintftime() functions? These functions are gnulib
inventions, therefore they could be extended to return
  - an error indicator (maybe EINVAL?),
  - a detailed error message, if you like,
  - a pointer to the wrong directive in the format string, if you like.
Or is it better to have a function that does only the syntax check,
and document that nstrftime() and fprintftime() should only be called
after the syntax check has been done?
(I have no preference. Just asking.)

> I'd like to suggest the following new module: fprintftime-check.

For a module that does only the syntax check, I would suggest a
different name:
  - The prefix 'fprint' indicates output to a 'FILE *'.
  - The prefix 'str' indicates output to a 'char *' buffer.
Why not call it 'ftime-check' or 'check-ftime'?

> It uses the same infrastructure as fprintftime
> (i.e. #include "nstrtime,c")

That's a good idea, otherwise the syntax check and the actual use of
the format string might too easily get out-of-sync.

> This patch is only a rough draft

tzalloc() may fail and return NULL. (And why would parsing the format
string be dependent on a particular time zone at all?)

Bruno



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-28 16:34 ` Bruno Haible
@ 2018-12-28 20:09   ` Assaf Gordon
  2018-12-29  6:08     ` Bruno Haible
  2018-12-29 17:22     ` Paul Eggert
  0 siblings, 2 replies; 12+ messages in thread
From: Assaf Gordon @ 2018-12-28 20:09 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib

Hello Bruno,

On 2018-12-28 9:34 a.m., Bruno Haible wrote:
>> This function enables syntax-check of the format string.
> 
> First question: Should this syntax-check be integrated into the
> nstrftime() and fprintftime() functions? These functions are gnulib
> inventions, therefore they could be extended to return
>    - an error indicator (maybe EINVAL?),
>    - a detailed error message, if you like,
>    - a pointer to the wrong directive in the format string, if you like.
> Or is it better to have a function that does only the syntax check,
> and document that nstrftime() and fprintftime() should only be called
> after the syntax check has been done?
> (I have no preference. Just asking.)

Both nstrtime and fprintftime silently ignore bad formats:
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/nstrftime.c#n1478

The comment even says:
       /* Unknown format; output the format, including the '%',
          since this is most likely the right thing to do if a
          multibyte string has been misparsed.  */

This has been the case since 1996 when strftime.c was imported from libc
(gnulib commit afabd949).

I suspect that changing this behavior would be a disruptive
backwards-incompatible change (but other opinions are welcomed).


>> I'd like to suggest the following new module: fprintftime-check.
> 
> For a module that does only the syntax check, I would suggest a
> different name:
>    - The prefix 'fprint' indicates output to a 'FILE *'.
>    - The prefix 'str' indicates output to a 'char *' buffer.
> Why not call it 'ftime-check' or 'check-ftime'?

Good idea. I'll change it.


> tzalloc() may fail and return NULL. (And why would parsing the format
> string be dependent on a particular time zone at all?)

nstrtime's __strftime_internal() is a complicated beast, I don't
claim to fully understand it.
But the function contains this comment:
    /* POSIX.1 requires that local time zone information be used as
      though strftime called tzset.  */
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/nstrftime.c#n540

And there is even an output "bool * tzset_called" parameter, to
let the caller know if it has been called or not - so I assume it is
critical to correct behavior of both nstrtime and fprintftime.
I will try to see if it can be disabled for the syntax-check code.

-assaf



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-28 20:09   ` Assaf Gordon
@ 2018-12-29  6:08     ` Bruno Haible
  2018-12-29 15:21       ` Assaf Gordon
  2019-01-02  8:03       ` Florian Weimer
  2018-12-29 17:22     ` Paul Eggert
  1 sibling, 2 replies; 12+ messages in thread
From: Bruno Haible @ 2018-12-29  6:08 UTC (permalink / raw)
  To: Assaf Gordon; +Cc: Florian Weimer, bug-gnulib

[CCing Florian Weimer.
Florian, the thread started at
https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00149.html ]

Assaf Gordon wrote:
> The comment even says:
>        /* Unknown format; output the format, including the '%',
>           since this is most likely the right thing to do if a
>           multibyte string has been misparsed.  */
> 
> This has been the case since 1996 when strftime.c was imported from libc
> (gnulib commit afabd949).
> 
> I suspect that changing this behavior would be a disruptive
> backwards-incompatible change (but other opinions are welcomed).

The "security" and "robustness" aspects of software have gained importance
over the last 22 years, also in domain of glibc.

Florian, Assaf discovered that glibc processing of time format strings
(strftime) operates according to the garbage-in - garbage-out principle,
that is, an invalid format string does not get reported to the caller
but instead produces output that is "most likely the right thing".

Is this still considered the adequate processing, from a glibc point of
view?

Bruno



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-29  6:08     ` Bruno Haible
@ 2018-12-29 15:21       ` Assaf Gordon
  2019-01-02  8:03       ` Florian Weimer
  1 sibling, 0 replies; 12+ messages in thread
From: Assaf Gordon @ 2018-12-29 15:21 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Florian Weimer, bug-gnulib

On 2018-12-28 11:08 p.m., Bruno Haible wrote:
> [CCing Florian Weimer.
> Florian, the thread started at
> https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00149.html ]
> 
> Assaf Gordon wrote:
>> The comment even says:
>>         /* Unknown format; output the format, including the '%',
>>            since this is most likely the right thing to do if a
>>            multibyte string has been misparsed.  */
>>
>> This has been the case since 1996 when strftime.c was imported from libc
>> (gnulib commit afabd949).
>>
>> I suspect that changing this behavior would be a disruptive
>> backwards-incompatible change (but other opinions are welcomed).
> 
> The "security" and "robustness" aspects of software have gained importance
> over the last 22 years, also in domain of glibc.
> 
> Florian, Assaf discovered that glibc processing of time format strings
> (strftime) operates according to the garbage-in - garbage-out principle,
> that is, an invalid format string does not get reported to the caller
> but instead produces output that is "most likely the right thing".
> 
> Is this still considered the adequate processing, from a glibc point of
> view?
> 

For reference, this is about ./time/strftime_l.c lines 1414-1428:

https://sourceware.org/git/?p=glibc.git;a=blob;f=time/strftime_l.c;h=c71f9f47a9525046b59a89c005de22a304367d4d;hb=HEAD#l1414


Also, POSIX says:
"If a conversion specification does not correspond to any of the above, 
the behavior is undefined."
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html

Looking at the "bigger picture",
I'll just say my goal is to provide a helpful warning in date(1),
not to change any APIs...


regards,
  - assaf


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-28 20:09   ` Assaf Gordon
  2018-12-29  6:08     ` Bruno Haible
@ 2018-12-29 17:22     ` Paul Eggert
  2018-12-29 17:48       ` Ben Pfaff
  1 sibling, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2018-12-29 17:22 UTC (permalink / raw)
  To: Assaf Gordon, Bruno Haible, bug-gnulib

> I suspect that changing this behavior would be a disruptive
> backwards-incompatible change (but other opinions are welcomed).

I wouldn't mind a change as long as it changes the API enough so that compilers 
complain if we don't also update the calling code. For example, nstrftime could 
take an additional ptrdiff_t * argument that (if not null) is set to the offset 
of the first offending % or encoding error, or to -1 if the format is OK. (We 
should report encoding errors as well as bad % formats.)

(Using ptrdiff_t is part of my campaign to prefer ptrdiff_t to size_t. While 
we're at it, let's change the other size_t args to ptrdiff_t, but I digress....)

> nstrtime's __strftime_internal() is a complicated beast, I don't
> claim to fully understand it.
> But the function contains this comment:
>     /* POSIX.1 requires that local time zone information be used as
>       though strftime called tzset.  */
> https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/nstrftime.c#n540

This comment is for the POSIX API where there was no timezone_t argument. It 
doesn't apply to the Gnulib API, because of the "undef HAVE_TZSET" earlier on. 
The only reason for the tzset_called stuff is the faint hope that we can merge 
all this stuff into glibc someday.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-29 17:22     ` Paul Eggert
@ 2018-12-29 17:48       ` Ben Pfaff
  2018-12-31  1:40         ` Paul Eggert
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Pfaff @ 2018-12-29 17:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: bug-gnulib, Assaf Gordon, Bruno Haible

On Sat, Dec 29, 2018 at 09:22:17AM -0800, Paul Eggert wrote:
> (Using ptrdiff_t is part of my campaign to prefer ptrdiff_t to size_t. While
> we're at it, let's change the other size_t args to ptrdiff_t, but I
> digress....)

Have you said anything about this campaign elsewhere?  I'd like to hear
more.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-29 17:48       ` Ben Pfaff
@ 2018-12-31  1:40         ` Paul Eggert
  2019-01-05 10:37           ` preferring ptrdiff_t to size_t Bruno Haible
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2018-12-31  1:40 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: bug-gnulib, Assaf Gordon, Bruno Haible

Ben Pfaff wrote:
> On Sat, Dec 29, 2018 at 09:22:17AM -0800, Paul Eggert wrote:
>> (Using ptrdiff_t is part of my campaign to prefer ptrdiff_t to size_t. While
>> we're at it, let's change the other size_t args to ptrdiff_t, but I
>> digress....)
> 
> Have you said anything about this campaign elsewhere?  I'd like to hear
> more.

I probably have, but it's easier for me to say something new afresh.

The C standard says that objects with size greater than PTRDIFF_MAX can cause 
trouble, because subtracting pointers into these objects yields undefined 
behavior if the resulting integer does not fit into ptrdiff_t. Worse, on many 
popular systems (including GNU), even if (P - Q)'s value fits into ptrdiff_t, (P 
- Q) can be calculated incorrectly if ((char *) P - (char *) Q) does not fit 
into ptrdiff_t; although this behavior is contrary to the C standard it's such a 
common problem that we cannot ignore it in portable code.

Because of this, C code should never allocate objects with size greater than 
PTRDIFF_MAX unless it never does pointer subtraction with the result, and it's 
generally simpler and more reliable if C code simply refuses to allocate such 
objects under any circumstances. Gnulib started doing this a couple of years ago 
with commit f3b846699de69b8e6a508396f7f778eb1e917a47, which causes 
xalloc-oversized and related modules to reject attempts to create objects larger 
than PTRDIFF_MAX bytes; this means many GNU applications are already safer. 
(Emacs has a different way of enforcing the same restriction internally.) Also, 
we are thinking of doing something similar with glibc's malloc/etc. functions in 
the next glibc version, though this patch has not gone in yet and may have to 
wait until the version after next.

As a result, C programs can now use either size_t or ptrdiff_t to store object 
sizes, since no actual object can have more than min (SIZE_MAX, PTRDIFF_MAX) 
bytes. Using signed types is better nowadays than using unsigned types, since 
many platforms now check for signed integer overflow and this can catch many 
bugs, some of them security-relevant, whereas unsigned arithmetic is well 
defined to wrap around with no overflow check (something that can be quite 
dangerous when doing size calculations). So, for reliability and security 
reasons, C programs should now prefer ptrdiff_t to size_t when dealing with 
object sizes.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2018-12-29  6:08     ` Bruno Haible
  2018-12-29 15:21       ` Assaf Gordon
@ 2019-01-02  8:03       ` Florian Weimer
  2019-01-05 10:40         ` Bruno Haible
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2019-01-02  8:03 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Assaf Gordon, bug-gnulib

* Bruno Haible:

> [CCing Florian Weimer.
> Florian, the thread started at
> https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00149.html ]
>
> Assaf Gordon wrote:
>> The comment even says:
>>        /* Unknown format; output the format, including the '%',
>>           since this is most likely the right thing to do if a
>>           multibyte string has been misparsed.  */
>> 
>> This has been the case since 1996 when strftime.c was imported from libc
>> (gnulib commit afabd949).
>> 
>> I suspect that changing this behavior would be a disruptive
>> backwards-incompatible change (but other opinions are welcomed).
>
> The "security" and "robustness" aspects of software have gained importance
> over the last 22 years, also in domain of glibc.
>
> Florian, Assaf discovered that glibc processing of time format strings
> (strftime) operates according to the garbage-in - garbage-out principle,
> that is, an invalid format string does not get reported to the caller
> but instead produces output that is "most likely the right thing".

Historically, some Lua scripts have relied on strftime not crashing, but
I think this awas fixed on the Lua side a couple of years ago.

The standards do not provide a way to report errors for malformed format
strings.  I think the current behavior is acceptable, all things
considered.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: preferring ptrdiff_t to size_t
  2018-12-31  1:40         ` Paul Eggert
@ 2019-01-05 10:37           ` Bruno Haible
  0 siblings, 0 replies; 12+ messages in thread
From: Bruno Haible @ 2019-01-05 10:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: bug-gnulib, Assaf Gordon, Ben Pfaff

Paul Eggert wrote:
> Using signed types is better nowadays than using unsigned types, since 
> many platforms now check for signed integer overflow and this can catch many 
> bugs, some of them security-relevant, whereas unsigned arithmetic is well 
> defined to wrap around with no overflow check (something that can be quite 
> dangerous when doing size calculations). So, for reliability and security 
> reasons, C programs should now prefer ptrdiff_t to size_t when dealing with 
> object sizes.

In the thread that starts at
http://lists.gnu.org/archive/html/bug-gnulib/2017-06/msg00009.html
I suggest to use a typedef, not ptrdiff_t directly, for values that are
known to be in the range 0..PTRDIFF_MAX.

Bruno



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2019-01-02  8:03       ` Florian Weimer
@ 2019-01-05 10:40         ` Bruno Haible
  2019-01-05 12:43           ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: Bruno Haible @ 2019-01-05 10:40 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Assaf Gordon, bug-gnulib

Florian Weimer wrote:
> The standards do not provide a way to report errors for malformed format
> strings.  I think the current behavior is acceptable, all things
> considered.

OK, then I'm fine with Assaf's approach to create a new, separate function
that does only the syntax checking.

Bruno



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: new module suggestion: fprintftime-check
  2019-01-05 10:40         ` Bruno Haible
@ 2019-01-05 12:43           ` Florian Weimer
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Weimer @ 2019-01-05 12:43 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Assaf Gordon, bug-gnulib

* Bruno Haible:

> Florian Weimer wrote:
>> The standards do not provide a way to report errors for malformed format
>> strings.  I think the current behavior is acceptable, all things
>> considered.
>
> OK, then I'm fine with Assaf's approach to create a new, separate function
> that does only the syntax checking.

How do you plan to keep this aligned with the glibc implementation?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-01-05 12:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-28 10:49 new module suggestion: fprintftime-check Assaf Gordon
2018-12-28 16:34 ` Bruno Haible
2018-12-28 20:09   ` Assaf Gordon
2018-12-29  6:08     ` Bruno Haible
2018-12-29 15:21       ` Assaf Gordon
2019-01-02  8:03       ` Florian Weimer
2019-01-05 10:40         ` Bruno Haible
2019-01-05 12:43           ` Florian Weimer
2018-12-29 17:22     ` Paul Eggert
2018-12-29 17:48       ` Ben Pfaff
2018-12-31  1:40         ` Paul Eggert
2019-01-05 10:37           ` preferring ptrdiff_t to size_t Bruno Haible

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).