Why does close_stdout close stdout and stderr?

bug-gnulib@gnu.org mirror (unofficial)
 help / color / mirror / Atom feed

* Why does close_stdout close stdout and stderr?
@ 2019-04-29 19:45 Florian Weimer
  2019-04-29 19:49 ` Eric Blake
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Weimer @ 2019-04-29 19:45 UTC (permalink / raw)
  To: bug-gnulib

I get that error checking is important.  But why not just use ferror and
fflush?  Closing the streams is excessive and tends to introduce
use-after-free issues, as evidenced by the sanitizer workarounds.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-04-29 19:45 Why does close_stdout close stdout and stderr? Florian Weimer
@ 2019-04-29 19:49 ` Eric Blake
  2019-04-29 20:26   ` Florian Weimer
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2019-04-29 19:49 UTC (permalink / raw)
  To: Florian Weimer, bug-gnulib


[-- Attachment #1.1: Type: text/plain, Size: 749 bytes --]

On 4/29/19 2:45 PM, Florian Weimer wrote:
> I get that error checking is important.  But why not just use ferror and
> fflush?  Closing the streams is excessive and tends to introduce
> use-after-free issues, as evidenced by the sanitizer workarounds.

If I recall the explanation, at least some versions of NFS do not
actually flush on fflush(), but wait until close(). If you want to avoid
data loss and ensure that things written made it to the remote storage
while detecting every possible indication when an error may have
prevented that from working, then you have to go all the way through
close().

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-04-29 19:49 ` Eric Blake
@ 2019-04-29 20:26   ` Florian Weimer
  2019-05-06 12:05     ` Florian Weimer
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Weimer @ 2019-04-29 20:26 UTC (permalink / raw)
  To: Eric Blake; +Cc: bug-gnulib

* Eric Blake:

> On 4/29/19 2:45 PM, Florian Weimer wrote:
>> I get that error checking is important.  But why not just use ferror and
>> fflush?  Closing the streams is excessive and tends to introduce
>> use-after-free issues, as evidenced by the sanitizer workarounds.
>
> If I recall the explanation, at least some versions of NFS do not
> actually flush on fflush(), but wait until close(). If you want to avoid
> data loss and ensure that things written made it to the remote storage
> while detecting every possible indication when an error may have
> prevented that from working, then you have to go all the way through
> close().

Any file system on Linux does this to a varying degree (unlike Solaris
and Windows, I think).  If you want to catch low-level I/O errors, you
need to call fsync after fflush.  And I doubt this is something we want
to do because it will result in bad-looking performance.

But the NFS aspect is somewhat plausible at least.

I can try to figure out if NFS makes a difference for Linux here,
i.e. if there are cases where a write will succeed, but only an
immediately following close will report an error condition that is
known, in principle, even at the time of the write.  A difference
between hard and soft NFS mounts could matter in this context.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-04-29 20:26   ` Florian Weimer
@ 2019-05-06 12:05     ` Florian Weimer
  2019-05-06 14:56       ` Bernhard Voelker
                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Florian Weimer @ 2019-05-06 12:05 UTC (permalink / raw)
  To: Eric Blake; +Cc: bug-gnulib, NeilBrown, Jeff Layton

* Florian Weimer:

> * Eric Blake:
>
>> On 4/29/19 2:45 PM, Florian Weimer wrote:
>>> I get that error checking is important.  But why not just use ferror and
>>> fflush?  Closing the streams is excessive and tends to introduce
>>> use-after-free issues, as evidenced by the sanitizer workarounds.
>>
>> If I recall the explanation, at least some versions of NFS do not
>> actually flush on fflush(), but wait until close(). If you want to avoid
>> data loss and ensure that things written made it to the remote storage
>> while detecting every possible indication when an error may have
>> prevented that from working, then you have to go all the way through
>> close().
>
> Any file system on Linux does this to a varying degree (unlike Solaris
> and Windows, I think).  If you want to catch low-level I/O errors, you
> need to call fsync after fflush.  And I doubt this is something we want
> to do because it will result in bad-looking performance.
>
> But the NFS aspect is somewhat plausible at least.
>
> I can try to figure out if NFS makes a difference for Linux here,
> i.e. if there are cases where a write will succeed, but only an
> immediately following close will report an error condition that is
> known, in principle, even at the time of the write.  A difference
> between hard and soft NFS mounts could matter in this context.

Start of thread: <https://lists.gnu.org/r/bug-gnulib/2019-04/msg00059.html>

I've been told that on Linux, close does not report writeback errors.
So the only way to get a reliable error indicator (beyond what you get
from the write system call) would be fsync.  And I doubt you want to
call that, purely for performance reasons.

This means that for Linux at least, close_stdout should just call
fflush, not fclose.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 12:05     ` Florian Weimer
@ 2019-05-06 14:56       ` Bernhard Voelker
  2019-05-06 15:47         ` Florian Weimer
  2019-05-06 18:53       ` Paul Eggert
  2019-05-06 22:32       ` Bruno Haible
  2 siblings, 1 reply; 34+ messages in thread
From: Bernhard Voelker @ 2019-05-06 14:56 UTC (permalink / raw)
  To: Florian Weimer, Eric Blake; +Cc: bug-gnulib, NeilBrown, Jeff Layton

On 5/6/19 2:05 PM, Florian Weimer wrote:
>>> On 4/29/19 2:45 PM, Florian Weimer wrote:
>>>> I get that error checking is important.  But why not just use ferror and
>>>> fflush?  Closing the streams is excessive and tends to introduce
>>>> use-after-free issues, as evidenced by the sanitizer workarounds.

> This means that for Linux at least, close_stdout should just call
> fflush, not fclose.

What is the problem?  I mean if it is use-after-free as mentioned in
the first mail, then write() after fflush() without error checking via
another fflush() is in the same category, isn't it?

Have a nice day,
Berny



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 14:56       ` Bernhard Voelker
@ 2019-05-06 15:47         ` Florian Weimer
  2019-05-06 19:14           ` Bernhard Voelker
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Weimer @ 2019-05-06 15:47 UTC (permalink / raw)
  To: Bernhard Voelker; +Cc: bug-gnulib, Eric Blake, NeilBrown, Jeff Layton

* Bernhard Voelker:

> On 5/6/19 2:05 PM, Florian Weimer wrote:
>>>> On 4/29/19 2:45 PM, Florian Weimer wrote:
>>>>> I get that error checking is important.  But why not just use ferror and
>>>>> fflush?  Closing the streams is excessive and tends to introduce
>>>>> use-after-free issues, as evidenced by the sanitizer workarounds.
>
>> This means that for Linux at least, close_stdout should just call
>> fflush, not fclose.
>
> What is the problem?  I mean if it is use-after-free as mentioned in
> the first mail, then write() after fflush() without error checking via
> another fflush() is in the same category, isn't it?

No, there is no memory corruption involved because stdout and stderr
remain valid.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 12:05     ` Florian Weimer
  2019-05-06 14:56       ` Bernhard Voelker
@ 2019-05-06 18:53       ` Paul Eggert
  2019-05-06 19:02         ` Jeff Layton
  2019-05-06 22:32       ` Bruno Haible
  2 siblings, 1 reply; 34+ messages in thread
From: Paul Eggert @ 2019-05-06 18:53 UTC (permalink / raw)
  To: Florian Weimer, Eric Blake; +Cc: bug-gnulib, NeilBrown, Jeff Layton

On 5/6/19 5:05 AM, Florian Weimer wrote:

> I've been told that on Linux, close does not report writeback errors.
> So the only way to get a reliable error indicator (beyond what you get
> from the write system call) would be fsync.

Are you sure you asked your expert the right question? I just looked at
the current (5.1) Linux kernel, and though I'm no kernel expert it looks
like the close system call invokes filp_close, which calls
filp->f_op->flush which on NFS is nfs_file_flush, which invokes
vfs_fsync which in turn invokes nfs_file_fsync, and surely this does
what fsync would do (and does more work and error checking than an
ordinary write syscall would do, at any rate).

Admittedly Linux is a real mess in reporting write errors back to the
user, and this has been true for years.[1] That being said, the
longstanding documented tradition is that writeback errors are reported
in later close or fsync calls, and there are suggestions to clean up
this kernel area to get closer to doing what the documentation says.[2]
So until we're pretty sure the matter is settled for Linux, perhaps we'd
better let Gnulib be.

[1] Gunawi HS, Rubio-González C, Arpaci-Dusseau AC, Arpaci-Dusseau RH,
Liblit B. EIO: Error Handling is Occasionally Correct. FAST 2008.
https://www.usenix.org/legacy/event/fast08/tech/gunawi.html

[2] Edge J. Handling writeback errors. LWN.net. 2017-05-04.
https://lwn.net/Articles/718734/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 18:53       ` Paul Eggert
@ 2019-05-06 19:02         ` Jeff Layton
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Layton @ 2019-05-06 19:02 UTC (permalink / raw)
  To: Paul Eggert, Florian Weimer, Eric Blake; +Cc: bug-gnulib, NeilBrown

On Mon, 2019-05-06 at 11:53 -0700, Paul Eggert wrote:
> On 5/6/19 5:05 AM, Florian Weimer wrote:
> 
> > I've been told that on Linux, close does not report writeback errors.
> > So the only way to get a reliable error indicator (beyond what you get
> > from the write system call) would be fsync.
> 
> Are you sure you asked your expert the right question? I just looked at
> the current (5.1) Linux kernel, and though I'm no kernel expert it looks
> like the close system call invokes filp_close, which calls
> filp->f_op->flush which on NFS is nfs_file_flush, which invokes
> vfs_fsync which in turn invokes nfs_file_fsync, and surely this does
> what fsync would do (and does more work and error checking than an
> ordinary write syscall would do, at any rate).
> 

Sure, we have the plumbing to report errors there and some filesystems
do it, but those errors don't mean much. See below.

> Admittedly Linux is a real mess in reporting write errors back to the
> user, and this has been true for years.[1] That being said, the
> longstanding documented tradition is that writeback errors are reported
> in later close or fsync calls, and there are suggestions to clean up
> this kernel area to get closer to doing what the documentation says.[2]
> So until we're pretty sure the matter is settled for Linux, perhaps we'd
> better let Gnulib be.
> 
> [1] Gunawi HS, Rubio-González C, Arpaci-Dusseau AC, Arpaci-Dusseau RH,
> Liblit B. EIO: Error Handling is Occasionally Correct. FAST 2008.
> https://www.usenix.org/legacy/event/fast08/tech/gunawi.html
> 
> [2] Edge J. Handling writeback errors. LWN.net. 2017-05-04.
> https://lwn.net/Articles/718734/
> 

You may also be interested in this article too:

    https://lwn.net/Articles/752613/

Neil points out in the comments that issuing a close() doesn't
necessarily initiate any writeback from the kernel. It does on NFS of
course, but not on most filesystems.

So, a successful return from close() is inconclusive. It just means
that the kernel has not hit a writeback error yet, not that the data is
truly safe.
-- 
Jeff Layton <jlayton@redhat.com>



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 15:47         ` Florian Weimer
@ 2019-05-06 19:14           ` Bernhard Voelker
  2019-05-06 19:19             ` Florian Weimer
  0 siblings, 1 reply; 34+ messages in thread
From: Bernhard Voelker @ 2019-05-06 19:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Eric Blake, bug-gnulib, NeilBrown, Jeff Layton

On 5/6/19 5:47 PM, Florian Weimer wrote:
> * Bernhard Voelker:
>> What is the problem?  I mean if it is use-after-free as mentioned in
>> the first mail, then write() after fflush() without error checking via
>> another fflush() is in the same category, isn't it?
> 
> No, there is no memory corruption involved because stdout and stderr
> remain valid.

IMO that's easier to detect than a write() without a following error
checking; the consequences may also be quite fatal for the user.

Have a nice day,
Berny


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 19:14           ` Bernhard Voelker
@ 2019-05-06 19:19             ` Florian Weimer
  2019-05-09  6:20               ` Bernhard Voelker
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Weimer @ 2019-05-06 19:19 UTC (permalink / raw)
  To: Bernhard Voelker; +Cc: Eric Blake, bug-gnulib, NeilBrown, Jeff Layton

* Bernhard Voelker:

> On 5/6/19 5:47 PM, Florian Weimer wrote:
>> * Bernhard Voelker:
>>> What is the problem?  I mean if it is use-after-free as mentioned in
>>> the first mail, then write() after fflush() without error checking via
>>> another fflush() is in the same category, isn't it?
>> 
>> No, there is no memory corruption involved because stdout and stderr
>> remain valid.
>
> IMO that's easier to detect than a write() without a following error
> checking; the consequences may also be quite fatal for the user.

I do not understand this comment.  Do you mean fwrite?

How would a programmer check that close_stdout has run, to determine
that stdout and stderr are now invalid, to avoid the memory corruption?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 12:05     ` Florian Weimer
  2019-05-06 14:56       ` Bernhard Voelker
  2019-05-06 18:53       ` Paul Eggert
@ 2019-05-06 22:32       ` Bruno Haible
  2019-05-07  9:44         ` Assaf Gordon
  2 siblings, 1 reply; 34+ messages in thread
From: Bruno Haible @ 2019-05-06 22:32 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Florian Weimer, Eric Blake, NeilBrown, Jeff Layton

Florian Weimer wrote:
> ... would be fsync.  And I doubt you want to
> call that, purely for performance reasons.

The trade-off between data safety and speed of disk accesses
is to be done by the system administrator, when they decide whether
to use the mount option 'sync' or not. It would be wrong for gnulib
to call fsync() in close_stdout, because that would force the
safety and inefficiency of immediate syncing on systems which have
not asked for it.

Bruno

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 22:32       ` Bruno Haible
@ 2019-05-07  9:44         ` Assaf Gordon
  2019-05-07  9:49           ` Assaf Gordon
  2019-05-07 11:28           ` Bruno Haible
  0 siblings, 2 replies; 34+ messages in thread
From: Assaf Gordon @ 2019-05-07  9:44 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib
  Cc: Florian Weimer, Eric Blake, NeilBrown, Jeff Layton

[-- Attachment #1: Type: text/plain, Size: 3129 bytes --]

Hello all,

joining a bit late to this discussion, but I'd like to add
another POV on why fclose is important and useful:

On 2019-04-29 1:45 p.m., Florian Weimer wrote:
> I get that error checking is important. But why not just use ferror
> and fflush? Closing the streams is excessive and tends to introduce 
> use-after-free issues, as evidenced by the sanitizer workarounds.

There is at least one (sadly) common issue in my work, and that is
detecting failed writes due to full disk (or less common - user quota
limits).

It is common in bioinformatics to process large files ("big data" and
all) - a program reading a 400MB file and writing a 2GB file is common.
Larger outputs (e.g. 20GB) are also not rare.

Many of these programs write to STDOUT and expect the user to redirect
to a file or a pipe.
In other cases, even if the output goes to a file, progress reports and
error messages go to stdout/stderr.

For those who think disk-space is a non-issue, consider the case where
the processing happens on "the cloud" - there disk-space is at premium
cost, and it is common to spin "cloud" virtual machines with smaller
disks (when I write "smaller" it could be a 300GB disk - which can still
get full quite fast).

---

Forcing close+error checking on stdout AND stderr is currently the only
(simple?)  way to ensure the output was not lost due to disk-full errors
(and for brevity I ignore the low-level issues of write-backs, caching,
journaling, etc.).

I'm attaching a sample test program to illustrate some points.
The program writes to stdout/stderr then optionally calls 
fclose/fflush/fsync.

Note the following:

1.
Without fclose, "disk-full" errors are not detected, and information
is lost (either from stdout, e.g. program's output, or from stderr, e.g. 
program's error messages or logs or progress info):

   $ ./aa stdout none > /dev/full && echo ok || echo error
   ok

2.
If we force fclose on stdout, errors are detected and reported:

   $ ./aa stdout fclose > /dev/full && echo ok || echo error
   aa: fclose failed: No space left on device
   error

3.
If we force fclose on stderr with disk-full, error messages are lost,
but at least the exit code will indicate an error:

   $ ./aa stderr fclose 2> /dev/full && echo ok || echo error
   error

4.
"fflush" instead of "fclose" seems to work OK, but I do not know
if there are other side effects:

   $ ./aa stdout fflush > /dev/full && echo ok || echo error
   aa: fflush failed: No space left on device
   error

5.
Calling "fsync" has the disadvantage that it always fails on special
files (e.g. pipes/sockets/char-devs).
This results in weird behavior, where "fsync" will report errors
when no I/O errors actually happened:

   $ ./aa stdout fsync && echo ok || echo error
   Hello
   aa: fsync failed: Invalid argument
   error

   $ ./aa stdout fsync | tee log
   Hello
   aa: fsync failed: Invalid argument

This perhaps could be avoided if fsync is checked for errors,
then a fall-back to fclose? but that's more code, not less...

---

For these reasons, I strongly encourage to keep close_stdout + close_stderr.

regards,
  - assaf

[-- Attachment #2: aa.c --]
[-- Type: text/x-csrc, Size: 1505 bytes --]

/* Test program for gnulib's close_stdout, regarding discussion
  in https://lists.gnu.org/archive/html/bug-gnulib/2019-05/msg00039.html

 Copyright (C) 2019 Assaf Gordon <assafgordon@gmail.com>
 Placed under Public Domain.

 Compile with:
    gcc -o aa aa.c
 Test with:
    $ ./aa stdout fclose && echo ok
    Hello
    ok

    $ ./aa stdout none >/dev/full && echo ok
    ok

    $ ./aa stdout fclose >/dev/full && echo ok
    aa: fclose failed: No space left on device

*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <err.h>

int main(int argc, char *argv[])
{
	FILE* f;
	int i;

	if (argc!=3)
		errx(1,"usage: %s [stdout|stderr] [fclose|fflush|fsync|none]",
			argv[0]);

	if (strncmp(argv[1],"stdout",6)==0)
		f = stdout;
	else if (strncmp(argv[1],"stderr",6)==0)
		f = stderr;
	else
		errx(1,"invalid 1st param (%s): must be 'stdout' or 'stderr'",
			argv[1]);

	i = fprintf(f,"Hello\n");
	if (i<0)
		err(1,"fprintf failed");

	i = ferror(f);
	if (i)
		errx(1,"ferror reports ERROR state");

	if (strncmp(argv[2],"fclose",6)==0) {
		i = fclose(f);
		if (i)
			err(1,"fclose failed");
	} else if (strncmp(argv[2],"fflush",6)==0) {
		i = fflush(f);
		if (i)
			err(1,"fflush failed");
	} else if (strncmp(argv[2],"fsync",5)==0) {
		i = fsync(fileno(f));
		if (i)
			err(1,"fsync failed");
	} else if (strncmp(argv[2],"none",5)!=0) {
		errx(1,"invalid 2st param (%s): must be 'fclose' or " \
			"'fflush' or 'fsync' or 'none'", argv[2]);
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-07  9:44         ` Assaf Gordon
@ 2019-05-07  9:49           ` Assaf Gordon
  2019-05-07 11:28           ` Bruno Haible
  1 sibling, 0 replies; 34+ messages in thread
From: Assaf Gordon @ 2019-05-07  9:49 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib
  Cc: Florian Weimer, Eric Blake, NeilBrown, Jeff Layton

I should've added:

On 2019-05-07 3:44 a.m., Assaf Gordon wrote:
> I'm attaching a sample test program to illustrate some points.
> The program writes to stdout/stderr then optionally calls 
> fclose/fflush/fsync.
> 
> Note the following:
> 

The attached program also calls "ferror" on the stream,
but it does not signal an error in those listed cases.

Also note that the output of the program is purposefully small ("Hello\n").
If it was large enough to fill libc's buffer and cause a write(2),
then the errors would be returned to the application.
But if the output is small - such errors are lost.

-assaf

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-07  9:44         ` Assaf Gordon
  2019-05-07  9:49           ` Assaf Gordon
@ 2019-05-07 11:28           ` Bruno Haible
  2019-05-08  0:43             ` NeilBrown
  1 sibling, 1 reply; 34+ messages in thread
From: Bruno Haible @ 2019-05-07 11:28 UTC (permalink / raw)
  To: Assaf Gordon
  Cc: Florian Weimer, Eric Blake, bug-gnulib, NeilBrown, Jeff Layton

Assaf Gordon wrote:
> 4.
> "fflush" instead of "fclose" seems to work OK, but I do not know
> if there are other side effects:
> 
>    $ ./aa stdout fflush > /dev/full && echo ok || echo error
>    aa: fflush failed: No space left on device
>    error

Except that it does not work OK on NFS, as explained by the comment
in close-stream.c (written in 2006):

                       Even calling fflush is not always sufficient,
   since some file systems (NFS and CODA) buffer written/flushed data
   until an actual close call.

Bruno



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-07 11:28           ` Bruno Haible
@ 2019-05-08  0:43             ` NeilBrown
  2019-05-08 11:00               ` Florian Weimer
  0 siblings, 1 reply; 34+ messages in thread
From: NeilBrown @ 2019-05-08  0:43 UTC (permalink / raw)
  To: Bruno Haible, Assaf Gordon
  Cc: Florian Weimer, Eric Blake, bug-gnulib, Jeff Layton

[-- Attachment #1: Type: text/plain, Size: 839 bytes --]

On Tue, May 07 2019, Bruno Haible wrote:

> Assaf Gordon wrote:
>> 4.
>> "fflush" instead of "fclose" seems to work OK, but I do not know
>> if there are other side effects:
>> 
>>    $ ./aa stdout fflush > /dev/full && echo ok || echo error
>>    aa: fflush failed: No space left on device
>>    error
>
> Except that it does not work OK on NFS, as explained by the comment
> in close-stream.c (written in 2006):
>
>                        Even calling fflush is not always sufficient,
>    since some file systems (NFS and CODA) buffer written/flushed data
>    until an actual close call.

You can achieve that "actual close call" using

  error = close(dup(fileno(stdout)));

so you don't actually need to "fclose" if you don't want to.
Any 'close' will do, it doesn't have to be the "last close".

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-08  0:43             ` NeilBrown
@ 2019-05-08 11:00               ` Florian Weimer
  2019-05-09  4:42                 ` Paul Eggert
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Weimer @ 2019-05-08 11:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: bug-gnulib, Eric Blake, Assaf Gordon, Bruno Haible, Jeff Layton

* NeilBrown:

> On Tue, May 07 2019, Bruno Haible wrote:
>
>> Assaf Gordon wrote:
>>> 4.
>>> "fflush" instead of "fclose" seems to work OK, but I do not know
>>> if there are other side effects:
>>> 
>>>    $ ./aa stdout fflush > /dev/full && echo ok || echo error
>>>    aa: fflush failed: No space left on device
>>>    error
>>
>> Except that it does not work OK on NFS, as explained by the comment
>> in close-stream.c (written in 2006):
>>
>>                        Even calling fflush is not always sufficient,
>>    since some file systems (NFS and CODA) buffer written/flushed data
>>    until an actual close call.
>
> You can achieve that "actual close call" using
>
>   error = close(dup(fileno(stdout)));
>
> so you don't actually need to "fclose" if you don't want to.
> Any 'close' will do, it doesn't have to be the "last close".

Hah, thanks for this suggestion!  So something good came out of this
thread after all.  The big advantage of this approach is that this will
preserve the descriptor and the stream, so that further diagnostics from
the process are not suppressed.

Would someone who is familiar with the gnulib policies and procedures
please turn this into a proper patch, with error checking and all?  I
can try, but it's probably more work for you to bring what I could write
into an acceptable shape, than write the patch from scratch.

Thanks,
Florian



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-08 11:00               ` Florian Weimer
@ 2019-05-09  4:42                 ` Paul Eggert
  2019-05-09  5:01                   ` Florian Weimer
  2019-05-09  6:27                   ` NeilBrown
  0 siblings, 2 replies; 34+ messages in thread
From: Paul Eggert @ 2019-05-09  4:42 UTC (permalink / raw)
  To: Florian Weimer, NeilBrown
  Cc: Eric Blake, Assaf Gordon, bug-gnulib, Bruno Haible, Jeff Layton

Florian Weimer wrote:
>> You can achieve that "actual close call" using
>>
>>    error = close(dup(fileno(stdout)));
>>
>> so you don't actually need to "fclose" if you don't want to.
>> Any 'close' will do, it doesn't have to be the "last close".
> Hah, thanks for this suggestion!  So something good came out of this
> thread after all.  The big advantage of this approach is that this will
> preserve the descriptor and the stream, so that further diagnostics from
> the process are not suppressed.

That trick won't work if the dup fails.

Also, I worry that the trick won't port to non-Linux kernels, so it would have 
to be '#ifdef __linux__' or something like that.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-09  4:42                 ` Paul Eggert
@ 2019-05-09  5:01                   ` Florian Weimer
  2019-05-09  6:27                   ` NeilBrown
  1 sibling, 0 replies; 34+ messages in thread
From: Florian Weimer @ 2019-05-09  5:01 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Assaf Gordon, bug-gnulib, Jeff Layton, Bruno Haible, NeilBrown,
	Eric Blake

* Paul Eggert:

> Florian Weimer wrote:
>>> You can achieve that "actual close call" using
>>>
>>>    error = close(dup(fileno(stdout)));
>>>
>>> so you don't actually need to "fclose" if you don't want to.
>>> Any 'close' will do, it doesn't have to be the "last close".
>> Hah, thanks for this suggestion!  So something good came out of this
>> thread after all.  The big advantage of this approach is that this will
>> preserve the descriptor and the stream, so that further diagnostics from
>> the process are not suppressed.
>
> That trick won't work if the dup fails.

You can do an fsync in this case.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-06 19:19             ` Florian Weimer
@ 2019-05-09  6:20               ` Bernhard Voelker
  2019-05-09  6:39                 ` Florian Weimer
  0 siblings, 1 reply; 34+ messages in thread
From: Bernhard Voelker @ 2019-05-09  6:20 UTC (permalink / raw)
  To: Florian Weimer; +Cc: bug-gnulib, Eric Blake, NeilBrown, Jeff Layton

On 5/6/19 9:19 PM, Florian Weimer wrote:
> How would a programmer check that close_stdout has run, to determine
> that stdout and stderr are now invalid, to avoid the memory corruption?

lib/closeout.c:98:
  "Since close_stdout is commonly registered via 'atexit', [...]"

close_stdout is used right before the process ends, so I don't see
what further actions would follow.

It seems to me that you're trying to prevent programming errors.
So even the programmer knows his/her code, or maybe he can try
some static code analyzers to find this issue.
Is there a recent case which fell into this (quite obvious) trap?

Have a nice day,
Berny

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-09  4:42                 ` Paul Eggert
  2019-05-09  5:01                   ` Florian Weimer
@ 2019-05-09  6:27                   ` NeilBrown
  1 sibling, 0 replies; 34+ messages in thread
From: NeilBrown @ 2019-05-09  6:27 UTC (permalink / raw)
  To: Paul Eggert, Florian Weimer
  Cc: Eric Blake, Assaf Gordon, bug-gnulib, Bruno Haible, Jeff Layton

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

On Wed, May 08 2019, Paul Eggert wrote:

> Florian Weimer wrote:
>>> You can achieve that "actual close call" using
>>>
>>>    error = close(dup(fileno(stdout)));
>>>
>>> so you don't actually need to "fclose" if you don't want to.
>>> Any 'close' will do, it doesn't have to be the "last close".
>> Hah, thanks for this suggestion!  So something good came out of this
>> thread after all.  The big advantage of this approach is that this will
>> preserve the descriptor and the stream, so that further diagnostics from
>> the process are not suppressed.
>
> That trick won't work if the dup fails.
>
> Also, I worry that the trick won't port to non-Linux kernels, so it would have 
> to be '#ifdef __linux__' or something like that.

Are there any non-Linux kernels ???

Seriously though, I suspect it would work on any Unix-like kernel.
It certainly doesn't hurt, so there is no need to protect it with the
#ifdef.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-09  6:20               ` Bernhard Voelker
@ 2019-05-09  6:39                 ` Florian Weimer
  2019-05-09  9:49                   ` Bernhard Voelker
  2019-05-09 22:17                   ` Paul Eggert
  0 siblings, 2 replies; 34+ messages in thread
From: Florian Weimer @ 2019-05-09  6:39 UTC (permalink / raw)
  To: Bernhard Voelker; +Cc: bug-gnulib, Eric Blake, NeilBrown, Jeff Layton

* Bernhard Voelker:

> On 5/6/19 9:19 PM, Florian Weimer wrote:
>> How would a programmer check that close_stdout has run, to determine
>> that stdout and stderr are now invalid, to avoid the memory corruption?
>
> lib/closeout.c:98:
>   "Since close_stdout is commonly registered via 'atexit', [...]"
>
> close_stdout is used right before the process ends, so I don't see
> what further actions would follow.

atexit handlers run before ELF destructors (and some C++ destructors).
There can also be multiple such handlers.  So it's not true that an
atexit handler always runs last.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-09  6:39                 ` Florian Weimer
@ 2019-05-09  9:49                   ` Bernhard Voelker
  2019-05-09 22:17                   ` Paul Eggert
  1 sibling, 0 replies; 34+ messages in thread
From: Bernhard Voelker @ 2019-05-09  9:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: bug-gnulib, Eric Blake, NeilBrown, Jeff Layton

On 5/9/19 8:39 AM, Florian Weimer wrote:
> There can also be multiple such handlers.  So it's not true that an
> atexit handler always runs last.

If a program is complex enough to have several atexit handlers
... then 'man atexit' applies:

  "Functions so registered are called in the reverse order
   of their registration [...]"

Have a nice day,
Berny


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-09  6:39                 ` Florian Weimer
  2019-05-09  9:49                   ` Bernhard Voelker
@ 2019-05-09 22:17                   ` Paul Eggert
  2019-05-10  7:28                     ` Kamil Dudka
  1 sibling, 1 reply; 34+ messages in thread
From: Paul Eggert @ 2019-05-09 22:17 UTC (permalink / raw)
  To: Florian Weimer, Bernhard Voelker
  Cc: Eric Blake, bug-gnulib, NeilBrown, Jeff Layton

On 5/8/19 11:39 PM, Florian Weimer wrote:
> atexit handlers run before ELF destructors (and some C++ destructors).
> There can also be multiple such handlers.  So it's not true that an
> atexit handler always runs last.

OK, but this shouldn't be a problem with any applications currently
using close_stdout. At least, none of the applications I know about.

I can see the need for a module that does the trick you mention (with
suitable error handling) instead of closing stdout, for applications
that have nontrivial atexit handlers or destructors. This module's API
shouldn't use identifiers like "close_stdout", though, since they
wouldn't actually closing stdout.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-09 22:17                   ` Paul Eggert
@ 2019-05-10  7:28                     ` Kamil Dudka
  2019-05-10  7:31                       ` Florian Weimer
  2019-05-13  1:09                       ` Paul Eggert
  0 siblings, 2 replies; 34+ messages in thread
From: Kamil Dudka @ 2019-05-10  7:28 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Florian Weimer, bug-gnulib, Jeff Layton, Bernhard Voelker,
	NeilBrown, Eric Blake

On Friday, May 10, 2019 12:17:11 AM CEST Paul Eggert wrote:
> On 5/8/19 11:39 PM, Florian Weimer wrote:
> > atexit handlers run before ELF destructors (and some C++ destructors).
> > There can also be multiple such handlers.  So it's not true that an
> > atexit handler always runs last.
> 
> OK, but this shouldn't be a problem with any applications currently
> using close_stdout. At least, none of the applications I know about.

How would you know if they did?

As long as you link libraries dynamically, any of the directly or indirectly 
linked libraries can introduce an ELF destructor or atexit() handler anytime, 
which would take an immediate effect even on your already built applications.
People also like to use instrumentation libraries enforced by LD_PRELOAD in 
their test environment.  Those can easily clash with such cleanup handlers.

Kamil

> I can see the need for a module that does the trick you mention (with
> suitable error handling) instead of closing stdout, for applications
> that have nontrivial atexit handlers or destructors. This module's API
> shouldn't use identifiers like "close_stdout", though, since they
> wouldn't actually closing stdout.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-10  7:28                     ` Kamil Dudka
@ 2019-05-10  7:31                       ` Florian Weimer
  2019-05-13  1:09                       ` Paul Eggert
  1 sibling, 0 replies; 34+ messages in thread
From: Florian Weimer @ 2019-05-10  7:31 UTC (permalink / raw)
  To: Kamil Dudka
  Cc: Paul Eggert, bug-gnulib, Jeff Layton, Bernhard Voelker, NeilBrown,
	Eric Blake

* Kamil Dudka:

> On Friday, May 10, 2019 12:17:11 AM CEST Paul Eggert wrote:
>> On 5/8/19 11:39 PM, Florian Weimer wrote:
>> > atexit handlers run before ELF destructors (and some C++ destructors).
>> > There can also be multiple such handlers.  So it's not true that an
>> > atexit handler always runs last.
>> 
>> OK, but this shouldn't be a problem with any applications currently
>> using close_stdout. At least, none of the applications I know about.
>
> How would you know if they did?
>
> As long as you link libraries dynamically, any of the directly or indirectly 
> linked libraries can introduce an ELF destructor or atexit() handler anytime, 
> which would take an immediate effect even on your already built applications.
> People also like to use instrumentation libraries enforced by LD_PRELOAD in 
> their test environment.  Those can easily clash with such cleanup handlers.

Right.  And I found close_stdout after someone reported problems with an
LD_PRELOAD-like mechanism and its logging output.  For some users, this
is an actual problem.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-10  7:28                     ` Kamil Dudka
  2019-05-10  7:31                       ` Florian Weimer
@ 2019-05-13  1:09                       ` Paul Eggert
  2019-05-13  7:00                         ` Florian Weimer
  1 sibling, 1 reply; 34+ messages in thread
From: Paul Eggert @ 2019-05-13  1:09 UTC (permalink / raw)
  To: Kamil Dudka
  Cc: Florian Weimer, bug-gnulib, Jeff Layton, Bernhard Voelker,
	NeilBrown, Eric Blake

Kamil Dudka wrote:

>> OK, but this shouldn't be a problem with any applications currently
>> using close_stdout. At least, none of the applications I know about.
> 
> How would you know if they did?

The usual way: ordinary users would be complaining. By "ordinary users" I mean 
users other than developers who are employing unusual compiler options, 
LD_PRELOAD settings, etc. for testing.
> As long as you link libraries dynamically, any of the directly or indirectly
> linked libraries can introduce an ELF destructor or atexit() handler anytime,

I don't see any way around this problem in general with the closeout module's 
current API, because when it discovers an I/O error it calls _exit, and _exit 
also clashes with that kind of cleanup handling.

If we want Coreutils and similar programs to be robust even for developers with 
unusual configurations for testing, I expect we'll need to change the programs 
to not use the closeout module at all. This would complicate these programs, 
since we'd need to check every way that every stdin-reading or stdout- or 
stderr-writing program can exit normally, and modify the affected programs to 
check the relevant I/O streams just before the normal exit occurs.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-13  1:09                       ` Paul Eggert
@ 2019-05-13  7:00                         ` Florian Weimer
  2019-05-25  1:41                           ` Paul Eggert
  2019-05-25 11:24                           ` Bruno Haible
  0 siblings, 2 replies; 34+ messages in thread
From: Florian Weimer @ 2019-05-13  7:00 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Kamil Dudka, bug-gnulib, Jeff Layton, Bernhard Voelker, NeilBrown,
	Eric Blake

* Paul Eggert:

>> As long as you link libraries dynamically, any of the directly or indirectly
>> linked libraries can introduce an ELF destructor or atexit() handler anytime,
>
> I don't see any way around this problem in general with the closeout
> module's current API, because when it discovers an I/O error it calls
> _exit, and _exit also clashes with that kind of cleanup handling.

This isn't the problem.  fflush may also realistically cause SIGPIPE to
be delivered to the process.  That's all fine.

The relevant case is where there is no error, and we do not call _exit.
I'm worried that the current implementation introduces a use-after-free
bug under certain, quite reasonable circumstances.  All that is needed
is a shared object that tries to log something to stderr from an ELF
destructor.  I don't think that's something that can be ruled out, or
can be assumed to happen in development environments only.

Even if I didn't have a user bug report about this issue, I would have
expected the current behavior to be unacceptable from a
quality-of-implementation perspective.  Look at the existing workaround
for sanitizers, and the comment in close_stream.  The code is buggy and
needs to be fixed.

My offer to write patch along the lines that Neil Brown sketched still
stands, but so does my concern that polishing it will require more work
than the patch itself.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-13  7:00                         ` Florian Weimer
@ 2019-05-25  1:41                           ` Paul Eggert
  2019-05-25 10:58                             ` Bruno Haible
  2019-05-27 11:56                             ` Florian Weimer
  2019-05-25 11:24                           ` Bruno Haible
  1 sibling, 2 replies; 34+ messages in thread
From: Paul Eggert @ 2019-05-25  1:41 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Kamil Dudka, bug-gnulib, Jeff Layton, Bernhard Voelker, NeilBrown,
	Eric Blake

On 5/13/19 12:00 AM, Florian Weimer wrote:
>> I don't see any way around this problem in general with the closeout
>> module's current API, because when it discovers an I/O error it calls
>> _exit, and _exit also clashes with that kind of cleanup handling.
> This isn't the problem.  fflush may also realistically cause SIGPIPE to
> be delivered to the process.  That's all fine.

It might be fine for some of these environments, but surely it can't be 
fine for others. If I'm relying on the startup routines to issue some 
sort of report for any non-signalling exit, then calls to _exit will 
bypass the report.

> Look at the existing workaround
> for sanitizers, and the comment in close_stream.  The code is buggy

I agree the code is a hack. But it's not buggy: it's portable to any 
environment that conforms to POSIX, and that's a wide variety of 
environments. The problem seems to be that people want to run these 
applications in debugging environments that don't conform to POSIX. 
While I'm sympathetic to that goal, it's not a high-priority-enough 
situation to call the current code "buggy".

A better fix here would not be to pile yet another hack into this hacky 
module. It would be to write a better module (we could call it 
"flushout", say) that would define a function ("flushout_stream", say) 
that would act like fflush but would do a better job with NFS-based 
streams by playing the dup+close+fsync trick. We could then modify 
coreutils etc. to use this new module instead of the old one. The hard 
part is the "modify coreutils etc." part, because flushout_stream will 
be less convenient to use than the current API.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-25  1:41                           ` Paul Eggert
@ 2019-05-25 10:58                             ` Bruno Haible
  2019-05-27 11:56                             ` Florian Weimer
  1 sibling, 0 replies; 34+ messages in thread
From: Bruno Haible @ 2019-05-25 10:58 UTC (permalink / raw)
  To: bug-gnulib
  Cc: Florian Weimer, Kamil Dudka, Paul Eggert, Jeff Layton,
	Bernhard Voelker, NeilBrown, Eric Blake

Paul Eggert wrote:
> A better fix ... would be to write a better module (we could call it 
> "flushout", say) that would define a function ("flushout_stream", say) 
> that would act like fflush but would do a better job with NFS-based 
> streams by playing the dup+close+fsync trick. We could then modify 
> coreutils etc. to use this new module instead of the old one. The hard 
> part is the "modify coreutils etc." part, because flushout_stream will 
> be less convenient to use than the current API.

I disagree that it would be a better module.

1) As stated in [1], I think that it would be wrong for gnulib to call
   fsync().

2) It would be a hacky module that makes assumptions about how the kernel
   behaves, assumptions that go further than what POSIX guarantees:

   - Do you want to apply the dup+close trick [2] for all file descriptors
     or only for those that write to certain file systems? In the first case,
     you are causing all coreutils programs to make more system calls than
     needed. In the second case, you are making assumptions about the file
     systems. We know about NFS and Coda only through code inspection of
     the Linux kernel.

   - The dup+close trick may work on some kernels and not on others.
     It is perfectly reasonable for a kernel to delay flushing the internal
     buffers associated with a file until the *last* close(). POSIX [3]
     specifies several behaviours of close() that occur only on the last
     close(): "... the last close() shall ...". So a kernel must already
     have the reference-counting needed to detect the last close().
     How do you want to achieve portability from Linux over OpenBSD to Solaris,
     then?

Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2019-05/msg00039.html
[2] https://lists.gnu.org/archive/html/bug-gnulib/2019-05/msg00046.html
[3] https://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-13  7:00                         ` Florian Weimer
  2019-05-25  1:41                           ` Paul Eggert
@ 2019-05-25 11:24                           ` Bruno Haible
  2019-05-25 19:23                             ` Paul Eggert
  2019-05-27 12:00                             ` Florian Weimer
  1 sibling, 2 replies; 34+ messages in thread
From: Bruno Haible @ 2019-05-25 11:24 UTC (permalink / raw)
  To: bug-gnulib
  Cc: Florian Weimer, Kamil Dudka, Paul Eggert, Jeff Layton,
	Bernhard Voelker, NeilBrown, Eric Blake

Florian Weimer wrote in
<https://lists.gnu.org/archive/html/bug-gnulib/2019-05/msg00087.html>:
> The relevant case is where there is no error, and we do not call _exit.
> I'm worried that the current implementation introduces a use-after-free
> bug under certain, quite reasonable circumstances.  All that is needed
> is a shared object that tries to log something to stderr from an ELF
> destructor.  I don't think that's something that can be ruled out, or
> can be assumed to happen in development environments only.

OK, now you have described the problem in an understandable way.

Let me rephrase the dilemma:

  1) POSIX guarantees that we can detect write errors [up to the file
     system layer of the kernel - I'm not worried about I/O errors on
     the actual device] through fclose(), and kernels implement this.
     Neither POSIX nor Linux guarantees that we can detect write errors
     without calling fclose().

  2) POSIX says "After the call to fclose(), any use of stream results in
     undefined behavior." [1]

So, we need to call fclose(stderr) at a moment when we know that stderr
will not be used any more.

We have
  * applications (like the coreutils programs), and
  * environments which can modify the behaviour of these applications,
    like shared objects added through LD_PRELOAD, or sanitizers [2].

The solution I would propose is to
  - By default, assume that the environment does not modify the behaviour
    of the application. That is, that the application executes its code
    and nothing more.
  - Let the environment tell the application (through an environment
    variable) that it is modifying its behaviour.

For the first case, the current 'closeout' module is perfect.

For the other case, we can introduce, next to the !SANITIZE_ADDRESS test,
tests for
  getenv ("LD_PRELOAD") != NULL
  getenv ("ASAN_OPTIONS") != NULL
  getenv ("TSAN_OPTIONS") != NULL
  getenv ("MSAN_OPTIONS") != NULL
  getenv ("LSAN_OPTIONS") != NULL
We can add more such environment variables as needed. getenv() lookups
don't make system calls; so they are cheap.

Bruno

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/fclose.html
[2] https://github.com/google/sanitizers/wiki/SanitizerCommonFlags



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-25 11:24                           ` Bruno Haible
@ 2019-05-25 19:23                             ` Paul Eggert
  2019-05-27 12:00                             ` Florian Weimer
  1 sibling, 0 replies; 34+ messages in thread
From: Paul Eggert @ 2019-05-25 19:23 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib
  Cc: Florian Weimer, Kamil Dudka, Jeff Layton, Bernhard Voelker,
	NeilBrown, Eric Blake

Bruno Haible wrote:
> For the other case, we can introduce, next to the !SANITIZE_ADDRESS test,
> tests for
>    getenv ("LD_PRELOAD") != NULL
>    getenv ("ASAN_OPTIONS") != NULL
>    getenv ("TSAN_OPTIONS") != NULL
>    getenv ("MSAN_OPTIONS") != NULL
>    getenv ("LSAN_OPTIONS") != NULL
> We can add more such environment variables as needed. getenv() lookups
> don't make system calls; so they are cheap.

A less-intrusive possibility is to suggest to people writing specialized 
log-to-stderr environments that they use an (optional) environment variable 
ERROR_FD to specify which file descriptor to log to (with the default being file 
descriptor 2), and that they test coreutils with ERROR_FD=3 and with file 
descriptor 3 open to their error log. That way, coreutils would not need to be 
modified, and could even get rid of the !SANITIZE_ADDRESS hack after 
AddressSanitizer adopts this approach.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-25  1:41                           ` Paul Eggert
  2019-05-25 10:58                             ` Bruno Haible
@ 2019-05-27 11:56                             ` Florian Weimer
  1 sibling, 0 replies; 34+ messages in thread
From: Florian Weimer @ 2019-05-27 11:56 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Kamil Dudka, bug-gnulib, Jeff Layton, Bernhard Voelker, NeilBrown,
	Eric Blake

* Paul Eggert:

> On 5/13/19 12:00 AM, Florian Weimer wrote:
>>> I don't see any way around this problem in general with the closeout
>>> module's current API, because when it discovers an I/O error it calls
>>> _exit, and _exit also clashes with that kind of cleanup handling.
>> This isn't the problem.  fflush may also realistically cause SIGPIPE to
>> be delivered to the process.  That's all fine.
>
> It might be fine for some of these environments, but surely it can't
> be fine for others. If I'm relying on the startup routines to issue
> some sort of report for any non-signalling exit, then calls to _exit
> will bypass the report.

Sorry, I don't follow.  If fflush (stderr) fails (or terminates the
process with SIGPIPE), in the current code, then at least there isn't a
memory safety violation.  It's also a bit unlikely that code running
later could do anything useful, given that the process or the entire
system is in such a bad state.

As far as I understand, the close_stdout function intends to avoid
silent truncation due to an ENOSPC error.  It does not call fsync, so it
cannot prevent data loss as the result of a subsequent system crash, but
at least it prevents silent ENOSPC data corruption with the Linux NFS
implementation.

>> Look at the existing workaround
>> for sanitizers, and the comment in close_stream.  The code is buggy
>
> I agree the code is a hack. But it's not buggy: it's portable to any
> environment that conforms to POSIX, and that's a wide variety of
> environments.

As far as I know, POSIX does not say anywhere something like, “No
function in this volume of POSIX.1‐2008 shall write anything to the
standard error stream” or “No function in this volume of POSIX.1‐2008
shall call the perror function”.

> The problem seems to be that people want to run these applications in
> debugging environments that don't conform to POSIX. While I'm
> sympathetic to that goal, it's not a high-priority-enough situation to
> call the current code "buggy".

I don't think this issue is restricted to debugging environments.
Printing diagnostics to stderr is always a bit iffy, but can happen in
many cases.

> A better fix here would not be to pile yet another hack into this
> hacky module. It would be to write a better module (we could call it
> "flushout", say) that would define a function ("flushout_stream", say)
> that would act like fflush but would do a better job with NFS-based
> streams by playing the dup+close+fsync trick. We could then modify
> coreutils etc. to use this new module instead of the old one. The hard
> part is the "modify coreutils etc." part, because flushout_stream will
> be less convenient to use than the current API.

That's why I think we need to fix close_stdout instead.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-25 11:24                           ` Bruno Haible
  2019-05-25 19:23                             ` Paul Eggert
@ 2019-05-27 12:00                             ` Florian Weimer
  2019-05-27 21:13                               ` Bruno Haible
  1 sibling, 1 reply; 34+ messages in thread
From: Florian Weimer @ 2019-05-27 12:00 UTC (permalink / raw)
  To: Bruno Haible
  Cc: Kamil Dudka, Paul Eggert, bug-gnulib, Jeff Layton,
	Bernhard Voelker, NeilBrown, Eric Blake

* Bruno Haible:

> Florian Weimer wrote in
> <https://lists.gnu.org/archive/html/bug-gnulib/2019-05/msg00087.html>:
>> The relevant case is where there is no error, and we do not call _exit.
>> I'm worried that the current implementation introduces a use-after-free
>> bug under certain, quite reasonable circumstances.  All that is needed
>> is a shared object that tries to log something to stderr from an ELF
>> destructor.  I don't think that's something that can be ruled out, or
>> can be assumed to happen in development environments only.
>
> OK, now you have described the problem in an understandable way.
>
> Let me rephrase the dilemma:
>
>   1) POSIX guarantees that we can detect write errors [up to the file
>      system layer of the kernel - I'm not worried about I/O errors on
>      the actual device] through fclose(), and kernels implement this.
>      Neither POSIX nor Linux guarantees that we can detect write errors
>      without calling fclose().
>
>   2) POSIX says "After the call to fclose(), any use of stream results in
>      undefined behavior." [1]
>
> So, we need to call fclose(stderr) at a moment when we know that stderr
> will not be used any more.
>
> We have
>   * applications (like the coreutils programs), and
>   * environments which can modify the behaviour of these applications,
>     like shared objects added through LD_PRELOAD, or sanitizers [2].

This doesn't have to do anything with LD_PRELOAD.  It really depends
what kind of ELF destructors and atexit handlers are present in the
process image.

> The solution I would propose is to
>   - By default, assume that the environment does not modify the behaviour
>     of the application. That is, that the application executes its code
>     and nothing more.
>   - Let the environment tell the application (through an environment
>     variable) that it is modifying its behaviour.
>
> For the first case, the current 'closeout' module is perfect.
>
> For the other case, we can introduce, next to the !SANITIZE_ADDRESS test,
> tests for
>   getenv ("LD_PRELOAD") != NULL
>   getenv ("ASAN_OPTIONS") != NULL
>   getenv ("TSAN_OPTIONS") != NULL
>   getenv ("MSAN_OPTIONS") != NULL
>   getenv ("LSAN_OPTIONS") != NULL
> We can add more such environment variables as needed. getenv() lookups
> don't make system calls; so they are cheap.

The application may have called clearenv before that.

What's so bad about closing the underyling file descriptor after
duplicating it?  It's much more portable than the other stdio hacks the
gnulib code contains today.  It will not have the desired effect on some
platforms, and on others, it is redundant because the write system call
performs ENOSPC checks, even on NFS.  It fixes a real problem our users
reported on Linux.  Why can't we make this work correctly out of the
box?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Why does close_stdout close stdout and stderr?
  2019-05-27 12:00                             ` Florian Weimer
@ 2019-05-27 21:13                               ` Bruno Haible
  0 siblings, 0 replies; 34+ messages in thread
From: Bruno Haible @ 2019-05-27 21:13 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Kamil Dudka, Paul Eggert, bug-gnulib, Jeff Layton,
	Bernhard Voelker, NeilBrown, Eric Blake

Florian Weimer wrote:
> What's so bad about closing the underyling file descriptor after
> duplicating it?

See my other mail:
<https://lists.gnu.org/archive/html/bug-gnulib/2019-05/msg00134.html>

> It will not have the desired effect on some platforms

Then we will have a regression on these platforms. We don't want regressions.

> It fixes a real problem our users reported on Linux.

Please show the test program and test recipes (with results) that prove that.

> > For the other case, we can introduce, next to the !SANITIZE_ADDRESS test,
> > tests for
> >   getenv ("LD_PRELOAD") != NULL
> >   getenv ("ASAN_OPTIONS") != NULL
> >   getenv ("TSAN_OPTIONS") != NULL
> >   getenv ("MSAN_OPTIONS") != NULL
> >   getenv ("LSAN_OPTIONS") != NULL
> > We can add more such environment variables as needed. getenv() lookups
> > don't make system calls; so they are cheap.
> 
> The application may have called clearenv before that.

The only programs that use clearenv() in a normal Linux distro are init,
systemd, pkexec, php-fpm, and they don't use gnulib. So, no need to worry
about applications that call clearenv().

Bruno



^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2019-05-27 21:14 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-29 19:45 Why does close_stdout close stdout and stderr? Florian Weimer
2019-04-29 19:49 ` Eric Blake
2019-04-29 20:26   ` Florian Weimer
2019-05-06 12:05     ` Florian Weimer
2019-05-06 14:56       ` Bernhard Voelker
2019-05-06 15:47         ` Florian Weimer
2019-05-06 19:14           ` Bernhard Voelker
2019-05-06 19:19             ` Florian Weimer
2019-05-09  6:20               ` Bernhard Voelker
2019-05-09  6:39                 ` Florian Weimer
2019-05-09  9:49                   ` Bernhard Voelker
2019-05-09 22:17                   ` Paul Eggert
2019-05-10  7:28                     ` Kamil Dudka
2019-05-10  7:31                       ` Florian Weimer
2019-05-13  1:09                       ` Paul Eggert
2019-05-13  7:00                         ` Florian Weimer
2019-05-25  1:41                           ` Paul Eggert
2019-05-25 10:58                             ` Bruno Haible
2019-05-27 11:56                             ` Florian Weimer
2019-05-25 11:24                           ` Bruno Haible
2019-05-25 19:23                             ` Paul Eggert
2019-05-27 12:00                             ` Florian Weimer
2019-05-27 21:13                               ` Bruno Haible
2019-05-06 18:53       ` Paul Eggert
2019-05-06 19:02         ` Jeff Layton
2019-05-06 22:32       ` Bruno Haible
2019-05-07  9:44         ` Assaf Gordon
2019-05-07  9:49           ` Assaf Gordon
2019-05-07 11:28           ` Bruno Haible
2019-05-08  0:43             ` NeilBrown
2019-05-08 11:00               ` Florian Weimer
2019-05-09  4:42                 ` Paul Eggert
2019-05-09  5:01                   ` Florian Weimer
2019-05-09  6:27                   ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).