unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Rich Felker <dalias@libc.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	 Arnd Bergmann <arnd@arndb.de>,
	 Alistair Francis <alistair.francis@wdc.com>,
	 GNU C Library <libc-alpha@sourceware.org>,
	 Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	 Florian Weimer <fweimer@redhat.com>,
	 Palmer Dabbelt <palmer@sifive.com>,
	 macro@wdc.com,  Zong Li <zongbox@gmail.com>,
	 Alistair Francis <alistair23@gmail.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Al Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <christian@brauner.io>,
	 "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
Date: Mon, 22 Jul 2019 19:00:50 -0500	[thread overview]
Message-ID: <87k1c962ml.fsf@xmission.com> (raw)
In-Reply-To: 20190721232336.GA30851@brightrain.aerifal.cx

Rich Felker <dalias@libc.org> writes:

> On Sun, Jul 21, 2019 at 04:40:27PM -0500, Eric W. Biederman wrote:
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> 
>> > On Sun, Jul 21, 2019 at 8:45 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >>
>> >> We either add P_PROCESS_PGID to the kernel's waitid or we add wait4.
>> >
>> > Do we actually need a new flag?
>> >
>> > Why not just allow people to pass in pid = 0 with the existing P_PGID flag?
>> >
>> > Those are the traditional and documented waitpid() semantics. Why
>> > wouldn't we use those semantics for waitid() too?
>> >
>> > And since our current (broken) waitid() returns EINVAL for that case,
>> > it's even trivial to test for in user space (and fall back on the
>> > broken racy code in user space - which you have to do anyway).
>> 
>> Our current waitid implementation when serving as waitid is not broken.
>> 
>> What is broken is the waitpid emulation on top of waitid.
>> 
>> The way waitid is defined waitid P_PGID with a pid of 0 means
>> wait for the process group id of 0.  Which in theory and in limited
>> practice exists.  AKA that is the pid and process group of the idle
>> thread.
>> 
>> Further there is the outstanding question.  Should P_PROCESS_PGID refer
>> to the curent processes process group id at the time of the waitid call,
>> or should P_PROCESS_PGID refer to the process group id when a child is
>> found?
>> 
>> It has been suggested elswehere in this conversation that 161550d74c07
>> ("pid: sys_wait... fixes") may have introduced a regression.  If so
>> the current wait4 behavior of capturing the process group at the time
>> of call is wrong and needs to be fixed.
>> 
>> Not capuring the pid at time time of call is a very different behavior
>> of P_PROCESS_PGID vs all of the other waitid cases which do have the pid
>> fixed at the time of the call which further argues a separate flag
>> because the behavior would be distinctly different.
>
> I believe that is a regression, and that the "capturing it" line of
> thinking is misleading. It's thinking in terms of implementation
> rather than interface contract. The interface contract covers which
> children it can reap and when it can block. If the call remains
> blocking, there must be a possible ordering of events, with no
> observable effects contradicting that order, by which there are no
> reapable children in the process's current pgid. If the call returns
> successfully, there must be a possible ordering of events, with no
> observable effects contradicting that order, by which the caller's
> pgid and the reaped process's pgid were equal.

A regression (as we talk about it in the linux kernel) requires there to
be an actual program that has problems because of the change in behavior
of the kernel.  Just it being possible for userspace to observe a
behavior change is not enough for something to be classified as a
regression.

Furthermore thinking about this and looking at it more closely this can
only be an issue in the presence of pthreads.  Except in a very narrow
special case setpgid is restricted to only changing the process group of
the calling process.  Which means that in unix without pthreads setpgid
can not run while a process is in wait, waitpid, or wait4.


I was really hoping we would have a pointer to a program that shows
a problem, or a clarification that someone was just reading the code.
I will wait a bit more but given that the code has worked without bug
reports for 11 years I am going to assume that there are no programs
that have regressed because of my old change.


>> > Honestly, that seems like the simplest soilution, but also like the
>> > only sane model. The fact that P_PGID with a zero pid doesn't work
>> > seems to simply be a bug right now, and keeps waitid() from being a
>> > proper superset of waitpid().
>> 
>> In the context of waitid it does not look sane.  Unlike the other wait
>> functions waitid does not have any magic ids and by having an idtype
>> field makes that kind of magic unnecessary.  Further it is trivial
>> to add one line to the architecture independent enumeration and have 0
>> risk of confusion or of regressing user space.
>
> I'm in agreement that if an extension to the waitid syscall is added,
> it should be P_PROCESS_PGID, not defining some special case for
> pid==0. It's not entirely clear but arguable that the standard
> requires EINVAL for P_PGID + pid==0, and plausible that there are
> applications that depend on this. We could emulate the EINVAL case in
> userspace, but assigning weird semantics to special cases is just a
> mess of potential future headaches when it would be trivial to do it
> right. And doing it right would also make programming userspace side
> easier.

Since this is a POSIX interface and shared between many unices I took
at look at a couple to see what they have done.  If possible it is
important to be able to write programs that work the same across
different unices.

The BSDs implemented wait6 as their common wait that does everything
system call.   It's large contribution is that wait6 will give both
the rusage of the child that exited and it's children separately.
To be the one system call that is a superset of them all wait6 implements
both P_PGID and P_PID with an id of 0 as a request to wait for any
process in the current process group.

The wait6 family also extends the idtype enumeration a bit with
things such as P_UID and P_GID that wait for the effective uid and gid
respectively.

I looked at the code on FreeBSD and they replace the id of 0 with
an the result of getpgid() internal to the system call, before any
waiting happens.   Resulting in the same semantics as current Linux.


I took a quick look at what Illumos does and they have narrowed their
kernel wait interface to just the posix waitid.  There is no rusage
nor is their any special case for waiting for the processes in the
current process group.


I looked at XNU and it still implements both wait4 and waitid as native
interfaces.  The waitid is not extended while their wait4 converts 0 to
-getpgrp() before waiting.



So my recommendation now is to avoid gratuitous incompatibilities.

1) For extending waitid.  Let's use P_PGID and and id of 0 to represent
   the callers process group.  That is compatible with the BSDs, and
   portable programs should not have any problem with it.

   I want to stay away from the BSD extention of P_PID with an id of 0
   meaning wait for the calling process's process group.  I see where
   it comes from but that is confusing.

2) It appears the popular definition of current process group is the
   current process group at the time of the system call.  Currently that
   is Linux, FreeBSD, Illumos, and XNU.  So short of a real program that
   cares, let's adopt that definition in linux.  Along with patches I
   will see about getting the linux manpage updated to clarify that
   point.

Eric


  reply	other threads:[~2019-07-23  0:02 UTC|newest]

Thread overview: 127+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-17  0:08 [RFC v3 00/23] RISC-V glibc port for the 32-bit Alistair Francis
2019-07-17  0:08 ` [RFC v3 01/23] sysdeps/nanosleep: Use clock_nanosleep_time64 if avaliable Alistair Francis
2019-07-17  5:16   ` Florian Weimer
2019-07-19 17:25     ` Alistair Francis
2019-07-20 14:24       ` Stepan Golosunov
2019-07-22 21:14         ` Alistair Francis
2019-07-17  0:08 ` [RFC v3 02/23] sysdeps/gettimeofday: Use clock_gettime64 " Alistair Francis
2019-07-17  7:09   ` Florian Weimer
2019-07-20  3:20     ` Rich Felker
2019-07-25 20:54       ` Joseph Myers
2019-07-17 12:43   ` Lukasz Majewski
2019-07-17 12:48     ` Lukasz Majewski
2019-07-19 22:26     ` Alistair Francis
2019-07-17  0:08 ` [RFC v3 03/23] sysdeps/wait: Use waitid " Alistair Francis
2019-07-17  5:31   ` Florian Weimer
2019-07-19 17:49     ` Alistair Francis
2019-07-22 15:58       ` Florian Weimer
2019-07-22 21:02         ` Alistair Francis
2019-07-21  4:03   ` Rich Felker
2019-07-21  4:20     ` Rich Felker
2019-07-21 11:59       ` Eric W. Biederman
2019-07-21 22:59         ` Rich Felker
2019-07-21  7:57     ` Arnd Bergmann
2019-07-21 12:15       ` Eric W. Biederman
2019-07-21 12:28         ` Christian Brauner
2019-07-21 14:30         ` Arnd Bergmann
2019-07-21 15:45           ` Eric W. Biederman
2019-07-21 17:05             ` Arnd Bergmann
2019-07-21 17:16             ` Linus Torvalds
2019-07-21 21:40               ` Eric W. Biederman
2019-07-21 23:23                 ` Rich Felker
2019-07-23  0:00                   ` Eric W. Biederman [this message]
2019-07-23  8:12                     ` Arnd Bergmann
2019-07-23  8:28                       ` Christian Brauner
2019-07-23  8:45                         ` Arnd Bergmann
2019-07-25  0:04                           ` Alistair Francis
2019-07-25  4:40                             ` Rich Felker
2019-07-25 13:15                               ` Arnd Bergmann
2019-07-25 16:06                                 ` Christian Brauner
2019-07-25 17:14                                 ` Eric W. Biederman
2019-07-25 17:30                                   ` Christian Brauner
2019-08-13 22:22                                     ` Alistair Francis
2019-08-13 23:11                                       ` Rich Felker
2019-08-14  5:07                                         ` Christian Brauner
2019-08-14 11:38                                       ` [PATCH v1 0/1] waitid: process group enhancement christian.brauner
2019-08-14 11:38                                         ` [PATCH v1 1/1] waitid: Add support for waiting for the current process group christian.brauner
2019-08-14 12:29                                           ` Oleg Nesterov
2019-08-14 12:45                                             ` Christian Brauner
2019-08-14 12:50                                               ` Oleg Nesterov
2019-08-14 12:53                                                 ` Christian Brauner
2019-08-14 13:07                                       ` [PATCH v2 0/1] waitid: process group enhancement Christian Brauner
2019-08-14 13:07                                         ` [PATCH v2 1/1] waitid: Add support for waiting for the current process group Christian Brauner
2019-08-14 14:19                                           ` Oleg Nesterov
2019-08-14 14:35                                             ` Christian Brauner
2019-08-14 15:27                                               ` Oleg Nesterov
2019-08-14 15:30                                                 ` Christian Brauner
2019-08-14 15:43                                       ` [PATCH v3 0/1] waitid: process group enhancement Christian Brauner
2019-08-14 15:44                                         ` [PATCH v3 1/1] waitid: Add support for waiting for the current process group Christian Brauner
2019-08-14 16:09                                           ` Oleg Nesterov
2019-08-14 16:15                                             ` Christian Brauner
2019-08-14 16:34                                               ` Christian Brauner
2019-08-14 16:55                                                 ` Rich Felker
2019-08-14 17:02                                                   ` Christian Brauner
2019-08-14 17:06                                                   ` Linus Torvalds
2019-08-14 18:00                                                     ` Rich Felker
2019-08-14 20:50                                             ` Christian Brauner
2019-08-14 15:58                                         ` [PATCH v3 0/1] waitid: process group enhancement Rich Felker
2019-08-14 16:13                                           ` Christian Brauner
2019-07-26 23:35                                   ` [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable Alistair Francis
2019-07-17  0:08 ` [RFC v3 04/23] sysdeps/clock_gettime: Use clock_gettime64 " Alistair Francis
2019-07-17  5:38   ` Florian Weimer
2019-07-17  8:04     ` Arnd Bergmann
2019-07-17  8:44       ` Florian Weimer
2019-07-17  9:10         ` Arnd Bergmann
2019-07-17 15:16           ` Florian Weimer
2019-07-18  7:38             ` Arnd Bergmann
2019-07-18  8:18               ` Florian Weimer
2019-07-18  9:14                 ` Arnd Bergmann
2019-07-18 18:10                 ` Adhemerval Zanella
2019-07-19 21:03     ` Alistair Francis
2019-07-17  7:03   ` Andreas Schwab
2019-07-17 12:37   ` Lukasz Majewski
2019-07-17  0:08 ` [RFC v3 05/23] sysdeps/timespec_get: " Alistair Francis
2019-07-17  5:08   ` Florian Weimer
2019-07-17  7:59     ` Arnd Bergmann
2019-07-17  8:11       ` Florian Weimer
2019-07-17  8:23         ` Arnd Bergmann
2019-07-17  8:41           ` Florian Weimer
2019-07-17  8:54             ` Arnd Bergmann
2019-07-25 20:14       ` Joseph Myers
2019-07-17 12:22   ` Lukasz Majewski
2019-07-17  0:08 ` [RFC v3 06/23] Documentation for the RISC-V 32-bit port Alistair Francis
2019-07-17  0:08 ` [RFC v3 07/23] RISC-V: Use 64-bit time_t and off_t for RV32 and RV64 Alistair Francis
2019-07-17  8:27   ` Arnd Bergmann
2019-07-17 22:39     ` Alistair Francis
2019-07-18  7:41       ` Arnd Bergmann
2019-07-18 17:36         ` Alistair Francis
2019-07-19  6:44           ` Arnd Bergmann
2019-07-19 17:02             ` Alistair Francis
2019-07-17  0:09 ` [RFC v3 08/23] RISC-V: define __NR_futex as __NR_futex_time64 for 32-bit Alistair Francis
2019-07-17  0:09 ` [RFC v3 09/23] RISC-V: define __NR_* as __NR_*_time64/64 " Alistair Francis
2019-07-17  0:09 ` [RFC v3 10/23] RISC-V: define __NR_clock_getres as __NR_*_time64 " Alistair Francis
2019-07-17  0:09 ` [RFC v3 11/23] RISC-V: define __vdso_clock_getres as __vdso_clock_getres_time64 " Alistair Francis
2019-07-17  0:09 ` [RFC v3 12/23] RISC-V: define __vdso_clock_gettime as __vdso_clock_gettime64 " Alistair Francis
2019-07-17  8:16   ` Arnd Bergmann
2019-07-19 17:15     ` Alistair Francis
2019-07-17  0:09 ` [RFC v3 13/23] RISC-V: Use 64-bit timespec in clock_gettime vdso calls Alistair Francis
2019-07-17  8:13   ` Arnd Bergmann
2019-07-17  0:09 ` [RFC v3 14/23] RISC-V: Support dynamic loader for the 32-bit Alistair Francis
2019-07-17  0:09 ` [RFC v3 15/23] RISC-V: Add path of library directories " Alistair Francis
2019-07-17 12:20   ` Florian Weimer
2019-07-17  0:09 ` [RFC v3 16/23] RISC-V: The ABI implementation " Alistair Francis
2019-07-17  0:09 ` [RFC v3 17/23] RISC-V: Hard float support for the 32 bit Alistair Francis
2019-07-17  0:09 ` [RFC v3 18/23] RISC-V: Regenerate ULPs of RISC-V Alistair Francis
2019-07-17  0:09 ` [RFC v3 19/23] RISC-V: Add ABI lists Alistair Francis
2019-07-17  0:09 ` [RFC v3 20/23] RISC-V: Build Infastructure for the 32-bit Alistair Francis
2019-07-17  0:09 ` [RFC v3 21/23] RISC-V: Fix llrint and llround missing exceptions on RV32 Alistair Francis
2019-07-17 12:22   ` Florian Weimer
2019-07-17 22:32     ` Alistair Francis
2019-07-17  0:09 ` [RFC v3 22/23] Add RISC-V 32-bit target to build-many-glibcs.py Alistair Francis
2019-07-17  0:09 ` [RFC v3 23/23] RISC-V: Use 64-bit vdso syscalls Alistair Francis
2019-07-17  5:33   ` Florian Weimer
2019-07-17  8:02     ` Arnd Bergmann
2019-07-17 22:23       ` Alistair Francis
2019-07-17 23:42         ` Alistair Francis
2019-07-18  0:01           ` Alistair Francis
2019-07-19 17:14 ` [RFC v3 00/23] RISC-V glibc port for the 32-bit Alistair Francis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k1c962ml.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=akpm@linux-foundation.org \
    --cc=alistair.francis@wdc.com \
    --cc=alistair23@gmail.com \
    --cc=arnd@arndb.de \
    --cc=christian@brauner.io \
    --cc=dalias@libc.org \
    --cc=fweimer@redhat.com \
    --cc=hpa@zytor.com \
    --cc=libc-alpha@sourceware.org \
    --cc=macro@wdc.com \
    --cc=palmer@sifive.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zongbox@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).