From: Aleksa Sarai <cyphar@cyphar.com>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fweimer@redhat.com>,
Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
Subject: Re: RFC: Disable clone3 for glibc 2.34
Date: Tue, 27 Jul 2021 20:22:22 +1000
Message-ID: <20210727102222.r2hys526mfkpt4xo@senku>
In-Reply-To: <20210727094117.jid7shl7futsciih@wittgenstein>
On 2021-07-27, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> On Tue, Jul 27, 2021 at 11:24:16AM +0200, Christian Brauner wrote:
> > On Tue, Jul 27, 2021 at 11:11:17AM +0200, Florian Weimer via Libc-alpha wrote:
> > > * Florian Weimer via Libc-alpha:
> > >
> > > > Reportedly, the docker package in Ubuntu as used by Github Actions and
> > > > others does not provide a way to enable the clone3 system call. It
> > > > always fails with EPERM.
> > > >
> > > > Should we apply a patch like this for the release?
> > > >
> > > > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> > > > index 1e7a8f6b35..4046c81180 100644
> > > > --- a/sysdeps/unix/sysv/linux/clone-internal.c
> > > > +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> > > > @@ -48,17 +48,6 @@ __clone_internal (struct clone_args *cl_args,
> > > >                           int (*func) (void *arg), void *arg)
> > > >  {
> > > >    int ret;
> > > > -#ifdef HAVE_CLONE3_WAPPER
> > > > -  /* Try clone3 first.  */
> > > > -  int saved_errno = errno;
> > > > -  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> > > > -  if (ret != -1 || errno != ENOSYS)
> > > > -    return ret;
> > > > -
> > > > -  /* NB: Restore errno since errno may be checked against non-zero
> > > > -     return value.  */
> > > > -  __set_errno (saved_errno);
> > > > -#endif
> > > >
> > > >    /* Map clone3 arguments to clone arguments.  NB: No need to check
> > > >       invalid clone3 specific bits in flags nor exit_signal since this
> > > >
> > > > My concern with this is that we don't know yet where the CET kernel API
> > > > will land exactly and if CET will require clone3. So clone3 might have
> > > > to come back once we turn on CET, which is hopefully soon.
> > >
> > > Ubuntu 20.04 LTS may have already been fixed, I cannot reproduce the
> > > issue with its docker.io/containerd/runc packages.
> > >
> > > I could trivially fix a previously failing Github Action with:
> > >
> > > diff --git a/.github/workflows/fedora.yml b/.github/workflows/fedora.yml
> > > index d2381ec..7b10286 100644
> > > --- a/.github/workflows/fedora.yml
> > > +++ b/.github/workflows/fedora.yml
> > > @@ -22,6 +22,7 @@ jobs:
> > >      runs-on: ubuntu-latest
> > >      container:
> > >        image: fedora:${{matrix.release}}
> > > +      options: --security-opt seccomp=unconfined
> > >
> > >      steps:
> > >        - name: Checkout repository
> > >
> > > So I think we need to figure out what people are actually complaining
> > > about.
> >
> > This relates to the discussion of what errno value should be used in a
> > seccomp filter to indicate that a syscall is blocked.
> >
> > So there are two problems I see with seccomp and clone3():
> > 1. the profile doesn't include clone3() at all and therefore the syscall
> > is blocked and the default action is EPERM
> > 2. the profile does include clone3() and decided to block it but the
> > runtime has decided to make seccomp return EPERM and not ENOSYS when
> > clone3() is attempted
> >
> > The correct fix in both scenarios is to add clone3() to the seccomp
> > profile and either allow it or return ENOSYS.
> >
> > Note that this ENOSYS/EPERM problem is a general problem. It's not just
> > glibc that doesn't know when to fall back gracefully; other tools don't
> > know either. Application containers usually just get lucky because their
> > applications don't need to issue the syscalls that are blocked. On a
> > generic system container with systemd inside this is always an issue and
> > not using ENOSYS is guaranteed to fail across the board.
>
> Aleksa, this is fixed in runC, right?

Yes, runc has had the -ENOSYS fallback behaviour for a few releases now.
The way it works is that any syscall which has a larger syscall number
than any syscall specified in the filter will get -ENOSYS (this works
even if libseccomp is outdated). The only way you could get the -EPERM
behaviour with modern runc is if you write a seccomp profile that has
rules for newer syscalls (openat2 for instance) but not clone3 -- but
Docker doesn't do that. (The reason for this slightly convoluted
behaviour is to make sure that intentional omissions actually give you
-EPERM.)
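
To make the rule above concrete, here is a minimal sketch in C. It is
not runc's code and does not use the libseccomp API; the names
classify_syscall, verdict_errno and enum verdict are made up for this
illustration, and it assumes the profile's default action is EPERM. It
only models the decision: a syscall with an explicit rule follows that
rule, a syscall numbered above every syscall in the profile gets ENOSYS,
and anything else falls through to EPERM.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical verdicts for this sketch; not runc or libseccomp names.  */
enum verdict
{
  VERDICT_RULE,    /* The profile has an explicit rule for the syscall.  */
  VERDICT_EPERM,   /* Omitted, but older than something in the profile.  */
  VERDICT_ENOSYS   /* Newer than every syscall the profile knows about.  */
};

/* Classify syscall number NR against the N_RULED syscall numbers in
   RULED, which are the syscalls the profile has explicit rules for.  */
static enum verdict
classify_syscall (const long *ruled, size_t n_ruled, long nr)
{
  long max_ruled = -1;

  for (size_t i = 0; i < n_ruled; i++)
    {
      if (ruled[i] == nr)
        return VERDICT_RULE;
      if (ruled[i] > max_ruled)
        max_ruled = ruled[i];
    }

  /* NR has no rule.  If it is numbered above everything in the profile,
     treat it as unknown to the profile author and report ENOSYS so that
     callers can fall back.  Otherwise treat it as an intentional
     omission and report EPERM (the assumed default action).  */
  return nr > max_ruled ? VERDICT_ENOSYS : VERDICT_EPERM;
}

/* Errno value the caller would observe; 0 means "defer to the rule".  */
static int
verdict_errno (enum verdict v)
{
  switch (v)
    {
    case VERDICT_ENOSYS:
      return ENOSYS;
    case VERDICT_EPERM:
      return EPERM;
    default:
      return 0;
    }
}
```

With x86_64 syscall numbers, a profile with rules for read (0), write
(1) and clone (56) classifies clone3 (435) as ENOSYS, so the glibc
fallback to clone works. A profile that also has a rule for openat2
(437) but none for clone3 classifies clone3 as EPERM, which is the
"intentional omission" case.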

However, this requires the container host to have an updated version of
runc, which is up to GitHub. (Though we fixed a security issue in runc
recently, so I would expect that they've updated their versions of runc
by now.)

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>