unofficial mirror of libc-alpha@sourceware.org
From: Aleksa Sarai <cyphar@cyphar.com>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fweimer@redhat.com>,
	Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
Subject: Re: RFC: Disable clone3 for glibc 2.34
Date: Tue, 27 Jul 2021 20:22:22 +1000
Message-ID: <20210727102222.r2hys526mfkpt4xo@senku>
In-Reply-To: <20210727094117.jid7shl7futsciih@wittgenstein>

On 2021-07-27, Christian Brauner <christian.brauner@ubuntu.com> wrote:
> On Tue, Jul 27, 2021 at 11:24:16AM +0200, Christian Brauner wrote:
> > On Tue, Jul 27, 2021 at 11:11:17AM +0200, Florian Weimer via Libc-alpha wrote:
> > > * Florian Weimer via Libc-alpha:
> > > 
> > > > Reportedly, the docker package in Ubuntu as used by Github Actions and
> > > > others does not provide a way to enable the clone3 system call.  It
> > > > always fails with EPERM.
> > > >
> > > > Should we apply a patch like this for the release?
> > > >
> > > > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> > > > index 1e7a8f6b35..4046c81180 100644
> > > > --- a/sysdeps/unix/sysv/linux/clone-internal.c
> > > > +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> > > > @@ -48,17 +48,6 @@ __clone_internal (struct clone_args *cl_args,
> > > >  		  int (*func) (void *arg), void *arg)
> > > >  {
> > > >    int ret;
> > > > -#ifdef HAVE_CLONE3_WAPPER
> > > > -  /* Try clone3 first.  */
> > > > -  int saved_errno = errno;
> > > > -  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> > > > -  if (ret != -1 || errno != ENOSYS)
> > > > -    return ret;
> > > > -
> > > > -  /* NB: Restore errno since errno may be checked against non-zero
> > > > -     return value.  */
> > > > -  __set_errno (saved_errno);
> > > > -#endif
> > > >  
> > > >    /* Map clone3 arguments to clone arguments.  NB: No need to check
> > > >       invalid clone3 specific bits in flags nor exit_signal since this
> > > >
> > > > My concern with this is that we don't know yet where the CET kernel API
> > > > will land exactly and if CET will require clone3.  So clone3 might have
> > > > to come back once we turn on CET, which is hopefully soon.
> > > 
> > > Ubuntu 20.04 LTS may have already been fixed, I cannot reproduce the
> > > issue with its docker.io/containerd/runc packages.
> > > 
> > > I could trivially fix a previously failing Github Action with:
> > > 
> > > diff --git a/.github/workflows/fedora.yml b/.github/workflows/fedora.yml
> > > index d2381ec..7b10286 100644
> > > --- a/.github/workflows/fedora.yml
> > > +++ b/.github/workflows/fedora.yml
> > > @@ -22,6 +22,7 @@ jobs:
> > >      runs-on: ubuntu-latest
> > >      container:
> > >        image: fedora:${{matrix.release}}
> > > +      options: --security-opt seccomp=unconfined
> > >  
> > >      steps:
> > >        - name: Checkout repository
> > > 
> > > So I think we need to figure out what people are actually complaining
> > > about.
> > 
> > This relates to the discussion what errno value should be used in a
> > seccomp filter to indicate that a syscall is blocked.
> > 
> > So there are two problems I see with seccomp and clone3():
> > 1. the profile doesn't include clone3() at all and therefore the syscall
> >    is blocked and the default action is EPERM
> > 2. the profile does include clone3() and decided to block it but the
> >    runtime has decided to make seccomp return EPERM and not ENOSYS when
> >    clone3() is attempted
> > 
> > The correct fix in both scenarios is to add clone3() to the seccomp
> > profile and either allow it or return ENOSYS.
> > 
> > Note that this ENOSYS/EPERM problem is a general problem. It is not just
> > glibc that doesn't know when to fall back gracefully; other tools don't
> > know either. Application containers usually just get lucky because their applications
> > don't need to issue the syscalls that are blocked. On a generic system
> > container with systemd inside this is always an issue and not using
> > ENOSYS is guaranteed to fail across the board.
> 
> Aleksa, this is fixed in runC, right?

Yes, runc has had the -ENOSYS fallback behaviour for a few releases now.

The way it works is that any syscall which has a larger syscall number
than any syscall specified in the filter will get -ENOSYS (this works
even if libseccomp is outdated). The only way you could get the -EPERM
behaviour with modern runc is if you write a seccomp profile that has
rules for newer syscalls (openat2 for instance) but not clone3 -- but
Docker doesn't do that. (The reason for this slightly convoluted
behaviour was to make sure that intentional omissions actually give you
-EPERM.)
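
To make the rule above concrete, here is a minimal sketch in C. It is
not runc's code (runc generates a BPF filter); it only models the
decision. The function name filter_result, the flat "allowed" array and
the idea that a profile is just a list of allowed syscall numbers are
all assumptions made for illustration. The syscall numbers are the
x86-64 ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* x86-64 syscall numbers, used here only as sample data. */
enum { NR_READ = 0, NR_PIDFD_OPEN = 434, NR_CLONE3 = 435, NR_OPENAT2 = 437 };

/*
 * Hypothetical model of the rule described above, not runc's actual code.
 * "allowed" stands in for the syscalls a profile has rules for (assumed
 * here to all be "allow" rules).
 *
 * Returns 0 if the syscall is allowed, otherwise the errno value the
 * filter would report to the caller.
 */
int filter_result(const int *allowed, size_t n, int nr)
{
    int max_nr = -1;  /* largest syscall number the profile knows about */

    assert(n == 0 || allowed != NULL);
    for (size_t i = 0; i < n; i++) {
        if (allowed[i] == nr)
            return 0;
        if (allowed[i] > max_nr)
            max_nr = allowed[i];
    }

    /*
     * Newer than anything in the profile: the profile author could not
     * have known about it, so pretend the kernel lacks it (ENOSYS).
     * Otherwise the omission is treated as intentional (EPERM).
     */
    return nr > max_nr ? ENOSYS : EPERM;
}
```

With a profile that only lists { NR_READ, NR_PIDFD_OPEN }, clone3 is
newer than every listed syscall and this model reports ENOSYS, so a
caller can fall back to clone. With a profile that lists
{ NR_READ, NR_OPENAT2 } but not clone3, the omission counts as
intentional and the model reports EPERM.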

However, this requires the container host to have an updated version of
runc, which is up to GitHub. (Though we fixed a security issue in runc
recently, so I would expect that they've updated their versions of runc
by now.)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
