unofficial mirror of libc-alpha@sourceware.org
From: "Daniel P. Berrangé via Libc-alpha" <libc-alpha@sourceware.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>,
	Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>,
	Christian Brauner <christian.brauner@ubuntu.com>
Subject: Re: RFC: Disable clone3 for glibc 2.34
Date: Thu, 29 Jul 2021 09:36:55 +0100
Message-ID: <YQJop2U28016Nh4F@redhat.com>
In-Reply-To: <871r7i8hb0.fsf@oldenburg.str.redhat.com>

On Wed, Jul 28, 2021 at 07:44:03PM +0200, Florian Weimer wrote:
> * Aleksa Sarai:
> 
> > Yes, runc has had the -ENOSYS fallback behaviour for a few releases now.
> >
> > The way it works is that any syscall which has a larger syscall number
> > than any syscall specified in the filter will get -ENOSYS (this works
> > even if libseccomp is outdated). The only way you could get the -EPERM
> > behaviour with modern runc is if you write a seccomp profile that had
> > rules for newer syscalls (openat2 for instance) but not clone3 -- but
> > Docker doesn't do that. (The reason for this slightly convoluted
> > behaviour was to make sure that intentional omissions actually give you
> > -EPERM.)
> >
> > However this requires the container host to have an updated version of
> > runc which is up to GitHub. (Though we fixed a security issue in runc
> > recently, so I would expect that they've updated their versions of runc
> > by now.)
> 
> Indeed I wasn't able to reproduce this locally.  Ubuntu's docker.io
> package behaves as expected, even for “docker build” as far as I can
> see.
> 
> So far, the reported breakage has been focused on Github Actions and
> Azure Devops.  They use a custom Docker-Moby build, and I don't know
> what's in it.  The net effect is that clone3 does not work in containers
> by default.  “docker build” still does not allow “--security-opt
> seccomp=unconfined” for unknown reasons, but that workaround still
> applies to “docker create”.

FYI I found this issue describing the docker build + seccomp feature
gap. Seems it was intentional to not allow it to be used:

   https://github.com/moby/moby/issues/34454

To work around this gap in "docker build", you can pass the
--seccomp-profile option to the dockerd daemon when starting it up, to
supply a new profile that applies by default to everything. That doesn't
help if you're just using a dockerd instance started/managed by
someone/something else though.
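
For anyone who prefers to set this in the daemon configuration file
rather than on the command line, here is a minimal sketch. It assumes
the daemon.json key has the same name as the command-line flag
("seccomp-profile"). The helper name and the profile path are
hypothetical, purely for illustration.

```python
import json


def with_seccomp_profile(daemon_config: dict, profile_path: str) -> dict:
    """Return a copy of a daemon.json-style dict naming a default seccomp profile."""
    updated = dict(daemon_config)
    # Assumption: the daemon.json key mirrors the --seccomp-profile flag name.
    updated["seccomp-profile"] = profile_path
    return updated


if __name__ == "__main__":
    # Hypothetical path; point this at wherever the custom profile lives.
    config = with_seccomp_profile({"debug": False}, "/etc/docker/seccomp-clone3.json")
    print(json.dumps(config, indent=2))
```

The daemon would need a restart to pick up the change, and the same
caveat applies: this only helps if you control the dockerd instance.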

> Daniel P. Berrangé reported that Moby mentions a system call in its
> policy whose number is larger than clone3, effectively turning ENOSYS
> into EPERM for clone3.  Looking at the recent change, it could be the
> addition of close_range and epoll_pwait2 in this commit:

I'm not 100% convinced my understanding is right, as there are quite
a few moving parts involved. Weirdly, I managed to get things working
with the existing docker 20.10.7 simply by using the seccomp profile
from docker git master, which runs counter to my understanding
described above. The heuristics runc uses to choose between EPERM and
ENOSYS are very hard to understand, and it is hard to rationalize the
resulting behaviour :-(
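
To make the rule as Aleksa described it concrete, here is a simplified
model. It is not runc's actual code. It only encodes "an unlisted
syscall whose number is larger than every syscall named in the filter
gets ENOSYS, any other unlisted syscall gets EPERM". The function name
is hypothetical, and the syscall numbers are the x86_64 ones.

```python
import errno

# x86_64 syscall numbers, used only to illustrate the ordering.
PIDFD_OPEN = 434
CLONE3 = 435
CLOSE_RANGE = 436
EPOLL_PWAIT2 = 441


def errno_for_unlisted(syscall_nr: int, listed_nrs: set) -> int:
    """Errno an unlisted syscall would see under the simplified rule."""
    # Assumption: an empty filter is treated as "everything is newer".
    highest_listed = max(listed_nrs, default=-1)
    if syscall_nr > highest_listed:
        # Newer than anything the profile author knew about.
        return errno.ENOSYS
    # Older than some listed syscall, so treated as an intentional omission.
    return errno.EPERM


if __name__ == "__main__":
    old_profile = {PIDFD_OPEN}
    new_profile = {PIDFD_OPEN, CLOSE_RANGE, EPOLL_PWAIT2}
    print(errno.errorcode[errno_for_unlisted(CLONE3, old_profile)])
    print(errno.errorcode[errno_for_unlisted(CLONE3, new_profile)])
```

Under this model, a profile that names close_range or epoll_pwait2 but
not clone3 puts clone3 on the EPERM side of the line, which is the
theory quoted above. It does not explain what I actually observed with
the git master profile.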

To try to make it simpler I sent a pull request to explicitly list
clone3 with ENOSYS, so that it's not subject to the weird heuristics
in runc:

   https://github.com/moby/moby/pull/42681
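
As a rough illustration of the idea (not the contents of the pull
request itself), the sketch below adds an explicit clone3 rule to a
profile held as a Python dict. It assumes the Moby JSON profile layout,
with a "syscalls" list whose entries carry "names", "action" and
"errnoRet". The helper name is hypothetical.

```python
import copy

ENOSYS_LINUX = 38  # value of ENOSYS on Linux


def add_clone3_enosys(profile: dict) -> dict:
    """Return a copy of the profile with an explicit clone3 -> ENOSYS rule."""
    updated = copy.deepcopy(profile)
    rules = updated.setdefault("syscalls", [])
    # Leave the profile alone if some rule already names clone3.
    if any("clone3" in rule.get("names", []) for rule in rules):
        return updated
    rules.append({
        "names": ["clone3"],
        "action": "SCMP_ACT_ERRNO",
        "errnoRet": ENOSYS_LINUX,
    })
    return updated


if __name__ == "__main__":
    base = {"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []}
    print(add_clone3_enosys(base))
```

With an explicit rule like this, the result for clone3 no longer
depends on which other syscalls happen to be listed in the profile.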

> commit 54eff4354b17a9c460b851300f28aed1408a8615
> Author: Aleksa Sarai <asarai@suse.de>
> Date:   Sun Jan 17 23:39:31 2021 +1100
> 
>     profiles: seccomp: update to Linux 5.11 syscall list
>     
>     These syscalls (some of which have been in Linux for a while but were
>     missing from the profile) fall into a few buckets:
>     
>      * close_range(2), epoll_pwait2(2) are just extensions of existing "safe
>        for everyone" syscalls.
>     
>      * The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are
>        all equivalent to aspects of mount(2) and thus go into the
>        CAP_SYS_ADMIN category.
>     
>      * process_madvise(2) is similar to the other process_*(2) syscalls and
>        thus goes in the CAP_SYS_PTRACE category.
>     
>     Signed-off-by: Aleksa Sarai <asarai@suse.de>
> 
> Maybe we don't see this everywhere because these higher system call
> numbers become available only if the system libseccomp version is recent
> enough to know about them.  Once that is the case, the ENOSYS/EPERM line
> shifts and clone3 is on the wrong side of it.
> 
> If that's indeed the explanation, then maybe we can simply fix moby and
> ask Microsoft to respin their images?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Thread overview: 17+ messages
2021-07-27  8:43 RFC: Disable clone3 for glibc 2.34 Florian Weimer via Libc-alpha
2021-07-27  9:11 ` Florian Weimer via Libc-alpha
2021-07-27  9:24   ` Christian Brauner
2021-07-27  9:41     ` Christian Brauner
2021-07-27 10:22       ` Aleksa Sarai
2021-07-27 10:48         ` Szabolcs Nagy via Libc-alpha
2021-07-29  8:56           ` Aleksa Sarai
2021-07-29 10:50             ` Florian Weimer via Libc-alpha
2021-07-30 12:16               ` Aleksa Sarai
2021-07-29 11:38             ` Szabolcs Nagy via Libc-alpha
2021-07-30 15:08               ` Aleksa Sarai
2021-07-28 17:44         ` Florian Weimer via Libc-alpha
2021-07-29  8:36           ` Daniel P. Berrangé via Libc-alpha [this message]
2021-07-27 23:07 ` Andreas K. Huettel via Libc-alpha
2021-07-28  4:58   ` Florian Weimer via Libc-alpha
2021-07-28 17:22     ` [PATCH] Typo: Rename HAVE_CLONE3_WAPPER to HAVE_CLONE3_WRAPPER H.J. Lu via Libc-alpha
2021-07-28 17:35       ` Adhemerval Zanella via Libc-alpha
