unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Florian Weimer via Libc-alpha" <libc-alpha@sourceware.org>
Subject: Re: RFC: Disable clone3 for glibc 2.34
Date: Wed, 28 Jul 2021 19:44:03 +0200	[thread overview]
Message-ID: <871r7i8hb0.fsf@oldenburg.str.redhat.com> (raw)
In-Reply-To: <20210727102222.r2hys526mfkpt4xo@senku> (Aleksa Sarai's message of "Tue, 27 Jul 2021 20:22:22 +1000")

* Aleksa Sarai:

> Yes, runc has had the -ENOSYS fallback behaviour for a few releases now.
>
> The way it works is that any syscall which has a larger syscall number
> than any syscall specified in the filter will get -ENOSYS (this works
> even if libseccomp is outdated). The only way you could get the -EPERM
> behaviour with modern runc is if you write a seccomp profile that had
> rules for newer syscalls (openat2 for instance) but not clone3 -- but
> Docker doesn't do that. (The reason for this slightly convoluted
> behaviour was to make sure that intentional omissions actually give you
> -EPERM.)
>
> However this requires the container host to have an updated version of
> runc which is up to GitHub. (Though we fixed a security issue in runc
> recently, so I would expect that they've updated their versions of runc
> by now.)

Indeed I wasn't able to reproduce this locally.  Ubuntu's docker.io
package behaves as expected, even for “docker build” as far as I can
see.

So far, the reported breakage has been focused on Github Actions and
Azure Devops.  They use a custom Docker-Moby build, and I don't know
what's in it.  The net effect is that clone3 does not work in containers
by default.  “docker build” still does not allow “--security-opt
seccomp=unconfined” for unknown reasons, but that workaround still
applies to “docker create”.

Daniel P. Berrangé reported that Moby mentions a system call in its
policy whose number is larger than clone3, effectively turning ENOSYS
into ENOPERM for clone3.  Looking at the recent change, it could be the
addition of close_range and epoll_pwait2 in this commit:

commit 54eff4354b17a9c460b851300f28aed1408a8615
Author: Aleksa Sarai <asarai@suse.de>
Date:   Sun Jan 17 23:39:31 2021 +1100

    profiles: seccomp: update to Linux 5.11 syscall list
    
    These syscalls (some of which have been in Linux for a while but were
    missing from the profile) fall into a few buckets:
    
     * close_range(2), epoll_pwait2(2) are just extensions of existing "safe
       for everyone" syscalls.
    
     * The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are
       all equivalent to aspects of mount(2) and thus go into the
       CAP_SYS_ADMIN category.
    
     * process_madvise(2) is similar to the other process_*(2) syscalls and
       thus goes in the CAP_SYS_PTRACE category.
    
    Signed-off-by: Aleksa Sarai <asarai@suse.de>

Maybe we don't see this everywhere because these higher system call
numbers become available only if the system libseccomp version is recent
enough to know about them.  Once that is the case, the ENOSYS/EPERM line
shifts and clone3 is on the wrong side of it.

If that's indeed the explanation, then maybe we can simply fix moby and
ask Microsoft to respin their images?

Thanks,
Florian


  parent reply	other threads:[~2021-07-28 17:44 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27  8:43 RFC: Disable clone3 for glibc 2.34 Florian Weimer via Libc-alpha
2021-07-27  9:11 ` Florian Weimer via Libc-alpha
2021-07-27  9:24   ` Christian Brauner
2021-07-27  9:41     ` Christian Brauner
2021-07-27 10:22       ` Aleksa Sarai
2021-07-27 10:48         ` Szabolcs Nagy via Libc-alpha
2021-07-29  8:56           ` Aleksa Sarai
2021-07-29 10:50             ` Florian Weimer via Libc-alpha
2021-07-30 12:16               ` Aleksa Sarai
2021-07-29 11:38             ` Szabolcs Nagy via Libc-alpha
2021-07-30 15:08               ` Aleksa Sarai
2021-07-28 17:44         ` Florian Weimer via Libc-alpha [this message]
2021-07-29  8:36           ` Daniel P. Berrangé via Libc-alpha
2021-07-27 23:07 ` Andreas K. Huettel via Libc-alpha
2021-07-28  4:58   ` Florian Weimer via Libc-alpha
2021-07-28 17:22     ` [PATCH] Typo: Rename HAVE_CLONE3_WAPPER to HAVE_CLONE3_WRAPPER H.J. Lu via Libc-alpha
2021-07-28 17:35       ` Adhemerval Zanella via Libc-alpha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871r7i8hb0.fsf@oldenburg.str.redhat.com \
    --to=libc-alpha@sourceware.org \
    --cc=berrange@redhat.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=cyphar@cyphar.com \
    --cc=fweimer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).