From: Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
"Christian Brauner" <christian.brauner@ubuntu.com>,
"Florian Weimer via Libc-alpha" <libc-alpha@sourceware.org>
Subject: Re: RFC: Disable clone3 for glibc 2.34
Date: Wed, 28 Jul 2021 19:44:03 +0200
Message-ID: <871r7i8hb0.fsf@oldenburg.str.redhat.com>
In-Reply-To: <20210727102222.r2hys526mfkpt4xo@senku> (Aleksa Sarai's message of "Tue, 27 Jul 2021 20:22:22 +1000")
* Aleksa Sarai:
> Yes, runc has had the -ENOSYS fallback behaviour for a few releases now.
>
> The way it works is that any syscall which has a larger syscall number
> than any syscall specified in the filter will get -ENOSYS (this works
> even if libseccomp is outdated). The only way you could get the -EPERM
> behaviour with modern runc is if you write a seccomp profile that had
> rules for newer syscalls (openat2 for instance) but not clone3 -- but
> Docker doesn't do that. (The reason for this slightly convoluted
> behaviour was to make sure that intentional omissions actually give you
> -EPERM.)
>
> However this requires the container host to have an updated version of
> runc which is up to GitHub. (Though we fixed a security issue in runc
> recently, so I would expect that they've updated their versions of runc
> by now.)

Indeed I wasn't able to reproduce this locally. Ubuntu's docker.io
package behaves as expected, even for “docker build” as far as I can
see.

So far, the reported breakage has been focused on GitHub Actions and
Azure DevOps. They use a custom Docker-Moby build, and I don't know
what's in it. The net effect is that clone3 does not work in containers
by default. “docker build” still does not allow “--security-opt
seccomp=unconfined” for unknown reasons, but that workaround still
applies to “docker create”.

Daniel P. Berrangé reported that Moby mentions a system call in its
policy whose number is larger than clone3, effectively turning ENOSYS
into EPERM for clone3. Looking at the recent change, it could be the
addition of close_range and epoll_pwait2 in this commit:

  commit 54eff4354b17a9c460b851300f28aed1408a8615
  Author: Aleksa Sarai <asarai@suse.de>
  Date:   Sun Jan 17 23:39:31 2021 +1100

      profiles: seccomp: update to Linux 5.11 syscall list

      These syscalls (some of which have been in Linux for a while but were
      missing from the profile) fall into a few buckets:

       * close_range(2), epoll_pwait2(2) are just extensions of existing "safe
         for everyone" syscalls.

       * The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are
         all equivalent to aspects of mount(2) and thus go into the
         CAP_SYS_ADMIN category.

       * process_madvise(2) is similar to the other process_*(2) syscalls and
         thus goes in the CAP_SYS_PTRACE category.

      Signed-off-by: Aleksa Sarai <asarai@suse.de>

Maybe we don't see this everywhere because these higher system call
numbers become available only if the system libseccomp version is recent
enough to know about them. Once that is the case, the ENOSYS/EPERM line
shifts and clone3 is on the wrong side of it.
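
To make the hypothesis concrete, here is a rough sketch of the rule as Aleksa described it, combined with the libseccomp effect suggested above. It is only a model, not runc's actual code: the function names, the three-entry profile, and the assumption that entries unknown to libseccomp are silently dropped are all made up for illustration. The syscall numbers are the x86-64 ones.

```python
import errno

# x86-64 syscall numbers for the calls discussed in this thread.
SYSCALL_NR = {
    "openat": 257,
    "clone3": 435,
    "close_range": 436,
    "epoll_pwait2": 441,
}

def resolve_profile(profile_names, known_to_libseccomp):
    """Assumption: profile entries the installed libseccomp cannot
    resolve to a number never make it into the filter."""
    return {SYSCALL_NR[n] for n in profile_names if n in known_to_libseccomp}

def verdict(name, allowed_nrs):
    """Return 0 if the syscall is allowed, otherwise the errno the
    simplified rule yields: ENOSYS above the highest number named in
    the filter, EPERM for gaps below it."""
    nr = SYSCALL_NR[name]
    if nr in allowed_nrs:
        return 0
    if nr > max(allowed_nrs):
        return errno.ENOSYS
    return errno.EPERM

# Illustrative profile only; it allows the two new syscalls but not clone3.
PROFILE = ["openat", "close_range", "epoll_pwait2"]

old_filter = resolve_profile(PROFILE, {"openat"})
new_filter = resolve_profile(PROFILE, {"openat", "close_range", "epoll_pwait2"})

if __name__ == "__main__":
    for label, allowed in (("old libseccomp", old_filter),
                           ("new libseccomp", new_filter)):
        print(label, errno.errorcode[verdict("clone3", allowed)])
```

In this model the same profile gives ENOSYS for clone3 when libseccomp only knows openat, and EPERM once it also knows close_range and epoll_pwait2, because the highest number in the filter moves from 257 to 441 while clone3 stays at 435.
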
If that's indeed the explanation, then maybe we can simply fix Moby and
ask Microsoft to respin their images?
Thanks,
Florian