Date: Tue, 27 Jul 2021 20:22:22 +1000
From: Aleksa Sarai
To: Christian Brauner
Cc: Florian Weimer, Florian Weimer via Libc-alpha
Subject: Re: RFC: Disable clone3 for glibc 2.34
Message-ID: <20210727102222.r2hys526mfkpt4xo@senku>
References: <87eebkf8ph.fsf@oldenburg.str.redhat.com>
 <87y29sdsui.fsf@oldenburg.str.redhat.com>
 <20210727092416.layfgqi6auudbpgc@wittgenstein>
 <20210727094117.jid7shl7futsciih@wittgenstein>
In-Reply-To: <20210727094117.jid7shl7futsciih@wittgenstein>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="45dko3czl55cqzoo"
Content-Disposition: inline
List-Id: Libc-alpha mailing list

--45dko3czl55cqzoo
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On 2021-07-27, Christian Brauner wrote:
> On Tue, Jul 27, 2021 at 11:24:16AM +0200, Christian Brauner wrote:
> > On Tue, Jul 27, 2021 at 11:11:17AM +0200, Florian Weimer via Libc-alpha wrote:
> > > * Florian Weimer via Libc-alpha:
> > >
> > > > Reportedly, the docker package in Ubuntu as used by Github Actions and
> > > > others does not provide a way to enable the clone3 system call.  It
> > > > always fails with EPERM.
> > > >
> > > > Should we apply a patch like this for the release?
> > > >
> > > > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> > > > index 1e7a8f6b35..4046c81180 100644
> > > > --- a/sysdeps/unix/sysv/linux/clone-internal.c
> > > > +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> > > > @@ -48,17 +48,6 @@ __clone_internal (struct clone_args *cl_args,
> > > >  		  int (*func) (void *arg), void *arg)
> > > >  {
> > > >    int ret;
> > > > -#ifdef HAVE_CLONE3_WAPPER
> > > > -  /* Try clone3 first.  */
> > > > -  int saved_errno = errno;
> > > > -  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> > > > -  if (ret != -1 || errno != ENOSYS)
> > > > -    return ret;
> > > > -
> > > > -  /* NB: Restore errno since errno may be checked against non-zero
> > > > -     return value.  */
> > > > -  __set_errno (saved_errno);
> > > > -#endif
> > > >
> > > >    /* Map clone3 arguments to clone arguments.  NB: No need to check
> > > >       invalid clone3 specific bits in flags nor exit_signal since this
> > > >
> > > > My concern with this is that we don't know yet where the CET kernel API
> > > > will land exactly and if CET will require clone3.  So clone3 might have
> > > > to come back once we turn on CET, which is hopefully soon.
> > >
> > > Ubuntu 20.04 LTS may have already been fixed, I cannot reproduce the
> > > issue with its docker.io/containerd/runc packages.
> > >
> > > I could trivially fix a previously failing Github Action with:
> > >
> > > diff --git a/.github/workflows/fedora.yml b/.github/workflows/fedora.yml
> > > index d2381ec..7b10286 100644
> > > --- a/.github/workflows/fedora.yml
> > > +++ b/.github/workflows/fedora.yml
> > > @@ -22,6 +22,7 @@ jobs:
> > >      runs-on: ubuntu-latest
> > >      container:
> > >        image: fedora:${{matrix.release}}
> > > +      options: --security-opt seccomp=unconfined
> > >
> > >      steps:
> > >      - name: Checkout repository
> > >
> > > So I think we need to figure out what people are actually complaining
> > > about.
> >
> > This relates to the discussion what errno value should be used in a
> > seccomp filter to indicate that a syscall is blocked.
> >
> > So there are two problems I see with seccomp and clone3():
> > 1. the profile doesn't include clone3() at all and therefore the syscall
> >    is blocked and the default action is EPERM
> > 2. the profile does include clone3() and decided to block it but the
> >    runtime has decided to make seccomp return EPERM and not ENOSYS when
> >    clone3() is attempted
> >
> > The correct fix in both scenarios is to add clone3() to the seccomp
> > profile and either allow it or return ENOSYS.
> >
> > Note that this ENOSYS/EPERM problem is a general problem. Not just glibc
> > doesn't know when to fallback gracefully other tools don't know either.
> > Application container usually just get lucky because their applications
> > don't need to issue the syscalls that are blocked. On a generic system
> > container with systemd inside this is always an issue and not using
> > ENOSYS is guaranteed to fail across the board.
>
> Aleksa, this is fixed in runC, right?

Yes, runc has had the -ENOSYS fallback behaviour for a few releases now.
The way it works is that any syscall which has a larger syscall number
than any syscall specified in the filter will get -ENOSYS (this works
even if libseccomp is outdated).

The only way you could get the -EPERM behaviour with modern runc is if
you write a seccomp profile that has rules for newer syscalls (openat2
for instance) but not clone3 -- but Docker doesn't do that. (The reason
for this slightly convoluted behaviour was to make sure that intentional
omissions actually give you -EPERM.)

However, this requires the container host to have an updated version of
runc, which is up to GitHub. (Though we fixed a security issue in runc
recently, so I would expect that they've updated their versions of runc
by now.)
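
To make the rule above concrete, here is a minimal sketch in C of the
decision as described in this mail. It is a hypothetical model, not
runc's actual code: the function name filter_errno, the flat array of
rule numbers and the simplification that every listed syscall is allowed
are all assumptions made for illustration. The syscall numbers are the
x86-64 ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative x86-64 syscall numbers; a real filter would take them
   from <sys/syscall.h> or libseccomp.  */
#define NR_READ     0
#define NR_WRITE    1
#define NR_CLONE3   435
#define NR_OPENAT2  437

/* Hypothetical model, not runc's code.  RULES lists the syscall numbers
   the profile mentions (all treated as allowed here).  Returns 0 if NR
   is allowed, otherwise the errno the filter would report.  */
static int
filter_errno (const long *rules, size_t n, long nr)
{
  long newest = -1;

  assert (rules != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (rules[i] == nr)
        return 0;               /* Explicit rule: allow.  */
      if (rules[i] > newest)
        newest = rules[i];
    }

  /* Newer than anything the profile mentions: assume the profile
     predates the syscall and report it as unimplemented.  */
  if (nr > newest)
    return ENOSYS;

  /* Not newer than the newest rule: treat the omission as intentional.  */
  return EPERM;
}
```

With a profile that only lists read and write, clone3 comes back as
ENOSYS, so a caller such as glibc can fall back to clone. With a profile
that also lists openat2 but not clone3, clone3 comes back as EPERM,
which is the "intentional omission" case.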

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH

--45dko3czl55cqzoo
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iHUEABYIAB0WIQSxZm6dtfE8gxLLfYqdlLljIbnQEgUCYP/eWwAKCRCdlLljIbnQ
EtZsAQCVsNxFbXywYlOTqqH2RP0HJZanwyTQPXCTTD49Z6QPegEAqLX8yEpy+B9e
1FXBvxT8eixkoHDw6hc+y8sJKmUYrQk=
=QzBv
-----END PGP SIGNATURE-----

--45dko3czl55cqzoo--