Date: Tue, 27 Jul 2021 11:24:16 +0200
From: Christian Brauner
To: Florian Weimer
Subject: Re: RFC: Disable clone3 for glibc 2.34
Message-ID: <20210727092416.layfgqi6auudbpgc@wittgenstein>
References: <87eebkf8ph.fsf@oldenburg.str.redhat.com> <87y29sdsui.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87y29sdsui.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list
Cc: Florian Weimer via Libc-alpha

On Tue, Jul 27, 2021 at 11:11:17AM +0200, Florian Weimer via Libc-alpha wrote:
> * Florian Weimer via Libc-alpha:
>
> > Reportedly, the docker package in Ubuntu as used by Github Actions and
> > others does not provide a way to enable the clone3 system call. It
> > always fails with EPERM.
> >
> > Should we apply a patch like this for the release?
> >
> > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> > index 1e7a8f6b35..4046c81180 100644
> > --- a/sysdeps/unix/sysv/linux/clone-internal.c
> > +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> > @@ -48,17 +48,6 @@ __clone_internal (struct clone_args *cl_args,
> >                    int (*func) (void *arg), void *arg)
> >  {
> >    int ret;
> > -#ifdef HAVE_CLONE3_WAPPER
> > -  /* Try clone3 first. */
> > -  int saved_errno = errno;
> > -  ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> > -  if (ret != -1 || errno != ENOSYS)
> > -    return ret;
> > -
> > -  /* NB: Restore errno since errno may be checked against non-zero
> > -     return value. */
> > -  __set_errno (saved_errno);
> > -#endif
> >
> >    /* Map clone3 arguments to clone arguments. NB: No need to check
> >       invalid clone3 specific bits in flags nor exit_signal since this
> >
> > My concern with this is that we don't know yet where the CET kernel API
> > will land exactly and if CET will require clone3. So clone3 might have
> > to come back once we turn on CET, which is hopefully soon.
>
> Ubuntu 20.04 LTS may have already been fixed, I cannot reproduce the
> issue with its docker.io/containerd/runc packages.
>
> I could trivially fix a previously failing Github Action with:
>
> diff --git a/.github/workflows/fedora.yml b/.github/workflows/fedora.yml
> index d2381ec..7b10286 100644
> --- a/.github/workflows/fedora.yml
> +++ b/.github/workflows/fedora.yml
> @@ -22,6 +22,7 @@ jobs:
>      runs-on: ubuntu-latest
>      container:
>        image: fedora:${{matrix.release}}
> +      options: --security-opt seccomp=unconfined
>
>      steps:
>        - name: Checkout repository
>
> So I think we need to figure out what people are actually complaining
> about.

This relates to the discussion of which errno value should be used in a
seccomp filter to indicate that a syscall is blocked.

So there are two problems I see with seccomp and clone3():
1. the profile doesn't include clone3() at all, so the syscall is
   blocked and the default action is EPERM
2. the profile does include clone3() and blocks it, but the runtime has
   decided to make seccomp return EPERM and not ENOSYS when clone3() is
   attempted

The correct fix in both scenarios is to add clone3() to the seccomp
profile and either allow it or return ENOSYS.

Note that this ENOSYS/EPERM problem is a general one. It's not just
glibc that doesn't know when to fall back gracefully; other tools don't
know either. Application containers usually just get lucky because their
applications don't need to issue the syscalls that are blocked. On a
generic system container with systemd inside this is always an issue,
and not using ENOSYS is guaranteed to fail across the board.

Christian
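
To make the ENOSYS/EPERM difference concrete, here is a minimal sketch
of the fallback decision. It is not glibc code: the enums and both
functions are hypothetical names invented for illustration, and the
clone3 call is simulated rather than issued. The only part taken from
the thread is the test from the quoted hunk, where a result other than
-1 with ENOSYS is handed straight back to the caller.

```c
#include <errno.h>

/* Hypothetical model, not glibc code.  All names are invented for
   illustration.  */

/* What a container runtime's seccomp filter does with clone3.  */
enum filter_action
{
  FILTER_ALLOW,         /* clone3 reaches the kernel */
  FILTER_ERRNO_EPERM,   /* blocked, reported as EPERM */
  FILTER_ERRNO_ENOSYS   /* blocked, reported as ENOSYS */
};

/* What the caller ends up doing.  */
enum outcome
{
  USED_CLONE3,
  FELL_BACK_TO_CLONE,
  FAILED
};

/* Stand-in for the clone3 call: returns a fake pid when allowed,
   otherwise -1 with *err set to the errno the filter injects.  */
static long
simulated_clone3 (enum filter_action action, int *err)
{
  switch (action)
    {
    case FILTER_ALLOW:
      return 1234;
    case FILTER_ERRNO_EPERM:
      *err = EPERM;
      return -1;
    default:
      *err = ENOSYS;
      return -1;
    }
}

/* Same test as the quoted hunk: only -1 with ENOSYS reaches the
   clone path; every other result goes back to the caller.  */
static enum outcome
clone_with_fallback (enum filter_action action)
{
  int err = 0;
  long ret = simulated_clone3 (action, &err);

  if (ret != -1)
    return USED_CLONE3;
  if (err != ENOSYS)
    return FAILED;
  return FELL_BACK_TO_CLONE;
}
```

Under this model, clone_with_fallback (FILTER_ERRNO_EPERM) yields FAILED
while clone_with_fallback (FILTER_ERRNO_ENOSYS) yields
FELL_BACK_TO_CLONE, which is why a filter that reports ENOSYS lets the
caller degrade gracefully and one that reports EPERM does not.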