From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 29 Jul 2021 09:36:55 +0100
From: Daniel P. Berrangé via Libc-alpha
Reply-To: Daniel P. Berrangé
To: Florian Weimer
Cc: Aleksa Sarai, Florian Weimer via Libc-alpha, Christian Brauner
Subject: Re: RFC: Disable clone3 for glibc 2.34
References: <87eebkf8ph.fsf@oldenburg.str.redhat.com>
 <87y29sdsui.fsf@oldenburg.str.redhat.com>
 <20210727092416.layfgqi6auudbpgc@wittgenstein>
 <20210727094117.jid7shl7futsciih@wittgenstein>
 <20210727102222.r2hys526mfkpt4xo@senku>
 <871r7i8hb0.fsf@oldenburg.str.redhat.com>
In-Reply-To: <871r7i8hb0.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list

On Wed, Jul 28, 2021 at 07:44:03PM +0200, Florian Weimer wrote:
> * Aleksa Sarai:
>
> > Yes, runc has had the -ENOSYS fallback behaviour for a few releases now.
> >
> > The way it works is that any syscall which has a larger syscall number
> > than any syscall specified in the filter will get -ENOSYS (this works
> > even if libseccomp is outdated).
> > The only way you could get the -EPERM
> > behaviour with modern runc is if you write a seccomp profile that had
> > rules for newer syscalls (openat2 for instance) but not clone3 -- but
> > Docker doesn't do that. (The reason for this slightly convoluted
> > behaviour was to make sure that intentional omissions actually give you
> > -EPERM.)
> >
> > However this requires the container host to have an updated version of
> > runc which is up to GitHub. (Though we fixed a security issue in runc
> > recently, so I would expect that they've updated their versions of runc
> > by now.)
>
> Indeed I wasn't able to reproduce this locally. Ubuntu's docker.io
> package behaves as expected, even for “docker build” as far as I can
> see.
>
> So far, the reported breakage has been focused on GitHub Actions and
> Azure DevOps. They use a custom Docker-Moby build, and I don't know
> what's in it. The net effect is that clone3 does not work in containers
> by default. “docker build” still does not allow “--security-opt
> seccomp=unconfined” for unknown reasons, but that workaround still
> applies to “docker create”.

FYI I found this issue describing the docker build + seccomp feature gap.
It seems it was an intentional decision not to allow it to be used:

  https://github.com/moby/moby/issues/34454

To work around this gap in "docker build", you can pass the
--seccomp-profile option to the dockerd daemon when starting it up, to
supply a new profile that applies by default to everything. That doesn't
help if you're just using a dockerd instance started/managed by
someone/something else though.

> Daniel P. Berrangé reported that Moby mentions a system call in its
> policy whose number is larger than clone3, effectively turning ENOSYS
> into EPERM for clone3. Looking at the recent change, it could be the
> addition of close_range and epoll_pwait2 in this commit:

I'm not 100% convinced my understanding is right, as there are quite a
few moving parts involved.
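
As a rough illustration of the heuristic as Aleksa describes it in the
quoted text, here is a minimal Python sketch. It is a simplified model and
not runc's actual code. The function name, the example profiles and the
"libseccomp_knows" parameter are hypothetical; the parameter models the
guess from the quoted text that syscall names unknown to the installed
libseccomp drop out of the filter. The syscall numbers are the x86_64 ones.

```python
import errno

# x86_64 syscall numbers for the few syscalls used in this sketch.
SYSCALL_NR = {
    "read": 0,
    "write": 1,
    "openat": 257,
    "pidfd_open": 434,
    "clone3": 435,
    "close_range": 436,
    "openat2": 437,
    "epoll_pwait2": 441,
}


def predicted_errno(profile, syscall, libseccomp_knows=None):
    """Return 0 if `syscall` is allowed, else the errno this model predicts.

    Simplified model of the behaviour described in the thread:
      * a syscall named in the profile is allowed;
      * an unlisted syscall numbered above every listed one gets ENOSYS;
      * any other unlisted syscall gets EPERM.
    """
    names = set(profile)
    # Assumption: names the installed libseccomp does not know are ignored,
    # so they cannot raise the "highest listed syscall" threshold.
    if libseccomp_knows is not None:
        names &= set(libseccomp_knows)
    if syscall in names:
        return 0
    # With an empty filter every syscall is treated as "too new" here.
    highest = max((SYSCALL_NR[n] for n in names), default=-1)
    if SYSCALL_NR[syscall] > highest:
        return errno.ENOSYS
    return errno.EPERM


# Hypothetical profiles, not the real Moby defaults.
old_profile = ["read", "write", "openat", "pidfd_open"]
new_profile = old_profile + ["close_range", "epoll_pwait2"]
# A hypothetical older libseccomp that predates close_range/epoll_pwait2.
old_libseccomp = ["read", "write", "openat", "pidfd_open", "clone3"]

for label, rc in [
    ("old profile", predicted_errno(old_profile, "clone3")),
    ("new profile", predicted_errno(new_profile, "clone3")),
    ("new profile, old libseccomp",
     predicted_errno(new_profile, "clone3", old_libseccomp)),
]:
    print(label, "->", errno.errorcode[rc])
```

In this model clone3 gets ENOSYS under the old profile, EPERM once the
profile lists higher-numbered syscalls, and ENOSYS again when the older
libseccomp does not know those names. That is only the behaviour the
description predicts, and real runc may well differ.
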
Weirdly, I managed to get things working with the existing docker 20.10.7
simply by using the seccomp profile from docker git master, which runs
counter to my understanding described above. The heuristics runc uses to
choose between EPERM and ENOSYS are very hard to understand, and the
resulting behaviour is hard to rationalize :-(

To try to make it simpler, I sent a pull request to explicitly list clone3
with ENOSYS, so that it's not subject to the weird heuristics in runc:

  https://github.com/moby/moby/pull/42681

> commit 54eff4354b17a9c460b851300f28aed1408a8615
> Author: Aleksa Sarai
> Date:   Sun Jan 17 23:39:31 2021 +1100
>
>     profiles: seccomp: update to Linux 5.11 syscall list
>
>     These syscalls (some of which have been in Linux for a while but were
>     missing from the profile) fall into a few buckets:
>
>      * close_range(2), epoll_pwait2(2) are just extensions of existing "safe
>        for everyone" syscalls.
>
>      * The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are
>        all equivalent to aspects of mount(2) and thus go into the
>        CAP_SYS_ADMIN category.
>
>      * process_madvise(2) is similar to the other process_*(2) syscalls and
>        thus goes in the CAP_SYS_PTRACE category.
>
>     Signed-off-by: Aleksa Sarai
>
> Maybe we don't see this everywhere because these higher system call
> numbers become available only if the system libseccomp version is recent
> enough to know about them. Once that is the case, the ENOSYS/EPERM line
> shifts and clone3 is on the wrong side of it.
>
> If that's indeed the explanation, then maybe we can simply fix moby and
> ask Microsoft to respin their images?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|