From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS17314 8.43.84.0/22 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RDNS_DYNAMIC,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 510111F5AE for ; Mon, 10 May 2021 08:17:23 +0000 (UTC) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 211BA3959E51; Mon, 10 May 2021 08:17:22 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 211BA3959E51 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1620634642; bh=zHvQqorlu36Zjl5b2Rpi57anQotPPVUxWB5/JOMyFIE=; h=To:Subject:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=h0Qwy04mC4PB8GhLV8SlXeUP3qzVfbXxwpoJJdC6II/ybbPQ9oybDFhxi8Cjhd7Ea vQT1VVEk1aaM7S6jKbOkVsf2801dZwUSeYhf4Fhnzf+TO6YIZtivSZhU59czW67xMD glDD9m8/UGTtgYbzdPci9PkHn/K/mmf351PwvRDk= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 35CCA3959E51 for ; Mon, 10 May 2021 08:17:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 35CCA3959E51 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-466-2N7usL87OKqWZQ76_aVn4w-1; Mon, 10 May 2021 04:17:15 -0400 X-MC-Unique: 2N7usL87OKqWZQ76_aVn4w-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 64DA31883520; Mon, 10 May 2021 08:17:14 +0000 (UTC) Received: from oldenburg.str.redhat.com (ovpn-112-137.ams2.redhat.com [10.36.112.137]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B4D5E380; Mon, 10 May 2021 08:17:12 +0000 (UTC) To: "H.J. Lu via Libc-alpha" Subject: Re: [PATCH v3] Add an internal wrapper for clone, clone2 and clone3 References: <20210214224505.4448-1-hjl.tools@gmail.com> <87r1lhehkw.fsf@oldenburg.str.redhat.com> <7e132276-269f-b9c6-72fe-9d771e023d29@linaro.org> <20210305083227.hrg5ja5cdzrs3yo3@wittgenstein> Date: Mon, 10 May 2021 10:17:38 +0200 In-Reply-To: (H. J. Lu via Libc-alpha's message of "Fri, 5 Mar 2021 08:59:39 -0800") Message-ID: <87y2cnf1el.fsf@oldenburg.str.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Florian Weimer via Libc-alpha Reply-To: Florian Weimer Cc: Christian Brauner Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" * H. J. Lu via Libc-alpha: > diff --git a/include/sched.h b/include/sched.h > index b0bf971c93..28dd99a571 100644 > --- a/include/sched.h > +++ b/include/sched.h > @@ -32,5 +32,8 @@ libc_hidden_proto (__clone2) > and Hurd doesn't have it. */ > extern int __getcpu (unsigned int *, unsigned int *); > libc_hidden_proto (__getcpu) > + > +# include > +libc_hidden_proto (__clone_internal) > #endif > #endif I think the libc_hidden_proto should be in below, and including the header here shouldn't be necessary. > diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c > index 149b999603..73c7f33a00 100644 > --- a/nptl/allocatestack.c > +++ b/nptl/allocatestack.c > @@ -34,47 +34,6 @@ > #include > > > -#ifndef NEED_SEPARATE_REGISTER_STACK This conflicts with my libpthread removal work. Maybe it is possible to do these changes in a separate refactoring? > diff --git a/sysdeps/generic/clone-internal.h b/sysdeps/generic/clone-internal.h > new file mode 100644 > index 0000000000..1c1f997ca5 > --- /dev/null > +++ b/sysdeps/generic/clone-internal.h This should be in sysdeps/unix/sysv/linux because it is Linux-specific. I would call it (still as an internal-only header of course). > @@ -0,0 +1,49 @@ > +/* The internal wrapper of clone and clone3. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library. If not, see > + . */ > + > +#include > +#include > + > +struct clone_args > +{ > + uint64_t flags; /* Flags bit mask. */ > + uint64_t pidfd; /* Where to store PID file descriptor > + (pid_t *). */ > + uint64_t child_tid; /* Where to store child TID, in child's memory > + (pid_t *). */ > + uint64_t parent_tid; /* Where to store child TID, in parent's memory > + (int *). */ > + uint64_t exit_signal; /* Signal to deliver to parent on child > + termination */ > + uint64_t stack; /* The lowest address of stack. */ > + uint64_t stack_size; /* Size of stack. */ > + uint64_t tls; /* Location of new TLS. */ > + uint64_t set_tid; /* Pointer to a pid_t array > + (since Linux 5.5). */ > + uint64_t set_tid_size; /* Number of elements in set_tid > + (since Linux 5.5). */ > + uint64_t cgroup; /* File descriptor for target cgroup > + of child (since Linux 5.7). */ > +}; I think this has to be surrounded by #ifdef CLONE_ARGS_SIZE_VER0 (after including ). This assumes we actually can include in the glibc build, which I haven't checked. But I think it is worth a try. If we can't use , it would be safer to rename that struct, in case kernel headers begin to define it. > +extern int __clone3 (struct clone_args *cl_args, size_t size, > + int (*__fn) (void *__arg), void *__arg) > + __attribute__ ((visibility ("hidden"))); Please use libc_hidden_proto (without the explicit visibility change). The internal __clone3 call should not have size argument because the size of struct clone_args is known during the build. Soon we won't need GLIBC_PRIVATE, so there is no risk of this backwards-incompatible interface leaking out to userspace. Likewise for __clone_internal. > diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c > new file mode 100644 > index 0000000000..fd9929b6df > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/clone-internal.c > @@ -0,0 +1,62 @@ > +#include > +#include > +#include /* For cast_to_pointer. */ > +#include /* For _STACK_GROWS_{UP,DOWN}. */ > + > +int > +__clone_internal (struct clone_args *cl_args, size_t size, > + int (*fn)(void *arg), void *arg) > +{ > + /* Try clone3 first. */ > + int ret = __clone3 (cl_args, size, fn, arg); > + if (ret == -1 && errno == ENOSYS) > + { > + /* NB: Must clear errno since errno may be checked against non-zero > + return value. */ > + __set_errno (0); Setting errno to zero doesn't look right. This should probably restore the original errno value. > + /* Map clone3 arguments to clone arguments. */ > + int flags = cl_args->flags | cl_args->exit_signal; > + void *stack = cast_to_pointer (cl_args->stack); I think you should check here if there are flags set outside of 0xff because the kernel won't. > +libc_hidden_def (__clone_internal) > diff --git a/sysdeps/unix/sysv/linux/clone-offsets.sym b/sysdeps/unix/sysv/linux/clone-offsets.sym > new file mode 100644 > index 0000000000..38d447bcf3 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/clone-offsets.sym > @@ -0,0 +1,9 @@ > +#include > +#include > + > +-- > + > +#define clone_args_offset(member) offsetof (struct clone_args, member) > + > +CLONE_ARGS_STACK_OFFSET clone_args_offset(stack) > +CLONE_ARGS_STACK_SIZE_OFFSET clone_args_offset(stack_size) Missing spaces before '('. > diff --git a/sysdeps/unix/sysv/linux/tst-align-clone.c b/sysdeps/unix/sysv/linux/tst-align-clone.c > index 6ace61bac3..5f00b8ee4a 100644 > --- a/sysdeps/unix/sysv/linux/tst-align-clone.c > +++ b/sysdeps/unix/sysv/linux/tst-align-clone.c > @@ -23,6 +23,7 @@ > #include > #include > #include > +#include > #include > > static int > @@ -49,21 +50,17 @@ do_test (void) > ok = false; > > #ifdef __ia64__ > - extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base, > - size_t __child_stack_size, int __flags, > - void *__arg, ...); > - char st[256 * 1024]; > - pid_t p = __clone2 (f, st, sizeof (st), 0, 0); > +# define STACK_SIZE 256 * 1024 > #else > - char st[128 * 1024] __attribute__ ((aligned)); > -# if _STACK_GROWS_DOWN > - pid_t p = clone (f, st + sizeof (st), 0, 0); > -# elif _STACK_GROWS_UP > - pid_t p = clone (f, st, 0, 0); > -# else > -# error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP" > -# endif > +# define STACK_SIZE 128 * 1024 > #endif > + char st[STACK_SIZE] __attribute__ ((aligned)); > + struct clone_args clone_args = > + { > + .stack = (uintptr_t) st, > + .stack_size = sizeof (st), > + }; > + pid_t p = __clone_internal (&clone_args, sizeof (clone_args), f, 0); > if (p == -1) > { > printf("clone failed: %m\n"); Please do not remvoe coverage of the clone/__clone2 interface from the test. Either test both here or add a new test. The same comment applies to the other tests as well. > diff --git a/sysdeps/unix/sysv/linux/x86_64/clone3.S b/sysdeps/unix/sysv/linux/x86_64/clone3.S > new file mode 100644 > index 0000000000..4d99b4b881 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/x86_64/clone3.S > @@ -0,0 +1,105 @@ > +/* The clone3 syscall wrapper. Linux/x86-64 version. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +/* clone3() is even more special than fork() as it mucks with stacks > + and invokes a function in the right context after its all over. */ > + > +#include > +#include > + > +/* The userland implementation is: > + int clone3 (struct clone_args *cl_args, size_t size, > + int (*fn)(void *arg), void *arg); > + the kernel entry is: > + int clone3 (struct clone_args *cl_args, size_t size); > + > + The parameters are passed in registers from userland: > + rdi: cl_args > + rsi: size > + rdx: fn > + rcx: arg > + > + The kernel expects: > + rax: system call number > + rdi: cl_args > + rsi: size */ > + > + .text > +ENTRY (__clone3) > + /* Sanity check arguments. */ > + movq $-EINVAL, %rax > + testq %rdi, %rdi /* No NULL cl_args pointer. */ > + jz SYSCALL_ERROR_LABEL > + testq %rdx, %rdx /* No NULL function pointer. */ > + jz SYSCALL_ERROR_LABEL I don't think we need the error checking for the internal usage. > + /* Load the new stack size into R9. */ > + movq CLONE_ARGS_STACK_SIZE_OFFSET(%rdi), %r9 > + > + /* Load the new stack bottom into R8. */ > + movq CLONE_ARGS_STACK_OFFSET(%rdi), %r8 > + > + /* Make room for 2 8-byte slots on the new stack. */ > + subq $16, %r9 > + > + /* Insert the argument onto the new stack. */ > + movq %rcx, 8(%r8, %r9) > + > + /* Save the function pointer. It will be popped off in the > + child. */ > + movq %rdx, (%r8, %r9) > + > + /* Update the stack size field passed to clone3. */ > + movq %r9, CLONE_ARGS_STACK_SIZE_OFFSET(%rdi) Why is this necessary? This is rather surprising. I think you should avoid modification here. Can you pass the data to thread_start in registers instead? Thanks, Florian