Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp1188835ybx; Thu, 31 Oct 2019 07:00:57 -0700 (PDT) X-Google-Smtp-Source: APXvYqxQGJ7lR2lFYFlwkTHNfMUZTsE0vJYuKtp02oeMRddZwT00MyHLpSZ369yLb9BJIi1Cucgc X-Received: by 2002:a17:906:d291:: with SMTP id ay17mr4076264ejb.85.1572530457173; Thu, 31 Oct 2019 07:00:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572530457; cv=none; d=google.com; s=arc-20160816; b=sMD9s2Pfq/d1olJx+0PmoVcXSvptqprTdtqgJLVl0WXw2Iqbr6CjPXT3uHp8Gua8RV 9DXevwTe3cv5rq1hkRsW+nlN3Tbo1lRF/JQQZ5V2BK7LHFmvuE+qPw0psShZ2vPY6/Du tzhIo/mOfbhmAfJa+r4et3PrWuEPYU/3FxnUSD/ZatCEF1pTPTANyIG3OHuEw0Z9tKdn QvgSObwv/o45aQpxsGxPApTGcBuEG8x7MfnyjMiFy+DyeGo9Yuk10tOOeIcmmdoipQTr WdeeG6padNahGhhlhoUo789iqqeW4BUucWk57HLg7W5zPh0uuEbToiiDt3PGglhsC+0x b6DQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=zZ2k2iLq/6DcjAD0PQDmtSkABeK1yhtkUJFv/oWC79s=; b=fBwyOhIPDg60IEngRg6tmEJaHG2JE96NUFzBaJ88ijHwl2f3BqODrPpBP45H5fkOz4 HuJLwoLzUZKcDvCmvSx0LM00/ANOdjUr1WzOcDDjH6SkNTl92asrXS7ndq58HyB85WT9 f6fPMRzzKpRGkqN8sD4402IwyoMiSYBJbYEhbpQ65mAdbb8vqvgU9TXc4O8UZAbWGW1x 5ArSNKj/i3hIfbFol5XqjR4d3cSXkibr11ouKZ9NoaqEscIHsmbhdO3whjtbl63l8zDj k2KAOcnvhS+Gh+zZXMQdET6VLebJ2ZCZVsKS0oUdjki+eyEYlQrctJ/BD3JqEWG5VVJP 2WFw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z3si3583913ejw.34.2019.10.31.07.00.33; Thu, 31 Oct 2019 07:00:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727841AbfJaN7n (ORCPT + 99 others); Thu, 31 Oct 2019 09:59:43 -0400 Received: from mout-p-101.mailbox.org ([80.241.56.151]:51592 "EHLO mout-p-101.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727771AbfJaN7m (ORCPT ); Thu, 31 Oct 2019 09:59:42 -0400 Received: from smtp1.mailbox.org (smtp1.mailbox.org [IPv6:2001:67c:2050:105:465:1:1:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-101.mailbox.org (Postfix) with ESMTPS id 473n4q2TVXzKmdp; Thu, 31 Oct 2019 14:59:39 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp1.mailbox.org ([80.241.60.240]) by gerste.heinlein-support.de (gerste.heinlein-support.de [91.198.250.173]) (amavisd-new, port 10030) with ESMTP id tXdZ0yfGJYAE; Thu, 31 Oct 2019 14:59:34 +0100 (CET) Date: Fri, 1 Nov 2019 00:59:21 +1100 From: Aleksa Sarai To: Christian Brauner Cc: linux-kernel@vger.kernel.org, Florian Weimer , GNU C Library , Arnd Bergmann , Kees Cook , Jann Horn , David Howells , Ingo Molnar , Oleg Nesterov , Linus Torvalds , Peter Zijlstra , linux-api@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH] clone3: validate stack arguments Message-ID: <20191031135921.wlcnhqvkq4d6azvg@yavin.dot.cyphar.com> References: <20191031113608.20713-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="pw6zyzyjyealujoj" Content-Disposition: inline In-Reply-To: <20191031113608.20713-1-christian.brauner@ubuntu.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --pw6zyzyjyealujoj Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2019-10-31, Christian Brauner wrote: > Validate the stack arguments and setup the stack depening on whether or n= ot > it is growing down or up. >=20 > Legacy clone() required userspace to know in which direction the stack is > growing and pass down the stack pointer appropriately. To make things more > confusing microblaze uses a variant of the clone() syscall selected by > CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument. > IA64 has a separate clone2() syscall which also takes an additional > stack_size argument. Finally, parisc has a stack that is growing upwards. > Userspace therefore has a lot nasty code like the following: >=20 > #define __STACK_SIZE (8 * 1024 * 1024) > pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd) > { > pid_t ret; > void *stack; >=20 > stack =3D malloc(__STACK_SIZE); > if (!stack) > return -ENOMEM; >=20 > #ifdef __ia64__ > ret =3D __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, = pidfd); > #elif defined(__parisc__) /* stack grows up */ > ret =3D clone(fn, stack, flags | SIGCHLD, arg, pidfd); > #else > ret =3D clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pi= dfd); > #endif > return ret; > } >=20 > or even crazier variants such as [3]. >=20 > With clone3() we have the ability to validate the stack. We can check that > when stack_size is passed, the stack pointer is valid and the other way > around. We can also check that the memory area userspace gave us is fine = to > use via access_ok(). Furthermore, we probably should not require > userspace to know in which direction the stack is growing. It is easy > for us to do this in the kernel and I couldn't find the original > reasoning behind exposing this detail to userspace. >=20 > /* Intentional user visible API change */ > clone3() was released with 5.3. Currently, it is not documented and very > unclear to userspace how the stack and stack_size argument have to be > passed. After talking to glibc folks we concluded that trying to change > clone3() to setup the stack instead of requiring userspace to do this is > the right course of action. > Note, that this is an explicit change in user visible behavior we introdu= ce > with this patch. If it breaks someone's use-case we will revert! (And then > e.g. place the new behavior under an appropriate flag.) > Breaking someone's use-case is very unlikely though. First, neither glibc > nor musl currently expose a wrapper for clone3(). Second, there is no real > motivation for anyone to use clone3() directly since it does not provide > features that legacy clone doesn't. New features for clone3() will first > happen in v5.5 which is why v5.4 is still a good time to try and make that > change now and backport it to v5.3. Searches on [4] did not reveal any > packages calling clone3(). >=20 > [1]: https://lore.kernel.org/r/CAG48ez3q=3DBeNcuVTKBN79kJui4vC6nw0Bfq6xc-= i0neheT17TA@mail.gmail.com > [2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenste= in > [3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2= ff9effa3b338/src/basic/raw-clone.h#L31 > [4]: https://codesearch.debian.net > Fixes: 7f192e3cd316 ("fork: add clone3") > Cc: Arnd Bergmann > Cc: Kees Cook > Cc: Jann Horn > Cc: David Howells > Cc: Ingo Molnar > Cc: Oleg Nesterov > Cc: Linus Torvalds > Cc: Florian Weimer > Cc: Peter Zijlstra > Cc: linux-api@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: # 5.3 > Cc: GNU C Library > Signed-off-by: Christian Brauner Looks reasonable, and it'll be lovely to not have to worry about stack ordering anymore. As for the UAPI breakage, I agree that nobody is using clone3() yet and so we still have a bit of lee-way to fix this behaviour. Acked-by: Aleksa Sarai > --- > include/uapi/linux/sched.h | 4 ++++ > kernel/fork.c | 33 ++++++++++++++++++++++++++++++++- > 2 files changed, 36 insertions(+), 1 deletion(-) >=20 > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h > index 99335e1f4a27..25b4fa00bad1 100644 > --- a/include/uapi/linux/sched.h > +++ b/include/uapi/linux/sched.h > @@ -51,6 +51,10 @@ > * sent when the child exits. > * @stack: Specify the location of the stack for the > * child process. > + * Note, @stack is expected to point to the > + * lowest address. The stack direction will be > + * determined by the kernel and set up > + * appropriately based on @stack_size. > * @stack_size: The size of the stack for the child process. > * @tls: If CLONE_SETTLS is set, the tls descriptor > * is set to tls. > diff --git a/kernel/fork.c b/kernel/fork.c > index bcdf53125210..55af6931c6ec 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(stru= ct kernel_clone_args *kargs, > return 0; > } > =20 > -static bool clone3_args_valid(const struct kernel_clone_args *kargs) > +/** > + * clone3_stack_valid - check and prepare stack > + * @kargs: kernel clone args > + * > + * Verify that the stack arguments userspace gave us are sane. > + * In addition, set the stack direction for userspace since it's easy fo= r us to > + * determine. > + */ > +static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) > +{ > + if (kargs->stack =3D=3D 0) { > + if (kargs->stack_size > 0) > + return false; > + } else { > + if (kargs->stack_size =3D=3D 0) > + return false; > + > + if (!access_ok((void __user *)kargs->stack, kargs->stack_size)) > + return false; > + > +#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64) > + kargs->stack +=3D kargs->stack_size; > +#endif > + } > + > + return true; > +} > + > +static bool clone3_args_valid(struct kernel_clone_args *kargs) > { > /* > * All lower bits of the flag word are taken. > @@ -2581,6 +2609,9 @@ static bool clone3_args_valid(const struct kernel_c= lone_args *kargs) > kargs->exit_signal) > return false; > =20 > + if (!clone3_stack_valid(kargs)) > + return false; > + > return true; > } --=20 Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH --pw6zyzyjyealujoj Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSxZm6dtfE8gxLLfYqdlLljIbnQEgUCXbrotQAKCRCdlLljIbnQ ErrRAP4tqMhnjKyYprqmY6QeGWhkKcJ3tprEu2C5yjo7FwzPogEAl4/P+p0KALr9 YM68clzUGdKCpq3iE0HCH8qF9MI/4gM= =OpcD -----END PGP SIGNATURE----- --pw6zyzyjyealujoj--