Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp1020848ybx; Thu, 31 Oct 2019 04:37:32 -0700 (PDT) X-Google-Smtp-Source: APXvYqxljPmK5Xmv73ekCz4QPS4DNKf4ljH2v9rB6Io/DLOOMIN892CMllFi9CYRkHeRMhJa2qGO X-Received: by 2002:a50:9f65:: with SMTP id b92mr5380926edf.63.1572521852173; Thu, 31 Oct 2019 04:37:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572521852; cv=none; d=google.com; s=arc-20160816; b=x+8sAwQyDVpdnczXvngNErxTU/MV+HAXG+MLlzU3ZUgF+/fmF7m8uzF7aHg7ehz4bE DrBSjKE3SgIg8lU3q6C+/TGMlGl1In4MlKUKXrbha+Fh5jPf/nATQPriM7hCYXl4cA9x cpGqYkp0rQfSl9XQjp4/FvMbIbOF76WnTR6c/L4c0rv8ydzF3XSp1gcGCtVK7ZuwPO01 8cPGvWM0X/ON1OxMPCTspaKirVWdIUmgnhlVgugU4UA6jgYAV2sR8xEd/GFOHubGTNOz YgAfZF+W10YbIXBbpzgGw7eijqYgBRzt9D6EbtdQg88s0H5m8tqoG8ktj3BZcSSNW28R 66bg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=BARNee6EkOT0tav7crpNJcWCbvv7e8MKoLw3/eKlNK4=; b=jgH1AkvqY5VHV+W2uC+0XNhHtJuKOoOnrAeC856oqmzBertlBaFPuBdGtDkkXovD7Z MOFseVPyoWhLJ82nyhDuYq5E+bcgxrKxy23y+HsRdWap6V9uTkoQLvcCFiFEhniKZ4tK 5CunMMym38sME6xmsI7LEnxg/vRVTeIQpv5Zs0Jkhg2qPZDA3CjxJsKFBNJHa5ICFUbD aZdhRv0Yj/42WJDwbGNScd/iWgrUutFlfOidJ2KmmNndANkaEs8tB2LzRjRCjEZDGIej wxaHLeNqAEuIXep+f9OqEWyn52Hw5vDNSMhZyY0n9Tzo3I4gxjbqRNzfAU7/56r8Dfi7 UxgQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g3si3977684edj.220.2019.10.31.04.37.05; Thu, 31 Oct 2019 04:37:32 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726636AbfJaLgX (ORCPT + 99 others); Thu, 31 Oct 2019 07:36:23 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:50044 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726462AbfJaLgW (ORCPT ); Thu, 31 Oct 2019 07:36:22 -0400 Received: from [91.217.168.176] (helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1iQ8kU-0000Jq-Oo; Thu, 31 Oct 2019 11:36:18 +0000 From: Christian Brauner To: linux-kernel@vger.kernel.org, Florian Weimer , GNU C Library Cc: Arnd Bergmann , Kees Cook , Jann Horn , David Howells , Ingo Molnar , Oleg Nesterov , Linus Torvalds , Peter Zijlstra , linux-api@vger.kernel.org, stable@vger.kernel.org, Christian Brauner Subject: [PATCH] clone3: validate stack arguments Date: Thu, 31 Oct 2019 12:36:08 +0100 Message-Id: <20191031113608.20713-1-christian.brauner@ubuntu.com> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Validate the stack arguments and setup the stack depening on whether or not it is growing down or up. Legacy clone() required userspace to know in which direction the stack is growing and pass down the stack pointer appropriately. To make things more confusing microblaze uses a variant of the clone() syscall selected by CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument. IA64 has a separate clone2() syscall which also takes an additional stack_size argument. Finally, parisc has a stack that is growing upwards. Userspace therefore has a lot nasty code like the following: #define __STACK_SIZE (8 * 1024 * 1024) pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd) { pid_t ret; void *stack; stack = malloc(__STACK_SIZE); if (!stack) return -ENOMEM; #ifdef __ia64__ ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #elif defined(__parisc__) /* stack grows up */ ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd); #else ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #endif return ret; } or even crazier variants such as [3]. With clone3() we have the ability to validate the stack. We can check that when stack_size is passed, the stack pointer is valid and the other way around. We can also check that the memory area userspace gave us is fine to use via access_ok(). Furthermore, we probably should not require userspace to know in which direction the stack is growing. It is easy for us to do this in the kernel and I couldn't find the original reasoning behind exposing this detail to userspace. /* Intentional user visible API change */ clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that trying to change clone3() to setup the stack instead of requiring userspace to do this is the right course of action. Note, that this is an explicit change in user visible behavior we introduce with this patch. If it breaks someone's use-case we will revert! (And then e.g. place the new behavior under an appropriate flag.) Breaking someone's use-case is very unlikely though. First, neither glibc nor musl currently expose a wrapper for clone3(). Second, there is no real motivation for anyone to use clone3() directly since it does not provide features that legacy clone doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. Searches on [4] did not reveal any packages calling clone3(). [1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17TA@mail.gmail.com [2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein [3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/raw-clone.h#L31 [4]: https://codesearch.debian.net Fixes: 7f192e3cd316 ("fork: add clone3") Cc: Arnd Bergmann Cc: Kees Cook Cc: Jann Horn Cc: David Howells Cc: Ingo Molnar Cc: Oleg Nesterov Cc: Linus Torvalds Cc: Florian Weimer Cc: Peter Zijlstra Cc: linux-api@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: # 5.3 Cc: GNU C Library Signed-off-by: Christian Brauner --- include/uapi/linux/sched.h | 4 ++++ kernel/fork.c | 33 ++++++++++++++++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 99335e1f4a27..25b4fa00bad1 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@ * sent when the child exits. * @stack: Specify the location of the stack for the * child process. + * Note, @stack is expected to point to the + * lowest address. The stack direction will be + * determined by the kernel and set up + * appropriately based on @stack_size. * @stack_size: The size of the stack for the child process. * @tls: If CLONE_SETTLS is set, the tls descriptor * is set to tls. diff --git a/kernel/fork.c b/kernel/fork.c index bcdf53125210..55af6931c6ec 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, return 0; } -static bool clone3_args_valid(const struct kernel_clone_args *kargs) +/** + * clone3_stack_valid - check and prepare stack + * @kargs: kernel clone args + * + * Verify that the stack arguments userspace gave us are sane. + * In addition, set the stack direction for userspace since it's easy for us to + * determine. + */ +static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{ + if (kargs->stack == 0) { + if (kargs->stack_size > 0) + return false; + } else { + if (kargs->stack_size == 0) + return false; + + if (!access_ok((void __user *)kargs->stack, kargs->stack_size)) + return false; + +#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64) + kargs->stack += kargs->stack_size; +#endif + } + + return true; +} + +static bool clone3_args_valid(struct kernel_clone_args *kargs) { /* * All lower bits of the flag word are taken. @@ -2581,6 +2609,9 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs) kargs->exit_signal) return false; + if (!clone3_stack_valid(kargs)) + return false; + return true; } -- 2.23.0