Received: by 2002:a05:7412:b130:b0:e2:908c:2ebd with SMTP id az48csp790800rdb; Fri, 17 Nov 2023 12:51:50 -0800 (PST) X-Google-Smtp-Source: AGHT+IFurK2QIPmu5O9qRNYLRO2P3YOb3FLWmzH+T7agVAjGbi5HGKmon6zTf20ytIwjX+YvkeD/ X-Received: by 2002:a05:6a20:4305:b0:181:a784:67df with SMTP id h5-20020a056a20430500b00181a78467dfmr9085890pzk.20.1700254310206; Fri, 17 Nov 2023 12:51:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700254310; cv=none; d=google.com; s=arc-20160816; b=pOm7uU3H/qcuJV7DpeaJi0w4Lya3JjGbvUoV5bT+Sw7g4zplhzn9RhZS1La+dJzY02 dHiC2QhvaLgmEV770i3hJ8jJKtsPaNgva3IQ9XAOCdxRPkdx+wwutICcgBShVBqwBmDm rxL4HRKxk+jin6StGst5u7aqE0TtB1fb1zA4/2ALh+cYU0BiU0d7KMKgdtgvLRqGpxBi 4j0h+JR0F8aRIUvu0PuCwCsi1Z4Z+8UKzT9CLDmflji1W2mv3kiWwlgcWdLJmiFNR24f OTh21Wqdk86Bv5S1VsTi0TfWCh5/AX5obEMP++lklekifG5JdAPNnrZsHrO3IOX/8kwu bYaw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=bN20sgWzr3hvmPnBHj6NtFmr2ao5No2o7Np+nHfNS3Q=; fh=Q+Yh6t/z4BsVWAG523uHjtZAcwV+6lgRYXvZ3SfzxK8=; b=flUnZOQn503ymVxkHNi/QcnuBPXQLP1M5+PelEhiWDSHZbl3Vy7WrpnVBn/L0ByPMV NKmG+yxNHza6YyNVXL+n4kYFx3i0C33KLn1dr6DbK9AtkPQlVM+jqKyNnFdV+ViFrFW7 bG4/Rj0EXXS2kw1SjFkCDrZ4ZwKrMCK8MDZrKbA3pOLDoslwZdlOZkm/N7AgaOeX8coz FfbHKPhW6ymL5o0CWFu4mgqzTgxaJoCAcPwJqjLJ1KI0H8kwDAlnFQJLONo1k9H0U4aI 2cCWreq42eW0bKu9cSZRuiJ/3uYW1S1MVv1uW44hBuOmKjNT5lTpNiHXOsfNyKjF15YS eMmQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=POEunZhc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [2620:137:e000::3:3]) by mx.google.com with ESMTPS id ca8-20020a056a02068800b005b96c2ab110si2824502pgb.131.2023.11.17.12.51.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Nov 2023 12:51:50 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) client-ip=2620:137:e000::3:3; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=POEunZhc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 7022B81E1B5E; Fri, 17 Nov 2023 12:51:47 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346146AbjKQUv3 (ORCPT + 99 others); Fri, 17 Nov 2023 15:51:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55190 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232067AbjKQUv0 (ORCPT ); Fri, 17 Nov 2023 15:51:26 -0500 Received: from mail-oo1-xc32.google.com (mail-oo1-xc32.google.com [IPv6:2607:f8b0:4864:20::c32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7368FD72 for ; Fri, 17 Nov 2023 12:51:22 -0800 (PST) Received: by mail-oo1-xc32.google.com with SMTP id 006d021491bc7-58a0154b4baso1671644eaf.1 for ; Fri, 17 Nov 2023 12:51:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1700254282; x=1700859082; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=bN20sgWzr3hvmPnBHj6NtFmr2ao5No2o7Np+nHfNS3Q=; b=POEunZhcNSUDYAzRqThXThXM9GBQSMPdFBj9nSmQ6MTjMJdM1kXLyiK7TRxM+1n5oO QTgnQ+NTIVSYmzhkqA+lv3NqPIQa629nbHqUmwNv/mY9PAheqdV28iLCJtWeN0yNy0NJ IG2OFFPnB0Pkn2AyyxkspJmG5nZrXO7xiJMmJ/EXTG3SzMttTyAZp+kli2fXp7ZcpV8D hsVukl8whAtd44R0+c8KaK7PHysQbsMEIjmvwOdIUHAimC7KI+f0lfBIdLAbQoOP48kd WXfP1iQT9++lheppG+oQeM4QTSsvD1ZbJ75h7iEgw0FnfIMMrhx8m3IX/jXw504EvIKn +7xw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700254282; x=1700859082; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=bN20sgWzr3hvmPnBHj6NtFmr2ao5No2o7Np+nHfNS3Q=; b=MiPzi5F6Eg1y5+AE+Q6rNHH0m0piZUXjxpK2PosfQ965+XX5W9x1f8l28bdhJK1Vb3 d9D+bRGb9Sm3WbpKKCuGsTy9AN/r370qOrSunyES8wOrf/SSz4WxJu6ZyOGLWg7VyZVj G01Qav8MFDEmH225K8gx6Er/YqgQSgvx4zYr0L4x5nBy7AZRZcpSl66RhIrfzzjwmfmu bHdzttleRyxXToqL7wYsxM/5IciTfcU27kNU0P/lxsJdmgtPNPBHH6RrGMKdNtCBeTaZ bfTcdI6yqFZaXoZJPwamqZMJMJ5dvO+sTb30uu4cBLYxEgSImW1GeOpQt1/Fea3dta6r pGXQ== X-Gm-Message-State: AOJu0Yw5+hkS6FGR1NVipscEiZ8Snm3GjVBGbp3TYHFoYsvlrzamIxuv IknRudXnev4iP+PNP19rMo7YIw== X-Received: by 2002:a05:6870:9d99:b0:1c8:c9ca:7092 with SMTP id pv25-20020a0568709d9900b001c8c9ca7092mr124337oab.11.1700254281705; Fri, 17 Nov 2023 12:51:21 -0800 (PST) Received: from debug.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id pj24-20020a056871d19800b001efa91630f6sm402222oac.6.2023.11.17.12.51.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Nov 2023 12:51:21 -0800 (PST) Date: Fri, 17 Nov 2023 12:51:18 -0800 From: Deepak Gupta To: Mark Brown Cc: "Rick P. Edgecombe" , Szabolcs Nagy , "H.J. Lu" , Florian Weimer , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , Christian Brauner , Shuah Khan , linux-kernel@vger.kernel.org, Catalin Marinas , Will Deacon , Kees Cook , jannh@google.com, linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org Subject: Re: [PATCH RFC RFT v2 2/5] fork: Add shadow stack support to clone3() Message-ID: References: <20231114-clone3-shadow-stack-v2-0-b613f8681155@kernel.org> <20231114-clone3-shadow-stack-v2-2-b613f8681155@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <20231114-clone3-shadow-stack-v2-2-b613f8681155@kernel.org> X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Fri, 17 Nov 2023 12:51:47 -0800 (PST) On Tue, Nov 14, 2023 at 08:05:55PM +0000, Mark Brown wrote: >Unlike with the normal stack there is no API for configuring the the shadow >stack for a new thread, instead the kernel will dynamically allocate a new >shadow stack with the same size as the normal stack. This appears to be due >to the shadow stack series having been in development since before the more >extensible clone3() was added rather than anything more deliberate. > >Add parameters to clone3() specifying the address and size of a shadow >stack for the newly created process, Probably should update commit message in next version. Address is not specified anymore. >we validate that the range specified >is accessible to userspace but do not validate that it has been mapped >appropriately for use as a shadow stack (normally via map_shadow_stack()). >If the shadow stack is specified in this way then the caller is responsible >for freeing the memory as with the main stack. If no shadow stack is >specified then the existing implicit allocation and freeing behaviour is >maintained. > >If the architecture does not support shadow stacks the shadow stack >parameters must be zero, architectures that do support the feature are >expected to have the same requirement on individual systems that lack >shadow stack support. > >Update the existing x86 implementation to pay attention to the newly added >arguments, in order to maintain compatibility we use the existing behaviour >if no shadow stack is specified. Minimal validation is done of the supplied >parameters, detailed enforcement is left to when the thread is executed. >Since we are now using four fields from the kernel_clone_args we pass that >into the shadow stack code rather than individual fields. > >Signed-off-by: Mark Brown >--- > arch/x86/include/asm/shstk.h | 11 +++++++---- > arch/x86/kernel/process.c | 2 +- > arch/x86/kernel/shstk.c | 30 +++++++++++++++++++++++++----- > include/linux/sched/task.h | 2 ++ > include/uapi/linux/sched.h | 4 ++++ > kernel/fork.c | 24 ++++++++++++++++++++++-- > 6 files changed, 61 insertions(+), 12 deletions(-) > >diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h >index 42fee8959df7..8be7b0a909c3 100644 >--- a/arch/x86/include/asm/shstk.h >+++ b/arch/x86/include/asm/shstk.h >@@ -6,6 +6,7 @@ > #include > > struct task_struct; >+struct kernel_clone_args; > struct ksignal; > > #ifdef CONFIG_X86_USER_SHADOW_STACK >@@ -16,8 +17,8 @@ struct thread_shstk { > > long shstk_prctl(struct task_struct *task, int option, unsigned long arg2); > void reset_thread_features(void); >-unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, >- unsigned long stack_size); >+unsigned long shstk_alloc_thread_stack(struct task_struct *p, >+ const struct kernel_clone_args *args); > void shstk_free(struct task_struct *p); > int setup_signal_shadow_stack(struct ksignal *ksig); > int restore_signal_shadow_stack(void); >@@ -26,8 +27,10 @@ static inline long shstk_prctl(struct task_struct *task, int option, > unsigned long arg2) { return -EINVAL; } > static inline void reset_thread_features(void) {} > static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p, >- unsigned long clone_flags, >- unsigned long stack_size) { return 0; } >+ const struct kernel_clone_args *args) >+{ >+ return 0; >+} > static inline void shstk_free(struct task_struct *p) {} > static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } > static inline int restore_signal_shadow_stack(void) { return 0; } >diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c >index b6f4e8399fca..a9ca80ea5056 100644 >--- a/arch/x86/kernel/process.c >+++ b/arch/x86/kernel/process.c >@@ -207,7 +207,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) > * is disabled, new_ssp will remain 0, and fpu_clone() will know not to > * update it. > */ >- new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size); >+ new_ssp = shstk_alloc_thread_stack(p, args); > if (IS_ERR_VALUE(new_ssp)) > return PTR_ERR((void *)new_ssp); > >diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c >index 59e15dd8d0f8..7ffe90010587 100644 >--- a/arch/x86/kernel/shstk.c >+++ b/arch/x86/kernel/shstk.c >@@ -191,18 +191,38 @@ void reset_thread_features(void) > current->thread.features_locked = 0; > } > >-unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, >- unsigned long stack_size) >+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, >+ const struct kernel_clone_args *args) > { > struct thread_shstk *shstk = &tsk->thread.shstk; >+ unsigned long clone_flags = args->flags; > unsigned long addr, size; > > /* > * If shadow stack is not enabled on the new thread, skip any >- * switch to a new shadow stack. >+ * implicit switch to a new shadow stack and reject attempts to >+ * explciitly specify one. > */ >- if (!features_enabled(ARCH_SHSTK_SHSTK)) >+ if (!features_enabled(ARCH_SHSTK_SHSTK)) { >+ if (args->shadow_stack) >+ return (unsigned long)ERR_PTR(-EINVAL); >+ > return 0; >+ } >+ >+ /* >+ * If the user specified a shadow stack then do some basic >+ * validation and use it. The caller is responsible for >+ * freeing the shadow stack. >+ */ >+ if (args->shadow_stack_size) { >+ size = args->shadow_stack_size; >+ >+ if (size < 8) >+ return (unsigned long)ERR_PTR(-EINVAL); >+ } else { >+ size = args->stack_size; >+ } > > /* > * For CLONE_VFORK the child will share the parents shadow stack. >@@ -222,7 +242,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl > if (!(clone_flags & CLONE_VM)) > return 0; > >- size = adjust_shstk_size(stack_size); >+ size = adjust_shstk_size(size); > addr = alloc_shstk(0, size, 0, false); > if (IS_ERR_VALUE(addr)) > return addr; >diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h >index a23af225c898..94e7cf62be51 100644 >--- a/include/linux/sched/task.h >+++ b/include/linux/sched/task.h >@@ -41,6 +41,8 @@ struct kernel_clone_args { > void *fn_arg; > struct cgroup *cgrp; > struct css_set *cset; >+ unsigned long shadow_stack; >+ unsigned long shadow_stack_size; > }; > > /* >diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h >index 3bac0a8ceab2..a998b6d0c897 100644 >--- a/include/uapi/linux/sched.h >+++ b/include/uapi/linux/sched.h >@@ -84,6 +84,8 @@ > * kernel's limit of nested PID namespaces. > * @cgroup: If CLONE_INTO_CGROUP is specified set this to > * a file descriptor for the cgroup. >+ * @shadow_stack_size: Specify the size of the shadow stack to allocate >+ * for the child process. > * > * The structure is versioned by size and thus extensible. > * New struct members must go at the end of the struct and >@@ -101,12 +103,14 @@ struct clone_args { > __aligned_u64 set_tid; > __aligned_u64 set_tid_size; > __aligned_u64 cgroup; >+ __aligned_u64 shadow_stack_size; > }; > #endif > > #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */ > #define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */ > #define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */ >+#define CLONE_ARGS_SIZE_VER3 96 /* sizeof fourth published struct */ > > /* > * Scheduling policies >diff --git a/kernel/fork.c b/kernel/fork.c >index 10917c3e1f03..b0df69c8185e 100644 >--- a/kernel/fork.c >+++ b/kernel/fork.c >@@ -3067,7 +3067,9 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, > CLONE_ARGS_SIZE_VER1); > BUILD_BUG_ON(offsetofend(struct clone_args, cgroup) != > CLONE_ARGS_SIZE_VER2); >- BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER2); >+ BUILD_BUG_ON(offsetofend(struct clone_args, shadow_stack_size) != >+ CLONE_ARGS_SIZE_VER3); >+ BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER3); > > if (unlikely(usize > PAGE_SIZE)) > return -E2BIG; >@@ -3110,6 +3112,7 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, > .tls = args.tls, > .set_tid_size = args.set_tid_size, > .cgroup = args.cgroup, >+ .shadow_stack_size = args.shadow_stack_size, > }; > > if (args.set_tid && >@@ -3150,6 +3153,23 @@ static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) > return true; > } > >+/** >+ * clone3_shadow_stack_valid - check and prepare shadow stack >+ * @kargs: kernel clone args >+ * >+ * Verify that shadow stacks are only enabled if supported. >+ */ >+static inline bool clone3_shadow_stack_valid(struct kernel_clone_args *kargs) >+{ >+#ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK >+ /* The architecture must check support on the specific machine */ >+ return true; >+#else >+ /* The architecture does not support shadow stacks */ >+ return !kargs->shadow_stack_size; >+#endif >+} >+ > static bool clone3_args_valid(struct kernel_clone_args *kargs) > { > /* Verify that no unknown flags are passed along. */ >@@ -3172,7 +3192,7 @@ static bool clone3_args_valid(struct kernel_clone_args *kargs) > kargs->exit_signal) > return false; > >- if (!clone3_stack_valid(kargs)) >+ if (!clone3_stack_valid(kargs) || !clone3_shadow_stack_valid(kargs)) > return false; > > return true; > >-- >2.30.2 >