Date: Fri, 11 Aug 2023 23:35:58 +0000
In-Reply-To: <20230811233556.97161-7-samitolvanen@google.com>
References: <20230811233556.97161-7-samitolvanen@google.com>
Message-ID: <20230811233556.97161-8-samitolvanen@google.com>
Subject: [PATCH 1/5] riscv: VMAP_STACK overflow detection thread-safe
From: Sami Tolvanen
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Kees Cook
Cc: Guo Ren, Deepak Gupta, Nathan Chancellor, Nick Desaulniers,
    Fangrui Song, linux-riscv@lists.infradead.org, llvm@lists.linux.dev,
    linux-kernel@vger.kernel.org, Jisheng Zhang
X-Mailer: git-send-email 2.41.0.640.ga95def55d0-goog
X-Mailing-List: linux-kernel@vger.kernel.org

From: Deepak Gupta

commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added
support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to
`shadow_stack` temporarily before switching finally to per-cpu
`overflow_stack`.

If two CPUs/harts are racing and end up overflowing the kernel stack, one
or both will end up corrupting each other's state because `shadow_stack`
is not per-cpu. This patch optimizes per-cpu overflow stack switch by
directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`.

Following are the changes in this patch

 - Defines an asm macro to obtain per-cpu symbols in destination
   register.
 - In entry.S, when overflow is detected, per-cpu overflow stack is
   located using per-cpu asm macro. Computing per-cpu symbol requires a
   temporary register.
   x31 is saved away into CSR_SCRATCH (CSR_SCRATCH is anyway zero since
   we're in kernel).

Please see Links for additional relevant discussion and alternative
solution.

Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT`
Kernel crash log below

 Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT
 Task stack:     [0xff20000010a98000..0xff20000010a9c000]
 Overflow stack: [0xff600001f7d98370..0xff600001f7d99370]
 CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
 Hardware name: riscv-virtio,qemu (DT)
 epc : __memset+0x60/0xfc
  ra : recursive_loop+0x48/0xc6 [lkdtm]
 epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80
  gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88
  t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0
  s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000
  a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000
  a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff
  s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90
  s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684
  s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10
  s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4
  t5 : ffffffff815dbab8 t6 : ff20000010a9bb48
 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f
 Kernel panic - not syncing: Kernel stack overflow
 CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
 Hardware name: riscv-virtio,qemu (DT)
 Call Trace:
 [] dump_backtrace+0x30/0x38
 [] show_stack+0x40/0x4c
 [] dump_stack_lvl+0x44/0x5c
 [] dump_stack+0x18/0x20
 [] panic+0x126/0x2fe
 [] walk_stackframe+0x0/0xf0
 [] recursive_loop+0x48/0xc6 [lkdtm]
 SMP: stopping secondary CPUs
 ---[ end Kernel panic - not syncing: Kernel stack overflow ]---

Cc: Guo Ren
Cc: Jisheng Zhang
Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t
Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/
Signed-off-by: Deepak Gupta
Acked-by: Guo Ren
---
 arch/riscv/include/asm/asm.h         | 16 +++++++
 arch/riscv/include/asm/thread_info.h |  3 --
 arch/riscv/kernel/asm-offsets.c      |  1 +
 arch/riscv/kernel/entry.S            | 70 ++++------------------------
 arch/riscv/kernel/traps.c            | 36 +-------------
 5 files changed, 28 insertions(+), 98 deletions(-)

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index 114bbadaef41..f403e46e04f2 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -82,6 +82,22 @@
 	.endr
 .endm
 
+#ifdef CONFIG_32BIT
+#define PER_CPU_OFFSET_SHIFT 2
+#else
+#define PER_CPU_OFFSET_SHIFT 3
+#endif
+
+.macro asm_per_cpu dst sym tmp
+	REG_L \tmp, TASK_TI_CPU_NUM(tp)
+	slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
+	la \dst, __per_cpu_offset
+	add \dst, \dst, \tmp
+	REG_L \tmp, 0(\dst)
+	la \dst, \sym
+	add \dst, \dst, \tmp
+.endm
+
 /* save all GPs except x1 ~ x5 */
 	.macro save_from_x6_to_x31
 	REG_S x6, PT_T1(sp)
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 1833beb00489..d18ce0113ca1 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -34,9 +34,6 @@
 
 #ifndef __ASSEMBLY__
 
-extern long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE / sizeof(long)];
-extern unsigned long spin_shadow_stack;
-
 #include
 #include
 
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index d6a75aac1d27..9f535d5de33f 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -39,6 +39,7 @@ void asm_offsets(void)
 	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
 	OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
 
+	OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu);
 	OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]);
 	OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]);
 	OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 143a2bb3e697..3d11aa3af105 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -10,9 +10,11 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
+#include
 
 SYM_CODE_START(handle_exception)
 	/*
@@ -170,67 +172,15 @@ SYM_CODE_END(ret_from_exception)
 
 #ifdef CONFIG_VMAP_STACK
 SYM_CODE_START_LOCAL(handle_kernel_stack_overflow)
-	/*
-	 * Takes the psuedo-spinlock for the shadow stack, in case multiple
-	 * harts are concurrently overflowing their kernel stacks. We could
-	 * store any value here, but since we're overflowing the kernel stack
-	 * already we only have SP to use as a scratch register. So we just
-	 * swap in the address of the spinlock, as that's definately non-zero.
-	 *
-	 * Pairs with a store_release in handle_bad_stack().
-	 */
-1:	la sp, spin_shadow_stack
-	REG_AMOSWAP_AQ sp, sp, (sp)
-	bnez sp, 1b
-
-	la sp, shadow_stack
-	addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
-
-	//save caller register to shadow stack
-	addi sp, sp, -(PT_SIZE_ON_STACK)
-	REG_S x1, PT_RA(sp)
-	REG_S x5, PT_T0(sp)
-	REG_S x6, PT_T1(sp)
-	REG_S x7, PT_T2(sp)
-	REG_S x10, PT_A0(sp)
-	REG_S x11, PT_A1(sp)
-	REG_S x12, PT_A2(sp)
-	REG_S x13, PT_A3(sp)
-	REG_S x14, PT_A4(sp)
-	REG_S x15, PT_A5(sp)
-	REG_S x16, PT_A6(sp)
-	REG_S x17, PT_A7(sp)
-	REG_S x28, PT_T3(sp)
-	REG_S x29, PT_T4(sp)
-	REG_S x30, PT_T5(sp)
-	REG_S x31, PT_T6(sp)
-
-	la ra, restore_caller_reg
-	tail get_overflow_stack
-
-restore_caller_reg:
-	//save per-cpu overflow stack
-	REG_S a0, -8(sp)
-	//restore caller register from shadow_stack
-	REG_L x1, PT_RA(sp)
-	REG_L x5, PT_T0(sp)
-	REG_L x6, PT_T1(sp)
-	REG_L x7, PT_T2(sp)
-	REG_L x10, PT_A0(sp)
-	REG_L x11, PT_A1(sp)
-	REG_L x12, PT_A2(sp)
-	REG_L x13, PT_A3(sp)
-	REG_L x14, PT_A4(sp)
-	REG_L x15, PT_A5(sp)
-	REG_L x16, PT_A6(sp)
-	REG_L x17, PT_A7(sp)
-	REG_L x28, PT_T3(sp)
-	REG_L x29, PT_T4(sp)
-	REG_L x30, PT_T5(sp)
-	REG_L x31, PT_T6(sp)
+	/* we reach here from kernel context, sscratch must be 0 */
+	csrrw x31, CSR_SCRATCH, x31
+	asm_per_cpu sp, overflow_stack, x31
+	li x31, OVERFLOW_STACK_SIZE
+	add sp, sp, x31
+	/* zero out x31 again and restore x31 */
+	xor x31, x31, x31
+	csrrw x31, CSR_SCRATCH, x31
 
-	//load per-cpu overflow stack
-	REG_L sp, -8(sp)
 	addi sp, sp, -(PT_SIZE_ON_STACK)
 
 	//save context to overflow stack
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index f910dfccbf5d..deb2144d9143 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -397,48 +397,14 @@ int is_valid_bugaddr(unsigned long pc)
 #endif /* CONFIG_GENERIC_BUG */
 
 #ifdef CONFIG_VMAP_STACK
-/*
- * Extra stack space that allows us to provide panic messages when the kernel
- * has overflowed its stack.
- */
-static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
+DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)],
 		overflow_stack)__aligned(16);
-/*
- * A temporary stack for use by handle_kernel_stack_overflow. This is used so
- * we can call into C code to get the per-hart overflow stack. Usage of this
- * stack must be protected by spin_shadow_stack.
- */
-long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)] __aligned(16);
-
-/*
- * A pseudo spinlock to protect the shadow stack from being used by multiple
- * harts concurrently. This isn't a real spinlock because the lock side must
- * be taken without a valid stack and only a single register, it's only taken
- * while in the process of panicing anyway so the performance and error
- * checking a proper spinlock gives us doesn't matter.
- */
-unsigned long spin_shadow_stack;
-
-asmlinkage unsigned long get_overflow_stack(void)
-{
-	return (unsigned long)this_cpu_ptr(overflow_stack) +
-		OVERFLOW_STACK_SIZE;
-}
 
 asmlinkage void handle_bad_stack(struct pt_regs *regs)
 {
 	unsigned long tsk_stk = (unsigned long)current->stack;
 	unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
 
-	/*
-	 * We're done with the shadow stack by this point, as we're on the
-	 * overflow stack. Tell any other concurrent overflowing harts that
-	 * they can proceed with panicing by releasing the pseudo-spinlock.
-	 *
-	 * This pairs with an amoswap.aq in handle_kernel_stack_overflow.
-	 */
-	smp_store_release(&spin_shadow_stack, 0);
-
 	console_verbose();
 
 	pr_emerg("Insufficient stack space to handle exception!\n");
-- 
2.41.0.640.ga95def55d0-goog
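
Illustration only, not part of the patch: the address arithmetic that
`asm_per_cpu` performs (symbol address plus `__per_cpu_offset[cpu]`),
followed by the `OVERFLOW_STACK_SIZE` adjustment in
handle_kernel_stack_overflow, can be modeled in userspace C roughly as
below. This is a sketch under stated assumptions: `NR_CPUS`,
`percpu_area`, `per_cpu_offset`, `setup_per_cpu_offsets`,
`model_asm_per_cpu` and `model_overflow_sp` are hypothetical names made
up for the model, and the 4096-byte size is assumed only because it
matches the 0x1000 span of the overflow stack in the crash log above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4                 /* hypothetical CPU count for this model */
#define OVERFLOW_STACK_SIZE 4096  /* assumed; equals the 0x1000 span in the log */

/* Userspace stand-ins for the per-cpu area and the offset table. */
static unsigned char percpu_area[NR_CPUS][OVERFLOW_STACK_SIZE];
static uintptr_t per_cpu_offset[NR_CPUS];

/* Fill the table so that "symbol + offset[cpu]" lands in that CPU's copy. */
static void setup_per_cpu_offsets(void)
{
	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		per_cpu_offset[cpu] = (uintptr_t)percpu_area[cpu] -
				      (uintptr_t)percpu_area[0];
}

/*
 * Models asm_per_cpu: load the CPU number, index the offset table
 * (the slli by PER_CPU_OFFSET_SHIFT is the array indexing here),
 * then add the loaded offset to the symbol address.
 */
static uintptr_t model_asm_per_cpu(uintptr_t sym, size_t cpu)
{
	assert(cpu < NR_CPUS);
	return sym + per_cpu_offset[cpu];
}

/* Models the sp value before PT_SIZE_ON_STACK is subtracted: stack top. */
static uintptr_t model_overflow_sp(size_t cpu)
{
	uintptr_t base = model_asm_per_cpu((uintptr_t)percpu_area[0], cpu);

	return base + OVERFLOW_STACK_SIZE;
}
```

In this model every CPU derives a distinct stack top from its own CPU
number alone, with no shared intermediate stack, which is the property
the patch relies on.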