Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp8627963rwb; Thu, 24 Nov 2022 02:03:40 -0800 (PST) X-Google-Smtp-Source: AA0mqf7R66+B8DswFz8l2/AFlY6n4YJTdo8kvdkAHdTR7QRtCDMihj29OEaAUrkiv6YpbobJuuQf X-Received: by 2002:a63:580a:0:b0:477:12e3:6e1c with SMTP id m10-20020a63580a000000b0047712e36e1cmr14053991pgb.126.1669284220354; Thu, 24 Nov 2022 02:03:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669284220; cv=none; d=google.com; s=arc-20160816; b=lp9rvoQRny4zj6RDKvqNh8t4AYFJ3tv9vGTvAFNCc3rLmo4CEVxpZ+mQXe4su/F/So haQnA+WoLYdMXWcP7baEt1dremqkEZKsONN5QOYiKqiqJy5N11TGXg6pMSihfZRY5OA4 MRbvbJE1WJdyPMRYYqOHbrau/i2cRw+zza+gzaM3BxXoHeOkG6/p7skBaagLkfzZb/mW qjoLq3P+Zo+LMO+hL7W71aM+38kvLhtvfXIY1HQxN2oBYOEq9XYEMcgPOlgWlCFq73l1 jMO5Xb5Or39A6HRzR6yr8qYL0jKdoFHdr4fQgR+++ep7eVe+PHM+AhxGPUha9iptn4xQ 9Htw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=9kBQ33fvv43X9QDifgGOs3l03dsWxbzD4NDNrJtL/Co=; b=i3gMl04YfbGIyMmuJhjyTTEYoiJpiRvaIgr+V7JnAHPt0Y9B6OBk9wjq3vHDyfYFSl iKPBkrmpiJX5OmD+l+2KV9bFliXpT8IDW6C4KIHdQH+omRKGLalaEayV8RCBAQf3xgeQ 4Sc4FGjeZSoqTN98WWSrL1nQOa8q7MFBAjhJ7ZfUrGzqrSxb1MAtzOB2q5hNd0VvCymY 7LH5MgnsGW2r3oGwZGprH3LeSFq9v6ctV3n/rGuJTNWrsNXNcF1MuI3gGLabhhwO4yQW idj7l22V6EkMwORvtqrOEtytRxf4ggNwK4UQJ4F5axh7UdyfCGA4ar5tYcauazgQiY6n uJkQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=d7+HnpOB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r10-20020a63d90a000000b00476c9ad73b6si967414pgg.44.2022.11.24.02.03.27; Thu, 24 Nov 2022 02:03:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=d7+HnpOB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229884AbiKXJyG (ORCPT + 87 others); Thu, 24 Nov 2022 04:54:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39080 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229633AbiKXJyC (ORCPT ); Thu, 24 Nov 2022 04:54:02 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9BA7125204 for ; Thu, 24 Nov 2022 01:54:01 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 403AEB8273C for ; Thu, 24 Nov 2022 09:54:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DD45DC433C1 for ; Thu, 24 Nov 2022 09:53:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1669283638; bh=P1NXp6aO0av4KBHQZwpjn1lh9PmrvySYyrVRjcY+yQY=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=d7+HnpOBj3Lj9EJ4dLjT5k3tZKd/pJ5ecrbi4fFpWZASPSktbj4jxaTb0xS7obk3T Jmdqnt9o+E/pjQJyZSxDIaoG0ePk4JjU9qB+Vyb8OzEBx9BUmIzopCmZIalrK/5LMi PNki5l/5HxkWI0YqvRTzArPCUJxLbnJIdTTLoPakXEeXxqpRdpBbDmVksqtq61ChXJ XkUQ+mRl28CGEKvZcvw399qNQyQImT+fUlGXBh+BNomdab8q66Z9/q0rSidLijuP6U 6R56JdlkUVjT8uEZMCMps0y5wP8uTUsyCxCR51Dkr8nuSZib0ZL2l/YUiaCBV1K1Yr SDsQLqcsBGS7w== Received: by mail-ej1-f44.google.com with SMTP id ha10so3012754ejb.3 for ; Thu, 24 Nov 2022 01:53:58 -0800 (PST) X-Gm-Message-State: ANoB5plH+Vyyzss4aJinoRD0gQ4VERGWLJ1/l/Ifa2fPcSuBSpTHhyaX v0QHtB3Prb2riqp0ivN4O1l2UEbNNY7xQjT4g/c= X-Received: by 2002:a17:906:2342:b0:78d:9e77:1f8c with SMTP id m2-20020a170906234200b0078d9e771f8cmr18684770eja.236.1669283637020; Thu, 24 Nov 2022 01:53:57 -0800 (PST) MIME-Version: 1.0 References: <20221124094845.1907443-1-debug@rivosinc.com> In-Reply-To: <20221124094845.1907443-1-debug@rivosinc.com> From: Guo Ren Date: Thu, 24 Nov 2022 17:53:45 +0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v2] riscv: VMAP_STACK overflow detection thread-safe To: Deepak Gupta Cc: palmer@dabbelt.com, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Jisheng Zhang Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 24, 2022 at 5:48 PM Deepak Gupta wrote: > > commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added > support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to > `shadow_stack` temporarily before switching finally to per-cpu > `overflow_stack`. > > If two CPUs/harts are racing and end up in over flowing kernel stack, one > or both will end up corrupting each other state because `shadow_stack` is > not per-cpu. This patch optimizes per-cpu overflow stack switch by > directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`. > > Following are the changes in this patch > > - Defines an asm macro to obtain per-cpu symbols in destination > register. > - In entry.S, when overflow is detected, per-cpu overflow stack is > located using per-cpu asm macro. Computing per-cpu symbol requires > a temporary register. x31 is saved away into CSR_SCRATCH > (CSR_SCRATCH is anyways zero since we're in kernel). > > Please see Links for additional relevant disccussion and alternative > solution. > > Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT` > Kernel crash log below > > Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT > Task stack: [0xff20000010a98000..0xff20000010a9c000] > Overflow stack: [0xff600001f7d98370..0xff600001f7d99370] > CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 > Hardware name: riscv-virtio,qemu (DT) > epc : __memset+0x60/0xfc > ra : recursive_loop+0x48/0xc6 [lkdtm] > epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80 > gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88 > t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0 > s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000 > a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000 > a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff > s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90 > s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684 > s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10 > s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4 > t5 : ffffffff815dbab8 t6 : ff20000010a9bb48 > status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f > Kernel panic - not syncing: Kernel stack overflow > CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 > Hardware name: riscv-virtio,qemu (DT) > Call Trace: > [] dump_backtrace+0x30/0x38 > [] show_stack+0x40/0x4c > [] dump_stack_lvl+0x44/0x5c > [] dump_stack+0x18/0x20 > [] panic+0x126/0x2fe > [] walk_stackframe+0x0/0xf0 > [] recursive_loop+0x48/0xc6 [lkdtm] > SMP: stopping secondary CPUs > ---[ end Kernel panic - not syncing: Kernel stack overflow ]--- > > Cc: Guo Ren > Cc: Jisheng Zhang > Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t > Signed-off-by: Deepak Gupta > > --- > v1 --> v2: > - asm macro to locate per-cpu symbol requires a temp reg. > When stack overflow happens, in trap handler we don't have spare regs > except sp. > v1 had a place holder in `thread_info` to spill a register. > v2 instead uses CSR_SCRATCH register because it's free to use. > > - v2 made per-cpu macro more readable. > - v2 fixed a bug that would've broken 32bit support. > > - v1 called it a fix over 31da94c25aea. v2 calls it alternative/ > optimization solution > --- > arch/riscv/include/asm/asm.h | 17 ++++++++++ > arch/riscv/kernel/asm-offsets.c | 1 + > arch/riscv/kernel/entry.S | 57 ++++++--------------------------- > arch/riscv/kernel/traps.c | 12 +------ > 4 files changed, 29 insertions(+), 58 deletions(-) > > diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h > index 1b471ff73178..1eb479cb9ae4 100644 > --- a/arch/riscv/include/asm/asm.h > +++ b/arch/riscv/include/asm/asm.h > @@ -69,6 +69,7 @@ > > #ifdef __ASSEMBLY__ > > +#include > /* Common assembly source macros */ > > /* > @@ -80,6 +81,22 @@ > .endr > .endm > > +#ifdef CONFIG_32BIT > +#define PER_CPU_OFFSET_SHIFT 2 > +#else > +#define PER_CPU_OFFSET_SHIFT 3 > +#endif > + > +.macro asm_per_cpu dst sym tmp > + REG_L \tmp, TASK_TI_CPU_NUM(tp) > + slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT > + la \dst, __per_cpu_offset > + add \dst, \dst, \tmp > + REG_L \tmp, 0(\dst) > + la \dst, \sym > + add \dst, \dst, \tmp > +.endm > + > #endif /* __ASSEMBLY__ */ > > #endif /* _ASM_RISCV_ASM_H */ > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c > index df9444397908..a7da051159cf 100644 > --- a/arch/riscv/kernel/asm-offsets.c > +++ b/arch/riscv/kernel/asm-offsets.c > @@ -38,6 +38,7 @@ void asm_offsets(void) > OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp); > OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp); > > + OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu); Why not TASK_TI_CPU ? > OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]); > OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]); > OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]); > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S > index b9eda3fcbd6d..2e90d9ccddd0 100644 > --- a/arch/riscv/kernel/entry.S > +++ b/arch/riscv/kernel/entry.S > @@ -10,9 +10,11 @@ > #include > #include > #include > +#include > #include > #include > #include > +#include > > #if !IS_ENABLED(CONFIG_PREEMPTION) > .set resume_kernel, restore_all > @@ -404,54 +406,15 @@ handle_syscall_trace_exit: > > #ifdef CONFIG_VMAP_STACK > handle_kernel_stack_overflow: > - la sp, shadow_stack > - addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE > + /* we reach here from kernel context, sscratch must be 0 */ > + csrrw x31, CSR_SCRATCH, x31 > + asm_per_cpu sp, overflow_stack, x31 > + li x31, OVERFLOW_STACK_SIZE > + add sp, sp, x31 > + /* zero out x31 again and restore x31 */ > + xor x31, x31, x31 > + csrrw x31, CSR_SCRATCH, x31 > > - //save caller register to shadow stack > - addi sp, sp, -(PT_SIZE_ON_STACK) > - REG_S x1, PT_RA(sp) > - REG_S x5, PT_T0(sp) > - REG_S x6, PT_T1(sp) > - REG_S x7, PT_T2(sp) > - REG_S x10, PT_A0(sp) > - REG_S x11, PT_A1(sp) > - REG_S x12, PT_A2(sp) > - REG_S x13, PT_A3(sp) > - REG_S x14, PT_A4(sp) > - REG_S x15, PT_A5(sp) > - REG_S x16, PT_A6(sp) > - REG_S x17, PT_A7(sp) > - REG_S x28, PT_T3(sp) > - REG_S x29, PT_T4(sp) > - REG_S x30, PT_T5(sp) > - REG_S x31, PT_T6(sp) > - > - la ra, restore_caller_reg > - tail get_overflow_stack > - > -restore_caller_reg: > - //save per-cpu overflow stack > - REG_S a0, -8(sp) > - //restore caller register from shadow_stack > - REG_L x1, PT_RA(sp) > - REG_L x5, PT_T0(sp) > - REG_L x6, PT_T1(sp) > - REG_L x7, PT_T2(sp) > - REG_L x10, PT_A0(sp) > - REG_L x11, PT_A1(sp) > - REG_L x12, PT_A2(sp) > - REG_L x13, PT_A3(sp) > - REG_L x14, PT_A4(sp) > - REG_L x15, PT_A5(sp) > - REG_L x16, PT_A6(sp) > - REG_L x17, PT_A7(sp) > - REG_L x28, PT_T3(sp) > - REG_L x29, PT_T4(sp) > - REG_L x30, PT_T5(sp) > - REG_L x31, PT_T6(sp) > - > - //load per-cpu overflow stack > - REG_L sp, -8(sp) > addi sp, sp, -(PT_SIZE_ON_STACK) > > //save context to overflow stack > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c > index f3e96d60a2ff..eef3a87514c7 100644 > --- a/arch/riscv/kernel/traps.c > +++ b/arch/riscv/kernel/traps.c > @@ -208,18 +208,8 @@ int is_valid_bugaddr(unsigned long pc) > #endif /* CONFIG_GENERIC_BUG */ > > #ifdef CONFIG_VMAP_STACK > -static DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], > +DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], > overflow_stack)__aligned(16); > -/* > - * shadow stack, handled_ kernel_ stack_ overflow(in kernel/entry.S) is used > - * to get per-cpu overflow stack(get_overflow_stack). > - */ > -long shadow_stack[SHADOW_OVERFLOW_STACK_SIZE/sizeof(long)]; > -asmlinkage unsigned long get_overflow_stack(void) > -{ > - return (unsigned long)this_cpu_ptr(overflow_stack) + > - OVERFLOW_STACK_SIZE; > -} > > asmlinkage void handle_bad_stack(struct pt_regs *regs) > { > -- > 2.25.1 > -- Best Regards Guo Ren