From: Petr Tesarik
To: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
	"H. Peter Anvin", Andy Lutomirski, Oleg Nesterov, Peter Zijlstra,
	Xin Li, Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
	"Masami Hiramatsu (Google)", Pengfei Xu, Josh Poimboeuf, Ze Gao,
Shutemov" , Kai Huang , David Woodhouse , Brian Gerst , Jason Gunthorpe , Joerg Roedel , "Mike Rapoport (IBM)" , Tina Zhang , Jacob Pan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list) Cc: Roberto Sassu , petr@tesarici.cz, Petr Tesarik Subject: [PATCH v1 5/8] sbm: x86: handle sandbox mode faults Date: Wed, 14 Feb 2024 12:35:13 +0100 Message-Id: <20240214113516.2307-6-petrtesarik@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com> References: <20240214113516.2307-1-petrtesarik@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID:LxC2BwAHshp7pcxlDJx9Ag--.51624S7 X-Coremail-Antispam: 1UD129KBjvJXoW3Xw48KFW8Jr4DKFy5JFyrWFg_yoWDGryxpF 9rAFn5GFZxWa4SvF9xAr4vvrW3Aws5Kw1YkF9rKry5Z3W2q345Xr4v9w1qqr4kZ395W3WY gFW5Zrn5uan8Jw7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUml14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26r4j6F4UM28EF7xvwVC2z280aVCY1x0267AKxVW8Jr0_ Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6x IIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_ Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7M4IIrI8v6xkF7I0E8c xan2IY04v7MxkF7I0Ew4C26cxK6c8Ij28IcwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE 7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI 8E67AF67kF1VAFwI0_Wrv_Gr1UMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Gr0_ Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r 1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4U JbIYCTnIWIevJa73UjIFyTuYvjfUnzVbDUUUU X-CM-SenderInfo: hshw23xhvd2x3n6k3tpzhluzxrxghudrp/ From: Petr Tesarik Provide a fault handler for sandbox mode. Set the sandbox mode instance error code, abort the sandbox and return to the caller. To allow graceful return from a fatal fault, save all callee-saved registers (including the stack pointer) just before passing control to the target function. Modify the handlers for #PF and #DF CPU exceptions to call this handler if coming from sandbox mode. The check is based on the saved CS register, which should be modified in the entry path to a value that is otherwise not possible (__SBM_CS). For the page fault handler, make sure that sandbox mode check is placed before do_kern_addr_fault(). That function calls spurious_kernel_fault(), which implements lazy TLB invalidation of kernel pages and it assumes that the faulting instruction ran with kernel-mode page tables; it would produce false positives for sandbox mode. 
Signed-off-by: Petr Tesarik
---
 arch/x86/include/asm/ptrace.h  | 21 +++++++++++++++++++++
 arch/x86/include/asm/sbm.h     | 24 ++++++++++++++++++++++++
 arch/x86/include/asm/segment.h |  7 +++++++
 arch/x86/kernel/asm-offsets.c  |  5 +++++
 arch/x86/kernel/sbm/call_64.S  | 21 +++++++++++++++++++++
 arch/x86/kernel/sbm/core.c     | 26 ++++++++++++++++++++++++++
 arch/x86/kernel/traps.c        | 11 +++++++++++
 arch/x86/mm/fault.c            |  6 ++++++
 8 files changed, 121 insertions(+)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index f4db78b09c8f..f66f16f037b0 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -164,6 +164,27 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 #endif
 }
 
+/*
+ * sandbox_mode() - did a register set come from SandBox Mode?
+ * @regs: register set
+ */
+static inline bool sandbox_mode(struct pt_regs *regs)
+{
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_SANDBOX_MODE
+	/*
+	 * SandBox Mode always runs in 64-bit and it is not implemented
+	 * on paravirt systems, so this is the only possible value.
+	 */
+	return regs->cs == __SBM_CS;
+#else /* !CONFIG_SANDBOX_MODE */
+	return false;
+#endif
+#else /* !CONFIG_X86_64 */
+	return false;
+#endif
+}
+
 /*
  * Determine whether the register set came from any context that is running in
  * 64-bit mode.
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
index ca4741b449e8..229b1ac3bbd4 100644
--- a/arch/x86/include/asm/sbm.h
+++ b/arch/x86/include/asm/sbm.h
@@ -11,23 +11,29 @@
 
 #include
 
+struct pt_regs;
+
 #if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
 
 #include
 
 /**
  * struct x86_sbm_state - Run-time state of the environment.
+ * @sbm:       Link back to the SBM instance.
  * @pgd:       Sandbox mode page global directory.
  * @stack:     Sandbox mode stack.
  * @exc_stack: Exception and IRQ stack.
+ * @return_sp: Stack pointer for returning to kernel mode.
  *
  * One instance of this union is allocated for each sandbox and stored as SBM
  * instance private data.
  */
 struct x86_sbm_state {
+	struct sbm *sbm;
 	pgd_t *pgd;
 	unsigned long stack;
 	unsigned long exc_stack;
+	unsigned long return_sp;
 };
 
 /**
@@ -43,6 +49,18 @@ static inline unsigned long top_of_intr_stack(void)
 	return current_top_of_stack();
 }
 
+/**
+ * handle_sbm_fault() - Handle a CPU fault in sandbox mode.
+ * @regs:       Saved registers at fault.
+ * @error_code: CPU error code.
+ * @address:    Fault address (CR2 register).
+ *
+ * Handle a sandbox mode fault. The caller should use sandbox_mode() to
+ * check that @regs came from sandbox mode before calling this function.
+ */
+void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code,
+		      unsigned long address);
+
 #else /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
 
 static inline unsigned long top_of_intr_stack(void)
@@ -50,6 +68,12 @@ static inline unsigned long top_of_intr_stack(void)
 	return current_top_of_stack();
 }
 
+static inline void handle_sbm_fault(struct pt_regs *regs,
+				    unsigned long error_code,
+				    unsigned long address)
+{
+}
+
 #endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
 
 #endif /* __ASM_SBM_H */
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 9d6411c65920..966831385d18 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -217,6 +217,13 @@
 #define __USER_CS			(GDT_ENTRY_DEFAULT_USER_CS*8 + 3)
 #define __CPUNODE_SEG			(GDT_ENTRY_CPUNODE*8 + 3)
 
+/*
+ * Sandbox runs with __USER_CS, but the interrupt entry code sets the RPL
+ * in the saved selector to zero to avoid user-mode processing (FPU, signal
+ * delivery, etc.). This is the resulting pseudo-CS.
+ */
+#define __SBM_CS			(GDT_ENTRY_DEFAULT_USER_CS*8)
+
 #endif
 
 #define IDT_ENTRIES			256
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 6913b372ccf7..44d4f0a0cb19 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_XEN
 #include
@@ -120,4 +121,8 @@ static void __used common(void)
 	OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
 #endif
 
+#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
+	COMMENT("SandBox Mode");
+	OFFSET(SBM_return_sp, x86_sbm_state, return_sp);
+#endif
 }
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 1b232c8d15b7..6a615b4f6047 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -22,6 +22,17 @@
  * rcx .. top of sandbox stack
  */
 SYM_FUNC_START(x86_sbm_exec)
+	/* save all callee-saved registers */
+	push	%rbp
+	push	%rbx
+	push	%r12
+	push	%r13
+	push	%r14
+	push	%r15
+
+	/* to be used by sandbox abort */
+	mov	%rsp, SBM_return_sp(%rdi)
+
 	/*
 	 * Set up the sandbox stack:
 	 * 1. Store the old stack pointer at the top of the sandbox stack,
@@ -37,5 +48,15 @@ SYM_FUNC_START(x86_sbm_exec)
 	pop	%rsp
 
+SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR // IRET target via x86_sbm_fault()
+
+	/* restore callee-saved registers and return */
+	pop	%r15
+	pop	%r14
+	pop	%r13
+	pop	%r12
+	pop	%rbx
+	pop	%rbp
 	RET
 SYM_FUNC_END(x86_sbm_exec)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index 81f1b0093537..d4c378847e93 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -13,6 +13,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -23,6 +25,7 @@
 asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
 			    void *args, unsigned long sbm_tos);
+extern char x86_sbm_return[];
 
 static inline phys_addr_t page_to_ptval(struct page *page)
 {
@@ -343,6 +346,8 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
 	struct x86_sbm_state *state = sbm->private;
 	int err;
 
+	state->sbm = sbm;
+
 	/* let interrupt handlers use the sandbox state page */
 	barrier();
 	WRITE_ONCE(current_thread_info()->sbm_state, state);
@@ -354,3 +359,24 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
 
 	return err;
 }
+
+void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code,
+		      unsigned long address)
+{
+	struct x86_sbm_state *state = current_thread_info()->sbm_state;
+
+	/*
+	 * Force -EFAULT unless the fault was due to a user-mode instruction
+	 * fetch from the designated return address.
+	 */
+	if (error_code != (X86_PF_PROT | X86_PF_USER | X86_PF_INSTR) ||
+	    address != (unsigned long)x86_sbm_return)
+		state->sbm->error = -EFAULT;
+
+	/* modify IRET frame to exit from sandbox */
+	regs->ip = (unsigned long)x86_sbm_return;
+	regs->cs = __KERNEL_CS;
+	regs->flags = X86_EFLAGS_IF;
+	regs->sp = state->return_sp;
+	regs->ss = __KERNEL_DS;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9c9c74314e7..8fc5b17b8fb4 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -416,6 +416,12 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
 	irqentry_nmi_enter(regs);
 	instrumentation_begin();
+
+	if (sandbox_mode(regs)) {
+		handle_sbm_fault(regs, error_code, 0);
+		return;
+	}
+
 	notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
 
 	tsk->thread.error_code = error_code;
@@ -675,6 +681,11 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		goto exit;
 	}
 
+	if (sandbox_mode(regs)) {
+		handle_sbm_fault(regs, error_code, 0);
+		return;
+	}
+
 	if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc, 0))
 		goto exit;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241..f223b258e53f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
 #include				/* kvm_handle_async_pf		*/
 #include				/* fixup_vdso_exception()	*/
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -1500,6 +1501,11 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kmmio_fault(regs, address)))
 		return;
 
+	if (sandbox_mode(regs)) {
+		handle_sbm_fault(regs, error_code, address);
+		return;
+	}
+
 	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
 		do_kern_addr_fault(regs, error_code, address);
-- 
2.34.1