Date: Wed, 1 Mar 2023 16:59:55 +0000
From: Catalin Marinas
To: Andrey Konovalov
Cc: Shuai Yuan, Dmitry Vyukov, Weizhao Ouyang, Andrey Ryabinin,
    Alexander Potapenko, Vincenzo Frascino, Andrew Morton,
    kasan-dev@googlegroups.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Weizhao Ouyang, Peng Ren,
    Peter Collingbourne
Subject: Re: [PATCH v2] kasan: fix deadlock in start_report()
References: <20230209031159.2337445-1-ouyangweizhao@zeku.com>
    <93b94f59016145adbb1e01311a1103f8@zeku.com>
    <2b57491a9fab4ce9a643bd0922e03e73@zeku.com>

On Tue, Feb 28, 2023 at 10:50:46PM +0100, Andrey Konovalov wrote:
> On Tue, Feb 28, 2023 at 5:09 PM Catalin Marinas wrote:
> > On Mon, Feb 27, 2023 at 03:13:45AM +0100, Andrey Konovalov wrote:
> > > +Catalin, would it be acceptable to implement a routine that
> > > disables in-kernel MTE tag checking (until the next
> > > mte_enable_kernel_sync/async/asymm call)? In a similar way to how
> > > an MTE fault does this, but without the fault itself. I.e., expose
> > > the part of the do_tag_recovery functionality without
> > > report_tag_fault?
> >
> > I don't think we ever re-enable MTE after do_tag_recovery(). The
> > mte_enable_kernel_*() functions are called at boot. We do call
> > kasan_enable_tagging() explicitly in the kunit tests, but that's a
> > controlled fault environment.
>
> Right, but here we don't want to re-enable MTE after a fault; we want
> to suppress faults while printing an error report.
>
> > IIUC, the problem is that the kernel already got an MTE fault, so at
> > that point the error is not really recoverable.
>
> No, the problem is with the following sequence of events:
>
> 1. KASAN detects a memory corruption and starts printing a report
> _without getting an MTE fault_. This happens when, e.g., KASAN sees a
> free of an invalid address.
>
> 2. During error reporting, an MTE fault is triggered by the error
> reporting code itself, e.g. while collecting information about the
> accessed slab object.
>
> 3. KASAN tries to print another report while already printing a
> report, and goes into a deadlock.
>
> If we could avoid MTE faults being triggered during error reporting,
> this would solve the problem.

Ah, I get it now. So we just want to avoid triggering a benign MTE
fault.
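To make the failure mode concrete, the reporting path looks roughly
like this (a simplified sketch, not the actual mm/kasan/report.c code;
the lock and function names are only illustrative):

/* Simplified sketch of the deadlock; names are illustrative only. */
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(report_lock);    /* serialises KASAN reports */

static void start_report(unsigned long *flags)
{
        spin_lock_irqsave(&report_lock, *flags);  /* taken once here */
}

/*
 * 1. An invalid free is detected: start_report() takes report_lock.
 * 2. Printing the report touches the badly tagged object and triggers
 *    a (benign) MTE tag check fault.
 * 3. The tag fault handler reports too: start_report() runs again on
 *    the same CPU and the second spin_lock_irqsave() spins forever.
 */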
> > If we want to avoid a
> > fault in the first place, we could do something like
> > __uaccess_enable_tco() (Vincenzo has some patches to generalise these
> > routines)
>
> Ah, this looks exactly like what we need. Adding
> __uaccess_en/disable_tco to kasan_report_invalid_free solves the
> problem.
>
> Do you think it would be possible to expose these routines to KASAN?

Yes. I'm including Vincenzo's patch below (it is part of a fix for
potential strscpy() faults caused by its eager reading of unaligned
data; we'll get around to posting that eventually). You can add some
arch_kasan_enable/disable() macros on top, and feel free to include the
patch below.

Now, I wonder whether we should link those into kasan_disable_current().
These functions currently only deal with the recursion depth for
KASAN_SW_TAGS, but it would make sense for KASAN_HW_TAGS to also set the
tag check override, so that we don't need to bother with match-all tags
on pointer dereferences.

----8<----------------------------
From 0dcfc84d8b984001219cc3c9eaf698c26286624c Mon Sep 17 00:00:00 2001
From: Vincenzo Frascino
Date: Thu, 13 Oct 2022 07:46:23 +0100
Subject: [PATCH] arm64: mte: Rename TCO routines

The TCO-related routines are used in the uaccess methods and in
load_unaligned_zeropad(), but they are unrelated to both even if the
naming suggests otherwise. Improve the readability of the code by
moving them away from uaccess.h and prefixing them with "mte".

Cc: Will Deacon
Signed-off-by: Vincenzo Frascino
Signed-off-by: Catalin Marinas
---
 arch/arm64/include/asm/mte-kasan.h      | 81 +++++++++++++++++++++++++
 arch/arm64/include/asm/mte.h            | 12 ----
 arch/arm64/include/asm/uaccess.h        | 66 +++----------------
 arch/arm64/include/asm/word-at-a-time.h |  4 +-
 4 files changed, 93 insertions(+), 70 deletions(-)

diff --git a/arch/arm64/include/asm/mte-kasan.h b/arch/arm64/include/asm/mte-kasan.h
index 9f79425fc65a..598be32ed811 100644
--- a/arch/arm64/include/asm/mte-kasan.h
+++ b/arch/arm64/include/asm/mte-kasan.h
@@ -13,8 +13,73 @@
 
 #include <linux/types.h>
 
+#ifdef CONFIG_KASAN_HW_TAGS
+
+/* Whether the MTE asynchronous mode is enabled. */
+DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+	return static_branch_unlikely(&mte_async_or_asymm_mode);
+}
+
+#else /* CONFIG_KASAN_HW_TAGS */
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+	return false;
+}
+
+#endif /* CONFIG_KASAN_HW_TAGS */
+
 #ifdef CONFIG_ARM64_MTE
 
+/*
+ * The Tag Check Flag (TCF) mode for MTE is per EL, hence TCF0
+ * affects EL0 and TCF affects EL1 irrespective of which TTBR is
+ * used.
+ * The kernel accesses TTBR0 usually with LDTR/STTR instructions
+ * when UAO is available, so these would act as EL0 accesses using
+ * TCF0.
+ * However futex.h code uses exclusives which would be executed as
+ * EL1, this can potentially cause a tag check fault even if the
+ * user disables TCF0.
+ *
+ * To address the problem we set the PSTATE.TCO bit in uaccess_enable()
+ * and reset it in uaccess_disable().
+ *
+ * The Tag check override (TCO) bit disables temporarily the tag checking
+ * preventing the issue.
+ */
+static inline void __mte_disable_tco(void)
+{
+	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(0),
+				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+static inline void __mte_enable_tco(void)
+{
+	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(1),
+				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+/*
+ * These functions disable tag checking only if in MTE async mode
+ * since the sync mode generates exceptions synchronously and the
+ * nofault or load_unaligned_zeropad can handle them.
+ */
+static inline void __mte_disable_tco_async(void)
+{
+	if (system_uses_mte_async_or_asymm_mode())
+		__mte_disable_tco();
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+	if (system_uses_mte_async_or_asymm_mode())
+		__mte_enable_tco();
+}
+
 /*
  * These functions are meant to be only used from KASAN runtime through
  * the arch_*() interface defined in asm/memory.h.
@@ -138,6 +203,22 @@ void mte_enable_kernel_asymm(void);
 
 #else /* CONFIG_ARM64_MTE */
 
+static inline void __mte_disable_tco(void)
+{
+}
+
+static inline void __mte_enable_tco(void)
+{
+}
+
+static inline void __mte_disable_tco_async(void)
+{
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+}
+
 static inline u8 mte_get_ptr_tag(void *ptr)
 {
 	return 0xFF;
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 20dd06d70af5..c028afb1cd0b 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -178,14 +178,6 @@ static inline void mte_disable_tco_entry(struct task_struct *task)
 }
 
 #ifdef CONFIG_KASAN_HW_TAGS
-/* Whether the MTE asynchronous mode is enabled. */
-DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
-
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
-	return static_branch_unlikely(&mte_async_or_asymm_mode);
-}
-
 void mte_check_tfsr_el1(void);
 
 static inline void mte_check_tfsr_entry(void)
@@ -212,10 +204,6 @@ static inline void mte_check_tfsr_exit(void)
 	mte_check_tfsr_el1();
 }
 #else
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
-	return false;
-}
 static inline void mte_check_tfsr_el1(void)
 {
 }
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 5c7b2f9d5913..057ec1882326 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -136,55 +136,9 @@ static inline void __uaccess_enable_hw_pan(void)
 				CONFIG_ARM64_PAN));
 }
 
-/*
- * The Tag Check Flag (TCF) mode for MTE is per EL, hence TCF0
- * affects EL0 and TCF affects EL1 irrespective of which TTBR is
- * used.
- * The kernel accesses TTBR0 usually with LDTR/STTR instructions
- * when UAO is available, so these would act as EL0 accesses using
- * TCF0.
- * However futex.h code uses exclusives which would be executed as
- * EL1, this can potentially cause a tag check fault even if the
- * user disables TCF0.
- *
- * To address the problem we set the PSTATE.TCO bit in uaccess_enable()
- * and reset it in uaccess_disable().
- *
- * The Tag check override (TCO) bit disables temporarily the tag checking
- * preventing the issue.
- */
-static inline void __uaccess_disable_tco(void)
-{
-	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(0),
-				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
-}
-
-static inline void __uaccess_enable_tco(void)
-{
-	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(1),
-				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
-}
-
-/*
- * These functions disable tag checking only if in MTE async mode
- * since the sync mode generates exceptions synchronously and the
- * nofault or load_unaligned_zeropad can handle them.
- */
-static inline void __uaccess_disable_tco_async(void)
-{
-	if (system_uses_mte_async_or_asymm_mode())
-		__uaccess_disable_tco();
-}
-
-static inline void __uaccess_enable_tco_async(void)
-{
-	if (system_uses_mte_async_or_asymm_mode())
-		__uaccess_enable_tco();
-}
-
 static inline void uaccess_disable_privileged(void)
 {
-	__uaccess_disable_tco();
+	__mte_disable_tco();
 
 	if (uaccess_ttbr0_disable())
 		return;
@@ -194,7 +148,7 @@ static inline void uaccess_disable_privileged(void)
 
 static inline void uaccess_enable_privileged(void)
 {
-	__uaccess_enable_tco();
+	__mte_enable_tco();
 
 	if (uaccess_ttbr0_enable())
 		return;
@@ -302,8 +256,8 @@ do {									\
 #define get_user	__get_user
 
 /*
- * We must not call into the scheduler between __uaccess_enable_tco_async() and
- * __uaccess_disable_tco_async(). As `dst` and `src` may contain blocking
+ * We must not call into the scheduler between __mte_enable_tco_async() and
+ * __mte_disable_tco_async(). As `dst` and `src` may contain blocking
  * functions, we must evaluate these outside of the critical section.
  */
 #define __get_kernel_nofault(dst, src, type, err_label)			\
@@ -312,10 +266,10 @@ do {									\
 	__typeof__(dst) __gkn_dst = (dst);				\
 	__typeof__(src) __gkn_src = (src);				\
 	int __gkn_err = 0;						\
 									\
-	__uaccess_enable_tco_async();					\
+	__mte_enable_tco_async();					\
 	__raw_get_mem("ldr", *((type *)(__gkn_dst)),			\
 		      (__force type *)(__gkn_src), __gkn_err, K);	\
-	__uaccess_disable_tco_async();					\
+	__mte_disable_tco_async();					\
 									\
 	if (unlikely(__gkn_err))					\
 		goto err_label;						\
@@ -388,8 +342,8 @@ do {									\
 #define put_user	__put_user
 
 /*
- * We must not call into the scheduler between __uaccess_enable_tco_async() and
- * __uaccess_disable_tco_async(). As `dst` and `src` may contain blocking
+ * We must not call into the scheduler between __mte_enable_tco_async() and
+ * __mte_disable_tco_async(). As `dst` and `src` may contain blocking
  * functions, we must evaluate these outside of the critical section.
  */
 #define __put_kernel_nofault(dst, src, type, err_label)			\
@@ -398,10 +352,10 @@ do {									\
 	__typeof__(dst) __pkn_dst = (dst);				\
 	__typeof__(src) __pkn_src = (src);				\
 	int __pkn_err = 0;						\
 									\
-	__uaccess_enable_tco_async();					\
+	__mte_enable_tco_async();					\
 	__raw_put_mem("str", *((type *)(__pkn_src)),			\
 		      (__force type *)(__pkn_dst), __pkn_err, K);	\
-	__uaccess_disable_tco_async();					\
+	__mte_disable_tco_async();					\
 									\
 	if (unlikely(__pkn_err))					\
 		goto err_label;						\
diff --git a/arch/arm64/include/asm/word-at-a-time.h b/arch/arm64/include/asm/word-at-a-time.h
index 1c8e4f2490bf..f3b151ed0d7a 100644
--- a/arch/arm64/include/asm/word-at-a-time.h
+++ b/arch/arm64/include/asm/word-at-a-time.h
@@ -55,7 +55,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 {
 	unsigned long ret;
 
-	__uaccess_enable_tco_async();
+	__mte_enable_tco_async();
 
 	/* Load word from unaligned pointer addr */
 	asm(
@@ -65,7 +65,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 	: "=&r" (ret)
 	: "r" (addr), "Q" (*(unsigned long *)addr));
 
-	__uaccess_disable_tco_async();
+	__mte_disable_tco_async();
 
 	return ret;
 }
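For illustration, a rough sketch (untested, and not part of the patch
above) of how the KASAN side could consume these helpers once exposed;
the arch_suppress_tag_checks_*() wrappers and the shape of
start_report()/end_report() below are only illustrative:

/* arch/arm64/include/asm/memory.h (illustrative) */
#ifdef CONFIG_KASAN_HW_TAGS
#define arch_suppress_tag_checks_start()	__mte_enable_tco()
#define arch_suppress_tag_checks_stop()		__mte_disable_tco()
#else
#define arch_suppress_tag_checks_start()	do { } while (0)
#define arch_suppress_tag_checks_stop()		do { } while (0)
#endif

/* mm/kasan/report.c (illustrative) */
static void start_report(unsigned long *flags, bool sync)
{
	kasan_disable_current();	/* SW_TAGS recursion depth guard */
	spin_lock_irqsave(&report_lock, *flags);
	/*
	 * Set PSTATE.TCO for the duration of the report so that
	 * inspecting a badly tagged object cannot fault and recurse
	 * into start_report().
	 */
	arch_suppress_tag_checks_start();
}

static void end_report(unsigned long *flags, void *addr)
{
	arch_suppress_tag_checks_stop();
	spin_unlock_irqrestore(&report_lock, *flags);
	kasan_enable_current();
}

Hooking the same pair into kasan_disable_current()/kasan_enable_current()
for KASAN_HW_TAGS, as suggested above, would extend the guarantee to any
code that already brackets risky dereferences with those calls.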