Date: Mon, 11 Mar 2024 19:02:23 +0800
From: Changbin Du
To: Changbin Du
CC: Marco Elver, Alexander Potapenko, Andrew Morton
Subject: Re: [BUG] kmsan: instrumentation recursion problems
Message-ID: <20240311110223.nzsplk6a6lzxmzqi@M910t>
References: <20240308043448.masllzeqwht45d4j@M910t> <20240311093036.44txy57hvhevybsu@M910t>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20240311093036.44txy57hvhevybsu@M910t>

On Mon, Mar 11, 2024 at 05:30:36PM +0800, Changbin Du wrote:
> On Fri, Mar 08, 2024 at 10:39:15AM +0100, Marco Elver wrote:
> > On Fri, 8 Mar 2024 at 05:36, 'Changbin Du' via kasan-dev
> > wrote:
> > >
> > > Hey, folks,
> > > I found two instrumentation recursion issues on mainline kernel.
> > >
> > > 1. recur on preempt count.
> > > __msan_metadata_ptr_for_load_4() -> kmsan_virt_addr_valid() -> preempt_disable() -> __msan_metadata_ptr_for_load_4()
> > >
> > > 2. recur in lockdep and rcu
> > > __msan_metadata_ptr_for_load_4() -> kmsan_virt_addr_valid() -> pfn_valid() -> rcu_read_lock_sched() -> lock_acquire() -> rcu_is_watching() -> __msan_metadata_ptr_for_load_8()
> > >
> > >
> > > Here is an unofficial fix, I don't know if it will generate false reports.
> > >
> > > $ git show
> > > commit 7f0120b621c1cbb667822b0f7eb89f3c25868509 (HEAD -> master)
> > > Author: Changbin Du
> > > Date:   Fri Mar 8 20:21:48 2024 +0800
> > >
> > >     kmsan: fix instrumentation recursions
> > >
> > >     Signed-off-by: Changbin Du
> > >
> > > diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> > > index 0db4093d17b8..ea925731fa40 100644
> > > --- a/kernel/locking/Makefile
> > > +++ b/kernel/locking/Makefile
> > > @@ -7,6 +7,7 @@ obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
> > >
> > >  # Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
> > >  KCSAN_SANITIZE_lockdep.o := n
> > > +KMSAN_SANITIZE_lockdep.o := n
> >
> > This does not result in false positives?
>
> This does result in lots of false positives.
> I saw a lot of reports but seems not related to this.
>
> [ 2.742743][ T0] BUG: KMSAN: uninit-value in unwind_next_frame+0x3729/0x48a0
> [ 2.744404][ T0] unwind_next_frame+0x3729/0x48a0
> [ 2.745623][ T0] arch_stack_walk+0x1d9/0x2a0
> [ 2.746838][ T0] stack_trace_save+0xb8/0x100
> [ 2.747928][ T0] set_track_prepare+0x88/0x120
> [ 2.749095][ T0] __alloc_object+0x602/0xbe0
> [ 2.750200][ T0] __create_object+0x3f/0x4e0
> [ 2.751332][ T0] pcpu_alloc+0x1e18/0x2b00
> [ 2.752401][ T0] mm_init+0x688/0xb20
> [ 2.753436][ T0] mm_alloc+0xf4/0x180
> [ 2.754510][ T0] poking_init+0x50/0x500
> [ 2.755594][ T0] start_kernel+0x3b0/0xbf0
> [ 2.756724][ T0] __pfx_reserve_bios_regions+0x0/0x10
> [ 2.758073][ T0] x86_64_start_kernel+0x92/0xa0
> [ 2.759320][ T0] secondary_startup_64_no_verify+0x176/0x17b
>
The above reports are triggered by KMEMLEAK and KFENCE.

With the fix below, I was able to run a KMSAN kernel with:
  CONFIG_DEBUG_KMEMLEAK=n
  CONFIG_KFENCE=n
  CONFIG_LOCKDEP=n

KMEMLEAK and KFENCE generate too many false positives in the unwinding code.
LOCKDEP still introduces instrumentation recursion.

> >
> > Does
> >      KMSAN_ENABLE_CHECKS_lockdep.o := n
> > work as well? If it does, that is preferred because it makes sure
> > there are no false positives if the lockdep code unpoisons data that
> > is passed and used outside lockdep.
> >
> > lockdep has a serious impact on performance, and not sanitizing it
> > with KMSAN is probably a reasonable performance trade-off.
> >
> Disabling checks is not working here.
> The recursion becomes this:
>
> __msan_metadata_ptr_for_load_4() -> kmsan_get_metadata() -> virt_to_page_or_null() -> pfn_valid() -> lock_acquire() -> __msan_unpoison_alloca() -> kmsan_get_metadata()
>
> > >  ifdef CONFIG_FUNCTION_TRACER
> > >  CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b2bccfd37c38..8935cc866e2d 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -692,7 +692,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
> > >   * Make notrace because it can be called by the internal functions of
> > >   * ftrace, and making this notrace removes unnecessary recursion calls.
> > >   */
> > > -notrace bool rcu_is_watching(void)
> > > +notrace __no_sanitize_memory bool rcu_is_watching(void)
> >
> > For all of these, does __no_kmsan_checks instead of __no_sanitize_memory work?
> > Again, __no_kmsan_checks (function-only counterpart to
> > KMSAN_ENABLE_CHECKS_.... := n) is preferred if it works as it avoids
> > any potential false positives that would be introduced by not
> > instrumenting.
> >
> This works because it is not unpoisoning local variables.
>
> > >  {
> > >  	bool ret;
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 9116bcc90346..33aa4df8fd82 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -5848,7 +5848,7 @@ static inline void preempt_latency_start(int val)
> > >  	}
> > >  }
> > >
> > > -void preempt_count_add(int val)
> > > +void __no_sanitize_memory preempt_count_add(int val)
> > >  {
> > >  #ifdef CONFIG_DEBUG_PREEMPT
> > >  	/*
> > > @@ -5880,7 +5880,7 @@ static inline void preempt_latency_stop(int val)
> > >  		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
> > >  }
> > >
> > > -void preempt_count_sub(int val)
> > > +void __no_sanitize_memory preempt_count_sub(int val)
> > >  {
> > >  #ifdef CONFIG_DEBUG_PREEMPT
> > >
> > >
> > > --
> > > Cheers,
> > > Changbin Du
>
> --
> Cheers,
> Changbin Du

--
Cheers,
Changbin Du
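
For readers following the thread, the first recursion chain (hook -> helper -> hook) can be modelled in user space. The sketch below is not kernel code and does not use any KMSAN API: `metadata_hook()` stands in for `__msan_metadata_ptr_for_load_4()`, `helper()` stands in for `preempt_count_add()`, and the `helper_instrumented` flag is an assumed way to model whether the compiler inserted a hook call into the helper (the effect that `__no_sanitize_memory` removes). `DEPTH_CAP` and `deepest_nesting()` are hypothetical names that exist only so the demo terminates and can be checked.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not kernel APIs. */
#define DEPTH_CAP 64             /* demo-only cut-off in place of a stack overflow */

static bool helper_instrumented; /* true: helper is built with instrumentation */
static int depth;                /* current nesting of the hook */
static int deepest;              /* deepest nesting seen in this run */

static void metadata_hook(void);

/* Stands in for preempt_count_add(): when instrumented, the compiler
 * inserts a hook call before its memory accesses (modelled by hand). */
static void helper(void)
{
    if (helper_instrumented)
        metadata_hook();
}

/* Stands in for __msan_metadata_ptr_for_load_4(), which calls the helper. */
static void metadata_hook(void)
{
    if (depth >= DEPTH_CAP)
        return;                  /* stop the demo before the stack runs out */
    depth++;
    if (depth > deepest)
        deepest = depth;
    helper();
    depth--;
}

/* Runs one top-level hook call and returns the deepest nesting reached. */
int deepest_nesting(bool instrument_helper)
{
    helper_instrumented = instrument_helper;
    depth = 0;
    deepest = 0;
    metadata_hook();
    assert(depth == 0);          /* every entry was matched by an exit */
    return deepest;
}
```

With the helper instrumented, `deepest_nesting(true)` keeps re-entering the hook until it hits the demo cap; with instrumentation removed from the helper, `deepest_nesting(false)` stops at a nesting depth of one. The model says nothing about false positives, which is the open question in the discussion above.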