Date: Sun, 16 May 2021 15:58:53 -0700
From: "Paul E. McKenney"
To: yanfei.xu@windriver.com
Cc: josh@joshtriplett.org, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, joel@joelfernandes.org, rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock
Message-ID: <20210516225853.GD4441@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20210516095010.3657134-1-yanfei.xu@windriver.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210516095010.3657134-1-yanfei.xu@windriver.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, May 16, 2021 at 05:50:10PM +0800, yanfei.xu@windriver.com wrote:
> From: Yanfei Xu
>
> rcu_node->lock isn't released in rcu_print_task_stall() if the rcu_node
> don't contain tasks which blocking the GP. However this rcu_node->lock
> will be used again in rcu_dump_cpu_stacks() soon while the ndetected is
> non-zero. As a result the cpu will hung by this deadlock.
>
> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> Signed-off-by: Yanfei Xu

Also a good catch, thank you! Queued for further review and testing,
wordsmithed as shown below.

The rcutorture scripts have been known to work on ARM in the past, and
might still do so. (I test on x86.)

As always, please check to make sure that I didn't mess something up.

							Thanx, Paul

------------------------------------------------------------------------

commit e0a9b77f245ae4fe1537120fd5319bf9e091618e
Author: Yanfei Xu
Date:   Sun May 16 17:50:10 2021 +0800

    rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock

    If rcu_print_task_stall() is invoked on an rcu_node structure that does
    not contain any tasks blocking the current grace period, it takes an
    early exit that fails to release that rcu_node structure's lock. This
    results in a self-deadlock, which is detected by lockdep.

    To reproduce this bug:

    tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0"

    This will also result in other complaints, including RCU's scheduler
    hook complaining about blocking rather than preemption and an rcutorture
    writer stall.

    Only a partial RCU CPU stall warning message will be printed because of
    the self-deadlock.

    This commit therefore releases the lock on the rcu_print_task_stall()
    function's early exit path.

    Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
    Signed-off-by: Yanfei Xu
    Signed-off-by: Paul E. McKenney

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a10ea1f1f81f..d574e3bbd929 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -267,8 +267,10 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 	struct task_struct *ts[8];
 
 	lockdep_assert_irqs_disabled();
-	if (!rcu_preempt_blocked_readers_cgp(rnp))
+	if (!rcu_preempt_blocked_readers_cgp(rnp)) {
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return 0;
+	}
 	pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
 	       rnp->level, rnp->grplo, rnp->grphi);
 	t = list_entry(rnp->gp_tasks->prev,
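The failure mode fixed above (a callee that is handed a held lock and must release it on every return path, but misses one) can be reproduced outside the kernel. What follows is a minimal userspace sketch, not kernel code, assuming POSIX threads: an error-checking pthread mutex stands in for the rcu_node ->lock, and the EDEADLK it returns when the owning thread tries to take it again stands in for lockdep's self-deadlock report. The names struct node, nblocked, node_init(), print_task_stall_buggy(), print_task_stall_fixed() and check() are hypothetical and do not exist in the kernel.

```c
/*
 * Userspace sketch only (hypothetical names, not kernel code): an
 * error-checking mutex plays the role of rcu_node ->lock, and EDEADLK on a
 * same-thread relock plays the role of lockdep's self-deadlock report.
 */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

struct node {
	pthread_mutex_t lock;	/* stand-in for rcu_node ->lock */
	int nblocked;		/* stand-in for tasks blocking the grace period */
};

static void node_init(struct node *n, int nblocked)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	/* ERRORCHECK: a relock by the owner returns EDEADLK instead of hanging. */
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&n->lock, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);
	n->nblocked = nblocked;
}

/* Called with n->lock held; must release it before returning. */
static int print_task_stall_buggy(struct node *n)
{
	if (n->nblocked == 0)
		return 0;	/* BUG: early exit leaves n->lock held */
	printf("%d task(s) blocked\n", n->nblocked);
	pthread_mutex_unlock(&n->lock);
	return n->nblocked;
}

/* Same contract, with the early exit releasing the lock as in the patch. */
static int print_task_stall_fixed(struct node *n)
{
	if (n->nblocked == 0) {
		pthread_mutex_unlock(&n->lock);
		return 0;
	}
	printf("%d task(s) blocked\n", n->nblocked);
	pthread_mutex_unlock(&n->lock);
	return n->nblocked;
}

/*
 * Models the caller: take the lock, hand it to the printer, then take it
 * again, mirroring the reacquisition in rcu_dump_cpu_stacks() described in
 * the report. Returns 0 on success or EDEADLK if the printer leaked the lock.
 */
static int check(struct node *n, int (*printer)(struct node *))
{
	int ret;

	pthread_mutex_lock(&n->lock);
	printer(n);
	ret = pthread_mutex_lock(&n->lock);
	/* Either way this thread now holds the lock exactly once; drop it. */
	pthread_mutex_unlock(&n->lock);
	return ret;
}
```

Built with `cc -pthread`, check() on a node with nblocked == 0 returns EDEADLK for print_task_stall_buggy() and 0 for print_task_stall_fixed(); with nblocked > 0 both return 0, consistent with the report that only the no-blocked-tasks path was affected. Releasing the lock right at the early return keeps the change minimal, as the patch does; funnelling every exit through a single unlock label is a common alternative in C.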