Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp1157692pxu; Fri, 27 Nov 2020 00:45:07 -0800 (PST) X-Google-Smtp-Source: ABdhPJxN1UhPUVA3LiMGMAVeARE54Q4rVVqB8jQEb/GbgD5vRevjJLnXIPX2sjL0BF5zHLVBcyKo X-Received: by 2002:a50:b224:: with SMTP id o33mr6421010edd.21.1606466707216; Fri, 27 Nov 2020 00:45:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1606466707; cv=none; d=google.com; s=arc-20160816; b=SSEiKLTH6XP40xMzSAeHd8WMmx0LaltiEDaGLiwlCPWuy5Z8fs0TTPWfFphHe1rfMa jI3CbrW/O/tC/7Xyvx1fCOwIXMeiB1xtrbvlHdUKNaKK7d1VRbAitzWWTnO37RS8TIk9 RtfHM/7eg23PmGjhsYYVwUJpbIDtZSB/4M+E5mDcVRxO26AyqFy5s1K6kTySMl5IJx6U MpklI4SxrdcJgxJ3gHEKnVWvajW7RmYpDV/CYXbgveuHODw7IzILTfZbbVq4vNJ+rzri ZkdmEN8ggLmt110zPJ+ix/RNKzmEZItmzrGpJXfaZ/2uFbJYRcN6jW5Ln2rzdWfHLjYE d30w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=9VJV37+Gg+q0TSMlXsh7ehK58Ni+2SnosCNUVxyRC4Y=; b=EBgj5ESLB3j1rLG5jx32Wc0ombUbRC0hRjCUiJV/wLgWVLZo0WkqS0IzinUE72JYo0 olHaHUexyIBSAgcPJ6ZJebwhvN25jLyPbM4utfndQDAuWp2b0sBr8LlytL1NGKnoA9fq BWtfD7JulzQhdL/s+tSMPwH1qiThIGOEJhY4ZFDsE6iiCGTd7lkvnrnjHgtiN4vVLQEL sVWAxZHrIIyR69EJl9hmnUTs1jBqMwFUmx1Zdfd4Zugrrk9s+HO0yQGqMZypO/9OqeO5 zXiUBLaYbmBfOBfZwT2/wBcTkErl+Z6nqeQ4pm9vRsbxErng+nQ+Su6eIIyw78kwTbsF p63w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=l7vIXpop; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ga18si4641431ejc.519.2020.11.27.00.44.44; Fri, 27 Nov 2020 00:45:07 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=l7vIXpop; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389322AbgKZVm1 (ORCPT + 99 others); Thu, 26 Nov 2020 16:42:27 -0500 Received: from mail.kernel.org ([198.145.29.99]:58760 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726178AbgKZVm1 (ORCPT ); Thu, 26 Nov 2020 16:42:27 -0500 Received: from paulmck-ThinkPad-P72.home (50-39-104-11.bvtn.or.frontiernet.net [50.39.104.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 5A66320872; Thu, 26 Nov 2020 21:42:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1606426946; bh=Hy1ZQK9dRMU3ToqMmxOyHjgGadc+D2Wc65UGFap7/Kg=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=l7vIXpopAMloE7428TO49IBDwok55u7vtDQbfzUgssbE87lVSbVZREo5mAlTASoT7 eLR8K1pM4HUuzl3RsLIqiJdFfXVN4GJcj/PyWeoS98u5hfc787uEnGZU6/sLQbtXn0 Q6RJYt+ISJdmlOAYmGtVJWJ6gevxkaVL2f1YEdq0= Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id 367243522FFC; Thu, 26 Nov 2020 13:42:26 -0800 (PST) Date: Thu, 26 Nov 2020 13:42:26 -0800 From: "Paul E. McKenney" To: Dexuan Cui Cc: "boqun.feng@gmail.com" , Ingo Molnar , "rcu@vger.kernel.org" , vkuznets , Michael Kelley , "linux-kernel@vger.kernel.org" Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion() Message-ID: <20201126214226.GS1437@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20201126154630.GR1437@paulmck-ThinkPad-P72> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 26, 2020 at 09:25:28PM +0000, Dexuan Cui wrote: > > From: Paul E. McKenney > > Sent: Thursday, November 26, 2020 7:47 AM > > ... > > The rcu_segcblist_n_cbs() function returns non-zero because something > > invoked call_rcu() some time previously. The ftrace facility (or just > > a printk) should help you work out where that call_rcu() is located. > > call_rcu() is indeed called multiple times, but as you said, this should > be normal. Good to know, thank you! > > My best guess is that the underlying bug is that you are invoking > > rcu_barrier() before the RCU grace-period kthread has been created. > > This means that RCU grace periods cannot complete, which in turn means > > that if there has been even one invocation of call_rcu() since boot, > > rcu_barrier() cannot complete, which is what you are in fact seeing. > > Please note that it is perfectly legal to invoke call_rcu() very early in > > the boot process, as in even before the call to rcu_init(). Therefore, > > if this is the case, the bug is the early call to rcu_barrier(), not > > the early calls to call_rcu(). > > > > To check this, at the beginning of rcu_barrier(), check the value of > > rcu_state.gp_kthread. If my guess is correct, it will be NULL. > > Unluckily, it's not NULL here. :-) You can't have everything! ;-) > > Another possibility is that rcu_state.gp_kthread is non-NULL, but that > > something else is preventing RCU grace periods from completing, but in > > It looks like somehow the scheduling is not working here: in rcu_barrier() > , if I replace the wait_for_completion() with > wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ), the > issue persists. Have you tried using sysreq-t to see what the various tasks are doing? One way that this can happen is if whatever task is currently running has managed to enter long loop with interrupts disabled. > > that case you should see RCU CPU stall warnings. Unless of course they > > have been disabled. > > Thanx, Paul > > I guess I didn't disable the wanrings (I don't even know how to do that :) Having interrupts disabled on all CPUs would have the effect of disabling the RCU CPU stall warnings. The intended method is in Documentation/admin-guide/kernel-parameters.txt. Search for rcu_cpu_stall_suppress. Not that it seems important at this point. Thanx, Paul > grep RCU .config > # RCU Subsystem > CONFIG_TREE_RCU=y > # CONFIG_RCU_EXPERT is not set > CONFIG_SRCU=y > CONFIG_TREE_SRCU=y > CONFIG_TASKS_RCU_GENERIC=y > CONFIG_TASKS_RUDE_RCU=y > CONFIG_TASKS_TRACE_RCU=y > CONFIG_RCU_STALL_COMMON=y > CONFIG_RCU_NEED_SEGCBLIST=y > CONFIG_RCU_NOCB_CPU=y > # end of RCU Subsystem > CONFIG_MMU_GATHER_RCU_TABLE_FREE=y > # RCU Debugging > # CONFIG_RCU_SCALE_TEST is not set > # CONFIG_RCU_TORTURE_TEST is not set > # CONFIG_RCU_REF_SCALE_TEST is not set > CONFIG_RCU_CPU_STALL_TIMEOUT=30 > CONFIG_RCU_TRACE=y > CONFIG_RCU_EQS_DEBUG=y > # end of RCU Debugging > > Thanks, > -- Dexuan >