Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp1835863pxk; Tue, 1 Sep 2020 08:56:03 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyk8YdTy3dYS+i0Z3LguzxQCu+D+aNdStZCbSpGhHsyGgX7QraasNMC7Xvk8Ms2+vo5Kcjy X-Received: by 2002:a17:906:d282:: with SMTP id ay2mr1908978ejb.265.1598975762808; Tue, 01 Sep 2020 08:56:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1598975762; cv=none; d=google.com; s=arc-20160816; b=hfD8VhLs5JDz8zUF23j4dPgi+aUECTyAoacBbDZpcamqNBMonHEP70AR1j/X6Mth4H MRnnOYmtEkDZ0uF18D5++9cHrSEK9JOGCYlGE9rdYiOJy8Ojb38IQVygV0IVjMHkd0jn AYL+Z0fE/pgb99Dj0/BdMoC42kOJphb4udDBboB5cNEpoFC4GBZVGYZ8/Wdt+9azuWvO swi0ngpuoYwmoZnKH4pzFpStc1hKlLldK3Fo7vzp3CdR+FCqzNw4yhEY2JstqEogc31Q UGzZdpOnmTZ26gFUPwj1RR072N02DX4bCLpy2O150ucONNtfXPp3fDR+vaY0OeK0VZRG QmVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dmarc-filter:dkim-signature; bh=9RN/DXcpCgvBkcZ0mIGG04s1Hi8vxYfEBWSAyyjyMVw=; b=bGDe5XQoQqkAQb/BJjhTliDYUx8BcDHU3AzOVDgYDptVJke7HkRy0W4pfCF7i/c3/N uIytBxpBZeL3gwiREcKvkx/rcM3JurlY9DWe4c8TCcot6uG35chUZQYeEtRM/pfbmQkj BbUSm9i+K/YRtE1TKYVQuDrBS4irHiUp6AZgxqDA9Vf8v6QKNcUkgd8YjcCZebKXjMsr I5vB+Vt46tTJkUOwhxKopniLdai9cxmeoEGLpEetoNd3UyrGDhP7npsSWixtQiRtn1GN KQpXYKCBYZBX5Y2XoONVb4rd0lHLj1nGGF/08Uy4l5G8wxEUHLrk63FQQ6YLwuMg9js+ Fu9A== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@mg.codeaurora.org header.s=smtp header.b=sS0kdCHX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j8si734145edl.471.2020.09.01.08.55.39; Tue, 01 Sep 2020 08:56:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@mg.codeaurora.org header.s=smtp header.b=sS0kdCHX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731811AbgIAPs0 (ORCPT + 99 others); Tue, 1 Sep 2020 11:48:26 -0400 Received: from m43-7.mailgun.net ([69.72.43.7]:45489 "EHLO m43-7.mailgun.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730653AbgIAPoh (ORCPT ); Tue, 1 Sep 2020 11:44:37 -0400 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1598975075; h=In-Reply-To: Content-Type: MIME-Version: References: Message-ID: Subject: Cc: To: From: Date: Sender; bh=9RN/DXcpCgvBkcZ0mIGG04s1Hi8vxYfEBWSAyyjyMVw=; b=sS0kdCHXqZW0tktlsqJIG+stUdbZ3alGb6gNFfSGSBh63zhFC+3oCTo0ymS3MnU2BAUeIWvn 1BH6hNWFmPFuzdxpzHMAyXi0JQeST/HVa+w5yXwWfcE+OAMKyp4RYYVs1Z3+0La2MCJTcN/J s6NdWCRnbnxR1uixyp3lXQoZxUQ= X-Mailgun-Sending-Ip: 69.72.43.7 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by smtp-out-n05.prod.us-west-2.postgun.com with SMTP id 5f4e6c559bdf68cc03daec13 (version=TLS1.2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); Tue, 01 Sep 2020 15:44:21 GMT Received: by smtp.codeaurora.org (Postfix, from userid 1001) id D1EACC433CB; Tue, 1 Sep 2020 15:44:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=2.0 tests=ALL_TRUSTED,SPF_NONE, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from localhost (i-global254.qualcomm.com [199.106.103.254]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: ilina) by smtp.codeaurora.org (Postfix) with ESMTPSA id 4CC86C433CB; Tue, 1 Sep 2020 15:44:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 4CC86C433CB Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; spf=none smtp.mailfrom=ilina@codeaurora.org Date: Tue, 1 Sep 2020 09:44:17 -0600 From: Lina Iyer To: Ulf Hansson Cc: Peter Zijlstra , Naresh Kamboju , paulmck@kernel.org, "Rafael J. Wysocki" , Saravana Kannan , open list , linux-mmc , lkft-triage@lists.linaro.org, rcu@vger.kernel.org, Linux PM , Anders Roxell , Arnd Bergmann , Rajendra Nayak , John Stultz , Stephen Boyd , Lars Povlsen , madhuparnabhowmik10@gmail.com, Viresh Kumar , Vincent Guittot , Thomas Gleixner Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper Message-ID: <20200901154417.GD20303@codeaurora.org> References: <20200831194402.GD2855@paulmck-ThinkPad-P72> <20200901104206.GU1362448@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Ulf, On Tue, Sep 01 2020 at 06:41 -0600, Ulf Hansson wrote: >On Tue, 1 Sep 2020 at 14:35, Ulf Hansson wrote: >> >> On Tue, 1 Sep 2020 at 12:42, wrote: >> > >> > On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote: >> > > On Tue, 1 Sep 2020 at 08:46, Ulf Hansson wrote: >> > > > On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney wrote: >> > >> > > > > > [ 5.308588] ============================= >> > > > > > [ 5.308593] WARNING: suspicious RCU usage >> > > > > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper >> > > > > > [ 5.320052] 5.9.0-rc3 #1 Not tainted >> > > > > > [ 5.320057] ----------------------------- >> > > > > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! >> > > > > > [ 5.320068] >> > > > > > [ 5.320068] other info that might help us debug this: >> > > > > > [ 5.320068] >> > > > > > [ 5.320074] >> > > > > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1 >> > > > > > [ 5.320078] RCU used illegally from extended quiescent state! >> > > > > > [ 5.320084] no locks held by swapper/0/0. >> > > > > > [ 5.320089] >> > > > > > [ 5.320089] stack backtrace: >> > > > > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1 >> > > > > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO >> > > > > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) >> > > > > > [ 5.346452] Call trace: >> > > > > > [ 5.346463] dump_backtrace+0x0/0x1f8 >> > > > > > [ 5.346471] show_stack+0x2c/0x38 >> > > > > > [ 5.346480] dump_stack+0xec/0x15c >> > > > > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8 >> > > > > > [ 5.346499] lock_acquire+0x3d0/0x440 >> > > > > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0 >> > > > > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0 >> > > > > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0 >> > > > > > [ 5.421795] cpuidle_enter_state+0xc8/0x610 >> > > > > > [ 5.426392] cpuidle_enter+0x3c/0x50 >> > > > > > [ 5.430561] call_cpuidle+0x44/0x80 >> > > > > > [ 5.434378] do_idle+0x240/0x2a0 >> > >> > > > > Note also that Peter Zijlstra (CCed) is working to shrink the portion >> > > > > of the idle loop that RCU ignores. Not sure that it covers your >> > > > > case, but it is worth checking. >> > >> > Right, so I think I 'caused' this by making the lock tracepoints >> > visible. That is, the error always existed, now we actually warn about >> > it. >> > >> > > > Thanks for letting me know. Let's see what Peter thinks about this then. >> > > > >> > > > Apologize for my ignorance, but from a cpuidle point of view, what >> > > > does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE >> > > > on bigger code paths? >> > > > >> > > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend() >> > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps >> > > > that's the easiest approach, at least to start with. >> > > > I think this would be nice. This should also cover the case, where PM domain power off notification callbacks call trace function internally. Right? --Lina >> > > > Or do you have any other ideas? >> > >> > So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we >> > got the ordering wrong and are papering over it. That said, that's been >> > the modus operandi for a while now, just make it shut up and don't think >> > about it :-/ >> > >> > That said; I pushed the rcu_idle_enter() about as deep as it goes into >> > generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle >> > deeper into the idle path") >> >> Aha, that commit should fix this problem, I think. Looks like that >> commit was sent as a fix and included in the recent v5.9-rc3. >> >> Naresh, can you try with the above commit? > >Ah, just realized that I misread the patch. It doesn't fix it. > >We would still need a RCU_NONIDLE() in psci_enter_domain_idle_state() >- or something along the lines of what you suggest below. > >Apologies for the noise. > >Kind regards >Uffe > [1]. https://lkml.org/lkml/2020/8/26/81 >> >> > >> > I suppose the next step is pushing it into individual driver when >> > needed, something like the below perhaps. I realize the coupled idle >> > state stuff is more complicated that most, but it's also not an area >> > I've looked at in detail, so perhaps I've just made a bigger mess, but >> > it ought to give you enough to get going I think. >> >> These aren't coupled states. Instead, in cpuidle-psci, we are using PM >> domains through genpd and runtime PM to manage "shared idle states" >> between CPUs. >> >> Kind regards >> Uffe >> >> > >> > Rafael? >> > >> > --- >> > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c >> > index 74463841805f..617bbef316e6 100644 >> > --- a/drivers/cpuidle/cpuidle-psci.c >> > +++ b/drivers/cpuidle/cpuidle-psci.c >> > @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void) >> > >> > static inline int psci_enter_state(int idx, u32 state) >> > { >> > + /* >> > + * XXX push rcu_idle_enter into the coupled code >> > + */ >> > return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state); >> > } >> > >> > @@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev, >> > if (!state) >> > state = states[idx]; >> > >> > + rcu_idle_enter(); >> > ret = psci_cpu_suspend_enter(state) ? -1 : idx; >> > + rcu_idle_exit(); >> > >> > pm_runtime_get_sync(pd_dev); >> > >> > @@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev, >> > struct cpuidle_driver *drv, int idx) >> > { >> > u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states); >> > + int ret; >> > >> > - return psci_enter_state(idx, state[idx]); >> > + rcu_idle_enter(); >> > + ret = psci_enter_state(idx, state[idx]); >> > + rcu_idle_exit(); >> > + >> > + return ret; >> > } >> > >> > static const struct of_device_id psci_idle_state_match[] = { >> > @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv, >> > * deeper states. >> > */ >> > drv->states[state_count - 1].enter = psci_enter_domain_idle_state; >> > + drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE; >> > psci_cpuidle_use_cpuhp = true; >> > >> > return 0; >> > @@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu) >> > * state index 0. >> > */ >> > drv->states[0].enter = psci_enter_idle_state; >> > + drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE; >> > drv->states[0].exit_latency = 1; >> > drv->states[0].target_residency = 1; >> > drv->states[0].power_usage = UINT_MAX; >> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c >> > index 04becd70cc41..3dbac3bb761b 100644 >> > --- a/drivers/cpuidle/cpuidle.c >> > +++ b/drivers/cpuidle/cpuidle.c >> > @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, >> > time_start = ns_to_ktime(local_clock()); >> > >> > stop_critical_timings(); >> > - rcu_idle_enter(); >> > + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) >> > + rcu_idle_enter(); >> > entered_state = target_state->enter(dev, drv, index); >> > - rcu_idle_exit(); >> > + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) >> > + rcu_idle_exit(); >> > start_critical_timings(); >> > >> > sched_clock_idle_wakeup_event(); >> > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h >> > index 75895e6363b8..47f686131a54 100644 >> > --- a/include/linux/cpuidle.h >> > +++ b/include/linux/cpuidle.h >> > @@ -82,6 +82,7 @@ struct cpuidle_state { >> > #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */ >> > #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */ >> > #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */ >> > +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */ >> > >> > struct cpuidle_device_kobj; >> > struct cpuidle_state_kobj;