Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp603454pxj; Thu, 17 Jun 2021 09:33:32 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzrWMPktTu0wz0iFJr6wojvFOvl5hMxKh/gXkb+soL45usOh5RgyMlhy/mtc/AYauASzVtk X-Received: by 2002:aa7:d284:: with SMTP id w4mr7845883edq.347.1623947612262; Thu, 17 Jun 2021 09:33:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623947612; cv=none; d=google.com; s=arc-20160816; b=MbJ1DdbQLJE/zmmkFOqXAthRbrbWOk5nrH/0R5aQJyxhWSYOZiTZlq99q4kgDN23e5 4pWXKC1J7xBfxdCDRwGxcR3lcyCFiRHOwDWPGqY0XnBMKx22QDfoBoMkPnm01k6XraVR QJ0MTXB63zzTFMhheVsMXVzlFLbHvacF2SJMOZedUPlxsLnQ/zZfmx6whH+AYrm1dqgb p8YaGOwHYCOXIdSEucOT1QkSg3cvOJkimVVd72zQMbIejhLJRLRBhqh5oueoy0bIBbsr DHbNPV38NtKmsvNjvJlxC1lBV/STB092PBhuP/Jv8pWBjYfFcWRFeRTCQUhAT5+Z2PV+ 5sRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=UIwDw5It67gtmh1VzMKodpZv7Vdx72o6JvpV1IeQnOg=; b=tM2Jz86aJFMvZiZX6QhggV8ZDlbfDQeE8i5s1SZzt5W0/YYSPw08MNXHQ67qfg1acL BV05GuIpkN/25qC5x5/vo4pB4byE/ydrCR7H+nnp3sSLjEOksGlkSYBwy7ODjEWTs2s3 mwXMEjcI6Q3C9ZN4lIjivDe8+VH31s26gYWDbc93Z4Fu95/4W5PCA4DaUuEjvD4wDXua Wd3GkfruPWV1+pF8h0p7pUCAxU60fG3pBfuDgXgJHscCLuP5KyZkU3eMR4WvbWbJNsNo TTMJOHDvlYc8tr8nuSCcfcy4r8Lzijj/ElJAHZ2qVsyjoxz7miEl73cC0jCyMA5+kond aGpw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=JF84FlMY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y9si5867346eda.518.2021.06.17.09.33.09; Thu, 17 Jun 2021 09:33:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=JF84FlMY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232156AbhFQQVQ (ORCPT + 99 others); Thu, 17 Jun 2021 12:21:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231607AbhFQQVO (ORCPT ); Thu, 17 Jun 2021 12:21:14 -0400 Received: from mail-lj1-x22f.google.com (mail-lj1-x22f.google.com [IPv6:2a00:1450:4864:20::22f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E4C60C06175F for ; Thu, 17 Jun 2021 09:19:06 -0700 (PDT) Received: by mail-lj1-x22f.google.com with SMTP id a21so4846771ljj.1 for ; Thu, 17 Jun 2021 09:19:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=UIwDw5It67gtmh1VzMKodpZv7Vdx72o6JvpV1IeQnOg=; b=JF84FlMYzrgpsHqrEnua1qeDPPUmo4iEY1m1KI3nMy4BtYk4nt8BpgbzIY2AI+oVrM ylPmtmacWTin/4AMe1HKaQCim3OsL5UGKNoQpuS/lWDLHkqgj8OcstirEQPq6ziU/nXf h1yYDTc+GGpusnTSWfdp8lT17VTE4LSRx4hF/QKG5uwZBaY0vGJKPlUmHj6MVdZCyQY3 X59TgiB25zbg94Sp8HU3Cfab518FEnfgQsE2YUb3neUs57NWxTpkR3HrI4WvwF7td3Er b3Ir+mCvSNQpLiBDnT9Iv20K7DQgyBLtSi0W0rtIk2pIa+iPV69lOUN3WwEJ5xKYHZrO XD/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=UIwDw5It67gtmh1VzMKodpZv7Vdx72o6JvpV1IeQnOg=; b=lW3k6NlZqUZIbSNf4Sc0J44KzaxHi6DIB1d2YGUuOOZ7f/L3WGrraNgvf/1uteNIei HTfocmg6Iaz1f79xgAy/lsFUF0qEIy3nMdqnbvtZGzbvfwuQQknQ0AMy9gfe92YIUdwd NpCYC/okzJzzoeb+fkJfCLaoCltAFmys8kodTwWKiEHFjoNlAmSIdsF1czESKk2QgZpP ybsjUfwcQMazmZ9eT7zFPjXJ8SCws9dgaQAL+owQrkUSap2dcxu9oY4YnPKBw80doYnk nN4s9XfgVux67cTxV681fFw4yPeZUrGUudAcDrlaX9lsNSPuAPEgv/otNP6goREiFRF5 Rsag== X-Gm-Message-State: AOAM532FBabCPT7FjXysGRt3t0GtgQAQZ75E2J06mq9SZjJ2W0q/hgbp mTgpS8S0yVmut/8AzdvRtdsXwA== X-Received: by 2002:a2e:8802:: with SMTP id x2mr5422195ljh.245.1623946745144; Thu, 17 Jun 2021 09:19:05 -0700 (PDT) Received: from [192.168.1.211] ([37.153.55.125]) by smtp.gmail.com with ESMTPSA id j16sm623312lfk.155.2021.06.17.09.19.04 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 17 Jun 2021 09:19:04 -0700 (PDT) Subject: Re: [PATCH 2/2] PM: domain: use per-genpd lockdep class To: Ulf Hansson , Bjorn Andersson Cc: Mark Brown , Stephen Boyd , "Rafael J. Wysocki" , Kevin Hilman , Greg Kroah-Hartman , Linux PM , Linux Kernel Mailing List , Rajendra Nayak , Rob Herring References: <20210611101540.3379937-1-dmitry.baryshkov@linaro.org> <20210611101540.3379937-3-dmitry.baryshkov@linaro.org> From: Dmitry Baryshkov Message-ID: <9136597d-f30b-bf75-77c3-b42533d822fa@linaro.org> Date: Thu, 17 Jun 2021 19:19:03 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 17/06/2021 12:07, Ulf Hansson wrote: > + Rajendra > > On Tue, 15 Jun 2021 at 17:55, Bjorn Andersson > wrote: >> >> On Tue 15 Jun 05:17 CDT 2021, Ulf Hansson wrote: >> >>> + Mark >>> >>> On Fri, 11 Jun 2021 at 16:34, Dmitry Baryshkov >>> wrote: >>>> >>>> Added Stephen to Cc list >>>> >>>> On Fri, 11 Jun 2021 at 16:50, Ulf Hansson wrote: >>>>> >>>>> On Fri, 11 Jun 2021 at 12:15, Dmitry Baryshkov >>>>> wrote: >>>>>> >>>>>> In case of nested genpds it is easy to get the following warning from >>>>>> lockdep, because all genpd's mutexes share same locking class. Use the >>>>>> per-genpd locking class to stop lockdep from warning about possible >>>>>> deadlocks. It is not possible to directly use genpd nested locking, as >>>>>> it is not the genpd code calling genpd. There are interim calls to >>>>>> regulator core. >>>>>> >>>>>> [ 3.030219] ============================================ >>>>>> [ 3.030220] WARNING: possible recursive locking detected >>>>>> [ 3.030221] 5.13.0-rc3-00054-gf8f0a2f2b643-dirty #2480 Not tainted >>>>>> [ 3.030222] -------------------------------------------- >>>>>> [ 3.030223] kworker/u16:0/7 is trying to acquire lock: >>>>>> [ 3.030224] ffffde0eabd29aa0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c >>>>>> [ 3.030236] >>>>>> [ 3.030236] but task is already holding lock: >>>>>> [ 3.030236] ffffde0eabcfd6d0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c >>>>>> [ 3.030240] >>>>>> [ 3.030240] other info that might help us debug this: >>>>>> [ 3.030240] Possible unsafe locking scenario: >>>>>> [ 3.030240] >>>>>> [ 3.030241] CPU0 >>>>>> [ 3.030241] ---- >>>>>> [ 3.030242] lock(&genpd->mlock); >>>>>> [ 3.030243] lock(&genpd->mlock); >>>>>> [ 3.030244] >>>>>> [ 3.030244] *** DEADLOCK *** >>>>>> [ 3.030244] >>>>>> [ 3.030244] May be due to missing lock nesting notation >>>>>> [ 3.030244] >>>>>> [ 3.030245] 6 locks held by kworker/u16:0/7: >>>>>> [ 3.030246] #0: ffff6cca00010938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1f0/0x730 >>>>>> [ 3.030252] #1: ffff8000100c3db0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1f0/0x730 >>>>>> [ 3.030255] #2: ffff6cca00ce3188 (&dev->mutex){....}-{3:3}, at: __device_attach+0x3c/0x184 >>>>>> [ 3.030260] #3: ffffde0eabcfd6d0 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x2c >>>>>> [ 3.030264] #4: ffff8000100c3968 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_lock_dependent+0x6c/0x1b0 >>>>>> [ 3.030270] #5: ffff6cca00a59158 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_lock_recursive+0x94/0x1d0 >>>>>> [ 3.030273] >>>>>> [ 3.030273] stack backtrace: >>>>>> [ 3.030275] CPU: 6 PID: 7 Comm: kworker/u16:0 Not tainted 5.13.0-rc3-00054-gf8f0a2f2b643-dirty #2480 >>>>>> [ 3.030276] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) >>>>>> [ 3.030278] Workqueue: events_unbound deferred_probe_work_func >>>>>> [ 3.030280] Call trace: >>>>>> [ 3.030281] dump_backtrace+0x0/0x1a0 >>>>>> [ 3.030284] show_stack+0x18/0x24 >>>>>> [ 3.030286] dump_stack+0x108/0x188 >>>>>> [ 3.030289] __lock_acquire+0xa20/0x1e0c >>>>>> [ 3.030292] lock_acquire.part.0+0xc8/0x320 >>>>>> [ 3.030294] lock_acquire+0x68/0x84 >>>>>> [ 3.030296] __mutex_lock+0xa0/0x4f0 >>>>>> [ 3.030299] mutex_lock_nested+0x40/0x50 >>>>>> [ 3.030301] genpd_lock_mtx+0x18/0x2c >>>>>> [ 3.030303] dev_pm_genpd_set_performance_state+0x94/0x1a0 >>>>>> [ 3.030305] reg_domain_enable+0x28/0x4c >>>>>> [ 3.030308] _regulator_do_enable+0x420/0x6b0 >>>>>> [ 3.030310] _regulator_enable+0x178/0x1f0 >>>>>> [ 3.030312] regulator_enable+0x3c/0x80 >>>>> >>>>> At a closer look, I am pretty sure that it's the wrong code design >>>>> that triggers this problem, rather than that we have a real problem in >>>>> genpd. To put it simply, the code in genpd isn't designed to work like >>>>> this. We will end up in circular looking paths, leading to deadlocks, >>>>> sooner or later if we allow the above code path. >>>>> >>>>> To fix it, the regulator here needs to be converted to a proper PM >>>>> domain. This PM domain should be assigned as the parent to the one >>>>> that is requested to be powered on. >>>> >>>> This more or less resembles original design, replaced per review >>>> request to use separate regulator >>>> (https://lore.kernel.org/linux-arm-msm/160269659638.884498.4031967462806977493@swboyd.mtv.corp.google.com/, >>>> https://lore.kernel.org/linux-arm-msm/20201023131925.334864-1-dmitry.baryshkov@linaro.org/). >>> >>> Thanks for the pointers. In hindsight, it looks like the >>> "regulator-fixed-domain" DT binding wasn't the right thing. >>> >>> Fortunately, it looks like the problem can be quite easily fixed, by >>> moving to a correct model of the domain hierarchy. >>> >> >> Can you give some pointers to how we actually fix this? >> >> The problem that lead us down this path is that drivers/clk/qcom/gdsc.c >> describes power domains, which are parented by domains provided by >> drivers/soc/qcom/rpmhpd.c. >> >> But I am unable to find a way for the gdsc driver to get hold of the >> struct generic_pm_domain of the resources exposed by the rpmhpd driver. > > You don't need a handle to the struct generic_pm_domain, to assign a > parent/child domain. Instead you can use of_genpd_add_subdomain(), > which takes two "struct of_phandle_args*" corresponding to the > parent/child device nodes of the genpd providers and then let genpd > internally do the look up. > > As an example, you may have a look at how the PM domain topology in > drivers/cpuidle/cpuidle-psci-domain.c are being created. > >> >> >> The second thing that Dmitry's regulator driver does is to cast the >> appropriate performance state vote on the rpmhpd resource, but I _think_ >> we can do that using OPP tables in the gdsc client's node... > > Yes, it looks like using an OPP table and to specify a > "required-opps", at some device node is the right thing to do. > > In this case, I wonder if the "required-opps" belongs to the genpd > provider node of the new power-domain (as it seems like it only > supports one fixed performance state when it's powered on). On the > other hand, specifying this at the consumer node should work as well, > I think. > > Actually, this relates to the changes [1] that Rajendra is working on > with "assigned-performance-state" (that we agreed to move to > OPP/required-opps) for genpd. What about the following dts snippet? I do not want to add power-domains directly to the dispcc node (as it's not a device's power domain, just gdsc's parent power domain). dispcc: clock-controller@af00000 { compatible = "qcom,sm8250-dispcc"; [....] #power-domain-cells = <1>; mmss_gdsc { power-domains = <&rpmhpd SM8250_MMCX>; required-opps = <&rpmhpd_opp_low_svs>; }; }; > >> >>> Beyond this, perhaps we should consider removing the >>> "regulator-fixed-domain" DT property, as to avoid similar problems >>> from cropping up? >>> >> >> Currently there's a single user upstream, but we have the exact same >> problem in at least 3-4 platforms with patches in the pipeline. >> >> In order to avoid breakage with existing DT I would prefer to see >> regulator-fixed-domain to live for at least one kernel release beyond >> the introduction of the other model. > > Yes, this seems reasonable to me. > > As Mark suggested, let's mark the regulator-fixed-domain DT property > as deprecated and remove it once we have the new solution in place. > > [...] > > Kind regards > Uffe > -- With best wishes Dmitry