2024-01-25 21:09:33

by Bjorn Andersson

[permalink] [raw]
Subject: [PATCH v4 2/8] clk: qcom: gdsc: Enable supply reglator in GPU GX handler

The GX GDSC is modelled to aid the GMU in powering down the GPU in the
event that the GPU crashes, so that it can be restarted again. But in
the event that the power-domain is supplied through a dedicated
regulator (in contrast to being a subdomin of another power-domain),
something needs to turn that regulator on, both to make sure things are
powered and to match the operation in gdsc_disable().

Reviewed-by: Konrad Dybcio <[email protected]>
Signed-off-by: Bjorn Andersson <[email protected]>
---
drivers/clk/qcom/gdsc.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
index 5358e28122ab..e7a4068b9f39 100644
--- a/drivers/clk/qcom/gdsc.c
+++ b/drivers/clk/qcom/gdsc.c
@@ -557,7 +557,15 @@ void gdsc_unregister(struct gdsc_desc *desc)
*/
int gdsc_gx_do_nothing_enable(struct generic_pm_domain *domain)
{
- /* Do nothing but give genpd the impression that we were successful */
- return 0;
+ struct gdsc *sc = domain_to_gdsc(domain);
+ int ret = 0;
+
+ /* Enable the parent supply, when controlled through the regulator framework. */
+ if (sc->rsupply)
+ ret = regulator_enable(sc->rsupply);
+
+ /* Do nothing with the GDSC itself */
+
+ return ret;
}
EXPORT_SYMBOL_GPL(gdsc_gx_do_nothing_enable);

--
2.25.1



2024-03-22 17:01:10

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH v4 2/8] clk: qcom: gdsc: Enable supply reglator in GPU GX handler

On Thu, Jan 25, 2024 at 01:05:08PM -0800, Bjorn Andersson wrote:
> The GX GDSC is modelled to aid the GMU in powering down the GPU in the
> event that the GPU crashes, so that it can be restarted again. But in
> the event that the power-domain is supplied through a dedicated
> regulator (in contrast to being a subdomin of another power-domain),
> something needs to turn that regulator on, both to make sure things are
> powered and to match the operation in gdsc_disable().
>
> Reviewed-by: Konrad Dybcio <[email protected]>
> Signed-off-by: Bjorn Andersson <[email protected]>
> ---
> drivers/clk/qcom/gdsc.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
> index 5358e28122ab..e7a4068b9f39 100644
> --- a/drivers/clk/qcom/gdsc.c
> +++ b/drivers/clk/qcom/gdsc.c
> @@ -557,7 +557,15 @@ void gdsc_unregister(struct gdsc_desc *desc)
> */
> int gdsc_gx_do_nothing_enable(struct generic_pm_domain *domain)
> {
> - /* Do nothing but give genpd the impression that we were successful */
> - return 0;
> + struct gdsc *sc = domain_to_gdsc(domain);
> + int ret = 0;
> +
> + /* Enable the parent supply, when controlled through the regulator framework. */
> + if (sc->rsupply)
> + ret = regulator_enable(sc->rsupply);
> +
> + /* Do nothing with the GDSC itself */
> +
> + return ret;
> }
> EXPORT_SYMBOL_GPL(gdsc_gx_do_nothing_enable);

This patch (and series) is now in mainline as commit 9187ebb954ab ("clk:
qcom: gdsc: Enable supply reglator in GPU GX handler") and appears to be
involved in triggering the below lockdep splat on boot of the Lenovo
ThinkPad X13s.

Adding Ulf and the MSM DRM devs as well in case I blamed the wrong
change here.

Johan


[ 5.849106] ======================================================
[ 5.849111] WARNING: possible circular locking dependency detected
[ 5.849115] 6.8.0 #111 Not tainted
[ 5.849119] ------------------------------------------------------
[ 5.849123] kworker/u32:2/66 is trying to acquire lock:
[ 5.849128] ffffaffb0ca9b4f0 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x54/0x27c
[ 5.849148]
but task is already holding lock:
[ 5.849152] ffffaffad8f33a00 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x24
[ 5.849165]
which lock already depends on the new lock.

[ 5.849170]
the existing dependency chain (in reverse order) is:
[ 5.849175]
-> #3 (&genpd->mlock){+.+.}-{3:3}:
[ 5.849182] lock_acquire+0x68/0x84
[ 5.849190] __mutex_lock+0xa0/0x840
[ 5.849197] mutex_lock_nested+0x24/0x30
[ 5.849201] genpd_lock_mtx+0x18/0x24
[ 5.849206] genpd_runtime_resume+0x104/0x2f4
[ 5.849211] __rpm_callback+0x48/0x1a8
[ 5.849218] rpm_callback+0x68/0x74
[ 5.849223] rpm_resume+0x444/0x638
[ 5.849228] __pm_runtime_resume+0x5c/0xbc
[ 5.849233] device_link_add+0x680/0x6e8
[ 5.849239] arm_smmu_probe_device+0x2a4/0x3e4
[ 5.849245] __iommu_probe_device+0x108/0x430
[ 5.849250] iommu_probe_device+0x3c/0x80
[ 5.849255] of_iommu_configure+0x170/0x25c
[ 5.849260] of_dma_configure_id+0x10c/0x340
[ 5.849265] platform_dma_configure+0x78/0xbc
[ 5.849270] really_probe+0x74/0x388
[ 5.849276] __driver_probe_device+0x7c/0x160
[ 5.849281] driver_probe_device+0x40/0x114
[ 5.849287] __driver_attach+0xfc/0x208
[ 5.849292] bus_for_each_dev+0x74/0xd0
[ 5.849296] driver_attach+0x24/0x30
[ 5.849301] bus_add_driver+0x110/0x214
[ 5.849306] driver_register+0x60/0x128
[ 5.849312] __platform_driver_register+0x28/0x34
[ 5.849316] panel_edp_init+0x20/0x1000 [panel_edp]
[ 5.849329] panel_edp_init+0xc8/0x1000 [panel_edp]
[ 5.849337] do_one_initcall+0x74/0x344
[ 5.849342] do_init_module+0x5c/0x1f8
[ 5.849347] load_module+0x1c9c/0x1e18
[ 5.849351] init_module_from_file+0x84/0xc0
[ 5.849356] idempotent_init_module+0x180/0x250
[ 5.849360] __arm64_sys_finit_module+0x68/0xcc
[ 5.849365] invoke_syscall+0x48/0x114
[ 5.849370] el0_svc_common.constprop.0+0xc0/0xe0
[ 5.849376] do_el0_svc+0x1c/0x28
[ 5.849381] el0_svc+0x48/0x114
[ 5.849386] el0t_64_sync_handler+0xc0/0xc4
[ 5.849391] el0t_64_sync+0x190/0x194
[ 5.849395]
-> #2 (dpm_list_mtx){+.+.}-{3:3}:
[ 5.849403] lock_acquire+0x68/0x84
[ 5.849408] __mutex_lock+0xa0/0x840
[ 5.849412] mutex_lock_nested+0x24/0x30
[ 5.849416] device_pm_lock+0x1c/0x28
[ 5.849422] device_link_add+0x274/0x6e8
[ 5.849426] fw_devlink_create_devlink+0x118/0x2fc
[ 5.849431] __fw_devlink_link_to_consumers.isra.0+0x50/0x104
[ 5.849437] device_add+0x744/0x7b0
[ 5.849441] of_device_add+0x44/0x60
[ 5.849446] of_platform_device_create_pdata+0x98/0x140
[ 5.849451] of_platform_bus_create+0x1b4/0x4b4
[ 5.849455] of_platform_populate+0x58/0x150
[ 5.849459] of_platform_default_populate_init+0xd8/0xf0
[ 5.849466] do_one_initcall+0x74/0x344
[ 5.849470] kernel_init_freeable+0x244/0x350
[ 5.849476] kernel_init+0x20/0x1d8
[ 5.849480] ret_from_fork+0x10/0x20
[ 5.849484]
-> #1 (device_links_lock){+.+.}-{3:3}:
[ 5.849492] lock_acquire+0x68/0x84
[ 5.849497] __mutex_lock+0xa0/0x840
[ 5.849501] mutex_lock_nested+0x24/0x30
[ 5.849505] device_link_remove+0x38/0x94
[ 5.849509] _regulator_put.part.0+0x168/0x190
[ 5.849515] regulator_bulk_free+0x64/0x90
[ 5.849521] devm_regulator_bulk_release+0x1c/0x28
[ 5.849526] release_nodes+0x5c/0x90
[ 5.849532] devres_release_group+0xc8/0x134
[ 5.849537] i2c_device_probe+0x138/0x2e8 [i2c_core]
[ 5.849549] really_probe+0xc0/0x388
[ 5.849554] __driver_probe_device+0x7c/0x160
[ 5.849559] driver_probe_device+0x40/0x114
[ 5.849565] __device_attach_driver+0xbc/0x158
[ 5.849571] bus_for_each_drv+0x84/0xe0
[ 5.849576] __device_attach_async_helper+0xb0/0x10c
[ 5.849582] async_run_entry_fn+0x34/0x14c
[ 5.849588] process_one_work+0x220/0x634
[ 5.849595] worker_thread+0x268/0x3a8
[ 5.849599] kthread+0x124/0x128
[ 5.849603] ret_from_fork+0x10/0x20
[ 5.849608]
-> #0 (regulator_list_mutex){+.+.}-{3:3}:
[ 5.849615] __lock_acquire+0x130c/0x2064
[ 5.849621] lock_acquire.part.0+0xc8/0x20c
[ 5.849626] lock_acquire+0x68/0x84
[ 5.849631] __mutex_lock+0xa0/0x840
[ 5.849636] mutex_lock_nested+0x24/0x30
[ 5.849640] regulator_lock_dependent+0x54/0x27c
[ 5.849646] regulator_enable+0x34/0xd0
[ 5.849651] gdsc_gx_do_nothing_enable+0x18/0x2c
[ 5.849659] _genpd_power_on+0x94/0x17c
[ 5.849664] genpd_power_on.part.0+0xa4/0x1a8
[ 5.849669] genpd_runtime_resume+0x120/0x2f4
[ 5.849673] __rpm_callback+0x48/0x1a8
[ 5.849678] rpm_callback+0x68/0x74
[ 5.849683] rpm_resume+0x584/0x638
[ 5.849688] __pm_runtime_resume+0x5c/0xbc
[ 5.849693] a6xx_gmu_resume+0x70/0xcf8 [msm]
[ 5.849722] a6xx_gmu_pm_resume+0x3c/0x284 [msm]
[ 5.849745] adreno_runtime_resume+0x28/0x34 [msm]
[ 5.849769] pm_generic_runtime_resume+0x2c/0x44
[ 5.849774] __rpm_callback+0x48/0x1a8
[ 5.849779] rpm_callback+0x68/0x74
[ 5.849783] rpm_resume+0x444/0x638
[ 5.849788] __pm_runtime_resume+0x5c/0xbc
[ 5.849793] adreno_load_gpu+0x78/0x238 [msm]
[ 5.849816] msm_open+0x114/0x128 [msm]
[ 5.849843] drm_file_alloc+0x184/0x2a8 [drm]
[ 5.849876] drm_client_init+0x7c/0x10c [drm]
[ 5.849900] msm_fbdev_setup+0x84/0x150 [msm]
[ 5.849929] msm_drm_bind+0x264/0x420 [msm]
[ 5.849950] try_to_bring_up_aggregate_device+0x1ec/0x2f4
[ 5.849962] __component_add+0xa8/0x194
[ 5.849965] component_add+0x14/0x20
[ 5.849967] dp_display_probe_tail+0x4c/0xac [msm]
[ 5.850000] dp_auxbus_done_probe+0x14/0x20 [msm]
[ 5.850023] dp_aux_ep_probe+0x4c/0xf0 [drm_dp_aux_bus]
[ 5.850030] really_probe+0xc0/0x388
[ 5.850034] __driver_probe_device+0x7c/0x160
[ 5.850038] driver_probe_device+0x40/0x114
[ 5.850042] __device_attach_driver+0xbc/0x158
[ 5.850046] bus_for_each_drv+0x84/0xe0
[ 5.850050] __device_attach+0xa8/0x1d4
[ 5.850053] device_initial_probe+0x14/0x20
[ 5.850058] bus_probe_device+0xb0/0xb4
[ 5.850061] device_add+0x5b8/0x7b0
[ 5.850065] device_register+0x20/0x30
[ 5.850068] of_dp_aux_populate_bus+0xcc/0x1a0 [drm_dp_aux_bus]
[ 5.850074] devm_of_dp_aux_populate_bus+0x18/0x80 [drm_dp_aux_bus]
[ 5.850080] dp_display_probe+0x2a4/0x454 [msm]
[ 5.850098] platform_probe+0x68/0xd8
[ 5.850102] really_probe+0xc0/0x388
[ 5.850106] __driver_probe_device+0x7c/0x160
[ 5.850109] driver_probe_device+0x40/0x114
[ 5.850113] __device_attach_driver+0xbc/0x158
[ 5.850117] bus_for_each_drv+0x84/0xe0
[ 5.850121] __device_attach+0xa8/0x1d4
[ 5.850125] device_initial_probe+0x14/0x20
[ 5.850129] bus_probe_device+0xb0/0xb4
[ 5.850133] deferred_probe_work_func+0xa0/0xf4
[ 5.850137] process_one_work+0x220/0x634
[ 5.850141] worker_thread+0x268/0x3a8
[ 5.850144] kthread+0x124/0x128
[ 5.850147] ret_from_fork+0x10/0x20
[ 5.850151]
other info that might help us debug this:

[ 5.850154] Chain exists of:
regulator_list_mutex --> dpm_list_mtx --> &genpd->mlock

[ 5.850163] Possible unsafe locking scenario:

[ 5.850166] CPU0 CPU1
[ 5.850168] ---- ----
[ 5.850171] lock(&genpd->mlock);
[ 5.850174] lock(dpm_list_mtx);
[ 5.850178] lock(&genpd->mlock);
[ 5.850182] lock(regulator_list_mutex);
[ 5.850185]
*** DEADLOCK ***
[ 5.850188] 8 locks held by kworker/u32:2/66:
[ 5.850191] #0: ffff2be180020948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1a0/0x634
[ 5.850199] #1: ffff800081033de0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1c8/0x634
[ 5.850207] #2: ffff2be1849e38f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x1d4
[ 5.850215] #3: ffff2be183c550e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x1d4
[ 5.850223] #4: ffffaffb0caa59e8 (component_mutex){+.+.}-{3:3}, at: __component_add+0x60/0x194
[ 5.850231] #5: ffffaffad96ee178 (init_lock){+.+.}-{3:3}, at: msm_open+0x38/0x128 [msm]
[ 5.850252] #6: ffff2be18d6536d8 (&a6xx_gpu->gmu.lock){+.+.}-{3:3}, at: a6xx_gmu_pm_resume+0x34/0x284 [msm]
[ 5.850274] #7: ffffaffad8f33a00 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x18/0x24
[ 5.850282]
stack backtrace:
[ 5.850286] CPU: 6 PID: 66 Comm: kworker/u32:2 Not tainted 6.8.0 #111
[ 5.850291] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET87W (1.59 ) 12/05/2023
[ 5.850296] Workqueue: events_unbound deferred_probe_work_func
[ 5.850301] Call trace:
[ 5.850303] dump_backtrace+0x9c/0x11c
[ 5.850307] show_stack+0x18/0x24
[ 5.850312] dump_stack_lvl+0x90/0xd0
[ 5.850316] dump_stack+0x18/0x24
[ 5.850320] print_circular_bug+0x290/0x370
[ 5.850324] check_noncircular+0x15c/0x170
[ 5.850327] __lock_acquire+0x130c/0x2064
[ 5.850331] lock_acquire.part.0+0xc8/0x20c
[ 5.850335] lock_acquire+0x68/0x84
[ 5.850339] __mutex_lock+0xa0/0x840
[ 5.850342] mutex_lock_nested+0x24/0x30
[ 5.850345] regulator_lock_dependent+0x54/0x27c
[ 5.850349] regulator_enable+0x34/0xd0
[ 5.850352] gdsc_gx_do_nothing_enable+0x18/0x2c
[ 5.850356] _genpd_power_on+0x94/0x17c
[ 5.850360] genpd_power_on.part.0+0xa4/0x1a8
[ 5.850363] genpd_runtime_resume+0x120/0x2f4
[ 5.850366] __rpm_callback+0x48/0x1a8
[ 5.850370] rpm_callback+0x68/0x74
[ 5.850374] rpm_resume+0x584/0x638
[ 5.850377] __pm_runtime_resume+0x5c/0xbc
[ 5.850381] a6xx_gmu_resume+0x70/0xcf8 [msm]
[ 5.850398] a6xx_gmu_pm_resume+0x3c/0x284 [msm]
[ 5.850416] adreno_runtime_resume+0x28/0x34 [msm]
[ 5.850433] pm_generic_runtime_resume+0x2c/0x44
[ 5.850437] __rpm_callback+0x48/0x1a8
[ 5.850440] rpm_callback+0x68/0x74
[ 5.850443] rpm_resume+0x444/0x638
[ 5.850447] __pm_runtime_resume+0x5c/0xbc
[ 5.850450] adreno_load_gpu+0x78/0x238 [msm]
[ 5.850467] msm_open+0x114/0x128 [msm]
[ 5.850485] drm_file_alloc+0x184/0x2a8 [drm]
[ 5.850504] drm_client_init+0x7c/0x10c [drm]
[ 5.850522] msm_fbdev_setup+0x84/0x150 [msm]
[ 5.850540] msm_drm_bind+0x264/0x420 [msm]
[ 5.850557] try_to_bring_up_aggregate_device+0x1ec/0x2f4
[ 5.850562] __component_add+0xa8/0x194
[ 5.850566] component_add+0x14/0x20
[ 5.850570] dp_display_probe_tail+0x4c/0xac [msm]
[ 5.850587] dp_auxbus_done_probe+0x14/0x20 [msm]
[ 5.850605] dp_aux_ep_probe+0x4c/0xf0 [drm_dp_aux_bus]
[ 5.850611] really_probe+0xc0/0x388
[ 5.850615] __driver_probe_device+0x7c/0x160
[ 5.850619] driver_probe_device+0x40/0x114
[ 5.850624] __device_attach_driver+0xbc/0x158
[ 5.850628] bus_for_each_drv+0x84/0xe0
[ 5.850631] __device_attach+0xa8/0x1d4
[ 5.850635] device_initial_probe+0x14/0x20
[ 5.850639] bus_probe_device+0xb0/0xb4
[ 5.850642] device_add+0x5b8/0x7b0
[ 5.850645] device_register+0x20/0x30
[ 5.850648] of_dp_aux_populate_bus+0xcc/0x1a0 [drm_dp_aux_bus]
[ 5.850654] devm_of_dp_aux_populate_bus+0x18/0x80 [drm_dp_aux_bus]
[ 5.850660] dp_display_probe+0x2a4/0x454 [msm]
[ 5.850678] platform_probe+0x68/0xd8
[ 5.850681] really_probe+0xc0/0x388
[ 5.850684] __driver_probe_device+0x7c/0x160
[ 5.850688] driver_probe_device+0x40/0x114
[ 5.850692] __device_attach_driver+0xbc/0x158
[ 5.850696] bus_for_each_drv+0x84/0xe0
[ 5.850699] __device_attach+0xa8/0x1d4
[ 5.850703] device_initial_probe+0x14/0x20
[ 5.850707] bus_probe_device+0xb0/0xb4
[ 5.850710] deferred_probe_work_func+0xa0/0xf4
[ 5.850714] process_one_work+0x220/0x634
[ 5.850717] worker_thread+0x268/0x3a8
[ 5.850720] kthread+0x124/0x128
[ 5.850723] ret_from_fork+0x10/0x20


2024-03-25 13:21:35

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH v4 2/8] clk: qcom: gdsc: Enable supply reglator in GPU GX handler

On Fri, Mar 22, 2024 at 06:00:58PM +0100, Johan Hovold wrote:
> On Thu, Jan 25, 2024 at 01:05:08PM -0800, Bjorn Andersson wrote:
> > The GX GDSC is modelled to aid the GMU in powering down the GPU in the
> > event that the GPU crashes, so that it can be restarted again. But in
> > the event that the power-domain is supplied through a dedicated
> > regulator (in contrast to being a subdomin of another power-domain),
> > something needs to turn that regulator on, both to make sure things are
> > powered and to match the operation in gdsc_disable().
> >
> > Reviewed-by: Konrad Dybcio <[email protected]>
> > Signed-off-by: Bjorn Andersson <[email protected]>
> > ---
> > drivers/clk/qcom/gdsc.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
> > index 5358e28122ab..e7a4068b9f39 100644
> > --- a/drivers/clk/qcom/gdsc.c
> > +++ b/drivers/clk/qcom/gdsc.c
> > @@ -557,7 +557,15 @@ void gdsc_unregister(struct gdsc_desc *desc)
> > */
> > int gdsc_gx_do_nothing_enable(struct generic_pm_domain *domain)
> > {
> > - /* Do nothing but give genpd the impression that we were successful */
> > - return 0;
> > + struct gdsc *sc = domain_to_gdsc(domain);
> > + int ret = 0;
> > +
> > + /* Enable the parent supply, when controlled through the regulator framework. */
> > + if (sc->rsupply)
> > + ret = regulator_enable(sc->rsupply);
> > +
> > + /* Do nothing with the GDSC itself */
> > +
> > + return ret;
> > }
> > EXPORT_SYMBOL_GPL(gdsc_gx_do_nothing_enable);
>
> This patch (and series) is now in mainline as commit 9187ebb954ab ("clk:
> qcom: gdsc: Enable supply reglator in GPU GX handler") and appears to be
> involved in triggering the below lockdep splat on boot of the Lenovo
> ThinkPad X13s.
>
> Adding Ulf and the MSM DRM devs as well in case I blamed the wrong
> change here.

I've now verified that applying this series on top of 6.8 also triggers
the lockdep splat even if it is possible that it only exposed an
existing issue.

This is still a regression and also prevents using lockdep on these
platforms, which can lead to further locking issues being introduced
until this is fixed:

#regzbot ^introduced: 9187ebb954ab

Johan