2024-03-25 12:57:46

by Stephen Boyd

Subject: [PATCH 0/5] Fix a deadlock with clk_pm_runtime_get()

This patch series fixes a deadlock reported[1] on ChromeOS devices
(Qualcomm sc7180 Trogdor). To get there, we allow __clk_release() to run
without the prepare_lock held. Then we add runtime PM enabled clk_core
structs to a list that we iterate, runtime resuming (clk_pm_runtime_get())
each entry before grabbing the prepare_lock to walk the clk tree. The
details are in patch #4.

The patch after that is based on the analysis in the disable unused
patch. We similarly resume devices from runtime suspend when walking the
clk tree for the debugfs clk_summary.

Unfortunately this doesn't fix all problems with the usage of runtime PM
in the clk framework. We still have a problem if preparing a clk happens
in parallel to the device providing that clk runtime resuming or
suspending. In that case, the task will go to sleep waiting for the
runtime PM state to change, and we'll deadlock. This is primarily a
problem with the global prepare_lock. I suspect we'll be able to fix
this by implementing per-clk locking, because then we will be able to
split up the big prepare_lock into smaller locks that don't deadlock on
some device runtime PM transitions.

I'll start working on that problem in earnest now because I'm worried
we're going to run into that problem very soon.

Stephen Boyd (5):
clk: Remove prepare_lock hold assertion in __clk_release()
clk: Don't hold prepare_lock when calling kref_put()
clk: Initialize struct clk_core kref earlier
clk: Get runtime PM before walking tree during disable_unused
clk: Get runtime PM before walking tree for clk_summary

drivers/clk/clk.c | 142 +++++++++++++++++++++++++++++++++++++---------
1 file changed, 115 insertions(+), 27 deletions(-)

Cc: Douglas Anderson <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Taniya Das <[email protected]>
Cc: Ulf Hansson <[email protected]>

[1] https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/

base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git



2024-03-25 12:58:02

by Stephen Boyd

Subject: [PATCH 3/5] clk: Initialize struct clk_core kref earlier

Initialize this kref once we allocate memory for the struct clk_core so
that we can reuse the release function to free any memory associated
with the structure. This mostly consolidates code, but also clarifies
that the kref lifetime exists once the container structure (struct
clk_core) is allocated instead of leaving it in a half-baked state for
most of __clk_core_init().
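
A minimal userspace sketch of the idea, assuming a hypothetical kref-like counter (struct ref, ref_init(), ref_put()) and a trimmed-down struct core. It is not the kernel's kref or the real __clk_register(), and it collapses the several error labels into one for brevity. Because the refcount starts as soon as the container is allocated, every error path can drop the only reference, and a single release function frees whatever was allocated so far.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct kref. */
struct ref {
	atomic_int count;
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);
}

/* Drop a reference; returns 1 if release() ran, like kref_put(). */
static int ref_put(struct ref *r, void (*release)(struct ref *r))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down model of struct clk_core. */
struct core {
	struct ref ref;
	char *name;
	int *parents;
};

static int releases;            /* number of release() calls */

/* Frees whatever has been allocated so far; free(NULL) is a no-op. */
static void core_release(struct ref *r)
{
	struct core *core = container_of(r, struct core, ref);

	free(core->parents);
	free(core->name);
	free(core);
	releases++;
}

/* fail_at: 0 = succeed, 1 = name allocation "fails", 2 = parents "fails". */
static struct core *core_register(const char *name, int fail_at)
{
	struct core *core = calloc(1, sizeof(*core));

	if (!core)
		return NULL;
	/* Start the refcount as soon as the container exists ... */
	ref_init(&core->ref);

	core->name = fail_at == 1 ? NULL : strdup(name);
	if (!core->name)
		goto fail;

	core->parents = fail_at == 2 ? NULL : calloc(4, sizeof(*core->parents));
	if (!core->parents)
		goto fail;

	return core;

fail:
	/* ... so every error path ends by dropping the only reference. */
	ref_put(&core->ref, core_release);
	return NULL;
}
```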

Cc: Douglas Anderson <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
---
drivers/clk/clk.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 9fc522c26de8..ee80b21f2824 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -3959,8 +3959,6 @@ static int __clk_core_init(struct clk_core *core)
}

clk_core_reparent_orphans_nolock();
-
- kref_init(&core->ref);
out:
clk_pm_runtime_put(core);
unlock:
@@ -4189,6 +4187,16 @@ static void clk_core_free_parent_map(struct clk_core *core)
kfree(core->parents);
}

+/* Free memory allocated for a struct clk_core */
+static void __clk_release(struct kref *ref)
+{
+ struct clk_core *core = container_of(ref, struct clk_core, ref);
+
+ clk_core_free_parent_map(core);
+ kfree_const(core->name);
+ kfree(core);
+}
+
static struct clk *
__clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
{
@@ -4209,6 +4217,8 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
goto fail_out;
}

+ kref_init(&core->ref);
+
core->name = kstrdup_const(init->name, GFP_KERNEL);
if (!core->name) {
ret = -ENOMEM;
@@ -4263,12 +4273,10 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
hw->clk = NULL;

fail_create_clk:
- clk_core_free_parent_map(core);
fail_parents:
fail_ops:
- kfree_const(core->name);
fail_name:
- kfree(core);
+ kref_put(&core->ref, __clk_release);
fail_out:
return ERR_PTR(ret);
}
@@ -4348,16 +4356,6 @@ int of_clk_hw_register(struct device_node *node, struct clk_hw *hw)
}
EXPORT_SYMBOL_GPL(of_clk_hw_register);

-/* Free memory allocated for a clock. */
-static void __clk_release(struct kref *ref)
-{
- struct clk_core *core = container_of(ref, struct clk_core, ref);
-
- clk_core_free_parent_map(core);
- kfree_const(core->name);
- kfree(core);
-}
-
/*
* Empty clk_ops for unregistered clocks. These are used temporarily
* after clk_unregister() was called on a clock and until last clock


2024-03-25 12:58:27

by Stephen Boyd

Subject: [PATCH 2/5] clk: Don't hold prepare_lock when calling kref_put()

We don't need to hold the prepare_lock when dropping a ref on a struct
clk_core. The release function is only freeing memory and any code with
a pointer reference has already unlinked anything pointing to the
clk_core. This shrinks the prepare_lock critical section a bit.

Note that we also don't call free_clk() with the prepare_lock held.
There isn't any reason to do that.
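
A minimal userspace sketch of this locking split, assuming pthreads and C11 atomics: prepare_lock guards only the list linkage, and the last reference is dropped after unlocking. struct core, tree, core_register(), core_unregister() and core_release() are hypothetical names, not the clk API; the trylock check in core_release() exists only so a single-threaded test can observe that the release ran outside the critical section.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* prepare_lock protects the list linkage; the refcount protects lifetime. */
static pthread_mutex_t prepare_lock = PTHREAD_MUTEX_INITIALIZER;

struct core {
	atomic_int refcount;
	struct core *next;        /* protected by prepare_lock */
};

static struct core *tree;     /* protected by prepare_lock */
static int freed_under_lock = -1;

/* Release only frees memory; it touches nothing reachable via the tree. */
static void core_release(struct core *core)
{
	/* Illustration only (single-threaded test): was prepare_lock busy? */
	int busy = pthread_mutex_trylock(&prepare_lock) != 0;

	if (!busy)
		pthread_mutex_unlock(&prepare_lock);
	freed_under_lock = busy;
	free(core);
}

static struct core *core_register(void)
{
	struct core *core = calloc(1, sizeof(*core));

	if (!core)
		return NULL;
	atomic_init(&core->refcount, 1);

	pthread_mutex_lock(&prepare_lock);
	core->next = tree;
	tree = core;
	pthread_mutex_unlock(&prepare_lock);
	return core;
}

static void core_unregister(struct core *core)
{
	struct core **p;

	pthread_mutex_lock(&prepare_lock);
	/* Unlink under the lock so no tree walker can find core afterwards. */
	for (p = &tree; *p; p = &(*p)->next) {
		if (*p == core) {
			*p = core->next;
			break;
		}
	}
	pthread_mutex_unlock(&prepare_lock);

	/* Drop the reference outside the critical section. */
	if (atomic_fetch_sub(&core->refcount, 1) == 1)
		core_release(core);
}
```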

Cc: Douglas Anderson <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
---
drivers/clk/clk.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 44e71736477d..9fc522c26de8 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -4448,7 +4448,8 @@ void clk_unregister(struct clk *clk)
if (ops == &clk_nodrv_ops) {
pr_err("%s: unregistered clock: %s\n", __func__,
clk->core->name);
- goto unlock;
+ clk_prepare_unlock();
+ return;
}
/*
* Assign empty clock ops for consumers that might still hold
@@ -4482,11 +4483,10 @@ void clk_unregister(struct clk *clk)
if (clk->core->protect_count)
pr_warn("%s: unregistering protected clock: %s\n",
__func__, clk->core->name);
+ clk_prepare_unlock();

kref_put(&clk->core->ref, __clk_release);
free_clk(clk);
-unlock:
- clk_prepare_unlock();
}
EXPORT_SYMBOL_GPL(clk_unregister);

@@ -4645,13 +4645,11 @@ void __clk_put(struct clk *clk)
if (clk->min_rate > 0 || clk->max_rate < ULONG_MAX)
clk_set_rate_range_nolock(clk, 0, ULONG_MAX);

- owner = clk->core->owner;
- kref_put(&clk->core->ref, __clk_release);
-
clk_prepare_unlock();

+ owner = clk->core->owner;
+ kref_put(&clk->core->ref, __clk_release);
module_put(owner);
-
free_clk(clk);
}



2024-03-25 12:58:32

by Stephen Boyd

Subject: [PATCH 1/5] clk: Remove prepare_lock hold assertion in __clk_release()

Removing this assertion lets us move the kref_put() call outside the
prepare_lock section. We don't need to hold the prepare_lock here to
free memory and destroy the clk_core structure. We've already unlinked
the clk from the clk tree and by the time the release function runs
nothing holds a reference to the clk_core anymore so anything with the
pointer can't access the memory that's being freed anyway. Way back in
commit 496eadf821c2 ("clk: Use lockdep asserts to find missing hold of
prepare_lock") we didn't need to have this assertion either.

Fixes: 496eadf821c2 ("clk: Use lockdep asserts to find missing hold of prepare_lock")
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Douglas Anderson <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
---
drivers/clk/clk.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 2253c154a824..44e71736477d 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -4353,8 +4353,6 @@ static void __clk_release(struct kref *ref)
{
struct clk_core *core = container_of(ref, struct clk_core, ref);

- lockdep_assert_held(&prepare_lock);
-
clk_core_free_parent_map(core);
kfree_const(core->name);
kfree(core);


2024-03-25 12:58:41

by Stephen Boyd

Subject: [PATCH 5/5] clk: Get runtime PM before walking tree for clk_summary

Similar to the previous commit, we should make sure that all devices are
runtime resumed before printing the clk_summary through debugfs. Failure
to do so would result in a deadlock if the thread is resuming a device
to print clk state and that device is also runtime resuming in another
thread, e.g. the screen is turning on and the display driver is starting
up.
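
Both hunks in this patch call clk_pm_runtime_get_all() without checking its return value. Going by the error path in PATCH 4/5, a failed clk_pm_runtime_get_all() has already dropped the references it took and unlocked clk_rpm_list_lock, so a following clk_pm_runtime_put_all() would put references that were never taken and unlock a mutex the caller does not hold. A minimal sketch of a calling pattern that checks the result, using hypothetical stubs (get_all(), put_all(), walk_tree(), summary_show()) that only model that contract with plain flags:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stubs: the list lock is held on return only on success. */
static bool list_lock_held;
static bool prepare_lock_held;
static int rpm_refs;            /* runtime PM references currently held */
static int get_all_error;       /* test hook: error that get_all() returns */
static int walked;

static int get_all(void)
{
	if (get_all_error)
		return get_all_error;   /* refs already dropped, lock released */
	list_lock_held = true;
	rpm_refs++;
	return 0;
}

static void put_all(void)
{
	assert(list_lock_held);      /* only valid after a successful get_all() */
	rpm_refs--;
	list_lock_held = false;
}

/* Stand-in for walking the clk tree and printing each clk. */
static void walk_tree(void)
{
	assert(prepare_lock_held && rpm_refs > 0);
	walked++;
}

static int summary_show(void)
{
	int ret = get_all();

	if (ret)
		return ret;              /* nothing to undo on failure */

	prepare_lock_held = true;    /* clk_prepare_lock() */
	walk_tree();
	prepare_lock_held = false;   /* clk_prepare_unlock() */

	put_all();
	return 0;
}
```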

Fixes: 1bb294a7981c ("clk: Enable/Disable runtime PM for clk_summary")
Cc: Taniya Das <[email protected]>
Cc: Douglas Anderson <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
---
drivers/clk/clk.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 31998ca67b1e..10792599bec1 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -3332,7 +3332,7 @@ static int clk_summary_show(struct seq_file *s, void *data)
seq_puts(s, " clock count count count rate accuracy phase cycle enable consumer id\n");
seq_puts(s, "---------------------------------------------------------------------------------------------------------------------------------------------\n");

-
+ clk_pm_runtime_get_all();
clk_prepare_lock();

for (; *lists; lists++)
@@ -3340,6 +3340,7 @@ static int clk_summary_show(struct seq_file *s, void *data)
clk_summary_show_subtree(s, c, 0);

clk_prepare_unlock();
+ clk_pm_runtime_put_all();

return 0;
}
@@ -3389,6 +3390,8 @@ static int clk_dump_show(struct seq_file *s, void *data)
struct hlist_head **lists = s->private;

seq_putc(s, '{');
+
+ clk_pm_runtime_get_all();
clk_prepare_lock();

for (; *lists; lists++) {
@@ -3401,6 +3404,7 @@ static int clk_dump_show(struct seq_file *s, void *data)
}

clk_prepare_unlock();
+ clk_pm_runtime_put_all();

seq_puts(s, "}\n");
return 0;


2024-03-25 12:59:06

by Stephen Boyd

Subject: [PATCH 4/5] clk: Get runtime PM before walking tree during disable_unused

Doug reported [1] the following hung task:

INFO: task swapper/0:1 blocked for more than 122 seconds.
Not tainted 5.15.149-21875-gf795ebc40eb8 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008
Call trace:
__switch_to+0xf4/0x1f4
__schedule+0x418/0xb80
schedule+0x5c/0x10c
rpm_resume+0xe0/0x52c
rpm_resume+0x178/0x52c
__pm_runtime_resume+0x58/0x98
clk_pm_runtime_get+0x30/0xb0
clk_disable_unused_subtree+0x58/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused+0x4c/0xe4
do_one_initcall+0xcc/0x2d8
do_initcall_level+0xa4/0x148
do_initcalls+0x5c/0x9c
do_basic_setup+0x24/0x30
kernel_init_freeable+0xec/0x164
kernel_init+0x28/0x120
ret_from_fork+0x10/0x20
INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
Not tainted 5.15.149-21875-gf795ebc40eb8 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008
Workqueue: events_unbound deferred_probe_work_func
Call trace:
__switch_to+0xf4/0x1f4
__schedule+0x418/0xb80
schedule+0x5c/0x10c
schedule_preempt_disabled+0x2c/0x48
__mutex_lock+0x238/0x488
__mutex_lock_slowpath+0x1c/0x28
mutex_lock+0x50/0x74
clk_prepare_lock+0x7c/0x9c
clk_core_prepare_lock+0x20/0x44
clk_prepare+0x24/0x30
clk_bulk_prepare+0x40/0xb0
mdss_runtime_resume+0x54/0x1c8
pm_generic_runtime_resume+0x30/0x44
__genpd_runtime_resume+0x68/0x7c
genpd_runtime_resume+0x108/0x1f4
__rpm_callback+0x84/0x144
rpm_callback+0x30/0x88
rpm_resume+0x1f4/0x52c
rpm_resume+0x178/0x52c
__pm_runtime_resume+0x58/0x98
__device_attach+0xe0/0x170
device_initial_probe+0x1c/0x28
bus_probe_device+0x3c/0x9c
device_add+0x644/0x814
mipi_dsi_device_register_full+0xe4/0x170
devm_mipi_dsi_device_register_full+0x28/0x70
ti_sn_bridge_probe+0x1dc/0x2c0
auxiliary_bus_probe+0x4c/0x94
really_probe+0xcc/0x2c8
__driver_probe_device+0xa8/0x130
driver_probe_device+0x48/0x110
__device_attach_driver+0xa4/0xcc
bus_for_each_drv+0x8c/0xd8
__device_attach+0xf8/0x170
device_initial_probe+0x1c/0x28
bus_probe_device+0x3c/0x9c
deferred_probe_work_func+0x9c/0xd8
process_one_work+0x148/0x518
worker_thread+0x138/0x350
kthread+0x138/0x1e0
ret_from_fork+0x10/0x20

The first thread is walking the clk tree and calling
clk_pm_runtime_get() to power on devices required to read the clk
hardware via struct clk_ops::is_enabled(). This thread holds the clk
prepare_lock, and is trying to runtime PM resume a device, when it finds
that the device is in the process of resuming so the thread schedule()s
away waiting for the device to finish resuming before continuing. The
second thread is runtime PM resuming the same device, but the runtime
resume callback is calling clk_prepare(), trying to grab the
prepare_lock waiting on the first thread.
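
To make the ABBA ordering concrete, here is a minimal userspace model assuming pthreads; it is not kernel code. prepare_lock plays the clk lock, while rpm_lock, rpm_done and resumed are hypothetical stand-ins for the device's runtime PM state that rpm_resume() sleeps on. Timeouts replace the hung-task detector so the program terminates: each wait expires because each thread needs what the other holds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t prepare_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rpm_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t rpm_done = PTHREAD_COND_INITIALIZER;
static bool resumed;                    /* protected by rpm_lock */

static struct timespec deadline_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Second thread: the resume callback calls clk_prepare(), which needs
 * prepare_lock before the resume can complete. */
static void *resume_callback(void *arg)
{
	struct timespec ts = deadline_ms(100);
	int *err = arg;

	*err = pthread_mutex_timedlock(&prepare_lock, &ts);
	if (*err == 0) {
		pthread_mutex_unlock(&prepare_lock);
		pthread_mutex_lock(&rpm_lock);
		resumed = true;
		pthread_cond_signal(&rpm_done);
		pthread_mutex_unlock(&rpm_lock);
	}
	return NULL;
}

/* First thread: holds prepare_lock while waiting for the resume to end. */
static void abba(int *walker_err, int *resumer_err)
{
	struct timespec ts;
	pthread_t t;

	pthread_mutex_lock(&prepare_lock);
	if (pthread_create(&t, NULL, resume_callback, resumer_err) != 0) {
		pthread_mutex_unlock(&prepare_lock);
		*walker_err = *resumer_err = EAGAIN;
		return;
	}

	/* Like rpm_resume(): sleep until the in-flight resume completes. */
	pthread_mutex_lock(&rpm_lock);
	ts = deadline_ms(300);
	*walker_err = 0;
	while (!resumed && *walker_err == 0)
		*walker_err = pthread_cond_timedwait(&rpm_done, &rpm_lock, &ts);
	pthread_mutex_unlock(&rpm_lock);

	/* Keep holding prepare_lock until the other thread gives up. */
	pthread_join(t, NULL);
	pthread_mutex_unlock(&prepare_lock);
}
```

Both waits come back as ETIMEDOUT in this model; the real tasks have no timeout and simply stay blocked, as in the report above.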

This is a classic ABBA deadlock. To properly fix the deadlock, we must
never runtime PM resume or suspend a device with the clk prepare_lock
held. Actually doing that is near impossible today because the global
prepare_lock would have to be dropped in the middle of the tree, the
device runtime PM resumed/suspended, and then the prepare_lock grabbed
again to ensure consistency of the clk tree topology. If anything
changes with the clk tree in the meantime, we've lost and will need to
start the operation all over again.

Luckily, most of the time we're simply incrementing or decrementing the
runtime PM count on an active device, so we don't have the chance to
schedule away with the prepare_lock held. Let's fix this immediate
problem that can be triggered more easily by simply booting on Qualcomm
sc7180.

Introduce a list of clk_core structures that have been registered, or
are in the process of being registered, that require runtime PM to
operate. Iterate this list and call clk_pm_runtime_get() on each of them
without holding the prepare_lock during clk_disable_unused(). This way
we can be certain that the runtime PM state of the devices will be
active and resumed so we can't schedule away while walking the clk tree
with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
the prepare_lock held to properly drop the runtime PM reference.
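
A minimal userspace sketch of the list-based approach, assuming pthreads. struct dev, struct core, rpm_list, rpm_get_all() and rpm_put_all() are hypothetical stand-ins for clk_rpm_list, clk_pm_runtime_get_all() and clk_pm_runtime_put_all(); the usage field only models a runtime PM usage count. It shows the lock contract (list lock held on success, released on failure) and the rollback of references taken before a failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct dev {
	const char *name;
	int usage;               /* models the runtime PM usage count */
	int fail_get;            /* test hook: make the get fail */
};

struct core {
	const char *name;
	struct dev *dev;
	struct core *next;       /* on rpm_list, protected by rpm_list_lock */
};

static struct core *rpm_list;
static pthread_mutex_t rpm_list_lock = PTHREAD_MUTEX_INITIALIZER;

static int rpm_get(struct core *core)
{
	if (core->dev->fail_get)
		return -EIO;
	core->dev->usage++;
	return 0;
}

static void rpm_put(struct core *core)
{
	core->dev->usage--;
}

/*
 * Context: On success, returns with rpm_list_lock held and the caller must
 * call rpm_put_all(). On failure, drops every reference taken so far and
 * returns with rpm_list_lock released.
 */
static int rpm_get_all(void)
{
	struct core *core, *failed = NULL;
	int ret = 0;

	pthread_mutex_lock(&rpm_list_lock);
	for (core = rpm_list; core; core = core->next) {
		ret = rpm_get(core);
		if (ret) {
			failed = core;
			fprintf(stderr, "Failed to runtime PM get '%s' for clk '%s'\n",
				failed->dev->name, failed->name);
			goto err;
		}
	}
	return 0;

err:
	/* Undo only the entries before the one that failed. */
	for (core = rpm_list; core != failed; core = core->next)
		rpm_put(core);
	pthread_mutex_unlock(&rpm_list_lock);
	return ret;
}

/* Context: Called with rpm_list_lock held by rpm_get_all(); releases it. */
static void rpm_put_all(void)
{
	struct core *core;

	for (core = rpm_list; core; core = core->next)
		rpm_put(core);
	pthread_mutex_unlock(&rpm_list_lock);
}
```

In this model a caller would take prepare_lock only after rpm_get_all() succeeds, walk the tree, drop prepare_lock, and then call rpm_put_all(), so no runtime PM transition has to be waited on while prepare_lock is held.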

Reported-by: Douglas Anderson <[email protected]>
Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1]
Closes: https://issuetracker.google.com/328070191
Cc: Marek Szyprowski <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Fixes: 9a34b45397e5 ("clk: Add support for runtime PM")
Signed-off-by: Stephen Boyd <[email protected]>
---
drivers/clk/clk.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 92 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index ee80b21f2824..31998ca67b1e 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -37,6 +37,10 @@ static HLIST_HEAD(clk_root_list);
static HLIST_HEAD(clk_orphan_list);
static LIST_HEAD(clk_notifier_list);

+/* List of registered clks that use runtime PM */
+static HLIST_HEAD(clk_rpm_list);
+static DEFINE_MUTEX(clk_rpm_list_lock);
+
static const struct hlist_head *all_lists[] = {
&clk_root_list,
&clk_orphan_list,
@@ -59,6 +63,7 @@ struct clk_core {
struct clk_hw *hw;
struct module *owner;
struct device *dev;
+ struct hlist_node rpm_node;
struct device_node *of_node;
struct clk_core *parent;
struct clk_parent_map *parents;
@@ -122,6 +127,19 @@ static void clk_pm_runtime_put(struct clk_core *core)
pm_runtime_put_sync(core->dev);
}

+static void clk_core_rpm_init(struct clk_core *core)
+{
+ struct device *dev = core->dev;
+
+ if (dev && pm_runtime_enabled(dev)) {
+ core->rpm_enabled = true;
+
+ mutex_lock(&clk_rpm_list_lock);
+ hlist_add_head(&core->rpm_node, &clk_rpm_list);
+ mutex_unlock(&clk_rpm_list_lock);
+ }
+}
+
/*** locking ***/
static void clk_prepare_lock(void)
{
@@ -191,6 +209,63 @@ static void clk_enable_unlock(unsigned long flags)
spin_unlock_irqrestore(&enable_lock, flags);
}

+/*
+ * Call clk_pm_runtime_get() on all runtime PM enabled clks in the clk tree so
+ * that disabling unused clks avoids a deadlock where a device is runtime PM
+ * resuming/suspending and the runtime PM callback is trying to grab the
+ * prepare_lock for something like clk_prepare_enable() while
+ * clk_disable_unused_subtree() holds the prepare_lock and is trying to runtime
+ * PM resume/suspend the device as well.
+ */
+static int clk_pm_runtime_get_all(void)
+{
+ int ret;
+ struct clk_core *core, *failed;
+
+ /*
+ * Grab the list lock to avoid any new clks from being registered
+ * or unregistered.
+ */
+ mutex_lock(&clk_rpm_list_lock);
+
+ /*
+ * Runtime PM "get" all the devices that are needed for the clks
+ * currently registered. Do this without holding the prepare_lock, to
+ * avoid the deadlock.
+ */
+ hlist_for_each_entry(core, &clk_rpm_list, rpm_node) {
+ ret = clk_pm_runtime_get(core);
+ if (ret) {
+ failed = core;
+ pr_err("clk: Failed to runtime PM get '%s' for clk '%s'\n",
+ failed->name, dev_name(failed->dev));
+ goto out;
+ }
+ }
+
+ return 0;
+
+out:
+ hlist_for_each_entry(core, &clk_rpm_list, rpm_node) {
+ if (core == failed)
+ break;
+
+ clk_pm_runtime_put(core);
+ }
+ mutex_unlock(&clk_rpm_list_lock);
+
+ return ret;
+}
+
+static void clk_pm_runtime_put_all(void)
+{
+ struct clk_core *core;
+
+ hlist_for_each_entry(core, &clk_rpm_list, rpm_node)
+ clk_pm_runtime_put(core);
+ mutex_unlock(&clk_rpm_list_lock);
+}
+
static bool clk_core_rate_is_protected(struct clk_core *core)
{
return core->protect_count;
@@ -1431,6 +1506,7 @@ __setup("clk_ignore_unused", clk_ignore_unused_setup);
static int __init clk_disable_unused(void)
{
struct clk_core *core;
+ int ret;

if (clk_ignore_unused) {
pr_warn("clk: Not disabling unused clocks\n");
@@ -1439,6 +1515,13 @@ static int __init clk_disable_unused(void)

pr_info("clk: Disabling unused clocks\n");

+ ret = clk_pm_runtime_get_all();
+ if (ret)
+ return ret;
+ /*
+ * Grab the prepare lock to keep the clk topology stable while iterating
+ * over clks.
+ */
clk_prepare_lock();

hlist_for_each_entry(core, &clk_root_list, child_node)
@@ -1455,6 +1538,8 @@ static int __init clk_disable_unused(void)

clk_prepare_unlock();

+ clk_pm_runtime_put_all();
+
return 0;
}
late_initcall_sync(clk_disable_unused);
@@ -4192,6 +4277,12 @@ static void __clk_release(struct kref *ref)
{
struct clk_core *core = container_of(ref, struct clk_core, ref);

+ if (core->rpm_enabled) {
+ mutex_lock(&clk_rpm_list_lock);
+ hlist_del(&core->rpm_node);
+ mutex_unlock(&clk_rpm_list_lock);
+ }
+
clk_core_free_parent_map(core);
kfree_const(core->name);
kfree(core);
@@ -4231,9 +4322,8 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
}
core->ops = init->ops;

- if (dev && pm_runtime_enabled(dev))
- core->rpm_enabled = true;
core->dev = dev;
+ clk_core_rpm_init(core);
core->of_node = np;
if (dev && dev->driver)
core->owner = dev->driver->owner;


2024-03-25 17:27:21

by Doug Anderson

Subject: Re: [PATCH 2/5] clk: Don't hold prepare_lock when calling kref_put()

Hi,

On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <[email protected]> wrote:
>
> We don't need to hold the prepare_lock when dropping a ref on a struct
> clk_core. The release function is only freeing memory and any code with
> a pointer reference has already unlinked anything pointing to the
> clk_core. This shrinks the prepare_lock critical section a bit.
>
> Note that we also don't call free_clk() with the prepare_lock held.
> There isn't any reason to do that.
>
> Cc: Douglas Anderson <[email protected]>
> Signed-off-by: Stephen Boyd <[email protected]>
> ---
> drivers/clk/clk.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)

Reviewed-by: Douglas Anderson <[email protected]>

2024-03-25 17:33:29

by Doug Anderson

Subject: Re: [PATCH 1/5] clk: Remove prepare_lock hold assertion in __clk_release()

Hi,

On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <[email protected]> wrote:
>
> Removing this assertion lets us move the kref_put() call outside the
> prepare_lock section. We don't need to hold the prepare_lock here to
> free memory and destroy the clk_core structure. We've already unlinked
> the clk from the clk tree and by the time the release function runs
> nothing holds a reference to the clk_core anymore so anything with the
> pointer can't access the memory that's being freed anyway. Way back in
> commit 496eadf821c2 ("clk: Use lockdep asserts to find missing hold of
> prepare_lock") we didn't need to have this assertion either.
>
> Fixes: 496eadf821c2 ("clk: Use lockdep asserts to find missing hold of prepare_lock")
> Cc: Krzysztof Kozlowski <[email protected]>
> Cc: Douglas Anderson <[email protected]>
> Signed-off-by: Stephen Boyd <[email protected]>
> ---
> drivers/clk/clk.c | 2 --
> 1 file changed, 2 deletions(-)

Reviewed-by: Douglas Anderson <[email protected]>

2024-03-25 17:34:33

by Doug Anderson

Subject: Re: [PATCH 3/5] clk: Initialize struct clk_core kref earlier

Hi,

On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <[email protected]> wrote:
>
> Initialize this kref once we allocate memory for the struct clk_core so
> that we can reuse the release function to free any memory associated
> with the structure. This mostly consolidates code, but also clarifies
> that the kref lifetime exists once the container structure (struct
> clk_core) is allocated instead of leaving it in a half-baked state for
> most of __clk_core_init().
>
> Cc: Douglas Anderson <[email protected]>
> Signed-off-by: Stephen Boyd <[email protected]>
> ---
> drivers/clk/clk.c | 28 +++++++++++++---------------
> 1 file changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 9fc522c26de8..ee80b21f2824 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -3959,8 +3959,6 @@ static int __clk_core_init(struct clk_core *core)
> }
>
> clk_core_reparent_orphans_nolock();
> -
> - kref_init(&core->ref);
> out:
> clk_pm_runtime_put(core);
> unlock:
> @@ -4189,6 +4187,16 @@ static void clk_core_free_parent_map(struct clk_core *core)
> kfree(core->parents);
> }
>
> +/* Free memory allocated for a struct clk_core */
> +static void __clk_release(struct kref *ref)
> +{
> + struct clk_core *core = container_of(ref, struct clk_core, ref);
> +
> + clk_core_free_parent_map(core);
> + kfree_const(core->name);
> + kfree(core);
> +}
> +
> static struct clk *
> __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
> {
> @@ -4209,6 +4217,8 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
> goto fail_out;
> }
>
> + kref_init(&core->ref);
> +
> core->name = kstrdup_const(init->name, GFP_KERNEL);
> if (!core->name) {
> ret = -ENOMEM;
> @@ -4263,12 +4273,10 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw)
> hw->clk = NULL;
>
> fail_create_clk:
> - clk_core_free_parent_map(core);
> fail_parents:
> fail_ops:
> - kfree_const(core->name);
> fail_name:
> - kfree(core);
> + kref_put(&core->ref, __clk_release);
> fail_out:
> return ERR_PTR(ret);

If it were me, I probably would have:

* Removed "fail_out" and turned the one "goto fail_out" to just return
the error.

* Consolidated the rest of the labels into a single "fail" label.

That's definitely just a style opinion though, and IMO the patch is
fine as-is and overall cleans up the code.

Reviewed-by: Douglas Anderson <[email protected]>

2024-03-25 17:35:08

by Doug Anderson

Subject: Re: [PATCH 5/5] clk: Get runtime PM before walking tree for clk_summary

Hi,

On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <[email protected]> wrote:
>
> Similar to the previous commit, we should make sure that all devices are
> runtime resumed before printing the clk_summary through debugfs. Failure
> to do so would result in a deadlock if the thread is resuming a device
> to print clk state and that device is also runtime resuming in another
> thread, e.g. the screen is turning on and the display driver is starting
> up.
>
> Fixes: 1bb294a7981c ("clk: Enable/Disable runtime PM for clk_summary")
> Cc: Taniya Das <[email protected]>
> Cc: Douglas Anderson <[email protected]>
> Signed-off-by: Stephen Boyd <[email protected]>
> ---
> drivers/clk/clk.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)

Shouldn't this also squash in a revert of commit 1bb294a7981c ("clk:
Enable/Disable runtime PM for clk_summary")? As it is,
clk_summary_show_subtree() is left with an extra/unnecessary
clk_pm_runtime_get() / clk_pm_runtime_put(), right?

Other than that, this looks good to me:

Reviewed-by: Douglas Anderson <[email protected]>

2024-03-25 17:56:42

by Doug Anderson

Subject: Re: [PATCH 4/5] clk: Get runtime PM before walking tree during disable_unused

Hi,

On Mon, Mar 25, 2024 at 10:06 AM Stephen Boyd <[email protected]> wrote:
>
> > > +/*
> > > + * Call clk_pm_runtime_get() on all runtime PM enabled clks in the clk tree so
> > > + * that disabling unused clks avoids a deadlock where a device is runtime PM
> > > + * resuming/suspending and the runtime PM callback is trying to grab the
> > > + * prepare_lock for something like clk_prepare_enable() while
> > > + * clk_disable_unused_subtree() holds the prepare_lock and is trying to runtime
> > > + * PM resume/suspend the device as well.
> > > + */
> > > +static int clk_pm_runtime_get_all(void)
> >
> > nit: It'd be nice if this documented that it acquired / held the lock.
> > Could be in comments, or, might as well use the syntax like this (I
> > think):
> >
> > __acquires(&clk_rpm_list_lock);
> >
> > ...similar with the put function.
>
> I had that but removed it because on the error path we drop the lock and
> sparse complains. I don't know how to signal that the lock is held
> unless an error happens, but I'm a little out of date on sparse now.

I'd settle for something in the comments then? Maybe tagged with "Context:" ?

Thanks!

-Doug

2024-03-25 18:03:14

by Stephen Boyd

Subject: Re: [PATCH 5/5] clk: Get runtime PM before walking tree for clk_summary

Quoting Doug Anderson (2024-03-25 09:19:51)
> Hi,
>
> On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <[email protected]> wrote:
> >
> > Similar to the previous commit, we should make sure that all devices are
> > runtime resumed before printing the clk_summary through debugfs. Failure
> > to do so would result in a deadlock if the thread is resuming a device
> > to print clk state and that device is also runtime resuming in another
> > thread, e.g. the screen is turning on and the display driver is starting
> > up.
> >
> > Fixes: 1bb294a7981c ("clk: Enable/Disable runtime PM for clk_summary")
> > Cc: Taniya Das <[email protected]>
> > Cc: Douglas Anderson <[email protected]>
> > Signed-off-by: Stephen Boyd <[email protected]>
> > ---
> > drivers/clk/clk.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
>
> Shouldn't this also squash in a revert of commit 1bb294a7981c ("clk:
> Enable/Disable runtime PM for clk_summary")? As it is,
> clk_summary_show_subtree() is left with an extra/unnecessary
> clk_pm_runtime_get() / clk_pm_runtime_put(), right?

Sure, it is superfluous now. I suppose it means we can remove
clk_pm_runtime_get()/put() calls in
clk_{disable,unprepare}_unused_subtree() as well.

>
> Other than that, this looks good to me:
>
> Reviewed-by: Douglas Anderson <[email protected]>

2024-03-25 23:34:57

by Stephen Boyd

Subject: Re: [PATCH 4/5] clk: Get runtime PM before walking tree during disable_unused

Quoting Doug Anderson (2024-03-25 09:19:37)
> Hi,
>
> On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <[email protected]> wrote:
> >
> > Introduce a list of clk_core structures that have been registered, or
> > are in the process of being registered, that require runtime PM to
> > operate. Iterate this list and call clk_pm_runtime_get() on each of them
> > without holding the prepare_lock during clk_disable_unused(). This way
> > we can be certain that the runtime PM state of the devices will be
> > active and resumed so we can't schedule away while walking the clk tree
> > with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
> > the prepare_lock held to properly drop the runtime PM reference.
>
> There's a part of me that worries about the fact that we'll now be
> doing a pm_runtime get() on _all clocks_ (even those that are used) at
> bootup now. I worry that some device out there will be unhappy about
> it. ...but I guess the device passed in here is already documented to
> be one that the clock framework can get/put whenever it needs to
> prepare the clock, so that makes me feel like it should be fine.
>
> Anyway, no action item, just documenting my thoughts...
>
> Oh, funny. After reading the next patch, I guess I'm even less
> concerned. I guess we were already grabbing the pm_runtime state for
> all clocks while printing the clock summary. While that's a debugfs
> function, it's still something that many people have likely exercised
> and it's likely not going to introduce random/long tail problems.
>
>
> > +/*
> > + * Call clk_pm_runtime_get() on all runtime PM enabled clks in the clk tree so
> > + * that disabling unused clks avoids a deadlock where a device is runtime PM
> > + * resuming/suspending and the runtime PM callback is trying to grab the
> > + * prepare_lock for something like clk_prepare_enable() while
> > + * clk_disable_unused_subtree() holds the prepare_lock and is trying to runtime
> > + * PM resume/suspend the device as well.
> > + */
> > +static int clk_pm_runtime_get_all(void)
>
> nit: It'd be nice if this documented that it acquired / held the lock.
> Could be in comments, or, might as well use the syntax like this (I
> think):
>
> __acquires(&clk_rpm_list_lock);
>
> ...similar with the put function.

I had that but removed it because on the error path we drop the lock and
sparse complains. I don't know how to signal that the lock is held
unless an error happens, but I'm a little out of date on sparse now.

>
>
> > + /*
> > + * Runtime PM "get" all the devices that are needed for the clks
> > + * currently registered. Do this without holding the prepare_lock, to
> > + * avoid the deadlock.
> > + */
> > + hlist_for_each_entry(core, &clk_rpm_list, rpm_node) {
> > + ret = clk_pm_runtime_get(core);
> > + if (ret) {
> > + failed = core;
> > + pr_err("clk: Failed to runtime PM get '%s' for clk '%s'\n",
> > + failed->name, dev_name(failed->dev));
>
> If I'm reading this correctly, the strings are backward in your error
> print. Right now you're printing:
>
> clk: Failed to runtime PM get '<clk_name>' for clk '<dev_name>'

Good catch. Thanks!

>
> With the printout fixed and some type of documentation that
> clk_pm_runtime_get_all() and clk_pm_runtime_put_all() grab/release the
> mutex:
>
> Reviewed-by: Douglas Anderson <[email protected]>

2024-03-25 17:27:21

by Doug Anderson

Subject: Re: [PATCH 4/5] clk: Get runtime PM before walking tree during disable_unused

Hi,

On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <[email protected]> wrote:
>
> Introduce a list of clk_core structures that have been registered, or
> are in the process of being registered, that require runtime PM to
> operate. Iterate this list and call clk_pm_runtime_get() on each of them
> without holding the prepare_lock during clk_disable_unused(). This way
> we can be certain that the runtime PM state of the devices will be
> active and resumed so we can't schedule away while walking the clk tree
> with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
> the prepare_lock held to properly drop the runtime PM reference.

There's a part of me that worries about the fact that we'll now be
doing a pm_runtime get() on _all clocks_ (even those that are used) at
bootup now. I worry that some device out there will be unhappy about
it. ...but I guess the device passed in here is already documented to
be one that the clock framework can get/put whenever it needs to
prepare the clock, so that makes me feel like it should be fine.

Anyway, no action item, just documenting my thoughts...

Oh, funny. After reading the next patch, I guess I'm even less
concerned. I guess we were already grabbing the pm_runtime state for
all clocks while printing the clock summary. While that's a debugfs
function, it's still something that many people have likely exercised
and it's likely not going to introduce random/long tail problems.


> +/*
> + * Call clk_pm_runtime_get() on all runtime PM enabled clks in the clk tree so
> + * that disabling unused clks avoids a deadlock where a device is runtime PM
> + * resuming/suspending and the runtime PM callback is trying to grab the
> + * prepare_lock for something like clk_prepare_enable() while
> + * clk_disable_unused_subtree() holds the prepare_lock and is trying to runtime
> + * PM resume/suspend the device as well.
> + */
> +static int clk_pm_runtime_get_all(void)

nit: It'd be nice if this documented that it acquired / held the lock.
Could be in comments, or, might as well use the syntax like this (I
think):

__acquires(&clk_rpm_list_lock);

...similar with the put function.


> + /*
> + * Runtime PM "get" all the devices that are needed for the clks
> + * currently registered. Do this without holding the prepare_lock, to
> + * avoid the deadlock.
> + */
> + hlist_for_each_entry(core, &clk_rpm_list, rpm_node) {
> + ret = clk_pm_runtime_get(core);
> + if (ret) {
> + failed = core;
> + pr_err("clk: Failed to runtime PM get '%s' for clk '%s'\n",
> + failed->name, dev_name(failed->dev));

If I'm reading this correctly, the strings are backward in your error
print. Right now you're printing:

clk: Failed to runtime PM get '<clk_name>' for clk '<dev_name>'


With the printout fixed and some type of documentation that
clk_pm_runtime_get_all() and clk_pm_runtime_put_all() grab/release the
mutex:

Reviewed-by: Douglas Anderson <[email protected]>