2022-11-04 14:13:12

by Doug Anderson

[permalink] [raw]
Subject: [PATCH 1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage

The pm_runtime usage in lpass-sc7280 was broken in quite a few
ways. Specifically:

1. At the end of probe it called "put" twice. This is a no-no and will
end us up with a negative usage count. Even worse than calling
"put" twice, it never called "get" once. Thus after bootup it could
be seen that the runtime usage of the devices managed by this
driver was -2.
2. In some error cases it manually called pm_runtime_disable() even
though it had previously used devm_add_action_or_reset() to set
this up to be called automatically. This meant that in these error
cases we'd double-call pm_runtime_disable().
3. It forgot to call undo pm_runtime_use_autosuspend(), which can
sometimes have subtle problems (and the docs specifically mention
that you need to undo this function).

Overall the above seriously calls into question how this driver is
working. It seems like a combination of "it doesn't", "by luck", and
"because of the weirdness of runtime_pm". Specifically I put a
printout to the serial console every time the runtime suspend/resume
was called for the two devices created by this driver (I wrapped the
pm_clk calls). When I had serial console enabled, I found that the
calls got resumed at bootup (when the clk core probed and before our
double-put) and then never touched again. That's no good.
[ 0.829997] DOUG: my_pm_clk_resume, usage=1
[ 0.835487] DOUG: my_pm_clk_resume, usage=1

When I disabled serial console (speeding up boot), I got a different
pattern, which I guess (?) is better:
[ 0.089767] DOUG: my_pm_clk_resume, usage=1
[ 0.090507] DOUG: my_pm_clk_resume, usage=1
[ 0.151885] DOUG: my_pm_clk_suspend, usage=-2
[ 0.151914] DOUG: my_pm_clk_suspend, usage=-2
[ 1.825747] DOUG: my_pm_clk_resume, usage=-1
[ 1.825774] DOUG: my_pm_clk_resume, usage=-1
[ 1.888269] DOUG: my_pm_clk_suspend, usage=-2
[ 1.888282] DOUG: my_pm_clk_suspend, usage=-2

These different patterns have to do with the fact that the core PM
Runtime code really isn't designed to be robust to negative usage
counts and sometimes may happen to stumble upon a behavior that
happens to "work". For instance, you can see that
__pm_runtime_suspend() will treat any non-zero value (including
negative numbers) as if the device is in use.

In any case, let's fix the driver to be correct. We'll hold a
pm_runtime reference for the whole probe and then drop it (once!) at
the end. We'll get rid of manual pm_runtime_disable() calls in the
error handling. We'll also switch to devm_pm_runtime_enable(), which
magically handles undoing pm_runtime_use_autosuspend() as of commit
b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
pm_runtime_dont_use_autosuspend()").

While we're at this, let's also use devm_pm_clk_create() instead of
rolling it ourselves.

Note that the above changes make it obvious that
lpassaudio_create_pm_clks() was doing more than just creating
clocks. It was also setting up pm_runtime parameters. Let's rename it.

All of these problems were found by code inspection. I started looking
at this driver because it was involved in a deadlock that I reported a
while ago [1]. Though I bisected the deadlock to commit 1b771839de05
("clk: qcom: gdsc: enable optional power domain support"), it was
never really clear why that patch affected it other than a luck of
timing changes. I'll also note that by fixing the timing (as done in
this change) we also seem to aboid the deadlock, which is a nice
benefit.

Also note that some of the fixes here are much the same type of stuff
that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
lpassaudiocc-sc7280.c didn't exist then.

[1] https://lore.kernel.org/r/[email protected]

Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
Signed-off-by: Douglas Anderson <[email protected]>
---

drivers/clk/qcom/lpassaudiocc-sc7280.c | 55 ++++++++++----------------
1 file changed, 21 insertions(+), 34 deletions(-)

diff --git a/drivers/clk/qcom/lpassaudiocc-sc7280.c b/drivers/clk/qcom/lpassaudiocc-sc7280.c
index 063e0365f311..1339f9211a14 100644
--- a/drivers/clk/qcom/lpassaudiocc-sc7280.c
+++ b/drivers/clk/qcom/lpassaudiocc-sc7280.c
@@ -722,33 +722,17 @@ static const struct of_device_id lpass_audio_cc_sc7280_match_table[] = {
};
MODULE_DEVICE_TABLE(of, lpass_audio_cc_sc7280_match_table);

-static void lpassaudio_pm_runtime_disable(void *data)
-{
- pm_runtime_disable(data);
-}
-
-static void lpassaudio_pm_clk_destroy(void *data)
-{
- pm_clk_destroy(data);
-}
-
-static int lpassaudio_create_pm_clks(struct platform_device *pdev)
+static int lpass_audio_setup_runtime_pm(struct platform_device *pdev)
{
int ret;

pm_runtime_use_autosuspend(&pdev->dev);
pm_runtime_set_autosuspend_delay(&pdev->dev, 50);
- pm_runtime_enable(&pdev->dev);
-
- ret = devm_add_action_or_reset(&pdev->dev, lpassaudio_pm_runtime_disable, &pdev->dev);
- if (ret)
- return ret;
-
- ret = pm_clk_create(&pdev->dev);
+ ret = devm_pm_runtime_enable(&pdev->dev);
if (ret)
return ret;

- ret = devm_add_action_or_reset(&pdev->dev, lpassaudio_pm_clk_destroy, &pdev->dev);
+ ret = devm_pm_clk_create(&pdev->dev);
if (ret)
return ret;

@@ -756,7 +740,7 @@ static int lpassaudio_create_pm_clks(struct platform_device *pdev)
if (ret < 0)
dev_err(&pdev->dev, "failed to acquire iface clock\n");

- return ret;
+ return pm_runtime_resume_and_get(&pdev->dev);
}

static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)
@@ -765,7 +749,7 @@ static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)
struct regmap *regmap;
int ret;

- ret = lpassaudio_create_pm_clks(pdev);
+ ret = lpass_audio_setup_runtime_pm(pdev);
if (ret)
return ret;

@@ -775,8 +759,8 @@ static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)

regmap = qcom_cc_map(pdev, desc);
if (IS_ERR(regmap)) {
- pm_runtime_disable(&pdev->dev);
- return PTR_ERR(regmap);
+ ret = PTR_ERR(regmap);
+ goto exit;
}

clk_zonda_pll_configure(&lpass_audio_cc_pll, regmap, &lpass_audio_cc_pll_config);
@@ -788,20 +772,18 @@ static int lpass_audio_cc_sc7280_probe(struct platform_device *pdev)
ret = qcom_cc_really_probe(pdev, &lpass_audio_cc_sc7280_desc, regmap);
if (ret) {
dev_err(&pdev->dev, "Failed to register LPASS AUDIO CC clocks\n");
- pm_runtime_disable(&pdev->dev);
- return ret;
+ goto exit;
}

ret = qcom_cc_probe_by_index(pdev, 1, &lpass_audio_cc_reset_sc7280_desc);
if (ret) {
dev_err(&pdev->dev, "Failed to register LPASS AUDIO CC Resets\n");
- pm_runtime_disable(&pdev->dev);
- return ret;
+ goto exit;
}

pm_runtime_mark_last_busy(&pdev->dev);
+exit:
pm_runtime_put_autosuspend(&pdev->dev);
- pm_runtime_put_sync(&pdev->dev);

return ret;
}
@@ -839,14 +821,15 @@ static int lpass_aon_cc_sc7280_probe(struct platform_device *pdev)
struct regmap *regmap;
int ret;

- ret = lpassaudio_create_pm_clks(pdev);
+ ret = lpass_audio_setup_runtime_pm(pdev);
if (ret)
return ret;

if (of_property_read_bool(pdev->dev.of_node, "qcom,adsp-pil-mode")) {
lpass_audio_cc_sc7280_regmap_config.name = "cc";
desc = &lpass_cc_sc7280_desc;
- return qcom_cc_probe(pdev, desc);
+ ret = qcom_cc_probe(pdev, desc);
+ goto exit;
}

lpass_audio_cc_sc7280_regmap_config.name = "lpasscc_aon";
@@ -854,18 +837,22 @@ static int lpass_aon_cc_sc7280_probe(struct platform_device *pdev)
desc = &lpass_aon_cc_sc7280_desc;

regmap = qcom_cc_map(pdev, desc);
- if (IS_ERR(regmap))
- return PTR_ERR(regmap);
+ if (IS_ERR(regmap)) {
+ ret = PTR_ERR(regmap);
+ goto exit;
+ }

clk_lucid_pll_configure(&lpass_aon_cc_pll, regmap, &lpass_aon_cc_pll_config);

ret = qcom_cc_really_probe(pdev, &lpass_aon_cc_sc7280_desc, regmap);
- if (ret)
+ if (ret) {
dev_err(&pdev->dev, "Failed to register LPASS AON CC clocks\n");
+ goto exit;
+ }

pm_runtime_mark_last_busy(&pdev->dev);
+exit:
pm_runtime_put_autosuspend(&pdev->dev);
- pm_runtime_put_sync(&pdev->dev);

return ret;
}
--
2.38.1.431.g37b22c650d-goog



2022-11-04 14:13:39

by Doug Anderson

[permalink] [raw]
Subject: [PATCH 2/3] clk: qcom: lpass-sc7180: Fix pm_runtime usage

The sc7180 lpass clock controller's pm_runtime usage wasn't broken
quite as spectacularly as the sc7280's pm_runtime usage, but it was
still broken. Putting some printouts in at boot showed me this (with
serial console enabled, which makes the prints slow and thus changes
timing):
[ 3.109951] DOUG: my_pm_clk_resume, usage=1
[ 3.114767] DOUG: my_pm_clk_resume, usage=1
[ 3.664443] DOUG: my_pm_clk_suspend, usage=0
[ 3.897566] DOUG: my_pm_clk_suspend, usage=0
[ 3.910137] DOUG: my_pm_clk_resume, usage=1
[ 3.923217] DOUG: my_pm_clk_resume, usage=0
[ 4.440116] DOUG: my_pm_clk_suspend, usage=-1
[ 4.444982] DOUG: my_pm_clk_suspend, usage=0
[ 14.170501] DOUG: my_pm_clk_resume, usage=1
[ 14.176245] DOUG: my_pm_clk_resume, usage=0

...or this w/out serial console:
[ 0.556139] DOUG: my_pm_clk_resume, usage=1
[ 0.556279] DOUG: my_pm_clk_resume, usage=1
[ 1.058422] DOUG: my_pm_clk_suspend, usage=-1
[ 1.058464] DOUG: my_pm_clk_suspend, usage=0
[ 1.186250] DOUG: my_pm_clk_resume, usage=1
[ 1.186292] DOUG: my_pm_clk_resume, usage=0
[ 1.731536] DOUG: my_pm_clk_suspend, usage=-1
[ 1.731557] DOUG: my_pm_clk_suspend, usage=0
[ 10.288910] DOUG: my_pm_clk_resume, usage=1
[ 10.289496] DOUG: my_pm_clk_resume, usage=0

It seems to be doing roughly the right sequence of calls, but just
like with sc7280 this is more by luck than anything. Having a usage of
-1 is just not OK.

Let's fix this like we did with sc7280.

Signed-off-by: Douglas Anderson <[email protected]>
---

drivers/clk/qcom/lpasscorecc-sc7180.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/qcom/lpasscorecc-sc7180.c b/drivers/clk/qcom/lpasscorecc-sc7180.c
index ac09b7b840ab..a5731994cbed 100644
--- a/drivers/clk/qcom/lpasscorecc-sc7180.c
+++ b/drivers/clk/qcom/lpasscorecc-sc7180.c
@@ -356,7 +356,7 @@ static const struct qcom_cc_desc lpass_audio_hm_sc7180_desc = {
.num_gdscs = ARRAY_SIZE(lpass_audio_hm_sc7180_gdscs),
};

-static int lpass_create_pm_clks(struct platform_device *pdev)
+static int lpass_setup_runtime_pm(struct platform_device *pdev)
{
int ret;

@@ -375,7 +375,7 @@ static int lpass_create_pm_clks(struct platform_device *pdev)
if (ret < 0)
dev_err(&pdev->dev, "failed to acquire iface clock\n");

- return ret;
+ return pm_runtime_resume_and_get(&pdev->dev);
}

static int lpass_core_cc_sc7180_probe(struct platform_device *pdev)
@@ -384,7 +384,7 @@ static int lpass_core_cc_sc7180_probe(struct platform_device *pdev)
struct regmap *regmap;
int ret;

- ret = lpass_create_pm_clks(pdev);
+ ret = lpass_setup_runtime_pm(pdev);
if (ret)
return ret;

@@ -392,12 +392,14 @@ static int lpass_core_cc_sc7180_probe(struct platform_device *pdev)
desc = &lpass_audio_hm_sc7180_desc;
ret = qcom_cc_probe_by_index(pdev, 1, desc);
if (ret)
- return ret;
+ goto exit;

lpass_core_cc_sc7180_regmap_config.name = "lpass_core_cc";
regmap = qcom_cc_map(pdev, &lpass_core_cc_sc7180_desc);
- if (IS_ERR(regmap))
- return PTR_ERR(regmap);
+ if (IS_ERR(regmap)) {
+ ret = PTR_ERR(regmap);
+ goto exit;
+ }

/*
* Keep the CLK always-ON
@@ -415,6 +417,7 @@ static int lpass_core_cc_sc7180_probe(struct platform_device *pdev)
ret = qcom_cc_really_probe(pdev, &lpass_core_cc_sc7180_desc, regmap);

pm_runtime_mark_last_busy(&pdev->dev);
+exit:
pm_runtime_put_autosuspend(&pdev->dev);

return ret;
@@ -425,14 +428,19 @@ static int lpass_hm_core_probe(struct platform_device *pdev)
const struct qcom_cc_desc *desc;
int ret;

- ret = lpass_create_pm_clks(pdev);
+ ret = lpass_setup_runtime_pm(pdev);
if (ret)
return ret;

lpass_core_cc_sc7180_regmap_config.name = "lpass_hm_core";
desc = &lpass_core_hm_sc7180_desc;

- return qcom_cc_probe_by_index(pdev, 0, desc);
+ ret = qcom_cc_probe_by_index(pdev, 0, desc);
+
+ pm_runtime_mark_last_busy(&pdev->dev);
+ pm_runtime_put_autosuspend(&pdev->dev);
+
+ return ret;
}

static const struct of_device_id lpass_hm_sc7180_match_table[] = {
--
2.38.1.431.g37b22c650d-goog


2022-11-04 21:13:56

by Stephen Boyd

[permalink] [raw]
Subject: Re: [PATCH 1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage

Quoting Douglas Anderson (2022-11-04 06:56:28)
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
>
> 1. At the end of probe it called "put" twice. This is a no-no and will
> end us up with a negative usage count. Even worse than calling
> "put" twice, it never called "get" once. Thus after bootup it could
> be seen that the runtime usage of the devices managed by this
> driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
> though it had previously used devm_add_action_or_reset() to set
> this up to be called automatically. This meant that in these error
> cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
> sometimes have subtle problems (and the docs specifically mention
> that you need to undo this function).
>
> Overall the above seriously calls into question how this driver is
> working. It seems like a combination of "it doesn't", "by luck", and
> "because of the weirdness of runtime_pm". Specifically I put a
> printout to the serial console every time the runtime suspend/resume
> was called for the two devices created by this driver (I wrapped the
> pm_clk calls). When I had serial console enabled, I found that the
> calls got resumed at bootup (when the clk core probed and before our
> double-put) and then never touched again. That's no good.
> [ 0.829997] DOUG: my_pm_clk_resume, usage=1
> [ 0.835487] DOUG: my_pm_clk_resume, usage=1
>
> When I disabled serial console (speeding up boot), I got a different
> pattern, which I guess (?) is better:
> [ 0.089767] DOUG: my_pm_clk_resume, usage=1
> [ 0.090507] DOUG: my_pm_clk_resume, usage=1
> [ 0.151885] DOUG: my_pm_clk_suspend, usage=-2
> [ 0.151914] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.825747] DOUG: my_pm_clk_resume, usage=-1
> [ 1.825774] DOUG: my_pm_clk_resume, usage=-1
> [ 1.888269] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.888282] DOUG: my_pm_clk_suspend, usage=-2
>
> These different patterns have to do with the fact that the core PM
> Runtime code really isn't designed to be robust to negative usage
> counts and sometimes may happen to stumble upon a behavior that
> happens to "work". For instance, you can see that
> __pm_runtime_suspend() will treat any non-zero value (including
> negative numbers) as if the device is in use.
>
> In any case, let's fix the driver to be correct. We'll hold a
> pm_runtime reference for the whole probe and then drop it (once!) at
> the end. We'll get rid of manual pm_runtime_disable() calls in the
> error handling. We'll also switch to devm_pm_runtime_enable(), which
> magically handles undoing pm_runtime_use_autosuspend() as of commit
> b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
> pm_runtime_dont_use_autosuspend()").
>
> While we're at this, let's also use devm_pm_clk_create() instead of
> rolling it ourselves.
>
> Note that the above changes make it obvious that
> lpassaudio_create_pm_clks() was doing more than just creating
> clocks. It was also setting up pm_runtime parameters. Let's rename it.
>
> All of these problems were found by code inspection. I started looking
> at this driver because it was involved in a deadlock that I reported a
> while ago [1]. Though I bisected the deadlock to commit 1b771839de05
> ("clk: qcom: gdsc: enable optional power domain support"), it was
> never really clear why that patch affected it other than a luck of
> timing changes. I'll also note that by fixing the timing (as done in
> this change) we also seem to aboid the deadlock, which is a nice
> benefit.
>
> Also note that some of the fixes here are much the same type of stuff
> that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
> devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
> lpassaudiocc-sc7280.c didn't exist then.
>
> [1] https://lore.kernel.org/r/[email protected]
>
> Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
> Signed-off-by: Douglas Anderson <[email protected]>
> ---

Reviewed-by: Stephen Boyd <[email protected]>

2022-11-04 21:15:07

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: [PATCH 1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage

On Fri, 4 Nov 2022 at 16:57, Douglas Anderson <[email protected]> wrote:
>
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
>
> 1. At the end of probe it called "put" twice. This is a no-no and will
> end us up with a negative usage count. Even worse than calling
> "put" twice, it never called "get" once. Thus after bootup it could
> be seen that the runtime usage of the devices managed by this
> driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
> though it had previously used devm_add_action_or_reset() to set
> this up to be called automatically. This meant that in these error
> cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
> sometimes have subtle problems (and the docs specifically mention
> that you need to undo this function).
>
> Overall the above seriously calls into question how this driver is
> working. It seems like a combination of "it doesn't", "by luck", and
> "because of the weirdness of runtime_pm". Specifically I put a
> printout to the serial console every time the runtime suspend/resume
> was called for the two devices created by this driver (I wrapped the
> pm_clk calls). When I had serial console enabled, I found that the
> calls got resumed at bootup (when the clk core probed and before our
> double-put) and then never touched again. That's no good.
> [ 0.829997] DOUG: my_pm_clk_resume, usage=1
> [ 0.835487] DOUG: my_pm_clk_resume, usage=1
>
> When I disabled serial console (speeding up boot), I got a different
> pattern, which I guess (?) is better:
> [ 0.089767] DOUG: my_pm_clk_resume, usage=1
> [ 0.090507] DOUG: my_pm_clk_resume, usage=1
> [ 0.151885] DOUG: my_pm_clk_suspend, usage=-2
> [ 0.151914] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.825747] DOUG: my_pm_clk_resume, usage=-1
> [ 1.825774] DOUG: my_pm_clk_resume, usage=-1
> [ 1.888269] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.888282] DOUG: my_pm_clk_suspend, usage=-2
>
> These different patterns have to do with the fact that the core PM
> Runtime code really isn't designed to be robust to negative usage
> counts and sometimes may happen to stumble upon a behavior that
> happens to "work". For instance, you can see that
> __pm_runtime_suspend() will treat any non-zero value (including
> negative numbers) as if the device is in use.
>
> In any case, let's fix the driver to be correct. We'll hold a
> pm_runtime reference for the whole probe and then drop it (once!) at
> the end. We'll get rid of manual pm_runtime_disable() calls in the
> error handling. We'll also switch to devm_pm_runtime_enable(), which
> magically handles undoing pm_runtime_use_autosuspend() as of commit
> b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
> pm_runtime_dont_use_autosuspend()").
>
> While we're at this, let's also use devm_pm_clk_create() instead of
> rolling it ourselves.
>
> Note that the above changes make it obvious that
> lpassaudio_create_pm_clks() was doing more than just creating
> clocks. It was also setting up pm_runtime parameters. Let's rename it.
>
> All of these problems were found by code inspection. I started looking
> at this driver because it was involved in a deadlock that I reported a
> while ago [1]. Though I bisected the deadlock to commit 1b771839de05
> ("clk: qcom: gdsc: enable optional power domain support"), it was
> never really clear why that patch affected it other than a luck of
> timing changes. I'll also note that by fixing the timing (as done in
> this change) we also seem to aboid the deadlock, which is a nice
> benefit.
>
> Also note that some of the fixes here are much the same type of stuff
> that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
> devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
> lpassaudiocc-sc7280.c didn't exist then.

I don't remember. Most probably so.

>
> [1] https://lore.kernel.org/r/[email protected]
>
> Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
> Signed-off-by: Douglas Anderson <[email protected]>
> ---

Reviewed-by: Dmitry Baryshkov <[email protected]>

--
With best wishes
Dmitry

2022-11-04 21:54:04

by Stephen Boyd

[permalink] [raw]
Subject: Re: [PATCH 2/3] clk: qcom: lpass-sc7180: Fix pm_runtime usage

Quoting Douglas Anderson (2022-11-04 06:56:29)
> The sc7180 lpass clock controller's pm_runtime usage wasn't broken
> quite as spectacularly as the sc7280's pm_runtime usage, but it was
> still broken. Putting some printouts in at boot showed me this (with
> serial console enabled, which makes the prints slow and thus changes
> timing):
> [ 3.109951] DOUG: my_pm_clk_resume, usage=1
> [ 3.114767] DOUG: my_pm_clk_resume, usage=1
> [ 3.664443] DOUG: my_pm_clk_suspend, usage=0
> [ 3.897566] DOUG: my_pm_clk_suspend, usage=0
> [ 3.910137] DOUG: my_pm_clk_resume, usage=1
> [ 3.923217] DOUG: my_pm_clk_resume, usage=0
> [ 4.440116] DOUG: my_pm_clk_suspend, usage=-1
> [ 4.444982] DOUG: my_pm_clk_suspend, usage=0
> [ 14.170501] DOUG: my_pm_clk_resume, usage=1
> [ 14.176245] DOUG: my_pm_clk_resume, usage=0
>
> ...or this w/out serial console:
> [ 0.556139] DOUG: my_pm_clk_resume, usage=1
> [ 0.556279] DOUG: my_pm_clk_resume, usage=1
> [ 1.058422] DOUG: my_pm_clk_suspend, usage=-1
> [ 1.058464] DOUG: my_pm_clk_suspend, usage=0
> [ 1.186250] DOUG: my_pm_clk_resume, usage=1
> [ 1.186292] DOUG: my_pm_clk_resume, usage=0
> [ 1.731536] DOUG: my_pm_clk_suspend, usage=-1
> [ 1.731557] DOUG: my_pm_clk_suspend, usage=0
> [ 10.288910] DOUG: my_pm_clk_resume, usage=1
> [ 10.289496] DOUG: my_pm_clk_resume, usage=0
>
> It seems to be doing roughly the right sequence of calls, but just
> like with sc7280 this is more by luck than anything. Having a usage of
> -1 is just not OK.
>
> Let's fix this like we did with sc7280.

Any Fixes tag?

>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---

Reviewed-by: Stephen Boyd <[email protected]>

2022-11-04 22:38:34

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH 2/3] clk: qcom: lpass-sc7180: Fix pm_runtime usage

Hi,

On Fri, Nov 4, 2022 at 2:19 PM Stephen Boyd <[email protected]> wrote:
>
> Quoting Douglas Anderson (2022-11-04 06:56:29)
> > The sc7180 lpass clock controller's pm_runtime usage wasn't broken
> > quite as spectacularly as the sc7280's pm_runtime usage, but it was
> > still broken. Putting some printouts in at boot showed me this (with
> > serial console enabled, which makes the prints slow and thus changes
> > timing):
> > [ 3.109951] DOUG: my_pm_clk_resume, usage=1
> > [ 3.114767] DOUG: my_pm_clk_resume, usage=1
> > [ 3.664443] DOUG: my_pm_clk_suspend, usage=0
> > [ 3.897566] DOUG: my_pm_clk_suspend, usage=0
> > [ 3.910137] DOUG: my_pm_clk_resume, usage=1
> > [ 3.923217] DOUG: my_pm_clk_resume, usage=0
> > [ 4.440116] DOUG: my_pm_clk_suspend, usage=-1
> > [ 4.444982] DOUG: my_pm_clk_suspend, usage=0
> > [ 14.170501] DOUG: my_pm_clk_resume, usage=1
> > [ 14.176245] DOUG: my_pm_clk_resume, usage=0
> >
> > ...or this w/out serial console:
> > [ 0.556139] DOUG: my_pm_clk_resume, usage=1
> > [ 0.556279] DOUG: my_pm_clk_resume, usage=1
> > [ 1.058422] DOUG: my_pm_clk_suspend, usage=-1
> > [ 1.058464] DOUG: my_pm_clk_suspend, usage=0
> > [ 1.186250] DOUG: my_pm_clk_resume, usage=1
> > [ 1.186292] DOUG: my_pm_clk_resume, usage=0
> > [ 1.731536] DOUG: my_pm_clk_suspend, usage=-1
> > [ 1.731557] DOUG: my_pm_clk_suspend, usage=0
> > [ 10.288910] DOUG: my_pm_clk_resume, usage=1
> > [ 10.289496] DOUG: my_pm_clk_resume, usage=0
> >
> > It seems to be doing roughly the right sequence of calls, but just
> > like with sc7280 this is more by luck than anything. Having a usage of
> > -1 is just not OK.
> >
> > Let's fix this like we did with sc7280.
>
> Any Fixes tag?

Ah, right. I guess the most obvious one is actually:

Fixes: ce8c195e652f ("clk: qcom: lpasscc: Introduce pm autosuspend for SC7180")

That's what got us going negative. One could _sorta_ make the argument
for a "Fixes" tag all the way to the start of the driver, though. The
driver never did a pm_runtime_get() during probe and so there was (I
guess) a chance that some of the bare register writes in probe could
have been unclocked. I'm not aware of that ever being a problem, so I
guess just the above "Fixes" is fine.


> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
>
> Reviewed-by: Stephen Boyd <[email protected]>

Thanks! Yell if you want me to spin a v2 with the Fixes in place.

-Doug

2022-11-15 00:34:13

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH 1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage

Bjorn,

On Fri, Nov 4, 2022 at 6:57 AM Douglas Anderson <[email protected]> wrote:
>
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
>
> 1. At the end of probe it called "put" twice. This is a no-no and will
> end us up with a negative usage count. Even worse than calling
> "put" twice, it never called "get" once. Thus after bootup it could
> be seen that the runtime usage of the devices managed by this
> driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
> though it had previously used devm_add_action_or_reset() to set
> this up to be called automatically. This meant that in these error
> cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
> sometimes have subtle problems (and the docs specifically mention
> that you need to undo this function).
>
> Overall the above seriously calls into question how this driver is
> working. It seems like a combination of "it doesn't", "by luck", and
> "because of the weirdness of runtime_pm". Specifically I put a
> printout to the serial console every time the runtime suspend/resume
> was called for the two devices created by this driver (I wrapped the
> pm_clk calls). When I had serial console enabled, I found that the
> calls got resumed at bootup (when the clk core probed and before our
> double-put) and then never touched again. That's no good.
> [ 0.829997] DOUG: my_pm_clk_resume, usage=1
> [ 0.835487] DOUG: my_pm_clk_resume, usage=1
>
> When I disabled serial console (speeding up boot), I got a different
> pattern, which I guess (?) is better:
> [ 0.089767] DOUG: my_pm_clk_resume, usage=1
> [ 0.090507] DOUG: my_pm_clk_resume, usage=1
> [ 0.151885] DOUG: my_pm_clk_suspend, usage=-2
> [ 0.151914] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.825747] DOUG: my_pm_clk_resume, usage=-1
> [ 1.825774] DOUG: my_pm_clk_resume, usage=-1
> [ 1.888269] DOUG: my_pm_clk_suspend, usage=-2
> [ 1.888282] DOUG: my_pm_clk_suspend, usage=-2
>
> These different patterns have to do with the fact that the core PM
> Runtime code really isn't designed to be robust to negative usage
> counts and sometimes may happen to stumble upon a behavior that
> happens to "work". For instance, you can see that
> __pm_runtime_suspend() will treat any non-zero value (including
> negative numbers) as if the device is in use.
>
> In any case, let's fix the driver to be correct. We'll hold a
> pm_runtime reference for the whole probe and then drop it (once!) at
> the end. We'll get rid of manual pm_runtime_disable() calls in the
> error handling. We'll also switch to devm_pm_runtime_enable(), which
> magically handles undoing pm_runtime_use_autosuspend() as of commit
> b4060db9251f ("PM: runtime: Have devm_pm_runtime_enable() handle
> pm_runtime_dont_use_autosuspend()").
>
> While we're at this, let's also use devm_pm_clk_create() instead of
> rolling it ourselves.
>
> Note that the above changes make it obvious that
> lpassaudio_create_pm_clks() was doing more than just creating
> clocks. It was also setting up pm_runtime parameters. Let's rename it.
>
> All of these problems were found by code inspection. I started looking
> at this driver because it was involved in a deadlock that I reported a
> while ago [1]. Though I bisected the deadlock to commit 1b771839de05
> ("clk: qcom: gdsc: enable optional power domain support"), it was
> never really clear why that patch affected it other than a luck of
> timing changes. I'll also note that by fixing the timing (as done in
> this change) we also seem to aboid the deadlock, which is a nice
> benefit.
>
> Also note that some of the fixes here are much the same type of stuff
> that Dmitry did in commit 72cfc73f4663 ("clk: qcom: use
> devm_pm_runtime_enable and devm_pm_clk_create"), but I guess
> lpassaudiocc-sc7280.c didn't exist then.
>
> [1] https://lore.kernel.org/r/[email protected]
>
> Fixes: a9dd26639d05 ("clk: qcom: lpass: Add support for LPASS clock controller for SC7280")
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> drivers/clk/qcom/lpassaudiocc-sc7280.c | 55 ++++++++++----------------
> 1 file changed, 21 insertions(+), 34 deletions(-)

Is anything blocking this series from landing? I noticed a few other
patches have landed since then to your Qualcomm clk branch, but I
don't see these there. I assume it'll land through your tree.

Thanks!


-Doug

2022-12-02 21:01:02

by Bjorn Andersson

[permalink] [raw]
Subject: Re: (subset) [PATCH 1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage

On Fri, 4 Nov 2022 06:56:28 -0700, Douglas Anderson wrote:
> The pm_runtime usage in lpass-sc7280 was broken in quite a few
> ways. Specifically:
>
> 1. At the end of probe it called "put" twice. This is a no-no and will
> end us up with a negative usage count. Even worse than calling
> "put" twice, it never called "get" once. Thus after bootup it could
> be seen that the runtime usage of the devices managed by this
> driver was -2.
> 2. In some error cases it manually called pm_runtime_disable() even
> though it had previously used devm_add_action_or_reset() to set
> this up to be called automatically. This meant that in these error
> cases we'd double-call pm_runtime_disable().
> 3. It forgot to call undo pm_runtime_use_autosuspend(), which can
> sometimes have subtle problems (and the docs specifically mention
> that you need to undo this function).
>
> [...]

Applied, thanks!

[1/3] clk: qcom: lpass-sc7280: Fix pm_runtime usage
commit: d470be3c4f30b4666e43eef6bab80f543563cdb0
[2/3] clk: qcom: lpass-sc7180: Fix pm_runtime usage
commit: ff1ccf59eaffd192efe21f7de9fb0c130faf1b1b
[3/3] clk: qcom: lpass-sc7180: Avoid an extra "struct dev_pm_ops"
commit: e3ad6c3f21ddb89e4b71361be8318da57dbe3597

Best regards,
--
Bjorn Andersson <[email protected]>