2020-05-21 07:51:57

by Dinghao Liu

[permalink] [raw]
Subject: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code. Thus a pairing decrement is needed on
the error handling path to keep the counter balanced.

Signed-off-by: Dinghao Liu <[email protected]>
---
drivers/spi/spi-tegra20-slink.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/spi/spi-tegra20-slink.c b/drivers/spi/spi-tegra20-slink.c
index 7f4d932dade7..15361db00982 100644
--- a/drivers/spi/spi-tegra20-slink.c
+++ b/drivers/spi/spi-tegra20-slink.c
@@ -1118,6 +1118,7 @@ static int tegra_slink_probe(struct platform_device *pdev)
ret = pm_runtime_get_sync(&pdev->dev);
if (ret < 0) {
dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
+ pm_runtime_put(&pdev->dev);
goto exit_pm_disable;
}
tspi->def_command_reg = SLINK_M_S;
--
2.17.1


2020-05-21 08:07:05

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <[email protected]> wrote:
>
> pm_runtime_get_sync() increments the runtime PM usage counter even
> when it returns an error code. Thus a pairing decrement is needed on
> the error handling path to keep the counter balanced.

...

> ret = pm_runtime_get_sync(&pdev->dev);
> if (ret < 0) {
> dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);

> + pm_runtime_put(&pdev->dev);

For all your patches, please, double check what you are proposing.

Here, I believe, the correct one will be _put_noidle().

AFAIU you are not supposed to actually suspend the device in case of error.
But I might be mistaken, thus see above.

> goto exit_pm_disable;
> }

--
With Best Regards,
Andy Shevchenko

2020-05-21 08:09:23

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

On Thu, May 21, 2020 at 11:04 AM Andy Shevchenko
<[email protected]> wrote:
> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <[email protected]> wrote:

Any I have coccinelle script for this, I can share with you.

--
With Best Regards,
Andy Shevchenko

2020-05-21 08:27:09

by Dinghao Liu

[permalink] [raw]
Subject: Re: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

Hi Andy,

Thank you for your advice! I will fix the problem in the next edition
of patch. The coccinelle script will be very helpful and I'm looking
forward to it.

Regards,
Dinghao

&quot;Andy Shevchenko&quot; &lt;[email protected]&gt;写道:
> On Thu, May 21, 2020 at 11:04 AM Andy Shevchenko
> <[email protected]> wrote:
> > On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <[email protected]> wrote:
>
> Any I have coccinelle script for this, I can share with you.
>
> --
> With Best Regards,
> Andy Shevchenko

2020-05-21 08:42:07

by Jon Hunter

[permalink] [raw]
Subject: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error


On 21/05/2020 09:04, Andy Shevchenko wrote:
> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <[email protected]> wrote:
>>
>> pm_runtime_get_sync() increments the runtime PM usage counter even
>> when it returns an error code. Thus a pairing decrement is needed on
>> the error handling path to keep the counter balanced.
>
> ...
>
>> ret = pm_runtime_get_sync(&pdev->dev);
>> if (ret < 0) {
>> dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
>
>> + pm_runtime_put(&pdev->dev);
>
> For all your patches, please, double check what you are proposing.
>
> Here, I believe, the correct one will be _put_noidle().
>
> AFAIU you are not supposed to actually suspend the device in case of error.
> But I might be mistaken, thus see above.
>
>> goto exit_pm_disable;
>> }


Is there any reason why this is not handled in pm_runtime_get itself?
Jon

--
nvpublic

2020-05-21 08:48:40

by Jon Hunter

[permalink] [raw]
Subject: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error


On 21/05/2020 09:38, Jon Hunter wrote:
>
> On 21/05/2020 09:04, Andy Shevchenko wrote:
>> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <[email protected]> wrote:
>>>
>>> pm_runtime_get_sync() increments the runtime PM usage counter even
>>> when it returns an error code. Thus a pairing decrement is needed on
>>> the error handling path to keep the counter balanced.
>>
>> ...
>>
>>> ret = pm_runtime_get_sync(&pdev->dev);
>>> if (ret < 0) {
>>> dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
>>
>>> + pm_runtime_put(&pdev->dev);
>>
>> For all your patches, please, double check what you are proposing.
>>
>> Here, I believe, the correct one will be _put_noidle().
>>
>> AFAIU you are not supposed to actually suspend the device in case of error.
>> But I might be mistaken, thus see above.
>>
>>> goto exit_pm_disable;
>>> }
>
>
> Is there any reason why this is not handled in pm_runtime_get itself?

Ah I see a response from Rafael here:
https://lkml.org/lkml/2020/5/20/1100

OK so this is intentional and needs to be fixed.

Jon

--
nvpublic

2020-05-22 07:48:29

by Dinghao Liu

[permalink] [raw]
Subject: Re: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

Hi Andy,

Thank you for your advice!

Your suggestion is to use pm_runtime_put_noidle(), right?
The only difference between pm_runtime_put() and this function
is that pm_runtime_put() will run an extra pm_request_idle().

I checked this patched function again and found there is a
pm_runtime_put() in the normal branch of pm_runtime_get_sync().
Does this mean the original program logic need to execute idle
callback?

According to runtime PM's doc, the pm_runtime_get_sync() call
paired with a pm_runtime_put() call will be appropriate to ensure
that the device is not put back to sleep during the probe. Therefore
I think pm_runtime_put() is more appropriate here. Do you have
more detailed suggestion for why we should use _put_noidle()?

Regards,
Dinghao
> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <[email protected]> wrote:
> >
> > pm_runtime_get_sync() increments the runtime PM usage counter even
> > when it returns an error code. Thus a pairing decrement is needed on
> > the error handling path to keep the counter balanced.
>
> ...
>
> > ret = pm_runtime_get_sync(&pdev->dev);
> > if (ret < 0) {
> > dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
>
> > + pm_runtime_put(&pdev->dev);
>
> For all your patches, please, double check what you are proposing.
>
> Here, I believe, the correct one will be _put_noidle().
>
> AFAIU you are not supposed to actually suspend the device in case of error.
> But I might be mistaken, thus see above.
>
> > goto exit_pm_disable;
> > }
>
> --
> With Best Regards,
> Andy Shevchenko

2020-05-22 15:23:01

by Andy Shevchenko

[permalink] [raw]
Subject: Re: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

On Fri, May 22, 2020 at 10:46 AM <[email protected]> wrote:
>
> Hi Andy,
>
> Thank you for your advice!

You are welcome, but please, stop top-posting.

> Your suggestion is to use pm_runtime_put_noidle(), right?
> The only difference between pm_runtime_put() and this function
> is that pm_runtime_put() will run an extra pm_request_idle().
>
> I checked this patched function again and found there is a
> pm_runtime_put() in the normal branch of pm_runtime_get_sync().
> Does this mean the original program logic need to execute idle
> callback?
>
> According to runtime PM's doc, the pm_runtime_get_sync() call
> paired with a pm_runtime_put() call will be appropriate to ensure
> that the device is not put back to sleep during the probe.

Correct.

> Therefore
> I think pm_runtime_put() is more appropriate here.

How come to wrong conclusion? We are considering error path. What does
documentation say about this?

> Do you have
> more detailed suggestion for why we should use _put_noidle()?

Because in error case there is no need to go through all code patch to
be sure that the device is idling. Moreover, consider below case

CPU1: ...somewhere in the code...
pm_runtime_get() // with success!
...see below...
pm_runtime_put()

CPU2: ...on parallel thread...
ret = pm_runtime_get_sync() // failed!
if (ret)
pm_runtime_put() // oi vei, we put device into sleep

So, there is a potential issue.

> > > pm_runtime_get_sync() increments the runtime PM usage counter even
> > > when it returns an error code. Thus a pairing decrement is needed on
> > > the error handling path to keep the counter balanced.
> >
> > ...
> >
> > > ret = pm_runtime_get_sync(&pdev->dev);
> > > if (ret < 0) {
> > > dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
> >
> > > + pm_runtime_put(&pdev->dev);
> >
> > For all your patches, please, double check what you are proposing.
> >
> > Here, I believe, the correct one will be _put_noidle().
> >
> > AFAIU you are not supposed to actually suspend the device in case of error.
> > But I might be mistaken, thus see above.
> >
> > > goto exit_pm_disable;
> > > }


--
With Best Regards,
Andy Shevchenko

2020-05-22 15:25:34

by Andy Shevchenko

[permalink] [raw]
Subject: Re: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
<[email protected]> wrote:
> On Fri, May 22, 2020 at 10:46 AM <[email protected]> wrote:

...

> Moreover, consider below case
>
> CPU1: ...somewhere in the code...
> pm_runtime_get() // with success!
> ...see below...
> pm_runtime_put()
>
> CPU2: ...on parallel thread...
> ret = pm_runtime_get_sync() // failed!
> if (ret)
> pm_runtime_put() // oi vei, we put device into sleep
>
> So, there is a potential issue.

...and even if it's impossible (no bugs in runtime PM core, etc) the
code with put() looks suspicious.

--
With Best Regards,
Andy Shevchenko

2020-05-23 11:36:05

by Dinghao Liu

[permalink] [raw]
Subject: Re: Re: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

> On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
> <[email protected]> wrote:
> > On Fri, May 22, 2020 at 10:46 AM <[email protected]> wrote:
>
> ...
>
> > Moreover, consider below case
> >
> > CPU1: ...somewhere in the code...
> > pm_runtime_get() // with success!
> > ...see below...
> > pm_runtime_put()
> >
> > CPU2: ...on parallel thread...
> > ret = pm_runtime_get_sync() // failed!
> > if (ret)
> > pm_runtime_put() // oi vei, we put device into sleep
> >
> > So, there is a potential issue.
>
> ...and even if it's impossible (no bugs in runtime PM core, etc) the
> code with put() looks suspicious.
>

I may understand what you are worried about. Do you mean that
executing pm_runtime_put() will influence other threads (e.g.,
one parallel thread can put the device into sleep while other
threads are using this device)?

I think this will never happen. Because in this case the PM usage
counter cannot be decreased to zero if there are still some threads
using this device. Otherwise, pm_runtime_put() should never be
used in the case of multithreading, which is strange since this
API is used widely.

I also checked many other implementation of probe in drivers.
It seems that using pm_runtime_put() is ok. If I misunderstood
your opinion, please point it out, thanks.

Regards,
Dinghao

> --
> With Best Regards,
> Andy Shevchenko

2020-05-23 11:55:00

by Andy Shevchenko

[permalink] [raw]
Subject: Re: Re: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

On Sat, May 23, 2020 at 2:32 PM <[email protected]> wrote:
>
> > On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
> > <[email protected]> wrote:
> > > On Fri, May 22, 2020 at 10:46 AM <[email protected]> wrote:

...

> I also checked many other implementation of probe in drivers.
> It seems that using pm_runtime_put() is ok.

In *error path* or normal path?

> If I misunderstood
> your opinion, please point it out, thanks.

Bottom line is (for the *error path* case):
pm_runtime_put_noidle() has no side effects
pm_runtime_put() (potentially) might have side effects.

You should choose one which is clearer about what it does.

--
With Best Regards,
Andy Shevchenko

2020-05-23 12:07:40

by Dinghao Liu

[permalink] [raw]
Subject: Re: Re: Re: Re: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error

> On Sat, May 23, 2020 at 2:32 PM <[email protected]> wrote:
> >
> > > On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
> > > <[email protected]> wrote:
> > > > On Fri, May 22, 2020 at 10:46 AM <[email protected]> wrote:
>
> ...
>
> > I also checked many other implementation of probe in drivers.
> > It seems that using pm_runtime_put() is ok.
>
> In *error path* or normal path?
>

Error path (e.g., sysc_probe, exynos_trng_probe,
map_rng_probe, ti_eqep_probe).

> > If I misunderstood
> > your opinion, please point it out, thanks.
>
> Bottom line is (for the *error path* case):
> pm_runtime_put_noidle() has no side effects
> pm_runtime_put() (potentially) might have side effects.
>
> You should choose one which is clearer about what it does.
>
> --
> With Best Regards,
> Andy Shevchenko

Agree, for this bug using _noidle() is clearer. I will
send a new path to fix this.

Regards,
Dinghao