2016-11-24 07:48:43

by Joonyoung Shim

[permalink] [raw]
Subject: [PATCH] PM / OPP: fix CPU device to be removed from OPP table in wrong order

The device that creates OPP table first should be removed from dev_list
of OPP table in last because it can be used by other resources
(supported_hw, prop_name, regulator), but not now. If OPP table is
shared by several CPUs, the CPU device that creates OPP table can be
removed earlier than other CPU devices.

This patch makes that the CPU device that creates OPP table is removed
last.

Signed-off-by: Joonyoung Shim <[email protected]>
---
drivers/base/power/opp/cpu.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/drivers/base/power/opp/cpu.c b/drivers/base/power/opp/cpu.c
index 8c3434bdb26d..9d0773a237f8 100644
--- a/drivers/base/power/opp/cpu.c
+++ b/drivers/base/power/opp/cpu.c
@@ -118,9 +118,36 @@ void dev_pm_opp_free_cpufreq_table(struct device *dev,
EXPORT_SYMBOL_GPL(dev_pm_opp_free_cpufreq_table);
#endif /* CONFIG_CPU_FREQ */

+static bool dev_pm_opp_is_removed_last(struct device *dev)
+{
+ struct opp_device *opp_dev;
+ struct opp_table *opp_table;
+ bool ret = false;
+
+ mutex_lock(&opp_table_lock);
+
+ opp_table = _find_opp_table(dev);
+ if (IS_ERR(opp_table))
+ goto unlock;
+
+ if (list_is_singular(&opp_table->dev_list))
+ goto unlock;
+
+ opp_dev = list_last_entry(&opp_table->dev_list, struct opp_device,
+ node);
+ if (opp_dev->dev == dev)
+ ret = true;
+
+unlock:
+ mutex_unlock(&opp_table_lock);
+
+ return ret;
+}
+
void _dev_pm_opp_cpumask_remove_table(const struct cpumask *cpumask, bool of)
{
struct device *cpu_dev;
+ struct device *last = NULL;
int cpu;

WARN_ON(cpumask_empty(cpumask));
@@ -133,11 +160,23 @@ void _dev_pm_opp_cpumask_remove_table(const struct cpumask *cpumask, bool of)
continue;
}

+ if (!last && dev_pm_opp_is_removed_last(cpu_dev)) {
+ last = cpu_dev;
+ continue;
+ }
+
if (of)
dev_pm_opp_of_remove_table(cpu_dev);
else
dev_pm_opp_remove_table(cpu_dev);
}
+
+ if (last) {
+ if (of)
+ dev_pm_opp_of_remove_table(last);
+ else
+ dev_pm_opp_remove_table(last);
+ }
}

/**
--
1.9.1


2016-11-24 08:34:09

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] PM / OPP: fix CPU device to be removed from OPP table in wrong order

Ho Joonyoung,

On 24-11-16, 16:49, Joonyoung Shim wrote:
> The device that creates OPP table first should be removed from dev_list
> of OPP table in last because it can be used by other resources
> (supported_hw, prop_name, regulator), but not now.

I am not sure what you are trying to do here? Why can't the CPU which
added the OPP should be removed last.

Can you give a real example where you see a problem ?

> If OPP table is
> shared by several CPUs, the CPU device that creates OPP table can be
> removed earlier than other CPU devices.

I don't think that's a problem, though I can be wrong for sure.

--
viresh

2016-11-25 01:54:20

by Joonyoung Shim

[permalink] [raw]
Subject: Re: [PATCH] PM / OPP: fix CPU device to be removed from OPP table in wrong order

Hi Viresh.

On 11/24/2016 05:34 PM, Viresh Kumar wrote:
> Ho Joonyoung,
>
> On 24-11-16, 16:49, Joonyoung Shim wrote:
>> The device that creates OPP table first should be removed from dev_list
>> of OPP table in last because it can be used by other resources
>> (supported_hw, prop_name, regulator), but not now.
>
> I am not sure what you are trying to do here? Why can't the CPU which
> added the OPP should be removed last.
>
> Can you give a real example where you see a problem ?
>
>> If OPP table is
>> shared by several CPUs, the CPU device that creates OPP table can be
>> removed earlier than other CPU devices.
>
> I don't think that's a problem, though I can be wrong for sure.
>

I found this problem during system suspend/resume of Odroid-XU3 board.

# rtcwake -m mem -s 3
wakeup from "mem" at Wed Apr 4 05:54:44 2001
[ 15.965996] PM: Syncing filesystems ... done.
[ 15.976333] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 15.983287] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
[ 16.006951] wake enabled for irq 135
[ 16.008782] smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
[ 16.094110] PM: suspend of devices complete after 95.038 msecs
[ 16.105648] PM: late suspend of devices complete after 6.903 msecs
[ 16.116356] PM: noirq suspend of devices complete after 5.912 msecs
[ 16.121213] Disabling non-boot CPUs ...
[ 16.154140] IRQ51 no longer affine to CPU1
[ 16.154709] CPU1: shutdown
[ 16.214148] IRQ52 no longer affine to CPU2
[ 16.214646] CPU2: shutdown
[ 16.273980] IRQ53 no longer affine to CPU3
[ 16.274458] CPU3: shutdown
[ 16.335093] IRQ54 no longer affine to CPU4
[ 16.336033] CPU4: shutdown
[ 16.389979] IRQ55 no longer affine to CPU5
[ 16.390823] CPU5: shutdown
[ 16.444829] IRQ56 no longer affine to CPU6
[ 16.445621] CPU6: shutdown
[ 16.509229] cpu cpu4: Failed to find opp_table: -19
[ 16.514008] IRQ57 no longer affine to CPU7
[ 16.514824] CPU7: shutdown

CPU4-7 are offline on after system resume.

Above -19 error occurs because cannot find OPP table for CPU4 from
dev_pm_opp_put_regulator() when regulator resource for CPU4 is put.

Odroid-XU3 board uses exynos5422 SoCs having 8 CPUs (4 little cores /
4 bit cores). CPU0-3 (little core) shares a OPP table and CPU4-7
(bit core) shares a OPP table (operating-points-v2 binding). CPU0 and
CPU4 have each regulators. You can find them from below dts files,

arch/arm/boot/dts/exynos5420.dtsi
arch/arm/boot/dts/exynos5422-cpus.dtsi
arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi

Let's see CPU4-7 case. A OPP tables are created by device of CPU4 from
cpufreq_init() of drivers/cpufreq/cpufreq-dt.c. Each OPP devices of
CPU4-7 are registered at dev_list of OPP table created by CPU4 device
as CPU4 - CPU5 - CPU6 - CPU7 order. And regulator resource for CPU4 uses
OPP device of CPU4.

The cpufreq_exit() tries to free OPP table for CPU4-7 during system
suspend. It will remove OPP devices as CPU4 - CPU5 - CPU6 - CPU7 order,
then only OPP device for CPU7 remains before regulator resource of CPU4
is put, but it needs CPU4 OPP device that creates OPP table to put
regulator resource of CPU4, not CPU7 OPP device.

This patch will make to remove OPP devices as CPU5 - CPU6 - CPU7 - CPU4
order.

As another way, we can consider reference count for OPP device. Above
problem occurs because CPU4 and regulator of CPU4 use same OPP device.

Any ideas?

Thanks.

2016-11-25 06:58:33

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] PM / OPP: fix CPU device to be removed from OPP table in wrong order

On 25-11-16, 10:54, Joonyoung Shim wrote:
> I found this problem during system suspend/resume of Odroid-XU3 board.
>
> # rtcwake -m mem -s 3
> wakeup from "mem" at Wed Apr 4 05:54:44 2001
> [ 15.965996] PM: Syncing filesystems ... done.
> [ 15.976333] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [ 15.983287] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
> [ 16.006951] wake enabled for irq 135
> [ 16.008782] smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
> [ 16.094110] PM: suspend of devices complete after 95.038 msecs
> [ 16.105648] PM: late suspend of devices complete after 6.903 msecs
> [ 16.116356] PM: noirq suspend of devices complete after 5.912 msecs
> [ 16.121213] Disabling non-boot CPUs ...
> [ 16.154140] IRQ51 no longer affine to CPU1
> [ 16.154709] CPU1: shutdown
> [ 16.214148] IRQ52 no longer affine to CPU2
> [ 16.214646] CPU2: shutdown
> [ 16.273980] IRQ53 no longer affine to CPU3
> [ 16.274458] CPU3: shutdown
> [ 16.335093] IRQ54 no longer affine to CPU4
> [ 16.336033] CPU4: shutdown
> [ 16.389979] IRQ55 no longer affine to CPU5
> [ 16.390823] CPU5: shutdown
> [ 16.444829] IRQ56 no longer affine to CPU6
> [ 16.445621] CPU6: shutdown
> [ 16.509229] cpu cpu4: Failed to find opp_table: -19
> [ 16.514008] IRQ57 no longer affine to CPU7
> [ 16.514824] CPU7: shutdown

Hi,

Yes you have found a real bug it seems. I think that can be reproduced
by a simple rmmmod of cpufreq-dt.ko module as well.

Though the solution you provided isn't good enough.

Consider for example a case where you do this:
- offline CPU 4
- suspend the system

You are going to get stuck in the exact same problem again.

I have sent a separate patch and cc'd you. Can you please verify that
it works for you ?

--
viresh

2016-11-25 07:29:54

by Joonyoung Shim

[permalink] [raw]
Subject: Re: [PATCH] PM / OPP: fix CPU device to be removed from OPP table in wrong order

Hi Viresh,

On 11/25/2016 03:57 PM, Viresh Kumar wrote:
> On 25-11-16, 10:54, Joonyoung Shim wrote:
>> I found this problem during system suspend/resume of Odroid-XU3 board.
>>
>> # rtcwake -m mem -s 3
>> wakeup from "mem" at Wed Apr 4 05:54:44 2001
>> [ 15.965996] PM: Syncing filesystems ... done.
>> [ 15.976333] Freezing user space processes ... (elapsed 0.002 seconds) done.
>> [ 15.983287] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
>> [ 16.006951] wake enabled for irq 135
>> [ 16.008782] smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
>> [ 16.094110] PM: suspend of devices complete after 95.038 msecs
>> [ 16.105648] PM: late suspend of devices complete after 6.903 msecs
>> [ 16.116356] PM: noirq suspend of devices complete after 5.912 msecs
>> [ 16.121213] Disabling non-boot CPUs ...
>> [ 16.154140] IRQ51 no longer affine to CPU1
>> [ 16.154709] CPU1: shutdown
>> [ 16.214148] IRQ52 no longer affine to CPU2
>> [ 16.214646] CPU2: shutdown
>> [ 16.273980] IRQ53 no longer affine to CPU3
>> [ 16.274458] CPU3: shutdown
>> [ 16.335093] IRQ54 no longer affine to CPU4
>> [ 16.336033] CPU4: shutdown
>> [ 16.389979] IRQ55 no longer affine to CPU5
>> [ 16.390823] CPU5: shutdown
>> [ 16.444829] IRQ56 no longer affine to CPU6
>> [ 16.445621] CPU6: shutdown
>> [ 16.509229] cpu cpu4: Failed to find opp_table: -19
>> [ 16.514008] IRQ57 no longer affine to CPU7
>> [ 16.514824] CPU7: shutdown
>
> Hi,
>
> Yes you have found a real bug it seems. I think that can be reproduced
> by a simple rmmmod of cpufreq-dt.ko module as well.
>
> Though the solution you provided isn't good enough.
>
> Consider for example a case where you do this:
> - offline CPU 4
> - suspend the system
>
> You are going to get stuck in the exact same problem again.
>
> I have sent a separate patch and cc'd you. Can you please verify that
> it works for you ?
>

Thanks for the patch, i tested it and it is working well.

Thanks.