Presented fixes are a response for problem described below:
http://thread.gmane.org/gmane.linux.kernel/1793821/match=thermal+core+fix+initialize+max_state+variable+0
In short - it turned out that two trivial fixes (included in this patch set)
require support for deferred probe in thermal drivers.
This situation shows up when CPU frequency reduction is used as a thermal cooling
device for a thermal zone.
It happens that during initialization, the call to thermal probe will be executed
before cpufreq probe (it can be observed at ./drivers/Makefile).
In such a situation thermal will not be properly configured until cpufreq policy
is setup.
In the current code (without included fixes) there is a time window in which thermal
can try to use not configured cpufreq and possibly crash the system.
Proposed solution was based on the code already available in the imx_thermal.c file.
/db8500_thermal.c: -> NOT NEEDED
/intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
/intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
/ti-soc-thermal/ti-bandgap.c: -> FIXED [omap2plus_defconfig]
/dove_thermal.c: -> NOT NEEDED - CPU_COOLING NOT AVAILABLE
[dove_defconfig]
/spear_thermal.c: -> FIXED [spear3xx_defconfig]
/samsung/exynos_tmu.c: -> NOT NEEDED (nasty hack - will be reworked in later patches)
/imx_thermal.c: -> OK (deferred probe already in place)
/int340x_thermal/int3402_thermal.c: -> NOT NEEDED - ACPI x86 - Intel specific
/int340x_thermal/int3400_thermal.c: -> NOT NEEDED - ACPI x86 - Intel specific
/tegra_soctherm.c: -> FIXED [tegra_defconfig]
/kirkwood_thermal.c: -> FIXED [multi_v5_defconfig]
/armada_thermal.c: -> FIXED [multi_v7_defconfig]
/rcar_thermal.c: -> FIXED [shmobile_defconfig]
/db8500_cpufreq_cooling.c: -> OK (deferred probe already in place) [multi_v7_defconfig]
/st/st_thermal_syscfg.c: -> NOT NEEDED (Those two are enabled by e.g. ARMADA)
/st/st_thermal_memmap.c:
I only possess Exynos boards and Beagle Bone Black, so I'd be grateful for
testing proposed solution on other boards. The posted code is compile tested.
This code applies on Eduardo's ti-soc-thermal-next tree:
SHA1: 208a97042d66d9bfbcfab0d4a00c9fe317bb73d3
Lukasz Majewski (8):
thermal:cpu cooling:armada: Provide deferred probing for armada driver
thermal:cpu cooling:kirkwood: Provide deferred probing for kirkwood
driver
thermal:cpu cooling:rcar: Provide deferred probing for rcar driver
thermal:cpu cooling:spear: Provide deferred probing for spear driver
thermal:cpu cooling:tegra: Provide deferred probing for tegra driver
thermal:cpu cooling:ti: Provide deferred probing for ti drivers
thermal:core:fix: Initialize the max_state variable to 0
thermal:core:fix: Check return code of the ->get_max_state() callback
drivers/thermal/armada_thermal.c | 7 +++++++
drivers/thermal/kirkwood_thermal.c | 7 +++++++
drivers/thermal/rcar_thermal.c | 7 +++++++
drivers/thermal/spear_thermal.c | 7 +++++++
drivers/thermal/tegra_soctherm.c | 7 +++++++
drivers/thermal/thermal_core.c | 8 +++++---
drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
7 files changed, 47 insertions(+), 3 deletions(-)
--
2.0.0.rc2
When CPU freq is used as a thermal zone cooling device, one needs to wait
until cpufreq subsystem is properly initialized.
This code is similar to the one already available in imx_thermal.c file.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/armada_thermal.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/thermal/armada_thermal.c b/drivers/thermal/armada_thermal.c
index 9d1420a..1b5d15d 100644
--- a/drivers/thermal/armada_thermal.c
+++ b/drivers/thermal/armada_thermal.c
@@ -23,6 +23,7 @@
#include <linux/platform_device.h>
#include <linux/of_device.h>
#include <linux/thermal.h>
+#include <linux/cpufreq.h>
#define THERMAL_VALID_MASK 0x1
@@ -280,6 +281,12 @@ static int armada_thermal_probe(struct platform_device *pdev)
struct armada_thermal_priv *priv;
struct resource *res;
+#ifdef CONFIG_CPU_THERMAL
+ if (!cpufreq_get_current_driver()) {
+ dev_dbg(&pdev->dev, "no cpufreq driver!");
+ return -EPROBE_DEFER;
+ }
+#endif
match = of_match_device(armada_thermal_id_table, &pdev->dev);
if (!match)
return -ENODEV;
--
2.0.0.rc2
When CPU freq is used as a thermal zone cooling device, one needs to wait
until cpufreq subsystem is properly initialized.
This code is similar to the one already available in imx_thermal.c file.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/rcar_thermal.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
index 8803e69..b268b4d 100644
--- a/drivers/thermal/rcar_thermal.c
+++ b/drivers/thermal/rcar_thermal.c
@@ -29,6 +29,7 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/thermal.h>
+#include <linux/cpufreq.h>
#define IDLE_INTERVAL 5000
@@ -373,6 +374,12 @@ static int rcar_thermal_probe(struct platform_device *pdev)
int ret = -ENODEV;
int idle = IDLE_INTERVAL;
+#ifdef CONFIG_CPU_THERMAL
+ if (!cpufreq_get_current_driver()) {
+ dev_dbg(&pdev->dev, "no cpufreq driver!");
+ return -EPROBE_DEFER;
+ }
+#endif
common = devm_kzalloc(dev, sizeof(*common), GFP_KERNEL);
if (!common)
return -ENOMEM;
--
2.0.0.rc2
When CPU freq is used as a thermal zone cooling device, one needs to wait
until cpufreq subsystem is properly initialized.
This code is similar to the one already available in imx_thermal.c file.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/spear_thermal.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/thermal/spear_thermal.c b/drivers/thermal/spear_thermal.c
index 1e2193f..fd8fadf5 100644
--- a/drivers/thermal/spear_thermal.c
+++ b/drivers/thermal/spear_thermal.c
@@ -24,6 +24,7 @@
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/thermal.h>
+#include <linux/cpufreq.h>
#define MD_FACTOR 1000
@@ -107,6 +108,12 @@ static int spear_thermal_probe(struct platform_device *pdev)
struct resource *res;
int ret = 0, val;
+#ifdef CONFIG_CPU_THERMAL
+ if (!cpufreq_get_current_driver()) {
+ dev_dbg(&pdev->dev, "no cpufreq driver!");
+ return -EPROBE_DEFER;
+ }
+#endif
if (!np || !of_property_read_u32(np, "st,thermal-flags", &val)) {
dev_err(&pdev->dev, "Failed: DT Pdata not passed\n");
return -EINVAL;
--
2.0.0.rc2
When CPU freq is used as a thermal zone cooling device, one needs to wait
until cpufreq subsystem is properly initialized.
This code is similar to the one already available in imx_thermal.c file.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/kirkwood_thermal.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/thermal/kirkwood_thermal.c b/drivers/thermal/kirkwood_thermal.c
index 3b034a0..e938291 100644
--- a/drivers/thermal/kirkwood_thermal.c
+++ b/drivers/thermal/kirkwood_thermal.c
@@ -21,6 +21,7 @@
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/thermal.h>
+#include <linux/cpufreq.h>
#define KIRKWOOD_THERMAL_VALID_OFFSET 9
#define KIRKWOOD_THERMAL_VALID_MASK 0x1
@@ -75,6 +76,12 @@ static int kirkwood_thermal_probe(struct platform_device *pdev)
struct kirkwood_thermal_priv *priv;
struct resource *res;
+#ifdef CONFIG_CPU_THERMAL
+ if (!cpufreq_get_current_driver()) {
+ dev_dbg(&pdev->dev, "no cpufreq driver!");
+ return -EPROBE_DEFER;
+ }
+#endif
priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
--
2.0.0.rc2
When CPU freq is used as a thermal zone cooling device, one needs to wait
until cpufreq subsystem is properly initialized.
This code is similar to the one already available in imx_thermal.c file.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/tegra_soctherm.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/thermal/tegra_soctherm.c b/drivers/thermal/tegra_soctherm.c
index 70f7e9e..9c5aaa4 100644
--- a/drivers/thermal/tegra_soctherm.c
+++ b/drivers/thermal/tegra_soctherm.c
@@ -26,6 +26,7 @@
#include <linux/platform_device.h>
#include <linux/reset.h>
#include <linux/thermal.h>
+#include <linux/cpufreq.h>
#include <soc/tegra/fuse.h>
@@ -346,6 +347,12 @@ static int tegra_soctherm_probe(struct platform_device *pdev)
const struct tegra_tsensor *tsensors = t124_tsensors;
+#ifdef CONFIG_CPU_THERMAL
+ if (!cpufreq_get_current_driver()) {
+ dev_dbg(&pdev->dev, "no cpufreq driver!");
+ return -EPROBE_DEFER;
+ }
+#endif
tegra = devm_kzalloc(&pdev->dev, sizeof(*tegra), GFP_KERNEL);
if (!tegra)
return -ENOMEM;
--
2.0.0.rc2
Pointer to the uninitialized max_state variable is passed to get the
maximal cooling state.
For CPU cooling device (cpu_cooling.c) the cpufreq_get_max_state() is
called, which even when error occurs will return the max_state variable
unchanged.
Since error for ->get_max_state() is not checked, the automatically
allocated value of max_state is used for (upper > max_state) comparison.
For any possible max_state value it is very unlikely that it will be less
than upper.
As a consequence, the cooling device is bind even without the backed
cpufreq table initialized.
This initialization will prevent from accidental binding trip points to
cpu freq cooling frequencies when cpufreq driver itself is not yet fully
initialized.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/thermal_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 43b9070..413daa4 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -927,7 +927,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
struct thermal_instance *pos;
struct thermal_zone_device *pos1;
struct thermal_cooling_device *pos2;
- unsigned long max_state;
+ unsigned long max_state = 0;
int result;
if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))
--
2.0.0.rc2
When CPU freq is used as a thermal zone cooling device, one needs to wait
until cpufreq subsystem is properly initialized.
This code is similar to the one already available in imx_thermal.c file.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
index 634b6ce..0ac5ead 100644
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -40,6 +40,7 @@
#include <linux/of_irq.h>
#include <linux/of_gpio.h>
#include <linux/io.h>
+#include <linux/cpufreq.h>
#include "ti-bandgap.h"
@@ -1196,6 +1197,12 @@ int ti_bandgap_probe(struct platform_device *pdev)
struct ti_bandgap *bgp;
int clk_rate, ret = 0, i;
+#ifdef CONFIG_CPU_THERMAL
+ if (!cpufreq_get_current_driver()) {
+ dev_dbg(&pdev->dev, "no cpufreq driver!");
+ return -EPROBE_DEFER;
+ }
+#endif
bgp = ti_bandgap_build(pdev);
if (IS_ERR(bgp)) {
dev_err(&pdev->dev, "failed to fetch platform data\n");
--
2.0.0.rc2
Without this check it was possible to execute cooling device binding code
even when wrong max cooling state was read.
Signed-off-by: Lukasz Majewski <[email protected]>
---
drivers/thermal/thermal_core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 413daa4..c1ddef1 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -928,7 +928,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
struct thermal_zone_device *pos1;
struct thermal_cooling_device *pos2;
unsigned long max_state = 0;
- int result;
+ int result, ret;
if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))
return -EINVAL;
@@ -945,7 +945,9 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
if (tz != pos1 || cdev != pos2)
return -EINVAL;
- cdev->ops->get_max_state(cdev, &max_state);
+ ret = cdev->ops->get_max_state(cdev, &max_state);
+ if (ret)
+ return ret;
/* lower default 0, upper default max_state */
lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
--
2.0.0.rc2
Tested-by: Mikko Perttunen <[email protected]>
One potential issue I can see is that if the cpufreq driver fails to
probe then you'll never get the thermal driver either. For example,
Tegra124 currently has no cpufreq driver, so if CONFIG_CPU_THERMAL was
enabled, then the soctherm driver would never be able to probe. But I
don't really have a solution for this either.
Cheers,
Mikko
On 11/13/2014 07:02 PM, Lukasz Majewski wrote:
> When CPU freq is used as a thermal zone cooling device, one needs to wait
> until cpufreq subsystem is properly initialized.
>
> This code is similar to the one already available in imx_thermal.c file.
>
> Signed-off-by: Lukasz Majewski <[email protected]>
> ---
> drivers/thermal/tegra_soctherm.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/thermal/tegra_soctherm.c b/drivers/thermal/tegra_soctherm.c
> index 70f7e9e..9c5aaa4 100644
> --- a/drivers/thermal/tegra_soctherm.c
> +++ b/drivers/thermal/tegra_soctherm.c
> @@ -26,6 +26,7 @@
> #include <linux/platform_device.h>
> #include <linux/reset.h>
> #include <linux/thermal.h>
> +#include <linux/cpufreq.h>
>
> #include <soc/tegra/fuse.h>
>
> @@ -346,6 +347,12 @@ static int tegra_soctherm_probe(struct platform_device *pdev)
>
> const struct tegra_tsensor *tsensors = t124_tsensors;
>
> +#ifdef CONFIG_CPU_THERMAL
> + if (!cpufreq_get_current_driver()) {
> + dev_dbg(&pdev->dev, "no cpufreq driver!");
> + return -EPROBE_DEFER;
> + }
> +#endif
> tegra = devm_kzalloc(&pdev->dev, sizeof(*tegra), GFP_KERNEL);
> if (!tegra)
> return -ENOMEM;
>
Hi Mikko,
> Tested-by: Mikko Perttunen <[email protected]>
Thanks for testing.
>
> One potential issue I can see is that if the cpufreq driver fails to
> probe then you'll never get the thermal driver either. For example,
> Tegra124 currently has no cpufreq driver, so if CONFIG_CPU_THERMAL
> was enabled, then the soctherm driver would never be able to probe.
Yes, this is a potential issue. However, this option in tegra_defconfig
is by default disabled when thermal is enabled.
> But I don't really have a solution for this either.
Ok. I see.
>
> Cheers,
> Mikko
>
> On 11/13/2014 07:02 PM, Lukasz Majewski wrote:
> > When CPU freq is used as a thermal zone cooling device, one needs
> > to wait until cpufreq subsystem is properly initialized.
> >
> > This code is similar to the one already available in imx_thermal.c
> > file.
> >
> > Signed-off-by: Lukasz Majewski <[email protected]>
> > ---
> > drivers/thermal/tegra_soctherm.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/thermal/tegra_soctherm.c
> > b/drivers/thermal/tegra_soctherm.c index 70f7e9e..9c5aaa4 100644
> > --- a/drivers/thermal/tegra_soctherm.c
> > +++ b/drivers/thermal/tegra_soctherm.c
> > @@ -26,6 +26,7 @@
> > #include <linux/platform_device.h>
> > #include <linux/reset.h>
> > #include <linux/thermal.h>
> > +#include <linux/cpufreq.h>
> >
> > #include <soc/tegra/fuse.h>
> >
> > @@ -346,6 +347,12 @@ static int tegra_soctherm_probe(struct
> > platform_device *pdev)
> >
> > const struct tegra_tsensor *tsensors = t124_tsensors;
> >
> > +#ifdef CONFIG_CPU_THERMAL
> > + if (!cpufreq_get_current_driver()) {
> > + dev_dbg(&pdev->dev, "no cpufreq driver!");
> > + return -EPROBE_DEFER;
> > + }
> > +#endif
> > tegra = devm_kzalloc(&pdev->dev, sizeof(*tegra),
> > GFP_KERNEL); if (!tegra)
> > return -ENOMEM;
> >
>
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
On Thu, Nov 13, 2014 at 06:02:42PM +0100, Lukasz Majewski wrote:
> When CPU freq is used as a thermal zone cooling device, one needs to wait
> until cpufreq subsystem is properly initialized.
>
> This code is similar to the one already available in imx_thermal.c file.
>
> Signed-off-by: Lukasz Majewski <[email protected]>
> ---
> drivers/thermal/tegra_soctherm.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/thermal/tegra_soctherm.c b/drivers/thermal/tegra_soctherm.c
> index 70f7e9e..9c5aaa4 100644
> --- a/drivers/thermal/tegra_soctherm.c
> +++ b/drivers/thermal/tegra_soctherm.c
> @@ -26,6 +26,7 @@
> #include <linux/platform_device.h>
> #include <linux/reset.h>
> #include <linux/thermal.h>
> +#include <linux/cpufreq.h>
>
> #include <soc/tegra/fuse.h>
>
> @@ -346,6 +347,12 @@ static int tegra_soctherm_probe(struct platform_device *pdev)
>
> const struct tegra_tsensor *tsensors = t124_tsensors;
>
> +#ifdef CONFIG_CPU_THERMAL
> + if (!cpufreq_get_current_driver()) {
> + dev_dbg(&pdev->dev, "no cpufreq driver!");
Shouldn't this rather be dev_err() or dev_warn() to give at least some
clue as to the cause?
Thierry
On Fri, Nov 14, 2014 at 12:47:33PM +0200, Mikko Perttunen wrote:
> Tested-by: Mikko Perttunen <[email protected]>
>
> One potential issue I can see is that if the cpufreq driver fails to probe
> then you'll never get the thermal driver either. For example, Tegra124
> currently has no cpufreq driver, so if CONFIG_CPU_THERMAL was enabled, then
> the soctherm driver would never be able to probe. But I don't really have a
> solution for this either.
It doesn't seem like there's any code whatsoever to deal with cpufreq
within the soctherm driver, so deferring probe based on something we're
not using anyway seems rather useless.
Thierry
Hi Thierry,
> On Fri, Nov 14, 2014 at 12:47:33PM +0200, Mikko Perttunen wrote:
> > Tested-by: Mikko Perttunen <[email protected]>
> >
> > One potential issue I can see is that if the cpufreq driver fails
> > to probe then you'll never get the thermal driver either. For
> > example, Tegra124 currently has no cpufreq driver, so if
> > CONFIG_CPU_THERMAL was enabled, then the soctherm driver would
> > never be able to probe. But I don't really have a solution for this
> > either.
>
> It doesn't seem like there's any code whatsoever to deal with cpufreq
> within the soctherm driver, so deferring probe based on something
> we're not using anyway seems rather useless.
So, If I understood you correctly - this patch is not needed in the
/tegra_soctherm.c:[tegra_defconfig] driver and can be safely omitted in
v2 of this driver.
Please note that I'm not aware of Tegra specific use cases and hence
I've prepared fixes for all potential candidates.
>
> Thierry
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
On Fri, Nov 14, 2014 at 12:24:37PM +0100, Lukasz Majewski wrote:
> Hi Mikko,
>
> > Tested-by: Mikko Perttunen <[email protected]>
>
> Thanks for testing.
>
> >
> > One potential issue I can see is that if the cpufreq driver fails to
> > probe then you'll never get the thermal driver either. For example,
> > Tegra124 currently has no cpufreq driver, so if CONFIG_CPU_THERMAL
> > was enabled, then the soctherm driver would never be able to probe.
>
> Yes, this is a potential issue. However, this option in tegra_defconfig
> is by default disabled when thermal is enabled.
Not everybody uses tegra_defconfig for their kernel builds. In fact I'd
imagine that the majority of kernels use a variant of multi_v7_defconfig
and therefore CPU_THERMAL might get enabled irrespective of any Tegra
support.
I think a better solution would be to add this check only when proper
support for CPU frequency based cooling is added. That is, when a call
to cpufreq_cooling_register() (or a variant thereof) is added.
But while at it, why not make it so that cpufreq_cooling_register()
detects if a cpufreq driver has been registered yet and propagate
-EPROBE_DEFER if necessary?
Thierry
On Mon, Nov 17, 2014 at 12:50:13PM +0100, Lukasz Majewski wrote:
> Hi Thierry,
>
> > On Fri, Nov 14, 2014 at 12:47:33PM +0200, Mikko Perttunen wrote:
> > > Tested-by: Mikko Perttunen <[email protected]>
> > >
> > > One potential issue I can see is that if the cpufreq driver fails
> > > to probe then you'll never get the thermal driver either. For
> > > example, Tegra124 currently has no cpufreq driver, so if
> > > CONFIG_CPU_THERMAL was enabled, then the soctherm driver would
> > > never be able to probe. But I don't really have a solution for this
> > > either.
> >
> > It doesn't seem like there's any code whatsoever to deal with cpufreq
> > within the soctherm driver, so deferring probe based on something
> > we're not using anyway seems rather useless.
>
> So, If I understood you correctly - this patch is not needed in the
> /tegra_soctherm.c:[tegra_defconfig] driver and can be safely omitted in
> v2 of this driver.
What I'm saying is that I don't think doing this mass conversion
wholesale is useful since none of the drivers register a cooling device
based on cpufreq. In other words: if you're not going to use a feature
there's no use testing for it.
Thierry
On 11/17/2014 01:43 PM, Thierry Reding wrote:
> On Fri, Nov 14, 2014 at 12:47:33PM +0200, Mikko Perttunen wrote:
>> Tested-by: Mikko Perttunen <[email protected]>
>>
>> One potential issue I can see is that if the cpufreq driver fails to probe
>> then you'll never get the thermal driver either. For example, Tegra124
>> currently has no cpufreq driver, so if CONFIG_CPU_THERMAL was enabled, then
>> the soctherm driver would never be able to probe. But I don't really have a
>> solution for this either.
>
> It doesn't seem like there's any code whatsoever to deal with cpufreq
> within the soctherm driver, so deferring probe based on something we're
> not using anyway seems rather useless.
>
> Thierry
>
My understanding is that there needs to be no code inside soctherm to
handle it, as the cpufreq driver (cpufreq-dt) will register a cooling
device that will then be bound to the soctherm sensors using the
of-thermal device tree properties. At this moment, however, we don't
have that cpufreq driver so this patch is still useless for Tegra.
Mikko
Hi Thierry,
> On Fri, Nov 14, 2014 at 12:24:37PM +0100, Lukasz Majewski wrote:
> > Hi Mikko,
> >
> > > Tested-by: Mikko Perttunen <[email protected]>
> >
> > Thanks for testing.
> >
> > >
> > > One potential issue I can see is that if the cpufreq driver fails
> > > to probe then you'll never get the thermal driver either. For
> > > example, Tegra124 currently has no cpufreq driver, so if
> > > CONFIG_CPU_THERMAL was enabled, then the soctherm driver would
> > > never be able to probe.
> >
> > Yes, this is a potential issue. However, this option in
> > tegra_defconfig is by default disabled when thermal is enabled.
>
> Not everybody uses tegra_defconfig for their kernel builds. In fact
> I'd imagine that the majority of kernels use a variant of
> multi_v7_defconfig and therefore CPU_THERMAL might get enabled
> irrespective of any Tegra support.
I see your point.
>
> I think a better solution would be to add this check only when proper
> support for CPU frequency based cooling is added. That is, when a call
> to cpufreq_cooling_register() (or a variant thereof) is added.
For the above reason the code is only compiled in when user enable
CONFIG_CPU_THERMAL.
>
> But while at it, why not make it so that cpufreq_cooling_register()
> detects if a cpufreq driver has been registered yet and propagate
> -EPROBE_DEFER if necessary?
cpufreq_cooling_register() may be a good place to add check for
deferred probe.
The problem with cpufreq_cooling_register() is that it may be called
late in the probe function (as it is done now in Exynos).
>
> Thierry
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
Hi Thierry,
> On Mon, Nov 17, 2014 at 12:50:13PM +0100, Lukasz Majewski wrote:
> > Hi Thierry,
> >
> > > On Fri, Nov 14, 2014 at 12:47:33PM +0200, Mikko Perttunen wrote:
> > > > Tested-by: Mikko Perttunen <[email protected]>
> > > >
> > > > One potential issue I can see is that if the cpufreq driver
> > > > fails to probe then you'll never get the thermal driver either.
> > > > For example, Tegra124 currently has no cpufreq driver, so if
> > > > CONFIG_CPU_THERMAL was enabled, then the soctherm driver would
> > > > never be able to probe. But I don't really have a solution for
> > > > this either.
> > >
> > > It doesn't seem like there's any code whatsoever to deal with
> > > cpufreq within the soctherm driver, so deferring probe based on
> > > something we're not using anyway seems rather useless.
> >
> > So, If I understood you correctly - this patch is not needed in the
> > /tegra_soctherm.c:[tegra_defconfig] driver and can be safely
> > omitted in v2 of this driver.
>
> What I'm saying is that I don't think doing this mass conversion
> wholesale is useful since none of the drivers register a cooling
> device based on cpufreq. In other words: if you're not going to use a
> feature there's no use testing for it.
>
It seems, like one option here would be to add deferred proble to
cpufreq_cooling_register() or check which driver in its thermal probe
is calling cpufreq_cooling_register() function.
The latter option explains why in the imx_thermal.c file we check for
deferred probe without #ifdefs for CONFIG_CPU_THERMAL.
If no objections, I would like to stick to the code already available
in imx_thermal.c.
> Thierry
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
On Mon, Nov 17, 2014 at 02:51:24PM +0200, Mikko Perttunen wrote:
> On 11/17/2014 01:43 PM, Thierry Reding wrote:
> >On Fri, Nov 14, 2014 at 12:47:33PM +0200, Mikko Perttunen wrote:
> >>Tested-by: Mikko Perttunen <[email protected]>
> >>
> >>One potential issue I can see is that if the cpufreq driver fails to probe
> >>then you'll never get the thermal driver either. For example, Tegra124
> >>currently has no cpufreq driver, so if CONFIG_CPU_THERMAL was enabled, then
> >>the soctherm driver would never be able to probe. But I don't really have a
> >>solution for this either.
> >
> >It doesn't seem like there's any code whatsoever to deal with cpufreq
> >within the soctherm driver, so deferring probe based on something we're
> >not using anyway seems rather useless.
> >
> >Thierry
> >
>
> My understanding is that there needs to be no code inside soctherm to handle
> it, as the cpufreq driver (cpufreq-dt) will register a cooling device that
> will then be bound to the soctherm sensors using the of-thermal device tree
> properties. At this moment, however, we don't have that cpufreq driver so
> this patch is still useless for Tegra.
But if the cpufreq driver will automatically do this already, why do we
even need to check for it in the soctherm driver?
Thierry
On Mon, Nov 17, 2014 at 01:51:42PM +0100, Lukasz Majewski wrote:
> Hi Thierry,
>
> > On Fri, Nov 14, 2014 at 12:24:37PM +0100, Lukasz Majewski wrote:
> > > Hi Mikko,
> > >
> > > > Tested-by: Mikko Perttunen <[email protected]>
> > >
> > > Thanks for testing.
> > >
> > > >
> > > > One potential issue I can see is that if the cpufreq driver fails
> > > > to probe then you'll never get the thermal driver either. For
> > > > example, Tegra124 currently has no cpufreq driver, so if
> > > > CONFIG_CPU_THERMAL was enabled, then the soctherm driver would
> > > > never be able to probe.
> > >
> > > Yes, this is a potential issue. However, this option in
> > > tegra_defconfig is by default disabled when thermal is enabled.
> >
> > Not everybody uses tegra_defconfig for their kernel builds. In fact
> > I'd imagine that the majority of kernels use a variant of
> > multi_v7_defconfig and therefore CPU_THERMAL might get enabled
> > irrespective of any Tegra support.
>
> I see your point.
>
> >
> > I think a better solution would be to add this check only when proper
> > support for CPU frequency based cooling is added. That is, when a call
> > to cpufreq_cooling_register() (or a variant thereof) is added.
>
> For the above reason the code is only compiled in when user enable
> CONFIG_CPU_THERMAL.
Like I said, CPU_THERMAL is a kernel-wide configuration option and not
tied to the specific SoC. Therefore if an SoC, say Exynos, gains full
support for cooling via cpufreq then somebody may decide to enable the
CPU_THERMAL option in multi_v7_defconfig. At that point every thermal
driver that you're patching in this way but for which no cpufreq driver
is registered will indefinitely defer probing.
So I see two options:
1) make sure that we defer probing only on devices where a cpufreq
driver is available
2) not rely on deferred probing at all but allow a cooling device to
be registered after the thermal driver has finished probing
1) seems impossible to do because cpufreq simply doesn't provide enough
infrastructure to deal with that situation.
Thierry
On 11/17/2014 03:08 PM, Thierry Reding wrote:
> On Mon, Nov 17, 2014 at 02:51:24PM +0200, Mikko Perttunen wrote:
>> On 11/17/2014 01:43 PM, Thierry Reding wrote:
>>> On Fri, Nov 14, 2014 at 12:47:33PM +0200, Mikko Perttunen wrote:
>>>> Tested-by: Mikko Perttunen <[email protected]>
>>>>
>>>> One potential issue I can see is that if the cpufreq driver fails to probe
>>>> then you'll never get the thermal driver either. For example, Tegra124
>>>> currently has no cpufreq driver, so if CONFIG_CPU_THERMAL was enabled, then
>>>> the soctherm driver would never be able to probe. But I don't really have a
>>>> solution for this either.
>>>
>>> It doesn't seem like there's any code whatsoever to deal with cpufreq
>>> within the soctherm driver, so deferring probe based on something we're
>>> not using anyway seems rather useless.
>>>
>>> Thierry
>>>
>>
>> My understanding is that there needs to be no code inside soctherm to handle
>> it, as the cpufreq driver (cpufreq-dt) will register a cooling device that
>> will then be bound to the soctherm sensors using the of-thermal device tree
>> properties. At this moment, however, we don't have that cpufreq driver so
>> this patch is still useless for Tegra.
>
> But if the cpufreq driver will automatically do this already, why do we
> even need to check for it in the soctherm driver?
>
> Thierry
>
Indeed, we shouldn't. Unless I am mistaken, the issue is then that the
cpufreq cooling device calls thermal_cooling_device_register before
being ready to handle callbacks, which clearly would be an issue in the
cpufreq driver.
The thermal core seems to able to handle registrations of thermal zones
and cooling devices in any order; AFAICT it defers binding the tz<->cdev
mapping until both have registered themselves to the thermal core.
Mikko
The return code from ->get_max_state() callback was not checked during
binding cooling device to thermal zone device.
Signed-off-by: Lukasz Majewski <[email protected]>
---
Changes for v2:
- It turned out that patches from 1 to 6 from v1 are not needed, since
they either already solve the problem (like imx_thermal.c) or not use
cpufreq as a thermal cooling device.
- The patch 7 from v1 is also not needed since this patch on error exits
this function without using max_state variable.
- In thermal driver probe the cpufreq_cooling_register() method presence
is crucial to evaluate if the thermal driver needs any actions with
-EPROBE_DEFER.
---
drivers/thermal/thermal_core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 43b9070..8567929 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -928,7 +928,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
struct thermal_zone_device *pos1;
struct thermal_cooling_device *pos2;
unsigned long max_state;
- int result;
+ int result, ret;
if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))
return -EINVAL;
@@ -945,7 +945,9 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
if (tz != pos1 || cdev != pos2)
return -EINVAL;
- cdev->ops->get_max_state(cdev, &max_state);
+ ret = cdev->ops->get_max_state(cdev, &max_state);
+ if (ret)
+ return ret;
/* lower default 0, upper default max_state */
lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
--
2.0.0.rc2
Lukasz,
Thanks for the keeping this up. And apologize for late answer.
On Thu, Nov 13, 2014 at 06:02:37PM +0100, Lukasz Majewski wrote:
> Presented fixes are a response for problem described below:
> http://thread.gmane.org/gmane.linux.kernel/1793821/match=thermal+core+fix+initialize+max_state+variable+0
>
> In short - it turned out that two trivial fixes (included in this patch set)
> require support for deferred probe in thermal drivers.
>
> This situation shows up when CPU frequency reduction is used as a thermal cooling
> device for a thermal zone.
> It happens that during initialization, the call to thermal probe will be executed
> before cpufreq probe (it can be observed at ./drivers/Makefile).
> In such a situation thermal will not be properly configured until cpufreq policy
> is setup.
>
> In the current code (without included fixes) there is a time window in which thermal
> can try to use not configured cpufreq and possibly crash the system.
>
>
> Proposed solution was based on the code already available in the imx_thermal.c file.
>
> /db8500_thermal.c: -> NOT NEEDED
> /intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
> /intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
> /ti-soc-thermal/ti-bandgap.c: -> FIXED [omap2plus_defconfig]
> /dove_thermal.c: -> NOT NEEDED - CPU_COOLING NOT AVAILABLE
> [dove_defconfig]
> /spear_thermal.c: -> FIXED [spear3xx_defconfig]
> /samsung/exynos_tmu.c: -> NOT NEEDED (nasty hack - will be reworked in later patches)
> /imx_thermal.c: -> OK (deferred probe already in place)
> /int340x_thermal/int3402_thermal.c: -> NOT NEEDED - ACPI x86 - Intel specific
> /int340x_thermal/int3400_thermal.c: -> NOT NEEDED - ACPI x86 - Intel specific
> /tegra_soctherm.c: -> FIXED [tegra_defconfig]
> /kirkwood_thermal.c: -> FIXED [multi_v5_defconfig]
> /armada_thermal.c: -> FIXED [multi_v7_defconfig]
> /rcar_thermal.c: -> FIXED [shmobile_defconfig]
> /db8500_cpufreq_cooling.c: -> OK (deferred probe already in place) [multi_v7_defconfig]
> /st/st_thermal_syscfg.c: -> NOT NEEDED (Those two are enabled by e.g. ARMADA)
> /st/st_thermal_memmap.c:
>
>
Instead of doing the same check on all drivers in the need for cpu
cooling looks like a promiscuous solution. What if we do this check in
cpu cooling itself and we propagate the error in callers code?
From what I see, only exynos does not propagate the error. And we would
need a tweak in the cpufreq-dt code. Something like the following
(not tested):
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index f657c57..f139247 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -181,7 +181,6 @@ static int cpufreq_init(struct cpufreq_policy *policy)
{
struct cpufreq_dt_platform_data *pd;
struct cpufreq_frequency_table *freq_table;
- struct thermal_cooling_device *cdev;
struct device_node *np;
struct private_data *priv;
struct device *cpu_dev;
@@ -264,20 +263,6 @@ static int cpufreq_init(struct cpufreq_policy *policy)
goto out_free_priv;
}
- /*
- * For now, just loading the cooling device;
- * thermal DT code takes care of matching them.
- */
- if (of_find_property(np, "#cooling-cells", NULL)) {
- cdev = of_cpufreq_cooling_register(np, cpu_present_mask);
- if (IS_ERR(cdev))
- dev_err(cpu_dev,
- "running cpufreq without cooling device: %ld\n",
- PTR_ERR(cdev));
- else
- priv->cdev = cdev;
- }
-
priv->cpu_dev = cpu_dev;
priv->cpu_reg = cpu_reg;
policy->driver_data = priv;
@@ -287,7 +272,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
if (ret) {
dev_err(cpu_dev, "%s: invalid frequency table: %d\n", __func__,
ret);
- goto out_cooling_unregister;
+ goto free_table;
}
policy->cpuinfo.transition_latency = transition_latency;
@@ -300,8 +285,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
return 0;
-out_cooling_unregister:
- cpufreq_cooling_unregister(priv->cdev);
+free_table:
dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table);
out_free_priv:
kfree(priv);
@@ -342,11 +326,14 @@ static struct cpufreq_driver dt_cpufreq_driver = {
static int dt_cpufreq_probe(struct platform_device *pdev)
{
+ struct device_node *np;
struct device *cpu_dev;
struct regulator *cpu_reg;
struct clk *cpu_clk;
int ret;
+ /* at this point we checked the pointer already right? */
+ np = of_node_get(pdev->dev.of_node);
/*
* All per-cluster (CPUs sharing clock/voltages) initialization is done
* from ->init(). In probe(), we just need to make sure that clk and
@@ -368,6 +355,28 @@ static int dt_cpufreq_probe(struct platform_device *pdev)
if (ret)
dev_err(cpu_dev, "failed register driver: %d\n", ret);
+ /*
+ * For now, just loading the cooling device;
+ * thermal DT code takes care of matching them.
+ */
+ if (of_find_property(np, "#cooling-cells", NULL)) {
+ struct cpufreq_policy policy;
+ struct private_data *priv;
+ struct thermal_cooling_device *cdev;
+
+ /* TODO: can cpu0 be always used ? */
+ cpufreq_get_policy(&policy, 0);
+ priv = policy.driver_data;
+ cdev = of_cpufreq_cooling_register(np, cpu_present_mask);
+ if (IS_ERR(cdev))
+ dev_err(cpu_dev,
+ "running cpufreq without cooling device: %ld\n",
+ PTR_ERR(cdev));
+ else
+ priv->cdev = cdev;
+ }
+ of_node_put(np);
+
return ret;
}
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 1ab0018..342eb9e 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -440,6 +440,11 @@ __cpufreq_cooling_register(struct device_node *np,
int ret = 0, i;
struct cpufreq_policy policy;
+ if (!cpufreq_get_current_driver()) {
+ dev_warn(&pdev->dev, "no cpufreq driver, deferring.");
+ return -EPROBE_DEFER;
+ }
+
/* Verify that all the clip cpus have same freq_min, freq_max limit */
for_each_cpu(i, clip_cpus) {
/* continue if cpufreq policy not found and not return error */
diff --git a/drivers/thermal/samsung/exynos_thermal_common.c b/drivers/thermal/samsung/exynos_thermal_common.c
index 3f5ad25..f84975e 100644
--- a/drivers/thermal/samsung/exynos_thermal_common.c
+++ b/drivers/thermal/samsung/exynos_thermal_common.c
@@ -373,7 +373,7 @@ int exynos_register_thermal(struct thermal_sensor_conf *sensor_conf)
if (IS_ERR(th_zone->cool_dev[th_zone->cool_dev_size])) {
dev_err(sensor_conf->dev,
"Failed to register cpufreq cooling device\n");
- ret = -EINVAL;
+ ret = PTR_ERR(th_zone->cool_dev[th_zone->cool_dev_size]);
goto err_unregister;
}
th_zone->cool_dev_size++;
The above way, we avoid having same test in every driver that needs it.
Besides, it makes sense the cpu_cooling code takes care of this check,
as it is the very first part that has direct dependency with cpufreq.
> I only possess Exynos boards and Beagle Bone Black, so I'd be grateful for
> testing proposed solution on other boards. The posted code is compile tested.
>
> This code applies on Eduardo's ti-soc-thermal-next tree:
> SHA1: 208a97042d66d9bfbcfab0d4a00c9fe317bb73d3
>
> Lukasz Majewski (8):
> thermal:cpu cooling:armada: Provide deferred probing for armada driver
> thermal:cpu cooling:kirkwood: Provide deferred probing for kirkwood
> driver
> thermal:cpu cooling:rcar: Provide deferred probing for rcar driver
> thermal:cpu cooling:spear: Provide deferred probing for spear driver
> thermal:cpu cooling:tegra: Provide deferred probing for tegra driver
> thermal:cpu cooling:ti: Provide deferred probing for ti drivers
> thermal:core:fix: Initialize the max_state variable to 0
> thermal:core:fix: Check return code of the ->get_max_state() callback
>
> drivers/thermal/armada_thermal.c | 7 +++++++
> drivers/thermal/kirkwood_thermal.c | 7 +++++++
> drivers/thermal/rcar_thermal.c | 7 +++++++
> drivers/thermal/spear_thermal.c | 7 +++++++
> drivers/thermal/tegra_soctherm.c | 7 +++++++
> drivers/thermal/thermal_core.c | 8 +++++---
> drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
> 7 files changed, 47 insertions(+), 3 deletions(-)
>
> --
> 2.0.0.rc2
>
Lukaz,
On Thu, Nov 13, 2014 at 06:02:43PM +0100, Lukasz Majewski wrote:
> When CPU freq is used as a thermal zone cooling device, one needs to wait
> until cpufreq subsystem is properly initialized.
>
> This code is similar to the one already available in imx_thermal.c file.
>
> Signed-off-by: Lukasz Majewski <[email protected]>
> ---
> drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> index 634b6ce..0ac5ead 100644
> --- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> +++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
> @@ -40,6 +40,7 @@
> #include <linux/of_irq.h>
> #include <linux/of_gpio.h>
> #include <linux/io.h>
> +#include <linux/cpufreq.h>
>
> #include "ti-bandgap.h"
>
> @@ -1196,6 +1197,12 @@ int ti_bandgap_probe(struct platform_device *pdev)
> struct ti_bandgap *bgp;
> int clk_rate, ret = 0, i;
>
> +#ifdef CONFIG_CPU_THERMAL
> + if (!cpufreq_get_current_driver()) {
> + dev_dbg(&pdev->dev, "no cpufreq driver!");
> + return -EPROBE_DEFER;
> + }
> +#endif
This is part of drivers/thermal/ti-soc-thermal/ti-thermal-common.c. So,
no need to duplicate it in here.
> bgp = ti_bandgap_build(pdev);
> if (IS_ERR(bgp)) {
> dev_err(&pdev->dev, "failed to fetch platform data\n");
> --
> 2.0.0.rc2
>
Hi Eduardo,
> Lukasz,
>
> Thanks for the keeping this up. And apologize for late answer.
I've already posted v2 of this patch set (which consists of only one
patch :-) ).
Thanks to Thierry Reding's hint, I've realized that I don't need to add
code from patches 1-6 from v1.
Please instead review following patch:
"thermal:core:fix: Check return code of the ->get_max_state() callback"
https://patchwork.kernel.org/patch/5326991/
>
> On Thu, Nov 13, 2014 at 06:02:37PM +0100, Lukasz Majewski wrote:
> > Presented fixes are a response for problem described below:
> > http://thread.gmane.org/gmane.linux.kernel/1793821/match=thermal+core+fix+initialize+max_state+variable+0
> >
> > In short - it turned out that two trivial fixes (included in this
> > patch set) require support for deferred probe in thermal drivers.
> >
> > This situation shows up when CPU frequency reduction is used as a
> > thermal cooling device for a thermal zone.
> > It happens that during initialization, the call to thermal probe
> > will be executed before cpufreq probe (it can be observed
> > at ./drivers/Makefile). In such a situation thermal will not be
> > properly configured until cpufreq policy is setup.
> >
> > In the current code (without included fixes) there is a time window
> > in which thermal can try to use not configured cpufreq and possibly
> > crash the system.
> >
> >
> > Proposed solution was based on the code already available in the
> > imx_thermal.c file.
> >
> > /db8500_thermal.c: -> NOT NEEDED
> > /intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
> > /intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
> > /ti-soc-thermal/ti-bandgap.c: -> FIXED
> > [omap2plus_defconfig] /dove_thermal.c: ->
> > NOT NEEDED - CPU_COOLING NOT AVAILABLE [dove_defconfig]
> > /spear_thermal.c: -> FIXED
> > [spear3xx_defconfig] /samsung/exynos_tmu.c: -> NOT
> > NEEDED (nasty hack - will be reworked in later
> > patches) /imx_thermal.c: -> OK (deferred
> > probe already in place) /int340x_thermal/int3402_thermal.c: ->
> > NOT NEEDED - ACPI x86 - Intel
> > specific /int340x_thermal/int3400_thermal.c: -> NOT NEEDED -
> > ACPI x86 - Intel specific /tegra_soctherm.c:
> > -> FIXED [tegra_defconfig] /kirkwood_thermal.c:
> > -> FIXED
> > [multi_v5_defconfig] /armada_thermal.c: ->
> > FIXED [multi_v7_defconfig] /rcar_thermal.c:
> > -> FIXED
> > [shmobile_defconfig] /db8500_cpufreq_cooling.c: -> OK
> > (deferred probe already in place)
> > [multi_v7_defconfig] /st/st_thermal_syscfg.c: -> NOT
> > NEEDED (Those two are enabled by e.g.
> > ARMADA) /st/st_thermal_memmap.c:
> >
> >
>
>
> Instead of doing the same check on all drivers in the need for cpu
> cooling looks like a promiscuous solution. What if we do this check in
> cpu cooling itself and we propagate the error in callers code?
>
> From what I see, only exynos does not propagate the error. And we
> would need a tweak in the cpufreq-dt code. Something like the
> following (not tested):
>
> diff --git a/drivers/cpufreq/cpufreq-dt.c
> b/drivers/cpufreq/cpufreq-dt.c index f657c57..f139247 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -181,7 +181,6 @@ static int cpufreq_init(struct cpufreq_policy
> *policy) {
> struct cpufreq_dt_platform_data *pd;
> struct cpufreq_frequency_table *freq_table;
> - struct thermal_cooling_device *cdev;
> struct device_node *np;
> struct private_data *priv;
> struct device *cpu_dev;
> @@ -264,20 +263,6 @@ static int cpufreq_init(struct cpufreq_policy
> *policy) goto out_free_priv;
> }
>
> - /*
> - * For now, just loading the cooling device;
> - * thermal DT code takes care of matching them.
> - */
> - if (of_find_property(np, "#cooling-cells", NULL)) {
> - cdev = of_cpufreq_cooling_register(np,
> cpu_present_mask);
> - if (IS_ERR(cdev))
> - dev_err(cpu_dev,
> - "running cpufreq without cooling
> device: %ld\n",
> - PTR_ERR(cdev));
> - else
> - priv->cdev = cdev;
> - }
> -
> priv->cpu_dev = cpu_dev;
> priv->cpu_reg = cpu_reg;
> policy->driver_data = priv;
> @@ -287,7 +272,7 @@ static int cpufreq_init(struct cpufreq_policy
> *policy) if (ret) {
> dev_err(cpu_dev, "%s: invalid frequency table:
> %d\n", __func__, ret);
> - goto out_cooling_unregister;
> + goto free_table;
> }
>
> policy->cpuinfo.transition_latency = transition_latency;
> @@ -300,8 +285,7 @@ static int cpufreq_init(struct cpufreq_policy
> *policy)
> return 0;
>
> -out_cooling_unregister:
> - cpufreq_cooling_unregister(priv->cdev);
> +free_table:
> dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table);
> out_free_priv:
> kfree(priv);
> @@ -342,11 +326,14 @@ static struct cpufreq_driver dt_cpufreq_driver
> = {
> static int dt_cpufreq_probe(struct platform_device *pdev)
> {
> + struct device_node *np;
> struct device *cpu_dev;
> struct regulator *cpu_reg;
> struct clk *cpu_clk;
> int ret;
>
> + /* at this point we checked the pointer already right? */
> + np = of_node_get(pdev->dev.of_node);
> /*
> * All per-cluster (CPUs sharing clock/voltages)
> initialization is done
> * from ->init(). In probe(), we just need to make sure that
> clk and @@ -368,6 +355,28 @@ static int dt_cpufreq_probe(struct
> platform_device *pdev) if (ret)
> dev_err(cpu_dev, "failed register driver: %d\n",
> ret);
> + /*
> + * For now, just loading the cooling device;
> + * thermal DT code takes care of matching them.
> + */
> + if (of_find_property(np, "#cooling-cells", NULL)) {
> + struct cpufreq_policy policy;
> + struct private_data *priv;
> + struct thermal_cooling_device *cdev;
> +
> + /* TODO: can cpu0 be always used ? */
> + cpufreq_get_policy(&policy, 0);
> + priv = policy.driver_data;
> + cdev = of_cpufreq_cooling_register(np,
> cpu_present_mask);
> + if (IS_ERR(cdev))
> + dev_err(cpu_dev,
> + "running cpufreq without cooling
> device: %ld\n",
> + PTR_ERR(cdev));
> + else
> + priv->cdev = cdev;
> + }
> + of_node_put(np);
> +
> return ret;
> }
>
> diff --git a/drivers/thermal/cpu_cooling.c
> b/drivers/thermal/cpu_cooling.c index 1ab0018..342eb9e 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -440,6 +440,11 @@ __cpufreq_cooling_register(struct device_node
> *np, int ret = 0, i;
> struct cpufreq_policy policy;
>
> + if (!cpufreq_get_current_driver()) {
> + dev_warn(&pdev->dev, "no cpufreq driver,
> deferring.");
> + return -EPROBE_DEFER;
> + }
> +
> /* Verify that all the clip cpus have same freq_min,
> freq_max limit */ for_each_cpu(i, clip_cpus) {
> /* continue if cpufreq policy not found and not
> return error */ diff --git
> a/drivers/thermal/samsung/exynos_thermal_common.c
> b/drivers/thermal/samsung/exynos_thermal_common.c index
> 3f5ad25..f84975e 100644 ---
> a/drivers/thermal/samsung/exynos_thermal_common.c +++
> b/drivers/thermal/samsung/exynos_thermal_common.c @@ -373,7 +373,7 @@
> int exynos_register_thermal(struct thermal_sensor_conf *sensor_conf)
> if (IS_ERR(th_zone->cool_dev[th_zone->cool_dev_size]))
> { dev_err(sensor_conf->dev, "Failed to register cpufreq cooling
> device\n");
> - ret = -EINVAL;
> + ret =
> PTR_ERR(th_zone->cool_dev[th_zone->cool_dev_size]); goto
> err_unregister; }
> th_zone->cool_dev_size++;
>
>
> The above way, we avoid having same test in every driver that needs
> it. Besides, it makes sense the cpu_cooling code takes care of this
> check, as it is the very first part that has direct dependency with
> cpufreq.
>
> > I only possess Exynos boards and Beagle Bone Black, so I'd be
> > grateful for testing proposed solution on other boards. The posted
> > code is compile tested.
> >
> > This code applies on Eduardo's ti-soc-thermal-next tree:
> > SHA1: 208a97042d66d9bfbcfab0d4a00c9fe317bb73d3
> >
> > Lukasz Majewski (8):
> > thermal:cpu cooling:armada: Provide deferred probing for armada
> > driver thermal:cpu cooling:kirkwood: Provide deferred probing for
> > kirkwood driver
> > thermal:cpu cooling:rcar: Provide deferred probing for rcar driver
> > thermal:cpu cooling:spear: Provide deferred probing for spear
> > driver thermal:cpu cooling:tegra: Provide deferred probing for
> > tegra driver thermal:cpu cooling:ti: Provide deferred probing for
> > ti drivers thermal:core:fix: Initialize the max_state variable to 0
> > thermal:core:fix: Check return code of the ->get_max_state()
> > callback
> >
> > drivers/thermal/armada_thermal.c | 7 +++++++
> > drivers/thermal/kirkwood_thermal.c | 7 +++++++
> > drivers/thermal/rcar_thermal.c | 7 +++++++
> > drivers/thermal/spear_thermal.c | 7 +++++++
> > drivers/thermal/tegra_soctherm.c | 7 +++++++
> > drivers/thermal/thermal_core.c | 8 +++++---
> > drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
> > 7 files changed, 47 insertions(+), 3 deletions(-)
> >
> > --
> > 2.0.0.rc2
> >
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
Lukasz,
On Fri, Nov 21, 2014 at 09:33:48AM +0100, Lukasz Majewski wrote:
> Hi Eduardo,
>
> > Lukasz,
> >
> > Thanks for the keeping this up. And apologize for late answer.
>
> I've already posted v2 of this patch set (which consists of only one
> patch :-) ).
>
> Thanks to Thierry Reding's hint, I've realized that I don't need to add
> code from patches 1-6 from v1.
>
> Please instead review following patch:
> "thermal:core:fix: Check return code of the ->get_max_state() callback"
> https://patchwork.kernel.org/patch/5326991/
I see. If I got correctly, with the above patch, we still need to have
the check for cpufreq driver in the thermal driver right?
quoting:
"In thermal driver probe the cpufreq_cooling_register() method presence
is crucial to evaluate if the thermal driver needs any actions with
-EPROBE_DEFER."
If yes, that means the proposal still leaves to drivers to deal with
the sequencing. For the patch above, I believe it is fine. However, a
better sequencing is still needed :-(.
For the case of of-thermal based drivers, it should be dealt between
cpu_cooling and cpufreq, as I proposed, bellow. I really agree that
drivers should not care about this, and thus we should not spread the
check among drivers, specially if there is nothing regarding cpufreq in
the driver's code. I might send the proposal of having the check between
cpu_cooling and cpufreq as a formal patch, in a separated thread.
I will have a look in your v2. Briefly looking, looks reasonable.
Once again, thanks.
Cheers,
Eduardo Valentin
>
>
> >
> > On Thu, Nov 13, 2014 at 06:02:37PM +0100, Lukasz Majewski wrote:
> > > Presented fixes are a response for problem described below:
> > > http://thread.gmane.org/gmane.linux.kernel/1793821/match=thermal+core+fix+initialize+max_state+variable+0
> > >
> > > In short - it turned out that two trivial fixes (included in this
> > > patch set) require support for deferred probe in thermal drivers.
> > >
> > > This situation shows up when CPU frequency reduction is used as a
> > > thermal cooling device for a thermal zone.
> > > It happens that during initialization, the call to thermal probe
> > > will be executed before cpufreq probe (it can be observed
> > > at ./drivers/Makefile). In such a situation thermal will not be
> > > properly configured until cpufreq policy is setup.
> > >
> > > In the current code (without included fixes) there is a time window
> > > in which thermal can try to use not configured cpufreq and possibly
> > > crash the system.
> > >
> > >
> > > Proposed solution was based on the code already available in the
> > > imx_thermal.c file.
> > >
> > > /db8500_thermal.c: -> NOT NEEDED
> > > /intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
> > > /intel_powerclamp.c: -> NOT NEEDED - INTEL (x86)
> > > /ti-soc-thermal/ti-bandgap.c: -> FIXED
> > > [omap2plus_defconfig] /dove_thermal.c: ->
> > > NOT NEEDED - CPU_COOLING NOT AVAILABLE [dove_defconfig]
> > > /spear_thermal.c: -> FIXED
> > > [spear3xx_defconfig] /samsung/exynos_tmu.c: -> NOT
> > > NEEDED (nasty hack - will be reworked in later
> > > patches) /imx_thermal.c: -> OK (deferred
> > > probe already in place) /int340x_thermal/int3402_thermal.c: ->
> > > NOT NEEDED - ACPI x86 - Intel
> > > specific /int340x_thermal/int3400_thermal.c: -> NOT NEEDED -
> > > ACPI x86 - Intel specific /tegra_soctherm.c:
> > > -> FIXED [tegra_defconfig] /kirkwood_thermal.c:
> > > -> FIXED
> > > [multi_v5_defconfig] /armada_thermal.c: ->
> > > FIXED [multi_v7_defconfig] /rcar_thermal.c:
> > > -> FIXED
> > > [shmobile_defconfig] /db8500_cpufreq_cooling.c: -> OK
> > > (deferred probe already in place)
> > > [multi_v7_defconfig] /st/st_thermal_syscfg.c: -> NOT
> > > NEEDED (Those two are enabled by e.g.
> > > ARMADA) /st/st_thermal_memmap.c:
> > >
> > >
> >
> >
> > Instead of doing the same check on all drivers in the need for cpu
> > cooling looks like a promiscuous solution. What if we do this check in
> > cpu cooling itself and we propagate the error in callers code?
> >
> > From what I see, only exynos does not propagate the error. And we
> > would need a tweak in the cpufreq-dt code. Something like the
> > following (not tested):
> >
> > diff --git a/drivers/cpufreq/cpufreq-dt.c
> > b/drivers/cpufreq/cpufreq-dt.c index f657c57..f139247 100644
> > --- a/drivers/cpufreq/cpufreq-dt.c
> > +++ b/drivers/cpufreq/cpufreq-dt.c
> > @@ -181,7 +181,6 @@ static int cpufreq_init(struct cpufreq_policy
> > *policy) {
> > struct cpufreq_dt_platform_data *pd;
> > struct cpufreq_frequency_table *freq_table;
> > - struct thermal_cooling_device *cdev;
> > struct device_node *np;
> > struct private_data *priv;
> > struct device *cpu_dev;
> > @@ -264,20 +263,6 @@ static int cpufreq_init(struct cpufreq_policy
> > *policy) goto out_free_priv;
> > }
> >
> > - /*
> > - * For now, just loading the cooling device;
> > - * thermal DT code takes care of matching them.
> > - */
> > - if (of_find_property(np, "#cooling-cells", NULL)) {
> > - cdev = of_cpufreq_cooling_register(np,
> > cpu_present_mask);
> > - if (IS_ERR(cdev))
> > - dev_err(cpu_dev,
> > - "running cpufreq without cooling
> > device: %ld\n",
> > - PTR_ERR(cdev));
> > - else
> > - priv->cdev = cdev;
> > - }
> > -
> > priv->cpu_dev = cpu_dev;
> > priv->cpu_reg = cpu_reg;
> > policy->driver_data = priv;
> > @@ -287,7 +272,7 @@ static int cpufreq_init(struct cpufreq_policy
> > *policy) if (ret) {
> > dev_err(cpu_dev, "%s: invalid frequency table:
> > %d\n", __func__, ret);
> > - goto out_cooling_unregister;
> > + goto free_table;
> > }
> >
> > policy->cpuinfo.transition_latency = transition_latency;
> > @@ -300,8 +285,7 @@ static int cpufreq_init(struct cpufreq_policy
> > *policy)
> > return 0;
> >
> > -out_cooling_unregister:
> > - cpufreq_cooling_unregister(priv->cdev);
> > +free_table:
> > dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table);
> > out_free_priv:
> > kfree(priv);
> > @@ -342,11 +326,14 @@ static struct cpufreq_driver dt_cpufreq_driver
> > = {
> > static int dt_cpufreq_probe(struct platform_device *pdev)
> > {
> > + struct device_node *np;
> > struct device *cpu_dev;
> > struct regulator *cpu_reg;
> > struct clk *cpu_clk;
> > int ret;
> >
> > + /* at this point we checked the pointer already right? */
> > + np = of_node_get(pdev->dev.of_node);
> > /*
> > * All per-cluster (CPUs sharing clock/voltages)
> > initialization is done
> > * from ->init(). In probe(), we just need to make sure that
> > clk and @@ -368,6 +355,28 @@ static int dt_cpufreq_probe(struct
> > platform_device *pdev) if (ret)
> > dev_err(cpu_dev, "failed register driver: %d\n",
> > ret);
> > + /*
> > + * For now, just loading the cooling device;
> > + * thermal DT code takes care of matching them.
> > + */
> > + if (of_find_property(np, "#cooling-cells", NULL)) {
> > + struct cpufreq_policy policy;
> > + struct private_data *priv;
> > + struct thermal_cooling_device *cdev;
> > +
> > + /* TODO: can cpu0 be always used ? */
> > + cpufreq_get_policy(&policy, 0);
> > + priv = policy.driver_data;
> > + cdev = of_cpufreq_cooling_register(np,
> > cpu_present_mask);
> > + if (IS_ERR(cdev))
> > + dev_err(cpu_dev,
> > + "running cpufreq without cooling
> > device: %ld\n",
> > + PTR_ERR(cdev));
> > + else
> > + priv->cdev = cdev;
> > + }
> > + of_node_put(np);
> > +
> > return ret;
> > }
> >
> > diff --git a/drivers/thermal/cpu_cooling.c
> > b/drivers/thermal/cpu_cooling.c index 1ab0018..342eb9e 100644
> > --- a/drivers/thermal/cpu_cooling.c
> > +++ b/drivers/thermal/cpu_cooling.c
> > @@ -440,6 +440,11 @@ __cpufreq_cooling_register(struct device_node
> > *np, int ret = 0, i;
> > struct cpufreq_policy policy;
> >
> > + if (!cpufreq_get_current_driver()) {
> > + dev_warn(&pdev->dev, "no cpufreq driver,
> > deferring.");
> > + return -EPROBE_DEFER;
> > + }
> > +
> > /* Verify that all the clip cpus have same freq_min,
> > freq_max limit */ for_each_cpu(i, clip_cpus) {
> > /* continue if cpufreq policy not found and not
> > return error */ diff --git
> > a/drivers/thermal/samsung/exynos_thermal_common.c
> > b/drivers/thermal/samsung/exynos_thermal_common.c index
> > 3f5ad25..f84975e 100644 ---
> > a/drivers/thermal/samsung/exynos_thermal_common.c +++
> > b/drivers/thermal/samsung/exynos_thermal_common.c @@ -373,7 +373,7 @@
> > int exynos_register_thermal(struct thermal_sensor_conf *sensor_conf)
> > if (IS_ERR(th_zone->cool_dev[th_zone->cool_dev_size]))
> > { dev_err(sensor_conf->dev, "Failed to register cpufreq cooling
> > device\n");
> > - ret = -EINVAL;
> > + ret =
> > PTR_ERR(th_zone->cool_dev[th_zone->cool_dev_size]); goto
> > err_unregister; }
> > th_zone->cool_dev_size++;
> >
> >
> > The above way, we avoid having same test in every driver that needs
> > it. Besides, it makes sense the cpu_cooling code takes care of this
> > check, as it is the very first part that has direct dependency with
> > cpufreq.
> >
> > > I only possess Exynos boards and Beagle Bone Black, so I'd be
> > > grateful for testing proposed solution on other boards. The posted
> > > code is compile tested.
> > >
> > > This code applies on Eduardo's ti-soc-thermal-next tree:
> > > SHA1: 208a97042d66d9bfbcfab0d4a00c9fe317bb73d3
> > >
> > > Lukasz Majewski (8):
> > > thermal:cpu cooling:armada: Provide deferred probing for armada
> > > driver thermal:cpu cooling:kirkwood: Provide deferred probing for
> > > kirkwood driver
> > > thermal:cpu cooling:rcar: Provide deferred probing for rcar driver
> > > thermal:cpu cooling:spear: Provide deferred probing for spear
> > > driver thermal:cpu cooling:tegra: Provide deferred probing for
> > > tegra driver thermal:cpu cooling:ti: Provide deferred probing for
> > > ti drivers thermal:core:fix: Initialize the max_state variable to 0
> > > thermal:core:fix: Check return code of the ->get_max_state()
> > > callback
> > >
> > > drivers/thermal/armada_thermal.c | 7 +++++++
> > > drivers/thermal/kirkwood_thermal.c | 7 +++++++
> > > drivers/thermal/rcar_thermal.c | 7 +++++++
> > > drivers/thermal/spear_thermal.c | 7 +++++++
> > > drivers/thermal/tegra_soctherm.c | 7 +++++++
> > > drivers/thermal/thermal_core.c | 8 +++++---
> > > drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
> > > 7 files changed, 47 insertions(+), 3 deletions(-)
> > >
> > > --
> > > 2.0.0.rc2
> > >
>
>
>
> --
> Best regards,
>
> Lukasz Majewski
>
> Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
Hi Eduardo,
> Lukasz,
>
> On Fri, Nov 21, 2014 at 09:33:48AM +0100, Lukasz Majewski wrote:
> > Hi Eduardo,
> >
> > > Lukasz,
> > >
> > > Thanks for the keeping this up. And apologize for late answer.
> >
> > I've already posted v2 of this patch set (which consists of only one
> > patch :-) ).
> >
> > Thanks to Thierry Reding's hint, I've realized that I don't need to
> > add code from patches 1-6 from v1.
> >
> > Please instead review following patch:
> > "thermal:core:fix: Check return code of the ->get_max_state()
> > callback" https://patchwork.kernel.org/patch/5326991/
>
> I see. If I got correctly, with the above patch, we still need to have
> the check for cpufreq driver in the thermal driver right?
> quoting:
> "In thermal driver probe the cpufreq_cooling_register() method
> presence is crucial to evaluate if the thermal driver needs any
> actions with -EPROBE_DEFER."
Yes we need those checks in thermal drivers. However, proper checks are
already in place - please look into imx_thermal.c case.
>
> If yes, that means the proposal still leaves to drivers to deal with
> the sequencing. For the patch above, I believe it is fine. However, a
> better sequencing is still needed :-(.
There is always a trade off. What I've shown in v2 is that I'm pretty
confident that my fix won't introduce any regression for present code.
>
> For the case of of-thermal based drivers, it should be dealt between
> cpu_cooling and cpufreq, as I proposed, bellow. I really agree that
> drivers should not care about this, and thus we should not spread the
> check among drivers,
Unfortunately several checks are already in place (imx_thermal.c,
db8500_cpufreq.c, ti-soc*.c, in some way the old Exynos thermal driver).
> specially if there is nothing regarding cpufreq
> in the driver's code.
There is a call to cpufreq_cooling_register(), which sometimes happens
in the late part of probe function. In such a case it would be quite
challenging to release already allocated resources (e.g. ti, old
Exynos driver).
> I might send the proposal of having the check
> between cpu_cooling and cpufreq as a formal patch, in a separated
> thread.
It would be beneficial to hear Viresh's opinion.
>
> I will have a look in your v2. Briefly looking, looks reasonable.
In short: The goal of this patch is to show that regressions should not
happen and that it can be safely applied for v3.19.
>
>
> Once again, thanks.
>
> Cheers,
>
> Eduardo Valentin
>
> >
> >
> > >
> > > On Thu, Nov 13, 2014 at 06:02:37PM +0100, Lukasz Majewski wrote:
> > > > Presented fixes are a response for problem described below:
> > > > http://thread.gmane.org/gmane.linux.kernel/1793821/match=thermal+core+fix+initialize+max_state+variable+0
> > > >
> > > > In short - it turned out that two trivial fixes (included in
> > > > this patch set) require support for deferred probe in thermal
> > > > drivers.
> > > >
> > > > This situation shows up when CPU frequency reduction is used as
> > > > a thermal cooling device for a thermal zone.
> > > > It happens that during initialization, the call to thermal probe
> > > > will be executed before cpufreq probe (it can be observed
> > > > at ./drivers/Makefile). In such a situation thermal will not be
> > > > properly configured until cpufreq policy is setup.
> > > >
> > > > In the current code (without included fixes) there is a time
> > > > window in which thermal can try to use not configured cpufreq
> > > > and possibly crash the system.
> > > >
> > > >
> > > > Proposed solution was based on the code already available in the
> > > > imx_thermal.c file.
> > > >
> > > > /db8500_thermal.c: -> NOT NEEDED
> > > > /intel_powerclamp.c: -> NOT NEEDED - INTEL
> > > > (x86) /intel_powerclamp.c: -> NOT NEEDED -
> > > > INTEL (x86) /ti-soc-thermal/ti-bandgap.c: -> FIXED
> > > > [omap2plus_defconfig] /dove_thermal.c: ->
> > > > NOT NEEDED - CPU_COOLING NOT AVAILABLE [dove_defconfig]
> > > > /spear_thermal.c: -> FIXED
> > > > [spear3xx_defconfig] /samsung/exynos_tmu.c: ->
> > > > NOT NEEDED (nasty hack - will be reworked in later
> > > > patches) /imx_thermal.c: -> OK (deferred
> > > > probe already in place) /int340x_thermal/int3402_thermal.c:
> > > > -> NOT NEEDED - ACPI x86 - Intel
> > > > specific /int340x_thermal/int3400_thermal.c: -> NOT NEEDED -
> > > > ACPI x86 - Intel specific /tegra_soctherm.c:
> > > > -> FIXED [tegra_defconfig] /kirkwood_thermal.c:
> > > > -> FIXED
> > > > [multi_v5_defconfig] /armada_thermal.c: ->
> > > > FIXED [multi_v7_defconfig] /rcar_thermal.c:
> > > > -> FIXED
> > > > [shmobile_defconfig] /db8500_cpufreq_cooling.c: ->
> > > > OK (deferred probe already in place)
> > > > [multi_v7_defconfig] /st/st_thermal_syscfg.c: ->
> > > > NOT NEEDED (Those two are enabled by e.g.
> > > > ARMADA) /st/st_thermal_memmap.c:
> > > >
> > > >
> > >
> > >
> > > Instead of doing the same check on all drivers in the need for cpu
> > > cooling looks like a promiscuous solution. What if we do this
> > > check in cpu cooling itself and we propagate the error in callers
> > > code?
> > >
> > > From what I see, only exynos does not propagate the error. And we
> > > would need a tweak in the cpufreq-dt code. Something like the
> > > following (not tested):
> > >
> > > diff --git a/drivers/cpufreq/cpufreq-dt.c
> > > b/drivers/cpufreq/cpufreq-dt.c index f657c57..f139247 100644
> > > --- a/drivers/cpufreq/cpufreq-dt.c
> > > +++ b/drivers/cpufreq/cpufreq-dt.c
> > > @@ -181,7 +181,6 @@ static int cpufreq_init(struct cpufreq_policy
> > > *policy) {
> > > struct cpufreq_dt_platform_data *pd;
> > > struct cpufreq_frequency_table *freq_table;
> > > - struct thermal_cooling_device *cdev;
> > > struct device_node *np;
> > > struct private_data *priv;
> > > struct device *cpu_dev;
> > > @@ -264,20 +263,6 @@ static int cpufreq_init(struct cpufreq_policy
> > > *policy) goto out_free_priv;
> > > }
> > >
> > > - /*
> > > - * For now, just loading the cooling device;
> > > - * thermal DT code takes care of matching them.
> > > - */
> > > - if (of_find_property(np, "#cooling-cells", NULL)) {
> > > - cdev = of_cpufreq_cooling_register(np,
> > > cpu_present_mask);
> > > - if (IS_ERR(cdev))
> > > - dev_err(cpu_dev,
> > > - "running cpufreq without cooling
> > > device: %ld\n",
> > > - PTR_ERR(cdev));
> > > - else
> > > - priv->cdev = cdev;
> > > - }
> > > -
> > > priv->cpu_dev = cpu_dev;
> > > priv->cpu_reg = cpu_reg;
> > > policy->driver_data = priv;
> > > @@ -287,7 +272,7 @@ static int cpufreq_init(struct cpufreq_policy
> > > *policy) if (ret) {
> > > dev_err(cpu_dev, "%s: invalid frequency table:
> > > %d\n", __func__, ret);
> > > - goto out_cooling_unregister;
> > > + goto free_table;
> > > }
> > >
> > > policy->cpuinfo.transition_latency = transition_latency;
> > > @@ -300,8 +285,7 @@ static int cpufreq_init(struct cpufreq_policy
> > > *policy)
> > > return 0;
> > >
> > > -out_cooling_unregister:
> > > - cpufreq_cooling_unregister(priv->cdev);
> > > +free_table:
> > > dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table);
> > > out_free_priv:
> > > kfree(priv);
> > > @@ -342,11 +326,14 @@ static struct cpufreq_driver
> > > dt_cpufreq_driver = {
> > > static int dt_cpufreq_probe(struct platform_device *pdev)
> > > {
> > > + struct device_node *np;
> > > struct device *cpu_dev;
> > > struct regulator *cpu_reg;
> > > struct clk *cpu_clk;
> > > int ret;
> > >
> > > + /* at this point we checked the pointer already right? */
> > > + np = of_node_get(pdev->dev.of_node);
> > > /*
> > > * All per-cluster (CPUs sharing clock/voltages)
> > > initialization is done
> > > * from ->init(). In probe(), we just need to make sure
> > > that clk and @@ -368,6 +355,28 @@ static int
> > > dt_cpufreq_probe(struct platform_device *pdev) if (ret)
> > > dev_err(cpu_dev, "failed register driver: %d\n",
> > > ret);
> > > + /*
> > > + * For now, just loading the cooling device;
> > > + * thermal DT code takes care of matching them.
> > > + */
> > > + if (of_find_property(np, "#cooling-cells", NULL)) {
> > > + struct cpufreq_policy policy;
> > > + struct private_data *priv;
> > > + struct thermal_cooling_device *cdev;
> > > +
> > > + /* TODO: can cpu0 be always used ? */
> > > + cpufreq_get_policy(&policy, 0);
> > > + priv = policy.driver_data;
> > > + cdev = of_cpufreq_cooling_register(np,
> > > cpu_present_mask);
> > > + if (IS_ERR(cdev))
> > > + dev_err(cpu_dev,
> > > + "running cpufreq without cooling
> > > device: %ld\n",
> > > + PTR_ERR(cdev));
> > > + else
> > > + priv->cdev = cdev;
> > > + }
> > > + of_node_put(np);
> > > +
> > > return ret;
> > > }
> > >
> > > diff --git a/drivers/thermal/cpu_cooling.c
> > > b/drivers/thermal/cpu_cooling.c index 1ab0018..342eb9e 100644
> > > --- a/drivers/thermal/cpu_cooling.c
> > > +++ b/drivers/thermal/cpu_cooling.c
> > > @@ -440,6 +440,11 @@ __cpufreq_cooling_register(struct device_node
> > > *np, int ret = 0, i;
> > > struct cpufreq_policy policy;
> > >
> > > + if (!cpufreq_get_current_driver()) {
> > > + dev_warn(&pdev->dev, "no cpufreq driver,
> > > deferring.");
> > > + return -EPROBE_DEFER;
> > > + }
> > > +
> > > /* Verify that all the clip cpus have same freq_min,
> > > freq_max limit */ for_each_cpu(i, clip_cpus) {
> > > /* continue if cpufreq policy not found and not
> > > return error */ diff --git
> > > a/drivers/thermal/samsung/exynos_thermal_common.c
> > > b/drivers/thermal/samsung/exynos_thermal_common.c index
> > > 3f5ad25..f84975e 100644 ---
> > > a/drivers/thermal/samsung/exynos_thermal_common.c +++
> > > b/drivers/thermal/samsung/exynos_thermal_common.c @@ -373,7
> > > +373,7 @@ int exynos_register_thermal(struct thermal_sensor_conf
> > > *sensor_conf) if
> > > (IS_ERR(th_zone->cool_dev[th_zone->cool_dev_size]))
> > > { dev_err(sensor_conf->dev, "Failed to register cpufreq cooling
> > > device\n");
> > > - ret = -EINVAL;
> > > + ret =
> > > PTR_ERR(th_zone->cool_dev[th_zone->cool_dev_size]); goto
> > > err_unregister; }
> > > th_zone->cool_dev_size++;
> > >
> > >
> > > The above way, we avoid having same test in every driver that
> > > needs it. Besides, it makes sense the cpu_cooling code takes care
> > > of this check, as it is the very first part that has direct
> > > dependency with cpufreq.
> > >
> > > > I only possess Exynos boards and Beagle Bone Black, so I'd be
> > > > grateful for testing proposed solution on other boards. The
> > > > posted code is compile tested.
> > > >
> > > > This code applies on Eduardo's ti-soc-thermal-next tree:
> > > > SHA1: 208a97042d66d9bfbcfab0d4a00c9fe317bb73d3
> > > >
> > > > Lukasz Majewski (8):
> > > > thermal:cpu cooling:armada: Provide deferred probing for
> > > > armada driver thermal:cpu cooling:kirkwood: Provide deferred
> > > > probing for kirkwood driver
> > > > thermal:cpu cooling:rcar: Provide deferred probing for rcar
> > > > driver thermal:cpu cooling:spear: Provide deferred probing for
> > > > spear driver thermal:cpu cooling:tegra: Provide deferred
> > > > probing for tegra driver thermal:cpu cooling:ti: Provide
> > > > deferred probing for ti drivers thermal:core:fix: Initialize
> > > > the max_state variable to 0 thermal:core:fix: Check return code
> > > > of the ->get_max_state() callback
> > > >
> > > > drivers/thermal/armada_thermal.c | 7 +++++++
> > > > drivers/thermal/kirkwood_thermal.c | 7 +++++++
> > > > drivers/thermal/rcar_thermal.c | 7 +++++++
> > > > drivers/thermal/spear_thermal.c | 7 +++++++
> > > > drivers/thermal/tegra_soctherm.c | 7 +++++++
> > > > drivers/thermal/thermal_core.c | 8 +++++---
> > > > drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
> > > > 7 files changed, 47 insertions(+), 3 deletions(-)
> > > >
> > > > --
> > > > 2.0.0.rc2
> > > >
> >
> >
> >
> > --
> > Best regards,
> >
> > Lukasz Majewski
> >
> > Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
Lukasz,
On Tue, Nov 18, 2014 at 11:16:30AM +0100, Lukasz Majewski wrote:
> The return code from ->get_max_state() callback was not checked during
> binding cooling device to thermal zone device.
>
> Signed-off-by: Lukasz Majewski <[email protected]>
> ---
> Changes for v2:
> - It turned out that patches from 1 to 6 from v1 are not needed, since
> they either already solve the problem (like imx_thermal.c) or not use
> cpufreq as a thermal cooling device.
> - The patch 7 from v1 is also not needed since this patch on error exits
> this function without using max_state variable.
> - In thermal driver probe the cpufreq_cooling_register() method presence
> is crucial to evaluate if the thermal driver needs any actions with
> -EPROBE_DEFER.
Have you tried this patch with of-thermal based systems?
The above proposal works if the thermal driver is dealing with loading
cpu_cooling. But for of-thermal based drivers, the idea is to leave to
cpufreq code to load it.
As an example, I am taking the ti-soc-thermal, but we already have other
of-thermal based drivers. Booting with this patch ti-soc-thermal
(of-based boot) loads fine, but the cpu_cooling never gets bound to the
thermal zone.
The thing is that the bind may happen before cpufreq-dt code loads the
cpufreq driver, and when cpu_cooling is checking what is the max freq,
by using cpufreq table, it won't be able to do it, as there is no table.
While, without the patch, it will use wrong in the binding, but after
it gets bound, and cpufreq loads, the max will be used correctly.
And in this case, the system still works besides this bug. The reasoning
is because the max state comes from DT (2) and lower and upper wont be
equal to THERMAL_NO_LIMIT. Then, the following check will use the
parameter, instead of max_state:
cdev->ops->get_max_state(cdev, &max_state);
/* lower default 0, upper default max_state */
lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
upper = upper == THERMAL_NO_LIMIT ?
max_state : upper;
In summary, introducing this patch, although it fix a problem, will
introduce regressions, in of-thermal based drivers.
I believe, to have this fix, you need to provide a way to have probing
deferring also in cpu_cooling. That needs also the change in the cpufreq
driver, as I mentioned in the other thread.
Cheers,
> ---
> drivers/thermal/thermal_core.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 43b9070..8567929 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -928,7 +928,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
> struct thermal_zone_device *pos1;
> struct thermal_cooling_device *pos2;
> unsigned long max_state;
> - int result;
> + int result, ret;
>
> if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))
> return -EINVAL;
> @@ -945,7 +945,9 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
> if (tz != pos1 || cdev != pos2)
> return -EINVAL;
>
> - cdev->ops->get_max_state(cdev, &max_state);
> + ret = cdev->ops->get_max_state(cdev, &max_state);
> + if (ret)
> + return ret;
>
> /* lower default 0, upper default max_state */
> lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> --
> 2.0.0.rc2
>
Hi Eduardo,
>
> Lukasz,
>
> On Tue, Nov 18, 2014 at 11:16:30AM +0100, Lukasz Majewski wrote:
> > The return code from ->get_max_state() callback was not checked
> > during binding cooling device to thermal zone device.
> >
> > Signed-off-by: Lukasz Majewski <[email protected]>
> > ---
> > Changes for v2:
> > - It turned out that patches from 1 to 6 from v1 are not needed,
> > since they either already solve the problem (like imx_thermal.c) or
> > not use cpufreq as a thermal cooling device.
> > - The patch 7 from v1 is also not needed since this patch on error
> > exits this function without using max_state variable.
> > - In thermal driver probe the cpufreq_cooling_register() method
> > presence is crucial to evaluate if the thermal driver needs any
> > actions with -EPROBE_DEFER.
>
> Have you tried this patch with of-thermal based systems?
Yes. I did try it with Exynos (after the rework). And there weren't
any regressions.
To be precise - do you refer to of_cpufreq_cooling_register() [1] or
cpufreq_cooling_register() [2]?
For the latter [2] - drivers like imx_thermal.c are fully prepared for
-EDEFER_PROBE.
For the former [1] - only cpufreq-dt.c uses it (and Exynos SoC after
the rework).
>
> The above proposal works if the thermal driver is dealing with loading
> cpu_cooling. But for of-thermal based drivers, the idea is to leave to
> cpufreq code to load it.
I assume, that you mean case [1]?
>
> As an example, I am taking the ti-soc-thermal, but we already have
> other of-thermal based drivers. Booting with this patch ti-soc-thermal
> (of-based boot) loads fine, but the cpu_cooling never gets bound to
> the thermal zone.
Could you share the exact SoC/board/_defconfig setup to reproduce this
behavior? I possess Beagle Bone Black, but it doesn't have thermal
support (perhaps because its lack of accuracy).
With my Exynos setup I didn't experience any problems with this patch.
>
> The thing is that the bind may happen before cpufreq-dt code loads the
> cpufreq driver, and when cpu_cooling is checking what is the max freq,
> by using cpufreq table, it won't be able to do it, as there is no
> table.
As I look into the cpufreq-dt.c driver - in the cpufreq_init()
function, the call to of_cpufreq_cooling_register() is performed just
before cpufreq_table_validate_and_show().
It looks like a mistake in the cpufreq-dt.c code.
>
> While, without the patch, it will use wrong in the binding, but after
> it gets bound, and cpufreq loads, the max will be used correctly.
Correct. Such _wrong_ behavior was the original motivation to prepare
this patch.
>
> And in this case, the system still works besides this bug.
Unfortunately there is also a "window" in which the driver is not
properly configured and can cause system crash, although it is unlikely.
> The
> reasoning is because the max state comes from DT (2) and lower and
> upper wont be equal to THERMAL_NO_LIMIT. Then, the following check
> will use the parameter, instead of max_state:
>
> cdev->ops->get_max_state(cdev, &max_state);
>
> /* lower default 0, upper default max_state */
> lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> upper = upper == THERMAL_NO_LIMIT ?
> max_state : upper;
>
> In summary, introducing this patch, although it fix a problem, will
> introduce regressions, in of-thermal based drivers.
To be more precise - it will affect systems, which use of-thermal.c and
cpufreq-dt.c in the same time, due to wrong ordering in the latter file.
Could you give me a hint about the exact affected system? I've grep'ed
for CPUFREQ_DT in the ./arch/arm/configs with no success.
>
> I believe, to have this fix, you need to provide a way to have probing
> deferring also in cpu_cooling. That needs also the change in the
> cpufreq driver, as I mentioned in the other thread.
I will think about possible solution and refer to previous discussion.
>
> Cheers,
>
> > ---
> > drivers/thermal/thermal_core.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/thermal/thermal_core.c
> > b/drivers/thermal/thermal_core.c index 43b9070..8567929 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -928,7 +928,7 @@ int thermal_zone_bind_cooling_device(struct
> > thermal_zone_device *tz, struct thermal_zone_device *pos1;
> > struct thermal_cooling_device *pos2;
> > unsigned long max_state;
> > - int result;
> > + int result, ret;
> >
> > if (trip >= tz->trips || (trip < 0 && trip !=
> > THERMAL_TRIPS_NONE)) return -EINVAL;
> > @@ -945,7 +945,9 @@ int thermal_zone_bind_cooling_device(struct
> > thermal_zone_device *tz, if (tz != pos1 || cdev != pos2)
> > return -EINVAL;
> >
> > - cdev->ops->get_max_state(cdev, &max_state);
> > + ret = cdev->ops->get_max_state(cdev, &max_state);
> > + if (ret)
> > + return ret;
> >
> > /* lower default 0, upper default max_state */
> > lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> > --
> > 2.0.0.rc2
> >
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
On 24 November 2014 at 16:08, Lukasz Majewski <[email protected]> wrote:
> As I look into the cpufreq-dt.c driver - in the cpufreq_init()
> function, the call to of_cpufreq_cooling_register() is performed just
> before cpufreq_table_validate_and_show().
> It looks like a mistake in the cpufreq-dt.c code.
Yes. Just fixed it up and sent a patch. Please provide your
tested-by's..
Hi Viresh,
> On 24 November 2014 at 16:08, Lukasz Majewski
> <[email protected]> wrote:
> > As I look into the cpufreq-dt.c driver - in the cpufreq_init()
> > function, the call to of_cpufreq_cooling_register() is performed
> > just before cpufreq_table_validate_and_show().
> > It looks like a mistake in the cpufreq-dt.c code.
>
> Yes. Just fixed it up and sent a patch. Please provide your
> tested-by's..
Thanks for your prompt response. I don't have board which uses both
of-thermal.c and cpufreq-dt.c (exynos uses old approach for cpufreq).
I think that Eduardo may have some boards for testing.
Regarding the patch:
Reviewed-by: Lukasz Majewski <[email protected]>
--
Best regards,
Lukasz Majewski
Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
Hello Lukasz,
On Mon, Nov 24, 2014 at 11:38:54AM +0100, Lukasz Majewski wrote:
> Hi Eduardo,
>
> >
> > Lukasz,
> >
> > On Tue, Nov 18, 2014 at 11:16:30AM +0100, Lukasz Majewski wrote:
> > > The return code from ->get_max_state() callback was not checked
> > > during binding cooling device to thermal zone device.
> > >
> > > Signed-off-by: Lukasz Majewski <[email protected]>
> > > ---
> > > Changes for v2:
> > > - It turned out that patches from 1 to 6 from v1 are not needed,
> > > since they either already solve the problem (like imx_thermal.c) or
> > > not use cpufreq as a thermal cooling device.
> > > - The patch 7 from v1 is also not needed since this patch on error
> > > exits this function without using max_state variable.
> > > - In thermal driver probe the cpufreq_cooling_register() method
> > > presence is crucial to evaluate if the thermal driver needs any
> > > actions with -EPROBE_DEFER.
> >
> > Have you tried this patch with of-thermal based systems?
>
> Yes. I did try it with Exynos (after the rework). And there weren't
> any regressions.
>
> To be precise - do you refer to of_cpufreq_cooling_register() [1] or
> cpufreq_cooling_register() [2]?
>
[1]
> For the latter [2] - drivers like imx_thermal.c are fully prepared for
> -EDEFER_PROBE.
>
> For the former [1] - only cpufreq-dt.c uses it (and Exynos SoC after
> the rework).
>
> >
> > The above proposal works if the thermal driver is dealing with loading
> > cpu_cooling. But for of-thermal based drivers, the idea is to leave to
> > cpufreq code to load it.
>
> I assume, that you mean case [1]?
>
yup
> >
> > As an example, I am taking the ti-soc-thermal, but we already have
> > other of-thermal based drivers. Booting with this patch ti-soc-thermal
> > (of-based boot) loads fine, but the cpu_cooling never gets bound to
> > the thermal zone.
>
> Could you share the exact SoC/board/_defconfig setup to reproduce this
> behavior? I possess Beagle Bone Black, but it doesn't have thermal
> support (perhaps because its lack of accuracy).
>
Well, it may happen any system a driver with of-thermal + cpufreq-dt.
One board that is easily available is OMAP4460 panda board (tried
myself, the problem is there).
> With my Exynos setup I didn't experience any problems with this patch.
>
> >
> > The thing is that the bind may happen before cpufreq-dt code loads the
> > cpufreq driver, and when cpu_cooling is checking what is the max freq,
> > by using cpufreq table, it won't be able to do it, as there is no
> > table.
>
> As I look into the cpufreq-dt.c driver - in the cpufreq_init()
> function, the call to of_cpufreq_cooling_register() is performed just
> before cpufreq_table_validate_and_show().
> It looks like a mistake in the cpufreq-dt.c code.
>
Well, I believe for our case, better would be if the cpu_cooling could
be done after cpufreq driver registration call.
> >
> > While, without the patch, it will use wrong in the binding, but after
> > it gets bound, and cpufreq loads, the max will be used correctly.
>
> Correct. Such _wrong_ behavior was the original motivation to prepare
> this patch.
>
> >
> > And in this case, the system still works besides this bug.
>
> Unfortunately there is also a "window" in which the driver is not
> properly configured and can cause system crash, although it is unlikely.
>
Agreed.
>
> > The
> > reasoning is because the max state comes from DT (2) and lower and
> > upper wont be equal to THERMAL_NO_LIMIT. Then, the following check
> > will use the parameter, instead of max_state:
> >
> > cdev->ops->get_max_state(cdev, &max_state);
> >
> > /* lower default 0, upper default max_state */
> > lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> > upper = upper == THERMAL_NO_LIMIT ?
> > max_state : upper;
> >
> > In summary, introducing this patch, although it fix a problem, will
> > introduce regressions, in of-thermal based drivers.
>
> To be more precise - it will affect systems, which use of-thermal.c and
> cpufreq-dt.c in the same time, due to wrong ordering in the latter file.
>
Exactly.
> Could you give me a hint about the exact affected system? I've grep'ed
> for CPUFREQ_DT in the ./arch/arm/configs with no success.
>
Yeah, the grepping is correct. But well, just because it is not in
defconfigs does not mean it won't be used.
> >
> > I believe, to have this fix, you need to provide a way to have probing
> > deferring also in cpu_cooling. That needs also the change in the
> > cpufreq driver, as I mentioned in the other thread.
>
> I will think about possible solution and refer to previous discussion.
>
Good. For your patch, it is still sane to have it. But needs to be taken
after fixing the ordering between cpufreq-dt and cpu_cooling.
> >
> > Cheers,
> >
> > > ---
> > > drivers/thermal/thermal_core.c | 6 ++++--
> > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/thermal/thermal_core.c
> > > b/drivers/thermal/thermal_core.c index 43b9070..8567929 100644
> > > --- a/drivers/thermal/thermal_core.c
> > > +++ b/drivers/thermal/thermal_core.c
> > > @@ -928,7 +928,7 @@ int thermal_zone_bind_cooling_device(struct
> > > thermal_zone_device *tz, struct thermal_zone_device *pos1;
> > > struct thermal_cooling_device *pos2;
> > > unsigned long max_state;
> > > - int result;
> > > + int result, ret;
> > >
> > > if (trip >= tz->trips || (trip < 0 && trip !=
> > > THERMAL_TRIPS_NONE)) return -EINVAL;
> > > @@ -945,7 +945,9 @@ int thermal_zone_bind_cooling_device(struct
> > > thermal_zone_device *tz, if (tz != pos1 || cdev != pos2)
> > > return -EINVAL;
> > >
> > > - cdev->ops->get_max_state(cdev, &max_state);
> > > + ret = cdev->ops->get_max_state(cdev, &max_state);
> > > + if (ret)
> > > + return ret;
> > >
> > > /* lower default 0, upper default max_state */
> > > lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> > > --
> > > 2.0.0.rc2
> > >
>
>
> --
> Best regards,
>
> Lukasz Majewski
>
> Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
On Mon, 24 Nov 2014 14:02:25 -0400
Eduardo Valentin <[email protected]> wrote:
>
> Hello Lukasz,
>
> On Mon, Nov 24, 2014 at 11:38:54AM +0100, Lukasz Majewski wrote:
> > Hi Eduardo,
> >
> > >
> > > Lukasz,
> > >
> > > On Tue, Nov 18, 2014 at 11:16:30AM +0100, Lukasz Majewski wrote:
> > > > The return code from ->get_max_state() callback was not checked
> > > > during binding cooling device to thermal zone device.
> > > >
> > > > Signed-off-by: Lukasz Majewski <[email protected]>
> > > > ---
> > > > Changes for v2:
> > > > - It turned out that patches from 1 to 6 from v1 are not needed,
> > > > since they either already solve the problem (like
> > > > imx_thermal.c) or not use cpufreq as a thermal cooling device.
> > > > - The patch 7 from v1 is also not needed since this patch on
> > > > error exits this function without using max_state variable.
> > > > - In thermal driver probe the cpufreq_cooling_register() method
> > > > presence is crucial to evaluate if the thermal driver needs any
> > > > actions with -EPROBE_DEFER.
> > >
> > > Have you tried this patch with of-thermal based systems?
> >
> > Yes. I did try it with Exynos (after the rework). And there weren't
> > any regressions.
> >
> > To be precise - do you refer to of_cpufreq_cooling_register() [1] or
> > cpufreq_cooling_register() [2]?
> >
>
> [1]
>
> > For the latter [2] - drivers like imx_thermal.c are fully prepared
> > for -EDEFER_PROBE.
> >
> > For the former [1] - only cpufreq-dt.c uses it (and Exynos SoC after
> > the rework).
> >
> > >
> > > The above proposal works if the thermal driver is dealing with
> > > loading cpu_cooling. But for of-thermal based drivers, the idea
> > > is to leave to cpufreq code to load it.
> >
> > I assume, that you mean case [1]?
> >
>
> yup
>
> > >
> > > As an example, I am taking the ti-soc-thermal, but we already have
> > > other of-thermal based drivers. Booting with this patch
> > > ti-soc-thermal (of-based boot) loads fine, but the cpu_cooling
> > > never gets bound to the thermal zone.
> >
> > Could you share the exact SoC/board/_defconfig setup to reproduce
> > this behavior? I possess Beagle Bone Black, but it doesn't have
> > thermal support (perhaps because its lack of accuracy).
> >
>
> Well, it may happen any system a driver with of-thermal + cpufreq-dt.
>
> One board that is easily available is OMAP4460 panda board (tried
> myself, the problem is there).
>
> > With my Exynos setup I didn't experience any problems with this
> > patch.
> >
> > >
> > > The thing is that the bind may happen before cpufreq-dt code
> > > loads the cpufreq driver, and when cpu_cooling is checking what
> > > is the max freq, by using cpufreq table, it won't be able to do
> > > it, as there is no table.
> >
> > As I look into the cpufreq-dt.c driver - in the cpufreq_init()
> > function, the call to of_cpufreq_cooling_register() is performed
> > just before cpufreq_table_validate_and_show().
> > It looks like a mistake in the cpufreq-dt.c code.
> >
>
> Well, I believe for our case, better would be if the cpu_cooling could
> be done after cpufreq driver registration call.
>
>
> > >
> > > While, without the patch, it will use wrong in the binding, but
> > > after it gets bound, and cpufreq loads, the max will be used
> > > correctly.
> >
> > Correct. Such _wrong_ behavior was the original motivation to
> > prepare this patch.
> >
> > >
> > > And in this case, the system still works besides this bug.
> >
> > Unfortunately there is also a "window" in which the driver is not
> > properly configured and can cause system crash, although it is
> > unlikely.
> >
>
> Agreed.
>
> >
> > > The
> > > reasoning is because the max state comes from DT (2) and lower and
> > > upper wont be equal to THERMAL_NO_LIMIT. Then, the following check
> > > will use the parameter, instead of max_state:
> > >
> > > cdev->ops->get_max_state(cdev, &max_state);
> > >
> > > /* lower default 0, upper default max_state */
> > > lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> > > upper = upper == THERMAL_NO_LIMIT ?
> > > max_state : upper;
> > >
> > > In summary, introducing this patch, although it fix a problem,
> > > will introduce regressions, in of-thermal based drivers.
> >
> > To be more precise - it will affect systems, which use of-thermal.c
> > and cpufreq-dt.c in the same time, due to wrong ordering in the
> > latter file.
> >
>
> Exactly.
>
> > Could you give me a hint about the exact affected system? I've
> > grep'ed for CPUFREQ_DT in the ./arch/arm/configs with no success.
> >
>
> Yeah, the grepping is correct. But well, just because it is not in
> defconfigs does not mean it won't be used.
>
> > >
> > > I believe, to have this fix, you need to provide a way to have
> > > probing deferring also in cpu_cooling. That needs also the change
> > > in the cpufreq driver, as I mentioned in the other thread.
> >
> > I will think about possible solution and refer to previous
> > discussion.
> >
>
> Good. For your patch, it is still sane to have it. But needs to be
> taken after fixing the ordering between cpufreq-dt and cpu_cooling.
Such fix has been already prepared by Viresh. Could you test if it
works on your Panda Board (unfortunately I don't have one)?
>
>
> > >
> > > Cheers,
> > >
> > > > ---
> > > > drivers/thermal/thermal_core.c | 6 ++++--
> > > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/thermal/thermal_core.c
> > > > b/drivers/thermal/thermal_core.c index 43b9070..8567929 100644
> > > > --- a/drivers/thermal/thermal_core.c
> > > > +++ b/drivers/thermal/thermal_core.c
> > > > @@ -928,7 +928,7 @@ int thermal_zone_bind_cooling_device(struct
> > > > thermal_zone_device *tz, struct thermal_zone_device *pos1;
> > > > struct thermal_cooling_device *pos2;
> > > > unsigned long max_state;
> > > > - int result;
> > > > + int result, ret;
> > > >
> > > > if (trip >= tz->trips || (trip < 0 && trip !=
> > > > THERMAL_TRIPS_NONE)) return -EINVAL;
> > > > @@ -945,7 +945,9 @@ int thermal_zone_bind_cooling_device(struct
> > > > thermal_zone_device *tz, if (tz != pos1 || cdev != pos2)
> > > > return -EINVAL;
> > > >
> > > > - cdev->ops->get_max_state(cdev, &max_state);
> > > > + ret = cdev->ops->get_max_state(cdev, &max_state);
> > > > + if (ret)
> > > > + return ret;
> > > >
> > > > /* lower default 0, upper default max_state */
> > > > lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> > > > --
> > > > 2.0.0.rc2
> > > >
> >
> >
> > --
> > Best regards,
> >
> > Lukasz Majewski
> >
> > Samsung R&D Institute Poland (SRPOL) | Linux Platform Group
Best regards,
Lukasz Majewski
On 24 November 2014 at 19:54, Lukasz Majewski <[email protected]> wrote:
> Reviewed-by: Lukasz Majewski <[email protected]>
Would be better if you send it as reply to that one.