2024-06-06 16:11:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v2] thermal: core: Do not fail cdev registration because of invalid initial state

From: Rafael J. Wysocki <[email protected]>

It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems which turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan. If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
as uses that callback after commit 31a0fa0019b0.

Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.

Accordingly, modify __thermal_cooling_device_register() to avoid calling
thermal_debug_cdev_add() instead of returning an error if the initial
.get_cur_state() callback invocation fails.

Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Closes: https://lore.kernel.org/linux-acpi/[email protected]
Reported-by: Laura Nao <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---

v1 -> v2:
* Instead of making the thermal debug code effectively ignore the invalid
initial cooling device state, simply don't register thermal debugfs for
a cooling device if its initial state returned by the driver's
.get_cur_state() is invalid (Daniel).

Laura, please test this one even though I don't see why it wouldn't work for
you if the v1 did.

---
drivers/thermal/thermal_core.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1001,7 +1001,7 @@ __thermal_cooling_device_register(struct

ret = cdev->ops->get_cur_state(cdev, &current_state);
if (ret)
- goto out_cdev_type;
+ current_state = ULONG_MAX;

thermal_cooling_device_setup_sysfs(cdev);

@@ -1016,7 +1016,8 @@ __thermal_cooling_device_register(struct
return ERR_PTR(ret);
}

- thermal_debug_cdev_add(cdev, current_state);
+ if (current_state <= cdev->max_state)
+ thermal_debug_cdev_add(cdev, current_state);

/* Add 'this' new cdev to the global cdev list */
mutex_lock(&thermal_list_lock);





2024-06-06 16:39:18

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v2] thermal: core: Do not fail cdev registration because of invalid initial state

On 06/06/2024 18:08, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
> device state to thermal_debug_cdev_add()") causes the ACPI fan driver
> to fail probing on some systems which turns out to be due to the _FST
> control method returning an invalid value until _FSL is first evaluated
> for the given fan. If this happens, the .get_cur_state() cooling device
> callback returns an error and __thermal_cooling_device_register() fails
> as uses that callback after commit 31a0fa0019b0.
>
> Arguably, _FST should not return an invalid value even if it is
> evaluated before _FSL, so this may be regarded as a platform firmware
> issue, but at the same time it is not a good enough reason for failing
> the cooling device registration where the initial cooling device state
> is only needed to initialize a thermal debug facility.
>
> Accordingly, modify __thermal_cooling_device_register() to avoid calling
> thermal_debug_cdev_add() instead of returning an error if the initial
> .get_cur_state() callback invocation fails.
>
> Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
> Closes: https://lore.kernel.org/linux-acpi/[email protected]
> Reported-by: Laura Nao <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> v1 -> v2:
> * Instead of making the thermal debug code effectively ignore the invalid
> initial cooling device state, simply don't register thermal debugfs for
> a cooling device if its initial state returned by the driver's
> .get_cur_state() is invalid (Daniel).
>
> Laura, please test this one even though I don't see why it wouldn't work for
> you if the v1 did.
>
> ---
> drivers/thermal/thermal_core.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -1001,7 +1001,7 @@ __thermal_cooling_device_register(struct
>
> ret = cdev->ops->get_cur_state(cdev, &current_state);
> if (ret)
> - goto out_cdev_type;
> + current_state = ULONG_MAX;

Why not move the section ? So we end up below.

>
> thermal_cooling_device_setup_sysfs(cdev);
>
> @@ -1016,7 +1016,8 @@ __thermal_cooling_device_register(struct
> return ERR_PTR(ret);
> }
>
> - thermal_debug_cdev_add(cdev, current_state);
> + if (current_state <= cdev->max_state)
> + thermal_debug_cdev_add(cdev, current_state);

ret = cdev->ops->get_cur_state(cdev, &current_state);
if (!ret)
thermal_debug_cdev_add(cdev, current_state);

Additionally a comment here to explain why get_cur_state can fail and
telling it is up to the driver to fix its routine?


> /* Add 'this' new cdev to the global cdev list */
> mutex_lock(&thermal_list_lock);
>
>
>

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2024-06-06 18:19:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] thermal: core: Do not fail cdev registration because of invalid initial state

On Thursday, June 6, 2024 6:39:02 PM CEST Daniel Lezcano wrote:
> On 06/06/2024 18:08, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
> > device state to thermal_debug_cdev_add()") causes the ACPI fan driver
> > to fail probing on some systems which turns out to be due to the _FST
> > control method returning an invalid value until _FSL is first evaluated
> > for the given fan. If this happens, the .get_cur_state() cooling device
> > callback returns an error and __thermal_cooling_device_register() fails
> > as uses that callback after commit 31a0fa0019b0.
> >
> > Arguably, _FST should not return an invalid value even if it is
> > evaluated before _FSL, so this may be regarded as a platform firmware
> > issue, but at the same time it is not a good enough reason for failing
> > the cooling device registration where the initial cooling device state
> > is only needed to initialize a thermal debug facility.
> >
> > Accordingly, modify __thermal_cooling_device_register() to avoid calling
> > thermal_debug_cdev_add() instead of returning an error if the initial
> > .get_cur_state() callback invocation fails.
> >
> > Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
> > Closes: https://lore.kernel.org/linux-acpi/[email protected]
> > Reported-by: Laura Nao <[email protected]>
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> >
> > v1 -> v2:
> > * Instead of making the thermal debug code effectively ignore the invalid
> > initial cooling device state, simply don't register thermal debugfs for
> > a cooling device if its initial state returned by the driver's
> > .get_cur_state() is invalid (Daniel).
> >
> > Laura, please test this one even though I don't see why it wouldn't work for
> > you if the v1 did.
> >
> > ---
> > drivers/thermal/thermal_core.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/thermal_core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_core.c
> > +++ linux-pm/drivers/thermal/thermal_core.c
> > @@ -1001,7 +1001,7 @@ __thermal_cooling_device_register(struct
> >
> > ret = cdev->ops->get_cur_state(cdev, &current_state);
> > if (ret)
> > - goto out_cdev_type;
> > + current_state = ULONG_MAX;
>
> Why not move the section ? So we end up below.

Because it's fewer changes in this patch and it doesn't matter.

> >
> > thermal_cooling_device_setup_sysfs(cdev);
> >
> > @@ -1016,7 +1016,8 @@ __thermal_cooling_device_register(struct
> > return ERR_PTR(ret);
> > }
> >
> > - thermal_debug_cdev_add(cdev, current_state);
> > + if (current_state <= cdev->max_state)
> > + thermal_debug_cdev_add(cdev, current_state);
>
> ret = cdev->ops->get_cur_state(cdev, &current_state);
> if (!ret)
> thermal_debug_cdev_add(cdev, current_state);

So the current_state <= cdev->max_state is actually useful regardless and
consistent with the analogous check in thermal_cooling_device_update().

> Additionally a comment here to explain why get_cur_state can fail and
> telling it is up to the driver to fix its routine?

I've added a comment in v3.

>
> > /* Add 'this' new cdev to the global cdev list */
> > mutex_lock(&thermal_list_lock);
> >
> >
> >
>
>