2023-12-08 19:21:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v1 0/3] thermal: core: Remove thermal zones during unregistration

Hi All,

This patch series adds a mechanism to guarantee that
thermal_zone_device_unregister() will not return until all of the active
references to the thermal zone device object in question have been dropped
and it has been deleted (patch [1/3]).

This supersedes the approach used so far in which all thermal zone sysfs
attribute callbacks check if the zone device is still registered under the
zone lock, so as to return early if that is not the case, as it means that
device_del() has been called for the thermal zone in question (and returned).
It is not necessary to do that any more after patch [1/3], so patch [2/3]
removes those checks from the code and drops zone locking that is not
necessary any more either.

Patch [3/3] uses the observation that the thermal subsystem does not need to
check if a thermal zone device is registered at all, because it can use its
own data to determine whether or not the thermal zone is going away and so
it may not be worth updating it, for example.

Please refer to the patch changelogs for details.

The series depends on new thermal material in linux-next, but it should not
substantially depend on any changes that have not made it into linux-next yet.

Thanks!




2023-12-08 19:21:14

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v1 3/3] thermal: core: Rework thermal zone availability check

From: Rafael J. Wysocki <[email protected]>

In order to avoid running __thermal_zone_device_update() for thermal
zones going away, the thermal zone lock is held around device_del()
in thermal_zone_device_unregister() and thermal_zone_device_update()
passes the given thermal zone device to device_is_registered().
This allows thermal_zone_device_update() to skip the
__thermal_zone_device_update() if device_del() has already run for
the thermal zone at hand.

However, instead of looking at driver core internals, the thermal
subsystem may as well rely on its own data structures for this
purpose. Namely, if the thermal zone is not present in
thermal_tz_list, it can be regarded as unavailable, which in fact is
already the case in thermal_zone_device_unregister(). Accordingly,
the device_is_registered() check in thermal_zone_device_update() can
be replaced with checking whether or not the node list_head in struct
thermal_zone_device is empty, in which case it is not there in
thermal_tz_list.

To make this work, though, it is necessary to initialize tz->node
in thermal_zone_device_register_with_trips() before registering the
thermal zone device and it needs to be added to thermal_tz_list and
deleted from it under its zone lock.

After the above modifications, the zone lock does not need to be
held around device_del() in thermal_zone_device_unregister() any more.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/thermal_core.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -505,11 +505,16 @@ int thermal_zone_device_is_enabled(struc
return tz->mode == THERMAL_DEVICE_ENABLED;
}

+static bool thermal_zone_is_present(struct thermal_zone_device *tz)
+{
+ return !list_empty(&tz->node);
+}
+
void thermal_zone_device_update(struct thermal_zone_device *tz,
enum thermal_notify_event event)
{
mutex_lock(&tz->lock);
- if (device_is_registered(&tz->device))
+ if (thermal_zone_is_present(tz))
__thermal_zone_device_update(tz, event);
mutex_unlock(&tz->lock);
}
@@ -1304,6 +1309,7 @@ thermal_zone_device_register_with_trips(
}

INIT_LIST_HEAD(&tz->thermal_instances);
+ INIT_LIST_HEAD(&tz->node);
ida_init(&tz->ida);
mutex_init(&tz->lock);
init_completion(&tz->removal);
@@ -1369,7 +1375,9 @@ thermal_zone_device_register_with_trips(
}

mutex_lock(&thermal_list_lock);
+ mutex_lock(&tz->lock);
list_add_tail(&tz->node, &thermal_tz_list);
+ mutex_unlock(&tz->lock);
mutex_unlock(&thermal_list_lock);

/* Bind cooling devices for this zone */
@@ -1460,7 +1468,10 @@ void thermal_zone_device_unregister(stru
mutex_unlock(&thermal_list_lock);
return;
}
+
+ mutex_lock(&tz->lock);
list_del(&tz->node);
+ mutex_unlock(&tz->lock);

/* Unbind all cdevs associated with 'this' thermal zone */
list_for_each_entry(cdev, &thermal_cdev_list, node)
@@ -1477,9 +1488,7 @@ void thermal_zone_device_unregister(stru
ida_free(&thermal_tz_ida, tz->id);
ida_destroy(&tz->ida);

- mutex_lock(&tz->lock);
device_del(&tz->device);
- mutex_unlock(&tz->lock);

kfree(tz->tzp);




2023-12-08 19:21:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v1 1/3] thermal: core: Make thermal_zone_device_unregister() return after freeing the zone

From: Rafael J. Wysocki <[email protected]>

Make thermal_zone_device_unregister() wait until all of the references
to the given thermal zone object have been dropped and free it before
returning.

This guarantees that when thermal_zone_device_unregister() returns,
there is no leftover activity regarding the thermal zone in question
which is required by some of its callers (for instance, modular driver
code that wants to know when it is safe to let the module go away).

Subsequently, this will allow some confusing device_is_registered()
checks to be dropped from the thermal sysfs and core code.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/thermal_core.c | 6 +++++-
include/linux/thermal.h | 2 ++
2 files changed, 7 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -822,7 +822,7 @@ static void thermal_release(struct devic
tz = to_thermal_zone(dev);
thermal_zone_destroy_device_groups(tz);
mutex_destroy(&tz->lock);
- kfree(tz);
+ complete(&tz->removal);
} else if (!strncmp(dev_name(dev), "cooling_device",
sizeof("cooling_device") - 1)) {
cdev = to_cooling_device(dev);
@@ -1315,6 +1315,7 @@ thermal_zone_device_register_with_trips(
INIT_LIST_HEAD(&tz->thermal_instances);
ida_init(&tz->ida);
mutex_init(&tz->lock);
+ init_completion(&tz->removal);
id = ida_alloc(&thermal_tz_ida, GFP_KERNEL);
if (id < 0) {
result = id;
@@ -1494,6 +1495,9 @@ void thermal_zone_device_unregister(stru
put_device(&tz->device);

thermal_notify_tz_delete(tz_id);
+
+ wait_for_completion(&tz->removal);
+ kfree(tz);
}
EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);

Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -117,6 +117,7 @@ struct thermal_cooling_device {
* @id: unique id number for each thermal zone
* @type: the thermal zone device type
* @device: &struct device for this thermal zone
+ * @removal: removal completion
* @trip_temp_attrs: attributes for trip points for sysfs: trip temperature
* @trip_type_attrs: attributes for trip points for sysfs: trip type
* @trip_hyst_attrs: attributes for trip points for sysfs: trip hysteresis
@@ -156,6 +157,7 @@ struct thermal_zone_device {
int id;
char type[THERMAL_NAME_LENGTH];
struct device device;
+ struct completion removal;
struct attribute_group trips_attribute_group;
struct thermal_attr *trip_temp_attrs;
struct thermal_attr *trip_type_attrs;



2023-12-08 19:21:52

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v1 2/3] thermal: Drop redundant and confusing device_is_registered() checks

From: Rafael J. Wysocki <[email protected]>

Multiple places in the thermal subsystem (most importantly, sysfs
attribute callback functions) check if the given thermal zone device is
still registered in order to return early in case the device_del() in
thermal_zone_device_unregister() has run already.

However, after thermal_zone_device_unregister() has been made wait for
all of the zone-related activity to complete before returning, it is
not necessary to do that any more, because all of the code holding a
reference to the thermal zone device object will be waited for even if
it does not do anything special to enforce this.

Accordingly, drop all of the device_is_registered() checks that are now
redundant and get rid of the zone locking that is not necessary any more
after dropping them.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/thermal_core.c | 9 -----
drivers/thermal/thermal_helpers.c | 5 ---
drivers/thermal/thermal_hwmon.c | 5 ---
drivers/thermal/thermal_sysfs.c | 60 +++-----------------------------------
4 files changed, 7 insertions(+), 72 deletions(-)

Index: linux-pm/drivers/thermal/thermal_sysfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_sysfs.c
+++ linux-pm/drivers/thermal/thermal_sysfs.c
@@ -83,24 +83,12 @@ trip_point_type_show(struct device *dev,
char *buf)
{
struct thermal_zone_device *tz = to_thermal_zone(dev);
- enum thermal_trip_type type;
int trip_id;

if (sscanf(attr->attr.name, "trip_point_%d_type", &trip_id) != 1)
return -EINVAL;

- mutex_lock(&tz->lock);
-
- if (!device_is_registered(dev)) {
- mutex_unlock(&tz->lock);
- return -ENODEV;
- }
-
- type = tz->trips[trip_id].type;
-
- mutex_unlock(&tz->lock);
-
- switch (type) {
+ switch (tz->trips[trip_id].type) {
case THERMAL_TRIP_CRITICAL:
return sprintf(buf, "critical\n");
case THERMAL_TRIP_HOT:
@@ -132,11 +120,6 @@ trip_point_temp_store(struct device *dev

mutex_lock(&tz->lock);

- if (!device_is_registered(dev)) {
- ret = -ENODEV;
- goto unlock;
- }
-
trip = &tz->trips[trip_id];

if (temp != trip->temperature) {
@@ -162,23 +145,12 @@ trip_point_temp_show(struct device *dev,
char *buf)
{
struct thermal_zone_device *tz = to_thermal_zone(dev);
- int trip_id, temp;
+ int trip_id;

if (sscanf(attr->attr.name, "trip_point_%d_temp", &trip_id) != 1)
return -EINVAL;

- mutex_lock(&tz->lock);
-
- if (!device_is_registered(dev)) {
- mutex_unlock(&tz->lock);
- return -ENODEV;
- }
-
- temp = tz->trips[trip_id].temperature;
-
- mutex_unlock(&tz->lock);
-
- return sprintf(buf, "%d\n", temp);
+ return sprintf(buf, "%d\n", tz->trips[trip_id].temperature);
}

static ssize_t
@@ -199,11 +171,6 @@ trip_point_hyst_store(struct device *dev

mutex_lock(&tz->lock);

- if (!device_is_registered(dev)) {
- ret = -ENODEV;
- goto unlock;
- }
-
trip = &tz->trips[trip_id];

if (hyst != trip->hysteresis) {
@@ -229,23 +196,12 @@ trip_point_hyst_show(struct device *dev,
char *buf)
{
struct thermal_zone_device *tz = to_thermal_zone(dev);
- int trip_id, hyst;
+ int trip_id;

if (sscanf(attr->attr.name, "trip_point_%d_hyst", &trip_id) != 1)
return -EINVAL;

- mutex_lock(&tz->lock);
-
- if (!device_is_registered(dev)) {
- mutex_unlock(&tz->lock);
- return -ENODEV;
- }
-
- hyst = tz->trips[trip_id].hysteresis;
-
- mutex_unlock(&tz->lock);
-
- return sprintf(buf, "%d\n", hyst);
+ return sprintf(buf, "%d\n", tz->trips[trip_id].hysteresis);
}

static ssize_t
@@ -294,11 +250,6 @@ emul_temp_store(struct device *dev, stru

mutex_lock(&tz->lock);

- if (!device_is_registered(dev)) {
- ret = -ENODEV;
- goto unlock;
- }
-
if (!tz->ops->set_emul_temp)
tz->emul_temperature = temperature;
else
@@ -307,7 +258,6 @@ emul_temp_store(struct device *dev, stru
if (!ret)
__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);

-unlock:
mutex_unlock(&tz->lock);

return ret ? ret : count;
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -203,9 +203,6 @@ int thermal_zone_device_set_policy(struc
mutex_lock(&thermal_governor_lock);
mutex_lock(&tz->lock);

- if (!device_is_registered(&tz->device))
- goto exit;
-
gov = __find_governor(strim(policy));
if (!gov)
goto exit;
@@ -471,12 +468,6 @@ static int thermal_zone_device_set_mode(
return ret;
}

- if (!device_is_registered(&tz->device)) {
- mutex_unlock(&tz->lock);
-
- return -ENODEV;
- }
-
if (tz->ops->change_mode)
ret = tz->ops->change_mode(tz, mode);

Index: linux-pm/drivers/thermal/thermal_helpers.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_helpers.c
+++ linux-pm/drivers/thermal/thermal_helpers.c
@@ -137,10 +137,7 @@ int thermal_zone_get_temp(struct thermal
goto unlock;
}

- if (device_is_registered(&tz->device))
- ret = __thermal_zone_get_temp(tz, temp);
- else
- ret = -ENODEV;
+ ret = __thermal_zone_get_temp(tz, temp);

unlock:
mutex_unlock(&tz->lock);
Index: linux-pm/drivers/thermal/thermal_hwmon.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_hwmon.c
+++ linux-pm/drivers/thermal/thermal_hwmon.c
@@ -80,10 +80,7 @@ temp_crit_show(struct device *dev, struc

mutex_lock(&tz->lock);

- if (device_is_registered(&tz->device))
- ret = tz->ops->get_crit_temp(tz, &temperature);
- else
- ret = -ENODEV;
+ ret = tz->ops->get_crit_temp(tz, &temperature);

mutex_unlock(&tz->lock);




2023-12-11 13:37:29

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal: core: Remove thermal zones during unregistration

Hi Rafael,

On 12/8/23 19:11, Rafael J. Wysocki wrote:
> Hi All,
>
> This patch series adds a mechanism to guarantee that
> thermal_zone_device_unregister() will not return until all of the active
> references to the thermal zone device object in question have been dropped
> and it has been deleted (patch [1/3]).
>
> This supersedes the approach used so far in which all thermal zone sysfs
> attribute callbacks check if the zone device is still registered under the
> zone lock, so as to return early if that is not the case, as it means that
> device_del() has been called for the thermal zone in question (and returned).
> It is not necessary to do that any more after patch [1/3], so patch [2/3]
> removes those checks from the code and drops zone locking that is not
> necessary any more either.
>
> Patch [3/3] uses the observation that the thermal subsystem does not need to
> check if a thermal zone device is registered at all, because it can use its
> own data to determine whether or not the thermal zone is going away and so
> it may not be worth updating it, for example.
>
> Please refer to the patch changelogs for details.
>
> The series depends on new thermal material in linux-next, but it should not
> substantially depend on any changes that have not made it into linux-next yet.
>
> Thanks!
>
>
>

I like the concept with completion thing for this.
I have tired to stress test these patches with my mock
thermal zone module load/unload and it works good.

The test was doing the these bits:
for i in $(seq 1 1000000) ; do cat
/sys/class/thermal/thermal_zone2/trip_point_0_temp > /dev/null 2>&1 ; done &
for i in $(seq 1 10000) ; do insmod /data/selftest_ipa.ko ; rmmod
selftest_ipa ; done &

I couldn't trigger any issues in reading from this
trip temp file in background, which should go now w/o the
locking. I thought it would be nice test, since we have
direct call to trips array 'tz->trips[trip_id].temperature'.
Let me know if you think about other scenario for stress testing it.
(I have also checked the 'temp' sysfs read, where the mutex for
tz is used - also no issues).

Feel free to add to all patches:

Reviewed-and-tested-by: Lukasz Luba <[email protected]>

Regards,
Lukasz

2023-12-11 13:45:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] thermal: core: Remove thermal zones during unregistration

Hi Lukasz,

On Mon, Dec 11, 2023 at 2:37 PM Lukasz Luba <[email protected]> wrote:
>
> Hi Rafael,
>
> On 12/8/23 19:11, Rafael J. Wysocki wrote:
> > Hi All,
> >
> > This patch series adds a mechanism to guarantee that
> > thermal_zone_device_unregister() will not return until all of the active
> > references to the thermal zone device object in question have been dropped
> > and it has been deleted (patch [1/3]).
> >
> > This supersedes the approach used so far in which all thermal zone sysfs
> > attribute callbacks check if the zone device is still registered under the
> > zone lock, so as to return early if that is not the case, as it means that
> > device_del() has been called for the thermal zone in question (and returned).
> > It is not necessary to do that any more after patch [1/3], so patch [2/3]
> > removes those checks from the code and drops zone locking that is not
> > necessary any more either.
> >
> > Patch [3/3] uses the observation that the thermal subsystem does not need to
> > check if a thermal zone device is registered at all, because it can use its
> > own data to determine whether or not the thermal zone is going away and so
> > it may not be worth updating it, for example.
> >
> > Please refer to the patch changelogs for details.
> >
> > The series depends on new thermal material in linux-next, but it should not
> > substantially depend on any changes that have not made it into linux-next yet.
> >
> > Thanks!
> >
> >
> >
>
> I like the concept with completion thing for this.
> I have tired to stress test these patches with my mock
> thermal zone module load/unload and it works good.
>
> The test was doing the these bits:
> for i in $(seq 1 1000000) ; do cat
> /sys/class/thermal/thermal_zone2/trip_point_0_temp > /dev/null 2>&1 ; done &
> for i in $(seq 1 10000) ; do insmod /data/selftest_ipa.ko ; rmmod
> selftest_ipa ; done &
>
> I couldn't trigger any issues in reading from this
> trip temp file in background, which should go now w/o the
> locking. I thought it would be nice test, since we have
> direct call to trips array 'tz->trips[trip_id].temperature'.
> Let me know if you think about other scenario for stress testing it.
> (I have also checked the 'temp' sysfs read, where the mutex for
> tz is used - also no issues).
>
> Feel free to add to all patches:
>
> Reviewed-and-tested-by: Lukasz Luba <[email protected]>

Thank you!

2023-12-11 16:28:57

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] thermal: core: Make thermal_zone_device_unregister() return after freeing the zone

On 08/12/2023 20:13, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Make thermal_zone_device_unregister() wait until all of the references
> to the given thermal zone object have been dropped and free it before
> returning.
>
> This guarantees that when thermal_zone_device_unregister() returns,
> there is no leftover activity regarding the thermal zone in question
> which is required by some of its callers (for instance, modular driver
> code that wants to know when it is safe to let the module go away).
>
> Subsequently, this will allow some confusing device_is_registered()
> checks to be dropped from the thermal sysfs and core code.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

Definitively agree on the change

Acked-by: Daniel Lezcano <[email protected]>

Would it make sense to use kref_get/put ?


> drivers/thermal/thermal_core.c | 6 +++++-
> include/linux/thermal.h | 2 ++
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -822,7 +822,7 @@ static void thermal_release(struct devic
> tz = to_thermal_zone(dev);
> thermal_zone_destroy_device_groups(tz);
> mutex_destroy(&tz->lock);
> - kfree(tz);
> + complete(&tz->removal);
> } else if (!strncmp(dev_name(dev), "cooling_device",
> sizeof("cooling_device") - 1)) {
> cdev = to_cooling_device(dev);
> @@ -1315,6 +1315,7 @@ thermal_zone_device_register_with_trips(
> INIT_LIST_HEAD(&tz->thermal_instances);
> ida_init(&tz->ida);
> mutex_init(&tz->lock);
> + init_completion(&tz->removal);
> id = ida_alloc(&thermal_tz_ida, GFP_KERNEL);
> if (id < 0) {
> result = id;
> @@ -1494,6 +1495,9 @@ void thermal_zone_device_unregister(stru
> put_device(&tz->device);
>
> thermal_notify_tz_delete(tz_id);
> +
> + wait_for_completion(&tz->removal);
> + kfree(tz);
> }
> EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
>
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -117,6 +117,7 @@ struct thermal_cooling_device {
> * @id: unique id number for each thermal zone
> * @type: the thermal zone device type
> * @device: &struct device for this thermal zone
> + * @removal: removal completion
> * @trip_temp_attrs: attributes for trip points for sysfs: trip temperature
> * @trip_type_attrs: attributes for trip points for sysfs: trip type
> * @trip_hyst_attrs: attributes for trip points for sysfs: trip hysteresis
> @@ -156,6 +157,7 @@ struct thermal_zone_device {
> int id;
> char type[THERMAL_NAME_LENGTH];
> struct device device;
> + struct completion removal;
> struct attribute_group trips_attribute_group;
> struct thermal_attr *trip_temp_attrs;
> struct thermal_attr *trip_type_attrs;
>
>
>

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2023-12-11 16:42:33

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] thermal: core: Make thermal_zone_device_unregister() return after freeing the zone

On Mon, Dec 11, 2023 at 5:28 PM Daniel Lezcano
<[email protected]> wrote:
>
> On 08/12/2023 20:13, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Make thermal_zone_device_unregister() wait until all of the references
> > to the given thermal zone object have been dropped and free it before
> > returning.
> >
> > This guarantees that when thermal_zone_device_unregister() returns,
> > there is no leftover activity regarding the thermal zone in question
> > which is required by some of its callers (for instance, modular driver
> > code that wants to know when it is safe to let the module go away).
> >
> > Subsequently, this will allow some confusing device_is_registered()
> > checks to be dropped from the thermal sysfs and core code.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
>
> Definitively agree on the change
>
> Acked-by: Daniel Lezcano <[email protected]>

Thanks!

> Would it make sense to use kref_get/put ?

Why and where?

2023-12-11 17:35:09

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] thermal: core: Make thermal_zone_device_unregister() return after freeing the zone

On 11/12/2023 17:42, Rafael J. Wysocki wrote:
> On Mon, Dec 11, 2023 at 5:28 PM Daniel Lezcano
> <[email protected]> wrote:
>>
>> On 08/12/2023 20:13, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <[email protected]>
>>>
>>> Make thermal_zone_device_unregister() wait until all of the references
>>> to the given thermal zone object have been dropped and free it before
>>> returning.
>>>
>>> This guarantees that when thermal_zone_device_unregister() returns,
>>> there is no leftover activity regarding the thermal zone in question
>>> which is required by some of its callers (for instance, modular driver
>>> code that wants to know when it is safe to let the module go away).
>>>
>>> Subsequently, this will allow some confusing device_is_registered()
>>> checks to be dropped from the thermal sysfs and core code.
>>>
>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>> ---
>>
>> Definitively agree on the change
>>
>> Acked-by: Daniel Lezcano <[email protected]>
>
> Thanks!
>
>> Would it make sense to use kref_get/put ?
>
> Why and where?

Well it is a general question. Usually this kind of removal is tied with
a refcount

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2023-12-11 17:39:43

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] thermal: Drop redundant and confusing device_is_registered() checks

On 08/12/2023 20:19, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Multiple places in the thermal subsystem (most importantly, sysfs
> attribute callback functions) check if the given thermal zone device is
> still registered in order to return early in case the device_del() in
> thermal_zone_device_unregister() has run already.
>
> However, after thermal_zone_device_unregister() has been made wait for
> all of the zone-related activity to complete before returning, it is
> not necessary to do that any more, because all of the code holding a
> reference to the thermal zone device object will be waited for even if
> it does not do anything special to enforce this.
>
> Accordingly, drop all of the device_is_registered() checks that are now
> redundant and get rid of the zone locking that is not necessary any more
> after dropping them.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

[ ... ]

> @@ -132,11 +120,6 @@ trip_point_temp_store(struct device *dev
>
> mutex_lock(&tz->lock);
>
> - if (!device_is_registered(dev)) {
> - ret = -ENODEV;
> - goto unlock;
> - }
> -
> trip = &tz->trips[trip_id];
>
> if (temp != trip->temperature) {
> @@ -162,23 +145,12 @@ trip_point_temp_show(struct device *dev,
> char *buf)
> {
> struct thermal_zone_device *tz = to_thermal_zone(dev);
> - int trip_id, temp;
> + int trip_id;
>
> if (sscanf(attr->attr.name, "trip_point_%d_temp", &trip_id) != 1)
> return -EINVAL;
>
> - mutex_lock(&tz->lock);
> -
> - if (!device_is_registered(dev)) {
> - mutex_unlock(&tz->lock);
> - return -ENODEV;
> - }
> -
> - temp = tz->trips[trip_id].temperature;
> -
> - mutex_unlock(&tz->lock);
> -
> - return sprintf(buf, "%d\n", temp);
> + return sprintf(buf, "%d\n", tz->trips[trip_id].temperature);

Without the lock, could the trip_temp_store() make the value change
while we read it?

[ ... ]

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2023-12-11 17:58:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] thermal: Drop redundant and confusing device_is_registered() checks

On Mon, Dec 11, 2023 at 6:39 PM Daniel Lezcano
<[email protected]> wrote:
>
> On 08/12/2023 20:19, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Multiple places in the thermal subsystem (most importantly, sysfs
> > attribute callback functions) check if the given thermal zone device is
> > still registered in order to return early in case the device_del() in
> > thermal_zone_device_unregister() has run already.
> >
> > However, after thermal_zone_device_unregister() has been made wait for
> > all of the zone-related activity to complete before returning, it is
> > not necessary to do that any more, because all of the code holding a
> > reference to the thermal zone device object will be waited for even if
> > it does not do anything special to enforce this.
> >
> > Accordingly, drop all of the device_is_registered() checks that are now
> > redundant and get rid of the zone locking that is not necessary any more
> > after dropping them.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
>
> [ ... ]
>
> > @@ -132,11 +120,6 @@ trip_point_temp_store(struct device *dev
> >
> > mutex_lock(&tz->lock);
> >
> > - if (!device_is_registered(dev)) {
> > - ret = -ENODEV;
> > - goto unlock;
> > - }
> > -
> > trip = &tz->trips[trip_id];
> >
> > if (temp != trip->temperature) {
> > @@ -162,23 +145,12 @@ trip_point_temp_show(struct device *dev,
> > char *buf)
> > {
> > struct thermal_zone_device *tz = to_thermal_zone(dev);
> > - int trip_id, temp;
> > + int trip_id;
> >
> > if (sscanf(attr->attr.name, "trip_point_%d_temp", &trip_id) != 1)
> > return -EINVAL;
> >
> > - mutex_lock(&tz->lock);
> > -
> > - if (!device_is_registered(dev)) {
> > - mutex_unlock(&tz->lock);
> > - return -ENODEV;
> > - }
> > -
> > - temp = tz->trips[trip_id].temperature;
> > -
> > - mutex_unlock(&tz->lock);
> > -
> > - return sprintf(buf, "%d\n", temp);
> > + return sprintf(buf, "%d\n", tz->trips[trip_id].temperature);
>
> Without the lock, could the trip_temp_store() make the value change
> while we read it?

The lock doesn't change that, because the write can occur before
dropping the lock and the printf() and reading an int is atomic on all
architectures supported by Linux.

2023-12-11 18:57:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] thermal: core: Make thermal_zone_device_unregister() return after freeing the zone

On Mon, Dec 11, 2023 at 6:35 PM Daniel Lezcano
<[email protected]> wrote:
>
> On 11/12/2023 17:42, Rafael J. Wysocki wrote:
> > On Mon, Dec 11, 2023 at 5:28 PM Daniel Lezcano
> > <[email protected]> wrote:
> >>
> >> On 08/12/2023 20:13, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <[email protected]>
> >>>
> >>> Make thermal_zone_device_unregister() wait until all of the references
> >>> to the given thermal zone object have been dropped and free it before
> >>> returning.
> >>>
> >>> This guarantees that when thermal_zone_device_unregister() returns,
> >>> there is no leftover activity regarding the thermal zone in question
> >>> which is required by some of its callers (for instance, modular driver
> >>> code that wants to know when it is safe to let the module go away).
> >>>
> >>> Subsequently, this will allow some confusing device_is_registered()
> >>> checks to be dropped from the thermal sysfs and core code.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <[email protected]>
> >>> ---
> >>
> >> Definitively agree on the change
> >>
> >> Acked-by: Daniel Lezcano <[email protected]>
> >
> > Thanks!
> >
> >> Would it make sense to use kref_get/put ?
> >
> > Why and where?
>
> Well it is a general question. Usually this kind of removal is tied with
> a refcount

It is tied to a refcount already, but the problem is that the last
reference can be dropped from a thread concurrent to the removal one.

The completion effectively causes the removal thread to wait for the
refcont to become 0.

2023-12-12 10:27:17

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] thermal: Drop redundant and confusing device_is_registered() checks

On 08/12/2023 20:19, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Multiple places in the thermal subsystem (most importantly, sysfs
> attribute callback functions) check if the given thermal zone device is
> still registered in order to return early in case the device_del() in
> thermal_zone_device_unregister() has run already.
>
> However, after thermal_zone_device_unregister() has been made wait for
> all of the zone-related activity to complete before returning, it is
> not necessary to do that any more, because all of the code holding a
> reference to the thermal zone device object will be waited for even if
> it does not do anything special to enforce this.
>
> Accordingly, drop all of the device_is_registered() checks that are now
> redundant and get rid of the zone locking that is not necessary any more
> after dropping them.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

Acked-by: Daniel Lezcano <[email protected]>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2023-12-12 10:29:12

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1 3/3] thermal: core: Rework thermal zone availability check

On 08/12/2023 20:20, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> In order to avoid running __thermal_zone_device_update() for thermal
> zones going away, the thermal zone lock is held around device_del()
> in thermal_zone_device_unregister() and thermal_zone_device_update()
> passes the given thermal zone device to device_is_registered().
> This allows thermal_zone_device_update() to skip the
> __thermal_zone_device_update() if device_del() has already run for
> the thermal zone at hand.
>
> However, instead of looking at driver core internals, the thermal
> subsystem may as well rely on its own data structures for this
> purpose. Namely, if the thermal zone is not present in
> thermal_tz_list, it can be regarded as unavailable, which in fact is
> already the case in thermal_zone_device_unregister(). Accordingly,
> the device_is_registered() check in thermal_zone_device_update() can
> be replaced with checking whether or not the node list_head in struct
> thermal_zone_device is empty, in which case it is not there in
> thermal_tz_list.
>
> To make this work, though, it is necessary to initialize tz->node
> in thermal_zone_device_register_with_trips() before registering the
> thermal zone device and it needs to be added to thermal_tz_list and
> deleted from it under its zone lock.
>
> After the above modifications, the zone lock does not need to be
> held around device_del() in thermal_zone_device_unregister() any more.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

Acked-by: Daniel Lezcano <[email protected]>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog