2023-01-25 14:55:57

by Rafael J. Wysocki

Subject: [PATCH v1 0/3] thermal: intel: int340x: Use generic trip points table

Hi All,

This series replaces the following patch:

https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/

but it has been almost completely rewritten, so I've dropped all tags from it.

The most significant difference is that firmware-induced trip point updates are
now handled in a less controversial manner (no renumbering, just temperature
updates if applicable).

Please refer to the individual patch changelogs for details.

The series is on top of this patch:

https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/

which applies on top of the linux-next branch in linux-pm.git from today.

Thanks!





2023-01-25 14:56:01

by Rafael J. Wysocki

Subject: [PATCH v1 1/3] thermal: intel: int340x: Rework updating trip points

From: Rafael J. Wysocki <[email protected]>

It is generally invalid to change the trip point indices after they have
been exposed via sysfs.

Moreover, the thermal objects in the ACPI namespace cannot go away and
appear on the fly. In practice, the only thing that can happen when the
INT3403_PERF_TRIP_POINT_CHANGED notification is sent by the platform
firmware is a change of the return values of those thermal objects.

For this reason, add a special function for updating the trip point
temperatures after re-evaluating the respective ACPI thermal objects
and change int3403_notify() to invoke it instead of
int340x_thermal_read_trips() that would change the trip point indices
on errors. Also remove the locking from the latter, because it is only
called before registering the thermal zone and it cannot race with the
zone's callbacks.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/intel/int340x_thermal/int3403_thermal.c | 2
drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c | 51 +++++++++--
drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.h | 2
3 files changed, 47 insertions(+), 8 deletions(-)

Index: linux-pm/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
===================================================================
--- linux-pm.orig/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
+++ linux-pm/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
@@ -168,13 +168,11 @@ static int int340x_thermal_get_trip_conf
 	return 0;
 }

-int int340x_thermal_read_trips(struct int34x_thermal_zone *int34x_zone)
+static int int340x_thermal_read_trips(struct int34x_thermal_zone *int34x_zone)
 {
 	int trip_cnt = int34x_zone->aux_trip_nr;
 	int i;

-	mutex_lock(&int34x_zone->trip_mutex);
-
 	int34x_zone->crt_trip_id = -1;
 	if (!int340x_thermal_get_trip_config(int34x_zone->adev->handle, "_CRT",
 					     &int34x_zone->crt_temp))
@@ -202,11 +200,8 @@ int int340x_thermal_read_trips(struct in
 		int34x_zone->act_trips[i].valid = true;
 	}

-	mutex_unlock(&int34x_zone->trip_mutex);
-
 	return trip_cnt;
 }
-EXPORT_SYMBOL_GPL(int340x_thermal_read_trips);

 static struct thermal_zone_params int340x_thermal_params = {
 	.governor_name = "user_space",
@@ -309,6 +304,50 @@ void int340x_thermal_zone_remove(struct
 }
 EXPORT_SYMBOL_GPL(int340x_thermal_zone_remove);

+void int340x_thermal_update_trips(struct int34x_thermal_zone *int34x_zone)
+{
+	acpi_handle zone_handle = int34x_zone->adev->handle;
+	int i, err;
+
+	mutex_lock(&int34x_zone->trip_mutex);
+
+	if (int34x_zone->crt_trip_id > 0) {
+		err = int340x_thermal_get_trip_config(zone_handle, "_CRT",
+						      &int34x_zone->crt_temp);
+		if (err)
+			int34x_zone->crt_temp = THERMAL_TEMP_INVALID;
+	}
+
+	if (int34x_zone->hot_trip_id > 0) {
+		err = int340x_thermal_get_trip_config(zone_handle, "_HOT",
+						      &int34x_zone->hot_temp);
+		if (err)
+			int34x_zone->hot_temp = THERMAL_TEMP_INVALID;
+	}
+
+	if (int34x_zone->psv_trip_id > 0) {
+		err = int340x_thermal_get_trip_config(zone_handle, "_PSV",
+						      &int34x_zone->psv_temp);
+		if (err)
+			int34x_zone->psv_temp = THERMAL_TEMP_INVALID;
+	}
+
+	for (i = 0; i < INT340X_THERMAL_MAX_ACT_TRIP_COUNT; i++) {
+		char name[5] = { '_', 'A', 'C', '0' + i, '\0' };
+
+		if (!int34x_zone->act_trips[i].valid)
+			break;
+
+		err = int340x_thermal_get_trip_config(zone_handle, name,
+						      &int34x_zone->act_trips[i].temp);
+		if (err)
+			int34x_zone->act_trips[i].temp = THERMAL_TEMP_INVALID;
+	}
+
+	mutex_unlock(&int34x_zone->trip_mutex);
+}
+EXPORT_SYMBOL_GPL(int340x_thermal_update_trips);
+
 MODULE_AUTHOR("Aaron Lu <[email protected]>");
 MODULE_AUTHOR("Srinivas Pandruvada <[email protected]>");
 MODULE_DESCRIPTION("Intel INT340x common thermal zone handler");
Index: linux-pm/drivers/thermal/intel/int340x_thermal/int3403_thermal.c
===================================================================
--- linux-pm.orig/drivers/thermal/intel/int340x_thermal/int3403_thermal.c
+++ linux-pm/drivers/thermal/intel/int340x_thermal/int3403_thermal.c
@@ -69,7 +69,7 @@ static void int3403_notify(acpi_handle h
 						   THERMAL_TRIP_VIOLATED);
 		break;
 	case INT3403_PERF_TRIP_POINT_CHANGED:
-		int340x_thermal_read_trips(obj->int340x_zone);
+		int340x_thermal_update_trips(obj->int340x_zone);
 		int340x_thermal_zone_device_update(obj->int340x_zone,
 						   THERMAL_TRIP_CHANGED);
 		break;
Index: linux-pm/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.h
===================================================================
--- linux-pm.orig/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.h
+++ linux-pm/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.h
@@ -38,7 +38,7 @@ struct int34x_thermal_zone {
 struct int34x_thermal_zone *int340x_thermal_zone_add(struct acpi_device *,
 				int (*get_temp) (struct thermal_zone_device *, int *));
 void int340x_thermal_zone_remove(struct int34x_thermal_zone *);
-int int340x_thermal_read_trips(struct int34x_thermal_zone *int34x_zone);
+void int340x_thermal_update_trips(struct int34x_thermal_zone *int34x_zone);

 static inline void int340x_thermal_zone_set_priv_data(
 			struct int34x_thermal_zone *tzone, void *priv_data)
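
For readers following the thread, the update path added by the patch above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: `demo_trip`, `demo_read_fn`, `demo_sample_read` and `demo_update_trips` are hypothetical names, and the only value borrowed from the kernel is the invalid-temperature marker (-274000, the value of THERMAL_TEMP_INVALID). Unlike the patch, the sketch skips unregistered slots instead of stopping at the first one.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TEMP_INVALID (-274000)	/* same value as THERMAL_TEMP_INVALID */

/* Hypothetical stand-in for one trip point slot. */
struct demo_trip {
	bool registered;	/* slot was exposed at registration time */
	int temp_mc;		/* temperature in millidegrees Celsius */
};

/* Hypothetical firmware reader: returns 0 and stores the value on success. */
typedef int (*demo_read_fn)(size_t index, int *temp_mc);

/* Hypothetical reader: slot 1 fails, every other slot reports 90000. */
static int demo_sample_read(size_t index, int *temp_mc)
{
	if (index == 1)
		return -1;
	*temp_mc = 90000;
	return 0;
}

/*
 * Refresh temperatures in place. The number and order of slots never
 * change, so indices already visible to user space stay valid; a failed
 * read only marks that slot's temperature as invalid.
 */
static void demo_update_trips(struct demo_trip *trips, size_t count,
			      demo_read_fn read_temp)
{
	for (size_t i = 0; i < count; i++) {
		int temp;

		if (!trips[i].registered)
			continue;

		if (read_temp(i, &temp))
			trips[i].temp_mc = DEMO_TEMP_INVALID;
		else
			trips[i].temp_mc = temp;
	}
}
```

Calling `demo_update_trips(trips, count, demo_sample_read)` on a three-slot table leaves slot 0 updated, slot 1 marked invalid and an unregistered slot untouched, while every slot keeps its index.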




2023-01-25 15:20:52

by Rafael J. Wysocki

Subject: Re: [PATCH v1 0/3] thermal: intel: int340x: Use generic trip points table

Hi Srinivas,

On Wed, Jan 25, 2023 at 3:55 PM Rafael J. Wysocki <[email protected]> wrote:
>
> Hi All,
>
> This series replaces the following patch:
>
> https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/
>
> but it has been almost completely rewritten, so I've dropped all tags from it.
>
> The most significant difference is that firmware-induced trip point updates are
> now handled in a less controversial manner (no renumbering, just temperature
> updates if applicable).
>
> Please refer to the individual patch changelogs for details.
>
> The series is on top of this patch:
>
> https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/
>
> which applies on top of the linux-next branch in linux-pm.git from today.

There are two additional branches in linux-pm.git:

thermal-intel-fixes
thermal-intel-testing

The former contains just the fixes to go on top of 6.2-rc5, and the latter
contains this series on top of those and the current thermal-intel branch
I have locally, with the Intel thermal driver changes for 6.3.

I would appreciate it if you could give each of them a go in your test setup.

Cheers!

2023-01-26 00:03:10

by srinivas pandruvada

Subject: Re: [PATCH v1 0/3] thermal: intel: int340x: Use generic trip points table

Hi Rafael,


On Wed, 2023-01-25 at 16:20 +0100, Rafael J. Wysocki wrote:
> Hi Srinivas,
>
> On Wed, Jan 25, 2023 at 3:55 PM Rafael J. Wysocki <[email protected]>
> wrote:
> >
> > Hi All,
> >
> > This series replaces the following patch:
> >
> > https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/
> >
> > but it has been almost completely rewritten, so I've dropped all
> > tags from it.
> >
> >

[...]

> > The series is on top of this patch:
> >
> > https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/
> >
> > which applies on top of the linux-next branch in linux-pm.git from
> > today.
>
> There are two additional branches in linux-pm.git:
>
> thermal-intel-fixes
Tested on two systems; no issues observed.

> thermal-intel-testing
branch: thermal-intel-test

No issues, but the number of trips is not the same, because invalid trips
are not registered.
I'm not sure if this is correct. They may be invalid at boot, but the
firmware may update them later (I'm not aware of such a scenario).

For example, the hot trip is not registered.

Current:

thermal_zone9/trip_point_0_type:critical
thermal_zone9/trip_point_0_temp:125050
thermal_zone9/trip_point_0_hyst:0

thermal_zone9/trip_point_1_type:hot
thermal_zone9/trip_point_1_temp:-273250
thermal_zone9/trip_point_1_hyst:0

thermal_zone9/trip_point_2_type:passive
thermal_zone9/trip_point_2_temp:103050
thermal_zone9/trip_point_2_hyst:0

thermal_zone9/trip_point_3_type:active
thermal_zone9/trip_point_3_temp:103050
thermal_zone9/trip_point_3_hyst:0

thermal_zone9/trip_point_4_type:active
thermal_zone9/trip_point_4_temp:101050
thermal_zone9/trip_point_4_hyst:0

thermal_zone9/trip_point_5_type:active
thermal_zone9/trip_point_5_temp:100050
thermal_zone9/trip_point_5_hyst:0


thermal_zone9/trip_point_6_type:active
thermal_zone9/trip_point_6_temp:98550
thermal_zone9/trip_point_6_hyst:0

thermal_zone9/trip_point_7_type:active
thermal_zone9/trip_point_7_temp:97050
thermal_zone9/trip_point_7_hyst:0


With the 6.3-rc1 changes:

thermal_zone9/trip_point_0_type:critical
thermal_zone9/trip_point_0_temp:125050
thermal_zone9/trip_point_0_hyst:0

thermal_zone9/trip_point_1_type:passive
thermal_zone9/trip_point_1_temp:103050
thermal_zone9/trip_point_1_hyst:0

thermal_zone9/trip_point_2_type:active
thermal_zone9/trip_point_2_temp:103050
thermal_zone9/trip_point_2_hyst:0

thermal_zone9/trip_point_3_type:active
thermal_zone9/trip_point_3_temp:101050
thermal_zone9/trip_point_3_hyst:0

thermal_zone9/trip_point_4_type:active
thermal_zone9/trip_point_4_temp:100050
thermal_zone9/trip_point_4_hyst:0

thermal_zone9/trip_point_5_type:active
thermal_zone9/trip_point_5_temp:98550
thermal_zone9/trip_point_5_hyst:0


thermal_zone9/trip_point_6_type:active
thermal_zone9/trip_point_6_temp:97050
thermal_zone9/trip_point_6_hyst:0

Thanks,
Srinivas


>
> The former is just fixes to go on top of 6.2-rc5 and the latter -
> this
> series on top of those and the current thermal-intel branch I have
> locally with the Intel thermal drivers changes for 6.3.
>
> I would appreciate giving each of them a go in your test setup.
>
> Cheers!


2023-01-26 13:13:52

by Rafael J. Wysocki

Subject: Re: [PATCH v1 0/3] thermal: intel: int340x: Use generic trip points table

On Thursday, January 26, 2023 1:02:59 AM CET srinivas pandruvada wrote:
> Hi Rafael,
>
>
> On Wed, 2023-01-25 at 16:20 +0100, Rafael J. Wysocki wrote:
> > Hi Srinivas,
> >
> > On Wed, Jan 25, 2023 at 3:55 PM Rafael J. Wysocki <[email protected]>
> > wrote:
> > >
> > > Hi All,
> > >
> > > This series replaces the following patch:
> > >
> > > https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/
> > >
> > > but it has been almost completely rewritten, so I've dropped all
> > > tags from it.
> > >
> > >
>
> [...]
>
> > > The series is on top of this patch:
> > >
> > > https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/
> > >
> > > which applies on top of the linux-next branch in linux-pm.git from
> > > today.
> >
> > There are two additional branches in linux-pm.git:
> >
> > thermal-intel-fixes
> Tested on two systems; no issues observed.

Great! I'll move this to linux-next then.

> > thermal-intel-testing
> branch: thermal-intel-test
>
> No issues, but the number of trips is not the same, because invalid trips
> are not registered.
> I'm not sure if this is correct.

It may not be. At least it is a change in behavior that is not expected to
happen after these changes.

> They may be invalid at boot, but the
> firmware may update them later (I'm not aware of such a scenario).
>
> For example, the hot trip is not registered.
>
> Current:
>
> thermal_zone9/trip_point_0_type:critical
> thermal_zone9/trip_point_0_temp:125050
> thermal_zone9/trip_point_0_hyst:0
>
> thermal_zone9/trip_point_1_type:hot
> thermal_zone9/trip_point_1_temp:-273250
> thermal_zone9/trip_point_1_hyst:0

So this means that _HOT was evaluated successfully (otherwise the trip point
index would be negative), but it probably returned an invalid temperature
(likely 0) that was turned into an error by the temperature range check in the
new ACPI helper introduced by the change.
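
The index shift visible in the two sysfs listings can be reproduced with a small userspace sketch. It assumes a hypothetical table builder that either skips a trip whose firmware value is out of range or keeps it as a placeholder; `demo_type`, `demo_source` and `demo_index_of` are illustrative names that do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_type { DEMO_CRITICAL, DEMO_HOT, DEMO_PASSIVE, DEMO_ACTIVE };

/* Hypothetical description of what the firmware returned for one trip. */
struct demo_source {
	enum demo_type type;
	bool in_range;	/* false if the returned temperature failed the check */
};

/*
 * Return the index at which the first trip of type `wanted` ends up,
 * or -1 if it is not registered at all. With keep_invalid == false,
 * out-of-range trips are skipped and every later trip moves down by one.
 */
static int demo_index_of(const struct demo_source *src, size_t n,
			 enum demo_type wanted, bool keep_invalid)
{
	int next = 0;

	for (size_t i = 0; i < n; i++) {
		if (!src[i].in_range && !keep_invalid)
			continue;
		if (src[i].type == wanted)
			return next;
		next++;
	}
	return -1;
}
```

With a critical, an out-of-range hot, and a passive trip in that order, the passive trip lands at index 2 when the hot trip is kept and at index 1 when it is skipped, which matches the difference between the two listings.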

OK, thanks for testing!

I've added the appended patch to the thermal-intel-test branch. Can you please
check if it makes that difference in behavior go away?

---
From: Rafael J. Wysocki <[email protected]>
Subject: [PATCH] thermal: ACPI: Initialize trips if temperature is out of range

In some cases it is still useful to register a trip point if the
temperature returned by the corresponding ACPI thermal object (for
example, _HOT) is invalid to start with, because the same ACPI
thermal object may start to return a valid temperature after a
system configuration change (for example, from an AC power source
to battery and vice versa).

For this reason, if the ACPI thermal object evaluated by
thermal_acpi_trip_init() successfully returns a temperature value that
is out of the range of values taken into account, initialize the trip
point using THERMAL_TEMP_INVALID as the temperature value instead of
returning an error to allow the user of the trip point to decide what
to do with it.

Also update pch_wpt_add_acpi_psv_trip() to reject trip points with
invalid temperature values.

Fixes: 7a0e39748861 ("thermal: ACPI: Add ACPI trip point routines")
Reported-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/thermal/intel/intel_pch_thermal.c | 2 +-
drivers/thermal/thermal_acpi.c | 7 ++++---
2 files changed, 5 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/thermal/thermal_acpi.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_acpi.c
+++ linux-pm/drivers/thermal/thermal_acpi.c
@@ -64,13 +64,14 @@ static int thermal_acpi_trip_init(struct
 		return -ENODATA;
 	}

-	if (temp < TEMP_MIN_DECIK || temp >= TEMP_MAX_DECIK) {
+	if (temp >= TEMP_MIN_DECIK && temp <= TEMP_MAX_DECIK) {
+		trip->temperature = deci_kelvin_to_millicelsius(temp);
+	} else {
 		acpi_handle_debug(adev->handle, "%s result %llu out of range\n",
 				  obj_name, temp);
-		return -ENODATA;
+		trip->temperature = THERMAL_TEMP_INVALID;
 	}

-	trip->temperature = deci_kelvin_to_millicelsius(temp);
 	trip->hysteresis = 0;
 	trip->type = type;

Index: linux-pm/drivers/thermal/intel/intel_pch_thermal.c
===================================================================
--- linux-pm.orig/drivers/thermal/intel/intel_pch_thermal.c
+++ linux-pm/drivers/thermal/intel/intel_pch_thermal.c
@@ -107,7 +107,7 @@ static void pch_wpt_add_acpi_psv_trip(st
 		return;

 	ret = thermal_acpi_trip_passive(adev, &ptd->trips[*nr_trips]);
-	if (ret || ptd->trips[*nr_trips].temperature <= 0)
+	if (ret)
-	if (ret)
+	if (ret || ptd->trips[*nr_trips].temperature <= 0)
 		return;

 	++(*nr_trips);
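
The behavior change in this patch can be mirrored by a minimal userspace sketch: an out-of-range reading no longer fails initialization but produces a placeholder temperature, and the caller decides whether to register the trip. The `demo_*` names are hypothetical, and the range bounds (2180 and 4480 deci-kelvin) are assumed values for illustration; only the conversion formula (deci-kelvin times 100, minus 273150) and the -274000 marker follow the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TEMP_INVALID	(-274000)	/* same value as THERMAL_TEMP_INVALID */
#define DEMO_MIN_DECIK		2180		/* assumed lower bound, deci-kelvin */
#define DEMO_MAX_DECIK		4480		/* assumed upper bound, deci-kelvin */

/* Hypothetical, reduced stand-in for a trip point. */
struct demo_trip {
	int temperature;	/* millidegrees Celsius */
	int hysteresis;
};

/* Convert deci-kelvin to millidegrees Celsius (in-range inputs only). */
static int demo_decik_to_mc(uint64_t decik)
{
	return (int)decik * 100 - 273150;
}

/*
 * Initialize the trip from a firmware reading. An out-of-range reading
 * is not an error: the trip gets the invalid marker so that its index
 * can still be reserved.
 */
static int demo_trip_init(struct demo_trip *trip, uint64_t decik)
{
	if (decik >= DEMO_MIN_DECIK && decik <= DEMO_MAX_DECIK)
		trip->temperature = demo_decik_to_mc(decik);
	else
		trip->temperature = DEMO_TEMP_INVALID;

	trip->hysteresis = 0;
	return 0;
}

/* Caller-side policy in the style of the PCH driver: reject values <= 0. */
static bool demo_accept_passive(const struct demo_trip *trip)
{
	return trip->temperature > 0;
}
```

A reading of 3982 deci-kelvin converts to 125050, the critical trip temperature shown earlier in the thread, while a reading of 0 yields the invalid marker and is rejected by `demo_accept_passive()`.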




2023-01-26 17:17:56

by srinivas pandruvada

Subject: Re: [PATCH v1 0/3] thermal: intel: int340x: Use generic trip points table

Hi Rafael,

On Thu, 2023-01-26 at 14:13 +0100, Rafael J. Wysocki wrote:
> On Thursday, January 26, 2023 1:02:59 AM CET srinivas pandruvada
> wrote:
> > Hi Rafael,
> >
> >
>

[...]

> I've added the appended patch to the thermal-intel-test branch.  Can
> you please
> check if it makes that difference in behavior go away?
I synced the tree again and your patch in thermal-intel-test fixes the
issue.

Thanks,
Srinivas
>
> ---
> From: Rafael J. Wysocki <[email protected]>
> Subject: [PATCH] thermal: ACPI: Initialize trips if temperature is
> out of range
>
> In some cases it is still useful to register a trip point if the
> temperature returned by the corresponding ACPI thermal object (for
> example, _HOT) is invalid to start with, because the same ACPI
> thermal object may start to return a valid temperature after a
> system configuration change (for example, from an AC power source
> to battery and vice versa).
>
> For this reason, if the ACPI thermal object evaluated by
> thermal_acpi_trip_init() successfully returns a temperature value
> that
> is out of the range of values taken into account, initialize the trip
> point using THERMAL_TEMP_INVALID as the temperature value instead of
> returning an error to allow the user of the trip point to decide what
> to do with it.
>
> Also update pch_wpt_add_acpi_psv_trip() to reject trip points with
> invalid temperature values.
>
> Fixes: 7a0e39748861 ("thermal: ACPI: Add ACPI trip point routines")
> Reported-by: Srinivas Pandruvada
> <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>  drivers/thermal/intel/intel_pch_thermal.c |    2 +-
>  drivers/thermal/thermal_acpi.c            |    7 ++++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_acpi.c
> +++ linux-pm/drivers/thermal/thermal_acpi.c
> @@ -64,13 +64,14 @@ static int thermal_acpi_trip_init(struct
>                 return -ENODATA;
>         }
>  
> -       if (temp < TEMP_MIN_DECIK || temp >= TEMP_MAX_DECIK) {
> +       if (temp >= TEMP_MIN_DECIK && temp <= TEMP_MAX_DECIK) {
> +               trip->temperature =
> deci_kelvin_to_millicelsius(temp);
> +       } else {
>                 acpi_handle_debug(adev->handle, "%s result %llu out
> of range\n",
>                                   obj_name, temp);
> -               return -ENODATA;
> +               trip->temperature = THERMAL_TEMP_INVALID;
>         }
>  
> -       trip->temperature = deci_kelvin_to_millicelsius(temp);
>         trip->hysteresis = 0;
>         trip->type = type;
>  
> Index: linux-pm/drivers/thermal/intel/intel_pch_thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/intel/intel_pch_thermal.c
> +++ linux-pm/drivers/thermal/intel/intel_pch_thermal.c
> @@ -107,7 +107,7 @@ static void pch_wpt_add_acpi_psv_trip(st
>                 return;
>  
>         ret = thermal_acpi_trip_passive(adev, &ptd-
> >trips[*nr_trips]);
> -       if (ret)
> +       if (ret || ptd->trips[*nr_trips].temperature <= 0)
>                 return;
>  
>         ++(*nr_trips);
>
>
>


2023-01-26 17:42:43

by Rafael J. Wysocki

Subject: Re: [PATCH v1 0/3] thermal: intel: int340x: Use generic trip points table

Hi Srinivas,

On Thu, Jan 26, 2023 at 6:17 PM srinivas pandruvada
<[email protected]> wrote:
>
> Hi Rafael,
>
> On Thu, 2023-01-26 at 14:13 +0100, Rafael J. Wysocki wrote:
> > On Thursday, January 26, 2023 1:02:59 AM CET srinivas pandruvada
> > wrote:
> > > Hi Rafael,
> > >
> > >
> >
>
> [...]
>
> > I've added the appended patch to the thermal-intel-test branch. Can
> > you please
> > check if it makes that difference in behavior go away?
> I synced the tree again and your patch in thermal-intel-test fixes the
> issue.

Thanks a lot for testing and the confirmation!

In the meantime, I've merged the thermal-intel-test branch into the
bleeding-edge branch and if 0-day reports success with building it,
I'll move the patches to linux-next.

Cheers!