2024-05-08 06:41:49

by Ricardo Neri

Subject: [PATCH v2 0/4] thermal: intel: hfi: Update thermal netlink parameters

Hi,

This is v2 of a series previously known as "Add debugfs files for tuning"
[1].

Changes since v1:

Rui and Rafael pointed out various problems with using debugfs for tuning
how HFI uses thermal netlink. Instead, in this version I attempt to fix
the issues that motivated v1 (see the cover letter of v1 for details). I
update the two parameters that control how HFI uses thermal netlink: the
delay between an HFI interrupt and the thermal netlink event as well as the
size of the message payload.

Added Acked-by: tag from Rui on patch 1 (thanks!).

These patches apply cleanly on top of the testing branch of Rafael's
linux-pm.

Thanks and BR,
Ricardo

[1]. https://lore.kernel.org/linux-pm/[email protected]/

Ricardo Neri (4):
thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms
thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
thermal: intel: hfi: Increase the number of CPU capabilities per
netlink event

drivers/thermal/intel/intel_hfi.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

--
2.34.1



2024-05-08 06:41:52

by Ricardo Neri

Subject: [PATCH v2 3/4] thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT

When processing a hardware update, HFI generates as many thermal netlink
events as needed to relay all the updated CPU capabilities to user space.
The constant HFI_MAX_THERM_NOTIFY_COUNT is the number of CPU capabilities
carried by each of those events.
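
For illustration only, the chunking described above can be modeled in a
few lines of standalone C. This is a sketch, not the driver code:
CAPS_PER_EVENT and events_needed() are hypothetical names that stand in
for the constant and for the loop in update_capabilities(), and the
handling of the final partial chunk is an assumption based on the
description in this message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-event capability limit. */
#define CAPS_PER_EVENT 16

/*
 * Return how many events are needed to relay cpu_count capabilities
 * when each event carries at most CAPS_PER_EVENT entries.
 */
size_t events_needed(size_t cpu_count)
{
	size_t i, events = 0;

	/* One event per complete chunk of CAPS_PER_EVENT capabilities. */
	for (i = 0; i + CAPS_PER_EVENT <= cpu_count; i += CAPS_PER_EVENT)
		events++;

	/* One more event for the remaining partial chunk, if any. */
	if (cpu_count - i > 0)
		events++;

	return events;
}
```

With a limit of 16, a system with 40 CPUs would need two full events
plus one event carrying the remaining 8 capabilities.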

Give this constant a more descriptive name.

Signed-off-by: Ricardo Neri <[email protected]>
---
Cc: Len Brown <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes since v1:
* None
---
drivers/thermal/intel/intel_hfi.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index d82b8788b0f8..c6658f8c5cca 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -167,7 +167,7 @@ static DEFINE_MUTEX(hfi_instance_lock);

static struct workqueue_struct *hfi_updates_wq;
#define HFI_UPDATE_DELAY_MS 100
-#define HFI_MAX_THERM_NOTIFY_COUNT 16
+#define HFI_THERMNL_CAPS_PER_EVENT 16

static void get_hfi_caps(struct hfi_instance *hfi_instance,
struct thermal_genl_cpu_caps *cpu_caps)
@@ -218,14 +218,14 @@ static void update_capabilities(struct hfi_instance *hfi_instance)

get_hfi_caps(hfi_instance, cpu_caps);

- if (cpu_count < HFI_MAX_THERM_NOTIFY_COUNT)
+ if (cpu_count < HFI_THERMNL_CAPS_PER_EVENT)
goto last_cmd;

- /* Process complete chunks of HFI_MAX_THERM_NOTIFY_COUNT capabilities. */
+ /* Process complete chunks of HFI_THERMNL_CAPS_PER_EVENT capabilities. */
for (i = 0;
- (i + HFI_MAX_THERM_NOTIFY_COUNT) <= cpu_count;
- i += HFI_MAX_THERM_NOTIFY_COUNT)
- thermal_genl_cpu_capability_event(HFI_MAX_THERM_NOTIFY_COUNT,
+ (i + HFI_THERMNL_CAPS_PER_EVENT) <= cpu_count;
+ i += HFI_THERMNL_CAPS_PER_EVENT)
+ thermal_genl_cpu_capability_event(HFI_THERMNL_CAPS_PER_EVENT,
&cpu_caps[i]);

cpu_count = cpu_count - i;
--
2.34.1


2024-05-08 07:42:29

by Ricardo Neri

Subject: [PATCH v2 1/4] thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL

The name of the constant HFI_UPDATE_INTERVAL is misleading. It is not a
periodic interval at which HFI updates are processed. It is the delay in
the processing of an HFI update after the arrival of an HFI interrupt.
Rename it to HFI_UPDATE_DELAY to reflect this.

Acked-by: Zhang Rui <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
Cc: Len Brown <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes since v1:
* None
---
drivers/thermal/intel/intel_hfi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index fbc7f0cd83d7..e2b82d71ab6b 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -166,7 +166,7 @@ static struct hfi_features hfi_features;
static DEFINE_MUTEX(hfi_instance_lock);

static struct workqueue_struct *hfi_updates_wq;
-#define HFI_UPDATE_INTERVAL HZ
+#define HFI_UPDATE_DELAY HZ
#define HFI_MAX_THERM_NOTIFY_COUNT 16

static void get_hfi_caps(struct hfi_instance *hfi_instance,
@@ -322,7 +322,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
raw_spin_unlock(&hfi_instance->event_lock);

queue_delayed_work(hfi_updates_wq, &hfi_instance->update_work,
- HFI_UPDATE_INTERVAL);
+ HFI_UPDATE_DELAY);
}

static void init_hfi_cpu_index(struct hfi_cpu_info *info)
--
2.34.1


2024-05-08 09:05:49

by Ricardo Neri

Subject: [PATCH v2 2/4] thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms

The delay between an HFI interrupt and its corresponding thermal netlink
event has so far been hard-coded to CONFIG_HZ jiffies (1 second). This
delay is too long for hardware that generates updates every few tens of
milliseconds.

The HFI driver uses a delayed workqueue to send thermal netlink events. No
subsequent events will be sent if there is pending work.

As a result, much of the information from consecutive hardware updates
will be lost if the workqueue delay is too long, and user space entities
may act on obsolete data. If the delay is too short, the resulting burst
of events may overwhelm listeners.
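
To make the trade-off concrete, here is a small standalone model of the
behavior described above. It is only a sketch under stated assumptions:
count_events() is a hypothetical helper, interrupt arrival times are
given in ascending order in milliseconds, and an interrupt that arrives
while work is pending is assumed to be folded into that pending work.

```c
#include <stddef.h>

/*
 * Count the events that would be sent for a sorted list of interrupt
 * times (irq_ms) when each event is sent delay_ms after the interrupt
 * that queued it and no new work is queued while work is pending.
 */
size_t count_events(const unsigned int *irq_ms, size_t n,
		    unsigned int delay_ms)
{
	size_t i, events = 0;
	int pending = 0;
	unsigned int fire_at = 0;

	for (i = 0; i < n; i++) {
		/* Pending work whose delay has elapsed has already run. */
		if (pending && irq_ms[i] >= fire_at)
			pending = 0;

		/* Queue new work only if nothing is pending. */
		if (!pending) {
			pending = 1;
			fire_at = irq_ms[i] + delay_ms;
			events++;
		}
	}

	return events;
}
```

In this model, 20 interrupts spaced 50ms apart produce a single event
with a 1000ms delay and 10 events with a 100ms delay.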

Set the delay to 100ms to strike a balance between too many and too few
events. Use milliseconds instead of jiffies to improve readability.
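
As a rough illustration of why a value in milliseconds is easier to read
than one in jiffies, the conversion can be sketched as follows.
ms_to_ticks() is a hypothetical simplification that rounds up; it is not
the kernel's msecs_to_jiffies() implementation, and the tick rates used
are only example values of CONFIG_HZ.

```c
/*
 * Convert a delay in milliseconds to timer ticks for a given tick rate
 * (hz, in ticks per second), rounding up so that the delay is never
 * shorter than requested.
 */
unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}
```

The same 100ms delay corresponds to 10 ticks at 100 ticks per second, 25
ticks at 250, and 100 ticks at 1000, which is why the constant is kept
in milliseconds and converted at the call site.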

Signed-off-by: Ricardo Neri <[email protected]>
---
Cc: Len Brown <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes since v1:
* Dropped the debugfs interface. Instead, updated the delay from 1s to
100ms.
---
drivers/thermal/intel/intel_hfi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index e2b82d71ab6b..d82b8788b0f8 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -166,7 +166,7 @@ static struct hfi_features hfi_features;
static DEFINE_MUTEX(hfi_instance_lock);

static struct workqueue_struct *hfi_updates_wq;
-#define HFI_UPDATE_DELAY HZ
+#define HFI_UPDATE_DELAY_MS 100
#define HFI_MAX_THERM_NOTIFY_COUNT 16

static void get_hfi_caps(struct hfi_instance *hfi_instance,
@@ -322,7 +322,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
raw_spin_unlock(&hfi_instance->event_lock);

queue_delayed_work(hfi_updates_wq, &hfi_instance->update_work,
- HFI_UPDATE_DELAY);
+ msecs_to_jiffies(HFI_UPDATE_DELAY_MS));
}

static void init_hfi_cpu_index(struct hfi_cpu_info *info)
--
2.34.1


2024-05-08 13:59:39

by Rafael J. Wysocki

Subject: Re: [PATCH v2 0/4] thermal: intel: hfi: Update thermal netlink parameters

On Wed, May 8, 2024 at 5:37 AM Ricardo Neri
<[email protected]> wrote:
>
> Hi,
>
> This is v2 of a series previously known as "Add debugfs files for tuning"
> [1].
>
> Changes since v1:
>
> Rui and Rafael pointed out various problems with using debugfs for tuning
> how HFI uses thermal netlink. Instead, in this version I attempt to fix
> the issues that motivated v1 (see the cover letter of v1 for details). I
> update the two parameters that control how HFI uses thermal netlink: the
> delay between an HFI interrupt and the thermal netlink event as well as the
> size of the message payload.
>
> Added Acked-by: tag from Rui on patch 1 (thanks!).
>
> These patches apply cleanly on top of the testing branch of Rafael's
> linux-pm.
>
> Thanks and BR,
> Ricardo
>
> [1]. https://lore.kernel.org/linux-pm/[email protected]/
>
> Ricardo Neri (4):
> thermal: intel: hfi: Rename HFI_UPDATE_INTERVAL
> thermal: intel: hfi: Shorten the thermal netlink event delay to 100ms
> thermal: intel: hfi: Rename HFI_MAX_THERM_NOTIFY_COUNT
> thermal: intel: hfi: Increase the number of CPU capabilities per
> netlink event
>
> drivers/thermal/intel/intel_hfi.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> --

Whole series applied (with Rui's ACKs) as 6.10 material, thanks!