2023-10-09 19:06:21

by Srinivas Pandruvada

Subject: [PATCH v2 0/7] thermal: processor_thermal: Power floor status

Support power floor notifications for Meteor Lake processors.

The first three changes prepare for power floor status, and the others
add support for power floor.

v2
- Use common define for offset
- Fix potential race during clearing of interrupt with workload hint
- Address comments on v1 of
thermal: int340x: processor_thermal: Support power floor notifications

Srinivas Pandruvada (7):
thermal: int340x: processor_thermal: Move interrupt status MMIO offset
to common header
thermal: int340x: processor_thermal: Common function to clear SOC
interrupt
thermal: int340x: processor_thermal: Set feature mask before
proc_thermal_add
thermal: int340x: processor_thermal: Support power floor notifications
thermal: int340x: processor_thermal: Handle power floor interrupts
thermal: int340x: processor_thermal: Enable power floor support
selftests/thermal/intel: Add test to read power floor status

.../driver-api/thermal/intel_dptf.rst | 8 ++
.../thermal/intel/int340x_thermal/Makefile | 1 +
.../processor_thermal_device.c | 68 +++++++++-
.../processor_thermal_device.h | 11 ++
.../processor_thermal_device_pci.c | 43 ++++--
.../processor_thermal_power_floor.c | 126 ++++++++++++++++++
.../processor_thermal_wt_hint.c | 3 -
tools/testing/selftests/Makefile | 1 +
.../thermal/intel/power_floor/Makefile | 12 ++
.../intel/power_floor/power_floor_test.c | 108 +++++++++++++++
10 files changed, 365 insertions(+), 16 deletions(-)
create mode 100644 drivers/thermal/intel/int340x_thermal/processor_thermal_power_floor.c
create mode 100644 tools/testing/selftests/thermal/intel/power_floor/Makefile
create mode 100644 tools/testing/selftests/thermal/intel/power_floor/power_floor_test.c

--
2.40.1


2023-10-09 19:06:33

by Srinivas Pandruvada

Subject: [PATCH v2 2/7] thermal: int340x: processor_thermal: Common function to clear SOC interrupt

The SOC interrupt status register contains multiple interrupt sources
(workload hint interrupt and power floor interrupt). It is not possible
to clear an individual interrupt source with read-modify-write, as that
may clear a new interrupt raised by the firmware after the read
operation. It is also not possible to set the interrupt status bit to 1
for the other interrupt source, which is not the one being cleared.

Hence create a common function that clears all status bits at once.
Call this function after processing all interrupt sources.

Signed-off-by: Srinivas Pandruvada <[email protected]>
---
v2:
- New patch in the series

.../int340x_thermal/processor_thermal_device.h | 1 +
.../int340x_thermal/processor_thermal_device_pci.c | 13 +++++++++++++
.../int340x_thermal/processor_thermal_wt_hint.c | 2 --
3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
index 8ed6e8e94c8a..f9a381b3e55c 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
@@ -92,6 +92,7 @@ void proc_thermal_wt_req_remove(struct pci_dev *pdev);
#define MBOX_DATA_BIT_VALID 31

#define SOC_WT_RES_INT_STATUS_OFFSET 0x5B18
+#define SOC_WT_RES_INT_STATUS_MASK GENMASK_ULL(3, 2)

int processor_thermal_send_mbox_read_cmd(struct pci_dev *pdev, u16 id, u64 *resp);
int processor_thermal_send_mbox_write_cmd(struct pci_dev *pdev, u16 id, u32 data);
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
index 3c5ced79ead0..d353a190ce44 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
@@ -122,11 +122,24 @@ static void pkg_thermal_schedule_work(struct delayed_work *work)
schedule_delayed_work(work, ms);
}

+static void proc_thermal_clear_soc_int_status(struct proc_thermal_device *proc_priv)
+{
+ u64 status;
+
+ if (!(proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_WT_HINT))
+ return;
+
+ status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+ writeq(status & ~SOC_WT_RES_INT_STATUS_MASK,
+ proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+}
+
static irqreturn_t proc_thermal_irq_thread_handler(int irq, void *devid)
{
struct proc_thermal_pci *pci_info = devid;

proc_thermal_wt_intr_callback(pci_info->pdev, pci_info->proc_priv);
+ proc_thermal_clear_soc_int_status(pci_info->proc_priv);

return IRQ_HANDLED;
}
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c
index c08838eb10c8..9d5e4c169d1b 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c
@@ -215,8 +215,6 @@ void proc_thermal_wt_intr_callback(struct pci_dev *pdev, struct proc_thermal_dev
if (!(status & SOC_WT_PREDICTION_INT_ACTIVE))
return;

- writeq(status & ~SOC_WT_PREDICTION_INT_ACTIVE,
- proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
sysfs_notify(&pdev->dev.kobj, "workload_hint", "workload_type_index");
}
EXPORT_SYMBOL_NS_GPL(proc_thermal_wt_intr_callback, INT340X_THERMAL);
--
2.40.1
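
The clearing scheme described in this patch can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: a plain
variable stands in for the MMIO register, the workload hint interrupt is
assumed to be bit 2 (the patch only shows the combined mask for bits 3:2), and
fake_reg, model_clear_soc_int_status() and the MODEL_* macros are hypothetical
names that do not exist in the driver.

```c
#include <stdint.h>

/* Bits 3:2 of the status word, mirroring GENMASK_ULL(3, 2) in the patch. */
#define MODEL_INT_STATUS_MASK	(UINT64_C(0x3) << 2)
#define MODEL_WT_INT_ACTIVE	(UINT64_C(1) << 2)	/* assumed bit position */
#define MODEL_PF_INT_ACTIVE	(UINT64_C(1) << 3)	/* power floor, bit 3 */

/* Hypothetical stand-in for the 64-bit status register at offset 0x5B18. */
static uint64_t fake_reg;

/*
 * Clear both interrupt sources with a single write and leave every other
 * bit of the word exactly as it was read.
 */
static void model_clear_soc_int_status(void)
{
	uint64_t status = fake_reg;			/* readq() in the driver */

	fake_reg = status & ~MODEL_INT_STATUS_MASK;	/* writeq() in the driver */
}
```

In this model, calling model_clear_soc_int_status() once after all sources
have been handled drops bits 2 and 3 together, so neither handler has to write
back a status bit that belongs to the other source.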

2023-10-09 19:06:43

by Srinivas Pandruvada

Subject: [PATCH v2 1/7] thermal: int340x: processor_thermal: Move interrupt status MMIO offset to common header

Move the define SOC_WT_RES_INT_STATUS_OFFSET to processor_thermal_device.h.
This way it can be reused in other modules.

Signed-off-by: Srinivas Pandruvada <[email protected]>
---
v2:
- New patch in the series

.../thermal/intel/int340x_thermal/processor_thermal_device.h | 2 ++
.../thermal/intel/int340x_thermal/processor_thermal_wt_hint.c | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
index dd025c8c2bac..8ed6e8e94c8a 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
@@ -91,6 +91,8 @@ void proc_thermal_wt_req_remove(struct pci_dev *pdev);
#define MBOX_DATA_BIT_AC_DC 30
#define MBOX_DATA_BIT_VALID 31

+#define SOC_WT_RES_INT_STATUS_OFFSET 0x5B18
+
int processor_thermal_send_mbox_read_cmd(struct pci_dev *pdev, u16 id, u64 *resp);
int processor_thermal_send_mbox_write_cmd(struct pci_dev *pdev, u16 id, u32 data);
int processor_thermal_mbox_interrupt_config(struct pci_dev *pdev, bool enable, int enable_bit,
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c
index fabd8a363abb..c08838eb10c8 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_wt_hint.c
@@ -32,7 +32,6 @@
#include <linux/pci.h>
#include "processor_thermal_device.h"

-#define SOC_WT_RES_INT_STATUS_OFFSET 0x5B18
#define SOC_WT GENMASK_ULL(47, 40)

#define SOC_WT_PREDICTION_INT_ENABLE_BIT 23
--
2.40.1

2023-10-09 19:06:53

by Srinivas Pandruvada

Subject: [PATCH v2 7/7] selftests/thermal/intel: Add test to read power floor status

Some SoCs have firmware support for a notification when the system can't
lower the power limit to a value requested from user space via RAPL
constraints.

This test program waits for a power floor notification and prints the
power floor status. It can be used to test this feature and can also
serve as a reference for other user space programs.

Signed-off-by: Srinivas Pandruvada <[email protected]>
---
v2:
- No change

tools/testing/selftests/Makefile | 1 +
.../thermal/intel/power_floor/Makefile | 12 ++
.../intel/power_floor/power_floor_test.c | 108 ++++++++++++++++++
3 files changed, 121 insertions(+)
create mode 100644 tools/testing/selftests/thermal/intel/power_floor/Makefile
create mode 100644 tools/testing/selftests/thermal/intel/power_floor/power_floor_test.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 8d9b2341b79a..3b2061d1c1a5 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -86,6 +86,7 @@ TARGETS += syscall_user_dispatch
TARGETS += sysctl
TARGETS += tc-testing
TARGETS += tdx
+TARGETS += thermal/intel/power_floor
TARGETS += thermal/intel/workload_hint
TARGETS += timens
ifneq (1, $(quicktest))
diff --git a/tools/testing/selftests/thermal/intel/power_floor/Makefile b/tools/testing/selftests/thermal/intel/power_floor/Makefile
new file mode 100644
index 000000000000..9b88e57dbba5
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/power_floor/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+ifndef CROSS_COMPILE
+uname_M := $(shell uname -m 2>/dev/null || echo not)
+ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
+
+ifeq ($(ARCH),x86)
+TEST_GEN_PROGS := power_floor_test
+
+include ../../../lib.mk
+
+endif
+endif
diff --git a/tools/testing/selftests/thermal/intel/power_floor/power_floor_test.c b/tools/testing/selftests/thermal/intel/power_floor/power_floor_test.c
new file mode 100644
index 000000000000..0326b39a11b9
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/power_floor/power_floor_test.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <signal.h>
+
+#define POWER_FLOOR_ENABLE_ATTRIBUTE "/sys/bus/pci/devices/0000:00:04.0/power_limits/power_floor_enable"
+#define POWER_FLOOR_STATUS_ATTRIBUTE "/sys/bus/pci/devices/0000:00:04.0/power_limits/power_floor_status"
+
+void power_floor_exit(int signum)
+{
+ int fd;
+
+ /* Disable feature via sysfs knob */
+
+ fd = open(POWER_FLOOR_ENABLE_ATTRIBUTE, O_RDWR);
+ if (fd < 0) {
+ perror("Unable to open power floor enable file");
+ exit(1);
+ }
+
+ if (write(fd, "0\n", 2) < 0) {
+ perror("Can't disable power floor notifications");
+ exit(1);
+ }
+
+ printf("Disabled power floor notifications\n");
+
+ close(fd);
+}
+
+int main(int argc, char **argv)
+{
+ struct pollfd ufd;
+ char status_str[3];
+ int fd, ret;
+
+ if (signal(SIGINT, power_floor_exit) == SIG_IGN)
+ signal(SIGINT, SIG_IGN);
+ if (signal(SIGHUP, power_floor_exit) == SIG_IGN)
+ signal(SIGHUP, SIG_IGN);
+ if (signal(SIGTERM, power_floor_exit) == SIG_IGN)
+ signal(SIGTERM, SIG_IGN);
+
+ /* Enable feature via sysfs knob */
+ fd = open(POWER_FLOOR_ENABLE_ATTRIBUTE, O_RDWR);
+ if (fd < 0) {
+ perror("Unable to open power floor enable file");
+ exit(1);
+ }
+
+ if (write(fd, "1\n", 2) < 0) {
+ perror("Can't enable power floor notifications");
+ exit(1);
+ }
+
+ close(fd);
+
+ printf("Enabled power floor notifications\n");
+
+ while (1) {
+ fd = open(POWER_FLOOR_STATUS_ATTRIBUTE, O_RDONLY);
+ if (fd < 0) {
+ perror("Unable to open power floor status file");
+ exit(1);
+ }
+
+ if ((lseek(fd, 0L, SEEK_SET)) < 0) {
+ fprintf(stderr, "Failed to set pointer to beginning\n");
+ exit(1);
+ }
+
+ if (read(fd, status_str, sizeof(status_str)) < 0) {
+ fprintf(stderr, "Failed to read from:%s\n",
+ POWER_FLOOR_STATUS_ATTRIBUTE);
+ exit(1);
+ }
+
+ ufd.fd = fd;
+ ufd.events = POLLPRI;
+
+ ret = poll(&ufd, 1, -1);
+ if (ret < 0) {
+ perror("poll error");
+ exit(1);
+ } else if (ret == 0) {
+ printf("Poll Timeout\n");
+ } else {
+ if ((lseek(fd, 0L, SEEK_SET)) < 0) {
+ fprintf(stderr, "Failed to set pointer to beginning\n");
+ exit(1);
+ }
+
+ ret = read(fd, status_str, sizeof(status_str));
+ if (ret < 0)
+ exit(1);
+ printf("power floor status: %.*s\n", ret, status_str);
+ }
+
+ close(fd);
+ }
+}
--
2.40.1

2023-10-09 19:07:09

by Srinivas Pandruvada

Subject: [PATCH v2 4/7] thermal: int340x: processor_thermal: Support power floor notifications

When the hardware reduces the power to the minimum possible, the power
floor is notified via an interrupt. This can happen when user space
requests a power limit via the powercap RAPL interface that forces the
system to enter the lowest power state. This power floor indication can
be used as a hint to resort to methods of reducing power other than the
RAPL power limit.

Before the power floor status can be read or notifications can be
received from the firmware, the feature needs to be enabled via a mailbox
command. The actual power floor status is read from bit 39 of MMIO offset
0x5B18 of the processor thermal PCI device.

To show the current power floor status and get notification
on a sysfs attribute, add additional attributes to
/sys/bus/pci/devices/0000\:00\:04.0/power_limits/

power_floor_enable : This attribute is present when a SoC supports
power floor notification. This attribute enables or disables
power floor notifications.

power_floor_status : This attribute is present when a SoC supports
power floor notification. When enabled, this shows the current
status of power floor.

The power floor implementation provides interfaces, which are called
from the sysfs callbacks to enable/disable and read power floor
status. It also provides two additional interfaces: one to check whether
the current processor thermal device interrupt is for power floor status,
and one to send a notification to user space.

Signed-off-by: Srinivas Pandruvada <[email protected]>
---
v2:
- Use kernel doc as suggested by Rafael
- Code changes as suggested by Rafael
- Use common offset from header file
- Removed clearing of interrupt

.../driver-api/thermal/intel_dptf.rst | 8 ++
.../thermal/intel/int340x_thermal/Makefile | 1 +
.../processor_thermal_device.c | 68 +++++++++-
.../processor_thermal_device.h | 8 ++
.../processor_thermal_power_floor.c | 126 ++++++++++++++++++
5 files changed, 210 insertions(+), 1 deletion(-)
create mode 100644 drivers/thermal/intel/int340x_thermal/processor_thermal_power_floor.c

diff --git a/Documentation/driver-api/thermal/intel_dptf.rst b/Documentation/driver-api/thermal/intel_dptf.rst
index 2d11e74ac665..55d90eafd94b 100644
--- a/Documentation/driver-api/thermal/intel_dptf.rst
+++ b/Documentation/driver-api/thermal/intel_dptf.rst
@@ -164,6 +164,14 @@ ABI.
``power_limit_1_tmax_us`` (RO)
Maximum powercap sysfs constraint_1_time_window_us for Intel RAPL

+``power_floor_status`` (RO)
+ When set to 1, hardware is not able to satisfy the requested power limit
+ via Intel RAPL.
+
+``power_floor_enable`` (RW)
+ When set to 1, enable reading and notification of power floor status.
+ Notifications are issued for changes in the power_floor_status attribute.
+
:file:`/sys/bus/pci/devices/0000\:00\:04.0/`

``tcc_offset_degree_celsius`` (RW)
diff --git a/drivers/thermal/intel/int340x_thermal/Makefile b/drivers/thermal/intel/int340x_thermal/Makefile
index f33a3ad3bef3..fe3f43924525 100644
--- a/drivers/thermal/intel/int340x_thermal/Makefile
+++ b/drivers/thermal/intel/int340x_thermal/Makefile
@@ -12,5 +12,6 @@ obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_rfim.o
obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_mbox.o
obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_wt_req.o
obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_wt_hint.o
+obj-$(CONFIG_INT340X_THERMAL) += processor_thermal_power_floor.o
obj-$(CONFIG_INT3406_THERMAL) += int3406_thermal.o
obj-$(CONFIG_ACPI_THERMAL_REL) += acpi_thermal_rel.o
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index 29ed7d0f7022..649f67fdf345 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -26,6 +26,48 @@ static ssize_t power_limit_##index##_##suffix##_show(struct device *dev, \
(unsigned long)proc_dev->power_limits[index].suffix * 1000); \
}

+static ssize_t power_floor_status_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct proc_thermal_device *proc_dev = dev_get_drvdata(dev);
+ int ret;
+
+ ret = proc_thermal_read_power_floor_status(proc_dev);
+
+ return sysfs_emit(buf, "%d\n", ret);
+}
+
+static ssize_t power_floor_enable_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct proc_thermal_device *proc_dev = dev_get_drvdata(dev);
+ bool ret;
+
+ ret = proc_thermal_power_floor_get_state(proc_dev);
+
+ return sysfs_emit(buf, "%d\n", ret);
+}
+
+static ssize_t power_floor_enable_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct proc_thermal_device *proc_dev = dev_get_drvdata(dev);
+ u8 state;
+ int ret;
+
+ if (kstrtou8(buf, 0, &state))
+ return -EINVAL;
+
+ ret = proc_thermal_power_floor_set_state(proc_dev, !!state);
+ if (ret)
+ return ret;
+
+ return count;
+}
+
POWER_LIMIT_SHOW(0, min_uw)
POWER_LIMIT_SHOW(0, max_uw)
POWER_LIMIT_SHOW(0, step_uw)
@@ -50,6 +92,9 @@ static DEVICE_ATTR_RO(power_limit_1_step_uw);
static DEVICE_ATTR_RO(power_limit_1_tmin_us);
static DEVICE_ATTR_RO(power_limit_1_tmax_us);

+static DEVICE_ATTR_RO(power_floor_status);
+static DEVICE_ATTR_RW(power_floor_enable);
+
static struct attribute *power_limit_attrs[] = {
&dev_attr_power_limit_0_min_uw.attr,
&dev_attr_power_limit_1_min_uw.attr,
@@ -61,12 +106,30 @@ static struct attribute *power_limit_attrs[] = {
&dev_attr_power_limit_1_tmin_us.attr,
&dev_attr_power_limit_0_tmax_us.attr,
&dev_attr_power_limit_1_tmax_us.attr,
+ &dev_attr_power_floor_status.attr,
+ &dev_attr_power_floor_enable.attr,
NULL
};

+static umode_t power_limit_attr_visible(struct kobject *kobj, struct attribute *attr, int unused)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct proc_thermal_device *proc_dev;
+
+ if (attr != &dev_attr_power_floor_status.attr && attr != &dev_attr_power_floor_enable.attr)
+ return attr->mode;
+
+ proc_dev = dev_get_drvdata(dev);
+ if (!proc_dev || !(proc_dev->mmio_feature_mask & PROC_THERMAL_FEATURE_POWER_FLOOR))
+ return 0;
+
+ return attr->mode;
+}
+
static const struct attribute_group power_limit_attribute_group = {
.attrs = power_limit_attrs,
- .name = "power_limits"
+ .name = "power_limits",
+ .is_visible = power_limit_attr_visible,
};

static ssize_t tcc_offset_degree_celsius_show(struct device *dev,
@@ -380,6 +443,9 @@ void proc_thermal_mmio_remove(struct pci_dev *pdev, struct proc_thermal_device *
proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_DVFS)
proc_thermal_rfim_remove(pdev);

+ if (proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_POWER_FLOOR)
+ proc_thermal_power_floor_set_state(proc_priv, false);
+
if (proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_WT_REQ)
proc_thermal_wt_req_remove(pdev);
else if (proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_WT_HINT)
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
index f9a381b3e55c..95c6013a33fb 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.h
@@ -63,6 +63,7 @@ struct rapl_mmio_regs {
#define PROC_THERMAL_FEATURE_WT_REQ 0x08
#define PROC_THERMAL_FEATURE_DLVR 0x10
#define PROC_THERMAL_FEATURE_WT_HINT 0x20
+#define PROC_THERMAL_FEATURE_POWER_FLOOR 0x40

#if IS_ENABLED(CONFIG_PROC_THERMAL_MMIO_RAPL)
int proc_thermal_rapl_add(struct pci_dev *pdev, struct proc_thermal_device *proc_priv);
@@ -94,6 +95,13 @@ void proc_thermal_wt_req_remove(struct pci_dev *pdev);
#define SOC_WT_RES_INT_STATUS_OFFSET 0x5B18
#define SOC_WT_RES_INT_STATUS_MASK GENMASK_ULL(3, 2)

+int proc_thermal_read_power_floor_status(struct proc_thermal_device *proc_priv);
+int proc_thermal_power_floor_set_state(struct proc_thermal_device *proc_priv, bool enable);
+bool proc_thermal_power_floor_get_state(struct proc_thermal_device *proc_priv);
+void proc_thermal_power_floor_intr_callback(struct pci_dev *pdev,
+ struct proc_thermal_device *proc_priv);
+bool proc_thermal_check_power_floor_intr(struct proc_thermal_device *proc_priv);
+
int processor_thermal_send_mbox_read_cmd(struct pci_dev *pdev, u16 id, u64 *resp);
int processor_thermal_send_mbox_write_cmd(struct pci_dev *pdev, u16 id, u32 data);
int processor_thermal_mbox_interrupt_config(struct pci_dev *pdev, bool enable, int enable_bit,
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_power_floor.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_power_floor.c
new file mode 100644
index 000000000000..a1a108407f0f
--- /dev/null
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_power_floor.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Processor thermal device module for registering and processing
+ * power floor. When the hardware reduces the power to the minimum
+ * possible, the power floor is notified via an interrupt.
+ *
+ * Operation:
+ * When user space enables power floor reporting:
+ * - Use mailbox to:
+ * Enable processor thermal device interrupt
+ *
+ * - Current status of power floor is read from offset 0x5B18
+ * bit 39.
+ *
+ * Two interface functions are provided to call when there is a
+ * thermal device interrupt:
+ * - proc_thermal_check_power_floor_intr():
+ * Check if the interrupt is for change in power floor.
+ * Called from interrupt context.
+ *
+ * - proc_thermal_power_floor_intr_callback():
+ * Callback for interrupt processing in thread context. This involves
+ * sending notification to user space that there is a change in the
+ * power floor status.
+ *
+ * Copyright (c) 2023, Intel Corporation.
+ */
+
+#include <linux/pci.h>
+#include "processor_thermal_device.h"
+
+#define SOC_POWER_FLOOR_STATUS BIT(39)
+#define SOC_POWER_FLOOR_SHIFT 39
+
+#define SOC_POWER_FLOOR_INT_ENABLE_BIT 31
+#define SOC_POWER_FLOOR_INT_ACTIVE BIT(3)
+
+int proc_thermal_read_power_floor_status(struct proc_thermal_device *proc_priv)
+{
+ u64 status = 0;
+
+ status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+ return (status & SOC_POWER_FLOOR_STATUS) >> SOC_POWER_FLOOR_SHIFT;
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_read_power_floor_status, INT340X_THERMAL);
+
+static bool enable_state;
+static DEFINE_MUTEX(pf_lock);
+
+int proc_thermal_power_floor_set_state(struct proc_thermal_device *proc_priv, bool enable)
+{
+ int ret = 0;
+
+ mutex_lock(&pf_lock);
+ if (enable_state == enable)
+ goto pf_unlock;
+
+ /*
+ * Time window parameter is not applicable to power floor interrupt configuration.
+ * Hence use -1 for time window.
+ */
+ ret = processor_thermal_mbox_interrupt_config(to_pci_dev(proc_priv->dev), enable,
+ SOC_POWER_FLOOR_INT_ENABLE_BIT, -1);
+ if (!ret)
+ enable_state = enable;
+
+pf_unlock:
+ mutex_unlock(&pf_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_power_floor_set_state, INT340X_THERMAL);
+
+bool proc_thermal_power_floor_get_state(struct proc_thermal_device *proc_priv)
+{
+ return enable_state;
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_power_floor_get_state, INT340X_THERMAL);
+
+/**
+ * proc_thermal_check_power_floor_intr() - Check power floor interrupt.
+ * @proc_priv: Processor thermal device instance.
+ *
+ * Callback to check if the interrupt for power floor is active.
+ *
+ * Context: Called from interrupt context.
+ *
+ * Return: true if the power floor interrupt is active, false otherwise.
+ */
+bool proc_thermal_check_power_floor_intr(struct proc_thermal_device *proc_priv)
+{
+ u64 int_status;
+
+ int_status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+ return !!(int_status & SOC_POWER_FLOOR_INT_ACTIVE);
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_check_power_floor_intr, INT340X_THERMAL);
+
+/**
+ * proc_thermal_power_floor_intr_callback() - Process power floor notification
+ * @pdev: PCI device instance
+ * @proc_priv: Processor thermal device instance.
+ *
+ * Check if the power floor interrupt is active. If it is, send a notification
+ * to user space for the attribute "power_floor_status" in the "power_limits"
+ * group, so that user space can read the attribute and take action.
+ *
+ * Context: Called from interrupt thread context.
+ *
+ * Return: None.
+ */
+void proc_thermal_power_floor_intr_callback(struct pci_dev *pdev,
+ struct proc_thermal_device *proc_priv)
+{
+ u64 status;
+
+ status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
+ if (!(status & SOC_POWER_FLOOR_INT_ACTIVE))
+ return;
+
+ sysfs_notify(&pdev->dev.kobj, "power_limits", "power_floor_status");
+}
+EXPORT_SYMBOL_NS_GPL(proc_thermal_power_floor_intr_callback, INT340X_THERMAL);
+
+MODULE_IMPORT_NS(INT340X_THERMAL);
+MODULE_LICENSE("GPL");
--
2.40.1
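
The two bit tests this patch performs on the status word at offset 0x5B18 can
be sketched outside the kernel. This is a minimal illustration, assuming only
the bit positions given in the patch (bit 39 for the status, bit 3 for the
pending interrupt); the model_* functions and MODEL_* macros are hypothetical
names, and the register value is passed in as a plain integer instead of being
read with readq().

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_POWER_FLOOR_SHIFT		39
#define MODEL_POWER_FLOOR_STATUS	(UINT64_C(1) << MODEL_POWER_FLOOR_SHIFT)
#define MODEL_POWER_FLOOR_INT_ACTIVE	(UINT64_C(1) << 3)

/* Return the power floor status bit (bit 39) as 0 or 1. */
static int model_power_floor_status(uint64_t reg)
{
	return (int)((reg & MODEL_POWER_FLOOR_STATUS) >> MODEL_POWER_FLOOR_SHIFT);
}

/* Return true when the power floor interrupt bit (bit 3) is set. */
static bool model_power_floor_intr_active(uint64_t reg)
{
	return (reg & MODEL_POWER_FLOOR_INT_ACTIVE) != 0;
}
```

The two bits are independent in this model: a word with only bit 39 set
reports status 1 with no interrupt pending, and a word with only bit 3 set
reports a pending interrupt with status 0.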

2023-10-09 19:07:13

by Srinivas Pandruvada

Subject: [PATCH v2 5/7] thermal: int340x: processor_thermal: Handle power floor interrupts

On a thermal device interrupt, if the interrupt was generated to report
power floor status, call the callback to pass the notification to user
space.

First call proc_thermal_check_power_floor_intr() to check the interrupt.
If it returns true, wake the IRQ thread, which calls
proc_thermal_power_floor_intr_callback() to notify user space.

Signed-off-by: Srinivas Pandruvada <[email protected]>
---
v2:
- Use common interrupt clearing function to clear interrupt.
Previously it was cleared in the callback function
proc_thermal_power_floor_intr_callback().

.../intel/int340x_thermal/processor_thermal_device_pci.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
index ae70fabffb2d..4c2194f19ed2 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
@@ -126,7 +126,8 @@ static void proc_thermal_clear_soc_int_status(struct proc_thermal_device *proc_p
{
u64 status;

- if (!(proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_WT_HINT))
+ if (!(proc_priv->mmio_feature_mask &
+ (PROC_THERMAL_FEATURE_WT_HINT | PROC_THERMAL_FEATURE_POWER_FLOOR)))
return;

status = readq(proc_priv->mmio_base + SOC_WT_RES_INT_STATUS_OFFSET);
@@ -139,6 +140,7 @@ static irqreturn_t proc_thermal_irq_thread_handler(int irq, void *devid)
struct proc_thermal_pci *pci_info = devid;

proc_thermal_wt_intr_callback(pci_info->pdev, pci_info->proc_priv);
+ proc_thermal_power_floor_intr_callback(pci_info->pdev, pci_info->proc_priv);
proc_thermal_clear_soc_int_status(pci_info->proc_priv);

return IRQ_HANDLED;
@@ -158,6 +160,11 @@ static irqreturn_t proc_thermal_irq_handler(int irq, void *devid)
ret = IRQ_WAKE_THREAD;
}

+ if (proc_priv->mmio_feature_mask & PROC_THERMAL_FEATURE_POWER_FLOOR) {
+ if (proc_thermal_check_power_floor_intr(pci_info->proc_priv))
+ ret = IRQ_WAKE_THREAD;
+ }
+
/*
* Since now there are two sources of interrupts: one from thermal threshold
* and another from workload hint, add a check if there was really a threshold
--
2.40.1
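
The hard-IRQ side of this patch boils down to one decision: wake the IRQ
thread for power floor only when the feature is enabled for the device and the
interrupt bit is set. The following is a simplified sketch of that decision,
not the driver code; model_should_wake_thread() and the MODEL_* macros are
hypothetical names, the feature values mirror the 0x20 and 0x40 defines in the
header, and the other interrupt sources handled by the real handler are left
out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FEATURE_WT_HINT		0x20u
#define MODEL_FEATURE_POWER_FLOOR	0x40u
#define MODEL_PF_INT_ACTIVE		(UINT64_C(1) << 3)

/*
 * Decide whether the threaded handler should run for a power floor event.
 * feature_mask models mmio_feature_mask, status models the status word
 * read from offset 0x5B18.
 */
static bool model_should_wake_thread(unsigned int feature_mask, uint64_t status)
{
	if (!(feature_mask & MODEL_FEATURE_POWER_FLOOR))
		return false;

	return (status & MODEL_PF_INT_ACTIVE) != 0;
}
```

A device that only advertises the workload hint feature never wakes the thread
on account of bit 3 in this model, which matches the feature-mask check added
in the hunk above.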

2023-10-09 19:07:20

by Srinivas Pandruvada

Subject: [PATCH v2 3/7] thermal: int340x: processor_thermal: Set feature mask before proc_thermal_add

The function proc_thermal_add() adds sysfs entries for power limits.
The feature mask of available features is not yet set at that point.
If the feature mask is available, it can be used to selectively
create attributes.

Feature mask is set during call to proc_thermal_mmio_add(). Change the
order of calls so that proc_thermal_mmio_add() is called before
proc_thermal_add().

There is no functional impact with this change.

Signed-off-by: Srinivas Pandruvada <[email protected]>
---
v2:
No change

.../processor_thermal_device_pci.c | 21 +++++++++----------
1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
index d353a190ce44..ae70fabffb2d 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
@@ -266,19 +266,19 @@ static int proc_thermal_pci_probe(struct pci_dev *pdev, const struct pci_device_

INIT_DELAYED_WORK(&pci_info->work, proc_thermal_threshold_work_fn);

- ret = proc_thermal_add(&pdev->dev, proc_priv);
- if (ret) {
- dev_err(&pdev->dev, "error: proc_thermal_add, will continue\n");
- pci_info->no_legacy = 1;
- }
-
proc_priv->priv_data = pci_info;
pci_info->proc_priv = proc_priv;
pci_set_drvdata(pdev, proc_priv);

ret = proc_thermal_mmio_add(pdev, proc_priv, id->driver_data);
if (ret)
- goto err_ret_thermal;
+ return ret;
+
+ ret = proc_thermal_add(&pdev->dev, proc_priv);
+ if (ret) {
+ dev_err(&pdev->dev, "error: proc_thermal_add, will continue\n");
+ pci_info->no_legacy = 1;
+ }

psv_trip.temperature = get_trip_temp(pci_info);

@@ -288,7 +288,7 @@ static int proc_thermal_pci_probe(struct pci_dev *pdev, const struct pci_device_
&tzone_params, 0, 0);
if (IS_ERR(pci_info->tzone)) {
ret = PTR_ERR(pci_info->tzone);
- goto err_ret_mmio;
+ goto err_del_legacy;
}

if (use_msi && (pdev->msi_enabled || pdev->msix_enabled)) {
@@ -325,11 +325,10 @@ static int proc_thermal_pci_probe(struct pci_dev *pdev, const struct pci_device_
pci_free_irq_vectors(pdev);
err_ret_tzone:
thermal_zone_device_unregister(pci_info->tzone);
-err_ret_mmio:
- proc_thermal_mmio_remove(pdev, proc_priv);
-err_ret_thermal:
+err_del_legacy:
if (!pci_info->no_legacy)
proc_thermal_remove(proc_priv);
+ proc_thermal_mmio_remove(pdev, proc_priv);
pci_disable_device(pdev);

return ret;
--
2.40.1

2023-10-09 19:07:21

by Srinivas Pandruvada

Subject: [PATCH v2 6/7] thermal: int340x: processor_thermal: Enable power floor support

Enable power floor feature support for Meteor Lake processors.

Signed-off-by: Srinivas Pandruvada <[email protected]>
---
v2:
- No change

.../intel/int340x_thermal/processor_thermal_device_pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
index 4c2194f19ed2..d7495571dd5d 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
@@ -409,7 +409,7 @@ static const struct pci_device_id proc_thermal_pci_ids[] = {
PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_WT_REQ) },
{ PCI_DEVICE_DATA(INTEL, MTLP_THERMAL, PROC_THERMAL_FEATURE_RAPL |
PROC_THERMAL_FEATURE_FIVR | PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_DLVR |
- PROC_THERMAL_FEATURE_WT_HINT) },
+ PROC_THERMAL_FEATURE_WT_HINT | PROC_THERMAL_FEATURE_POWER_FLOOR) },
{ PCI_DEVICE_DATA(INTEL, ARL_S_THERMAL, PROC_THERMAL_FEATURE_RAPL |
PROC_THERMAL_FEATURE_DVFS | PROC_THERMAL_FEATURE_DLVR | PROC_THERMAL_FEATURE_WT_HINT) },
{ PCI_DEVICE_DATA(INTEL, RPL_THERMAL, PROC_THERMAL_FEATURE_RAPL |
--
2.40.1

2023-10-12 19:15:00

by Rafael J. Wysocki

Subject: Re: [PATCH v2 0/7] thermal: processor_thermal: Power floor status

On Mon, Oct 9, 2023 at 9:06 PM Srinivas Pandruvada
<[email protected]> wrote:
>
> Support power floor notifications for Meteor Lake processors.
>
> The first three changes prepare for power floor status, and the others
> add support for power floor.
>
> v2
> - Use common define for offset
> - Fix potential race during clearing of interrupt with workload hint
> - Address comments on v1 of
> thermal: int340x: processor_thermal: Support power floor notifications
>
> Srinivas Pandruvada (7):
> thermal: int340x: processor_thermal: Move interrupt status MMIO offset
> to common header
> thermal: int340x: processor_thermal: Common function to clear SOC
> interrupt
> thermal: int340x: processor_thermal: Set feature mask before
> proc_thermal_add
> thermal: int340x: processor_thermal: Support power floor notifications
> thermal: int340x: processor_thermal: Handle power floor interrupts
> thermal: int340x: processor_thermal: Enable power floor support
> selftests/thermal/intel: Add test to read power floor status
>
> .../driver-api/thermal/intel_dptf.rst | 8 ++
> .../thermal/intel/int340x_thermal/Makefile | 1 +
> .../processor_thermal_device.c | 68 +++++++++-
> .../processor_thermal_device.h | 11 ++
> .../processor_thermal_device_pci.c | 43 ++++--
> .../processor_thermal_power_floor.c | 126 ++++++++++++++++++
> .../processor_thermal_wt_hint.c | 3 -
> tools/testing/selftests/Makefile | 1 +
> .../thermal/intel/power_floor/Makefile | 12 ++
> .../intel/power_floor/power_floor_test.c | 108 +++++++++++++++
> 10 files changed, 365 insertions(+), 16 deletions(-)
> create mode 100644 drivers/thermal/intel/int340x_thermal/processor_thermal_power_floor.c
> create mode 100644 tools/testing/selftests/thermal/intel/power_floor/Makefile
> create mode 100644 tools/testing/selftests/thermal/intel/power_floor/power_floor_test.c
>
> --

Whole series queued up as 6.7 material.

I've edited a couple of changelogs to clarify them a bit and changed
the documentation of the new sysfs attributes somewhat, so they don't
talk about RAPL directly, because I think that the key point here is
that if the power floor is signaled, the configuration of the system
needs to be changed in order to reduce power below the current level.