Zhao Liu will soon post a patchset to virtualize the Hardware Feedback
Interface (HFI) and Intel Thread Director (ITD) for the benefit of virtual
machines that make use of ITD for scheduling. His experiments show up to
14% improvement in performance in some workloads and configurations. This
series lays the foundation for his patchset. I will share Zhao's patchset
when available.
Patches 1-3 reorganize portions of the HFI driver to facilitate the
implementation of virtual HFI tables. In my opinion, this reorganization is
valuable on its own.
Patches 4-6 introduce the concept of ITD classes and enable ITD.
Patches 7-9 add support to reset the ITD classification history of the
current task to be used during context switch.
Several patches of the series were cherry-picked from my last submission
to support IPC classes of tasks for scheduling [1], but this series does
not touch the scheduler. I have kept the Reviewed-by and Acked-by tags of
the cherry-picked patches.
In this iteration, the virtualization of HFI requires that HFI and ITD are
unconditionally enabled in the bare-metal system. This conflicts with a
recent patchset from Stanislaw [2] that only enables HFI if there are
user space entities listening to thermal netlink events. I am not sure
how to resolve this conflict. Please see patch 6 for a discussion.
Patches apply cleanly on Rafael's master and testing branches at the time
of posting.
Thanks and BR,
Ricardo
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
Ricardo Neri (6):
thermal: intel: hfi: Introduce Intel Thread Director classes
x86/cpufeatures: Add the Intel Thread Director feature definitions
thermal: intel: hfi: Enable Intel Thread Director
x86/cpufeatures: Add feature bit for HRESET
x86/hreset: Configure history reset
x86/cpu: Introduce interface to reset hardware history
Zhao Liu (2):
thermal: intel: hfi: Relocate bit definitions of HFI registers
thermal: intel: hfi: Introduce the hfi_table structure
Zhuocheng Ding (1):
thermal: intel: hfi: Move selected data structures to a header file
MAINTAINERS | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/hfi.h | 85 +++++++++++++++
arch/x86/include/asm/hreset.h | 30 ++++++
arch/x86/include/asm/msr-index.h | 13 +++
arch/x86/kernel/cpu/common.c | 31 +++++-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
drivers/thermal/intel/intel_hfi.c | 165 +++++++++++++++--------------
8 files changed, 245 insertions(+), 83 deletions(-)
create mode 100644 arch/x86/include/asm/hfi.h
create mode 100644 arch/x86/include/asm/hreset.h
--
2.25.1
From: Zhao Liu <[email protected]>
KVM needs the definition of several HFI registers for the virtualization
of HFI. Move the necessary definitions to msr-index.h
While here, use BIT_ULL() and GENMASK_ULL() since the relevant registers
have 64 bits. Also, remove the "_BIT" suffix for consistency in naming.
No functional changes.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhuocheng Ding <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Zhao Liu <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/msr-index.h | 4 ++++
drivers/thermal/intel/intel_hfi.c | 10 +++-------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f1bd7b91b3c6..46983fb0b5b3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1143,7 +1143,11 @@
/* Hardware Feedback Interface */
#define MSR_IA32_HW_FEEDBACK_PTR 0x17d0
+#define HW_FEEDBACK_PTR_VALID BIT_ULL(0)
+#define HW_FEEDBACK_PTR_RESERVED_MASK GENMASK_ULL(11, 1)
+
#define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d1
+#define HW_FEEDBACK_CONFIG_HFI_ENABLE BIT_ULL(0)
/* x2APIC locked status */
#define MSR_IA32_XAPIC_DISABLE_STATUS 0xBD
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 3b04c6ec4fca..9aaca74bdfa3 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -48,10 +48,6 @@
#include "../thermal_netlink.h"
-/* Hardware Feedback Interface MSR configuration bits */
-#define HW_FEEDBACK_PTR_VALID_BIT BIT(0)
-#define HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT BIT(0)
-
/* CPUID detection and enumeration definitions for HFI */
#define CPUID_HFI_LEAF 6
@@ -356,7 +352,7 @@ static void hfi_enable(void)
u64 msr_val;
rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
- msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT;
+ msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE;
wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
}
@@ -366,7 +362,7 @@ static void hfi_set_hw_table(struct hfi_instance *hfi_instance)
u64 msr_val;
hw_table_pa = virt_to_phys(hfi_instance->hw_table);
- msr_val = hw_table_pa | HW_FEEDBACK_PTR_VALID_BIT;
+ msr_val = hw_table_pa | HW_FEEDBACK_PTR_VALID;
wrmsrl(MSR_IA32_HW_FEEDBACK_PTR, msr_val);
}
@@ -377,7 +373,7 @@ static void hfi_disable(void)
int i;
rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
- msr_val &= ~HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT;
+ msr_val &= ~HW_FEEDBACK_CONFIG_HFI_ENABLE;
wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
/*
--
2.25.1
From: Zhao Liu <[email protected]>
The virtualization of HFI requires to parse the HFI table of the host
system. Instead of exposing several pointers to the various section of the
table, create a single data structure that describes the table and can be
shared more cleanly.
A separate data structure that represents an HFI table improves readability
as it makes it clear that the table is one of several attributes of an HFI
instance.
No functional changes.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Suggested-by: Ricardo Neri <[email protected]>
Co-developed-by: Zhuocheng Ding <[email protected]>
Signed-off-by: Zhuocheng Ding <[email protected]>
Signed-off-by: Zhao Liu <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
drivers/thermal/intel/intel_hfi.c | 47 ++++++++++++++++++-------------
1 file changed, 28 insertions(+), 19 deletions(-)
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 9aaca74bdfa3..eeabdf072efd 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -97,12 +97,25 @@ struct hfi_hdr {
} __packed;
/**
- * struct hfi_instance - Representation of an HFI instance (i.e., a table)
- * @local_table: Base of the local copy of the HFI table
+ * struct hfi_table - Representation of an HFI table
+ * @base_addr: Base address of the local copy of the HFI table
* @timestamp: Timestamp of the last update of the local table.
* Located at the base of the local table.
* @hdr: Base address of the header of the local table
* @data: Base address of the data of the local table
+ */
+struct hfi_table {
+ union {
+ void *base_addr;
+ u64 *timestamp;
+ };
+ void *hdr;
+ void *data;
+};
+
+/**
+ * struct hfi_instance - Representation of an HFI instance (i.e., a table)
+ * @local_table: Local copy of HFI table for this instance
* @cpus: CPUs represented in this HFI table instance
* @hw_table: Pointer to the HFI table of this instance
* @update_work: Delayed work to process HFI updates
@@ -112,12 +125,7 @@ struct hfi_hdr {
* A set of parameters to parse and navigate a specific HFI table.
*/
struct hfi_instance {
- union {
- void *local_table;
- u64 *timestamp;
- };
- void *hdr;
- void *data;
+ struct hfi_table local_table;
cpumask_var_t cpus;
void *hw_table;
struct delayed_work update_work;
@@ -175,7 +183,7 @@ static void get_hfi_caps(struct hfi_instance *hfi_instance,
s16 index;
index = per_cpu(hfi_cpu_info, cpu).index;
- caps = hfi_instance->data + index * hfi_features.cpu_stride;
+ caps = hfi_instance->local_table.data + index * hfi_features.cpu_stride;
cpu_caps[i].cpu = cpu;
/*
@@ -292,7 +300,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
* where a lagging CPU entered the locked region.
*/
new_timestamp = *(u64 *)hfi_instance->hw_table;
- if (*hfi_instance->timestamp == new_timestamp) {
+ if (*hfi_instance->local_table.timestamp == new_timestamp) {
thermal_clear_package_intr_status(PACKAGE_LEVEL, PACKAGE_THERM_STATUS_HFI_UPDATED);
raw_spin_unlock(&hfi_instance->event_lock);
return;
@@ -304,7 +312,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
* Copy the updated table into our local copy. This includes the new
* timestamp.
*/
- memcpy(hfi_instance->local_table, hfi_instance->hw_table,
+ memcpy(hfi_instance->local_table.base_addr, hfi_instance->hw_table,
hfi_features.nr_table_pages << PAGE_SHIFT);
/*
@@ -339,11 +347,12 @@ static void init_hfi_cpu_index(struct hfi_cpu_info *info)
static void init_hfi_instance(struct hfi_instance *hfi_instance)
{
/* The HFI header is below the time-stamp. */
- hfi_instance->hdr = hfi_instance->local_table +
- sizeof(*hfi_instance->timestamp);
+ hfi_instance->local_table.hdr = hfi_instance->local_table.base_addr +
+ sizeof(*hfi_instance->local_table.timestamp);
/* The HFI data starts below the header. */
- hfi_instance->data = hfi_instance->hdr + hfi_features.hdr_size;
+ hfi_instance->local_table.data = hfi_instance->local_table.hdr +
+ hfi_features.hdr_size;
}
/* Caller must hold hfi_instance_lock. */
@@ -439,7 +448,7 @@ void intel_hfi_online(unsigned int cpu)
* if needed.
*/
mutex_lock(&hfi_instance_lock);
- if (hfi_instance->hdr)
+ if (hfi_instance->local_table.hdr)
goto enable;
/*
@@ -459,9 +468,9 @@ void intel_hfi_online(unsigned int cpu)
* Allocate memory to keep a local copy of the table that
* hardware generates.
*/
- hfi_instance->local_table = kzalloc(hfi_features.nr_table_pages << PAGE_SHIFT,
- GFP_KERNEL);
- if (!hfi_instance->local_table)
+ hfi_instance->local_table.base_addr = kzalloc(hfi_features.nr_table_pages << PAGE_SHIFT,
+ GFP_KERNEL);
+ if (!hfi_instance->local_table.base_addr)
goto free_hw_table;
init_hfi_instance(hfi_instance);
@@ -512,7 +521,7 @@ void intel_hfi_offline(unsigned int cpu)
if (!hfi_instance)
return;
- if (!hfi_instance->hdr)
+ if (!hfi_instance->local_table.hdr)
return;
mutex_lock(&hfi_instance_lock);
--
2.25.1
From: Zhuocheng Ding <[email protected]>
The virtualization of HFI needs the definitions of HFI capabilities and
the HFI table. Move these definitions into a header file under arch/x86.
No changes in the relocated structures.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Zhuocheng Ding <[email protected]>
Co-developed-by: Zhao Liu <[email protected]>
Signed-off-by: Zhao Liu <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
MAINTAINERS | 1 +
arch/x86/include/asm/hfi.h | 70 +++++++++++++++++++++++++++++++
drivers/thermal/intel/intel_hfi.c | 66 +----------------------------
3 files changed, 72 insertions(+), 65 deletions(-)
create mode 100644 arch/x86/include/asm/hfi.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 8999497011a2..8bc873ba82fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21788,6 +21788,7 @@ L: [email protected]
S: Supported
Q: https://patchwork.kernel.org/project/linux-pm/list/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git thermal
+F: arch/x86/include/asm/hfi.h
F: Documentation/ABI/testing/sysfs-class-thermal
F: Documentation/admin-guide/thermal/
F: Documentation/devicetree/bindings/thermal/
diff --git a/arch/x86/include/asm/hfi.h b/arch/x86/include/asm/hfi.h
new file mode 100644
index 000000000000..ed8a548a376e
--- /dev/null
+++ b/arch/x86/include/asm/hfi.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_HFI_H
+#define _ASM_X86_HFI_H
+
+/* CPUID detection and enumeration definitions for HFI */
+
+union hfi_capabilities {
+ struct {
+ u8 performance:1;
+ u8 energy_efficiency:1;
+ u8 __reserved:6;
+ } split;
+ u8 bits;
+};
+
+union cpuid6_edx {
+ struct {
+ union hfi_capabilities capabilities;
+ u32 table_pages:4;
+ u32 __reserved:4;
+ s32 index:16;
+ } split;
+ u32 full;
+};
+
+/**
+ * struct hfi_hdr - Header of the HFI table
+ * @perf_updated: Hardware updated performance capabilities
+ * @ee_updated: Hardware updated energy efficiency capabilities
+ *
+ * Properties of the data in an HFI table.
+ */
+struct hfi_hdr {
+ u8 perf_updated;
+ u8 ee_updated;
+} __packed;
+
+/**
+ * struct hfi_table - Representation of an HFI table
+ * @base_addr: Base address of the local copy of the HFI table
+ * @timestamp: Timestamp of the last update of the local table.
+ * Located at the base of the local table.
+ * @hdr: Base address of the header of the local table
+ * @data: Base address of the data of the local table
+ */
+struct hfi_table {
+ union {
+ void *base_addr;
+ u64 *timestamp;
+ };
+ void *hdr;
+ void *data;
+};
+
+/**
+ * struct hfi_features - Supported HFI features
+ * @nr_table_pages: Size of the HFI table in 4KB pages
+ * @cpu_stride: Stride size to locate the capability data of a logical
+ * processor within the table (i.e., row stride)
+ * @hdr_size: Size of the table header
+ *
+ * Parameters and supported features that are common to all HFI instances
+ */
+struct hfi_features {
+ size_t nr_table_pages;
+ unsigned int cpu_stride;
+ unsigned int hdr_size;
+};
+
+#endif /* _ASM_X86_HFI_H */
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index eeabdf072efd..ee8950a60f72 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -41,6 +41,7 @@
#include <linux/topology.h>
#include <linux/workqueue.h>
+#include <asm/hfi.h>
#include <asm/msr.h>
#include "intel_hfi.h"
@@ -48,29 +49,8 @@
#include "../thermal_netlink.h"
-/* CPUID detection and enumeration definitions for HFI */
-
#define CPUID_HFI_LEAF 6
-union hfi_capabilities {
- struct {
- u8 performance:1;
- u8 energy_efficiency:1;
- u8 __reserved:6;
- } split;
- u8 bits;
-};
-
-union cpuid6_edx {
- struct {
- union hfi_capabilities capabilities;
- u32 table_pages:4;
- u32 __reserved:4;
- s32 index:16;
- } split;
- u32 full;
-};
-
/**
* struct hfi_cpu_data - HFI capabilities per CPU
* @perf_cap: Performance capability
@@ -84,35 +64,6 @@ struct hfi_cpu_data {
u8 ee_cap;
} __packed;
-/**
- * struct hfi_hdr - Header of the HFI table
- * @perf_updated: Hardware updated performance capabilities
- * @ee_updated: Hardware updated energy efficiency capabilities
- *
- * Properties of the data in an HFI table.
- */
-struct hfi_hdr {
- u8 perf_updated;
- u8 ee_updated;
-} __packed;
-
-/**
- * struct hfi_table - Representation of an HFI table
- * @base_addr: Base address of the local copy of the HFI table
- * @timestamp: Timestamp of the last update of the local table.
- * Located at the base of the local table.
- * @hdr: Base address of the header of the local table
- * @data: Base address of the data of the local table
- */
-struct hfi_table {
- union {
- void *base_addr;
- u64 *timestamp;
- };
- void *hdr;
- void *data;
-};
-
/**
* struct hfi_instance - Representation of an HFI instance (i.e., a table)
* @local_table: Local copy of HFI table for this instance
@@ -133,21 +84,6 @@ struct hfi_instance {
raw_spinlock_t event_lock;
};
-/**
- * struct hfi_features - Supported HFI features
- * @nr_table_pages: Size of the HFI table in 4KB pages
- * @cpu_stride: Stride size to locate the capability data of a logical
- * processor within the table (i.e., row stride)
- * @hdr_size: Size of the table header
- *
- * Parameters and supported features that are common to all HFI instances
- */
-struct hfi_features {
- size_t nr_table_pages;
- unsigned int cpu_stride;
- unsigned int hdr_size;
-};
-
/**
* struct hfi_cpu_info - Per-CPU attributes to consume HFI data
* @index: Row of this CPU in its HFI table
--
2.25.1
On Intel hybrid parts, each type of CPU has specific performance and
energy efficiency capabilities. The Intel Thread Director technology
extends the Hardware Feedback Interface (HFI) to provide performance and
energy efficiency data for advanced classes of instructions.
Add support to parse per-class capabilities.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: Zhuocheng Ding <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
Patch cherry-picked from the IPC classes patchset.
---
---
arch/x86/include/asm/hfi.h | 8 +++++++-
drivers/thermal/intel/intel_hfi.c | 22 +++++++++++++++++-----
2 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/hfi.h b/arch/x86/include/asm/hfi.h
index ed8a548a376e..02ee56dbaeb6 100644
--- a/arch/x86/include/asm/hfi.h
+++ b/arch/x86/include/asm/hfi.h
@@ -28,7 +28,8 @@ union cpuid6_edx {
* @perf_updated: Hardware updated performance capabilities
* @ee_updated: Hardware updated energy efficiency capabilities
*
- * Properties of the data in an HFI table.
+ * Properties of the data in an HFI table. There exists one header per each
+ * HFI class.
*/
struct hfi_hdr {
u8 perf_updated;
@@ -54,16 +55,21 @@ struct hfi_table {
/**
* struct hfi_features - Supported HFI features
+ * @nr_classes: Number of classes supported
* @nr_table_pages: Size of the HFI table in 4KB pages
* @cpu_stride: Stride size to locate the capability data of a logical
* processor within the table (i.e., row stride)
+ * @class_stride: Stride size to locate a class within the capability
+ * data of a logical processor or the HFI table header
* @hdr_size: Size of the table header
*
* Parameters and supported features that are common to all HFI instances
*/
struct hfi_features {
+ unsigned int nr_classes;
size_t nr_table_pages;
unsigned int cpu_stride;
+ unsigned int class_stride;
unsigned int hdr_size;
};
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index ee8950a60f72..3c399f3d059f 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -57,7 +57,7 @@
* @ee_cap: Energy efficiency capability
*
* Capabilities of a logical processor in the HFI table. These capabilities are
- * unitless.
+ * unitless and specific to each HFI class.
*/
struct hfi_cpu_data {
u8 perf_cap;
@@ -277,8 +277,8 @@ static void init_hfi_cpu_index(struct hfi_cpu_info *info)
}
/*
- * The format of the HFI table depends on the number of capabilities that the
- * hardware supports. Keep a data structure to navigate the table.
+ * The format of the HFI table depends on the number of capabilities and classes
+ * that the hardware supports. Keep a data structure to navigate the table.
*/
static void init_hfi_instance(struct hfi_instance *hfi_instance)
{
@@ -498,18 +498,30 @@ static __init int hfi_parse_features(void)
/* The number of 4KB pages required by the table */
hfi_features.nr_table_pages = edx.split.table_pages + 1;
+ /*
+ * Capability fields of an HFI class are grouped together. Classes are
+ * contiguous in memory. Hence, use the number of supported features to
+ * locate a specific class.
+ */
+ hfi_features.class_stride = nr_capabilities;
+
+ /* For now, use only one class of the HFI table */
+ hfi_features.nr_classes = 1;
+
/*
* The header contains change indications for each supported feature.
* The size of the table header is rounded up to be a multiple of 8
* bytes.
*/
- hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities, 8) * 8;
+ hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities *
+ hfi_features.nr_classes, 8) * 8;
/*
* Data of each logical processor is also rounded up to be a multiple
* of 8 bytes.
*/
- hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities, 8) * 8;
+ hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities *
+ hfi_features.nr_classes, 8) * 8;
return 0;
}
--
2.25.1
Intel Thread Director (ITD) provides hardware resources to classify
the current task. The classification reflects the instructions per cycle of
such current task.
ITD extends the Hardware Feedback Interface table to provide performance
and energy efficiency capabilities for each of the supported classes of
tasks.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: Zhuocheng Ding <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
Patch cherry-picked from the IPC classes patchset.
---
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index fdf723b6f6d0..8104f4791abd 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -360,6 +360,7 @@
#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. Preference */
#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */
#define X86_FEATURE_HFI (14*32+19) /* Hardware Feedback Interface */
+#define X86_FEATURE_ITD (14*32+23) /* Intel Thread Director */
/* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */
#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index e462c1d3800a..2ab036125a56 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -82,6 +82,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
+ { X86_FEATURE_ITD, X86_FEATURE_HFI },
{}
};
--
2.25.1
Enable Intel Thread Director (ITD) from the CPU hotplug callback: globally
from CPU0 and then enable the thread-classification hardware in each
logical processor individually.
Also, initialize the number of classes supported.
Currently, a bare-metal machine does not use ITD, but KVM uses the
attributes of the bare-metal machine to virtualize HFI.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: Zhuocheng Ding <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Rafael J. Wysocki <[email protected]> # intel_hfi.c
Signed-off-by: Ricardo Neri <[email protected]>
---
Discussion:
This patch conflicts with a patchset from Stanislaw Gruszka to enable HFI
only if there are user space entities listening to the thermal netlink
events. ITD requires that HFI is enabled to function. ITD needs to be
unconditionally enabled for virtual machines.
Options to resolve this conflict include a command-line argument for users
wanting to virtualize HFI or a CONFIG_ option for the same effect. QEMU
could also learn to listen to thermal netlink event. A blunter option is
to unconditionally enable HFI when KVM is enabled at build time.
---
Patch cherry-picked from the IPC classes patchset.
---
---
arch/x86/include/asm/hfi.h | 9 ++++++
arch/x86/include/asm/msr-index.h | 6 ++++
drivers/thermal/intel/intel_hfi.c | 52 +++++++++++++++++++++++++++++--
3 files changed, 65 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/hfi.h b/arch/x86/include/asm/hfi.h
index 02ee56dbaeb6..b7fda3e0e8c8 100644
--- a/arch/x86/include/asm/hfi.h
+++ b/arch/x86/include/asm/hfi.h
@@ -23,6 +23,15 @@ union cpuid6_edx {
u32 full;
};
+union cpuid6_ecx {
+ struct {
+ u32 dont_care0:8;
+ u32 nr_classes:8;
+ u32 dont_care1:16;
+ } split;
+ u32 full;
+};
+
/**
* struct hfi_hdr - Header of the HFI table
* @perf_updated: Hardware updated performance capabilities
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 46983fb0b5b3..d74932a0778d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1148,6 +1148,12 @@
#define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d1
#define HW_FEEDBACK_CONFIG_HFI_ENABLE BIT_ULL(0)
+#define HW_FEEDBACK_CONFIG_ITD_ENABLE BIT_ULL(1)
+
+#define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4
+#define HW_FEEDBACK_THREAD_CONFIG_ENABLE BIT_ULL(0)
+
+#define MSR_IA32_HW_FEEDBACK_CHAR 0x17d2
/* x2APIC locked status */
#define MSR_IA32_XAPIC_DISABLE_STATUS 0xBD
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 3c399f3d059f..b69fa234b317 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -33,6 +33,7 @@
#include <linux/percpu-defs.h>
#include <linux/printk.h>
#include <linux/processor.h>
+#include <linux/sched/topology.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/suspend.h>
@@ -298,6 +299,10 @@ static void hfi_enable(void)
rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE;
+
+ if (cpu_feature_enabled(X86_FEATURE_ITD))
+ msr_val |= HW_FEEDBACK_CONFIG_ITD_ENABLE;
+
wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
}
@@ -319,6 +324,10 @@ static void hfi_disable(void)
rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
msr_val &= ~HW_FEEDBACK_CONFIG_HFI_ENABLE;
+
+ if (cpu_feature_enabled(X86_FEATURE_ITD))
+ msr_val &= ~HW_FEEDBACK_CONFIG_ITD_ENABLE;
+
wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
/*
@@ -337,6 +346,30 @@ static void hfi_disable(void)
}
}
+static void hfi_enable_itd_classification(void)
+{
+ u64 msr_val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_ITD))
+ return;
+
+ rdmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
+ msr_val |= HW_FEEDBACK_THREAD_CONFIG_ENABLE;
+ wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
+}
+
+static void hfi_disable_itd_classification(void)
+{
+ u64 msr_val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_ITD))
+ return;
+
+ rdmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
+ msr_val &= ~HW_FEEDBACK_THREAD_CONFIG_ENABLE;
+ wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
+}
+
/**
* intel_hfi_online() - Enable HFI on @cpu
* @cpu: CPU in which the HFI will be enabled
@@ -377,6 +410,8 @@ void intel_hfi_online(unsigned int cpu)
init_hfi_cpu_index(info);
+ hfi_enable_itd_classification();
+
/*
* Now check if the HFI instance of the package/die of @cpu has been
* initialized (by checking its header). In such case, all we have to
@@ -460,6 +495,8 @@ void intel_hfi_offline(unsigned int cpu)
if (!hfi_instance->local_table.hdr)
return;
+ hfi_disable_itd_classification();
+
mutex_lock(&hfi_instance_lock);
cpumask_clear_cpu(cpu, hfi_instance->cpus);
@@ -505,8 +542,14 @@ static __init int hfi_parse_features(void)
*/
hfi_features.class_stride = nr_capabilities;
- /* For now, use only one class of the HFI table */
- hfi_features.nr_classes = 1;
+ if (cpu_feature_enabled(X86_FEATURE_ITD)) {
+ union cpuid6_ecx ecx;
+
+ ecx.full = cpuid_ecx(CPUID_HFI_LEAF);
+ hfi_features.nr_classes = ecx.split.nr_classes;
+ } else {
+ hfi_features.nr_classes = 1;
+ }
/*
* The header contains change indications for each supported feature.
@@ -535,11 +578,16 @@ static void hfi_do_enable(void)
/* No locking needed. There is no concurrency with CPU online. */
hfi_set_hw_table(hfi_instance);
hfi_enable();
+
+ hfi_enable_itd_classification();
}
static int hfi_do_disable(void)
{
/* No locking needed. There is no concurrency with CPU offline. */
+
+ hfi_disable_itd_classification();
+
hfi_disable();
return 0;
--
2.25.1
The HRESET instruction isolates the classification of individual
tasks when they run sequentially on the same logical processor. It resets
the classification history that the logical processor maintains.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: Zhuocheng Ding <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
Patch cherry-picked from the IPC classes patchset.
---
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 3 +++
2 files changed, 4 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8104f4791abd..3b42479c049d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -326,6 +326,7 @@
#define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
#define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" (userspace) GS */
#define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */
+#define X86_FEATURE_HRESET (12*32+22) /* Hardware history reset instruction */
#define X86_FEATURE_AVX_IFMA (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
#define X86_FEATURE_LAM (12*32+26) /* Linear Address Masking */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d74932a0778d..65b1bfb9c304 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1155,6 +1155,9 @@
#define MSR_IA32_HW_FEEDBACK_CHAR 0x17d2
+/* Hardware History Reset */
+#define MSR_IA32_HW_HRESET_ENABLE 0x17da
+
/* x2APIC locked status */
#define MSR_IA32_XAPIC_DISABLE_STATUS 0xBD
#define LEGACY_XAPIC_DISABLED BIT(0) /*
--
2.25.1
Configure the MSR that controls the behavior of HRESET on each logical
processor.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: Zhuocheng Ding <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
Patch cherry-picked from the IPC classes patchset
---
---
arch/x86/kernel/cpu/common.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0b97bcde70c6..bce8719b47c9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -381,6 +381,26 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
cr4_clear_bits(X86_CR4_UMIP);
}
+static u32 hardware_history_features __ro_after_init;
+
+static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_HRESET))
+ return;
+
+ /*
+ * Use on all CPUs the hardware history features that the boot
+ * CPU supports.
+ */
+ if (c == &boot_cpu_data)
+ hardware_history_features = cpuid_ebx(0x20);
+
+ if (!hardware_history_features)
+ return;
+
+ wrmsrl(MSR_IA32_HW_HRESET_ENABLE, hardware_history_features);
+}
+
/* These bits should not change their value after CPU init is finished. */
static const unsigned long cr4_pinned_mask =
X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
@@ -1872,10 +1892,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
- /* Set up SMEP/SMAP/UMIP */
+ /* Set up SMEP/SMAP/UMIP/HRESET */
setup_smep(c);
setup_smap(c);
setup_umip(c);
+ setup_hreset(c);
/* Enable FSGSBASE instructions if available. */
if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
--
2.25.1
KVM needs an interface to reset the history of vCPU at context switch.
When called, hardware will start the classification of the next task
from scratch.
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: Zhuocheng Ding <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
Patch cherry-picked from the IPC classes patchset. Removed calls to
reset_hardware_history() from context switch. Now KVM will call it
directly when needed.
---
* Measurements of the cost of the HRESET instruction
Methodology:
I created a tight loop with interrupts and preemption disabled. I
recorded the value of the TSC counter before and after executing
HRESET or RDTSC. I repeated the measurement 100,000 times.
I performed the experiment using an Alder Lake S system. I set the
frequency of the CPUs at a fixed value.
The table below compares the cost of HRESET with RDTSC (expressed in
the elapsed TSC count). The cost of the two instructions is
comparable.
PCore ECore
Frequency (GHz) 5.0 3.8
HRESET (avg) 28.5 44.7
HRESET (stdev %) 3.6 2.3
RDTSC (avg) 25.2 35.7
RDTSC (stdev %) 3.9 2.6
---
arch/x86/include/asm/hreset.h | 30 ++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 8 ++++++++
2 files changed, 38 insertions(+)
create mode 100644 arch/x86/include/asm/hreset.h
diff --git a/arch/x86/include/asm/hreset.h b/arch/x86/include/asm/hreset.h
new file mode 100644
index 000000000000..d68ca2fb8642
--- /dev/null
+++ b/arch/x86/include/asm/hreset.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_HRESET_H
+
+/**
+ * HRESET - History reset. Available since binutils v2.36.
+ *
+ * Request the processor to reset the history of task classification on the
+ * current logical processor. The history components to be
+ * reset are specified in %eax. Only bits specified in CPUID(0x20).EBX
+ * and enabled in the IA32_HRESET_ENABLE MSR can be selected.
+ *
+ * The assembly code looks like:
+ *
+ * hreset %eax
+ *
+ * The corresponding machine code looks like:
+ *
+ * F3 0F 3A F0 ModRM Imm
+ *
+ * The value of ModRM is 0xc0 to specify %eax register addressing.
+ * The ignored immediate operand is set to 0.
+ *
+ * The instruction is documented in the Intel SDM.
+ */
+
+#define __ASM_HRESET ".byte 0xf3, 0xf, 0x3a, 0xf0, 0xc0, 0x0"
+
+void reset_hardware_history(void);
+
+#endif /* _ASM_X86_HRESET_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index bce8719b47c9..ab9809520164 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -57,6 +57,7 @@
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/cacheinfo.h>
+#include <asm/hreset.h>
#include <asm/memtype.h>
#include <asm/microcode.h>
#include <asm/intel-family.h>
@@ -383,6 +384,13 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
static u32 hardware_history_features __ro_after_init;
+void reset_hardware_history(void)
+{
+ asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
+ : : "a" (hardware_history_features) : "memory");
+}
+EXPORT_SYMBOL(reset_hardware_history);
+
static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
{
if (!cpu_feature_enabled(X86_FEATURE_HRESET))
--
2.25.1
On Fri, Feb 02, 2024 at 08:05:13PM -0800, Ricardo Neri wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 8104f4791abd..3b42479c049d 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -326,6 +326,7 @@
> #define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
> #define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" (userspace) GS */
> #define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */
> +#define X86_FEATURE_HRESET (12*32+22) /* Hardware history reset instruction */
#define X86_FEATURE_HRESET (12*32+22) /* "" Hardware history reset instruction */
unless this really needs to be visible in /proc/cpuinfo:
Documentation/arch/x86/cpuinfo.rst
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Feb 02, 2024 at 08:05:14PM -0800, Ricardo Neri wrote:
> +static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_HRESET))
> + return;
> +
> + /*
> + * Use on all CPUs the hardware history features that the boot
> + * CPU supports.
> + */
> + if (c == &boot_cpu_data)
> + hardware_history_features = cpuid_ebx(0x20);
That's not what this does - that sets hardware_history_features on the
BSP.
Why isn't this whole thing called in bsp_init_intel()?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Feb 02, 2024 at 08:05:15PM -0800, Ricardo Neri wrote:
> +void reset_hardware_history(void)
> +{
> + asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
> + : : "a" (hardware_history_features) : "memory");
> +}
This thing belongs in the header too, and it should be __always_inline.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Sat, Feb 03, 2024 at 10:40:14AM +0100, Borislav Petkov wrote:
> On Fri, Feb 02, 2024 at 08:05:15PM -0800, Ricardo Neri wrote:
> > +void reset_hardware_history(void)
> > +{
> > + asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
> > + : : "a" (hardware_history_features) : "memory");
> > +}
>
> This thing belongs in the header too, and it should be __always_inline.
Thanks a lot for your feedback Boris!
Sure, I can move this code to the header ad use __always_inline.
BR,
Ricardo
On Sat, Feb 03, 2024 at 10:38:57AM +0100, Borislav Petkov wrote:
> On Fri, Feb 02, 2024 at 08:05:14PM -0800, Ricardo Neri wrote:
> > +static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
> > +{
> > + if (!cpu_feature_enabled(X86_FEATURE_HRESET))
> > + return;
> > +
> > + /*
> > + * Use on all CPUs the hardware history features that the boot
> > + * CPU supports.
> > + */
> > + if (c == &boot_cpu_data)
> > + hardware_history_features = cpuid_ebx(0x20);
>
> That's not what this does - that sets hardware_history_features on the
> BSP.
I meant to say that we will use the features that the BSP supports and
use them to configure all other CPUs. I will reword the comment to make
this clear.
>
> Why isn't this whole thing called in bsp_init_intel()?
The register MSR_IA32_HW_HRESET_ENABLE needs to be configured on each CPU.
I can set hardware_history_features from bsp_init_intel() but I still
need to call setup_hreset() on every CPU.
On Sat, Feb 03, 2024 at 10:36:22AM +0100, Borislav Petkov wrote:
> On Fri, Feb 02, 2024 at 08:05:13PM -0800, Ricardo Neri wrote:
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 8104f4791abd..3b42479c049d 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -326,6 +326,7 @@
> > #define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
> > #define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" (userspace) GS */
> > #define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */
> > +#define X86_FEATURE_HRESET (12*32+22) /* Hardware history reset instruction */
>
> #define X86_FEATURE_HRESET (12*32+22) /* "" Hardware history reset instruction */
>
> unless this really needs to be visible in /proc/cpuinfo:
>
> Documentation/arch/x86/cpuinfo.rst
Good point. There is no need to expose HRESET in /proc/cpuinfo. I will
implement this change.
On Sat, Feb 03, 2024 at 07:55:52PM -0800, Ricardo Neri wrote:
> I can set hardware_history_features from bsp_init_intel() but I still
> need to call setup_hreset() on every CPU.
init_intel() runs on every CPU.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Feb 02, 2024 at 08:05:12PM -0800, Ricardo Neri wrote:
> Enable Intel Thread Director (ITD) from the CPU hotplug callback: globally
> from CPU0 and then enable the thread-classification hardware in each
> logical processor individually.
>
> Also, initialize the number of classes supported.
>
> Currently, a bare-metal machine does not use ITD, but KVM uses the
> attributes of the bare-metal machine to virtualize HFI.
>
> Cc: Len Brown <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: Stanislaw Gruszka <[email protected]>
> Cc: Zhao Liu <[email protected]>
> Cc: Zhuocheng Ding <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Acked-by: Rafael J. Wysocki <[email protected]> # intel_hfi.c
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> Discussion:
>
> This patch conflicts with a patchset from Stanislaw Gruszka to enable HFI
> only if there are user space entities listening to the thermal netlink
> events. ITD requires that HFI is enabled to function. ITD needs to be
> unconditionally enabled for virtual machines.
Why unconditionally? From what I can tell from KVM patches (please correct
me if I'm wrong) guests need to be modified to utilize HFI/ITD. Do we
also have to enable HFI/ITD if no such guest run on virtual machine ?
> Options to resolve this conflict include a command-line argument for users
> wanting to virtualize HFI or a CONFIG_ option for the same effect. QEMU
> could also learn to listen to thermal netlink event. A blunter option is
> to unconditionally enable HFI when KVM is enabled at build time.
In general similar principle should be applied - do not enable if not
needed. We should be able to get information from KVM when there is
actual need. QEMU registering to thermal events seems to be odd for
me, and I think there must be better solution.
Regards
Stanislaw
> ---
> Patch cherry-picked from the IPC classes patchset.
> ---
> ---
> arch/x86/include/asm/hfi.h | 9 ++++++
> arch/x86/include/asm/msr-index.h | 6 ++++
> drivers/thermal/intel/intel_hfi.c | 52 +++++++++++++++++++++++++++++--
> 3 files changed, 65 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/hfi.h b/arch/x86/include/asm/hfi.h
> index 02ee56dbaeb6..b7fda3e0e8c8 100644
> --- a/arch/x86/include/asm/hfi.h
> +++ b/arch/x86/include/asm/hfi.h
> @@ -23,6 +23,15 @@ union cpuid6_edx {
> u32 full;
> };
>
> +union cpuid6_ecx {
> + struct {
> + u32 dont_care0:8;
> + u32 nr_classes:8;
> + u32 dont_care1:16;
> + } split;
> + u32 full;
> +};
> +
> /**
> * struct hfi_hdr - Header of the HFI table
> * @perf_updated: Hardware updated performance capabilities
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 46983fb0b5b3..d74932a0778d 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1148,6 +1148,12 @@
>
> #define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d1
> #define HW_FEEDBACK_CONFIG_HFI_ENABLE BIT_ULL(0)
> +#define HW_FEEDBACK_CONFIG_ITD_ENABLE BIT_ULL(1)
> +
> +#define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4
> +#define HW_FEEDBACK_THREAD_CONFIG_ENABLE BIT_ULL(0)
> +
> +#define MSR_IA32_HW_FEEDBACK_CHAR 0x17d2
>
> /* x2APIC locked status */
> #define MSR_IA32_XAPIC_DISABLE_STATUS 0xBD
> diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
> index 3c399f3d059f..b69fa234b317 100644
> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -33,6 +33,7 @@
> #include <linux/percpu-defs.h>
> #include <linux/printk.h>
> #include <linux/processor.h>
> +#include <linux/sched/topology.h>
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> #include <linux/suspend.h>
> @@ -298,6 +299,10 @@ static void hfi_enable(void)
>
> rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
> msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE;
> +
> + if (cpu_feature_enabled(X86_FEATURE_ITD))
> + msr_val |= HW_FEEDBACK_CONFIG_ITD_ENABLE;
> +
> wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
> }
>
> @@ -319,6 +324,10 @@ static void hfi_disable(void)
>
> rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
> msr_val &= ~HW_FEEDBACK_CONFIG_HFI_ENABLE;
> +
> + if (cpu_feature_enabled(X86_FEATURE_ITD))
> + msr_val &= ~HW_FEEDBACK_CONFIG_ITD_ENABLE;
> +
> wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
>
> /*
> @@ -337,6 +346,30 @@ static void hfi_disable(void)
> }
> }
>
> +static void hfi_enable_itd_classification(void)
> +{
> + u64 msr_val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_ITD))
> + return;
> +
> + rdmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
> + msr_val |= HW_FEEDBACK_THREAD_CONFIG_ENABLE;
> + wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
> +}
> +
> +static void hfi_disable_itd_classification(void)
> +{
> + u64 msr_val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_ITD))
> + return;
> +
> + rdmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
> + msr_val &= ~HW_FEEDBACK_THREAD_CONFIG_ENABLE;
> + wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
> +}
> +
> /**
> * intel_hfi_online() - Enable HFI on @cpu
> * @cpu: CPU in which the HFI will be enabled
> @@ -377,6 +410,8 @@ void intel_hfi_online(unsigned int cpu)
>
> init_hfi_cpu_index(info);
>
> + hfi_enable_itd_classification();
> +
> /*
> * Now check if the HFI instance of the package/die of @cpu has been
> * initialized (by checking its header). In such case, all we have to
> @@ -460,6 +495,8 @@ void intel_hfi_offline(unsigned int cpu)
> if (!hfi_instance->local_table.hdr)
> return;
>
> + hfi_disable_itd_classification();
> +
> mutex_lock(&hfi_instance_lock);
> cpumask_clear_cpu(cpu, hfi_instance->cpus);
>
> @@ -505,8 +542,14 @@ static __init int hfi_parse_features(void)
> */
> hfi_features.class_stride = nr_capabilities;
>
> - /* For now, use only one class of the HFI table */
> - hfi_features.nr_classes = 1;
> + if (cpu_feature_enabled(X86_FEATURE_ITD)) {
> + union cpuid6_ecx ecx;
> +
> + ecx.full = cpuid_ecx(CPUID_HFI_LEAF);
> + hfi_features.nr_classes = ecx.split.nr_classes;
> + } else {
> + hfi_features.nr_classes = 1;
> + }
>
> /*
> * The header contains change indications for each supported feature.
> @@ -535,11 +578,16 @@ static void hfi_do_enable(void)
> /* No locking needed. There is no concurrency with CPU online. */
> hfi_set_hw_table(hfi_instance);
> hfi_enable();
> +
> + hfi_enable_itd_classification();
> }
>
> static int hfi_do_disable(void)
> {
> /* No locking needed. There is no concurrency with CPU offline. */
> +
> + hfi_disable_itd_classification();
> +
> hfi_disable();
>
> return 0;
> --
> 2.25.1
>
On Sun, Feb 04, 2024 at 11:49:04AM +0100, Borislav Petkov wrote:
> On Sat, Feb 03, 2024 at 07:55:52PM -0800, Ricardo Neri wrote:
> > I can set hardware_history_features from bsp_init_intel() but I still
> > need to call setup_hreset() on every CPU.
>
> init_intel() runs on every CPU.
I see. Yes, this looks like the place to setup HRESET. Thank you for
the pointer!
On Mon, Feb 05, 2024 at 11:28:47AM +0100, Stanislaw Gruszka wrote:
> On Fri, Feb 02, 2024 at 08:05:12PM -0800, Ricardo Neri wrote:
> > Enable Intel Thread Director (ITD) from the CPU hotplug callback: globally
> > from CPU0 and then enable the thread-classification hardware in each
> > logical processor individually.
> >
> > Also, initialize the number of classes supported.
> >
> > Currently, a bare-metal machine does not use ITD, but KVM uses the
> > attributes of the bare-metal machine to virtualize HFI.
> >
> > Cc: Len Brown <[email protected]>
> > Cc: Rafael J. Wysocki <[email protected]>
> > Cc: Srinivas Pandruvada <[email protected]>
> > Cc: Stanislaw Gruszka <[email protected]>
> > Cc: Zhao Liu <[email protected]>
> > Cc: Zhuocheng Ding <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Acked-by: Rafael J. Wysocki <[email protected]> # intel_hfi.c
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > Discussion:
> >
> > This patch conflicts with a patchset from Stanislaw Gruszka to enable HFI
> > only if there are user space entities listening to the thermal netlink
> > events. ITD requires that HFI is enabled to function. ITD needs to be
> > unconditionally enabled for virtual machines.
>
> Why unconditionally? From what I can tell from KVM patches (please correct
> me if I'm wrong) guests need to be modified to utilize HFI/ITD. Do we
> also have to enable HFI/ITD if no such guest run on virtual machine ?
You are correct. ITD needs to be enabled unconditionally iff we can't know
when a guest machine is running.
>
> > Options to resolve this conflict include a command-line argument for users
> > wanting to virtualize HFI or a CONFIG_ option for the same effect. QEMU
> > could also learn to listen to thermal netlink event. A blunter option is
> > to unconditionally enable HFI when KVM is enabled at build time.
>
> In general similar principle should be applied - do not enable if not
> needed. We should be able to get information from KVM when there is
> actual need.
Agreed, Zhao had suggested to enable/disable when virtual machines are
built/destroyed.
> QEMU registering to thermal events seems to be odd for
> me, and I think there must be better solution.
Right.
On Fri, Feb 02, 2024 at 08:05:06PM -0800, Ricardo Neri wrote:
> Zhao Liu will soon post a patchset to virtualize the Hardware Feedback
> Interface (HFI) and Intel Thread Director (ITD) for the benefit of virtual
> machines that make use of ITD for scheduling. His experiments show up to
> 14% improvement in performance in some workloads and configurations. This
> series lays the foundation for his patchset. I will share Zhao's patchset
> when available.
Here are Zhao and Zhuocheng's patches:
https://lore.kernel.org/lkml/[email protected]/