Hi All,
This is a resend of https://patchwork.kernel.org/patch/10862699/ (with minor
modifications) plus additional patch to add a sysfs attribute for accessing
the EPB (as requested by Boris).
Thanks,
Rafael
From: Rafael J. Wysocki <[email protected]>
The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is
problematic, because it may cause changes made by user space to that
MSR (with the help of the x86_energy_perf_policy tool, for example)
to be lost every time a CPU goes offline and then back online as well
as during system-wide power management transitions into sleep states
and back into the working state.
The first problem is that if the current EPB value for a CPU going
online is 0 ('performance'), the kernel will change it to 6 ('normal')
regardless of whether or not this is the first bring-up of that CPU.
That also happens during system-wide resume from sleep states
(including, but not limited to, hibernation). However, the EPB may
have been adjusted by user space this way and the kernel should not
blindly override that setting.
The second problem is that if the platform firmware resets the EPB
values for any CPUs during system-wide resume from a sleep state,
the kernel will not restore their previous EPB values that may
have been set by user space before the preceding system-wide
suspend transition. Again, that behavior may at least be confusing
from the user space perspective.
In order to address these issues, rework the handling of
MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU
offline and restored on CPU online as well as (for the boot CPU)
during the syscore stages of system-wide suspend and resume
transitions, respectively.
However, retain the policy by which the EPB is set to 6 ('normal')
on the first bring-up of each CPU if its initial value is 0, based
on the observation that 0 may mean 'not initialized' just as well as
'performance' in that case.
While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into
a separate file and document it in Documentation/admin-guide.
Fixes: abe48b108247 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS)
Fixes: b51ef52df71c (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume)
Reported-by: Thomas Renninger <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
This basically is a resend of https://patchwork.kernel.org/patch/10862699/,
except that EPB_MASK and EPB_SAVED are ULL values now and the year in the
copyright notice for the new file is 2019.
---
Documentation/admin-guide/pm/intel_epb.rst | 6 +
Documentation/admin-guide/pm/working-state.rst | 1
arch/x86/kernel/cpu/Makefile | 2
arch/x86/kernel/cpu/common.c | 17 ---
arch/x86/kernel/cpu/cpu.h | 1
arch/x86/kernel/cpu/intel.c | 34 ------
arch/x86/kernel/cpu/intel_epb.c | 131 +++++++++++++++++++++++++
include/linux/cpuhotplug.h | 1
8 files changed, 140 insertions(+), 53 deletions(-)
Index: linux-pm/include/linux/cpuhotplug.h
===================================================================
--- linux-pm.orig/include/linux/cpuhotplug.h
+++ linux-pm/include/linux/cpuhotplug.h
@@ -147,6 +147,7 @@ enum cpuhp_state {
CPUHP_AP_X86_VDSO_VMA_ONLINE,
CPUHP_AP_IRQ_AFFINITY_ONLINE,
CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS,
+ CPUHP_AP_X86_INTEL_EPB_ONLINE,
CPUHP_AP_PERF_ONLINE,
CPUHP_AP_PERF_X86_ONLINE,
CPUHP_AP_PERF_X86_UNCORE_ONLINE,
Index: linux-pm/arch/x86/kernel/cpu/intel.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/intel.c
+++ linux-pm/arch/x86/kernel/cpu/intel.c
@@ -596,36 +596,6 @@ detect_keyid_bits:
c->x86_phys_bits -= keyid_bits;
}
-static void init_intel_energy_perf(struct cpuinfo_x86 *c)
-{
- u64 epb;
-
- /*
- * Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
- * (x86_energy_perf_policy(8) is available to change it at run-time.)
- */
- if (!cpu_has(c, X86_FEATURE_EPB))
- return;
-
- rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
- if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
- return;
-
- pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
- pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n");
- epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
- wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-}
-
-static void intel_bsp_resume(struct cpuinfo_x86 *c)
-{
- /*
- * MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume,
- * so reinitialize it properly like during bootup:
- */
- init_intel_energy_perf(c);
-}
-
static void init_cpuid_fault(struct cpuinfo_x86 *c)
{
u64 msr;
@@ -763,8 +733,6 @@ static void init_intel(struct cpuinfo_x8
if (cpu_has(c, X86_FEATURE_TME))
detect_tme(c);
- init_intel_energy_perf(c);
-
init_intel_misc_features(c);
}
@@ -1023,9 +991,7 @@ static const struct cpu_dev intel_cpu_de
.c_detect_tlb = intel_detect_tlb,
.c_early_init = early_init_intel,
.c_init = init_intel,
- .c_bsp_resume = intel_bsp_resume,
.c_x86_vendor = X86_VENDOR_INTEL,
};
cpu_dev_register(intel_cpu_dev);
-
Index: linux-pm/arch/x86/kernel/cpu/common.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/common.c
+++ linux-pm/arch/x86/kernel/cpu/common.c
@@ -1864,23 +1864,6 @@ void cpu_init(void)
}
#endif
-static void bsp_resume(void)
-{
- if (this_cpu->c_bsp_resume)
- this_cpu->c_bsp_resume(&boot_cpu_data);
-}
-
-static struct syscore_ops cpu_syscore_ops = {
- .resume = bsp_resume,
-};
-
-static int __init init_cpu_syscore(void)
-{
- register_syscore_ops(&cpu_syscore_ops);
- return 0;
-}
-core_initcall(init_cpu_syscore);
-
/*
* The microcode loader calls this upon late microcode load to recheck features,
* only when microcode has been updated. Caller holds microcode_mutex and CPU
Index: linux-pm/arch/x86/kernel/cpu/cpu.h
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/cpu.h
+++ linux-pm/arch/x86/kernel/cpu/cpu.h
@@ -14,7 +14,6 @@ struct cpu_dev {
void (*c_init)(struct cpuinfo_x86 *);
void (*c_identify)(struct cpuinfo_x86 *);
void (*c_detect_tlb)(struct cpuinfo_x86 *);
- void (*c_bsp_resume)(struct cpuinfo_x86 *);
int c_x86_vendor;
#ifdef CONFIG_X86_32
/* Optional vendor specific routine to obtain the cache size. */
Index: linux-pm/arch/x86/kernel/cpu/Makefile
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/Makefile
+++ linux-pm/arch/x86/kernel/cpu/Makefile
@@ -28,7 +28,7 @@ obj-y += cpuid-deps.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
-obj-$(CONFIG_CPU_SUP_INTEL) += intel.o intel_pconfig.o
+obj-$(CONFIG_CPU_SUP_INTEL) += intel.o intel_pconfig.o intel_epb.o
obj-$(CONFIG_CPU_SUP_AMD) += amd.o
obj-$(CONFIG_CPU_SUP_HYGON) += hygon.o
obj-$(CONFIG_CPU_SUP_CYRIX_32) += cyrix.o
Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
===================================================================
--- /dev/null
+++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel Performance and Energy Bias Hint support.
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author:
+ * Rafael J. Wysocki <[email protected]>
+ */
+
+#include <linux/cpuhotplug.h>
+#include <linux/kernel.h>
+#include <linux/syscore_ops.h>
+
+#include <asm/cpufeature.h>
+#include <asm/msr.h>
+
+/**
+ * DOC: overview
+ *
+ * The Performance and Energy Bias Hint (EPB) allows software to specify its
+ * preference with respect to the power-performance tradeoffs present in the
+ * processor. Generally, the EPB is expected to be set by user space through
+ * the generic MSR interface (with the help of the x86_energy_perf_policy tool),
+ * but there are two reasons for the kernel to touch it.
+ *
+ * First, there are systems where the platform firmware resets the EPB during
+ * system-wide transitions from sleep states back into the working state
+ * effectively causing the previous EPB updates by user space to be lost.
+ * Thus the kernel needs to save the current EPB values for all CPUs during
+ * system-wide transitions to sleep states and restore them on the way back to
+ * the working state. That can be achieved by saving EPB for secondary CPUs
+ * when they are taken offline during transitions into system sleep states and
+ * for the boot CPU in a syscore suspend operation, so that it can be restored
+ * for the boot CPU in a syscore resume operation and for the other CPUs when
+ * they are brought back online. However, CPUs that are already offline when
+ * a system-wide PM transition is started are not taken offline again, but their
+ * EPB values may still be reset by the platform firmware during the transition,
+ * so in fact it is necessary to save the EPB of any CPU taken offline and to
+ * restore it when the given CPU goes back online at all times.
+ *
+ * Second, on many systems the initial EPB value coming from the platform
+ * firmware is 0 ('performance') and at least on some of them that is because
+ * the platform firmware does not initialize EPB at all with the assumption that
+ * the OS will do that anyway. That sometimes is problematic, as it may cause
+ * the system battery to drain too fast, for example, so it is better to adjust
+ * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
+ * kernel changes it to 6 ('normal').
+ */
+
+static DEFINE_PER_CPU(u8, saved_epb);
+
+#define EPB_MASK 0x0fULL
+#define EPB_SAVED 0x10ULL
+
+static int intel_epb_save(void)
+{
+ u64 epb;
+
+ rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+ /*
+ * Ensure that saved_epb will always be nonzero after this write even if
+ * the EPB value read from the MSR is 0.
+ */
+ this_cpu_write(saved_epb, (epb & EPB_MASK) | EPB_SAVED);
+
+ return 0;
+}
+
+static void intel_epb_restore(void)
+{
+ u64 val = this_cpu_read(saved_epb);
+ u64 epb;
+
+ rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+ if (val) {
+ val &= EPB_MASK;
+ } else {
+ /*
+ * Because intel_epb_save() has not run for the current CPU yet,
+ * it is going online for the first time, so if its EPB value is
+ * 0 ('performance') at this point, assume that it has not been
+ * initialized by the platform firmware and set it to 6
+ * ('normal').
+ */
+ val = epb & EPB_MASK;
+ if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+ val = ENERGY_PERF_BIAS_NORMAL;
+ pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
+ }
+ }
+ wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val);
+}
+
+static struct syscore_ops intel_epb_syscore_ops = {
+ .suspend = intel_epb_save,
+ .resume = intel_epb_restore,
+};
+
+static int intel_epb_online(unsigned int cpu)
+{
+ intel_epb_restore();
+ return 0;
+}
+
+static int intel_epb_offline(unsigned int cpu)
+{
+ return intel_epb_save();
+}
+
+static __init int intel_epb_init(void)
+{
+ int ret;
+
+ if (!boot_cpu_has(X86_FEATURE_EPB))
+ return -ENODEV;
+
+ ret = cpuhp_setup_state(CPUHP_AP_X86_INTEL_EPB_ONLINE,
+ "x86/intel/epb:online", intel_epb_online,
+ intel_epb_offline);
+ if (ret < 0)
+ goto err_out_online;
+
+ register_syscore_ops(&intel_epb_syscore_ops);
+ return 0;
+
+err_out_online:
+ cpuhp_remove_state(CPUHP_AP_X86_INTEL_EPB_ONLINE);
+ return ret;
+}
+subsys_initcall(intel_epb_init);
Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst
===================================================================
--- /dev/null
+++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst
@@ -0,0 +1,6 @@
+======================================
+Intel Performance and Energy Bias Hint
+======================================
+
+.. kernel-doc:: arch/x86/kernel/cpu/intel_epb.c
+ :doc: overview
Index: linux-pm/Documentation/admin-guide/pm/working-state.rst
===================================================================
--- linux-pm.orig/Documentation/admin-guide/pm/working-state.rst
+++ linux-pm/Documentation/admin-guide/pm/working-state.rst
@@ -8,3 +8,4 @@ Working-State Power Management
cpuidle
cpufreq
intel_pstate
+ intel_epb
From: Rafael J. Wysocki <[email protected]>
The Performance and Energy Bias Hint (EPB) is expected to be set by
user space through the generic MSR interface, but that interface is
not particularly nice and there are security concerns regarding it,
so it is not always available.
For this reason, add a sysfs interface for reading and updating the
EPB, in the form of a new attribute, energy_perf_bias, located
under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
support the EPB feature.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-system-cpu | 18 ++++
Documentation/admin-guide/pm/intel_epb.rst | 27 ++++++
arch/x86/kernel/cpu/intel_epb.c | 93 ++++++++++++++++++++-
3 files changed, 134 insertions(+), 4 deletions(-)
Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/intel_epb.c
+++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
@@ -9,8 +9,12 @@
*/
#include <linux/cpuhotplug.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
#include <linux/kernel.h>
+#include <linux/string.h>
#include <linux/syscore_ops.h>
+#include <linux/pm.h>
#include <asm/cpufeature.h>
#include <asm/msr.h>
@@ -20,9 +24,9 @@
*
* The Performance and Energy Bias Hint (EPB) allows software to specify its
* preference with respect to the power-performance tradeoffs present in the
- * processor. Generally, the EPB is expected to be set by user space through
- * the generic MSR interface (with the help of the x86_energy_perf_policy tool),
- * but there are two reasons for the kernel to touch it.
+ * processor. Generally, the EPB is expected to be set by user space (directly
+ * via sysfs or with the help of the x86_energy_perf_policy tool), but there are
+ * two reasons for the kernel to update it.
*
* First, there are systems where the platform firmware resets the EPB during
* system-wide transitions from sleep states back into the working state
@@ -52,6 +56,7 @@ static DEFINE_PER_CPU(u8, saved_epb);
#define EPB_MASK 0x0fULL
#define EPB_SAVED 0x10ULL
+#define MAX_EPB EPB_MASK
static int intel_epb_save(void)
{
@@ -97,15 +102,95 @@ static struct syscore_ops intel_epb_sysc
.resume = intel_epb_restore,
};
+static const char * const energy_perf_strings[] = {
+ "performance",
+ "balance-performance",
+ "normal",
+ "balance-power",
+ "power"
+};
+static const u8 energ_perf_values[] = {
+ ENERGY_PERF_BIAS_PERFORMANCE,
+ ENERGY_PERF_BIAS_BALANCE_PERFORMANCE,
+ ENERGY_PERF_BIAS_NORMAL,
+ ENERGY_PERF_BIAS_BALANCE_POWERSAVE,
+ ENERGY_PERF_BIAS_POWERSAVE
+};
+
+static ssize_t energy_perf_bias_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned int cpu = dev->id;
+ u64 epb;
+ int ret;
+
+ ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
+ if (ret < 0)
+ return ret;
+
+ return sprintf(buf, "%llu\n", epb);
+}
+
+static ssize_t energy_perf_bias_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned int cpu = dev->id;
+ u64 epb, val;
+ int ret;
+
+ ret = __sysfs_match_string(energy_perf_strings,
+ ARRAY_SIZE(energy_perf_strings), buf);
+ if (ret >= 0)
+ val = energ_perf_values[ret];
+ else if (kstrtou64(buf, 0, &val) || val > MAX_EPB)
+ return -EINVAL;
+
+ ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
+ if (ret < 0)
+ return ret;
+
+ ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
+ (epb & ~EPB_MASK) | val);
+ if (ret < 0)
+ return ret;
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(energy_perf_bias);
+
+static struct attribute *intel_epb_attrs[] = {
+ &dev_attr_energy_perf_bias.attr,
+ NULL
+};
+
+static const struct attribute_group intel_epb_attr_group = {
+ .name = power_group_name,
+ .attrs = intel_epb_attrs
+};
+
static int intel_epb_online(unsigned int cpu)
{
+ struct device *cpu_dev = get_cpu_device(cpu);
+
intel_epb_restore();
+ if (!cpuhp_tasks_frozen)
+ sysfs_merge_group(&cpu_dev->kobj, &intel_epb_attr_group);
+
return 0;
}
static int intel_epb_offline(unsigned int cpu)
{
- return intel_epb_save();
+ struct device *cpu_dev = get_cpu_device(cpu);
+
+ if (!cpuhp_tasks_frozen)
+ sysfs_unmerge_group(&cpu_dev->kobj, &intel_epb_attr_group);
+
+ intel_epb_save();
+ return 0;
}
static __init int intel_epb_init(void)
Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst
===================================================================
--- linux-pm.orig/Documentation/admin-guide/pm/intel_epb.rst
+++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst
@@ -4,3 +4,30 @@ Intel Performance and Energy Bias Hint
.. kernel-doc:: arch/x86/kernel/cpu/intel_epb.c
:doc: overview
+
+Intel Performance and Energy Bias Attribute in ``sysfs``
+========================================================
+
+The Intel Performance and Energy Bias Hint (EPB) value for a given (logical) CPU
+can be checked or updated through a ``sysfs`` attribute (file) under
+:file:`/sys/devices/system/cpu/cpu<N>/power/`, where the CPU number ``<N>``
+is allocated at the system initialization time:
+
+``energy_perf_bias``
+ Shows the current EPB value for the CPU in a sliding scale 0 - 15, where
+ a value of 0 corresponds to a hint preference for highest performance
+ and a value of 15 corresponds to the maximum energy savings.
+
+ In order to update the EPB value for the CPU, this attribute can be
+ written to, either with a number in the 0 - 15 sliding scale above, or
+ with one of the strings: "performance", "balance-performance", "normal",
+ "balance-power", "power" that represent values reflected by their
+ meaning.
+
+ This attribute is present for all online CPUs supporting the EPB
+ feature.
+
+Note that while the EPB interface to the processor is defined at the logical CPU
+level, the physical register backing it may be shared by multiple CPUs (for
+example, SMT siblings or cores in one package). For this reason, updating the
+EPB value for one CPU may cause the EPB values for other CPUs to change.
Index: linux-pm/Documentation/ABI/testing/sysfs-devices-system-cpu
===================================================================
--- linux-pm.orig/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ linux-pm/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -518,3 +518,21 @@ Description: Control Symetric Multi Thre
If control status is "forceoff" or "notsupported" writes
are rejected.
+
+What: /sys/devices/system/cpu/cpu#/power/energy_perf_bias
+Date: March 2019
+Contact: [email protected]
+Description: Intel Energy and Performance Bias Hint (EPB)
+
+ EPB for the given CPU in a sliding scale 0 - 15, where a value
+ of 0 corresponds to a hint preference for highest performance
+ and a value of 15 corresponds to the maximum energy savings.
+
+ In order to change the EPB value for the CPU, write either
+ a number in the 0 - 15 sliding scale above, or one of the
+ strings: "performance", "balance-performance", "normal",
+ "balance-power", "power" (that represent values reflected by
+ their meaning), to this attribute.
+
+ This attribute is present for all online CPUs supporting the
+ Intel EPB feature.
On 3/21/19 11:18 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is
> problematic, because it may cause changes made by user space to that
> MSR (with the help of the x86_energy_perf_policy tool, for example)
> to be lost every time a CPU goes offline and then back online as well
> as during system-wide power management transitions into sleep states
> and back into the working state.
>
> The first problem is that if the current EPB value for a CPU going
> online is 0 ('performance'), the kernel will change it to 6 ('normal')
> regardless of whether or not this is the first bring-up of that CPU.
> That also happens during system-wide resume from sleep states
> (including, but not limited to, hibernation). However, the EPB may
> have been adjusted by user space this way and the kernel should not
> blindly override that setting.
>
> The second problem is that if the platform firmware resets the EPB
> values for any CPUs during system-wide resume from a sleep state,
> the kernel will not restore their previous EPB values that may
> have been set by user space before the preceding system-wide
> suspend transition. Again, that behavior may at least be confusing
> from the user space perspective.
>
> In order to address these issues, rework the handling of
> MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU
> offline and restored on CPU online as well as (for the boot CPU)
> during the syscore stages of system-wide suspend and resume
> transitions, respectively.
>
> However, retain the policy by which the EPB is set to 6 ('normal')
> on the first bring-up of each CPU if its initial value is 0, based
> on the observation that 0 may mean 'not initialized' just as well as
> 'performance' in that case.
>
> While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into
> a separate file and document it in Documentation/admin-guide.
>
> Fixes: abe48b108247 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS)
> Fixes: b51ef52df71c (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume)
> Reported-by: Thomas Renninger <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
Reviewed-by: Hannes Reinecke <[email protected]>
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
On 3/21/19 11:20 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The Performance and Energy Bias Hint (EPB) is expected to be set by
> user space through the generic MSR interface, but that interface is
> not particularly nice and there are security concerns regarding it,
> so it is not always available.
>
> For this reason, add a sysfs interface for reading and updating the
> EPB, in the form of a new attribute, energy_perf_bias, located
> under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> support the EPB feature.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-devices-system-cpu | 18 ++++
> Documentation/admin-guide/pm/intel_epb.rst | 27 ++++++
> arch/x86/kernel/cpu/intel_epb.c | 93 ++++++++++++++++++++-
> 3 files changed, 134 insertions(+), 4 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
[email protected] +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is
> problematic, because it may cause changes made by user space to that
> MSR (with the help of the x86_energy_perf_policy tool, for example)
One more reason to control MSR accesses from userspace. I'm working on
a series to even completely forbid accesses to some MSRs over /dev/msr
so I think accessing MSR_IA32_ENERGY_PERF_BIAS solely over the new
interface in patch 2 would be much better.
So, you're carrying those and you'd like to have an ACK from me?
Btw, a couple of nitpicks below.
> Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
> ===================================================================
> --- /dev/null
> +++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
> @@ -0,0 +1,131 @@
> +// SPDX-License-Identifier: GPL-2.0
...
> +static DEFINE_PER_CPU(u8, saved_epb);
> +
> +#define EPB_MASK 0x0fULL
> +#define EPB_SAVED 0x10ULL
> +
> +static int intel_epb_save(void)
I'd drop that "intel_epb_" prefix from those static functions, but your
call...
> +{
> + u64 epb;
> +
> + rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> + /*
> + * Ensure that saved_epb will always be nonzero after this write even if
> + * the EPB value read from the MSR is 0.
> + */
> + this_cpu_write(saved_epb, (epb & EPB_MASK) | EPB_SAVED);
> +
> + return 0;
> +}
...
> Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst
> ===================================================================
> --- /dev/null
> +++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst
WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#345: FILE: Documentation/admin-guide/pm/intel_epb.rst:1:
+======================================
> @@ -0,0 +1,6 @@
> +======================================
> +Intel Performance and Energy Bias Hint
> +======================================
> +
> +.. kernel-doc:: arch/x86/kernel/cpu/intel_epb.c
> + :doc: overview
> Index: linux-pm/Documentation/admin-guide/pm/working-state.rst
> ===================================================================
> --- linux-pm.orig/Documentation/admin-guide/pm/working-state.rst
> +++ linux-pm/Documentation/admin-guide/pm/working-state.rst
> @@ -8,3 +8,4 @@ Working-State Power Management
> cpuidle
> cpufreq
> intel_pstate
> + intel_epb
>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, 22 Mar 2019, Borislav Petkov wrote:
> On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote:
> > Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst
> > ===================================================================
> > --- /dev/null
> > +++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst
>
> WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
> #345: FILE: Documentation/admin-guide/pm/intel_epb.rst:1:
> +======================================
We have no proper decision/recommendation about documentation
licensing. That's being worked on.
Thanks,
tglx
On Fri, Mar 22, 2019 at 03:31:54PM +0100, Thomas Gleixner wrote:
> We have no proper decision/recommendation about documentation
> licensing. That's being worked on.
Ok, I'm ignoring this checkpatch warning from now on.
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
First of all, thanks a lot for doing that!
This is a good example for how we should convert all the /dev/msr
accessing tools.
Nitpicks below.
On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The Performance and Energy Bias Hint (EPB) is expected to be set by
> user space through the generic MSR interface, but that interface is
> not particularly nice and there are security concerns regarding it,
> so it is not always available.
>
> For this reason, add a sysfs interface for reading and updating the
> EPB, in the form of a new attribute, energy_perf_bias, located
> under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> support the EPB feature.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-devices-system-cpu | 18 ++++
> Documentation/admin-guide/pm/intel_epb.rst | 27 ++++++
> arch/x86/kernel/cpu/intel_epb.c | 93 ++++++++++++++++++++-
> 3 files changed, 134 insertions(+), 4 deletions(-)
...
> +static ssize_t energy_perf_bias_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + unsigned int cpu = dev->id;
> + u64 epb;
> + int ret;
> +
> + ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
That's an IPI and an MSR read each time. You could dump saved_epb
instead, no?
> + if (ret < 0)
> + return ret;
> +
> + return sprintf(buf, "%llu\n", epb);
> +}
> +
> +static ssize_t energy_perf_bias_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + unsigned int cpu = dev->id;
> + u64 epb, val;
> + int ret;
> +
> + ret = __sysfs_match_string(energy_perf_strings,
> + ARRAY_SIZE(energy_perf_strings), buf);
> + if (ret >= 0)
> + val = energ_perf_values[ret];
> + else if (kstrtou64(buf, 0, &val) || val > MAX_EPB)
Range is 0 - 15 but u64? Maybe make it an u8? :)
> + return -EINVAL;
> +
> + ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> + if (ret < 0)
> + return ret;
> +
> + ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> + (epb & ~EPB_MASK) | val);
> + if (ret < 0)
> + return ret;
> +
> + return count;
> +}
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> + ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> + if (ret < 0)
> + return ret;
> +
> + ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> + (epb & ~EPB_MASK) | val);
That's two back-to-back IPIs and a giant waste.
If you'd use a proper msr shadow variable, you'd not have to do the
rdmsr_on_cpu :-)
On Fri, 22 Mar 2019, Borislav Petkov wrote:
> On Fri, Mar 22, 2019 at 03:31:54PM +0100, Thomas Gleixner wrote:
> > We have no proper decision/recommendation about documentation
> > licensing. That's being worked on.
>
> Ok, I'm ignoring this checkpatch warning from now on.
Only for Documentation/* files please.
Thanks Rafael for your quick look at and all the time you spend for this!
A /sys userspace knob will certainly not be enough for us.
You'll need a tool installed fixing this.
powertop on laptops or tuned on servers or a well hidden bootup quirk or
whatsoever.
The patch I sent with this part:
+ if (acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER ||
+ acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER)
+ return;
and not touching the EBP value then should at least match most of
our users and OEMs who want a "performance" setting out of the box and
set this on purpose.
Even nicer would be compile option to not touch this at all.
On Thursday, March 21, 2019 11:18:01 PM CET Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
...
> + *
> + * Second, on many systems the initial EPB value coming from the platform
> + * firmware is 0 ('performance') and at least on some of them that is
> because + * the platform firmware does not initialize EPB
Why does the CPU not initialize this value to 6?
That would allow OEMs/BIOS to also suggest an init value for the system.
We should try to get microcode people or whoever is in charge to initialize
this value "properly" if Intel thinks 6 is the correct init value.
> at all with the
> assumption that + * the OS will do that anyway. That sometimes is
> problematic, as it may cause + * the system battery to drain too fast, for
> example, so it is better to adjust + * it on CPU bring-up and if the
> initial EPB value for a given CPU is 0, the + * kernel changes it to 6
> ('normal').
I have an idea to let the kernel more decide about such policies.
It's a nice example that it makes sense to let the kernel set default values.
But not unconditionally, according to what the system is intended to do.
I wanted to do this for quite some time.., I hopefully find the time and
motivation now.
Thanks Rafael.
Sorry for the somewhat rude sounding previous mail, that was not on purpose.
You helped me quite a lot in the past and you obviously still do.
Thomas
On Fri, Mar 22, 2019 at 05:27:43PM +0100, Thomas Renninger wrote:
> Thanks Rafael for your quick look at and all the time you spend for this!
>
> A /sys userspace knob will certainly not be enough for us.
> You'll need a tool installed fixing this.
We won't pick up the second patch converting to the sysfs interface of
course - only the first one. The second patch is beginning the slow
phasing out of the direct /dev/msr access.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, 2019-03-22 at 17:12 +0100, Thomas Gleixner wrote:
> On Fri, 22 Mar 2019, Borislav Petkov wrote:
>
> > On Fri, Mar 22, 2019 at 03:31:54PM +0100, Thomas Gleixner wrote:
> > > We have no proper decision/recommendation about documentation
> > > licensing. That's being worked on.
> >
> > Ok, I'm ignoring this checkpatch warning from now on.
>
> Only for Documentation/* files please.
Perhaps
---
scripts/checkpatch.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index d0001fd1112d..00c1457fe79b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3058,7 +3058,8 @@ sub process {
"Improper SPDX comment style for '$realfile', please use '$comment' instead\n" . $herecurr);
}
- if ($comment !~ /^$/ &&
+ if ($realfile !~ m@^Documentation/@ &&
+ $comment !~ /^$/ &&
$rawline !~ m@^\+\Q$comment\E SPDX-License-Identifier: @) {
WARN("SPDX_LICENSE_TAG",
"Missing or malformed SPDX-License-Identifier tag in line $checklicenseline\n" . $herecurr);
On Fri, Mar 22, 2019 at 4:00 PM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> > + ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> > + if (ret < 0)
> > + return ret;
> > +
> > + ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> > + (epb & ~EPB_MASK) | val);
>
> That's two back-to-back IPIs and a giant waste.
Giant with respect to what?
I know that the read can be avoidable if more MSR bits are stored in
memory, but I don't expect this i/f to be used very often (once per
boot maybe or on AC<->DC changes at most), so I didn't think that this
would be a good tradeoff.
> If you'd use a proper msr shadow variable, you'd not have to do the
> rdmsr_on_cpu :-)
Not really.
The MSR can be updated from elsewhere which is not controlled by this code.
On Fri, Mar 22, 2019 at 3:46 PM Borislav Petkov <[email protected]> wrote:
>
> First of all, thanks a lot for doing that!
>
> This is a good example for how we should convert all the /dev/msr
> accessing tools.
>
> Nitpicks below.
>
> On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > The Performance and Energy Bias Hint (EPB) is expected to be set by
> > user space through the generic MSR interface, but that interface is
> > not particularly nice and there are security concerns regarding it,
> > so it is not always available.
> >
> > For this reason, add a sysfs interface for reading and updating the
> > EPB, in the form of a new attribute, energy_perf_bias, located
> > under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> > support the EPB feature.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > Documentation/ABI/testing/sysfs-devices-system-cpu | 18 ++++
> > Documentation/admin-guide/pm/intel_epb.rst | 27 ++++++
> > arch/x86/kernel/cpu/intel_epb.c | 93 ++++++++++++++++++++-
> > 3 files changed, 134 insertions(+), 4 deletions(-)
>
> ...
>
> > +static ssize_t energy_perf_bias_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + unsigned int cpu = dev->id;
> > + u64 epb;
> > + int ret;
> > +
> > + ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
>
> That's an IPI and an MSR read each time. You could dump saved_epb
> instead, no?
No, because the MSR can change in ways beyond control of this code sometimes.
Generally, saved_epb only is the right value at the CPU online time.
> > + if (ret < 0)
> > + return ret;
> > +
> > + return sprintf(buf, "%llu\n", epb);
> > +}
> > +
> > +static ssize_t energy_perf_bias_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + unsigned int cpu = dev->id;
> > + u64 epb, val;
> > + int ret;
> > +
> > + ret = __sysfs_match_string(energy_perf_strings,
> > + ARRAY_SIZE(energy_perf_strings), buf);
> > + if (ret >= 0)
> > + val = energ_perf_values[ret];
> > + else if (kstrtou64(buf, 0, &val) || val > MAX_EPB)
>
> Range is 0 - 15 but u64? Maybe make it an u8? :)
At the cost of an extra conversion below, right?
> > + return -EINVAL;
> > +
> > + ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> > + if (ret < 0)
> > + return ret;
> > +
> > + ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> > + (epb & ~EPB_MASK) | val);
> > + if (ret < 0)
> > + return ret;
> > +
> > + return count;
> > +}
>
> --
On Fri, Mar 22, 2019 at 3:28 PM Borislav Petkov <[email protected]> wrote:
>
> On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is
> > problematic, because it may cause changes made by user space to that
> > MSR (with the help of the x86_energy_perf_policy tool, for example)
>
> One more reason to control MSR accesses from userspace. I'm working on
> a series to even completely forbid accesses to some MSRs over /dev/msr
> so I think accessing MSR_IA32_ENERGY_PERF_BIAS solely over the new
> interface in patch 2 would be much better.
>
> So, you're carrying those and you'd like to have an ACK from me?
>
> Btw, a couple of nitpicks below.
>
> > Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
> > @@ -0,0 +1,131 @@
> > +// SPDX-License-Identifier: GPL-2.0
>
> ...
>
> > +static DEFINE_PER_CPU(u8, saved_epb);
> > +
> > +#define EPB_MASK 0x0fULL
> > +#define EPB_SAVED 0x10ULL
> > +
> > +static int intel_epb_save(void)
>
> I'd drop that "intel_epb_" prefix from those static functions, but your
> call...
They help indexing tools (elixir.bootlin.com and similar) a bit, so
I'd rather retain them.
> > +{
> > + u64 epb;
> > +
> > + rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > + /*
> > + * Ensure that saved_epb will always be nonzero after this write even if
> > + * the EPB value read from the MSR is 0.
> > + */
> > + this_cpu_write(saved_epb, (epb & EPB_MASK) | EPB_SAVED);
> > +
> > + return 0;
> > +}
>
> ...
>
> > Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst
> > ===================================================================
> > --- /dev/null
> > +++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst
>
> WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
> #345: FILE: Documentation/admin-guide/pm/intel_epb.rst:1:
Well, this is documentation, so ...
On Thu, Mar 21, 2019 at 11:18:01PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is
> problematic, because it may cause changes made by user space to that
> MSR (with the help of the x86_energy_perf_policy tool, for example)
> to be lost every time a CPU goes offline and then back online as well
> as during system-wide power management transitions into sleep states
> and back into the working state.
>
> The first problem is that if the current EPB value for a CPU going
> online is 0 ('performance'), the kernel will change it to 6 ('normal')
> regardless of whether or not this is the first bring-up of that CPU.
> That also happens during system-wide resume from sleep states
> (including, but not limited to, hibernation). However, the EPB may
> have been adjusted by user space this way and the kernel should not
> blindly override that setting.
>
> The second problem is that if the platform firmware resets the EPB
> values for any CPUs during system-wide resume from a sleep state,
> the kernel will not restore their previous EPB values that may
> have been set by user space before the preceding system-wide
> suspend transition. Again, that behavior may at least be confusing
> from the user space perspective.
>
> In order to address these issues, rework the handling of
> MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU
> offline and restored on CPU online as well as (for the boot CPU)
> during the syscore stages of system-wide suspend and resume
> transitions, respectively.
>
> However, retain the policy by which the EPB is set to 6 ('normal')
> on the first bring-up of each CPU if its initial value is 0, based
> on the observation that 0 may mean 'not initialized' just as well as
> 'performance' in that case.
>
> While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into
> a separate file and document it in Documentation/admin-guide.
>
> Fixes: abe48b108247 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS)
> Fixes: b51ef52df71c (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume)
> Reported-by: Thomas Renninger <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The Performance and Energy Bias Hint (EPB) is expected to be set by
> user space through the generic MSR interface, but that interface is
> not particularly nice and there are security concerns regarding it,
> so it is not always available.
>
> For this reason, add a sysfs interface for reading and updating the
> EPB, in the form of a new attribute, energy_perf_bias, located
> under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> support the EPB feature.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-devices-system-cpu | 18 ++++
> Documentation/admin-guide/pm/intel_epb.rst | 27 ++++++
> arch/x86/kernel/cpu/intel_epb.c | 93 ++++++++++++++++++++-
> 3 files changed, 134 insertions(+), 4 deletions(-)
Acked-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> +static struct attribute *intel_epb_attrs[] = {
> + &dev_attr_energy_perf_bias.attr,
> + NULL
> +};
> +
> +static const struct attribute_group intel_epb_attr_group = {
> + .name = power_group_name,
> + .attrs = intel_epb_attrs
> +};
> +
> static int intel_epb_online(unsigned int cpu)
> {
> + struct device *cpu_dev = get_cpu_device(cpu);
> +
> intel_epb_restore();
> + if (!cpuhp_tasks_frozen)
> + sysfs_merge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> +
> return 0;
> }
>
> static int intel_epb_offline(unsigned int cpu)
> {
> - return intel_epb_save();
> + struct device *cpu_dev = get_cpu_device(cpu);
> +
> + if (!cpuhp_tasks_frozen)
> + sysfs_unmerge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> +
> + intel_epb_save();
> + return 0;
> }
Hi,
I just booted net-next and got the following NULL pointer dereference
[1] during boot. I believe it is caused by this patch.
CONFIG_PM is disabled in my config which means 'power_group_name' is
defined as NULL. When I enable CONFIG_PM the issue is not reproduced.
Thanks
[1]
[ 1.230241] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1.231043] #PF: supervisor read access in kernel mode
[ 1.231043] #PF: error_code(0x0000) - not-present page
[ 1.231043] PGD 0 P4D 0
[ 1.231043] Oops: 0000 [#1] SMP
[ 1.231043] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 5.1.0-custom-07273-g80f232121b69 #1392
[ 1.231043] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ 1.231043] RIP: 0010:strlen+0x0/0x20
[ 1.231043] Code: b5 20 75 eb c6 42 01 00 0f b6 10 f6 82 40 bf 4d b5 20 74 14 48 c7 c1 40 bf 4d b5 48 83 c0 01 0f b6 10 f6 04 11 20 75 f3 c3 90 <80> 3f 00 74 10 48 89 f8
48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[ 1.231043] RSP: 0000:ffffb587c0cd3dc8 EFLAGS: 00010246
[ 1.231043] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
[ 1.231043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1.231043] RBP: 0000000000000000 R08: ffff8e6137a160c8 R09: 0000000000000000
[ 1.231043] R10: 0000000000000000 R11: ffff8e613652ec80 R12: 0000000000000000
[ 1.231043] R13: 0000000000000000 R14: ffff8e6137a160c8 R15: ffffffffb4690120
[ 1.231043] FS: 0000000000000000(0000) GS:ffff8e6137a00000(0000) knlGS:0000000000000000
[ 1.231043] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.231043] CR2: 0000000000000000 CR3: 0000000200409000 CR4: 00000000001006f0
[ 1.231043] Call Trace:
[ 1.231043] kernfs_name_hash+0xd/0x80
[ 1.231043] kernfs_find_ns+0x30/0xc0
[ 1.231043] kernfs_find_and_get_ns+0x27/0x50
[ 1.231043] sysfs_merge_group+0x2e/0x100
[ 1.231043] ? __switch_to_asm+0x40/0x70
[ 1.231043] intel_epb_online+0x2a/0x30
[ 1.231043] cpuhp_invoke_callback+0x8f/0x550
[ 1.231043] ? sort_range+0x20/0x20
[ 1.231043] cpuhp_thread_fun+0x9b/0x100
[ 1.231043] smpboot_thread_fn+0xc0/0x160
[ 1.231043] kthread+0x10d/0x130
[ 1.231043] ? __kthread_create_on_node+0x180/0x180
[ 1.231043] ret_from_fork+0x35/0x40
[ 1.231043] CR2: 0000000000000000
[ 1.231043] ---[ end trace c8ea60276791261c ]---
[ 1.231043] RIP: 0010:strlen+0x0/0x20
[ 1.231043] Code: b5 20 75 eb c6 42 01 00 0f b6 10 f6 82 40 bf 4d b5 20 74 14 48 c7 c1 40 bf 4d b5 48 83 c0 01 0f b6 10 f6 04 11 20 75 f3 c3 90 <80> 3f 00 74 10 48 89 f8
48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[ 1.231043] RSP: 0000:ffffb587c0cd3dc8 EFLAGS: 00010246
[ 1.231043] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
[ 1.231043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1.231043] RBP: 0000000000000000 R08: ffff8e6137a160c8 R09: 0000000000000000
[ 1.231043] R10: 0000000000000000 R11: ffff8e613652ec80 R12: 0000000000000000
[ 1.231043] R13: 0000000000000000 R14: ffff8e6137a160c8 R15: ffffffffb4690120
[ 1.231043] FS: 0000000000000000(0000) GS:ffff8e6137a00000(0000) knlGS:0000000000000000
[ 1.231043] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.231043] CR2: 0000000000000000 CR3: 0000000200409000 CR4: 00000000001006f0
On Thursday, May 9, 2019 12:23:15 PM CEST Ido Schimmel wrote:
> On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> > +static struct attribute *intel_epb_attrs[] = {
> > + &dev_attr_energy_perf_bias.attr,
> > + NULL
> > +};
> > +
> > +static const struct attribute_group intel_epb_attr_group = {
> > + .name = power_group_name,
> > + .attrs = intel_epb_attrs
> > +};
> > +
> > static int intel_epb_online(unsigned int cpu)
> > {
> > + struct device *cpu_dev = get_cpu_device(cpu);
> > +
> > intel_epb_restore();
> > + if (!cpuhp_tasks_frozen)
> > + sysfs_merge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> > +
> > return 0;
> > }
> >
> > static int intel_epb_offline(unsigned int cpu)
> > {
> > - return intel_epb_save();
> > + struct device *cpu_dev = get_cpu_device(cpu);
> > +
> > + if (!cpuhp_tasks_frozen)
> > + sysfs_unmerge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> > +
> > + intel_epb_save();
> > + return 0;
> > }
>
> Hi,
>
> I just booted net-next and got the following NULL pointer dereference
> [1] during boot. I believe it is caused by this patch.
I think you're right, sorry about this.
> CONFIG_PM is disabled in my config which means 'power_group_name' is
> defined as NULL. When I enable CONFIG_PM the issue is not reproduced.
So does the patch below fix it for you?
---
arch/x86/kernel/cpu/intel_epb.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/intel_epb.c
+++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
@@ -97,6 +97,7 @@ static void intel_epb_restore(void)
wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val);
}
+#ifdef CONFIG_PM
static struct syscore_ops intel_epb_syscore_ops = {
.suspend = intel_epb_save,
.resume = intel_epb_restore,
@@ -193,6 +194,25 @@ static int intel_epb_offline(unsigned in
return 0;
}
+static inline void register_intel_ebp_syscore_ops(void)
+{
+ register_syscore_ops(&intel_epb_syscore_ops);
+}
+#else /* !CONFIG_PM */
+static int intel_epb_online(unsigned int cpu)
+{
+ intel_epb_restore();
+ return 0;
+}
+
+static int intel_epb_offline(unsigned int cpu)
+{
+ return intel_epb_save();
+}
+
+static inline void register_intel_ebp_syscore_ops(void) {}
+#endif
+
static __init int intel_epb_init(void)
{
int ret;
@@ -206,7 +226,7 @@ static __init int intel_epb_init(void)
if (ret < 0)
goto err_out_online;
- register_syscore_ops(&intel_epb_syscore_ops);
+ register_intel_ebp_syscore_ops();
return 0;
err_out_online:
On Thu, May 09, 2019 at 07:18:28PM +0200, Rafael J. Wysocki wrote:
> So does the patch below fix it for you?
Yes. Thanks for the fix. Feel free to add my tag:
Tested-by: Ido Schimmel <[email protected]>
From: Rafael J. Wysocki <[email protected]>
Commit b9c273babce7 (PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs
interface) caused kernels built with CONFIG_PM unset to crash on
systems supporting the Performance and Energy Bias Hint (EPB),
because it attempts to add files to sysfs directories that don't
exist on those systems.
Prevent that from happening by taking CONFIG_PM into account so
that the code depending on it is not compiled at all when it is
not set.
Fixes: b9c273babce7 (PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface)
Reported-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/cpu/intel_epb.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/intel_epb.c
+++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
@@ -97,6 +97,7 @@ static void intel_epb_restore(void)
wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val);
}
+#ifdef CONFIG_PM
static struct syscore_ops intel_epb_syscore_ops = {
.suspend = intel_epb_save,
.resume = intel_epb_restore,
@@ -193,6 +194,25 @@ static int intel_epb_offline(unsigned in
return 0;
}
+static inline void register_intel_ebp_syscore_ops(void)
+{
+ register_syscore_ops(&intel_epb_syscore_ops);
+}
+#else /* !CONFIG_PM */
+static int intel_epb_online(unsigned int cpu)
+{
+ intel_epb_restore();
+ return 0;
+}
+
+static int intel_epb_offline(unsigned int cpu)
+{
+ return intel_epb_save();
+}
+
+static inline void register_intel_ebp_syscore_ops(void) {}
+#endif
+
static __init int intel_epb_init(void)
{
int ret;
@@ -206,7 +226,7 @@ static __init int intel_epb_init(void)
if (ret < 0)
goto err_out_online;
- register_syscore_ops(&intel_epb_syscore_ops);
+ register_intel_ebp_syscore_ops();
return 0;
err_out_online:
* Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Commit b9c273babce7 (PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs
> interface) caused kernels built with CONFIG_PM unset to crash on
> systems supporting the Performance and Energy Bias Hint (EPB),
> because it attempts to add files to sysfs directories that don't
> exist on those systems.
>
> Prevent that from happening by taking CONFIG_PM into account so
> that the code depending on it is not compiled at all when it is
> not set.
>
> Fixes: b9c273babce7 (PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface)
> Reported-by: Ido Schimmel <[email protected]>
> Tested-by: Ido Schimmel <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> arch/x86/kernel/cpu/intel_epb.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
Acked-by: Ingo Molnar <[email protected]>
Thanks,
Ingo
From: Rafael J. Wysocki <[email protected]>
Commit 9ed0985332a6 ("x86: intel_epb: Take CONFIG_PM into account")
prevented the majority of the Performance and Energy Bias Hint (EPB)
handling code from being built when CONFIG_PM is unset to fix a
regression introduced by commit b9c273babce7 ("PM / arch: x86:
MSR_IA32_ENERGY_PERF_BIAS sysfs interface").
In hindsight, however, it would be better to skip all of the EPB
handling code for CONFIG_PM unset as there really is no reason for
it to be there in that case. Namely, if the EPB is not touched
by the kernel at all with CONFIG_PM unset, there is no need to
worry about modifying the EPB inadvertently on CPU online and since
the system will not suspend or hibernate then, there is no need to
worry about possible modifications of the EPB by the platform
firmware during system-wide PM transitions.
For this reason, revert the changes made by commit 9ed0985332a6
and only allow intel_epb.o to be built when CONFIG_PM is set.
Note that this changes the behavior of the kernels built with
CONFIG_PM unset as they will not modify the EPB on boot if it is
zero initially any more, so it is not a fix strictly speaking, but
users building their kernels with CONFIG_PM unset really should not
expect them to take energy efficiency into account. Moreover, if
CONFIG_PM is unset for performance reasons, leaving EPB as set
initially by the platform firmware will actually be consistent
with the user's expectations.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
This is complementary to the EPB handling changes made in the current
development cycle, so IMO it would be good to do it in this cycle too
if there are no technical concerns or objections regarding it.
---
arch/x86/kernel/cpu/Makefile | 5 ++++-
arch/x86/kernel/cpu/intel_epb.c | 22 +---------------------
2 files changed, 5 insertions(+), 22 deletions(-)
Index: linux-pm/arch/x86/kernel/cpu/Makefile
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/Makefile
+++ linux-pm/arch/x86/kernel/cpu/Makefile
@@ -28,7 +28,10 @@ obj-y += cpuid-deps.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
-obj-$(CONFIG_CPU_SUP_INTEL) += intel.o intel_pconfig.o intel_epb.o
+ifdef CONFIG_CPU_SUP_INTEL
+obj-y += intel.o intel_pconfig.o
+obj-$(CONFIG_PM) += intel_epb.o
+endif
obj-$(CONFIG_CPU_SUP_AMD) += amd.o
obj-$(CONFIG_CPU_SUP_HYGON) += hygon.o
obj-$(CONFIG_CPU_SUP_CYRIX_32) += cyrix.o
Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/intel_epb.c
+++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
@@ -97,7 +97,6 @@ static void intel_epb_restore(void)
wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val);
}
-#ifdef CONFIG_PM
static struct syscore_ops intel_epb_syscore_ops = {
.suspend = intel_epb_save,
.resume = intel_epb_restore,
@@ -194,25 +193,6 @@ static int intel_epb_offline(unsigned in
return 0;
}
-static inline void register_intel_ebp_syscore_ops(void)
-{
- register_syscore_ops(&intel_epb_syscore_ops);
-}
-#else /* !CONFIG_PM */
-static int intel_epb_online(unsigned int cpu)
-{
- intel_epb_restore();
- return 0;
-}
-
-static int intel_epb_offline(unsigned int cpu)
-{
- return intel_epb_save();
-}
-
-static inline void register_intel_ebp_syscore_ops(void) {}
-#endif
-
static __init int intel_epb_init(void)
{
int ret;
@@ -226,7 +206,7 @@ static __init int intel_epb_init(void)
if (ret < 0)
goto err_out_online;
- register_intel_ebp_syscore_ops();
+ register_syscore_ops(&intel_epb_syscore_ops);
return 0;
err_out_online:
* Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Commit 9ed0985332a6 ("x86: intel_epb: Take CONFIG_PM into account")
> prevented the majority of the Performance and Energy Bias Hint (EPB)
> handling code from being built when CONFIG_PM is unset to fix a
> regression introduced by commit b9c273babce7 ("PM / arch: x86:
> MSR_IA32_ENERGY_PERF_BIAS sysfs interface").
>
> In hindsight, however, it would be better to skip all of the EPB
> handling code for CONFIG_PM unset as there really is no reason for
> it to be there in that case. Namely, if the EPB is not touched
> by the kernel at all with CONFIG_PM unset, there is no need to
> worry about modifying the EPB inadvertently on CPU online and since
> the system will not suspend or hibernate then, there is no need to
> worry about possible modifications of the EPB by the platform
> firmware during system-wide PM transitions.
>
> For this reason, revert the changes made by commit 9ed0985332a6
> and only allow intel_epb.o to be built when CONFIG_PM is set.
>
> Note that this changes the behavior of the kernels built with
> CONFIG_PM unset as they will not modify the EPB on boot if it is
> zero initially any more, so it is not a fix strictly speaking, but
> users building their kernels with CONFIG_PM unset really should not
> expect them to take energy efficiency into account. Moreover, if
> CONFIG_PM is unset for performance reasons, leaving EPB as set
> initially by the platform firmware will actually be consistent
> with the user's expectations.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> This is complementary to the EPB handling changes made in the current
> development cycle, so IMO it would be good to do it in this cycle too
> if there are no technical concerns or objections regarding it.
Sure:
Acked-by: Ingo Molnar <[email protected]>
Thanks,
Ingo