On Saturday, February 27, 2016 12:15:47 AM Rafael J. Wysocki wrote:
> On Friday, February 26, 2016 05:38:00 PM Thomas Renninger wrote:
> > The assumption that BIOSes never want to have this register being set to
> > full performance (zero) is wrong.
> >
> > While wrongly overruling this BIOS setting and set it from performance
> > to normal did not hurt that much, because nobody really knew the effects
> > inside Intel processors.
> >
> > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
> > modes if this value is not set to performance.
> >
> > So switch logic to tell the user in a friendly way (info) that the CPU is
> > in performance mode and how to switch via userspace if this is not
> > intended.
> >
> > But otherwise trust that the BIOS has set the correct value here and do
> > not
> > blindly overrule.
> >
> > How this has been found: SLE11 had this patch, SLE12 it slipped through.
> > It took quite some time to nail down that this patch missing is the reason
> > for not entering turbo modes with this specific processor.
> >
> > Signed-off-by: Thomas Renninger <[email protected]>
> >
> > --- a/arch/x86/kernel/cpu/intel.c 2016-02-26 17:19:55.731042972 +0100
> > +++ b/arch/x86/kernel/cpu/intel.c 2016-02-26 17:20:48.598020581 +0100
> > @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc
> >
> > u64 epb;
> >
> > /*
> >
> > - * Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
> > - * (x86_energy_perf_policy(8) is available to change it at run-time.)
> > + * On server platforms energy bias typically is set to
> > + * performance on purpose.
> > + * On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
> > + * did not get initialized properly by BIOS.
> > + * Best is to to keep BIOS settings and give the user a hint whether
> > + * to change it via cpupower-set(8) userspace tool at runtime.
> >
> > */
> >
> > if (!cpu_has(c, X86_FEATURE_EPB))
> >
> > return;
> >
> > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
> >
> > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
> >
> > return;
> >
> > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was
'performance'\n");
> > - pr_warn_once("ENERGY_PERF_BIAS: View and update with
> > x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) |
> > ENERGY_PERF_BIAS_NORMAL;
> > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
> > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
>
> This doesn't need to be cpupower-set IMO.
You mean why switch the message from:
x86_energy_perf_policy to cpupower-set
?
IMO x86_energy_perf_policy should not exist. It has been introduce before
cpupower set -b.
Having an extra tool/binary for this functionality is an unneeded packaging
overhead for distros.
Also having more and more of such CPU specific tools is not userfriendly.
cpupower supports all power relevant features of your CPU and on all
architectures (or at least it should). People should know this one better
than "x86_energy_perf_policy" and theoretically intuitively find it, even
without a message.
So it would be nice to get the message fixed as well.
Thomas
On Tuesday, March 01, 2016 01:17:37 PM Thomas Renninger wrote:
> On Saturday, February 27, 2016 12:15:47 AM Rafael J. Wysocki wrote:
> > On Friday, February 26, 2016 05:38:00 PM Thomas Renninger wrote:
> > > The assumption that BIOSes never want to have this register being set to
> > > full performance (zero) is wrong.
> > >
> > > While wrongly overruling this BIOS setting and set it from performance
> > > to normal did not hurt that much, because nobody really knew the effects
> > > inside Intel processors.
> > >
> > > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
> > > modes if this value is not set to performance.
> > >
> > > So switch logic to tell the user in a friendly way (info) that the CPU is
> > > in performance mode and how to switch via userspace if this is not
> > > intended.
> > >
> > > But otherwise trust that the BIOS has set the correct value here and do
> > > not
> > > blindly overrule.
> > >
> > > How this has been found: SLE11 had this patch, SLE12 it slipped through.
> > > It took quite some time to nail down that this patch missing is the reason
> > > for not entering turbo modes with this specific processor.
> > >
> > > Signed-off-by: Thomas Renninger <[email protected]>
> > >
> > > --- a/arch/x86/kernel/cpu/intel.c 2016-02-26 17:19:55.731042972 +0100
> > > +++ b/arch/x86/kernel/cpu/intel.c 2016-02-26 17:20:48.598020581 +0100
> > > @@ -377,8 +377,12 @@ static void init_intel_energy_perf(struc
> > >
> > > u64 epb;
> > >
> > > /*
> > >
> > > - * Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
> > > - * (x86_energy_perf_policy(8) is available to change it at run-time.)
> > > + * On server platforms energy bias typically is set to
> > > + * performance on purpose.
> > > + * On other platforms it may happen that MSR_IA32_ENERGY_PERF_BIAS
> > > + * did not get initialized properly by BIOS.
> > > + * Best is to to keep BIOS settings and give the user a hint whether
> > > + * to change it via cpupower-set(8) userspace tool at runtime.
> > >
> > > */
> > >
> > > if (!cpu_has(c, X86_FEATURE_EPB))
> > >
> > > return;
> > >
> > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
> > >
> > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
> > >
> > > return;
> > >
> > > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was
> 'performance'\n");
> > > - pr_warn_once("ENERGY_PERF_BIAS: View and update with
> > > x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) |
> > > ENERGY_PERF_BIAS_NORMAL;
> > > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
> > > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-set(8)\n");
> >
> > This doesn't need to be cpupower-set IMO.
>
> You mean why switch the message from:
> x86_energy_perf_policy to cpupower-set
> ?
>
> IMO x86_energy_perf_policy should not exist. It has been introduce before
> cpupower set -b.
> Having an extra tool/binary for this functionality is an unneeded packaging
> overhead for distros.
> Also having more and more of such CPU specific tools is not userfriendly.
> cpupower supports all power relevant features of your CPU and on all
> architectures (or at least it should). People should know this one better
> than "x86_energy_perf_policy" and theoretically intuitively find it, even
> without a message.
>
> So it would be nice to get the message fixed as well.
My point is that since "cpupower set -b" is not the only way to set this,
it doesn't seem appropriate to refer to it explicitly from a kernel message.
I actually don't think the second message is necessary at all.
Thanks,
Rafael
On Wednesday, March 02, 2016 01:26:18 AM Rafael J. Wysocki wrote:
> On Tuesday, March 01, 2016 01:17:37 PM Thomas Renninger wrote:
> > > > if (!cpu_has(c, X86_FEATURE_EPB))z
> > > >
> > > > return;
> > > >
> > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
> > > >
> > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
> > > >
> > > > return;
> > > >
> > > > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was
> >
> > 'performance'\n");
> >
> > > > - pr_warn_once("ENERGY_PERF_BIAS: View and update with
> > > > x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) |
> > > > ENERGY_PERF_BIAS_NORMAL;
> > > > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> > > > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
> > > > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-
set(8)\n");
> > >
> > > This doesn't need to be cpupower-set IMO.
> >
> > You mean why switch the message from:
> > x86_energy_perf_policy to cpupower-set
> > ?
> >
> > IMO x86_energy_perf_policy should not exist. It has been introduce before
> > cpupower set -b.
> > Having an extra tool/binary for this functionality is an unneeded
> > packaging
> > overhead for distros.
> > Also having more and more of such CPU specific tools is not userfriendly.
> > cpupower supports all power relevant features of your CPU and on all
> > architectures (or at least it should). People should know this one better
> > than "x86_energy_perf_policy" and theoretically intuitively find it, even
> > without a message.
> >
> > So it would be nice to get the message fixed as well.
>
> My point is that since "cpupower set -b" is not the only way to set this,
> it doesn't seem appropriate to refer to it explicitly from a kernel message.
>
> I actually don't think the second message is necessary at all.
Hmm, thinking a bit more about this, I think the whole
init_intel_energy_perf() function check should vanish.
The check should get moved into the powertop userspace tool.
This one is used to optimize platform for power saving features.
This would also keep the kernel core code clean...
If you agree I will send the patch.
Thomas
Hi,
On Fri, Mar 4, 2016 at 9:37 AM, Thomas Renninger <[email protected]> wrote:
> On Wednesday, March 02, 2016 01:26:18 AM Rafael J. Wysocki wrote:
>> On Tuesday, March 01, 2016 01:17:37 PM Thomas Renninger wrote:
>> > > > if (!cpu_has(c, X86_FEATURE_EPB))z
>> > > >
>> > > > return;
>> > > >
>> > > > @@ -387,10 +391,8 @@ static void init_intel_energy_perf(struc
>> > > >
>> > > > if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
>> > > >
>> > > > return;
>> > > >
>> > > > - pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was
>> >
>> > 'performance'\n");
>> >
>> > > > - pr_warn_once("ENERGY_PERF_BIAS: View and update with
>> > > > x86_energy_perf_policy(8)\n"); - epb = (epb & ~0xF) |
>> > > > ENERGY_PERF_BIAS_NORMAL;
>> > > > - wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
>> > > > + pr_info_once("ENERGY_PERF_BIAS is set to 'performance'\n");
>> > > > + pr_info_once("ENERGY_PERF_BIAS: Update with cpupower-
> set(8)\n");
>> > >
>> > > This doesn't need to be cpupower-set IMO.
>> >
>> > You mean why switch the message from:
>> > x86_energy_perf_policy to cpupower-set
>> > ?
>> >
>> > IMO x86_energy_perf_policy should not exist. It has been introduce before
>> > cpupower set -b.
>> > Having an extra tool/binary for this functionality is an unneeded
>> > packaging
>> > overhead for distros.
>> > Also having more and more of such CPU specific tools is not userfriendly.
>> > cpupower supports all power relevant features of your CPU and on all
>> > architectures (or at least it should). People should know this one better
>> > than "x86_energy_perf_policy" and theoretically intuitively find it, even
>> > without a message.
>> >
>> > So it would be nice to get the message fixed as well.
>>
>> My point is that since "cpupower set -b" is not the only way to set this,
>> it doesn't seem appropriate to refer to it explicitly from a kernel message.
>>
>> I actually don't think the second message is necessary at all.
>
> Hmm, thinking a bit more about this, I think the whole
> init_intel_energy_perf() function check should vanish.
>
> The check should get moved into the powertop userspace tool.
> This one is used to optimize platform for power saving features.
>
> This would also keep the kernel core code clean...
>
> If you agree I will send the patch.
I need to talk to Len about that, but why don't you send it anyway?
If we are not going to update the knob, I'm not seeing much value in
checking it from the kernel. A few people read boot logs if their
systems work as expected, so the value of the message alone is quite
limited IMO.
Thanks,
Rafael
It came out that on certain CPUs perf bias MSR has to be set to performance,
so that the CPU enters turbo states at all.
Better do not try to wrongly adjust perf bias value,
its value probably is intended by BIOS.
This is a revert of mainline git commits:
commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
commit abe48b108247e9b90b4c6739662a2e5c765ed114
Signed-off-by: Thomas Renninger <[email protected]>
---
arch/x86/kernel/cpu/common.c | 18 ------------------
arch/x86/kernel/cpu/cpu.h | 1 -
arch/x86/kernel/cpu/intel.c | 32 --------------------------------
3 files changed, 51 deletions(-)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -13,7 +13,6 @@
#include <linux/kgdb.h>
#include <linux/smp.h>
#include <linux/io.h>
-#include <linux/syscore_ops.h>
#include <asm/stackprotector.h>
#include <asm/perf_event.h>
@@ -1489,20 +1488,3 @@ inline bool __static_cpu_has_safe(u16 bi
return boot_cpu_has(bit);
}
EXPORT_SYMBOL_GPL(__static_cpu_has_safe);
-
-static void bsp_resume(void)
-{
- if (this_cpu->c_bsp_resume)
- this_cpu->c_bsp_resume(&boot_cpu_data);
-}
-
-static struct syscore_ops cpu_syscore_ops = {
- .resume = bsp_resume,
-};
-
-static int __init init_cpu_syscore(void)
-{
- register_syscore_ops(&cpu_syscore_ops);
- return 0;
-}
-core_initcall(init_cpu_syscore);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -13,7 +13,6 @@ struct cpu_dev {
void (*c_init)(struct cpuinfo_x86 *);
void (*c_identify)(struct cpuinfo_x86 *);
void (*c_detect_tlb)(struct cpuinfo_x86 *);
- void (*c_bsp_resume)(struct cpuinfo_x86 *);
int c_x86_vendor;
#ifdef CONFIG_X86_32
/* Optional vendor specific routine to obtain the cache size. */
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -372,36 +372,6 @@ static void detect_vmx_virtcap(struct cp
}
}
-static void init_intel_energy_perf(struct cpuinfo_x86 *c)
-{
- u64 epb;
-
- /*
- * Initialize MSR_IA32_ENERGY_PERF_BIAS if not already initialized.
- * (x86_energy_perf_policy(8) is available to change it at run-time.)
- */
- if (!cpu_has(c, X86_FEATURE_EPB))
- return;
-
- rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
- if ((epb & 0xF) != ENERGY_PERF_BIAS_PERFORMANCE)
- return;
-
- pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
- pr_warn_once("ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)\n");
- epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
- wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
-}
-
-static void intel_bsp_resume(struct cpuinfo_x86 *c)
-{
- /*
- * MSR_IA32_ENERGY_PERF_BIAS is lost across suspend/resume,
- * so reinitialize it properly like during bootup:
- */
- init_intel_energy_perf(c);
-}
-
static void init_intel(struct cpuinfo_x86 *c)
{
unsigned int l2 = 0;
@@ -509,7 +479,6 @@ static void init_intel(struct cpuinfo_x8
if (cpu_has(c, X86_FEATURE_VMX))
detect_vmx_virtcap(c);
- init_intel_energy_perf(c);
}
#ifdef CONFIG_X86_32
@@ -764,7 +733,6 @@ static const struct cpu_dev intel_cpu_de
.c_detect_tlb = intel_detect_tlb,
.c_early_init = early_init_intel,
.c_init = init_intel,
- .c_bsp_resume = intel_bsp_resume,
.c_x86_vendor = X86_VENDOR_INTEL,
};
On Monday, March 07, 2016 02:24:54 PM Thomas Renninger wrote:
> It came out that on certain CPUs perf bias MSR has to be set to performance,
> so that the CPU enters turbo states at all.
>
> Better do not try to wrongly adjust perf bias value,
> its value probably is intended by BIOS.
This whole perf bias CPU feature and trying to adjust the value
at boot time gets more and more a mess.
I try to sum this up, try to collect some missing bits and get them
published through mailing lists. Hopefully some root causes of what
went wrong are identified and passed on to the right people to get
things fixed.
1. It started with:
commit abe48b108247e9b90b4c6739662a2e5c765ed114
Author: Len Brown <[email protected]>
Date: Thu Jul 14 00:53:24 2011 -0400
x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
...
However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...
Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode...
Initialize the MSR to the "normal" by default during kernel boot.
...
2. Four years later it has been identified that the boot processor's perf bias
switches back to 0 on suspend.
I wonder whether this happens generally. Or do BIOS hooks also reset the BIAS
value on suspend?
3. It has now be identified that specific CPUs can only enter Turbo modes, if
the perf BIAS value is set to 0 (performance). But the performance value cannot
be set anymore by BIOS because of above patches.
My questions:
1. If this is true:
"While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode..."
Then the HW (CPU) must initialize to 5 (normal).
It's not the BIOS' fault which forgets to set something,
this may always be intended.
commit b51ef52df71cb28e9d90cd1d48b79bf19f0bab06
commit 17edf2d79f1ea6dfdb4c444801d928953b9f98d6
commit abe48b108247e9b90b4c6739662a2e5c765ed114
Intel must tell their microcode writers to initialize perf BIAS with 5, if this
is what Intel wants.
2. Why is the value for the boot processor reset to 0 on suspend?
Is this a platform/BIOS issue only?
Or could this be another major misdesign in perf BIAS CPU implementation.
3. What exactly was the problem?
The argumentation that the perf BIAS "should" be 5 was rather vague.
What exactly are the concequences? Getting a dirty workaround into the x86 core
code only by this information without any values or references is bad.
Thomas
PS: I will submit the patch (revert of commit b51ef52df71cb2, 17edf2d79f1ea6d and
abe48b108247e9) now to our SLE code base.
> But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo modes
> if this value is not set to performance
BDX-EP supports HWP.
Are these failing machines running in HWP mode?
(On BDX-EP, and only on BDX-EP, EPB acts to set the BIAS for HWP,
because that processor doesn't yet have EPP.)
--
Len Brown, Intel Open Source Technology Center
On Monday, March 07, 2016 07:50:57 PM Len Brown wrote:
> > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
> > modes if this value is not set to performance
>
> BDX-EP supports HWP.
> Are these failing machines running in HWP mode?
>
> (On BDX-EP, and only on BDX-EP, EPB acts to set the BIAS for HWP,
> because that processor doesn't yet have EPP.)
I am not sure whether I can publish platform info. I asked for and added you
to CC of the private bug a while ago.
This kernel is run: SLES12 SP1, 3.12.49-4-default
I grepped the whole supportconfig for -i hwp and couldn't find anything, so I
very much expect this machine is/was not run in HWP mode, right?
I still question the usefulness of the "initialize perf bias to normal" hack.
For our distro I am pretty sure, we do not want to have this one and
we prefer the CPU or BIOS initialized value, even (or especially) if it is
set to performance.
Are there any known platforms where this would really be an issue
and how sever would it be?
I already removed the "set perf bias to normal" years ago for SLE11 without
getting any regression or negative reports.
Now finding the workaround on the hack to run a suspend hook to
adjust perf bias value on each suspend cycle... This is going into
a wrong direction.
I agree that if this needs resetting after each suspend cycle, userspace
should not need to care about it. I could imagine a sysfs variable here:
/sys/devices/system/cpu/intel_pstate/perf_bias
but the static setting from 0 to 6 in the x86 core code and
the suspend callback should get reverted, right?
Thomas
On Tue, Mar 8, 2016 at 7:14 AM, Thomas Renninger <[email protected]> wrote:
> On Monday, March 07, 2016 07:50:57 PM Len Brown wrote:
>> > But with Broadwell-EP processor (E5-2687W v4) the CPU will not enter turbo
>> > modes if this value is not set to performance
>>
>> BDX-EP supports HWP.
>> Are these failing machines running in HWP mode?
>>
>> (On BDX-EP, and only on BDX-EP, EPB acts to set the BIAS for HWP,
>> because that processor doesn't yet have EPP.)
>
> I am not sure whether I can publish platform info. I asked for and added you
> to CC of the private bug a while ago.
>
> This kernel is run: SLES12 SP1, 3.12.49-4-default
> I grepped the whole supportconfig for -i hwp and couldn't find anything, so I
> very much expect this machine is/was not run in HWP mode, right?
/proc/cpuinfo will have a bunch of hwp stuff if CPUID is advertising
the feature.
The BIOS could put it in HWP mode by default,
or the intel_pstate driver could enable HWP.
If intel_pstate did that, it would print it in dmesg.
pr_info("intel_pstate: HWP enabled\n");
> I still question the usefulness of the "initialize perf bias to normal" hack.
> For our distro I am pretty sure, we do not want to have this one and
> we prefer the CPU or BIOS initialized value, even (or especially) if it is
> set to performance.
>
> Are there any known platforms where this would really be an issue
> and how sever would it be?
>
> I already removed the "set perf bias to normal" years ago for SLE11 without
> getting any regression or negative reports.
>
> Now finding the workaround on the hack to run a suspend hook to
> adjust perf bias value on each suspend cycle... This is going into
> a wrong direction.
>
> I agree that if this needs resetting after each suspend cycle, userspace
> should not need to care about it. I could imagine a sysfs variable here:
> /sys/devices/system/cpu/intel_pstate/perf_bias
As EPB it is unrelated to intel_pstate, this would be a bad place to
put such state.
> but the static setting from 0 to 6 in the x86 core code and
> the suspend callback should get reverted, right?
The kernel is supplying a reasonable default.
It is up to the operating system to apply -- from user space --
a value that is consistent with the performance profile
for how the machine is being used.
I proposed doing such policy in the kernel a number of years ago,
and a number of non-kernel people insisted strenuously that the
kernel should not know or care, and that user-space should manage this.
Perhaps we need to examine how well abdicating responsibility
to the upper OS has been going...
I'm not going to tell you that you can't make policy choices in distro-specific
kernel patches. Maybe it is difficult for some distros to modify user-space.
But I am not going to recommend such changes go in the upstream kernel,
since it would impact other users.
thanks,
Len Brown, Intel Open Source Technology Center