For Intel processors that need to turn on IBRS to protect against
Spectre v2 and Retbleed, the IBRS bit in the SPEC_CTRL MSR affects
the performance of the whole core even if only one thread is turning
it on when running in the kernel. For user space heavy applications,
the performance impact of occasionally turning IBRS on during syscalls
shouldn't be significant. Unfortunately, that is not the case when the
sibling thread is idling in the kernel. In that case, the performance
impact can be significant.
When DPDK is running on an isolated CPU thread processing network packets
in user space while its sibling thread is idle. The performance of the
busy DPDK thread with IBRS on and off in the sibling idle thread are:
IBRS on IBRS off
------- --------
packets/second: 7.8M 10.4M
avg tsc cycles/packet: 282.26 209.86
This is a 25% performance degradation. The test system is a Intel Xeon
4114 CPU @ 2.20GHz.
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle (C6 or below). However, there
are existing users out there who have set "intel_idle.max_cstate=1" or
even "intel_idle.max_cstate=0" to decrease latency. Those users won't be
able to benefit from this commit. This patch series extends this commit
by providing a new "intel_idle.no_ibrs" module option to force disable
IBRS even when "intel_idle.max_cstate=1" at the expense of increased IRQ
response latency. It also includes commit to allow the disabling of IBRS
with "intel_idle.max_cstate=0" as well as when a CPU becomes offline.
The first patch adds a new x86/spec_ctrl_msrs debugfs file which display
the current cached values of the SPEC_CTRL MSRs of all the CPUs. This
allows us to verify that IBRS bit is correctly turned off in idle CPUs
for various cstate values.
Waiman Long (5):
x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
x86/idle: Disable IBRS when cpu is offline
intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
intel_idle: Add no_ibrs module parameter to force disable IBRS
x86/idle: Disable IBRS entering mwait idle and enable it on wakeup
arch/x86/include/asm/mwait.h | 17 ++++++++
arch/x86/kernel/cpu/bugs.c | 79 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 13 ++++++
drivers/idle/intel_idle.c | 22 ++++++++--
4 files changed, 127 insertions(+), 4 deletions(-)
--
2.31.1
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle. However, when a CPU becomes
offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
enabled. That will impact the performance of a sibling CPU. Mitigate
this performance impact by clearing all the mitigation bits in SPEC_CTRL
MSR when offline and restoring the value of the MSR when it becomes
online again.
Signed-off-by: Waiman Long <[email protected]>
---
arch/x86/kernel/smpboot.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..5ff82fef413c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -84,6 +84,7 @@
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
#include <asm/sev.h>
+#include <asm/nospec-branch.h>
/* representing HT siblings of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
void native_play_dead(void)
{
+ u64 spec_ctrl = spec_ctrl_current();
+
+ if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
+ this_cpu_write(x86_spec_ctrl_current, 0);
+ native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+ }
+
play_dead_common();
tboot_shutdown(TB_SHUTDOWN_WFS);
mwait_play_dead();
if (cpuidle_play_dead())
hlt_play_dead();
+
+ if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
+ native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+ this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
+ }
}
#else /* ... !CONFIG_HOTPLUG_CPU */
--
2.31.1
On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> disables IBRS when the CPU enters long idle. However, when a CPU becomes
> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> enabled. That will impact the performance of a sibling CPU. Mitigate
> this performance impact by clearing all the mitigation bits in SPEC_CTRL
> MSR when offline and restoring the value of the MSR when it becomes
> online again.
>
> Signed-off-by: Waiman Long <[email protected]>
> ---
> arch/x86/kernel/smpboot.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 352f0ce1ece4..5ff82fef413c 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -84,6 +84,7 @@
> #include <asm/hw_irq.h>
> #include <asm/stackprotector.h>
> #include <asm/sev.h>
> +#include <asm/nospec-branch.h>
>
> /* representing HT siblings of each logical CPU */
> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>
> void native_play_dead(void)
> {
> + u64 spec_ctrl = spec_ctrl_current();
> +
> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> + this_cpu_write(x86_spec_ctrl_current, 0);
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
> +
> play_dead_common();
> tboot_shutdown(TB_SHUTDOWN_WFS);
>
> mwait_play_dead();
> if (cpuidle_play_dead())
> hlt_play_dead();
> +
> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> + this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> + }
> }
play_dead() is marked __noreturn
On 6/21/23 03:23, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>> enabled. That will impact the performance of a sibling CPU. Mitigate
>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>> MSR when offline and restoring the value of the MSR when it becomes
>> online again.
>>
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>> arch/x86/kernel/smpboot.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 352f0ce1ece4..5ff82fef413c 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -84,6 +84,7 @@
>> #include <asm/hw_irq.h>
>> #include <asm/stackprotector.h>
>> #include <asm/sev.h>
>> +#include <asm/nospec-branch.h>
>>
>> /* representing HT siblings of each logical CPU */
>> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>
>> void native_play_dead(void)
>> {
>> + u64 spec_ctrl = spec_ctrl_current();
>> +
>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> + this_cpu_write(x86_spec_ctrl_current, 0);
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> + }
>> +
>> play_dead_common();
>> tboot_shutdown(TB_SHUTDOWN_WFS);
>>
>> mwait_play_dead();
>> if (cpuidle_play_dead())
>> hlt_play_dead();
>> +
>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>> + this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>> + }
>> }
> play_dead() is marked __noreturn
There are different versions of play_dead() in the kernel. Some of them
are indeed marked __noreturn like the non-SMP one in
arch/x86/kernel/process.c. The native_play_dead() that I am patching
isn't one of those.
Cheers,
Longman
On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
>
> On 6/21/23 03:23, Peter Zijlstra wrote:
> > On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> > > Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> > > disables IBRS when the CPU enters long idle. However, when a CPU becomes
> > > offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> > > enabled. That will impact the performance of a sibling CPU. Mitigate
> > > this performance impact by clearing all the mitigation bits in SPEC_CTRL
> > > MSR when offline and restoring the value of the MSR when it becomes
> > > online again.
> > >
> > > Signed-off-by: Waiman Long <[email protected]>
> > > ---
> > > arch/x86/kernel/smpboot.c | 13 +++++++++++++
> > > 1 file changed, 13 insertions(+)
> > >
> > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > index 352f0ce1ece4..5ff82fef413c 100644
> > > --- a/arch/x86/kernel/smpboot.c
> > > +++ b/arch/x86/kernel/smpboot.c
> > > @@ -84,6 +84,7 @@
> > > #include <asm/hw_irq.h>
> > > #include <asm/stackprotector.h>
> > > #include <asm/sev.h>
> > > +#include <asm/nospec-branch.h>
> > > /* representing HT siblings of each logical CPU */
> > > DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> > > @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
> > > void native_play_dead(void)
> > > {
> > > + u64 spec_ctrl = spec_ctrl_current();
> > > +
> > > + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > + this_cpu_write(x86_spec_ctrl_current, 0);
> > > + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > > + }
> > > +
> > > play_dead_common();
> > > tboot_shutdown(TB_SHUTDOWN_WFS);
> > > mwait_play_dead();
> > > if (cpuidle_play_dead())
> > > hlt_play_dead();
> > > +
> > > + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > + native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> > > + this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> > > + }
> > > }
> > play_dead() is marked __noreturn
>
> There are different versions of play_dead() in the kernel. Some of them are
> indeed marked __noreturn like the non-SMP one in arch/x86/kernel/process.c.
> The native_play_dead() that I am patching isn't one of those.
mostly by accident I think, hlt_play_dead() is, so I'm thinking
everybody (all compiler and objtool) managed to figure out
native_play_dead() is __noreturn too.
On 6/21/23 10:36, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
>> On 6/21/23 03:23, Peter Zijlstra wrote:
>>> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>>>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>>>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>>>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>>>> enabled. That will impact the performance of a sibling CPU. Mitigate
>>>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>>>> MSR when offline and restoring the value of the MSR when it becomes
>>>> online again.
>>>>
>>>> Signed-off-by: Waiman Long <[email protected]>
>>>> ---
>>>> arch/x86/kernel/smpboot.c | 13 +++++++++++++
>>>> 1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>>>> index 352f0ce1ece4..5ff82fef413c 100644
>>>> --- a/arch/x86/kernel/smpboot.c
>>>> +++ b/arch/x86/kernel/smpboot.c
>>>> @@ -84,6 +84,7 @@
>>>> #include <asm/hw_irq.h>
>>>> #include <asm/stackprotector.h>
>>>> #include <asm/sev.h>
>>>> +#include <asm/nospec-branch.h>
>>>> /* representing HT siblings of each logical CPU */
>>>> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>>>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>>> void native_play_dead(void)
>>>> {
>>>> + u64 spec_ctrl = spec_ctrl_current();
>>>> +
>>>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> + this_cpu_write(x86_spec_ctrl_current, 0);
>>>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>>>> + }
>>>> +
>>>> play_dead_common();
>>>> tboot_shutdown(TB_SHUTDOWN_WFS);
>>>> mwait_play_dead();
>>>> if (cpuidle_play_dead())
>>>> hlt_play_dead();
>>>> +
>>>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>>>> + this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>>>> + }
>>>> }
>>> play_dead() is marked __noreturn
>> There are different versions of play_dead() in the kernel. Some of them are
>> indeed marked __noreturn like the non-SMP one in arch/x86/kernel/process.c.
>> The native_play_dead() that I am patching isn't one of those.
> mostly by accident I think, hlt_play_dead() is, so I'm thinking
> everybody (all compiler and objtool) managed to figure out
> native_play_dead() is __noreturn too.
Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an
error which is not the typical case. My testing does confirm that this
patch is able to keep the IBRS bit off when a CPU is offline via its
online sysfs file.
Cheers,
Longman
On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
> which is not the typical case. My testing does confirm that this patch is
> able to keep the IBRS bit off when a CPU is offline via its online sysfs
> file.
The point is; your re-enable IBRS hunk at the end is dead-code. It
should never ever run and having it is confusing.
On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
>
> On 6/21/23 10:48, Peter Zijlstra wrote:
> > On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
> >
> > > Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
> > > which is not the typical case. My testing does confirm that this patch is
> > > able to keep the IBRS bit off when a CPU is offline via its online sysfs
> > > file.
> > The point is; your re-enable IBRS hunk at the end is dead-code. It
> > should never ever run and having it is confusing.
>
> What I meant is that hlt_play_dead() should never be called unless there is
> some serious problem with the system and native_play_dead() does return in
> normal usage.
This is all through arch_cpu_idle_dead() which is __noreturn. And no,
none of this must ever return.
If you want an offline CPU to come back, you re-init.
On 6/21/23 10:48, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>
>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
>> which is not the typical case. My testing does confirm that this patch is
>> able to keep the IBRS bit off when a CPU is offline via its online sysfs
>> file.
> The point is; your re-enable IBRS hunk at the end is dead-code. It
> should never ever run and having it is confusing.
What I meant is that hlt_play_dead() should never be called unless there
is some serious problem with the system and native_play_dead() does
return in normal usage.
Cheers,
Longman
On 6/21/23 10:54, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
>> On 6/21/23 10:48, Peter Zijlstra wrote:
>>> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>>>
>>>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
>>>> which is not the typical case. My testing does confirm that this patch is
>>>> able to keep the IBRS bit off when a CPU is offline via its online sysfs
>>>> file.
>>> The point is; your re-enable IBRS hunk at the end is dead-code. It
>>> should never ever run and having it is confusing.
>> What I meant is that hlt_play_dead() should never be called unless there is
>> some serious problem with the system and native_play_dead() does return in
>> normal usage.
> This is all through arch_cpu_idle_dead() which is __noreturn. And no,
> none of this must ever return.
>
> If you want an offline CPU to come back, you re-init.
Yes, you are right. I thought it will return. I will update the patch
accordingly.
Thanks,
Longman