v3:
- Drop patches 1 ("x86/speculation: Provide a debugfs file to dump
SPEC_CTRL MSRs") and 5 ("x86/idle: Disable IBRS entering mwait idle
and enable it on wakeup") for now.
- Drop the MSR restoration code in ("x86/idle: Disable IBRS when cpu
is offline") as native_play_dead() does not return.
- For patch ("intel_idle: Add ibrs_off module parameter to force
disable IBRS"), change the name from "no_ibrs" to "ibrs_off" and
document the new parameter in intel_idle.rst.
For Intel processors that need to turn on IBRS to protect against
Spectre v2 and Retbleed, the IBRS bit in the SPEC_CTRL MSR affects
the performance of the whole core even if only one thread is turning
it on when running in the kernel. For user space heavy applications,
the performance impact of occasionally turning IBRS on during syscalls
shouldn't be significant. Unfortunately, that is not the case when the
sibling thread is idling in the kernel. In that case, the performance
impact can be significant.
When DPDK is running on an isolated CPU thread processing network packets
in user space while its sibling thread is idle. The performance of the
busy DPDK thread with IBRS on and off in the sibling idle thread are:
IBRS on IBRS off
------- --------
packets/second: 7.8M 10.4M
avg tsc cycles/packet: 282.26 209.86
This is a 25% performance degradation. The test system is a Intel Xeon
4114 CPU @ 2.20GHz.
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle (C6 or below). However, there
are existing users out there who have set "intel_idle.max_cstate=1"
to decrease latency. Those users won't be able to benefit from this
commit. This patch series extends this commit by providing a new
"intel_idle.ibrs_off" module parameter to force disable IBRS even when
"intel_idle.max_cstate=1" at the expense of increased IRQ response
latency. It also includes a commit to allow the disabling of IBRS when
a CPU becomes offline.
Waiman Long (3):
x86/idle: Disable IBRS when cpu is offline
intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
intel_idle: Add ibrs_off module parameter to force disable IBRS
Documentation/admin-guide/pm/intel_idle.rst | 17 +++++++++++++++-
arch/x86/kernel/smpboot.c | 10 ++++++++++
drivers/idle/intel_idle.c | 22 +++++++++++++++++----
3 files changed, 44 insertions(+), 5 deletions(-)
--
2.31.1
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle. However, when a CPU
becomes offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS
is enabled. That will impact the performance of a sibling CPU. Mitigate
this performance impact by clearing all the mitigation bits in SPEC_CTRL
MSR when offline. When the CPU is online again, it will be re-initialized
and so restoring the SPEC_CTRL value isn't needed.
Add a comment to say that native_play_dead() is a __noreturn function,
but it can't be marked as such to avoid confusion about the missing
MSR restoration code.
Signed-off-by: Waiman Long <[email protected]>
---
arch/x86/kernel/smpboot.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..7bc33885518c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -84,6 +84,7 @@
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
#include <asm/sev.h>
+#include <asm/nospec-branch.h>
/* representing HT siblings of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1836,8 +1837,17 @@ void __noreturn hlt_play_dead(void)
}
}
+/*
+ * naitve_play_dead() is essentially a __noreturn function, but it can't
+ * be marked as such as the compiler may complain about it.
+ */
void native_play_dead(void)
{
+ if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
+ this_cpu_write(x86_spec_ctrl_current, 0);
+ native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+ }
+
play_dead_common();
tboot_shutdown(TB_SHUTDOWN_WFS);
--
2.31.1
On 6/21/23 22:05, Randy Dunlap wrote:
>
> On 6/21/23 17:36, Waiman Long wrote:
>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>> disables IBRS when the CPU enters long idle. However, when a CPU
>> becomes offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS
>> is enabled. That will impact the performance of a sibling CPU. Mitigate
>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>> MSR when offline. When the CPU is online again, it will be re-initialized
>> and so restoring the SPEC_CTRL value isn't needed.
>>
>> Add a comment to say that native_play_dead() is a __noreturn function,
>> but it can't be marked as such to avoid confusion about the missing
>> MSR restoration code.
>>
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>> arch/x86/kernel/smpboot.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 352f0ce1ece4..7bc33885518c 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -84,6 +84,7 @@
>> #include <asm/hw_irq.h>
>> #include <asm/stackprotector.h>
>> #include <asm/sev.h>
>> +#include <asm/nospec-branch.h>
>>
>> /* representing HT siblings of each logical CPU */
>> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>> @@ -1836,8 +1837,17 @@ void __noreturn hlt_play_dead(void)
>> }
>> }
>>
>> +/*
>> + * naitve_play_dead() is essentially a __noreturn function, but it can't
> typo: native_play_dead()
Thanks for spotting the typo. Will fix it in the next version.
Cheers,
Longman
>
>> + * be marked as such as the compiler may complain about it.
>> + */
>> void native_play_dead(void)
>> {
>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> + this_cpu_write(x86_spec_ctrl_current, 0);
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> + }
>> +
>> play_dead_common();
>> tboot_shutdown(TB_SHUTDOWN_WFS);
>>
On 6/21/23 17:36, Waiman Long wrote:
> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> disables IBRS when the CPU enters long idle. However, when a CPU
> becomes offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS
> is enabled. That will impact the performance of a sibling CPU. Mitigate
> this performance impact by clearing all the mitigation bits in SPEC_CTRL
> MSR when offline. When the CPU is online again, it will be re-initialized
> and so restoring the SPEC_CTRL value isn't needed.
>
> Add a comment to say that native_play_dead() is a __noreturn function,
> but it can't be marked as such to avoid confusion about the missing
> MSR restoration code.
>
> Signed-off-by: Waiman Long <[email protected]>
> ---
> arch/x86/kernel/smpboot.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 352f0ce1ece4..7bc33885518c 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -84,6 +84,7 @@
> #include <asm/hw_irq.h>
> #include <asm/stackprotector.h>
> #include <asm/sev.h>
> +#include <asm/nospec-branch.h>
>
> /* representing HT siblings of each logical CPU */
> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> @@ -1836,8 +1837,17 @@ void __noreturn hlt_play_dead(void)
> }
> }
>
> +/*
> + * naitve_play_dead() is essentially a __noreturn function, but it can't
typo: native_play_dead()
> + * be marked as such as the compiler may complain about it.
> + */
> void native_play_dead(void)
> {
> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> + this_cpu_write(x86_spec_ctrl_current, 0);
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
> +
> play_dead_common();
> tboot_shutdown(TB_SHUTDOWN_WFS);
>
--
~Randy
On Wed, Jun 21, 2023 at 08:36:01PM -0400, Waiman Long wrote:
> +/*
> + * naitve_play_dead() is essentially a __noreturn function, but it can't
> + * be marked as such as the compiler may complain about it.
> + */
FWIW, we could in theory do so by marking the smp_ops.play_dead function
pointer as __noreturn, but it would be tricky to teach objtool how to
understand that.
> void native_play_dead(void)
> {
> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> + this_cpu_write(x86_spec_ctrl_current, 0);
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
Can update_spec_ctrl() be used instead?
--
Josh
On 6/22/23 01:40, Josh Poimboeuf wrote:
> On Wed, Jun 21, 2023 at 08:36:01PM -0400, Waiman Long wrote:
>> +/*
>> + * naitve_play_dead() is essentially a __noreturn function, but it can't
>> + * be marked as such as the compiler may complain about it.
>> + */
> FWIW, we could in theory do so by marking the smp_ops.play_dead function
> pointer as __noreturn, but it would be tricky to teach objtool how to
> understand that.
I added the comment here because I had taken out the MSR restoration
code. We can always replace that later on if there is a better way to do
that.
>
>> void native_play_dead(void)
>> {
>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> + this_cpu_write(x86_spec_ctrl_current, 0);
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> + }
> Can update_spec_ctrl() be used instead?
Yes, the code is similar to what has been done in update_spec_ctrl().
Using it, however, will require exporting the function either by putting
it into a public header or making it a global function.
Cheers,
Longman
>