2023-07-27 19:40:29

by Waiman Long

[permalink] [raw]
Subject: [PATCH v6 0/4] x86/speculation: Disable IBRS when idle

v6:
- Fix allyesconfig build error by moving __update_spec_ctrl()
helper from nospec-branch.h to spec-ctrl.h and include it in files
that need the helper.

v5:
- Update comment in patch 1.
- Minor doc update and code twist in patch 4 as suggested by Peter and
Randy.

v4:
- Add a new __update_spec_ctrl() helper in patch 1.
- Rebased to the latest linux kernel.

v3:
- Drop patches 1 ("x86/speculation: Provide a debugfs file to dump
SPEC_CTRL MSRs") and 5 ("x86/idle: Disable IBRS entering mwait idle
and enable it on wakeup") for now.
- Drop the MSR restoration code in ("x86/idle: Disable IBRS when cpu
is offline") as native_play_dead() does not return.
- For patch ("intel_idle: Add ibrs_off module parameter to force
disable IBRS"), change the name from "no_ibrs" to "ibrs_off" and
document the new parameter in intel_idle.rst.

For Intel processors that need to turn on IBRS to protect against
Spectre v2 and Retbleed, the IBRS bit in the SPEC_CTRL MSR affects
the performance of the whole core even if only one thread is turning
it on when running in the kernel. For user space heavy applications,
the performance impact of occasionally turning IBRS on during syscalls
shouldn't be significant. Unfortunately, that is not the case when the
sibling thread is idling in the kernel. In that case, the performance
impact can be significant.

When DPDK is running on an isolated CPU thread processing network packets
in user space while its sibling thread is idle. The performance of the
busy DPDK thread with IBRS on and off in the sibling idle thread are:

IBRS on IBRS off
------- --------
packets/second: 7.8M 10.4M
avg tsc cycles/packet: 282.26 209.86

This is a 25% performance degradation. The test system is a Intel Xeon
4114 CPU @ 2.20GHz.

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle (C6 or below). However, there
are existing users out there who have set "intel_idle.max_cstate=1"
to decrease latency. Those users won't be able to benefit from this
commit. This patch series extends this commit by providing a new
"intel_idle.ibrs_off" module parameter to force disable IBRS even when
"intel_idle.max_cstate=1" at the expense of increased IRQ response
latency. It also includes a commit to allow the disabling of IBRS when
a CPU becomes offline.

Waiman Long (4):
x86/speculation: Add __update_spec_ctrl() helper
x86/idle: Disable IBRS when cpu is offline
intel_idle: Use __update_spec_ctrl() in intel_idle_ibrs()
intel_idle: Add ibrs_off module parameter to force disable IBRS

Documentation/admin-guide/pm/intel_idle.rst | 17 ++++++++++++++++-
arch/x86/include/asm/spec-ctrl.h | 11 +++++++++++
arch/x86/kernel/smpboot.c | 8 ++++++++
drivers/idle/intel_idle.c | 18 +++++++++++++-----
4 files changed, 48 insertions(+), 6 deletions(-)

--
2.31.1



2023-07-27 20:32:44

by Waiman Long

[permalink] [raw]
Subject: [PATCH v6 4/4] intel_idle: Add ibrs_off module parameter to force disable IBRS

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add a "ibrs_off" module parameter to force disable IBRS and the
CPUIDLE_FLAG_IRQ_ENABLE flag if set.

In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency as IRQ will now
be disabled.

When running SPECjbb2015 with cstates set to C1 on a Skylake system.

First test when the kernel is booted with: "intel_idle.ibrs_off"
max-jOPS = 117828, critical-jOPS = 66047

Then retest when the kernel is booted without the "intel_idle.ibrs_off"
added.
max-jOPS = 116408, critical-jOPS = 58958

That means booting with "intel_idle.ibrs_off" improves performance by:
max-jOPS: 1.2%, which could be considered noise range.
critical-jOPS: 12%, which is definitely a solid improvement.

The admin-guide/pm/intel_idle.rst file is updated to add a description
about the new "ibrs_off" module parameter.

Signed-off-by: Waiman Long <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
---
Documentation/admin-guide/pm/intel_idle.rst | 17 ++++++++++++++++-
drivers/idle/intel_idle.c | 11 ++++++++++-
2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index b799a43da62e..39bd6ecce7de 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -170,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.

-Apart from that there are four module parameters recognized by ``intel_idle``
+Apart from that there are five module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).

@@ -216,6 +216,21 @@ are ignored).
The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.

+The ``ibrs_off`` module parameter is a boolean flag (defaults to
+false). If set, it is used to control if IBRS (Indirect Branch Restricted
+Speculation) should be turned off when the CPU enters an idle state.
+This flag does not affect CPUs that use Enhanced IBRS which can remain
+on with little performance impact.
+
+For some CPUs, IBRS will be selected as mitigation for Spectre v2 and Retbleed
+security vulnerabilities by default. Leaving the IBRS mode on while idling may
+have a performance impact on its sibling CPU. The IBRS mode will be turned off
+by default when the CPU enters into a deep idle state, but not in some
+shallower ones. Setting the ``ibrs_off`` module parameter will force the IBRS
+mode to off when the CPU is in any one of the available idle states. This may
+help performance of a sibling CPU at the expense of a slightly higher wakeup
+latency for the idle CPU.
+

.. _intel-idle-core-and-package-idle-states:

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index d286f31a5b96..18fa47370bde 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -68,6 +68,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
static unsigned int disabled_states_mask __read_mostly;
static unsigned int preferred_states_mask __read_mostly;
static bool force_irq_on __read_mostly;
+static bool ibrs_off __read_mostly;

static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;

@@ -1852,11 +1853,13 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
}

if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
- state->flags & CPUIDLE_FLAG_IBRS) {
+ ((state->flags & CPUIDLE_FLAG_IBRS) || ibrs_off)) {
/*
* IBRS mitigation requires that C-states are entered
* with interrupts disabled.
*/
+ if (ibrs_off && (state->flags & CPUIDLE_FLAG_IRQ_ENABLE))
+ state->flags &= ~CPUIDLE_FLAG_IRQ_ENABLE;
WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
state->enter = intel_idle_ibrs;
return;
@@ -2175,3 +2178,9 @@ MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
* 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
*/
module_param(force_irq_on, bool, 0444);
+/*
+ * Force the disabling of IBRS when X86_FEATURE_KERNEL_IBRS is on and
+ * CPUIDLE_FLAG_IRQ_ENABLE isn't set.
+ */
+module_param(ibrs_off, bool, 0444);
+MODULE_PARM_DESC(ibrs_off, "Disable IBRS when idle");
--
2.31.1


2023-10-04 11:50:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v6 0/4] x86/speculation: Disable IBRS when idle


* Waiman Long <[email protected]> wrote:

> For Intel processors that need to turn on IBRS to protect against
> Spectre v2 and Retbleed, the IBRS bit in the SPEC_CTRL MSR affects
> the performance of the whole core even if only one thread is turning
> it on when running in the kernel. For user space heavy applications,
> the performance impact of occasionally turning IBRS on during syscalls
> shouldn't be significant. Unfortunately, that is not the case when the
> sibling thread is idling in the kernel. In that case, the performance
> impact can be significant.
>
> When DPDK is running on an isolated CPU thread processing network packets
> in user space while its sibling thread is idle. The performance of the
> busy DPDK thread with IBRS on and off in the sibling idle thread are:
>
> IBRS on IBRS off
> ------- --------
> packets/second: 7.8M 10.4M
> avg tsc cycles/packet: 282.26 209.86
>
> This is a 25% performance degradation. The test system is a Intel Xeon
> 4114 CPU @ 2.20GHz.

Ok, that's a solid improvement, and the feature has no obvious
downsides, so I've applied your series to tip:sched/core with a few
edits here and there.

Thanks!

Ingo

Subject: [tip: sched/core] intel_idle: Add ibrs_off module parameter to force-disable IBRS

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 238437d88cea4bfcbc0e7c5031e873dec15d3e93
Gitweb: https://git.kernel.org/tip/238437d88cea4bfcbc0e7c5031e873dec15d3e93
Author: Waiman Long <[email protected]>
AuthorDate: Thu, 27 Jul 2023 14:46:00 -04:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Wed, 04 Oct 2023 13:48:48 +02:00

intel_idle: Add ibrs_off module parameter to force-disable IBRS

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add a "ibrs_off" module parameter to force disable IBRS and the
CPUIDLE_FLAG_IRQ_ENABLE flag if set.

In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency as IRQ will now
be disabled.

When running SPECjbb2015 with cstates set to C1 on a Skylake system.

First test when the kernel is booted with: "intel_idle.ibrs_off":

max-jOPS = 117828, critical-jOPS = 66047

Then retest when the kernel is booted without the "intel_idle.ibrs_off"
added:

max-jOPS = 116408, critical-jOPS = 58958

That means booting with "intel_idle.ibrs_off" improves performance by:

max-jOPS: +1.2%, which could be considered noise range.
critical-jOPS: +12%, which is definitely a solid improvement.

The admin-guide/pm/intel_idle.rst file is updated to add a description
about the new "ibrs_off" module parameter.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
Documentation/admin-guide/pm/intel_idle.rst | 17 ++++++++++++++++-
drivers/idle/intel_idle.c | 11 ++++++++++-
2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index b799a43..39bd6ec 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -170,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.

-Apart from that there are four module parameters recognized by ``intel_idle``
+Apart from that there are five module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).

@@ -216,6 +216,21 @@ are ignored).
The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.

+The ``ibrs_off`` module parameter is a boolean flag (defaults to
+false). If set, it is used to control if IBRS (Indirect Branch Restricted
+Speculation) should be turned off when the CPU enters an idle state.
+This flag does not affect CPUs that use Enhanced IBRS which can remain
+on with little performance impact.
+
+For some CPUs, IBRS will be selected as mitigation for Spectre v2 and Retbleed
+security vulnerabilities by default. Leaving the IBRS mode on while idling may
+have a performance impact on its sibling CPU. The IBRS mode will be turned off
+by default when the CPU enters into a deep idle state, but not in some
+shallower ones. Setting the ``ibrs_off`` module parameter will force the IBRS
+mode to off when the CPU is in any one of the available idle states. This may
+help performance of a sibling CPU at the expense of a slightly higher wakeup
+latency for the idle CPU.
+

.. _intel-idle-core-and-package-idle-states:

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 86ac9a4..dcda0af 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -68,6 +68,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
static unsigned int disabled_states_mask __read_mostly;
static unsigned int preferred_states_mask __read_mostly;
static bool force_irq_on __read_mostly;
+static bool ibrs_off __read_mostly;

static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;

@@ -1852,11 +1853,13 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
}

if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
- state->flags & CPUIDLE_FLAG_IBRS) {
+ ((state->flags & CPUIDLE_FLAG_IBRS) || ibrs_off)) {
/*
* IBRS mitigation requires that C-states are entered
* with interrupts disabled.
*/
+ if (ibrs_off && (state->flags & CPUIDLE_FLAG_IRQ_ENABLE))
+ state->flags &= ~CPUIDLE_FLAG_IRQ_ENABLE;
WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
state->enter = intel_idle_ibrs;
return;
@@ -2175,3 +2178,9 @@ MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
* 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
*/
module_param(force_irq_on, bool, 0444);
+/*
+ * Force the disabling of IBRS when X86_FEATURE_KERNEL_IBRS is on and
+ * CPUIDLE_FLAG_IRQ_ENABLE isn't set.
+ */
+module_param(ibrs_off, bool, 0444);
+MODULE_PARM_DESC(ibrs_off, "Disable IBRS when idle");

2023-10-04 16:12:21

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v6 0/4] x86/speculation: Disable IBRS when idle

On 10/4/23 07:50, Ingo Molnar wrote:
> * Waiman Long <[email protected]> wrote:
>
>> For Intel processors that need to turn on IBRS to protect against
>> Spectre v2 and Retbleed, the IBRS bit in the SPEC_CTRL MSR affects
>> the performance of the whole core even if only one thread is turning
>> it on when running in the kernel. For user space heavy applications,
>> the performance impact of occasionally turning IBRS on during syscalls
>> shouldn't be significant. Unfortunately, that is not the case when the
>> sibling thread is idling in the kernel. In that case, the performance
>> impact can be significant.
>>
>> When DPDK is running on an isolated CPU thread processing network packets
>> in user space while its sibling thread is idle. The performance of the
>> busy DPDK thread with IBRS on and off in the sibling idle thread are:
>>
>> IBRS on IBRS off
>> ------- --------
>> packets/second: 7.8M 10.4M
>> avg tsc cycles/packet: 282.26 209.86
>>
>> This is a 25% performance degradation. The test system is a Intel Xeon
>> 4114 CPU @ 2.20GHz.
> Ok, that's a solid improvement, and the feature has no obvious
> downsides, so I've applied your series to tip:sched/core with a few
> edits here and there.

Thanks!

-Longman

Subject: [tip: sched/core] intel_idle: Add ibrs_off module parameter to force-disable IBRS

The following commit has been merged into the sched/core branch of tip:

Commit-ID: aa1567a7e6440b8c3af4b0d8a8219d8fc5028c5f
Gitweb: https://git.kernel.org/tip/aa1567a7e6440b8c3af4b0d8a8219d8fc5028c5f
Author: Waiman Long <[email protected]>
AuthorDate: Thu, 27 Jul 2023 14:46:00 -04:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 07 Oct 2023 11:33:28 +02:00

intel_idle: Add ibrs_off module parameter to force-disable IBRS

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add a "ibrs_off" module parameter to force disable IBRS and the
CPUIDLE_FLAG_IRQ_ENABLE flag if set.

In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency as IRQ will now
be disabled.

When running SPECjbb2015 with cstates set to C1 on a Skylake system.

First test when the kernel is booted with: "intel_idle.ibrs_off":

max-jOPS = 117828, critical-jOPS = 66047

Then retest when the kernel is booted without the "intel_idle.ibrs_off"
added:

max-jOPS = 116408, critical-jOPS = 58958

That means booting with "intel_idle.ibrs_off" improves performance by:

max-jOPS: +1.2%, which could be considered noise range.
critical-jOPS: +12%, which is definitely a solid improvement.

The admin-guide/pm/intel_idle.rst file is updated to add a description
about the new "ibrs_off" module parameter.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
Documentation/admin-guide/pm/intel_idle.rst | 17 ++++++++++++++++-
drivers/idle/intel_idle.c | 11 ++++++++++-
2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index b799a43..39bd6ec 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -170,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.

-Apart from that there are four module parameters recognized by ``intel_idle``
+Apart from that there are five module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).

@@ -216,6 +216,21 @@ are ignored).
The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.

+The ``ibrs_off`` module parameter is a boolean flag (defaults to
+false). If set, it is used to control if IBRS (Indirect Branch Restricted
+Speculation) should be turned off when the CPU enters an idle state.
+This flag does not affect CPUs that use Enhanced IBRS which can remain
+on with little performance impact.
+
+For some CPUs, IBRS will be selected as mitigation for Spectre v2 and Retbleed
+security vulnerabilities by default. Leaving the IBRS mode on while idling may
+have a performance impact on its sibling CPU. The IBRS mode will be turned off
+by default when the CPU enters into a deep idle state, but not in some
+shallower ones. Setting the ``ibrs_off`` module parameter will force the IBRS
+mode to off when the CPU is in any one of the available idle states. This may
+help performance of a sibling CPU at the expense of a slightly higher wakeup
+latency for the idle CPU.
+

.. _intel-idle-core-and-package-idle-states:

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 86ac9a4..dcda0af 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -68,6 +68,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
static unsigned int disabled_states_mask __read_mostly;
static unsigned int preferred_states_mask __read_mostly;
static bool force_irq_on __read_mostly;
+static bool ibrs_off __read_mostly;

static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;

@@ -1852,11 +1853,13 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
}

if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
- state->flags & CPUIDLE_FLAG_IBRS) {
+ ((state->flags & CPUIDLE_FLAG_IBRS) || ibrs_off)) {
/*
* IBRS mitigation requires that C-states are entered
* with interrupts disabled.
*/
+ if (ibrs_off && (state->flags & CPUIDLE_FLAG_IRQ_ENABLE))
+ state->flags &= ~CPUIDLE_FLAG_IRQ_ENABLE;
WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
state->enter = intel_idle_ibrs;
return;
@@ -2175,3 +2178,9 @@ MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
* 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
*/
module_param(force_irq_on, bool, 0444);
+/*
+ * Force the disabling of IBRS when X86_FEATURE_KERNEL_IBRS is on and
+ * CPUIDLE_FLAG_IRQ_ENABLE isn't set.
+ */
+module_param(ibrs_off, bool, 0444);
+MODULE_PARM_DESC(ibrs_off, "Disable IBRS when idle");