This is version 4 of the patchset "Prefer MWAIT over HLT on AMD
processors".
The previous versions are:
v3: https://lore.kernel.org/lkml/cover.fba143c82098dffab6bbf0a2f3c4be8bae07ccf1.1652176835.git-series.wyes.karny@amd.com/
v2: https://lore.kernel.org/lkml/[email protected]/
v1: https://lore.kernel.org/lkml/[email protected]/
Changes between v3 --> v4:
- Update documentation around idle=nomwait
Changes between v2 --> v3:
- Update some text in commit messages
- Update the documentation around idle=nomwait
- Remove unnecessary CPUID level check from prefer_mwait_c1_over_halt function
Background
==========
Currently, in the absence of the cpuidle driver (e.g. when global C-States are
disabled in the BIOS or when the cpuidle driver is not compiled in), the default
idle state on AMD Zen processors uses the HLT instruction, even though these
processors support the MWAIT instruction, which is more efficient than HLT.
HPC customers who want to optimize for lower latency are known to disable
Global C-States in the BIOS. Some vendors allow choosing a BIOS 'performance'
profile which explicitly disables C-States. In this scenario, the cpuidle
driver will not be loaded and the kernel will continue with the default idle
state chosen at boot time. On AMD systems currently the default idle state is
HLT which has a higher exit latency compared to MWAIT.
The reasons for the choice of HLT over MWAIT on AMD systems are:
1. Families prior to 10h didn't support MWAIT
2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
preferable to use HLT as the default state on these systems.
However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1, so it is
preferable to use MWAIT as the default idle state on these systems, as it has
lower exit latencies.
The table below shows the exit latency for HLT and MWAIT on an AMD Zen 3
system. Exit latency is measured by issuing a wakeup (IPI) to another CPU and
measuring how many clock cycles it takes to wake up. Each iteration measures 10K
wakeups with the source and destination CPUs pinned.
HLT:
25.0000th percentile : 1900 ns
50.0000th percentile : 2000 ns
75.0000th percentile : 2300 ns
90.0000th percentile : 2500 ns
95.0000th percentile : 2600 ns
99.0000th percentile : 2800 ns
99.5000th percentile : 3000 ns
99.9000th percentile : 3400 ns
99.9500th percentile : 3600 ns
99.9900th percentile : 5900 ns
Min latency : 1700 ns
Max latency : 5900 ns
Total Samples 9999
MWAIT:
25.0000th percentile : 1400 ns
50.0000th percentile : 1500 ns
75.0000th percentile : 1700 ns
90.0000th percentile : 1800 ns
95.0000th percentile : 1900 ns
99.0000th percentile : 2300 ns
99.5000th percentile : 2500 ns
99.9000th percentile : 3200 ns
99.9500th percentile : 3500 ns
99.9900th percentile : 4600 ns
Min latency : 1200 ns
Max latency : 4600 ns
Total Samples 9997
Improvement (99th percentile): 21.74%
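For readers who want to reproduce percentile rows like the ones above from raw
wakeup samples, here is a minimal sketch. The cover letter does not say which
percentile definition the measurement tool uses, so the nearest-rank method is
an assumption (it is at least consistent with the 99.99th percentile equalling
the maximum in both tables). The function name and signature are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for long values, ascending order. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile (assumed definition, not taken from the patchset):
 * sort the samples, then return the value at rank ceil(p/100 * n), 1-based.
 * Sorts the array in place. Requires n > 0 and 0 < p <= 100.
 */
static long percentile_nearest_rank(long *samples, size_t n, double p)
{
    double exact = p / 100.0 * (double)n;
    size_t rank = (size_t)exact;

    qsort(samples, n, sizeof(samples[0]), cmp_long);

    if ((double)rank < exact)   /* manual ceil, avoids linking libm */
        rank++;
    if (rank < 1)
        rank = 1;
    if (rank > n)
        rank = n;
    return samples[rank - 1];
}
```

With 9999 samples, the 99.99th percentile under this definition is rank 9999,
i.e. the largest sample.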
Below is another result, from the context_switch2 micro-benchmark, which shows
the impact of improved wakeup latency as an increase in context switches per
second.
Link: https://ozlabs.org/~anton/junkcode/context_switch2.c
with HLT:
-------------------------------
50.0000th percentile : 190184
75.0000th percentile : 191032
90.0000th percentile : 192314
95.0000th percentile : 192520
99.0000th percentile : 192844
MIN : 190148
MAX : 192852
with MWAIT:
-------------------------------
50.0000th percentile : 277444
75.0000th percentile : 278268
90.0000th percentile : 278888
95.0000th percentile : 279164
99.0000th percentile : 280504
MIN : 273278
MAX : 281410
Improvement(99th percentile): ~ 45.46%
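The two "Improvement" figures can be reproduced from the 99th percentile rows.
A small sketch follows; the helper name is hypothetical, and the choice of
baseline is inferred from the numbers rather than stated in the text: both
percentages come out when the smaller value is used as the denominator
(MWAIT latency for the first table, HLT context-switch rate for the second).

```c
/*
 * Percentage change going from `from` to `to`, relative to `from`.
 * Hypothetical helper, not part of the patchset.
 */
static double pct_change(double from, double to)
{
    return (to - from) / from * 100.0;
}

/* Exit latency, 99th percentile: MWAIT 2300 ns vs HLT 2800 ns. */
static double latency_improvement(void)
{
    return pct_change(2300.0, 2800.0);      /* about 21.74 */
}

/* context_switch2, 99th percentile: HLT 192844 vs MWAIT 280504. */
static double throughput_improvement(void)
{
    return pct_change(192844.0, 280504.0);  /* about 45.46 */
}
```

Measured against the HLT latency instead (500 / 2800), the latency reduction
would be about 17.86%.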
A similar trend is observed on older Zen processors.
Here we enable the MWAIT instruction as the default idle call for AMD Zen
processors which support MWAIT. We retain the existing behaviour for older
processors which depend on HLT.
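The selection described above can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not the code in
arch/x86/kernel/process.c: the struct, its field names and the function name
are hypothetical, the bit position of the C1 sub-state count (CPUID leaf 5,
EDX bits 7:4) is an assumption about the CPUID layout, and the real
prefer_mwait_c1_over_halt function may perform additional checks that are not
modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: CPUID leaf 5, EDX bits 7:4 = number of C1 sub-states. */
#define MODEL_MWAIT_C1_SUBSTATE_MASK 0xf0u

/* Hypothetical container for the inputs the decision depends on. */
struct idle_inputs {
    bool nomwait_requested; /* idle=nomwait given on the kernel cmdline */
    bool has_mwait;         /* CPU advertises MONITOR/MWAIT */
    uint32_t leaf5_edx;     /* EDX value returned by CPUID leaf 5 */
};

/*
 * Returns true when MWAIT C1 should be the default idle call and false
 * when the kernel should fall back to HLT. Note: no vendor check.
 */
static bool prefer_mwait_model(const struct idle_inputs *in)
{
    /* The user explicitly disallowed MWAIT. */
    if (in->nomwait_requested)
        return false;

    /* Families without MWAIT (prior to 10h in the text above). */
    if (!in->has_mwait)
        return false;

    /* MWAIT without any C1 sub-state (Families 10h-15h in the text). */
    return (in->leaf5_edx & MODEL_MWAIT_C1_SUBSTATE_MASK) != 0;
}
```

In this model a Family 17h or later part that reports at least one C1
sub-state selects MWAIT, while the same part booted with idle=nomwait
falls back to HLT.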
This patchset restores the decision tree that was present in the kernel earlier
due to Thomas Gleixner's patch: commit 09fd4b4ef5bc ("x86: use cpuid to check
MWAIT support for C1")
NOTE: This change only impacts the default idle behaviour in the absence of a
cpuidle driver. If the cpuidle driver is present, it controls the processor
idle behaviour.
Fixes: commit b253149b843f ("sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance")
Changelog:
v4:
- Update documentation around idle=nomwait
v3:
- Update documentation around idle=nomwait
- Remove unnecessary CPUID check from prefer_mwait_c1_over_halt function
v2:
- Remove vendor checks, fix idle=nomwait condition, fix documentation
Zhang Rui from Intel confirmed that this patchset has no impact on
modern Intel processors.
Wyes Karny (3):
x86: Handle idle=nomwait cmdline properly for x86_idle
x86: Remove vendor checks from prefer_mwait_c1_over_halt
x86: Fix comment for X86_FEATURE_ZEN
Documentation/admin-guide/pm/cpuidle.rst | 15 +++++----
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/mwait.h | 1 +-
arch/x86/kernel/process.c | 41 ++++++++++++++++++-------
4 files changed, 41 insertions(+), 18 deletions(-)
base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a
--
git-series 0.9.1