This is a stripped down version of the patch series.
Goals:
======
1) To provide a boot time option (default off) to enable split lock
detection.
2) To ensure that kernel crashes cleanly if OS code executes an
atomic instruction that crosses cache lines.
3) Enable for some existing CPUs that do not provide CPUID enumeration
of the feature together with architectural enumeration for future CPUs.
Non-goals:
==========
1) Fancy methods to have the kernel recover/continue
2) Selective enabling (it is either "on" or "off").
3) /sys files to enable/disable at run time
4) Virtualization support (guests just SIGBUS)
Access to misaligned data across two cache lines in an atomic instruction
(a.k.a split lock) takes over 1000 extra cycles compared to an atomic
access within one cache line. Split lock degrades performance not only
on the current CPU but also on the whole system because during split lock
the instruction holds bus lock and prohibits any other memory access on
the bus.
Some real time environments cannot meet deadlines if the processor
is handling split locks.
On Intel Tremont and future processors, split lock is detected by
triggering #AC exception after setting bit 29 in the TEST_CTRL
MSR (0x33) [1].
When split lock detection is enabled, if split lock happens in the
kernel, the kernel panics. Otherwise, the user process is killed by
SIGBUS.
To get a split lock free real time system, kernel and user application
developers need to enable split lock detection and find and fix all
possible split lock issues.
The split lock detection is disabled by default because potential split
lock issues can cause kernel panic or kill user processes. It is enabled
only for real time or debug purpose through a kernel parameter
"split_lock_detect".
Enabling split lock detection already finds split lock issues in atomic
bit operations and some of the blocking issues are fixed in the tip tree:
https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/
[1] Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the TEST_CTRL MSR
and the split lock detection bit and how to enumerate the feature.
==Changelog==
v10:
- Reduce the scope of this patch set to real time and debug usage only
because this usage is requested by customers and is easier to be
implemented than enabling the feature by default:
1. Disable split lock detection by default and enable it only for
real time or debug purpose.
2. Kernel panics or kill user process in #AC for split lock
3. Drop KVM and debugfs knobs.
v9:
Address Thomas Gleixner's comments:
- wrmsr() in split_lock_update_msr() to spare RMW
- Print warnings in atomic bit operations xxx_bit() if the address is
unaligned to unsigned long.
- When host enables split lock detection, forcing it enabled for guest.
- Using the msr_test_ctl_mask to decide which bits need to be switched in
atomic_switch_msr_test_ctl().
- Warn if addr is unaligned to unsigned long in atomic ops xxx_bit().
Address Ingo Molnar's comments:
- Follow right MSR register and bits naming convention
- Use right naming convention for variables and functions
- Use split_lock_debug for atomic opertions of WARN_ONCE in #AC handler
and split_lock_detect_wr();
- Move the sysfs interface to debugfs interface /sys/kernel/debug/x86/
split_lock_detect
Other fixes:
- update vmx->msr_test_ctl_mask when changing MSR_IA32_CORE_CAP.
- Support resume from suspend/hibernation
- The split lock fix patch (#0003) for wlcore wireless driver is
upstreamed. So remove the patch from this patch set.
v8:
Address issues pointed out by Thomas Gleixner:
- Remove all "clearcpuid=" related patches.
- Add kernel parameter "nosplit_lock_detect" patch.
- Merge definition and initialization of msr_test_ctl_cache into #AC
handling patch which first uses the variable.
- Add justification for the sysfs knob and combine function and doc
patches into one patch 0015.
- A few other adjustments.
v7:
- Add per cpu variable to cach MSR TEST_CTL. Suggested by Thomas Gleixner.
- Change a few other changes including locking, simplifying code, work
flow, KVM fixes, etc. Suggested by Thomas Gleixner.
- Fix KVM issues pointed out by Sean Christopherson.
v6:
- Fix #AC handler issues pointed out by Dave Hansen
- Add doc for the sysfs interface pointed out by Dave Hansen
- Fix a lock issue around wrmsr during split lock init, pointed out by Dave
Hansen
- Update descriptions and comments suggested by Dave Hansen
- Fix __le32 issue in wlcore raised by Kalle Valo
- Add feature enumeration based on family/model/stepping for Icelake mobile
v5:
- Fix wlcore issue from Paolo Bonzini
- Fix b44 issue from Peter Zijlstra
- Change init sequence by Dave Hansen
- Fix KVM issues from Paolo Bonzini
- Re-order patch sequence
v4:
- Remove "setcpuid=" option
- Enable IA32_CORE_CAPABILITY enumeration for split lock
- Handle CPUID faulting by Peter Zijlstra
- Enable /sys interface to enable/disable split lock detection
v3:
- Handle split lock as suggested by Thomas Gleixner.
- Fix a few potential spit lock issues suggested by Thomas Gleixner.
- Support kernel option "setcpuid=" suggested by Dave Hanson and Thomas
Gleixner.
- Support flag string in "clearcpuid=" suggested by Dave Hanson and
Thomas Gleixner.
v2:
- Remove code that handles split lock issue in firmware and fix
x86_capability issue mainly based on comments from Thomas Gleixner and
Peter Zijlstra.
In previous version:
Comments from Dave Hansen:
- Enumerate feature in X86_FEATURE_SPLIT_LOCK_AC
- Separate #AC handler from do_error_trap
- Use CONFIG to configure inherit BIOS setting, enable, or disable split
lock. Remove kernel parameter "split_lock_ac="
- Change config interface to debugfs from sysfs
- Fix a few bisectable issues
- Other changes.
Comment from Tony Luck and Dave Hansen:
- Dump right information in #AC handler
Comment from Alan Cox and Dave Hansen:
- Description of split lock in patch 0
Others:
- Remove tracing because we can trace split lock in existing
sq_misc.split_lock.
- Add CONFIG to configure either panic or re-execute faulting instruction
for split lock in kernel.
- other minor changes.
Fenghua Yu (6):
x86/msr-index: Add two new MSRs
x86/cpufeatures: Enumerate the IA32_CORE_CAPABILITIES MSR
x86/split_lock: Enumerate split lock detection by the
IA32_CORE_CAPABILITIES MSR
x86/split_lock: Enumerate split lock detection if the
IA32_CORE_CAPABILITIES MSR is not supported
x86/split_lock: Handle #AC exception for split lock
x86/split_lock: Enable split lock detection by kernel parameter
.../admin-guide/kernel-parameters.txt | 10 +++
arch/x86/include/asm/cpu.h | 5 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 8 +++
arch/x86/include/asm/traps.h | 3 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 72 +++++++++++++++++++
arch/x86/kernel/traps.c | 22 +++++-
8 files changed, 123 insertions(+), 1 deletion(-)
--
2.19.1
IA32_CORE_CAPABILITIES(0xCF): Core Capabilities Register
Bit5: #AC(0) exception for split locked accesses supported.
TEST_CTRL(0x33): Test Control Register
Bit29: Enable #AC(0) exception for split locked accesses.
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/msr-index.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6a3124664289..7b25cec494fd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
/* Intel MSRs. Some also available on other CPUs */
+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)
+#define MSR_IA32_CORE_CAPABILITIES 0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
--
2.19.1
Currently Linux does not expect to see an alignment check exception in
kernel mode (since it does not set CR4.AC). The existing #AC handlers
will just return from exception to the faulting instruction which will
trigger another exception.
Add a new handler for #AC exceptions that will force a panic on split
lock for kernel mode.
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/traps.h | 3 +++
arch/x86/kernel/cpu/intel.c | 2 ++
arch/x86/kernel/traps.c | 22 +++++++++++++++++++++-
3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index b25e633033c3..0fa4eef83057 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -172,4 +172,7 @@ enum x86_pf_error_code {
X86_PF_INSTR = 1 << 4,
X86_PF_PK = 1 << 5,
};
+
+extern bool split_lock_detect_enabled;
+
#endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2614616fb6d3..bc0c2f288509 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -32,6 +32,8 @@
#include <asm/apic.h>
#endif
+bool split_lock_detect_enabled;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4bb0f8447112..044033ff4326 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -293,9 +293,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ unsigned int trapnr = X86_TRAP_AC;
+ char str[] = "alignment check";
+ int signr = SIGBUS;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+ return;
+
+ if (!user_mode(regs) && split_lock_detect_enabled)
+ panic("Split lock detected\n");
+
+ cond_local_irq_enable(regs);
+
+ /* Handle #AC generated in any other cases. */
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.19.1
> On Nov 20, 2019, at 5:45 PM, Fenghua Yu <[email protected]> wrote:
>
> Currently Linux does not expect to see an alignment check exception in
> kernel mode (since it does not set CR4.AC). The existing #AC handlers
> will just return from exception to the faulting instruction which will
> trigger another exception.
>
> Add a new handler for #AC exceptions that will force a panic on split
> lock for kernel mode.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> arch/x86/include/asm/traps.h | 3 +++
> arch/x86/kernel/cpu/intel.c | 2 ++
> arch/x86/kernel/traps.c | 22 +++++++++++++++++++++-
> 3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index b25e633033c3..0fa4eef83057 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -172,4 +172,7 @@ enum x86_pf_error_code {
> X86_PF_INSTR = 1 << 4,
> X86_PF_PK = 1 << 5,
> };
> +
> +extern bool split_lock_detect_enabled;
> +
> #endif /* _ASM_X86_TRAPS_H */
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 2614616fb6d3..bc0c2f288509 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -32,6 +32,8 @@
> #include <asm/apic.h>
> #endif
>
> +bool split_lock_detect_enabled;
> +
> /*
> * Just in case our CPU detection goes bad, or you have a weird system,
> * allow a way to override the automatic disabling of MPX.
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 4bb0f8447112..044033ff4326 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -293,9 +293,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
> DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
> DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
> DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
> -DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
> #undef IP
>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + unsigned int trapnr = X86_TRAP_AC;
> + char str[] = "alignment check";
> + int signr = SIGBUS;
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> + return;
> +
> + if (!user_mode(regs) && split_lock_detect_enabled)
> + panic("Split lock detected\n");
NAK.
1. Don’t say “split lock detected” if you don’t know that you detected a split lock. Or is this genuinely the only way to get #AC from kernel mode?
2. Don’t panic. Use die() just like every other error where nothing is corrupted.
And maybe instead turn off split lock detection and print a stack trace instead. Then the kernel is even more likely to survive to log something useful.
On Thu, Nov 21, 2019 at 02:10:38PM -0800, Andy Lutomirski wrote:
>
>
> > On Nov 20, 2019, at 5:45 PM, Fenghua Yu <[email protected]> wrote:
> >
> > + if (!user_mode(regs) && split_lock_detect_enabled)
> > + panic("Split lock detected\n");
>
> NAK.
>
> 1. Don’t say “split lock detected” if you don’t know that you detected a split lock. Or is this genuinely the only way to get #AC from kernel mode?
Intel hardware design team confirmed that the only reason for #AC in ring 0 is
split lock.
>
> 2. Don’t panic. Use die() just like every other error where nothing is corrupted.
Ok. Will change to die() which provides all the trace information and
allow multiple split lock in one boot.
>
> And maybe instead turn off split lock detection and print a stack trace instead. Then the kernel is even more likely to survive to log something useful.
How about we just use simple policy die() in this patch set to allow
detect and debug split lock issues and extend the code base to handle
split lock with different policies (panic, disable split lock, maybe other
options) in the future?
Thanks.
-Fenghua
> On Nov 21, 2019, at 3:02 PM, Fenghua Yu <[email protected]> wrote:
>
> On Thu, Nov 21, 2019 at 02:10:38PM -0800, Andy Lutomirski wrote:
>>
>>
>>>> On Nov 20, 2019, at 5:45 PM, Fenghua Yu <[email protected]> wrote:
>>>
>>> + if (!user_mode(regs) && split_lock_detect_enabled)
>>> + panic("Split lock detected\n");
>>
>> NAK.
>>
>> 1. Don’t say “split lock detected” if you don’t know that you detected a split lock. Or is this genuinely the only way to get #AC from kernel mode?
>
> Intel hardware design team confirmed that the only reason for #AC in ring 0 is
> split lock.
Okay.
This should eventually get integrated with Jann’s decoder work to print the lock address and size.
>
>>
>> 2. Don’t panic. Use die() just like every other error where nothing is corrupted.
>
> Ok. Will change to die() which provides all the trace information and
> allow multiple split lock in one boot.
>
>>
>> And maybe instead turn off split lock detection and print a stack trace instead. Then the kernel is even more likely to survive to log something useful.
>
> How about we just use simple policy die() in this patch set to allow
> detect and debug split lock issues and extend the code base to handle
> split lock with different policies (panic, disable split lock, maybe other
> options) in the future?
>
>
I’m okay with this. Peter?
>