==Introduction==
A split lock is any atomic operation whose operand crosses two cache
lines. Since the operand spans two cache lines and the operation must
be atomic, the system locks the bus while the CPU accesses the two cache
lines.
During bus locking, request from other CPUs or bus agents for control
of the bus are blocked. Blocking bus access from other CPUs plus
overhead of configuring bus locking protocol degrade not only performance
on one CPU but also overall system performance.
If the operand is cacheable and completely contained in one cache line,
the atomic operation is optimized by less expensive cache locking on
Intel P6 and recent processors. If a split lock operation is detected
and a developer fixes the issue so that the operand can be operated in one
cache line, cache locking instead of more expensive bus locking will be
used for the atomic operation. Removing the split lock can improve overall
performance.
Instructions that may cause split lock issue include lock add, lock btc,
xchg, lsl, far call, ltr, etc.
More information about split lock, bus locking, and cache locking can be
found in the latest Intel 64 and IA-32 Architecture Software Developer's
Manual.
==Split lock detection==
Currently Linux can trace split lock event counter sq_misc.split_lock
for debug purpose. But for system deployed in the field, this event
counter after the fact is insufficient. We need a mechanism that
detects split lock before it happens to ensure that bus lock is never
incurred due to split lock.
Intel introduces a mechanism to detect split lock via Alignment Check
(#AC) exception before badly aligned atomic instructions might impact
whole system performance in Tremont and other future processors.
This capability is critical for real time system designers who build
consolidated real time systems. These systems run hard real time
code on some cores and run "untrusted" user processes on some
other cores. The hard real time cannot afford to have any bus lock from
the untrusted processes to hurt real time performance. To date the
designers have been unable to deploy these solutions as they have no
way to prevent the "untrusted" user code from generating split lock and
bus lock to block the hard real time code to access memory during bus
locking.
This capability may also find usage in cloud. A user process with split
lock running in one guest can block other cores from accessing shared
memory during its split locked memory access. That may cause overall
system performance degradation.
Split lock may open a security hole where malicious user code may slow
down overall system by executing instructions with split lock.
==Enumerate split lock detection feature==
A control bit (bit 29) in MSR_TEST_CTL (0x33) will be introduced in
future x86 processors. When the bit 29 is set, the processor causes
#AC exception for split locked accesses at all CPL.
The split lock detection feature is enumerated through bit 5 in
MSR_IA32_CORE_CAPABILITY (0xcf). The MSR 0xcf itself is enumerated by
CPUID.(EAX=0x7,ECX=0):EDX[30].
The enumeration method is published in the latest Intel 64 and IA-32
Architecture Software Developer's Manual.
A few processors have the split lock detection feature. But they don't
have MSR_IA32_CORE_CAPABILITY to enumerate the feature. On those
processors, enumerate the split lock detection feature based on their
family/model/stepping numbers.
==Handle split lock===
There may be different considerations to handle split lock, e.g. how
to handle split lock issue in firmware after kernel enables the feature.
But this patch set uses a simple way to handle split lock which is
suggested by Thomas Gleixner and Dave Hansen:
- If split lock happens in kernel, a warning is issued and split lock
detection is disabled on the current CPU. The split lock issue should
be fixed in kernel.
- If split lock happens in user process, the process is killed by
SIGBUS. Unless the issue is fixed, the process cannot run in the
system.
- If split lock happens in firmware, system may hang in firmware. The
issue should be fixed in firmware.
- Enable split lock detection by default once the feature is enumerated.
- Disable split lock detection by "clearcpuid=split_lock_detect" during
boot time.
- Disable/enable split lock detection by interface
/sys/devices/system/cpu/split_lock_detect during run time.
==Expose to guest==
To expose this feature to guest, need to
1. Report the new CPUID bit to guest.
2. Emulate IA32_CORE_CAPABILITIES MSR.
3. Emulate TEST_CTL MSR. Since this patch series enable split lock
detection in host kernel by default, if do not emulate MSR_TEST_CTL
for guest, guest will run with the value set by host without knowing
that. So guest will run with split lock detection enabled due to the
host's value. Thus guest running with buggy firmware and old kernel
will fail because they lack the ability to handle #AC for split lock.
So need to emulate MSR_TEST_CTL and separate its value between host
and guest.
==Patches==
Patch 1-4: Fix a few existing split lock issues.
Patch 5-9: Enumerate features and define MSR_TEST_CTL.
Patch 10: Handle #AC for split lock
Patch 11-12: Enable split lock detection in KVM.
Patch 13: Enable #AC for split lock by default after #AC handler and KVM
are installed
Patch 14-15: Define a sysfs interface to enable/disable split lock
detection
Patch 16-20: Extend "clearcpuid=" option.
==Changelog==
v6:
- Fix #AC handler issues pointed out by Dave Hansen
- Add doc for the sysfs interface pointed out by Dave Hansen
- Fix a lock issue around wrmsr during split lock init, pointed out by Dave
Hansen
- Update descriptions and comments suggested by Dave Hansen
- Fix __le32 issue in wlcore raised by Kalle Valo
- Add feature enumeration based on family/model/stepping for Icelake mobile
v5:
- Fix wlcore issue from Paolo Bonzini
- Fix b44 issue from Peter Zijlstra
- Change init sequence by Dave Hansen
- Fix KVM issues from Paolo Bonzini
- Re-order patch sequence
v4:
- Remove "setcpuid=" option
- Enable IA32_CORE_CAPABILITY enumeration for split lock
- Handle CPUID faulting by Peter Zijlstra
- Enable /sys interface to enable/disable split lock detection
v3:
- Handle split lock as suggested by Thomas Gleixner.
- Fix a few potential spit lock issues suggested by Thomas Gleixner.
- Support kernel option "setcpuid=" suggested by Dave Hanson and Thomas
Gleixner.
- Support flag string in "clearcpuid=" suggested by Dave Hanson and
Thomas Gleixner.
v2:
- Remove code that handles split lock issue in firmware and fix
x86_capability issue mainly based on comments from Thomas Gleixner and
Peter Zijlstra.
In previous version:
Comments from Dave Hansen:
- Enumerate feature in X86_FEATURE_SPLIT_LOCK_AC
- Separate #AC handler from do_error_trap
- Use CONFIG to configure inherit BIOS setting, enable, or disable split
lock. Remove kernel parameter "split_lock_ac="
- Change config interface to debugfs from sysfs
- Fix a few bisectable issues
- Other changes.
Comment from Tony Luck and Dave Hansen:
- Dump right information in #AC handler
Comment from Alan Cox and Dave Hansen:
- Description of split lock in patch 0
Others:
- Remove tracing because we can trace split lock in existing
sq_misc.split_lock.
- Add CONFIG to configure either panic or re-execute faulting instruction
for split lock in kernel.
- other minor changes.
Fenghua Yu (15):
x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long
x86/split_lock: Align x86_capability to unsigned long to avoid split
locked access
x86/msr-index: Define MSR_IA32_CORE_CAPABILITY and split lock
detection bit
x86/cpufeatures: Enumerate MSR_IA32_CORE_CAPABILITY
x86/split_lock: Enumerate split lock detection by
MSR_IA32_CORE_CAPABILITY
x86/split_lock: Enumerate split lock detection on Icelake mobile
processor
x86/split_lock: Define MSR_TEST_CTL register
x86/split_lock: Handle #AC exception for split lock
x86/split_lock: Enable split lock detection by default
x86/split_lock: Add a sysfs interface to enable/disable split lock
detection during run time
x86/split_lock: Document the new sysfs file for split lock detection
x86/clearcpuid: Support multiple clearcpuid options
x86/clearcpuid: Support feature flag string in kernel option
clearcpuid
x86/clearcpuid: Apply cleared feature bits that are forced set before
x86/clearcpuid: Change document for kernel option clearcpuid
Paolo Bonzini (1):
wlcore: simplify/fix/optimize reg_ch_conf_pending operations
Peter Zijlstra (2):
drivers/net/b44: Align pwol_mask to unsigned long for better
performance
x86/clearcpuid: Clear CPUID bit in CPUID faulting
Xiaoyao Li (2):
kvm/x86: Emulate MSR IA32_CORE_CAPABILITY
kvm/vmx: Emulate MSR TEST_CTL
.../ABI/testing/sysfs-devices-system-cpu | 15 ++
.../admin-guide/kernel-parameters.txt | 18 +-
arch/x86/include/asm/cmdline.h | 3 +
arch/x86/include/asm/cpu.h | 7 +
arch/x86/include/asm/cpufeature.h | 5 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/msr-index.h | 7 +
arch/x86/include/asm/processor.h | 4 +-
arch/x86/kernel/cpu/common.c | 14 +-
arch/x86/kernel/cpu/cpuid-deps.c | 157 ++++++++++----
arch/x86/kernel/cpu/intel.c | 195 +++++++++++++++++-
arch/x86/kernel/cpu/scattered.c | 17 ++
arch/x86/kernel/fpu/init.c | 33 ++-
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 54 ++++-
arch/x86/kvm/cpuid.c | 6 +
arch/x86/kvm/vmx/vmx.c | 35 ++++
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/x86.c | 39 ++++
arch/x86/lib/cmdline.c | 17 +-
drivers/net/ethernet/broadcom/b44.c | 4 +-
drivers/net/wireless/ti/wlcore/cmd.c | 15 +-
drivers/net/wireless/ti/wlcore/wlcore.h | 4 +-
24 files changed, 577 insertions(+), 80 deletions(-)
--
2.19.1
Since kernel option clearcpuid now supports multiple options and CPU
capability flags, the document needs to be changed.
Signed-off-by: Fenghua Yu <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90bb644..0cbeda6d7f16 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -563,17 +563,21 @@
loops can be debugged more effectively on production
systems.
- clearcpuid=BITNUM [X86]
+ clearcpuid=BITNUM | FLAG [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
- numbers. Note the Linux specific bits are not necessarily
+ numbers or /proc/cpuinfo for valid CPU flags.
+ Multiple options can be used to disable a few features.
+ This prevents the feature from being used by the
+ kernel or shown in /proc/cpuinfo or shown in CPUID
+ called directly by user programs.
+ A few notes:
+ - The Linux specific bits are not necessarily
stable over kernel options, but the vendor specific
ones should be.
- Also note that user programs calling CPUID directly
- or using the feature without checking anything
- will still see it. This just prevents it from
- being used by the kernel or shown in /proc/cpuinfo.
- Also note the kernel might malfunction if you disable
+ - User programs using the feature without checking
+ anything will still use it.
+ - The kernel might malfunction if you disable
some critical bits.
cma=nn[MG]@[start[MG][-end[MG]]]
--
2.19.1
There may be different considerations on how to handle #AC for split lock,
e.g. how to handle system hang caused by split lock issue in firmware,
how to emulate faulting instruction, etc. We use a simple method to
handle user and kernel split lock and may extend the method in the future.
When #AC exception for split lock is triggered from user process, the
process is killed by SIGBUS. To execute the process properly, a user
application developer needs to fix the split lock issue.
When #AC exception for split lock is triggered from a kernel instruction,
disable split lock detection on local CPU and warn the split lock issue.
After the exception, the faulting instruction will be executed and kernel
execution continues. Split lock detection is only disabled on the local
CPU, not globally. It will be re-enabled if the CPU is offline and then
online or through sysfs interface.
A kernel/driver developer should check the warning, which contains helpful
faulting address, context, and callstack info, and fix the split lock
issues. Then further split lock issues may be captured and fixed.
After bit 29 in MSR_TEST_CTL is set to 1 in kernel, firmware inherits
the setting when firmware is executed in S4, S5, run time services, SMI,
etc. If there is a split lock operation in firmware, it will triggers
#AC and may hang the system depending on how firmware handles the #AC.
It's up to a firmware developer to fix split lock issues in firmware.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/traps.c | 43 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d26f9e9c3d83..0ac992bfa287 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
#include <asm/mpx.h>
#include <asm/vm86.h>
#include <asm/umip.h>
+#include <asm/cpu.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -293,9 +294,49 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ unsigned int trapnr = X86_TRAP_AC;
+ char str[] = "alignment check";
+ int signr = SIGBUS;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ /*
+ * WARN*()s end up here; fix them up before we call the
+ * notifier chain.
+ */
+ if (!user_mode(regs) && fixup_bug(regs, trapnr))
+ return;
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) ==
+ NOTIFY_STOP)
+ return;
+
+ cond_local_irq_enable(regs);
+ if (!user_mode(regs) &&
+ static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+ /*
+ * Only split lock can generate #AC from kernel at this point.
+ * Warn and disable split lock detection on this CPU. The
+ * faulting instruction will be executed without generating
+ * another #AC fault. User needs to check the warning and
+ * fix the split lock issue in the faulting instruction.
+ */
+ msr_clear_bit(MSR_TEST_CTL,
+ TEST_CTL_ENABLE_SPLIT_LOCK_DETECT_SHIFT);
+ WARN_ONCE(1, "split lock operation detected\n");
+
+ return;
+ }
+
+ /* Handle #AC generated in any other cases. */
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.19.1
From: Xiaoyao Li <[email protected]>
A control bit (bit 29) in TEST_CTL MSR 0x33 will be introduced in
future x86 processors. When bit 29 is set, the processor causes #AC
exception for split locked accesses at all CPL.
Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and
the split lock bit.
This patch emulate MSR TEST_CTL with vmx->msr_test_ctl and does the
following:
1. As MSR TEST_CTL of guest is emulated, enable the related bits
in CORE_CAPABILITY to corretly report this feature to guest.
2. Differentiate MSR TEST_CTL between host and guest.
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/x86.c | 17 ++++++++++++++++-
3 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ab432a930ae8..309ccf593f0d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
u32 index;
switch (msr_info->index) {
+ case MSR_TEST_CTL:
+ if (!msr_info->host_initiated &&
+ !(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
+ return 1;
+ msr_info->data = vmx->msr_test_ctl;
+ break;
#ifdef CONFIG_X86_64
case MSR_FS_BASE:
msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1797,6 +1803,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
u32 index;
switch (msr_index) {
+ case MSR_TEST_CTL:
+ if (!(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
+ return 1;
+
+ if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
+ return 1;
+ vmx->msr_test_ctl = data;
+ break;
case MSR_EFER:
ret = kvm_set_msr_common(vcpu, msr_info);
break;
@@ -4077,6 +4091,9 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
+ /* disable AC split lock by default */
+ vmx->msr_test_ctl = 0;
+
vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
/* 22.2.1, 20.8.1 */
@@ -4114,6 +4131,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->rmode.vm86_active = 0;
vmx->spec_ctrl = 0;
+ vmx->msr_test_ctl = 0;
vcpu->arch.microcode_version = 0x100000000ULL;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
@@ -6313,6 +6331,21 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host, false);
}
+static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
+{
+ u64 host_msr_test_ctl;
+
+ /* if TEST_CTL MSR doesn't exist on the hardware, do nothing */
+ if (rdmsrl_safe(MSR_TEST_CTL, &host_msr_test_ctl))
+ return;
+
+ if (host_msr_test_ctl == vmx->msr_test_ctl)
+ clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
+ else
+ add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
+ host_msr_test_ctl, false);
+}
+
static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
{
vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
@@ -6419,6 +6452,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
atomic_switch_perf_msrs(vmx);
+ atomic_switch_msr_test_ctl(vmx);
+
vmx_update_hv_timer(vcpu);
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a1e00d0a2482..6091a8b9de74 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -190,6 +190,7 @@ struct vcpu_vmx {
u64 msr_guest_kernel_gs_base;
#endif
+ u64 msr_test_ctl;
u64 spec_ctrl;
u32 vm_entry_controls_shadow;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4459115eb0ec..e93c2f620cdb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1229,7 +1229,22 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
u64 kvm_get_core_capability(void)
{
- return 0;
+ u64 data;
+
+ rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
+
+ /* mask non-virtualizable functions */
+ data &= CORE_CAP_SPLIT_LOCK_DETECT;
+
+ /*
+ * There will be a list of FMS values that have split lock detection
+ * but lack the CORE CAPABILITY MSR. In this case, set
+ * CORE_CAP_SPLIT_LOCK_DETECT since we emulate MSR CORE_CAPABILITY.
+ */
+ if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+ data |= CORE_CAP_SPLIT_LOCK_DETECT;
+
+ return data;
}
EXPORT_SYMBOL_GPL(kvm_get_core_capability);
--
2.19.1
A split locked access locks bus and degrades overall memory access
performance. When split lock detection feature is enumerated, enable
the feature by default to find any split lock issue and then fix
the issue.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 42 +++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 7f6943af35dc..ae3e327d5e35 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -31,6 +31,12 @@
#include <asm/apic.h>
#endif
+#define DISABLE_SPLIT_LOCK_DETECT 0
+#define ENABLE_SPLIT_LOCK_DETECT 1
+
+static DEFINE_MUTEX(split_lock_detect_mutex);
+static int split_lock_detect_val;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -161,10 +167,45 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
return false;
}
+static u32 new_sp_test_ctl_val(u32 test_ctl_val)
+{
+ /* Change the split lock setting. */
+ if (READ_ONCE(split_lock_detect_val) == DISABLE_SPLIT_LOCK_DETECT)
+ test_ctl_val &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
+ else
+ test_ctl_val |= TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
+
+ return test_ctl_val;
+}
+
+static inline void show_split_lock_detection_info(void)
+{
+ if (READ_ONCE(split_lock_detect_val))
+ pr_info_once("x86/split_lock: split lock detection enabled\n");
+ else
+ pr_info_once("x86/split_lock: split lock detection disabled\n");
+}
+
+static void init_split_lock_detect(struct cpuinfo_x86 *c)
+{
+ if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
+ u32 l, h;
+
+ mutex_lock(&split_lock_detect_mutex);
+ rdmsr(MSR_TEST_CTL, l, h);
+ l = new_sp_test_ctl_val(l);
+ wrmsr(MSR_TEST_CTL, l, h);
+ show_split_lock_detection_info();
+ mutex_unlock(&split_lock_detect_mutex);
+ }
+}
+
static void early_init_intel(struct cpuinfo_x86 *c)
{
u64 misc_enable;
+ init_split_lock_detect(c);
+
/* Unmask CPUID levels if masked: */
if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
@@ -1032,6 +1073,7 @@ cpu_dev_register(intel_cpu_dev);
static void __init set_split_lock_detect(void)
{
setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+ split_lock_detect_val = 1;
}
void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
--
2.19.1
Icelake mobile processor can detect split lock operations although
the processor doesn't have MSR_IA32_CORE_CAPABILITY and split lock
detection bit in the MSR. Set split lock detection feature bit
X86_FEATURE_SPLIT_LOCK_DETECT on the processor based on its
family/model/stepping.
A few other processors may also have the feature but don't have
MSR_IA32_CORE_CAPABILITY. The feature will be enumerated on those
processors once their family/model/stepping information is released.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ad3f72d106fc..7f6943af35dc 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1038,8 +1038,18 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
{
u64 ia32_core_cap = 0;
- if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY))
+ if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY)) {
+ /*
+ * The following processors have split lock detection feature.
+ * But since they don't have MSR_IA32_CORE_CAPABILITY, the
+ * feature cannot be enumerated by the MSR. So enumerate the
+ * feature by family/model/stepping.
+ */
+ if (c->x86 == 6 && c->x86_model == INTEL_FAM6_ICELAKE_MOBILE)
+ set_split_lock_detect();
+
return;
+ }
/*
* If MSR_IA32_CORE_CAPABILITY exists, enumerate features that are
--
2.19.1
Setting bit 29 in MSR_TEST_CTL register 0x33 enables split lock detection
and clearing the bit disables split lock detection.
Define the MSR and the bit. The definitions will be used in enabling or
disabling split lock detection.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/msr-index.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f65ef6f783d2..25fa808de9e2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -39,6 +39,10 @@
/* Intel MSRs. Some also available on other CPUs */
+#define MSR_TEST_CTL 0x00000033
+#define TEST_CTL_ENABLE_SPLIT_LOCK_DETECT_SHIFT 29
+#define TEST_CTL_ENABLE_SPLIT_LOCK_DETECT BIT(29)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
--
2.19.1
A new MSR_IA32_CORE_CAPABILITY (0xcf) is defined. Each bit in the MSR
enumerates a model specific feature. Currently bit 5 enumerates split
lock detection. When bit 5 is 1, split lock detection is supported.
When the bit is 0, split lock detection is not supported.
Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and the
split lock detection bit.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/msr-index.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ca5bc0eacb95..f65ef6f783d2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -59,6 +59,9 @@
#define MSR_PLATFORM_INFO_CPUID_FAULT_BIT 31
#define MSR_PLATFORM_INFO_CPUID_FAULT BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
+#define MSR_IA32_CORE_CAPABILITY 0x000000cf
+#define CORE_CAP_SPLIT_LOCK_DETECT BIT(5) /* Detect split lock */
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
--
2.19.1
MSR_IA32_CORE_CAPABILITY (0xcf) contains bits that enumerate some model
specific features.
The MSR 0xcf itself is enumerated by CPUID.(EAX=0x7,ECX=0):EDX[30].
When this CPUID bit is 1, the MSR 0xcf exists.
Detailed information on the CPUID bit and the MSR can be found in the
latest Intel 64 and IA-32 Architectures Software Developer's Manual.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 981ff9479648..eff25e2015a5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -350,6 +350,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITY (18*32+30) /* "" IA32_CORE_CAPABILITY MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */
/*
--
2.19.1
Bits in MSR_IA32_CORE_CAPABILITY enumerate a few features that are not
enumerated through CPUID. Currently bit 5 is defined to enumerate
feature of split lock detection. All other bits are reserved now.
When bit 5 is 1, the feature is supported and feature bit
X86_FEATURE_SPLIT_LOCK_DETECT is set. Otherwise, the feature is not
available.
The MSR_IA32_CORE_CAPABILITY itself is enumerated by
CPUID.(EAX=0x7,ECX=0):EDX[30].
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 5 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/cpuid-deps.c | 79 +++++++++++++++---------------
arch/x86/kernel/cpu/intel.c | 21 ++++++++
5 files changed, 69 insertions(+), 39 deletions(-)
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..4e03f53fc079 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,9 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index eff25e2015a5..db0c1826d7ad 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -221,6 +221,7 @@
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3716e2bb028b..bbdd69dd4f5f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1105,6 +1105,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
cpu_set_bug_bits(c);
+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..3d633f67fbd7 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -20,45 +20,46 @@ struct cpuid_dep {
* but it's difficult to tell that to the init reference checker.
*/
static const struct cpuid_dep cpuid_deps[] = {
- { X86_FEATURE_XSAVEOPT, X86_FEATURE_XSAVE },
- { X86_FEATURE_XSAVEC, X86_FEATURE_XSAVE },
- { X86_FEATURE_XSAVES, X86_FEATURE_XSAVE },
- { X86_FEATURE_AVX, X86_FEATURE_XSAVE },
- { X86_FEATURE_PKU, X86_FEATURE_XSAVE },
- { X86_FEATURE_MPX, X86_FEATURE_XSAVE },
- { X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
- { X86_FEATURE_FXSR_OPT, X86_FEATURE_FXSR },
- { X86_FEATURE_XMM, X86_FEATURE_FXSR },
- { X86_FEATURE_XMM2, X86_FEATURE_XMM },
- { X86_FEATURE_XMM3, X86_FEATURE_XMM2 },
- { X86_FEATURE_XMM4_1, X86_FEATURE_XMM2 },
- { X86_FEATURE_XMM4_2, X86_FEATURE_XMM2 },
- { X86_FEATURE_XMM3, X86_FEATURE_XMM2 },
- { X86_FEATURE_PCLMULQDQ, X86_FEATURE_XMM2 },
- { X86_FEATURE_SSSE3, X86_FEATURE_XMM2, },
- { X86_FEATURE_F16C, X86_FEATURE_XMM2, },
- { X86_FEATURE_AES, X86_FEATURE_XMM2 },
- { X86_FEATURE_SHA_NI, X86_FEATURE_XMM2 },
- { X86_FEATURE_FMA, X86_FEATURE_AVX },
- { X86_FEATURE_AVX2, X86_FEATURE_AVX, },
- { X86_FEATURE_AVX512F, X86_FEATURE_AVX, },
- { X86_FEATURE_AVX512IFMA, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512PF, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512ER, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512CD, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512DQ, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512BW, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512VL, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512VBMI, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512_VBMI2, X86_FEATURE_AVX512VL },
- { X86_FEATURE_GFNI, X86_FEATURE_AVX512VL },
- { X86_FEATURE_VAES, X86_FEATURE_AVX512VL },
- { X86_FEATURE_VPCLMULQDQ, X86_FEATURE_AVX512VL },
- { X86_FEATURE_AVX512_VNNI, X86_FEATURE_AVX512VL },
- { X86_FEATURE_AVX512_BITALG, X86_FEATURE_AVX512VL },
- { X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F },
- { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
+ { X86_FEATURE_XSAVEOPT, X86_FEATURE_XSAVE },
+ { X86_FEATURE_XSAVEC, X86_FEATURE_XSAVE },
+ { X86_FEATURE_XSAVES, X86_FEATURE_XSAVE },
+ { X86_FEATURE_AVX, X86_FEATURE_XSAVE },
+ { X86_FEATURE_PKU, X86_FEATURE_XSAVE },
+ { X86_FEATURE_MPX, X86_FEATURE_XSAVE },
+ { X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_FXSR_OPT, X86_FEATURE_FXSR },
+ { X86_FEATURE_XMM, X86_FEATURE_FXSR },
+ { X86_FEATURE_XMM2, X86_FEATURE_XMM },
+ { X86_FEATURE_XMM3, X86_FEATURE_XMM2 },
+ { X86_FEATURE_XMM4_1, X86_FEATURE_XMM2 },
+ { X86_FEATURE_XMM4_2, X86_FEATURE_XMM2 },
+ { X86_FEATURE_XMM3, X86_FEATURE_XMM2 },
+ { X86_FEATURE_PCLMULQDQ, X86_FEATURE_XMM2 },
+ { X86_FEATURE_SSSE3, X86_FEATURE_XMM2, },
+ { X86_FEATURE_F16C, X86_FEATURE_XMM2, },
+ { X86_FEATURE_AES, X86_FEATURE_XMM2 },
+ { X86_FEATURE_SHA_NI, X86_FEATURE_XMM2 },
+ { X86_FEATURE_FMA, X86_FEATURE_AVX },
+ { X86_FEATURE_AVX2, X86_FEATURE_AVX, },
+ { X86_FEATURE_AVX512F, X86_FEATURE_AVX, },
+ { X86_FEATURE_AVX512IFMA, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512PF, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512ER, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512CD, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512DQ, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512BW, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512VL, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512VBMI, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512_VBMI2, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_GFNI, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_VAES, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_VPCLMULQDQ, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_AVX512_VNNI, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_AVX512_BITALG, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F },
+ { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
+ { X86_FEATURE_SPLIT_LOCK_DETECT, X86_FEATURE_CORE_CAPABILITY},
{}
};
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fc3c07fe7df5..ad3f72d106fc 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1029,3 +1029,24 @@ static const struct cpu_dev intel_cpu_dev = {
cpu_dev_register(intel_cpu_dev);
+static void __init set_split_lock_detect(void)
+{
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+}
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_cap = 0;
+
+ if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY))
+ return;
+
+ /*
+ * If MSR_IA32_CORE_CAPABILITY exists, enumerate features that are
+ * reported in the MSR.
+ */
+ rdmsrl(MSR_IA32_CORE_CAPABILITY, ia32_core_cap);
+
+ if (ia32_core_cap & CORE_CAP_SPLIT_LOCK_DETECT)
+ set_split_lock_detect();
+}
--
2.19.1
cpu_caps_cleared and cpu_caps_set may not be aligned to unsigned long.
Atomic operations (i.e. set_bit and clear_bit) on the bitmaps may access
two cache lines (a.k.a. split lock) and lock the bus to block all memory
accesses from other processors to ensure atomicity.
To avoid the overall performance degradation from the bus locking, align
the two variables to unsigned long.
Defining the variables as unsigned long may also fix the issue because
they are naturally aligned to unsigned long. But that needs additional
code changes. Adding __aligned(unsigned long) is a simpler fix.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/common.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..3716e2bb028b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -488,8 +488,9 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
return NULL; /* Not found */
}
-__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
-__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
+/* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
+__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
void load_percpu_segment(int cpu)
{
--
2.19.1
From: Paolo Bonzini <[email protected]>
Bitmaps are defined on unsigned longs, so the usage of u32[2] in the
wlcore driver is incorrect. As noted by Peter Zijlstra, casting arrays
to a bitmap is incorrect for big-endian architectures.
When looking at it I observed that:
- operations on reg_ch_conf_pending is always under the wl_lock mutex,
so set_bit is overkill
- the only case where reg_ch_conf_pending is accessed a u32 at a time is
unnecessary too.
This patch cleans up everything in this area, and changes tmp_ch_bitmap
to have the proper alignment.
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
drivers/net/wireless/ti/wlcore/cmd.c | 15 ++++++---------
drivers/net/wireless/ti/wlcore/wlcore.h | 4 ++--
2 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
index 348be0aed97e..0415a064f6e2 100644
--- a/drivers/net/wireless/ti/wlcore/cmd.c
+++ b/drivers/net/wireless/ti/wlcore/cmd.c
@@ -1700,14 +1700,14 @@ void wlcore_set_pending_regdomain_ch(struct wl1271 *wl, u16 channel,
ch_bit_idx = wlcore_get_reg_conf_ch_idx(band, channel);
if (ch_bit_idx >= 0 && ch_bit_idx <= WL1271_MAX_CHANNELS)
- set_bit(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
+ __set_bit_le(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
}
int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
{
struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
int ret = 0, i, b, ch_bit_idx;
- u32 tmp_ch_bitmap[2];
+ __le32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
struct wiphy *wiphy = wl->hw->wiphy;
struct ieee80211_supported_band *band;
bool timeout = false;
@@ -1717,7 +1717,7 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
wl1271_debug(DEBUG_CMD, "cmd reg domain config");
- memset(tmp_ch_bitmap, 0, sizeof(tmp_ch_bitmap));
+ memcpy(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
for (b = NL80211_BAND_2GHZ; b <= NL80211_BAND_5GHZ; b++) {
band = wiphy->bands[b];
@@ -1738,13 +1738,10 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
if (ch_bit_idx < 0)
continue;
- set_bit(ch_bit_idx, (long *)tmp_ch_bitmap);
+ __set_bit_le(ch_bit_idx, (long *)tmp_ch_bitmap);
}
}
- tmp_ch_bitmap[0] |= wl->reg_ch_conf_pending[0];
- tmp_ch_bitmap[1] |= wl->reg_ch_conf_pending[1];
-
if (!memcmp(tmp_ch_bitmap, wl->reg_ch_conf_last, sizeof(tmp_ch_bitmap)))
goto out;
@@ -1754,8 +1751,8 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
goto out;
}
- cmd->ch_bit_map1 = cpu_to_le32(tmp_ch_bitmap[0]);
- cmd->ch_bit_map2 = cpu_to_le32(tmp_ch_bitmap[1]);
+ cmd->ch_bit_map1 = tmp_ch_bitmap[0];
+ cmd->ch_bit_map2 = tmp_ch_bitmap[1];
cmd->dfs_region = wl->dfs_region;
wl1271_debug(DEBUG_CMD,
diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
index dd14850b0603..870eea3e7a27 100644
--- a/drivers/net/wireless/ti/wlcore/wlcore.h
+++ b/drivers/net/wireless/ti/wlcore/wlcore.h
@@ -320,9 +320,9 @@ struct wl1271 {
bool watchdog_recovery;
/* Reg domain last configuration */
- u32 reg_ch_conf_last[2] __aligned(8);
+ DECLARE_BITMAP(reg_ch_conf_last, 64);
/* Reg domain pending configuration */
- u32 reg_ch_conf_pending[2];
+ DECLARE_BITMAP(reg_ch_conf_pending, 64);
/* Pointer that holds DMA-friendly block for the mailbox */
void *mbox;
--
2.19.1
From: Peter Zijlstra <[email protected]>
A bit in pwol_mask is set in b44_magic_pattern by atomic set_bit.
But since pwol_mask is local and never exposed to concurrency, there is
no need to set bit in pwol_mask atomically.
set_bit sets the bit in a single unsigned long location. Because pwol_mask
may not be aligned to unsigned long, the location may cross two cache
lines. On x86, accessing two cache lines in locked instruction in set_bit
is called split lock and can cause overall performance degradation.
So use non atomic __set_bit to set pwol_mask bits. __set_bit won't hit
split lock issue on x86.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
drivers/net/ethernet/broadcom/b44.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 97ab0dd25552..5738ab963dfb 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
memset(ppattern + offset, 0xff, magicsync);
for (j = 0; j < magicsync; j++)
- set_bit(len++, (unsigned long *) pmask);
+ __set_bit(len++, (unsigned long *)pmask);
for (j = 0; j < B44_MAX_PATTERNS; j++) {
if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
@@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
for (k = 0; k< ethaddr_bytes; k++) {
ppattern[offset + magicsync +
(j * ETH_ALEN) + k] = macaddr[k];
- set_bit(len++, (unsigned long *) pmask);
+ __set_bit(len++, (unsigned long *)pmask);
}
}
return len - 1;
--
2.19.1
set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
operate on bitmap defined in x86_capability.
Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
the location is at:
base address of x86_capability + (bit offset in x86_capability / 64) * 8
Since base address of x86_capability may not be aligned to unsigned long,
the single unsigned long location may cross two cache lines and
accessing the location by locked BTS/BTR introductions will trigger #AC.
To fix the split lock issue, align x86_capability to unsigned long so that
the location will be always within one cache line.
Changing x86_capability[]'s type to unsigned long may also fix the issue
because x86_capability[] will be naturally aligned to unsigned long. But
this needs additional code changes. So choose the simpler solution
by setting alignment to size of unsigned long.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/processor.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2bb3a648fc12..7c62b9ad6e5a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,9 @@ struct cpuinfo_x86 {
__u32 extended_cpuid_level;
/* Maximum supported CPUID level, -1=no CPUID: */
int cpuid_level;
- __u32 x86_capability[NCAPINTS + NBUGINTS];
+ /* Aligned to size of unsigned long to avoid split lock in atomic ops */
+ __u32 x86_capability[NCAPINTS + NBUGINTS]
+ __aligned(sizeof(unsigned long));
char x86_vendor_id[16];
char x86_model_id[64];
/* in KB - valid for CPUS which support this call: */
--
2.19.1
The interface /sys/device/system/cpu/split_lock_detect is added
to allow user to control split lock detection and show current split
lock detection setting.
Writing 1 to the file enables split lock detection and writing 0
disables split lock detection. Split lock detection is enabled or
disabled on all CPUs.
Reading the file returns current global split lock detection setting:
0: disabled
1: enabled
Please note the interface only shows global setting. During run time,
split lock detection on one CPU may be disabled if split lock
in kernel code happens on the CPU. The interface doesn't show split
lock detection status on individual CPU. User can run rdmsr on 0x33
to check split lock detection on each CPU.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 66 +++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ae3e327d5e35..166033fa8423 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1102,3 +1102,69 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
if (ia32_core_cap & CORE_CAP_SPLIT_LOCK_DETECT)
set_split_lock_detect();
}
+
+static ssize_t
+split_lock_detect_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", READ_ONCE(split_lock_detect_val));
+}
+
+static ssize_t
+split_lock_detect_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ u32 val, l, h;
+ int cpu, ret;
+
+ ret = kstrtou32(buf, 10, &val);
+ if (ret)
+ return ret;
+
+ if (val != DISABLE_SPLIT_LOCK_DETECT && val != ENABLE_SPLIT_LOCK_DETECT)
+ return -EINVAL;
+
+ /*
+ * Since split lock could be disabled by kernel #AC handler or user
+ * may directly change bit 29 in MSR_TEST_CTL, split lock setting on
+ * each CPU may be different from global setting split_lock_detect_val
+ * by now. Update MSR on each CPU, so all of CPUs will have same split
+ * lock setting.
+ */
+ mutex_lock(&split_lock_detect_mutex);
+
+ WRITE_ONCE(split_lock_detect_val, val);
+
+ /*
+ * Get MSR_TEST_CTL on this CPU, assuming all CPUs have same value
+ * in the MSR except split lock detection bit (bit 29).
+ */
+ rdmsr(MSR_TEST_CTL, l, h);
+ l = new_sp_test_ctl_val(l);
+ /* Update the split lock detection setting on all online CPUs. */
+ for_each_online_cpu(cpu)
+ wrmsr_on_cpu(cpu, MSR_TEST_CTL, l, h);
+
+ mutex_unlock(&split_lock_detect_mutex);
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(split_lock_detect);
+
+static int __init split_lock_init(void)
+{
+ int ret;
+
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+ return -ENODEV;
+
+ ret = device_create_file(cpu_subsys.dev_root,
+ &dev_attr_split_lock_detect);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+subsys_initcall(split_lock_init);
--
2.19.1
Currently only one kernel option "clearcpuid=" can be picked up by
kernel during boot time.
In some cases, user may want to clear a few cpu caps. This may be
useful to replace endless (new) kernel options like nosmep, nosmap,
etc.
Add support of multiple clearcpuid options to allow user to clear
multiple cpu caps.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cmdline.h | 3 +++
arch/x86/kernel/fpu/init.c | 30 ++++++++++++++++++++----------
arch/x86/lib/cmdline.c | 17 +++++++++++++++--
3 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/cmdline.h b/arch/x86/include/asm/cmdline.h
index 6faaf27e8899..059e29558bb3 100644
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -5,5 +5,8 @@
int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
int cmdline_find_option(const char *cmdline_ptr, const char *option,
char *buffer, int bufsize);
+int cmdline_find_option_in_range(const char *cmdline_ptr, int cmdline_size,
+ const char *option, char *buffer, int bufsize,
+ char **arg_pos);
#endif /* _ASM_X86_CMDLINE_H */
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6abd83572b01..88bbba7ee96a 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -243,16 +243,31 @@ static void __init fpu__init_system_ctx_switch(void)
WARN_ON_FPU(current->thread.fpu.initialized);
}
+static void __init clear_cpuid(void)
+{
+ char arg[32], *argptr, *option_pos, clearcpuid_option[] = "clearcpuid";
+ int bit, cmdline_size;
+
+ /* Find each option in boot_command_line and clear specified cpu cap. */
+ cmdline_size = COMMAND_LINE_SIZE;
+ while (cmdline_find_option_in_range(boot_command_line, cmdline_size,
+ clearcpuid_option, arg,
+ sizeof(arg), &option_pos) >= 0) {
+ /* Chang command line range for next search. */
+ cmdline_size = option_pos - boot_command_line + 1;
+ argptr = arg;
+ if (get_option(&argptr, &bit) &&
+ bit >= 0 && bit < NCAPINTS * 32)
+ setup_clear_cpu_cap(bit);
+ }
+}
+
/*
* We parse fpu parameters early because fpu__init_system() is executed
* before parse_early_param().
*/
static void __init fpu__init_parse_early_param(void)
{
- char arg[32];
- char *argptr = arg;
- int bit;
-
if (cmdline_find_option_bool(boot_command_line, "no387"))
setup_clear_cpu_cap(X86_FEATURE_FPU);
@@ -271,12 +286,7 @@ static void __init fpu__init_parse_early_param(void)
if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
setup_clear_cpu_cap(X86_FEATURE_XSAVES);
- if (cmdline_find_option(boot_command_line, "clearcpuid", arg,
- sizeof(arg)) &&
- get_option(&argptr, &bit) &&
- bit >= 0 &&
- bit < NCAPINTS * 32)
- setup_clear_cpu_cap(bit);
+ clear_cpuid();
}
/*
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 3261abb21ef4..9cf1a0773877 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -114,13 +114,15 @@ __cmdline_find_option_bool(const char *cmdline, int max_cmdline_size,
* @option: option string to look for
* @buffer: memory buffer to return the option argument
* @bufsize: size of the supplied memory buffer
+ * @option_pos: pointer to the option if found
*
* Returns the length of the argument (regardless of if it was
* truncated to fit in the buffer), or -1 on not found.
*/
static int
__cmdline_find_option(const char *cmdline, int max_cmdline_size,
- const char *option, char *buffer, int bufsize)
+ const char *option, char *buffer, int bufsize,
+ char **arg_pos)
{
char c;
int pos = 0, len = -1;
@@ -164,6 +166,9 @@ __cmdline_find_option(const char *cmdline, int max_cmdline_size,
len = 0;
bufptr = buffer;
state = st_bufcpy;
+ if (arg_pos)
+ *arg_pos = (char *)cmdline -
+ strlen(option) - 1;
break;
} else if (c == *opptr++) {
/*
@@ -211,5 +216,13 @@ int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
int bufsize)
{
return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
- buffer, bufsize);
+ buffer, bufsize, NULL);
+}
+
+int cmdline_find_option_in_range(const char *cmdline, int cmdline_size,
+ char *option, char *buffer, int bufsize,
+ char **arg_pos)
+{
+ return __cmdline_find_option(cmdline, cmdline_size, option, buffer,
+ bufsize, arg_pos);
}
--
2.19.1
The kernel option clearcpuid currently only takes feature bit which
can be changed from kernel to kernel.
Extend clearcpuid to use cap flag string, which is defined in
x86_cap_flags[] and won't be changed from kernel to kernel.
And user can easily get the cap flag string from /proc/cpuinfo.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/fpu/init.c | 3 ++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 0e56ff7e4848..823c4ab82e12 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -133,6 +133,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
extern void setup_clear_cpu_cap(unsigned int bit);
extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+bool find_cpu_cap(char *cap_flag, unsigned int *pfeature);
#define setup_force_cpu_cap(bit) do { \
set_cpu_cap(&boot_cpu_data, bit); \
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3d633f67fbd7..1a71434f7b46 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -120,3 +120,29 @@ void setup_clear_cpu_cap(unsigned int feature)
{
do_clear_cpu_cap(NULL, feature);
}
+
+/**
+ * find_cpu_cap - Given a cap flag string, find its corresponding feature bit.
+ * @cap_flag: cap flag string as defined in x86_cap_flags[]
+ * @pfeature: feature bit
+ *
+ * Return: true if the feature is found. false if not found
+ */
+bool find_cpu_cap(char *cap_flag, unsigned int *pfeature)
+{
+#ifdef CONFIG_X86_FEATURE_NAMES
+ unsigned int feature;
+
+ for (feature = 0; feature < NCAPINTS * 32; feature++) {
+ if (!x86_cap_flags[feature])
+ continue;
+
+ if (strcmp(cap_flag, x86_cap_flags[feature]) == 0) {
+ *pfeature = feature;
+
+ return true;
+ }
+ }
+#endif
+ return false;
+}
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 88bbba7ee96a..99b895eea166 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -256,7 +256,8 @@ static void __init clear_cpuid(void)
/* Chang command line range for next search. */
cmdline_size = option_pos - boot_command_line + 1;
argptr = arg;
- if (get_option(&argptr, &bit) &&
+ /* cpu cap can be specified by either feature bit or string */
+ if ((get_option(&argptr, &bit) || find_cpu_cap(arg, &bit)) &&
bit >= 0 && bit < NCAPINTS * 32)
setup_clear_cpu_cap(bit);
}
--
2.19.1
Add an ABI document entry for /sys/devices/system/cpu/split_lock_detect.
Signed-off-by: Fenghua Yu <[email protected]>
---
.../ABI/testing/sysfs-devices-system-cpu | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 9605dbd4b5b5..a4e8c2e2e1ac 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -67,6 +67,21 @@ Description: Discover NUMA node a CPU belongs to
/sys/devices/system/cpu/cpu42/node2 -> ../../node/node2
+What: /sys/devices/system/cpu/split_lock_detect
+Date: March 2019
+Contact: Linux kernel mailing list <[email protected]>
+Description: Control split lock detection on Intel Tremont and future CPUs
+
+ Read/write interface to control split lock detection:
+ "0" disable split lock detection on all CPUs
+ "1" enable split lock detection on all CPUs
+
+ Please note the interface only shows or controls global setting.
+ During run time, split lock detection on one CPU may be
+ disabled if split lock operation in kernel code happens on
+ the CPU. The interface doesn't show or control split lock
+ detection on individual CPU.
+
What: /sys/devices/system/cpu/cpu#/topology/core_id
/sys/devices/system/cpu/cpu#/topology/core_siblings
/sys/devices/system/cpu/cpu#/topology/core_siblings_list
--
2.19.1
From: Xiaoyao Li <[email protected]>
MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 5 of which
reports the capability of enabling detection of split locks (will be
supported on future processors based on Tremont microarchitecture and
later).
CPUID.(EAX=7H,ECX=0):EDX[30] will enumerate the presence of the
IA32_CORE_CAPABILITY MSR.
Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and
the split lock bit.
Since MSR_IA32_CORE_CAPABILITY is a feature-enumerating MSR, emulate
it in software regardless of host's capability. What we need to
do is to set the right value of it to report the capability of guest.
In this patch, just set the guest's core_capability as 0, because we
haven't added support of the features it indicates to guest. It's for
bisectability.
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/cpuid.c | 6 ++++++
arch/x86/kvm/x86.c | 24 ++++++++++++++++++++++++
3 files changed, 32 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 159b5988292f..e28626f6a2e0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -570,6 +570,7 @@ struct kvm_vcpu_arch {
u64 ia32_xss;
u64 microcode_version;
u64 arch_capabilities;
+ u64 core_capability;
/*
* Paging state of the vcpu
@@ -1527,6 +1528,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
unsigned long icr, int op_64_bit);
u64 kvm_get_arch_capabilities(void);
+u64 kvm_get_core_capability(void);
void kvm_define_shared_msr(unsigned index, u32 msr);
int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd3951638ae4..4a2f7892ea31 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -505,6 +505,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
* if the host doesn't support it.
*/
entry->edx |= F(ARCH_CAPABILITIES);
+ /*
+ * Since we emulate MSR IA32_CORE_CAPABILITY in
+ * software, we can always enable it for guest
+ * regardless of host's capability.
+ */
+ entry->edx |= F(CORE_CAPABILITY);
} else {
entry->ebx = 0;
entry->ecx = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 099b851dabaf..4459115eb0ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1159,6 +1159,7 @@ static u32 emulated_msrs[] = {
MSR_IA32_TSC_ADJUST,
MSR_IA32_TSCDEADLINE,
MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_CORE_CAPABILITY,
MSR_IA32_MISC_ENABLE,
MSR_IA32_MCG_STATUS,
MSR_IA32_MCG_CTL,
@@ -1198,6 +1199,7 @@ static u32 msr_based_features[] = {
MSR_F10H_DECFG,
MSR_IA32_UCODE_REV,
+ MSR_IA32_CORE_CAPABILITY,
MSR_IA32_ARCH_CAPABILITIES,
};
@@ -1225,9 +1227,18 @@ u64 kvm_get_arch_capabilities(void)
}
EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
+u64 kvm_get_core_capability(void)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_get_core_capability);
+
static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
{
switch (msr->index) {
+ case MSR_IA32_CORE_CAPABILITY:
+ msr->data = kvm_get_core_capability();
+ break;
case MSR_IA32_ARCH_CAPABILITIES:
msr->data = kvm_get_arch_capabilities();
break;
@@ -2451,6 +2462,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_EFER:
return set_efer(vcpu, data);
+ case MSR_IA32_CORE_CAPABILITY:
+ if (!msr_info->host_initiated)
+ return 1;
+
+ vcpu->arch.core_capability = data;
+ break;
case MSR_K7_HWCR:
data &= ~(u64)0x40; /* ignore flush filter disable */
data &= ~(u64)0x100; /* ignore ignne emulation enable */
@@ -2762,6 +2779,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset;
break;
+ case MSR_IA32_CORE_CAPABILITY:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITY))
+ return 1;
+ msr_info->data = vcpu->arch.core_capability;
+ break;
case MSR_MTRRcap:
case 0x200 ... 0x2ff:
return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
@@ -8762,6 +8785,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
+ vcpu->arch.core_capability = kvm_get_core_capability();
vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
kvm_vcpu_mtrr_init(vcpu);
vcpu_load(vcpu);
--
2.19.1
Some CPU feature bits are forced set and stored in cpuinfo_x86 before
handling clearcpuid options. To clear those bits from cpuinfo_x86,
apply_forced_cap() is called after handling the options.
Please note, apply_forced_cap() is called twice on boot CPU. But this code
is simple and there is no functionality issue.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 2 ++
arch/x86/kernel/cpu/common.c | 5 +++--
arch/x86/kernel/fpu/init.c | 2 ++
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4e03f53fc079..bfa5ac6b7502 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -26,6 +26,8 @@ struct x86_cpu {
struct cpu cpu;
};
+void apply_forced_caps(struct cpuinfo_x86 *c);
+
#ifdef CONFIG_HOTPLUG_CPU
extern int arch_register_cpu(int num);
extern void arch_unregister_cpu(int);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index bbdd69dd4f5f..e1d41405c27b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -758,13 +758,14 @@ void cpu_detect(struct cpuinfo_x86 *c)
}
}
-static void apply_forced_caps(struct cpuinfo_x86 *c)
+void apply_forced_caps(struct cpuinfo_x86 *c)
{
int i;
for (i = 0; i < NCAPINTS + NBUGINTS; i++) {
- c->x86_capability[i] &= ~cpu_caps_cleared[i];
+ /* Bits may be cleared after they are set. */
c->x86_capability[i] |= cpu_caps_set[i];
+ c->x86_capability[i] &= ~cpu_caps_cleared[i];
}
}
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 99b895eea166..9c2801b605e3 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -5,6 +5,7 @@
#include <asm/tlbflush.h>
#include <asm/setup.h>
#include <asm/cmdline.h>
+#include <asm/cpu.h>
#include <linux/sched.h>
#include <linux/sched/task.h>
@@ -261,6 +262,7 @@ static void __init clear_cpuid(void)
bit >= 0 && bit < NCAPINTS * 32)
setup_clear_cpu_cap(bit);
}
+ apply_forced_caps(&boot_cpu_data);
}
/*
--
2.19.1
From: Peter Zijlstra <[email protected]>
After kernel clears a CPUID bit through clearcpuid or other kernel options,
CPUID instruction executed from user space should see the same value for
the bit. The CPUID faulting handler returns the cleared bit to user.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 4 +++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/cpu/cpuid-deps.c | 52 ++++++++++++++++++++++++++++
arch/x86/kernel/cpu/intel.c | 56 +++++++++++++++++++++++++++++--
arch/x86/kernel/cpu/scattered.c | 17 ++++++++++
arch/x86/kernel/process.c | 3 ++
arch/x86/kernel/traps.c | 11 ++++++
7 files changed, 142 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 823c4ab82e12..53875fd13f5a 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -228,5 +228,9 @@ static __always_inline __pure bool _static_cpu_has(u16 bit)
#define CPU_FEATURE_TYPEVAL boot_cpu_data.x86_vendor, boot_cpu_data.x86, \
boot_cpu_data.x86_model
+extern int cpuid_fault;
+u32 scattered_cpuid_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg);
+u32 cpuid_cap_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg);
+
#endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */
#endif /* _ASM_X86_CPUFEATURE_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e1d41405c27b..020597bca252 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1503,6 +1503,8 @@ void print_cpu_info(struct cpuinfo_x86 *c)
pr_cont(")\n");
}
+int cpuid_fault;
+
/*
* clearcpuid= was already parsed in fpu__init_parse_early_param.
* But we need to keep a dummy __setup around otherwise it would
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 1a71434f7b46..d42aa4fa3021 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -113,9 +113,61 @@ static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature)
void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature)
{
+ if (boot_cpu_has(feature))
+ cpuid_fault = 1;
do_clear_cpu_cap(c, feature);
}
+u32 cpuid_cap_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg)
+{
+ switch (leaf) {
+ case 0x1:
+ if (reg == CPUID_EDX)
+ return ~cpu_caps_cleared[CPUID_1_EDX];
+ if (reg == CPUID_ECX)
+ return ~cpu_caps_cleared[CPUID_1_ECX];
+ break;
+
+ case 0x6:
+ if (reg == CPUID_EAX)
+ return ~cpu_caps_cleared[CPUID_6_EAX];
+ break;
+
+ case 0x7:
+ if (reg == CPUID_EDX)
+ return ~cpu_caps_cleared[CPUID_7_EDX];
+ if (reg == CPUID_ECX)
+ return ~cpu_caps_cleared[CPUID_7_ECX];
+ if (reg == CPUID_EBX && count == 0)
+ return ~cpu_caps_cleared[CPUID_7_0_EBX];
+ break;
+
+ case 0xD:
+ if (reg == CPUID_EAX)
+ return ~cpu_caps_cleared[CPUID_D_1_EAX];
+ break;
+
+ case 0xF:
+ if (reg == CPUID_EDX) {
+ if (count == 0)
+ return ~cpu_caps_cleared[CPUID_F_0_EDX];
+ if (count == 1)
+ return ~cpu_caps_cleared[CPUID_F_0_EDX];
+ }
+ break;
+
+ case 0x80000007:
+ if (reg == CPUID_EDX) {
+ if (test_bit(X86_FEATURE_CONSTANT_TSC,
+ (unsigned long *)cpu_caps_cleared))
+ return ~(1 << 8);
+ }
+ break;
+ }
+
+ return scattered_cpuid_mask(leaf, count, reg);
+}
+
void setup_clear_cpu_cap(unsigned int feature)
{
do_clear_cpu_cap(NULL, feature);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 166033fa8423..aa1cfcc81323 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,9 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+#include <asm/inat.h>
#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -667,13 +670,60 @@ static void intel_bsp_resume(struct cpuinfo_x86 *c)
init_intel_energy_perf(c);
}
+bool fixup_cpuid_exception(struct pt_regs *regs)
+{
+ unsigned int leaf, count, eax, ebx, ecx, edx;
+ unsigned long seg_base = 0;
+ unsigned char buf[2];
+ int not_copied;
+
+ if (!cpuid_fault)
+ return false;
+
+ if (test_thread_flag(TIF_NOCPUID))
+ return false;
+
+ if (!user_64bit_mode(regs))
+ seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
+
+ if (seg_base == -1L)
+ return false;
+
+ not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
+ sizeof(buf));
+ if (not_copied)
+ return false;
+
+ if (buf[0] != 0x0F || buf[1] != 0xA2) /* CPUID - OF A2 */
+ return false;
+
+ leaf = regs->ax;
+ count = regs->cx;
+
+ cpuid_count(leaf, count, &eax, &ebx, &ecx, &edx);
+
+ regs->ip += 2;
+ regs->ax = eax & cpuid_cap_mask(leaf, count, CPUID_EAX);
+ regs->bx = ebx & cpuid_cap_mask(leaf, count, CPUID_EBX);
+ regs->cx = ecx & cpuid_cap_mask(leaf, count, CPUID_ECX);
+ regs->dx = edx & cpuid_cap_mask(leaf, count, CPUID_EDX);
+
+ return true;
+}
+
static void init_cpuid_fault(struct cpuinfo_x86 *c)
{
u64 msr;
- if (!rdmsrl_safe(MSR_PLATFORM_INFO, &msr)) {
- if (msr & MSR_PLATFORM_INFO_CPUID_FAULT)
- set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
+ if (rdmsrl_safe(MSR_PLATFORM_INFO, &msr))
+ return;
+
+ if (msr & MSR_PLATFORM_INFO_CPUID_FAULT) {
+ set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
+ if (cpuid_fault) {
+ this_cpu_or(msr_misc_features_shadow,
+ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
+ }
}
}
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 94aa1c72ca98..353756c27096 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -62,3 +62,20 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
set_cpu_cap(c, cb->feature);
}
}
+
+u32 scattered_cpuid_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg)
+{
+ const struct cpuid_bit *cb;
+ u32 mask = ~0U;
+
+ for (cb = cpuid_bits; cb->feature; cb++) {
+ if (cb->level == leaf && cb->sub_leaf == count &&
+ cb->reg == reg) {
+ if (test_bit(cb->feature,
+ (unsigned long *)cpu_caps_cleared))
+ mask &= ~BIT(cb->bit);
+ }
+ }
+
+ return mask;
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 58ac7be52c7a..2b1dfd7ae65d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -196,6 +196,9 @@ static void set_cpuid_faulting(bool on)
{
u64 msrval;
+ if (cpuid_fault)
+ return;
+
msrval = this_cpu_read(msr_misc_features_shadow);
msrval &= ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
msrval |= (on << MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0ac992bfa287..d887da1e8050 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -559,6 +559,12 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL);
}
+#ifdef CONFIG_CPU_SUP_INTEL
+extern bool fixup_cpuid_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_cpuid_exception(struct pt_regs *regs) { return false; }
+#endif
+
dotraplinkage void
do_general_protection(struct pt_regs *regs, long error_code)
{
@@ -573,6 +579,11 @@ do_general_protection(struct pt_regs *regs, long error_code)
return;
}
+ if (static_cpu_has(X86_FEATURE_CPUID_FAULT)) {
+ if (user_mode(regs) && fixup_cpuid_exception(regs))
+ return;
+ }
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
--
2.19.1
On Wed, Apr 03, 2019 at 02:21:47PM -0700, Fenghua Yu wrote:
> cpu_caps_cleared and cpu_caps_set may not be aligned to unsigned long.
> Atomic operations (i.e. set_bit and clear_bit) on the bitmaps may access
Please end function names with parentheses.
> two cache lines (a.k.a. split lock) and lock the bus to block all memory
not "lock the bus" but "cause the CPU to do a bus lock... "
> accesses from other processors to ensure atomicity.
>
> To avoid the overall performance degradation from the bus locking, align
> the two variables to unsigned long.
>
> Defining the variables as unsigned long may also fix the issue because
> they are naturally aligned to unsigned long. But that needs additional
> code changes. Adding __aligned(unsigned long) is a simpler fix.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/kernel/cpu/common.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
With the above nitpicks addressed:
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
From: Fenghua Yu
> Sent: 03 April 2019 22:22
> set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> operate on bitmap defined in x86_capability.
>
> Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> the location is at:
> base address of x86_capability + (bit offset in x86_capability / 64) * 8
>
> Since base address of x86_capability may not be aligned to unsigned long,
> the single unsigned long location may cross two cache lines and
> accessing the location by locked BTS/BTR introductions will trigger #AC.
That is not true.
The BTS/BTR instructions access the memory word that contains the
expected bit.
The 'operand size' only affects the size of the register use for the
bit offset.
If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
might do an aligned 32 bit access.
It should never do an 64bit access and never a misaligned one (even if
the base address is misaligned).
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Apr 03, 2019 at 02:21:58PM -0700, Fenghua Yu wrote:
> From: Xiaoyao Li <[email protected]>
>
> A control bit (bit 29) in TEST_CTL MSR 0x33 will be introduced in
> future x86 processors. When bit 29 is set, the processor causes #AC
> exception for split locked accesses at all CPL.
>
> Please check the latest Intel 64 and IA-32 Architectures Software
> Developer's Manual for more detailed information on the MSR and
> the split lock bit.
>
> This patch emulate MSR TEST_CTL with vmx->msr_test_ctl and does the
> following:
> 1. As MSR TEST_CTL of guest is emulated, enable the related bits
> in CORE_CAPABILITY to corretly report this feature to guest.
s/corretly/correctly
>
> 2. Differentiate MSR TEST_CTL between host and guest.
>
> Signed-off-by: Xiaoyao Li <[email protected]>
> Signed-off-by: Fenghua Yu <[email protected]>
> Acked-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 1 +
> arch/x86/kvm/x86.c | 17 ++++++++++++++++-
> 3 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ab432a930ae8..309ccf593f0d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> u32 index;
>
> switch (msr_info->index) {
> + case MSR_TEST_CTL:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> + return 1;
> + msr_info->data = vmx->msr_test_ctl;
> + break;
> #ifdef CONFIG_X86_64
> case MSR_FS_BASE:
> msr_info->data = vmcs_readl(GUEST_FS_BASE);
> @@ -1797,6 +1803,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> u32 index;
>
> switch (msr_index) {
> + case MSR_TEST_CTL:
> + if (!(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> + return 1;
> +
> + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
> + return 1;
> + vmx->msr_test_ctl = data;
> + break;
> case MSR_EFER:
> ret = kvm_set_msr_common(vcpu, msr_info);
> break;
> @@ -4077,6 +4091,9 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> ++vmx->nmsrs;
> }
>
> + /* disable AC split lock by default */
> + vmx->msr_test_ctl = 0;
> +
> vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
>
> /* 22.2.1, 20.8.1 */
> @@ -4114,6 +4131,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>
> vmx->rmode.vm86_active = 0;
> vmx->spec_ctrl = 0;
> + vmx->msr_test_ctl = 0;
>
> vcpu->arch.microcode_version = 0x100000000ULL;
> vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> @@ -6313,6 +6331,21 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> msrs[i].host, false);
> }
>
> +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> +{
> + u64 host_msr_test_ctl;
> +
> + /* if TEST_CTL MSR doesn't exist on the hardware, do nothing */
> + if (rdmsrl_safe(MSR_TEST_CTL, &host_msr_test_ctl))
> + return;
This adds a RDMSR on every VM-Enter, and a fault on CPUs that don't
support MSR_TEST_CTL. Ideally the kernel would cache MSR_TEST_CTL and
expose a helper that returns a boolean to indicate the existence of the
MSRs along with the current value. Racing with split_lock_detect_store()
is ok since this code runs with interrupts disabled, i.e. will block
split_lock_detect_store() until after VM-Exit.
Paolo, can you weigh in with your thoughts? I'm surprised you acked
this patch given your earlier comment:
https://patchwork.kernel.org/patch/10413779/#21892723
> +
> + if (host_msr_test_ctl == vmx->msr_test_ctl)
> + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> + else
> + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> + host_msr_test_ctl, false);
> +}
> +
> static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
> {
> vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
> @@ -6419,6 +6452,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> atomic_switch_perf_msrs(vmx);
>
> + atomic_switch_msr_test_ctl(vmx);
> +
> vmx_update_hv_timer(vcpu);
>
> /*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index a1e00d0a2482..6091a8b9de74 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -190,6 +190,7 @@ struct vcpu_vmx {
> u64 msr_guest_kernel_gs_base;
> #endif
>
> + u64 msr_test_ctl;
> u64 spec_ctrl;
>
> u32 vm_entry_controls_shadow;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4459115eb0ec..e93c2f620cdb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1229,7 +1229,22 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
>
> u64 kvm_get_core_capability(void)
> {
> - return 0;
> + u64 data;
> +
> + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
> +
> + /* mask non-virtualizable functions */
> + data &= CORE_CAP_SPLIT_LOCK_DETECT;
> +
> + /*
> + * There will be a list of FMS values that have split lock detection
> + * but lack the CORE CAPABILITY MSR. In this case, set
> + * CORE_CAP_SPLIT_LOCK_DETECT since we emulate MSR CORE_CAPABILITY.
> + */
> + if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> + data |= CORE_CAP_SPLIT_LOCK_DETECT;
> +
> + return data;
> }
> EXPORT_SYMBOL_GPL(kvm_get_core_capability);
>
> --
> 2.19.1
>
On Thu, Apr 04, 2019 at 04:39:43PM +0200, Borislav Petkov wrote:
> On Wed, Apr 03, 2019 at 02:21:47PM -0700, Fenghua Yu wrote:
> > cpu_caps_cleared and cpu_caps_set may not be aligned to unsigned long.
> > Atomic operations (i.e. set_bit and clear_bit) on the bitmaps may access
>
> Please end function names with parentheses.
Sure.
>
> > two cache lines (a.k.a. split lock) and lock the bus to block all memory
>
> not "lock the bus" but "cause the CPU to do a bus lock... "
Sure.
>
> > accesses from other processors to ensure atomicity.
> >
> > To avoid the overall performance degradation from the bus locking, align
> > the two variables to unsigned long.
> >
> > Defining the variables as unsigned long may also fix the issue because
> > they are naturally aligned to unsigned long. But that needs additional
> > code changes. Adding __aligned(unsigned long) is a simpler fix.
> >
> > Signed-off-by: Fenghua Yu <[email protected]>
> > ---
> > arch/x86/kernel/cpu/common.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
>
> With the above nitpicks addressed:
>
> Reviewed-by: Borislav Petkov <[email protected]>
Thank you for your review!
-Fenghua
From: David Laight
> Sent: 04 April 2019 15:45
>
> From: Fenghua Yu
> > Sent: 03 April 2019 22:22
> > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > operate on bitmap defined in x86_capability.
> >
> > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > the location is at:
> > base address of x86_capability + (bit offset in x86_capability / 64) * 8
> >
> > Since base address of x86_capability may not be aligned to unsigned long,
> > the single unsigned long location may cross two cache lines and
> > accessing the location by locked BTS/BTR introductions will trigger #AC.
>
> That is not true.
> The BTS/BTR instructions access the memory word that contains the
> expected bit.
> The 'operand size' only affects the size of the register use for the
> bit offset.
> If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
> do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
> might do an aligned 32 bit access.
> It should never do an 64bit access and never a misaligned one (even if
> the base address is misaligned).
Hmmm... I may have misread things slightly.
The accessed address is 'Effective Address + (4 ∗ (BitOffset DIV 32))'.
However nothing suggests that it ever does 64bit accesses.
If it does do 64bit accesses when the operand size is 64 bits then the
asm stubs ought to be changed to only specify 32bit operand size.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Apr 04, 2019 at 04:24:15PM +0000, David Laight wrote:
> From: David Laight
> > Sent: 04 April 2019 15:45
> >
> > From: Fenghua Yu
> > > Sent: 03 April 2019 22:22
> > > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > > operate on bitmap defined in x86_capability.
> > >
> > > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > > the location is at:
> > > base address of x86_capability + (bit offset in x86_capability / 64) * 8
> > >
> > > Since base address of x86_capability may not be aligned to unsigned long,
> > > the single unsigned long location may cross two cache lines and
> > > accessing the location by locked BTS/BTR introductions will trigger #AC.
> >
> > That is not true.
> > The BTS/BTR instructions access the memory word that contains the
> > expected bit.
> > The 'operand size' only affects the size of the register use for the
> > bit offset.
> > If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
> > do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
> > might do an aligned 32 bit access.
> > It should never do an 64bit access and never a misaligned one (even if
> > the base address is misaligned).
>
> Hmmm... I may have misread things slightly.
> The accessed address is 'Effective Address + (4 ∗ (BitOffset DIV 32))'.
> However nothing suggests that it ever does 64bit accesses.
>
> If it does do 64bit accesses when the operand size is 64 bits then the
> asm stubs ought to be changed to only specify 32bit operand size.
Heh, we had this discussion before[1], the op size dictates the size of
the memory access and can generate unaligned accesses.
[1] https://lkml.kernel.org/r/[email protected]
On Thu, 4 Apr 2019, David Laight wrote:
> From: David Laight Sent: 04 April 2019 15:45
> > From: Fenghua Yu Sent: 03 April 2019 22:22
> > That is not true.
> > The BTS/BTR instructions access the memory word that contains the
> > expected bit.
> > The 'operand size' only affects the size of the register use for the
> > bit offset.
> > If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
> > do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
> > might do an aligned 32 bit access.
> > It should never do an 64bit access and never a misaligned one (even if
> > the base address is misaligned).
>
> Hmmm... I may have misread things slightly.
> The accessed address is 'Effective Address + (4 ∗ (BitOffset DIV 32))'.
> However nothing suggests that it ever does 64bit accesses.
>
> If it does do 64bit accesses when the operand size is 64 bits then the
> asm stubs ought to be changed to only specify 32bit operand size.
bitops operate on unsigned long arrays, so the RMW on the affected array
member has to be atomic vs. other RMW operations on the same array
member. If we make the bitops 32bit wide on x86/64 we break that.
So selecting 64bit access (REX.W prefix) is correct and has to stay.
Thanks,
tglx
On 04/04/19 18:52, Thomas Gleixner wrote:
> On Thu, 4 Apr 2019, David Laight wrote:
>> From: David Laight Sent: 04 April 2019 15:45
>>> From: Fenghua Yu Sent: 03 April 2019 22:22
>>> That is not true.
>>> The BTS/BTR instructions access the memory word that contains the
>>> expected bit.
>>> The 'operand size' only affects the size of the register use for the
>>> bit offset.
>>> If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
>>> do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
>>> might do an aligned 32 bit access.
>>> It should never do an 64bit access and never a misaligned one (even if
>>> the base address is misaligned).
>>
>> Hmmm... I may have misread things slightly.
>> The accessed address is 'Effective Address + (4 ∗ (BitOffset DIV 32))'.
>> However nothing suggests that it ever does 64bit accesses.
>>
>> If it does do 64bit accesses when the operand size is 64 bits then the
>> asm stubs ought to be changed to only specify 32bit operand size.
>
> bitops operate on unsigned long arrays, so the RMW on the affected array
> member has to be atomic vs. other RMW operations on the same array
> member. If we make the bitops 32bit wide on x86/64 we break that.
>
> So selecting 64bit access (REX.W prefix) is correct and has to stay.
Aren't bitops always atomic with respect to the whole cache line(s)? We
regularly rely on cmpxchgl being atomic with respect to movb.
Paolo
On Wed, 3 Apr 2019, Fenghua Yu wrote:
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + unsigned int trapnr = X86_TRAP_AC;
> + char str[] = "alignment check";
> + int signr = SIGBUS;
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + /*
> + * WARN*()s end up here; fix them up before we call the
> + * notifier chain.
> + */
How exactly is WARN*() ending up here?
> + if (!user_mode(regs) && fixup_bug(regs, trapnr))
And that fixup_bug() check does what?
int fixup_bug(struct pt_regs *regs, int trapnr)
{
if (trapnr != X86_TRAP_UD)
return 0;
Copy and paste from do_error_trap() ....
> + return;
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) ==
> + NOTIFY_STOP)
> + return;
> +
> + cond_local_irq_enable(regs);
> + if (!user_mode(regs) &&
> + static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> + /*
> + * Only split lock can generate #AC from kernel at this point.
> + * Warn and disable split lock detection on this CPU. The
> + * faulting instruction will be executed without generating
> + * another #AC fault. User needs to check the warning and
> + * fix the split lock issue in the faulting instruction.
"User needs to check the warning and fix the issue ..."
I'm looking forward to all the fixes from Joe Users.
Please remove that sentence. It's useless. Users report warnings if at all
and the kernel developers who actually look at them surely don't need an
advice like that.
Thanks,
tglx
On Wed, 3 Apr 2019, Fenghua Yu wrote:
> A split locked access locks bus and degrades overall memory access
> performance. When split lock detection feature is enumerated, enable
> the feature by default to find any split lock issue and then fix
> the issue.
Enabling the feature allows to find the issues, but does not automagically
fix them. Come on.
> +#define DISABLE_SPLIT_LOCK_DETECT 0
> +#define ENABLE_SPLIT_LOCK_DETECT 1
If those defines have a value at all, please start with the facility not
with functionality, i.e. AC_SPLIT_LOCK_ENABLE....
> +
> +static DEFINE_MUTEX(split_lock_detect_mutex);
> +static int split_lock_detect_val;
detect_val? What value is that? Its supposed to hold those magic defines
above. So something like
static unsigned int ac_split_lock_enable;
> /*
> * Just in case our CPU detection goes bad, or you have a weird system,
> * allow a way to override the automatic disabling of MPX.
> @@ -161,10 +167,45 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
> return false;
> }
>
> +static u32 new_sp_test_ctl_val(u32 test_ctl_val)
> +{
> + /* Change the split lock setting. */
> + if (READ_ONCE(split_lock_detect_val) == DISABLE_SPLIT_LOCK_DETECT)
That READ_ONCE() is required because?
> + test_ctl_val &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> + else
> + test_ctl_val |= TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> +
> + return test_ctl_val;
> +}
Aside of that do we really need a misnomed function which replaces the
simple inline code at the call site:
rdmsr(l, h)
l &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
l |= ac_split_lock_enable << TEST_CTL_ENABLE_SPLIT_LOCK_DETECT_SHIFT;
wrmrs(...)
or the even more simple
if (ac_split_lock_enable)
msr_set_bit(...)
else
msr_clear_nit(...)
Hmm?
> +
> +static inline void show_split_lock_detection_info(void)
> +{
> + if (READ_ONCE(split_lock_detect_val))
That READ_ONCE() is required because?
> + pr_info_once("x86/split_lock: split lock detection enabled\n");
> + else
> + pr_info_once("x86/split_lock: split lock detection disabled\n");
pr_fmt exists for a reason and having 'split lock' repeated several times
in the same line is not making it more readable.
> +}
> +
> +static void init_split_lock_detect(struct cpuinfo_x86 *c)
> +{
> + if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
> + u32 l, h;
> +
> + mutex_lock(&split_lock_detect_mutex);
> + rdmsr(MSR_TEST_CTL, l, h);
> + l = new_sp_test_ctl_val(l);
> + wrmsr(MSR_TEST_CTL, l, h);
> + show_split_lock_detection_info();
> + mutex_unlock(&split_lock_detect_mutex);
> + }
> +}
> +
> static void early_init_intel(struct cpuinfo_x86 *c)
> {
> u64 misc_enable;
>
> + init_split_lock_detect(c);
so we have in early boot:
early_cpu_init()
early_identify_cpu()
this_cpu->c_early_init(c)
early_init_intel() {
init_split_lock_detect();
}
....
cpu_set_core_cap_bits(c)
set(FEATURE_SPLIT_LOCK)
I don't have to understand how init_split_lock_detect() will magically see
the feature bit which gets set afterwards, right?
> +
> /* Unmask CPUID levels if masked: */
> if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
> if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
> @@ -1032,6 +1073,7 @@ cpu_dev_register(intel_cpu_dev);
> static void __init set_split_lock_detect(void)
> {
> setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> + split_lock_detect_val = 1;
Oh well. You add defines on top of the file and then you don't use them.
Thanks,
tglx
On Thu, 4 Apr 2019, Paolo Bonzini wrote:
> On 04/04/19 18:52, Thomas Gleixner wrote:
> > On Thu, 4 Apr 2019, David Laight wrote:
> >> From: David Laight Sent: 04 April 2019 15:45
> >>> From: Fenghua Yu Sent: 03 April 2019 22:22
> >>> That is not true.
> >>> The BTS/BTR instructions access the memory word that contains the
> >>> expected bit.
> >>> The 'operand size' only affects the size of the register use for the
> >>> bit offset.
> >>> If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
> >>> do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
> >>> might do an aligned 32 bit access.
> >>> It should never do an 64bit access and never a misaligned one (even if
> >>> the base address is misaligned).
> >>
> >> Hmmm... I may have misread things slightly.
> >> The accessed address is 'Effective Address + (4 ∗ (BitOffset DIV 32))'.
> >> However nothing suggests that it ever does 64bit accesses.
> >>
> >> If it does do 64bit accesses when the operand size is 64 bits then the
> >> asm stubs ought to be changed to only specify 32bit operand size.
> >
> > bitops operate on unsigned long arrays, so the RMW on the affected array
> > member has to be atomic vs. other RMW operations on the same array
> > member. If we make the bitops 32bit wide on x86/64 we break that.
> >
> > So selecting 64bit access (REX.W prefix) is correct and has to stay.
>
> Aren't bitops always atomic with respect to the whole cache line(s)? We
> regularly rely on cmpxchgl being atomic with respect to movb.
Yes, but if your long goes across a cacheline then you have lost due to the
requirement to lock both cachelines. Same problem as with bitops and I
rather catch all of those than just some.
Thanks,
tglx
On Thu, 4 Apr 2019, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > +static void init_split_lock_detect(struct cpuinfo_x86 *c)
> > +{
> > + if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
> > + u32 l, h;
> > +
> > + mutex_lock(&split_lock_detect_mutex);
The mutex protects what exactly in the cpu bringup code?
Thanks,
tglx
On Wed, 3 Apr 2019, Fenghua Yu wrote:
> +
> +static ssize_t
> +split_lock_detect_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + return sprintf(buf, "%u\n", READ_ONCE(split_lock_detect_val));
Please stop sprinkling READ_ONCE all over the place or can you explain why
this is in any way useful? You know what READ/WRITE_ONCE() is for, right?
> +}
> +
> +static ssize_t
> +split_lock_detect_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + u32 val, l, h;
> + int cpu, ret;
> +
> + ret = kstrtou32(buf, 10, &val);
> + if (ret)
> + return ret;
> +
> + if (val != DISABLE_SPLIT_LOCK_DETECT && val != ENABLE_SPLIT_LOCK_DETECT)
> + return -EINVAL;
As this is really a simple boolean you can just use strtobool() and be done
with it.
> +
> + /*
> + * Since split lock could be disabled by kernel #AC handler or user
> + * may directly change bit 29 in MSR_TEST_CTL, split lock setting on
The user can change bit 29 in that MSR? If you talk about /dev/msr then I
really do not care. That interface should die.
Aside of that your usage of the term 'user' is really misleading and
inconsistent all over the place.
> + * each CPU may be different from global setting split_lock_detect_val
> + * by now. Update MSR on each CPU, so all of CPUs will have same split
> + * lock setting.
That helps in which way? If #AC was detected in the kernel then
1) It's likely to be switched off again right away
2) The WARN_ONCE() already triggered and will not warn again.
So what's the point here, really? If the kernel triggers #AC, game
over. Fix the kernel first. If your kernel is clean, then why do you need
that knob at all?
> + */
> + mutex_lock(&split_lock_detect_mutex);
> +
> + WRITE_ONCE(split_lock_detect_val, val);
Oh well.
> + /*
> + * Get MSR_TEST_CTL on this CPU, assuming all CPUs have same value
> + * in the MSR except split lock detection bit (bit 29).
And some day in the future this breaks because MRS_TEST_CTL has some other
shiny bits.
> + */
> + rdmsr(MSR_TEST_CTL, l, h);
> + l = new_sp_test_ctl_val(l);
> + /* Update the split lock detection setting on all online CPUs. */
> + for_each_online_cpu(cpu)
And what exactly protects the online cpu mask?
> + wrmsr_on_cpu(cpu, MSR_TEST_CTL, l, h);
Oh well. Instead of just having a function which does:
fun()
if (ac_...enabled)
msr_set_bit()
else
msr_clear_bit()
and invoke that from cpu init code and from here via on_each_cpu() or such?
> + mutex_unlock(&split_lock_detect_mutex);
> +
> + return count;
> +}
> +
> +static DEVICE_ATTR_RW(split_lock_detect);
> +
> +static int __init split_lock_init(void)
> +{
> + int ret;
> +
> + if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> + return -ENODEV;
> +
> + ret = device_create_file(cpu_subsys.dev_root,
> + &dev_attr_split_lock_detect);
> + if (ret)
> + return ret;
> +
> + return 0;
What's wrong with:
return device_create_file();
??? Not hard enough to read, right?
> +}
> +
Pointless empty line.
> +subsys_initcall(split_lock_init);
Thanks,
tglx
On Thu, Apr 04, 2019 at 08:07:57PM +0200, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Fenghua Yu wrote:
>
> > A split locked access locks bus and degrades overall memory access
> > performance. When split lock detection feature is enumerated, enable
> > the feature by default to find any split lock issue and then fix
> > the issue.
>
> Enabling the feature allows to find the issues, but does not automagically
> fix them. Come on.
Ok. I will remove the "and then fix the issue".
>
> > +#define DISABLE_SPLIT_LOCK_DETECT 0
> > +#define ENABLE_SPLIT_LOCK_DETECT 1
>
> If those defines have a value at all, please start with the facility not
> with functionality, i.e. AC_SPLIT_LOCK_ENABLE....
OK.
>
> > +
> > +static DEFINE_MUTEX(split_lock_detect_mutex);
> > +static int split_lock_detect_val;
>
> detect_val? What value is that?
According to previous discussions, I was told to call this split lock feature
as "split lock detection" instead of "#AC for split lock". So I use
"split_lock_detect..." in variable names or function names, call feature flag
as "split_lock_detect", and call the feature as "split lock detection" in
descriptions.
If you don't agree to name feature as "split lock detection", I can change
variable names/function names/feature flag/descriptions etc back to previous
names "ac_split_lock...", "#AC for split lock", etc.
The variable split_lock_detect_val is either 0 or 1. It stores current
enable/disable status of split lock detection feature. By default it's
one after the feature is enumerated. Then sysadmin can change it to 0 or 1
to enable or disable the feature during run time.
> Its supposed to hold those magic defines
> above. So something like
>
> static unsigned int ac_split_lock_enable;
If you agree to name the split lock feature as "split lock detection" feature,
can I change this variable to static unsigned int split_lock_detect_enable?
> > /*
> > * Just in case our CPU detection goes bad, or you have a weird system,
> > * allow a way to override the automatic disabling of MPX.
> > @@ -161,10 +167,45 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
> > return false;
> > }
> >
> > +static u32 new_sp_test_ctl_val(u32 test_ctl_val)
> > +{
> > + /* Change the split lock setting. */
> > + if (READ_ONCE(split_lock_detect_val) == DISABLE_SPLIT_LOCK_DETECT)
>
> That READ_ONCE() is required because?
Ok. Will remove READ_ONCE().
>
> > + test_ctl_val &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> > + else
> > + test_ctl_val |= TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> > +
> > + return test_ctl_val;
> > +}
>
> Aside of that do we really need a misnomed function which replaces the
> simple inline code at the call site:
>
> rdmsr(l, h)
> l &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> l |= ac_split_lock_enable << TEST_CTL_ENABLE_SPLIT_LOCK_DETECT_SHIFT;
> wrmrs(...)
>
> or the even more simple
>
> if (ac_split_lock_enable)
> msr_set_bit(...)
> else
> msr_clear_nit(...)
>
> Hmm?
The function new_sp_test_ctrl_val() will be called twice: here when
initializing split lock detection and in split_lock_detect_store()
when enabling/disabling the feature through the sysfs interface in
patch 0014.
So can I still keep this function and name it as get_new_test_ctrl_val()?
>
> > +
> > +static inline void show_split_lock_detection_info(void)
> > +{
> > + if (READ_ONCE(split_lock_detect_val))
>
> That READ_ONCE() is required because?
Ok. Will remove READ_ONCE().
>
> > + pr_info_once("x86/split_lock: split lock detection enabled\n");
> > + else
> > + pr_info_once("x86/split_lock: split lock detection disabled\n");
>
> pr_fmt exists for a reason and having 'split lock' repeated several times
> in the same line is not making it more readable.
Ok. I will change the string to "x86/split_lock_detection: enabled\n",
is it ok?
>
>
> > +
> > /* Unmask CPUID levels if masked: */
> > if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
> > if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
> > @@ -1032,6 +1073,7 @@ cpu_dev_register(intel_cpu_dev);
> > static void __init set_split_lock_detect(void)
> > {
> > setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> > + split_lock_detect_val = 1;
>
> Oh well. You add defines on top of the file and then you don't use them.
Will fix this.
Thanks.
-Fenghua
On Thu, 4 Apr 2019, Fenghua Yu wrote:
> On Thu, Apr 04, 2019 at 08:07:57PM +0200, Thomas Gleixner wrote:
> > On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > > +static DEFINE_MUTEX(split_lock_detect_mutex);
> > > +static int split_lock_detect_val;
> >
> > detect_val? What value is that?
>
> According to previous discussions, I was told to call this split lock feature
> as "split lock detection" instead of "#AC for split lock". So I use
> "split_lock_detect..." in variable names or function names, call feature flag
> as "split_lock_detect", and call the feature as "split lock detection" in
> descriptions.
>
> If you don't agree to name feature as "split lock detection", I can change
> variable names/function names/feature flag/descriptions etc back to previous
> names "ac_split_lock...", "#AC for split lock", etc.
>
> The variable split_lock_detect_val is either 0 or 1. It stores current
> enable/disable status of split lock detection feature. By default it's
> one after the feature is enumerated. Then sysadmin can change it to 0 or 1
> to enable or disable the feature during run time.
> > static unsigned int ac_split_lock_enable;
>
> If you agree to name the split lock feature as "split lock detection" feature,
> can I change this variable to static unsigned int split_lock_detect_enable?
I don't care much whether it's ac_split_lock or split_lock_detect, but _val
is a completely bogus and unintuitive name. The variable tells whether the
functionality is enabled or not. Then do not name it $prefix_val, which can
mean anything. Name it $prefix_enable, which makes it entirely clear what
this is about.
And please make it type bool so you don't need any of these defines either.
> > > +static u32 new_sp_test_ctl_val(u32 test_ctl_val)
> > > +{
> > > + /* Change the split lock setting. */
> > > + if (READ_ONCE(split_lock_detect_val) == DISABLE_SPLIT_LOCK_DETECT)
> >
> > That READ_ONCE() is required because?
>
> Ok. Will remove READ_ONCE().
>
> >
> > > + test_ctl_val &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> > > + else
> > > + test_ctl_val |= TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> > > +
> > > + return test_ctl_val;
> > > +}
> >
> > Aside of that do we really need a misnomed function which replaces the
> > simple inline code at the call site:
> >
> > rdmsr(l, h)
> > l &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
> > l |= ac_split_lock_enable << TEST_CTL_ENABLE_SPLIT_LOCK_DETECT_SHIFT;
> > wrmrs(...)
> >
> > or the even more simple
> >
> > if (ac_split_lock_enable)
> > msr_set_bit(...)
> > else
> > msr_clear_nit(...)
> >
> > Hmm?
>
> The function new_sp_test_ctrl_val() will be called twice: here when
> initializing split lock detection and in split_lock_detect_store()
> when enabling/disabling the feature through the sysfs interface in
> patch 0014.
It's still pointless.
> So can I still keep this function and name it as get_new_test_ctrl_val()?
No. The function you want to share between init code and sysfs is
split_lock_update_msr()
{
if (split_lock_enable)
msr_set_bit(...)
else
msr_clear_nit(...)
}
That's all. No duplicated code. No convoluted helper function,
nothing. Simple straight forward readable code.
> > > +static inline void show_split_lock_detection_info(void)
> > > +{
> > > + if (READ_ONCE(split_lock_detect_val))
> >
> > That READ_ONCE() is required because?
>
> Ok. Will remove READ_ONCE().
>
> >
> > > + pr_info_once("x86/split_lock: split lock detection enabled\n");
> > > + else
> > > + pr_info_once("x86/split_lock: split lock detection disabled\n");
> >
> > pr_fmt exists for a reason and having 'split lock' repeated several times
> > in the same line is not making it more readable.
>
> Ok. I will change the string to "x86/split_lock_detection: enabled\n",
> is it ok?
Care to read carefully what I wrote? Hint: pr_fmt
> > Oh well. You add defines on top of the file and then you don't use them.
>
> Will fix this.
What about the init / feature detection sequence which you snipped from the
reply?
Thanks,
tglx
On Thu, Apr 04, 2019 at 07:31:59PM +0200, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> > +{
> > + unsigned int trapnr = X86_TRAP_AC;
> > + char str[] = "alignment check";
> > + int signr = SIGBUS;
> > +
> > + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > +
> > + /*
> > + * WARN*()s end up here; fix them up before we call the
> > + * notifier chain.
> > + */
>
> How exactly is WARN*() ending up here?
>
> > + if (!user_mode(regs) && fixup_bug(regs, trapnr))
>
> And that fixup_bug() check does what?
>
> int fixup_bug(struct pt_regs *regs, int trapnr)
> {
> if (trapnr != X86_TRAP_UD)
> return 0;
>
> Copy and paste from do_error_trap() ....
As you can see, do_alignment_check() is copied from do_error_trap().
But seems this part of code is irrelevant to #AC handler.
So I will remove the "if (!user_mode(regs) && fixup_bug(regs, trapnr))"
and surrounding code, right?
>
> > + return;
> > +
> > + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) ==
> > + NOTIFY_STOP)
> > + return;
> > +
> > + cond_local_irq_enable(regs);
> > + if (!user_mode(regs) &&
> > + static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> > + /*
> > + * Only split lock can generate #AC from kernel at this point.
> > + * Warn and disable split lock detection on this CPU. The
> > + * faulting instruction will be executed without generating
> > + * another #AC fault. User needs to check the warning and
> > + * fix the split lock issue in the faulting instruction.
>
> "User needs to check the warning and fix the issue ..."
>
> I'm looking forward to all the fixes from Joe Users.
>
> Please remove that sentence. It's useless. Users report warnings if at all
> and the kernel developers who actually look at them surely don't need an
> advice like that.
Sure. Will do this.
Thanks.
-Fenghua
From: Thomas Gleixner [mailto:[email protected]]
>
> On Thu, 4 Apr 2019, David Laight wrote:
...
> bitops operate on unsigned long arrays, so the RMW on the affected array
> member has to be atomic vs. other RMW operations on the same array
> member. If we make the bitops 32bit wide on x86/64 we break that.
>
> So selecting 64bit access (REX.W prefix) is correct and has to stay.
Even if the lock had a fine granularity it shouldn't matter whether
two RMW operations on different parts of the same 'long' happened
at the same time.
ISTR some of the 'constant bit offset' are done with byte sized
locked transfers.
The problem here is that the bitmap is defined as __u32[] not
unsigned long[].
- __u32 x86_capability[NCAPINTS + NBUGINTS];
+ /* Unsigned long alignment to avoid split lock in atomic bitmap ops */
+ __u32 x86_capability[NCAPINTS + NBUGINTS]
+ __aligned(sizeof(unsigned long));
Somewhere there is a cast.....
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, 3 Apr 2019, Fenghua Yu wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>
> +u64 kvm_get_core_capability(void)
> +{
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_get_core_capability);
Why is this global and exported? The only users are in this very file.
Thanks,
tglx
On Wed, 3 Apr 2019, Fenghua Yu wrote:
> @@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> u32 index;
>
> switch (msr_info->index) {
> + case MSR_TEST_CTL:
> + if (!msr_info->host_initiated &&
> + !(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
Errm? If the MSR_TEST_CTL is exposed to the guest via CPUID then the
rdmsr() in the guest is not supposed to fail just because
CORE_CAP_SPLIT_LOCK_DETECT is not set.
vmx->msr_test_ctl holds the proper value, which is either 0 or
CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported.
So the chek needs to be guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY).
> + return 1;
> + msr_info->data = vmx->msr_test_ctl;
> + break;
> #ifdef CONFIG_X86_64
> case MSR_FS_BASE:
> msr_info->data = vmcs_readl(GUEST_FS_BASE);
> @@ -1797,6 +1803,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> u32 index;
>
> switch (msr_index) {
> + case MSR_TEST_CTL:
> + if (!(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> + return 1;
Again, this wants to be a check for the availability of the MSR in the
guest cpuid and not to the CORE_CAP_SPLIT_LOCK_DETECT magic bit.
> +
> + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
> + return 1;
and this one wants to be:
if (data & vmx->msr_test_ctl_mask)
so additional bits can be enabled later on at exactly one point in the
code, i.e. during vm init.
> + vmx->msr_test_ctl = data;
> + break;
>
> +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> +{
> + u64 host_msr_test_ctl;
> +
> + /* if TEST_CTL MSR doesn't exist on the hardware, do nothing */
> + if (rdmsrl_safe(MSR_TEST_CTL, &host_msr_test_ctl))
> + return;
Yikes. So on hosts which do not have MSR_TEST_CTL this takes a fault on
every VM enter. The host knows whether it has this MSR or not.
Even if it exists there is no point to do the rdmsr on every vmenter. The
value should be cached per cpu. It only changes when:
1) #AC triggers in kernel space
2) The sysfs knob is flipped
#1 It happens either _BEFORE_ or _AFTER_ atomic_switch_msr_test_ctl(). In
both cases the cached value is correct in atomic_switch_msr_test_ctl().
If it happens _AFTER_ atomic_switch_msr_test_ctl() then the VMEXIT will
restore the enabled bit, but there is nothing you can do about that.
#2 CANNOT happen in the context of vcpu_run().
So this wants to be:
host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);
> + if (host_msr_test_ctl == vmx->msr_test_ctl)
> + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> + else
> + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> + host_msr_test_ctl, false);
> +}
> +
> u64 kvm_get_core_capability(void)
> {
> - return 0;
> + u64 data;
> +
> + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
if (!boot_cpu_has(X86_FEATURE_CORE_CAPABILITY))
return 0;
> + /* mask non-virtualizable functions */
> + data &= CORE_CAP_SPLIT_LOCK_DETECT;
Why?
> + /*
> + * There will be a list of FMS values that have split lock detection
> + * but lack the CORE CAPABILITY MSR. In this case, set
> + * CORE_CAP_SPLIT_LOCK_DETECT since we emulate MSR CORE_CAPABILITY.
That's completely wrong. X86_FEATURE_SPLIT_LOCK_DETECT is set when the
capability is enumerated in CPUID or it's set via the FMS quirk.
So:
data = 0;
> + if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> + data |= CORE_CAP_SPLIT_LOCK_DETECT;
> +
> + return data;
is absolutely sufficient.
Thanks,
tglx
On Thu, 2019-04-04 at 07:44 -0700, Sean Christopherson wrote:
> On Wed, Apr 03, 2019 at 02:21:58PM -0700, Fenghua Yu wrote:
> > From: Xiaoyao Li <[email protected]>
> >
> > A control bit (bit 29) in TEST_CTL MSR 0x33 will be introduced in
> > future x86 processors. When bit 29 is set, the processor causes #AC
> > exception for split locked accesses at all CPL.
> >
> > Please check the latest Intel 64 and IA-32 Architectures Software
> > Developer's Manual for more detailed information on the MSR and
> > the split lock bit.
> >
> > This patch emulate MSR TEST_CTL with vmx->msr_test_ctl and does the
> > following:
> > 1. As MSR TEST_CTL of guest is emulated, enable the related bits
> > in CORE_CAPABILITY to corretly report this feature to guest.
>
> s/corretly/correctly
will correct it. thanks.
> >
> > 2. Differentiate MSR TEST_CTL between host and guest.
> >
> > Signed-off-by: Xiaoyao Li <[email protected]>
> > Signed-off-by: Fenghua Yu <[email protected]>
> > Acked-by: Paolo Bonzini <[email protected]>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
> > arch/x86/kvm/vmx/vmx.h | 1 +
> > arch/x86/kvm/x86.c | 17 ++++++++++++++++-
> > 3 files changed, 52 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index ab432a930ae8..309ccf593f0d 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > u32 index;
> >
> > switch (msr_info->index) {
> > + case MSR_TEST_CTL:
> > + if (!msr_info->host_initiated &&
> > + !(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> > + return 1;
> > + msr_info->data = vmx->msr_test_ctl;
> > + break;
> > #ifdef CONFIG_X86_64
> > case MSR_FS_BASE:
> > msr_info->data = vmcs_readl(GUEST_FS_BASE);
> > @@ -1797,6 +1803,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > u32 index;
> >
> > switch (msr_index) {
> > + case MSR_TEST_CTL:
> > + if (!(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> > + return 1;
> > +
> > + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
> > + return 1;
> > + vmx->msr_test_ctl = data;
> > + break;
> > case MSR_EFER:
> > ret = kvm_set_msr_common(vcpu, msr_info);
> > break;
> > @@ -4077,6 +4091,9 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> > ++vmx->nmsrs;
> > }
> >
> > + /* disable AC split lock by default */
> > + vmx->msr_test_ctl = 0;
> > +
> > vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
> >
> > /* 22.2.1, 20.8.1 */
> > @@ -4114,6 +4131,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool
> > init_event)
> >
> > vmx->rmode.vm86_active = 0;
> > vmx->spec_ctrl = 0;
> > + vmx->msr_test_ctl = 0;
> >
> > vcpu->arch.microcode_version = 0x100000000ULL;
> > vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> > @@ -6313,6 +6331,21 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx
> > *vmx)
> > msrs[i].host, false);
> > }
> >
> > +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> > +{
> > + u64 host_msr_test_ctl;
> > +
> > + /* if TEST_CTL MSR doesn't exist on the hardware, do nothing */
> > + if (rdmsrl_safe(MSR_TEST_CTL, &host_msr_test_ctl))
> > + return;
>
> This adds a RDMSR on every VM-Enter, and a fault on CPUs that don't
> support MSR_TEST_CTL. Ideally the kernel would cache MSR_TEST_CTL and
> expose a helper that returns a boolean to indicate the existence of the
> MSRs along with the current value. Racing with split_lock_detect_store()
> is ok since this code runs with interrupts disabled, i.e. will block
> split_lock_detect_store() until after VM-Exit.
>
> Paolo, can you weigh in with your thoughts? I'm surprised you acked
> this patch given your earlier comment:
>
> https://patchwork.kernel.org/patch/10413779/#21892723
In v4 patchset, it checks boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) first in
atomic_switch_msr_test_ctl().
In v5 patchset, considering that there is another bit (bit 31) in MSR_TEST_CTL,
and !boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) cannot guarantee there is no
MSR_TEST_CTL in hardware. So I changed it to rdmsrl_safe() in v5.
It's my fault that I didn't point it out in the changelog in v5 patchset.
Considering the overhead of rdmsr every VMENTRY. I will use a percpu variable
msr_test_ctl_cache based on Thomas's comment to cache the value of host's
MSR_TEST_CTL.
> > +
> > + if (host_msr_test_ctl == vmx->msr_test_ctl)
> > + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> > + else
> > + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> > + host_msr_test_ctl, false);
> > +}
> > +
> > static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
> > {
> > vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
> > @@ -6419,6 +6452,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >
> > atomic_switch_perf_msrs(vmx);
> >
> > + atomic_switch_msr_test_ctl(vmx);
> > +
> > vmx_update_hv_timer(vcpu);
> >
> > /*
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index a1e00d0a2482..6091a8b9de74 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -190,6 +190,7 @@ struct vcpu_vmx {
> > u64 msr_guest_kernel_gs_base;
> > #endif
> >
> > + u64 msr_test_ctl;
> > u64 spec_ctrl;
> >
> > u32 vm_entry_controls_shadow;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 4459115eb0ec..e93c2f620cdb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1229,7 +1229,22 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
> >
> > u64 kvm_get_core_capability(void)
> > {
> > - return 0;
> > + u64 data;
> > +
> > + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
> > +
> > + /* mask non-virtualizable functions */
> > + data &= CORE_CAP_SPLIT_LOCK_DETECT;
> > +
> > + /*
> > + * There will be a list of FMS values that have split lock detection
> > + * but lack the CORE CAPABILITY MSR. In this case, set
> > + * CORE_CAP_SPLIT_LOCK_DETECT since we emulate MSR CORE_CAPABILITY.
> > + */
> > + if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > + data |= CORE_CAP_SPLIT_LOCK_DETECT;
> > +
> > + return data;
> > }
> > EXPORT_SYMBOL_GPL(kvm_get_core_capability);
> >
> > --
> > 2.19.1
> >
Hi, Thomas,
On Fri, 2019-04-05 at 14:30 +0200, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > @@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > u32 index;
> >
> > switch (msr_info->index) {
> > + case MSR_TEST_CTL:
> > + if (!msr_info->host_initiated &&
> > + !(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
>
> Errm? If the MSR_TEST_CTL is exposed to the guest via CPUID then the
> rdmsr() in the guest is not supposed to fail just because
> CORE_CAP_SPLIT_LOCK_DETECT is not set.
you're right.
> vmx->msr_test_ctl holds the proper value, which is either 0 or
> CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported.
>
> So the chek needs to be guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY).
I don't think so. There is no dependecy between
guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) and MSR_TEST_CTL.
guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) only indicates that guest has MSR
CORE_CAPABILITY.
Due to the fact that MSR_TEST_CTL is emulated with vmx->msr_test_ctl. I think it
's ok to always let userspace or guest to read MSR_TEST_CTL, it just returns the
emulated value. Like you said, " vmx->msr_test_ctl holds the proper value, which
is either 0 or CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported."
> > + return 1;
> > + msr_info->data = vmx->msr_test_ctl;
> > + break;
> > #ifdef CONFIG_X86_64
> > case MSR_FS_BASE:
> > msr_info->data = vmcs_readl(GUEST_FS_BASE);
> > @@ -1797,6 +1803,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > u32 index;
> >
> > switch (msr_index) {
> > + case MSR_TEST_CTL:
> > + if (!(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> > + return 1;
>
> Again, this wants to be a check for the availability of the MSR in the
> guest cpuid and not to the CORE_CAP_SPLIT_LOCK_DETECT magic bit.
>
> > +
> > + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
> > + return 1;
>
> and this one wants to be:
>
> if (data & vmx->msr_test_ctl_mask)
>
> so additional bits can be enabled later on at exactly one point in the
> code, i.e. during vm init.
Again, in wrmsr handler, since MSR_TEST_CTL is emulated, it can be considered
that guest always has this MSR. It just needs to return 1 when reserved bits are
set.
Using vmx->msr_test_ctl_mask is a good idea. I will use it in next version.
> > + vmx->msr_test_ctl = data;
> > + break;
> >
> > +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> > +{
> > + u64 host_msr_test_ctl;
> > +
> > + /* if TEST_CTL MSR doesn't exist on the hardware, do nothing */
> > + if (rdmsrl_safe(MSR_TEST_CTL, &host_msr_test_ctl))
> > + return;
>
> Yikes. So on hosts which do not have MSR_TEST_CTL this takes a fault on
> every VM enter. The host knows whether it has this MSR or not.
>
> Even if it exists there is no point to do the rdmsr on every vmenter. The
> value should be cached per cpu. It only changes when:
>
> 1) #AC triggers in kernel space
>
> 2) The sysfs knob is flipped
>
> #1 It happens either _BEFORE_ or _AFTER_ atomic_switch_msr_test_ctl(). In
> both cases the cached value is correct in atomic_switch_msr_test_ctl().
>
> If it happens _AFTER_ atomic_switch_msr_test_ctl() then the VMEXIT will
> restore the enabled bit, but there is nothing you can do about that.
>
> #2 CANNOT happen in the context of vcpu_run().
>
> So this wants to be:
>
> host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);
You're right and thanks for your advice.
I will take it in next version.
> > + if (host_msr_test_ctl == vmx->msr_test_ctl)
> > + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> > + else
> > + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> > + host_msr_test_ctl, false);
> > +}
> > +
> > u64 kvm_get_core_capability(void)
> > {
> > - return 0;
> > + u64 data;
> > +
> > + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
>
> if (!boot_cpu_has(X86_FEATURE_CORE_CAPABILITY))
> return 0;
>
> > + /* mask non-virtualizable functions */
> > + data &= CORE_CAP_SPLIT_LOCK_DETECT;
>
> Why?
>
> > + /*
> > + * There will be a list of FMS values that have split lock detection
> > + * but lack the CORE CAPABILITY MSR. In this case, set
> > + * CORE_CAP_SPLIT_LOCK_DETECT since we emulate MSR CORE_CAPABILITY.
>
> That's completely wrong. X86_FEATURE_SPLIT_LOCK_DETECT is set when the
> capability is enumerated in CPUID or it's set via the FMS quirk.
>
> So:
> data = 0;
> > + if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > + data |= CORE_CAP_SPLIT_LOCK_DETECT;
> > +
> > + return data;
>
> is absolutely sufficient.
Right, X86_FEATURE_SPLIT_LOCK_DETECT is set in 2 situation:
#1 the capability is enumerated by CORE_CAP_SPLIT_LOCK_DETECT (bit 5 of
MSR_IA32_CORE_CAPABILITY)
#2 via FMS list, in which those processors have split lock detection
feature but don't have MSR_IA32_CORE_CAPABILITY.
There two pathes work well for host, accurately for real FMS.
However, when it comes to virtualization, we can emulate a different FMS of
guest from host.
Considering the following case:
The host cpu is ICELAKE_MOBILE, which doesn't have X86_FEATURE_CORE_CAPABILITY
but has X86_FEATURE_SPLIT_LOCK_DETECT.
If we run a guest with cpu model, Skylake. It will hidden the feature
X86_FEATURE_SPLIT_LOCK_DETECT from guest, since guest's MSR_IA32_CORE_CAPABILITY
reports zero, and FMS of guest doesn't match the list.
In this way, it failed to expose the X86_FEATURE_SPLIT_LOCK_DETECT to guest.
Since MSR_IA32_CORE_CAPBILITY is emulated in software for guest. We can enforce
using #1 to report X86_FEATURE_SPLIT_LOCK_DETECT of guest, thus guest can get
rid of the limitation of the FMS list.
> Thanks,
>
> tglx
On Mon, Apr 08, 2019 at 05:54:25PM +0800, Xiaoyao Li wrote:
> On Fri, 2019-04-05 at 14:30 +0200, Thomas Gleixner wrote:
> > On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > > @@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct
> > > msr_data *msr_info)
> > > u32 index;
> > >
> > > switch (msr_info->index) {
> > > + case MSR_TEST_CTL:
> > > + if (!msr_info->host_initiated &&
> > > + !(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> >
> > Errm? If the MSR_TEST_CTL is exposed to the guest via CPUID then the
> > rdmsr() in the guest is not supposed to fail just because
> > CORE_CAP_SPLIT_LOCK_DETECT is not set.
>
> you're right.
>
> > vmx->msr_test_ctl holds the proper value, which is either 0 or
> > CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported.
> >
> > So the chek needs to be guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY).
>
> I don't think so. There is no dependecy between
> guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) and MSR_TEST_CTL.
> guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) only indicates that guest has MSR
> CORE_CAPABILITY.
>
> Due to the fact that MSR_TEST_CTL is emulated with vmx->msr_test_ctl. I think it
> 's ok to always let userspace or guest to read MSR_TEST_CTL, it just returns the
> emulated value. Like you said, " vmx->msr_test_ctl holds the proper value, which
> is either 0 or CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported."
Assuming the next version implements "vmx->msr_test_ctl_mask", KVM should
inject #GP if the guest attempts RDMSR(MSR_TEST_CTL) and the mask is zero.
It stands to reason that a kernel can only reasonably assume the MSR exists
if the (virtual) CPU supports at least one feature enabled via MSR_TEST_CTL.
On Fri, 2019-04-05 at 14:00 +0200, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >
> > +u64 kvm_get_core_capability(void)
> > +{
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_get_core_capability);
>
> Why is this global and exported? The only users are in this very file.
You're right.
Will remove it in next version.
> Thanks,
>
> tglx
On Thu, Apr 04, 2019 at 08:07:57PM +0200, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Fenghua Yu wrote:
>
> > +static void init_split_lock_detect(struct cpuinfo_x86 *c)
> > +{
> > + if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
> > + u32 l, h;
> > +
> > + mutex_lock(&split_lock_detect_mutex);
> > + rdmsr(MSR_TEST_CTL, l, h);
> > + l = new_sp_test_ctl_val(l);
> > + wrmsr(MSR_TEST_CTL, l, h);
> > + show_split_lock_detection_info();
> > + mutex_unlock(&split_lock_detect_mutex);
> > + }
> > +}
> > +
> > static void early_init_intel(struct cpuinfo_x86 *c)
> > {
> > u64 misc_enable;
> >
> > + init_split_lock_detect(c);
>
> so we have in early boot:
>
> early_cpu_init()
> early_identify_cpu()
> this_cpu->c_early_init(c)
> early_init_intel() {
> init_split_lock_detect();
> }
> ....
> cpu_set_core_cap_bits(c)
> set(FEATURE_SPLIT_LOCK)
>
> I don't have to understand how init_split_lock_detect() will magically see
> the feature bit which gets set afterwards, right?
early_init_intel() is called twice on the boot CPU. Besides it's called
in earl_cpu_init(), it's also called in:
identify_boot_cpu()
identify_cpu()
init_intel()
early_init_intel()
init_split_lock_detect();
It's true that init_split_lock_detect() doesn't see the feature bit when
it's called for the first time in early_cpu_init(). But it sees the feature
bit when it's called for the second time in identify_boot_cpu().
So is init_split_lock_detect() in the right place?
Thanks.
-Fenghua
On Mon, 2019-04-08 at 10:48 -0700, Sean Christopherson wrote:
> On Mon, Apr 08, 2019 at 05:54:25PM +0800, Xiaoyao Li wrote:
> > On Fri, 2019-04-05 at 14:30 +0200, Thomas Gleixner wrote:
> > > On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > > > @@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu,
> > > > struct
> > > > msr_data *msr_info)
> > > > u32 index;
> > > >
> > > > switch (msr_info->index) {
> > > > + case MSR_TEST_CTL:
> > > > + if (!msr_info->host_initiated &&
> > > > + !(vcpu->arch.core_capability &
> > > > CORE_CAP_SPLIT_LOCK_DETECT))
> > >
> > > Errm? If the MSR_TEST_CTL is exposed to the guest via CPUID then the
> > > rdmsr() in the guest is not supposed to fail just because
> > > CORE_CAP_SPLIT_LOCK_DETECT is not set.
> >
> > you're right.
> >
> > > vmx->msr_test_ctl holds the proper value, which is either 0 or
> > > CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported.
> > >
> > > So the chek needs to be guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY).
> >
> > I don't think so. There is no dependecy between
> > guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) and MSR_TEST_CTL.
> > guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) only indicates that guest has
> > MSR
> > CORE_CAPABILITY.
> >
> > Due to the fact that MSR_TEST_CTL is emulated with vmx->msr_test_ctl. I
> > think it
> > 's ok to always let userspace or guest to read MSR_TEST_CTL, it just returns
> > the
> > emulated value. Like you said, " vmx->msr_test_ctl holds the proper value,
> > which
> > is either 0 or CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported."
>
> Assuming the next version implements "vmx->msr_test_ctl_mask", KVM should
> inject #GP if the guest attempts RDMSR(MSR_TEST_CTL) and the mask is zero.
> It stands to reason that a kernel can only reasonably assume the MSR exists
> if the (virtual) CPU supports at least one feature enabled via MSR_TEST_CTL.
It makes sense to me. Thanks for your reminder.
Will apply it in next version.
On Tue, 9 Apr 2019, Fenghua Yu wrote:
> On Thu, Apr 04, 2019 at 08:07:57PM +0200, Thomas Gleixner wrote:
> > On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > > static void early_init_intel(struct cpuinfo_x86 *c)
> > > {
> > > u64 misc_enable;
> > >
> > > + init_split_lock_detect(c);
> >
> > so we have in early boot:
> >
> > early_cpu_init()
> > early_identify_cpu()
> > this_cpu->c_early_init(c)
> > early_init_intel() {
> > init_split_lock_detect();
> > }
> > ....
> > cpu_set_core_cap_bits(c)
> > set(FEATURE_SPLIT_LOCK)
> >
> > I don't have to understand how init_split_lock_detect() will magically see
> > the feature bit which gets set afterwards, right?
>
> early_init_intel() is called twice on the boot CPU. Besides it's called
> in earl_cpu_init(), it's also called in:
> identify_boot_cpu()
> identify_cpu()
> init_intel()
> early_init_intel()
> init_split_lock_detect();
>
> It's true that init_split_lock_detect() doesn't see the feature bit when
> it's called for the first time in early_cpu_init(). But it sees the feature
> bit when it's called for the second time in identify_boot_cpu().
That's hideous, really.
> So is init_split_lock_detect() in the right place?
You're not seriously asking that?
It's obviously not the right place. We are not placing calls at random
points just because they happen to work by chance.
Thanks,
tglx
FWIW it took me a while to work out what a 'split lock' was.
I suspect because I was thinking of kernel locks, not the
instruction lock prefix.
It also isn't really obvious that 'split' refers to crossing
cache lines.
Referring to it as a 'misaligned lock' might be more
easily understood.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Apr 10, 2019 at 08:31:31AM +0200, Thomas Gleixner wrote:
> On Tue, 9 Apr 2019, Fenghua Yu wrote:
> > On Thu, Apr 04, 2019 at 08:07:57PM +0200, Thomas Gleixner wrote:
> > > On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > > > static void early_init_intel(struct cpuinfo_x86 *c)
> > > > {
> > > > u64 misc_enable;
> > > >
> > > > + init_split_lock_detect(c);
> > >
> > > so we have in early boot:
> > >
> > > early_cpu_init()
> > > early_identify_cpu()
> > > this_cpu->c_early_init(c)
> > > early_init_intel() {
> > > init_split_lock_detect();
> > > }
> > > ....
> > > cpu_set_core_cap_bits(c)
> > > set(FEATURE_SPLIT_LOCK)
> > >
> > > I don't have to understand how init_split_lock_detect() will magically see
> > > the feature bit which gets set afterwards, right?
> >
> > early_init_intel() is called twice on the boot CPU. Besides it's called
> > in earl_cpu_init(), it's also called in:
> > identify_boot_cpu()
> > identify_cpu()
> > init_intel()
> > early_init_intel()
> > init_split_lock_detect();
> >
> > It's true that init_split_lock_detect() doesn't see the feature bit when
> > it's called for the first time in early_cpu_init(). But it sees the feature
> > bit when it's called for the second time in identify_boot_cpu().
>
> That's hideous, really.
>
> > So is init_split_lock_detect() in the right place?
>
> You're not seriously asking that?
>
> It's obviously not the right place. We are not placing calls at random
> points just because they happen to work by chance.
Is it OK to put init_split_lock_detect(c) after early_init_intel() in
init_intel()? X86_FEATURE_SPLIT_LOCK_DETECT is available now and
init_split_lock_detec() is called only once on each CPU.
@@ -746,6 +749,8 @@ static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
+ init_split_lock_detect(c);
+
intel_workarounds(c);
Thanks.
-Fenghua