==Introduction==
A split lock is any atomic operation whose operand crosses two cache
lines. Since the operand spans two cache lines and the operation must
be atomic, the system locks the bus while the CPU accesses the two cache
lines.
During bus locking, requests from other CPUs or bus agents for control
of the bus are blocked. Blocking bus access from other CPUs, plus the
overhead of the bus locking protocol, degrades not only the performance
of the CPU issuing the atomic operation but overall system performance.
If the operand is cacheable and completely contained in one cache line, the
atomic operation is optimized with less expensive cache locking on Intel P6
and later processors. If a split lock is detected and the two cache lines
holding the operand can be merged into one, cache locking can be used
instead of the more expensive bus locking. Removing split locks can
therefore improve overall performance.
Instructions that may cause a split lock include lock add, lock btc,
xchg, lsl, far call, ltr, etc.
More information about split locks, bus locking, and cache locking can be
found in the latest Intel 64 and IA-32 Architectures Software Developer's
Manual.
==Split lock detection==
Currently we can trace the split lock event counter sq_misc.split_lock
for debugging purposes. But for systems deployed in the field, counting
events after the fact is insufficient. We need a mechanism that detects
a split lock before it happens, ensuring that a bus lock is never
incurred due to a split lock.
On Tremont and other future processors, Intel introduces a mechanism that
detects split locks via the Alignment Check (#AC) exception, before a
misaligned atomic instruction can impact whole-system performance.
This capability is critical for real-time system designers who build
consolidated real-time systems. These systems run hard real-time
code on some cores and "untrusted" user processes on other cores. The
hard real-time code cannot afford to have any bus lock from the
untrusted processes hurt real-time performance. To date, designers have
been unable to deploy such solutions because they have no way to prevent
the "untrusted" user code from generating split locks, and hence bus
locks, that block the hard real-time code from accessing memory during
bus locking.
This capability may also find use in the cloud. A user process with a
split lock running in one guest can block other cores from accessing
shared memory during its split-locked memory access, which may degrade
overall system performance.
A split lock may also open a security hole: malicious user code can slow
down the overall system by executing instructions with split locks.
==Enumerate #AC for split lock==
A control bit (bit 29) in the TEST_CTL MSR (0x33) will be introduced in
future x86 processors. When bit 29 is set, the processor raises an
#AC exception for split-locked accesses at all CPLs.
The split lock detection feature is enumerated through bit 5 in
MSR IA32_CORE_CAPABILITY (0xCF). The MSR 0xCF itself is enumerated by
CPUID.(EAX=0x7,ECX=0):EDX[30].
The enumeration method is published in the latest Intel Software
Developer's Manual.
A few processors don't follow the above enumeration. For those processors,
we can enable #AC for split lock based on their FMS (family/model/stepping)
numbers. We will enable split lock detection on those processors once
their FMS numbers are public.
==Handle split lock==
There may be different considerations in handling split locks, e.g. how
to handle a split lock issue in firmware after the kernel enables #AC for
split lock.
But we use a simple approach, suggested by Thomas Gleixner and Dave
Hansen:
- If a split lock happens in the kernel, a warning is issued and #AC for
split lock is disabled on the local CPU. The split lock issue should
be fixed in the kernel.
- If a split lock happens in a user process, the process is killed by
SIGBUS. Unless the issue is fixed, the process cannot run on the
system.
- If a split lock happens in firmware, the system may hang in firmware.
The issue should be fixed in firmware.
- Split lock detection is enabled by default once the feature is enumerated.
- Split lock detection can be disabled with "clearcpuid=split_lock_detect"
at boot time.
- Split lock detection can be disabled/enabled through the interface
/sys/devices/system/cpu/split_lock_detect at run time.
==Expose to guest==
To expose this feature to a guest, we need to:
1. Report the new CPUID bit to the guest.
2. Emulate the IA32_CORE_CAPABILITY MSR.
3. Emulate the TEST_CTL MSR. This is mandatory: since this patch
series enables split lock detection in the host kernel by default, if we
do not emulate the TEST_CTL MSR for the guest, the guest will run with
the value set by the host without knowing it. The guest would then run
with split lock detection enabled due to the host's value, and a guest
running buggy firmware or an old kernel would fail because it lacks the
ability to handle #AC for split lock. So we emulate the TEST_CTL MSR and
keep separate values for host and guest.
==Patches==
Patch 1-4: Fix a few split lock issues.
Patch 5-7: Enumerate features.
Patch 8-12: Extend "clearcpuid=" option.
Patch 13: Handle #AC for split lock
Patch 14: /sys interface to enable/disable split lock detection
Patch 15-17: Enable split lock detection in KVM.
==Changelog==
v4:
- Remove "setcpuid=" option
- Enable IA32_CORE_CAPABILITY enumeration for split lock
- Handle CPUID faulting by Peter Zijlstra
- Enable /sys interface to enable/disable split lock detection
v3:
- Handle split lock as suggested by Thomas Gleixner.
- Fix a few potential split lock issues suggested by Thomas Gleixner.
- Support kernel option "setcpuid=" suggested by Dave Hansen and Thomas
Gleixner.
- Support flag string in "clearcpuid=" suggested by Dave Hansen and
Thomas Gleixner.
v2:
- Remove code that handles split lock issue in firmware and fix
x86_capability issue mainly based on comments from Thomas Gleixner and
Peter Zijlstra.
In previous version:
Comments from Dave Hansen:
- Enumerate feature in X86_FEATURE_SPLIT_LOCK_AC
- Separate #AC handler from do_error_trap
- Use CONFIG to configure inheriting the BIOS setting, enabling, or
disabling split lock detection. Remove kernel parameter "split_lock_ac="
- Change config interface to debugfs from sysfs
- Fix a few bisectable issues
- Other changes.
Comment from Tony Luck and Dave Hansen:
- Dump right information in #AC handler
Comment from Alan Cox and Dave Hansen:
- Description of split lock in patch 0
Others:
- Remove tracing because we can trace split lock in existing
sq_misc.split_lock.
- Add CONFIG to configure either panic or re-execute faulting instruction
for split lock in kernel.
- other minor changes.
Fenghua Yu (13):
x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long
drivers/net/b44: Align pwol_mask to unsigned long for better
performance
wlcore: Align reg_ch_conf_pending and tmp_ch_bitmap to unsigned long
for better performance
x86/split_lock: Align x86_capability to unsigned long to avoid split
locked access
x86/cpufeatures: Enumerate IA32_CORE_CAPABILITIES MSR
x86/msr-index: Define IA32_CORE_CAPABILITY MSR and #AC exception for
split lock bit
x86/split_lock: Enumerate #AC for split lock by MSR
IA32_CORE_CAPABILITY
x86/clearcpuid: Support multiple clearcpuid options
x86/clearcpuid: Support feature flag string in kernel option
clearcpuid
x86/clearcpuid: Apply cleared feature bits that are forced set before
Change document for kernel option clearcpuid
x86/split_lock: Handle #AC exception for split lock
x86/split_lock: Add a sysfs interface to allow user to enable or
disable split lock detection on all CPUs during run time
Peter Zijlstra (1):
x86/clearcpuid: Handle clearcpuid faulting
Xiaoyao Li (3):
kvm: x86: Report CORE_CAPABILITY on GET_SUPPORTED_CPUID
kvm: x86: Add support IA32_CORE_CAPABILITY MSR
kvm: vmx: Emulate TEST_CTL MSR
Documentation/admin-guide/kernel-parameters.txt | 6 +-
arch/x86/include/asm/cmdline.h | 3 +
arch/x86/include/asm/cpu.h | 7 +
arch/x86/include/asm/cpufeature.h | 5 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/msr-index.h | 7 +
arch/x86/include/asm/processor.h | 4 +-
arch/x86/kernel/cpu/common.c | 13 +-
arch/x86/kernel/cpu/cpuid-deps.c | 79 +++++++++++
arch/x86/kernel/cpu/intel.c | 170 +++++++++++++++++++++++-
arch/x86/kernel/cpu/scattered.c | 17 +++
arch/x86/kernel/fpu/init.c | 33 +++--
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 42 ++++++
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/vmx/vmx.c | 58 ++++++++
arch/x86/kvm/vmx/vmx.h | 2 +
arch/x86/kvm/x86.c | 17 ++-
arch/x86/lib/cmdline.c | 17 ++-
drivers/net/ethernet/broadcom/b44.c | 3 +-
drivers/net/wireless/ti/wlcore/cmd.c | 3 +-
drivers/net/wireless/ti/wlcore/wlcore.h | 6 +-
23 files changed, 473 insertions(+), 28 deletions(-)
--
2.7.4
A new IA32_CORE_CAPABILITY MSR (0xCF) is defined. Each bit in
the MSR enumerates a model specific feature. Currently bit 5 enumerates
the #AC exception for split locked accesses: when bit 5 is 1, split
locked accesses generate an #AC exception; when bit 5 is 0, they do not.
Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more detailed information
on the MSR and the split lock bit.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/msr-index.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 8e40c2446fd1..549e73dcca15 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -59,6 +59,9 @@
#define MSR_PLATFORM_INFO_CPUID_FAULT_BIT 31
#define MSR_PLATFORM_INFO_CPUID_FAULT BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
+#define MSR_IA32_CORE_CAPABILITY 0x000000cf
+#define CORE_CAP_SPLIT_LOCK_DETECT BIT(5) /* Detect split lock */
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
--
2.7.4
From: Xiaoyao Li <[email protected]>
A control bit (bit 29) in TEST_CTL MSR 0x33 will be introduced in
future x86 processors. When bit 29 is set, the processor causes #AC
exception for split locked accesses at all CPL.
Please check the latest Intel Software Developer's Manual
for more detailed information on the MSR and the split lock bit.
1. The kernel enables #AC for split lock by default, so if we don't
emulate the TEST_CTL MSR for the guest, the guest will run with this
feature enabled without knowing it. Existing guests with buggy firmware
(like OVMF) and old kernels that have cross-cache-line issues will then
fail to boot due to #AC.
So we should emulate the TEST_CTL MSR and set it to zero to disable #AC
for split lock by default. Whether and when to enable it is left to the
guest firmware and guest kernel.
2. Host and guest can enable #AC for split lock independently, so use
MSR autoload to switch it during VM entry/exit.
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 1 +
2 files changed, 36 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3e03c6e1e558..c0c5e8621afa 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1659,6 +1659,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
u32 index;
switch (msr_info->index) {
+ case MSR_TEST_CTL:
+ if (!msr_info->host_initiated &&
+ !(vmx->core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
+ return 1;
+ msr_info->data = vmx->msr_test_ctl;
+ break;
#ifdef CONFIG_X86_64
case MSR_FS_BASE:
msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1805,6 +1811,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
u32 index;
switch (msr_index) {
+ case MSR_TEST_CTL:
+ if (!(vmx->core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
+ return 1;
+
+ if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
+ return 1;
+ vmx->msr_test_ctl = data;
+ break;
case MSR_EFER:
ret = kvm_set_msr_common(vcpu, msr_info);
break;
@@ -4108,6 +4122,9 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
vmx->arch_capabilities = kvm_get_arch_capabilities();
+ /* disable AC split lock by default */
+ vmx->msr_test_ctl = 0;
+
vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
/* 22.2.1, 20.8.1 */
@@ -4145,6 +4162,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->rmode.vm86_active = 0;
vmx->spec_ctrl = 0;
+ vmx->msr_test_ctl = 0;
vcpu->arch.microcode_version = 0x100000000ULL;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
@@ -6344,6 +6362,21 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host, false);
}
+static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
+{
+ u64 host_msr_test_ctl;
+
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+ return;
+
+ rdmsrl(MSR_TEST_CTL, host_msr_test_ctl);
+ if (host_msr_test_ctl == vmx->msr_test_ctl)
+ clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
+ else
+ add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
+ host_msr_test_ctl, false);
+}
+
static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
{
vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
@@ -6585,6 +6618,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
atomic_switch_perf_msrs(vmx);
+ atomic_switch_msr_test_ctl(vmx);
+
vmx_update_hv_timer(vcpu);
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index cc22379991f3..e8831609c6c3 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -191,6 +191,7 @@ struct vcpu_vmx {
u64 msr_guest_kernel_gs_base;
#endif
+ u64 msr_test_ctl;
u64 core_capability;
u64 arch_capabilities;
u64 spec_ctrl;
--
2.7.4
From: Xiaoyao Li <[email protected]>
In the latest Intel SDM, CPUID.(EAX=7H,ECX=0):EDX[30] will enumerate
the presence of the IA32_CORE_CAPABILITY MSR.
Update GET_SUPPORTED_CPUID to expose this feature bit to user space, so
that user space knows this bit can be enabled in CPUID.
Signed-off-by: Xiaoyao Li <[email protected]>
---
arch/x86/kvm/cpuid.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c07958b59f50..e0e17b9c65da 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -410,7 +410,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
- F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
+ F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(CORE_CAPABILITY) |
+ F(INTEL_STIBP);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
--
2.7.4
There may be different considerations on how to handle #AC for split lock,
e.g. how to handle a system hang caused by a split lock issue in firmware,
how to emulate the faulting instruction, etc. We use a simple method to
handle user and kernel split locks and may extend the method in the future.
When the #AC exception for split lock is triggered from a user process,
the process is killed by SIGBUS. To run the process properly, the user
application developer needs to fix the split lock issue.
When the #AC exception for split lock is triggered from a kernel
instruction, disable #AC for split lock on the local CPU and warn about
the split lock issue. After the exception, the faulting instruction is
re-executed and kernel execution continues. #AC for split lock is only
disabled on the local CPU, not globally. It is re-enabled if the CPU goes
offline and then comes back online.
Kernel developers should check the warning, which contains the helpful
faulting address, context, and callstack info, and fix the split lock
issues one by one. Then further split locks can be captured and fixed.
After bit 29 in MSR_TEST_CTL is set to 1 by the kernel, firmware inherits
the setting when firmware is executed in S4, S5, run time services, SMI,
etc. A split lock issue in firmware triggers #AC and may hang the system,
depending on how firmware handles the #AC. It's up to firmware developers
to fix the split lock issues in firmware.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/msr-index.h | 4 ++++
arch/x86/kernel/traps.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 549e73dcca15..d2d43ebcb8ab 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -39,6 +39,10 @@
/* Intel MSRs. Some also available on other CPUs */
+#define MSR_TEST_CTL 0x00000033
+#define TEST_CTL_ENABLE_SPLIT_LOCK_DETECT_SHIFT 29
+#define TEST_CTL_ENABLE_SPLIT_LOCK_DETECT BIT(29)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c43c2608332e..63a4dc985340 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
#include <asm/mpx.h>
#include <asm/vm86.h>
#include <asm/umip.h>
+#include <asm/cpu.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -292,7 +293,37 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
+#ifndef CONFIG_CPU_SUP_INTEL
DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
+#else
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ unsigned int trapnr = X86_TRAP_AC;
+ char str[] = "alignment check";
+ int signr = SIGBUS;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
+ NOTIFY_STOP) {
+ cond_local_irq_enable(regs);
+ if (!user_mode(regs)) {
+ /*
+ * Only split lock can generate #AC from kernel. Warn
+ * and disable #AC for split lock on current CPU.
+ */
+ msr_clear_bit(MSR_TEST_CTL,
+ TEST_CTL_ENABLE_SPLIT_LOCK_DETECT_SHIFT);
+ WARN_ONCE(1, "A split lock issue is detected.\n");
+
+ return;
+ }
+ /* Handle #AC generated from user code. */
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+ }
+}
+#endif
#undef IP
#ifdef CONFIG_VMAP_STACK
--
2.7.4
From: Xiaoyao Li <[email protected]>
MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 5 of which
reports the capability of enabling detection of split locks (will be
supported on future processors based on Tremont microarchitecture and
later).
Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more detailed information
on the MSR and the split lock bit.
1. Expose it to user space as a feature-enumerating MSR, so that user
space can query it.
2. Emulate MSR_IA32_CORE_CAPABILITY with vmx->core_capability, and add
get and set handlers for MSR_IA32_CORE_CAPABILITY.
User space can set this MSR when customizing the features of a guest,
and can also read the value of this MSR of the guest.
For the guest, since it's a feature-enumerating MSR, the guest can only
read it.
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/x86.c | 17 ++++++++++++++++-
4 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 180373360e34..208f15570d17 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1527,6 +1527,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
unsigned long icr, int op_64_bit);
u64 kvm_get_arch_capabilities(void);
+u64 kvm_get_core_capability(void);
void kvm_define_shared_msr(unsigned index, u32 msr);
int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 30a6bcd735ec..3e03c6e1e558 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = to_vmx(vcpu)->spec_ctrl;
break;
+ case MSR_IA32_CORE_CAPABILITY:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITY))
+ return 1;
+ msr_info->data = vmx->core_capability;
+ break;
case MSR_IA32_ARCH_CAPABILITIES:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -1891,6 +1897,21 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
MSR_TYPE_W);
break;
+ case MSR_IA32_CORE_CAPABILITY:
+ if (!msr_info->host_initiated)
+ return 1;
+ if (data & ~CORE_CAP_SPLIT_LOCK_DETECT)
+ return 1;
+
+ /*
+ * Since AC split lock is a hardware feature, and there is no
+ * software emulation yet, we cannot enable it for guest if
+ * host hardware doesn't support it.
+ */
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+ data &= ~CORE_CAP_SPLIT_LOCK_DETECT;
+ vmx->core_capability = data;
+ break;
case MSR_IA32_ARCH_CAPABILITIES:
if (!msr_info->host_initiated)
return 1;
@@ -4083,6 +4104,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
+ vmx->core_capability = kvm_get_core_capability();
+
vmx->arch_capabilities = kvm_get_arch_capabilities();
vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0ac0a64c7790..cc22379991f3 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -191,6 +191,7 @@ struct vcpu_vmx {
u64 msr_guest_kernel_gs_base;
#endif
+ u64 core_capability;
u64 arch_capabilities;
u64 spec_ctrl;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 941f932373d0..c3c9e3f2d08a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1125,7 +1125,8 @@ static u32 msrs_to_save[] = {
#endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_SPEC_CTRL, MSR_IA32_CORE_CAPABILITY,
+ MSR_IA32_ARCH_CAPABILITIES,
MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
@@ -1197,11 +1198,22 @@ static u32 msr_based_features[] = {
MSR_F10H_DECFG,
MSR_IA32_UCODE_REV,
+ MSR_IA32_CORE_CAPABILITY,
MSR_IA32_ARCH_CAPABILITIES,
};
static unsigned int num_msr_based_features;
+u64 kvm_get_core_capability(void)
+{
+ u64 data;
+
+ rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
+
+ return data;
+}
+EXPORT_SYMBOL_GPL(kvm_get_core_capability);
+
u64 kvm_get_arch_capabilities(void)
{
u64 data;
@@ -1227,6 +1239,9 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
{
switch (msr->index) {
+ case MSR_IA32_CORE_CAPABILITY:
+ msr->data = kvm_get_core_capability();
+ break;
case MSR_IA32_ARCH_CAPABILITIES:
msr->data = kvm_get_arch_capabilities();
break;
--
2.7.4
Since the kernel option clearcpuid now supports multiple options and CPU
capability flag strings, the documentation needs to be updated.
Signed-off-by: Fenghua Yu <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 858b6c0b9a15..1a195f93d3cc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -558,10 +558,12 @@
loops can be debugged more effectively on production
systems.
- clearcpuid=BITNUM [X86]
+ clearcpuid=BITNUM | FLAG [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
- numbers. Note the Linux specific bits are not necessarily
+ numbers or /proc/cpuinfo for valid CPU flags.
+ Multiple options can be used to disable a few features.
+ Note the Linux specific bits are not necessarily
stable over kernel options, but the vendor specific
ones should be.
Also note that user programs calling CPUID directly
--
2.7.4
The interface /sys/devices/system/cpu/split_lock_detect is added
to allow the user to control split lock detection and show the current
split lock detection setting.
Writing 1 to the file enables split lock detection and writing 0
disables it. Split lock detection is enabled or disabled on all CPUs.
Reading the file shows the current global split lock detection setting:
0 when disabled and 1 when enabled.
Please note the interface only shows the global setting. During run time,
split lock detection on one CPU may be disabled if a split lock in
kernel code happens on that CPU. The interface doesn't show the split
lock detection status of individual CPUs; the user can run rdmsr on MSR
0x33 to check split lock detection on each CPU.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 96 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 95 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 1c1ec413a9a9..1512e96287b8 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -33,6 +33,12 @@
#include <asm/apic.h>
#endif
+#define DISABLE_SPLIT_LOCK_DETECT 0
+#define ENABLE_SPLIT_LOCK_DETECT 1
+
+static DEFINE_MUTEX(split_lock_detect_mutex);
+static int split_lock_detect_val;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -163,10 +169,35 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
return false;
}
+static u32 new_sp_test_ctl_val(u32 test_ctl_val)
+{
+ /* Change the split lock setting. */
+ if (split_lock_detect_val == DISABLE_SPLIT_LOCK_DETECT)
+ test_ctl_val &= ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
+ else
+ test_ctl_val |= TEST_CTL_ENABLE_SPLIT_LOCK_DETECT;
+
+ return test_ctl_val;
+}
+
+static void init_split_lock_detect(struct cpuinfo_x86 *c)
+{
+ if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
+ u32 l, h;
+
+ rdmsr(MSR_TEST_CTL, l, h);
+ l = new_sp_test_ctl_val(l);
+ wrmsr(MSR_TEST_CTL, l, h);
+ pr_info_once("#AC for split lock is enabled\n");
+ }
+}
+
static void early_init_intel(struct cpuinfo_x86 *c)
{
u64 misc_enable;
+ init_split_lock_detect(c);
+
/* Unmask CPUID levels if masked: */
if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
@@ -1095,7 +1126,70 @@ void init_core_capability(struct cpuinfo_x86 *c)
rdmsrl(MSR_IA32_CORE_CAPABILITY, val);
- if (val & CORE_CAP_SPLIT_LOCK_DETECT)
+ if (val & CORE_CAP_SPLIT_LOCK_DETECT) {
setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+ split_lock_detect_val = 1;
+ }
}
}
+
+static ssize_t
+split_lock_detect_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", READ_ONCE(split_lock_detect_val));
+}
+
+static ssize_t
+split_lock_detect_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ u32 val, l, h;
+ int cpu, ret;
+
+ ret = kstrtou32(buf, 10, &val);
+ if (ret)
+ return ret;
+
+ if (val != DISABLE_SPLIT_LOCK_DETECT && val != ENABLE_SPLIT_LOCK_DETECT)
+ return -EINVAL;
+
+ /* No need to update MSR if new setting is the same as old one. */
+ if (val == split_lock_detect_val)
+ return count;
+
+ /* Only one write can happen at the same time. */
+ mutex_lock(&split_lock_detect_mutex);
+
+ WRITE_ONCE(split_lock_detect_val, val);
+
+ /* Get MSR_TEST_CTL on this CPU, assuming all CPUs have same value. */
+ rdmsr(MSR_TEST_CTL, l, h);
+ l = new_sp_test_ctl_val(l);
+ /* Update the split lock detection setting on all online CPUs. */
+ for_each_online_cpu(cpu)
+ wrmsr_on_cpu(cpu, MSR_TEST_CTL, l, h);
+
+ mutex_unlock(&split_lock_detect_mutex);
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(split_lock_detect);
+
+static int __init split_lock_init(void)
+{
+ int ret;
+
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+ return -ENODEV;
+
+ ret = device_create_file(cpu_subsys.dev_root,
+ &dev_attr_split_lock_detect);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+subsys_initcall(split_lock_init);
--
2.7.4
The kernel option clearcpuid currently only takes a feature bit number,
which can change from kernel to kernel.
Extend clearcpuid to also accept a capability flag string, which is
defined in x86_cap_flags[] and won't change from kernel to kernel.
Users can easily get the flag strings from /proc/cpuinfo.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/fpu/init.c | 3 ++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index ce95b8cbd229..6792088525e3 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -132,6 +132,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
extern void setup_clear_cpu_cap(unsigned int bit);
extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+bool find_cpu_cap(char *cap_flag, unsigned int *pfeature);
#define setup_force_cpu_cap(bit) do { \
set_cpu_cap(&boot_cpu_data, bit); \
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 5ba11ce99f92..b84b133d0ebd 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -120,3 +120,29 @@ void setup_clear_cpu_cap(unsigned int feature)
{
do_clear_cpu_cap(NULL, feature);
}
+
+/**
+ * find_cpu_cap - Given a cap flag string, find its corresponding feature bit.
+ * @cap_flag: cap flag string as defined in x86_cap_flags[]
+ * @pfeature: feature bit
+ *
+ * Return: true if the feature is found. false if not found
+ */
+bool find_cpu_cap(char *cap_flag, unsigned int *pfeature)
+{
+#ifdef CONFIG_X86_FEATURE_NAMES
+ unsigned int feature;
+
+ for (feature = 0; feature < NCAPINTS * 32; feature++) {
+ if (!x86_cap_flags[feature])
+ continue;
+
+ if (strcmp(cap_flag, x86_cap_flags[feature]) == 0) {
+ *pfeature = feature;
+
+ return true;
+ }
+ }
+#endif
+ return false;
+}
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 88bbba7ee96a..99b895eea166 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -256,7 +256,8 @@ static void __init clear_cpuid(void)
/* Chang command line range for next search. */
cmdline_size = option_pos - boot_command_line + 1;
argptr = arg;
- if (get_option(&argptr, &bit) &&
+ /* cpu cap can be specified by either feature bit or string */
+ if ((get_option(&argptr, &bit) || find_cpu_cap(arg, &bit)) &&
bit >= 0 && bit < NCAPINTS * 32)
setup_clear_cpu_cap(bit);
}
--
2.7.4
cpu_caps_cleared and cpu_caps_set may not be aligned to unsigned long.
Atomic operations (i.e. set_bit and clear_bit) on the bitmaps may access
two cache lines (a.k.a. split lock) and lock bus to block all memory
accesses from other processors to ensure atomicity.
To avoid the overall performance degradation from the bus locking, align
the two variables to unsigned long.
Defining the variables as unsigned long would also fix the issue because
they would then be naturally aligned to unsigned long, but that needs
additional code changes. Adding __aligned(sizeof(unsigned long)) is the
simpler fix.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/common.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..51ab37ba5f64 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -488,8 +488,9 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
return NULL; /* Not found */
}
-__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
-__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
+/* Unsigned long alignment to avoid split lock in atomic bitmap ops */
+__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
void load_percpu_segment(int cpu)
{
--
2.7.4
The IA32_CORE_CAPABILITIES MSR (0xCF) contains bits that enumerate
some model-specific features.
The MSR 0xCF itself is enumerated by CPUID.(EAX=0x7,ECX=0):EDX[30]:
when this bit is 1, MSR 0xCF exists.
Detailed information for the CPUID bit and the MSR can be found in the
latest Intel Architecture Instruction Set Extensions and Future Features
Programming Reference.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6d6122524711..350eeccd0ce9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -349,6 +349,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITY (18*32+30) /* IA32_CORE_CAPABILITY MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */
/*
--
2.7.4
set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
operate on the bitmap defined in x86_capability.
Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
the location is at:
base address of x86_capability + (bit offset in x86_capability / 64) * 8
Since the base address of x86_capability may not be aligned to unsigned
long, the single unsigned long location may cross two cache lines, and
accessing the location with locked BTS/BTR instructions will trigger #AC.
To fix the split lock issue, align x86_capability to unsigned long so that
the location is always within one cache line.
Changing x86_capability[]'s type to unsigned long may also fix the issue
because x86_capability[] would then be naturally aligned to unsigned long.
But this needs additional code changes, so we choose the simpler solution
of enforcing alignment with __aligned(sizeof(unsigned long)).
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/processor.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 33051436c864..eb8ae701ef65 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,9 @@ struct cpuinfo_x86 {
__u32 extended_cpuid_level;
/* Maximum supported CPUID level, -1=no CPUID: */
int cpuid_level;
- __u32 x86_capability[NCAPINTS + NBUGINTS];
+ /* Unsigned long alignment to avoid split lock in atomic bitmap ops */
+ __u32 x86_capability[NCAPINTS + NBUGINTS]
+ __aligned(sizeof(unsigned long));
char x86_vendor_id[16];
char x86_model_id[64];
/* in KB - valid for CPUS which support this call: */
--
2.7.4
Bits in MSR IA32_CORE_CAPABILITY enumerate features that are not
enumerated through CPUID. Currently, bit 5 is defined to enumerate the
#AC-for-split-lock feature; all other bits are reserved.
When bit 5 is 1, the feature is supported and the feature bit
X86_FEATURE_SPLIT_LOCK_DETECT is set. Otherwise, the feature is not
available.
The MSR IA32_CORE_CAPABILITY itself is enumerated by
CPUID.(EAX=0x7,ECX=0):EDX[30].
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 5 +++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/intel.c | 21 +++++++++++++++++++++
5 files changed, 29 insertions(+)
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..e241abce1a2a 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,9 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+void init_core_capability(struct cpuinfo_x86 *c);
+#else
+static inline void init_core_capability(struct cpuinfo_x86 *c) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 350eeccd0ce9..54c73e74213d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -221,6 +221,7 @@
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 51ab37ba5f64..79e7cc0c4c85 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -897,6 +897,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
init_scattered_cpuid_features(c);
init_speculation_control(c);
+ init_core_capability(c);
/*
* Clear/Set all flags overridden by options, after probe.
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..5ba11ce99f92 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -59,6 +59,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
+ { X86_FEATURE_SPLIT_LOCK_DETECT, X86_FEATURE_CORE_CAPABILITY },
{}
};
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fc3c07fe7df5..0c44c49f6005 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1029,3 +1029,24 @@ static const struct cpu_dev intel_cpu_dev = {
cpu_dev_register(intel_cpu_dev);
+/**
+ * init_core_capability - enumerate features supported in IA32_CORE_CAPABILITY
+ * @c: pointer to cpuinfo_x86
+ *
+ * Return: void
+ */
+void init_core_capability(struct cpuinfo_x86 *c)
+{
+ /*
+ * If MSR_IA32_CORE_CAPABILITY exists, enumerate features that are
+ * reported in the MSR.
+ */
+ if (c == &boot_cpu_data && cpu_has(c, X86_FEATURE_CORE_CAPABILITY)) {
+ u64 val;
+
+ rdmsrl(MSR_IA32_CORE_CAPABILITY, val);
+
+ if (val & CORE_CAP_SPLIT_LOCK_DETECT)
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+ }
+}
--
2.7.4
Currently only one "clearcpuid=" kernel option can be picked up by the
kernel during boot.
In some cases, the user may want to clear several CPU capabilities. This
can be useful as a replacement for an endless stream of (new) kernel
options like nosmep, nosmap, etc.
Add support for multiple clearcpuid options so that the user can clear
multiple CPU capabilities.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cmdline.h | 3 +++
arch/x86/kernel/fpu/init.c | 30 ++++++++++++++++++++----------
arch/x86/lib/cmdline.c | 17 +++++++++++++++--
3 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/cmdline.h b/arch/x86/include/asm/cmdline.h
index 6faaf27e8899..059e29558bb3 100644
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -5,5 +5,8 @@
int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
int cmdline_find_option(const char *cmdline_ptr, const char *option,
char *buffer, int bufsize);
+int cmdline_find_option_in_range(const char *cmdline_ptr, int cmdline_size,
+ const char *option, char *buffer, int bufsize,
+ char **arg_pos);
#endif /* _ASM_X86_CMDLINE_H */
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6abd83572b01..88bbba7ee96a 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -243,16 +243,31 @@ static void __init fpu__init_system_ctx_switch(void)
WARN_ON_FPU(current->thread.fpu.initialized);
}
+static void __init clear_cpuid(void)
+{
+ char arg[32], *argptr, *option_pos, clearcpuid_option[] = "clearcpuid";
+ int bit, cmdline_size;
+
+ /* Find each option in boot_command_line and clear specified cpu cap. */
+ cmdline_size = COMMAND_LINE_SIZE;
+ while (cmdline_find_option_in_range(boot_command_line, cmdline_size,
+ clearcpuid_option, arg,
+ sizeof(arg), &option_pos) >= 0) {
+ /* Change command line range for next search. */
+ cmdline_size = option_pos - boot_command_line + 1;
+ argptr = arg;
+ if (get_option(&argptr, &bit) &&
+ bit >= 0 && bit < NCAPINTS * 32)
+ setup_clear_cpu_cap(bit);
+ }
+}
+
/*
* We parse fpu parameters early because fpu__init_system() is executed
* before parse_early_param().
*/
static void __init fpu__init_parse_early_param(void)
{
- char arg[32];
- char *argptr = arg;
- int bit;
-
if (cmdline_find_option_bool(boot_command_line, "no387"))
setup_clear_cpu_cap(X86_FEATURE_FPU);
@@ -271,12 +286,7 @@ static void __init fpu__init_parse_early_param(void)
if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
setup_clear_cpu_cap(X86_FEATURE_XSAVES);
- if (cmdline_find_option(boot_command_line, "clearcpuid", arg,
- sizeof(arg)) &&
- get_option(&argptr, &bit) &&
- bit >= 0 &&
- bit < NCAPINTS * 32)
- setup_clear_cpu_cap(bit);
+ clear_cpuid();
}
/*
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 3261abb21ef4..9cf1a0773877 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -114,13 +114,15 @@ __cmdline_find_option_bool(const char *cmdline, int max_cmdline_size,
* @option: option string to look for
* @buffer: memory buffer to return the option argument
* @bufsize: size of the supplied memory buffer
+ * @arg_pos: if found, updated to point at the option within @cmdline
*
* Returns the length of the argument (regardless of if it was
* truncated to fit in the buffer), or -1 on not found.
*/
static int
__cmdline_find_option(const char *cmdline, int max_cmdline_size,
- const char *option, char *buffer, int bufsize)
+ const char *option, char *buffer, int bufsize,
+ char **arg_pos)
{
char c;
int pos = 0, len = -1;
@@ -164,6 +166,9 @@ __cmdline_find_option(const char *cmdline, int max_cmdline_size,
len = 0;
bufptr = buffer;
state = st_bufcpy;
+ if (arg_pos)
+ *arg_pos = (char *)cmdline -
+ strlen(option) - 1;
break;
} else if (c == *opptr++) {
/*
@@ -211,5 +216,13 @@ int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
int bufsize)
{
return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
- buffer, bufsize);
+ buffer, bufsize, NULL);
+}
+
+int cmdline_find_option_in_range(const char *cmdline, int cmdline_size,
const char *option, char *buffer, int bufsize,
+ char **arg_pos)
+{
+ return __cmdline_find_option(cmdline, cmdline_size, option, buffer,
+ bufsize, arg_pos);
}
--
2.7.4
A bit in reg_ch_conf_pending in struct wl1271 and in tmp_ch_bitmap is set
atomically by set_bit(). set_bit() sets the bit in a single unsigned long
location. If the variables are not aligned to unsigned long, set_bit()
accesses two cache lines and thus performs slower. On x86, this scenario
is called a split lock and can cause overall performance degradation
because the locked BTSL instruction in set_bit() locks the bus.
To avoid the performance degradation, align the two variables to
unsigned long.
Signed-off-by: Fenghua Yu <[email protected]>
---
drivers/net/wireless/ti/wlcore/cmd.c | 3 ++-
drivers/net/wireless/ti/wlcore/wlcore.h | 6 ++++--
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
index 903968735a74..8d15a6307d44 100644
--- a/drivers/net/wireless/ti/wlcore/cmd.c
+++ b/drivers/net/wireless/ti/wlcore/cmd.c
@@ -1707,7 +1707,8 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
{
struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
int ret = 0, i, b, ch_bit_idx;
- u32 tmp_ch_bitmap[2];
+ /* Align to unsigned long for better performance in set_bit() */
+ u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
struct wiphy *wiphy = wl->hw->wiphy;
struct ieee80211_supported_band *band;
bool timeout = false;
diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
index dd14850b0603..92d878f01fa5 100644
--- a/drivers/net/wireless/ti/wlcore/wlcore.h
+++ b/drivers/net/wireless/ti/wlcore/wlcore.h
@@ -321,8 +321,10 @@ struct wl1271 {
/* Reg domain last configuration */
u32 reg_ch_conf_last[2] __aligned(8);
- /* Reg domain pending configuration */
- u32 reg_ch_conf_pending[2];
+ /* Reg domain pending configuration. Aligned to unsigned long for
+ * better performance in set_bit().
+ */
+ u32 reg_ch_conf_pending[2] __aligned(sizeof(unsigned long));
/* Pointer that holds DMA-friendly block for the mailbox */
void *mbox;
--
2.7.4
Some CPU feature bits are forced set and stored in cpuinfo_x86 before
the clearcpuid options are handled. To clear those bits from cpuinfo_x86,
apply_forced_caps() is called after handling the options.
Note that apply_forced_caps() is called twice on the boot CPU, but the
code is simple and there is no functional issue.
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 2 ++
arch/x86/kernel/cpu/common.c | 5 +++--
arch/x86/kernel/fpu/init.c | 2 ++
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index e241abce1a2a..487f32ecea0b 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -26,6 +26,8 @@ struct x86_cpu {
struct cpu cpu;
};
+void apply_forced_caps(struct cpuinfo_x86 *c);
+
#ifdef CONFIG_HOTPLUG_CPU
extern int arch_register_cpu(int num);
extern void arch_unregister_cpu(int);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 79e7cc0c4c85..26723ea322fa 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -758,13 +758,14 @@ void cpu_detect(struct cpuinfo_x86 *c)
}
}
-static void apply_forced_caps(struct cpuinfo_x86 *c)
+void apply_forced_caps(struct cpuinfo_x86 *c)
{
int i;
for (i = 0; i < NCAPINTS + NBUGINTS; i++) {
- c->x86_capability[i] &= ~cpu_caps_cleared[i];
+ /* Bits may be cleared after they are set. */
c->x86_capability[i] |= cpu_caps_set[i];
+ c->x86_capability[i] &= ~cpu_caps_cleared[i];
}
}
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 99b895eea166..9c2801b605e3 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -5,6 +5,7 @@
#include <asm/tlbflush.h>
#include <asm/setup.h>
#include <asm/cmdline.h>
+#include <asm/cpu.h>
#include <linux/sched.h>
#include <linux/sched/task.h>
@@ -261,6 +262,7 @@ static void __init clear_cpuid(void)
bit >= 0 && bit < NCAPINTS * 32)
setup_clear_cpu_cap(bit);
}
+ apply_forced_caps(&boot_cpu_data);
}
/*
--
2.7.4
From: Peter Zijlstra <[email protected]>
After the kernel clears a CPUID bit through clearcpuid or another kernel
option, a CPUID instruction executed from user space should see the same
value for that bit. The CPUID faulting handler returns the cleared bit to
user space.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 4 +++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/cpu/cpuid-deps.c | 52 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/intel.c | 55 ++++++++++++++++++++++++++++++++++++---
arch/x86/kernel/cpu/scattered.c | 17 ++++++++++++
arch/x86/kernel/process.c | 3 +++
arch/x86/kernel/traps.c | 11 ++++++++
7 files changed, 141 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 6792088525e3..c4ac787d9b85 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -227,5 +227,9 @@ static __always_inline __pure bool _static_cpu_has(u16 bit)
#define CPU_FEATURE_TYPEVAL boot_cpu_data.x86_vendor, boot_cpu_data.x86, \
boot_cpu_data.x86_model
+extern int cpuid_fault;
+u32 scattered_cpuid_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg);
+u32 cpuid_cap_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg);
+
#endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */
#endif /* _ASM_X86_CPUFEATURE_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 26723ea322fa..aa2658136fe0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1502,6 +1502,8 @@ void print_cpu_info(struct cpuinfo_x86 *c)
pr_cont(")\n");
}
+int cpuid_fault;
+
/*
* clearcpuid= was already parsed in fpu__init_parse_early_param.
* But we need to keep a dummy __setup around otherwise it would
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index b84b133d0ebd..f6c77dcf186d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -113,9 +113,61 @@ static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature)
void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature)
{
+ if (boot_cpu_has(feature))
+ cpuid_fault = 1;
do_clear_cpu_cap(c, feature);
}
+u32 cpuid_cap_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg)
+{
+ switch (leaf) {
+ case 0x1:
+ if (reg == CPUID_EDX)
+ return ~cpu_caps_cleared[CPUID_1_EDX];
+ if (reg == CPUID_ECX)
+ return ~cpu_caps_cleared[CPUID_1_ECX];
+ break;
+
+ case 0x6:
+ if (reg == CPUID_EAX)
+ return ~cpu_caps_cleared[CPUID_6_EAX];
+ break;
+
+ case 0x7:
+ if (reg == CPUID_EDX)
+ return ~cpu_caps_cleared[CPUID_7_EDX];
+ if (reg == CPUID_ECX)
+ return ~cpu_caps_cleared[CPUID_7_ECX];
+ if (reg == CPUID_EBX && count == 0)
+ return ~cpu_caps_cleared[CPUID_7_0_EBX];
+ break;
+
+ case 0xD:
+ if (reg == CPUID_EAX)
+ return ~cpu_caps_cleared[CPUID_D_1_EAX];
+ break;
+
+ case 0xF:
+ if (reg == CPUID_EDX) {
+ if (count == 0)
+ return ~cpu_caps_cleared[CPUID_F_0_EDX];
+ if (count == 1)
+ return ~cpu_caps_cleared[CPUID_F_1_EDX];
+ }
+ break;
+
+ case 0x80000007:
+ if (reg == CPUID_EDX) {
+ if (test_bit(X86_FEATURE_CONSTANT_TSC,
+ (unsigned long *)cpu_caps_cleared))
+ return ~(1 << 8);
+ }
+ break;
+ }
+
+ return scattered_cpuid_mask(leaf, count, reg);
+}
+
void setup_clear_cpu_cap(unsigned int feature)
{
do_clear_cpu_cap(NULL, feature);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0c44c49f6005..1c1ec413a9a9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/insn-eval.h>
+#include <asm/inat.h>
#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -626,13 +628,60 @@ static void intel_bsp_resume(struct cpuinfo_x86 *c)
init_intel_energy_perf(c);
}
+bool fixup_cpuid_exception(struct pt_regs *regs)
+{
+ unsigned int leaf, count, eax, ebx, ecx, edx;
+ unsigned long seg_base = 0;
+ unsigned char buf[2];
+ int not_copied;
+
+ if (!cpuid_fault)
+ return false;
+
+ if (test_thread_flag(TIF_NOCPUID))
+ return false;
+
+ if (!user_64bit_mode(regs))
+ seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
+
+ if (seg_base == -1L)
+ return false;
+
+ not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
+ sizeof(buf));
+ if (not_copied)
+ return false;
+
+ if (buf[0] != 0x0F || buf[1] != 0xA2) /* CPUID - 0F A2 */
+ return false;
+
+ leaf = regs->ax;
+ count = regs->cx;
+
+ cpuid_count(leaf, count, &eax, &ebx, &ecx, &edx);
+
+ regs->ip += 2;
+ regs->ax = eax & cpuid_cap_mask(leaf, count, CPUID_EAX);
+ regs->bx = ebx & cpuid_cap_mask(leaf, count, CPUID_EBX);
+ regs->cx = ecx & cpuid_cap_mask(leaf, count, CPUID_ECX);
+ regs->dx = edx & cpuid_cap_mask(leaf, count, CPUID_EDX);
+
+ return true;
+}
+
static void init_cpuid_fault(struct cpuinfo_x86 *c)
{
u64 msr;
- if (!rdmsrl_safe(MSR_PLATFORM_INFO, &msr)) {
- if (msr & MSR_PLATFORM_INFO_CPUID_FAULT)
- set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
+ if (rdmsrl_safe(MSR_PLATFORM_INFO, &msr))
+ return;
+
+ if (msr & MSR_PLATFORM_INFO_CPUID_FAULT) {
+ set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
+ if (cpuid_fault) {
+ this_cpu_or(msr_misc_features_shadow,
+ MSR_MISC_FEATURES_ENABLES_CPUID_FAULT);
+ }
}
}
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 94aa1c72ca98..353756c27096 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -62,3 +62,20 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
set_cpu_cap(c, cb->feature);
}
}
+
+u32 scattered_cpuid_mask(u32 leaf, u32 count, enum cpuid_regs_idx reg)
+{
+ const struct cpuid_bit *cb;
+ u32 mask = ~0U;
+
+ for (cb = cpuid_bits; cb->feature; cb++) {
+ if (cb->level == leaf && cb->sub_leaf == count &&
+ cb->reg == reg) {
+ if (test_bit(cb->feature,
+ (unsigned long *)cpu_caps_cleared))
+ mask &= ~BIT(cb->bit);
+ }
+ }
+
+ return mask;
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 90ae0ca51083..1bba1a3c0b01 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -196,6 +196,9 @@ static void set_cpuid_faulting(bool on)
{
u64 msrval;
+ if (cpuid_fault)
+ return;
+
msrval = this_cpu_read(msr_misc_features_shadow);
msrval &= ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
msrval |= (on << MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a7..c43c2608332e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -517,6 +517,12 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL);
}
+#ifdef CONFIG_CPU_SUP_INTEL
+extern bool fixup_cpuid_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_cpuid_exception(struct pt_regs *regs) { return false; }
+#endif
+
dotraplinkage void
do_general_protection(struct pt_regs *regs, long error_code)
{
@@ -531,6 +537,11 @@ do_general_protection(struct pt_regs *regs, long error_code)
return;
}
+ if (static_cpu_has(X86_FEATURE_CPUID_FAULT)) {
+ if (user_mode(regs) && fixup_cpuid_exception(regs))
+ return;
+ }
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
--
2.7.4
A bit in pwol_mask is set in b44_magic_pattern() by set_bit().
set_bit() sets the bit in a single unsigned long location. Since pwol_mask
may not be aligned to unsigned long, the location may cross two cache
lines, and accessing the location degrades performance. On x86, accessing
two cache lines in a locked instruction in set_bit() is called a split
lock and can cause overall performance degradation.
To avoid impacting performance by accessing two cache lines in set_bit(),
align pwol_mask to unsigned long.
Signed-off-by: Fenghua Yu <[email protected]>
---
drivers/net/ethernet/broadcom/b44.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 97ab0dd25552..bc544b6b9c3a 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1547,7 +1547,8 @@ static void b44_setup_pseudo_magicp(struct b44 *bp)
u32 val;
int plen0, plen1, plen2;
u8 *pwol_pattern;
- u8 pwol_mask[B44_PMASK_SIZE];
+ /* Align to unsigned long for better performance in set_bit() */
+ u8 pwol_mask[B44_PMASK_SIZE] __aligned(sizeof(unsigned long));
pwol_pattern = kzalloc(B44_PATTERN_SIZE, GFP_KERNEL);
if (!pwol_pattern)
--
2.7.4
On 02/03/19 03:45, Fenghua Yu wrote:
> From: Xiaoyao Li <[email protected]>
>
> In the latest Intel SDM, CPUID.(EAX=7H,ECX=0):EDX[30] will enumerate
> the presence of the IA32_CORE_CAPABILITY MSR.
>
> Update GET_SUPPORTED_CPUID to expose this feature bit to user space, so
> that user space know this bit can be enabled in CPUID.
>
> Signed-off-by: Xiaoyao Li <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index c07958b59f50..e0e17b9c65da 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -410,7 +410,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> /* cpuid 7.0.edx*/
> const u32 kvm_cpuid_7_0_edx_x86_features =
> F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> - F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
> + F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(CORE_CAPABILITY) |
> + F(INTEL_STIBP);
This should be enabled always if boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT),
since the MSR is emulated. This way, guests can always rely on the
IA32_CORE_CAPABILITY MSR and won't have to rely on FMS
(family/model/stepping, which means nothing inside a guest).
Paolo
> /* all calls to cpuid_count() should be made on the same cpu */
> get_cpu();
>
On 02/03/19 03:44, Fenghua Yu wrote:
> cpu_caps_cleared and cpu_caps_set may not be aligned to unsigned long.
> Atomic operations (i.e. set_bit and clear_bit) on the bitmaps may access
> two cache lines (a.k.a. split lock) and lock bus to block all memory
> accesses from other processors to ensure atomicity.
>
> To avoid the overall performance degradation from the bus locking, align
> the two variables to unsigned long.
>
> Defining the variables as unsigned long may also fix the issue because
> they are naturally aligned to unsigned long. But that needs additional
> code changes. Adding __aligned(sizeof(unsigned long)) is the simpler fix.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/kernel/cpu/common.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index cb28e98a0659..51ab37ba5f64 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -488,8 +488,9 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
> return NULL; /* Not found */
> }
>
> -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
> -__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
> +/* Unsigned long alignment to avoid split lock in atomic bitmap ops */
> +__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> +__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>
> void load_percpu_segment(int cpu)
> {
>
(resending including the list)
Why not change set_bit()/clear_bit() to use btsl/btrl instead of
btsq/btrq?
Thanks,
Paolo
On 02/03/19 03:45, Fenghua Yu wrote:
> From: Xiaoyao Li <[email protected]>
>
> MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 5 of which
> reports the capability of enabling detection of split locks (will be
> supported on future processors based on Tremont microarchitecture and
> later).
>
> Please check the latest Intel Architecture Instruction Set Extensions
> and Future Features Programming Reference for more detailed information
> on the MSR and the split lock bit.
>
> 1. Expose it to user space as a feature-enumerating MSR, so that user
> space can query it.
>
> 2. Emulate MSR_IA32_CORE_CAPABILITY with vmx->core_capability, and add
> get and set handlers for MSR_IA32_CORE_CAPABILITY.
> For userspace, it can set this MSR when customizing guest features, and
> it can also read the MSR's value for the guest.
> For the guest, as it is a feature-enumerating MSR, it is read-only.
>
> Signed-off-by: Xiaoyao Li <[email protected]>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 1 +
> arch/x86/kvm/x86.c | 17 ++++++++++++++++-
> 4 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 180373360e34..208f15570d17 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1527,6 +1527,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
> unsigned long icr, int op_64_bit);
>
> u64 kvm_get_arch_capabilities(void);
> +u64 kvm_get_core_capability(void);
> void kvm_define_shared_msr(unsigned index, u32 msr);
> int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 30a6bcd735ec..3e03c6e1e558 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>
> msr_info->data = to_vmx(vcpu)->spec_ctrl;
> break;
> + case MSR_IA32_CORE_CAPABILITY:
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITY))
> + return 1;
> + msr_info->data = vmx->core_capability;
> + break;
> case MSR_IA32_ARCH_CAPABILITIES:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
> @@ -1891,6 +1897,21 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
> MSR_TYPE_W);
> break;
> + case MSR_IA32_CORE_CAPABILITY:
> + if (!msr_info->host_initiated)
> + return 1;
> + if (data & ~CORE_CAP_SPLIT_LOCK_DETECT)
> + return 1;
> +
> + /*
> + * Since AC split lock is a hardware feature, and there is no
> + * software emulation yet, we cannot enable it for guest if
> + * host hardware doesn't support it.
> + */
> + if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> + data &= ~CORE_CAP_SPLIT_LOCK_DETECT;
> + vmx->core_capability = data;
> + break;
> case MSR_IA32_ARCH_CAPABILITIES:
> if (!msr_info->host_initiated)
> return 1;
> @@ -4083,6 +4104,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> ++vmx->nmsrs;
> }
>
> + vmx->core_capability = kvm_get_core_capability();
> +
> vmx->arch_capabilities = kvm_get_arch_capabilities();
>
> vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 0ac0a64c7790..cc22379991f3 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -191,6 +191,7 @@ struct vcpu_vmx {
> u64 msr_guest_kernel_gs_base;
> #endif
>
> + u64 core_capability;
> u64 arch_capabilities;
> u64 spec_ctrl;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 941f932373d0..c3c9e3f2d08a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1125,7 +1125,8 @@ static u32 msrs_to_save[] = {
> #endif
> MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
> - MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES,
> + MSR_IA32_SPEC_CTRL, MSR_IA32_CORE_CAPABILITY,
> + MSR_IA32_ARCH_CAPABILITIES,
> MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
> MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
> MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
> @@ -1197,11 +1198,22 @@ static u32 msr_based_features[] = {
>
> MSR_F10H_DECFG,
> MSR_IA32_UCODE_REV,
> + MSR_IA32_CORE_CAPABILITY,
> MSR_IA32_ARCH_CAPABILITIES,
> };
>
> static unsigned int num_msr_based_features;
>
> +u64 kvm_get_core_capability(void)
> +{
> + u64 data;
> +
> + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
This patch should be merged with the previous patch. Also here you
should add:
data &= CORE_CAP_SPLIT_LOCK_DETECT;
so that non-virtualizable features are hidden and
if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
data |= CORE_CAP_SPLIT_LOCK_DETECT;
so that userspace gets "for free" the FMS list that will be added
later to the kernel.
Thanks,
Paolo
> +
> + return data;
> +}
> +EXPORT_SYMBOL_GPL(kvm_get_core_capability);
> +
> u64 kvm_get_arch_capabilities(void)
> {
> u64 data;
> @@ -1227,6 +1239,9 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
> static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
> {
> switch (msr->index) {
> + case MSR_IA32_CORE_CAPABILITY:
> + msr->data = kvm_get_core_capability();
> + break;
> case MSR_IA32_ARCH_CAPABILITIES:
> msr->data = kvm_get_arch_capabilities();
> break;
>
On Fri, Mar 01, 2019 at 06:44:56PM -0800, Fenghua Yu wrote:
> A bit in pwol_mask is set in b44_magic_pattern automatically by set_bit.
> set_bit sets the bit in a single unsigned long location. Since pwol_mask
> may not be aligned to unsigned long, the location may cross two cache
> lines and accessing the location degrades performance. On x86, a locked
> instruction in set_bit that accesses two cache lines is called a split
> lock and can cause overall performance degradation.
>
> To avoid impacting performance by accessing two cache lines in set_bit,
> align pwol_mask to unsigned long.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> drivers/net/ethernet/broadcom/b44.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> index 97ab0dd25552..bc544b6b9c3a 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -1547,7 +1547,8 @@ static void b44_setup_pseudo_magicp(struct b44 *bp)
> u32 val;
> int plen0, plen1, plen2;
> u8 *pwol_pattern;
> - u8 pwol_mask[B44_PMASK_SIZE];
> + /* Align to unsigned long for better performance in set_bit() */
> + u8 pwol_mask[B44_PMASK_SIZE] __aligned(sizeof(unsigned long));
>
> pwol_pattern = kzalloc(B44_PATTERN_SIZE, GFP_KERNEL);
> if (!pwol_pattern)
That is truly horrid code. But afaict pwol_mask is local and never
exposed to concurrency, so _why_ does it need atomic bitset in the first
place?
Would not the below be a _much_ better solution?
diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 97ab0dd25552..0b4226b406b1 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
memset(ppattern + offset, 0xff, magicsync);
for (j = 0; j < magicsync; j++)
- set_bit(len++, (unsigned long *) pmask);
+ __set_bit(len++, (unsigned long *) pmask);
for (j = 0; j < B44_MAX_PATTERNS; j++) {
if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
@@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
for (k = 0; k< ethaddr_bytes; k++) {
ppattern[offset + magicsync +
(j * ETH_ALEN) + k] = macaddr[k];
- set_bit(len++, (unsigned long *) pmask);
+ __set_bit(len++, (unsigned long *) pmask);
}
}
return len - 1;
On Fri, Mar 01, 2019 at 06:44:57PM -0800, Fenghua Yu wrote:
> A bit in reg_ch_conf_pending in wl1271 and tmp_ch_bitmap is set atomically
> by set_bit(). set_bit() sets the bit in a single unsigned long location. If
> the variables are not aligned to unsigned long, set_bit() accesses two
> cache lines and thus causes slower performance. On x86, this scenario is
> called a split lock and can cause overall performance degradation because
> the locked BTSL instruction in set_bit() locks the bus.
>
> To avoid performance degradation, the two variables are aligned to
> unsigned long.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> drivers/net/wireless/ti/wlcore/cmd.c | 3 ++-
> drivers/net/wireless/ti/wlcore/wlcore.h | 6 ++++--
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
> index 903968735a74..8d15a6307d44 100644
> --- a/drivers/net/wireless/ti/wlcore/cmd.c
> +++ b/drivers/net/wireless/ti/wlcore/cmd.c
> @@ -1707,7 +1707,8 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
> {
> struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
> int ret = 0, i, b, ch_bit_idx;
> - u32 tmp_ch_bitmap[2];
> + /* Align to unsigned long for better performance in set_bit() */
> + u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
> struct wiphy *wiphy = wl->hw->wiphy;
> struct ieee80211_supported_band *band;
> bool timeout = false;
> diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
> index dd14850b0603..92d878f01fa5 100644
> --- a/drivers/net/wireless/ti/wlcore/wlcore.h
> +++ b/drivers/net/wireless/ti/wlcore/wlcore.h
> @@ -321,8 +321,10 @@ struct wl1271 {
>
> /* Reg domain last configuration */
> u32 reg_ch_conf_last[2] __aligned(8);
> - /* Reg domain pending configuration */
> - u32 reg_ch_conf_pending[2];
> + /* Reg domain pending configuration. Aligned to unsigned long for
> > + * better performance in set_bit().
> + */
> + u32 reg_ch_conf_pending[2] __aligned(sizeof(unsigned long));
>
> /* Pointer that holds DMA-friendly block for the mailbox */
> void *mbox;
This has nothing to do with better performance. This is generic code,
not x86 arch code. Many RISC platforms will already refuse unaligned
atomic ops.
Also, how is this set_bit() usage endian safe?
And no wireless person on Cc anywhere.
On Mon, Mar 04, 2019 at 09:33:16AM +0100, Paolo Bonzini wrote:
> Why not instead change set_bit/clear_bit to use btsl/btrl instead of
> btsq/btrq?
At least one of the faulty users (wireless) is in generic code and needs
fixing regardless.
For better or worse; the bitmap stuff is defined to work on unsigned
long. Using it on smaller types already relies on small endian; but
further enabling just makes it worse I feel. Better have the rules be
uniform.
On 04/03/19 11:11, Peter Zijlstra wrote:
> On Fri, Mar 01, 2019 at 06:44:57PM -0800, Fenghua Yu wrote:
>> A bit in reg_ch_conf_pending in wl271 and tmp_ch_bitmap is set
>> atomically by set_bit(). set_bit() sets the bit in a single
>> unsigned long location. If the variables are not aligned to
>> unsigned long, set_bit() accesses two cache lines and thus causes
>> slower performance. On x86, this scenario is called split lock and
>> can cause overall performance degradation due to locked BTSL
>> instruction in set_bit() locks bus.
>>
>> To avoid performance degradation, the two variables are aligned to
>> unsigned long.
>>
>> Signed-off-by: Fenghua Yu <[email protected]> ---
>> drivers/net/wireless/ti/wlcore/cmd.c | 3 ++-
>> drivers/net/wireless/ti/wlcore/wlcore.h | 6 ++++-- 2 files
>> changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ti/wlcore/cmd.c
>> b/drivers/net/wireless/ti/wlcore/cmd.c index
>> 903968735a74..8d15a6307d44 100644 ---
>> a/drivers/net/wireless/ti/wlcore/cmd.c +++
>> b/drivers/net/wireless/ti/wlcore/cmd.c @@ -1707,7 +1707,8 @@ int
>> wlcore_cmd_regdomain_config_locked(struct wl1271 *wl) { struct
>> wl12xx_cmd_regdomain_dfs_config *cmd = NULL; int ret = 0, i, b,
>> ch_bit_idx; - u32 tmp_ch_bitmap[2]; + /* Align to unsigned long
>> for better performance in set_bit() */ + u32 tmp_ch_bitmap[2]
>> __aligned(sizeof(unsigned long));
This is the only place where an array of u32 is needed, because of
cmd->ch_bit_map1 = cpu_to_le32(tmp_ch_bitmap[0]);
cmd->ch_bit_map2 = cpu_to_le32(tmp_ch_bitmap[1]);
All the others should use DECLARE_BITMAP, including reg_ch_conf_last
which was already using __aligned. As Peter mentioned they should
also use set_bit_le. Actually they do not need locked access at all
because the only code paths to the set_bit take a mutex.
There is one other place that is accessing the items of the array, but
it is fixed easily. The following patch should do everything:
------------------- 8< --------------------------
From: Paolo Bonzini <[email protected]>
Subject: [PATCH] wlcore: simplify/fix/optimize reg_ch_conf_pending operations
Bitmaps are defined on unsigned longs, so the usage of u32[2] in the
wlcore driver is incorrect. As noted by Peter Zijlstra, casting arrays
to a bitmap is incorrect for big-endian architectures.
When looking at it I observed that:
- operations on reg_ch_conf_pending are always under the wl_lock mutex,
so set_bit is overkill
- the only case where reg_ch_conf_pending is accessed a u32 at a time is
unnecessary too.
This patch cleans up everything in this area, and changes tmp_ch_bitmap
to have the proper alignment.
Reported-by: Fenghua Yu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
index 903968735a74..3e093f3a7ec8 100644
--- a/drivers/net/wireless/ti/wlcore/cmd.c
+++ b/drivers/net/wireless/ti/wlcore/cmd.c
@@ -1700,14 +1700,14 @@ void wlcore_set_pending_regdomain_ch(struct wl1271 *wl, u16 channel,
ch_bit_idx = wlcore_get_reg_conf_ch_idx(band, channel);
if (ch_bit_idx >= 0 && ch_bit_idx <= WL1271_MAX_CHANNELS)
- set_bit(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
+ __set_bit_le(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
}
int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
{
struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
int ret = 0, i, b, ch_bit_idx;
- u32 tmp_ch_bitmap[2];
+ u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
struct wiphy *wiphy = wl->hw->wiphy;
struct ieee80211_supported_band *band;
bool timeout = false;
@@ -1717,7 +1717,7 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
wl1271_debug(DEBUG_CMD, "cmd reg domain config");
- memset(tmp_ch_bitmap, 0, sizeof(tmp_ch_bitmap));
+ memcpy(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
for (b = NL80211_BAND_2GHZ; b <= NL80211_BAND_5GHZ; b++) {
band = wiphy->bands[b];
@@ -1738,13 +1738,10 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
if (ch_bit_idx < 0)
continue;
- set_bit(ch_bit_idx, (long *)tmp_ch_bitmap);
+ __set_bit_le(ch_bit_idx, (long *)tmp_ch_bitmap);
}
}
- tmp_ch_bitmap[0] |= wl->reg_ch_conf_pending[0];
- tmp_ch_bitmap[1] |= wl->reg_ch_conf_pending[1];
-
if (!memcmp(tmp_ch_bitmap, wl->reg_ch_conf_last, sizeof(tmp_ch_bitmap)))
goto out;
diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
index dd14850b0603..870eea3e7a27 100644
--- a/drivers/net/wireless/ti/wlcore/wlcore.h
+++ b/drivers/net/wireless/ti/wlcore/wlcore.h
@@ -320,9 +320,9 @@ struct wl1271 {
bool watchdog_recovery;
/* Reg domain last configuration */
- u32 reg_ch_conf_last[2] __aligned(8);
+ DECLARE_BITMAP(reg_ch_conf_last, 64);
/* Reg domain pending configuration */
- u32 reg_ch_conf_pending[2];
+ DECLARE_BITMAP(reg_ch_conf_pending, 64);
/* Pointer that holds DMA-friendly block for the mailbox */
void *mbox;
On 04/03/19 11:17, Peter Zijlstra wrote:
> On Mon, Mar 04, 2019 at 09:33:16AM +0100, Paolo Bonzini wrote:
>> Why not instead change set_bit/clear_bit to use btsl/btrl instead of
>> btsq/btrq?
>
> At least one of the faulty users (wireless) is in generic code and needs
> fixing regardless.
>
> For better or worse; the bitmap stuff is defined to work on unsigned
> long. Using it on smaller types already relies on small endian; but
> further enabling just makes it worse I feel. Better have the rules be
> uniform.
True that. On the other hand btsl/btrl is also one byte smaller if no
operand is %r8-%r15.
In any case, /me wonders if we should have a macro like
#define DECLARE_LE_BITMAP(name,bits) \
u32 name[DIV_ROUND_UP(bits, 32)] __aligned(sizeof(unsigned long))
Paolo
On 04/03/19 11:47, Xiaoyao Li wrote:
> On Mon, 2019-03-04 at 09:38 +0100, Paolo Bonzini wrote:
>> On 02/03/19 03:45, Fenghua Yu wrote:
>>> From: Xiaoyao Li <[email protected]>
>>>
>>> In the latest Intel SDM, CPUID.(EAX=7H,ECX=0):EDX[30] will enumerate
>>> the presence of the IA32_CORE_CAPABILITY MSR.
>>>
>>> Update GET_SUPPORTED_CPUID to expose this feature bit to user space, so
>>> that user space knows this bit can be enabled in CPUID.
>>>
>>> Signed-off-by: Xiaoyao Li <[email protected]>
>>> ---
>>> arch/x86/kvm/cpuid.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index c07958b59f50..e0e17b9c65da 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -410,7 +410,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2
>>> *entry, u32 function,
>>> /* cpuid 7.0.edx*/
>>> const u32 kvm_cpuid_7_0_edx_x86_features =
>>> F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>>> - F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
>>> + F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(CORE_CAPABILITY) |
>>> + F(INTEL_STIBP);
>>
>> This should be enabled always if boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT),
>> since the MSR is emulated. This way, guests can always rely on
>> IA32_CORE_CAPABILITY_MSR and it won't have to rely on the FMS
>> (which means nothing inside a guest).
>>
>> Paolo
>
> Hi, Paolo
> Do you mean that we don't need this here, but to add the handling below?
>
> static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 * entry, u32 function,
> ...
> switch (function) {
> ...
> case 7: {
> ...
> if (index ==0) {
> ...
> if(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> entry->edx |= F(CORE_CAPABILITY);
> }
> ...
> ...
> }
> ...
> }
Yes, exactly.
Paolo
>>> /* all calls to cpuid_count() should be made on the same cpu */
>>> get_cpu();
>>>
>>
>>
>
On Mon, 2019-03-04 at 09:38 +0100, Paolo Bonzini wrote:
> On 02/03/19 03:45, Fenghua Yu wrote:
> > From: Xiaoyao Li <[email protected]>
> >
> > In the latest Intel SDM, CPUID.(EAX=7H,ECX=0):EDX[30] will enumerate
> > the presence of the IA32_CORE_CAPABILITY MSR.
> >
> > Update GET_SUPPORTED_CPUID to expose this feature bit to user space, so
> > that user space knows this bit can be enabled in CPUID.
> >
> > Signed-off-by: Xiaoyao Li <[email protected]>
> > ---
> > arch/x86/kvm/cpuid.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index c07958b59f50..e0e17b9c65da 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -410,7 +410,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2
> > *entry, u32 function,
> > /* cpuid 7.0.edx*/
> > const u32 kvm_cpuid_7_0_edx_x86_features =
> > F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> > - F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
> > + F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(CORE_CAPABILITY) |
> > + F(INTEL_STIBP);
>
> This should be enabled always if boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT),
> since the MSR is emulated. This way, guests can always rely on
> IA32_CORE_CAPABILITY_MSR and it won't have to rely on the FMS
> (which means nothing inside a guest).
>
> Paolo
Hi, Paolo
Do you mean that we don't need this here, but to add the handling below?
static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 * entry, u32 function,
...
switch (function) {
...
case 7: {
...
> if (index == 0) {
> ...
> if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
entry->edx |= F(CORE_CAPABILITY);
}
...
...
}
...
}
> > /* all calls to cpuid_count() should be made on the same cpu */
> > get_cpu();
> >
>
>
On Mon, 2019-03-04 at 11:49 +0100, Paolo Bonzini wrote:
> On 04/03/19 11:47, Xiaoyao Li wrote:
> > On Mon, 2019-03-04 at 09:38 +0100, Paolo Bonzini wrote:
> > > On 02/03/19 03:45, Fenghua Yu wrote:
> > > > From: Xiaoyao Li <[email protected]>
> > > >
> > > > In the latest Intel SDM, CPUID.(EAX=7H,ECX=0):EDX[30] will enumerate
> > > > the presence of the IA32_CORE_CAPABILITY MSR.
> > > >
> > > > Update GET_SUPPORTED_CPUID to expose this feature bit to user space, so
> > > > that user space knows this bit can be enabled in CPUID.
> > > >
> > > > Signed-off-by: Xiaoyao Li <[email protected]>
> > > > ---
> > > > arch/x86/kvm/cpuid.c | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > index c07958b59f50..e0e17b9c65da 100644
> > > > --- a/arch/x86/kvm/cpuid.c
> > > > +++ b/arch/x86/kvm/cpuid.c
> > > > @@ -410,7 +410,8 @@ static inline int __do_cpuid_ent(struct
> > > > kvm_cpuid_entry2
> > > > *entry, u32 function,
> > > > /* cpuid 7.0.edx*/
> > > > const u32 kvm_cpuid_7_0_edx_x86_features =
> > > > F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> > > > - F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) |
> > > > F(INTEL_STIBP);
> > > > + F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) |
> > > > F(CORE_CAPABILITY) |
> > > > + F(INTEL_STIBP);
> > >
> > > This should be enabled always if
> > > boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT),
> > > since the MSR is emulated. This way, guests can always rely on
> > > IA32_CORE_CAPABILITY_MSR and it won't have to rely on the FMS
> > > (which means nothing inside a guest).
> > >
> > > Paolo
> >
> > Hi, Paolo
> > Do you mean that we don't need this here, but to add the handling below?
> >
> > static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 * entry, u32
> > function,
> > ...
> > switch (function) {
> > ...
> > case 7: {
> > ...
> > if (index ==0) {
> > ...
> > if(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > entry->edx |= F(CORE_CAPABILITY);
> > }
> > ...
> > ...
> > }
> > ...
> > }
>
> Yes, exactly.
>
> Paolo
Like you said before, I think we don't need the condition judgment
"if(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))", but to set F(CORE_CAPABILITY)
always for guest since MSR_IA32_CORE_CAPABILITY is emulated.
And we should set the right emulated value of MSR_IA32_CORE_CAPABILITY for guest
in function kvm_get_core_capability() based on whether
boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) just as you commented in the next
patch.
Xiaoyao
> > > > /* all calls to cpuid_count() should be made on the same cpu */
> > > > get_cpu();
> > > >
> > >
> > >
>
>
On 04/03/19 12:10, Xiaoyao Li wrote:
> Like you said before, I think we don't need the condition judgment
> "if(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))", but to set F(CORE_CAPABILITY)
> always for guest since MSR_IA32_CORE_CAPABILITY is emulated.
>
> And we should set the right emulated value of MSR_IA32_CORE_CAPABILITY for guest
> in function kvm_get_core_capability() based on whether
> boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) just as you commented in the next
> patch.
Yes, that would certainly be better. However, you'd also have to move
MSR_IA32_CORE_CAPABILITY handling to x86.c, because you'd have to enable
X86_FEATURE_CORE_CAPABILITY for AMD.
Paolo
On Mon, 2019-03-04 at 12:14 +0100, Paolo Bonzini wrote:
> On 04/03/19 12:10, Xiaoyao Li wrote:
> > Like you said before, I think we don't need the condition judgment
> > "if(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))", but to set
> > F(CORE_CAPABILITY)
> > always for guest since MSR_IA32_CORE_CAPABILITY is emulated.
> >
> > And we should set the right emulated value of MSR_IA32_CORE_CAPABILITY for
> > guest
> > in function kvm_get_core_capability() based on whether
> > boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) just as you commented in the
> > next
> > patch.
>
> Yes, that would certainly be better. However, you'd also have to move
> MSR_IA32_CORE_CAPABILITY handling to x86.c, because you'd have to enable
> X86_FEATURE_CORE_CAPABILITY for AMD.
>
> Paolo
Thanks for your comments and advice.
I'll do it in next version.
Xiaoyao
On Mon, 2019-03-04 at 09:42 +0100, Paolo Bonzini wrote:
> On 02/03/19 03:45, Fenghua Yu wrote:
> > From: Xiaoyao Li <[email protected]>
> >
> > MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 5 of which
> > reports the capability of enabling detection of split locks (will be
> > supported on future processors based on Tremont microarchitecture and
> > later).
> >
> > Please check the latest Intel Architecture Instruction Set Extensions
> > and Future Features Programming Reference for more detailed information
> > on the MSR and the split lock bit.
> >
> > 1. Expose it to user space as a feature-enumerating MSR, so that user
> > space can query it.
> >
> > 2. Emulate MSR_IA32_CORE_CAPABILITY with vmx->core_capability, and add
> > get and set handlers for MSR_IA32_CORE_CAPABILITY.
> > Userspace can set this MSR when customizing features of the guest,
> > and can also read the value of this MSR of the guest.
> > For the guest, as it's a feature-enumerating MSR, it can only read it.
> >
> > Signed-off-by: Xiaoyao Li <[email protected]>
> > Signed-off-by: Fenghua Yu <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
> > arch/x86/kvm/vmx/vmx.h | 1 +
> > arch/x86/kvm/x86.c | 17 ++++++++++++++++-
> > 4 files changed, 41 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > index 180373360e34..208f15570d17 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1527,6 +1527,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long
> > ipi_bitmap_low,
> > unsigned long icr, int op_64_bit);
> >
> > u64 kvm_get_arch_capabilities(void);
> > +u64 kvm_get_core_capability(void);
> > void kvm_define_shared_msr(unsigned index, u32 msr);
> > int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 30a6bcd735ec..3e03c6e1e558 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> >
> > msr_info->data = to_vmx(vcpu)->spec_ctrl;
> > break;
> > + case MSR_IA32_CORE_CAPABILITY:
> > + if (!msr_info->host_initiated &&
> > + !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITY))
> > + return 1;
> > + msr_info->data = vmx->core_capability;
> > + break;
> > case MSR_IA32_ARCH_CAPABILITIES:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
> > @@ -1891,6 +1897,21 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
> > MSR_IA32_PRED_CMD,
> > MSR_TYPE_W);
> > break;
> > + case MSR_IA32_CORE_CAPABILITY:
> > + if (!msr_info->host_initiated)
> > + return 1;
> > + if (data & ~CORE_CAP_SPLIT_LOCK_DETECT)
> > + return 1;
> > +
> > + /*
> > + * Since AC split lock is a hardware feature, and there is no
> > + * software emulation yet, we cannot enable it for guest if
> > + * host hardware doesn't support it.
> > + */
> > + if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > + data &= ~CORE_CAP_SPLIT_LOCK_DETECT;
> > + vmx->core_capability = data;
> > + break;
> > case MSR_IA32_ARCH_CAPABILITIES:
> > if (!msr_info->host_initiated)
> > return 1;
> > @@ -4083,6 +4104,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> > ++vmx->nmsrs;
> > }
> >
> > + vmx->core_capability = kvm_get_core_capability();
> > +
> > vmx->arch_capabilities = kvm_get_arch_capabilities();
> >
> > vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index 0ac0a64c7790..cc22379991f3 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -191,6 +191,7 @@ struct vcpu_vmx {
> > u64 msr_guest_kernel_gs_base;
> > #endif
> >
> > + u64 core_capability;
> > u64 arch_capabilities;
> > u64 spec_ctrl;
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 941f932373d0..c3c9e3f2d08a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1125,7 +1125,8 @@ static u32 msrs_to_save[] = {
> > #endif
> > MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> > MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
> > - MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES,
> > + MSR_IA32_SPEC_CTRL, MSR_IA32_CORE_CAPABILITY,
> > + MSR_IA32_ARCH_CAPABILITIES,
> > MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
> > MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
> > MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
> > @@ -1197,11 +1198,22 @@ static u32 msr_based_features[] = {
> >
> > MSR_F10H_DECFG,
> > MSR_IA32_UCODE_REV,
> > + MSR_IA32_CORE_CAPABILITY,
> > MSR_IA32_ARCH_CAPABILITIES,
> > };
> >
> > static unsigned int num_msr_based_features;
> >
> > +u64 kvm_get_core_capability(void)
> > +{
> > + u64 data;
> > +
> > + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
>
> This patch should be merged with the previous patch. Also here you
> should add:
>
> data &= CORE_CAP_SPLIT_LOCK_DETECT;
>
> so that non-virtualizable features are hidden and
>
> if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> data |= CORE_CAP_SPLIT_LOCK_DETECT;
>
> so that userspace gets "for free" the FMS list that will be added
> later to the kernel.
>
> Thanks,
>
> Paolo
You are right. I'll correct it in next version.
> > +
> > + return data;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_get_core_capability);
> > +
> > u64 kvm_get_arch_capabilities(void)
> > {
> > u64 data;
> > @@ -1227,6 +1239,9 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
> > static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
> > {
> > switch (msr->index) {
> > + case MSR_IA32_CORE_CAPABILITY:
> > + msr->data = kvm_get_core_capability();
> > + break;
> > case MSR_IA32_ARCH_CAPABILITIES:
> > msr->data = kvm_get_arch_capabilities();
> > break;
> >
>
>
On Mon, Mar 04, 2019 at 11:46:52AM +0100, Paolo Bonzini wrote:
> From: Paolo Bonzini <[email protected]>
> Subject: [PATCH] wlcore: simplify/fix/optimize reg_ch_conf_pending operations
>
> Bitmaps are defined on unsigned longs, so the usage of u32[2] in the
> wlcore driver is incorrect. As noted by Peter Zijlstra, casting arrays
> to a bitmap is incorrect for big-endian architectures.
>
> When looking at it I observed that:
>
> - operations on reg_ch_conf_pending are always under the wl_lock mutex,
> so set_bit is overkill
>
> - the only case where reg_ch_conf_pending is accessed a u32 at a time is
> unnecessary too.
>
> This patch cleans up everything in this area, and changes tmp_ch_bitmap
> to have the proper alignment.
>
> Reported-by: Fenghua Yu <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
>
> diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
> index 903968735a74..3e093f3a7ec8 100644
> --- a/drivers/net/wireless/ti/wlcore/cmd.c
> +++ b/drivers/net/wireless/ti/wlcore/cmd.c
> @@ -1700,14 +1700,14 @@ void wlcore_set_pending_regdomain_ch(struct wl1271 *wl, u16 channel,
> ch_bit_idx = wlcore_get_reg_conf_ch_idx(band, channel);
>
> if (ch_bit_idx >= 0 && ch_bit_idx <= WL1271_MAX_CHANNELS)
> - set_bit(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
> + __set_bit_le(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
> }
>
> int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
> {
> struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
> int ret = 0, i, b, ch_bit_idx;
> - u32 tmp_ch_bitmap[2];
> + u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
Also mark it as __le32 ?
> struct wiphy *wiphy = wl->hw->wiphy;
> struct ieee80211_supported_band *band;
> bool timeout = false;
> @@ -1717,7 +1717,7 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
>
> wl1271_debug(DEBUG_CMD, "cmd reg domain config");
>
> - memset(tmp_ch_bitmap, 0, sizeof(tmp_ch_bitmap));
> + memcpy(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
How about using:
bitmap_to_arr32(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
for (i=0; i<2; i++)
tmp_ch_bitmap[i] = cpu_to_le32(tmp_ch_bitmap[i]);
(or add bitmap_to_arr32_le ?)
> for (b = NL80211_BAND_2GHZ; b <= NL80211_BAND_5GHZ; b++) {
> band = wiphy->bands[b];
> @@ -1738,13 +1738,10 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
> if (ch_bit_idx < 0)
> continue;
>
> - set_bit(ch_bit_idx, (long *)tmp_ch_bitmap);
> + __set_bit_le(ch_bit_idx, (long *)tmp_ch_bitmap);
But you copied in reg_ch_conf_pending without doing an LE swizzle.
With the proposed change, we have two __le32 here and it works again.
> }
> }
>
> - tmp_ch_bitmap[0] |= wl->reg_ch_conf_pending[0];
> - tmp_ch_bitmap[1] |= wl->reg_ch_conf_pending[1];
> -
> if (!memcmp(tmp_ch_bitmap, wl->reg_ch_conf_last, sizeof(tmp_ch_bitmap)))
> goto out;
>
And then remove the cpu_to_le32() on assignment to ch_bit_map*.
> diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
> index dd14850b0603..870eea3e7a27 100644
> --- a/drivers/net/wireless/ti/wlcore/wlcore.h
> +++ b/drivers/net/wireless/ti/wlcore/wlcore.h
> @@ -320,9 +320,9 @@ struct wl1271 {
> bool watchdog_recovery;
>
> /* Reg domain last configuration */
> - u32 reg_ch_conf_last[2] __aligned(8);
> + DECLARE_BITMAP(reg_ch_conf_last, 64);
Is never actually used as a bitmap but used as opaque storage with
memcpy and memcmp against tmp_ch_bitmap.
> /* Reg domain pending configuration */
> - u32 reg_ch_conf_pending[2];
> + DECLARE_BITMAP(reg_ch_conf_pending, 64);
>
> /* Pointer that holds DMA-friendly block for the mailbox */
> void *mbox;
On Mon, Mar 04, 2019 at 11:48:18AM +0100, Paolo Bonzini wrote:
> True that. On the other hand btsl/btrl is also one byte smaller if no
> operand is %r8-%r15.
Because then we lose the REX prefix, right. Now _that_ might actually
be a reason to do that :-)
> In any case, /me wonders if we should have a macro like
>
> #define DECLARE_LE_BITMAP(name,bits) \
> u32 name[DIV_ROUND_UP(bits, 32)] __aligned(sizeof(unsigned long))
s/u32/__le32/
To go in bitops/le.h, sure, if there's enough users.
On 04/03/19 13:44, Peter Zijlstra wrote:
> On Mon, Mar 04, 2019 at 11:48:18AM +0100, Paolo Bonzini wrote:
>> True that. On the other hand btsl/btrl is also one byte smaller if no
>> operand is %r8-%r15.
>
> Because then we lose the REX prefix, right. Now _that_ might actually
> be a reason to do that :-)
I knew that would be the right way to put it for you. :)
>> In any case, /me wonders if we should have a macro like
>>
>> #define DECLARE_LE_BITMAP(name,bits) \
>> u32 name[DIV_ROUND_UP(bits, 32)] __aligned(sizeof(unsigned long))
>
> s/u32/__le32/
>
> To go in bitops/le.h, sure, if there's enough users.
Hmm... actually that should be "BITS_TO_LONGS(bits) * sizeof(unsigned
long) / 4" because bitmap functions may access (or even clear) the last
word as 64 bits.
Paolo
On 04/03/19 13:41, Peter Zijlstra wrote:
> On Mon, Mar 04, 2019 at 11:46:52AM +0100, Paolo Bonzini wrote:
>> From: Paolo Bonzini <[email protected]>
>> Subject: [PATCH] wlcore: simplify/fix/optimize reg_ch_conf_pending operations
>>
>> Bitmaps are defined on unsigned longs, so the usage of u32[2] in the
>> wlcore driver is incorrect. As noted by Peter Zijlstra, casting arrays
>> to a bitmap is incorrect for big-endian architectures.
>>
>> When looking at it I observed that:
>>
>> - operations on reg_ch_conf_pending are always under the wl_lock mutex,
>> so set_bit is overkill
>>
>> - the only case where reg_ch_conf_pending is accessed a u32 at a time is
>> unnecessary too.
>>
>> This patch cleans up everything in this area, and changes tmp_ch_bitmap
>> to have the proper alignment.
>>
>> Reported-by: Fenghua Yu <[email protected]>
>> Signed-off-by: Paolo Bonzini <[email protected]>
>>
>> diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
>> index 903968735a74..3e093f3a7ec8 100644
>> --- a/drivers/net/wireless/ti/wlcore/cmd.c
>> +++ b/drivers/net/wireless/ti/wlcore/cmd.c
>> @@ -1700,14 +1700,14 @@ void wlcore_set_pending_regdomain_ch(struct wl1271 *wl, u16 channel,
>> ch_bit_idx = wlcore_get_reg_conf_ch_idx(band, channel);
>>
>> if (ch_bit_idx >= 0 && ch_bit_idx <= WL1271_MAX_CHANNELS)
>> - set_bit(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
>> + __set_bit_le(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
>> }
>>
>> int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
>> {
>> struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
>> int ret = 0, i, b, ch_bit_idx;
>> - u32 tmp_ch_bitmap[2];
>> + u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
>
> Also mark it as __le32 ?
That would require more changes to mark ch_bit_map1/ch_bit_map2 as
__le32 (I think, I don't do much sparse), so I didn't do that.
>> struct wiphy *wiphy = wl->hw->wiphy;
>> struct ieee80211_supported_band *band;
>> bool timeout = false;
>> @@ -1717,7 +1717,7 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
>>
>> wl1271_debug(DEBUG_CMD, "cmd reg domain config");
>>
>> - memset(tmp_ch_bitmap, 0, sizeof(tmp_ch_bitmap));
>> + memcpy(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
>
> How about using:
>
> bitmap_to_arr32(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
> for (i=0; i<2; i++)
> tmp_ch_bitmap[i] = cpu_to_le32(tmp_ch_bitmap[i]);
>
> (or add bitmap_to_arr32_le ?)
I've used __set_bit_le when setting reg_ch_conf_pending so no need to
swizzle here; OTOH bitmap_to_arr32 doesn't work here because that
swizzle already swaps halfwords.
>> for (b = NL80211_BAND_2GHZ; b <= NL80211_BAND_5GHZ; b++) {
>> band = wiphy->bands[b];
>> @@ -1738,13 +1738,10 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
>> if (ch_bit_idx < 0)
>> continue;
>>
>> - set_bit(ch_bit_idx, (long *)tmp_ch_bitmap);
>> + __set_bit_le(ch_bit_idx, (long *)tmp_ch_bitmap);
>
> But you copied in reg_ch_conf_pending without doing an LE swizzle.
> With the proposed change, we have two __le32 here and it works again.
(Again there's no need to do an LE swizzle because it's done in
wlcore_set_pending_regdomain_ch).
>> }
>> }
>>
>> - tmp_ch_bitmap[0] |= wl->reg_ch_conf_pending[0];
>> - tmp_ch_bitmap[1] |= wl->reg_ch_conf_pending[1];
>> -
>> if (!memcmp(tmp_ch_bitmap, wl->reg_ch_conf_last, sizeof(tmp_ch_bitmap)))
>> goto out;
>>
>
> And then remove the cpu_to_le32() on assignment to ch_bit_map*.
Yup, I forgot to commit that cpu_to_le32 removal.
>> diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
>> index dd14850b0603..870eea3e7a27 100644
>> --- a/drivers/net/wireless/ti/wlcore/wlcore.h
>> +++ b/drivers/net/wireless/ti/wlcore/wlcore.h
>> @@ -320,9 +320,9 @@ struct wl1271 {
>> bool watchdog_recovery;
>>
>> /* Reg domain last configuration */
>> - u32 reg_ch_conf_last[2] __aligned(8);
>> + DECLARE_BITMAP(reg_ch_conf_last, 64);
>
> Is never actually used as a bitmap but used as opaque storage with
> memcpy and memcmp against tmp_ch_bitmap.
Yeah, but it is the easiest way to ensure it has the same size as
reg_ch_conf_pending. The two are related, so it makes sense to declare
them the same way.
Paolo
On Mon, Mar 04, 2019 at 02:09:30PM +0100, Paolo Bonzini wrote:
> >> @@ -1700,14 +1700,14 @@ void wlcore_set_pending_regdomain_ch(struct wl1271 *wl, u16 channel,
> >> ch_bit_idx = wlcore_get_reg_conf_ch_idx(band, channel);
> >>
> >> if (ch_bit_idx >= 0 && ch_bit_idx <= WL1271_MAX_CHANNELS)
> >> - set_bit(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
> >> + __set_bit_le(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
> >> }
> >>
> I've used __set_bit_le when setting reg_ch_conf_pending so no need to
> swizzle here;
Argh, reading is hard. Yes, then it all works.
On Mon, Mar 04, 2019 at 11:46:52AM +0100, Paolo Bonzini wrote:
> On 04/03/19 11:11, Peter Zijlstra wrote:
> > On Fri, Mar 01, 2019 at 06:44:57PM -0800, Fenghua Yu wrote:
> >> A bit in reg_ch_conf_pending in wl1271 and tmp_ch_bitmap is set
> >> atomically by set_bit(). set_bit() sets the bit in a single
> >> unsigned long location. If the variables are not aligned to
> >> unsigned long, set_bit() accesses two cache lines and thus causes
> >> slower performance. On x86, this scenario is called a split lock and
> >> can cause overall performance degradation because the locked BTSL
> >> instruction in set_bit() locks the bus.
> >>
> >> To avoid performance degradation, the two variables are aligned to
> >> unsigned long.
> >>
> >> Signed-off-by: Fenghua Yu <[email protected]>
> >> ---
> >>  drivers/net/wireless/ti/wlcore/cmd.c    | 3 ++-
> >>  drivers/net/wireless/ti/wlcore/wlcore.h | 6 ++++--
> >>  2 files changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
> >> index 903968735a74..8d15a6307d44 100644
> >> --- a/drivers/net/wireless/ti/wlcore/cmd.c
> >> +++ b/drivers/net/wireless/ti/wlcore/cmd.c
> >> @@ -1707,7 +1707,8 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
> >>  {
> >>  	struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
> >>  	int ret = 0, i, b, ch_bit_idx;
> >> -	u32 tmp_ch_bitmap[2];
> >> +	/* Align to unsigned long for better performance in set_bit() */
> >> +	u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
>
> This is the only place where an array of u32 is needed, because of
>
> cmd->ch_bit_map1 = cpu_to_le32(tmp_ch_bitmap[0]);
> cmd->ch_bit_map2 = cpu_to_le32(tmp_ch_bitmap[1]);
>
> All the others should use DECLARE_BITMAP, including reg_ch_conf_last
> which was already using __aligned. As Peter mentioned they should
> also use set_bit_le. Actually they do not need locked access at all
> because the only code paths to the set_bit take a mutex.
>
> There is one other place that is accessing the items of the array, but
> it is fixed easily. The following patch should do everything:
>
> ------------------- 8< --------------------------
> From: Paolo Bonzini <[email protected]>
> Subject: [PATCH] wlcore: simplify/fix/optimize reg_ch_conf_pending operations
>
> Bitmaps are defined on unsigned longs, so the usage of u32[2] in the
> wlcore driver is incorrect. As noted by Peter Zijlstra, casting arrays
> to a bitmap is incorrect for big-endian architectures.
>
> When looking at it I observed that:
>
> - operations on reg_ch_conf_pending are always under the wl_lock mutex,
> so set_bit is overkill
>
> - the only case where reg_ch_conf_pending is accessed a u32 at a time is
> unnecessary too.
>
> This patch cleans up everything in this area, and changes tmp_ch_bitmap
> to have the proper alignment.
>
> Reported-by: Fenghua Yu <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
>
> diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
> index 903968735a74..3e093f3a7ec8 100644
> --- a/drivers/net/wireless/ti/wlcore/cmd.c
> +++ b/drivers/net/wireless/ti/wlcore/cmd.c
> @@ -1700,14 +1700,14 @@ void wlcore_set_pending_regdomain_ch(struct wl1271 *wl, u16 channel,
> ch_bit_idx = wlcore_get_reg_conf_ch_idx(band, channel);
>
> if (ch_bit_idx >= 0 && ch_bit_idx <= WL1271_MAX_CHANNELS)
> - set_bit(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
> + __set_bit_le(ch_bit_idx, (long *)wl->reg_ch_conf_pending);
> }
>
> int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
> {
> struct wl12xx_cmd_regdomain_dfs_config *cmd = NULL;
> int ret = 0, i, b, ch_bit_idx;
> - u32 tmp_ch_bitmap[2];
> + u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
Now __aligned() is unnecessary because __set_bit_le() handles tmp_ch_bitmap,
right?
> struct wiphy *wiphy = wl->hw->wiphy;
> struct ieee80211_supported_band *band;
> bool timeout = false;
> @@ -1717,7 +1717,7 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
>
> wl1271_debug(DEBUG_CMD, "cmd reg domain config");
>
> - memset(tmp_ch_bitmap, 0, sizeof(tmp_ch_bitmap));
> + memcpy(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
>
> for (b = NL80211_BAND_2GHZ; b <= NL80211_BAND_5GHZ; b++) {
> band = wiphy->bands[b];
> @@ -1738,13 +1738,10 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
> if (ch_bit_idx < 0)
> continue;
>
> - set_bit(ch_bit_idx, (long *)tmp_ch_bitmap);
> + __set_bit_le(ch_bit_idx, (long *)tmp_ch_bitmap);
Is __test_and_set_bit_le() more meaningful here, to avoid duplicate bit setting?
> }
> }
>
> - tmp_ch_bitmap[0] |= wl->reg_ch_conf_pending[0];
> - tmp_ch_bitmap[1] |= wl->reg_ch_conf_pending[1];
> -
> if (!memcmp(tmp_ch_bitmap, wl->reg_ch_conf_last, sizeof(tmp_ch_bitmap)))
> goto out;
>
> diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
> index dd14850b0603..870eea3e7a27 100644
> --- a/drivers/net/wireless/ti/wlcore/wlcore.h
> +++ b/drivers/net/wireless/ti/wlcore/wlcore.h
> @@ -320,9 +320,9 @@ struct wl1271 {
> bool watchdog_recovery;
>
> /* Reg domain last configuration */
> - u32 reg_ch_conf_last[2] __aligned(8);
> + DECLARE_BITMAP(reg_ch_conf_last, 64);
> /* Reg domain pending configuration */
> - u32 reg_ch_conf_pending[2];
> + DECLARE_BITMAP(reg_ch_conf_pending, 64);
>
> /* Pointer that holds DMA-friendly block for the mailbox */
> void *mbox;
On Mon, Mar 04, 2019 at 11:00:22AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 01, 2019 at 06:44:56PM -0800, Fenghua Yu wrote:
> > A bit in pwol_mask is set in b44_magic_pattern atomically by set_bit().
> > set_bit() sets the bit in a single unsigned long location. Since pwol_mask
> > may not be aligned to unsigned long, the location may cross two cache
> > lines and accessing the location degrades performance. On x86, a locked
> > instruction in set_bit() accessing two cache lines is called a split lock
> > and can cause overall performance degradation.
> >
> > To avoid impacting performance by accessing two cache lines in set_bit(),
> > align pwol_mask to unsigned long.
> >
> > Signed-off-by: Fenghua Yu <[email protected]>
> > ---
> > drivers/net/ethernet/broadcom/b44.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> > index 97ab0dd25552..bc544b6b9c3a 100644
> > --- a/drivers/net/ethernet/broadcom/b44.c
> > +++ b/drivers/net/ethernet/broadcom/b44.c
> > @@ -1547,7 +1547,8 @@ static void b44_setup_pseudo_magicp(struct b44 *bp)
> > u32 val;
> > int plen0, plen1, plen2;
> > u8 *pwol_pattern;
> > - u8 pwol_mask[B44_PMASK_SIZE];
> > + /* Align to unsigned long for better performance in set_bit() */
> > + u8 pwol_mask[B44_PMASK_SIZE] __aligned(sizeof(unsigned long));
> >
> > pwol_pattern = kzalloc(B44_PATTERN_SIZE, GFP_KERNEL);
> > if (!pwol_pattern)
>
> That is truly horrid code. But afaict pwol_mask is local and never
> exposed to concurrency, so _why_ does it need atomic bitset in the first
> place?
>
> Would not the below be a _much_ better solution?
>
>
> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> index 97ab0dd25552..0b4226b406b1 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
>
> memset(ppattern + offset, 0xff, magicsync);
> for (j = 0; j < magicsync; j++)
> - set_bit(len++, (unsigned long *) pmask);
> + __set_bit(len++, (unsigned long *) pmask);
>
> for (j = 0; j < B44_MAX_PATTERNS; j++) {
> if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
> @@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
> for (k = 0; k< ethaddr_bytes; k++) {
> ppattern[offset + magicsync +
> (j * ETH_ALEN) + k] = macaddr[k];
> - set_bit(len++, (unsigned long *) pmask);
> + __set_bit(len++, (unsigned long *) pmask);
> }
> }
> return len - 1;
Sure. I will change the patch to your code. And add
Signed-off-by: Peter Zijlstra <[email protected]>
to this patch?
Thanks.
-Fenghua
On 04/03/19 15:40, Fenghua Yu wrote:
>> + u32 tmp_ch_bitmap[2] __aligned(sizeof(unsigned long));
>
> Now __aligned() is unnecessary because __set_bit_le() handles tmp_ch_bitmap,
> right?
It's still needed because, as explained by Peter, bitmap functions
should always operate on properly aligned variables.
>> struct wiphy *wiphy = wl->hw->wiphy;
>> struct ieee80211_supported_band *band;
>> bool timeout = false;
>> @@ -1717,7 +1717,7 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
>>
>> wl1271_debug(DEBUG_CMD, "cmd reg domain config");
>>
>> - memset(tmp_ch_bitmap, 0, sizeof(tmp_ch_bitmap));
>> + memcpy(tmp_ch_bitmap, wl->reg_ch_conf_pending, sizeof(tmp_ch_bitmap));
>>
>> for (b = NL80211_BAND_2GHZ; b <= NL80211_BAND_5GHZ; b++) {
>> band = wiphy->bands[b];
>> @@ -1738,13 +1738,10 @@ int wlcore_cmd_regdomain_config_locked(struct wl1271 *wl)
>> if (ch_bit_idx < 0)
>> continue;
>>
>> - set_bit(ch_bit_idx, (long *)tmp_ch_bitmap);
>> + __set_bit_le(ch_bit_idx, (long *)tmp_ch_bitmap);
>
> Is __test_and_set_bit_le() more meaningful to avoid duplicate bit setting ?
No, since the return value would be unused.
Note that this patch is missing the removal of cpu_to_le32, as noticed
by Peter.
Thanks,
Paolo
>
>> }
>> }
>>
>> - tmp_ch_bitmap[0] |= wl->reg_ch_conf_pending[0];
>> - tmp_ch_bitmap[1] |= wl->reg_ch_conf_pending[1];
>> -
>> if (!memcmp(tmp_ch_bitmap, wl->reg_ch_conf_last, sizeof(tmp_ch_bitmap)))
>> goto out;
>>
>> diff --git a/drivers/net/wireless/ti/wlcore/wlcore.h b/drivers/net/wireless/ti/wlcore/wlcore.h
>> index dd14850b0603..870eea3e7a27 100644
>> --- a/drivers/net/wireless/ti/wlcore/wlcore.h
>> +++ b/drivers/net/wireless/ti/wlcore/wlcore.h
>> @@ -320,9 +320,9 @@ struct wl1271 {
>> bool watchdog_recovery;
>>
>> /* Reg domain last configuration */
>> - u32 reg_ch_conf_last[2] __aligned(8);
>> + DECLARE_BITMAP(reg_ch_conf_last, 64);
>> /* Reg domain pending configuration */
>> - u32 reg_ch_conf_pending[2];
>> + DECLARE_BITMAP(reg_ch_conf_pending, 64);
>>
>> /* Pointer that holds DMA-friendly block for the mailbox */
>> void *mbox;
On Mon, Mar 04, 2019 at 06:45:26AM -0800, Fenghua Yu wrote:
> Sure. I will change the patch to your code. And add
> Signed-off-by: Peter Zijlstra <[email protected]>
> to this patch?
Sure..
On 3/1/19 6:44 PM, Fenghua Yu wrote:
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 33051436c864..eb8ae701ef65 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
> __u32 extended_cpuid_level;
> /* Maximum supported CPUID level, -1=no CPUID: */
> int cpuid_level;
> - __u32 x86_capability[NCAPINTS + NBUGINTS];
> + /* Unsigned long alignment to avoid split lock in atomic bitmap ops */
> + __u32 x86_capability[NCAPINTS + NBUGINTS]
> + __aligned(sizeof(unsigned long));
I think this also warrants a comment in the changelog about the
alignment of 'struct cpuinfo_x86'.
On 3/1/19 6:44 PM, Fenghua Yu wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 6d6122524711..350eeccd0ce9 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -349,6 +349,7 @@
> #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
> #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
> #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
> +#define X86_FEATURE_CORE_CAPABILITY (18*32+30) /* IA32_CORE_CAPABILITY MSR */
> #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */
What does this feature end up looking like in /proc/cpuinfo?
> From: Hansen, Dave
> On 3/1/19 6:44 PM, Fenghua Yu wrote:
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 6d6122524711..350eeccd0ce9 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -349,6 +349,7 @@
> >  #define X86_FEATURE_INTEL_STIBP	(18*32+27) /* "" Single Thread Indirect Branch Predictors */
> >  #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
> >  #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
> > +#define X86_FEATURE_CORE_CAPABILITY	(18*32+30) /* IA32_CORE_CAPABILITY MSR */
> >  #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
>
> What does this feature end up looking like in /proc/cpuinfo?
The flag string is "core_capability" in /proc/cpuinfo.
Thanks.
-Fenghua
> * Clear/Set all flags overridden by options, after probe.
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index 2c0bd38a44ab..5ba11ce99f92 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -59,6 +59,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> { X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F },
> { X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F },
> { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
> + { X86_FEATURE_SPLIT_LOCK_DETECT, X86_FEATURE_CORE_CAPABILITY},
> {}
> };
Please reindent the rest of the structure whenever you break the record
for name length.
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index fc3c07fe7df5..0c44c49f6005 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1029,3 +1029,24 @@ static const struct cpu_dev intel_cpu_dev = {
>
> cpu_dev_register(intel_cpu_dev);
>
> +/**
> + * init_core_capability - enumerate features supported in IA32_CORE_CAPABILITY
> + * @c: pointer to cpuinfo_x86
> + *
> + * Return: void
> + */
> +void init_core_capability(struct cpuinfo_x86 *c)
> +{
> + /*
> + * If MSR_IA32_CORE_CAPABILITY exists, enumerate features that are
> + * reported in the MSR.
> + */
> + if (c == &boot_cpu_data && cpu_has(c, X86_FEATURE_CORE_CAPABILITY)) {
First of all, please endeavor to leave the main flow of the function at
the first indent.
*If* you were to do this, it would look like:
if (c != &boot_cpu_data)
return;
if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY))
return;
rdmsrl(...);
But, it goes unmentioned why you do the boot-cpu-only restriction here.
It doesn't match the behavior of the two functions called before
init_core_capability():
init_scattered_cpuid_features(c);
init_speculation_control(c);
So why is this new function special?
> + u64 val;
> +
> + rdmsrl(MSR_IA32_CORE_CAPABILITY, val);
> +
> + if (val & CORE_CAP_SPLIT_LOCK_DETECT)
> + setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> + }
> +}
>
On 3/4/19 10:55 AM, Yu, Fenghua wrote:
>> What does this feature end up looking like in /proc/cpuinfo?
> The flag string is "core_capability" in /proc/cpuinfo.
Ahh, OK, I misread the lack of quotes in the comment:
> +#define X86_FEATURE_CORE_CAPABILITY (18*32+30) /* IA32_CORE_CAPABILITY MSR */
On Mon, Mar 04, 2019 at 10:58:01AM -0800, Dave Hansen wrote:
> > * Clear/Set all flags overridden by options, after probe.
> > diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> > index 2c0bd38a44ab..5ba11ce99f92 100644
> > --- a/arch/x86/kernel/cpu/cpuid-deps.c
> > +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> > @@ -59,6 +59,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> > { X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F },
> > { X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F },
> > { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
> > + { X86_FEATURE_SPLIT_LOCK_DETECT, X86_FEATURE_CORE_CAPABILITY},
> > {}
> > };
>
> Please reindent the rest of the structure whenever you break the record
> for name length.
Ok. Will do it.
>
> > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> > index fc3c07fe7df5..0c44c49f6005 100644
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -1029,3 +1029,24 @@ static const struct cpu_dev intel_cpu_dev = {
> >
> > cpu_dev_register(intel_cpu_dev);
> >
> > +/**
> > + * init_core_capability - enumerate features supported in IA32_CORE_CAPABILITY
> > + * @c: pointer to cpuinfo_x86
> > + *
> > + * Return: void
> > + */
> > +void init_core_capability(struct cpuinfo_x86 *c)
> > +{
> > + /*
> > + * If MSR_IA32_CORE_CAPABILITY exists, enumerate features that are
> > + * reported in the MSR.
> > + */
> > + if (c == &boot_cpu_data && cpu_has(c, X86_FEATURE_CORE_CAPABILITY)) {
>
> First of all, please endeavor to leave the main flow of the function at
> the first indent.
>
> *If* you were to do this, it would look like:
>
>
> if (c != &boot_cpu_data)
> return;
> if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY))
> return;
>
> rdmsrl(...);
Ok. Will do it.
>
> But, it goes unmentioned why you do the boot-cpu-only restriction here.
> It doesn't match the behavior of the two functions called before
> init_core_capability():
>
> init_scattered_cpuid_features(c);
> init_speculation_control(c);
>
> So why is this new function special?
The function sets the split_lock_detect feature, which needs to be
applied before the BSP calls apply_forced_caps() in get_cpu_cap(). And I
want to enable split lock detection in an earlier phase to detect split
locks earlier.
Thanks.
-Fenghua
On 3/4/19 10:59 AM, Fenghua Yu wrote:
>> But, it goes unmentioned why you do the boot-cpu-only restriction here.
>> It doesn't match the behavior of the two functions called before
>> init_core_capability():
>>
>> init_scattered_cpuid_features(c);
>> init_speculation_control(c);
>>
>> So why is this new function special?
> The function sets the split_lock_detect feature which needs to be
> applied before BSP calls apply_enforced_caps() in get_cpu_cap().
Why does it need to be applied that way?
> And I want to enable split lock detection in earlier phase to detect
> split lock earlier.
Why do you want to do this? Why is this *required* as part of this
implementation?
... and how is this different than the other features that are processed
for cpus other than the boot CPU?
On Mon, Mar 04, 2019 at 10:52:19AM -0800, Dave Hansen wrote:
> On 3/1/19 6:44 PM, Fenghua Yu wrote:
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 33051436c864..eb8ae701ef65 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
> > __u32 extended_cpuid_level;
> > /* Maximum supported CPUID level, -1=no CPUID: */
> > int cpuid_level;
> > - __u32 x86_capability[NCAPINTS + NBUGINTS];
> > + /* Unsigned long alignment to avoid split lock in atomic bitmap ops */
> > + __u32 x86_capability[NCAPINTS + NBUGINTS]
> > + __aligned(sizeof(unsigned long));
>
> I think this also warrants a comment in the changelog about the
> alignment of 'struct cpuinfo_x86'.
How about adding "Depending on the starting address GCC chooses for the
data of struct cpuinfo_x86, x86_capability[] may or may not be aligned to
unsigned long...."?
Thanks.
-Fenghua
On 3/4/19 11:15 AM, Fenghua Yu wrote:
> On Mon, Mar 04, 2019 at 10:52:19AM -0800, Dave Hansen wrote:
>> On 3/1/19 6:44 PM, Fenghua Yu wrote:
>>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>>> index 33051436c864..eb8ae701ef65 100644
>>> --- a/arch/x86/include/asm/processor.h
>>> +++ b/arch/x86/include/asm/processor.h
>>> @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
>>> __u32 extended_cpuid_level;
>>> /* Maximum supported CPUID level, -1=no CPUID: */
>>> int cpuid_level;
>>> - __u32 x86_capability[NCAPINTS + NBUGINTS];
>>> + /* Unsigned long alignment to avoid split lock in atomic bitmap ops */
>>> + __u32 x86_capability[NCAPINTS + NBUGINTS]
>>> + __aligned(sizeof(unsigned long));
>> I think this also warrants a comment in the changelog about the
>> alignment of 'struct cpuinfo_x86'.
> How about adding "Depending on the starting address GCC chooses for the
> data of struct cpuinfo_x86, x86_capability[] may or may not be aligned to
> unsigned long...."?
If that's the case, then what good is this patch? Sounds like some of
the story is missing.
You might want to dig through some of the past discussions about this.
I know this exact topic has been broached with Thomas.
On Mon, Mar 04, 2019 at 11:15:12AM -0800, Fenghua Yu wrote:
> On Mon, Mar 04, 2019 at 10:52:19AM -0800, Dave Hansen wrote:
> > On 3/1/19 6:44 PM, Fenghua Yu wrote:
> > > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > > index 33051436c864..eb8ae701ef65 100644
> > > --- a/arch/x86/include/asm/processor.h
> > > +++ b/arch/x86/include/asm/processor.h
> > > @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
> > > __u32 extended_cpuid_level;
> > > /* Maximum supported CPUID level, -1=no CPUID: */
> > > int cpuid_level;
> > > - __u32 x86_capability[NCAPINTS + NBUGINTS];
> > > + /* Unsigned long alignment to avoid split lock in atomic bitmap ops */
> > > + __u32 x86_capability[NCAPINTS + NBUGINTS]
> > > + __aligned(sizeof(unsigned long));
> >
> > I think this also warrants a comment in the changelog about the
> > alignment of 'struct cpuinfo_x86'.
>
> How about adding "Depending on the starting address GCC chooses for the
> data of struct cpuinfo_x86, x86_capability[] may or may not be aligned to
> unsigned long...."?
Composite types inherit the strictest alignment of their members.
Without that alignment hints would be utterly useless.
On Mon, Mar 04, 2019 at 10:52:19AM -0800, Dave Hansen wrote:
> On 3/1/19 6:44 PM, Fenghua Yu wrote:
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 33051436c864..eb8ae701ef65 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
> > __u32 extended_cpuid_level;
> > /* Maximum supported CPUID level, -1=no CPUID: */
> > int cpuid_level;
> > - __u32 x86_capability[NCAPINTS + NBUGINTS];
> > + /* Unsigned long alignment to avoid split lock in atomic bitmap ops */
> > + __u32 x86_capability[NCAPINTS + NBUGINTS]
> > + __aligned(sizeof(unsigned long));
>
> I think this also warrants a comment in the changelog about the
> alignment of 'struct cpuinfo_x86'.
Nah.
> This capability may also find usage in cloud. A user process with split
> lock running in one guest can block other cores from accessing shared
> memory during its split locked memory access. That may cause overall
> system performance degradation.
"shared memory" ? As in memory shared between two guests?
>
> Split lock may open a security hole where malicious user code may slow
> down overall system by executing instructions with split lock.
How much slowdown are we talking?
Hi Peter,
I love your patch! Perhaps something to improve:
[auto build test WARNING on kvm/linux-next]
[also build test WARNING on v5.0 next-20190304]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Fenghua-Yu/x86-split_lock-Enable-AC-exception-for-split-locked-accesses/20190305-044643
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: x86_64-randconfig-x017-201909 (attached as .config)
compiler: gcc-8 (Debian 8.2.0-21) 8.2.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from arch/x86/kernel/cpu/intel.c:22:
>> arch/x86/include/asm/insn-eval.h:18:39: warning: 'struct insn' declared inside parameter list will not be visible outside of this definition or declaration
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
^~~~
arch/x86/include/asm/insn-eval.h:19:34: warning: 'struct insn' declared inside parameter list will not be visible outside of this definition or declaration
int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
^~~~
vim +18 arch/x86/include/asm/insn-eval.h
4efea85f Ricardo Neri 2017-10-27 17
32542ee2 Ricardo Neri 2017-10-27 @18 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
e5e45f11 Ricardo Neri 2017-10-27 19 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
bd5a410a Ricardo Neri 2017-10-27 20 unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
e2a5dca7 Borislav Petkov 2017-11-23 21 int insn_get_code_seg_params(struct pt_regs *regs);
32542ee2 Ricardo Neri 2017-10-27 22
:::::: The code at line 18 was first introduced by commit
:::::: 32542ee295bec38e5e1608f8c9d6d28e5a7e6112 x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file
:::::: TO: Ricardo Neri <[email protected]>
:::::: CC: Thomas Gleixner <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Mon, Mar 04, 2019 at 04:55:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > This capability may also find usage in cloud. A user process with split
> > lock running in one guest can block other cores from accessing shared
> > memory during its split locked memory access. That may cause overall
> > system performance degradation.
>
> "shared memory" ? As in memory shared between two guests?
Maybe I should remove the "shared" here.
Thanks.
-Fenghua
On Mon, 2019-03-04 at 12:14 +0100, Paolo Bonzini wrote:
> On 04/03/19 12:10, Xiaoyao Li wrote:
> > Like you said before, I think we don't need the condition judgment
> > "if(boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))", but to set
> > F(CORE_CAPABILITY)
> > always for guest since MSR_IA32_CORE_CAPABILITY is emulated.
> >
> > And we should set the right emulated value of MSR_IA32_CORE_CAPABILITY for
> > guest
> > in function kvm_get_core_capability() based on whether
> > boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) just as you commented in the
> > next
> > patch.
>
> Yes, that would certainly be better. However, you'd also have to move
> MSR_IA32_CORE_CAPABILITY handling to x86.c, because you'd have to enable
> X86_FEATURE_CORE_CAPABILITY for AMD.
>
> Paolo
Hi, Paolo
I just noticed that F(ARCH_CAPABILITIES) is set unconditionally. However, the
handling of MSR_IA32_ARCH_CAPABILITIES only exists in vmx, and the emulation
of this MSR is in vmx->arch_capabilities.
This will cause a #GP when the guest kernel does rdmsr(MSR_IA32_ARCH_CAPABILITIES)
on an AMD CPU, since there is no handling for svm.
Maybe what I think is not correct due to my limited knowledge of
MSR_IA32_ARCH_CAPABILITIES and how the kernel handles its related features.
If what I said above is true and it's indeed an issue, then based on the fact that
both MSR_IA32_ARCH_CAPABILITIES and MSR_IA32_CORE_CAPABILITY are feature-
enumerating MSRs that we emulate in KVM, there are 2 choices to handle it:
it:
1. we unconditionally set F(ARCH_CAPABILITIES) and F(CORE_CAPABILITY) for guest,
move the emulation of these 2 MSRs to vcpu->arch.***, and move all the handling
of these 2 MSRs to x86.c.
2. we conditionally set F(ARCH_CAPABILITIES) and F(CORE_CAPABILITY) only if it
is intel CPU. So we just need to emulate these 2 MSRs in vmx->*** for intel CPU.
I prefer option 2 personally for CORE_CAPABILITY since it makes no sense to
expose MSR_IA32_CORE_CAPABILITY to other x86 vendors.
About ARCH_CAPABILITIES, it seems that we emulate it for generic x86 CPUs
where !x86_match_cpu(cpu_no_speculation). So should we choose option 1 and
move the emulation and handling to x86.c?
Xiaoyao
Hi Peter,
I love your patch! Perhaps something to improve:
[auto build test WARNING on kvm/linux-next]
[also build test WARNING on v5.0 next-20190304]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Fenghua-Yu/x86-split_lock-Enable-AC-exception-for-split-locked-accesses/20190305-044643
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: i386-randconfig-a1-201909 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All warnings (new ones prefixed by >>):
In file included from arch/x86/kernel/cpu/intel.c:22:0:
>> arch/x86/include/asm/insn-eval.h:18:58: warning: 'struct insn' declared inside parameter list
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
^
>> arch/x86/include/asm/insn-eval.h:18:58: warning: its scope is only this definition or declaration, which is probably not what you want
arch/x86/include/asm/insn-eval.h:19:53: warning: 'struct insn' declared inside parameter list
int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
^
Cyclomatic Complexity 1 arch/x86/kernel/cpu/intel.c:forcepae_setup
Cyclomatic Complexity 1 arch/x86/kernel/cpu/intel.c:srat_detect_node
Cyclomatic Complexity 4 arch/x86/kernel/cpu/intel.c:init_cpuid_fault
Cyclomatic Complexity 2 arch/x86/kernel/cpu/intel.c:init_intel_misc_features
Cyclomatic Complexity 4 arch/x86/kernel/cpu/intel.c:intel_size_cache
Cyclomatic Complexity 38 arch/x86/kernel/cpu/intel.c:intel_tlb_lookup
Cyclomatic Complexity 6 arch/x86/kernel/cpu/intel.c:intel_detect_tlb
Cyclomatic Complexity 5 arch/x86/kernel/cpu/intel.c:init_intel_energy_perf
Cyclomatic Complexity 1 arch/x86/kernel/cpu/intel.c:intel_bsp_resume
Cyclomatic Complexity 16 arch/x86/kernel/cpu/intel.c:detect_tme
Cyclomatic Complexity 9 arch/x86/kernel/cpu/intel.c:intel_smp_check
Cyclomatic Complexity 9 arch/x86/kernel/cpu/intel.c:detect_vmx_virtcap
Cyclomatic Complexity 13 arch/x86/kernel/cpu/intel.c:intel_workarounds
Cyclomatic Complexity 1 arch/x86/include/asm/microcode.h:exit_amd_microcode
Cyclomatic Complexity 4 arch/x86/kernel/cpu/intel.c:check_mpx_erratum
Cyclomatic Complexity 37 arch/x86/kernel/cpu/intel.c:early_init_intel
Cyclomatic Complexity 30 arch/x86/kernel/cpu/intel.c:init_intel
Cyclomatic Complexity 3 arch/x86/kernel/cpu/intel.c:ppro_with_ram_bug
Cyclomatic Complexity 8 arch/x86/kernel/cpu/intel.c:fixup_cpuid_exception
Cyclomatic Complexity 4 arch/x86/kernel/cpu/intel.c:init_core_capability
vim +18 arch/x86/include/asm/insn-eval.h
4efea85f Ricardo Neri 2017-10-27 17
32542ee2 Ricardo Neri 2017-10-27 @18 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
e5e45f11 Ricardo Neri 2017-10-27 19 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
bd5a410a Ricardo Neri 2017-10-27 20 unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
e2a5dca7 Borislav Petkov 2017-11-23 21 int insn_get_code_seg_params(struct pt_regs *regs);
32542ee2 Ricardo Neri 2017-10-27 22
:::::: The code at line 18 was first introduced by commit
:::::: 32542ee295bec38e5e1608f8c9d6d28e5a7e6112 x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file
:::::: TO: Ricardo Neri <[email protected]>
:::::: CC: Thomas Gleixner <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi, Paolo
On Mon, 2019-03-04 at 09:42 +0100, Paolo Bonzini wrote:
> On 02/03/19 03:45, Fenghua Yu wrote:
> > From: Xiaoyao Li <[email protected]>
> >
> > MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 5 of which
> > reports the capability of enabling detection of split locks (will be
> > supported on future processors based on Tremont microarchitecture and
> > later).
> >
> > Please check the latest Intel Architecture Instruction Set Extensions
> > and Future Features Programming Reference for more detailed information
> > on the MSR and the split lock bit.
> >
> > 1. Expose it to user space as a feature-enumerating MSR, so that user
> > space can query it.
> >
> > 2. Emulate MSR_IA32_CORE_CAPABILITY with vmx->core_capability, and add
> > get and set handlers for MSR_IA32_CORE_CAPABILITY.
> > Userspace can set this MSR when customizing guest features, and it
> > can also read the guest's value of this MSR.
> > The guest, since this is a feature-enumerating MSR, can only read it.
> >
> > Signed-off-by: Xiaoyao Li <[email protected]>
> > Signed-off-by: Fenghua Yu <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
> > arch/x86/kvm/vmx/vmx.h | 1 +
> > arch/x86/kvm/x86.c | 17 ++++++++++++++++-
> > 4 files changed, 41 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > index 180373360e34..208f15570d17 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1527,6 +1527,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long
> > ipi_bitmap_low,
> > unsigned long icr, int op_64_bit);
> >
> > u64 kvm_get_arch_capabilities(void);
> > +u64 kvm_get_core_capability(void);
> > void kvm_define_shared_msr(unsigned index, u32 msr);
> > int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 30a6bcd735ec..3e03c6e1e558 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> >
> > msr_info->data = to_vmx(vcpu)->spec_ctrl;
> > break;
> > + case MSR_IA32_CORE_CAPABILITY:
> > + if (!msr_info->host_initiated &&
> > + !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITY))
> > + return 1;
> > + msr_info->data = vmx->core_capability;
> > + break;
> > case MSR_IA32_ARCH_CAPABILITIES:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
> > @@ -1891,6 +1897,21 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
> > MSR_IA32_PRED_CMD,
> > MSR_TYPE_W);
> > break;
> > + case MSR_IA32_CORE_CAPABILITY:
> > + if (!msr_info->host_initiated)
> > + return 1;
> > + if (data & ~CORE_CAP_SPLIT_LOCK_DETECT)
> > + return 1;
> > +
> > + /*
> > + * Since AC split lock is a hardware feature, and there is no
> > + * software emulation yet, we cannot enable it for guest if
> > + * host hardware doesn't support it.
> > + */
> > + if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > + data &= ~CORE_CAP_SPLIT_LOCK_DETECT;
> > + vmx->core_capability = data;
> > + break;
> > case MSR_IA32_ARCH_CAPABILITIES:
> > if (!msr_info->host_initiated)
> > return 1;
> > @@ -4083,6 +4104,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> > ++vmx->nmsrs;
> > }
> >
> > + vmx->core_capability = kvm_get_core_capability();
> > +
> > vmx->arch_capabilities = kvm_get_arch_capabilities();
> >
> > vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index 0ac0a64c7790..cc22379991f3 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -191,6 +191,7 @@ struct vcpu_vmx {
> > u64 msr_guest_kernel_gs_base;
> > #endif
> >
> > + u64 core_capability;
> > u64 arch_capabilities;
> > u64 spec_ctrl;
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 941f932373d0..c3c9e3f2d08a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1125,7 +1125,8 @@ static u32 msrs_to_save[] = {
> > #endif
> > MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
> > MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
> > - MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES,
> > + MSR_IA32_SPEC_CTRL, MSR_IA32_CORE_CAPABILITY,
> > + MSR_IA32_ARCH_CAPABILITIES,
> > MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
> > MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
> > MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
> > @@ -1197,11 +1198,22 @@ static u32 msr_based_features[] = {
> >
> > MSR_F10H_DECFG,
> > MSR_IA32_UCODE_REV,
> > + MSR_IA32_CORE_CAPABILITY,
> > MSR_IA32_ARCH_CAPABILITIES,
> > };
> >
> > static unsigned int num_msr_based_features;
> >
> > +u64 kvm_get_core_capability(void)
> > +{
> > + u64 data;
> > +
> > + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
>
> This patch should be merged with the previous patch. Also here you
> should add:
>
> data &= CORE_CAP_SPLIT_LOCK_DETECT;
I agree with this.
> so that non-virtualizable features are hidden and
>
> if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> data |= CORE_CAP_SPLIT_LOCK_DETECT;
>
> so that userspace gets "for free" the FMS list that will be added
> later to the kernel.
I think it's redundant, because there is no case where
rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data) reports no split lock detection
while boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) is true.
Xiaoyao
> Thanks,
>
> Paolo
>
> > +
> > + return data;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_get_core_capability);
> > +
> > u64 kvm_get_arch_capabilities(void)
> > {
> > u64 data;
> > @@ -1227,6 +1239,9 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
> > static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
> > {
> > switch (msr->index) {
> > + case MSR_IA32_CORE_CAPABILITY:
> > + msr->data = kvm_get_core_capability();
> > + break;
> > case MSR_IA32_ARCH_CAPABILITIES:
> > msr->data = kvm_get_arch_capabilities();
> > break;
> >
>
>
On 08/03/19 07:10, Xiaoyao Li wrote:
>> so that non-virtualizable features are hidden and
>>
>> if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
>> data |= CORE_CAP_SPLIT_LOCK_DETECT;
>>
>> so that userspace gets "for free" the FMS list that will be added
>> later to the kernel.
> I think it's redundant. Because there is no case that
> rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data) shows no split lock detection while
> boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT).
There will be once the kernel adds a list of FMS values that have
split lock detection but lack the core capabilities MSR. Or at least
that is what Fenghua said in the cover letter.
Paolo
On Fri, 2019-03-08 at 08:54 +0100, Paolo Bonzini wrote:
> On 08/03/19 07:10, Xiaoyao Li wrote:
> > > so that non-virtualizable features are hidden and
> > >
> > > if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > > data |= CORE_CAP_SPLIT_LOCK_DETECT;
> > >
> > > so that userspace gets "for free" the FMS list that will be added
> > > later to the kernel.
> >
> > I think it's redundant. Because there is no case that
> > rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data) shows no split lock detection
> > while
> > boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT).
>
> There will be when the kernel will add a list of FMS values that have
> split lock detection but lack the core capabilities MSR. Or at least
> that is what Fenghua said in the cover letter.
>
> Paolo
Got it.
Thanks for your explanation.
Hi, Paolo,
Do you have any comments on this patch?
We are preparing v5 patches for split lock detection; if you have any
comments on this one, please let me know.
Thanks,
Xiaoyao
On Fri, 2019-03-01 at 18:45 -0800, Fenghua Yu wrote:
> From: Xiaoyao Li <[email protected]>
>
> A control bit (bit 29) in the TEST_CTL MSR (0x33) will be introduced in
> future x86 processors. When bit 29 is set, the processor raises an #AC
> exception for split-locked accesses at all CPLs.
>
> Please check the latest Intel Software Developer's Manual
> for more detailed information on the MSR and the split lock bit.
>
> 1. Since the kernel chooses to enable AC split lock by default, if we
> do not emulate the TEST_CTL MSR for the guest, the guest will run with
> this feature enabled without knowing it. Thus existing guests with
> buggy firmware (like OVMF) and old kernels that have cross-cache-line
> issues will fail to boot due to #AC.
>
> So we should emulate the TEST_CTL MSR and set it to zero to disable AC
> split lock by default. Whether and when to enable it is left to the
> guest firmware and guest kernel.
>
> 2. Host and guest can enable AC split lock independently, so use MSR
> autoload to switch it during VM entry/exit.
>
> Signed-off-by: Xiaoyao Li <[email protected]>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 1 +
> 2 files changed, 36 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3e03c6e1e558..c0c5e8621afa 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1659,6 +1659,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct
> msr_data *msr_info)
> u32 index;
>
> switch (msr_info->index) {
> + case MSR_TEST_CTL:
> + if (!msr_info->host_initiated &&
> + !(vmx->core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> + return 1;
> + msr_info->data = vmx->msr_test_ctl;
> + break;
> #ifdef CONFIG_X86_64
> case MSR_FS_BASE:
> msr_info->data = vmcs_readl(GUEST_FS_BASE);
> @@ -1805,6 +1811,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> msr_data *msr_info)
> u32 index;
>
> switch (msr_index) {
> + case MSR_TEST_CTL:
> + if (!(vmx->core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> + return 1;
> +
> + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
> + return 1;
> + vmx->msr_test_ctl = data;
> + break;
> case MSR_EFER:
> ret = kvm_set_msr_common(vcpu, msr_info);
> break;
> @@ -4108,6 +4122,9 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
>
> vmx->arch_capabilities = kvm_get_arch_capabilities();
>
> + /* disable AC split lock by default */
> + vmx->msr_test_ctl = 0;
> +
> vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
>
> /* 22.2.1, 20.8.1 */
> @@ -4145,6 +4162,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool
> init_event)
>
> vmx->rmode.vm86_active = 0;
> vmx->spec_ctrl = 0;
> + vmx->msr_test_ctl = 0;
>
> vcpu->arch.microcode_version = 0x100000000ULL;
> vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> @@ -6344,6 +6362,21 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx
> *vmx)
> msrs[i].host, false);
> }
>
> +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> +{
> + u64 host_msr_test_ctl;
> +
> + if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> + return;
> +
> + rdmsrl(MSR_TEST_CTL, host_msr_test_ctl);
> + if (host_msr_test_ctl == vmx->msr_test_ctl)
> + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> + else
> + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> + host_msr_test_ctl, false);
> +}
> +
> static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
> {
> vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
> @@ -6585,6 +6618,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> atomic_switch_perf_msrs(vmx);
>
> + atomic_switch_msr_test_ctl(vmx);
> +
> vmx_update_hv_timer(vcpu);
>
> /*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index cc22379991f3..e8831609c6c3 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -191,6 +191,7 @@ struct vcpu_vmx {
> u64 msr_guest_kernel_gs_base;
> #endif
>
> + u64 msr_test_ctl;
> u64 core_capability;
> u64 arch_capabilities;
> u64 spec_ctrl;
On 09/03/19 03:31, Xiaoyao Li wrote:
> Hi, Paolo,
>
> Do you have any comments on this patch?
>
> We are preparing v5 patches for split lock detection, if you have any comments
> about this one, please let me know.
No, my only comment is that it should be placed _before_ the other two
for bisectability. I think I have already sent that small remark.
Thanks,
Paolo
On Mon, 2019-03-11 at 14:31 +0100, Paolo Bonzini wrote:
> On 09/03/19 03:31, Xiaoyao Li wrote:
> > Hi, Paolo,
> >
> > Do you have any comments on this patch?
> >
> > We are preparing v5 patches for split lock detection, if you have any
> > comments
> > about this one, please let me know.
>
> No, my only comment is that it should be placed _before_ the other two
> for bisectability. I think I have already sent that small remark.
>
> Thanks,
>
> Paolo
I cannot find the small remark you sent before; maybe I missed something.
But I'm confused about why it should be placed _before_ the other two, since
this patch just uses the vmx->core_capability defined in the previous patch.
On 11/03/19 16:10, Xiaoyao Li wrote:
> On Mon, 2019-03-11 at 14:31 +0100, Paolo Bonzini wrote:
>> On 09/03/19 03:31, Xiaoyao Li wrote:
>>> Hi, Paolo,
>>>
>>> Do you have any comments on this patch?
>>>
>>> We are preparing v5 patches for split lock detection, if you have any
>>> comments
>>> about this one, please let me know.
>>
>> No, my only comment is that it should be placed _before_ the other two
>> for bisectability. I think I have already sent that small remark.
>>
>> Thanks,
>>
>> Paolo
>
> I cannot find the small remark you sent before; maybe I missed something.
> But I'm confused about why it should be placed _before_ the other two, since
> this patch just uses the vmx->core_capability defined in the previous patch.
Because otherwise the guest can see core_capability != 0 and will #GP
when trying to use split lock detection.
But you are right, this patch must come last. Instead,
kvm_get_core_capability() should always return 0 up to and including the
previous patch; then this patch adds the rdmsr and boot_cpu_has() tests.
Paolo
>>>> + if (!(vmx->core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
>>>> + return 1;
>>>> +
>>>> + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
>>>> + return 1;
>>>> + vmx->msr_test_ctl = data;
>>>> + break;
>>>> case MSR_EFER:
>>>> ret = kvm_set_msr_common(vcpu, msr_info);
>>>> break;
>>>> @@ -4108,6 +4122,9 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
>>>>
>>>> vmx->arch_capabilities = kvm_get_arch_capabilities();
>>>>
>>>> + /* disable AC split lock by default */
>>>> + vmx->msr_test_ctl = 0;
>>>> +
>>>> vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
>>>>
>>>> /* 22.2.1, 20.8.1 */
>>>> @@ -4145,6 +4162,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>>>
>>>> vmx->rmode.vm86_active = 0;
>>>> vmx->spec_ctrl = 0;
>>>> + vmx->msr_test_ctl = 0;
>>>>
>>>> vcpu->arch.microcode_version = 0x100000000ULL;
>>>> vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
>>>> @@ -6344,6 +6362,21 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>>>> msrs[i].host, false);
>>>> }
>>>>
>>>> +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
>>>> +{
>>>> + u64 host_msr_test_ctl;
>>>> +
>>>> + if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
>>>> + return;
>>>> +
>>>> + rdmsrl(MSR_TEST_CTL, host_msr_test_ctl);
>>>> + if (host_msr_test_ctl == vmx->msr_test_ctl)
>>>> + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
>>>> + else
>>>> + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
>>>> + host_msr_test_ctl, false);
>>>> +}
>>>> +
>>>> static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
>>>> {
>>>> vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
>>>> @@ -6585,6 +6618,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>>
>>>> atomic_switch_perf_msrs(vmx);
>>>>
>>>> + atomic_switch_msr_test_ctl(vmx);
>>>> +
>>>> vmx_update_hv_timer(vcpu);
>>>>
>>>> /*
>>>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>>>> index cc22379991f3..e8831609c6c3 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.h
>>>> +++ b/arch/x86/kvm/vmx/vmx.h
>>>> @@ -191,6 +191,7 @@ struct vcpu_vmx {
>>>> u64 msr_guest_kernel_gs_base;
>>>> #endif
>>>>
>>>> + u64 msr_test_ctl;
>>>> u64 core_capability;
>>>> u64 arch_capabilities;
>>>> u64 spec_ctrl;
>>
>>
>
On Mon, 2019-03-11 at 16:21 +0100, Paolo Bonzini wrote:
> On 11/03/19 16:10, Xiaoyao Li wrote:
> > On Mon, 2019-03-11 at 14:31 +0100, Paolo Bonzini wrote:
> > > On 09/03/19 03:31, Xiaoyao Li wrote:
> > > > Hi, Paolo,
> > > >
> > > > Do you have any comments on this patch?
> > > >
> > > > We are preparing v5 patches for split lock detection; if you have any
> > > > comments about this one, please let me know.
> > >
> > > No, my only comment is that it should be placed _before_ the other two
> > > for bisectability. I think I have already sent that small remark.
> > >
> > > Thanks,
> > >
> > > Paolo
> >
> > I cannot find the small remark you sent before. Maybe I missed something.
> > But I'm confused why it should be placed _before_ the other two. This patch
> > just uses the vmx->core_capability that is defined in the previous patch.
>
> Because otherwise the guest can see core_capability != 0 and will #GP
> when trying to use split lock detection.
>
> But you are right, this patch must be the last. Instead,
> kvm_get_core_capability() should always return 0 until the previous
> patch. Then in this patch you add the rdmsr and boot_cpu_has() tests.
>
> Paolo
Hi, Paolo,

Thanks a lot for your explanation; it helps me better understand the
bisectability of a patchset. I really appreciate your help, and I love
the community: I can learn a lot here.

I will fix the bisectability issue following your comments.
> > > > > + if (!(vmx->core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> > > > > + return 1;
> > > > > +
> > > > > + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
> > > > > + return 1;
> > > > > + vmx->msr_test_ctl = data;
> > > > > + break;
> > > > > case MSR_EFER:
> > > > > ret = kvm_set_msr_common(vcpu, msr_info);
> > > > > break;
> > > > > @@ -4108,6 +4122,9 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> > > > >
> > > > > vmx->arch_capabilities = kvm_get_arch_capabilities();
> > > > >
> > > > > + /* disable AC split lock by default */
> > > > > + vmx->msr_test_ctl = 0;
> > > > > +
> > > > > vm_exit_controls_init(vmx, vmx_vmexit_ctrl());
> > > > >
> > > > > /* 22.2.1, 20.8.1 */
> > > > > @@ -4145,6 +4162,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > > > >
> > > > > vmx->rmode.vm86_active = 0;
> > > > > vmx->spec_ctrl = 0;
> > > > > + vmx->msr_test_ctl = 0;
> > > > >
> > > > > vcpu->arch.microcode_version = 0x100000000ULL;
> > > > > vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> > > > > @@ -6344,6 +6362,21 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> > > > > msrs[i].host, false);
> > > > > }
> > > > >
> > > > > +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> > > > > +{
> > > > > + u64 host_msr_test_ctl;
> > > > > +
> > > > > + if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > > > > + return;
> > > > > +
> > > > > + rdmsrl(MSR_TEST_CTL, host_msr_test_ctl);
> > > > > + if (host_msr_test_ctl == vmx->msr_test_ctl)
> > > > > + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> > > > > + else
> > > > > + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> > > > > + host_msr_test_ctl, false);
> > > > > +}
> > > > > +
> > > > > static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
> > > > > {
> > > > > vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
> > > > > @@ -6585,6 +6618,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> > > > >
> > > > > atomic_switch_perf_msrs(vmx);
> > > > >
> > > > > + atomic_switch_msr_test_ctl(vmx);
> > > > > +
> > > > > vmx_update_hv_timer(vcpu);
> > > > >
> > > > > /*
> > > > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > > > > index cc22379991f3..e8831609c6c3 100644
> > > > > --- a/arch/x86/kvm/vmx/vmx.h
> > > > > +++ b/arch/x86/kvm/vmx/vmx.h
> > > > > @@ -191,6 +191,7 @@ struct vcpu_vmx {
> > > > > u64 msr_guest_kernel_gs_base;
> > > > > #endif
> > > > >
> > > > > + u64 msr_test_ctl;
> > > > > u64 core_capability;
> > > > > u64 arch_capabilities;
> > > > > u64 spec_ctrl;
> > >
> > >
>
>