Hi geniuses,
Please help review the new version of Arch LBR enabling patch set.
The Architectural Last Branch Records (LBRs) feature is published
in the 319433-040 release of the Intel Architecture Instruction
Set Extensions and Future Features Programming Reference [0].
The main advantages for Arch LBR users are [1]:
- Faster context switching due to XSAVES support and faster reset of
LBR MSRs via the new DEPTH MSR
- Faster LBR read for a non-PEBS event due to XSAVES support, which
lowers the overhead of the NMI handler.
- The Linux kernel can support the LBR features without knowing the model
number of the current CPU. (A guest-side usage sketch follows below.)
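For context, the records enabled by this series are consumed in the guest
through the generic perf branch-stack sampling interface (what 'perf record -b'
uses). Below is a minimal, hypothetical guest-side sketch, not part of this
series; the event choice, sample period and privilege filters are arbitrary
assumptions:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 100000;
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY | PERF_SAMPLE_BRANCH_USER;
	attr.exclude_kernel = 1;

	/* pid = 0 (self), cpu = -1 (any), group_fd = -1, flags = 0 */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	/* ... mmap the ring buffer and read PERF_RECORD_SAMPLE entries ... */
	close(fd);
	return 0;
}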
It's based on the kvm/queue tree plus two commits from kvm/intel tree:
- 'fea4ab260645 ("KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS")'
- '0ccd14126cb2 ("KVM: x86: Report XSS as an MSR to be saved if there are supported features")'
Please check more details in each commit and feel free to comment.
[0] https://software.intel.com/content/www/us/en/develop/download/
intel-architecture-instruction-set-extensions-and-future-features-programming-reference.html
[1] https://lore.kernel.org/lkml/[email protected]/
---
v3->v4 Changelog:
- Add one more host patch to reuse ARCH_LBR_CTL_MASK;
- Add reserve_lbr_buffers() instead of using GFP_ATOMIC;
- Fix a bug in arch_lbr_depth_is_valid();
- Add LBR_CTL_EN to unify DEBUGCTLMSR_LBR and ARCH_LBR_CTL_LBREN;
- Add vmx->host_lbrctlmsr to save/restore host values;
- Add KVM_SUPPORTED_XSS to refactor supported_xss;
- Clear ARCH_LBR and its XSS bit if it's not supported;
- Add negative testing to the related kvm-unit-tests;
- Refine code and commit messages;
Previous:
https://lore.kernel.org/kvm/[email protected]/
Like Xu (11):
perf/x86/intel: Fix the comment about guest LBR support on KVM
perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR
perf/x86/lbr: Move cpuc->lbr_xsave allocation out of sleeping region
perf/x86: Move ARCH_LBR_CTL_MASK definition to include/asm/msr-index.h
KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL emulation for Arch LBR
KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field
KVM: x86: Expose Architectural LBR CPUID leaf
KVM: x86: Refine the matching and clearing logic for supported_xss
KVM: x86: Add XSAVE Support for Architectural LBRs
arch/x86/events/core.c | 8 ++-
arch/x86/events/intel/bts.c | 2 +-
arch/x86/events/intel/core.c | 6 +-
arch/x86/events/intel/lbr.c | 28 +++++----
arch/x86/events/perf_event.h | 8 ++-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 4 ++
arch/x86/kvm/cpuid.c | 25 +++++++-
arch/x86/kvm/vmx/capabilities.h | 25 +++++---
arch/x86/kvm/vmx/pmu_intel.c | 103 ++++++++++++++++++++++++++++---
arch/x86/kvm/vmx/vmx.c | 50 +++++++++++++--
arch/x86/kvm/vmx/vmx.h | 4 ++
arch/x86/kvm/x86.c | 6 +-
13 files changed, 227 insertions(+), 43 deletions(-)
--
2.29.2
Starting from v5.12, KVM reports guest LBR and extra_regs support
when the host has relevant support. Just delete this part of the
comment and fix a typo.
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/events/intel/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d4569bfa83e3..7bb96ac87615 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5565,8 +5565,7 @@ __init int intel_pmu_init(void)
/*
* Access LBR MSR may cause #GP under certain circumstances.
- * E.g. KVM doesn't support LBR MSR
- * Check all LBT MSR here.
+ * Check all LBR MSR here.
* Disable LBR access if any LBR MSRs can not be accessed.
*/
if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
--
2.29.2
If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional
might_sleep_if() deep in kmem_cache_alloc() will generate the following
trace, and potentially cause a deadlock when another LBR event is added:
[ 243.115549] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[ 243.117576] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 839, name: perf
[ 243.119326] INFO: lockdep is turned off.
[ 243.120249] irq event stamp: 0
[ 243.120967] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 243.122415] hardirqs last disabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[ 243.124302] softirqs last enabled at (0): [<ffffffff810d9bf5>] copy_process+0xa45/0x1dc0
[ 243.126255] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 243.128119] CPU: 0 PID: 839 Comm: perf Not tainted 5.11.0-rc4-guest+ #8
[ 243.129654] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 243.131520] Call Trace:
[ 243.132112] dump_stack+0x8d/0xb5
[ 243.132896] ___might_sleep.cold.106+0xb3/0xc3
[ 243.133984] slab_pre_alloc_hook.constprop.85+0x96/0xd0
[ 243.135208] ? intel_pmu_lbr_add+0x152/0x170
[ 243.136207] kmem_cache_alloc+0x36/0x250
[ 243.137126] intel_pmu_lbr_add+0x152/0x170
[ 243.138088] x86_pmu_add+0x83/0xd0
[ 243.138889] ? lock_acquire+0x158/0x350
[ 243.139791] ? lock_acquire+0x158/0x350
[ 243.140694] ? lock_acquire+0x158/0x350
[ 243.141625] ? lock_acquired+0x1e3/0x360
[ 243.142544] ? lock_release+0x1bf/0x340
[ 243.143726] ? trace_hardirqs_on+0x1a/0xd0
[ 243.144823] ? lock_acquired+0x1e3/0x360
[ 243.145742] ? lock_release+0x1bf/0x340
[ 243.147107] ? __slab_free+0x49/0x540
[ 243.147966] ? trace_hardirqs_on+0x1a/0xd0
[ 243.148924] event_sched_in.isra.129+0xf8/0x2a0
[ 243.149989] merge_sched_in+0x261/0x3e0
[ 243.150889] ? trace_hardirqs_on+0x1a/0xd0
[ 243.151869] visit_groups_merge.constprop.135+0x130/0x4a0
[ 243.153122] ? sched_clock_cpu+0xc/0xb0
[ 243.154023] ctx_sched_in+0x101/0x210
[ 243.154884] ctx_resched+0x6f/0xc0
[ 243.155686] perf_event_exec+0x21e/0x2e0
[ 243.156641] begin_new_exec+0x5e5/0xbd0
[ 243.157540] load_elf_binary+0x6af/0x1770
[ 243.158478] ? __kernel_read+0x19d/0x2b0
[ 243.159977] ? lock_acquire+0x158/0x350
[ 243.160876] ? __kernel_read+0x19d/0x2b0
[ 243.161796] bprm_execve+0x3c8/0x840
[ 243.162638] do_execveat_common.isra.38+0x1a5/0x1c0
[ 243.163776] __x64_sys_execve+0x32/0x40
[ 243.164676] do_syscall_64+0x33/0x40
[ 243.165514] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 243.166746] RIP: 0033:0x7f6180a26feb
[ 243.167590] Code: Unable to access opcode bytes at RIP 0x7f6180a26fc1.
[ 243.169097] RSP: 002b:00007ffc6558ce18 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[ 243.170844] RAX: ffffffffffffffda RBX: 00007ffc65592d30 RCX: 00007f6180a26feb
[ 243.172514] RDX: 000055657f408dc0 RSI: 00007ffc65592410 RDI: 00007ffc65592d30
[ 243.174162] RBP: 00007ffc6558ce80 R08: 00007ffc6558cde0 R09: 0000000000000000
[ 243.176042] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc65592410
[ 243.177696] R13: 000055657f408dc0 R14: 0000000000000001 R15: 00007ffc65592410
One solution is to use GFP_ATOMIC, but that would make the code less
reliable under memory pressure. Let's move the memory allocation out of
the sleeping region and put it into x86_reserve_hardware().
The disadvantage of this fix is that the cpuc->lbr_xsave memory
will be allocated for each CPU, like the legacy ds_buffer.
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
Suggested-by: Kan Liang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/events/core.c | 8 +++++---
arch/x86/events/intel/bts.c | 2 +-
arch/x86/events/intel/lbr.c | 22 ++++++++++++++++------
arch/x86/events/perf_event.h | 8 +++++++-
4 files changed, 29 insertions(+), 11 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e37de298a495..b55f43481272 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -365,7 +365,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
return x86_pmu_extra_regs(val, event);
}
-int x86_reserve_hardware(void)
+int x86_reserve_hardware(struct perf_event *event)
{
int err = 0;
@@ -374,8 +374,10 @@ int x86_reserve_hardware(void)
if (atomic_read(&pmc_refcount) == 0) {
if (!reserve_pmc_hardware())
err = -EBUSY;
- else
+ else {
reserve_ds_buffers();
+ reserve_lbr_buffers(event);
+ }
}
if (!err)
atomic_inc(&pmc_refcount);
@@ -626,7 +628,7 @@ static int __x86_pmu_event_init(struct perf_event *event)
if (!x86_pmu_initialized())
return -ENODEV;
- err = x86_reserve_hardware();
+ err = x86_reserve_hardware(event);
if (err)
return err;
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 731dd8d0dbb1..057bb2f761a9 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -564,7 +564,7 @@ static int bts_event_init(struct perf_event *event)
if (x86_add_exclusive(x86_lbr_exclusive_bts))
return -EBUSY;
- ret = x86_reserve_hardware();
+ ret = x86_reserve_hardware(event);
if (ret) {
x86_del_exclusive(x86_lbr_exclusive_bts);
return ret;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 355ea70f1879..237876733e12 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -658,7 +658,6 @@ static inline bool branch_user_callstack(unsigned br_sel)
void intel_pmu_lbr_add(struct perf_event *event)
{
- struct kmem_cache *kmem_cache = event->pmu->task_ctx_cache;
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
if (!x86_pmu.lbr_nr)
@@ -696,11 +695,6 @@ void intel_pmu_lbr_add(struct perf_event *event)
perf_sched_cb_inc(event->ctx->pmu);
if (!cpuc->lbr_users++ && !event->total_time_running)
intel_pmu_lbr_reset();
-
- if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
- kmem_cache && !cpuc->lbr_xsave &&
- (cpuc->lbr_users != cpuc->lbr_pebs_users))
- cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
}
void release_lbr_buffers(void)
@@ -721,6 +715,22 @@ void release_lbr_buffers(void)
}
}
+void reserve_lbr_buffers(struct perf_event *event)
+{
+ struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+ struct cpu_hw_events *cpuc;
+ int cpu;
+
+ if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+ return;
+
+ for_each_possible_cpu(cpu) {
+ cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+ if (kmem_cache && !cpuc->lbr_xsave && !event->attr.precise_ip)
+ cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
+ }
+}
+
void intel_pmu_lbr_del(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7895cf4c59a7..7f2ff49c0bda 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -960,7 +960,7 @@ int x86_add_exclusive(unsigned int what);
void x86_del_exclusive(unsigned int what);
-int x86_reserve_hardware(void);
+int x86_reserve_hardware(struct perf_event *event);
void x86_release_hardware(void);
@@ -1122,6 +1122,8 @@ void reserve_ds_buffers(void);
void release_lbr_buffers(void);
+void reserve_lbr_buffers(struct perf_event *event);
+
extern struct event_constraint bts_constraint;
extern struct event_constraint vlbr_constraint;
@@ -1267,6 +1269,10 @@ static inline void release_lbr_buffers(void)
{
}
+static inline void reserve_lbr_buffers(struct perf_event *event)
+{
+}
+
static inline int intel_pmu_init(void)
{
return 0;
--
2.29.2
Architectural LBR does not have MSR_LBR_TOS (0x000001c9). KVM will
generate a #GP for this MSR access, thereby preventing the initialization
of the guest LBR.
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/events/intel/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7bb96ac87615..0338e354826d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5568,7 +5568,8 @@ __init int intel_pmu_init(void)
* Check all LBR MSR here.
* Disable LBR access if any LBR MSRs can not be accessed.
*/
- if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+ if (x86_pmu.lbr_nr && !boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
+ !check_msr(x86_pmu.lbr_tos, 0x3UL))
x86_pmu.lbr_nr = 0;
for (i = 0; i < x86_pmu.lbr_nr; i++) {
if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
--
2.29.2
The number of Arch LBR entries available for recording operations
is dictated by the value in MSR_ARCH_LBR_DEPTH.DEPTH. The supported
LBR depth values can be found in CPUID.(EAX=01CH, ECX=0):EAX[7:0];
for each bit "n" set in this field, the MSR_ARCH_LBR_DEPTH.DEPTH
value of "8*(n+1)" is supported (e.g. bit 7 set means a depth of 64).
On a guest write to MSR_ARCH_LBR_DEPTH, all LBR entries are reset to 0.
KVM emulates the reset behavior by introducing lbr_desc->arch_lbr_reset.
KVM writes the guest-requested value to the native ARCH_LBR_DEPTH MSR
(this is safe because the two values will be the same) when the Arch LBR
records MSRs are passed through to the guest.
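As an illustration only (not part of this patch), a guest with the msr
driver loaded could poke the emulated MSR from user space roughly as
below; the device path, required privileges and MSR index macro are
assumptions of the sketch:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_ARCH_LBR_DEPTH	0x14cf	/* IA32_LBR_DEPTH */

int main(void)
{
	uint64_t depth;
	int fd = open("/dev/cpu/0/msr", O_RDWR);

	if (fd < 0 || pread(fd, &depth, sizeof(depth), MSR_ARCH_LBR_DEPTH) != sizeof(depth)) {
		perror("rdmsr");
		return 1;
	}
	printf("IA32_LBR_DEPTH = %llu\n", (unsigned long long)depth);

	/* Writing a supported depth (here the same value) resets all LBR entries to 0. */
	if (pwrite(fd, &depth, sizeof(depth), MSR_ARCH_LBR_DEPTH) != sizeof(depth))
		perror("wrmsr");

	close(fd);
	return 0;
}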
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 43 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
2 files changed, 46 insertions(+)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9efc1a6b8693..d9c9cb6c9a4b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -220,6 +220,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
ret = pmu->version > 1;
break;
+ case MSR_ARCH_LBR_DEPTH:
+ ret = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
+ break;
default:
ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -250,6 +253,7 @@ static inline void intel_pmu_release_guest_lbr_event(struct kvm_vcpu *vcpu)
if (lbr_desc->event) {
perf_event_release_kernel(lbr_desc->event);
lbr_desc->event = NULL;
+ lbr_desc->arch_lbr_reset = false;
vcpu_to_pmu(vcpu)->event_count--;
}
}
@@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
return true;
}
+/*
+ * Check if the requested depth value is supported
+ * based on bits [7:0] of the guest CPUID.1CH.EAX.
+ */
+static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
+ if (best && depth && (depth < 65) && !(depth & 7))
+ return best->eax & BIT_ULL(depth / 8 - 1);
+
+ return false;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
u32 msr = msr_info->index;
switch (msr) {
@@ -367,6 +387,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
msr_info->data = pmu->global_ovf_ctrl;
return 0;
+ case MSR_ARCH_LBR_DEPTH:
+ msr_info->data = lbr_desc->records.nr;
+ return 0;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -393,6 +416,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
u32 msr = msr_info->index;
u64 data = msr_info->data;
@@ -427,6 +451,12 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 0;
}
break;
+ case MSR_ARCH_LBR_DEPTH:
+ if (!arch_lbr_depth_is_valid(vcpu, data))
+ return 1;
+ lbr_desc->records.nr = data;
+ lbr_desc->arch_lbr_reset = true;
+ return 0;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -566,6 +596,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
lbr_desc->msr_passthrough = false;
+ lbr_desc->arch_lbr_reset = false;
}
static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -623,6 +654,15 @@ static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
}
+static void intel_pmu_arch_lbr_reset(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ /* On a software write to IA32_LBR_DEPTH, all LBR entries are reset to 0. */
+ wrmsrl(MSR_ARCH_LBR_DEPTH, lbr_desc->records.nr);
+ lbr_desc->arch_lbr_reset = false;
+}
+
static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
{
struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
@@ -654,6 +694,9 @@ static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ if (unlikely(lbr_desc->arch_lbr_reset))
+ intel_pmu_arch_lbr_reset(vcpu);
+
if (lbr_desc->msr_passthrough)
return;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0fb3236b0283..032b86d0d4ba 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -116,6 +116,9 @@ struct lbr_desc {
/* True if LBRs are marked as not intercepted in the MSR bitmap */
bool msr_passthrough;
+
+ /* Reset all LBR entries on a guest write to MSR_ARCH_LBR_DEPTH */
+ bool arch_lbr_reset;
};
/*
--
2.29.2
Arch LBRs are enabled by setting MSR_ARCH_LBR_CTL.LBREn to 1. A new guest
state field named "Guest IA32_LBR_CTL" is added to enhance guest LBR usage.
When guest Arch LBR is enabled, a guest LBR event will be created just as
it is for the model-specific LBR.
On processors that support Arch LBR, MSR_IA32_DEBUGCTLMSR[bit 0] has no
meaning. It can be written to 0 or 1, but reads will always return 0.
Like IA32_DEBUGCTL, the IA32_ARCH_LBR_CTL MSR is also preserved on INIT.
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/include/asm/vmx.h | 2 ++
arch/x86/kvm/vmx/pmu_intel.c | 31 ++++++++++++++++++++++++++-----
arch/x86/kvm/vmx/vmx.c | 9 +++++++++
3 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 358707f60d99..6826fd0e8d1a 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -245,6 +245,8 @@ enum vmcs_field {
GUEST_BNDCFGS_HIGH = 0x00002813,
GUEST_IA32_RTIT_CTL = 0x00002814,
GUEST_IA32_RTIT_CTL_HIGH = 0x00002815,
+ GUEST_IA32_LBR_CTL = 0x00002816,
+ GUEST_IA32_LBR_CTL_HIGH = 0x00002817,
HOST_IA32_PAT = 0x00002c00,
HOST_IA32_PAT_HIGH = 0x00002c01,
HOST_IA32_EFER = 0x00002c02,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index d9c9cb6c9a4b..15490d31b828 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -19,6 +19,12 @@
#include "pmu.h"
#define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
+/*
+ * For both Arch LBR and legacy LBR, when bit 0 (LBREn) of the
+ * corresponding control MSR is set to 1, LBR recording is enabled.
+ */
+#define LBR_CTL_EN BIT(0)
+#define KVM_ARCH_LBR_CTL_MASK (ARCH_LBR_CTL_MASK | LBR_CTL_EN)
static struct kvm_event_hw_type_mapping intel_arch_events[] = {
/* Index must match CPUID 0x0A.EBX bit vector */
@@ -221,6 +227,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
ret = pmu->version > 1;
break;
case MSR_ARCH_LBR_DEPTH:
+ case MSR_ARCH_LBR_CTL:
ret = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
break;
default:
@@ -390,6 +397,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_ARCH_LBR_DEPTH:
msr_info->data = lbr_desc->records.nr;
return 0;
+ case MSR_ARCH_LBR_CTL:
+ msr_info->data = vmcs_read64(GUEST_IA32_LBR_CTL);
+ return 0;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -457,6 +467,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
lbr_desc->records.nr = data;
lbr_desc->arch_lbr_reset = true;
return 0;
+ case MSR_ARCH_LBR_CTL:
+ if (data & ~KVM_ARCH_LBR_CTL_MASK)
+ break;
+ vmcs_write64(GUEST_IA32_LBR_CTL, data);
+ if (intel_pmu_lbr_is_enabled(vcpu) && !lbr_desc->event &&
+ (data & ARCH_LBR_CTL_LBREN))
+ intel_pmu_create_guest_lbr_event(vcpu);
+ return 0;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -635,12 +653,15 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
*/
static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
{
- u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ u32 lbr_ctl_field = GUEST_IA32_DEBUGCTL;
- if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
- data &= ~DEBUGCTLMSR_LBR;
- vmcs_write64(GUEST_IA32_DEBUGCTL, data);
- }
+ if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI))
+ return;
+
+ if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+ lbr_ctl_field = GUEST_IA32_LBR_CTL;
+
+ vmcs_write64(lbr_ctl_field, vmcs_read64(lbr_ctl_field) & ~LBR_CTL_EN);
}
static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef826594365f..38007daba935 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2054,6 +2054,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
VM_EXIT_SAVE_DEBUG_CONTROLS)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;
+ /*
+ * For Arch LBR, IA32_DEBUGCTL[bit 0] has no meaning.
+ * It can be written to 0 or 1, but reads will always return 0.
+ */
+ if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+ data &= ~DEBUGCTLMSR_LBR;
+
vmcs_write64(GUEST_IA32_DEBUGCTL, data);
if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
(data & DEBUGCTLMSR_LBR))
@@ -4474,6 +4481,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmcs_writel(GUEST_SYSENTER_ESP, 0);
vmcs_writel(GUEST_SYSENTER_EIP, 0);
vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
+ if (cpu_has_vmx_arch_lbr())
+ vmcs_write64(GUEST_IA32_LBR_CTL, 0);
}
kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
--
2.29.2
New VMX control bits for Arch LBR are added. When bit 21 in vmentry_ctrl
is set, VM entry will write the value from the "Guest IA32_LBR_CTL" guest
state field to IA32_LBR_CTL. When bit 26 in vmexit_ctrl is set, VM exit
will clear IA32_LBR_CTL after the value has been saved to the "Guest
IA32_LBR_CTL" guest state field. The host value is saved before VM entry
and restored after VM exit, like the legacy host_debugctlmsr.
To enable guest Arch LBR, KVM should set both the "Load Guest IA32_LBR_CTL"
entry control and the "Clear IA32_LBR_CTL" exit control bits. If these two
conditions cannot be met, KVM will clear the LBR_FMT bits and will not
expose the Arch LBR feature.
If Arch LBR is exposed by KVM, the guest should set both the ARCH_LBR CPUID
bit and the same LBR_FMT value as the host via MSR_IA32_PERF_CAPABILITIES
to enable guest Arch LBR.
KVM bypasses the host/guest x86 CPU model check, and the record MSRs can
still be passed through to the guest as usual and work like a model-specific
LBR. KVM is consistent with the host and does not support the LER entry.
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/include/asm/vmx.h | 2 ++
arch/x86/kvm/vmx/capabilities.h | 25 +++++++++++++++++--------
arch/x86/kvm/vmx/pmu_intel.c | 27 ++++++++++++++++++++++-----
arch/x86/kvm/vmx/vmx.c | 32 ++++++++++++++++++++++++++++++--
arch/x86/kvm/vmx/vmx.h | 1 +
5 files changed, 72 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 6826fd0e8d1a..973bf16720c2 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -95,6 +95,7 @@
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
#define VM_EXIT_PT_CONCEAL_PIP 0x01000000
#define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000
+#define VM_EXIT_CLEAR_IA32_LBR_CTL 0x04000000
#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
@@ -108,6 +109,7 @@
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
#define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
#define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000
+#define VM_ENTRY_LOAD_IA32_LBR_CTL 0x00200000
#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d1d77985e889..73fceb534c7c 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -378,20 +378,29 @@ static inline bool vmx_pt_mode_is_host_guest(void)
return pt_mode == PT_MODE_HOST_GUEST;
}
-static inline u64 vmx_get_perf_capabilities(void)
+static inline bool cpu_has_vmx_arch_lbr(void)
{
- u64 perf_cap = 0;
-
- if (boot_cpu_has(X86_FEATURE_PDCM))
- rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
-
- perf_cap &= PMU_CAP_LBR_FMT;
+ return (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_IA32_LBR_CTL) &&
+ (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_LBR_CTL);
+}
+static inline u64 vmx_get_perf_capabilities(void)
+{
/*
* Since counters are virtualized, KVM would support full
* width counting unconditionally, even if the host lacks it.
*/
- return PMU_CAP_FW_WRITES | perf_cap;
+ u64 perf_cap = PMU_CAP_FW_WRITES;
+ u64 host_perf_cap = 0;
+
+ if (boot_cpu_has(X86_FEATURE_PDCM))
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
+
+ perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+ if (boot_cpu_has(X86_FEATURE_ARCH_LBR) && !cpu_has_vmx_arch_lbr())
+ perf_cap &= ~PMU_CAP_LBR_FMT;
+
+ return perf_cap;
}
static inline u64 vmx_supported_debugctl(void)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 15490d31b828..9199d3974d57 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -181,12 +181,16 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
{
+ if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
+ return guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
+
/*
* As a first step, a guest could only enable LBR feature if its
* cpu model is the same as the host because the LBR registers
* would be pass-through to the guest and they're model specific.
*/
- return boot_cpu_data.x86_model == guest_cpuid_model(vcpu);
+ return !boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
+ boot_cpu_data.x86_model == guest_cpuid_model(vcpu);
}
bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
@@ -204,8 +208,11 @@ static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
if (!intel_pmu_lbr_is_enabled(vcpu))
return ret;
- ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
- (index >= records->from && index < records->from + records->nr) ||
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+ ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS);
+
+ if (!ret)
+ ret = (index >= records->from && index < records->from + records->nr) ||
(index >= records->to && index < records->to + records->nr);
if (!ret && records->info)
@@ -696,6 +703,9 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
vmx_set_intercept_for_msr(vcpu, lbr->info + i, MSR_TYPE_RW, set);
}
+ if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+ return;
+
vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, set);
vmx_set_intercept_for_msr(vcpu, MSR_LBR_TOS, MSR_TYPE_RW, set);
}
@@ -739,10 +749,13 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ bool lbr_enable = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) ?
+ (vmcs_read64(GUEST_IA32_LBR_CTL) & ARCH_LBR_CTL_LBREN) :
+ (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
if (!lbr_desc->event) {
vmx_disable_lbr_msrs_passthrough(vcpu);
- if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
+ if (lbr_enable)
goto warn;
if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
goto warn;
@@ -765,7 +778,11 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
{
- if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+ bool lbr_enable = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) ?
+ (vmcs_read64(GUEST_IA32_LBR_CTL) & ARCH_LBR_CTL_LBREN) :
+ (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
+
+ if (!lbr_enable)
intel_pmu_release_guest_lbr_event(vcpu);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 38007daba935..43e73ea12ba6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -684,6 +684,9 @@ static bool is_valid_passthrough_msr(u32 msr)
case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
+ case MSR_ARCH_LBR_FROM_0 ... MSR_ARCH_LBR_FROM_0 + 31:
+ case MSR_ARCH_LBR_TO_0 ... MSR_ARCH_LBR_TO_0 + 31:
+ case MSR_ARCH_LBR_INFO_0 ... MSR_ARCH_LBR_INFO_0 + 31:
/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
return true;
}
@@ -1416,6 +1419,26 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
decache_tsc_multiplier(vmx);
}
+static inline unsigned long get_lbrctlmsr(void)
+{
+ unsigned long lbrctlmsr = 0;
+
+ if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+ return 0;
+
+ rdmsrl(MSR_ARCH_LBR_CTL, lbrctlmsr);
+
+ return lbrctlmsr;
+}
+
+static inline void update_lbrctlmsr(unsigned long lbrctlmsr)
+{
+ if (!static_cpu_has(X86_FEATURE_ARCH_LBR))
+ return;
+
+ wrmsrl(MSR_ARCH_LBR_CTL, lbrctlmsr);
+}
+
/*
* Switches to specified vcpu, until a matching vcpu_put(), but assumes
* vcpu mutex is already taken.
@@ -1429,6 +1452,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmx_vcpu_pi_load(vcpu, cpu);
vmx->host_debugctlmsr = get_debugctlmsr();
+ vmx->host_lbrctlmsr = get_lbrctlmsr();
}
static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -2547,7 +2571,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
VM_EXIT_LOAD_IA32_EFER |
VM_EXIT_CLEAR_BNDCFGS |
VM_EXIT_PT_CONCEAL_PIP |
- VM_EXIT_CLEAR_IA32_RTIT_CTL;
+ VM_EXIT_CLEAR_IA32_RTIT_CTL |
+ VM_EXIT_CLEAR_IA32_LBR_CTL;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
&_vmexit_control) < 0)
return -EIO;
@@ -2571,7 +2596,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
VM_ENTRY_LOAD_IA32_EFER |
VM_ENTRY_LOAD_BNDCFGS |
VM_ENTRY_PT_CONCEAL_PIP |
- VM_ENTRY_LOAD_IA32_RTIT_CTL;
+ VM_ENTRY_LOAD_IA32_RTIT_CTL |
+ VM_ENTRY_LOAD_IA32_LBR_CTL;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
&_vmentry_control) < 0)
return -EIO;
@@ -6747,6 +6773,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
if (vmx->host_debugctlmsr)
update_debugctlmsr(vmx->host_debugctlmsr);
+ if (vmx->host_lbrctlmsr)
+ update_lbrctlmsr(vmx->host_lbrctlmsr);
#ifndef CONFIG_X86_64
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 032b86d0d4ba..82b30b700005 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -320,6 +320,7 @@ struct vcpu_vmx {
u64 current_tsc_ratio;
unsigned long host_debugctlmsr;
+ unsigned long host_lbrctlmsr;
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
--
2.29.2
Refine the code path that clears supported_xss: initialize supported_xss
with the KVM_SUPPORTED_XSS mask as a filter and update its value by
clearing bits (rather than setting them).
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 5 +++--
arch/x86/kvm/x86.c | 6 +++++-
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 03c0faf16a7d..14ed3251376f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7302,9 +7302,10 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_UMIP);
/* CPUID 0xD.1 */
- supported_xss = 0;
- if (!cpu_has_vmx_xsaves())
+ if (!cpu_has_vmx_xsaves()) {
kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
+ supported_xss = 0;
+ }
/* CPUID 0x80000001 */
if (!cpu_has_vmx_rdtscp())
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4bcf5b130e38..171605dcbd65 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -205,6 +205,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU)
+#define KVM_SUPPORTED_XSS 0
+
u64 __read_mostly host_efer;
EXPORT_SYMBOL_GPL(host_efer);
@@ -10450,8 +10452,10 @@ int kvm_arch_hardware_setup(void *opaque)
rdmsrl_safe(MSR_EFER, &host_efer);
- if (boot_cpu_has(X86_FEATURE_XSAVES))
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
rdmsrl(MSR_IA32_XSS, host_xss);
+ supported_xss = host_xss & KVM_SUPPORTED_XSS;
+ }
r = ops->hardware_setup();
if (r != 0)
--
2.29.2
This unit test is intended to exercise KVM's basic support for
Architectural LBRs, which is an architectural performance monitoring
unit (PMU) feature on Intel processors, including negative testing
on the MSR_ARCH_LBR_DEPTH values.
If the LBREn bit of MSR_ARCH_LBR_CTL is set to 1, the processor records
a running trace of the most recent branches taken by the guest in the
LBR entries for the guest to read.
Signed-off-by: Like Xu <[email protected]>
---
x86/pmu_lbr.c | 88 +++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 79 insertions(+), 9 deletions(-)
diff --git a/x86/pmu_lbr.c b/x86/pmu_lbr.c
index 3bd9e9f..8cde208 100644
--- a/x86/pmu_lbr.c
+++ b/x86/pmu_lbr.c
@@ -6,6 +6,7 @@
#define MAX_NUM_LBR_ENTRY 32
#define DEBUGCTLMSR_LBR (1UL << 0)
#define PMU_CAP_LBR_FMT 0x3f
+#define KVM_ARCH_LBR_CTL_MASK 0x7f000f
#define MSR_LBR_NHM_FROM 0x00000680
#define MSR_LBR_NHM_TO 0x000006c0
@@ -13,6 +14,10 @@
#define MSR_LBR_CORE_TO 0x00000060
#define MSR_LBR_TOS 0x000001c9
#define MSR_LBR_SELECT 0x000001c8
+#define MSR_ARCH_LBR_CTL 0x000014ce
+#define MSR_ARCH_LBR_DEPTH 0x000014cf
+#define MSR_ARCH_LBR_FROM_0 0x00001500
+#define MSR_ARCH_LBR_TO_0 0x00001600
volatile int count;
@@ -61,11 +66,26 @@ static bool test_init_lbr_from_exception(u64 index)
return test_for_exception(GP_VECTOR, init_lbr, &index);
}
+static void change_archlbr_depth(void *depth)
+{
+ wrmsr(MSR_ARCH_LBR_DEPTH, *(u64 *)depth);
+}
+
+static bool test_change_archlbr_depth_from_exception(u64 depth)
+{
+ return test_for_exception(GP_VECTOR, change_archlbr_depth, &depth);
+}
+
int main(int ac, char **av)
{
struct cpuid id = cpuid(10);
+ struct cpuid id_7 = cpuid(7);
+ struct cpuid id_1c;
u64 perf_cap;
int max, i;
+ bool arch_lbr = false;
+ u32 ctl_msr = MSR_IA32_DEBUGCTLMSR;
+ u64 ctl_value = DEBUGCTLMSR_LBR;
setup_vm();
perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
@@ -80,8 +100,19 @@ int main(int ac, char **av)
return report_summary();
}
+ if (id_7.d & (1UL << 19)) {
+ arch_lbr = true;
+ ctl_msr = MSR_ARCH_LBR_CTL;
+ /* DEPTH defaults to the maximum number of LBRs entries. */
+ max = rdmsr(MSR_ARCH_LBR_DEPTH) - 1;
+ ctl_value = KVM_ARCH_LBR_CTL_MASK;
+ }
+
printf("PMU version: %d\n", eax.split.version_id);
- printf("LBR version: %ld\n", perf_cap & PMU_CAP_LBR_FMT);
+ if (!arch_lbr)
+ printf("LBR version: %ld\n", perf_cap & PMU_CAP_LBR_FMT);
+ else
+ printf("Architectural LBR depth: %d\n", max + 1);
/* Look for LBR from and to MSRs */
lbr_from = MSR_LBR_CORE_FROM;
@@ -90,32 +121,71 @@ int main(int ac, char **av)
lbr_from = MSR_LBR_NHM_FROM;
lbr_to = MSR_LBR_NHM_TO;
}
+ if (test_init_lbr_from_exception(0)) {
+ lbr_from = MSR_ARCH_LBR_FROM_0;
+ lbr_to = MSR_ARCH_LBR_TO_0;
+ }
if (test_init_lbr_from_exception(0)) {
printf("LBR on this platform is not supported!\n");
return report_summary();
}
- wrmsr(MSR_LBR_SELECT, 0);
- wrmsr(MSR_LBR_TOS, 0);
- for (max = 0; max < MAX_NUM_LBR_ENTRY; max++) {
- if (test_init_lbr_from_exception(max))
- break;
+ if (arch_lbr) {
+ /*
+ * On processors that support Architectural LBRs,
+ * IA32_PERF_CAPABILITIES.LBR_FMT will have the value 03FH.
+ */
+ report(0x3f == (perf_cap & PMU_CAP_LBR_FMT), "The guest LBR_FMT value is good.");
}
+ /* Reset the guest LBR entries. */
+ if (arch_lbr) {
+ /* On a software write to IA32_LBR_DEPTH, all LBR entries are reset to 0.*/
+ wrmsr(MSR_ARCH_LBR_DEPTH, max + 1);
+ } else {
+ wrmsr(MSR_LBR_SELECT, 0);
+ wrmsr(MSR_LBR_TOS, 0);
+ for (max = 0; max < MAX_NUM_LBR_ENTRY; max++) {
+ if (test_init_lbr_from_exception(max))
+ break;
+ }
+ }
report(max > 0, "The number of guest LBR entries is good.");
+ /* Check the guest LBR entries are initialized. */
+ for (i = 0; i < max; ++i) {
+ if (rdmsr(lbr_to + i) || rdmsr(lbr_from + i))
+ break;
+ }
+ report(i == max, "The guest LBR initialized FROM_IP/TO_IP values are good.");
+
/* Do some branch instructions. */
- wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
+ wrmsr(ctl_msr, ctl_value);
lbr_test();
- wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
+ wrmsr(ctl_msr, 0);
- report(rdmsr(MSR_LBR_TOS) != 0, "The guest LBR MSR_LBR_TOS value is good.");
+ /* Check if the guest LBR has recorded some branches. */
+ if (!arch_lbr) {
+ report(rdmsr(MSR_LBR_TOS) != 0, "The guest LBR MSR_LBR_TOS value is good.");
+ }
for (i = 0; i < max; ++i) {
if (!rdmsr(lbr_to + i) || !rdmsr(lbr_from + i))
break;
}
report(i == max, "The guest LBR FROM_IP/TO_IP values are good.");
+ if (!arch_lbr)
+ return report_summary();
+
+ /* Negative testing on the LBR_DEPTH MSR values */
+ id_1c = cpuid(0x1c);
+ for (i = 0; i < 8; i++) {
+ if (id_1c.a & (1UL << i))
+ continue;
+ report(test_change_archlbr_depth_from_exception(8*(i+1)) == 1,
+ "Negative test: guest LBR depth %d is unsupported.", 8*(i+1));
+ }
+
return report_summary();
}
--
2.29.2
The ARCH_LBR_CTL_MASK definition will be reused for LBR emulation in KVM.
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/events/intel/lbr.c | 2 --
arch/x86/include/asm/msr-index.h | 1 +
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 237876733e12..f60339ff0c13 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -168,8 +168,6 @@ enum {
ARCH_LBR_RETURN |\
ARCH_LBR_OTHER_BRANCH)
-#define ARCH_LBR_CTL_MASK 0x7f000e
-
static void intel_pmu_lbr_filter(struct cpu_hw_events *cpuc);
static __always_inline bool is_lbr_call_stack_bit_set(u64 config)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..8f3375961efc 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -169,6 +169,7 @@
#define LBR_INFO_BR_TYPE (0xfull << LBR_INFO_BR_TYPE_OFFSET)
#define MSR_ARCH_LBR_CTL 0x000014ce
+#define ARCH_LBR_CTL_MASK 0x7f000e
#define ARCH_LBR_CTL_LBREN BIT(0)
#define ARCH_LBR_CTL_CPL_OFFSET 1
#define ARCH_LBR_CTL_CPL (0x3ull << ARCH_LBR_CTL_CPL_OFFSET)
--
2.29.2
If CPUID.(EAX=07H, ECX=0):EDX[19] is set to 1, then KVM supports Arch
LBRs and CPUID leaf 01CH indicates details of the Arch LBR capabilities.
Currently, KVM only supports the current host LBR depth for guests,
which is also the maximum supported depth on the host.
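For reference, user space could confirm that KVM now reports leaf 01CH
with a sketch like the one below; it only relies on the standard
KVM_GET_SUPPORTED_CPUID ioctl, and the entry count of 128 plus the
minimal error handling are assumptions of the sketch:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int nent = 128;
	struct kvm_cpuid2 *cpuid;
	int kvm = open("/dev/kvm", O_RDWR);

	cpuid = calloc(1, sizeof(*cpuid) + nent * sizeof(cpuid->entries[0]));
	if (kvm < 0 || !cpuid)
		return 1;
	cpuid->nent = nent;

	if (ioctl(kvm, KVM_GET_SUPPORTED_CPUID, cpuid) < 0) {
		perror("KVM_GET_SUPPORTED_CPUID");
		return 1;
	}

	for (unsigned int i = 0; i < cpuid->nent; i++) {
		struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

		if (e->function == 0x1c && e->index == 0)
			printf("CPUID.1CH: eax=%#x ebx=%#x ecx=%#x\n",
			       e->eax, e->ebx, e->ecx);
	}
	return 0;
}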
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/cpuid.c | 25 ++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 2 ++
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b4247f821277..4473324fe7be 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -450,7 +450,7 @@ void kvm_set_cpu_caps(void)
F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
- F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16)
+ F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) | F(ARCH_LBR)
);
/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
@@ -805,6 +805,29 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
goto out;
}
break;
+ /* Architectural LBR */
+ case 0x1c:
+ {
+ u64 lbr_depth_mask = entry->eax & 0xff;
+
+ if (!lbr_depth_mask || !kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR)) {
+ entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+ break;
+ }
+
+ /*
+ * KVM only exposes the maximum supported depth,
+ * which is also the fixed value used on the host.
+ *
+	 * KVM doesn't allow VMM user space to adjust depth
+ * per guest, because the guest LBR emulation depends
+ * on the implementation of the host LBR driver.
+ */
+ lbr_depth_mask = 1UL << (fls(lbr_depth_mask) - 1);
+ entry->eax &= ~0xff;
+ entry->eax |= lbr_depth_mask;
+ break;
+ }
case KVM_CPUID_SIGNATURE: {
static const char signature[12] = "KVMKVMKVM\0\0";
const u32 *sigptr = (const u32 *)signature;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 43e73ea12ba6..03c0faf16a7d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7295,6 +7295,8 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
if (vmx_pt_mode_is_host_guest())
kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
+ if (!cpu_has_vmx_arch_lbr())
+ kvm_cpu_cap_clear(X86_FEATURE_ARCH_LBR);
if (vmx_umip_emulated())
kvm_cpu_cap_set(X86_FEATURE_UMIP);
--
2.29.2
On processors whose XSAVE feature set supports XSAVES and XRSTORS,
the availability of support for Architectural LBR configuration state save
and restore can be determined from CPUID.(EAX=0DH, ECX=1):EDX:ECX[bit 15].
The detailed leaf for Arch LBRs is enumerated in CPUID.(EAX=0DH, ECX=0FH).
XSAVES provides a faster means than RDMSR for the guest to read all LBRs.
When guest IA32_XSS[bit 15] is set, the Arch LBR state can be saved using
XSAVES and restored by XRSTORS with the appropriate RFBM.
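For illustration, the enumeration described above can be probed from
user space with the compiler's CPUID helpers; a hedged sketch, assuming
GCC/Clang's <cpuid.h> and the sub-leaf layout quoted above:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=0DH, ECX=1): ECX/EDX enumerate the supported IA32_XSS bits. */
	if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx))
		return 1;

	if (ecx & (1u << 15)) {
		unsigned int size, off, flags, unused;

		/* Sub-leaf 0FH reports the Arch LBR state component details. */
		__get_cpuid_count(0xd, 0xf, &size, &off, &flags, &unused);
		printf("Arch LBR XSAVES state: %u bytes, supervisor=%u\n",
		       size, flags & 1);
	} else {
		printf("Arch LBR state is not XSAVES-managed on this CPU\n");
	}
	return 0;
}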
If KVM fails to pass through the LBR MSRs to the guest, the LBR MSRs
will be reset to prevent leakage of host records via XSAVES. In this
case, the guest results may be inaccurate, as with the legacy LBR.
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 4 +++-
arch/x86/kvm/x86.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9199d3974d57..7666292094ec 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -772,6 +772,8 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
return;
warn:
+ if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
+ wrmsrl(MSR_ARCH_LBR_DEPTH, lbr_desc->records.nr);
pr_warn_ratelimited("kvm: vcpu-%d: fail to passthrough LBR.\n",
vcpu->vcpu_id);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 14ed3251376f..659be0d708ac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7295,8 +7295,10 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
if (vmx_pt_mode_is_host_guest())
kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
- if (!cpu_has_vmx_arch_lbr())
+ if (!cpu_has_vmx_arch_lbr()) {
kvm_cpu_cap_clear(X86_FEATURE_ARCH_LBR);
+ supported_xss &= ~XFEATURE_MASK_LBR;
+ }
if (vmx_umip_emulated())
kvm_cpu_cap_set(X86_FEATURE_UMIP);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 171605dcbd65..2e0935795502 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -205,7 +205,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU)
-#define KVM_SUPPORTED_XSS 0
+#define KVM_SUPPORTED_XSS XFEATURE_MASK_LBR
u64 __read_mostly host_efer;
EXPORT_SYMBOL_GPL(host_efer);
--
2.29.2
Hi, do we have any comments on this patch set?
On 2021/3/14 23:52, Like Xu wrote:
> Hi geniuses,
>
> Please help review the new version of Arch LBR enabling patch set.
>
> [...]
Hi all, do we have any comments on this patch set?
On 2021/3/14 23:52, Like Xu wrote:
> Hi geniuses,
>
> Please help review the new version of Arch LBR enabling patch set.
>
> [...]
Hi Paolo,
Do you have any comments on the patches 06 - 11 ?
Thanks,
Like Xu
On 2021/4/6 11:19, Xu, Like wrote:
> Hi all, do we have any comments on this patch set?
>
> On 2021/3/14 23:52, Like Xu wrote:
>> Hi geniuses,
>>
>> Please help review the new version of Arch LBR enabling patch set.
>>
>> [...]