Last Branch Recording (LBR) is a performance monitoring unit (PMU) feature
on Intel CPUs that captures branch-related information. This patch series
enables this feature for KVM guests.
Here is a summary of the approach:
1) the LBR feature is enabled per guest via QEMU setting of
KVM_CAP_X86_GUEST_LBR;
2) when the guest has the LBR feature, the LBR stack is passed through to
the guest for direct accesses;
3) when the guest uses the LBR feature in user callstack mode, the
host saves/restores the LBR stack when the vCPU thread is
scheduled out/in.
Patches 1-5 implement 1) and 2) above, and patches 6-8 implement 3).
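As a reference for 1), the userspace side reduces to a VM-scoped
KVM_ENABLE_CAP ioctl; a minimal sketch (illustrative only, not taken from
the QEMU patches; error handling omitted):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Enable the per-VM guest LBR capability; vm_fd is the VM file
 * descriptor returned by KVM_CREATE_VM.
 */
static int enable_guest_lbr(int vm_fd)
{
        struct kvm_enable_cap cap = {
                .cap = KVM_CAP_X86_GUEST_LBR,
        };

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}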
ChangeLog:
v1->v2:
- add the per guest LBR capability, KVM_CAP_X86_GUEST_LBR;
- save/restore the LBR stack conditionally on the vCPU thread context
switching, instead of on VMX transitions;
- expose MSR_IA32_PERF_CAPABILITIES to the guest.
The first version was sent out a long time ago and can be found here:
https://lkml.org/lkml/2017/9/25/11 . Thanks to Paolo Bonzini and Andi
Kleen for their many suggestions.
Like Xu (2):
KVM: PMU: support to save/restore the guest lbr stack on vCPU
switching
perf/x86/intel/lbr: add the guest_lbr boolean to cpuc
Wei Wang (6):
perf/x86: add a function to get the lbr stack
KVM/x86: KVM_CAP_X86_GUEST_LBR
KVM/vmx: Pass through the lbr stack to a guest
KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
KVM/x86: enable the guest to access the debugctl msr
perf/x86/intel/lbr: guest requesting KVM for lbr stack save/restore
arch/x86/events/intel/lbr.c | 54 +++++++++++++++++++++++++--
arch/x86/events/perf_event.h | 1 +
arch/x86/include/asm/kvm_host.h | 3 ++
arch/x86/include/asm/perf_event.h | 19 ++++++++++
arch/x86/include/uapi/asm/kvm_para.h | 2 +
arch/x86/kvm/cpuid.c | 5 ++-
arch/x86/kvm/pmu_intel.c | 71 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx.c | 60 ++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 18 +++------
include/uapi/linux/kvm.h | 1 +
10 files changed, 215 insertions(+), 19 deletions(-)
--
2.7.4
From: Like Xu <[email protected]>
The host creates an lbr perf event for the guest vCPU only for the
purpose of saving/restoring the lbr stack on vCPU context switches.
There is no need to enable the lbr functionality for this lbr perf event,
because the feature is used entirely within the guest.
So, we introduce the guest_lbr boolean control in cpuc to indicate whether
the lbr perf event was created for the guest. When the perf subsystem
handles this event (e.g. enables it or reads the lbr stack on a PMI) and
finds that it is for the guest, it simply returns; all we need from the
perf event is context-switch support for the lbr stack.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/events/intel/lbr.c | 10 +++++++---
arch/x86/events/perf_event.h | 1 +
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 50921d3..74f6ad9 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -481,6 +481,9 @@ void intel_pmu_lbr_add(struct perf_event *event)
if (!x86_pmu.lbr_nr)
return;
+ if (event->attr.exclude_host)
+ cpuc->guest_lbr = true;
+
cpuc->br_sel = event->hw.branch_reg.reg;
if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) {
@@ -528,6 +531,7 @@ void intel_pmu_lbr_del(struct perf_event *event)
set_pv_lbr_ctrl_active(false);
}
+ cpuc->guest_lbr = false;
cpuc->lbr_users--;
WARN_ON_ONCE(cpuc->lbr_users < 0);
perf_sched_cb_dec(event->ctx->pmu);
@@ -537,7 +541,7 @@ void intel_pmu_lbr_enable_all(bool pmi)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
- if (cpuc->lbr_users)
+ if (cpuc->lbr_users && !cpuc->guest_lbr)
__intel_pmu_lbr_enable(pmi);
}
@@ -545,7 +549,7 @@ void intel_pmu_lbr_disable_all(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
- if (cpuc->lbr_users)
+ if (cpuc->lbr_users && !cpuc->guest_lbr)
__intel_pmu_lbr_disable();
}
@@ -679,7 +683,7 @@ void intel_pmu_lbr_read(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
- if (!cpuc->lbr_users)
+ if (!cpuc->lbr_users || cpuc->guest_lbr)
return;
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 1562863..a91fdef 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -223,6 +223,7 @@ struct cpu_hw_events {
*/
u64 intel_ctrl_guest_mask;
u64 intel_ctrl_host_mask;
+ bool guest_lbr;
struct perf_guest_switch_msr guest_switch_msrs[X86_PMC_IDX_MAX];
/*
--
2.7.4
From: Like Xu <[email protected]>
This patch adds support to KVM to save/restore the lbr stack on vCPU
context switches.
When the guest sets the ACTIVE bit of MSR_KVM_PV_LBR_CTRL, a perf event
is created on the host for the related vCPU. This perf event ensures that
the LBR stack is saved/restored when the vCPU thread is scheduled out/in.
The perf event is removed and freed when the guest clears the ACTIVE
bit.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/pmu_intel.c | 71 ++++++++++++++++++++++++++++++++++++++++-
3 files changed, 74 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5db5ba3..cfbe90f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -432,6 +432,8 @@ struct kvm_pmu {
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
struct irq_work irq_work;
u64 reprogram_pmi;
+ u64 kvm_pv_lbr_ctrl;
+ struct perf_event *guest_lbr_event;
};
struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3b8a57b..8550eee 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -622,7 +622,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_PV_UNHALT) |
(1 << KVM_FEATURE_PV_TLB_FLUSH) |
(1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
- (1 << KVM_FEATURE_PV_SEND_IPI);
+ (1 << KVM_FEATURE_PV_SEND_IPI) |
+ (1 << KVM_FEATURE_PV_LBR_CTRL);
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 5ab4a36..27c028d 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -67,6 +67,62 @@ static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data)
reprogram_counter(pmu, bit);
}
+static void guest_lbr_event_create(struct kvm_pmu *pmu)
+{
+ struct perf_event *event;
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_RAW,
+ .size = sizeof(attr),
+ .pinned = true,
+ .exclude_host = true,
+ .sample_type = PERF_SAMPLE_BRANCH_STACK,
+ .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ PERF_SAMPLE_BRANCH_USER |
+ PERF_SAMPLE_BRANCH_KERNEL,
+ };
+
+ if (unlikely(pmu->guest_lbr_event)) {
+ pr_err("%s: guest_lbr_event already created\n", __func__);
+ return;
+ }
+
+ event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
+ NULL);
+ if (IS_ERR(event)) {
+ pr_err("%s: failed %ld\n", __func__, PTR_ERR(event));
+ return;
+ }
+ pmu->guest_lbr_event = event;
+}
+
+void guest_lbr_event_release(struct kvm_pmu *pmu)
+{
+ struct perf_event *event = pmu->guest_lbr_event;
+
+ if (unlikely(!pmu->guest_lbr_event)) {
+ pr_err("%s: guest_lbr_event already freed\n", __func__);
+ return;
+ }
+
+ if (event) {
+ event->pmu->stop(event, PERF_EF_UPDATE);
+ perf_event_release_kernel(event);
+ }
+ pmu->guest_lbr_event = NULL;
+}
+
+static void kvm_pv_lbr_ctrl_changed(struct kvm_pmu *pmu, u64 data)
+{
+ bool guest_lbr_active = data & KVM_PV_LBR_CTRL_ACTIVE;
+
+ if (guest_lbr_active)
+ guest_lbr_event_create(pmu);
+ else
+ guest_lbr_event_release(pmu);
+
+ pmu->kvm_pv_lbr_ctrl = data;
+}
+
static unsigned intel_find_arch_event(struct kvm_pmu *pmu,
u8 event_select,
u8 unit_mask)
@@ -145,7 +201,7 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu,
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- int ret;
+ int ret = 0;
switch (msr) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
@@ -154,6 +210,10 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
ret = pmu->version > 1;
break;
+ case MSR_KVM_PV_LBR_CTRL:
+ if (vcpu->kvm->arch.guest_lbr)
+ ret = 1;
+ break;
default:
ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -182,6 +242,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
*data = pmu->global_ovf_ctrl;
return 0;
+ case MSR_KVM_PV_LBR_CTRL:
+ *data = pmu->kvm_pv_lbr_ctrl;
+ return 0;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_fixed_pmc(pmu, msr))) {
@@ -234,6 +297,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 0;
}
break;
+ case MSR_KVM_PV_LBR_CTRL:
+ if (pmu->kvm_pv_lbr_ctrl == data)
+ return 0;
+ kvm_pv_lbr_ctrl_changed(pmu, data);
+ return 0;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_fixed_pmc(pmu, msr))) {
@@ -340,6 +408,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
pmu->global_ovf_ctrl = 0;
+ pmu->kvm_pv_lbr_ctrl = 0;
}
struct kvm_pmu_ops intel_pmu_ops = {
--
2.7.4
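For orientation, the save/restore armed by the perf event above is driven
by the existing perf context-switch path; condensed from
arch/x86/events/intel/lbr.c (pre-existing code, simplified here, not part
of this patch):

void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
{
        struct x86_perf_task_context *task_ctx = ctx->task_ctx_data;

        if (!task_ctx)
                return;

        if (sched_in)
                __intel_pmu_lbr_restore(task_ctx);
        else
                __intel_pmu_lbr_save(task_ctx);
}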
This patch adds an interface to enable a guest to request KVM to save
and restore the lbr stack on vCPU context switches.
KVM cannot capture whether the guest is actively using the lbr feature
via the lbr enable bit in the debugctl MSR, because that control bit is
frequently enabled and disabled by the guest, and in some cases, it is
disabled even when the guest is actively using the lbr feature. For
example, perf_pmu_sched_task in the guest disables the bit before reading
out the lbr stack. In this case, the bit is disabled although the guest
is still using the lbr feature.
So, a KVM-specific MSR, MSR_KVM_PV_LBR_CTRL, is used by the guest at a
proper place to tell KVM whether the LBR is actively in use. Basically,
the lbr user callstack mode needs the lbr stack to be saved/restored on a
context switch, so we set the ACTIVE bit of MSR_KVM_PV_LBR_CTRL only
when the user callstack mode is used. The KVM hypervisor adds the lbr
stack save/restore support on vCPU switches after the ACTIVE bit is set.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/events/intel/lbr.c | 21 +++++++++++++++++++++
arch/x86/include/asm/perf_event.h | 3 +++
arch/x86/include/uapi/asm/kvm_para.h | 2 ++
3 files changed, 26 insertions(+)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 7c3958e..50921d3 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/perf_event.h>
#include <linux/types.h>
+#include <linux/kvm_para.h>
#include <asm/perf_event.h>
#include <asm/msr.h>
@@ -454,6 +455,24 @@ static inline bool branch_user_callstack(unsigned br_sel)
return (br_sel & X86_BR_USER) && (br_sel & X86_BR_CALL_STACK);
}
+static inline void set_pv_lbr_ctrl_active(bool active)
+{
+ u64 lbr_ctrl_old, lbr_ctrl_new;
+
+ if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
+ !kvm_para_has_feature(KVM_FEATURE_PV_LBR_CTRL))
+ return;
+
+ rdmsrl(MSR_KVM_PV_LBR_CTRL, lbr_ctrl_old);
+ if (active)
+ lbr_ctrl_new = lbr_ctrl_old | KVM_PV_LBR_CTRL_ACTIVE;
+ else
+ lbr_ctrl_new = lbr_ctrl_old & ~KVM_PV_LBR_CTRL_ACTIVE;
+
+ if (lbr_ctrl_new != lbr_ctrl_old)
+ wrmsrl(MSR_KVM_PV_LBR_CTRL, lbr_ctrl_new);
+}
+
void intel_pmu_lbr_add(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -467,6 +486,7 @@ void intel_pmu_lbr_add(struct perf_event *event)
if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) {
task_ctx = event->ctx->task_ctx_data;
task_ctx->lbr_callstack_users++;
+ set_pv_lbr_ctrl_active(true);
}
/*
@@ -505,6 +525,7 @@ void intel_pmu_lbr_del(struct perf_event *event)
event->ctx->task_ctx_data) {
task_ctx = event->ctx->task_ctx_data;
task_ctx->lbr_callstack_users--;
+ set_pv_lbr_ctrl_active(false);
}
cpuc->lbr_users--;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 161165f..9fb0f7e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -162,6 +162,9 @@ struct x86_pmu_capability {
*/
#define INTEL_PMC_IDX_FIXED_BTS (INTEL_PMC_IDX_FIXED + 16)
+/* Indicate to the hypervisor that the guest LBR is active */
+#define KVM_PV_LBR_CTRL_ACTIVE (1UL << 0)
+
#define GLOBAL_STATUS_COND_CHG BIT_ULL(63)
#define GLOBAL_STATUS_BUFFER_OVF BIT_ULL(62)
#define GLOBAL_STATUS_UNC_OVF BIT_ULL(61)
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 19980ec..87764dd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -29,6 +29,7 @@
#define KVM_FEATURE_PV_TLB_FLUSH 9
#define KVM_FEATURE_ASYNC_PF_VMEXIT 10
#define KVM_FEATURE_PV_SEND_IPI 11
+#define KVM_FEATURE_PV_LBR_CTRL 12
#define KVM_HINTS_REALTIME 0
@@ -47,6 +48,7 @@
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
#define MSR_KVM_STEAL_TIME 0x4b564d03
#define MSR_KVM_PV_EOI_EN 0x4b564d04
+#define MSR_KVM_PV_LBR_CTRL 0x4b564d05
struct kvm_steal_time {
__u64 steal;
--
2.7.4
The LBR stack MSRs are model-specific. The perf subsystem has
already resolved the actual MSR values based on the CPU model.
This patch enables a caller outside the perf subsystem to get the LBR
stack info. This is useful for hypervisors to prepare the lbr feature
for the guest.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/events/intel/lbr.c | 23 +++++++++++++++++++++++
arch/x86/include/asm/perf_event.h | 14 ++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index f3e006b..7c3958e 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1273,3 +1273,26 @@ void intel_pmu_lbr_init_knl(void)
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map = snb_lbr_sel_map;
}
+
+/**
+ * perf_get_lbr_stack - get the lbr stack related MSRs
+ *
+ * @stack: the caller's memory to get the lbr stack
+ *
+ * Returns: 0 indicates that the lbr stack has been successfully obtained.
+ */
+int perf_get_lbr_stack(struct perf_lbr_stack *stack)
+{
+ stack->lbr_nr = x86_pmu.lbr_nr;
+ stack->lbr_tos = x86_pmu.lbr_tos;
+ stack->lbr_from = x86_pmu.lbr_from;
+ stack->lbr_to = x86_pmu.lbr_to;
+
+ if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
+ stack->lbr_info = MSR_LBR_INFO_0;
+ else
+ stack->lbr_info = 0;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_get_lbr_stack);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 12f5408..f40e80a 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -267,7 +267,16 @@ struct perf_guest_switch_msr {
u64 host, guest;
};
+struct perf_lbr_stack {
+ int lbr_nr;
+ unsigned long lbr_tos;
+ unsigned long lbr_from;
+ unsigned long lbr_to;
+ unsigned long lbr_info;
+};
+
extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+extern int perf_get_lbr_stack(struct perf_lbr_stack *stack);
extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
extern void perf_check_microcode(void);
#else
@@ -277,6 +286,11 @@ static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
return NULL;
}
+static inline int perf_get_lbr_stack(struct perf_lbr_stack *stack)
+{
+ return -1;
+}
+
static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
{
memset(cap, 0, sizeof(*cap));
--
2.7.4
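As a usage illustration (hypothetical caller, not part of the series), a
hypervisor-side consumer would look roughly like:

struct perf_lbr_stack stack;

if (perf_get_lbr_stack(&stack) == 0)
        pr_info("lbr: %d entries, tos 0x%lx, from 0x%lx, to 0x%lx, info 0x%lx\n",
                stack.lbr_nr, stack.lbr_tos, stack.lbr_from,
                stack.lbr_to, stack.lbr_info);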
The debugctl MSR is not completely identical on AMD and Intel CPUs; for
example, FREEZE_LBRS_ON_PMI is supported by Intel CPUs only. svm.c
already handles accesses to the debugctl msr, and this patch handles
accesses to the debugctl msr on Intel CPUs in vmx.c. Accordingly, the
common debugctl msr handling code in kvm_get/set_msr_common is removed.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx.c | 15 +++++++++++++++
arch/x86/kvm/x86.c | 13 -------------
2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d5eba8e..d0ea360 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4093,6 +4093,11 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = to_vmx(vcpu)->arch_capabilities;
break;
+ case MSR_IA32_DEBUGCTLMSR:
+ if (!vcpu->kvm->arch.guest_lbr)
+ return 1;
+ msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -4266,6 +4271,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
vmx->arch_capabilities = data;
break;
+ case MSR_IA32_DEBUGCTLMSR:
+ if (!vcpu->kvm->arch.guest_lbr)
+ return 1;
+ /*
+ * Currently, only FREEZE_LBRS_ON_PMI and DEBUGCTLMSR_LBR are
+ * supported.
+ */
+ data &= (DEBUGCTLMSR_FREEZE_LBRS_ON_PMI | DEBUGCTLMSR_LBR);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3eaf1b8..4bbb9eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2369,18 +2369,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
}
break;
- case MSR_IA32_DEBUGCTLMSR:
- if (!data) {
- /* We support the non-activated case already */
- break;
- } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
- /* Values other than LBR and BTF are vendor-specific,
- thus reserved and should throw a #GP */
- return 1;
- }
- vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
- __func__, data);
- break;
case 0x200 ... 0x2ff:
return kvm_mtrr_set_msr(vcpu, msr, data);
case MSR_IA32_APICBASE:
@@ -2623,7 +2611,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
switch (msr_info->index) {
case MSR_IA32_PLATFORM_ID:
case MSR_IA32_EBL_CR_POWERON:
- case MSR_IA32_DEBUGCTLMSR:
case MSR_IA32_LASTBRANCHFROMIP:
case MSR_IA32_LASTBRANCHTOIP:
case MSR_IA32_LASTINTFROMIP:
--
2.7.4
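With this change, a guest that has the lbr feature programs the MSR
directly; a minimal guest-side sketch (the two bits below are exactly the
ones vmx_set_msr accepts):

u64 debugctl;

rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);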
Pass through the LBR stack to the guest when the guest lbr feature is
enabled. This gives the guest direct access to the LBR stack.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1d26f3c..7a62c1c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7847,6 +7847,38 @@ static void vmx_enable_tdp(void)
kvm_enable_tdp();
}
+static int vmx_passthrough_lbr_msrs(struct kvm *kvm,
+ unsigned long *msr_bitmap)
+{
+ int i;
+ struct perf_lbr_stack lbr_stack;
+
+ if (perf_get_lbr_stack(&lbr_stack) < 0) {
+ pr_err("Failed to pass through the lbr stack\n");
+ return -ENOENT;
+ }
+
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_LBR_SELECT,
+ MSR_TYPE_RW);
+ vmx_disable_intercept_for_msr(msr_bitmap, lbr_stack.lbr_tos,
+ MSR_TYPE_RW);
+
+ for (i = 0; i < lbr_stack.lbr_nr; i++) {
+ vmx_disable_intercept_for_msr(msr_bitmap,
+ lbr_stack.lbr_from + i,
+ MSR_TYPE_RW);
+ vmx_disable_intercept_for_msr(msr_bitmap,
+ lbr_stack.lbr_to + i,
+ MSR_TYPE_RW);
+ if (lbr_stack.lbr_info)
+ vmx_disable_intercept_for_msr(msr_bitmap,
+ lbr_stack.lbr_info + i,
+ MSR_TYPE_RW);
+ }
+
+ return 0;
+}
+
static __init int hardware_setup(void)
{
unsigned long host_bndcfgs;
@@ -10998,6 +11030,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
vmx->msr_bitmap_mode = 0;
+ if (kvm->arch.guest_lbr) {
+ err = vmx_passthrough_lbr_msrs(kvm, msr_bitmap);
+ if (err < 0)
+ goto free_vmcs;
+ }
+
vmx->loaded_vmcs = &vmx->vmcs01;
cpu = get_cpu();
vmx_vcpu_load(&vmx->vcpu, cpu);
--
2.7.4
Introduce KVM_CAP_X86_GUEST_LBR to allow per-VM enabling of the guest
lbr feature.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 5 +++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 7 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 00ddb0c..5db5ba3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -822,6 +822,7 @@ struct kvm_arch {
gpa_t wall_clock;
+ bool guest_lbr;
bool mwait_in_guest;
bool hlt_in_guest;
bool pause_in_guest;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 506bd2b..3eaf1b8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2927,6 +2927,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SPLIT_IRQCHIP:
case KVM_CAP_IMMEDIATE_EXIT:
case KVM_CAP_GET_MSR_FEATURES:
+ case KVM_CAP_X86_GUEST_LBR:
r = 1;
break;
case KVM_CAP_SYNC_REGS:
@@ -4350,6 +4351,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
kvm->arch.pause_in_guest = true;
r = 0;
break;
+ case KVM_CAP_X86_GUEST_LBR:
+ kvm->arch.guest_lbr = true;
+ r = 0;
+ break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 07548de..7cf9bc0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -952,6 +952,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_S390_HPAGE_1M 156
#define KVM_CAP_NESTED_STATE 157
#define KVM_CAP_ARM_INJECT_SERROR_ESR 158
+#define KVM_CAP_X86_GUEST_LBR 159
#ifdef KVM_CAP_IRQ_ROUTING
--
2.7.4
Bits [5:0] of MSR_IA32_PERF_CAPABILITIES indicate the format of
the addresses stored in the LBR stack. Expose those bits to the guest
when the guest lbr feature is enabled.
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/include/asm/perf_event.h | 2 ++
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx.c | 7 +++++++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f40e80a..161165f 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -79,6 +79,8 @@
#define ARCH_PERFMON_BRANCH_MISSES_RETIRED 6
#define ARCH_PERFMON_EVENTS_COUNT 7
+#define PERF_CAP_MASK_LBR_FMT 0x3f
+
/*
* Intel "Architectural Performance Monitoring" CPUID
* detection/enumeration details:
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7bcfa61..3b8a57b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,7 +365,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
0 /* DS-CPL, VMX, SMX, EST */ |
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
- F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
+ F(FMA) | F(CX16) | 0 /* xTPR Update*/ | F(PDCM) |
F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7a62c1c..d5eba8e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4134,6 +4134,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
return 1;
/* Otherwise falls through */
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (!boot_cpu_has(X86_FEATURE_PDCM))
+ return 1;
+ msr_info->data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
+ if (vcpu->kvm->arch.guest_lbr)
+ msr_info->data &= PERF_CAP_MASK_LBR_FMT;
+ break;
default:
msr = find_msr_entry(vmx, msr_info->index);
if (msr) {
--
2.7.4
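A guest can then read the LBR address format like this (sketch;
PERF_CAP_MASK_LBR_FMT is the mask added above):

u64 perf_cap;

rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
pr_info("guest LBR format: %llu\n", perf_cap & PERF_CAP_MASK_LBR_FMT);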
On Thu, Sep 06, 2018 at 07:30:54PM +0800, Wei Wang wrote:
> This patch adds an interface to enable a guest to request KVM to save
> and restore the lbr stack on vCPU context switching.
>
> KVM couldn't capture the info about whether the guest is actively using
> the lbr feature via the lbr enable bit in the debugctl MSR, because that
> control bit is frequently enabled and disabled by the guest, and in some
> cases, it is disabled even when the guest is actively using the lbr
> feature. For example, perf_pmu_sched_task in the guest disables the bit
> before reading out the lbr stack. In this case, the bit is disabled though
> the guest is still using the lbr feature.
>
> So, a KVM-specific MSR, MSR_KVM_PV_LBR_CTRL, is used by the guest at a
> proper place to tell KVM if the LBR is actively in use or not. Basically,
> the lbr user callstack mode needs the lbr stack to be saved/restored on a
> context switching, so we set the ACTIVE bit of MSR_KVM_PV_LBR_CTRL only
> when the user callstack mode is used. The KVM hypervisor will add the lbr
> stack save/restore support on vCPU switching after the ACTIVE bit is set.
PV is difficult because it requires changing all the users.
Maybe a better approach would be a lazy restore of the LBRs:
Don't restore the LBRs on context switch, but set the LBR MSRs to intercept.
Then on the first access restore the LBRs and allow direct access to the
MSRs again.
Also when the LBRs haven't been set to direct access the state doesn't
need to be saved.
-Andi
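A rough sketch of the lazy-restore flow described above (all helper names
here are hypothetical, not existing KVM interfaces):

/* On vCPU sched-in: don't restore the LBR MSRs, just intercept them. */
static void vcpu_sched_in(struct kvm_vcpu *vcpu)
{
        vmx_set_lbr_msr_intercept(vcpu, true);
}

/*
 * First guest access to an LBR MSR after sched-in: restore the saved
 * stack once, then allow direct access again until the next switch.
 * If no LBR MSR is touched, nothing is restored, and nothing needs to
 * be saved either.
 */
static int handle_lbr_msr_access(struct kvm_vcpu *vcpu)
{
        vmx_restore_lbr_stack(vcpu);
        vmx_set_lbr_msr_intercept(vcpu, false);
        return 0;
}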
> +int perf_get_lbr_stack(struct perf_lbr_stack *stack)
> +{
> + stack->lbr_nr = x86_pmu.lbr_nr;
> + stack->lbr_tos = x86_pmu.lbr_tos;
> + stack->lbr_from = x86_pmu.lbr_from;
> + stack->lbr_to = x86_pmu.lbr_to;
> +
> + if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
> + stack->lbr_info = MSR_LBR_INFO_0;
> + else
> + stack->lbr_info = 0;
Seems weird to export the enum value if the enum isn't exported.
How can it be used?
-Andi
On 09/07/2018 11:27 AM, Andi Kleen wrote:
> On Thu, Sep 06, 2018 at 07:30:54PM +0800, Wei Wang wrote:
>> This patch adds an interface to enable a guest to request KVM to save
>> and restore the lbr stack on vCPU context switching.
>>
>> KVM couldn't capture the info about whether the guest is actively using
>> the lbr feature via the lbr enable bit in the debugctl MSR, because that
>> control bit is frequently enabled and disabled by the guest, and in some
>> cases, it is disabled even when the guest is actively using the lbr
>> feature. For example, perf_pmu_sched_task in the guest disables the bit
>> before reading out the lbr stack. In this case, the bit is disabled though
>> the guest is still using the lbr feature.
>>
>> So, a KVM-specific MSR, MSR_KVM_PV_LBR_CTRL, is used by the guest at a
>> proper place to tell KVM if the LBR is actively in use or not. Basically,
>> the lbr user callstack mode needs the lbr stack to be saved/restored on a
>> context switching, so we set the ACTIVE bit of MSR_KVM_PV_LBR_CTRL only
>> when the user callstack mode is used. The KVM hypervisor will add the lbr
>> stack save/restore support on vCPU switching after the ACTIVE bit is set.
> PV is difficult because it requires changing all the users.
It needs changes to the guest driver, but remains transparent to guest
user applications (e.g. the perf tool).
Btw, we tested it, and it works in the guest as well as it does on
native Linux.
This was thought to be the hardest part of this work. Let me just
clarify it a little bit:
The fundamental functionality we want to achieve is:
#1 when the vCPU is actively using the LBR feature, save/restore the
lbr stack when the vCPU is scheduled out/in;
#2 when the vCPU is NOT actively using the LBR feature, DON'T
save/restore the lbr stack when the vCPU is scheduled out/in.
The key problem we need to solve is: how does the host know whether the
guest is actively using the lbr feature?
>
> Maybe a better approach would be a lazy restore of the LBRs:
>
> Don't restore the LBRs on context switch, but set the LBR MSRs to intercept.
> Then on the first access restore the LBRs and allow direct access to the
> MSRs again.
>
> Also when the LBRs haven't been set to direct access the state doesn't
> need to be saved.
This could achieve #1 above, but how would it solve #2? That is, after
the guest has used the lbr feature for a while, the lbr stack has been
passed through; then the guest stops using lbr, but the vCPU will still
save/restore on every switch?
(The host cannot tell that the guest is not using lbr from debugctl[0];
the commit log above has some explanation of this.)
Best,
Wei
On 09/07/2018 11:28 AM, Andi Kleen wrote:
>> +int perf_get_lbr_stack(struct perf_lbr_stack *stack)
>> +{
>> + stack->lbr_nr = x86_pmu.lbr_nr;
>> + stack->lbr_tos = x86_pmu.lbr_tos;
>> + stack->lbr_from = x86_pmu.lbr_from;
>> + stack->lbr_to = x86_pmu.lbr_to;
>> +
>> + if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
>> + stack->lbr_info = MSR_LBR_INFO_0;
>> + else
>> + stack->lbr_info = 0;
> Seems weird to export the enum value if the enum isn't exported.
> How can it be used?
>
I'm not sure about the issue. The caller gets the value of
MSR_LBR_INFO_0 (not the enum value, LBR_FORMAT_INFO) only when the
hardware supports it. If the hardware doesn't support it, lbr_info is
just set to 0, and there is no lbr info msr to be passed through.
Best,
Wei
> This could achieve the above #1, but how would it solve #2 above? That is,
> after the guest uses the lbr feature for a while, the lbr stack has been
> passed through, then the guest doesn't use lbr any more, but the vCPU will
> still save/restore on switching?
If nothing accesses the LBR MSRs after a context switch in the guest,
nothing gets saved/restored, due to:
> > Also when the LBRs haven't been set to direct access the state doesn't
> > need to be saved.
-Andi
On Fri, Sep 7, 2018 at 4:28 PM Wei Wang <[email protected]> wrote:
> This patch adds support to KVM to save/restore the lbr stack on vCPU
> context switching.
>
> When the guest sets the ACTIVE bit of MSR_KVM_PV_LBR_CTRL, a perf event
> is created on the host for the related vCPU. This perf event ensures the
> LBR stack to be saved/restored when the vCPU thread is scheduled out/in.
> The perf event is removed and freed when the guest clears the ACTIVE
> bit.
[...]
> +void guest_lbr_event_release(struct kvm_pmu *pmu)
> +{
> + struct perf_event *event = pmu->guest_lbr_event;
> +
> + if (unlikely(!pmu->guest_lbr_event)) {
> + pr_err("%s: guest_lbr_event already freed\n", __func__);
> + return;
> + }
> +
> + if (event) {
> + event->pmu->stop(event, PERF_EF_UPDATE);
> + perf_event_release_kernel(event);
> + }
> + pmu->guest_lbr_event = NULL;
> +}
Is there some guarantee that this method will be called when the vCPU
is torn down on guest exit?
On Friday, September 7, 2018 10:37 PM, Jann Horn wrote:
> On Fri, Sep 7, 2018 at 4:28 PM Wei Wang <[email protected]> wrote:
> > This patch adds support to KVM to save/restore the lbr stack on vCPU
> > context switching.
> >
> > When the guest sets the ACTIVE bit of MSR_KVM_PV_LBR_CTRL, a perf
> > event is created on the host for the related vCPU. This perf event
> > ensures the LBR stack to be saved/restored when the vCPU thread is
> scheduled out/in.
> > The perf event is removed and freed when the guest clears the ACTIVE
> > bit.
> [...]
> > +void guest_lbr_event_release(struct kvm_pmu *pmu) {
> > + struct perf_event *event = pmu->guest_lbr_event;
> > +
> > + if (unlikely(!pmu->guest_lbr_event)) {
> > + pr_err("%s: guest_lbr_event already freed\n", __func__);
> > + return;
> > + }
> > +
> > + if (event) {
> > + event->pmu->stop(event, PERF_EF_UPDATE);
> > + perf_event_release_kernel(event);
> > + }
> > + pmu->guest_lbr_event = NULL;
> > +}
>
> Is there some guarantee that this method will be called when the vCPU is
> torn down on guest exit?
Thanks for reminding us of this corner case. We didn't consider it in this version. I think we could add guest_lbr_event_release() to kvm_arch_vcpu_destroy().
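A sketch of that fix (the hook point in kvm_arch_vcpu_destroy is our
assumption; the rest of the teardown is elided):

void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
        struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

        /* Drop the host perf event if the guest never cleared ACTIVE. */
        if (pmu->guest_lbr_event)
                guest_lbr_event_release(pmu);

        /* ... existing vCPU teardown continues here ... */
}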
Best,
Wei
On Friday, September 7, 2018 10:11 PM, Andi Kleen wrote:
> > This could achieve the above #1, but how would it solve #2 above? That
> > is, after the guest uses the lbr feature for a while, the lbr stack
> > has been passed through, then the guest doesn't use lbr any more, but
> > the vCPU will still save/restore on switching?
>
> If nothing accesses the MSR LBRs after a context switch in the guest nothing
> gets saved/restored due to:
>
> > > Also when the LBRs haven't been set to direct access the state
> > > doesn't need to be saved.
How would you realize the function of saving/restoring the lbr stack on the host?
Here, we create a perf event on the host (please see guest_lbr_event_create in patch 7), which essentially satisfies all the conditions (e.g. it increases cpuc->lbr_users) that are required to have the lbr stack saved/restored on vCPU switches.
If we want to stop the host-side lbr stack save/restore for the vCPU, we accordingly need to call guest_lbr_event_release (in patch 7) to destroy that perf event (the host doesn't automatically stop saving the lbr stack for the vCPU if that perf event is still there).
When would you call that release function? (We all know that the lbr doesn't need to be saved when the guest is not using it, but we need to destroy that perf event to achieve "doesn't need to be saved".)
Best,
Wei
> How would you realize the function of saving/restoring the lbr stack on the host?
>
> Here, we create a perf event on the host (please see guest_lbr_event_create on patch 7), which essentially satisfies all the conditions (e.g. increases cpuc->lbr_users) that are required to have the lbr stack saved/restored on the vCPU switching.
>
> If we want to stop the host side lbr stack save/restore for the vCPU, we need accordingly to call guest_lbr_event_release (in patch 7) to destroy that perf event (the host doesn't automatically stop saving the lbr stack for the vCPU if that perf event is still there).
>
> When would you call that release function? (we all know that the lbr doesn't need to be saved when the guest is not using it, but we need to destroy that perf event to achieve "doesn't need to be saved")
Maybe set a timer on DEBUGCTL LBR=0? A timer would provide hysteresis, so that quick toggles
(like in a PMI handler) wouldn't do anything expensive.
It needs new interfaces for perf anyway, because we need to access the LBR state from
the last save.
-Andi
On Saturday, September 8, 2018 4:05 AM, Andi Kleen wrote:
> > How would you realize the function of saving/restoring the lbr stack on the
> host?
> >
> > Here, we create a perf event on the host (please see
> guest_lbr_event_create on patch 7), which essentially satisfies all the
> conditions (e.g. increases cpuc->lbr_users) that are required to have the lbr
> stack saved/restored on the vCPU switching.
> >
> > If we want to stop the host side lbr stack save/restore for the vCPU, we
> need accordingly to call guest_lbr_event_release (in patch 7) to destroy that
> perf event (the host doesn't automatically stop saving the lbr stack for the
> vCPU if that perf event is still there).
> >
> > When would you call that release function? (we all know that the lbr
> > doesn't need to be saved when the guest is not using it, but we need
> > to destroy that perf event to achieve "doesn't need to be saved")
>
> Maybe set a timer on DEBUGCTL LBR=0 ? A timer would provide hysteresis,
> so that quick toggles (like in a PMI handler) wouldn't do anything expensive.
I'm not sure this would solve the key problem. How often would the timer fire?
- Let's say every 10ms, for example. The guest is disturbed by a timer interrupt (causing a VMExit) every 10ms, even though the guest no longer uses the lbr after the first use. The problem then becomes when to call the release function to cancel the timer, if we want to avoid that unnecessary disturbance to the guest.
- When the timer fires and finds DEBUGCTL LBR=0, it destroys the host-side perf event, and the lbr stack is then not saved when the vCPU is scheduled out. As also mentioned in the commit log, perf_pmu_sched_task in the guest disables that bit before reading out the lbr stack (a PMI is another example). Now DEBUGCTL LBR=0, and before the guest reads out the lbr stack, the vCPU may happen to be scheduled out, and another host thread is scheduled in and overwrites the lbr stack. So, before the guest reads out the lbr stack, the stack has already been polluted in this case.
Best,
Wei
> From: Wei Wang
> Sent: Thursday, September 06, 2018 7:31 PM
> Subject: [PATCH v2 7/8] KVM: PMU: support to save/restore the guest lbr stack
> on vCPU switching
>
> From: Like Xu <[email protected]>
>
> This patch adds support to KVM to save/restore the lbr stack on vCPU
> context switching.
>
> When the guest sets the ACTIVE bit of MSR_KVM_PV_LBR_CTRL, a perf event
> is created on the host for the related vCPU. This perf event ensures the
> LBR stack to be saved/restored when the vCPU thread is scheduled out/in.
> The perf event is removed and freed when the guest clears the ACTIVE
> bit.
>
What about live migration? Does the LBR stack need to be saved on the source
side and restored on the dest side in passthrough mode?
Thanks,
-Gonglei
> > From: Like Xu <[email protected]>
> >
> > This patch adds support to KVM to save/restore the lbr stack on vCPU
> > context switching.
> >
> > When the guest sets the ACTIVE bit of MSR_KVM_PV_LBR_CTRL, a perf event
> > is created on the host for the related vCPU. This perf event ensures the
> > LBR stack to be saved/restored when the vCPU thread is scheduled out/in.
> > The perf event is removed and freed when the guest clears the ACTIVE
> > bit.
> >
>
> What about live migration? Does LBR stack need to be saved on the source side and
> restored on the dest side with the passthrough mode?
Yes it should. Either for call stack LBR, or when it is frozen/disabled.
When it's not frozen/disabled and not in call stack LBR mode, it likely
doesn't hurt either, but it's not strictly needed because it will be
replaced so quickly.
-Andi
On 09/18/2018 10:56 AM, Andi Kleen wrote:
>>> From: Like Xu <[email protected]>
>>>
>>> This patch adds support to KVM to save/restore the lbr stack on vCPU
>>> context switching.
>>>
>>> When the guest sets the ACTIVE bit of MSR_KVM_PV_LBR_CTRL, a perf event
>>> is created on the host for the related vCPU. This perf event ensures the
>>> LBR stack to be saved/restored when the vCPU thread is scheduled out/in.
>>> The perf event is removed and freed when the guest clears the ACTIVE
>>> bit.
>>>
>> What about live migration? Does LBR stack need to be saved on the source side and
>> restored on the dest side with the passthrough mode?
> Yes it should. Either for call stack LBR, or when it is frozen/disabled.
>
> When it's not frozen/disabled and not in call stack LBR mode it likely doesn't
> hurt either, but it's not strictly needed because it will
> be replaced so quickly.
Yes, should be supported. We are working on v3 with the suggested lazy
save approach, and will send it out shortly.
Best,
Wei
> From: Wei Wang
> Sent: Tuesday, September 18, 2018 5:58 PM
> Subject: Re: [PATCH v2 7/8] KVM: PMU: support to save/restore the guest lbr
> stack on vCPU switching
>
> On 09/18/2018 10:56 AM, Andi Kleen wrote:
> >>> From: Like Xu <[email protected]>
> >>>
> >>> This patch adds support to KVM to save/restore the lbr stack on vCPU
> >>> context switching.
> >>>
> >>> When the guest sets the ACTIVE bit of MSR_KVM_PV_LBR_CTRL, a perf
> event
> >>> is created on the host for the related vCPU. This perf event ensures the
> >>> LBR stack to be saved/restored when the vCPU thread is scheduled out/in.
> >>> The perf event is removed and freed when the guest clears the ACTIVE
> >>> bit.
> >>>
> >> What about live migration? Does LBR stack need to be saved on the source
> side and
> >> restored on the dest side with the passthrough mode?
> > Yes it should. Either for call stack LBR, or when it is frozen/disabled.
> >
> > When it's not frozen/disabled and not in call stack LBR mode it likely doesn't
> > hurt either, but it's not strictly needed because it will
> > be replaced so quickly.
>
> Yes, should be supported. We are working on v3 with the suggested lazy
> save approach, and will send it out shortly.
>
Nice~
Thanks,
-Gonglei