2021-08-23 19:39:13

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 0/3] perf/x86/intel: KVM: PT intr handler fix and cleanup

Patches 1 and 2 fix a bug where PT PMIs in guest are forwarded to KVM's PT
intr handler even if PT is configured for system mode, i.e. when the host
is the sole owner of PT.

Patch 3 is a related cleanup/optimization.

The PT specific stuff is effectively compile-tested only.

Sean Christopherson (3):
KVM: x86: Register perf callbacks after calling vendor's
hardware_setup()
KVM: x86: Register Processor Trace interrupt hook iff PT enabled in
guest
perf/x86/intel: Fold current_vcpu check into KVM's PT intr handler

arch/x86/events/intel/core.c | 7 +++----
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 17 ++++++++++++-----
include/linux/perf_event.h | 2 +-
6 files changed, 19 insertions(+), 10 deletions(-)

--
2.33.0.rc2.250.ged5fa647cd-goog


2021-08-23 19:39:13

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 1/3] KVM: x86: Register perf callbacks after calling vendor's hardware_setup()

Wait to register perf callbacks until after doing vendor hardaware setup.
VMX's hardware_setup() configures Intel Processor Trace (PT) mode, and a
future fix to register the Intel PT guest interrupt hook if and only if
Intel PT is exposed to the guest will consume the configured PT mode.

Delaying registration to hardware setup is effectively a nop as KVM's perf
hooks all pivot on the per-CPU current_vcpu, which is non-NULL only when
KVM is handling an IRQ/NMI in a VM-Exit path. I.e. current_vcpu will be
NULL throughout both kvm_arch_init() and kvm_arch_hardware_setup().

Cc: Alexander Shishkin <[email protected]>
Cc: Artem Kashkanov <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 86539c1686fa..fb6015f97f9e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8426,8 +8426,6 @@ int kvm_arch_init(void *opaque)

kvm_timer_init();

- perf_register_guest_info_callbacks(&kvm_guest_cbs);
-
if (boot_cpu_has(X86_FEATURE_XSAVE)) {
host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
@@ -8461,7 +8459,6 @@ void kvm_arch_exit(void)
clear_hv_tscchange_cb();
#endif
kvm_lapic_exit();
- perf_unregister_guest_info_callbacks(&kvm_guest_cbs);

if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
@@ -11064,6 +11061,8 @@ int kvm_arch_hardware_setup(void *opaque)
memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
kvm_ops_static_call_update();

+ perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
supported_xss = 0;

@@ -11091,6 +11090,8 @@ int kvm_arch_hardware_setup(void *opaque)

void kvm_arch_hardware_unsetup(void)
{
+ perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
static_call(kvm_x86_hardware_unsetup)();
}

--
2.33.0.rc2.250.ged5fa647cd-goog

2021-08-23 19:40:03

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 2/3] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest

Override the Processor Trace (PT) interrupt handler for guest mode if and
only if PT is configured for host+guest mode, i.e. is being used
independently by both host and guest. If PT is configured for system
mode, the host fully controls PT and must handle all events.

Fixes: 8479e04e7d6b ("KVM: x86: Inject PMI for KVM guest")
Cc: [email protected]
Reported-by: Alexander Shishkin <[email protected]>
Reported-by: Artem Kashkanov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 4 +++-
4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09b256db394a..1ea4943a73d7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1494,6 +1494,7 @@ struct kvm_x86_init_ops {
int (*disabled_by_bios)(void);
int (*check_processor_compatibility)(void);
int (*hardware_setup)(void);
+ bool (*intel_pt_intr_in_guest)(void);

struct kvm_x86_ops *runtime_ops;
};
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0e4f2b1fa9fb..b06dbbd7eeeb 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -41,6 +41,7 @@ struct kvm_pmu_ops {
void (*reset)(struct kvm_vcpu *vcpu);
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
void (*cleanup)(struct kvm_vcpu *vcpu);
+ void (*handle_intel_pt_intr)(void);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fada1055f325..f19d72136f77 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7896,6 +7896,7 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.disabled_by_bios = vmx_disabled_by_bios,
.check_processor_compatibility = vmx_check_processor_compat,
.hardware_setup = hardware_setup,
+ .intel_pt_intr_in_guest = vmx_pt_mode_is_host_guest,

.runtime_ops = &vmx_x86_ops,
};
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb6015f97f9e..b5ade47dad9c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8305,7 +8305,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
.is_in_guest = kvm_is_in_guest,
.is_user_mode = kvm_is_user_mode,
.get_guest_ip = kvm_get_guest_ip,
- .handle_intel_pt_intr = kvm_handle_intel_pt_intr,
+ .handle_intel_pt_intr = NULL,
};

#ifdef CONFIG_X86_64
@@ -11061,6 +11061,8 @@ int kvm_arch_hardware_setup(void *opaque)
memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
kvm_ops_static_call_update();

+ if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
+ kvm_guest_cbs.handle_intel_pt_intr = kvm_handle_intel_pt_intr;
perf_register_guest_info_callbacks(&kvm_guest_cbs);

if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
--
2.33.0.rc2.250.ged5fa647cd-goog

2021-08-23 19:42:45

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 3/3] perf/x86/intel: Fold current_vcpu check into KVM's PT intr handler

Move the check for a non-NULL current_vcpu into KVM's PT intr handler
instead of relying on the caller to perform the check. In addition to
ensuring KVM's handler won't dereference a NULL pointer (and making it
obvious that it can't), this avoids a reptoline when KVM is configured to
run PT in "system mode", in which case handle_intel_pt_intr will be NULL.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/events/intel/core.c | 7 +++----
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/x86.c | 6 +++++-
include/linux/perf_event.h | 2 +-
4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index fca7a6e2242f..060f1f1ebe15 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2852,10 +2852,9 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
*/
if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
handled++;
- if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
- perf_guest_cbs->handle_intel_pt_intr))
- perf_guest_cbs->handle_intel_pt_intr();
- else
+ if (likely(!perf_guest_cbs ||
+ !perf_guest_cbs->handle_intel_pt_intr ||
+ perf_guest_cbs->handle_intel_pt_intr()))
intel_pt_interrupt();
}

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b06dbbd7eeeb..4e8a38eca72b 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -41,7 +41,7 @@ struct kvm_pmu_ops {
void (*reset)(struct kvm_vcpu *vcpu);
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
void (*cleanup)(struct kvm_vcpu *vcpu);
- void (*handle_intel_pt_intr)(void);
+ int (*handle_intel_pt_intr)(void);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b5ade47dad9c..3f289192f25f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8292,13 +8292,17 @@ static unsigned long kvm_get_guest_ip(void)
return ip;
}

-static void kvm_handle_intel_pt_intr(void)
+static int kvm_handle_intel_pt_intr(void)
{
struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);

+ if (!vcpu)
+ return -ENXIO;
+
kvm_make_request(KVM_REQ_PMI, vcpu);
__set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT,
(unsigned long *)&vcpu->arch.pmu.global_status);
+ return 0;
}

static struct perf_guest_info_callbacks kvm_guest_cbs = {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2d510ad750ed..f812c2570285 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -30,7 +30,7 @@ struct perf_guest_info_callbacks {
int (*is_in_guest)(void);
int (*is_user_mode)(void);
unsigned long (*get_guest_ip)(void);
- void (*handle_intel_pt_intr)(void);
+ int (*handle_intel_pt_intr)(void);
};

#ifdef CONFIG_HAVE_HW_BREAKPOINT
--
2.33.0.rc2.250.ged5fa647cd-goog

2021-08-24 07:29:48

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest

Sean Christopherson <[email protected]> writes:

> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 0e4f2b1fa9fb..b06dbbd7eeeb 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -41,6 +41,7 @@ struct kvm_pmu_ops {
> void (*reset)(struct kvm_vcpu *vcpu);
> void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> void (*cleanup)(struct kvm_vcpu *vcpu);
> + void (*handle_intel_pt_intr)(void);

What's this one for?

Regards,
--
Alex

2021-08-24 14:13:51

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest

On Tue, Aug 24, 2021, Alexander Shishkin wrote:
> Sean Christopherson <[email protected]> writes:
>
> > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > index 0e4f2b1fa9fb..b06dbbd7eeeb 100644
> > --- a/arch/x86/kvm/pmu.h
> > +++ b/arch/x86/kvm/pmu.h
> > @@ -41,6 +41,7 @@ struct kvm_pmu_ops {
> > void (*reset)(struct kvm_vcpu *vcpu);
> > void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > void (*cleanup)(struct kvm_vcpu *vcpu);
> > + void (*handle_intel_pt_intr)(void);
>
> What's this one for?

Doh, the remnants of one of my explorations trying to figure out the least gross
way to conditionally register the handling. I'll get it removed.

Good eyeballs, thanks!

2021-08-25 07:27:09

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest

On 24/8/2021 3:37 am, Sean Christopherson wrote:
> Override the Processor Trace (PT) interrupt handler for guest mode if and
> only if PT is configured for host+guest mode, i.e. is being used
> independently by both host and guest. If PT is configured for system
> mode, the host fully controls PT and must handle all events.
>
> Fixes: 8479e04e7d6b ("KVM: x86: Inject PMI for KVM guest")
> Cc: [email protected]
> Reported-by: Alexander Shishkin <[email protected]>
> Reported-by: Artem Kashkanov <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/pmu.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 1 +
> arch/x86/kvm/x86.c | 4 +++-
> 4 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09b256db394a..1ea4943a73d7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1494,6 +1494,7 @@ struct kvm_x86_init_ops {
> int (*disabled_by_bios)(void);
> int (*check_processor_compatibility)(void);
> int (*hardware_setup)(void);
> + bool (*intel_pt_intr_in_guest)(void);
>
> struct kvm_x86_ops *runtime_ops;
> };
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 0e4f2b1fa9fb..b06dbbd7eeeb 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -41,6 +41,7 @@ struct kvm_pmu_ops {
> void (*reset)(struct kvm_vcpu *vcpu);
> void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> void (*cleanup)(struct kvm_vcpu *vcpu);
> + void (*handle_intel_pt_intr)(void);
> };
>
> static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index fada1055f325..f19d72136f77 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7896,6 +7896,7 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> .disabled_by_bios = vmx_disabled_by_bios,
> .check_processor_compatibility = vmx_check_processor_compat,
> .hardware_setup = hardware_setup,
> + .intel_pt_intr_in_guest = vmx_pt_mode_is_host_guest,
>
> .runtime_ops = &vmx_x86_ops,
> };
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb6015f97f9e..b5ade47dad9c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8305,7 +8305,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
> .is_in_guest = kvm_is_in_guest,
> .is_user_mode = kvm_is_user_mode,
> .get_guest_ip = kvm_get_guest_ip,
> - .handle_intel_pt_intr = kvm_handle_intel_pt_intr,
> + .handle_intel_pt_intr = NULL,
> };
>
> #ifdef CONFIG_X86_64
> @@ -11061,6 +11061,8 @@ int kvm_arch_hardware_setup(void *opaque)
> memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
> kvm_ops_static_call_update();
>
> + if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
> + kvm_guest_cbs.handle_intel_pt_intr = kvm_handle_intel_pt_intr;

Emm, it's still buggy.

The guest "unknown NMI" from the host Intel PT can still be reproduced
after the following operation:

rmmod kvm_intel
modprobe kvm-intel pt_mode=1 ept=1
rmmod kvm_intel
modprobe kvm-intel pt_mode=1 ept=0

Since the handle_intel_pt_intr is not reset to NULL in kvm_arch_hardware_unsetup(),
and the previous function pointer still exists in the generic KVM data structure.

But I have to say that this fix is much better than the one I proposed [1].

[1] https://lore.kernel.org/lkml/[email protected]/

> perf_register_guest_info_callbacks(&kvm_guest_cbs);
>
> if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
>

2021-08-25 16:03:15

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest

On Wed, Aug 25, 2021, Like Xu wrote:
> On 24/8/2021 3:37 am, Sean Christopherson wrote:
> > @@ -11061,6 +11061,8 @@ int kvm_arch_hardware_setup(void *opaque)
> > memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
> > kvm_ops_static_call_update();
> > + if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
> > + kvm_guest_cbs.handle_intel_pt_intr = kvm_handle_intel_pt_intr;
>
> Emm, it's still buggy.
>
> The guest "unknown NMI" from the host Intel PT can still be reproduced
> after the following operation:
>
> rmmod kvm_intel
> modprobe kvm-intel pt_mode=1 ept=1
> rmmod kvm_intel
> modprobe kvm-intel pt_mode=1 ept=0
>
> Since the handle_intel_pt_intr is not reset to NULL in kvm_arch_hardware_unsetup(),
> and the previous function pointer still exists in the generic KVM data structure.

Ooof, good catch. Any preference between nullifying handle_intel_pt_intr in
setup() vs. unsetup()? I think I like the idea of "unwinding" during unsetup(),
even though it splits the logic a bit.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b5ade47dad9c..ffc6c2d73508 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11093,6 +11093,7 @@ int kvm_arch_hardware_setup(void *opaque)
void kvm_arch_hardware_unsetup(void)
{
perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+ kvm_guest_cbs.handle_intel_pt_intr = NULL;

static_call(kvm_x86_hardware_unsetup)();
}


vs.


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b5ade47dad9c..462aa7a360e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11063,6 +11063,8 @@ int kvm_arch_hardware_setup(void *opaque)

if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
kvm_guest_cbs.handle_intel_pt_intr = kvm_handle_intel_pt_intr;
+ else
+ kvm_guest_cbs.handle_intel_pt_intr = NULL;
perf_register_guest_info_callbacks(&kvm_guest_cbs);

if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))

> But I have to say that this fix is much better than the one I proposed [1].
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> > perf_register_guest_info_callbacks(&kvm_guest_cbs);
> > if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> >

2021-08-25 21:13:12

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest

On Wed, Aug 25, 2021, Sean Christopherson wrote:
> On Wed, Aug 25, 2021, Like Xu wrote:
> > On 24/8/2021 3:37 am, Sean Christopherson wrote:
> > > @@ -11061,6 +11061,8 @@ int kvm_arch_hardware_setup(void *opaque)
> > > memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
> > > kvm_ops_static_call_update();
> > > + if (ops->intel_pt_intr_in_guest && ops->intel_pt_intr_in_guest())
> > > + kvm_guest_cbs.handle_intel_pt_intr = kvm_handle_intel_pt_intr;
> >
> > Emm, it's still buggy.
> >
> > The guest "unknown NMI" from the host Intel PT can still be reproduced
> > after the following operation:
> >
> > rmmod kvm_intel
> > modprobe kvm-intel pt_mode=1 ept=1
> > rmmod kvm_intel
> > modprobe kvm-intel pt_mode=1 ept=0
> >
> > Since the handle_intel_pt_intr is not reset to NULL in kvm_arch_hardware_unsetup(),
> > and the previous function pointer still exists in the generic KVM data structure.
>
> Ooof, good catch. Any preference between nullifying handle_intel_pt_intr in
> setup() vs. unsetup()? I think I like the idea of "unwinding" during unsetup(),
> even though it splits the logic a bit.

Never mind, I figured out a way to clean all this up and land the PT interrupt
handler in vmx.c where it belongs. Getting there is a bit of a journey, but it's
very doable. That means unwinding in unsetup() is the preferred approach,
otherwise there would be potential for leaving a dangling pointer if a different
vendor module was succesfully loaded.