2021-08-23 13:45:06

by Alexander Shishkin

Subject: [PATCH] kvm/x86: Fix PT "host mode"

Regardless of the "pt_mode", the kvm driver installs its interrupt handler
for Intel PT, which always overrides the native handler, causing data loss
inside kvm guests, while we're expecting to trace them.

Fix this by only installing kvm's perf_guest_cbs if pt_mode is set to
guest tracing.

Signed-off-by: Alexander Shishkin <[email protected]>
Fixes: ff9d07a0e7ce7 ("KVM: Implement perf callbacks for guest sampling")
Reported-by: Artem Kashkanov <[email protected]>
Tested-by: Artem Kashkanov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/x86.c | 10 ++++++++--
3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..84a1ed067f35 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1408,6 +1408,7 @@ struct kvm_x86_init_ops {
int (*disabled_by_bios)(void);
int (*check_processor_compatibility)(void);
int (*hardware_setup)(void);
+ int (*intel_pt_enabled)(void);

struct kvm_x86_ops *runtime_ops;
};
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bceb5ca3a89..0c239aa3532a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7943,11 +7943,17 @@ static __init int hardware_setup(void)
return r;
}

+static int vmx_intel_pt_enabled(void)
+{
+ return vmx_pt_mode_is_host_guest();
+}
+
static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
.check_processor_compatibility = vmx_check_processor_compat,
.hardware_setup = hardware_setup,
+ .intel_pt_enabled = vmx_intel_pt_enabled,

.runtime_ops = &vmx_x86_ops,
};
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..3ba0001e7388 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -268,6 +268,8 @@ static struct kmem_cache *x86_fpu_cache;

static struct kmem_cache *x86_emulator_cache;

+static int __read_mostly intel_pt_enabled;
+
/*
* When called, it means the previous get/set msr reached an invalid msr.
* Return true if we want to ignore/silent this failed msr access.
@@ -8194,7 +8196,10 @@ int kvm_arch_init(void *opaque)

kvm_timer_init();

- perf_register_guest_info_callbacks(&kvm_guest_cbs);
+ if (ops->intel_pt_enabled && ops->intel_pt_enabled()) {
+ perf_register_guest_info_callbacks(&kvm_guest_cbs);
+ intel_pt_enabled = 1;
+ }

if (boot_cpu_has(X86_FEATURE_XSAVE)) {
host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
@@ -8229,7 +8234,8 @@ void kvm_arch_exit(void)
clear_hv_tscchange_cb();
#endif
kvm_lapic_exit();
- perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+ if (intel_pt_enabled)
+ perf_unregister_guest_info_callbacks(&kvm_guest_cbs);

if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
--
2.32.0


2021-08-23 16:18:02

by Sean Christopherson

Subject: Re: [PATCH] kvm/x86: Fix PT "host mode"

On Mon, Aug 23, 2021, Alexander Shishkin wrote:
> Regardless of the "pt_mode", the kvm driver installs its interrupt handler
> for Intel PT, which always overrides the native handler, causing data loss
> inside kvm guests, while we're expecting to trace them.
>
> Fix this by only installing kvm's perf_guest_cbs if pt_mode is set to
> guest tracing.

Uh, regardless of the correctness of such a change (spoiler alert), making an
enormous leap from "one thing is wrong" to "nuke it all!" needs way more
justification/explanation. Or more realistically, such a leap should be a good
indication that the proposed change is not correct.

> Signed-off-by: Alexander Shishkin <[email protected]>
> Fixes: ff9d07a0e7ce7 ("KVM: Implement perf callbacks for guest sampling")

This should be another clue that the fix isn't correct. That patch is from 2010,
Intel PT was announced in 2013 and merged in 2019.

> Reported-by: Artem Kashkanov <[email protected]>
> Tested-by: Artem Kashkanov <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 6 ++++++
> arch/x86/kvm/x86.c | 10 ++++++++--
> 3 files changed, 15 insertions(+), 2 deletions(-)
>

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca616929..3ba0001e7388 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -268,6 +268,8 @@ static struct kmem_cache *x86_fpu_cache;
>
> static struct kmem_cache *x86_emulator_cache;
>
> +static int __read_mostly intel_pt_enabled;
> +
> /*
> * When called, it means the previous get/set msr reached an invalid msr.
> * Return true if we want to ignore/silent this failed msr access.
> @@ -8194,7 +8196,10 @@ int kvm_arch_init(void *opaque)
>
> kvm_timer_init();
>
> - perf_register_guest_info_callbacks(&kvm_guest_cbs);
> + if (ops->intel_pt_enabled && ops->intel_pt_enabled()) {

This is not remotely correct. vmx.c's "pt_mode", which is queried via this path,
is modified by hardware_setup(), a.k.a. kvm_x86_ops.hardware_setup(), which runs
_after_ this code. And as alluded to above, these are generic perf callbacks,
installing them if and only if Intel PT is enabled in a specific mode completely
breaks "regular" perf.
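
The ordering problem can be illustrated with a minimal userspace sketch. Every name below is a hypothetical stand-in, not the kernel's code, and the rule for when setup downgrades the mode is an assumption made for illustration. The point is only that a decision taken before hardware_setup() runs can disagree with the mode that is finally in effect.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vmx.c state; not the kernel definitions. */
enum pt_mode { PT_MODE_SYSTEM, PT_MODE_HOST_GUEST };

/* Mode as requested by the administrator (models the module parameter). */
static enum pt_mode pt_mode = PT_MODE_HOST_GUEST;

/* Assumption for this sketch: the hardware cannot do guest tracing. */
static bool hw_supports_pt_in_guest;

static bool pt_mode_is_host_guest(void)
{
	return pt_mode == PT_MODE_HOST_GUEST;
}

/* Models the vendor setup hook, which may override the requested mode. */
static void hardware_setup(void)
{
	if (!hw_supports_pt_in_guest)
		pt_mode = PT_MODE_SYSTEM;
}

/* Records what the early init step decided. */
static bool callbacks_registered;

/* Models the patched init path: it samples the mode before setup has run. */
static void arch_init(void)
{
	callbacks_registered = pt_mode_is_host_guest();
}
```

Calling arch_init() and then hardware_setup() leaves callbacks_registered set even though the final mode is no longer host/guest, which is the stale decision described above.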

I'll post a small series, there's a bit of code massage needed to fix this
properly. The PMI handler can also be optimized to avoid a retpoline when PT is
not exposed to the guest.

> + perf_register_guest_info_callbacks(&kvm_guest_cbs);
> + intel_pt_enabled = 1;
> + }
>
> if (boot_cpu_has(X86_FEATURE_XSAVE)) {
> host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> @@ -8229,7 +8234,8 @@ void kvm_arch_exit(void)
> clear_hv_tscchange_cb();
> #endif
> kvm_lapic_exit();
> - perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
> + if (intel_pt_enabled)
> + perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
>
> if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
> --
> 2.32.0
>

2021-08-23 17:15:36

by Alexander Shishkin

Subject: Re: [PATCH] kvm/x86: Fix PT "host mode"

Sean Christopherson <[email protected]> writes:

> On Mon, Aug 23, 2021, Alexander Shishkin wrote:
>
>> Signed-off-by: Alexander Shishkin <[email protected]>
>> Fixes: ff9d07a0e7ce7 ("KVM: Implement perf callbacks for guest sampling")
>
> This should be another clue that the fix isn't correct.
> That patch is from 2010,

Right, this should have been 8479e04e7d6b1 ("KVM: x86: Inject PMI for
KVM guest") instead.

> Intel PT was announced in 2013 and merged in 2019.

Technically, 2019 is when kvm started breaking host PT.

> This is not remotely correct. vmx.c's "pt_mode", which is queried via this path,
> is modified by hardware_setup(), a.k.a. kvm_x86_ops.hardware_setup(), which runs
> _after_ this code. And as alluded to above, these are generic perf callbacks,
> installing them if and only if Intel PT is enabled in a specific mode completely
> breaks "regular" perf.

I see your point, the callchain code will catch fire.

> I'll post a small series, there's a bit of code massage needed to fix this
> properly. The PMI handler can also be optimized to avoid a retpoline when PT is
> not exposed to the guest.

The actual PMU handler also needs to know that kvm won't be needing it
so it can call the regular PT handler.

One could unset cbs->handle_intel_pt_intr() or one could have it return
different things depending on whether it was actually taken in kvm. But
both are rather disgusting.
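
For illustration only, the two options can be sketched in plain userspace C. All names here are hypothetical and none of this is the perf or kvm interface. A NULL callback models "unset", and a boolean return value models "was it actually taken in kvm".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback table; a NULL pointer models an unset callback. */
struct guest_cbs {
	bool (*handle_pt_intr)(void);
};

static bool in_guest;      /* models "the interrupt arrived while in a guest" */
static int guest_handled;  /* interrupts consumed by the guest-side handler */
static int host_handled;   /* interrupts that reached the native handler */

/* Guest-side handler: reports whether it actually consumed the interrupt. */
static bool kvm_handle_pt_intr(void)
{
	if (!in_guest)
		return false;
	guest_handled++;
	return true;
}

/* Models the native PT handler. */
static void host_pt_handler(void)
{
	host_handled++;
}

/* PMI path: fall back to the native handler unless the guest side took it. */
static void pt_pmi(const struct guest_cbs *cbs)
{
	if (cbs && cbs->handle_pt_intr && cbs->handle_pt_intr())
		return;
	host_pt_handler();
}
```

With no callback table, or with the callback declining, the interrupt reaches host_pt_handler(); only an interrupt taken in the guest is consumed by the guest-side handler.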

Regards,
--
Alex