2024-03-01 07:44:54

by Sandipan Das

[permalink] [raw]
Subject: [PATCH] KVM: x86: Do not mask LVTPC when handling a PMI on AMD platforms

On AMD and Hygon platforms, the local APIC does not automatically set
the mask bit of the LVTPC register when handling a PMI and there is
no need to clear it in the kernel's PMI handler.

For guests, the mask bit is currently set by kvm_apic_local_deliver()
and unless it is cleared by the guest kernel's PMI handler, PMIs stop
arriving and break use-cases like sampling with perf record.

This does not affect non-PerfMonV2 guests because PMIs are handled in
the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC
mask bit irrespective of the vendor.

Before:

$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ]

After:

$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]

Fixes: a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/kvm/lapic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3242f3da2457..0959a887c306 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2768,7 +2768,7 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;

r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
- if (r && lvt_type == APIC_LVTPC)
+ if (r && lvt_type == APIC_LVTPC && !guest_cpuid_is_amd_or_hygon(apic->vcpu))
kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);
return r;
}
--
2.34.1



2024-03-01 19:24:42

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Do not mask LVTPC when handling a PMI on AMD platforms

On Thu, Feb 29, 2024 at 11:44 PM Sandipan Das <[email protected]> wrote:
>
> On AMD and Hygon platforms, the local APIC does not automatically set
> the mask bit of the LVTPC register when handling a PMI and there is
> no need to clear it in the kernel's PMI handler.

I don't know why it didn't occur to me that different x86 vendors
wouldn't agree on this specification. :)

> For guests, the mask bit is currently set by kvm_apic_local_deliver()
> and unless it is cleared by the guest kernel's PMI handler, PMIs stop
> arriving and break use-cases like sampling with perf record.
>
> This does not affect non-PerfMonV2 guests because PMIs are handled in
> the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC
> mask bit irrespective of the vendor.
>
> Before:
>
> $ perf record -e cycles:u true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ]
>
> After:
>
> $ perf record -e cycles:u true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]
>
> Fixes: a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
> Signed-off-by: Sandipan Das <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 3242f3da2457..0959a887c306 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2768,7 +2768,7 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
> trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
>
> r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
> - if (r && lvt_type == APIC_LVTPC)
> + if (r && lvt_type == APIC_LVTPC && !guest_cpuid_is_amd_or_hygon(apic->vcpu))

Perhaps we could use a positive predicate instead:
guest_cpuid_is_intel(apic->vcpu)?

> kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);
> return r;
> }
> --
> 2.34.1
Reviewed-by: Jim Mattson <[email protected]>

2024-03-04 20:50:05

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Do not mask LVTPC when handling a PMI on AMD platforms

On Fri, Mar 01, 2024, Jim Mattson wrote:
> On Thu, Feb 29, 2024 at 11:44 PM Sandipan Das <[email protected]> wrote:
> >
> > On AMD and Hygon platforms, the local APIC does not automatically set
> > the mask bit of the LVTPC register when handling a PMI and there is
> > no need to clear it in the kernel's PMI handler.
>
> I don't know why it didn't occur to me that different x86 vendors
> wouldn't agree on this specification. :)

Because you're sane? :-D

> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 3242f3da2457..0959a887c306 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2768,7 +2768,7 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
> > trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> >
> > r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
> > - if (r && lvt_type == APIC_LVTPC)
> > + if (r && lvt_type == APIC_LVTPC && !guest_cpuid_is_amd_or_hygon(apic->vcpu))
>
> Perhaps we could use a positive predicate instead:
> guest_cpuid_is_intel(apic->vcpu)?

AFAICT, Zhaoxin follows intel behavior, so we'd theoretically have to allow for
that too. The many checks of guest_cpuid_is_intel() in KVM suggest that no one
actually cares about about correctly virtualizing Zhaoxin CPUs, but it's an easy
enough problem to solve.

I'm also very tempted to say KVM should cache the Intel vs. AMD vCPU model. E.g.
if userspace does something weird with guest CPUID and puts CPUID.0x0 somewhere
other than the zeroth entry, KVM's linear walk to find a CPUID entry will make
this a pretty slow lookup.

Then we could also encapsulate the gory details for Intel vs. Zhaoxin vs. Centaur,
and AMD vs. Hygon, e.g.

if (r && lvt_type == APIC_LVTPC &&
apic->vcpu->arch.is_model_intel_compatible)

2024-03-07 05:39:49

by Sandipan Das

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Do not mask LVTPC when handling a PMI on AMD platforms

On 3/5/2024 2:19 AM, Sean Christopherson wrote:
> On Fri, Mar 01, 2024, Jim Mattson wrote:
>> On Thu, Feb 29, 2024 at 11:44 PM Sandipan Das <[email protected]> wrote:
>>>
>>> On AMD and Hygon platforms, the local APIC does not automatically set
>>> the mask bit of the LVTPC register when handling a PMI and there is
>>> no need to clear it in the kernel's PMI handler.
>>
>> I don't know why it didn't occur to me that different x86 vendors
>> wouldn't agree on this specification. :)
>
> Because you're sane? :-D
>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 3242f3da2457..0959a887c306 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -2768,7 +2768,7 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
>>> trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
>>>
>>> r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
>>> - if (r && lvt_type == APIC_LVTPC)
>>> + if (r && lvt_type == APIC_LVTPC && !guest_cpuid_is_amd_or_hygon(apic->vcpu))
>>
>> Perhaps we could use a positive predicate instead:
>> guest_cpuid_is_intel(apic->vcpu)?
>
> AFAICT, Zhaoxin follows intel behavior, so we'd theoretically have to allow for
> that too. The many checks of guest_cpuid_is_intel() in KVM suggest that no one
> actually cares about about correctly virtualizing Zhaoxin CPUs, but it's an easy
> enough problem to solve.
>
> I'm also very tempted to say KVM should cache the Intel vs. AMD vCPU model. E.g.
> if userspace does something weird with guest CPUID and puts CPUID.0x0 somewhere
> other than the zeroth entry, KVM's linear walk to find a CPUID entry will make
> this a pretty slow lookup.
>
> Then we could also encapsulate the gory details for Intel vs. Zhaoxin vs. Centaur,
> and AMD vs. Hygon, e.g.
>
> if (r && lvt_type == APIC_LVTPC &&
> apic->vcpu->arch.is_model_intel_compatible)

The following are excerpts from some changes that I have been working on. Instead
of a boolean flag, this saves the compatible vendor in kvm_vcpu_arch and tries
to match it against X86_VENDOR_* values. The goal is to replace
guest_cpuid_is_{intel,amd_or_hygon}() with
guest_vendor_is_compatible(vcpu, X86_VENDOR_{INTEL,AMD}). Is this viable?

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d271ba20a0b2..c4ada5b12fc3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1042,6 +1042,7 @@ struct kvm_vcpu_arch {
#if IS_ENABLED(CONFIG_HYPERV)
hpa_t hv_root_tdp;
#endif
+ u8 compat_vendor;
};

struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index adba49afb5fe..00170762e72a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -376,6 +376,13 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
vcpu->arch.cpuid_nent));

+ if (guest_cpuid_is_intel_compatible(vcpu))
+ vcpu->arch.compat_vendor = X86_VENDOR_INTEL;
+ else if (guest_cpuid_is_amd_or_hygon(vcpu))
+ vcpu->arch.compat_vendor = X86_VENDOR_AMD;
+ else
+ vcpu->arch.compat_vendor = X86_VENDOR_UNKNOWN;
+
/* Invoke the vendor callback only after the above state is updated. */
static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 856e3037e74f..8c73db586231 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -120,6 +120,17 @@ static inline bool guest_cpuid_is_intel(struct kvm_vcpu *vcpu)
return best && is_guest_vendor_intel(best->ebx, best->ecx, best->edx);
}

+static inline bool guest_cpuid_is_intel_compatible(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry(vcpu, 0);
+ return best &&
+ (is_guest_vendor_intel(best->ebx, best->ecx, best->edx) ||
+ is_guest_vendor_centaur(best->ebx, best->ecx, best->edx) ||
+ is_guest_vendor_zhaoxin(best->ebx, best->ecx, best->edx));
+}
+
static inline int guest_cpuid_family(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -142,6 +153,11 @@ static inline int guest_cpuid_model(struct kvm_vcpu *vcpu)
return x86_model(best->eax);
}

+static inline bool guest_vendor_is_compatible(struct kvm_vcpu *vcpu, u8 vendor)
+{
+ return vcpu->arch.compat_vendor == vendor;
+}
+
static inline bool cpuid_model_is_consistent(struct kvm_vcpu *vcpu)
{
return boot_cpu_data.x86_model == guest_cpuid_model(vcpu);


2024-03-07 18:36:09

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: Do not mask LVTPC when handling a PMI on AMD platforms

On Thu, Mar 07, 2024, Sandipan Das wrote:
> On 3/5/2024 2:19 AM, Sean Christopherson wrote:
> The following are excerpts from some changes that I have been working on. Instead
> of a boolean flag, this saves the compatible vendor in kvm_vcpu_arch and tries
> to match it against X86_VENDOR_* values. The goal is to replace
> guest_cpuid_is_{intel,amd_or_hygon}() with
> guest_vendor_is_compatible(vcpu, X86_VENDOR_{INTEL,AMD}). Is this viable?
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d271ba20a0b2..c4ada5b12fc3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1042,6 +1042,7 @@ struct kvm_vcpu_arch {
> #if IS_ENABLED(CONFIG_HYPERV)
> hpa_t hv_root_tdp;
> #endif
> + u8 compat_vendor;

Ooh, clever, much better than my idea of using multiple booleans.

One potential flaw though: the vCPU structure is zero-allocated, and so this will
get a false positive on X86_VENDOR_INTEL if userspace never sets guest CPUID.

That might actually be desirable though? E.g. it's closer to KVM's current
behavior. Maybe. I dunno :-)

Anyways, KVM *does* need to be consistent, i.e. the default needs to yield the
same behavior as guest CPUID without entry 0x0 so that setting *other* CPUID
entries doesn't change from INTEL=>UNKNOWN. More below.

> };
>
> struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index adba49afb5fe..00170762e72a 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -376,6 +376,13 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
> vcpu->arch.cpuid_nent));
>
> + if (guest_cpuid_is_intel_compatible(vcpu))
> + vcpu->arch.compat_vendor = X86_VENDOR_INTEL;
> + else if (guest_cpuid_is_amd_or_hygon(vcpu))
> + vcpu->arch.compat_vendor = X86_VENDOR_AMD;
> + else
> + vcpu->arch.compat_vendor = X86_VENDOR_UNKNOWN;

I would prefer to provide a helper for just getting the compat vendor. E.g.

static u8 kvm_vcpu_get_compat_vendor(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

best = kvm_find_cpuid_entry(vcpu, 0);
if (!best)
return ???;

if (is_guest_vendor_intel(best->ebx, best->ecx, best->edx) ||
is_guest_vendor_centaur(best->ebx, best->ecx, best->edx) ||
is_guest_vendor_zhaoxin(best->ebx, best->ecx, best->edx))
return X86_VENDOR_INTEL;

if (is_guest_vendor_amd(best->ebx, best->ecx, best->edx) ||
is_guest_vendor_hygon(best->ebx, best->ecx, best->edx))
return X86_VENDOR_AMD;

return X86_VENDOR_UNKNOWN;
}

and then a follow-up patch can remove guest_cpuid_is_amd_or_hygon() once all
users are converted to guest_vendor_is_compatible().