2023-11-19 17:22:35

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 2/9] KVM: x86: Replace guts of "goverened" features with comprehensive cpu_caps

On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote:
> Replace the internals of the governed features framework with a more
> comprehensive "guest CPU capabilities" implementation, i.e. with a guest
> version of kvm_cpu_caps. Keep the skeleton of governed features around
> for now as vmx_adjust_sec_exec_control() relies on detecting governed
> features to do the right thing for XSAVES, and switching all guest feature
> queries to guest_cpu_cap_has() requires subtle and non-trivial changes,
> i.e. is best done as a standalone change.
>
> Tracking *all* guest capabilities that KVM cares will allow excising the
> poorly named "governed features" framework, and effectively optimizes all
> KVM queries of guest capabilities, i.e. doesn't require making a
> subjective decision as to whether or not a feature is worth "governing",
> and doesn't require adding the code to do so.
>
> The cost of tracking all features is currently 92 bytes per vCPU on 64-bit
> kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features.
> That cost is well worth paying even if the only benefit was eliminating
> the "governed features" terminology. And practically speaking, the real
> cost is zero unless those 92 bytes pushes the size of vcpu_vmx or vcpu_svm
> into a new order-N allocation, and if that happens there are better ways
> to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR
> state separate allocations.
>
> Suggested-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 40 ++++++++++++++++++++-------------
> arch/x86/kvm/cpuid.c | 4 +---
> arch/x86/kvm/cpuid.h | 14 ++++++------
> arch/x86/kvm/reverse_cpuid.h | 15 -------------
> 4 files changed, 32 insertions(+), 41 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d7036982332e..1d43dd5fdea7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -722,6 +722,22 @@ struct kvm_queued_exception {
> bool has_payload;
> };
>
> +/*
> + * Hardware-defined CPUID leafs that are either scattered by the kernel or are
> + * unknown to the kernel, but need to be directly used by KVM. Note, these
> + * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
> + */
> +enum kvm_only_cpuid_leafs {
> + CPUID_12_EAX = NCAPINTS,
> + CPUID_7_1_EDX,
> + CPUID_8000_0007_EDX,
> + CPUID_8000_0022_EAX,
> + NR_KVM_CPU_CAPS,
> +
> + NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> +};
> +
> +
> struct kvm_vcpu_arch {
> /*
> * rip and regs accesses must go through
> @@ -840,23 +856,15 @@ struct kvm_vcpu_arch {
> struct kvm_hypervisor_cpuid kvm_cpuid;
>
> /*
> - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
> - * when "struct kvm_vcpu_arch" is no longer defined in an
> - * arch/x86/include/asm header. The max is mostly arbitrary, i.e.
> - * can be increased as necessary.
> + * Track the effective guest capabilities, i.e. the features the vCPU
> + * is allowed to use. Typically, but not always, features can be used
> + * by the guest if and only if both KVM and userspace want to expose
> + * the feature to the guest. A common exception is for virtualization
> + * holes, i.e. when KVM can't prevent the guest from using a feature,
> + * in which case the vCPU "has" the feature regardless of what KVM or
> + * userspace desires.
> */
> -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG
> -
> - /*
> - * Track whether or not the guest is allowed to use features that are
> - * governed by KVM, where "governed" means KVM needs to manage state
> - * and/or explicitly enable the feature in hardware. Typically, but
> - * not always, governed features can be used by the guest if and only
> - * if both KVM and userspace want to expose the feature to the guest.
> - */
> - struct {
> - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES);
> - } governed_features;
> + u32 cpu_caps[NR_KVM_CPU_CAPS];

Won't it be better to call this 'effective_cpu_caps' or something like that,
to put emphasis on the fact that these are not exactly the cpu caps that userspace wants.
Although probably any name will still be somewhat confusing.

>
> u64 reserved_gpa_bits;
> int maxphyaddr;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 4f464187b063..4bf3c2d4dc7c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -327,9 +327,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> struct kvm_cpuid_entry2 *best;
> bool allow_gbpages;
>
> - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES);
> - bitmap_zero(vcpu->arch.governed_features.enabled,
> - KVM_MAX_NR_GOVERNED_FEATURES);
> + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
>
> /*
> * If TDP is enabled, let the guest use GBPAGES if they're supported in
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 245416ffa34c..9f18c4395b71 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -255,12 +255,12 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
> }
>
> static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
> - unsigned int x86_feature)
> + unsigned int x86_feature)
> {
> - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> + unsigned int x86_leaf = __feature_leaf(x86_feature);
>
> - __set_bit(kvm_governed_feature_index(x86_feature),
> - vcpu->arch.governed_features.enabled);
> + reverse_cpuid_check(x86_leaf);
> + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
> }
>
> static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> @@ -273,10 +273,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> unsigned int x86_feature)
> {
> - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> + unsigned int x86_leaf = __feature_leaf(x86_feature);
>
> - return test_bit(kvm_governed_feature_index(x86_feature),
> - vcpu->arch.governed_features.enabled);
> + reverse_cpuid_check(x86_leaf);
> + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
> }

It might make sense to think about extracting the common code between
kvm_cpu_cap* and guest_cpu_cap*.

The whole notion of reverse cpuid, KVM only leaves, and other nice things
that it has is already very confusing, but as I understand there is
no better way of doing it.
But there must be a way to avoid at least duplicating this logic.

Also speaking of this logic, it would be nice to document it.
E.g for 'kvm_only_cpuid_leafs' it would be nice to have an explanation
for each entry on why it is needed.


Just curious: I wonder why Intel called them leaves?
CPUID leaves are just table entries, I don't see any tree there.

Finally isn't plural of "leaf" is "leaves"?

>
> #endif
> diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
> index b81650678375..4b658491e8f8 100644
> --- a/arch/x86/kvm/reverse_cpuid.h
> +++ b/arch/x86/kvm/reverse_cpuid.h
> @@ -6,21 +6,6 @@
> #include <asm/cpufeature.h>
> #include <asm/cpufeatures.h>
>
> -/*
> - * Hardware-defined CPUID leafs that are either scattered by the kernel or are
> - * unknown to the kernel, but need to be directly used by KVM. Note, these
> - * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
> - */
> -enum kvm_only_cpuid_leafs {
> - CPUID_12_EAX = NCAPINTS,
> - CPUID_7_1_EDX,
> - CPUID_8000_0007_EDX,
> - CPUID_8000_0022_EAX,
> - NR_KVM_CPU_CAPS,
> -
> - NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> -};
> -
> /*
> * Define a KVM-only feature flag.
> *

Best regards,
Maxim Levitsky




2023-11-28 01:24:33

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 2/9] KVM: x86: Replace guts of "goverened" features with comprehensive cpu_caps

On Sun, Nov 19, 2023, Maxim Levitsky wrote:
> On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote:
> > @@ -840,23 +856,15 @@ struct kvm_vcpu_arch {
> > struct kvm_hypervisor_cpuid kvm_cpuid;
> >
> > /*
> > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
> > - * when "struct kvm_vcpu_arch" is no longer defined in an
> > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e.
> > - * can be increased as necessary.
> > + * Track the effective guest capabilities, i.e. the features the vCPU
> > + * is allowed to use. Typically, but not always, features can be used
> > + * by the guest if and only if both KVM and userspace want to expose
> > + * the feature to the guest. A common exception is for virtualization
> > + * holes, i.e. when KVM can't prevent the guest from using a feature,
> > + * in which case the vCPU "has" the feature regardless of what KVM or
> > + * userspace desires.
> > */
> > -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG
> > -
> > - /*
> > - * Track whether or not the guest is allowed to use features that are
> > - * governed by KVM, where "governed" means KVM needs to manage state
> > - * and/or explicitly enable the feature in hardware. Typically, but
> > - * not always, governed features can be used by the guest if and only
> > - * if both KVM and userspace want to expose the feature to the guest.
> > - */
> > - struct {
> > - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES);
> > - } governed_features;
> > + u32 cpu_caps[NR_KVM_CPU_CAPS];
>
> Won't it be better to call this 'effective_cpu_caps' or something like that,
> to put emphasis on the fact that these are not exactly the cpu caps that
> userspace wants. Although probably any name will still be somewhat
> confusing.

I'd prefer to keep 'cpu_caps' so that the name is aligned with the APIs, e.g. I
think having "effective" in the field but not e.g. guest_cpu_cap_set() would be
even more confusing.

Also, looking at this again, "effective" isn't strictly correct (my comment about
is also wrong), as virtualization holes that neither userspace nor KVM cares about
aren't reflected in CPUID caps. E.g. things like MOVBE and POPCNT have a CPUID
flag but no way to prevent the guest from using them.

So a truly accurate name would have to be something like
effective_cpu_caps_that_kvm_cares_about. :-)

> > u64 reserved_gpa_bits;
> > int maxphyaddr;
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 4f464187b063..4bf3c2d4dc7c 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -327,9 +327,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > struct kvm_cpuid_entry2 *best;
> > bool allow_gbpages;
> >
> > - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES);
> > - bitmap_zero(vcpu->arch.governed_features.enabled,
> > - KVM_MAX_NR_GOVERNED_FEATURES);
> > + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
> >
> > /*
> > * If TDP is enabled, let the guest use GBPAGES if they're supported in
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index 245416ffa34c..9f18c4395b71 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -255,12 +255,12 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
> > }
> >
> > static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
> > - unsigned int x86_feature)
> > + unsigned int x86_feature)
> > {
> > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> > + unsigned int x86_leaf = __feature_leaf(x86_feature);
> >
> > - __set_bit(kvm_governed_feature_index(x86_feature),
> > - vcpu->arch.governed_features.enabled);
> > + reverse_cpuid_check(x86_leaf);
> > + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
> > }
> >
> > static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> > @@ -273,10 +273,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> > unsigned int x86_feature)
> > {
> > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> > + unsigned int x86_leaf = __feature_leaf(x86_feature);
> >
> > - return test_bit(kvm_governed_feature_index(x86_feature),
> > - vcpu->arch.governed_features.enabled);
> > + reverse_cpuid_check(x86_leaf);
> > + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
> > }
>
> It might make sense to think about extracting the common code between
> kvm_cpu_cap* and guest_cpu_cap*.
>
> The whole notion of reverse cpuid, KVM only leaves, and other nice things
> that it has is already very confusing, but as I understand there is
> no better way of doing it.
> But there must be a way to avoid at least duplicating this logic.

Yeah, that thought crossed my mind too, but de-duplicating the interesting bits
would require macros, which I think would be a net negative for this code. I
could definitely be convinced otherwise though, I do love me some macros :-)

Something easy I can, should, and will do regardless is to move reverse_cpuid_check()
into __feature_leaf(). It's already in __feature_bit(), and that would cut down
the copy+paste at least a little bit even if we do/don't use macros.

> Also speaking of this logic, it would be nice to document it.
> E.g for 'kvm_only_cpuid_leafs' it would be nice to have an explanation
> for each entry on why it is needed.

As in, which bits KVM cares about? That's guaranteed to become stale, and the
high level answer is always "because KVM needs it, but the kernel does not". The
actual bits are kinda sorta documented by KVM_X86_FEATURE() usage, and any
conflicts with the kernel's cpufeatures.h should result in a build error due to
KVM trying to redefined a macro.

> Just curious: I wonder why Intel called them leaves?
> CPUID leaves are just table entries, I don't see any tree there.

LOL, I suppose a tree without branches can still have leaves?

> Finally isn't plural of "leaf" is "leaves"?

Heh, yes, "leaves" is the more standar plural form of "leaf". And looking at the
SDM, "leafs" is used mostly for GETSEC and ENCL{S,U} "leafs". I spent a lot of
my formative x86 years doing SMX stuff, and then way to much time on SGX, so I
guess "leafs" just stuck with me.