2021-08-05 22:01:24

by Sean Christopherson

Subject: Re: [PATCH v1 1/3] KVM: x86: Convert TDP level calculation to vendor's specific code

On Thu, Aug 05, 2021, Wei Huang wrote:
> Currently the TDP level for x86 vCPU is calculated by checking both
> MAXPHYADDR and max_tdp_level. This design assumes that all x86 CPUs have
> the flexibility of changing the nested page table level different from host
> CPU. This assumption might not be true.

Heh, no need to be circumspect, just state that 5-level NPT inherits CR4.LA57
from the host. I didn't fully understand this sentence until I looked at patch 3.

> To solve this problem, let us
> create a kvm_x86_ops specific function for TDP level calculation.
>
> Signed-off-by: Wei Huang <[email protected]>
> ---

...

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 974cbfb1eefe..20ddfbac966e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -723,7 +723,6 @@ struct kvm_vcpu_arch {
>
> u64 reserved_gpa_bits;
> int maxphyaddr;
> - int max_tdp_level;

Ha, this is leftover crud that can get zapped no matter what.

> /* emulate context */
>

...

> -static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
> -{
> - /* Use 5-level TDP if and only if it's useful/necessary. */
> - if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)

I'd strongly prefer to keep this logic in the MMU. When this was in vendor code,
there were multiple bugs where the MMU and VMX didn't communicate correctly; I
really don't want to go back down that road.

Actually, I'm very, very tempted to say we should simply drop the cpuid_maxphyaddr()
bit and just return the max level (and I suppose rename it), e.g.

return mmu_tdp_level;

It's effectively a single 4KB page per VM, and Intel's numbers on 5-level paging
showed no measurable cost for the extra level. I would hope that holds true here,
too.
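
To make that concrete, a minimal sketch of the simplified helper, assuming the
mmu_tdp_level name from the snippet above (the exact rename and placement are
my guess, nothing settled in this thread):

static int mmu_tdp_level __read_mostly;	/* set by kvm_configure_mmu() */

static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
{
	/*
	 * No cpuid_maxphyaddr() check: always use the level the vendor
	 * module asked for, eating the single extra 4KB root page per VM
	 * and (hopefully) no measurable walk overhead.
	 */
	return mmu_tdp_level;
}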

If we want to keep the MAXPHYADDR behavior, I'd vote for something like:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b4b65c21b2ca..7e35f2bf89b4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
bool tdp_enabled = false;

static int max_huge_page_level __read_mostly;
+static int tdp_root_level __read_mostly;
static int max_tdp_level __read_mostly;

enum {
@@ -4645,6 +4646,9 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,

static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
{
+ if (tdp_root_level)
+ return tdp_root_level;
+
/* Use 5-level TDP if and only if it's useful/necessary. */
if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
return 4;
@@ -5336,10 +5340,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
*/
}

-void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
- int tdp_huge_page_level)
+void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
+ int tdp_max_root_level, int tdp_huge_page_level)
{
tdp_enabled = enable_tdp;
+ tdp_root_level = tdp_forced_root_level;
max_tdp_level = tdp_max_root_level;

/*
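
For illustration, the two vendor callers would then look roughly like this (a
sketch only; the exact SVM helper comes from patch 3, which isn't shown here,
so treat get_npt_level() as a placeholder):

	/* vmx.c: no forced level, keep the MAXPHYADDR-based optimization. */
	kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(),
			  ept_lpage_level);

	/* svm.c: NPT inherits CR4.LA57 from the host, so force the root level. */
	kvm_configure_mmu(npt_enabled, get_npt_level(), get_npt_level(),
			  PG_LEVEL_1G);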


2021-08-06 01:29:19

by Wei Huang

Subject: Re: [PATCH v1 1/3] KVM: x86: Convert TDP level calculation to vendor's specific code



On 8/5/21 4:51 PM, Sean Christopherson wrote:
> On Thu, Aug 05, 2021, Wei Huang wrote:
>> Currently the TDP level for x86 vCPU is calculated by checking both
>> MAXPHYADDR and max_tdp_level. This design assumes that all x86 CPUs have
>> the flexibility of changing the nested page table level different from host
>> CPU. This assumption might not be true.
>
> Heh, no need to be circumspect, just state that 5-level NPT inherits CR4.LA57
> from the host. I didn't fully understand this sentence until I looked at patch 3.

Sure, I will fix the commit message.

>
>> To solve this problem, let us
>> create a kvm_x86_ops specific function for TDP level calculation.
>>
>> Signed-off-by: Wei Huang <[email protected]>
>> ---
>
> ...
>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 974cbfb1eefe..20ddfbac966e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -723,7 +723,6 @@ struct kvm_vcpu_arch {
>>
>> u64 reserved_gpa_bits;
>> int maxphyaddr;
>> - int max_tdp_level;
>
> Ha, this is leftover crud that can get zapped no matter what.
>

Correct, this field is unused at this point and should be removed.

>> /* emulate context */
>>
>
> ...
>
>> -static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
>> -{
>> - /* Use 5-level TDP if and only if it's useful/necessary. */
>> - if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
>
> I'd strongly prefer to keep this logic in the MMU. When this was in vendor code,
> there were multiple bugs where the MMU and VMX didn't communicate correctly; I
> really don't want to go back down that road.
>
> Actually, I'm very, very tempted to say we should simply drop the cpuid_maxphyaddr()
> bit and just return the max level (and I suppose rename it), e.g.
>
> return mmu_tdp_level;
>
> It's effectively a single 4KB page per VM, and Intel's numbers on 5-level paging
> showed no measurable cost for the extra level. I would hope that holds true here,
> too.

Wasting 4KB per VM is probably OK. My concern is the unnecessary perf cost of
the extra level. But if you think the hit is minimal, then returning
mmu_tdp_level without checking cpuid_maxphyaddr() is cleaner.

>
> If we want to keep the MAXPHYADDR behavior, I'd vote for something like:
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b4b65c21b2ca..7e35f2bf89b4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> bool tdp_enabled = false;
>
> static int max_huge_page_level __read_mostly;
> +static int tdp_root_level __read_mostly;
> static int max_tdp_level __read_mostly;
>
> enum {
> @@ -4645,6 +4646,9 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
>
> static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
> {
> + if (tdp_root_level)
> + return tdp_root_level;
> +
> /* Use 5-level TDP if and only if it's useful/necessary. */
> if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
> return 4;
> @@ -5336,10 +5340,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
> */
> }
>
> -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
> - int tdp_huge_page_level)
> +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> + int tdp_max_root_level, int tdp_huge_page_level)
> {
> tdp_enabled = enable_tdp;
> + tdp_root_level = tdp_forced_root_level;
> max_tdp_level = tdp_max_root_level;
>
> /*
>

2021-08-08 19:54:07

by Wei Huang

Subject: Re: [PATCH v1 1/3] KVM: x86: Convert TDP level calculation to vendor's specific code



On 8/5/21 4:51 PM, Sean Christopherson wrote:
>
> If we want to keep the MAXPHYADDR behavior, I'd vote for something like:
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b4b65c21b2ca..7e35f2bf89b4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -97,6 +97,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> bool tdp_enabled = false;
>
> static int max_huge_page_level __read_mostly;
> +static int tdp_root_level __read_mostly;
> static int max_tdp_level __read_mostly;
>
> enum {
> @@ -4645,6 +4646,9 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
>
> static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
> {
> + if (tdp_root_level)
> + return tdp_root_level;
> +
> /* Use 5-level TDP if and only if it's useful/necessary. */
> if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
> return 4;
> @@ -5336,10 +5340,11 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
> */
> }
>
> -void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
> - int tdp_huge_page_level)
> +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> + int tdp_max_root_level, int tdp_huge_page_level)
> {
> tdp_enabled = enable_tdp;
> + tdp_root_level = tdp_forced_root_level;
> max_tdp_level = tdp_max_root_level;
>
> /*
>

I decided to take this suggestion in v2: it keeps the option of avoiding a
5-level table (memory cost and potential table-walk overhead) whenever the
vendor code still has the flexibility to use 4-level TDP, while NPT can force
5-level paging to match the host under LA57.
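
For reference, a minimal sketch of what the SVM side could look like with this
approach (my reconstruction, not the actual v2 patch): the forced root level
simply mirrors the host paging mode, while a vendor that passes 0 (e.g. EPT)
keeps the MAXPHYADDR-based 4-level optimization.

static int get_npt_level(void)
{
#ifdef CONFIG_X86_64
	/* 5-level NPT inherits CR4.LA57 from the host, so match the host. */
	return pgtable_l5_enabled() ? PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
#else
	return PT32E_ROOT_LEVEL;
#endif
}

	/* In svm_hardware_setup(): force the root level and report it as the max. */
	kvm_configure_mmu(npt_enabled, get_npt_level(), get_npt_level(), PG_LEVEL_1G);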