Add new guest_phys_bits field to kvm_caps, return the value to
userspace when asked for KVM_CAP_VM_GPA_BITS capability.
Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
Vendor modules (i.e. vmx and svm) can adjust this field in case
additional restrictions apply, for example in case EPT has no
support for 5-level paging.
Signed-off-by: Gerd Hoffmann <[email protected]>
---
arch/x86/kvm/x86.h | 2 ++
arch/x86/kvm/x86.c | 5 +++++
2 files changed, 7 insertions(+)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..e03aec3527f8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -24,6 +24,8 @@ struct kvm_caps {
bool has_bus_lock_exit;
/* notify VM exit supported? */
bool has_notify_vmexit;
+ /* usable guest phys bits */
+ u32 guest_phys_bits;
u64 supported_mce_cap;
u64 supported_xcr0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48a61d283406..e270b9b708d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
r |= BIT(KVM_X86_SW_PROTECTED_VM);
break;
+ case KVM_CAP_VM_GPA_BITS:
+ r = kvm_caps.guest_phys_bits;
+ break;
default:
break;
}
@@ -9706,6 +9709,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
+ kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
+
r = ops->hardware_setup();
if (r != 0)
goto out_mmu_exit;
--
2.44.0
On Fri, Mar 01, 2024 at 11:14:07AM +0100, Gerd Hoffmann wrote:
> Add new guest_phys_bits field to kvm_caps, return the value to
> userspace when asked for KVM_CAP_VM_GPA_BITS capability.
>
> Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
> Vendor modules (i.e. vmx and svm) can adjust this field in case
> additional restrictions apply, for example in case EPT has no
> support for 5-level paging.
>
> Signed-off-by: Gerd Hoffmann <[email protected]>
> ---
> arch/x86/kvm/x86.h | 2 ++
> arch/x86/kvm/x86.c | 5 +++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 2f7e19166658..e03aec3527f8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -24,6 +24,8 @@ struct kvm_caps {
> bool has_bus_lock_exit;
> /* notify VM exit supported? */
> bool has_notify_vmexit;
> + /* usable guest phys bits */
> + u32 guest_phys_bits;
>
> u64 supported_mce_cap;
> u64 supported_xcr0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a61d283406..e270b9b708d1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
> r |= BIT(KVM_X86_SW_PROTECTED_VM);
> break;
> + case KVM_CAP_VM_GPA_BITS:
> + r = kvm_caps.guest_phys_bits;
> + break;
> default:
> break;
> }
> @@ -9706,6 +9709,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
> rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
>
> + kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
here, you can check in detect_tme().
Thanks,
Tao
> +
> r = ops->hardware_setup();
> if (r != 0)
> goto out_mmu_exit;
> --
> 2.44.0
>
>
> > + kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
>
> When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> here, you can check in detect_tme().
from detect_tme():
/*
* KeyID bits effectively lower the number of physical address
* bits. Update cpuinfo_x86::x86_phys_bits accordingly.
*/
c->x86_phys_bits -= keyid_bits;
This looks like x86_phys_bits gets adjusted if needed.
take care,
Gerd
On Mon, Mar 04, 2024 at 04:59:32PM +0800, Tao Su wrote:
> On Mon, Mar 04, 2024 at 09:43:53AM +0100, Gerd Hoffmann wrote:
> > > > + kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> > >
> > > When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> > > here, you can check in detect_tme().
> >
> > from detect_tme():
> >
> > /*
> > * KeyID bits effectively lower the number of physical address
> > * bits. Update cpuinfo_x86::x86_phys_bits accordingly.
> > */
> > c->x86_phys_bits -= keyid_bits;
> >
> > This looks like x86_phys_bits gets adjusted if needed.
>
> If TDP is enabled and supports 5-level, we want kvm_caps.guest_phys_bits=52,
> but c->x86_phys_bits!=52 here.
Do you talk about EPT or NPT or both?
> Maybe we need to set kvm_caps.guest_phys_bits
> according to whether TDP is enabled or not, like leaf 0x80000008 in
> __do_cpuid_func().
See patches 2+3 of this series.
Maybe it is better to just not set kvm_caps.guest_phys_bits in generic
kvm code and leave that completely to vmx / svm vendor modules. Or let
the generic code handle the !tdp_enabled case and have the vendor
modules override (considering EPT / NPT limitations) in case tdp is
enabled.
take care,
Gerd
On Mon, Mar 04, 2024 at 09:43:53AM +0100, Gerd Hoffmann wrote:
> > > + kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> >
> > When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> > here, you can check in detect_tme().
>
> from detect_tme():
>
> /*
> * KeyID bits effectively lower the number of physical address
> * bits. Update cpuinfo_x86::x86_phys_bits accordingly.
> */
> c->x86_phys_bits -= keyid_bits;
>
> This looks like x86_phys_bits gets adjusted if needed.
If TDP is enabled and supports 5-level, we want kvm_caps.guest_phys_bits=52,
but c->x86_phys_bits!=52 here. Maybe we need to set kvm_caps.guest_phys_bits
according to whether TDP is enabled or not, like leaf 0x80000008 in
__do_cpuid_func().
Thanks,
Tao
On Fri, Mar 01, 2024, Gerd Hoffmann wrote:
> Add new guest_phys_bits field to kvm_caps, return the value to
> userspace when asked for KVM_CAP_VM_GPA_BITS capability.
>
> Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
> Vendor modules (i.e. vmx and svm) can adjust this field in case
> additional restrictions apply, for example in case EPT has no
> support for 5-level paging.
>
> Signed-off-by: Gerd Hoffmann <[email protected]>
> ---
> arch/x86/kvm/x86.h | 2 ++
> arch/x86/kvm/x86.c | 5 +++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 2f7e19166658..e03aec3527f8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -24,6 +24,8 @@ struct kvm_caps {
> bool has_bus_lock_exit;
> /* notify VM exit supported? */
> bool has_notify_vmexit;
> + /* usable guest phys bits */
> + u32 guest_phys_bits;
>
> u64 supported_mce_cap;
> u64 supported_xcr0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a61d283406..e270b9b708d1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
> r |= BIT(KVM_X86_SW_PROTECTED_VM);
> break;
> + case KVM_CAP_VM_GPA_BITS:
> + r = kvm_caps.guest_phys_bits;
This is not a fast path, just compute the effective guest.MAXPHYADDR on the fly
using tdp_root_level and max_tdp_level. But as pointed out and discussed in the
previous thread, adverising a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR
simply doesn't work[*].
I thought the plan was to add a way for KVM to advertise the maximum *addressable*
GPA, and figure out a way to communicate that to the guest, e.g. so that firmware
doesn't try to use legal GPAs that the host cannot address.
Paolo, any update on this?
[*] https://lore.kernel.org/all/CALMp9eTutnTxCjQjs-nxP=XC345vTmJJODr+PcSOeaQpBW0Skw@mail.gmail.com
On 3/4/2024 11:15 PM, Sean Christopherson wrote:
> On Fri, Mar 01, 2024, Gerd Hoffmann wrote:
>> Add new guest_phys_bits field to kvm_caps, return the value to
>> userspace when asked for KVM_CAP_VM_GPA_BITS capability.
>>
>> Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
>> Vendor modules (i.e. vmx and svm) can adjust this field in case
>> additional restrictions apply, for example in case EPT has no
>> support for 5-level paging.
>>
>> Signed-off-by: Gerd Hoffmann <[email protected]>
>> ---
>> arch/x86/kvm/x86.h | 2 ++
>> arch/x86/kvm/x86.c | 5 +++++
>> 2 files changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 2f7e19166658..e03aec3527f8 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -24,6 +24,8 @@ struct kvm_caps {
>> bool has_bus_lock_exit;
>> /* notify VM exit supported? */
>> bool has_notify_vmexit;
>> + /* usable guest phys bits */
>> + u32 guest_phys_bits;
>>
>> u64 supported_mce_cap;
>> u64 supported_xcr0;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 48a61d283406..e270b9b708d1 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
>> r |= BIT(KVM_X86_SW_PROTECTED_VM);
>> break;
>> + case KVM_CAP_VM_GPA_BITS:
>> + r = kvm_caps.guest_phys_bits;
>
> This is not a fast path, just compute the effective guest.MAXPHYADDR on the fly
> using tdp_root_level and max_tdp_level. But as pointed out and discussed in the
> previous thread, adverising a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR
> simply doesn't work[*].
>
> I thought the plan was to add a way for KVM to advertise the maximum *addressable*
> GPA, and figure out a way to communicate that to the guest, e.g. so that firmware
> doesn't try to use legal GPAs that the host cannot address.
From one off-list email thread, Paolo was proposing to change the
definition of CPUID.0x80000008:EAX[23:16] to "Maximum usable physical
address size in bits", in detail:
Maximum usable physical address size in bits. Physical addresses
above this size should not be used, but will not produce a "reserved"
page fault. When this field is zero, all bits up to PhysAddrSize are
usable. This field is expected to be nonzero only on guests where
the hypervisor is using nesting paging.
As I understand it, it turns bit [23:16] of EAX of CPUID 0x80000008 into
a PV field, that is set by VMM(e.g., KVM/QEMU) and consumed by guest.
So KVM can advertise maximum addressable/usable physical address bits in
CPUID.0x80000008:EAX[23:16] via GET_SUPPORTED_CPUID.
> Paolo, any update on this?
>
> [*] https://lore.kernel.org/all/CALMp9eTutnTxCjQjs-nxP=XC345vTmJJODr+PcSOeaQpBW0Skw@mail.gmail.com
>