LinuxLists.cc - [PATCH] KVM: x86: restrict maximal physical address

2016-11-25 14:52:02

Subject: [PATCH] KVM: x86: restrict maximal physical address

The guest could have configured a maximal physical address that exceeds
the host. Prevent that situation as it could easily lead to a bug.

Signed-off-by: Radim Krčmář <[email protected]>
---
arch/x86/kvm/cpuid.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 25f0f15fab1a..aed910e9fbed 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -136,7 +136,13 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
((best->eax & 0xff00) >> 8) != 0)
return -EINVAL;

- /* Update physical-address width */
+
+ /*
+ * Update physical-address width.
+ * Make sure that it does not exceed hardware capabilities.
+ */
+ if (cpuid_query_maxphyaddr(vcpu) > boot_cpu_data.x86_phys_bits)
+ return -EINVAL;
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);

kvm_pmu_refresh(vcpu);
--
2.10.2

2016-11-25 15:11:39

by David Hildenbrand

[permalink] [raw]

Subject: Re: [PATCH] KVM: x86: restrict maximal physical address

Am 25.11.2016 um 15:51 schrieb Radim Krčmář:
> The guest could have configured a maximal physical address that exceeds
> the host. Prevent that situation as it could easily lead to a bug.
>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 25f0f15fab1a..aed910e9fbed 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -136,7 +136,13 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> ((best->eax & 0xff00) >> 8) != 0)
> return -EINVAL;
>
> - /* Update physical-address width */
> +
> + /*
> + * Update physical-address width.
> + * Make sure that it does not exceed hardware capabilities.
> + */
> + if (cpuid_query_maxphyaddr(vcpu) > boot_cpu_data.x86_phys_bits)

The name maxphyaddr is really misleading. But that is a different story.
This check is correct.

However, I wonder if there is any way for user space to query this
property? On s390x, there is a kvm capability to export this information
to user space. So QEMU can fail (e.g. migration) with a nice error
message about missing hardware support.

(most probably we still want to block this case, as migration will seem
to work but than simply fail due to missing hardware support I guess).
Maybe there is also already a nice check in QEMU that I am not yet aware
of :)

> + return -EINVAL;
> vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>
> kvm_pmu_refresh(vcpu);
>

--

David

2016-11-25 16:13:18

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH] KVM: x86: restrict maximal physical address

On 25/11/2016 15:51, Radim Krčmář wrote:
> The guest could have configured a maximal physical address that exceeds
> the host. Prevent that situation as it could easily lead to a bug.
>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 25f0f15fab1a..aed910e9fbed 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -136,7 +136,13 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> ((best->eax & 0xff00) >> 8) != 0)
> return -EINVAL;
>
> - /* Update physical-address width */
> +
> + /*
> + * Update physical-address width.
> + * Make sure that it does not exceed hardware capabilities.
> + */
> + if (cpuid_query_maxphyaddr(vcpu) > boot_cpu_data.x86_phys_bits)
> + return -EINVAL;
> vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>
> kvm_pmu_refresh(vcpu);
>

Not possible unfortunately, this would break most versions of QEMU that
hard-code 40 for MAXPHYADDR.

Also, "wider" physical addresses in the guest are actually possible with
shadow paging.

Paolo

2016-11-25 16:15:03

by Radim Krčmář

[permalink] [raw]

Subject: Re: [PATCH] KVM: x86: restrict maximal physical address

2016-11-25 16:11+0100, David Hildenbrand:
> Am 25.11.2016 um 15:51 schrieb Radim Krčmář:
>> The guest could have configured a maximal physical address that exceeds
>> the host. Prevent that situation as it could easily lead to a bug.
>>
>> Signed-off-by: Radim Krčmář <[email protected]>
>> ---
>> arch/x86/kvm/cpuid.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 25f0f15fab1a..aed910e9fbed 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -136,7 +136,13 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>> ((best->eax & 0xff00) >> 8) != 0)
>> return -EINVAL;
>>
>> - /* Update physical-address width */
>> +
>> + /*
>> + * Update physical-address width.
>> + * Make sure that it does not exceed hardware capabilities.
>> + */
>> + if (cpuid_query_maxphyaddr(vcpu) > boot_cpu_data.x86_phys_bits)
>
> The name maxphyaddr is really misleading. But that is a different story.

Yes, I'll rename it, thanks.

> This check is correct.
>
> However, I wonder if there is any way for user space to query this property?

Do you mean boot_cpu_data.x86_phys_bits?
Userspace can execute CPUID instruction and read the value; QEMU does.

> On s390x, there is a kvm capability to export this information to user
> space. So QEMU can fail (e.g. migration) with a nice error message about
> missing hardware support.
>
> (most probably we still want to block this case, as migration will seem to
> work but than simply fail due to missing hardware support I guess). Maybe
> there is also already a nice check in QEMU that I am not yet aware of :)

This patch is bad. It would break QEMU on all old machines, because
QEMU sets 40 by default.

Heh, QEMU doesn't check at all -- it even allows migration with
"host-phys-bits" feature and will happily change phys-bits when
migrating to another machine.

2016-11-25 16:45:56

by David Hildenbrand

[permalink] [raw]

Subject: Re: [PATCH] KVM: x86: restrict maximal physical address

>> This check is correct.
>>
>> However, I wonder if there is any way for user space to query this property?
>
> Do you mean boot_cpu_data.x86_phys_bits?
> Userspace can execute CPUID instruction and read the value; QEMU does.

Thanks, good to know. I remember that on s390x we explicitly decided to
query the maximum address from KVM (KVM_S390_VM_MEM_LIMIT_SIZE) for two
reasons. One of them was "just because our CPU supports it doesn't mean
KVM supports it". Just like with all CPU features.

However, this applies only for configuring hardware virtualization. The
value that is exposed to the guest comes from the cpu model (with s390x
cpu model support). So it will also not change during migration.

But if this will never be relevant for x86 (KVM will always support host
x86_phys_bits), fine.

>
>> On s390x, there is a kvm capability to export this information to user
>> space. So QEMU can fail (e.g. migration) with a nice error message about
>> missing hardware support.
>>
>> (most probably we still want to block this case, as migration will seem to
>> work but than simply fail due to missing hardware support I guess). Maybe
>> there is also already a nice check in QEMU that I am not yet aware of :)
>
> This patch is bad. It would break QEMU on all old machines, because
> QEMU sets 40 by default.

Not sure if rounding that value down (so it is at least consistent in
KVM) makes sense (and documenting this behavior "may be rounded down").
And then implementing appropriate checks in QEMU (if not already present).

>
> Heh, QEMU doesn't check at all -- it even allows migration with
> "host-phys-bits" feature and will happily change phys-bits when
> migrating to another machine.
>

Either migrate that value (hmmm... ) or glue it to a command line
parameter, so it won't change while migrating. E.g.

- cpu models (if this value was always the same for a CPU generation -
no expert on x86 cpu models).
- "-cpu maxmem..." - could be a fit when thinking about "maximum VM size
== max phys bits for our guest". But depends how this value is actually
interpreted by guests.

--

David

2016-11-25 16:57:55

by Radim Krčmář

[permalink] [raw]

Subject: Re: [PATCH] KVM: x86: restrict maximal physical address

2016-11-25 17:10+0100, Paolo Bonzini:
> On 25/11/2016 15:51, Radim Krčmář wrote:
>> The guest could have configured a maximal physical address that exceeds
>> the host. Prevent that situation as it could easily lead to a bug.
>>
>> Signed-off-by: Radim Krčmář <[email protected]>
>> ---
>> arch/x86/kvm/cpuid.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 25f0f15fab1a..aed910e9fbed 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -136,7 +136,13 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>> ((best->eax & 0xff00) >> 8) != 0)
>> return -EINVAL;
>>
>> - /* Update physical-address width */
>> +
>> + /*
>> + * Update physical-address width.
>> + * Make sure that it does not exceed hardware capabilities.
>> + */
>> + if (cpuid_query_maxphyaddr(vcpu) > boot_cpu_data.x86_phys_bits)
>> + return -EINVAL;
>> vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>>
>> kvm_pmu_refresh(vcpu);
>>
>
> Not possible unfortunately, this would break most versions of QEMU that
> hard-code 40 for MAXPHYADDR.
>
> Also, "wider" physical addresses in the guest are actually possible with
> shadow paging.

We don't disable EPT in that case, though. I guess that situations
where QEMU configures mem slot into high physical addresses are not hit
in production ...

Is any solution better than ignoring this situation?

2016-11-25 17:21:37

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH] KVM: x86: restrict maximal physical address

On 25/11/2016 17:57, Radim Krčmář wrote:
> 2016-11-25 17:10+0100, Paolo Bonzini:
>> On 25/11/2016 15:51, Radim Krčmář wrote:
>>> The guest could have configured a maximal physical address that exceeds
>>> the host. Prevent that situation as it could easily lead to a bug.
>>>
>>> Signed-off-by: Radim Krčmář <[email protected]>
>>> ---
>>> arch/x86/kvm/cpuid.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index 25f0f15fab1a..aed910e9fbed 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -136,7 +136,13 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>>> ((best->eax & 0xff00) >> 8) != 0)
>>> return -EINVAL;
>>>
>>> - /* Update physical-address width */
>>> +
>>> + /*
>>> + * Update physical-address width.
>>> + * Make sure that it does not exceed hardware capabilities.
>>> + */
>>> + if (cpuid_query_maxphyaddr(vcpu) > boot_cpu_data.x86_phys_bits)
>>> + return -EINVAL;
>>> vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>>>
>>> kvm_pmu_refresh(vcpu);
>>>
>>
>> Not possible unfortunately, this would break most versions of QEMU that
>> hard-code 40 for MAXPHYADDR.
>>
>> Also, "wider" physical addresses in the guest are actually possible with
>> shadow paging.
>
> We don't disable EPT in that case, though. I guess that situations
> where QEMU configures mem slot into high physical addresses are not hit
> in production ...
>
> Is any solution better than ignoring this situation?

We've hit it at least once a year (I can remember me, Nadav, Eduardo,
Dave Gilbert and you) and always decided that ultimately there is no
satisfactory solution.

Both GAW < HAW (AW = address width) and GAW > HAW have problems. If
GAW is smaller, bits that ought to be reserved aren't. This is
arguably worse than configuring memory into addresses above GAW.
However most Intel chips in the wild have 36 or 46 physical bits
respectively client or server, so in reality it's unlikely to have a
mismatch.

The sad thing, and one that is new since the last time we discussed the
issue, is that apparently Intel did have plans to support GAW < HAW:

commit 0307b7b8c275e65552f6022a18ad91902ae25d42
Author: Zhang Xiantao <[email protected]>
Date: Wed Dec 5 01:55:14 2012 +0800

kvm: remove unnecessary bit checking for ept violation

Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so
remove it.

Signed-off-by: Zhang Xiantao <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fd2046dc94c..d2248b3dbb61 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4863,11 +4863,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)

exit_qualification = vmcs_readl(EXIT_QUALIFICATION);

- if (exit_qualification & (1 << 6)) {
- printk(KERN_ERR "EPT: GPA exceeds GAW!\n");
- return -EINVAL;
- }
-
gla_validity = (exit_qualification >> 7) & 0x3;
if (gla_validity != 0x3 && gla_validity != 0x1 && gla_validity != 0) {
printk(KERN_ERR "EPT: Handling EPT violation failed!\n");

Oh well. :(

Paolo

2016-11-29 16:53:42

by Radim Krčmář

[permalink] [raw]

Subject: Re: [PATCH] KVM: x86: restrict maximal physical address

2016-11-25 17:43+0100, David Hildenbrand:
>> > This check is correct.
>> >
>> > However, I wonder if there is any way for user space to query this property?
>>
>> Do you mean boot_cpu_data.x86_phys_bits?
>> Userspace can execute CPUID instruction and read the value; QEMU does.
>
> Thanks, good to know. I remember that on s390x we explicitly decided to
> query the maximum address from KVM (KVM_S390_VM_MEM_LIMIT_SIZE) for two
> reasons. One of them was "just because our CPU supports it doesn't mean KVM
> supports it". Just like with all CPU features.
>
> However, this applies only for configuring hardware virtualization. The
> value that is exposed to the guest comes from the cpu model (with s390x cpu
> model support). So it will also not change during migration.
>
> But if this will never be relevant for x86 (KVM will always support host
> x86_phys_bits), fine.
>
>>
>> > On s390x, there is a kvm capability to export this information to user
>> > space. So QEMU can fail (e.g. migration) with a nice error message about
>> > missing hardware support.
>> >
>> > (most probably we still want to block this case, as migration will seem to
>> > work but than simply fail due to missing hardware support I guess). Maybe
>> > there is also already a nice check in QEMU that I am not yet aware of :)
>>
>> This patch is bad. It would break QEMU on all old machines, because
>> QEMU sets 40 by default.
>
> Not sure if rounding that value down (so it is at least consistent in KVM)
> makes sense (and documenting this behavior "may be rounded down"). And then
> implementing appropriate checks in QEMU (if not already present).

Silently rouding down doesn't fix bugs that we introduce to the guest,
just makes them behave differently and changing the value while the
guest is running could introduce more bugs. :(

I slightly prefer doing nothing for the case I was writing this patch
for: VMX checks for CR3 reserved bits -- doing nothing means that the
guest gets killed; rouding down would make the guest misbehave, which a
bit harder to debug.

Changing QEMU makes sense even if KVM stays the same. I'd touch QEMU
first, actually and after few years (decades), we could just apply this
patch. :)

>> Heh, QEMU doesn't check at all -- it even allows migration with
>> "host-phys-bits" feature and will happily change phys-bits when
>> migrating to another machine.
>>
>
> Either migrate that value (hmmm... ) or glue it to a command line parameter,
> so it won't change while migrating. E.g.
> - cpu models (if this value was always the same for a CPU generation - no
> expert on x86 cpu models).
> - "-cpu maxmem..." - could be a fit when thinking about "maximum VM size ==
> max phys bits for our guest". But depends how this value is actually
> interpreted by guests.

Yes, the host value has to be migrated in that case. QEMU also has
"phys-bits=N" feature and the default protected by machine types is 40,
so both work as expected on migration.