2020-07-13 08:29:27

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

Holes in structs which are userspace ABI are undesireable.

Fixes: 83d31e5271ac ("KVM: nVMX: fixes for preemption timer migration")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
Documentation/virt/kvm/api.rst | 2 +-
arch/x86/include/uapi/asm/kvm.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 320788f81a05..7beccda11978 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4345,7 +4345,7 @@ Errors:
struct {
__u16 flags;
} smm;
-
+ __u16 pad;
__u32 flags;
__u64 preemption_timer_deadline;
};
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0780f97c1850..aae3df1cbd01 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -414,7 +414,7 @@ struct kvm_vmx_nested_state_hdr {
struct {
__u16 flags;
} smm;
-
+ __u16 pad;
__u32 flags;
__u64 preemption_timer_deadline;
};
--
2.25.4


2020-07-13 15:18:49

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

On Mon, Jul 13, 2020 at 10:28:24AM +0200, Vitaly Kuznetsov wrote:
> Holes in structs which are userspace ABI are undesireable.
>
> Fixes: 83d31e5271ac ("KVM: nVMX: fixes for preemption timer migration")
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> Documentation/virt/kvm/api.rst | 2 +-
> arch/x86/include/uapi/asm/kvm.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 320788f81a05..7beccda11978 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4345,7 +4345,7 @@ Errors:
> struct {
> __u16 flags;
> } smm;
> -
> + __u16 pad;

I don't think this is sufficient. Before 83d31e5271ac, the struct was:

struct kvm_vmx_nested_state_hdr {
__u64 vmxon_pa;
__u64 vmcs12_pa;

struct {
__u16 flags;
} smm;
};

which most/all compilers will pad out to 24 bytes on a 64-bit system. And
although smm.flags is padded to 8 bytes, it's initialized as a 2 byte value.

714 struct kvm_vmx_nested_state_hdr boo;
715 u64 val;
716
717 BUILD_BUG_ON(sizeof(boo) != 3*8);
718 boo.smm.flags = 0;
0xffffffff810148a9 <+41>: xor %eax,%eax
0xffffffff810148ab <+43>: mov %ax,0x18(%rsp)

719
720 val = *((volatile u64 *)(&boo.smm.flags));
0xffffffff810148b0 <+48>: mov 0x18(%rsp),%rax


Which means that userspace built for the old kernel will potentially send in
garbage for the new 'flags' field due to it being uninitialized stack data,
even with the layout after this patch.

struct kvm_vmx_nested_state_hdr {
__u64 vmxon_pa;
__u64 vmcs12_pa;

struct {
__u16 flags;
} smm;
__u16 pad;
__u32 flags;
__u64 preemption_timer_deadline;
};

So to be backwards compatible I believe we need to add a __u32 pad as well,
and to not cause internal padding issues, either make the new 'flags' a
__u64 or pad that as well (or add and check a reserved __32). Making flags
a __64 seems like the least wasteful approach, e.g.

struct kvm_vmx_nested_state_hdr {
__u64 vmxon_pa;
__u64 vmcs12_pa;

struct {
__u16 flags;
} smm;
__u16 pad16;
__u32 pad32;
__u64 flags;
__u64 preemption_timer_deadline;
};


> __u32 flags;
> __u64 preemption_timer_deadline;
> };
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 0780f97c1850..aae3df1cbd01 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -414,7 +414,7 @@ struct kvm_vmx_nested_state_hdr {
> struct {
> __u16 flags;
> } smm;
> -
> + __u16 pad;
> __u32 flags;
> __u64 preemption_timer_deadline;
> };
> --
> 2.25.4
>

2020-07-13 15:55:37

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

Sean Christopherson <[email protected]> writes:

> On Mon, Jul 13, 2020 at 10:28:24AM +0200, Vitaly Kuznetsov wrote:
>> Holes in structs which are userspace ABI are undesireable.
>>
>> Fixes: 83d31e5271ac ("KVM: nVMX: fixes for preemption timer migration")
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> Documentation/virt/kvm/api.rst | 2 +-
>> arch/x86/include/uapi/asm/kvm.h | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 320788f81a05..7beccda11978 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -4345,7 +4345,7 @@ Errors:
>> struct {
>> __u16 flags;
>> } smm;
>> -
>> + __u16 pad;
>
> I don't think this is sufficient. Before 83d31e5271ac, the struct was:
>

Before 850448f35aaf. Thanks, I was too lazy to check that.

> struct kvm_vmx_nested_state_hdr {
> __u64 vmxon_pa;
> __u64 vmcs12_pa;
>
> struct {
> __u16 flags;
> } smm;
> };
>
> which most/all compilers will pad out to 24 bytes on a 64-bit system. And
> although smm.flags is padded to 8 bytes, it's initialized as a 2 byte value.
>
> 714 struct kvm_vmx_nested_state_hdr boo;
> 715 u64 val;
> 716
> 717 BUILD_BUG_ON(sizeof(boo) != 3*8);
> 718 boo.smm.flags = 0;
> 0xffffffff810148a9 <+41>: xor %eax,%eax
> 0xffffffff810148ab <+43>: mov %ax,0x18(%rsp)
>
> 719
> 720 val = *((volatile u64 *)(&boo.smm.flags));
> 0xffffffff810148b0 <+48>: mov 0x18(%rsp),%rax
>
>
> Which means that userspace built for the old kernel will potentially send in
> garbage for the new 'flags' field due to it being uninitialized stack data,
> even with the layout after this patch.
>
> struct kvm_vmx_nested_state_hdr {
> __u64 vmxon_pa;
> __u64 vmcs12_pa;
>
> struct {
> __u16 flags;
> } smm;
> __u16 pad;
> __u32 flags;
> __u64 preemption_timer_deadline;
> };
>
> So to be backwards compatible I believe we need to add a __u32 pad as well,
> and to not cause internal padding issues, either make the new 'flags' a
> __u64 or pad that as well (or add and check a reserved __32). Making flags
> a __64 seems like the least wasteful approach, e.g.
>
> struct kvm_vmx_nested_state_hdr {
> __u64 vmxon_pa;
> __u64 vmcs12_pa;
>
> struct {
> __u16 flags;
> } smm;
> __u16 pad16;
> __u32 pad32;
> __u64 flags;
> __u64 preemption_timer_deadline;
> };

I see and I agree but the fix like that needs to get into 5.8 or an ABI
breakage is guaranteed. I'll send v2 immediately, hope Paolo will take a
look.

>
>
>> __u32 flags;
>> __u64 preemption_timer_deadline;
>> };
>> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>> index 0780f97c1850..aae3df1cbd01 100644
>> --- a/arch/x86/include/uapi/asm/kvm.h
>> +++ b/arch/x86/include/uapi/asm/kvm.h
>> @@ -414,7 +414,7 @@ struct kvm_vmx_nested_state_hdr {
>> struct {
>> __u16 flags;
>> } smm;
>> -
>> + __u16 pad;
>> __u32 flags;
>> __u64 preemption_timer_deadline;
>> };
>> --
>> 2.25.4
>>
>

--
Vitaly

2020-07-27 11:45:00

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

On 13/07/20 17:54, Vitaly Kuznetsov wrote:
> Which means that userspace built for the old kernel will potentially send in
> garbage for the new 'flags' field due to it being uninitialized stack data,
> even with the layout after this patch.

It might as well send it now if the code didn't attempt to zero the
struct before filling it in (this is another good reason to use a
"flags" field to say what's been filled in). I don't think special
casing padding is particularly useful; C11 for example requires
designated initializers to fill padding with zero bits[1] and even
before it's always been considered good behavior to use memset.

Paolo

[1] It says: "If an object that has static or thread storage duration
is not initialized explicitly, then [...] any padding is initialized to
zero bits" and even for non-static objects, "If there are fewer
initializers in a brace-enclosed list than there are elements or members
of an aggregate [...] the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration".

2020-07-27 15:48:37

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

On Mon, Jul 27, 2020 at 01:43:34PM +0200, Paolo Bonzini wrote:
> On 13/07/20 17:54, Vitaly Kuznetsov wrote:
> > Which means that userspace built for the old kernel will potentially send in
> > garbage for the new 'flags' field due to it being uninitialized stack data,
> > even with the layout after this patch.
>
> It might as well send it now if the code didn't attempt to zero the
> struct before filling it in (this is another good reason to use a
> "flags" field to say what's been filled in).

The issue is that flags itself could hold garbage.

https://lkml.kernel.org/r/[email protected]

> I don't think special
> casing padding is particularly useful; C11 for example requires
> designated initializers to fill padding with zero bits[1] and even
> before it's always been considered good behavior to use memset.
>
> Paolo
>
> [1] It says: "If an object that has static or thread storage duration
> is not initialized explicitly, then [...] any padding is initialized to
> zero bits"

static and per-thread storage is unlikely to be relevant,

> and even for non-static objects, "If there are fewer
> initializers in a brace-enclosed list than there are elements or members
> of an aggregate [...] the remainder of the aggregate shall be
> initialized implicitly the same as objects that have static storage
> duration".

That's specifically talking about members, not usused/padded space, e.g.
smm.flags (in the hold struct) must be zeroed with this, but it doesn't
say anything about initializing padding.

struct kvm_vmx_nested_state_hdr hdr = {
.vmxon_pa = root,
.vmcs12_pa = vmcs12,
};

QEMU won't see issues because it zero allocates the entire nested state.

All the above being said, after looking at the whole picture I think padding
the header is a moot point. The header is padded out to 120 bytes[*] when
including in the full nested state, and KVM only ever consumes the header in
the context of the full nested state. I.e. if there's garbage at offset 6,
odds are there's going to be garbage at offset 18, so internally padding the
header does nothing.

KVM should be checking that the unused bytes of (sizeof(pad) - sizeof(vmx/svm))
is zero if we want to expand into the padding in the future. Right now we're
relying on userspace to zero allocate the struct without enforcing it.

[*] Amusing side note, the comment in the header is wrong. It states "pad
the header to 128 bytes", but only pads it to 120 bytes, because union.

/* for KVM_CAP_NESTED_STATE */
struct kvm_nested_state {
__u16 flags;
__u16 format;
__u32 size;

union {
struct kvm_vmx_nested_state_hdr vmx;
struct kvm_svm_nested_state_hdr svm;

/* Pad the header to 128 bytes. */
__u8 pad[120];
} hdr;

/*
* Define data region as 0 bytes to preserve backwards-compatability
* to old definition of kvm_nested_state in order to avoid changing
* KVM_{GET,PUT}_NESTED_STATE ioctl values.
*/
union {
struct kvm_vmx_nested_state_data vmx[0];
struct kvm_svm_nested_state_data svm[0];
} data;
};


Odds are no real VMM will have issue given the dynamic size of struct
kvm_nested_state, but

2020-07-27 18:48:49

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

On 27/07/20 17:46, Sean Christopherson wrote:
>> It might as well send it now if the code didn't attempt to zero the
>> struct before filling it in (this is another good reason to use a
>> "flags" field to say what's been filled in).
> The issue is that flags itself could hold garbage.
>
> https://lkml.kernel.org/r/[email protected]

Quoting from there:

> Which means that userspace built for the old kernel will potentially send in
> garbage for the new 'flags' field due to it being uninitialized stack data,
> even with the layout after this patch.

Userspace should always zero everything. I don't think that the padding
between fields is any different from the other bytes padding the header
to 128 bytes.

> struct kvm_vmx_nested_state_hdr hdr = {
> .vmxon_pa = root,
> .vmcs12_pa = vmcs12,
> };
>
> QEMU won't see issues because it zero allocates the entire nested state.
>
> All the above being said, after looking at the whole picture I think padding
> the header is a moot point. The header is padded out to 120 bytes[*] when
> including in the full nested state, and KVM only ever consumes the header in
> the context of the full nested state. I.e. if there's garbage at offset 6,
> odds are there's going to be garbage at offset 18, so internally padding the
> header does nothing.

Yes, that was what I was hinting at with "it might as well send it now"
(i.e., after the patch).

(All of this is moot for userspace that just uses KVM_GET_NESTED_STATE
and passes it back to KVM_SET_NESTED_STATE).

> KVM should be checking that the unused bytes of (sizeof(pad) - sizeof(vmx/svm))
> is zero if we want to expand into the padding in the future. Right now we're
> relying on userspace to zero allocate the struct without enforcing it.

The alternative, which is almost as good, is to only use these extra
fields which could be garbage if the flags are not set, and check the
flags (see the patches I have sent earlier today).

The chance of the flags passing the check will decrease over time as
more flags are added; but the chance of having buggy userspace that
sends down garbage also will.

> [*] Amusing side note, the comment in the header is wrong. It states "pad
> the header to 128 bytes", but only pads it to 120 bytes, because union.
>
> /* for KVM_CAP_NESTED_STATE */
> struct kvm_nested_state {
> __u16 flags;
> __u16 format;
> __u32 size;
>
> union {
> struct kvm_vmx_nested_state_hdr vmx;
> struct kvm_svm_nested_state_hdr svm;
>
> /* Pad the header to 128 bytes. */
> __u8 pad[120];
> } hdr;

There are 8 bytes before the union, and it's not a coincidence. :)
"Header" refers to the stuff before the data region.

> Odds are no real VMM will have issue given the dynamic size of struct
> kvm_nested_state, but

... *suspence* ...

Paolo

2020-07-28 16:00:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

On Mon, Jul 27, 2020 at 06:16:56PM +0200, Paolo Bonzini wrote:
> On 27/07/20 17:46, Sean Christopherson wrote:
> > All the above being said, after looking at the whole picture I think padding
> > the header is a moot point. The header is padded out to 120 bytes[*] when
> > including in the full nested state, and KVM only ever consumes the header in
> > the context of the full nested state. I.e. if there's garbage at offset 6,
> > odds are there's going to be garbage at offset 18, so internally padding the
> > header does nothing.
>
> Yes, that was what I was hinting at with "it might as well send it now"
> (i.e., after the patch).
>
> (All of this is moot for userspace that just uses KVM_GET_NESTED_STATE
> and passes it back to KVM_SET_NESTED_STATE).
>
> > KVM should be checking that the unused bytes of (sizeof(pad) - sizeof(vmx/svm))
> > is zero if we want to expand into the padding in the future. Right now we're
> > relying on userspace to zero allocate the struct without enforcing it.
>
> The alternative, which is almost as good, is to only use these extra
> fields which could be garbage if the flags are not set, and check the
> flags (see the patches I have sent earlier today).
>
> The chance of the flags passing the check will decrease over time as
> more flags are added; but the chance of having buggy userspace that
> sends down garbage also will.

Ah, I see what you're saying. Ya, that makes sense.

> > [*] Amusing side note, the comment in the header is wrong. It states "pad
> > the header to 128 bytes", but only pads it to 120 bytes, because union.
> >
> > /* for KVM_CAP_NESTED_STATE */
> > struct kvm_nested_state {
> > __u16 flags;
> > __u16 format;
> > __u32 size;
> >
> > union {
> > struct kvm_vmx_nested_state_hdr vmx;
> > struct kvm_svm_nested_state_hdr svm;
> >
> > /* Pad the header to 128 bytes. */
> > __u8 pad[120];
> > } hdr;
>
> There are 8 bytes before the union, and it's not a coincidence. :)
> "Header" refers to the stuff before the data region.

Ugh, then 'hdr' probably should be named vendor_header or something.