2020-08-20 13:37:21

by Maxim Levitsky

Subject: [PATCH v2 0/7] KVM: nSVM: ondemand nested state allocation + smm fixes

Hi!



This patch series does some refactoring and implements on-demand allocation
of the nested state area. This way, at least guests that don't use nesting
won't waste memory on nested state.



Patches 1,2,3 are refactoring.

Patches 4,5 are new since V1 and implement stricter checking of the SMM save
state area on resume from SMM, to prevent the guest from tampering with it.
This was done to avoid crashing if the guest enabled the 'guest was
interrupted' flag there while we don't have nested state allocated.

Patches 6,7 implement on-demand allocation of the nested state.
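
For illustration, a very rough sketch of the idea (untested, function and
field names here are only illustrative - see patch 7 for the actual code):

    /* Nested state is allocated the first time the guest sets EFER.SVME,
     * instead of unconditionally at vCPU creation.
     */
    int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
    {
            struct vcpu_svm *svm = to_svm(vcpu);

            if ((efer & EFER_SVME) && !svm->nested.hsave) {
                    /* first time nesting is enabled: allocate lazily */
                    if (svm_allocate_nested(svm))
                            return 1;  /* uses the new set_efer return value */
            }

            vcpu->arch.efer = efer;
            svm->vmcb->save.efer = efer | EFER_SVME;
            vmcb_mark_dirty(svm->vmcb, VMCB_CR);
            return 0;
    }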



The series was tested with various nested guests, in one case even with
L3 running, but note that due to an unrelated issue, migration with a
nested guest running didn't work for me with or without this series.
I am currently investigating this.



Best regards,

Maxim Levitsky



Maxim Levitsky (7):
  KVM: SVM: rename a variable in the svm_create_vcpu
  KVM: nSVM: rename nested 'vmcb' to vmcb12_gpa in few places
  KVM: SVM: refactor msr permission bitmap allocation
  KVM: x86: allow kvm_x86_ops.set_efer to return a value
  KVM: nSVM: more strict smm checks
  KVM: emulator: more strict rsm checks.
  KVM: nSVM: implement ondemand allocation of the nested state



 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/emulate.c          |  22 ++++--
 arch/x86/kvm/svm/nested.c       |  53 +++++++++++--
 arch/x86/kvm/svm/svm.c          | 130 ++++++++++++++++++--------------
 arch/x86/kvm/svm/svm.h          |  10 ++-
 arch/x86/kvm/vmx/vmx.c          |   5 +-
 arch/x86/kvm/x86.c              |   3 +-
 7 files changed, 151 insertions(+), 74 deletions(-)



--

2.26.2





2020-08-20 13:39:23

by Maxim Levitsky

Subject: [PATCH v2 3/7] KVM: SVM: refactor msr permission bitmap allocation

Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, which also allocates
the msr bitmap, and add svm_vcpu_free_msrpm to free it.

This will be used later to move the nested msr permission bitmap allocation
to nested.c.

No functional change intended.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 45 +++++++++++++++++++++---------------------
1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d33013b9b4d7..7bb094bf6494 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -609,18 +609,29 @@ static void set_msr_interception(u32 *msrpm, unsigned msr,
msrpm[offset] = tmp;
}

-static void svm_vcpu_init_msrpm(u32 *msrpm)
+static u32 *svm_vcpu_alloc_msrpm(void)
{
int i;
+ u32 *msrpm;
+ struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
+
+ if (!pages)
+ return NULL;

+ msrpm = page_address(pages);
memset(msrpm, 0xff, PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER));

for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
if (!direct_access_msrs[i].always)
continue;
-
set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
}
+ return msrpm;
+}
+
+static void svm_vcpu_free_msrpm(u32 *msrpm)
+{
+ __free_pages(virt_to_page(msrpm), MSRPM_ALLOC_ORDER);
}

static void add_msr_offset(u32 offset)
@@ -1172,9 +1183,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm;
struct page *vmcb_page;
- struct page *msrpm_pages;
struct page *hsave_page;
- struct page *nested_msrpm_pages;
int err;

BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
@@ -1185,21 +1194,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
if (!vmcb_page)
goto out;

- msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
- if (!msrpm_pages)
- goto free_page1;
-
- nested_msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
- if (!nested_msrpm_pages)
- goto free_page2;
-
hsave_page = alloc_page(GFP_KERNEL_ACCOUNT);
if (!hsave_page)
- goto free_page3;
+ goto free_page1;

err = avic_init_vcpu(svm);
if (err)
- goto free_page4;
+ goto free_page2;

/* We initialize this flag to true to make sure that the is_running
* bit would be set the first time the vcpu is loaded.
@@ -1210,11 +1211,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
svm->nested.hsave = page_address(hsave_page);
clear_page(svm->nested.hsave);

- svm->msrpm = page_address(msrpm_pages);
- svm_vcpu_init_msrpm(svm->msrpm);
+ svm->msrpm = svm_vcpu_alloc_msrpm();
+ if (!svm->msrpm)
+ goto free_page2;

- svm->nested.msrpm = page_address(nested_msrpm_pages);
- svm_vcpu_init_msrpm(svm->nested.msrpm);
+ svm->nested.msrpm = svm_vcpu_alloc_msrpm();
+ if (!svm->nested.msrpm)
+ goto free_page3;

svm->vmcb = page_address(vmcb_page);
clear_page(svm->vmcb);
@@ -1227,12 +1230,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)

return 0;

-free_page4:
- __free_page(hsave_page);
free_page3:
- __free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
+ svm_vcpu_free_msrpm(svm->msrpm);
free_page2:
- __free_pages(msrpm_pages, MSRPM_ALLOC_ORDER);
+ __free_page(hsave_page);
free_page1:
__free_page(vmcb_page);
out:
--
2.26.2

2020-08-20 13:39:52

by Maxim Levitsky

Subject: [PATCH v2 4/7] KVM: x86: allow kvm_x86_ops.set_efer to return a value

This will be used later to return an error when setting this MSR fails.

VMX already has an error condition when EFER is not in the shared MSR
list, so return an error in this case.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 3 ++-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 5 +++--
arch/x86/kvm/x86.c | 3 ++-
5 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ab3af7275d8..bd0519e26053 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1069,7 +1069,7 @@ struct kvm_x86_ops {
void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
- void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
+ int (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7bb094bf6494..f4569899361f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -263,7 +263,7 @@ static int get_max_npt_level(void)
#endif
}

-void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
struct vcpu_svm *svm = to_svm(vcpu);
vcpu->arch.efer = efer;
@@ -283,6 +283,7 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)

svm->vmcb->save.efer = efer | EFER_SVME;
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
+ return 0;
}

static int is_external_interrupt(u32 info)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ab913468f9cb..468c58a91534 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -349,7 +349,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
#define MSR_INVALID 0xffffffffU

u32 svm_msrpm_offset(u32 msr);
-void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
+int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
void svm_flush_tlb(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..e90b9e68c7ea 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2862,13 +2862,13 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
kvm_mmu_reset_context(vcpu);
}

-void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct shared_msr_entry *msr = find_msr_entry(vmx, MSR_EFER);

if (!msr)
- return;
+ return 1;

vcpu->arch.efer = efer;
if (efer & EFER_LMA) {
@@ -2880,6 +2880,7 @@ void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
msr->data = efer & ~EFER_LME;
}
setup_msrs(vmx);
+ return 0;
}

#ifdef CONFIG_X86_64
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2db369a64f29..cad5d9778a21 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1471,7 +1471,8 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
efer &= ~EFER_LMA;
efer |= vcpu->arch.efer & EFER_LMA;

- kvm_x86_ops.set_efer(vcpu, efer);
+ if (kvm_x86_ops.set_efer(vcpu, efer))
+ return 1;

/* Update reserved bits */
if ((efer ^ old_efer) & EFER_NX)
--
2.26.2

2020-08-20 22:04:30

by Jim Mattson

Subject: Re: [PATCH v2 3/7] KVM: SVM: refactor msr permission bitmap allocation

On Thu, Aug 20, 2020 at 6:34 AM Maxim Levitsky <[email protected]> wrote:
>
> Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, that also allocates
> the msr bitmap and add svm_vcpu_free_msrpm to free it.
>
> This will be used later to move the nested msr permission bitmap allocation
> to nested.c
>
> No functional change intended.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 45 +++++++++++++++++++++---------------------
> 1 file changed, 23 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d33013b9b4d7..7bb094bf6494 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -609,18 +609,29 @@ static void set_msr_interception(u32 *msrpm, unsigned msr,
> msrpm[offset] = tmp;
> }
>
> -static void svm_vcpu_init_msrpm(u32 *msrpm)
> +static u32 *svm_vcpu_alloc_msrpm(void)

I prefer the original name, since this function does more than allocation.

> {
> int i;
> + u32 *msrpm;
> + struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
> +
> + if (!pages)
> + return NULL;
>
> + msrpm = page_address(pages);
> memset(msrpm, 0xff, PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER));
>
> for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
> if (!direct_access_msrs[i].always)
> continue;
> -
> set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
> }
> + return msrpm;
> +}
> +
> +static void svm_vcpu_free_msrpm(u32 *msrpm)
> +{
> + __free_pages(virt_to_page(msrpm), MSRPM_ALLOC_ORDER);
> }
>
> static void add_msr_offset(u32 offset)
> @@ -1172,9 +1183,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm;
> struct page *vmcb_page;
> - struct page *msrpm_pages;
> struct page *hsave_page;
> - struct page *nested_msrpm_pages;
> int err;
>
> BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
> @@ -1185,21 +1194,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> if (!vmcb_page)
> goto out;
>
> - msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
> - if (!msrpm_pages)
> - goto free_page1;
> -
> - nested_msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
> - if (!nested_msrpm_pages)
> - goto free_page2;
> -

Reordering the allocations does seem like a functional change to me,
albeit one that should (hopefully) be benign. For example, if the
MSRPM_ALLOC_ORDER allocations fail, in the new version of the code,
the hsave_page will be cleared, but in the old version of the code, no
page would be cleared.

> hsave_page = alloc_page(GFP_KERNEL_ACCOUNT);

Speaking of clearing pages, why not add __GFP_ZERO to the flags above
and skip the clear_page() call below?
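
Something like this, I suppose (untested sketch of the suggestion):

    hsave_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
    if (!hsave_page)
            goto free_page1;
    ...
    svm->nested.hsave = page_address(hsave_page);
    /* the clear_page(svm->nested.hsave) call is no longer needed */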

> if (!hsave_page)
> - goto free_page3;
> + goto free_page1;
>
> err = avic_init_vcpu(svm);
> if (err)
> - goto free_page4;
> + goto free_page2;
>
> /* We initialize this flag to true to make sure that the is_running
> * bit would be set the first time the vcpu is loaded.
> @@ -1210,11 +1211,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> svm->nested.hsave = page_address(hsave_page);
> clear_page(svm->nested.hsave);
>
> - svm->msrpm = page_address(msrpm_pages);
> - svm_vcpu_init_msrpm(svm->msrpm);
> + svm->msrpm = svm_vcpu_alloc_msrpm();
> + if (!svm->msrpm)
> + goto free_page2;
>
> - svm->nested.msrpm = page_address(nested_msrpm_pages);
> - svm_vcpu_init_msrpm(svm->nested.msrpm);
> + svm->nested.msrpm = svm_vcpu_alloc_msrpm();
> + if (!svm->nested.msrpm)
> + goto free_page3;
>
> svm->vmcb = page_address(vmcb_page);
> clear_page(svm->vmcb);
> @@ -1227,12 +1230,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>
> return 0;
>
> -free_page4:
> - __free_page(hsave_page);
> free_page3:
> - __free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
> + svm_vcpu_free_msrpm(svm->msrpm);
> free_page2:
> - __free_pages(msrpm_pages, MSRPM_ALLOC_ORDER);
> + __free_page(hsave_page);
> free_page1:
> __free_page(vmcb_page);
> out:

While you're here, could you improve these labels? Coding-style.rst says:

Choose label names which say what the goto does or why the goto exists. An
example of a good name could be ``out_free_buffer:`` if the goto frees
``buffer``.
Avoid using GW-BASIC names like ``err1:`` and ``err2:``, as you would have to
renumber them if you ever add or remove exit paths, and they make correctness
difficult to verify anyway.
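
For example, something like this for the error path above (label names are
purely illustrative):

    out_free_msrpm:
            svm_vcpu_free_msrpm(svm->msrpm);
    out_free_hsave:
            __free_page(hsave_page);
    out_free_vmcb:
            __free_page(vmcb_page);
    out:
            return err;

(with the corresponding goto statements updated to match.)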

2020-08-20 22:05:31

by Jim Mattson

Subject: Re: [PATCH v2 4/7] KVM: x86: allow kvm_x86_ops.set_efer to return a value

On Thu, Aug 20, 2020 at 6:34 AM Maxim Levitsky <[email protected]> wrote:
>
> This will be used later to return an error when setting this msr fails.
>
> For VMX, it already has an error condition when EFER is
> not in the shared MSR list, so return an error in this case.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---

> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1471,7 +1471,8 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> efer &= ~EFER_LMA;
> efer |= vcpu->arch.efer & EFER_LMA;
>
> - kvm_x86_ops.set_efer(vcpu, efer);
> + if (kvm_x86_ops.set_efer(vcpu, efer))
> + return 1;

This seems like a userspace ABI change to me. Previously, it looks
like userspace could always use KVM_SET_MSRS to set MSR_EFER to 0 or
EFER_SCE, and it would always succeed. Now, it looks like it will fail
on CPUs that don't support EFER in hardware. (Perhaps it should fail,
but it didn't before, AFAICT.)
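
(For reference, a minimal sketch of such a host-initiated write; vcpu_fd and
error handling are assumed:)

    struct {
            struct kvm_msrs hdr;
            struct kvm_msr_entry entry;
    } efer_msr = {
            .hdr.nmsrs   = 1,
            .entry.index = 0xc0000080,      /* MSR_EFER */
            .entry.data  = 0x1,             /* EFER_SCE */
    };

    /* KVM_SET_MSRS returns the number of MSRs successfully written;
     * before this patch such a write could not fail.
     */
    if (ioctl(vcpu_fd, KVM_SET_MSRS, &efer_msr) < 1)
            /* would now fail on hardware without EFER */;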

2020-08-21 00:45:04

by Sean Christopherson

Subject: Re: [PATCH v2 4/7] KVM: x86: allow kvm_x86_ops.set_efer to return a value

On Thu, Aug 20, 2020 at 02:43:56PM -0700, Jim Mattson wrote:
> On Thu, Aug 20, 2020 at 6:34 AM Maxim Levitsky <[email protected]> wrote:
> >
> > This will be used later to return an error when setting this msr fails.
> >
> > For VMX, it already has an error condition when EFER is
> > not in the shared MSR list, so return an error in this case.
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
>
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1471,7 +1471,8 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > efer &= ~EFER_LMA;
> > efer |= vcpu->arch.efer & EFER_LMA;
> >
> > - kvm_x86_ops.set_efer(vcpu, efer);
> > + if (kvm_x86_ops.set_efer(vcpu, efer))
> > + return 1;
>
> This seems like a userspace ABI change to me. Previously, it looks
> like userspace could always use KVM_SET_MSRS to set MSR_EFER to 0 or
> EFER_SCE, and it would always succeed. Now, it looks like it will fail
> on CPUs that don't support EFER in hardware. (Perhaps it should fail,
> but it didn't before, AFAICT.)

KVM emulates SYSCALL, presumably that also works when EFER doesn't exist in
hardware.

The above also adds weirdness to nested VMX as vmx_set_efer() simply can't
fail.

2020-08-24 11:45:20

by Maxim Levitsky

Subject: Re: [PATCH v2 3/7] KVM: SVM: refactor msr permission bitmap allocation

On Thu, 2020-08-20 at 14:26 -0700, Jim Mattson wrote:
> On Thu, Aug 20, 2020 at 6:34 AM Maxim Levitsky <[email protected]> wrote:
> > Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, that also allocates
> > the msr bitmap and add svm_vcpu_free_msrpm to free it.
> >
> > This will be used later to move the nested msr permission bitmap allocation
> > to nested.c
> >
> > No functional change intended.
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/svm/svm.c | 45 +++++++++++++++++++++---------------------
> > 1 file changed, 23 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index d33013b9b4d7..7bb094bf6494 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -609,18 +609,29 @@ static void set_msr_interception(u32 *msrpm, unsigned msr,
> > msrpm[offset] = tmp;
> > }
> >
> > -static void svm_vcpu_init_msrpm(u32 *msrpm)
> > +static u32 *svm_vcpu_alloc_msrpm(void)
>
> I prefer the original name, since this function does more than allocation.
But it also allocates it. I don't mind using the old name though.
>
> > {
> > int i;
> > + u32 *msrpm;
> > + struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
> > +
> > + if (!pages)
> > + return NULL;
> >
> > + msrpm = page_address(pages);
> > memset(msrpm, 0xff, PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER));
> >
> > for (i = 0; direct_access_msrs[i].index != MSR_INVALID; i++) {
> > if (!direct_access_msrs[i].always)
> > continue;
> > -
> > set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
> > }
> > + return msrpm;
> > +}
> > +
> > +static void svm_vcpu_free_msrpm(u32 *msrpm)
> > +{
> > + __free_pages(virt_to_page(msrpm), MSRPM_ALLOC_ORDER);
> > }
> >
> > static void add_msr_offset(u32 offset)
> > @@ -1172,9 +1183,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> > {
> > struct vcpu_svm *svm;
> > struct page *vmcb_page;
> > - struct page *msrpm_pages;
> > struct page *hsave_page;
> > - struct page *nested_msrpm_pages;
> > int err;
> >
> > BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
> > @@ -1185,21 +1194,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> > if (!vmcb_page)
> > goto out;
> >
> > - msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
> > - if (!msrpm_pages)
> > - goto free_page1;
> > -
> > - nested_msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
> > - if (!nested_msrpm_pages)
> > - goto free_page2;
> > -
>
> Reordering the allocations does seem like a functional change to me,
> albeit one that should (hopefully) be benign. For example, if the
> MSRPM_ALLOC_ORDER allocations fail, in the new version of the code,
> the hsave_page will be cleared, but in the old version of the code, no
> page would be cleared.
Noted.
>
> > hsave_page = alloc_page(GFP_KERNEL_ACCOUNT);
>
> Speaking of clearing pages, why not add __GFP_ZERO to the flags above
> and skip the clear_page() call below?
I hadn't thought about it; I don't see a reason not to use __GFP_ZERO,
but this is how the old code was.

>
> > if (!hsave_page)
> > - goto free_page3;
> > + goto free_page1;
> >
> > err = avic_init_vcpu(svm);
> > if (err)
> > - goto free_page4;
> > + goto free_page2;
> >
> > /* We initialize this flag to true to make sure that the is_running
> > * bit would be set the first time the vcpu is loaded.
> > @@ -1210,11 +1211,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> > svm->nested.hsave = page_address(hsave_page);
> > clear_page(svm->nested.hsave);
> >
> > - svm->msrpm = page_address(msrpm_pages);
> > - svm_vcpu_init_msrpm(svm->msrpm);
> > + svm->msrpm = svm_vcpu_alloc_msrpm();
> > + if (!svm->msrpm)
> > + goto free_page2;
> >
> > - svm->nested.msrpm = page_address(nested_msrpm_pages);
> > - svm_vcpu_init_msrpm(svm->nested.msrpm);
> > + svm->nested.msrpm = svm_vcpu_alloc_msrpm();
> > + if (!svm->nested.msrpm)
> > + goto free_page3;
> >
> > svm->vmcb = page_address(vmcb_page);
> > clear_page(svm->vmcb);
> > @@ -1227,12 +1230,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> >
> > return 0;
> >
> > -free_page4:
> > - __free_page(hsave_page);
> > free_page3:
> > - __free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
> > + svm_vcpu_free_msrpm(svm->msrpm);
> > free_page2:
> > - __free_pages(msrpm_pages, MSRPM_ALLOC_ORDER);
> > + __free_page(hsave_page);
> > free_page1:
> > __free_page(vmcb_page);
> > out:
>
> While you're here, could you improve these labels? Coding-style.rst says:
>
> Choose label names which say what the goto does or why the goto exists. An
> example of a good name could be ``out_free_buffer:`` if the goto frees
> ``buffer``.
> Avoid using GW-BASIC names like ``err1:`` and ``err2:``, as you would have to
> renumber them if you ever add or remove exit paths, and they make correctness
> difficult to verify anyway.
I noticed that and I agree. I'll do this in a follow-up patch.

Thanks for the review,
Best regards,
Maxim Levitsky




2020-08-27 10:24:19

by Maxim Levitsky

Subject: Re: [PATCH v2 4/7] KVM: x86: allow kvm_x86_ops.set_efer to return a value

On Thu, 2020-08-20 at 17:43 -0700, Sean Christopherson wrote:
> On Thu, Aug 20, 2020 at 02:43:56PM -0700, Jim Mattson wrote:
> > On Thu, Aug 20, 2020 at 6:34 AM Maxim Levitsky <[email protected]> wrote:
> > > This will be used later to return an error when setting this msr fails.
> > >
> > > For VMX, it already has an error condition when EFER is
> > > not in the shared MSR list, so return an error in this case.
> > >
> > > Signed-off-by: Maxim Levitsky <[email protected]>
> > > ---
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -1471,7 +1471,8 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > efer &= ~EFER_LMA;
> > > efer |= vcpu->arch.efer & EFER_LMA;
> > >
> > > - kvm_x86_ops.set_efer(vcpu, efer);
> > > + if (kvm_x86_ops.set_efer(vcpu, efer))
> > > + return 1;
> >
> > This seems like a userspace ABI change to me. Previously, it looks
> > like userspace could always use KVM_SET_MSRS to set MSR_EFER to 0 or
> > EFER_SCE, and it would always succeed. Now, it looks like it will fail
> > on CPUs that don't support EFER in hardware. (Perhaps it should fail,
> > but it didn't before, AFAICT.)
>
> KVM emulates SYSCALL, presumably that also works when EFER doesn't exist in
> hardware.

This is a fair point.
How about checking the return value only when '!msr_info->host_initiated' in set_efer?

This way a userspace-initiated EFER write will work as it did before,
but a guest-initiated write will fail
(and set_efer already checks and fails for many cases).
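
I.e., roughly (untested):

    /* in set_efer(): only fail guest-initiated writes */
    if (kvm_x86_ops.set_efer(vcpu, efer) && !msr_info->host_initiated)
            return 1;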

I also dug a bit into the failure check in VMX, the 'find_msr_entry(vmx, MSR_EFER)'.
If I am not mistaken, this one will only fail when the host doesn't support EFER.
I don't mind ignoring this error as well, as it was before.

>
> The above also adds weirdness to nested VMX as vmx_set_efer() simply can't
> fail.
It will now fail on non-64-bit Intel CPUs that support VMX; I do think such
CPUs have existed for a while. As I said, I'll return 0 when find_msr_entry fails,
thus restoring the previous behavior on Intel.
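
I.e., roughly:

    /* vmx_set_efer(): keep the old behaviour of silently ignoring the
     * write when EFER has no shared MSR slot, instead of failing it.
     */
    if (!msr)
            return 0;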

Best regards,
Maxim Levitsky