2024-02-23 10:57:52

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH v2 00/11] KVM: SEV: allow customizing VMSA features

The idea that no parameter would ever be necessary when enabling SEV or
SEV-ES for a VM was decidedly optimistic. The first source of variability
that was encountered is the desired set of VMSA features, as that affects
the measurement of the VM's initial state and cannot be changed
arbitrarily by the hypervisor.

This series adds all the APIs that are needed to customize the features,
with room for future enhancements:

- a new /dev/kvm device attribute to retrieve the set of supported
features (right now, only debug swap)

- a new sub-operation for KVM_MEM_ENCRYPT_OP that can take a struct,
replacing the existing KVM_SEV_INIT and KVM_SEV_ES_INIT

It then puts the new op to work by including the VMSA features as a field
of the The existing KVM_SEV_INIT and KVM_SEV_ES_INIT use the full set of
supported VMSA features for backwards compatibility; but I am considering
also making them use zero as the feature mask, and will gladly adjust the
patches if so requested.

In order to avoid creating *two* new KVM_MEM_ENCRYPT_OPs, I decided that
I could as well make SEV and SEV-ES use VM types. And then, why not make
a SEV-ES VM, when created with the new VM type instead of KVM_SEV_ES_INIT,
reject KVM_GET_REGS/KVM_SET_REGS and friends on the vCPU file descriptor
once the VMSA has been encrypted... Which is how the API should have
always behaved.

The series is defined as follows:

- patches 1 and 2 are unrelated fixes and improvements for the SEV API

- patches 3 to 5 introduce the new device attribute to retrieve supported
VMSA features

- patch 6 (new in v2) disables DEBUG_SWAP by default

- patches 7 and 8 introduce new infrastructure for VM types, partly lifted
out of the TDX patches

- patches 9 and 10 introduce respectively the new VM types for SEV and
SEV-ES, and KVM_SEV_INIT2 as a new sub-operation for KVM_MEM_ENCRYPT_OP.

- patch 11 tests the new ioctl.

The idea is that SEV SNP will only ever support KVM_SEV_INIT2. I have
patches in progress for QEMU to support this new API.

Thanks,

Paolo

v1->v2:
- fix compilation with SEV disabled (patch 4)
- new patch "KVM: SEV: disable DEBUG_SWAP by default" (patch 6)
- move definition of __KVM_X86_*_TYPE outside uapi headers (patch 7)
- do not export __kvm_is_vm_type_supported (patch 8)
- fixes to documentation (patch 10)
- reject all features for SEV (patch 10)
- do not enable any features for legacy KVM_SEV_INIT/KVM_SEV_ES_INIT (patch 10)


Paolo Bonzini (11):
KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR
Documentation: kvm/sev: separate description of firmware
KVM: SEV: publish supported VMSA features
KVM: SEV: store VMSA features in kvm_sev_info
KVM: SEV: disable DEBUG_SWAP by default
KVM: x86: define standard behavior for bits 0/1 of VM type
KVM: x86: Add is_vm_type_supported callback
KVM: SEV: define VM types for SEV and SEV-ES
KVM: SEV: introduce KVM_SEV_INIT2 operation
selftests: kvm: add tests for KVM_SEV_INIT2

Documentation/virt/kvm/api.rst | 2 +
.../virt/kvm/x86/amd-memory-encryption.rst | 81 +++++++--
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 11 +-
arch/x86/include/uapi/asm/kvm.h | 35 ++++
arch/x86/kvm/svm/sev.c | 110 +++++++++++-
arch/x86/kvm/svm/svm.c | 14 +-
arch/x86/kvm/svm/svm.h | 6 +-
arch/x86/kvm/x86.c | 157 ++++++++++++++----
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 6 +-
.../selftests/kvm/set_memory_region_test.c | 8 +-
.../selftests/kvm/x86_64/sev_init2_tests.c | 146 ++++++++++++++++
13 files changed, 510 insertions(+), 69 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/sev_init2_tests.c

--
2.39.1



2024-02-23 10:58:03

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH v2 07/11] KVM: x86: define standard behavior for bits 0/1 of VM type

Some VM types have characteristics in common; in fact, the only use
of VM types right now is kvm_arch_has_private_mem and it assumes that
_all_ VM types have private memory.

So, let the low bits specify the characteristics of the VM type. As
of we have two special things: whether memory is private, and whether
guest state is protected. The latter is similar to
kvm->arch.guest_state_protected, but the latter is only set on a fully
initialized VM. If both are set, ioctls to set registers will cause
an error---SEV-ES did not do so, which is a problematic API.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 9 +++-
arch/x86/kvm/x86.c | 93 +++++++++++++++++++++++++++------
2 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0bcd9ae16097..15db2697863c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2135,8 +2135,15 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
int tdp_max_root_level, int tdp_huge_page_level);

+
+/* Low bits of VM types provide confidential computing capabilities. */
+#define __KVM_X86_PRIVATE_MEM_TYPE 1
+#define __KVM_X86_PROTECTED_STATE_TYPE 2
+#define __KVM_X86_VM_TYPE_FEATURES 3
+static_assert((KVM_X86_SW_PROTECTED_VM & __KVM_X86_VM_TYPE_FEATURES) == __KVM_X86_PRIVATE_MEM_TYPE);
+
#ifdef CONFIG_KVM_PRIVATE_MEM
-#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != KVM_X86_DEFAULT_VM)
+#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type & __KVM_X86_PRIVATE_MEM_TYPE)
#else
#define kvm_arch_has_private_mem(kvm) false
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8746530930d5..e634e5b67516 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5526,21 +5526,30 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
return 0;
}

-static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
- struct kvm_debugregs *dbgregs)
+static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs)
{
unsigned long val;

+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
memset(dbgregs, 0, sizeof(*dbgregs));
memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
kvm_get_dr(vcpu, 6, &val);
dbgregs->dr6 = val;
dbgregs->dr7 = vcpu->arch.dr7;
+ return 0;
}

static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
struct kvm_debugregs *dbgregs)
{
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
if (dbgregs->flags)
return -EINVAL;

@@ -5559,9 +5568,13 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
}


-static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
- u8 *state, unsigned int size)
+static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
+ u8 *state, unsigned int size)
{
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ fpstate_is_confidential(&vcpu->arch.guest_fpu))
+ return -EINVAL;
+
/*
* Only copy state for features that are enabled for the guest. The
* state itself isn't problematic, but setting bits in the header for
@@ -5578,22 +5591,27 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
XFEATURE_MASK_FPSSE;

if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
- return;
+ return 0;

fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
supported_xcr0, vcpu->arch.pkru);
+ return 0;
}

-static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
- struct kvm_xsave *guest_xsave)
+static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
+ struct kvm_xsave *guest_xsave)
{
- kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
- sizeof(guest_xsave->region));
+ return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
+ sizeof(guest_xsave->region));
}

static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ fpstate_is_confidential(&vcpu->arch.guest_fpu))
+ return -EINVAL;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return 0;

@@ -5603,18 +5621,23 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
&vcpu->arch.pkru);
}

-static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
- struct kvm_xcrs *guest_xcrs)
+static int kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
+ struct kvm_xcrs *guest_xcrs)
{
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
if (!boot_cpu_has(X86_FEATURE_XSAVE)) {
guest_xcrs->nr_xcrs = 0;
- return;
+ return 0;
}

guest_xcrs->nr_xcrs = 1;
guest_xcrs->flags = 0;
guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK;
guest_xcrs->xcrs[0].value = vcpu->arch.xcr0;
+ return 0;
}

static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
@@ -5622,6 +5645,10 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
{
int i, r = 0;

+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
if (!boot_cpu_has(X86_FEATURE_XSAVE))
return -EINVAL;

@@ -6010,7 +6037,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
case KVM_GET_DEBUGREGS: {
struct kvm_debugregs dbgregs;

- kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
+ r = kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
+ if (r < 0)
+ break;

r = -EFAULT;
if (copy_to_user(argp, &dbgregs,
@@ -6040,7 +6069,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (!u.xsave)
break;

- kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
+ r = kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
+ if (r < 0)
+ break;

r = -EFAULT;
if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave)))
@@ -6069,7 +6100,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (!u.xsave)
break;

- kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+ r = kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+ if (r < 0)
+ break;

r = -EFAULT;
if (copy_to_user(argp, u.xsave, size))
@@ -6085,7 +6118,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (!u.xcrs)
break;

- kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
+ r = kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
+ if (r < 0)
+ break;

r = -EFAULT;
if (copy_to_user(argp, u.xcrs,
@@ -6229,6 +6264,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
}
#endif
case KVM_GET_SREGS2: {
+ r = -EINVAL;
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ goto out;
+
u.sregs2 = kzalloc(sizeof(struct kvm_sregs2), GFP_KERNEL);
r = -ENOMEM;
if (!u.sregs2)
@@ -6241,6 +6281,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break;
}
case KVM_SET_SREGS2: {
+ r = -EINVAL;
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ goto out;
+
u.sregs2 = memdup_user(argp, sizeof(struct kvm_sregs2));
if (IS_ERR(u.sregs2)) {
r = PTR_ERR(u.sregs2);
@@ -11466,6 +11511,10 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)

int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
vcpu_load(vcpu);
__get_regs(vcpu, regs);
vcpu_put(vcpu);
@@ -11507,6 +11556,10 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)

int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
vcpu_load(vcpu);
__set_regs(vcpu, regs);
vcpu_put(vcpu);
@@ -11579,6 +11632,10 @@ static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
vcpu_load(vcpu);
__get_sregs(vcpu, sregs);
vcpu_put(vcpu);
@@ -11846,6 +11903,10 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
{
int ret;

+ if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
vcpu_load(vcpu);
ret = __set_sregs(vcpu, sregs);
vcpu_put(vcpu);
--
2.39.1



2024-02-23 14:54:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] KVM: SEV: allow customizing VMSA features

On Fri, Feb 23, 2024, Paolo Bonzini wrote:
> Paolo Bonzini (11):
> KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
> KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR
> Documentation: kvm/sev: separate description of firmware
> KVM: SEV: publish supported VMSA features
> KVM: SEV: store VMSA features in kvm_sev_info
> KVM: SEV: disable DEBUG_SWAP by default
> KVM: x86: define standard behavior for bits 0/1 of VM type
> KVM: x86: Add is_vm_type_supported callback
> KVM: SEV: define VM types for SEV and SEV-ES
> KVM: SEV: introduce KVM_SEV_INIT2 operation
> selftests: kvm: add tests for KVM_SEV_INIT2
>
> Documentation/virt/kvm/api.rst | 2 +
> .../virt/kvm/x86/amd-memory-encryption.rst | 81 +++++++--
> arch/x86/include/asm/kvm-x86-ops.h | 2 +
> arch/x86/include/asm/kvm_host.h | 11 +-
> arch/x86/include/uapi/asm/kvm.h | 35 ++++
> arch/x86/kvm/svm/sev.c | 110 +++++++++++-
> arch/x86/kvm/svm/svm.c | 14 +-
> arch/x86/kvm/svm/svm.h | 6 +-
> arch/x86/kvm/x86.c | 157 ++++++++++++++----
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/include/kvm_util_base.h | 6 +-
> .../selftests/kvm/set_memory_region_test.c | 8 +-
> .../selftests/kvm/x86_64/sev_init2_tests.c | 146 ++++++++++++++++
> 13 files changed, 510 insertions(+), 69 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/x86_64/sev_init2_tests.c

FYI, there are 4-5 minor conflicts with kvm-x86/next, and going off my memory, I
think the conflicts come from ~3 different topic branches.

Given that this is based on kvm/next, I assume it's destined for 6.9. So maybe
rebase on kvm-x86/next for v3, and then I'll get my 6.9 pull requests sent for
the conflicting branches early next week so that this can land in a topic branch
that's based on kvm/next?

2024-02-23 15:05:54

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] KVM: SEV: allow customizing VMSA features

On 2/23/24 15:52, Sean Christopherson wrote:
>
> Given that this is based on kvm/next, I assume it's destined for 6.9. So maybe
> rebase on kvm-x86/next for v3, and then I'll get my 6.9 pull requests sent for
> the conflicting branches early next week so that this can land in a topic branch
> that's based on kvm/next?

Ok, sounds good.

Paolo


2024-02-23 16:46:53

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 07/11] KVM: x86: define standard behavior for bits 0/1 of VM type

On Fri, Feb 23, 2024, Paolo Bonzini wrote:
> Some VM types have characteristics in common; in fact, the only use
> of VM types right now is kvm_arch_has_private_mem and it assumes that
> _all_ VM types have private memory.
>
> So, let the low bits specify the characteristics of the VM type. As
> of we have two special things: whether memory is private, and whether
> guest state is protected. The latter is similar to
> kvm->arch.guest_state_protected, but the latter is only set on a fully
> initialized VM. If both are set, ioctls to set registers will cause
> an error---SEV-ES did not do so, which is a problematic API.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> Message-Id: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 9 +++-
> arch/x86/kvm/x86.c | 93 +++++++++++++++++++++++++++------
> 2 files changed, 85 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0bcd9ae16097..15db2697863c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2135,8 +2135,15 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
> void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> int tdp_max_root_level, int tdp_huge_page_level);
>
> +
> +/* Low bits of VM types provide confidential computing capabilities. */
> +#define __KVM_X86_PRIVATE_MEM_TYPE 1
> +#define __KVM_X86_PROTECTED_STATE_TYPE 2
> +#define __KVM_X86_VM_TYPE_FEATURES 3
> +static_assert((KVM_X86_SW_PROTECTED_VM & __KVM_X86_VM_TYPE_FEATURES) == __KVM_X86_PRIVATE_MEM_TYPE);

Aliasing bit 0 to KVM_X86_SW_PROTECTED_VM is gross, e.g. this

#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
+#define KVM_X86_SEV_VM 8
+#define KVM_X86_SEV_ES_VM 10

is _super_ confusing and bound to cause problems.

Oh, good gravy, you're also aliasing __KVM_X86_PROTECTED_STATE_TYPE into SEV_ES_VM.
Curse you and your Rami Code induced decimal-based bitwise shenanigans!!!

I don't see any reason to bleed the flags into KVM's ABI. Even shoving the flags
into kvm->arch.vm_type is unncessary. Aha! As is storing vm_type as an "unsigned
long", since (a) it can't ever be bigger than u32, and (b) in practice a u8 will
suffice since as Mike pointed out we're effectively limited to 31 types before
kvm_vm_ioctl_check_extension() starts returning error codes.

So I vote to skip the aliasing and simply do:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff23712f1f3f..27265ff5fd29 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1279,7 +1279,9 @@ enum kvm_apicv_inhibit {
};

struct kvm_arch {
- unsigned long vm_type;
+ u8 vm_type;
+ bool has_private_memory;
+ bool has_protected_state;
unsigned long n_used_mmu_pages;
unsigned long n_requested_mmu_pages;
unsigned long n_max_mmu_pages;

and then just use straight incrementing values for types, i.e.

#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3

It'll require a bit of extra work in kvm_arch_init_vm(), but I think the end result
will be signifincatly easier to follow.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8746530930d5..e634e5b67516 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5526,21 +5526,30 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> return 0;
> }
>
> -static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> - struct kvm_debugregs *dbgregs)
> +static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> + struct kvm_debugregs *dbgregs)
> {
> unsigned long val;
>
> + if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
> + vcpu->arch.guest_state_protected)
> + return -EINVAL;
> +
> memset(dbgregs, 0, sizeof(*dbgregs));
> memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db));
> kvm_get_dr(vcpu, 6, &val);
> dbgregs->dr6 = val;
> dbgregs->dr7 = vcpu->arch.dr7;
> + return 0;
> }

> static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
> @@ -5622,6 +5645,10 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
> {
> int i, r = 0;
>
> + if ((vcpu->kvm->arch.vm_type & __KVM_X86_PROTECTED_STATE_TYPE) &&
> + vcpu->arch.guest_state_protected)
> + return -EINVAL;
> +
> if (!boot_cpu_has(X86_FEATURE_XSAVE))
> return -EINVAL;
>
> @@ -6010,7 +6037,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> case KVM_GET_DEBUGREGS: {
> struct kvm_debugregs dbgregs;
>
> - kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> + r = kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> + if (r < 0)

I would strongly prefer to just do

if (r)

as "r < 0" implies that postive return values are possible/allowed.

That said, rather than a mix of open coding checks in kvm_arch_vcpu_ioctl() versus
burying checks in helpers, what about adding a dedicated helper to take care of
everything in one shot? E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bc69b0c9822..f5ca78e75eec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5864,6 +5864,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
}

+static bool kvm_ioctl_accesses_guest_state(unsigned int ioctl)
+{
+ switch (ioctl) {
+ case <...>:
+ return true;
+ default:
+ return false:
+ }
+}
+
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -5878,6 +5888,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
void *buffer;
} u;

+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected &&
+ kvm_ioctl_accesses_guest_state(ioctl))
+ return -EINVAL;
+
vcpu_load(vcpu);

u.buffer = NULL;

It'll be a much smaller diff, and hopefully easier to audit, too.

2024-02-23 16:48:40

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 07/11] KVM: x86: define standard behavior for bits 0/1 of VM type

On Fri, Feb 23, 2024 at 5:46 PM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Feb 23, 2024, Paolo Bonzini wrote:
> > Some VM types have characteristics in common; in fact, the only use
> > of VM types right now is kvm_arch_has_private_mem and it assumes that
> > _all_ VM types have private memory.
> >
> > So, let the low bits specify the characteristics of the VM type. As
> > of we have two special things: whether memory is private, and whether
> > guest state is protected. The latter is similar to
> > kvm->arch.guest_state_protected, but the latter is only set on a fully
> > initialized VM. If both are set, ioctls to set registers will cause
> > an error---SEV-ES did not do so, which is a problematic API.
> >
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > Message-Id: <[email protected]>
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 9 +++-
> > arch/x86/kvm/x86.c | 93 +++++++++++++++++++++++++++------
> > 2 files changed, 85 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 0bcd9ae16097..15db2697863c 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2135,8 +2135,15 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
> > void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> > int tdp_max_root_level, int tdp_huge_page_level);
> >
> > +
> > +/* Low bits of VM types provide confidential computing capabilities. */
> > +#define __KVM_X86_PRIVATE_MEM_TYPE 1
> > +#define __KVM_X86_PROTECTED_STATE_TYPE 2
> > +#define __KVM_X86_VM_TYPE_FEATURES 3
> > +static_assert((KVM_X86_SW_PROTECTED_VM & __KVM_X86_VM_TYPE_FEATURES) == __KVM_X86_PRIVATE_MEM_TYPE);
>
> Aliasing bit 0 to KVM_X86_SW_PROTECTED_VM is gross, e.g. this
>
> #define KVM_X86_DEFAULT_VM 0
> #define KVM_X86_SW_PROTECTED_VM 1
> +#define KVM_X86_SEV_VM 8
> +#define KVM_X86_SEV_ES_VM 10
>
> is _super_ confusing and bound to cause problems.
>
> Oh, good gravy, you're also aliasing __KVM_X86_PROTECTED_STATE_TYPE into SEV_ES_VM.
> Curse you and your Rami Code induced decimal-based bitwise shenanigans!!!

v1 was less gross but Mike talked me into this one. :)

> I don't see any reason to bleed the flags into KVM's ABI. Even shoving the flags
> into kvm->arch.vm_type is unncessary. Aha! As is storing vm_type as an "unsigned
> long", since (a) it can't ever be bigger than u32, and (b) in practice a u8 will
> suffice since as Mike pointed out we're effectively limited to 31 types before
> kvm_vm_ioctl_check_extension() starts returning error codes.
>
> So I vote to skip the aliasing and simply do:

Fair enough.

Paolo