2021-04-29 10:49:47

by Paolo Bonzini

Subject: [PATCH v3 0/2] KVM: x86: guest interface for SEV live migration

This is a reviewed version of the guest interface (hypercall+MSR)
for SEV live migration. The differences lie mostly in the API
for userspace. In particular:

- the CPUID feature is not exposed in KVM_GET_SUPPORTED_CPUID

- the hypercall must be enabled manually with KVM_ENABLE_CAP

- the MSR has sensible behavior if not filtered

Compared to v2, the KVM-provided behavior of the MSR is different:
it is set to 0 if the guest memory is encrypted, and 1 if it is not.
The idea is that the MSR is read-only if KVM_FEATURE_HC_PAGE_ENC_STATUS
is not exposed to the guest (it should only be exposed if the guest has
encrypted memory), but it also has a sensible value for non-encrypted
guests. QEMU could however expose a "0" value for the special "-cpu
host,migratable=no" mode if it wanted.

Because of this new behavior, the CPUID bits are split.

Paolo

Ashish Kalra (1):
KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

Paolo Bonzini (1):
KVM: x86: add MSR_KVM_MIGRATION_CONTROL

Documentation/virt/kvm/api.rst | 14 +++++++
Documentation/virt/kvm/cpuid.rst | 9 +++++
Documentation/virt/kvm/hypercalls.rst | 21 ++++++++++
Documentation/virt/kvm/msr.rst | 14 +++++++
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 3 ++
arch/x86/include/uapi/asm/kvm_para.h | 5 +++
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/vmx/vmx.c | 7 ++++
arch/x86/kvm/x86.c | 56 +++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
include/uapi/linux/kvm_para.h | 1 +
13 files changed, 135 insertions(+), 1 deletion(-)

--
2.26.2
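
Patch 2/2 documents that KVM_CHECK_EXTENSION on KVM_CAP_EXIT_HYPERCALL
returns a bitmask of hypercalls that can exit to userspace. As a rough
illustration of how a VMM might interpret that return value before calling
KVM_ENABLE_CAP, here is a minimal sketch. The helper name
hypercall_exit_supported() is hypothetical, and the function only models the
bit test; it does not issue any ioctl.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypercall number taken from patch 2/2 in this series. */
#define KVM_HC_PAGE_ENC_STATUS 12

/*
 * Hypothetical helper: interpret the int returned by
 * KVM_CHECK_EXTENSION(KVM_CAP_EXIT_HYPERCALL) as a bitmask and test
 * whether hypercall number nr is set. A return value of zero or less
 * is treated as "capability not available".
 */
static bool hypercall_exit_supported(int check_ext_ret, unsigned int nr)
{
    if (check_ext_ret <= 0 || nr >= 31)
        return false;
    return (((uint32_t)check_ext_ret >> nr) & 1u) != 0;
}
```

A VMM would presumably only advertise KVM_FEATURE_HC_PAGE_ENC_STATUS to the
guest when this test succeeds and the capability has then been enabled.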


2021-04-29 10:50:26

by Paolo Bonzini

Subject: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

From: Ashish Kalra <[email protected]>

This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.

The hypercall exits to userspace to manage the guest shared regions and
integrate with the userspace VMM's migration code.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Steve Rutherford <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <93d7f2c2888315adc48905722574d89699edde33.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virt/kvm/api.rst | 14 +++++++++
Documentation/virt/kvm/cpuid.rst | 6 ++++
Documentation/virt/kvm/hypercalls.rst | 21 ++++++++++++++
Documentation/virt/kvm/msr.rst | 7 ++++-
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kvm/x86.c | 42 +++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
include/uapi/linux/kvm_para.h | 1 +
9 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 37c520ddb7e8..f8794fed23a4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6891,3 +6891,17 @@ This capability is always enabled.
This capability indicates that the KVM virtual PTP service is
supported in the host. A VMM can check whether the service is
available to the guest on migration.
+
+8.33 KVM_CAP_EXIT_HYPERCALL
+---------------------------
+
+:Capability: KVM_CAP_EXIT_HYPERCALL
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to exit to userspace
+with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
+Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS.
+
+Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
+of hypercalls that will exit to userspace.
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index c53d7e2b8ff4..014ceffc46f2 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -99,6 +99,12 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit
KVM_FEATURE_MIGRATION_CONTROL 16 guest checks this feature bit before
using MSR_KVM_MIGRATION_CONTROL

+KVM_FEATURE_HC_PAGE_ENC_STATUS 17 guest checks this feature bit before
+ using the page encryption state
+ hypercall to notify the page state
+ change, and before modifying bit 0 of
+ MSR_KVM_MIGRATION_CONTROL
+
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side
per-cpu warps are expected in
kvmclock
diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index ed4fddd364ea..117ff3b27d3c 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -169,3 +169,24 @@ a0: destination APIC ID

:Usage example: When sending a call-function IPI-many to vCPUs, yield if
any of the IPI target vCPUs was preempted.
+
+
+8. KVM_HC_PAGE_ENC_STATUS
+-------------------------
+:Architecture: x86
+:Status: active
+:Purpose: Notify the encryption status changes in guest page table (SEV guest)
+
+a0: the guest physical address of the start page
+a1: the number of pages
+a2: page encryption status
+
+ Where:
+ * 1: Page is encrypted
+ * 0: Page is decrypted
+
+**Implementation note**: this hypercall is implemented in userspace via
+the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
+before advertising KVM_FEATURE_HC_PAGE_ENC_STATUS in the guest CPUID. In
+addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
+must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 57fc4090031a..cf1b0b2099b0 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
data:
This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
CPUID. Bit 0 represents whether live migration of the guest is allowed.
+
When a guest is started, bit 0 will be 1 if the guest has encrypted
- memory and 0 if the guest does not have encrypted memory.
+ memory and 0 if the guest does not have encrypted memory. If the
+ guest is communicating page encryption status to the host using the
+ ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
+ allow live migration of the guest. The MSR is read-only if
+ ``KVM_FEATURE_HC_PAGE_ENC_STATUS`` is not advertised to the guest.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5491fc617451..9b90a0faeab4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1065,6 +1065,8 @@ struct kvm_arch {
u32 user_space_msr_mask;
struct kvm_x86_msr_filter __rcu *msr_filter;

+ bool hypercall_exit_enabled;
+
/* Guest can access the SGX PROVISIONKEY. */
bool sgx_provisioning_allowed;

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 21390fccfb90..8fadae64bc66 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -34,6 +34,7 @@
#define KVM_FEATURE_ASYNC_PF_INT 14
#define KVM_FEATURE_MSI_EXT_DEST_ID 15
#define KVM_FEATURE_MIGRATION_CONTROL 16
+#define KVM_FEATURE_HC_PAGE_ENC_STATUS 17

#define KVM_HINTS_REALTIME 0

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9c40be9235c..0c2524bbaa84 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
return 1;

+ /*
+ * This implementation is only good if userspace has *not*
+ * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS. If userspace
+ * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
+ * MSR filter in order to accept writes that change bit 0.
+ */
if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
return 1;
break;
@@ -3842,6 +3848,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
r = 1;
break;
+ case KVM_CAP_EXIT_HYPERCALL:
+ r = (1 << KVM_HC_PAGE_ENC_STATUS);
+ break;
case KVM_CAP_SET_GUEST_DEBUG2:
return KVM_GUESTDBG_VALID_MASK;
#ifdef CONFIG_KVM_XEN
@@ -5444,6 +5453,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
}
#endif
+ case KVM_CAP_EXIT_HYPERCALL:
+ kvm->arch.hypercall_exit_enabled = cap->args[0];
+ r = 0;
+ break;
case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
r = -EINVAL;
if (kvm_x86_ops.vm_copy_enc_context_from)
@@ -8329,6 +8342,13 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
return;
}

+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
+{
+ kvm_rax_write(vcpu, vcpu->run->hypercall.ret);
+ ++vcpu->stat.hypercalls;
+ return kvm_skip_emulated_instruction(vcpu);
+}
+
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
{
unsigned long nr, a0, a1, a2, a3, ret;
@@ -8394,6 +8414,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
kvm_sched_yield(vcpu, a0);
ret = 0;
break;
+ case KVM_HC_PAGE_ENC_STATUS: {
+ u64 gpa = a0, npages = a1, enc = a2;
+
+ ret = -KVM_ENOSYS;
+ if (!vcpu->kvm->arch.hypercall_exit_enabled)
+ break;
+
+ if (!PAGE_ALIGNED(gpa) || !npages ||
+ gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+ ret = -EINVAL;
+ break;
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS;
+ vcpu->run->hypercall.args[0] = gpa;
+ vcpu->run->hypercall.args[1] = npages;
+ vcpu->run->hypercall.args[2] = enc;
+ vcpu->run->hypercall.longmode = op_64_bit;
+ vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+ return 0;
+ }
default:
ret = -KVM_ENOSYS;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..1fb4fd863324 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_SGX_ATTRIBUTE 196
#define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
#define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_EXIT_HYPERCALL 199

#ifdef KVM_CAP_IRQ_ROUTING

diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..847b83b75dc8 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@
#define KVM_HC_CLOCK_PAIRING 9
#define KVM_HC_SEND_IPI 10
#define KVM_HC_SCHED_YIELD 11
+#define KVM_HC_PAGE_ENC_STATUS 12

/*
* hypercalls use architecture specific
--
2.26.2
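
The argument check in kvm_emulate_hypercall() and the userspace side that
the commit message mentions ("manage the guest shared regions") can be
modeled outside the kernel. The following is only a sketch under stated
assumptions: struct enc_tracker, tracker_update() and TRACKED_PAGES are
hypothetical names for a toy VMM, and 4 KiB pages are assumed. The validity
check mirrors the three conditions in the patch (page-aligned start,
non-zero length, no gfn wraparound).

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT    12                          /* assumed: 4 KiB pages */
#define PAGE_SIZE     (UINT64_C(1) << PAGE_SHIFT)
#define TRACKED_PAGES 64                          /* hypothetical toy guest size */

/* Same three conditions as the patch: aligned, non-empty, no wraparound. */
static bool page_enc_args_valid(uint64_t gpa, uint64_t npages)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;

    if (gpa & (PAGE_SIZE - 1))
        return false;
    if (npages == 0)
        return false;
    if (gfn + npages <= gfn)                      /* unsigned wraparound */
        return false;
    return true;
}

/* Hypothetical VMM-side record: false means encrypted, the default. */
struct enc_tracker {
    bool shared[TRACKED_PAGES];
};

/*
 * Apply one KVM_HC_PAGE_ENC_STATUS notification (a0 = gpa, a1 = npages,
 * a2 = enc, where 1 is encrypted and 0 is decrypted).
 * Returns 0 on success, -1 if the arguments are invalid or out of range.
 */
static int tracker_update(struct enc_tracker *t, uint64_t gpa,
                          uint64_t npages, uint64_t enc)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;

    if (!page_enc_args_valid(gpa, npages))
        return -1;
    if (gfn >= TRACKED_PAGES || npages > TRACKED_PAGES - gfn)
        return -1;

    for (uint64_t i = 0; i < npages; i++)
        t->shared[gfn + i] = (enc == 0);
    return 0;
}
```

A zero-initialized tracker treats every page as encrypted, matching the
default stated in the commit message. A real VMM would size the record from
its memory slots rather than from a fixed constant.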

2021-04-30 20:29:15

by Sean Christopherson

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 57fc4090031a..cf1b0b2099b0 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
> data:
> This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
> CPUID. Bit 0 represents whether live migration of the guest is allowed.
> +
> When a guest is started, bit 0 will be 1 if the guest has encrypted
> - memory and 0 if the guest does not have encrypted memory.
> + memory and 0 if the guest does not have encrypted memory. If the
> + guest is communicating page encryption status to the host using the
> + ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
> + allow live migration of the guest. The MSR is read-only if
> + ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.

I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
in any way shape or form. I can understand making it read-only or dropping
writes if it's not intercepted by userspace, but making it read-only for
non-encrypted guests makes it useful only for encrypted guests, which defeats
the purpose of genericizing the MSR.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e9c40be9235c..0c2524bbaa84 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
> return 1;
>
> + /*
> + * This implementation is only good if userspace has *not*
> + * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS. If userspace
> + * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
> + * MSR filter in order to accept writes that change bit 0.
> + */
> if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
> return 1;

This behavior doesn't match the documentation.

a. The MSR is not read-only for legacy guests since they can write '0'.
b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised,
a guest with encrypted memory can write '1' regardless of whether userspace
has enabled KVM_FEATURE_HC_PAGE_STATUS.
c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
bit 0, but not clear it. This doesn't seem intentional?

Why not simply drop writes? E.g.

if (data & ~KVM_MIGRATION_READY)
return 1;
break;

And then do "msr->data = 0;" in the read path. That's just as effective as
making the MSR read-only to force userspace to intercept the MSR if it wants to
do anything useful with the information, and it's easy to document.

> break;
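
To make the two write paths easier to compare, here is a small standalone
model. It is a sketch, not kernel code: wrmsr_check_v3() reproduces the
comparison exactly as it is quoted from the patch, wrmsr_check_drop() and
rdmsr_drop() model the "drop writes, read as 0" alternative, and the value
of KVM_MIGRATION_READY (bit 0) is an assumption since its definition is not
shown in this thread. A return value of 1 stands for "reject the write",
as in kvm_set_msr_common().

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_MIGRATION_READY (UINT64_C(1) << 0)    /* assumed: bit 0 */

/*
 * Check as quoted from the patch: the only value that is accepted is
 * !has_encrypted_memory; any other value is rejected.
 */
static int wrmsr_check_v3(uint64_t data, bool has_encrypted_memory)
{
    return data != (uint64_t)!has_encrypted_memory;
}

/*
 * Suggested alternative: reject only writes that touch reserved bits,
 * and otherwise silently discard the written value.
 */
static int wrmsr_check_drop(uint64_t data)
{
    return (data & ~KVM_MIGRATION_READY) != 0;
}

/* Read side of the alternative: KVM itself always reports 0. */
static uint64_t rdmsr_drop(void)
{
    return 0;
}
```

With the first function the accepted value depends on whether the guest has
encrypted memory; with the second, both 0 and 1 are accepted regardless, and
userspace has to intercept the MSR to give the bit any meaning.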

2021-05-01 09:11:03

by Paolo Bonzini

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On 30/04/21 22:10, Sean Christopherson wrote:
> On Thu, Apr 29, 2021, Paolo Bonzini wrote:
>> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
>> index 57fc4090031a..cf1b0b2099b0 100644
>> --- a/Documentation/virt/kvm/msr.rst
>> +++ b/Documentation/virt/kvm/msr.rst
>> @@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
>> data:
>> This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
>> CPUID. Bit 0 represents whether live migration of the guest is allowed.
>> +
>> When a guest is started, bit 0 will be 1 if the guest has encrypted
>> - memory and 0 if the guest does not have encrypted memory.
>> + memory and 0 if the guest does not have encrypted memory. If the
>> + guest is communicating page encryption status to the host using the
>> + ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
>> + allow live migration of the guest. The MSR is read-only if
>> + ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.
>
> I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
> in any way shape or form. I can understand making it read-only or dropping
> writes if it's not intercepted by userspace, but making it read-only for
> non-encrypted guests makes it useful only for encrypted guests, which defeats
> the purpose of genericizing the MSR.

Yeah, I see your point. On the other hand by making it unconditionally
writable we must implement the writability in KVM, because a read-only
implementation would not comply with the spec.

>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e9c40be9235c..0c2524bbaa84 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
>> return 1;
>>
>> + /*
>> + * This implementation is only good if userspace has *not*
>> + * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS. If userspace
>> + * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
>> + * MSR filter in order to accept writes that change bit 0.
>> + */
>> if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
>> return 1;
>
> This behavior doesn't match the documentation.
>
> a. The MSR is not read-only for legacy guests since they can write '0'.
> b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised,
> a guest with encrypted memory can write '1' regardless of whether userspace
> has enabled KVM_FEATURE_HC_PAGE_STATUS.

Right, I should have said "not changeable" rather than "read-only".

> c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
> bit 0, but not clear it. This doesn't seem intentional?

It is intentional, clearing it would mean preserving the value in the
kernel so that userspace can read it.

So... I don't know, all in all having both the separate CPUID and the
userspace implementation reeks of overengineering. It should be either
of these:

- separate CPUID bit, MSR unconditionally writable and implemented in
KVM. Userspace is expected to ignore the MSR value for encrypted guests
unless KVM_FEATURE_HC_PAGE_STATUS is exposed. Userspace should respect
it even for unencrypted guests (not a migration-DoS vector, because
userspace can just not expose the feature).

- make it completely independent from migration, i.e. it's just a facet
of MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It
would use CPUID bit as the encryption status bitmap and have no code at
all in KVM (userspace needs to set up the filter and implement everything).

At this point I very much prefer the latter, which is basically Ashish's
earlier patch.

Paolo

> Why not simply drop writes? E.g.
>
> if (data & ~KVM_MIGRATION_READY)
> return 1;
> break;
>
> And then do "msr->data = 0;" in the read path. That's just as effective as
> making the MSR read-only to force userspace to intercept the MSR if it wants to
> do anything useful with the information, and it's easy to document.

2021-05-03 23:24:13

by Steve Rutherford

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On Sat, May 1, 2021 at 2:01 AM Paolo Bonzini <[email protected]> wrote:
>
> On 30/04/21 22:10, Sean Christopherson wrote:
> > On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> >> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> >> index 57fc4090031a..cf1b0b2099b0 100644
> >> --- a/Documentation/virt/kvm/msr.rst
> >> +++ b/Documentation/virt/kvm/msr.rst
> >> @@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
> >> data:
> >> This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
> >> CPUID. Bit 0 represents whether live migration of the guest is allowed.
> >> +
> >> When a guest is started, bit 0 will be 1 if the guest has encrypted
> >> - memory and 0 if the guest does not have encrypted memory.
> >> + memory and 0 if the guest does not have encrypted memory. If the
> >> + guest is communicating page encryption status to the host using the
> >> + ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
> >> + allow live migration of the guest. The MSR is read-only if
> >> + ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.
> >
> > I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
> > in any way shape or form. I can understand making it read-only or dropping
> > writes if it's not intercepted by userspace, but making it read-only for
> > non-encrypted guests makes it useful only for encrypted guests, which defeats
> > the purpose of genericizing the MSR.
>
> Yeah, I see your point. On the other hand by making it unconditionally
> writable we must implement the writability in KVM, because a read-only
> implementation would not comply with the spec.
>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index e9c40be9235c..0c2524bbaa84 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >> if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
> >> return 1;
> >>
> >> + /*
> >> + * This implementation is only good if userspace has *not*
> >> + * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS. If userspace
> >> + * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
> >> + * MSR filter in order to accept writes that change bit 0.
> >> + */
> >> if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
> >> return 1;
> >
> > This behavior doesn't match the documentation.
> >
> > a. The MSR is not read-only for legacy guests since they can write '0'.
> > b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised,
> > a guest with encrypted memory can write '1' regardless of whether userspace
> > has enabled KVM_FEATURE_HC_PAGE_STATUS.
>
> Right, I should have said "not changeable" rather than "read-only".
>
> > c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
> > bit 0, but not clear it. This doesn't seem intentional?
>
> It is intentional, clearing it would mean preserving the value in the
> kernel so that userspace can read it.
>
> So... I don't know, all in all having both the separate CPUID and the
> userspace implementation reeks of overengineering. It should be either
> of these:
>
> - separate CPUID bit, MSR unconditionally writable and implemented in
> KVM. Userspace is expected to ignore the MSR value for encrypted guests
> unless KVM_FEATURE_HC_PAGE_STATUS is exposed. Userspace should respect
> it even for unencrypted guests (not a migration-DoS vector, because
> userspace can just not expose the feature).
>
> - make it completely independent from migration, i.e. it's just a facet
> of MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It
> would use CPUID bit as the encryption status bitmap and have no code at
> all in KVM (userspace needs to set up the filter and implement everything).
As far as I know, because of MSR filtering, the only "code" that needs
to be in KVM for MSR handling is a #define reserving the PV feature
number and a #define for the MSR number.

Arguably, you don't even need to add the new PV bits to the supported
cpuid, since MSR filtering is really what determines if kernel support
is present.

>
> At this point I very much prefer the latter, which is basically Ashish's
> earlier patch.
The minor distinction would be that if you expose the cpuid bit to the
guest you plan on intercepting the MSR with filters, and would not
need any handler code in the kernel.

Steve
>
> Paolo

2021-05-04 09:18:11

by Paolo Bonzini

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On 04/05/21 01:22, Steve Rutherford wrote:
> As far as I know, because of MSR filtering, the only "code" that needs
> to be in KVM for MSR handling is a #define reserving the PV feature
> number and a #define for the MSR number.
>
> Arguably, you don't even need to add the new PV bits to the supported
> cpuid, since MSR filtering is really what determines if kernel support
> is present.

Not only do I not need to do that, I must not. :)

>> At this point I very much prefer the latter, which is basically Ashish's
>> earlier patch.
>
> The minor distinction would be that if you expose the cpuid bit to the
> guest you plan on intercepting the MSR with filters, and would not
> need any handler code in the kernel.

Yep, and it's not a minor distinction after all (especially from the PoV
of the guy who actually ends up maintaining the code, i.e. me), so
that's what I'm going for.

Paolo
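
For illustration, the userspace-only approach (MSR intercepted through a
filter, no handler in KVM) could look roughly like the model below. This is
a sketch under assumptions, not QEMU or KVM code: struct vmm_msr_state and
the vmm_handle_*() helpers are hypothetical, the MSR index is a placeholder
because its value is not shown in this thread, and KVM_MIGRATION_READY is
assumed to be bit 0. A return value of 1 means the VMM would report an
error for the access so that the guest sees a fault.

```c
#include <stdint.h>

#define MSR_KVM_MIGRATION_CONTROL 0x4b564d08u     /* placeholder index */
#define KVM_MIGRATION_READY       (UINT64_C(1) << 0)  /* assumed: bit 0 */

/* Hypothetical per-VM state kept entirely in the VMM. */
struct vmm_msr_state {
    uint64_t migration_control;
};

/* Handle a filtered guest WRMSR; returns 0 on success, 1 to fault. */
static int vmm_handle_wrmsr(struct vmm_msr_state *s, uint32_t index,
                            uint64_t data)
{
    if (index != MSR_KVM_MIGRATION_CONTROL)
        return 1;
    if (data & ~KVM_MIGRATION_READY)              /* reserved bits set */
        return 1;
    s->migration_control = data;
    return 0;
}

/* Handle a filtered guest RDMSR; returns 0 on success, 1 to fault. */
static int vmm_handle_rdmsr(const struct vmm_msr_state *s, uint32_t index,
                            uint64_t *data)
{
    if (index != MSR_KVM_MIGRATION_CONTROL)
        return 1;
    *data = s->migration_control;
    return 0;
}

/* Migration code consults the stored bit; nothing lives in the kernel. */
static int vmm_migration_allowed(const struct vmm_msr_state *s)
{
    return (s->migration_control & KVM_MIGRATION_READY) != 0;
}
```

Because the value is stored by the VMM, the guest can both set and clear
the bit and read it back, without KVM having to preserve anything.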

2021-05-04 17:19:08

by Sean Christopherson

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On Sat, May 01, 2021, Paolo Bonzini wrote:
> - make it completely independent from migration, i.e. it's just a facet of
> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It would
> use CPUID bit as the encryption status bitmap and have no code at all in KVM
> (userspace needs to set up the filter and implement everything).

If the bit is purely a "page encryption status is up-to-date", what about
overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well? That
would eliminate my biggest complaint about having what is effectively a single
paravirt feature split into two separate, but intertwined chunks of ABI.

#define KVM_HC_PAGE_ENC_UPDATE 12

#define KVM_HC_PAGE_ENC_REGION_UPDATE 0 /* encrypted vs. plain text */
#define KVM_HC_PAGE_ENC_STATUS_UPDATE 1 /* up-to-date vs. stale */

ret = -KVM_ENOSYS;
if (!vcpu->kvm->arch.hypercall_exit_enabled)
break;

ret = -EINVAL;
if (a0 == KVM_HC_PAGE_ENC_REGION_UPDATE) {
u64 gpa = a1, npages = a2;

if (!PAGE_ALIGNED(gpa) || !npages ||
gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa))
break;
} else if (a0 != KVM_HC_PAGE_ENC_STATUS_UPDATE) {
break;
}

vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS;
vcpu->run->hypercall.args[0] = a0;
vcpu->run->hypercall.args[1] = a1;
vcpu->run->hypercall.args[2] = a2;
vcpu->run->hypercall.args[3] = a3;
vcpu->run->hypercall.longmode = op_64_bit;
vcpu->arch.complete_userspace_io = complete_hypercall_exit;
return 0;
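
The dispatch logic proposed above can be exercised as a standalone function.
The version below is only a model of that proposal: the function name
page_enc_update_dispatch() is hypothetical, the error constants are local
stand-ins (1000 for KVM_ENOSYS, 22 for EINVAL), 4 KiB pages are assumed,
and a return value of 0 stands for "forward the hypercall to userspace".

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_HC_PAGE_ENC_REGION_UPDATE 0   /* encrypted vs. plain text */
#define KVM_HC_PAGE_ENC_STATUS_UPDATE 1   /* up-to-date vs. stale */

#define MODEL_ENOSYS 1000                 /* stand-in for KVM_ENOSYS */
#define MODEL_EINVAL 22                   /* stand-in for EINVAL */
#define PAGE_SHIFT   12                   /* assumed: 4 KiB pages */

/*
 * Model of the proposed sub-operation dispatch. a0 selects the
 * sub-operation; for a region update, a1 is the gpa and a2 the page count.
 * Returns 0 if the call would exit to userspace, or a negative error.
 */
static long page_enc_update_dispatch(bool exit_enabled, uint64_t a0,
                                     uint64_t a1, uint64_t a2)
{
    if (!exit_enabled)
        return -MODEL_ENOSYS;

    if (a0 == KVM_HC_PAGE_ENC_REGION_UPDATE) {
        uint64_t gpa = a1, npages = a2;
        uint64_t gfn = gpa >> PAGE_SHIFT;

        if ((gpa & ((UINT64_C(1) << PAGE_SHIFT) - 1)) || !npages ||
            gfn + npages <= gfn)
            return -MODEL_EINVAL;
    } else if (a0 != KVM_HC_PAGE_ENC_STATUS_UPDATE) {
        return -MODEL_EINVAL;
    }

    return 0;
}
```

Only the region sub-operation validates a1 and a2; the status sub-operation
passes its arguments through unchecked, as in the snippet above.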

2021-05-04 20:58:49

by Paolo Bonzini

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On 04/05/21 22:33, Sean Christopherson wrote:
> On Tue, May 04, 2021, Paolo Bonzini wrote:
>> On 04/05/21 19:09, Sean Christopherson wrote:
>>> On Sat, May 01, 2021, Paolo Bonzini wrote:
>>>> - make it completely independent from migration, i.e. it's just a facet of
>>>> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It would
>>>> use CPUID bit as the encryption status bitmap and have no code at all in KVM
>>>> (userspace needs to set up the filter and implement everything).
>>>
>>> If the bit is purely a "page encryption status is up-to-date", what about
>>> overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well? That
>>> would eliminate my biggest complaint about having what is effectively a single
>>> paravirt feature split into two separate, but intertwined chunks of ABI.
>>
>> It's true that they are intertwined, but I dislike not having a way to read
>> the current state.
>
> From the guest?

Yes, host userspace obviously doesn't need one since it's implemented
through an MSR filter. It may not be really necessary to read it, but
it's a bit jarring compared to how the rest of the PV APIs use MSRs.

Also from a debugging/crashdump point of view the VMM may have an
established way to read an MSR from a vCPU, but it won't work if you
come up with a new way to set the state.

Paolo

2021-05-04 21:19:51

by Steve Rutherford

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On Tue, May 4, 2021 at 1:56 PM Paolo Bonzini <[email protected]> wrote:
>
> On 04/05/21 22:33, Sean Christopherson wrote:
> > On Tue, May 04, 2021, Paolo Bonzini wrote:
> >> On 04/05/21 19:09, Sean Christopherson wrote:
> >>> On Sat, May 01, 2021, Paolo Bonzini wrote:
> >>>> - make it completely independent from migration, i.e. it's just a facet of
> >>>> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It would
> >>>> use CPUID bit as the encryption status bitmap and have no code at all in KVM
> >>>> (userspace needs to set up the filter and implement everything).
> >>>
> >>> If the bit is purely a "page encryption status is up-to-date", what about
> >>> overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well? That
> >>> would eliminate my biggest complaint about having what is effectively a single
> >>> paravirt feature split into two separate, but intertwined chunks of ABI.
> >>
> >> It's true that they are intertwined, but I dislike not having a way to read
> >> the current state.
> >
> > From the guest?
>
> Yes, host userspace obviously doesn't need one since it's implemented
> through an MSR filter. It may not be really necessary to read it, but
> it's a bit jarring compared to how the rest of the PV APIs uses MSRs.
>
> Also from a debugging/crashdump point of view the VMM may have an
> established way to read an MSR from a vCPU, but it won't work if you
> come up with a new way to set the state.

Agreed on the preference for an MSR. I particularly appreciate that it
reduces the kernel footprint for these changes.


Steve

2021-05-04 21:58:51

by Paolo Bonzini

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On 04/05/21 19:09, Sean Christopherson wrote:
> On Sat, May 01, 2021, Paolo Bonzini wrote:
>> - make it completely independent from migration, i.e. it's just a facet of
>> MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It would
>> use CPUID bit as the encryption status bitmap and have no code at all in KVM
>> (userspace needs to set up the filter and implement everything).
>
> If the bit is purely a "page encryption status is up-to-date", what about
> overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well? That
> would eliminate my biggest complaint about having what is effectively a single
> paravirt feature split into two separate, but intertwined chunks of ABI.

It's true that they are intertwined, but I dislike not having a way to
read the current state.

Paolo

>
> #define KVM_HC_PAGE_ENC_UPDATE 12
>
> #define KVM_HC_PAGE_ENC_REGION_UPDATE 0 /* encrypted vs. plain text */
> #define KVM_HC_PAGE_ENC_STATUS_UPDATE 1 /* up-to-date vs. stale */
>
> ret = -KVM_ENOSYS;
> if (!vcpu->kvm->arch.hypercall_exit_enabled)
> break;
>
> ret = -EINVAL;
> if (a0 == KVM_HC_PAGE_ENC_REGION_UPDATE) {
> u64 gpa = a1, npages = a2;
>
> if (!PAGE_ALIGNED(gpa) || !npages ||
> gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa))
> break;
> } else if (a0 != KVM_HC_PAGE_ENC_STATUS_UPDATE) {
> break;
> }
>
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS;
> vcpu->run->hypercall.args[0] = a0;
> vcpu->run->hypercall.args[1] = a1;
> vcpu->run->hypercall.args[2] = a2;
> vcpu->run->hypercall.args[3] = a3;
> vcpu->run->hypercall.longmode = op_64_bit;
> vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> return 0;
>

2021-05-04 21:59:08

by Sean Christopherson

Subject: Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

On Tue, May 04, 2021, Paolo Bonzini wrote:
> On 04/05/21 19:09, Sean Christopherson wrote:
> > On Sat, May 01, 2021, Paolo Bonzini wrote:
> > > - make it completely independent from migration, i.e. it's just a facet of
> > > MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It would
> > > use CPUID bit as the encryption status bitmap and have no code at all in KVM
> > > (userspace needs to set up the filter and implement everything).
> >
> > If the bit is purely a "page encryption status is up-to-date", what about
> > overloading KVM_HC_PAGE_ENC_STATUS to handle that status update as well? That
> > would eliminate my biggest complaint about having what is effectively a single
> > paravirt feature split into two separate, but intertwined chunks of ABI.
>
> It's true that they are intertwined, but I dislike not having a way to read
> the current state.

From the guest?