LinuxLists.cc - [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

2024-05-01 09:02:32

Subject: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
backed by private memory or not.

Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.

Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
requests to userspace via KVM_EXIT_HYPERCALL.

Signed-off-by: Michael Roth <[email protected]>
Co-developed-by: Brijesh Singh <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/sev-common.h | 6 ++++
arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1006bfffe07a..6d68db812de1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,11 +101,17 @@ enum psc_op {
/* GHCBData[11:0] */ \
GHCB_MSR_PSC_REQ)

+#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
+#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
+
#define GHCB_MSR_PSC_RESP 0x015
#define GHCB_MSR_PSC_RESP_VAL(val) \
/* GHCBData[63:32] */ \
(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)

+/* Set highest bit as a generic error response */
+#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
+
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e1ac5af4cb74..720775c9d0b8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (vcpu->run->hypercall.ret)
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ else
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP);
+
+ return 1; /* resume guest */
+}
+
+static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
+{
+ u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
+ u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ return 1; /* resume guest */
+ }
+
+ if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE))) {
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ return 1; /* resume guest */
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+ vcpu->run->hypercall.args[0] = gpa;
+ vcpu->run->hypercall.args[1] = 1;
+ vcpu->run->hypercall.args[2] = (op == SNP_PAGE_STATE_PRIVATE)
+ ? KVM_MAP_GPA_RANGE_ENCRYPTED
+ : KVM_MAP_GPA_RANGE_DECRYPTED;
+ vcpu->run->hypercall.args[2] |= KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
+
+ vcpu->arch.complete_userspace_io = snp_complete_psc_msr;
+
+ return 0; /* forward request to userspace */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3566,6 +3608,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto out_terminate;
+
+ ret = snp_begin_psc_msr(svm, control->ghcb_gpa);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

--
2.25.1

2024-05-16 08:29:36

by Binbin Wu

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On 5/1/2024 4:51 PM, Michael Roth wrote:
> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> table to be private or shared using the Page State Change MSR protocol
> as defined in the GHCB specification.
>
> When using gmem, private/shared memory is allocated through separate
> pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
> KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
> backed by private memory or not.
>
> Forward these page state change requests to userspace so that it can
> issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
> entries when it is ready to map a private page into a guest.
>
> Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
> requests to userspace via KVM_EXIT_HYPERCALL.
>
> Signed-off-by: Michael Roth <[email protected]>
> Co-developed-by: Brijesh Singh <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/sev-common.h | 6 ++++
> arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 1006bfffe07a..6d68db812de1 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -101,11 +101,17 @@ enum psc_op {
> /* GHCBData[11:0] */ \
> GHCB_MSR_PSC_REQ)
>
> +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
> +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
> +
> #define GHCB_MSR_PSC_RESP 0x015
> #define GHCB_MSR_PSC_RESP_VAL(val) \
> /* GHCBData[63:32] */ \
> (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
>
> +/* Set highest bit as a generic error response */
> +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
> +
> /* GHCB Hypervisor Feature Request/Response */
> #define GHCB_MSR_HV_FT_REQ 0x080
> #define GHCB_MSR_HV_FT_RESP 0x081
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index e1ac5af4cb74..720775c9d0b8 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
> svm->vmcb->control.ghcb_gpa = value;
> }
>
> +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + if (vcpu->run->hypercall.ret)

Do we have definition of ret? I didn't find clear documentation about it.
According to the code, 0 means succssful. Is there any other error codes
need to or can be interpreted?

For TDX, it may also want to use KVM_HC_MAP_GPA_RANGE hypercall to
userspace via KVM_EXIT_HYPERCALL.

> + set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
> + else
> + set_ghcb_msr(svm, GHCB_MSR_PSC_RESP);
> +
> + return 1; /* resume guest */
> +}
>
[...]

2024-05-16 17:23:58

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On Thu, May 16, 2024 at 10:29 AM Binbin Wu <[email protected]> wrote:
>
>
>
> On 5/1/2024 4:51 PM, Michael Roth wrote:
> > SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> > table to be private or shared using the Page State Change MSR protocol
> > as defined in the GHCB specification.
> >
> > When using gmem, private/shared memory is allocated through separate
> > pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
> > KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
> > backed by private memory or not.
> >
> > Forward these page state change requests to userspace so that it can
> > issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
> > entries when it is ready to map a private page into a guest.
> >
> > Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
> > requests to userspace via KVM_EXIT_HYPERCALL.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > Co-developed-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > arch/x86/include/asm/sev-common.h | 6 ++++
> > arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
> > 2 files changed, 54 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > index 1006bfffe07a..6d68db812de1 100644
> > --- a/arch/x86/include/asm/sev-common.h
> > +++ b/arch/x86/include/asm/sev-common.h
> > @@ -101,11 +101,17 @@ enum psc_op {
> > /* GHCBData[11:0] */ \
> > GHCB_MSR_PSC_REQ)
> >
> > +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
> > +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
> > +
> > #define GHCB_MSR_PSC_RESP 0x015
> > #define GHCB_MSR_PSC_RESP_VAL(val) \
> > /* GHCBData[63:32] */ \
> > (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> >
> > +/* Set highest bit as a generic error response */
> > +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
> > +
> > /* GHCB Hypervisor Feature Request/Response */
> > #define GHCB_MSR_HV_FT_REQ 0x080
> > #define GHCB_MSR_HV_FT_RESP 0x081
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index e1ac5af4cb74..720775c9d0b8 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
> > svm->vmcb->control.ghcb_gpa = value;
> > }
> >
> > +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
> > +{
> > + struct vcpu_svm *svm = to_svm(vcpu);
> > +
> > + if (vcpu->run->hypercall.ret)
>
> Do we have definition of ret? I didn't find clear documentation about it.
> According to the code, 0 means succssful. Is there any other error codes
> need to or can be interpreted?

They are defined in include/uapi/linux/kvm_para.h

#define KVM_ENOSYS 1000
#define KVM_EFAULT EFAULT /* 14 */
#define KVM_EINVAL EINVAL /* 22 */
#define KVM_E2BIG E2BIG /* 7 */
#define KVM_EPERM EPERM /* 1*/
#define KVM_EOPNOTSUPP 95

Linux however does not expect the hypercall to fail for SEV/SEV-ES; and
it will terminate the guest if the PSC operation fails for SEV-SNP. So
it's best for userspace if the hypercall always succeeds. :)

> For TDX, it may also want to use KVM_HC_MAP_GPA_RANGE hypercall to
> userspace via KVM_EXIT_HYPERCALL.

Yes, definitely.

Paolo

2024-05-21 00:50:19

by Binbin Wu

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On 5/17/2024 1:23 AM, Paolo Bonzini wrote:
> On Thu, May 16, 2024 at 10:29 AM Binbin Wu <[email protected]> wrote:
>>
>>
>> On 5/1/2024 4:51 PM, Michael Roth wrote:
>>> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
>>> table to be private or shared using the Page State Change MSR protocol
>>> as defined in the GHCB specification.
>>>
>>> When using gmem, private/shared memory is allocated through separate
>>> pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
>>> KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
>>> backed by private memory or not.
>>>
>>> Forward these page state change requests to userspace so that it can
>>> issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
>>> entries when it is ready to map a private page into a guest.
>>>
>>> Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
>>> requests to userspace via KVM_EXIT_HYPERCALL.
>>>
>>> Signed-off-by: Michael Roth <[email protected]>
>>> Co-developed-by: Brijesh Singh <[email protected]>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> ---
>>> arch/x86/include/asm/sev-common.h | 6 ++++
>>> arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
>>> 2 files changed, 54 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
>>> index 1006bfffe07a..6d68db812de1 100644
>>> --- a/arch/x86/include/asm/sev-common.h
>>> +++ b/arch/x86/include/asm/sev-common.h
>>> @@ -101,11 +101,17 @@ enum psc_op {
>>> /* GHCBData[11:0] */ \
>>> GHCB_MSR_PSC_REQ)
>>>
>>> +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
>>> +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
>>> +
>>> #define GHCB_MSR_PSC_RESP 0x015
>>> #define GHCB_MSR_PSC_RESP_VAL(val) \
>>> /* GHCBData[63:32] */ \
>>> (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
>>>
>>> +/* Set highest bit as a generic error response */
>>> +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
>>> +
>>> /* GHCB Hypervisor Feature Request/Response */
>>> #define GHCB_MSR_HV_FT_REQ 0x080
>>> #define GHCB_MSR_HV_FT_RESP 0x081
>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>> index e1ac5af4cb74..720775c9d0b8 100644
>>> --- a/arch/x86/kvm/svm/sev.c
>>> +++ b/arch/x86/kvm/svm/sev.c
>>> @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>>> svm->vmcb->control.ghcb_gpa = value;
>>> }
>>>
>>> +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
>>> +{
>>> + struct vcpu_svm *svm = to_svm(vcpu);
>>> +
>>> + if (vcpu->run->hypercall.ret)
>> Do we have definition of ret? I didn't find clear documentation about it.
>> According to the code, 0 means succssful. Is there any other error codes
>> need to or can be interpreted?
> They are defined in include/uapi/linux/kvm_para.h
>
> #define KVM_ENOSYS 1000
> #define KVM_EFAULT EFAULT /* 14 */
> #define KVM_EINVAL EINVAL /* 22 */
> #define KVM_E2BIG E2BIG /* 7 */
> #define KVM_EPERM EPERM /* 1*/
> #define KVM_EOPNOTSUPP 95
>
> Linux however does not expect the hypercall to fail for SEV/SEV-ES; and
> it will terminate the guest if the PSC operation fails for SEV-SNP. So
> it's best for userspace if the hypercall always succeeds. :)
Thanks for the info.

For TDX, it wants to restrict the size of memory range for conversion in
one hypercall to avoid a too long latency.
Previously, in TDX QEMU patchset v5, the limitation is in userspace and
if the size is too big, the status_code will set to TDG_VP_VMCALL_RETRY
and the failed GPA for guest to retry is updated.
https://lore.kernel.org/all/[email protected]/

When TDX converts TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE, do you think
which is more reasonable to set the restriction? In KVM (TDX specific
code) or userspace?
If userspace is preferred, then the interface needs to be extended to
support it.

>
>> For TDX, it may also want to use KVM_HC_MAP_GPA_RANGE hypercall to
>> userspace via KVM_EXIT_HYPERCALL.
> Yes, definitely.
>
> Paolo
>

2024-05-21 21:50:21

by Michael Roth

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On Tue, May 21, 2024 at 08:49:59AM +0800, Binbin Wu wrote:
>
>
> On 5/17/2024 1:23 AM, Paolo Bonzini wrote:
> > On Thu, May 16, 2024 at 10:29 AM Binbin Wu <[email protected]> wrote:
> > >
> > >
> > > On 5/1/2024 4:51 PM, Michael Roth wrote:
> > > > SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> > > > table to be private or shared using the Page State Change MSR protocol
> > > > as defined in the GHCB specification.
> > > >
> > > > When using gmem, private/shared memory is allocated through separate
> > > > pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
> > > > KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
> > > > backed by private memory or not.
> > > >
> > > > Forward these page state change requests to userspace so that it can
> > > > issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
> > > > entries when it is ready to map a private page into a guest.
> > > >
> > > > Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
> > > > requests to userspace via KVM_EXIT_HYPERCALL.
> > > >
> > > > Signed-off-by: Michael Roth <[email protected]>
> > > > Co-developed-by: Brijesh Singh <[email protected]>
> > > > Signed-off-by: Brijesh Singh <[email protected]>
> > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > ---
> > > > arch/x86/include/asm/sev-common.h | 6 ++++
> > > > arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
> > > > 2 files changed, 54 insertions(+)
> > > >
> > > > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > > > index 1006bfffe07a..6d68db812de1 100644
> > > > --- a/arch/x86/include/asm/sev-common.h
> > > > +++ b/arch/x86/include/asm/sev-common.h
> > > > @@ -101,11 +101,17 @@ enum psc_op {
> > > > /* GHCBData[11:0] */ \
> > > > GHCB_MSR_PSC_REQ)
> > > >
> > > > +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
> > > > +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
> > > > +
> > > > #define GHCB_MSR_PSC_RESP 0x015
> > > > #define GHCB_MSR_PSC_RESP_VAL(val) \
> > > > /* GHCBData[63:32] */ \
> > > > (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> > > >
> > > > +/* Set highest bit as a generic error response */
> > > > +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
> > > > +
> > > > /* GHCB Hypervisor Feature Request/Response */
> > > > #define GHCB_MSR_HV_FT_REQ 0x080
> > > > #define GHCB_MSR_HV_FT_RESP 0x081
> > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > index e1ac5af4cb74..720775c9d0b8 100644
> > > > --- a/arch/x86/kvm/svm/sev.c
> > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
> > > > svm->vmcb->control.ghcb_gpa = value;
> > > > }
> > > >
> > > > +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + struct vcpu_svm *svm = to_svm(vcpu);
> > > > +
> > > > + if (vcpu->run->hypercall.ret)
> > > Do we have definition of ret? I didn't find clear documentation about it.
> > > According to the code, 0 means succssful. Is there any other error codes
> > > need to or can be interpreted?
> > They are defined in include/uapi/linux/kvm_para.h
> >
> > #define KVM_ENOSYS 1000
> > #define KVM_EFAULT EFAULT /* 14 */
> > #define KVM_EINVAL EINVAL /* 22 */
> > #define KVM_E2BIG E2BIG /* 7 */
> > #define KVM_EPERM EPERM /* 1*/
> > #define KVM_EOPNOTSUPP 95
> >
> > Linux however does not expect the hypercall to fail for SEV/SEV-ES; and
> > it will terminate the guest if the PSC operation fails for SEV-SNP. So
> > it's best for userspace if the hypercall always succeeds. :)
> Thanks for the info.
>
> For TDX, it wants to restrict the size of memory range for conversion in one
> hypercall to avoid a too long latency.
> Previously, in TDX QEMU patchset v5, the limitation is in userspace and if
> the size is too big, the status_code will set to TDG_VP_VMCALL_RETRY and the
> failed GPA for guest to retry is updated.
> https://lore.kernel.org/all/[email protected]/
>
> When TDX converts TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE, do you think
> which is more reasonable to set the restriction? In KVM (TDX specific code)
> or userspace?
> If userspace is preferred, then the interface needs to be extended to
> support it.

With SNP we might get a batch of requests in a single GHCB request, and
potentially each of those requests need to get set out to userspace as
a single KVM_HC_MAP_GPA_RANGE. The subsequent patch here handles that in
a loop by issuing a new KVM_HC_MAP_GPA_RANGE via the completion handler.
So we also sort of need to split large requests into multiple userspace
requests in some cases.

It seems like TDX should be able to do something similar by limiting the
size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
returning TDG_VP_VMCALL_RETRY to guest if the original size was greater
than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
the entire request and can return to guest, so it actually seems a little
more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
similar names :))

So doesn't seem like there's a good reason to expose any of these
throttling details to userspace, in which case existing
KVM_HC_MAP_GPA_RANGE interface seems like it should be sufficient.

-Mike

>
>
> >
> > > For TDX, it may also want to use KVM_HC_MAP_GPA_RANGE hypercall to
> > > userspace via KVM_EXIT_HYPERCALL.
> > Yes, definitely.
> >
> > Paolo
> >
>

2024-05-27 12:26:19

by Binbin Wu

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On 5/22/2024 5:49 AM, Michael Roth wrote:
> On Tue, May 21, 2024 at 08:49:59AM +0800, Binbin Wu wrote:
>>
>> On 5/17/2024 1:23 AM, Paolo Bonzini wrote:
>>> On Thu, May 16, 2024 at 10:29 AM Binbin Wu <[email protected]> wrote:
>>>>
>>>> On 5/1/2024 4:51 PM, Michael Roth wrote:
>>>>> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
>>>>> table to be private or shared using the Page State Change MSR protocol
>>>>> as defined in the GHCB specification.
>>>>>
>>>>> When using gmem, private/shared memory is allocated through separate
>>>>> pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
>>>>> KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
>>>>> backed by private memory or not.
>>>>>
>>>>> Forward these page state change requests to userspace so that it can
>>>>> issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
>>>>> entries when it is ready to map a private page into a guest.
>>>>>
>>>>> Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
>>>>> requests to userspace via KVM_EXIT_HYPERCALL.
>>>>>
>>>>> Signed-off-by: Michael Roth <[email protected]>
>>>>> Co-developed-by: Brijesh Singh <[email protected]>
>>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>>> Signed-off-by: Ashish Kalra <[email protected]>
>>>>> ---
>>>>> arch/x86/include/asm/sev-common.h | 6 ++++
>>>>> arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
>>>>> 2 files changed, 54 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
>>>>> index 1006bfffe07a..6d68db812de1 100644
>>>>> --- a/arch/x86/include/asm/sev-common.h
>>>>> +++ b/arch/x86/include/asm/sev-common.h
>>>>> @@ -101,11 +101,17 @@ enum psc_op {
>>>>> /* GHCBData[11:0] */ \
>>>>> GHCB_MSR_PSC_REQ)
>>>>>
>>>>> +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
>>>>> +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
>>>>> +
>>>>> #define GHCB_MSR_PSC_RESP 0x015
>>>>> #define GHCB_MSR_PSC_RESP_VAL(val) \
>>>>> /* GHCBData[63:32] */ \
>>>>> (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
>>>>>
>>>>> +/* Set highest bit as a generic error response */
>>>>> +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
>>>>> +
>>>>> /* GHCB Hypervisor Feature Request/Response */
>>>>> #define GHCB_MSR_HV_FT_REQ 0x080
>>>>> #define GHCB_MSR_HV_FT_RESP 0x081
>>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>>> index e1ac5af4cb74..720775c9d0b8 100644
>>>>> --- a/arch/x86/kvm/svm/sev.c
>>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>>> @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>>>>> svm->vmcb->control.ghcb_gpa = value;
>>>>> }
>>>>>
>>>>> +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> + struct vcpu_svm *svm = to_svm(vcpu);
>>>>> +
>>>>> + if (vcpu->run->hypercall.ret)
>>>> Do we have definition of ret? I didn't find clear documentation about it.
>>>> According to the code, 0 means succssful. Is there any other error codes
>>>> need to or can be interpreted?
>>> They are defined in include/uapi/linux/kvm_para.h
>>>
>>> #define KVM_ENOSYS 1000
>>> #define KVM_EFAULT EFAULT /* 14 */
>>> #define KVM_EINVAL EINVAL /* 22 */
>>> #define KVM_E2BIG E2BIG /* 7 */
>>> #define KVM_EPERM EPERM /* 1*/
>>> #define KVM_EOPNOTSUPP 95
>>>
>>> Linux however does not expect the hypercall to fail for SEV/SEV-ES; and
>>> it will terminate the guest if the PSC operation fails for SEV-SNP. So
>>> it's best for userspace if the hypercall always succeeds. :)
>> Thanks for the info.
>>
>> For TDX, it wants to restrict the size of memory range for conversion in one
>> hypercall to avoid a too long latency.
>> Previously, in TDX QEMU patchset v5, the limitation is in userspace and if
>> the size is too big, the status_code will set to TDG_VP_VMCALL_RETRY and the
>> failed GPA for guest to retry is updated.
>> https://lore.kernel.org/all/[email protected]/
>>
>> When TDX converts TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE, do you think
>> which is more reasonable to set the restriction? In KVM (TDX specific code)
>> or userspace?
>> If userspace is preferred, then the interface needs to be extended to
>> support it.
> With SNP we might get a batch of requests in a single GHCB request, and
> potentially each of those requests need to get set out to userspace as
> a single KVM_HC_MAP_GPA_RANGE. The subsequent patch here handles that in
> a loop by issuing a new KVM_HC_MAP_GPA_RANGE via the completion handler.
> So we also sort of need to split large requests into multiple userspace
> requests in some cases.
>
> It seems like TDX should be able to do something similar by limiting the
> size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
> returning TDG_VP_VMCALL_RETRY to guest if the original size was greater
> than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
> the entire request and can return to guest, so it actually seems a little
> more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
> between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
> similar names :))
>
> So doesn't seem like there's a good reason to expose any of these
> throttling details to userspace,

The reasons I want to put the throttling in userspace are:
1. Hardcode the TDX_MAP_GPA_MAX_LEN in kernel may not be preferred.
2. The throttling thing doesn't need to be TDX specific, it can be
generic in userspace.

I think we can set a reasonable value in userspace, so that for SNP, it
doesn't trigger the throttling since the large request will be split to
multiple userspace requests.

> in which case existing
> KVM_HC_MAP_GPA_RANGE interface seems like it should be sufficient.
>
> -Mike
>
>>
>>>> For TDX, it may also want to use KVM_HC_MAP_GPA_RANGE hypercall to
>>>> userspace via KVM_EXIT_HYPERCALL.
>>> Yes, definitely.
>>>
>>> Paolo
>>>

2024-05-28 10:40:00

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On Mon, May 27, 2024 at 2:26 PM Binbin Wu <[email protected]> wrote:
> > It seems like TDX should be able to do something similar by limiting the
> > size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
> > returning TDG_VP_VMCALL_RETRY to guest if the original size was greater
> > than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
> > the entire request and can return to guest, so it actually seems a little
> > more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
> > between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
> > similar names :))
> >
> > So doesn't seem like there's a good reason to expose any of these
> > throttling details to userspace,

I think userspace should never be worried about throttling. I would
say it's up to the guest to split the GPA into multiple ranges, but
that's not how arch/x86/coco/tdx/tdx.c is implemented so instead we
can do the split in KVM instead. It can be a module parameter or VM
attribute, establishing the size that will be processed in a single
TDVMCALL.

Paolo

>
> The reasons I want to put the throttling in userspace are:
> 1. Hardcode the TDX_MAP_GPA_MAX_LEN in kernel may not be preferred.
> 2. The throttling thing doesn't need to be TDX specific, it can be
> generic in userspace.
>
> I think we can set a reasonable value in userspace, so that for SNP, it
> doesn't trigger the throttling since the large request will be split to
> multiple userspace requests.
>
>
> > in which case existing
> > KVM_HC_MAP_GPA_RANGE interface seems like it should be sufficient.
> >
> > -Mike
> >
> >>
> >>>> For TDX, it may also want to use KVM_HC_MAP_GPA_RANGE hypercall to
> >>>> userspace via KVM_EXIT_HYPERCALL.
> >>> Yes, definitely.
> >>>
> >>> Paolo
> >>>
>

2024-05-29 20:02:23

by Sean Christopherson

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On Tue, May 28, 2024, Paolo Bonzini wrote:
> On Mon, May 27, 2024 at 2:26 PM Binbin Wu <[email protected]> wrote:
> > > It seems like TDX should be able to do something similar by limiting the
> > > size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
> > > returning TDG_VP_VMCALL_RETRY to guest if the original size was greater
> > > than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
> > > the entire request and can return to guest, so it actually seems a little
> > > more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
> > > between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
> > > similar names :))
> > >
> > > So doesn't seem like there's a good reason to expose any of these
> > > throttling details to userspace,
>
> I think userspace should never be worried about throttling. I would
> say it's up to the guest to split the GPA into multiple ranges,

I agree in principle, but in practice I can understand not wanting to split up
the conversion in the guest due to the additional overhead of the world switches.

> but that's not how arch/x86/coco/tdx/tdx.c is implemented so instead we can
> do the split in KVM instead. It can be a module parameter or VM attribute,
> establishing the size that will be processed in a single TDVMCALL.

Is it just interrupts that are problematic for conversions? I assume so, because
I can't think of anything else where telling the guest to retry would be appropriate
and useful.

If so, KVM shouldn't need to unconditionally restrict the size for a single
TDVMCALL, KVM just needs to ensure interrupts are handled soonish. To do that,
KVM could use a much smaller chunk size, e.g. 64KiB (completely made up number),
and keep processing the TDVMCALL as long as there is no interrupt pending.
Hopefully that would obviate the need for a tunable.

2024-05-30 16:48:34

by Zhi Wang

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On Tue, 21 May 2024 16:49:52 -0500
Michael Roth <[email protected]> wrote:

> On Tue, May 21, 2024 at 08:49:59AM +0800, Binbin Wu wrote:
> >
> >
> > On 5/17/2024 1:23 AM, Paolo Bonzini wrote:
> > > On Thu, May 16, 2024 at 10:29 AM Binbin Wu
> > > <[email protected]> wrote:
> > > >
> > > >
> > > > On 5/1/2024 4:51 PM, Michael Roth wrote:
> > > > > SEV-SNP VMs can ask the hypervisor to change the page state
> > > > > in the RMP table to be private or shared using the Page State
> > > > > Change MSR protocol as defined in the GHCB specification.
> > > > >
> > > > > When using gmem, private/shared memory is allocated through
> > > > > separate pools, and KVM relies on userspace issuing a
> > > > > KVM_SET_MEMORY_ATTRIBUTES KVM ioctl to tell the KVM MMU
> > > > > whether or not a particular GFN should be backed by private
> > > > > memory or not.
> > > > >
> > > > > Forward these page state change requests to userspace so that
> > > > > it can issue the expected KVM ioctls. The KVM MMU will handle
> > > > > updating the RMP entries when it is ready to map a private
> > > > > page into a guest.
> > > > >
> > > > > Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to
> > > > > deliver these requests to userspace via KVM_EXIT_HYPERCALL.
> > > > >
> > > > > Signed-off-by: Michael Roth <[email protected]>
> > > > > Co-developed-by: Brijesh Singh <[email protected]>
> > > > > Signed-off-by: Brijesh Singh <[email protected]>
> > > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > > ---
> > > > > arch/x86/include/asm/sev-common.h | 6 ++++
> > > > > arch/x86/kvm/svm/sev.c | 48
> > > > > +++++++++++++++++++++++++++++++ 2 files changed, 54
> > > > > insertions(+)
> > > > >
> > > > > diff --git a/arch/x86/include/asm/sev-common.h
> > > > > b/arch/x86/include/asm/sev-common.h index
> > > > > 1006bfffe07a..6d68db812de1 100644 ---
> > > > > a/arch/x86/include/asm/sev-common.h +++
> > > > > b/arch/x86/include/asm/sev-common.h @@ -101,11 +101,17 @@
> > > > > enum psc_op { /* GHCBData[11:0] */
> > > > > \ GHCB_MSR_PSC_REQ)
> > > > >
> > > > > +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) &
> > > > > GENMASK_ULL(51, 12)) >> 12) +#define
> > > > > GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >>
> > > > > 52) + #define GHCB_MSR_PSC_RESP 0x015
> > > > > #define GHCB_MSR_PSC_RESP_VAL(val) \
> > > > > /* GHCBData[63:32] */ \
> > > > > (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> > > > >
> > > > > +/* Set highest bit as a generic error response */
> > > > > +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) |
> > > > > GHCB_MSR_PSC_RESP) +
> > > > > /* GHCB Hypervisor Feature Request/Response */
> > > > > #define GHCB_MSR_HV_FT_REQ 0x080
> > > > > #define GHCB_MSR_HV_FT_RESP 0x081
> > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > index e1ac5af4cb74..720775c9d0b8 100644
> > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct
> > > > > vcpu_svm *svm, u64 value) svm->vmcb->control.ghcb_gpa = value;
> > > > > }
> > > > >
> > > > > +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + struct vcpu_svm *svm = to_svm(vcpu);
> > > > > +
> > > > > + if (vcpu->run->hypercall.ret)
> > > > Do we have definition of ret? I didn't find clear documentation
> > > > about it. According to the code, 0 means succssful. Is there
> > > > any other error codes need to or can be interpreted?
> > > They are defined in include/uapi/linux/kvm_para.h
> > >
> > > #define KVM_ENOSYS 1000
> > > #define KVM_EFAULT EFAULT /* 14 */
> > > #define KVM_EINVAL EINVAL /* 22 */
> > > #define KVM_E2BIG E2BIG /* 7 */
> > > #define KVM_EPERM EPERM /* 1*/
> > > #define KVM_EOPNOTSUPP 95
> > >
> > > Linux however does not expect the hypercall to fail for
> > > SEV/SEV-ES; and it will terminate the guest if the PSC operation
> > > fails for SEV-SNP. So it's best for userspace if the hypercall
> > > always succeeds. :)
> > Thanks for the info.
> >
> > For TDX, it wants to restrict the size of memory range for
> > conversion in one hypercall to avoid a too long latency.
> > Previously, in TDX QEMU patchset v5, the limitation is in userspace
> > and if the size is too big, the status_code will set to
> > TDG_VP_VMCALL_RETRY and the failed GPA for guest to retry is
> > updated.
> > https://lore.kernel.org/all/[email protected]/
> >
> > When TDX converts TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE, do you
> > think which is more reasonable to set the restriction? In KVM (TDX
> > specific code) or userspace?
> > If userspace is preferred, then the interface needs to be extended
> > to support it.
>
> With SNP we might get a batch of requests in a single GHCB request,
> and potentially each of those requests need to get set out to
> userspace as a single KVM_HC_MAP_GPA_RANGE. The subsequent patch here
> handles that in a loop by issuing a new KVM_HC_MAP_GPA_RANGE via the
> completion handler. So we also sort of need to split large requests
> into multiple userspace requests in some cases.
>
> It seems like TDX should be able to do something similar by limiting
> the size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
> returning TDG_VP_VMCALL_RETRY to guest if the original size was
> greater than TDX_MAP_GPA_MAX_LEN. But at that point you're
> effectively done with the entire request and can return to guest, so
> it actually seems a little more straightforward than the SNP case
> above. E.g. TDX has a 1:1 mapping between TDG_VP_VMCALL_MAP_GPA and
> KVM_HC_MAP_GPA_RANGE events. (And even similar names :))
>
> So doesn't seem like there's a good reason to expose any of these
> throttling details to userspace, in which case existing
> KVM_HC_MAP_GPA_RANGE interface seems like it should be sufficient.
>

Is there any rough data about the latency of private-shared and
shared-private page conversion?

Thanks,
Zhi.
> -Mike
>
> >
> >
> > >
> > > > For TDX, it may also want to use KVM_HC_MAP_GPA_RANGE hypercall
> > > > to userspace via KVM_EXIT_HYPERCALL.
> > > Yes, definitely.
> > >
> > > Paolo
> > >
> >
>

2024-05-31 01:23:12

by Binbin Wu

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On 5/30/2024 4:02 AM, Sean Christopherson wrote:
> On Tue, May 28, 2024, Paolo Bonzini wrote:
>> On Mon, May 27, 2024 at 2:26 PM Binbin Wu <[email protected]> wrote:
>>>> It seems like TDX should be able to do something similar by limiting the
>>>> size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
>>>> returning TDG_VP_VMCALL_RETRY to guest if the original size was greater
>>>> than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
>>>> the entire request and can return to guest, so it actually seems a little
>>>> more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
>>>> between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
>>>> similar names :))
>>>>
>>>> So doesn't seem like there's a good reason to expose any of these
>>>> throttling details to userspace,
>> I think userspace should never be worried about throttling. I would
>> say it's up to the guest to split the GPA into multiple ranges,
> I agree in principle, but in practice I can understand not wanting to split up
> the conversion in the guest due to the additional overhead of the world switches.
>
>> but that's not how arch/x86/coco/tdx/tdx.c is implemented so instead we can
>> do the split in KVM instead. It can be a module parameter or VM attribute,
>> establishing the size that will be processed in a single TDVMCALL.
> Is it just interrupts that are problematic for conversions? I assume so, because
> I can't think of anything else where telling the guest to retry would be appropriate
> and useful.

The concern was the lockup detection in guest.

>
> If so, KVM shouldn't need to unconditionally restrict the size for a single
> TDVMCALL, KVM just needs to ensure interrupts are handled soonish. To do that,
> KVM could use a much smaller chunk size, e.g. 64KiB (completely made up number),
> and keep processing the TDVMCALL as long as there is no interrupt pending.
> Hopefully that would obviate the need for a tunable.

Thanks for the suggestion.
By this way, interrupt can be injected to guest in time and the lockup
detection should not be a problem.

About the chunk size, if it is too small, it will increase the cost of
kernel/userspace context switches.
Maybe 2MB?

2024-05-31 13:11:09

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

On Fri, May 31, 2024 at 3:23 AM Binbin Wu <[email protected]> wrote:
> About the chunk size, if it is too small, it will increase the cost of
> kernel/userspace context switches.
> Maybe 2MB?

Yeah, 2MB sounds right.

Paolo