2024-04-18 19:43:38

by Michael Roth

Subject: [PATCH v13 00/26] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v13

and is based on commit 4d2deb62185f (as suggested by Paolo) from:

https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue


Patch Layout
------------

01-03: These patches are minor dependencies for this series and are already
included in both tip/master and mainline, so are only included here
as a stop-gap until merged from one of those trees. These are needed
by patch #8 in this series which makes use of CC_ATTR_HOST_SEV_SNP

04: This is a small general fix-up for guest_memfd that can be applied
independently of this series.

05-08: These patches add some basic infrastructure and introduce a new
KVM_X86_SNP_VM vm_type to handle differences versus the existing
KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.

09-11: These implement the KVM API to handle the creation of a
cryptographic launch context, encrypt/measure the initial image
into guest memory, and finalize it before launching it.

12-17: These implement handling for various guest-generated events such
as page state changes, onlining of additional vCPUs, etc.

18-21: These implement the gmem hooks needed to prepare gmem-allocated
pages before mapping them into guest private memory ranges as
well as cleaning them up prior to returning them to the host for
use as normal memory. Because this supplants certain activities
like issuing WBINVDs during KVM MMU invalidations, there's also
a patch that skips the duplicated work to avoid unnecessary
overhead.

22: With all the core support in place, this patch adds a kvm_amd module
parameter to enable SNP support.

23-26: These patches all deal with the servicing of guest requests to handle
things like attestation, as well as some related host-management
interfaces.


Testing
-------

For testing this via QEMU, use the following tree:

https://github.com/amdese/qemu/commits/snp-v4-wip3

A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
ranges that are mapped as private. It is recommended you build the AmdSevX64
variant as it provides the kernel-hashing support present in this series:

https://github.com/amdese/ovmf/commits/apic-mmio-fix1d

A basic command-line invocation for SNP would be:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
-machine q35,confidential-guest-support=sev0,memory-backend=ram1
-object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
-bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd

With kernel-hashing and certificate data supplied:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
-machine q35,confidential-guest-support=sev0,memory-backend=ram1
-object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on
-bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
-kernel /boot/vmlinuz-$ver
-initrd /boot/initrd.img-$ver
-append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"

With standard X64 OVMF package with separate image for persistent NVRAM:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
-machine q35,confidential-guest-support=sev0,memory-backend=ram1
-object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
-bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd
-drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off


Known issues / TODOs
--------------------

* SEV-ES guests may trigger the following warning:

WARNING: CPU: 151 PID: 4003 at arch/x86/kvm/mmu/mmu.c:5855 kvm_mmu_page_fault+0x33b/0x860 [kvm]

It is assumed here that these will be resolved once the transition to
PFERR_PRIVATE_ACCESS is fully completed, but if that's not the case let me
know and I will investigate further.

* Base tree in some cases reports "Unpatched return thunk in use. This should
not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
regression upstream and unrelated to this series:

https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/

* 2MB hugepage support has been dropped pending discussion on how we plan to
re-enable it in gmem.

* Host kexec should work, but there is a known issue with host kdump support
while SNP guests are running that will be addressed as a follow-up.

* SNP kselftests are currently a WIP and will be included as part of SNP
upstreaming efforts in the near-term.


SEV-SNP Overview
----------------

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. This series builds upon
SEV-SNP guest support, which is now in mainline, and SEV-SNP host
initialization support, which is now in linux-next.

While this series provides the basic building blocks to support booting
SEV-SNP VMs, it does not cover all the security enhancements introduced by
SEV-SNP, such as interrupt protection, which will be added in the future.

With SNP, when pages are marked as guest-owned in the RMP table, they are
assigned to a specific guest/ASID, as well as a specific GFN within the
guest. Any attempts to map it in the RMP table to a different guest/ASID,
or a different GFN within a guest/ASID, will result in an RMP nested page
fault.

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction which will set a special bit in the RMP table
for the guest. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempts outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page so it can be made aware of possible tampering.
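
To make the ownership and validation rules above concrete, here is a toy
model in C. It is only a sketch: the struct layout, the names (toy_rmp_entry,
toy_rmp_check, etc.) and the return codes are invented for illustration and
do not reflect the hardware RMP format or any kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: illustrative fields, NOT the hardware RMP entry layout. */
struct toy_rmp_entry {
	bool assigned;   /* false: default shared/hypervisor-owned state */
	bool validated;  /* set by the guest via PVALIDATE */
	uint32_t asid;   /* guest the page is assigned to */
	uint64_t gfn;    /* guest frame the page is bound to */
};

enum toy_access { TOY_ACCESS_OK, TOY_RMP_FAULT, TOY_NOT_VALIDATED };

/* Check a private guest access that reaches this page as (asid, gfn). */
enum toy_access toy_rmp_check(const struct toy_rmp_entry *e,
			      uint32_t asid, uint64_t gfn)
{
	/* Wrong guest/ASID or wrong GFN: modeled as an RMP nested page fault. */
	if (!e->assigned || e->asid != asid || e->gfn != gfn)
		return TOY_RMP_FAULT;
	/* Right owner, but not (or no longer) validated by the guest. */
	if (!e->validated)
		return TOY_NOT_VALIDATED;
	return TOY_ACCESS_OK;
}

/* Any change made from outside the guest clears the validated bit. */
void toy_rmp_host_update(struct toy_rmp_entry *e, uint32_t asid, uint64_t gfn)
{
	e->assigned = true;
	e->asid = asid;
	e->gfn = gfn;
	e->validated = false;
}

/* Stand-in for the guest executing PVALIDATE on the page. */
void toy_pvalidate(struct toy_rmp_entry *e)
{
	e->validated = true;
}
```

In this model a page re-assigned by the host always comes back as
TOY_NOT_VALIDATED until the guest validates it again, which is how the
guest gets a chance to notice possible tampering.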

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.
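
As a rough sketch of the userspace side of this flow, the following C
fragment turns an MSR-based Page State Change request into the arguments of
a KVM_SET_MEMORY_ATTRIBUTES call. The sketch_* types are local mirrors of
the psc_msr fields proposed in this series and of struct
kvm_memory_attributes; the callback stands in for the real ioctl, and the
assumption that one request covers a single 4K page comes from the MSR
protocol. None of the sketch_* names exist in KVM or QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors for illustration; real code would use <linux/kvm.h>. */
#define SKETCH_PSC_MSR_OP_PRIVATE        1
#define SKETCH_PSC_MSR_OP_SHARED         2
#define SKETCH_MEMORY_ATTRIBUTE_PRIVATE  (1ULL << 3) /* KVM_MEMORY_ATTRIBUTE_PRIVATE */
#define SKETCH_PAGE_SIZE                 4096ULL

struct sketch_psc_msr {
	uint64_t gpa;  /* filled in by the kernel */
	uint8_t op;    /* filled in by the kernel */
	uint32_t ret;  /* filled in by userspace: 0 on success */
};

struct sketch_memory_attributes {
	uint64_t address;
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
};

/* Stand-in for ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, attrs). */
typedef int (*sketch_set_attrs_fn)(const struct sketch_memory_attributes *attrs);

/* Hypothetical test double that records the last request it saw. */
struct sketch_memory_attributes sketch_last_attrs;

int sketch_record_attrs(const struct sketch_memory_attributes *attrs)
{
	sketch_last_attrs = *attrs;
	return 0;
}

/* Service one PSC MSR exit and store the result in req->ret. */
void sketch_handle_psc_msr(struct sketch_psc_msr *req, sketch_set_attrs_fn set_attrs)
{
	struct sketch_memory_attributes attrs = {
		.address = req->gpa,
		.size = SKETCH_PAGE_SIZE,
		.attributes = 0, /* shared is the default */
		.flags = 0,
	};

	if (req->op == SKETCH_PSC_MSR_OP_PRIVATE) {
		attrs.attributes = SKETCH_MEMORY_ATTRIBUTE_PRIVATE;
	} else if (req->op != SKETCH_PSC_MSR_OP_SHARED) {
		req->ret = 1; /* unknown operation: report failure */
		return;
	}

	req->ret = set_attrs(&attrs) ? 1 : 0;
}
```

A VMM would call something like sketch_handle_psc_msr() from its KVM_RUN
loop when the exit reason is KVM_EXIT_VMGEXIT, then re-enter the guest so
the kernel can relay 'ret' back through the GHCB MSR.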

Feedback/review is very much appreciated!

-Mike


Changes since v12:

* rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
* add more input validation for SNP_LAUNCH_START, especially for handling
things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
* block SNP KVM instances from being able to run legacy SEV commands (Paolo)
* don't attempt to measure VMSA for vcpu 0/BSP before the others, let
userspace deal with the ordering just like with SEV-ES (Paolo)
* fix up docs for SNP_LAUNCH_FINISH (Paolo)
* introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
handling for guest-mapped vs non-guest-mapped VMSAs, rename
'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
* drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
as it is no longer needed due to above VMSA rework
* replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
event
* handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
switch to WARN_ON*()'s to indicate remaining error cases are not expected
and should not be seen in practice. (Paolo)
* add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
cleaning up large guest memory ranges.
* rename VLEK_REQUIRED to VCEK_DISABLE. it's more applicable if another
key type ever gets added.
* don't allow attestation to be paused while an attestation request is
being processed by firmware (Tom)
* add missing Documentation entry for SNP_VLEK_LOAD
* collect Reviewed-by's from Paolo and Tom

Changes since v11:

* Rebase series on kvm-coco-queue and re-work to leverage more
infrastructure between SNP/TDX series.
* Drop KVM_SNP_INIT in favor of the new KVM_SEV_INIT2 interface introduced
here (Paolo):
https://lore.kernel.org/lkml/[email protected]/
* Drop exposure API fields related to things like VMPL levels, migration
agents, etc., until they are actually supported/used (Sean)
* Rework KVM_SEV_SNP_LAUNCH_UPDATE handling to use a new
kvm_gmem_populate() interface instead of copying data directly into
gmem-allocated pages (Sean)
* Add support for SNP_LOAD_VLEK, rework the SNP_SET_CONFIG_{START,END} to
have simpler semantics that are applicable to management of SNP_LOAD_VLEK
updates as well, rename interfaces to the now more appropriate
SNP_{PAUSE,RESUME}_ATTESTATION
* Fix up documentation wording and do print warnings for
userspace-triggerable failures (Peter, Sean)
* Fix a race with AP_CREATION wake-up events (Jacob, Sean)
* Fix a memory leak with VMSA pages (Sean)
* Tighten up handling of RMP page faults to better distinguish between real
and spurious cases (Tom)
* Various patch/documentation rewording, cleanups, etc.


----------------------------------------------------------------
Ashish Kalra (1):
KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

Borislav Petkov (AMD) (3):
[TEMP] x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
[TEMP] x86/cc: Add cc_platform_set/_clear() helpers
[TEMP] x86/CPU/AMD: Track SNP host status with cc_platform_*()

Brijesh Singh (10):
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
KVM: SEV: Add support to handle Page State Change VMGEXIT
KVM: SEV: Add support to handle RMP nested page faults
KVM: SVM: Add module parameter to enable SEV-SNP
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

Michael Roth (10):
KVM: guest_memfd: Fix PTR_ERR() handling in __kvm_gmem_get_pfn()
KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
KVM: SEV: Add initial SEV-SNP support
KVM: SEV: Add support for GHCB-based termination requests
KVM: SEV: Implement gmem hook for initializing private pages
KVM: SEV: Implement gmem hook for invalidating private pages
KVM: x86: Implement gmem hook for determining max NPT mapping level
crypto: ccp: Add the SNP_VLEK_LOAD command
crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Tom Lendacky (2):
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: SEV: Support SEV-SNP AP Creation NAE event

Documentation/virt/coco/sev-guest.rst | 69 +-
Documentation/virt/kvm/api.rst | 73 +
.../virt/kvm/x86/amd-memory-encryption.rst | 88 +-
arch/x86/coco/core.c | 52 +
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/sev-common.h | 22 +-
arch/x86/include/asm/sev.h | 19 +-
arch/x86/include/asm/svm.h | 9 +-
arch/x86/include/uapi/asm/kvm.h | 39 +
arch/x86/kernel/cpu/amd.c | 38 +-
arch/x86/kernel/cpu/mtrr/generic.c | 2 +-
arch/x86/kernel/sev.c | 10 -
arch/x86/kvm/Kconfig | 4 +
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/svm/sev.c | 1444 +++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 39 +-
arch/x86/kvm/svm/svm.h | 50 +
arch/x86/kvm/trace.h | 31 +
arch/x86/kvm/x86.c | 19 +-
arch/x86/virt/svm/sev.c | 106 +-
drivers/crypto/ccp/sev-dev.c | 85 +-
drivers/iommu/amd/init.c | 4 +-
include/linux/cc_platform.h | 12 +
include/linux/psp-sev.h | 4 +-
include/uapi/linux/kvm.h | 28 +
include/uapi/linux/psp-sev.h | 39 +
include/uapi/linux/sev-guest.h | 9 +
virt/kvm/guest_memfd.c | 8 +-
29 files changed, 2229 insertions(+), 79 deletions(-)




2024-04-18 19:45:59

by Michael Roth

Subject: [PATCH v13 13/26] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
backed by private memory.

Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.

Define a new KVM_EXIT_VMGEXIT for exits of this type, and structure it
so that it can be extended for other cases where VMGEXITs need some
level of handling in userspace.

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
Documentation/virt/kvm/api.rst | 33 +++++++++++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 6 ++++++
arch/x86/kvm/svm/sev.c | 33 +++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 17 ++++++++++++++++
4 files changed, 89 insertions(+)
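
For readers following the bit layout, here is a standalone sketch of how
the PSC MSR request and response values used by this patch are packed. The
field positions follow the macros in the diff below (GFN in bits 51:12,
operation in bits 55:52, response code in bits 63:32); the 0x014 request
code and the sketch_* helper names are assumptions of this example and are
not taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MSR_INFO_MASK   0xfffULL  /* GHCBData[11:0] */
#define SKETCH_MSR_PSC_REQ     0x014ULL  /* assumed request code */
#define SKETCH_MSR_PSC_RESP    0x015ULL
#define SKETCH_PSC_OP_PRIVATE  1
#define SKETCH_PSC_OP_SHARED   2

/* Guest side: pack a GFN and an operation into a request value. */
uint64_t sketch_psc_build_req(uint64_t gfn, uint8_t op)
{
	return ((uint64_t)(op & 0xf) << 52) |
	       ((gfn & ((1ULL << 40) - 1)) << 12) |
	       SKETCH_MSR_PSC_REQ;
}

/* Host side: GHCBData[51:12] holds the GFN; return it as a GPA. */
uint64_t sketch_psc_req_to_gpa(uint64_t msr)
{
	uint64_t gfn = (msr >> 12) & ((1ULL << 40) - 1);

	return gfn << 12;
}

/* Host side: GHCBData[55:52] holds the requested operation. */
uint8_t sketch_psc_req_to_op(uint64_t msr)
{
	return (uint8_t)((msr >> 52) & 0xf);
}

/* Response: userspace result in GHCBData[63:32], response code in [11:0]. */
uint64_t sketch_psc_resp(uint32_t vmm_ret)
{
	return ((uint64_t)vmm_ret << 32) | SKETCH_MSR_PSC_RESP;
}

/* Generic error response: highest bit set, as in GHCB_MSR_PSC_RESP_ERROR. */
uint64_t sketch_psc_resp_error(void)
{
	return (1ULL << 63) | SKETCH_MSR_PSC_RESP;
}
```

For example, a request for GFN 0x1234 with the private operation decodes to
GPA 0x1234000 and op 1, and a successful completion is reported back as the
bare 0x015 response code.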

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..4a7a2945bc78 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7060,6 +7060,39 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

+::
+
+ /* KVM_EXIT_VMGEXIT */
+ struct kvm_user_vmgexit {
+ #define KVM_USER_VMGEXIT_PSC_MSR 1
+ __u32 type; /* KVM_USER_VMGEXIT_* type */
+ union {
+ struct {
+ __u64 gpa;
+ #define KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE 1
+ #define KVM_USER_VMGEXIT_PSC_MSR_OP_SHARED 2
+ __u8 op;
+ __u32 ret;
+ } psc_msr;
+ };
+ };
+
+If exit reason is KVM_EXIT_VMGEXIT then it indicates that an SEV-SNP guest
+has issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. These are generally handled by the host kernel, but in some
+cases some aspects of handling a VMGEXIT are handled by userspace.
+
+A kvm_user_vmgexit structure is defined to encapsulate the data to be
+sent to or returned by userspace. The type field defines the specific type
+of exit that needs to be serviced, and that type is used as a discriminator
+to determine which union type should be used for input/output.
+
+For the KVM_USER_VMGEXIT_PSC_MSR type, the psc_msr union type is used. The
+kernel will supply the 'gpa' and 'op' fields, and userspace is expected to
+update the private/shared state of the GPA using the corresponding
+KVM_SET_MEMORY_ATTRIBUTES ioctl. The 'ret' field is to be set to 0 by
+userspace on success, or some non-zero value on failure.

6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1006bfffe07a..6d68db812de1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,11 +101,17 @@ enum psc_op {
/* GHCBData[11:0] */ \
GHCB_MSR_PSC_REQ)

+#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
+#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
+
#define GHCB_MSR_PSC_RESP 0x015
#define GHCB_MSR_PSC_RESP_VAL(val) \
/* GHCBData[63:32] */ \
(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)

+/* Set highest bit as a generic error response */
+#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
+
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bd7f46c61c64..e982468554cb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3454,6 +3454,36 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 vmm_ret = vcpu->run->vmgexit.psc_msr.ret;
+
+ set_ghcb_msr(svm, (vmm_ret << 32) | GHCB_MSR_PSC_RESP);
+
+ return 1; /* resume guest */
+}
+
+static int snp_begin_psc_msr(struct kvm_vcpu *vcpu, u64 ghcb_msr)
+{
+ u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
+ u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ return 1; /* resume guest */
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_PSC_MSR;
+ vcpu->run->vmgexit.psc_msr.gpa = gpa;
+ vcpu->run->vmgexit.psc_msr.op = op;
+ vcpu->arch.complete_userspace_io = snp_complete_psc_msr;
+
+ return 0; /* forward request to userspace */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3552,6 +3582,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ:
+ ret = snp_begin_psc_msr(vcpu, control->ghcb_gpa);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..54b81e46a9fa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,6 +135,20 @@ struct kvm_xen_exit {
} u;
};

+struct kvm_user_vmgexit {
+#define KVM_USER_VMGEXIT_PSC_MSR 1
+ __u32 type; /* KVM_USER_VMGEXIT_* type */
+ union {
+ struct {
+ __u64 gpa;
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE 1
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_SHARED 2
+ __u8 op;
+ __u32 ret;
+ } psc_msr;
+ };
+};
+
#define KVM_S390_GET_SKEYS_NONE 1
#define KVM_S390_SKEYS_MAX 1048576

@@ -178,6 +192,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
+#define KVM_EXIT_VMGEXIT 40

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -433,6 +448,8 @@ struct kvm_run {
__u64 gpa;
__u64 size;
} memory_fault;
+ /* KVM_EXIT_VMGEXIT */
+ struct kvm_user_vmgexit vmgexit;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1


2024-04-18 19:46:38

by Michael Roth

Subject: [PATCH v13 12/26] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU the first time, a guest must register the
vCPU GHCB GPA. If the hypervisor can work with the guest-requested GPA, it
must respond with the same GPA; otherwise it must return -1.

On VMEXIT, verify that the GHCB GPA matches with the registered value.
If a mismatch is detected, then abort the guest.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 42 insertions(+)
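
The registration handshake can be sketched in isolation as follows. This is
a simplified stand-in, not the kernel code: the sketch_* names are
hypothetical, the 0x013 response code is an assumption of this example (only
the 0x012 request code appears in the diff context), and like the patch it
accepts whatever GPA the guest asks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_INFO_MASK     0xfffULL
#define SKETCH_MSR_REG_GPA_REQ   0x012ULL
#define SKETCH_MSR_REG_GPA_RESP  0x013ULL             /* assumed response code */
#define SKETCH_GPA_VALUE_POS     12
#define SKETCH_GPA_VALUE_MASK    ((1ULL << 52) - 1)   /* GENMASK_ULL(51, 0) */

/* Minimal per-vCPU state: only the registered GHCB GPA. */
struct sketch_vcpu {
	uint64_t ghcb_registered_gpa;
};

/*
 * Handle a register-GPA request: remember the GPA and echo the same GFN
 * back to the guest together with the response code.
 */
uint64_t sketch_register_ghcb_gpa(struct sketch_vcpu *vcpu, uint64_t msr)
{
	uint64_t gfn = (msr >> SKETCH_GPA_VALUE_POS) & SKETCH_GPA_VALUE_MASK;

	vcpu->ghcb_registered_gpa = gfn << SKETCH_GPA_VALUE_POS;

	return (gfn << SKETCH_GPA_VALUE_POS) | SKETCH_MSR_REG_GPA_RESP;
}

/* Later VMGEXITs are only accepted for the registered GHCB GPA. */
bool sketch_ghcb_gpa_is_registered(const struct sketch_vcpu *vcpu, uint64_t gpa)
{
	return vcpu->ghcb_registered_gpa == gpa;
}
```

With this model, a guest that registers GFN 0xabcde and then issues a
VMGEXIT with a GHCB at any other GPA fails the check, which is the mismatch
case that makes the patch abort the guest.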

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 5a8246dd532f..1006bfffe07a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 78412c7c6708..bd7f46c61c64 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3532,6 +3532,26 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_PREF_GPA_REQ:
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3595,6 +3615,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);

sev_es_sync_from_ghcb(svm);
+
+ /* SEV-SNP guest requires that the GHCB GPA must be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0654fc91d4db..730f5ced2a2e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -208,6 +208,8 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+ u64 ghcb_registered_gpa;
};

struct vcpu_svm {
@@ -361,6 +363,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
#endif
}

+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1


2024-04-18 19:47:47

by Michael Roth

Subject: [PATCH v13 16/26] KVM: SEV: Support SEV-SNP AP Creation NAE event

From: Tom Lendacky <[email protected]>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, which gives
the guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.

For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. If a new VMSA is to
be installed, the VMSA guest page is set as the VMSA in the vCPU VMCB
and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new VMSA is not
to be installed, the VMSA is cleared in the vCPU VMCB and the vCPU state
is set to KVM_MP_STATE_HALTED to prevent it from being run.

Signed-off-by: Tom Lendacky <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/kvm/svm/sev.c | 229 +++++++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 11 +-
arch/x86/kvm/svm/svm.h | 9 ++
arch/x86/kvm/x86.c | 11 ++
6 files changed, 264 insertions(+), 3 deletions(-)
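
The request decoding and validation described above can be sketched outside
the kernel like this. It mirrors the order of checks in
sev_snp_ap_creation() from the diff below, but it is a simplification: the
sketch_* names are hypothetical, the request codes 0/1/2 are assumed values
for CREATE_ON_INIT/CREATE/DESTROY, page_address_valid() is reduced to a 4K
alignment check, and locking, vCPU lookup and the actual kick are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_AP_CREATE_ON_INIT  0  /* assumed request codes */
#define SKETCH_AP_CREATE          1
#define SKETCH_AP_DESTROY         2

#define SKETCH_FEAT_RESTRICTED_INJECTION  (1ULL << 3)
#define SKETCH_FEAT_ALTERNATE_INJECTION   (1ULL << 4)
#define SKETCH_FEAT_INT_INJ_MODES \
	(SKETCH_FEAT_RESTRICTED_INJECTION | SKETCH_FEAT_ALTERNATE_INJECTION)

#define SKETCH_PAGE_SIZE     (1ULL << 12)
#define SKETCH_PMD_SIZE      (1ULL << 21)
#define SKETCH_INVALID_PAGE  (~0ULL)

struct sketch_ap_request {
	uint32_t request;   /* lower 32 bits of exit_info_1 */
	uint32_t apic_id;   /* upper 32 bits of exit_info_1 */
	uint64_t vmsa_gpa;  /* new VMSA GPA, or SKETCH_INVALID_PAGE */
	bool kick;          /* whether the target vCPU would be kicked */
};

int sketch_parse_ap_creation(uint64_t exit_info_1, uint64_t exit_info_2,
			     uint64_t rax_features, uint64_t vm_features,
			     struct sketch_ap_request *out)
{
	out->request = (uint32_t)exit_info_1;
	out->apic_id = (uint32_t)(exit_info_1 >> 32);
	out->vmsa_gpa = SKETCH_INVALID_PAGE;
	out->kick = true;

	/* Interrupt injection mode must match the VM's for CREATE requests. */
	if (out->request < SKETCH_AP_DESTROY &&
	    ((rax_features ^ vm_features) & SKETCH_FEAT_INT_INJ_MODES))
		return -EINVAL;

	switch (out->request) {
	case SKETCH_AP_CREATE_ON_INIT:
		out->kick = false; /* takes effect on the next INIT instead */
		/* fall through */
	case SKETCH_AP_CREATE:
		if (exit_info_2 & (SKETCH_PAGE_SIZE - 1))
			return -EINVAL;
		/* 2M-aligned VMSA addresses are rejected (SNP erratum). */
		if (!(exit_info_2 & (SKETCH_PMD_SIZE - 1)))
			return -EINVAL;
		out->vmsa_gpa = exit_info_2;
		return 0;
	case SKETCH_AP_DESTROY:
		return 0; /* VMSA GPA stays invalid, vCPU will be halted */
	default:
		return -EINVAL;
	}
}
```

As in the patch, a failed CREATE still leaves kick set, so the target vCPU
would be kicked with no VMSA installed and end up in a non-runnable state.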

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6f03e7649780..9943e989fadb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -121,6 +121,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 544a43c1cf11..f0dea3750ca9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -286,8 +286,14 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)

+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)
+
struct vmcb_seg {
u16 selector;
u16 attrib;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0f70b057bfb8..2de3006fec65 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -37,7 +37,7 @@
#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

-#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

/* enable/disable SEV support */
static bool sev_enabled = true;
@@ -3261,6 +3261,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!kvm_ghcb_sw_scratch_is_valid(svm))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
+ if (!kvm_ghcb_rax_is_valid(svm))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3511,6 +3516,205 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
return 1; /* resume guest */
}

+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+ /* Clear use of the VMSA */
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
+ struct kvm_memory_slot *slot;
+ kvm_pfn_t pfn;
+
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!slot)
+ return -EINVAL;
+
+ /*
+ * The new VMSA will be private guest memory, so
+ * retrieve the PFN from the gmem backend.
+ */
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ return -EINVAL;
+
+ /*
+ * From this point forward, the VMSA will always be a
+ * guest-mapped page rather than the initial one allocated
+ * by KVM in svm->sev_es.vmsa. In theory, svm->sev_es.vmsa
+ * could be free'd and cleaned up here, but that involves
+ * cleanups like wbinvd_on_all_cpus() which would ideally
+ * be handled during teardown rather than guest boot.
+ * Deferring that also allows the existing logic for SEV-ES
+ * VMSAs to be re-used with minimal SNP-specific changes.
+ */
+ svm->sev_es.snp_has_guest_vmsa = true;
+
+ /* Use the new VMSA */
+ svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+
+ /*
+ * gmem pages aren't currently migratable, but if this ever
+ * changes then care should be taken to ensure
+ * svm->sev_es.vmsa is pinned through some other means.
+ */
+ kvm_release_pfn_clean(pfn);
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_waiting_for_reset)
+ goto unlock;
+
+ svm->sev_es.snp_ap_waiting_for_reset = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * The target vCPU is valid, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in a non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_waiting_for_reset = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->vmsa_features;
+
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * A malicious guest can RMPADJUST a large page into a VMSA, which
+ * will hit the SNP erratum where the CPU incorrectly signals an RMP
+ * violation #PF if a hugepage collides with the RMP entry of the
+ * VMSA page. Reject the AP CREATE request if the VMSA address from
+ * the guest is 2M aligned.
+ */
+ if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+ vcpu_unimpl(vcpu,
+ "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ kvm_make_request(KVM_REQ_UNBLOCK, target_vcpu);
+
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3754,6 +3958,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->vmgexit.psc.shared_gpa = svm->sev_es.sw_scratch;
vcpu->arch.complete_userspace_io = snp_complete_psc;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3848,7 +4061,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
* the VMSA will be NULL if this vCPU is the destination for intrahost
* migration, and will be copied later.
*/
- if (svm->sev_es.vmsa)
+ if (!svm->sev_es.snp_has_guest_vmsa)
svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);

/* Can't intercept CR register access, HV can't modify CR registers */
@@ -3921,6 +4134,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}

void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
@@ -4032,6 +4247,16 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}

+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu) &&
+ vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+}
+
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
{
struct kvm_memory_slot *slot;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1cddf7a2aec1..9dc929316c5d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1398,6 +1398,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;

+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);

if (!init_event)
@@ -4939,6 +4942,12 @@ static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
return page_address(page);
}

+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ sev_vcpu_unblocking(vcpu);
+ avic_vcpu_unblocking(vcpu);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,

@@ -4961,7 +4970,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
.vcpu_blocking = avic_vcpu_blocking,
- .vcpu_unblocking = avic_vcpu_unblocking,
+ .vcpu_unblocking = svm_vcpu_unblocking,

.update_exception_bitmap = svm_update_exception_bitmap,
.get_msr_feature = svm_get_msr_feature,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d2b0ec27d4fe..81e335dca281 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -210,6 +210,11 @@ struct vcpu_sev_es_state {
bool ghcb_sa_free;

u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_waiting_for_reset;
+ bool snp_has_guest_vmsa;
};

struct vcpu_svm {
@@ -723,6 +728,8 @@ int sev_cpu_init(struct svm_cpu_data *sd);
int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -737,6 +744,8 @@ static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
#define max_sev_asid 0
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}

#endif

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9d014961d2b..436078b9e5aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10938,6 +10938,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)

if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+ r = 1;
+ goto out;
+ }
+ }
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -13145,6 +13153,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (kvm_test_request(KVM_REQ_PMI, vcpu))
return true;

+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu)))
--
2.25.1
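
For readers following the AP Creation handler in the patch above, here is a
minimal user-space sketch of two of the checks it applies to guest-supplied
values: splitting exit_info_1 into a request code (lower 32 bits) and an APIC
ID (upper 32 bits), and rejecting a VMSA address that is 2M aligned. The enum
values, the struct, and the helper names are illustrative assumptions and are
not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SVM_VMGEXIT_AP_* request codes. */
enum ap_request {
	AP_CREATE_ON_INIT = 0,
	AP_CREATE = 1,
	AP_DESTROY = 2,
};

/* 2M, i.e. the x86-64 PMD_SIZE the patch compares against. */
#define SKETCH_SZ_2M (2ULL * 1024 * 1024)

struct ap_creation_req {
	uint32_t request;	/* lower 32 bits of exit_info_1 */
	uint32_t apic_id;	/* upper 32 bits of exit_info_1 */
};

/* Split exit_info_1 the same way lower_32_bits()/upper_32_bits() would. */
static struct ap_creation_req decode_exit_info_1(uint64_t exit_info_1)
{
	struct ap_creation_req req = {
		.request = (uint32_t)exit_info_1,
		.apic_id = (uint32_t)(exit_info_1 >> 32),
	};

	return req;
}

/* True if the guest-provided VMSA GPA sits on a 2M boundary. */
static bool vmsa_gpa_is_2m_aligned(uint64_t gpa)
{
	return (gpa & (SKETCH_SZ_2M - 1)) == 0;
}
```

In the patch itself a 2M-aligned address fails the request with -EINVAL, and
the target vCPU is still kicked so that it ends up in a non-runnable state.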


2024-04-18 19:49:45

by Michael Roth

Subject: [PATCH v13 20/26] KVM: x86: Implement gmem hook for determining max NPT mapping level

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here, for instance:

- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
- guest later converts that shared page back to private

At this point there are no mixed attributes, and KVM would normally
allow for 2MB NPT mappings again, but this is actually not allowed
because the RMP table mappings are 4K and cannot be promoted on the
hypervisor side, so the NPT mappings must still be limited to 4K to
match this.

Implement a kvm_x86_ops.gmem_validate_fault() hook for SEV that checks
for this condition and adjusts the mapping level accordingly.

Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 40 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f60bb8291494..3fabd1ee718f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4519,3 +4519,35 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
cond_resched();
}
}
+
+/*
+ * Re-check whether an #NPF for a private/gmem page can still be serviced, and
+ * adjust maximum mapping level if needed.
+ */
+int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+ u8 *max_level)
+{
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -ENOENT;
+ }
+
+ if (!assigned) {
+ pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
+ gfn, pfn, level);
+ return -EINVAL;
+ }
+
+ if (level < *max_level)
+ *max_level = level;
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 60d121250b0d..4b330b5ba4c5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5083,6 +5083,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.gmem_prepare = sev_gmem_prepare,
.gmem_invalidate = sev_gmem_invalidate,
+ .gmem_validate_fault = sev_gmem_validate_fault,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6721e5c6cf73..8a8ee475ad86 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -732,6 +732,8 @@ void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+ u8 *max_level);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -753,6 +755,11 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
return 0;
}
static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
+static inline int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn,
+ bool is_private, u8 *max_level)
+{
+ return 0;
+}

#endif

--
2.25.1
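
The level adjustment done by sev_gmem_validate_fault() can be sketched in
isolation as below. This is only an illustration under assumptions: the level
constants and the function signature are hypothetical stand-ins, and the RMP
lookup is replaced by plain parameters so the snippet builds in user space.

```c
#include <stdint.h>

/* Hypothetical stand-ins for KVM's 4K and 2M page-level constants. */
enum { SKETCH_LEVEL_4K = 1, SKETCH_LEVEL_2M = 2 };

/*
 * Clamp the maximum NPT mapping level to the level recorded in the RMP
 * entry. 'assigned' and 'rmp_level' stand in for the lookup results.
 * Returns 0 on success, -1 if the page is not assigned to a guest.
 */
static int sketch_validate_fault(int assigned, int rmp_level,
				 uint8_t *max_level)
{
	if (!assigned)
		return -1;

	/* Never map larger than what the RMP entry covers. */
	if (rmp_level < *max_level)
		*max_level = (uint8_t)rmp_level;

	return 0;
}
```

With the PSMASH scenario from the commit message, the RMP entry is 4K, so a
caller that starts with a 2M maximum ends up limited to 4K even though the
memory attributes are no longer mixed.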


2024-04-18 19:50:39

by Michael Roth

Subject: [PATCH v13 17/26] KVM: SEV: Add support for GHCB-based termination requests

GHCB version 2 adds support for a GHCB-based termination request that
a guest can issue when it reaches an error state and wishes to inform
the hypervisor that it should be terminated. Implement support for that
similarly to GHCB MSR-based termination requests that are already
available to SEV-ES guests via earlier versions of the GHCB protocol.

See 'Termination Request' in the 'Invoking VMGEXIT' section of the GHCB
specification for more details.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2de3006fec65..2e0e825b6436 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3272,6 +3272,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_TERM_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3967,6 +3968,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)

ret = 1;
break;
+ case SVM_VMGEXIT_TERM_REQUEST:
+ pr_info("SEV-ES guest requested termination: reason %#llx info %#llx\n",
+ control->exit_info_1, control->exit_info_2);
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+ vcpu->run->system_event.ndata = 1;
+ vcpu->run->system_event.data[0] = control->ghcb_gpa;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1


2024-04-18 19:54:01

by Michael Roth

Subject: [PATCH v13 02/26] [TEMP] x86/cc: Add cc_platform_set/_clear() helpers

From: "Borislav Petkov (AMD)" <[email protected]>

Add functionality to set and/or clear different attributes of the
machine as a confidential computing platform. Add the first one too:
whether the machine is running as a host for SEV-SNP guests.

Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/coco/core.c | 52 +++++++++++++++++++++++++++++++++++++
include/linux/cc_platform.h | 12 +++++++++
2 files changed, 64 insertions(+)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index d07be9d05cd0..8c3fae23d3c6 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -16,6 +16,11 @@
enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
u64 cc_mask __ro_after_init;

+static struct cc_attr_flags {
+ __u64 host_sev_snp : 1,
+ __resv : 63;
+} cc_flags;
+
static bool noinstr intel_cc_platform_has(enum cc_attr attr)
{
switch (attr) {
@@ -89,6 +94,9 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
case CC_ATTR_GUEST_SEV_SNP:
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;

+ case CC_ATTR_HOST_SEV_SNP:
+ return cc_flags.host_sev_snp;
+
default:
return false;
}
@@ -148,3 +156,47 @@ u64 cc_mkdec(u64 val)
}
}
EXPORT_SYMBOL_GPL(cc_mkdec);
+
+static void amd_cc_platform_clear(enum cc_attr attr)
+{
+ switch (attr) {
+ case CC_ATTR_HOST_SEV_SNP:
+ cc_flags.host_sev_snp = 0;
+ break;
+ default:
+ break;
+ }
+}
+
+void cc_platform_clear(enum cc_attr attr)
+{
+ switch (cc_vendor) {
+ case CC_VENDOR_AMD:
+ amd_cc_platform_clear(attr);
+ break;
+ default:
+ break;
+ }
+}
+
+static void amd_cc_platform_set(enum cc_attr attr)
+{
+ switch (attr) {
+ case CC_ATTR_HOST_SEV_SNP:
+ cc_flags.host_sev_snp = 1;
+ break;
+ default:
+ break;
+ }
+}
+
+void cc_platform_set(enum cc_attr attr)
+{
+ switch (cc_vendor) {
+ case CC_VENDOR_AMD:
+ amd_cc_platform_set(attr);
+ break;
+ default:
+ break;
+ }
+}
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..60693a145894 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -90,6 +90,14 @@ enum cc_attr {
* Examples include TDX Guest.
*/
CC_ATTR_HOTPLUG_DISABLED,
+
+ /**
+ * @CC_ATTR_HOST_SEV_SNP: AMD SNP enabled on the host.
+ *
+ * The host kernel is running with the necessary features
+ * enabled to run SEV-SNP guests.
+ */
+ CC_ATTR_HOST_SEV_SNP,
};

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
@@ -107,10 +115,14 @@ enum cc_attr {
* * FALSE - Specified Confidential Computing attribute is not active
*/
bool cc_platform_has(enum cc_attr attr);
+void cc_platform_set(enum cc_attr attr);
+void cc_platform_clear(enum cc_attr attr);

#else /* !CONFIG_ARCH_HAS_CC_PLATFORM */

static inline bool cc_platform_has(enum cc_attr attr) { return false; }
+static inline void cc_platform_set(enum cc_attr attr) { }
+static inline void cc_platform_clear(enum cc_attr attr) { }

#endif /* CONFIG_ARCH_HAS_CC_PLATFORM */

--
2.25.1
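
The set/clear/has pattern added by this patch can be tried out in user space
with a sketch like the following. The attribute enum and function names are
hypothetical, and the cc_vendor dispatch done by the real helpers is left out
for brevity.

```c
#include <stdbool.h>

/* Hypothetical attribute IDs; the kernel's enum cc_attr has more entries. */
enum sketch_attr {
	SKETCH_ATTR_HOST_SEV_SNP,
	SKETCH_ATTR_OTHER,
};

/* One bit per tracked attribute, mirroring the single flag in the patch. */
static struct {
	unsigned int host_sev_snp : 1;
} sketch_flags;

static void sketch_platform_set(enum sketch_attr attr)
{
	switch (attr) {
	case SKETCH_ATTR_HOST_SEV_SNP:
		sketch_flags.host_sev_snp = 1;
		break;
	default:
		break;	/* attributes without a backing flag are ignored */
	}
}

static void sketch_platform_clear(enum sketch_attr attr)
{
	switch (attr) {
	case SKETCH_ATTR_HOST_SEV_SNP:
		sketch_flags.host_sev_snp = 0;
		break;
	default:
		break;
	}
}

static bool sketch_platform_has(enum sketch_attr attr)
{
	switch (attr) {
	case SKETCH_ATTR_HOST_SEV_SNP:
		return sketch_flags.host_sev_snp;
	default:
		return false;
	}
}
```

Setting or clearing an attribute that has no flag is a silent no-op, which
matches the default branches in amd_cc_platform_set()/_clear() above.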


2024-04-18 19:58:29

by Michael Roth

Subject: [PATCH v13 04/26] KVM: guest_memfd: Fix PTR_ERR() handling in __kvm_gmem_get_pfn()

kvm_gmem_get_folio() may return an ERR_PTR()-encoded error rather than just
NULL, in particular -EEXIST when the FGP_CREAT_ONLY flag is used. Handle
this properly in __kvm_gmem_get_pfn().

Signed-off-by: Michael Roth <[email protected]>
---
virt/kvm/guest_memfd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index ccf22e44f387..9d7c6a70c547 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -580,8 +580,8 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
}

folio = kvm_gmem_get_folio(file_inode(file), index, prepare);
- if (!folio)
- return -ENOMEM;
+ if (IS_ERR_OR_NULL(folio))
+ return folio ? PTR_ERR(folio) : -ENOMEM;

if (folio_test_hwpoison(folio)) {
r = -EHWPOISON;
--
2.25.1
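
To illustrate why a plain NULL check misses this case, here is a user-space
sketch of the error-pointer convention. The lower-case helpers are simplified
re-implementations written for this example (they are not the kernel's
ERR_PTR()/PTR_ERR()/IS_ERR() macros), and check_folio() is a hypothetical
stand-in for the check in __kvm_gmem_get_pfn().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Mirrors the patched check: propagate encoded errors, map NULL to -ENOMEM. */
static long check_folio(const void *folio)
{
	if (is_err(folio))
		return ptr_err(folio);
	if (!folio)
		return -ENOMEM;

	return 0;
}
```

An error pointer such as err_ptr(-EEXIST) is non-NULL, so the old "!folio"
test would have let it through and the caller would have gone on to treat it
as a valid folio.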


2024-04-19 16:18:43

by Paolo Bonzini

Subject: Re: [PATCH v13 04/26] KVM: guest_memfd: Fix PTR_ERR() handling in __kvm_gmem_get_pfn()

On Fri, Apr 19, 2024 at 5:11 PM Michael Roth <[email protected]> wrote:
>
> On Fri, Apr 19, 2024 at 02:58:43PM +0200, David Hildenbrand wrote:
> > On 18.04.24 21:41, Michael Roth wrote:
> > > kvm_gmem_get_folio() may return a PTR_ERR() rather than just NULL. In
> > > particular, for cases where EEXISTS is returned when FGP_CREAT_ONLY
> > > flag is used. Handle this properly in __kvm_gmem_get_pfn().
> > >
> > > Signed-off-by: Michael Roth <[email protected]>
> > > ---
> > > virt/kvm/guest_memfd.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > index ccf22e44f387..9d7c6a70c547 100644
> > > --- a/virt/kvm/guest_memfd.c
> > > +++ b/virt/kvm/guest_memfd.c
> > > @@ -580,8 +580,8 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
> > > }
> > > folio = kvm_gmem_get_folio(file_inode(file), index, prepare);
> > > - if (!folio)
> > > - return -ENOMEM;
> > > + if (IS_ERR_OR_NULL(folio))
> > > + return folio ? PTR_ERR(folio) : -ENOMEM;
> >
> > Will it even return NULL? Staring at other filemap_grab_folio() users, they
> > all check for IS_ERR().
>
> Looks like the NULL case is handled with ERR_PTR(-ENOENT), so IS_ERR()
> would be sufficient. I think in the past kvm_gmem_get_folio() itself
> would return NULL in some cases, but as of commit 2b01b7e994e95 that's
> no longer the case.
>
> I'll fix this up to expect only PTR_ERR() when I re-spin v14, and also
> address the other kvm_gmem_get_folio() / __filemap_get_folio() call
> sites.
>
> >
> > > if (folio_test_hwpoison(folio)) {
> > > r = -EHWPOISON;
> >
> > Do we have a Fixes: tag?
>
> Fixes: 2b01b7e994e95 ("KVM: guest_memfd: pass error up from filemap_grab_folio")

I'll squash it so when you rebase on the new kvm-coco-queue it will go
away. Thanks to both!

Paolo


2024-04-21 18:01:16

by Michael Roth

Subject: Re: [PATCH v13 00/26] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Fri, Apr 19, 2024 at 02:04:54PM +0200, Paolo Bonzini wrote:
> On Thu, Apr 18, 2024 at 9:42 PM Michael Roth <[email protected]> wrote:
> >
> > This patchset is also available at:
> >
> > https://github.com/amdese/linux/commits/snp-host-v13
> >
> > and is based on commit 4d2deb62185f (as suggested by Paolo) from:
> >
> > https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue
>
> This is pretty much ready to go into kvm-coco-queue. Let me know if
> you want to do a quick v14 with the few changes I suggested, or I can
> do them too.

Submitted v14 based on 20cc50a0410f from latest kvm-coco-queue
(bf1390326099). Hoping that way you can easily replace v13 with v14 and
force-push, but let me know if you want to go about it a different way.

>
> Then the next steps are:
>
> 1) get the mm acks
>
> 2) figure out the state of patches 1-3

With latest kvm-coco-queue these patches are now in the base tree so
I've dropped them from the series.

>
> 3) wait for more reviews of course
>
> 4) merge everything into kvm/next.
>
> Seems in good shape for a 6.10 target.

Awesome! If anything needs attention just let me know.

Thanks,

Mike

>
> Paolo
>
> >
> > Patch Layout
> > ------------
> >
> > 01-03: These patches are minor dependencies for this series and are already
> > included in both tip/master and mainline, so are only included here
> > as a stop-gap until merged from one of those trees. These are needed
> > by patch #8 in this series which makes use of CC_ATTR_HOST_SEV_SNP
> >
> > 04: This is a small general fix-up for guest_memfd that can be applied
> > independently of this series.
> >
> > 05-08: These patches add some basic infrastructure and introduce a new
> > KVM_X86_SNP_VM vm_type to handle differences versus the existing
> > KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.
> >
> > 09-11: These implement the KVM API to handle the creation of a
> > cryptographic launch context, encrypt/measure the initial image
> > into guest memory, and finalize it before launching it.
> >
> > 12-17: These implement handling for various guest-generated events such
> > as page state changes, onlining of additional vCPUs, etc.
> >
> > 18-21: These implement the gmem hooks needed to prepare gmem-allocated
> > pages before mapping them into guest private memory ranges as
> > well as cleaning them up prior to returning them to the host for
> > use as normal memory. Because this supplants certain activities
> > like issuing WBINVDs during KVM MMU invalidations, there's also
> > a patch to avoid duplicating that work and incurring unnecessary
> > overhead.
> >
> > 22: With all the core support in place, the patch adds a kvm_amd module
> > parameter to enable SNP support.
> >
> > 23-26: These patches all deal with the servicing of guest requests to handle
> > things like attestation, as well as some related host-management
> > interfaces.
> >
> >
> > Testing
> > -------
> >
> > For testing this via QEMU, use the following tree:
> >
> > https://github.com/amdese/qemu/commits/snp-v4-wip3
> >
> > A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
> > ranges that are mapped as private. It is recommended you build the AmdSevX64
> > variant as it provides the kernel-hashing support present in this series:
> >
> > https://github.com/amdese/ovmf/commits/apic-mmio-fix1d
> >
> > A basic command-line invocation for SNP would be:
> >
> > qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> > -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> > -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> > -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
> > -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
> >
> > With kernel-hashing and certificate data supplied:
> >
> > qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> > -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> > -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> > -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on
> > -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
> > -kernel /boot/vmlinuz-$ver
> > -initrd /boot/initrd.img-$ver
> > -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"
> >
> > With standard X64 OVMF package with separate image for persistent NVRAM:
> >
> > qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> > -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> > -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> > -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
> > -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd
> > -drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off
> >
> >
> > Known issues / TODOs
> > --------------------
> >
> > * SEV-ES guests may trigger the following warning:
> >
> > WARNING: CPU: 151 PID: 4003 at arch/x86/kvm/mmu/mmu.c:5855 kvm_mmu_page_fault+0x33b/0x860 [kvm]
> >
> > It is assumed here that these will be resolved once the transition to
> > PFERR_PRIVATE_ACCESS is fully completed, but if that's not the case let me
> > know and I will investigate further.
> >
> > * Base tree in some cases reports "Unpatched return thunk in use. This should
> > not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
> > regression upstream and unrelated to this series:
> >
> > https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/
> >
> > * 2MB hugepage support has been dropped pending discussion on how we plan to
> > re-enable it in gmem.
> >
> > * Host kexec should work, but there is a known issue with host kdump support
> > while SNP guests are running that will be addressed as a follow-up.
> >
> > * SNP kselftests are currently a WIP and will be included as part of SNP
> > upstreaming efforts in the near-term.
> >
> >
> > SEV-SNP Overview
> > ----------------
> >
> > This part of the Secure Nested Paging (SEV-SNP) series focuses on the
> > changes required to add KVM support for SEV-SNP. This series builds upon
> > SEV-SNP guest support, which is now in mainline, and SEV-SNP host
> > initialization support, which is now in linux-next.
> >
> > While this series provides the basic building blocks to support booting
> > SEV-SNP VMs, it does not cover all the security enhancements introduced by
> > SEV-SNP, such as interrupt protection, which will be added in the future.
> >
> > With SNP, when pages are marked as guest-owned in the RMP table, they are
> > assigned to a specific guest/ASID, as well as a specific GFN within the
> > guest. Any attempts to map it in the RMP table to a different guest/ASID,
> > or a different GFN within a guest/ASID, will result in an RMP nested page
> > fault.
> >
> > Prior to accessing a guest-owned page, the guest must validate it with a
> > special PVALIDATE instruction which will set a special bit in the RMP table
> > for the guest. This is the only way to set the validated bit outside of the
> > initial pre-encrypted guest payload/image; any attempts outside the guest to
> > modify the RMP entry from that point forward will result in the validated
> > bit being cleared, at which point the guest will trigger an exception if it
> > attempts to access that page so it can be made aware of possible tampering.
> >
> > One exception to this is the initial guest payload, which is pre-validated
> > by the firmware prior to launching. The guest can use Guest Message requests
> > to fetch an attestation report which will include the measurement of the
> > initial image so that the guest can verify it was booted with the expected
> > image/environment.
> >
> > After boot, guests can use Page State Change requests to switch pages
> > between shared/hypervisor-owned and private/guest-owned to share data for
> > things like DMA, virtio buffers, and other GHCB requests.
> >
> > In this implementation of SEV-SNP, private guest memory is managed by a new
> > kernel framework called guest_memfd (gmem). With gmem, a new
> > KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
> > MMU whether a particular GFN should be backed by shared (normal) memory or
> > private (gmem-allocated) memory. To tie into this, Page State Change
> > requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
> > then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
> > private/shared state in the KVM MMU.
> >
> > The gmem / KVM MMU hooks implemented in this series will then update the RMP
> > table entries for the backing PFNs to set them to guest-owned/private when
> > mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
> > handling in the case of shared pages where the corresponding RMP table
> > entries are left in the default shared/hypervisor-owned state.
> >
> > Feedback/review is very much appreciated!
> >
> > -Mike
> >
> >
> > Changes since v12:
> >
> > * rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
> > * add more input validation for SNP_LAUNCH_START, especially for handling
> > things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
> > * block SNP KVM instances from being able to run legacy SEV commands (Paolo)
> > * don't attempt to measure VMSA for vcpu 0/BSP before the others, let
> > userspace deal with the ordering just like with SEV-ES (Paolo)
> > * fix up docs for SNP_LAUNCH_FINISH (Paolo)
> > * introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
> > handling for guest-mapped vs non-guest-mapped VMSAs, rename
> > 'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
> > * drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
> > as it is no longer needed due to above VMSA rework
> > * replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
> > event
> > * handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
> > switch to WARN_ON*()'s to indicate remaining error cases are not expected
> > and should not be seen in practice. (Paolo)
> > * add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
> > cleaning up large guest memory ranges.
> > * rename VLEK_REQUIRED to VCEK_DISABLE. it's more applicable if another
> > key type ever gets added.
> > * don't allow attestation to be paused while an attestation request is
> > being processed by firmware (Tom)
> > * add missing Documentation entry for SNP_VLEK_LOAD
> > * collect Reviewed-by's from Paolo and Tom
> >
> > Changes since v11:
> >
> > * Rebase series on kvm-coco-queue and re-work to leverage more
> > infrastructure between SNP/TDX series.
> > * Drop KVM_SNP_INIT in favor of the new KVM_SEV_INIT2 interface introduced
> > here (Paolo):
> > https://lore.kernel.org/lkml/[email protected]/
> > * Drop exposure API fields related to things like VMPL levels, migration
> > agents, etc., until they are actually supported/used (Sean)
> > * Rework KVM_SEV_SNP_LAUNCH_UPDATE handling to use a new
> > kvm_gmem_populate() interface instead of copying data directly into
> > gmem-allocated pages (Sean)
> > * Add support for SNP_LOAD_VLEK, rework the SNP_SET_CONFIG_{START,END} to
> > have simpler semantics that are applicable to management of SNP_LOAD_VLEK
> > updates as well, rename interfaces to the now more appropriate
> > SNP_{PAUSE,RESUME}_ATTESTATION
> > * Fix up documentation wording and do print warnings for
> > userspace-triggerable failures (Peter, Sean)
> > * Fix a race with AP_CREATION wake-up events (Jacob, Sean)
> > * Fix a memory leak with VMSA pages (Sean)
> > * Tighten up handling of RMP page faults to better distinguish between real
> > and spurious cases (Tom)
> > * Various patch/documentation rewording, cleanups, etc.
> >
> >
> > ----------------------------------------------------------------
> > Ashish Kalra (1):
> > KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
> >
> > Borislav Petkov (AMD) (3):
> > [TEMP] x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
> > [TEMP] x86/cc: Add cc_platform_set/_clear() helpers
> > [TEMP] x86/CPU/AMD: Track SNP host status with cc_platform_*()
> >
> > Brijesh Singh (10):
> > KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
> > KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
> > KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
> > KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
> > KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
> > KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
> > KVM: SEV: Add support to handle Page State Change VMGEXIT
> > KVM: SEV: Add support to handle RMP nested page faults
> > KVM: SVM: Add module parameter to enable SEV-SNP
> > KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
> >
> > Michael Roth (10):
> > KVM: guest_memfd: Fix PTR_ERR() handling in __kvm_gmem_get_pfn()
> > KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
> > KVM: SEV: Add initial SEV-SNP support
> > KVM: SEV: Add support for GHCB-based termination requests
> > KVM: SEV: Implement gmem hook for initializing private pages
> > KVM: SEV: Implement gmem hook for invalidating private pages
> > KVM: x86: Implement gmem hook for determining max NPT mapping level
> > crypto: ccp: Add the SNP_VLEK_LOAD command
> > crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands
> > KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
> >
> > Tom Lendacky (2):
> > KVM: SEV: Add support to handle AP reset MSR protocol
> > KVM: SEV: Support SEV-SNP AP Creation NAE event
> >
> > Documentation/virt/coco/sev-guest.rst | 69 +-
> > Documentation/virt/kvm/api.rst | 73 +
> > .../virt/kvm/x86/amd-memory-encryption.rst | 88 +-
> > arch/x86/coco/core.c | 52 +
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/include/asm/sev-common.h | 22 +-
> > arch/x86/include/asm/sev.h | 19 +-
> > arch/x86/include/asm/svm.h | 9 +-
> > arch/x86/include/uapi/asm/kvm.h | 39 +
> > arch/x86/kernel/cpu/amd.c | 38 +-
> > arch/x86/kernel/cpu/mtrr/generic.c | 2 +-
> > arch/x86/kernel/sev.c | 10 -
> > arch/x86/kvm/Kconfig | 4 +
> > arch/x86/kvm/mmu.h | 2 -
> > arch/x86/kvm/mmu/mmu.c | 1 +
> > arch/x86/kvm/svm/sev.c | 1444 +++++++++++++++++++-
> > arch/x86/kvm/svm/svm.c | 39 +-
> > arch/x86/kvm/svm/svm.h | 50 +
> > arch/x86/kvm/trace.h | 31 +
> > arch/x86/kvm/x86.c | 19 +-
> > arch/x86/virt/svm/sev.c | 106 +-
> > drivers/crypto/ccp/sev-dev.c | 85 +-
> > drivers/iommu/amd/init.c | 4 +-
> > include/linux/cc_platform.h | 12 +
> > include/linux/psp-sev.h | 4 +-
> > include/uapi/linux/kvm.h | 28 +
> > include/uapi/linux/psp-sev.h | 39 +
> > include/uapi/linux/sev-guest.h | 9 +
> > virt/kvm/guest_memfd.c | 8 +-
> > 29 files changed, 2229 insertions(+), 79 deletions(-)
> >
> >
>