2024-05-10 21:18:00

by Michael Roth

Subject: [PULL 00/19] KVM: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Hi Paolo,

This pull request contains v15 of the KVM SNP support patchset[1] along
with fixes and feedback from you and Sean regarding PSC request processing,
fast_page_fault() handling for SNP/TDX, and avoiding unnecessary
PSMASH/zapping for KVM_EXIT_MEMORY_FAULT events. It's also been rebased
on top of kvm/queue (commit 1451476151e0), and re-tested with/without
2MB gmem pages enabled.

Thanks!

-Mike

[1] https://lore.kernel.org/kvm/[email protected]/

The following changes since commit 1451476151e08e1e83ff07ce69dd0d1d025e976e:

Merge commit 'kvm-coco-hooks' into HEAD (2024-05-10 13:20:42 -0400)

are available in the Git repository at:

https://github.com/mdroth/linux.git tags/kvm-queue-snp

for you to fetch changes up to 4b3f0135f759bb1a54bb28d644c38a7780150eda:

crypto: ccp: Add the SNP_VLEK_LOAD command (2024-05-10 14:44:31 -0500)

----------------------------------------------------------------
Base x86 KVM support for running SEV-SNP guests:

- add some basic infrastructure and introduce a new KVM_X86_SNP_VM
  vm_type to handle differences versus the existing KVM_X86_SEV_VM and
  KVM_X86_SEV_ES_VM types.

- implement the KVM API to handle creation of a cryptographic launch
  context, encrypt/measure the initial image into guest memory, and
  finalize the launch before the guest starts executing.

- implement handling for various guest-generated events such as page
state changes, onlining of additional vCPUs, etc.

- implement the gmem/mmu hooks needed to prepare gmem-allocated pages
before mapping them into guest private memory ranges as well as
cleaning them up prior to returning them to the host for use as
  normal memory. Because those cleanup hooks supplant certain
  activities, such as issuing WBINVDs during KVM MMU invalidations,
  skip duplicating that work to avoid unnecessary overhead.

- add support for servicing guest requests to handle things like
  attestation, as well as some related host-management interfaces for
  updating the firmware's signing key used for attestation requests.

----------------------------------------------------------------
Ashish Kalra (1):
KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

Brijesh Singh (8):
KVM: SEV: Add initial SEV-SNP support
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
KVM: SEV: Add support to handle RMP nested page faults
KVM: SVM: Add module parameter to enable SEV-SNP
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

Michael Roth (9):
KVM: MMU: Disable fast path if KVM_EXIT_MEMORY_FAULT is needed
KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
KVM: SEV: Add support to handle Page State Change VMGEXIT
KVM: SEV: Implement gmem hook for initializing private pages
KVM: SEV: Implement gmem hook for invalidating private pages
KVM: x86: Implement hook for determining max NPT mapping level
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
crypto: ccp: Add the SNP_VLEK_LOAD command

Tom Lendacky (1):
KVM: SEV: Support SEV-SNP AP Creation NAE event

Documentation/virt/coco/sev-guest.rst | 19 +
Documentation/virt/kvm/api.rst | 87 ++
.../virt/kvm/x86/amd-memory-encryption.rst | 110 +-
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/sev-common.h | 25 +
arch/x86/include/asm/sev.h | 3 +
arch/x86/include/asm/svm.h | 9 +-
arch/x86/include/uapi/asm/kvm.h | 48 +
arch/x86/kvm/Kconfig | 3 +
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 25 +-
arch/x86/kvm/svm/sev.c | 1546 +++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 37 +-
arch/x86/kvm/svm/svm.h | 52 +
arch/x86/kvm/trace.h | 31 +
arch/x86/kvm/x86.c | 17 +
drivers/crypto/ccp/sev-dev.c | 36 +
include/linux/psp-sev.h | 4 +-
include/uapi/linux/kvm.h | 23 +
include/uapi/linux/psp-sev.h | 27 +
include/uapi/linux/sev-guest.h | 9 +
virt/kvm/guest_memfd.c | 4 +-
22 files changed, 2086 insertions(+), 33 deletions(-)



2024-05-10 21:18:41

by Michael Roth

Subject: [PULL 12/19] KVM: SEV: Implement gmem hook for initializing private pages

This will handle the RMP table updates needed to put a page into a
private state before mapping it into an SEV-SNP guest.
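
For reference, the 2M-alignment arithmetic the hook relies on can be
modeled in isolation. Below is a minimal userspace sketch, assuming 4K
base pages and 512 PTEs per PMD; the names mirror the kernel helpers
but nothing here is kernel API:

    #include <stdio.h>
    #include <inttypes.h>

    #define PTRS_PER_PMD 512                /* 2M page = 512 x 4K pages */
    #define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

    /* Models max_level_for_order(): order >= 9 spans a full 2M range. */
    static int max_level_for_order(int order)
    {
            return order >= 9 ? 2 /* PG_LEVEL_2M */ : 1 /* PG_LEVEL_4K */;
    }

    int main(void)
    {
            uint64_t pfn = 0x12345;
            uint64_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);

            /* A single 2M RMP entry covers [pfn_aligned, pfn_aligned + 512) */
            printf("pfn 0x%" PRIx64 " -> 2M base 0x%" PRIx64 ", level %d\n",
                   pfn, pfn_aligned, max_level_for_order(9));
            return 0;
    }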

Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 5 +++
arch/x86/kvm/x86.c | 5 +++
virt/kvm/guest_memfd.c | 4 +-
6 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 5e72faca4e8f..10768f13b240 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -137,6 +137,7 @@ config KVM_AMD_SEV
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
+ select HAVE_KVM_GMEM_PREPARE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 518c44296f8d..2bc4aa91cd31 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4565,3 +4565,101 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out_no_trace:
put_page(pfn_to_page(pfn));
}
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn = start;
+
+ while (pfn < end) {
+ int ret, rmp_level;
+ bool assigned;
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+ pfn, start, end, rmp_level, ret);
+ return false;
+ }
+
+ if (assigned) {
+ pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+ __func__, pfn, start, end, rmp_level);
+ return false;
+ }
+
+ pfn++;
+ }
+
+ return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+ kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+ /*
+ * If this is a large folio, and the entire 2M range containing the
+ * PFN is currently shared, then the entire 2M-aligned range can be
+ * set to private via a single 2M RMP entry.
+ */
+ if (max_level_for_order(order) > PG_LEVEL_4K &&
+ is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+ return true;
+
+ return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t pfn_aligned;
+ gfn_t gfn_aligned;
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
+ gfn, pfn, rc);
+ return -ENOENT;
+ }
+
+ if (assigned) {
+ pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+ __func__, gfn, pfn, max_order, level);
+ return 0;
+ }
+
+ if (is_large_rmp_possible(kvm, pfn, max_order)) {
+ level = PG_LEVEL_2M;
+ pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+ gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+ } else {
+ level = PG_LEVEL_4K;
+ pfn_aligned = pfn;
+ gfn_aligned = gfn;
+ }
+
+ rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -EINVAL;
+ }
+
+ pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+ __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 546656606b44..b9ecc06f8934 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5081,6 +5081,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+ .gmem_prepare = sev_gmem_prepare,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 926bfce571a6..4203bd9012e9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -736,6 +736,7 @@ extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -752,6 +753,10 @@ static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXI
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
+static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ return 0;
+}

#endif

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 62a0474e1346..f82a137640d8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13617,6 +13617,11 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
+{
+ return kvm->arch.vm_type == KVM_X86_SNP_VM;
+}
+
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
{
return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index dfe50c64a552..9714add38852 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -39,8 +39,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
gfn = slot->base_gfn + index - slot->gmem.pgoff;
rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
if (rc) {
- pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
- index, rc);
+ pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
+ index, gfn, pfn, rc);
return rc;
}
}
--
2.25.1


2024-05-10 21:19:36

by Michael Roth

Subject: [PULL 14/19] KVM: x86: Implement hook for determining max NPT mapping level

In the case of SEV-SNP, whether a 2MB page can be mapped via a 2MB
mapping in the guest's nested page table depends on whether any
subpages within the range have already been initialized as private in
the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here; for instance:

- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
- guest later converts that shared page back to private

At this point there are no mixed attributes, and KVM would normally
allow 2MB NPT mappings again. However, that is not actually allowed
here, because the RMP table mappings are 4K and cannot be promoted on
the hypervisor side; the NPT mappings must therefore remain limited to
4K to match.

Implement a kvm_x86_ops.private_max_mapping_level() hook for SEV that
checks for this condition and adjusts the mapping level accordingly.
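
As a rough illustration of the intended usage, the MMU clamps its own
maximum mapping level by the hook's result. This is a standalone model,
not the actual KVM MMU code; the stub RMP lookup is an assumption
standing in for snp_lookup_rmpentry():

    #include <stdio.h>

    #define PG_LEVEL_4K 1
    #define PG_LEVEL_2M 2

    /* Stub: pretend the RMP entry for this PFN was PSMASHed to 4K. */
    static int rmp_level_for_pfn(unsigned long pfn)
    {
            (void)pfn;
            return PG_LEVEL_4K;
    }

    int main(void)
    {
            /* What mixed-attribute tracking alone would permit: */
            int kvm_max_level = PG_LEVEL_2M;
            int rmp_cap = rmp_level_for_pfn(0x1000);
            int level = rmp_cap < kvm_max_level ? rmp_cap : kvm_max_level;

            printf("NPT mapping capped at level %d (4K)\n", level);
            return 0;
    }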

Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/svm/sev.c | 15 +++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 5 +++++
3 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 379ac6efd74e..d603c97493b9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4727,3 +4727,18 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
cond_resched();
}
}
+
+int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+{
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc || !assigned)
+ return PG_LEVEL_4K;
+
+ return level;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 653cdb23a7d1..3d0549ca246f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5084,6 +5084,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.gmem_prepare = sev_gmem_prepare,
.gmem_invalidate = sev_gmem_invalidate,
+ .private_max_mapping_level = sev_private_max_mapping_level,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 3cea024a7c18..555c55f50298 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -738,6 +738,7 @@ void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -759,6 +760,10 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
return 0;
}
static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
+static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+{
+ return 0;
+}

#endif

--
2.25.1


2024-05-10 21:20:47

by Michael Roth

Subject: [PULL 15/19] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

From: Ashish Kalra <[email protected]>

With SNP/guest_memfd, private/encrypted memory should not be mappable,
and MMU notifications for HVA-mapped memory will only be relevant to
unencrypted guest memory. Therefore, the rationale behind issuing a
wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() does not apply to
SNP guests, and the flush can be skipped.

Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/svm/sev.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d603c97493b9..2b88ae9a4f48 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3039,7 +3039,13 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)

void sev_guest_memory_reclaimed(struct kvm *kvm)
{
- if (!sev_guest(kvm))
+ /*
+ * With SNP+gmem, private/encrypted memory is unreachable via the
+ * hva-based mmu notifiers, so these events are only actually
+ * pertaining to shared pages where there is no need to perform
+ * the WBINVD to flush associated caches.
+ */
+ if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;

wbinvd_on_all_cpus();
--
2.25.1


2024-05-10 21:21:03

by Michael Roth

Subject: [PULL 16/19] KVM: SVM: Add module parameter to enable SEV-SNP

From: Brijesh Singh <[email protected]>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
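
For reference, the parameter lives in the kvm_amd module, so it can be
toggled at load time, e.g. (the hunk below makes it default to enabled):

    # /etc/modprobe.d/kvm-amd.conf
    options kvm_amd sev_snp=1

or equivalently via kvm_amd.sev_snp=1 on the kernel command line.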

Signed-off-by: Brijesh Singh <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/svm/sev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2b88ae9a4f48..eb397ec22a47 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -49,7 +49,8 @@ static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);

/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);

/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
--
2.25.1


2024-05-10 21:21:16

by Michael Roth

Subject: [PULL 17/19] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification added support for the SNP Guest
Request Message NAE event. The event allows an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

This is used by guests primarily to request attestation reports from
firmware. Other request types are available as well, but the specifics
of what requests a guest is making are opaque to the hypervisor, which
only serves as a proxy for the guest requests and firmware responses.

Implement handling for these events.
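
The status returned to the guest in SW_EXITINFO2 packs the VMM error
code into the upper 32 bits and the firmware error code into the lower
32 bits, per the SNP_GUEST_ERR() helper added below. A standalone model
of the encoding (the example values are illustrative):

    #include <stdio.h>
    #include <stdint.h>

    #define SNP_GUEST_VMM_ERR_SHIFT 32
    #define SNP_GUEST_FW_ERR_MASK   0xffffffffULL
    #define SNP_GUEST_VMM_ERR(x)    (((uint64_t)(x)) << SNP_GUEST_VMM_ERR_SHIFT)
    #define SNP_GUEST_FW_ERR(x)     ((x) & SNP_GUEST_FW_ERR_MASK)
    #define SNP_GUEST_ERR(vmm, fw)  (SNP_GUEST_VMM_ERR(vmm) | SNP_GUEST_FW_ERR(fw))

    int main(void)
    {
            /* e.g. VMM error 2 (BUSY) with firmware status 0 */
            uint64_t err = SNP_GUEST_ERR(2, 0);

            printf("sw_exit_info_2 = 0x%016llx (vmm=%llu fw=%llu)\n",
                   (unsigned long long)err,
                   (unsigned long long)(err >> 32),
                   (unsigned long long)(err & 0xffffffffULL));
            return 0;
    }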

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/svm/sev.c | 86 ++++++++++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 9 ++++
2 files changed, 95 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index eb397ec22a47..00d29d278f6e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
#include <linux/misc_cgroup.h>
#include <linux/processor.h>
#include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>

#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -3292,6 +3293,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!sev_snp_guest(vcpu->kvm) || !kvm_ghcb_sw_scratch_is_valid(svm))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto vmgexit_err;
+ break;
default:
reason = GHCB_ERR_INVALID_EVENT;
goto vmgexit_err;
@@ -3914,6 +3919,83 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}

+static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t req_pfn, resp_pfn;
+
+ if (!PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
+ return -EINVAL;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return -EINVAL;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return -EINVAL;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return -EINVAL;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return 0;
+}
+
+static int snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+ if (snp_page_reclaim(pfn) || rmp_make_shared(pfn, PG_LEVEL_4K))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
+ sev_ret_code *fw_err)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_sev_info *sev;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -EINVAL;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ ret = snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa);
+ if (ret)
+ return ret;
+
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
+ if (ret)
+ return ret;
+
+ ret = snp_cleanup_guest_buf(&data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret = 0;
+
+ if (__snp_handle_guest_req(kvm, req_gpa, resp_gpa, &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4186,6 +4268,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
}

+ ret = 1;
+ break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
ret = 1;
break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 154a87a1eca9..7bd78e258569 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -89,8 +89,17 @@ struct snp_ext_report_req {
#define SNP_GUEST_FW_ERR_MASK GENMASK_ULL(31, 0)
#define SNP_GUEST_VMM_ERR_SHIFT 32
#define SNP_GUEST_VMM_ERR(x) (((u64)x) << SNP_GUEST_VMM_ERR_SHIFT)
+#define SNP_GUEST_FW_ERR(x) ((x) & SNP_GUEST_FW_ERR_MASK)
+#define SNP_GUEST_ERR(vmm_err, fw_err) (SNP_GUEST_VMM_ERR(vmm_err) | \
+ SNP_GUEST_FW_ERR(fw_err))

+/*
+ * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
+ * a GENERIC error code such that it won't ever conflict with GHCB-defined
+ * errors if any get added in the future.
+ */
#define SNP_GUEST_VMM_ERR_INVALID_LEN 1
#define SNP_GUEST_VMM_ERR_BUSY 2
+#define SNP_GUEST_VMM_ERR_GENERIC BIT(31)

#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1


2024-05-10 21:21:29

by Michael Roth

Subject: [PULL 18/19] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Version 2 of the GHCB specification added support for the SNP Extended
Guest Request Message NAE event. This event serves a nearly identical
purpose to the previously-added SNP_GUEST_REQUEST event, but allows
additional certificate data to be supplied via a separate guest-supplied
buffer, used mainly for verifying the signature of an attestation report
as returned by firmware.

This certificate data is supplied by userspace, so unlike with
SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
the firmware request is made after the certificate data has been fetched
from userspace.

Since there is a potential for race conditions where the
userspace-supplied certificate data may be out-of-sync relative to the
reported TCB or VLEK that firmware will use when signing attestation
reports, a hook is also provided so that userspace can be informed once
the attestation request is actually completed. See the updates to
Documentation/ for more details on these aspects.
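
To sketch the userspace side (not part of this series): a VMM's run
loop might service the new exit along these lines, where the certs_*()
and write_guest() helpers are assumptions standing in for VMM-specific
plumbing:

    #include <stddef.h>
    #include <stdint.h>
    #include <linux/kvm.h>

    /* Assumed VMM helpers, not provided by KVM: */
    extern void take_certs_lock(void);
    extern void release_certs_lock(void);
    extern uint64_t certs_size_pages(void);
    extern const void *certs_blob(size_t *len);
    extern void write_guest(uint64_t gpa, const void *buf, size_t len);

    /* Service a KVM_EXIT_VMGEXIT of type KVM_USER_VMGEXIT_REQ_CERTS. */
    void handle_req_certs(struct kvm_run *run)
    {
            struct kvm_user_vmgexit *vg = &run->vmgexit;
            const void *blob;
            size_t len;

            if (vg->req_certs.status == KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE) {
                    /* Request reached firmware; certs can no longer go stale. */
                    release_certs_lock();
                    return;
            }

            take_certs_lock();

            if (vg->req_certs.data_npages < certs_size_pages()) {
                    /* Tell the guest how many pages it actually needs. */
                    vg->req_certs.data_npages = certs_size_pages();
                    vg->req_certs.ret = KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN;
                    release_certs_lock();
                    return;
            }

            blob = certs_blob(&len);
            write_guest(vg->req_certs.data_gpa, blob, len);
            vg->req_certs.flags |= KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE;
            vg->req_certs.ret = 0;
            /* Lock stays held until the STATUS_DONE notification above. */
    }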

Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virt/kvm/api.rst | 87 ++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/sev.c | 86 +++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++
include/uapi/linux/kvm.h | 23 +++++++++
4 files changed, 199 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..f3780ac98d56 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7060,6 +7060,93 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

+::
+
+ /* KVM_EXIT_VMGEXIT */
+ struct kvm_user_vmgexit {
+ #define KVM_USER_VMGEXIT_REQ_CERTS 1
+ __u32 type; /* KVM_USER_VMGEXIT_* type */
+ union {
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+ #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN 1
+ #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY 2
+ #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC (1 << 31)
+ __u32 ret;
+ #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE BIT(0)
+ __u8 flags;
+ #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING 0
+ #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE 1
+ __u8 status;
+ } req_certs;
+ };
+ };
+
+
+If the exit reason is KVM_EXIT_VMGEXIT, it indicates that an SEV-SNP guest
+has issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. VMGEXITs are generally handled by the host kernel, but in some
+cases certain aspects of handling a VMGEXIT are done in userspace.
+
+A kvm_user_vmgexit structure is defined to encapsulate the data to be
+sent to or returned by userspace. The type field defines the specific type
+of exit that needs to be serviced, and that type is used as a discriminator
+to determine which union type should be used for input/output.
+
+KVM_USER_VMGEXIT_REQ_CERTS
+--------------------------
+
+When an SEV-SNP guest requests an attestation report, it has the option of
+issuing it in the form of an *extended* guest request, in which case a
+certificate blob is returned alongside the attestation report so the guest
+can validate the endorsement key used by SNP firmware to sign the report.
+These certificates are managed by userspace and are requested via
+KVM_EXIT_VMGEXITs using the KVM_USER_VMGEXIT_REQ_CERTS type.
+
+For the KVM_USER_VMGEXIT_REQ_CERTS type, the req_certs union type
+is used. The kernel will supply in 'data_gpa' the value the guest supplies
+via the RAX field of the GHCB when issuing extended guest requests.
+'data_npages' will similarly contain the value the guest supplies in RBX
+denoting the number of shared pages available to write the certificate
+data into.
+
+ - If the supplied number of pages is sufficient, userspace should write
+ the certificate data blob (in the format defined by the GHCB spec) in
+ the address indicated by 'data_gpa' and set 'ret' to 0.
+
+ - If the number of pages supplied is not sufficient, userspace must write
+ the required number of pages in 'data_npages' and then set 'ret' to 1.
+
+ - If userspace is temporarily unable to handle the request, 'ret' should
+ be set to 2 to inform the guest to retry later.
+
+ - If some other error occurred, userspace should set 'ret' to a non-zero
+ value that is distinct from the specific return values mentioned above.
+
+Generally some care needs to be taken to keep the returned certificate data in
+sync with the actual endorsement key in use by firmware at the time the
+attestation request is sent to SNP firmware. The recommended scheme to do
+this is for the VMM to obtain a shared or exclusive lock on the path the
+certificate blob file resides at before reading it and returning it to KVM,
+and that it continues to hold the lock until the attestation request is
+actually sent to firmware. To facilitate this, the VMM can set the
+KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag before returning the
+certificate blob, in which case another KVM_EXIT_VMGEXIT of type
+KVM_USER_VMGEXIT_REQ_CERTS will be sent to userspace with
+KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE being set in the status field to
+indicate the request is fully-completed and that any associated locks can be
+released.
+
+Tools/libraries that perform updates to SNP firmware TCB values or endorsement
+keys (e.g. firmware interfaces such as SNP_COMMIT, SNP_SET_CONFIG, or
+SNP_VLEK_LOAD, see Documentation/virt/coco/sev-guest.rst for more details) in
+such a way that the certificate blob needs to be updated, should similarly
+take an exclusive lock on the certificate blob for the duration of any updates
+to firmware or the certificate blob contents to ensure that VMMs using the
+above scheme will not return certificate blob data that is out of sync with
+firmware.

6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 00d29d278f6e..398266bef2ca 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3297,6 +3297,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!sev_snp_guest(vcpu->kvm))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ if (!sev_snp_guest(vcpu->kvm) || !kvm_ghcb_rax_is_valid(svm) ||
+ !kvm_ghcb_rbx_is_valid(svm))
+ goto vmgexit_err;
+ break;
default:
reason = GHCB_ERR_INVALID_EVENT;
goto vmgexit_err;
@@ -3996,6 +4001,84 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
}

+static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_control_area *control;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret;
+
+ vmm_ret = vcpu->run->vmgexit.req_certs.ret;
+ if (vmm_ret) {
+ if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
+ vcpu->arch.regs[VCPU_REGS_RBX] =
+ vcpu->run->vmgexit.req_certs.data_npages;
+ goto out;
+ }
+
+ /*
+ * The request was completed on the previous completion callback and
+ * this completion is only for the STATUS_DONE userspace notification.
+ */
+ if (vcpu->run->vmgexit.req_certs.status == KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE)
+ goto out_resume;
+
+ control = &svm->vmcb->control;
+
+ if (__snp_handle_guest_req(kvm, control->exit_info_1,
+ control->exit_info_2, &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+out:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+
+ if (vcpu->run->vmgexit.req_certs.flags & KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE) {
+ vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE;
+ vcpu->run->vmgexit.req_certs.flags = 0;
+ return 0; /* notify userspace of completion */
+ }
+
+out_resume:
+ return 1; /* resume guest */
+}
+
+static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+ struct vcpu_svm *svm = to_svm(vcpu);
+ unsigned long data_npages;
+ sev_ret_code fw_err;
+ gpa_t data_gpa;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ goto abort_request;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
+ goto abort_request;
+
+ /*
+ * Grab the certificates from userspace so they can be bundled with
+ * attestation/guest requests.
+ */
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_REQ_CERTS;
+ vcpu->run->vmgexit.req_certs.data_gpa = data_gpa;
+ vcpu->run->vmgexit.req_certs.data_npages = data_npages;
+ vcpu->run->vmgexit.req_certs.flags = 0;
+ vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING;
+ vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
+
+ return 0; /* forward request to userspace */
+
+abort_request:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+ return 1; /* resume guest */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4274,6 +4357,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
ret = 1;
break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ ret = snp_begin_ext_guest_req(vcpu);
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 555c55f50298..97b3683ea324 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -309,6 +309,9 @@ struct vcpu_svm {

/* Guest GIF value, used when vGIF is not enabled */
bool guest_gif;
+
+ /* Transaction ID associated with SNP config updates */
+ u64 snp_transaction_id;
};

struct svm_cpu_data {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..106367d87189 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,6 +135,26 @@ struct kvm_xen_exit {
} u;
};

+struct kvm_user_vmgexit {
+#define KVM_USER_VMGEXIT_REQ_CERTS 1
+ __u32 type; /* KVM_USER_VMGEXIT_* type */
+ union {
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN 1
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY 2
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC (1 << 31)
+ __u32 ret;
+#define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE BIT(0)
+ __u8 flags;
+#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING 0
+#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE 1
+ __u8 status;
+ } req_certs;
+ };
+};
+
#define KVM_S390_GET_SKEYS_NONE 1
#define KVM_S390_SKEYS_MAX 1048576

@@ -178,6 +198,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
+#define KVM_EXIT_VMGEXIT 40

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -433,6 +454,8 @@ struct kvm_run {
__u64 gpa;
__u64 size;
} memory_fault;
+ /* KVM_EXIT_VMGEXIT */
+ struct kvm_user_vmgexit vmgexit;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1


2024-05-10 21:23:32

by Michael Roth

Subject: [PULL 11/19] KVM: SEV: Support SEV-SNP AP Creation NAE event

From: Tom Lendacky <[email protected]>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.

For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to the vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. If a new VMSA is to
be installed, the VMSA guest page is set as the VMSA in the vCPU VMCB
and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new VMSA is not
to be installed, the VMSA is cleared in the vCPU VMCB and the vCPU state
is set to KVM_MP_STATE_HALTED to prevent it from being run.
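
For reference, the request code and the target APIC ID arrive packed
into exit_info_1 (low and high 32 bits, respectively). A standalone
model of the decoding, with an illustrative request value:

    #include <stdio.h>
    #include <stdint.h>

    static uint32_t lower_32_bits(uint64_t v) { return (uint32_t)v; }
    static uint32_t upper_32_bits(uint64_t v) { return (uint32_t)(v >> 32); }

    int main(void)
    {
            /* e.g. APIC ID 3 in the high word, request code 1 in the low */
            uint64_t exit_info_1 = ((uint64_t)3 << 32) | 1;

            printf("request=%u apic_id=%u\n",
                   lower_32_bits(exit_info_1), upper_32_bits(exit_info_1));
            return 0;
    }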

Signed-off-by: Tom Lendacky <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/kvm/svm/sev.c | 231 +++++++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 11 +-
arch/x86/kvm/svm/svm.h | 9 ++
arch/x86/kvm/x86.c | 11 ++
6 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5830d42232da..72f4ec1ee062 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -121,6 +121,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 544a43c1cf11..f0dea3750ca9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -286,8 +286,14 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)

+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)
+
struct vmcb_seg {
u16 selector;
u16 attrib;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 515bd6154a4b..518c44296f8d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -38,7 +38,7 @@
#define GHCB_VERSION_DEFAULT 2ULL
#define GHCB_VERSION_MIN 1ULL

-#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

/* enable/disable SEV support */
static bool sev_enabled = true;
@@ -3267,6 +3267,13 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!kvm_ghcb_sw_scratch_is_valid(svm))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto vmgexit_err;
+ if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
+ if (!kvm_ghcb_rax_is_valid(svm))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3701,6 +3708,205 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
unreachable();
}

+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+ /* Clear use of the VMSA */
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
+ struct kvm_memory_slot *slot;
+ kvm_pfn_t pfn;
+
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!slot)
+ return -EINVAL;
+
+ /*
+ * The new VMSA will be private, gmem-backed guest memory, so
+ * retrieve the PFN from the gmem backend.
+ */
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ return -EINVAL;
+
+ /*
+ * From this point forward, the VMSA will always be a
+ * guest-mapped page rather than the initial one allocated
+ * by KVM in svm->sev_es.vmsa. In theory, svm->sev_es.vmsa
+ * could be free'd and cleaned up here, but that involves
+ * cleanups like wbinvd_on_all_cpus() which would ideally
+ * be handled during teardown rather than guest boot.
+ * Deferring that also allows the existing logic for SEV-ES
+ * VMSAs to be re-used with minimal SNP-specific changes.
+ */
+ svm->sev_es.snp_has_guest_vmsa = true;
+
+ /* Use the new VMSA */
+ svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+
+ /*
+ * gmem pages aren't currently migratable, but if this ever
+ * changes then care should be taken to ensure
+ * svm->sev_es.vmsa is pinned through some other means.
+ */
+ kvm_release_pfn_clean(pfn);
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_waiting_for_reset)
+ goto unlock;
+
+ svm->sev_es.snp_ap_waiting_for_reset = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * The target vCPU is valid, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in a non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_waiting_for_reset = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->vmsa_features;
+
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * A malicious guest can RMPADJUST a large page into a VMSA, which
+ * would hit the SNP erratum where the CPU incorrectly signals an
+ * RMP violation #PF if a hugepage collides with the RMP entry of
+ * the VMSA page. Reject the AP CREATE request if the VMSA address
+ * from the guest is 2M-aligned.
+ */
+ if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+ vcpu_unimpl(vcpu,
+ "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ kvm_make_request(KVM_REQ_UNBLOCK, target_vcpu);
+
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3966,6 +4172,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)

ret = snp_begin_psc(svm, svm->sev_es.ghcb_sa);
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -4060,7 +4275,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
* the VMSA will be NULL if this vCPU is the destination for intrahost
* migration, and will be copied later.
*/
- if (svm->sev_es.vmsa)
+ if (svm->sev_es.vmsa && !svm->sev_es.snp_has_guest_vmsa)
svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);

/* Can't intercept CR register access, HV can't modify CR registers */
@@ -4136,6 +4351,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}

void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
@@ -4247,6 +4464,16 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}

+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu) &&
+ vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+}
+
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
{
struct kvm_memory_slot *slot;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bdaf39571817..546656606b44 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1398,6 +1398,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;

+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);

if (!init_event)
@@ -4940,6 +4943,12 @@ static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
return page_address(page);
}

+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ sev_vcpu_unblocking(vcpu);
+ avic_vcpu_unblocking(vcpu);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,

@@ -4962,7 +4971,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
.vcpu_blocking = avic_vcpu_blocking,
- .vcpu_unblocking = avic_vcpu_unblocking,
+ .vcpu_unblocking = svm_vcpu_unblocking,

.update_exception_bitmap = svm_update_exception_bitmap,
.get_msr_feature = svm_get_msr_feature,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 36b573427b85..926bfce571a6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -216,6 +216,11 @@ struct vcpu_sev_es_state {
bool psc_2m;

u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_waiting_for_reset;
+ bool snp_has_guest_vmsa;
};

struct vcpu_svm {
@@ -729,6 +734,8 @@ int sev_cpu_init(struct svm_cpu_data *sd);
int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -743,6 +750,8 @@ static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
#define max_sev_asid 0
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}

#endif

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 60ebe3ee9118..62a0474e1346 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10945,6 +10945,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)

if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+ r = 1;
+ goto out;
+ }
+ }
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -13152,6 +13160,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (kvm_test_request(KVM_REQ_PMI, vcpu))
return true;

+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu)))
--
2.25.1


2024-05-10 21:24:11

by Michael Roth

Subject: [PULL 06/19] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command

From: Brijesh Singh <[email protected]>

Add a KVM_SEV_SNP_LAUNCH_FINISH command to finalize the cryptographic
launch digest which stores the measurement of the guest at launch time.
Also extend the existing SNP firmware data structures to support
disabling the use of Versioned Chip Endorsement Keys (VCEK) by guests as
part of this command.

While finalizing the launch flow, the code also issues the LAUNCH_UPDATE
SNP firmware commands to encrypt/measure the initial VMSA pages for each
configured vCPU, which requires setting the RMP entries for those pages
to private, so also add handling to clean up the RMP entries for these
pages when freeing vCPUs during shutdown.
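
For reference, userspace issues this (like the other SEV commands) via
the KVM_MEMORY_ENCRYPT_OP ioctl on the VM fd. A minimal sketch with no
ID block and error handling omitted; vm_fd/sev_fd setup is assumed to
have been done elsewhere:

    #include <string.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int snp_launch_finish(int vm_fd, int sev_fd)
    {
            struct kvm_sev_snp_launch_finish finish;
            struct kvm_sev_cmd cmd;

            memset(&finish, 0, sizeof(finish));   /* flags must be zero */
            memset(&cmd, 0, sizeof(cmd));
            cmd.id = KVM_SEV_SNP_LAUNCH_FINISH;
            cmd.data = (uint64_t)(unsigned long)&finish;
            cmd.sev_fd = sev_fd;

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }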

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 28 ++++
arch/x86/include/uapi/asm/kvm.h | 17 +++
arch/x86/kvm/svm/sev.c | 127 ++++++++++++++++++
include/linux/psp-sev.h | 4 +-
4 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index cc16a7426d18..1ddb6a86ce7f 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -544,6 +544,34 @@ where the allowed values for page_type are #define'd as::
See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
used/measured.

+20. KVM_SEV_SNP_LAUNCH_FINISH
+-----------------------------
+
+After completion of the SNP guest launch flow, the KVM_SEV_SNP_LAUNCH_FINISH
+command can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 vcek_disabled;
+ __u8 host_data[32];
+ __u8 pad0[3];
+ __u16 flags; /* Must be zero */
+ __u64 pad1[4];
+ };
+
+
+See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
+
Device attribute API
====================

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5935dc8a7e02..988b5204d636 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -700,6 +700,7 @@ enum sev_cmd_id {
/* SNP-specific commands */
KVM_SEV_SNP_LAUNCH_START = 100,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,

KVM_SEV_NR_MAX,
};
@@ -854,6 +855,22 @@ struct kvm_sev_snp_launch_update {
__u64 pad2[4];
};

+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 vcek_disabled;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad0[3];
+ __u16 flags;
+ __u64 pad1[4];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c966f2224624..208bb8170d3f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -75,6 +75,8 @@ static u64 sev_supported_vmsa_features;
SNP_POLICY_MASK_DEBUG | \
SNP_POLICY_MASK_SINGLE_SOCKET)

+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2348,6 +2350,115 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ int ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ if (!snp_page_reclaim(pfn))
+ host_rmp_make_shared(pfn, PG_LEVEL_4K);
+
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_finish params;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags)
+ return -EINVAL;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->id_auth_paddr = __sme_pa(id_auth);
+
+ if (params.auth_key_en)
+ data->auth_key_en = 1;
+ }
+
+ data->vcek_disabled = params.vcek_disabled;
+
+ memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2450,6 +2561,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2940,11 +3054,24 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)

svm = to_svm(vcpu);

+ /*
+ * If it's an SNP guest, then the VMSA was marked in the RMP table as
+ * a guest-owned page. Transition the page to hypervisor state before
+ * releasing it back to the system.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);

__free_page(virt_to_page(svm->sev_es.vmsa));

+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 3705c2044fc0..903ddfea8585 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -658,6 +658,7 @@ struct sev_data_snp_launch_update {
* @id_auth_paddr: system physical address of ID block authentication structure
* @id_block_en: indicates whether ID block is present
* @auth_key_en: indicates whether author key is present in authentication structure
+ * @vcek_disabled: indicates whether use of VCEK is allowed for attestation reports
* @rsvd: reserved
* @host_data: host-supplied data for guest, not interpreted by firmware
*/
@@ -667,7 +668,8 @@ struct sev_data_snp_launch_finish {
u64 id_auth_paddr;
u8 id_block_en:1;
u8 auth_key_en:1;
- u64 rsvd:62;
+ u8 vcek_disabled:1;
+ u64 rsvd:61;
u8 host_data[32];
} __packed;

--
2.25.1


2024-05-10 21:24:47

by Michael Roth

[permalink] [raw]
Subject: [PULL 13/19] KVM: SEV: Implement gmem hook for invalidating private pages

Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.
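
As an illustration of when this hook fires at run-time: userspace can
release gmem-backed private pages by punching a hole in the guest_memfd,
which frees the backing pages and in turn invokes this invalidation
path. A minimal, hedged sketch of the VMM side (the gmem_fd name is
hypothetical; error handling is elided):

  #define _GNU_SOURCE
  #include <fcntl.h>

  /*
   * Sketch: release a range of a guest_memfd back to the host. Freeing
   * the pages is what triggers the gmem_invalidate hook, which flushes
   * caches and resets the RMP entries to shared/hypervisor-owned.
   */
  static int release_gmem_range(int gmem_fd, off_t offset, off_t len)
  {
          return fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                           offset, len);
  }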

Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 64 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 ++
4 files changed, 68 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 10768f13b240..2a7f69abcac3 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -138,6 +138,7 @@ config KVM_AMD_SEV
select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
select HAVE_KVM_GMEM_PREPARE
+ select HAVE_KVM_GMEM_INVALIDATE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2bc4aa91cd31..379ac6efd74e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4663,3 +4663,67 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)

return 0;
}
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn;
+
+ pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+ for (pfn = start; pfn < end;) {
+ bool use_2m_update = false;
+ int rc, rmp_level;
+ bool assigned;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc))
+ goto next_pfn;
+
+ if (!assigned)
+ goto next_pfn;
+
+ use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+ end >= (pfn + PTRS_PER_PMD) &&
+ rmp_level > PG_LEVEL_4K;
+
+ /*
+ * If an unaligned PFN corresponds to a 2M region assigned as a
+ * large page in the RMP table, PSMASH the region into individual
+ * 4K RMP entries before attempting to convert a 4K sub-page.
+ */
+ if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+ /*
+ * This shouldn't fail, but if it does, report it, but
+ * still try to update RMP entry to shared and pray this
+ * was a spurious error that can be addressed later.
+ */
+ rc = snp_rmptable_psmash(pfn);
+ WARN_ONCE(rc, "SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ }
+
+ rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+ if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc))
+ goto next_pfn;
+
+ /*
+ * SEV-ES avoids host/guest cache coherency issues through
+ * WBINVD hooks issued via MMU notifiers during run-time, and
+ * KVM's VM destroy path at shutdown. Those MMU notifier events
+ * don't cover gmem since there is no requirement to map pages
+ * to a HVA in order to use them for a running guest. While the
+ * shutdown path would still likely cover things for SNP guests,
+ * userspace may also free gmem pages during run-time via
+ * hole-punching operations on the guest_memfd, so flush the
+ * cache entries for these pages before freeing them back to
+ * the host.
+ */
+ clflush_cache_range(__va(pfn_to_hpa(pfn)),
+ use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+ pfn += use_2m_update ? PTRS_PER_PMD : 1;
+ cond_resched();
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b9ecc06f8934..653cdb23a7d1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5083,6 +5083,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,

.gmem_prepare = sev_gmem_prepare,
+ .gmem_invalidate = sev_gmem_invalidate,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4203bd9012e9..3cea024a7c18 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -737,6 +737,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -757,6 +758,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
{
return 0;
}
+static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}

#endif

--
2.25.1


2024-05-10 21:24:56

by Michael Roth

[permalink] [raw]
Subject: [PULL 08/19] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell the KVM MMU whether a particular GFN should be
backed by private memory.

Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.

Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
requests to userspace via KVM_EXIT_HYPERCALL.
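
For reference, a hedged sketch of how a VMM might service the resulting
KVM_EXIT_HYPERCALL exit (assuming it has already enabled
KVM_HC_MAP_GPA_RANGE via KVM_CAP_EXIT_HYPERCALL; vm_fd/run are the usual
placeholders and error handling is elided):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>
  #include <linux/kvm_para.h>

  /* Sketch: flip memory attributes for the GPA range the guest requested. */
  static void handle_map_gpa_range(int vm_fd, struct kvm_run *run)
  {
          struct kvm_memory_attributes attrs = {
                  .address    = run->hypercall.args[0],
                  .size       = run->hypercall.args[1] * 4096,
                  .attributes = (run->hypercall.args[2] & KVM_MAP_GPA_RANGE_ENCRYPTED) ?
                                KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          /* The guest sees the result via the PSC response on re-entry. */
          run->hypercall.ret = ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
  }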

Signed-off-by: Michael Roth <[email protected]>
Co-developed-by: Brijesh Singh <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/sev-common.h | 6 ++++
arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1006bfffe07a..6d68db812de1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,11 +101,17 @@ enum psc_op {
/* GHCBData[11:0] */ \
GHCB_MSR_PSC_REQ)

+#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
+#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
+
#define GHCB_MSR_PSC_RESP 0x015
#define GHCB_MSR_PSC_RESP_VAL(val) \
/* GHCBData[63:32] */ \
(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)

+/* Set highest bit as a generic error response */
+#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
+
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 557f462fde04..438f2e8b8152 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (vcpu->run->hypercall.ret)
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ else
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP);
+
+ return 1; /* resume guest */
+}
+
+static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
+{
+ u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
+ u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ return 1; /* resume guest */
+ }
+
+ if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE))) {
+ set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+ return 1; /* resume guest */
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+ vcpu->run->hypercall.args[0] = gpa;
+ vcpu->run->hypercall.args[1] = 1;
+ vcpu->run->hypercall.args[2] = (op == SNP_PAGE_STATE_PRIVATE)
+ ? KVM_MAP_GPA_RANGE_ENCRYPTED
+ : KVM_MAP_GPA_RANGE_DECRYPTED;
+ vcpu->run->hypercall.args[2] |= KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
+
+ vcpu->arch.complete_userspace_io = snp_complete_psc_msr;
+
+ return 0; /* forward request to userspace */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3566,6 +3608,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto out_terminate;
+
+ ret = snp_begin_psc_msr(svm, control->ghcb_gpa);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

--
2.25.1


2024-05-10 21:28:07

by Michael Roth

[permalink] [raw]
Subject: [PULL 19/19] crypto: ccp: Add the SNP_VLEK_LOAD command

When requesting an attestation report a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).

For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface to allow loading VLEK keys into the
system.

See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
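
A hedged sketch of driving the new command from userspace through the
existing /dev/sev ioctl interface (obtaining the wrapped hashstick from
the KDS and error handling are elided; sev_fd is an open fd to
/dev/sev):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  /* Sketch: load a KDS-provided wrapped VLEK hashstick into the PSP. */
  static int load_vlek(int sev_fd,
                       struct sev_user_data_snp_wrapped_vlek_hashstick *blob)
  {
          struct sev_user_data_snp_vlek_load input = {
                  .len = sizeof(input),
                  .vlek_wrapped_version = 0,
                  .vlek_wrapped_address = (__u64)(uintptr_t)blob,
          };
          struct sev_issue_cmd cmd = {
                  .cmd  = SNP_VLEK_LOAD,
                  .data = (__u64)(uintptr_t)&input,
          };

          return ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);
  }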

Reviewed-by: Tom Lendacky <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 19 ++++++++++++++
drivers/crypto/ccp/sev-dev.c | 36 +++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 27 ++++++++++++++++++++
3 files changed, 82 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index e1eaf6a830ce..de68d3a4b540 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -176,6 +176,25 @@ to SNP_CONFIG command defined in the SEV-SNP spec. The current values of
the firmware parameters affected by this command can be queried via
SNP_PLATFORM_STATUS.

+2.7 SNP_VLEK_LOAD
+-----------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_snp_vlek_load
+:Returns (out): 0 on success, -negative on error
+
+When requesting an attestation report a guest is able to specify whether
+it wants SNP firmware to sign the report using either a Versioned Chip
+Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
+Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
+Key Derivation Service (KDS) and derived from seeds allocated to
+enrolled cloud service providers.
+
+In the case of VLEK keys, the SNP_VLEK_LOAD SNP command is used to load
+them into the system after obtaining them from the KDS, and corresponds
+closely to the SNP_VLEK_LOAD firmware command specified in the SEV-SNP
+spec.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 2102377f727b..97a7959406ee 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2027,6 +2027,39 @@ static int sev_ioctl_do_snp_set_config(struct sev_issue_cmd *argp, bool writable
return __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
}

+static int sev_ioctl_do_snp_vlek_load(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_snp_vlek_load input;
+ void *blob;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ if (copy_from_user(&input, u64_to_user_ptr(argp->data), sizeof(input)))
+ return -EFAULT;
+
+ if (input.len != sizeof(input) || input.vlek_wrapped_version != 0)
+ return -EINVAL;
+
+ blob = psp_copy_user_blob(input.vlek_wrapped_address,
+ sizeof(struct sev_user_data_snp_wrapped_vlek_hashstick));
+ if (IS_ERR(blob))
+ return PTR_ERR(blob);
+
+ input.vlek_wrapped_address = __psp_pa(blob);
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_VLEK_LOAD, &input, &argp->error);
+
+ kfree(blob);
+
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -2087,6 +2120,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_SET_CONFIG:
ret = sev_ioctl_do_snp_set_config(&input, writable);
break;
+ case SNP_VLEK_LOAD:
+ ret = sev_ioctl_do_snp_vlek_load(&input, writable);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index b7a2c2ee35b7..2289b7c76c59 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -31,6 +31,7 @@ enum {
SNP_PLATFORM_STATUS,
SNP_COMMIT,
SNP_SET_CONFIG,
+ SNP_VLEK_LOAD,

SEV_MAX,
};
@@ -214,6 +215,32 @@ struct sev_user_data_snp_config {
__u8 rsvd1[52];
} __packed;

+/**
+ * struct sev_user_data_snp_vlek_load - SNP_VLEK_LOAD structure
+ *
+ * @len: length of the command buffer read by the PSP
+ * @vlek_wrapped_version: version of wrapped VLEK hashstick (Must be 0h)
+ * @rsvd: reserved
+ * @vlek_wrapped_address: address of a wrapped VLEK hashstick
+ * (struct sev_user_data_snp_wrapped_vlek_hashstick)
+ */
+struct sev_user_data_snp_vlek_load {
+ __u32 len; /* In */
+ __u8 vlek_wrapped_version; /* In */
+ __u8 rsvd[3]; /* In */
+ __u64 vlek_wrapped_address; /* In */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_wrapped_vlek_hashstick - Wrapped VLEK data
+ *
+ * @data: Opaque data provided by AMD KDS (as described in SEV-SNP Firmware ABI
+ * 1.54, SNP_VLEK_LOAD)
+ */
+struct sev_user_data_snp_wrapped_vlek_hashstick {
+ __u8 data[432]; /* In */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1


2024-05-10 21:32:07

by Michael Roth

[permalink] [raw]
Subject: [PULL 07/19] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP guests are required to perform GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, a guest must register
that GPA. If the hypervisor can work with the guest-requested GPA, it
must respond back with the same GPA; otherwise it returns -1.

On VMGEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
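
For reference, the guest side of this handshake (a simplified, hedged
sketch modeled on the MSR-based GHCB protocol and what the Linux guest
does; sev_es_wr_ghcb_msr(), sev_es_rd_ghcb_msr(), and VMGEXIT() stand in
for the usual guest primitives) amounts to:

  /* Sketch: register the GHCB GPA and terminate on a mismatched response. */
  static void snp_register_ghcb(unsigned long ghcb_paddr)
  {
          unsigned long pfn = ghcb_paddr >> PAGE_SHIFT;
          u64 val;

          sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
          VMGEXIT();

          val = sev_es_rd_ghcb_msr();

          /* The hypervisor must echo back the same GFN it accepted. */
          if (GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP ||
              GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn)
                  sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
  }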

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/sev-common.h | 8 ++++++
arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++----
arch/x86/kvm/svm/svm.h | 7 +++++
3 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 5a8246dd532f..1006bfffe07a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 208bb8170d3f..557f462fde04 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3540,6 +3540,32 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_PREF_GPA_REQ:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto out_terminate;
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ goto out_terminate;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3552,12 +3578,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
reason_set, reason_code);

- vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
- vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
- vcpu->run->system_event.ndata = 1;
- vcpu->run->system_event.data[0] = control->ghcb_gpa;
-
- return 0;
+ goto out_terminate;
}
default:
/* Error, keep GHCB MSR value as-is */
@@ -3568,6 +3589,14 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
control->ghcb_gpa, ret);

return ret;
+
+out_terminate:
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+ vcpu->run->system_event.ndata = 1;
+ vcpu->run->system_event.data[0] = control->ghcb_gpa;
+
+ return 0;
}

int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
@@ -3603,6 +3632,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);

sev_es_sync_from_ghcb(svm);
+
+	/* SEV-SNP guests require that the GHCB GPA be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 305772d36490..202ac5494c19 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -209,6 +209,8 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+ u64 ghcb_registered_gpa;
};

struct vcpu_svm {
@@ -362,6 +364,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
#endif
}

+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1


2024-05-10 21:32:35

by Michael Roth

[permalink] [raw]
Subject: [PULL 01/19] KVM: MMU: Disable fast path if KVM_EXIT_MEMORY_FAULT is needed

For hardware-protected VMs like SEV-SNP guests, certain conditions,
such as attempting to write to a page which is not in the state the
guest expects it to be in, can result in a nested/extended #PF that can
only be satisfied by the host performing an implicit page state change
to transition the page into the expected shared/private state. This is
generally handled by generating a KVM_EXIT_MEMORY_FAULT event that gets
forwarded to userspace to handle via KVM_SET_MEMORY_ATTRIBUTES.

However, the fast_page_fault() code might misconstrue this situation as
being the result of a write-protected access, and treat it as a spurious
case when it sees that writes are already allowed for the SPTE. This
causes the KVM MMU to resume the guest rather than taking any action to
satisfy the real source of the #PF, such as generating a
KVM_EXIT_MEMORY_FAULT, leaving the guest spinning on nested #PFs.

Check for this condition and bail out of the fast path if it is
detected.

Suggested-by: Paolo Bonzini <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Cc: Isaku Yamahata <[email protected]>
Reviewed-by: Isaku Yamahata <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4ec88a2a0061..9c7ab06ce454 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3296,7 +3296,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
return RET_PF_CONTINUE;
}

-static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
+static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault)
{
/*
* Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
@@ -3307,6 +3307,26 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
if (fault->rsvd)
return false;

+ /*
+ * For hardware-protected VMs, certain conditions like attempting to
+ * perform a write to a page which is not in the state that the guest
+ * expects it to be in can result in a nested/extended #PF. In this
+ * case, the below code might misconstrue this situation as being the
+ * result of a write-protected access, and treat it as a spurious case
+ * rather than taking any action to satisfy the real source of the #PF
+ * such as generating a KVM_EXIT_MEMORY_FAULT. This can lead to the
+ * guest spinning on a #PF indefinitely, so don't attempt the fast path
+ * in this case.
+ *
+ * Note that the kvm_mem_is_private() check might race with an
+ * attribute update, but this will either result in the guest spinning
+ * on RET_PF_SPURIOUS until the update completes, or an actual spurious
+ * case might go down the slow path. Either case will resolve itself.
+ */
+ if (kvm->arch.has_private_mem &&
+ fault->is_private != kvm_mem_is_private(kvm, fault->gfn))
+ return false;
+
/*
* #PF can be fast if:
*
@@ -3407,7 +3427,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
u64 *sptep;
uint retry_count = 0;

- if (!page_fault_can_be_fast(fault))
+ if (!page_fault_can_be_fast(vcpu->kvm, fault))
return ret;

walk_shadow_page_lockless_begin(vcpu);
--
2.25.1


2024-05-10 21:33:40

by Michael Roth

[permalink] [raw]
Subject: [PULL 03/19] KVM: SEV: Add initial SEV-SNP support

From: Brijesh Singh <[email protected]>

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protection. SEV-SNP adds strong memory
encryption and integrity protection to help prevent malicious
hypervisor-based attacks such as data replay and memory re-mapping,
creating an isolated execution environment.

Define a new KVM_X86_SNP_VM type which makes use of these capabilities
and extend the KVM_SEV_INIT2 ioctl to support it. Also add a basic
helper to check whether SNP is enabled and set PFERR_PRIVATE_ACCESS for
private #NPFs so they are handled appropriately by KVM MMU.
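
A hedged sketch of how userspace would opt in to the new type
(KVM_CREATE_VM takes the machine type as its argument, and KVM_SEV_INIT2
is issued through the KVM_MEMORY_ENCRYPT_OP ioctl; fd names are
placeholders and error handling is elided):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch: create an SNP VM and initialize its SEV state. */
  static int create_snp_vm(int kvm_fd, int sev_fd)
  {
          struct kvm_sev_init init = {};
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_INIT2,
                  .data   = (__u64)(uintptr_t)&init,
                  .sev_fd = sev_fd,
          };
          int vm_fd;

          vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SNP_VM);
          if (vm_fd < 0 || ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
                  return -1;

          return vm_fd;
  }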

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/svm.h | 3 ++-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 21 ++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 8 +++++++-
arch/x86/kvm/svm/svm.h | 12 ++++++++++++
5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 728c98175b9c..544a43c1cf11 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -285,7 +285,8 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_

#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

-#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 9fae1b73b529..d2ae5fcc0275 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -874,5 +874,6 @@ struct kvm_hyperv_eventfd {
#define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3
+#define KVM_X86_SNP_VM 4

#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0623cfaa7bb0..b3345d45b989 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -47,6 +47,9 @@ module_param_named(sev, sev_enabled, bool, 0444);
static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);

+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
@@ -288,6 +291,9 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
if (sev->es_active && !sev->ghcb_version)
sev->ghcb_version = GHCB_VERSION_DEFAULT;

+ if (vm_type == KVM_X86_SNP_VM)
+ sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
ret = sev_asid_new(sev);
if (ret)
goto e_no_asid;
@@ -348,7 +354,8 @@ static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
return -EINVAL;

if (kvm->arch.vm_type != KVM_X86_SEV_VM &&
- kvm->arch.vm_type != KVM_X86_SEV_ES_VM)
+ kvm->arch.vm_type != KVM_X86_SEV_ES_VM &&
+ kvm->arch.vm_type != KVM_X86_SNP_VM)
return -EINVAL;

if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
@@ -2328,11 +2335,16 @@ void __init sev_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
}
+ if (sev_snp_enabled) {
+ kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
+ kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
+ }
}

void __init sev_hardware_setup(void)
{
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;

@@ -2413,6 +2425,7 @@ void __init sev_hardware_setup(void)
sev_es_asid_count = min_sev_asid - 1;
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);

out:
if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2425,9 +2438,15 @@ void __init sev_hardware_setup(void)
pr_info("SEV-ES %s (ASIDs %u - %u)\n",
sev_es_supported ? "enabled" : "disabled",
min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
+ if (boot_cpu_has(X86_FEATURE_SEV_SNP))
+ pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
+ sev_snp_supported ? "enabled" : "disabled",
+ min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);

sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+ sev_snp_enabled = sev_snp_supported;
+
if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
!cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
sev_es_debug_swap_enabled = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c8dc25886c16..66d5e2e46a66 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2057,6 +2057,9 @@ static int npf_interception(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
error_code &= ~PFERR_SYNTHETIC_MASK;

+ if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
+ error_code |= PFERR_PRIVATE_ACCESS;
+
trace_kvm_page_fault(vcpu, fault_address, error_code);
return kvm_mmu_page_fault(vcpu, fault_address, error_code,
static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
@@ -4902,8 +4905,11 @@ static int svm_vm_init(struct kvm *kvm)

if (type != KVM_X86_DEFAULT_VM &&
type != KVM_X86_SW_PROTECTED_VM) {
- kvm->arch.has_protected_state = (type == KVM_X86_SEV_ES_VM);
+ kvm->arch.has_protected_state =
+ (type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
to_kvm_sev_info(kvm)->need_init = true;
+
+ kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
}

if (!pause_filter_count || !pause_filter_thresh)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index be57213cd295..583e035d38f8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -349,6 +349,18 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}

+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return (sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE) &&
+ !WARN_ON_ONCE(!sev_es_guest(kvm));
+#else
+ return false;
+#endif
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1


2024-05-10 21:34:08

by Michael Roth

[permalink] [raw]
Subject: [PULL 04/19] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command

From: Brijesh Singh <[email protected]>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. Other commands can then be used to
load/encrypt data into the guest's initial launch image.

For more information see the SEV-SNP specification.
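
For illustration, a hedged sketch of issuing the command from userspace
(the policy value is just an example that sets the required
reserved-must-be-one bit 17 and the SMT-allowed bit 16, matching the
policy checks in the patch below; error handling is elided):

  /* Sketch: create the SNP launch context for an already-initialized VM. */
  static int do_snp_launch_start(int vm_fd, int sev_fd)
  {
          struct kvm_sev_snp_launch_start start = {
                  .policy = (1ULL << 17) | (1ULL << 16),
          };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_START,
                  .data   = (__u64)(uintptr_t)&start,
                  .sev_fd = sev_fd,
          };

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }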

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 28 ++-
arch/x86/include/uapi/asm/kvm.h | 11 ++
arch/x86/kvm/svm/sev.c | 176 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 9677a0714a39..dd179e162a87 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -466,6 +466,30 @@ issued by the hypervisor to make the guest ready for execution.

Returns: 0 on success, -negative on error

+18. KVM_SEV_SNP_LAUNCH_START
+----------------------------
+
+The KVM_SEV_SNP_LAUNCH_START command is used for creating the memory
+encryption context for the SEV-SNP guest. It must be called prior to issuing
+KVM_SEV_SNP_LAUNCH_UPDATE or KVM_SEV_SNP_LAUNCH_FINISH.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u8 gosvw[16]; /* Guest OS visible workarounds. */
+ __u16 flags; /* Must be zero. */
+ __u8 pad0[6];
+ __u64 pad1[4];
+ };
+
+See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_start``.
+
Device attribute API
====================

@@ -497,9 +521,11 @@ References
==========


-See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info.
+See [white-paper]_, [api-spec]_, [amd-apm]_, [kvm-forum]_, and [snp-fw-abi]_
+for more info.

.. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
.. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf
.. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34)
.. [kvm-forum] https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
+.. [snp-fw-abi] https://www.amd.com/system/files/TechDocs/56860.pdf
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d2ae5fcc0275..693a80ffe40a 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -697,6 +697,9 @@ enum sev_cmd_id {
/* Second time is the charm; improved versions of the above ioctls. */
KVM_SEV_INIT2,

+ /* SNP-specific commands */
+ KVM_SEV_SNP_LAUNCH_START = 100,
+
KVM_SEV_NR_MAX,
};

@@ -824,6 +827,14 @@ struct kvm_sev_receive_update_data {
__u32 pad2;
};

+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u8 gosvw[16];
+ __u16 flags;
+ __u8 pad0[6];
+ __u64 pad1[4];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b3345d45b989..b372ae5c8c58 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -25,6 +25,7 @@
#include <asm/fpu/xcr.h>
#include <asm/fpu/xstate.h>
#include <asm/debugreg.h>
+#include <asm/sev.h>

#include "mmu.h"
#include "x86.h"
@@ -59,6 +60,21 @@ static u64 sev_supported_vmsa_features;
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2

+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_API_MINOR GENMASK_ULL(7, 0)
+#define SNP_POLICY_MASK_API_MAJOR GENMASK_ULL(15, 8)
+#define SNP_POLICY_MASK_SMT BIT_ULL(16)
+#define SNP_POLICY_MASK_RSVD_MBO BIT_ULL(17)
+#define SNP_POLICY_MASK_DEBUG BIT_ULL(19)
+#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+
+#define SNP_POLICY_MASK_VALID (SNP_POLICY_MASK_API_MINOR | \
+ SNP_POLICY_MASK_API_MAJOR | \
+ SNP_POLICY_MASK_SMT | \
+ SNP_POLICY_MASK_RSVD_MBO | \
+ SNP_POLICY_MASK_DEBUG | \
+ SNP_POLICY_MASK_SINGLE_SOCKET)
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -69,6 +85,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;

+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -95,12 +113,17 @@ static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
down_write(&sev_deactivate_lock);

wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);

up_write(&sev_deactivate_lock);

if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);

return ret;
}
@@ -1998,6 +2021,106 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val)
}
}

+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.address = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ pr_warn("Failed to create SEV-SNP context, rc %d fw_error %d",
+ rc, argp->error);
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ /* Don't allow userspace to allocate memory for more than 1 SNP context. */
+ if (sev->snp_context)
+ return -EINVAL;
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ if (params.flags)
+ return -EINVAL;
+
+ if (params.policy & ~SNP_POLICY_MASK_VALID)
+ return -EINVAL;
+
+ /* Check for policy bits that must be set */
+ if (!(params.policy & SNP_POLICY_MASK_RSVD_MBO) ||
+ !(params.policy & SNP_POLICY_MASK_SMT))
+ return -EINVAL;
+
+ if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET)
+ return -EINVAL;
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc) {
+ pr_debug("%s: SEV_CMD_SNP_LAUNCH_START firmware command failed, rc %d\n",
+ __func__, rc);
+ goto e_free_context;
+ }
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc) {
+ pr_debug("%s: Failed to bind ASID to SEV-SNP context, rc %d\n",
+ __func__, rc);
+ goto e_free_context;
+ }
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2021,6 +2144,15 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
goto out;
}

+ /*
+ * Once KVM_SEV_INIT2 initializes a KVM instance as an SNP guest, only
+ * allow the use of SNP-specific commands.
+ */
+ if (sev_snp_guest(kvm) && sev_cmd.id < KVM_SEV_SNP_LAUNCH_START) {
+ r = -EPERM;
+ goto out;
+ }
+
switch (sev_cmd.id) {
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
@@ -2085,6 +2217,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2280,6 +2415,31 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+	/* Do the decommission, which will unbind the ASID from the SNP context */
+ data.address = __sme_pa(sev->snp_context);
+ down_write(&sev_deactivate_lock);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ up_write(&sev_deactivate_lock);
+
+ if (WARN_ONCE(ret, "Failed to release guest context, ret %d", ret))
+ return ret;
+
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2321,7 +2481,17 @@ void sev_vm_destroy(struct kvm *kvm)
}
}

- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ /*
+	 * Decommission handles unbinding of the ASID. If it fails for
+ * some unexpected reason, just leak the ASID.
+ */
+ if (snp_decommission_context(kvm))
+ return;
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 583e035d38f8..305772d36490 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -93,6 +93,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ void *snp_context; /* SNP guest context page */
};

struct kvm_svm {
--
2.25.1


2024-05-10 21:34:17

by Michael Roth

[permalink] [raw]
Subject: [PULL 02/19] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y

SEV-SNP relies on private memory support to run guests, so make sure to
enable that support via the CONFIG_KVM_GENERIC_PRIVATE_MEM config
option.

Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d64fb2b3eb69..5e72faca4e8f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -136,6 +136,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
+ select KVM_GENERIC_PRIVATE_MEM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1


2024-05-10 21:34:36

by Michael Roth

[permalink] [raw]
Subject: [PULL 05/19] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

From: Brijesh Singh <[email protected]>

A key aspect of launching an SNP guest is initializing it with a
known/measured payload, which is encrypted into guest memory as
pre-validated private pages and measured into the cryptographic launch
context created with KVM_SEV_SNP_LAUNCH_START, so that the guest can
attest itself after booting.

Since all private pages are provided by guest_memfd, make use of the
kvm_gmem_populate() interface to handle this. The general flow is that
guest_memfd will handle allocating the pages associated with the GPA
ranges being initialized by each particular call of
KVM_SEV_SNP_LAUNCH_UPDATE, copying data from userspace into those pages,
and then the post_populate callback will do the work of setting the
RMP entries for these pages to private and issuing the SNP firmware
calls to encrypt/measure them.

For more information see the SEV-SNP specification.
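
Note that, per the documentation update below, the command may make
partial progress and updates the struct fields in place to reflect the
remaining range, so userspace is expected to re-issue it until the whole
range has been processed. A hedged sketch of that loop (error handling
abbreviated):

  #include <errno.h>

  /* Sketch: load/measure a payload, retrying on partial progress/-EAGAIN. */
  static int do_snp_launch_update(int vm_fd, int sev_fd, __u64 gfn,
                                  void *buf, __u64 len, __u8 type)
  {
          struct kvm_sev_snp_launch_update update = {
                  .gfn_start = gfn,
                  .uaddr     = (__u64)(uintptr_t)buf,
                  .len       = len,
                  .type      = type,
          };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_UPDATE,
                  .data   = (__u64)(uintptr_t)&update,
                  .sev_fd = sev_fd,
          };
          int ret;

          do {
                  ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
          } while (update.len && (!ret || errno == EAGAIN));

          return ret;
  }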

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 54 ++++
arch/x86/include/uapi/asm/kvm.h | 19 ++
arch/x86/kvm/svm/sev.c | 230 ++++++++++++++++++
3 files changed, 303 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index dd179e162a87..cc16a7426d18 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -490,6 +490,60 @@ Returns: 0 on success, -negative on error
See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
details on the input parameters in ``struct kvm_sev_snp_launch_start``.

+19. KVM_SEV_SNP_LAUNCH_UPDATE
+-----------------------------
+
+The KVM_SEV_SNP_LAUNCH_UPDATE command is used for loading userspace-provided
+data into a guest GPA range, measuring the contents into the SNP guest context
+created by KVM_SEV_SNP_LAUNCH_START, and then encrypting/validating that GPA
+range so that it will be immediately readable using the encryption key
+associated with the guest context once it is booted, after which point it can
+attest the measurement associated with its context before unlocking any
+secrets.
+
+It is required that the GPA ranges initialized by this command have had the
+KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation
+for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect.
+
+Upon success, this command is not guaranteed to have processed the entire
+range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of
+``struct kvm_sev_snp_launch_update`` will be updated to correspond to the
+remaining range that has yet to be processed. The caller should continue
+calling this command until those fields indicate the entire range has been
+processed, e.g. ``len`` is 0, ``gfn_start`` is equal to the last GFN in the
+range plus 1, and ``uaddr`` is the last byte of the userspace-provided source
+buffer address plus 1. In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO,
+``uaddr`` will be ignored completely.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, < 0 on error, -EAGAIN if caller should retry
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 gfn_start; /* Guest page number to load/encrypt data into. */
+ __u64 uaddr; /* Userspace address of data to be loaded/encrypted. */
+ __u64 len; /* 4k-aligned length in bytes to copy into guest memory.*/
+ __u8 type; /* The type of the guest pages being initialized. */
+ __u8 pad0;
+ __u16 flags; /* Must be zero. */
+ __u32 pad1;
+ __u64 pad2[4];
+
+ };
+
+where the allowed values for page_type are #define'd as::
+
+ KVM_SEV_SNP_PAGE_TYPE_NORMAL
+ KVM_SEV_SNP_PAGE_TYPE_ZERO
+ KVM_SEV_SNP_PAGE_TYPE_UNMEASURED
+ KVM_SEV_SNP_PAGE_TYPE_SECRETS
+ KVM_SEV_SNP_PAGE_TYPE_CPUID
+
+See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
+used/measured.
+
Device attribute API
====================

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 693a80ffe40a..5935dc8a7e02 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -699,6 +699,7 @@ enum sev_cmd_id {

/* SNP-specific commands */
KVM_SEV_SNP_LAUNCH_START = 100,
+ KVM_SEV_SNP_LAUNCH_UPDATE,

KVM_SEV_NR_MAX,
};
@@ -835,6 +836,24 @@ struct kvm_sev_snp_launch_start {
__u64 pad1[4];
};

+/* Kept in sync with firmware values for simplicity. */
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 gfn_start;
+ __u64 uaddr;
+ __u64 len;
+ __u8 type;
+ __u8 pad0;
+ __u16 flags;
+ __u32 pad1;
+ __u64 pad2[4];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b372ae5c8c58..c966f2224624 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -259,6 +259,45 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}

+/*
+ * Certain page-states, such as Pre-Guest and Firmware pages (as documented
+ * in Chapter 5 of the SEV-SNP Firmware ABI under "Page States") cannot be
+ * directly transitioned back to normal/hypervisor-owned state via RMPUPDATE
+ * unless they are reclaimed first.
+ *
+ * Until they are reclaimed and subsequently transitioned via RMPUPDATE, they
+ * might not be usable by the host due to being set as immutable or still
+ * being associated with a guest ASID.
+ */
+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (WARN_ONCE(rc, "Failed to reclaim PFN %llx", pfn))
+ snp_leak_pages(pfn, 1);
+
+ return rc;
+}
+
+/*
+ * Transition a page to hypervisor-owned/shared state in the RMP table. This
+ * should not fail under normal conditions, but leak the page should that
+ * happen since it will no longer be usable by the host due to RMP protections.
+ */
+static int host_rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (WARN_ON_ONCE(rc))
+ snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -2121,6 +2160,194 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}

+struct sev_gmem_populate_args {
+ __u8 type;
+ int sev_fd;
+ int fw_error;
+};
+
+static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn_start, kvm_pfn_t pfn,
+ void __user *src, int order, void *opaque)
+{
+ struct sev_gmem_populate_args *sev_populate_args = opaque;
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ int n_private = 0, ret, i;
+ int npages = (1 << order);
+ gfn_t gfn;
+
+ if (WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO && !src))
+ return -EINVAL;
+
+ for (gfn = gfn_start, i = 0; gfn < gfn_start + npages; gfn++, i++) {
+ struct sev_data_snp_launch_update fw_args = {0};
+ bool assigned;
+ int level;
+
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_debug("%s: Failed to ensure GFN 0x%llx has private memory attribute set\n",
+ __func__, gfn);
+ ret = -EINVAL;
+ goto err;
+ }
+
+ ret = snp_lookup_rmpentry((u64)pfn + i, &assigned, &level);
+ if (ret || assigned) {
+ pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
+ __func__, gfn, ret, assigned);
+ ret = -EINVAL;
+ goto err;
+ }
+
+ if (src) {
+ void *vaddr = kmap_local_pfn(pfn + i);
+
+ ret = copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE);
+ if (ret)
+ goto err;
+ kunmap_local(vaddr);
+ }
+
+ ret = rmp_make_private(pfn + i, gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ sev_get_asid(kvm), true);
+ if (ret)
+ goto err;
+
+ n_private++;
+
+ fw_args.gctx_paddr = __psp_pa(sev->snp_context);
+ fw_args.address = __sme_set(pfn_to_hpa(pfn + i));
+ fw_args.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
+ fw_args.page_type = sev_populate_args->type;
+
+ ret = __sev_issue_cmd(sev_populate_args->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &fw_args, &sev_populate_args->fw_error);
+ if (ret)
+ goto fw_err;
+ }
+
+ return 0;
+
+fw_err:
+ /*
+ * If the firmware command failed handle the reclaim and cleanup of that
+ * PFN specially vs. prior pages which can be cleaned up below without
+ * needing to reclaim in advance.
+ *
+ * Additionally, when invalid CPUID function entries are detected,
+ * firmware writes the expected values into the page and leaves it
+ * unencrypted so it can be used for debugging and error-reporting.
+ *
+ * Copy this page back into the source buffer so userspace can use this
+ * information to provide information on which CPUID leaves/fields
+ * failed CPUID validation.
+ */
+ if (!snp_page_reclaim(pfn + i) && !host_rmp_make_shared(pfn + i, PG_LEVEL_4K) &&
+ sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
+ sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
+ void *vaddr = kmap_local_pfn(pfn + i);
+
+ if (copy_to_user(src + i * PAGE_SIZE, vaddr, PAGE_SIZE))
+ pr_debug("Failed to write CPUID page back to userspace\n");
+
+ kunmap_local(vaddr);
+ }
+
+ /* pfn + i is hypervisor-owned now, so skip below cleanup for it. */
+ n_private--;
+
+err:
+ pr_debug("%s: exiting with error ret %d (fw_error %d), restoring %d gmem PFNs to shared.\n",
+ __func__, ret, sev_populate_args->fw_error, n_private);
+ for (i = 0; i < n_private; i++)
+ host_rmp_make_shared(pfn + i, PG_LEVEL_4K);
+
+ return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_gmem_populate_args sev_populate_args = {0};
+ struct kvm_sev_snp_launch_update params;
+ struct kvm_memory_slot *memslot;
+ long npages, count;
+ void __user *src;
+ int ret = 0;
+
+ if (!sev_snp_guest(kvm) || !sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d\n", __func__,
+ params.gfn_start, params.len, params.type, params.flags);
+
+ if (!PAGE_ALIGNED(params.len) || params.flags ||
+ (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
+ return -EINVAL;
+
+ npages = params.len / PAGE_SIZE;
+
+ /*
+ * For each GFN that's being prepared as part of the initial guest
+ * state, the following pre-conditions are verified:
+ *
+ * 1) The backing memslot is a valid private memslot.
+ * 2) The GFN has been set to private via KVM_SET_MEMORY_ATTRIBUTES
+ * beforehand.
+ * 3) The PFN of the guest_memfd has not already been set to private
+ * in the RMP table.
+ *
+ * The KVM MMU relies on kvm->mmu_invalidate_seq to retry nested page
+ * faults if there's a race between a fault and an attribute update via
+ * KVM_SET_MEMORY_ATTRIBUTES, and a similar approach could be utilized
+ * here. However, kvm->slots_lock guards against both this as well as
+ * concurrent memslot updates occurring while these checks are being
+ * performed, so use that here to make it easier to reason about the
+ * initial expected state and better guard against unexpected
+ * situations.
+ */
+ mutex_lock(&kvm->slots_lock);
+
+ memslot = gfn_to_memslot(kvm, params.gfn_start);
+ if (!kvm_slot_can_be_private(memslot)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ sev_populate_args.sev_fd = argp->sev_fd;
+ sev_populate_args.type = params.type;
+ src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
+
+ count = kvm_gmem_populate(kvm, params.gfn_start, src, npages,
+ sev_gmem_post_populate, &sev_populate_args);
+ if (count < 0) {
+ argp->error = sev_populate_args.fw_error;
+ pr_debug("%s: kvm_gmem_populate failed, ret %ld (fw_error %d)\n",
+ __func__, count, argp->error);
+ ret = -EIO;
+ } else {
+ params.gfn_start += count;
+ params.len -= count * PAGE_SIZE;
+ if (params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO)
+ params.uaddr += count * PAGE_SIZE;
+
+ ret = 0;
+ if (copy_to_user(u64_to_user_ptr(argp->data), &params, sizeof(params)))
+ ret = -EFAULT;
+ }
+
+out:
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2220,6 +2447,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
--
2.25.1


2024-05-12 07:14:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 00/19] KVM: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Fri, May 10, 2024 at 11:17 PM Michael Roth <[email protected]> wrote:
>
> Hi Paolo,
>
> This pull request contains v15 of the KVM SNP support patchset[1] along
> with fixes and feedback from you and Sean regarding PSC request processing,
> fast_page_fault() handling for SNP/TDX, and avoiding uncessary
> PSMASH/zapping for KVM_EXIT_MEMORY_FAULT events. It's also been rebased
> on top of kvm/queue (commit 1451476151e0), and re-tested with/without
> 2MB gmem pages enabled.

Pulled into kvm-coco-queue, thanks (and sorry for the sev_complete_psc
mess up - it seemed too good to be true that the PSC changes were all
fine...).

Paolo

> Thanks!
>
> -Mike
>
> [1] https://lore.kernel.org/kvm/20240501085210.2213060-1-michael.roth@amd.com/
>
> The following changes since commit 1451476151e08e1e83ff07ce69dd0d1d025e976e:
>
> Merge commit 'kvm-coco-hooks' into HEAD (2024-05-10 13:20:42 -0400)
>
> are available in the Git repository at:
>
> https://github.com/mdroth/linux.git tags/tags/kvm-queue-snp
>
> for you to fetch changes up to 4b3f0135f759bb1a54bb28d644c38a7780150eda:
>
> crypto: ccp: Add the SNP_VLEK_LOAD command (2024-05-10 14:44:31 -0500)
>
> ----------------------------------------------------------------
> Base x86 KVM support for running SEV-SNP guests:
>
> - add some basic infrastructure and introduces a new KVM_X86_SNP_VM
> vm_type to handle differences versus the existing KVM_X86_SEV_VM and
> KVM_X86_SEV_ES_VM types.
>
> - implement the KVM API to handle the creation of a cryptographic
> launch context, encrypt/measure the initial image into guest memory,
> and finalize it before launching it.
>
> - implement handling for various guest-generated events such as page
> state changes, onlining of additional vCPUs, etc.
>
> - implement the gmem/mmu hooks needed to prepare gmem-allocated pages
> before mapping them into guest private memory ranges as well as
> cleaning them up prior to returning them to the host for use as
> normal memory. Because those cleanup hooks supplant certain
> activities like issuing WBINVDs during KVM MMU invalidations, avoid
> duplicating that work to avoid unecessary overhead.
>
> - add support for the servicing of guest requests to handle things like
> attestation, as well as some related host-management interfaces to
> handle updating firmware's signing key for attestation requests
>
> ----------------------------------------------------------------
> Ashish Kalra (1):
> KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
>
> Brijesh Singh (8):
> KVM: SEV: Add initial SEV-SNP support
> KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
> KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
> KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
> KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
> KVM: SEV: Add support to handle RMP nested page faults
> KVM: SVM: Add module parameter to enable SEV-SNP
> KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
>
> Michael Roth (9):
> KVM: MMU: Disable fast path if KVM_EXIT_MEMORY_FAULT is needed
> KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
> KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
> KVM: SEV: Add support to handle Page State Change VMGEXIT
> KVM: SEV: Implement gmem hook for initializing private pages
> KVM: SEV: Implement gmem hook for invalidating private pages
> KVM: x86: Implement hook for determining max NPT mapping level
> KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
> crypto: ccp: Add the SNP_VLEK_LOAD command
>
> Tom Lendacky (1):
> KVM: SEV: Support SEV-SNP AP Creation NAE event
>
> Documentation/virt/coco/sev-guest.rst | 19 +
> Documentation/virt/kvm/api.rst | 87 ++
> .../virt/kvm/x86/amd-memory-encryption.rst | 110 +-
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/include/asm/sev-common.h | 25 +
> arch/x86/include/asm/sev.h | 3 +
> arch/x86/include/asm/svm.h | 9 +-
> arch/x86/include/uapi/asm/kvm.h | 48 +
> arch/x86/kvm/Kconfig | 3 +
> arch/x86/kvm/mmu.h | 2 -
> arch/x86/kvm/mmu/mmu.c | 25 +-
> arch/x86/kvm/svm/sev.c | 1546 +++++++++++++++++++-
> arch/x86/kvm/svm/svm.c | 37 +-
> arch/x86/kvm/svm/svm.h | 52 +
> arch/x86/kvm/trace.h | 31 +
> arch/x86/kvm/x86.c | 17 +
> drivers/crypto/ccp/sev-dev.c | 36 +
> include/linux/psp-sev.h | 4 +-
> include/uapi/linux/kvm.h | 23 +
> include/uapi/linux/psp-sev.h | 27 +
> include/uapi/linux/sev-guest.h | 9 +
> virt/kvm/guest_memfd.c | 4 +-
> 22 files changed, 2086 insertions(+), 33 deletions(-)
>


2024-05-12 08:17:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 00/19] KVM: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Sun, May 12, 2024 at 9:14 AM Paolo Bonzini <[email protected]> wrote:
>
> On Fri, May 10, 2024 at 11:17 PM Michael Roth <[email protected]> wrote:
> >
> > Hi Paolo,
> >
> > This pull request contains v15 of the KVM SNP support patchset[1] along
> > with fixes and feedback from you and Sean regarding PSC request processing,
> > fast_page_fault() handling for SNP/TDX, and avoiding uncessary
> > PSMASH/zapping for KVM_EXIT_MEMORY_FAULT events. It's also been rebased
> > on top of kvm/queue (commit 1451476151e0), and re-tested with/without
> > 2MB gmem pages enabled.
>
> Pulled into kvm-coco-queue, thanks (and sorry for the sev_complete_psc
> mess up - it seemed too good to be true that the PSC changes were all
> fine...).

... and there was a missing signoff in "KVM: SVM: Add module parameter
to enable SEV-SNP" so I ended up not using the pull request. But it
was still good to have it because it made it simpler to double check
what you tested vs. what I applied.

Also I have already received the full set of pull requests for
submaintainers, so I put it in kvm/next. It's not impossible that it
ends up in the 6.10 merge window, so I might as well give it a week or
two in linux-next.

Paolo


Paolo


2024-05-13 01:07:24

by Michael Roth

[permalink] [raw]
Subject: Re: [PULL 00/19] KVM: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Sun, May 12, 2024 at 10:17:06AM +0200, Paolo Bonzini wrote:
> On Sun, May 12, 2024 at 9:14 AM Paolo Bonzini <[email protected]> wrote:
> >
> > On Fri, May 10, 2024 at 11:17 PM Michael Roth <[email protected]> wrote:
> > >
> > > Hi Paolo,
> > >
> > > This pull request contains v15 of the KVM SNP support patchset[1] along
> > > with fixes and feedback from you and Sean regarding PSC request processing,
> > > fast_page_fault() handling for SNP/TDX, and avoiding uncessary
> > > PSMASH/zapping for KVM_EXIT_MEMORY_FAULT events. It's also been rebased
> > > on top of kvm/queue (commit 1451476151e0), and re-tested with/without
> > > 2MB gmem pages enabled.
> >
> > Pulled into kvm-coco-queue, thanks (and sorry for the sev_complete_psc
> > mess up - it seemed too good to be true that the PSC changes were all
> > fine...).

That issue was actually introduced from my end while applying the changes,
so I think your suggested changes did pretty much work as-written. :)

>
> ... and there was a missing signoff in "KVM: SVM: Add module parameter
> to enable SEV-SNP" so I ended up not using the pull request. But it
> was still good to have it because it made it simpler to double check
> what you tested vs. what I applied.
>
> Also I have already received the full set of pull requests for
> submaintainers, so I put it in kvm/next. It's not impossible that it
> ends up in the 6.10 merge window, so I might as well give it a week or
> two in linux-next.

Makes sense; glad to hear it! I've re-tested the kvm/next version and
everything looks good. Will also get our CI configured to monitor kvm/next
as well.

Thanks,

Mike

>
> Paolo
>
>
> Paolo
>

2024-05-13 15:19:33

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PULL 18/19] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Hi Michael,

On Fri, May 10, 2024 at 04:10:23PM -0500, Michael Roth wrote:
> Version 2 of GHCB specification added support for the SNP Extended Guest
> Request Message NAE event. This event serves a nearly identical purpose
> to the previously-added SNP_GUEST_REQUEST event, but allows for
> additional certificate data to be supplied via an additional
> guest-supplied buffer to be used mainly for verifying the signature of
> an attestation report as returned by firmware.
>
> This certificate data is supplied by userspace, so unlike with
> SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
> forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
> the firmware request is made after the certificate data has been fetched
> from userspace.
>
> Since there is a potential for race conditions where the
> userspace-supplied certificate data may be out-of-sync relative to the
> reported TCB or VLEK that firmware will use when signing attestation
> reports, a hook is also provided so that userspace can be informed once
> the attestation request is actually completed. See the updates to
> Documentation/ for more details on these aspects.
>
> Signed-off-by: Michael Roth <[email protected]>
> Message-ID: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
..
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 00d29d278f6e..398266bef2ca 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
..
> +static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> +{
> + int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> + struct vcpu_svm *svm = to_svm(vcpu);
> + unsigned long data_npages;
> + sev_ret_code fw_err;
> + gpa_t data_gpa;
> +
> + if (!sev_snp_guest(vcpu->kvm))
> + goto abort_request;
> +
> + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> +
> + if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
> + goto abort_request;
> +
> + /*
> + * Grab the certificates from userspace so that can be bundled with
> + * attestation/guest requests.
> + */
> + vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
> + vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_REQ_CERTS;
> + vcpu->run->vmgexit.req_certs.data_gpa = data_gpa;
> + vcpu->run->vmgexit.req_certs.data_npages = data_npages;
> + vcpu->run->vmgexit.req_certs.flags = 0;
> + vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING;
> + vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
> +
> + return 0; /* forward request to userspace */
> +
> +abort_request:
> + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> + return 1; /* resume guest */
> +}

This patch is now in -next as commit 32fde9e18b3f ("KVM: SEV: Provide
support for SNP_EXTENDED_GUEST_REQUEST NAE event"), where it causes a
clang warning (or hard error when CONFIG_WERROR is enabled):

arch/x86/kvm/svm/sev.c:4078:67: error: variable 'fw_err' is uninitialized when used here [-Werror,-Wuninitialized]
4078 | ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
| ^~~~~~
include/uapi/linux/sev-guest.h:94:24: note: expanded from macro 'SNP_GUEST_ERR'
94 | SNP_GUEST_FW_ERR(fw_err))
| ^~~~~~
include/uapi/linux/sev-guest.h:92:32: note: expanded from macro 'SNP_GUEST_FW_ERR'
92 | #define SNP_GUEST_FW_ERR(x) ((x) & SNP_GUEST_FW_ERR_MASK)
| ^
arch/x86/kvm/svm/sev.c:4051:2: note: variable 'fw_err' is declared here
4051 | sev_ret_code fw_err;
| ^
1 error generated.

Seems legitimate to me. What was the intention here?

Cheers,
Nathan

2024-05-13 16:53:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 18/19] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

On 5/13/24 17:19, Nathan Chancellor wrote:
>> +static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
>> +{
>> + int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
>> + struct vcpu_svm *svm = to_svm(vcpu);
>> + unsigned long data_npages;
>> + sev_ret_code fw_err;
>> + gpa_t data_gpa;
>> +
>> + if (!sev_snp_guest(vcpu->kvm))
>> + goto abort_request;
>> +
>> + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>> + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>> +
>> + if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
>> + goto abort_request;
>
> [...]
>
>> +abort_request:
>> + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
>> + return 1; /* resume guest */
>> +}
>
> This patch is now in -next as commit 32fde9e18b3f ("KVM: SEV: Provide
> support for SNP_EXTENDED_GUEST_REQUEST NAE event"), where it causes a
> clang warning (or hard error when CONFIG_WERROR is enabled) [...]
> Seems legitimate to me. What was the intention here?

Mike, I think this should just be 0?

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c7a0971149f2..affb4fb47f91 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3911,7 +3911,6 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long data_npages;
- sev_ret_code fw_err;
gpa_t data_gpa;

if (!sev_snp_guest(vcpu->kvm))
@@ -3938,7 +3937,7 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
return 0; /* forward request to userspace */

abort_request:
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, 0));
return 1; /* resume guest */
}

Paolo


2024-05-13 17:14:38

by Michael Roth

[permalink] [raw]
Subject: Re: [PULL 18/19] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

On Mon, May 13, 2024 at 06:53:24PM +0200, Paolo Bonzini wrote:
> On 5/13/24 17:19, Nathan Chancellor wrote:
> > > +static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> > > +{
> > > + int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > > + struct vcpu_svm *svm = to_svm(vcpu);
> > > + unsigned long data_npages;
> > > + sev_ret_code fw_err;
> > > + gpa_t data_gpa;
> > > +
> > > + if (!sev_snp_guest(vcpu->kvm))
> > > + goto abort_request;
> > > +
> > > + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > > + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> > > +
> > > + if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
> > > + goto abort_request;
> >
> > [...]
> >
> > > +abort_request:
> > > + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> > > + return 1; /* resume guest */
> > > +}
> >
> > This patch is now in -next as commit 32fde9e18b3f ("KVM: SEV: Provide
> > support for SNP_EXTENDED_GUEST_REQUEST NAE event"), where it causes a
> > clang warning (or hard error when CONFIG_WERROR is enabled) [...]
> > Seems legitimate to me. What was the intention here?
>
> Mike, I think this should just be 0?

Hi Paolo,

Yes, I was just about to submit a patch that does just that:

https://github.com/mdroth/linux/commit/df55e9c5b97542fe037f5b5293c11a49f7c658ef

Sorry for the breakage,

Mike

>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index c7a0971149f2..affb4fb47f91 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3911,7 +3911,6 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> struct vcpu_svm *svm = to_svm(vcpu);
> unsigned long data_npages;
> - sev_ret_code fw_err;
> gpa_t data_gpa;
> if (!sev_snp_guest(vcpu->kvm))
> @@ -3938,7 +3937,7 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> return 0; /* forward request to userspace */
> abort_request:
> - ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, 0));
> return 1; /* resume guest */
> }
> Paolo
>
>

2024-05-13 17:21:17

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 18/19] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

On Mon, May 13, 2024 at 7:11 PM Michael Roth <[email protected]> wrote:
> Hi Paolo,
>
> Yes, I was just about to submit a patch that does just that:
>
> https://github.com/mdroth/linux/commit/df55e9c5b97542fe037f5b5293c11a49f7c658ef

Go ahead then!

Paolo


2024-05-13 21:18:56

by Michael Roth

[permalink] [raw]
Subject: Re: [PULL 18/19] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

On Mon, May 13, 2024 at 12:05:35PM -0500, Michael Roth wrote:
> On Mon, May 13, 2024 at 06:53:24PM +0200, Paolo Bonzini wrote:
> > On 5/13/24 17:19, Nathan Chancellor wrote:
> > > > +static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > > > + struct vcpu_svm *svm = to_svm(vcpu);
> > > > + unsigned long data_npages;
> > > > + sev_ret_code fw_err;
> > > > + gpa_t data_gpa;
> > > > +
> > > > + if (!sev_snp_guest(vcpu->kvm))
> > > > + goto abort_request;
> > > > +
> > > > + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > > > + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> > > > +
> > > > + if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
> > > > + goto abort_request;
> > >
> > > [...]
> > >
> > > > +abort_request:
> > > > + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> > > > + return 1; /* resume guest */
> > > > +}
> > >
> > > This patch is now in -next as commit 32fde9e18b3f ("KVM: SEV: Provide
> > > support for SNP_EXTENDED_GUEST_REQUEST NAE event"), where it causes a
> > > clang warning (or hard error when CONFIG_WERROR is enabled) [...]
> > > Seems legitimate to me. What was the intention here?
> >
> > Mike, I think this should just be 0?
>
> Hi Paolo,
>
> Yes, I was just about to submit a patch that does just that:
>
> https://github.com/mdroth/linux/commit/df55e9c5b97542fe037f5b5293c11a49f7c658ef

Submitted a proper patch here:

https://lore.kernel.org/kvm/[email protected]/

and also one for a separate warning:

https://lore.kernel.org/kvm/[email protected]/

I saw my build environment had WARN=0 for the last round of changes, so I
re-tested various kernel configs with/without clang and haven't seen any
other issues. So I think that should be the last of it. I'll be sure to be
a lot more careful about this in the future.

Thanks,

Mike

>
> Sorry for the breakage,
>
> Mike
>
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index c7a0971149f2..affb4fb47f91 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3911,7 +3911,6 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> > int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > struct vcpu_svm *svm = to_svm(vcpu);
> > unsigned long data_npages;
> > - sev_ret_code fw_err;
> > gpa_t data_gpa;
> > if (!sev_snp_guest(vcpu->kvm))
> > @@ -3938,7 +3937,7 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> > return 0; /* forward request to userspace */
> > abort_request:
> > - ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> > + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, 0));
> > return 1; /* resume guest */
> > }
> > Paolo
> >
> >
>

2024-05-16 03:12:54

by Michael Roth

[permalink] [raw]
Subject: Re: [PULL 13/19] KVM: SEV: Implement gmem hook for invalidating private pages

On Wed, May 15, 2024 at 03:32:31PM -0700, Sean Christopherson wrote:
> On Fri, May 10, 2024, Michael Roth wrote:
> > Implement a platform hook to do the work of restoring the direct map
> > entries of gmem-managed pages and transitioning the corresponding RMP
> > table entries back to the default shared/hypervisor-owned state.
>
> ...
>
> > +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> > +{
> > + kvm_pfn_t pfn;
> > +
> > + pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> > +
> > + for (pfn = start; pfn < end;) {
> > + bool use_2m_update = false;
> > + int rc, rmp_level;
> > + bool assigned;
> > +
> > + rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> > + if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> > + pfn, rc))
> > + goto next_pfn;
>
> This is comically trivial to hit, as it fires when running guest_memfd_test on a
> !SNP host. Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
> but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().

Yah, the code assumes that SNP is the only SVM user that would use gmem
pages. Unfortunately KVM_X86_SW_PROTECTED_VM is the one other situation
where this can be the case. The minimal fix would be to squash the below
into this patch:

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 176ba117413a..56b0b59b8263 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4675,6 +4675,9 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
{
kvm_pfn_t pfn;

+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
+ return;
+
pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n",
__func__, start, end);

for (pfn = start; pfn < end;) {

It's not perfect because the callback will still run for
KVM_X86_SW_PROTECTED_VM if SNP is enabled, but in the context of
KVM_X86_SW_PROTECTED_VM being a stand-in for testing SNP/TDX, that
might not be such a bad thing.

Longer term, if we need something more robust, one option would be to
modify the free_folio callback path to pass along folio->mapping, or to
switch to something else that provides similar functionality. Another
approach might be to set .free_folio dynamically based on the vm_type of
the gmem user when creating the gmem instance.
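
As a rough sketch of that last idea (illustrative only: the helper below
is hypothetical, and vm_type is x86-specific, so the real change would
need some kind of arch hook), gmem creation could install one of two
address_space_operations variants so that only VM types with
platform-managed private memory get a .free_folio callback:

static const struct address_space_operations kvm_gmem_aops = {
        .dirty_folio = noop_dirty_folio,
};

/* Only this variant routes freed folios into kvm_arch_gmem_invalidate(). */
static const struct address_space_operations kvm_gmem_aops_private = {
        .dirty_folio = noop_dirty_folio,
        .free_folio  = kvm_gmem_free_folio,
};

/* Hypothetical helper, called when the gmem inode is created. */
static void kvm_gmem_select_aops(struct kvm *kvm, struct inode *inode)
{
        /* KVM_X86_SW_PROTECTED_VM and friends skip the invalidate hook. */
        if (kvm->arch.vm_type == KVM_X86_SNP_VM)
                inode->i_mapping->a_ops = &kvm_gmem_aops_private;
        else
                inode->i_mapping->a_ops = &kvm_gmem_aops;
}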

>
> That too is not a big fix, but that's beside the point. IMO, the fact that I'm
> the first person to (completely inadvertantly) hit this rather basic bug is a
> good hint that we should wait until 6.11 to merge SNP support.

We do regular testing of normal guests with/without SNP enabled, but
unfortunately we've only been doing KST runs on SNP-enabled hosts.
I've retested with the above fix and everything looks good with
SVM/SEV/SEV-ES/SNP/selftests with and without SNP enabled, but I
understand if we still have reservations after this.

-Mike

2024-05-16 12:46:10

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 13/19] KVM: SEV: Implement gmem hook for invalidating private pages

On Thu, May 16, 2024 at 12:32 AM Sean Christopherson <seanjc@google.com> wrote:
> > +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> > +{
> > + kvm_pfn_t pfn;
> > +
> > + pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> > +
> > + for (pfn = start; pfn < end;) {
> > + bool use_2m_update = false;
> > + int rc, rmp_level;
> > + bool assigned;
> > +
> > + rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> > + if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> > + pfn, rc))
> > + goto next_pfn;
>
> This is comically trivial to hit, as it fires when running guest_memfd_test on a
> !SNP host. Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
> but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().
>
> That too is not a big fix, but that's beside the point. IMO, the fact that I'm
> the first person to (completely inadvertantly) hit this rather basic bug is a
> good hint that we should wait until 6.11 to merge SNP support.

Of course there is an explanation - I usually run all the tests before
pushing anything to kvm/next; here I did not do it because 1) I was
busy with the merge window and 2) I wanted to give exposure to the
code in linux-next, which was indeed the right call, but that's beside
the point. Between the clang issue and this one, it's clear that even
though the implementation is 99.99% okay (especially considering the
size), there are a few kinks to fix.

I'll fix everything up and re-push to kvm/next, but I agree that we
shouldn't rush it any further. What really matters is that development
on userspace can proceed.

This also confirms that it's important to replace kvm/next with
kvm/queue in linux-next, since linux-next doesn't care that much about
branches that rebase.

Paolo


2024-05-17 20:41:56

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PULL 17/19] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On Fri, 2024-05-10 at 16:10 -0500, Michael Roth wrote:
> +
> +static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
> +                                 sev_ret_code *fw_err)
> +{
> +       struct sev_data_snp_guest_request data = {0};
> +       struct kvm_sev_info *sev;
> +       int ret;
> +
> +       if (!sev_snp_guest(kvm))
> +               return -EINVAL;
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       ret = snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa);
> +       if (ret)
> +               return ret;
> +
> +       ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> +       if (ret)
> +               return ret;
> +
> +       ret = snp_cleanup_guest_buf(&data);
> +       if (ret)
> +               return ret;
> +
> +       return 0;
> +}

I get a build error in kvm-coco-queue with W=1:

arch/x86/kvm/svm/sev.c: In function ‘__snp_handle_guest_req’:
arch/x86/kvm/svm/sev.c:3968:30: error: variable ‘sev’ set but not used [-Werror=unused-but-set-variable]
3968 | struct kvm_sev_info *sev;
| ^~~
cc1: all warnings being treated as errors

To fix it:

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 57c2c8025547..6beaa6d42de9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3965,14 +3965,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
sev_ret_code *fw_err)
{
struct sev_data_snp_guest_request data = {0};
- struct kvm_sev_info *sev;
int ret;

if (!sev_snp_guest(kvm))
return -EINVAL;

- sev = &to_kvm_svm(kvm)->sev_info;
-
ret = snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa);
if (ret)
return ret;

2024-05-17 22:01:23

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 17/19] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On 5/17/24 22:41, Edgecombe, Rick P wrote:
> I get a build error in kvm-coco-queue with W=1:
>
> arch/x86/kvm/svm/sev.c: In function ‘__snp_handle_guest_req’:
> arch/x86/kvm/svm/sev.c:3968:30: error: variable ‘sev’ set but not used [-Werror=unused-but-set-variable]
> 3968 | struct kvm_sev_info *sev;
> | ^~~
> cc1: all warnings being treated as errors
>
> To fix it:
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 57c2c8025547..6beaa6d42de9 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3965,14 +3965,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> sev_ret_code *fw_err)
> {
> struct sev_data_snp_guest_request data = {0};
> - struct kvm_sev_info *sev;
> int ret;
>
> if (!sev_snp_guest(kvm))
> return -EINVAL;
>
> - sev = &to_kvm_svm(kvm)->sev_info;
> -
> ret = snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa);
> if (ret)
> return ret;

I'll post a fully updated version tomorrow with all the pending fixes.
Or today depending on the timezone.

Paolo


2024-05-18 15:19:41

by Michael Roth

[permalink] [raw]
Subject: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

Before forwarding guest requests to firmware, KVM takes a reference on
the 2 pages the guest uses for its request/response buffers. Make sure
to release these when cleaning up after the request is completed.

Signed-off-by: Michael Roth <[email protected]>
---

Hi Paolo,

Sorry for another late fix, but I finally spotted this while looking over
the code again today. I've re-tested attestation guest requests with this
applied (after applying the other pending fix) and everything looks good.

-Mike

arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 41e383e30797..e57faf7d04d1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3933,11 +3933,16 @@ static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_reques
return -EINVAL;

resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
- if (is_error_noslot_pfn(resp_pfn))
+ if (is_error_noslot_pfn(resp_pfn)) {
+ kvm_release_pfn_clean(req_pfn);
return -EINVAL;
+ }

- if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true)) {
+ kvm_release_pfn_clean(req_pfn);
+ kvm_release_pfn_clean(resp_pfn);
return -EINVAL;
+ }

data->gctx_paddr = __psp_pa(sev->snp_context);
data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
@@ -3948,11 +3953,16 @@ static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_reques

static int snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
{
- u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+ u64 req_pfn = __sme_clr(data->req_paddr) >> PAGE_SHIFT;
+ u64 resp_pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+ kvm_release_pfn_clean(req_pfn);

- if (snp_page_reclaim(pfn) || rmp_make_shared(pfn, PG_LEVEL_4K))
+ if (snp_page_reclaim(resp_pfn) || rmp_make_shared(resp_pfn, PG_LEVEL_4K))
return -EINVAL;

+ kvm_release_pfn_dirty(resp_pfn);
+
return 0;
}

@@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
return ret;

ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
- if (ret)
- return ret;

- ret = snp_cleanup_guest_buf(&data);
- if (ret)
- return ret;
+ if (snp_cleanup_guest_buf(&data))
+ return -EINVAL;

- return 0;
+ return ret;
}

static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
--
2.25.1


2024-05-20 22:56:38

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Mon, May 20, 2024 at 07:17:13AM -0700, Sean Christopherson wrote:
> This needs a
>
> From: Michael Roth <[email protected]>
>
> otherwise Author will be assigned to your @utexas.edu email.

Thanks, I hadn't considered that. My work email issue seems to be
resolved now, but will keep that in mind if I ever need to use a
fallback again.

>
> On Sat, May 18, 2024, Michael Roth wrote:
> > Before forwarding guest requests to firmware, KVM takes a reference on
> > the 2 pages the guest uses for its request/response buffers. Make sure
> > to release these when cleaning up after the request is completed.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
>
> ...
>
> > @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> > return ret;
> >
> > ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> > - if (ret)
> > - return ret;
> >
> > - ret = snp_cleanup_guest_buf(&data);
> > - if (ret)
> > - return ret;
> > + if (snp_cleanup_guest_buf(&data))
> > + return -EINVAL;
>
> EINVAL feels wrong. The input was completely valid. Also, forwarding the error

Yah, EIO seems more suitable here.

> to the guest doesn't seem like the right thing to do if KVM can't reclaim the
> response PFN. Shouldn't that be fatal to the VM?

The thinking here is that pretty much all guest request failures will be
fatal to the guest being able to continue. At least, that's definitely
true for attestation. So reporting the error to the guest would allow that
failure to be propagated along by handling in the guest where it would
presumably be reported a little more clearly to the guest owner, at
which point the guest would most likely terminate itself anyway.

But there is a possibility that the guest will attempt to access the
response PFN before/during that reporting and spin on an #NPF instead,
though. So maybe the safer, more repeatable approach is to handle the
error directly from KVM and propagate it to userspace.

But the GHCB spec does require that the firmware response code for
SNP_GUEST_REQUEST be passed directly to the guest via the lower 32 bits
of SW_EXITINFO2, so we'd still want the handling to pass that error on
to the guest; I made some changes to retain that behavior.
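
Just to spell out the packing being discussed (a standalone illustration;
the mask/shift values mirror the SNP_GUEST_ERR() definition in
include/uapi/linux/sev-guest.h quoted earlier in the thread, and the
error values below are made up):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* VMM error in the upper 32 bits, firmware error in the lower 32 bits. */
#define SNP_GUEST_VMM_ERR_SHIFT 32
#define SNP_GUEST_FW_ERR_MASK   ((1ULL << 32) - 1)
#define SNP_GUEST_VMM_ERR(x)    (((uint64_t)(x)) << SNP_GUEST_VMM_ERR_SHIFT)
#define SNP_GUEST_FW_ERR(x)     ((uint64_t)(x) & SNP_GUEST_FW_ERR_MASK)
#define SNP_GUEST_ERR(vmm, fw)  (SNP_GUEST_VMM_ERR(vmm) | SNP_GUEST_FW_ERR(fw))

int main(void)
{
        /* e.g. a generic VMM error (1) plus an example firmware status. */
        uint64_t exitinfo2 = SNP_GUEST_ERR(1, 0x16);

        printf("SW_EXITINFO2 = 0x%016" PRIx64 "\n", exitinfo2);
        return 0;
}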

>
> > - return 0;
> > + return ret;
>
> I find the setup/cleanup split makes this code harder to read, not easier. It
> won't be pretty no matter what due to the potential RMP failures, but IMO this
> is easier to follow:

It *might* make more sense to split things out into helpers when extended
guest requests are implemented, but for the patch in question I agree
that what you have below is clearer. I also went a step further and moved
__snp_handle_guest_req() back into snp_handle_guest_req() as well to
simplify the logic for always passing firmware errors back to the guest.

I'll post a v2 of the fixup with these changes added. But I've also
pushed it here for reference:

https://github.com/mdroth/linux/commit/8ceab17950dc5f1b94231037748104f7c31752f8
(from https://github.com/mdroth/linux/commits/kvm-next-snp-fixes2/)

and here's the original PATCH 17/19 with all pending fixes squashed in:

https://github.com/mdroth/linux/commit/b4f51e38da22a2b163c546cb2a3aefd04446b3c7
(from https://github.com/mdroth/linux/commits/kvm-next-snp-fixes2-squashed/)
(also retested attestation with simulated failures and double-checked
for clang warnings with W=1)

Thanks!

-Mike

>
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> struct sev_data_snp_guest_request data = {0};
> kvm_pfn_t req_pfn, resp_pfn;
> int ret;
>
> if (!sev_snp_guest(kvm))
> return -EINVAL;
>
> if (!PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
> return -EINVAL;
>
> req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> if (is_error_noslot_pfn(req_pfn))
> return -EINVAL;
>
> ret = -EINVAL;
>
> resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> if (is_error_noslot_pfn(resp_pfn))
> goto release_req;
>
> if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true)) {
> kvm_release_pfn_clean(resp_pfn);
> goto release_req;
> }
>
> data.gctx_paddr = __psp_pa(sev->snp_context);
> data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> data.res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
>
> if (snp_page_reclaim(resp_pfn) ||
> rmp_make_shared(resp_pfn, PG_LEVEL_4K))
> ret = ret ?: -EIO;
> else
> kvm_release_pfn_dirty(resp_pfn);
> release_req:
> kvm_release_pfn_clean(req_pfn);
> return ret;
>
>

2024-05-20 23:03:01

by Michael Roth

[permalink] [raw]
Subject: [PATCH v2] KVM: SEV: Fix guest memory leak when handling guest requests

Before forwarding guest requests to firmware, KVM takes a reference on
the 2 pages the guest uses for its request/response buffers. Make sure
to release these when cleaning up after the request is completed.

Also modify the logic to fail immediately (rather than report failure to
the guest) if there is an error returning the guest pages to their
expected state after the firmware command completes. Continue to
propagate firmware errors to the guest as per the GHCB spec, however.

Suggested-by: Sean Christopherson <[email protected]> #for error-handling
Signed-off-by: Michael Roth <[email protected]>
---
v2:
- Fail to userspace if reclaim fails rather than trying to inform the
guest of the error (Sean)
- Remove the setup/cleanup helpers so that the cleanup logic is easier
to follow (Sean)
- Full original patch with this and other pending fix squashed in:
https://github.com/mdroth/linux/commit/b4f51e38da22a2b163c546cb2a3aefd04446b3c7

arch/x86/kvm/svm/sev.c | 105 ++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 252bf7564f4b..446f9811cdaf 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3919,11 +3919,16 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}

-static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
- gpa_t req_gpa, gpa_t resp_gpa)
+static int snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
{
- struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm *kvm = svm->vcpu.kvm;
kvm_pfn_t req_pfn, resp_pfn;
+ sev_ret_code fw_err = 0;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -EINVAL;

if (!PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
return -EINVAL;
@@ -3933,64 +3938,49 @@ static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_reques
return -EINVAL;

resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
- if (is_error_noslot_pfn(resp_pfn))
- return -EINVAL;
-
- if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
- return -EINVAL;
-
- data->gctx_paddr = __psp_pa(sev->snp_context);
- data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
- data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
-
- return 0;
-}
-
-static int snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
-{
- u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
-
- if (snp_page_reclaim(pfn) || rmp_make_shared(pfn, PG_LEVEL_4K))
- return -EINVAL;
-
- return 0;
-}
-
-static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
- sev_ret_code *fw_err)
-{
- struct sev_data_snp_guest_request data = {0};
- int ret;
-
- if (!sev_snp_guest(kvm))
- return -EINVAL;
-
- ret = snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa);
- if (ret)
- return ret;
+ if (is_error_noslot_pfn(resp_pfn)) {
+ ret = -EINVAL;
+ goto release_req;
+ }

- ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
- if (ret)
- return ret;
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true)) {
+ ret = -EINVAL;
+ kvm_release_pfn_clean(resp_pfn);
+ goto release_req;
+ }

- ret = snp_cleanup_guest_buf(&data);
- if (ret)
- return ret;
+ data.gctx_paddr = __psp_pa(to_kvm_sev_info(kvm)->snp_context);
+ data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data.res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);

- return 0;
-}
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &fw_err);

-static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
-{
- struct kvm_vcpu *vcpu = &svm->vcpu;
- struct kvm *kvm = vcpu->kvm;
- sev_ret_code fw_err = 0;
- int vmm_ret = 0;
-
- if (__snp_handle_guest_req(kvm, req_gpa, resp_gpa, &fw_err))
- vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+ /*
+ * If the pages can't be placed back in the expected state then it is
+ * more reliable to always report the error to userspace than to try to
+ * let the guest deal with it somehow. Either way, the guest would
+ * likely terminate itself soon after a guest request failure anyway.
+ */
+ if (snp_page_reclaim(resp_pfn) ||
+ host_rmp_make_shared(resp_pfn, PG_LEVEL_4K)) {
+ ret = -EIO;
+ goto release_req;
+ }

- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+ /*
+ * Unlike with reclaim failures, firmware failures should be
+ * communicated back to the guest via SW_EXITINFO2 rather than be
+ * treated as immediately fatal.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
+ SNP_GUEST_ERR(ret ? SNP_GUEST_VMM_ERR_GENERIC : 0,
+ fw_err));
+ ret = 1; /* resume guest */
+ kvm_release_pfn_dirty(resp_pfn);
+
+release_req:
+ kvm_release_pfn_clean(req_pfn);
+ return ret;
}

static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
@@ -4268,8 +4258,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
case SVM_VMGEXIT_GUEST_REQUEST:
- snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
- ret = 1;
+ ret = snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
--
2.25.1


2024-05-20 23:32:20

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Mon, May 20, 2024, Michael Roth wrote:
> On Mon, May 20, 2024 at 07:17:13AM -0700, Sean Christopherson wrote:
> > On Sat, May 18, 2024, Michael Roth wrote:
> > > Before forwarding guest requests to firmware, KVM takes a reference on
> > > the 2 pages the guest uses for its request/response buffers. Make sure
> > > to release these when cleaning up after the request is completed.
> > >
> > > Signed-off-by: Michael Roth <[email protected]>
> > > ---
> >
> > ...
> >
> > > @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> > > return ret;
> > >
> > > ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> > > - if (ret)
> > > - return ret;
> > >
> > > - ret = snp_cleanup_guest_buf(&data);
> > > - if (ret)
> > > - return ret;
> > > + if (snp_cleanup_guest_buf(&data))
> > > + return -EINVAL;
> >
> > EINVAL feels wrong. The input was completely valid. Also, forwarding the error
>
> Yah, EIO seems more suitable here.
>
> > to the guest doesn't seem like the right thing to do if KVM can't reclaim the
> > response PFN. Shouldn't that be fatal to the VM?
>
> The thinking here is that pretty much all guest request failures will be
> fatal to the guest being able to continue. At least, that's definitely
> true for attestation. So reporting the error to the guest would allow that
> failure to be propagated along by handling in the guest where it would
> presumably be reported a little more clearly to the guest owner, at
> which point the guest would most likely terminate itself anyway.

But failure to convert a pfn back to shared is a _host_ issue, not a guest issue.
E.g. it most likely indicates a bug in the host software stack, or perhaps a bad
CPU or firmware bug.



> But there is a possibility that the guest will attempt access the response
> PFN before/during that reporting and spin on an #NPF instead though. So
> maybe the safer more repeatable approach is to handle the error directly
> from KVM and propagate it to userspace.

I was thinking more along the lines of KVM marking the VM as dead/bugged.

> But the GHCB spec does require that the firmware response code for
> SNP_GUEST_REQUEST be passed directly to the guest via lower 32-bits of
> SW_EXITINFO2, so we'd still want handling to pass that error on to the
> guest, so I made some changes to retain that behavior.

If and only if the hypervisor completes the event.

The hypervisor must save the SNP_GUEST_REQUEST return code in the lower 32-bits
of the SW_EXITINFO2 field before completing the Guest Request NAE event.

If KVM terminates the VM, there's obviously no need to fill SW_EXITINFO2.

Side topic, is there a plan to ratelimit Guest Requests?

To avoid the possibility of a guest creating a denial of service attack against
the SNP firmware, it is recommended that some form of rate limiting be implemented
should it be detected that a high number of Guest Request NAE events are being
issued.

2024-05-21 02:01:47

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> On Mon, May 20, 2024, Michael Roth wrote:
> > On Mon, May 20, 2024 at 07:17:13AM -0700, Sean Christopherson wrote:
> > > On Sat, May 18, 2024, Michael Roth wrote:
> > > > Before forwarding guest requests to firmware, KVM takes a reference on
> > > > the 2 pages the guest uses for its request/response buffers. Make sure
> > > > to release these when cleaning up after the request is completed.
> > > >
> > > > Signed-off-by: Michael Roth <[email protected]>
> > > > ---
> > >
> > > ...
> > >
> > > > @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> > > > return ret;
> > > >
> > > > ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> > > > - if (ret)
> > > > - return ret;
> > > >
> > > > - ret = snp_cleanup_guest_buf(&data);
> > > > - if (ret)
> > > > - return ret;
> > > > + if (snp_cleanup_guest_buf(&data))
> > > > + return -EINVAL;
> > >
> > > EINVAL feels wrong. The input was completely valid. Also, forwarding the error
> >
> > Yah, EIO seems more suitable here.
> >
> > > to the guest doesn't seem like the right thing to do if KVM can't reclaim the
> > > response PFN. Shouldn't that be fatal to the VM?
> >
> > The thinking here is that pretty much all guest request failures will be
> > fatal to the guest being able to continue. At least, that's definitely
> > true for attestation. So reporting the error to the guest would allow that
> > failure to be propagated along by handling in the guest where it would
> > presumably be reported a little more clearly to the guest owner, at
> > which point the guest would most likely terminate itself anyway.
>
> But failure to convert a pfn back to shared is a _host_ issue, not a guest issue.
> E.g. it most likely indicates a bug in the host software stack, or perhaps a bad
> CPU or firmware bug.

No disagreement there, I think it's more correct not to propagate any
errors resulting from a reclaim failure. I was just explaining why the
original code had a propensity for propagating errors to the guest, and
why it still needs to be done for firmware errors.

>
> > But there is a possibility that the guest will attempt access the response
> > PFN before/during that reporting and spin on an #NPF instead though. So
> > maybe the safer more repeatable approach is to handle the error directly
> > from KVM and propagate it to userspace.
>
> I was thinking more along the lines of KVM marking the VM as dead/bugged.

In practice userspace will get an unhandled exit and kill the vcpu/guest,
but we could additionally flag the guest as dead. Is there an existing
mechanism for this?

>
> > But the GHCB spec does require that the firmware response code for
> > SNP_GUEST_REQUEST be passed directly to the guest via lower 32-bits of
> > SW_EXITINFO2, so we'd still want handling to pass that error on to the
> > guest, so I made some changes to retain that behavior.
>
> If and only the hypervisor completes the event.
>
> The hypervisor must save the SNP_GUEST_REQUEST return code in the lower 32-bits
> of the SW_EXITINFO2 field before completing the Guest Request NAE event.
>
> If KVM terminates the VM, there's obviously no need to fill SW_EXITINFO2.

Yah, the v2 patch will only propagate the firmware error if reclaim was
successful.

>
> Side topic, is there a plan to ratelimit Guest Requests?
>
> To avoid the possibility of a guest creating a denial of service attack against
> the SNP firmware, it is recommended that some form of rate limiting be implemented
> should it be detected that a high number of Guest Request NAE events are being
> issued.

The guest side is upstream, and Dionna submitted HV patches last year. I think
these are the latest ones:

https://www.spinics.net/lists/kvm/msg301438.html

I think it probably makes sense to try to get the throttling support in
for 6.11.
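
As a rough sketch of what such throttling could look like on the KVM side
(hypothetical structure and helper names, not Dionna's actual patches),
the kernel's generic ratelimit helpers could gate the NAE event before it
ever reaches firmware:

#include <linux/ratelimit.h>

/* Hypothetical per-VM state; not an actual field in kvm_sev_info today. */
struct snp_guest_req_throttle {
        struct ratelimit_state rs;
};

static void snp_guest_req_throttle_init(struct snp_guest_req_throttle *t)
{
        /* Allow bursts of up to 10 guest requests per second. */
        ratelimit_state_init(&t->rs, HZ, 10);
}

static bool snp_guest_req_allowed(struct snp_guest_req_throttle *t)
{
        /*
         * __ratelimit() returns nonzero while the caller is under budget.
         * When it says no, the VMM would complete the NAE event with
         * SNP_GUEST_VMM_ERR_BUSY so the guest knows to retry later.
         */
        return __ratelimit(&t->rs);
}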

-Mike

2024-05-21 14:10:42

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Mon, May 20, 2024, Michael Roth wrote:
> On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > On Mon, May 20, 2024, Michael Roth wrote:
> > > But there is a possibility that the guest will attempt access the response
> > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > maybe the safer more repeatable approach is to handle the error directly
> > > from KVM and propagate it to userspace.
> >
> > I was thinking more along the lines of KVM marking the VM as dead/bugged.
>
> In practice userspace will get an unhandled exit and kill the vcpu/guest,
> but we could additionally flag the guest as dead.

Honest question, does it make sense for KVM to make the VM unusable? E.g. is
it feasible for userspace to keep running the VM? Does the page that's in a bad
state present any danger to the host?

> Is there a existing mechanism for this?

kvm_vm_dead()

2024-05-21 15:34:58

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> On Mon, May 20, 2024, Michael Roth wrote:
> > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > But there is a possibility that the guest will attempt access the response
> > > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > > maybe the safer more repeatable approach is to handle the error directly
> > > > from KVM and propagate it to userspace.
> > >
> > > I was thinking more along the lines of KVM marking the VM as dead/bugged.
> >
> > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > but we could additionally flag the guest as dead.
>
> Honest question, does it make sense from KVM to make the VM unusable? E.g. is
> it feasible for userspace to keep running the VM? Does the page that's in a bad
> state present any danger to the host?

If the reclaim fails (which it shouldn't), then KVM has a unique situation
where a non-gmem guest page is left in a bad state. In theory, if the
guest/userspace could somehow induce a reclaim failure, then they can
potentially trick the
host into trying to access that same page as a shared page and induce a
host RMP #PF.

So it does seem like a good idea to force the guest to stop executing. Then
once the guest is fully destroyed the bad page will stay leaked so it
won't affect subsequent activities.

>
> > Is there a existing mechanism for this?
>
> kvm_vm_dead()

Nice, that would do the trick. I'll modify the logic to also call that
after a reclaim failure.
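
Roughly, something like the following (sketch only, against the cleanup
path in the v2 patch above):

        /*
         * If the response page can't be reclaimed/converted back to
         * shared, it is in an unusable RMP state, so mark the VM dead
         * rather than letting the guest or host touch it again.
         * kvm_vm_dead() kicks all vCPUs and blocks further KVM_RUN.
         */
        if (snp_page_reclaim(resp_pfn) ||
            host_rmp_make_shared(resp_pfn, PG_LEVEL_4K)) {
                kvm_vm_dead(kvm);
                ret = -EIO;
                goto release_req;
        }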

Thanks,

Mike

2024-05-21 16:55:57

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 13/19] KVM: SEV: Implement gmem hook for invalidating private pages

On Thu, May 16, 2024 at 5:12 AM Michael Roth <[email protected]> wrote:
> Longer term if we need something more robust would be to modify the
> .free_folio callback path to pass along folio->mapping, or switch to
> something else that provides similar functionality. Another approach
> might be to set .free_folio dynamically based on the vm_type of the
> gmem user when creating the gmem instance.

You need to not warn. Testing CC_ATTR_HOST_SEV_SNP is just an optimization.

Paolo

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index dc00b89404a2..1c57b4535f15 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4676,8 +4676,7 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
bool assigned;

rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
- if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
- pfn, rc))
+ if (rc)
goto next_pfn;

if (!assigned)

Paolo


2024-05-21 16:58:52

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Tue, May 21, 2024, Michael Roth wrote:
> On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> > On Mon, May 20, 2024, Michael Roth wrote:
> > > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > > But there is a possibility that the guest will attempt access the response
> > > > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > > > maybe the safer more repeatable approach is to handle the error directly
> > > > > from KVM and propagate it to userspace.
> > > >
> > > > I was thinking more along the lines of KVM marking the VM as dead/bugged.
> > >
> > > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > > but we could additionally flag the guest as dead.
> >
> > Honest question, does it make sense from KVM to make the VM unusable? E.g. is
> > it feasible for userspace to keep running the VM? Does the page that's in a bad
> > state present any danger to the host?
>
> If the reclaim fails (which it shouldn't), then KVM has a unique situation
> where a non-gmem guest page is in a state. In theory, if the guest/userspace
> could somehow induce a reclaim failure, then can they potentially trick the
> host into trying to access that same page as a shared page and induce a
> host RMP #PF.
>
> So it does seem like a good idea to force the guest to stop executing. Then
> once the guest is fully destroyed the bad page will stay leaked so it
> won't affect subsequent activities.
>
> >
> > > Is there a existing mechanism for this?
> >
> > kvm_vm_dead()
>
> Nice, that would do the trick. I'll modify the logic to also call that
> after a reclaim failure.

Hmm, assuming there's no scenario where snp_page_reclaim() is expected to fail, and
such a failure is always unrecoverable, e.g. has similar potential for inducing
host RMP #PFs, then KVM_BUG_ON() is more appropriate.

Ah, and there are already WARNs in the lower level helpers. Those WARNs should
be KVM_BUG_ON(), because AFAICT there's no scenario where letting the VM live on
is safe/sensible. And unless I'm missing something, snp_page_reclaim() should
do the private=>shared conversion, because the only reason to reclaim a page is
to move it back to shared state.

Lastly, I vote to rename host_rmp_make_shared() to kvm_rmp_make_shared() to make
it more obvious that it's a KVM helper, whereas rmp_make_shared() is a generic
kernel helper, i.e. _can't_ bug the VM because it doesn't (and shouldn't) have a
pointer to the VM.

E.g. end up with something like this:

/*
* Transition a page to hypervisor-owned/shared state in the RMP table. This
* should not fail under normal conditions, but leak the page should that
* happen since it will no longer be usable by the host due to RMP protections.
*/
static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level)
{
if (KVM_BUG_ON(rmp_make_shared(pfn, level), kvm)) {
snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
return -EIO;
}

return 0;
}

/*
* Certain page-states, such as Pre-Guest and Firmware pages (as documented
* in Chapter 5 of the SEV-SNP Firmware ABI under "Page States") cannot be
* directly transitioned back to normal/hypervisor-owned state via RMPUPDATE
* unless they are reclaimed first.
*
* Until they are reclaimed and subsequently transitioned via RMPUPDATE, they
* might not be usable by the host due to being set as immutable or still
* being associated with a guest ASID.
*
* Bug the VM and leak the page if reclaim fails, or if the RMP entry can't be
* converted back to shared, as the page is no longer usable due to RMP
* protections, and it's infeasible for the guest to continue on.
*/
static int snp_page_reclaim(struct kvm *kvm, u64 pfn)
{
struct sev_data_snp_page_reclaim data = {0};
int err;

data.paddr = __sme_set(pfn << PAGE_SHIFT);

if (KVM_BUG_ON(sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err), kvm)) {
snp_leak_pages(pfn, 1);
return -EIO;
}

if (kvm_rmp_make_shared(kvm, pfn, PG_LEVEL_4K))
return -EIO;

return 0;
}

2024-05-21 21:01:15

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Tue, May 21, 2024 at 09:58:36AM -0700, Sean Christopherson wrote:
> On Tue, May 21, 2024, Michael Roth wrote:
> > On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > > > But there is a possibility that the guest will attempt access the response
> > > > > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > > > > maybe the safer more repeatable approach is to handle the error directly
> > > > > > from KVM and propagate it to userspace.
> > > > >
> > > > > I was thinking more along the lines of KVM marking the VM as dead/bugged.
> > > >
> > > > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > > > but we could additionally flag the guest as dead.
> > >
> > > Honest question, does it make sense from KVM to make the VM unusable? E.g. is
> > > it feasible for userspace to keep running the VM? Does the page that's in a bad
> > > state present any danger to the host?
> >
> > If the reclaim fails (which it shouldn't), then KVM has a unique situation
> > where a non-gmem guest page is in a state. In theory, if the guest/userspace
> > could somehow induce a reclaim failure, then can they potentially trick the
> > host into trying to access that same page as a shared page and induce a
> > host RMP #PF.
> >
> > So it does seem like a good idea to force the guest to stop executing. Then
> > once the guest is fully destroyed the bad page will stay leaked so it
> > won't affect subsequent activities.
> >
> > >
> > > > Is there a existing mechanism for this?
> > >
> > > kvm_vm_dead()
> >
> > Nice, that would do the trick. I'll modify the logic to also call that
> > after a reclaim failure.
>
> Hmm, assuming there's no scenario where snp_page_reclaim() is expected fail, and
> such a failure is always unrecoverable, e.g. has similar potential for inducing
> host RMP #PFs, then KVM_BUG_ON() is more appropriate.
>
> Ah, and there are already WARNs in the lower level helpers. Those WARNs should
> be KVM_BUG_ON(), because AFAICT there's no scenario where letting the VM live on
> is safe/sensible. And unless I'm missing something, snp_page_reclaim() should
> do the private=>shared conversion, because the only reason to reclaim a page is
> to move it back to shared state.

Yes, and the code always follows up snp_page_reclaim() with
rmp_make_shared(), so it makes sense to combine the 2.

>
> Lastly, I vote to rename host_rmp_make_shared() to kvm_rmp_make_shared() to make
> it more obvious that it's a KVM helper, whereas rmp_make_shared() is a generic
> kernel helper, i.e. _can't_ bug the VM because it doesn't (and shouldn't) have a
> pointer to the VM.

Makes sense.

>
> E.g. end up with something like this:
>
> /*
> * Transition a page to hypervisor-owned/shared state in the RMP table. This
> * should not fail under normal conditions, but leak the page should that
> * happen since it will no longer be usable by the host due to RMP protections.
> */
> static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level)
> {
> if (KVM_BUG_ON(rmp_make_shared(pfn, level), kvm)) {
> snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
> return -EIO;
> }
>
> return 0;
> }
>
> /*
> * Certain page-states, such as Pre-Guest and Firmware pages (as documented
> * in Chapter 5 of the SEV-SNP Firmware ABI under "Page States") cannot be
> * directly transitioned back to normal/hypervisor-owned state via RMPUPDATE
> * unless they are reclaimed first.
> *
> * Until they are reclaimed and subsequently transitioned via RMPUPDATE, they
> * might not be usable by the host due to being set as immutable or still
> * being associated with a guest ASID.
> *
> * Bug the VM and leak the page if reclaim fails, or if the RMP entry can't be
> * converted back to shared, as the page is no longer usable due to RMP
> * protections, and it's infeasible for the guest to continue on.
> */
> static int snp_page_reclaim(struct kvm *kvm, u64 pfn)
> {
> struct sev_data_snp_page_reclaim data = {0};
> int err;
>
> data.paddr = __sme_set(pfn << PAGE_SHIFT);
>
> if (KVM_BUG_ON(sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err), kvm)) {

I would probably opt to use KVM_BUG() and print the PFN and firmware
error code to help with diagnosing the failure, but I think the overall
approach seems reasonable and is a safer/cleaner way to handle this
situation.

-Mike

> snp_leak_pages(pfn, 1);
> return -EIO;
> }
>
> if (kvm_rmp_make_shared(kvm, pfn, PG_LEVEL_4K))
> return -EIO;
>
> return 0;
> }

2024-05-31 03:23:50

by Michael Roth

[permalink] [raw]
Subject: Re: [PULL 00/19] KVM: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Fri, May 10, 2024 at 04:10:05PM -0500, Michael Roth wrote:
> Hi Paolo,
>
> This pull request contains v15 of the KVM SNP support patchset[1] along
> with fixes and feedback from you and Sean regarding PSC request processing,
> fast_page_fault() handling for SNP/TDX, and avoiding uncessary
> PSMASH/zapping for KVM_EXIT_MEMORY_FAULT events. It's also been rebased
> on top of kvm/queue (commit 1451476151e0), and re-tested with/without
> 2MB gmem pages enabled.

As discussed during the PUCK call, here is a branch with fixup patches
that incorporate the additional review/testing that came in after these
patches were merged into kvm/next:

https://github.com/mdroth/linux/commits/kvm-next-snp-fixes4/

They are intended to be squashed in but can also be applied on top if
that's preferable (but in that case the first 2 patches need to be
squashed together to maintain build bisectability):

[SQUASH] KVM: SVM: Remove the need to trigger an UNBLOCK event on AP creation
- drops handling for KVM_MP_STATE_UNINITIALIZED since no special
handling for it will be needed until SVSM support is added in OVMF
and the host kernel has the necessary support for running
SVSM-enabled guests
- to be squashed into:
KVM: SEV: Support SEV-SNP AP Creation NAE event

[SQUASH] KVM: SEV: Don't WARN() if RMP lookup fails when invalidating gmem pages
- address the WARN() that Sean noticed when running the guest_memfd_test
kselftest on an AMD system without SNP enabled (see the sketch after
this list)
- to be squashed into:
KVM: SEV: Implement gmem hook for invalidating private pages

[SQUASH] KVM: SEV: Use new kvm_rmp_make_shared() naming
- fixup to handle helper function being renamed in prior patch
- to be squashed into:
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command

[SQUASH] KVM: SEV: Automatically switch reclaimed pages to shared
- implement suggestion from Sean to always switch reclaimed pages to shared
since that's what the callers all end up doing anyway
- to be squashed into:
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
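
As a rough illustration of the WARN() fix in the second squash above (a
sketch only; the actual change is in the branch linked above, and the
surrounding variable names are assumptions):

        /*
         * In the gmem invalidation path, an RMP lookup can legitimately
         * fail on hosts without SNP enabled, so skip such PFNs rather
         * than WARN()ing.
         */
        if (snp_lookup_rmpentry(pfn, &assigned, &rmp_level))
                continue;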

As discussed at PUCK, I will resubmit the guest request patches
separately with all the pending changes incorporated.

Thanks!

-Mike

2024-06-03 16:44:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PULL 00/19] KVM: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Fri, May 31, 2024 at 5:23 AM Michael Roth <[email protected]> wrote:
> As discussed during the PUCK call, here is a branch with fixup patches
> that incorporate the additional review/testing that came in after these
> patches were merged into kvm/next:
>
> https://github.com/mdroth/linux/commits/kvm-next-snp-fixes4/
>
> They are intended to be squashed in but can also be applied on top if
> that's preferable (but in that case the first 2 patches need to be
> squashed together to maintain build bisectability):

Yes, I'd rather not rebase kvm/next again, so I applied them on top.
None of the issues is egregious enough to warrant rebasing.

Paolo