2024-04-21 18:04:24

by Michael Roth

Subject: [PATCH v14 00/22] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v14

and is based on commit 20cc50a0410f (just before the v13 SNP patches) from:

https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue


Patch Layout
------------

01-04: These patches add some basic infrastructure and introduce a new
KVM_X86_SNP_VM vm_type to handle differences versus the existing
KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.

05-07: These implement the KVM API to handle the creation of a
cryptographic launch context, encrypt/measure the initial image
into guest memory, and finalize it before launching it (see the
ioctl sketch after this list).

08-13: These implement handling for various guest-generated events such
as page state changes, onlining of additional vCPUs, etc.

14-17: These implement the gmem hooks needed to prepare gmem-allocated
pages before mapping them into guest private memory ranges as
well as cleaning them up prior to returning them to the host for
use as normal memory. Because this supplants certain activities
like issuing WBINVDs during KVM MMU invalidations, there's also
a patch to avoid duplicating that work and incurring unnecessary
overhead.

18: With all the core support in place, this patch adds a kvm_amd module
parameter to enable SNP support.

19-22: These patches all deal with the servicing of guest requests to handle
things like attestation, as well as some related host-management
interfaces.
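
As a rough reference for the flow in patches 05-07 above, the launch
commands are issued through the existing KVM_MEMORY_ENCRYPT_OP ioctl. A
minimal userspace sketch (the per-command payload structs are the ones
introduced by this series, so they are passed through opaquely here;
snp_launch_cmd() is just an illustrative helper):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Issue one SEV/SNP command against the VM fd. */
  static int snp_launch_cmd(int vm_fd, int sev_fd, __u32 id, void *data)
  {
          struct kvm_sev_cmd cmd = {
                  .id     = id,
                  .data   = (__u64)(unsigned long)data,
                  .sev_fd = sev_fd,
          };

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }

The expected sequence is then KVM_SEV_SNP_LAUNCH_START, one or more
KVM_SEV_SNP_LAUNCH_UPDATE calls to encrypt/measure the initial image, and
a final KVM_SEV_SNP_LAUNCH_FINISH.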


Testing
-------

For testing this via QEMU, use the following tree:

https://github.com/amdese/qemu/commits/snp-v4-wip3b

A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
ranges that are mapped as private. It is recommended you build the AmdSevX64
variant, as it provides the kernel-hashing support used by this series:

https://github.com/amdese/ovmf/commits/apic-mmio-fix1d

A basic command-line invocation for SNP would be:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth= \
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd

With kernel-hashing and certificate data supplied:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on \
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd \
  -kernel /boot/vmlinuz-$ver \
  -initrd /boot/initrd.img-$ver \
  -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"

With standard X64 OVMF package with separate image for persistent NVRAM:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth= \
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd \
  -drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off


Known issues / TODOs
--------------------

* Base tree in some cases reports "Unpatched return thunk in use. This should
not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
regression upstream and is unrelated to this series:

https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/

* 2MB hugepage support has been dropped pending discussion on how we plan to
re-enable it in gmem.

* Host kexec should work, but there is a known issue with host kdump support
while SNP guests are running that will be addressed as a follow-up.

* SNP kselftests are currently a WIP and will be included as part of SNP
upstreaming efforts in the near term.


SEV-SNP Overview
----------------

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. This series builds upon
SEV-SNP guest support, which is now in mainline, and SEV-SNP host
initialization support, which is now in linux-next.

While this series provides the basic building blocks to support booting
SEV-SNP VMs, it does not cover all of the security enhancements introduced
by SEV-SNP, such as interrupt protection, which will be added in the future.

With SNP, when pages are marked as guest-owned in the RMP table, they are
assigned to a specific guest/ASID, as well as a specific GFN within the
guest. Any attempt to map such a page in the RMP table to a different
guest/ASID, or to a different GFN within a guest/ASID, will result in an
RMP nested page fault.
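
As a rough illustration of these ownership semantics, here is a sketch of
a host-side check built on the snp_lookup_rmpentry() helper that this
series uses throughout (pfn_is_guest_owned() is hypothetical; error
handling is simplified):

  static bool pfn_is_guest_owned(u64 pfn)
  {
          bool assigned;
          int level;

          /* Failure here means no usable RMP entry for the PFN. */
          if (snp_lookup_rmpentry(pfn, &assigned, &level))
                  return false;

          /*
           * 'assigned' means the RMP entry ties this PFN to a specific
           * guest/ASID and GFN; mapping it anywhere else triggers the
           * RMP nested page fault described above.
           */
          return assigned;
  }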

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction, which sets the Validated bit in that page's
RMP entry. This is the only way to set the Validated bit outside of the
initial pre-encrypted guest payload/image; any attempt outside the guest to
modify the RMP entry from that point forward will result in the Validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page, so that it can be made aware of possible
tampering.
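
For reference, the guest-side wrapper for PVALIDATE looks roughly like the
helper in the already-merged SNP guest support (a sketch, not a quote; the
raw opcode bytes are used since older assemblers don't know the mnemonic,
and CF set means the RMP entry was already in the requested state):

  static int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
  {
          bool no_rmpupdate;
          int rc;

          asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
                       "setc %[no_rmpupdate]\n\t"
                       : [no_rmpupdate] "=qm" (no_rmpupdate), "=a" (rc)
                       : "a" (vaddr), "c" (rmp_psize), "d" (validate)
                       : "memory", "cc");

          if (no_rmpupdate)
                  return PVALIDATE_FAIL_NOUPDATE;

          return rc;
  }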

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.
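
For single pages, the simplest variant is the GHCB MSR protocol. A sketch
of the guest-side encoding, with constants matching the existing
definitions in arch/x86/include/asm/sev-common.h (ghcb_msr_psc_req() is an
illustrative helper):

  #define GHCB_MSR_PSC_REQ        0x014
  #define SNP_PAGE_STATE_PRIVATE  1
  #define SNP_PAGE_STATE_SHARED   2

  /* GHCBData: bits 0-11 request code, 12-51 GFN, 52-55 operation. */
  static u64 ghcb_msr_psc_req(u64 gfn, u64 op)
  {
          return GHCB_MSR_PSC_REQ | (gfn << 12) | (op << 52);
  }

The guest writes the encoded value to the GHCB MSR, issues VMGEXIT, and
then reads the MSR back to check the PSC response code.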

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.
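
A sketch of the userspace side of that conversion, using the generic
memory-attributes ioctl from <linux/kvm.h> (set_mem_private() is an
illustrative helper; error handling elided):

  static int set_mem_private(int vm_fd, __u64 gpa, __u64 size, bool private)
  {
          struct kvm_memory_attributes attrs = {
                  .address    = gpa,
                  .size       = size,
                  .attributes = private ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          /* Updates the private/shared state tracked by the KVM MMU. */
          return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
  }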

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.

Feedback/review is very much appreciated!

-Mike


Changes since v13:

* rebase to new kvm-coco-queue and wire up to PFERR_PRIVATE_ACCESS (Paolo)
* handle setting kvm->arch.has_private_mem in same location as
kvm->arch.has_protected_state (Paolo)
* add flags and additional padding fields to
snp_launch{start,update,finish} APIs to address alignment and
expandability (Paolo)
* update snp_launch_update() to update input struct values to reflect
current progress of command in situations where multiple calls are
needed (Paolo)
* update snp_launch_update() to avoid copying/accessing 'src' parameter
when dealing with zero pages. (Paolo)
* update snp_launch_update() to use u64 as length input parameter instead
of u32 and adjust padding accordingly
* modify ordering of SNP_POLICY_MASK_* definitions to be consistent with
bit order of corresponding flags
* let firmware handle enforcement of policy bits corresponding to
user-specified minimum API version
* add missing "0x" prefixs in pr_debug()'s for snp_launch_start()
* fix handling of VMSAs during in-place migration (Paolo)

Changes since v12:

* rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
* add more input validation for SNP_LAUNCH_START, especially for handling
things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
* block SNP KVM instances from being able to run legacy SEV commands (Paolo)
* don't attempt to measure VMSA for vcpu 0/BSP before the others, let
userspace deal with the ordering just like with SEV-ES (Paolo)
* fix up docs for SNP_LAUNCH_FINISH (Paolo)
* introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
handling for guest-mapped vs non-guest-mapped VMSAs, rename
'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
* drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
as it is no longer needed due to above VMSA rework
* replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
event
* handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
switch to WARN_ON*()'s to indicate remaining error cases are not expected
and should not be seen in practice. (Paolo)
* add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
cleaning up large guest memory ranges.
* rename VLEK_REQUIRED to VCEK_DISABLE, as it will be more applicable if
another key type ever gets added.
* don't allow attestation to be paused while an attestation request is
being processed by firmware (Tom)
* add missing Documentation entry for SNP_VLEK_LOAD
* collect Reviewed-by's from Paolo and Tom

Changes since v11:

* Rebase series on kvm-coco-queue and re-work to leverage more
infrastructure between SNP/TDX series.
* Drop KVM_SNP_INIT in favor of the new KVM_SEV_INIT2 interface introduced
here (Paolo):
https://lore.kernel.org/lkml/[email protected]/
* Drop exposure of API fields related to things like VMPL levels, migration
agents, etc., until they are actually supported/used (Sean)
* Rework KVM_SEV_SNP_LAUNCH_UPDATE handling to use a new
kvm_gmem_populate() interface instead of copying data directly into
gmem-allocated pages (Sean)
* Add support for SNP_VLEK_LOAD, rework the SNP_SET_CONFIG_{START,END} to
have simpler semantics that are applicable to management of SNP_VLEK_LOAD
updates as well, rename interfaces to the now more appropriate
SNP_{PAUSE,RESUME}_ATTESTATION
* Fix up documentation wording and do print warnings for
userspace-triggerable failures (Peter, Sean)
* Fix a race with AP_CREATION wake-up events (Jacob, Sean)
* Fix a memory leak with VMSA pages (Sean)
* Tighten up handling of RMP page faults to better distinguish between real
and spurious cases (Tom)
* Various patch/documentation rewording, cleanups, etc.


----------------------------------------------------------------
Ashish Kalra (1):
KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

Brijesh Singh (11):
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Add initial SEV-SNP support
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
KVM: SEV: Add support to handle Page State Change VMGEXIT
KVM: SEV: Add support to handle RMP nested page faults
KVM: SVM: Add module parameter to enable SEV-SNP
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

Michael Roth (8):
KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
KVM: SEV: Add support for GHCB-based termination requests
KVM: SEV: Implement gmem hook for initializing private pages
KVM: SEV: Implement gmem hook for invalidating private pages
KVM: x86: Implement gmem hook for determining max NPT mapping level
crypto: ccp: Add the SNP_VLEK_LOAD command
crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Tom Lendacky (2):
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: SEV: Support SEV-SNP AP Creation NAE event

Documentation/virt/coco/sev-guest.rst | 69 +-
Documentation/virt/kvm/api.rst | 73 +
.../virt/kvm/x86/amd-memory-encryption.rst | 110 +-
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/sev-common.h | 22 +-
arch/x86/include/asm/sev.h | 15 +
arch/x86/include/asm/svm.h | 9 +-
arch/x86/include/uapi/asm/kvm.h | 48 +
arch/x86/kvm/Kconfig | 3 +
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/svm/sev.c | 1447 +++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 44 +-
arch/x86/kvm/svm/svm.h | 50 +
arch/x86/kvm/trace.h | 31 +
arch/x86/kvm/x86.c | 17 +
arch/x86/virt/svm/sev.c | 80 ++
drivers/crypto/ccp/sev-dev.c | 83 ++
include/linux/psp-sev.h | 4 +-
include/uapi/linux/kvm.h | 28 +
include/uapi/linux/psp-sev.h | 39 +
include/uapi/linux/sev-guest.h | 9 +
virt/kvm/guest_memfd.c | 4 +-
23 files changed, 2155 insertions(+), 35 deletions(-)




2024-04-21 18:04:54

by Michael Roth

Subject: [PATCH v14 11/22] KVM: SEV: Add support to handle RMP nested page faults

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters an RMP check failure caused by a guest memory access,
it raises a #NPF. The error code contains additional information on the
access type. See APM volume 2 for additional information.

When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.
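
For illustration: PSMASH operates on the 2MB-aligned base PFN and splits
that single 2MB RMP entry into 512 4K entries, so the handler below first
rounds the faulting PFN down. A sketch of the same alignment math used in
the patch (psmash_base_pfn() is an illustrative helper):

  /* One 2MB RMP entry covers KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) == 512
   * 4K pages; PSMASH takes the 2MB-aligned base PFN of that region. */
  static u64 psmash_base_pfn(u64 pfn)
  {
          return pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
  }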

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/sev.h | 3 +
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/svm/sev.c | 109 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 21 ++++--
arch/x86/kvm/svm/svm.h | 3 +
arch/x86/kvm/trace.h | 31 +++++++++
arch/x86/kvm/x86.c | 1 +
9 files changed, 166 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4c9d8a22840a..90f0de2b8645 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1945,6 +1945,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);

int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7f57382afee4..3a06f06b847a 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -91,6 +91,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 4

+/* PSMASH failed due to concurrent access by another CPU */
+#define PSMASH_FAIL_INUSE 3
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2343c9f00e31..e3cb35b9396d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -251,8 +251,6 @@ static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
}

-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);

int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eebb1562c5bc..b6d0aa18b72b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6752,6 +6752,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,

return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);

static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e9519af8f14c..65882033a82f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3464,6 +3464,23 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_rmptable_psmash(kvm_pfn_t pfn)
+{
+ int ret;
+
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ /*
+ * PSMASH_FAIL_INUSE indicates another processor is modifying the
+ * entry, so retry until that's no longer the case.
+ */
+ do {
+ ret = psmash(pfn);
+ } while (ret == PSMASH_FAIL_INUSE);
+
+ return ret;
+}
+
static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -4023,3 +4040,95 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)

return p;
}
+
+void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = vcpu->kvm;
+ int order, rmp_level, ret;
+ bool assigned;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ gfn = gpa >> PAGE_SHIFT;
+
+ /*
+ * The only time RMP faults occur for shared pages is when the guest is
+ * triggering an RMP fault for an implicit page-state change from
+ * shared->private. Implicit page-state changes are forwarded to
+ * userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
+ * for shared pages should not end up here.
+ */
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault for non-private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no backing page for private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret || !assigned) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+ gpa, pfn, ret);
+ goto out_no_trace;
+ }
+
+ /*
+ * There are 2 cases where a PSMASH may be needed to resolve an #NPF
+ * with PFERR_GUEST_RMP_BIT set:
+ *
+ * 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
+ * bit set if the guest issues them with a smaller granularity than
+ * what is indicated by the page-size bit in the 2MB RMP entry for
+ * the PFN that backs the GPA.
+ *
+ * 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
+ * smaller than what is indicated by the 2MB RMP entry for the PFN
+ * that backs the GPA.
+ *
+ * In both these cases, the corresponding 2M RMP entry needs to
+ * be PSMASH'd to 512 4K RMP entries. If the RMP entry is already
+ * split into 4K RMP entries, then this is likely a spurious case which
+ * can occur when there are concurrent accesses by the guest to a 2MB
+ * GPA range that is backed by a 2MB-aligned PFN whose RMP entry is in
+ * the process of being PSMASH'd into 4K entries. These cases should
+ * resolve automatically on subsequent accesses, so just ignore them
+ * here.
+ */
+ if (rmp_level == PG_LEVEL_4K)
+ goto out;
+
+ ret = snp_rmptable_psmash(pfn);
+ if (ret) {
+ /*
+ * Look it up again. If it's 4K now then the PSMASH may have
+ * raced with another process and the issue has already resolved
+ * itself.
+ */
+ if (!snp_lookup_rmpentry(pfn, &assigned, &rmp_level) &&
+ assigned && rmp_level == PG_LEVEL_4K)
+ goto out;
+
+ pr_warn_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
+ gpa, pfn, ret);
+ }
+
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+out:
+ trace_kvm_rmp_fault(vcpu, gpa, pfn, error_code, rmp_level, ret);
+out_no_trace:
+ put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 422b452fbc3b..7c9807fdafc3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2043,6 +2043,7 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;

u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
@@ -2060,10 +2061,22 @@ static int npf_interception(struct kvm_vcpu *vcpu)
error_code |= PFERR_PRIVATE_ACCESS;

trace_kvm_page_fault(vcpu, fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ /*
+ * rc == 0 indicates a userspace exit is needed to handle page
+ * transitions, so do that first before updating the RMP table.
+ */
+ if (error_code & PFERR_GUEST_RMP_MASK) {
+ if (rc == 0)
+ return rc;
+ sev_handle_rmp_fault(vcpu, fault_address, error_code);
+ }
+
+ return rc;
}

static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 730f5ced2a2e..d2b0ec27d4fe 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -722,6 +722,7 @@ void sev_hardware_unsetup(void);
int sev_cpu_init(struct svm_cpu_data *sd);
int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
extern unsigned int max_sev_asid;
+void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -735,6 +736,8 @@ static inline void sev_hardware_unsetup(void) {}
static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
#define max_sev_asid 0
+static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+
#endif

/* vmenter.S */
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index c6b4b1728006..3531a187d5d9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1834,6 +1834,37 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
__entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
);

+/*
+ * Tracepoint for #NPFs due to RMP faults.
+ */
+TRACE_EVENT(kvm_rmp_fault,
+ TP_PROTO(struct kvm_vcpu *vcpu, u64 gpa, u64 pfn, u64 error_code,
+ int rmp_level, int psmash_ret),
+ TP_ARGS(vcpu, gpa, pfn, error_code, rmp_level, psmash_ret),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, vcpu_id)
+ __field(u64, gpa)
+ __field(u64, pfn)
+ __field(u64, error_code)
+ __field(int, rmp_level)
+ __field(int, psmash_ret)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu->vcpu_id;
+ __entry->gpa = gpa;
+ __entry->pfn = pfn;
+ __entry->error_code = error_code;
+ __entry->rmp_level = rmp_level;
+ __entry->psmash_ret = psmash_ret;
+ ),
+
+ TP_printk("vcpu %u gpa %016llx pfn 0x%llx error_code 0x%llx rmp_level %d psmash_ret %d",
+ __entry->vcpu_id, __entry->gpa, __entry->pfn,
+ __entry->error_code, __entry->rmp_level, __entry->psmash_ret)
+);
+
#endif /* _TRACE_KVM_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83b8260443a3..14693effec6b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13996,6 +13996,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);

static int __init kvm_x86_init(void)
{
--
2.25.1


2024-04-21 18:05:20

by Michael Roth

Subject: [PATCH v14 12/22] KVM: SEV: Support SEV-SNP AP Creation NAE event

From: Tom Lendacky <[email protected]>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of their APs on their own, giving the
guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.

For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to the vCPU, and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. If a new VMSA is to
be installed, the VMSA guest page is set as the VMSA in the vCPU VMCB
and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new VMSA is not
to be installed, the VMSA is cleared in the vCPU VMCB and the vCPU state
is set to KVM_MP_STATE_HALTED to prevent it from being run.
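
For reference, a sketch of the guest side of an AP CREATE request,
matching the GHCB field layout the handler below consumes (modeled on the
guest kernel's AP wakeup path rather than quoting it; the ghcb_set_*()
accessors, VMGEXIT(), and sev_es_wr_ghcb_msr() are the usual guest GHCB
helpers):

  static void snp_ap_create(struct ghcb *ghcb, u32 apic_id, u64 vmsa_gpa,
                            u64 sev_features)
  {
          vc_ghcb_invalidate(ghcb);

          /* RAX carries the SEV features the new VMSA was built with. */
          ghcb_set_rax(ghcb, sev_features);
          ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
          /* exit_info_1: APIC ID in the upper 32 bits, request below. */
          ghcb_set_sw_exit_info_1(ghcb,
                          ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
          ghcb_set_sw_exit_info_2(ghcb, vmsa_gpa);

          sev_es_wr_ghcb_msr(__pa(ghcb));
          VMGEXIT();
  }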

Signed-off-by: Tom Lendacky <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/kvm/svm/sev.c | 229 +++++++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 11 +-
arch/x86/kvm/svm/svm.h | 9 ++
arch/x86/kvm/x86.c | 11 ++
6 files changed, 264 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 90f0de2b8645..54aafcb50d8b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -121,6 +121,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 544a43c1cf11..f0dea3750ca9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -286,8 +286,14 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)

+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)
+
struct vmcb_seg {
u16 selector;
u16 attrib;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 65882033a82f..67e245a0d2bb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -37,7 +37,7 @@
#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

-#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

/* enable/disable SEV support */
static bool sev_enabled = true;
@@ -3270,6 +3270,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!kvm_ghcb_sw_scratch_is_valid(svm))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
+ if (!kvm_ghcb_rax_is_valid(svm))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3520,6 +3525,205 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
return 1; /* resume guest */
}

+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+ /* Clear use of the VMSA */
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
+ struct kvm_memory_slot *slot;
+ kvm_pfn_t pfn;
+
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!slot)
+ return -EINVAL;
+
+ /*
+ * The new VMSA will be private memory guest memory, so
+ * retrieve the PFN from the gmem backend.
+ */
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ return -EINVAL;
+
+ /*
+ * From this point forward, the VMSA will always be a
+ * guest-mapped page rather than the initial one allocated
+ * by KVM in svm->sev_es.vmsa. In theory, svm->sev_es.vmsa
+ * could be free'd and cleaned up here, but that involves
+ * cleanups like wbinvd_on_all_cpus() which would ideally
+ * be handled during teardown rather than guest boot.
+ * Deferring that also allows the existing logic for SEV-ES
+ * VMSAs to be re-used with minimal SNP-specific changes.
+ */
+ svm->sev_es.snp_has_guest_vmsa = true;
+
+ /* Use the new VMSA */
+ svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+
+ /*
+ * gmem pages aren't currently migratable, but if this ever
+ * changes then care should be taken to ensure
+ * svm->sev_es.vmsa is pinned through some other means.
+ */
+ kvm_release_pfn_clean(pfn);
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_waiting_for_reset)
+ goto unlock;
+
+ svm->sev_es.snp_ap_waiting_for_reset = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * The target vCPU is valid, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in an non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_waiting_for_reset = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->vmsa_features;
+
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * A malicious guest can RMPADJUST a large page into a VMSA, which
+ * would hit the SNP erratum where the CPU incorrectly signals an
+ * RMP violation #PF if a hugepage collides with the RMP entry of
+ * the VMSA page. Reject the AP CREATE request if the VMSA address
+ * from the guest is 2M-aligned.
+ */
+ if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+ vcpu_unimpl(vcpu,
+ "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ kvm_make_request(KVM_REQ_UNBLOCK, target_vcpu);
+
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3763,6 +3967,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->vmgexit.psc.shared_gpa = svm->sev_es.sw_scratch;
vcpu->arch.complete_userspace_io = snp_complete_psc;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3857,7 +4070,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
* the VMSA will be NULL if this vCPU is the destination for intrahost
* migration, and will be copied later.
*/
- if (svm->sev_es.vmsa)
+ if (svm->sev_es.vmsa && !svm->sev_es.snp_has_guest_vmsa)
svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);

/* Can't intercept CR register access, HV can't modify CR registers */
@@ -3930,6 +4143,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}

void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
@@ -4041,6 +4256,16 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}

+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu) &&
+ vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+}
+
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
{
struct kvm_memory_slot *slot;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7c9807fdafc3..b70556608e8d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1398,6 +1398,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;

+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);

if (!init_event)
@@ -4944,6 +4947,12 @@ static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
return page_address(page);
}

+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+ sev_vcpu_unblocking(vcpu);
+ avic_vcpu_unblocking(vcpu);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,

@@ -4966,7 +4975,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
.vcpu_blocking = avic_vcpu_blocking,
- .vcpu_unblocking = avic_vcpu_unblocking,
+ .vcpu_unblocking = svm_vcpu_unblocking,

.update_exception_bitmap = svm_update_exception_bitmap,
.get_msr_feature = svm_get_msr_feature,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d2b0ec27d4fe..81e335dca281 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -210,6 +210,11 @@ struct vcpu_sev_es_state {
bool ghcb_sa_free;

u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_waiting_for_reset;
+ bool snp_has_guest_vmsa;
};

struct vcpu_svm {
@@ -723,6 +728,8 @@ int sev_cpu_init(struct svm_cpu_data *sd);
int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -737,6 +744,8 @@ static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
#define max_sev_asid 0
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}

#endif

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14693effec6b..b20f6c1b8214 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10938,6 +10938,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)

if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+ r = 1;
+ goto out;
+ }
+ }
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -13145,6 +13153,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (kvm_test_request(KVM_REQ_PMI, vcpu))
return true;

+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu)))
--
2.25.1


2024-04-21 18:05:42

by Michael Roth

Subject: [PATCH v14 13/22] KVM: SEV: Add support for GHCB-based termination requests

GHCB version 2 adds support for a GHCB-based termination request that
a guest can issue when it reaches an error state and wishes to inform
the hypervisor that it should be terminated. Implement support for this
similarly to the GHCB MSR-based termination requests already available
to SEV-ES guests via earlier versions of the GHCB protocol.

See 'Termination Request' in the 'Invoking VMGEXIT' section of the GHCB
specification for more details.
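
A sketch of the userspace side: the VMM's run loop sees a
KVM_EXIT_SYSTEM_EVENT with the new event type and can log the GHCB GPA
before tearing the guest down (handle_sev_term() is an illustrative
helper; 'run' is the vCPU's mmap'd struct kvm_run):

  #include <stdio.h>
  #include <stdlib.h>
  #include <linux/kvm.h>

  static void handle_sev_term(struct kvm_run *run)
  {
          if (run->exit_reason != KVM_EXIT_SYSTEM_EVENT ||
              run->system_event.type != KVM_SYSTEM_EVENT_SEV_TERM)
                  return;

          /* data[0] carries the GHCB GPA of the requesting vCPU. */
          fprintf(stderr, "guest requested termination, GHCB GPA %#llx\n",
                  (unsigned long long)run->system_event.data[0]);
          exit(EXIT_FAILURE);
  }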

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 67e245a0d2bb..1d18e3497b4e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3281,6 +3281,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_TERM_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3976,6 +3977,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)

ret = 1;
break;
+ case SVM_VMGEXIT_TERM_REQUEST:
+ pr_info("SEV-ES guest requested termination: reason %#llx info %#llx\n",
+ control->exit_info_1, control->exit_info_2);
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+ vcpu->run->system_event.ndata = 1;
+ vcpu->run->system_event.data[0] = control->ghcb_gpa;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1


2024-04-21 18:06:02

by Michael Roth

Subject: [PATCH v14 14/22] KVM: SEV: Implement gmem hook for initializing private pages

This will handle the RMP table updates needed to put a page into a
private state before mapping it into an SEV-SNP guest.

Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 5 +++
arch/x86/kvm/x86.c | 5 +++
virt/kvm/guest_memfd.c | 4 +-
6 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 5e72faca4e8f..10768f13b240 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -137,6 +137,7 @@ config KVM_AMD_SEV
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
+ select HAVE_KVM_GMEM_PREPARE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1d18e3497b4e..2906fee3187d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4366,3 +4366,101 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out_no_trace:
put_page(pfn_to_page(pfn));
}
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn = start;
+
+ while (pfn < end) {
+ int ret, rmp_level;
+ bool assigned;
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+ pfn, start, end, rmp_level, ret);
+ return false;
+ }
+
+ if (assigned) {
+ pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+ __func__, pfn, start, end, rmp_level);
+ return false;
+ }
+
+ pfn++;
+ }
+
+ return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+ kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+ /*
+ * If this is a large folio, and the entire 2M range containing the
+ * PFN is currently shared, then the entire 2M-aligned range can be
+ * set to private via a single 2M RMP entry.
+ */
+ if (max_level_for_order(order) > PG_LEVEL_4K &&
+ is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+ return true;
+
+ return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t pfn_aligned;
+ gfn_t gfn_aligned;
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
+ gfn, pfn, rc);
+ return -ENOENT;
+ }
+
+ if (assigned) {
+ pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+ __func__, gfn, pfn, max_order, level);
+ return 0;
+ }
+
+ if (is_large_rmp_possible(kvm, pfn, max_order)) {
+ level = PG_LEVEL_2M;
+ pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+ gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+ } else {
+ level = PG_LEVEL_4K;
+ pfn_aligned = pfn;
+ gfn_aligned = gfn;
+ }
+
+ rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -EINVAL;
+ }
+
+ pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+ __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b70556608e8d..60783e9f2ae8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5085,6 +5085,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+ .gmem_prepare = sev_gmem_prepare,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 81e335dca281..7712ed90aae8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -730,6 +730,7 @@ extern unsigned int max_sev_asid;
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -746,6 +747,10 @@ static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXI
static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
+static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ return 0;
+}

#endif

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b20f6c1b8214..0fb76ef9b7e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13610,6 +13610,11 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
+{
+ return kvm->arch.vm_type == KVM_X86_SNP_VM;
+}
+
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
{
return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index a44f983eb673..7d3932e5a689 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -46,8 +46,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
gfn = slot->base_gfn + index - slot->gmem.pgoff;
rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
if (rc) {
- pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
- index, rc);
+ pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
+ index, gfn, pfn, rc);
return rc;
}
}
--
2.25.1


2024-04-21 18:06:22

by Michael Roth

Subject: [PATCH v14 15/22] KVM: SEV: Implement gmem hook for invalidating private pages

Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 64 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 ++
4 files changed, 68 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 10768f13b240..2a7f69abcac3 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -138,6 +138,7 @@ config KVM_AMD_SEV
select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
select HAVE_KVM_GMEM_PREPARE
+ select HAVE_KVM_GMEM_INVALIDATE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2906fee3187d..ff9b8c68ae56 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4464,3 +4464,67 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)

return 0;
}
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn;
+
+ pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+ for (pfn = start; pfn < end;) {
+ bool use_2m_update = false;
+ int rc, rmp_level;
+ bool assigned;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc))
+ goto next_pfn;
+
+ if (!assigned)
+ goto next_pfn;
+
+ use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+ end >= (pfn + PTRS_PER_PMD) &&
+ rmp_level > PG_LEVEL_4K;
+
+ /*
+ * If an unaligned PFN corresponds to a 2M region assigned as a
+ * large page in the RMP table, PSMASH the region into individual
+ * 4K RMP entries before attempting to convert a 4K sub-page.
+ */
+ if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+ /*
+ * This shouldn't fail, but if it does, report it, but
+ * still try to update RMP entry to shared and pray this
+ * was a spurious error that can be addressed later.
+ */
+ rc = snp_rmptable_psmash(pfn);
+ WARN_ONCE(rc, "SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ }
+
+ rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+ if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc))
+ goto next_pfn;
+
+ /*
+ * SEV-ES avoids host/guest cache coherency issues through
+ * WBINVD hooks issued via MMU notifiers during run-time, and
+ * KVM's VM destroy path at shutdown. Those MMU notifier events
+ * don't cover gmem since there is no requirement to map pages
+ * to a HVA in order to use them for a running guest. While the
+ * shutdown path would still likely cover things for SNP guests,
+ * userspace may also free gmem pages during run-time via
+ * hole-punching operations on the guest_memfd, so flush the
+ * cache entries for these pages before free'ing them back to
+ * the host.
+ */
+ clflush_cache_range(__va(pfn_to_hpa(pfn)),
+ use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+ pfn += use_2m_update ? PTRS_PER_PMD : 1;
+ cond_resched();
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 60783e9f2ae8..29dc5fa28d97 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5087,6 +5087,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,

.gmem_prepare = sev_gmem_prepare,
+ .gmem_invalidate = sev_gmem_invalidate,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7712ed90aae8..6721e5c6cf73 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -731,6 +731,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -751,6 +752,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
{
return 0;
}
+static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}

#endif

--
2.25.1


2024-04-21 18:06:43

by Michael Roth

Subject: [PATCH v14 16/22] KVM: x86: Implement gmem hook for determining max NPT mapping level

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here; for instance:

- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
- guest later converts that shared page back to private

At this point there are no mixed attributes, and KVM would normally
allow for 2MB NPT mappings again. That is not actually allowed here,
however, because the RMP table mappings are 4K and cannot be promoted on
the hypervisor side, so the NPT mappings must still be limited to 4K to
match this.

Implement a kvm_x86_ops.gmem_validate_fault() hook for SEV that checks
for this condition and adjusts the mapping level accordingly.

Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 40 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ff9b8c68ae56..243369e302f4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4528,3 +4528,35 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
cond_resched();
}
}
+
+/*
+ * Re-check whether an #NPF for a private/gmem page can still be serviced, and
+ * adjust maximum mapping level if needed.
+ */
+int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+ u8 *max_level)
+{
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -ENOENT;
+ }
+
+ if (!assigned) {
+ pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
+ gfn, pfn, level);
+ return -EINVAL;
+ }
+
+ if (level < *max_level)
+ *max_level = level;
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 29dc5fa28d97..c26a7a933b93 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5088,6 +5088,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.gmem_prepare = sev_gmem_prepare,
.gmem_invalidate = sev_gmem_invalidate,
+ .gmem_validate_fault = sev_gmem_validate_fault,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6721e5c6cf73..8a8ee475ad86 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -732,6 +732,8 @@ void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+ u8 *max_level);
#else
static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -753,6 +755,11 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
return 0;
}
static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
+static inline int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn,
+ bool is_private, u8 *max_level)
+{
+ return 0;
+}

#endif

--
2.25.1


2024-04-21 18:07:04

by Michael Roth

Subject: [PATCH v14 17/22] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

From: Ashish Kalra <[email protected]>

With SNP/guest_memfd, private/encrypted memory should not be mappable,
and MMU notifications for HVA-mapped memory will only be relevant to
unencrypted guest memory. Therefore, the rationale behind issuing a
wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() does not apply
to SNP guests and can be ignored.

Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 243369e302f4..cf00a811aca5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3042,7 +3042,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)

void sev_guest_memory_reclaimed(struct kvm *kvm)
{
- if (!sev_guest(kvm))
+ /*
+ * With SNP+gmem, private/encrypted memory should be
+ * unreachable via the hva-based mmu notifiers. Additionally,
+ * for shared->private translations, H/W coherency will ensure
+ * first guest access to the page would clear out any existing
+ * dirty copies of that cacheline.
+ */
+ if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;

wbinvd_on_all_cpus();
--
2.25.1


2024-04-21 18:07:25

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 18/22] KVM: SVM: Add module parameter to enable SEV-SNP

From: Brijesh Singh <[email protected]>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
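
As a usage note: with the default flipped to enabled, SNP can still be
disabled at load time, e.g. via kvm_amd.sev_snp=0 on the kernel command line
or an "options kvm_amd sev_snp=0" modprobe.d entry, and the effective value
can be read back from /sys/module/kvm_amd/parameters/sev_snp (read-only at
runtime due to the 0444 permissions).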

Signed-off-by: Brijesh Singh <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cf00a811aca5..c354aca721e5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -48,7 +48,8 @@ static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);

/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);

/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
--
2.25.1


2024-04-21 18:07:45

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 19/22] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows an SEV-SNP guest to make requests to
the SEV-SNP firmware through the hypervisor using the SNP_GUEST_REQUEST
API defined in the SEV-SNP firmware specification.

This is used by guests primarily to request attestation reports from
firmware. Other request types are available as well, but the specifics
of what guest requests are being made are opaque to the hypervisor,
which only serves as a proxy for the guest requests and firmware
responses.

Implement handling for these events.
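
For reference, this is how the combined status word lands in SW_EXITINFO2,
using the error-encoding macros added to sev-guest.h below (the values shown
are illustrative):

    u64 status  = SNP_GUEST_ERR(SNP_GUEST_VMM_ERR_BUSY, 0); /* == 2ULL << 32 */
    u32 vmm_err = status >> SNP_GUEST_VMM_ERR_SHIFT; /* 2: busy, guest retries */
    u32 fw_err  = status & SNP_GUEST_FW_ERR_MASK;    /* 0: no firmware error */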

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 83 ++++++++++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 9 ++++
2 files changed, 92 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c354aca721e5..68db390b19d0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
#include <linux/misc_cgroup.h>
#include <linux/processor.h>
#include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>

#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -3290,6 +3291,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
+ case SVM_VMGEXIT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3733,6 +3735,83 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}

+static bool snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t req_pfn, resp_pfn;
+
+ if (!PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
+ return false;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return false;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return false;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return false;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return true;
+}
+
+static bool snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+ if (snp_page_reclaim(pfn))
+ return false;
+
+ if (rmp_make_shared(pfn, PG_LEVEL_4K))
+ return false;
+
+ return true;
+}
+
+static bool __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
+ sev_ret_code *fw_err)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_sev_info *sev;
+ bool ret = true;
+
+ if (!sev_snp_guest(kvm))
+ return false;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa))
+ return false;
+
+ if (sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err))
+ ret = false;
+
+ if (!snp_cleanup_guest_buf(&data))
+ ret = false;
+
+ return ret;
+}
+
+static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret = 0;
+
+ if (!__snp_handle_guest_req(kvm, req_gpa, resp_gpa, &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3993,6 +4072,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->system_event.ndata = 1;
vcpu->run->system_event.data[0] = control->ghcb_gpa;
break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 154a87a1eca9..7bd78e258569 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -89,8 +89,17 @@ struct snp_ext_report_req {
#define SNP_GUEST_FW_ERR_MASK GENMASK_ULL(31, 0)
#define SNP_GUEST_VMM_ERR_SHIFT 32
#define SNP_GUEST_VMM_ERR(x) (((u64)x) << SNP_GUEST_VMM_ERR_SHIFT)
+#define SNP_GUEST_FW_ERR(x) ((x) & SNP_GUEST_FW_ERR_MASK)
+#define SNP_GUEST_ERR(vmm_err, fw_err) (SNP_GUEST_VMM_ERR(vmm_err) | \
+ SNP_GUEST_FW_ERR(fw_err))

+/*
+ * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, so also
+ * define a GENERIC error code such that it won't ever conflict with any
+ * GHCB-defined errors that may be added in the future.
+ */
#define SNP_GUEST_VMM_ERR_INVALID_LEN 1
#define SNP_GUEST_VMM_ERR_BUSY 2
+#define SNP_GUEST_VMM_ERR_GENERIC BIT(31)

#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1


2024-04-21 18:08:06

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 20/22] crypto: ccp: Add the SNP_VLEK_LOAD command

When requesting an attestation report a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).

For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface to allow the loading of VLEK keys
into the system.

See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
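
For illustration, below is a minimal userspace sketch of driving the new
command. It assumes an already-open /dev/sev file descriptor and the existing
SEV_ISSUE_CMD ioctl, with the 432-byte wrapped hashstick blob obtained from
the KDS by other means; error handling is omitted:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/psp-sev.h>

    /* blob: 432-byte wrapped VLEK hashstick obtained from the AMD KDS */
    static int snp_load_vlek(int sev_fd, const void *blob)
    {
            struct sev_user_data_snp_vlek_load input = {
                    .len = sizeof(input),
                    .vlek_wrapped_version = 0,
                    .vlek_wrapped_address = (uint64_t)(uintptr_t)blob,
            };
            struct sev_issue_cmd cmd = {
                    .cmd  = SNP_VLEK_LOAD,
                    .data = (uint64_t)(uintptr_t)&input,
            };

            return ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);
    }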

Reviewed-by: Tom Lendacky <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 19 ++++++++++++++
drivers/crypto/ccp/sev-dev.c | 36 +++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 27 ++++++++++++++++++++
3 files changed, 82 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index e1eaf6a830ce..de68d3a4b540 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -176,6 +176,25 @@ to SNP_CONFIG command defined in the SEV-SNP spec. The current values of
the firmware parameters affected by this command can be queried via
SNP_PLATFORM_STATUS.

+2.7 SNP_VLEK_LOAD
+-----------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_snp_vlek_load
+:Returns (out): 0 on success, -negative on error
+
+When requesting an attestation report a guest is able to specify whether
+it wants SNP firmware to sign the report using either a Versioned Chip
+Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
+Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
+Key Derivation Service (KDS) and derived from seeds allocated to
+enrolled cloud service providers.
+
+In the case of VLEK keys, the SNP_VLEK_LOAD SNP command is used to load
+them into the system after obtaining them from the KDS, and corresponds
+closely to the SNP_VLEK_LOAD firmware command specified in the SEV-SNP
+spec.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 2102377f727b..97a7959406ee 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2027,6 +2027,39 @@ static int sev_ioctl_do_snp_set_config(struct sev_issue_cmd *argp, bool writable
return __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
}

+static int sev_ioctl_do_snp_vlek_load(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_snp_vlek_load input;
+ void *blob;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ if (copy_from_user(&input, u64_to_user_ptr(argp->data), sizeof(input)))
+ return -EFAULT;
+
+ if (input.len != sizeof(input) || input.vlek_wrapped_version != 0)
+ return -EINVAL;
+
+ blob = psp_copy_user_blob(input.vlek_wrapped_address,
+ sizeof(struct sev_user_data_snp_wrapped_vlek_hashstick));
+ if (IS_ERR(blob))
+ return PTR_ERR(blob);
+
+ input.vlek_wrapped_address = __psp_pa(blob);
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_VLEK_LOAD, &input, &argp->error);
+
+ kfree(blob);
+
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -2087,6 +2120,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_SET_CONFIG:
ret = sev_ioctl_do_snp_set_config(&input, writable);
break;
+ case SNP_VLEK_LOAD:
+ ret = sev_ioctl_do_snp_vlek_load(&input, writable);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index b7a2c2ee35b7..2289b7c76c59 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -31,6 +31,7 @@ enum {
SNP_PLATFORM_STATUS,
SNP_COMMIT,
SNP_SET_CONFIG,
+ SNP_VLEK_LOAD,

SEV_MAX,
};
@@ -214,6 +215,32 @@ struct sev_user_data_snp_config {
__u8 rsvd1[52];
} __packed;

+/**
+ * struct sev_user_data_snp_vlek_load - SNP_VLEK_LOAD structure
+ *
+ * @len: length of the command buffer read by the PSP
+ * @vlek_wrapped_version: version of wrapped VLEK hashstick (Must be 0h)
+ * @rsvd: reserved
+ * @vlek_wrapped_address: address of a wrapped VLEK hashstick
+ * (struct sev_user_data_snp_wrapped_vlek_hashstick)
+ */
+struct sev_user_data_snp_vlek_load {
+ __u32 len; /* In */
+ __u8 vlek_wrapped_version; /* In */
+ __u8 rsvd[3]; /* In */
+ __u64 vlek_wrapped_address; /* In */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_wrapped_vlek_hashstick - Wrapped VLEK data
+ *
+ * @data: Opaque data provided by AMD KDS (as described in SEV-SNP Firmware ABI
+ * 1.54, SNP_VLEK_LOAD)
+ */
+struct sev_user_data_snp_wrapped_vlek_hashstick {
+ __u8 data[432]; /* In */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1


2024-04-21 18:08:28

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 01/22] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y

SEV-SNP relies on private memory support to run guests, so make sure to
enable that support via the CONFIG_KVM_GENERIC_PRIVATE_MEM config
option.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d64fb2b3eb69..5e72faca4e8f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -136,6 +136,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
+ select KVM_GENERIC_PRIVATE_MEM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1


2024-04-21 18:08:48

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

These commands can be used to pause servicing of guest attestation
requests. This is useful when updating the reported TCB or signing key
via commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since those
updates may in turn require updates to userspace-supplied certificates,
and if an attestation request happens to be in flight while those updates
are occurring, a guest could receive a certificate blob that is out of
sync with the effective signing key for the attestation report.

These interfaces also provide some versatility with how similar
firmware/certificate update activities can be handled in the future.
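
To make the intended flow concrete, here is a hedged userspace sketch of a
single update transaction, following the same /dev/sev + SEV_ISSUE_CMD
pattern as the other SNP ioctls (includes and error handling omitted):

    struct sev_user_data_snp_pause_attestation txn_start = {}, txn_end = {};
    struct sev_issue_cmd cmd = {};

    cmd.cmd  = SNP_PAUSE_ATTESTATION;
    cmd.data = (uint64_t)(uintptr_t)&txn_start;
    ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);   /* fails if already paused */

    /* ... SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, certificate updates ... */

    cmd.cmd  = SNP_RESUME_ATTESTATION;
    cmd.data = (uint64_t)(uintptr_t)&txn_end;
    ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);

    /* mismatched IDs mean another pause/resume pair interleaved */
    if (txn_start.id != txn_end.id)
            fprintf(stderr, "attestation transaction interleaved\n");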

Reviewed-by: Tom Lendacky <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 50 +++++++++++++++++++++++++--
arch/x86/include/asm/sev.h | 6 ++++
arch/x86/virt/svm/sev.c | 43 +++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 47 +++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 12 +++++++
5 files changed, 156 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index de68d3a4b540..ab192a008ba7 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -128,8 +128,6 @@ the SEV-SNP specification for further details.

The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
related to the additional certificate data that is returned with the report.
-The certificate data returned is being provided by the hypervisor through the
-SNP_SET_EXT_CONFIG.

The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
firmware to get the attestation report.
@@ -195,6 +193,54 @@ them into the system after obtaining them from the KDS, and corresponds
closely to the SNP_VLEK_LOAD firmware command specified in the SEV-SNP
spec.

+2.8 SNP_PAUSE_ATTESTATION / SNP_RESUME_ATTESTATION
+--------------------------------------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (out): struct sev_user_data_snp_pause_attestation
+:Returns (out): 0 on success, -negative on error
+
+When requesting attestation reports, SNP guests have the option of issuing
+an extended guest request which allows host userspace to supply additional
+certificate data that can be used to validate the signature used to sign
+the attestation report. This signature is generated using a key that is
+derived from the reported TCB that can be set via the SNP_SET_CONFIG and
+SNP_COMMIT ioctls, so the accompanying certificate data needs to be kept in
+sync with the changes made to the reported TCB via these ioctls.
+
+Similarly, interfaces like SNP_VLEK_LOAD can modify the key used to sign
+the attestation reports, which may in turn require updating the certificate
+data provided to guests via extended guest requests.
+
+To allow for updating the reported TCB, endorsement key, and any certificate
+data in a manner that is atomic to guests, the SNP_PAUSE_ATTESTATION and
+SNP_RESUME_ATTESTATION commands are provided.
+
+After SNP_PAUSE_ATTESTATION is issued, any attestation report requests via
+extended guest requests that are in-progress, or received after
+SNP_PAUSE_ATTESTATION is issued, will result in the guest receiving a
+GHCB-defined error message instructing it to retry the request. Once all
+the desired reported TCB, endorsement keys, or certificate data updates
+are completed on the host, the SNP_RESUME_ATTESTATION command must be
+issued to allow guest attestation requests to proceed.
+
+In general, hosts should serialize updates of this sort and never have more
+than one outstanding transaction in flight that could result in the
+interleaving of multiple SNP_PAUSE_ATTESTATION/SNP_RESUME_ATTESTATION pairs.
+To guard against this, SNP_PAUSE_ATTESTATION will fail if another process
+has already paused attestation requests.
+
+However, there may be occasions where a transaction needs to be aborted due
+to unexpected activity in userspace such as timeouts, crashes, etc., so
+SNP_RESUME_ATTESTATION will always succeed. Nonetheless, this could
+potentially lead to SNP_RESUME_ATTESTATION being called out of sequence, so
+to allow callers of SNP_{PAUSE,RESUME}_ATTESTATION to detect such
+occurrences, each ioctl returns a transaction ID in the response so the
+caller can check whether the start/end IDs match. If they don't, the
+caller should assume that attestation was paused/resumed unexpectedly,
+and take whatever measures it deems necessary, such as logging, reporting,
+or auditing the sequence of events.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 3a06f06b847a..ee24ef815e35 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -82,6 +82,8 @@ extern void vc_no_ghcb(void);
extern void vc_boot_ghcb(void);
extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

+extern struct mutex snp_pause_attestation_lock;
+
/* PVALIDATE return codes */
#define PVALIDATE_FAIL_SIZEMISMATCH 6

@@ -272,6 +274,8 @@ int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 asid, bool immut
int rmp_make_shared(u64 pfn, enum pg_level level);
void snp_leak_pages(u64 pfn, unsigned int npages);
void kdump_sev_callback(void);
+int snp_pause_attestation(u64 *transaction_id);
+void snp_resume_attestation(u64 *transaction_id);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -285,6 +289,8 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 as
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
static inline void kdump_sev_callback(void) { }
+static inline int snp_pause_attestation(u64 *transaction_id) { return 0; }
+static inline void snp_resume_attestation(u64 *transaction_id) {}
#endif

#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ab0e8448bb6e..b75f2e7d4012 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -70,6 +70,11 @@ static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);

static unsigned long snp_nr_leaked_pages;

+/* For synchronizing TCB/certificate updates with extended guest requests */
+DEFINE_MUTEX(snp_pause_attestation_lock);
+static u64 snp_transaction_id;
+static bool snp_attestation_paused;
+
#undef pr_fmt
#define pr_fmt(fmt) "SEV-SNP: " fmt

@@ -568,3 +573,41 @@ void kdump_sev_callback(void)
if (cc_platform_has(CC_ATTR_HOST_SEV_SNP))
wbinvd();
}
+
+int snp_pause_attestation(u64 *transaction_id)
+{
+ mutex_lock(&snp_pause_attestation_lock);
+
+ if (snp_attestation_paused) {
+ mutex_unlock(&snp_pause_attestation_lock);
+ return -EBUSY;
+ }
+
+ /*
+ * The actual transaction ID update will happen when
+ * snp_resume_attestation() is called, so return
+ * the *anticipated* transaction ID that will be
+ * returned by snp_resume_attestation(). This is
+ * to ensure that unbalanced/aborted transactions will
+ * be noticeable when the caller that started the
+ * transaction calls snp_resume_attestation().
+ */
+ *transaction_id = snp_transaction_id + 1;
+ snp_attestation_paused = true;
+
+ mutex_unlock(&snp_pause_attestation_lock);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(snp_pause_attestation);
+
+void snp_resume_attestation(u64 *transaction_id)
+{
+ mutex_lock(&snp_pause_attestation_lock);
+
+ snp_attestation_paused = false;
+ *transaction_id = ++snp_transaction_id;
+
+ mutex_unlock(&snp_pause_attestation_lock);
+}
+EXPORT_SYMBOL_GPL(snp_resume_attestation);
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 97a7959406ee..7eb18a273731 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2060,6 +2060,47 @@ static int sev_ioctl_do_snp_vlek_load(struct sev_issue_cmd *argp, bool writable)
return ret;
}

+static int sev_ioctl_do_snp_pause_attestation(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_user_data_snp_pause_attestation transaction = {0};
+ struct sev_device *sev = psp_master->sev_data;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ ret = snp_pause_attestation(&transaction.id);
+ if (ret)
+ return ret;
+
+ if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int sev_ioctl_do_snp_resume_attestation(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_user_data_snp_pause_attestation transaction = {0};
+ struct sev_device *sev = psp_master->sev_data;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ snp_resume_attestation(&transaction.id);
+
+ if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+ return -EFAULT;
+
+ return 0;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -2123,6 +2164,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_VLEK_LOAD:
ret = sev_ioctl_do_snp_vlek_load(&input, writable);
break;
+ case SNP_PAUSE_ATTESTATION:
+ ret = sev_ioctl_do_snp_pause_attestation(&input, writable);
+ break;
+ case SNP_RESUME_ATTESTATION:
+ ret = sev_ioctl_do_snp_resume_attestation(&input, writable);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 2289b7c76c59..7b35b2814a99 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -32,6 +32,8 @@ enum {
SNP_COMMIT,
SNP_SET_CONFIG,
SNP_VLEK_LOAD,
+ SNP_PAUSE_ATTESTATION,
+ SNP_RESUME_ATTESTATION,

SEV_MAX,
};
@@ -241,6 +243,16 @@ struct sev_user_data_snp_wrapped_vlek_hashstick {
__u8 data[432]; /* In */
} __packed;

+/**
+ * struct sev_user_data_snp_pause_attestation - metadata for pausing attestation
+ *
+ * @id: the ID of the transaction started/ended by a call to SNP_PAUSE_ATTESTATION
+ * or SNP_RESUME_ATTESTATION, respectively.
+ */
+struct sev_user_data_snp_pause_attestation {
+ __u64 id; /* Out */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1


2024-04-21 18:09:18

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 22/22] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Version 2 of the GHCB specification added support for the SNP Extended
Guest Request Message NAE event. This event serves a nearly identical
purpose to the previously-added SNP_GUEST_REQUEST event, but additionally
allows certificate data to be supplied via a separate guest-supplied
buffer, mainly for verifying the signature of an attestation report as
returned by firmware.

This certificate data is supplied by userspace, so unlike with
SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
forwarded to userspace via a KVM_EXIT_VMGEXIT exit type, and only
afterward is the firmware request made.

Implement handling for these events.

Since there is a potential for race conditions where the
userspace-supplied certificate data may be out-of-sync relative to the
reported TCB or VLEK that firmware will use when signing attestation
reports, make use of the synchronization mechanisms wired up to the
SNP_{PAUSE,RESUME}_ATTESTATION SEV device ioctls such that the guest
will be told to retry the request while attestation has been paused due
to an update being underway on the system.
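
As a sketch of the userspace side of this flow ('run' is the vCPU's mmap'd
struct kvm_run; write_certs_to_guest() and certs_npages are hypothetical
stand-ins for the VMM's cached certificate table, not part of this series):

    /* in the VMM's vcpu-run loop, after KVM_RUN returns */
    if (run->exit_reason == KVM_EXIT_VMGEXIT &&
        run->vmgexit.type == KVM_USER_VMGEXIT_EXT_GUEST_REQ) {
            if (run->vmgexit.ext_guest_req.data_npages < certs_npages) {
                    /* too small: report the required size, then ret=1 */
                    run->vmgexit.ext_guest_req.data_npages = certs_npages;
                    run->vmgexit.ext_guest_req.ret = 1;
            } else {
                    /* write certs into the shared pages at data_gpa, ret=0 */
                    write_certs_to_guest(run->vmgexit.ext_guest_req.data_gpa);
                    run->vmgexit.ext_guest_req.ret = 0;
            }
    }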

Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/kvm/api.rst | 26 +++++++++++
arch/x86/include/asm/sev.h | 6 +++
arch/x86/kvm/svm/sev.c | 82 ++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++
arch/x86/virt/svm/sev.c | 37 +++++++++++++++
include/uapi/linux/kvm.h | 6 +++
6 files changed, 160 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 85099198a10f..6cf186ed8f66 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7066,6 +7066,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
#define KVM_USER_VMGEXIT_PSC 2
+ #define KVM_USER_VMGEXIT_EXT_GUEST_REQ 3
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -7079,6 +7080,11 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
__u64 shared_gpa;
__u64 ret;
} psc;
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+ __u32 ret;
+ } ext_guest_req;
};
};

@@ -7108,6 +7114,26 @@ private/shared state. Userspace will return a value in 'ret' that is in
agreement with the GHCB-defined return values that the guest will expect
in the SW_EXITINFO2 field of the GHCB in response to these requests.

+For the KVM_USER_VMGEXIT_EXT_GUEST_REQ type, the ext_guest_req union type
+is used. The kernel will supply in 'data_gpa' the value the guest supplies
+via the RAX field of the GHCB when issuing extended guest requests.
+'data_npages' will similarly contain the value the guest supplies in RBX
+denoting the number of shared pages available to write the certificate
+data into.
+
+ - If the supplied number of pages is sufficient, userspace should write
+ the certificate data blob (in the format defined by the GHCB spec) in
+ the address indicated by 'data_gpa' and set 'ret' to 0.
+
+ - If the number of pages supplied is not sufficient, userspace must write
+ the required number of pages in 'data_npages' and then set 'ret' to 1.
+
+ - If userspace is temporarily unable to handle the request, 'ret' should
+ be set to 2 to inform the guest to retry later.
+
+ - If some other error occurred, userspace should set 'ret' to a non-zero
+ value that is distinct from the specific return values mentioned above.
+
6. Capabilities that can be enabled on vCPUs
============================================

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ee24ef815e35..dfc28ac4dd0e 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -276,6 +276,9 @@ void snp_leak_pages(u64 pfn, unsigned int npages);
void kdump_sev_callback(void);
int snp_pause_attestation(u64 *transaction_id);
void snp_resume_attestation(u64 *transaction_id);
+u64 snp_transaction_get_id(void);
+bool __snp_transaction_is_stale(u64 transaction_id);
+bool snp_transaction_is_stale(u64 transaction_id);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -291,6 +294,9 @@ static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
static inline void kdump_sev_callback(void) { }
static inline int snp_pause_attestation(u64 *transaction_id) { return 0; }
static inline void snp_resume_attestation(u64 *transaction_id) {}
+static inline u64 snp_transaction_get_id(void) { return 0; }
+static inline bool __snp_transaction_is_stale(u64 transaction_id) { return false; }
+static inline bool snp_transaction_is_stale(u64 transaction_id) { return false; }
#endif

#endif
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 68db390b19d0..1cec466e593b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3292,6 +3292,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3812,6 +3813,84 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
}

+static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_control_area *control;
+ struct kvm *kvm = vcpu->kvm;
+ sev_ret_code fw_err = 0;
+ int vmm_ret;
+
+ vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
+ if (vmm_ret) {
+ if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
+ vcpu->arch.regs[VCPU_REGS_RBX] =
+ vcpu->run->vmgexit.ext_guest_req.data_npages;
+ goto abort_request;
+ }
+
+ control = &svm->vmcb->control;
+
+ /*
+ * To avoid the message sequence number getting out of sync between the
+	 * actual value seen by firmware versus the value expected by the guest,
+ * make sure attestations can't get paused on the write-side at this
+ * point by holding the lock for the entire duration of the firmware
+ * request so that there is no situation where SNP_GUEST_VMM_ERR_BUSY
+ * would need to be returned after firmware sees the request.
+ */
+ mutex_lock(&snp_pause_attestation_lock);
+
+ if (__snp_transaction_is_stale(svm->snp_transaction_id))
+ vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+ else if (!__snp_handle_guest_req(kvm, control->exit_info_1,
+ control->exit_info_2, &fw_err))
+ vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+ mutex_unlock(&snp_pause_attestation_lock);
+
+abort_request:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+
+ return 1; /* resume guest */
+}
+
+static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+ int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+ struct vcpu_svm *svm = to_svm(vcpu);
+ unsigned long data_npages;
+	sev_ret_code fw_err = 0;	/* reported as-is on abort paths */
+ gpa_t data_gpa;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ goto abort_request;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
+ goto abort_request;
+
+ svm->snp_transaction_id = snp_transaction_get_id();
+ if (snp_transaction_is_stale(svm->snp_transaction_id)) {
+ vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+ goto abort_request;
+ }
+
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_EXT_GUEST_REQ;
+ vcpu->run->vmgexit.ext_guest_req.data_gpa = data_gpa;
+ vcpu->run->vmgexit.ext_guest_req.data_npages = data_npages;
+ vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
+
+ return 0; /* forward request to userspace */
+
+abort_request:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+ return 1; /* resume guest */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4076,6 +4155,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
ret = 1;
break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ ret = snp_begin_ext_guest_req(vcpu);
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8a8ee475ad86..28140bc8af27 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -303,6 +303,9 @@ struct vcpu_svm {

/* Guest GIF value, used when vGIF is not enabled */
bool guest_gif;
+
+ /* Transaction ID associated with SNP config updates */
+ u64 snp_transaction_id;
};

struct svm_cpu_data {
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index b75f2e7d4012..f1f7486a3dcf 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -72,6 +72,7 @@ static unsigned long snp_nr_leaked_pages;

/* For synchronizing TCB/certificate updates with extended guest requests */
DEFINE_MUTEX(snp_pause_attestation_lock);
+EXPORT_SYMBOL_GPL(snp_pause_attestation_lock);
static u64 snp_transaction_id;
static bool snp_attestation_paused;

@@ -611,3 +612,39 @@ void snp_resume_attestation(u64 *transaction_id)
mutex_unlock(&snp_pause_attestation_lock);
}
EXPORT_SYMBOL_GPL(snp_resume_attestation);
+
+u64 snp_transaction_get_id(void)
+{
+ u64 id;
+
+ mutex_lock(&snp_pause_attestation_lock);
+ id = snp_transaction_id;
+ mutex_unlock(&snp_pause_attestation_lock);
+
+ return id;
+}
+EXPORT_SYMBOL_GPL(snp_transaction_get_id);
+
+/* Must be called with snp_pause_attestation_lock held */
+bool __snp_transaction_is_stale(u64 transaction_id)
+{
+ lockdep_assert_held(&snp_pause_attestation_lock);
+
+ return (snp_attestation_paused ||
+ transaction_id != snp_transaction_id);
+}
+EXPORT_SYMBOL_GPL(__snp_transaction_is_stale);
+
+bool snp_transaction_is_stale(u64 transaction_id)
+{
+ bool stale;
+
+ mutex_lock(&snp_pause_attestation_lock);
+
+ stale = __snp_transaction_is_stale(transaction_id);
+
+ mutex_unlock(&snp_pause_attestation_lock);
+
+ return stale;
+}
+EXPORT_SYMBOL_GPL(snp_transaction_is_stale);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e33c48bfbd67..585de3a2591e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -138,6 +138,7 @@ struct kvm_xen_exit {
struct kvm_user_vmgexit {
#define KVM_USER_VMGEXIT_PSC_MSR 1
#define KVM_USER_VMGEXIT_PSC 2
+#define KVM_USER_VMGEXIT_EXT_GUEST_REQ 3
__u32 type; /* KVM_USER_VMGEXIT_* type */
union {
struct {
@@ -151,6 +152,11 @@ struct kvm_user_vmgexit {
__u64 shared_gpa;
__u64 ret;
} psc;
+ struct {
+ __u64 data_gpa;
+ __u64 data_npages;
+ __u32 ret;
+ } ext_guest_req;
};
};

--
2.25.1


2024-04-21 18:09:43

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 02/22] KVM: SEV: Add support to handle AP reset MSR protocol

From: Tom Lendacky <[email protected]>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
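
Roughly, the MSR-based exchange looks as follows, using the defines added
below (values illustrative; the result field occupies GHCBData[63:12]):

    u64 req  = GHCB_MSR_AP_RESET_HOLD_REQ;  /* guest writes 0x006, then VMGEXIT */
    /* on a subsequent SIPI, KVM completes the hold with a non-zero result: */
    u64 resp = GHCB_MSR_AP_RESET_HOLD_RESP |
               (1ULL << GHCB_MSR_AP_RESET_HOLD_RESULT_POS);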

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 6 ++--
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b463fcbd4b90..01261f7054ad 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -54,8 +54,10 @@
(((unsigned long)fn) << 32))

/* AP Reset Hold */
-#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
-#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
+#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 598d78b4107f..6e31cb408dd8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -49,6 +49,10 @@ static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
static u64 sev_supported_vmsa_features;

+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2727,6 +2731,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)

void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;

@@ -2938,6 +2945,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3037,6 +3060,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3280,15 +3304,31 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}

- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);

- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}

struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 323901782547..6fd0f5862681 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -199,6 +199,7 @@ struct vcpu_sev_es_state {
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;

/* SEV-ES scratch area support */
u64 sw_scratch;
--
2.25.1


2024-04-21 18:10:04

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 03/22] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the hypervisor, so implement handling for these
requests.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
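
For reference, the MSR-protocol response assembled by set_ghcb_msr_bits()
in this patch is encoded as follows (sketch):

    u64 resp = GHCB_MSR_HV_FT_RESP |       /* 0x081 in GHCBData[11:0] */
               ((u64)GHCB_HV_FT_SUPPORTED << GHCB_MSR_HV_FT_POS);
    /* so GHCB_MSR_HV_FT_RESP_VAL(resp) == GHCB_HV_FT_SNP */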

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 16 +++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 01261f7054ad..5a8246dd532f 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6e31cb408dd8..1d2264e93afe 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -33,9 +33,11 @@
#include "cpuid.h"
#include "trace.h"

-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

+#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+
/* enable/disable SEV support */
static bool sev_enabled = true;
module_param_named(sev, sev_enabled, bool, 0444);
@@ -2701,6 +2703,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2961,6 +2964,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ:
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3085,6 +3094,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1


2024-04-21 18:10:24

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 04/22] KVM: SEV: Add initial SEV-SNP support

From: Brijesh Singh <[email protected]>

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protection. SEV-SNP adds strong memory
encryption and integrity protection to help prevent malicious
hypervisor-based attacks such as data replay, memory re-mapping, and
more, to create an isolated execution environment.

Define a new KVM_X86_SNP_VM type which makes use of these capabilities
and extend the KVM_SEV_INIT2 ioctl to support it. Also add a basic
helper to check whether SNP is enabled and set PFERR_PRIVATE_ACCESS for
private #NPFs so they are handled appropriately by the KVM MMU.
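
For illustration, a minimal userspace sketch of creating and initializing a
VM of the new type (this assumes the KVM_SEV_INIT2 uapi already present in
the base tree; a zeroed kvm_sev_init requests the default VMSA features, and
error handling is omitted):

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int snp_vm_create(void)
    {
            int kvm_fd = open("/dev/kvm", O_RDWR);
            int sev_fd = open("/dev/sev", O_RDWR);
            int vm_fd  = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SNP_VM);
            struct kvm_sev_init init = {};
            struct kvm_sev_cmd cmd = {
                    .id     = KVM_SEV_INIT2,
                    .data   = (uint64_t)(uintptr_t)&init,
                    .sev_fd = (uint32_t)sev_fd,
            };

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) ? -1 : vm_fd;
    }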

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/svm.h | 3 ++-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 21 ++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 8 +++++++-
arch/x86/kvm/svm/svm.h | 12 ++++++++++++
5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 728c98175b9c..544a43c1cf11 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -285,7 +285,8 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_

#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

-#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 72ad5ace118d..9a8b81d20314 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -872,5 +872,6 @@ struct kvm_hyperv_eventfd {
#define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3
+#define KVM_X86_SNP_VM 4

#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1d2264e93afe..c41cc73a1efe 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -46,6 +46,9 @@ module_param_named(sev, sev_enabled, bool, 0444);
static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);

+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
@@ -275,6 +278,9 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
sev->es_active = es_active;
sev->vmsa_features = data->vmsa_features;

+ if (vm_type == KVM_X86_SNP_VM)
+ sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
ret = sev_asid_new(sev);
if (ret)
goto e_no_asid;
@@ -326,7 +332,8 @@ static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
return -EINVAL;

if (kvm->arch.vm_type != KVM_X86_SEV_VM &&
- kvm->arch.vm_type != KVM_X86_SEV_ES_VM)
+ kvm->arch.vm_type != KVM_X86_SEV_ES_VM &&
+ kvm->arch.vm_type != KVM_X86_SNP_VM)
return -EINVAL;

if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
@@ -2306,11 +2313,16 @@ void __init sev_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
}
+ if (sev_snp_enabled) {
+ kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
+ kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
+ }
}

void __init sev_hardware_setup(void)
{
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;

@@ -2391,6 +2403,7 @@ void __init sev_hardware_setup(void)
sev_es_asid_count = min_sev_asid - 1;
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);

out:
if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2403,9 +2416,15 @@ void __init sev_hardware_setup(void)
pr_info("SEV-ES %s (ASIDs %u - %u)\n",
sev_es_supported ? "enabled" : "disabled",
min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
+ if (boot_cpu_has(X86_FEATURE_SEV_SNP))
+ pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
+ sev_snp_supported ? "enabled" : "disabled",
+ min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);

sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+ sev_snp_enabled = sev_snp_supported;
+
if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
!cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
sev_es_debug_swap_enabled = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 535018f152a3..422b452fbc3b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2056,6 +2056,9 @@ static int npf_interception(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
error_code &= ~PFERR_SYNTHETIC_MASK;

+ if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
+ error_code |= PFERR_PRIVATE_ACCESS;
+
trace_kvm_page_fault(vcpu, fault_address, error_code);
return kvm_mmu_page_fault(vcpu, fault_address, error_code,
static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
@@ -4899,8 +4902,11 @@ static int svm_vm_init(struct kvm *kvm)

if (type != KVM_X86_DEFAULT_VM &&
type != KVM_X86_SW_PROTECTED_VM) {
- kvm->arch.has_protected_state = (type == KVM_X86_SEV_ES_VM);
+ kvm->arch.has_protected_state =
+ (type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
to_kvm_sev_info(kvm)->need_init = true;
+
+ kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
}

if (!pause_filter_count || !pause_filter_thresh)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6fd0f5862681..7f2e9c7fc4ca 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -348,6 +348,18 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}

+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return (sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE) &&
+ !WARN_ON_ONCE(!sev_es_guest(kvm));
+#else
+ return false;
+#endif
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1


2024-04-21 18:10:44

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 05/22] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command

From: Brijesh Singh <[email protected]>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. Other commands can then at that point be
used to load/encrypt data into the guest's initial launch image.

For more information see the SEV-SNP specification.
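
To make the policy requirements concrete, here is a hedged userspace sketch
(vm_fd/sev_fd as set up for KVM_SEV_INIT2; bit positions follow the
SNP_POLICY_MASK_* defines added in this patch; error handling omitted):

    struct kvm_sev_snp_launch_start start = {
            /* bit 17: reserved, must-be-one; bit 16: SMT allowed (required
             * by this hypervisor); bits 15:0: minimum guest API version
             */
            .policy = (1ULL << 17) | (1ULL << 16),
    };
    struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_SNP_LAUNCH_START,
            .data   = (uint64_t)(uintptr_t)&start,
            .sev_fd = (uint32_t)sev_fd,
    };

    ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);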

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 28 ++-
arch/x86/include/uapi/asm/kvm.h | 11 +
arch/x86/kvm/svm/sev.c | 195 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 231 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 3381556d596d..d4c4a0b90bc9 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -459,6 +459,30 @@ issued by the hypervisor to make the guest ready for execution.

Returns: 0 on success, -negative on error

+18. KVM_SEV_SNP_LAUNCH_START
+----------------------------
+
+The KVM_SEV_SNP_LAUNCH_START command is used for creating the memory
+encryption context for the SEV-SNP guest. It must be called prior to issuing
+KVM_SEV_SNP_LAUNCH_UPDATE or KVM_SEV_SNP_LAUNCH_FINISH.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u8 gosvw[16]; /* Guest OS visible workarounds. */
+ __u16 flags; /* Must be zero. */
+ __u8 pad0[6];
+ __u64 pad1[4];
+ };
+
+See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_start``.
+
Device attribute API
====================

@@ -490,9 +514,11 @@ References
==========


-See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info.
+See [white-paper]_, [api-spec]_, [amd-apm]_, [kvm-forum]_, and [snp-fw-abi]_
+for more info.

.. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
.. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf
.. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34)
.. [kvm-forum] https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
+.. [snp-fw-abi] https://www.amd.com/system/files/TechDocs/56860.pdf
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 9a8b81d20314..5765391f0fdb 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -697,6 +697,9 @@ enum sev_cmd_id {
/* Second time is the charm; improved versions of the above ioctls. */
KVM_SEV_INIT2,

+ /* SNP-specific commands */
+ KVM_SEV_SNP_LAUNCH_START = 100,
+
KVM_SEV_NR_MAX,
};

@@ -822,6 +825,14 @@ struct kvm_sev_receive_update_data {
__u32 pad2;
};

+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u8 gosvw[16];
+ __u16 flags;
+ __u8 pad0[6];
+ __u64 pad1[4];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c41cc73a1efe..9d08d1202544 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -25,6 +25,7 @@
#include <asm/fpu/xcr.h>
#include <asm/fpu/xstate.h>
#include <asm/debugreg.h>
+#include <asm/sev.h>

#include "mmu.h"
#include "x86.h"
@@ -58,6 +59,21 @@ static u64 sev_supported_vmsa_features;
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2

+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_API_MINOR GENMASK_ULL(7, 0)
+#define SNP_POLICY_MASK_API_MAJOR GENMASK_ULL(15, 8)
+#define SNP_POLICY_MASK_SMT BIT_ULL(16)
+#define SNP_POLICY_MASK_RSVD_MBO BIT_ULL(17)
+#define SNP_POLICY_MASK_DEBUG BIT_ULL(19)
+#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+
+#define SNP_POLICY_MASK_VALID (SNP_POLICY_MASK_API_MINOR | \
+ SNP_POLICY_MASK_API_MAJOR | \
+ SNP_POLICY_MASK_SMT | \
+ SNP_POLICY_MASK_RSVD_MBO | \
+ SNP_POLICY_MASK_DEBUG | \
+ SNP_POLICY_MASK_SINGLE_SOCKET)
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -68,6 +84,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;

+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -94,12 +112,17 @@ static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
down_write(&sev_deactivate_lock);

wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);

up_write(&sev_deactivate_lock);

if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);

return ret;
}
@@ -1976,6 +1999,125 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val)
}
}

+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.address = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ pr_warn("Failed to create SEV-SNP context, rc %d fw_error %d",
+ rc, argp->error);
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ /* Don't allow userspace to allocate memory for more than 1 SNP context. */
+ if (sev->snp_context) {
+ pr_debug("%s: SEV-SNP context already exists. Refusing to allocate an additional one.\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ if (params.flags) {
+ pr_debug("%s: SEV-SNP hypervisor does not support requested flags 0x%x\n",
+ __func__, params.flags);
+ return -EINVAL;
+ }
+
+ if (params.policy & ~SNP_POLICY_MASK_VALID) {
+ pr_debug("%s: SEV-SNP hypervisor does not support requested policy 0x%llx (supported 0x%llx).\n",
+ __func__, params.policy, SNP_POLICY_MASK_VALID);
+ return -EINVAL;
+ }
+
+ if (!(params.policy & SNP_POLICY_MASK_RSVD_MBO)) {
+ pr_debug("%s: SEV-SNP hypervisor does not support requested policy 0x%llx (must be set 0x%llx).\n",
+ __func__, params.policy, SNP_POLICY_MASK_RSVD_MBO);
+ return -EINVAL;
+ }
+
+ if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) {
+ pr_debug("%s: SEV-SNP hypervisor does not support limiting guests to a single socket.\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ if (!(params.policy & SNP_POLICY_MASK_SMT)) {
+ pr_debug("%s: SEV-SNP hypervisor does not support limiting guests to a single SMT thread.\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc) {
+ pr_debug("%s: SEV_CMD_SNP_LAUNCH_START firmware command failed, rc %d\n",
+ __func__, rc);
+ goto e_free_context;
+ }
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc) {
+ pr_debug("%s: Failed to bind ASID to SEV-SNP context, rc %d\n",
+ __func__, rc);
+ goto e_free_context;
+ }
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -1999,6 +2141,15 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
goto out;
}

+ /*
+ * Once KVM_SEV_INIT2 initializes a KVM instance as an SNP guest, only
+ * allow the use of SNP-specific commands.
+ */
+ if (sev_snp_guest(kvm) && sev_cmd.id < KVM_SEV_SNP_LAUNCH_START) {
+ r = -EPERM;
+ goto out;
+ }
+
switch (sev_cmd.id) {
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
@@ -2063,6 +2214,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2258,6 +2412,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.address = __sme_pa(sev->snp_context);
+ down_write(&sev_deactivate_lock);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context")) {
+ up_write(&sev_deactivate_lock);
+ return ret;
+ }
+
+ up_write(&sev_deactivate_lock);
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2299,7 +2480,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}

- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7f2e9c7fc4ca..0654fc91d4db 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -92,6 +92,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ void *snp_context; /* SNP guest context page */
};

struct kvm_svm {
--
2.25.1
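
For readers following the flow, here is a minimal userspace sketch of the
corresponding KVM_SEV_SNP_LAUNCH_START call (not part of the series: the
vm_fd/sev_fd plumbing is assumed, KVM_SEV_SNP_LAUNCH_START comes from this
patchset's uapi additions, and the policy value simply sets the
reserved-must-be-one bit 17 plus the SMT-allowed bit 16 to satisfy the
checks in snp_launch_start() above):

#include <err.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void do_snp_launch_start(int vm_fd, int sev_fd)
{
        struct kvm_sev_snp_launch_start start = {
                /* SNP_POLICY_MASK_RSVD_MBO | SNP_POLICY_MASK_SMT */
                .policy = (1ULL << 17) | (1ULL << 16),
        };
        struct kvm_sev_cmd cmd = {
                .id = KVM_SEV_SNP_LAUNCH_START,
                .data = (__u64)(unsigned long)&start,
                .sev_fd = sev_fd,
        };

        if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd))
                err(1, "SNP_LAUNCH_START failed, fw_error %u", cmd.error);
}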


2024-04-21 18:11:05

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 06/22] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

From: Brijesh Singh <[email protected]>

A key aspect of launching an SNP guest is initializing it with a
known/measured payload which is then encrypted into guest memory as
pre-validated private pages and then measured into the cryptographic
launch context created with KVM_SEV_SNP_LAUNCH_START so that the guest
can attest itself after booting.

Since all private pages are provided by guest_memfd, make use of the
kvm_gmem_populate() interface to handle this. The general flow is that
guest_memfd will handle allocating the pages associated with the GPA
ranges being initialized by each particular call of
KVM_SEV_SNP_LAUNCH_UPDATE, copying data from userspace into those pages,
and then the post_populate callback will do the work of setting the
RMP entries for these pages to private and issuing the SNP firmware
calls to encrypt/measure them.

For more information see the SEV-SNP specification.

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 54 ++++
arch/x86/include/uapi/asm/kvm.h | 19 ++
arch/x86/kvm/svm/sev.c | 237 ++++++++++++++++++
3 files changed, 310 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index d4c4a0b90bc9..60728868c5c6 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -483,6 +483,60 @@ Returns: 0 on success, -negative on error
See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
details on the input parameters in ``struct kvm_sev_snp_launch_start``.

+19. KVM_SEV_SNP_LAUNCH_UPDATE
+-----------------------------
+
+The KVM_SEV_SNP_LAUNCH_UPDATE command is used for loading userspace-provided
+data into a guest GPA range, measuring the contents into the SNP guest context
+created by KVM_SEV_SNP_LAUNCH_START, and then encrypting/validating that GPA
+range so that it will be immediately readable using the encryption key
+associated with the guest context once it is booted, after which point it can
+attest the measurement associated with its context before unlocking any
+secrets.
+
+It is required that the GPA ranges initialized by this command have had the
+KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation
+for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect.
+
+Upon success, this command is not guaranteed to have processed the entire
+range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of
+``struct kvm_sev_snp_launch_update`` will be updated to correspond to the
+remaining range that has yet to be processed. The caller should continue
+calling this command until those fields indicate the entire range has been
+processed, e.g. ``len`` is 0, ``gfn_start`` is equal to the last GFN in the
+range plus 1, and ``uaddr`` is the last byte of the userspace-provided source
+buffer address plus 1. In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO,
+``uaddr`` will be ignored completely.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, < 0 on error, -EAGAIN if caller should retry
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 gfn_start; /* Guest page number to load/encrypt data into. */
+ __u64 uaddr; /* Userspace address of data to be loaded/encrypted. */
+ __u64 len; /* 4k-aligned length in bytes to copy into guest memory. */
+ __u8 type; /* The type of the guest pages being initialized. */
+ __u8 pad0;
+ __u16 flags; /* Must be zero. */
+ __u32 pad1;
+ __u64 pad2[4];
+
+ };
+
+where the allowed values for page_type are #define'd as::
+
+ KVM_SEV_SNP_PAGE_TYPE_NORMAL
+ KVM_SEV_SNP_PAGE_TYPE_ZERO
+ KVM_SEV_SNP_PAGE_TYPE_UNMEASURED
+ KVM_SEV_SNP_PAGE_TYPE_SECRETS
+ KVM_SEV_SNP_PAGE_TYPE_CPUID
+
+See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
+used/measured.
+
Device attribute API
====================

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5765391f0fdb..3c9255de76db 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -699,6 +699,7 @@ enum sev_cmd_id {

/* SNP-specific commands */
KVM_SEV_SNP_LAUNCH_START = 100,
+ KVM_SEV_SNP_LAUNCH_UPDATE,

KVM_SEV_NR_MAX,
};
@@ -833,6 +834,24 @@ struct kvm_sev_snp_launch_start {
__u64 pad1[4];
};

+/* Kept in sync with firmware values for simplicity. */
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 gfn_start;
+ __u64 uaddr;
+ __u64 len;
+ __u8 type;
+ __u8 pad0;
+ __u16 flags;
+ __u32 pad1;
+ __u64 pad2[4];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9d08d1202544..d3ae4ded91df 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -258,6 +258,35 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}

+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (WARN_ON_ONCE(rc)) {
+ /*
+ * This shouldn't happen under normal circumstances, but if the
+ * reclaim failed, then the page is no longer safe to use.
+ */
+ snp_leak_pages(pfn, 1);
+ }
+
+ return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (rc)
+ snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -2118,6 +2147,211 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}

+struct sev_gmem_populate_args {
+ __u8 type;
+ int sev_fd;
+ int fw_error;
+};
+
+static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn_start, kvm_pfn_t pfn,
+ void __user *src, int order, void *opaque)
+{
+ struct sev_gmem_populate_args *sev_populate_args = opaque;
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ int n_private = 0, ret, i;
+ int npages = (1 << order);
+ gfn_t gfn;
+
+ pr_debug("%s: gfn_start 0x%llx pfn_start 0x%llx npages %d\n",
+ __func__, gfn_start, pfn, npages);
+
+ if (WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO && !src)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ for (gfn = gfn_start, i = 0; gfn < gfn_start + npages; gfn++, i++) {
+ struct sev_data_snp_launch_update fw_args = {0};
+ bool assigned;
+ void *vaddr;
+ int level;
+
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_debug("%s: Failed to ensure GFN 0x%llx has private memory attribute set\n",
+ __func__, gfn);
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = snp_lookup_rmpentry((u64)pfn + i, &assigned, &level);
+ if (ret || assigned) {
+ pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
+ __func__, gfn, ret, assigned);
+ ret = -EINVAL;
+ break;
+ }
+
+ if (src) {
+ vaddr = kmap_local_pfn(pfn + i);
+ ret = copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE);
+ if (ret) {
+ pr_debug("Failed to copy source page into GFN 0x%llx\n", gfn);
+ goto out_unmap;
+ }
+ }
+
+ ret = rmp_make_private(pfn + i, gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ sev_get_asid(kvm), true);
+ if (ret) {
+ pr_debug("%s: Failed to mark RMP entry for GFN 0x%llx as private, ret: %d\n",
+ __func__, gfn, ret);
+ goto out_unmap;
+ }
+
+ n_private++;
+
+ fw_args.gctx_paddr = __psp_pa(sev->snp_context);
+ fw_args.address = __sme_set(pfn_to_hpa(pfn + i));
+ fw_args.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
+ fw_args.page_type = sev_populate_args->type;
+ ret = __sev_issue_cmd(sev_populate_args->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &fw_args, &sev_populate_args->fw_error);
+ if (ret) {
+ pr_debug("%s: SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+ __func__, ret, sev_populate_args->fw_error);
+
+ if (WARN_ON_ONCE(snp_page_reclaim(pfn + i)))
+ goto out_unmap;
+
+ /*
+ * When invalid CPUID function entries are detected,
+ * firmware writes the expected values into the page and
+ * leaves it unencrypted so it can be used for debugging
+ * and error-reporting.
+ *
+ * Copy this page back into the source buffer so
+ * userspace can use this information to provide
+ * information on which CPUID leaves/fields failed CPUID
+ * validation.
+ */
+ if (sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
+ sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
+ if (WARN_ON_ONCE(host_rmp_make_shared(pfn + i, PG_LEVEL_4K)))
+ goto out_unmap;
+
+ if (copy_to_user(src + i * PAGE_SIZE, vaddr,
+ PAGE_SIZE))
+ pr_debug("Failed to write CPUID page back to userspace\n");
+
+ /* PFN is hypervisor-owned at this point, skip cleanup for it. */
+ n_private--;
+ }
+ }
+
+out_unmap:
+ kunmap_local(vaddr);
+ if (ret)
+ break;
+ }
+
+out:
+ if (ret) {
+ pr_debug("%s: exiting with error ret %d, restoring %d gmem PFNs to shared.\n",
+ __func__, ret, n_private);
+ for (i = 0; i < n_private; i++)
+ WARN_ON_ONCE(host_rmp_make_shared(pfn + i, PG_LEVEL_4K));
+ }
+
+ return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_gmem_populate_args sev_populate_args = {0};
+ struct kvm_sev_snp_launch_update params;
+ struct kvm_memory_slot *memslot;
+ long npages, count;
+ void __user *src;
+ int ret = 0;
+
+ if (!sev_snp_guest(kvm) || !sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d\n", __func__,
+ params.gfn_start, params.len, params.type, params.flags);
+
+ if (!PAGE_ALIGNED(params.len) || params.flags ||
+ (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
+ params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
+ return -EINVAL;
+
+ npages = params.len / PAGE_SIZE;
+
+ /*
+ * For each GFN that's being prepared as part of the initial guest
+ * state, the following pre-conditions are verified:
+ *
+ * 1) The backing memslot is a valid private memslot.
+ * 2) The GFN has been set to private via KVM_SET_MEMORY_ATTRIBUTES
+ * beforehand.
+ * 3) The PFN of the guest_memfd has not already been set to private
+ * in the RMP table.
+ *
+ * The KVM MMU relies on kvm->mmu_invalidate_seq to retry nested page
+ * faults if there's a race between a fault and an attribute update via
+ * KVM_SET_MEMORY_ATTRIBUTES, and a similar approach could be utilized
+ * here. However, kvm->slots_lock guards against both this as well as
+ * concurrent memslot updates occurring while these checks are being
+ * performed, so use that here to make it easier to reason about the
+ * initial expected state and better guard against unexpected
+ * situations.
+ */
+ mutex_lock(&kvm->slots_lock);
+
+ memslot = gfn_to_memslot(kvm, params.gfn_start);
+ if (!kvm_slot_can_be_private(memslot)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ sev_populate_args.sev_fd = argp->sev_fd;
+ sev_populate_args.type = params.type;
+ src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
+
+ count = kvm_gmem_populate(kvm, params.gfn_start, src, npages,
+ sev_gmem_post_populate, &sev_populate_args);
+ if (count < 0) {
+ argp->error = sev_populate_args.fw_error;
+ pr_debug("%s: kvm_gmem_populate failed, ret %ld (fw_error %d)\n",
+ __func__, count, argp->error);
+ ret = -EIO;
+ } else if (count <= npages) {
+ params.gfn_start += count;
+ params.len -= count * PAGE_SIZE;
+ if (params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO)
+ params.uaddr += count * PAGE_SIZE;
+
+ ret = copy_to_user(u64_to_user_ptr(argp->data), &params, sizeof(params))
+ ? -EIO : 0;
+ } else {
+ WARN_ONCE(1, "Completed page count %ld exceeds requested amount %ld",
+ count, npages);
+ ret = -EINVAL;
+ }
+
+out:
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2217,6 +2451,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
--
2.25.1
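
To illustrate the partial-completion semantics documented above, here is a
hedged userspace sketch of the KVM_SEV_SNP_LAUNCH_UPDATE retry loop (not
part of the series; the fd plumbing is assumed and the command ID/struct
come from this patch's uapi additions). On each return the kernel updates
gfn_start/uaddr/len in place, so the caller simply re-issues the ioctl
until len reaches 0:

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int snp_populate(int vm_fd, int sev_fd, __u64 gpa, void *src,
                        __u64 len, __u8 type)
{
        struct kvm_sev_snp_launch_update update = {
                .gfn_start = gpa >> 12,
                .uaddr = (__u64)(unsigned long)src,
                .len = len,
                .type = type,
        };
        struct kvm_sev_cmd cmd = {
                .id = KVM_SEV_SNP_LAUNCH_UPDATE,
                .data = (__u64)(unsigned long)&update,
                .sev_fd = sev_fd,
        };
        int ret;

        /* Retry on -EAGAIN or partial completion until the range is done. */
        do {
                ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
        } while (update.len && (!ret || (ret < 0 && errno == EAGAIN)));

        return ret;
}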


2024-04-21 18:11:25

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 07/22] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command

From: Brijesh Singh <[email protected]>

Add a KVM_SEV_SNP_LAUNCH_FINISH command to finalize the cryptographic
launch digest which stores the measurement of the guest at launch time.
Also extend the existing SNP firmware data structures to support
disabling the use of Versioned Chip Endorsement Keys (VCEK) by guests as
part of this command.

While finalizing the launch flow, the code also issues the LAUNCH_UPDATE
SNP firmware commands to encrypt/measure the initial VMSA pages for each
configured vCPU, which requires setting the RMP entries for those pages
to private. Also add handling to clean up the RMP entries for these
pages when freeing vCPUs during shutdown.

Signed-off-by: Brijesh Singh <[email protected]>
Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 28 ++++
arch/x86/include/uapi/asm/kvm.h | 17 +++
arch/x86/kvm/svm/sev.c | 126 ++++++++++++++++++
include/linux/psp-sev.h | 4 +-
4 files changed, 174 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 60728868c5c6..67bcede94bb5 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -537,6 +537,34 @@ where the allowed values for page_type are #define'd as::
See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
used/measured.

+20. KVM_SEV_SNP_LAUNCH_FINISH
+-----------------------------
+
+After completion of the SNP guest launch flow, the KVM_SEV_SNP_LAUNCH_FINISH
+command can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 vcek_disabled;
+ __u8 host_data[32];
+ __u8 pad0[3];
+ __u16 flags; /* Must be zero */
+ __u64 pad1[4];
+ };
+
+
+See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
+
Device attribute API
====================

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 3c9255de76db..8007fbfe0160 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -700,6 +700,7 @@ enum sev_cmd_id {
/* SNP-specific commands */
KVM_SEV_SNP_LAUNCH_START = 100,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,

KVM_SEV_NR_MAX,
};
@@ -852,6 +853,22 @@ struct kvm_sev_snp_launch_update {
__u64 pad2[4];
};

+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 vcek_disabled;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad0[3];
+ __u16 flags;
+ __u64 pad1[4];
+};
+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d3ae4ded91df..6ca1b13c9beb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -74,6 +74,8 @@ static u64 sev_supported_vmsa_features;
SNP_POLICY_MASK_DEBUG | \
SNP_POLICY_MASK_SINGLE_SOCKET)

+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2352,6 +2354,114 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ int ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ snp_page_reclaim(pfn);
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_finish params;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags)
+ return -EINVAL;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->id_auth_paddr = __sme_pa(id_auth);
+
+ if (params.auth_key_en)
+ data->auth_key_en = 1;
+ }
+
+ data->vcek_disabled = params.vcek_disabled;
+
+ memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2454,6 +2564,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2944,11 +3057,24 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)

svm = to_svm(vcpu);

+ /*
+ * If it's an SNP guest, then the VMSA was marked in the RMP table as
+ * a guest-owned page. Transition the page to hypervisor state before
+ * releasing it back to the system.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);

__free_page(virt_to_page(svm->sev_es.vmsa));

+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 3705c2044fc0..903ddfea8585 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -658,6 +658,7 @@ struct sev_data_snp_launch_update {
* @id_auth_paddr: system physical address of ID block authentication structure
* @id_block_en: indicates whether ID block is present
* @auth_key_en: indicates whether author key is present in authentication structure
+ * @vcek_disabled: indicates whether use of VCEK for attestation reports is disabled
* @rsvd: reserved
* @host_data: host-supplied data for guest, not interpreted by firmware
*/
@@ -667,7 +668,8 @@ struct sev_data_snp_launch_finish {
u64 id_auth_paddr;
u8 id_block_en:1;
u8 auth_key_en:1;
- u64 rsvd:62;
+ u8 vcek_disabled:1;
+ u64 rsvd:61;
u8 host_data[32];
} __packed;

--
2.25.1
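
Continuing the hedged userspace sketches from the previous two patches,
finalizing the launch is then a single call (again not part of the series;
the same includes and fd plumbing are assumed as before):

static void do_snp_launch_finish(int vm_fd, int sev_fd)
{
        /* All optional fields zeroed: no ID block, VCEK usage allowed. */
        struct kvm_sev_snp_launch_finish finish = {};
        struct kvm_sev_cmd cmd = {
                .id = KVM_SEV_SNP_LAUNCH_FINISH,
                .data = (__u64)(unsigned long)&finish,
                .sev_fd = sev_fd,
        };

        if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd))
                err(1, "SNP_LAUNCH_FINISH failed, fw_error %u", cmd.error);
}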


2024-04-23 16:25:04

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 23/22] [SQUASH] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT

Terminate if a non-SNP guest attempts to register a GHCB page; this
is an SNP-only GHCB request.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1cec466e593b..088eca85a6ac 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3970,6 +3970,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
case GHCB_MSR_PREF_GPA_REQ:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto out_terminate;
+
set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
GHCB_MSR_GPA_VALUE_POS);
set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
@@ -3978,6 +3981,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
case GHCB_MSR_REG_GPA_REQ: {
u64 gfn;

+ if (!sev_snp_guest(vcpu->kvm))
+ goto out_terminate;
+
gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
GHCB_MSR_GPA_VALUE_POS);

@@ -4004,12 +4010,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
reason_set, reason_code);

- vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
- vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
- vcpu->run->system_event.ndata = 1;
- vcpu->run->system_event.data[0] = control->ghcb_gpa;
-
- return 0;
+ goto out_terminate;
}
default:
/* Error, keep GHCB MSR value as-is */
@@ -4020,6 +4021,14 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
control->ghcb_gpa, ret);

return ret;
+
+out_terminate:
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+ vcpu->run->system_event.ndata = 1;
+ vcpu->run->system_event.data[0] = control->ghcb_gpa;
+
+ return 0;
}

int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
--
2.25.1


2024-04-23 16:26:16

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 24/22] [SQUASH] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

Terminate if a non-SNP guest attempts to issue a Page State Change
GHCB request; this is only allowed for SNP.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 088eca85a6ac..0d8fbd5e25fe 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3996,6 +3996,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
break;
}
case GHCB_MSR_PSC_REQ:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto out_terminate;
+
ret = snp_begin_psc_msr(vcpu, control->ghcb_gpa);
break;
case GHCB_MSR_TERM_REQ: {
--
2.25.1


2024-04-23 16:28:10

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 27/22] [SQUASH] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

As with SVM_VMGEXIT_PSC, ensure that SVM_VMGEXIT_GUEST_REQUEST can only
be issued by an SNP guest.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0e22f588dbe4..2b30b3b0eec8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3292,10 +3292,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_TERM_REQUEST:
break;
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_GUEST_REQUEST:
if (!sev_snp_guest(vcpu->kvm))
goto vmgexit_err;
break;
- case SVM_VMGEXIT_GUEST_REQUEST:
case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
--
2.25.1


2024-04-23 16:48:05

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 29/22] [SQUASH] KVM: SEV: Support SEV-SNP AP Creation NAE event

Return an error if a non-SNP guest issues an AP Creation request.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ff64ed8df301..1137a7f4136b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3280,6 +3280,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
break;
case SVM_VMGEXIT_AP_CREATION:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto vmgexit_err;
if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
if (!kvm_ghcb_rax_is_valid(svm))
goto vmgexit_err;
--
2.25.1


2024-04-23 16:50:42

by Michael Roth

[permalink] [raw]
Subject: [PATCH v14 28/22] [SQUASH] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Ensure an error is returned if a non-SNP guest attempts to issue an
Extended Guest Request. Also add input validation for RAX/RBX.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2b30b3b0eec8..ff64ed8df301 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3297,6 +3297,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
break;
case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ if (!sev_snp_guest(vcpu->kvm))
+ goto vmgexit_err;
+ if (!kvm_ghcb_rax_is_valid(svm) ||
+ !kvm_ghcb_rbx_is_valid(svm))
+ goto vmgexit_err;
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
--
2.25.1


2024-04-23 22:05:03

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v14 28/22] [SQUASH] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

On Tue Apr 23, 2024 at 7:21 PM EEST, Michael Roth wrote:
> Ensure an error is returned if a non-SNP guest attempts to issue an
> Extended Guest Request. Also add input validation for RAX/RBX.
>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2b30b3b0eec8..ff64ed8df301 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3297,6 +3297,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> goto vmgexit_err;
> break;
> case SVM_VMGEXIT_EXT_GUEST_REQUEST:
> + if (!sev_snp_guest(vcpu->kvm))
> + goto vmgexit_err;
> + if (!kvm_ghcb_rax_is_valid(svm) ||
> + !kvm_ghcb_rbx_is_valid(svm))
> + goto vmgexit_err;

Hmm... maybe I'm ignoring something but why this is not just:

if (!sev_snp_guest(vcpu->kvm) ||
!kvm_ghcb_rax_is_valid(svm) ||
!kvm_ghcb_rbx_is_valid(svm))

since they branch to the same location.

BR, Jarkko

2024-04-24 20:22:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 03/22] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests

On Sun, Apr 21, 2024, Michael Roth wrote:
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6e31cb408dd8..1d2264e93afe 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -33,9 +33,11 @@
> #include "cpuid.h"
> #include "trace.h"
>
> -#define GHCB_VERSION_MAX 1ULL
> +#define GHCB_VERSION_MAX 2ULL
> #define GHCB_VERSION_MIN 1ULL

This needs a userspace control. Being unable to limit the GHCB version advertised
to the guest is going to break live migration of SEV-ES VMs, e.g. if a pool of
hosts has some kernels running this flavor of KVM, and some hosts running an
older KVM that doesn't support v2.

2024-04-24 23:58:48

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 06/22] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Sun, Apr 21, 2024, Michael Roth wrote:
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 9d08d1202544..d3ae4ded91df 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -258,6 +258,35 @@ static void sev_decommission(unsigned int handle)
> sev_guest_decommission(&decommission, NULL);
> }
>
> +static int snp_page_reclaim(u64 pfn)
> +{
> + struct sev_data_snp_page_reclaim data = {0};
> + int err, rc;
> +
> + data.paddr = __sme_set(pfn << PAGE_SHIFT);
> + rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> + if (WARN_ON_ONCE(rc)) {

This now WARNs here *and* in the caller, which is overkill. Pick one.

> + /*
> + * This shouldn't happen under normal circumstances, but if the
> + * reclaim failed, then the page is no longer safe to use.

Explain _why_ it's unsafe. The code makes it quite clear that reclaim shouldn't
fail. E.g. I assume it's bound to the ASID still? Something else?

This is all messy, too. snp_launch_update_vmsa() does RECLAIM but doesn't
convert the page back to shared, which seems wrong. sev_gmem_post_populate()
open codes the calls separately, and then snp_cleanup_guest_buf() bundles them
together.

Yeesh, and the return types are all mixed. snp_cleanup_guest_buf() returns a
bool on success, but the helpers it uses return 0/-errno. Please convert
everything to return 0/-errno, boolean "error codes" are hard to follow and make
the code unnecessarily brittle.

> + if (WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO && !src)) {
> + ret = -EINVAL;

Just return -EINVAL, no?

> + goto out;
> + }
> +
> + for (gfn = gfn_start, i = 0; gfn < gfn_start + npages; gfn++, i++) {
> + struct sev_data_snp_launch_update fw_args = {0};
> + bool assigned;
> + void *vaddr;
> + int level;
> +
> + if (!kvm_mem_is_private(kvm, gfn)) {
> + pr_debug("%s: Failed to ensure GFN 0x%llx has private memory attribute set\n",
> + __func__, gfn);
> + ret = -EINVAL;
> + break;
> + }
> +
> + ret = snp_lookup_rmpentry((u64)pfn + i, &assigned, &level);
> + if (ret || assigned) {
> + pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
> + __func__, gfn, ret, assigned);
> + ret = -EINVAL;
> + break;
> + }
> +
> + if (src) {
> + vaddr = kmap_local_pfn(pfn + i);
> + ret = copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE);
> + if (ret) {
> + pr_debug("Failed to copy source page into GFN 0x%llx\n", gfn);
> + goto out_unmap;
> + }
> + }
> +
> + ret = rmp_make_private(pfn + i, gfn << PAGE_SHIFT, PG_LEVEL_4K,
> + sev_get_asid(kvm), true);
> + if (ret) {
> + pr_debug("%s: Failed to mark RMP entry for GFN 0x%llx as private, ret: %d\n",
> + __func__, gfn, ret);
> + goto out_unmap;
> + }
> +
> + n_private++;
> +
> + fw_args.gctx_paddr = __psp_pa(sev->snp_context);
> + fw_args.address = __sme_set(pfn_to_hpa(pfn + i));
> + fw_args.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
> + fw_args.page_type = sev_populate_args->type;
> + ret = __sev_issue_cmd(sev_populate_args->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> + &fw_args, &sev_populate_args->fw_error);
> + if (ret) {
> + pr_debug("%s: SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
> + __func__, ret, sev_populate_args->fw_error);
> +
> + if (WARN_ON_ONCE(snp_page_reclaim(pfn + i)))
> + goto out_unmap;
> +
> + /*
> + * When invalid CPUID function entries are detected,
> + * firmware writes the expected values into the page and
> + * leaves it unencrypted so it can be used for debugging
> + * and error-reporting.
> + *
> + * Copy this page back into the source buffer so
> + * userspace can use this information to provide
> + * information on which CPUID leaves/fields failed CPUID
> + * validation.
> + */
> + if (sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
> + sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
> + if (WARN_ON_ONCE(host_rmp_make_shared(pfn + i, PG_LEVEL_4K)))
> + goto out_unmap;
> +
> + if (copy_to_user(src + i * PAGE_SIZE, vaddr,
> + PAGE_SIZE))
> + pr_debug("Failed to write CPUID page back to userspace\n");
> +
> + /* PFN is hypervisor-owned at this point, skip cleanup for it. */
> + n_private--;

If reclaim or make_shared fails, KVM will attempt make_shared again. And I am
not a fan of keeping vaddr mapped after making the pfn private. Functionally
it probably changes nothing, but conceptually it's odd. And "mapping" is free
(AFAIK). Oh, and vaddr is subtly guaranteed to be valid based on the type, which
is fun.

So why not unmap immediately after the first copy_from_user(), and then this
can be:


if (ret)
goto err;

}

return 0;

err:
if (!<reclaim and make shared>() &&
sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
vaddr = kmap_local(...);
copy_to_user(...);
kunmap_local(vaddr);
}
for (i = 0; i < n_private - 1; i++)
<reclaim? and make shared()>

return ret;

> + sev_populate_args->fw_error == SEV_RET_INVALID_PARAM)
> + }
> + }
> +
> +out_unmap:
> + kunmap_local(vaddr);
> + if (ret)
> + break;
> + }
> +
> +out:
> + if (ret) {
> + pr_debug("%s: exiting with error ret %d, restoring %d gmem PFNs to shared.\n",
> + __func__, ret, n_private);
> + for (i = 0; i < n_private; i++)
> + WARN_ON_ONCE(host_rmp_make_shared(pfn + i, PG_LEVEL_4K));
> + }
> +
> + return ret;
> +}

...

> + src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
> +
> + count = kvm_gmem_populate(kvm, params.gfn_start, src, npages,
> + sev_gmem_post_populate, &sev_populate_args);
> + if (count < 0) {
> + argp->error = sev_populate_args.fw_error;
> + pr_debug("%s: kvm_gmem_populate failed, ret %ld (fw_error %d)\n",
> + __func__, count, argp->error);
> + ret = -EIO;
> + } else if (count <= npages) {

This seems like excessive paranoia.

> + params.gfn_start += count;
> + params.len -= count * PAGE_SIZE;
> + if (params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO)
> + params.uaddr += count * PAGE_SIZE;
> +
> + ret = copy_to_user(u64_to_user_ptr(argp->data), &params, sizeof(params))
> + ? -EIO : 0;

copy_to_user() failures typically return -EFAULT. The style is not the most
readable. Maybe this?

ret = 0;
if (copy_to_user(...))
ret = -EFAULT;

> + } else {
> + WARN_ONCE(1, "Completed page count %ld exceeds requested amount %ld",
> + count, npages);
> + ret = -EINVAL;
> + }
> +
> +out:
> + mutex_unlock(&kvm->slots_lock);
> +
> + return ret;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -2217,6 +2451,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SNP_LAUNCH_START:
> r = snp_launch_start(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SNP_LAUNCH_UPDATE:
> + r = snp_launch_update(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> --
> 2.25.1
>

2024-04-25 00:15:50

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

On Sun, Apr 21, 2024, Michael Roth wrote:
> These commands can be used to pause servicing of guest attestation
> requests. This is useful when updating the reported TCB or signing key with
> commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since they may
> in turn require updates to userspace-supplied certificates, and if an
> attestation request happens to be in-flight at the time those updates
> are occurring there is potential for a guest to receive a certificate
> blob that is out of sync with the effective signing key for the
> attestation report.
>
> These interfaces also provide some versatility with how similar
> firmware/certificate update activities can be handled in the future.

Wait, IIUC, this is using the kernel to get two userspace components to not
stomp over each other. Why is this the kernel's problem to solve?

2024-04-25 00:17:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 17/22] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

On Sun, Apr 21, 2024, Michael Roth wrote:
> From: Ashish Kalra <[email protected]>
>
> With SNP/guest_memfd, private/encrypted memory should not be mappable,
> and MMU notifications for HVA-mapped memory will only be relevant to
> unencrypted guest memory. Therefore, the rationale behind issuing a
> wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() should not apply
> for SNP guests and can be ignored.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> Reviewed-by: Paolo Bonzini <[email protected]>
> [mdr: Add some clarifications in commit]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 243369e302f4..cf00a811aca5 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3042,7 +3042,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
>
> void sev_guest_memory_reclaimed(struct kvm *kvm)
> {
> - if (!sev_guest(kvm))
> + /*
> + * With SNP+gmem, private/encrypted memory should be

Not should, *is*.

> + * unreachable via the hva-based mmu notifiers. Additionally,
> + * for shared->private translations, H/W coherency will ensure
> + * first guest access to the page would clear out any existing
> + * dirty copies of that cacheline.

I don't see how this second part is relevant.

> + */
> + if (!sev_guest(kvm) || sev_snp_guest(kvm))
> return;
>
> wbinvd_on_all_cpus();
> --
> 2.25.1
>

2024-04-25 00:45:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 16/22] KVM: x86: Implement gmem hook for determining max NPT mapping level

On Sun, Apr 21, 2024, Michael Roth wrote:
> ---
> arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/svm/svm.h | 7 +++++++
> 3 files changed, 40 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index ff9b8c68ae56..243369e302f4 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4528,3 +4528,35 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> cond_resched();
> }
> }
> +
> +/*
> + * Re-check whether an #NPF for a private/gmem page can still be serviced, and
> + * adjust maximum mapping level if needed.
> + */
> +int sev_gmem_validate_fault(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,

This is a misleading name. The primary purpose is not to validate the fault, the
primary purpose is to get the max mapping level. The fact that this can fail
should not dictate the name.

I also think we should skip the call if the max level is already PG_LEVEL_4K.
Something _could_ race and invalidate the RMP, but that's _exactly_ why KVM
guards the page fault path with mmu_invalidate_seq.

Actually, is returning an error in this case even correct? Me thinks no. If
something invalidates the RMP between kvm_gmem_get_pfn() and getting the mapping
level, then KVM should retry, which mmu_invalidate_seq handles. Returning
-EINVAL and killing the VM is wrong.

And IMO, "gmem" shouldn't be in the name, this is not a hook from guest_memfd,
it's a hook for mapping private memory. And as someone called out somewhere else,
the "private" parameter is pointless. And for that matter, so is the gfn.

And even _if_ we want to return an error, we could even overload the return code
to handle this, e.g. in the caller:

r = static_call(kvm_x86_max_private_mapping_level)(vcpu->kvm, fault->pfn);
if (r < 0) {
kvm_release_pfn_clean(fault->pfn);
return r;
}

fault->max_level = min(fault->max_level, r);

but what I think we want is:

---
int sev_snp_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
{
int level, rc;
bool assigned;

if (!sev_snp_guest(kvm))
return 0;

rc = snp_lookup_rmpentry(pfn, &assigned, &level);
if (rc || !assigned)
return PG_LEVEL_4K;

return level;
}

static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
u8 max_level, int gmem_order)
{
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;

max_level = min(kvm_max_level_for_order(gmem_order), max_level);
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;

return min(max_level,
static_call(kvm_x86_private_max_mapping_level)(kvm, pfn));
}

static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
struct kvm *kvm = vcpu->kvm;
int max_order, r;

if (!kvm_slot_can_be_private(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}

r = kvm_gmem_get_pfn(kvm, fault->slot, fault->gfn, &fault->pfn, &max_order);
if (r) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return r;
}

fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
fault->max_level = kvm_max_private_mapping_level(kvm, fault->pfn,
fault->max_level, max_order);
return RET_PF_CONTINUE;
}

---

Side topic, the KVM_MEM_READONLY check is unnecessary, KVM doesn't allow RO memslots
to coincide with guest_memfd. I missed that in commit e563592224e0 ("KVM: Make
KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY").

2024-04-25 20:53:11

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v14 03/22] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests

On Wed, Apr 24, 2024 at 01:21:49PM -0700, Sean Christopherson wrote:
> On Sun, Apr 21, 2024, Michael Roth wrote:
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 6e31cb408dd8..1d2264e93afe 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -33,9 +33,11 @@
> > #include "cpuid.h"
> > #include "trace.h"
> >
> > -#define GHCB_VERSION_MAX 1ULL
> > +#define GHCB_VERSION_MAX 2ULL
> > #define GHCB_VERSION_MIN 1ULL
>
> This needs a userspace control. Being unable to limit the GHCB version advertised
> to the guest is going to break live migration of SEV-ES VMs, e.g. if a pool of
> hosts has some kernels running this flavor of KVM, and some hosts running an
> older KVM that doesn't support v2.
>

The requirements for implementing the non-SNP aspects of the GHCB
version 2 protocol are fairly minimal, and KVM_SEV_INIT2 is already
migration incompatible with older kernels running KVM_SEV_ES_INIT (e.g.
migrate to newer host, shutdown, start -> measurement failure). There
are QEMU patches that allow controlling this via QEMU versioned
machine types [1].

So I think it makes sense to go ahead move to GHCB version 2 as the base
version for all SEV-ES/SNP guests created via KVM_SEV_INIT2, and leave
KVM_SEV_ES_INIT restricted to GHCB version 1.

This could be done in a pretty self-contained way for SEV-ES by applying
the following patches from this series which are the version 2 protocol
interfaces also applicable to SEV-ES:

KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: SEV: Add support for GHCB-based termination requests

And then applying the below patch on top to set GHCB version 1 or 2
accordingly for SEV-ES. (and relocating the GHCB_VERSION_MAX bump to the
below patch as well, although it's not really used at that point so
could also just be dropped completely).

Then in the future we can extend KVM_SEV_INIT2 to allow selecting
specific/newer versions of the GHCB protocol when that becomes needed.

If that sounds appropriate I can submit as a separate standalone patchset
for SEV-ES and then make that a prereq for the remaining SNP-specific bits.

-Mike

[1] https://lore.kernel.org/kvm/[email protected]/
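
(For context on what the GHCB_MSR_SEV_INFO() macro used in the patch below
packs into the MSR, here is a hedged reconstruction of the SEV_INFO
response layout from the GHCB spec; the macro name here is illustrative,
not the kernel's own:)

/*
 * GHCBData[63:48] = maximum supported protocol version
 * GHCBData[47:32] = minimum supported protocol version
 * GHCBData[31:24] = guest C-bit position
 * GHCBData[11:0]  = 0x001 (SEV information response)
 */
#define SEV_INFO_RESP(max, min, cbit) \
        (((__u64)(max) << 48) | ((__u64)(min) << 32) | \
         ((__u64)(cbit) << 24) | 0x001)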


commit 114da695c065595f74fe8aa0e9203f3c65175a95
Author: Michael Roth <[email protected]>
Date: Thu Apr 25 14:42:17 2024 -0500

KVM: SEV: Allow per-instance tracking of GHCB protocol version

The GHCB protocol version may be different from one guest to the next.
Add a field to track it and initialize it accordingly when KVM_SEV_INIT,
KVM_SEV_ES_INIT, and KVM_SEV_INIT2 are called.

Signed-off-by: Michael Roth <[email protected]>

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1137a7f4136b..5c16abc47541 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -310,7 +310,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)

static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
struct kvm_sev_init *data,
- unsigned long vm_type)
+ unsigned long vm_type, u16 ghcb_version)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
struct sev_platform_init_args init_args = {0};
@@ -333,6 +333,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
sev->active = true;
sev->es_active = es_active;
sev->vmsa_features = data->vmsa_features;
+ sev->ghcb_version = ghcb_version;

if (vm_type == KVM_X86_SNP_VM)
sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
@@ -376,7 +377,13 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return -EINVAL;

vm_type = (argp->id == KVM_SEV_INIT ? KVM_X86_SEV_VM : KVM_X86_SEV_ES_VM);
- return __sev_guest_init(kvm, argp, &data, vm_type);
+
+ /*
+ * KVM_SEV_ES_INIT has been deprecated by KVM_SEV_INIT2, so it will
+ * continue to only ever support the minimal GHCB protocol version.
+ */
+ return __sev_guest_init(kvm, argp, &data, vm_type,
+ vm_type == KVM_X86_SEV_ES_VM ? GHCB_VERSION_MIN : 0);
}

static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -395,7 +402,15 @@ static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
return -EFAULT;

- return __sev_guest_init(kvm, argp, &data, kvm->arch.vm_type);
+ /*
+ * Currently KVM supports the full range of mandatory features defined by
+ * version 2 of the GHCB protocol, so default to that for SEV/SNP guests
+ * created via KVM_SEV_INIT2. Care should be taken that support for future
+ * versions of the GHCB protocol is made configurable via KVM_SEV_INIT2 to
+ * allow limiting the guests to a particular version to support things
+ * like live migration.
+ */
+ return __sev_guest_init(kvm, argp, &data, kvm->arch.vm_type, 2);
}

static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
@@ -3906,6 +3921,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
u64 ghcb_info;
int ret = 1;

@@ -3916,7 +3932,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)

switch (ghcb_info) {
case GHCB_MSR_SEV_INFO_REQ:
- set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+ set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(sev->ghcb_version,
GHCB_VERSION_MIN,
sev_enc_bit));
break;
@@ -4341,11 +4357,14 @@ void sev_init_vmcb(struct vcpu_svm *svm)

void sev_es_vcpu_reset(struct vcpu_svm *svm)
{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
+
/*
* Set the GHCB MSR value as per the GHCB specification when emulating
* vCPU RESET for an SEV-ES guest.
*/
- set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+ set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(sev->ghcb_version,
GHCB_VERSION_MIN,
sev_enc_bit));

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 28140bc8af27..229cb630b540 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -87,6 +87,7 @@ struct kvm_sev_info {
struct list_head regions_list; /* List of registered regions */
u64 ap_jump_table; /* SEV-ES AP Jump Table address */
u64 vmsa_features;
+ u64 ghcb_version; /* Highest guest GHCB protocol version allowed */
struct kvm *enc_context_owner; /* Owner of copied encryption context */
struct list_head mirror_vms; /* List of VMs mirroring */
struct list_head mirror_entry; /* Use as a list entry of mirrors */

2024-04-26 17:35:46

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

On Wed, Apr 24, 2024 at 05:15:40PM -0700, Sean Christopherson wrote:
> On Sun, Apr 21, 2024, Michael Roth wrote:
> > These commands can be used to pause servicing of guest attestation
> > requests. This useful when updating the reported TCB or signing key with
> > commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since they may
> > in turn require updates to userspace-supplied certificates, and if an
> > attestation request happens to be in-flight at the time those updates
> > are occurring there is potential for a guest to receive a certificate
> > blob that is out of sync with the effective signing key for the
> > attestation report.
> >
> > These interfaces also provide some versatility with how similar
> > firmware/certificate update activities can be handled in the future.
>
> Wait, IIUC, this is using the kernel to get two userspace components to not
> stomp over each other. Why is this the kernel's problem to solve?

It's not that they are stepping on each other, but that kernel and
userspace need to coordinate on updating two components whose updates must
be atomic from the guest's perspective. Take an update to the VLEK key for
instance:

1) management gets a new VLEK endorsement key from KDS along with
associated certificate chain
2) management uses SNP_VLEK_LOAD to update key
3) management updates the certs at the path VMM will grab them
from when the EXT_GUEST_REQUEST userspace exit is issued

If an attestation request comes in after 2), but before 3), then the
guest sees an attestation report signed with the new key, but still
gets the old certificate.

If you reverse the ordering:

1) management gets a new VLEK endorsement key from KDS along with
associated certificate chain
2) management updates the certs at the path VMM will grab them
from when the EXT_GUEST_REQUEST userspace exit is issued
3) management uses SNP_VLEK_LOAD to update key

then an attestation request between 2) and 3) will result in the guest
getting the new cert, but getting an attestation report signed with an old
endorsement key.

Pausing guest attestation requests prior to 2) and resuming after 3)
provides a straightforward way to make those updates atomic to the
guest.
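
(A hedged sketch of that management-side sequence, using the existing
SEV_ISSUE_CMD ioctl on /dev/sev; the SNP_PAUSE_ATTESTATION and
SNP_RESUME_ATTESTATION command IDs are the ones proposed by this series,
and load_vlek_and_certs() is a hypothetical helper covering steps 2) and
3) above:)

#include <sys/ioctl.h>
#include <linux/psp-sev.h>

int load_vlek_and_certs(int sev_fd); /* hypothetical: steps 2) and 3) */

static int sev_platform_cmd(int sev_fd, __u32 id)
{
        struct sev_issue_cmd cmd = { .cmd = id };

        return ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);
}

static int update_signing_key(int sev_fd)
{
        int ret;

        if (sev_platform_cmd(sev_fd, SNP_PAUSE_ATTESTATION))
                return -1;

        ret = load_vlek_and_certs(sev_fd);

        /* Resume even on failure so guest attestation isn't blocked forever. */
        sev_platform_cmd(sev_fd, SNP_RESUME_ATTESTATION);
        return ret;
}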

-Mike

2024-04-26 17:58:25

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v14 22/22] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

On Wed, Apr 24, 2024 at 05:10:31PM -0700, Sean Christopherson wrote:
> On Sun, Apr 21, 2024, Michael Roth wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 85099198a10f..6cf186ed8f66 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7066,6 +7066,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> > struct kvm_user_vmgexit {
> > #define KVM_USER_VMGEXIT_PSC_MSR 1
> > #define KVM_USER_VMGEXIT_PSC 2
> > + #define KVM_USER_VMGEXIT_EXT_GUEST_REQ 3
>
> Assuming we can't get massage this into a vendor agnostic exit, there's gotta be
> a better name than EXT_GUEST_REQ, which is completely meaningless to me and probably
> most other people that aren't intimately familiar with the specs. Request what?

EXT_CERT_REQ maybe? That's effectively all it boils down to, fetching
the GHCB-defined certificate blob from userspace.
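
(As a hedged illustration of the userspace side of that exit, using the
field names from the proposed struct kvm_user_vmgexit; the exit-reason
name follows this series, and fetch_certs() is a hypothetical VMM
helper:)

case KVM_EXIT_VMGEXIT:
        if (run->vmgexit.type == KVM_USER_VMGEXIT_EXT_GUEST_REQ) {
                struct kvm_user_vmgexit *vg = &run->vmgexit;

                /*
                 * fetch_certs() copies the certificate blob into the shared
                 * guest buffer at data_gpa; if the buffer is too small it
                 * updates data_npages with the required size and fails.
                 */
                if (fetch_certs(vg->ext_guest_req.data_gpa,
                                &vg->ext_guest_req.data_npages))
                        vg->ext_guest_req.ret = SNP_GUEST_VMM_ERR_INVALID_LEN;
                else
                        vg->ext_guest_req.ret = 0;
        }
        break;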

>
> > __u32 type; /* KVM_USER_VMGEXIT_* type */
> > union {
> > struct {
> > @@ -3812,6 +3813,84 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> > }
> >
> > +static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
> > +{
> > + struct vcpu_svm *svm = to_svm(vcpu);
> > + struct vmcb_control_area *control;
> > + struct kvm *kvm = vcpu->kvm;
> > + sev_ret_code fw_err = 0;
> > + int vmm_ret;
> > +
> > + vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
> > + if (vmm_ret) {
> > + if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
> > + vcpu->arch.regs[VCPU_REGS_RBX] =
> > + vcpu->run->vmgexit.ext_guest_req.data_npages;
> > + goto abort_request;
> > + }
> > +
> > + control = &svm->vmcb->control;
> > +
> > + /*
> > + * To avoid the message sequence number getting out of sync between the
> > + * actual value seen by firmware verses the value expected by the guest,
> > + * make sure attestations can't get paused on the write-side at this
> > + * point by holding the lock for the entire duration of the firmware
> > + * request so that there is no situation where SNP_GUEST_VMM_ERR_BUSY
> > + * would need to be returned after firmware sees the request.
> > + */
> > + mutex_lock(&snp_pause_attestation_lock);
>
> Why is this in KVM? IIUC, KVM only needs to get involved to translate GFNs to
> PFNs, the rest can go in the sev-dev driver, no? The whole split is weird,
> seemingly due to KVM "needing" to take this lock. E.g. why is core kernel code
> in arch/x86/virt/svm/sev.c at all dealing with attestation goo, when pretty much
> all of the actual usage is (or can be) in sev-dev.c

It would be very tempting to have all this locking tucked away in sev-dev
driver, but KVM is a part of that sequence because it needs to handle fetching
the certificate that will be returned to the guest as part of the
attestation request. The transaction ID updates from the PAUSE/RESUME
commands are technically enough to satisfy this requirement, in which case
KVM wouldn't need to take the lock directly, only check that the
transaction ID isn't stale and report -EBUSY/-EAGAIN to the guest if it
is.

But if the request actually gets sent to firmware, and we only decide
afterward that the transaction is stale and thus the attestation response
and certificate might not be in sync, then reporting -EBUSY/-EAGAIN will
permanently hose the guest because the message seq fields will be out of
sync between firmware and the guest kernel. That's why we need to hold the
lock to actually block a transaction ID update from occurring once we've
committed to sending the attestation request to firmware.

>
> > +
> > + if (__snp_transaction_is_stale(svm->snp_transaction_id))
> > + vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
> > + else if (!__snp_handle_guest_req(kvm, control->exit_info_1,
> > + control->exit_info_2, &fw_err))
> > + vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > +
> > + mutex_unlock(&snp_pause_attestation_lock);
> > +
> > +abort_request:
> > + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> > +
> > + return 1; /* resume guest */
> > +}
> > +
> > +static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> > +{
> > + int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > + struct vcpu_svm *svm = to_svm(vcpu);
> > + unsigned long data_npages;
> > + sev_ret_code fw_err;
> > + gpa_t data_gpa;
> > +
> > + if (!sev_snp_guest(vcpu->kvm))
> > + goto abort_request;
> > +
> > + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> > +
> > + if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
> > + goto abort_request;
> > +
> > + svm->snp_transaction_id = snp_transaction_get_id();
> > + if (snp_transaction_is_stale(svm->snp_transaction_id)) {
>
> Why bother? I assume this is rare, so why not handle it on the backend, i.e.
> after userspace does its thing? Then KVM doesn't even have to care about
> checking for stale IDs, I think?

That's true, it's little more than a very minor performance optimization
to do it here rather than on the return path, so it could definitely be
dropped.

>
> None of this makes much sense to my eyes, e.g. AFAICT, the only thing that can
> pause attestation is userspace, yet the kernel is responsible for tracking whether
> or not a transaction is fresh?

Pausing is essentially the start of an update transaction initiated by
userspace. So the kernel is tracking whether such a transaction has been
started or completed since it first started servicing the attestation
request; otherwise there could be a mismatch between the endorsement key
used to sign the attestation report and the certificate for that
endorsement key that was fetched from userspace.
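
To illustrate the idea, here's a simplified sketch. The helper names
mirror the patch, but the bodies and the 'paused' flag are assumptions,
not the actual code:

#include <linux/mutex.h>
#include <linux/types.h>

static DEFINE_MUTEX(snp_pause_attestation_lock);
static u64 snp_transaction_id;
static bool snp_attestation_paused;

/* Snapshot the transaction ID under which a guest request starts. */
u64 snp_transaction_get_id(void)
{
        u64 id;

        mutex_lock(&snp_pause_attestation_lock);
        id = snp_transaction_id;
        mutex_unlock(&snp_pause_attestation_lock);

        return id;
}

/*
 * Caller holds snp_pause_attestation_lock. A request is stale if an
 * update transaction started (PAUSE sets 'paused') or completed
 * (RESUME bumps the ID) since the snapshot was taken.
 */
bool __snp_transaction_is_stale(u64 id)
{
        return snp_attestation_paused || id != snp_transaction_id;
}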

-Mike

2024-04-26 19:57:18

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

On Fri, Apr 26, 2024, Michael Roth wrote:
> On Wed, Apr 24, 2024 at 05:15:40PM -0700, Sean Christopherson wrote:
> > On Sun, Apr 21, 2024, Michael Roth wrote:
> > > These commands can be used to pause servicing of guest attestation
> > > requests. This is useful when updating the reported TCB or signing key with
> > > commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since they may
> > > in turn require updates to userspace-supplied certificates, and if an
> > > attestation request happens to be in-flight at the time those updates
> > > are occurring there is potential for a guest to receive a certificate
> > > blob that is out of sync with the effective signing key for the
> > > attestation report.
> > >
> > > These interfaces also provide some versatility with how similar
> > > firmware/certificate update activities can be handled in the future.
> >
> > Wait, IIUC, this is using the kernel to get two userspace components to not
> > stomp over each other. Why is this the kernel's problem to solve?
>
> It's not that they are stepping on each other, but that kernel and
> userspace need to coordinate on updating 2 components whose updates need
> to be atomic from a guest perspective. Take an update to the VLEK key, for
> instance:
>
> 1) management gets a new VLEK endorsement key from KDS along with

What is "management"? I assume it's some userspace daemon?

> associated certificate chain
> 2) management uses SNP_VLEK_LOAD to update key
> 3) management updates the certs at the path VMM will grab them
> from when the EXT_GUEST_REQUEST userspace exit is issued
>
> If an attestation request comes in after 2), but before 3), then the
> guest sees an attestation report signed with the new key, but still
> gets the old certificate.
>
> If you reverse the ordering:
>
> 1) management gets a new VLEK endorsement key from KDS along with
> associated certificate chain
> 2) management updates the certs at the path VMM will grab them
> from when the EXT_GUEST_REQUEST userspace exit is issued
> 3) management uses SNP_VLEK_LOAD to update key
>
> then an attestation request between 2) and 3) will result in the guest
> getting the new cert, but getting an attestation report signed with an old
> endorsement key.
>
> Providing a way to pause guest attestation requests prior to 2), and
> resume after 3), provides a straightforward way to make those updates
> atomic to the guest.

Assuming "management" is a userspace component, I still don't see why this
requires kernel involvement. "management" can tell VMMs to pause attestation
without having to bounce through the kernel. It doesn't even require a push
model, e.g. wrap/redirect the certs with a file that has a "pause" flag and a
sequence counter.
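
E.g. something like the following hypothetical wrapper around the certs
blob (purely illustrative, no such format is defined anywhere today):

#include <linux/types.h>

struct snp_cert_wrapper {
        __u32 paused;    /* nonzero while a key/cert update is in flight */
        __u64 seq;       /* bumped on every paired key+cert update */
        __u32 certs_len;
        __u8 certs[];    /* GHCB-defined certificate table follows */
};

A reader that sees 'paused' set, or sees 'seq' change across its read,
simply retries.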

2024-04-26 21:54:13

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

On Fri, Apr 26, 2024 at 12:57:08PM -0700, Sean Christopherson wrote:
> On Fri, Apr 26, 2024, Michael Roth wrote:
> > On Wed, Apr 24, 2024 at 05:15:40PM -0700, Sean Christopherson wrote:
> > > On Sun, Apr 21, 2024, Michael Roth wrote:
> > > > These commands can be used to pause servicing of guest attestation
> > > > requests. This is useful when updating the reported TCB or signing key with
> > > > commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since they may
> > > > in turn require updates to userspace-supplied certificates, and if an
> > > > attestation request happens to be in-flight at the time those updates
> > > > are occurring there is potential for a guest to receive a certificate
> > > > blob that is out of sync with the effective signing key for the
> > > > attestation report.
> > > >
> > > > These interfaces also provide some versatility with how similar
> > > > firmware/certificate update activities can be handled in the future.
> > >
> > > Wait, IIUC, this is using the kernel to get two userspace components to not
> > > stomp over each other. Why is this the kernel's problem to solve?
> >
> > It's not that they are stepping on each other, but that kernel and
> > userspace need to coordinate on updating 2 components whose updates need
> > to be atomic from a guest perspective. Take an update to the VLEK key, for
> > instance:
> >
> > 1) management gets a new VLEK endorsement key from KDS along with
>
> What is "management"? I assume it's some userspace daemon?

It could be a daemon depending on the cloud provider, but the main example
we have in mind is something more basic like virtee[1] being used to
interactively perform an update at the command-line. E.g. you point it
at the new VLEK, the new cert, and it will handle updating the certs at
some known location and issuing the SNP_LOAD_VLEK command. With this
interface, it can take the additional step of PAUSE'ing attestations
before performing either update to keep the 2 actions in sync with the
guest view.

[1] https://github.com/virtee/snphost
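
For illustration, driving those two steps through the existing /dev/sev
SEV_ISSUE_CMD ioctl might look something like the sketch below. The
SNP_PAUSE_ATTESTATION/SNP_RESUME_ATTESTATION command values (and whether
they carry a payload) are defined only by this series, so treat this as
pseudo-uapi:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/psp-sev.h>

static int snp_simple_cmd(int sev_fd, unsigned int cmd)
{
        struct sev_issue_cmd arg = { .cmd = cmd };

        if (ioctl(sev_fd, SEV_ISSUE_CMD, &arg) < 0) {
                fprintf(stderr, "cmd %u failed, fw error 0x%x\n",
                        cmd, arg.error);
                return -1;
        }

        return 0;
}

int main(void)
{
        int sev_fd = open("/dev/sev", O_RDWR);

        if (sev_fd < 0)
                return 1;

        snp_simple_cmd(sev_fd, SNP_PAUSE_ATTESTATION);
        /* ... SNP_VLEK_LOAD, then update certs at the known location ... */
        snp_simple_cmd(sev_fd, SNP_RESUME_ATTESTATION);

        return 0;
}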

>
> > associated certificate chain
> > 2) management uses SNP_VLEK_LOAD to update key
> > 3) management updates the certs at the path VMM will grab them
> > from when the EXT_GUEST_REQUEST userspace exit is issued
> >
> > If an attestation request comes in after 2), but before 3), then the
> > guest sees an attestation report signed with the new key, but still
> > gets the old certificate.
> >
> > If you reverse the ordering:
> >
> > 1) management gets a new VLEK endorsement key from KDS along with
> > associated certificate chain
> > 2) management updates the certs at the path VMM will grab them
> > from when the EXT_GUEST_REQUEST userspace exit is issued
> > 3) management uses SNP_VLEK_LOAD to update key
> >
> > then an attestation request between 2) and 3) will result in the guest
> > getting the new cert, but getting an attestation report signed with an old
> > endorsement key.
> >
> > Providing a way to pause guest attestation requests prior to 2), and
> > resume after 3), provides a straightforward way to make those updates
> > atomic to the guest.
>
> Assuming "management" is a userspace component, I still don't see why this
> requires kernel involvement. "management" can tell VMMs to pause attestation
> without having to bounce through the kernel. It doesn't even require a push

That would mean a tool like virtee above would need to issue kernel
commands like SNP_LOAD_VLEK to handle the key update, then implement some
VMM-specific hook to pause servicing of EXT_GUEST_REQ (or whatever we
end up calling it). QEMU could define events for this, and libvirt could
implement them, and virtee could interact with libvirt to issue them in
place of the PAUSE/RESUME approach here.

But SNP libvirt support is a ways out, and the QEMU event mechanism for
this would be a pain to use directly because you'd need some custom way to
enumerate all the guests in order to issue the events to them. But then
maybe the provider doesn't even use QEMU and has to invent something else.
Or they just decide to pause all guests before performing updates, but that
still means a potentially significant amount of downtime.

> without having to bounce through the kernel. It doesn't even require a push
> model, e.g. wrap/redirect the certs with a file that has a "pause" flag and a
> sequence counter.

We could do something like flag the certificate file itself; it does
sound less painful than the above. But what defines that spec? The GHCB
completely defines the current format of the certs blob, so if we wrap
that in another layer we need to extend the GHCB, or have something else
be the authority on what that wrapper looks like, and tools like virtee
would need to be very selective about which VMMs they can claim to
support based on what file format those VMMs support... it just seems
like a significant and unnecessary pain that every userspace
implementation will need to go through to achieve the same basic
functionality.

With PAUSE/RESUME, tools like virtee can be completely VMM-agnostic, and
more highly-integrated daemon-based approaches can still benefit from a
common mechanism that doesn't require significant coordination with VMM
processes. For something as important and basic as updating endorsement
keys while guests are running, it seems worthwhile to expose this minimal
level of control to userspace.

-Mike

2024-04-27 00:10:20

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

On Fri, Apr 26, 2024, Michael Roth wrote:
> On Fri, Apr 26, 2024 at 12:57:08PM -0700, Sean Christopherson wrote:
> > On Fri, Apr 26, 2024, Michael Roth wrote:
> > What is "management"? I assume its some userspace daemon?
>
> It could be a daemon depending on cloud provider, but the main example
> we have in mind is something more basic like virtee[1] being used to
> interactively perform an update at the command-line. E.g. you point it
> at the new VLEK, the new cert, and it will handle updating the certs at
> some known location and issuing the SNP_LOAD_VLEK command. With this
  ^^^^^^^^^^^^^^^^^^^
> interface, it can take the additional step of PAUSE'ing attestations
> before performing either update to keep the 2 actions in sync with the
> guest view.

...

> > without having to bounce through the kernel. It doesn't even require a push
> > model, e.g. wrap/redirect the certs with a file that has a "pause" flag and a
> > sequence counter.
>
> > We could do something like flag the certificate file itself; it does
> > sound less painful than the above. But what defines that spec?

Whoever defines "some known location". And it doesn't need to be a file wrapper,
e.g. put the cert in a directory along with a lock. Actually, IIUC, there doesn't
even need to be a separate lock file. I know very little about userspace programming,
but common sense and a quick search tells me that file locks are a solved problem.

E.g. it took me ~5 minutes of Googling to come up with this, which AFAICT does
exactly what you want.

touch ~/vlek.cert
(
    flock -e 200
    echo "Locked the cert, sleeping for 10 seconds"
    sleep 10
    echo "Igor, it's alive!!!!!!"
) 200< ~/vlek.cert

touch ~/vlek.cert
(
    flock -s 201
    echo "Got me a shared lock, no updates for you!"
) 201< ~/vlek.cert

2024-04-27 01:32:40

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

On Fri, Apr 26, 2024 at 05:10:10PM -0700, Sean Christopherson wrote:
> On Fri, Apr 26, 2024, Michael Roth wrote:
> > On Fri, Apr 26, 2024 at 12:57:08PM -0700, Sean Christopherson wrote:
> > > On Fri, Apr 26, 2024, Michael Roth wrote:
> > > What is "management"? I assume its some userspace daemon?
> >
> > It could be a daemon depending on cloud provider, but the main example
> > we have in mind is something more basic like virtee[1] being used to
> > interactively perform an update at the command-line. E.g. you point it
> > at the new VLEK, the new cert, and it will handle updating the certs at
> > some known location and issuing the SNP_LOAD_VLEK command. With this
>   ^^^^^^^^^^^^^^^^^^^
> > interface, it can take the additional step of PAUSE'ing attestations
> > before performing either update to keep the 2 actions in sync with the
> > guest view.
>
> ...
>
> > > without having to bounce through the kernel. It doesn't even require a push
> > > model, e.g. wrap/redirect the certs with a file that has a "pause" flag and a
> > > sequence counter.
> >
> > We could do something like flag the certificate file itself; it does
> > sound less painful than the above. But what defines that spec?
>
> Whoever defines "some known location". And it doesn't need to be a file wrapper,

"some known location" is a necessary and simple parameter controlled by
the cloud provider, so it's easy to make a tool like virtee aware of those
what those parameters should be for any particular environment. But it's not
easy to make it aware of a particular providers internal way of synchronizing
guests and certs access. We'd be somewhat dependent of those providers either
providing hooks to allow for better integration, which is "work" that might
encourage them to just brew their own solutions, versus...

Providing a simple reference scheme that's clearly defined and easily
adoptable across the board, which is "less work", and makes adoption of
common tools/libraries for SNP cert/key management easier because we
don't need to directly involve a provider's internal guest management
mechanisms in those tools.

> e.g. put the cert in a directory along with a lock. Actually, IIUC, there doesn't
> even need to be a separate lock file. I know very little about userspace programming,
> but common sense and a quick search tells me that file locks are a solved problem.
>
> E.g. it took me ~5 minutes of Googling to come up with this, which AFAICT does
> exactly what you want.
>
> touch ~/vlek.cert
> (
>     flock -e 200
>     echo "Locked the cert, sleeping for 10 seconds"
>     sleep 10
>     echo "Igor, it's alive!!!!!!"
> ) 200< ~/vlek.cert
>
> touch ~/vlek.cert
> (
>     flock -s 201
>     echo "Got me a shared lock, no updates for you!"
> ) 201< ~/vlek.cert
>

Hmm... I did completely miss this option. But I think there are still some
issues here. IIUC you're suggesting (for example):

"Management":
a) writelock vlek.cert
b) perform SNP_LOAD_VLEK and update vlek.cert contents
c) unlock vlek.cert

"QEMU":
a) readlock vlek.cert
b) copy cert into guest buffer
c) unlock vlek.cert

The issue is that after "QEMU" unlocks and returns the cert to KVM, we'll
have:

"KVM"
a) return from EXT_GUEST_REQ exit to userspace
b) issue the attestation report to firmware
c) return the attestation report and cert to the guest

Between a) and b), "Management" can complete another entire update, but
the cert that it passes back to the guest will be stale relative to the
key used to sign the attestation report.

If we need to take more time to explore other options, it's not
absolutely necessary to have the kernel solve this now. But every userspace
will need to solve it in some way, so it seemed like it might be nice to
have a simple reference implementation to start with.

-Mike

2024-04-29 14:27:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v14 21/22] crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands

On Fri, Apr 26, 2024, Michael Roth wrote:
> On Fri, Apr 26, 2024 at 05:10:10PM -0700, Sean Christopherson wrote:
> > e.g. put the cert in a directory along with a lock. Actually, IIUC, there doesn't
> > even need to be a separate lock file. I know very little about userspace programming,
> > but common sense and a quick search tells me that file locks are a solved problem.
> >
> > E.g. it took me ~5 minutes of Googling to come up with this, which AFAICT does
> > exactly what you want.
> >
> > touch ~/vlek.cert
> > (
> >     flock -e 200
> >     echo "Locked the cert, sleeping for 10 seconds"
> >     sleep 10
> >     echo "Igor, it's alive!!!!!!"
> > ) 200< ~/vlek.cert
> >
> > touch ~/vlek.cert
> > (
> >     flock -s 201
> >     echo "Got me a shared lock, no updates for you!"
> > ) 201< ~/vlek.cert
> >
>
> Hmm... I did completely miss this option. But I think there are still some
> issues here. IIUC you're suggesting (for example):
>
> "Management":
> a) writelock vlek.cert
> b) perform SNP_LOAD_VLEK and update vlek.cert contents
> c) unlock vlek.cert
>
> "QEMU":
> a) readlock vlek.cert
> b) copy cert into guest buffer
> c) unlock vlek.cert
>
> The issue is that after "QEMU" unlocks and returns the cert to KVM, we'll
> have:
>
> "KVM"
> a) return from EXT_GUEST_REQ exit to userspace
> b) issue the attestation report to firmware
> c) return the attestation report and cert to the guest
>
> Between a) and b), "Management" can complete another entire update, but
> the cert that it passes back to the guest will be stale relative to the
> key used to sign the attestation report.

I was thinking userspace would hold the lock across SEV_CMD_SNP_GUEST_REQUEST.

QEMU:
a) readlock vlek.cert
b) copy cert into guest buffer
c) set kvm_run->immediate_exit
d) invoke KVM_RUN
e) KVM sends SEV_CMD_SNP_GUEST_REQUEST to PSP
f) KVM exits to userspace with -EINTR
g) unlock vlek.cert
h) invoke KVM_RUN (resume the guest)
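
In rough code, from the VMM side (sketch only; kvm_run->immediate_exit
and KVM_RUN are existing KVM API, the cert plumbing here is made up):

#include <sys/file.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void complete_ext_guest_req(int vcpu_fd, int cert_fd,
                                   struct kvm_run *run)
{
        flock(cert_fd, LOCK_SH);        /* a) readlock vlek.cert */
        /* b) copy certs into the buffer the vmgexit exit describes */
        run->immediate_exit = 1;        /* c) */
        ioctl(vcpu_fd, KVM_RUN, 0);     /* d)-f): KVM completes the exit,
                                         * sends SEV_CMD_SNP_GUEST_REQUEST
                                         * to the PSP, then returns -EINTR */
        flock(cert_fd, LOCK_UN);        /* g) unlock vlek.cert */
        run->immediate_exit = 0;
        ioctl(vcpu_fd, KVM_RUN, 0);     /* h) resume the guest */
}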

> If we need to take more time to explore other options it's not
> absolutely necessary to have the kernel solve this now. But every userspace
> will need to solve it in some way so it seemed like it might be nice to
> have a simple reference implementation to start with.

Shoving something into the kernel is not a "reference implementation", especially
not when it impacts the ABI of multiple subsystems.