2023-06-12 04:29:33

by Michael Roth

Subject: [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v9-rfc

and is based on top of the following tree:

https://github.com/mdroth/linux/commits/kvm_gmem_solo_fixes

which in turn is based on Sean Christopherson's UPM base support tree,
with a couple fixes/workarounds needed for SEV/SNP support. [1]

== OVERVIEW ==

This patchset implements SEV-SNP hypervisor support for Linux.

This version is being posted as an RFC due to fairly extensive changes
relating to transitioning the SEV-SNP implementation to using
guest_memfd (gmem, aka Unmapped Private Memory) to manage private guest
pages instead of the legacy SEV memory registration ioctls.

For this purpose we've added a number of hooks on top of gmem to plumb
in the RMP table updates needed when mapping private memory into a
guest's nested page table, and to restore pages to the
shared/hypervisor-owned state when freeing gmem-allocated memory back
to the host. Our hope is that some of these hooks can be re-used for
other platforms as well, but we have tried to keep them as minimal as
possible in case they prove to be SNP-specific. For quicker review of
this aspect, they are at the beginning of the series, directly on top
of the gmem patchset.
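
To give a feel for the shape of these hooks before diving into the
patches (the gmem_prepare signature below matches patch 01; the exact
form of the invalidation hook is still under discussion, so treat it
as illustrative):

  /* Arch hook invoked by the KVM MMU before mapping a gmem page into
   * the guest: SNP uses it to make the backing page private in the RMP
   * table and to clamp the maximum NPT mapping level. */
  int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
                      kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);

  /* Invoked when gmem invalidates/frees pages, so the platform can
   * transition the PFNs back to the default shared/hypervisor-owned
   * state before they return to the host allocator. */
  void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);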

Outside of UPM-related items, we've also included fairly extensive changes
based on review feedback from v8 and would appreciate any feedback on
those aspects as well.


== LAYOUT ==

PATCH 01-05: Pre-patches that add generic gmem and KVM MMU hooks to handle
             plumbing gmem memory into CoCo guests, and make arch/x86/coco
             re-usable for common SEV host code instead of only guest code.
PATCH 06-12: Host SNP initialization code and CCP driver prep for handling
             SNP cmds
PATCH 13-22: general SNP detection/enablement for host and CCP driver
PATCH 23-46: core KVM support for running SEV-SNP guests
PATCH 47-51: misc handling for IOMMU support, guest request handling, and
             debug infrastructure


== TESTING (note updated QEMU command-lines) ==

For testing this via QEMU, use the following tree:

https://github.com/amdese/qemu/commits/snp-wip-gmem

SEV-SNP with gmem/UPM enabled:

# use discard=none instead of discard=both below to disable discarding
# memory post-conversion: faster boot times, but increased memory usage
qemu-system-x86_64 -cpu EPYC-Milan-v2 \
-object memory-backend-memfd-private,id=ram1,size=1G,share=true \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,discard=both \
-machine q35,confidential-guest-support=sev0,memory-backend=ram1,kvm-type=protected \
...

KVM selftests for UPM:

cd $kernel_src_dir
make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support which is now part of mainline.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE
(Non-Automatic Exit) events that are used by the SEV-SNP guest to
communicate with the hypervisor. The series provides support for
handling the following new NAE events:

- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as to a specific GFN within the guest.
Any attempt to map such a page in the RMP table to a different guest/ASID,
or to a different GFN within a guest/ASID, will result in an RMP nested
page fault.

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction, which sets a special bit in the RMP table
for that guest. This is the only way to set the validated bit outside of
the initial pre-encrypted guest payload/image; any attempt outside the
guest to modify the RMP entry from that point forward will result in the
validated bit being cleared, at which point the guest will trigger an
exception if it attempts to access that page, so it can be made aware of
possible tampering.
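
For reference, the guest side wraps the PVALIDATE instruction in a small
helper that is already part of the mainline SNP guest support; it looks
roughly like this (shown here only to illustrate the validation step
described above):

  static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
  {
          bool no_rmpupdate;
          int rc;

          /* "pvalidate" mnemonic requires binutils >= 2.36; rFLAGS.CF
           * set by the instruction means no RMP entry was changed. */
          asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
                       CC_SET(c)
                       : CC_OUT(c) (no_rmpupdate), "=a"(rc)
                       : "a"(vaddr), "c"(rmp_psize), "d"(validate)
                       : "memory", "cc");

          if (no_rmpupdate)
                  return PVALIDATE_FAIL_NOUPDATE;

          return rc;
  }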

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a
new kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, and
userspace then issues the corresponding KVM_SET_MEMORY_ATTRIBUTES call to
set the private/shared state in the KVM MMU.
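
A minimal sketch of the userspace side, assuming the uAPI as it exists
in the UPM/gmem base series at the time of this posting (field and
constant names may change; includes and error handling elided):

  /* Hypothetical handler for a page-state change forwarded to
   * userspace via KVM_EXIT_VMGEXIT: */
  static void handle_psc(int vm_fd, uint64_t gpa, uint64_t size, bool to_private)
  {
          struct kvm_memory_attributes attrs = {
                  .address    = gpa,
                  .size       = size,
                  .attributes = to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          /* Tell the KVM MMU to back this GFN range with gmem (private)
           * or normal (shared) memory: */
          if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
                  err(1, "KVM_SET_MEMORY_ATTRIBUTES");
  }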

The gmem / KVM MMU hooks added in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via the KVM MMU, or use the normal
KVM MMU handling in the case of shared pages, where the corresponding RMP
table entries are left in the default shared/hypervisor-owned state.

== TODO / KNOWN ISSUES ==

* Add a per-arch CONFIG option for enabling platform-specific handling
when invalidating gmem pages and freeing them back to the host, as
opposed to the current approach, which defaults to issuing invalidations
to a weak-referenced stub implementation for non-x86 builds. Hoping for
more feedback on the general implementation first.
* This should incorporate all review feedback from v8, but if anything
slipped through the cracks please let me know.

[1] https://lore.kernel.org/lkml/[email protected]/

Changes since v8:

* Rework gmem/UPM hooks based on Sean's latest gmem/UPM tree
* Move SEV lazy-pinning support out to a separate series which uses this
series as a prereq instead of the other way around.
* Re-organize extended guest request patches into 3 patches encompassing
SEV FD ioctls for host-wide certs, KVM ioctls for per-instance certs,
and the guest request handling that consumes them. Also move them to
the top of the series to better separate them from the core SNP patches
(Alexey, Zhi, Ashish, Dov, Dionna, others)
* Various other changes/fixups for extended guests request handling (Dov,
Alexey, Dionna)
* Use helper to calculate max RMP entry size and improve readability (Dave)
* Use architecture-independent GPA value for initial VMSA pages
* Ensure SEV_CMD_SNP_GUEST_REQUEST failures are indicated to guest (Alex)
* Allocate per-instance certs on-demand (Alex)
* comment fixup for RMP fault handling (Zhi)
* commit msg rewording for MSR-based PSCs (Zhi)
* update SNP command/struct definitions based on 1.54 ABI (Saban)
* use sev_deactivate_lock around SEV_CMD_SNP_DECOMMISSION (Saban)
* Various comment/commit fixups (Zhi, Alex, Kim, Vlastimil, Dave)
* kexec fixes for newer SNP firmwares (Ashish)
* Various other fixups and re-ordering of patches.

Changes since v7:

* Rebase to Sean's updated UPM base support tree
* Drop KVM_CAP_UNMAPPED_MEMORY and .private_mem_enabled x86 op in favor
of kvm_arch_has_private_mem() and vm_type KVM_VM_CREATE arg
* Drop GHCB map/unmap refactoring and post map/unmap hooks as they are no
longer needed with UPM
* Move .fault_is_private implementation to SNP patch range, no longer
needed for SEV.
* Don't call attribute update / invalidation hooks under kvm->mmu_lock
(Tom, Jarkko)
* Revert switch to using set_memory_p()/set_memory_np() in rmpupdate() due
to it causing performance regression
* Commit fixups for 'fault_is_private'/'update_mem_attr' hooks, have
'fault_is_private' return bool (Boris)
* Split kvm_vm_set_region_attr() into separate patch. (Jarkko)
* Copy corrected CPUID page to userspace when firmware rejects it (Tom,
Jarkko)
* Fix sev_dump_rmpentry() error-handling (Alper)
* Use adjusted cmd_buf pointer rather than sev->cmd_buf directly (Alper)
* Correct typo in SNP_GET_EXT_CONFIG documentation (Dov)
* Update struct kvm_sev_snp_launch_finish definition in
amd-memory-encryption.rst (Tom)
* Fix snp_launch_update_vmsa by replacing created_vcpus with online_vcpus
* Fix SNP_DBG_DECRYPT to not include len parameter.
* Fix SNP_LAUNCH_FINISH to copy host-data from userspace


Changes since v6:

* Added support for restrictedmem/UPM, and removed SEV-specific
implementation of private memory management. As a result of this rework
the following patches were no longer needed so were dropped:
- KVM: SVM: Mark the private vma unmergable for SEV-SNP guests
- KVM: SVM: Disallow registering memory range from HugeTLB for SNP guest
- KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
- KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
* Moved the RMP table entry structure definition (struct rmpentry)
to sev.c, to avoid exposing this non-architectural definition to the
rest of the kernel, making the structure private to the SNP code.
Also made the RMP table entry accessors inline functions and
removed all accessors which are not called more than once.
Added a new function rmptable_entry() to index into the RMP table
and return the RMP table entry.
* Moved the RMPUPDATE and PSMASH helper function declarations from the
linux include namespace to the x86 arch-specific include namespace.
Added comments for these helper functions.
* Introduce set_memory_p() to provide a way to change the attributes
of a memory range so it is marked present and added back to the kernel
direct map; invalidating/restoring pages from the direct map is
now done using set_memory_np() and set_memory_p().
* Added detailed comments around user RMP #PF fault handling and
simplified computation of the faulting pfn for large-pages.
* Added support to return the pfn from dump_pagetable() to do
SEV-specific fault handling; this is added as a pre-patch. This support
is now used to dump the RMP entry in case of an RMP #PF in
show_fault_oops().
* Added a new generic SNP command params structure sev_data_snp_addr,
which is used for all SNP firmware API commands requiring a
single physical address parameter.
* Added support for new SNP_INIT_EX command with support for HV-Fixed
page range list.
* Added support for the new SNP_SHUTDOWN_EX command, which allows
disabling enforcement of SNP in the IOMMU. Also, a DF_FLUSH is done
at SNP shutdown if the firmware indicates that DF_FLUSH is required.
* Make sev_do_cmd() a generic API interface for the hypervisor
to issue commands to manage an SEV and SNP guest. Also removed
the API wrappers used by the hypervisor to manage an SEV-SNP guest.
All these APIs now invoke sev_do_cmd() directly.
* Introduce an SNP leaked pages list. Pages that are unsafe to release
back to the page allocator, because they can't be reclaimed or
transitioned back to hypervisor/shared state, are now added to this
internal leaked pages list to prevent fatal page faults when accessing
them. The function snp_leak_pages() is renamed to
snp_mark_pages_offline() and is an external function available to both
the CCP driver and the SNP hypervisor code. Removed the call to
memory_failure() when leaking/marking pages offline.
* Remove snp_set_rmp_state() multiplexor code and add new separate
helpers such as rmp_mark_pages_firmware() & rmp_mark_pages_shared().
The callers now issue snp_reclaim_pages() directly when needed as
done by __snp_free_firmware_pages() and unmap_firmware_writeable().
All callers of snp_set_rmp_state() modified to call helpers
rmp_mark_pages_firmware() or rmp_mark_pages_shared() as required.
* Change snp_reclaim_pages() to take physical address as an argument
and clear C-bit from this physical address argument internally.
* Output parameter sev_user_data_ext_snp_config in sev_ioctl_snp_get_config()
is memset to zero to avoid kernel memory leaking.
* Prevent race between sev_ioctl_snp_set_config() and
snp_guest_ext_guest_request() for sev->snp_certs_data by acquiring
sev->snp_certs_lock mutex.
* Zeroed out struct sev_user_data_snp_config in
sev_ioctl_snp_set_config() to prevent leaking uninitialized
kernel memory.
* Optimized snp_safe_alloc_page() by avoiding multiple calls to
pfn_to_page() and checking for a hugepage using pfn instead of
expanding to full physical address.
* Invoke host_rmp_make_shared() with leak parameter set to true
if VMSA page cannot be transitioned back to shared state.
* Fix snp_launch_finish() to always send the ID_AUTH struct to
the firmware. Use the params.auth_key_en indicator to set
whether the ID_AUTH struct contains an author key or not.
* Cleanup snp_context_create() and allocate certs_data in this
function using kzalloc() to prevent giving the guest
uninitialized kernel memory.
* Remove the check for guest supplied buffer greater than the data
provided by the hypervisor in snp_handle_ext_guest_request().
* Add a check in sev_snp_ap_create() for the case where a malicious
guest can RMPADJUST a large page into a VMSA, which would hit the SNP
erratum where the CPU incorrectly signals an RMP violation #PF if a
hugepage collides with the RMP entry of the VMSA page; reject the
AP CREATE request if the VMSA address from the guest is 2MB-aligned.
* Make the VMSAVE target area memory allocation SNP-safe, implementing
a workaround for an SNP erratum where the CPU will incorrectly signal
an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of the VMSAVE target page.
* Fix handle_split_page_fault() to work with memfd backed pages.
* Add KVM commands for per-VM instance certificates.
* Add IOMMU_SNP_SHUTDOWN support, which enables host kexec support
with SNP.

Documentation/virt/coco/sev-guest.rst | 54 +
Documentation/virt/kvm/api.rst | 34 +
.../virt/kvm/x86/amd-memory-encryption.rst | 147 ++
arch/x86/Kbuild | 2 +-
arch/x86/coco/Makefile | 3 +-
arch/x86/coco/sev/Makefile | 3 +
arch/x86/coco/sev/host.c | 524 ++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/kvm-x86-ops.h | 3 +
arch/x86/include/asm/kvm_host.h | 23 +
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/include/asm/sev-common.h | 30 +
arch/x86/include/asm/sev-host.h | 37 +
arch/x86/include/asm/sev.h | 5 +-
arch/x86/include/asm/svm.h | 6 +
arch/x86/include/asm/trap_pf.h | 18 +-
arch/x86/kernel/cpu/amd.c | 24 +-
arch/x86/kernel/cpu/bugs.c | 7 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/lapic.c | 5 +-
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 15 +-
arch/x86/kvm/mmu/mmu_internal.h | 39 +-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/sev.c | 1802 +++++++++++++++++---
arch/x86/kvm/svm/svm.c | 53 +-
arch/x86/kvm/svm/svm.h | 38 +-
arch/x86/kvm/x86.c | 17 +
arch/x86/mm/fault.c | 21 +
drivers/crypto/ccp/sev-dev.c | 1064 +++++++++++-
drivers/crypto/ccp/sev-dev.h | 16 +
drivers/iommu/amd/init.c | 57 +-
include/linux/amd-iommu.h | 3 +-
include/linux/kvm_host.h | 10 +
include/linux/psp-sev.h | 304 +++-
include/uapi/linux/kvm.h | 74 +
include/uapi/linux/psp-sev.h | 71 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
virt/kvm/guest_mem.c | 48 +-
virt/kvm/kvm_main.c | 75 +-
41 files changed, 4383 insertions(+), 275 deletions(-)



2023-06-12 04:29:52

by Michael Roth

Subject: [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code

From: Brijesh Singh <[email protected]>

Bit 31 in the page fault error code will be set when the processor
encounters an RMP violation.

While at it, use the BIT() macro.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/trap_pf.h | 18 +++++++++++-------
arch/x86/mm/fault.c | 1 +
2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..295be06f8db7 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_TRAP_PF_H
#define _ASM_X86_TRAP_PF_H

+#include <linux/bits.h> /* BIT() macro */
+
/*
* Page fault error code bits:
*
@@ -12,15 +14,17 @@
* bit 4 == 1: fault was an instruction fetch
* bit 5 == 1: protection keys block access
* bit 15 == 1: SGX MMU page-fault
+ * bit 31 == 1: fault was due to RMP violation
*/
enum x86_pf_error_code {
- X86_PF_PROT = 1 << 0,
- X86_PF_WRITE = 1 << 1,
- X86_PF_USER = 1 << 2,
- X86_PF_RSVD = 1 << 3,
- X86_PF_INSTR = 1 << 4,
- X86_PF_PK = 1 << 5,
- X86_PF_SGX = 1 << 15,
+ X86_PF_PROT = BIT(0),
+ X86_PF_WRITE = BIT(1),
+ X86_PF_USER = BIT(2),
+ X86_PF_RSVD = BIT(3),
+ X86_PF_INSTR = BIT(4),
+ X86_PF_PK = BIT(5),
+ X86_PF_SGX = BIT(15),
+ X86_PF_RMP = BIT(31),
};

#endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..95791071e3cd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -546,6 +546,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
!(error_code & X86_PF_PROT) ? "not-present page" :
(error_code & X86_PF_RSVD) ? "reserved bit violation" :
(error_code & X86_PF_PK) ? "protection keys violation" :
+ (error_code & X86_PF_RMP) ? "RMP violation" :
"permissions violation");

if (!(error_code & X86_PF_USER) && user_mode(regs)) {
--
2.25.1


2023-06-12 04:32:09

by Michael Roth

Subject: [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

From: Brijesh Singh <[email protected]>

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB-page RMP entries. The hypervisor will use this
instruction to split a 2MB RMP entry without invalidating its
previously validated state.

Add the following external interface API functions:

psmash():
Used to smash a 2MB aligned page into 4K pages while preserving the
Validated bit in the RMP.

rmp_make_private():
Used to assign a page to guest using the RMPUPDATE instruction.

rmp_make_shared():
Used to transition a page to hypervisor/shared state using the
RMPUPDATE instruction.
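
To illustrate how these fit together, below is a sketch of a typical
page lifecycle. This is illustrative usage only, not code taken from the
series; 'pfn_2mb_aligned' is a stand-in for the PFN of a 2MB-aligned
region:

  /* Assign a 4K page to the guest at the given GPA: */
  ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K, asid, false);

  /* If a 2MB private RMP entry later needs to be split, e.g. because
   * the guest requests a page-state change for one of its 4K
   * sub-pages, smash it while preserving the Validated bits: */
  ret = psmash(pfn_2mb_aligned);

  /* Before freeing the page back to the host, return it to the
   * default hypervisor-owned state: */
  ret = rmp_make_shared(pfn, PG_LEVEL_4K);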

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/coco/sev/host.c | 94 +++++++++++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 14 +++++
arch/x86/include/asm/sev-host.h | 10 ++++
3 files changed, 118 insertions(+)

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index d766b3bc6647..9df690b0b263 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -338,3 +338,97 @@ void sev_dump_rmpentry(u64 pfn)
}
}
EXPORT_SYMBOL_GPL(sev_dump_rmpentry);
+
+/*
+ * PSMASH a 2MB aligned page into 4K pages in the RMP table while preserving the
+ * Validated bit.
+ */
+int psmash(u64 pfn)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret;
+
+ pr_debug("%s: PFN: 0x%llx\n", __func__, pfn);
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ /* Binutils version 2.36 supports the PSMASH mnemonic. */
+ asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(paddr)
+ : "memory", "cc");
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmp_state *val)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret, level, npages;
+ int attempts = 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ /* Computed up front so the failure message below can report them. */
+ level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+ npages = page_level_size(level) / PAGE_SIZE;
+
+ do {
+ /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+ : "=a"(ret)
+ : "a"(paddr), "c"((unsigned long)val)
+ : "memory", "cc");
+
+ attempts++;
+ } while (ret == RMPUPDATE_FAIL_OVERLAP);
+
+ if (ret) {
+ pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
+ attempts, ret, pfn, npages, level);
+ sev_dump_rmpentry(pfn);
+ dump_stack();
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+/*
+ * Assign a page to guest using the RMPUPDATE instruction.
+ */
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+ struct rmp_state val;
+
+ pr_debug("%s: GPA: 0x%llx, PFN: 0x%llx, level: %d, immutable: %d\n",
+ __func__, gpa, pfn, level, immutable);
+
+ memset(&val, 0, sizeof(val));
+ val.assigned = 1;
+ val.asid = asid;
+ val.immutable = immutable;
+ val.gpa = gpa;
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+/*
+ * Transition a page to hypervisor/shared state using the RMPUPDATE instruction.
+ */
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ struct rmp_state val;
+
+ pr_debug("%s: PFN: 0x%llx, level: %d\n", __func__, pfn, level);
+
+ memset(&val, 0, sizeof(val));
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index bf0378136289..9eb20b416251 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -171,8 +171,22 @@ struct snp_psc_desc {
#define GHCB_ERR_INVALID_INPUT 5
#define GHCB_ERR_INVALID_EVENT 6

+/* RMPUPDATE detected 4K page and 2MB page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP 4
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMP_PG_SIZE_2M 1
#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+
+struct rmp_state {
+ u64 gpa;
+ u8 assigned;
+ u8 pagesize;
+ u8 immutable;
+ u8 rsvd;
+ u32 asid;
+} __packed;

#endif
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 85cfe577155c..753e80d16433 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -16,9 +16,19 @@
#ifdef CONFIG_KVM_AMD_SEV
int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
void sev_dump_rmpentry(u64 pfn);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
static inline void sev_dump_rmpentry(u64 pfn) {}
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+ bool immutable)
+{
+ return -ENODEV;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
#endif

#endif
--
2.25.1


2023-06-12 04:34:19

by Michael Roth

Subject: [PATCH RFC v9 15/51] x86/sev: Invalidate pages from the direct map when adding them to the RMP table

From: Brijesh Singh <[email protected]>

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce
memory restrictions and page access rights. The RMP check is enforced as
soon as SEV-SNP is enabled globally in the system. When hardware
encounters an RMP-check failure, it raises a page-fault exception.

The rmp_make_private() and rmp_make_shared() helpers are used to add
or remove pages from the RMP table. Improve rmp_make_private() to
invalidate the pages in the kernel direct map when they are added to
the RMP table, and restore them to their default valid permissions
after the pages are removed from the RMP table.

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/coco/sev/host.c | 62 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index 9df690b0b263..cd3b4c6a25bc 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -366,6 +366,42 @@ int psmash(u64 pfn)
}
EXPORT_SYMBOL_GPL(psmash);

+static int restore_direct_map(u64 pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ break;
+ }
+
+ if (ret)
+ pr_warn("Failed to restore direct map for pfn 0x%llx, ret: %d\n",
+ pfn + i, ret);
+
+ return ret;
+}
+
+static int invalidate_direct_map(u64 pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ break;
+ }
+
+ if (ret) {
+ pr_warn("Failed to invalidate direct map for pfn 0x%llx, ret: %d\n",
+ pfn + i, ret);
+ restore_direct_map(pfn, i);
+ }
+
+ return ret;
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -375,9 +411,21 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return -ENXIO;

level = RMP_TO_X86_PG_LEVEL(val->pagesize);
npages = page_level_size(level) / PAGE_SIZE;

+ /*
+ * If page is getting assigned in the RMP table then unmap it from the
+ * direct map.
+ */
+ if (val->assigned) {
+ if (invalidate_direct_map(pfn, npages)) {
+ pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
do {
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
@@ -393,6 +444,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
return -EFAULT;
}

+ /*
+ * Restore the direct map after the page is removed from the RMP table.
+ */
+ if (!val->assigned) {
+ if (restore_direct_map(pfn, npages)) {
+ pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
return 0;
}

--
2.25.1


2023-06-12 04:34:51

by Michael Roth

Subject: [PATCH RFC v9 13/51] x86/fault: Handle RMP page faults for user addresses

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, a write from the host is subject to
checks performed by the hardware against the RMP table (APM2 15.36.10)
at the end of a page walk:

1. The Assigned bit in the RMP table is not set (i.e. the page is shared).
2. The Immutable bit in the RMP table is not set.
3. If the page table entry that gives the sPA indicates that the
   target page size is a large page, then all RMP entries for the 4KB
   pages constituting the target must have the Assigned bit clear (0).

Nothing constructive can come of an attempt by userspace to violate case
1) (which will result in writing garbage due to page encryption) or case
2) (userspace should not ever need or be allowed to write to a page that
the host has specifically needed to mark immutable).

Case 3) is dependent on the hypervisor. In case of KVM, due to how
shared/private pages are partitioned into separate memory pools via
restricted/guarded memory, there should never be a case where a page in
the private pool overlaps with a shared page: either it is a
hugepage-sized allocation and all the sub-pages are private, or it is a
single-page allocation, in which case it cannot overlap with anything
but itself.

Therefore, for all 3 cases, it is appropriate to simply kill the
userspace process if it ever generates an RMP #PF. Implement that logic
here.

Co-developed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: drop all previous page-splitting logic since it is no longer
needed with restricted/guarded memory, update commit message
accordingly]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/mm/fault.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d46b9cf832b9..6465bff9d1ba 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1329,6 +1329,13 @@ void do_user_addr_fault(struct pt_regs *regs,
if (error_code & X86_PF_INSTR)
flags |= FAULT_FLAG_INSTRUCTION;

+ if (error_code & X86_PF_RMP) {
+ pr_err("Unexpected RMP page fault for address 0x%lx, terminating process\n",
+ address);
+ do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+ return;
+ }
+
#ifdef CONFIG_X86_64
/*
* Faults in the vsyscall page might need emulation. The
--
2.25.1


2023-06-12 04:35:44

by Michael Roth

Subject: [PATCH RFC v9 16/51] crypto: ccp: Define the SEV-SNP commands

From: Brijesh Singh <[email protected]>

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.
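
As a rough usage sketch, issuing one of these commands from the kernel
looks like the following; sev_do_cmd() is exposed as a generic interface
later in this series, so treat the call below as illustrative:

  struct sev_data_snp_addr data = {};
  int rc, error;

  /* Decommission a guest: the only parameter is the system physical
   * address of its guest context page. */
  data.gctx_paddr = __psp_pa(gctx_page);
  rc = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, &error);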

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: update SNP command list and SNP status struct based on current
spec]
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 16 +++
include/linux/psp-sev.h | 246 +++++++++++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 53 ++++++++
3 files changed, 315 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index e2f25926eb51..ab3572286755 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -128,6 +128,8 @@ static int sev_cmd_buffer_len(int cmd)
switch (cmd) {
case SEV_CMD_INIT: return sizeof(struct sev_data_init);
case SEV_CMD_INIT_EX: return sizeof(struct sev_data_init_ex);
+ case SEV_CMD_SNP_SHUTDOWN_EX: return sizeof(struct sev_data_snp_shutdown_ex);
+ case SEV_CMD_SNP_INIT_EX: return sizeof(struct sev_data_snp_init_ex);
case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status);
case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr);
case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import);
@@ -156,6 +158,20 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id);
case SEV_CMD_ATTESTATION_REPORT: return sizeof(struct sev_data_attestation_report);
case SEV_CMD_SEND_CANCEL: return sizeof(struct sev_data_send_cancel);
+ case SEV_CMD_SNP_GCTX_CREATE: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_LAUNCH_START: return sizeof(struct sev_data_snp_launch_start);
+ case SEV_CMD_SNP_LAUNCH_UPDATE: return sizeof(struct sev_data_snp_launch_update);
+ case SEV_CMD_SNP_ACTIVATE: return sizeof(struct sev_data_snp_activate);
+ case SEV_CMD_SNP_DECOMMISSION: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_PAGE_RECLAIM: return sizeof(struct sev_data_snp_page_reclaim);
+ case SEV_CMD_SNP_GUEST_STATUS: return sizeof(struct sev_data_snp_guest_status);
+ case SEV_CMD_SNP_LAUNCH_FINISH: return sizeof(struct sev_data_snp_launch_finish);
+ case SEV_CMD_SNP_DBG_DECRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_DBG_ENCRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_PAGE_UNSMASH: return sizeof(struct sev_data_snp_page_unsmash);
+ case SEV_CMD_SNP_PLATFORM_STATUS: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_GUEST_REQUEST: return sizeof(struct sev_data_snp_guest_request);
+ case SEV_CMD_SNP_CONFIG: return sizeof(struct sev_user_data_snp_config);
default: return 0;
}

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1595088c428b..06d0619ca442 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -86,6 +86,36 @@ enum sev_cmd {
SEV_CMD_DBG_DECRYPT = 0x060,
SEV_CMD_DBG_ENCRYPT = 0x061,

+ /* SNP specific commands */
+ SEV_CMD_SNP_INIT = 0x81,
+ SEV_CMD_SNP_SHUTDOWN = 0x82,
+ SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
+ SEV_CMD_SNP_DF_FLUSH = 0x84,
+ SEV_CMD_SNP_INIT_EX = 0x85,
+ SEV_CMD_SNP_SHUTDOWN_EX = 0x86,
+ SEV_CMD_SNP_DECOMMISSION = 0x90,
+ SEV_CMD_SNP_ACTIVATE = 0x91,
+ SEV_CMD_SNP_GUEST_STATUS = 0x92,
+ SEV_CMD_SNP_GCTX_CREATE = 0x93,
+ SEV_CMD_SNP_GUEST_REQUEST = 0x94,
+ SEV_CMD_SNP_ACTIVATE_EX = 0x95,
+ SEV_CMD_SNP_LAUNCH_START = 0xA0,
+ SEV_CMD_SNP_LAUNCH_UPDATE = 0xA1,
+ SEV_CMD_SNP_LAUNCH_FINISH = 0xA2,
+ SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
+ SEV_CMD_SNP_DBG_ENCRYPT = 0xB1,
+ SEV_CMD_SNP_PAGE_SWAP_OUT = 0xC0,
+ SEV_CMD_SNP_PAGE_SWAP_IN = 0xC1,
+ SEV_CMD_SNP_PAGE_MOVE = 0xC2,
+ SEV_CMD_SNP_PAGE_MD_INIT = 0xC3,
+ SEV_CMD_SNP_PAGE_SET_STATE = 0xC6,
+ SEV_CMD_SNP_PAGE_RECLAIM = 0xC7,
+ SEV_CMD_SNP_PAGE_UNSMASH = 0xC8,
+ SEV_CMD_SNP_CONFIG = 0xC9,
+ SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0xCA,
+ SEV_CMD_SNP_COMMIT = 0xCB,
+ SEV_CMD_SNP_VLEK_LOAD = 0xCD,
+
SEV_CMD_MAX,
};

@@ -531,6 +561,222 @@ struct sev_data_attestation_report {
u32 len; /* In/Out */
} __packed;

+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: length of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+ u64 address; /* In */
+ u32 len; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+ u64 gctx_paddr; /* In */
+ u32 asid; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_addr - generic SNP command params
+ *
+ * @gctx_paddr: system physical address, typically of a guest context page
+ */
+struct sev_data_snp_addr {
+ u64 gctx_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of the migration agent's guest
+ * context page
+ * @ma_en: the guest is associated with a migration agent
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ * guest-assisted migration.
+ * @gosvw: guest OS-visible workarounds, as defined by the hypervisor
+ */
+struct sev_data_snp_launch_start {
+ u64 gctx_paddr; /* In */
+ u64 policy; /* In */
+ u64 ma_gctx_paddr; /* In */
+ u32 ma_en:1; /* In */
+ u32 imi_en:1; /* In */
+ u32 rsvd:30;
+ u8 gosvw[16]; /* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+ SNP_PAGE_TYPE_NORMAL = 0x1,
+ SNP_PAGE_TYPE_VMSA = 0x2,
+ SNP_PAGE_TYPE_ZERO = 0x3,
+ SNP_PAGE_TYPE_UNMEASURED = 0x4,
+ SNP_PAGE_TYPE_SECRET = 0x5,
+ SNP_PAGE_TYPE_CPUID = 0x6,
+
+ SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ */
+struct sev_data_snp_launch_update {
+ u64 gctx_paddr; /* In */
+ u32 page_size:1; /* In */
+ u32 page_type:3; /* In */
+ u32 imi_page:1; /* In */
+ u32 rsvd:27;
+ u32 rsvd2;
+ u64 address; /* In */
+ u32 rsvd3:8;
+ u32 vmpl1_perms:8; /* In */
+ u32 vmpl2_perms:8; /* In */
+ u32 vmpl3_perms:8; /* In */
+ u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @id_block_paddr: system physical address of the ID block
+ * @id_auth_paddr: system physical address of the ID authentication structure
+ * @id_block_en: indicates that the ID block is present
+ * @auth_key_en: indicates that the ID_AUTH struct contains an author key
+ * @host_data: opaque host-supplied data, returned in attestation reports
+ */
+struct sev_data_snp_launch_finish {
+ u64 gctx_paddr;
+ u64 id_block_paddr;
+ u64 id_auth_paddr;
+ u8 id_block_en:1;
+ u8 auth_key_en:1;
+ u64 rsvd:62;
+ u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+ u64 gctx_paddr;
+ u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be claimed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ */
+struct sev_data_snp_dbg {
+ u64 gctx_paddr; /* In */
+ u64 src_addr; /* In */
+ u64 dst_addr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+ u64 gctx_paddr; /* In */
+ u64 req_paddr; /* In */
+ u64 res_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX command params
+ *
+ * @init_rmp: indicate that the RMP should be initialized.
+ * @list_paddr_en: indicate that list_paddr is valid
+ * @list_paddr: system physical address of range list
+ */
+struct sev_data_snp_init_ex {
+ u32 init_rmp:1;
+ u32 list_paddr_en:1;
+ u32 rsvd:30;
+ u32 rsvd1;
+ u64 list_paddr;
+ u8 rsvd2[48];
+} __packed;
+
+/**
+ * struct sev_data_range - RANGE structure
+ *
+ * @base: system physical address of first byte of range
+ * @page_count: number of 4KB pages in this range
+ */
+struct sev_data_range {
+ u64 base;
+ u32 page_count;
+ u32 rsvd;
+} __packed;
+
+/**
+ * struct sev_data_range_list - RANGE_LIST structure
+ *
+ * @num_elements: number of elements in RANGE_ARRAY
+ * @ranges: array of num_elements of type RANGE
+ */
+struct sev_data_range_list {
+ u32 num_elements;
+ u32 rsvd;
+ struct sev_data_range ranges[];
+} __packed;
+
+/**
+ * struct sev_data_snp_shutdown_ex - SNP_SHUTDOWN_EX structure
+ *
+ * @length: length of the command buffer read by the PSP
+ * @iommu_snp_shutdown: Disable enforcement of SNP in the IOMMU
+ */
+struct sev_data_snp_shutdown_ex {
+ u32 length;
+ u32 iommu_snp_shutdown:1;
+ u32 rsvd1:31;
+} __packed;
+
#ifdef CONFIG_CRYPTO_DEV_SP_PSP

/**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 91b4c63d5cbf..7d8a2dd20273 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -61,6 +61,13 @@ typedef enum {
SEV_RET_INVALID_PARAM,
SEV_RET_RESOURCE_LIMIT,
SEV_RET_SECURE_DATA_INVALID,
+ SEV_RET_INVALID_PAGE_SIZE,
+ SEV_RET_INVALID_PAGE_STATE,
+ SEV_RET_INVALID_MDATA_ENTRY,
+ SEV_RET_INVALID_PAGE_OWNER,
+ SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+ SEV_RET_RMP_INIT_REQUIRED,
+
SEV_RET_MAX,
} sev_ret_code;

@@ -147,6 +154,52 @@ struct sev_user_data_get_id2 {
__u32 length; /* In/Out */
} __packed;

+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @is_rmp_initialized: whether RMP is initialized or not
+ * @build_id: firmware build id for the API version
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @vlek_en: VLEK hashstick is loaded
+ * @guest_count: the number of guests currently managed by the firmware
+ * @current_tcb_version: current TCB version
+ * @reported_tcb_version: reported TCB version
+ */
+struct sev_user_data_snp_status {
+ __u8 api_major; /* Out */
+ __u8 api_minor; /* Out */
+ __u8 state; /* Out */
+ __u8 is_rmp_initialized:1; /* Out */
+ __u8 rsvd:7;
+ __u32 build_id; /* Out */
+ __u32 mask_chip_id:1; /* Out */
+ __u32 mask_chip_key:1; /* Out */
+ __u32 vlek_en:1; /* Out */
+ __u32 rsvd1:29;
+ __u32 guest_count; /* Out */
+ __u64 current_tcb_version; /* Out */
+ __u64 reported_tcb_version; /* Out */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: the TCB version to report in the guest attestation report.
+ * @mask_chip_id: indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ * @mask_chip_key: indicates that attestation reports will not be signed with
+ * the chip-unique key.
+ */
+struct sev_user_data_snp_config {
+ __u64 reported_tcb; /* In */
+ __u32 mask_chip_id:1; /* In */
+ __u32 mask_chip_key:1; /* In */
+ __u32 rsvd:30; /* In */
+ __u8 rsvd1[52];
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1


2023-06-12 04:36:23

by Michael Roth

Subject: [PATCH RFC v9 17/51] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

From: Brijesh Singh <[email protected]>

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
command to prepare for the first SNP guest launch after INIT.

During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operation system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to other page states
after SNP_INIT enables their SNP enforcement.

Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
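
For illustration, a minimal, hand-built range list with a single entry
would look roughly like the snippet below ('base_paddr' and 'nr_pages'
are hypothetical values); the patch itself derives the real list from
the iomem resource tree:

  struct sev_data_snp_init_ex data = {};
  struct sev_data_range_list *list;

  list = kzalloc(PAGE_SIZE, GFP_KERNEL);
  list->ranges[0].base = base_paddr & PAGE_MASK;
  list->ranges[0].page_count = nr_pages;
  list->num_elements = 1;

  data.init_rmp = 1;
  data.list_paddr_en = 1;
  data.list_paddr = __psp_pa(list);
  /* ...then issue SEV_CMD_SNP_INIT_EX with &data */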

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Co-developed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 242 +++++++++++++++++++++++++++++++++--
drivers/crypto/ccp/sev-dev.h | 2 +
2 files changed, 234 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index ab3572286755..d3764ee073f3 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -27,6 +27,7 @@

#include <asm/smp.h>
#include <asm/cacheflush.h>
+#include <asm/e820/types.h>

#include "psp-dev.h"
#include "sev-dev.h"
@@ -35,6 +36,10 @@
#define SEV_FW_FILE "amd/sev.fw"
#define SEV_FW_NAME_SIZE 64

+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR 1
+#define SNP_MIN_API_MINOR 51
+
static DEFINE_MUTEX(sev_cmd_mutex);
static struct sev_misc_dev *misc_dev;

@@ -78,6 +83,14 @@ static void *sev_es_tmr;
#define NV_LENGTH (32 * 1024)
static void *sev_init_ex_buffer;

+/*
+ * SEV_DATA_RANGE_LIST:
+ * Array containing ranges of pages that the firmware transitions to the
+ * HV-fixed page state.
+ */
+static struct sev_data_range_list *snp_range_list;
+static int __sev_snp_init_locked(int *error);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -462,7 +475,8 @@ static int __sev_platform_init_locked(int *error)
{
struct psp_device *psp = psp_master;
struct sev_device *sev;
- int rc = 0, psp_ret = -1;
+ int psp_ret = -1;
+ int rc;
int (*init_function)(int *error);

if (!psp || !psp->sev_data)
@@ -473,6 +487,26 @@ static int __sev_platform_init_locked(int *error)
if (sev->state == SEV_STATE_INIT)
return 0;

+ rc = __sev_snp_init_locked(error);
+ if (rc && rc != -ENODEV) {
+ /*
+ * Don't abort the probe if SNP INIT failed,
+ * continue to initialize the legacy SEV firmware.
+ */
+ dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
+ }
+
+ if (!sev_es_tmr) {
+ /* Obtain the TMR memory area for SEV-ES use */
+ sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+ if (sev_es_tmr)
+ /* Must flush the cache before giving it to the firmware */
+ clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
+ else
+ dev_warn(sev->dev,
+ "SEV: TMR allocation failed, SEV-ES support unavailable\n");
+ }
+
if (sev_init_ex_buffer) {
init_function = __sev_init_ex_locked;
rc = sev_read_init_ex_file();
@@ -832,6 +866,191 @@ static int sev_update_firmware(struct device *dev)
return ret;
}

+static void snp_set_hsave_pa(void *arg)
+{
+ wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
+{
+ struct sev_data_range_list *range_list = arg;
+ struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
+ size_t size;
+
+ /* Make sure there is room in the page for one more range entry. */
+ if (((range_list->num_elements + 1) * sizeof(struct sev_data_range) +
+ sizeof(struct sev_data_range_list)) > PAGE_SIZE)
+ return -E2BIG;
+
+ switch (rs->desc) {
+ case E820_TYPE_RESERVED:
+ case E820_TYPE_PMEM:
+ case E820_TYPE_ACPI:
+ range->base = rs->start & PAGE_MASK;
+ size = (rs->end + 1) - rs->start;
+ range->page_count = size >> PAGE_SHIFT;
+ range_list->num_elements++;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+ struct psp_device *psp = psp_master;
+ struct sev_data_snp_init_ex data;
+ struct sev_device *sev;
+ int rc = 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENODEV;
+
+ if (!psp || !psp->sev_data)
+ return -ENODEV;
+
+ sev = psp->sev_data;
+
+ if (sev->snp_initialized)
+ return 0;
+
+ if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+ dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+ SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+ return 0;
+ }
+
+ /*
+ * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0 across all
+ * cores prior to its execution.
+ */
+ on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+ /*
+ * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
+ * system physical address ranges to convert into the HV-fixed page states
+ * during the RMP initialization. For instance, the memory that UEFI
+ * reserves should be included in the range list. This allows system
+ * components that occasionally write to memory (e.g. logging to UEFI
+ * reserved regions) to not fail due to RMP initialization and SNP enablement.
+ */
+ if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
+ /*
+ * Firmware checks that the pages containing the ranges enumerated
+ * in the RANGES structure are either in the Default page state or in the
+ * firmware page state.
+ */
+ snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!snp_range_list) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX range list memory allocation failed\n");
+ return -ENOMEM;
+ }
+
+ /*
+ * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
+ * to be setup as HV-fixed pages.
+ */
+
+ rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
+ snp_range_list, snp_filter_reserved_mem_regions);
+ if (rc) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
+ return rc;
+ }
+
+ memset(&data, 0, sizeof(data));
+ data.init_rmp = 1;
+ data.list_paddr_en = 1;
+ data.list_paddr = __psp_pa(snp_range_list);
+
+ /*
+ * Before invoking SNP_INIT_EX with INIT_RMP=1, make sure that
+ * all dirty cache lines containing the RMP are flushed.
+ *
+ * NOTE: that includes writes via RMPUPDATE instructions, which
+ * are also cacheable writes.
+ */
+ wbinvd_on_all_cpus();
+
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
+ if (rc)
+ return rc;
+ } else {
+ /*
+ * SNP_INIT is equivalent to SNP_INIT_EX with INIT_RMP=1, so
+ * just as with that case, make sure all dirty cache lines
+ * containing the RMP are flushed.
+ */
+ wbinvd_on_all_cpus();
+
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+ if (rc)
+ return rc;
+ }
+
+ /* Prepare for first SNP guest launch after INIT */
+ wbinvd_on_all_cpus();
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+ if (rc)
+ return rc;
+
+ sev->snp_initialized = true;
+ dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+ return rc;
+}
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_shutdown_ex data;
+ int ret;
+
+ if (!sev->snp_initialized)
+ return 0;
+
+ memset(&data, 0, sizeof(data));
+ data.length = sizeof(data);
+ data.iommu_snp_shutdown = 1;
+
+ wbinvd_on_all_cpus();
+
+retry:
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
+ /* SHUTDOWN may require DF_FLUSH */
+ if (*error == SEV_RET_DFFLUSH_REQUIRED) {
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
+ return ret;
+ }
+ goto retry;
+ }
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+ return ret;
+ }
+
+ sev->snp_initialized = false;
+ dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+ return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+ int rc;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = __sev_snp_shutdown_locked(error);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+
static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
{
struct sev_device *sev = psp_master->sev_data;
@@ -1279,6 +1498,8 @@ int sev_dev_init(struct psp_device *psp)

static void sev_firmware_shutdown(struct sev_device *sev)
{
+ int error = 0;
+
sev_platform_shutdown(NULL);

if (sev_es_tmr) {
@@ -1295,6 +1516,13 @@ static void sev_firmware_shutdown(struct sev_device *sev)
get_order(NV_LENGTH));
sev_init_ex_buffer = NULL;
}
+
+ if (snp_range_list) {
+ kfree(snp_range_list);
+ snp_range_list = NULL;
+ }
+
+ sev_snp_shutdown(&error);
}

void sev_dev_destroy(struct psp_device *psp)
@@ -1350,15 +1578,6 @@ void sev_pci_init(void)
}
}

- /* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
- if (sev_es_tmr)
- /* Must flush the cache before giving it to the firmware */
- clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
- else
- dev_warn(sev->dev,
- "SEV: TMR allocation failed, SEV-ES support unavailable\n");
-
if (!psp_init_on_probe)
return;

@@ -1368,6 +1587,9 @@ void sev_pci_init(void)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);

+ dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
+ "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+
return;

err:
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 666c21eb81ab..34767657beb5 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
u8 build;

void *cmd_buf;
+
+ bool snp_initialized;
};

int sev_dev_init(struct psp_device *psp);
--
2.25.1


2023-06-12 04:37:16

by Michael Roth

Subject: [PATCH RFC v9 01/51] KVM: x86: Add gmem hook for initializing private memory

All gmem pages are expected to be 'private' as defined by a particular
arch/platform. Platforms like SEV-SNP require additional operations to
move these pages into a private state, so implement a hook that can be
used to prepare this memory prior to mapping it into a guest.

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table, so this hook will also be used by the KVM MMU to clamp
the maximum mapping size accordingly.
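
As a loose sketch of how an SNP backend might implement this hook (the
real implementation appears later in the series;
snp_any_subpage_private() is a hypothetical helper):

  static int snp_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
                              kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
  {
          /* If any 4K sub-page of the candidate 2MB range was already
           * initialized as private in the RMP table, a 2MB NPT mapping
           * would trigger RMP faults, so clamp to a 4K mapping. */
          if (*max_level > PG_LEVEL_4K &&
              snp_any_subpage_private(ALIGN_DOWN(pfn, PTRS_PER_PMD)))
                  *max_level = PG_LEVEL_4K;

          /* RMP updates to make the PFN guest-owned would also be
           * driven from here. */
          return 0;
  }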

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 11 ++++++++++-
3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..439ba4beb5af 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -133,6 +133,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8ae131dc645d..bd03b6cf40fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1732,6 +1732,9 @@ struct kvm_x86_ops {
* Returns vCPU specific APICv inhibit reasons
*/
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+ int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
+ kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dc2b9a2f717c..c54672ad6cbc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4341,6 +4341,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
int order, r;
+ u8 max_level;

if (!kvm_slot_can_be_private(fault->slot))
return kvm_do_memory_fault_exit(vcpu, fault);
@@ -4349,7 +4350,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
if (r)
return r;

- fault->max_level = min(kvm_max_level_for_order(order), fault->max_level);
+ max_level = kvm_max_level_for_order(order);
+ r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
+ fault->gfn, &max_level);
+ if (r) {
+ kvm_release_pfn_clean(fault->pfn);
+ return r;
+ }
+
+ fault->max_level = min(max_level, fault->max_level);
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
return RET_PF_CONTINUE;
}
--
2.25.1


2023-06-12 04:37:55

by Michael Roth

Subject: [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list

From: Ashish Kalra <[email protected]>

Pages are unsafe to release back to the page allocator if they have
been transitioned to firmware/guest state and can't be reclaimed or
transitioned back to hypervisor/shared state. In this case, add them
to an internal leaked pages list to ensure that they are not freed or
touched/accessed, which would cause fatal page faults.
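
A sketch of the expected call pattern from the reclaim paths
(snp_reclaim_pages() is introduced later in this series with this
signature; paddr/npages/locked come from the surrounding context):

  /* If the firmware can't reclaim the pages, they can never safely be
   * handed back to the page allocator, so leak them instead: */
  if (snp_reclaim_pages(paddr, npages, locked))
          snp_leak_pages(__sme_clr(paddr) >> PAGE_SHIFT, npages);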

Signed-off-by: Ashish Kalra <[email protected]>
[mdr: relocate to arch/x86/coco/sev/host.c]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/coco/sev/host.c | 28 ++++++++++++++++++++++++++++
arch/x86/include/asm/sev-host.h | 3 +++
2 files changed, 31 insertions(+)

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index cd3b4c6a25bc..373e91f5a337 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -64,6 +64,12 @@ struct rmpentry {
static unsigned long rmptable_start __ro_after_init;
static unsigned long rmptable_end __ro_after_init;

+/* list of pages which are leaked and cannot be reclaimed */
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
+
#undef pr_fmt
#define pr_fmt(fmt) "SEV-SNP: " fmt

@@ -494,3 +500,25 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
return rmpupdate(pfn, &val);
}
EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(unsigned long pfn, unsigned int npages)
+{
+ struct page *page = pfn_to_page(pfn);
+
+ WARN(1, "psc failed, pfn 0x%lx pages %d (marked offline)\n", pfn, npages);
+
+ spin_lock(&snp_leaked_pages_list_lock);
+ while (npages--) {
+ /*
+ * Reuse the page's buddy list for chaining into the leaked
+ * pages list. This page should not be on a free list currently
+ * and is also unsafe to be added to a free list.
+ */
+ list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
+ sev_dump_rmpentry(pfn);
+ pfn++;
+ }
+ spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 753e80d16433..bab3b226777a 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -19,6 +19,8 @@ void sev_dump_rmpentry(u64 pfn);
int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(unsigned long pfn, unsigned int npages);
+
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
static inline void sev_dump_rmpentry(u64 pfn) {}
@@ -29,6 +31,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
return -ENODEV;
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void snp_leak_pages(unsigned long pfn, unsigned int npages) {}
#endif

#endif
--
2.25.1


2023-06-12 04:38:19

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 20/51] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior and requirements for SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware
specification for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB sized/aligned region
when SNP is enabled to satisfy the new requirements for SNP. Continue
allocating a 1MB region for !SNP configurations.

While at it, provide an API that can be used by others to allocate a
page that can be used by the firmware. The immediate user for this API
will be the KVM driver, which needs to allocate a firmware context page
during guest creation that is updated by the firmware. See the SEV-SNP
specification for further details.
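
As a minimal sketch of the intended usage of the new API (illustrative
only; the actual KVM usage is added later in this series):

    /* Allocate a page and transition it to firmware-owned state */
    void *ctx = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);

    if (!ctx)
            return -ENOMEM;

    /* ... hand __psp_pa(ctx) to firmware, e.g. for SNP_GCTX_CREATE ... */

    /* Reclaim the page back to hypervisor/shared state, then free it */
    snp_free_firmware_page(ctx);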

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
directly to SEV_CMD_SNP_PAGE_RECLAIM]
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 150 ++++++++++++++++++++++++++++++++---
include/linux/psp-sev.h | 9 +++
2 files changed, 150 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 88c5bf264a87..d8124d33c831 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -91,6 +91,13 @@ static void *sev_init_ex_buffer;
struct sev_data_range_list *snp_range_list;
static int __sev_snp_init_locked(int *error);

+/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB in size. */
+#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -191,11 +198,131 @@ static int sev_cmd_buffer_len(int cmd)
return 0;
}

+static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int ret, err, i, n = 0;
+
+ for (i = 0; i < npages; i++, pfn++, n++) {
+ struct sev_data_snp_page_reclaim data = {0};
+
+ data.paddr = pfn << PAGE_SHIFT;
+
+ if (locked)
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ else
+ ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+
+ if (ret)
+ goto cleanup;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * If the firmware failed to reclaim the page, then it is no longer
+ * safe to release it back to the system; leak it.
+ */
+ snp_leak_pages(pfn, npages - n);
+ return ret;
+}
+
+static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int rc, n = 0, i;
+
+ for (i = 0; i < npages; i++, n++, pfn++) {
+ rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
+ if (rc)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * Try to unwind the firmware state changes by reclaiming
+ * the pages which were already transitioned to the
+ * firmware state.
+ */
+ snp_reclaim_pages(paddr, n, locked);
+
+ return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+ unsigned long npages = 1ul << order, paddr;
+ struct sev_device *sev;
+ struct page *page;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ page = alloc_pages(gfp_mask, order);
+ if (!page)
+ return NULL;
+
+ /* If SEV-SNP is initialized, then add the page to the RMP table. */
+ sev = psp_master->sev_data;
+ if (!sev->snp_initialized)
+ return page;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (rmp_mark_pages_firmware(paddr, npages, locked))
+ return NULL;
+
+ return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+ struct page *page;
+
+ page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+ return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ unsigned long paddr, npages = 1ul << order;
+
+ if (!page)
+ return;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (sev->snp_initialized &&
+ snp_reclaim_pages(paddr, npages, locked))
+ return;
+
+ __free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+ if (!addr)
+ return;
+
+ __snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL_GPL(snp_free_firmware_page);
+
static void *sev_fw_alloc(unsigned long len)
{
struct page *page;

- page = alloc_pages(GFP_KERNEL, get_order(len));
+ page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
if (!page)
return NULL;

@@ -443,7 +570,7 @@ static int __sev_init_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -466,7 +593,7 @@ static int __sev_init_ex_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -499,14 +626,16 @@ static int __sev_platform_init_locked(int *error)

if (!sev_es_tmr) {
/* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
- if (sev_es_tmr)
+ sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
+ if (sev_es_tmr) {
/* Must flush the cache before giving it to the firmware */
- clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
- else
+ if (!sev->snp_initialized)
+ clflush_cache_range(sev_es_tmr, sev_es_tmr_size);
+ } else {
dev_warn(sev->dev,
"SEV: TMR allocation failed, SEV-ES support unavailable\n");
}
+ }

if (sev_init_ex_buffer) {
init_function = __sev_init_ex_locked;
@@ -1001,6 +1130,8 @@ static int __sev_snp_init_locked(int *error)
sev->snp_initialized = true;
dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");

+ sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
return rc;
}

@@ -1507,8 +1638,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
/* The TMR area was encrypted, flush it from the cache */
wbinvd_on_all_cpus();

- free_pages((unsigned long)sev_es_tmr,
- get_order(SEV_ES_TMR_SIZE));
+ __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+ get_order(sev_es_tmr_size),
+ false);
sev_es_tmr = NULL;
}

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index c8656a36baeb..5ae61de96e44 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -906,6 +906,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
int sev_do_cmd(int cmd, void *data, int *psp_ret);

void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);

#else /* !CONFIG_CRYPTO_DEV_SP_PSP */

@@ -933,6 +935,13 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int

static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }

+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+ return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1


2023-06-12 04:40:03

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 18/51] crypto: ccp: Provide API to issue SEV and SNP commands

From: Brijesh Singh <[email protected]>

Export sev_do_cmd() as a generic API for the hypervisor to issue
commands to manage SEV and SNP guests. The commands for SEV and SNP
are defined in the SEV and SEV-SNP firmware specifications.
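
As an example of the intended usage (mirroring a caller added later in
this series), an SNP command such as SNP_DF_FLUSH can be issued
directly:

    /* Illustrative only: issue an SNP data-fabric flush */
    static int snp_df_flush(void)
    {
            int error, ret;

            ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
            if (ret)
                    pr_err("SEV-SNP: DF_FLUSH failed, ret=%d, error=%#x\n",
                           ret, error);

            return ret;
    }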

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 3 ++-
include/linux/psp-sev.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index d3764ee073f3..88c5bf264a87 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -418,7 +418,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
return ret;
}

-static int sev_do_cmd(int cmd, void *data, int *psp_ret)
+int sev_do_cmd(int cmd, void *data, int *psp_ret)
{
int rc;

@@ -428,6 +428,7 @@ static int sev_do_cmd(int cmd, void *data, int *psp_ret)

return rc;
}
+EXPORT_SYMBOL_GPL(sev_do_cmd);

static int __sev_init_locked(int *error)
{
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 06d0619ca442..c8656a36baeb 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -891,6 +891,20 @@ int sev_guest_df_flush(int *error);
*/
int sev_guest_decommission(struct sev_data_decommission *data, int *error);

+/**
+ * sev_do_cmd - issue an SEV or SEV-SNP firmware command
+ *
+ * @cmd: SEV command ID
+ * @data: command buffer for the given command, may be NULL
+ * @psp_ret: SEV firmware command return code
+ * Returns: 0 if the firmware successfully processed the command,
+ * -%ENODEV if the SEV device is not available,
+ * -%ENOTSUPP if the device does not support SEV,
+ * -%ETIMEDOUT if the SEV command timed out, or
+ * -%EIO if the firmware returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
void *psp_copy_user_blob(u64 uaddr, u32 len);

#else /* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -906,6 +920,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
static inline int
sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }

+static inline int
+sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
+
static inline int
sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }

--
2.25.1


2023-06-12 04:41:10

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 21/51] crypto: ccp: Handle the legacy SEV command when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. In that case, all buffers that the firmware may
write to as part of a SEV-legacy command must be transitioned to
firmware state before issuing the command.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, change the page state to firmware in the RMP table
before issuing the command, and restore the state to shared after the
command completes.

For case #2, use a bounce buffer to complete the request.
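
A simplified sketch of the resulting flow for a firmware-writable
address embedded in a command buffer (pseudocode; addr_is_guest_memory,
issue_legacy_command(), paddr_field and caller_buf are placeholders for
the logic added below):

    if (addr_is_guest_memory) {                        /* case #1 */
            rmp_mark_pages_firmware(paddr, npages, true);
            issue_legacy_command();
            snp_reclaim_pages(paddr, npages, true);    /* back to shared */
    } else {                                           /* case #2 */
            /* Bounce through a pre-allocated firmware-state buffer */
            *paddr_field = __psp_pa(map->host);
            issue_legacy_command();
            snp_reclaim_pages(__pa(map->host), npages, true);
            memcpy(caller_buf, map->host, len);        /* copy results out */
    }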

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 371 ++++++++++++++++++++++++++++++++++-
drivers/crypto/ccp/sev-dev.h | 12 ++
2 files changed, 373 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index d8124d33c831..10bb0a7dcfd6 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -28,6 +28,7 @@
#include <asm/smp.h>
#include <asm/cacheflush.h>
#include <asm/e820/types.h>
+#include <asm/sev-host.h>

#include "psp-dev.h"
#include "sev-dev.h"
@@ -258,6 +259,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
return rc;
}

+static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int rc, n = 0, i;
+
+ for (i = 0; i < npages; i++, pfn++, n++) {
+ rc = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (rc)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * If we failed to change the page state to shared, then it is not
+ * safe to release the page back to the system; leak it.
+ */
+ snp_leak_pages(pfn, npages - n);
+
+ return rc;
+}
+
static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
{
unsigned long npages = 1ul << order, paddr;
@@ -459,12 +484,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
return sev_write_init_ex_file();
}

+static int alloc_snp_host_map(struct sev_device *sev)
+{
+ struct page *page;
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ memset(map, 0, sizeof(*map));
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+ if (!page)
+ return -ENOMEM;
+
+ map->host = page_address(page);
+ }
+
+ return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ if (map->host) {
+ __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+ memset(map, 0, sizeof(*map));
+ }
+ }
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ map->active = false;
+
+ if (!paddr || !len)
+ return 0;
+
+ map->paddr = *paddr;
+ map->len = len;
+
+ /* If paddr points to guest memory then change the page state to firmware. */
+ if (guest) {
+ if (rmp_mark_pages_firmware(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ if (!map->host)
+ return -ENOMEM;
+
+ /* Check if the pre-allocated buffer can be used to fulfill the request. */
+ if (len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /* Transition the pre-allocated buffer to the firmware state. */
+ if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Set the paddr to use pre-allocated firmware buffer */
+ *paddr = __psp_pa(map->host);
+
+done:
+ map->active = true;
+ return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ if (!map->active)
+ return 0;
+
+ /* If paddr points to guest memory then restore the page state to hypervisor. */
+ if (guest) {
+ if (snp_reclaim_pages(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ /*
+ * Transition the pre-allocated buffer to hypervisor state before the access.
+ *
+ * This is because while changing the page state to firmware, the kernel unmaps
+ * the pages from the direct map, and to restore the direct map the pages must
+ * be transitioned back to the shared state.
+ */
+ if (snp_reclaim_pages(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Copy the response data from the firmware buffer to the caller's buffer. */
+ memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+ *paddr = map->paddr;
+
+done:
+ map->active = false;
+ return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+ switch (cmd) {
+ case SEV_CMD_PLATFORM_STATUS:
+ case SEV_CMD_GUEST_STATUS:
+ case SEV_CMD_LAUNCH_START:
+ case SEV_CMD_RECEIVE_START:
+ case SEV_CMD_LAUNCH_MEASURE:
+ case SEV_CMD_SEND_START:
+ case SEV_CMD_SEND_UPDATE_DATA:
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ case SEV_CMD_PEK_CSR:
+ case SEV_CMD_PDH_CERT_EXPORT:
+ case SEV_CMD_GET_ID:
+ case SEV_CMD_ATTESTATION_REPORT:
+ return true;
+ default:
+ return false;
+ }
+}
+
+#define prep_buffer(name, addr, len, guest, map) \
+ func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+ int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+ struct sev_device *sev = psp_master->sev_data;
+ bool from_fw = !to_fw;
+
+ /*
+ * After the command is completed, change the command buffer memory to
+ * hypervisor state.
+ *
+ * The immutable bit is automatically cleared by the firmware, so
+ * there is no need to reclaim the page.
+ */
+ if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
+ return -EFAULT;
+
+ /* No need to go further if firmware failed to execute command. */
+ if (fw_err)
+ return 0;
+ }
+
+ if (to_fw)
+ func = map_firmware_writeable;
+ else
+ func = unmap_firmware_writeable;
+
+ /*
+ * A command buffer may contain a system physical address. If the address
+ * points to host memory then use an intermediate firmware page, otherwise
+ * change the page state in the RMP table.
+ */
+ switch (cmd) {
+ case SEV_CMD_PDH_CERT_EXPORT:
+ if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+ pdh_cert_len, false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+ cert_chain_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_GET_ID:
+ if (prep_buffer(struct sev_data_get_id, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_PEK_CSR:
+ if (prep_buffer(struct sev_data_pek_csr, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_launch_update_data, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_MEASURE:
+ if (prep_buffer(struct sev_data_launch_measure, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_SECRET:
+ if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_DECRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_ENCRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_ATTESTATION_REPORT:
+ if (prep_buffer(struct sev_data_attestation_report, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_START:
+ if (prep_buffer(struct sev_data_send_start, session_address,
+ session_len, false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_data, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ default:
+ break;
+ }
+
+ /* The command buffer needs to be in the firmware state. */
+ if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+ }
+
+ return 0;
+
+err:
+ return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+ struct sev_device *sev = psp_master->sev_data;
+
+ /* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
+ return cmd < SEV_CMD_SNP_INIT && sev->snp_initialized;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+ return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+ return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
{
struct psp_device *psp = psp_master;
struct sev_device *sev;
unsigned int phys_lsb, phys_msb;
unsigned int reg, ret = 0;
+ void *cmd_buf;
int buf_len;

if (!psp || !psp->sev_data)
@@ -484,12 +792,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
* work for some memory, e.g. vmalloc'd addresses, and @data may not be
* physically contiguous.
*/
- if (data)
- memcpy(sev->cmd_buf, data, buf_len);
+ if (data) {
+ if (sev->cmd_buf_active >= 2)
+ return -EBUSY;
+
+ cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+ memcpy(cmd_buf, data, buf_len);
+ sev->cmd_buf_active++;
+
+ /*
+ * The behavior of the SEV-legacy commands is altered when the
+ * SNP firmware is in the INIT state.
+ */
+ if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
+ return -EFAULT;
+ } else {
+ cmd_buf = sev->cmd_buf;
+ }

/* Get the physical address of the command buffer */
- phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
- phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+ phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+ phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;

dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
cmd, phys_msb, phys_lsb, psp_timeout);
@@ -532,15 +856,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
ret = sev_write_init_ex_file_if_required(cmd);
}

- print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
- buf_len, false);
-
/*
* Copy potential output from the PSP back to data. Do this even on
* failure in case the caller wants to glean something from the error.
*/
- if (data)
- memcpy(data, sev->cmd_buf, buf_len);
+ if (data) {
+ /*
+ * Restore the page state after the command completes.
+ */
+ if (need_firmware_copy(cmd) &&
+ snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+ return -EFAULT;
+
+ memcpy(data, cmd_buf, buf_len);
+ sev->cmd_buf_active--;
+ }
+
+ print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+ buf_len, false);

return ret;
}
@@ -624,6 +957,14 @@ static int __sev_platform_init_locked(int *error)
dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
}

+ /*
+ * Allocate the intermediate buffers used for the legacy command handling.
+ */
+ if (rc != -ENODEV && alloc_snp_host_map(sev)) {
+ dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+ goto skip_legacy;
+ }
+
if (!sev_es_tmr) {
/* Obtain the TMR memory area for SEV-ES use */
sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
@@ -677,6 +1018,7 @@ static int __sev_platform_init_locked(int *error)
dev_info(sev->dev, "SEV API:%d.%d build:%d\n", sev->api_major,
sev->api_minor, sev->build);

+skip_legacy:
return 0;
}

@@ -1586,10 +1928,12 @@ int sev_dev_init(struct psp_device *psp)
if (!sev)
goto e_err;

- sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+ sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
if (!sev->cmd_buf)
goto e_sev;

+ sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
psp->sev_data = sev;

sev->dev = dev;
@@ -1655,6 +1999,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
snp_range_list = NULL;
}

+ /*
+ * The host map pages need to have the immutable bit cleared, so they
+ * must be freed before the SNP firmware shutdown.
+ */
+ free_snp_host_map(sev);
+
sev_snp_shutdown(&error);
}

@@ -1726,6 +2076,7 @@ void sev_pci_init(void)
return;

err:
+ free_snp_host_map(sev);
psp_master->sev_data = NULL;
}

diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 34767657beb5..19d79f9d4212 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
#define SEV_CMDRESP_CMD_SHIFT 16
#define SEV_CMDRESP_IOC BIT(0)

+#define MAX_SNP_HOST_MAP_BUFS 2
+
struct sev_misc_dev {
struct kref refcount;
struct miscdevice misc;
};

+struct snp_host_map {
+ u64 paddr;
+ u32 len;
+ void *host;
+ bool active;
+};
+
struct sev_device {
struct device *dev;
struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
u8 build;

void *cmd_buf;
+ void *cmd_buf_backup;
+ int cmd_buf_active;

bool snp_initialized;
+ struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
};

int sev_dev_init(struct psp_device *psp);
--
2.25.1


2023-06-12 04:41:29

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 22/51] crypto: ccp: Add the SNP_PLATFORM_STATUS command

From: Brijesh Singh <[email protected]>

The command can be used by userspace to query the SNP platform status
report. See the SEV-SNP spec for more details.
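
For illustration, userspace would issue the command roughly as follows
(minimal sketch; error handling trimmed):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/psp-sev.h>

    static int query_snp_status(struct sev_user_data_snp_status *status)
    {
            struct sev_issue_cmd input = {
                    .cmd  = SNP_PLATFORM_STATUS,
                    .data = (__u64)(unsigned long)status,
            };
            int fd, ret;

            fd = open("/dev/sev", O_RDWR);
            if (fd < 0)
                    return -1;

            /* input.error holds the firmware error code on failure */
            ret = ioctl(fd, SEV_ISSUE_CMD, &input);
            close(fd);

            return ret;
    }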

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 45 +++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 1 +
3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index bf593e88cfd9..11ea67c944df 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -61,6 +61,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
__u64 fw_err;
};

+The host ioctl should be issued on the /dev/sev device. The ioctl accepts a
+command ID and a command input structure.
+
+::
+
+	struct sev_issue_cmd {
+		/* Command ID */
+		__u32 cmd;
+
+		/* Command request structure */
+		__u64 data;
+
+		/* firmware error code on failure (see psp-sev.h) */
+		__u32 error;
+	};
+
2.1 SNP_GET_REPORT
------------------

@@ -118,6 +134,17 @@ be updated with the expected value.

See GHCB specification for further detail on how to parse the certificate blob.

+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_platform_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes the API major and minor versions, among other information. See
+the SEV-SNP specification for further details.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 10bb0a7dcfd6..0bfe9721c977 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1767,6 +1767,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
return ret;
}

+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_addr buf;
+ struct page *status_page;
+ void *data;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!status_page)
+ return -ENOMEM;
+
+ data = page_address(status_page);
+ if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
+ __free_pages(status_page, 0);
+ return -EFAULT;
+ }
+
+ buf.gctx_paddr = __psp_pa(data);
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+ /* Change the page state before accessing it */
+ if (snp_reclaim_pages(__pa(data), 1, true)) {
+ /* The page is leaked by snp_reclaim_pages() itself on failure */
+ return -EFAULT;
+ }
+
+ if (ret)
+ goto cleanup;
+
+ if (copy_to_user((void __user *)argp->data, data,
+ sizeof(struct sev_user_data_snp_status)))
+ ret = -EFAULT;
+
+cleanup:
+ __free_pages(status_page, 0);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1818,6 +1860,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_GET_ID2:
ret = sev_ioctl_do_get_id2(&input);
break;
+ case SNP_PLATFORM_STATUS:
+ ret = sev_ioctl_snp_platform_status(&input);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 7d8a2dd20273..4dc6a3e7b3d5 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
SEV_PEK_CERT_IMPORT,
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
+ SNP_PLATFORM_STATUS,

SEV_MAX,
};
--
2.25.1


2023-06-12 04:41:31

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 23/51] KVM: SEV: Select CONFIG_KVM_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y

AMD SEV relies on the restricted/protected memory support to run guests
in some cases (such as SEV lazy-pinning), so make sure to enable that
support with the CONFIG_KVM_PROTECTED_VM build option.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 718010600956..638679a4e5dc 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,6 +126,7 @@ config KVM_AMD_SEV
bool "AMD Secure Encrypted Virtualization (SEV) support"
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+ select KVM_PROTECTED_VM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1


2023-06-12 04:42:22

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 24/51] KVM: SVM: Add support to handle AP reset MSR protocol

From: Tom Lendacky <[email protected]>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
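
For reference, the MSR protocol packs everything into a single GHCB MSR
value: GHCBData[11:0] carries the response code and GHCBData[63:12] the
result, which is preset to 0 and only set non-zero once a SIPI has been
delivered. A sketch of the response encoding using the masks added
below:

    u64 msr_val = GHCB_MSR_AP_RESET_HOLD_RESP |
                  ((1ULL & GHCB_MSR_AP_RESET_HOLD_RESULT_MASK) <<
                   GHCB_MSR_AP_RESET_HOLD_RESULT_POS);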

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 9eb20b416251..a4fb53fd15d7 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
/* AP Reset Hold */
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c25aeb550cd9..b88295aa7124 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,10 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2569,6 +2573,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)

void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;

@@ -2781,6 +2788,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -2880,6 +2903,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3040,13 +3064,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}

- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);

- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f44751dd8d5d..50be41fa16a0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -192,6 +192,7 @@ struct vcpu_sev_es_state {
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;

/* SEV-ES scratch area support */
void *ghcb_sa;
--
2.25.1


2023-06-12 04:42:34

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 25/51] KVM: SVM: Add GHCB handling for Hypervisor Feature Support requests

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the hypervisor. Add handling for the hypervisor
feature request, both via the GHCB MSR protocol and as a VMGEXIT NAE
event.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
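
For reference, the feature response is encoded like the other MSR
protocol responses: GHCBData[11:0] holds GHCB_MSR_HV_FT_RESP and
GHCBData[63:12] the advertised feature bitmap. A sketch using the
definitions added below:

    u64 msr_val = GHCB_MSR_HV_FT_RESP |
                  ((GHCB_HV_FT_SUPPORTED & GHCB_MSR_HV_FT_MASK) <<
                   GHCB_MSR_HV_FT_POS);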

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 14 ++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++-
3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index a4fb53fd15d7..aaea4afcda98 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b88295aa7124..2bceb0060880 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2538,6 +2538,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2804,6 +2805,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -2928,6 +2936,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES: {
+ ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 50be41fa16a0..1ab117daebd9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -711,9 +711,10 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);

/* sev.c */

-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

+#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP

extern unsigned int max_sev_asid;

--
2.25.1


2023-06-12 04:43:40

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 26/51] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

From: Brijesh Singh <[email protected]>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2mb aligned and software accesses any
part of the associated 2mb region with a hupage, the CPU will
incorrectly treat the entire 2mb region as in-use and signal a spurious
RMP violation #PF.

The recommended is to not use the hugepage for the VMCB, VMSA or
AVIC backing page for similar reasons. Add a generic allocator that will
ensure that the page returns is not hugepage (2mb or 1gb) and is safe to
be used when SEV-SNP is enabled. Also implement similar handling for the
VMCB/VMSA pages of nested guests.

Co-developed-by: Marc Orr <[email protected]>
Signed-off-by: Marc Orr <[email protected]>
Reported-by: Alper Gun <[email protected]> # for nested VMSA case
Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: squash in nested guest handling from Ashish]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/lapic.c | 5 ++++-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/sev.c | 33 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 17 ++++++++++++---
arch/x86/kvm/svm/svm.h | 1 +
7 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 48f043de2ec0..28456b497198 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -135,6 +135,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
KVM_X86_OP_OPTIONAL(gmem_invalidate)
+KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c26f76641121..8d2bb3ff66a2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1743,6 +1743,7 @@ struct kvm_x86_ops {
int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
+ void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e542cf285b51..94311938651a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2769,7 +2769,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)

vcpu->arch.apic = apic;

- apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+ if (kvm_x86_ops.alloc_apic_backing_page)
+ apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+ else
+ apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!apic->regs) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 96936ddf1b3c..fb981c8b82c4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1185,7 +1185,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
if (svm->nested.initialized)
return 0;

- vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb02_page = snp_safe_alloc_page(&svm->vcpu);
if (!vmcb02_page)
return -ENOMEM;
svm->nested.vmcb02.ptr = page_address(vmcb02_page);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2bceb0060880..69b57e8f0a7f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3104,3 +3104,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
break;
}
}
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+ unsigned long pfn;
+ struct page *p;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+ /*
+ * Allocate an SNP-safe page to work around the SNP erratum where
+ * the CPU will incorrectly signal an RMP violation #PF if a
+ * hugepage (2MB or 1GB) collides with the RMP entry of a VMCB,
+ * VMSA or AVIC backing page. The recommended workaround is to
+ * not use a hugepage.
+ *
+ * Allocate one extra page, use a page which is not 2mb aligned
+ * and free the other.
+ */
+ p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+ if (!p)
+ return NULL;
+
+ split_page(p, 1);
+
+ pfn = page_to_pfn(p);
+ if (IS_ALIGNED(pfn, PTRS_PER_PMD))
+ __free_page(p++);
+ else
+ __free_page(p + 1);
+
+ return p;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index eb308c9994f9..065167b42f90 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -668,7 +668,7 @@ static int svm_cpu_init(int cpu)
int ret = -ENOMEM;

memset(sd, 0, sizeof(struct svm_cpu_data));
- sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ sd->save_area = snp_safe_alloc_page(NULL);
if (!sd->save_area)
return ret;

@@ -1381,7 +1381,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);

err = -ENOMEM;
- vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb01_page = snp_safe_alloc_page(vcpu);
if (!vmcb01_page)
goto out;

@@ -1390,7 +1390,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
* SEV-ES guests require a separate VMSA page used to contain
* the encrypted register state of the guest.
*/
- vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmsa_page = snp_safe_alloc_page(vcpu);
if (!vmsa_page)
goto error_free_vmcb_page;

@@ -4770,6 +4770,16 @@ static int svm_vm_init(struct kvm *kvm)
return 0;
}

+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+ struct page *page = snp_safe_alloc_page(vcpu);
+
+ if (!page)
+ return NULL;
+
+ return page_address(page);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,

@@ -4900,6 +4910,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+ .alloc_apic_backing_page = svm_alloc_apic_backing_page,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1ab117daebd9..e45b54e95495 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -741,6 +741,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);

/* vmenter.S */

--
2.25.1


2023-06-12 04:43:53

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 27/51] KVM: SVM: Add initial SEV-SNP support

From: Brijesh Singh <[email protected]>

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protections. SEV-SNP adds strong memory
integrity protection to help prevent malicious hypervisor-based attacks
such as data replay, memory re-mapping, and more, in order to create an
isolated execution environment.

SNP support is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 10 +++++++++-
arch/x86/kvm/svm/svm.h | 8 ++++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 69b57e8f0a7f..f5fcf6c33583 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,9 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
#define AP_RESET_HOLD_NONE 0
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
@@ -2169,6 +2172,7 @@ void __init sev_hardware_setup(void)
{
#ifdef CONFIG_KVM_AMD_SEV
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;

@@ -2248,12 +2252,16 @@ void __init sev_hardware_setup(void)
if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
goto out;

- pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+ pr_info("SEV-ES %ssupported: %u ASIDs\n",
+ sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);

out:
sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+ sev_snp_enabled = sev_snp_supported;
#endif
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e45b54e95495..6974d63c84f9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -77,6 +77,7 @@ enum {
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
+ bool snp_active; /* SEV-SNP enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -346,6 +347,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}

+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_es_guest(kvm) && sev->snp_active;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1


2023-06-12 04:44:57

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 28/51] KVM: SVM: Add KVM_SNP_INIT command

From: Brijesh Singh <[email protected]>

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be
the first command issued. When creating an SEV-SNP guest, the VMM must
use this command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.

For now the flags value must be zero; it will be extended by future SNP
support to communicate optional features (such as restricted interrupt
injection).
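
A minimal VMM-side sketch (illustrative only; vm_fd is assumed to be
the KVM VM fd and sev_fd an open /dev/sev fd, issued through the
existing KVM_MEMORY_ENCRYPT_OP ioctl like other SEV commands):

    struct kvm_snp_init init = { .flags = 0 };
    struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_SNP_INIT,
            .data   = (__u64)(unsigned long)&init,
            .sev_fd = sev_fd,
    };

    if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
            /* on -EOPNOTSUPP, init.flags holds the supported flags */
            return -1;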

Co-developed-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 27 +++++++++++++
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 39 ++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 4 ++
include/uapi/linux/kvm.h | 13 +++++++
5 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 487b6328b3e7..1240d28badd6 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.

Returns: 0 on success, -negative on error

+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_snp_init {
+ __u64 flags;
+ };
+
+The flags bitmap is defined as::
+
+ /* enable the restricted injection */
+ #define KVM_SEV_SNP_RESTRICTED_INJET (1<<0)
+
+ /* enable the restricted injection timer */
+ #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1<<1)
+
+If the specified flags are not supported then -EOPNOTSUPP is returned, and the
+supported flags are written back to the structure.
+
References
==========

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e7c7379d6ac7..ac8edfdd60fa 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -288,6 +288,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_

#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f5fcf6c33583..70e0576a32d0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -243,6 +243,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_decommission(handle);
}

+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_snp_init params;
+ int ret = 0;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+ ret = -EOPNOTSUPP;
+
+ params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -256,12 +275,19 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;

sev->active = true;
- sev->es_active = argp->id == KVM_SEV_ES_INIT;
+ sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+ sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
asid = sev_asid_new(sev);
if (asid < 0)
goto e_no_asid;
sev->asid = asid;

+ if (sev->snp_active) {
+ ret = verify_snp_init_flags(kvm, argp);
+ if (ret)
+ goto e_free;
+ }
+
ret = sev_platform_init(&argp->error);
if (ret)
goto e_free;
@@ -277,6 +303,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
sev_asid_free(sev);
sev->asid = 0;
e_no_asid:
+ sev->snp_active = false;
sev->es_active = false;
sev->active = false;
return ret;
@@ -612,6 +639,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;

+ /* Enable the SEV-SNP feature */
+ if (sev_snp_guest(svm->vcpu.kvm))
+ save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);

@@ -1864,6 +1895,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
}

switch (sev_cmd.id) {
+ case KVM_SEV_SNP_INIT:
+ if (!sev_snp_enabled) {
+ r = -ENOTTY;
+ goto out;
+ }
+ fallthrough;
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6974d63c84f9..4360cf04f53a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -74,6 +74,9 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))

+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS 0x0
+
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
@@ -89,6 +92,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ u64 snp_init_flags;
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0fa665e8862a..43b6291e3a80 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1926,6 +1926,9 @@ enum sev_cmd_id {
/* Guest Migration Extension */
KVM_SEV_SEND_CANCEL,

+ /* SNP specific commands */
+ KVM_SEV_SNP_INIT,
+
KVM_SEV_NR_MAX,
};

@@ -2022,6 +2025,16 @@ struct kvm_sev_receive_update_data {
__u32 trans_len;
};

+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1 << 1)
+
+struct kvm_snp_init {
+ __u64 flags;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1


2023-06-12 04:46:25

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

From: Brijesh Singh <[email protected]>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.
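
A minimal VMM-side sketch (illustrative only; vm_fd/sev_fd as for
KVM_SNP_INIT above, and the policy value is a placeholder to be chosen
per the SEV-SNP specification):

    struct kvm_sev_snp_launch_start start = {
            .policy = 0x30000,      /* placeholder policy value */
    };
    struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_SNP_LAUNCH_START,
            .data   = (__u64)(unsigned long)&start,
            .sev_fd = sev_fd,
    };

    if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
            return -1;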

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: hold sev_deactivate_lock when calling SEV_CMD_SNP_DECOMMISSION]
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 ++++
arch/x86/kvm/svm/sev.c | 126 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
include/uapi/linux/kvm.h | 10 ++
4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 1240d28badd6..3293e86f9b8a 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
If the specified flags is not supported then return -EOPNOTSUPP, and the supported
flags are returned.

+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, a migration agent (if any) and a guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u64 ma_uaddr; /* userspace address of migration agent */
+ __u8 ma_en; /* 1 if the migration agent is enabled */
+ __u8 imi_en; /* set IMI to 1. */
+ __u8 gosvw[16]; /* guest OS visible workarounds */
+ };
+
+See the SEV-SNP specification for further detail on the launch input.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 70e0576a32d0..e65f3be67c23 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -22,6 +22,7 @@
#include <asm/pkru.h>
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
+#include <asm/sev-host.h>

#include "mmu.h"
#include "x86.h"
@@ -75,6 +76,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;

+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -100,12 +103,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
down_write(&sev_deactivate_lock);

wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);

up_write(&sev_deactivate_lock);

if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);

return ret;
}
@@ -1871,6 +1879,80 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.gctx_paddr = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -1961,6 +2043,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2152,6 +2237,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.gctx_paddr = __sme_pa(sev->snp_context);
+ down_write(&sev_deactivate_lock);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context")) {
+ up_write(&sev_deactivate_lock);
+ return ret;
+ }
+
+ up_write(&sev_deactivate_lock);
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2193,7 +2305,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}

- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4360cf04f53a..9a7cafb018fe 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -93,6 +93,7 @@ struct kvm_sev_info {
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
u64 snp_init_flags;
+ void *snp_context; /* SNP guest context page */
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 43b6291e3a80..b4c7ac9710d3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1928,6 +1928,7 @@ enum sev_cmd_id {

/* SNP specific commands */
KVM_SEV_SNP_INIT,
+ KVM_SEV_SNP_LAUNCH_START,

KVM_SEV_NR_MAX,
};
@@ -2035,6 +2036,15 @@ struct kvm_snp_init {
__u64 flags;
};

+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u64 ma_uaddr;
+ __u8 ma_en;
+ __u8 imi_en;
+ __u8 gosvw[16];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1


2023-06-12 04:46:31

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 31/51] KVM: Split out memory attribute xarray updates to helper function

This will be useful to other callers that need to update memory
attributes for things like setting up the initial private memory payload
for a guest.
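
For reference, the first user of this helper (the SNP launch-update path
added later in this series) marks a just-populated GFN range as private
like so, taking kvm->slots_lock to serialize with KVM_SET_MEMORY_ATTRIBUTES
updates:

    mutex_lock(&kvm->slots_lock);
    kvm_vm_set_region_attr(kvm, start, end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
    mutex_unlock(&kvm->slots_lock);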

Signed-off-by: Michael Roth <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 26 ++++++++++++++++++--------
2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9a9d4141ba74..56055692cdfa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -991,6 +991,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module);
void kvm_exit(void);

void kvm_get_kvm(struct kvm *kvm);
+int kvm_vm_set_region_attr(struct kvm *kvm, gfn_t start, gfn_t end, u64 attributes);
bool kvm_get_kvm_safe(struct kvm *kvm);
void kvm_put_kvm(struct kvm *kvm);
bool file_is_kvm(struct file *file);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48beffca6b67..0c0a75affab6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2466,12 +2466,28 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
kvm_flush_remote_tlbs(kvm);
}

+int kvm_vm_set_region_attr(struct kvm *kvm, gfn_t start, gfn_t end,
+ u64 attributes)
+{
+ gfn_t index;
+ void *entry;
+
+ entry = attributes ? xa_mk_value(attributes) : NULL;
+
+ for (index = start; index < end; index++)
+ if (xa_err(xa_store(&kvm->mem_attr_array, index, entry,
+ GFP_KERNEL_ACCOUNT)))
+ break;
+
+ return index;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_set_region_attr);
+
static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
struct kvm_memory_attributes *attrs)
{
gfn_t start, end;
unsigned long i;
- void *entry;

/* flags is currently not used. */
if (attrs->flags)
@@ -2486,8 +2502,6 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
start = attrs->address >> PAGE_SHIFT;
end = (attrs->address + attrs->size - 1 + PAGE_SIZE) >> PAGE_SHIFT;

- entry = attrs->attributes ? xa_mk_value(attrs->attributes) : NULL;
-
mutex_lock(&kvm->slots_lock);

KVM_MMU_LOCK(kvm);
@@ -2495,11 +2509,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
kvm_mmu_invalidate_range_add(kvm, start, end);
KVM_MMU_UNLOCK(kvm);

- for (i = start; i < end; i++) {
- if (xa_err(xa_store(&kvm->mem_attr_array, i, entry,
- GFP_KERNEL_ACCOUNT)))
- break;
- }
+ i = kvm_vm_set_region_attr(kvm, start, end, attrs->attributes);

KVM_MMU_LOCK(kvm);
if (i > start)
--
2.25.1


2023-06-12 04:46:33

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 32/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

From: Brijesh Singh <[email protected]>

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with the KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages
into the guest's memory: the secrets page and the CPUID page.

When terminating the guest, reclaim the guest pages added to the RMP
table. If the reclaim fails, the pages are no longer safe to be
released back to the system, so leak them instead.

For more information see the SEV-SNP specification.
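
A userspace sketch of issuing this command (the vm_fd/sev_fd plumbing is
an assumption for illustration; the struct fields are the ones added below):

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Encrypt/measure 'len' bytes at 'uaddr', mapped at 'start_gfn'. */
    static int snp_launch_update(int vm_fd, int sev_fd, __u64 start_gfn,
                                 __u64 uaddr, __u32 len, __u8 page_type)
    {
            struct kvm_sev_snp_launch_update update;
            struct kvm_sev_cmd cmd;

            memset(&update, 0, sizeof(update));
            update.start_gfn = start_gfn;
            update.uaddr = uaddr;
            update.len = len;
            update.page_type = page_type;   /* e.g. KVM_SEV_SNP_PAGE_TYPE_NORMAL */

            memset(&cmd, 0, sizeof(cmd));
            cmd.id = KVM_SEV_SNP_LAUNCH_UPDATE;
            cmd.data = (__u64)(unsigned long)&update;
            cmd.sev_fd = sev_fd;

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }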

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 28 +++
arch/x86/kvm/svm/sev.c | 189 ++++++++++++++++++
include/uapi/linux/kvm.h | 19 ++
3 files changed, 236 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 3293e86f9b8a..d8492af09796 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error

See the SEV-SNP specification for further detail on the launch input.

+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE command is used for encrypting a memory region. It also
+calculates a measurement of the memory contents. The measurement is a signature
+of the memory contents that can be sent to the guest owner as an attestation
+that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 start_gfn; /* Guest page number to start from. */
+		__u64 uaddr;		/* userspace address of memory region to encrypt */
+ __u32 len; /* length of memory region */
+ __u8 imi_page; /* 1 if memory is part of the IMI */
+ __u8 page_type; /* page type */
+ __u8 vmpl3_perms; /* VMPL3 permission mask */
+ __u8 vmpl2_perms; /* VMPL2 permission mask */
+ __u8 vmpl1_perms; /* VMPL1 permission mask */
+ };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e65f3be67c23..6a82767d940f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -234,6 +234,36 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}

+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (rc) {
+ /*
+		 * If the reclaim failed, then the page is no longer safe
+ * to use.
+ */
+ snp_leak_pages(pfn, 1);
+ }
+
+ return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (rc && leak)
+ snp_leak_pages(pfn,
+ page_level_size(level) >> PAGE_SHIFT);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -1953,6 +1983,162 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}

+static int snp_launch_update_gfn_handler(struct kvm *kvm,
+ struct kvm_gfn_range *range,
+ void *opaque)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_memory_slot *memslot = range->slot;
+ struct sev_data_snp_launch_update data = {0};
+ struct kvm_sev_snp_launch_update params;
+ struct kvm_sev_cmd *argp = opaque;
+ int *error = &argp->error;
+ int i, n = 0, ret = 0;
+ unsigned long npages;
+ kvm_pfn_t *pfns;
+ gfn_t gfn;
+
+ if (!kvm_slot_can_be_private(memslot)) {
+ pr_err("SEV-SNP requires restricted memory.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
+ pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
+ return -EFAULT;
+ }
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+
+ npages = range->end - range->start;
+ pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
+ if (!pfns)
+ return -ENOMEM;
+
+ pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
+ range->start, range->end, params.page_type);
+
+ for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
+ int order, level;
+ bool assigned;
+ void *kvaddr;
+
+ ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order);
+ if (ret)
+ goto e_release;
+
+ n++;
+ ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
+		if (ret || assigned) {
+			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n",
+			       gfn, ret, assigned);
+			ret = -EFAULT;
+			goto e_release;
+		}
+
+ kvaddr = pfn_to_kaddr(pfns[i]);
+ if (!virt_addr_valid(kvaddr)) {
+ pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
+ ret = -EINVAL;
+ goto e_release;
+ }
+
+ ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret) {
+ pr_err("Guest read failed, ret: 0x%x\n", ret);
+ goto e_release;
+ }
+
+ ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ sev_get_asid(kvm), true);
+ if (ret) {
+ ret = -EFAULT;
+ goto e_release;
+ }
+
+ data.address = __sme_set(pfns[i] << PAGE_SHIFT);
+ data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
+ data.page_type = params.page_type;
+ data.vmpl3_perms = params.vmpl3_perms;
+ data.vmpl2_perms = params.vmpl2_perms;
+ data.vmpl1_perms = params.vmpl1_perms;
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, error);
+ if (ret) {
+ pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+ ret, *error);
+ snp_page_reclaim(pfns[i]);
+
+ /*
+ * When invalid CPUID function entries are detected, the firmware
+			 * corrects these entries for debugging purposes and leaves the
+			 * page unencrypted so it can be provided to users for debugging
+ * and error-reporting.
+ *
+ * Copy the corrected CPUID page back to shared memory so
+			 * userspace can retrieve this information.
+ */
+ if (params.page_type == SNP_PAGE_TYPE_CPUID &&
+ *error == SEV_RET_INVALID_PARAM) {
+ int ret;
+
+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret)
+ pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+ ret);
+ }
+
+ goto e_release;
+ }
+ }
+
+ /*
+ * Memory attribute updates via KVM_SET_MEMORY_ATTRIBUTES are serialized
+ * via kvm->slots_lock, so use the same protocol for updating them here.
+ */
+ mutex_lock(&kvm->slots_lock);
+ kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+ mutex_unlock(&kvm->slots_lock);
+
+e_release:
+ /* Content of memory is updated, mark pages dirty */
+ for (i = 0; i < n; i++) {
+ set_page_dirty(pfn_to_page(pfns[i]));
+ mark_page_accessed(pfn_to_page(pfns[i]));
+
+ /*
+		 * If there was an error, update the RMP entry to return page
+		 * ownership to the hypervisor.
+ */
+ if (ret)
+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ put_page(pfn_to_page(pfns[i]));
+ }
+
+ kvfree(pfns);
+ return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_update params;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+ snp_launch_update_gfn_handler, argp);
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2046,6 +2232,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b4c7ac9710d3..4961d2e67a4b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1929,6 +1929,7 @@ enum sev_cmd_id {
/* SNP specific commands */
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
+ KVM_SEV_SNP_LAUNCH_UPDATE,

KVM_SEV_NR_MAX,
};
@@ -2045,6 +2046,24 @@ struct kvm_sev_snp_launch_start {
__u8 pad[6];
};

+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 start_gfn;
+ __u64 uaddr;
+ __u32 len;
+ __u8 imi_page;
+ __u8 page_type;
+ __u8 vmpl3_perms;
+ __u8 vmpl2_perms;
+ __u8 vmpl1_perms;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1


2023-06-12 04:46:41

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating private memory

TODO: add a CONFIG option that can be used to completely skip the arch
invalidation loop and avoid __weak references for archs/platforms that
don't need an additional invalidation hook.

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Add hooks to wire up handling of this sort when freeing memory in
response to FALLOC_FL_PUNCH_HOLE operations.

Also issue invalidations of all allocated pages when releasing the gmem
file so that the pages are not left in an unusable state when they get
freed back to the host.
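
As a rough illustration of the sort of hook this enables, an SEV-SNP
implementation might walk the PFN range and transition each page back to
hypervisor-owned state (hypothetical sketch; host_rmp_make_shared() is
introduced later in this series, and hugepage handling is omitted):

    static void sev_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start,
                                    kvm_pfn_t end)
    {
            kvm_pfn_t pfn;

            /* Return each PFN to shared state; leak pages that fail. */
            for (pfn = start; pfn < end; pfn++)
                    host_rmp_make_shared(pfn, PG_LEVEL_4K, true);
    }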

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 6 ++++
include/linux/kvm_host.h | 3 ++
virt/kvm/guest_mem.c | 48 ++++++++++++++++++++++++++++--
5 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 439ba4beb5af..48f043de2ec0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bd03b6cf40fb..b3bd24f2a390 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1735,6 +1735,7 @@ struct kvm_x86_ops {

int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+ void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9e1c9369be2..10d76afa23d9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13252,6 +13252,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

+#ifdef CONFIG_KVM_PRIVATE_MEM
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+ static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
+}
+#endif

int kvm_spec_ctrl_test_value(u64 value)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1a47cedae8a1..7de06add2235 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2343,6 +2343,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
#ifdef CONFIG_KVM_PRIVATE_MEM
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *order);
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
#else
static inline int kvm_gmem_get_pfn(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2351,6 +2352,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
KVM_BUG_ON(1, kvm);
return -EIO;
}
+
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
#endif /* CONFIG_KVM_PRIVATE_MEM */

#endif
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index cdf2d84683c8..a7e926af4255 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -140,16 +140,58 @@ static void kvm_gmem_invalidate_end(struct kvm *kvm, struct kvm_gmem *gmem,
KVM_MMU_UNLOCK(kvm);
}

+void __weak kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+}
+
+/* Handle arch-specific hooks needed before releasing guarded pages. */
+static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
+ pgoff_t start, pgoff_t end)
+{
+ pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
+ pgoff_t index = start;
+
+ end = min(end, file_end);
+
+ while (index < end) {
+ struct folio *folio;
+ unsigned int order;
+ struct page *page;
+ kvm_pfn_t pfn;
+
+ folio = __filemap_get_folio(file->f_mapping, index,
+ FGP_LOCK, 0);
+ if (!folio) {
+ index++;
+ continue;
+ }
+
+ page = folio_file_page(folio, index);
+ pfn = page_to_pfn(page);
+ order = folio_order(folio);
+
+ kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+
+ index = folio_next_index(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+
+ cond_resched();
+ }
+}
+
static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len)
{
struct kvm_gmem *gmem = file->private_data;
- pgoff_t start = offset >> PAGE_SHIFT;
- pgoff_t end = (offset + len) >> PAGE_SHIFT;
struct kvm *kvm = gmem->kvm;
+ pgoff_t start, end;

if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
return 0;

+ start = offset >> PAGE_SHIFT;
+ end = (offset + len) >> PAGE_SHIFT;
+
/*
	 * Bindings must be stable across invalidation to ensure the start+end
* are balanced.
@@ -158,6 +200,7 @@ static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len)

kvm_gmem_invalidate_begin(kvm, gmem, start, end);

+ kvm_gmem_issue_arch_invalidate(kvm, file, start, end);
truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1);

kvm_gmem_invalidate_end(kvm, gmem, start, end);
@@ -264,6 +307,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
* pointed at this file.
*/
kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul);
+ kvm_gmem_issue_arch_invalidate(gmem->kvm, file, 0, -1ul);
truncate_inode_pages_final(file->f_mapping);
kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul);

--
2.25.1


2023-06-12 04:46:41

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 30/51] KVM: Add HVA range operator

From: Vishal Annapurve <[email protected]>

Introduce an HVA range operator so that other KVM subsystems
can operate on an HVA range.
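
A hypothetical usage sketch (my_gfn_handler is a placeholder; its
signature matches the new kvm_hva_range_op_t, and a non-zero return
stops the walk and is propagated to the caller):

    static int my_gfn_handler(struct kvm *kvm, struct kvm_gfn_range *range,
                              void *data)
    {
            /* Operate on GFNs [range->start, range->end) in range->slot. */
            return 0;
    }

    /* From a context holding the appropriate VM references: */
    ret = kvm_vm_do_hva_range_op(kvm, hva_start, hva_end, my_gfn_handler, NULL);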

Signed-off-by: Vishal Annapurve <[email protected]>
[mdr: minor checkpatch alignment fixups]
Signed-off-by: Michael Roth <[email protected]>
---
include/linux/kvm_host.h | 6 +++++
virt/kvm/kvm_main.c | 49 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7de06add2235..9a9d4141ba74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1400,6 +1400,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
void kvm_mmu_invalidate_end(struct kvm *kvm);

+typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
+ struct kvm_gfn_range *range, void *data);
+
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
+
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 422d49634c56..48beffca6b67 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -642,6 +642,55 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
return (int)ret;
}

+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
+{
+ int ret = 0;
+ struct kvm_gfn_range gfn_range;
+ struct kvm_memory_slot *slot;
+ struct kvm_memslots *slots;
+ int i, idx;
+
+ if (WARN_ON_ONCE(hva_end <= hva_start))
+ return -EINVAL;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+ struct interval_tree_node *node;
+
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot_in_hva_range(node, slots,
+ hva_start, hva_end - 1) {
+ unsigned long start, end;
+
+ slot = container_of(node, struct kvm_memory_slot,
+ hva_node[slots->node_idx]);
+ start = max(hva_start, slot->userspace_addr);
+ end = min(hva_end, slot->userspace_addr +
+ (slot->npages << PAGE_SHIFT));
+
+ /*
+ * {gfn(page) | page intersects with [hva_start, hva_end)} =
+ * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+ */
+ gfn_range.start = hva_to_gfn_memslot(start, slot);
+ gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
+ gfn_range.slot = slot;
+
+ ret = handler(kvm, &gfn_range, data);
+ if (ret)
+ goto e_ret;
+ }
+ }
+
+e_ret:
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_do_hva_range_op);
+
static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
unsigned long start,
unsigned long end,
--
2.25.1


2023-06-12 04:47:04

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 34/51] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP guests are required to perform GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, a guest must register the
vCPU's GHCB GPA. If the hypervisor can work with the guest-requested GPA,
it must respond back with the same GPA; otherwise it returns -1.

On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
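
For reference, the MSR protocol encoding implied by the defines below
looks roughly like this from the guest side (sketch; GHCBInfo occupies
bits 11:0 and the GFN bits 63:12, mirroring the GHCB specification):

    /* Build a GHCB GPA registration request for 'gfn'. */
    u64 msr_val = GHCB_MSR_REG_GPA_REQ |
                  ((gfn & GHCB_MSR_GPA_VALUE_MASK) << GHCB_MSR_GPA_VALUE_POS);

    /*
     * After the VMGEXIT, the guest expects GHCB_MSR_REG_GPA_RESP with the
     * same GFN echoed back in bits 63:12 if the hypervisor accepted the
     * registration.
     */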

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index aaea4afcda98..d499730e1f15 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a7cbdc24ccdb..44fdcf407759 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3312,6 +3312,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PREF_GPA_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3378,6 +3399,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)

exit_code = ghcb_get_sw_exit_code(ghcb);

+	/* SEV-SNP guests require the GHCB GPA to be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9a7cafb018fe..02edbdd443e4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -205,6 +205,8 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+ u64 ghcb_registered_gpa;
};

struct vcpu_svm {
@@ -359,6 +361,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
return sev_es_guest(kvm) && sev->snp_active;
}

+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1


2023-06-12 04:47:17

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 35/51] KVM: SVM: Add KVM_EXIT_VMGEXIT

For private memslots, GHCB page state change requests will be forwarded
to userspace for processing. Define a new KVM_EXIT_VMGEXIT for exits of
this type, as well as other potential userspace handling for VMGEXITs in
the future.
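
A hypothetical VMM-side handler for the new exit (sketch;
is_msr_protocol(), psc_handle_msr() and psc_handle_ghcb() are
placeholders for the PSC processing described by the GHCB spec,
including the resulting KVM_SET_MEMORY_ATTRIBUTES calls):

    static void handle_vmgexit(struct kvm_run *run)
    {
            __u64 ghcb_msr = run->vmgexit.ghcb_msr;

            if (is_msr_protocol(ghcb_msr))
                    /* MSR-based request: reply via the GHCB MSR contents. */
                    run->vmgexit.ghcb_msr = psc_handle_msr(ghcb_msr);
            else
                    /* Page-based request: 'ret' becomes SW_EXITINFO2. */
                    run->vmgexit.ret = psc_handle_ghcb(ghcb_msr);
    }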

Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/kvm/api.rst | 34 ++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 6 ++++++
2 files changed, 40 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index df37aa11512d..028fd3fa50a7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6780,6 +6780,40 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

+::
+
+ /* KVM_EXIT_VMGEXIT */
+ struct {
+ __u64 ghcb_msr; /* GHCB MSR contents */
+ __u64 ret; /* user -> kernel return value */
+	} vmgexit;
+
+If exit reason is KVM_EXIT_VMGEXIT then it indicates that an SEV-SNP guest has
+issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. This is generally handled via the Guest-Hypervisor Communication
+Block (GHCB) specification. The value of 'ghcb_msr' will be the contents of
+the GHCB MSR register at the time of the VMGEXIT, which can either be the GPA
+of the GHCB page for page-based GHCB requests, or an encoding of an MSR-based
+GHCB request. The mechanism to distinguish between these two and determine the
+type of request is the same as what is documented in the GHCB specification.
+
+Not all VMGEXITs or GHCB requests will be forwarded to userspace. Currently
+this will only be the case for "SNP Page State Change" requests (PSCs), and
+only for the subset of these which involve actual shared <-> private
+transitions. Userspace is expected to process these requests in accordance
+with the GHCB specification and issue KVM_SET_MEMORY_ATTRIBUTE ioctls to
+perform the shared/private transitions.
+
+GHCB page-based PSC requests require returning a 64-bit return value to the
+guest via the SW_EXITINFO2 field of the vCPU's VMCB structure, as documented
+in the GHCB. Userspace must set 'ret' to what the GHCB specification documents
+the SW_EXITINFO2 VMCB field should be set to after processing a PSC request.
+
+For MSR-based PSC requests, userspace must set the value of 'ghcb_msr' to be
+the same as what the GHCB specification documents the actual GHCB MSR register
+should be set to after processing a PSC request.
+

6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1fb6a6615d09..175b958f103f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -279,6 +279,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_RISCV_CSR 36
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_MEMORY_FAULT 38
+#define KVM_EXIT_VMGEXIT 50

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -527,6 +528,11 @@ struct kvm_run {
__u64 gpa;
__u64 size;
} memory;
+ /* KVM_EXIT_VMGEXIT */
+ struct {
+ __u64 ghcb_msr; /* GHCB MSR contents */
+ __u64 ret; /* user -> kernel */
+ } vmgexit;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1


2023-06-12 04:47:28

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 38/51] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use

From: Brijesh Singh <[email protected]>

While resolving the RMP page fault, there may be cases where the page
level between the RMP entry and TDP does not match and the 2M RMP entry
must be split into 4K RMP entries, or a 2M TDP mapping may need to be
broken into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap should force the TDP to
get rebuilt with the new page level.
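
The usage pattern in the RMP fault handler added later in this series is
roughly:

    /* After PSMASH'ing the 2M RMP entry covering this GFN... */
    kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);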

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.h | 2 --
arch/x86/kvm/mmu/mmu.c | 1 +
3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8d2bb3ff66a2..026bfc4446ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1851,6 +1851,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+

int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..963c734642f6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -235,8 +235,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return -(u32)fault & errcode;
}

-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);

int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d3983b9aa7e..dff0eb018b27 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6732,6 +6732,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,

return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);

static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
--
2.25.1


2023-06-12 04:47:46

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 37/51] KVM: SVM: Add support to handle Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.

Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2afc59b86b91..9b9dff7728c8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3039,6 +3039,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
+ case SVM_VMGEXIT_PSC:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3242,6 +3243,15 @@ static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
return 1; /* resume */
}

+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, vcpu->run->vmgexit.ret);
+
+ return 1; /* resume */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3485,6 +3495,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_PSC:
+		/* Let userspace handle allocating/deallocating backing pages. */
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
+ vcpu->arch.complete_userspace_io = snp_complete_psc;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1


2023-06-12 04:49:48

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 36/51] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell KVM MMU whether or not a particular GFN should be
backed by private memory or not.

Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.
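
For illustration, the conversion ioctl userspace is expected to issue in
response looks roughly like this (sketch; struct kvm_memory_attributes
and KVM_SET_MEMORY_ATTRIBUTES come from the UPM/gmem base series):

    struct kvm_memory_attributes attrs = {
            .address    = gpa,      /* start of range, page-aligned */
            .size       = size,     /* length in bytes */
            .attributes = to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
    };

    ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);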

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 44fdcf407759..2afc59b86b91 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3233,6 +3233,15 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ set_ghcb_msr(svm, vcpu->run->vmgexit.ghcb_msr);
+
+ return 1; /* resume */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3333,6 +3342,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ:
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
+ vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;
+
+ ret = -1;
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

--
2.25.1


2023-06-12 04:51:57

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 39/51] KVM: x86: Define RMP page fault error bits for #NPF

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether the hypervisor or a VM
performs the access. When hardware encounters an RMP access violation
during a guest access, it will cause a #VMEXIT(NPF).

See APM2 section 16.36.10 for more details.
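
As a quick illustration, classifying an error code with the new bits
mirrors what npf_interception() and handle_rmp_page_fault() do later in
this series (sketch):

    if (error_code & PFERR_GUEST_RMP_MASK) {
            /* The #NPF was caused by an RMP check violation... */
            if (error_code & PFERR_GUEST_SIZEM_MASK) {
                    /*
                     * ...due to a page-size mismatch between the RMP
                     * entry and the NPT mapping, so the RMP entry needs
                     * to be PSMASH'd.
                     */
            }
    }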

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 026bfc4446ee..2fcd309fd9fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -253,9 +253,13 @@ enum x86_intercept_stage;
#define PFERR_FETCH_BIT 4
#define PFERR_PK_BIT 5
#define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33
#define PFERR_IMPLICIT_ACCESS_BIT 48
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36

#define PFERR_PRESENT_MASK BIT(PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK BIT(PFERR_WRITE_BIT)
@@ -267,6 +271,10 @@ enum x86_intercept_stage;
#define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_GUEST_RMP_MASK BIT_ULL(PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK BIT_ULL(PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK BIT_ULL(PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK BIT_ULL(PFERR_GUEST_VMPL_BIT)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_WRITE_MASK | \
--
2.25.1


2023-06-12 04:51:58

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 40/51] KVM: SVM: Add support to handle RMP nested page faults

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters an RMP check failure caused by a guest memory access,
it raises a #NPF. The error code contains additional information on
the access type. See the APM volume 2 for additional information.

When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 85 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 21 +++++++++--
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 103 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9b9dff7728c8..1ba49c5ebaed 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3234,6 +3234,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
+{
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ return psmash(pfn);
+}
+
static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3696,3 +3703,81 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)

return p;
}
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = vcpu->kvm;
+ int order, rmp_level, ret;
+ bool assigned;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ /*
+ * Private memslots forward handling of implicit page state changes
+ * to userspace, so the only RMP faults expected here are for
+ * PFERR_GUEST_SIZEM_MASK. Anything else suggests that the RMP table
+	 * has gotten out of sync with the private memslot.
+ *
+ * However, there is a transient case where access to an NPT mapping
+ * that has just been split/PSMASH'd can generate an RMP fault. In this
+ * case the PFERR_GUEST_SIZEM bit might not be set. In these cases it
+ * should be safe to ignore and let the guest retry, but allow for
+ * these to be optionally logged to diagnose exceptional cases.
+ */
+ if (!(error_code & PFERR_GUEST_SIZEM_MASK)) {
+		pr_debug_ratelimited("Unexpected RMP fault for GPA 0x%llx, error_code 0x%llx\n",
+ gpa, error_code);
+ return;
+ }
+
+ gfn = gpa >> PAGE_SHIFT;
+
+ /*
+ * Only RMPADJUST/PVALIDATE should cause PFERR_GUEST_SIZEM.
+ *
+ * For PVALIDATE, this should only happen if a guest PVALIDATEs a 4K GFN
+ * that is backed by a huge page in the host whose RMP entry has the
+ * hugepage/assigned bits set. With UPM, that should only ever happen
+ * for private pages.
+ *
+ * For RMPADJUST, this assumption might not hold, in which case handling
+ * for obtaining the PFN from HVA-backed memory may be needed. For now,
+ * just print warnings.
+ */
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_warn_ratelimited("Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn_ratelimited("Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn_ratelimited("Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret || !assigned) {
+ pr_warn_ratelimited("Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+ gpa, pfn, ret);
+ goto out;
+ }
+
+ ret = snp_rmptable_psmash(kvm, pfn);
+ if (ret)
+ pr_err_ratelimited("Unable to split RMP entries for GPA 0x%llx PFN 0x%llx ret %d\n",
+ gpa, pfn, ret);
+
+out:
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+ put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 065167b42f90..0cff050bf5bb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1995,15 +1995,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;

u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;

trace_kvm_page_fault(vcpu, fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ /*
+ * rc == 0 indicates a userspace exit is needed to handle page
+ * transitions, so do that first before updating the RMP table.
+ */
+ if (error_code & PFERR_GUEST_RMP_MASK) {
+ if (rc == 0)
+ return rc;
+ handle_rmp_page_fault(vcpu, fault_address, error_code);
+ }
+
+ return rc;
}

static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 02edbdd443e4..4cf9dbc442e9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -762,6 +762,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);

/* vmenter.S */

--
2.25.1


2023-06-12 04:52:26

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault

The upper bits will be needed in some cases to distinguish between
nested page faults for private/shared pages, so pass along the full
64-bit value.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +--
arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c54672ad6cbc..0d3983b9aa7e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5829,8 +5829,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}

if (r == RET_PF_INVALID) {
- r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
- lower_32_bits(error_code), false,
+ r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
&emulation_type);
if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f1786698ae00..780b91e1da9f 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -283,11 +283,11 @@ enum {
};

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u32 err, bool prefetch, int *emulation_type)
+ u64 err, bool prefetch, int *emulation_type)
{
struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
- .error_code = err,
+ .error_code = lower_32_bits(err),
.exec = err & PFERR_FETCH_MASK,
.write = err & PFERR_WRITE_MASK,
.present = err & PFERR_PRESENT_MASK,
--
2.25.1


2023-06-12 04:53:12

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 41/51] KVM: SVM: Use a VMSA physical address variable for populating VMCB

From: Tom Lendacky <[email protected]>

In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 5 ++---
arch/x86/kvm/svm/svm.c | 9 ++++++++-
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1ba49c5ebaed..111e43eede15 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3551,10 +3551,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)

/*
* An SEV-ES guest requires a VMSA area that is a separate from the
- * VMCB page. Do not include the encryption mask on the VMSA physical
- * address since hardware will access it using the guest key.
+ * VMCB page.
*/
- svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;

/* Can't intercept CR register access, HV can't modify CR registers */
svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0cff050bf5bb..77195d8c1aa3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1419,9 +1419,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
svm_switch_vmcb(svm, &svm->vmcb01);

- if (vmsa_page)
+ if (vmsa_page) {
svm->sev_es.vmsa = page_address(vmsa_page);

+ /*
+ * Do not include the encryption mask on the VMSA physical
+ * address since hardware will access it using the guest key.
+ */
+ svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+ }
+
svm->guest_state_loaded = false;

return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4cf9dbc442e9..8dc7946ab634 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -197,6 +197,7 @@ struct vcpu_sev_es_state {
struct sev_es_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
+ hpa_t vmsa_pa;
bool received_first_sipi;
unsigned int ap_reset_hold_type;

--
2.25.1


2023-06-12 04:58:41

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 49/51] x86/sev: Add KVM commands for per-instance certs

From: Dionna Glaze <[email protected]>

The /dev/sev device has the ability to store host-wide certificates for
the key used by the AMD-SP for SEV-SNP attestation report signing,
but for hosts that want to specify additional certificates that are
specific to the image launched in a VM, a different way is needed to
communicate those certificates.

Add two new KVM ioctls to handle this: KVM_SEV_SNP_{GET,SET}_CERTS

The certificates that are set with this command are expected to follow
the same format as the host certificates, but that format is opaque
to the kernel.

The new behavior for custom certificates is that the extended guest
request command will now return the overridden certificates if they
were installed for the instance. The error condition for a too-small
data buffer is changed to return the overridden certificate data size
if an overridden certificate set is installed.

Setting a zero-length certificate blob returns the system to its default
behavior of returning only the host certificates on an extended guest
request.

Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
for an extra certificate.
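
A userspace sketch of installing instance-specific certs (the
vm_fd/sev_fd plumbing and the GUID-table formatting of the blob are
assumptions, following the host cert convention):

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int snp_set_certs(int vm_fd, int sev_fd, void *certs, __u64 len)
    {
            struct kvm_sev_snp_set_certs params;
            struct kvm_sev_cmd cmd;

            memset(&params, 0, sizeof(params));
            params.certs_uaddr = (__u64)(unsigned long)certs;
            params.certs_len = len; /* <= SEV_FW_BLOB_MAX_SIZE; 0 uninstalls */

            memset(&cmd, 0, sizeof(cmd));
            cmd.id = KVM_SEV_SNP_SET_CERTS;
            cmd.data = (__u64)(unsigned long)&params;
            cmd.sev_fd = sev_fd;

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }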

Cc: Tom Lendacky <[email protected]>
Cc: Paolo Bonzini <[email protected]>

Signed-off-by: Dionna Glaze <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: remove used of "we" and "this patch" in commit log, squash in
documentation patch]
Signed-off-by: Michael Roth <[email protected]>
[aik: snp_handle_ext_guest_request() now uses the CCP's cert object
without copying things over, only refcounting needed.]
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 44 +++++++
arch/x86/kvm/svm/sev.c | 115 ++++++++++++++++++
arch/x86/kvm/svm/svm.h | 1 +
include/linux/psp-sev.h | 2 +-
include/uapi/linux/kvm.h | 12 ++
5 files changed, 173 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index cd77a19577fe..21c1894d78ef 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -537,6 +537,50 @@ Returns: 0 on success, -negative on error

See SEV-SNP specification for further details on launch finish input parameters.

+22. KVM_SEV_SNP_GET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_GET_CERTS command
+can be issued to request the data that has been installed with the
+KVM_SEV_SNP_SET_CERTS command.
+
+Parameters (in/out): struct kvm_sev_snp_get_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_get_certs {
+ __u64 certs_uaddr;
+		__u64 certs_len;
+ };
+
+If no certs have been installed, then the return value is -ENOENT.
+If the buffer specified in the struct is too small, the certs_len field will be
+overwritten with the number of bytes required to receive all the certificate data, and the
+return value will be -EINVAL.
+
+23. KVM_SEV_SNP_SET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_SET_CERTS command
+can be issued to override the /dev/sev certs data that is returned when a
+guest issues an extended guest request. This is useful for instance-specific
+extensions to the host certificates.
+
+Parameters (in/out): struct kvm_sev_snp_set_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+		__u64 certs_len;
+ };
+
+The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index adbe8c242d81..bdf32aa971d8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2277,6 +2277,113 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_get_certs params;
+ struct sev_snp_certs *snp_certs;
+ int rc = 0;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+ /* No instance certs set. */
+ if (!snp_certs)
+ return -ENOENT;
+
+	if (params.certs_len < snp_certs->len) {
+		/* Output buffer too small. Return the required size. */
+		params.certs_len = snp_certs->len;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+ sizeof(params)))
+ rc = -EFAULT;
+ else
+ rc = -EINVAL; /* May be ENOSPC? */
+ } else {
+ if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
+ snp_certs->data, snp_certs->len))
+ rc = -EFAULT;
+ }
+
+ sev_snp_certs_put(snp_certs);
+
+ return rc;
+}
+
+static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
+{
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = snp_certs;
+}
+
+static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ unsigned long length = SEV_FW_BLOB_MAX_SIZE;
+ struct kvm_sev_snp_set_certs params;
+ struct sev_snp_certs *snp_certs;
+ void *to_certs;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /*
+ * Setting a length of 0 is the same as "uninstalling" instance-
+ * specific certificates.
+ */
+ if (params.certs_len == 0) {
+ snp_replace_certs(sev, NULL);
+ return 0;
+ }
+
+ /* Page-align the length */
+ length = ALIGN(params.certs_len, PAGE_SIZE);
+
+ to_certs = kmalloc(length, GFP_KERNEL | __GFP_ZERO);
+ if (!to_certs)
+ return -ENOMEM;
+
+ if (copy_from_user(to_certs,
+ (void __user *)(uintptr_t)params.certs_uaddr,
+ params.certs_len)) {
+ ret = -EFAULT;
+ goto error_exit;
+ }
+
+ snp_certs = sev_snp_certs_new(to_certs, length);
+ if (!snp_certs) {
+ ret = -ENOMEM;
+ goto error_exit;
+ }
+
+ snp_replace_certs(sev, snp_certs);
+
+ return 0;
+error_exit:
+ kfree(to_certs);
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2376,6 +2483,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_FINISH:
r = snp_launch_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_GET_CERTS:
+ r = snp_get_instance_certs(kvm, &sev_cmd);
+ break;
+ case KVM_SEV_SNP_SET_CERTS:
+ r = snp_set_instance_certs(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2591,6 +2704,8 @@ static int snp_decommission_context(struct kvm *kvm)
snp_free_firmware_page(sev->snp_context);
sev->snp_context = NULL;

+ sev_snp_certs_put(sev->snp_certs);
+
return 0;
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0d4c29a4300a..72be0c440b16 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,6 +95,7 @@ struct kvm_sev_info {
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
u64 sev_features; /* Features set at VMSA creation */
+ struct sev_snp_certs *snp_certs;
};

struct kvm_svm {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 2191d8b5423a..7b65dd5808a1 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -22,7 +22,7 @@
#define __psp_pa(x) __pa(x)
#endif

-#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */

struct sev_snp_certs {
void *data;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 175b958f103f..fa1c300303d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1937,6 +1937,8 @@ enum sev_cmd_id {
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
KVM_SEV_SNP_LAUNCH_FINISH,
+ KVM_SEV_SNP_GET_CERTS,
+ KVM_SEV_SNP_SET_CERTS,

KVM_SEV_NR_MAX,
};
@@ -2084,6 +2086,16 @@ struct kvm_sev_snp_launch_finish {
__u8 pad[6];
};

+struct kvm_sev_snp_get_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
+struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1


2023-06-12 05:01:08

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 44/51] KVM: SEV: Implement gmem hook for initializing private pages

This will handle RMP table updates and direct map changes needed to put
a page into a private state before mapping it into an SEV-SNP guest.
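
A rough sketch of where this hook sits (the generic gmem_prepare plumbing
is added by the earlier gmem patches, and the call-site name here is
hypothetical):

  /*
   * Hypothetical caller in the private fault path: before mapping a
   * gmem-backed PFN into the nested page table, let the platform update
   * RMP state and clamp the mapping level.
   */
  static int kvm_prepare_private_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
                                     gfn_t gfn, kvm_pfn_t pfn, u8 *max_level)
  {
          return static_call(kvm_x86_gmem_prepare)(kvm, slot, pfn, gfn, max_level);
  }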

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 95 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 2 +
3 files changed, 99 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 909ecd90d199..c5a1706387bf 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4022,3 +4022,98 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
put_page(pfn_to_page(pfn));
}
+
+/* Check if GFN range is marked private in the KVM/gmem xarray. */
+static bool is_gfn_range_private(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+ gfn_t gfn;
+
+ for (gfn = start; gfn < end; gfn++)
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_debug("%s: overlap detected, GFN 0x%llx start 0x%llx end 0x%llx\n",
+ __func__, gfn, start, end);
+ return false;
+ }
+
+ return true;
+}
+
+/* Check that no pages in PFN range have already been set to private in RMP table. */
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn;
+
+ for (pfn = start; pfn < end; pfn++) {
+ int ret, rmp_level;
+ bool assigned;
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret) {
+ pr_debug("%s: failed to retrieve RMP entry, assuming overlap, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d error %d\n",
+ __func__, pfn, start, end, rmp_level, ret);
+ return false;
+ }
+
+ if (assigned == 1) {
+ pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+ __func__, pfn, start, end, rmp_level);
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static int get_supported_rmp_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn)
+{
+ if (!IS_ALIGNED(pfn, PTRS_PER_PMD) || !IS_ALIGNED(gfn, PTRS_PER_PMD))
+ return PG_LEVEL_4K;
+
+ /*
+ * Check that both the desired GFN range states in the xarray, and
+ * current PFN range states in the RMP table, are conducive to
+ * creating a 2M private RMP entry.
+ */
+ if (is_gfn_range_private(kvm, gfn, gfn + PTRS_PER_PMD) &&
+ is_pfn_range_shared(pfn, pfn + PTRS_PER_PMD))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
+ kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ int level, rc = 0;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc)
+ return rc;
+
+ /* No conversion needed, just clamp max_level according to the RMP entry. */
+ if (assigned)
+ goto out_adjust_level;
+
+ if (*max_level == PG_LEVEL_4K)
+ level = PG_LEVEL_4K;
+ else
+ level = get_supported_rmp_level(kvm, pfn, gfn);
+
+ rc = rmp_make_private(pfn, gfn_to_gpa(gfn), level, sev->asid, false);
+ if (rc)
+ pr_err_ratelimited("%s: failed gfn %llx pfn %llx level %d rc %d\n",
+ __func__, gfn, pfn, level, rc);
+
+out_adjust_level:
+ pr_debug("%s: pfn %llx gfn %llx max_level %d level %d assigned %d\n",
+ __func__, pfn, gfn, *max_level, level, assigned);
+ if (*max_level > level)
+ *max_level = level;
+
+ return rc;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 81b9f4e04a8d..9085a122907c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4934,6 +4934,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+ .gmem_prepare = sev_gmem_prepare,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e73a58e489c7..0438f52e4396 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -770,6 +770,8 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
+ kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);

/* vmenter.S */

--
2.25.1


2023-06-12 05:05:06

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 45/51] KVM: SEV: Implement gmem hook for invalidating private pages

Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.
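
As with the prepare hook, the call site is added by the generic gmem
patches; roughly (hypothetical sketch, names are assumptions), when gmem
releases backing pages:

  /*
   * Hypothetical caller: when gmem frees a range of backing PFNs, give
   * the platform a chance to reset them to shared/hypervisor-owned.
   */
  static void kvm_free_private_pfns(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
  {
          static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
  }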

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 43 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 45 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c5a1706387bf..543926fa3200 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4117,3 +4117,46 @@ int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,

return rc;
}
+
+void sev_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn;
+
+ if (!sev_snp_guest(kvm))
+ return;
+
+ pr_debug("%s: kvm %p pfn 0x%llx pfn_end 0x%llx\n",
+ __func__, kvm, start, end);
+
+ for (pfn = start; pfn < end; pfn++) {
+ int rc, rmp_level;
+ bool assigned;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (rc) {
+ pr_warn_ratelimited("SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ continue;
+ }
+
+ if (!assigned)
+ continue;
+
+ /*
+ * If PFN is currently assigned as a 2M page, PSMASH it into
+ * individual 4K RMP entries before attempting to convert a
+ * 4K sub-page.
+ */
+ if (rmp_level > PG_LEVEL_4K) {
+ rc = snp_rmptable_psmash(kvm, pfn);
+ if (rc)
+ pr_warn_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ }
+
+ rc = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (rc)
+ pr_warn_ratelimited("SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9085a122907c..1390e47d0aa5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4936,6 +4936,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,

.gmem_prepare = sev_gmem_prepare,
+ .gmem_invalidate = sev_gmem_invalidate,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0438f52e4396..0d4c29a4300a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -772,6 +772,7 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+void sev_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);

/* vmenter.S */

--
2.25.1


2023-06-12 05:21:05

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 48/51] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

From: Brijesh Singh <[email protected]>

The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.

Version 2 of the GHCB specification adds an NAE (SNP extended guest
request) that a guest can use to query the reports that include additional
certificates.

In both cases, userspace-provided additional data is included in the
attestation reports. Userspace will use the SNP_SET_EXT_CONFIG command to
provide the certificate blob and the reported TCB version string at once.
Note that the specification defines the certificate blob with a specific
GUID format; userspace is responsible for building the proper certificate
blob. The ioctl treats it as an opaque blob.

While it is not defined in the spec, also add an SNP_GET_EXT_CONFIG
command that can be used to obtain the data programmed through
SNP_SET_EXT_CONFIG.
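
A minimal userspace sketch (assuming the standard /dev/sev SEV_ISSUE_CMD
ioctl plumbing; sev_fd, certs, and certs_len are placeholders, and
certs_len must be a multiple of the page size):

  /* Hypothetical usage: install a cert blob without touching reported TCB. */
  struct sev_user_data_ext_snp_config ext = {
          .config_address = 0,                       /* leave TCB config as-is */
          .certs_address = (__u64)(uintptr_t)certs,  /* page-aligned buffer */
          .certs_len = certs_len,                    /* page-size multiple */
  };
  struct sev_issue_cmd cmd = {
          .cmd = SNP_SET_EXT_CONFIG,
          .data = (__u64)(uintptr_t)&ext,
  };

  ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);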

Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Co-developed-by: Dionna Glaze <[email protected]>
Signed-off-by: Dionna Glaze <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: squash in doc patch from Dionna]
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++
drivers/crypto/ccp/sev-dev.c | 178 ++++++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.h | 2 +
include/linux/psp-sev.h | 10 ++
include/uapi/linux/psp-sev.h | 17 +++
5 files changed, 234 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 11ea67c944df..6cad4226c348 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
status includes API major, minor version and more. See the SEV-SNP
specification for further details.

+2.5 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG is used to set the system-wide configuration, such as
+the reported TCB version in the attestation report. The command is similar to
+the SNP_CONFIG command defined in the SEV-SNP spec. The main difference is that
+the command also accepts an additional certificate blob defined in the GHCB
+specification.
+
+If the certs_address is zero, then the previous certificate blob will be deleted.
+For more information on the certificate blob layout, see the GHCB spec
+(extended guest request message).
+
+2.6 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
+through the SNP_SET_EXT_CONFIG.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index b8e8c4da4025..175c24163ba0 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1491,6 +1491,10 @@ static int __sev_snp_shutdown_locked(int *error)
data.length = sizeof(data);
data.iommu_snp_shutdown = 1;

+ /* Free the memory used for caching the certificate data */
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = NULL;
+
wbinvd_on_all_cpus();

retry:
@@ -1829,6 +1833,126 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
return ret;
}

+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_snp_certs *snp_certs;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ memset(&input, 0, sizeof(input));
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the TCB version programmed through the SET_CONFIG to userspace */
+ if (input.config_address) {
+ if (copy_to_user((void __user *)input.config_address,
+ &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+ return -EFAULT;
+ }
+
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+
+ /* Copy the extended certs programmed through the SNP_SET_CONFIG */
+ if (input.certs_address && snp_certs) {
+ if (input.certs_len < snp_certs->len) {
+ /* Return the certs length to userspace */
+ input.certs_len = snp_certs->len;
+
+ ret = -EIO;
+ goto e_done;
+ }
+
+ if (copy_to_user((void __user *)input.certs_address,
+ snp_certs->data, snp_certs->len)) {
+ ret = -EFAULT;
+ goto put_exit;
+ }
+ }
+
+ ret = 0;
+
+e_done:
+ if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+ ret = -EFAULT;
+
+put_exit:
+ sev_snp_certs_put(snp_certs);
+
+ return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_user_data_snp_config config;
+ struct sev_snp_certs *snp_certs = NULL;
+ void *certs = NULL;
+ int ret = 0;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ memset(&input, 0, sizeof(input));
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the certs from userspace */
+ if (input.certs_address) {
+ if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+ return -EINVAL;
+
+ certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+ if (IS_ERR(certs))
+ return PTR_ERR(certs);
+ }
+
+ /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+ if (input.config_address) {
+ memset(&config, 0, sizeof(config));
+ if (copy_from_user(&config,
+ (void __user *)input.config_address, sizeof(config))) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+ if (ret)
+ goto e_free;
+
+ memcpy(&sev->snp_config, &config, sizeof(config));
+ }
+
+ /*
+ * Cache the new certs if provided; otherwise drop the reference to
+ * any previously cached certs.
+ */
+ if (input.certs_len) {
+ snp_certs = sev_snp_certs_new(certs, input.certs_len);
+ if (!snp_certs) {
+ ret = -ENOMEM;
+ goto e_free;
+ }
+ }
+
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = snp_certs;
+
+ return 0;
+
+e_free:
+ kfree(certs);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1883,6 +2007,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_PLATFORM_STATUS:
ret = sev_ioctl_snp_platform_status(&input);
break;
+ case SNP_SET_EXT_CONFIG:
+ ret = sev_ioctl_snp_set_config(&input, writable);
+ break;
+ case SNP_GET_EXT_CONFIG:
+ ret = sev_ioctl_snp_get_config(&input);
+ break;
default:
ret = -EINVAL;
goto out;
@@ -1931,6 +2061,54 @@ int sev_guest_df_flush(int *error)
}
EXPORT_SYMBOL_GPL(sev_guest_df_flush);

+static void sev_snp_certs_release(struct kref *kref)
+{
+ struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
+
+ kfree(certs->data);
+ kfree(certs);
+}
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len)
+{
+ struct sev_snp_certs *certs;
+
+ if (!len || !data)
+ return NULL;
+
+ certs = kzalloc(sizeof(*certs), GFP_KERNEL);
+ if (!certs)
+ return NULL;
+
+ certs->data = data;
+ certs->len = len;
+ kref_init(&certs->kref);
+
+ return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_new);
+
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs)
+{
+ if (!certs)
+ return NULL;
+
+ if (!kref_get_unless_zero(&certs->kref))
+ return NULL;
+
+ return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_get);
+
+void sev_snp_certs_put(struct sev_snp_certs *certs)
+{
+ if (!certs)
+ return;
+
+ kref_put(&certs->kref, sev_snp_certs_release);
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_put);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 19d79f9d4212..22374f3d3e2e 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,8 @@ struct sev_device {

bool snp_initialized;
struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+ struct sev_snp_certs *snp_certs;
+ struct sev_user_data_snp_config snp_config;
};

int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 5ae61de96e44..2191d8b5423a 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -24,6 +24,16 @@

#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */

+struct sev_snp_certs {
+ void *data;
+ u32 len;
+ struct kref kref;
+};
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
+void sev_snp_certs_put(struct sev_snp_certs *certs);
+
/**
* SEV platform state
*/
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 4dc6a3e7b3d5..d1e6a0615546 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
SNP_PLATFORM_STATUS,
+ SNP_SET_EXT_CONFIG,
+ SNP_GET_EXT_CONFIG,

SEV_MAX,
};
@@ -201,6 +203,21 @@ struct sev_user_data_snp_config {
__u8 rsvd1[52];
} __packed;

+/**
+ * struct sev_user_data_ext_snp_config - system-wide configuration for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ * reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ * 0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certs
+ */
+struct sev_user_data_ext_snp_config {
+ __u64 config_address; /* In */
+ __u64 certs_address; /* In */
+ __u32 certs_len; /* In */
+};
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1


2023-06-12 05:22:42

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 46/51] KVM: SVM: Add module parameter to enable the SEV-SNP

From: Brijesh Singh <[email protected]>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
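
For example (assuming the module is built as kvm_amd and firmware support
is present), SNP support could be disabled at load time with:

  modprobe kvm_amd sev_snp=0

and the effective value read back from
/sys/module/kvm_amd/parameters/sev_snp.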

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 543926fa3200..adbe8c242d81 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -54,14 +54,16 @@ module_param_named(sev, sev_enabled, bool, 0444);
/* enable/disable SEV-ES support */
static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
#else
#define sev_enabled false
#define sev_es_enabled false
+#define sev_snp_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

-/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
-
#define AP_RESET_HOLD_NONE 0
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
--
2.25.1


2023-06-12 05:25:30

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support

From: Ashish Kalra <[email protected]>

Add a new IOMMU API interface, amd_iommu_snp_disable(), to transition
IOMMU pages from the Reclaim state to the Hypervisor state after the
SNP_SHUTDOWN_EX command. Invoke this API from the CCP driver once
SNP_SHUTDOWN_EX completes.

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 20 +++++++++++++
drivers/iommu/amd/init.c | 55 ++++++++++++++++++++++++++++++++++++
include/linux/amd-iommu.h | 1 +
3 files changed, 76 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 0bfe9721c977..b8e8c4da4025 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -24,6 +24,7 @@
#include <linux/cpufeature.h>
#include <linux/fs.h>
#include <linux/fs_struct.h>
+#include <linux/amd-iommu.h>

#include <asm/smp.h>
#include <asm/cacheflush.h>
@@ -1508,6 +1509,25 @@ static int __sev_snp_shutdown_locked(int *error)
return ret;
}

+ /*
+ * SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set to 1 disables SNP
+ * enforcement by the IOMMU and also transitions all pages
+ * associated with the IOMMU to the Reclaim state.
+ * Before version 1.53, firmware transitioned the IOMMU pages to the
+ * Hypervisor state, but accounted for the number of assigned 4kB pages
+ * in a 2M page incorrectly by not transitioning them to the Reclaim
+ * state. This resulted in an RMP #PF when the 2M page containing those
+ * pages was later accessed during kexec boot. Hence, the firmware now
+ * transitions these pages to the Reclaim state and the hypervisor needs
+ * to transition them to the shared state. SNP firmware version 1.53 or
+ * later is needed for kexec boot.
+ */
+ ret = amd_iommu_snp_disable();
+ if (ret) {
+ dev_err(sev->dev, "SNP IOMMU shutdown failed\n");
+ return ret;
+ }
+
sev->snp_initialized = false;
dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 33ea62d93540..a84ec81cfbb5 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -30,6 +30,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>
#include <asm/set_memory.h>
+#include <asm/sev-host.h>

#include <linux/crash_dump.h>

@@ -3701,4 +3702,58 @@ int amd_iommu_snp_enable(void)

return 0;
}
+
+static int iommu_page_make_shared(void *page)
+{
+ unsigned long paddr, pfn;
+
+ paddr = iommu_virt_to_phys(page);
+ /* The C-bit may be set in the paddr */
+ pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ return rmp_make_shared(pfn, PG_LEVEL_4K);
+}
+
+static int iommu_make_shared(void *va, size_t size)
+{
+ void *page;
+ int ret;
+
+ if (!va)
+ return 0;
+
+ for (page = va; page < (va + size); page += PAGE_SIZE) {
+ ret = iommu_page_make_shared(page);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+int amd_iommu_snp_disable(void)
+{
+ struct amd_iommu *iommu;
+ int ret;
+
+ if (!amd_iommu_snp_en)
+ return 0;
+
+ for_each_iommu(iommu) {
+ ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
+ if (ret)
+ return ret;
+ }
+
+ amd_iommu_snp_en = false;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_disable);
#endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 8f0cde2d451c..7ba46118d0f1 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -208,6 +208,7 @@ struct amd_iommu *get_amd_iommu(unsigned int idx);

#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void);
+int amd_iommu_snp_disable(void);
#endif

#endif /* _ASM_X86_AMD_IOMMU_H */
--
2.25.1


2023-06-12 05:27:29

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 06/51] x86/cpufeatures: Add SEV-SNP CPU feature

From: Brijesh Singh <[email protected]>

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.
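
Host code can then gate SNP-specific paths on the new bit with the usual
helper, e.g. (minimal sketch):

  /* Bail out of SNP-specific setup if the CPU doesn't advertise SNP. */
  if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
          return -ENODEV;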

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/amd.c | 5 +++--
tools/arch/x86/include/asm/cpufeatures.h | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 97327a1e3aff..b60b32f47884 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -431,6 +431,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 95cdd08c4cbb..a79774181f22 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -558,8 +558,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
* SME feature (set in scattered.c).
* If the kernel has not enabled SME via any means then
* don't advertise the SME feature.
- * For SEV: If BIOS has not enabled SEV then don't advertise the
- * SEV and SEV_ES feature (set in scattered.c).
+ * For SEV: If BIOS has not enabled SEV then don't advertise SEV and
+ * any additional functionality based on it.
*
* In all cases, since support for SME and SEV requires long mode,
* don't advertise the feature under CONFIG_X86_32.
@@ -594,6 +594,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
}

diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index b89005819cd5..b4f11a500470 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -424,6 +424,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */

--
2.25.1


2023-06-12 05:30:15

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 50/51] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. These events allow an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
difference of an additional certificate blob that can be passed through
the SNP_SET_EXT_CONFIG ioctl defined in the CCP driver. The CCP driver
exposes the cached certificate data (via sev_snp_global_certs_get()) so
that KVM can return both the report and the certificate data at once.
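
From the guest's point of view, the extended request ABI handled by
snp_handle_ext_guest_request() is roughly (per the GHCB spec and the code
below):

  - exit_code SVM_VMGEXIT_EXT_GUEST_REQUEST, with exit_info_1/exit_info_2
    holding the GPAs of the request and response pages
  - RAX = GPA of the certificate data buffer, RBX = its size in 4K pages
  - on SNP_GUEST_REQ_INVALID_LEN, RBX is updated with the required number
    of pages so the guest can retry with a larger buffer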

Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: ensure FW command failures are indicated to guest]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 175 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 1 +
drivers/crypto/ccp/sev-dev.c | 15 +++
include/linux/psp-sev.h | 1 +
4 files changed, 192 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bdf32aa971d8..9f7defce1988 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -328,6 +328,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
ret = verify_snp_init_flags(kvm, argp);
if (ret)
goto e_free;
+
+ mutex_init(&sev->guest_req_lock);
}

ret = sev_platform_init(&argp->error);
@@ -2321,8 +2323,10 @@ static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)

static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
{
+ mutex_lock(&sev->guest_req_lock);
sev_snp_certs_put(sev->snp_certs);
sev->snp_certs = snp_certs;
+ mutex_unlock(&sev->guest_req_lock);
}

static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -3171,6 +3175,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3604,6 +3610,163 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}

+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+ struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ kvm_pfn_t req_pfn, resp_pfn;
+ struct kvm_sev_info *sev;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+ return SEV_RET_INVALID_PARAM;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return SEV_RET_INVALID_ADDRESS;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+ int ret;
+
+ ret = snp_page_reclaim(pfn);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_sev_info *sev;
+ unsigned long rc;
+ int err;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+ if (rc)
+ /* Ensure an error value is returned to guest. */
+ rc = err ? err : SEV_RET_INVALID_ADDRESS;
+
+ snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request req = {0};
+ struct sev_snp_certs *snp_certs = NULL;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ unsigned long data_npages;
+ struct kvm_sev_info *sev;
+ unsigned long exitcode = 0;
+ u64 data_gpa;
+ int err, rc;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+ exitcode = SEV_RET_INVALID_ADDRESS;
+ goto e_fail;
+ }
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ /*
+ * If a VMM-specific certificate blob hasn't been provided, grab the
+ * host-wide one.
+ */
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+ if (!snp_certs)
+ snp_certs = sev_snp_global_certs_get();
+
+ /*
+ * If there is a host-wide or VMM-specific certificate blob available,
+ * make sure the guest has allocated enough space to store it.
+ * Otherwise, inform the guest how much space is needed.
+ */
+ if (snp_certs && (data_npages << PAGE_SHIFT) < snp_certs->len) {
+ vcpu->arch.regs[VCPU_REGS_RBX] = snp_certs->len >> PAGE_SHIFT;
+ exitcode = SNP_GUEST_REQ_INVALID_LEN;
+ goto cleanup;
+ }
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req, &err);
+ if (rc) {
+ /* pass the firmware error code */
+ exitcode = err;
+ goto cleanup;
+ }
+
+ /* Copy the certificate blob into guest memory */
+ if (snp_certs &&
+ kvm_write_guest(kvm, data_gpa, snp_certs->data, snp_certs->len))
+ exitcode = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+ sev_snp_certs_put(snp_certs);
+ snp_cleanup_guest_buf(&req, &exitcode);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3863,6 +4026,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
SVM_EVTINJ_VALID);
}

+ ret = 1;
+ break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+ ret = 1;
+ break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ snp_handle_ext_guest_request(svm,
+ control->exit_info_1,
+ control->exit_info_2);
+
ret = 1;
break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 72be0c440b16..31cd8b3e6877 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -96,6 +96,7 @@ struct kvm_sev_info {
void *snp_context; /* SNP guest context page */
u64 sev_features; /* Features set at VMSA creation */
struct sev_snp_certs *snp_certs;
+ struct mutex guest_req_lock; /* Lock for guest request handling */
};

struct kvm_svm {
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 175c24163ba0..096ba15d0740 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2109,6 +2109,21 @@ void sev_snp_certs_put(struct sev_snp_certs *certs)
}
EXPORT_SYMBOL_GPL(sev_snp_certs_put);

+struct sev_snp_certs *sev_snp_global_certs_get(void)
+{
+ struct sev_device *sev;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ sev = psp_master->sev_data;
+ if (!sev->snp_initialized)
+ return NULL;
+
+ return sev_snp_certs_get(sev->snp_certs);
+}
+EXPORT_SYMBOL_GPL(sev_snp_global_certs_get);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 7b65dd5808a1..1235eb3110cb 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -33,6 +33,7 @@ struct sev_snp_certs {
struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
void sev_snp_certs_put(struct sev_snp_certs *certs);
+struct sev_snp_certs *sev_snp_global_certs_get(void);

/**
* SEV platform state
--
2.25.1


2023-06-12 05:31:04

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile

Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
arch/x86/coco, but that is generally only applicable for guest support.

For SEV-SNP, helpers related purely to host support will also live in
arch/x86/coco. To allow for CoCo-related host support code in
arch/x86/coco, move that check down into the Makefile and check for it
specifically when needed.

Cc: Kirill A. Shutemov <[email protected]>
Suggested-by: Tom Lendacky <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/Kbuild | 2 +-
arch/x86/coco/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index 5a83da703e87..1889cef48b58 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += coco/
+obj-y += coco/

obj-y += entry/

diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
index c816acf78b6a..6aa52e719bf5 100644
--- a/arch/x86/coco/Makefile
+++ b/arch/x86/coco/Makefile
@@ -3,6 +3,6 @@ CFLAGS_REMOVE_core.o = -pg
KASAN_SANITIZE_core.o := n
CFLAGS_core.o += -fno-stack-protector

-obj-y += core.o
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o

obj-$(CONFIG_INTEL_TDX_GUEST) += tdx/
--
2.25.1


2023-06-12 05:32:48

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 42/51] KVM: SVM: Support SEV-SNP AP Creation NAE event

From: Tom Lendacky <[email protected]>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.

For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. The handler sets the
vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
leave the vCPU as not runnable. Any previous VMSA pages that were
installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
a new VMSA is to be installed, the VMSA guest page is pinned and set as
the VMSA in the vCPU VMCB and the vCPU state is set to
KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
cleared in the vCPU VMCB and the vCPU state is left as
KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
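
For reference, the parameters of the AP Creation exit are packed as
follows (matching the handler added below):

  /* AP Creation NAE event parameter layout (see sev_snp_ap_creation()). */
  request = lower_32_bits(svm->vmcb->control.exit_info_1); /* CREATE/CREATE_ON_INIT/DESTROY */
  apic_id = upper_32_bits(svm->vmcb->control.exit_info_1); /* target vCPU APIC ID */
  vmsa_gpa = svm->vmcb->control.exit_info_2;               /* GPA of new VMSA (CREATE*) */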

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: add handling for restrictedmem]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/svm.h | 7 +-
arch/x86/kvm/svm/sev.c | 240 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/svm/svm.h | 8 +-
arch/x86/kvm/x86.c | 11 ++
6 files changed, 268 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2fcd309fd9fb..8f515e0386a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -113,6 +113,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ac8edfdd60fa..0deb83ac800b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -288,7 +288,12 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_

#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

-#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 111e43eede15..ec74ff5e09c7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -638,6 +638,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)

static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
struct sev_es_save_area *save = svm->sev_es.vmsa;

/* Check some debug related fields before encrypting the VMSA */
@@ -683,6 +684,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (sev_snp_guest(svm->vcpu.kvm))
save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;

+ /*
+ * Save the VMSA synced SEV features. For now, they are the same for
+ * all vCPUs, so just save each time.
+ */
+ sev->sev_features = save->sev_features;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);

@@ -3034,6 +3041,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!ghcb_sw_scratch_is_valid(ghcb))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (lower_32_bits(ghcb_get_sw_exit_info_1(ghcb)) != SVM_VMGEXIT_AP_DESTROY)
+ if (!ghcb_rax_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3259,6 +3271,220 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
return 1; /* resume */
}

+static kvm_pfn_t gfn_to_pfn_gmem(struct kvm *kvm, gfn_t gfn)
+{
+ struct kvm_memory_slot *slot;
+ kvm_pfn_t pfn;
+ int order = 0;
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_err("SEV: Failure retrieving restricted memslot for GFN 0x%llx, flags 0x%x, userspace_addr: 0x%lx\n",
+ gfn, slot->flags, slot->userspace_addr);
+ return INVALID_PAGE;
+ }
+
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+ return INVALID_PAGE;
+ }
+
+ if (kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order)) {
+ pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+ return INVALID_PAGE;
+ }
+
+ return pfn;
+}
+
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ kvm_pfn_t pfn;
+ hpa_t cur_pa;
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Save off the current VMSA PA for later checks */
+ cur_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+ /* Clear use of the VMSA */
+ svm->sev_es.vmsa_pa = INVALID_PAGE;
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ /*
+ * svm->sev_es.vmsa holds the virtual address of the VMSA initially
+ * allocated by the host. If the guest specified a new VMSA via
+ * AP_CREATION, it will have been pinned to avoid future issues
+ * with things like page migration support. Make sure to un-pin it
+ * before switching to a newer guest-specified VMSA.
+ */
+ if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa))
+ kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ /*
+ * The VMSA is referenced by the hypervisor physical address,
+ * so retrieve the PFN and ensure it is restricted memory.
+ */
+ pfn = gfn_to_pfn_gmem(vcpu->kvm, gpa_to_gfn(svm->sev_es.snp_vmsa_gpa));
+ if (!VALID_PAGE(pfn) || is_error_pfn(pfn))
+ return -EINVAL;
+
+ /* Use the new VMSA */
+ svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_create)
+ goto unlock;
+
+ svm->sev_es.snp_ap_create = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * The target vCPU is valid, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in an non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_create = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->sev_features;
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * A malicious guest can RMPADJUST a large page into a VMSA, which
+ * hits the SNP erratum where the CPU incorrectly signals an RMP
+ * violation #PF if a hugepage collides with the RMP entry of the
+ * VMSA page. Reject the AP CREATE request if the VMSA address from
+ * the guest is 2M-aligned.
+ */
+ if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+ vcpu_unimpl(vcpu,
+ "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3508,6 +3734,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
vcpu->arch.complete_userspace_io = snp_complete_psc;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ ghcb_set_sw_exit_info_1(ghcb, 1);
+ ghcb_set_sw_exit_info_2(ghcb,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3612,6 +3850,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}

void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 77195d8c1aa3..81b9f4e04a8d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1358,6 +1358,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;

+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);

if (!init_event)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8dc7946ab634..e73a58e489c7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -94,6 +94,7 @@ struct kvm_sev_info {
atomic_t migration_in_progress;
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
+ u64 sev_features; /* Features set at VMSA creation */
};

struct kvm_svm {
@@ -208,6 +209,10 @@ struct vcpu_sev_es_state {
bool ghcb_sa_free;

u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_create;
};

struct vcpu_svm {
@@ -735,7 +740,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

-#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

extern unsigned int max_sev_asid;

@@ -764,6 +769,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);

/* vmenter.S */

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 10d76afa23d9..9e3c41e2a3ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10621,6 +10621,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)

if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+ r = 1;
+ goto out;
+ }
+ }
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -12816,6 +12824,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
return true;
#endif

+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu)))
--
2.25.1


2023-06-12 05:35:59

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 51/51] crypto: ccp: Add debug support for decrypting pages

From: Brijesh Singh <[email protected]>

Add support to decrypt guest-encrypted memory. These API interfaces can
be used, for example, to dump VMCBs on SNP guest exit.
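
A minimal sketch of a debug consumer (snp_dump_guest_page() is a
hypothetical helper; error handling is trimmed):

  static void snp_dump_guest_page(u64 gctx_pfn, u64 guest_pfn)
  {
          struct page *dst = alloc_page(GFP_KERNEL);
          int fw_err, rc;

          if (!dst)
                  return;

          /* Decrypt one guest page into dst and hex-dump the first bytes. */
          rc = snp_guest_dbg_decrypt_page(gctx_pfn, guest_pfn,
                                          page_to_pfn(dst), &fw_err);
          if (!rc)
                  print_hex_dump(KERN_DEBUG, "snp: ", DUMP_PREFIX_OFFSET,
                                 16, 1, page_address(dst), 256, false);

          __free_page(dst);
  }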

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: minor commit fixups]
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
include/linux/psp-sev.h | 19 +++++++++++++++++++
2 files changed, 51 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 096ba15d0740..3c8cd2d20016 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2061,6 +2061,38 @@ int sev_guest_df_flush(int *error)
}
EXPORT_SYMBOL_GPL(sev_guest_df_flush);

+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+ struct sev_data_snp_dbg data = {0};
+ struct sev_device *sev;
+ int ret;
+
+ if (!psp_master || !psp_master->sev_data)
+ return -ENODEV;
+
+ sev = psp_master->sev_data;
+
+ if (!sev->snp_initialized)
+ return -EINVAL;
+
+ data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
+ data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
+ data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
+
+ /* The destination page must be in the firmware state. */
+ if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
+ return -EIO;
+
+ ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
+
+ /* Restore the page state */
+ if (snp_reclaim_pages(data.dst_addr, 1, false))
+ ret = -EIO;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
+
static void sev_snp_certs_release(struct kref *kref)
{
struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1235eb3110cb..55f6dfc2580d 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -916,6 +916,20 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
*/
int sev_do_cmd(int cmd, void *data, int *psp_ret);

+/**
+ * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @gctx_pfn: PFN of the SNP guest context page
+ * @src_pfn: PFN of the guest page to decrypt
+ * @dst_pfn: PFN of the destination page for the decrypted data
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%ENOTSUPP if the SEV device does not support SNP
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the SEV device returned a non-zero return code
+ */
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
+
void *psp_copy_user_blob(u64 uaddr, u32 len);
void *snp_alloc_firmware_page(gfp_t mask);
void snp_free_firmware_page(void *addr);
@@ -946,6 +960,11 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int

static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }

+static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+ return -ENODEV;
+}
+
static inline void *snp_alloc_firmware_page(gfp_t mask)
{
return NULL;
--
2.25.1


2023-06-12 05:37:21

by Michael Roth

[permalink] [raw]
Subject: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

From: Brijesh Singh <[email protected]>

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
a number of steps needed to detect/enable SEV-SNP and RMP table support
on the host:

- Detect SEV-SNP support based on CPUID bit
- Initialize the RMP table memory reported by the RMP base/end MSR
registers and configure IOMMU to be compatible with RMP access
restrictions
- Set the MtrrFixDramModEn bit in SYSCFG MSR
- Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
- Configure IOMMU

RMP table entry format is non-architectural and it can vary by
processor. It is defined by the PPR. Restrict SNP support to CPU
models/families which are compatible with the current RMP table entry
format to guard against any undefined behavior when running on other
system types. Future models/support will handle this through an
architectural mechanism to allow for broader compatibility.

SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
instead of CONFIG_AMD_MEM_ENCRYPT.

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Co-developed-by: Tom Lendacky <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: rework commit message to be clearer about what patch does, squash
in early_rmptable_check() handling from Tom]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/coco/Makefile | 1 +
arch/x86/coco/sev/Makefile | 3 +
arch/x86/coco/sev/host.c | 212 +++++++++++++++++++++++
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/include/asm/sev.h | 2 +
arch/x86/kernel/cpu/amd.c | 19 ++
drivers/iommu/amd/init.c | 2 +-
include/linux/amd-iommu.h | 2 +-
9 files changed, 256 insertions(+), 4 deletions(-)
create mode 100644 arch/x86/coco/sev/Makefile
create mode 100644 arch/x86/coco/sev/host.c

diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
index 6aa52e719bf5..6a7d876130e2 100644
--- a/arch/x86/coco/Makefile
+++ b/arch/x86/coco/Makefile
@@ -6,3 +6,4 @@ CFLAGS_core.o += -fno-stack-protector
obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o

obj-$(CONFIG_INTEL_TDX_GUEST) += tdx/
+obj-$(CONFIG_KVM_AMD_SEV) += sev/
diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
new file mode 100644
index 000000000000..27c0500d75c8
--- /dev/null
+++ b/arch/x86/coco/sev/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y += host.o
diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
new file mode 100644
index 000000000000..6907ce887b23
--- /dev/null
+++ b/arch/x86/coco/sev/host.c
@@ -0,0 +1,212 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <[email protected]>
+ *
+ */
+
+#include <linux/cc_platform.h>
+#include <linux/printk.h>
+#include <linux/mm_types.h>
+#include <linux/set_memory.h>
+#include <linux/memblock.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>
+
+#include <asm/sev.h>
+#include <asm/processor.h>
+#include <asm/setup.h>
+#include <asm/svm.h>
+#include <asm/smp.h>
+#include <asm/cpu.h>
+#include <asm/apic.h>
+#include <asm/cpuid.h>
+#include <asm/cmdline.h>
+#include <asm/iommu.h>
+
+/*
+ * The first 16KB from RMP_BASE is used by the processor for bookkeeping;
+ * this offset must be added in when looking up an RMP entry.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+
+static unsigned long rmptable_start __ro_after_init;
+static unsigned long rmptable_end __ro_after_init;
+
+#undef pr_fmt
+#define pr_fmt(fmt) "SEV-SNP: " fmt
+
+static int __mfd_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_MFDM;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void mfd_enable(void *arg)
+{
+ __mfd_enable(smp_processor_id());
+}
+
+static int __snp_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_SNP_EN;
+ val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+ __snp_enable(smp_processor_id());
+}
+
+bool snp_get_rmptable_info(u64 *start, u64 *len)
+{
+ u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
+
+ rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+ rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+ if (!rmp_base || !rmp_end) {
+ pr_err("Memory for the RMP table has not been reserved by BIOS\n");
+ return false;
+ }
+
+ rmp_sz = rmp_end - rmp_base + 1;
+
+ /*
+ * Calculate the amount of memory that must be reserved by the BIOS to
+ * address the whole RAM, including the bookkeeping area (each 4K page
+ * of RAM needs a 16-byte RMP entry). The RMP itself must also be covered.
+ */
+ max_rmp_pfn = max_pfn;
+ if (PHYS_PFN(rmp_end) > max_pfn)
+ max_rmp_pfn = PHYS_PFN(rmp_end);
+
+ calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+ if (calc_rmp_sz > rmp_sz) {
+ pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+ calc_rmp_sz, rmp_sz);
+ return false;
+ }
+
+ *start = rmp_base;
+ *len = rmp_sz;
+
+ return true;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+ u64 rmp_base, sz;
+ void *start;
+ u64 val;
+
+ if (!snp_get_rmptable_info(&rmp_base, &sz))
+ return 1;
+
+ pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
+ rmp_base, rmp_base + sz - 1);
+
+ start = memremap(rmp_base, sz, MEMREMAP_WB);
+ if (!start) {
+ pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
+ return 1;
+ }
+
+ /*
+ * Check if SEV-SNP is already enabled; this can happen in the case
+ * of a kexec boot.
+ */
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+ if (val & MSR_AMD64_SYSCFG_SNP_EN)
+ goto skip_enable;
+
+ /* Initialize the RMP table to zero */
+ memset(start, 0, sz);
+
+ /* Flush the caches to ensure that data is written before SNP is enabled. */
+ wbinvd_on_all_cpus();
+
+ /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
+ on_each_cpu(mfd_enable, NULL, 1);
+
+ /* Enable SNP on all CPUs. */
+ on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+ rmptable_start = (unsigned long)start;
+ rmptable_end = rmptable_start + sz - 1;
+
+ return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+ int family, model;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ family = boot_cpu_data.x86;
+ model = boot_cpu_data.x86_model;
+
+ /*
+ * The RMP table entry format is not architectural; it can vary by processor
+ * and is defined by the per-processor PPR. Restrict SNP support to the known
+ * CPU models and families for which the RMP table entry format is currently defined.
+ */
+ if (!(family == 0x19 && model <= 0xaf) && !(family == 0x1a && model <= 0xf))
+ goto nosnp;
+
+ if (amd_iommu_snp_enable())
+ goto nosnp;
+
+ if (__snp_rmptable_init())
+ goto nosnp;
+
+ cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+ return 0;
+
+nosnp:
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+ return -ENOSYS;
+}
+
+/*
+ * This must run after the PCI subsystem is initialized, because
+ * amd_iommu_snp_enable() is called to ensure the IOMMU supports the SEV-SNP
+ * feature, and that can only be done after subsys_initcall().
+ *
+ * NOTE: SNP requires the IOMMU to be enabled so that the hypervisor cannot
+ * program DMA directly into guest private memory. With SNP, the IOMMU ensures
+ * that the page(s) used for DMA are hypervisor-owned.
+ */
+fs_initcall(snp_rmptable_init);
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 5dfa4fb76f4b..0a9938aea305 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -99,6 +99,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_KVM_AMD_SEV
+# define DISABLE_SEV_SNP 0
+#else
+# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -123,7 +129,7 @@
DISABLE_ENQCMD)
#define DISABLED_MASK17 0
#define DISABLED_MASK18 0
-#define DISABLED_MASK19 0
+#define DISABLED_MASK19 (DISABLE_SEV_SNP)
#define DISABLED_MASK20 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ad35355ee43e..db0f3a041930 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -574,6 +574,8 @@
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE 0xc0010132
+#define MSR_AMD64_RMP_END 0xc0010133

/* SNP feature bits enabled by the hypervisor */
#define MSR_AMD64_SNP_VTOM BIT_ULL(3)
@@ -675,7 +677,14 @@
#define MSR_K8_TOP_MEM2 0xc001001d
#define MSR_AMD64_SYSCFG 0xc0010010
#define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
+#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT 19
+#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..d34c46db7dd1 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -197,6 +197,7 @@ void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
+bool snp_get_rmptable_info(u64 *start, u64 *len);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -221,6 +222,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
{
return -ENOTTY;
}
+static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
#endif

#endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index a79774181f22..1493ddf89fdf 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -20,6 +20,7 @@
#include <asm/delay.h>
#include <asm/debugreg.h>
#include <asm/resctrl.h>
+#include <asm/sev.h>

#ifdef CONFIG_X86_64
# include <asm/mmconfig.h>
@@ -546,6 +547,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
resctrl_cpu_detect(c);
}

+static bool early_rmptable_check(void)
+{
+ u64 rmp_base, rmp_size;
+
+ /*
+ * During early BSP initialization, max_pfn won't be set up yet; wait until
+ * it is set before performing the RMP table calculations.
+ */
+ if (!max_pfn)
+ return true;
+
+ return snp_get_rmptable_info(&rmp_base, &rmp_size);
+}
+
static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
{
u64 msr;
@@ -587,6 +602,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
if (!(msr & MSR_K7_HWCR_SMMLOCK))
goto clear_sev;

+ if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
+ goto clear_snp;
+
return;

clear_all:
@@ -594,6 +612,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+clear_snp:
setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
}
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 19a46b9f7357..33ea62d93540 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3665,7 +3665,7 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
}

-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void)
{
/*
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 953e6f12fa1c..8f0cde2d451c 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -206,7 +206,7 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn,
u64 *value);
struct amd_iommu *get_amd_iommu(unsigned int idx);

-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void);
#endif

--
2.25.1


2023-06-12 09:04:48

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile

On Sun, Jun 11, 2023 at 11:25:13PM -0500, Michael Roth wrote:
> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
> arch/x86/coco, but that is generally only applicable for guest support.
>
> For SEV-SNP, helpers related purely to host support will also live in
> arch/x86/coco. To allow for CoCo-related host support code in
> arch/x86/coco, move that check down into the Makefile and check for it
> specifically when needed.

Hm. TDX host support uses arch/x86/virt/vmx/tdx/. I think we need to be
consistent here.

IIRC, Borislav proposed the scheme that TDX uses.


--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-12 11:03:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating private memory

On Sun, Jun 11, 2023 at 11:25:10PM -0500, Michael Roth wrote:
> TODO: add a CONFIG option that can be used to completely skip the arch
> invalidation loop and avoid __weak references for arch/platforms that
> don't need an additional invalidation hook.
>
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort when freeing memory in
> response to FALLOC_FL_PUNCH_HOLE operations.
>
> Also issue invalidations of all allocated pages when releasing the gmem
> file so that the pages are not left in an unusable state when they get
> freed back to the host.
>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 6 ++++
> include/linux/kvm_host.h | 3 ++
> virt/kvm/guest_mem.c | 48 ++++++++++++++++++++++++++++--
> 5 files changed, 57 insertions(+), 2 deletions(-)

ld: arch/x86/kvm/../../../virt/kvm/eventfd.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/binary_stats.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/arch/x86/kvm/../../../virt/kvm/binary_stats.c:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/vfio.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/coalesced_mmio.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/async_pf.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/irqchip.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/dirty_ring.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/pfncache.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/x86.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/emulate.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/i8259.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/irq.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/lapic.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/i8254.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/ioapic.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/irq_comm.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/cpuid.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/pmu.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mtrr.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/hyperv.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/debugfs.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/mmu.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/page_track.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/spte.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/tdp_iter.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/arch/x86/kvm/mmu/tdp_iter.c:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/tdp_mmu.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/smm.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
make[3]: *** [scripts/Makefile.build:452: arch/x86/kvm/kvm.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:494: arch/x86/kvm] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [scripts/Makefile.build:494: arch/x86] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:2028: .] Error 2

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-12 16:28:29

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code

On 6/11/23 21:25, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Bit 31 in the page fault error code will be set when the processor
> encounters an RMP violation.
>
> While at it, use the BIT_ULL() macro.
...
> enum x86_pf_error_code {
> - X86_PF_PROT = 1 << 0,
> - X86_PF_WRITE = 1 << 1,
> - X86_PF_USER = 1 << 2,
> - X86_PF_RSVD = 1 << 3,
> - X86_PF_INSTR = 1 << 4,
> - X86_PF_PK = 1 << 5,
> - X86_PF_SGX = 1 << 15,
> + X86_PF_PROT = BIT(0),
> + X86_PF_WRITE = BIT(1),
> + X86_PF_USER = BIT(2),
> + X86_PF_RSVD = BIT(3),
> + X86_PF_INSTR = BIT(4),
> + X86_PF_PK = BIT(5),
> + X86_PF_SGX = BIT(15),
> + X86_PF_RMP = BIT(31),
> };

It would be nice if the changelog "BIT_ULL()" matched the code "BIT()". :)

With that fixed,

Acked-by: Dave Hansen <[email protected]>


2023-06-12 17:09:43

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On 6/11/23 21:25, Michael Roth wrote:
> +/*
> + * Assign a page to guest using the RMPUPDATE instruction.
> + */
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
> +{
> + struct rmp_state val;
> +
> + pr_debug("%s: GPA: 0x%llx, PFN: 0x%llx, level: %d, immutable: %d\n",
> + __func__, gpa, pfn, level, immutable);

Is this needed *EVERY* time a page is assigned to a guest? As in, if I
create a 4GB guest, I'll see a literal million of these pr_debug()s in
dmesg?


2023-06-12 17:22:09

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

> +
> +static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_start start = {0};
> + struct kvm_sev_snp_launch_start params;
> + int rc;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + sev->snp_context = snp_context_create(kvm, argp);
> + if (!sev->snp_context)
> + return -ENOTTY;


I commented on a previous series but I think the bug is still here. I
think users can repeatedly call KVM_SEV_SNP_LAUNCH_START to have KVM
keep allocating more snp_contexts above.

Should we check if the VM already has a |snp_context| and error out if so?
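
e.g., something along these lines (untested sketch):

	if (sev->snp_context)
		return -EEXIST;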

>
> +
> + start.gctx_paddr = __psp_pa(sev->snp_context);
> + start.policy = params.policy;
> + memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
> + rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
> + if (rc)
> + goto e_free_context;
> +
> + sev->fd = argp->sev_fd;
> + rc = snp_bind_asid(kvm, &argp->error);
> + if (rc)
> + goto e_free_context;
> +
> + return 0;
> +
> +e_free_context:
> + snp_decommission_context(kvm);
> +
> + return rc;
> +}
> +

2023-06-14 14:49:57

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault

On Sun, Jun 11, 2023 at 11:25:11PM -0500,
Michael Roth <[email protected]> wrote:

> The upper bits will be needed in some cases to distinguish between
> nested page faults for private/shared pages, so pass along the full
> 64-bit value.
>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 3 +--
> arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c54672ad6cbc..0d3983b9aa7e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5829,8 +5829,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> }
>
> if (r == RET_PF_INVALID) {
> - r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
> - lower_32_bits(error_code), false,
> + r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
> &emulation_type);
> if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
> return -EIO;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index f1786698ae00..780b91e1da9f 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -283,11 +283,11 @@ enum {
> };
>
> static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> - u32 err, bool prefetch, int *emulation_type)
> + u64 err, bool prefetch, int *emulation_type)
> {
> struct kvm_page_fault fault = {
> .addr = cr2_or_gpa,
> - .error_code = err,
> + .error_code = lower_32_bits(err),
> .exec = err & PFERR_FETCH_MASK,
> .write = err & PFERR_WRITE_MASK,
> .present = err & PFERR_PRESENT_MASK,
> --
> 2.25.1

Hi. I'd like to pass the error code around as 64-bit for TDX as well, and
came up with the following. Does it work for you?

From 53273b67e9be09129d35ac00cf9ce739d3fb4e2c Mon Sep 17 00:00:00 2001
Message-Id: <53273b67e9be09129d35ac00cf9ce739d3fb4e2c.1686752340.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <[email protected]>
Date: Fri, 17 Mar 2023 12:58:42 -0700
Subject: [PATCH] KVM: x86/mmu: Pass round full 64-bit error code for the KVM
page fault

In some cases the full 64-bit error code for the KVM page fault will be
needed, e.g. to distinguish between nested page faults for private vs.
shared pages, so update kvm_mmu_do_page_fault() to accept the full 64-bit
value so it can be plumbed through to the fault handlers.

The upper 32 bits of the error code are currently discarded at
kvm_mmu_page_fault() by lower_32_bits(); now the full 64 bits are passed
down. It turns out that only FNAME(page_fault) depends on the truncation,
so move the lower_32_bits() masking into FNAME(page_fault).

The accesses of fault->error_code are as follows
- FNAME(page_fault): change to explicitly use lower_32_bits()
- kvm_tdp_page_fault(): explicit mask with PFERR_LEVEL_MASK
- kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
PFERR_NESTED_GUEST_PAGE
- mmutrace: changed u32 -> u64
- pgprintk(): change %x -> %llx

Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 7 +++----
arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 4 ++--
5 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6cc2558e977d..b6c9c2e27d0b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -176,7 +176,7 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
}

kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
- u32 error_code, int max_level);
+ u64 error_code, int max_level);

/*
* Check if a given access (described through the I/D, W/R and U/S bits of a
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2c6d9d8d2c10..e4457eaa10a9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4941,7 +4941,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
- pgprintk("%s: gva %llx error %x\n", __func__, fault->addr, fault->error_code);
+ pgprintk("%s: gva %llx error %llx\n", __func__, fault->addr, fault->error_code);

/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
fault->max_level = PG_LEVEL_2M;
@@ -5062,7 +5062,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
}

kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
- u32 error_code, int max_level)
+ u64 error_code, int max_level)
{
int r;
struct kvm_page_fault fault = (struct kvm_page_fault) {
@@ -6317,8 +6317,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}

if (r == RET_PF_INVALID) {
- r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
- lower_32_bits(error_code), false,
+ r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
&emulation_type);
if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 2d8a5df56f20..979fd7c26610 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -342,7 +342,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
struct kvm_page_fault {
/* arguments to kvm_mmu_do_page_fault. */
const gpa_t addr;
- const u32 error_code;
+ const u64 error_code;
const bool prefetch;

/* Derived from error_code. */
@@ -437,7 +437,7 @@ enum {
};

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u32 err, bool prefetch, int *emulation_type)
+ u64 err, bool prefetch, int *emulation_type)
{
struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2d7555381955..2e77883c92f6 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -261,7 +261,7 @@ TRACE_EVENT(
TP_STRUCT__entry(
__field(int, vcpu_id)
__field(gpa_t, cr2_or_gpa)
- __field(u32, error_code)
+ __field(u64, error_code)
__field(u64 *, sptep)
__field(u64, old_spte)
__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d87f95245ee9..8aeafd9178ac 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -758,7 +758,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
struct guest_walker walker;
int r;

- pgprintk("%s: addr %llx err %x\n", __func__, fault->addr, fault->error_code);
+ pgprintk("%s: addr %llx err %llx\n", __func__, fault->addr, fault->error_code);
WARN_ON_ONCE(fault->is_tdp);

/*
@@ -767,7 +767,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* The bit needs to be cleared before walking guest page tables.
*/
r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
- fault->error_code & ~PFERR_RSVD_MASK);
+ lower_32_bits(fault->error_code) & ~PFERR_RSVD_MASK);

/*
* The page is not mapped by the guest. Let the guest handle it.
--
2.25.1




--
Isaku Yamahata <[email protected]>

2023-06-20 12:21:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile

On Sun, Jun 11, 2023 at 11:25:13PM -0500, Michael Roth wrote:
> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
^^^^^^

Use proper english words pls.

> arch/x86/coco, but that is generally only applicable for guest support.
>
> For SEV-SNP, helpers related purely to host support will also live in
> arch/x86/coco. To allow for CoCo-related host support code in
> arch/x86/coco, move that check down into the Makefile and check for it
> specifically when needed.

I have no clue what that means. Example?

The last time we talked about paths, we ended up agreeing on:

https://lore.kernel.org/all/[email protected]/

So your "helpers related purely to host support" should go to

arch/x86/virt/svm/sev*.c

And just to keep it simple, that should be

arch/x86/virt/svm/sev.c

and if there's real need to split that, we can do that later.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-21 08:59:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile

On Tue, Jun 20, 2023 at 03:43:15PM -0500, Michael Roth wrote:
> Basically, arch/x86/coco/Makefile is never processed if arch/x86/Kbuild
> indicates that CONFIG_HAS_CC_PLATFORM is not set. So if we want to have
> stuff in arch/x86/coco/Makefile that builds for !CONFIG_HAS_CC_PLATFORM,
> like SNP host support, which does not rely on CONFIG_HAS_CC_PLATFORM
> being set, that check needs to be moved down into arch/x86/coco/Makefile.

Ok, so if you put SNP host support into arch/x86/virt/svm/sev.c, that
should work too and won't have any relation to CONFIG_HAS_CC_PLATFORM,
right?

The CC_PLATFORM thing is a way to check for confidential computing guest
features by abstracting the capabilities so that you don't have to check
*each* and *every* conf guest type in the conditionals and thus go nuts.
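
For instance, a single attribute check can stand in for all the per-vendor
checks (illustrative snippet; vaddr/npages are placeholders):

	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
		/* covers SEV, SEV-ES, SEV-SNP and TDX guests alike */
		set_memory_decrypted(vaddr, npages);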

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-21 09:19:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

On Mon, Jun 12, 2023 at 08:34:02AM -0700, Dave Hansen wrote:
> On 6/11/23 21:25, Michael Roth wrote:
> > + /*
> > + * Calculate the amount the memory that must be reserved by the BIOS to
> > + * address the whole RAM, including the bookkeeping area. The RMP itself
> > + * must also be covered.
> > + */
> > + max_rmp_pfn = max_pfn;
> > + if (PHYS_PFN(rmp_end) > max_pfn)
> > + max_rmp_pfn = PHYS_PFN(rmp_end);
>
> Could you say a little here about how this deals with memory hotplug?

Does SNP hw even support memory hotplug?

I think in order to support that, you'd need some special dance because
of the RMP table etc...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-21 09:49:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details

Rather say 'APM v2, section "Secure Nested Paging (SEV-SNP)"' because
the numbering is more likely to change than the name in the future. With
the name, people can find it faster.

> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
>
> - Detect SEV-SNP support based on CPUID bit
> - Initialize the RMP table memory reported by the RMP base/end MSR
> registers and configure IOMMU to be compatible with RMP access
> restrictions
> - Set the MtrrFixDramModEn bit in SYSCFG MSR
> - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
> - Configure IOMMU
>
> RMP table entry format is non-architectural and it can vary by
> processor. It is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format to guard against any undefined behavior when running on other
> system types. Future models/support will handle this through an
> architectural mechanism to allow for broader compatibility.

I'm guessing this is all for live migration between SNP hosts. If so,
then there will have to be a guest API to handle the differences.

> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.

Does that mean that even on CONFIG_AMD_MEM_ENCRYPT=n kernels, host SNP
can function?

Do we even want that?

I'd expect that a host SNP kernel should have SME enabled too even
though it is not absolutely necessary.

> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Co-developed-by: Tom Lendacky <[email protected]>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: rework commit message to be clearer about what patch does, squash
> in early_rmptable_check() handling from Tom]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/coco/Makefile | 1 +
> arch/x86/coco/sev/Makefile | 3 +
> arch/x86/coco/sev/host.c | 212 +++++++++++++++++++++++
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 11 +-
> arch/x86/include/asm/sev.h | 2 +
> arch/x86/kernel/cpu/amd.c | 19 ++
> drivers/iommu/amd/init.c | 2 +-
> include/linux/amd-iommu.h | 2 +-
> 9 files changed, 256 insertions(+), 4 deletions(-)
> create mode 100644 arch/x86/coco/sev/Makefile
> create mode 100644 arch/x86/coco/sev/host.c

Ignored review comments here:

https://lore.kernel.org/r/[email protected]

Ignoring this one for now too.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-21 14:37:07

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

On 6/21/23 02:15, Borislav Petkov wrote:
> On Mon, Jun 12, 2023 at 08:34:02AM -0700, Dave Hansen wrote:
>> On 6/11/23 21:25, Michael Roth wrote:
>>> + /*
>>> + * Calculate the amount of memory that must be reserved by the BIOS to
>>> + * address the whole RAM, including the bookkeeping area. The RMP itself
>>> + * must also be covered.
>>> + */
>>> + max_rmp_pfn = max_pfn;
>>> + if (PHYS_PFN(rmp_end) > max_pfn)
>>> + max_rmp_pfn = PHYS_PFN(rmp_end);
>> Could you say a little here about how this deals with memory hotplug?
> Does SNP hw even support memory hotplug?
>
> I think in order to support that, you'd need some special dance because
> of the RMP table etc...

Yep, there's the hardware side and then there are fun nuggets like using
mem= and then doing a software-only hot-add later after boot.

Also, if the hardware doesn't support any kind of hotplug, it would be
great to point to the place in the spec where it says that.

2023-06-21 14:38:53

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

On 6/21/23 04:42, Borislav Petkov wrote:
> On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> The memory integrity guarantees of SEV-SNP are enforced through a new
>> structure called the Reverse Map Table (RMP). The RMP is a single data
>> structure shared across the system that contains one entry for every 4K
>> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
>
> Rather say 'APM v2, section "Secure Nested Paging (SEV-SNP)"' because
> the numbering is more likely to change than the name in the future. With
> the name, people can find it faster.
>
>> a number of steps needed to detect/enable SEV-SNP and RMP table support
>> on the host:
>>
>> - Detect SEV-SNP support based on CPUID bit
>> - Initialize the RMP table memory reported by the RMP base/end MSR
>> registers and configure IOMMU to be compatible with RMP access
>> restrictions
>> - Set the MtrrFixDramModEn bit in SYSCFG MSR
>> - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>> - Configure IOMMU
>>
>> RMP table entry format is non-architectural and it can vary by
>> processor. It is defined by the PPR. Restrict SNP support to CPU
>> models/families which are compatible with the current RMP table entry
>> format to guard against any undefined behavior when running on other
>> system types. Future models/support will handle this through an
>> architectural mechanism to allow for broader compatibility.
>
> I'm guessing this is all for live migration between SNP hosts. If so,
> then there will have to be a guest API to handle the differences.
>
>> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
>> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
>> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
>> instead of CONFIG_AMD_MEM_ENCRYPT.
>
> Does that mean that even on CONFIG_AMD_MEM_ENCRYPT=n kernels, host SNP
> can function?

Yes, because CONFIG_AMD_MEM_ENCRYPT is mainly for dealing with the
encryption bit.

>
> Do we even want that?

We support that today with SEV and SEV-ES guests. The host/hypervisor
kernel does not need CONFIG_AMD_MEM_ENCRYPT=y in order to run SEV guests.

>
> I'd expect that a host SNP kernel should have SME enabled too even
> though it is not absolutely necessary.

I recommend using TSME over SME.

Thanks,
Tom

>
>> Co-developed-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Co-developed-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> [mdr: rework commit message to be clearer about what patch does, squash
>> in early_rmptable_check() handling from Tom]
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/coco/Makefile | 1 +
>> arch/x86/coco/sev/Makefile | 3 +
>> arch/x86/coco/sev/host.c | 212 +++++++++++++++++++++++
>> arch/x86/include/asm/disabled-features.h | 8 +-
>> arch/x86/include/asm/msr-index.h | 11 +-
>> arch/x86/include/asm/sev.h | 2 +
>> arch/x86/kernel/cpu/amd.c | 19 ++
>> drivers/iommu/amd/init.c | 2 +-
>> include/linux/amd-iommu.h | 2 +-
>> 9 files changed, 256 insertions(+), 4 deletions(-)
>> create mode 100644 arch/x86/coco/sev/Makefile
>> create mode 100644 arch/x86/coco/sev/host.c
>
> Ignored review comments here:
>
> https://lore.kernel.org/r/[email protected]
>
> Ignoring this one for now too.
>

2023-06-21 16:18:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

On Wed, Jun 21, 2023 at 07:31:34AM -0700, Dave Hansen wrote:
> Yep, there's the hardware side and then there are fun nuggets like using
> mem= and then doing a software-only hot-add later after boot.

Ah, right, Mike, I think we need to check whether mem= has any effect on
SNP.

> Also, if the hardware doesn't support any kind of hotplug, it would be
> great to point to the place in the spec where it says that.

I honestly don't know. But I can't recall ever hearing about hardware
memory hotplug so something worth to check too.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-21 19:26:25

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

Hello Boris,

On 6/21/2023 4:42 AM, Borislav Petkov wrote:
> On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> The memory integrity guarantees of SEV-SNP are enforced through a new
>> structure called the Reverse Map Table (RMP). The RMP is a single data
>> structure shared across the system that contains one entry for every 4K
>> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
>
> Rather say 'APM v2, section "Secure Nested Paging (SEV-SNP)"' because
> the numbering is more likely to change than the name in the future. With
> the name, people can find it faster.
>
>> a number of steps needed to detect/enable SEV-SNP and RMP table support
>> on the host:
>>
>> - Detect SEV-SNP support based on CPUID bit
>> - Initialize the RMP table memory reported by the RMP base/end MSR
>> registers and configure IOMMU to be compatible with RMP access
>> restrictions
>> - Set the MtrrFixDramModEn bit in SYSCFG MSR
>> - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>> - Configure IOMMU
>>
>> RMP table entry format is non-architectural and it can vary by
>> processor. It is defined by the PPR. Restrict SNP support to CPU
>> models/families which are compatible with the current RMP table entry
>> format to guard against any undefined behavior when running on other
>> system types. Future models/support will handle this through an
>> architectural mechanism to allow for broader compatibility.
>
> I'm guessing this is all for live migration between SNP hosts. If so,
> then there will have to be a guest API to handle the differences.

This is basically for the RMP table entry format/structure definition in
arch/x86/coco/sev/host.c. As the format is non-architectural, it is defined
in a .c file instead of a header file so that the structure remains private
to the SNP host code (restricted to that file) and is not exposed to the
rest of the kernel.

As mentioned in the comments above, future CPU models may support RMP
table accesses in an architectural way.
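
i.e., the usual opaque-type pattern (sketch for illustration; the field
layout shown is a placeholder, the real layout follows the PPR):

	/* arch/x86/coco/sev/host.c - layout stays private to this file */
	struct rmpentry {
		u64 lo;	/* placeholder; actual bitfields per the PPR */
		u64 hi;
	};

	/* arch/x86/include/asm/sev-host.h exposes only accessors, e.g.: */
	int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);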

>
>> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
>> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
>> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
>> instead of CONFIG_AMD_MEM_ENCRYPT.
>
> Does that mean that even on CONFIG_AMD_MEM_ENCRYPT=n kernels, host SNP
> can function?
>

Yes, host SNP is supposed to function with CONFIG_AMD_MEM_ENCRYPT=n.

CONFIG_AMD_MEM_ENCRYPT=y is needed for SNP guest.

> Do we even want that?
>
> I'd expect that a host SNP kernel should have SME enabled too even
> though it is not absolutely necessary.

Yes, we typically test host SNP kernel with SME enabled.

Thanks,
Ashish

>
>> Co-developed-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Co-developed-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> [mdr: rework commit message to be clearer about what patch does, squash
>> in early_rmptable_check() handling from Tom]
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/coco/Makefile | 1 +
>> arch/x86/coco/sev/Makefile | 3 +
>> arch/x86/coco/sev/host.c | 212 +++++++++++++++++++++++
>> arch/x86/include/asm/disabled-features.h | 8 +-
>> arch/x86/include/asm/msr-index.h | 11 +-
>> arch/x86/include/asm/sev.h | 2 +
>> arch/x86/kernel/cpu/amd.c | 19 ++
>> drivers/iommu/amd/init.c | 2 +-
>> include/linux/amd-iommu.h | 2 +-
>> 9 files changed, 256 insertions(+), 4 deletions(-)
>> create mode 100644 arch/x86/coco/sev/Makefile
>> create mode 100644 arch/x86/coco/sev/host.c
>
> Ignored review comments here:
>
> https://lore.kernel.org/r/[email protected]
>
> Ignoring this one for now too.
>

2023-06-30 21:58:11

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile

On Wed, Jun 21, 2023 at 10:54:00AM +0200, Borislav Petkov wrote:
> On Tue, Jun 20, 2023 at 03:43:15PM -0500, Michael Roth wrote:
> > Basically, arch/x86/coco/Makefile is never processed if arch/x86/Kbuild
> > indicates that CONFIG_HAS_CC_PLATFORM is not set. So if we want to have
> > stuff in arch/x86/coco/Makefile that builds for !CONFIG_HAS_CC_PLATFORM,
> > like SNP host support, which does not rely on CONFIG_HAS_CC_PLATFORM
> > being set, that check needs to be moved down into arch/x86/coco/Makefile.
>
> Ok, so if you put SNP host support into arch/x86/virt/svm/sev.c, that
> should work too and won't have any relation to CONFIG_HAS_CC_PLATFORM,
> right?

Right, that works out just as well, and ends up being a bit more
straightforward. I have it implemented here:

https://github.com/mdroth/linux/commits/snp-host-latest-v9b

https://github.com/mdroth/linux/commit/a889a2dd64b62d9c3bf74cf02e7d8d71c7061667

and dropped the patch that reworks arch/x86/coco/Makefile.

Thanks,

Mike

>
> The CC_PLATFORM thing is a way to check for confidential computing guest
> features by abstracting the capabilities so that you don't have to check
> *each* and *every* conf guest type in the conditionals and thus go nuts.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

by Sathyanarayanan Kuppuswamy

[permalink] [raw]
Subject: Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile

Hi,

On 6/11/23 9:25 PM, Michael Roth wrote:
> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
> arch/x86/coco, but that is generally only applicable for guest support.>
> For SEV-SNP, helpers related purely to host support will also live in
> arch/x86/coco. To allow for CoCo-related host support code in
> arch/x86/coco, move that check down into the Makefile and check for it
> specifically when needed.


I think CONFIG_HAS_CC_PLATFORM is not meant to be guest specific (otherwise,
we could have named it CONFIG_HAS_CC_GUEST). Will it create any issue if
we enable it on the host?

>
> Cc: Kirill A. Shutemov <[email protected]>
> Suggested-by: Tom Lendacky <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/Kbuild | 2 +-
> arch/x86/coco/Makefile | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> index 5a83da703e87..1889cef48b58 100644
> --- a/arch/x86/Kbuild
> +++ b/arch/x86/Kbuild
> @@ -1,5 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += coco/
> +obj-y += coco/
>
> obj-y += entry/
>
> diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
> index c816acf78b6a..6aa52e719bf5 100644
> --- a/arch/x86/coco/Makefile
> +++ b/arch/x86/coco/Makefile
> @@ -3,6 +3,6 @@ CFLAGS_REMOVE_core.o = -pg
> KASAN_SANITIZE_core.o := n
> CFLAGS_core.o += -fno-stack-protector
>
> -obj-y += core.o
> +obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
>
> obj-$(CONFIG_INTEL_TDX_GUEST) += tdx/

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2023-07-10 13:16:15

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile

On 7/9/23 22:05, Sathyanarayanan Kuppuswamy wrote:
> Hi,
>
> On 6/11/23 9:25 PM, Michael Roth wrote:
>> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
>> arch/x86/coco, but that is generally only applicable for guest support.>
>> For SEV-SNP, helpers related purely to host support will also live in
>> arch/x86/coco. To allow for CoCo-related host support code in
>> arch/x86/coco, move that check down into the Makefile and check for it
>> specifically when needed.
>
>
> I think CONFIG_HAS_CC_PLATFORM is not meant to be guest specific (otherwise,

Correct, it is used in bare-metal for SME support, so that needs to
continue to work.

Thanks,
Tom

> we could have named it CONFIG_HAS_CC_GUEST). Will it create any issue if
> we enable it in host?
>
>>
>> Cc: Kirill A. Shutemov <[email protected]>
>> Suggested-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/Kbuild | 2 +-
>> arch/x86/coco/Makefile | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
>> index 5a83da703e87..1889cef48b58 100644
>> --- a/arch/x86/Kbuild
>> +++ b/arch/x86/Kbuild
>> @@ -1,5 +1,5 @@
>> # SPDX-License-Identifier: GPL-2.0
>> -obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += coco/
>> +obj-y += coco/
>>
>> obj-y += entry/
>>
>> diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
>> index c816acf78b6a..6aa52e719bf5 100644
>> --- a/arch/x86/coco/Makefile
>> +++ b/arch/x86/coco/Makefile
>> @@ -3,6 +3,6 @@ CFLAGS_REMOVE_core.o = -pg
>> KASAN_SANITIZE_core.o := n
>> CFLAGS_core.o += -fno-stack-protector
>>
>> -obj-y += core.o
>> +obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
>>
>> obj-$(CONFIG_INTEL_TDX_GUEST) += tdx/
>

2023-08-09 13:32:57

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list

On Sun, Jun 11, 2023 at 11:25:27PM -0500, Michael Roth wrote:
> From: Ashish Kalra <[email protected]>
>
> Pages are unsafe to release back to the page allocator if they
> have been transitioned to firmware/guest state and can't be reclaimed
> or transitioned back to hypervisor/shared state. In this case, add
> them to an internal leaked-pages list to ensure that they are not freed
> or touched/accessed, which would cause fatal page faults.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> [mdr: relocate to arch/x86/coco/sev/host.c]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/coco/sev/host.c | 28 ++++++++++++++++++++++++++++
> arch/x86/include/asm/sev-host.h | 3 +++
> 2 files changed, 31 insertions(+)
>
> diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
> index cd3b4c6a25bc..373e91f5a337 100644
> --- a/arch/x86/coco/sev/host.c
> +++ b/arch/x86/coco/sev/host.c
> @@ -64,6 +64,12 @@ struct rmpentry {
> static unsigned long rmptable_start __ro_after_init;
> static unsigned long rmptable_end __ro_after_init;
>
> +/* list of pages which are leaked and cannot be reclaimed */
> +static LIST_HEAD(snp_leaked_pages_list);
> +static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
> +
> +static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
> +
> #undef pr_fmt
> #define pr_fmt(fmt) "SEV-SNP: " fmt
>
> @@ -494,3 +500,25 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
> return rmpupdate(pfn, &val);
> }
> EXPORT_SYMBOL_GPL(rmp_make_shared);
> +
> +void snp_leak_pages(unsigned long pfn, unsigned int npages)
> +{
> + struct page *page = pfn_to_page(pfn);
> +
> + WARN(1, "psc failed, pfn 0x%lx pages %d (marked offline)\n", pfn, npages);
> +
> + spin_lock(&snp_leaked_pages_list_lock);
> + while (npages--) {
> + /*
> + * Reuse the page's buddy list for chaining into the leaked
> + * pages list. This page should not be on a free list currently
> + * and is also unsafe to be added to a free list.
> + */
> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
> + sev_dump_rmpentry(pfn);
> + pfn++;
> + }
> + spin_unlock(&snp_leaked_pages_list_lock);
> + atomic_long_inc(&snp_nr_leaked_pages);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);
> diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
> index 753e80d16433..bab3b226777a 100644
> --- a/arch/x86/include/asm/sev-host.h
> +++ b/arch/x86/include/asm/sev-host.h
> @@ -19,6 +19,8 @@ void sev_dump_rmpentry(u64 pfn);
> int psmash(u64 pfn);
> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> int rmp_make_shared(u64 pfn, enum pg_level level);
> +void snp_leak_pages(unsigned long pfn, unsigned int npages);
> +
> #else
> static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
> static inline void sev_dump_rmpentry(u64 pfn) {}
> @@ -29,6 +31,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
> return -ENODEV;
> }
> static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> +void snp_leak_pages(unsigned long pfn, unsigned int npages) {}

This needs to be 'static inline' or the build fails with multiple definition errors.
I'm building a guest kernel with CONFIG_KVM_AMD_SEV disabled.

Jeremi

> #endif
>
> #endif
> --
> 2.25.1
>
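
Two follow-ups on the patch quoted above. First, the stub fix Jeremi
requests is simply:

  static inline void snp_leak_pages(unsigned long pfn, unsigned int npages) {}

Second, a hypothetical caller sketch showing where snp_leak_pages() fits:
a page that cannot be transitioned back to shared/hypervisor state must
never reach the page allocator, so it is handed to the leaked list instead
of being freed (the helper name is illustrative, not from the series):

  static void snp_cleanup_page(u64 pfn)
  {
          /*
           * If restoring the page to shared state fails, it is unsafe
           * to free; record it on the leaked pages list instead.
           */
          if (rmp_make_shared(pfn, PG_LEVEL_4K))
                  snp_leak_pages(pfn, 1);
          else
                  __free_page(pfn_to_page(pfn));
  }

Note that as posted, the loop in snp_leak_pages() advances pfn but never
page, and snp_nr_leaked_pages is incremented once per call rather than
once per page; a corrected loop body would also do page = pfn_to_page(pfn)
each iteration and count every leaked page.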

2023-08-09 14:05:36

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support

On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
>
> - Detect SEV-SNP support based on CPUID bit
> - Initialize the RMP table memory reported by the RMP base/end MSR
> registers and configure IOMMU to be compatible with RMP access
> restrictions
> - Set the MtrrFixDramModEn bit in SYSCFG MSR
> - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
> - Configure IOMMU
>
> The RMP table entry format is non-architectural and can vary by
> processor; it is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format, to guard against undefined behavior when running on other
> system types. Future processors are expected to handle this through an
> architectural mechanism that allows for broader compatibility.
>
> SNP host code depends on the CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.
>
> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Co-developed-by: Tom Lendacky <[email protected]>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: rework commit message to be clearer about what patch does, squash
> in early_rmptable_check() handling from Tom]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/coco/Makefile | 1 +
> arch/x86/coco/sev/Makefile | 3 +
> arch/x86/coco/sev/host.c | 212 +++++++++++++++++++++++
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 11 +-
> arch/x86/include/asm/sev.h | 2 +
> arch/x86/kernel/cpu/amd.c | 19 ++
> drivers/iommu/amd/init.c | 2 +-
> include/linux/amd-iommu.h | 2 +-
> 9 files changed, 256 insertions(+), 4 deletions(-)
> create mode 100644 arch/x86/coco/sev/Makefile
> create mode 100644 arch/x86/coco/sev/host.c
>
> diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
> index 6aa52e719bf5..6a7d876130e2 100644
> --- a/arch/x86/coco/Makefile
> +++ b/arch/x86/coco/Makefile
> @@ -6,3 +6,4 @@ CFLAGS_core.o += -fno-stack-protector
> obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
>
> obj-$(CONFIG_INTEL_TDX_GUEST) += tdx/
> +obj-$(CONFIG_KVM_AMD_SEV) += sev/
> diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
> new file mode 100644
> index 000000000000..27c0500d75c8
> --- /dev/null
> +++ b/arch/x86/coco/sev/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-y += host.o
> diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
> new file mode 100644
> index 000000000000..6907ce887b23
> --- /dev/null
> +++ b/arch/x86/coco/sev/host.c
> @@ -0,0 +1,212 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AMD SVM-SEV Host Support.
> + *
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + *
> + * Author: Ashish Kalra <[email protected]>
> + *
> + */
> +
> +#include <linux/cc_platform.h>
> +#include <linux/printk.h>
> +#include <linux/mm_types.h>
> +#include <linux/set_memory.h>
> +#include <linux/memblock.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
> +#include <linux/amd-iommu.h>
> +
> +#include <asm/sev.h>
> +#include <asm/processor.h>
> +#include <asm/setup.h>
> +#include <asm/svm.h>
> +#include <asm/smp.h>
> +#include <asm/cpu.h>
> +#include <asm/apic.h>
> +#include <asm/cpuid.h>
> +#include <asm/cmdline.h>
> +#include <asm/iommu.h>
> +
> +/*
> + * The first 16KB from RMP_BASE is used by the processor for
> + * bookkeeping; this offset must be added during RMP entry lookups.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +
> +static unsigned long rmptable_start __ro_after_init;
> +static unsigned long rmptable_end __ro_after_init;
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "SEV-SNP: " fmt
> +
> +static int __mfd_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_MFDM;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void mfd_enable(void *arg)
> +{
> + __mfd_enable(smp_processor_id());
> +}
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_SNP_EN;
> + val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> + __snp_enable(smp_processor_id());
> +}
> +
> +bool snp_get_rmptable_info(u64 *start, u64 *len)
> +{
> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> + if (!rmp_base || !rmp_end) {
> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> + return false;
> + }
> +
> + rmp_sz = rmp_end - rmp_base + 1;
> +
> + /*
> + * Calculate the amount of memory that must be reserved by the BIOS to
> + * address the whole RAM, including the bookkeeping area. The RMP itself
> + * must also be covered.
> + */
> + max_rmp_pfn = max_pfn;
> + if (PHYS_PFN(rmp_end) > max_pfn)
> + max_rmp_pfn = PHYS_PFN(rmp_end);
> +
> + calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> + if (calc_rmp_sz > rmp_sz) {
> + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> + calc_rmp_sz, rmp_sz);
> + return false;
> + }
> +
> + *start = rmp_base;
> + *len = rmp_sz;
> +
> + return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)
> +{
> + u64 rmp_base, sz;
> + void *start;
> + u64 val;
> +
> + if (!snp_get_rmptable_info(&rmp_base, &sz))
> + return 1;
> +
> + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
> + rmp_base, rmp_base + sz - 1);
> +
> + start = memremap(rmp_base, sz, MEMREMAP_WB);
> + if (!start) {
> + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled; this can happen in the case
> + * of a kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */
> + memset(start, 0, sz);
> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> + on_each_cpu(mfd_enable, NULL, 1);
> +
> + /* Enable SNP on all CPUs. */
> + on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> + rmptable_start = (unsigned long)start;
> + rmptable_end = rmptable_start + sz - 1;
> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + int family, model;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + family = boot_cpu_data.x86;
> + model = boot_cpu_data.x86_model;
> +
> + /*
> + * The RMP table entry format is not architectural; it can vary by
> + * processor and is defined by the per-processor PPR. Restrict SNP
> + * support to the known CPU models and families for which the RMP
> + * table entry format is currently defined.
> + */
> + if (!(family == 0x19 && model <= 0xaf) && !(family == 0x1a && model <= 0xf))
> + goto nosnp;
> +
> + if (amd_iommu_snp_enable())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return -ENOSYS;
> +}
> +
> +/*
> + * This must run after the PCI subsystem is initialized, because
> + * amd_iommu_snp_enable(), which verifies that the IOMMU supports the
> + * SEV-SNP feature, can only be called after subsys_initcall().
> + *
> + * NOTE: SNP mandates use of the IOMMU to ensure that the hypervisor
> + * cannot program DMA directly into guest private memory. With SNP, the
> + * IOMMU guarantees that the page(s) used for DMA are hypervisor-owned.
> + */
> +fs_initcall(snp_rmptable_init);
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 5dfa4fb76f4b..0a9938aea305 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -99,6 +99,12 @@
> # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
> #endif
>
> +#ifdef CONFIG_KVM_AMD_SEV
> +# define DISABLE_SEV_SNP 0
> +#else
> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -123,7 +129,7 @@
> DISABLE_ENQCMD)
> #define DISABLED_MASK17 0
> #define DISABLED_MASK18 0
> -#define DISABLED_MASK19 0
> +#define DISABLED_MASK19 (DISABLE_SEV_SNP)
> #define DISABLED_MASK20 0
> #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index ad35355ee43e..db0f3a041930 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -574,6 +574,8 @@
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE 0xc0010132
> +#define MSR_AMD64_RMP_END 0xc0010133
>
> /* SNP feature bits enabled by the hypervisor */
> #define MSR_AMD64_SNP_VTOM BIT_ULL(3)
> @@ -675,7 +677,14 @@
> #define MSR_K8_TOP_MEM2 0xc001001d
> #define MSR_AMD64_SYSCFG 0xc0010010
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
> +#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT 19
> +#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
> +
> #define MSR_K8_INT_PENDING_MSG 0xc0010055
> /* C1E active bits in int pending message */
> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index ebc271bb6d8e..d34c46db7dd1 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -197,6 +197,7 @@ void snp_set_wakeup_secondary_cpu(void);
> bool snp_init(struct boot_params *bp);
> void __init __noreturn snp_abort(void);
> int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
> +bool snp_get_rmptable_info(u64 *start, u64 *len);
> #else
> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> static inline void sev_es_ist_exit(void) { }
> @@ -221,6 +222,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
> {
> return -ENOTTY;
> }
> +static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
> #endif
>
> #endif
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index a79774181f22..1493ddf89fdf 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -20,6 +20,7 @@
> #include <asm/delay.h>
> #include <asm/debugreg.h>
> #include <asm/resctrl.h>
> +#include <asm/sev.h>
>
> #ifdef CONFIG_X86_64
> # include <asm/mmconfig.h>
> @@ -546,6 +547,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
> resctrl_cpu_detect(c);
> }
>
> +static bool early_rmptable_check(void)
> +{
> + u64 rmp_base, rmp_size;
> +
> + /*
> + * For early BSP initialization, max_pfn won't be set up yet; wait until
> + * it is set before performing the RMP table calculations.
> + */
> + if (!max_pfn)
> + return true;
> +
> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
> +}
> +

When CONFIG_AMD_MEM_ENCRYPT=y && CONFIG_KVM=n (=> CONFIG_KVM_AMD_SEV=n), this
results in an undefined reference to snp_get_rmptable_info when linking this
file. The header provides a stub when AMD_MEM_ENCRYPT=n, but the actual
definition is only compiled in when KVM_AMD_SEV=y.

Jeremi
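
One way the link failure could be resolved (a sketch, not necessarily the
eventual fix): compile the snp_get_rmptable_info() call out of
early_rmptable_check() unless the code that defines it is actually built:

  static bool early_rmptable_check(void)
  {
          u64 rmp_base, rmp_size;

          /*
           * snp_get_rmptable_info() is defined in arch/x86/coco/sev/host.c,
           * which is built only when CONFIG_KVM_AMD_SEV=y. The constant
           * IS_ENABLED() check lets the compiler drop the call (and the
           * symbol reference) in all other configurations.
           */
          if (!IS_ENABLED(CONFIG_KVM_AMD_SEV))
                  return true;

          /* max_pfn isn't set up yet during early BSP initialization */
          if (!max_pfn)
                  return true;

          return snp_get_rmptable_info(&rmp_base, &rmp_size);
  }

Alternatively, the header stub could be keyed on CONFIG_KVM_AMD_SEV rather
than CONFIG_AMD_MEM_ENCRYPT.

Separately, for intuition on the sizing check in the quoted
snp_get_rmptable_info(): each 4K page of DRAM needs a 16-byte RMP entry
(hence the << 4 shift), plus the fixed 16KB processor bookkeeping area.
A worked example, assuming a hypothetical host with 1 TiB of DRAM:

  /*
   * Illustrative arithmetic only. With 1 TiB of DRAM:
   *
   *   max_rmp_pfn = 1 TiB / 4 KiB          = 0x10000000 pages
   *   entries     = max_rmp_pfn * 16 bytes = 4 GiB
   *   total       = entries + 16 KiB of bookkeeping
   *
   * so the BIOS must reserve slightly over 4 GiB for the RMP table.
   */
  static u64 rmp_table_bytes_needed(u64 max_rmp_pfn)
  {
          return (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
  }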

2023-09-07 18:31:03

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support

Mike / Ashish

FYI, you might need to start including the change from this patch

https://lore.kernel.org/linux-iommu/[email protected]/T/

in this series, since Christoph would like to remove that code from
current upstream and re-introduce the change within this series instead.

Regards,
Suravee

On 6/12/2023 11:25 AM, Michael Roth wrote:
> From: Ashish Kalra <[email protected]>
>
> Add a new IOMMU API interface amd_iommu_snp_disable() to transition
> IOMMU pages to Hypervisor state from Reclaim state after SNP_SHUTDOWN_EX
> command. Invoke this API from the CCP driver after SNP_SHUTDOWN_EX
> command.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>