2023-10-16 13:30:23

by Michael Roth

Subject: [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v10

and is based on top of the following series:

"[PATCH RFC gmem v1 0/8] KVM: gmem hooks/changes needed for x86 (other archs?)"
https://lore.kernel.org/kvm/[email protected]/

which in turn is based on the KVM-x86 staging tree for guest_memfd:

https://github.com/kvm-x86/linux/commits/guest_memfd


== OVERVIEW ==

This patchset implements SEV-SNP hypervisor support for Linux. It
relies on the gmem changes noted above, which are still in an RFC
state; other than those aspects, the series is targeted for
inclusion in the KVM x86 tree to support running SEV-SNP guests on AMD
EPYC systems utilizing Zen 3 and newer microarchitectures.

More details on what SEV-SNP is and how it works are available below
under "BACKGROUND".


== PATCH LAYOUT ==

PATCH 01-02: Dependencies for patch #3 that are already upstream but not in
the current guest_memfd staging tree
PATCH 03 : General SEV-ES fix for MSR_IA32_XSS interception that fixes a
minor bug for SEV-ES, but a more severe one for SNP guests.
Planning to also submit this separately as an SEV-ES fix.
PATCH 04-19: Host SNP initialization code and CCP driver prep for handling
SNP commands
PATCH 20-43: General SNP enablement for KVM and the CCP driver
PATCH 47-50: Misc handling for IOMMU support, guest request handling, debug
infrastructure, and kdump-related handling.


== TESTING ==

For testing this via QEMU, use the following tree:

https://github.com/amdese/qemu/commits/snp-latest-gmem-v12

SEV-SNP with gmem enabled:

# Set discard=none to disable discarding memory post-conversion; this gives
# faster boot times at the cost of increased memory usage.
qemu-system-x86_64 -cpu EPYC-Milan-v2 \
-object memory-backend-memfd-private,id=ram1,size=2G,share=true \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,discard=both \
-machine q35,confidential-guest-support=sev0,memory-backend=ram1,kvm-type=protected \
...

KVM selftests for UPM:

cd $kernel_src_dir
make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support that is now part of mainline.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE
(Non-Automatic Exit) events that are used by the SEV-SNP guest to
communicate with the hypervisor. The series provides support for handling
the following new NAE events:

- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as a specific GFN within the guest. Any
attempt to map a page in the RMP table to a different guest/ASID, or to a
different GFN within a guest/ASID, will result in an RMP nested page fault.

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction, which sets a special bit in the RMP table
for the guest. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempt outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page, so it can be made aware of possible tampering.

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits; userspace
then issues the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.
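
As a rough sketch of the userspace side of that flow (using the ioctl and
attribute names from the guest_memfd series this is based on; vm_fd, gpa,
size, and to_private are assumed to come from the forwarded request):

    struct kvm_memory_attributes attrs = {
            .address    = gpa,
            .size       = size,
            .attributes = to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
    };

    if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
            err(1, "KVM_SET_MEMORY_ATTRIBUTES");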

The gmem / KVM MMU hooks implemented in this series then update the RMP
table entries for the backing PFNs, setting them to guest-owned/private
when mapping private pages into the guest via the KVM MMU. For shared
pages, the normal KVM MMU handling is used, and the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.

Feedback/review is very much appreciated!

-Mike


Changes since v9:

* Split off gmem changes to separate RFC series, drop RFC tag from this series
* Use 2M RMPUPDATE instructions whenever possible when invalidating/releasing
gmem pages
* Tighten up RMP #NPF handling to better differentiate spurious cases from
unexpected behavior
* Simplify/optimize logic for determining when 2M NPT private mappings are
possible
* Be more consistent with PFN data types and stub return values (Dave)
* Reduce potential flooding from frequently-printed pr_debug()'s (Dave)
* Use existing #PF handling paths to catch illegal userspace-generated RMP
faults (Dave)
* Improve host kexec/kdump support (Ashish)
* Reduce overhead from unnecessary WBINVD via MMU notifiers (Ashish)
* Avoid host crashes during CCP module probe if SNP_INIT* is issued while
guests are running (Tom L.)
* Simplify AutoIBRS disablement (Kim, Dave)
* Avoid unnecessary zeroing in extended guest requests (Alexey)
* Fix padding in struct sev_user_data_ext_snp_config (Alexey)
* Report AP creation failures via GHCB error codes rather than inducing #GP in
guest (Peter)
* Disallow multiple allocations of snp_context via userspace (Peter)
* Error out on unsupported SNP policy bits (Tom)
* Fix snp_leak_pages() stub (Jeremi)
* Use C99 flexible arrays where appropriate
* Use helper to handle HVA->PFN conversions prior to dumping RMP entries (Dave)
* Don't potentially print out all 512 entries when dumping 2MB RMP range (Dave)
* Don't use a union to dump raw RMP entries, just cast at dump-site (Dave)
* Don't use helpers to access RMP entry bitfields, use them directly (Dave)
* Simplify logic and improve comments for AutoIBRS disablement (Dave)

# Changes that were split off to separate gmem series
* Use KVM_X86_SNP_VM to implement SNP-specific checks on whether a fault was
shared/private and drop the duplicate memslot lookup (Isaku, Sean)
* Use Isaku's version of patch to plumb 64-bit #NPF error code (Isaku)
* Fix up stub for kvm_arch_gmem_invalidate() (Boris)

Changes since v8:

* Rework gmem/UPM hooks based on Sean's latest gmem/UPM tree
* Move SEV lazy-pinning support out to a separate series which uses this
series as a prereq instead of the other way around.
* Re-organize extended guest request patches into 3 patches encompassing
SEV FD ioctls for host-wide certs, KVM ioctls for per-instance certs,
and the guest request handling that consumes them. Also move them to
the top of the series to better separate them from the core SNP patches
(Alexey, Zhi, Ashish, Dov, Dionna, others)
* Various other changes/fixups for extended guests request handling (Dov,
Alexey, Dionna)
* Use helper to calculate max RMP entry size and improve readability (Dave)
* Use architecture-independent GPA value for initial VMSA pages
* Ensure SEV_CMD_SNP_GUEST_REQUEST failures are indicated to guest (Alex)
* Allocate per-instance certs on-demand (Alex)
* comment fixup for RMP fault handling (Zhi)
* commit msg rewording for MSR-based PSCs (Zhi)
* update SNP command/struct definitions based on 1.54 ABI (Saban)
* use sev_deactivate_lock around SEV_CMD_SNP_DECOMMISSION (Saban)
* Various comment/commit fixups (Zhi, Alex, Kim, Vlastimil, Dave)
* kexec fixes for newer SNP firmwares (Ashish)
* Various other fixups and re-ordering of patches.

----------------------------------------------------------------
Ashish Kalra (4):
x86/sev: Introduce snp leaked pages list
KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
iommu/amd: Add IOMMU_SNP_SHUTDOWN support
crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump

Brijesh Singh (29):
x86/cpufeatures: Add SEV-SNP CPU feature
x86/sev: Add the host SEV-SNP initialization support
x86/sev: Add RMP entry lookup helpers
x86/fault: Add helper for dumping RMP entries
x86/traps: Define RMP violation #PF error code
x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
x86/sev: Invalidate pages from the direct map when adding them to the RMP table
crypto: ccp: Define the SEV-SNP commands
crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
crypto: ccp: Provide API to issue SEV and SNP commands
crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
crypto: ccp: Handle the legacy SEV command when SNP is enabled
crypto: ccp: Add the SNP_PLATFORM_STATUS command
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
KVM: SEV: Add initial SEV-SNP support
KVM: SEV: Add KVM_SNP_INIT command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
KVM: SEV: Add support to handle Page State Change VMGEXIT
KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
KVM: SEV: Add support to handle RMP nested page faults
KVM: SVM: Add module parameter to enable the SEV-SNP
crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
crypto: ccp: Add debug support for decrypting pages

Dionna Glaze (1):
x86/sev: Add KVM commands for per-instance certs

Kim Phillips (1):
x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled

Michael Roth (9):
KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
x86/fault: Report RMP page faults for kernel addresses
KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
KVM: SEV: Add KVM_EXIT_VMGEXIT
KVM: SEV: Add support for GHCB-based termination requests
KVM: SEV: Implement gmem hook for initializing private pages
KVM: SEV: Implement gmem hook for invalidating private pages
KVM: x86: Add gmem hook for determining max NPT mapping level
iommu/amd: Report all cases inhibiting SNP enablement

Paolo Bonzini (1):
KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

Tom Lendacky (4):
KVM: SVM: Fix TSC_AUX virtualization setup
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: SEV: Use a VMSA physical address variable for populating VMCB
KVM: SEV: Support SEV-SNP AP Creation NAE event

Vishal Annapurve (1):
KVM: Add HVA range operator

Documentation/virt/coco/sev-guest.rst | 54 +
Documentation/virt/kvm/api.rst | 34 +
.../virt/kvm/x86/amd-memory-encryption.rst | 147 ++
arch/x86/Kbuild | 2 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/include/asm/sev-common.h | 33 +
arch/x86/include/asm/sev-host.h | 37 +
arch/x86/include/asm/sev.h | 6 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/include/asm/trap_pf.h | 4 +
arch/x86/kernel/cpu/amd.c | 24 +-
arch/x86/kernel/cpu/common.c | 7 +-
arch/x86/kernel/crash.c | 7 +
arch/x86/kvm/Kconfig | 3 +
arch/x86/kvm/lapic.c | 5 +-
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 13 +-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/sev.c | 1903 +++++++++++++++++---
arch/x86/kvm/svm/svm.c | 64 +-
arch/x86/kvm/svm/svm.h | 41 +-
arch/x86/kvm/x86.c | 11 +
arch/x86/mm/fault.c | 5 +
arch/x86/virt/svm/Makefile | 3 +
arch/x86/virt/svm/sev.c | 548 ++++++
drivers/crypto/ccp/sev-dev.c | 1253 ++++++++++++-
drivers/crypto/ccp/sev-dev.h | 16 +
drivers/iommu/amd/init.c | 65 +-
include/linux/amd-iommu.h | 5 +-
include/linux/kvm_host.h | 6 +
include/linux/psp-sev.h | 304 +++-
include/uapi/linux/kvm.h | 74 +
include/uapi/linux/psp-sev.h | 71 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
virt/kvm/kvm_main.c | 49 +
39 files changed, 4497 insertions(+), 335 deletions(-)
create mode 100644 arch/x86/include/asm/sev-host.h
create mode 100644 arch/x86/virt/svm/Makefile
create mode 100644 arch/x86/virt/svm/sev.c



2023-10-16 13:31:25

by Michael Roth

Subject: [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

From: Brijesh Singh <[email protected]>

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB-page RMP entries. The hypervisor will use this
instruction to adjust the RMP entry without invalidating the previous
RMP entry.

Add the following external interface API functions:

psmash():
Used to smash a 2MB aligned page into 4K pages while preserving the
Validated bit in the RMP.

rmp_make_private():
Used to assign a page to a guest using the RMPUPDATE instruction.

rmp_make_shared():
Used to transition a page to hypervisor/shared state using the
RMPUPDATE instruction.
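
As a rough usage sketch (hypothetical caller; pfn, gpa, and asid are
assumptions, error handling elided):

    /* Assign a 4K page to the guest at the given GPA/ASID. */
    ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K, asid, false);

    /* Later: split a 2MB RMP entry, then return one 4K page to the host. */
    ret = psmash(ALIGN_DOWN(pfn, PTRS_PER_PMD));
    ret = rmp_make_shared(pfn, PG_LEVEL_4K);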

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 14 +++++
arch/x86/include/asm/sev-host.h | 10 ++++
arch/x86/virt/svm/sev.c | 92 +++++++++++++++++++++++++++++++
3 files changed, 116 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1e6fb93d8ab0..93ec8c12c91d 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -173,8 +173,22 @@ struct snp_psc_desc {
#define GHCB_ERR_INVALID_INPUT 5
#define GHCB_ERR_INVALID_EVENT 6

+/* RMPUPDATE detected 4K page and 2MB page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP 4
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMP_PG_SIZE_2M 1
#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+
+struct rmp_state {
+ u64 gpa;
+ u8 assigned;
+ u8 pagesize;
+ u8 immutable;
+ u8 rsvd;
+ u32 asid;
+} __packed;

#endif
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index bb06c57f2909..1df989411334 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -16,9 +16,19 @@
#ifdef CONFIG_KVM_AMD_SEV
int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
void sev_dump_hva_rmpentry(unsigned long address);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
static inline void sev_dump_hva_rmpentry(unsigned long address) {}
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+ bool immutable)
+{
+ return -ENXIO;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENXIO; }
#endif

#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index cac3e311c38f..24a695af13a5 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -367,3 +367,95 @@ void sev_dump_hva_rmpentry(unsigned long hva)
sev_dump_rmpentry(pte_pfn(*pte));
}
EXPORT_SYMBOL_GPL(sev_dump_hva_rmpentry);
+
+/*
+ * PSMASH a 2MB aligned page into 4K pages in the RMP table while preserving the
+ * Validated bit.
+ */
+int psmash(u64 pfn)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret;
+
+ pr_debug("%s: PFN: 0x%llx\n", __func__, pfn);
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ /* Binutils version 2.36 supports the PSMASH mnemonic. */
+ asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(paddr)
+ : "memory", "cc");
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmp_state *val)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret, level, npages;
+ int attempts = 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+ npages = page_level_size(level) / PAGE_SIZE;
+
+ do {
+ /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+ : "=a"(ret)
+ : "a"(paddr), "c"((unsigned long)val)
+ : "memory", "cc");
+
+ attempts++;
+ } while (ret == RMPUPDATE_FAIL_OVERLAP);
+
+ if (ret) {
+ pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
+ attempts, ret, pfn, npages, level);
+ sev_dump_rmpentry(pfn);
+ dump_stack();
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+/*
+ * Assign a page to guest using the RMPUPDATE instruction.
+ */
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+ struct rmp_state val;
+
+ memset(&val, 0, sizeof(val));
+ val.assigned = 1;
+ val.asid = asid;
+ val.immutable = immutable;
+ val.gpa = gpa;
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+/*
+ * Transition a page to hypervisor/shared state using the RMPUPDATE instruction.
+ */
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ struct rmp_state val;
+
+ memset(&val, 0, sizeof(val));
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
--
2.25.1

2023-10-16 13:31:27

by Michael Roth

Subject: [PATCH v10 12/50] x86/sev: Invalidate pages from the direct map when adding them to the RMP table

From: Brijesh Singh <[email protected]>

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce
memory restrictions and page access rights. The RMP check is enforced as
soon as SEV-SNP is enabled globally in the system. When hardware
encounters an RMP-check failure, it raises a page-fault exception.

The rmp_make_private() and rmp_make_shared() helpers are used to add
or remove pages from the RMP table. Improve rmp_make_private() to
invalidate pages in the direct map when they are added to the RMP table,
and restore them to their default valid permissions after the pages are
removed from the RMP table.
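
As a hypothetical illustration of the hazard this avoids (pfn, gpa, and
asid are assumptions):

    rmp_make_private(pfn, gpa, PG_LEVEL_4K, asid, false);

    /* Host write through the direct map to a guest-owned page: */
    memset(__va(pfn << PAGE_SHIFT), 0, PAGE_SIZE);  /* RMP-check #PF */

Invalidating the direct-map entries up front prevents such accesses from
ever reaching a guest-owned page.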

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/virt/svm/sev.c | 59 +++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)

diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 24a695af13a5..bf9b97046e05 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -395,6 +395,42 @@ int psmash(u64 pfn)
}
EXPORT_SYMBOL_GPL(psmash);

+static int restore_direct_map(u64 pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ break;
+ }
+
+ if (ret)
+ pr_warn("Failed to restore direct map for pfn 0x%llx, ret: %d\n",
+ pfn + i, ret);
+
+ return ret;
+}
+
+static int invalidate_direct_map(u64 pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ break;
+ }
+
+ if (ret) {
+ pr_warn("Failed to invalidate direct map for pfn 0x%llx, ret: %d\n",
+ pfn + i, ret);
+ restore_direct_map(pfn, i);
+ }
+
+ return ret;
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -404,9 +440,21 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return -ENXIO;

level = RMP_TO_X86_PG_LEVEL(val->pagesize);
npages = page_level_size(level) / PAGE_SIZE;

+ /*
+ * If the page is getting assigned in the RMP table then unmap it from the
+ * direct map.
+ */
+ if (val->assigned) {
+ if (invalidate_direct_map(pfn, npages)) {
+ pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
do {
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
@@ -425,6 +473,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
return -EFAULT;
}

+ /*
+ * Restore the direct map after the page is removed from the RMP table.
+ */
+ if (!val->assigned) {
+ if (restore_direct_map(pfn, npages)) {
+ pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
return 0;
}

--
2.25.1

2023-10-16 13:31:40

by Michael Roth

Subject: [PATCH v10 13/50] crypto: ccp: Define the SEV-SNP commands

From: Brijesh Singh <[email protected]>

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.
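
As a rough sketch of how these definitions pair up in use (hypothetical
caller; sev_do_cmd() is exported for this purpose later in the series):

    struct sev_data_snp_page_reclaim data = {};
    int rc, psp_ret;

    /* Bit 0 of paddr selects the page size: 0h = 4K, 1h = 2MB. */
    data.paddr = pfn << PAGE_SHIFT;
    rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &psp_ret);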

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: update SNP command list and SNP status struct based on current
spec, use C99 flexible arrays]
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 16 +++
include/linux/psp-sev.h | 246 +++++++++++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 53 ++++++++
3 files changed, 315 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index f97166fba9d9..c2da92f19ccd 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -130,6 +130,8 @@ static int sev_cmd_buffer_len(int cmd)
switch (cmd) {
case SEV_CMD_INIT: return sizeof(struct sev_data_init);
case SEV_CMD_INIT_EX: return sizeof(struct sev_data_init_ex);
+ case SEV_CMD_SNP_SHUTDOWN_EX: return sizeof(struct sev_data_snp_shutdown_ex);
+ case SEV_CMD_SNP_INIT_EX: return sizeof(struct sev_data_snp_init_ex);
case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status);
case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr);
case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import);
@@ -158,6 +160,20 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id);
case SEV_CMD_ATTESTATION_REPORT: return sizeof(struct sev_data_attestation_report);
case SEV_CMD_SEND_CANCEL: return sizeof(struct sev_data_send_cancel);
+ case SEV_CMD_SNP_GCTX_CREATE: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_LAUNCH_START: return sizeof(struct sev_data_snp_launch_start);
+ case SEV_CMD_SNP_LAUNCH_UPDATE: return sizeof(struct sev_data_snp_launch_update);
+ case SEV_CMD_SNP_ACTIVATE: return sizeof(struct sev_data_snp_activate);
+ case SEV_CMD_SNP_DECOMMISSION: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_PAGE_RECLAIM: return sizeof(struct sev_data_snp_page_reclaim);
+ case SEV_CMD_SNP_GUEST_STATUS: return sizeof(struct sev_data_snp_guest_status);
+ case SEV_CMD_SNP_LAUNCH_FINISH: return sizeof(struct sev_data_snp_launch_finish);
+ case SEV_CMD_SNP_DBG_DECRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_DBG_ENCRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_PAGE_UNSMASH: return sizeof(struct sev_data_snp_page_unsmash);
+ case SEV_CMD_SNP_PLATFORM_STATUS: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_GUEST_REQUEST: return sizeof(struct sev_data_snp_guest_request);
+ case SEV_CMD_SNP_CONFIG: return sizeof(struct sev_user_data_snp_config);
default: return 0;
}

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 7fd17e82bab4..a7f92e74564d 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -78,6 +78,36 @@ enum sev_cmd {
SEV_CMD_DBG_DECRYPT = 0x060,
SEV_CMD_DBG_ENCRYPT = 0x061,

+ /* SNP specific commands */
+ SEV_CMD_SNP_INIT = 0x81,
+ SEV_CMD_SNP_SHUTDOWN = 0x82,
+ SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
+ SEV_CMD_SNP_DF_FLUSH = 0x84,
+ SEV_CMD_SNP_INIT_EX = 0x85,
+ SEV_CMD_SNP_SHUTDOWN_EX = 0x86,
+ SEV_CMD_SNP_DECOMMISSION = 0x90,
+ SEV_CMD_SNP_ACTIVATE = 0x91,
+ SEV_CMD_SNP_GUEST_STATUS = 0x92,
+ SEV_CMD_SNP_GCTX_CREATE = 0x93,
+ SEV_CMD_SNP_GUEST_REQUEST = 0x94,
+ SEV_CMD_SNP_ACTIVATE_EX = 0x95,
+ SEV_CMD_SNP_LAUNCH_START = 0xA0,
+ SEV_CMD_SNP_LAUNCH_UPDATE = 0xA1,
+ SEV_CMD_SNP_LAUNCH_FINISH = 0xA2,
+ SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
+ SEV_CMD_SNP_DBG_ENCRYPT = 0xB1,
+ SEV_CMD_SNP_PAGE_SWAP_OUT = 0xC0,
+ SEV_CMD_SNP_PAGE_SWAP_IN = 0xC1,
+ SEV_CMD_SNP_PAGE_MOVE = 0xC2,
+ SEV_CMD_SNP_PAGE_MD_INIT = 0xC3,
+ SEV_CMD_SNP_PAGE_SET_STATE = 0xC6,
+ SEV_CMD_SNP_PAGE_RECLAIM = 0xC7,
+ SEV_CMD_SNP_PAGE_UNSMASH = 0xC8,
+ SEV_CMD_SNP_CONFIG = 0xC9,
+ SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0xCA,
+ SEV_CMD_SNP_COMMIT = 0xCB,
+ SEV_CMD_SNP_VLEK_LOAD = 0xCD,
+
SEV_CMD_MAX,
};

@@ -523,6 +553,222 @@ struct sev_data_attestation_report {
u32 len; /* In/Out */
} __packed;

+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: length of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+ u64 address; /* In */
+ u32 len; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+ u64 gctx_paddr; /* In */
+ u32 asid; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_addr - generic SNP command params
+ *
+ * @address: system physical address guest context page
+ */
+struct sev_data_snp_addr {
+ u64 gctx_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ * guest-assisted migration.
+ * @ma_en: the guest is associated with a migration agent
+ */
+struct sev_data_snp_launch_start {
+ u64 gctx_paddr; /* In */
+ u64 policy; /* In */
+ u64 ma_gctx_paddr; /* In */
+ u32 ma_en:1; /* In */
+ u32 imi_en:1; /* In */
+ u32 rsvd:30;
+ u8 gosvw[16]; /* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+ SNP_PAGE_TYPE_NORMAL = 0x1,
+ SNP_PAGE_TYPE_VMSA = 0x2,
+ SNP_PAGE_TYPE_ZERO = 0x3,
+ SNP_PAGE_TYPE_UNMEASURED = 0x4,
+ SNP_PAGE_TYPE_SECRET = 0x5,
+ SNP_PAGE_TYPE_CPUID = 0x6,
+
+ SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ */
+struct sev_data_snp_launch_update {
+ u64 gctx_paddr; /* In */
+ u32 page_size:1; /* In */
+ u32 page_type:3; /* In */
+ u32 imi_page:1; /* In */
+ u32 rsvd:27;
+ u32 rsvd2;
+ u64 address; /* In */
+ u32 rsvd3:8;
+ u32 vmpl1_perms:8; /* In */
+ u32 vmpl2_perms:8; /* In */
+ u32 vmpl3_perms:8; /* In */
+ u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_launch_finish {
+ u64 gctx_paddr;
+ u64 id_block_paddr;
+ u64 id_auth_paddr;
+ u8 id_block_en:1;
+ u8 auth_key_en:1;
+ u64 rsvd:62;
+ u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+ u64 gctx_paddr;
+ u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be reclaimed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ */
+struct sev_data_snp_dbg {
+ u64 gctx_paddr; /* In */
+ u64 src_addr; /* In */
+ u64 dst_addr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+ u64 gctx_paddr; /* In */
+ u64 req_paddr; /* In */
+ u64 res_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX structure
+ *
+ * @init_rmp: indicate that the RMP should be initialized.
+ * @list_paddr_en: indicate that list_paddr is valid
+ * @list_paddr: system physical address of range list
+ */
+struct sev_data_snp_init_ex {
+ u32 init_rmp:1;
+ u32 list_paddr_en:1;
+ u32 rsvd:30;
+ u32 rsvd1;
+ u64 list_paddr;
+ u8 rsvd2[48];
+} __packed;
+
+/**
+ * struct sev_data_range - RANGE structure
+ *
+ * @base: system physical address of first byte of range
+ * @page_count: number of 4KB pages in this range
+ */
+struct sev_data_range {
+ u64 base;
+ u32 page_count;
+ u32 rsvd;
+} __packed;
+
+/**
+ * struct sev_data_range_list - RANGE_LIST structure
+ *
+ * @num_elements: number of elements in RANGE_ARRAY
+ * @ranges: array of num_elements of type RANGE
+ */
+struct sev_data_range_list {
+ u32 num_elements;
+ u32 rsvd;
+ struct sev_data_range ranges[];
+} __packed;
+
+/**
+ * struct sev_data_snp_shutdown_ex - SNP_SHUTDOWN_EX structure
+ *
+ * @length: length of the command buffer read by the PSP
+ * @iommu_snp_shutdown: Disable enforcement of SNP in the IOMMU
+ */
+struct sev_data_snp_shutdown_ex {
+ u32 length;
+ u32 iommu_snp_shutdown:1;
+ u32 rsvd1:31;
+} __packed;
+
#ifdef CONFIG_CRYPTO_DEV_SP_PSP

/**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 1c9da485318f..48e3ef91559c 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -68,6 +68,13 @@ typedef enum {
SEV_RET_INVALID_PARAM,
SEV_RET_RESOURCE_LIMIT,
SEV_RET_SECURE_DATA_INVALID,
+ SEV_RET_INVALID_PAGE_SIZE,
+ SEV_RET_INVALID_PAGE_STATE,
+ SEV_RET_INVALID_MDATA_ENTRY,
+ SEV_RET_INVALID_PAGE_OWNER,
+ SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+ SEV_RET_RMP_INIT_REQUIRED,
+
SEV_RET_MAX,
} sev_ret_code;

@@ -154,6 +161,52 @@ struct sev_user_data_get_id2 {
__u32 length; /* In/Out */
} __packed;

+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @is_rmp_initialized: whether RMP is initialized or not
+ * @build_id: firmware build id for the API version
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @vlek_en: VLEK (Versioned Loaded Endorsement Key) hashstick is loaded
+ * @guest_count: the number of guests currently managed by the firmware
+ * @current_tcb_version: current TCB version
+ * @reported_tcb_version: reported TCB version
+ */
+struct sev_user_data_snp_status {
+ __u8 api_major; /* Out */
+ __u8 api_minor; /* Out */
+ __u8 state; /* Out */
+ __u8 is_rmp_initialized:1; /* Out */
+ __u8 rsvd:7;
+ __u32 build_id; /* Out */
+ __u32 mask_chip_id:1; /* Out */
+ __u32 mask_chip_key:1; /* Out */
+ __u32 vlek_en:1; /* Out */
+ __u32 rsvd1:29;
+ __u32 guest_count; /* Out */
+ __u64 current_tcb_version; /* Out */
+ __u64 reported_tcb_version; /* Out */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ */
+struct sev_user_data_snp_config {
+ __u64 reported_tcb; /* In */
+ __u32 mask_chip_id:1; /* In */
+ __u32 mask_chip_key:1; /* In */
+ __u32 rsvd:30; /* In */
+ __u8 rsvd1[52];
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1

2023-10-16 13:32:16

by Michael Roth

Subject: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

From: Brijesh Singh <[email protected]>

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue the DF_FLUSH
command to prepare for the first SNP guest launch after INIT.

During the execution of the SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operating system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will fail when attempting to write to pages in other page
states after SNP_INIT enables their SNP enforcement.

Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
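
The resulting firmware call, as implemented below, boils down to the
following (a condensed sketch of the code added by this patch):

    struct sev_data_snp_init_ex data = {
            .init_rmp      = 1,
            .list_paddr_en = 1,
            .list_paddr    = __psp_pa(snp_range_list),
    };

    wbinvd_on_all_cpus();   /* flush dirty RMP cache lines first */
    rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);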

Note that SNP_INIT(_EX) must not be executed while non-SEV guests are
executing, otherwise it is possible that the system could reset or hang.
The psp_init_on_probe module parameter was added for SEV/SEV-ES support,
and the init_ex_path module parameter to allow time for the
necessary file system to be mounted/available. SNP_INIT(_EX) does not
use the file associated with init_ex_path. So, to avoid running into
issues where SNP_INIT(_EX) is called while other guests are running,
issue it during module probe regardless of the psp_init_on_probe
setting, but maintain the previous deferrable handling for SEV/SEV-ES
initialization.

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Co-developed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
[mdr: squash in psp_init_on_probe changes from Tom]
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 272 +++++++++++++++++++++++++++++++++--
drivers/crypto/ccp/sev-dev.h | 2 +
2 files changed, 259 insertions(+), 15 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index c2da92f19ccd..fae1fd45eccd 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -29,6 +29,7 @@

#include <asm/smp.h>
#include <asm/cacheflush.h>
+#include <asm/e820/types.h>

#include "psp-dev.h"
#include "sev-dev.h"
@@ -37,6 +38,10 @@
#define SEV_FW_FILE "amd/sev.fw"
#define SEV_FW_NAME_SIZE 64

+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR 1
+#define SNP_MIN_API_MINOR 51
+
static DEFINE_MUTEX(sev_cmd_mutex);
static struct sev_misc_dev *misc_dev;

@@ -80,6 +85,14 @@ static void *sev_es_tmr;
#define NV_LENGTH (32 * 1024)
static void *sev_init_ex_buffer;

+/*
+ * SEV_DATA_RANGE_LIST:
+ * Array of page ranges that the firmware will transition to the
+ * HV-fixed page state.
+ */
static struct sev_data_range_list *snp_range_list;
+static int __sev_snp_init_locked(int *error);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -466,9 +479,9 @@ static inline int __sev_do_init_locked(int *psp_ret)
return __sev_init_locked(psp_ret);
}

-static int __sev_platform_init_locked(int *error)
+static int ___sev_platform_init_locked(int *error, bool probe)
{
- int rc = 0, psp_ret = SEV_RET_NO_FW_CALL;
+ int rc, psp_ret = SEV_RET_NO_FW_CALL;
struct psp_device *psp = psp_master;
struct sev_device *sev;

@@ -480,6 +493,34 @@ static int __sev_platform_init_locked(int *error)
if (sev->state == SEV_STATE_INIT)
return 0;

+ /*
+ * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
+ * so perform SEV-SNP initialization at probe time.
+ */
+ rc = __sev_snp_init_locked(error);
+ if (rc && rc != -ENODEV) {
+ /*
+ * Don't abort the probe if SNP INIT failed,
+ * continue to initialize the legacy SEV firmware.
+ */
+ dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
+ }
+
+ /* Delay SEV/SEV-ES support initialization */
+ if (probe && !psp_init_on_probe)
+ return 0;
+
+ if (!sev_es_tmr) {
+ /* Obtain the TMR memory area for SEV-ES use */
+ sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+ if (sev_es_tmr)
+ /* Must flush the cache before giving it to the firmware */
+ clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
+ else
+ dev_warn(sev->dev,
+ "SEV: TMR allocation failed, SEV-ES support unavailable\n");
+ }
+
if (sev_init_ex_buffer) {
rc = sev_read_init_ex_file();
if (rc)
@@ -522,6 +563,11 @@ static int __sev_platform_init_locked(int *error)
return 0;
}

+static int __sev_platform_init_locked(int *error)
+{
+ return ___sev_platform_init_locked(error, false);
+}
+
int sev_platform_init(int *error)
{
int rc;
@@ -534,6 +580,17 @@ int sev_platform_init(int *error)
}
EXPORT_SYMBOL_GPL(sev_platform_init);

+static int sev_platform_init_on_probe(int *error)
+{
+ int rc;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = ___sev_platform_init_locked(error, true);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+
static int __sev_platform_shutdown_locked(int *error)
{
struct sev_device *sev = psp_master->sev_data;
@@ -838,6 +895,191 @@ static int sev_update_firmware(struct device *dev)
return ret;
}

+static void snp_set_hsave_pa(void *arg)
+{
+ wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
+{
+ struct sev_data_range_list *range_list = arg;
+ struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
+ size_t size;
+
+ if (((range_list->num_elements + 1) * sizeof(struct sev_data_range) +
+ sizeof(struct sev_data_range_list)) > PAGE_SIZE)
+ return -E2BIG;
+
+ switch (rs->desc) {
+ case E820_TYPE_RESERVED:
+ case E820_TYPE_PMEM:
+ case E820_TYPE_ACPI:
+ range->base = rs->start & PAGE_MASK;
+ size = (rs->end + 1) - rs->start;
+ range->page_count = size >> PAGE_SHIFT;
+ range_list->num_elements++;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+ struct psp_device *psp = psp_master;
+ struct sev_data_snp_init_ex data;
+ struct sev_device *sev;
+ int rc = 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENODEV;
+
+ if (!psp || !psp->sev_data)
+ return -ENODEV;
+
+ sev = psp->sev_data;
+
+ if (sev->snp_initialized)
+ return 0;
+
+ if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+ dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+ SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+ return 0;
+ }
+
+ /*
+ * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0h
+ * across all cores.
+ */
+ on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+ /*
+ * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
+ * system physical address ranges to convert into the HV-fixed page states
+ * during the RMP initialization. For instance, the memory that UEFI
+ * reserves should be included in the range list. This allows system
+ * components that occasionally write to memory (e.g. logging to UEFI
+ * reserved regions) to not fail due to RMP initialization and SNP enablement.
+ */
+ if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
+ /*
+ * Firmware checks that the pages containing the ranges enumerated
+ * in the RANGES structure are either in the Default page state or in the
+ * firmware page state.
+ */
+ snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!snp_range_list) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX range list memory allocation failed\n");
+ return -ENOMEM;
+ }
+
+ /*
+ * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
+ * to be setup as HV-fixed pages.
+ */
+
+ rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
+ snp_range_list, snp_filter_reserved_mem_regions);
+ if (rc) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
+ return rc;
+ }
+
+ memset(&data, 0, sizeof(data));
+ data.init_rmp = 1;
+ data.list_paddr_en = 1;
+ data.list_paddr = __psp_pa(snp_range_list);
+
+ /*
+ * Before invoking SNP_INIT_EX with INIT_RMP=1, make sure that
+ * all dirty cache lines containing the RMP are flushed.
+ *
+ * NOTE: that includes writes via RMPUPDATE instructions, which
+ * are also cacheable writes.
+ */
+ wbinvd_on_all_cpus();
+
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
+ if (rc)
+ return rc;
+ } else {
+ /*
+ * SNP_INIT is equivalent to SNP_INIT_EX with INIT_RMP=1, so
+ * just as with that case, make sure all dirty cache lines
+ * containing the RMP are flushed.
+ */
+ wbinvd_on_all_cpus();
+
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+ if (rc)
+ return rc;
+ }
+
+ /* Prepare for first SNP guest launch after INIT */
+ wbinvd_on_all_cpus();
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+ if (rc)
+ return rc;
+
+ sev->snp_initialized = true;
+ dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+ return rc;
+}
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_shutdown_ex data;
+ int ret;
+
+ if (!sev->snp_initialized)
+ return 0;
+
+ memset(&data, 0, sizeof(data));
+ data.length = sizeof(data);
+ data.iommu_snp_shutdown = 1;
+
+ wbinvd_on_all_cpus();
+
+retry:
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
+ /* SHUTDOWN may require DF_FLUSH */
+ if (*error == SEV_RET_DFFLUSH_REQUIRED) {
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
+ return ret;
+ }
+ goto retry;
+ }
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+ return ret;
+ }
+
+ sev->snp_initialized = false;
+ dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+ return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+ int rc;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = __sev_snp_shutdown_locked(error);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+
static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
{
struct sev_device *sev = psp_master->sev_data;
@@ -1285,6 +1527,8 @@ int sev_dev_init(struct psp_device *psp)

static void sev_firmware_shutdown(struct sev_device *sev)
{
+ int error;
+
sev_platform_shutdown(NULL);

if (sev_es_tmr) {
@@ -1301,6 +1545,13 @@ static void sev_firmware_shutdown(struct sev_device *sev)
get_order(NV_LENGTH));
sev_init_ex_buffer = NULL;
}
+
+ if (snp_range_list) {
+ kfree(snp_range_list);
+ snp_range_list = NULL;
+ }
+
+ sev_snp_shutdown(&error);
}

void sev_dev_destroy(struct psp_device *psp)
@@ -1356,24 +1607,15 @@ void sev_pci_init(void)
}
}

- /* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
- if (sev_es_tmr)
- /* Must flush the cache before giving it to the firmware */
- clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
- else
- dev_warn(sev->dev,
- "SEV: TMR allocation failed, SEV-ES support unavailable\n");
-
- if (!psp_init_on_probe)
- return;
-
/* Initialize the platform */
- rc = sev_platform_init(&error);
+ rc = sev_platform_init_on_probe(&error);
if (rc)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);

+ dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
+ "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+
return;

err:
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 778c95155e74..85506325051a 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
u8 build;

void *cmd_buf;
+
+ bool snp_initialized;
};

int sev_dev_init(struct psp_device *psp);
--
2.25.1

2023-10-16 13:32:35

by Michael Roth

Subject: [PATCH v10 15/50] crypto: ccp: Provide API to issue SEV and SNP commands

From: Brijesh Singh <[email protected]>

Make sev_do_cmd() a generic API interface for the hypervisor
to issue commands to manage SEV and SNP guests. The commands
for SEV and SNP are defined in the SEV and SEV-SNP firmware
specifications.
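
A rough usage sketch (hypothetical caller):

    int rc, psp_ret;

    /* e.g. flush the data fabric before the first SNP guest launch */
    rc = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &psp_ret);
    if (rc)
            pr_err("SNP_DF_FLUSH failed, rc %d, fw error %#x\n", rc, psp_ret);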

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 3 ++-
include/linux/psp-sev.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fae1fd45eccd..613b25f81498 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -418,7 +418,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
return ret;
}

-static int sev_do_cmd(int cmd, void *data, int *psp_ret)
+int sev_do_cmd(int cmd, void *data, int *psp_ret)
{
int rc;

@@ -428,6 +428,7 @@ static int sev_do_cmd(int cmd, void *data, int *psp_ret)

return rc;
}
+EXPORT_SYMBOL_GPL(sev_do_cmd);

static int __sev_init_locked(int *error)
{
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index a7f92e74564d..61bb5849ebf2 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -883,6 +883,20 @@ int sev_guest_df_flush(int *error);
*/
int sev_guest_decommission(struct sev_data_decommission *data, int *error);

+/**
+ * sev_do_cmd - issue an SEV or SEV-SNP firmware command
+ * @cmd: firmware command to issue
+ * @data: command buffer; layout depends on @cmd
+ * @psp_ret: firmware return code
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%ENOTSUPP if the SEV device does not support the command
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the SEV returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
void *psp_copy_user_blob(u64 uaddr, u32 len);

#else /* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -898,6 +912,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
static inline int
sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }

+static inline int
+sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
+
static inline int
sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }

--
2.25.1

2023-10-16 13:32:57

by Michael Roth

Subject: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list

From: Ashish Kalra <[email protected]>

Pages that have been transitioned to firmware/guest state, and that can't
be reclaimed or transitioned back to hypervisor/shared state, are unsafe
to release back to the page allocator. In this case, add them to an
internal leaked-pages list to ensure that they are not freed or
touched/accessed, which could otherwise cause fatal page faults.
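
A rough sketch of the intended calling pattern (hypothetical caller):

    /* Transition back to shared failed: never hand the page back to
     * the page allocator, leak it instead.
     */
    if (rmp_make_shared(pfn, PG_LEVEL_4K))
            snp_leak_pages(pfn, 1);
    else
            __free_pages(pfn_to_page(pfn), 0);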

Signed-off-by: Ashish Kalra <[email protected]>
[mdr: relocate to arch/x86/coco/sev/host.c]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-host.h | 3 +++
arch/x86/virt/svm/sev.c | 28 ++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 1df989411334..7490a665e78f 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -19,6 +19,8 @@ void sev_dump_hva_rmpentry(unsigned long address);
int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(u64 pfn, unsigned int npages);
+
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
static inline void sev_dump_hva_rmpentry(unsigned long address) {}
@@ -29,6 +31,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
return -ENXIO;
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENXIO; }
+static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
#endif

#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index bf9b97046e05..29a69f4b8cfb 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -59,6 +59,12 @@ struct rmpentry {
static struct rmpentry *rmptable_start __ro_after_init;
static u64 rmptable_max_pfn __ro_after_init;

+/* list of pages which are leaked and cannot be reclaimed */
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
+
#undef pr_fmt
#define pr_fmt(fmt) "SEV-SNP: " fmt

@@ -518,3 +524,25 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
return rmpupdate(pfn, &val);
}
EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(u64 pfn, unsigned int npages)
+{
+ pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
+
+ spin_lock(&snp_leaked_pages_list_lock);
+ while (npages--) {
+ /*
+ * Reuse the page's buddy list for chaining into the leaked
+ * pages list. This page should not be on a free list currently,
+ * and is also unsafe to be added to a free list.
+ * pfn_to_page() is re-evaluated each iteration so the list
+ * entry advances along with the PFN being leaked.
+ */
+ list_add_tail(&pfn_to_page(pfn)->buddy_list, &snp_leaked_pages_list);
+ sev_dump_rmpentry(pfn);
+ atomic_long_inc(&snp_nr_leaked_pages);
+ pfn++;
+ }
+ spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
--
2.25.1

2023-10-16 13:33:37

by Michael Roth

Subject: [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior and requirements for SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware
specification for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB-sized/aligned region
when SNP is enabled to satisfy the new SNP requirements. Continue
allocating a 1MB region for !SNP configurations.

While at it, provide an API that can be used by others to allocate a page
that can be used by the firmware. The immediate user of this API will
be the KVM driver, which needs to allocate a firmware context page
during guest creation. The context page needs to be updated by the
firmware. See the SEV-SNP specification for further details.
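
A rough sketch of the intended KVM-side usage (hypothetical caller):

    /* Allocate a firmware-owned page, e.g. for an SNP guest context. */
    void *gctx = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);

    if (!gctx)
            return -ENOMEM;
    /* ... firmware reads/updates the page via SNP commands ... */
    snp_free_firmware_page(gctx);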

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
directly to SEV_CMD_SNP_PAGE_RECLAIM]
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 151 ++++++++++++++++++++++++++++++++---
include/linux/psp-sev.h | 9 +++
2 files changed, 151 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 613b25f81498..ea21307a2b34 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -30,6 +30,7 @@
#include <asm/smp.h>
#include <asm/cacheflush.h>
#include <asm/e820/types.h>
+#include <asm/sev-host.h>

#include "psp-dev.h"
#include "sev-dev.h"
@@ -93,6 +94,13 @@ static void *sev_init_ex_buffer;
struct sev_data_range_list *snp_range_list;
static int __sev_snp_init_locked(int *error);

+/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB in size. */
+#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -193,11 +201,131 @@ static int sev_cmd_buffer_len(int cmd)
return 0;
}

+static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int ret, err, i, n = 0;
+
+ for (i = 0; i < npages; i++, pfn++, n++) {
+ struct sev_data_snp_page_reclaim data = {0};
+
+ data.paddr = pfn << PAGE_SHIFT;
+
+ if (locked)
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ else
+ ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+
+ if (ret)
+ goto cleanup;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * If the firmware fails to reclaim the page then it is no longer
+ * safe to release it back to the system; leak it instead.
+ */
+ snp_leak_pages(pfn, npages - n);
+ return ret;
+}
+
+static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* The C-bit may be set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int rc, n = 0, i;
+
+ for (i = 0; i < npages; i++, n++, pfn++) {
+ rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
+ if (rc)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * Try to unwind the firmware state changes by
+ * reclaiming the pages that were already transitioned to
+ * firmware state.
+ */
+ snp_reclaim_pages(paddr, n, locked);
+
+ return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+ unsigned long npages = 1ul << order, paddr;
+ struct sev_device *sev;
+ struct page *page;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ page = alloc_pages(gfp_mask, order);
+ if (!page)
+ return NULL;
+
+ /* If SEV-SNP is initialized then add the page to the RMP table. */
+ sev = psp_master->sev_data;
+ if (!sev->snp_initialized)
+ return page;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (rmp_mark_pages_firmware(paddr, npages, locked))
+ return NULL;
+
+ return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+ struct page *page;
+
+ page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+ return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ unsigned long paddr, npages = 1ul << order;
+
+ if (!page)
+ return;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (sev->snp_initialized &&
+ snp_reclaim_pages(paddr, npages, locked))
+ return;
+
+ __free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+ if (!addr)
+ return;
+
+ __snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL_GPL(snp_free_firmware_page);
+
static void *sev_fw_alloc(unsigned long len)
{
struct page *page;

- page = alloc_pages(GFP_KERNEL, get_order(len));
+ page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
if (!page)
return NULL;

@@ -443,7 +571,7 @@ static int __sev_init_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -466,7 +594,7 @@ static int __sev_init_ex_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -513,14 +641,16 @@ static int ___sev_platform_init_locked(int *error, bool probe)

if (!sev_es_tmr) {
/* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
- if (sev_es_tmr)
+ sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
+ if (sev_es_tmr) {
/* Must flush the cache before giving it to the firmware */
- clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
- else
+ if (!sev->snp_initialized)
+ clflush_cache_range(sev_es_tmr, sev_es_tmr_size);
+ } else {
dev_warn(sev->dev,
"SEV: TMR allocation failed, SEV-ES support unavailable\n");
}
+ }

if (sev_init_ex_buffer) {
rc = sev_read_init_ex_file();
@@ -1030,6 +1160,8 @@ static int __sev_snp_init_locked(int *error)
sev->snp_initialized = true;
dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");

+ sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
return rc;
}

@@ -1536,8 +1668,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
/* The TMR area was encrypted, flush it from the cache */
wbinvd_on_all_cpus();

- free_pages((unsigned long)sev_es_tmr,
- get_order(SEV_ES_TMR_SIZE));
+ __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+ get_order(sev_es_tmr_size),
+ false);
sev_es_tmr = NULL;
}

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 61bb5849ebf2..9342cee1a1e6 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -898,6 +898,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
int sev_do_cmd(int cmd, void *data, int *psp_ret);

void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);

#else /* !CONFIG_CRYPTO_DEV_SP_PSP */

@@ -925,6 +927,13 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int

static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }

+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+ return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1

2023-10-16 13:34:02

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 19/50] crypto: ccp: Add the SNP_PLATFORM_STATUS command

From: Brijesh Singh <[email protected]>

The command can be used by userspace to query the SNP platform status
report. See the SEV-SNP spec for more details.
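
As a rough sketch of how userspace might drive this (not part of this series;
it assumes the uapi headers added by this patchset, and reads the status into
a raw buffer rather than spelling out the sev_user_data_snp_status layout):

    /* Hypothetical userspace usage of SNP_PLATFORM_STATUS via /dev/sev. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/psp-sev.h>

    int main(void)
    {
            uint8_t status[256];    /* raw buffer for the status structure */
            struct sev_issue_cmd cmd;
            int fd, ret;

            fd = open("/dev/sev", O_RDWR);
            if (fd < 0) {
                    perror("open /dev/sev");
                    return 1;
            }

            memset(&cmd, 0, sizeof(cmd));
            cmd.cmd = SNP_PLATFORM_STATUS;
            cmd.data = (uint64_t)(unsigned long)status;

            ret = ioctl(fd, SEV_ISSUE_CMD, &cmd);
            if (ret)
                    fprintf(stderr, "SNP_PLATFORM_STATUS: ret=%d fw_error=%#x\n",
                            ret, cmd.error);

            close(fd);
            return ret ? 1 : 0;
    }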

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 45 +++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 1 +
3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 68b0d2363af8..e828c5326936 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -67,6 +67,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
};
};

+The host ioctls should be issued against the /dev/sev device. Each ioctl
+accepts a command ID and a command input structure.
+
+::
+
+ struct sev_issue_cmd {
+ /* Command ID */
+ __u32 cmd;
+
+ /* Command request structure */
+ __u64 data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u32 error;
+ };
+
+
2.1 SNP_GET_REPORT
------------------

@@ -124,6 +140,17 @@ be updated with the expected value.

See GHCB specification for further detail on how to parse the certificate blob.

+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_platform_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes the API major and minor versions, among other fields. See the
+SEV-SNP specification for further details.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index b574b0ef2b1f..679b8d6fc09a 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1772,6 +1772,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
return ret;
}

+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_addr buf;
+ struct page *status_page;
+ void *data;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!status_page)
+ return -ENOMEM;
+
+ data = page_address(status_page);
+ if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
+ __free_pages(status_page, 0);
+ return -EFAULT;
+ }
+
+ buf.gctx_paddr = __psp_pa(data);
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+ /* Change the page state before accessing it */
+ if (snp_reclaim_pages(__pa(data), 1, true)) {
+ snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
+ return -EFAULT;
+ }
+
+ if (ret)
+ goto cleanup;
+
+ if (copy_to_user((void __user *)argp->data, data,
+ sizeof(struct sev_user_data_snp_status)))
+ ret = -EFAULT;
+
+cleanup:
+ __free_pages(status_page, 0);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1823,6 +1865,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_GET_ID2:
ret = sev_ioctl_do_get_id2(&input);
break;
+ case SNP_PLATFORM_STATUS:
+ ret = sev_ioctl_snp_platform_status(&input);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 48e3ef91559c..b94b3687edbb 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
SEV_PEK_CERT_IMPORT,
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
+ SNP_PLATFORM_STATUS,

SEV_MAX,
};
--
2.25.1

2023-10-16 13:34:05

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, any memory that an
SEV-legacy command causes the firmware to write to must be placed in the
firmware state in the RMP table before issuing the command.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, change the page state to firmware-owned in the RMP
table before issuing the command, and restore the state to shared after
the command completes.

For case #2, use a bounce buffer to complete the request.
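
A condensed model of that dispatch, with stand-in helpers for the RMP update
and the pre-allocated bounce buffer (illustrative only, not the sev-dev.c
code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for rmp_mark_pages_firmware() in sev-dev.c. */
    static int to_firmware_state(uint64_t paddr)
    {
            printf("RMP: %#llx -> firmware-owned\n", (unsigned long long)paddr);
            return 0;
    }

    static uint64_t bounce_buf = 0x100000;  /* pretend pre-allocated buffer */

    static int prep_firmware_writeable(uint64_t *paddr, bool guest_owned)
    {
            /* Case #1: guest memory -- flip its RMP state for the command. */
            if (guest_owned)
                    return to_firmware_state(*paddr);

            /* Case #2: host memory -- point the firmware at a bounce buffer. */
            *paddr = bounce_buf;
            return to_firmware_state(bounce_buf);
    }

    int main(void)
    {
            uint64_t guest_pa = 0x200000, host_pa = 0x300000;

            prep_firmware_writeable(&guest_pa, true);
            prep_firmware_writeable(&host_pa, false);
            return 0;
    }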

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 346 ++++++++++++++++++++++++++++++++++-
drivers/crypto/ccp/sev-dev.h | 12 ++
2 files changed, 348 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index ea21307a2b34..b574b0ef2b1f 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -462,12 +462,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
return sev_write_init_ex_file();
}

+static int alloc_snp_host_map(struct sev_device *sev)
+{
+ struct page *page;
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ memset(map, 0, sizeof(*map));
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+ if (!page)
+ return -ENOMEM;
+
+ map->host = page_address(page);
+ }
+
+ return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ if (map->host) {
+ __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+ memset(map, 0, sizeof(*map));
+ }
+ }
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ map->active = false;
+
+ if (!paddr || !len)
+ return 0;
+
+ map->paddr = *paddr;
+ map->len = len;
+
+ /* If paddr points to guest memory, change the page state to firmware. */
+ if (guest) {
+ if (rmp_mark_pages_firmware(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ if (!map->host)
+ return -ENOMEM;
+
+ /* Check if the pre-allocated buffer can be used to fulfill the request. */
+ if (len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /* Transition the pre-allocated buffer to the firmware state. */
+ if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Set the paddr to use pre-allocated firmware buffer */
+ *paddr = __psp_pa(map->host);
+
+done:
+ map->active = true;
+ return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ if (!map->active)
+ return 0;
+
+ /* If paddr points to guest memory, restore the page state to hypervisor-owned. */
+ if (guest) {
+ if (snp_reclaim_pages(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ /*
+ * Transition the pre-allocated buffer to the hypervisor state before accessing it.
+ *
+ * This is because while changing the page state to firmware, the kernel unmaps
+ * the pages from the direct map, and to restore the direct map the pages must
+ * be transitioned back to the shared state.
+ */
+ if (snp_reclaim_pages(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Copy the response data from the firmware buffer to the caller's buffer. */
+ memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+ *paddr = map->paddr;
+
+done:
+ map->active = false;
+ return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+ switch (cmd) {
+ case SEV_CMD_PLATFORM_STATUS:
+ case SEV_CMD_GUEST_STATUS:
+ case SEV_CMD_LAUNCH_START:
+ case SEV_CMD_RECEIVE_START:
+ case SEV_CMD_LAUNCH_MEASURE:
+ case SEV_CMD_SEND_START:
+ case SEV_CMD_SEND_UPDATE_DATA:
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ case SEV_CMD_PEK_CSR:
+ case SEV_CMD_PDH_CERT_EXPORT:
+ case SEV_CMD_GET_ID:
+ case SEV_CMD_ATTESTATION_REPORT:
+ return true;
+ default:
+ return false;
+ }
+}
+
+#define prep_buffer(name, addr, len, guest, map) \
+ func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+ int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+ struct sev_device *sev = psp_master->sev_data;
+ bool from_fw = !to_fw;
+
+ /*
+ * After the command is completed, change the command buffer memory to
+ * hypervisor state.
+ *
+ * The immutable bit is automatically cleared by the firmware, so
+ * there is no need to reclaim the page.
+ */
+ if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+
+ /* No need to go further if firmware failed to execute command. */
+ if (fw_err)
+ return 0;
+ }
+
+ if (to_fw)
+ func = map_firmware_writeable;
+ else
+ func = unmap_firmware_writeable;
+
+ /*
+ * A command buffer may contain a system physical address. If the address
+ * points to host memory, use an intermediate firmware page; otherwise,
+ * change the page state in the RMP table.
+ */
+ switch (cmd) {
+ case SEV_CMD_PDH_CERT_EXPORT:
+ if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+ pdh_cert_len, false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+ cert_chain_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_GET_ID:
+ if (prep_buffer(struct sev_data_get_id, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_PEK_CSR:
+ if (prep_buffer(struct sev_data_pek_csr, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_launch_update_data, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_MEASURE:
+ if (prep_buffer(struct sev_data_launch_measure, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_SECRET:
+ if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_DECRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_ENCRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_ATTESTATION_REPORT:
+ if (prep_buffer(struct sev_data_attestation_report, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_START:
+ if (prep_buffer(struct sev_data_send_start, session_address,
+ session_len, false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_data, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ default:
+ break;
+ }
+
+ /* The command buffer needs to be in the firmware state. */
+ if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+ }
+
+ return 0;
+
+err:
+ return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+ struct sev_device *sev = psp_master->sev_data;
+
+ /* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
+ return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+ return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+ return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
{
struct psp_device *psp = psp_master;
struct sev_device *sev;
unsigned int phys_lsb, phys_msb;
unsigned int reg, ret = 0;
+ void *cmd_buf;
int buf_len;

if (!psp || !psp->sev_data)
@@ -487,12 +770,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
* work for some memory, e.g. vmalloc'd addresses, and @data may not be
* physically contiguous.
*/
- if (data)
- memcpy(sev->cmd_buf, data, buf_len);
+ if (data) {
+ if (sev->cmd_buf_active > 2)
+ return -EBUSY;
+
+ cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+ memcpy(cmd_buf, data, buf_len);
+ sev->cmd_buf_active++;
+
+ /*
+ * The behavior of the SEV-legacy commands is altered when the
+ * SNP firmware is in the INIT state.
+ */
+ if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
+ return -EFAULT;
+ } else {
+ cmd_buf = sev->cmd_buf;
+ }

/* Get the physical address of the command buffer */
- phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
- phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+ phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+ phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;

dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
cmd, phys_msb, phys_lsb, psp_timeout);
@@ -533,15 +832,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
ret = sev_write_init_ex_file_if_required(cmd);
}

- print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
- buf_len, false);
-
/*
* Copy potential output from the PSP back to data. Do this even on
* failure in case the caller wants to glean something from the error.
*/
- if (data)
- memcpy(data, sev->cmd_buf, buf_len);
+ if (data) {
+ /*
+ * Restore the page state after the command completes.
+ */
+ if (need_firmware_copy(cmd) &&
+ snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+ return -EFAULT;
+
+ memcpy(data, cmd_buf, buf_len);
+ sev->cmd_buf_active--;
+ }
+
+ print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+ buf_len, false);

return ret;
}
@@ -639,6 +947,14 @@ static int ___sev_platform_init_locked(int *error, bool probe)
if (probe && !psp_init_on_probe)
return 0;

+ /*
+ * Allocate the intermediate buffers used for the legacy command handling.
+ */
+ if (rc != -ENODEV && alloc_snp_host_map(sev)) {
+ dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+ goto skip_legacy;
+ }
+
if (!sev_es_tmr) {
/* Obtain the TMR memory area for SEV-ES use */
sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
@@ -691,6 +1007,7 @@ static int ___sev_platform_init_locked(int *error, bool probe)
dev_info(sev->dev, "SEV API:%d.%d build:%d\n", sev->api_major,
sev->api_minor, sev->build);

+skip_legacy:
return 0;
}

@@ -1616,10 +1933,12 @@ int sev_dev_init(struct psp_device *psp)
if (!sev)
goto e_err;

- sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+ sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
if (!sev->cmd_buf)
goto e_sev;

+ sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
psp->sev_data = sev;

sev->dev = dev;
@@ -1685,6 +2004,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
snp_range_list = NULL;
}

+ /*
+ * The host map pages need the immutable bit cleared, so the map must be freed
+ * before the SNP firmware shutdown.
+ */
+ free_snp_host_map(sev);
+
sev_snp_shutdown(&error);
}

@@ -1753,6 +2078,7 @@ void sev_pci_init(void)
return;

err:
+ free_snp_host_map(sev);
psp_master->sev_data = NULL;
}

diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 85506325051a..2c2fe42189a5 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
#define SEV_CMD_COMPLETE BIT(1)
#define SEV_CMDRESP_IOC BIT(0)

+#define MAX_SNP_HOST_MAP_BUFS 2
+
struct sev_misc_dev {
struct kref refcount;
struct miscdevice misc;
};

+struct snp_host_map {
+ u64 paddr;
+ u32 len;
+ void *host;
+ bool active;
+};
+
struct sev_device {
struct device *dev;
struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
u8 build;

void *cmd_buf;
+ void *cmd_buf_backup;
+ int cmd_buf_active;

bool snp_initialized;
+ struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
};

int sev_dev_init(struct psp_device *psp);
--
2.25.1

2023-10-16 13:34:26

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y

SEV-SNP relies on the restricted/protected memory support to run guests,
so make sure to enable that support with the
CONFIG_KVM_SW_PROTECTED_VM build option.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8452ed0228cb..71dc506aa3fb 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,6 +126,7 @@ config KVM_AMD_SEV
bool "AMD Secure Encrypted Virtualization (SEV) support"
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+ select KVM_SW_PROTECTED_VM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1

2023-10-16 13:34:43

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

From: Paolo Bonzini <[email protected]>

svm_recalc_instruction_intercepts() is always called at least once
before the vCPU is started, so the setting or clearing of the RDTSCP
intercept can be dropped from the TSC_AUX virtualization support.

Extracted from a patch by Tom Lendacky.

Cc: [email protected]
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Paolo Bonzini <[email protected]>
(cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b9a0a939d59f..fa1fb81323b5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3027,11 +3027,8 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)

if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) &&
(guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDPID))) {
+ guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDPID)))
set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, 1, 1);
- if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
- svm_clr_intercept(svm, INTERCEPT_RDTSCP);
- }
}

void sev_init_vmcb(struct vcpu_svm *svm)
--
2.25.1

2023-10-16 13:35:40

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 21/50] KVM: SEV: Add support to handle AP reset MSR protocol

From: Tom Lendacky <[email protected]>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
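
For reference, the MSR protocol packs a 12-bit GHCBInfo field into bits [11:0]
of the GHCB MSR and the data payload into bits [63:12]. A small standalone
sketch of how the AP Reset Hold request/response values compose and parse
(values taken from the hunks below; the helpers themselves are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define GHCB_MSR_INFO_MASK              0xfffULL   /* GHCBInfo, bits [11:0] */
    #define GHCB_MSR_AP_RESET_HOLD_REQ      0x006
    #define GHCB_MSR_AP_RESET_HOLD_RESP     0x007

    /* The request carries no payload: the MSR value is just the info field. */
    static uint64_t ap_reset_hold_req(void)
    {
            return GHCB_MSR_AP_RESET_HOLD_REQ;
    }

    /* A response with a non-zero result field means a SIPI was delivered. */
    static int sipi_delivered(uint64_t msr)
    {
            return (msr & GHCB_MSR_INFO_MASK) == GHCB_MSR_AP_RESET_HOLD_RESP &&
                   (msr >> 12) != 0;
    }

    int main(void)
    {
            uint64_t resp = GHCB_MSR_AP_RESET_HOLD_RESP | (1ULL << 12);

            printf("req=%#llx, sipi=%d\n",
                   (unsigned long long)ap_reset_hold_req(), sipi_delivered(resp));
            return 0;
    }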

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 93ec8c12c91d..57ced29264ce 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
/* AP Reset Hold */
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6ee925d66648..4f895a7201ed 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -65,6 +65,10 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
#define sev_es_debug_swap_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2594,6 +2598,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)

void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;

@@ -2805,6 +2812,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -2904,6 +2927,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3147,13 +3171,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}

- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);

- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c409f934c377..b74231511493 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -195,6 +195,7 @@ struct vcpu_sev_es_state {
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;

/* SEV-ES scratch area support */
u64 sw_scratch;
--
2.25.1

2023-10-16 13:35:43

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 22/50] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
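
A guest-side view of the exchange, sketched standalone (the response and
feature values follow the hunks below; treating GHCB_HV_FT_SNP as bit 0 of the
feature bitmap is an assumption based on the GHCB specification):

    #include <stdint.h>
    #include <stdio.h>

    #define GHCB_MSR_INFO_MASK      0xfffULL
    #define GHCB_MSR_HV_FT_RESP     0x081
    #define GHCB_HV_FT_SNP          (1ULL << 0)

    /* As in GHCB_MSR_HV_FT_RESP_VAL(): the bitmap lives in GHCBData[63:12]. */
    static uint64_t hv_ft_resp_val(uint64_t msr)
    {
            return msr >> 12;
    }

    int main(void)
    {
            /* The response KVM now builds: SNP advertised as supported. */
            uint64_t msr = GHCB_MSR_HV_FT_RESP | (GHCB_HV_FT_SNP << 12);

            if ((msr & GHCB_MSR_INFO_MASK) == GHCB_MSR_HV_FT_RESP)
                    printf("SNP supported: %d\n",
                           !!(hv_ft_resp_val(msr) & GHCB_HV_FT_SNP));
            return 0;
    }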

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 14 ++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++-
3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 57ced29264ce..9ba88973a187 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4f895a7201ed..088b32657f46 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2568,6 +2568,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2828,6 +2829,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -2952,6 +2960,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES: {
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b74231511493..c13070d00910 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -663,9 +663,10 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);

/* sev.c */

-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

+#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP

extern unsigned int max_sev_asid;

--
2.25.1

2023-10-16 13:36:01

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

From: Brijesh Singh <[email protected]>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2mb aligned and software accesses any
part of the associated 2mb region with a hugepage, the CPU will
incorrectly treat the entire 2mb region as in-use and signal a spurious
RMP violation #PF.

The recommended workaround is to not use a hugepage for the VMCB, VMSA or
AVIC backing page for similar reasons. Add a generic allocator that
ensures the page returned is not part of a hugepage (2mb or 1gb) and is
safe to use when SEV-SNP is enabled. Also implement similar handling for
the VMCB/VMSA pages of nested guests.
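
The allocator trick boils down to: allocate an order-1 (two page) block,
split it, and keep whichever page is not 2mb aligned. A standalone model of
that pfn choice (not the kernel code itself):

    #include <stdint.h>
    #include <stdio.h>

    #define PTRS_PER_PMD 512    /* 4K pages per 2mb hugepage on x86-64 */

    /*
     * Given the first pfn of an order-1 allocation, return the pfn that is
     * not 2mb aligned; the other page would be freed back to the system.
     */
    static uint64_t snp_safe_pfn(uint64_t pfn)
    {
            return (pfn % PTRS_PER_PMD == 0) ? pfn + 1 : pfn;
    }

    int main(void)
    {
            /* Aligned pfn: take the neighbor. Unaligned pfn: keep it. */
            printf("512 -> %llu\n", (unsigned long long)snp_safe_pfn(512));
            printf("514 -> %llu\n", (unsigned long long)snp_safe_pfn(514));
            return 0;
    }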

Co-developed-by: Marc Orr <[email protected]>
Signed-off-by: Marc Orr <[email protected]>
Reported-by: Alper Gun <[email protected]> # for nested VMSA case
Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: squash in nested guest handling from Ashish]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/lapic.c | 5 ++++-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/sev.c | 33 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 17 ++++++++++++---
arch/x86/kvm/svm/svm.h | 1 +
7 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index f1505a5fa781..4ef2eca14287 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -136,6 +136,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
KVM_X86_OP_OPTIONAL(gmem_invalidate)
+KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fa401cb1a552..a3983271ea28 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1763,6 +1763,7 @@ struct kvm_x86_ops {

int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+ void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dcd60b39e794..631a554c0f48 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2810,7 +2810,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)

vcpu->arch.apic = apic;

- apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+ if (kvm_x86_ops.alloc_apic_backing_page)
+ apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+ else
+ apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!apic->regs) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dd496c9e5f91..1f9a3f9eb985 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1194,7 +1194,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
if (svm->nested.initialized)
return 0;

- vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb02_page = snp_safe_alloc_page(&svm->vcpu);
if (!vmcb02_page)
return -ENOMEM;
svm->nested.vmcb02.ptr = page_address(vmcb02_page);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 088b32657f46..1cfb9232fc74 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3211,3 +3211,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
break;
}
}
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+ unsigned long pfn;
+ struct page *p;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+ /*
+ * Allocate an SNP safe page to workaround the SNP erratum where
+ * the CPU will incorrectly signal an RMP violation #PF if a
+ * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
+ * or AVIC backing page. The recommended workaround is to not use the
+ * hugepage.
+ *
+ * Allocate one extra page, use a page which is not 2mb aligned
+ * and free the other.
+ */
+ p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+ if (!p)
+ return NULL;
+
+ split_page(p, 1);
+
+ pfn = page_to_pfn(p);
+ if (IS_ALIGNED(pfn, PTRS_PER_PMD))
+ __free_page(p++);
+ else
+ __free_page(p + 1);
+
+ return p;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1e7fb1ea45f7..8e4ef0cd968a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -706,7 +706,7 @@ static int svm_cpu_init(int cpu)
int ret = -ENOMEM;

memset(sd, 0, sizeof(struct svm_cpu_data));
- sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ sd->save_area = snp_safe_alloc_page(NULL);
if (!sd->save_area)
return ret;

@@ -1425,7 +1425,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);

err = -ENOMEM;
- vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb01_page = snp_safe_alloc_page(vcpu);
if (!vmcb01_page)
goto out;

@@ -1434,7 +1434,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
* SEV-ES guests require a separate VMSA page used to contain
* the encrypted register state of the guest.
*/
- vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmsa_page = snp_safe_alloc_page(vcpu);
if (!vmsa_page)
goto error_free_vmcb_page;

@@ -4876,6 +4876,16 @@ static int svm_vm_init(struct kvm *kvm)
return 0;
}

+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+ struct page *page = snp_safe_alloc_page(vcpu);
+
+ if (!page)
+ return NULL;
+
+ return page_address(page);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,

@@ -5007,6 +5017,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+ .alloc_apic_backing_page = svm_alloc_apic_backing_page,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c13070d00910..b7b8bf73cbb9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -694,6 +694,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);

/* vmenter.S */

--
2.25.1

2023-10-16 13:36:37

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 24/50] KVM: SEV: Add initial SEV-SNP support

From: Brijesh Singh <[email protected]>

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protection. SEV-SNP adds strong memory
integrity protection to help prevent malicious hypervisor-based attacks
such as data replay, memory re-mapping, and more, to create an isolated
execution environment.

The SNP feature is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 10 ++++++++++
arch/x86/kvm/svm/svm.h | 8 ++++++++
2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1cfb9232fc74..4eefc168ebb3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,10 +59,14 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
#else
#define sev_enabled false
#define sev_es_enabled false
#define sev_es_debug_swap_enabled false
+#define sev_snp_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

#define AP_RESET_HOLD_NONE 0
@@ -2186,6 +2190,7 @@ void __init sev_hardware_setup(void)
{
#ifdef CONFIG_KVM_AMD_SEV
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;

@@ -2261,6 +2266,10 @@ void __init sev_hardware_setup(void)
sev_es_asid_count = min_sev_asid - 1;
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+ pr_info("SEV-ES %ssupported: %u ASIDs\n",
+ sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);

out:
if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2277,6 +2286,7 @@ void __init sev_hardware_setup(void)
if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
!cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
sev_es_debug_swap_enabled = false;
+ sev_snp_enabled = sev_snp_supported;
#endif
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b7b8bf73cbb9..635430fa641b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,7 @@ enum {
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
+ bool snp_active; /* SEV-SNP enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -339,6 +340,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}

+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_es_guest(kvm) && sev->snp_active;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1

2023-10-16 13:37:10

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 25/50] KVM: SEV: Add KVM_SNP_INIT command

From: Brijesh Singh <[email protected]>

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating an SEV-SNP guest, the VMM must use this
command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.

The flags value must be zero; it will be extended in future SNP support to
communicate optional features (such as restricted interrupt injection).
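
A hypothetical VMM snippet issuing the command through the existing
KVM_MEMORY_ENCRYPT_OP ioctl (it assumes a VM fd and a /dev/sev fd are already
open, and that the uapi definitions added by this series are in the headers):

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int vmm_snp_init(int vm_fd, int sev_fd)
    {
            struct kvm_snp_init init = { .flags = 0 }; /* must be zero for now */
            struct kvm_sev_cmd cmd;

            memset(&cmd, 0, sizeof(cmd));
            cmd.id = KVM_SEV_SNP_INIT;
            cmd.data = (__u64)(unsigned long)&init;
            cmd.sev_fd = sev_fd;

            /*
             * On -EOPNOTSUPP, init.flags has been written back with the set
             * of flags this kernel supports.
             */
            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }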

Co-developed-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 27 +++++++++++++
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 39 ++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 4 ++
include/uapi/linux/kvm.h | 13 +++++++
5 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 995780088eb2..b1a19c9a577a 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.

Returns: 0 on success, -negative on error

+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_snp_init {
+ __u64 flags;
+ };
+
+The flags bitmap is defined as::
+
+ /* enable the restricted injection */
+ #define KVM_SEV_SNP_RESTRICTED_INJET (1<<0)
+
+ /* enable the restricted injection timer */
+ #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1<<1)
+
+If the specified flags are not supported, -EOPNOTSUPP is returned and the
+supported flags are written back to the structure.
+
References
==========

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 19bf955b67e0..a901f1daaefc 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -289,6 +289,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)

#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4eefc168ebb3..0cd2a850cb45 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -251,6 +251,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_decommission(handle);
}

+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_snp_init params;
+ int ret = 0;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+ ret = -EOPNOTSUPP;
+
+ params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -264,12 +283,19 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;

sev->active = true;
- sev->es_active = argp->id == KVM_SEV_ES_INIT;
+ sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+ sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
asid = sev_asid_new(sev);
if (asid < 0)
goto e_no_asid;
sev->asid = asid;

+ if (sev->snp_active) {
+ ret = verify_snp_init_flags(kvm, argp);
+ if (ret)
+ goto e_free;
+ }
+
ret = sev_platform_init(&argp->error);
if (ret)
goto e_free;
@@ -285,6 +311,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
sev_asid_free(sev);
sev->asid = 0;
e_no_asid:
+ sev->snp_active = false;
sev->es_active = false;
sev->active = false;
return ret;
@@ -623,6 +650,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (sev_es_debug_swap_enabled)
save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;

+ /* Enable the SEV-SNP feature */
+ if (sev_snp_guest(svm->vcpu.kvm))
+ save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);

@@ -1881,6 +1912,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
}

switch (sev_cmd.id) {
+ case KVM_SEV_SNP_INIT:
+ if (!sev_snp_enabled) {
+ r = -ENOTTY;
+ goto out;
+ }
+ fallthrough;
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 635430fa641b..71f56bee0b90 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -76,6 +76,9 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))

+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS 0x0
+
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
@@ -91,6 +94,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ u64 snp_init_flags;
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 65fc983af840..a98a77f4fc4c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1962,6 +1962,9 @@ enum sev_cmd_id {
/* Guest Migration Extension */
KVM_SEV_SEND_CANCEL,

+ /* SNP specific commands */
+ KVM_SEV_SNP_INIT,
+
KVM_SEV_NR_MAX,
};

@@ -2058,6 +2061,16 @@ struct kvm_sev_receive_update_data {
__u32 trans_len;
};

+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1 << 1)
+
+struct kvm_snp_init {
+ __u64 flags;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2023-10-16 13:37:34

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 26/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command

From: Brijesh Singh <[email protected]>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.
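
Continuing the hypothetical VMM snippet from KVM_SNP_INIT, launch start might
look like the sketch below. The policy value is illustrative only: real
policies must satisfy the SEV-SNP firmware ABI, and the checks added in this
patch require the SMT bit (bit 16) set and the single-socket bit (bit 20)
clear.

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int vmm_snp_launch_start(int vm_fd, int sev_fd, __u64 policy)
    {
            struct kvm_sev_snp_launch_start start;
            struct kvm_sev_cmd cmd;

            memset(&start, 0, sizeof(start));
            start.policy = policy;          /* e.g. bit 16 (SMT allowed) set */

            memset(&cmd, 0, sizeof(cmd));
            cmd.id = KVM_SEV_SNP_LAUNCH_START;
            cmd.data = (__u64)(unsigned long)&start;
            cmd.sev_fd = sev_fd;

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }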

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: hold sev_deactivate_lock when calling SEV_CMD_SNP_DECOMMISSION]
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 +++
arch/x86/kvm/svm/sev.c | 144 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
include/uapi/linux/kvm.h | 10 ++
4 files changed, 176 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b1a19c9a577a..b1beb2fe8766 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
If the specified flags is not supported then return -EOPNOTSUPP, and the supported
flags are returned.

+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, a migration agent (if any) and a guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u64 ma_uaddr; /* userspace address of migration agent */
+ __u8 ma_en; /* 1 if the migration agent is enabled */
+ __u8 imi_en; /* set IMI to 1. */
+ __u8 gosvw[16]; /* guest OS visible workarounds */
+ };
+
+See the SEV-SNP specification for further detail on the launch input.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0cd2a850cb45..a4efd1858a9c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -24,6 +24,7 @@
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
#include <asm/debugreg.h>
+#include <asm/sev-host.h>

#include "mmu.h"
#include "x86.h"
@@ -73,6 +74,10 @@ static bool sev_snp_enabled;
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2

+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_SMT BIT_ULL(16)
+#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -83,6 +88,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;

+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -108,12 +115,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
down_write(&sev_deactivate_lock);

wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);

up_write(&sev_deactivate_lock);

if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);

return ret;
}
@@ -1888,6 +1900,94 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.gctx_paddr = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Don't allow userspace to allocate memory for more than 1 SNP context. */
+ if (sev->snp_context)
+ return -EINVAL;
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) {
+ pr_warn("SEV-SNP hypervisor does not support limiting guests to a single socket.");
+ return -EINVAL;
+ }
+
+ if (!(params.policy & SNP_POLICY_MASK_SMT)) {
+ pr_warn("SEV-SNP hypervisor does not support limiting guests to a single SMT thread.");
+ return -EINVAL;
+ }
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -1978,6 +2078,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2170,6 +2273,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.gctx_paddr = __sme_pa(sev->snp_context);
+ down_write(&sev_deactivate_lock);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context")) {
+ up_write(&sev_deactivate_lock);
+ return ret;
+ }
+
+ up_write(&sev_deactivate_lock);
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2211,7 +2341,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}

- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 71f56bee0b90..f86dd7d09441 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,6 +95,7 @@ struct kvm_sev_info {
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
u64 snp_init_flags;
+ void *snp_context; /* SNP guest context page */
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a98a77f4fc4c..e92da3d4f569 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1964,6 +1964,7 @@ enum sev_cmd_id {

/* SNP specific commands */
KVM_SEV_SNP_INIT,
+ KVM_SEV_SNP_LAUNCH_START,

KVM_SEV_NR_MAX,
};
@@ -2071,6 +2072,15 @@ struct kvm_snp_init {
__u64 flags;
};

+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u64 ma_uaddr;
+ __u8 ma_en;
+ __u8 imi_en;
+ __u8 gosvw[16];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2023-10-16 13:37:59

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 27/50] KVM: Add HVA range operator

From: Vishal Annapurve <[email protected]>

Introduce an HVA range operator so that other KVM subsystems
can operate on an HVA range.
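
As an illustration of the intended calling convention (a made-up caller, not
one from this series), a handler that tallies pages across every memslot
overlap of an HVA range could look like:

    /* Kernel-context sketch; the handler matches kvm_hva_range_op_t. */
    static int count_pages_handler(struct kvm *kvm,
                                   struct kvm_gfn_range *range, void *data)
    {
            unsigned long *total = data;

            *total += range->end - range->start;
            return 0;       /* a non-zero return aborts the walk */
    }

    static unsigned long count_hva_pages(struct kvm *kvm, unsigned long start,
                                         unsigned long end)
    {
            unsigned long total = 0;

            kvm_vm_do_hva_range_op(kvm, start, end, count_pages_handler, &total);
            return total;
    }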

Signed-off-by: Vishal Annapurve <[email protected]>
[mdr: minor checkpatch alignment fixups]
Signed-off-by: Michael Roth <[email protected]>
---
include/linux/kvm_host.h | 6 +++++
virt/kvm/kvm_main.c | 49 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 840a5be5962a..f5453006b98d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1431,6 +1431,12 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
void kvm_mmu_invalidate_end(struct kvm *kvm);
bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);

+typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
+ struct kvm_gfn_range *range, void *data);
+
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
+
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 959e866c84f0..2ad452a13d82 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -676,6 +676,55 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
return r;
}

+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
+{
+ int ret = 0;
+ struct kvm_gfn_range gfn_range;
+ struct kvm_memory_slot *slot;
+ struct kvm_memslots *slots;
+ int i, idx;
+
+ if (WARN_ON_ONCE(hva_end <= hva_start))
+ return -EINVAL;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+ struct interval_tree_node *node;
+
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot_in_hva_range(node, slots,
+ hva_start, hva_end - 1) {
+ unsigned long start, end;
+
+ slot = container_of(node, struct kvm_memory_slot,
+ hva_node[slots->node_idx]);
+ start = max(hva_start, slot->userspace_addr);
+ end = min(hva_end, slot->userspace_addr +
+ (slot->npages << PAGE_SHIFT));
+
+ /*
+ * {gfn(page) | page intersects with [hva_start, hva_end)} =
+ * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+ */
+ gfn_range.start = hva_to_gfn_memslot(start, slot);
+ gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
+ gfn_range.slot = slot;
+
+ ret = handler(kvm, &gfn_range, data);
+ if (ret)
+ goto e_ret;
+ }
+ }
+
+e_ret:
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_do_hva_range_op);
+
static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
unsigned long start,
unsigned long end,
--
2.25.1

2023-10-16 13:38:21

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 28/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

From: Brijesh Singh <[email protected]>

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created by the KVM_SEV_SNP_LAUNCH_START command.

In addition to inserting data, it can also insert two special pages
into the guest's memory: the secrets page and the CPUID page.

While terminating the guest, reclaim the guest pages added in the RMP
table. If the reclaim fails, the pages are no longer safe to release
back to the system, so leak them instead.

For more information see the SEV-SNP specification.
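
As a rough userspace sketch (illustrative only: assumes a VMM that
already holds vm_fd and an open /dev/sev fd, with setup and error
handling omitted), the command is issued via KVM_MEMORY_ENCRYPT_OP:

    struct kvm_sev_snp_launch_update update = {
            .start_gfn = 0x100,                       /* illustrative GFN */
            .uaddr     = (__u64)(uintptr_t)guest_mem, /* HVA of region */
            .len       = 0x1000,
            .page_type = KVM_SEV_SNP_PAGE_TYPE_NORMAL,
    };
    struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_SNP_LAUNCH_UPDATE,
            .data   = (__u64)(uintptr_t)&update,
            .sev_fd = sev_fd,
    };

    if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
            err(1, "SNP_LAUNCH_UPDATE failed, fw_error=%u", cmd.error);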

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 28 +++
arch/x86/kvm/svm/sev.c | 181 ++++++++++++++++++
include/uapi/linux/kvm.h | 19 ++
3 files changed, 228 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b1beb2fe8766..d4325b26724c 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error

See the SEV-SNP specification for further detail on the launch input.

+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
+calculates a measurement of the memory contents. The measurement is a signature
+of the memory contents that can be sent to the guest owner as an attestation
+that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 start_gfn; /* Guest page number to start from. */
+ __u64 uaddr; /* userspace address of the memory region to encrypt */
+ __u32 len; /* length of memory region */
+ __u8 imi_page; /* 1 if memory is part of the IMI */
+ __u8 page_type; /* page type */
+ __u8 vmpl3_perms; /* VMPL3 permission mask */
+ __u8 vmpl2_perms; /* VMPL2 permission mask */
+ __u8 vmpl1_perms; /* VMPL1 permission mask */
+ };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a4efd1858a9c..c505e4620456 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -246,6 +246,36 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}

+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (rc) {
+ /*
+ * If the reclaim failed, then the page is no longer safe
+ * to use.
+ */
+ snp_leak_pages(pfn, 1);
+ }
+
+ return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (rc && leak)
+ snp_leak_pages(pfn,
+ page_level_size(level) >> PAGE_SHIFT);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -1988,6 +2018,154 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}

+static int snp_launch_update_gfn_handler(struct kvm *kvm,
+ struct kvm_gfn_range *range,
+ void *opaque)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_memory_slot *memslot = range->slot;
+ struct sev_data_snp_launch_update data = {0};
+ struct kvm_sev_snp_launch_update params;
+ struct kvm_sev_cmd *argp = opaque;
+ int *error = &argp->error;
+ int i, n = 0, ret = 0;
+ unsigned long npages;
+ kvm_pfn_t *pfns;
+ gfn_t gfn;
+
+ if (!kvm_slot_can_be_private(memslot)) {
+ pr_err("SEV-SNP requires private memory support via guest_memfd.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
+ pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
+ return -EFAULT;
+ }
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+
+ npages = range->end - range->start;
+ pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
+ if (!pfns)
+ return -ENOMEM;
+
+ pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
+ range->start, range->end, params.page_type);
+
+ for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
+ int order, level;
+ bool assigned;
+ void *kvaddr;
+
+ ret = __kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order, false);
+ if (ret)
+ goto e_release;
+
+ n++;
+ ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
+ if (ret || assigned) {
+ pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n",
+ gfn, ret, assigned);
+ ret = -EFAULT;
+ goto e_release;
+ }
+
+ kvaddr = pfn_to_kaddr(pfns[i]);
+ if (!virt_addr_valid(kvaddr)) {
+ pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
+ ret = -EINVAL;
+ goto e_release;
+ }
+
+ ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret) {
+ pr_err("Guest read failed, ret: 0x%x\n", ret);
+ goto e_release;
+ }
+
+ ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ sev_get_asid(kvm), true);
+ if (ret) {
+ ret = -EFAULT;
+ goto e_release;
+ }
+
+ data.address = __sme_set(pfns[i] << PAGE_SHIFT);
+ data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
+ data.page_type = params.page_type;
+ data.vmpl3_perms = params.vmpl3_perms;
+ data.vmpl2_perms = params.vmpl2_perms;
+ data.vmpl1_perms = params.vmpl1_perms;
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, error);
+ if (ret) {
+ pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+ ret, *error);
+ snp_page_reclaim(pfns[i]);
+
+ /*
+ * When invalid CPUID function entries are detected, the firmware
+ * corrects these entries for debugging purpose and leaves the
+ * page unencrypted so it can be provided users for debugging
+ * and error-reporting.
+ *
+ * Copy the corrected CPUID page back to shared memory so
+ * userpsace can retrieve this information.
+ */
+ if (params.page_type == SNP_PAGE_TYPE_CPUID &&
+ *error == SEV_RET_INVALID_PARAM) {
+ int ret;
+
+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret)
+ pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+ ret);
+ }
+
+ goto e_release;
+ }
+ }
+
+e_release:
+ /* Content of memory is updated, mark pages dirty */
+ for (i = 0; i < n; i++) {
+ set_page_dirty(pfn_to_page(pfns[i]));
+ mark_page_accessed(pfn_to_page(pfns[i]));
+
+ /*
+ * If there was an error, update the RMP entry to return page
+ * ownership to the hypervisor.
+ */
+ if (ret)
+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ put_page(pfn_to_page(pfns[i]));
+ }
+
+ kvfree(pfns);
+ return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_update params;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+ snp_launch_update_gfn_handler, argp);
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2081,6 +2259,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e92da3d4f569..264e6acb7947 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1965,6 +1965,7 @@ enum sev_cmd_id {
/* SNP specific commands */
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
+ KVM_SEV_SNP_LAUNCH_UPDATE,

KVM_SEV_NR_MAX,
};
@@ -2081,6 +2082,24 @@ struct kvm_sev_snp_launch_start {
__u8 pad[6];
};

+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 start_gfn;
+ __u64 uaddr;
+ __u32 len;
+ __u8 imi_page;
+ __u8 page_type;
+ __u8 vmpl3_perms;
+ __u8 vmpl2_perms;
+ __u8 vmpl1_perms;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2023-10-16 13:38:40

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 29/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command

From: Brijesh Singh <[email protected]>

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest
and stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

For an SNP guest, the VMSA was added to the RMP table as a guest-owned
page and also removed from the kernel direct map, so flush it only
after it has been transitioned back to hypervisor state and restored
in the direct map.
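
A minimal userspace sketch (illustrative only: no ID block, vm_fd and
sev_fd as in the earlier launch steps, error handling omitted):

    struct kvm_sev_snp_launch_finish finish = {
            .id_block_en = 0,
            /* .host_data: optional 32 bytes echoed in attestation reports */
    };
    struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_SNP_LAUNCH_FINISH,
            .data   = (__u64)(uintptr_t)&finish,
            .sev_fd = sev_fd,
    };

    if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
            err(1, "SNP_LAUNCH_FINISH failed, fw_error=%u", cmd.error);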

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: always measure BSP first to get consistent launch measurements]
Signed-off-by: Michael Roth <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 +++
arch/x86/kvm/svm/sev.c | 146 ++++++++++++++++++
include/uapi/linux/kvm.h | 14 ++
3 files changed, 184 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index d4325b26724c..b89634cfcc06 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -513,6 +513,30 @@ Returns: 0 on success, -negative on error
See the SEV-SNP spec for further details on how to build the VMPL permission
mask and page type.

+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
+issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[32];
+ __u8 pad[6];
+ };
+
+
+See the SEV-SNP specification for further details on the launch finish input
+parameters.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c505e4620456..ae9f765dfa95 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -78,6 +78,8 @@ static bool sev_snp_enabled;
#define SNP_POLICY_MASK_SMT BIT_ULL(16)
#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)

+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -747,7 +749,29 @@ static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (!sev_es_guest(kvm))
return -ENOTTY;

+ /* Handle boot vCPU first to ensure consistent measurement of initial state. */
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vcpu->vcpu_id != 0)
+ continue;
+
+ ret = mutex_lock_killable(&vcpu->mutex);
+ if (ret)
+ return ret;
+
+ ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);
+
+ mutex_unlock(&vcpu->mutex);
+ if (ret)
+ return ret;
+
+ break;
+ }
+
+ /* Handle remaining vCPUs. */
kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vcpu->vcpu_id == 0)
+ continue;
+
ret = mutex_lock_killable(&vcpu->mutex);
if (ret)
return ret;
@@ -2166,6 +2190,109 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
snp_launch_update_gfn_handler, argp);
}

+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ int ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ snp_page_reclaim(pfn);
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_finish params;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->id_auth_paddr = __sme_pa(id_auth);
+
+ if (params.auth_key_en)
+ data->auth_key_en = 1;
+ }
+
+ memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2262,6 +2389,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2730,11 +2860,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)

svm = to_svm(vcpu);

+ /*
+ * If it's an SNP guest, the VMSA was added to the RMP table as
+ * a guest-owned page. Transition the page back to hypervisor state
+ * before releasing it back to the system.
+ * The page was also removed from the kernel direct map, so flush
+ * it only after it has been transitioned back to hypervisor state
+ * and restored in the direct map.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);

__free_page(virt_to_page(svm->sev_es.vmsa));

+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 264e6acb7947..6f7b44b32497 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1966,6 +1966,7 @@ enum sev_cmd_id {
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,

KVM_SEV_NR_MAX,
};
@@ -2100,6 +2101,19 @@ struct kvm_sev_snp_launch_update {
__u8 vmpl1_perms;
};

+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2023-10-16 13:39:16

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 30/50] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP guests are required to perform GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, the guest must register
the vCPU's GHCB GPA. If the hypervisor can work with the guest-requested
GPA, it must respond with the same GPA; otherwise it returns -1.

On VMGEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
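
For reference, a hedged guest-side sketch of the registration handshake
(helper names follow the Linux SEV-ES guest code and are assumptions
here, not part of this patch). The GFN goes in bits [63:12] of the GHCB
MSR, the request code in bits [11:0], and the response must echo the
same GFN back:

    u64 gfn = ghcb_gpa >> PAGE_SHIFT;

    sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ |
                       (gfn << GHCB_MSR_GPA_VALUE_POS));
    VMGEXIT();

    u64 val = sev_es_rd_ghcb_msr();

    /* The hypervisor must echo the GFN back; anything else is fatal. */
    if (GHCB_MSR_INFO(val) != GHCB_MSR_REG_GPA_RESP ||
        ((val >> GHCB_MSR_GPA_VALUE_POS) & GHCB_MSR_GPA_VALUE_MASK) != gfn)
            sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);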

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 28 ++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 9ba88973a187..9febc1474a30 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ae9f765dfa95..d9c3ecef2710 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3348,6 +3348,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PREF_GPA_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3411,6 +3432,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);

sev_es_sync_from_ghcb(svm);
+
+ /* SEV-SNP guest requires that the GHCB GPA must be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f86dd7d09441..c4449a88e629 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -209,6 +209,8 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+ u64 ghcb_registered_gpa;
};

struct vcpu_svm {
@@ -352,6 +354,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
return sev_es_guest(kvm) && sev->snp_active;
}

+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1

2023-10-16 13:39:23

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 02/50] KVM: SVM: Fix TSC_AUX virtualization setup

From: Tom Lendacky <[email protected]>

The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.

This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.

Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.

With the TSC_AUX virtualization support now in the vcpu_after_set_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.

Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <[email protected]>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
(cherry picked from commit e0096d01c4fcb8c96c05643cfc2c20ab78eae4da)
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 31 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.c | 9 ++-------
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index fa1fb81323b5..4900c078045a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2962,6 +2962,32 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
count, in);
}

+static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
+ bool v_tsc_aux = guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
+
+ set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
+ }
+}
+
+void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_cpuid_entry2 *best;
+
+ /* For sev guests, the memory encryption bit is not reserved in CR3. */
+ best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
+ if (best)
+ vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
+
+ if (sev_es_guest(svm->vcpu.kvm))
+ sev_es_vcpu_after_set_cpuid(svm);
+}
+
static void sev_es_init_vmcb(struct vcpu_svm *svm)
{
struct vmcb *vmcb = svm->vmcb01.ptr;
@@ -3024,11 +3050,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
-
- if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) &&
- (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDPID)))
- set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, 1, 1);
}

void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f283eb47f6ac..aef1ddf0b705 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4284,7 +4284,6 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct kvm_cpuid_entry2 *best;

/*
* SVM doesn't provide a way to disable just XSAVES in the guest, KVM
@@ -4328,12 +4327,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
!!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));

- /* For sev guests, the memory encryption bit is not reserved in CR3. */
- if (sev_guest(vcpu->kvm)) {
- best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
- if (best)
- vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
- }
+ if (sev_guest(vcpu->kvm))
+ sev_vcpu_after_set_cpuid(svm);

init_vmcb_after_set_cpuid(vcpu);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f41253958357..be67ab7fdd10 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -684,6 +684,7 @@ void __init sev_hardware_setup(void);
void sev_hardware_unsetup(void);
int sev_cpu_init(struct svm_cpu_data *sd);
void sev_init_vmcb(struct vcpu_svm *svm);
+void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
void sev_free_vcpu(struct kvm_vcpu *vcpu);
int sev_handle_vmgexit(struct kvm_vcpu *vcpu);
int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
--
2.25.1

2023-10-16 13:39:42

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 31/50] KVM: SEV: Add KVM_EXIT_VMGEXIT

For private memslots, GHCB page state change requests will be forwarded
to userspace for processing. Define a new KVM_EXIT_VMGEXIT for exits of
this type, as well as other potential userspace handling for VMGEXITs in
the future.
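
A rough sketch of the corresponding VMM-side handling (the handle_psc()
helper is hypothetical; distinguishing MSR-based from page-based
requests via the GHCB MSR contents follows the GHCB specification):

    /* run = mmap'd struct kvm_run for this vCPU */
    ioctl(vcpu_fd, KVM_RUN, 0);

    switch (run->exit_reason) {
    case KVM_EXIT_VMGEXIT:
            /*
             * run->vmgexit.ghcb_msr holds the GHCB MSR contents: either
             * the GPA of the GHCB page or an encoded MSR-based request.
             * For page-based PSC requests, 'ret' becomes SW_EXITINFO2.
             */
            run->vmgexit.ret = handle_psc(vm_fd, run->vmgexit.ghcb_msr);
            break;
    /* ... other exit reasons ... */
    }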

Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/kvm/api.rst | 34 ++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 6 ++++++
2 files changed, 40 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 5e08f2a157ef..e84c62423ab7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6847,6 +6847,40 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

+::
+
+ /* KVM_EXIT_VMGEXIT */
+ struct {
+ __u64 ghcb_msr; /* GHCB MSR contents */
+ __u64 ret; /* user -> kernel return value */
+ } vmgexit;
+
+If the exit reason is KVM_EXIT_VMGEXIT, it indicates that an SEV-SNP guest has
+issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. This is generally handled via the Guest-Hypervisor Communication
+Block (GHCB) specification. The value of 'ghcb_msr' will be the contents of
+the GHCB MSR register at the time of the VMGEXIT, which can either be the GPA
+of the GHCB page for page-based GHCB requests, or an encoding of an MSR-based
+GHCB request. The mechanism to distinguish between these two and determine the
+type of request is the same as what is documented in the GHCB specification.
+
+Not all VMGEXITs or GHCB requests will be forwarded to userspace. Currently
+this will only be the case for "SNP Page State Change" requests (PSCs), and
+only for the subset of these which involve actual shared <-> private
+transition. Userspace is expected to process these requests in accordance
+with the GHCB specification and issue KVM_SET_MEMORY_ATTRIBUTE ioctls to
+perform the shared/private transitions.
+
+GHCB page-based PSC requests require returning a 64-bit return value to the
+guest via the SW_EXITINFO2 field of the vCPU's VMCB structure, as documented
+in the GHCB. Userspace must set 'ret' to what the GHCB specification documents
+the SW_EXITINFO2 VMCB field should be set to after processing a PSC request.
+
+For MSR-based PSC requests, userspace must set the value of 'ghcb_msr' to be
+the same as what the GHCB specification documents the actual GHCB MSR register
+should be set to after processing a PSC request.
+

6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6f7b44b32497..3af546adb962 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -279,6 +279,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_RISCV_CSR 36
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_MEMORY_FAULT 38
+#define KVM_EXIT_VMGEXIT 50

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -525,6 +526,11 @@ struct kvm_run {
#define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
__u32 flags;
} notify;
+ /* KVM_EXIT_VMGEXIT */
+ struct {
+ __u64 ghcb_msr; /* GHCB MSR contents */
+ __u64 ret; /* user -> kernel */
+ } vmgexit;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1

2023-10-16 13:41:09

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 33/50] KVM: SEV: Add support to handle Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.

Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.
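
The userspace side of such a transition might look roughly like this (a
hedged sketch assuming the KVM_SET_MEMORY_ATTRIBUTES ioctl from the
guest_memfd series; gpa/size/to_private come from parsing the PSC
entries, and error handling is omitted):

    struct kvm_memory_attributes attrs = {
            .address    = gpa,
            .size       = size,
            .attributes = to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
    };

    ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);

    /* Report success to the guest: SW_EXITINFO2 == 0, per the GHCB spec. */
    run->vmgexit.ret = 0;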

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4890e910e6e0..0287fadeae76 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3081,6 +3081,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
+ case SVM_VMGEXIT_PSC:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3278,6 +3279,15 @@ static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
return 1; /* resume */
}

+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, vcpu->run->vmgexit.ret);
+
+ return 1; /* resume */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3522,6 +3532,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_PSC:
+ /* Let userspace handle allocating/deallocating backing pages. */
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
+ vcpu->arch.complete_userspace_io = snp_complete_psc;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1

2023-10-16 13:41:58

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 35/50] KVM: SEV: Add support to handle RMP nested page faults

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters an RMP check failure caused by a guest memory
access, it raises an #NPF. The error code contains additional
information on the access type. See APM volume 2 for details.

When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.

Co-developed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/sev-common.h | 3 +
arch/x86/kvm/svm/sev.c | 92 +++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 21 +++++--
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 113 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 9febc1474a30..15d8e9805963 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -188,6 +188,9 @@ struct snp_psc_desc {
/* RMPUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 4

+/* PSMASH failed due to concurrent access by another CPU */
+#define PSMASH_FAIL_INUSE 3
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0287fadeae76..0a45031386c2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3270,6 +3270,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_rmptable_psmash(kvm_pfn_t pfn)
+{
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ return psmash(pfn);
+}
+
static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3816,3 +3823,88 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)

return p;
}
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = vcpu->kvm;
+ int order, rmp_level, ret;
+ bool assigned;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ gfn = gpa >> PAGE_SHIFT;
+
+ /*
+ * The only time RMP faults occur for shared pages is when the guest is
+ * triggering an RMP fault for an implicit page-state change from
+ * shared->private. Implicit page-state changes are forwarded to
+ * userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
+ * for shared pages should not end up here.
+ */
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret || !assigned) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+ gpa, pfn, ret);
+ goto out;
+ }
+
+ /*
+ * There are 2 cases where a PSMASH may be needed to resolve an #NPF
+ * with PFERR_GUEST_RMP_BIT set:
+ *
+ * 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
+ * bit set if the guest issues them with a smaller granularity than
+ * what is indicated by the page-size bit in the 2MB-aligned RMP
+ * entry for the PFN that backs the GPA.
+ *
+ * 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
+ * smaller than what is indicated by the 2MB-aligned RMP entry for
+ * the PFN that backs the GPA.
+ *
+ * In both these cases, the corresponding 2M RMP entry needs to
+ * be PSMASH'd to 512 4K RMP entries. If the RMP entry is already
+ * split into 4K RMP entries, then this is likely a spurious case which
+ * can occur when there are concurrent accesses by the guest to a 2MB
+ * GPA range that is backed by a 2MB-aligned PFN whose RMP entry is in
+ * the process of being PSMASH'd into 4K entries. These cases should
+ * resolve automatically on subsequent accesses, so just ignore them
+ * here.
+ */
+ if (rmp_level == PG_LEVEL_4K) {
+ pr_debug_ratelimited("%s: Spurious RMP fault for GPA 0x%llx, error_code 0x%llx",
+ __func__, gpa, error_code);
+ goto out;
+ }
+
+ pr_debug_ratelimited("%s: Splitting 2M RMP entry for GPA 0x%llx, error_code 0x%llx",
+ __func__, gpa, error_code);
+ ret = snp_rmptable_psmash(pfn);
+ if (ret && ret != PSMASH_FAIL_INUSE)
+ pr_err_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
+ gpa, pfn, ret);
+
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+out:
+ put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8e4ef0cd968a..563c9839428d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2046,15 +2046,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;

u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;

trace_kvm_page_fault(vcpu, fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ /*
+ * rc == 0 indicates a userspace exit is needed to handle page
+ * transitions, so do that first before updating the RMP table.
+ */
+ if (error_code & PFERR_GUEST_RMP_MASK) {
+ if (rc == 0)
+ return rc;
+ handle_rmp_page_fault(vcpu, fault_address, error_code);
+ }
+
+ return rc;
}

static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c4449a88e629..c3a37136fa30 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -715,6 +715,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);

/* vmenter.S */

--
2.25.1

2023-10-16 13:42:12

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 36/50] KVM: SEV: Use a VMSA physical address variable for populating VMCB

From: Tom Lendacky <[email protected]>

In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/svm/sev.c | 3 +--
arch/x86/kvm/svm/svm.c | 9 ++++++++-
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0a45031386c2..f36d72ca2cf7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3639,8 +3639,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
* the VMSA will be NULL if this vCPU is the destination for intrahost
* migration, and will be copied later.
*/
- if (svm->sev_es.vmsa)
- svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;

/* Can't intercept CR register access, HV can't modify CR registers */
svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 563c9839428d..c04c554e5675 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1463,9 +1463,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
svm_switch_vmcb(svm, &svm->vmcb01);

- if (vmsa_page)
+ if (vmsa_page) {
svm->sev_es.vmsa = page_address(vmsa_page);

+ /*
+ * Do not include the encryption mask on the VMSA physical
+ * address since hardware will access it using the guest key.
+ */
+ svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+ }
+
svm->guest_state_loaded = false;

return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c3a37136fa30..0ad76ed4d625 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -200,6 +200,7 @@ struct vcpu_sev_es_state {
struct ghcb *ghcb;
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
+ hpa_t vmsa_pa;
bool received_first_sipi;
unsigned int ap_reset_hold_type;

--
2.25.1

2023-10-16 13:42:49

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 38/50] KVM: SEV: Add support for GHCB-based termination requests

GHCB version 2 adds support for a GHCB-based termination request that
a guest can issue when it reaches an error state and wishes to inform
the hypervisor that it should be terminated. Implement support for that
similarly to GHCB MSR-based termination requests that are already
available to SEV-ES guests via earlier versions of the GHCB protocol.
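
On the userspace side, the resulting exit might be handled along these
lines (a sketch; the logging and exit policy are illustrative):

    switch (run->exit_reason) {
    case KVM_EXIT_SYSTEM_EVENT:
            if (run->system_event.type == KVM_SYSTEM_EVENT_SEV_TERM) {
                    /* data[0] carries the GHCB GPA at the time of the request */
                    fprintf(stderr, "guest requested termination\n");
                    exit(1);
            }
            break;
    }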

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e547adddacfa..9c38fe796e00 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3094,6 +3094,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_TERM_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3762,6 +3763,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)

ret = 1;
break;
+ case SVM_VMGEXIT_TERM_REQUEST:
+ pr_info("SEV-ES guess requested termination: reason %#llx info %#llx\n",
+ control->exit_info_1, control->exit_info_1);
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+ vcpu->run->system_event.ndata = 1;
+ vcpu->run->system_event.data[0] = control->ghcb_gpa;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1

2023-10-16 13:43:13

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 39/50] KVM: SEV: Implement gmem hook for initializing private pages

This will handle RMP table updates and direct map changes needed to put
a page into a private state before mapping it into an SEV-SNP guest.
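
For context, a hedged sketch of how this hook is expected to be driven
from the gmem side (the actual call site lives in the prerequisite gmem
hooks series; the static_call name here is illustrative):

    /* Before mapping a gmem page into the guest for the first time: */
    rc = static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
    if (rc)
            return rc;      /* fail the fault rather than map an unprepared page */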

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 95 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 99 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 71dc506aa3fb..8caf2eb6add8 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -127,6 +127,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select KVM_SW_PROTECTED_VM
+ select HAVE_KVM_GMEM_PREPARE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9c38fe796e00..8cf2d19597b1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4135,3 +4135,98 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out:
put_page(pfn_to_page(pfn));
}
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn = start;
+
+ while (pfn < end) {
+ int ret, rmp_level;
+ bool assigned;
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+ pfn, start, end, rmp_level, ret);
+ return false;
+ }
+
+ if (assigned) {
+ pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+ __func__, pfn, start, end, rmp_level);
+ return false;
+ }
+
+ pfn++;
+ }
+
+ return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+ kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+ /*
+ * If this is a large folio, and the entire 2M range containing the
+ * PFN is currently shared, then the entire 2M-aligned range can be
+ * set to private via a single 2M RMP entry.
+ */
+ if (max_level_for_order(order) > PG_LEVEL_4K &&
+ is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+ return true;
+
+ return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t pfn_aligned;
+ gfn_t gfn_aligned;
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc)
+ return rc;
+
+ if (assigned) {
+ pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+ __func__, gfn, pfn, max_order, level);
+ return 0;
+ }
+
+ if (is_large_rmp_possible(kvm, pfn, max_order)) {
+ level = PG_LEVEL_2M;
+ pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+ gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+ } else {
+ level = PG_LEVEL_4K;
+ pfn_aligned = pfn;
+ gfn_aligned = gfn;
+ }
+
+ rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -EINVAL;
+ }
+
+ pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+ __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f5cdcbd1ba67..b3ed424533b0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5041,6 +5041,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+ .gmem_prepare = sev_gmem_prepare,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f81dfa1594f6..c5cee554176e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -723,6 +723,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);

/* vmenter.S */

--
2.25.1

2023-10-16 13:43:42

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 40/50] KVM: SEV: Implement gmem hook for invalidating private pages

Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 63 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 ++
4 files changed, 67 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8caf2eb6add8..dfc857db389f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -128,6 +128,7 @@ config KVM_AMD_SEV
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select KVM_SW_PROTECTED_VM
select HAVE_KVM_GMEM_PREPARE
+ select HAVE_KVM_GMEM_INVALIDATE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8cf2d19597b1..5b3a3bbfebee 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4230,3 +4230,66 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)

return 0;
}
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn;
+
+ pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+ for (pfn = start; pfn < end;) {
+ bool use_2m_update = false;
+ int rc, rmp_level;
+ bool assigned;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (rc) {
+ pr_debug_ratelimited("SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ goto next_pfn;
+ }
+
+ if (!assigned)
+ goto next_pfn;
+
+ use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+ end >= (pfn + PTRS_PER_PMD) &&
+ rmp_level > PG_LEVEL_4K;
+
+ /*
+ * If an unaligned PFN corresponds to a 2M region assigned as a
+ * large page in the RMP table, PSMASH the region into individual
+ * 4K RMP entries before attempting to convert a 4K sub-page.
+ */
+ if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+ rc = snp_rmptable_psmash(pfn);
+ if (rc)
+ pr_err_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ }
+
+ rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+ if (WARN_ON_ONCE(rc)) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ goto next_pfn;
+ }
+
+ /*
+ * SEV-ES avoids host/guest cache coherency issues through
+ * WBINVD hooks issued via MMU notifiers during run-time, and
+ * KVM's VM destroy path at shutdown. Those MMU notifier events
+ * don't cover gmem since there is no requirement to map pages
+ * to a HVA in order to use them for a running guest. While the
+ * shutdown path would still likely cover things for SNP guests,
+ * userspace may also free gmem pages during run-time via
+ * hole-punching operations on the guest_memfd, so flush the
+ * cache entries for these pages before freeing them back to
+ * the host.
+ */
+ clflush_cache_range(__va(pfn_to_hpa(pfn)),
+ use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+ pfn += use_2m_update ? PTRS_PER_PMD : 1;
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b3ed424533b0..9cff302b4402 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5043,6 +5043,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,

.gmem_prepare = sev_gmem_prepare,
+ .gmem_invalidate = sev_gmem_invalidate,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c5cee554176e..1fd90a88b0db 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -724,6 +724,8 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);

/* vmenter.S */

--
2.25.1

2023-10-16 13:43:49

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests

When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.

However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, only CET/shadowstack makes use
of this register on SEV-ES-capable systems currently, which isn't yet
widely used, but this may become more of an issue in the future.

Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.

Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
if the host/guest configuration allows it. If the host/guest
configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
that it can be caught by the existing checks in
kvm_{set,get}_msr_common() if the guest still attempts to access it.

Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Cc: Alexey Kardashevskiy <[email protected]>
Suggested-by: Tom Lendacky <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 19 +++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4900c078045a..6ee925d66648 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2972,6 +2972,25 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)

set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
}
+
+ /*
+ * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
+ * the host/guest supports its use.
+ *
+ * guest_can_use() checks a number of requirements on the host/guest to
+ * ensure that MSR_IA32_XSS is available, but it might report true even
+ * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
+ * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
+ * to further check that the guest CPUID actually supports
+ * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaved
+ * guests will still get intercepted and caught in the normal
+ * kvm_emulate_rdmsr()/kvm_emulate_wrmsr() paths.
+ */
+ if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
+ else
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 0, 0);
}

void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index aef1ddf0b705..1e7fb1ea45f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -103,6 +103,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
{ .index = MSR_IA32_LASTINTTOIP, .always = false },
+ { .index = MSR_IA32_XSS, .always = false },
{ .index = MSR_EFER, .always = false },
{ .index = MSR_IA32_CR_PAT, .always = false },
{ .index = MSR_AMD64_SEV_ES_GHCB, .always = true },
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index be67ab7fdd10..c409f934c377 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2

-#define MAX_DIRECT_ACCESS_MSRS 46
+#define MAX_DIRECT_ACCESS_MSRS 47
#define MSRPM_OFFSETS 32
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
--
2.25.1

2023-10-16 13:44:10

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 41/50] KVM: x86: Add gmem hook for determining max NPT mapping level

In the case of SEV-SNP, whether a page can be mapped via a 2MB mapping
in the guest's nested page table depends on whether any sub-pages within
that range have already been initialized as private in the RMP table.
KVM's existing mixed-attribute tracking is insufficient here, for
instance:

- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K

At this point there are no mixed attributes, so KVM would normally allow
2MB NPT mappings again. That is not actually allowed, however, because
the RMP table mappings are now 4K and cannot be promoted on the
hypervisor side, so the NPT mappings must remain limited to 4K to match.

Add a hook to determine the max NPT mapping size in situations like
this.

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 12 ++++++++++--
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
5 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 4ef2eca14287..7f2e00c48d3b 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -135,6 +135,7 @@ KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL_RET0(gmem_max_level)
KVM_X86_OP_OPTIONAL(gmem_invalidate)
KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cd4bfe0b7deb..6dda4d24dbef 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1764,6 +1764,7 @@ struct kvm_x86_ops {

int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+ int (*gmem_max_level)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
};

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8c78807e0f45..64f6cb428b32 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4304,6 +4304,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
int max_order, r;
+ u8 max_level;

if (!kvm_slot_can_be_private(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
@@ -4317,8 +4318,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return r;
}

- fault->max_level = min(kvm_max_level_for_order(max_order),
- fault->max_level);
+ max_level = kvm_max_level_for_order(max_order);
+ r = static_call(kvm_x86_gmem_max_level)(vcpu->kvm, fault->pfn,
+ fault->gfn, &max_level);
+ if (r) {
+ kvm_release_pfn_clean(fault->pfn);
+ return r;
+ }
+
+ fault->max_level = min(max_level, fault->max_level);
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);

return RET_PF_CONTINUE;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5b3a3bbfebee..6c6d5a320d72 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4293,3 +4293,30 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
pfn += use_2m_update ? PTRS_PER_PMD : 1;
}
}
+
+int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
+{
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -ENOENT;
+ }
+
+ if (!assigned) {
+ pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
+ gfn, pfn, level);
+ return -EINVAL;
+ }
+
+ if (level < *max_level)
+ *max_level = level;
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9cff302b4402..d97ec673b63d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5043,6 +5043,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,

.gmem_prepare = sev_gmem_prepare,
+ .gmem_max_level = sev_gmem_max_level,
.gmem_invalidate = sev_gmem_invalidate,
};

--
2.25.1

2023-10-16 13:44:46

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 42/50] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

From: Ashish Kalra <[email protected]>

With SNP/guest_memfd, private/encrypted memory is not mappable, so MMU
notifications for HVA-mapped memory are only relevant to unencrypted
guest memory. The rationale for issuing a wbinvd_on_all_cpus() in
sev_guest_memory_reclaimed() therefore does not apply to SNP guests, and
the flush can be skipped.

Signed-off-by: Ashish Kalra <[email protected]>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6c6d5a320d72..f027def3a79e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2852,7 +2852,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)

void sev_guest_memory_reclaimed(struct kvm *kvm)
{
- if (!sev_guest(kvm))
+ /*
+ * With SNP+gmem, private/encrypted memory should be
+ * unreachable via the hva-based mmu notifiers. Additionally,
+ * for shared->private conversions, hardware coherency will
+ * ensure that the first guest access to the page clears out
+ * any existing dirty copies of those cache lines.
+ */
+ if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;

wbinvd_on_all_cpus();
--
2.25.1

2023-10-16 13:45:57

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 44/50] iommu/amd: Add IOMMU_SNP_SHUTDOWN support

From: Ashish Kalra <[email protected]>

Add a new IOMMU API interface, amd_iommu_snp_disable(), to transition
IOMMU pages from Reclaim state to Hypervisor state after the
SNP_SHUTDOWN_EX command, and invoke it from the CCP driver once that
command completes.

Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 20 +++++++++++++
drivers/iommu/amd/init.c | 55 ++++++++++++++++++++++++++++++++++++
include/linux/amd-iommu.h | 3 ++
3 files changed, 78 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 679b8d6fc09a..0626c0feff9b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -26,6 +26,7 @@
#include <linux/fs.h>
#include <linux/fs_struct.h>
#include <linux/psp.h>
+#include <linux/amd-iommu.h>

#include <asm/smp.h>
#include <asm/cacheflush.h>
@@ -1513,6 +1514,25 @@ static int __sev_snp_shutdown_locked(int *error)
return ret;
}

+ /*
+ * SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set to 1 disables SNP
+ * enforcement by the IOMMU and also transitions all pages
+ * associated with the IOMMU to the Reclaim state.
+ * Firmware versions prior to 1.53 transitioned the IOMMU pages
+ * directly to Hypervisor state, but mis-accounted the number of
+ * assigned 4KB pages within a 2MB page by not going through the
+ * Reclaim state. This resulted in an RMP #PF when the 2MB page
+ * containing those pages was later accessed during kexec boot.
+ * Hence, the firmware now transitions these pages to Reclaim state
+ * and the hypervisor needs to transition them to shared state. SNP
+ * firmware version 1.53 or above is needed for kexec boot.
+ */
+ ret = amd_iommu_snp_disable();
+ if (ret) {
+ dev_err(sev->dev, "SNP IOMMU shutdown failed\n");
+ return ret;
+ }
+
sev->snp_initialized = false;
dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1c9924de607a..6af208a4f66b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -30,6 +30,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>
#include <asm/set_memory.h>
+#include <asm/sev-host.h>

#include <linux/crash_dump.h>

@@ -3838,4 +3839,58 @@ int amd_iommu_snp_enable(void)

return 0;
}
+
+static int iommu_page_make_shared(void *page)
+{
+ unsigned long paddr, pfn;
+
+ paddr = iommu_virt_to_phys(page);
+ /* The C-bit may be set in the paddr */
+ pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ return rmp_make_shared(pfn, PG_LEVEL_4K);
+}
+
+static int iommu_make_shared(void *va, size_t size)
+{
+ void *page;
+ int ret;
+
+ if (!va)
+ return 0;
+
+ for (page = va; page < (va + size); page += PAGE_SIZE) {
+ ret = iommu_page_make_shared(page);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+int amd_iommu_snp_disable(void)
+{
+ struct amd_iommu *iommu;
+ int ret;
+
+ if (!amd_iommu_snp_en)
+ return 0;
+
+ for_each_iommu(iommu) {
+ ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
+ if (ret)
+ return ret;
+ }
+
+ amd_iommu_snp_en = false;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_disable);
#endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 55fc03cb3968..b04f2d3201b1 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -207,6 +207,9 @@ struct amd_iommu *get_amd_iommu(unsigned int idx);

#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void);
+int amd_iommu_snp_disable(void);
+#else
+static inline int amd_iommu_snp_disable(void) { return 0; }
#endif

#endif /* _ASM_X86_AMD_IOMMU_H */
--
2.25.1

2023-10-16 13:46:50

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 45/50] iommu/amd: Report all cases inhibiting SNP enablement

Enabling SNP relies on various IOMMU-related checks in
amd_iommu_snp_enable(). In most cases, when the host supports SNP, any
IOMMU-related details that prevent enabling SNP are reported. One case
where it is not reported is when the IOMMU doesn't support the SNP
feature. Often this is the result of the corresponding BIOS option not
being enabled, so report that case along with the others.

While here, fix up the reporting to be more consistent about using
periods to end sentences, and always printing a newline afterward.

Signed-off-by: Michael Roth <[email protected]>
---
drivers/iommu/amd/init.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6af208a4f66b..121092f0a48a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3811,7 +3811,7 @@ int amd_iommu_snp_enable(void)
* not configured in the passthrough mode.
*/
if (no_iommu || iommu_default_passthrough()) {
- pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported");
+ pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported.\n");
return -EINVAL;
}

@@ -3826,14 +3826,16 @@ int amd_iommu_snp_enable(void)
}

amd_iommu_snp_en = check_feature_on_all_iommus(FEATURE_SNP);
- if (!amd_iommu_snp_en)
+ if (!amd_iommu_snp_en) {
+ pr_err("SNP: IOMMU SNP feature is not enabled, SNP cannot be supported.\n");
return -EINVAL;
+ }

pr_info("SNP enabled\n");

/* Enforce IOMMU v1 pagetable when SNP is enabled. */
if (amd_iommu_pgtable != AMD_IOMMU_V1) {
- pr_warn("Force to using AMD IOMMU v1 page table due to SNP\n");
+ pr_warn("Force to using AMD IOMMU v1 page table due to SNP.\n");
amd_iommu_pgtable = AMD_IOMMU_V1;
}

--
2.25.1

2023-10-16 13:47:27

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 46/50] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

From: Brijesh Singh <[email protected]>

The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.

Version 2 of the GHCB specification adds an NAE event (SNP extended
guest request) that a guest can use to query attestation reports that
include additional certificates.

In both cases, userspace-provided data is included in the attestation
reports. Userspace uses the SNP_SET_EXT_CONFIG command to provide both
the certificate blob and the reported TCB version string at once. Note
that the specification defines the certificate blob with a specific GUID
format; userspace is responsible for building the proper certificate
blob, and the ioctl treats it as an opaque blob.

While it is not defined in the spec, also add an SNP_GET_EXT_CONFIG
command that can be used to obtain the data programmed through
SNP_SET_EXT_CONFIG.
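
For illustration, here is a minimal userspace sketch of driving the new
command (not part of this patch; it assumes the uapi additions below and
uses a hypothetical page of zeroed certificate data, which a real VMM
would replace with a GUID-formatted certificate table per the GHCB
spec):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/psp-sev.h>

int main(void)
{
	struct sev_user_data_snp_config config = {
		.reported_tcb = 0x1337ULL, /* hypothetical TCB version */
	};
	struct sev_user_data_ext_snp_config ext = {};
	struct sev_issue_cmd arg = {};
	size_t certs_len = 0x1000; /* must be non-zero and page-aligned */
	void *certs;
	int fd, ret;

	/* Placeholder blob; fill with a GUID certificate table in practice. */
	if (posix_memalign(&certs, 0x1000, certs_len))
		return 1;
	memset(certs, 0, certs_len);

	ext.config_address = (uint64_t)(uintptr_t)&config;
	ext.certs_address = (uint64_t)(uintptr_t)certs;
	ext.certs_len = certs_len;

	fd = open("/dev/sev", O_RDWR);
	if (fd < 0)
		return 1;

	arg.cmd = SNP_SET_EXT_CONFIG;
	arg.data = (uint64_t)(uintptr_t)&ext;
	ret = ioctl(fd, SEV_ISSUE_CMD, &arg);
	if (ret)
		fprintf(stderr, "SNP_SET_EXT_CONFIG: ret=%d fw_error=%u\n",
			ret, arg.error);

	close(fd);
	free(certs);
	return ret ? 1 : 0;
}

SNP_GET_EXT_CONFIG follows the reverse pattern: if certs_len is too
small, the ioctl fails and certs_len is updated with the required size
so the caller can retry with a larger buffer.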

Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Co-developed-by: Dionna Glaze <[email protected]>
Signed-off-by: Dionna Glaze <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: squash in doc patch from Dionna]
Signed-off-by: Michael Roth <[email protected]>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++
drivers/crypto/ccp/sev-dev.c | 173 ++++++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.h | 2 +
include/linux/psp-sev.h | 10 ++
include/uapi/linux/psp-sev.h | 17 +++
5 files changed, 229 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index e828c5326936..7cabf54395e5 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -151,6 +151,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
status includes API major, minor version and more. See the SEV-SNP
specification for further details.

+2.5 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG command is used to set the system-wide configuration,
+such as the reported TCB version in the attestation report. It is similar to
+the SNP_CONFIG command defined in the SEV-SNP spec, the main difference being
+that it also accepts an additional certificate blob as defined in the GHCB
+specification.
+
+If certs_address is zero, then the previous certificate blob will be deleted.
+For more information on the certificate blob layout, see the GHCB spec
+(extended guest request message).
+
+2.6 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG command is used to query the system-wide configuration
+set through SNP_SET_EXT_CONFIG.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 0626c0feff9b..4807ddd6ec52 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1496,6 +1496,10 @@ static int __sev_snp_shutdown_locked(int *error)
data.length = sizeof(data);
data.iommu_snp_shutdown = 1;

+ /* Free the memory used for caching the certificate data */
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = NULL;
+
wbinvd_on_all_cpus();

retry:
@@ -1834,6 +1838,121 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
return ret;
}

+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_snp_certs *snp_certs;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the TCB version programmed through the SET_CONFIG to userspace */
+ if (input.config_address) {
+ if (copy_to_user((void __user *)input.config_address,
+ &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+ return -EFAULT;
+ }
+
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+
+ /* Copy the extended certs programmed through SNP_SET_EXT_CONFIG */
+ if (input.certs_address && snp_certs) {
+ if (input.certs_len < snp_certs->len) {
+ /* Return the certs length to userspace */
+ input.certs_len = snp_certs->len;
+
+ ret = -EIO;
+ goto e_done;
+ }
+
+ if (copy_to_user((void __user *)input.certs_address,
+ snp_certs->data, snp_certs->len)) {
+ ret = -EFAULT;
+ goto put_exit;
+ }
+ }
+
+ ret = 0;
+
+e_done:
+ if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+ ret = -EFAULT;
+
+put_exit:
+ sev_snp_certs_put(snp_certs);
+
+ return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_user_data_snp_config config;
+ struct sev_snp_certs *snp_certs = NULL;
+ void *certs = NULL;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the certs from userspace */
+ if (input.certs_address) {
+ if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+ return -EINVAL;
+
+ certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+ if (IS_ERR(certs))
+ return PTR_ERR(certs);
+ }
+
+ /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+ if (input.config_address) {
+ if (copy_from_user(&config,
+ (void __user *)input.config_address, sizeof(config))) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+ if (ret)
+ goto e_free;
+
+ memcpy(&sev->snp_config, &config, sizeof(config));
+ }
+
+ /*
+ * Cache the new certs if provided; the old certs are dropped either way.
+ */
+ if (input.certs_len) {
+ snp_certs = sev_snp_certs_new(certs, input.certs_len);
+ if (!snp_certs) {
+ ret = -ENOMEM;
+ goto e_free;
+ }
+ }
+
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = snp_certs;
+
+ return 0;
+
+e_free:
+ kfree(certs);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1888,6 +2007,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_PLATFORM_STATUS:
ret = sev_ioctl_snp_platform_status(&input);
break;
+ case SNP_SET_EXT_CONFIG:
+ ret = sev_ioctl_snp_set_config(&input, writable);
+ break;
+ case SNP_GET_EXT_CONFIG:
+ ret = sev_ioctl_snp_get_config(&input);
+ break;
default:
ret = -EINVAL;
goto out;
@@ -1936,6 +2061,54 @@ int sev_guest_df_flush(int *error)
}
EXPORT_SYMBOL_GPL(sev_guest_df_flush);

+static void sev_snp_certs_release(struct kref *kref)
+{
+ struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
+
+ kfree(certs->data);
+ kfree(certs);
+}
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len)
+{
+ struct sev_snp_certs *certs;
+
+ if (!len || !data)
+ return NULL;
+
+ certs = kzalloc(sizeof(*certs), GFP_KERNEL);
+ if (!certs)
+ return NULL;
+
+ certs->data = data;
+ certs->len = len;
+ kref_init(&certs->kref);
+
+ return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_new);
+
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs)
+{
+ if (!certs)
+ return NULL;
+
+ if (!kref_get_unless_zero(&certs->kref))
+ return NULL;
+
+ return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_get);
+
+void sev_snp_certs_put(struct sev_snp_certs *certs)
+{
+ if (!certs)
+ return;
+
+ kref_put(&certs->kref, sev_snp_certs_release);
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_put);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 2c2fe42189a5..71eac493fd56 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,8 @@ struct sev_device {

bool snp_initialized;
struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+ struct sev_snp_certs *snp_certs;
+ struct sev_user_data_snp_config snp_config;
};

int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 9342cee1a1e6..3c605856ef4f 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -16,6 +16,16 @@

#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */

+struct sev_snp_certs {
+ void *data;
+ u32 len;
+ struct kref kref;
+};
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
+void sev_snp_certs_put(struct sev_snp_certs *certs);
+
/**
* SEV platform state
*/
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index b94b3687edbb..b70db9ab7e44 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
SNP_PLATFORM_STATUS,
+ SNP_SET_EXT_CONFIG,
+ SNP_GET_EXT_CONFIG,

SEV_MAX,
};
@@ -208,6 +210,21 @@ struct sev_user_data_snp_config {
__u8 rsvd1[52];
} __packed;

+/**
+ * struct sev_user_data_ext_snp_config - system wide configuration value for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ * reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ * 0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certs
+ */
+struct sev_user_data_ext_snp_config {
+ __u64 config_address; /* In */
+ __u64 certs_address; /* In */
+ __u32 certs_len; /* In */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1

2023-10-16 13:47:42

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 47/50] x86/sev: Add KVM commands for per-instance certs

From: Dionna Glaze <[email protected]>

The /dev/sev device has the ability to store host-wide certificates for
the key used by the AMD-SP for SEV-SNP attestation report signing,
but for hosts that want to specify additional certificates that are
specific to the image launched in a VM, a different way is needed to
communicate those certificates.

Add two new KVM ioctls to handle this: KVM_SEV_SNP_{GET,SET}_CERTS.

The certificates that are set with this command are expected to follow
the same format as the host certificates, but that format is opaque
to the kernel.

The new behavior for custom certificates is that the extended guest
request command now returns the instance-specific certificates if they
were installed. Likewise, the error condition for a too-small data
buffer is changed to report the size of the instance-specific
certificate data when such a set is installed.

Setting a zero-length certificate returns the system to its default
state of only returning the host certificates on an extended guest request.

Also increase SEV_FW_BLOB_MAX_SIZE by another 4K page to allow space
for an extra certificate.
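
As a usage illustration, here is a minimal sketch (not part of this
patch) of how a VMM might install instance-specific certificates via the
standard KVM_MEMORY_ENCRYPT_OP path; the vm_fd/sev_fd plumbing and the
certificate buffer are assumed to be set up elsewhere:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* vm_fd: KVM VM fd; sev_fd: fd from open("/dev/sev", O_RDWR) */
static int snp_set_instance_certs_example(int vm_fd, int sev_fd,
					  void *certs, uint64_t certs_len)
{
	struct kvm_sev_snp_set_certs params = {
		.certs_uaddr = (uint64_t)(uintptr_t)certs,
		.certs_len = certs_len, /* must not exceed SEV_FW_BLOB_MAX_SIZE */
	};
	struct kvm_sev_cmd cmd = {
		.id = KVM_SEV_SNP_SET_CERTS,
		.data = (uint64_t)(uintptr_t)&params,
		.sev_fd = sev_fd,
	};

	/* A certs_len of 0 uninstalls the instance-specific certificates. */
	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}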

Cc: Tom Lendacky <[email protected]>
Cc: Paolo Bonzini <[email protected]>

Signed-off-by: Dionna Glaze <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: remove use of "we" and "this patch" in commit log, squash in
documentation patch]
Signed-off-by: Michael Roth <[email protected]>
[aik: snp_handle_ext_guest_request() now uses the CCP's cert object
without copying things over, only refcounting needed.]
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 44 +++++++
arch/x86/kvm/svm/sev.c | 115 ++++++++++++++++++
arch/x86/kvm/svm/svm.h | 1 +
include/linux/psp-sev.h | 2 +-
include/uapi/linux/kvm.h | 12 ++
5 files changed, 173 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b89634cfcc06..2ce6c90f07d4 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -537,6 +537,50 @@ Returns: 0 on success, -negative on error

See SEV-SNP specification for further details on launch finish input parameters.

+22. KVM_SEV_SNP_GET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_GET_CERTS command
+can be issued to request the data that has been installed with the
+KVM_SEV_SNP_SET_CERTS command.
+
+Parameters (in/out): struct kvm_sev_snp_get_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_get_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+ };
+
+If no certs have been installed, then the return value is -ENOENT.
+If the buffer specified in the struct is too small, the certs_len field will
+be overwritten with the number of bytes required to receive all the
+certificate data, and the return value will be -EINVAL.
+
+23. KVM_SEV_SNP_SET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_SET_CERTS command
+can be issued to override the /dev/sev certs data that is returned when a
+guest issues an extended guest request. This is useful for instance-specific
+extensions to the host certificates.
+
+Parameters (in/out): struct kvm_sev_snp_set_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+ };
+
+The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index efe879524b6c..602aaf82eef3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2301,6 +2301,113 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_get_certs params;
+ struct sev_snp_certs *snp_certs;
+ int rc = 0;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+ /* No instance certs set. */
+ if (!snp_certs)
+ return -ENOENT;
+
+ if (params.certs_len < snp_certs->len) {
+ /* Output buffer too small. Return the required size. */
+ params.certs_len = snp_certs->len;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+ sizeof(params)))
+ rc = -EFAULT;
+ else
+ rc = -EINVAL; /* May be ENOSPC? */
+ } else {
+ if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
+ snp_certs->data, snp_certs->len))
+ rc = -EFAULT;
+ }
+
+ sev_snp_certs_put(snp_certs);
+
+ return rc;
+}
+
+static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
+{
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = snp_certs;
+}
+
+static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ unsigned long length = SEV_FW_BLOB_MAX_SIZE;
+ struct kvm_sev_snp_set_certs params;
+ struct sev_snp_certs *snp_certs;
+ void *to_certs;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /*
+ * Setting a length of 0 is the same as "uninstalling" instance-
+ * specific certificates.
+ */
+ if (params.certs_len == 0) {
+ snp_replace_certs(sev, NULL);
+ return 0;
+ }
+
+ /* Page-align the length */
+ length = ALIGN(params.certs_len, PAGE_SIZE);
+
+ to_certs = kmalloc(length, GFP_KERNEL | __GFP_ZERO);
+ if (!to_certs)
+ return -ENOMEM;
+
+ if (copy_from_user(to_certs,
+ (void __user *)(uintptr_t)params.certs_uaddr,
+ params.certs_len)) {
+ ret = -EFAULT;
+ goto error_exit;
+ }
+
+ snp_certs = sev_snp_certs_new(to_certs, length);
+ if (!snp_certs) {
+ ret = -ENOMEM;
+ goto error_exit;
+ }
+
+ snp_replace_certs(sev, snp_certs);
+
+ return 0;
+error_exit:
+ kfree(to_certs);
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2400,6 +2507,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_FINISH:
r = snp_launch_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_GET_CERTS:
+ r = snp_get_instance_certs(kvm, &sev_cmd);
+ break;
+ case KVM_SEV_SNP_SET_CERTS:
+ r = snp_set_instance_certs(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2616,6 +2729,8 @@ static int snp_decommission_context(struct kvm *kvm)
snp_free_firmware_page(sev->snp_context);
sev->snp_context = NULL;

+ sev_snp_certs_put(sev->snp_certs);
+
return 0;
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1fd90a88b0db..bdf792ba06e1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -97,6 +97,7 @@ struct kvm_sev_info {
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
u64 sev_features; /* Features set at VMSA creation */
+ struct sev_snp_certs *snp_certs;
};

struct kvm_svm {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 3c605856ef4f..722e26d28d2f 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -14,7 +14,7 @@

#include <uapi/linux/psp-sev.h>

-#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */

struct sev_snp_certs {
void *data;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3af546adb962..0444e122ac5e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1973,6 +1973,8 @@ enum sev_cmd_id {
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
KVM_SEV_SNP_LAUNCH_FINISH,
+ KVM_SEV_SNP_GET_CERTS,
+ KVM_SEV_SNP_SET_CERTS,

KVM_SEV_NR_MAX,
};
@@ -2120,6 +2122,16 @@ struct kvm_sev_snp_launch_finish {
__u8 pad[6];
};

+struct kvm_sev_snp_get_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
+struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2023-10-16 13:48:05

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. These events allow an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
difference being an additional certificate blob that can be passed
through the SNP_SET_EXT_CONFIG ioctl defined in the CCP driver. KVM
obtains both the report and the certificate data at once by taking a
reference on the CCP driver's cached certificate object.
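
For reference, the GHCB layout for these NAE events (a paraphrased
summary of version 2 of the GHCB specification, as consumed by the
handlers added below, not a verbatim quote from the spec) is roughly:

  SW_EXITCODE    : SVM_VMGEXIT_GUEST_REQUEST or SVM_VMGEXIT_EXT_GUEST_REQUEST
  SW_EXITINFO1   : GPA of the page-aligned request page
  SW_EXITINFO2   : GPA of the page-aligned response page
  RAX (ext only) : GPA of the guest buffer for the certificate data
  RBX (ext only) : size of that buffer in 4K pages; on
                   SNP_GUEST_VMM_ERR_INVALID_LEN the hypervisor writes
                   back the number of pages required
  SW_EXITINFO2   : firmware/VMM status code, on return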

Co-developed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: ensure FW command failures are indicated to guest]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/sev.c | 176 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 1 +
drivers/crypto/ccp/sev-dev.c | 15 +++
include/linux/psp-sev.h | 1 +
4 files changed, 193 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 602aaf82eef3..d71ec257debb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
#include <linux/misc_cgroup.h>
#include <linux/processor.h>
#include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>

#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -339,6 +340,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
ret = verify_snp_init_flags(kvm, argp);
if (ret)
goto e_free;
+
+ mutex_init(&sev->guest_req_lock);
}

ret = sev_platform_init(&argp->error);
@@ -2345,8 +2348,10 @@ static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)

static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
{
+ mutex_lock(&sev->guest_req_lock);
sev_snp_certs_put(sev->snp_certs);
sev->snp_certs = snp_certs;
+ mutex_unlock(&sev->guest_req_lock);
}

static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -3218,6 +3223,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3627,6 +3634,163 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}

+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+ struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ kvm_pfn_t req_pfn, resp_pfn;
+ struct kvm_sev_info *sev;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+ return SEV_RET_INVALID_PARAM;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return SEV_RET_INVALID_ADDRESS;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+ int ret;
+
+ ret = snp_page_reclaim(pfn);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_sev_info *sev;
+ unsigned long rc;
+ int err;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+ if (rc)
+ /* Ensure an error value is returned to guest. */
+ rc = err ? err : SEV_RET_INVALID_ADDRESS;
+
+ snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request req = {0};
+ struct sev_snp_certs *snp_certs = NULL;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ unsigned long data_npages;
+ struct kvm_sev_info *sev;
+ unsigned long exitcode = 0;
+ u64 data_gpa;
+ int err, rc;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+ exitcode = SEV_RET_INVALID_ADDRESS;
+ goto e_fail;
+ }
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ /*
+ * If a VMM-specific certificate blob hasn't been provided, grab the
+ * host-wide one.
+ */
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+ if (!snp_certs)
+ snp_certs = sev_snp_global_certs_get();
+
+ /*
+ * If there is a host-wide or VMM-specific certificate blob available,
+ * make sure the guest has allocated enough space to store it.
+ * Otherwise, inform the guest how much space is needed.
+ */
+ if (snp_certs && (data_npages << PAGE_SHIFT) < snp_certs->len) {
+ vcpu->arch.regs[VCPU_REGS_RBX] = snp_certs->len >> PAGE_SHIFT;
+ exitcode = SNP_GUEST_VMM_ERR(SNP_GUEST_VMM_ERR_INVALID_LEN);
+ goto cleanup;
+ }
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req, &err);
+ if (rc) {
+ /* pass the firmware error code */
+ exitcode = err;
+ goto cleanup;
+ }
+
+ /* Copy the certificate blob in the guest memory */
+ if (snp_certs &&
+ kvm_write_guest(kvm, data_gpa, snp_certs->data, snp_certs->len))
+ exitcode = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+ sev_snp_certs_put(snp_certs);
+ snp_cleanup_guest_buf(&req, &exitcode);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3894,6 +4058,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->system_event.ndata = 1;
vcpu->run->system_event.data[0] = control->ghcb_gpa;
break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+ ret = 1;
+ break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ snp_handle_ext_guest_request(svm,
+ control->exit_info_1,
+ control->exit_info_2);
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bdf792ba06e1..3673a6e4e22e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -98,6 +98,7 @@ struct kvm_sev_info {
void *snp_context; /* SNP guest context page */
u64 sev_features; /* Features set at VMSA creation */
struct sev_snp_certs *snp_certs;
+ struct mutex guest_req_lock; /* Lock for guest request handling */
};

struct kvm_svm {
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 4807ddd6ec52..f9c75c561c4e 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2109,6 +2109,21 @@ void sev_snp_certs_put(struct sev_snp_certs *certs)
}
EXPORT_SYMBOL_GPL(sev_snp_certs_put);

+struct sev_snp_certs *sev_snp_global_certs_get(void)
+{
+ struct sev_device *sev;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ sev = psp_master->sev_data;
+ if (!sev->snp_initialized)
+ return NULL;
+
+ return sev_snp_certs_get(sev->snp_certs);
+}
+EXPORT_SYMBOL_GPL(sev_snp_global_certs_get);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 722e26d28d2f..3b294ccbbec9 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -25,6 +25,7 @@ struct sev_snp_certs {
struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
void sev_snp_certs_put(struct sev_snp_certs *certs);
+struct sev_snp_certs *sev_snp_global_certs_get(void);

/**
* SEV platform state
--
2.25.1

2023-10-16 13:48:41

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 50/50] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump

From: Ashish Kalra <[email protected]>

Add a kdump-safe version of sev_firmware_shutdown() registered as a
crash_kexec_post_notifier, which is invoked during panic/crash to do
SEV/SNP shutdown. This is required to transition all IOMMU pages to
reclaim/hypervisor state; otherwise, re-initialization of IOMMU pages
during crashdump kernel boot fails and panics the crashdump kernel.
Because this panic notifier runs in atomic context, it takes care not
to acquire any locks/mutexes and polls for PSP command completion
instead of depending on the PSP command completion interrupt.

Signed-off-by: Ashish Kalra <[email protected]>
[mdr: remove use of "we" in comments]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kernel/crash.c | 7 +++
drivers/crypto/ccp/sev-dev.c | 112 +++++++++++++++++++++++++----------
2 files changed, 89 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c92d88680dbf..23ede774d31b 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -59,6 +59,13 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
*/
cpu_emergency_stop_pt();

+ /*
+ * For SNP, do wbinvd() on the remote CPUs so that
+ * SNP_SHUTDOWN can safely be done on the local CPU.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ wbinvd();
+
disable_local_APIC();
}

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 26218df1371e..21a3064f30c9 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -21,6 +21,7 @@
#include <linux/hw_random.h>
#include <linux/ccp.h>
#include <linux/firmware.h>
+#include <linux/panic_notifier.h>
#include <linux/gfp.h>
#include <linux/cpufeature.h>
#include <linux/fs.h>
@@ -137,6 +138,26 @@ static int sev_wait_cmd_ioc(struct sev_device *sev,
{
int ret;

+ /*
+ * If invoked during panic handling, local interrupts are disabled,
+ * so the PSP command completion interrupt can't be used. Poll for
+ * PSP command completion instead.
+ */
+ if (irqs_disabled()) {
+ unsigned long timeout_usecs = (timeout * USEC_PER_SEC) / 10;
+
+ /* Poll for SEV command completion: */
+ while (timeout_usecs--) {
+ *reg = ioread32(sev->io_regs + sev->vdata->cmdresp_reg);
+ if (*reg & PSP_CMDRESP_RESP)
+ return 0;
+
+ udelay(10);
+ }
+
+ return -ETIMEDOUT;
+ }
+
ret = wait_event_timeout(sev->int_queue,
sev->int_rcvd, timeout * HZ);
if (!ret)
@@ -1058,17 +1079,6 @@ static int __sev_platform_shutdown_locked(int *error)
return ret;
}

-static int sev_platform_shutdown(int *error)
-{
- int rc;
-
- mutex_lock(&sev_cmd_mutex);
- rc = __sev_platform_shutdown_locked(NULL);
- mutex_unlock(&sev_cmd_mutex);
-
- return rc;
-}
-
static int sev_get_platform_state(int *state, int *error)
{
struct sev_user_data_status data;
@@ -1483,7 +1493,7 @@ static int __sev_snp_init_locked(int *error)
return rc;
}

-static int __sev_snp_shutdown_locked(int *error)
+static int __sev_snp_shutdown_locked(int *error, bool in_panic)
{
struct sev_device *sev = psp_master->sev_data;
struct sev_data_snp_shutdown_ex data;
@@ -1500,7 +1510,16 @@ static int __sev_snp_shutdown_locked(int *error)
sev_snp_certs_put(sev->snp_certs);
sev->snp_certs = NULL;

- wbinvd_on_all_cpus();
+ /*
+ * If invoked during panic handling, local interrupts are disabled
+ * and all CPUs are stopped, so wbinvd_on_all_cpus() can't be called.
+ * In that case, a wbinvd() is done on remote CPUs via the NMI
+ * callback, so only a local wbinvd() is needed here.
+ */
+ if (!in_panic)
+ wbinvd_on_all_cpus();
+ else
+ wbinvd();

retry:
ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
@@ -1543,17 +1562,6 @@ static int __sev_snp_shutdown_locked(int *error)
return ret;
}

-static int sev_snp_shutdown(int *error)
-{
- int rc;
-
- mutex_lock(&sev_cmd_mutex);
- rc = __sev_snp_shutdown_locked(error);
- mutex_unlock(&sev_cmd_mutex);
-
- return rc;
-}
-
static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
{
struct sev_device *sev = psp_master->sev_data;
@@ -2262,19 +2270,29 @@ int sev_dev_init(struct psp_device *psp)
return ret;
}

-static void sev_firmware_shutdown(struct sev_device *sev)
+static void __sev_firmware_shutdown(struct sev_device *sev, bool in_panic)
{
int error;

- sev_platform_shutdown(NULL);
+ __sev_platform_shutdown_locked(NULL);

if (sev_es_tmr) {
- /* The TMR area was encrypted, flush it from the cache */
- wbinvd_on_all_cpus();
+ /*
+ * The TMR area was encrypted; flush it from the cache.
+ *
+ * If invoked during panic handling, local interrupts are
+ * disabled and all CPUs are stopped, so wbinvd_on_all_cpus()
+ * can't be used. In that case, wbinvd() is done on remote CPUs
+ * via the NMI callback, so a local wbinvd() is sufficient here.
+ */
+ if (!in_panic)
+ wbinvd_on_all_cpus();
+ else
+ wbinvd();

__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
get_order(sev_es_tmr_size),
- false);
+ true);
sev_es_tmr = NULL;
}

@@ -2295,7 +2313,14 @@ static void sev_firmware_shutdown(struct sev_device *sev)
*/
free_snp_host_map(sev);

- sev_snp_shutdown(&error);
+ __sev_snp_shutdown_locked(&error, in_panic);
+}
+
+static void sev_firmware_shutdown(struct sev_device *sev)
+{
+ mutex_lock(&sev_cmd_mutex);
+ __sev_firmware_shutdown(sev, false);
+ mutex_unlock(&sev_cmd_mutex);
}

void sev_dev_destroy(struct psp_device *psp)
@@ -2313,6 +2338,28 @@ void sev_dev_destroy(struct psp_device *psp)
psp_clear_sev_irq_handler(psp);
}

+static int sev_snp_shutdown_on_panic(struct notifier_block *nb,
+ unsigned long reason, void *arg)
+{
+ struct sev_device *sev = psp_master->sev_data;
+
+ /*
+ * Panic callbacks are executed with all other CPUs stopped,
+ * so don't wait for sev_cmd_mutex to be released since it
+ * would block here forever.
+ */
+ if (mutex_is_locked(&sev_cmd_mutex))
+ return NOTIFY_DONE;
+
+ __sev_firmware_shutdown(sev, true);
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block sev_snp_panic_notifier = {
+ .notifier_call = sev_snp_shutdown_on_panic,
+};
+
int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
void *data, int *error)
{
@@ -2360,6 +2407,8 @@ void sev_pci_init(void)
dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
"-SNP" : "", sev->api_major, sev->api_minor, sev->build);

+ atomic_notifier_chain_register(&panic_notifier_list,
+ &sev_snp_panic_notifier);
return;

err:
@@ -2375,4 +2424,7 @@ void sev_pci_exit(void)
return;

sev_firmware_shutdown(sev);
+
+ atomic_notifier_chain_unregister(&panic_notifier_list,
+ &sev_snp_panic_notifier);
}
--
2.25.1

2023-10-16 13:49:18

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

From: Brijesh Singh <[email protected]>

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as
data replay, memory re-mapping, and more.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/amd.c | 5 +++--
tools/arch/x86/include/asm/cpufeatures.h | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 58cb9495e40f..1640cedd77f1 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -437,6 +437,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
#define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index dd8379d84445..14ee7f750cc7 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -630,8 +630,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
* SME feature (set in scattered.c).
* If the kernel has not enabled SME via any means then
* don't advertise the SME feature.
- * For SEV: If BIOS has not enabled SEV then don't advertise the
- * SEV and SEV_ES feature (set in scattered.c).
+ * For SEV: If BIOS has not enabled SEV then don't advertise SEV and
+ * any additional functionality based on it.
*
* In all cases, since support for SME and SEV requires long mode,
* don't advertise the feature under CONFIG_X86_32.
@@ -666,6 +666,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
}

diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 798e60b5454b..669f45eefa0c 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -432,6 +432,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
#define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
--
2.25.1

2023-10-16 13:50:34

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled

From: Kim Phillips <[email protected]>

Without SEV-SNP, Automatic IBRS protects only the kernel. But when
SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
host-side code, including userspace. This protection comes at a cost:
reduced userspace indirect branch performance.

To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
hosts. Fall back to retpolines instead.

Signed-off-by: Kim Phillips <[email protected]>
[mdr: squash in changes from review discussion]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kernel/cpu/common.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 382d4e6b848d..11fae89b799e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1357,8 +1357,13 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
/*
* AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
* flag and protect from vendor-specific bugs via the whitelist.
+ *
+ * Don't use AutoIBRS when SNP is enabled because it degrades host
+ * userspace indirect branch performance.
*/
- if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+ if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
+ (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+ !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
!(ia32_cap & ARCH_CAP_PBRSB_NO))
--
2.25.1

2023-10-16 13:50:39

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support

From: Brijesh Singh <[email protected]>

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
a number of steps needed to detect/enable SEV-SNP and RMP table support
on the host:

- Detect SEV-SNP support based on CPUID bit
- Initialize the RMP table memory reported by the RMP base/end MSR
registers and configure IOMMU to be compatible with RMP access
restrictions
- Set the MtrrFixDramModEn bit in SYSCFG MSR
- Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
- Configure IOMMU

The RMP table entry format is non-architectural and can vary by
processor; it is defined by the PPR. Restrict SNP support to CPU
models/families which are compatible with the current RMP table entry
format to guard against any undefined behavior when running on other
system types. Future models/support will handle this through an
architectural mechanism to allow for broader compatibility.

SNP host code depends on the CONFIG_KVM_AMD_SEV config flag, which may be
enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
instead of CONFIG_AMD_MEM_ENCRYPT.
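
To make the table layout concrete, here is a minimal sketch
(illustrative only, not the patch's exact lookup code) of how an RMP
entry for a given PFN is located once the table has been mapped:

/*
 * rmptable_start points just past the 16KB of processor bookkeeping
 * at RMP_BASE, so the table can be indexed directly by system PFN.
 */
static struct rmpentry *rmptable_entry_sketch(u64 pfn)
{
	if (!rmptable_start || pfn > rmptable_max_pfn)
		return NULL;

	return rmptable_start + pfn;
}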

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Co-developed-by: Tom Lendacky <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: rework commit message to be clearer about what patch does, squash
in early_rmptable_check() handling from Tom]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/Kbuild | 2 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/include/asm/sev.h | 6 +
arch/x86/kernel/cpu/amd.c | 19 ++
arch/x86/virt/svm/Makefile | 3 +
arch/x86/virt/svm/sev.c | 239 +++++++++++++++++++++++
drivers/iommu/amd/init.c | 2 +-
include/linux/amd-iommu.h | 2 +-
9 files changed, 288 insertions(+), 4 deletions(-)
create mode 100644 arch/x86/virt/svm/Makefile
create mode 100644 arch/x86/virt/svm/sev.c

diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index 5a83da703e87..6a1f36df6a18 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -28,5 +28,7 @@ obj-y += net/

obj-$(CONFIG_KEXEC_FILE) += purgatory/

+obj-y += virt/svm/
+
# for cleaning
subdir- += boot tools
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 702d93fdd10e..83efd407033b 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -117,6 +117,12 @@
#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31))
#endif

+#ifdef CONFIG_KVM_AMD_SEV
+# define DISABLE_SEV_SNP 0
+#else
+# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -141,7 +147,7 @@
DISABLE_ENQCMD)
#define DISABLED_MASK17 0
#define DISABLED_MASK18 (DISABLE_IBT)
-#define DISABLED_MASK19 0
+#define DISABLED_MASK19 (DISABLE_SEV_SNP)
#define DISABLED_MASK20 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d111350197f..2be74afb4cbd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -589,6 +589,8 @@
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE 0xc0010132
+#define MSR_AMD64_RMP_END 0xc0010133

/* SNP feature bits enabled by the hypervisor */
#define MSR_AMD64_SNP_VTOM BIT_ULL(3)
@@ -690,7 +692,14 @@
#define MSR_K8_TOP_MEM2 0xc001001d
#define MSR_AMD64_SYSCFG 0xc0010010
#define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
+#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT 19
+#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 5b4a1ce3d368..b05fcd0ab7e4 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -243,4 +243,10 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
static inline u64 sev_get_status(void) { return 0; }
#endif

+#ifdef CONFIG_KVM_AMD_SEV
+bool snp_get_rmptable_info(u64 *start, u64 *len);
+#else
+static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
+#endif
+
#endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 14ee7f750cc7..6cc2074fcea3 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -20,6 +20,7 @@
#include <asm/delay.h>
#include <asm/debugreg.h>
#include <asm/resctrl.h>
+#include <asm/sev.h>

#ifdef CONFIG_X86_64
# include <asm/mmconfig.h>
@@ -618,6 +619,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
resctrl_cpu_detect(c);
}

+static bool early_rmptable_check(void)
+{
+ u64 rmp_base, rmp_size;
+
+ /*
+ * For early BSP initialization, max_pfn won't be set up yet; wait until
+ * it is set before performing the RMP table calculations.
+ */
+ if (!max_pfn)
+ return true;
+
+ return snp_get_rmptable_info(&rmp_base, &rmp_size);
+}
+
static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
{
u64 msr;
@@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
if (!(msr & MSR_K7_HWCR_SMMLOCK))
goto clear_sev;

+ if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
+ goto clear_snp;
+
return;

clear_all:
@@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+clear_snp:
setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
}
diff --git a/arch/x86/virt/svm/Makefile b/arch/x86/virt/svm/Makefile
new file mode 100644
index 000000000000..ef2a31bdcc70
--- /dev/null
+++ b/arch/x86/virt/svm/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_KVM_AMD_SEV) += sev.o
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
new file mode 100644
index 000000000000..8b9ed72489e4
--- /dev/null
+++ b/arch/x86/virt/svm/sev.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <[email protected]>
+ *
+ */
+
+#include <linux/cc_platform.h>
+#include <linux/printk.h>
+#include <linux/mm_types.h>
+#include <linux/set_memory.h>
+#include <linux/memblock.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>
+
+#include <asm/sev.h>
+#include <asm/processor.h>
+#include <asm/setup.h>
+#include <asm/svm.h>
+#include <asm/smp.h>
+#include <asm/cpu.h>
+#include <asm/apic.h>
+#include <asm/cpuid.h>
+#include <asm/cmdline.h>
+#include <asm/iommu.h>
+
+/*
+ * The RMP entry format is not architectural. The format is defined in the
+ * PPR for Family 19h Model 01h, Rev B1 processors.
+ */
+struct rmpentry {
+ u64 assigned : 1,
+ pagesize : 1,
+ immutable : 1,
+ rsvd1 : 9,
+ gpa : 39,
+ asid : 10,
+ vmsa : 1,
+ validated : 1,
+ rsvd2 : 1;
+ u64 rsvd3;
+} __packed;
+
+/*
+ * The first 16KB from RMP_BASE is used by the processor for bookkeeping;
+ * this offset must be accounted for when performing RMP entry lookups.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+
+static struct rmpentry *rmptable_start __ro_after_init;
+static u64 rmptable_max_pfn __ro_after_init;
+
+#undef pr_fmt
+#define pr_fmt(fmt) "SEV-SNP: " fmt
+
+static int __mfd_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_MFDM;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void mfd_enable(void *arg)
+{
+ __mfd_enable(smp_processor_id());
+}
+
+static int __snp_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_SNP_EN;
+ val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+ __snp_enable(smp_processor_id());
+}
+
+#define RMP_ADDR_MASK GENMASK_ULL(51, 13)
+
+bool snp_get_rmptable_info(u64 *start, u64 *len)
+{
+ u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
+
+ rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+ rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+ if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
+ pr_err("Memory for the RMP table has not been reserved by BIOS\n");
+ return false;
+ }
+
+ if (rmp_base > rmp_end) {
+ pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
+ return false;
+ }
+
+ rmp_sz = rmp_end - rmp_base + 1;
+
+ /*
+ * Calculate the amount of memory that must be reserved by the BIOS to
+ * cover the whole of system RAM, including the bookkeeping area. The RMP itself
+ * must also be covered.
+ */
+ max_rmp_pfn = max_pfn;
+ if (PHYS_PFN(rmp_end) > max_pfn)
+ max_rmp_pfn = PHYS_PFN(rmp_end);
+
+ calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+ if (calc_rmp_sz > rmp_sz) {
+ pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+ calc_rmp_sz, rmp_sz);
+ return false;
+ }
+
+ *start = rmp_base;
+ *len = rmp_sz;
+
+ return true;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+ u64 rmp_base, rmp_size;
+ void *rmp_start;
+ u64 val;
+
+ if (!snp_get_rmptable_info(&rmp_base, &rmp_size))
+ return 1;
+
+ pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
+ rmp_base, rmp_base + rmp_size - 1);
+
+ rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB);
+ if (!rmp_start) {
+ pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size);
+ return 1;
+ }
+
+ /*
+ * Check if SEV-SNP is already enabled; this can happen in the case of
+ * a kexec boot.
+ */
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+ if (val & MSR_AMD64_SYSCFG_SNP_EN)
+ goto skip_enable;
+
+ /* Initialize the RMP table to zero */
+ memset(rmp_start, 0, rmp_size);
+
+ /* Flush the caches to ensure that data is written before SNP is enabled. */
+ wbinvd_on_all_cpus();
+
+ /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
+ on_each_cpu(mfd_enable, NULL, 1);
+
+ /* Enable SNP on all CPUs. */
+ on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+ rmp_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
+ rmp_size -= RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+ rmptable_start = (struct rmpentry *)rmp_start;
+ rmptable_max_pfn = rmp_size / sizeof(struct rmpentry) - 1;
+
+ return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+ int family, model;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ family = boot_cpu_data.x86;
+ model = boot_cpu_data.x86_model;
+
+ /*
+ * The RMP table entry format is not architectural; it can vary by processor
+ * and is defined by the per-processor PPR. Restrict SNP support to the CPU
+ * models and families for which the RMP table entry format is currently defined.
+ */
+ if (family != 0x19 || model > 0xaf)
+ goto nosnp;
+
+ if (amd_iommu_snp_enable())
+ goto nosnp;
+
+ if (__snp_rmptable_init())
+ goto nosnp;
+
+ cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+ return 0;
+
+nosnp:
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+ return -ENOSYS;
+}
+
+/*
+ * This must be called after the PCI subsystem is initialized, because
+ * amd_iommu_snp_enable() verifies that the IOMMU supports the SEV-SNP
+ * feature, and the IOMMU is only set up after subsys_initcall().
+ *
+ * NOTE: An IOMMU is required by SNP to ensure that the hypervisor cannot
+ * program DMA directly into guest private memory. In the case of SNP, the
+ * IOMMU ensures that the page(s) used for DMA are hypervisor-owned.
+ */
+fs_initcall(snp_rmptable_init);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 45efb7e5d725..1c9924de607a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3802,7 +3802,7 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
}

-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void)
{
/*
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 99a5201d9e62..55fc03cb3968 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -205,7 +205,7 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn,
u64 *value);
struct amd_iommu *get_amd_iommu(unsigned int idx);

-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void);
#endif

--
2.25.1

2023-10-16 13:51:31

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers

From: Brijesh Singh <[email protected]>

snp_lookup_rmpentry() can be used by the host to read the RMP entry for a
given page. The RMP entry format is documented in the AMD PPR; see
https://bugzilla.kernel.org/attachment.cgi?id=296015.
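
A typical caller would look something like the following (an illustrative
sketch, not part of this patch; pfn is assumed to be a valid system-memory
PFN):

        bool assigned;
        int level, ret;

        ret = snp_lookup_rmpentry(pfn, &assigned, &level);
        if (ret)
                return ret;

        /*
         * level is the x86 page level (PG_LEVEL_4K or PG_LEVEL_2M) of the
         * authoritative RMP entry covering this PFN.
         */
        if (assigned)
                pr_debug("PFN 0x%llx is guest-owned at level %d\n", pfn, level);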

Co-developed-by: Ashish Kalra <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
[mdr: separate 'assigned' indicator from return code]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-common.h | 4 +++
arch/x86/include/asm/sev-host.h | 22 +++++++++++++
arch/x86/virt/svm/sev.c | 53 +++++++++++++++++++++++++++++++
3 files changed, 79 insertions(+)
create mode 100644 arch/x86/include/asm/sev-host.h

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b463fcbd4b90..1e6fb93d8ab0 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -173,4 +173,8 @@ struct snp_psc_desc {
#define GHCB_ERR_INVALID_INPUT 5
#define GHCB_ERR_INVALID_EVENT 6

+/* RMP page size */
+#define RMP_PG_SIZE_4K 0
+#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+
#endif
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
new file mode 100644
index 000000000000..4c487ce8457f
--- /dev/null
+++ b/arch/x86/include/asm/sev-host.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <[email protected]>
+ *
+ */
+
+#ifndef __ASM_X86_SEV_HOST_H
+#define __ASM_X86_SEV_HOST_H
+
+#include <asm/sev-common.h>
+
+#ifdef CONFIG_KVM_AMD_SEV
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
+#else
+static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
+#endif
+
+#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 8b9ed72489e4..7d3802605376 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -53,6 +53,9 @@ struct rmpentry {
*/
#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000

+/* Mask to apply to a PFN to get the first PFN of a 2MB page */
+#define PFN_PMD_MASK (~((1ULL << (PMD_SHIFT - PAGE_SHIFT)) - 1))
+
static struct rmpentry *rmptable_start __ro_after_init;
static u64 rmptable_max_pfn __ro_after_init;

@@ -237,3 +240,53 @@ static int __init snp_rmptable_init(void)
* the page(s) used for DMA are hypervisor owned.
*/
fs_initcall(snp_rmptable_init);
+
+static int rmptable_entry(u64 pfn, struct rmpentry *entry)
+{
+ if (WARN_ON_ONCE(pfn > rmptable_max_pfn))
+ return -EFAULT;
+
+ *entry = rmptable_start[pfn];
+
+ return 0;
+}
+
+static int __snp_lookup_rmpentry(u64 pfn, struct rmpentry *entry, int *level)
+{
+ struct rmpentry large_entry;
+ int ret;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ ret = rmptable_entry(pfn, entry);
+ if (ret)
+ return ret;
+
+ /*
+ * Find the authoritative RMP entry for a PFN. This can be either a 4K
+ * RMP entry or a special large RMP entry that is authoritative for a
+ * whole 2M area.
+ */
+ ret = rmptable_entry(pfn & PFN_PMD_MASK, &large_entry);
+ if (ret)
+ return ret;
+
+ *level = RMP_TO_X86_PG_LEVEL(large_entry.pagesize);
+
+ return 0;
+}
+
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
+{
+ struct rmpentry e;
+ int ret;
+
+ ret = __snp_lookup_rmpentry(pfn, &e, level);
+ if (ret)
+ return ret;
+
+ *assigned = !!e.assigned;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
--
2.25.1

2023-10-16 14:01:18

by Michael Roth

[permalink] [raw]
Subject: [PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries

From: Brijesh Singh <[email protected]>

This information will be useful for debugging things like page faults
due to RMP access violations and RMPUPDATE failures.
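
For example, the page fault handling code can dump the entry for a faulting
address along these lines (a sketch only; X86_PF_RMP is the RMP-violation
bit in the page fault error code, wired up elsewhere in this series):

        if (error_code & X86_PF_RMP)
                sev_dump_hva_rmpentry(address);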

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
[mdr: move helper to standalone patch]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev-host.h | 2 +
arch/x86/virt/svm/sev.c | 77 +++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)

diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 4c487ce8457f..bb06c57f2909 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -15,8 +15,10 @@

#ifdef CONFIG_KVM_AMD_SEV
int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
+void sev_dump_hva_rmpentry(unsigned long address);
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
+static inline void sev_dump_hva_rmpentry(unsigned long address) {}
#endif

#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 7d3802605376..cac3e311c38f 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -290,3 +290,80 @@ int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
return 0;
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+/*
+ * Dump the raw RMP entry for a particular PFN. These bits are documented in the
+ * PPR for a particular CPU model and provide useful information about how a
+ * particular PFN is being utilized by the kernel/firmware at the time certain
+ * unexpected events occur, such as RMP faults.
+ */
+static void sev_dump_rmpentry(u64 dumped_pfn)
+{
+ struct rmpentry e;
+ u64 pfn, pfn_end;
+ int level, ret;
+ u64 *e_data;
+
+ ret = __snp_lookup_rmpentry(dumped_pfn, &e, &level);
+ if (ret) {
+ pr_info("Failed to read RMP entry for PFN 0x%llx, error %d\n",
+ dumped_pfn, ret);
+ return;
+ }
+
+ e_data = (u64 *)&e;
+ if (e.assigned) {
+ pr_info("RMP entry for PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+ dumped_pfn, e_data[1], e_data[0]);
+ return;
+ }
+
+ /*
+ * If the RMP entry for a particular PFN is not in an assigned state,
+ * then it is sometimes useful to get an idea of whether or not any RMP
+ * entries for other PFNs within the same 2MB region are assigned, since
+ * those too can affect the ability to access a particular PFN in
+ * certain situations, such as when the PFN is being accessed via a 2MB
+ * mapping in the host page table.
+ */
+ pfn = ALIGN_DOWN(dumped_pfn, PTRS_PER_PMD);
+ pfn_end = pfn + PTRS_PER_PMD;
+
+ while (pfn < pfn_end) {
+ ret = __snp_lookup_rmpentry(pfn, &e, &level);
+ if (ret) {
+ pr_info_ratelimited("Failed to read RMP entry for PFN 0x%llx\n", pfn);
+ pfn++;
+ continue;
+ }
+
+ if (e_data[0] || e_data[1]) {
+ pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+ dumped_pfn, pfn, e_data[1], e_data[0]);
+ return;
+ }
+ pfn++;
+ }
+
+ pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
+ dumped_pfn);
+}
+
+void sev_dump_hva_rmpentry(unsigned long hva)
+{
+ unsigned int level;
+ pgd_t *pgd;
+ pte_t *pte;
+
+ pgd = __va(read_cr3_pa());
+ pgd += pgd_index(hva);
+ pte = lookup_address_in_pgd(pgd, hva, &level);
+
+ if (!pte) {
+ pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
+ return;
+ }
+
+ sev_dump_rmpentry(pte_pfn(*pte));
+}
+EXPORT_SYMBOL_GPL(sev_dump_hva_rmpentry);
--
2.25.1

2023-10-16 15:12:26

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

On Mon, Oct 16, 2023 at 08:27:30AM -0500, Michael Roth wrote:
> From: Paolo Bonzini <[email protected]>
>
> svm_recalc_instruction_intercepts() is always called at least once
> before the vCPU is started, so the setting or clearing of the RDTSCP
> intercept can be dropped from the TSC_AUX virtualization support.
>
> Extracted from a patch by Tom Lendacky.
>
> Cc: [email protected]
> Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
> Signed-off-by: Paolo Bonzini <[email protected]>
> (cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)

What stable tree(s) are you wanting this applied to (same for the others
in this series)? It's already in the 6.1.56 release, and the Fixes tag
is for 5.19, so I don't see where it could be missing from?

thanks,

greg k-h

2023-10-16 15:15:57

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

On 10/16/23 17:12, Greg KH wrote:
> On Mon, Oct 16, 2023 at 08:27:30AM -0500, Michael Roth wrote:
>> From: Paolo Bonzini <[email protected]>
>>
>> svm_recalc_instruction_intercepts() is always called at least once
>> before the vCPU is started, so the setting or clearing of the RDTSCP
>> intercept can be dropped from the TSC_AUX virtualization support.
>>
>> Extracted from a patch by Tom Lendacky.
>>
>> Cc: [email protected]
>> Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
>> Signed-off-by: Paolo Bonzini <[email protected]>
>> (cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/kvm/svm/sev.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> What stable tree(s) are you wanting this applied to (same for the others
> in this series)? It's already in the 6.1.56 release, and the Fixes tag
> is for 5.19, so I don't see where it could be missing from?

I think it's missing in the (destined for 6.7) tree that Michael is
basing this series on, so he's cherry-picking it from Linus's tree.

Paolo

2023-10-16 15:22:23

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

On Mon, Oct 16, 2023 at 05:14:38PM +0200, Paolo Bonzini wrote:
> On 10/16/23 17:12, Greg KH wrote:
> > On Mon, Oct 16, 2023 at 08:27:30AM -0500, Michael Roth wrote:
> > > From: Paolo Bonzini <[email protected]>
> > >
> > > svm_recalc_instruction_intercepts() is always called at least once
> > > before the vCPU is started, so the setting or clearing of the RDTSCP
> > > intercept can be dropped from the TSC_AUX virtualization support.
> > >
> > > Extracted from a patch by Tom Lendacky.
> > >
> > > Cc: [email protected]
> > > Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
> > > Signed-off-by: Paolo Bonzini <[email protected]>
> > > (cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
> > > Signed-off-by: Michael Roth <[email protected]>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 5 +----
> > > 1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > What stable tree(s) are you wanting this applied to (same for the others
> > in this series)? It's already in the 6.1.56 release, and the Fixes tag
> > is for 5.19, so I don't see where it could be missing from?
>
> I think it's missing in the (destined for 6.7) tree that Michael is basing
> this series on, so he's cherry-picking it from Linus's tree.

Yes, this and PATCH #2 are both prereqs that have already been applied
upstream, and are only being included in this series because they are
prereqs for PATCH #3, which is new. Sorry for any confusion.

-Mike

>
> Paolo
>

2023-10-16 23:18:38

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

> +
> + /*
> + * If a VMM-specific certificate blob hasn't been provided, grab the
> + * host-wide one.
> + */
> + snp_certs = sev_snp_certs_get(sev->snp_certs);
> + if (!snp_certs)
> + snp_certs = sev_snp_global_certs_get();
> +

This is where the generation I suggested adding would get checked. If
the instance certs' generation is not the global generation, then I
think we need a way to return to the VMM to make that right before
continuing to provide outdated certificates.
This might be an unreasonable request, but the fact that the certs and
reported_tcb can be set while a VM is running makes this an issue.

--
-Dionna Glaze, PhD (she/her)

2023-10-18 02:29:29

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event


On 18/10/23 03:27, Sean Christopherson wrote:
> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>> +
>>> + /*
>>> + * If a VMM-specific certificate blob hasn't been provided, grab the
>>> + * host-wide one.
>>> + */
>>> + snp_certs = sev_snp_certs_get(sev->snp_certs);
>>> + if (!snp_certs)
>>> + snp_certs = sev_snp_global_certs_get();
>>> +
>>
>> This is where the generation I suggested adding would get checked. If
>> the instance certs' generation is not the global generation, then I
>> think we need a way to return to the VMM to make that right before
>> continuing to provide outdated certificates.
>> This might be an unreasonable request, but the fact that the certs and
>> reported_tcb can be set while a VM is running makes this an issue.
>
> Before we get that far, the changelogs need to explain why the kernel is storing
> userspace blobs in the first place. The whole thing is a bit of a mess.
>
> sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
> bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
> while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
> between bumping the refcount and grabbing the pointer, KVM will end up leaking a
> refcount and consuming a pointer without a refcount.
>
> if (!kref_get_unless_zero(&certs->kref))
> return NULL;
>
> return certs;

I'm missing something here. The @certs pointer is on the stack, so if it is
being released elsewhere, kref_get_unless_zero() is going to fail and
return NULL. How can this @certs not have its refcount incremented?


> If allocating memory for the certs fails, the kernel will have set the config
> but not store the corresponding certs.


Ah true.

> ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
> if (ret)
> goto e_free;
>
> memcpy(&sev->snp_config, &config, sizeof(config));
> }
>
> /*
> * If the new certs are passed then cache it else free the old certs.
> */
> if (input.certs_len) {
> snp_certs = sev_snp_certs_new(certs, input.certs_len);
> if (!snp_certs) {
> ret = -ENOMEM;
> goto e_free;
> }
> }
>
> Reasoning about ordering is also difficult, e.g. what is KVM's contract with
> userspace in terms of recognizing new global certs?
>
> I don't understand why the kernel needs to manage the certs. AFAICT the so called
> global certs aren't an input to SEV_CMD_SNP_CONFIG, i.e. SNP_SET_EXT_CONFIG is
> purely a software defined thing.
> The easiest solution I can think of is to have KVM provide a chunk of memory in
> kvm_sev_info for SNP guests that userspace can mmap(), a la vcpu->run.
>
> struct sev_snp_certs {
> u8 data[KVM_MAX_SEV_SNP_CERT_SIZE];
> u32 size;
> u8 pad[<size to make the struct page aligned>];
> };
>
> When the guest requests the certs, KVM does something like:
>
> certs_size = READ_ONCE(sev->snp_certs->size);
> if (certs_size > sizeof(sev->snp_certs->data) ||
> !IS_ALIGNED(certs_size, PAGE_SIZE))
> certs_size = 0;
>
> if (certs_size && (data_npages << PAGE_SHIFT) < certs_size) {
> vcpu->arch.regs[VCPU_REGS_RBX] = certs_size >> PAGE_SHIFT;
> exitcode = SNP_GUEST_VMM_ERR(SNP_GUEST_VMM_ERR_INVALID_LEN);
> goto cleanup;
> }
>
> ...
>
> if (certs_size &&
> kvm_write_guest(kvm, data_gpa, sev->snp_certs->data, certs_size))
> exitcode = SEV_RET_INVALID_ADDRESS;
>
> If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
> That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
> concern.

The global cert lives in CCP (/dev/sev), while the per-VM cert lives in the
kvmvm_fd. "A la vcpu->run" is fine for the latter, but for the former we
need something else. And there is a scenario where one global certs blob is
all that is needed, and copying it over multiple VMs seems suboptimal.

> If userspace needs to *stall* cert requests, e.g. while the certs are being updated,

afaik it does not need to.

> then that's a different issue entirely. If the GHCB allows telling the guest to
> retry the request, then it should be trivially easy to solve, e.g. add a flag in
> sev_snp_certs. If KVM must "immediately" handle the request, then we'll need more
> elaborate uAPI.


--
Alexey


2023-10-18 13:49:37

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>
> On 18/10/23 03:27, Sean Christopherson wrote:
> > On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
> > > > +
> > > > + /*
> > > > + * If a VMM-specific certificate blob hasn't been provided, grab the
> > > > + * host-wide one.
> > > > + */
> > > > + snp_certs = sev_snp_certs_get(sev->snp_certs);
> > > > + if (!snp_certs)
> > > > + snp_certs = sev_snp_global_certs_get();
> > > > +
> > >
> > > This is where the generation I suggested adding would get checked. If
> > > the instance certs' generation is not the global generation, then I
> > > think we need a way to return to the VMM to make that right before
> > > continuing to provide outdated certificates.
> > > This might be an unreasonable request, but the fact that the certs and
> > > reported_tcb can be set while a VM is running makes this an issue.
> >
> > Before we get that far, the changelogs need to explain why the kernel is storing
> > userspace blobs in the first place. The whole thing is a bit of a mess.
> >
> > sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
> > bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
> > while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
> > between bumping the refcount and grabbing the pointer, KVM will end up leaking a
> > refcount and consuming a pointer without a refcount.
> >
> > if (!kref_get_unless_zero(&certs->kref))
> > return NULL;
> >
> > return certs;
>
> I'm missing something here. The @certs pointer is on the stack,

No, nothing guarantees that @certs is on the stack and will never be reloaded.
sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
possible that it can be inlined. Then you end up with:

struct sev_device *sev;

if (!psp_master || !psp_master->sev_data)
return NULL;

sev = psp_master->sev_data;
if (!sev->snp_initialized)
return NULL;

if (!sev->snp_certs)
return NULL;

if (!kref_get_unless_zero(&sev->snp_certs->kref))
return NULL;

return sev->snp_certs;

At which point the compiler could choose to omit a local variable entirely, it
could store @certs in a register and reload after kref_get_unless_zero(), etc.
If psp_master->sev_data->snp_certs is changed at any point, odd things can happen.

That atomic operation in kref_get_unless_zero() might prevent a reload between
getting the kref and the return, but it wouldn't prevent a reload between the
!NULL check and kref_get_unless_zero().

> > If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
> > That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
> > concern.
>
> The global cert lives in CCP (/dev/sev), while the per-VM cert lives in the
> kvmvm_fd. "A la vcpu->run" is fine for the latter, but for the former we need
> something else.

Why? The cert ultimately comes from userspace, no? Make userspace deal with it.

> And there is a scenario where one global certs blob is all that is needed,
> and copying it over multiple VMs seems suboptimal.

That's a solvable problem. I'm not sure I like the most obvious solution, but it
is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
or via an ioctl().

FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
userspace pointer would suffice. The benefit of a kernel controlled pointer is
that it doesn't require copying to a kernel buffer (or special code to copy from
userspace into guest).

Actually, looking at the flow again, AFAICT there's nothing special about the
target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
KVM doesn't need to do conversions, there's no kernel privileges required, etc.
And the GHCB doesn't dictate ordering between storing the certificates and doing
the request. That means the certificate stuff can be punted entirely to userspace.

Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
for non-SNP guests:

unsigned long exitcode = 0;
u64 data_gpa;
int err, rc;

if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
goto e_fail;
}

e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);

Which really highlights that we need to get test infrastructure up and running
for SEV-ES, SNP, and TDX.

Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
is the definition of KVM_HC_SNP_GET_CERTS and its arguments.

static void snp_handle_guest_request(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
struct sev_data_snp_guest_request data = {0};
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
gpa_t req_gpa = control->exit_info_1;
gpa_t resp_gpa = control->exit_info_2;
unsigned long rc;
int err;

if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST;
goto e_fail;
}

sev = &to_kvm_svm(kvm)->sev_info;

mutex_lock(&sev->guest_req_lock);

rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
if (rc)
goto unlock;

rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
if (rc)
/* Ensure an error value is returned to guest. */
rc = err ? err : SEV_RET_INVALID_ADDRESS;

snp_cleanup_guest_buf(&data, &rc);

unlock:
mutex_unlock(&sev->guest_req_lock);

e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
}

static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
{
u64 certs_exitcode = vcpu->run->hypercall.args[2];
struct vcpu_svm *svm = to_svm(vcpu);

if (certs_exitcode)
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
else
snp_handle_guest_request(svm);
return 1;
}

static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
unsigned long exitcode;
u64 data_gpa;

if (!sev_snp_guest(vcpu->kvm)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
return 1;
}

data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
return 1;
}

vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
vcpu->run->hypercall.args[0] = data_gpa;
vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
return 0;
}

2023-10-18 20:27:23

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On 10/18/2023 8:48 AM, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>>
>> On 18/10/23 03:27, Sean Christopherson wrote:
>>> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>>>> +
>>>>> + /*
>>>>> + * If a VMM-specific certificate blob hasn't been provided, grab the
>>>>> + * host-wide one.
>>>>> + */
>>>>> + snp_certs = sev_snp_certs_get(sev->snp_certs);
>>>>> + if (!snp_certs)
>>>>> + snp_certs = sev_snp_global_certs_get();
>>>>> +
>>>>
>>>> This is where the generation I suggested adding would get checked. If
>>>> the instance certs' generation is not the global generation, then I
>>>> think we need a way to return to the VMM to make that right before
>>>> continuing to provide outdated certificates.
>>>> This might be an unreasonable request, but the fact that the certs and
>>>> reported_tcb can be set while a VM is running makes this an issue.
>>>
>>> Before we get that far, the changelogs need to explain why the kernel is storing
>>> userspace blobs in the first place. The whole thing is a bit of a mess.
>>>
>>> sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
>>> bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
>>> while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
>>> between bumping the refcount and grabbing the pointer, KVM will end up leaking a
>>> refcount and consuming a pointer without a refcount.
>>>
>>> if (!kref_get_unless_zero(&certs->kref))
>>> return NULL;
>>>
>>> return certs;
>>
>> I'm missing something here. The @certs pointer is on the stack,
>
> No, nothing guarantees that @certs is on the stack and will never be reloaded.
> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
> possible that it can be inlined. Then you end up with:
>
> struct sev_device *sev;
>
> if (!psp_master || !psp_master->sev_data)
> return NULL;
>
> sev = psp_master->sev_data;
> if (!sev->snp_initialized)
> return NULL;
>
> if (!sev->snp_certs)
> return NULL;
>
> if (!kref_get_unless_zero(&sev->snp_certs->kref))
> return NULL;
>
> return sev->snp_certs;
>
> At which point the compiler could choose to omit a local variable entirely, it
> could store @certs in a register and reload after kref_get_unless_zero(), etc.
> If psp_master->sev_data->snp_certs is changed at any point, odd things can happen.
>
> That atomic operation in kref_get_unless_zero() might prevent a reload between
> getting the kref and the return, but it wouldn't prevent a reload between the
> !NULL check and kref_get_unless_zero().
>
>>> If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
>>> That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
>>> concern.
>>
>> The global cert lives in CCP (/dev/sev), while the per-VM cert lives in the
>> kvmvm_fd. "A la vcpu->run" is fine for the latter, but for the former we need
>> something else.
>
> Why? The cert ultimately comes from userspace, no? Make userspace deal with it.
>
>> And there is a scenario where one global certs blob is all that is needed,
>> and copying it over multiple VMs seems suboptimal.
>
> That's a solvable problem. I'm not sure I like the most obvious solution, but it
> is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
> or via an ioctl().
>
> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
> userspace pointer would suffice. The benefit of a kernel controlled pointer is
> that it doesn't require copying to a kernel buffer (or special code to copy from
> userspace into guest).
>
> Actually, looking at the flow again, AFAICT there's nothing special about the
> target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
> KVM doesn't need to do conversions, there's no kernel privileges required, etc.
> And the GHCB doesn't dictate ordering between storing the certificates and doing
> the request.

That's true.

> That means the certificate stuff can be punted entirely to userspace.

>
> Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
> for non-SNP guests:
>
> unsigned long exitcode = 0;
> u64 data_gpa;
> int err, rc;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
> goto e_fail;
> }
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>
> Which really highlights that we need to get test infrastructure up and running
> for SEV-ES, SNP, and TDX.
>
> Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
>
> static void snp_handle_guest_request(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> struct sev_data_snp_guest_request data = {0};
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> gpa_t req_gpa = control->exit_info_1;
> gpa_t resp_gpa = control->exit_info_2;
> unsigned long rc;
> int err;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST;
> goto e_fail;
> }
>
> sev = &to_kvm_svm(kvm)->sev_info;
>
> mutex_lock(&sev->guest_req_lock);
>
> rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> if (rc)
> goto unlock;
>
> rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> if (rc)
> /* Ensure an error value is returned to guest. */
> rc = err ? err : SEV_RET_INVALID_ADDRESS;
>
> snp_cleanup_guest_buf(&data, &rc);
>
> unlock:
> mutex_unlock(&sev->guest_req_lock);
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
> }
>
> static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
> {
> u64 certs_exitcode = vcpu->run->hypercall.args[2];
> struct vcpu_svm *svm = to_svm(vcpu);
>
> if (certs_exitcode)
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
> else
> snp_handle_guest_request(svm);
> return 1;
> }
>
> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> unsigned long exitcode;
> u64 data_gpa;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> return 1;
> }
>
> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> return 1;
> }
>
> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> vcpu->run->hypercall.args[0] = data_gpa;
> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> return 0;
> }
>

IIRC, the important consideration here is to ensure that getting the
attestation report and retrieving the certificates appears atomic to the
guest. When SNP live migration is supported we don't want a case where
the guest could have migrated between the call to obtain the
certificates and obtaining the attestation report, which can potentially
cause failure of validation of the attestation report.

Thanks,
Ashish

2023-10-18 20:39:09

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On Wed, Oct 18, 2023, Ashish Kalra wrote:
> > static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> > {
> > struct kvm_vcpu *vcpu = &svm->vcpu;
> > struct kvm *kvm = vcpu->kvm;
> > struct kvm_sev_info *sev;
> > unsigned long exitcode;
> > u64 data_gpa;
> >
> > if (!sev_snp_guest(vcpu->kvm)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> > return 1;
> > }
> >
> > data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> > return 1;
> > }
> >
> > vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> > vcpu->run->hypercall.args[0] = data_gpa;
> > vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> > vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
> > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > return 0;
> > }
> >
>
> IIRC, the important consideration here is to ensure that getting the
> attestation report and retrieving the certificates appears atomic to the
> guest. When SNP live migration is supported we don't want a case where the
> guest could have migrated between the call to obtain the certificates and
> obtaining the attestation report, which can potentially cause failure of
> validation of the attestation report.

Where does "obtaining the attestation report" happen? I see the guest request
and the certificate stuff, I don't see anything about attestation reports (though
I'm not looking very closely).

2023-10-18 21:27:36

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event


On 10/18/2023 3:38 PM, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Ashish Kalra wrote:
>>> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
>>> {
>>> struct kvm_vcpu *vcpu = &svm->vcpu;
>>> struct kvm *kvm = vcpu->kvm;
>>> struct kvm_sev_info *sev;
>>> unsigned long exitcode;
>>> u64 data_gpa;
>>>
>>> if (!sev_snp_guest(vcpu->kvm)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
>>> return 1;
>>> }
>>>
>>> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
>>> return 1;
>>> }
>>>
>>> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
>>> vcpu->run->hypercall.args[0] = data_gpa;
>>> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
>>> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>>> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>>> return 0;
>>> }
>>>
>>
>> IIRC, the important consideration here is to ensure that getting the
>> attestation report and retrieving the certificates appears atomic to the
>> guest. When SNP live migration is supported we don't want a case where the
>> guest could have migrated between the call to obtain the certificates and
>> obtaining the attestation report, which can potentially cause failure of
>> validation of the attestation report.
>
> Where does "obtaining the attestation report" happen? I see the guest request
> and the certificate stuff, I don't see anything about attestation reports (though
> I'm not looking very closely).
>

The guest requests that the firmware construct an attestation report via
the SNP_GUEST_REQUEST command. The certificates are piggy-backed to the
guest along with the attestation report (retrieved from the FW via the
SNP_GUEST_REQUEST command) as part of the SNP Extended Guest Request NAE
handling.

Thanks,
Ashish

2023-10-18 21:43:45

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On Wed, Oct 18, 2023, Ashish Kalra wrote:
>
> On 10/18/2023 3:38 PM, Sean Christopherson wrote:
> > On Wed, Oct 18, 2023, Ashish Kalra wrote:
> > > > static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> > > > {
> > > > struct kvm_vcpu *vcpu = &svm->vcpu;
> > > > struct kvm *kvm = vcpu->kvm;
> > > > struct kvm_sev_info *sev;
> > > > unsigned long exitcode;
> > > > u64 data_gpa;
> > > >
> > > > if (!sev_snp_guest(vcpu->kvm)) {
> > > > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> > > > return 1;
> > > > }
> > > >
> > > > data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > > > if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> > > > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> > > > return 1;
> > > > }
> > > >

Doh, I forgot to set

vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;

> > > > vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> > > > vcpu->run->hypercall.args[0] = data_gpa;
> > > > vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> > > > vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
> > > > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > > > return 0;
> > > > }
> > > >
> > >
> > > IIRC, the important consideration here is to ensure that getting the
> > > attestation report and retrieving the certificates appears atomic to the
> > > guest. When SNP live migration is supported we don't want a case where the
> > > guest could have migrated between the call to obtain the certificates and
> > > obtaining the attestation report, which can potentially cause failure of
> > > validation of the attestation report.
> >
> > Where does "obtaining the attestation report" happen? I see the guest request
> > and the certificate stuff, I don't see anything about attestation reports (though
> > I'm not looking very closely).
> >
>
> The guest requests that the firmware construct an attestation report via the
> SNP_GUEST_REQUEST command. The certificates are piggy-backed to the guest
> along with the attestation report (retrieved from the FW via the
> SNP_GUEST_REQUEST command) as part of the SNP Extended Guest Request NAE
> handling.

Ah, thanks!

In that case, my proposal should more or less Just Work™; we simply need to define
KVM's ABI to be that userspace is responsible for doing KVM_RUN with
vcpu->run->immediate_exit set before migrating if the previous exit was
KVM_EXIT_HYPERCALL with KVM_HC_SNP_GET_CERTS. This is standard operating procedure
for userspace exits where KVM needs to "complete" the VM-Exit, e.g. for MMIO, I/O,
etc. that are punted to userspace.
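
E.g. before pausing the VM for migration, the userspace side would look
something like this (untested sketch; error handling omitted, and
fill_certs() and vcpu_fd are made-up stand-ins; run is the mmap()'d
struct kvm_run for the vCPU):

        if (run->exit_reason == KVM_EXIT_HYPERCALL &&
            run->hypercall.nr == KVM_HC_SNP_GET_CERTS) {
                /* Copy the certs into the guest buffer, note the result. */
                run->hypercall.args[2] = fill_certs(run);

                /* Complete the exit without re-entering the guest. */
                run->immediate_exit = true;
                ioctl(vcpu_fd, KVM_RUN, 0);     /* returns -1 with errno=EINTR */
                run->immediate_exit = false;
        }
        /* Now it's safe to pause/migrate the VM. */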

2023-10-19 02:49:42

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event


On 19/10/23 00:48, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>>
>> On 18/10/23 03:27, Sean Christopherson wrote:
>>> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>>>> +
>>>>> + /*
>>>>> + * If a VMM-specific certificate blob hasn't been provided, grab the
>>>>> + * host-wide one.
>>>>> + */
>>>>> + snp_certs = sev_snp_certs_get(sev->snp_certs);
>>>>> + if (!snp_certs)
>>>>> + snp_certs = sev_snp_global_certs_get();
>>>>> +
>>>>
>>>> This is where the generation I suggested adding would get checked. If
>>>> the instance certs' generation is not the global generation, then I
>>>> think we need a way to return to the VMM to make that right before
>>>> continuing to provide outdated certificates.
>>>> This might be an unreasonable request, but the fact that the certs and
>>>> reported_tcb can be set while a VM is running makes this an issue.
>>>
>>> Before we get that far, the changelogs need to explain why the kernel is storing
>>> userspace blobs in the first place. The whole thing is a bit of a mess.
>>>
>>> sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
>>> bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
>>> while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
>>> between bumping the refcount and grabbing the pointer, KVM will end up leaking a
>>> refcount and consuming a pointer without a refcount.
>>>
>>> if (!kref_get_unless_zero(&certs->kref))
>>> return NULL;
>>>
>>> return certs;
>>
>> I'm missing something here. The @certs pointer is on the stack,
>
> No, nothing guarantees that @certs is on the stack and will never be reloaded.
> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
> possible that it can be inlined. Then you end up with:
>
> struct sev_device *sev;
>
> if (!psp_master || !psp_master->sev_data)
> return NULL;
>
> sev = psp_master->sev_data;
> if (!sev->snp_initialized)
> return NULL;
>
> if (!sev->snp_certs)
> return NULL;
>
> if (!kref_get_unless_zero(&sev->snp_certs->kref))
> return NULL;
>
> return sev->snp_certs;
>
> At which point the compiler could choose to omit a local variable entirely, it
> could store @certs in a register and reload after kref_get_unless_zero(), etc.
> If psp_master->sev_data->snp_certs is changed at any point, odd things can happen.
>
> That atomic operation in kref_get_unless_zero() might prevent a reload between
> getting the kref and the return, but it wouldn't prevent a reload between the
> !NULL check and kref_get_unless_zero().

Oh. The function is exported so I thought gcc would not go that far, but
yeah, it is possible. So this needs an explicit READ_ONCE barrier.
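
Something like the below, where the pointer is snapshotted exactly once (a
sketch only; this closes the reload window, but the object's lifetime
between the load and the kref_get still needs protection, e.g. RCU):

        struct sev_snp_certs *certs;
        struct sev_device *sev;

        if (!psp_master || !psp_master->sev_data)
                return NULL;

        sev = psp_master->sev_data;
        if (!sev->snp_initialized)
                return NULL;

        /* Single load; all later checks/uses operate on this snapshot. */
        certs = READ_ONCE(sev->snp_certs);
        if (!certs || !kref_get_unless_zero(&certs->kref))
                return NULL;

        return certs;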


>>> If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
>>> That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
>>> concern.
>>
>> The global cert lives in CCP (/dev/sev), while the per-VM cert lives in the
>> kvmvm_fd. "A la vcpu->run" is fine for the latter, but for the former we need
>> something else.
>
> Why? The cert ultimately comes from userspace, no? Make userspace deal with it.
>
>> And there is a scenario where one global certs blob is all that is needed,
>> and copying it over multiple VMs seems suboptimal.
>
> That's a solvable problem. I'm not sure I like the most obvious solution, but it
> is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
> or via an ioctl().
>
> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
> userspace pointer would suffice. The benefit of a kernel controlled pointer is
> that it doesn't require copying to a kernel buffer (or special code to copy from
> userspace into guest).

Just to clarify: like a small userspace non-QEMU program which just
holds a pointer to the certs blob, or embedding it into libvirt or systemd?


> Actually, looking at the flow again, AFAICT there's nothing special about the
> target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
> KVM doesn't need to do conversions, there's no kernel privileges required, etc.
> And the GHCB doesn't dictate ordering between storing the certificates and doing
> the request. That means the certificate stuff can be punted entirely to userspace.

All true.

> Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
> for non-SNP guests:
>
> unsigned long exitcode = 0;
> u64 data_gpa;
> int err, rc;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
> goto e_fail;
> }
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>
> Which really highlights that we need to get test infrastructure up and running
> for SEV-ES, SNP, and TDX.
>
> Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
>
> static void snp_handle_guest_request(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> struct sev_data_snp_guest_request data = {0};
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> gpa_t req_gpa = control->exit_info_1;
> gpa_t resp_gpa = control->exit_info_2;
> unsigned long rc;
> int err;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST;
> goto e_fail;
> }
>
> sev = &to_kvm_svm(kvm)->sev_info;
>
> mutex_lock(&sev->guest_req_lock);
>
> rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> if (rc)
> goto unlock;
>
> rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> if (rc)
> /* Ensure an error value is returned to guest. */
> rc = err ? err : SEV_RET_INVALID_ADDRESS;
>
> snp_cleanup_guest_buf(&data, &rc);
>
> unlock:
> mutex_unlock(&sev->guest_req_lock);
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
> }
>
> static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
> {
> u64 certs_exitcode = vcpu->run->hypercall.args[2];
> struct vcpu_svm *svm = to_svm(vcpu);
>
> if (certs_exitcode)
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
> else
> snp_handle_guest_request(svm);
> return 1;
> }
>
> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> unsigned long exitcode;
> u64 data_gpa;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> return 1;
> }
>
> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> return 1;
> }
>
> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> vcpu->run->hypercall.args[0] = data_gpa;
> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;

btw why is it _LONG_MODE and not just _64? :)

> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> return 0;
> }

This should work nicely for the KVM-stored certs but not for the global
certs. I am not at all convinced that the global certs are all that
valuable, but I do not know the history of that; it happened before I
joined, so I will let others comment on that. Thanks,


--
Alexey


2023-10-19 12:26:56

by Liam Merwick

[permalink] [raw]
Subject: Re: [PATCH v10 38/50] KVM: SEV: Add support for GHCB-based termination requests

On 16/10/2023 14:28, Michael Roth wrote:
> GHCB version 2 adds support for a GHCB-based termination request that
> a guest can issue when it reaches an error state and wishes to inform
> the hypervisor that it should be terminated. Implement support for that
> similarly to GHCB MSR-based termination requests that are already
> available to SEV-ES guests via earlier versions of the GHCB protocol.


Maybe add

See 'Termination Request' in the 'Invoking VMGEXIT' section of AMD's
GHCB spec for more details.

>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index e547adddacfa..9c38fe796e00 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3094,6 +3094,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> case SVM_VMGEXIT_HV_FEATURES:
> case SVM_VMGEXIT_PSC:
> + case SVM_VMGEXIT_TERM_REQUEST:
> break;
> default:
> reason = GHCB_ERR_INVALID_EVENT;
> @@ -3762,6 +3763,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>
> ret = 1;
> break;
> + case SVM_VMGEXIT_TERM_REQUEST:
> + pr_info("SEV-ES guess requested termination: reason %#llx info %#llx\n",
> + control->exit_info_1, control->exit_info_1);

typo: "guess" -> "guest"
It prints exit_info_1 twice - was one of those meant to be exit_info_2?

Otherwise
Reviewed-by: Liam Merwick <[email protected]>


> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
> + vcpu->run->system_event.ndata = 1;
> + vcpu->run->system_event.data[0] = control->ghcb_gpa;
> + break;
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",

2023-10-19 14:58:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
>
> On 19/10/23 00:48, Sean Christopherson wrote:
> > static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> > {
> > struct kvm_vcpu *vcpu = &svm->vcpu;
> > struct kvm *kvm = vcpu->kvm;
> > struct kvm_sev_info *sev;
> > unsigned long exitcode;
> > u64 data_gpa;
> >
> > if (!sev_snp_guest(vcpu->kvm)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> > return 1;
> > }
> >
> > data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> > return 1;
> > }
> >
> > vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> > vcpu->run->hypercall.args[0] = data_gpa;
> > vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> > vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>
> btw why is it _LONG_MODE and not just _64? :)

I'm pretty sure it got copied from Xen when KVM started adding support for
emulating Xen's hypercalls. I assume Xen PV actually has a need for identifying
long mode as opposed to just 64-bit mode, but KVM, not so much.

> > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > return 0;
> > }
>
> This should work nicely for the KVM-stored certs but not for the global
> certs. I am not at all convinced that the global certs are all that
> valuable, but I do not know the history of that; it happened before I
> joined, so I will let others comment on that. Thanks,

Aren't the global certs provided by userspace too though? If all certs are
ultimately controlled by userspace, I don't see any reason to make the kernel a
middle-man.

2023-10-19 23:55:59

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event


On 20/10/23 01:57, Sean Christopherson wrote:
> On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
>>
>> On 19/10/23 00:48, Sean Christopherson wrote:
>>> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
>>> {
>>> struct kvm_vcpu *vcpu = &svm->vcpu;
>>> struct kvm *kvm = vcpu->kvm;
>>> struct kvm_sev_info *sev;
>>> unsigned long exitcode;
>>> u64 data_gpa;
>>>
>>> if (!sev_snp_guest(vcpu->kvm)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
>>> return 1;
>>> }
>>>
>>> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
>>> return 1;
>>> }
>>>
>>> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
>>> vcpu->run->hypercall.args[0] = data_gpa;
>>> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
>>> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>>
>> btw why is it _LONG_MODE and not just _64? :)
>
> I'm pretty sure it got copied from Xen when KVM started adding support for
> emulating Xen's hypercalls. I assume Xen PV actually has a need for identifying
> long mode as opposed to just 64-bit mode, but KVM, not so much.
>
>>> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>>> return 0;
>>> }
>>
>> This should work for the KVM-stored certs nicely but not for the global certs.
>> Although I am not at all convinced that global certs are all that valuable, I
>> do not know the history of that; it happened before I joined, so I will let
>> others comment on that. Thanks,
>
> Aren't the global certs provided by userspace too though? If all certs are
> ultimately controlled by userspace, I don't see any reason to make the kernel a
> middle-man.

The max blob size is 32KB or so and for 200 VMs it is:
- 6.5MB, all in the userspace so swappable vs
- 32KB but in the kernel so not swappable.
Sure, a box capable of running 200 VMs must have plenty of RAM but still :)
Plus, GHCB now has to go via the userspace before talking to the PSP
which was not the case so far (though I cannot think of an immediate
implication right now).


--
Alexey


2023-10-20 00:14:11

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
>
> On 20/10/23 01:57, Sean Christopherson wrote:
> > On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
> > > > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > > > return 0;
> > > > }
> > >
> > > This should work for the KVM-stored certs nicely but not for the global certs.
> > > Although I am not at all convinced that global certs are all that valuable, I
> > > do not know the history of that; it happened before I joined, so I will let
> > > others comment on that. Thanks,
> >
> > Aren't the global certs provided by userspace too though? If all certs are
> > ultimately controlled by userspace, I don't see any reason to make the kernel a
> > middle-man.
>
> The max blob size is 32KB or so and for 200 VMs it is:

Not according to include/linux/psp-sev.h:

#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */

Ugh, and I see in another patch:

Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
for an extra certificate.

-#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */

That's gross and just asking for ABI problems, because then there's this:

+::
+
+ struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len
+ };
+
+The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.

> - 6.5MB, all in the userspace so swappable vs
> - 32KB but in the kernel so not swappable.
> Sure, a box capable of running 200 VMs must have plenty of RAM but still :)

That's making quite a few assumptions.

1) That the global cert will be 32KiB (which clearly isn't the case today).
2) That every VM will want the global cert.
3) That userspace can't figure out a way to share the global cert.

Even in that absolutely worst case scenario, I am not remotely convinced that it
justifies taking on the necessary complexity to manage certs in-kernel.

> Plus, GHCB now has to go via the userspace before talking to the PSP which
> was not the case so far (though I cannot think of an immediate implication
> right now).

Any argument along the lines of "because that's how we've always done it" is going
to fall on deaf ears. If there's a real performance bottleneck with kicking out
to userspace, then I'll happily work to figure out a solution. If.

2023-10-20 15:13:53

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
>
> On 20/10/23 11:13, Sean Christopherson wrote:
> > On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
> > > Plus, GHCB now has to go via the userspace before talking to the PSP which
> > > was not the case so far (though I cannot think of an immediate implication
> > > right now).
> >
> > Any argument along the lines of "because that's how we've always done it" is going
> > to fall on deaf ears. If there's a real performance bottleneck with kicking out
> > to userspace, then I'll happily work to figure out a solution. If.
>
> No, not performance, I was trying to imagine what can go wrong if multiple
> vcpus are making this call, all exiting to QEMU, in a loop, racing,
> something like this.

I am not at all concerned about userspace being able to handle parallel requests
to get a certificate. Per-vCPU exits that access global/shared resources might
not be super common, but they're certainly not rare. E.g. a guest access to an
option ROM can trigger memslot updates in QEMU, which requires at least taking a
mutex to guard KVM_SET_USER_MEMORY_REGION, and IIRC QEMU also uses RCU to protect
QEMU accesses to address spaces.

Given that we know there will be scenarios where certificates are changed/updated,
I wouldn't be at all surprised if handling this in userspace is actually easier
as it will give userspace more control and options, and make it easier to reason
about the resulting behavior. E.g. userspace could choose between a lockless
scheme and a r/w lock if there's a need to ensure per-VM and global certs are
updated atomically from the guest's perspective.
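
As a purely hypothetical userspace sketch of the r/w lock variant (nothing
like this exists in QEMU today; all names below are made up):

	#include <pthread.h>
	#include <stddef.h>
	#include <string.h>

	struct cert_blob {
		void *data;
		size_t len;
	};

	static pthread_rwlock_t certs_lock = PTHREAD_RWLOCK_INITIALIZER;
	static struct cert_blob vm_certs, global_certs;

	/* vCPU exit path: read side, so parallel guest requests each see a
	 * consistent snapshot of the per-VM and global certs. */
	static size_t fill_cert_buffer(void *dst, size_t max)
	{
		size_t n = 0;

		pthread_rwlock_rdlock(&certs_lock);
		if (vm_certs.len + global_certs.len <= max) {
			memcpy(dst, vm_certs.data, vm_certs.len);
			memcpy((char *)dst + vm_certs.len, global_certs.data,
			       global_certs.len);
			n = vm_certs.len + global_certs.len;
		}
		pthread_rwlock_unlock(&certs_lock);

		return n;
	}

	/* Management path: write side, so the per-VM and global certs are
	 * updated atomically from the guest's perspective. */
	static void update_certs(struct cert_blob *vm, struct cert_blob *global)
	{
		pthread_rwlock_wrlock(&certs_lock);
		vm_certs = *vm;
		global_certs = *global;
		pthread_rwlock_unlock(&certs_lock);
	}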

2023-10-25 17:34:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled

On Mon, Oct 16, 2023 at 08:27:34AM -0500, Michael Roth wrote:
> From: Kim Phillips <[email protected]>
>
> Without SEV-SNP, Automatic IBRS protects only the kernel. But when
> SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
> host-side code, including userspace. This protection comes at a cost:
> reduced userspace indirect branch performance.
>
> To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
> hosts. Fall back to retpolines instead.
>
> Signed-off-by: Kim Phillips <[email protected]>
> [mdr: squash in changes from review discussion]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kernel/cpu/common.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)

Acked-by: Borislav Petkov (AMD) <[email protected]>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-10-25 18:19:43

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support

On 10/16/23 08:27, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
>
> - Detect SEV-SNP support based on CPUID bit
> - Initialize the RMP table memory reported by the RMP base/end MSR
> registers and configure IOMMU to be compatible with RMP access
> restrictions
> - Set the MtrrFixDramModEn bit in SYSCFG MSR
> - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
> - Configure IOMMU
>
> RMP table entry format is non-architectural and it can vary by
> processor. It is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format to guard against any undefined behavior when running on other
> system types. Future models/support will handle this through an
> architectural mechanism to allow for broader compatibility.
>
> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.
>
> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Co-developed-by: Tom Lendacky <[email protected]>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: rework commit message to be clearer about what patch does, squash
> in early_rmptable_check() handling from Tom]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/Kbuild | 2 +
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 11 +-
> arch/x86/include/asm/sev.h | 6 +
> arch/x86/kernel/cpu/amd.c | 19 ++
> arch/x86/virt/svm/Makefile | 3 +
> arch/x86/virt/svm/sev.c | 239 +++++++++++++++++++++++
> drivers/iommu/amd/init.c | 2 +-
> include/linux/amd-iommu.h | 2 +-
> 9 files changed, 288 insertions(+), 4 deletions(-)
> create mode 100644 arch/x86/virt/svm/Makefile
> create mode 100644 arch/x86/virt/svm/sev.c
>
> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> index 5a83da703e87..6a1f36df6a18 100644
> --- a/arch/x86/Kbuild
> +++ b/arch/x86/Kbuild
> @@ -28,5 +28,7 @@ obj-y += net/
>
> obj-$(CONFIG_KEXEC_FILE) += purgatory/
>
> +obj-y += virt/svm/
> +
> # for cleaning
> subdir- += boot tools
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 702d93fdd10e..83efd407033b 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -117,6 +117,12 @@
> #define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31))
> #endif
>
> +#ifdef CONFIG_KVM_AMD_SEV
> +# define DISABLE_SEV_SNP 0
> +#else
> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -141,7 +147,7 @@
> DISABLE_ENQCMD)
> #define DISABLED_MASK17 0
> #define DISABLED_MASK18 (DISABLE_IBT)
> -#define DISABLED_MASK19 0
> +#define DISABLED_MASK19 (DISABLE_SEV_SNP)
> #define DISABLED_MASK20 0
> #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 1d111350197f..2be74afb4cbd 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -589,6 +589,8 @@
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE 0xc0010132
> +#define MSR_AMD64_RMP_END 0xc0010133
>
> /* SNP feature bits enabled by the hypervisor */
> #define MSR_AMD64_SNP_VTOM BIT_ULL(3)
> @@ -690,7 +692,14 @@
> #define MSR_K8_TOP_MEM2 0xc001001d
> #define MSR_AMD64_SYSCFG 0xc0010010
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
> +#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT 19
> +#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
> +
> #define MSR_K8_INT_PENDING_MSG 0xc0010055
> /* C1E active bits in int pending message */
> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 5b4a1ce3d368..b05fcd0ab7e4 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -243,4 +243,10 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
> static inline u64 sev_get_status(void) { return 0; }
> #endif
>
> +#ifdef CONFIG_KVM_AMD_SEV
> +bool snp_get_rmptable_info(u64 *start, u64 *len);
> +#else
> +static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
> +#endif
> +
> #endif
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 14ee7f750cc7..6cc2074fcea3 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -20,6 +20,7 @@
> #include <asm/delay.h>
> #include <asm/debugreg.h>
> #include <asm/resctrl.h>
> +#include <asm/sev.h>
>
> #ifdef CONFIG_X86_64
> # include <asm/mmconfig.h>
> @@ -618,6 +619,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
> resctrl_cpu_detect(c);
> }
>
> +static bool early_rmptable_check(void)
> +{
> + u64 rmp_base, rmp_size;
> +
> + /*
> + * For early BSP initialization, max_pfn won't be set up yet, wait until
> + * it is set before performing the RMP table calculations.
> + */
> + if (!max_pfn)
> + return true;

So that AutoIBRS isn't disabled when an RMP table has not been allocated by
the BIOS, let's delete the above check. It then becomes just a check for
whether the RMP table has been allocated by the BIOS (enabled by selecting
a BIOS option), which shows intent to run SNP guests.

This way, the AutoIBRS mitigation can be used if SNP is not possible on
the system.
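
IOW, just (untested):

	static bool early_rmptable_check(void)
	{
		u64 rmp_base, rmp_size;

		return snp_get_rmptable_info(&rmp_base, &rmp_size);
	}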

Thanks,
Tom

> +
> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
> +}
> +
> static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> {
> u64 msr;
> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> if (!(msr & MSR_K7_HWCR_SMMLOCK))
> goto clear_sev;
>
> + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
> + goto clear_snp;
> +
> return;
>
> clear_all:
> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> clear_sev:
> setup_clear_cpu_cap(X86_FEATURE_SEV);
> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
> +clear_snp:
> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> }
> }
> diff --git a/arch/x86/virt/svm/Makefile b/arch/x86/virt/svm/Makefile
> new file mode 100644
> index 000000000000..ef2a31bdcc70
> --- /dev/null
> +++ b/arch/x86/virt/svm/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_KVM_AMD_SEV) += sev.o
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> new file mode 100644
> index 000000000000..8b9ed72489e4
> --- /dev/null
> +++ b/arch/x86/virt/svm/sev.c
> @@ -0,0 +1,239 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AMD SVM-SEV Host Support.
> + *
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + *
> + * Author: Ashish Kalra <[email protected]>
> + *
> + */
> +
> +#include <linux/cc_platform.h>
> +#include <linux/printk.h>
> +#include <linux/mm_types.h>
> +#include <linux/set_memory.h>
> +#include <linux/memblock.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
> +#include <linux/amd-iommu.h>
> +
> +#include <asm/sev.h>
> +#include <asm/processor.h>
> +#include <asm/setup.h>
> +#include <asm/svm.h>
> +#include <asm/smp.h>
> +#include <asm/cpu.h>
> +#include <asm/apic.h>
> +#include <asm/cpuid.h>
> +#include <asm/cmdline.h>
> +#include <asm/iommu.h>
> +
> +/*
> + * The RMP entry format is not architectural. The format is defined in PPR
> + * Family 19h Model 01h, Rev B1 processor.
> + */
> +struct rmpentry {
> + u64 assigned : 1,
> + pagesize : 1,
> + immutable : 1,
> + rsvd1 : 9,
> + gpa : 39,
> + asid : 10,
> + vmsa : 1,
> + validated : 1,
> + rsvd2 : 1;
> + u64 rsvd3;
> +} __packed;
> +
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the
> + * bookkeeping, the range needs to be added during the RMP entry lookup.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +
> +static struct rmpentry *rmptable_start __ro_after_init;
> +static u64 rmptable_max_pfn __ro_after_init;
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "SEV-SNP: " fmt
> +
> +static int __mfd_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_MFDM;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void mfd_enable(void *arg)
> +{
> + __mfd_enable(smp_processor_id());
> +}
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_SNP_EN;
> + val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> + __snp_enable(smp_processor_id());
> +}
> +
> +#define RMP_ADDR_MASK GENMASK_ULL(51, 13)
> +
> +bool snp_get_rmptable_info(u64 *start, u64 *len)
> +{
> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> + return false;
> + }
> +
> + if (rmp_base > rmp_end) {
> + pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
> + return false;
> + }
> +
> + rmp_sz = rmp_end - rmp_base + 1;
> +
> + /*
> + * Calculate the amount the memory that must be reserved by the BIOS to
> + * address the whole RAM, including the bookkeeping area. The RMP itself
> + * must also be covered.
> + */
> + max_rmp_pfn = max_pfn;
> + if (PHYS_PFN(rmp_end) > max_pfn)
> + max_rmp_pfn = PHYS_PFN(rmp_end);
> +
> + calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> + if (calc_rmp_sz > rmp_sz) {
> + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> + calc_rmp_sz, rmp_sz);
> + return false;
> + }
> +
> + *start = rmp_base;
> + *len = rmp_sz;
> +
> + return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)
> +{
> + u64 rmp_base, rmp_size;
> + void *rmp_start;
> + u64 val;
> +
> + if (!snp_get_rmptable_info(&rmp_base, &rmp_size))
> + return 1;
> +
> + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
> + rmp_base, rmp_base + rmp_size - 1);
> +
> + rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB);
> + if (!rmp_start) {
> + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size);
> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled, this can happen in case of
> + * kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */
> + memset(rmp_start, 0, rmp_size);
> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> + on_each_cpu(mfd_enable, NULL, 1);
> +
> + /* Enable SNP on all CPUs. */
> + on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> + rmp_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
> + rmp_size -= RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> + rmptable_start = (struct rmpentry *)rmp_start;
> + rmptable_max_pfn = rmp_size / sizeof(struct rmpentry) - 1;
> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + int family, model;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + family = boot_cpu_data.x86;
> + model = boot_cpu_data.x86_model;
> +
> + /*
> + * RMP table entry format is not architectural and it can vary by processor and
> + * is defined by the per-processor PPR. Restrict SNP support on the known CPU
> + * model and family for which the RMP table entry format is currently defined for.
> + */
> + if (family != 0x19 || model > 0xaf)
> + goto nosnp;
> +
> + if (amd_iommu_snp_enable())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return -ENOSYS;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
> + * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
> + * called after subsys_initcall().
> + *
> + * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
> + * directly into guest private memory. In case of SNP, the IOMMU ensures that
> + * the page(s) used for DMA are hypervisor owned.
> + */
> +fs_initcall(snp_rmptable_init);
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 45efb7e5d725..1c9924de607a 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -3802,7 +3802,7 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
> return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
> }
>
> -#ifdef CONFIG_AMD_MEM_ENCRYPT
> +#ifdef CONFIG_KVM_AMD_SEV
> int amd_iommu_snp_enable(void)
> {
> /*
> diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
> index 99a5201d9e62..55fc03cb3968 100644
> --- a/include/linux/amd-iommu.h
> +++ b/include/linux/amd-iommu.h
> @@ -205,7 +205,7 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn,
> u64 *value);
> struct amd_iommu *get_amd_iommu(unsigned int idx);
>
> -#ifdef CONFIG_AMD_MEM_ENCRYPT
> +#ifdef CONFIG_KVM_AMD_SEV
> int amd_iommu_snp_enable(void);
> #endif
>

2023-11-24 14:40:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 13/50] crypto: ccp: Define the SEV-SNP commands

On Mon, Oct 16, 2023 at 08:27:42AM -0500, Michael Roth wrote:
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 7fd17e82bab4..a7f92e74564d 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -78,6 +78,36 @@ enum sev_cmd {
> SEV_CMD_DBG_DECRYPT = 0x060,
> SEV_CMD_DBG_ENCRYPT = 0x061,
>
> + /* SNP specific commands */
> + SEV_CMD_SNP_INIT = 0x81,

The other commands start with "0x0" - pls do that too here or unify with
a pre-patch.

> + SEV_CMD_SNP_SHUTDOWN = 0x82,
> + SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
> + SEV_CMD_SNP_DF_FLUSH = 0x84,
> + SEV_CMD_SNP_INIT_EX = 0x85,
> + SEV_CMD_SNP_SHUTDOWN_EX = 0x86,
> + SEV_CMD_SNP_DECOMMISSION = 0x90,
> + SEV_CMD_SNP_ACTIVATE = 0x91,
> + SEV_CMD_SNP_GUEST_STATUS = 0x92,
> + SEV_CMD_SNP_GCTX_CREATE = 0x93,
> + SEV_CMD_SNP_GUEST_REQUEST = 0x94,
> + SEV_CMD_SNP_ACTIVATE_EX = 0x95,
> + SEV_CMD_SNP_LAUNCH_START = 0xA0,
> + SEV_CMD_SNP_LAUNCH_UPDATE = 0xA1,
> + SEV_CMD_SNP_LAUNCH_FINISH = 0xA2,
> + SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
> + SEV_CMD_SNP_DBG_ENCRYPT = 0xB1,
> + SEV_CMD_SNP_PAGE_SWAP_OUT = 0xC0,
> + SEV_CMD_SNP_PAGE_SWAP_IN = 0xC1,
> + SEV_CMD_SNP_PAGE_MOVE = 0xC2,
> + SEV_CMD_SNP_PAGE_MD_INIT = 0xC3,
> + SEV_CMD_SNP_PAGE_SET_STATE = 0xC6,
> + SEV_CMD_SNP_PAGE_RECLAIM = 0xC7,
> + SEV_CMD_SNP_PAGE_UNSMASH = 0xC8,
> + SEV_CMD_SNP_CONFIG = 0xC9,
> + SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0xCA,

You don't have to vertically align those to a different column due to
this command's name not fitting - just do:

SEV_CMD_SNP_CONFIG = 0x0C9,
SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0x0CA,
SEV_CMD_SNP_COMMIT = 0x0CB,




> + SEV_CMD_SNP_COMMIT = 0xCB,
> + SEV_CMD_SNP_VLEK_LOAD = 0xCD,
> +
> SEV_CMD_MAX,
> };

...

> +/**
> + * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
> + *
> + * @gctx_addr: system physical address of guest context page
> + * @policy: guest policy
> + * @ma_gctx_addr: system physical address of migration agent
> + * @imi_en: launch flow is launching an IMI for the purpose of

What is an "IMI"?

Define it once for the readers pls.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-06 23:00:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list

On Mon, Oct 16, 2023 at 08:27:45AM -0500, Michael Roth wrote:
> + spin_lock(&snp_leaked_pages_list_lock);
> + while (npages--) {
> + /*
> + * Reuse the page's buddy list for chaining into the leaked
> + * pages list. This page should not be on a free list currently
> + * and is also unsafe to be added to a free list.
> + */
> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
> + sev_dump_rmpentry(pfn);
> + pfn++;
> + }
> + spin_unlock(&snp_leaked_pages_list_lock);
> + atomic_long_inc(&snp_nr_leaked_pages);

How is this supposed to count?

You're leaking @npages as the function's parameter but are incrementing
snp_nr_leaked_pages only once?

Just make it a bog-normal unsigned long and increment it inside the
locked section.
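
I.e., something like:

	/* with snp_nr_leaked_pages a plain unsigned long */
	spin_lock(&snp_leaked_pages_list_lock);
	while (npages--) {
		list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
		sev_dump_rmpentry(pfn);
		snp_nr_leaked_pages++;
		pfn++;
	}
	spin_unlock(&snp_leaked_pages_list_lock);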

Or do at the beginning of the function:

atomic_long_add(npages, &snp_nr_leaked_pages);

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-08 15:50:40

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

On Mon, Oct 16, 2023 at 08:27:46AM -0500, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The behavior and requirement for the SEV-legacy command is altered when
> the SNP firmware is in the INIT state. See SEV-SNP firmware specification
> for more details.
>
> Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
> when SNP is enabled to satisfy new requirements for the SNP. Continue

s/the //

> allocating a 1mb region for !SNP configuration.
>
> While at it, provide API that can be used by others to allocate a page

"...an API... ... to allocate a firmware page."

Simple.

> that can be used by the firmware.

> The immediate user for this API will be the KVM driver.

Delete that sentence.

> The KVM driver to need to allocate a firmware context

"The KVM driver needs to allocate ...

> page during the guest creation. The context page need to be updated

"needs"

> by the firmware. See the SEV-SNP specification for further details.
>
> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
> directly to SEV_CMD_SNP_PAGE_RECLAIM]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 151 ++++++++++++++++++++++++++++++++---
> include/linux/psp-sev.h | 9 +++
> 2 files changed, 151 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 613b25f81498..ea21307a2b34 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -30,6 +30,7 @@
> #include <asm/smp.h>
> #include <asm/cacheflush.h>
> #include <asm/e820/types.h>
> +#include <asm/sev-host.h>
>
> #include "psp-dev.h"
> #include "sev-dev.h"
> @@ -93,6 +94,13 @@ static void *sev_init_ex_buffer;
> struct sev_data_range_list *snp_range_list;
> static int __sev_snp_init_locked(int *error);
>
> +/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
> +#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)

There's "SEV", "SNP" *and* "ES". Wow.

Let's do this:

#define SEV_TMR_SIZE SZ_1M
#define SNP_TMR_SIZE SZ_2M

Done.

> +static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
> +
> +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);

Instead of doing forward declarations, move the whole logic around
__sev_do_cmd_locked() up here in the file so that you can call that
function by other functions without forward declarations.

The move should probably be a pre-patch.

> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -193,11 +201,131 @@ static int sev_cmd_buffer_len(int cmd)
> return 0;
> }
>
> +static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
> +{
> + /* Cbit maybe set in the paddr */
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> + int ret, err, i, n = 0;
> +
> + for (i = 0; i < npages; i++, pfn++, n++) {
> + struct sev_data_snp_page_reclaim data = {0};
> +
> + data.paddr = pfn << PAGE_SHIFT;

This shifting back'n'forth between paddr and pfn makes this function
hard to read. Let's use only paddr (diff ontop):

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index ea21307a2b34..25078b0253bd 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -203,14 +203,15 @@ static int sev_cmd_buffer_len(int cmd)

static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
{
- /* Cbit maybe set in the paddr */
- unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
int ret, err, i, n = 0;

- for (i = 0; i < npages; i++, pfn++, n++) {
+ /* C-bit maybe set, clear it: */
+ paddr = __sme_clr(paddr);
+
+ for (i = 0; i < npages; i++, paddr += PAGE_SIZE, n++) {
struct sev_data_snp_page_reclaim data = {0};

- data.paddr = pfn << PAGE_SHIFT;
+ data.paddr = paddr;

if (locked)
ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
@@ -220,7 +221,7 @@ static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool lock
if (ret)
goto cleanup;

- ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ ret = rmp_make_shared(__phys_to_pfn(paddr), PG_LEVEL_4K);
if (ret)
goto cleanup;
}
@@ -232,7 +233,7 @@ static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool lock
* If failed to reclaim the page then page is no longer safe to
* be release back to the system, leak it.
*/
- snp_leak_pages(pfn, npages - n);
+ snp_leak_pages(__phys_to_pfn(paddr), npages - n);
return ret;
}

> +
> + if (locked)
> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> + else
> + ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +
> + if (ret)
> + goto cleanup;
> +
> + ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (ret)
> + goto cleanup;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * If failed to reclaim the page then page is no longer safe to
> + * be release back to the system, leak it.

"released"

> + */
> + snp_leak_pages(pfn, npages - n);
> + return ret;
> +}
> +
> +static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
> +{
> + /* Cbit maybe set in the paddr */
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> + int rc, n = 0, i;

That n looks like it can be replaced by i.

> +
> + for (i = 0; i < npages; i++, n++, pfn++) {
> + rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
> + if (rc)
> + goto cleanup;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * Try unrolling the firmware state changes by
> + * reclaiming the pages which were already changed to the
> + * firmware state.
> + */
> + snp_reclaim_pages(paddr, n, locked);
> +
> + return rc;
> +}
> +
> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)

AFAICT, @locked is always false. So it can go.

> +{
> + unsigned long npages = 1ul << order, paddr;
> + struct sev_device *sev;
> + struct page *page;
> +
> + if (!psp_master || !psp_master->sev_data)
> + return NULL;
> +
> + page = alloc_pages(gfp_mask, order);
> + if (!page)
> + return NULL;
> +
> + /* If SEV-SNP is initialized then add the page in RMP table. */
> + sev = psp_master->sev_data;
> + if (!sev->snp_initialized)
> + return page;
> +
> + paddr = __pa((unsigned long)page_address(page));
> + if (rmp_mark_pages_firmware(paddr, npages, locked))
> + return NULL;
> +
> + return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> +{
> + struct page *page;
> +
> + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> +
> + return page ? page_address(page) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> +
> +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)

This @locked too is always false. It becomes true later in

Subject: [PATCH v10 50/50] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump

which talks about some panic notifier running in atomic context. But
then you can't take locks in atomic context.

Looks like this whole dance around the locked thing needs a cleanup.

...


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-08 20:54:31

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list

On 12/6/2023 2:42 PM, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:45AM -0500, Michael Roth wrote:
>> + spin_lock(&snp_leaked_pages_list_lock);
>> + while (npages--) {
>> + /*
>> + * Reuse the page's buddy list for chaining into the leaked
>> + * pages list. This page should not be on a free list currently
>> + * and is also unsafe to be added to a free list.
>> + */
>> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>> + sev_dump_rmpentry(pfn);
>> + pfn++;
>> + }
>> + spin_unlock(&snp_leaked_pages_list_lock);
>> + atomic_long_inc(&snp_nr_leaked_pages);
>
> How is this supposed to count?
>
> You're leaking @npages as the function's parameter but are incrementing
> snp_nr_leaked_pages only once?
>
> Just make it a bog-normal unsigned long and increment it inside the
> locked section.
>
> Or do at the beginning of the function:
>
> atomic_long_add(npages, &snp_nr_leaked_pages);
>

Yes, will fix accordingly by incrementing it inside the locked section.

Thanks,
Ashish

2023-12-08 22:10:39

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list

Hello Vlastimil,

On 12/7/2023 10:20 AM, Vlastimil Babka wrote:

>> +
>> +void snp_leak_pages(u64 pfn, unsigned int npages)
>> +{
>> + struct page *page = pfn_to_page(pfn);
>> +
>> + pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
>> +
>> + spin_lock(&snp_leaked_pages_list_lock);
>> + while (npages--) {
>> + /*
>> + * Reuse the page's buddy list for chaining into the leaked
>> + * pages list. This page should not be on a free list currently
>> + * and is also unsafe to be added to a free list.
>> + */
>> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>> + sev_dump_rmpentry(pfn);
>> + pfn++;
>
> You increment pfn, but not page, which is always pointing to the page of the
> initial pfn, so need to do page++ too.

Yes, that is a bug and needs to be fixed.

> But that assumes it's all order-0 pages (hard to tell for me whether that's
> true as we start with a pfn), if there can be compound pages, it would be
> best to only add the head page and skip the tail pages - it's not expected
> to use page->buddy_list of tail pages.

Can't we use PageCompound() to check if the page is a compound page and
then use page->compound_head to get the head page and add it to the
leaked pages list? I understand the tail pages of compound pages have
very limited usage.

Thanks,
Ashish

2023-12-08 23:21:33

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support

Hello Jeremi,

> Hi Ashish,
>
> I just noticed that the kernel shouts at me about this bit when I offline->online a CPU in
> an SNP host:

Yes, I also observe the same warning when I bring a CPU back online.
>
> [2692586.589194] smpboot: CPU 63 is now offline
> [2692589.366822] [Firmware Warn]: MTRR: CPU 0: SYSCFG[MtrrFixDramModEn] not cleared by BIOS, clearing this bit
> [2692589.376582] smpboot: Booting Node 0 Processor 63 APIC 0x3f
> [2692589.378070] [Firmware Warn]: MTRR: CPU 63: SYSCFG[MtrrFixDramModEn] not cleared by BIOS, clearing this bit
> [2692589.388845] microcode: CPU63: new patch_level=0x0a0011d1
>
> Now I understand if you say "CPU offlining is not supported" but there's nothing currently
> blocking it.
>

There is CPU hotplug support on the SNP platform to do __snp_enable() when
bringing a CPU back online, but I am not really sure what needs to be done
for the MtrrFixDramModEn bit in SYSCFG; I am discussing this with the FW
developers.

Thanks,
Ashish

2023-12-09 15:38:01

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command when SNP is enabled

On Mon, Oct 16, 2023 at 08:27:47AM -0500, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> The behavior of the SEV-legacy commands is altered when the SNP firmware
> is in the INIT state. When SNP is in INIT state, all the SEV-legacy
> commands that cause the firmware to write to memory must be in the
> firmware state before issuing the command..

I think this is trying to say that the *memory* must be in firmware
state before the command. Needs massaging.

> A command buffer may contains a system physical address that the firmware

"contain"

> may write to. There are two cases that need to be handled:
>
> 1) system physical address points to a guest memory
> 2) system physical address points to a host memory

s/a //g
>
> To handle the case #1, change the page state to the firmware in the RMP
> table before issuing the command and restore the state to shared after the
> command completes.
>
> For the case #2, use a bounce buffer to complete the request.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 346 ++++++++++++++++++++++++++++++++++-
> drivers/crypto/ccp/sev-dev.h | 12 ++
> 2 files changed, 348 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index ea21307a2b34..b574b0ef2b1f 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -462,12 +462,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
> return sev_write_init_ex_file();
> }
>
> +static int alloc_snp_host_map(struct sev_device *sev)

If this is allocating intermediary bounce buffers, then call the
function that it does exactly that. Or what "host_map" is the name
referring to?

> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + memset(map, 0, sizeof(*map));
> +
> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
> + if (!page)
> + return -ENOMEM;

If the second allocation fails, you just leaked the first one.
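
I.e., something along the lines of (untested):

	page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
	if (!page) {
		free_snp_host_map(sev);
		return -ENOMEM;
	}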

> + map->host = page_address(page);
> + }
> +
> + return 0;
> +}
> +
> +static void free_snp_host_map(struct sev_device *sev)
> +{
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + if (map->host) {
> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
> + memset(map, 0, sizeof(*map));
> + }
> + }
> +}
> +
> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)

Why is paddr a pointer? You simply pass a "unsigned long paddr" like the
rest of the gazillion functions dealing with addresses.

And then you do the ERR_PTR, PTR_ERR thing for the return value of this
function, see include/linux/err.h.

> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + map->active = false;

This toggling of active on function entry and exit is silly.

The usual way to do those things is to mark it as active as the last
step of the map function, when everything has succeeded and to mark it
as inactive (active == false) as the first step in the unmap function.
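
I.e., roughly like this, where do_the_map_work()/do_the_unmap_work() are
stand-ins for the actual bodies:

	static int map_firmware_writeable(u64 *paddr, u32 len, bool guest,
					  struct snp_host_map *map)
	{
		int ret;

		ret = do_the_map_work(paddr, len, guest, map);
		if (ret)
			return ret;

		/* Mark active only once everything has succeeded. */
		map->active = true;
		return 0;
	}

	static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest,
					    struct snp_host_map *map)
	{
		if (!map->active)
			return 0;

		/* Mark inactive first, then undo the mapping. */
		map->active = false;

		return do_the_unmap_work(paddr, len, guest, map);
	}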

> +
> + if (!paddr || !len)
> + return 0;
> +
> + map->paddr = *paddr;
> + map->len = len;
> +
> + /* If paddr points to a guest memory then change the page state to firmwware. */
> + if (guest) {
> + if (rmp_mark_pages_firmware(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }

This is where it tells you that this function wants splitting:

map_guest_firmware_pages
map_host_firmware_pages

or so.

And then you lose the @guest argument too and you call the different
functions depending on the SEV cmd.

> +
> + if (!map->host)

What in the hell is ->host?! SPA is host memory?

Comments please.

> + return -ENOMEM;
> +
> + /* Check if the pre-allocated buffer can be used to fullfil the request. */

"fulfill"

> + if (len > SEV_FW_BLOB_MAX_SIZE)
> + return -EINVAL;
> +
> + /* Transition the pre-allocated buffer to the firmware state. */
> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Set the paddr to use pre-allocated firmware buffer */
> + *paddr = __psp_pa(map->host);
> +
> +done:
> + map->active = true;
> + return 0;
> +}


> +
> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + if (!map->active)

Same comments as above for that one.

> + return 0;
> +
> + /* If paddr points to a guest memory then restore the page state to hypervisor. */
> + if (guest) {
> + if (snp_reclaim_pages(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }
> +
> + /*
> + * Transition the pre-allocated buffer to hypervisor state before the access.
> + *
> + * This is because while changing the page state to firmware, the kernel unmaps
> + * the pages from the direct map, and to restore the direct map the pages must
> + * be transitioned back to the shared state.
> + */
> + if (snp_reclaim_pages(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Copy the response data firmware buffer to the callers buffer. */
> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));

This is not testing whether map->host is NULL as the above counterpart.
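
I.e., mirror the map side with something like:

	if (!map->host)
		return -ENOMEM;

before the memcpy().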

> + *paddr = map->paddr;
> +
> +done:
> + map->active = false;
> + return 0;
> +}
> +
> +static bool sev_legacy_cmd_buf_writable(int cmd)
> +{
> + switch (cmd) {
> + case SEV_CMD_PLATFORM_STATUS:
> + case SEV_CMD_GUEST_STATUS:
> + case SEV_CMD_LAUNCH_START:
> + case SEV_CMD_RECEIVE_START:
> + case SEV_CMD_LAUNCH_MEASURE:
> + case SEV_CMD_SEND_START:
> + case SEV_CMD_SEND_UPDATE_DATA:
> + case SEV_CMD_SEND_UPDATE_VMSA:
> + case SEV_CMD_PEK_CSR:
> + case SEV_CMD_PDH_CERT_EXPORT:
> + case SEV_CMD_GET_ID:
> + case SEV_CMD_ATTESTATION_REPORT:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +#define prep_buffer(name, addr, len, guest, map) \
> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
> +
> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
> +{
> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
> + struct sev_device *sev = psp_master->sev_data;
> + bool from_fw = !to_fw;
> +
> + /*
> + * After the command is completed, change the command buffer memory to
> + * hypervisor state.
> + *
> + * The immutable bit is automatically cleared by the firmware, so
> + * no not need to reclaim the page.
> + */
> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
> + if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
> + return -EFAULT;
> +
> + /* No need to go further if firmware failed to execute command. */
> + if (fw_err)
> + return 0;
> + }
> +
> + if (to_fw)
> + func = map_firmware_writeable;
> + else
> + func = unmap_firmware_writeable;

Eww, ugly and with the macro above even worse. And completely
unnecessary.

Define prep_buffer() as a normal function which selects which @func to
call and then does it. Not like this.
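
E.g. (sketch, ignoring the typeof() games the macro plays):

	static int prep_buffer(u64 *paddr, u32 len, bool to_fw, bool guest,
			       struct snp_host_map *map)
	{
		if (to_fw)
			return map_firmware_writeable(paddr, len, guest, map);

		return unmap_firmware_writeable(paddr, len, guest, map);
	}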

...

> +static inline bool need_firmware_copy(int cmd)
> +{
> + struct sev_device *sev = psp_master->sev_data;
> +
> + /* After SNP is INIT'ed, the behavior of legacy SEV command is changed. */

"initialized"

> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;

redundant ternary conditional:

return cmd < SEV_CMD_SNP_INIT && sev->snp_initialized;

> +}
> +
> +static int snp_aware_copy_to_firmware(int cmd, void *data)

What does "SNP aware" even mean?

> +{
> + return __snp_cmd_buf_copy(cmd, data, true, 0);
> +}
> +
> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
> +{
> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
> +}
> +
> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> {
> struct psp_device *psp = psp_master;
> struct sev_device *sev;
> unsigned int phys_lsb, phys_msb;
> unsigned int reg, ret = 0;
> + void *cmd_buf;
> int buf_len;
>
> if (!psp || !psp->sev_data)
> @@ -487,12 +770,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
> * physically contiguous.
> */
> - if (data)
> - memcpy(sev->cmd_buf, data, buf_len);
> + if (data) {
> + if (sev->cmd_buf_active > 2)

What is that silly counter supposed to mean?

Nested SNP commands?

> + return -EBUSY;
> +
> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
> +
> + memcpy(cmd_buf, data, buf_len);
> + sev->cmd_buf_active++;
> +
> + /*
> + * The behavior of the SEV-legacy commands is altered when the
> + * SNP firmware is in the INIT state.
> + */
> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))

Move that need_firmware_copy() check inside snp_aware_copy_to_firmware()
and the other one.
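
I.e. (untested):

	static int snp_aware_copy_to_firmware(int cmd, void *data)
	{
		if (!need_firmware_copy(cmd))
			return 0;

		return __snp_cmd_buf_copy(cmd, data, true, 0);
	}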

> + return -EFAULT;
> + } else {
> + cmd_buf = sev->cmd_buf;
> + }
>
> /* Get the physical address of the command buffer */
> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
>
> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
> cmd, phys_msb, phys_lsb, psp_timeout);

...

> @@ -639,6 +947,14 @@ static int ___sev_platform_init_locked(int *error, bool probe)
> if (probe && !psp_init_on_probe)
> return 0;
>
> + /*
> + * Allocate the intermediate buffers used for the legacy command handling.
> + */
> + if (rc != -ENODEV && alloc_snp_host_map(sev)) {

Why isn't this

if (!rc && ...)

> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
> + goto skip_legacy;

No need for that skip_legacy silly label. Just "return 0" here.

...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-10 15:50:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

On Wed, Dec 06, 2023 at 02:35:28PM -0600, Kalra, Ashish wrote:
> The main use case for the probe parameter is to control if we want to do
> legacy SEV/SEV-ES INIT during probe. There is a usage case where we want to
> delay legacy SEV INIT till an actual SEV/SEV-ES guest is being launched. So
> essentially the probe parameter controls if we want to
> execute __sev_do_init_locked() or not.
>
> We always want to do SNP INIT at probe time.

Here's what I mean (diff ontop):

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fae1fd45eccd..830d74fcf950 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -479,11 +479,16 @@ static inline int __sev_do_init_locked(int *psp_ret)
return __sev_init_locked(psp_ret);
}

-static int ___sev_platform_init_locked(int *error, bool probe)
+/*
+ * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
+ * so perform SEV-SNP initialization at probe time.
+ */
+static int __sev_platform_init_snp_locked(int *error)
{
- int rc, psp_ret = SEV_RET_NO_FW_CALL;
+
struct psp_device *psp = psp_master;
struct sev_device *sev;
+ int rc;

if (!psp || !psp->sev_data)
return -ENODEV;
@@ -493,10 +498,6 @@ static int ___sev_platform_init_locked(int *error, bool probe)
if (sev->state == SEV_STATE_INIT)
return 0;

- /*
- * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
- * so perform SEV-SNP initialization at probe time.
- */
rc = __sev_snp_init_locked(error);
if (rc && rc != -ENODEV) {
/*
@@ -506,8 +507,21 @@ static int ___sev_platform_init_locked(int *error, bool probe)
dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
}

- /* Delay SEV/SEV-ES support initialization */
- if (probe && !psp_init_on_probe)
+ return rc;
+}
+
+static int __sev_platform_init_locked(int *error)
+{
+ int rc, psp_ret = SEV_RET_NO_FW_CALL;
+ struct psp_device *psp = psp_master;
+ struct sev_device *sev;
+
+ if (!psp || !psp->sev_data)
+ return -ENODEV;
+
+ sev = psp->sev_data;
+
+ if (sev->state == SEV_STATE_INIT)
return 0;

if (!sev_es_tmr) {
@@ -563,33 +577,32 @@ static int ___sev_platform_init_locked(int *error, bool probe)
return 0;
}

-static int __sev_platform_init_locked(int *error)
-{
- return ___sev_platform_init_locked(error, false);
-}
-
-int sev_platform_init(int *error)
+static int _sev_platform_init_locked(int *error, bool probe)
{
int rc;

- mutex_lock(&sev_cmd_mutex);
- rc = __sev_platform_init_locked(error);
- mutex_unlock(&sev_cmd_mutex);
+ rc = __sev_platform_init_snp_locked(error);
+ if (rc)
+ return rc;

- return rc;
+ /* Delay SEV/SEV-ES support initialization */
+ if (probe && !psp_init_on_probe)
+ return 0;
+
+ return __sev_platform_init_locked(error);
}
-EXPORT_SYMBOL_GPL(sev_platform_init);

-static int sev_platform_init_on_probe(int *error)
+int sev_platform_init(int *error)
{
int rc;

mutex_lock(&sev_cmd_mutex);
- rc = ___sev_platform_init_locked(error, true);
+ rc = _sev_platform_init_locked(error, false);
mutex_unlock(&sev_cmd_mutex);

return rc;
}
+EXPORT_SYMBOL_GPL(sev_platform_init);

static int __sev_platform_shutdown_locked(int *error)
{
@@ -691,7 +704,7 @@ static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp, bool wr
return -EPERM;

if (sev->state == SEV_STATE_UNINIT) {
- rc = __sev_platform_init_locked(&argp->error);
+ rc = _sev_platform_init_locked(&argp->error, false);
if (rc)
return rc;
}
@@ -734,7 +747,7 @@ static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp, bool writable)

cmd:
if (sev->state == SEV_STATE_UNINIT) {
- ret = __sev_platform_init_locked(&argp->error);
+ ret = _sev_platform_init_locked(&argp->error, false);
if (ret)
goto e_free_blob;
}
@@ -1115,7 +1128,7 @@ static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)

/* If platform is not in INIT state then transition it to INIT */
if (sev->state != SEV_STATE_INIT) {
- ret = __sev_platform_init_locked(&argp->error);
+ ret = _sev_platform_init_locked(&argp->error, false);
if (ret)
goto e_free_oca;
}
@@ -1246,7 +1259,7 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
if (!writable)
return -EPERM;

- ret = __sev_platform_init_locked(&argp->error);
+ ret = _sev_platform_init_locked(&argp->error, false);
if (ret)
return ret;
}
@@ -1608,7 +1621,9 @@ void sev_pci_init(void)
}

/* Initialize the platform */
- rc = sev_platform_init_on_probe(&error);
+ mutex_lock(&sev_cmd_mutex);
+ rc = _sev_platform_init_locked(&error, true);
+ mutex_unlock(&sev_cmd_mutex);
if (rc)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-11 13:08:48

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list



On 12/8/23 23:10, Kalra, Ashish wrote:
> Hello Vlastimil,
>
> On 12/7/2023 10:20 AM, Vlastimil Babka wrote:
>
>>> +
>>> +void snp_leak_pages(u64 pfn, unsigned int npages)
>>> +{
>>> +    struct page *page = pfn_to_page(pfn);
>>> +
>>> +    pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn,
>>> pfn + npages);
>>> +
>>> +    spin_lock(&snp_leaked_pages_list_lock);
>>> +    while (npages--) {
>>> +        /*
>>> +         * Reuse the page's buddy list for chaining into the leaked
>>> +         * pages list. This page should not be on a free list currently
>>> +         * and is also unsafe to be added to a free list.
>>> +         */
>>> +        list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>>> +        sev_dump_rmpentry(pfn);
>>> +        pfn++;
>>
>> You increment pfn, but not page, which is always pointing to the page
>> of the
>> initial pfn, so need to do page++ too.
>
> Yes, that is a bug and needs to be fixed.
>
>> But that assumes it's all order-0 pages (hard to tell for me whether
>> that's
>> true as we start with a pfn), if there can be compound pages, it would be
>> best to only add the head page and skip the tail pages - it's not
>> expected
>> to use page->buddy_list of tail pages.
>
> Can't we use PageCompound() to check if the page is a compound page and
> then use page->compound_head to get the head page and add it to the
> leaked pages list? I understand the tail pages of compound pages have
> very limited usage.

Yeah that should work. Need to be careful though, should probably only
process head pages and check if the whole compound_order() is within the
range we are to leak, and then leak the head page and advance the loop
by compound_order(). And if we encounter a tail page, it should probably
be just skipped. I'm looking at snp_reclaim_pages(), which seems to
process a number of pages with SEV_CMD_SNP_PAGE_RECLAIM and, once any
fails, calls snp_leak_pages() on the rest. Could that invoke
snp_leak_pages() with the first pfn being a tail page?
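
Something roughly like this, perhaps (untested, and assuming tail pages
can simply be skipped):

	while (npages--) {
		/*
		 * Only chain order-0 and head pages; tail pages are not
		 * expected to use page->buddy_list, so just skip them.
		 */
		if (!PageTail(page))
			list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
		sev_dump_rmpentry(pfn);
		pfn++;
		page++;
	}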

> Thanks,
> Ashish

2023-12-11 13:24:45

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
>
> The recommended is to not use the hugepage for the VMCB, VMSA or
> AVIC backing page for similar reasons. Add a generic allocator that will
> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to

The wording is a bit confusing, as we are not avoiding "using a hugepage"
but AFAIU avoiding using a (4k) page that has a hugepage-aligned physical
address, right?

> be used when SEV-SNP is enabled. Also implement similar handling for the
> VMCB/VMSA pages of nested guests.
>
> Co-developed-by: Marc Orr <[email protected]>
> Signed-off-by: Marc Orr <[email protected]>
> Reported-by: Alper Gun <[email protected]> # for nested VMSA case
> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: squash in nested guest handling from Ashish]
> Signed-off-by: Michael Roth <[email protected]>
> ---

<snip>

> +
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> + unsigned long pfn;
> + struct page *p;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> + /*
> + * Allocate an SNP safe page to workaround the SNP erratum where
> + * the CPU will incorrectly signal an RMP violation #PF if a
> + * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> + * or AVIC backing page. The recommeded workaround is to not use the
> + * hugepage.

Same here "not use the hugepage"

> + *
> + * Allocate one extra page, use a page which is not 2mb aligned
> + * and free the other.

This makes more sense.

> + */
> + p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> + if (!p)
> + return NULL;
> +
> + split_page(p, 1);

Yeah I think that's a sensible use of split_page(), as we don't have
support for forcefully non-aligned allocations or specific "page
coloring" in the page allocator.
So even with my wording concerns:

Acked-by: Vlastimil Babka <[email protected]>

> +
> + pfn = page_to_pfn(p);
> + if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> + __free_page(p++);
> + else
> + __free_page(p + 1);
> +
> + return p;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1e7fb1ea45f7..8e4ef0cd968a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -706,7 +706,7 @@ static int svm_cpu_init(int cpu)
> int ret = -ENOMEM;
>
> memset(sd, 0, sizeof(struct svm_cpu_data));
> - sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
> + sd->save_area = snp_safe_alloc_page(NULL);
> if (!sd->save_area)
> return ret;
>
> @@ -1425,7 +1425,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
> svm = to_svm(vcpu);
>
> err = -ENOMEM;
> - vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + vmcb01_page = snp_safe_alloc_page(vcpu);
> if (!vmcb01_page)
> goto out;
>
> @@ -1434,7 +1434,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
> * SEV-ES guests require a separate VMSA page used to contain
> * the encrypted register state of the guest.
> */
> - vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + vmsa_page = snp_safe_alloc_page(vcpu);
> if (!vmsa_page)
> goto error_free_vmcb_page;
>
> @@ -4876,6 +4876,16 @@ static int svm_vm_init(struct kvm *kvm)
> return 0;
> }
>
> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> +{
> + struct page *page = snp_safe_alloc_page(vcpu);
> +
> + if (!page)
> + return NULL;
> +
> + return page_address(page);
> +}
> +
> static struct kvm_x86_ops svm_x86_ops __initdata = {
> .name = KBUILD_MODNAME,
>
> @@ -5007,6 +5017,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
> + .alloc_apic_backing_page = svm_alloc_apic_backing_page,
> };
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c13070d00910..b7b8bf73cbb9 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -694,6 +694,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
> void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
> void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>
> /* vmenter.S */
>

2023-12-11 21:11:31

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

Hello Boris,

On 12/9/2023 10:20 AM, Borislav Petkov wrote:
> On Wed, Dec 06, 2023 at 02:35:28PM -0600, Kalra, Ashish wrote:
>> The main use case for the probe parameter is to control if we want to do
>> legacy SEV/SEV-ES INIT during probe. There is a usage case where we want to
>> delay legacy SEV INIT till an actual SEV/SEV-ES guest is being launched. So
>> essentially the probe parameter controls if we want to
>> execute __sev_do_init_locked() or not.
>>
>> We always want to do SNP INIT at probe time.
>
> Here's what I mean (diff ontop):
>

See my comments below on this patch:

> +int sev_platform_init(int *error)
> {
> int rc;
>
> mutex_lock(&sev_cmd_mutex);
> - rc = ___sev_platform_init_locked(error, true);
> + rc = _sev_platform_init_locked(error, false);
> mutex_unlock(&sev_cmd_mutex);
>
> return rc;
> }
> +EXPORT_SYMBOL_GPL(sev_platform_init);
>

What we need is a mechanism to do legacy SEV/SEV-ES INIT only if an
SEV/SEV-ES guest is being launched. Hence, we want an additional
parameter added to the exported sev_platform_init() interface so that
the kvm_amd module can call it during guest launch and indicate whether
an SNP or a legacy guest is being launched.

That's the reason we want to add the probe parameter to
sev_platform_init().

And to address your previous comments, this will remain a clean
interface; there are going to be only two functions:
sev_platform_init() and __sev_platform_init_locked().
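
Roughly, the shape we have in mind would be the following (the
parameter name here is illustrative, not final):

int sev_platform_init(int *error, bool probe)
{
    int rc;

    mutex_lock(&sev_cmd_mutex);
    /*
     * At probe time only SNP INIT is done; legacy SEV/SEV-ES INIT is
     * deferred until kvm_amd actually launches a SEV/SEV-ES guest.
     */
    rc = __sev_platform_init_locked(error, probe);
    mutex_unlock(&sev_cmd_mutex);

    return rc;
}
EXPORT_SYMBOL_GPL(sev_platform_init);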

Thanks,
Ashish

2023-12-12 00:00:24

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

Hello Vlastimil,

On 12/11/2023 7:24 AM, Vlastimil Babka wrote:
> On 10/16/23 15:27, Michael Roth wrote:
>> From: Brijesh Singh <[email protected]>
>>
>> Implement a workaround for an SNP erratum where the CPU will incorrectly
>> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
>> RMP entry of a VMCB, VMSA or AVIC backing page.
>>
>> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
>> backing pages as "in-use" via a reserved bit in the corresponding RMP
>> entry after a successful VMRUN. This is done for _all_ VMs, not just
>> SNP-Active VMs.
>>
>> If the hypervisor accesses an in-use page through a writable
>> translation, the CPU will throw an RMP violation #PF. On early SNP
>> hardware, if an in-use page is 2mb aligned and software accesses any
>> part of the associated 2mb region with a hupage, the CPU will
>> incorrectly treat the entire 2mb region as in-use and signal a spurious
>> RMP violation #PF.
>>
>> The recommended is to not use the hugepage for the VMCB, VMSA or
>> AVIC backing page for similar reasons. Add a generic allocator that will
>> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to
>
> This is a bit confusing wording as we are not avoiding "using a
> hugepage" but AFAIU, avoiding using a (4k) page that has a hugepage
> aligned physical address, right?

Yes.

>
>> be used when SEV-SNP is enabled. Also implement similar handling for the
>> VMCB/VMSA pages of nested guests.
>>
>> Co-developed-by: Marc Orr <[email protected]>
>> Signed-off-by: Marc Orr <[email protected]>
>> Reported-by: Alper Gun <[email protected]> # for nested VMSA case
>> Co-developed-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> [mdr: squash in nested guest handling from Ashish]
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>
> <snip>
>
>> +
>> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
>> +{
>> + unsigned long pfn;
>> + struct page *p;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> + return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +
>> + /*
>> + * Allocate an SNP safe page to workaround the SNP erratum where
>> + * the CPU will incorrectly signal an RMP violation #PF if a
>> + * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
>> + * or AVIC backing page. The recommeded workaround is to not use the
>> + * hugepage.
>
> Same here "not use the hugepage"
>
>> + *
>> + * Allocate one extra page, use a page which is not 2mb aligned
>> + * and free the other.
>
> This makes more sense.
>
>> + */
>> + p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
>> + if (!p)
>> + return NULL;
>> +
>> + split_page(p, 1);
>
> Yeah I think that's a sensible use of split_page(), as we don't have
> support for forcefully non-aligned allocations or specific "page
> coloring" in the page allocator.

Yes, using split_page() allows us to free the additionally allocated
page individually.

Thanks,
Ashish

> So even with my wording concerns:
>
> Acked-by: Vlastimil Babka <[email protected]>

2023-12-12 06:54:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

On Mon, Dec 11, 2023 at 03:11:17PM -0600, Kalra, Ashish wrote:
> What we need is a mechanism to do legacy SEV/SEV-ES INIT only if a
> SEV/SEV-ES guest is being launched, hence, we want an additional parameter
> added to sev_platform_init() exported interface so that kvm_amd module can
> call this interface during guest launch and indicate if SNP/legacy guest is
> being launched.
>
> That's the reason we want to add the probe parameter to
> sev_platform_init().

That's not what your original patch does and nowhere in the whole
patchset do I see this new requirement for KVM to be able to control the
probing.

The probe param is added to ___sev_platform_init_locked() which is
called by this new sev_platform_init_on_probe() thing to signal that
whatever calls this, it wants the probing.

And "whatever" is sev_pci_init() which is called from the bowels of the
secure processor drivers. Suffice it to say, this is some sort of an
init path.

So, it wants to init SNP stuff which is unconditional during driver init
- not when KVM starts guests - and probe too on driver init time, *iff*
that psp_init_on_probe thing is set. Which looks suspicious to me:

"Add psp_init_on_probe module parameter that allows for skipping the
PSP's SEV platform initialization during module init. User may decouple
module init from PSP init due to use of the INIT_EX support in upcoming
patch which allows for users to save PSP's internal state to file."

From b64fa5fc9f44 ("crypto: ccp - Add psp_init_on_probe module
parameter").

And reading about INIT_EX, "This command loads the SEV related
persistent data from user-supplied data and initializes the platform
context."

So it sounds like the HV vendor wants to supply something itself. But then
looking at init_ex_path and open_file_as_root() makes me cringe.
I would've never done it this way: we have request_firmware* etc helpers
for loading blobs from userspace which are widely used. But then reading

3d725965f836 ("crypto: ccp - Add SEV_INIT_EX support")

that increases the cringe factor even more because that also wants to
*write* into that file. Maybe there were good reasons to do it this way
- it is still yucky for my taste tho...
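
For reference, the loading half is a few lines with the firmware loader
(the blob name below is made up); the part that doesn't map onto this
API is writing the PSP state back out:

#include <linux/firmware.h>

static int sev_load_init_ex_blob(struct device *dev)
{
    const struct firmware *fw;
    int ret;

    ret = request_firmware(&fw, "amd/sev_init_ex.bin", dev);
    if (ret)
        return ret;

    /* ... copy fw->data (fw->size bytes) into the INIT_EX buffer ... */

    release_firmware(fw);
    return 0;
}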

But I digress - whatever you want to do, the right approach is to split
the functionality:

SNP init
legacy SEV init

and to call them from a wrapper function around it which determines
which ones need to get called depending on that delayed probe thing.

Lumping everything together and handing a silly bool downwards is
already turning into a mess.

Now, looking at sev_guest_init() which calls sev_platform_init(): if
you want to pass back'n'forth more information than just that &error
pointer, then you can define your own struct sev_platform_init_info or
so, which you preset before calling sev_platform_init() and pass in
a pointer to it.

And in it you can stick &error, bool probe or whatever else you need to
control what the platform needs to do upon init. And if you need to
extend that in the future, you can add new struct members and so on.
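
IOW, roughly something like this (the field set is illustrative):

struct sev_platform_init_info {
    int error;   /* out: SEV firmware error code */
    bool probe;  /* in: probe-time init vs. guest launch */
    /* future control/result members go here */
};

int sev_platform_init(struct sev_platform_init_info *info);

/* e.g. from sev_guest_init(): */
static int sev_guest_init_example(void)
{
    struct sev_platform_init_info info = { .probe = false };

    return sev_platform_init(&info);
}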

HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-12 17:03:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 21/50] KVM: SEV: Add support to handle AP reset MSR protocol

On Mon, Oct 16, 2023 at 08:27:50AM -0500, Michael Roth wrote:
> From: Tom Lendacky <[email protected]>
>
> Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
> available in version 2 of the GHCB specification.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/sev-common.h | 2 ++
> arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
> arch/x86/kvm/svm/svm.h | 1 +
> 3 files changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 93ec8c12c91d..57ced29264ce 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -56,6 +56,8 @@
> /* AP Reset Hold */
> #define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
> #define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
> +#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
> +#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

Align vertically pls.

> /* GHCB GPA Register */
> #define GHCB_MSR_REG_GPA_REQ 0x012
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6ee925d66648..4f895a7201ed 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -65,6 +65,10 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
> #define sev_es_debug_swap_enabled false
> #endif /* CONFIG_KVM_AMD_SEV */
>
> +#define AP_RESET_HOLD_NONE 0
> +#define AP_RESET_HOLD_NAE_EVENT 1
> +#define AP_RESET_HOLD_MSR_PROTO 2
> +
> static u8 sev_enc_bit;
> static DECLARE_RWSEM(sev_deactivate_lock);
> static DEFINE_MUTEX(sev_bitmap_lock);
> @@ -2594,6 +2598,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>
> void sev_es_unmap_ghcb(struct vcpu_svm *svm)
> {
> + /* Clear any indication that the vCPU is in a type of AP Reset Hold */
> + svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
> +
> if (!svm->sev_es.ghcb)
> return;
>
> @@ -2805,6 +2812,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> GHCB_MSR_INFO_POS);
> break;
> }
> + case GHCB_MSR_AP_RESET_HOLD_REQ:
> + svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
> + ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
> +
> + /*
> + * Preset the result to a non-SIPI return and then only set
> + * the result to non-zero when delivering a SIPI.
> + */
> + set_ghcb_msr_bits(svm, 0,
> + GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
> + GHCB_MSR_AP_RESET_HOLD_RESULT_POS);

Yikes, those defines are a mouthful.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-12 23:26:50

by Ashish Kalra

[permalink] [raw]
Subject: Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list

Hello Vlastimil,

On 12/11/2023 7:08 AM, Vlastimil Babka wrote:
>
>
> On 12/8/23 23:10, Kalra, Ashish wrote:
>> Hello Vlastimil,
>>
>> On 12/7/2023 10:20 AM, Vlastimil Babka wrote:
>>
>>>> +
>>>> +void snp_leak_pages(u64 pfn, unsigned int npages)
>>>> +{
>>>> +    struct page *page = pfn_to_page(pfn);
>>>> +
>>>> +    pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn,
>>>> pfn + npages);
>>>> +
>>>> +    spin_lock(&snp_leaked_pages_list_lock);
>>>> +    while (npages--) {
>>>> +        /*
>>>> +         * Reuse the page's buddy list for chaining into the leaked
>>>> +         * pages list. This page should not be on a free list currently
>>>> +         * and is also unsafe to be added to a free list.
>>>> +         */
>>>> +        list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>>>> +        sev_dump_rmpentry(pfn);
>>>> +        pfn++;
>>>
>>> You increment pfn, but not page, which is always pointing to the page
>>> of the
>>> initial pfn, so need to do page++ too.
>>
>> Yes, that is a bug and needs to be fixed.
>>
>>> But that assumes it's all order-0 pages (hard to tell for me whether
>>> that's
>>> true as we start with a pfn), if there can be compound pages, it would be
>>> best to only add the head page and skip the tail pages - it's not
>>> expected
>>> to use page->buddy_list of tail pages.
>>
>> Can't we use PageCompound() to check if the page is a compound page and
>> then use page->compound_head to get and add the head page to leaked
>> pages list. I understand the tail pages for compound pages are really
>> limited for usage.
>
> Yeah that should work. Need to be careful though, should probably only
> process head pages and check if the whole compound_order() is within the
> range we are to leak, and then leak the head page and advance the loop
> by compound_order(). And if we encounter a tail page, it should probably
> be just skipped. I'm looking at snp_reclaim_pages() which seems to
> process a number of pages with SEV_CMD_SNP_PAGE_RECLAIM and once any
> fails, call snp_leak_pages() on the rest. Could that invoke
> snp_leak_pages with the first pfn being a tail page?

Yes, I don't think we can assume that the first pfn will not be a tail
page. But then this becomes complex, as we might have already reclaimed
the head page and one or more tail pages successfully, or the head page
may never have transitioned to FW state at all, since
alloc_page()/alloc_pages() would have returned subpage(s) of a largepage.

But then we really can't use the buddy_list of a tail page to insert it
in the snp leaked pages list, right?

These non-reclaimed pages are not usable anymore anyway, since any access
to them will cause a fatal RMP #PF, but I don't know if I can use the
buddy_list to insert tail pages, as that would corrupt the page metadata?

We initially used to invoke memory_failure() here to try to gracefully
handle failure of these non-reclaimed pages, and that used to handle
hugepages, etc., but as pointed out in previous review feedback that is
not a logical approach for this, as it's meant more for the RAS stuff.

Maybe it is a simpler approach to have our own container object on top,
with the page pointer and a list_head in it, and use that list_head to
insert into the SNP leaked pages list instead of re-using the buddy_list
for chaining?
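
Something like this rough sketch of the container idea (names made up,
not from the patchset; a real version would also need to mind the
calling context for the allocation):

struct snp_leaked_page {
    struct list_head list;  /* chains into snp_leaked_pages_list */
    struct page *page;
    u64 pfn;
};

static void snp_leak_one_page(u64 pfn)
{
    struct snp_leaked_page *lp = kzalloc(sizeof(*lp), GFP_KERNEL);

    /* If even this small allocation fails, the page is still leaked,
     * it just won't be tracked on the list. */
    if (!lp)
        return;

    lp->page = pfn_to_page(pfn);
    lp->pfn = pfn;

    spin_lock(&snp_leaked_pages_list_lock);
    list_add_tail(&lp->list, &snp_leaked_pages_list);
    spin_unlock(&snp_leaked_pages_list_lock);
}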

Thanks,
Ashish


2023-12-13 12:51:02

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests

On 10/16/23 15:27, Michael Roth wrote:
> Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
> if the host/guest configuration allows it. If the host/guest
> configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
> that it can be caught by the existing checks in
> kvm_{set,get}_msr_common() if the guest still attempts to access it.

This is wrong, because it allows the guest to do untrapped writes to
MSR_IA32_XSS and therefore (via XRSTORS) to MSRs that the host might not
save or restore.

If the processor cannot let the host validate writes to MSR_IA32_XSS,
KVM simply cannot expose XSAVES to SEV-ES (and SEV-SNP) guests.

Because SVM doesn't provide a way to disable just XSAVES in the guest,
all that KVM can do is keep on trapping MSR_IA32_XSS (which the guest
shouldn't read or write to). In other words the crash on accesses to
MSR_IA32_XSS is not a bug but a feature (of the hypervisor, that
wants/needs to protect itself just as much as the guest wants to).

The bug is that there is no API to tell userspace "do not enable this
and that CPUID for SEV guests", there is only the extremely limited
KVM_GET_SUPPORTED_CPUID system ioctl.

For now, all we can do is document our wishes, with which userspace had
better comply. Please send a patch to QEMU that makes it obey.

Paolo

--------------------------- 8< -----------------------
From 303e66472ddf54c2a945588b133d34eaab291257 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <[email protected]>
Date: Wed, 13 Dec 2023 07:45:08 -0500
Subject: [PATCH] Documentation: KVM: suggest disabling XSAVES on SEV-ES guests

When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.

However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, only CET/shadowstack makes use
of this register on SEV-ES-capable systems currently, which isn't yet
widely used, but this may become more of an issue in the future.

Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.

Unfortunately, there is not really a way to fix this issue; allowing
unfiltered access to MSR_IA32_XSS also lets the guest write (via
XRSTORS) MSRs that the host might not be ready to save or restore.
Because SVM doesn't provide a way to disable just XSAVES in the guest,
all that KVM can do to protect itself is keep on trapping MSR_IA32_XSS.
Userspace has to comply and not enable XSAVES in CPUID, so that the
guest has no business accessing MSR_IA32_XSS at all.

Unfortunately^2, there is no API to tell userspace "do not enable this
and that CPUID for SEV guests", there is only the extremely limited
KVM_GET_SUPPORTED_CPUID system ioctl. So all we can do for now is
document it.

Reported-by: Michael Roth <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 49a05f24747b..0c91916c0164 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -33,6 +33,15 @@ Note however that any software (e.g ``WIN87EM.DLL``) expecting these features
to be present likely predates these CPUID feature bits, and therefore
doesn't know to check for them anyway.

+Encrypted guests
+~~~~~~~~~~~~~~~~
+
+For SEV-ES guests, it is impossible for KVM to validate writes for MSRs that
+are part of the VMSA. In the case of MSR_IA32_XSS, however, KVM needs to
+validate writes to the MSR in order to prevent the guest from using XRSTORS
+to overwrite host MSRs. Therefore, the XSAVES feature should never be exposed
+to SEV-ES guests.
+
Nested virtualization features
------------------------------



2023-12-13 12:52:37

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Add CPU feature detection for Secure Encrypted Virtualization with
> Secure Nested Paging. This feature adds a strong memory integrity
> protection to help prevent malicious hypervisor-based attacks like
> data replay, memory re-mapping, and more.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Jarkko Sakkinen <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Michael Roth <[email protected]>

Queued, thanks.

Paolo

> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/kernel/cpu/amd.c | 5 +++--
> tools/arch/x86/include/asm/cpufeatures.h | 1 +
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 58cb9495e40f..1640cedd77f1 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -437,6 +437,7 @@
> #define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
> #define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
> #define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index dd8379d84445..14ee7f750cc7 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -630,8 +630,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> * SME feature (set in scattered.c).
> * If the kernel has not enabled SME via any means then
> * don't advertise the SME feature.
> - * For SEV: If BIOS has not enabled SEV then don't advertise the
> - * SEV and SEV_ES feature (set in scattered.c).
> + * For SEV: If BIOS has not enabled SEV then don't advertise SEV and
> + * any additional functionality based on it.
> *
> * In all cases, since support for SME and SEV requires long mode,
> * don't advertise the feature under CONFIG_X86_32.
> @@ -666,6 +666,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> clear_sev:
> setup_clear_cpu_cap(X86_FEATURE_SEV);
> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> }
> }
>
> diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
> index 798e60b5454b..669f45eefa0c 100644
> --- a/tools/arch/x86/include/asm/cpufeatures.h
> +++ b/tools/arch/x86/include/asm/cpufeatures.h
> @@ -432,6 +432,7 @@
> #define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
> #define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
> #define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */


2023-12-13 12:52:44

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled

On 10/16/23 15:27, Michael Roth wrote:
> From: Kim Phillips <[email protected]>
>
> Without SEV-SNP, Automatic IBRS protects only the kernel. But when
> SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
> host-side code, including userspace. This protection comes at a cost:
> reduced userspace indirect branch performance.
>
> To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
> hosts. Fall back to retpolines instead.
>
> Signed-off-by: Kim Phillips <[email protected]>
> [mdr: squash in changes from review discussion]
> Signed-off-by: Michael Roth <[email protected]>

Queued, thanks.

Paolo

> ---
> arch/x86/kernel/cpu/common.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 382d4e6b848d..11fae89b799e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1357,8 +1357,13 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
> /*
> * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
> * flag and protect from vendor-specific bugs via the whitelist.
> + *
> + * Don't use AutoIBRS when SNP is enabled because it degrades host
> + * userspace indirect branch performance.
> */
> - if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
> + if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
> + (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
> + !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
> setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
> if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
> !(ia32_cap & ARCH_CAP_PBRSB_NO))


2023-12-13 13:31:20

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

On 12/13/23 14:13, Borislav Petkov wrote:
> On Wed, Dec 13, 2023 at 01:51:58PM +0100, Paolo Bonzini wrote:
>> On 10/16/23 15:27, Michael Roth wrote:
>>> From: Brijesh Singh <[email protected]>
>>>
>>> Add CPU feature detection for Secure Encrypted Virtualization with
>>> Secure Nested Paging. This feature adds a strong memory integrity
>>> protection to help prevent malicious hypervisor-based attacks like
>>> data replay, memory re-mapping, and more.
>>>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> Signed-off-by: Jarkko Sakkinen <[email protected]>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> Signed-off-by: Michael Roth <[email protected]>
>>
>> Queued, thanks.
>
> Paolo, please stop queueing x86 patches through your tree. I'll give you
> an immutable branch with the x86 bits when the stuff has been reviewed.

Sure, I only queued it because you gave Acked-by for 05/50 and this is
an obvious dependency. I would like to get things in as they are ready
(whenever it makes sense), so if you want to include those two in the
x86 tree for 6.8, that would work for me.

Paolo


2023-12-13 13:32:03

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.

I don't understand if this can happen even if SEV-SNP is not in use,
just because it is supported on the host? If so, should this be Cc'd to
stable? (I can tweak the wording and submit it).

Paolo


2023-12-13 13:40:54

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

On 12/13/23 14:36, Borislav Petkov wrote:
>> Sure, I only queued it because you gave Acked-by for 05/50 and this is an
>> obvious dependency. I would like to get things in as they are ready
>> (whenever it makes sense), so if you want to include those two in the x86
>> tree for 6.8, that would work for me.
>
> It doesn't make sense to include them into 6.8 because the two alone are
> simply dead code in 6.8.

Why are they dead code? X86_FEATURE_SEV_SNP is set automatically based
on CPUID, therefore patch 5 is a performance improvement on all
processors that support SEV-SNP. This is independent of whether KVM can
create SEV-SNP guests or not.

If this is wrong, there is a problem in the commit messages.

Paolo

> The plan is to put the x86 patches first in the next submission, I'll
> pick them up for 6.9 and then give you an immutable branch to apply the
> KVM bits ontop. This way it all goes together.


2023-12-13 14:18:32

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

On 12/13/23 14:49, Borislav Petkov wrote:
> On Wed, Dec 13, 2023 at 02:40:24PM +0100, Paolo Bonzini wrote:
>> Why are they dead code? X86_FEATURE_SEV_SNP is set automatically based on
>> CPUID, therefore patch 5 is a performance improvement on all processors that
>> support SEV-SNP. This is independent of whether KVM can create SEV-SNP
>> guests or not.
>
> No, it is not. This CPUID bit means:
>
> "RMP table can be enabled to protect memory even from hypervisor."
>
> Without the SNP host patches, it is dead code.

- if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+ if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
+ (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+ !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {

Surely we can agree that cpu_feature_enabled(X86_FEATURE_SEV_SNP) has nothing
to do with SEV-SNP host patches being present? And that therefore retpolines
are preferred even without any SEV-SNP support in KVM?

And can we agree that "Acked-by" means "feel free and take it if you wish,
I don't care enough to merge it through my tree or provide a topic branch"?

I'm asking because I'm not sure if we agree on these two things, but they
really seem basic to me?

Paolo

> And regardless, arch/x86/kvm/ patches go through the kvm tree. The rest
> of arch/x86/ through the tip tree. We've been over this a bunch of times
> already.


> If you don't agree with this split, let's discuss it offlist with all
> tip and kvm maintainers, reach an agreement who picks up what and to put
> an end to this nonsense.
>
> Thx.
>


2023-12-13 17:35:56

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

On 12/13/23 16:41, Borislav Petkov wrote:
> On Wed, Dec 13, 2023 at 03:18:17PM +0100, Paolo Bonzini wrote:
>> Surely we can agree that cpu_feature_enabled(X86_FEATURE_SEV_SNP) has nothing
>> to do with SEV-SNP host patches being present?
>
> It does - we're sanitizing the meaning of a CPUID flag present in
> /proc/cpuinfo, see here:
>
> https://git.kernel.org/tip/79c603ee43b2674fba0257803bab265147821955
>
>> And that therefore retpolines are preferred even without any SEV-SNP
>> support in KVM?
>
> No, automatic IBRS should be disabled when SNP is enabled. Not CPUID
> present - enabled.

Ok, so the root cause of the problem is commit message/patch ordering:

1) patch 4 should have unconditionally cleared the feature (until the
initialization code comes around in patch 6); and it should have
mentioned in the commit message that we don't want X86_FEATURE_SEV_SNP
to be set, unless SNP can be enabled via MSR_AMD64_SYSCFG.

2) possibly, the commit message of patch 5 could have said something
like "at this point in the kernel SNP is never enabled".

3) Patch 23 should have been placed before the SNP initialization,
because as things stand the patches (mildly) break bisectability.

> We clear that bit on a couple of occasions in the SNP
> host patchset if we determine that SNP host support is not possible so
> 4/50 needs to go together with the rest to mean something.

Understood now. With the patch ordering and commit message edits I
suggested above, indeed I would not have picked up patch 4.

But with your explanation, I would even say that "4/50 needs to go
together with the rest" *for correctness*, not just to mean something.

Paolo


2023-12-13 17:41:10

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests

On 12/13/23 18:30, Sean Christopherson wrote:
>> For now, all we can do is document our wishes, with which userspace had
>> better comply. Please send a patch to QEMU that makes it obey.
> Discussed this early today with Paolo at PUCK and pointed out that (a) the CPU
> context switches the underlying state, (b) SVM doesn't allow intercepting *just*
> XSAVES, and (c) SNP's AP creation can bypass XSS interception.
>
> So while we all (all == KVM folks) agree that this is rather terrifying, e.g.
> gives KVM zero option if there is a hardware issue, it's "fine" to let the guest
> use XSAVES/XSS.

Indeed; looks like I've got to queue this for 6.7 after all.

Paolo


2023-12-13 18:45:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
>
> The recommended is to not use the hugepage for the VMCB, VMSA or
> AVIC backing page for similar reasons. Add a generic allocator that will
> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to
> be used when SEV-SNP is enabled. Also implement similar handling for the
> VMCB/VMSA pages of nested guests.
>
> Co-developed-by: Marc Orr <[email protected]>
> Signed-off-by: Marc Orr <[email protected]>
> Reported-by: Alper Gun <[email protected]> # for nested VMSA case
> Co-developed-by: Ashish Kalra <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> [mdr: squash in nested guest handling from Ashish]
> Signed-off-by: Michael Roth <[email protected]>

Based on the discussion with Borislav, please move this earlier in the
series, before patch 6.

Paolo

> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/lapic.c | 5 ++++-
> arch/x86/kvm/svm/nested.c | 2 +-
> arch/x86/kvm/svm/sev.c | 33 ++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 17 ++++++++++++---
> arch/x86/kvm/svm/svm.h | 1 +
> 7 files changed, 55 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index f1505a5fa781..4ef2eca14287 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -136,6 +136,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
> KVM_X86_OP_OPTIONAL(gmem_invalidate)
> +KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
>
> #undef KVM_X86_OP
> #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index fa401cb1a552..a3983271ea28 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1763,6 +1763,7 @@ struct kvm_x86_ops {
>
> int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> + void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> };
>
> struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index dcd60b39e794..631a554c0f48 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2810,7 +2810,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
>
> vcpu->arch.apic = apic;
>
> - apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> + if (kvm_x86_ops.alloc_apic_backing_page)
> + apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
> + else
> + apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> if (!apic->regs) {
> printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
> vcpu->vcpu_id);
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index dd496c9e5f91..1f9a3f9eb985 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1194,7 +1194,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
> if (svm->nested.initialized)
> return 0;
>
> - vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + vmcb02_page = snp_safe_alloc_page(&svm->vcpu);
> if (!vmcb02_page)
> return -ENOMEM;
> svm->nested.vmcb02.ptr = page_address(vmcb02_page);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 088b32657f46..1cfb9232fc74 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3211,3 +3211,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> break;
> }
> }
> +
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> + unsigned long pfn;
> + struct page *p;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> + /*
> + * Allocate an SNP safe page to workaround the SNP erratum where
> + * the CPU will incorrectly signal an RMP violation #PF if a
> + * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> + * or AVIC backing page. The recommeded workaround is to not use the
> + * hugepage.
> + *
> + * Allocate one extra page, use a page which is not 2mb aligned
> + * and free the other.
> + */
> + p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> + if (!p)
> + return NULL;
> +
> + split_page(p, 1);
> +
> + pfn = page_to_pfn(p);
> + if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> + __free_page(p++);
> + else
> + __free_page(p + 1);
> +
> + return p;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1e7fb1ea45f7..8e4ef0cd968a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -706,7 +706,7 @@ static int svm_cpu_init(int cpu)
> int ret = -ENOMEM;
>
> memset(sd, 0, sizeof(struct svm_cpu_data));
> - sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
> + sd->save_area = snp_safe_alloc_page(NULL);
> if (!sd->save_area)
> return ret;
>
> @@ -1425,7 +1425,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
> svm = to_svm(vcpu);
>
> err = -ENOMEM;
> - vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + vmcb01_page = snp_safe_alloc_page(vcpu);
> if (!vmcb01_page)
> goto out;
>
> @@ -1434,7 +1434,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
> * SEV-ES guests require a separate VMSA page used to contain
> * the encrypted register state of the guest.
> */
> - vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + vmsa_page = snp_safe_alloc_page(vcpu);
> if (!vmsa_page)
> goto error_free_vmcb_page;
>
> @@ -4876,6 +4876,16 @@ static int svm_vm_init(struct kvm *kvm)
> return 0;
> }
>
> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> +{
> + struct page *page = snp_safe_alloc_page(vcpu);
> +
> + if (!page)
> + return NULL;
> +
> + return page_address(page);
> +}
> +
> static struct kvm_x86_ops svm_x86_ops __initdata = {
> .name = KBUILD_MODNAME,
>
> @@ -5007,6 +5017,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
> + .alloc_apic_backing_page = svm_alloc_apic_backing_page,
> };
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c13070d00910..b7b8bf73cbb9 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -694,6 +694,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
> void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
> void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>
> /* vmenter.S */
>


2023-12-14 15:26:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

On Wed, Dec 13, 2023 at 02:31:05PM +0100, Paolo Bonzini wrote:
> Sure, I only queued it because you gave Acked-by for 05/50 and this is an
> obvious dependency. I would like to get things in as they are ready
> (whenever it makes sense), so if you want to include those two in the x86
> tree for 6.8, that would work for me.

It doesn't make sense to include them into 6.8 because the two alone are
simply dead code in 6.8.

The plan is to put the x86 patches first in the next submission, I'll
pick them up for 6.9 and then give you an immutable branch to apply the
KVM bits ontop. This way it all goes together.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-15 15:40:53

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature

On Wed, Dec 13, 2023 at 03:18:17PM +0100, Paolo Bonzini wrote:
> Surely we can agree that cpu_feature_enabled(X86_FEATURE_SEV_SNP) has nothing
> to do with SEV-SNP host patches being present?

It does - we're sanitizing the meaning of a CPUID flag present in
/proc/cpuinfo, see here:

https://git.kernel.org/tip/79c603ee43b2674fba0257803bab265147821955

> And that therefore retpolines are preferred even without any SEV-SNP
> support in KVM?

No, automatic IBRS should be disabled when SNP is enabled. Not CPUID
present - enabled. We clear that bit on a couple of occasions in the SNP
host patchset if we determine that SNP host support is not possible so
4/50 needs to go together with the rest to mean something.

> And can we agree that "Acked-by" means "feel free and take it if you wish,

I can see how it can mean that and I'm sorry for the misunderstanding
I caused. Two things here:

* I acked it because I did some lengthy digging internally on whether
disabling AIBRS makes sense on SNP, and this was a note more to myself to
say, yes, that's a good change.

* If I wanted for you to pick it up, I would've acked 4/50 too. Which
I haven't.

> I'm asking because I'm not sure if we agree on these two things, but they
> really seem basic to me?

I think KVM and x86 maintainers should sit down and discuss who picks up
what and through which tree so that there's no more confusion in the
future. It seems things need discussion...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-18 10:24:54

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 22/50] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests

On Mon, Oct 16, 2023 at 08:27:51AM -0500, Michael Roth wrote:
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 4f895a7201ed..088b32657f46 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2568,6 +2568,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> case SVM_VMGEXIT_AP_HLT_LOOP:
> case SVM_VMGEXIT_AP_JUMP_TABLE:
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> + case SVM_VMGEXIT_HV_FEATURES:
> break;
> default:
> reason = GHCB_ERR_INVALID_EVENT;
> @@ -2828,6 +2829,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> GHCB_MSR_INFO_MASK,
> GHCB_MSR_INFO_POS);
> break;
> + case GHCB_MSR_HV_FT_REQ: {
^^^

No need to have a statement block here. Neither below.
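
IOW, just:

    case GHCB_MSR_HV_FT_REQ:
        set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
                          GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
        set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
                          GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
        break;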


> + set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
> + GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
> + set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
> + GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
> + break;
> + }
> case GHCB_MSR_TERM_REQ: {
> u64 reason_set, reason_code;
>
> @@ -2952,6 +2960,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
> ret = 1;
> break;
> }
> + case SVM_VMGEXIT_HV_FEATURES: {
^^^^

> + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
> +
> + ret = 1;
> + break;
> + }
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-18 15:46:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe


Just typos:

On Mon, Oct 16, 2023 at 08:27:52AM -0500, Michael Roth wrote:
> From: Brijesh Singh <[email protected]>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will

"hugepage"

> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
>
> The recommended is to not use the hugepage for the VMCB, VMSA or

s/recommended/recommendation/
s/the hugepage/a hugepage/

> AVIC backing page for similar reasons. Add a generic allocator that will
> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to

"... the page returned is not a hugepage..."

...

> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> + unsigned long pfn;
> + struct page *p;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> + /*
> + * Allocate an SNP safe page to workaround the SNP erratum where
> + * the CPU will incorrectly signal an RMP violation #PF if a
> + * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> + * or AVIC backing page. The recommeded workaround is to not use the

"recommended"

> + * hugepage.
> + *
> + * Allocate one extra page, use a page which is not 2mb aligned
> + * and free the other.
> + */
> + p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> + if (!p)
> + return NULL;
> +
> + split_page(p, 1);
> +
> + pfn = page_to_pfn(p);
> + if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> + __free_page(p++);
> + else
> + __free_page(p + 1);
> +
> + return p;
> +}

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-19 21:29:58

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers

On Tue, Nov 14, 2023 at 03:24:42PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:36AM -0500, Michael Roth wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
>
> $ git grep snp_lookup_page_in_rmptable
> $
>
> Stale commit message. And not very telling. Please rewrite.
>
> > entry for a given page. The RMP entry format is documented in AMD PPR, see
> > https://bugzilla.kernel.org/attachment.cgi?id=296015.
>
> <--- Brijesh's SOB comes first here if he's the primary author.
>
> > Co-developed-by: Ashish Kalra <[email protected]>
> > Signed-off-by: Ashish Kalra <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > [mdr: separate 'assigned' indicator from return code]
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/include/asm/sev-common.h | 4 +++
> > arch/x86/include/asm/sev-host.h | 22 +++++++++++++
> > arch/x86/virt/svm/sev.c | 53 +++++++++++++++++++++++++++++++
> > 3 files changed, 79 insertions(+)
> > create mode 100644 arch/x86/include/asm/sev-host.h
> >
> > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > index b463fcbd4b90..1e6fb93d8ab0 100644
> > --- a/arch/x86/include/asm/sev-common.h
> > +++ b/arch/x86/include/asm/sev-common.h
> > @@ -173,4 +173,8 @@ struct snp_psc_desc {
> > #define GHCB_ERR_INVALID_INPUT 5
> > #define GHCB_ERR_INVALID_EVENT 6
> >
> > +/* RMP page size */
> > +#define RMP_PG_SIZE_4K 0
>
> RMP_PG_LEVEL_4K just like the generic ones.

I've moved this to sev.h, but RMP_PG_SIZE_4K is already defined there
and used by a bunch of guest code, so it's a bit out-of-place to update
those as part of this patchset. I can send a follow-up series to clean up
some of the naming and get rid of sev-common.h.

>
> > +#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
>
> What else is there besides X86 PG level?
>
> IOW, RMP_TO_PG_LEVEL simply.

Makes sense.
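
i.e. something like:

#define RMP_TO_PG_LEVEL(level)  (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)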

>
> > +
> > #endif
> > diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
>
> Nah, we don't need a third sev header:
>
> arch/x86/include/asm/sev-common.h
> arch/x86/include/asm/sev.h
> arch/x86/include/asm/sev-host.h
>
> Put it in sev.h pls.

Done.

>
> sev-common.h should be merged into sev.h too unless there's a compelling
> reason not to which I don't see atm.

Doesn't seem like it would be an issue; maybe some fallout from any
files that previously only included sev-common.h and now need to pull in
guest struct definitions as well, but those definitions don't have a lot
of external dependencies, so I don't anticipate any header-include
hellishness. I'll send that as a separate follow-up, along with some of
the renames you suggested above, since they'll touch guest code and
create unnecessary churn for SNP host support.

Thanks,

Mike

> > --
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2023-12-19 21:30:54

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On Tue, Nov 21, 2023 at 05:21:49PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:40AM -0500, Michael Roth wrote:
> > +static int rmpupdate(u64 pfn, struct rmp_state *val)
>
> rmp_state *state
>
> so that it is clear what this is.
>
> > +{
> > + unsigned long paddr = pfn << PAGE_SHIFT;
> > + int ret, level, npages;
> > + int attempts = 0;
> > +
> > + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> > + return -ENXIO;
> > +
> > + do {
> > + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> > + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> > + : "=a"(ret)
> > + : "a"(paddr), "c"((unsigned long)val)
>
> Add an empty space between the " and the (
>
> > + : "memory", "cc");
> > +
> > + attempts++;
> > + } while (ret == RMPUPDATE_FAIL_OVERLAP);
>
> What's the logic here? Loop as long as it says "overlap"?
>
> How "transient" is that overlapping condition?
>
> What's the upper limit of that loop?
>
> This loop should check a generously chosen upper limit of attempts and
> then break if that limit is reached.

We've raised similar questions to David Kaplan and discussed this to a
fair degree.

The transient condition here is due to firmware locking the 2MB-aligned
RMP entry for the range to handle atomic updates. There is no upper bound
on retries or the amount of time spent, but it is always transient since
multiple hypervisor implementations now depend on this and any deviation
from this assurance would constitute a firmware regression.

A good torture test for this path is lots of 4K-only guests doing
concurrent boot/shutdowns in a tight loop. With week-long runs the
longest delay seen was on the order of 100ns, but there's no real
correlation between time spent and number of retries: sometimes 100ns
delays involve only 1 retry, sometimes much smaller delays involve
hundreds of retries. It all depends on what the firmware is doing, so
there's no way to infer a safe retry limit based on that data.

All that said, there are unfortunately other conditions that can
trigger non-transient RMPUPDATE_FAIL_OVERLAP failures, and these will
result in an infinite loop. Those are the result of host misbehavior
however, like trying to set up 2MB private RMP entries when there are
already private 4K entries in the range. Ideally these would be separate
error codes, but even if that were changed in firmware we'd still need
code to support older firmwares that don't disambiguate, so I'm not sure
this situation can be improved much.

>
> > + if (ret) {
> > + pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
> > + attempts, ret, pfn, npages, level);
>
> You're dumping here uninitialized stack variables npages and level.
> Looks like leftover from some prior version of this function.

Yah, I'll clean this up. I think logging the attempts probably doesn't
have much use anymore either.

>
> > + sev_dump_rmpentry(pfn);
> > + dump_stack();
>
> This is going to become real noisy on a huge machine with a lot of SNP
> guests.

Since the transient case will eventually resolve to ret==0, we will only
get here on a kernel oops sort of condition where a stack dump seems
appropriate. rmpupdate() shouldn't error during normal operation, and if
it ever does it will likely be a fatal situation where those stack dumps
will be useful.

Thanks,

Mike

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>

2023-12-20 16:17:47

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support

On Tue, Nov 07, 2023 at 05:31:42PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote:
> > +static bool early_rmptable_check(void)
> > +{
> > + u64 rmp_base, rmp_size;
> > +
> > + /*
> > + * For early BSP initialization, max_pfn won't be set up yet, wait until
> > + * it is set before performing the RMP table calculations.
> > + */
> > + if (!max_pfn)
> > + return true;
>
> This already says that this is called at the wrong point during init.
>
> Right now we have
>
> early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt
>
> which runs only on the BSP but then early_init_amd() is called in
> init_amd() too so that it takes care of the APs too.
>
> Which ends up doing a lot of unnecessary work on each AP in
> early_detect_mem_encrypt() like calculating the RMP size on each AP
> unnecessarily where this needs to happen exactly once.
>
> Is there any reason why this function cannot be moved to init_amd()
> where it'll do the normal, per-AP init?
>
> And the stuff that needs to happen once, needs to be called once too.

I've renamed/repurposed snp_get_rmptable_info() to
snp_probe_rmptable_info(). It now reads the MSRs, sanity-checks them,
and stores the values into ro_after_init variables on success.

Subsequent code uses those values to initialize the RMP table mapping
instead of re-reading the MSRs.
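
Roughly like this (a simplified sketch of the reworked probe; the exact
sanity checks and messages may differ in the next revision):

  static u64 probed_rmp_base __ro_after_init;
  static u64 probed_rmp_size __ro_after_init;

  static bool snp_probe_rmptable_info(void)
  {
          u64 rmp_base, rmp_end;

          rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
          rdmsrl(MSR_AMD64_RMP_END, rmp_end);

          if (!rmp_base || !rmp_end) {
                  pr_err("Memory for the RMP table has not been reserved by BIOS\n");
                  return false;
          }

          if (rmp_base > rmp_end) {
                  pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n",
                         rmp_base, rmp_end);
                  return false;
          }

          probed_rmp_base = rmp_base;
          probed_rmp_size = rmp_end - rmp_base + 1;

          return true;
  }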

I've moved the call-site for snp_probe_rmptable_info() to
bsp_init_amd(), which gets called right after early_init_amd(), so it
should still be early enough to clear X86_FEATURE_SEV_SNP such that
AutoIBRS doesn't get disabled if SNP isn't available on the system. APs
don't call bsp_init_amd(), so that avoids the duplicate MSR reads.

And I think Ashish has all the other review comments addressed now.

Thanks,

Mike

2023-12-30 16:21:26

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command when SNP is enabled

On Sat, Dec 09, 2023 at 04:36:56PM +0100, Borislav Petkov wrote:
> > +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
> > +{
> > + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
> > + struct sev_device *sev = psp_master->sev_data;
> > + bool from_fw = !to_fw;
> > +
> > + /*
> > + * After the command is completed, change the command buffer memory to
> > + * hypervisor state.
> > + *
> > + * The immutable bit is automatically cleared by the firmware, so
> > + * no need to reclaim the page.
> > + */
> > + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
> > + if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
> > + return -EFAULT;
> > +
> > + /* No need to go further if firmware failed to execute command. */
> > + if (fw_err)
> > + return 0;
> > + }
> > +
> > + if (to_fw)
> > + func = map_firmware_writeable;
> > + else
> > + func = unmap_firmware_writeable;
>
> Eww, ugly and with the macro above even worse. And completely
> unnecessary.
>
> Define prep_buffer() as a normal function which selects which @func to
> call and then does it. Not like this.

I've rewritten this using a descriptor array to handle buffers for the
various command parameters, and switched to allocating bounce buffers
on-demand to avoid some of the init/cleanup coordination. I don't think
any of these paths are really performance-critical, and it's only for
legacy support, but it would be straightforward to add a cache of
pre-allocated buffers later if needed.
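
To give an idea of the shape of it (a hypothetical sketch of the
descriptor-array approach, not the actual v11 code; PDH_CERT_EXPORT is
shown since it has two firmware-written buffers):

  #define CMD_BUF_DESC_MAX 2

  struct cmd_buf_desc {
          u64 *paddr_ptr; /* where the cmd buffer stores the FW-visible address */
          void *bounce;   /* on-demand bounce page, NULL if the field is unused */
          u32 len;
  };

  /*
   * Record the address/length fields of @cmd's buffer parameters so the
   * caller can allocate bounce pages and swizzle the addresses before
   * and after issuing the command.
   */
  static void snp_populate_cmd_buf_descs(int cmd, void *cmd_buf,
                                         struct cmd_buf_desc *descs)
  {
          switch (cmd) {
          case SEV_CMD_PDH_CERT_EXPORT: {
                  struct sev_data_pdh_cert_export *data = cmd_buf;

                  descs[0].paddr_ptr = &data->pdh_cert_address;
                  descs[0].len = data->pdh_cert_len;
                  descs[1].paddr_ptr = &data->cert_chain_address;
                  descs[1].len = data->cert_chain_len;
                  break;
          }
          /* ... one case per legacy command with firmware-written buffers ... */
          }
  }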

I've tried to document/name the helpers so the flow is a bit clearer.

-Mike

>
> ...
>
> > +static inline bool need_firmware_copy(int cmd)
> > +{
> > + struct sev_device *sev = psp_master->sev_data;
> > +
> > + /* After SNP is INIT'ed, the behavior of legacy SEV command is changed. */
>
> "initialized"
>
> > + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
>
> redundant ternary conditional:
>
> return cmd < SEV_CMD_SNP_INIT && sev->snp_initialized;
>
> > +}
> > +
> > +static int snp_aware_copy_to_firmware(int cmd, void *data)
>
> What does "SNP aware" even mean?
>
> > +{
> > + return __snp_cmd_buf_copy(cmd, data, true, 0);
> > +}
> > +
> > +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
> > +{
> > + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
> > +}
> > +
> > static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> > {
> > struct psp_device *psp = psp_master;
> > struct sev_device *sev;
> > unsigned int phys_lsb, phys_msb;
> > unsigned int reg, ret = 0;
> > + void *cmd_buf;
> > int buf_len;
> >
> > if (!psp || !psp->sev_data)
> > @@ -487,12 +770,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> > * work for some memory, e.g. vmalloc'd addresses, and @data may not be
> > * physically contiguous.
> > */
> > - if (data)
> > - memcpy(sev->cmd_buf, data, buf_len);
> > + if (data) {
> > + if (sev->cmd_buf_active > 2)
>
> What is that silly counter supposed to mean?
>
> Nested SNP commands?
>
> > + return -EBUSY;
> > +
> > + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
> > +
> > + memcpy(cmd_buf, data, buf_len);
> > + sev->cmd_buf_active++;
> > +
> > + /*
> > + * The behavior of the SEV-legacy commands is altered when the
> > + * SNP firmware is in the INIT state.
> > + */
> > + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
>
> Move that need_firmware_copy() check inside snp_aware_copy_to_firmware()
> and the other one.
>
> > + return -EFAULT;
> > + } else {
> > + cmd_buf = sev->cmd_buf;
> > + }
> >
> > /* Get the physical address of the command buffer */
> > - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> > - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> > + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
> > + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
> >
> > dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
> > cmd, phys_msb, phys_lsb, psp_timeout);
>
> ...
>
> > @@ -639,6 +947,14 @@ static int ___sev_platform_init_locked(int *error, bool probe)
> > if (probe && !psp_init_on_probe)
> > return 0;
> >
> > + /*
> > + * Allocate the intermediate buffers used for the legacy command handling.
> > + */
> > + if (rc != -ENODEV && alloc_snp_host_map(sev)) {
>
> Why isn't this
>
> if (!rc && ...)
>
> > + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
> > + goto skip_legacy;
>
> No need for that skip_legacy silly label. Just "return 0" here.
>
> ...
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>

2023-12-30 16:26:52

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y

On Mon, Dec 18, 2023 at 11:13:50AM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:49AM -0500, Michael Roth wrote:
> > SEV-SNP relies on the restricted/protected memory support to run guests,
> > so make sure to enable that support with the
> > CONFIG_KVM_SW_PROTECTED_VM build option.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > ---
> > arch/x86/kvm/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 8452ed0228cb..71dc506aa3fb 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -126,6 +126,7 @@ config KVM_AMD_SEV
> > bool "AMD Secure Encrypted Virtualization (SEV) support"
> > depends on KVM_AMD && X86_64
> > depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
> > + select KVM_SW_PROTECTED_VM
> > help
> > Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
> > with Encrypted State (SEV-ES) on AMD processors.
> > --
>
> Kconfig doesn't like this one:
>
> WARNING: unmet direct dependencies detected for KVM_SW_PROTECTED_VM
> Depends on [n]: VIRTUALIZATION [=y] && EXPERT [=n] && X86_64 [=y]
> Selected by [m]:
> - KVM_AMD_SEV [=y] && VIRTUALIZATION [=y] && KVM_AMD [=m] && X86_64 [=y] && CRYPTO_DEV_SP_PSP [=y] && (KVM_AMD [=m]!=y || CRYPTO_DEV_CCP_DD [=m]!=m)
>
> WARNING: unmet direct dependencies detected for KVM_SW_PROTECTED_VM
> Depends on [n]: VIRTUALIZATION [=y] && EXPERT [=n] && X86_64 [=y]
> Selected by [m]:
> - KVM_AMD_SEV [=y] && VIRTUALIZATION [=y] && KVM_AMD [=m] && X86_64 [=y] && CRYPTO_DEV_SP_PSP [=y] && (KVM_AMD [=m]!=y || CRYPTO_DEV_CCP_DD [=m]!=m)

I think this is because KVM_SW_PROTECTED_VM depends on EXPERT, which
has to be set explicitly. But I think Paolo is right that
KVM_GENERIC_PRIVATE_MEM is more appropriate here, and it does not
require EXPERT.
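
i.e., something along these lines (assuming KVM_GENERIC_PRIVATE_MEM
remains selectable without EXPERT):

  config KVM_AMD_SEV
          bool "AMD Secure Encrypted Virtualization (SEV) support"
          depends on KVM_AMD && X86_64
          depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
          select KVM_GENERIC_PRIVATE_MEM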

-Mike

>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>

2024-01-11 15:05:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers

On Mon, Dec 18, 2023 at 09:31:50PM -0600, Michael Roth wrote:
I've moved this to sev.h, but RMP_PG_SIZE_4K is already defined there
and used by a bunch of guest code, so it's a bit out-of-place to update
those as part of this patchset. I can send a follow-up series to clean
up some of the naming and get rid of sev-common.h.

Yap, good idea.

Doesn't seem like it would be an issue. Maybe there'll be some fallout
from files that previously only included sev-common.h and now need to
pull in guest struct definitions as well, but those definitions don't
have a lot of external dependencies, so I don't anticipate any
header-include hellishness. I'll send that as a separate follow-up,
along with some of the renames you suggested above, since they'll touch
guest code and create unnecessary churn for SNP host support.

OTOH, people have recently started looking at including only the stuff
that is really used, so having a single header would cause more
preprocessing effort. I'm not too worried about that, though, as the
preprocessing overhead is barely measurable, so we might as well have a
single header and then split it later...

Definitely something for the after-burner and not important right now.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette