2022-06-20 23:01:56

by Kalra, Ashish

Subject: [PATCH Part2 v6 00/49] Add AMD Secure Nested Paging (SEV-SNP)

From: Ashish Kalra <[email protected]>

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support that is now part of mainline.

This series provides the basic building blocks to support booting SEV-SNP
VMs. It does not cover all the security enhancements introduced by SEV-SNP,
such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage the SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that
are used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support to handle the following new NAE events:
- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
access requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality is
already protected via memory encryption. When the hardware encounters an
RMP check failure, it raises a page-fault exception. If the RMP check
failure is due to a page-size mismatch, the large page is split to resolve
the fault.

The series does not provide support for interrupt security or migration;
those features will be added after the base support.

Please note that some areas, such as how private guest pages are
managed/pinned/protected, are likely to change once Unmapped Private Memory
support is further along in development/design and can be incorporated
into this series. We are posting these patches without UPM support for now
to hopefully get some review on other aspects of the series in the meantime.

Here is a link to latest UPM v6 patches:
https://lore.kernel.org/linux-mm/[email protected]/

A branch containing these patches is available here:
https://github.com/AMDESE/linux/tree/sev-snp-5.18-rc3-v3

Changes since v5:
* Rebase to 5.18.0-rc3. These patches are for review only, so they
are based on the 5.18.0-rc3 linux-next release, which included the
SNP guest patches that were not yet in mainline.
* Using kvm_write_guest() to sync the GHCB scratch buffer can fail
due to the host mapping being 2M while the RMP entry is 4K. The page fault
handling in do_user_addr_fault() fails to split the 2M page to handle
the RMP fault due to it being called in a non-preemptible context. Instead,
use the already kernel mapped ghcb to sync the scratch buffer when
the scratch buffer is contained within the GHCB.
* Warn and retry failed rmpupdates.
* Fix for stale per-cpu pointer due to cond_resched during
ghcb mapping.
* Multiple fixes for SEV-SNP AP Creation.
* Remove SRCU to synchronize the PSC and gfn mapping replacing it
with a spinlock.
* Remove generic post_{map,unmap}_gfn ops, need to revisit these
later with respect to UPM support.
* Fix kvm_mmu_get_tdp_walk() to handle "suspicious RCU usage"
warning.
* Fix sev_snp_init() to do WBINVD/DF_FLUSH command after SNP_INIT
command has been issued.
* Fix sev_free_vcpu() to flush the VMSA page after it is transitioned
back to hypervisor state and restored in the kernel direct map.

Changes since v4:
* Move the RMP entry definition to x86 specific header file.
* Move the dump RMP entry function to SEV specific file.
* Use BIT_ULL while defining the #PF bit fields.
* Add helper function to check the IOMMU support for SEV-SNP feature.
* Add helper functions for the page state transition.
* Map and unmap the pages from the direct map after page is added or
removed in RMP table.
* Enforce the minimum SEV-SNP firmware version.
* Extend the LAUNCH_UPDATE to accept the base_gfn and remove the
logic to calculate the gfn from the hva.
* Add a check in LAUNCH_UPDATE to ensure that all the pages are
shared before calling the PSP.
* Mark the memory failure when failing to remove the page from the
RMP table or clearing the immutable bit.
* Exclude the encrypted hva range from the KSM.
* Remove the gfn tracking during the kvm_gfn_map() and use SRCU to
synchronize the PSC and gfn mapping.
* Allow PSC on the registered hva range only.
* Add support for the Preferred GPA VMGEXIT.
* Simplify the PSC handling routines.
* Use the static_call() for the newly added kvm_x86_ops.
* Remove the long-lived GHCB map.
* Move the snp enable module parameter to the end of the file.
* Remove the kvm_x86_op for the RMP fault handling. Call the
fault handler directly from the #NPF interception.

Changes since v3:
* Add support for extended guest message request.
* Add ioctl to query the SNP Platform status.
* Add ioctl to get and set the SNP config.
* Add check to verify that memory reserved for the RMP covers the full system RAM.
* Start the SNP specific commands from 256 instead of 255.
* Multiple cleanup and fixes based on the review feedback.

Changes since v2:
* Add AP creation support.
* Drop the patch to handle the RMP fault for the kernel address.
* Add functions to track the write access from the hypervisor.
* Do not enable the SNP feature when IOMMU is disabled or is in passthrough mode.
* Dump the RMP entry on RMP violation for the debug.
* Shorten the GHCB macro names.
* Start the SNP_INIT command id from 255 to give some gap for the legacy SEV.
* Sync the header with the latest 0.9 SNP spec.

Changes since v1:
* Add AP reset MSR protocol VMGEXIT NAE.
* Add Hypervisor features VMGEXIT NAE.
* Move the RMP table initialization and RMPUPDATE/PSMASH helper in
arch/x86/kernel/sev.c.
* Add support to map/unmap SEV legacy command buffer to firmware state when
SNP is active.
* Enhance PSP driver to provide helper to allocate/free memory used for the
firmware context page.
* Add support to handle RMP fault for the kernel address.
* Add support to handle GUEST_REQUEST NAE event for attestation.
* Rename RMP table lookup helper.
* Drop typedef from rmpentry struct definition.
* Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
is active.
* Multiple cleanups/fixes to address Boris's review feedback.


Ashish Kalra (1):
KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb

Brijesh Singh (42):
x86/cpufeatures: Add SEV-SNP CPU feature
iommu/amd: Introduce function to check SEV-SNP support
x86/sev: Add the host SEV-SNP initialization support
x86/sev: set SYSCFG.MFMD
x86/sev: Add RMP entry lookup helpers
x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
x86/sev: Invalid pages from direct map when adding it to RMP table
x86/traps: Define RMP violation #PF error code
x86/fault: Add support to handle the RMP fault for user address
x86/fault: Add support to dump RMP entry on fault
crypto:ccp: Define the SEV-SNP commands
crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
crypto:ccp: Provide APIs to issue SEV-SNP commands
crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
crypto: ccp: Handle the legacy SEV command when SNP is enabled
crypto: ccp: Add the SNP_PLATFORM_STATUS command
crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
crypto: ccp: Provide APIs to query extended attestation report
KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
KVM: SVM: Add initial SEV-SNP support
KVM: SVM: Add KVM_SNP_INIT command
KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
KVM: SVM: Disallow registering memory range from HugeTLB for SNP guest
KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
KVM: X86: Keep the NPT and RMP page level in sync
KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
KVM: x86: Define RMP page fault error bits for #NPF
KVM: x86: Update page-fault trace to log full 64-bit error code
KVM: SVM: Do not use long-lived GHCB map while setting scratch area
KVM: SVM: Remove the long-lived GHCB host map
KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
KVM: SVM: Add support to handle Page State Change VMGEXIT
KVM: SVM: Introduce ops for the post gfn map and unmap
KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
KVM: SVM: Add support to handle the RMP nested page fault
KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
KVM: SVM: Add module parameter to enable the SEV-SNP
ccp: add support to decrypt the page

Michael Roth (2):
*fix for stale per-cpu pointer due to cond_resched during ghcb
mapping
*debug: warn and retry failed rmpupdates

Sean Christopherson (1):
KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP

Tom Lendacky (3):
KVM: SVM: Add support to handle AP reset MSR protocol
KVM: SVM: Use a VMSA physical address variable for populating VMCB
KVM: SVM: Support SEV-SNP AP Creation NAE event

Documentation/virt/coco/sevguest.rst | 54 +
.../virt/kvm/x86/amd-memory-encryption.rst | 102 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 15 +
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/sev-common.h | 28 +
arch/x86/include/asm/sev.h | 45 +
arch/x86/include/asm/svm.h | 6 +
arch/x86/include/asm/trap_pf.h | 18 +-
arch/x86/kernel/cpu/amd.c | 3 +-
arch/x86/kernel/sev.c | 400 ++++
arch/x86/kvm/lapic.c | 5 +-
arch/x86/kvm/mmu.h | 7 +-
arch/x86/kvm/mmu/mmu.c | 90 +
arch/x86/kvm/svm/sev.c | 1703 ++++++++++++++++-
arch/x86/kvm/svm/svm.c | 62 +-
arch/x86/kvm/svm/svm.h | 75 +-
arch/x86/kvm/trace.h | 40 +-
arch/x86/kvm/x86.c | 10 +-
arch/x86/mm/fault.c | 84 +-
drivers/crypto/ccp/sev-dev.c | 908 ++++++++-
drivers/crypto/ccp/sev-dev.h | 17 +
drivers/iommu/amd/init.c | 30 +
include/linux/iommu.h | 9 +
include/linux/mm.h | 3 +-
include/linux/mm_types.h | 3 +
include/linux/psp-sev.h | 346 ++++
include/linux/sev.h | 32 +
include/uapi/linux/kvm.h | 56 +
include/uapi/linux/psp-sev.h | 60 +
mm/memory.c | 13 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
34 files changed, 4090 insertions(+), 155 deletions(-)
create mode 100644 include/linux/sev.h

--
2.25.1


2022-06-20 23:03:18

by Kalra, Ashish

Subject: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

From: Brijesh Singh <[email protected]>

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
track the owner of each page of memory. Pages of memory can be owned by
the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
section 15.36.3 for more detail on RMP.

The RMP table is used to enforce access control to memory. The table itself
is not directly writable by the software. New CPU instructions (RMPUPDATE,
PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.

Based on the platform configuration, the BIOS reserves the memory used
for the RMP table. The start and end address of the RMP table must be
queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
RMP_END are not set then disable the SEV-SNP feature.

The SEV-SNP feature is enabled only after the RMP table is successfully
initialized.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/kernel/sev.c | 144 +++++++++++++++++++++++
3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 36369e76cc63..c1be3091a383 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -68,6 +68,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+# define DISABLE_SEV_SNP 0
+#else
+# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -91,7 +97,7 @@
DISABLE_ENQCMD)
#define DISABLED_MASK17 0
#define DISABLED_MASK18 0
-#define DISABLED_MASK19 0
+#define DISABLED_MASK19 (DISABLE_SEV_SNP)
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)

#endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9e2e7185fc1d..57a8280e283a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -507,6 +507,8 @@
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE 0xc0010132
+#define MSR_AMD64_RMP_END 0xc0010133

#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

@@ -581,6 +583,10 @@
#define MSR_AMD64_SYSCFG 0xc0010010
#define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
+#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index f01f4550e2c6..3a233b5d47c5 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -22,6 +22,8 @@
#include <linux/efi.h>
#include <linux/platform_device.h>
#include <linux/io.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>

#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -38,6 +40,7 @@
#include <asm/apic.h>
#include <asm/cpuid.h>
#include <asm/cmdline.h>
+#include <asm/iommu.h>

#define DR7_RESET_VALUE 0x400

@@ -57,6 +60,12 @@
#define AP_INIT_CR0_DEFAULT 0x60000010
#define AP_INIT_MXCSR_DEFAULT 0x1f80

+/*
+ * The first 16KB from the RMP_BASE is used by the processor for the
+ * bookkeeping, the range need to be added during the RMP entry lookup.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+
/* For early boot hypervisor communication in SEV-ES enabled guests */
static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);

@@ -69,6 +78,10 @@ static struct ghcb *boot_ghcb __section(".data");
/* Bitmap of SEV features supported by the hypervisor */
static u64 sev_hv_features __ro_after_init;

+static unsigned long rmptable_start __ro_after_init;
+static unsigned long rmptable_end __ro_after_init;
+
+
/* #VC handler runtime per-CPU data */
struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -2218,3 +2231,134 @@ static int __init snp_init_platform_device(void)
return 0;
}
device_initcall(snp_init_platform_device);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "SEV-SNP: " fmt
+
+static int __snp_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_SNP_EN;
+ val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+ __snp_enable(smp_processor_id());
+}
+
+static bool get_rmptable_info(u64 *start, u64 *len)
+{
+ u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
+
+ rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+ rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+ if (!rmp_base || !rmp_end) {
+ pr_info("Memory for the RMP table has not been reserved by BIOS\n");
+ return false;
+ }
+
+ rmp_sz = rmp_end - rmp_base + 1;
+
+ /*
+ * Calculate the amount of memory that must be reserved by the BIOS to
+ * address the full system RAM. The reserved memory should also cover the
+ * RMP table itself.
+ *
+ * See PPR Family 19h Model 01h, Revision B1 section 2.1.4.2 for more
+ * information on memory requirement.
+ */
+ nr_pages = totalram_pages();
+ calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + nr_pages) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+ if (calc_rmp_sz > rmp_sz) {
+ pr_info("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+ calc_rmp_sz, rmp_sz);
+ return false;
+ }
+
+ *start = rmp_base;
+ *len = rmp_sz;
+
+ pr_info("RMP table physical address 0x%016llx - 0x%016llx\n", rmp_base, rmp_end);
+
+ return true;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+ u64 rmp_base, sz;
+ void *start;
+ u64 val;
+
+ if (!get_rmptable_info(&rmp_base, &sz))
+ return 1;
+
+ start = memremap(rmp_base, sz, MEMREMAP_WB);
+ if (!start) {
+ pr_err("Failed to map RMP table 0x%llx+0x%llx\n", rmp_base, sz);
+ return 1;
+ }
+
+ /*
+ * Check if SEV-SNP is already enabled, this can happen if we are coming from
+ * kexec boot.
+ */
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+ if (val & MSR_AMD64_SYSCFG_SNP_EN)
+ goto skip_enable;
+
+ /* Initialize the RMP table to zero */
+ memset(start, 0, sz);
+
+ /* Flush the caches to ensure that data is written before SNP is enabled. */
+ wbinvd_on_all_cpus();
+
+ /* Enable SNP on all CPUs. */
+ on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+ rmptable_start = (unsigned long)start;
+ rmptable_end = rmptable_start + sz;
+
+ return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+ if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ if (!iommu_sev_snp_supported())
+ goto nosnp;
+
+ if (__snp_rmptable_init())
+ goto nosnp;
+
+ cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+ return 0;
+
+nosnp:
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+ return 1;
+}
+
+/*
+ * This must be called after the PCI subsystem. This is because before enabling
+ * the SNP feature we need to ensure that IOMMU supports the SEV-SNP feature.
+ * The iommu_sev_snp_supported() helper is used for checking the feature, and it is
+ * available after subsys_initcall().
+ */
+fs_initcall(snp_rmptable_init);
--
2.25.1
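
For readers following the size check in get_rmptable_info() above, here is a minimal user-space sketch of the same arithmetic. It is not kernel code: the names rmp_required_size() and rmp_reservation_ok() are made up for illustration, the page count is passed in rather than read from totalram_pages(), and the overflow assert and the rmp_end < rmp_base check are additions of this sketch. The constants (16 bytes of RMP per 4K page, 16KB of CPU bookkeeping) are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K       12       /* 4K pages */
#define RMP_ENTRY_SHIFT     4        /* "<< 4" in the patch: 16 bytes per page */
#define RMP_BOOKKEEPING_SZ  0x4000u  /* RMPTABLE_CPU_BOOKKEEPING_SZ in the patch */

/*
 * Bytes the RMP must span to cover nr_ram_pages of RAM plus the pages
 * occupied by the rmp_sz-byte reservation itself.
 */
static uint64_t rmp_required_size(uint64_t rmp_sz, uint64_t nr_ram_pages)
{
	uint64_t covered_pages = (rmp_sz >> PAGE_SHIFT_4K) + nr_ram_pages;

	/* Sketch-only guard: the shift below must not overflow 64 bits. */
	assert(covered_pages <= (UINT64_MAX >> RMP_ENTRY_SHIFT));

	return (covered_pages << RMP_ENTRY_SHIFT) + RMP_BOOKKEEPING_SZ;
}

/* True when the reservation [rmp_base, rmp_end] is set and large enough. */
static bool rmp_reservation_ok(uint64_t rmp_base, uint64_t rmp_end,
			       uint64_t nr_ram_pages)
{
	uint64_t rmp_sz;

	if (!rmp_base || !rmp_end || rmp_end < rmp_base)
		return false;

	rmp_sz = rmp_end - rmp_base + 1;	/* RMP_END is inclusive */

	return rmp_required_size(rmp_sz, nr_ram_pages) <= rmp_sz;
}
```

With 1GB of RAM (262144 pages) and an 8MB reservation, the required size works out to 0x40c000 bytes, so the check passes; a 1MB reservation for the same RAM would be rejected.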

2022-06-20 23:03:21

by Kalra, Ashish

Subject: [PATCH Part2 v6 04/49] x86/sev: set SYSCFG.MFMD

From: Brijesh Singh <[email protected]>

SEV-SNP FW >= 1.51 requires that SYSCFG.MFDM be set.

Subsequent CCP patches will require 1.51 as the minimum SEV-SNP
firmware version.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/msr-index.h | 3 +++
arch/x86/kernel/sev.c | 24 ++++++++++++++++++++++++
2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 57a8280e283a..1e36f16daa56 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -587,6 +587,9 @@
#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT 19
+#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 3a233b5d47c5..25c7feb367f6 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2257,6 +2257,27 @@ static __init void snp_enable(void *arg)
__snp_enable(smp_processor_id());
}

+static int __mfdm_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_MFDM;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void mfdm_enable(void *arg)
+{
+ __mfdm_enable(smp_processor_id());
+}
+
static bool get_rmptable_info(u64 *start, u64 *len)
{
u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
@@ -2325,6 +2346,9 @@ static __init int __snp_rmptable_init(void)
/* Flush the caches to ensure that data is written before SNP is enabled. */
wbinvd_on_all_cpus();

+ /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
+ on_each_cpu(mfdm_enable, NULL, 1);
+
/* Enable SNP on all CPUs. */
on_each_cpu(snp_enable, NULL, 1);

--
2.25.1
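
The ordering this patch establishes (MFDM first, then SNP_EN and SNP_VMPL_EN) can be sketched on a plain 64-bit value. This is only an illustration of the read-modify-write steps, not MSR access: syscfg_set_mfdm() and syscfg_set_snp() are hypothetical names, and the assert that encodes "MFDM before SNP" is an addition of the sketch. The bit positions are the ones defined in the msr-index.h hunks above.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n)              (1ULL << (n))
#define SYSCFG_MFDM_BIT         19	/* MSR_AMD64_SYSCFG_MFDM_BIT */
#define SYSCFG_SNP_EN_BIT       24	/* MSR_AMD64_SYSCFG_SNP_EN_BIT */
#define SYSCFG_SNP_VMPL_EN_BIT  25	/* MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT */

/* Step 1: set MFDM while preserving every other bit. */
static uint64_t syscfg_set_mfdm(uint64_t val)
{
	return val | BIT_ULL(SYSCFG_MFDM_BIT);
}

/* Step 2: set SNP_EN and SNP_VMPL_EN; the sketch insists MFDM is already set. */
static uint64_t syscfg_set_snp(uint64_t val)
{
	assert(val & BIT_ULL(SYSCFG_MFDM_BIT));

	return val | BIT_ULL(SYSCFG_SNP_EN_BIT) | BIT_ULL(SYSCFG_SNP_VMPL_EN_BIT);
}
```

Starting from a value with only bit 23 (MEM_ENCRYPT) set, the two steps yield 0x880000 and then 0x3880000.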

2022-06-20 23:04:09

by Kalra, Ashish

Subject: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

From: Brijesh Singh <[email protected]>

The snp_lookup_rmpentry() helper can be used by the host to read the RMP
entry for a given page. The RMP entry format is documented in AMD PPR, see
https://bugzilla.kernel.org/attachment.cgi?id=296015.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 27 ++++++++++++++++++++++++
arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
include/linux/sev.h | 30 ++++++++++++++++++++++++++
3 files changed, 100 insertions(+)
create mode 100644 include/linux/sev.h

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 9c2d33f1cfee..cb16f0e5b585 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -9,6 +9,7 @@
#define __ASM_ENCRYPTED_STATE_H

#include <linux/types.h>
+#include <linux/sev.h>
#include <asm/insn.h>
#include <asm/sev-common.h>
#include <asm/bootparam.h>
@@ -84,6 +85,32 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+
+/*
+ * The RMP entry format is not architectural. The format is defined in PPR
+ * Family 19h Model 01h, Rev B1 processor.
+ */
+struct __packed rmpentry {
+ union {
+ struct {
+ u64 assigned : 1,
+ pagesize : 1,
+ immutable : 1,
+ rsvd1 : 9,
+ gpa : 39,
+ asid : 10,
+ vmsa : 1,
+ validated : 1,
+ rsvd2 : 1;
+ } info;
+ u64 low;
+ };
+ u64 high;
+};
+
+#define rmpentry_assigned(x) ((x)->info.assigned)
+#define rmpentry_pagesize(x) ((x)->info.pagesize)

#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 25c7feb367f6..59e7ec6b0326 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -65,6 +65,8 @@
* bookkeeping, the range need to be added during the RMP entry lookup.
*/
#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+#define RMPENTRY_SHIFT 8
+#define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))

/* For early boot hypervisor communication in SEV-ES enabled guests */
static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
@@ -2386,3 +2388,44 @@ static int __init snp_rmptable_init(void)
* available after subsys_initcall().
*/
fs_initcall(snp_rmptable_init);
+
+static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
+{
+ unsigned long vaddr, paddr = pfn << PAGE_SHIFT;
+ struct rmpentry *entry, *large_entry;
+
+ if (!pfn_valid(pfn))
+ return ERR_PTR(-EINVAL);
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return ERR_PTR(-ENXIO);
+
+ vaddr = rmptable_start + rmptable_page_offset(paddr);
+ if (unlikely(vaddr > rmptable_end))
+ return ERR_PTR(-ENXIO);
+
+ entry = (struct rmpentry *)vaddr;
+
+ /* Read a large RMP entry to get the correct page level used in RMP entry. */
+ vaddr = rmptable_start + rmptable_page_offset(paddr & PMD_MASK);
+ large_entry = (struct rmpentry *)vaddr;
+ *level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
+
+ return entry;
+}
+
+/*
+ * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
+ * and -errno if there is no corresponding RMP entry.
+ */
+int snp_lookup_rmpentry(u64 pfn, int *level)
+{
+ struct rmpentry *e;
+
+ e = __snp_lookup_rmpentry(pfn, level);
+ if (IS_ERR(e))
+ return PTR_ERR(e);
+
+ return !!rmpentry_assigned(e);
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
diff --git a/include/linux/sev.h b/include/linux/sev.h
new file mode 100644
index 000000000000..1a68842789e1
--- /dev/null
+++ b/include/linux/sev.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD Secure Encrypted Virtualization
+ *
+ * Author: Brijesh Singh <[email protected]>
+ */
+
+#ifndef __LINUX_SEV_H
+#define __LINUX_SEV_H
+
+/* RMPUPDATE detected 4K page and 2MB page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP 7
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+int snp_lookup_rmpentry(u64 pfn, int *level);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
+#else
+static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+ bool immutable)
+{
+ return -ENODEV;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+#endif /* __LINUX_SEV_H */
--
2.25.1
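
As a rough aid for reviewing the lookup math above, the following user-space sketch computes the byte offset of an RMP entry from a pfn and decodes the low quadword with explicit shifts. The names rmp_entry_offset(), rmp_large_entry_offset() and rmp_decode_low() are hypothetical. The bit positions assume the bitfields of struct rmpentry are allocated starting from the least significant bit, as GCC does on x86-64; the authoritative layout is the PPR, and the pfn range assert is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define RMP_BOOKKEEPING_SZ  0x4000ULL	/* RMPTABLE_CPU_BOOKKEEPING_SZ */
#define PAGE_SHIFT_4K       12
#define RMPENTRY_SHIFT      8		/* paddr >> 8 == pfn * 16 bytes */
#define PFNS_PER_2M         512ULL

/* Byte offset of the 4K entry for pfn, relative to the start of the table. */
static uint64_t rmp_entry_offset(uint64_t pfn)
{
	/* Sketch-only guard: keep the physical address within 52 bits. */
	assert(pfn < (1ULL << 40));

	return RMP_BOOKKEEPING_SZ + ((pfn << PAGE_SHIFT_4K) >> RMPENTRY_SHIFT);
}

/* Offset of the entry at the 2M-aligned base, which holds the page size. */
static uint64_t rmp_large_entry_offset(uint64_t pfn)
{
	return rmp_entry_offset(pfn & ~(PFNS_PER_2M - 1));
}

struct rmp_info {
	unsigned int assigned, pagesize, immutable, asid, vmsa, validated;
	uint64_t gpa;	/* raw 39-bit field from bits 50:12 */
};

/* Decode the low 64 bits of an RMP entry (assumed LSB-first bit layout). */
static struct rmp_info rmp_decode_low(uint64_t low)
{
	struct rmp_info info = {
		.assigned  = (low >> 0) & 0x1,
		.pagesize  = (low >> 1) & 0x1,
		.immutable = (low >> 2) & 0x1,
		.gpa       = (low >> 12) & ((1ULL << 39) - 1),
		.asid      = (low >> 51) & 0x3ff,
		.vmsa      = (low >> 61) & 0x1,
		.validated = (low >> 62) & 0x1,
	};

	return info;
}
```

For example, pfn 0x201 maps to offset 0x6010, while the entry consulted for its page size sits at offset 0x6000 (pfn 0x200).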

2022-06-20 23:04:23

by Kalra, Ashish

Subject: [PATCH Part2 v6 08/49] x86/traps: Define RMP violation #PF error code

From: Brijesh Singh <[email protected]>

Bit 31 in the page-fault error code will be set when the processor
encounters an RMP violation.

While at it, use the BIT_ULL() macro.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/trap_pf.h | 18 +++++++++++-------
arch/x86/mm/fault.c | 1 +
2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..89b705114b3f 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_TRAP_PF_H
#define _ASM_X86_TRAP_PF_H

+#include <linux/bits.h> /* BIT_ULL() macro */
+
/*
* Page fault error code bits:
*
@@ -12,15 +14,17 @@
* bit 4 == 1: fault was an instruction fetch
* bit 5 == 1: protection keys block access
* bit 15 == 1: SGX MMU page-fault
+ * bit 31 == 1: fault was due to RMP violation
*/
enum x86_pf_error_code {
- X86_PF_PROT = 1 << 0,
- X86_PF_WRITE = 1 << 1,
- X86_PF_USER = 1 << 2,
- X86_PF_RSVD = 1 << 3,
- X86_PF_INSTR = 1 << 4,
- X86_PF_PK = 1 << 5,
- X86_PF_SGX = 1 << 15,
+ X86_PF_PROT = BIT_ULL(0),
+ X86_PF_WRITE = BIT_ULL(1),
+ X86_PF_USER = BIT_ULL(2),
+ X86_PF_RSVD = BIT_ULL(3),
+ X86_PF_INSTR = BIT_ULL(4),
+ X86_PF_PK = BIT_ULL(5),
+ X86_PF_SGX = BIT_ULL(15),
+ X86_PF_RMP = BIT_ULL(31),
};

#endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fad8faa29d04..a4c270e99f7f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -546,6 +546,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
!(error_code & X86_PF_PROT) ? "not-present page" :
(error_code & X86_PF_RSVD) ? "reserved bit violation" :
(error_code & X86_PF_PK) ? "protection keys violation" :
+ (error_code & X86_PF_RMP) ? "RMP violation" :
"permissions violation");

if (!(error_code & X86_PF_USER) && user_mode(regs)) {
--
2.25.1
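
To make the effect of the show_fault_oops() hunk easier to see, here is a small stand-alone sketch that classifies an error code in the same order as the ternary chain in the patch. The enum pf_reason and the helpers pf_classify() and pf_reason_str() are invented for this illustration. Only the bits the chain tests are defined, as macros rather than an enum, so the sketch does not depend on how a compiler sizes enum constants.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n)    (1ULL << (n))
#define X86_PF_PROT   BIT_ULL(0)
#define X86_PF_WRITE  BIT_ULL(1)
#define X86_PF_RSVD   BIT_ULL(3)
#define X86_PF_PK     BIT_ULL(5)
#define X86_PF_RMP    BIT_ULL(31)

enum pf_reason { PF_NOT_PRESENT, PF_RSVD, PF_PKEY, PF_RMP, PF_PERMISSION };

/* Same precedence as the patch: not-present, RSVD, PK, RMP, then permissions. */
static enum pf_reason pf_classify(uint64_t error_code)
{
	if (!(error_code & X86_PF_PROT))
		return PF_NOT_PRESENT;
	if (error_code & X86_PF_RSVD)
		return PF_RSVD;
	if (error_code & X86_PF_PK)
		return PF_PKEY;
	if (error_code & X86_PF_RMP)
		return PF_RMP;
	return PF_PERMISSION;
}

static const char *pf_reason_str(enum pf_reason reason)
{
	switch (reason) {
	case PF_NOT_PRESENT:	return "not-present page";
	case PF_RSVD:		return "reserved bit violation";
	case PF_PKEY:		return "protection keys violation";
	case PF_RMP:		return "RMP violation";
	case PF_PERMISSION:	return "permissions violation";
	}
	assert(!"unknown pf_reason");
	return "unknown";
}
```

Note that, because of the ordering, an error code with the RMP bit set but the PROT bit clear is still reported as a not-present page.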

2022-06-20 23:04:39

by Kalra, Ashish

Subject: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

From: Brijesh Singh <[email protected]>

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set of
contiguous 4KB-Page RMP entries. The hypervisor will use this instruction
to adjust the RMP entry without invalidating the previous RMP entry.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 11 ++++++
arch/x86/kernel/sev.c | 72 ++++++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index cb16f0e5b585..6ab872311544 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -85,7 +85,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMP_PG_SIZE_2M 1
#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)

/*
* The RMP entry format is not architectural. The format is defined in PPR
@@ -126,6 +128,15 @@ struct snp_guest_platform_data {
u64 secrets_gpa;
};

+struct rmpupdate {
+ u64 gpa;
+ u8 assigned;
+ u8 pagesize;
+ u8 immutable;
+ u8 rsvd;
+ u32 asid;
+} __packed;
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 59e7ec6b0326..f6c64a722e94 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2429,3 +2429,75 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
return !!rmpentry_assigned(e);
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+int psmash(u64 pfn)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ /* Binutils version 2.36 supports the PSMASH mnemonic. */
+ asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(paddr)
+ : "memory", "cc");
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmpupdate *val)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+ : "=a"(ret)
+ : "a"(paddr), "c"((unsigned long)val)
+ : "memory", "cc");
+ return ret;
+}
+
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+ struct rmpupdate val;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ memset(&val, 0, sizeof(val));
+ val.assigned = 1;
+ val.asid = asid;
+ val.immutable = immutable;
+ val.gpa = gpa;
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ struct rmpupdate val;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ memset(&val, 0, sizeof(val));
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
--
2.25.1
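
The state that rmp_make_private() and rmp_make_shared() hand to RMPUPDATE can be modelled in user space as below. This is a sketch under stated assumptions: enum pg_level here is a cut-down local stand-in for the kernel's, rmp_private_state() and rmp_shared_state() are hypothetical names that only build the structure (nothing is executed against hardware), and the level assert is stricter than the patch's X86_TO_RMP_PG_LEVEL(), which maps every non-4K level to 2M.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the kernel's enum pg_level (only the levels used here). */
enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M };

#define RMP_PG_SIZE_4K  0
#define RMP_PG_SIZE_2M  1
#define X86_TO_RMP_PG_LEVEL(level) \
	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)

/* Same field order as struct rmpupdate in the patch. */
struct rmpupdate {
	uint64_t gpa;
	uint8_t assigned;
	uint8_t pagesize;
	uint8_t immutable;
	uint8_t rsvd;
	uint32_t asid;
} __attribute__((packed));

_Static_assert(sizeof(struct rmpupdate) == 16, "8 + 4 * 1 + 4 bytes");

/* Desired state for a guest-owned (private) page. */
static struct rmpupdate rmp_private_state(uint64_t gpa, enum pg_level level,
					  uint32_t asid, bool immutable)
{
	struct rmpupdate val;

	assert(level == PG_LEVEL_4K || level == PG_LEVEL_2M);

	memset(&val, 0, sizeof(val));
	val.assigned = 1;
	val.asid = asid;
	val.immutable = immutable;
	val.gpa = gpa;
	val.pagesize = X86_TO_RMP_PG_LEVEL(level);

	return val;
}

/* Desired state for a hypervisor-owned (shared) page: all zero but pagesize. */
static struct rmpupdate rmp_shared_state(enum pg_level level)
{
	struct rmpupdate val;

	assert(level == PG_LEVEL_4K || level == PG_LEVEL_2M);

	memset(&val, 0, sizeof(val));
	val.pagesize = X86_TO_RMP_PG_LEVEL(level);

	return val;
}
```

The shared state differs from the private one only in that assigned, asid, immutable and gpa stay zero, which is why rmp_make_shared() needs no arguments beyond the pfn and level.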

2022-06-20 23:04:52

by Kalra, Ashish

Subject: [PATCH Part2 v6 07/49] x86/sev: Invalid pages from direct map when adding it to RMP table

From: Brijesh Singh <[email protected]>

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce memory
restrictions and page access rights. The RMP check is enforced as soon as
SEV-SNP is enabled globally in the system. When hardware encounters an
RMP check failure, it raises a page-fault exception.

The rmp_make_private() and rmp_make_shared() helpers are used to add
or remove pages from the RMP table. Improve rmp_make_private() to
invalidate the pages in the direct map so that they cannot be accessed
through it after they are added to the RMP table, and restore the
default valid permission after the pages are removed from the RMP table.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev.c | 61 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index f6c64a722e94..734cddd837f5 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2451,10 +2451,42 @@ int psmash(u64 pfn)
}
EXPORT_SYMBOL_GPL(psmash);

+static int restore_direct_map(u64 pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ goto cleanup;
+ }
+
+cleanup:
+ WARN(ret, "Failed to restore direct map for pfn 0x%llx\n", pfn + i);
+ return ret;
+}
+
+static int invalid_direct_map(unsigned long pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ restore_direct_map(pfn, i);
+ return ret;
+}
+
static int rmpupdate(u64 pfn, struct rmpupdate *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
- int ret;
+ int ret, level, npages;

if (!pfn_valid(pfn))
return -EINVAL;
@@ -2462,11 +2494,38 @@ static int rmpupdate(u64 pfn, struct rmpupdate *val)
if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return -ENXIO;

+ level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+ npages = page_level_size(level) / PAGE_SIZE;
+
+ /*
+ * If page is getting assigned in the RMP table then unmap it from the
+ * direct map.
+ */
+ if (val->assigned) {
+ if (invalid_direct_map(pfn, npages)) {
+ pr_err("Failed to unmap pfn 0x%llx pages %d from direct_map\n",
+ pfn, npages);
+ return -EFAULT;
+ }
+ }
+
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
: "=a"(ret)
: "a"(paddr), "c"((unsigned long)val)
: "memory", "cc");
+
+ /*
+ * Restore the direct map after the page is removed from the RMP table.
+ */
+ if (!ret && !val->assigned) {
+ if (restore_direct_map(pfn, npages)) {
+ pr_err("Failed to map pfn 0x%llx pages %d in direct_map\n",
+ pfn, npages);
+ return -EFAULT;
+ }
+ }
+
return ret;
}

--
2.25.1

2022-06-20 23:05:13

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 10/49] x86/fault: Add support to dump RMP entry on fault

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. If the check fails, the hardware raises a #PF with the RMP bit
set in the error code. Dump the RMP entry at the faulting pfn to help
with debugging.
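
When the entry at the faulting pfn is not assigned, the patch falls back to
scanning the surrounding 2MB region. A minimal userspace sketch of that scan
is shown below. The RMP is modelled as a small array, and the sketch_* names
and the table size are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PTRS_PER_PMD	512ULL	/* 4K pages per 2MB region on x86-64 */
#define SKETCH_TABLE_PFNS	2048	/* size of the fake RMP, in pfns */

/* Hypothetical raw view of an RMP entry: two 64-bit words. */
struct sketch_rmpentry {
	uint64_t low;
	uint64_t high;
};

static struct sketch_rmpentry sketch_rmp[SKETCH_TABLE_PFNS];

/* Round pfn down to the first pfn of its 2MB region. */
static uint64_t sketch_region_start(uint64_t pfn)
{
	return pfn & ~(SKETCH_PTRS_PER_PMD - 1);
}

/*
 * Walk the 2MB region containing pfn, print every entry that has any bit
 * set, and return how many such entries were found.
 */
static int sketch_dump_region(uint64_t pfn)
{
	uint64_t start = sketch_region_start(pfn);
	uint64_t end = start + SKETCH_PTRS_PER_PMD;
	int hits = 0;

	assert(end <= SKETCH_TABLE_PFNS);

	for (uint64_t p = start; p < end; p++) {
		const struct sketch_rmpentry *e = &sketch_rmp[p];

		if (e->low || e->high) {
			printf("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
			       (unsigned long long)(p << SKETCH_PAGE_SHIFT),
			       (unsigned long long)e->high,
			       (unsigned long long)e->low);
			hits++;
		}
	}
	return hits;
}
```

Entries outside the aligned 512-pfn window are never visited, which keeps the
amount of debug output bounded.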

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 7 +++++++
arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
arch/x86/mm/fault.c | 17 +++++++++++----
include/linux/sev.h | 2 ++
4 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 6ab872311544..c0c4df817159 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -113,6 +113,11 @@ struct __packed rmpentry {

#define rmpentry_assigned(x) ((x)->info.assigned)
#define rmpentry_pagesize(x) ((x)->info.pagesize)
+#define rmpentry_vmsa(x) ((x)->info.vmsa)
+#define rmpentry_asid(x) ((x)->info.asid)
+#define rmpentry_validated(x) ((x)->info.validated)
+#define rmpentry_gpa(x) ((unsigned long)(x)->info.gpa)
+#define rmpentry_immutable(x) ((x)->info.immutable)

#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

@@ -205,6 +210,7 @@ void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
+void dump_rmpentry(u64 pfn);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -229,6 +235,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
{
return -ENOTTY;
}
+static inline void dump_rmpentry(u64 pfn) {}
#endif

#endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 734cddd837f5..6640a639fffc 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2414,6 +2414,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
return entry;
}

+void dump_rmpentry(u64 pfn)
+{
+ unsigned long pfn_end;
+ struct rmpentry *e;
+ int level;
+
+ e = __snp_lookup_rmpentry(pfn, &level);
+ if (!e) {
+ pr_alert("failed to read RMP entry pfn 0x%llx\n", pfn);
+ return;
+ }
+
+ if (rmpentry_assigned(e)) {
+ pr_alert("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
+ " asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
+ rmpentry_assigned(e), rmpentry_immutable(e), rmpentry_pagesize(e),
+ rmpentry_gpa(e), rmpentry_asid(e), rmpentry_vmsa(e),
+ rmpentry_validated(e));
+ return;
+ }
+
+ /*
+ * If the RMP entry at the faulting pfn was not assigned, then we do not
+ * know what caused the RMP violation. To get some useful debug information,
+ * iterate through the entire 2MB region and dump each RMP entry that has
+ * any bit set.
+ */
+ pfn = pfn & ~(PTRS_PER_PMD - 1);
+ pfn_end = pfn + PTRS_PER_PMD;
+
+ while (pfn < pfn_end) {
+ e = __snp_lookup_rmpentry(pfn, &level);
+ if (!e)
+ return;
+
+ if (e->low || e->high)
+ pr_alert("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+ pfn << PAGE_SHIFT, e->high, e->low);
+ pfn++;
+ }
+}
+EXPORT_SYMBOL_GPL(dump_rmpentry);
+
/*
* Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
* and -errno if there is no corresponding RMP entry.
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f5de9673093a..25896a6ba04a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
#include <asm/kvm_para.h> /* kvm_handle_async_pf */
#include <asm/vdso.h> /* fixup_vdso_exception() */
#include <asm/irq_stack.h>
+#include <asm/sev.h> /* dump_rmpentry() */

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -290,7 +291,7 @@ static bool low_pfn(unsigned long pfn)
return pfn < max_low_pfn;
}

-static void dump_pagetable(unsigned long address)
+static void dump_pagetable(unsigned long address, bool show_rmpentry)
{
pgd_t *base = __va(read_cr3_pa());
pgd_t *pgd = &base[pgd_index(address)];
@@ -346,10 +347,11 @@ static int bad_address(void *p)
return get_kernel_nofault(dummy, (unsigned long *)p);
}

-static void dump_pagetable(unsigned long address)
+static void dump_pagetable(unsigned long address, bool show_rmpentry)
{
pgd_t *base = __va(read_cr3_pa());
pgd_t *pgd = base + pgd_index(address);
+ unsigned long pfn;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
@@ -367,6 +369,7 @@ static void dump_pagetable(unsigned long address)
if (bad_address(p4d))
goto bad;

+ pfn = p4d_pfn(*p4d);
pr_cont("P4D %lx ", p4d_val(*p4d));
if (!p4d_present(*p4d) || p4d_large(*p4d))
goto out;
@@ -375,6 +378,7 @@ static void dump_pagetable(unsigned long address)
if (bad_address(pud))
goto bad;

+ pfn = pud_pfn(*pud);
pr_cont("PUD %lx ", pud_val(*pud));
if (!pud_present(*pud) || pud_large(*pud))
goto out;
@@ -383,6 +387,7 @@ static void dump_pagetable(unsigned long address)
if (bad_address(pmd))
goto bad;

+ pfn = pmd_pfn(*pmd);
pr_cont("PMD %lx ", pmd_val(*pmd));
if (!pmd_present(*pmd) || pmd_large(*pmd))
goto out;
@@ -391,9 +396,13 @@ static void dump_pagetable(unsigned long address)
if (bad_address(pte))
goto bad;

+ pfn = pte_pfn(*pte);
pr_cont("PTE %lx", pte_val(*pte));
out:
pr_cont("\n");
+
+ if (show_rmpentry)
+ dump_rmpentry(pfn);
return;
bad:
pr_info("BAD\n");
@@ -579,7 +588,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
show_ldttss(&gdt, "TR", tr);
}

- dump_pagetable(address);
+ dump_pagetable(address, error_code & X86_PF_RMP);
}

static noinline void
@@ -596,7 +605,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,

printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
tsk->comm, address);
- dump_pagetable(address);
+ dump_pagetable(address, false);

if (__die("Bad pagetable", regs, error_code))
sig = 0;
diff --git a/include/linux/sev.h b/include/linux/sev.h
index 1a68842789e1..734b13a69c54 100644
--- a/include/linux/sev.h
+++ b/include/linux/sev.h
@@ -16,6 +16,7 @@ int snp_lookup_rmpentry(u64 pfn, int *level);
int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
+void dump_rmpentry(u64 pfn);
#else
static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
static inline int psmash(u64 pfn) { return -ENXIO; }
@@ -25,6 +26,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
return -ENODEV;
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void dump_rmpentry(u64 pfn) { }

#endif /* CONFIG_AMD_MEM_ENCRYPT */
#endif /* __LINUX_SEV_H */
--
2.25.1

2022-06-20 23:05:14

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 11/49] crypto:ccp: Define the SEV-SNP commands

From: Brijesh Singh <[email protected]>

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.
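
Several of the command buffers defined here carry a page size in bit 0 of a
physical address (see the kernel-doc for struct sev_data_snp_page_reclaim in
the diff: 0h indicates 4 kB, 1h indicates 2 MB). The encoding can be sketched
in userspace as below. The sketch_* helpers are hypothetical and are not part
of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_SIZE_2M_BIT	0x1ULL	/* bit 0: 0h = 4 kB, 1h = 2 MB */

/* Userspace mirror of struct sev_data_snp_page_reclaim from the patch. */
struct sketch_snp_page_reclaim {
	uint64_t paddr;
};

/* Build the command buffer for the page at pfn with the given size. */
static struct sketch_snp_page_reclaim sketch_reclaim_arg(uint64_t pfn, bool is_2m)
{
	struct sketch_snp_page_reclaim data;

	data.paddr = pfn << SKETCH_PAGE_SHIFT;
	/* A page-aligned address always leaves bit 0 free for the size flag. */
	assert((data.paddr & SKETCH_SIZE_2M_BIT) == 0);
	if (is_2m)
		data.paddr |= SKETCH_SIZE_2M_BIT;
	return data;
}

/* Recover the page-aligned address from an encoded paddr. */
static uint64_t sketch_reclaim_addr(struct sketch_snp_page_reclaim data)
{
	return data.paddr & ~SKETCH_SIZE_2M_BIT;
}

/* Recover the page size flag from an encoded paddr. */
static bool sketch_reclaim_is_2m(struct sketch_snp_page_reclaim data)
{
	return data.paddr & SKETCH_SIZE_2M_BIT;
}
```

Because the address is at least 4 kB aligned, the low bits are otherwise
unused, which is presumably why the firmware interface can reuse bit 0 for
the size.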

Signed-off-by: Brijesh Singh <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 14 +++
include/linux/psp-sev.h | 222 +++++++++++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 42 +++++++
3 files changed, 278 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fd928199bf1e..9cb3265f3bef 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -153,6 +153,20 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id);
case SEV_CMD_ATTESTATION_REPORT: return sizeof(struct sev_data_attestation_report);
case SEV_CMD_SEND_CANCEL: return sizeof(struct sev_data_send_cancel);
+ case SEV_CMD_SNP_GCTX_CREATE: return sizeof(struct sev_data_snp_gctx_create);
+ case SEV_CMD_SNP_LAUNCH_START: return sizeof(struct sev_data_snp_launch_start);
+ case SEV_CMD_SNP_LAUNCH_UPDATE: return sizeof(struct sev_data_snp_launch_update);
+ case SEV_CMD_SNP_ACTIVATE: return sizeof(struct sev_data_snp_activate);
+ case SEV_CMD_SNP_DECOMMISSION: return sizeof(struct sev_data_snp_decommission);
+ case SEV_CMD_SNP_PAGE_RECLAIM: return sizeof(struct sev_data_snp_page_reclaim);
+ case SEV_CMD_SNP_GUEST_STATUS: return sizeof(struct sev_data_snp_guest_status);
+ case SEV_CMD_SNP_LAUNCH_FINISH: return sizeof(struct sev_data_snp_launch_finish);
+ case SEV_CMD_SNP_DBG_DECRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_DBG_ENCRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_PAGE_UNSMASH: return sizeof(struct sev_data_snp_page_unsmash);
+ case SEV_CMD_SNP_PLATFORM_STATUS: return sizeof(struct sev_data_snp_platform_status_buf);
+ case SEV_CMD_SNP_GUEST_REQUEST: return sizeof(struct sev_data_snp_guest_request);
+ case SEV_CMD_SNP_CONFIG: return sizeof(struct sev_user_data_snp_config);
default: return 0;
}

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1595088c428b..01ba9dc46ca3 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -86,6 +86,34 @@ enum sev_cmd {
SEV_CMD_DBG_DECRYPT = 0x060,
SEV_CMD_DBG_ENCRYPT = 0x061,

+ /* SNP specific commands */
+ SEV_CMD_SNP_INIT = 0x81,
+ SEV_CMD_SNP_SHUTDOWN = 0x82,
+ SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
+ SEV_CMD_SNP_DF_FLUSH = 0x84,
+ SEV_CMD_SNP_INIT_EX = 0x85,
+ SEV_CMD_SNP_DECOMMISSION = 0x90,
+ SEV_CMD_SNP_ACTIVATE = 0x91,
+ SEV_CMD_SNP_GUEST_STATUS = 0x92,
+ SEV_CMD_SNP_GCTX_CREATE = 0x93,
+ SEV_CMD_SNP_GUEST_REQUEST = 0x94,
+ SEV_CMD_SNP_ACTIVATE_EX = 0x95,
+ SEV_CMD_SNP_LAUNCH_START = 0xA0,
+ SEV_CMD_SNP_LAUNCH_UPDATE = 0xA1,
+ SEV_CMD_SNP_LAUNCH_FINISH = 0xA2,
+ SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
+ SEV_CMD_SNP_DBG_ENCRYPT = 0xB1,
+ SEV_CMD_SNP_PAGE_SWAP_OUT = 0xC0,
+ SEV_CMD_SNP_PAGE_SWAP_IN = 0xC1,
+ SEV_CMD_SNP_PAGE_MOVE = 0xC2,
+ SEV_CMD_SNP_PAGE_MD_INIT = 0xC3,
+ SEV_CMD_SNP_PAGE_MD_RECLAIM = 0xC4,
+ SEV_CMD_SNP_PAGE_RO_RECLAIM = 0xC5,
+ SEV_CMD_SNP_PAGE_RO_RESTORE = 0xC6,
+ SEV_CMD_SNP_PAGE_RECLAIM = 0xC7,
+ SEV_CMD_SNP_PAGE_UNSMASH = 0xC8,
+ SEV_CMD_SNP_CONFIG = 0xC9,
+
SEV_CMD_MAX,
};

@@ -531,6 +559,200 @@ struct sev_data_attestation_report {
u32 len; /* In/Out */
} __packed;

+/**
+ * struct sev_data_snp_platform_status_buf - SNP_PLATFORM_STATUS command params
+ *
+ * @status_paddr: physical address where the status should be copied
+ */
+struct sev_data_snp_platform_status_buf {
+ u64 status_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: len of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+ u64 address; /* In */
+ u32 len; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_gctx_create - SNP_GCTX_CREATE command params
+ *
+ * @gctx_paddr: system physical address of the page donated to firmware by
+ * the hypervisor to contain the guest context.
+ */
+struct sev_data_snp_gctx_create {
+ u64 gctx_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+ u64 gctx_paddr; /* In */
+ u32 asid; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_decommission - SNP_DECOMMISSION command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_decommission {
+ u64 gctx_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ * guest-assisted migration.
+ * @ma_en: the guest is associated with a migration agent
+ */
+struct sev_data_snp_launch_start {
+ u64 gctx_paddr; /* In */
+ u64 policy; /* In */
+ u64 ma_gctx_paddr; /* In */
+ u32 ma_en:1; /* In */
+ u32 imi_en:1; /* In */
+ u32 rsvd:30;
+ u8 gosvw[16]; /* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+ SNP_PAGE_TYPE_NORMAL = 0x1,
+ SNP_PAGE_TYPE_VMSA = 0x2,
+ SNP_PAGE_TYPE_ZERO = 0x3,
+ SNP_PAGE_TYPE_UNMEASURED = 0x4,
+ SNP_PAGE_TYPE_SECRET = 0x5,
+ SNP_PAGE_TYPE_CPUID = 0x6,
+
+ SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ */
+struct sev_data_snp_launch_update {
+ u64 gctx_paddr; /* In */
+ u32 page_size:1; /* In */
+ u32 page_type:3; /* In */
+ u32 imi_page:1; /* In */
+ u32 rsvd:27;
+ u32 rsvd2;
+ u64 address; /* In */
+ u32 rsvd3:8;
+ u32 vmpl1_perms:8; /* In */
+ u32 vmpl2_perms:8; /* In */
+ u32 vmpl3_perms:8; /* In */
+ u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_launch_finish {
+ u64 gctx_paddr;
+ u64 id_block_paddr;
+ u64 id_auth_paddr;
+ u8 id_block_en:1;
+ u8 auth_key_en:1;
+ u64 rsvd:62;
+ u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+ u64 gctx_paddr;
+ u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be reclaimed. BIT0 indicates
+ * the page size. 0h indicates 4 kB and 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. BIT0 indicates
+ * the page size. 0h indicates 4 kB and 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ * @len: len of data to operate on
+ */
+struct sev_data_snp_dbg {
+ u64 gctx_paddr; /* In */
+ u64 src_addr; /* In */
+ u64 dst_addr; /* In */
+ u32 len; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+ u64 gctx_paddr; /* In */
+ u64 req_paddr; /* In */
+ u64 res_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX structure
+ *
+ * @init_rmp: indicate that the RMP should be initialized.
+ */
+struct sev_data_snp_init_ex {
+ u32 init_rmp:1;
+ u32 rsvd:31;
+ u8 rsvd1[60];
+} __packed;
+
#ifdef CONFIG_CRYPTO_DEV_SP_PSP

/**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 91b4c63d5cbf..bed65a891223 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -61,6 +61,13 @@ typedef enum {
SEV_RET_INVALID_PARAM,
SEV_RET_RESOURCE_LIMIT,
SEV_RET_SECURE_DATA_INVALID,
+ SEV_RET_INVALID_PAGE_SIZE,
+ SEV_RET_INVALID_PAGE_STATE,
+ SEV_RET_INVALID_MDATA_ENTRY,
+ SEV_RET_INVALID_PAGE_OWNER,
+ SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+ SEV_RET_RMP_INIT_REQUIRED,
+
SEV_RET_MAX,
} sev_ret_code;

@@ -147,6 +154,41 @@ struct sev_user_data_get_id2 {
__u32 length; /* In/Out */
} __packed;

+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @build_id: firmware build id for the API version
+ * @guest_count: the number of guests currently managed by the firmware
+ * @tcb_version: current TCB version
+ */
+struct sev_user_data_snp_status {
+ __u8 api_major; /* Out */
+ __u8 api_minor; /* Out */
+ __u8 state; /* Out */
+ __u8 rsvd;
+ __u32 build_id; /* Out */
+ __u32 rsvd1;
+ __u32 guest_count; /* Out */
+ __u64 tcb_version; /* Out */
+ __u64 rsvd2;
+} __packed;
+
+/**
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ */
+struct sev_user_data_snp_config {
+ __u64 reported_tcb; /* In */
+ __u32 mask_chip_id; /* In */
+ __u8 rsvd[52];
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1

2022-06-20 23:06:07

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 12/49] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

From: Brijesh Singh <[email protected]>

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH command
to prepare for the first SNP guest launch after INIT.
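
The ordering this patch implements (clear MSR_VM_HSAVE_PA on every CPU, issue
SNP_INIT, then WBINVD followed by DF_FLUSH) can be sketched in userspace with
stubs that only record the order of the steps. All sketch_* names are
hypothetical. No firmware command or MSR write is actually performed.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_step {
	SKETCH_CLEAR_HSAVE_PA,	/* models wrmsrl(MSR_VM_HSAVE_PA, 0) on each CPU */
	SKETCH_SNP_INIT,	/* models the SNP_INIT firmware command */
	SKETCH_WBINVD,		/* models wbinvd_on_all_cpus() */
	SKETCH_DF_FLUSH,	/* models the SNP_DF_FLUSH firmware command */
};

#define SKETCH_MAX_STEPS 8

static enum sketch_step sketch_log[SKETCH_MAX_STEPS];
static int sketch_nsteps;
static bool sketch_snp_inited;
/* Return code the fake SNP_INIT command reports; 0 means success. */
static int sketch_init_rc;

/* Record a step in the log and hand back the return code it should report. */
static int sketch_do(enum sketch_step step, int rc)
{
	assert(sketch_nsteps < SKETCH_MAX_STEPS);
	sketch_log[sketch_nsteps++] = step;
	return rc;
}

/* Mirror of the control flow in __sev_snp_init_locked() from the diff. */
static int sketch_snp_init(void)
{
	int rc;

	/* Initialization is idempotent: a second call does nothing. */
	if (sketch_snp_inited)
		return 0;

	sketch_do(SKETCH_CLEAR_HSAVE_PA, 0);

	rc = sketch_do(SKETCH_SNP_INIT, sketch_init_rc);
	if (rc)
		return rc;

	/* Prepare for the first guest launch after INIT. */
	sketch_do(SKETCH_WBINVD, 0);
	rc = sketch_do(SKETCH_DF_FLUSH, 0);
	if (rc)
		return rc;

	sketch_snp_inited = true;
	return 0;
}
```

Note that the inited flag is only set once DF_FLUSH has succeeded, so a
failure at any step leaves the device in the "not initialized" state and a
later call retries the whole sequence.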

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 121 +++++++++++++++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.h | 2 +
include/linux/psp-sev.h | 16 +++++
3 files changed, 139 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 9cb3265f3bef..f1173221d0b9 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -33,6 +33,10 @@
#define SEV_FW_FILE "amd/sev.fw"
#define SEV_FW_NAME_SIZE 64

+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR 1
+#define SNP_MIN_API_MINOR 51
+
static DEFINE_MUTEX(sev_cmd_mutex);
static struct sev_misc_dev *misc_dev;

@@ -775,6 +779,98 @@ static int sev_update_firmware(struct device *dev)
return ret;
}

+static void snp_set_hsave_pa(void *arg)
+{
+ wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+ struct psp_device *psp = psp_master;
+ struct sev_device *sev;
+ int rc = 0;
+
+ if (!psp || !psp->sev_data)
+ return -ENODEV;
+
+ sev = psp->sev_data;
+
+ if (sev->snp_inited)
+ return 0;
+
+ /*
+ * SNP_INIT requires that MSR_VM_HSAVE_PA be set to 0h
+ * across all cores.
+ */
+ on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+ /* Issue the SNP_INIT firmware command. */
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+ if (rc)
+ return rc;
+
+ /* Prepare for first SNP guest launch after INIT */
+ wbinvd_on_all_cpus();
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+ if (rc)
+ return rc;
+
+ sev->snp_inited = true;
+ dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+ return rc;
+}
+
+int sev_snp_init(int *error)
+{
+ int rc;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENODEV;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = __sev_snp_init_locked(error);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(sev_snp_init);
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ int ret;
+
+ if (!sev->snp_inited)
+ return 0;
+
+ /* SHUTDOWN requires the DF_FLUSH */
+ wbinvd_on_all_cpus();
+ __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN, NULL, error);
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+ return ret;
+ }
+
+ sev->snp_inited = false;
+ dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+ return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+ int rc;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = __sev_snp_shutdown_locked(error);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+
static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
{
struct sev_device *sev = psp_master->sev_data;
@@ -1231,6 +1327,8 @@ static void sev_firmware_shutdown(struct sev_device *sev)
get_order(NV_LENGTH));
sev_init_ex_buffer = NULL;
}
+
+ sev_snp_shutdown(NULL);
}

void sev_dev_destroy(struct psp_device *psp)
@@ -1287,6 +1385,26 @@ void sev_pci_init(void)
}
}

+ /*
+ * If the boot CPU supports SNP, then first attempt to initialize
+ * the SNP firmware.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
+ if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+ dev_err(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+ SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+ } else {
+ rc = sev_snp_init(&error);
+ if (rc) {
+ /*
+ * If we failed to INIT SNP then don't abort the probe.
+ * Continue to initialize the legacy SEV firmware.
+ */
+ dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
+ }
+ }
+ }
+
/* Obtain the TMR memory area for SEV-ES use */
sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
if (!sev_es_tmr)
@@ -1302,6 +1420,9 @@ void sev_pci_init(void)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);

+ dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_inited ?
+ "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+
return;

err:
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 666c21eb81ab..186ad20cbd24 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
u8 build;

void *cmd_buf;
+
+ bool snp_inited;
};

int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 01ba9dc46ca3..ef4d42e8c96e 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -769,6 +769,20 @@ struct sev_data_snp_init_ex {
*/
int sev_platform_init(int *error);

+/**
+ * sev_snp_init - perform SEV SNP_INIT command
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%ENOTSUPP if the SEV does not support SEV
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the SEV returned a non-zero return code
+ */
+int sev_snp_init(int *error);
+
/**
* sev_platform_status - perform SEV PLATFORM_STATUS command
*
@@ -876,6 +890,8 @@ sev_platform_status(struct sev_user_data_status *status, int *error) { return -E

static inline int sev_platform_init(int *error) { return -ENODEV; }

+static inline int sev_snp_init(int *error) { return -ENODEV; }
+
static inline int
sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENODEV; }

--
2.25.1

2022-06-20 23:06:07

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 09/49] x86/fault: Add support to handle the RMP fault for user address

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. When the host writes to pages, hardware checks the following
conditions at the end of page walk:

1. Assigned bit in the RMP table is zero (i.e page is shared).
2. If the page table entry that gives the sPA indicates that the target
page size is a large page, then all RMP entries for the 4KB
constituting pages of the target must have the assigned bit 0.
3. Immutable bit in the RMP table is not zero.

The hardware raises a page fault if any of the above conditions is not
met. Try resolving the fault instead of taking the fault again and again.
If the host attempts to write to guest private memory, then send the
SIGBUS signal to kill the process. If the page levels of the host mapping
and the RMP entry do not match, then split the host mapping to keep the
RMP and host page levels in sync.
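
The decision made by the fault handler can be sketched as a pure function in
userspace. This is a simplified model, not the kernel code: the sketch_*
names are hypothetical, and the pfn helper only covers the 2MB case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_action {
	SKETCH_RETRY,	/* return and let the access be retried */
	SKETCH_SIGBUS,	/* guest private page: terminate the process */
	SKETCH_SPLIT,	/* host mapping is larger than the RMP entry */
};

enum {
	SKETCH_LEVEL_4K = 1,
	SKETCH_LEVEL_2M = 2,
};

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGES_PER_2M	512ULL

/* For a 2MB mapping, add the 4K index within the region to the base pfn. */
static uint64_t sketch_fault_pfn_2m(uint64_t base_pfn, uint64_t address)
{
	return base_pfn | ((address >> SKETCH_PAGE_SHIFT) & (SKETCH_PAGES_PER_2M - 1));
}

/* Decide how to handle an RMP fault on a user address. */
static enum sketch_action sketch_rmp_fault(bool pte_present, bool rmp_assigned,
					   int host_level, int rmp_level)
{
	assert(host_level >= SKETCH_LEVEL_4K && rmp_level >= SKETCH_LEVEL_4K);

	/* Raced with an unmap: nothing to fix up, just retry. */
	if (!pte_present)
		return SKETCH_RETRY;

	/* The page belongs to a guest, so the fault cannot be resolved. */
	if (rmp_assigned)
		return SKETCH_SIGBUS;

	/* Host maps a larger page than the RMP entry describes. */
	if (host_level > rmp_level)
		return SKETCH_SPLIT;

	return SKETCH_RETRY;
}
```

In the patch itself the SIGBUS case is delivered through do_sigbus() and the
split case is signalled to the generic fault path via FAULT_FLAG_PAGE_SPLIT.
The enum above merely names the three outcomes.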

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/mm/fault.c | 66 ++++++++++++++++++++++++++++++++++++++++
include/linux/mm.h | 3 +-
include/linux/mm_types.h | 3 ++
mm/memory.c | 13 ++++++++
4 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a4c270e99f7f..f5de9673093a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
#include <linux/uaccess.h> /* faulthandler_disabled() */
#include <linux/efi.h> /* efi_crash_gracefully_on_page_fault()*/
#include <linux/mm_types.h>
+#include <linux/sev.h> /* snp_lookup_rmpentry() */

#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/traps.h> /* dotraplinkage, ... */
@@ -1209,6 +1210,60 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
}
NOKPROBE_SYMBOL(do_kern_addr_fault);

+static inline size_t pages_per_hpage(int level)
+{
+ return page_level_size(level) / PAGE_SIZE;
+}
+
+/*
+ * Return 1 if the caller needs to retry, 0 if the address needs to be split
+ * in order to resolve the fault.
+ */
+static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
+ unsigned long address)
+{
+ int rmp_level, level;
+ pte_t *pte;
+ u64 pfn;
+
+ pte = lookup_address_in_mm(current->mm, address, &level);
+
+ /*
+ * It can happen if there was a race between an unmap event and
+ * the RMP fault delivery.
+ */
+ if (!pte || !pte_present(*pte))
+ return 1;
+
+ pfn = pte_pfn(*pte);
+
+ /* If it's a large page, then calculate the faulting pfn */
+ if (level > PG_LEVEL_4K) {
+ unsigned long mask;
+
+ mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
+ pfn |= (address >> PAGE_SHIFT) & mask;
+ }
+
+ /*
+ * If it's a guest private page, then the fault cannot be resolved.
+ * Send a SIGBUS to terminate the process.
+ */
+ if (snp_lookup_rmpentry(pfn, &rmp_level)) {
+ do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+ return 1;
+ }
+
+ /*
+ * The backing page level is higher than the RMP page level, request
+ * to split the page.
+ */
+ if (level > rmp_level)
+ return 0;
+
+ return 1;
+}
+
/*
* Handle faults in the user portion of the address space. Nothing in here
* should check X86_PF_USER without a specific justification: for almost
@@ -1306,6 +1361,17 @@ void do_user_addr_fault(struct pt_regs *regs,
if (error_code & X86_PF_INSTR)
flags |= FAULT_FLAG_INSTRUCTION;

+ /*
+ * If it's an RMP violation, try resolving it.
+ */
+ if (error_code & X86_PF_RMP) {
+ if (handle_user_rmp_page_fault(regs, error_code, address))
+ return;
+
+ /* Ask to split the page */
+ flags |= FAULT_FLAG_PAGE_SPLIT;
+ }
+
#ifdef CONFIG_X86_64
/*
* Faults in the vsyscall page might need emulation. The
diff --git a/include/linux/mm.h b/include/linux/mm.h
index de32c0383387..2ccc562d166f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -463,7 +463,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
{ FAULT_FLAG_USER, "USER" }, \
{ FAULT_FLAG_REMOTE, "REMOTE" }, \
{ FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \
- { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
+ { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
+ { FAULT_FLAG_PAGE_SPLIT, "PAGESPLIT" }

/*
* vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6dfaf271ebf8..aa2d8d48ce3e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -818,6 +818,8 @@ typedef struct {
* mapped R/O.
* @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
* We should only access orig_pte if this flag set.
+ * @FAULT_FLAG_PAGE_SPLIT: The fault was due to a page size mismatch, split
+ * the region to a smaller page size and retry.
*
* About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
* whether we would allow page faults to retry by specifying these two
@@ -855,6 +857,7 @@ enum fault_flag {
FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
FAULT_FLAG_UNSHARE = 1 << 10,
FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
+ FAULT_FLAG_PAGE_SPLIT = 1 << 12,
};

typedef unsigned int __bitwise zap_flags_t;
diff --git a/mm/memory.c b/mm/memory.c
index 7274f2b52bca..c2187ffcbb8e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4945,6 +4945,15 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
return 0;
}

+static int handle_split_page_fault(struct vm_fault *vmf)
+{
+ if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ return VM_FAULT_SIGBUS;
+
+ __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+ return 0;
+}
+
/*
* By the time we get here, we already hold the mm semaphore
*
@@ -5024,6 +5033,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
pmd_migration_entry_wait(mm, vmf.pmd);
return 0;
}
+
+ if (flags & FAULT_FLAG_PAGE_SPLIT)
+ return handle_split_page_fault(&vmf);
+
if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
return do_huge_pmd_numa_page(&vmf);
--
2.25.1

2022-06-20 23:06:21

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 13/49] crypto:ccp: Provide APIs to issue SEV-SNP commands

From: Brijesh Singh <[email protected]>

Provide the APIs for the hypervisor to manage an SEV-SNP guest. The
commands for SEV-SNP are defined in the SEV-SNP firmware specification.

Signed-off-by: Brijesh Singh <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 24 ++++++++++++
include/linux/psp-sev.h | 73 ++++++++++++++++++++++++++++++++++++
2 files changed, 97 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index f1173221d0b9..35d76333e120 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1205,6 +1205,30 @@ int sev_guest_df_flush(int *error)
}
EXPORT_SYMBOL_GPL(sev_guest_df_flush);

+int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error)
+{
+ return sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_decommission);
+
+int snp_guest_df_flush(int *error)
+{
+ return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_df_flush);
+
+int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
+{
+ return sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_page_reclaim);
+
+int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+{
+ return sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index ef4d42e8c96e..9f921d221b75 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -881,6 +881,64 @@ int sev_guest_df_flush(int *error);
*/
int sev_guest_decommission(struct sev_data_decommission *data, int *error);

+/**
+ * snp_guest_df_flush - perform SNP DF_FLUSH command
+ *
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV if the sev device is not available
+ * -%ENOTSUPP if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO if the sev returned a non-zero return code
+ */
+int snp_guest_df_flush(int *error);
+
+/**
+ * snp_guest_decommission - perform SNP_DECOMMISSION command
+ *
+ * @data: sev_data_snp_decommission structure to be processed
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV if the sev device is not available
+ * -%ENOTSUPP if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO if the sev returned a non-zero return code
+ */
+int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error);
+
+/**
+ * snp_guest_page_reclaim - perform SNP_PAGE_RECLAIM command
+ *
+ * @data: sev_data_snp_page_reclaim structure to be processed
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV if the sev device is not available
+ * -%ENOTSUPP if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO if the sev returned a non-zero return code
+ */
+int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
+
+/**
+ * snp_guest_dbg_decrypt - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV if the sev device is not available
+ * -%ENOTSUPP if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO if the sev returned a non-zero return code
+ */
+int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
+
void *psp_copy_user_blob(u64 uaddr, u32 len);

#else /* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -908,6 +966,21 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int

static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }

+static inline int
+snp_guest_decommission(struct sev_data_snp_decommission *data, int *error) { return -ENODEV; }
+
+static inline int snp_guest_df_flush(int *error) { return -ENODEV; }
+
+static inline int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
+{
+ return -ENODEV;
+}
+
+static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+{
+ return -ENODEV;
+}
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1

2022-06-20 23:06:58

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior and requirements of the SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware specification
for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB sized/aligned region
when SNP is enabled to satisfy the new SNP requirements. Continue
allocating a 1MB region for the !SNP configuration.
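
As a rough illustration of the sizes above, the following userspace sketch computes the page-allocation order for each TMR size. It assumes 4 KiB base pages and that an order-N page allocation is naturally aligned to its own size; order_for_size() and alignment_for_order() are hypothetical stand-ins written for this example, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Sizes quoted in the commit message. */
#define TMR_SIZE_LEGACY (1024UL * 1024UL)       /* 1MB for the !SNP case */
#define TMR_SIZE_SNP    (2UL * 1024UL * 1024UL) /* 2MB when SNP is enabled */

/* Hypothetical stand-in for an order calculation: smallest order whose
 * block of (1 << order) pages covers 'size' bytes. */
static unsigned int order_for_size(size_t size)
{
	size_t pages;
	unsigned int order = 0;

	assert(size > 0);
	pages = (size + ((size_t)1 << EX_PAGE_SHIFT) - 1) >> EX_PAGE_SHIFT;
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Size in bytes (and, under the stated assumption, the natural alignment)
 * of a block of the given order. */
static size_t alignment_for_order(unsigned int order)
{
	return (size_t)1 << (order + EX_PAGE_SHIFT);
}
```

Under these assumptions the 1MB region maps to order 8 and the 2MB region to order 9, so requesting the larger order would yield both the size and the alignment mentioned above.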

While at it, provide an API that others can use to allocate a page
that can be used by the firmware. The immediate user of this API will
be the KVM driver, which needs to allocate a firmware context
page during guest creation. The context page needs to be updated
by the firmware. See the SEV-SNP specification for further details.

Signed-off-by: Brijesh Singh <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 173 +++++++++++++++++++++++++++++++++--
include/linux/psp-sev.h | 11 +++
2 files changed, 178 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 35d76333e120..0dbd99f29b25 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -79,6 +79,14 @@ static void *sev_es_tmr;
#define NV_LENGTH (32 * 1024)
static void *sev_init_ex_buffer;

+/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
+#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+static int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -177,11 +185,161 @@ static int sev_cmd_buffer_len(int cmd)
return 0;
}

+static void snp_leak_pages(unsigned long pfn, unsigned int npages)
+{
+ WARN(1, "psc failed, pfn 0x%lx pages %d (leaking)\n", pfn, npages);
+ while (npages--) {
+ memory_failure(pfn, 0);
+ dump_rmpentry(pfn);
+ pfn++;
+ }
+}
+
+static int snp_reclaim_pages(unsigned long pfn, unsigned int npages, bool locked)
+{
+ struct sev_data_snp_page_reclaim data;
+ int ret, err, i, n = 0;
+
+ for (i = 0; i < npages; i++) {
+ memset(&data, 0, sizeof(data));
+ data.paddr = pfn << PAGE_SHIFT;
+
+ if (locked)
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ else
+ ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (ret)
+ goto cleanup;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ goto cleanup;
+
+ pfn++;
+ n++;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * If failed to reclaim the page then page is no longer safe to
+ * be released, leak it.
+ */
+ snp_leak_pages(pfn, npages - n);
+ return ret;
+}
+
+static inline int rmp_make_firmware(unsigned long pfn, int level)
+{
+ return rmp_make_private(pfn, 0, level, 0, true);
+}
+
+static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
+ bool need_reclaim)
+{
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT; /* Cbit maybe set in the paddr */
+ int rc, n = 0, i;
+
+ for (i = 0; i < npages; i++) {
+ if (to_fw)
+ rc = rmp_make_firmware(pfn, PG_LEVEL_4K);
+ else
+ rc = need_reclaim ? snp_reclaim_pages(pfn, 1, locked) :
+ rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (rc)
+ goto cleanup;
+
+ pfn++;
+ n++;
+ }
+
+ return 0;
+
+cleanup:
+ /* Try unrolling the firmware state changes */
+ if (to_fw) {
+ /*
+ * Reclaim the pages which were already changed to the
+ * firmware state.
+ */
+ snp_reclaim_pages(paddr >> PAGE_SHIFT, n, locked);
+
+ return rc;
+ }
+
+ /*
+ * If failed to change the page state to shared, then its not safe
+ * to release the page back to the system, leak it.
+ */
+ snp_leak_pages(pfn, npages - n);
+
+ return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+ unsigned long npages = 1ul << order, paddr;
+ struct sev_device *sev;
+ struct page *page;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ page = alloc_pages(gfp_mask, order);
+ if (!page)
+ return NULL;
+
+ /* If SEV-SNP is initialized then add the page in RMP table. */
+ sev = psp_master->sev_data;
+ if (!sev->snp_inited)
+ return page;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (snp_set_rmp_state(paddr, npages, true, locked, false))
+ return NULL;
+
+ return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+ struct page *page;
+
+ page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+ return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+ unsigned long paddr, npages = 1ul << order;
+
+ if (!page)
+ return;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (snp_set_rmp_state(paddr, npages, false, locked, true))
+ return;
+
+ __free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+ if (!addr)
+ return;
+
+ __snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL(snp_free_firmware_page);
+
static void *sev_fw_alloc(unsigned long len)
{
struct page *page;

- page = alloc_pages(GFP_KERNEL, get_order(len));
+ page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
if (!page)
return NULL;

@@ -393,7 +551,7 @@ static int __sev_init_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -421,7 +579,7 @@ static int __sev_init_ex_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);

data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}

return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -818,6 +976,8 @@ static int __sev_snp_init_locked(int *error)
sev->snp_inited = true;
dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");

+ sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
return rc;
}

@@ -1341,8 +1501,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
/* The TMR area was encrypted, flush it from the cache */
wbinvd_on_all_cpus();

- free_pages((unsigned long)sev_es_tmr,
- get_order(SEV_ES_TMR_SIZE));
+ __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+ get_order(sev_es_tmr_size),
+ false);
sev_es_tmr = NULL;
}

@@ -1430,7 +1591,7 @@ void sev_pci_init(void)
}

/* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+ sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
if (!sev_es_tmr)
dev_warn(sev->dev,
"SEV: TMR allocation failed, SEV-ES support unavailable\n");
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 9f921d221b75..a3bb792bb842 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -12,6 +12,8 @@
#ifndef __PSP_SEV_H__
#define __PSP_SEV_H__

+#include <linux/sev.h>
+
#include <uapi/linux/psp-sev.h>

#ifdef CONFIG_X86
@@ -940,6 +942,8 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);

void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);

#else /* !CONFIG_CRYPTO_DEV_SP_PSP */

@@ -981,6 +985,13 @@ static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *erro
return -ENODEV;
}

+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+ return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1

2022-06-20 23:09:44

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 15/49] crypto: ccp: Handle the legacy SEV command when SNP is enabled

From: Brijesh Singh <[email protected]>

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, any memory that a
SEV-legacy command causes the firmware to write to must be in the
firmware state before the command is issued.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, change the page state to firmware in the RMP
table before issuing the command and restore the state to shared after the
command completes.

For case #2, use a bounce buffer to complete the request.
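
The two cases can be modeled in a few lines of userspace C. This is only a simplified sketch: the RMP is reduced to a two-value enum, and struct bounce_map, map_writeable() and unmap_writeable() are hypothetical names for this example rather than the functions added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of the page states involved. */
enum page_state { PAGE_SHARED, PAGE_FIRMWARE };

#define BOUNCE_MAX 64 /* hypothetical size of the pre-allocated buffer */

struct bounce_map {
	unsigned char *caller_buf;       /* original host destination */
	size_t len;                      /* bytes to copy back */
	unsigned char host[BOUNCE_MAX];  /* pre-allocated bounce buffer */
	enum page_state host_state;      /* modeled state of the bounce buffer */
	bool active;
};

/* Before the command: return the buffer the firmware may write to,
 * or NULL if the request does not fit in the bounce buffer. */
static unsigned char *map_writeable(unsigned char *buf, size_t len, bool guest,
				    enum page_state *guest_state,
				    struct bounce_map *map)
{
	assert(map != NULL);
	map->active = false;

	if (guest) {
		/* Case 1: guest memory, flip its state to firmware in place. */
		*guest_state = PAGE_FIRMWARE;
		map->active = true;
		return buf;
	}

	/* Case 2: host memory, redirect the firmware to the bounce buffer. */
	if (len > BOUNCE_MAX)
		return NULL;
	map->caller_buf = buf;
	map->len = len;
	map->host_state = PAGE_FIRMWARE;
	map->active = true;
	return map->host;
}

/* After the command: restore the shared state and, for host memory,
 * copy the firmware's output back to the caller's buffer. */
static void unmap_writeable(bool guest, enum page_state *guest_state,
			    struct bounce_map *map)
{
	assert(map != NULL);
	if (!map->active)
		return;

	if (guest) {
		*guest_state = PAGE_SHARED;
	} else {
		/* The state must be shared again before the host reads it. */
		map->host_state = PAGE_SHARED;
		memcpy(map->caller_buf, map->host, map->len);
	}
	map->active = false;
}
```

In this model the caller's host buffer never changes state; only the bounce buffer does, which is the point of case #2.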

Signed-off-by: Brijesh Singh <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 346 ++++++++++++++++++++++++++++++++++-
drivers/crypto/ccp/sev-dev.h | 12 ++
2 files changed, 348 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 0dbd99f29b25..75f5c4ed9ac3 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -441,12 +441,295 @@ static void sev_write_init_ex_file_if_required(int cmd_id)
sev_write_init_ex_file();
}

+static int alloc_snp_host_map(struct sev_device *sev)
+{
+ struct page *page;
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ memset(map, 0, sizeof(*map));
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+ if (!page)
+ return -ENOMEM;
+
+ map->host = page_address(page);
+ }
+
+ return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ if (map->host) {
+ __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+ memset(map, 0, sizeof(*map));
+ }
+ }
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ map->active = false;
+
+ if (!paddr || !len)
+ return 0;
+
+ map->paddr = *paddr;
+ map->len = len;
+
+ /* If paddr points to a guest memory then change the page state to firmware. */
+ if (guest) {
+ if (snp_set_rmp_state(*paddr, npages, true, true, false))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ if (!map->host)
+ return -ENOMEM;
+
+ /* Check if the pre-allocated buffer can be used to fulfill the request. */
+ if (len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /* Transition the pre-allocated buffer to the firmware state. */
+ if (snp_set_rmp_state(__pa(map->host), npages, true, true, false))
+ return -EFAULT;
+
+ /* Set the paddr to use pre-allocated firmware buffer */
+ *paddr = __psp_pa(map->host);
+
+done:
+ map->active = true;
+ return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ if (!map->active)
+ return 0;
+
+ /* If paddr points to a guest memory then restore the page state to hypervisor. */
+ if (guest) {
+ if (snp_set_rmp_state(*paddr, npages, false, true, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ /*
+ * Transition the pre-allocated buffer to hypervisor state before the access.
+ *
+ * This is because while changing the page state to firmware, the kernel unmaps
+ * the pages from the direct map, and to restore the direct map we must
+ * transition the pages to shared state.
+ */
+ if (snp_set_rmp_state(__pa(map->host), npages, false, true, true))
+ return -EFAULT;
+
+ /* Copy the response data firmware buffer to the callers buffer. */
+ memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+ *paddr = map->paddr;
+
+done:
+ map->active = false;
+ return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+ switch (cmd) {
+ case SEV_CMD_PLATFORM_STATUS:
+ case SEV_CMD_GUEST_STATUS:
+ case SEV_CMD_LAUNCH_START:
+ case SEV_CMD_RECEIVE_START:
+ case SEV_CMD_LAUNCH_MEASURE:
+ case SEV_CMD_SEND_START:
+ case SEV_CMD_SEND_UPDATE_DATA:
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ case SEV_CMD_PEK_CSR:
+ case SEV_CMD_PDH_CERT_EXPORT:
+ case SEV_CMD_GET_ID:
+ case SEV_CMD_ATTESTATION_REPORT:
+ return true;
+ default:
+ return false;
+ }
+}
+
+#define prep_buffer(name, addr, len, guest, map) \
+ func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+ int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+ struct sev_device *sev = psp_master->sev_data;
+ bool from_fw = !to_fw;
+
+ /*
+ * After the command is completed, change the command buffer memory to
+ * hypervisor state.
+ *
+ * The immutable bit is automatically cleared by the firmware, so
+ * there is no need to reclaim the page.
+ */
+ if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (snp_set_rmp_state(__pa(cmd_buf), 1, false, true, false))
+ return -EFAULT;
+
+ /* No need to go further if firmware failed to execute command. */
+ if (fw_err)
+ return 0;
+ }
+
+ if (to_fw)
+ func = map_firmware_writeable;
+ else
+ func = unmap_firmware_writeable;
+
+ /*
+ * A command buffer may contains a system physical address. If the address
+ * points to a host memory then use an intermediate firmware page otherwise
+ * change the page state in the RMP table.
+ */
+ switch (cmd) {
+ case SEV_CMD_PDH_CERT_EXPORT:
+ if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+ pdh_cert_len, false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+ cert_chain_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_GET_ID:
+ if (prep_buffer(struct sev_data_get_id, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_PEK_CSR:
+ if (prep_buffer(struct sev_data_pek_csr, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_launch_update_data, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_MEASURE:
+ if (prep_buffer(struct sev_data_launch_measure, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_SECRET:
+ if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_DECRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_ENCRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_ATTESTATION_REPORT:
+ if (prep_buffer(struct sev_data_attestation_report, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_START:
+ if (prep_buffer(struct sev_data_send_start, session_address,
+ session_len, false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_data, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ default:
+ break;
+ }
+
+ /* The command buffer need to be in the firmware state. */
+ if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (snp_set_rmp_state(__pa(cmd_buf), 1, true, true, false))
+ return -EFAULT;
+ }
+
+ return 0;
+
+err:
+ return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+ struct sev_device *sev = psp_master->sev_data;
+
+ /* After SNP is INIT'ed, the behavior of legacy SEV command is changed. */
+ return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_inited) ? true : false;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+ return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+ return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
{
struct psp_device *psp = psp_master;
struct sev_device *sev;
unsigned int phys_lsb, phys_msb;
unsigned int reg, ret = 0;
+ void *cmd_buf;
int buf_len;

if (!psp || !psp->sev_data)
@@ -466,12 +749,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
* work for some memory, e.g. vmalloc'd addresses, and @data may not be
* physically contiguous.
*/
- if (data)
- memcpy(sev->cmd_buf, data, buf_len);
+ if (data) {
+ if (sev->cmd_buf_active > 2)
+ return -EBUSY;
+
+ cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+ memcpy(cmd_buf, data, buf_len);
+ sev->cmd_buf_active++;
+
+ /*
+ * The behavior of the SEV-legacy commands is altered when the
+ * SNP firmware is in the INIT state.
+ */
+ if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
+ return -EFAULT;
+ } else {
+ cmd_buf = sev->cmd_buf;
+ }

/* Get the physical address of the command buffer */
- phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
- phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+ phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+ phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;

dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
cmd, phys_msb, phys_lsb, psp_timeout);
@@ -514,15 +813,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
sev_write_init_ex_file_if_required(cmd);
}

- print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
- buf_len, false);
-
/*
* Copy potential output from the PSP back to data. Do this even on
* failure in case the caller wants to glean something from the error.
*/
- if (data)
- memcpy(data, sev->cmd_buf, buf_len);
+ if (data) {
+ /*
+ * Restore the page state after the command completes.
+ */
+ if (need_firmware_copy(cmd) &&
+ snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+ return -EFAULT;
+
+ memcpy(data, cmd_buf, buf_len);
+ sev->cmd_buf_active--;
+ }
+
+ print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+ buf_len, false);

return ret;
}
@@ -1451,10 +1759,12 @@ int sev_dev_init(struct psp_device *psp)
if (!sev)
goto e_err;

- sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+ sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
if (!sev->cmd_buf)
goto e_sev;

+ sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
psp->sev_data = sev;

sev->dev = dev;
@@ -1513,6 +1823,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
sev_init_ex_buffer = NULL;
}

+ /*
+ * The host map need to clear the immutable bit so it must be free'd before the
+ * SNP firmware shutdown.
+ */
+ free_snp_host_map(sev);
+
sev_snp_shutdown(NULL);
}

@@ -1588,6 +1904,14 @@ void sev_pci_init(void)
dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
}
}
+
+ /*
+ * Allocate the intermediate buffers used for the legacy command handling.
+ */
+ if (alloc_snp_host_map(sev)) {
+ dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+ goto skip_legacy;
+ }
}

/* Obtain the TMR memory area for SEV-ES use */
@@ -1605,12 +1929,14 @@ void sev_pci_init(void)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);

+skip_legacy:
dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_inited ?
"-SNP" : "", sev->api_major, sev->api_minor, sev->build);

return;

err:
+ free_snp_host_map(sev);
psp_master->sev_data = NULL;
}

diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 186ad20cbd24..fe5d7a3ebace 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
#define SEV_CMDRESP_CMD_SHIFT 16
#define SEV_CMDRESP_IOC BIT(0)

+#define MAX_SNP_HOST_MAP_BUFS 2
+
struct sev_misc_dev {
struct kref refcount;
struct miscdevice misc;
};

+struct snp_host_map {
+ u64 paddr;
+ u32 len;
+ void *host;
+ bool active;
+};
+
struct sev_device {
struct device *dev;
struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
u8 build;

void *cmd_buf;
+ void *cmd_buf_backup;
+ int cmd_buf_active;

bool snp_inited;
+ struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
};

int sev_dev_init(struct psp_device *psp);
--
2.25.1

2022-06-20 23:09:44

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 18/49] crypto: ccp: Provide APIs to query extended attestation report

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification defines a VMGEXIT that is used to get
the extended attestation report. The extended attestation report includes
the certificate blobs provided through the SNP_SET_EXT_CONFIG.

The snp_guest_ext_guest_request() will be used by the hypervisor to get
the extended attestation report. See the GHCB specification for more
details.
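
The page-count negotiation for the certificate blob can be sketched in userspace as follows. This is a hedged model only: the firmware request itself is omitted, struct cert_store and copy_certs() are hypothetical names, and EX_ERR_INVALID_LEN is a placeholder value standing in for the error code defined in the GHCB specification.

```c
#include <assert.h>
#include <string.h>

#define EX_PAGE_SHIFT 12       /* assumption: 4 KiB pages */
#define EX_ERR_INVALID_LEN 1UL /* placeholder for the GHCB-defined error code */

/* Hypothetical holder for the certificate blob configured on the host. */
struct cert_store {
	const unsigned char *data; /* may be NULL if no blob was provided */
	unsigned long len;         /* blob length, a multiple of the page size */
};

/*
 * On entry *npages is the size of the caller's buffer in pages.
 * If the buffer is too small, report the required page count and fail.
 * Otherwise copy the blob and report how many pages are valid.
 */
static int copy_certs(const struct cert_store *certs, unsigned char *dst,
		      unsigned long *npages, unsigned long *fw_err)
{
	unsigned long expected;

	assert(certs && npages && fw_err);
	expected = certs->len >> EX_PAGE_SHIFT;

	if (*npages < expected) {
		*npages = expected;
		*fw_err = EX_ERR_INVALID_LEN;
		return -1;
	}

	if (certs->data) {
		*npages = expected;
		memcpy(dst, certs->data, expected << EX_PAGE_SHIFT);
	} else {
		*npages = 0;
	}

	return 0;
}
```

A caller that receives the failure can retry with a buffer of at least the reported page count.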

Signed-off-by: Brijesh Singh <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 43 ++++++++++++++++++++++++++++++++++++
include/linux/psp-sev.h | 24 ++++++++++++++++++++
2 files changed, 67 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 97b479d5aa86..f6306b820b86 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -25,6 +25,7 @@
#include <linux/fs.h>

#include <asm/smp.h>
+#include <asm/sev.h>

#include "psp-dev.h"
#include "sev-dev.h"
@@ -1857,6 +1858,48 @@ int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
}
EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);

+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+ unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
+{
+ unsigned long expected_npages;
+ struct sev_device *sev;
+ int rc;
+
+ if (!psp_master || !psp_master->sev_data)
+ return -ENODEV;
+
+ sev = psp_master->sev_data;
+
+ if (!sev->snp_inited)
+ return -EINVAL;
+
+ /*
+ * Check if there is enough space to copy the certificate chain. Otherwise
+ * return ERROR code defined in the GHCB specification.
+ */
+ expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
+ if (*npages < expected_npages) {
+ *npages = expected_npages;
+ *fw_err = SNP_GUEST_REQ_INVALID_LEN;
+ return -EINVAL;
+ }
+
+ rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
+ if (rc)
+ return rc;
+
+ /* Copy the certificate blob */
+ if (sev->snp_certs_data) {
+ *npages = expected_npages;
+ memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
+ } else {
+ *npages = 0;
+ }
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index a3bb792bb842..cd37ccd1fa1f 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -945,6 +945,23 @@ void *psp_copy_user_blob(u64 uaddr, u32 len);
void *snp_alloc_firmware_page(gfp_t mask);
void snp_free_firmware_page(void *addr);

+/**
+ * snp_guest_ext_guest_request - perform the SNP extended guest request command
+ * defined in the GHCB specification.
+ *
+ * @data: the input guest request structure
+ * @vaddr: address where the certificate blob need to be copied.
+ * @npages: number of pages for the certificate blob.
+ * If the specified page count is less than the certificate blob size, then the
+ * required page count is returned with error code defined in the GHCB spec.
+ * If the specified page count is more than the certificate blob size, then
+ * page count is updated to reflect the amount of valid data copied in the
+ * vaddr.
+ */
+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+ unsigned long vaddr, unsigned long *npages,
+ unsigned long *error);
+
#else /* !CONFIG_CRYPTO_DEV_SP_PSP */

static inline int
@@ -992,6 +1009,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)

static inline void snp_free_firmware_page(void *addr) { }

+static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+ unsigned long vaddr, unsigned long *n,
+ unsigned long *error)
+{
+ return -ENODEV;
+}
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */

#endif /* __PSP_SEV_H__ */
--
2.25.1

2022-06-20 23:11:12

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH Part2 v6 16/49] crypto: ccp: Add the SNP_PLATFORM_STATUS command

From: Brijesh Singh <[email protected]>

The command can be used by userspace to query the SNP platform status
report. See the SEV-SNP spec for more details.

Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 27 +++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 45 ++++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 1 +
3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index bf593e88cfd9..11ea67c944df 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -61,6 +61,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
__u64 fw_err;
};

+The host ioctl should be issued to the /dev/sev device. The ioctl accepts a command
+ID and a command input structure.
+
+::
+ struct sev_issue_cmd {
+ /* Command ID */
+ __u32 cmd;
+
+ /* Command request structure */
+ __u64 data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u32 error;
+ };
+
+
2.1 SNP_GET_REPORT
------------------

@@ -118,6 +134,17 @@ be updated with the expected value.

See GHCB specification for further detail on how to parse the certificate blob.

+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_platform_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes API major, minor version and more. See the SEV-SNP
+specification for further details.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 75f5c4ed9ac3..b9b6fab31a82 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1574,6 +1574,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
return ret;
}

+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_platform_status_buf buf;
+ struct page *status_page;
+ void *data;
+ int ret;
+
+ if (!sev->snp_inited || !argp->data)
+ return -EINVAL;
+
+ status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!status_page)
+ return -ENOMEM;
+
+ data = page_address(status_page);
+ if (snp_set_rmp_state(__pa(data), 1, true, true, false)) {
+ __free_pages(status_page, 0);
+ return -EFAULT;
+ }
+
+ buf.status_paddr = __psp_pa(data);
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+ /* Change the page state before accessing it */
+ if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
+ snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
+ return -EFAULT;
+ }
+
+ if (ret)
+ goto cleanup;
+
+ if (copy_to_user((void __user *)argp->data, data,
+ sizeof(struct sev_user_data_snp_status)))
+ ret = -EFAULT;
+
+cleanup:
+ __free_pages(status_page, 0);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1625,6 +1667,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_GET_ID2:
ret = sev_ioctl_do_get_id2(&input);
break;
+ case SNP_PLATFORM_STATUS:
+ ret = sev_ioctl_snp_platform_status(&input);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index bed65a891223..ffd60e8b0a31 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
SEV_PEK_CERT_IMPORT,
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
+ SNP_PLATFORM_STATUS,

SEV_MAX,
};
--
2.25.1

2022-06-20 23:11:23

by Kalra, Ashish

Subject: [PATCH Part2 v6 19/49] KVM: SVM: Add support to handle AP reset MSR protocol

From: Tom Lendacky <[email protected]>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
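
As an illustration only (not part of the patch), the response encoding used
by the MSR protocol path can be sketched in standalone C. The helper below
mirrors the read-modify-write pattern of set_ghcb_msr_bits() but operates on a
plain integer instead of the VMCB; the names set_msr_bits() and
ap_reset_hold_response() are hypothetical, and the GHCB_MSR_INFO_* values are
assumed to be a 12-bit info field at bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the GHCB info field occupies GHCBData[11:0]. */
#define GHCB_MSR_INFO_POS                  0
#define GHCB_MSR_INFO_MASK                 0xfffULL

/* Values from the sev-common.h hunk of this patch. */
#define GHCB_MSR_AP_RESET_HOLD_RESP        0x007ULL
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS  12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK ((1ULL << 52) - 1) /* GENMASK_ULL(51, 0) */

/* Hypothetical stand-in for set_ghcb_msr_bits(): clear the field, then set it. */
static void set_msr_bits(uint64_t *msr, uint64_t value, uint64_t mask,
			 unsigned int pos)
{
	assert(pos < 64);
	*msr &= ~(mask << pos);
	*msr |= (value & mask) << pos;
}

/*
 * Build the value handed back to the guest: result 0 for a non-SIPI wakeup,
 * non-zero once a SIPI has been delivered.
 */
static uint64_t ap_reset_hold_response(uint64_t msr, int sipi_delivered)
{
	set_msr_bits(&msr, sipi_delivered ? 1 : 0,
		     GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
		     GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
	set_msr_bits(&msr, GHCB_MSR_AP_RESET_HOLD_RESP,
		     GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
	return msr;
}
```

Under these assumptions a request value of 0x006 becomes 0x007 when no SIPI
was delivered and 0x1007 when one was, which is why the handler presets the
result to zero and only sets it in sev_vcpu_deliver_sipi_vector().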

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b8357d6ecd47..e15548d88f2a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
/* AP Reset Hold */
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 609471204c6e..a1318236acd2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -56,6 +56,10 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2511,6 +2515,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)

void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;

@@ -2723,6 +2730,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -2823,6 +2846,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -2966,13 +2990,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}

- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);

- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bb9ec9139af3..9f7eb1f18893 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -186,6 +186,7 @@ struct vcpu_sev_es_state {
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;

/* SEV-ES scratch area support */
void *ghcb_sa;
--
2.25.1

2022-06-20 23:11:48

by Kalra, Ashish

Subject: [PATCH Part2 v6 17/49] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

From: Brijesh Singh <[email protected]>

The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.

Version 2 of the GHCB specification adds an NAE (SNP extended guest
request) that a guest can use to query the reports that include additional
certificates.

In both cases, userspace-provided data is included in the attestation
reports. Userspace uses the SNP_SET_EXT_CONFIG command to supply the
certificate blob and the reported TCB version string at once. Note that the
specification defines the certificate blob with a specific GUID format;
userspace is responsible for building the proper certificate blob. The ioctl
treats it as an opaque blob.

Although it is not defined in the spec, add an SNP_GET_EXT_CONFIG command
that can be used to obtain the data programmed through
SNP_SET_EXT_CONFIG.
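
For illustration only, the length checks that the two ioctls apply can be
sketched in standalone C. The struct mirrors sev_user_data_ext_snp_config
from this patch; the function names and the 4 KiB SKETCH_PAGE_SIZE are
assumptions made for the example, not kernel symbols.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096U	/* assumption: 4 KiB pages */

/* Mirrors struct sev_user_data_ext_snp_config from this patch. */
struct ext_snp_config {
	uint64_t config_address;
	uint64_t certs_address;
	uint32_t certs_len;
};

/*
 * SET side: a certificate blob, when present, must have a non-zero,
 * page-aligned length. A zero certs_address means "drop the cached blob".
 */
static int check_set_ext_config(const struct ext_snp_config *in)
{
	if (in->certs_address) {
		if (!in->certs_len || (in->certs_len % SKETCH_PAGE_SIZE))
			return -EINVAL;
	}
	return 0;
}

/*
 * GET side: if the caller's buffer is smaller than the cached blob, report
 * the required length back and fail with -ENOSR so the caller can retry.
 */
static int check_get_certs_len(struct ext_snp_config *in, uint32_t cached_len)
{
	if (in->certs_address && cached_len && in->certs_len < cached_len) {
		in->certs_len = cached_len;
		return -ENOSR;
	}
	return 0;
}
```

A caller that does not know the blob size can therefore issue
SNP_GET_EXT_CONFIG with a small certs_len, read the required length from the
returned structure, and retry with a larger buffer.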

Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 27 +++++++
drivers/crypto/ccp/sev-dev.c | 115 +++++++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.h | 3 +
include/uapi/linux/psp-sev.h | 17 ++++
4 files changed, 162 insertions(+)

diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 11ea67c944df..3014de47e4ce 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
status includes API major, minor version and more. See the SEV-SNP
specification for further details.

+2.5 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
+reported TCB version in the attestation report. The command is similar to
+SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
+command also accepts an additional certificate blob defined in the GHCB
+specification.
+
+If the certs_address is zero, then the previous certificate blob will be deleted.
+For more information on the certificate blob layout, see the GHCB spec
+(extended guest request message).
+
+2.6 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
+through the SNP_SET_EXT_CONFIG.
+
3. SEV-SNP CPUID Enforcement
============================

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index b9b6fab31a82..97b479d5aa86 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1312,6 +1312,10 @@ static int __sev_snp_shutdown_locked(int *error)
if (!sev->snp_inited)
return 0;

+ /* Free the memory used for caching the certificate data */
+ kfree(sev->snp_certs_data);
+ sev->snp_certs_data = NULL;
+
/* SHUTDOWN requires the DF_FLUSH */
wbinvd_on_all_cpus();
__sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
@@ -1616,6 +1620,111 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
return ret;
}

+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ int ret;
+
+ if (!sev->snp_inited || !argp->data)
+ return -EINVAL;
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the TCB version programmed through the SET_CONFIG to userspace */
+ if (input.config_address) {
+ if (copy_to_user((void __user *)input.config_address,
+ &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+ return -EFAULT;
+ }
+
+ /* Copy the extended certs programmed through the SNP_SET_CONFIG */
+ if (input.certs_address && sev->snp_certs_data) {
+ if (input.certs_len < sev->snp_certs_len) {
+ /* Return the certs length to userspace */
+ input.certs_len = sev->snp_certs_len;
+
+ ret = -ENOSR;
+ goto e_done;
+ }
+
+ if (copy_to_user((void __user *)input.certs_address,
+ sev->snp_certs_data, sev->snp_certs_len))
+ return -EFAULT;
+ }
+
+ ret = 0;
+
+e_done:
+ if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_user_data_snp_config config;
+ void *certs = NULL;
+ int ret = 0;
+
+ if (!sev->snp_inited || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the certs from userspace */
+ if (input.certs_address) {
+ if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+ return -EINVAL;
+
+ certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+ if (IS_ERR(certs))
+ return PTR_ERR(certs);
+ }
+
+ /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+ if (input.config_address) {
+ if (copy_from_user(&config,
+ (void __user *)input.config_address, sizeof(config))) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+ if (ret)
+ goto e_free;
+
+ memcpy(&sev->snp_config, &config, sizeof(config));
+ }
+
+ /*
+ * If the new certs are passed then cache it else free the old certs.
+ */
+ if (certs) {
+ kfree(sev->snp_certs_data);
+ sev->snp_certs_data = certs;
+ sev->snp_certs_len = input.certs_len;
+ } else {
+ kfree(sev->snp_certs_data);
+ sev->snp_certs_data = NULL;
+ sev->snp_certs_len = 0;
+ }
+
+ return 0;
+
+e_free:
+ kfree(certs);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1670,6 +1779,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_PLATFORM_STATUS:
ret = sev_ioctl_snp_platform_status(&input);
break;
+ case SNP_SET_EXT_CONFIG:
+ ret = sev_ioctl_snp_set_config(&input, writable);
+ break;
+ case SNP_GET_EXT_CONFIG:
+ ret = sev_ioctl_snp_get_config(&input);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index fe5d7a3ebace..d2fe1706311a 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,9 @@ struct sev_device {

bool snp_inited;
struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+ void *snp_certs_data;
+ u32 snp_certs_len;
+ struct sev_user_data_snp_config snp_config;
};

int sev_dev_init(struct psp_device *psp);
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index ffd60e8b0a31..60e7a8d1a18e 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
SNP_PLATFORM_STATUS,
+ SNP_SET_EXT_CONFIG,
+ SNP_GET_EXT_CONFIG,

SEV_MAX,
};
@@ -190,6 +192,21 @@ struct sev_user_data_snp_config {
__u8 rsvd[52];
} __packed;

+/**
+ * struct sev_user_data_ext_snp_config - system wide configuration value for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ * reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ * 0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certs
+ */
+struct sev_user_data_ext_snp_config {
+ __u64 config_address; /* In */
+ __u64 certs_address; /* In */
+ __u32 certs_len; /* In */
+};
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1

2022-06-20 23:15:01

by Kalra, Ashish

Subject: [PATCH Part2 v6 20/49] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
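
As a rough sketch (not part of the patch), the MSR protocol form of the
feature advertisement can be modelled in standalone C. The constants are
taken from the sev-common.h hunk below; hv_ft_encode_resp() and
hv_ft_decode_resp() are hypothetical helper names, and the 12-bit info mask
is an assumption.

```c
#include <stdint.h>

#define GHCB_MSR_INFO_MASK  0xfffULL		/* assumption: GHCBData[11:0] */
#define GHCB_MSR_HV_FT_RESP 0x081ULL
#define GHCB_MSR_HV_FT_POS  12
#define GHCB_MSR_HV_FT_MASK ((1ULL << 52) - 1)	/* GENMASK_ULL(51, 0) */

/* Hypervisor side: place the feature bitmap in GHCBData[63:12]. */
static uint64_t hv_ft_encode_resp(uint64_t features)
{
	return ((features & GHCB_MSR_HV_FT_MASK) << GHCB_MSR_HV_FT_POS) |
	       GHCB_MSR_HV_FT_RESP;
}

/*
 * Guest side: verify the response code, then extract the bitmap. The shift
 * matches what GHCB_MSR_HV_FT_RESP_VAL() computes.
 */
static int hv_ft_decode_resp(uint64_t msr, uint64_t *features)
{
	if ((msr & GHCB_MSR_INFO_MASK) != GHCB_MSR_HV_FT_RESP)
		return -1;

	*features = (msr >> GHCB_MSR_HV_FT_POS) & GHCB_MSR_HV_FT_MASK;
	return 0;
}
```

With GHCB_HV_FT_SUPPORTED defined as 0 in this patch, the encoded response
is simply 0x081; later patches are expected to set feature bits as support
is added.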

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 14 ++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++-
3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index e15548d88f2a..539de6b93420 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a1318236acd2..b49c370d5ae9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2480,6 +2480,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2746,6 +2747,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -2871,6 +2879,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES: {
+ ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9f7eb1f18893..1f4a8bd09c9e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -629,9 +629,10 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);

/* sev.c */

-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

+#define GHCB_HV_FT_SUPPORTED 0

extern unsigned int max_sev_asid;

--
2.25.1

2022-06-20 23:15:04

by Kalra, Ashish

Subject: [PATCH Part2 v6 22/49] KVM: SVM: Add initial SEV-SNP support

From: Brijesh Singh <[email protected]>

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware based security protection. SEV-SNP adds strong memory encryption
integrity protection to help prevent malicious hypervisor-based attacks
such as data replay, memory re-mapping, and more, to create an isolated
execution environment.

The SNP feature is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/svm/sev.c | 10 +++++++++-
arch/x86/kvm/svm/svm.h | 8 ++++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 93365996bd59..dc1f69a28aa7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -56,6 +56,9 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
#define AP_RESET_HOLD_NONE 0
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
@@ -2120,6 +2123,7 @@ void __init sev_hardware_setup(void)
{
#ifdef CONFIG_KVM_AMD_SEV
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;

@@ -2190,12 +2194,16 @@ void __init sev_hardware_setup(void)
if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
goto out;

- pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+ pr_info("SEV-ES %ssupported: %u ASIDs\n",
+ sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);

out:
sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+ sev_snp_enabled = sev_snp_supported;
#endif
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9672e25a338d..edecc5066517 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -75,6 +75,7 @@ enum {
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
+ bool snp_active; /* SEV-SNP enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -314,6 +315,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}

+static inline bool sev_snp_guest(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_es_guest(kvm) && sev->snp_active;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1

2022-06-20 23:15:04

by Kalra, Ashish

Subject: [PATCH Part2 v6 24/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

From: Brijesh Singh <[email protected]>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.
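
For illustration only, the userspace view of the launch parameters can be
sketched as below. The struct mirrors kvm_sev_snp_launch_start as added to
include/uapi/linux/kvm.h by this patch; the snp_launch_start_sketch name, the
init helper and the policy value passed to it are hypothetical. Note that in
this version of the patch snp_launch_start() only forwards policy and gosvw
to the firmware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct kvm_sev_snp_launch_start from this patch. */
struct snp_launch_start_sketch {
	uint64_t policy;	/* guest policy */
	uint64_t ma_uaddr;	/* userspace address of migration agent */
	uint8_t ma_en;		/* 1 if the migration agent is enabled */
	uint8_t imi_en;
	uint8_t gosvw[16];	/* guest OS visible workarounds */
	uint8_t pad[6];		/* explicit tail padding */
};

/* The explicit pad keeps the layout identical for 32-bit and 64-bit callers. */
_Static_assert(sizeof(struct snp_launch_start_sketch) == 40,
	       "unexpected launch start size");
_Static_assert(offsetof(struct snp_launch_start_sketch, gosvw) == 18,
	       "unexpected gosvw offset");

/* Zero everything first so reserved and padding bytes are well defined. */
static void snp_launch_start_init(struct snp_launch_start_sketch *p,
				  uint64_t policy)
{
	memset(p, 0, sizeof(*p));
	p->policy = policy;
}
```

The filled structure would then be passed through the data pointer of
struct kvm_sev_cmd with id set to KVM_SEV_SNP_LAUNCH_START.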

Signed-off-by: Brijesh Singh <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 ++++
arch/x86/kvm/svm/sev.c | 115 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
include/uapi/linux/kvm.h | 10 ++
4 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 903023f524af..878711f2dca6 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -462,6 +462,30 @@ The flags bitmap is defined as::
If the specified flags is not supported then return -EOPNOTSUPP, and the supported
flags are returned.

+19. KVM_SEV_SNP_LAUNCH_START
+----------------------------
+
+The KVM_SEV_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, migration agent (if any) and guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u64 ma_uaddr; /* userspace address of migration agent */
+ __u8 ma_en; /* 1 if the migration agent is enabled */
+ __u8 imi_en; /* set IMI to 1. */
+ __u8 gosvw[16]; /* guest OS visible workarounds */
+ };
+
+See the SEV-SNP specification for further detail on the launch input.
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 813bda7f7b55..9e6fc7a94ed7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -21,6 +21,7 @@
#include <asm/pkru.h>
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
+#include <asm/sev.h>

#include "x86.h"
#include "svm.h"
@@ -73,6 +74,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;

+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -98,12 +101,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
down_write(&sev_deactivate_lock);

wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = snp_guest_df_flush(&error);
+ else
+ ret = sev_guest_df_flush(&error);

up_write(&sev_deactivate_lock);

if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);

return ret;
}
@@ -1825,6 +1833,74 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_gctx_create data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.gctx_paddr = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -1915,6 +1991,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2106,6 +2185,28 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_decommission data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.gctx_paddr = __sme_pa(sev->snp_context);
+ ret = snp_guest_decommission(&data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context"))
+ return ret;
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2147,7 +2248,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}

- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2f45589ee596..71c011af098e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -91,6 +91,7 @@ struct kvm_sev_info {
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
u64 snp_init_flags;
+ void *snp_context; /* SNP guest context page */
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0f912cefc544..0cb119d66ae5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1812,6 +1812,7 @@ enum sev_cmd_id {

/* SNP specific commands */
KVM_SEV_SNP_INIT,
+ KVM_SEV_SNP_LAUNCH_START,

KVM_SEV_NR_MAX,
};
@@ -1919,6 +1920,15 @@ struct kvm_snp_init {
__u64 flags;
};

+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u64 ma_uaddr;
+ __u8 ma_en;
+ __u8 imi_en;
+ __u8 gosvw[16];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-06-20 23:15:04

by Kalra, Ashish

Subject: [PATCH Part2 v6 26/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

From: Brijesh Singh <[email protected]>

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages into the
guest's memory: the secrets page and the CPUID page.

While terminating the guest, reclaim the guest pages added in the RMP
table. If the reclaim fails, then the pages are no longer safe to be
released back to the system, so leak them.

For more information see the SEV-SNP specification.
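
The reclaim-or-leak policy described above can be modelled with a small
standalone simulation. This is a sketch under simplifying assumptions, not
the kernel code: the page states, struct launch_sim and sim_launch_update()
are hypothetical, and the firmware command and RMP updates are replaced by
flags so that the unwind order is easy to follow.

```c
#include <stddef.h>

enum page_state { PAGE_SHARED, PAGE_PRIVATE, PAGE_LEAKED };

/* Hypothetical model of one launch update request. */
struct launch_sim {
	enum page_state *state;	/* one entry per pinned page */
	size_t npages;
	size_t fail_update_at;	/* page index whose update fails; npages for none */
	int reclaim_fails;	/* non-zero: returning a page to shared fails */
};

static int sim_launch_update(struct launch_sim *s)
{
	size_t i, n = 0;
	int ret = 0;

	for (i = 0; i < s->npages; i++) {
		/* Stand-in for rmp_make_private(): the guest now owns the page. */
		s->state[i] = PAGE_PRIVATE;
		n++;

		/* Stand-in for a failing SNP_LAUNCH_UPDATE firmware command. */
		if (i == s->fail_update_at) {
			ret = -1;
			break;
		}
	}

	/*
	 * On error, hand every page touched so far back to the host. A page
	 * that cannot be made shared again is leaked rather than freed.
	 */
	if (ret) {
		for (i = 0; i < n; i++)
			s->state[i] = s->reclaim_fails ? PAGE_LEAKED : PAGE_SHARED;
	}

	return ret;
}
```

The point of the model is that the rollback walks each page that was
converted, using that page's own index, and that a failed conversion back to
shared removes the page from circulation instead of returning it to the
allocator.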

Signed-off-by: Brijesh Singh <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 29 +++
arch/x86/kvm/svm/sev.c | 187 ++++++++++++++++++
include/uapi/linux/kvm.h | 19 ++
3 files changed, 235 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 878711f2dca6..62abd5c1f72b 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -486,6 +486,35 @@ Returns: 0 on success, -negative on error

See the SEV-SNP specification for further detail on the launch input.

+20. KVM_SEV_SNP_LAUNCH_UPDATE
+-----------------------------
+
+The KVM_SEV_SNP_LAUNCH_UPDATE command is used for encrypting a memory region. It
+also calculates a measurement of the memory contents. The measurement is a signature
+of the memory contents that can be sent to the guest owner as an attestation
+that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 start_gfn; /* Guest page number to start from. */
+ __u64 uaddr; /* userspace address of the memory to be encrypted */
+ __u32 len; /* length of memory region */
+ __u8 imi_page; /* 1 if memory is part of the IMI */
+ __u8 page_type; /* page type */
+ __u8 vmpl3_perms; /* VMPL3 permission mask */
+ __u8 vmpl2_perms; /* VMPL2 permission mask */
+ __u8 vmpl1_perms; /* VMPL1 permission mask */
+ };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
+
References
==========

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 41b83aa6b5f4..b5f0707d7ed6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,6 +18,7 @@
#include <linux/processor.h>
#include <linux/trace_events.h>
#include <linux/hugetlb.h>
+#include <linux/sev.h>

#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -233,6 +234,49 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}

+static inline void snp_leak_pages(u64 pfn, enum pg_level level)
+{
+ unsigned int npages = page_level_size(level) >> PAGE_SHIFT;
+
+ WARN(1, "psc failed pfn 0x%llx pages %d (leaking)\n", pfn, npages);
+
+ while (npages) {
+ memory_failure(pfn, 0);
+ dump_rmpentry(pfn);
+ npages--;
+ pfn++;
+ }
+}
+
+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = snp_guest_page_reclaim(&data, &err);
+ if (rc) {
+ /*
+ * If the reclaim failed, then page is no longer safe
+ * to use.
+ */
+ snp_leak_pages(pfn, PG_LEVEL_4K);
+ }
+
+ return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (rc && leak)
+ snp_leak_pages(pfn, level);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -1902,6 +1946,123 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}

+static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct list_head *head = &sev->regions_list;
+ struct enc_region *i;
+
+ lockdep_assert_held(&kvm->lock);
+
+ list_for_each_entry(i, head, list) {
+ u64 start = i->uaddr;
+ u64 end = start + i->size;
+
+ if (start <= hva && end >= (hva + len))
+ return true;
+ }
+
+ return false;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {0};
+ struct kvm_sev_snp_launch_update params;
+ unsigned long npages, pfn, n = 0;
+ int *error = &argp->error;
+ struct page **inpages;
+ int ret, i, level;
+ u64 gfn;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Verify that the specified address range is registered. */
+ if (!is_hva_registered(kvm, params.uaddr, params.len))
+ return -EINVAL;
+
+ /*
+ * The userspace memory is already locked so technically we don't
+ * need to lock it again. Later part of the function needs to know
+ * pfn so call the sev_pin_memory() so that we can get the list of
+ * pages to iterate through.
+ */
+ inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
+ if (!inpages)
+ return -ENOMEM;
+
+ /*
+ * Verify that all the pages are marked shared in the RMP table before
+ * going further. This is to avoid the case where userspace may try
+ * updating the same page twice.
+ */
+ for (i = 0; i < npages; i++) {
+ if (snp_lookup_rmpentry(page_to_pfn(inpages[i]), &level) != 0) {
+ sev_unpin_memory(kvm, inpages, npages);
+ return -EFAULT;
+ }
+ }
+
+ gfn = params.start_gfn;
+ level = PG_LEVEL_4K;
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+
+ for (i = 0; i < npages; i++) {
+ pfn = page_to_pfn(inpages[i]);
+
+ ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, level, sev_get_asid(kvm), true);
+ if (ret) {
+ ret = -EFAULT;
+ goto e_unpin;
+ }
+
+ n++;
+ data.address = __sme_page_pa(inpages[i]);
+ data.page_size = X86_TO_RMP_PG_LEVEL(level);
+ data.page_type = params.page_type;
+ data.vmpl3_perms = params.vmpl3_perms;
+ data.vmpl2_perms = params.vmpl2_perms;
+ data.vmpl1_perms = params.vmpl1_perms;
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
+ if (ret) {
+ /*
+ * If the command failed then need to reclaim the page.
+ */
+ snp_page_reclaim(pfn);
+ goto e_unpin;
+ }
+
+ gfn++;
+ }
+
+e_unpin:
+ /* Content of memory is updated, mark pages dirty */
+ for (i = 0; i < n; i++) {
+ set_page_dirty_lock(inpages[i]);
+ mark_page_accessed(inpages[i]);
+
+ /*
+ * If it's an error, then update the RMP entry to change page
+ * ownership back to the hypervisor.
+ */
+ if (ret)
+ host_rmp_make_shared(page_to_pfn(inpages[i]), level, true);
+ }
+
+ /* Unlock the user pages */
+ sev_unpin_memory(kvm, inpages, npages);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -1995,6 +2156,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2113,6 +2277,29 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
static void __unregister_enc_region_locked(struct kvm *kvm,
struct enc_region *region)
{
+ unsigned long i, pfn;
+ int level;
+
+ /*
+ * The guest memory pages are assigned in the RMP table. Unassign them
+ * before releasing the memory.
+ */
+ if (sev_snp_guest(kvm)) {
+ for (i = 0; i < region->npages; i++) {
+ pfn = page_to_pfn(region->pages[i]);
+
+ if (!snp_lookup_rmpentry(pfn, &level))
+ continue;
+
+ cond_resched();
+
+ if (level > PG_LEVEL_4K)
+ pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ host_rmp_make_shared(pfn, level, true);
+ }
+ }
+
sev_unpin_memory(kvm, region->pages, region->npages);
list_del(&region->list);
kfree(region);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0cb119d66ae5..9b36b07414ea 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1813,6 +1813,7 @@ enum sev_cmd_id {
/* SNP specific commands */
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
+ KVM_SEV_SNP_LAUNCH_UPDATE,

KVM_SEV_NR_MAX,
};
@@ -1929,6 +1930,24 @@ struct kvm_sev_snp_launch_start {
__u8 pad[6];
};

+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 start_gfn;
+ __u64 uaddr;
+ __u32 len;
+ __u8 imi_page;
+ __u8 page_type;
+ __u8 vmpl3_perms;
+ __u8 vmpl2_perms;
+ __u8 vmpl1_perms;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-06-20 23:15:05

by Kalra, Ashish

Subject: [PATCH Part2 v6 21/49] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe

From: Brijesh Singh <[email protected]>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" in the RMP after a successful VMRUN. This
is done for _all_ VMs, not just SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a spurious
RMP violation #PF.

The recommended workaround is to not use a hugepage for the VMCB, VMSA or
AVIC backing page. Add a generic allocator that will ensure that the
page returned is not a hugepage (2MB or 1GB) and is safe to be used when
SEV-SNP is enabled.
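
The selection step of such an allocator can be sketched in plain C as below.
This is a minimal userspace illustration, not the kernel code: the sketch_*
names and constants are hypothetical, and it assumes a two-page (order-1)
allocation whose first page frame number is known.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SIZE   (1ULL << 21)	/* 2MB, assumed hugepage size */

/* True if the page frame starts exactly on a 2MB boundary. */
static bool sketch_pfn_is_2mb_aligned(uint64_t pfn)
{
	return ((pfn << SKETCH_PAGE_SHIFT) & (SKETCH_PMD_SIZE - 1)) == 0;
}

/*
 * Given the first pfn of a two-page allocation, return the pfn to keep.
 * The other pfn of the pair is the one the caller would free.
 */
static uint64_t sketch_pick_snp_safe_pfn(uint64_t first_pfn)
{
	return sketch_pfn_is_2mb_aligned(first_pfn) ? first_pfn + 1 : first_pfn;
}
```

Only the first page of the pair can sit on a 2MB boundary, so keeping the
second page in that case always yields a page that is not 2MB aligned.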

Co-developed-by: Marc Orr <[email protected]>
Signed-off-by: Marc Orr <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/lapic.c | 5 ++++-
arch/x86/kvm/svm/sev.c | 35 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 16 ++++++++++++--
arch/x86/kvm/svm/svm.h | 1 +
6 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index da47f60a4650..a66292dae698 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -128,6 +128,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP(alloc_apic_backing_page)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c24a72ddc93b..0205e2944067 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1512,6 +1512,8 @@ struct kvm_x86_ops {
* Returns vCPU specific APICv inhibit reasons
*/
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+ void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66b0eb0bda94..7c7fc6c4a7f9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2506,7 +2506,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)

vcpu->arch.apic = apic;

- apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+ if (kvm_x86_ops.alloc_apic_backing_page)
+ apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+ else
+ apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!apic->regs) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b49c370d5ae9..93365996bd59 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3030,3 +3030,38 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
break;
}
}
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+ unsigned long pfn;
+ struct page *p;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+ /*
+ * Allocate an SNP safe page to work around the SNP erratum where
+ * the CPU will incorrectly signal an RMP violation #PF if a
+ * hugepage (2MB or 1GB) collides with the RMP entry of a VMCB, VMSA
+ * or AVIC backing page. The recommended workaround is to not use a
+ * hugepage.
+ *
+ * Allocate one extra page, use a page which is not 2MB aligned
+ * and free the other.
+ */
+ p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+ if (!p)
+ return NULL;
+
+ split_page(p, 1);
+
+ pfn = page_to_pfn(p);
+ if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
+ pfn++;
+ __free_page(p);
+ } else {
+ __free_page(pfn_to_page(pfn + 1));
+ }
+
+ return pfn_to_page(pfn);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index efc7623d0f90..b4bd64f94d3a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1260,7 +1260,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);

err = -ENOMEM;
- vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb01_page = snp_safe_alloc_page(vcpu);
if (!vmcb01_page)
goto out;

@@ -1269,7 +1269,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
* SEV-ES guests require a separate VMSA page used to contain
* the encrypted register state of the guest.
*/
- vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmsa_page = snp_safe_alloc_page(vcpu);
if (!vmsa_page)
goto error_free_vmcb_page;

@@ -4598,6 +4598,16 @@ static int svm_vm_init(struct kvm *kvm)
return 0;
}

+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+ struct page *page = snp_safe_alloc_page(vcpu);
+
+ if (!page)
+ return NULL;
+
+ return page_address(page);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = "kvm_amd",

@@ -4722,6 +4732,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {

.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+
+ .alloc_apic_backing_page = svm_alloc_apic_backing_page,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1f4a8bd09c9e..9672e25a338d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -659,6 +659,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);

/* vmenter.S */

--
2.25.1

2022-06-20 23:15:06

by Kalra, Ashish

Subject: [PATCH Part2 v6 25/49] KVM: SVM: Disallow registering memory range from HugeTLB for SNP guest

From: Brijesh Singh <[email protected]>

While creating the VM, userspace calls the KVM_MEMORY_ENCRYPT_REG_REGION
ioctl to register the memory regions for the guest. A registered
memory region is typically used as guest RAM. Later, the guest may
issue a page state change (PSC) request that requires splitting
a large page into smaller pages. If the memory is allocated from
HugeTLB, the hypervisor will not be able to split it.

Do not allow registering a memory range backed by HugeTLB until
hypervisor support is added to handle this case.
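
The range walk behind this check can be sketched in userspace C as follows.
This is only an illustration under stated assumptions: struct sketch_vma and
the sketch_* helpers are hypothetical stand-ins for the kernel's VMA lookup,
and the mapping array is assumed to be sorted and non-overlapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA: [start, end) plus a HugeTLB flag. */
struct sketch_vma {
	uint64_t start;
	uint64_t end;
	bool hugetlb;
};

/* Return the first mapping that intersects [start, end), or NULL. */
static const struct sketch_vma *
sketch_find_intersection(const struct sketch_vma *vmas, size_t n,
			 uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		if (vmas[i].start < end && vmas[i].end > start)
			return &vmas[i];
	}
	return NULL;
}

/*
 * Conservative check: report true if a mapping in [addr, addr + size) is
 * HugeTLB backed, or if no mapping intersects the remaining range, so the
 * caller rejects the registration in both cases.
 */
static bool sketch_range_is_hugetlb(const struct sketch_vma *vmas, size_t n,
				    uint64_t addr, uint64_t size)
{
	uint64_t start = addr, end = addr + size;
	const struct sketch_vma *vma;

	do {
		vma = sketch_find_intersection(vmas, n, start, end);
		if (!vma || vma->hugetlb)
			return true;
		/* Continue from the end of the mapping just checked. */
		start = vma->end;
	} while (end > vma->end);

	return false;
}
```

Defaulting to "reject" keeps the failure mode safe: a range that cannot be
fully vetted is treated the same as one known to be HugeTLB backed.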

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/svm/sev.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9e6fc7a94ed7..41b83aa6b5f4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,6 +17,7 @@
#include <linux/misc_cgroup.h>
#include <linux/processor.h>
#include <linux/trace_events.h>
+#include <linux/hugetlb.h>

#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -2007,6 +2008,35 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
return r;
}

+static bool is_range_hugetlb(struct kvm *kvm, struct kvm_enc_region *range)
+{
+ struct vm_area_struct *vma;
+ u64 start, end;
+ bool ret = true;
+
+ start = range->addr;
+ end = start + range->size;
+
+ mmap_read_lock(kvm->mm);
+
+ do {
+ vma = find_vma_intersection(kvm->mm, start, end);
+ if (!vma)
+ goto unlock;
+
+ if (is_vm_hugetlb_page(vma))
+ goto unlock;
+
+ start = vma->vm_end;
+ } while (end > vma->vm_end);
+
+ ret = false;
+
+unlock:
+ mmap_read_unlock(kvm->mm);
+ return ret;
+}
+
int sev_mem_enc_register_region(struct kvm *kvm,
struct kvm_enc_region *range)
{
@@ -2024,6 +2054,13 @@ int sev_mem_enc_register_region(struct kvm *kvm,
if (range->addr > ULONG_MAX || range->size > ULONG_MAX)
return -EINVAL;

+ /*
+ * SEV-SNP does not support the backing pages from the HugeTLB. Verify
+ * that the registered memory range is not from the HugeTLB.
+ */
+ if (sev_snp_guest(kvm) && is_range_hugetlb(kvm, range))
+ return -EINVAL;
+
region = kzalloc(sizeof(*region), GFP_KERNEL_ACCOUNT);
if (!region)
return -ENOMEM;
--
2.25.1

2022-06-20 23:15:05

by Kalra, Ashish

Subject: [PATCH Part2 v6 23/49] KVM: SVM: Add KVM_SNP_INIT command

From: Brijesh Singh <[email protected]>

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating an SEV-SNP guest, the VMM must use this
command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.

The flags value must be zero. It will be extended in future SNP support to
communicate optional features (such as restricted INT injection).
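
The flag handling can be sketched in standalone C as below. This is a
simplified illustration, not the kernel function: the sketch_* names are
hypothetical, the user copy is replaced by a plain pointer, and the supported
mask is assumed to be empty, matching the "must be zero" rule above.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed for illustration: no optional SNP init flags are supported yet. */
#define SKETCH_SNP_SUPPORTED_FLAGS 0x0ULL

/*
 * Validate the requested flags and always report the supported set back
 * through *flags, so a caller that gets -EOPNOTSUPP can see what to retry
 * with.
 */
static int sketch_verify_snp_init_flags(uint64_t *flags)
{
	int ret = 0;

	if (*flags & ~SKETCH_SNP_SUPPORTED_FLAGS)
		ret = -EOPNOTSUPP;

	*flags = SKETCH_SNP_SUPPORTED_FLAGS;
	return ret;
}
```

Writing the supported mask back even on failure lets a VMM probe for
features without a separate query command.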

Co-developed-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Pavan Kumar Paluri <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 27 ++++++++++++
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 44 ++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 4 ++
include/uapi/linux/kvm.h | 13 ++++++
5 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 2d307811978c..903023f524af 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -435,6 +435,33 @@ issued by the hypervisor to make the guest ready for execution.

Returns: 0 on success, -negative on error

+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_snp_init {
+ __u64 flags;
+ };
+
+The flags bitmap is defined as::
+
+ /* enable the restricted injection */
+ #define KVM_SEV_SNP_RESTRICTED_INJET (1<<0)
+
+ /* enable the restricted injection timer */
+ #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1<<1)
+
+If any of the specified flags is not supported, -EOPNOTSUPP is returned and the
+supported flags are written back to the flags field.
+
References
==========

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1b07fba11704..284a8113227e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -263,6 +263,7 @@ enum avic_ipi_failure_cause {
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL

+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index dc1f69a28aa7..813bda7f7b55 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -241,6 +241,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_decommission(handle);
}

+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_snp_init params;
+ int ret = 0;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+ ret = -EOPNOTSUPP;
+
+ params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -254,13 +273,23 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;

sev->active = true;
- sev->es_active = argp->id == KVM_SEV_ES_INIT;
+ sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+ sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
asid = sev_asid_new(sev);
if (asid < 0)
goto e_no_asid;
sev->asid = asid;

- ret = sev_platform_init(&argp->error);
+ if (sev->snp_active) {
+ ret = verify_snp_init_flags(kvm, argp);
+ if (ret)
+ goto e_free;
+
+ ret = sev_snp_init(&argp->error);
+ } else {
+ ret = sev_platform_init(&argp->error);
+ }
+
if (ret)
goto e_free;

@@ -275,6 +304,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
sev_asid_free(sev);
sev->asid = 0;
e_no_asid:
+ sev->snp_active = false;
sev->es_active = false;
sev->active = false;
return ret;
@@ -610,6 +640,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;

+ /* Enable the SEV-SNP feature */
+ if (sev_snp_guest(svm->vcpu.kvm))
+ save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
return 0;
}

@@ -1815,6 +1849,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
}

switch (sev_cmd.id) {
+ case KVM_SEV_SNP_INIT:
+ if (!sev_snp_enabled) {
+ r = -ENOTTY;
+ goto out;
+ }
+ fallthrough;
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index edecc5066517..2f45589ee596 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -72,6 +72,9 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))

+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS 0x0
+
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
@@ -87,6 +90,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ u64 snp_init_flags;
};

struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 68ce07185f03..0f912cefc544 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1810,6 +1810,9 @@ enum sev_cmd_id {
/* Guest Migration Extension */
KVM_SEV_SEND_CANCEL,

+ /* SNP specific commands */
+ KVM_SEV_SNP_INIT,
+
KVM_SEV_NR_MAX,
};

@@ -1906,6 +1909,16 @@ struct kvm_sev_receive_update_data {
__u32 trans_len;
};

+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1 << 1)
+
+struct kvm_snp_init {
+ __u64 flags;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-06-20 23:15:25

by Kalra, Ashish

Subject: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

From: Brijesh Singh <[email protected]>

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest
and stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

If it's an SNP guest, then the VMSA was added to the RMP table as
a guest owned page and also removed from the kernel direct map,
so flush it later, after it is transitioned back to hypervisor
state and restored in the direct map.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 22 ++++
arch/x86/kvm/svm/sev.c | 119 ++++++++++++++++++
include/uapi/linux/kvm.h | 14 +++
3 files changed, 155 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 62abd5c1f72b..750162cff87b 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -514,6 +514,28 @@ Returns: 0 on success, -negative on error
See the SEV-SNP spec for further details on how to build the VMPL permission
mask and page type.

+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
+issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[32];
+ };
+
+
+See SEV-SNP specification for further details on launch finish input parameters.

References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a9461d352eda..a5b90469683f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2095,6 +2095,106 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}

+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ int i, ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+ for (i = 0; i < kvm->created_vcpus; i++) {
+ struct vcpu_svm *svm = to_svm(xa_load(&kvm->vcpu_array, i));
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ snp_page_reclaim(pfn);
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ struct kvm_sev_snp_launch_finish params;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before we finalize the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+ }
+
+ if (params.auth_key_en) {
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->auth_key_en = 1;
+ data->id_auth_paddr = __sme_pa(id_auth);
+ }
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2191,6 +2291,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2696,11 +2799,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)

svm = to_svm(vcpu);

+ /*
+ * If it's an SNP guest, then the VMSA was added to the RMP table as
+ * a guest owned page. Transition the page to hypervisor state
+ * before releasing it back to the system.
+ * The page is also removed from the kernel direct map, so flush it
+ * later, after it is transitioned back to hypervisor state and
+ * restored in the direct map.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);

__free_page(virt_to_page(svm->sev_es.vmsa));

+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9b36b07414ea..5a4662716b6a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1814,6 +1814,7 @@ enum sev_cmd_id {
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,

KVM_SEV_NR_MAX,
};
@@ -1948,6 +1949,19 @@ struct kvm_sev_snp_launch_update {
__u8 vmpl1_perms;
};

+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1

2022-06-20 23:15:32

by Kalra, Ashish

Subject: [PATCH Part2 v6 30/49] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP

From: Sean Christopherson <[email protected]>

Introduce a helper to directly (pun intended) fault-in a TDP page
without having to go through the full page fault path. This allows
TDX to get the resulting pfn and also allows the RET_PF_* enums to
stay in mmu.c where they belong.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 51 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e6cae6f22683..c99b15e97a0a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -204,6 +204,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
return vcpu->arch.mmu->page_fault(vcpu, &fault);
}

+kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
+ u32 error_code, int max_level);
+
/*
* Check if a given access (described through the I/D, W/R and U/S bits of a
* page fault error code pfec) causes a permission fault with the given PTE
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 997318ecebd1..569021af349a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4100,6 +4100,57 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return direct_page_fault(vcpu, fault);
}

+kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
+ u32 err, int max_level)
+{
+ struct kvm_page_fault fault = {
+ .addr = gpa,
+ .error_code = err,
+ .exec = err & PFERR_FETCH_MASK,
+ .write = err & PFERR_WRITE_MASK,
+ .present = err & PFERR_PRESENT_MASK,
+ .rsvd = err & PFERR_RSVD_MASK,
+ .user = err & PFERR_USER_MASK,
+ .prefetch = false,
+ .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
+
+ .max_level = max_level,
+ .req_level = PG_LEVEL_4K,
+ .goal_level = PG_LEVEL_4K,
+ };
+ int r;
+
+ if (mmu_topup_memory_caches(vcpu, false))
+ return KVM_PFN_ERR_FAULT;
+
+ /*
+ * Loop on the page fault path to handle the case where an mmu_notifier
+ * invalidation triggers RET_PF_RETRY. In the normal page fault path,
+ * KVM needs to resume the guest in case the invalidation changed any
+ * of the page fault properties, i.e. the gpa or error code. For this
+ * path, the gpa and error code are fixed by the caller, and the caller
+ * expects failure if and only if the page fault can't be fixed.
+ */
+ do {
+ /*
+ * TODO: this should probably go through kvm_mmu_do_page_fault(),
+ * but we need a way to control the max_level, so maybe a direct
+ * call to kvm_tdp_page_fault, which will call into
+ * direct_page_fault() when appropriate.
+ */
+ //r = direct_page_fault(vcpu, &fault);
+#ifdef CONFIG_RETPOLINE
+ if (fault.is_tdp)
+ r = kvm_tdp_page_fault(vcpu, &fault);
+ else
+#endif
+ r = vcpu->arch.mmu->page_fault(vcpu, &fault);
+ } while (r == RET_PF_RETRY && !is_error_noslot_pfn(fault.pfn));
+ return fault.pfn;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);
+
static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
--
2.25.1

2022-06-20 23:15:38

by Kalra, Ashish

Subject: [PATCH Part2 v6 31/49] KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use

From: Brijesh Singh <[email protected]>

SEV-SNP VMs may call the page state change VMGEXIT to add a GPA
as private or shared in the RMP table. The page state change VMGEXIT
contains the RMP page level to be used in the RMP entry. If the
page levels in the TDP and the RMP do not match, the result is a
nested page fault (RMP violation).

The SEV-SNP VMGEXIT handler will use kvm_mmu_get_tdp_walk() to get
the current page level in the TDP for the given GPA and calculate a
workable page level. If a GPA is mapped as a 4K page in the TDP, but
the guest requested to add the GPA as a 2M page in the RMP entry, then
the 2M request will be broken into 4K pages to keep the RMP and TDP
page levels in sync.

TDP SPTEs are RCU protected, so kvm_mmu_get_tdp_walk() needs to run in an
RCU read-side critical section, entered with
walk_shadow_page_lockless_begin() and left with
walk_shadow_page_lockless_end(). This fixes the
"suspicious RCU usage" message seen with lockdep kernel builds.
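
The level arithmetic described above can be sketched in standalone C as
below. The sketch_* names are hypothetical, levels are assumed to follow the
x86 convention (1 = 4K, 2 = 2M, 3 = 1G), and each level is assumed to cover
512 entries of the level below.

```c
#include <stdint.h>

#define SKETCH_LEVEL_4K 1
#define SKETCH_LEVEL_2M 2

/* Number of 4K pages covered by one mapping at the given level. */
static uint64_t sketch_pages_per_level(int level)
{
	return 1ULL << ((level - 1) * 9);
}

/* Use the smaller of the level the guest asked for and the TDP level. */
static int sketch_workable_level(int requested_level, int tdp_level)
{
	return requested_level < tdp_level ? requested_level : tdp_level;
}

/* pfn backing one gfn, given the base pfn of the mapping that covers it. */
static uint64_t sketch_pfn_for_gfn(uint64_t base_pfn, uint64_t gfn, int level)
{
	return base_pfn | (gfn & (sketch_pages_per_level(level) - 1));
}
```

For example, a 2M request against a GPA that the TDP maps at 4K gives a
workable level of 4K, so the request would be handled one 4K page at a time.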

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/kvm/mmu.h | 2 ++
arch/x86/kvm/mmu/mmu.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index c99b15e97a0a..d55b5166389a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -178,6 +178,8 @@ static inline bool is_nx_huge_page_enabled(void)
return READ_ONCE(nx_huge_pages);
}

+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level);
+
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefetch)
{
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 569021af349a..c1ac486e096e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4151,6 +4151,39 @@ kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
}
EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);

+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level)
+{
+ u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
+ int leaf, root;
+
+ walk_shadow_page_lockless_begin(vcpu);
+
+ if (is_tdp_mmu(vcpu->arch.mmu))
+ leaf = kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, &root);
+ else
+ leaf = get_walk(vcpu, gpa, sptes, &root);
+
+ walk_shadow_page_lockless_end(vcpu);
+
+ if (unlikely(leaf < 0))
+ return false;
+
+ /* Check if the leaf SPTE is present */
+ if (!is_shadow_present_pte(sptes[leaf]))
+ return false;
+
+ *pfn = spte_to_pfn(sptes[leaf]);
+ if (leaf > PG_LEVEL_4K) {
+ u64 page_mask = KVM_PAGES_PER_HPAGE(leaf) - 1;
+ *pfn |= (gpa_to_gfn(gpa) & page_mask);
+ }
+
+ *level = leaf;
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_tdp_walk);
+
static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
--
2.25.1

2022-06-20 23:15:42

by Kalra, Ashish

Subject: [PATCH Part2 v6 32/49] KVM: x86: Define RMP page fault error bits for #NPF

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether the hypervisor or a VM
performs the access. When hardware encounters an RMP access violation
during a guest access, it will cause a #VMEXIT(NPF).

See APM2 section 16.36.10 for more details.
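
As a rough illustration of how these error bits might be consumed, the
standalone C sketch below decodes a 64-bit #NPF error code. The bit
positions mirror the definitions added by this patch; the sketch_* names and
the "size mismatch" helper are hypothetical and not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in this patch. */
#define SKETCH_NPF_RMP_BIT   31
#define SKETCH_NPF_ENC_BIT   34
#define SKETCH_NPF_SIZEM_BIT 35
#define SKETCH_NPF_VMPL_BIT  36

/* Test a single bit; the error code must be 64 bits wide for bits >= 32. */
static bool sketch_npf_bit(uint64_t error_code, unsigned int bit)
{
	return (error_code >> bit) & 1;
}

/* True if the fault was caused by an RMP check. */
static bool sketch_npf_is_rmp_fault(uint64_t error_code)
{
	return sketch_npf_bit(error_code, SKETCH_NPF_RMP_BIT);
}

/* True if an RMP fault was caused by a page-size mismatch. */
static bool sketch_npf_is_rmp_size_mismatch(uint64_t error_code)
{
	return sketch_npf_is_rmp_fault(error_code) &&
	       sketch_npf_bit(error_code, SKETCH_NPF_SIZEM_BIT);
}
```

A handler could use the size-mismatch case to decide that splitting the
large page is the right way to resolve the fault.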

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2748c69609e3..49b217dc8d7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -247,9 +247,13 @@ enum x86_intercept_stage;
#define PFERR_FETCH_BIT 4
#define PFERR_PK_BIT 5
#define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33
#define PFERR_IMPLICIT_ACCESS_BIT 48
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36

#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
@@ -261,6 +265,10 @@ enum x86_intercept_stage;
#define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS (1ULL << PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_GUEST_RMP_MASK (1ULL << PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK (1ULL << PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK (1ULL << PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK (1ULL << PFERR_GUEST_VMPL_BIT)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_WRITE_MASK | \
--
2.25.1

2022-06-20 23:16:02

by Kalra, Ashish

Subject: [PATCH Part2 v6 27/49] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled, the guest private pages are added to the RMP
table; while adding the pages, rmp_make_private() unmaps them from the
direct map. If KSM attempts to access those unmapped pages, it will
trigger a #PF (page-not-present).

Encrypted guest pages cannot be shared between processes, so userspace
should not mark the region mergeable. To be safe, mark the process vma
unmergeable before adding the pages to the RMP table.
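
The range walk can be modelled in user space roughly as follows. This is only
a sketch: struct range is a hypothetical stand-in for a vma, the array is
assumed to be sorted by start address, and find_intersection() merely mimics
the role find_vma_intersection() plays in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vma: half-open [start, end) plus a flag. */
struct range {
	uint64_t start, end;
	int unmergeable;
};

/* Return the first range that intersects [start, end), or NULL. */
static struct range *find_intersection(struct range *r, size_t n,
				       uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].start < end && start < r[i].end)
			return &r[i];
	return NULL;
}

/* Flag every range overlapping [start, start + size); -1 if none is found. */
static int mark_unmergeable(struct range *r, size_t n,
			    uint64_t start, uint64_t size)
{
	uint64_t end = start + size;
	struct range *cur;

	do {
		cur = find_intersection(r, n, start, end);
		if (!cur)
			return -1;
		cur->unmergeable = 1;
		/* Continue from the end of the range just handled. */
		start = cur->end;
	} while (end > cur->end);

	return 0;
}
```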

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b5f0707d7ed6..a9461d352eda 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,11 +19,13 @@
#include <linux/trace_events.h>
#include <linux/hugetlb.h>
#include <linux/sev.h>
+#include <linux/ksm.h>

#include <asm/pkru.h>
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
#include <asm/sev.h>
+#include <asm/mman.h>

#include "x86.h"
#include "svm.h"
@@ -1965,6 +1967,30 @@ static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
return false;
}

+static int snp_mark_unmergable(struct kvm *kvm, u64 start, u64 size)
+{
+ struct vm_area_struct *vma;
+ u64 end = start + size;
+ int ret;
+
+ do {
+ vma = find_vma_intersection(kvm->mm, start, end);
+ if (!vma) {
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+ MADV_UNMERGEABLE, &vma->vm_flags);
+ if (ret)
+ break;
+
+ start = vma->vm_end;
+ } while (end > vma->vm_end);
+
+ return ret;
+}
+
static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -1989,6 +2015,12 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (!is_hva_registered(kvm, params.uaddr, params.len))
return -EINVAL;

+ mmap_write_lock(kvm->mm);
+ ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
+ mmap_write_unlock(kvm->mm);
+ if (ret)
+ return -EFAULT;
+
/*
* The userspace memory is already locked so technically we don't
* need to lock it again. Later part of the function needs to know
--
2.25.1

2022-06-20 23:16:21

by Kalra, Ashish

Subject: [PATCH Part2 v6 29/49] KVM: X86: Keep the NPT and RMP page level in sync

From: Brijesh Singh <[email protected]>

When running an SEV-SNP VM, the sPA used to index the RMP entry is
obtained through the NPT translation (gva->gpa->spa). The NPT page
level is checked against the page level programmed in the RMP entry.
If the page level does not match, then it will cause a nested page
fault with the RMP bit set to indicate the RMP violation.
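
The level selection can be sketched in user space as below. The enum values
are assumed to mirror the kernel's enum pg_level, and adjust_npt_level() is a
hypothetical name; the sketch leaves out the RMP lookup and the shortcut the
patch takes when a whole 2MB range is shared.

```c
/* Assumed to mirror the kernel's enum pg_level. */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Hypothetical helper: pick an NPT level that is compatible with the RMP. */
static int adjust_npt_level(int npt_level, int rmp_level)
{
	/*
	 * Per the comment in the patch, 1GB NPT mappings are accessed
	 * through 2MB TLB entries, so a 2MB RMP entry is compatible with
	 * a 1GB NPT mapping.
	 */
	if (rmp_level == PG_LEVEL_2M && npt_level == PG_LEVEL_1G)
		return npt_level;

	/* Otherwise never map at a larger size than the RMP entry. */
	return npt_level < rmp_level ? npt_level : rmp_level;
}
```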

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 5 ++++
arch/x86/kvm/svm/sev.c | 46 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 1 +
6 files changed, 55 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index a66292dae698..e0068e702692 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -129,6 +129,7 @@ KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP(alloc_apic_backing_page)
+KVM_X86_OP_OPTIONAL(rmp_page_level_adjust)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0205e2944067..2748c69609e3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1514,6 +1514,7 @@ struct kvm_x86_ops {
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);

void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+ void (*rmp_page_level_adjust)(struct kvm *kvm, kvm_pfn_t pfn, int *level);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c623019929a7..997318ecebd1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -43,6 +43,7 @@
#include <linux/hash.h>
#include <linux/kern_levels.h>
#include <linux/kthread.h>
+#include <linux/sev.h>

#include <asm/page.h>
#include <asm/memtype.h>
@@ -2824,6 +2825,10 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
if (unlikely(!pte))
return PG_LEVEL_4K;

+ /* Adjust the page level based on the SEV-SNP RMP page level. */
+ if (kvm_x86_ops.rmp_page_level_adjust)
+ static_call(kvm_x86_rmp_page_level_adjust)(kvm, pfn, &level);
+
return level;
}

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a5b90469683f..91d3d24e60d2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3597,3 +3597,49 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)

return pfn_to_page(pfn);
}
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ int level;
+
+ while (end > start) {
+ if (snp_lookup_rmpentry(start, &level) != 0)
+ return false;
+ start++;
+ }
+
+ return true;
+}
+
+void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level)
+{
+ int rmp_level, assigned;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return;
+
+ assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+ if (unlikely(assigned < 0))
+ return;
+
+ if (!assigned) {
+ /*
+ * If all the pages are shared then no need to keep the RMP
+ * and NPT in sync.
+ */
+ pfn = pfn & ~(PTRS_PER_PMD - 1);
+ if (is_pfn_range_shared(pfn, pfn + PTRS_PER_PMD))
+ return;
+ }
+
+ /*
+ * The hardware installs 2MB TLB entries to access 1GB pages,
+ * therefore allow NPT to use 1GB pages when pfn was added as 2MB
+ * in the RMP table.
+ */
+ if (rmp_level == PG_LEVEL_2M && (*level == PG_LEVEL_1G))
+ return;
+
+ /* Adjust the level to keep the NPT and RMP in sync */
+ *level = min_t(size_t, *level, rmp_level);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b4bd64f94d3a..18e2cd4d9559 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4734,6 +4734,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,

.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+ .rmp_page_level_adjust = sev_rmp_page_level_adjust,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 71c011af098e..7782312a1cda 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -673,6 +673,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);

/* vmenter.S */

--
2.25.1

2022-06-20 23:16:25

by Kalra, Ashish

Subject: [PATCH Part2 v6 36/49] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, a guest must register the
vCPU GHCB GPA. If the hypervisor can work with the guest-requested GPA, it
must respond with the same GPA; otherwise it must return -1.

On VMGEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
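
A minimal user-space sketch of the MSR-protocol response layout, assuming the
request/response code occupies bits 11:0 and the GFN occupies bits 63:12 as the
masks in this patch suggest. reg_gpa_response() is a hypothetical helper, and
the value used for GHCB_MSR_REG_GPA_RESP is an assumption since that define is
not visible in this hunk.

```c
#include <stdint.h>

#define GHCB_MSR_GPA_VALUE_POS	12
#define GHCB_MSR_REG_GPA_REQ	0x012ULL
#define GHCB_MSR_REG_GPA_RESP	0x013ULL	/* assumed value */
#define GHCB_GPA_FIELD_ALL_ONES	0xfffffffffffffULL	/* 52-bit "-1" */

/* Hypothetical helper: build the response to a GHCB GPA register request. */
static uint64_t reg_gpa_response(uint64_t req_msr, int accept)
{
	/* The requested GFN sits above the 12-bit info field. */
	uint64_t gfn = req_msr >> GHCB_MSR_GPA_VALUE_POS;

	/* Rejecting the request means answering with -1 in the GFN field. */
	if (!accept)
		gfn = GHCB_GPA_FIELD_ALL_ONES;

	return (gfn << GHCB_MSR_GPA_VALUE_POS) | GHCB_MSR_REG_GPA_RESP;
}
```

The handler in this patch always accepts the requested GPA; the reject path is
shown only to illustrate the encoding described above.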

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 539de6b93420..0a9055cdfae2 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)

+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c70f3f7e06a8..6de48130e414 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3331,6 +3331,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PREF_GPA_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -3381,6 +3402,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
return 1;
}

+ /* SEV-SNP guest requires that the GHCB GPA must be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm, &exit_code);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c80352c9c0d6..54ff56cb6125 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -206,6 +206,8 @@ struct vcpu_sev_es_state {
*/
u64 ghcb_sw_exit_info_1;
u64 ghcb_sw_exit_info_2;
+
+ u64 ghcb_registered_gpa;
};

struct vcpu_svm {
@@ -334,6 +336,11 @@ static inline bool sev_snp_guest(struct kvm *kvm)
return sev_es_guest(kvm) && sev->snp_active;
}

+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1

2022-06-20 23:16:30

by Kalra, Ashish

Subject: [PATCH Part2 v6 33/49] KVM: x86: Update page-fault trace to log full 64-bit error code

From: Brijesh Singh <[email protected]>

The #NPF error code is a 64-bit value, but the trace prints only the
lower 32 bits. Some of the fault error code bits (e.g. PFERR_GUEST_FINAL_MASK)
are located in the upper 32 bits.
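
The loss is easy to reproduce in user space. In this sketch,
truncate_error_code() is a hypothetical helper that models passing the nested
page fault error code through the old unsigned int tracepoint argument.

```c
#include <stdint.h>

/* Same definition as in kvm_host.h: bit 32. */
#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)

/* Hypothetical helper: what an unsigned int parameter keeps of the code. */
static unsigned int truncate_error_code(uint64_t error_code)
{
	/* Conversion to a 32-bit type silently drops bits 63:32. */
	return (unsigned int)error_code;
}
```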

Cc: <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/trace.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e3a24b8f04be..9b9bc5468103 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -383,12 +383,12 @@ TRACE_EVENT(kvm_inj_exception,
* Tracepoint for page fault.
*/
TRACE_EVENT(kvm_page_fault,
- TP_PROTO(unsigned long fault_address, unsigned int error_code),
+ TP_PROTO(unsigned long fault_address, u64 error_code),
TP_ARGS(fault_address, error_code),

TP_STRUCT__entry(
__field( unsigned long, fault_address )
- __field( unsigned int, error_code )
+ __field( u64, error_code )
),

TP_fast_assign(
@@ -396,7 +396,7 @@ TRACE_EVENT(kvm_page_fault,
__entry->error_code = error_code;
),

- TP_printk("address %lx error_code %x",
+ TP_printk("address %lx error_code %llx",
__entry->fault_address, __entry->error_code)
);

--
2.25.1

2022-06-20 23:16:31

by Kalra, Ashish

Subject: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

From: Brijesh Singh <[email protected]>

On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
and unmaps it just before VM-entry. This long-lived GHCB map is used by
the VMGEXIT handler through accessors such as ghcb_{set,get}_xxx().

A long-lived GHCB map can cause issues when SEV-SNP is enabled, because
the mapped GPA then needs to be protected against a page state change.

To eliminate the long-lived GHCB mapping, update the GHCB sync operations
to explicitly map the GHCB before access and unmap it after access is
complete. This requires that the setting of the GHCB's sw_exit_info_{1,2}
fields be done during sev_es_sync_to_ghcb(), so create two new fields in
the vcpu_sev_es_state struct (embedded in vcpu_svm) to hold these values
when they need to be set outside of the GHCB mapping.
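
The resulting pattern (cache the return values, then map, copy, and unmap in
one short window) can be modelled in user space as follows. All types and
function names here are hypothetical stand-ins, not the kernel's; a NULL
guest_page models a mapping failure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two GHCB fields of interest. */
struct ghcb_model {
	uint64_t sw_exit_info_1, sw_exit_info_2;
};

/* Hypothetical stand-in for per-vCPU state. */
struct vcpu_model {
	struct ghcb_model *guest_page;	/* NULL models a failed mapping */
	int map_count;			/* number of live mappings */
	uint64_t cached_info_1, cached_info_2;
};

static struct ghcb_model *map_ghcb(struct vcpu_model *v)
{
	if (!v->guest_page)
		return NULL;
	v->map_count++;
	return v->guest_page;
}

static void unmap_ghcb(struct vcpu_model *v)
{
	v->map_count--;
}

/* Handlers only record the values; no mapping is needed at this point. */
static void set_exit_info(struct vcpu_model *v, uint64_t i1, uint64_t i2)
{
	v->cached_info_1 = i1;
	v->cached_info_2 = i2;
}

/* Map, copy the cached values, unmap. Returns 1 on success, 0 on failure. */
static int sync_to_ghcb(struct vcpu_model *v)
{
	struct ghcb_model *ghcb = map_ghcb(v);

	if (!ghcb)
		return 0;

	ghcb->sw_exit_info_1 = v->cached_info_1;
	ghcb->sw_exit_info_2 = v->cached_info_2;
	unmap_ghcb(v);
	return 1;
}
```

The point of the design is that no mapping outlives the function that created
it, so there is no window in which a page state change can race with a held
mapping.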

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
arch/x86/kvm/svm/svm.c | 12 ++--
arch/x86/kvm/svm/svm.h | 24 +++++++-
3 files changed, 111 insertions(+), 56 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 01ea257e17d6..c70f3f7e06a8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2823,15 +2823,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
kvfree(svm->sev_es.ghcb_sa);
}

+static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+{
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ u64 gfn = gpa_to_gfn(control->ghcb_gpa);
+
+ if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
+ /* Unable to map GHCB from guest */
+ pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+{
+ kvm_vcpu_unmap(&svm->vcpu, map, true);
+}
+
static void dump_ghcb(struct vcpu_svm *svm)
{
- struct ghcb *ghcb = svm->sev_es.ghcb;
+ struct kvm_host_map map;
unsigned int nbits;
+ struct ghcb *ghcb;
+
+ if (svm_map_ghcb(svm, &map))
+ return;
+
+ ghcb = map.hva;

/* Re-use the dump_invalid_vmcb module parameter */
if (!dump_invalid_vmcb) {
pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
- return;
+ goto e_unmap;
}

nbits = sizeof(ghcb->save.valid_bitmap) * 8;
@@ -2846,12 +2871,21 @@ static void dump_ghcb(struct vcpu_svm *svm)
pr_err("%-20s%016llx is_valid: %u\n", "sw_scratch",
ghcb->save.sw_scratch, ghcb_sw_scratch_is_valid(ghcb));
pr_err("%-20s%*pb\n", "valid_bitmap", nbits, ghcb->save.valid_bitmap);
+
+e_unmap:
+ svm_unmap_ghcb(svm, &map);
}

-static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
+static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
- struct ghcb *ghcb = svm->sev_es.ghcb;
+ struct kvm_host_map map;
+ struct ghcb *ghcb;
+
+ if (svm_map_ghcb(svm, &map))
+ return false;
+
+ ghcb = map.hva;

/*
* The GHCB protocol so far allows for the following data
@@ -2865,13 +2899,24 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
+
+ /*
+ * Copy the return values from the exit_info_{1,2}.
+ */
+ ghcb_set_sw_exit_info_1(ghcb, svm->sev_es.ghcb_sw_exit_info_1);
+ ghcb_set_sw_exit_info_2(ghcb, svm->sev_es.ghcb_sw_exit_info_2);
+
+ trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, ghcb);
+
+ svm_unmap_ghcb(svm, &map);
+
+ return true;
}

-static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
+static void sev_es_sync_from_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
{
struct vmcb_control_area *control = &svm->vmcb->control;
struct kvm_vcpu *vcpu = &svm->vcpu;
- struct ghcb *ghcb = svm->sev_es.ghcb;
u64 exit_code;

/*
@@ -2915,20 +2960,25 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}

-static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
+static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
{
- struct kvm_vcpu *vcpu;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_host_map map;
struct ghcb *ghcb;
- u64 exit_code;
u64 reason;

- ghcb = svm->sev_es.ghcb;
+ if (svm_map_ghcb(svm, &map))
+ return -EFAULT;
+
+ ghcb = map.hva;
+
+ trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);

/*
* Retrieve the exit code now even though it may not be marked valid
* as it could help with debugging.
*/
- exit_code = ghcb_get_sw_exit_code(ghcb);
+ *exit_code = ghcb_get_sw_exit_code(ghcb);

/* Only GHCB Usage code 0 is supported */
if (ghcb->ghcb_usage) {
@@ -3021,6 +3071,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
}

+ sev_es_sync_from_ghcb(svm, ghcb);
+
+ svm_unmap_ghcb(svm, &map);
return 0;

vmgexit_err:
@@ -3031,10 +3084,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
ghcb->ghcb_usage);
} else if (reason == GHCB_ERR_INVALID_EVENT) {
vcpu_unimpl(vcpu, "vmgexit: exit code %#llx is not valid\n",
- exit_code);
+ *exit_code);
} else {
vcpu_unimpl(vcpu, "vmgexit: exit code %#llx input is not valid\n",
- exit_code);
+ *exit_code);
dump_ghcb(svm);
}

@@ -3044,6 +3097,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
ghcb_set_sw_exit_info_1(ghcb, 2);
ghcb_set_sw_exit_info_2(ghcb, reason);

+ svm_unmap_ghcb(svm, &map);
+
/* Resume the guest to "return" the error code. */
return 1;
}
@@ -3053,23 +3108,20 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
/* Clear any indication that the vCPU is in a type of AP Reset Hold */
svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;

- if (!svm->sev_es.ghcb)
+ if (!svm->sev_es.ghcb_in_use)
return;

/* Sync the scratch buffer area. */
if (svm->sev_es.ghcb_sa_sync) {
kvm_write_guest(svm->vcpu.kvm,
- ghcb_get_sw_scratch(svm->sev_es.ghcb),
+ svm->sev_es.ghcb_sa_gpa,
svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len);
svm->sev_es.ghcb_sa_sync = false;
}

- trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->sev_es.ghcb);
-
sev_es_sync_to_ghcb(svm);

- kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map, true);
- svm->sev_es.ghcb = NULL;
+ svm->sev_es.ghcb_in_use = false;
}

void pre_sev_run(struct vcpu_svm *svm, int cpu)
@@ -3099,7 +3151,6 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
{
struct vmcb_control_area *control = &svm->vmcb->control;
- struct ghcb *ghcb = svm->sev_es.ghcb;
u64 ghcb_scratch_beg, ghcb_scratch_end;
u64 scratch_gpa_beg, scratch_gpa_end;

@@ -3178,8 +3229,8 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
return 0;

e_scratch:
- ghcb_set_sw_exit_info_1(ghcb, 2);
- ghcb_set_sw_exit_info_2(ghcb, GHCB_ERR_INVALID_SCRATCH_AREA);
+ svm_set_ghcb_sw_exit_info_1(&svm->vcpu, 2);
+ svm_set_ghcb_sw_exit_info_2(&svm->vcpu, GHCB_ERR_INVALID_SCRATCH_AREA);

return 1;
}
@@ -3316,7 +3367,6 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb_control_area *control = &svm->vmcb->control;
u64 ghcb_gpa, exit_code;
- struct ghcb *ghcb;
int ret;

/* Validate the GHCB */
@@ -3331,29 +3381,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
return 1;
}

- if (kvm_vcpu_map(vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->sev_es.ghcb_map)) {
- /* Unable to map GHCB from guest */
- vcpu_unimpl(vcpu, "vmgexit: error mapping GHCB [%#llx] from guest\n",
- ghcb_gpa);
-
- /* Without a GHCB, just return right back to the guest */
- return 1;
- }
-
- svm->sev_es.ghcb = svm->sev_es.ghcb_map.hva;
- ghcb = svm->sev_es.ghcb_map.hva;
-
- trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
-
- exit_code = ghcb_get_sw_exit_code(ghcb);
-
- ret = sev_es_validate_vmgexit(svm);
+ ret = sev_es_validate_vmgexit(svm, &exit_code);
if (ret)
return ret;

- sev_es_sync_from_ghcb(svm);
- ghcb_set_sw_exit_info_1(ghcb, 0);
- ghcb_set_sw_exit_info_2(ghcb, 0);
+ svm->sev_es.ghcb_in_use = true;
+
+ svm_set_ghcb_sw_exit_info_1(vcpu, 0);
+ svm_set_ghcb_sw_exit_info_2(vcpu, 0);

switch (exit_code) {
case SVM_VMGEXIT_MMIO_READ:
@@ -3393,20 +3428,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
break;
case 1:
/* Get AP jump table address */
- ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
+ svm_set_ghcb_sw_exit_info_2(vcpu, sev->ap_jump_table);
break;
default:
pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
control->exit_info_1);
- ghcb_set_sw_exit_info_1(ghcb, 2);
- ghcb_set_sw_exit_info_2(ghcb, GHCB_ERR_INVALID_INPUT);
+ svm_set_ghcb_sw_exit_info_1(vcpu, 2);
+ svm_set_ghcb_sw_exit_info_2(vcpu, GHCB_ERR_INVALID_INPUT);
}

ret = 1;
break;
}
case SVM_VMGEXIT_HV_FEATURES: {
- ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+ svm_set_ghcb_sw_exit_info_2(vcpu, GHCB_HV_FT_SUPPORTED);

ret = 1;
break;
@@ -3537,7 +3572,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
* Return from an AP Reset Hold VMGEXIT, where the guest will
* set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
*/
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ svm_set_ghcb_sw_exit_info_2(vcpu, 1);
break;
case AP_RESET_HOLD_MSR_PROTO:
/*
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 18e2cd4d9559..b24e0171cbf2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2720,14 +2720,14 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->sev_es.ghcb))
+ if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->sev_es.ghcb_in_use))
return kvm_complete_insn_gp(vcpu, err);

- ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 1);
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
- X86_TRAP_GP |
- SVM_EVTINJ_TYPE_EXEPT |
- SVM_EVTINJ_VALID);
+ svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+ svm_set_ghcb_sw_exit_info_2(vcpu,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
return 1;
}

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bd0db4d4a61e..c80352c9c0d6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -189,8 +189,7 @@ struct svm_nested_state {
struct vcpu_sev_es_state {
/* SEV-ES support */
struct sev_es_save_area *vmsa;
- struct ghcb *ghcb;
- struct kvm_host_map ghcb_map;
+ bool ghcb_in_use;
bool received_first_sipi;
unsigned int ap_reset_hold_type;

@@ -200,6 +199,13 @@ struct vcpu_sev_es_state {
u64 ghcb_sa_gpa;
u32 ghcb_sa_alloc_len;
bool ghcb_sa_sync;
+
+ /*
+ * SEV-ES support to hold the sw_exit_info return values to be
+ * sync'ed to the GHCB when mapped.
+ */
+ u64 ghcb_sw_exit_info_1;
+ u64 ghcb_sw_exit_info_2;
};

struct vcpu_svm {
@@ -614,6 +620,20 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);

+static inline void svm_set_ghcb_sw_exit_info_1(struct kvm_vcpu *vcpu, u64 val)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->sev_es.ghcb_sw_exit_info_1 = val;
+}
+
+static inline void svm_set_ghcb_sw_exit_info_2(struct kvm_vcpu *vcpu, u64 val)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->sev_es.ghcb_sw_exit_info_2 = val;
+}
+
extern struct kvm_x86_nested_ops svm_nested_ops;

/* avic.c */
--
2.25.1

2022-06-20 23:16:35

by Kalra, Ashish

Subject: [PATCH Part2 v6 38/49] KVM: SVM: Add support to handle Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.
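
The header validation and the error encoding returned in sw_exit_info_2 can be
sketched in user space as below. The constants are the ones this patch adds;
psc_check_hdr() and psc_exit_info_2() are hypothetical names, and the
RMPUPDATE_FAIL_OVERLAP case from the patch is left out because its value is
not shown here.

```c
#include <stdint.h>

#define VMGEXIT_PSC_MAX_ENTRY	253
#define PSC_INVALID_HDR		1
#define PSC_INVALID_ENTRY	2
#define PSC_UNDEF_ERR		3

/* Hypothetical helper: validate the cur_entry/end_entry header indices. */
static int psc_check_hdr(uint16_t cur, uint16_t end)
{
	if (cur >= VMGEXIT_PSC_MAX_ENTRY ||
	    end >= VMGEXIT_PSC_MAX_ENTRY || cur > end)
		return PSC_INVALID_ENTRY;
	return 0;
}

/* Hypothetical helper: encode a result the way the patch's mapping does. */
static uint64_t psc_exit_info_2(int rc)
{
	switch (rc) {
	case 0:
		return 0;
	case PSC_INVALID_HDR:
		return (1ULL << 32) | 1;
	case PSC_INVALID_ENTRY:
		return (1ULL << 32) | 2;
	default:
		/* Anything else is reported as an undefined error. */
		return 4ULL << 32;
	}
}
```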

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 7 +++
arch/x86/kvm/svm/sev.c | 79 +++++++++++++++++++++++++++++--
2 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index ee38f7408470..1b111cde8c82 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -130,6 +130,13 @@ enum psc_op {
/* SNP Page State Change NAE event */
#define VMGEXIT_PSC_MAX_ENTRY 253

+/* The page state change hdr structure is not valid */
+#define PSC_INVALID_HDR 1
+/* The hdr.cur_entry or hdr.end_entry is not valid */
+#define PSC_INVALID_ENTRY 2
+/* Page state change encountered undefined error */
+#define PSC_UNDEF_ERR 3
+
struct psc_hdr {
u16 cur_entry;
u16 end_entry;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 15900c2f30fc..cb2d1bbb862b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3066,6 +3066,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
+ case SVM_VMGEXIT_PSC:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3351,13 +3352,13 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
*/
rc = snp_check_and_build_npt(vcpu, gpa, level);
if (rc)
- return -EINVAL;
+ return PSC_UNDEF_ERR;

if (op == SNP_PAGE_STATE_PRIVATE) {
hva_t hva;

if (snp_gpa_to_hva(kvm, gpa, &hva))
- return -EINVAL;
+ return PSC_UNDEF_ERR;

/*
* Verify that the hva range is registered. This enforcement is
@@ -3369,7 +3370,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
rc = is_hva_registered(kvm, hva, page_level_size(level));
mutex_unlock(&kvm->lock);
if (!rc)
- return -EINVAL;
+ return PSC_UNDEF_ERR;

/*
* Mark the userspace range unmerable before adding the pages
@@ -3379,7 +3380,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
mmap_write_unlock(kvm->mm);
if (rc)
- return -EINVAL;
+ return PSC_UNDEF_ERR;
}

write_lock(&kvm->mmu_lock);
@@ -3410,7 +3411,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
break;
default:
- rc = -EINVAL;
+ rc = PSC_INVALID_ENTRY;
break;
}

@@ -3428,6 +3429,65 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
return 0;
}

+static inline unsigned long map_to_psc_vmgexit_code(int rc)
+{
+ switch (rc) {
+ case PSC_INVALID_HDR:
+ return ((1ul << 32) | 1);
+ case PSC_INVALID_ENTRY:
+ return ((1ul << 32) | 2);
+ case RMPUPDATE_FAIL_OVERLAP:
+ return ((3ul << 32) | 2);
+ default: return (4ul << 32);
+ }
+}
+
+static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ int level, op, rc = PSC_UNDEF_ERR;
+ struct snp_psc_desc *info;
+ struct psc_entry *entry;
+ u16 cur, end;
+ gpa_t gpa;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return PSC_INVALID_HDR;
+
+ if (setup_vmgexit_scratch(svm, true, sizeof(*info))) {
+ pr_err("vmgexit: scratch area is not setup.\n");
+ return PSC_INVALID_HDR;
+ }
+
+ info = (struct snp_psc_desc *)svm->sev_es.ghcb_sa;
+ cur = info->hdr.cur_entry;
+ end = info->hdr.end_entry;
+
+ if (cur >= VMGEXIT_PSC_MAX_ENTRY ||
+ end >= VMGEXIT_PSC_MAX_ENTRY || cur > end)
+ return PSC_INVALID_ENTRY;
+
+ for (; cur <= end; cur++) {
+ entry = &info->entries[cur];
+ gpa = gfn_to_gpa(entry->gfn);
+ level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
+ op = entry->operation;
+
+ if (!IS_ALIGNED(gpa, page_level_size(level))) {
+ rc = PSC_INVALID_ENTRY;
+ goto out;
+ }
+
+ rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
+ if (rc)
+ goto out;
+ }
+
+out:
+ info->hdr.cur_entry = cur;
+ return rc ? map_to_psc_vmgexit_code(rc) : 0;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3670,6 +3730,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_PSC: {
+ unsigned long rc;
+
+ ret = 1;
+
+ rc = snp_handle_page_state_change(svm);
+ svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1

2022-06-20 23:16:57

by Kalra, Ashish

Subject: [PATCH Part2 v6 34/49] KVM: SVM: Do not use long-lived GHCB map while setting scratch area

From: Brijesh Singh <[email protected]>

The setup_vmgexit_scratch() function may rely on a long-lived GHCB
mapping if the GHCB shared buffer area was used for the scratch area.
In preparation for eliminating the long-lived GHCB mapping, always
allocate a buffer for the scratch area so it can be accessed without
the GHCB mapping.
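
The allocation strategy amounts to a grow-only buffer that is reused across
exits. A user-space sketch, with calloc()/free() standing in for
kvzalloc()/kvfree() and with struct scratch and scratch_reserve() as
hypothetical names:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ghcb_sa* fields. */
struct scratch {
	void *buf;
	uint32_t alloc_len;	/* capacity of buf */
	uint32_t len;		/* bytes in use for the current request */
};

/* Ensure buf can hold len bytes; reallocate only when it must grow. */
static int scratch_reserve(struct scratch *s, uint32_t len)
{
	if (s->alloc_len < len) {
		void *p = calloc(1, len);

		if (!p)
			return -1;

		/* Free the old buffer and switch to the larger one. */
		free(s->buf);
		s->buf = p;
		s->alloc_len = len;
	}

	s->len = len;
	return 0;
}
```

Because the buffer is no longer freed after every exit, a single free at vCPU
teardown is enough, which matches the unconditional kvfree() in
sev_free_vcpu().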

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/svm/sev.c | 74 +++++++++++++++++++-----------------------
arch/x86/kvm/svm/svm.h | 3 +-
2 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 91d3d24e60d2..01ea257e17d6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2820,8 +2820,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
__free_page(virt_to_page(svm->sev_es.vmsa));

skip_vmsa_free:
- if (svm->sev_es.ghcb_sa_free)
- kvfree(svm->sev_es.ghcb_sa);
+ kvfree(svm->sev_es.ghcb_sa);
}

static void dump_ghcb(struct vcpu_svm *svm)
@@ -2909,6 +2908,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);

+ /* Copy the GHCB scratch area GPA */
+ svm->sev_es.ghcb_sa_gpa = ghcb_get_sw_scratch(ghcb);
+
/* Clear the valid entries fields */
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}
@@ -3054,23 +3056,12 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
if (!svm->sev_es.ghcb)
return;

- if (svm->sev_es.ghcb_sa_free) {
- /*
- * The scratch area lives outside the GHCB, so there is a
- * buffer that, depending on the operation performed, may
- * need to be synced, then freed.
- */
- if (svm->sev_es.ghcb_sa_sync) {
- kvm_write_guest(svm->vcpu.kvm,
- ghcb_get_sw_scratch(svm->sev_es.ghcb),
- svm->sev_es.ghcb_sa,
- svm->sev_es.ghcb_sa_len);
- svm->sev_es.ghcb_sa_sync = false;
- }
-
- kvfree(svm->sev_es.ghcb_sa);
- svm->sev_es.ghcb_sa = NULL;
- svm->sev_es.ghcb_sa_free = false;
+ /* Sync the scratch buffer area. */
+ if (svm->sev_es.ghcb_sa_sync) {
+ kvm_write_guest(svm->vcpu.kvm,
+ ghcb_get_sw_scratch(svm->sev_es.ghcb),
+ svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len);
+ svm->sev_es.ghcb_sa_sync = false;
}

trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->sev_es.ghcb);
@@ -3111,9 +3102,8 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
struct ghcb *ghcb = svm->sev_es.ghcb;
u64 ghcb_scratch_beg, ghcb_scratch_end;
u64 scratch_gpa_beg, scratch_gpa_end;
- void *scratch_va;

- scratch_gpa_beg = ghcb_get_sw_scratch(ghcb);
+ scratch_gpa_beg = svm->sev_es.ghcb_sa_gpa;
if (!scratch_gpa_beg) {
pr_err("vmgexit: scratch gpa not provided\n");
goto e_scratch;
@@ -3143,9 +3133,6 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
scratch_gpa_beg, scratch_gpa_end);
goto e_scratch;
}
-
- scratch_va = (void *)svm->sev_es.ghcb;
- scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
} else {
/*
* The guest memory must be read into a kernel buffer, so
@@ -3156,29 +3143,36 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
len, GHCB_SCRATCH_AREA_LIMIT);
goto e_scratch;
}
- scratch_va = kvzalloc(len, GFP_KERNEL_ACCOUNT);
- if (!scratch_va)
- return -ENOMEM;
+ }

- if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
- /* Unable to copy scratch area from guest */
- pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
+ if (svm->sev_es.ghcb_sa_alloc_len < len) {
+ void *scratch_va = kvzalloc(len, GFP_KERNEL_ACCOUNT);

- kvfree(scratch_va);
- return -EFAULT;
- }
+ if (!scratch_va)
+ return -ENOMEM;

/*
- * The scratch area is outside the GHCB. The operation will
- * dictate whether the buffer needs to be synced before running
- * the vCPU next time (i.e. a read was requested so the data
- * must be written back to the guest memory).
+ * Free the old scratch area and switch to using the newly
+ * allocated one.
*/
- svm->sev_es.ghcb_sa_sync = sync;
- svm->sev_es.ghcb_sa_free = true;
+ kvfree(svm->sev_es.ghcb_sa);
+
+ svm->sev_es.ghcb_sa_alloc_len = len;
+ svm->sev_es.ghcb_sa = scratch_va;
}

- svm->sev_es.ghcb_sa = scratch_va;
+ if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, svm->sev_es.ghcb_sa, len)) {
+ /* Unable to copy scratch area from guest */
+ pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
+ return -EFAULT;
+ }
+
+ /*
+ * The operation will dictate whether the buffer needs to be synced
+ * before running the vCPU next time (i.e. a read was requested so
+ * the data must be written back to the guest memory).
+ */
+ svm->sev_es.ghcb_sa_sync = sync;
svm->sev_es.ghcb_sa_len = len;

return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7782312a1cda..bd0db4d4a61e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -197,8 +197,9 @@ struct vcpu_sev_es_state {
/* SEV-ES scratch area support */
void *ghcb_sa;
u32 ghcb_sa_len;
+ u64 ghcb_sa_gpa;
+ u32 ghcb_sa_alloc_len;
bool ghcb_sa_sync;
- bool ghcb_sa_free;
};

struct vcpu_svm {
--
2.25.1
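
The scratch-area hunks above stop allocating and freeing a buffer on
every VMGEXIT and instead keep one allocation that only grows. A minimal
user-space sketch of that grow-only pattern follows. The demo_* names are
hypothetical, and calloc()/free() stand in for kvzalloc()/kvfree(); this
illustrates the idea and is not the kernel code.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_scratch {
	void   *buf;        /* backing allocation, reused across requests */
	size_t  alloc_len;  /* size of the backing allocation */
	size_t  len;        /* bytes used by the current request */
};

/*
 * Make room for a request of len bytes. A new buffer is allocated only
 * when the request is larger than what is already held, mirroring the
 * ghcb_sa_alloc_len < len test in the patch. Returns 0 on success, -1 on
 * allocation failure (the old buffer is left untouched in that case).
 */
static int demo_scratch_reserve(struct demo_scratch *s, size_t len)
{
	if (s->alloc_len < len) {
		void *p = calloc(1, len);

		if (!p)
			return -1;

		/* Free the old area and switch to the new one. */
		free(s->buf);
		s->buf = p;
		s->alloc_len = len;
	}

	s->len = len;
	return 0;
}

/* Drop the allocation, e.g. when the owning object is torn down. */
static void demo_scratch_release(struct demo_scratch *s)
{
	free(s->buf);
	s->buf = NULL;
	s->alloc_len = 0;
	s->len = 0;
}
```

A smaller follow-up request reuses the existing allocation, so only len
changes; the price is that the buffer stays at its high-water mark until
it is released.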

2022-06-20 23:17:22

by Kalra, Ashish

Subject: [PATCH Part2 v6 37/49] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT

From: Brijesh Singh <[email protected]>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

Before changing the page state in the RMP entry, look up the page in the
NPT to make sure that there is a valid mapping for it. If the mapping
exists, then try to find a workable page level between the NPT and RMP
for the page. If the page is not mapped in the NPT, then create a fault
so that it gets mapped before we change the page state in the RMP entry.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 9 ++
arch/x86/kvm/svm/sev.c | 197 ++++++++++++++++++++++++++++++
arch/x86/kvm/trace.h | 34 ++++++
arch/x86/kvm/x86.c | 1 +
4 files changed, 241 insertions(+)
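
As a reading aid for the MSR layout, here is a minimal user-space sketch
of how a Page State Change request is unpacked and how the response is
packed. It assumes the layout shown in the sev-common.h hunk of this
patch: op in GHCBData[55:52], GFN in GHCBData[51:12], request code 0x014
and response code 0x015 in the low 12 bits, error in GHCBData[63:32].
The DEMO_* and psc_demo_* names are hypothetical and not part of the
patch.

```c
#include <stdint.h>

/* Field layout copied from the sev-common.h hunk; names are illustrative. */
#define DEMO_PSC_REQ    0x014ULL
#define DEMO_PSC_RESP   0x015ULL
#define DEMO_GFN_POS    12
#define DEMO_GFN_MASK   ((1ULL << 40) - 1)  /* GHCBData[51:12] */
#define DEMO_OP_POS     52
#define DEMO_OP_MASK    0xfULL              /* GHCBData[55:52] */
#define DEMO_ERROR_POS  32                  /* GHCBData[63:32] */

/* Build a request the way GHCB_MSR_PSC_REQ_GFN() does on the guest side. */
static uint64_t psc_demo_make_req(uint64_t gfn, unsigned int op)
{
	return ((uint64_t)(op & DEMO_OP_MASK) << DEMO_OP_POS) |
	       ((gfn & DEMO_GFN_MASK) << DEMO_GFN_POS) |
	       DEMO_PSC_REQ;
}

/* Host side: extract the guest frame number from the request. */
static uint64_t psc_demo_req_gfn(uint64_t msr)
{
	return (msr >> DEMO_GFN_POS) & DEMO_GFN_MASK;
}

/* Host side: extract the requested operation from the request. */
static unsigned int psc_demo_req_op(uint64_t msr)
{
	return (unsigned int)((msr >> DEMO_OP_POS) & DEMO_OP_MASK);
}

/*
 * Host side: build the response. The error code goes in the upper
 * 32 bits, the reserved bits stay zero, and the low 12 bits carry the
 * response code.
 */
static uint64_t psc_demo_make_resp(uint32_t error)
{
	return ((uint64_t)error << DEMO_ERROR_POS) | DEMO_PSC_RESP;
}
```

With this layout a successful change is reported as 0x015 with all upper
bits clear, and the failure path in the handler below, which writes
all-ones into the error field, produces 0xffffffff00000015.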

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 0a9055cdfae2..ee38f7408470 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -93,6 +93,10 @@ enum psc_op {
};

#define GHCB_MSR_PSC_REQ 0x014
+#define GHCB_MSR_PSC_GFN_POS 12
+#define GHCB_MSR_PSC_GFN_MASK GENMASK_ULL(39, 0)
+#define GHCB_MSR_PSC_OP_POS 52
+#define GHCB_MSR_PSC_OP_MASK 0xf
#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
/* GHCBData[55:52] */ \
(((u64)((op) & 0xf) << 52) | \
@@ -102,6 +106,11 @@ enum psc_op {
GHCB_MSR_PSC_REQ)

#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_ERROR_POS 32
+#define GHCB_MSR_PSC_ERROR_MASK GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_ERROR GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_RSVD_POS 12
+#define GHCB_MSR_PSC_RSVD_MASK GENMASK_ULL(19, 0)
#define GHCB_MSR_PSC_RESP_VAL(val) \
/* GHCBData[63:32] */ \
(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6de48130e414..15900c2f30fc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -32,6 +32,7 @@
#include "svm_ops.h"
#include "cpuid.h"
#include "trace.h"
+#include "mmu.h"

#ifndef CONFIG_KVM_AMD_SEV
/*
@@ -3252,6 +3253,181 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}

+static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
+{
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ return psmash(pfn);
+}
+
+static int snp_make_page_shared(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, int level)
+{
+ int rc, rmp_level;
+
+ rc = snp_lookup_rmpentry(pfn, &rmp_level);
+ if (rc < 0)
+ return -EINVAL;
+
+ /* If page is not assigned then do nothing */
+ if (!rc)
+ return 0;
+
+ /*
+ * Is the page part of an existing 2MB RMP entry? If so, split the 2MB
+ * entry into multiple 4K entries before making the memory shared.
+ */
+ if (level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M) {
+ rc = snp_rmptable_psmash(kvm, pfn);
+ if (rc)
+ return rc;
+ }
+
+ return rmp_make_shared(pfn, level);
+}
+
+static int snp_check_and_build_npt(struct kvm_vcpu *vcpu, gpa_t gpa, int level)
+{
+ struct kvm *kvm = vcpu->kvm;
+ int rc, npt_level;
+ kvm_pfn_t pfn;
+
+ /*
+ * Get the pfn and level for the gpa from the nested page table.
+ *
+ * If the tdp walk fails, then it's safe to say that there is no
+ * valid mapping for this gpa. Create a fault to build the map.
+ */
+ write_lock(&kvm->mmu_lock);
+ rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
+ write_unlock(&kvm->mmu_lock);
+ if (!rc) {
+ pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
+ if (is_error_noslot_pfn(pfn))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int snp_gpa_to_hva(struct kvm *kvm, gpa_t gpa, hva_t *hva)
+{
+ struct kvm_memory_slot *slot;
+ gfn_t gfn = gpa_to_gfn(gpa);
+ int idx;
+
+ idx = srcu_read_lock(&kvm->srcu);
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot) {
+ srcu_read_unlock(&kvm->srcu, idx);
+ return -EINVAL;
+ }
+
+ /*
+ * Note, using the __gfn_to_hva_memslot() is not solely for performance,
+ * it's also necessary to avoid the "writable" check in __gfn_to_hva_many(),
+ * which will always fail on read-only memslots due to gfn_to_hva() assuming
+ * writes.
+ */
+ *hva = __gfn_to_hva_memslot(slot, gfn);
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return 0;
+}
+
+static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op, gpa_t gpa,
+ int level)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
+ struct kvm *kvm = vcpu->kvm;
+ int rc, npt_level;
+ kvm_pfn_t pfn;
+ gpa_t gpa_end;
+
+ gpa_end = gpa + page_level_size(level);
+
+ while (gpa < gpa_end) {
+ /*
+ * If the gpa is not present in the NPT then build the NPT.
+ */
+ rc = snp_check_and_build_npt(vcpu, gpa, level);
+ if (rc)
+ return -EINVAL;
+
+ if (op == SNP_PAGE_STATE_PRIVATE) {
+ hva_t hva;
+
+ if (snp_gpa_to_hva(kvm, gpa, &hva))
+ return -EINVAL;
+
+ /*
+ * Verify that the hva range is registered. This enforcement is
+ * required to avoid the cases where a page is marked private
+ * in the RMP table but never gets cleaned up during the VM
+ * termination path.
+ */
+ mutex_lock(&kvm->lock);
+ rc = is_hva_registered(kvm, hva, page_level_size(level));
+ mutex_unlock(&kvm->lock);
+ if (!rc)
+ return -EINVAL;
+
+ /*
+ * Mark the userspace range unmergeable before adding the pages
+ * in the RMP table.
+ */
+ mmap_write_lock(kvm->mm);
+ rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
+ mmap_write_unlock(kvm->mm);
+ if (rc)
+ return -EINVAL;
+ }
+
+ write_lock(&kvm->mmu_lock);
+
+ rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
+ if (!rc) {
+ /*
+ * This may happen if another vCPU unmapped the page
+ * before we acquire the lock. Retry the PSC.
+ */
+ write_unlock(&kvm->mmu_lock);
+ return 0;
+ }
+
+ /*
+ * Adjust the level so that we don't go higher than the backing
+ * page level.
+ */
+ level = min_t(size_t, level, npt_level);
+
+ trace_kvm_snp_psc(vcpu->vcpu_id, pfn, gpa, op, level);
+
+ switch (op) {
+ case SNP_PAGE_STATE_SHARED:
+ rc = snp_make_page_shared(kvm, gpa, pfn, level);
+ break;
+ case SNP_PAGE_STATE_PRIVATE:
+ rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
+ break;
+ default:
+ rc = -EINVAL;
+ break;
+ }
+
+ write_unlock(&kvm->mmu_lock);
+
+ if (rc) {
+ pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
+ op, gpa, pfn, level, rc);
+ return rc;
+ }
+
+ gpa = gpa + page_level_size(level);
+ }
+
+ return 0;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3352,6 +3528,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ: {
+ gfn_t gfn;
+ int ret;
+ enum psc_op op;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_GFN_MASK, GHCB_MSR_PSC_GFN_POS);
+ op = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_OP_MASK, GHCB_MSR_PSC_OP_POS);
+
+ ret = __snp_handle_page_state_change(vcpu, op, gfn_to_gpa(gfn), PG_LEVEL_4K);
+
+ if (ret)
+ set_ghcb_msr_bits(svm, GHCB_MSR_PSC_ERROR,
+ GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
+ else
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
+
+ set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 9b9bc5468103..79801e50344a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -7,6 +7,7 @@
#include <asm/svm.h>
#include <asm/clocksource.h>
#include <asm/pvclock-abi.h>
+#include <asm/sev-common.h>

#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvm
@@ -1755,6 +1756,39 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
__entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
);

+/*
+ * Tracepoint for the SEV-SNP page state change processing
+ */
+#define psc_operation \
+ {SNP_PAGE_STATE_PRIVATE, "private"}, \
+ {SNP_PAGE_STATE_SHARED, "shared"} \
+
+TRACE_EVENT(kvm_snp_psc,
+ TP_PROTO(unsigned int vcpu_id, u64 pfn, u64 gpa, u8 op, int level),
+ TP_ARGS(vcpu_id, pfn, gpa, op, level),
+
+ TP_STRUCT__entry(
+ __field(int, vcpu_id)
+ __field(u64, pfn)
+ __field(u64, gpa)
+ __field(u8, op)
+ __field(int, level)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ __entry->pfn = pfn;
+ __entry->gpa = gpa;
+ __entry->op = op;
+ __entry->level = level;
+ ),
+
+ TP_printk("vcpu %u, pfn %llx, gpa %llx, op %s, level %d",
+ __entry->vcpu_id, __entry->pfn, __entry->gpa,
+ __print_symbolic(__entry->op, psc_operation),
+ __entry->level)
+);
+
#endif /* _TRACE_KVM_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 50fff5202e7e..4a1d16231e30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13066,6 +13066,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_snp_psc);

static int __init kvm_x86_init(void)
{
--
2.25.1

2022-06-20 23:17:31

by Kalra, Ashish

Subject: [PATCH Part2 v6 39/49] KVM: SVM: Introduce ops for the post gfn map and unmap

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled in the guest VM, the guest memory pages can be
either private or shared. A write from the hypervisor goes through
the RMP checks. If hardware sees that the hypervisor is attempting to
write to a guest private page, then it triggers an RMP violation #PF.

To avoid the RMP violation with GHCB pages, add new post_{map,unmap}_gfn
functions to verify whether it's safe to map GHCB pages. Use a spinlock
to protect against a page state change for pages that are already mapped.

Generic post_{map,unmap}_gfn() ops, which can be used to verify that it's
safe to map a given guest page in the hypervisor, still need to be added.

This patch will need to be revisited later, after consensus is reached on
how to manage guest private memory, as UPM private memslots will probably
be able to handle this page state change more gracefully.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 3 ++
arch/x86/kvm/svm/sev.c | 48 ++++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.c | 3 ++
arch/x86/kvm/svm/svm.h | 11 +++++++
5 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index e0068e702692..2dd2bc0cf4c3 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -130,6 +130,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP(alloc_apic_backing_page)
KVM_X86_OP_OPTIONAL(rmp_page_level_adjust)
+KVM_X86_OP(update_protected_guest_state)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49b217dc8d7e..8abc0e724f5c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1522,7 +1522,10 @@ struct kvm_x86_ops {
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);

void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+
void (*rmp_page_level_adjust)(struct kvm *kvm, kvm_pfn_t pfn, int *level);
+
+ int (*update_protected_guest_state)(struct kvm_vcpu *vcpu);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cb2d1bbb862b..4ed90331bca0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -341,6 +341,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (ret)
goto e_free;

+ spin_lock_init(&sev->psc_lock);
ret = sev_snp_init(&argp->error);
} else {
ret = sev_platform_init(&argp->error);
@@ -2828,19 +2829,28 @@ static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
{
struct vmcb_control_area *control = &svm->vmcb->control;
u64 gfn = gpa_to_gfn(control->ghcb_gpa);
+ struct kvm_vcpu *vcpu = &svm->vcpu;

- if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
+ if (kvm_vcpu_map(vcpu, gfn, map)) {
/* Unable to map GHCB from guest */
pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
return -EFAULT;
}

+ if (sev_post_map_gfn(vcpu->kvm, map->gfn, map->pfn)) {
+ kvm_vcpu_unmap(vcpu, map, false);
+ return -EBUSY;
+ }
+
return 0;
}

static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
{
- kvm_vcpu_unmap(&svm->vcpu, map, true);
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ kvm_vcpu_unmap(vcpu, map, true);
+ sev_post_unmap_gfn(vcpu->kvm, map->gfn, map->pfn);
}

static void dump_ghcb(struct vcpu_svm *svm)
@@ -3383,6 +3393,8 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
return PSC_UNDEF_ERR;
}

+ spin_lock(&sev->psc_lock);
+
write_lock(&kvm->mmu_lock);

rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
@@ -3417,6 +3429,8 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,

write_unlock(&kvm->mmu_lock);

+ spin_unlock(&sev->psc_lock);
+
if (rc) {
pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
op, gpa, pfn, level, rc);
@@ -3965,3 +3979,33 @@ void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level)
/* Adjust the level to keep the NPT and RMP in sync */
*level = min_t(size_t, *level, rmp_level);
}
+
+int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ int level;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ spin_lock(&sev->psc_lock);
+
+ /* Fail the mapping if the pfn is assigned (private) in the RMP table */
+ if (snp_lookup_rmpentry(pfn, &level) == 1) {
+ spin_unlock(&sev->psc_lock);
+ pr_err_ratelimited("failed to map private gfn 0x%llx pfn 0x%llx\n", gfn, pfn);
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!sev_snp_guest(kvm))
+ return;
+
+ spin_unlock(&sev->psc_lock);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b24e0171cbf2..1c8e035ba011 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4734,7 +4734,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,

.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
.rmp_page_level_adjust = sev_rmp_page_level_adjust,
+
+ .update_protected_guest_state = sev_snp_update_protected_guest_state,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 54ff56cb6125..3fd95193ed8d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,19 +79,25 @@ struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
bool snp_active; /* SEV-SNP enabled guest */
+
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
+
unsigned long pages_locked; /* Number of pages locked */
struct list_head regions_list; /* List of registered regions */
+
u64 ap_jump_table; /* SEV-ES AP Jump Table address */
+
struct kvm *enc_context_owner; /* Owner of copied encryption context */
struct list_head mirror_vms; /* List of VMs mirroring */
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
+ spinlock_t psc_lock;
};

struct kvm_svm {
@@ -702,6 +708,11 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
+int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
+void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu);

/* vmenter.S */

--
2.25.1

2022-06-20 23:18:11

by Kalra, Ashish

Subject: [PATCH Part2 v6 40/49] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use

From: Brijesh Singh <[email protected]>

While resolving the RMP page fault, we may run into cases where the page
level between the RMP entry and TDP does not match and the 2M RMP entry
must be split into 4K RMP entries, or a 2M TDP page needs to be broken
into multiple 4K pages.

To keep the RMP and TDP page levels in sync, we will zap the gfn range
after splitting the pages in the RMP entry. The zap should force the
TDP to be rebuilt with the new page level.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.h | 2 --
arch/x86/kvm/mmu/mmu.c | 1 +
3 files changed, 3 insertions(+), 2 deletions(-)
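
For illustration, the gfn range to zap after a 2M RMP entry has been
split can be computed as below. This is a minimal sketch that assumes 4K
base pages (512 per 2M region) and treats the range as half-open,
[start, end); the demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGES_PER_2M 512ULL  /* 2 MiB / 4 KiB */

struct demo_gfn_range {
	uint64_t start;  /* first gfn of the 2M-aligned region */
	uint64_t end;    /* one past the last gfn of the region */
};

/* Return the 2M-aligned gfn range that contains gfn. */
static struct demo_gfn_range demo_2m_range(uint64_t gfn)
{
	struct demo_gfn_range r;

	r.start = gfn & ~(DEMO_PAGES_PER_2M - 1);
	r.end = r.start + DEMO_PAGES_PER_2M;
	return r;
}
```

For gfn 0x12345 this yields the range 0x12200 to 0x12400, so every 4K
page that shared the former 2M mapping is covered by the zap.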

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8abc0e724f5c..1db4d178eb1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1627,6 +1627,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+

int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d55b5166389a..c5044958a0fa 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -267,8 +267,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return -(u32)fault & errcode;
}

-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);

int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c1ac486e096e..67120bfeb667 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6084,6 +6084,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,

return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);

void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
--
2.25.1

2022-06-20 23:18:13

by Kalra, Ashish

Subject: [PATCH Part2 v6 41/49] KVM: SVM: Add support to handle the RMP nested page fault

From: Brijesh Singh <[email protected]>

When SEV-SNP is enabled in the guest, the hardware places restrictions on
all memory accesses based on the contents of the RMP table. When the
hardware encounters an RMP check failure caused by a guest memory access,
it raises an #NPF. The error code contains additional information on the
access type. See the APM volume 2 for additional information.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 14 +++++---
2 files changed, 86 insertions(+), 4 deletions(-)
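
The decision logic of handle_rmp_page_fault() in the diff below can be
summarized as a pure function, which may help when reviewing the
ordering of the checks. This is only a sketch: the DEMO_* bit positions
are assumptions made for illustration (consult the APM and the kernel
headers for the real PFERR_GUEST_* definitions), and the enum and
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_PFERR_ENC    (1ULL << 34)  /* access targeted private memory */
#define DEMO_PFERR_SIZEM  (1ULL << 35)  /* RMP/NPT page-size mismatch */

enum demo_level { DEMO_4K = 1, DEMO_2M = 2 };

enum demo_action {
	DEMO_PSMASH,        /* split the 2M RMP entry into 4K entries */
	DEMO_MAKE_PRIVATE,  /* page state change to private */
	DEMO_MAKE_SHARED,   /* page state change to shared */
	DEMO_ZAP_ONLY,      /* no RMP update, only zap the 2M gfn range */
};

/*
 * Mirror the order of the checks in the handler: the size mismatch is
 * handled first, then the two implicit page state changes.
 */
static enum demo_action demo_rmp_fault_action(uint64_t error_code,
					      bool assigned,
					      enum demo_level npt_level,
					      enum demo_level rmp_level)
{
	bool private = (error_code & DEMO_PFERR_ENC) != 0;

	if ((error_code & DEMO_PFERR_SIZEM) ||
	    (npt_level == DEMO_4K && rmp_level == DEMO_2M && private))
		return DEMO_PSMASH;

	if (!assigned && private)
		return DEMO_MAKE_PRIVATE;

	if (assigned && !private)
		return DEMO_MAKE_SHARED;

	return DEMO_ZAP_ONLY;
}
```

In the handler every one of these outcomes is followed by a zap of the
surrounding 2M gfn range, so DEMO_ZAP_ONLY simply means that no RMP
update precedes the zap.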

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4ed90331bca0..7fc0fad87054 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)

spin_unlock(&sev->psc_lock);
}
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ int rmp_level, npt_level, rc, assigned;
+ struct kvm *kvm = vcpu->kvm;
+ gfn_t gfn = gpa_to_gfn(gpa);
+ bool need_psc = false;
+ enum psc_op psc_op;
+ kvm_pfn_t pfn;
+ bool private;
+
+ write_lock(&kvm->mmu_lock);
+
+ if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
+ goto unlock;
+
+ assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+ if (unlikely(assigned < 0))
+ goto unlock;
+
+ private = !!(error_code & PFERR_GUEST_ENC_MASK);
+
+ /*
+ * If the fault was due to a size mismatch, or the NPT and RMP page
+ * levels are not in sync, then use PSMASH to split the RMP entry into 4K.
+ */
+ if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
+ (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
+ rc = snp_rmptable_psmash(kvm, pfn);
+ if (rc)
+ pr_err_ratelimited("psmash failed, gpa 0x%llx pfn 0x%llx rc %d\n",
+ gpa, pfn, rc);
+ goto out;
+ }
+
+ /*
+ * If it's a private access, and the page is not assigned in the
+ * RMP table, create a new private RMP entry. This can happen if
+ * guest did not use the PSC VMGEXIT to transition the page state
+ * before the access.
+ */
+ if (!assigned && private) {
+ need_psc = 1;
+ psc_op = SNP_PAGE_STATE_PRIVATE;
+ goto out;
+ }
+
+ /*
+ * If it's a shared access, but the page is private in the RMP table
+ * then make the page shared in the RMP table. This can happen if
+ * the guest did not use the PSC VMGEXIT to transition the page
+ * state before the access.
+ */
+ if (assigned && !private) {
+ need_psc = 1;
+ psc_op = SNP_PAGE_STATE_SHARED;
+ }
+
+out:
+ write_unlock(&kvm->mmu_lock);
+
+ if (need_psc)
+ rc = __snp_handle_page_state_change(vcpu, psc_op, gpa, PG_LEVEL_4K);
+
+ /*
+ * The fault handler has updated the RMP pagesize, zap the existing
+ * rmaps for large entry ranges so that nested page table gets rebuilt
+ * with the updated RMP pagesize.
+ */
+ gfn = gpa_to_gfn(gpa) & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+ return;
+
+unlock:
+ write_unlock(&kvm->mmu_lock);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1c8e035ba011..7742bc986afc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1866,15 +1866,21 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;

u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;

trace_kvm_page_fault(fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ if (error_code & PFERR_GUEST_RMP_MASK)
+ handle_rmp_page_fault(vcpu, fault_address, error_code);
+
+ return rc;
}

static int db_interception(struct kvm_vcpu *vcpu)
--
2.25.1

2022-06-20 23:19:36

by Kalra, Ashish

Subject: [PATCH Part2 v6 42/49] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. The events allow an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
difference of an additional certificate blob that can be passed through
the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
provides snp_guest_ext_guest_request(), which is used by KVM to get
both the report and certificate data at once.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kvm/svm/sev.c | 196 +++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.h | 2 +
2 files changed, 192 insertions(+), 6 deletions(-)
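
The extended request takes a guest-supplied buffer address and page count
(RAX and RBX in the handler below), so both must be validated before use.
A minimal sketch of those checks follows. It assumes 4K pages and a
16 KiB cap standing in for SEV_FW_BLOB_MAX_SIZE; the demo_* names and the
status enum are hypothetical. Unlike the handler, the sketch compares
page counts rather than shifting the guest value, so a very large count
cannot wrap around and pass the check.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT     12
#define DEMO_PAGE_SIZE      (1ULL << DEMO_PAGE_SHIFT)
#define DEMO_BLOB_MAX_SIZE  0x4000ULL  /* assumed 16 KiB certificate buffer */

enum demo_status {
	DEMO_OK,
	DEMO_INVALID_ADDRESS,  /* buffer address is not page aligned */
	DEMO_INVALID_PARAM,    /* requested size exceeds the buffer */
};

/* Validate the guest-provided certificate buffer description. */
static enum demo_status demo_check_cert_buf(uint64_t data_gpa,
					    uint64_t data_npages)
{
	if (data_gpa & (DEMO_PAGE_SIZE - 1))
		return DEMO_INVALID_ADDRESS;

	/* Compare in pages to avoid overflowing the shifted byte count. */
	if (data_npages > (DEMO_BLOB_MAX_SIZE >> DEMO_PAGE_SHIFT))
		return DEMO_INVALID_PARAM;

	return DEMO_OK;
}
```

With these assumptions a request for up to four pages at a page-aligned
address is accepted, and anything larger is rejected before the firmware
is called.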

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7fc0fad87054..089af21a4efe 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -343,6 +343,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)

spin_lock_init(&sev->psc_lock);
ret = sev_snp_init(&argp->error);
+ mutex_init(&sev->guest_req_lock);
} else {
ret = sev_platform_init(&argp->error);
}
@@ -1884,23 +1885,39 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)

static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
+ void *context = NULL, *certs_data = NULL, *resp_page = NULL;
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
struct sev_data_snp_gctx_create data = {};
- void *context;
int rc;

+ /* Allocate memory used for the certs data in SNP guest request */
+ certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
+ if (!certs_data)
+ return NULL;
+
/* Allocate memory for context page */
context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
if (!context)
- return NULL;
+ goto e_free;
+
+ /* Allocate a firmware buffer used during the guest command handling. */
+ resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!resp_page)
+ goto e_free;

data.gctx_paddr = __psp_pa(context);
rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
- if (rc) {
- snp_free_firmware_page(context);
- return NULL;
- }
+ if (rc)
+ goto e_free;
+
+ sev->snp_certs_data = certs_data;

return context;
+
+e_free:
+ snp_free_firmware_page(context);
+ kfree(certs_data);
+ return NULL;
}

static int snp_bind_asid(struct kvm *kvm, int *error)
@@ -2565,6 +2582,8 @@ static int snp_decommission_context(struct kvm *kvm)
snp_free_firmware_page(sev->snp_context);
sev->snp_context = NULL;

+ kfree(sev->snp_certs_data);
+
return 0;
}

@@ -3077,6 +3096,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3502,6 +3523,155 @@ static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
return rc ? map_to_psc_vmgexit_code(rc) : 0;
}

+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+ struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ kvm_pfn_t req_pfn, resp_pfn;
+ struct kvm_sev_info *sev;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+ return SEV_RET_INVALID_PARAM;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return SEV_RET_INVALID_ADDRESS;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+ int ret;
+
+ ret = snp_page_reclaim(pfn);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_sev_info *sev;
+ unsigned long rc;
+ int err;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+ if (rc)
+ /* use the firmware error code */
+ rc = err;
+
+ snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request req = {0};
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ unsigned long data_npages;
+ struct kvm_sev_info *sev;
+ unsigned long rc, err;
+ u64 data_gpa;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+ rc = SEV_RET_INVALID_ADDRESS;
+ goto e_fail;
+ }
+
+ /* Verify that requested blob will fit in certificate buffer */
+ if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
+ rc = SEV_RET_INVALID_PARAM;
+ goto e_fail;
+ }
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
+ &data_npages, &err);
+ if (rc) {
+ /*
+ * If buffer length is small then return the expected
+ * length in rbx.
+ */
+ if (err == SNP_GUEST_REQ_INVALID_LEN)
+ vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
+
+ /* pass the firmware error code */
+ rc = err;
+ goto cleanup;
+ }
+
+ /* Copy the certificate blob into the guest memory */
+ if (data_npages &&
+ kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
+ rc = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+ snp_cleanup_guest_buf(&req, &rc);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3753,6 +3923,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
svm_set_ghcb_sw_exit_info_2(vcpu, rc);
break;
}
+ case SVM_VMGEXIT_GUEST_REQUEST: {
+ snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+ ret = 1;
+ break;
+ }
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
+ snp_handle_ext_guest_request(svm,
+ control->exit_info_1,
+ control->exit_info_2);
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 3fd95193ed8d..3be24da1a743 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -98,6 +98,8 @@ struct kvm_sev_info {
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
spinlock_t psc_lock;
+ void *snp_certs_data;
+ struct mutex guest_req_lock;
};

struct kvm_svm {
--
2.25.1

2022-06-20 23:19:44

by Kalra, Ashish

Subject: [PATCH Part2 v6 43/49] KVM: SVM: Use a VMSA physical address variable for populating VMCB

From: Tom Lendacky <[email protected]>

In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 5 ++---
arch/x86/kvm/svm/svm.c | 9 ++++++++-
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 089af21a4efe..d5584551f3dd 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3980,10 +3980,9 @@ void sev_es_init_vmcb(struct vcpu_svm *svm)

/*
* An SEV-ES guest requires a VMSA area that is a separate from the
- * VMCB page. Do not include the encryption mask on the VMSA physical
- * address since hardware will access it using the guest key.
+ * VMCB page.
*/
- svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;

/* Can't intercept CR register access, HV can't modify CR registers */
svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7742bc986afc..f7155abe7567 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1296,9 +1296,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
svm_switch_vmcb(svm, &svm->vmcb01);

- if (vmsa_page)
+ if (vmsa_page) {
svm->sev_es.vmsa = page_address(vmsa_page);

+ /*
+ * Do not include the encryption mask on the VMSA physical
+ * address since hardware will access it using the guest key.
+ */
+ svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+ }
+
svm->guest_state_loaded = false;

return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 3be24da1a743..46790bab07a8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -197,6 +197,7 @@ struct svm_nested_state {
struct vcpu_sev_es_state {
/* SEV-ES support */
struct sev_es_save_area *vmsa;
+ hpa_t vmsa_pa;
bool ghcb_in_use;
bool received_first_sipi;
unsigned int ap_reset_hold_type;
--
2.25.1

2022-06-20 23:19:51

by Kalra, Ashish

Subject: [PATCH Part2 v6 44/49] KVM: SVM: Support SEV-SNP AP Creation NAE event

From: Tom Lendacky <[email protected]>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, which gives
the guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.

For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. The handler sets the
vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
leave the vCPU as not runnable. Any previous VMSA pages that were
installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
a new VMSA is to be installed, the VMSA guest page is pinned and set as
the VMSA in the vCPU VMCB and the vCPU state is set to
KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
cleared in the vCPU VMCB and the vCPU state is left as
KVM_MP_STATE_UNINITIALIZED to prevent it from being run.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
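The request flow described in the commit message above can be modeled in user space. The following is only a sketch, not the KVM implementation: `struct ap_state`, `ap_creation_request()` and `ap_apply_update()` are hypothetical names, the request codes (0, 1, 2) are assumed to match the GHCB AP Creation encoding, and the address check is reduced to page alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Request codes; the values are an assumption based on the GHCB encoding. */
enum ap_request {
	AP_CREATE_ON_INIT = 0,
	AP_CREATE         = 1,
	AP_DESTROY        = 2,
};

#define INVALID_GPA UINT64_MAX	/* stand-in for the kernel's INVALID_PAGE */

/* Hypothetical per-vCPU bookkeeping, loosely modeled on vcpu_sev_es_state. */
struct ap_state {
	uint64_t pending_vmsa_gpa;	/* GPA to install on the next update */
	bool update_pending;		/* models snp_ap_create */
};

/* exit_info_1 carries the request in bits 31:0 and the APIC ID in bits 63:32. */
static inline uint32_t ap_req_code(uint64_t exit_info_1)
{
	return (uint32_t)exit_info_1;
}

static inline uint32_t ap_req_apic_id(uint64_t exit_info_1)
{
	return (uint32_t)(exit_info_1 >> 32);
}

/*
 * Record a request on the target vCPU. Returns 0 on success, -1 on error.
 * *kick reports whether the target would be kicked right away.
 */
static inline int ap_creation_request(struct ap_state *target,
				      uint64_t exit_info_1,
				      uint64_t vmsa_gpa, bool *kick)
{
	assert(target != NULL && kick != NULL);

	/* Start from "no VMSA", so an error leaves the target non-runnable. */
	target->pending_vmsa_gpa = INVALID_GPA;
	target->update_pending = true;
	*kick = true;

	switch (ap_req_code(exit_info_1)) {
	case AP_CREATE_ON_INIT:
		*kick = false;	/* applied at the next INIT instead */
		/* fall through */
	case AP_CREATE:
		if (vmsa_gpa & 0xfffULL)	/* simplified validity check */
			return -1;
		target->pending_vmsa_gpa = vmsa_gpa;
		return 0;
	case AP_DESTROY:
		return 0;
	default:
		return -1;
	}
}

/* Models the update handler; returns true if the vCPU ends up runnable. */
static inline bool ap_apply_update(struct ap_state *s, uint64_t *active_gpa)
{
	s->update_pending = false;
	*active_gpa = INVALID_GPA;	/* clear use of the old VMSA first */

	if (s->pending_vmsa_gpa == INVALID_GPA)
		return false;

	*active_gpa = s->pending_vmsa_gpa;
	s->pending_vmsa_gpa = INVALID_GPA;
	return true;
}
```

In this model a DESTROY, or any failed request that still kicks the target, leaves the pending GPA invalid, so the update step leaves the vCPU not runnable, which is the behavior the commit message describes.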
arch/x86/include/asm/kvm-x86-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/asm/svm.h | 7 +-
arch/x86/kvm/svm/sev.c | 197 +++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 5 +-
arch/x86/kvm/svm/svm.h | 6 +
arch/x86/kvm/x86.c | 9 +-
7 files changed, 221 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 2dd2bc0cf4c3..e0068e702692 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -130,7 +130,6 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP(alloc_apic_backing_page)
KVM_X86_OP_OPTIONAL(rmp_page_level_adjust)
-KVM_X86_OP(update_protected_guest_state)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1db4d178eb1d..660cf39344fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -105,6 +105,7 @@
KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(32)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1524,8 +1525,6 @@ struct kvm_x86_ops {
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);

void (*rmp_page_level_adjust)(struct kvm *kvm, kvm_pfn_t pfn, int *level);
-
- int (*update_protected_guest_state)(struct kvm_vcpu *vcpu);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 284a8113227e..a69b6da71a65 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -263,7 +263,12 @@ enum avic_ipi_failure_cause {
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL

-#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d5584551f3dd..bb7d4547df81 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -657,6 +657,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)

static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
struct sev_es_save_area *save = svm->sev_es.vmsa;

/* Check some debug related fields before encrypting the VMSA */
@@ -702,6 +703,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (sev_snp_guest(svm->vcpu.kvm))
save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;

+ /*
+ * Save the VMSA synced SEV features. For now, they are the same for
+ * all vCPUs, so just save each time.
+ */
+ sev->sev_features = save->sev_features;
+
return 0;
}

@@ -3090,6 +3097,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
if (!ghcb_sw_scratch_is_valid(ghcb))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (!ghcb_rax_is_valid(ghcb))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3672,6 +3683,178 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
svm_set_ghcb_sw_exit_info_2(vcpu, rc);
}

+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ kvm_pfn_t pfn;
+ hpa_t cur_pa;
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Save off the current VMSA PA for later checks */
+ cur_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_STOPPED;
+
+ /* Clear use of the VMSA */
+ svm->sev_es.vmsa_pa = INVALID_PAGE;
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa)) {
+ /*
+ * The svm->sev_es.vmsa_pa field holds the hypervisor physical
+ * address of the about to be replaced VMSA which will no longer
+ * be used or referenced, so un-pin it.
+ */
+ kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
+ }
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ /*
+ * The VMSA is referenced by the hypervisor physical address,
+ * so retrieve the PFN and pin it.
+ */
+ pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->sev_es.snp_vmsa_gpa));
+ if (is_error_pfn(pfn))
+ return -EINVAL;
+
+ /* Use the new VMSA */
+ svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_create)
+ goto unlock;
+
+ svm->sev_es.snp_ap_create = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * We have a valid target vCPU, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in an non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_create = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->sev_features;
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3937,6 +4120,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+ svm_set_ghcb_sw_exit_info_2(vcpu,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -4024,6 +4219,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}

void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f7155abe7567..fced6ea423ad 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1237,6 +1237,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;

+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);

if (!init_event)
@@ -4749,8 +4752,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,

.rmp_page_level_adjust = sev_rmp_page_level_adjust,
-
- .update_protected_guest_state = sev_snp_update_protected_guest_state,
};

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 46790bab07a8..971ff4e949fd 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -100,6 +100,8 @@ struct kvm_sev_info {
spinlock_t psc_lock;
void *snp_certs_data;
struct mutex guest_req_lock;
+
+ u64 sev_features; /* Features set at VMSA creation */
};

struct kvm_svm {
@@ -217,6 +219,10 @@ struct vcpu_sev_es_state {
u64 ghcb_sw_exit_info_2;

u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex;
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_create;
};

struct vcpu_svm {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4a1d16231e30..c649d15efae3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10095,6 +10095,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)

if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
+ goto out;
+ }
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -12219,7 +12225,8 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (!list_empty_careful(&vcpu->async_pf.done))
return true;

- if (kvm_apic_has_events(vcpu))
+ if (kvm_apic_has_events(vcpu) ||
+ kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
return true;

if (vcpu->arch.pv.pv_unhalted)
--
2.25.1

2022-06-20 23:20:15

by Kalra, Ashish

Subject: [PATCH Part2 v6 46/49] ccp: add support to decrypt the page

From: Brijesh Singh <[email protected]>

Add support to decrypt guest encrypted memory. These interfaces can be
used, for example, to dump VMCBs on SNP guest exit.

Signed-off-by: Brijesh Singh <[email protected]>
---
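As a rough illustration of how the new page-based interface turns PFNs into command addresses, here is a user-space sketch. The field names follow the patch, but the struct layout, `pfn_to_enc_paddr()` and `build_dbg_decrypt()` are hypothetical, and the encryption mask is passed as a parameter instead of being read from `sme_me_mask`.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Field names follow the patch; this layout is illustrative only. */
struct dbg_decrypt_cmd {
	uint64_t gctx_paddr;	/* guest context page */
	uint64_t src_addr;	/* encrypted guest page */
	uint64_t dst_addr;	/* page that receives the plaintext */
	uint32_t len;
};

/* Convert a PFN to a physical address with the encryption mask applied. */
static inline uint64_t pfn_to_enc_paddr(uint64_t pfn, uint64_t enc_mask)
{
	return enc_mask | (pfn << PAGE_SHIFT);
}

/* Build a single-page decrypt command from three PFNs. */
static inline struct dbg_decrypt_cmd build_dbg_decrypt(uint64_t gctx_pfn,
						       uint64_t src_pfn,
						       uint64_t dst_pfn,
						       uint64_t enc_mask)
{
	struct dbg_decrypt_cmd cmd = {0};

	cmd.gctx_paddr = pfn_to_enc_paddr(gctx_pfn, enc_mask);
	cmd.src_addr = pfn_to_enc_paddr(src_pfn, enc_mask);
	cmd.dst_addr = pfn_to_enc_paddr(dst_pfn, enc_mask);
	cmd.len = (uint32_t)PAGE_SIZE;

	return cmd;
}
```

The sketch covers only the address construction. In the patch, the call additionally moves the destination page into the firmware state before issuing the command and restores it afterward.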
drivers/crypto/ccp/sev-dev.c | 33 ++++++++++++++++++++++++++++++---
include/linux/psp-sev.h | 6 +++---
2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index f6306b820b86..9896350e7f56 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1852,11 +1852,38 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
}
EXPORT_SYMBOL_GPL(snp_guest_page_reclaim);

-int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
{
- return sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, data, error);
+ struct sev_data_snp_dbg data = {0};
+ struct sev_device *sev;
+ int ret;
+
+ if (!psp_master || !psp_master->sev_data)
+ return -ENODEV;
+
+ sev = psp_master->sev_data;
+
+ if (!sev->snp_inited)
+ return -EINVAL;
+
+ data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
+ data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
+ data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
+ data.len = PAGE_SIZE;
+
+ /* The destination page must be in the firmware state. */
+ if (snp_set_rmp_state(data.dst_addr, 1, true, false, false))
+ return -EIO;
+
+ ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
+
+ /* Restore the page state */
+ if (snp_set_rmp_state(data.dst_addr, 1, false, false, true))
+ ret = -EIO;
+
+ return ret;
}
-EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);

int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index cd37ccd1fa1f..8d2565c70c39 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -928,7 +928,7 @@ int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error);
int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);

/**
- * snp_guest_dbg_decrypt - perform SEV SNP_DBG_DECRYPT command
+ * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
*
* @sev_ret: sev command return code
*
@@ -939,7 +939,7 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
* -%ETIMEDOUT if the sev command timed out
* -%EIO if the sev returned a non-zero return code
*/
-int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);

void *psp_copy_user_blob(u64 uaddr, u32 len);
void *snp_alloc_firmware_page(gfp_t mask);
@@ -997,7 +997,7 @@ static inline int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data,
return -ENODEV;
}

-static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
{
return -ENODEV;
}
--
2.25.1

2022-06-20 23:20:43

by Kalra, Ashish

Subject: [PATCH Part2 v6 45/49] KVM: SVM: Add module parameter to enable the SEV-SNP

From: Brijesh Singh <[email protected]>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains the support for SNP, set the GHCB
hypervisor feature flag to indicate that SNP is supported.

Signed-off-by: Brijesh Singh <[email protected]>
---
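To show what the advertised feature mask means for a consumer, here is a small sketch of a feature check. The bit positions (bit 0 for SNP, bit 1 for SNP AP Creation) are an assumption taken from the GHCB hypervisor-feature encoding, and `hv_supports()` is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumed from the GHCB hypervisor-feature bitmap. */
#define GHCB_HV_FT_SNP			(1ULL << 0)
#define GHCB_HV_FT_SNP_AP_CREATION	(1ULL << 1)

/* The mask this patch advertises. */
#define GHCB_HV_FT_SUPPORTED \
	(GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

/* True only if every required feature bit is present in the advertised mask. */
static inline bool hv_supports(uint64_t advertised, uint64_t required)
{
	return (advertised & required) == required;
}
```

Because the parameter is registered with mode 0444, it is read-only through sysfs once the module is loaded, so it has to be chosen at module load time.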
arch/x86/kvm/svm/sev.c | 7 ++++---
arch/x86/kvm/svm/svm.h | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bb7d4547df81..2c88215a111f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -57,14 +57,15 @@ module_param_named(sev, sev_enabled, bool, 0444);
/* enable/disable SEV-ES support */
static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
#else
#define sev_enabled false
#define sev_es_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

-/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
-
#define AP_RESET_HOLD_NONE 0
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 971ff4e949fd..7b14b5ef1f8c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -688,7 +688,7 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

-#define GHCB_HV_FT_SUPPORTED 0
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

extern unsigned int max_sev_asid;

--
2.25.1

2022-06-20 23:21:00

by Kalra, Ashish

Subject: [PATCH Part2 v6 48/49] *debug: warn and retry failed rmpupdates

From: Michael Roth <[email protected]>

In some cases, B0 hardware exhibits something like the following
behavior (where M < 512):

| Guest A                       | Guest B                          |
|-------------------------------|----------------------------------|
|                               | rc = rmpupdate pfn=N*512,4K,priv |
| rmpupdate pfn=N*512+M,4K,priv |                                  |
| rc = FAIL_OVERLAP             | rc = SUCCESS                     |

The FAIL_OVERLAP might possibly be the result of hardware temporarily
treating Guest B's rmpupdate for pfn=N*512 as a 2M update, causing the
subsequent update from Guest A for pfn=N*512+M to report FAIL_OVERLAP
at that particular instant. Retrying the update for N*512+M immediately
afterward seems to resolve the FAIL_OVERLAP issue reliably, however.

A similar failure has also been observed when transitioning pages back
to shared during VM destroy. In this case repeating the rmpupdate does
not always seem to resolve the failure immediately.

Both situations are much more likely to occur if THP is disabled, or
if it is enabled/disabled while guests are actively being
started/stopped.

Include some debug/error information to get a better idea of the
behavior on different hardware, and add the rmpupdate retry as a
workaround for Milan B0 testing.

Signed-off-by: Michael Roth <[email protected]>
---
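The retry workaround described above amounts to a bounded retry loop that logs on the first failure and again on a late success. Here is a user-space sketch of that pattern. `retry_update()`, `struct flaky` and `flaky_update()` are hypothetical names, and a mock callback stands in for the RMPUPDATE instruction.

```c
#include <assert.h>
#include <stdio.h>

typedef int (*update_fn)(void *ctx);

/*
 * Call op() until it returns 0 or max_tries attempts have been made.
 * Returns the last return code; *retries_out counts the failed attempts.
 */
static inline int retry_update(update_fn op, void *ctx, int max_tries,
			       int *retries_out)
{
	int retries = 0;
	int ret;

	assert(op != NULL && retries_out != NULL);

	for (;;) {
		ret = op(ctx);
		if (!ret)
			break;

		/* Log only the first failure to avoid flooding the log. */
		if (!retries)
			fprintf(stderr,
				"update failed, ret: %d, retrying (max: %d)...\n",
				ret, max_tries);

		retries++;
		if (retries >= max_tries)
			break;
	}

	if (!ret && retries > 0)
		fprintf(stderr, "update succeeded after %d retries\n", retries);

	*retries_out = retries;
	return ret;
}

/* Mock operation: fails fail_count times, then succeeds. */
struct flaky {
	int fail_count;
	int calls;
};

static inline int flaky_update(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	return f->calls <= f->fail_count ? 3 : 0;	/* 3: arbitrary failure code */
}
```

As in the patch, the bound covers the total number of attempts, so the caller still sees the final failure code if the operation never succeeds.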
arch/x86/kernel/sev.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 6640a639fffc..5ae8c9f853c8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2530,6 +2530,7 @@ static int rmpupdate(u64 pfn, struct rmpupdate *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
int ret, level, npages;
+ int retries = 0;

if (!pfn_valid(pfn))
return -EINVAL;
@@ -2552,12 +2553,26 @@ static int rmpupdate(u64 pfn, struct rmpupdate *val)
}
}

+retry:
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
: "=a"(ret)
: "a"(paddr), "c"((unsigned long)val)
: "memory", "cc");

+ if (ret) {
+ if (!retries) {
+ pr_err("rmpupdate failed, ret: %d, pfn: %llx, npages: %d, level: %d, retrying (max: %d)...\n",
+ ret, pfn, npages, level, 2 * num_present_cpus());
+ dump_stack();
+ }
+ retries++;
+ if (retries < 2 * num_present_cpus())
+ goto retry;
+ } else if (retries > 0) {
+ pr_err("rmpupdate for pfn %llx succeeded after %d retries\n", pfn, retries);
+ }
+
/*
* Restore the direct map after the page is removed from the RMP table.
*/
--
2.25.1

2022-06-20 23:21:05

by Kalra, Ashish

Subject: [PATCH Part2 v6 47/49] *fix for stale per-cpu pointer due to cond_resched during ghcb mapping

From: Michael Roth <[email protected]>

Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/svm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fced6ea423ad..f78e3b1bde0e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1352,7 +1352,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
+ struct svm_cpu_data *sd;

if (sev_es_guest(vcpu->kvm))
sev_es_unmap_ghcb(svm);
@@ -1360,6 +1360,10 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
if (svm->guest_state_loaded)
return;

+ /* sev_es_unmap_ghcb() can resched, so grab per-cpu pointer afterward. */
+ barrier();
+ sd = per_cpu(svm_data, vcpu->cpu);
+
/*
* Save additional host state that will be restored on VMEXIT (sev-es)
* or subsequent vmload of host save area.
--
2.25.1

2022-06-20 23:22:38

by Kalra, Ashish

Subject: [PATCH Part2 v6 49/49] KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb

From: Ashish Kalra <[email protected]>

Using kvm_write_guest() to sync the GHCB scratch buffer can fail when
the host mapping is 2M but the RMP entry is 4K. The page fault handling
in do_user_addr_fault() fails to split the 2M page to handle the RMP
fault because it is called here in a non-preemptible context. Instead,
use the already kernel-mapped GHCB to sync the scratch buffer when the
scratch buffer is contained within the GHCB.

Signed-off-by: Ashish Kalra <[email protected]>
---
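The idea in this commit message can be sketched in user space: record at setup time whether the scratch area lies inside the GHCB shared buffer, then at sync time either copy through the existing mapping or fall back to a guest write. All names below are hypothetical, the 2032-byte buffer size is an assumption, and `failing_guest_write()` is a mock standing in for kvm_write_guest().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHARED_BUF_SIZE 2032	/* assumed size of the GHCB shared buffer */

struct ghcb_model {
	uint8_t shared_buffer[SHARED_BUF_SIZE];
};

struct scratch_state {
	bool contained;		/* models ghcb_sa_contained */
	uint32_t offset;	/* models ghcb_sa_offset */
	uint32_t len;
};

/* Record whether [beg, beg + len) lies inside the buffer at buf_gpa. */
static inline bool scratch_classify(struct scratch_state *s, uint64_t buf_gpa,
				    uint64_t beg, uint32_t len)
{
	uint64_t buf_end = buf_gpa + SHARED_BUF_SIZE;

	s->len = len;
	s->contained = beg >= buf_gpa && beg <= buf_end &&
		       len <= buf_end - beg;
	s->offset = s->contained ? (uint32_t)(beg - buf_gpa) : 0;

	return s->contained;
}

typedef int (*guest_write_fn)(uint64_t gpa, const void *data, uint32_t len);

/* Copy through the mapped buffer when possible, else use the fallback. */
static inline int scratch_sync(struct ghcb_model *ghcb,
			       const struct scratch_state *s, const void *src,
			       uint64_t gpa, guest_write_fn fallback)
{
	if (s->contained) {
		memcpy(ghcb->shared_buffer + s->offset, src, s->len);
		return 0;
	}

	return fallback(gpa, src, s->len);
}

/* Mock fallback standing in for kvm_write_guest(); always reports failure. */
static inline int failing_guest_write(uint64_t gpa, const void *data,
				      uint32_t len)
{
	(void)gpa;
	(void)data;
	(void)len;
	return -1;
}
```

With this split, the contained case never touches the guest-write path, so it cannot hit the 2M/4K mismatch described above. Only the out-of-GHCB case can still fail, and the patch reports that failure with a rate-limited warning.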
arch/x86/kvm/svm/sev.c | 29 +++++++++++++++++++++--------
arch/x86/kvm/svm/svm.h | 2 ++
2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2c88215a111f..e1dd67e12774 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2944,6 +2944,24 @@ static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
ghcb_set_sw_exit_info_1(ghcb, svm->sev_es.ghcb_sw_exit_info_1);
ghcb_set_sw_exit_info_2(ghcb, svm->sev_es.ghcb_sw_exit_info_2);

+ /* Sync the scratch buffer area. */
+ if (svm->sev_es.ghcb_sa_sync) {
+ if (svm->sev_es.ghcb_sa_contained) {
+ memcpy(ghcb->shared_buffer + svm->sev_es.ghcb_sa_offset,
+ svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len);
+ } else {
+ int ret;
+
+ ret = kvm_write_guest(svm->vcpu.kvm,
+ svm->sev_es.ghcb_sa_gpa,
+ svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len);
+ if (ret)
+ pr_warn_ratelimited("unmap_ghcb: kvm_write_guest failed while syncing scratch area, gpa: %llx, ret: %d\n",
+ svm->sev_es.ghcb_sa_gpa, ret);
+ }
+ svm->sev_es.ghcb_sa_sync = false;
+ }
+
trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, ghcb);

svm_unmap_ghcb(svm, &map);
@@ -3156,14 +3174,6 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
if (!svm->sev_es.ghcb_in_use)
return;

- /* Sync the scratch buffer area. */
- if (svm->sev_es.ghcb_sa_sync) {
- kvm_write_guest(svm->vcpu.kvm,
- svm->sev_es.ghcb_sa_gpa,
- svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len);
- svm->sev_es.ghcb_sa_sync = false;
- }
-
sev_es_sync_to_ghcb(svm);

svm->sev_es.ghcb_in_use = false;
@@ -3229,6 +3239,8 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
scratch_gpa_beg, scratch_gpa_end);
goto e_scratch;
}
+ svm->sev_es.ghcb_sa_contained = true;
+ svm->sev_es.ghcb_sa_offset = scratch_gpa_beg - ghcb_scratch_beg;
} else {
/*
* The guest memory must be read into a kernel buffer, so
@@ -3239,6 +3251,7 @@ static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
len, GHCB_SCRATCH_AREA_LIMIT);
goto e_scratch;
}
+ svm->sev_es.ghcb_sa_contained = false;
}

if (svm->sev_es.ghcb_sa_alloc_len < len) {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7b14b5ef1f8c..2cdfc79bf2cf 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -210,6 +210,8 @@ struct vcpu_sev_es_state {
u64 ghcb_sa_gpa;
u32 ghcb_sa_alloc_len;
bool ghcb_sa_sync;
+ bool ghcb_sa_contained;
+ u32 ghcb_sa_offset;

/*
* SEV-ES support to hold the sw_exit_info return values to be
--
2.25.1

2022-06-21 15:53:42

by Peter Gonda

Subject: Re: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

On Mon, Jun 20, 2022 at 5:02 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
> track the owner of each page of memory. Pages of memory can be owned by
> the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
> section 15.36.3 for more detail on RMP.
>
> The RMP table is used to enforce access control to memory. The table itself
> is not directly writable by the software. New CPU instructions (RMPUPDATE,
> PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.
>
> Based on the platform configuration, the BIOS reserves the memory used
> for the RMP table. The start and end address of the RMP table must be
> queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
> RMP_END are not set then disable the SEV-SNP feature.
>
> The SEV-SNP feature is enabled only after the RMP table is successfully
> initialized.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 6 +
> arch/x86/kernel/sev.c | 144 +++++++++++++++++++++++
> 3 files changed, 157 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 36369e76cc63..c1be3091a383 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -68,6 +68,12 @@
> # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
> #endif
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +# define DISABLE_SEV_SNP 0
> +#else
> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -91,7 +97,7 @@
> DISABLE_ENQCMD)
> #define DISABLED_MASK17 0
> #define DISABLED_MASK18 0
> -#define DISABLED_MASK19 0
> +#define DISABLED_MASK19 (DISABLE_SEV_SNP)
> #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
>
> #endif /* _ASM_X86_DISABLED_FEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 9e2e7185fc1d..57a8280e283a 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -507,6 +507,8 @@
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE 0xc0010132
> +#define MSR_AMD64_RMP_END 0xc0010133
>
> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>
> @@ -581,6 +583,10 @@
> #define MSR_AMD64_SYSCFG 0xc0010010
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
> +#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> #define MSR_K8_INT_PENDING_MSG 0xc0010055
> /* C1E active bits in int pending message */
> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index f01f4550e2c6..3a233b5d47c5 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -22,6 +22,8 @@
> #include <linux/efi.h>
> #include <linux/platform_device.h>
> #include <linux/io.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -38,6 +40,7 @@
> #include <asm/apic.h>
> #include <asm/cpuid.h>
> #include <asm/cmdline.h>
> +#include <asm/iommu.h>
>
> #define DR7_RESET_VALUE 0x400
>
> @@ -57,6 +60,12 @@
> #define AP_INIT_CR0_DEFAULT 0x60000010
> #define AP_INIT_MXCSR_DEFAULT 0x1f80
>
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the
> + * bookkeeping, the range need to be added during the RMP entry lookup.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +
> /* For early boot hypervisor communication in SEV-ES enabled guests */
> static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>
> @@ -69,6 +78,10 @@ static struct ghcb *boot_ghcb __section(".data");
> /* Bitmap of SEV features supported by the hypervisor */
> static u64 sev_hv_features __ro_after_init;
>
> +static unsigned long rmptable_start __ro_after_init;
> +static unsigned long rmptable_end __ro_after_init;
> +
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -2218,3 +2231,134 @@ static int __init snp_init_platform_device(void)
> return 0;
> }
> device_initcall(snp_init_platform_device);
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "SEV-SNP: " fmt
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_SNP_EN;
> + val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> + __snp_enable(smp_processor_id());
> +}
> +
> +static bool get_rmptable_info(u64 *start, u64 *len)
> +{
> + u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
> +
> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> + if (!rmp_base || !rmp_end) {
> + pr_info("Memory for the RMP table has not been reserved by BIOS\n");
> + return false;
> + }
> +
> + rmp_sz = rmp_end - rmp_base + 1;
> +
> + /*
> + * Calculate the amount the memory that must be reserved by the BIOS to
> + * address the full system RAM. The reserved memory should also cover the
> + * RMP table itself.
> + *
> + * See PPR Family 19h Model 01h, Revision B1 section 2.1.4.2 for more
> + * information on memory requirement.
> + */
> + nr_pages = totalram_pages();
> + calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + nr_pages) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> + if (calc_rmp_sz > rmp_sz) {
> + pr_info("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> + calc_rmp_sz, rmp_sz);
> + return false;
> + }
> +
> + *start = rmp_base;
> + *len = rmp_sz;
> +
> + pr_info("RMP table physical address 0x%016llx - 0x%016llx\n", rmp_base, rmp_end);
> +
> + return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)
> +{
> + u64 rmp_base, sz;
> + void *start;
> + u64 val;
> +
> + if (!get_rmptable_info(&rmp_base, &sz))
> + return 1;
> +
> + start = memremap(rmp_base, sz, MEMREMAP_WB);
> + if (!start) {
> + pr_err("Failed to map RMP table 0x%llx+0x%llx\n", rmp_base, sz);
> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled, this can happen if we are coming from
> + * kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */
> + memset(start, 0, sz);
> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* Enable SNP on all CPUs. */
> + on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> + rmptable_start = (unsigned long)start;
> + rmptable_end = rmptable_start + sz;

Since in get_rmptable_info() `rmp_sz = rmp_end - rmp_base + 1;` should
this be `rmptable_end = rmptable_start + sz - 1;`?
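
To make the off-by-one concrete, here is a small userspace sketch (not kernel code). The helper names `range_len()`, `range_end_inclusive()` and `range_end_exclusive()` are made up for illustration; only the arithmetic mirrors the patch, on the assumption that the RMP_BASE/RMP_END pair describes an inclusive range:

```c
#include <assert.h>
#include <stdint.h>

/* Length of an inclusive [base, end] range, as in get_rmptable_info(). */
static uint64_t range_len(uint64_t base, uint64_t end)
{
	assert(end >= base);
	return end - base + 1;
}

/* Last valid byte of a mapping that starts at 'start' and spans 'len' bytes. */
static uint64_t range_end_inclusive(uint64_t start, uint64_t len)
{
	assert(len > 0);
	return start + len - 1;
}

/* One past the last valid byte; this is what "start + sz" computes. */
static uint64_t range_end_exclusive(uint64_t start, uint64_t len)
{
	return start + len;
}
```

Either convention works as long as later bounds checks match it (`<= end` for an inclusive end, `< end` for an exclusive one).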

> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + if (!iommu_sev_snp_supported())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return 1;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because before enabling
> + * the SNP feature we need to ensure that IOMMU supports the SEV-SNP feature.
> + * The iommu_sev_snp_support() is used for checking the feature, and it is
> + * available after subsys_initcall().
> + */
> +fs_initcall(snp_rmptable_init);
> --
> 2.25.1
>

2022-06-21 17:40:35

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

[AMD Official Use Only - General]

Hello Dave,

>> /*
>> * The RMP entry format is not architectural. The format is defined
>> in PPR @@ -126,6 +128,15 @@ struct snp_guest_platform_data {
>> u64 secrets_gpa;
>> };
>>
>> +struct rmpupdate {
>> + u64 gpa;
>> + u8 assigned;
>> + u8 pagesize;
>> + u8 immutable;
>> + u8 rsvd;
>> + u32 asid;
>> +} __packed;

>I see above it says the RMP entry format isn't architectural; is this 'rmpupdate' structure? If not how is this going to get handled when we have a couple of SNP capable CPUs with different layouts?

Architectural implies that it is defined in the APM and shouldn't change in a way that breaks backward compatibility.
I think the wording here should probably be architecture independent or, more precisely, platform independent.

Thanks,
Ashish

2022-06-21 18:08:03

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

[Public]

Hello Peter,

>> +static __init int __snp_rmptable_init(void) {
>> + u64 rmp_base, sz;
>> + void *start;
>> + u64 val;
>> +
>> + if (!get_rmptable_info(&rmp_base, &sz))
>> + return 1;
>> +
>> + start = memremap(rmp_base, sz, MEMREMAP_WB);
>> + if (!start) {
>> + pr_err("Failed to map RMP table 0x%llx+0x%llx\n", rmp_base, sz);
>> + return 1;
>> + }
>> +
>> + /*
>> + * Check if SEV-SNP is already enabled, this can happen if we are coming from
>> + * kexec boot.
>> + */
>> + rdmsrl(MSR_AMD64_SYSCFG, val);
>> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
>> + goto skip_enable;
>> +
>> + /* Initialize the RMP table to zero */
>> + memset(start, 0, sz);
>> +
>> + /* Flush the caches to ensure that data is written before SNP is enabled. */
>> + wbinvd_on_all_cpus();
>> +
>> + /* Enable SNP on all CPUs. */
>> + on_each_cpu(snp_enable, NULL, 1);
>> +
>> +skip_enable:
>> + rmptable_start = (unsigned long)start;
>> + rmptable_end = rmptable_start + sz;

> Since in get_rmptable_info() `rmp_sz = rmp_end - rmp_base + 1;` should this be `rmptable_end = rmptable_start + sz - 1;`?

Yes, it should be.

Thanks,
Ashish

2022-06-21 18:13:33

by Peter Gonda

Subject: Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

On Mon, Jun 20, 2022 at 5:05 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The behavior and requirement for the SEV-legacy command is altered when
> the SNP firmware is in the INIT state. See SEV-SNP firmware specification
> for more details.
>
> Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
> when SNP is enabled to satify new requirements for the SNP. Continue

satisfy

> allocating a 1mb region for !SNP configuration.
>
> While at it, provide API that can be used by others to allocate a page
> that can be used by the firmware. The immediate user for this API will
> be the KVM driver. The KVM driver to need to allocate a firmware context
> page during the guest creation. The context page need to be updated
> by the firmware. See the SEV-SNP specification for further details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 173 +++++++++++++++++++++++++++++++++--
> include/linux/psp-sev.h | 11 +++
> 2 files changed, 178 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 35d76333e120..0dbd99f29b25 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -79,6 +79,14 @@ static void *sev_es_tmr;
> #define NV_LENGTH (32 * 1024)
> static void *sev_init_ex_buffer;
>
> +/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
> +#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
> +
> +static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;

Why not keep all this TMR stuff together near the SEV_ES_TMR_SIZE define?

> +
> +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
> +static int sev_do_cmd(int cmd, void *data, int *psp_ret);
> +
> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -177,11 +185,161 @@ static int sev_cmd_buffer_len(int cmd)
> return 0;
> }
>
> +static void snp_leak_pages(unsigned long pfn, unsigned int npages)
> +{
> + WARN(1, "psc failed, pfn 0x%lx pages %d (leaking)\n", pfn, npages);
> + while (npages--) {
> + memory_failure(pfn, 0);
> + dump_rmpentry(pfn);
> + pfn++;
> + }
> +}
> +
> +static int snp_reclaim_pages(unsigned long pfn, unsigned int npages, bool locked)
> +{
> + struct sev_data_snp_page_reclaim data;
> + int ret, err, i, n = 0;
> +
> + for (i = 0; i < npages; i++) {

What about setting |n| here too, along with the other increments?

for (i = 0, n = 0; i < npages; i++, n++, pfn++)

> + memset(&data, 0, sizeof(data));
> + data.paddr = pfn << PAGE_SHIFT;
> +
> + if (locked)
> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> + else
> + ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);

Can we change `sev_cmd_mutex` to some sort of nesting lock type? That
could clean up this if (locked) code.

> + if (ret)
> + goto cleanup;
> +
> + ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (ret)
> + goto cleanup;
> +
> + pfn++;
> + n++;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * If failed to reclaim the page then page is no longer safe to
> + * be released, leak it.
> + */
> + snp_leak_pages(pfn, npages - n);
> + return ret;
> +}
> +
> +static inline int rmp_make_firmware(unsigned long pfn, int level)
> +{
> + return rmp_make_private(pfn, 0, level, 0, true);
> +}
> +
> +static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
> + bool need_reclaim)

This function can do a lot, and when I read the call sites it's hard to
see what it's doing, since a combination of arguments tells us which
behavior is happening, and some combinations are not valid (ex: to_fw
== true and need_reclaim == true is an invalid argument combination).
Also this for loop over |npages| is duplicated from
snp_reclaim_pages(). One possible improvement: in the current
snp_reclaim_pages(), if we fail to reclaim a page we assume we cannot
reclaim the following pages either, which may cause us to
snp_leak_pages() more pages than we actually need to.

What about something like this?

static int snp_leak_page(u64 pfn, enum pg_level level)
{
	memory_failure(pfn, 0);
	dump_rmpentry(pfn);

	return 0;
}

static int snp_reclaim_page(u64 pfn, enum pg_level level)
{
	struct sev_data_snp_page_reclaim data;
	int ret, err;

	memset(&data, 0, sizeof(data));
	data.paddr = pfn << PAGE_SHIFT;

	ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
	if (ret)
		goto cleanup;

	ret = rmp_make_shared(pfn, level);
	if (ret)
		goto cleanup;

	return 0;

cleanup:
	snp_leak_page(pfn, level);
	return ret;
}

typedef int (*rmp_state_change_func)(u64 pfn, enum pg_level level);

static int snp_set_rmp_state(unsigned long paddr, unsigned int npages,
		rmp_state_change_func state_change, rmp_state_change_func cleanup)
{
	/* C-bit may be set in the paddr */
	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
	int ret, i;

	for (i = 0; i < npages; i++, pfn++) {
		ret = state_change(pfn, PG_LEVEL_4K);
		if (ret)
			goto cleanup;
	}

	return 0;

cleanup:
	/* Undo only the pages that were changed before the failing one */
	for (i--, pfn--; i >= 0; i--, pfn--)
		cleanup(pfn, PG_LEVEL_4K);

	return ret;
}

Then inside of __snp_alloc_firmware_pages():

snp_set_rmp_state(paddr, npages, rmp_make_firmware, snp_reclaim_page);

And inside of __snp_free_firmware_pages():

snp_set_rmp_state(paddr, npages, snp_reclaim_page, snp_leak_page);

Just a suggestion, feel free to ignore. The readability comment could
be addressed much less invasively by just making separate functions
for each valid combination of arguments here, like
snp_set_rmp_fw_state(), snp_set_rmp_shared_state(),
snp_set_rmp_release_state() or something.
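
To illustrate the unwind behavior of the callback-based variant, here is a userspace simulation. It is only a sketch: the page array, `demo_fail_pfn`, and all `demo_*` names are hypothetical stand-ins, and no real RMP or firmware call is involved. It shows that when a state change fails partway through, only the pages already changed are rolled back:

```c
#include <assert.h>
#include <stddef.h>

enum demo_page_state { DEMO_SHARED, DEMO_FIRMWARE };

#define NR_DEMO_PAGES 8
static enum demo_page_state demo_pages[NR_DEMO_PAGES];
static long demo_fail_pfn = -1;	/* pfn whose change should fail, -1 for none */

typedef int (*demo_state_change_fn)(unsigned long pfn);

/* Put every page back to shared and choose which pfn will fail. */
static void demo_reset(long fail_pfn)
{
	size_t i;

	for (i = 0; i < NR_DEMO_PAGES; i++)
		demo_pages[i] = DEMO_SHARED;
	demo_fail_pfn = fail_pfn;
}

static int demo_make_firmware(unsigned long pfn)
{
	if ((long)pfn == demo_fail_pfn)
		return -1;
	demo_pages[pfn] = DEMO_FIRMWARE;
	return 0;
}

static int demo_reclaim(unsigned long pfn)
{
	demo_pages[pfn] = DEMO_SHARED;
	return 0;
}

static int demo_set_state(unsigned long pfn, unsigned int npages,
			  demo_state_change_fn change, demo_state_change_fn undo)
{
	unsigned int i;
	int ret;

	assert(pfn + npages <= NR_DEMO_PAGES);

	for (i = 0; i < npages; i++) {
		ret = change(pfn + i);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* Roll back pages [pfn, pfn + i); page pfn + i never changed state. */
	while (i--)
		undo(pfn + i);
	return ret;
}
```

With `demo_reset(3)` followed by `demo_set_state(0, 6, demo_make_firmware, demo_reclaim)`, pages 0 to 2 are converted and then reclaimed again, so every page ends up shared.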

> +{
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT; /* Cbit maybe set in the paddr */
> + int rc, n = 0, i;
> +
> + for (i = 0; i < npages; i++) {
> + if (to_fw)
> + rc = rmp_make_firmware(pfn, PG_LEVEL_4K);
> + else
> + rc = need_reclaim ? snp_reclaim_pages(pfn, 1, locked) :
> + rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (rc)
> + goto cleanup;
> +
> + pfn++;
> + n++;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /* Try unrolling the firmware state changes */
> + if (to_fw) {
> + /*
> + * Reclaim the pages which were already changed to the
> + * firmware state.
> + */
> + snp_reclaim_pages(paddr >> PAGE_SHIFT, n, locked);
> +
> + return rc;
> + }
> +
> + /*
> + * If failed to change the page state to shared, then its not safe
> + * to release the page back to the system, leak it.
> + */
> + snp_leak_pages(pfn, npages - n);
> +
> + return rc;
> +}
> +
> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> +{
> + unsigned long npages = 1ul << order, paddr;
> + struct sev_device *sev;
> + struct page *page;
> +
> + if (!psp_master || !psp_master->sev_data)
> + return NULL;
> +
> + page = alloc_pages(gfp_mask, order);
> + if (!page)
> + return NULL;
> +
> + /* If SEV-SNP is initialized then add the page in RMP table. */
> + sev = psp_master->sev_data;
> + if (!sev->snp_inited)
> + return page;
> +
> + paddr = __pa((unsigned long)page_address(page));
> + if (snp_set_rmp_state(paddr, npages, true, locked, false))
> + return NULL;

So what about the case where snp_set_rmp_state() fails but we were
able to reclaim all the pages? Should we be able to signal that to
callers so that we could free |page| here? But given this is an error
path already maybe we can optimize this in a follow up series.

> +
> + return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> +{
> + struct page *page;
> +
> + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> +
> + return page ? page_address(page) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> +
> +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
> +{
> + unsigned long paddr, npages = 1ul << order;
> +
> + if (!page)
> + return;
> +
> + paddr = __pa((unsigned long)page_address(page));
> + if (snp_set_rmp_state(paddr, npages, false, locked, true))
> + return;

Here we may be able to free some of |page| depending on where inside
of snp_set_rmp_state() we failed. But again, given this is an error
path already, maybe we can optimize this in a follow-up series.



> +
> + __free_pages(page, order);
> +}
> +
> +void snp_free_firmware_page(void *addr)
> +{
> + if (!addr)
> + return;
> +
> + __snp_free_firmware_pages(virt_to_page(addr), 0, false);
> +}
> +EXPORT_SYMBOL(snp_free_firmware_page);
> +
> static void *sev_fw_alloc(unsigned long len)
> {
> struct page *page;
>
> - page = alloc_pages(GFP_KERNEL, get_order(len));
> + page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
> if (!page)
> return NULL;
>
> @@ -393,7 +551,7 @@ static int __sev_init_locked(int *error)
> data.tmr_address = __pa(sev_es_tmr);
>
> data.flags |= SEV_INIT_FLAGS_SEV_ES;
> - data.tmr_len = SEV_ES_TMR_SIZE;
> + data.tmr_len = sev_es_tmr_size;
> }
>
> return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
> @@ -421,7 +579,7 @@ static int __sev_init_ex_locked(int *error)
> data.tmr_address = __pa(sev_es_tmr);
>
> data.flags |= SEV_INIT_FLAGS_SEV_ES;
> - data.tmr_len = SEV_ES_TMR_SIZE;
> + data.tmr_len = sev_es_tmr_size;
> }
>
> return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
> @@ -818,6 +976,8 @@ static int __sev_snp_init_locked(int *error)
> sev->snp_inited = true;
> dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
>
> + sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
> +
> return rc;
> }
>
> @@ -1341,8 +1501,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
> /* The TMR area was encrypted, flush it from the cache */
> wbinvd_on_all_cpus();
>
> - free_pages((unsigned long)sev_es_tmr,
> - get_order(SEV_ES_TMR_SIZE));
> + __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
> + get_order(sev_es_tmr_size),
> + false);
> sev_es_tmr = NULL;
> }
>
> @@ -1430,7 +1591,7 @@ void sev_pci_init(void)
> }
>
> /* Obtain the TMR memory area for SEV-ES use */
> - sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
> + sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
> if (!sev_es_tmr)
> dev_warn(sev->dev,
> "SEV: TMR allocation failed, SEV-ES support unavailable\n");
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 9f921d221b75..a3bb792bb842 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -12,6 +12,8 @@
> #ifndef __PSP_SEV_H__
> #define __PSP_SEV_H__
>
> +#include <linux/sev.h>
> +
> #include <uapi/linux/psp-sev.h>
>
> #ifdef CONFIG_X86
> @@ -940,6 +942,8 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
> int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
>
> void *psp_copy_user_blob(u64 uaddr, u32 len);
> +void *snp_alloc_firmware_page(gfp_t mask);
> +void snp_free_firmware_page(void *addr);
>
> #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
> @@ -981,6 +985,13 @@ static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *erro
> return -ENODEV;
> }
>
> +static inline void *snp_alloc_firmware_page(gfp_t mask)
> +{
> + return NULL;
> +}
> +
> +static inline void snp_free_firmware_page(void *addr) { }
> +
> #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
> #endif /* __PSP_SEV_H__ */
> --
> 2.25.1
>

2022-06-21 20:18:32

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

[Public]

Hello Peter,

>> +static int snp_reclaim_pages(unsigned long pfn, unsigned int npages,
>> +bool locked) {
>> + struct sev_data_snp_page_reclaim data;
>> + int ret, err, i, n = 0;
>> +
>> + for (i = 0; i < npages; i++) {

>What about setting |n| here too, also the other increments.

>for (i = 0, n = 0; i < npages; i++, n++, pfn++)

Yes that is simpler.

>> + memset(&data, 0, sizeof(data));
>> + data.paddr = pfn << PAGE_SHIFT;
>> +
>> + if (locked)
>> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
>> + else
>> + ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM,
>> + &data, &err);

> Can we change `sev_cmd_mutex` to some sort of nesting lock type? That could clean up this if (locked) code.

> +static inline int rmp_make_firmware(unsigned long pfn, int level) {
> + return rmp_make_private(pfn, 0, level, 0, true); }
> +
> +static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
> + bool need_reclaim)

>This function can do a lot and when I read the call sites its hard to see what its doing since we have a combination of arguments which tell us what behavior is happening, some of which are not valid (ex: to_fw == true and need_reclaim == true is an invalid argument combination).

to_fw is used to make a firmware page and need_reclaim is for freeing the firmware page, so they are going to be mutually exclusive.

It actually maps quite logically onto the callers:
snp_alloc_firmware_pages will call with to_fw = true and need_reclaim = false
and snp_free_firmware_pages will do the opposite, to_fw = false and need_reclaim = true.

That seems straightforward to look at.
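
For comparison, the three valid flag combinations in this thread could also be named explicitly. The sketch below is an illustration only: `enum demo_rmp_op` and `demo_flags_to_op()` are hypothetical names that appear in neither the patch nor the review, and the code runs in userspace:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical replacement for the (to_fw, need_reclaim) flag pair. */
enum demo_rmp_op {
	DEMO_RMP_TO_FIRMWARE,	/* to_fw = true,  need_reclaim = false */
	DEMO_RMP_TO_SHARED,	/* to_fw = false, need_reclaim = false */
	DEMO_RMP_RECLAIM,	/* to_fw = false, need_reclaim = true  */
};

/* Map the flag pair to an operation; returns false for the invalid pair. */
static bool demo_flags_to_op(bool to_fw, bool need_reclaim,
			     enum demo_rmp_op *op)
{
	assert(op);

	if (to_fw && need_reclaim)
		return false;

	if (to_fw)
		*op = DEMO_RMP_TO_FIRMWARE;
	else
		*op = need_reclaim ? DEMO_RMP_RECLAIM : DEMO_RMP_TO_SHARED;

	return true;
}
```

With a single operation argument the invalid combination cannot be expressed at a call site at all, which is the same goal as having one function per valid combination.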

>Also this for loop over |npages| is duplicated from snp_reclaim_pages(). One improvement here is that on the current
>snp_reclaim_pages() if we fail to reclaim a page we assume we cannot reclaim the next pages, this may cause us to snp_leak_pages() more pages than we actually need too.

Yes that is true.

>What about something like this?

>static snp_leak_page(u64 pfn, enum pg_level level) {
> memory_failure(pfn, 0);
> dump_rmpentry(pfn);
>}

>static int snp_reclaim_page(u64 pfn, enum pg_level level) {
> int ret;
> struct sev_data_snp_page_reclaim data;

> ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> if (ret)
> goto cleanup;

> ret = rmp_make_shared(pfn, level);
> if (ret)
> goto cleanup;

> return 0;

>cleanup:
> snp_leak_page(pfn, level)
>}

>typedef int (*rmp_state_change_func) (u64 pfn, enum pg_level level);

>static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, rmp_state_change_func state_change, rmp_state_change_func cleanup) {
> struct sev_data_snp_page_reclaim data;
> int ret, err, i, n = 0;

> for (i = 0, n = 0; i < npages; i++, n++, pfn++) {
> ret = state_change(pfn, PG_LEVEL_4K)
> if (ret)
> goto cleanup;
> }

> return 0;

> cleanup:
> for (; i>= 0; i--, n--, pfn--) {
> cleanup(pfn, PG_LEVEL_4K);
> }

> return ret;
>}

>Then inside of __snp_alloc_firmware_pages():

>snp_set_rmp_state(paddr, npages, rmp_make_firmware, snp_reclaim_page);

>And inside of __snp_free_firmware_pages():

>snp_set_rmp_state(paddr, npages, snp_reclaim_page, snp_leak_page);

>Just a suggestion feel free to ignore. The readability comment could be addressed much less invasively by just making separate functions for each valid combination of arguments here. Like snp_set_rmp_fw_state(), snp_set_rmp_shared_state(),
>snp_set_rmp_release_state() or something.

>> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int
>> +order, bool locked) {
>> + unsigned long npages = 1ul << order, paddr;
>> + struct sev_device *sev;
>> + struct page *page;
>> +
>> + if (!psp_master || !psp_master->sev_data)
>> + return NULL;
>> +
>> + page = alloc_pages(gfp_mask, order);
>> + if (!page)
>> + return NULL;
>> +
>> + /* If SEV-SNP is initialized then add the page in RMP table. */
>> + sev = psp_master->sev_data;
>> + if (!sev->snp_inited)
>> + return page;
>> +
>> + paddr = __pa((unsigned long)page_address(page));
>> + if (snp_set_rmp_state(paddr, npages, true, locked, false))
>> + return NULL;

>So what about the case where snp_set_rmp_state() fails but we were able to reclaim all the pages? Should we be able to signal that to callers so that we could free |page| here? But given this is an error path already maybe we can optimize this in a follow up series.

Yes, we should actually tie in to snp_reclaim_pages() success or failure here in the case we were able to successfully unroll some or all of the firmware state change.

> +
> + return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask) {
> + struct page *page;
> +
> + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> +
> + return page ? page_address(page) : NULL; }
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> +
> +static void __snp_free_firmware_pages(struct page *page, int order,
> +bool locked) {
> + unsigned long paddr, npages = 1ul << order;
> +
> + if (!page)
> + return;
> +
> + paddr = __pa((unsigned long)page_address(page));
> + if (snp_set_rmp_state(paddr, npages, false, locked, true))
> + return;

> Here we may be able to free some of |page| depending how where inside of snp_set_rmp_state() we failed. But again given this is an error path already maybe we can optimize this in a follow up series.

Yes, we probably should be able to free some of the page(s) depending on how many page(s) got reclaimed in snp_set_rmp_state().
But these reclamation failures should not be very common, so any failure is indicative of a bigger issue. If a single page fails to reclaim, the same failure may well happen with all the subsequent
pages, so I would rather follow a simple recovery procedure than handle a more complex recovery where one chunk of pages is reclaimed and another chunk is not.
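
The trade-off can be sketched in userspace as follows. This is a simulation under stated assumptions: `demo_reclaim_fails[]` is a hypothetical per-page failure table, and none of the `demo_*` helpers exist in the driver. Every page is attempted, failures are counted as leaked, and the whole allocation is freed only if everything was reclaimed:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NPAGES 4

/* Hypothetical per-page outcome: true means the page cannot be reclaimed. */
static bool demo_reclaim_fails[DEMO_NPAGES];
static unsigned int demo_leaked;

static int demo_reclaim_one(unsigned int idx)
{
	return demo_reclaim_fails[idx] ? -1 : 0;
}

/*
 * Try every page instead of stopping at the first failure. Returns the
 * number of pages reclaimed; pages that fail are counted as leaked.
 */
static unsigned int demo_reclaim_all(unsigned int npages)
{
	unsigned int i, reclaimed = 0;

	assert(npages <= DEMO_NPAGES);

	for (i = 0; i < npages; i++) {
		if (demo_reclaim_one(i))
			demo_leaked++;
		else
			reclaimed++;
	}

	return reclaimed;
}

/* Simple policy: free the allocation only if every page came back. */
static bool demo_can_free(unsigned int npages)
{
	return demo_reclaim_all(npages) == npages;
}
```

Returning the reclaimed count keeps the simple all-or-nothing free policy available to the caller while leaving room for finer-grained recovery later.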

Thanks,
Ashish



2022-06-21 21:44:34

by Peter Gonda

Subject: Re: [PATCH Part2 v6 13/49] crypto:ccp: Provide APIs to issue SEV-SNP commands

On Mon, Jun 20, 2022 at 5:05 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> Provide the APIs for the hypervisor to manage an SEV-SNP guest. The
> commands for SEV-SNP is defined in the SEV-SNP firmware specification.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 24 ++++++++++++
> include/linux/psp-sev.h | 73 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 97 insertions(+)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index f1173221d0b9..35d76333e120 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1205,6 +1205,30 @@ int sev_guest_df_flush(int *error)
> }
> EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>
> +int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error)
> +{
> + return sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, data, error);
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_decommission);
> +
> +int snp_guest_df_flush(int *error)
> +{
> + return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_df_flush);

Why not instead change sev_guest_df_flush() to be SNP aware? That way
callers get the right behavior without having to know if SNP is
enabled or not.

int sev_guest_df_flush(int *error)
{
	struct sev_device *sev;

	if (!psp_master || !psp_master->sev_data)
		return -EINVAL;

	/* sev_data is an untyped pointer, so go through a typed local */
	sev = psp_master->sev_data;
	if (sev->snp_inited)
		return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);

	return sev_do_cmd(SEV_CMD_DF_FLUSH, NULL, error);
}

> +int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
> +{
> + return sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, data, error);
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_page_reclaim);
> +
> +int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
> +{
> + return sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, data, error);
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
> +
> static void sev_exit(struct kref *ref)
> {
> misc_deregister(&misc_dev->misc);
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index ef4d42e8c96e..9f921d221b75 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -881,6 +881,64 @@ int sev_guest_df_flush(int *error);
> */
> int sev_guest_decommission(struct sev_data_decommission *data, int *error);
>
> +/**
> + * snp_guest_df_flush - perform SNP DF_FLUSH command
> + *
> + * @sev_ret: sev command return code
> + *
> + * Returns:
> + * 0 if the sev successfully processed the command
> + * -%ENODEV if the sev device is not available
> + * -%ENOTSUPP if the sev does not support SEV
> + * -%ETIMEDOUT if the sev command timed out
> + * -%EIO if the sev returned a non-zero return code
> + */
> +int snp_guest_df_flush(int *error);
> +
> +/**
> + * snp_guest_decommission - perform SNP_DECOMMISSION command
> + *
> + * @decommission: sev_data_decommission structure to be processed
> + * @sev_ret: sev command return code
> + *
> + * Returns:
> + * 0 if the sev successfully processed the command
> + * -%ENODEV if the sev device is not available
> + * -%ENOTSUPP if the sev does not support SEV
> + * -%ETIMEDOUT if the sev command timed out
> + * -%EIO if the sev returned a non-zero return code
> + */
> +int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error);
> +
> +/**
> + * snp_guest_page_reclaim - perform SNP_PAGE_RECLAIM command
> + *
> + * @decommission: sev_snp_page_reclaim structure to be processed
> + * @sev_ret: sev command return code
> + *
> + * Returns:
> + * 0 if the sev successfully processed the command
> + * -%ENODEV if the sev device is not available
> + * -%ENOTSUPP if the sev does not support SEV
> + * -%ETIMEDOUT if the sev command timed out
> + * -%EIO if the sev returned a non-zero return code
> + */
> +int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
> +
> +/**
> + * snp_guest_dbg_decrypt - perform SEV SNP_DBG_DECRYPT command
> + *
> + * @sev_ret: sev command return code
> + *
> + * Returns:
> + * 0 if the sev successfully processed the command
> + * -%ENODEV if the sev device is not available
> + * -%ENOTSUPP if the sev does not support SEV
> + * -%ETIMEDOUT if the sev command timed out
> + * -%EIO if the sev returned a non-zero return code
> + */
> +int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
> +
> void *psp_copy_user_blob(u64 uaddr, u32 len);
>
> #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
> @@ -908,6 +966,21 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
>
> static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
>
> +static inline int
> +snp_guest_decommission(struct sev_data_snp_decommission *data, int *error) { return -ENODEV; }
> +
> +static inline int snp_guest_df_flush(int *error) { return -ENODEV; }
> +
> +static inline int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
> +{
> + return -ENODEV;
> +}
> +
> #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
> #endif /* __PSP_SEV_H__ */
> --
> 2.25.1
>

2022-06-21 22:18:46

by Peter Gonda

Subject: Re: [PATCH Part2 v6 17/49] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

On Mon, Jun 20, 2022 at 5:06 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
> system-wide configuration value for SNP guests. The information includes
> the TCB version string to be reported in guest attestation reports.
>
> Version 2 of the GHCB specification adds an NAE (SNP extended guest
> request) that a guest can use to query the reports that include additional
> certificates.
>
> In both cases, userspace provided additional data is included in the
> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
> command to give the certificate blob and the reported TCB version string
> at once. Note that the specification defines certificate blob with a
> specific GUID format; the userspace is responsible for building the
> proper certificate blob. The ioctl treats it an opaque blob.
>
> While it is not defined in the spec, but let's add SNP_GET_EXT_CONFIG
> command that can be used to obtain the data programmed through the
> SNP_SET_EXT_CONFIG.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> Documentation/virt/coco/sevguest.rst | 27 +++++++
> drivers/crypto/ccp/sev-dev.c | 115 +++++++++++++++++++++++++++
> drivers/crypto/ccp/sev-dev.h | 3 +
> include/uapi/linux/psp-sev.h | 17 ++++
> 4 files changed, 162 insertions(+)
>
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> index 11ea67c944df..3014de47e4ce 100644
> --- a/Documentation/virt/coco/sevguest.rst
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
> status includes API major, minor version and more. See the SEV-SNP
> specification for further details.
>
> +2.5 SNP_SET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar to
> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
> +command also accepts an additional certificate blob defined in the GHCB
> +specification.
> +
> +If the certs_address is zero, then previous certificate blob will deleted.

... then the previous certificate blob will be deleted.

> +For more information on the certificate blob layout, see the GHCB spec
> +(extended guest request message).
> +
> +2.6 SNP_GET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to query the system-wide configuration set
> +through the SNP_SET_EXT_CONFIG.
> +
> 3. SEV-SNP CPUID Enforcement
> ============================
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index b9b6fab31a82..97b479d5aa86 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1312,6 +1312,10 @@ static int __sev_snp_shutdown_locked(int *error)
> if (!sev->snp_inited)
> return 0;
>
> + /* Free the memory used for caching the certificate data */
> + kfree(sev->snp_certs_data);
> + sev->snp_certs_data = NULL;
> +
> /* SHUTDOWN requires the DF_FLUSH */
> wbinvd_on_all_cpus();
> __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
> @@ -1616,6 +1620,111 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
> return ret;
> }
>
> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
> +{
> + struct sev_device *sev = psp_master->sev_data;
> + struct sev_user_data_ext_snp_config input;

Let's memset |input| to zero to avoid leaking kernel memory; see
"crypto: ccp - Use kzalloc for sev ioctl interfaces to prevent kernel
memory leak"
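
To illustrate the concern, here is a minimal userspace sketch. It assumes a
stand-in struct with the same field layout as the one in this patch;
fill_reply() and tail_is_zero() are hypothetical helpers, not driver code.
Field assignments never touch compiler-inserted padding, so only the
memset() guarantees that those bytes are zero before the struct is copied
out byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in mirroring the fields of sev_user_data_ext_snp_config. */
struct ext_snp_config {
	uint64_t config_address;
	uint64_t certs_address;
	uint32_t certs_len;
	/* on common 64-bit ABIs the compiler adds 4 bytes of tail padding */
};

/*
 * Fill a reply field by field. The leading memset() clears every byte,
 * including padding that no field assignment below ever writes.
 */
static void fill_reply(struct ext_snp_config *out, uint32_t certs_len)
{
	assert(out);
	memset(out, 0, sizeof(*out));
	out->certs_len = certs_len;
}

/* Hypothetical checker: returns 1 if every byte after certs_len is zero. */
static int tail_is_zero(const struct ext_snp_config *cfg)
{
	const unsigned char *p = (const unsigned char *)cfg;
	size_t i = offsetof(struct ext_snp_config, certs_len) +
		   sizeof(cfg->certs_len);

	for (; i < sizeof(*cfg); i++)
		if (p[i])
			return 0;
	return 1;
}
```

In the driver the equivalent would simply be a memset() (or "= {}"
initializer plus memset for padding) on the stack variable before use.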

> + int ret;
> +
> + if (!sev->snp_inited || !argp->data)
> + return -EINVAL;
> +
> + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> + return -EFAULT;
> +
> + /* Copy the TCB version programmed through the SET_CONFIG to userspace */
> + if (input.config_address) {
> + if (copy_to_user((void * __user)input.config_address,
> + &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
> + return -EFAULT;
> + }
> +
> + /* Copy the extended certs programmed through the SNP_SET_CONFIG */
> + if (input.certs_address && sev->snp_certs_data) {
> + if (input.certs_len < sev->snp_certs_len) {
> + /* Return the certs length to userspace */
> + input.certs_len = sev->snp_certs_len;
> +
> + ret = -ENOSR;
> + goto e_done;
> + }
> +
> + if (copy_to_user((void * __user)input.certs_address,
> + sev->snp_certs_data, sev->snp_certs_len))
> + return -EFAULT;
> + }
> +
> + ret = 0;
> +
> +e_done:
> + if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
> + ret = -EFAULT;
> +
> + return ret;
> +}
> +
> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
> +{
> + struct sev_device *sev = psp_master->sev_data;
> + struct sev_user_data_ext_snp_config input;
> + struct sev_user_data_snp_config config;
> + void *certs = NULL;
> + int ret = 0;
> +
> + if (!sev->snp_inited || !argp->data)
> + return -EINVAL;
> +
> + if (!writable)
> + return -EPERM;
> +
> + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> + return -EFAULT;
> +
> + /* Copy the certs from userspace */
> + if (input.certs_address) {
> + if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
> + return -EINVAL;
> +
> + certs = psp_copy_user_blob(input.certs_address, input.certs_len);

I see that psp_copy_user_blob() uses memdup_user(), which allocates the
memory with GFP_USER. Given that this memory is long-lived and now
belongs to the PSP driver in perpetuity, should it be allocated with
GFP_KERNEL instead?

> + if (IS_ERR(certs))
> + return PTR_ERR(certs);
> + }
> +
> + /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
> + if (input.config_address) {
> + if (copy_from_user(&config,
> + (void __user *)input.config_address, sizeof(config))) {
> + ret = -EFAULT;
> + goto e_free;
> + }
> +
> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
> + if (ret)
> + goto e_free;
> +
> + memcpy(&sev->snp_config, &config, sizeof(config));
> + }
> +
> + /*
> + * If the new certs are passed then cache it else free the old certs.
> + */
> + if (certs) {
> + kfree(sev->snp_certs_data);
> + sev->snp_certs_data = certs;
> + sev->snp_certs_len = input.certs_len;
> + } else {
> + kfree(sev->snp_certs_data);
> + sev->snp_certs_data = NULL;
> + sev->snp_certs_len = 0;
> + }

Do we need another lock here? Looking at snp_guest_ext_guest_request()
in 18/49, it seems like we have a race on |sev->snp_certs_data|.
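
A rough userspace model of what I mean, assuming a dedicated lock around the
cached blob. The struct, the lock field and both helpers are hypothetical
and only mirror the shape of the driver state; pthread mutexes stand in for
whatever kernel lock would actually be used.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model of the cached certificate blob in struct sev_device. */
struct certs_cache {
	pthread_mutex_t lock;	/* hypothetical lock, not part of the patch */
	void *data;
	size_t len;
};

/* Writer side: take ownership of new_data (NULL clears the cache). */
static void certs_cache_set(struct certs_cache *c, void *new_data, size_t len)
{
	void *old;

	assert(c);
	pthread_mutex_lock(&c->lock);
	old = c->data;
	c->data = new_data;
	c->len = new_data ? len : 0;
	pthread_mutex_unlock(&c->lock);

	/* Readers only copy under the lock, so nobody can still see 'old'. */
	free(old);
}

/*
 * Reader side: copy the blob while holding the lock so that the pointer
 * and the length are observed as a consistent pair.
 * Returns the number of bytes copied, or -1 if dst is too small.
 */
static long certs_cache_copy(struct certs_cache *c, void *dst, size_t dst_len)
{
	long ret;

	assert(c && dst);
	pthread_mutex_lock(&c->lock);
	if (c->len > dst_len) {
		ret = -1;
	} else {
		if (c->len)
			memcpy(dst, c->data, c->len);
		ret = (long)c->len;
	}
	pthread_mutex_unlock(&c->lock);

	return ret;
}
```

Without something like this, the reader could pair a freed pointer with a
stale length if SNP_SET_EXT_CONFIG runs concurrently with a guest request.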

> +
> + return 0;
> +
> +e_free:
> + kfree(certs);
> + return ret;
> +}
> +
> static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> {
> void __user *argp = (void __user *)arg;
> @@ -1670,6 +1779,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> case SNP_PLATFORM_STATUS:
> ret = sev_ioctl_snp_platform_status(&input);
> break;
> + case SNP_SET_EXT_CONFIG:
> + ret = sev_ioctl_snp_set_config(&input, writable);
> + break;
> + case SNP_GET_EXT_CONFIG:
> + ret = sev_ioctl_snp_get_config(&input);
> + break;
> default:
> ret = -EINVAL;
> goto out;
> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> index fe5d7a3ebace..d2fe1706311a 100644
> --- a/drivers/crypto/ccp/sev-dev.h
> +++ b/drivers/crypto/ccp/sev-dev.h
> @@ -66,6 +66,9 @@ struct sev_device {
>
> bool snp_inited;
> struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> + void *snp_certs_data;
> + u32 snp_certs_len;
> + struct sev_user_data_snp_config snp_config;

Since this gets copy_to_user'd, can we memset this to 0 to prevent
leaking uninitialized kernel memory? Similar to recent patches with
kzalloc and __GFP_ZERO usage.


> };
>
> int sev_dev_init(struct psp_device *psp);
> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
> index ffd60e8b0a31..60e7a8d1a18e 100644
> --- a/include/uapi/linux/psp-sev.h
> +++ b/include/uapi/linux/psp-sev.h
> @@ -29,6 +29,8 @@ enum {
> SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
> SEV_GET_ID2,
> SNP_PLATFORM_STATUS,
> + SNP_SET_EXT_CONFIG,
> + SNP_GET_EXT_CONFIG,
>
> SEV_MAX,
> };
> @@ -190,6 +192,21 @@ struct sev_user_data_snp_config {
> __u8 rsvd[52];
> } __packed;
>
> +/**
> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
> + *
> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> + * reported_tcb does not need to be updated.
> + * @certs_address: address of extended guest request certificate chain or
> + * 0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> + * @certs_len: length of the certs
> + */
> +struct sev_user_data_ext_snp_config {
> + __u64 config_address; /* In */
> + __u64 certs_address; /* In */
> + __u32 certs_len; /* In */
> +};
> +
> /**
> * struct sev_issue_cmd - SEV ioctl parameters
> *
> --
> 2.25.1
>

2022-06-21 22:34:10

by Peter Gonda

Subject: Re: [PATCH Part2 v6 18/49] crypto: ccp: Provide APIs to query extended attestation report

On Mon, Jun 20, 2022 at 5:06 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> Version 2 of the GHCB specification defines VMGEXIT that is used to get
> the extended attestation report. The extended attestation report includes
> the certificate blobs provided through the SNP_SET_EXT_CONFIG.
>
> The snp_guest_ext_guest_request() will be used by the hypervisor to get
> the extended attestation report. See the GHCB specification for more
> details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 43 ++++++++++++++++++++++++++++++++++++
> include/linux/psp-sev.h | 24 ++++++++++++++++++++
> 2 files changed, 67 insertions(+)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 97b479d5aa86..f6306b820b86 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -25,6 +25,7 @@
> #include <linux/fs.h>
>
> #include <asm/smp.h>
> +#include <asm/sev.h>
>
> #include "psp-dev.h"
> #include "sev-dev.h"
> @@ -1857,6 +1858,48 @@ int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
> }
> EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
>
> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> + unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
> +{
> + unsigned long expected_npages;
> + struct sev_device *sev;
> + int rc;
> +
> + if (!psp_master || !psp_master->sev_data)
> + return -ENODEV;
> +
> + sev = psp_master->sev_data;
> +
> + if (!sev->snp_inited)
> + return -EINVAL;
> +
> + /*
> + * Check if there is enough space to copy the certificate chain. Otherwise
> + * return ERROR code defined in the GHCB specification.
> + */
> + expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
> + if (*npages < expected_npages) {
> + *npages = expected_npages;
> + *fw_err = SNP_GUEST_REQ_INVALID_LEN;
> + return -EINVAL;
> + }
> +
> + rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)&fw_err);

We can just pass |fw_err| here (with the cast), right? No need
to do &fw_err.

rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
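
For context: (int *)&fw_err hands the callee the address of the local
pointer variable, so the status would overwrite the pointer itself instead
of the caller's value. Even with (int *)fw_err, only sizeof(int) bytes of
the unsigned long get written on 64-bit. One possible way to avoid the cast
entirely is a local int, sketched below in userspace. fake_do_cmd() and
request() are hypothetical stand-ins, and the status value is arbitrary.

```c
#include <assert.h>

/* Stand-in for sev_do_cmd(): reports a firmware status through an int. */
static int fake_do_cmd(int *psp_ret)
{
	if (psp_ret)
		*psp_ret = 0x16;	/* arbitrary example status */
	return -1;			/* pretend the command failed */
}

/*
 * The caller hands in an unsigned long *, as snp_guest_ext_guest_request()
 * does. A local int receives the status, and the value is widened
 * explicitly on the way out, so every byte of *fw_err is defined.
 */
static int request(unsigned long *fw_err)
{
	int psp_ret = 0;
	int rc;

	assert(fw_err);
	rc = fake_do_cmd(&psp_ret);
	*fw_err = (unsigned long)psp_ret;

	return rc;
}
```

This is only a sketch of the idea; whether the extra local is preferable to
the cast is a matter of taste.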

> + if (rc)
> + return rc;
> +
> + /* Copy the certificate blob */
> + if (sev->snp_certs_data) {
> + *npages = expected_npages;
> + memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);

Why don't we just make |vaddr| into a void* instead of an unsigned long?

> + } else {
> + *npages = 0;
> + }
> +
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
> +
> static void sev_exit(struct kref *ref)
> {
> misc_deregister(&misc_dev->misc);
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index a3bb792bb842..cd37ccd1fa1f 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -945,6 +945,23 @@ void *psp_copy_user_blob(u64 uaddr, u32 len);
> void *snp_alloc_firmware_page(gfp_t mask);
> void snp_free_firmware_page(void *addr);
>
> +/**
> + * snp_guest_ext_guest_request - perform the SNP extended guest request command
> + * defined in the GHCB specification.
> + *
> + * @data: the input guest request structure
> + * @vaddr: address where the certificate blob need to be copied.
> + * @npages: number of pages for the certificate blob.
> + * If the specified page count is less than the certificate blob size, then the
> + * required page count is returned with error code defined in the GHCB spec.
> + * If the specified page count is more than the certificate blob size, then
> + * page count is updated to reflect the amount of valid data copied in the
> + * vaddr.
> + */
> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> + unsigned long vaddr, unsigned long *npages,
> + unsigned long *error);
> +
> #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
> static inline int
> @@ -992,6 +1009,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
>
> static inline void snp_free_firmware_page(void *addr) { }
>
> +static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> + unsigned long vaddr, unsigned long *n,
> + unsigned long *error)
> +{
> + return -ENODEV;
> +}
> +
> #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
> #endif /* __PSP_SEV_H__ */
> --
> 2.25.1
>

2022-06-22 01:48:12

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 13/49] crypto:ccp: Provide APIs to issue SEV-SNP commands

>> +EXPORT_SYMBOL_GPL(snp_guest_decommission);
>> +
>> +int snp_guest_df_flush(int *error)
>> +{
>> + return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
>> +}
>> +EXPORT_SYMBOL_GPL(snp_guest_df_flush);

>Why not instead change sev_guest_df_flush() to be SNP aware? That way callers get the right behavior without having to know if SNP is enabled or not.

It can be done, and actually both DF_FLUSH commands do exactly the same thing.

But as with the other APIs here, I think it is better to differentiate between the SNP and SEV interfaces and to have the callers be aware of which
interface they are invoking.

Thanks,
Ashish

2022-06-22 10:33:52

by Dr. David Alan Gilbert

Subject: Re: [PATCH Part2 v6 27/49] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests

* Ashish Kalra ([email protected]) wrote:
> From: Brijesh Singh <[email protected]>
>
> When SEV-SNP is enabled, the guest private pages are added in the RMP
> table; while adding the pages, the rmp_make_private() unmaps the pages
> from the direct map. If KSM attempts to access those unmapped pages then
> it will trigger #PF (page-not-present).
>
> Encrypted guest pages cannot be shared between the process, so an
> userspace should not mark the region mergeable but to be safe, mark the
> process vma unmerable before adding the pages in the RMP table.
^
Typo 'unmergable' (also in title)

> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b5f0707d7ed6..a9461d352eda 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -19,11 +19,13 @@
> #include <linux/trace_events.h>
> #include <linux/hugetlb.h>
> #include <linux/sev.h>
> +#include <linux/ksm.h>
>
> #include <asm/pkru.h>
> #include <asm/trapnr.h>
> #include <asm/fpu/xcr.h>
> #include <asm/sev.h>
> +#include <asm/mman.h>
>
> #include "x86.h"
> #include "svm.h"
> @@ -1965,6 +1967,30 @@ static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
> return false;
> }
>
> +static int snp_mark_unmergable(struct kvm *kvm, u64 start, u64 size)
> +{
> + struct vm_area_struct *vma;
> + u64 end = start + size;
> + int ret;
> +
> + do {
> + vma = find_vma_intersection(kvm->mm, start, end);
> + if (!vma) {
> + ret = -EINVAL;
> + break;
> + }
> +
> + ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> + MADV_UNMERGEABLE, &vma->vm_flags);
> + if (ret)
> + break;
> +
> + start = vma->vm_end;
> + } while (end > vma->vm_end);
> +
> + return ret;
> +}
> +
> static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -1989,6 +2015,12 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> if (!is_hva_registered(kvm, params.uaddr, params.len))
> return -EINVAL;
>
> + mmap_write_lock(kvm->mm);
> + ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
> + mmap_write_unlock(kvm->mm);
> + if (ret)
> + return -EFAULT;
> +
> /*
> * The userspace memory is already locked so technically we don't
> * need to lock it again. Later part of the function needs to know
> --
> 2.25.1
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2022-06-22 14:15:33

by Dave Hansen

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/20/22 16:02, Ashish Kalra wrote:
> +/*
> + * The RMP entry format is not architectural. The format is defined in PPR
> + * Family 19h Model 01h, Rev B1 processor.
> + */

Let's say that Family 20h comes out and has a new RMP entry format.
What keeps an old kernel from attempting to use this old format on that
new CPU?

2022-06-22 14:29:55

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

>> +/*
>> + * The RMP entry format is not architectural. The format is defined in PPR
>> + * Family 19h Model 01h, Rev B1 processor.
>> + */

>Let's say that Family 20h comes out and has a new RMP entry format.
>What keeps an old kernel from attempting to use this old format on that new CPU?

As I replied previously on the same subject:
Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible.
I probably think the wording here should be architecture independent or more precisely platform independent.

Thanks,
Ashish

2022-06-22 14:30:02

by Dave Hansen

Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On 6/20/22 16:02, Ashish Kalra wrote:
> +int psmash(u64 pfn)
> +{
> + unsigned long paddr = pfn << PAGE_SHIFT;
> + int ret;
> +
> + if (!pfn_valid(pfn))
> + return -EINVAL;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENXIO;
> +
> + /* Binutils version 2.36 supports the PSMASH mnemonic. */
> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> + : "=a"(ret)
> + : "a"(paddr)
> + : "memory", "cc");
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(psmash);

If a function gets an EXPORT_SYMBOL_GPL(), the least we can do is
reasonably document it. We don't need full kerneldoc nonsense, but a
one-liner about what this does would be quite helpful. That goes for all
the functions here.

It would also be extremely helpful to have the changelog explain why
these functions are exported and how the exports will be used.

As a general rule, please push cpu_feature_enabled() checks as early as
you reasonably can. They are *VERY* cheap and can even enable the
compiler to completely zap code like an #ifdef.

There also seem to be a lot of pfn_valid() checks in here that aren't
very well thought out. For instance, there's a pfn_valid() check here:


+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ struct rmpupdate val;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
...
+ return rmpupdate(pfn, &val);
+}

and in rmpupdate():

+static int rmpupdate(u64 pfn, struct rmpupdate *val)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
...


This is (at best) wasteful. Could it be refactored?
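
Something along these lines, shown as a userspace model rather than a real
patch. All of the model_* names, the call counter and the pfn limit are
made up for illustration. The feature check goes first because it is the
cheapest, and only the lowest-level helper validates the pfn:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static int pfn_valid_calls;	/* instrumentation for this sketch only */
static bool snp_enabled = true;	/* stand-in for cpu_feature_enabled() */

static bool model_pfn_valid(uint64_t pfn)
{
	pfn_valid_calls++;
	return pfn < 0x100000;	/* arbitrary limit for the model */
}

/* Lowest-level helper: the only place that validates its inputs. */
static int model_rmpupdate(uint64_t pfn)
{
	if (!snp_enabled)	/* cheapest check first */
		return -ENXIO;

	if (!model_pfn_valid(pfn))
		return -EINVAL;

	return 0;	/* the real code would issue RMPUPDATE here */
}

/* Wrapper: prepares its arguments and defers validation to the helper. */
static int model_rmp_make_shared(uint64_t pfn)
{
	return model_rmpupdate(pfn);
}
```

With that shape, each call path checks the pfn exactly once, and a disabled
feature short-circuits before any pfn lookup happens.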

2022-06-22 14:32:42

by Dave Hansen

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 07:22, Kalra, Ashish wrote:
> As I replied previously on the same subject: Architectural implies
> that it is defined in the APM and shouldn't change in such a way as
> to not be backward compatible. I probably think the wording here
> should be architecture independent or more precisely platform
> independent.
Yeah, arch-independent and non-architectural are quite different concepts.

At Intel, at least, when someone says "not architectural" they mean that
the behavior is implementation-specific. That, combined with the
model/family/stepping, gave me the wrong impression about what was going on.

Some more clarity would be greatly appreciated.

2022-06-22 14:33:01

by Jeremi Piotrowski

Subject: Re: [PATCH Part2 v6 09/49] x86/fault: Add support to handle the RMP fault for user address

On Mon, Jun 20, 2022 at 11:03:43PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> When SEV-SNP is enabled globally, a write from the host goes through the
> RMP check. When the host writes to pages, hardware checks the following
> conditions at the end of page walk:
>
> 1. Assigned bit in the RMP table is zero (i.e page is shared).
> 2. If the page table entry that gives the sPA indicates that the target
> page size is a large page, then all RMP entries for the 4KB
> constituting pages of the target must have the assigned bit 0.
> 3. Immutable bit in the RMP table is not zero.
>
> The hardware will raise page fault if one of the above conditions is not
> met. Try resolving the fault instead of taking fault again and again. If
> the host attempts to write to the guest private memory then send the
> SIGBUS signal to kill the process. If the page level between the host and
> RMP entry does not match, then split the address to keep the RMP and host
> page levels in sync.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/mm/fault.c | 66 ++++++++++++++++++++++++++++++++++++++++
> include/linux/mm.h | 3 +-
> include/linux/mm_types.h | 3 ++
> mm/memory.c | 13 ++++++++
> 4 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index a4c270e99f7f..f5de9673093a 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -19,6 +19,7 @@
> #include <linux/uaccess.h> /* faulthandler_disabled() */
> #include <linux/efi.h> /* efi_crash_gracefully_on_page_fault()*/
> #include <linux/mm_types.h>
> +#include <linux/sev.h> /* snp_lookup_rmpentry() */
>
> #include <asm/cpufeature.h> /* boot_cpu_has, ... */
> #include <asm/traps.h> /* dotraplinkage, ... */
> @@ -1209,6 +1210,60 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> }
> NOKPROBE_SYMBOL(do_kern_addr_fault);
>
> +static inline size_t pages_per_hpage(int level)
> +{
> + return page_level_size(level) / PAGE_SIZE;
> +}
> +
> +/*
> + * Return 1 if the caller need to retry, 0 if it the address need to be split
> + * in order to resolve the fault.
> + */
> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
> + unsigned long address)
> +{
> + int rmp_level, level;
> + pte_t *pte;
> + u64 pfn;
> +
> + pte = lookup_address_in_mm(current->mm, address, &level);
> +
> + /*
> + * It can happen if there was a race between an unmap event and
> + * the RMP fault delivery.
> + */
> + if (!pte || !pte_present(*pte))
> + return 1;
> +
> + pfn = pte_pfn(*pte);
> +
> + /* If its large page then calculte the fault pfn */
> + if (level > PG_LEVEL_4K) {
> + unsigned long mask;
> +
> + mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
> + pfn |= (address >> PAGE_SHIFT) & mask;
> + }
> +
> + /*
> + * If its a guest private page, then the fault cannot be resolved.
> + * Send a SIGBUS to terminate the process.
> + */
> + if (snp_lookup_rmpentry(pfn, &rmp_level)) {

snp_lookup_rmpentry returns 0, 1 or -errno, so this should likely be:

if (snp_lookup_rmpentry(pfn, &rmp_level) != 1) {
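
The underlying issue is that a plain truth test folds 1 and -errno into the
same branch. As a purely illustrative userspace sketch (the enum and
decode_rmp_lookup() are hypothetical names, not existing kernel helpers),
decoding the tri-state return once makes each later comparison explicit:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the three documented outcomes. */
enum rmp_lookup_state {
	RMP_LOOKUP_ERROR,	/* -errno: no corresponding RMP entry */
	RMP_LOOKUP_SHARED,	/* 0: entry exists but is not assigned */
	RMP_LOOKUP_ASSIGNED,	/* 1: entry is assigned */
};

/* Map the 0 / 1 / -errno convention onto a named state. */
static enum rmp_lookup_state decode_rmp_lookup(int ret)
{
	if (ret < 0)
		return RMP_LOOKUP_ERROR;

	return ret > 0 ? RMP_LOOKUP_ASSIGNED : RMP_LOOKUP_SHARED;
}
```

Which of the three states should end in SIGBUS is a separate question; the
sketch only separates them.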

> + do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> + return 1;
> + }
> +
> + /*
> + * The backing page level is higher than the RMP page level, request
> + * to split the page.
> + */
> + if (level > rmp_level)
> + return 0;
> +
> + return 1;
> +}
> +
> /*
> * Handle faults in the user portion of the address space. Nothing in here
> * should check X86_PF_USER without a specific justification: for almost
> @@ -1306,6 +1361,17 @@ void do_user_addr_fault(struct pt_regs *regs,
> if (error_code & X86_PF_INSTR)
> flags |= FAULT_FLAG_INSTRUCTION;
>
> + /*
> + * If its an RMP violation, try resolving it.
> + */
> + if (error_code & X86_PF_RMP) {
> + if (handle_user_rmp_page_fault(regs, error_code, address))
> + return;
> +
> + /* Ask to split the page */
> + flags |= FAULT_FLAG_PAGE_SPLIT;
> + }
> +
> #ifdef CONFIG_X86_64
> /*
> * Faults in the vsyscall page might need emulation. The
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index de32c0383387..2ccc562d166f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -463,7 +463,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
> { FAULT_FLAG_USER, "USER" }, \
> { FAULT_FLAG_REMOTE, "REMOTE" }, \
> { FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \
> - { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
> + { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
> + { FAULT_FLAG_PAGE_SPLIT, "PAGESPLIT" }
>
> /*
> * vm_fault is filled by the pagefault handler and passed to the vma's
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6dfaf271ebf8..aa2d8d48ce3e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -818,6 +818,8 @@ typedef struct {
> * mapped R/O.
> * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
> * We should only access orig_pte if this flag set.
> + * @FAULT_FLAG_PAGE_SPLIT: The fault was due page size mismatch, split the
> + * region to smaller page size and retry.
> *
> * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
> * whether we would allow page faults to retry by specifying these two
> @@ -855,6 +857,7 @@ enum fault_flag {
> FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
> FAULT_FLAG_UNSHARE = 1 << 10,
> FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
> + FAULT_FLAG_PAGE_SPLIT = 1 << 12,
> };
>
> typedef unsigned int __bitwise zap_flags_t;
> diff --git a/mm/memory.c b/mm/memory.c
> index 7274f2b52bca..c2187ffcbb8e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4945,6 +4945,15 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> return 0;
> }
>
> +static int handle_split_page_fault(struct vm_fault *vmf)
> +{
> + if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> + return VM_FAULT_SIGBUS;
> +
> + __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> + return 0;
> +}
> +
> /*
> * By the time we get here, we already hold the mm semaphore
> *
> @@ -5024,6 +5033,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> pmd_migration_entry_wait(mm, vmf.pmd);
> return 0;
> }
> +
> + if (flags & FAULT_FLAG_PAGE_SPLIT)
> + return handle_split_page_fault(&vmf);
> +
> if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
> if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
> return do_huge_pmd_numa_page(&vmf);
> --
> 2.25.1
>

2022-06-22 14:34:28

by Jeremi Piotrowski

Subject: Re: [PATCH Part2 v6 10/49] x86/fault: Add support to dump RMP entry on fault

On Mon, Jun 20, 2022 at 11:03:58PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> When SEV-SNP is enabled globally, a write from the host goes through the
> RMP check. If the hardware encounters the check failure, then it raises
> the #PF (with RMP set). Dump the RMP entry at the faulting pfn to help
> the debug.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 7 +++++++
> arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
> arch/x86/mm/fault.c | 17 +++++++++++----
> include/linux/sev.h | 2 ++
> 4 files changed, 65 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 6ab872311544..c0c4df817159 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -113,6 +113,11 @@ struct __packed rmpentry {
>
> #define rmpentry_assigned(x) ((x)->info.assigned)
> #define rmpentry_pagesize(x) ((x)->info.pagesize)
> +#define rmpentry_vmsa(x) ((x)->info.vmsa)
> +#define rmpentry_asid(x) ((x)->info.asid)
> +#define rmpentry_validated(x) ((x)->info.validated)
> +#define rmpentry_gpa(x) ((unsigned long)(x)->info.gpa)
> +#define rmpentry_immutable(x) ((x)->info.immutable)
>
> #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>
> @@ -205,6 +210,7 @@ void snp_set_wakeup_secondary_cpu(void);
> bool snp_init(struct boot_params *bp);
> void snp_abort(void);
> int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
> +void dump_rmpentry(u64 pfn);
> #else
> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> static inline void sev_es_ist_exit(void) { }
> @@ -229,6 +235,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
> {
> return -ENOTTY;
> }
> +static inline void dump_rmpentry(u64 pfn) {}
> #endif
>
> #endif
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 734cddd837f5..6640a639fffc 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2414,6 +2414,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
> return entry;
> }
>
> +void dump_rmpentry(u64 pfn)
> +{
> + unsigned long pfn_end;
> + struct rmpentry *e;
> + int level;
> +
> + e = __snp_lookup_rmpentry(pfn, &level);
> + if (!e) {

__snp_lookup_rmpentry may return -errno so this should be:

if (e != 1)
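
Note that the hunk header declares __snp_lookup_rmpentry() as returning a
struct rmpentry *, so a failure would arrive as NULL or as an errno encoded
in the pointer, not as an integer. Assuming the usual ERR_PTR convention is
used there (an assumption, since the function body is not quoted here), the
check would look more like IS_ERR_OR_NULL(e). A userspace model of that
convention, with all model_* names invented for the sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace models of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers. */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Error pointers occupy the top MODEL_MAX_ERRNO addresses. */
static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static inline int model_is_err_or_null(const void *p)
{
	return !p || model_is_err(p);
}

struct model_rmpentry {
	uint64_t low, high;
};

static struct model_rmpentry model_table[4];

/* Returns a valid entry, or an errno encoded in the pointer. */
static struct model_rmpentry *model_lookup(uint64_t pfn)
{
	if (pfn >= 4)
		return model_err_ptr(-EINVAL);

	return &model_table[pfn];
}
```

A bare "!e" test misses the encoded-error case, because an error pointer is
non-NULL.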

> + pr_alert("failed to read RMP entry pfn 0x%llx\n", pfn);
> + return;
> + }
> +
> + if (rmpentry_assigned(e)) {
> + pr_alert("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
> + " asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
> + rmpentry_assigned(e), rmpentry_immutable(e), rmpentry_pagesize(e),
> + rmpentry_gpa(e), rmpentry_asid(e), rmpentry_vmsa(e),
> + rmpentry_validated(e));
> + return;
> + }
> +
> + /*
> + * If the RMP entry at the faulting pfn was not assigned, then we do not
> + * know what caused the RMP violation. To get some useful debug information,
> + * let iterate through the entire 2MB region, and dump the RMP entries if
> + * one of the bit in the RMP entry is set.
> + */
> + pfn = pfn & ~(PTRS_PER_PMD - 1);
> + pfn_end = pfn + PTRS_PER_PMD;
> +
> + while (pfn < pfn_end) {
> + e = __snp_lookup_rmpentry(pfn, &level);
> + if (!e)

if (e != 1)

> + return;
> +
> + if (e->low || e->high)
> + pr_alert("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> + pfn << PAGE_SHIFT, e->high, e->low);
> + pfn++;
> + }
> +}
> +EXPORT_SYMBOL_GPL(dump_rmpentry);
> +
> /*
> * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> * and -errno if there is no corresponding RMP entry.
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f5de9673093a..25896a6ba04a 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -34,6 +34,7 @@
> #include <asm/kvm_para.h> /* kvm_handle_async_pf */
> #include <asm/vdso.h> /* fixup_vdso_exception() */
> #include <asm/irq_stack.h>
> +#include <asm/sev.h> /* dump_rmpentry() */
>
> #define CREATE_TRACE_POINTS
> #include <asm/trace/exceptions.h>
> @@ -290,7 +291,7 @@ static bool low_pfn(unsigned long pfn)
> return pfn < max_low_pfn;
> }
>
> -static void dump_pagetable(unsigned long address)
> +static void dump_pagetable(unsigned long address, bool show_rmpentry)
> {
> pgd_t *base = __va(read_cr3_pa());
> pgd_t *pgd = &base[pgd_index(address)];
> @@ -346,10 +347,11 @@ static int bad_address(void *p)
> return get_kernel_nofault(dummy, (unsigned long *)p);
> }
>
> -static void dump_pagetable(unsigned long address)
> +static void dump_pagetable(unsigned long address, bool show_rmpentry)
> {
> pgd_t *base = __va(read_cr3_pa());
> pgd_t *pgd = base + pgd_index(address);
> + unsigned long pfn;
> p4d_t *p4d;
> pud_t *pud;
> pmd_t *pmd;
> @@ -367,6 +369,7 @@ static void dump_pagetable(unsigned long address)
> if (bad_address(p4d))
> goto bad;
>
> + pfn = p4d_pfn(*p4d);
> pr_cont("P4D %lx ", p4d_val(*p4d));
> if (!p4d_present(*p4d) || p4d_large(*p4d))
> goto out;
> @@ -375,6 +378,7 @@ static void dump_pagetable(unsigned long address)
> if (bad_address(pud))
> goto bad;
>
> + pfn = pud_pfn(*pud);
> pr_cont("PUD %lx ", pud_val(*pud));
> if (!pud_present(*pud) || pud_large(*pud))
> goto out;
> @@ -383,6 +387,7 @@ static void dump_pagetable(unsigned long address)
> if (bad_address(pmd))
> goto bad;
>
> + pfn = pmd_pfn(*pmd);
> pr_cont("PMD %lx ", pmd_val(*pmd));
> if (!pmd_present(*pmd) || pmd_large(*pmd))
> goto out;
> @@ -391,9 +396,13 @@ static void dump_pagetable(unsigned long address)
> if (bad_address(pte))
> goto bad;
>
> + pfn = pte_pfn(*pte);
> pr_cont("PTE %lx", pte_val(*pte));
> out:
> pr_cont("\n");
> +
> + if (show_rmpentry)
> + dump_rmpentry(pfn);
> return;
> bad:
> pr_info("BAD\n");
> @@ -579,7 +588,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
> show_ldttss(&gdt, "TR", tr);
> }
>
> - dump_pagetable(address);
> + dump_pagetable(address, error_code & X86_PF_RMP);
> }
>
> static noinline void
> @@ -596,7 +605,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
>
> printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
> tsk->comm, address);
> - dump_pagetable(address);
> + dump_pagetable(address, false);
>
> if (__die("Bad pagetable", regs, error_code))
> sig = 0;
> diff --git a/include/linux/sev.h b/include/linux/sev.h
> index 1a68842789e1..734b13a69c54 100644
> --- a/include/linux/sev.h
> +++ b/include/linux/sev.h
> @@ -16,6 +16,7 @@ int snp_lookup_rmpentry(u64 pfn, int *level);
> int psmash(u64 pfn);
> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> int rmp_make_shared(u64 pfn, enum pg_level level);
> +void dump_rmpentry(u64 pfn);
> #else
> static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
> static inline int psmash(u64 pfn) { return -ENXIO; }
> @@ -25,6 +26,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
> return -ENODEV;
> }
> static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> +static inline void dump_rmpentry(u64 pfn) { }
>
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
> #endif /* __LINUX_SEV_H */
> --
> 2.25.1
>

2022-06-22 14:50:28

by Jeremi Piotrowski

Subject: Re: [PATCH Part2 v6 10/49] x86/fault: Add support to dump RMP entry on fault

On Wed, Jun 22, 2022 at 04:33:04PM +0200, Jeremi Piotrowski wrote:
> On Mon, Jun 20, 2022 at 11:03:58PM +0000, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > When SEV-SNP is enabled globally, a write from the host goes through the
> > RMP check. If the hardware encounters the check failure, then it raises
> > the #PF (with RMP set). Dump the RMP entry at the faulting pfn to help
> > the debug.
> >
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/include/asm/sev.h | 7 +++++++
> > arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
> > arch/x86/mm/fault.c | 17 +++++++++++----
> > include/linux/sev.h | 2 ++
> > 4 files changed, 65 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index 6ab872311544..c0c4df817159 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -113,6 +113,11 @@ struct __packed rmpentry {
> >
> > #define rmpentry_assigned(x) ((x)->info.assigned)
> > #define rmpentry_pagesize(x) ((x)->info.pagesize)
> > +#define rmpentry_vmsa(x) ((x)->info.vmsa)
> > +#define rmpentry_asid(x) ((x)->info.asid)
> > +#define rmpentry_validated(x) ((x)->info.validated)
> > +#define rmpentry_gpa(x) ((unsigned long)(x)->info.gpa)
> > +#define rmpentry_immutable(x) ((x)->info.immutable)
> >
> > #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
> >
> > @@ -205,6 +210,7 @@ void snp_set_wakeup_secondary_cpu(void);
> > bool snp_init(struct boot_params *bp);
> > void snp_abort(void);
> > int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
> > +void dump_rmpentry(u64 pfn);
> > #else
> > static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> > static inline void sev_es_ist_exit(void) { }
> > @@ -229,6 +235,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
> > {
> > return -ENOTTY;
> > }
> > +static inline void dump_rmpentry(u64 pfn) {}
> > #endif
> >
> > #endif
> > diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> > index 734cddd837f5..6640a639fffc 100644
> > --- a/arch/x86/kernel/sev.c
> > +++ b/arch/x86/kernel/sev.c
> > @@ -2414,6 +2414,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
> > return entry;
> > }
> >
> > +void dump_rmpentry(u64 pfn)
> > +{
> > + unsigned long pfn_end;
> > + struct rmpentry *e;
> > + int level;
> > +
> > + e = __snp_lookup_rmpentry(pfn, &level);
> > + if (!e) {
>
> __snp_lookup_rmpentry may return -errno so this should be:
>
> if (e != 1)

Sorry, actually it should be:

if (IS_ERR_OR_NULL(e)) {

>
> > + pr_alert("failed to read RMP entry pfn 0x%llx\n", pfn);
> > + return;
> > + }
> > +
> > + if (rmpentry_assigned(e)) {
> > + pr_alert("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
> > + " asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
> > + rmpentry_assigned(e), rmpentry_immutable(e), rmpentry_pagesize(e),
> > + rmpentry_gpa(e), rmpentry_asid(e), rmpentry_vmsa(e),
> > + rmpentry_validated(e));
> > + return;
> > + }
> > +
> > + /*
> > + * If the RMP entry at the faulting pfn was not assigned, then we do not
> > + * know what caused the RMP violation. To get some useful debug information,
> > + * let iterate through the entire 2MB region, and dump the RMP entries if
> > + * one of the bit in the RMP entry is set.
> > + */
> > + pfn = pfn & ~(PTRS_PER_PMD - 1);
> > + pfn_end = pfn + PTRS_PER_PMD;
> > +
> > + while (pfn < pfn_end) {
> > + e = __snp_lookup_rmpentry(pfn, &level);
> > + if (!e)
>
> if (e != 1)
>

and this too:

if (IS_ERR_OR_NULL(e))


> > + return;
> > +
> > + if (e->low || e->high)
> > + pr_alert("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> > + pfn << PAGE_SHIFT, e->high, e->low);
> > + pfn++;
> > + }
> > +}
> > +EXPORT_SYMBOL_GPL(dump_rmpentry);
> > +
> > /*
> > * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> > * and -errno if there is no corresponding RMP entry.
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index f5de9673093a..25896a6ba04a 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -34,6 +34,7 @@
> > #include <asm/kvm_para.h> /* kvm_handle_async_pf */
> > #include <asm/vdso.h> /* fixup_vdso_exception() */
> > #include <asm/irq_stack.h>
> > +#include <asm/sev.h> /* dump_rmpentry() */
> >
> > #define CREATE_TRACE_POINTS
> > #include <asm/trace/exceptions.h>
> > @@ -290,7 +291,7 @@ static bool low_pfn(unsigned long pfn)
> > return pfn < max_low_pfn;
> > }
> >
> > -static void dump_pagetable(unsigned long address)
> > +static void dump_pagetable(unsigned long address, bool show_rmpentry)
> > {
> > pgd_t *base = __va(read_cr3_pa());
> > pgd_t *pgd = &base[pgd_index(address)];
> > @@ -346,10 +347,11 @@ static int bad_address(void *p)
> > return get_kernel_nofault(dummy, (unsigned long *)p);
> > }
> >
> > -static void dump_pagetable(unsigned long address)
> > +static void dump_pagetable(unsigned long address, bool show_rmpentry)
> > {
> > pgd_t *base = __va(read_cr3_pa());
> > pgd_t *pgd = base + pgd_index(address);
> > + unsigned long pfn;
> > p4d_t *p4d;
> > pud_t *pud;
> > pmd_t *pmd;
> > @@ -367,6 +369,7 @@ static void dump_pagetable(unsigned long address)
> > if (bad_address(p4d))
> > goto bad;
> >
> > + pfn = p4d_pfn(*p4d);
> > pr_cont("P4D %lx ", p4d_val(*p4d));
> > if (!p4d_present(*p4d) || p4d_large(*p4d))
> > goto out;
> > @@ -375,6 +378,7 @@ static void dump_pagetable(unsigned long address)
> > if (bad_address(pud))
> > goto bad;
> >
> > + pfn = pud_pfn(*pud);
> > pr_cont("PUD %lx ", pud_val(*pud));
> > if (!pud_present(*pud) || pud_large(*pud))
> > goto out;
> > @@ -383,6 +387,7 @@ static void dump_pagetable(unsigned long address)
> > if (bad_address(pmd))
> > goto bad;
> >
> > + pfn = pmd_pfn(*pmd);
> > pr_cont("PMD %lx ", pmd_val(*pmd));
> > if (!pmd_present(*pmd) || pmd_large(*pmd))
> > goto out;
> > @@ -391,9 +396,13 @@ static void dump_pagetable(unsigned long address)
> > if (bad_address(pte))
> > goto bad;
> >
> > + pfn = pte_pfn(*pte);
> > pr_cont("PTE %lx", pte_val(*pte));
> > out:
> > pr_cont("\n");
> > +
> > + if (show_rmpentry)
> > + dump_rmpentry(pfn);
> > return;
> > bad:
> > pr_info("BAD\n");
> > @@ -579,7 +588,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
> > show_ldttss(&gdt, "TR", tr);
> > }
> >
> > - dump_pagetable(address);
> > + dump_pagetable(address, error_code & X86_PF_RMP);
> > }
> >
> > static noinline void
> > @@ -596,7 +605,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
> >
> > printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
> > tsk->comm, address);
> > - dump_pagetable(address);
> > + dump_pagetable(address, false);
> >
> > if (__die("Bad pagetable", regs, error_code))
> > sig = 0;
> > diff --git a/include/linux/sev.h b/include/linux/sev.h
> > index 1a68842789e1..734b13a69c54 100644
> > --- a/include/linux/sev.h
> > +++ b/include/linux/sev.h
> > @@ -16,6 +16,7 @@ int snp_lookup_rmpentry(u64 pfn, int *level);
> > int psmash(u64 pfn);
> > int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> > int rmp_make_shared(u64 pfn, enum pg_level level);
> > +void dump_rmpentry(u64 pfn);
> > #else
> > static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
> > static inline int psmash(u64 pfn) { return -ENXIO; }
> > @@ -25,6 +26,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
> > return -ENODEV;
> > }
> > static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> > +static inline void dump_rmpentry(u64 pfn) { }
> >
> > #endif /* CONFIG_AMD_MEM_ENCRYPT */
> > #endif /* __LINUX_SEV_H */
> > --
> > 2.25.1
> >

2022-06-22 18:11:22

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

[AMD Official Use Only - General]

>> +int psmash(u64 pfn)
>> +{
>> + unsigned long paddr = pfn << PAGE_SHIFT;
>> + int ret;
>> +
>> + if (!pfn_valid(pfn))
>> + return -EINVAL;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> + return -ENXIO;
>> +
>> + /* Binutils version 2.36 supports the PSMASH mnemonic. */
>> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
>> + : "=a"(ret)
>> + : "a"(paddr)
>> + : "memory", "cc");
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(psmash);

>If a function gets an EXPORT_SYMBOL_GPL(), the least we can do is reasonably document it. We don't need full kerneldoc nonsense, but a one-line about what this does would be quite helpful. That goes for all the functions here.

>It would also be extremely helpful to have the changelog explain why these functions are exported and how the exports will be used.

I will add basic descriptions for all these exported functions.

Thanks,
Ashish

>As a general rule, please push cpu_feature_enabled() checks as early as you reasonably can. They are *VERY* cheap and can even enable the compiler to completely zap code like an #ifdef.

>There also seem to be a lot of pfn_valid() checks in here that aren't very well thought out. For instance, there's a pfn_valid() check here:

>+int rmp_make_shared(u64 pfn, enum pg_level level) {
>+ struct rmpupdate val;
>+
>+ if (!pfn_valid(pfn))
>+ return -EINVAL;
>...
>+ return rmpupdate(pfn, &val);
>+}

>and in rmpupdate():

>+static int rmpupdate(u64 pfn, struct rmpupdate *val) {
>+ unsigned long paddr = pfn << PAGE_SHIFT;
>+ int ret;
>+
>+ if (!pfn_valid(pfn))
>+ return -EINVAL;
>...

>This is (at best) wasteful. Could it be refactored?
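
One possible shape for the refactoring asked about above is to keep a single validation point in the low-level helper and let the wrappers rely on it, with the cheap feature check placed first. This is a userspace sketch only. The *_model names, the 1024-page limit, and the call counter are hypothetical stand-ins for pfn_valid(), cpu_feature_enabled() and the RMPUPDATE instruction. A later revision of the series may solve this differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel state; not from the patch. */
static bool snp_enabled_model = true;
static int pfn_valid_calls;

static bool pfn_valid_model(uint64_t pfn)
{
	pfn_valid_calls++;		/* count how often we validate */
	return pfn < 1024;		/* toy memory map: 1024 pages */
}

struct rmpupdate_model {
	uint64_t gpa;
	uint8_t assigned;
	uint8_t pagesize;
	uint8_t immutable;
	uint32_t asid;
};

/* The only place that validates: feature check first, then the pfn. */
static int rmpupdate_model(uint64_t pfn, const struct rmpupdate_model *val)
{
	(void)val;

	if (!snp_enabled_model)
		return -ENXIO;
	if (!pfn_valid_model(pfn))
		return -EINVAL;

	/* The real helper would execute RMPUPDATE here. */
	return 0;
}

/* Wrapper no longer repeats the pfn check; it only builds the request. */
static int rmp_make_shared_model(uint64_t pfn, int level)
{
	struct rmpupdate_model val = { 0 };

	val.pagesize = (uint8_t)level;
	return rmpupdate_model(pfn, &val);
}
```

With this layout each call validates the pfn at most once, and no validation happens at all when the feature is disabled.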

2022-06-22 18:11:42

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 10/49] x86/fault: Add support to dump RMP entry on fault

[AMD Official Use Only - General]

>> > +void dump_rmpentry(u64 pfn)
>> > +{
>> > + unsigned long pfn_end;
>> > + struct rmpentry *e;
>> > + int level;
>> > +
>> > + e = __snp_lookup_rmpentry(pfn, &level);
>> > + if (!e) {
>>
>> __snp_lookup_rmpentry may return -errno so this should be:
>>
>> if (e != 1)

>Sorry, actually it should be:

> if (IS_ERR_OR_NULL(e)) {

I will fix this accordingly.

>> > +
>> > + while (pfn < pfn_end) {
>> > + e = __snp_lookup_rmpentry(pfn, &level);
>> > + if (!e)
>>
>> if (e != 1)
>>

>and this too:

> if (IS_ERR_OR_NULL(e))

Same here.

Thanks,
Ashish

2022-06-22 18:15:56

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

[Public]

On 6/22/22 07:22, Kalra, Ashish wrote:
>> As I replied previously on the same subject: Architectural implies
>> that it is defined in the APM and shouldn't change in such a way as to
>> not be backward compatible. I probably think the wording here should
>> be architecture independent or more precisely platform independent.
>Yeah, arch-independent and non-architectural are quite different concepts.

>At Intel, at least, when someone says "not architectural" they mean that the behavior is implementation-specific. That, combined with the model/family/stepping gave me the wrong impression about what was going on.

>Some more clarity would be greatly appreciated.

Actually, the PPR for family 19h Model 01h, Rev B1 defines the RMP entry format as below:

2.1.4.2 RMP Entry Format
Architecturally the format of RMP entries are not specified in APM. In order to assist software, the following table specifies select portions of the RMP entry format for this specific product. Each RMP entry is 16B in size and is formatted as follows. Software should not rely on any field definitions not specified in this table and the format of an RMP entry may change in future processors.

Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible. So non-architectural in this context means that it is only defined in our PPR.

So actually this RMP entry definition is platform dependent and will need to be changed for different AMD processors, and that change has to be handled correspondingly in the dump_rmpentry() code.

Thanks,
Ashish

2022-06-22 18:18:02

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

[AMD Official Use Only - General]

>>> /*
>>> * The RMP entry format is not architectural. The format is defined in PPR
>>> @@ -126,6 +128,15 @@ struct snp_guest_platform_data {
>>> u64 secrets_gpa;
>>> };
>>>
>>> +struct rmpupdate {
>>> + u64 gpa;
>>> + u8 assigned;
>>> + u8 pagesize;
>>> + u8 immutable;
>>> + u8 rsvd;
>>> + u32 asid;
>>> +} __packed;

>>I see above it says the RMP entry format isn't architectural; is this 'rmpupdate' structure? If not, how is this going to get handled when we have a couple of SNP capable CPUs with different layouts?

>Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible.
>I probably think the wording here should be architecture independent or more precisely platform independent.

Some more clarity on this:

Actually, the PPR for family 19h Model 01h, Rev B1 defines the RMP entry format as below:

2.1.4.2 RMP Entry Format
Architecturally the format of RMP entries are not specified in APM. In order to assist software, the following table specifies select portions of the RMP entry format for this specific product. Each RMP entry is 16B in size and is formatted as follows. Software should not rely on any field definitions not specified in this table and the format of an RMP entry may change in future processors.

Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible. So non-architectural in this context means that it is only defined in our PPR.

So actually this RMP entry definition is platform dependent and will need to be changed for different AMD processors, and that change has to be handled correspondingly in the dump_rmpentry() code.

Thanks,
Ashish

2022-06-22 18:47:30

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

[AMD Official Use Only - General]

-----Original Message-----
From: Dave Hansen <[email protected]>
Sent: Wednesday, June 22, 2022 1:18 PM
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 11:15, Kalra, Ashish wrote:
> So actually this RPM entry definition is platform dependent and will
> need to be changed for different AMD processors and that change has to
> be handled correspondingly in the dump_rmpentry() code.

>So, if the RMP entry format changes in future processors, how do we make sure that the kernel does not try to use *this* code on those processors?

Functions snp_lookup_rmpentry() and dump_rmpentry() which rely on this structure definition will need to handle it accordingly.

Thanks,
Ashish

2022-06-22 18:47:38

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 11:15, Kalra, Ashish wrote:
> So actually this RPM entry definition is platform dependent and will
> need to be changed for different AMD processors and that change has
> to be handled correspondingly in the dump_rmpentry() code.

So, if the RMP entry format changes in future processors, how do we make
sure that the kernel does not try to use *this* code on those processors?

2022-06-22 18:49:21

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 11:34, Kalra, Ashish wrote:
>> So, if the RMP entry format changes in future processors, how do we
>> make sure that the kernel does not try to use *this* code on those
>> processors?
> Functions snp_lookup_rmpentry() and dump_rmpentry() which rely on
> this structure definition will need to handle it accordingly.

In other words, old kernels will break on new hardware?

I think that needs to be fixed. It should be as simple as a
model/family check, though. If someone (for example) attempts to use
SNP (and thus snp_lookup_rmpentry() and dump_rmpentry()) code on a newer
CPU, the kernel should refuse.

2022-06-22 19:44:58

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

[AMD Official Use Only - General]


-----Original Message-----
From: Dave Hansen <[email protected]>
Sent: Wednesday, June 22, 2022 1:43 PM
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 11:34, Kalra, Ashish wrote:
>>> So, if the RMP entry format changes in future processors, how do we
>>> make sure that the kernel does not try to use *this* code on those
>>> processors?
>> Functions snp_lookup_rmpentry() and dump_rmpentry() which rely on this
>> structure definition will need to handle it accordingly.

>In other words, old kernels will break on new hardware?

>I think that needs to be fixed. It should be as simple as a model/family check, though. If someone (for example) attempts to use SNP (and thus snp_lookup_rmpentry() and dump_rmpentry()) code on a newer CPU, the kernel should refuse.

More specifically, I am thinking of adding RMP entry field accessors that do this CPU model/family check and return the correct field for the processor in use.

Thanks,
Ashish

2022-06-22 20:10:20

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 12:43, Kalra, Ashish wrote:
>> I think that needs to be fixed. It should be as simple as a
>> model/family check, though. If someone (for example) attempts to
>> use SNP (and thus snp_lookup_rmpentry() and dump_rmpentry()) code
>> on a newer CPU, the kernel should refuse.
> More specifically I am thinking of adding RMP entry field accessors
> so that they can do this cpu model/family check and return the
> correct field as per processor architecture.

That will be helpful down the road when there's more than one format.
But, the real issue is that the kernel doesn't *support* a different RMP
format. So, the SNP support should be disabled when encountering a
model/family other than the known good one.
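
A minimal sketch of such a gate, assuming the only known-good part is the Family 19h Model 01h processor named in the PPR excerpt earlier in this thread. The struct, the table, and the function names are hypothetical. A real check would use the kernel's CPU identification data and whatever list of models AMD confirms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU identity; the kernel keeps this in its own structures. */
struct cpu_id_model {
	unsigned int family;
	unsigned int model;
};

/* Parts whose RMP entry layout this code understands (illustrative list). */
static const struct cpu_id_model known_rmp_formats[] = {
	{ 0x19, 0x01 },
};

static bool rmp_format_known(const struct cpu_id_model *cpu)
{
	size_t i;
	size_t n = sizeof(known_rmp_formats) / sizeof(known_rmp_formats[0]);

	for (i = 0; i < n; i++) {
		if (known_rmp_formats[i].family == cpu->family &&
		    known_rmp_formats[i].model == cpu->model)
			return true;
	}
	return false;
}

/*
 * SNP host support is usable only if the hardware advertises it AND the
 * RMP entry format is one we know how to parse. Otherwise refuse up front
 * instead of misreading entries later.
 */
static bool snp_host_usable(const struct cpu_id_model *cpu, bool hw_has_snp)
{
	return hw_has_snp && rmp_format_known(cpu);
}
```

The point of gating at initialization is that later code never has to guess the format: on an unrecognized part the feature is simply off.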

2022-06-22 20:26:06

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

[AMD Official Use Only - General]


From: Dave Hansen <[email protected]>
Sent: Wednesday, June 22, 2022 2:50 PM
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 12:43, Kalra, Ashish wrote:
>>> I think that needs to be fixed. It should be as simple as a
>>> model/family check, though. If someone (for example) attempts to use
>>> SNP (and thus snp_lookup_rmpentry() and dump_rmpentry()) code on a
>>> newer CPU, the kernel should refuse.
>> More specifically I am thinking of adding RMP entry field accessors so
>> that they can do this cpu model/family check and return the correct
>> field as per processor architecture.

>That will be helpful down the road when there's more than one format.
>But, the real issue is that the kernel doesn't *support* a different RMP format. So, the SNP support should be disabled when encountering a model/family other than the known good one.

Yes, that makes sense, will add an additional check in snp_rmptable_init().

Thanks,
Ashish

2022-06-22 21:00:11

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

[Public]

From: Dave Hansen <[email protected]>
Sent: Wednesday, June 22, 2022 2:50 PM
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On 6/22/22 12:43, Kalra, Ashish wrote:
>>> I think that needs to be fixed. It should be as simple as a
>>> model/family check, though. If someone (for example) attempts to
>>> use SNP (and thus snp_lookup_rmpentry() and dump_rmpentry()) code on
>>> a newer CPU, the kernel should refuse.
>> More specifically I am thinking of adding RMP entry field accessors
>> so that they can do this cpu model/family check and return the
>> correct field as per processor architecture.

>That will be helpful down the road when there's more than one format.
>But, the real issue is that the kernel doesn't *support* a different RMP format. So, the SNP support should be disabled when encountering a model/family other than the known good one.

>Yes, that makes sense, will add an additional check in snp_rmptable_init().

Also, we may create an architectural way to read the RMP entry in the future.

Thanks,
Ashish

2022-06-23 20:50:26

by Marc Orr

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

On Mon, Jun 20, 2022 at 4:02 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
> track the owner of each page of memory. Pages of memory can be owned by
> the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
> section 15.36.3 for more detail on RMP.
>
> The RMP table is used to enforce access control to memory. The table itself
> is not directly writable by the software. New CPU instructions (RMPUPDATE,
> PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.
>
> Based on the platform configuration, the BIOS reserves the memory used
> for the RMP table. The start and end address of the RMP table must be
> queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
> RMP_END are not set then disable the SEV-SNP feature.
>
> The SEV-SNP feature is enabled only after the RMP table is successfully
> initialized.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 6 +
> arch/x86/kernel/sev.c | 144 +++++++++++++++++++++++
> 3 files changed, 157 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 36369e76cc63..c1be3091a383 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -68,6 +68,12 @@
> # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
> #endif
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +# define DISABLE_SEV_SNP 0
> +#else
> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -91,7 +97,7 @@
> DISABLE_ENQCMD)
> #define DISABLED_MASK17 0
> #define DISABLED_MASK18 0
> -#define DISABLED_MASK19 0
> +#define DISABLED_MASK19 (DISABLE_SEV_SNP)
> #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
>
> #endif /* _ASM_X86_DISABLED_FEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 9e2e7185fc1d..57a8280e283a 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -507,6 +507,8 @@
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE 0xc0010132
> +#define MSR_AMD64_RMP_END 0xc0010133
>
> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>
> @@ -581,6 +583,10 @@
> #define MSR_AMD64_SYSCFG 0xc0010010
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
> +#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)

nit: The alignment here looks off. The rest of the file left-aligns
the macro definition column under a comment header. The bad alignment
can be viewed on the github version of this patch:
https://github.com/AMDESE/linux/commit/5101daef92f448c046207b701c0c420b1fce3eaf

> #define MSR_K8_INT_PENDING_MSG 0xc0010055
> /* C1E active bits in int pending message */
> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index f01f4550e2c6..3a233b5d47c5 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -22,6 +22,8 @@
> #include <linux/efi.h>
> #include <linux/platform_device.h>
> #include <linux/io.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -38,6 +40,7 @@
> #include <asm/apic.h>
> #include <asm/cpuid.h>
> #include <asm/cmdline.h>
> +#include <asm/iommu.h>
>
> #define DR7_RESET_VALUE 0x400
>
> @@ -57,6 +60,12 @@
> #define AP_INIT_CR0_DEFAULT 0x60000010
> #define AP_INIT_MXCSR_DEFAULT 0x1f80
>
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the
> + * bookkeeping, the range need to be added during the RMP entry lookup.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +
> /* For early boot hypervisor communication in SEV-ES enabled guests */
> static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>
> @@ -69,6 +78,10 @@ static struct ghcb *boot_ghcb __section(".data");
> /* Bitmap of SEV features supported by the hypervisor */
> static u64 sev_hv_features __ro_after_init;
>
> +static unsigned long rmptable_start __ro_after_init;
> +static unsigned long rmptable_end __ro_after_init;
> +
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -2218,3 +2231,134 @@ static int __init snp_init_platform_device(void)
> return 0;
> }
> device_initcall(snp_init_platform_device);
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "SEV-SNP: " fmt
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_SNP_EN;
> + val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> + __snp_enable(smp_processor_id());
> +}
> +
> +static bool get_rmptable_info(u64 *start, u64 *len)
> +{
> + u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
> +
> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> + if (!rmp_base || !rmp_end) {
> + pr_info("Memory for the RMP table has not been reserved by BIOS\n");
> + return false;
> + }
> +
> + rmp_sz = rmp_end - rmp_base + 1;
> +
> + /*
> + * Calculate the amount the memory that must be reserved by the BIOS to
> + * address the full system RAM. The reserved memory should also cover the
> + * RMP table itself.
> + *
> + * See PPR Family 19h Model 01h, Revision B1 section 2.1.4.2 for more
> + * information on memory requirement.
> + */
> + nr_pages = totalram_pages();
> + calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + nr_pages) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> + if (calc_rmp_sz > rmp_sz) {
> + pr_info("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> + calc_rmp_sz, rmp_sz);
> + return false;
> + }
> +
> + *start = rmp_base;
> + *len = rmp_sz;
> +
> + pr_info("RMP table physical address 0x%016llx - 0x%016llx\n", rmp_base, rmp_end);
> +
> + return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)
> +{
> + u64 rmp_base, sz;
> + void *start;
> + u64 val;
> +
> + if (!get_rmptable_info(&rmp_base, &sz))
> + return 1;
> +
> + start = memremap(rmp_base, sz, MEMREMAP_WB);
> + if (!start) {
> + pr_err("Failed to map RMP table 0x%llx+0x%llx\n", rmp_base, sz);
> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled, this can happen if we are coming from
> + * kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */
> + memset(start, 0, sz);
> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* Enable SNP on all CPUs. */
> + on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> + rmptable_start = (unsigned long)start;
> + rmptable_end = rmptable_start + sz;
> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + if (!iommu_sev_snp_supported())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return 1;

Seems odd that we're returning 1 here, rather than 0. I tried to
figure out how the initcall return values are used and failed. My
impression was 0 means success and a negative number means failure.
But maybe this is normal.

> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because before enabling
> + * the SNP feature we need to ensure that IOMMU supports the SEV-SNP feature.
> + * The iommu_sev_snp_support() is used for checking the feature, and it is
> + * available after subsys_initcall().
> + */
> +fs_initcall(snp_rmptable_init);
> --
> 2.25.1
>

2022-06-23 21:02:44

by Marc Orr

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 04/49] x86/sev: set SYSCFG.MFMD

On Mon, Jun 20, 2022 at 4:02 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> SEV-SNP FW >= 1.51 requires that SYSCFG.MFMD must be set.
>
> Subsequent CCP patches will require 1.51 as the minimum SEV-SNP
> firmware version.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/msr-index.h | 3 +++
> arch/x86/kernel/sev.c | 24 ++++++++++++++++++++++++
> 2 files changed, 27 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 57a8280e283a..1e36f16daa56 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -587,6 +587,9 @@
> #define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> #define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
> #define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT 19
> +#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)

nit: Similar to the previous patch, the alignment here doesn't look
right. The bad alignment can be viewed on the github version of this
patch:
https://github.com/AMDESE/linux/commit/6d4469b86f90e67119ff110230857788a0d9dbd0

> +
> #define MSR_K8_INT_PENDING_MSG 0xc0010055
> /* C1E active bits in int pending message */
> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 3a233b5d47c5..25c7feb367f6 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2257,6 +2257,27 @@ static __init void snp_enable(void *arg)
> __snp_enable(smp_processor_id());
> }
>
> +static int __mfdm_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_MFDM;

Can we do this inside `__snp_enable()`, above? Then, we'll execute if
a hotplug event happens as well.

static int __snp_enable(unsigned int cpu)
{
u64 val;

if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return 0;

rdmsrl(MSR_AMD64_SYSCFG, val);

val |= MSR_AMD64_SYSCFG_SNP_EN;
val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
val |= MSR_AMD64_SYSCFG_MFDM;

wrmsrl(MSR_AMD64_SYSCFG, val);

return 0;
}

> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void mfdm_enable(void *arg)
> +{
> + __mfdm_enable(smp_processor_id());
> +}
> +
> static bool get_rmptable_info(u64 *start, u64 *len)
> {
> u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
> @@ -2325,6 +2346,9 @@ static __init int __snp_rmptable_init(void)
> /* Flush the caches to ensure that data is written before SNP is enabled. */
> wbinvd_on_all_cpus();
>
> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> + on_each_cpu(mfdm_enable, NULL, 1);
> +
> /* Enable SNP on all CPUs. */
> on_each_cpu(snp_enable, NULL, 1);
>
> --
> 2.25.1
>

2022-06-23 21:38:49

by Marc Orr

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Mon, Jun 20, 2022 at 4:02 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
> entry for a given page. The RMP entry format is documented in AMD PPR, see
> https://bugzilla.kernel.org/attachment.cgi?id=296015.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 27 ++++++++++++++++++++++++
> arch/x86/kernel/sev.c | 43 ++++++++++++++++++++++++++++++++++++++
> include/linux/sev.h | 30 ++++++++++++++++++++++++++
> 3 files changed, 100 insertions(+)
> create mode 100644 include/linux/sev.h
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 9c2d33f1cfee..cb16f0e5b585 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -9,6 +9,7 @@
> #define __ASM_ENCRYPTED_STATE_H
>
> #include <linux/types.h>
> +#include <linux/sev.h>
> #include <asm/insn.h>
> #include <asm/sev-common.h>
> #include <asm/bootparam.h>
> @@ -84,6 +85,32 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
> +#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +
> +/*
> + * The RMP entry format is not architectural. The format is defined in PPR
> + * Family 19h Model 01h, Rev B1 processor.
> + */
> +struct __packed rmpentry {
> + union {
> + struct {
> + u64 assigned : 1,
> + pagesize : 1,
> + immutable : 1,
> + rsvd1 : 9,
> + gpa : 39,
> + asid : 10,
> + vmsa : 1,
> + validated : 1,
> + rsvd2 : 1;
> + } info;
> + u64 low;
> + };
> + u64 high;
> +};
> +
> +#define rmpentry_assigned(x) ((x)->info.assigned)
> +#define rmpentry_pagesize(x) ((x)->info.pagesize)
>
> #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 25c7feb367f6..59e7ec6b0326 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -65,6 +65,8 @@
> * bookkeeping, the range need to be added during the RMP entry lookup.
> */
> #define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +#define RMPENTRY_SHIFT 8
> +#define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
>
> /* For early boot hypervisor communication in SEV-ES enabled guests */
> static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> @@ -2386,3 +2388,44 @@ static int __init snp_rmptable_init(void)
> * available after subsys_initcall().
> */
> fs_initcall(snp_rmptable_init);
> +
> +static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> + unsigned long vaddr, paddr = pfn << PAGE_SHIFT;
> + struct rmpentry *entry, *large_entry;
> +
> + if (!pfn_valid(pfn))
> + return ERR_PTR(-EINVAL);
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return ERR_PTR(-ENXIO);

nit: I think we should check if SNP is enabled first, before doing
anything else. In other words, I think we should move this check above
the `!pfn_valid()` check.

> +
> + vaddr = rmptable_start + rmptable_page_offset(paddr);
> + if (unlikely(vaddr > rmptable_end))
> + return ERR_PTR(-ENXIO);

nit: It would be nice to use a different error code here, from the SNP
feature check. That way, if this function fails, it's easier to
diagnose where the function failed from the error code.

> +
> + entry = (struct rmpentry *)vaddr;
> +
> + /* Read a large RMP entry to get the correct page level used in RMP entry. */
> + vaddr = rmptable_start + rmptable_page_offset(paddr & PMD_MASK);
> + large_entry = (struct rmpentry *)vaddr;
> + *level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
> +
> + return entry;
> +}
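
To make the two nits above concrete, here is a minimal userspace sketch of the lookup with the feature check first and a distinct error code for each failure. Everything prefixed mock_ is a hypothetical stand-in for kernel state, the choice of -ENODEV and -ERANGE is only an assumption for illustration, and the bounds check also accounts for the 16-byte entry size, which goes beyond what the patch does. The offset math (bookkeeping area plus paddr >> 8) is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch under review. */
#define PAGE_SHIFT                  12
#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000ULL
#define RMPENTRY_SHIFT              8  /* 16-byte entry per 4K page: >> 12, then << 4 */
#define RMPENTRY_SIZE               16 /* two u64 fields: low and high */

/* Hypothetical stand-in for the kernel state the real function consults. */
struct mock_rmptable {
	bool snp_enabled;    /* stands in for cpu_feature_enabled(X86_FEATURE_SEV_SNP) */
	uint64_t max_pfn;    /* stands in for pfn_valid() */
	uint64_t table_size; /* bytes, including the bookkeeping area */
};

/* Byte offset of the RMP entry that covers a physical address. */
static uint64_t rmptable_page_offset(uint64_t paddr)
{
	return RMPTABLE_CPU_BOOKKEEPING_SZ + (paddr >> RMPENTRY_SHIFT);
}

/* Returns 0 and stores the entry offset in *offset, or a negative errno. */
static int mock_lookup_rmpentry(const struct mock_rmptable *t, uint64_t pfn,
				uint64_t *offset)
{
	uint64_t off;

	assert(t && offset);

	/* Feature check first: nothing else is meaningful without SNP. */
	if (!t->snp_enabled)
		return -ENODEV;

	if (pfn >= t->max_pfn)
		return -EINVAL;

	/* A distinct code for "entry falls outside the table" (assumed choice). */
	off = rmptable_page_offset(pfn << PAGE_SHIFT);
	if (off + RMPENTRY_SIZE > t->table_size)
		return -ERANGE;

	*offset = off;
	return 0;
}
```

With this shape, a caller that sees -ENODEV knows SNP is off, -EINVAL points at the pfn, and -ERANGE points at the table bounds.
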
> +
> +/*
> + * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> + * and -errno if there is no corresponding RMP entry.
> + */
> +int snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> + struct rmpentry *e;
> +
> + e = __snp_lookup_rmpentry(pfn, level);
> + if (IS_ERR(e))
> + return PTR_ERR(e);
> +
> + return !!rmpentry_assigned(e);
> +}
> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> diff --git a/include/linux/sev.h b/include/linux/sev.h
> new file mode 100644
> index 000000000000..1a68842789e1
> --- /dev/null
> +++ b/include/linux/sev.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD Secure Encrypted Virtualization
> + *
> + * Author: Brijesh Singh <[email protected]>
> + */
> +
> +#ifndef __LINUX_SEV_H
> +#define __LINUX_SEV_H
> +
> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
> +#define RMPUPDATE_FAIL_OVERLAP 7
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +int snp_lookup_rmpentry(u64 pfn, int *level);
> +int psmash(u64 pfn);
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> +int rmp_make_shared(u64 pfn, enum pg_level level);

nit: I think the declarations for `psmash()`, `rmp_make_private()`,
and `rmp_make_shared()` should be introduced in the patches that have
their definitions.

> +#else
> +static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
> +static inline int psmash(u64 pfn) { return -ENXIO; }
> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
> + bool immutable)
> +{
> + return -ENODEV;
> +}
> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> +
> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
> +#endif /* __LINUX_SEV_H */
> --
> 2.25.1
>

2022-06-23 22:24:38

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

[AMD Official Use Only - General]

>> +static int __init snp_rmptable_init(void) {
>> + if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
>> + return 0;
>> +
>> + if (!iommu_sev_snp_supported())
>> + goto nosnp;
>> +
>> + if (__snp_rmptable_init())
>> + goto nosnp;
>> +
>> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
>> + "x86/rmptable_init:online", __snp_enable, NULL);
>> +
>> + return 0;
>> +
>> +nosnp:
>> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>> + return 1;

>Seems odd that we're returning 1 here, rather than 0. I tried to figure out how the initcall return values are used and failed. My impression was 0 means success and a negative number means failure.
>But maybe this is normal.

I think that initcall return values are typically ignored, but an initcall should still return 0 on success and a negative errno on error. So this should probably be fixed to return something like -ENOSYS instead of 1.
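
A minimal userspace sketch of that convention, assuming the same three checks as snp_rmptable_init(). The mock_platform struct and its fields are hypothetical stand-ins for the kernel checks, and -ENOSYS is just the value suggested above, not a settled choice:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the checks done in snp_rmptable_init(). */
struct mock_platform {
	bool cpu_has_snp;        /* boot_cpu_has(X86_FEATURE_SEV_SNP) */
	bool iommu_supports_snp; /* iommu_sev_snp_supported() */
	bool rmptable_ok;        /* __snp_rmptable_init() succeeded */
};

/* Returns 0 on success (or nothing to do) and a negative errno on failure. */
static int mock_snp_rmptable_init(struct mock_platform *p)
{
	assert(p);

	/* No SNP on this CPU: nothing to set up, which is not an error. */
	if (!p->cpu_has_snp)
		return 0;

	if (!p->iommu_supports_snp || !p->rmptable_ok) {
		/* Mirrors setup_clear_cpu_cap(X86_FEATURE_SEV_SNP). */
		p->cpu_has_snp = false;
		return -ENOSYS;
	}

	return 0;
}
```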

Thanks,
Ashish

2022-06-23 22:40:10

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Wed, Jun 22, 2022, Kalra, Ashish wrote:
> On 6/22/22 12:43, Kalra, Ashish wrote:
> >>> I think that needs to be fixed. It should be as simple as a
> >>> model/family check, though. If someone (for example) attempts to use
> >>> SNP (and thus snp_lookup_rmpentry() and dump_rmpentry()) code on a
> >>> newer CPU, the kernel should refuse.
> >> More specifically I am thinking of adding RMP entry field accessors so
> >> that they can do this cpu model/family check and return the correct
> >> field as per processor architecture.
>
> >That will be helpful down the road when there's more than one format. But,
> >the real issue is that the kernel doesn't *support* a different RMP format.
> >So, the SNP support should be disabled when encountering a model/family
> >other than the known good one.
>
> Yes, that makes sense, will add an additional check in snp_rmptable_init().

And as I suggested in v5[*], bury the microarchitectural struct in sev.c so that
nothing outside of the few bits of SNP code that absolutely need to know the layout
of the struct should even be aware that there's a struct overlay for RMP entries.

[*] https://lore.kernel.org/all/YPCAZaROOHNskGlO@google.com
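
A rough sketch of what that split could look like, written as plain userspace C so it can be compiled on its own. The bit positions are derived from the bitfield widths in the posted struct, assuming LSB-first allocation as GCC does on x86-64. The accessor names and the use of explicit masks instead of a bitfield are assumptions for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the rest of the kernel would see: an opaque type plus accessors. */
struct rmpentry;
bool rmpentry_assigned(const struct rmpentry *e);
bool rmpentry_is_2m(const struct rmpentry *e);
unsigned int rmpentry_asid(const struct rmpentry *e);

/* Private part, visible only in the one file that owns the layout. */
struct rmpentry {
	uint64_t low;
	uint64_t high;
};

#define RMPENTRY_ASSIGNED_BIT 0
#define RMPENTRY_PAGESIZE_BIT 1
#define RMPENTRY_ASID_SHIFT   51 /* 1 + 1 + 1 + 9 + 39 bits precede asid */
#define RMPENTRY_ASID_MASK    0x3ffULL /* 10-bit field */

bool rmpentry_assigned(const struct rmpentry *e)
{
	assert(e);
	return (e->low >> RMPENTRY_ASSIGNED_BIT) & 1;
}

bool rmpentry_is_2m(const struct rmpentry *e)
{
	assert(e);
	return (e->low >> RMPENTRY_PAGESIZE_BIT) & 1;
}

unsigned int rmpentry_asid(const struct rmpentry *e)
{
	assert(e);
	return (unsigned int)((e->low >> RMPENTRY_ASID_SHIFT) & RMPENTRY_ASID_MASK);
}
```

Callers outside the owning file only ever hold a pointer and go through the accessors, so a future format change touches one file.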

2022-06-23 22:43:59

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

[AMD Official Use Only - General]

>> On 6/22/22 12:43, Kalra, Ashish wrote:
>> >>> I think that needs to be fixed. It should be as simple as a
>> >>> model/family check, though. If someone (for example) attempts to
>> >>> use SNP (and thus snp_lookup_rmpentry() and dump_rmpentry()) code
>> >>> on a newer CPU, the kernel should refuse.
>> >> More specifically I am thinking of adding RMP entry field accessors
>> >> so that they can do this cpu model/family check and return the
>> >> correct field as per processor architecture.
>>
>> >That will be helpful down the road when there's more than one format.
>> >But, the real issue is that the kernel doesn't *support* a different RMP format.
>> >So, the SNP support should be disabled when encountering a
>> >model/family other than the known good one.
>>
>> Yes, that makes sense, will add an additional check in snp_rmptable_init().

>And as I suggested in v5[*], bury the microarchitectural struct in sev.c so that nothing outside of the few bits of SNP code that absolutely need to know the layout of the struct should even be aware that there's a struct overlay for RMP entries.

Yes, that's a nice way to hide it from the rest of the kernel which does not require access to this structure anyway, in essence, it becomes a private structure.

Thanks,
Ashish

>[*] https://lore.kernel.org/all/YPCAZaROOHNskGlO@google.com

2022-06-24 14:20:55

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

On Tue, Jun 21, 2022 at 2:17 PM Kalra, Ashish <[email protected]> wrote:
>
> [Public]
>
> Hello Peter,
>
> >> +static int snp_reclaim_pages(unsigned long pfn, unsigned int npages,
> >> +bool locked) {
> >> + struct sev_data_snp_page_reclaim data;
> >> + int ret, err, i, n = 0;
> >> +
> >> + for (i = 0; i < npages; i++) {
>
> >What about setting |n| here too, also the other increments.
>
> >for (i = 0, n = 0; i < npages; i++, n++, pfn++)
>
> Yes that is simpler.
>
> >> + memset(&data, 0, sizeof(data));
> >> + data.paddr = pfn << PAGE_SHIFT;
> >> +
> >> + if (locked)
> >> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> >> + else
> >> + ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM,
> >> + &data, &err);
>
> > Can we change `sev_cmd_mutex` to some sort of nesting lock type? That could clean up this if (locked) code.
>
> > +static inline int rmp_make_firmware(unsigned long pfn, int level) {
> > + return rmp_make_private(pfn, 0, level, 0, true); }
> > +
> > +static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
> > + bool need_reclaim)
>
> >This function can do a lot, and when I read the call sites it's hard to see what it's doing, since we have a combination of arguments that tells us what behavior is happening, some combinations of which are not valid (e.g. to_fw == true and need_reclaim == true is an invalid argument combination).
>
> to_fw is used to make a firmware page and need_reclaim is for freeing the firmware page, so they are going to be mutually exclusive.
>
> I can actually map it quite logically onto the callers:
> snp_alloc_firmware_pages will call with to_fw = true and need_reclaim = false
> and snp_free_firmware_pages will do the opposite, to_fw = false and need_reclaim = true.
>
> That seems straightforward to look at.

This might be a preference thing but I find it not straightforward.
When I am reading through unmap_firmware_writeable() and I see

/* Transition the pre-allocated buffer to the firmware state. */
if (snp_set_rmp_state(__pa(map->host), npages, true, true, false))
return -EFAULT;

I don't actually know what snp_set_rmp_state() is doing unless I go
look at the definition and see what all those booleans mean. This is
unlike the rmp_make_shared() and rmp_make_private() functions, each of
which tells me a lot more about what the function will do just from
the name.
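
To illustrate the readability point, here is a small userspace sketch of thin, named wrappers around the boolean-heavy function. The mock_ function and the recorded last_call are hypothetical and exist only so the sketch compiles without the kernel. The wrapper names follow the ones floated in the quoted suggestion further down:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Records the last request so the sketch can be checked without hardware. */
struct mock_call {
	unsigned long paddr;
	unsigned int npages;
	bool to_fw, locked, need_reclaim;
};
static struct mock_call last_call;

/* Hypothetical stand-in with the same argument list as snp_set_rmp_state(). */
static int mock_snp_set_rmp_state(unsigned long paddr, unsigned int npages,
				  bool to_fw, bool locked, bool need_reclaim)
{
	assert(npages > 0);

	/* The one combination the thread agrees is invalid. */
	if (to_fw && need_reclaim)
		return -EINVAL;

	last_call = (struct mock_call){ paddr, npages, to_fw, locked, need_reclaim };
	return 0;
}

/* Transition pages to the firmware state. */
static int snp_set_rmp_fw_state(unsigned long paddr, unsigned int npages, bool locked)
{
	return mock_snp_set_rmp_state(paddr, npages, true, locked, false);
}

/* Reclaim pages from firmware and hand them back to the host. */
static int snp_set_rmp_release_state(unsigned long paddr, unsigned int npages, bool locked)
{
	return mock_snp_set_rmp_state(paddr, npages, false, locked, true);
}
```

A call site then reads snp_set_rmp_fw_state(__pa(map->host), npages, true), which at least says which direction the transition goes.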


>
> >Also, this for loop over |npages| is duplicated from snp_reclaim_pages(). One possible improvement: in the current
> >snp_reclaim_pages(), if we fail to reclaim a page we assume we cannot reclaim the following pages either, which may cause us to snp_leak_pages() more pages than we actually need to.
>
> Yes that is true.
>
> >What about something like this?
>
> >/* Returns int so that it matches rmp_state_change_func below. */
> >static int snp_leak_page(u64 pfn, enum pg_level level) {
> > memory_failure(pfn, 0);
> > dump_rmpentry(pfn);
> > return 0;
> >}
>
> >static int snp_reclaim_page(u64 pfn, enum pg_level level) {
> > struct sev_data_snp_page_reclaim data = {0};
> > int ret, err;
>
> > data.paddr = pfn << PAGE_SHIFT;
> > ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> > if (ret)
> > goto cleanup;
>
> > ret = rmp_make_shared(pfn, level);
> > if (ret)
> > goto cleanup;
>
> > return 0;
>
> >cleanup:
> > snp_leak_page(pfn, level);
> > return ret;
> >}
>
> >typedef int (*rmp_state_change_func) (u64 pfn, enum pg_level level);
>
> >static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, rmp_state_change_func state_change, rmp_state_change_func cleanup) {
> > unsigned long pfn = paddr >> PAGE_SHIFT;
> > int ret, i, n = 0;
>
> > for (i = 0, n = 0; i < npages; i++, n++, pfn++) {
> > ret = state_change(pfn, PG_LEVEL_4K);
> > if (ret)
> > goto cleanup;
> > }
>
> > return 0;
>
> > cleanup:
> > for (; i >= 0; i--, n--, pfn--) {
> > cleanup(pfn, PG_LEVEL_4K);
> > }
>
> > return ret;
> >}
>
> >Then inside of __snp_alloc_firmware_pages():
>
> >snp_set_rmp_state(paddr, npages, rmp_make_firmware, snp_reclaim_page);
>
> >And inside of __snp_free_firmware_pages():
>
> >snp_set_rmp_state(paddr, npages, snp_reclaim_page, snp_leak_page);
>
> >Just a suggestion feel free to ignore. The readability comment could be addressed much less invasively by just making separate functions for each valid combination of arguments here. Like snp_set_rmp_fw_state(), snp_set_rmp_shared_state(),
> >snp_set_rmp_release_state() or something.
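
For reference, the callback idea quoted above can be exercised in userspace with a mock page array. This is only a sketch under stated assumptions: mock_pages, fail_pfn and the mock_ callbacks are hypothetical, the pg_level argument is dropped for brevity, and the rollback here deliberately undoes only the pages that actually changed state, leaving the page that failed alone:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum page_state { PAGE_SHARED, PAGE_FIRMWARE };

#define NR_MOCK_PAGES 8
static enum page_state mock_pages[NR_MOCK_PAGES]; /* zero-initialized: all shared */
static int64_t fail_pfn = -1; /* pfn at which the mock transition fails, -1 for none */

typedef int (*rmp_state_change_func)(uint64_t pfn);

/* Hypothetical stand-in for rmp_make_firmware(). */
static int mock_make_firmware(uint64_t pfn)
{
	assert(pfn < NR_MOCK_PAGES);
	if ((int64_t)pfn == fail_pfn)
		return -EIO;
	mock_pages[pfn] = PAGE_FIRMWARE;
	return 0;
}

/* Hypothetical stand-in for snp_reclaim_page(). */
static int mock_reclaim(uint64_t pfn)
{
	assert(pfn < NR_MOCK_PAGES);
	mock_pages[pfn] = PAGE_SHARED;
	return 0;
}

/* Apply 'change' to each page; on failure, 'undo' the pages already changed. */
static int set_rmp_state(uint64_t pfn, unsigned int npages,
			 rmp_state_change_func change, rmp_state_change_func undo)
{
	unsigned int i;
	int ret = 0;

	for (i = 0; i < npages; i++) {
		ret = change(pfn + i);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Pages [0, i) changed state; page i is the one that failed. */
	while (i-- > 0)
		undo(pfn + i);

	return ret;
}
```

Whether the failing page itself should also be passed to the cleanup callback depends on what state a failed transition leaves behind, which this mock does not model.
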
>
> >> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int
> >> +order, bool locked) {
> >> + unsigned long npages = 1ul << order, paddr;
> >> + struct sev_device *sev;
> >> + struct page *page;
> >> +
> >> + if (!psp_master || !psp_master->sev_data)
> >> + return NULL;
> >> +
> >> + page = alloc_pages(gfp_mask, order);
> >> + if (!page)
> >> + return NULL;
> >> +
> >> + /* If SEV-SNP is initialized then add the page in RMP table. */
> >> + sev = psp_master->sev_data;
> >> + if (!sev->snp_inited)
> >> + return page;
> >> +
> >> + paddr = __pa((unsigned long)page_address(page));
> >> + if (snp_set_rmp_state(paddr, npages, true, locked, false))
> >> + return NULL;
>
> >So what about the case where snp_set_rmp_state() fails but we were able to reclaim all the pages? Should we be able to signal that to callers so that we could free |page| here? But given this is an error path already, maybe we can optimize this in a follow-up series.
>
> Yes, we should actually tie in to snp_reclaim_pages() success or failure here in the case we were able to successfully unroll some or all of the firmware state change.
>
> > +
> > + return page;
> > +}
> > +
> > +void *snp_alloc_firmware_page(gfp_t gfp_mask) {
> > + struct page *page;
> > +
> > + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> > +
> > + return page ? page_address(page) : NULL; }
> > +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> > +
> > +static void __snp_free_firmware_pages(struct page *page, int order,
> > +bool locked) {
> > + unsigned long paddr, npages = 1ul << order;
> > +
> > + if (!page)
> > + return;
> > +
> > + paddr = __pa((unsigned long)page_address(page));
> > + if (snp_set_rmp_state(paddr, npages, false, locked, true))
> > + return;
>
> > Here we may be able to free some of |page| depending on where inside of snp_set_rmp_state() we failed. But again, given this is an error path already, maybe we can optimize this in a follow-up series.
>
> Yes, we probably should be able to free some of the page(s) depending on how many page(s) got reclaimed in snp_set_rmp_state().
> But these reclamation failures may not be very common, so any failure is indicative of a bigger issue. If a single page fails to reclaim, the same error will likely hit all the subsequent
> pages, so it seems better to follow a simple recovery procedure than to handle a more complex recovery where one chunk of pages is reclaimed and another chunk is not.
>
> Thanks,
> Ashish
>
>
>

2022-06-24 14:35:00

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 26/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Mon, Jun 20, 2022 at 5:08 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
> guest's memory. The data is encrypted with the cryptographic context
> created with the KVM_SEV_SNP_LAUNCH_START.
>
> In addition to inserting data, it can insert two special pages
> into the guest's memory: the secrets page and the CPUID page.
>
> While terminating the guest, reclaim the guest pages added in the RMP
> table. If the reclaim fails, then the pages are no longer safe to be
> released back to the system, so leak them.
>
> For more information see the SEV-SNP specification.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> .../virt/kvm/x86/amd-memory-encryption.rst | 29 +++
> arch/x86/kvm/svm/sev.c | 187 ++++++++++++++++++
> include/uapi/linux/kvm.h | 19 ++
> 3 files changed, 235 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 878711f2dca6..62abd5c1f72b 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -486,6 +486,35 @@ Returns: 0 on success, -negative on error
>
> See the SEV-SNP specification for further detail on the launch input.
>
> +20. KVM_SNP_LAUNCH_UPDATE
> +-------------------------
> +
> +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> +calculates a measurement of the memory contents. The measurement is a signature
> +of the memory contents that can be sent to the guest owner as an attestation
> +that the memory was encrypted correctly by the firmware.
> +
> +Parameters (in): struct kvm_snp_launch_update
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_snp_launch_update {
> + __u64 start_gfn; /* Guest page number to start from. */
> + __u64 uaddr; /* userspace address need to be encrypted */
> + __u32 len; /* length of memory region */
> + __u8 imi_page; /* 1 if memory is part of the IMI */
> + __u8 page_type; /* page type */
> + __u8 vmpl3_perms; /* VMPL3 permission mask */
> + __u8 vmpl2_perms; /* VMPL2 permission mask */
> + __u8 vmpl1_perms; /* VMPL1 permission mask */
> + };
> +
> +See the SEV-SNP spec for further details on how to build the VMPL permission
> +mask and page type.
> +
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 41b83aa6b5f4..b5f0707d7ed6 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -18,6 +18,7 @@
> #include <linux/processor.h>
> #include <linux/trace_events.h>
> #include <linux/hugetlb.h>
> +#include <linux/sev.h>
>
> #include <asm/pkru.h>
> #include <asm/trapnr.h>
> @@ -233,6 +234,49 @@ static void sev_decommission(unsigned int handle)
> sev_guest_decommission(&decommission, NULL);
> }
>
> +static inline void snp_leak_pages(u64 pfn, enum pg_level level)
> +{
> + unsigned int npages = page_level_size(level) >> PAGE_SHIFT;
> +
> + WARN(1, "psc failed pfn 0x%llx pages %d (leaking)\n", pfn, npages);
> +
> + while (npages) {
> + memory_failure(pfn, 0);
> + dump_rmpentry(pfn);
> + npages--;
> + pfn++;
> + }
> +}

Should this be deduplicated with the snp_leak_pages() in "crypto: ccp:
Handle the legacy TMR allocation when SNP is enabled" ?

> +
> +static int snp_page_reclaim(u64 pfn)
> +{
> + struct sev_data_snp_page_reclaim data = {0};
> + int err, rc;
> +
> + data.paddr = __sme_set(pfn << PAGE_SHIFT);
> + rc = snp_guest_page_reclaim(&data, &err);
> + if (rc) {
> + /*
> + * If the reclaim failed, then page is no longer safe
> + * to use.
> + */
> + snp_leak_pages(pfn, PG_LEVEL_4K);
> + }
> +
> + return rc;
> +}
> +
> +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> +{
> + int rc;
> +
> + rc = rmp_make_shared(pfn, level);
> + if (rc && leak)
> + snp_leak_pages(pfn, level);
> +
> + return rc;
> +}
> +
> static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
> {
> struct sev_data_deactivate deactivate;
> @@ -1902,6 +1946,123 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return rc;
> }
>
> +static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct list_head *head = &sev->regions_list;
> + struct enc_region *i;
> +
> + lockdep_assert_held(&kvm->lock);
> +
> + list_for_each_entry(i, head, list) {
> + u64 start = i->uaddr;
> + u64 end = start + i->size;
> +
> + if (start <= hva && end >= (hva + len))
> + return true;
> + }

Given that userspace could load sev->regions_list with any number of
regions of any size, should we add a cond_resched() like in
sev_vm_destroy()?

> +
> + return false;
> +}
> +
> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_update data = {0};
> + struct kvm_sev_snp_launch_update params;
> + unsigned long npages, pfn, n = 0;
> + int *error = &argp->error;
> + struct page **inpages;
> + int ret, i, level;
> + u64 gfn;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (!sev->snp_context)
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + /* Verify that the specified address range is registered. */
> + if (!is_hva_registered(kvm, params.uaddr, params.len))
> + return -EINVAL;
> +
> + /*
> + * The userspace memory is already locked so technically we don't
> + * need to lock it again. Later part of the function needs to know
> + * pfn so call the sev_pin_memory() so that we can get the list of
> + * pages to iterate through.
> + */
> + inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
> + if (!inpages)
> + return -ENOMEM;
> +
> + /*
> + * Verify that all the pages are marked shared in the RMP table before
> + * going further. This is avoid the cases where the userspace may try
> + * updating the same page twice.
> + */
> + for (i = 0; i < npages; i++) {
> + if (snp_lookup_rmpentry(page_to_pfn(inpages[i]), &level) != 0) {
> + sev_unpin_memory(kvm, inpages, npages);
> + return -EFAULT;
> + }
> + }
> +
> + gfn = params.start_gfn;
> + level = PG_LEVEL_4K;
> + data.gctx_paddr = __psp_pa(sev->snp_context);
> +
> + for (i = 0; i < npages; i++) {
> + pfn = page_to_pfn(inpages[i]);
> +
> + ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, level, sev_get_asid(kvm), true);
> + if (ret) {
> + ret = -EFAULT;
> + goto e_unpin;
> + }
> +
> + n++;
> + data.address = __sme_page_pa(inpages[i]);
> + data.page_size = X86_TO_RMP_PG_LEVEL(level);
> + data.page_type = params.page_type;
> + data.vmpl3_perms = params.vmpl3_perms;
> + data.vmpl2_perms = params.vmpl2_perms;
> + data.vmpl1_perms = params.vmpl1_perms;
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
> + if (ret) {
> + /*
> + * If the command failed then need to reclaim the page.
> + */
> + snp_page_reclaim(pfn);
> + goto e_unpin;
> + }
> +
> + gfn++;
> + }
> +
> +e_unpin:
> + /* Content of memory is updated, mark pages dirty */
> + for (i = 0; i < n; i++) {

Since |n| is not only a loop variable but actually carries the number
of private pages over to e_unpin can we use a more descriptive name?
How about something like 'nprivate_pages'?

> + set_page_dirty_lock(inpages[i]);
> + mark_page_accessed(inpages[i]);
> +
> + /*
> + * If its an error, then update RMP entry to change page ownership
> + * to the hypervisor.
> + */
> + if (ret)
> + host_rmp_make_shared(pfn, level, true);
> + }
> +
> + /* Unlock the user pages */
> + sev_unpin_memory(kvm, inpages, npages);
> +
> + return ret;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -1995,6 +2156,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SNP_LAUNCH_START:
> r = snp_launch_start(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SNP_LAUNCH_UPDATE:
> + r = snp_launch_update(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> @@ -2113,6 +2277,29 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
> static void __unregister_enc_region_locked(struct kvm *kvm,
> struct enc_region *region)
> {
> + unsigned long i, pfn;
> + int level;
> +
> + /*
> + * The guest memory pages are assigned in the RMP table. Unassign it
> + * before releasing the memory.
> + */
> + if (sev_snp_guest(kvm)) {
> + for (i = 0; i < region->npages; i++) {
> + pfn = page_to_pfn(region->pages[i]);
> +
> + if (!snp_lookup_rmpentry(pfn, &level))
> + continue;
> +
> + cond_resched();
> +
> + if (level > PG_LEVEL_4K)
> + pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +
> + host_rmp_make_shared(pfn, level, true);
> + }
> + }
> +
> sev_unpin_memory(kvm, region->pages, region->npages);
> list_del(&region->list);
> kfree(region);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0cb119d66ae5..9b36b07414ea 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1813,6 +1813,7 @@ enum sev_cmd_id {
> /* SNP specific commands */
> KVM_SEV_SNP_INIT,
> KVM_SEV_SNP_LAUNCH_START,
> + KVM_SEV_SNP_LAUNCH_UPDATE,
>
> KVM_SEV_NR_MAX,
> };
> @@ -1929,6 +1930,24 @@ struct kvm_sev_snp_launch_start {
> __u8 pad[6];
> };
>
> +#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
> +#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
> +#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
> +#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
> +#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
> +
> +struct kvm_sev_snp_launch_update {
> + __u64 start_gfn;
> + __u64 uaddr;
> + __u32 len;
> + __u8 imi_page;
> + __u8 page_type;
> + __u8 vmpl3_perms;
> + __u8 vmpl2_perms;
> + __u8 vmpl1_perms;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.25.1
>

2022-06-24 14:45:31

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 24/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

>
> +19. KVM_SNP_LAUNCH_START
> +------------------------
> +
> +The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
> +context for the SEV-SNP guest. To create the encryption context, user must
> +provide a guest policy, migration agent (if any) and guest OS visible
> +workarounds value as defined SEV-SNP specification.
> +
> +Parameters (in): struct kvm_snp_launch_start
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_snp_launch_start {
> + __u64 policy; /* Guest policy to use. */
> + __u64 ma_uaddr; /* userspace address of migration agent */
> + __u8 ma_en; /* 1 if the migtation agent is enabled */

migration

> + __u8 imi_en; /* set IMI to 1. */
> + __u8 gosvw[16]; /* guest OS visible workarounds */
> + };
> +
> +See the SEV-SNP specification for further detail on the launch input.
> +
> References
> ==========
>

>
> +static int snp_decommission_context(struct kvm *kvm)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_decommission data = {};
> + int ret;
> +
> + /* If context is not created then do nothing */
> + if (!sev->snp_context)
> + return 0;
> +
> + data.gctx_paddr = __sme_pa(sev->snp_context);
> + ret = snp_guest_decommission(&data, NULL);

Do we have a similar race like in sev_unbind_asid() with DEACTIVATE
and WBINVD/DF_FLUSH? The SNP_DECOMMISSION spec looks quite similar to
DEACTIVATE.

> + if (WARN_ONCE(ret, "failed to release guest context"))
> + return ret;
> +
> + /* free the context page now */
> + snp_free_firmware_page(sev->snp_context);
> + sev->snp_context = NULL;
> +
> + return 0;
> +}
> +
> void sev_vm_destroy(struct kvm *kvm)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

2022-06-24 15:24:01

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

On Mon, Jun 20, 2022 at 5:11 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
> and unmaps it just before VM-entry. This long-lived GHCB map is used by
> the VMGEXIT handler through accessors such as ghcb_{set_get}_xxx().
>
> A long-lived GHCB map can cause issue when SEV-SNP is enabled. When
> SEV-SNP is enabled the mapped GPA needs to be protected against a page
> state change.
>
> To eliminate the long-lived GHCB mapping, update the GHCB sync operations
> to explicitly map the GHCB before access and unmap it after access is
> complete. This requires that the setting of the GHCBs sw_exit_info_{1,2}
> fields be done during sev_es_sync_to_ghcb(), so create two new fields in
> the vcpu_svm struct to hold these values when required to be set outside
> of the GHCB mapping.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
> arch/x86/kvm/svm/svm.c | 12 ++--
> arch/x86/kvm/svm/svm.h | 24 +++++++-
> 3 files changed, 111 insertions(+), 56 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 01ea257e17d6..c70f3f7e06a8 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2823,15 +2823,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
> kvfree(svm->sev_es.ghcb_sa);
> }
>
> +static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
> +{
> + struct vmcb_control_area *control = &svm->vmcb->control;
> + u64 gfn = gpa_to_gfn(control->ghcb_gpa);
> +
> + if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
> + /* Unable to map GHCB from guest */
> + pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
> + return -EFAULT;
> + }
> +
> + return 0;
> +}

There is a perf cost to this suggestion, but it might make accessing
the GHCB safer for KVM. Have you thought about just using
kvm_read_guest() or copy_from_user() to fully copy the GHCB out into a
KVM-owned buffer, then copying it back before the VMRUN? That way KVM
doesn't need to guard against page state changes on the GHCBs; that
could be a perf improvement in a follow-up.

Since we cannot unmap GHCBs I don't think UPM will help here so we
probably want to make these patches safe against malicious guests
making GHCBs private. But maybe UPM does help?
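
To make the copy-out idea concrete, here is a minimal userspace sketch,
not KVM code. The shared_page argument stands in for the guest-visible
GHCB page, and struct ghcb_snapshot, ghcb_copy_in() and ghcb_copy_out()
are hypothetical names. In KVM the two copies would go through
kvm_read_guest()/kvm_write_guest() and could fail, which this model
does not capture.

```c
#include <stdint.h>
#include <string.h>

#define GHCB_SIZE 4096 /* assumption: the GHCB occupies one 4K page */

/* Hypothetical host-owned snapshot of the guest's GHCB page. */
struct ghcb_snapshot {
	uint8_t data[GHCB_SIZE];
	int valid; /* nonzero once a copy-in has succeeded */
};

/* Copy the shared page into the host-owned buffer at exit time. */
static int ghcb_copy_in(struct ghcb_snapshot *snap, const uint8_t *shared_page)
{
	if (!snap || !shared_page)
		return -1;
	memcpy(snap->data, shared_page, GHCB_SIZE);
	snap->valid = 1;
	return 0;
}

/* Write the host-owned buffer back just before re-entering the guest. */
static int ghcb_copy_out(struct ghcb_snapshot *snap, uint8_t *shared_page)
{
	if (!snap || !shared_page || !snap->valid)
		return -1;
	memcpy(shared_page, snap->data, GHCB_SIZE);
	snap->valid = 0;
	return 0;
}
```

Between the two calls the handler would only touch snap->data, so a
guest changing the state of the shared page cannot affect what the host
reads.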

2022-06-24 16:32:19

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 42/49] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On Mon, Jun 20, 2022 at 5:13 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> Version 2 of GHCB specification added the support for two SNP Guest
> Request Message NAE events. The events allows for an SEV-SNP guest to
> make request to the SEV-SNP firmware through hypervisor using the
> SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.
>
> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
> difference of an additional certificate blob that can be passed through
> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
> provides snp_guest_ext_guest_request() that is used by the KVM to get
> both the report and certificate data at once.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 196 +++++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm/svm.h | 2 +
> 2 files changed, 192 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 7fc0fad87054..089af21a4efe 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -343,6 +343,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
> spin_lock_init(&sev->psc_lock);
> ret = sev_snp_init(&argp->error);
> + mutex_init(&sev->guest_req_lock);
> } else {
> ret = sev_platform_init(&argp->error);
> }
> @@ -1884,23 +1885,39 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>
> static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> {
> + void *context = NULL, *certs_data = NULL, *resp_page = NULL;

Is the NULL setting here unnecessary since all of these are set via
functions snp_alloc_firmware_page(), kmalloc(), and
snp_alloc_firmware_page() respectively?

> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> struct sev_data_snp_gctx_create data = {};
> - void *context;
> int rc;
>
> + /* Allocate memory used for the certs data in SNP guest request */
> + certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> + if (!certs_data)
> + return NULL;

I think we want to use kzalloc() here to ensure we never give the
guest uninitialized kernel memory.
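
A userspace sketch of the concern, using calloc() as a stand-in for
kzalloc() and a stand-in buffer size. The helper names are hypothetical.
The point is that when the producer fills only part of the buffer but
whole pages are copied to the consumer, the tail has to be zero already.

```c
#include <stdlib.h>
#include <string.h>

#define BLOB_MAX_SIZE 16384 /* stand-in value; not taken from the patch */

/* Allocate the certificate buffer zero-filled, as kzalloc() would. */
static unsigned char *alloc_certs_buf(void)
{
	return calloc(1, BLOB_MAX_SIZE);
}

/*
 * Model a producer that writes only `len` bytes of the buffer. Returns the
 * number of bytes a page-granular copy would hand to the consumer, or 0 if
 * the data does not fit.
 */
static size_t fill_certs(unsigned char *buf, const void *src, size_t len,
			 size_t page_size)
{
	if (len > BLOB_MAX_SIZE)
		return 0;
	memcpy(buf, src, len);
	/* Round up to whole pages, as a page-count based copy does. */
	return (len + page_size - 1) / page_size * page_size;
}
```

With malloc() in place of calloc(), the bytes between len and the
rounded-up size would be whatever the allocator left there.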

> +
> /* Allocate memory for context page */
> context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> if (!context)
> - return NULL;
> + goto e_free;
> +
> + /* Allocate a firmware buffer used during the guest command handling. */
> + resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> + if (!resp_page)
> + goto e_free;

|resp_page| doesn't appear to be used anywhere?

>
> data.gctx_paddr = __psp_pa(context);
> rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> - if (rc) {
> - snp_free_firmware_page(context);
> - return NULL;
> - }
> + if (rc)
> + goto e_free;
> +
> + sev->snp_certs_data = certs_data;
>
> return context;
> +
> +e_free:
> + snp_free_firmware_page(context);
> + kfree(certs_data);
> + return NULL;
> }
>
> static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -2565,6 +2582,8 @@ static int snp_decommission_context(struct kvm *kvm)
> snp_free_firmware_page(sev->snp_context);
> sev->snp_context = NULL;
>
> + kfree(sev->snp_certs_data);
> +
> return 0;
> }
>
> @@ -3077,6 +3096,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> case SVM_VMGEXIT_HV_FEATURES:
> case SVM_VMGEXIT_PSC:
> + case SVM_VMGEXIT_GUEST_REQUEST:
> + case SVM_VMGEXIT_EXT_GUEST_REQUEST:
> break;
> default:
> reason = GHCB_ERR_INVALID_EVENT;
> @@ -3502,6 +3523,155 @@ static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
> return rc ? map_to_psc_vmgexit_code(rc) : 0;
> }
>
> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
> + struct sev_data_snp_guest_request *data,
> + gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + kvm_pfn_t req_pfn, resp_pfn;
> + struct kvm_sev_info *sev;
> +
> + sev = &to_kvm_svm(kvm)->sev_info;

This is normally done at declaration in this file. Why not here?

struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

> +
> + if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
> + return SEV_RET_INVALID_PARAM;
> +
> + req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> + if (is_error_noslot_pfn(req_pfn))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> + if (is_error_noslot_pfn(resp_pfn))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + data->gctx_paddr = __psp_pa(sev->snp_context);
> + data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> + data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> +
> + return 0;
> +}
> +
> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
> +{
> + u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
> + int ret;
> +
> + ret = snp_page_reclaim(pfn);
> + if (ret)
> + *rc = SEV_RET_INVALID_ADDRESS;

Do we need a different error code here? This means the page the guest
gives us is now "stuck" in the FW-owned state. How would the guest know
this is the case? We return the exact same error in
snp_setup_guest_buf() if the resp_gpa isn't page aligned, so now if
the guest ever sees a SEV_RET_INVALID_ADDRESS I think its only safe
option is to either try to page_state_change it to a known state or
mark it as unusable memory.

> +
> + ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (ret)
> + *rc = SEV_RET_INVALID_ADDRESS;

Ditto here, I think we need some way to signal to the guest what state
this page is in on return to guest execution.

Also, these errors shadow FW successes. That means the guest's
guest-request sequence numbers are now out of sync, so this VMPCK is
unusable lest the guest risk reusing the AES IV (which would break
confidentiality/integrity). Should we have a way to signal to the
guest that the FW successfully ran the command but we could not change
the page states back correctly, so the guest should increment its
sequence numbers?
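
One possible shape for such a signal, sketched under assumptions: report
the hypervisor's status and the firmware's status in separate halves of
the 64-bit value returned in sw_exit_info_2, so that a host-side cleanup
failure no longer overwrites a firmware success. The 32/32 split, the
status codes and all function names are hypothetical; none of them come
from this series.

```c
#include <stdint.h>

/* Hypothetical host-side status codes, for illustration only. */
enum host_status {
	HOST_OK = 0,
	HOST_NOT_ISSUED = 1,     /* setup failed, firmware never ran */
	HOST_CLEANUP_FAILED = 2, /* firmware ran, page state not restored */
};

/* Pack host status in the upper 32 bits and firmware status in the lower. */
static uint64_t pack_guest_req_status(uint32_t host, uint32_t fw)
{
	return ((uint64_t)host << 32) | fw;
}

static uint32_t status_host(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

static uint32_t status_fw(uint64_t v)
{
	return (uint32_t)v;
}

/*
 * Guest-side decision: advance the sequence number whenever the firmware
 * actually ran and reported success, even if the host half reports that
 * cleanup failed afterwards.
 */
static int guest_must_bump_seqno(uint64_t v)
{
	return status_host(v) != HOST_NOT_ISSUED && status_fw(v) == 0;
}
```

With this split the guest can tell "firmware succeeded, response page is
in an unknown state" apart from "firmware rejected the request".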

> +}
> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct sev_data_snp_guest_request data = {0};
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + struct kvm_sev_info *sev;
> + unsigned long rc;
> + int err;
> +
> + if (!sev_snp_guest(vcpu->kvm)) {
> + rc = SEV_RET_INVALID_GUEST;
> + goto e_fail;
> + }
> +
> + sev = &to_kvm_svm(kvm)->sev_info;

Ditto, why not do this above?

> +
> + mutex_lock(&sev->guest_req_lock);
> +
> + rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> + if (rc)
> + goto unlock;
> +
> + rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> + if (rc)
> + /* use the firmware error code */
> + rc = err;
> +
> + snp_cleanup_guest_buf(&data, &rc);
> +
> +unlock:
> + mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> + svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct sev_data_snp_guest_request req = {0};
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + unsigned long data_npages;
> + struct kvm_sev_info *sev;
> + unsigned long rc, err;
> + u64 data_gpa;
> +
> + if (!sev_snp_guest(vcpu->kvm)) {
> + rc = SEV_RET_INVALID_GUEST;
> + goto e_fail;
> + }
> +
> + sev = &to_kvm_svm(kvm)->sev_info;
> +
> + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> +
> + if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> + rc = SEV_RET_INVALID_ADDRESS;
> + goto e_fail;
> + }
> +
> + /* Verify that requested blob will fit in certificate buffer */
> + if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
> + rc = SEV_RET_INVALID_PARAM;
> + goto e_fail;
> + }
> +
> + mutex_lock(&sev->guest_req_lock);
> +
> + rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
> + if (rc)
> + goto unlock;
> +
> + rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> + &data_npages, &err);
> + if (rc) {
> + /*
> + * If buffer length is small then return the expected
> + * length in rbx.
> + */
> + if (err == SNP_GUEST_REQ_INVALID_LEN)
> + vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> +
> + /* pass the firmware error code */
> + rc = err;
> + goto cleanup;
> + }
> +
> + /* Copy the certificate blob in the guest memory */
> + if (data_npages &&
> + kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> + rc = SEV_RET_INVALID_ADDRESS;

Since at this point the PSP FW has correctly executed the command and
incremented the VMPCK sequence number, I think we need another error
signal here. As written, this tells the guest the PSP had an error, so
the guest will not know whether the VMPCK sequence number should be
incremented.

> +
> +cleanup:
> + snp_cleanup_guest_buf(&req, &rc);
> +
> +unlock:
> + mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> + svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3753,6 +3923,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
> svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> break;
> }
> + case SVM_VMGEXIT_GUEST_REQUEST: {
> + snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
> +
> + ret = 1;
> + break;
> + }
> + case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
> + snp_handle_ext_guest_request(svm,
> + control->exit_info_1,
> + control->exit_info_2);
> +
> + ret = 1;
> + break;
> + }
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 3fd95193ed8d..3be24da1a743 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -98,6 +98,8 @@ struct kvm_sev_info {
> u64 snp_init_flags;
> void *snp_context; /* SNP guest context page */
> spinlock_t psc_lock;
> + void *snp_certs_data;
> + struct mutex guest_req_lock;
> };
>
> struct kvm_svm {
> --
> 2.25.1
>

2022-06-24 16:37:00

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 47/49] *fix for stale per-cpu pointer due to cond_resched during ghcb mapping

On Mon, Jun 20, 2022 at 5:15 PM Ashish Kalra <[email protected]> wrote:
>
> From: Michael Roth <[email protected]>
>
> Signed-off-by: Michael Roth <[email protected]>

Can you add a commit description here? Is this a fix for existing
SEV-ES support, or should it be folded into the patch in this
series that introduces the issue?

> ---
> arch/x86/kvm/svm/svm.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index fced6ea423ad..f78e3b1bde0e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1352,7 +1352,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
> static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> - struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
> + struct svm_cpu_data *sd;
>
> if (sev_es_guest(vcpu->kvm))
> sev_es_unmap_ghcb(svm);
> @@ -1360,6 +1360,10 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> if (svm->guest_state_loaded)
> return;
>
> + /* sev_es_unmap_ghcb() can resched, so grab per-cpu pointer afterward. */
> + barrier();
> + sd = per_cpu(svm_data, vcpu->cpu);
> +
> /*
> * Save additional host state that will be restored on VMEXIT (sev-es)
> * or subsequent vmload of host save area.
> --
> 2.25.1
>

2022-06-24 16:50:27

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 47/49] *fix for stale per-cpu pointer due to cond_resched during ghcb mapping

[AMD Official Use Only - General]

Hello Peter,
>>
>> From: Michael Roth <[email protected]>
>>
>> Signed-off-by: Michael Roth <[email protected]>

>Can you add a commit description here? Is this a fix for existing SEV-ES support or should this be incorporated into a patch in this series which adds this issue?

This fixes issues caused by preemption happening in svm_prepare_switch_to_guest() when kvm_vcpu_map() is called to map in the GHCB before
entering the guest.

This is a temporary fix. What we need to do is avoid getting preempted after vcpu_enter_guest() has disabled preemption. We have some ideas about
using the gfn_to_pfn_cache() infrastructure to re-use the already mapped GHCB at guest exit, so that we can avoid calling kvm_vcpu_map() to re-map the
GHCB.
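
A small userspace model of the ordering problem the patch works around.
It is only a sketch: current_cpu models vcpu->cpu, per_cpu_data[] models
the per-CPU svm_data, and may_resched() models a call that can sleep and
resume on another CPU. All names are made up for illustration.

```c
#define NR_CPUS 4

struct cpu_data {
	int saved_state;
};

static struct cpu_data per_cpu_data[NR_CPUS]; /* models a per-CPU variable */
static int current_cpu;                       /* models vcpu->cpu */

/* Models a call that may sleep and resume on another CPU. */
static void may_resched(int next_cpu)
{
	current_cpu = next_cpu;
}

/* Buggy order: pointer taken before the call that can migrate the task. */
static struct cpu_data *lookup_before(int next_cpu)
{
	struct cpu_data *sd = &per_cpu_data[current_cpu];

	may_resched(next_cpu);
	return sd; /* may point at the previous CPU's data */
}

/* Fixed order: take the pointer only after the last reschedule point. */
static struct cpu_data *lookup_after(int next_cpu)
{
	struct cpu_data *sd;

	may_resched(next_cpu);
	sd = &per_cpu_data[current_cpu];
	return sd;
}
```

lookup_after() mirrors what the temporary fix does by moving the
per_cpu() lookup below sev_es_unmap_ghcb().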

Thanks,
Ashish

> ---
> arch/x86/kvm/svm/svm.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index fced6ea423ad..f78e3b1bde0e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1352,7 +1352,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
> static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> - struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
> + struct svm_cpu_data *sd;
>
> if (sev_es_guest(vcpu->kvm))
> sev_es_unmap_ghcb(svm);
> @@ -1360,6 +1360,10 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> if (svm->guest_state_loaded)
> return;
>
> + /* sev_es_unmap_ghcb() can resched, so grab per-cpu pointer afterward. */
> + barrier();
> + sd = per_cpu(svm_data, vcpu->cpu);
> +
> /*
> * Save additional host state that will be restored on VMEXIT (sev-es)
> * or subsequent vmload of host save area.
> --
> 2.25.1
>

2022-06-24 18:20:37

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 24/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

>> +static int snp_decommission_context(struct kvm *kvm)
>> +{
>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> + struct sev_data_snp_decommission data = {};
>> + int ret;
>> +
>> + /* If context is not created then do nothing */
>> + if (!sev->snp_context)
>> + return 0;
>> +
>> + data.gctx_paddr = __sme_pa(sev->snp_context);
>> + ret = snp_guest_decommission(&data, NULL);

>Do we have a similar race like in sev_unbind_asid() with DEACTIVATE and WBINVD/DF_FLUSH? The SNP_DECOMMISSION spec looks quite similar to DEACTIVATE.

Yes, SNP_DECOMMISSION also marks the ASID as invalid and requires a WBINVD/DF_FLUSH before the ASID is re-used/re-cycled, so we need to guard against
DECOMMISSION and ASID recycling happening at the same time. We can reuse the same RWSEM (sev_deactivate_lock) here too.

Thanks,
Ashish

> + if (WARN_ONCE(ret, "failed to release guest context"))
> + return ret;
> +
> + /* free the context page now */
> + snp_free_firmware_page(sev->snp_context);
> + sev->snp_context = NULL;
> +
> + return 0;
> +}
> +
> void sev_vm_destroy(struct kvm *kvm)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

2022-06-24 20:16:11

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

Hello Peter,

>> From: Brijesh Singh <[email protected]>
>>
>> On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB
>> GPA, and unmaps it just before VM-entry. This long-lived GHCB map is
>> used by the VMGEXIT handler through accessors such as ghcb_{set_get}_xxx().
>>
>> A long-lived GHCB map can cause issue when SEV-SNP is enabled. When
>> SEV-SNP is enabled the mapped GPA needs to be protected against a page
>> state change.
>>
>> To eliminate the long-lived GHCB mapping, update the GHCB sync
>> operations to explicitly map the GHCB before access and unmap it after
>> access is complete. This requires that the setting of the GHCBs
>> sw_exit_info_{1,2} fields be done during sev_es_sync_to_ghcb(), so
>> create two new fields in the vcpu_svm struct to hold these values when
>> required to be set outside of the GHCB mapping.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/kvm/svm/sev.c | 131 ++++++++++++++++++++++++++---------------
>> arch/x86/kvm/svm/svm.c | 12 ++--
>> arch/x86/kvm/svm/svm.h | 24 +++++++-
>> 3 files changed, 111 insertions(+), 56 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 01ea257e17d6..c70f3f7e06a8 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2823,15 +2823,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>> kvfree(svm->sev_es.ghcb_sa);
>> }
>>
>> +static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
>> +{
>> + struct vmcb_control_area *control = &svm->vmcb->control;
>> + u64 gfn = gpa_to_gfn(control->ghcb_gpa);
>> +
>> + if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
>> + /* Unable to map GHCB from guest */
>> + pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
>> + return -EFAULT;
>> + }
>> +
>> + return 0;
>> +}

>There is a perf cost to this suggestion but it might make accessing the GHCB safer for KVM. Have you thought about just using
>kvm_read_guest() or copy_from_user() to fully copy out the GHCB into a KVM owned buffer, then copying it back before the VMRUN. That way the KVM doesn't need to guard against page_state_changes on the GHCBs, that could be a perf improvement in a follow up.

Along with the performance cost you mentioned, the main concern here is the GHCB write-back path (copying it back) before VMRUN: this will again hit the issue we currently have with
kvm_write_guest() / copy_to_user() when we use it to sync the scratch buffer back to the GHCB. This can fail if guest RAM is mapped using huge page(s) and the RMP entry is 4K. Please refer to the patch/fix
mentioned below; kvm_write_guest() can potentially fail before VMRUN in the SNP case:

commit 94ed878c2669532ebae8eb9b4503f19aa33cd7aa
Author: Ashish Kalra <[email protected]>
Date: Mon Jun 6 22:28:01 2022 +0000

KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb

Using kvm_write_guest() to sync the GHCB scratch buffer can fail
due to host mapping being 2M, but RMP being 4K. The page fault handling
in do_user_addr_fault() fails to split the 2M page to handle RMP fault due
to it being called here in a non-preemptible context. Instead use
the already kernel mapped ghcb to sync the scratch buffer when the
scratch buffer is contained within the GHCB.

Thanks,
Ashish

>Since we cannot unmap GHCBs I don't think UPM will help here so we probably want to make these patches safe against malicious guests making GHCBs private. But maybe UPM does help?

2022-06-27 19:07:32

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 42/49] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

[Public]

Hello Peter,

-----Original Message-----
From: Peter Gonda <[email protected]>
Sent: Friday, June 24, 2022 11:25 AM
To: Kalra, Ashish <[email protected]>
Subject: Re: [PATCH Part2 v6 42/49] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On Mon, Jun 20, 2022 at 5:13 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> Version 2 of GHCB specification added the support for two SNP Guest
> Request Message NAE events. The events allows for an SEV-SNP guest to
> make request to the SEV-SNP firmware through hypervisor using the
> SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.
>
> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
> difference of an additional certificate blob that can be passed
> through the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP
> driver provides snp_guest_ext_guest_request() that is used by the KVM
> to get both the report and certificate data at once.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 196 +++++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm/svm.h | 2 +
> 2 files changed, 192 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 7fc0fad87054..089af21a4efe 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -343,6 +343,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
> spin_lock_init(&sev->psc_lock);
> ret = sev_snp_init(&argp->error);
> + mutex_init(&sev->guest_req_lock);
> } else {
> ret = sev_platform_init(&argp->error);
> }
> @@ -1884,23 +1885,39 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>
> static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> {
> + void *context = NULL, *certs_data = NULL, *resp_page = NULL;

>Is the NULL setting here unnecessary since all of these are set via functions snp_alloc_firmware_page(), kmalloc(), and
>snp_alloc_firmware_page() respectively?

Yes, they don't need to be set to NULL.

> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> struct sev_data_snp_gctx_create data = {};
> - void *context;
> int rc;
>
> + /* Allocate memory used for the certs data in SNP guest request */
> + certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> + if (!certs_data)
> + return NULL;

>I think we want to use kzalloc() here to ensure we never give the guest uninitialized kernel memory.

Yes.

> +
> /* Allocate memory for context page */
> context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> if (!context)
> - return NULL;
> + goto e_free;
> +
> + /* Allocate a firmware buffer used during the guest command handling. */
> + resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> + if (!resp_page)
> + goto e_free;

>|resp_page| doesn't appear to be used anywhere?

>
> data.gctx_paddr = __psp_pa(context);
> rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> - if (rc) {
> - snp_free_firmware_page(context);
> - return NULL;
> - }
> + if (rc)
> + goto e_free;
> +
> + sev->snp_certs_data = certs_data;
>
> return context;
> +
> +e_free:
> + snp_free_firmware_page(context);
> + kfree(certs_data);
> + return NULL;
> }
>
> static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -2565,6 +2582,8 @@ static int snp_decommission_context(struct kvm *kvm)
> snp_free_firmware_page(sev->snp_context);
> sev->snp_context = NULL;
>
> + kfree(sev->snp_certs_data);
> +
> return 0;
> }
>
> @@ -3077,6 +3096,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> case SVM_VMGEXIT_HV_FEATURES:
> case SVM_VMGEXIT_PSC:
> + case SVM_VMGEXIT_GUEST_REQUEST:
> + case SVM_VMGEXIT_EXT_GUEST_REQUEST:
> break;
> default:
> reason = GHCB_ERR_INVALID_EVENT;
> @@ -3502,6 +3523,155 @@ static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
> return rc ? map_to_psc_vmgexit_code(rc) : 0;
> }
>
> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
> + struct sev_data_snp_guest_request *data,
> + gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + kvm_pfn_t req_pfn, resp_pfn;
> + struct kvm_sev_info *sev;
> +
> + sev = &to_kvm_svm(kvm)->sev_info;

>This is normally done at declaration in this file. Why not here?

> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
Ok.

> +
> + if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
> + return SEV_RET_INVALID_PARAM;
> +
> + req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> + if (is_error_noslot_pfn(req_pfn))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> + if (is_error_noslot_pfn(resp_pfn))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
> + return SEV_RET_INVALID_ADDRESS;
> +
> + data->gctx_paddr = __psp_pa(sev->snp_context);
> + data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> + data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> +
> + return 0;
> +}
> +
> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
> +{
> + u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
> + int ret;
> +
> + ret = snp_page_reclaim(pfn);
> + if (ret)
> + *rc = SEV_RET_INVALID_ADDRESS;

>Do we need a different error code here? This means the page the guest gives us is now "stuck" in the FW owned state. How would the guest know this is the case? We return the exact same error in snp_setup_guest_buf() if the resp_gpa isn't page aligned so now if the guest ever sees a SEV_RET_INVALID_ADDRESS I think its only safe option is to either try and page_state_change it to a known state or mark it as unusable memory.

If snp_page_reclaim() fails, it invokes snp_leak_pages(), which flags a memory failure and triggers the memory recovery mechanisms; those should drop the pages or mark them unusable.

> +
> + ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (ret)
> + *rc = SEV_RET_INVALID_ADDRESS;

>Ditto here I think we need some way to signal to the guest what state this page is in on return to guest execution.

>Also these errors shadow over FW successes, this means the guest's guest-request-sequence-numbers are now out of sync meaning this VMPCK is unusable lest the guest risk reusing the AES IV (which would break the confidentiality/integrity). Should we have a way to signal to the guest the FW has successfully run your command but we could not change the page states back correctly, so the guest should increment their sequence numbers?

Yes, that is an important observation: the sequence numbers are now out of sync.
But since this is an error path, what guarantees that the next guest message request will succeed completely? Isn't it better to let the
FW reject any subsequent guest messages once it has detected that the sequence numbers are out of sync?

> +}
> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct sev_data_snp_guest_request data = {0};
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + struct kvm_sev_info *sev;
> + unsigned long rc;
> + int err;
> +
> + if (!sev_snp_guest(vcpu->kvm)) {
> + rc = SEV_RET_INVALID_GUEST;
> + goto e_fail;
> + }
> +
> + sev = &to_kvm_svm(kvm)->sev_info;

>Ditto why not do this above?
Ok.

> +
> + mutex_lock(&sev->guest_req_lock);
> +
> + rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> + if (rc)
> + goto unlock;
> +
> + rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> + if (rc)
> + /* use the firmware error code */
> + rc = err;
> +
> + snp_cleanup_guest_buf(&data, &rc);
> +
> +unlock:
> + mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> + svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> + struct sev_data_snp_guest_request req = {0};
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + struct kvm *kvm = vcpu->kvm;
> + unsigned long data_npages;
> + struct kvm_sev_info *sev;
> + unsigned long rc, err;
> + u64 data_gpa;
> +
> + if (!sev_snp_guest(vcpu->kvm)) {
> + rc = SEV_RET_INVALID_GUEST;
> + goto e_fail;
> + }
> +
> + sev = &to_kvm_svm(kvm)->sev_info;
> +
> + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> +
> + if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> + rc = SEV_RET_INVALID_ADDRESS;
> + goto e_fail;
> + }
> +
> + /* Verify that requested blob will fit in certificate buffer */
> + if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
> + rc = SEV_RET_INVALID_PARAM;
> + goto e_fail;
> + }
> +
> + mutex_lock(&sev->guest_req_lock);
> +
> + rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
> + if (rc)
> + goto unlock;
> +
> + rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> + &data_npages, &err);
> + if (rc) {
> + /*
> + * If buffer length is small then return the expected
> + * length in rbx.
> + */
> + if (err == SNP_GUEST_REQ_INVALID_LEN)
> + vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> +
> + /* pass the firmware error code */
> + rc = err;
> + goto cleanup;
> + }
> +
> + /* Copy the certificate blob in the guest memory */
> + if (data_npages &&
> + kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> + rc = SEV_RET_INVALID_ADDRESS;

>Since at this point the PSP FW has correctly executed the command and incremented the VMPCK sequence number I think we need another error signal here since this will tell the guest the PSP had an error so it will not know if the VMPCK sequence number should be incremented.

As above, this is an error path, so what guarantees that the next guest message request will succeed completely? Isn't it better to let the
FW reject any subsequent guest messages once it has detected that the sequence numbers are out of sync?

> +
> +cleanup:
> + snp_cleanup_guest_buf(&req, &rc);
> +
> +unlock:
> + mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> + svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3753,6 +3923,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
> svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> break;
> }
> + case SVM_VMGEXIT_GUEST_REQUEST: {
> + snp_handle_guest_request(svm, control->exit_info_1,
> + control->exit_info_2);
> +
> + ret = 1;
> + break;
> + }
> + case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
> + snp_handle_ext_guest_request(svm,
> + control->exit_info_1,
> + control->exit_info_2);
> +
> + ret = 1;
> + break;
> + }
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 3fd95193ed8d..3be24da1a743 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -98,6 +98,8 @@ struct kvm_sev_info {
> u64 snp_init_flags;
> void *snp_context; /* SNP guest context page */
> spinlock_t psc_lock;
> + void *snp_certs_data;
> + struct mutex guest_req_lock;
> };
>
> struct kvm_svm {
> --
> 2.25.1
>

2022-06-28 10:52:49

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

* Kalra, Ashish ([email protected]) wrote:
> [AMD Official Use Only - General]
>
> >>> /*
> >>> * The RMP entry format is not architectural. The format is defined in PPR
> >>> @@ -126,6 +128,15 @@ struct snp_guest_platform_data {
> >>> u64 secrets_gpa;
> >>> };
> >>>
> >>> +struct rmpupdate {
> >>> + u64 gpa;
> >>> + u8 assigned;
> >>> + u8 pagesize;
> >>> + u8 immutable;
> >>> + u8 rsvd;
> >>> + u32 asid;
> >>> +} __packed;
>
> >>I see above it says the RMP entry format isn't architectural; is this 'rmpupdate' structure? If not how is this going to get handled when we have a couple of SNP capable CPUs with different layouts?
>
> >Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible.
> >I probably think the wording here should be architecture independent or more precisely platform independent.
>
> Some more clarity on this:
>
> Actually, the PPR for family 19h Model 01h, Rev B1 defines the RMP entry format as below:
>
> 2.1.4.2 RMP Entry Format
> Architecturally the format of RMP entries are not specified in APM. In order to assist software, the following table specifies select portions of the RMP entry format for this specific product. Each RMP entry is 16B in size and is formatted as follows. Software should not rely on any field definitions not specified in this table and the format of an RMP entry may change in future processors.
>
> Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible. So non-architectural in this context means that it is only defined in our PPR.
>
> So actually this RMP entry definition is platform dependent and will need to be changed for different AMD processors and that change has to be handled correspondingly in the dump_rmpentry() code.

You'll need a way to make that fail cleanly when run on a newer CPU
with different layout, and a way to build kernels that can handle
more than one layout.

Dave

> Thanks,
> Ashish
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
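
One way to read the two requirements above (fail cleanly on an unknown CPU,
and let one kernel carry more than one layout) is a small table of decoders
keyed by family/model. The following is only a userspace sketch under stated
assumptions: struct rmp_info, struct rmp_layout, rmp_decode_example() and
rmp_find_layout() are hypothetical names that do not come from the series,
and the bit positions in the decoder are illustrative, not taken from any
PPR. The only facts borrowed from the thread are the 16-byte entry size and
"family 19h Model 01h".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded, layout-independent view of an RMP entry (hypothetical type). */
struct rmp_info {
	bool assigned;
	uint8_t pagesize;
	uint32_t asid;
};

/* One decoder per raw layout; an RMP entry is 16 bytes, i.e. two u64 words. */
typedef void (*rmp_decode_fn)(const uint64_t raw[2], struct rmp_info *out);

/*
 * Illustrative bit positions only. They are NOT taken from any PPR and
 * exist solely so that the dispatch below has something to call.
 */
static void rmp_decode_example(const uint64_t raw[2], struct rmp_info *out)
{
	assert(raw != NULL && out != NULL);
	out->assigned = raw[0] & 1;
	out->pagesize = (raw[0] >> 1) & 1;
	out->asid = (uint32_t)(raw[1] & 0x3ff);
}

/* Maps a family and an inclusive model range to the decoder for that layout. */
struct rmp_layout {
	uint8_t family;
	uint8_t model_min;
	uint8_t model_max;
	rmp_decode_fn decode;
};

static const struct rmp_layout rmp_layouts[] = {
	/* Family 19h Model 01h, the only part named in the quoted PPR text. */
	{ 0x19, 0x01, 0x01, rmp_decode_example },
};

/* Returns NULL for an unknown CPU so the caller can refuse to enable SNP. */
static const struct rmp_layout *rmp_find_layout(uint8_t family, uint8_t model)
{
	for (size_t i = 0; i < sizeof(rmp_layouts) / sizeof(rmp_layouts[0]); i++) {
		const struct rmp_layout *l = &rmp_layouts[i];

		if (l->family == family &&
		    model >= l->model_min && model <= l->model_max)
			return l;
	}
	return NULL;
}
```

With this shape, supporting a second layout would mean adding a decoder and
a table row, and an unknown CPU yields NULL instead of a misdecoded entry.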

2022-06-28 13:33:30

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 36/49] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT

* Ashish Kalra ([email protected]) wrote:
> From: Brijesh Singh <[email protected]>
>
> SEV-SNP guests are required to perform a GHCB GPA registration. Before
> using a GHCB GPA for a vCPU the first time, a guest must register the
> vCPU GHCB GPA. If hypervisor can work with the guest requested GPA then
> it must respond back with the same GPA otherwise return -1.
>
> On VMEXIT, Verify that GHCB GPA matches with the registered value. If a
> mismatch is detected then abort the guest.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev-common.h | 8 ++++++++
> arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.h | 7 +++++++
> 3 files changed, 42 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 539de6b93420..0a9055cdfae2 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -59,6 +59,14 @@
> #define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
> #define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
>
> +/* Preferred GHCB GPA Request */
> +#define GHCB_MSR_PREF_GPA_REQ 0x010
> +#define GHCB_MSR_GPA_VALUE_POS 12
> +#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)

Are the magic 51s in here fixed?

Dave

> +#define GHCB_MSR_PREF_GPA_RESP 0x011
> +#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
> +
> /* GHCB GPA Register */
> #define GHCB_MSR_REG_GPA_REQ 0x012
> #define GHCB_MSR_REG_GPA_REQ_VAL(v) \
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index c70f3f7e06a8..6de48130e414 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3331,6 +3331,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
> break;
> }
> + case GHCB_MSR_PREF_GPA_REQ: {
> + set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
> + GHCB_MSR_GPA_VALUE_POS);
> + set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
> + GHCB_MSR_INFO_POS);
> + break;
> + }
> + case GHCB_MSR_REG_GPA_REQ: {
> + u64 gfn;
> +
> + gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
> + GHCB_MSR_GPA_VALUE_POS);
> +
> + svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
> +
> + set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
> + GHCB_MSR_GPA_VALUE_POS);
> + set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
> + GHCB_MSR_INFO_POS);
> + break;
> + }
> case GHCB_MSR_TERM_REQ: {
> u64 reason_set, reason_code;
>
> @@ -3381,6 +3402,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> + /* SEV-SNP guest requires that the GHCB GPA must be registered */
> + if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
> + vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
> + return -EINVAL;
> + }
> +
> ret = sev_es_validate_vmgexit(svm, &exit_code);
> if (ret)
> return ret;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c80352c9c0d6..54ff56cb6125 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -206,6 +206,8 @@ struct vcpu_sev_es_state {
> */
> u64 ghcb_sw_exit_info_1;
> u64 ghcb_sw_exit_info_2;
> +
> + u64 ghcb_registered_gpa;
> };
>
> struct vcpu_svm {
> @@ -334,6 +336,11 @@ static inline bool sev_snp_guest(struct kvm *kvm)
> return sev_es_guest(kvm) && sev->snp_active;
> }
>
> +static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
> +{
> + return svm->sev_es.ghcb_registered_gpa == val;
> +}
> +
> static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
> {
> vmcb->control.clean = 0;
> --
> 2.25.1
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
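
For readers following the question about the 51 in GENMASK_ULL(51, 0): the
sketch below shows how a POS/MASK pair like the one in the patch packs a
value into a 64-bit MSR. It assumes the GHCB MSR is 64 bits wide with the
low 12 bits carrying the request/response code, so the value field is
64 - 12 = 52 bits, i.e. bits 51:0 after shifting. The MSR_INFO_* values and
the helper names msr_get_bits()/msr_set_bits() are assumptions made for this
example; they mirror the call shape of the quoted code but are not the
kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel macro: bits h..l set, inclusive. */
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Values taken from the quoted patch. */
#define MSR_GPA_VALUE_POS	12
#define MSR_GPA_VALUE_MASK	GENMASK_ULL(51, 0)

/* Assumption for this sketch: the info code lives in bits 11:0. */
#define MSR_INFO_POS		0
#define MSR_INFO_MASK		GENMASK_ULL(11, 0)

/* Extract the field described by (mask, pos) from msr. */
static inline uint64_t msr_get_bits(uint64_t msr, uint64_t mask, unsigned int pos)
{
	assert(pos < 64);
	return (msr >> pos) & mask;
}

/* Return msr with the field described by (mask, pos) replaced by value. */
static inline uint64_t msr_set_bits(uint64_t msr, uint64_t value,
				    uint64_t mask, unsigned int pos)
{
	assert(pos < 64);
	msr &= ~(mask << pos);
	msr |= (value & mask) << pos;
	return msr;
}
```

Under these assumptions the 51 is tied to the 12-bit shift rather than being
an independent constant: a different POS would need a different mask width.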

2022-06-28 17:59:30

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

[AMD Official Use Only - General]

Hello Dave,

-----Original Message-----
From: Dr. David Alan Gilbert <[email protected]>
Sent: Tuesday, June 28, 2022 5:51 AM
To: Kalra, Ashish <[email protected]>
Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

* Kalra, Ashish ([email protected]) wrote:
> [AMD Official Use Only - General]
>
> >>> /*
> >>> * The RMP entry format is not architectural. The format is defined in PPR
> >>> @@ -126,6 +128,15 @@ struct snp_guest_platform_data {
> >>> u64 secrets_gpa;
> >>> };
> >>>
> >>> +struct rmpupdate {
> >>> + u64 gpa;
> >>> + u8 assigned;
> >>> + u8 pagesize;
> >>> + u8 immutable;
> >>> + u8 rsvd;
> >>> + u32 asid;
> >>> +} __packed;
>
> >>I see above it says the RMP entry format isn't architectural; is this 'rmpupdate' structure? If not how is this going to get handled when we have a couple of SNP capable CPUs with different layouts?
>
> >Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible.
> >I probably think the wording here should be architecture independent or more precisely platform independent.
>
> Some more clarity on this:
>
> Actually, the PPR for family 19h Model 01h, Rev B1 defines the RMP entry format as below:
>
> 2.1.4.2 RMP Entry Format
> Architecturally the format of RMP entries are not specified in APM. In order to assist software, the following table specifies select portions of the RMP entry format for this specific product. Each RMP entry is 16B in size and is formatted as follows. Software should not rely on any field definitions not specified in this table and the format of an RMP entry may change in future processors.
>
> Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible. So non-architectural in this context means that it is only defined in our PPR.
>
> So actually this RMP entry definition is platform dependent and will need to be changed for different AMD processors and that change has to be handled correspondingly in the dump_rmpentry() code.

> You'll need a way to make that fail cleanly when run on a newer CPU with different layout, and a way to build kernels that can handle more than one layout.

Yes, I will add a check for the CPU family/model, as follows:

static int __init snp_rmptable_init(void)
{
+ int family, model;

if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
return 0;

+ family = boot_cpu_data.x86;
+ model = boot_cpu_data.x86_model;

+ /*
+ * The RMP table entry format is not architectural: it can vary by processor and
+ * is defined by the per-processor PPR. Restrict SNP support to the known CPU
+ * family and models for which the RMP table entry format is currently defined.
+ */
+ if (family != 0x19 || model > 0xaf)
+ goto nosnp;
+

This way SNP will only be enabled on the platforms whose PPR defines this RMP entry
format. This will work for Milan and Genoa as of now.

Additionally, as per Sean's suggestion, I will be moving the RMP structure definition to sev.c,
which will make it a private structure that is not exposed to other parts of the kernel.

Also, in the future we will have an architectural interface to read the RMP table entry;
we will first check for its availability and, if it is not available, fall back to the RMP table
entry structure definition.

Thanks,
Ashish

2022-06-28 19:01:36

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

* Kalra, Ashish ([email protected]) wrote:
> [AMD Official Use Only - General]
>
> Hello Dave,
>
> -----Original Message-----
> From: Dr. David Alan Gilbert <[email protected]>
> Sent: Tuesday, June 28, 2022 5:51 AM
> To: Kalra, Ashish <[email protected]>
> Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
>
> * Kalra, Ashish ([email protected]) wrote:
> > [AMD Official Use Only - General]
> >
> > >>> /*
> > >>> * The RMP entry format is not architectural. The format is defined in PPR
> > >>> @@ -126,6 +128,15 @@ struct snp_guest_platform_data {
> > >>> u64 secrets_gpa;
> > >>> };
> > >>>
> > >>> +struct rmpupdate {
> > >>> + u64 gpa;
> > >>> + u8 assigned;
> > >>> + u8 pagesize;
> > >>> + u8 immutable;
> > >>> + u8 rsvd;
> > >>> + u32 asid;
> > >>> +} __packed;
> >
> > >>I see above it says the RMP entry format isn't architectural; is this 'rmpupdate' structure? If not how is this going to get handled when we have a couple of SNP capable CPUs with different layouts?
> >
> > >Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible.
> > >I probably think the wording here should be architecture independent or more precisely platform independent.
> >
> > Some more clarity on this:
> >
> > Actually, the PPR for family 19h Model 01h, Rev B1 defines the RMP entry format as below:
> >
> > 2.1.4.2 RMP Entry Format
> > Architecturally the format of RMP entries are not specified in APM. In order to assist software, the following table specifies select portions of the RMP entry format for this specific product. Each RMP entry is 16B in size and is formatted as follows. Software should not rely on any field definitions not specified in this table and the format of an RMP entry may change in future processors.
> >
> > Architectural implies that it is defined in the APM and shouldn't change in such a way as to not be backward compatible. So non-architectural in this context means that it is only defined in our PPR.
> >
> > So actually this RMP entry definition is platform dependent and will need to be changed for different AMD processors and that change has to be handled correspondingly in the dump_rmpentry() code.
>
> > You'll need a way to make that fail cleanly when run on a newer CPU with different layout, and a way to build kernels that can handle more than one layout.
>
> Yes, I will be adding a check for CPU family/model as following :
>
> static int __init snp_rmptable_init(void)
> {
> + int family, model;
>
> if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
> return 0;
>
> + family = boot_cpu_data.x86;
> + model = boot_cpu_data.x86_model;
>
> + /*
> + * RMP table entry format is not architectural and it can vary by processor and
> + * is defined by the per-processor PPR. Restrict SNP support on the known CPU
> + * model and family for which the RMP table entry format is currently defined for.
> + */
> + if (family != 0x19 || model > 0xaf)
> + goto nosnp;

Please add a print there to say why you're not enabling SNP.

It would be great if your firmware could give you an 'rmpentry version';
then if a new model came out that happened to have the same layout,
everything would just carry on working by checking that rather than
the actual family/model.

> +
>
> This way SNP will only be enabled specifically on the platforms for which this RMP entry
> format is defined in those processor's PPR. This will work for Milan and Genoa as of now.
>
> Additionally as per Sean's suggestion, I will be moving the RMP structure definition to sev.c,
> which will make it a private structure and not exposed to other parts of the kernel.
>
> Also in the future we will have an architectural interface to read the RMP table entry,
> we will first check for it's availability and if not available fall back to the RMP table
> entry structure definition.

Dave

> Thanks,
> Ashish
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2022-06-28 19:06:44

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On 6/28/22 10:57, Kalra, Ashish wrote:
> + /*
> + * RMP table entry format is not architectural and it can vary by processor and
> + * is defined by the per-processor PPR. Restrict SNP support on the known CPU
> + * model and family for which the RMP table entry format is currently defined for.
> + */
> + if (family != 0x19 || model > 0xaf)
> + goto nosnp;
> +
>
> This way SNP will only be enabled specifically on the platforms for which this RMP entry
> format is defined in those processor's PPR. This will work for Milan and Genoa as of now.

At some point, it would be really nice if the AMD side of things could
work to kick the magic number habit on these things. This:

arch/x86/include/asm/intel-family.h

has been really handy. It lets you do things like

grep INTEL_FAM6_SKYLAKE arch/x86

That's a *LOT* more precise than:

egrep -i '0x5E|94' arch/x86
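
As a rough illustration of what kicking the magic-number habit could look
like for the proposed check, here is a sketch that gives the 0x19 / 0xaf
values names and returns a printable reason when SNP would be left disabled
(which also covers the earlier request for a message). AMD_FAM19H,
AMD_FAM19H_RMP_MODEL_MAX and snp_unsupported_reason() are hypothetical names
invented for this example, not existing kernel symbols; only the numeric
values and the comparison come from the snippet quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the magic numbers in the quoted check. */
#define AMD_FAM19H			0x19
#define AMD_FAM19H_RMP_MODEL_MAX	0xaf

/* The model is compared as an 8-bit value below. */
static_assert(AMD_FAM19H_RMP_MODEL_MAX <= 0xff, "model limit must fit in 8 bits");

/*
 * Returns NULL when the RMP entry format is known for this CPU, or a short
 * reason string that the caller can print before disabling SNP.
 */
static const char *snp_unsupported_reason(uint8_t family, uint8_t model)
{
	if (family != AMD_FAM19H)
		return "RMP entry format not defined for this CPU family";
	if (model > AMD_FAM19H_RMP_MODEL_MAX)
		return "RMP entry format not defined for this CPU model";
	return NULL;
}
```

A caller would print the returned string and take the nosnp path when it is
non-NULL, and the named constants then become greppable in the way described
above.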

2022-06-29 18:18:56

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 26/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

[Public]

>> +static inline void snp_leak_pages(u64 pfn, enum pg_level level) {
>> + unsigned int npages = page_level_size(level) >> PAGE_SHIFT;
>> +
>> + WARN(1, "psc failed pfn 0x%llx pages %d (leaking)\n", pfn,
>> + npages);
>> +
>> + while (npages) {
>> + memory_failure(pfn, 0);
>> + dump_rmpentry(pfn);
>> + npages--;
>> + pfn++;
>> + }
>> +}

>Should this be deduplicated with the snp_leak_pages() in "crypto: ccp:
>Handle the legacy TMR allocation when SNP is enabled" ?

Yes, probably should.

>> +static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
>> +{
>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> + struct list_head *head = &sev->regions_list;
>> + struct enc_region *i;
>> +
>> + lockdep_assert_held(&kvm->lock);
>> +
>> + list_for_each_entry(i, head, list) {
>> + u64 start = i->uaddr;
>> + u64 end = start + i->size;
>> +
>> + if (start <= hva && end >= (hva + len))
>> + return true;
>> + }

>Given that userspace could load sev->regions_list with any number of regions of any size, should we add a cond_resched() like in sev_vm_destroy()?

Actually, is_hva_registered() is also called from the PSC handler with the kvm->lock mutex held. Even though it is a mutex, I am not sure
it is a good idea to call cond_resched() while holding kvm->lock.

>> +e_unpin:
>> + /* Content of memory is updated, mark pages dirty */
>> + for (i = 0; i < n; i++) {

>Since |n| is not only a loop variable but actually carries the number of private pages over to e_unpin can we use a more descriptive name?
>How about something like 'nprivate_pages'?

Yes.

Thanks,
Ashish
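
The lookup in is_hva_registered() quoted above is a linear walk over the
registered regions, which is why its cost grows with however many regions
userspace registers. Below is a minimal userspace sketch of the same
containment test, assuming a plain array in place of the kernel list.
struct region, range_within() and hva_registered() are hypothetical names,
and the subtraction form is a variation on the quoted start/end comparison,
chosen here so that hva + len cannot wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct enc_region: a registered [uaddr, uaddr + size) range. */
struct region {
	uint64_t uaddr;
	uint64_t size;
};

/* True if [hva, hva + len) lies entirely inside [start, start + size). */
static bool range_within(uint64_t start, uint64_t size, uint64_t hva, uint64_t len)
{
	uint64_t off;

	if (hva < start)
		return false;

	/* Offset of hva into the region; no addition, so no wraparound. */
	off = hva - start;
	return off <= size && len <= size - off;
}

/* Linear scan, one comparison per registered region. */
static bool hva_registered(const struct region *regions, size_t n,
			   uint64_t hva, uint64_t len)
{
	assert(regions != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (range_within(regions[i].uaddr, regions[i].size, hva, len))
			return true;
	}
	return false;
}
```

The scan touches every region in the worst case, which is the cost the
cond_resched() question is about; whether rescheduling is acceptable under
kvm->lock is a separate matter that this sketch does not address.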

2022-06-29 19:20:32

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 42/49] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

[Public]


>> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
>> +{
>> + struct sev_data_snp_guest_request req = {0};
>> + struct kvm_vcpu *vcpu = &svm->vcpu;
>> + struct kvm *kvm = vcpu->kvm;
>> + unsigned long data_npages;
>> + struct kvm_sev_info *sev;
>> + unsigned long rc, err;
>> + u64 data_gpa;
>> +
>> + if (!sev_snp_guest(vcpu->kvm)) {
>> + rc = SEV_RET_INVALID_GUEST;
>> + goto e_fail;
>> + }
>> +
>> + sev = &to_kvm_svm(kvm)->sev_info;
>> +
>> + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>> + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
>> +
>> + if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>> + rc = SEV_RET_INVALID_ADDRESS;
>> + goto e_fail;
>> + }
>> +
>> + /* Verify that requested blob will fit in certificate buffer */
>> + if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
>> + rc = SEV_RET_INVALID_PARAM;
>> + goto e_fail;
>> + }
>> +
>> + mutex_lock(&sev->guest_req_lock);
>> +
>> + rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
>> + if (rc)
>> + goto unlock;
>> +
>> + rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
>> + &data_npages, &err);
>> + if (rc) {
>> + /*
>> + * If buffer length is small then return the expected
>> + * length in rbx.
>> + */
>> + if (err == SNP_GUEST_REQ_INVALID_LEN)
>> + vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
>> +
>> + /* pass the firmware error code */
>> + rc = err;
>> + goto cleanup;
>> + }
>> +
>> + /* Copy the certificate blob in the guest memory */
>> + if (data_npages &&
>> + kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
>> + rc = SEV_RET_INVALID_ADDRESS;

>>Since at this point the PSP FW has correctly executed the command and incremented the VMPCK sequence number I think we need another error signal here since this will tell the guest the PSP had an error so it will not know if the VMPCK sequence number should be incremented.

>Similarly as above, as this is an error path, so what's the guarantee that the next guest message request will succeed completely, isn’t it better to let the
>FW reject any subsequent guest messages once it has detected that the sequence numbers are out of sync ?

Alternatively, we could return SEV_RET_INVALID_PAGE_STATE or SEV_RET_INVALID_PAGE_OWNER here, but that still does not tell the guest
that the FW executed the command successfully, that the error occurred during the cleanup/result phase, and that it needs to increment the VMPCK sequence number. Nothing in the SNP FW API spec is defined to report this kind of failure to the guest. As I mentioned earlier, this probably indicates
a bigger system failure, and it is better to let the FW reject subsequent guest messages/requests once it has detected that the sequence numbers are out of sync.

Thanks,
Ashish
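
The disagreement above is easier to follow with a toy model of the two
counters. This is only an illustration under simplifying assumptions: the
"firmware" accepts a request when the guest's sequence number equals its own
and then increments it, and the host-side copy of the certificate blob can
fail after the firmware has already run. None of the names (toy_fw,
toy_guest, toy_request()) or the exact increment rule come from the SNP
firmware ABI or the GHCB specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_fw    { uint64_t seqno; };	/* firmware's view of the counter */
struct toy_guest { uint64_t seqno; };	/* guest's view of the counter */

enum toy_status {
	TOY_OK,
	TOY_FW_REJECTED,	/* counters out of sync, request refused */
	TOY_HOST_COPY_FAILED,	/* firmware ran, host failed afterwards */
};

/*
 * One guest request. If the host fails after the firmware has executed the
 * command, the guest only sees an error and keeps its old sequence number,
 * while the firmware has already moved on.
 */
static enum toy_status toy_request(struct toy_fw *fw, struct toy_guest *g,
				   bool host_copy_fails)
{
	assert(fw != NULL && g != NULL);

	if (g->seqno != fw->seqno)
		return TOY_FW_REJECTED;

	fw->seqno++;			/* command executed by the firmware */

	if (host_copy_fails)
		return TOY_HOST_COPY_FAILED;

	g->seqno++;			/* guest saw success and keeps in step */
	return TOY_OK;
}
```

In this model a single post-firmware failure leaves the guest one step
behind, and every later request is rejected, which is the "let the FW reject
subsequent messages" outcome described above.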

2022-07-07 20:09:11

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

On Fri, Jun 24, 2022 at 2:14 PM Kalra, Ashish <[email protected]> wrote:
>
> [AMD Official Use Only - General]
>
> Hello Peter,
>
> >> From: Brijesh Singh <[email protected]>
> >>
> >> On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB
> >> GPA, and unmaps it just before VM-entry. This long-lived GHCB map is
> >> used by the VMGEXIT handler through accessors such as ghcb_{set,get}_xxx().
> >>
> >> A long-lived GHCB map can cause issue when SEV-SNP is enabled. When
> >> SEV-SNP is enabled the mapped GPA needs to be protected against a page
> >> state change.
> >>
> >> To eliminate the long-lived GHCB mapping, update the GHCB sync
> >> operations to explicitly map the GHCB before access and unmap it after
> >> access is complete. This requires that the setting of the GHCBs
> >> sw_exit_info_{1,2} fields be done during sev_es_sync_to_ghcb(), so
> >> create two new fields in the vcpu_svm struct to hold these values when
> >> required to be set outside of the GHCB mapping.
> >>
> >> Signed-off-by: Brijesh Singh <[email protected]>
> >> ---
> >> arch/x86/kvm/svm/sev.c | 131
> >> ++++++++++++++++++++++++++---------------
> >> arch/x86/kvm/svm/svm.c | 12 ++--
> >> arch/x86/kvm/svm/svm.h | 24 +++++++-
> >> 3 files changed, 111 insertions(+), 56 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index
> >> 01ea257e17d6..c70f3f7e06a8 100644
> >> --- a/arch/x86/kvm/svm/sev.c
> >> +++ b/arch/x86/kvm/svm/sev.c
> >> @@ -2823,15 +2823,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
> >> kvfree(svm->sev_es.ghcb_sa);
> >> }
> >>
> >> +static inline int svm_map_ghcb(struct vcpu_svm *svm, struct
> >> +kvm_host_map *map) {
> >> + struct vmcb_control_area *control = &svm->vmcb->control;
> >> + u64 gfn = gpa_to_gfn(control->ghcb_gpa);
> >> +
> >> + if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
> >> + /* Unable to map GHCB from guest */
> >> + pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
> >> + return -EFAULT;
> >> + }
> >> +
> >> + return 0;
> >> +}
>
> >There is a perf cost to this suggestion, but it might make accessing the GHCB safer for KVM. Have you thought about just using
> >kvm_read_guest() or copy_from_user() to fully copy out the GHCB into a KVM-owned buffer, then copying it back before the VMRUN? That way KVM doesn't need to guard against page state changes on the GHCBs; that could be a perf improvement in a follow-up.
>
> Along with the performance costs you mentioned, the main concern here will be the GHCB write-back path (copying it back) before VMRUN: this will again hit the issue we have currently with
> kvm_write_guest() / copy_to_user(), when we use it to sync the scratch buffer back to GHCB. This can fail if guest RAM is mapped using huge-page(s) and RMP is 4K. Please refer to the patch/fix
> mentioned below, kvm_write_guest() potentially can fail before VMRUN in case of SNP :
>
> commit 94ed878c2669532ebae8eb9b4503f19aa33cd7aa
> Author: Ashish Kalra <[email protected]>
> Date: Mon Jun 6 22:28:01 2022 +0000
>
> KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb
>
> Using kvm_write_guest() to sync the GHCB scratch buffer can fail
> due to host mapping being 2M, but RMP being 4K. The page fault handling
> in do_user_addr_fault() fails to split the 2M page to handle RMP fault due
> to it being called here in a non-preemptible context. Instead use
> the already kernel mapped ghcb to sync the scratch buffer when the
> scratch buffer is contained within the GHCB.

Ah, I didn't see that issue; thanks for the pointer.

The patch description says "When SEV-SNP is enabled the mapped GPA
needs to be protected against a page state change." That is because, if
the guest were to convert the GHCB page to private while the host is
using the GHCB, the host could get an RMP violation, right? That RMP
violation would cause the host to crash unless we use some
copy_to_user()-type protections. I don't see any mechanism in this
patch that adds the page state change protection discussed. Can't
another vCPU still convert the GHCB to private?

I was wrong about the importance of this, though. [email protected] walked me
through how UPM will solve this issue, so no worries about this until
the series is rebased onto UPM.

>
> Thanks,
> Ashish
>
> >Since we cannot unmap GHCBs I don't think UPM will help here so we probably want to make these patches safe against malicious guests making GHCBs private. But maybe UPM does help?

2022-07-07 20:42:44

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

[AMD Official Use Only - General]

Hello Peter,

>> >There is a perf cost to this suggestion, but it might make accessing
>> >the GHCB safer for KVM. Have you thought about just using
>> >kvm_read_guest() or copy_from_user() to fully copy out the GHCB into a KVM-owned buffer, then copying it back before the VMRUN? That way KVM doesn't need to guard against page state changes on the GHCBs; that could be a perf improvement in a follow-up.
>>
>> Along with the performance costs you mentioned, the main concern here
>> will be the GHCB write-back path (copying it back) before VMRUN: this
>> will again hit the issue we have currently with
>> kvm_write_guest() / copy_to_user(), when we use it to sync the scratch
>> buffer back to GHCB. This can fail if guest RAM is mapped using huge-page(s) and RMP is 4K. Please refer to the patch/fix mentioned below, kvm_write_guest() potentially can fail before VMRUN in case of SNP :
>>
>> commit 94ed878c2669532ebae8eb9b4503f19aa33cd7aa
>> Author: Ashish Kalra <[email protected]>
>> Date: Mon Jun 6 22:28:01 2022 +0000
>>
>> KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb
>>
>> Using kvm_write_guest() to sync the GHCB scratch buffer can fail
>> due to host mapping being 2M, but RMP being 4K. The page fault handling
>> in do_user_addr_fault() fails to split the 2M page to handle RMP fault due
>> to it being called here in a non-preemptible context. Instead use
>> the already kernel mapped ghcb to sync the scratch buffer when the
>> scratch buffer is contained within the GHCB.

>Ah I didn't see that issue thanks for the pointer.

>The patch description says "When SEV-SNP is enabled the mapped GPA needs to be protected against a page state change." since if the guest were to convert the GHCB page to private when the host is using the GHCB the host could get an RMP violation right?

Right.

>That RMP violation would cause the host to crash unless we use some copy_to_user() type protections.

As such copy_to_user() will only swallow the RMP violation and return failure, so the host can retry the write.

> I don't see any mechanism in this patch that adds the page state change protection discussed. Can't another vCPU still convert the GHCB to private?

We do have protection specifically against the GHCB being converted to private: there are new post_{map,unmap}_gfn functions added to verify whether it is safe to map
GHCB pages. There is also a PSC spinlock added which protects against page state changes for these mapped pages.
Below is the reference to this patch:
https://lore.kernel.org/lkml/[email protected]/T/#mafcaac7296eb9a92c0ea58730dbd3ca47a8e0756

But do note that there is protection only for GHCB pages and there is a need to add generic post_{map,unmap}_gfn() ops that can be used to verify
that it's safe to map a given guest page in the hypervisor. This is a TODO right now and probably this is something which UPM can address more cleanly.

>I was wrong about the importance of this though [email protected] walked me through how UPM will solve this issue so no worries about this until the series is rebased on to UPM.

Thanks,
Ashish

2022-07-08 15:47:44

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 42/49] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event

On Wed, Jun 29, 2022 at 1:15 PM Kalra, Ashish <[email protected]> wrote:
>
> [Public]
>
>
> >> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> >> +{
> >> + struct sev_data_snp_guest_request req = {0};
> >> + struct kvm_vcpu *vcpu = &svm->vcpu;
> >> + struct kvm *kvm = vcpu->kvm;
> >> + unsigned long data_npages;
> >> + struct kvm_sev_info *sev;
> >> + unsigned long rc, err;
> >> + u64 data_gpa;
> >> +
> >> + if (!sev_snp_guest(vcpu->kvm)) {
> >> + rc = SEV_RET_INVALID_GUEST;
> >> + goto e_fail;
> >> + }
> >> +
> >> + sev = &to_kvm_svm(kvm)->sev_info;
> >> +
> >> + data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> >> + data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> >> +
> >> + if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> >> + rc = SEV_RET_INVALID_ADDRESS;
> >> + goto e_fail;
> >> + }
> >> +
> >> + /* Verify that requested blob will fit in certificate buffer */
> >> + if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
> >> + rc = SEV_RET_INVALID_PARAM;
> >> + goto e_fail;
> >> + }
> >> +
> >> + mutex_lock(&sev->guest_req_lock);
> >> +
> >> + rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
> >> + if (rc)
> >> + goto unlock;
> >> +
> >> + rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> >> + &data_npages, &err);
> >> + if (rc) {
> >> + /*
> >> + * If buffer length is small then return the expected
> >> + * length in rbx.
> >> + */
> >> + if (err == SNP_GUEST_REQ_INVALID_LEN)
> >> + vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> >> +
> >> + /* pass the firmware error code */
> >> + rc = err;
> >> + goto cleanup;
> >> + }
> >> +
> >> + /* Copy the certificate blob in the guest memory */
> >> + if (data_npages &&
> >> + kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> >> + rc = SEV_RET_INVALID_ADDRESS;
>
> >>Since at this point the PSP FW has correctly executed the command and incremented the VMPCK sequence number I think we need another error signal here since this will tell the guest the PSP had an error so it will not know if the VMPCK sequence number should be incremented.
>
> >Similarly as above, as this is an error path, so what's the guarantee that the next guest message request will succeed completely, isn’t it better to let the
> >FW reject any subsequent guest messages once it has detected that the sequence numbers are out of sync ?
>
> Alternately, we probably can return SEV_RET_INVALID_PAGE_STATE/SEV_RET_INVALID_PAGE_OWNER here, but that still does not indicate to the guest
> that the FW has successfully executed the command and the error occurred during cleanup/result phase and it needs to increment the VMPCK sequence number. There is nothing as such defined in SNP FW API specs to indicate such kind of failures to guest. As I mentioned earlier, this is probably indicative of
> a bigger system failure and it is better to let the FW reject subsequent guest messages/requests once it has detected that the sequence numbers are out of sync.

Hmm, I think the guest must be careful here because it cannot
trust the hypervisor to be truthful about the sequence numbers
incrementing. That's unfortunate since this means if these operations
do fail with a well-behaved hypervisor the guest cannot use that VMPCK
again. But there is no harm in the guest re-issuing the
SNP_GUEST_REQUEST (or extended version) with the exact same request,
just at a different address. The GHCB spec actually calls this out:
"It is recommended that the hypervisor validate the guest physical
address of the response page before invoking the SNP_GUEST_REQUEST API
so that the sequence numbers do not get out of sync for the guest,
possibly resulting in all successive requests failing".

Currently SVM_VMGEXIT_GUEST_REQUEST and SVM_VMGEXIT_EXT_GUEST_REQUEST
have different hypervisor -> guest usage for SW_EXITINFO2. I think
they both should be defined as what SVM_VMGEXIT_EXT_GUEST_REQUEST is
now: the high 32 bits are the hypervisor error code, the low 32 bits are
the FW error code. This would allow both NAEs to give some signal
to the guest, say SEV_RET_INVALID_REQ_ADDRESS. The hypervisor can use
this error code when doing the validation on the request and response
regions; if something is wrong with them the guest can retry with the exact
same request (so no IV reuse) in a corrected region.

But another reason I think SVM_VMGEXIT_GUEST_REQUEST SW_EXITINFO2
hypervisor->guest state should include this change is because in this
patch we are currently overloading the lower 32bits with hypervisor
error codes. In snp_handle_guest_request() if sev_snp_guest(),
snp_setup_guest_buf(), or snp_cleanup_guest_buf() fails we use the low
32bits of SW_EXITINFO2 to return hypervisor errors to the guest.
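
To make the proposed SW_EXITINFO2 layout concrete, here is a minimal
userspace sketch, assuming the split described above (hypervisor error
in the high 32 bits, firmware error in the low 32 bits). The helper
names and the TOY_HV_ERR_INVALID_REQ_ADDRESS value are made up for
illustration; they are not taken from the GHCB spec or from this patch.

```c
#include <stdint.h>

/* Hypothetical hypervisor-side error value, for illustration only. */
#define TOY_HV_ERR_INVALID_REQ_ADDRESS 1u

/* Pack: high 32 bits = hypervisor error, low 32 bits = firmware error. */
static inline uint64_t exitinfo2_pack(uint32_t hv_err, uint32_t fw_err)
{
	return ((uint64_t)hv_err << 32) | fw_err;
}

/* Extract the hypervisor error code from the high half. */
static inline uint32_t exitinfo2_hv_err(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Extract the firmware error code from the low half. */
static inline uint32_t exitinfo2_fw_err(uint64_t v)
{
	return (uint32_t)v;
}

/*
 * Assumption of this sketch: a hypervisor-only error (firmware half is
 * zero) means the request was rejected during validation, so the guest
 * could re-issue the exact same message at a corrected address.
 */
static inline int guest_may_retry_same_request(uint64_t v)
{
	return exitinfo2_hv_err(v) != 0 && exitinfo2_fw_err(v) == 0;
}
```

With this split a firmware error can never be mistaken for a
hypervisor one, which is the overloading problem described above.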

>
> Thanks,
> Ashish

2022-07-08 15:59:09

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

On Thu, Jul 7, 2022 at 2:31 PM Kalra, Ashish <[email protected]> wrote:
>
> [AMD Official Use Only - General]
>
> Hello Peter,
>
> >> >There is a perf cost to this suggestion but it might make accessing
> >> >the GHCB safer for KVM. Have you thought about just using
> >> >kvm_read_guest() or copy_from_user() to fully copy out the GHCB into a KVM owned buffer, then copying it back before the VMRUN. That way the KVM doesn't need to guard against page_state_changes on the GHCBs, that could be a perf improvement in a follow up.
> >>
> >> Along with the performance costs you mentioned, the main concern here
> >> will be the GHCB write-back path (copying it back) before VMRUN: this
> >> will again hit the issue we have currently with
> >> kvm_write_guest() / copy_to_user(), when we use it to sync the scratch
> >> buffer back to GHCB. This can fail if guest RAM is mapped using huge-page(s) and RMP is 4K. Please refer to the patch/fix mentioned below, kvm_write_guest() potentially can fail before VMRUN in case of SNP :
> >>
> >> commit 94ed878c2669532ebae8eb9b4503f19aa33cd7aa
> >> Author: Ashish Kalra <[email protected]>
> >> Date: Mon Jun 6 22:28:01 2022 +0000
> >>
> >> KVM: SVM: Sync the GHCB scratch buffer using already mapped ghcb
> >>
> >> Using kvm_write_guest() to sync the GHCB scratch buffer can fail
> >> due to host mapping being 2M, but RMP being 4K. The page fault handling
> >> in do_user_addr_fault() fails to split the 2M page to handle RMP fault due
> >> to it being called here in a non-preemptible context. Instead use
> >> the already kernel mapped ghcb to sync the scratch buffer when the
> >> scratch buffer is contained within the GHCB.
>
> >Ah I didn't see that issue thanks for the pointer.
>
> >The patch description says "When SEV-SNP is enabled the mapped GPA needs to be protected against a page state change." since if the guest were to convert the GHCB page to private when the host is using the GHCB the host could get an RMP violation right?
>
> Right.
>
> >That RMP violation would cause the host to crash unless we use some copy_to_user() type protections.
>
> As such copy_to_user() will only swallow the RMP violation and return failure, so the host can retry the write.
>
> > I don't see anything mechanism for this patch to add the page state change protection discussed. Can't another vCPU still convert the GHCB to private?
>
> We do have the protections for GHCB getting mapped to private specifically, there are new post_{map|unmap}_gfn functions added to verify if it is safe to map
> GHCB pages. There is a PSC spinlock added which protects again page state change for these mapped pages.
> Below is the reference to this patch:
> https://lore.kernel.org/lkml/[email protected]/T/#mafcaac7296eb9a92c0ea58730dbd3ca47a8e0756
>
> But do note that there is protection only for GHCB pages and there is a need to add generic post_{map,unmap}_gfn() ops that can be used to verify
> that it's safe to map a given guest page in the hypervisor. This is a TODO right now and probably this is something which UPM can address more cleanly.

Thank you Ashish. I had missed that.

Can you help me understand why it's OK to use kvm_write_guest() for the
|snp_certs_data| inside of snp_handle_ext_guest_request() in patch
42/49? I would have thought we'd have the same 2M vs 4K mapping
issues.

>
> >I was wrong about the importance of this though [email protected] walked me through how UPM will solve this issue so no worries about this until the series is rebased on to UPM.
>
> Thanks,
> Ashish

2022-07-08 16:01:33

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 35/49] KVM: SVM: Remove the long-lived GHCB host map

[AMD Official Use Only - General]

Hello Peter,

>> > I don't see anything mechanism for this patch to add the page state change protection discussed. Can't another vCPU still convert the GHCB to private?
>>
>> We do have the protections for GHCB getting mapped to private
>> specifically, there are new post_{map|unmap}_gfn functions added to verify if it is safe to map GHCB pages. There is a PSC spinlock added which protects again page state change for these mapped pages.
>> Below is the reference to this patch:
>> https://lore.kernel.org/lkml/cover.1655761627.git.ashish.kalra@amd.com/T/#mafcaac7296eb9a92c0ea58730dbd3ca47a8e0756
>>
>> But do note that there is protection only for GHCB pages and there is
>> a need to add generic post_{map,unmap}_gfn() ops that can be used to verify that it's safe to map a given guest page in the hypervisor. This is a TODO right now and probably this is something which UPM can address more cleanly.

>Thank you Ashish. I had missed that.

>Can you help me understand why its OK to use kvm_write_guest() for the
>|snp_certs_data| inside of snp_handle_ext_guest_request() in patch
>42/49? I would have thought we'd have the same 2M vs 4K mapping issues.

Preemption is not disabled there, hence the RMP page fault handler can do
the split of 2M to 4K on host pages without any issues.
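
The difference between the two call sites can be modelled roughly as
below. This is only a toy userspace sketch: the level values mirror the
kernel's 4K/2M page level naming, but toy_rmp_fault_action() and its
return values are hypothetical and not part of this series.

```c
#include <stdbool.h>

enum toy_pg_level { TOY_PG_LEVEL_4K = 1, TOY_PG_LEVEL_2M = 2 };

enum toy_rmp_action {
	TOY_RMP_NO_SPLIT,	/* host mapping is not larger than the RMP entry */
	TOY_RMP_SPLIT,		/* split the 2M host mapping down to 4K */
	TOY_RMP_FAIL,		/* size mismatch, but this context cannot split */
};

/*
 * Decide what to do on a page-size mismatch. Splitting a huge page is
 * assumed to be possible only from a preemptible context.
 */
static inline enum toy_rmp_action
toy_rmp_fault_action(enum toy_pg_level host_level,
		     enum toy_pg_level rmp_level, bool preemptible)
{
	if (host_level <= rmp_level)
		return TOY_RMP_NO_SPLIT;

	return preemptible ? TOY_RMP_SPLIT : TOY_RMP_FAIL;
}
```

In this model the GHCB scratch sync corresponds to the non-preemptible
case and the certificate copy to the preemptible one.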

Thanks,
Ashish

2022-07-11 14:07:49

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On Mon, Jun 20, 2022 at 5:08 PM Ashish Kalra <[email protected]> wrote:
>
> From: Brijesh Singh <[email protected]>
>
> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and stores
> it as the measurement of the guest at launch.
>
> While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
> to encrypt the VMSA pages.
>
> If its an SNP guest, then VMSA was added in the RMP entry as
> a guest owned page and also removed from the kernel direct map
> so flush it later after it is transitioned back to hypervisor
> state and restored in the direct map.

Given the guest uses the SNP NAE AP boot protocol we were expecting
that there would be some option to add vCPUs to the VM but mark them
as "pending AP boot creation protocol" state. This would ensure that the
LaunchDigest of a VM doesn't change just because its vCPU count
changes. Would it be possible to add a new argument to
KVM_SNP_LAUNCH_FINISH to tell it which vCPUs to LAUNCH_UPDATE VMSA
pages for, or similarly a new argument for KVM_CREATE_VCPU?

>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/x86/amd-memory-encryption.rst | 22 ++++
> arch/x86/kvm/svm/sev.c | 119 ++++++++++++++++++
> include/uapi/linux/kvm.h | 14 +++
> 3 files changed, 155 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 62abd5c1f72b..750162cff87b 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -514,6 +514,28 @@ Returns: 0 on success, -negative on error
> See the SEV-SNP spec for further details on how to build the VMPL permission
> mask and page type.
>
> +21. KVM_SNP_LAUNCH_FINISH
> +-------------------------
> +
> +After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
> +issued to make the guest ready for the execution.
> +
> +Parameters (in): struct kvm_sev_snp_launch_finish
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_snp_launch_finish {
> + __u64 id_block_uaddr;
> + __u64 id_auth_uaddr;
> + __u8 id_block_en;
> + __u8 auth_key_en;
> + __u8 host_data[32];
> + };
> +
> +
> +See SEV-SNP specification for further details on launch finish input parameters.
>
> References
> ==========
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a9461d352eda..a5b90469683f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2095,6 +2095,106 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_update data = {};
> + int i, ret;
> +
> + data.gctx_paddr = __psp_pa(sev->snp_context);
> + data.page_type = SNP_PAGE_TYPE_VMSA;
> +
> + for (i = 0; i < kvm->created_vcpus; i++) {
> + struct vcpu_svm *svm = to_svm(xa_load(&kvm->vcpu_array, i));

Why are we iterating over |created_vcpus| rather than using kvm_for_each_vcpu?

> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
> +
> + /* Perform some pre-encryption checks against the VMSA */
> + ret = sev_es_sync_vmsa(svm);
> + if (ret)
> + return ret;

Do we need to take the 'vcpu->mutex' lock before modifying the
vcpu, like we do for SEV-ES in sev_launch_update_vmsa()?

> +
> + /* Transition the VMSA page to a firmware state. */
> + ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
> + if (ret)
> + return ret;
> +
> + /* Issue the SNP command to encrypt the VMSA */
> + data.address = __sme_pa(svm->sev_es.vmsa);
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> + &data, &argp->error);
> + if (ret) {
> + snp_page_reclaim(pfn);
> + return ret;
> + }
> +
> + svm->vcpu.arch.guest_state_protected = true;
> + }
> +
> + return 0;
> +}
> +
> +static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_finish *data;
> + void *id_block = NULL, *id_auth = NULL;
> + struct kvm_sev_snp_launch_finish params;
> + int ret;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (!sev->snp_context)
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + /* Measure all vCPUs using LAUNCH_UPDATE before we finalize the launch flow. */
> + ret = snp_launch_update_vmsa(kvm, argp);
> + if (ret)
> + return ret;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> + if (!data)
> + return -ENOMEM;
> +
> + if (params.id_block_en) {
> + id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
> + if (IS_ERR(id_block)) {
> + ret = PTR_ERR(id_block);
> + goto e_free;
> + }
> +
> + data->id_block_en = 1;
> + data->id_block_paddr = __sme_pa(id_block);
> + }
> +
> + if (params.auth_key_en) {
> + id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
> + if (IS_ERR(id_auth)) {
> + ret = PTR_ERR(id_auth);
> + goto e_free_id_block;
> + }
> +
> + data->auth_key_en = 1;
> + data->id_auth_paddr = __sme_pa(id_auth);
> + }
> +
> + data->gctx_paddr = __psp_pa(sev->snp_context);
> + ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
> +
> + kfree(id_auth);
> +
> +e_free_id_block:
> + kfree(id_block);
> +
> +e_free:
> + kfree(data);
> +
> + return ret;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -2191,6 +2291,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SNP_LAUNCH_UPDATE:
> r = snp_launch_update(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SNP_LAUNCH_FINISH:
> + r = snp_launch_finish(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> @@ -2696,11 +2799,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>
> svm = to_svm(vcpu);
>
> + /*
> + * If its an SNP guest, then VMSA was added in the RMP entry as
> + * a guest owned page. Transition the page to hypervisor state
> + * before releasing it back to the system.
> + * Also the page is removed from the kernel direct map, so flush it
> + * later after it is transitioned back to hypervisor state and
> + * restored in the direct map.
> + */
> + if (sev_snp_guest(vcpu->kvm)) {
> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
> +
> + if (host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
> + goto skip_vmsa_free;

Why not call host_rmp_make_shared with leak==true? This old VMSA page
is now unusable IIUC.



> + }
> +
> if (vcpu->arch.guest_state_protected)
> sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
>
> __free_page(virt_to_page(svm->sev_es.vmsa));
>
> +skip_vmsa_free:
> if (svm->sev_es.ghcb_sa_free)
> kvfree(svm->sev_es.ghcb_sa);
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9b36b07414ea..5a4662716b6a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1814,6 +1814,7 @@ enum sev_cmd_id {
> KVM_SEV_SNP_INIT,
> KVM_SEV_SNP_LAUNCH_START,
> KVM_SEV_SNP_LAUNCH_UPDATE,
> + KVM_SEV_SNP_LAUNCH_FINISH,
>
> KVM_SEV_NR_MAX,
> };
> @@ -1948,6 +1949,19 @@ struct kvm_sev_snp_launch_update {
> __u8 vmpl1_perms;
> };
>
> +#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
> +#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
> +#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
> +
> +struct kvm_sev_snp_launch_finish {
> + __u64 id_block_uaddr;
> + __u64 id_auth_uaddr;
> + __u8 id_block_en;
> + __u8 auth_key_en;
> + __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
> + __u8 pad[6];
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.25.1
>

2022-07-11 22:42:21

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

[AMD Official Use Only - General]

Hello Peter,

>> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and
>> stores it as the measurement of the guest at launch.
>>
>> While finalizing the launch flow, it also issues the LAUNCH_UPDATE
>> command to encrypt the VMSA pages.

>Given the guest uses the SNP NAE AP boot protocol we were expecting that there would be some option to add vCPUs to the VM but mark them as "pending AP boot creation protocol" state. This would allow the LaunchDigest of a VM doesn't change just because its vCPU count changes. Would it be possible to add a new add an argument to KVM_SNP_LAUNCH_FINISH to tell it which vCPUs to LAUNCH_UPDATE VMSA pages for or similarly a new argument for KVM_CREATE_VCPU?

But don't we want/need to measure all vCPUs using LAUNCH_UPDATE_VMSA before we issue SNP_LAUNCH_FINISH command ?

If we are going to add vCPUs and mark them as "pending AP boot creation" state then how are we going to do LAUNCH_UPDATE_VMSAs for them after SNP_LAUNCH_FINISH ?

>> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> + struct sev_data_snp_launch_update data = {};
>> + int i, ret;
>> +
>> + data.gctx_paddr = __psp_pa(sev->snp_context);
>> + data.page_type = SNP_PAGE_TYPE_VMSA;
>> +
>> + for (i = 0; i < kvm->created_vcpus; i++) {
>> + struct vcpu_svm *svm =
>> + to_svm(xa_load(&kvm->vcpu_array, i));

> Why are we iterating over |created_vcpus| rather than using kvm_for_each_vcpu?

Yes, we should be using kvm_for_each_vcpu(); that will also help avoid touching implementation-specific
details and hide complexities such as xa_load(), locking requirements, etc.

Additionally, kvm_for_each_vcpu() works on online_vcpus, but I think that is what we should
be considering at LAUNCH_UPDATE_VMSA time, vis-a-vis created_vcpus.
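
A toy userspace model of why the two counters matter is sketched below.
It assumes that created_vcpus is bumped before a vCPU becomes visible
in the array and online_vcpus only afterwards, which is my reading of
the vCPU creation path. All toy_* names are made up and this is not the
kernel's data structure.

```c
#include <stddef.h>

#define TOY_MAX_VCPUS 8

struct toy_vcpu {
	int id;
};

struct toy_vm {
	int created_vcpus;	/* bumped when vCPU creation starts */
	int online_vcpus;	/* bumped once the vCPU is in the array */
	struct toy_vcpu *vcpus[TOY_MAX_VCPUS];
};

/*
 * Walk the first n slots. Returns the number of vCPUs seen, or -1 if
 * an empty slot is hit, which the loop in the patch would dereference.
 */
static inline int toy_walk(const struct toy_vm *vm, int n)
{
	int i, seen = 0;

	for (i = 0; i < n; i++) {
		if (!vm->vcpus[i])
			return -1;
		seen++;
	}

	return seen;
}
```

Walking up to online_vcpus never sees an empty slot in this model,
whereas walking up to created_vcpus can while a vCPU is still being
created.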

>> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
>> +
>> + /* Perform some pre-encryption checks against the VMSA */
>> + ret = sev_es_sync_vmsa(svm);
>> + if (ret)
>> + return ret;

>Do we need to take the 'vcpu->mutex' lock before modifying the vcpu,like we do for SEV-ES in sev_launch_update_vmsa()?

This is using the per-vCPU vcpu_svm structure, but we may need to guard against the KVM vCPU ioctl requests, so yes it is
safer to take the 'vcpu->mutex' lock here.

>> + /*
>> + * If its an SNP guest, then VMSA was added in the RMP entry as
>> + * a guest owned page. Transition the page to hypervisor state
>> + * before releasing it back to the system.
>> + * Also the page is removed from the kernel direct map, so flush it
>> + * later after it is transitioned back to hypervisor state and
>> + * restored in the direct map.
>> + */
>> + if (sev_snp_guest(vcpu->kvm)) {
>> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
>> +
>> + if (host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
>> + goto skip_vmsa_free;

>Why not call host_rmp_make_shared with leak==true? This old VMSA page is now unusable IIUC.

Yes the old VMSA page is now unavailable and lost, so makes sense to call host_rmp_make_shared() with leak==true.

Thanks,
Ashish

2022-07-12 12:00:02

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 09/49] x86/fault: Add support to handle the RMP fault for user address

On Mon, Jun 20, 2022 at 11:03:43PM +0000, Ashish Kalra wrote:
> +/*
> + * Return 1 if the caller need to retry, 0 if it the address need to be split
> + * in order to resolve the fault.
> + */
> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
> + unsigned long address)
> +{
> + int rmp_level, level;
> + pte_t *pte;
> + u64 pfn;
> +
> + pte = lookup_address_in_mm(current->mm, address, &level);

As discussed in [1], the lookup should be done in kvm->mm, along the
lines of host_pfn_mapping_level().

[1] https://lore.kernel.org/kvm/YmwIi3bXr%2F1yhYV%[email protected]/

BR, Jarkko

2022-07-12 12:44:20

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 41/49] KVM: SVM: Add support to handle the RMP nested page fault

On Mon, Jun 20, 2022 at 11:13:03PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> When SEV-SNP is enabled in the guest, the hardware places restrictions on
> all memory accesses based on the contents of the RMP table. When hardware
> encounters RMP check failure caused by the guest memory access it raises
> the #NPF. The error code contains additional information on the access
> type. See the APM volume 2 for additional information.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 14 +++++---
> 2 files changed, 86 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 4ed90331bca0..7fc0fad87054 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
>
> spin_unlock(&sev->psc_lock);
> }
> +
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> +{
> + int rmp_level, npt_level, rc, assigned;
> + struct kvm *kvm = vcpu->kvm;
> + gfn_t gfn = gpa_to_gfn(gpa);
> + bool need_psc = false;
> + enum psc_op psc_op;
> + kvm_pfn_t pfn;
> + bool private;
> +
> + write_lock(&kvm->mmu_lock);
> +
> + if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))

This function does not exist. Should it be kvm_mmu_get_tdp_page?

BR, Jarkko

2022-07-12 12:47:17

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 41/49] KVM: SVM: Add support to handle the RMP nested page fault

On Tue, Jul 12, 2022 at 03:34:00PM +0300, Jarkko Sakkinen wrote:
> On Mon, Jun 20, 2022 at 11:13:03PM +0000, Ashish Kalra wrote:
> > From: Brijesh Singh <[email protected]>
> >
> > When SEV-SNP is enabled in the guest, the hardware places restrictions on
> > all memory accesses based on the contents of the RMP table. When hardware
> > encounters RMP check failure caused by the guest memory access it raises
> > the #NPF. The error code contains additional information on the access
> > type. See the APM volume 2 for additional information.
> >
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
> > arch/x86/kvm/svm/svm.c | 14 +++++---
> > 2 files changed, 86 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 4ed90331bca0..7fc0fad87054 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> >
> > spin_unlock(&sev->psc_lock);
> > }
> > +
> > +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> > +{
> > + int rmp_level, npt_level, rc, assigned;
> > + struct kvm *kvm = vcpu->kvm;
> > + gfn_t gfn = gpa_to_gfn(gpa);
> > + bool need_psc = false;
> > + enum psc_op psc_op;
> > + kvm_pfn_t pfn;
> > + bool private;
> > +
> > + write_lock(&kvm->mmu_lock);
> > +
> > + if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
>
> This function does not exist. Should it be kvm_mmu_get_tdp_page?

Ugh, ignore that.

This is the actual issue:

$ git grep kvm_mmu_get_tdp_walk
arch/x86/kvm/mmu/mmu.c:bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level)
arch/x86/kvm/mmu/mmu.c:EXPORT_SYMBOL_GPL(kvm_mmu_get_tdp_walk);
arch/x86/kvm/svm/sev.c: rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);

It's not declared in any header.

BR, Jarkko

2022-07-12 12:51:22

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 41/49] KVM: SVM: Add support to handle the RMP nested page fault

On Tue, Jul 12, 2022 at 03:45:13PM +0300, Jarkko Sakkinen wrote:
> On Tue, Jul 12, 2022 at 03:34:00PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Jun 20, 2022 at 11:13:03PM +0000, Ashish Kalra wrote:
> > > From: Brijesh Singh <[email protected]>
> > >
> > > When SEV-SNP is enabled in the guest, the hardware places restrictions on
> > > all memory accesses based on the contents of the RMP table. When hardware
> > > encounters RMP check failure caused by the guest memory access it raises
> > > the #NPF. The error code contains additional information on the access
> > > type. See the APM volume 2 for additional information.
> > >
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/svm/svm.c | 14 +++++---
> > > 2 files changed, 86 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 4ed90331bca0..7fc0fad87054 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> > >
> > > spin_unlock(&sev->psc_lock);
> > > }
> > > +
> > > +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> > > +{
> > > + int rmp_level, npt_level, rc, assigned;
> > > + struct kvm *kvm = vcpu->kvm;
> > > + gfn_t gfn = gpa_to_gfn(gpa);
> > > + bool need_psc = false;
> > > + enum psc_op psc_op;
> > > + kvm_pfn_t pfn;
> > > + bool private;
> > > +
> > > + write_lock(&kvm->mmu_lock);
> > > +
> > > + if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
> >
> > This function does not exist. Should it be kvm_mmu_get_tdp_page?
>
> Ugh, ignore that.
>
> This the actual issue:
>
> $ git grep kvm_mmu_get_tdp_walk
> arch/x86/kvm/mmu/mmu.c:bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level)
> arch/x86/kvm/mmu/mmu.c:EXPORT_SYMBOL_GPL(kvm_mmu_get_tdp_walk);
> arch/x86/kvm/svm/sev.c: rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
>
> It's not declared in any header.

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0e1f4d92b89b..33267f619e61 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -164,6 +164,8 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
u32 error_code, int max_level);

+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level);
+
/*
* Check if a given access (described through the I/D, W/R and U/S bits of a
* page fault error code pfec) causes a permission fault with the given PTE


BTW, kvm_mmu_map_tdp_page() ought to be on a single line since it's less than
100 characters.

BR, Jarkko

2022-07-12 14:35:29

by Kalra, Ashish

[permalink] [raw]
Subject: RE: [PATCH Part2 v6 09/49] x86/fault: Add support to handle the RMP fault for user address

[AMD Official Use Only - General]

>> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
>> + unsigned long address)
>> +{
>> + int rmp_level, level;
>> + pte_t *pte;
>> + u64 pfn;
>> +
>> + pte = lookup_address_in_mm(current->mm, address, &level);

>As discussed in [1], the lookup should be done in kvm->mm, along the lines of host_pfn_mapping_level().

With lookup_address_in_mm() now removed in 5.19, this is now using lookup_address_in_pgd() though still using non init-mm, and as mentioned here in [1], it makes sense to
not use lookup_address_in_pgd() as it does not play nice with userspace mappings, e.g. doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE()
to ensure an upper level entry isn't converted to a huge page between checking the PAGE_SIZE bit and grabbing the address of the next level down.

But is KVM going to provide its own variant of lookup_address_in_pgd() that is safe for use with user addresses, i.e., a generic version of lookup_address() on kvm->mm, or do we need to
duplicate the page table walking code of host_pfn_mapping_level()?

Thanks,
Ashish

>[1] https://lore.kernel.org/kvm/YmwIi3bXr%2F1yhYV%2F@google.com/

2022-07-12 14:46:04

by Peter Gonda

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On Mon, Jul 11, 2022 at 4:41 PM Kalra, Ashish <[email protected]> wrote:
>
> [AMD Official Use Only - General]
>
> Hello Peter,
>
> >> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and
> >> stores it as the measurement of the guest at launch.
> >>
> >> While finalizing the launch flow, it also issues the LAUNCH_UPDATE
> >> command to encrypt the VMSA pages.
>
> >Given the guest uses the SNP NAE AP boot protocol we were expecting that there would be some option to add vCPUs to the VM but mark them as "pending AP boot creation protocol" state. This would allow the LaunchDigest of a VM doesn't change just because its vCPU count changes. Would it be possible to add a new add an argument to KVM_SNP_LAUNCH_FINISH to tell it which vCPUs to LAUNCH_UPDATE VMSA pages for or similarly a new argument for KVM_CREATE_VCPU?
>
> But don't we want/need to measure all vCPUs using LAUNCH_UPDATE_VMSA before we issue SNP_LAUNCH_FINISH command ?
>
> If we are going to add vCPUs and mark them as "pending AP boot creation" state then how are we going to do LAUNCH_UPDATE_VMSAs for them after SNP_LAUNCH_FINISH ?

If I understand correctly we don't need or even want the APs to be
LAUNCH_UPDATE_VMSA'd. LAUNCH_UPDATEing all the VMSAs causes VMs with
different numbers of vCPUs to have different launch digests. It's my
understanding that the SNP AP Creation protocol was meant to solve this so that
VMs with different vCPU counts have the same launch digest.

Looking at patch "[Part2,v6,44/49] KVM: SVM: Support SEV-SNP AP
Creation NAE event" and section "4.1.9 SNP AP Creation" of the GHCB
spec, there is no need to LAUNCH_UPDATE the APs' VMSAs or mark
their vCPUs runnable. Instead we can do that only for the BSP. Then in
the guest UEFI the BSP can: create new VMSAs from guest pages,
RMPADJUST them into the RMP state VMSA, then use the SNP AP Creation
NAE to get the hypervisor to mark them runnable. I believe this is all
set up in the UEFI patch:
https://www.mail-archive.com/[email protected]/msg38460.html.
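
To illustrate the launch digest point, here is a deliberately
simplified userspace sketch. It is not the real SNP measurement:
toy_extend() is a stand-in arithmetic fold rather than a cryptographic
hash, and every toy_* name and tag value is made up. It only shows
that a running digest depends on how many VMSAs are folded into it.

```c
#include <stdint.h>

/* Made-up tags standing in for measured page contents. */
#define TOY_IMAGE_TAG 17u
#define TOY_VMSA_TAG 5u

/* Stand-in for extending a running measurement with one more page. */
static inline uint64_t toy_extend(uint64_t digest, uint64_t tag)
{
	return digest * 31 + tag;
}

/* Digest after measuring the boot image plus measured_vmsas VMSAs. */
static inline uint64_t toy_launch_digest(unsigned int measured_vmsas)
{
	uint64_t d = toy_extend(0, TOY_IMAGE_TAG);
	unsigned int i;

	for (i = 0; i < measured_vmsas; i++)
		d = toy_extend(d, TOY_VMSA_TAG);

	return d;
}

/* Policy in this patch: every vCPU's VMSA is measured. */
static inline unsigned int toy_vmsas_all(unsigned int nr_vcpus)
{
	return nr_vcpus;
}

/* Proposed policy: only the BSP's VMSA is measured. */
static inline unsigned int toy_vmsas_bsp_only(unsigned int nr_vcpus)
{
	(void)nr_vcpus;
	return 1;
}
```

With toy_vmsas_bsp_only() the digest is the same for a 2 vCPU and an
8 vCPU VM, while with toy_vmsas_all() the two differ.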

2022-07-12 15:01:04

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH Part2 v6 09/49] x86/fault: Add support to handle the RMP fault for user address

On Tue, Jul 12, 2022 at 02:29:18PM +0000, Kalra, Ashish wrote:
> [AMD Official Use Only - General]
>
> >> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
> >> + unsigned long address)
> >> +{
> >> + int rmp_level, level;
> >> + pte_t *pte;
> >> + u64 pfn;
> >> +
> >> + pte = lookup_address_in_mm(current->mm, address, &level);
>
> >As discussed in [1], the lookup should be done in kvm->mm, along the lines of host_pfn_mapping_level().
>
> With lookup_address_in_mm() now removed in 5.19, this is now using
> lookup_address_in_pgd() though still using non init-mm, and as mentioned
> here in [1], it makes sense to not use lookup_address_in_pgd() as it does
> not play nice with userspace mappings, e.g. doesn't disable IRQs to block
> TLB shootdowns and doesn't use READ_ONCE() to ensure an upper level entry
> isn't converted to a huge page between checking the PAGE_SIZE bit and
> grabbing the address of the next level down.
>
> But is KVM going to provide its own variant of lookup_address_in_pgd()
> that is safe for use with user addresses, i.e., a generic version of
> lookup_address() on kvm->mm or we need to duplicate page table walking
> code of host_pfn_mapping_level() ?

It's probably open coded for the sole reason that there is only one
call site, i.e. there has been no rational reason to have a helper
function.

Helpers are usually created only on an as-needed basis, and since the
need comes from this patch set, it should include a patch that simply
encapsulates the walk in a helper.

>
> Thanks,
> Ashish

BR, Jarkko

2022-07-12 15:26:40

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

Hello Peter,

>> >Given the guest uses the SNP NAE AP boot protocol we were expecting that there would be some option to add vCPUs to the VM but mark them as "pending AP boot creation protocol" state. This would allow the LaunchDigest of a VM doesn't change >just because its vCPU count changes. Would it be possible to add a new add an argument to KVM_SNP_LAUNCH_FINISH to tell it which vCPUs to LAUNCH_UPDATE VMSA pages for or similarly a new argument for KVM_CREATE_VCPU?
>>
>> But don't we want/need to measure all vCPUs using LAUNCH_UPDATE_VMSA before we issue SNP_LAUNCH_FINISH command ?
>>
>> If we are going to add vCPUs and mark them as "pending AP boot creation" state then how are we going to do LAUNCH_UPDATE_VMSAs for them after SNP_LAUNCH_FINISH ?

>If I understand correctly we don't need or even want the APs to be LAUNCH_UPDATE_VMSA'd. LAUNCH_UPDATEing all the VMSAs causes VMs with different numbers of vCPUs to have different launch digests. Its my understanding the SNP AP Creation protocol was to solve this so that VMs with different vcpu counts have the same launch digest.

>Looking at patch "[Part2,v6,44/49] KVM: SVM: Support SEV-SNP AP Creation NAE event" and section "4.1.9 SNP AP Creation" of the GHCB spec. There is no need to mark the LAUNCH_UPDATE the AP's VMSA or mark the vCPUs runnable. Instead we can do that only for the BSP. Then in the guest UEFI the BSP can: create new VMSAs from guest pages, RMPADJUST them into the RMP state VMSA, then use the SNP AP Creation NAE to get the hypervisor to mark them runnable. I believe this is all setup in the UEFI patch:
>https://www.mail-archive.com/devel@edk2.groups.io/msg38460.html.

Yes, I discussed the same with Tom, and this will be supported going forward: only the BSP will need to go through LAUNCH_UPDATE_VMSA, and at runtime the guest can dynamically create more APs using the SNP AP Creation NAE event.

Now, coming back to the original question: why do we need a separate vCPU count argument for SNP_LAUNCH_FINISH? Won't the statically created vCPUs in kvm->created_vcpus/online_vcpus be sufficient for that? Any dynamically created
vCPUs won't be part of the initial measurement or LaunchDigest of the VM, right?

Thanks,
Ashish

2022-07-12 15:33:24

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 41/49] KVM: SVM: Add support to handle the RMP nested page fault

Yes, this is fixed in the 5.19 rebase.

Thanks,
Ashish

-----Original Message-----
From: Jarkko Sakkinen <[email protected]>
Sent: Tuesday, July 12, 2022 7:49 AM
To: Kalra, Ashish <[email protected]>
Subject: Re: [PATCH Part2 v6 41/49] KVM: SVM: Add support to handle the RMP nested page fault

On Tue, Jul 12, 2022 at 03:45:13PM +0300, Jarkko Sakkinen wrote:
> On Tue, Jul 12, 2022 at 03:34:00PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Jun 20, 2022 at 11:13:03PM +0000, Ashish Kalra wrote:
> > > From: Brijesh Singh <[email protected]>
> > >
> > > When SEV-SNP is enabled in the guest, the hardware places
> > > restrictions on all memory accesses based on the contents of the
> > > RMP table. When hardware encounters RMP check failure caused by
> > > the guest memory access it raises the #NPF. The error code
> > > contains additional information on the access type. See the APM volume 2 for additional information.
> > >
> > > Signed-off-by: Brijesh Singh <[email protected]>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
> > > arch/x86/kvm/svm/svm.c | 14 +++++---
> > > 2 files changed, 86 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 4ed90331bca0..7fc0fad87054 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> > >
> > > spin_unlock(&sev->psc_lock);
> > > }
> > > +
> > > +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> > > +{
> > > + int rmp_level, npt_level, rc, assigned;
> > > + struct kvm *kvm = vcpu->kvm;
> > > + gfn_t gfn = gpa_to_gfn(gpa);
> > > + bool need_psc = false;
> > > + enum psc_op psc_op;
> > > + kvm_pfn_t pfn;
> > > + bool private;
> > > +
> > > + write_lock(&kvm->mmu_lock);
> > > +
> > > + if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
> >
> > This function does not exist. Should it be kvm_mmu_get_tdp_page?
>
> Ugh, ignore that.
>
> This is the actual issue:
>
> $ git grep kvm_mmu_get_tdp_walk
> arch/x86/kvm/mmu/mmu.c:bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level)
> arch/x86/kvm/mmu/mmu.c:EXPORT_SYMBOL_GPL(kvm_mmu_get_tdp_walk);
> arch/x86/kvm/svm/sev.c: rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
>
> It's not declared in any header.

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0e1f4d92b89b..33267f619e61 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -164,6 +164,8 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
 			       u32 error_code, int max_level);
 
+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level);
+
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
  * page fault error code pfec) causes a permission fault with the given PTE


BTW, kvm_mmu_map_tdp_page() ought to be on a single line since it's less
than 100 characters.

BR, Jarkko

2022-07-12 16:05:35

by Peter Gonda

Subject: Re: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On Tue, Jul 12, 2022 at 9:22 AM Kalra, Ashish <[email protected]> wrote:
>
> [AMD Official Use Only - General]
>
> Hello Peter,
>
> >> >Given the guest uses the SNP NAE AP boot protocol we were expecting that there would be some option to add vCPUs to the VM but mark them as "pending AP boot creation protocol" state. This would allow the LaunchDigest of a VM doesn't change >just because its vCPU count changes. Would it be possible to add a new add an argument to KVM_SNP_LAUNCH_FINISH to tell it which vCPUs to LAUNCH_UPDATE VMSA pages for or similarly a new argument for KVM_CREATE_VCPU?
> >>
> >> But don't we want/need to measure all vCPUs using LAUNCH_UPDATE_VMSA before we issue SNP_LAUNCH_FINISH command ?
> >>
> >> If we are going to add vCPUs and mark them as "pending AP boot creation" state then how are we going to do LAUNCH_UPDATE_VMSAs for them after SNP_LAUNCH_FINISH ?
>
> >If I understand correctly we don't need or even want the APs to be LAUNCH_UPDATE_VMSA'd. LAUNCH_UPDATEing all the VMSAs causes VMs with different numbers of vCPUs to have different launch digests. Its my understanding the SNP AP Creation protocol was to solve this so that VMs with different vcpu counts have the same launch digest.
>
> >Looking at patch "[Part2,v6,44/49] KVM: SVM: Support SEV-SNP AP Creation NAE event" and section "4.1.9 SNP AP Creation" of the GHCB spec. There is no need to mark the LAUNCH_UPDATE the AP's VMSA or mark the vCPUs runnable. Instead we can do that only for the BSP. Then in the guest UEFI the BSP can: create new VMSAs from guest pages, RMPADJUST them into the RMP state VMSA, then use the SNP AP Creation NAE to get the hypervisor to mark them runnable. I believe this is all setup in the UEFI patch:
> >https://www.mail-archive.com/devel@edk2.groups.io/msg38460.html.
>
> Yes, I discussed the same with Tom, and this will be supported going forward, only the BSP will need to go through the LAUNCH_UPDATE_VMSA and at runtime the guest can dynamically create more APs using the SNP AP Creation NAE event.
>
> Now, coming back to the original question, why do we need a separate vCPU count argument for SNP_LAUNCH_FINISH, won't the statically created vCPUs in kvm->created_vcpus/online_vcpus be sufficient for that, any dynamically created
> vCPU's won't be part of the initial measurement or LaunchDigest of the VM, right ?

Are you suggesting that QEMU will KVM_CREATE_VCPU the BSP, then
LAUNCH_FINISH, then KVM_CREATE_VCPU all the APs so that their VMSAs
are not LAUNCH_UPDATEd? If so, it seems annoying to have to create vCPUs
at different times to get their VMSAs into different states. That's
why I was suggesting some other mechanism, so we can continue to
KVM_CREATE_VCPU all the vCPUs at the same time.

>
> Thanks,
> Ashish

2022-07-12 16:46:57

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 29/49] KVM: X86: Keep the NPT and RMP page level in sync

s/X86/x86/

On Mon, Jun 20, 2022 at 11:08:57PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> When running an SEV-SNP VM, the sPA used to index the RMP entry is
> obtained through the NPT translation (gva->gpa->spa). The NPT page
> level is checked against the page level programmed in the RMP entry.
> If the page level does not match, then it will cause a nested page
> fault with the RMP bit set to indicate the RMP violation.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/mmu/mmu.c | 5 ++++
> arch/x86/kvm/svm/sev.c | 46 ++++++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/svm/svm.h | 1 +
> 6 files changed, 55 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index a66292dae698..e0068e702692 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -129,6 +129,7 @@ KVM_X86_OP(complete_emulated_msr)
> KVM_X86_OP(vcpu_deliver_sipi_vector)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> KVM_X86_OP(alloc_apic_backing_page)
> +KVM_X86_OP_OPTIONAL(rmp_page_level_adjust)
>
> #undef KVM_X86_OP
> #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0205e2944067..2748c69609e3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1514,6 +1514,7 @@ struct kvm_x86_ops {
> unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
>
> void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> + void (*rmp_page_level_adjust)(struct kvm *kvm, kvm_pfn_t pfn, int *level);
> };
>
> struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c623019929a7..997318ecebd1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -43,6 +43,7 @@
> #include <linux/hash.h>
> #include <linux/kern_levels.h>
> #include <linux/kthread.h>
> +#include <linux/sev.h>
>
> #include <asm/page.h>
> #include <asm/memtype.h>
> @@ -2824,6 +2825,10 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> if (unlikely(!pte))
> return PG_LEVEL_4K;
>
> + /* Adjust the page level based on the SEV-SNP RMP page level. */
> + if (kvm_x86_ops.rmp_page_level_adjust)
> + static_call(kvm_x86_rmp_page_level_adjust)(kvm, pfn, &level);
> +
> return level;
> }
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a5b90469683f..91d3d24e60d2 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3597,3 +3597,49 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
>
> return pfn_to_page(pfn);
> }
> +
> +static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
> +{
> + int level;
> +
> + while (end > start) {
> + if (snp_lookup_rmpentry(start, &level) != 0)
> + return false;
> + start++;
> + }
> +
> + return true;
> +}
> +
> +void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level)

It would not hurt to document this, given that it is not a static
function.

> +{
> + int rmp_level, assigned;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return;
> +
> + assigned = snp_lookup_rmpentry(pfn, &rmp_level);
> + if (unlikely(assigned < 0))
> + return;
> +
> + if (!assigned) {
> + /*
> + * If all the pages are shared then no need to keep the RMP
> + * and NPT in sync.
> + */
> + pfn = pfn & ~(PTRS_PER_PMD - 1);
> + if (is_pfn_range_shared(pfn, pfn + PTRS_PER_PMD))
> + return;
> + }
> +
> + /*
> + * The hardware installs 2MB TLB entries to access to 1GB pages,
> + * therefore allow NPT to use 1GB pages when pfn was added as 2MB
> + * in the RMP table.
> + */
> + if (rmp_level == PG_LEVEL_2M && (*level == PG_LEVEL_1G))
> + return;
> +
> + /* Adjust the level to keep the NPT and RMP in sync */
> + *level = min_t(size_t, *level, rmp_level);
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b4bd64f94d3a..18e2cd4d9559 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4734,6 +4734,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
>
> .alloc_apic_backing_page = svm_alloc_apic_backing_page,
> + .rmp_page_level_adjust = sev_rmp_page_level_adjust,
> };
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 71c011af098e..7782312a1cda 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -673,6 +673,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
> void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
> +void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
>
> /* vmenter.S */
>
> --
> 2.25.1
>


BR, Jarkko
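
The level adjustment in the quoted patch can be tried out in isolation with
the following user-space sketch. It is a simplified model, not the kernel
code: `adjust_level()` and `pfn_2m_base()` are hypothetical helpers, the
snp_lookup_rmpentry() call is replaced by a plain `rmp_level` argument, and
only the path taken for an assigned (private) page is modelled. The page
level constants use the same numbering as the kernel's enum pg_level.

```c
#include <stdint.h>

/* Page levels, numbered as in the kernel's enum pg_level. */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

#define PTRS_PER_PMD 512

/* Round a 4K pfn down to the first pfn of its 2M-aligned range. */
static uint64_t pfn_2m_base(uint64_t pfn)
{
	return pfn & ~((uint64_t)PTRS_PER_PMD - 1);
}

/*
 * Hypothetical stand-alone version of the adjustment for an assigned page:
 * a 1G NPT mapping is allowed over a 2M RMP entry (the patch comment says
 * the hardware installs 2M TLB entries for 1G pages), otherwise the NPT
 * level is clamped to the RMP level.
 */
static int adjust_level(int npt_level, int rmp_level)
{
	if (rmp_level == PG_LEVEL_2M && npt_level == PG_LEVEL_1G)
		return npt_level;

	return npt_level < rmp_level ? npt_level : rmp_level;
}
```

For example, a 2M NPT mapping over a 4K RMP entry is reduced to 4K, while a
1G NPT mapping over a 2M RMP entry is left at 1G.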

2022-07-12 17:46:33

by Tom Lendacky

Subject: Re: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On 7/12/22 09:45, Peter Gonda wrote:
> On Mon, Jul 11, 2022 at 4:41 PM Kalra, Ashish <[email protected]> wrote:
>>
>> [AMD Official Use Only - General]
>>
>> Hello Peter,
>>
>>>> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and
>>>> stores it as the measurement of the guest at launch.
>>>>
>>>> While finalizing the launch flow, it also issues the LAUNCH_UPDATE
>>>> command to encrypt the VMSA pages.
>>
>>> Given the guest uses the SNP NAE AP boot protocol we were expecting that there would be some option to add vCPUs to the VM but mark them as "pending AP boot creation protocol" state. This would allow the LaunchDigest of a VM doesn't change >just because its vCPU count changes. Would it be possible to add a new add an argument to KVM_SNP_LAUNCH_FINISH to tell it which vCPUs to LAUNCH_UPDATE VMSA pages for or similarly a new argument for KVM_CREATE_VCPU?
>>
>> But don't we want/need to measure all vCPUs using LAUNCH_UPDATE_VMSA before we issue SNP_LAUNCH_FINISH command ?
>>
>> If we are going to add vCPUs and mark them as "pending AP boot creation" state then how are we going to do LAUNCH_UPDATE_VMSAs for them after SNP_LAUNCH_FINISH ?
>
> If I understand correctly we don't need or even want the APs to be
> LAUNCH_UPDATE_VMSA'd. LAUNCH_UPDATEing all the VMSAs causes VMs with
> different numbers of vCPUs to have different launch digests. Its my
> understanding the SNP AP Creation protocol was to solve this so that
> VMs with different vcpu counts have the same launch digest.
>
> Looking at patch "[Part2,v6,44/49] KVM: SVM: Support SEV-SNP AP
> Creation NAE event" and section "4.1.9 SNP AP Creation" of the GHCB
> spec. There is no need to mark the LAUNCH_UPDATE the AP's VMSA or mark
> the vCPUs runnable. Instead we can do that only for the BSP. Then in
> the guest UEFI the BSP can: create new VMSAs from guest pages,
> RMPADJUST them into the RMP state VMSA, then use the SNP AP Creation
> NAE to get the hypervisor to mark them runnable. I believe this is all
> setup in the UEFI patch:
> https://www.mail-archive.com/[email protected]/msg38460.html.

Not quite... there isn't a way to (easily) retrieve the APIC IDs for all
of the vCPUs, which are required in order to use the AP Create event.

For this version of SNP, all of the vCPUs are measured and started by OVMF
in the same way as SEV-ES. However, once the vCPUs have run, we now have
the APIC ID associated with each vCPU and the AP Create event can be used
going forward.

The SVSM support will introduce a new NAE event to the GHCB spec to
retrieve all of the APIC IDs from the hypervisor. With that, you
would only be required to perform a LAUNCH_UPDATE_VMSA against the BSP.

Thanks,
Tom

2022-07-13 15:02:06

by Peter Gonda

Subject: Re: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On Tue, Jul 12, 2022 at 11:40 AM Tom Lendacky <[email protected]> wrote:
>
> On 7/12/22 09:45, Peter Gonda wrote:
> > On Mon, Jul 11, 2022 at 4:41 PM Kalra, Ashish <[email protected]> wrote:
> >>
> >> [AMD Official Use Only - General]
> >>
> >> Hello Peter,
> >>
> >>>> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and
> >>>> stores it as the measurement of the guest at launch.
> >>>>
> >>>> While finalizing the launch flow, it also issues the LAUNCH_UPDATE
> >>>> command to encrypt the VMSA pages.
> >>
> >>> Given the guest uses the SNP NAE AP boot protocol we were expecting that there would be some option to add vCPUs to the VM but mark them as "pending AP boot creation protocol" state. This would allow the LaunchDigest of a VM doesn't change >just because its vCPU count changes. Would it be possible to add a new add an argument to KVM_SNP_LAUNCH_FINISH to tell it which vCPUs to LAUNCH_UPDATE VMSA pages for or similarly a new argument for KVM_CREATE_VCPU?
> >>
> >> But don't we want/need to measure all vCPUs using LAUNCH_UPDATE_VMSA before we issue SNP_LAUNCH_FINISH command ?
> >>
> >> If we are going to add vCPUs and mark them as "pending AP boot creation" state then how are we going to do LAUNCH_UPDATE_VMSAs for them after SNP_LAUNCH_FINISH ?
> >
> > If I understand correctly we don't need or even want the APs to be
> > LAUNCH_UPDATE_VMSA'd. LAUNCH_UPDATEing all the VMSAs causes VMs with
> > different numbers of vCPUs to have different launch digests. Its my
> > understanding the SNP AP Creation protocol was to solve this so that
> > VMs with different vcpu counts have the same launch digest.
> >
> > Looking at patch "[Part2,v6,44/49] KVM: SVM: Support SEV-SNP AP
> > Creation NAE event" and section "4.1.9 SNP AP Creation" of the GHCB
> > spec. There is no need to mark the LAUNCH_UPDATE the AP's VMSA or mark
> > the vCPUs runnable. Instead we can do that only for the BSP. Then in
> > the guest UEFI the BSP can: create new VMSAs from guest pages,
> > RMPADJUST them into the RMP state VMSA, then use the SNP AP Creation
> > NAE to get the hypervisor to mark them runnable. I believe this is all
> > setup in the UEFI patch:
> > https://www.mail-archive.com/[email protected]/msg38460.html.
>
> Not quite... there isn't a way to (easily) retrieve the APIC IDs for all
> of the vCPUs, which are required in order to use the AP Create event.
>
> For this version of SNP, all of the vCPUs are measured and started by OVMF
> in the same way as SEV-ES. However, once the vCPUs have run, we now have
> the APIC ID associated with each vCPU and the AP Create event can be used
> going forward.
>
> The SVSM support will introduce a new NAE event to the GHCB spec to
> retrieve all of the APIC IDs from the hypervisor. With that, you
> would only be required to perform a LAUNCH_UPDATE_VMSA against the BSP.

Thank you, Tom. I missed that we needed to run the APs to set up their
APIC IDs for OVMF. Is there any reason we need to wait for the SVSM to
do what you describe? Couldn't OVMF use an NAE event to get all the APIC
IDs?

>
> Thanks,
> Tom
>

2022-07-17 10:08:13

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

On Mon, Jun 20, 2022 at 11:02:01PM +0000, Ashish Kalra wrote:
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the
> + * bookkeeping, the range need to be added during the RMP entry lookup.

needs

> +static int __snp_enable(unsigned int cpu)
> +{
> + u64 val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> + val |= MSR_AMD64_SYSCFG_SNP_EN;
> + val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> + wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> + return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> + __snp_enable(smp_processor_id());
> +}

Get rid of that silly wrapper - you're not even using that @cpu argument.

> +static bool get_rmptable_info(u64 *start, u64 *len)
> +{
> + u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
> +
> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> + if (!rmp_base || !rmp_end) {
> + pr_info("Memory for the RMP table has not been reserved by BIOS\n");

pr_err

> + return false;
> + }
> +
> + rmp_sz = rmp_end - rmp_base + 1;
> +
> + /*
> + * Calculate the amount the memory that must be reserved by the BIOS to
> + * address the full system RAM. The reserved memory should also cover the

"... address the whole RAM."

> + * RMP table itself.
> + *
> + * See PPR Family 19h Model 01h, Revision B1 section 2.1.4.2 for more
> + * information on memory requirement.

That section number will change over time - if you want to refer to some
section just use its title so that people can at least grep for the
relevant text.

> + */
> + nr_pages = totalram_pages();
> + calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + nr_pages) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;

use totalram_pages() directly and get rid of nr_pages.

> +
> + if (calc_rmp_sz > rmp_sz) {
> + pr_info("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> + calc_rmp_sz, rmp_sz);

pr_err

> + return false;
> + }
> +
> + *start = rmp_base;
> + *len = rmp_sz;
> +
> + pr_info("RMP table physical address 0x%016llx - 0x%016llx\n", rmp_base, rmp_end);

"RMP table physical address range: ...[0x.. - 0x..]"

> +
> + return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)

s/int/bool/

> +{
> + u64 rmp_base, sz;
> + void *start;
> + u64 val;
> +
> + if (!get_rmptable_info(&rmp_base, &sz))
> + return 1;
> +
> + start = memremap(rmp_base, sz, MEMREMAP_WB);
> + if (!start) {
> + pr_err("Failed to map RMP table 0x%llx+0x%llx\n", rmp_base, sz);
^^^^^^

either write the size in decimal or do a normal interval.

> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled, this can happen if we are coming from

Who is "we"?

Pls get rid of all "we" in the comments and use passive formulations.

> + * kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */
> + memset(start, 0, sz);

Do I understand it correctly that in the kexec case the second, kexec-ed
kernel is reusing the previous kernel's RMP table so it should not be
cleared?

> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* Enable SNP on all CPUs. */
> + on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> + rmptable_start = (unsigned long)start;
> + rmptable_end = rmptable_start + sz;
> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_SEV_SNP))

cpu_feature_enabled

> + return 0;
> +
> + if (!iommu_sev_snp_supported())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return 1;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because before enabling
> + * the SNP feature we need to ensure that IOMMU supports the SEV-SNP feature.
> + * The iommu_sev_snp_support() is used for checking the feature, and it is
> + * available after subsys_initcall().

I'd much more appreciate here a short formulation explaining why the
IOMMU is needed for SNP rather than stating the obvious.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
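
The size check in get_rmptable_info() can be worked through with a small
user-space sketch. The helper names `required_rmp_size()` and
`rmp_covers_ram()` are hypothetical; the formula is the one from the quoted
patch. Two readings are assumed here: the "<< 4" is taken to mean 16 bytes
per RMP entry, and RMPTABLE_CPU_BOOKKEEPING_SZ is taken to be the 16KB
mentioned in the patch comment.

```c
#include <stdint.h>

#define PAGE_SHIFT			12
#define RMP_ENTRY_SHIFT			4		/* assumed: 16 bytes per entry */
#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000ULL	/* assumed: first 16KB */

/*
 * Bytes the BIOS must reserve so that the RMP covers both system RAM
 * (ram_pages 4K pages) and the RMP table itself (rmp_sz bytes).
 */
static uint64_t required_rmp_size(uint64_t rmp_sz, uint64_t ram_pages)
{
	return (((rmp_sz >> PAGE_SHIFT) + ram_pages) << RMP_ENTRY_SHIFT) +
	       RMPTABLE_CPU_BOOKKEEPING_SZ;
}

/* Mirrors the "calc_rmp_sz > rmp_sz" rejection in the patch. */
static int rmp_covers_ram(uint64_t rmp_sz, uint64_t ram_pages)
{
	return required_rmp_size(rmp_sz, ram_pages) <= rmp_sz;
}
```

As a worked case under these assumptions: 4 GiB of RAM is 1048576 pages, so
the entries for RAM alone take 16 MiB. A 16 MiB reservation is therefore
rejected, because the entries for the table's own pages and the 16KB
bookkeeping area do not fit, while a 17 MiB reservation needs 16863232 bytes
and passes.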

2022-07-19 04:00:42

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

Hello Boris,

>> + * See PPR Family 19h Model 01h, Revision B1 section 2.1.4.2 for more
>> + * information on memory requirement.

>That section number will change over time - if you want to refer to some section just use its title so that people can at least grep for the relevant text.

This will all go into sev.c instead of the header file, as this is non-architectural and per-processor, and the structure won't be exposed to the rest
of the kernel. The above PPR reference and, potentially in the future, an architectural method of reading the RMP table entries will be moved into it.

>> + */
>> + nr_pages = totalram_pages();
>> + calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + nr_pages) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;

>use totalram_pages() directly and get rid of nr_pages.
Ok.

>> + * kexec boot.
>> + */
>> + rdmsrl(MSR_AMD64_SYSCFG, val);
>> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
>> + goto skip_enable;
>> +
>> + /* Initialize the RMP table to zero */
>> + memset(start, 0, sz);

>Do I understand it correctly that in the kexec case the second, kexec-ed kernel is reusing the previous kernel's RMP table so it should not be cleared?
I believe that with kexec and after issuing the shutdown command, the RMP table needs to be fully initialized, so we should be re-initializing the RMP
table to zero here.

>>
>> +
>> +static int __init snp_rmptable_init(void)
>> +{
>> + if (!boot_cpu_has(X86_FEATURE_SEV_SNP))

>cpu_feature_enabled
Ok.

>> + return 0;
>> +
>> + if (!iommu_sev_snp_supported())
>> + goto nosnp;
>> +
>> + if (__snp_rmptable_init())
>> + goto nosnp;
>> +
>> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
>> +
>> + return 0;
>> +
>> +nosnp:
>> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>> + return 1;
>> +}
>> +
>> +/*
>> + * This must be called after the PCI subsystem. This is because before enabling
>> + * the SNP feature we need to ensure that IOMMU supports the SEV-SNP feature.
>> + * The iommu_sev_snp_support() is used for checking the feature, and it is
>> + * available after subsys_initcall().

>I'd much more appreciate here a short formulation explaining why is IOMMU needed for SNP rather than the obvious.

Yes, IOMMU is enforced for SNP to ensure that HV cannot program DMA directly into guest private memory. In case of SNP,
the IOMMU makes sure that the page(s) used for DMA are HV owned.

Thanks,
Ashish

2022-07-19 08:41:28

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

On Tue, Jul 19, 2022 at 03:56:25AM +0000, Kalra, Ashish wrote:
> > That section number will change over time - if you want to refer to
> > some section just use its title so that people can at least grep for
> > the relevant text.
>
> This will all go into sev.c, instead of the header file, as this is
> non-architectural and per-processor and the structure won't be exposed
> to the rest of the kernel. The above PPR reference and potentially in
> future an architectural method of reading the RMP table entries will
> be moved into it.

I fail to see how this addresses my comment... All I'm saying is, the
"section 2.1.4.2" number will change so don't quote it in the text but
quote the section *name* instead.

> I believe that with kexec and after issuing the shutdown command,
> the RMP table needs to be fully initialized, so we should be
> re-initializing the RMP table to zero here.

And yet you're doing:

/*
* Check if SEV-SNP is already enabled, this can happen if we are coming from
* kexec boot.
*/
rdmsrl(MSR_AMD64_SYSCFG, val);
if (val & MSR_AMD64_SYSCFG_SNP_EN)
goto skip_enable; <-------- skip zeroing


So which is it?

> Yes, IOMMU is enforced for SNP to ensure that HV cannot program DMA
> directly into guest private memory. In case of SNP, the IOMMU makes
> sure that the page(s) used for DMA are HV owned.

Yes, now put that in the comment above the

fs_initcall(snp_rmptable_init);

line.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-19 11:46:37

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 03/49] x86/sev: Add the host SEV-SNP initialization support

Hello Boris,

>> > That section number will change over time - if you want to refer to
>> > some section just use its title so that people can at least grep for
>> > the relevant text.
>>
>> This will all go into sev.c, instead of the header file, as this is
>> non-architectural and per-processor and the structure won't be exposed
>> to the rest of the kernel. The above PPR reference and potentially in
>> future an architectural method of reading the RMP table entries will
>> be moved into it.

>I fail to see how this addresses my comment... All I'm saying is, the "section 2.1.4.2" number will change so don't quote it in the text but quote the section *name* instead.

Yes, I agree with your comments. All I am saying is that these comments will move into sev.c instead of the header file.

>> I believe that with kexec and after issuing the shutdown command, the
>> RMP table needs to be fully initialized, so we should be
>> re-initializing the RMP table to zero here.

>And yet you're doing:

> /*
> * Check if SEV-SNP is already enabled, this can happen if we are coming from
> * kexec boot.
> */
> rdmsrl(MSR_AMD64_SYSCFG, val);
> if (val & MSR_AMD64_SYSCFG_SNP_EN)
> goto skip_enable; <-------- skip zeroing

>So which is it?

Again what I meant is that this will be fixed to reset the RMP table for kexec boot too.

>> Yes, IOMMU is enforced for SNP to ensure that HV cannot program DMA
>> directly into guest private memory. In case of SNP, the IOMMU makes
>> sure that the page(s) used for DMA are HV owned.

>Yes, now put that in the comment above the

> fs_initcall(snp_rmptable_init);

>line.

Yes.

Thanks,
Ashish

2022-07-21 11:41:43

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 04/49] x86/sev: set SYSCFG.MFMD

On Mon, Jun 20, 2022 at 11:02:18PM +0000, Ashish Kalra wrote:
> Subject: [PATCH Part2 v6 04/49] x86/sev: set SYSCFG.MFMD

That subject title needs to be made human readable.

> From: Brijesh Singh <[email protected]>
>
> SEV-SNP FW >= 1.51 requires that SYSCFG.MFMD must be set.

Because?

Also, commit message needs to be human-readable and not pseudocode.

> @@ -2325,6 +2346,9 @@ static __init int __snp_rmptable_init(void)
> /* Flush the caches to ensure that data is written before SNP is enabled. */
> wbinvd_on_all_cpus();
>
> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> + on_each_cpu(mfdm_enable, NULL, 1);
> +
> /* Enable SNP on all CPUs. */
> on_each_cpu(snp_enable, NULL, 1);

No, not two IPI generating function calls - one and do everything in it.
I.e., what Marc said.
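
The single-callback shape can be sketched in user space as follows. This is only a model under stated assumptions: struct fake_cpu, fake_wrmsr() and for_each_fake_cpu() are hypothetical stand-ins for the real MSR accessors and on_each_cpu(), and the two bit positions are assumed values that should be checked against msr-index.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in MSR_AMD64_SYSCFG; verify against msr-index.h. */
#define SYSCFG_MFDM	(1ULL << 19)
#define SYSCFG_SNP_EN	(1ULL << 24)

/* Hypothetical stand-in for one CPU's SYSCFG MSR. */
struct fake_cpu {
	uint64_t syscfg;
};

/* Models wrmsrl() and enforces the rule from the patch comment. */
static void fake_wrmsr(struct fake_cpu *cpu, uint64_t val)
{
	/* SNP_EN may only be turned on once MFDM is already set. */
	if ((val & SYSCFG_SNP_EN) && !(cpu->syscfg & SYSCFG_SNP_EN))
		assert(cpu->syscfg & SYSCFG_MFDM);
	cpu->syscfg = val;
}

/* One per-CPU callback that does both steps, in order. */
static void snp_enable_cpu(void *arg)
{
	struct fake_cpu *cpu = arg;

	fake_wrmsr(cpu, cpu->syscfg | SYSCFG_MFDM);
	fake_wrmsr(cpu, cpu->syscfg | SYSCFG_SNP_EN);
}

/* Models on_each_cpu(): one pass over all CPUs, one callback each. */
static void for_each_fake_cpu(struct fake_cpu *cpus, int n, void (*fn)(void *))
{
	for (int i = 0; i < n; i++)
		fn(&cpus[i]);
}
```

With this shape, one for_each_fake_cpu(cpus, n, snp_enable_cpu) pass replaces the two separate rounds in the quoted hunk, and the ordering constraint lives inside the callback.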

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-22 11:42:16

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Thu, Jun 23, 2022 at 10:43:40PM +0000, Kalra, Ashish wrote:
> Yes, that's a nice way to hide it from the rest of the kernel which
> does not require access to this structure anyway, in essence, it
> becomes a private structure.

So this whole discussion whether there should be a model check or not
in case a new RMP format gets added in the future is moot - when a new
model format comes along, *then* the distinction should be done and
added in code - not earlier.

This is nothing else but normal CPU enablement work - it should be done
when it is really needed.

Because the opposite can happen: you can add a model check which
excludes future model X, future model X comes along but does *not*
change the RMP format and then you're going to have to relax that model
check again to fix SNP on the new model X.

So pls add the model checks only when really needed.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-22 11:48:13

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Mon, Jun 20, 2022 at 11:02:33PM +0000, Ashish Kalra wrote:
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 25c7feb367f6..59e7ec6b0326 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -65,6 +65,8 @@
> * bookkeeping, the range need to be added during the RMP entry lookup.
> */
> #define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +#define RMPENTRY_SHIFT 8
> +#define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
>
> /* For early boot hypervisor communication in SEV-ES enabled guests */
> static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> @@ -2386,3 +2388,44 @@ static int __init snp_rmptable_init(void)
> * available after subsys_initcall().
> */
> fs_initcall(snp_rmptable_init);
> +
> +static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> + unsigned long vaddr, paddr = pfn << PAGE_SHIFT;
> + struct rmpentry *entry, *large_entry;
> +
> + if (!pfn_valid(pfn))
> + return ERR_PTR(-EINVAL);
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return ERR_PTR(-ENXIO);

That test should happen first.

> + vaddr = rmptable_start + rmptable_page_offset(paddr);

Wait, what does that macro do?

It takes the physical address and gives the offset from the beginning of
the RMP table in VA space?

So why don't you do

entry = rmptable_entry(paddr)

instead which simply gives you directly the entry in the RMP table with
which you can work further?

Instead of this macro doing half the work and then callers having to add
the RMP start address and cast.

And make it a small function so that you can have typechecking too, while
at it.
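
A minimal user-space sketch of what such a typed helper could look like. The rmptable_entry() name comes from the suggestion above; struct rmptable and its fields are hypothetical (the patch uses the rmptable_start/rmptable_end globals), and the 16-byte entry size is inferred from RMPENTRY_SHIFT being 8 (paddr >> 8 equals pfn * 16). The feature check is modelled by a flag and is tested first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT			12
#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000	/* from the patch */

/* Opaque 16-byte stand-in for the model-specific RMP entry. */
struct rmpentry {
	uint64_t low;
	uint64_t high;
};

static_assert(sizeof(struct rmpentry) == 16, "one 16-byte entry per 4K page");

/* Hypothetical descriptor for the mapped RMP table. */
struct rmptable {
	uint8_t *start;	/* first byte, bookkeeping area included */
	size_t size;	/* total mapped size in bytes */
	int enabled;	/* models cpu_feature_enabled(X86_FEATURE_SEV_SNP) */
};

/*
 * Return the RMP entry covering @paddr, or NULL if SNP is off or the
 * entry would lie outside the table.
 */
static struct rmpentry *rmptable_entry(const struct rmptable *t, uint64_t paddr)
{
	size_t off;

	if (!t->enabled)
		return NULL;

	off = RMPTABLE_CPU_BOOKKEEPING_SZ +
	      (size_t)(paddr >> PAGE_SHIFT) * sizeof(struct rmpentry);
	if (off + sizeof(struct rmpentry) > t->size)
		return NULL;

	return (struct rmpentry *)(t->start + off);
}
```

A caller then gets a typed pointer in one step, and the 2M lookup becomes rmptable_entry(t, paddr & PMD_MASK) without any open-coded address arithmetic or casts.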

> + if (unlikely(vaddr > rmptable_end))
> + return ERR_PTR(-ENXIO);
> +
> + entry = (struct rmpentry *)vaddr;
> +
> + /* Read a large RMP entry to get the correct page level used in RMP entry. */
> + vaddr = rmptable_start + rmptable_page_offset(paddr & PMD_MASK);
> + large_entry = (struct rmpentry *)vaddr;
> + *level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
> +
> + return entry;
> +}
> +
> +/*
> + * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> + * and -errno if there is no corresponding RMP entry.
> + */
> +int snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> + struct rmpentry *e;
> +
> + e = __snp_lookup_rmpentry(pfn, level);
> + if (IS_ERR(e))
> + return PTR_ERR(e);
> +
> + return !!rmpentry_assigned(e);
> +}
> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> diff --git a/include/linux/sev.h b/include/linux/sev.h
> new file mode 100644
> index 000000000000..1a68842789e1
> --- /dev/null
> +++ b/include/linux/sev.h

Why is this header in the linux/ namespace and not in arch/x86/ ?

All that stuff in here doesn't have any meaning outside of x86...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-22 19:09:23

by Sean Christopherson

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Fri, Jul 22, 2022, Borislav Petkov wrote:
> On Thu, Jun 23, 2022 at 10:43:40PM +0000, Kalra, Ashish wrote:
> > Yes, that's a nice way to hide it from the rest of the kernel which
> > does not require access to this structure anyway, in essence, it
> > becomes a private structure.
>
> So this whole discussion whether there should be a model check or not
> in case a new RMP format gets added in the future is moot - when a new
> model format comes along, *then* the distinction should be done and
> added in code - not earlier.

I disagree. Running an old kernel on new hardware with a different RMP layout
should refuse to use SNP, not read/write garbage and likely corrupt the RMP and/or
host memory.

And IMO, hiding the non-architectural RMP format in SNP-specific code so that we
don't have to churn a bunch of call sites that don't _need_ access to the raw RMP
format is a good idea regardless of whether we want to be optimistic or pessimistic
about future formats.

> This is nothing else but normal CPU enablement work - it should be done
> when it is really needed.
>
> Because the opposite can happen: you can add a model check which
> excludes future model X, future model X comes along but does *not*
> change the RMP format and then you're going to have to relax that model
> check again to fix SNP on the new model X.
>
> So pls add the model checks only when really needed.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2022-07-22 19:30:08

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Fri, Jul 22, 2022 at 07:04:23PM +0000, Sean Christopherson wrote:
> I disagree. Running an old kernel on new hardware with a different RMP layout
> should refuse to use SNP, not read/write garbage and likely corrupt the RMP and/or
> host memory.

See my example below.

> And IMO, hiding the non-architectural RMP format in SNP-specific code so that we
> don't have to churn a bunch of call sites that don't _need_ access to the raw RMP
> format is a good idea regardless of whether we want to be optimistic or pessimistic
> about future formats.

I don't think I ever objected to that.

> > This is nothing else but normal CPU enablement work - it should be done
> > when it is really needed.
> >

<--- this here.

> > Because the opposite can happen: you can add a model check which
> > excludes future model X, future model X comes along but does *not*
> > change the RMP format and then you're going to have to relax that model
> > check again to fix SNP on the new model X.

So constantly adding new models to a list which support a certain
version of the RMP format doesn't scale either.

If you corrupt the RMP because your kernel is old, you'll crash and burn
very visibly so that you'll be forced to have to look for an updated
kernel regardless.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-22 19:39:00

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

Btw,

what could work is to spec only a *version* field somewhere in the HW or
FW which says which version the RMP header has.

Then, OS would check that field and if it doesn't support that certain
version, it'll bail.

I'd need to talk to folks first, though, what the whole story is behind
not spec-ing the RMP format...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-22 22:18:08

by Sean Christopherson

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Fri, Jul 22, 2022, Borislav Petkov wrote:
> On Fri, Jul 22, 2022 at 07:04:23PM +0000, Sean Christopherson wrote:
> > I disagree. Running an old kernel on new hardware with a different RMP layout
> > should refuse to use SNP, not read/write garbage and likely corrupt the RMP and/or
> > host memory.
>
> See my example below.
>
> > And IMO, hiding the non-architectural RMP format in SNP-specific code so that we
> > don't have to churn a bunch of call sites that don't _need_ access to the raw RMP
> > format is a good idea regardless of whether we want to be optimistic or pessimistic
> > about future formats.
>
> I don't think I ever objected to that.

Yar, just wanted to make sure we're all on the same page, I wasn't entirely
sure what was getting nacked :-)

> > > This is nothing else but normal CPU enablement work - it should be done
> > > when it is really needed.
> > >
>
> <--- this here.
>
> > > Because the opposite can happen: you can add a model check which
> > > excludes future model X, future model X comes along but does *not*
> > > change the RMP format and then you're going to have to relax that model
> > > check again to fix SNP on the new model X.
>
> So constantly adding new models to a list which support a certain
> version of the RMP format doesn't scale either.

Yeah, but either we get AMD to give us an architectural layout or we'll have to
eat that cost at some point in the future.

> If you corrupt the RMP because your kernel is old, you'll crash and burn
> very visibly so that you'll be forced to have to look for an updated
> kernel regardless.

Heh, you're definitely more optimistic than me. I can just see something truly
ridiculous happening like moving the page size bit and then getting weird behavior
only when KVM happens to need the page size for some edge case.

Anyways, it's not a sticking point, and I certainly am not volunteering to
maintain the FMS list...

2022-07-22 22:35:29

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Fri, Jul 22, 2022 at 10:16:07PM +0000, Sean Christopherson wrote:
> Yar, just wanted to make sure we're all on the same page, I wasn't entirely
> sure what was getting nacked :-)

Not nacked - we're all just talking here. :-)

> Heh, you're definitely more optimistic than me. I can just see something truly
> ridiculous happening like moving the page size bit and then getting weird behavior
> only when KVM happens to need the page size for some edge case.
>
> Anyways, it's not a sticking point, and I certainly am not volunteering to
> maintain the FMS list...

Yeah, no need for it to be a sticking point because a pretty reliable
birdie just told me that we're worrying for nothing and it all will
solve itself.

:-)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-24 17:36:50

by Dov Murik

Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

Hi Ashish,

On 21/06/2022 2:02, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
> hypervisor will use the instruction to add pages to the RMP table. See
> APM3 for details on the instruction operations.
>
> The PSMASH instruction expands a 2MB RMP entry into a corresponding set of
> contiguous 4KB-Page RMP entries. The hypervisor will use this instruction
> to adjust the RMP entry without invalidating the previous RMP entry.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 11 ++++++
> arch/x86/kernel/sev.c | 72 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 83 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index cb16f0e5b585..6ab872311544 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -85,7 +85,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
> +#define RMP_PG_SIZE_2M 1
> #define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>
> /*
> * The RMP entry format is not architectural. The format is defined in PPR
> @@ -126,6 +128,15 @@ struct snp_guest_platform_data {
> u64 secrets_gpa;
> };
>
> +struct rmpupdate {
> + u64 gpa;
> + u8 assigned;
> + u8 pagesize;
> + u8 immutable;
> + u8 rsvd;
> + u32 asid;
> +} __packed;
> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> extern struct static_key_false sev_es_enable_key;
> extern void __sev_es_ist_enter(struct pt_regs *regs);
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 59e7ec6b0326..f6c64a722e94 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2429,3 +2429,75 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
> return !!rmpentry_assigned(e);
> }
> EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> +
> +int psmash(u64 pfn)
> +{
> + unsigned long paddr = pfn << PAGE_SHIFT;
> + int ret;
> +
> + if (!pfn_valid(pfn))
> + return -EINVAL;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENXIO;
> +
> + /* Binutils version 2.36 supports the PSMASH mnemonic. */
> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> + : "=a"(ret)
> + : "a"(paddr)
> + : "memory", "cc");
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(psmash);
> +
> +static int rmpupdate(u64 pfn, struct rmpupdate *val)
> +{
> + unsigned long paddr = pfn << PAGE_SHIFT;
> + int ret;
> +
> + if (!pfn_valid(pfn))
> + return -EINVAL;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENXIO;
> +
> + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> + : "=a"(ret)
> + : "a"(paddr), "c"((unsigned long)val)
> + : "memory", "cc");
> + return ret;
> +}
> +
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
> +{
> + struct rmpupdate val;
> +
> + if (!pfn_valid(pfn))
> + return -EINVAL;
> +

Should we add more checks on the arguments?

1. asid must be > 0
2. gpa must be aligned according to 'level'
3. gpa must be below the maximal address for the guest

"Note that the guest physical address space is limited according to
CPUID Fn80000008_EAX and thus the GPAs used by the firmware in
measurement calculation are equally limited. Hypervisors should not
attempt to map pages outside of this limit."
(-SNP ABI spec page 86, section 8.17 SNP_LAUNCH_UPDATE)


But note that in patch 28 of this series we have:

+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);

That (u64)(-1) value for the gpa argument violates conditions 2 and 3
from my list above.

And indeed when calculating measurements we see that the GPA value
for the VMSA pages is 0x0000FFFF_FFFFF000, and not (u64)(-1). [1] [2]

Instead of checks, we can mask the gpa argument so that rmpupdate will
get the correct value. Not sure which approach is preferable.
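
Both options can be sketched in user-space C. Everything here is illustrative: rmp_private_args_ok() and rmp_mask_gpa() are hypothetical names, the local enum only mirrors the kernel's pg_level values, and gpa_bits is passed in by the caller rather than read from CPUID Fn80000008_EAX. The 48-bit width used in the note below is an assumption chosen to match the VMSA GPA cited above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's enum pg_level. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2 };

static uint64_t level_size(enum pg_level level)
{
	return level == PG_LEVEL_4K ? (1ULL << 12) : (1ULL << 21);
}

/* Option 1: reject arguments that violate conditions 1-3. */
static bool rmp_private_args_ok(uint64_t gpa, enum pg_level level,
				int asid, unsigned int gpa_bits)
{
	assert(gpa_bits >= 12 && gpa_bits < 64);

	if (asid <= 0)				/* 1. asid must be > 0 */
		return false;
	if (gpa & (level_size(level) - 1))	/* 2. aligned to 'level' */
		return false;
	if (gpa >= (1ULL << gpa_bits))		/* 3. below the guest limit */
		return false;
	return true;
}

/* Option 2: mask the GPA down to an aligned, in-range value. */
static uint64_t rmp_mask_gpa(uint64_t gpa, enum pg_level level,
			     unsigned int gpa_bits)
{
	uint64_t limit_mask = (1ULL << gpa_bits) - 1;

	return gpa & limit_mask & ~(level_size(level) - 1);
}
```

With gpa_bits = 48, rmp_mask_gpa((u64)-1, PG_LEVEL_4K, 48) yields 0x0000FFFFFFFFF000, the value the measurement tools use for VMSA pages, while rmp_private_args_ok() rejects the unmasked (u64)-1.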


[1] https://github.com/IBM/sev-snp-measure/blob/90f6e59831d20e44d03d5ee19388f624fca87291/sevsnpmeasure/gctx.py#L40
[2] https://github.com/slp/snp-digest-rs/blob/0e5a787e99069944467151101ae4db474793d657/src/main.rs#L86


-Dov


> + memset(&val, 0, sizeof(val));
> + val.assigned = 1;
> + val.asid = asid;
> + val.immutable = immutable;
> + val.gpa = gpa;
> + val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> +
> + return rmpupdate(pfn, &val);
> +}
> +EXPORT_SYMBOL_GPL(rmp_make_private);
> +
> +int rmp_make_shared(u64 pfn, enum pg_level level)
> +{
> + struct rmpupdate val;
> +
> + if (!pfn_valid(pfn))
> + return -EINVAL;
> +
> + memset(&val, 0, sizeof(val));
> + val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> +
> + return rmpupdate(pfn, &val);
> +}
> +EXPORT_SYMBOL_GPL(rmp_make_shared);

2022-07-25 11:21:20

by Vlastimil Babka

Subject: Re: [PATCH Part2 v6 33/49] KVM: x86: Update page-fault trace to log full 64-bit error code

On 6/21/22 01:10, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The #NPT error code is a 64-bit value but the trace prints only the
> lower 32-bits. Some of the fault error code (e.g PFERR_GUEST_FINAL_MASK)
> are available in the upper 32-bits.
>
> Cc: <[email protected]>

Why stable?

> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kvm/trace.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index e3a24b8f04be..9b9bc5468103 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -383,12 +383,12 @@ TRACE_EVENT(kvm_inj_exception,
> * Tracepoint for page fault.
> */
> TRACE_EVENT(kvm_page_fault,
> - TP_PROTO(unsigned long fault_address, unsigned int error_code),
> + TP_PROTO(unsigned long fault_address, u64 error_code),
> TP_ARGS(fault_address, error_code),
>
> TP_STRUCT__entry(
> __field( unsigned long, fault_address )
> - __field( unsigned int, error_code )
> + __field( u64, error_code )
> ),
>
> TP_fast_assign(
> @@ -396,7 +396,7 @@ TRACE_EVENT(kvm_page_fault,
> __entry->error_code = error_code;
> ),
>
> - TP_printk("address %lx error_code %x",
> + TP_printk("address %lx error_code %llx",
> __entry->fault_address, __entry->error_code)
> );
>
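
For reference, the truncation described in the commit message can be shown with a small user-space sketch. The helper names are hypothetical, and the bit position of PFERR_GUEST_FINAL_MASK (bit 32) is an assumption to be checked against kvm_host.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position of PFERR_GUEST_FINAL_MASK; check kvm_host.h. */
#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)

static_assert(sizeof(unsigned int) == 4, "models the old 32-bit trace field");

/* What the old tracepoint stored: the error code narrowed to unsigned int. */
static unsigned int traced_error_code_old(uint64_t error_code)
{
	return (unsigned int)error_code;
}

/* What the patched tracepoint stores: the full 64-bit value. */
static uint64_t traced_error_code_new(uint64_t error_code)
{
	return error_code;
}
```

An error code of PFERR_GUEST_FINAL_MASK | 0x4 is recorded as 0x4 by the old field, so the upper-half flag never reaches the trace output.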

2022-07-25 13:25:11

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On Tue, Jun 28, 2022 at 05:57:41PM +0000, Kalra, Ashish wrote:
> Yes, I will be adding a check for CPU family/model as following :

Why, if the PPR already kinda spells out the architectural pieces of the
RMP entry?

"In order to assist software" it says.

So you call the specified ones by their name and the rest is __rsvd.

No need for model checks at all.

Right?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-25 14:33:43

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

On Mon, Jun 20, 2022 at 11:02:33PM +0000, Ashish Kalra wrote:
> +/*
> + * The RMP entry format is not architectural. The format is defined in PPR
> + * Family 19h Model 01h, Rev B1 processor.
> + */
> +struct __packed rmpentry {

That __packed goes...

> + union {
> + struct {
> + u64 assigned : 1,
> + pagesize : 1,
> + immutable : 1,
> + rsvd1 : 9,
> + gpa : 39,
> + asid : 10,
> + vmsa : 1,
> + validated : 1,
> + rsvd2 : 1;
> + } info;
> + u64 low;
> + };
> + u64 high;
> +};

... here, at the end.
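
As a user-space sketch, the struct with the attribute moved to the end looks like this. The __packed macro is defined locally to mirror the kernel's, rmpentry_gpa() is a hypothetical accessor, and the bit positions in the comments assume the usual little-endian GCC/Clang bit-field layout on x86-64.

```c
#include <assert.h>
#include <stdint.h>

#ifndef __packed
#define __packed __attribute__((__packed__))
#endif

struct rmpentry {
	union {
		struct {
			uint64_t assigned	: 1,	/* bit 0 */
				 pagesize	: 1,	/* bit 1 */
				 immutable	: 1,	/* bit 2 */
				 rsvd1		: 9,	/* bits 3-11 */
				 gpa		: 39,	/* bits 12-50 */
				 asid		: 10,	/* bits 51-60 */
				 vmsa		: 1,	/* bit 61 */
				 validated	: 1,	/* bit 62 */
				 rsvd2		: 1;	/* bit 63 */
		} info;
		uint64_t low;
	};
	uint64_t high;
} __packed;

static_assert(sizeof(struct rmpentry) == 16, "RMP entry must stay 16 bytes");

/* Hypothetical accessor: the field holds GPA bits 50:12. */
static uint64_t rmpentry_gpa(const struct rmpentry *e)
{
	return (uint64_t)e->info.gpa << 12;
}
```

The compiler accepts the attribute in either position; putting it after the closing brace is a readability convention, which is what the review comment asks for.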

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-25 14:37:35

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On Mon, Jun 20, 2022 at 11:02:52PM +0000, Ashish Kalra wrote:
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index cb16f0e5b585..6ab872311544 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -85,7 +85,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
> +#define RMP_PG_SIZE_2M 1
> #define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>
> /*
> * The RMP entry format is not architectural. The format is defined in PPR
> @@ -126,6 +128,15 @@ struct snp_guest_platform_data {
> u64 secrets_gpa;
> };
>
> +struct rmpupdate {

Why is there a struct rmpupdate *and* a struct rmpentry?!

One should be enough.

> + u64 gpa;
> + u8 assigned;
> + u8 pagesize;
> + u8 immutable;
> + u8 rsvd;
> + u32 asid;
> +} __packed;
> +

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-27 18:32:28

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 07/49] x86/sev: Invalid pages from direct map when adding it to RMP table

On Mon, Jun 20, 2022 at 11:03:07PM +0000, Ashish Kalra wrote:

> Subject: x86/sev: Invalid pages from direct map when adding it to RMP table

"...: Invalidate pages from the direct map when adding them to the RMP table"

> +static int restore_direct_map(u64 pfn, int npages)
> +{
> + int i, ret = 0;
> +
> + for (i = 0; i < npages; i++) {
> + ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));

set_memory_p() ?

> + if (ret)
> + goto cleanup;
> + }
> +
> +cleanup:
> + WARN(ret > 0, "Failed to restore direct map for pfn 0x%llx\n", pfn + i);

Warn for each pfn?!

That'll flood dmesg mightily.

> + return ret;
> +}
> +
> +static int invalid_direct_map(unsigned long pfn, int npages)
> +{
> + int i, ret = 0;
> +
> + for (i = 0; i < npages; i++) {
> + ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));

As above, set_memory_np() doesn't work here instead of looping over each
page?

> @@ -2462,11 +2494,38 @@ static int rmpupdate(u64 pfn, struct rmpupdate *val)
> if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> return -ENXIO;
>
> + level = RMP_TO_X86_PG_LEVEL(val->pagesize);
> + npages = page_level_size(level) / PAGE_SIZE;
> +
> + /*
> + * If page is getting assigned in the RMP table then unmap it from the
> + * direct map.
> + */
> + if (val->assigned) {
> + if (invalid_direct_map(pfn, npages)) {
> + pr_err("Failed to unmap pfn 0x%llx pages %d from direct_map\n",

"Failed to unmap %d pages at pfn 0x... from the direct map\n"

> + pfn, npages);
> + return -EFAULT;
> + }
> + }
> +
> /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> : "=a"(ret)
> : "a"(paddr), "c"((unsigned long)val)
> : "memory", "cc");
> +
> + /*
> + * Restore the direct map after the page is removed from the RMP table.
> + */
> + if (!ret && !val->assigned) {
> + if (restore_direct_map(pfn, npages)) {
> + pr_err("Failed to map pfn 0x%llx pages %d in direct_map\n",

"Failed to map %d pages at pfn 0x... into the direct map\n"

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-08-01 21:17:43

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 04/49] x86/sev: set SYSCFG.MFMD

Hello Boris,

>> Subject: [PATCH Part2 v6 04/49] x86/sev: set SYSCFG.MFMD

>That subject title needs to be made human readable.
Ok.

>> SEV-SNP FW >= 1.51 requires that SYSCFG.MFMD must be set.

>Because?
This is a FW requirement.

>Also, commit message needs to be human-readable and not pseudocode.

>> @@ -2325,6 +2346,9 @@ static __init int __snp_rmptable_init(void)
>> /* Flush the caches to ensure that data is written before SNP is enabled. */
>> wbinvd_on_all_cpus();
>>
>> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
>> + on_each_cpu(mfdm_enable, NULL, 1);
>> +
>> /* Enable SNP on all CPUs. */
>> on_each_cpu(snp_enable, NULL, 1);

>No, not two IPI generating function calls - one and do everything in it.
>I.e., what Marc said.

Ok got that.

Thanks,
Ashish

2022-08-01 21:51:15

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

>> I disagree. Running an old kernel on new hardware with a different
>> RMP layout should refuse to use SNP, not read/write garbage and likely
>> corrupt the RMP and/or host memory.

>See my example below.

>> And IMO, hiding the non-architectural RMP format in SNP-specific code
>> so that we don't have to churn a bunch of call sites that don't _need_
>> access to the raw RMP format is a good idea regardless of whether we
>> want to be optimistic or pessimistic about future formats.

>I don't think I ever objected to that.

I agree with what Sean is recommending to do.

As I mentioned earlier, in the long term and with respect to future platforms, we are going to add architectural support
to read RMP table entries, so this structure will exist only for older platform support.

Thanks,
Ashish

2022-08-01 21:56:44

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

As I mentioned before, in the future we will have architectural support to read RMP table entries. We will first check for
availability of this feature and always use it if it is supported and enabled, and only fall back to doing raw RMP table access
if this architectural support is not available.

Thanks,
Ashish

-----Original Message-----
From: Borislav Petkov <[email protected]>
Sent: Friday, July 22, 2022 2:38 PM
To: Sean Christopherson <[email protected]>
Cc: Kalra, Ashish <[email protected]>; Dave Hansen <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Lendacky, Thomas <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Roth, Michael <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

Btw,

what could work is to spec only a *version* field somewhere in the HW or FW which says which version the RMP header has.

Then, OS would check that field and if it doesn't support that certain version, it'll bail.

I'd need to talk to folks first, though, what the whole story is behind not spec-ing the RMP format...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-08-01 22:10:17

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 05/49] x86/sev: Add RMP entry lookup helpers

Hello Boris,

>> + * The RMP entry format is not architectural. The format is defined in PPR
>> + * Family 19h Model 01h, Rev B1 processor.
>> + */
>> +struct __packed rmpentry {

>That __packed goes...

>> + union {
>> + struct {
>> + u64 assigned : 1,
>> + pagesize : 1,
>> + immutable : 1,
>> + rsvd1 : 9,
>> + gpa : 39,
>> + asid : 10,
>> + vmsa : 1,
>> + validated : 1,
>> + rsvd2 : 1;
>> + } info;
>> + u64 low;
>> + };
>> + u64 high;
>> +};

>... here, at the end.

Yes, will fix that.

Thanks,
Ashish

2022-08-01 22:35:55

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

Hello Boris,

>> +struct rmpupdate {

>Why is there a struct rmpupdate *and* a struct rmpentry?!

The struct rmpentry is the raw layout of the RMP table entry while struct rmpupdate is the structure
expected by the rmpupdate instruction for programming the RMP table entries.

Arguably, we can program a struct rmpupdate internally from a struct rmpentry.

But we will still need struct rmpupdate for issuing the rmpupdate instruction, so it is probably cleaner
to keep it this way, as it only has two main callers - rmp_make_private() and rmp_make_shared().

Also, because struct rmpentry is non-architectural, the above functions may need to be modified
if struct rmpentry changes, while struct rmpupdate remains consistent and persistent.
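
To make the distinction concrete, here is a user-space sketch of the instruction operand and of how the two callers fill it. The struct layout is copied from the patch; rmpupdate_private() and rmpupdate_shared() are hypothetical helpers that only build the operand and do not issue RMPUPDATE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RMP_PG_SIZE_4K	0
#define RMP_PG_SIZE_2M	1

/* Operand layout expected by RMPUPDATE, as defined in the patch. */
struct rmpupdate {
	uint64_t gpa;
	uint8_t  assigned;
	uint8_t  pagesize;
	uint8_t  immutable;
	uint8_t  rsvd;
	uint32_t asid;
} __attribute__((__packed__));

static_assert(sizeof(struct rmpupdate) == 16, "RMPUPDATE operand is 16 bytes");

/* Operand for assigning a page to a guest (what rmp_make_private() builds). */
static struct rmpupdate rmpupdate_private(uint64_t gpa, uint8_t rmp_pagesize,
					  uint32_t asid, bool immutable)
{
	struct rmpupdate val;

	memset(&val, 0, sizeof(val));
	val.assigned = 1;
	val.asid = asid;
	val.immutable = immutable;
	val.gpa = gpa;
	val.pagesize = rmp_pagesize;
	return val;
}

/* Operand for returning a page to the hypervisor: everything zero but size. */
static struct rmpupdate rmpupdate_shared(uint8_t rmp_pagesize)
{
	struct rmpupdate val;

	memset(&val, 0, sizeof(val));
	val.pagesize = rmp_pagesize;
	return val;
}
```

Nothing in this operand depends on the bit-field layout of struct rmpentry, which is the point being made about keeping the two structures separate.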

>One should be enough.

>> + u64 gpa;
>> + u8 assigned;
>> + u8 pagesize;
>> + u8 immutable;
>> + u8 rsvd;
>> + u32 asid;
>> +} __packed;
>> +

Thanks,
Ashish

2022-08-01 23:35:26

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

Hello Boris,

>> Yes, I will be adding a check for CPU family/model as following :

>Why, if the PPR already kinda spells out the architectural pieces of the RMP entry?

>"In order to assist software" it says.

The PPR specifies select portions of the RMP entry format for a specific core/platform.

Therefore, the complete struct rmpentry definition is non-architectural.

As per PPR, software should not rely on any field definitions not specified
in this table and the format of an RMP entry may change in future processors.

>So you call the specified ones by their name and the rest is __rsvd.

>No need for model checks at all.

>Right?

But we can't use this struct on a core/platform which has a different layout, so aren't
the model checks required ?

Thanks,
Ashish

2022-08-02 00:02:09

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 07/49] x86/sev: Invalid pages from direct map when adding it to RMP table

Hello Boris,

>> Subject: x86/sev: Invalid pages from direct map when adding it to RMP
>> table

>"...: Invalidate pages from the direct map when adding them to the RMP table"
Ok

>> +static int restore_direct_map(u64 pfn, int npages)
>> +{
>> + int i, ret = 0;
>> +
>> + for (i = 0; i < npages; i++) {
>> + ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));

>set_memory_p() ?

You mean set_memory_present() ?

Is there an issue with using set_direct_map_default_noflush()? It is easier to correlate
this function with its functionality of restoring the page in the kernel direct map.

>> + if (ret)
>> + goto cleanup;
>> + }
>> +
>> +cleanup:
>> + WARN(ret > 0, "Failed to restore direct map for pfn 0x%llx\n", pfn + i);

>Warn for each pfn?!

>That'll flood dmesg mightily.

> + return ret;
> +}
> +
> +static int invalid_direct_map(unsigned long pfn, int npages) {
> + int i, ret = 0;
> +
> + for (i = 0; i < npages; i++) {
> + ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));

>As above, set_memory_np() doesn't work here instead of looping over each page?

Yes, set_memory_np() looks more efficient to use instead of looping over each page.

But again, isn't calling set_direct_map_invalid_noflush() easier to understand from the
calling function's point of view, since it correlates directly with invalidating the
page from the kernel direct map?
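
To illustrate the two review points above (handle the range in one helper and
avoid one warning per pfn), here is a minimal user-space sketch, not kernel
code. It assumes a per-page operation passed in as a callback standing in for
something like set_direct_map_invalid_noflush(); apply_to_range() and
fail_on_0x102() are hypothetical names used only for this example.

```c
#include <stdio.h>

/* Stand-in for a per-page helper such as set_direct_map_invalid_noflush(). */
typedef int (*page_op_t)(unsigned long pfn);

/*
 * Apply op to npages pages starting at pfn. Stops at the first failure and
 * reports it once, instead of emitting one message per page.
 * *done receives the number of pages processed successfully.
 * Returns 0 on success or the first non-zero error returned by op.
 */
static int apply_to_range(unsigned long pfn, int npages, page_op_t op,
			  int *done)
{
	int i, ret = 0;

	for (i = 0; i < npages; i++) {
		ret = op(pfn + i);
		if (ret)
			break;
	}

	*done = i;
	if (ret)
		fprintf(stderr, "failed at pfn 0x%lx after %d of %d pages\n",
			pfn + i, i, npages);
	return ret;
}

/* Hypothetical operation for demonstration only: fails on pfn 0x102. */
static int fail_on_0x102(unsigned long pfn)
{
	return pfn == 0x102 ? -1 : 0;
}
```

Returning the count of completed pages lets the caller decide how much to
unroll, and the single message keeps the log readable even for large ranges.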

>> + if (val->assigned) {
>> + if (invalid_direct_map(pfn, npages)) {
>> + pr_err("Failed to unmap pfn 0x%llx pages %d from direct_map\n",

>"Failed to unmap %d pages at pfn 0x... from the direct map\n"
Ok.

>> + if (!ret && !val->assigned) {
>> + if (restore_direct_map(pfn, npages)) {
>> + pr_err("Failed to map pfn 0x%llx pages %d in direct_map\n",

>"Failed to map %d pages at pfn 0x... into the direct map\n"
Ok.

Thanks,
Ashish

2022-08-02 04:58:01

by Kalra, Ashish

Subject: RE: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction


Hello Dov,

>> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>> +bool immutable) {
>> + struct rmpupdate val;
>> +
>> + if (!pfn_valid(pfn))
>> + return -EINVAL;
>> +

>Should we add more checks on the arguments?

>1. asid must be > 0
>2. gpa must be aligned according to 'level'
>3. gpa must be below the maximal address for the guest

Ok, yes it surely makes sense to add more checks on the arguments.

>"Note that the guest physical address space is limited according to CPUID Fn80000008_EAX and thus the GPAs used by the firmware in measurement calculation are equally limited. Hypervisors should not attempt to map pages outside of this limit."
>(-SNP ABI spec page 86, section 8.17 SNP_LAUNCH_UPDATE)


>But note that in patch 28 of this series we have:

>+ /* Transition the VMSA page to a firmware state. */
>+ ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);

>That (u64)(-1) value for the gpa argument violates conditions 2 and 3 from my list above.

>And indeed when calculating measurements we see that the GPA value for the VMSA pages is 0x0000FFFF_FFFFF000, and not (u64)(-1). [1] [2]

>Instead of checks, we can mask the gpa argument so that rmpupdate will get the correct value. Not sure which approach is preferable.

Well, the firmware is anyway masking the gpa argument as you observe in the launch digest, so probably we should do the same here too.
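
To make the two options concrete, here is a minimal user-space sketch (not
kernel code) of both checking and masking the gpa argument. It assumes a
48-bit guest physical address width, which in practice would be read from
CPUID Fn8000_0008 EAX; GPA_BITS, enum level, level_size(), gpa_args_valid()
and gpa_mask() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit guest physical address space (really read from CPUID). */
#define GPA_BITS	48
#define SIZE_4K		(1ULL << 12)
#define SIZE_2M		(1ULL << 21)

enum level { LEVEL_4K, LEVEL_2M };	/* stand-in for enum pg_level */

static uint64_t level_size(enum level level)
{
	return level == LEVEL_2M ? SIZE_2M : SIZE_4K;
}

/* Option 1: reject bad arguments (conditions 1-3 from the list above). */
static bool gpa_args_valid(uint64_t gpa, enum level level, int asid)
{
	if (asid <= 0)
		return false;
	if (gpa & (level_size(level) - 1))
		return false;
	return gpa < (1ULL << GPA_BITS);
}

/* Option 2: mask the GPA to the address width and the page alignment. */
static uint64_t gpa_mask(uint64_t gpa, enum level level)
{
	uint64_t width_mask = (1ULL << GPA_BITS) - 1;

	return gpa & width_mask & ~(level_size(level) - 1);
}
```

Under these assumptions, gpa_mask((uint64_t)-1, LEVEL_4K) yields
0x0000FFFFFFFFF000, the value seen in the launch digest, while
gpa_args_valid((uint64_t)-1, LEVEL_4K, 1) returns false.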

Thanks,
Ashish

2022-08-02 10:59:02

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 13/49] crypto:ccp: Provide APIs to issue SEV-SNP commands

On Tue, Jun 21, 2022 at 03:43:13PM -0600, Peter Gonda wrote:
> (
>
> On Mon, Jun 20, 2022 at 5:05 PM Ashish Kalra <[email protected]> wrote:
> >
> > From: Brijesh Singh <[email protected]>
> >
> > Provide the APIs for the hypervisor to manage an SEV-SNP guest. The
> > commands for SEV-SNP is defined in the SEV-SNP firmware specification.
> >
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > drivers/crypto/ccp/sev-dev.c | 24 ++++++++++++
> > include/linux/psp-sev.h | 73 ++++++++++++++++++++++++++++++++++++
> > 2 files changed, 97 insertions(+)
> >
> > diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> > index f1173221d0b9..35d76333e120 100644
> > --- a/drivers/crypto/ccp/sev-dev.c
> > +++ b/drivers/crypto/ccp/sev-dev.c
> > @@ -1205,6 +1205,30 @@ int sev_guest_df_flush(int *error)
> > }
> > EXPORT_SYMBOL_GPL(sev_guest_df_flush);
> >
> > +int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error)
> > +{
> > + return sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, data, error);
> > +}
> > +EXPORT_SYMBOL_GPL(snp_guest_decommission);
> > +
> > +int snp_guest_df_flush(int *error)
> > +{
> > + return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
> > +}
> > +EXPORT_SYMBOL_GPL(snp_guest_df_flush);

Nit: undocumented exported functions. Both need kdoc.

>
> Why not instead change sev_guest_df_flush() to be SNP aware? That way
> callers get the right behavior without having to know if SNP is
> enabled or not.
>
> int sev_guest_df_flush(int *error)
> {
> if (!psp_master || !psp_master->sev_data)
> return -EINVAL;
>
> if (psp_master->sev_data->snp_inited)
> return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
>
> return sev_do_cmd(SEV_CMD_DF_FLUSH, NULL, error);
> }

Because fusing them into one serves no purpose and only makes the code more
obfuscated (and it is also undocumented).

Two separate exported symbols can also be traced separately with ftrace/kprobes.

Degrading transparency is not a great idea IMHO.

BR, Jarkko


2022-08-02 11:25:11

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

On Mon, Jun 20, 2022 at 11:05:01PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The behavior and requirement for the SEV-legacy command is altered when
> the SNP firmware is in the INIT state. See SEV-SNP firmware specification
> for more details.
>
> Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
> when SNP is enabled to satify new requirements for the SNP. Continue
> allocating a 1mb region for !SNP configuration.
>
> While at it, provide API that can be used by others to allocate a page
> that can be used by the firmware. The immediate user for this API will
> be the KVM driver. The KVM driver to need to allocate a firmware context
> page during the guest creation. The context page need to be updated
> by the firmware. See the SEV-SNP specification for further details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 173 +++++++++++++++++++++++++++++++++--
> include/linux/psp-sev.h | 11 +++
> 2 files changed, 178 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 35d76333e120..0dbd99f29b25 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -79,6 +79,14 @@ static void *sev_es_tmr;
> #define NV_LENGTH (32 * 1024)
> static void *sev_init_ex_buffer;
>
> +/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
> +#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
> +
> +static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
> +
> +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
> +static int sev_do_cmd(int cmd, void *data, int *psp_ret);
> +
> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -177,11 +185,161 @@ static int sev_cmd_buffer_len(int cmd)
> return 0;
> }
>
> +static void snp_leak_pages(unsigned long pfn, unsigned int npages)
> +{
> + WARN(1, "psc failed, pfn 0x%lx pages %d (leaking)\n", pfn, npages);
> + while (npages--) {
> + memory_failure(pfn, 0);
> + dump_rmpentry(pfn);
> + pfn++;
> + }
> +}
> +
> +static int snp_reclaim_pages(unsigned long pfn, unsigned int npages, bool locked)
> +{
> + struct sev_data_snp_page_reclaim data;
> + int ret, err, i, n = 0;
> +
> + for (i = 0; i < npages; i++) {
> + memset(&data, 0, sizeof(data));
> + data.paddr = pfn << PAGE_SHIFT;
> +
> + if (locked)
> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> + else
> + ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> + if (ret)
> + goto cleanup;
> +
> + ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (ret)
> + goto cleanup;
> +
> + pfn++;
> + n++;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * If failed to reclaim the page then page is no longer safe to
> + * be released, leak it.
> + */
> + snp_leak_pages(pfn, npages - n);
> + return ret;
> +}
> +
> +static inline int rmp_make_firmware(unsigned long pfn, int level)
> +{
> + return rmp_make_private(pfn, 0, level, 0, true);
> +}
> +
> +static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
> + bool need_reclaim)
> +{
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT; /* Cbit maybe set in the paddr */
> + int rc, n = 0, i;
> +
> + for (i = 0; i < npages; i++) {
> + if (to_fw)
> + rc = rmp_make_firmware(pfn, PG_LEVEL_4K);
> + else
> + rc = need_reclaim ? snp_reclaim_pages(pfn, 1, locked) :
> + rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (rc)
> + goto cleanup;
> +
> + pfn++;
> + n++;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /* Try unrolling the firmware state changes */
> + if (to_fw) {
> + /*
> + * Reclaim the pages which were already changed to the
> + * firmware state.
> + */
> + snp_reclaim_pages(paddr >> PAGE_SHIFT, n, locked);
> +
> + return rc;
> + }
> +
> + /*
> + * If failed to change the page state to shared, then its not safe
> + * to release the page back to the system, leak it.
> + */
> + snp_leak_pages(pfn, npages - n);
> +
> + return rc;
> +}
> +
> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> +{
> + unsigned long npages = 1ul << order, paddr;
> + struct sev_device *sev;
> + struct page *page;
> +
> + if (!psp_master || !psp_master->sev_data)
> + return NULL;
> +
> + page = alloc_pages(gfp_mask, order);
> + if (!page)
> + return NULL;
> +
> + /* If SEV-SNP is initialized then add the page in RMP table. */
> + sev = psp_master->sev_data;
> + if (!sev->snp_inited)
> + return page;
> +
> + paddr = __pa((unsigned long)page_address(page));
> + if (snp_set_rmp_state(paddr, npages, true, locked, false))
> + return NULL;
> +
> + return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> +{
> + struct page *page;
> +
> + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);

Could be just

struct page *page = __snp_alloc_firmware_pages(gfp_mask, 0, false);

> +
> + return page ? page_address(page) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);

Undocumented API

Why don't you just export __snp_alloc_firmware_pages() and declare these
trivial wrappers as "static inline" inside psp-sev.h?
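
A rough user-space sketch of that layout, under simplifying assumptions: one
out-of-line multi-page function (the part that would be exported) and trivial
single-page wrappers declared "static inline" as they might appear in a
header. The names fw_alloc_pages(), fw_free_pages(), fw_alloc_page() and
fw_free_page() are hypothetical, and calloc()/free() stand in for the page
allocator and the RMP state changes done by the real code.

```c
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* The one out-of-line function; in the kernel this would be the export. */
void *fw_alloc_pages(unsigned int order)
{
	return calloc(1, PAGE_SIZE << order);
}

void fw_free_pages(void *addr, unsigned int order)
{
	(void)order;	/* unused in this user-space model */
	free(addr);
}

/* Trivial single-page wrappers, as they might appear in a header. */
static inline void *fw_alloc_page(void)
{
	return fw_alloc_pages(0);
}

static inline void fw_free_page(void *addr)
{
	fw_free_pages(addr, 0);
}
```

With this split only the multi-page functions need an export and a kdoc
comment; the wrappers add no symbols of their own.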

> +
> +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
> +{
> + unsigned long paddr, npages = 1ul << order;
> +
> + if (!page)
> + return;

Silently ignored NULL pointer.

> +
> + paddr = __pa((unsigned long)page_address(page));
> + if (snp_set_rmp_state(paddr, npages, false, locked, true))
> + return;
> +
> + __free_pages(page, order);
> +}
> +
> +void snp_free_firmware_page(void *addr)
> +{
> + if (!addr)
> + return;

Why silently ignore a NULL pointer? At minimum, pr_warn() would be
appropriate.

> +
> + __snp_free_firmware_pages(virt_to_page(addr), 0, false);
> +}
> +EXPORT_SYMBOL(snp_free_firmware_page);

Ditto, same comments as for allocation part.

> +
> static void *sev_fw_alloc(unsigned long len)
> {
> struct page *page;
>
> - page = alloc_pages(GFP_KERNEL, get_order(len));
> + page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
> if (!page)
> return NULL;
>
> @@ -393,7 +551,7 @@ static int __sev_init_locked(int *error)
> data.tmr_address = __pa(sev_es_tmr);
>
> data.flags |= SEV_INIT_FLAGS_SEV_ES;
> - data.tmr_len = SEV_ES_TMR_SIZE;
> + data.tmr_len = sev_es_tmr_size;
> }
>
> return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
> @@ -421,7 +579,7 @@ static int __sev_init_ex_locked(int *error)
> data.tmr_address = __pa(sev_es_tmr);
>
> data.flags |= SEV_INIT_FLAGS_SEV_ES;
> - data.tmr_len = SEV_ES_TMR_SIZE;
> + data.tmr_len = sev_es_tmr_size;
> }
>
> return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
> @@ -818,6 +976,8 @@ static int __sev_snp_init_locked(int *error)
> sev->snp_inited = true;
> dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
>
> + sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
> +
> return rc;
> }
>
> @@ -1341,8 +1501,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
> /* The TMR area was encrypted, flush it from the cache */
> wbinvd_on_all_cpus();
>
> - free_pages((unsigned long)sev_es_tmr,
> - get_order(SEV_ES_TMR_SIZE));
> + __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
> + get_order(sev_es_tmr_size),
> + false);
> sev_es_tmr = NULL;
> }
>
> @@ -1430,7 +1591,7 @@ void sev_pci_init(void)
> }
>
> /* Obtain the TMR memory area for SEV-ES use */
> - sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
> + sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
> if (!sev_es_tmr)
> dev_warn(sev->dev,
> "SEV: TMR allocation failed, SEV-ES support unavailable\n");
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 9f921d221b75..a3bb792bb842 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -12,6 +12,8 @@
> #ifndef __PSP_SEV_H__
> #define __PSP_SEV_H__
>
> +#include <linux/sev.h>
> +
> #include <uapi/linux/psp-sev.h>
>
> #ifdef CONFIG_X86
> @@ -940,6 +942,8 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
> int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
>
> void *psp_copy_user_blob(u64 uaddr, u32 len);
> +void *snp_alloc_firmware_page(gfp_t mask);
> +void snp_free_firmware_page(void *addr);
>
> #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
> @@ -981,6 +985,13 @@ static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *erro
> return -ENODEV;
> }
>
> +static inline void *snp_alloc_firmware_page(gfp_t mask)
> +{
> + return NULL;
> +}
> +
> +static inline void snp_free_firmware_page(void *addr) { }
> +
> #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
> #endif /* __PSP_SEV_H__ */
> --
> 2.25.1
>

BR, Jarkko

2022-08-02 12:19:02

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

On Tue, Jun 21, 2022 at 08:17:15PM +0000, Kalra, Ashish wrote:
> [Public]
>
> Hello Peter,
>
> >> +static int snp_reclaim_pages(unsigned long pfn, unsigned int npages,
> >> +bool locked) {
> >> + struct sev_data_snp_page_reclaim data;
> >> + int ret, err, i, n = 0;
> >> +
> >> + for (i = 0; i < npages; i++) {
>
> >What about setting |n| here too, also the other increments.
>
> >for (i = 0, n = 0; i < npages; i++, n++, pfn++)
>
> Yes that is simpler.
>
> >> + memset(&data, 0, sizeof(data));
> >> + data.paddr = pfn << PAGE_SHIFT;
> >> +
> >> + if (locked)
> >> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> >> + else
> >> + ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM,
> >> + &data, &err);
>
> > Can we change `sev_cmd_mutex` to some sort of nesting lock type? That could clean up this if (locked) code.
>
> > +static inline int rmp_make_firmware(unsigned long pfn, int level) {
> > + return rmp_make_private(pfn, 0, level, 0, true); }
> > +
> > +static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
> > + bool need_reclaim)
>
> >This function can do a lot and when I read the call sites its hard to see what its doing since we have a combination of arguments which tell us what behavior is happening, some of which are not valid (ex: to_fw == true and need_reclaim == true is an >invalid argument combination).
>
> to_fw is used to make a firmware page and need_reclaim is for freeing the firmware page, so they are going to be mutually exclusive.
>
> I actually can connect with it quite logically with the callers :
> snp_alloc_firmware_pages will call with to_fw = true and need_reclaim = false
> and snp_free_firmware_pages will do the opposite, to_fw = false and need_reclaim = true.
>
> That seems straightforward to look at.
>
> >Also this for loop over |npages| is duplicated from snp_reclaim_pages(). One improvement here is that on the current
> >snp_reclaim_pages() if we fail to reclaim a page we assume we cannot reclaim the next pages, this may cause us to snp_leak_pages() more pages than we actually need too.
>
> Yes that is true.
>
> >What about something like this?
>
> >static snp_leak_page(u64 pfn, enum pg_level level) {
> > memory_failure(pfn, 0);
> > dump_rmpentry(pfn);
> >}
>
> >static int snp_reclaim_page(u64 pfn, enum pg_level level) {
> > int ret;
> > struct sev_data_snp_page_reclaim data;
>
> > ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> > if (ret)
> > goto cleanup;
>
> > ret = rmp_make_shared(pfn, level);
> > if (ret)
> > goto cleanup;
>
> > return 0;
>
> >cleanup:
> > snp_leak_page(pfn, level)
> >}
>
> >typedef int (*rmp_state_change_func) (u64 pfn, enum pg_level level);
>
> >static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, rmp_state_change_func state_change, rmp_state_change_func cleanup) {
> > struct sev_data_snp_page_reclaim data;
> > int ret, err, i, n = 0;
>
> > for (i = 0, n = 0; i < npages; i++, n++, pfn++) {
> > ret = state_change(pfn, PG_LEVEL_4K)
> > if (ret)
> > goto cleanup;
> > }
>
> > return 0;
>
> > cleanup:
> > for (; i>= 0; i--, n--, pfn--) {
> > cleanup(pfn, PG_LEVEL_4K);
> > }
>
> > return ret;
> >}
>
> >Then inside of __snp_alloc_firmware_pages():
>
> >snp_set_rmp_state(paddr, npages, rmp_make_firmware, snp_reclaim_page);
>
> >And inside of __snp_free_firmware_pages():
>
> >snp_set_rmp_state(paddr, npages, snp_reclaim_page, snp_leak_page);
>
> >Just a suggestion feel free to ignore. The readability comment could be addressed much less invasively by just making separate functions for each valid combination of arguments here. Like snp_set_rmp_fw_state(), snp_set_rmp_shared_state(),
> >snp_set_rmp_release_state() or something.
>
> >> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int
> >> +order, bool locked) {
> >> + unsigned long npages = 1ul << order, paddr;
> >> + struct sev_device *sev;
> >> + struct page *page;
> >> +
> >> + if (!psp_master || !psp_master->sev_data)
> >> + return NULL;
> >> +
> >> + page = alloc_pages(gfp_mask, order);
> >> + if (!page)
> >> + return NULL;
> >> +
> >> + /* If SEV-SNP is initialized then add the page in RMP table. */
> >> + sev = psp_master->sev_data;
> >> + if (!sev->snp_inited)
> >> + return page;
> >> +
> >> + paddr = __pa((unsigned long)page_address(page));
> >> + if (snp_set_rmp_state(paddr, npages, true, locked, false))
> >> + return NULL;
>
> >So what about the case where snp_set_rmp_state() fails but we were able to reclaim all the pages? Should we be able to signal that to callers so that we could free |page| here? But given this is an error path already maybe we can optimize this in a >follow up series.
>
> Yes, we should actually tie in to snp_reclaim_pages() success or failure here in the case we were able to successfully unroll some or all of the firmware state change.
>
> > +
> > + return page;
> > +}
> > +
> > +void *snp_alloc_firmware_page(gfp_t gfp_mask) {
> > + struct page *page;
> > +
> > + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> > +
> > + return page ? page_address(page) : NULL; }
> > +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> > +
> > +static void __snp_free_firmware_pages(struct page *page, int order,
> > +bool locked) {
> > + unsigned long paddr, npages = 1ul << order;
> > +
> > + if (!page)
> > + return;
> > +
> > + paddr = __pa((unsigned long)page_address(page));
> > + if (snp_set_rmp_state(paddr, npages, false, locked, true))
> > + return;
>
> > Here we may be able to free some of |page| depending how where inside of snp_set_rmp_state() we failed. But again given this is an error path already maybe we can optimize this in a follow up series.
>
> Yes, we probably should be able to free some of the page(s) depending on how many page(s) got reclaimed in snp_set_rmp_state().
> But these reclamation failures may not be very common, so any failure is indicative of a bigger issue, it might be the case when there is a single page reclamation error it might happen with all the subsequent
> pages and so follow a simple recovery procedure, then handling a more complex recovery for a chunk of pages being reclaimed and another chunk not.

Silently ignoring the failure is still a bad idea. At minimum, it would
make sense to print a warning to the kernel log.
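
As a minimal user-space sketch of what that could look like, the helper below
keeps going after a failed reclaim (so that only pages which really failed are
leaked, as Peter suggested in the quoted text) and prints one summary warning
instead of returning silently. It is not the patch's code: reclaim_range(),
reclaim_fn and fail_odd() are hypothetical names, and fprintf() stands in for
a kernel log call.

```c
#include <stdio.h>

/* Stand-in for a per-page reclaim step; returns 0 on success. */
typedef int (*reclaim_fn)(unsigned long pfn);

/*
 * Try to reclaim npages pages starting at pfn. A failure on one page does
 * not stop the loop, so later pages still get a chance to be reclaimed.
 * Returns the number of leaked pages (0 means everything was reclaimed).
 */
static unsigned int reclaim_range(unsigned long pfn, unsigned int npages,
				  reclaim_fn reclaim)
{
	unsigned int i, leaked = 0;

	for (i = 0; i < npages; i++)
		if (reclaim(pfn + i))
			leaked++;

	if (leaked)
		fprintf(stderr, "leaked %u of %u pages starting at pfn 0x%lx\n",
			leaked, npages, pfn);
	return leaked;
}

/* Hypothetical reclaim step for demonstration: odd pfns fail. */
static int fail_odd(unsigned long pfn)
{
	return (pfn & 1) ? -1 : 0;
}
```

The returned count also gives the caller enough information to decide whether
any part of the allocation can still be freed.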

>
> Thanks,
> Ashish

BR, Jarkko

2022-08-02 12:45:31

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 18/49] crypto: ccp: Provide APIs to query extended attestation report

I'd rephrase the subject as "Provide in-kernel API..." (i.e. not uapi).

On Mon, Jun 20, 2022 at 11:06:06PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> Version 2 of the GHCB specification defines VMGEXIT that is used to get
> the extended attestation report. The extended attestation report includes
> the certificate blobs provided through the SNP_SET_EXT_CONFIG.
>
> The snp_guest_ext_guest_request() will be used by the hypervisor to get
> the extended attestation report. See the GHCB specification for more
> details.

What is "the hypervisor"? Could it be replaced with e.g. KVM for
clarity?

>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> drivers/crypto/ccp/sev-dev.c | 43 ++++++++++++++++++++++++++++++++++++
> include/linux/psp-sev.h | 24 ++++++++++++++++++++
> 2 files changed, 67 insertions(+)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 97b479d5aa86..f6306b820b86 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -25,6 +25,7 @@
> #include <linux/fs.h>
>
> #include <asm/smp.h>
> +#include <asm/sev.h>
>
> #include "psp-dev.h"
> #include "sev-dev.h"
> @@ -1857,6 +1858,48 @@ int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
> }
> EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
>
> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> + unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
> +{
> + unsigned long expected_npages;
> + struct sev_device *sev;
> + int rc;
> +
> + if (!psp_master || !psp_master->sev_data)
> + return -ENODEV;
> +
> + sev = psp_master->sev_data;
> +
> + if (!sev->snp_inited)
> + return -EINVAL;
> +
> + /*
> + * Check if there is enough space to copy the certificate chain. Otherwise
> + * return ERROR code defined in the GHCB specification.
> + */
> + expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
> + if (*npages < expected_npages) {
> + *npages = expected_npages;
> + *fw_err = SNP_GUEST_REQ_INVALID_LEN;
> + return -EINVAL;
> + }
> +
> + rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)&fw_err);
> + if (rc)
> + return rc;
> +
> + /* Copy the certificate blob */
> + if (sev->snp_certs_data) {
> + *npages = expected_npages;
> + memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
> + } else {
> + *npages = 0;
> + }
> +
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);

Undocumented export.

> +
> static void sev_exit(struct kref *ref)
> {
> misc_deregister(&misc_dev->misc);
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index a3bb792bb842..cd37ccd1fa1f 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -945,6 +945,23 @@ void *psp_copy_user_blob(u64 uaddr, u32 len);
> void *snp_alloc_firmware_page(gfp_t mask);
> void snp_free_firmware_page(void *addr);
>
> +/**
> + * snp_guest_ext_guest_request - perform the SNP extended guest request command
> + * defined in the GHCB specification.
> + *
> + * @data: the input guest request structure
> + * @vaddr: address where the certificate blob need to be copied.
> + * @npages: number of pages for the certificate blob.
> + * If the specified page count is less than the certificate blob size, then the
> + * required page count is returned with error code defined in the GHCB spec.
> + * If the specified page count is more than the certificate blob size, then
> + * page count is updated to reflect the amount of valid data copied in the
> + * vaddr.
> + */

This kdoc is misplaced: it should be in sev-dev.c, right before the
implementation. Also it does not say anything about return value, and
still the return type is "int".
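
For illustration, a kernel-doc style comment with a "Return:" section, placed
directly above the implementation, could look like the sketch below. This is a
user-space model only: copy_cert_blob() is a hypothetical helper that mirrors
just the page-count handling of the quoted function, and its parameters are
assumptions made for this example.

```c
#include <errno.h>
#include <string.h>

#define PAGE_SHIFT 12

/**
 * copy_cert_blob() - Copy a certificate blob into a caller-supplied buffer.
 * @dst: Destination buffer, at least *@npages pages long.
 * @npages: In: capacity of @dst in pages. Out: pages required (on -EINVAL)
 *          or pages actually copied (on success).
 * @certs: Source blob, or NULL if no certificates are configured.
 * @certs_len: Length of @certs in bytes, assumed to be page aligned.
 *
 * Return: 0 on success, -EINVAL if @dst is too small for the blob.
 */
static int copy_cert_blob(void *dst, unsigned long *npages,
			  const void *certs, unsigned long certs_len)
{
	unsigned long expected = certs_len >> PAGE_SHIFT;

	/* Buffer too small: tell the caller how many pages are needed. */
	if (*npages < expected) {
		*npages = expected;
		return -EINVAL;
	}

	/* No blob configured: nothing is copied. */
	if (!certs) {
		*npages = 0;
		return 0;
	}

	*npages = expected;
	memcpy(dst, certs, expected << PAGE_SHIFT);
	return 0;
}
```

Keeping the comment next to the body makes it harder for the documented
in/out behaviour of @npages and the return values to drift from the code.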

> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> + unsigned long vaddr, unsigned long *npages,
> + unsigned long *error);
> +
> #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
> static inline int
> @@ -992,6 +1009,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
>
> static inline void snp_free_firmware_page(void *addr) { }
>
> +static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> + unsigned long vaddr, unsigned long *n,
> + unsigned long *error)
> +{
> + return -ENODEV;
> +}
> +
> #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
> #endif /* __PSP_SEV_H__ */
> --
> 2.25.1
>

BR, Jarkko

2022-08-02 12:51:20

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 26/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Mon, Jun 20, 2022 at 11:08:05PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
> guest's memory. The data is encrypted with the cryptographic context
> created with the KVM_SEV_SNP_LAUNCH_START.
>
> In addition to the inserting data, it can insert a two special pages
> into the guests memory: the secrets page and the CPUID page.
>
> While terminating the guest, reclaim the guest pages added in the RMP
> table. If the reclaim fails, then the page is no longer safe to be
> released back to the system and leak them.

From this paragraph I get the impression that the reclaimer fails "all the
time", and that this is totally normal and legitimate behaviour. Is this the case?

If failure is mentioned at all, the commit message must also describe the
stimuli or conditions that trigger it.

>
> For more information see the SEV-SNP specification.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> .../virt/kvm/x86/amd-memory-encryption.rst | 29 +++
> arch/x86/kvm/svm/sev.c | 187 ++++++++++++++++++
> include/uapi/linux/kvm.h | 19 ++
> 3 files changed, 235 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 878711f2dca6..62abd5c1f72b 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -486,6 +486,35 @@ Returns: 0 on success, -negative on error
>
> See the SEV-SNP specification for further detail on the launch input.
>
> +20. KVM_SNP_LAUNCH_UPDATE
> +-------------------------
> +
> +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> +calculates a measurement of the memory contents. The measurement is a signature
> +of the memory contents that can be sent to the guest owner as an attestation
> +that the memory was encrypted correctly by the firmware.
> +
> +Parameters (in): struct kvm_snp_launch_update
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_snp_launch_update {
> + __u64 start_gfn; /* Guest page number to start from. */
> + __u64 uaddr; /* userspace address need to be encrypted */
> + __u32 len; /* length of memory region */
> + __u8 imi_page; /* 1 if memory is part of the IMI */
> + __u8 page_type; /* page type */
> + __u8 vmpl3_perms; /* VMPL3 permission mask */
> + __u8 vmpl2_perms; /* VMPL2 permission mask */
> + __u8 vmpl1_perms; /* VMPL1 permission mask */
> + };
> +
> +See the SEV-SNP spec for further details on how to build the VMPL permission
> +mask and page type.
> +
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 41b83aa6b5f4..b5f0707d7ed6 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -18,6 +18,7 @@
> #include <linux/processor.h>
> #include <linux/trace_events.h>
> #include <linux/hugetlb.h>
> +#include <linux/sev.h>
>
> #include <asm/pkru.h>
> #include <asm/trapnr.h>
> @@ -233,6 +234,49 @@ static void sev_decommission(unsigned int handle)
> sev_guest_decommission(&decommission, NULL);
> }
>
> +static inline void snp_leak_pages(u64 pfn, enum pg_level level)
> +{
> + unsigned int npages = page_level_size(level) >> PAGE_SHIFT;
> +
> + WARN(1, "psc failed pfn 0x%llx pages %d (leaking)\n", pfn, npages);
> +
> + while (npages) {
> + memory_failure(pfn, 0);
> + dump_rmpentry(pfn);
> + npages--;
> + pfn++;
> + }
> +}
> +
> +static int snp_page_reclaim(u64 pfn)
> +{
> + struct sev_data_snp_page_reclaim data = {0};
> + int err, rc;
> +
> + data.paddr = __sme_set(pfn << PAGE_SHIFT);
> + rc = snp_guest_page_reclaim(&data, &err);
> + if (rc) {
> + /*
> + * If the reclaim failed, then page is no longer safe
> + * to use.
> + */
> + snp_leak_pages(pfn, PG_LEVEL_4K);
> + }
> +
> + return rc;
> +}
> +
> +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> +{
> + int rc;
> +
> + rc = rmp_make_shared(pfn, level);
> + if (rc && leak)
> + snp_leak_pages(pfn, level);
> +
> + return rc;
> +}
> +
> static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
> {
> struct sev_data_deactivate deactivate;
> @@ -1902,6 +1946,123 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return rc;
> }
>
> +static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct list_head *head = &sev->regions_list;
> + struct enc_region *i;
> +
> + lockdep_assert_held(&kvm->lock);
> +
> + list_for_each_entry(i, head, list) {
> + u64 start = i->uaddr;
> + u64 end = start + i->size;
> +
> + if (start <= hva && end >= (hva + len))
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_update data = {0};
> + struct kvm_sev_snp_launch_update params;
> + unsigned long npages, pfn, n = 0;
> + int *error = &argp->error;
> + struct page **inpages;
> + int ret, i, level;
> + u64 gfn;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (!sev->snp_context)
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + /* Verify that the specified address range is registered. */
> + if (!is_hva_registered(kvm, params.uaddr, params.len))
> + return -EINVAL;
> +
> + /*
> + * The userspace memory is already locked so technically we don't
> + * need to lock it again. Later part of the function needs to know
> + * pfn so call the sev_pin_memory() so that we can get the list of
> + * pages to iterate through.
> + */
> + inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
> + if (!inpages)
> + return -ENOMEM;
> +
> + /*
> + * Verify that all the pages are marked shared in the RMP table before
> + * going further. This is avoid the cases where the userspace may try
> + * updating the same page twice.
> + */
> + for (i = 0; i < npages; i++) {
> + if (snp_lookup_rmpentry(page_to_pfn(inpages[i]), &level) != 0) {
> + sev_unpin_memory(kvm, inpages, npages);
> + return -EFAULT;
> + }
> + }
> +
> + gfn = params.start_gfn;
> + level = PG_LEVEL_4K;
> + data.gctx_paddr = __psp_pa(sev->snp_context);
> +
> + for (i = 0; i < npages; i++) {
> + pfn = page_to_pfn(inpages[i]);
> +
> + ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, level, sev_get_asid(kvm), true);
> + if (ret) {
> + ret = -EFAULT;
> + goto e_unpin;
> + }
> +
> + n++;
> + data.address = __sme_page_pa(inpages[i]);
> + data.page_size = X86_TO_RMP_PG_LEVEL(level);
> + data.page_type = params.page_type;
> + data.vmpl3_perms = params.vmpl3_perms;
> + data.vmpl2_perms = params.vmpl2_perms;
> + data.vmpl1_perms = params.vmpl1_perms;
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
> + if (ret) {
> + /*
> + * If the command failed then need to reclaim the page.
> + */
> + snp_page_reclaim(pfn);
> + goto e_unpin;
> + }
> +
> + gfn++;
> + }
> +
> +e_unpin:
> + /* Content of memory is updated, mark pages dirty */
> + for (i = 0; i < n; i++) {
> + set_page_dirty_lock(inpages[i]);
> + mark_page_accessed(inpages[i]);
> +
> + /*
> + * If its an error, then update RMP entry to change page ownership
> + * to the hypervisor.
> + */
> + if (ret)
> + host_rmp_make_shared(pfn, level, true);
> + }
> +
> + /* Unlock the user pages */
> + sev_unpin_memory(kvm, inpages, npages);
> +
> + return ret;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -1995,6 +2156,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SNP_LAUNCH_START:
> r = snp_launch_start(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SNP_LAUNCH_UPDATE:
> + r = snp_launch_update(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> @@ -2113,6 +2277,29 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
> static void __unregister_enc_region_locked(struct kvm *kvm,
> struct enc_region *region)
> {
> + unsigned long i, pfn;
> + int level;
> +
> + /*
> + * The guest memory pages are assigned in the RMP table. Unassign it
> + * before releasing the memory.
> + */
> + if (sev_snp_guest(kvm)) {
> + for (i = 0; i < region->npages; i++) {
> + pfn = page_to_pfn(region->pages[i]);
> +
> + if (!snp_lookup_rmpentry(pfn, &level))
> + continue;
> +
> + cond_resched();
> +
> + if (level > PG_LEVEL_4K)
> + pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +
> + host_rmp_make_shared(pfn, level, true);
> + }
> + }
> +
> sev_unpin_memory(kvm, region->pages, region->npages);
> list_del(&region->list);
> kfree(region);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0cb119d66ae5..9b36b07414ea 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1813,6 +1813,7 @@ enum sev_cmd_id {
> /* SNP specific commands */
> KVM_SEV_SNP_INIT,
> KVM_SEV_SNP_LAUNCH_START,
> + KVM_SEV_SNP_LAUNCH_UPDATE,
>
> KVM_SEV_NR_MAX,
> };
> @@ -1929,6 +1930,24 @@ struct kvm_sev_snp_launch_start {
> __u8 pad[6];
> };
>
> +#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
> +#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
> +#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
> +#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
> +#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
> +
> +struct kvm_sev_snp_launch_update {
> + __u64 start_gfn;
> + __u64 uaddr;
> + __u32 len;
> + __u8 imi_page;
> + __u8 page_type;
> + __u8 vmpl3_perms;
> + __u8 vmpl2_perms;
> + __u8 vmpl1_perms;
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.25.1
>

BR, Jarkko

2022-08-02 13:23:13

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 24/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command

On Mon, Jun 20, 2022 at 11:07:35PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
> The command initializes a cryptographic digest context used to construct
> the measurement of the guest. If the guest is expected to be migrated,
> the command also binds a migration agent (MA) to the guest.
>
> For more information see the SEV-SNP specification.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> .../virt/kvm/x86/amd-memory-encryption.rst | 24 ++++
> arch/x86/kvm/svm/sev.c | 115 +++++++++++++++++-
> arch/x86/kvm/svm/svm.h | 1 +
> include/uapi/linux/kvm.h | 10 ++
> 4 files changed, 147 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 903023f524af..878711f2dca6 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -462,6 +462,30 @@ The flags bitmap is defined as::
> If the specified flags is not supported then return -EOPNOTSUPP, and the supported
> flags are returned.
>
> +19. KVM_SNP_LAUNCH_START
> +------------------------
> +
> +The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
> +context for the SEV-SNP guest. To create the encryption context, user must
> +provide a guest policy, migration agent (if any) and guest OS visible
> +workarounds value as defined SEV-SNP specification.
> +
> +Parameters (in): struct kvm_snp_launch_start
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_snp_launch_start {
> + __u64 policy; /* Guest policy to use. */
> + __u64 ma_uaddr; /* userspace address of migration agent */
> + __u8 ma_en; /* 1 if the migtation agent is enabled */
> + __u8 imi_en; /* set IMI to 1. */
> + __u8 gosvw[16]; /* guest OS visible workarounds */
> + };
> +
> +See the SEV-SNP specification for further detail on the launch input.
> +
> References
> ==========
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 813bda7f7b55..9e6fc7a94ed7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -21,6 +21,7 @@
> #include <asm/pkru.h>
> #include <asm/trapnr.h>
> #include <asm/fpu/xcr.h>
> +#include <asm/sev.h>
>
> #include "x86.h"
> #include "svm.h"
> @@ -73,6 +74,8 @@ static unsigned int nr_asids;
> static unsigned long *sev_asid_bitmap;
> static unsigned long *sev_reclaim_asid_bitmap;
>
> +static int snp_decommission_context(struct kvm *kvm);
> +
> struct enc_region {
> struct list_head list;
> unsigned long npages;
> @@ -98,12 +101,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
> down_write(&sev_deactivate_lock);
>
> wbinvd_on_all_cpus();
> - ret = sev_guest_df_flush(&error);
> +
> + if (sev_snp_enabled)
> + ret = snp_guest_df_flush(&error);
> + else
> + ret = sev_guest_df_flush(&error);
>
> up_write(&sev_deactivate_lock);
>
> if (ret)
> - pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
> + pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
> + sev_snp_enabled ? "-SNP" : "", ret, error);
>
> return ret;
> }
> @@ -1825,6 +1833,74 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
> return ret;
> }
>
> +static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct sev_data_snp_gctx_create data = {};
> + void *context;
> + int rc;
> +
> + /* Allocate memory for context page */

Nit: this comment has very little value, if any. It's just stating
the obvious.

Instead, I'd add a description for the function:

/*
* Allocate and initialize a digest for the guest measurement.
*/
static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)

This would be much more helpful to get a grasp on "what I'm looking at".

> + context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> + if (!context)
> + return NULL;
> +
> + data.gctx_paddr = __psp_pa(context);
> + rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> + if (rc) {
> + snp_free_firmware_page(context);
> + return NULL;
> + }
> +
> + return context;
> +}
> +
> +static int snp_bind_asid(struct kvm *kvm, int *error)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_activate data = {0};
> +
> + data.gctx_paddr = __psp_pa(sev->snp_context);
> + data.asid = sev_get_asid(kvm);
> + return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
> +}
> +
> +static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_start start = {0};
> + struct kvm_sev_snp_launch_start params;
> + int rc;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + sev->snp_context = snp_context_create(kvm, argp);
> + if (!sev->snp_context)
> + return -ENOTTY;
> +
> + start.gctx_paddr = __psp_pa(sev->snp_context);
> + start.policy = params.policy;
> + memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
> + rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
> + if (rc)
> + goto e_free_context;
> +
> + sev->fd = argp->sev_fd;
> + rc = snp_bind_asid(kvm, &argp->error);
> + if (rc)
> + goto e_free_context;
> +
> + return 0;
> +
> +e_free_context:
> + snp_decommission_context(kvm);
> +
> + return rc;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -1915,6 +1991,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_SEV_RECEIVE_FINISH:
> r = sev_receive_finish(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SNP_LAUNCH_START:
> + r = snp_launch_start(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> @@ -2106,6 +2185,28 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
> return ret;
> }
>
> +static int snp_decommission_context(struct kvm *kvm)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_decommission data = {};
> + int ret;
> +
> + /* If context is not created then do nothing */
> + if (!sev->snp_context)
> + return 0;
> +
> + data.gctx_paddr = __sme_pa(sev->snp_context);
> + ret = snp_guest_decommission(&data, NULL);
> + if (WARN_ONCE(ret, "failed to release guest context"))
> + return ret;
> +
> + /* free the context page now */
> + snp_free_firmware_page(sev->snp_context);
> + sev->snp_context = NULL;
> +
> + return 0;
> +}
> +
> void sev_vm_destroy(struct kvm *kvm)
> {
> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -2147,7 +2248,15 @@ void sev_vm_destroy(struct kvm *kvm)
> }
> }
>
> - sev_unbind_asid(kvm, sev->handle);
> + if (sev_snp_guest(kvm)) {
> + if (snp_decommission_context(kvm)) {
> + WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
> + return;
> + }
> + } else {
> + sev_unbind_asid(kvm, sev->handle);
> + }
> +
> sev_asid_free(sev);
> }
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 2f45589ee596..71c011af098e 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -91,6 +91,7 @@ struct kvm_sev_info {
> struct misc_cg *misc_cg; /* For misc cgroup accounting */
> atomic_t migration_in_progress;
> u64 snp_init_flags;
> + void *snp_context; /* SNP guest context page */
> };
>
> struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0f912cefc544..0cb119d66ae5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1812,6 +1812,7 @@ enum sev_cmd_id {
>
> /* SNP specific commands */
> KVM_SEV_SNP_INIT,
> + KVM_SEV_SNP_LAUNCH_START,
>
> KVM_SEV_NR_MAX,
> };
> @@ -1919,6 +1920,15 @@ struct kvm_snp_init {
> __u64 flags;
> };
>
> +struct kvm_sev_snp_launch_start {
> + __u64 policy;
> + __u64 ma_uaddr;
> + __u8 ma_en;
> + __u8 imi_en;
> + __u8 gosvw[16];
> + __u8 pad[6];
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.25.1
>

BR, Jarkko

2022-08-02 13:31:42

by Jarkko Sakkinen

Subject: Re: [PATCH Part2 v6 28/49] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command

On Mon, Jun 20, 2022 at 11:08:38PM +0000, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> The KVM_SEV_SNP_LAUNCH_FINISH finalize the cryptographic digest and stores
> it as the measurement of the guest at launch.
>
> While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
> to encrypt the VMSA pages.

Nit: for completeness' sake it would be nice to state explicitly in this
paragraph whether LAUNCH_UPDATE is usable after LAUNCH_FINISH.

>
> If its an SNP guest, then VMSA was added in the RMP entry as
> a guest owned page and also removed from the kernel direct map
> so flush it later after it is transitioned back to hypervisor
> state and restored in the direct map.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> .../virt/kvm/x86/amd-memory-encryption.rst | 22 ++++
> arch/x86/kvm/svm/sev.c | 119 ++++++++++++++++++
> include/uapi/linux/kvm.h | 14 +++
> 3 files changed, 155 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 62abd5c1f72b..750162cff87b 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -514,6 +514,28 @@ Returns: 0 on success, -negative on error
> See the SEV-SNP spec for further details on how to build the VMPL permission
> mask and page type.
>
> +21. KVM_SNP_LAUNCH_FINISH
> +-------------------------
> +
> +After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
> +issued to make the guest ready for the execution.

Some remark about LAUNCH_UPDATE post-LAUNCH_FINISH would be nice.

> +
> +Parameters (in): struct kvm_sev_snp_launch_finish
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> + struct kvm_sev_snp_launch_finish {
> + __u64 id_block_uaddr;
> + __u64 id_auth_uaddr;
> + __u8 id_block_en;
> + __u8 auth_key_en;
> + __u8 host_data[32];
> + };
> +
> +
> +See SEV-SNP specification for further details on launch finish input parameters.
>
> References
> ==========
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a9461d352eda..a5b90469683f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2095,6 +2095,106 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_update data = {};
> + int i, ret;
> +
> + data.gctx_paddr = __psp_pa(sev->snp_context);
> + data.page_type = SNP_PAGE_TYPE_VMSA;
> +
> + for (i = 0; i < kvm->created_vcpus; i++) {
> + struct vcpu_svm *svm = to_svm(xa_load(&kvm->vcpu_array, i));
> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
> +
> + /* Perform some pre-encryption checks against the VMSA */
> + ret = sev_es_sync_vmsa(svm);
> + if (ret)
> + return ret;
> +
> + /* Transition the VMSA page to a firmware state. */
> + ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
> + if (ret)
> + return ret;
> +
> + /* Issue the SNP command to encrypt the VMSA */
> + data.address = __sme_pa(svm->sev_es.vmsa);
> + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> + &data, &argp->error);
> + if (ret) {
> + snp_page_reclaim(pfn);
> + return ret;
> + }
> +
> + svm->vcpu.arch.guest_state_protected = true;
> + }
> +
> + return 0;
> +}
> +
> +static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_snp_launch_finish *data;
> + void *id_block = NULL, *id_auth = NULL;
> + struct kvm_sev_snp_launch_finish params;

Nit: "params" should be the 2nd declaration (reverse
christmas tree order).

> + int ret;
> +
> + if (!sev_snp_guest(kvm))
> + return -ENOTTY;
> +
> + if (!sev->snp_context)
> + return -EINVAL;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> + return -EFAULT;
> +
> + /* Measure all vCPUs using LAUNCH_UPDATE before we finalize the launch flow. */
> + ret = snp_launch_update_vmsa(kvm, argp);
> + if (ret)
> + return ret;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> + if (!data)
> + return -ENOMEM;
> +
> + if (params.id_block_en) {
> + id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
> + if (IS_ERR(id_block)) {
> + ret = PTR_ERR(id_block);
> + goto e_free;
> + }
> +
> + data->id_block_en = 1;
> + data->id_block_paddr = __sme_pa(id_block);
> + }
> +
> + if (params.auth_key_en) {
> + id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
> + if (IS_ERR(id_auth)) {
> + ret = PTR_ERR(id_auth);
> + goto e_free_id_block;
> + }
> +
> + data->auth_key_en = 1;
> + data->id_auth_paddr = __sme_pa(id_auth);
> + }
> +
> + data->gctx_paddr = __psp_pa(sev->snp_context);
> + ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
> +
> + kfree(id_auth);
> +
> +e_free_id_block:
> + kfree(id_block);
> +
> +e_free:
> + kfree(data);
> +
> + return ret;
> +}
> +
> int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_sev_cmd sev_cmd;
> @@ -2191,6 +2291,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SNP_LAUNCH_UPDATE:
> r = snp_launch_update(kvm, &sev_cmd);
> break;
> + case KVM_SEV_SNP_LAUNCH_FINISH:
> + r = snp_launch_finish(kvm, &sev_cmd);
> + break;
> default:
> r = -EINVAL;
> goto out;
> @@ -2696,11 +2799,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
>
> svm = to_svm(vcpu);
>
> + /*
> + * If its an SNP guest, then VMSA was added in the RMP entry as
> + * a guest owned page. Transition the page to hypervisor state
> + * before releasing it back to the system.
> + * Also the page is removed from the kernel direct map, so flush it
> + * later after it is transitioned back to hypervisor state and
> + * restored in the direct map.
> + */
> + if (sev_snp_guest(vcpu->kvm)) {
> + u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
> +
> + if (host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
> + goto skip_vmsa_free;
> + }
> +
> if (vcpu->arch.guest_state_protected)
> sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
>
> __free_page(virt_to_page(svm->sev_es.vmsa));
>
> +skip_vmsa_free:
> if (svm->sev_es.ghcb_sa_free)
> kvfree(svm->sev_es.ghcb_sa);
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9b36b07414ea..5a4662716b6a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1814,6 +1814,7 @@ enum sev_cmd_id {
> KVM_SEV_SNP_INIT,
> KVM_SEV_SNP_LAUNCH_START,
> KVM_SEV_SNP_LAUNCH_UPDATE,
> + KVM_SEV_SNP_LAUNCH_FINISH,
>
> KVM_SEV_NR_MAX,
> };
> @@ -1948,6 +1949,19 @@ struct kvm_sev_snp_launch_update {
> __u8 vmpl1_perms;
> };
>
> +#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
> +#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
> +#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
> +
> +struct kvm_sev_snp_launch_finish {
> + __u64 id_block_uaddr;
> + __u64 id_auth_uaddr;
> + __u8 id_block_en;
> + __u8 auth_key_en;
> + __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
> + __u8 pad[6];
> +};
> +
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> --
> 2.25.1
>

BR, Jarkko

2022-08-02 14:20:54

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On Mon, Aug 01, 2022 at 11:32:21PM +0000, Kalra, Ashish wrote:
> But we can't use this struct on a core/platform which has a different
> layout, so aren't the model checks required ?

That would be a problem only if the already specified fields move or get
resized.

If their offset and size don't change, you're good.
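
The condition above can be checked at build time. The following is a minimal sketch, assuming a hypothetical `struct rmpentry_example` whose field names and widths are made up for illustration; it is not the real RMP entry layout, which is model specific. The assertions pin down only the offsets and sizes that existing code would rely on, so a later model may start using the reserved bytes without breaking them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only. The real RMP entry
 * format is model specific and is not reproduced here.
 */
struct rmpentry_example {
	uint64_t gpa;		/* assumed: guest physical address */
	uint32_t asid;		/* assumed: owning ASID */
	uint8_t  assigned;	/* assumed: page is assigned to a guest */
	uint8_t  pagesize;	/* assumed: 0 = 4K, 1 = 2M */
	uint8_t  rsvd[2];	/* room a later model could start using */
};

/* Fail the build if an already specified field moves... */
static_assert(offsetof(struct rmpentry_example, gpa) == 0, "gpa moved");
static_assert(offsetof(struct rmpentry_example, asid) == 8, "asid moved");
static_assert(offsetof(struct rmpentry_example, assigned) == 12, "assigned moved");
static_assert(offsetof(struct rmpentry_example, pagesize) == 13, "pagesize moved");

/* ...or gets resized. */
static_assert(sizeof(((struct rmpentry_example *)0)->asid) == 4, "asid resized");
static_assert(sizeof(struct rmpentry_example) == 16, "entry resized");
```

With checks like these in place, a model check is only needed for fields that actually differ between models.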

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-08-03 20:28:39

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 06/49] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

On Mon, Aug 01, 2022 at 10:31:26PM +0000, Kalra, Ashish wrote:
> The struct rmpentry is the raw layout of the RMP table entry
> while struct rmpupdate is the structure expected by the rmpupdate
> instruction for programming the RMP table entries.
>
> Arguably, we can program a struct rmpupdate internally from a struct
> rmpentry.
>
> But we will still need struct rmpupdate for issuing the rmpupdate
> instruction, so it is probably cleaner to keep it this way, as it only
> has two main callers - rmp_make_private() and rmp_make_shared().

Ok, but then call it struct rmp_state. The APM says in the RMPUPDATE
blurb:

"The RCX register provides the effective address of a 16-byte data
structure which contains the new RMP state."

so the function signature should be:

static int rmpupdate(u64 pfn, struct rmp_state *new)

and this is basically the description of that. It can't get any more
user-friendly than this.
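
As a rough sketch of what the renamed structure could look like: the field names and their order below are assumptions made for illustration, and only the 16-byte total comes from the APM text quoted above. `rmp_state_private_4k()` is a hypothetical helper, not something from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the 16-byte operand that RMPUPDATE reads through RCX.
 * Field names and order are assumptions; the APM is authoritative.
 */
struct rmp_state {
	uint64_t gpa;
	uint8_t  assigned;
	uint8_t  pagesize;
	uint8_t  immutable;
	uint8_t  rsvd;
	uint32_t asid;
} __attribute__((packed));

static_assert(sizeof(struct rmp_state) == 16,
	      "RMPUPDATE expects a 16-byte structure");

/* Hypothetical helper: describe a guest-owned (private) 4K page. */
static struct rmp_state rmp_state_private_4k(uint64_t gpa, uint32_t asid)
{
	struct rmp_state s = { 0 };

	s.gpa = gpa;
	s.assigned = 1;		/* page belongs to a guest */
	s.pagesize = 0;		/* assumed encoding: 0 means 4K */
	s.asid = asid;

	return s;
}
```

A caller such as rmp_make_private() would then fill in a struct rmp_state and hand it to rmpupdate(pfn, &state).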

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-08-04 10:59:50

by Vlastimil Babka

Subject: Re: [PATCH Part2 v6 27/49] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests

On 6/21/22 01:08, Ashish Kalra wrote:
> From: Brijesh Singh <[email protected]>
>
> When SEV-SNP is enabled, the guest private pages are added in the RMP
> table; while adding the pages, the rmp_make_private() unmaps the pages
> from the direct map. If KSM attempts to access those unmapped pages then
> it will trigger #PF (page-not-present).
>
> Encrypted guest pages cannot be shared between the process, so an
> userspace should not mark the region mergeable but to be safe, mark the
> process vma unmerable before adding the pages in the RMP table.
>
> Signed-off-by: Brijesh Singh <[email protected]>

Note this doesn't really mark the vma unmergeable, rather it unmarks it as
mergeable, and unmerges any already merged pages.
Which seems like a good idea. Is snp_launch_update() the only place that
needs it or can private pages be added elsewhere too?

However, AFAICS nothing stops userspace from doing another
madvise(MADV_MERGEABLE) afterwards, so we should somehow make sure that ksm
is still prevented, as we should protect the kernel even from a buggy
userspace. So we either stop it with a flag at the vma level (see ksm_madvise()
for which flags currently stop it), or at the page level - currently only
PageAnon() pages are handled. The vma level is probably easier/cheaper.
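
To make the vma-level option concrete, here is a toy model of how an MADV_MERGEABLE request is ignored when a blocking flag is set. All `EX_` names and values are invented for this sketch and do not match the kernel's VM_* constants; `EX_VM_NO_KSM` in particular is a hypothetical flag that does not exist today.

```c
#include <assert.h>

/* Toy flag values; the kernel's VM_* constants differ. */
#define EX_VM_MERGEABLE	(1u << 0)
#define EX_VM_SHARED	(1u << 1)
#define EX_VM_IO	(1u << 2)
#define EX_VM_NO_KSM	(1u << 3)	/* hypothetical: set for SNP private memory */

/* Any of these makes the merge advice a no-op. */
#define EX_KSM_BLOCKERS	(EX_VM_MERGEABLE | EX_VM_SHARED | \
			 EX_VM_IO | EX_VM_NO_KSM)

/* Model of an MADV_MERGEABLE request; returns the resulting vma flags. */
static unsigned int ex_madvise_mergeable(unsigned int vm_flags)
{
	if (vm_flags & EX_KSM_BLOCKERS)
		return vm_flags;	/* just ignore the advice */

	return vm_flags | EX_VM_MERGEABLE;
}
```

In this model, a later madvise from userspace cannot re-enable merging once the blocking flag is set, which is the property asked for above.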

It's also possible that this will solve itself with the switch to UPM as
those vma's or pages might be incompatible with ksm naturally (didn't check
closely), and then this patch can be just dropped. But we should double-check.


2022-08-04 12:13:49

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 07/49] x86/sev: Invalid pages from direct map when adding it to RMP table

On Mon, Aug 01, 2022 at 11:57:09PM +0000, Kalra, Ashish wrote:
> You mean set_memory_present() ?

Right, that.

We have set_memory_np() but set_memory_present(). Talk about
consistency... ;-\

> But again, calling set_direct_map_invalid_noflush() is easier to
> understand from the calling function's point of view as it correlates
> to the functionality of invalidating the page from kernel direct map ?

You mean, we prefer easy to understand to performance?

set_direct_map_invalid_noflush() means crap to me. I have to go look it
up - set memory P or NP is much clearer.

The patch which added those things you consider easier to understand is:

commit d253ca0c3865a8d9a8c01143cf20425e0be4d0ce
Author: Rick Edgecombe <[email protected]>
Date: Thu Apr 25 17:11:34 2019 -0700

x86/mm/cpa: Add set_direct_map_*() functions

Add two new functions set_direct_map_default_noflush() and
set_direct_map_invalid_noflush() for setting the direct map alias for the
page to its default valid permissions and to an invalid state that cannot
be cached in a TLB, respectively. These functions do not flush the TLB.

I don't see how this fits with your use case...

Also, your helpers are called restore_direct_map and
invalidate_direct_map. That's already explaining what this is supposed
to do.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-08-08 13:20:41

by Borislav Petkov

Subject: Re: [PATCH Part2 v6 08/49] x86/traps: Define RMP violation #PF error code

On Mon, Jun 20, 2022 at 11:03:27PM +0000, Ashish Kalra wrote:
> @@ -12,15 +14,17 @@
> * bit 4 == 1: fault was an instruction fetch
> * bit 5 == 1: protection keys block access
> * bit 15 == 1: SGX MMU page-fault
> + * bit 31 == 1: fault was due to RMP violation
> */
> enum x86_pf_error_code {
> - X86_PF_PROT = 1 << 0,
> - X86_PF_WRITE = 1 << 1,
> - X86_PF_USER = 1 << 2,
> - X86_PF_RSVD = 1 << 3,
> - X86_PF_INSTR = 1 << 4,
> - X86_PF_PK = 1 << 5,
> - X86_PF_SGX = 1 << 15,
> + X86_PF_PROT = BIT_ULL(0),
> + X86_PF_WRITE = BIT_ULL(1),
> + X86_PF_USER = BIT_ULL(2),
> + X86_PF_RSVD = BIT_ULL(3),
> + X86_PF_INSTR = BIT_ULL(4),
> + X86_PF_PK = BIT_ULL(5),
> + X86_PF_SGX = BIT_ULL(15),
> + X86_PF_RMP = BIT_ULL(31),

Yeah, I remember dhansen asked for those to use the BIT() macro but the
_ULL is overkill. Those PF flags are 32-bit and they fit in an unsigned
int.

But we don't have BIT_UI() so I guess the next best thing - BIT() -
which uses UL internally, should be good enough.

So pls use BIT() here - not BIT_ULL().
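
For reference, a small standalone sketch of the difference. The macros below are local stand-ins that mirror the kernel's BIT() and BIT_ULL() helpers, and the `_EX` names are illustrative, not the kernel's. It also shows why plain `1 << 31` is best avoided: in ISO C, shifting a 1 into the sign bit of a 32-bit int is undefined.

```c
#include <assert.h>
#include <limits.h>

/* Local stand-ins that mirror the kernel's BIT() and BIT_ULL() helpers. */
#define BIT(nr)		(1UL << (nr))
#define BIT_ULL(nr)	(1ULL << (nr))

/* Illustrative names; the _EX suffix marks them as not the kernel's. */
#define X86_PF_PROT_EX	BIT(0)
#define X86_PF_SGX_EX	BIT(15)
#define X86_PF_RMP_EX	BIT(31)

/* Bit 31 is the highest flag, so the whole set fits in an unsigned int. */
static_assert(X86_PF_RMP_EX <= UINT_MAX, "flag does not fit in 32 bits");

/* BIT() and BIT_ULL() give the same value here; only the type differs. */
static_assert(BIT(31) == BIT_ULL(31), "values differ");

/* Returns 1 if the #PF error code has the RMP violation bit set. */
static int pf_is_rmp_violation(unsigned int error_code)
{
	return (error_code & X86_PF_RMP_EX) != 0;
}
```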

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-08-08 19:39:25

by Dionna Amalie Glaze

Subject: Re: [PATCH Part2 v6 17/49] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

To preface, I don't want to delay this patch set, only have the
conversation at the most appropriate place.

>
> > The SEV-SNP firmware provides the SNP_CONFIG command used to set the
> > system-wide configuration value for SNP guests. The information includes
> > the TCB version string to be reported in guest attestation reports.
>

The system-wide aspect of this makes me wonder if we can also have a
VM instance-specific extension. This is important for the use case
that we may see secure boot variables included in the launch
measurement, making offline signing of the UEFI image impossible. We
can't sign the cross-product of all UEFI builds and every user's EFI
variables. We'd like to include an instance-specific certificate that
specifies the platform-endorsed golden measurement of the UEFI.

An alternative that doesn't require a change to the kernel is to just
make this certificate fetchable from a FAMILY_ID-keyed, predetermined
URL prefix + IMAGE_ID + '.crt', but this requires a download (and
continuous hosting) to do something as routine as collecting an
attestation report. It's up to the upstream community to determine if
that is an acceptable cost to keep the complexity of a certificate
table merge operation out of the kernel.

The SNP API specification gives an interpretation to the data blob
here as a table of GUID/offset pairs followed by data blobs that
presumably are at the appropriate offsets into the data pages. The
spec allows for the host to add any number of GUID/offset pairs it
wants, with 3 specific GUIDs recommended for the AMD PSP certificate
chain.

The snp_guest_ext_guest_request function in ccp is what passes back
the certificate data that was previously stored, so I'm wondering if
it can take an extra (pointer,len) pair of VM instance certificate
data to merge with the host certificate data before returning to the
guest. The new required length is the sum total of both the header
certs and instance certs. The operation to copy the data is no longer
a memcpy but a header merge that tracks the offset shifts caused by a
larger header and other certificates in the remaining data pages.
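
The header merge described above could look roughly like the sketch below. It rests on several assumptions: that each table entry is a 16-byte GUID followed by a 32-bit offset and a 32-bit length, that offsets count from the start of the blob, and that the table ends with an all-zero entry. `cert_get()` and `cert_merge()` are hypothetical helpers, both inputs are treated as well formed, and duplicate GUIDs are not handled; real kernel code would have to validate every offset and length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed entry layout: GUID, then offset and length of the certificate. */
struct cert_entry {
	uint8_t  guid[16];
	uint32_t offset;	/* from the start of the whole blob */
	uint32_t length;
};

/* Copy out entry i of a table; returns 0 at the all-zero terminator. */
static int cert_get(const uint8_t *blob, size_t i, struct cert_entry *e)
{
	static const struct cert_entry zero;

	memcpy(e, blob + i * sizeof(*e), sizeof(*e));
	return memcmp(e, &zero, sizeof(*e)) != 0;
}

/*
 * Hypothetical helper: merge two well-formed tables into out.
 * Returns the number of bytes written, or 0 if out_len is too small.
 */
static size_t cert_merge(const uint8_t *host, const uint8_t *inst,
			 uint8_t *out, size_t out_len)
{
	const uint8_t *src[2] = { host, inst };
	struct cert_entry e;
	size_t total = 0, bytes = 0, slot = 0, data, i;
	int t;

	assert(host && inst && out);

	/* Pass 1: count entries and payload bytes to size the new header. */
	for (t = 0; t < 2; t++)
		for (i = 0; cert_get(src[t], i, &e); i++, total++)
			bytes += e.length;

	data = (total + 1) * sizeof(e);	/* header plus terminator */
	if (data + bytes > out_len)
		return 0;

	/* Pass 2: copy each certificate and rewrite its shifted offset. */
	for (t = 0; t < 2; t++) {
		for (i = 0; cert_get(src[t], i, &e); i++) {
			memcpy(out + data, src[t] + e.offset, e.length);
			e.offset = (uint32_t)data;
			memcpy(out + slot++ * sizeof(e), &e, sizeof(e));
			data += e.length;
		}
	}
	memset(out + slot * sizeof(e), 0, sizeof(e));	/* terminator */

	return data;
}
```

The two passes are the cost of the larger header: the final data offsets are only known once the total number of entries is.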

I can propose my own patch on top of this v6 patch set that adds a KVM
ioctl like KVM_{GET,SET}_INSTANCE_SNP_EXT_CONFIG and then pass along
the stored certificate blob in the request call. I'd prefer to have
the design agreed upon upfront though.

--
-Dionna Glaze, PhD (she/her)

2022-08-08 21:34:39

by Tom Lendacky

Subject: Re: [PATCH Part2 v6 17/49] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

On 8/8/22 14:27, Dionna Amalie Glaze wrote:
> To preface, I don't want to delay this patch set, only have the
> conversation at the most appropriate place.
>
>>
>>> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
>>> system-wide configuration value for SNP guests. The information includes
>>> the TCB version string to be reported in guest attestation reports.
>>
>
> The system-wide aspect of this makes me wonder if we can also have a
> VM instance-specific extension. This is important for the use case
> that we may see secure boot variables included in the launch
> measurement, making offline signing of the UEFI image impossible. We
> can't sign the cross-product of all UEFI builds and every user's EFI
> variables. We'd like to include an instance-specific certificate that
> specifies the platform-endorsed golden measurement of the UEFI.
>
> An alternative that doesn't require a change to the kernel is to just
> make this certificate fetchable from a FAMILY_ID-keyed, predetermined
> URL prefix + IMAGE_ID + '.crt', but this requires a download (and
> continuous hosting) to do something as routine as collecting an
> attestation report. It's up to the upstream community to determine if
> that is an acceptable cost to keep the complexity of a certificate
> table merge operation out of the kernel.
>
> The SNP API specification gives an interpretation to the data blob

That's the GHCB specification, not the SNP API.

> here as a table of GUID/offset pairs followed by data blobs that
> presumably are at the appropriate offsets into the data pages. The
> spec allows for the host to add any number of GUID/offset pairs it
> wants, with 3 specific GUIDs recommended for the AMD PSP certificate
> chain.
>
> The snp_guest_ext_guest_request function in ccp is what passes back
> the certificate data that was previously stored, so I'm wondering if
> it can take an extra (pointer,len) pair of VM instance certificate
> data to merge with the host certificate data before returning to the
> guest. The new required length is the sum total of both the header
> certs and instance certs. The operation to copy the data is no longer
> a memcpy but a header merge that tracks the offset shifts caused by a
> larger header and other certificates in the remaining data pages.
>
> I can propose my own patch on top of this v6 patch set that adds a KVM
> ioctl like KVM_{GET,SET}_INSTANCE_SNP_EXT_CONFIG and then pass along

Would it be a burden to supply all the certificates, both system and per-VM,
in this KVM call? On the SNP Extended Guest Request, the hypervisor could
just check if there is a per-VM blob and return that or else return the
system-wide blob (if present).
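
The selection logic would then be trivial. A minimal sketch, assuming a hypothetical `struct cert_blob` and helper name, neither of which comes from the series:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for a certificate blob. */
struct cert_blob {
	const void *data;
	size_t len;
};

/* Prefer the per-VM blob; fall back to the system-wide one if present. */
static const struct cert_blob *
select_cert_blob(const struct cert_blob *per_vm,
		 const struct cert_blob *system_wide)
{
	if (per_vm && per_vm->data && per_vm->len)
		return per_vm;

	if (system_wide && system_wide->data && system_wide->len)
		return system_wide;

	return NULL;	/* nothing configured */
}
```

Since the per-VM blob replaces the system-wide one rather than extending it, no merge is needed on the request path.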

Thanks,
Tom


> the stored certificate blob in the request call. I'd prefer to have
> the design agreed upon upfront though.
>

2022-08-08 23:27:59

by Dionna Amalie Glaze

Subject: Re: [PATCH Part2 v6 17/49] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command

> Would it be burden to supply all the certificates, both system and per-VM,
> in this KVM call? On the SNP Extended Guest Request, the hypervisor could
> just check if there is a per-VM blob and return that or else return the
> system-wide blob (if present).
>

I think that's fine by me. We can use SNP_GET_EXT_CONFIG, merge in
user space, and create an instance override with a KVM ioctl without
touching ccp.

--
-Dionna Glaze, PhD (she/her)