This patch series enables SNP host support when running on Hyper-V, which
allows launching SNP guests while running as a nested hypervisor. This works
with the SNP guest-side support that was merged in v5.19 and the SNP-capable
QEMU from AMD.
In this scenario the L0 hypervisor is Hyper-V, L1 is KVM, and L2 is an SNP
guest. The code from this patchset runs in L1. L1 is not an SNP guest itself;
SNP guests are not capable of supporting virtualization.
Patch 1 deals with allocating an RMP table, which is not provided by
firmware/hypervisor but is needed by the kernel to keep track of page
assignment to guests and the RMP page size. Patch 2 implements the MSR-based
rmpupdate/psmash instructions, which are meant for virtualized environments.
Patch 3 contains the logic to update the RMP table when rmpupdate/psmash is
issued. Patch 4 makes sure that the kernel does not disable SNP support during
early CPU init. Patch 5 allows SNP initialization to proceed when no IOMMUs
are available. Patch 6 adds a quirk in PSP command buffer handling, because of
differences in SNP firmware spec interpretation. Patch 7 adds handling for RMP
faults that surface as #NPF events in which L0 is not able to resolve the
address at which the fault occurred.
This series depends on:
- "Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support" (applies on top of RFC v7)
https://lore.kernel.org/lkml/[email protected]/
- "Support ACPI PSP on Hyper-V"
https://lore.kernel.org/lkml/[email protected]/
Changes since v1:
* added handling for RMP page faults that occur during copy_to_user() and
  don't come with a proper fault address when running nested
* fold IS_ENABLED() test into hv_needs_snp_rmp(), and use CONFIG_KVM_AMD_SEV
  instead of CONFIG_AMD_MEM_ENCRYPT
* introduce snp_soft_rmptable() wrapper to remove core dependency on
  Hyper-V-specific code
* use msr_set_bit() for the SYSCFG_MEM_ENCRYPT bit instead of open-coding it
Jeremi Piotrowski (7):
x86/hyperv: Allocate RMP table during boot
x86/sev: Add support for NestedVirtSnpMsr
x86/sev: Maintain shadow rmptable on Hyper-V
x86/amd: Configure necessary MSRs for SNP during CPU init when running
as a guest
iommu/amd: Don't fail snp_enable when running virtualized
crypto: ccp - Introduce quirk to always reclaim pages after SEV-legacy
commands
x86/fault: Handle RMP faults with 0 address when nested
arch/x86/hyperv/hv_init.c | 5 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/hyperv-tlfs.h | 3 +
arch/x86/include/asm/mshyperv.h | 3 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/sev.h | 6 ++
arch/x86/kernel/cpu/amd.c | 5 +-
arch/x86/kernel/cpu/mshyperv.c | 47 +++++++++
arch/x86/kernel/sev.c | 150 ++++++++++++++++++++++++++---
arch/x86/mm/fault.c | 14 +++
drivers/crypto/ccp/sev-dev.c | 6 +-
drivers/crypto/ccp/sp-dev.h | 4 +
drivers/crypto/ccp/sp-platform.c | 1 +
drivers/iommu/amd/init.c | 6 ++
14 files changed, 240 insertions(+), 13 deletions(-)
--
2.25.1
Hyper-V VMs can be capable of hosting SNP-isolated nested VMs on AMD
CPUs. One of the pieces of SNP is the RMP (Reverse Map) table, which
tracks page assignment to firmware, hypervisor, or guest. On bare metal
this table is allocated by UEFI, but on Hyper-V it is the responsibility
of the OS to allocate one if necessary. The nested_features bit
HV_X64_NESTED_NO_RMP_TABLE will be set to communicate that no RMP table
is available. The actual RMP table is exclusively controlled by the
Hyper-V hypervisor and is not virtualized to the VM. The SNP code in the
kernel uses the RMP table for its own tracking, so it is necessary for
init code to allocate one.
While not strictly necessary, follow the requirements defined by "SEV
Secure Nested Paging Firmware ABI Specification" Rev 1.54, section 8.8.2
when allocating the RMP:
- RMP_BASE and RMP_END must be set identically across all cores.
- RMP_BASE must be 1 MB aligned
- RMP_END - RMP_BASE + 1 must be a multiple of 1 MB
- RMP is large enough to protect itself
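As a rough worked example of these requirements (illustration only, not part
of the series): each 4K page of host memory is covered by a 16-byte RMP entry,
the table starts with a fixed 16 KB of processor bookkeeping, and the total is
rounded up to a 1 MB multiple. For a hypothetical guest with 16 GB of RAM:

	/* Sizing sketch mirroring ms_hyperv_init_mem_mapping() below. */
	u64 max_pfn     = SZ_16G / SZ_4K;               /* 0x400000 pages */
	u64 calc_rmp_sz = (max_pfn << 4) +              /* 64 MB of entries */
			  RMPTABLE_CPU_BOOKKEEPING_SZ;  /* plus 16 KB */
	calc_rmp_sz = round_up(calc_rmp_sz, SZ_1M);     /* => 65 MB total */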
The allocation is done in the init_mem_mapping() hook, which is the
earliest hook I found that has both max_pfn and memblock initialized. At
this point we are still under the
memblock_set_current_limit(ISA_END_ADDRESS) restriction, but explicitly
passing the end to memblock_phys_alloc_range() allows us to allocate
past that value.
The RMP table is needed when the hypervisor has access to SNP, which can
be determined using X86_FEATURE_SEV_SNP, but we need to exclude SNP
guests themselves (since SNP guests are not capable of virtualization).
This is why we additionally require cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)
to be false.
Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/hyperv/hv_init.c | 5 ++++
arch/x86/include/asm/hyperv-tlfs.h | 3 ++
arch/x86/include/asm/mshyperv.h | 3 ++
arch/x86/include/asm/sev.h | 2 ++
arch/x86/kernel/cpu/mshyperv.c | 45 ++++++++++++++++++++++++++++++
arch/x86/kernel/sev.c | 1 -
6 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 29774126e931..0c540fff1a20 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -117,6 +117,11 @@ static int hv_cpu_init(unsigned int cpu)
}
}
+ if (hv_needs_snp_rmp()) {
+ wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
+ wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
+ }
+
return hyperv_init_ghcb();
}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index e3efaf6e6b62..01cc2c3f9f20 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -152,6 +152,9 @@
*/
#define HV_X64_NESTED_ENLIGHTENED_TLB BIT(22)
+/* Nested SNP on Hyper-V */
+#define HV_X64_NESTED_NO_RMP_TABLE BIT(23)
+
/* HYPERV_CPUID_ISOLATION_CONFIG.EAX bits. */
#define HV_PARAVISOR_PRESENT BIT(0)
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 61f0c206bff0..3533b002cede 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -190,6 +190,9 @@ static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
extern bool hv_isolation_type_snp(void);
+extern struct resource rmp_res;
+bool hv_needs_snp_rmp(void);
+
static inline bool hv_is_synic_reg(unsigned int reg)
{
if ((reg >= HV_REGISTER_SCONTROL) &&
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 2916f4150ac7..db5438663229 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -83,6 +83,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 7
+#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 831613959a92..777c9d812dfa 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -17,6 +17,7 @@
#include <linux/irq.h>
#include <linux/kexec.h>
#include <linux/i8253.h>
+#include <linux/memblock.h>
#include <linux/random.h>
#include <linux/swiotlb.h>
#include <asm/processor.h>
@@ -31,6 +32,7 @@
#include <asm/timer.h>
#include <asm/reboot.h>
#include <asm/nmi.h>
+#include <asm/sev.h>
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
#include <asm/coco.h>
@@ -488,6 +490,48 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
}
+struct resource rmp_res = {
+ .name = "RMP",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_SYSTEM_RAM,
+};
+
+/*
+ * HV_X64_NESTED_NO_RMP_TABLE indicates to the nested hypervisor that no RMP
+ * table is provided/necessary, but kernel code requires access to one so we
+ * use that bit as an indication that we need to allocate one ourselves.
+ */
+bool hv_needs_snp_rmp(void)
+{
+ return IS_ENABLED(CONFIG_KVM_AMD_SEV) &&
+ boot_cpu_has(X86_FEATURE_SEV_SNP) &&
+ !cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
+ (ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE);
+}
+
+static void __init ms_hyperv_init_mem_mapping(void)
+{
+ phys_addr_t addr;
+ u64 calc_rmp_sz;
+
+ if (!hv_needs_snp_rmp())
+ return;
+
+ calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+ calc_rmp_sz = round_up(calc_rmp_sz, SZ_1M);
+ addr = memblock_phys_alloc_range(calc_rmp_sz, SZ_1M, 0, max_pfn << PAGE_SHIFT);
+ if (!addr) {
+ pr_warn("Unable to allocate RMP table\n");
+ return;
+ }
+ rmp_res.start = addr;
+ rmp_res.end = addr + calc_rmp_sz - 1;
+ wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
+ wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
+ insert_resource(&iomem_resource, &rmp_res);
+}
+
const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.name = "Microsoft Hyper-V",
.detect = ms_hyperv_platform,
@@ -495,4 +539,5 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.init.x2apic_available = ms_hyperv_x2apic_available,
.init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
.init.init_platform = ms_hyperv_init_platform,
+ .init.init_mem_mapping = ms_hyperv_init_mem_mapping,
};
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1dd1b36bdfea..7fa39dc17edd 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -87,7 +87,6 @@ struct rmpentry {
* The first 16KB from the RMP_BASE is used by the processor for the
* bookkeeping, the range needs to be added during the RMP entry lookup.
*/
-#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
#define RMPENTRY_SHIFT 8
#define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
--
2.25.1
The rmpupdate and psmash instructions, which are used in AMD's SEV-SNP
to update the RMP (Reverse Map) table, can't be trapped. For nested
scenarios, AMD defined MSR versions of these instructions which can
be trapped and must be emulated by the L0 hypervisor. One instance where
these MSRs are used is Hyper-V VMs, which expose SNP hardware isolation
capabilities to the L1 guest.
The MSRs are defined in "AMD64 Architecture Programmer’s Manual, Volume 2:
System Programming", section 15.36.19.
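For contrast with the open-coded asm in the patch below: a conventional
wrmsr writes the value in edx:eax to the MSR selected by ecx and produces no
output, so the kernel's wrmsrl() helper has no way to capture the status that
these virtualized MSRs return in rax. A minimal sketch of the conventional
form (illustration only):

	/* Conventional MSR write: value split across edx:eax, no result. */
	static inline void conventional_wrmsr(u32 msr, u64 val)
	{
		asm volatile("wrmsr"
			     : /* no outputs */
			     : "c"(msr), "a"((u32)val), "d"((u32)(val >> 32))
			     : "memory");
	}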
Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/kernel/sev.c | 80 ++++++++++++++++++++++++++----
3 files changed, 73 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 480b4eaef310..e6e2e824f67b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -423,6 +423,7 @@
#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
+#define X86_FEATURE_NESTED_VIRT_SNP_MSR (19*32+29) /* Virtualizable RMPUPDATE and PSMASH MSR available */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 35100c630617..d6103e607896 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -567,6 +567,8 @@
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
#define MSR_AMD64_RMP_BASE 0xc0010132
#define MSR_AMD64_RMP_END 0xc0010133
+#define MSR_AMD64_VIRT_RMPUPDATE 0xc001f001
+#define MSR_AMD64_VIRT_PSMASH 0xc001f002
#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 7fa39dc17edd..ad09dd3747a1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2566,6 +2566,32 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+static bool virt_snp_msr(void)
+{
+ return boot_cpu_has(X86_FEATURE_NESTED_VIRT_SNP_MSR);
+}
+
+/*
+ * This version of psmash is not implemented in hardware but always
+ * traps to the L0 hypervisor. It doesn't follow the usual wrmsr conventions.
+ * Inputs:
+ * rax: 2MB aligned GPA
+ * Outputs:
+ * rax: psmash return code
+ */
+static u64 virt_psmash(u64 paddr)
+{
+ u64 ret;
+
+ asm volatile(
+ "wrmsr\n\t"
+ : "=a"(ret)
+ : "a"(paddr), "c"(MSR_AMD64_VIRT_PSMASH)
+ : "memory", "cc"
+ );
+ return ret;
+}
+
/*
* psmash is used to smash a 2MB aligned page into 4K
* pages while preserving the Validated bit in the RMP.
@@ -2581,11 +2607,15 @@ int psmash(u64 pfn)
if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return -ENXIO;
- /* Binutils version 2.36 supports the PSMASH mnemonic. */
- asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
- : "=a"(ret)
- : "a"(paddr)
- : "memory", "cc");
+ if (virt_snp_msr()) {
+ ret = virt_psmash(paddr);
+ } else {
+ /* Binutils version 2.36 supports the PSMASH mnemonic. */
+ asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(paddr)
+ : "memory", "cc");
+ }
return ret;
}
@@ -2601,6 +2631,31 @@ static int invalidate_direct_map(unsigned long pfn, int npages)
return set_memory_np((unsigned long)pfn_to_kaddr(pfn), npages);
}
+/*
+ * This version of rmpupdate is not implemented in hardware but always
+ * traps to the L0 hypervisor. It doesn't follow the usual wrmsr conventions.
+ * Inputs:
+ * rax: 4KB aligned GPA
+ * rdx: bytes 7:0 of new rmp entry
+ * r8: bytes 15:8 of new rmp entry
+ * Outputs:
+ * rax: rmpupdate return code
+ */
+static u64 virt_rmpupdate(unsigned long paddr, struct rmp_state *val)
+{
+ u64 ret;
+ register u64 hi asm("r8") = ((u64 *)val)[1];
+ register u64 lo asm("rdx") = ((u64 *)val)[0];
+
+ asm volatile(
+ "wrmsr\n\t"
+ : "=a"(ret)
+ : "a"(paddr), "c"(MSR_AMD64_VIRT_RMPUPDATE), "r"(lo), "r"(hi)
+ : "memory", "cc"
+ );
+ return ret;
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -2626,11 +2681,16 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
}
retry:
- /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
- asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
- : "=a"(ret)
- : "a"(paddr), "c"((unsigned long)val)
- : "memory", "cc");
+
+ if (virt_snp_msr()) {
+ ret = virt_rmpupdate(paddr, val);
+ } else {
+ /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+ : "=a"(ret)
+ : "a"(paddr), "c"((unsigned long)val)
+ : "memory", "cc");
+ }
if (ret) {
if (!retries) {
--
2.25.1
Hyper-V can expose the SEV-SNP feature to guests and manages the
system-wide RMP (Reverse Map) table. The SNP implementation in the
kernel needs access to the RMP table for tracking pages and deciding
when/how to issue rmpupdate/psmash. When running as a Hyper-V guest
with SNP support, an RMP table is allocated by the kernel during boot
for this purpose. Keep the table in sync with issued rmpupdate/psmash
instructions.
The logic for how to update the RMP table comes from the "AMD64
Architecture Programmer's Manual, Volume 3", which describes the psmash
and rmpupdate instructions. For correctness of the SNP host code, the
most important fields are "assigned" and "page size".
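For reference, the shadow-table updates in this patch operate on an RMP entry
layout along the lines of the one defined by the SNP host series (sketched
here for readability; the authoritative definition lives in
arch/x86/kernel/sev.c):

	struct rmpentry {
		union {
			struct {
				u64 assigned	: 1,
				    pagesize	: 1,
				    immutable	: 1,
				    rsvd	: 9,
				    gpa		: 39,
				    asid	: 10,
				    vmsa	: 1,
				    validated	: 1,
				    rsvd2	: 1;
			} info;
			u64 low;
		};
		u64 high;
	} __packed;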
Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/include/asm/sev.h | 4 ++
arch/x86/kernel/cpu/mshyperv.c | 2 +
arch/x86/kernel/sev.c | 69 ++++++++++++++++++++++++++++++++++
3 files changed, 75 insertions(+)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index db5438663229..4d3591ebff5d 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -218,6 +218,8 @@ int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
void sev_dump_rmpentry(u64 pfn);
+bool snp_soft_rmptable(void);
+void __init snp_set_soft_rmptable(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -251,6 +253,8 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
static inline void sev_dump_rmpentry(u64 pfn) {}
+static inline bool snp_soft_rmptable(void) { return false; }
+static inline void __init snp_set_soft_rmptable(void) {}
#endif
#endif
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 777c9d812dfa..101c38e9cae7 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -530,6 +530,8 @@ static void __init ms_hyperv_init_mem_mapping(void)
wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
insert_resource(&iomem_resource, &rmp_res);
+
+ snp_set_soft_rmptable();
}
const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index ad09dd3747a1..712f1a9623ce 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2566,6 +2566,22 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+static bool soft_rmptable __ro_after_init;
+
+/*
+ * Test if the rmptable needs to be managed by software and is not maintained by
+ * (virtualized) hardware.
+ */
+bool snp_soft_rmptable(void)
+{
+ return soft_rmptable;
+}
+
+void __init snp_set_soft_rmptable(void)
+{
+ soft_rmptable = true;
+}
+
static bool virt_snp_msr(void)
{
return boot_cpu_has(X86_FEATURE_NESTED_VIRT_SNP_MSR);
@@ -2592,6 +2608,26 @@ static u64 virt_psmash(u64 paddr)
return ret;
}
+static void snp_update_rmptable_psmash(u64 pfn)
+{
+ int level;
+ struct rmpentry *entry = __snp_lookup_rmpentry(pfn, &level);
+
+ if (WARN_ON(IS_ERR_OR_NULL(entry)))
+ return;
+
+ if (level == PG_LEVEL_2M) {
+ int i;
+
+ entry->info.pagesize = RMP_PG_SIZE_4K;
+ for (i = 1; i < PTRS_PER_PMD; i++) {
+ struct rmpentry *it = &entry[i];
+ *it = *entry;
+ it->info.gpa = entry->info.gpa + i * PAGE_SIZE;
+ }
+ }
+}
+
/*
* psmash is used to smash a 2MB aligned page into 4K
* pages while preserving the Validated bit in the RMP.
@@ -2609,6 +2645,8 @@ int psmash(u64 pfn)
if (virt_snp_msr()) {
ret = virt_psmash(paddr);
+ if (!ret && snp_soft_rmptable())
+ snp_update_rmptable_psmash(pfn);
} else {
/* Binutils version 2.36 supports the PSMASH mnemonic. */
asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
@@ -2656,6 +2694,35 @@ static u64 virt_rmpupdate(unsigned long paddr, struct rmp_state *val)
return ret;
}
+static void snp_update_rmptable_rmpupdate(u64 pfn, int level, struct rmp_state *val)
+{
+ int prev_level;
+ struct rmpentry *entry = __snp_lookup_rmpentry(pfn, &prev_level);
+
+ if (WARN_ON(IS_ERR_OR_NULL(entry)))
+ return;
+
+ if (level > PG_LEVEL_4K) {
+ int i;
+ struct rmpentry tmp_rmp = {
+ .info = {
+ .assigned = val->assigned,
+ },
+ };
+ for (i = 1; i < PTRS_PER_PMD; i++)
+ entry[i] = tmp_rmp;
+ }
+ if (!val->assigned) {
+ memset(entry, 0, sizeof(*entry));
+ } else {
+ entry->info.assigned = val->assigned;
+ entry->info.pagesize = val->pagesize;
+ entry->info.immutable = val->immutable;
+ entry->info.gpa = val->gpa;
+ entry->info.asid = val->asid;
+ }
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -2684,6 +2751,8 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
if (virt_snp_msr()) {
ret = virt_rmpupdate(paddr, val);
+ if (!ret && snp_soft_rmptable())
+ snp_update_rmptable_rmpupdate(pfn, level, val);
} else {
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
--
2.25.1
Hyper-V may expose the SEV-SNP CPU features to the guest, but it is the
guest kernel's responsibility to configure them.
early_detect_mem_encrypt() checks SYSCFG[MEM_ENCRYPT] and HWCR[SMMLOCK],
and if these are not set the SEV-SNP CPU flags are cleared. These checks
are only really necessary on bare metal and provide no value when
running virtualized. They prevent further initialization from happening,
so check whether we are running under a hypervisor and, if so, update
SYSCFG and skip the HWCR check.
Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c7884198ad5b..4418a418109b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -565,6 +565,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
* don't advertise the feature under CONFIG_X86_32.
*/
if (cpu_has(c, X86_FEATURE_SME) || cpu_has(c, X86_FEATURE_SEV)) {
+ if (cpu_has(c, X86_FEATURE_HYPERVISOR))
+ msr_set_bit(MSR_AMD64_SYSCFG, MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT);
+
/* Check if memory encryption is enabled */
rdmsrl(MSR_AMD64_SYSCFG, msr);
if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
@@ -584,7 +587,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
setup_clear_cpu_cap(X86_FEATURE_SME);
rdmsrl(MSR_K7_HWCR, msr);
- if (!(msr & MSR_K7_HWCR_SMMLOCK))
+ if (!(msr & MSR_K7_HWCR_SMMLOCK) && !cpu_has(c, X86_FEATURE_HYPERVISOR))
goto clear_sev;
return;
--
2.25.1
Hyper-V VMs do not have access to an IOMMU but can still host SNP
VMs. amd_iommu_snp_enable() is on the SNP init path and should not fail
in that case.
Signed-off-by: Jeremi Piotrowski <[email protected]>
---
drivers/iommu/amd/init.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d1270e3c5baf..8049dbe78a27 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3619,6 +3619,12 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
#ifdef CONFIG_AMD_MEM_ENCRYPT
int amd_iommu_snp_enable(void)
{
+ /*
+ * If we're running virtualized there doesn't have to be an IOMMU for SNP to work.
+ */
+ if (init_state == IOMMU_NOT_FOUND && boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return 0;
+
/*
* The SNP support requires that IOMMU must be enabled, and is
* not configured in the passthrough mode.
--
2.25.1
On Hyper-V, the rmp_mark_pages_shared() call after a SEV_PLATFORM_STATUS
command fails with return code 2 (FAIL_PERMISSION) because the page has
the immutable bit set in the RMP (SNP has been initialized). The comment
above this spot mentions that firmware automatically clears the
immutable bit, but I can't find any mention of this behavior in the SNP
Firmware ABI Spec.
Introduce a quirk to always attempt the page reclaim, and set it for the
platform PSP. It would be possible to make this behavior unconditional,
since the firmware spec defines that page reclaim succeeds if the page
does not have the immutable bit set.
Signed-off-by: Jeremi Piotrowski <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 6 +++++-
drivers/crypto/ccp/sp-dev.h | 4 ++++
drivers/crypto/ccp/sp-platform.c | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 6c4fdcaed72b..4719c0cafa28 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -658,8 +658,12 @@ static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
* no not need to reclaim the page.
*/
if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
- if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
+ if (psp_master->vdata->quirks & PSP_QUIRK_ALWAYS_RECLAIM) {
+ if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+ } else if (rmp_mark_pages_shared(__pa(cmd_buf), 1)) {
return -EFAULT;
+ }
/* No need to go further if firmware failed to execute command. */
if (fw_err)
diff --git a/drivers/crypto/ccp/sp-dev.h b/drivers/crypto/ccp/sp-dev.h
index c05f1fa82ff4..d50f274462d4 100644
--- a/drivers/crypto/ccp/sp-dev.h
+++ b/drivers/crypto/ccp/sp-dev.h
@@ -28,6 +28,9 @@
#define CACHE_NONE 0x00
#define CACHE_WB_NO_ALLOC 0xb7
+/* PSP requires a reclaim after every firmware command */
+#define PSP_QUIRK_ALWAYS_RECLAIM BIT(0)
+
/* Structure to hold CCP device data */
struct ccp_device;
struct ccp_vdata {
@@ -59,6 +62,7 @@ struct psp_vdata {
const unsigned int feature_reg;
const unsigned int inten_reg;
const unsigned int intsts_reg;
+ const unsigned int quirks;
};
/* Structure to hold SP device data */
diff --git a/drivers/crypto/ccp/sp-platform.c b/drivers/crypto/ccp/sp-platform.c
index 1926efbc7b32..937448f6391a 100644
--- a/drivers/crypto/ccp/sp-platform.c
+++ b/drivers/crypto/ccp/sp-platform.c
@@ -103,6 +103,7 @@ static void sp_platform_fill_vdata(struct sp_dev_vdata *vdata,
.feature_reg = pdata->feature_reg,
.inten_reg = pdata->irq_en_reg,
.intsts_reg = pdata->irq_st_reg,
+ .quirks = PSP_QUIRK_ALWAYS_RECLAIM,
};
memcpy(sev, &sevtmp, sizeof(*sev));
--
2.25.1
When using SNP, accessing an encrypted guest page from the host triggers
an RMP fault. The page fault handling code can currently handle this by
looking up the corresponding RMP entry. If the same operation happens
under nested virtualization, the L0 hypervisor sees a #NPF, but the CPU
does not provide the address of the fault if the CPU was running at L1
at the time of the fault.
This happens on Hyper-V when using nested SNP guests. Hyper-V has no
choice but to use a placeholder address (0) when injecting the page
fault into L1. We need to handle this, and the only sane thing to do is
to forward a SIGBUS to the task.
One path where this happens is when the SNP guest issues a
KVM_HC_CLOCK_PAIRING hypercall, which leads to KVM calling
kvm_write_guest() on a guest supplied address. This results in the
following backtrace:
[ 191.862660] exc_page_fault+0x71/0x170
[ 191.862664] asm_exc_page_fault+0x2c/0x40
[ 191.862666] RIP: 0010:copy_user_enhanced_fast_string+0xa/0x40
...
[ 191.862677] ? __kvm_write_guest_page+0x6e/0xa0 [kvm]
[ 191.862700] kvm_write_guest_page+0x52/0xc0 [kvm]
[ 191.862788] kvm_write_guest+0x44/0x80 [kvm]
[ 191.862807] kvm_emulate_hypercall+0x1ca/0x5a0 [kvm]
[ 191.862830] ? kvm_emulate_monitor+0x40/0x40 [kvm]
[ 191.862849] svm_invoke_exit_handler+0x74/0x180 [kvm_amd]
[ 191.862854] sev_handle_vmgexit+0xf42/0x17f0 [kvm_amd]
[ 191.862858] ? __this_cpu_preempt_check+0x13/0x20
[ 191.862860] ? sev_post_map_gfn+0xf0/0xf0 [kvm_amd]
[ 191.862863] svm_invoke_exit_handler+0x74/0x180 [kvm_amd]
[ 191.862866] svm_handle_exit+0xb5/0x2b0 [kvm_amd]
[ 191.862869] kvm_arch_vcpu_ioctl_run+0x12a8/0x1aa0 [kvm]
[ 191.862891] kvm_vcpu_ioctl+0x24f/0x6d0 [kvm]
[ 191.862910] ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
[ 191.862929] ? _copy_to_user+0x25/0x30
[ 191.862932] ? kvm_vm_ioctl+0x291/0xea0 [kvm]
[ 191.862951] ? kvm_vm_ioctl+0x291/0xea0 [kvm]
[ 191.862970] ? __fget_light+0xc5/0x100
[ 191.862972] __x64_sys_ioctl+0x91/0xc0
[ 191.862975] do_syscall_64+0x5c/0x80
[ 191.862976] ? exit_to_user_mode_prepare+0x53/0x240
[ 191.862978] ? syscall_exit_to_user_mode+0x17/0x40
[ 191.862980] ? do_syscall_64+0x69/0x80
[ 191.862981] ? do_syscall_64+0x69/0x80
[ 191.862982] ? syscall_exit_to_user_mode+0x17/0x40
[ 191.862983] ? do_syscall_64+0x69/0x80
[ 191.862984] ? syscall_exit_to_user_mode+0x17/0x40
[ 191.862985] ? do_syscall_64+0x69/0x80
[ 191.862986] ? do_syscall_64+0x69/0x80
[ 191.862987] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Without this fix, the handler returns without doing anything, and the
result is a soft lockup of the CPU.
Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/mm/fault.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f2b16dcfbd9a..8706fd34f3a9 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
#include <asm/vdso.h> /* fixup_vdso_exception() */
#include <asm/irq_stack.h>
#include <asm/sev.h> /* snp_lookup_rmpentry() */
+#include <asm/hypervisor.h> /* hypervisor_is_type() */
#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -1282,6 +1283,18 @@ static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_
pte_t *pte;
u64 pfn;
+ /*
+ * When an RMP fault occurs while the CPU is not inside the SNP guest, the
+ * L0 hypervisor sees a #NPF and does not have access to the faulting
+ * address to forward to the L1 hypervisor. Hyper-V places 0 in the fault
+ * address as a placeholder. SIGBUS the task, since there's nothing better
+ * that we can do.
+ */
+ if (!address && hypervisor_is_type(X86_HYPER_MS_HYPERV)) {
+ do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+ return 1;
+ }
+
pgd = __va(read_cr3_pa());
pgd += pgd_index(address);
--
2.25.1