2023-01-23 16:51:48

by Jeremi Piotrowski

Subject: [RFC PATCH v1 0/6] Support nested SNP KVM guests on Hyper-V

This patch series enables SNP-host support when running on Hyper-V, which
allows launching nested SNP guests. No guest or QEMU changes are necessary
for this to work.

Patch 1 deals with allocating an RMP table when one is not provided by the
firmware/hypervisor. Patch 2 implements the MSR-based rmpupdate/psmash
instructions which are meant for virtualized environments. Patch 3 maintains a
shadow rmptable for internal kernel page tracking. Patch 4 makes sure that the
kernel does not disable SNP support during early CPU init. Patch 5 allows SNP
initialization to proceed when no IOMMUs are available. Patch 6 adds a quirk to
PSP command buffer handling, because of differences in SNP firmware spec
interpretation.

This series depends on

- "Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support" (applies on top of RFC v7)
https://lore.kernel.org/lkml/[email protected]/
- "Support ACPI PSP on Hyper-V"
https://lore.kernel.org/lkml/[email protected]/

Jeremi Piotrowski (6):
x86/hyperv: Allocate RMP table during boot
x86/sev: Add support for NestedVirtSnpMsr
x86/sev: Maintain shadow rmptable on Hyper-V
x86/amd: Configure necessary MSRs for SNP during CPU init when running
as a guest
iommu/amd: Don't fail snp_enable when running virtualized
crypto: ccp - Introduce quirk to always reclaim pages after SEV-legacy
commands

arch/x86/hyperv/hv_init.c | 5 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/hyperv-tlfs.h | 3 +
arch/x86/include/asm/mshyperv.h | 3 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/sev.h | 2 +
arch/x86/kernel/cpu/amd.c | 8 +-
arch/x86/kernel/cpu/mshyperv.c | 41 ++++++++++
arch/x86/kernel/sev.c | 122 ++++++++++++++++++++++++++---
drivers/crypto/ccp/sev-dev.c | 6 +-
drivers/crypto/ccp/sp-dev.h | 4 +
drivers/crypto/ccp/sp-platform.c | 1 +
drivers/iommu/amd/init.c | 6 ++
13 files changed, 191 insertions(+), 13 deletions(-)

--
2.25.1



2023-01-23 16:51:52

by Jeremi Piotrowski

Subject: [RFC PATCH v1 1/6] x86/hyperv: Allocate RMP table during boot

Hyper-V VMs can be capable of hosting SNP isolated nested VMs on AMD
CPUs. One of the pieces of SNP is the RMP (Reverse Map) table which
tracks page assignment to firmware, hypervisor or guest. On bare-metal
this table is allocated by UEFI, but on Hyper-V it is the respnsibility
of the OS to allocate one if necessary. The nested_feature
'HV_X64_NESTED_NO_RMP_TABLE' will be set to communicate that no rmp is
available. The actual RMP table is exclusively controlled by the Hyper-V
hypervisor and is not virtualized to the VM. The SNP code in the kernel
uses the RMP table for its own tracking and so it is necessary for init
code to allocate one.

While not strictly necessary, follow the requirements defined by "SEV
Secure Nested Paging Firmware ABI Specification" Rev 1.54, section 8.8.2
when allocating the RMP:

- RMP_BASE and RMP_END must be set identically across all cores.
- RMP_BASE must be 1 MB aligned
- RMP_END - RMP_BASE + 1 must be a multiple of 1 MB
- RMP is large enough to protect itself

The allocation is done in the init_mem_mapping() hook, which is the
earliest hook I found that has both max_pfn and memblock initialized. At
this point we are still under the
memblock_set_current_limit(ISA_END_ADDRESS) condition, but explicitly
passing the end to memblock_phys_alloc_range() allows us to allocate
past that value.
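
For scale: the table holds 16 bytes of RMP entry per 4K page plus 16 KiB
of CPU bookkeeping (this is the calc_rmp_sz arithmetic below), so, as a
worked example, a 64 GiB VM (16M 4K pages) needs

  16M * 16 B + 16 KiB = 256 MiB + 16 KiB

rounded up to the next 1 MB multiple, i.e. 257 MiB.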

Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/hyperv/hv_init.c | 5 ++++
arch/x86/include/asm/hyperv-tlfs.h | 3 +++
arch/x86/include/asm/mshyperv.h | 3 +++
arch/x86/include/asm/sev.h | 2 ++
arch/x86/kernel/cpu/mshyperv.c | 41 ++++++++++++++++++++++++++++++
arch/x86/kernel/sev.c | 1 -
6 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 29774126e931..e7f5ac075e6d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -117,6 +117,11 @@ static int hv_cpu_init(unsigned int cpu)
}
}

+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && hv_needs_snp_rmp()) {
+ wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
+ wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
+ }
+
return hyperv_init_ghcb();
}

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index e3efaf6e6b62..01cc2c3f9f20 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -152,6 +152,9 @@
*/
#define HV_X64_NESTED_ENLIGHTENED_TLB BIT(22)

+/* Nested SNP on Hyper-V */
+#define HV_X64_NESTED_NO_RMP_TABLE BIT(23)
+
/* HYPERV_CPUID_ISOLATION_CONFIG.EAX bits. */
#define HV_PARAVISOR_PRESENT BIT(0)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 61f0c206bff0..3533b002cede 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -190,6 +190,9 @@ static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}

extern bool hv_isolation_type_snp(void);

+extern struct resource rmp_res;
+bool hv_needs_snp_rmp(void);
+
static inline bool hv_is_synic_reg(unsigned int reg)
{
if ((reg >= HV_REGISTER_SCONTROL) &&
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 2916f4150ac7..db5438663229 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -83,6 +83,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 7

+#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 831613959a92..e7f02412f3a1 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -17,6 +17,7 @@
#include <linux/irq.h>
#include <linux/kexec.h>
#include <linux/i8253.h>
+#include <linux/memblock.h>
#include <linux/random.h>
#include <linux/swiotlb.h>
#include <asm/processor.h>
@@ -31,6 +32,7 @@
#include <asm/timer.h>
#include <asm/reboot.h>
#include <asm/nmi.h>
+#include <asm/sev.h>
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
#include <asm/coco.h>
@@ -488,6 +490,44 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
}

+struct resource rmp_res = {
+ .name = "RMP",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_SYSTEM_RAM,
+};
+
+bool hv_needs_snp_rmp(void)
+{
+ return boot_cpu_has(X86_FEATURE_SEV_SNP) &&
+ (ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE);
+}
+
+
+static void __init ms_hyperv_init_mem_mapping(void)
+{
+ phys_addr_t addr;
+ u64 calc_rmp_sz;
+
+ if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ return;
+ if (!hv_needs_snp_rmp())
+ return;
+
+ calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+ calc_rmp_sz = round_up(calc_rmp_sz, SZ_1M);
+ addr = memblock_phys_alloc_range(calc_rmp_sz, SZ_1M, 0, max_pfn << PAGE_SHIFT);
+ if (!addr) {
+ pr_warn("Unable to allocate RMP table\n");
+ return;
+ }
+ rmp_res.start = addr;
+ rmp_res.end = addr + calc_rmp_sz - 1;
+ wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
+ wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
+ insert_resource(&iomem_resource, &rmp_res);
+}
+
const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.name = "Microsoft Hyper-V",
.detect = ms_hyperv_platform,
@@ -495,4 +535,5 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.init.x2apic_available = ms_hyperv_x2apic_available,
.init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
.init.init_platform = ms_hyperv_init_platform,
+ .init.init_mem_mapping = ms_hyperv_init_mem_mapping,
};
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1dd1b36bdfea..7fa39dc17edd 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -87,7 +87,6 @@ struct rmpentry {
* The first 16KB from the RMP_BASE is used by the processor for the
* bookkeeping, the range needs to be added during the RMP entry lookup.
*/
-#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
#define RMPENTRY_SHIFT 8
#define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))

--
2.25.1


2023-01-23 16:51:56

by Jeremi Piotrowski

Subject: [RFC PATCH v1 2/6] x86/sev: Add support for NestedVirtSnpMsr

The rmpupdate and psmash instructions, which are used in AMD's SEV-SNP
to update the RMP (Reverse Map) table, can't be trapped. For nested
scenarios, AMD defined MSR versions of these instructions which can be
emulated by the top-level hypervisor. One instance where these MSRs are
used are Hyper-V VMs which expose SNP isolation features to the guest.

The MSRs are defined in "AMD64 Architecture Programmer’s Manual, Volume 2:
System Programming", section 15.36.19.
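
In outline the convention is as follows (register usage taken from the asm
added below; both MSRs are unimplemented in hardware and always trap to the
L0 hypervisor for emulation):

  MSR_AMD64_VIRT_RMPUPDATE (0xc001f001):
    rax: 4KB-aligned GPA of the target page
    rdx: bytes 7:0 of the new RMP entry
    r8:  bytes 15:8 of the new RMP entry
    on return, rax holds the RMPUPDATE status code

  MSR_AMD64_VIRT_PSMASH (0xc001f002):
    rax: 2MB-aligned GPA
    on return, rax holds the PSMASH status code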

Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/kernel/sev.c | 62 +++++++++++++++++++++++++-----
3 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 480b4eaef310..e6e2e824f67b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -423,6 +423,7 @@
#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
+#define X86_FEATURE_NESTED_VIRT_SNP_MSR (19*32+29) /* Virtualizable RMPUPDATE and PSMASH MSR available */

/*
* BUG word(s)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 35100c630617..d6103e607896 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -567,6 +567,8 @@
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
#define MSR_AMD64_RMP_BASE 0xc0010132
#define MSR_AMD64_RMP_END 0xc0010133
+#define MSR_AMD64_VIRT_RMPUPDATE 0xc001f001
+#define MSR_AMD64_VIRT_PSMASH 0xc001f002

#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 7fa39dc17edd..95404c7e5150 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2566,6 +2566,24 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);

+static bool virt_snp_msr(void)
+{
+ return boot_cpu_has(X86_FEATURE_NESTED_VIRT_SNP_MSR);
+}
+
+static u64 virt_psmash(u64 paddr)
+{
+ int ret;
+
+ asm volatile(
+ "wrmsr\n\t"
+ : "=a"(ret)
+ : "a"(paddr), "c"(MSR_AMD64_VIRT_PSMASH)
+ : "memory", "cc"
+ );
+ return ret;
+}
+
/*
* psmash is used to smash a 2MB aligned page into 4K
* pages while preserving the Validated bit in the RMP.
@@ -2581,11 +2599,15 @@ int psmash(u64 pfn)
if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return -ENXIO;

- /* Binutils version 2.36 supports the PSMASH mnemonic. */
- asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
- : "=a"(ret)
- : "a"(paddr)
- : "memory", "cc");
+ if (virt_snp_msr()) {
+ ret = virt_psmash(paddr);
+ } else {
+ /* Binutils version 2.36 supports the PSMASH mnemonic. */
+ asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(paddr)
+ : "memory", "cc");
+ }

return ret;
}
@@ -2601,6 +2623,21 @@ static int invalidate_direct_map(unsigned long pfn, int npages)
return set_memory_np((unsigned long)pfn_to_kaddr(pfn), npages);
}

+static u64 virt_rmpupdate(unsigned long paddr, struct rmp_state *val)
+{
+ int ret;
+ register u64 hi asm("r8") = ((u64 *)val)[1];
+ register u64 lo asm("rdx") = ((u64 *)val)[0];
+
+ asm volatile(
+ "wrmsr\n\t"
+ : "=a"(ret)
+ : "a"(paddr), "c"(MSR_AMD64_VIRT_RMPUPDATE), "r"(lo), "r"(hi)
+ : "memory", "cc"
+ );
+ return ret;
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -2626,11 +2663,16 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
}

retry:
- /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
- asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
- : "=a"(ret)
- : "a"(paddr), "c"((unsigned long)val)
- : "memory", "cc");
+
+ if (virt_snp_msr()) {
+ ret = virt_rmpupdate(paddr, val);
+ } else {
+ /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+ : "=a"(ret)
+ : "a"(paddr), "c"((unsigned long)val)
+ : "memory", "cc");
+ }

if (ret) {
if (!retries) {
--
2.25.1


2023-01-23 16:52:03

by Jeremi Piotrowski

Subject: [RFC PATCH v1 3/6] x86/sev: Maintain shadow rmptable on Hyper-V

Hyper-V can expose the SEV-SNP feature to guests, and manages the
system-wide RMP (Reverse Map) table. The SNP implementation in the
kernel needs access to the rmptable for tracking pages and deciding
when/how to issue rmpupdate/psmash. When running as a Hyper-V guest
with SNP support, an rmptable is allocated by the kernel during boot for
this purpose. Keep the table in sync with issued rmpupdate/psmash
instructions.

The logic for how to update the rmptable comes from "AMD64 Architecture
Programmer’s Manual, Volume 3" which describes the psmash and rmpupdate
instructions. To ensure correctness of the SNP host code, the most
important fields are "assigned" and "page size".
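
In outline, the sync rules implemented here are:

  after a successful psmash(pfn):
    set the 2M entry's pagesize to RMP_PG_SIZE_4K and replicate the entry
    into the 511 following 4K entries, advancing the gpa by 4K each time

  after a successful rmpupdate(pfn, state):
    for a 2M-level update, first propagate the new 'assigned' bit to the
    511 following entries; then clear the entry if state->assigned is 0,
    otherwise copy assigned/pagesize/immutable/gpa/asid from the state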

Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/kernel/sev.c | 59 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 95404c7e5150..edec1ccb80b1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -26,6 +26,7 @@
#include <linux/iommu.h>
#include <linux/amd-iommu.h>

+#include <asm/mshyperv.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
#include <asm/sev.h>
@@ -2566,6 +2567,11 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);

+static bool hv_no_rmp_table(void)
+{
+ return ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE;
+}
+
static bool virt_snp_msr(void)
{
return boot_cpu_has(X86_FEATURE_NESTED_VIRT_SNP_MSR);
@@ -2584,6 +2590,26 @@ static u64 virt_psmash(u64 paddr)
return ret;
}

+static void snp_update_rmptable_psmash(u64 pfn)
+{
+ int level;
+ struct rmpentry *entry = __snp_lookup_rmpentry(pfn, &level);
+
+ if (WARN_ON(IS_ERR_OR_NULL(entry)))
+ return;
+
+ if (level == PG_LEVEL_2M) {
+ int i;
+
+ entry->info.pagesize = RMP_PG_SIZE_4K;
+ for (i = 1; i < PTRS_PER_PMD; i++) {
+ struct rmpentry *it = &entry[i];
+ *it = *entry;
+ it->info.gpa = entry->info.gpa + i * PAGE_SIZE;
+ }
+ }
+}
+
/*
* psmash is used to smash a 2MB aligned page into 4K
* pages while preserving the Validated bit in the RMP.
@@ -2601,6 +2627,8 @@ int psmash(u64 pfn)

if (virt_snp_msr()) {
ret = virt_psmash(paddr);
+ if (!ret && hv_no_rmp_table())
+ snp_update_rmptable_psmash(pfn);
} else {
/* Binutils version 2.36 supports the PSMASH mnemonic. */
asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
@@ -2638,6 +2666,35 @@ static u64 virt_rmpupdate(unsigned long paddr, struct rmp_state *val)
return ret;
}

+static void snp_update_rmptable_rmpupdate(u64 pfn, int level, struct rmp_state *val)
+{
+ int prev_level;
+ struct rmpentry *entry = __snp_lookup_rmpentry(pfn, &prev_level);
+
+ if (WARN_ON(IS_ERR_OR_NULL(entry)))
+ return;
+
+ if (level > PG_LEVEL_4K) {
+ int i;
+ struct rmpentry tmp_rmp = {
+ .info = {
+ .assigned = val->assigned,
+ },
+ };
+ for (i = 1; i < PTRS_PER_PMD; i++)
+ entry[i] = tmp_rmp;
+ }
+ if (!val->assigned) {
+ memset(entry, 0, sizeof(*entry));
+ } else {
+ entry->info.assigned = val->assigned;
+ entry->info.pagesize = val->pagesize;
+ entry->info.immutable = val->immutable;
+ entry->info.gpa = val->gpa;
+ entry->info.asid = val->asid;
+ }
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -2666,6 +2723,8 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)

if (virt_snp_msr()) {
ret = virt_rmpupdate(paddr, val);
+ if (!ret && hv_no_rmp_table())
+ snp_update_rmptable_rmpupdate(pfn, level, val);
} else {
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
--
2.25.1


2023-01-23 16:52:07

by Jeremi Piotrowski

Subject: [RFC PATCH v1 4/6] x86/amd: Configure necessary MSRs for SNP during CPU init when running as a guest

From: Jeremi Piotrowski <[email protected]>

Hyper-V may expose the SEV/SEV-SNP CPU features to the guest, but it is
up to the guest to use them. early_detect_mem_encrypt() checks
SYSCFG[MEM_ENCRYPT] and HWCR[SMMLOCK] and if these are not set the
SEV-SNP features are cleared. Check if we are running under a
hypervisor and, if so, update SYSCFG and skip the HWCR check.

It would be great to make this check more specific (checking for
Hyper-V) but this code runs before hypervisor detection on the boot cpu.

Signed-off-by: Jeremi Piotrowski <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c7884198ad5b..17d91ac62937 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -565,6 +565,12 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
* don't advertise the feature under CONFIG_X86_32.
*/
if (cpu_has(c, X86_FEATURE_SME) || cpu_has(c, X86_FEATURE_SEV)) {
+ if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
+ rdmsrl(MSR_AMD64_SYSCFG, msr);
+ msr |= MSR_AMD64_SYSCFG_MEM_ENCRYPT;
+ wrmsrl(MSR_AMD64_SYSCFG, msr);
+ }
+
/* Check if memory encryption is enabled */
rdmsrl(MSR_AMD64_SYSCFG, msr);
if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
@@ -584,7 +590,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
setup_clear_cpu_cap(X86_FEATURE_SME);

rdmsrl(MSR_K7_HWCR, msr);
- if (!(msr & MSR_K7_HWCR_SMMLOCK))
+ if (!(msr & MSR_K7_HWCR_SMMLOCK) && !cpu_has(c, X86_FEATURE_HYPERVISOR))
goto clear_sev;

return;
--
2.25.1


2023-01-23 16:52:15

by Jeremi Piotrowski

Subject: [RFC PATCH v1 5/6] iommu/amd: Don't fail snp_enable when running virtualized

From: Jeremi Piotrowski <[email protected]>

Hyper-V VMs do not have access to an IOMMU but can support hosting SNP
VMs. amd_iommu_snp_enable() is on the SNP init path and should not fail
in that case.

Signed-off-by: Jeremi Piotrowski <[email protected]>
---
drivers/iommu/amd/init.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d1270e3c5baf..8049dbe78a27 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3619,6 +3619,12 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
#ifdef CONFIG_AMD_MEM_ENCRYPT
int amd_iommu_snp_enable(void)
{
+ /*
+ * If we're running virtualized there doesn't have to be an IOMMU for SNP to work.
+ */
+ if (init_state == IOMMU_NOT_FOUND && boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return 0;
+
/*
* The SNP support requires that IOMMU must be enabled, and is
* not configured in the passthrough mode.
--
2.25.1


2023-01-23 16:52:23

by Jeremi Piotrowski

Subject: [RFC PATCH v1 6/6] crypto: ccp - Introduce quirk to always reclaim pages after SEV-legacy commands

On Hyper-V, the rmp_mark_pages_shared() call issued after a SEV_PLATFORM_STATUS
command fails with return code 2 (FAIL_PERMISSION) because the page has the
immutable bit set in the RMP (SNP has been initialized). The comment
above this spot mentions that firmware automatically clears the
immutable bit, but I can't find any mention of this behavior in the SNP
Firmware ABI Spec.

Introduce a quirk to always attempt the page reclaim and set it for the
platform PSP. It would be possible to make this behavior unconditional,
as the firmware spec defines that page reclaim succeeds if the page does
not have the immutable bit set.

Signed-off-by: Jeremi Piotrowski <[email protected]>
---
drivers/crypto/ccp/sev-dev.c | 6 +++++-
drivers/crypto/ccp/sp-dev.h | 4 ++++
drivers/crypto/ccp/sp-platform.c | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 6c4fdcaed72b..4719c0cafa28 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -658,8 +658,12 @@ static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
* no not need to reclaim the page.
*/
if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
- if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
+ if (psp_master->vdata->quirks & PSP_QUIRK_ALWAYS_RECLAIM) {
+ if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+ } else if (rmp_mark_pages_shared(__pa(cmd_buf), 1)) {
return -EFAULT;
+ }

/* No need to go further if firmware failed to execute command. */
if (fw_err)
diff --git a/drivers/crypto/ccp/sp-dev.h b/drivers/crypto/ccp/sp-dev.h
index 083e57652c7b..6fb065a7d1fd 100644
--- a/drivers/crypto/ccp/sp-dev.h
+++ b/drivers/crypto/ccp/sp-dev.h
@@ -28,6 +28,9 @@
#define CACHE_NONE 0x00
#define CACHE_WB_NO_ALLOC 0xb7

+/* PSP requires a reclaim after every firmware command */
+#define PSP_QUIRK_ALWAYS_RECLAIM BIT(0)
+
/* Structure to hold CCP device data */
struct ccp_device;
struct ccp_vdata {
@@ -59,6 +62,7 @@ struct psp_vdata {
unsigned int feature_reg;
unsigned int inten_reg;
unsigned int intsts_reg;
+ unsigned int quirks;
};

/* Structure to hold SP device data */
diff --git a/drivers/crypto/ccp/sp-platform.c b/drivers/crypto/ccp/sp-platform.c
index d56b34255b97..cae3e7e8f289 100644
--- a/drivers/crypto/ccp/sp-platform.c
+++ b/drivers/crypto/ccp/sp-platform.c
@@ -43,6 +43,7 @@ static struct psp_vdata psp_platform = {
.feature_reg = -1,
.inten_reg = -1,
.intsts_reg = -1,
+ .quirks = PSP_QUIRK_ALWAYS_RECLAIM,
};
#endif

--
2.25.1


2023-01-28 19:26:18

by Michael Kelley (LINUX)

Subject: RE: [RFC PATCH v1 1/6] x86/hyperv: Allocate RMP table during boot

From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
>
> Hyper-V VMs can be capable of hosting SNP isolated nested VMs on AMD
> CPUs. One of the pieces of SNP is the RMP (Reverse Map) table which
> tracks page assignment to firmware, hypervisor or guest. On bare-metal
> this table is allocated by UEFI, but on Hyper-V it is the respnsibility

s/respnsibility/responsibility/

> of the OS to allocate one if necessary. The nested_feature
> 'HV_X64_NESTED_NO_RMP_TABLE' will be set to communicate that no rmp is
> available. The actual RMP table is exclusively controlled by the Hyper-V
> hypervisor and is not virtualized to the VM. The SNP code in the kernel
> uses the RMP table for its own tracking and so it is necessary for init
> code to allocate one.
>
> While not strictly necessary, follow the requirements defined by "SEV
> Secure Nested Paging Firmware ABI Specification" Rev 1.54, section 8.8.2
> when allocating the RMP:
>
> - RMP_BASE and RMP_END must be set identically across all cores.
> - RMP_BASE must be 1 MB aligned
> - RMP_END - RMP_BASE + 1 must be a multiple of 1 MB
> - RMP is large enough to protect itself
>
> The allocation is done in the init_mem_mapping() hook, which is the
> earliest hook I found that has both max_pfn and memblock initialized. At
> this point we are still under the
> memblock_set_current_limit(ISA_END_ADDRESS) condition, but explicitly
> passing the end to memblock_phys_alloc_range() allows us to allocate
> past that value.
>
> Signed-off-by: Jeremi Piotrowski <[email protected]>
> ---
> arch/x86/hyperv/hv_init.c | 5 ++++
> arch/x86/include/asm/hyperv-tlfs.h | 3 +++
> arch/x86/include/asm/mshyperv.h | 3 +++
> arch/x86/include/asm/sev.h | 2 ++
> arch/x86/kernel/cpu/mshyperv.c | 41 ++++++++++++++++++++++++++++++
> arch/x86/kernel/sev.c | 1 -
> 6 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 29774126e931..e7f5ac075e6d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -117,6 +117,11 @@ static int hv_cpu_init(unsigned int cpu)
> }
> }
>
> + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && hv_needs_snp_rmp()) {

Could the IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) condition be
folded into the implementation of hv_needs_snp_rmp() so that only one
test is needed?

> + wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
> + wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
> + }
> +
> return hyperv_init_ghcb();
> }
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index e3efaf6e6b62..01cc2c3f9f20 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -152,6 +152,9 @@
> */
> #define HV_X64_NESTED_ENLIGHTENED_TLB BIT(22)
>
> +/* Nested SNP on Hyper-V */
> +#define HV_X64_NESTED_NO_RMP_TABLE BIT(23)
> +

Just for my clarification, is this flag always set in an SNP guest when
running on a version of Hyper-V that supports nested SNP? I'm
presuming "yes". But there may be older versions of Hyper-V
that support SNP guests, but not nested SNP guests, in which case
this flag would be clear.

> /* HYPERV_CPUID_ISOLATION_CONFIG.EAX bits. */
> #define HV_PARAVISOR_PRESENT BIT(0)
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 61f0c206bff0..3533b002cede 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -190,6 +190,9 @@ static inline void hv_ghcb_terminate(unsigned int set, unsigned
> int reason) {}
>
> extern bool hv_isolation_type_snp(void);
>
> +extern struct resource rmp_res;
> +bool hv_needs_snp_rmp(void);
> +
> static inline bool hv_is_synic_reg(unsigned int reg)
> {
> if ((reg >= HV_REGISTER_SCONTROL) &&
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 2916f4150ac7..db5438663229 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -83,6 +83,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
> /* RMUPDATE detected 4K page and 2MB page overlap. */
> #define RMPUPDATE_FAIL_OVERLAP 7
>
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
> #define RMP_PG_SIZE_2M 1
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 831613959a92..e7f02412f3a1 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -17,6 +17,7 @@
> #include <linux/irq.h>
> #include <linux/kexec.h>
> #include <linux/i8253.h>
> +#include <linux/memblock.h>
> #include <linux/random.h>
> #include <linux/swiotlb.h>
> #include <asm/processor.h>
> @@ -31,6 +32,7 @@
> #include <asm/timer.h>
> #include <asm/reboot.h>
> #include <asm/nmi.h>
> +#include <asm/sev.h>
> #include <clocksource/hyperv_timer.h>
> #include <asm/numa.h>
> #include <asm/coco.h>
> @@ -488,6 +490,44 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
> return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
> }
>
> +struct resource rmp_res = {
> + .name = "RMP",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_SYSTEM_RAM,
> +};
> +
> +bool hv_needs_snp_rmp(void)
> +{
> + return boot_cpu_has(X86_FEATURE_SEV_SNP) &&
> + (ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE);
> +}
> +
> +
> +static void __init ms_hyperv_init_mem_mapping(void)
> +{
> + phys_addr_t addr;
> + u64 calc_rmp_sz;
> +
> + if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> + return;
> + if (!hv_needs_snp_rmp())
> + return;

Another case where it would be cleaner if all the
conditions could be folded into hv_needs_snp_rmp().

> +
> + calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> + calc_rmp_sz = round_up(calc_rmp_sz, SZ_1M);
> + addr = memblock_phys_alloc_range(calc_rmp_sz, SZ_1M, 0, max_pfn << PAGE_SHIFT);
> + if (!addr) {
> + pr_warn("Unable to allocate RMP table\n");
> + return;
> + }
> + rmp_res.start = addr;
> + rmp_res.end = addr + calc_rmp_sz - 1;
> + wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
> + wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
> + insert_resource(&iomem_resource, &rmp_res);
> +}
> +
> const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
> .name = "Microsoft Hyper-V",
> .detect = ms_hyperv_platform,
> @@ -495,4 +535,5 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
> .init.x2apic_available = ms_hyperv_x2apic_available,
> .init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
> .init.init_platform = ms_hyperv_init_platform,
> + .init.init_mem_mapping = ms_hyperv_init_mem_mapping,

On versions of Hyper-V that support nested SNP guests, it appears that every
L1 SNP guest will allocate memory for the RMP, even if it never runs an L2 guest.
The amount of memory allocated is 16 bytes per 4K page, or 0.4% of the total
memory size of the L1 VM. In most cases the memory will be unused. For
example, that works out to be 256 Mbytes in a 64 Gbyte VM, which to me is
a fairly big chunk of memory to waste, even though the percentage isn't huge.
Should we have a CONFIG option that controls whether the RMP is allocated?
L1 guests that intend to run their own L2 guests would need to be built with
this CONFIG option. Presumably the allocation must be done early to ensure
the ability to get a big chunk of contiguous physical memory. Allocating
the RMP later only if it is needed isn't an option.

> };
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 1dd1b36bdfea..7fa39dc17edd 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -87,7 +87,6 @@ struct rmpentry {
> * The first 16KB from the RMP_BASE is used by the processor for the
> * bookkeeping, the range needs to be added during the RMP entry lookup.
> */
> -#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> #define RMPENTRY_SHIFT 8
> #define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
>
> --
> 2.25.1

2023-01-28 19:48:43

by Michael Kelley (LINUX)

Subject: RE: [RFC PATCH v1 2/6] x86/sev: Add support for NestedVirtSnpMsr

From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
>
> The rmpupdate and psmash instructions, which are used in AMD's SEV-SNP
> to update the RMP (Reverse Map) table, can't be trapped. For nested
> scenarios, AMD defined MSR versions of these instructions which can be

s/can be/must be/ ??

> emulated by the top-level hypervisor. One instance where these MSRs are

And by "top-level", I think you are referring to the hypervisor running at L1, right?
Using the L0/L1/L2 terminology would probably help make the description
more precise.

> used are Hyper-V VMs which expose SNP isolation features to the guest.
>
> The MSRs are defined in "AMD64 Architecture Programmer’s Manual, Volume 2:
> System Programming", section 15.36.19.
>
> Signed-off-by: Jeremi Piotrowski <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/msr-index.h | 2 +
> arch/x86/kernel/sev.c | 62 +++++++++++++++++++++++++-----
> 3 files changed, 55 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 480b4eaef310..e6e2e824f67b 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -423,6 +423,7 @@
> #define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> +#define X86_FEATURE_NESTED_VIRT_SNP_MSR (19*32+29) /* Virtualizable RMPUPDATE and PSMASH MSR available */
>
> /*
> * BUG word(s)
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 35100c630617..d6103e607896 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -567,6 +567,8 @@
> #define MSR_AMD64_SEV_SNP_ENABLED
> BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> #define MSR_AMD64_RMP_BASE 0xc0010132
> #define MSR_AMD64_RMP_END 0xc0010133
> +#define MSR_AMD64_VIRT_RMPUPDATE 0xc001f001
> +#define MSR_AMD64_VIRT_PSMASH 0xc001f002
>
> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 7fa39dc17edd..95404c7e5150 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2566,6 +2566,24 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
> }
> EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
>
> +static bool virt_snp_msr(void)
> +{
> + return boot_cpu_has(X86_FEATURE_NESTED_VIRT_SNP_MSR);
> +}
> +
> +static u64 virt_psmash(u64 paddr)
> +{
> + int ret;
> +
> + asm volatile(
> + "wrmsr\n\t"
> + : "=a"(ret)
> + : "a"(paddr), "c"(MSR_AMD64_VIRT_PSMASH)
> + : "memory", "cc"
> + );
> + return ret;
> +}

From checking the AMD spec, I can see that the above use
of wrmsr is non-conventional. Could you capture the basics
of the usage paradigm in a comment? I.e., the expected
inputs and outputs, and the core assumption that the
MSR isn't implemented in hardware, but must trap
to the hypervisor.

> +
> /*
> * psmash is used to smash a 2MB aligned page into 4K
> * pages while preserving the Validated bit in the RMP.
> @@ -2581,11 +2599,15 @@ int psmash(u64 pfn)
> if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> return -ENXIO;
>
> - /* Binutils version 2.36 supports the PSMASH mnemonic. */
> - asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> - : "=a"(ret)
> - : "a"(paddr)
> - : "memory", "cc");
> + if (virt_snp_msr()) {
> + ret = virt_psmash(paddr);
> + } else {
> + /* Binutils version 2.36 supports the PSMASH mnemonic. */
> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> + : "=a"(ret)
> + : "a"(paddr)
> + : "memory", "cc");
> + }
>
> return ret;
> }
> @@ -2601,6 +2623,21 @@ static int invalidate_direct_map(unsigned long pfn, int npages)
> return set_memory_np((unsigned long)pfn_to_kaddr(pfn), npages);
> }
>
> +static u64 virt_rmpupdate(unsigned long paddr, struct rmp_state *val)
> +{
> + int ret;
> + register u64 hi asm("r8") = ((u64 *)val)[1];
> + register u64 lo asm("rdx") = ((u64 *)val)[0];
> +
> + asm volatile(
> + "wrmsr\n\t"
> + : "=a"(ret)
> + : "a"(paddr), "c"(MSR_AMD64_VIRT_RMPUPDATE), "r"(lo), "r"(hi)
> + : "memory", "cc"
> + );
> + return ret;
> +}

Same here about a comment capturing the expected inputs
and outputs.

> +
> static int rmpupdate(u64 pfn, struct rmp_state *val)
> {
> unsigned long paddr = pfn << PAGE_SHIFT;
> @@ -2626,11 +2663,16 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
> }
>
> retry:
> - /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> - asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> - : "=a"(ret)
> - : "a"(paddr), "c"((unsigned long)val)
> - : "memory", "cc");
> +
> + if (virt_snp_msr()) {
> + ret = virt_rmpupdate(paddr, val);
> + } else {
> + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> + : "=a"(ret)
> + : "a"(paddr), "c"((unsigned long)val)
> + : "memory", "cc");
> + }
>
> if (ret) {
> if (!retries) {
> --
> 2.25.1

2023-01-29 04:37:43

by Michael Kelley (LINUX)

Subject: RE: [RFC PATCH v1 3/6] x86/sev: Maintain shadow rmptable on Hyper-V

From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
>
> Hyper-V can expose the SEV-SNP feature to guests, and manages the
> system-wide RMP (Reverse Map) table. The SNP implementation in the
> kernel needs access to the rmptable for tracking pages and deciding
> when/how to issue rmpupdate/psmash. When running as a Hyper-V guest
> with SNP support, an rmptable is allocated by the kernel during boot for
> this purpose. Keep the table in sync with issued rmpupdate/psmash
> instructions.
>
> The logic for how to update the rmptable comes from "AMD64 Architecture
> Programmer’s Manual, Volume 3" which describes the psmash and rmpupdate
> instructions. To ensure correctness of the SNP host code, the most
> important fields are "assigned" and "page size".
>
> Signed-off-by: Jeremi Piotrowski <[email protected]>
> ---
> arch/x86/kernel/sev.c | 59 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 59 insertions(+)
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 95404c7e5150..edec1ccb80b1 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -26,6 +26,7 @@
> #include <linux/iommu.h>
> #include <linux/amd-iommu.h>
>
> +#include <asm/mshyperv.h>

The code in sev.c should be generic and appropriate for use with any
hypervisor. As such, I think we want to avoid sev.c having a dependency
on Hyper-V specific code, such as the <asm/mshyperv.h> include file
and the ms_hyperv.nested_features variable as used below.

Instead, create a boolean static variable in the sev.c module along with
a wrapper function to set it. The logic that tests hv_no_rmp_table()
should test this static variable instead, which would default to "false".
Hypervisor-specific initialization code can call the wrapper function
to set the variable to "true" based on whatever logic is appropriate for
the hypervisor. This approach reverses the dependency and hopefully
is usable by other hypervisors that want to offer nested SNP support.
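
Something along these lines (a sketch, names illustrative):

static bool snp_no_rmp_table;

void snp_set_no_rmp_table(void)
{
	snp_no_rmp_table = true;
}

The Hyper-V init code would call snp_set_no_rmp_table() when
HV_X64_NESTED_NO_RMP_TABLE is set in ms_hyperv.nested_features, and the
hv_no_rmp_table() checks below would test snp_no_rmp_table instead.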

> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> #include <asm/sev.h>
> @@ -2566,6 +2567,11 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
> }
> EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
>
> +static bool hv_no_rmp_table(void)
> +{
> + return ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE;
> +}
> +
> static bool virt_snp_msr(void)
> {
> return boot_cpu_has(X86_FEATURE_NESTED_VIRT_SNP_MSR);
> @@ -2584,6 +2590,26 @@ static u64 virt_psmash(u64 paddr)
> return ret;
> }
>
> +static void snp_update_rmptable_psmash(u64 pfn)
> +{
> + int level;
> + struct rmpentry *entry = __snp_lookup_rmpentry(pfn, &level);
> +
> + if (WARN_ON(IS_ERR_OR_NULL(entry)))
> + return;
> +
> + if (level == PG_LEVEL_2M) {
> + int i;
> +
> + entry->info.pagesize = RMP_PG_SIZE_4K;
> + for (i = 1; i < PTRS_PER_PMD; i++) {
> + struct rmpentry *it = &entry[i];
> + *it = *entry;
> + it->info.gpa = entry->info.gpa + i * PAGE_SIZE;
> + }
> + }
> +}
> +
> /*
> * psmash is used to smash a 2MB aligned page into 4K
> * pages while preserving the Validated bit in the RMP.
> @@ -2601,6 +2627,8 @@ int psmash(u64 pfn)
>
> if (virt_snp_msr()) {
> ret = virt_psmash(paddr);
> + if (!ret && hv_no_rmp_table())
> + snp_update_rmptable_psmash(pfn);
> } else {
> /* Binutils version 2.36 supports the PSMASH mnemonic. */
> asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> @@ -2638,6 +2666,35 @@ static u64 virt_rmpupdate(unsigned long paddr, struct
> rmp_state *val)
> return ret;
> }
>
> +static void snp_update_rmptable_rmpupdate(u64 pfn, int level, struct rmp_state *val)
> +{
> + int prev_level;
> + struct rmpentry *entry = __snp_lookup_rmpentry(pfn, &prev_level);
> +
> + if (WARN_ON(IS_ERR_OR_NULL(entry)))
> + return;
> +
> + if (level > PG_LEVEL_4K) {
> + int i;
> + struct rmpentry tmp_rmp = {
> + .info = {
> + .assigned = val->assigned,
> + },
> + };
> + for (i = 1; i < PTRS_PER_PMD; i++)
> + entry[i] = tmp_rmp;
> + }
> + if (!val->assigned) {
> + memset(entry, 0, sizeof(*entry));
> + } else {
> + entry->info.assigned = val->assigned;
> + entry->info.pagesize = val->pagesize;
> + entry->info.immutable = val->immutable;
> + entry->info.gpa = val->gpa;
> + entry->info.asid = val->asid;
> + }
> +}
> +
> static int rmpupdate(u64 pfn, struct rmp_state *val)
> {
> unsigned long paddr = pfn << PAGE_SHIFT;
> @@ -2666,6 +2723,8 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
>
> if (virt_snp_msr()) {
> ret = virt_rmpupdate(paddr, val);
> + if (!ret && hv_no_rmp_table())
> + snp_update_rmptable_rmpupdate(pfn, level, val);
> } else {
> /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> --
> 2.25.1

2023-01-29 04:44:21

by Michael Kelley (LINUX)

Subject: RE: [RFC PATCH v1 4/6] x86/amd: Configure necessary MSRs for SNP during CPU init when running as a guest

From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
>
> Hyper-V may expose the SEV/SEV-SNP CPU features to the guest, but it is
> up to the guest to use them. early_detect_mem_encrypt() checks
> SYSCFG[MEM_ENCRYPT] and HWCR[SMMLOCK] and if these are not set the
> SEV-SNP features are cleared. Check if we are running under a
> hypervisor and, if so, update SYSCFG and skip the HWCR check.
>
> It would be great to make this check more specific (checking for
> Hyper-V) but this code runs before hypervisor detection on the boot cpu.

Could you elaborate on why we would want this check to be Hyper-V
specific? Per my comments on Patch 3 of this series, I would think the
opposite. If possible, we want code like this to work on any hypervisor,
and not have Hyper-V specific behavior in code outside of the Hyper-V
modules. But I don't know this code well at all, so maybe there's an
aspect I'm missing.

Michael

>
> Signed-off-by: Jeremi Piotrowski <[email protected]>
> ---
> arch/x86/kernel/cpu/amd.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index c7884198ad5b..17d91ac62937 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -565,6 +565,12 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> * don't advertise the feature under CONFIG_X86_32.
> */
> if (cpu_has(c, X86_FEATURE_SME) || cpu_has(c, X86_FEATURE_SEV)) {
> + if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
> + rdmsrl(MSR_AMD64_SYSCFG, msr);
> + msr |= MSR_AMD64_SYSCFG_MEM_ENCRYPT;
> + wrmsrl(MSR_AMD64_SYSCFG, msr);
> + }
> +
> /* Check if memory encryption is enabled */
> rdmsrl(MSR_AMD64_SYSCFG, msr);
> if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
> @@ -584,7 +590,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> setup_clear_cpu_cap(X86_FEATURE_SME);
>
> rdmsrl(MSR_K7_HWCR, msr);
> - if (!(msr & MSR_K7_HWCR_SMMLOCK))
> + if (!(msr & MSR_K7_HWCR_SMMLOCK) && !cpu_has(c, X86_FEATURE_HYPERVISOR))
> goto clear_sev;
>
> return;
> --
> 2.25.1


2023-01-30 15:03:22

by Jeremi Piotrowski

Subject: Re: [RFC PATCH v1 1/6] x86/hyperv: Allocate RMP table during boot

On Sat, Jan 28, 2023 at 07:26:05PM +0000, Michael Kelley (LINUX) wrote:
> From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
> >
> > Hyper-V VMs can be capable of hosting SNP isolated nested VMs on AMD
> > CPUs. One of the pieces of SNP is the RMP (Reverse Map) table which
> > tracks page assignment to firmware, hypervisor or guest. On bare-metal
> > this table is allocated by UEFI, but on Hyper-V it is the respnsibility
>
> s/respnsibility/responsibility/
>

ok

> > of the OS to allocate one if necessary. The nested_feature
> > 'HV_X64_NESTED_NO_RMP_TABLE' will be set to communicate that no rmp is
> > available. The actual RMP table is exclusively controlled by the Hyper-V
> > hypervisor and is not virtualized to the VM. The SNP code in the kernel
> > uses the RMP table for its own tracking and so it is necessary for init
> > code to allocate one.
> >
> > While not strictly necessary, follow the requirements defined by "SEV
> > Secure Nested Paging Firmware ABI Specification" Rev 1.54, section 8.8.2
> > when allocating the RMP:
> >
> > - RMP_BASE and RMP_END must be set identically across all cores.
> > - RMP_BASE must be 1 MB aligned
> > - RMP_END - RMP_BASE + 1 must be a multiple of 1 MB
> > - RMP is large enough to protect itself
> >
> > The allocation is done in the init_mem_mapping() hook, which is the
> > earliest hook I found that has both max_pfn and memblock initialized. At
> > this point we are still under the
> > memblock_set_current_limit(ISA_END_ADDRESS) condition, but explicitly
> > passing the end to memblock_phys_alloc_range() allows us to allocate
> > past that value.
> >
> > Signed-off-by: Jeremi Piotrowski <[email protected]>
> > ---
> > arch/x86/hyperv/hv_init.c | 5 ++++
> > arch/x86/include/asm/hyperv-tlfs.h | 3 +++
> > arch/x86/include/asm/mshyperv.h | 3 +++
> > arch/x86/include/asm/sev.h | 2 ++
> > arch/x86/kernel/cpu/mshyperv.c | 41 ++++++++++++++++++++++++++++++
> > arch/x86/kernel/sev.c | 1 -
> > 6 files changed, 54 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 29774126e931..e7f5ac075e6d 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -117,6 +117,11 @@ static int hv_cpu_init(unsigned int cpu)
> > }
> > }
> >
> > + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && hv_needs_snp_rmp()) {
>
> Could the IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) condition be
> folded into the implementation of hv_needs_snp_rmp() so that only one
> test is needed?
>

Yes, I'll fold it in. I originally kept this out because I was worried about
dead-code elimination. I wanted to make hv_needs_snp_rmp() a static inline but
I couldn't find the right place to put it:

- ms_hyperv is declared in include/asm-generic/mshyperv.h but I wouldn't
put something x86 specific there.
- doesn't go in arch/x86/include/asm/hyperv-tlfs.h with the NESTED_NO_RMP_TABLE
definition.
- arch/x86/include/asm/mshyperv.h would have been good but it would need to be
at the bottom of the file (after the asm-generic/mshyperv.h include) and
nothing else is there.

> > + wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
> > + wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
> > + }
> > +
> > return hyperv_init_ghcb();
> > }
> >
> > diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> > index e3efaf6e6b62..01cc2c3f9f20 100644
> > --- a/arch/x86/include/asm/hyperv-tlfs.h
> > +++ b/arch/x86/include/asm/hyperv-tlfs.h
> > @@ -152,6 +152,9 @@
> > */
> > #define HV_X64_NESTED_ENLIGHTENED_TLB BIT(22)
> >
> > +/* Nested SNP on Hyper-V */
> > +#define HV_X64_NESTED_NO_RMP_TABLE BIT(23)
> > +
>
> Just for my clarification, is this flag always set in an SNP guest when
> running on a version of Hyper-V that supports nested SNP? I'm
> presuming "yes".

No, this is not set for SNP guests. I'll expand the cover letter but when I say
"nested SNP" I mean "running an SNP guest as a nested guest", and not "running
guests nested inside an SNP guest". SNP guests do not support any form of
virtualization, and the L1 in this scenario is not SNP itself.
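
To spell out the layering used throughout this series:

  L0: Hyper-V, which controls the real RMP table
  L1: an ordinary (non-SNP) Hyper-V guest running this kernel, which
      allocates the shadow rmptable and acts as the SNP host
  L2: the SNP guest launched by the L1 kernel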

> But there may be older versions of Hyper-V
> that support SNP guests, but not nested SNP guests, in which case
> this flag would be clear.

These two cases are exclusive, so I'm adding a:

!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)

condition to make that clearer. The NO_RMP_TABLE flag would never be set for a
"Hyper-V SNP guest" and is always set for a "Hyper-V guest that is capable of
running SNP guests".

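With the IS_ENABLED() check folded in as you suggested, the helper would
end up something like (a sketch; where exactly to place it as a static
inline is still the open question above):

bool hv_needs_snp_rmp(void)
{
	return IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) &&
	       boot_cpu_has(X86_FEATURE_SEV_SNP) &&
	       !cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
	       (ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE);
}
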
>
> > /* HYPERV_CPUID_ISOLATION_CONFIG.EAX bits. */
> > #define HV_PARAVISOR_PRESENT BIT(0)
> >
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index 61f0c206bff0..3533b002cede 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -190,6 +190,9 @@ static inline void hv_ghcb_terminate(unsigned int set, unsigned
> > int reason) {}
> >
> > extern bool hv_isolation_type_snp(void);
> >
> > +extern struct resource rmp_res;
> > +bool hv_needs_snp_rmp(void);
> > +
> > static inline bool hv_is_synic_reg(unsigned int reg)
> > {
> > if ((reg >= HV_REGISTER_SCONTROL) &&
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index 2916f4150ac7..db5438663229 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -83,6 +83,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
> > /* RMUPDATE detected 4K page and 2MB page overlap. */
> > #define RMPUPDATE_FAIL_OVERLAP 7
> >
> > +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> > +
> > /* RMP page size */
> > #define RMP_PG_SIZE_4K 0
> > #define RMP_PG_SIZE_2M 1
> > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> > index 831613959a92..e7f02412f3a1 100644
> > --- a/arch/x86/kernel/cpu/mshyperv.c
> > +++ b/arch/x86/kernel/cpu/mshyperv.c
> > @@ -17,6 +17,7 @@
> > #include <linux/irq.h>
> > #include <linux/kexec.h>
> > #include <linux/i8253.h>
> > +#include <linux/memblock.h>
> > #include <linux/random.h>
> > #include <linux/swiotlb.h>
> > #include <asm/processor.h>
> > @@ -31,6 +32,7 @@
> > #include <asm/timer.h>
> > #include <asm/reboot.h>
> > #include <asm/nmi.h>
> > +#include <asm/sev.h>
> > #include <clocksource/hyperv_timer.h>
> > #include <asm/numa.h>
> > #include <asm/coco.h>
> > @@ -488,6 +490,44 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
> > return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
> > }
> >
> > +struct resource rmp_res = {
> > + .name = "RMP",
> > + .start = 0,
> > + .end = 0,
> > + .flags = IORESOURCE_SYSTEM_RAM,
> > +};
> > +
> > +bool hv_needs_snp_rmp(void)
> > +{
> > + return boot_cpu_has(X86_FEATURE_SEV_SNP) &&
> > + (ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE);
> > +}
> > +
> > +
> > +static void __init ms_hyperv_init_mem_mapping(void)
> > +{
> > + phys_addr_t addr;
> > + u64 calc_rmp_sz;
> > +
> > + if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> > + return;
> > + if (!hv_needs_snp_rmp())
> > + return;
>
> Another case where it would be cleaner if all the
> conditions could be folded into hv_needs_snp_rmp().
>

ok

> > +
> > + calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> > + calc_rmp_sz = round_up(calc_rmp_sz, SZ_1M);
> > + addr = memblock_phys_alloc_range(calc_rmp_sz, SZ_1M, 0, max_pfn << PAGE_SHIFT);
> > + if (!addr) {
> > + pr_warn("Unable to allocate RMP table\n");
> > + return;
> > + }
> > + rmp_res.start = addr;
> > + rmp_res.end = addr + calc_rmp_sz - 1;
> > + wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
> > + wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
> > + insert_resource(&iomem_resource, &rmp_res);
> > +}
> > +
> > const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
> > .name = "Microsoft Hyper-V",
> > .detect = ms_hyperv_platform,
> > @@ -495,4 +535,5 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
> > .init.x2apic_available = ms_hyperv_x2apic_available,
> > .init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
> > .init.init_platform = ms_hyperv_init_platform,
> > + .init.init_mem_mapping = ms_hyperv_init_mem_mapping,
>
> On versions of Hyper-V that support nested SNP guests, it appears that every
> L1 SNP guest will allocate memory for the RMP, even if it never runs an L2 guest.
> The amount of memory allocated is 16 bytes per 4K page, or 0.4% of the total
> memory size of the L1 VM. In most cases the memory will be unused. For
> example, that works out to be 256 Mbytes in a 64 Gbyte VM, which to me is
> a fairly big chunk of memory to waste, even though the percentage isn't huge.
> Should we have a CONFIG option that controls whether the RMP is allocated?
> L1 guests that intend to run their own L2 guests would need to be built with
> this CONFIG option. Presumably the allocation must be done early to ensure
> the ability to get a big chunk of contiguous physical memory. Allocating
> the RMP later only if it is needed isn't an option.
>

I clarified above - this table is not intended to be allocated on L1 SNP guests
and I agree that doing so would be wasteful.

Once we exclude L1 SNP guests, I don't think a CONFIG option is needed because
the VMM can easily control whether to expose the SNP CPU flag to the guest or
not. If it's exposed, the guest is going to need an RMP to do anything with
SNP, if it's not exposed the RMP won't be allocated. It's also already behind
CONFIG_AMD_MEM_ENCRYPT.

The allocation needs to be done early to match the behavior the rmp handling
code expects (big and physically contiguous). In an earlier version I had a big
vmalloc allocation in 'fs_initcall(snp_rmptable_init)' which worked fine but I
thought that would not be suitable for upstreaming.

> > };
> > diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> > index 1dd1b36bdfea..7fa39dc17edd 100644
> > --- a/arch/x86/kernel/sev.c
> > +++ b/arch/x86/kernel/sev.c
> > @@ -87,7 +87,6 @@ struct rmpentry {
> > * The first 16KB from the RMP_BASE is used by the processor for the
> > * bookkeeping, the range needs to be added during the RMP entry lookup.
> > */
> > -#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> > #define RMPENTRY_SHIFT 8
> > #define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
> >
> > --
> > 2.25.1
>

2023-01-30 15:26:00

by Jeremi Piotrowski

Subject: Re: [RFC PATCH v1 2/6] x86/sev: Add support for NestedVirtSnpMsr

On Sat, Jan 28, 2023 at 07:48:27PM +0000, Michael Kelley (LINUX) wrote:
> From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
> >
> > The rmpupdate and psmash instructions, which are used in AMD's SEV-SNP
> > to update the RMP (Reverse Map) table, can't be trapped. For nested
> > scenarios, AMD defined MSR versions of these instructions which can be
>
> s/can be/must be/ ??
>

yes indeed

> > emulated by the top-level hypervisor. One instance where these MSRs are
>
> And by "top-level", I think you are referring to the hypervisor running at L1, right?
> Using the L0/L1/L2 terminology would probably help make the description
> more precise.

These instructions are called by the L1 hypervisor and are emulated by the L0
hypervisor which controls the actual rmp table. I'll rephrase the commit
message to make that clearer.

>
> > used are Hyper-V VMs which expose SNP isolation features to the guest.
> >
> > The MSRs are defined in "AMD64 Architecture Programmer’s Manual, Volume 2:
> > System Programming", section 15.36.19.
> >
> > Signed-off-by: Jeremi Piotrowski <[email protected]>
> > ---
> > arch/x86/include/asm/cpufeatures.h | 1 +
> > arch/x86/include/asm/msr-index.h | 2 +
> > arch/x86/kernel/sev.c | 62 +++++++++++++++++++++++++-----
> > 3 files changed, 55 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 480b4eaef310..e6e2e824f67b 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -423,6 +423,7 @@
> > #define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> > #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> > #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> > +#define X86_FEATURE_NESTED_VIRT_SNP_MSR (19*32+29) /* Virtualizable RMPUPDATE and PSMASH MSR available */
> >
> > /*
> > * BUG word(s)
> > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> > index 35100c630617..d6103e607896 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -567,6 +567,8 @@
> > #define MSR_AMD64_SEV_SNP_ENABLED
> > BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> > #define MSR_AMD64_RMP_BASE 0xc0010132
> > #define MSR_AMD64_RMP_END 0xc0010133
> > +#define MSR_AMD64_VIRT_RMPUPDATE 0xc001f001
> > +#define MSR_AMD64_VIRT_PSMASH 0xc001f002
> >
> > #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
> >
> > diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> > index 7fa39dc17edd..95404c7e5150 100644
> > --- a/arch/x86/kernel/sev.c
> > +++ b/arch/x86/kernel/sev.c
> > @@ -2566,6 +2566,24 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
> > }
> > EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> >
> > +static bool virt_snp_msr(void)
> > +{
> > + return boot_cpu_has(X86_FEATURE_NESTED_VIRT_SNP_MSR);
> > +}
> > +
> > +static u64 virt_psmash(u64 paddr)
> > +{
> > + int ret;
> > +
> > + asm volatile(
> > + "wrmsr\n\t"
> > + : "=a"(ret)
> > + : "a"(paddr), "c"(MSR_AMD64_VIRT_PSMASH)
> > + : "memory", "cc"
> > + );
> > + return ret;
> > +}
>
> From checking the AMD spec, I can see that the above use
> of wrmsr is non-conventional. Could you capture the basics
> of the usage paradigm in a comment? I.e., the expected
> inputs and outputs, and the core assumption that the
> MSR isn't implemented in hardware, but must trap
> to the hypervisor.

ok, how does this sound:

/*
* This version of rmpupdate is not implemented in hardware but always
* traps to L0 hypervisor. It doesn't follow usual wrmsr conventions.
* Inputs:
* rax: 4KB aligned GPA
* rdx: bytes 7:0 of new rmp entry
* r8: bytes 15:8 of new rmp entry
* Outputs:
* rax: rmpupdate return code
*/

and similar for psmash.
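
Spelled out, the psmash variant would be:

/*
 * This version of psmash is not implemented in hardware but always
 * traps to L0 hypervisor. It doesn't follow usual wrmsr conventions.
 * Inputs:
 * rax: 2MB aligned GPA
 * Outputs:
 * rax: psmash return code
 */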

>
> > +
> > /*
> > * psmash is used to smash a 2MB aligned page into 4K
> > * pages while preserving the Validated bit in the RMP.
> > @@ -2581,11 +2599,15 @@ int psmash(u64 pfn)
> > if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> > return -ENXIO;
> >
> > - /* Binutils version 2.36 supports the PSMASH mnemonic. */
> > - asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> > - : "=a"(ret)
> > - : "a"(paddr)
> > - : "memory", "cc");
> > + if (virt_snp_msr()) {
> > + ret = virt_psmash(paddr);
> > + } else {
> > + /* Binutils version 2.36 supports the PSMASH mnemonic. */
> > + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> > + : "=a"(ret)
> > + : "a"(paddr)
> > + : "memory", "cc");
> > + }
> >
> > return ret;
> > }
> > @@ -2601,6 +2623,21 @@ static int invalidate_direct_map(unsigned long pfn, int npages)
> > return set_memory_np((unsigned long)pfn_to_kaddr(pfn), npages);
> > }
> >
> > +static u64 virt_rmpupdate(unsigned long paddr, struct rmp_state *val)
> > +{
> > + int ret;
> > + register u64 hi asm("r8") = ((u64 *)val)[1];
> > + register u64 lo asm("rdx") = ((u64 *)val)[0];
> > +
> > + asm volatile(
> > + "wrmsr\n\t"
> > + : "=a"(ret)
> > + : "a"(paddr), "c"(MSR_AMD64_VIRT_RMPUPDATE), "r"(lo), "r"(hi)
> > + : "memory", "cc"
> > + );
> > + return ret;
> > +}
>
> Same here about a comment capturing the expected inputs
> and outputs.

ok

>
> > +
> > static int rmpupdate(u64 pfn, struct rmp_state *val)
> > {
> > unsigned long paddr = pfn << PAGE_SHIFT;
> > @@ -2626,11 +2663,16 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
> > }
> >
> > retry:
> > - /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> > - asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> > - : "=a"(ret)
> > - : "a"(paddr), "c"((unsigned long)val)
> > - : "memory", "cc");
> > +
> > + if (virt_snp_msr()) {
> > + ret = virt_rmpupdate(paddr, val);
> > + } else {
> > + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> > + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> > + : "=a"(ret)
> > + : "a"(paddr), "c"((unsigned long)val)
> > + : "memory", "cc");
> > + }
> >
> > if (ret) {
> > if (!retries) {
> > --
> > 2.25.1
>

2023-01-30 15:39:30

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH v1 2/6] x86/sev: Add support for NestedVirtSnpMsr

From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 30, 2023 7:26 AM
>
> On Sat, Jan 28, 2023 at 07:48:27PM +0000, Michael Kelley (LINUX) wrote:
> > From: Jeremi Piotrowski <[email protected]> Sent: Monday, January
> 23, 2023 8:51 AM
> > >
> > > The rmpupdate and psmash instructions, which are used in AMD's SEV-SNP
> > > to update the RMP (Reverse Map) table, can't be trapped. For nested
> > > scenarios, AMD defined MSR versions of these instructions which can be
> >
> > s/can be/must be/ ??
> >
>
> yes indeed
>
> > > emulated by the top-level hypervisor. One instance where these MSRs are
> >
> > And by "top-level", I think you are referring the hypervisor running at L1, right?
> > Using the L0/L1/L2 terminology would probably help make the description
> > more precise.
>
> These instructions are called by the L1 hypervisor and are emulated by the L0
> hypervisor which controls the actual rmp table. I'll rephrase the commit
> message to make that clearer.
>

[snip]

> > > +
> > > +static u64 virt_psmash(u64 paddr)
> > > +{
> > > + int ret;
> > > +
> > > + asm volatile(
> > > + "wrmsr\n\t"
> > > + : "=a"(ret)
> > > + : "a"(paddr), "c"(MSR_AMD64_VIRT_PSMASH)
> > > + : "memory", "cc"
> > > + );
> > > + return ret;
> > > +}
> >
> > From checking the AMD spec, I can see that the above use
> > of wrmsr is non-conventional. Could you capture the basics
> > of the usage paradigm in a comment? I.e., the expected
> > inputs and outputs, and the core assumption that the
> > MSR isn't implemented in hardware, but must trap
> > to the hypervisor.
>
> ok, how does this sound:
>
> /*
> * This version of rmpupdate is not implemented in hardware but always
> * traps to L0 hypervisor. It doesn't follow usual wrmsr conventions.
> * Inputs:
> * rax: 4KB aligned GPA
> * rdx: bytes 7:0 of new rmp entry
> * r8: bytes 15:8 of new rmp entry
> * Outputs:
> * rax: rmpupdate return code
> */
>
> and similar for psmash.
>

Yes, that works for me.

Michael

2023-01-30 16:51:32

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [RFC PATCH v1 3/6] x86/sev: Maintain shadow rmptable on Hyper-V

On Sun, Jan 29, 2023 at 04:37:24AM +0000, Michael Kelley (LINUX) wrote:
> From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
> >
> > Hyper-V can expose the SEV-SNP feature to guests, and manages the
> > system-wide RMP (Reverse Map) table. The SNP implementation in the
> > kernel needs access to the rmptable for tracking pages and deciding
> > when/how to issue rmpupdate/psmash. When running as a Hyper-V guest
> > with SNP support, an rmptable is allocated by the kernel during boot for
> > this purpose. Keep the table in sync with issued rmpupdate/psmash
> > instructions.
> >
> > The logic for how to update the rmptable comes from "AMD64 Architecture
> > Programmer’s Manual, Volume 3" which describes the psmash and rmpupdate
> > instructions. To ensure correctness of the SNP host code, the most
> > important fields are "assigned" and "page size".
> >
> > Signed-off-by: Jeremi Piotrowski <[email protected]>
> > ---
> > arch/x86/kernel/sev.c | 59 +++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 59 insertions(+)
> >
> > diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> > index 95404c7e5150..edec1ccb80b1 100644
> > --- a/arch/x86/kernel/sev.c
> > +++ b/arch/x86/kernel/sev.c
> > @@ -26,6 +26,7 @@
> > #include <linux/iommu.h>
> > #include <linux/amd-iommu.h>
> >
> > +#include <asm/mshyperv.h>
>
> The code in sev.c should be generic and appropriate for use with any
> hypervisor. As such, I think we want to avoid sev.c having a dependency
> on Hyper-V specific code, such as the <asm/mshyperv.h> include file
> and the ms_hyperv.nested_features variable as used below.
>
> Instead, create a boolean static variable in the sev.c module along with
> a wrapper function to set it. The logic that tests hv_no_rmp_table()
> should test this static variable instead, which would default to "false".
> Hypervisor-specific initialization code can call the wrapper function
> to set the variable to "true" based on whatever logic is appropriate for
> the hypervisor. This approach reverses the dependency and hopefully
> is usable by other hypervisors that want to offer nested SNP support.
>

ok, this makes sense. I've added a call to the wrapper in the init_mem_mapping()
callback introduced earlier in this series, so that the flag is enabled together
with the rmptable allocation.
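
A minimal sketch of that approach (the names snp_soft_rmptable and
snp_set_soft_rmptable are hypothetical here, not taken from the posted
patches):

static bool snp_soft_rmptable __ro_after_init;

/*
 * Called by hypervisor-specific init code when the RMP table is not
 * virtualized and the kernel maintains its own shadow copy.
 */
void __init snp_set_soft_rmptable(void)
{
	snp_soft_rmptable = true;
}

Call sites in sev.c that previously tested hv_no_rmp_table() would then
test snp_soft_rmptable instead, keeping sev.c free of Hyper-V specifics.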


2023-01-30 17:25:07

by Jeremi Piotrowski

[permalink] [raw]
Subject: Re: [RFC PATCH v1 4/6] x86/amd: Configure necessary MSRs for SNP during CPU init when running as a guest

On Sun, Jan 29, 2023 at 04:44:05AM +0000, Michael Kelley (LINUX) wrote:
> From: Jeremi Piotrowski <[email protected]> Sent: Monday, January 23, 2023 8:51 AM
> >
> > Hyper-V may expose the SEV/SEV-SNP CPU features to the guest, but it is
> > up to the guest to use them. early_detect_mem_encrypt() checks
> > SYSCFG[MEM_ENCRYPT] and HWCR[SMMLOCK] and if these are not set the
> > SEV-SNP features are cleared. Check if we are running under a
> > hypervisor and if so - update SYSCFG and skip the HWCR check.
> >
> > It would be great to make this check more specific (checking for
> > Hyper-V) but this code runs before hypervisor detection on the boot cpu.
>
> Could you elaborate on why we would want this check to be Hyper-V
> specific? Per my comments on Patch 3 of this series, I would think the
> opposite. If possible, we want code like this to work on any hypervisor,
> and not have Hyper-V specific behavior in code outside of the Hyper-V
> modules. But I don't know this code well at all, so maybe there's an
> aspect I'm missing.
>
> Michael
>

This patch would work for any hypervisor, but I'm not sure every hypervisor
would choose to do things this way. Take the MSR_AMD64_SYSCFG_MEM_ENCRYPT
setting. It could be handled like on bare metal, with VM BIOS settings, which
wouldn't work well for Hyper-V. The VMM could also simply always return
MSR_AMD64_SYSCFG_MEM_ENCRYPT when it exposes the SEV/-ES/-SNP flags to a non-SNP
guest (KVM always returns 0 in SYSCFG right now, and doesn't allow it to be
set).

But ultimately all this function does is mask off the SEV/-ES/-SNP CPU flags based
on an assumption that no longer holds when running virtualized, so I think this
approach to fixing it is acceptable. The only thing I would check is whether the
coco attribute can be tested here as well, so that this path definitely doesn't
run for SNP guests (provided this information is available at this point).
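
Assuming the coco attribute machinery is already usable this early (which
is exactly the open question above), the guard might look something like
this sketch on top of the hunk below:

	if (cpu_has(c, X86_FEATURE_HYPERVISOR) &&
	    !cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
		rdmsrl(MSR_AMD64_SYSCFG, msr);
		msr |= MSR_AMD64_SYSCFG_MEM_ENCRYPT;
		wrmsrl(MSR_AMD64_SYSCFG, msr);
	}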

> >
> > Signed-off-by: Jeremi Piotrowski <[email protected]>
> > ---
> > arch/x86/kernel/cpu/amd.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> > index c7884198ad5b..17d91ac62937 100644
> > --- a/arch/x86/kernel/cpu/amd.c
> > +++ b/arch/x86/kernel/cpu/amd.c
> > @@ -565,6 +565,12 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> > * don't advertise the feature under CONFIG_X86_32.
> > */
> > if (cpu_has(c, X86_FEATURE_SME) || cpu_has(c, X86_FEATURE_SEV)) {
> > + if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
> > + rdmsrl(MSR_AMD64_SYSCFG, msr);
> > + msr |= MSR_AMD64_SYSCFG_MEM_ENCRYPT;
> > + wrmsrl(MSR_AMD64_SYSCFG, msr);
> > + }
> > +
> > /* Check if memory encryption is enabled */
> > rdmsrl(MSR_AMD64_SYSCFG, msr);
> > if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
> > @@ -584,7 +590,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> > setup_clear_cpu_cap(X86_FEATURE_SME);
> >
> > rdmsrl(MSR_K7_HWCR, msr);
> > - if (!(msr & MSR_K7_HWCR_SMMLOCK))
> > + if (!(msr & MSR_K7_HWCR_SMMLOCK) && !cpu_has(c, X86_FEATURE_HYPERVISOR))
> > goto clear_sev;
> >
> > return;
> > --
> > 2.25.1