2024-04-24 16:24:38

by Tom Lendacky

Subject: [PATCH v4 00/15] Provide SEV-SNP support for running under an SVSM

This series adds SEV-SNP support for running Linux under a Secure VM
Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
By running at a less privileged VMPL, the SVSM can be used to provide
services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
VM (CVM) rather than having to trust the hypervisor to provide such
services.

Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
certain SNP-related operations require that VMPL level: specifically,
the PVALIDATE instruction, and the RMPADJUST instruction when setting
the VMSA attribute of a page (used when starting APs).

If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
use an SVSM (which is running at VMPL0) to perform the operations that
it is no longer able to perform.

How Linux interacts with and uses the SVSM is documented in the SVSM
specification [1] and the GHCB specification [2].

This series introduces support to run Linux under an SVSM. It consists
of:
- Detecting the presence of an SVSM
- When not running at VMPL0, invoking the SVSM for page validation and
VMSA page creation/deletion
- Adding a sysfs entry that specifies the Linux VMPL
- Modifying the sev-guest driver to use the VMPCK key associated with
the Linux VMPL
- Expanding the config-fs TSM support to request attestation reports
from the SVSM and allowing attributes to be hidden
- Detecting and allowing Linux to run in a VMPL other than 0 when an
SVSM is present

The series is based on and tested against the tip tree:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master

4e2b6e891aae ("Merge branch into tip/master: 'x86/shstk'")

[1] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58019.pdf
[2] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/56421.pdf
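
For orientation, the basic SVSM calling sequence used throughout the
series looks roughly like the sketch below. This is illustrative only
and not code from the patches; svsm_get_caa() and svsm_issue_call()
stand in for the calling-area and GHCB/MSR protocol helpers that the
series introduces:

/*
 * Illustrative sketch of an SVSM core protocol call: point the call at
 * the calling area (CA), select the protocol/call ID in RAX and pass
 * call-specific parameters in RCX/RDX/R8.
 */
static int example_svsm_remap_ca(u64 new_caa_pa)
{
	struct svsm_call call = {};

	call.caa = svsm_get_caa();			/* current calling area */
	call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);	/* protocol 0, call 0 */
	call.rcx = new_caa_pa;				/* PA of the new CA */

	/* Issued via the GHCB or MSR protocol, retried while SVSM_ERR_BUSY */
	return svsm_issue_call(&call);
}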

Cc: Joel Becker <[email protected]>
Cc: Christoph Hellwig <[email protected]>

---

Changes in v4:
- Add a pre-patch to rename the struct snp_secrets_page_layout to just
snp_secrets_page.
- Move the config-fs visibility support to be group based and referenced
by an index. Remove the macro changes that set the visibility function
for an entry.
- Make the TSM visibility support vendor specific via an ops callback.
- Use the rmpadjust() function directly and remove the enforce_vmpl0()
function.
- Consolidate common variables into arch/x86/kernel/sev-shared.c.

Changes in v3:
- Rename the decompressor snp_setup() to early_snp_setup() to better indicate
when it is called.
- Rename the "svsm" config-fs attribute into the more generic
"service_provider" attribute that takes a name as input.
- Change config-fs visibility function to be a simple bool return type
instead of returning the mode.
- Switch to using new RIP_REL_REF() macro and __head notation where
appropriate.

Changes in v2:
- Define X86_FEATURE_SVSM_PRESENT and set the bit in the CPUID table,
removing the need to set the CPUID bit in the #VC handler.
- Rename the TSM service_version attribute to service_manifest_version.
- Add support to config-fs to hide attributes and hide the SVSM attributes
when an SVSM is not present.


Tom Lendacky (15):
x86/sev: Shorten snp_secrets_page_layout to snp_secrets_page
x86/sev: Rename snp_init() in the boot/compressed/sev.c file
x86/sev: Make the VMPL0 checking more straight forward
x86/sev: Check for the presence of an SVSM in the SNP Secrets page
x86/sev: Use kernel provided SVSM Calling Areas
x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0
x86/sev: Use the SVSM to create a vCPU when not in VMPL0
x86/sev: Provide SVSM discovery support
x86/sev: Provide guest VMPL level to userspace
virt: sev-guest: Choose the VMPCK key based on executing VMPL
configfs-tsm: Allow the privlevel_floor attribute to be updated
fs/configfs: Add a callback to determine attribute visibility
x86/sev: Take advantage of configfs visibility support in TSM
x86/sev: Extend the config-fs attestation support for an SVSM
x86/sev: Allow non-VMPL0 execution when an SVSM is present

Documentation/ABI/testing/configfs-tsm | 63 +++
.../arch/x86/amd-memory-encryption.rst | 22 +
arch/x86/boot/compressed/sev.c | 258 ++++++-----
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/sev-common.h | 18 +
arch/x86/include/asm/sev.h | 116 ++++-
arch/x86/include/uapi/asm/svm.h | 1 +
arch/x86/kernel/sev-shared.c | 354 ++++++++++++++-
arch/x86/kernel/sev.c | 421 +++++++++++++++---
arch/x86/mm/mem_encrypt_amd.c | 8 +-
drivers/virt/coco/sev-guest/sev-guest.c | 210 ++++++++-
drivers/virt/coco/tdx-guest/tdx-guest.c | 29 +-
drivers/virt/coco/tsm.c | 173 +++++--
fs/configfs/dir.c | 20 +
include/linux/configfs.h | 3 +
include/linux/tsm.h | 62 ++-
17 files changed, 1533 insertions(+), 228 deletions(-)

--
2.43.2



2024-04-24 16:29:33

by Tom Lendacky

Subject: [PATCH v4 08/15] x86/sev: Provide SVSM discovery support

The SVSM specification documents an alternative method of discovery for
the SVSM using a reserved CPUID bit and a reserved MSR.

For the CPUID support, the SNP CPUID table is updated to set bit 28 of
the EAX register of the 0x8000001f leaf when an SVSM is present. This bit
has been reserved for use in this capacity.

For the MSR support, a new reserved MSR 0xc001f000 has been defined. A #VC
exception should be generated when this MSR is accessed. The #VC handler is
expected to ignore writes to the MSR and to return the physical calling area
address (CAA) on reads.
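
For illustration, a guest-side consumer of this discovery mechanism
could look roughly like the following sketch. It is not part of this
patch and only assumes the X86_FEATURE_SVSM_PRESENT and MSR_SVSM_CAA
definitions added below:

/*
 * Illustrative only: check the CPUID-derived feature bit and, when an
 * SVSM is present, read the physical calling area address through the
 * reserved MSR (writes to it are ignored by the #VC handler).
 */
static u64 example_get_svsm_caa_pa(void)
{
	u64 caa_pa;

	if (!cpu_feature_enabled(X86_FEATURE_SVSM_PRESENT))
		return 0;

	rdmsrl(MSR_SVSM_CAA, caa_pa);	/* #VC handler returns the CAA PA */

	return caa_pa;
}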

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/sev-shared.c | 11 +++++++++++
arch/x86/kernel/sev.c | 17 +++++++++++++++++
4 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..a17a81b3189b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -446,6 +446,7 @@
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
#define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
+#define X86_FEATURE_SVSM_PRESENT (19*32+28) /* "" SNP SVSM is present */

/* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
#define X86_FEATURE_NO_NESTED_DATA_BP (20*32+ 0) /* "" No Nested Data Breakpoints */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e022e6eb766c..45ffa27569f4 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -660,6 +660,8 @@
#define MSR_AMD64_RMP_BASE 0xc0010132
#define MSR_AMD64_RMP_END 0xc0010133

+#define MSR_SVSM_CAA 0xc001f000
+
/* AMD Collaborative Processor Performance Control MSRs */
#define MSR_AMD_CPPC_CAP1 0xc00102b0
#define MSR_AMD_CPPC_ENABLE 0xc00102b1
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b415b10a0823..50db783f151e 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -1561,6 +1561,8 @@ static enum es_result vc_check_opcode_bytes(struct es_em_ctxt *ctxt,
static void __head setup_svsm_ca(const struct cc_blob_sev_info *cc_info)
{
struct snp_secrets_page *secrets_page;
+ struct snp_cpuid_table *cpuid_table;
+ unsigned int i;
u64 caa;

BUILD_BUG_ON(sizeof(*secrets_page) != PAGE_SIZE);
@@ -1607,4 +1609,13 @@ static void __head setup_svsm_ca(const struct cc_blob_sev_info *cc_info)
*/
boot_svsm_caa = (struct svsm_ca *)caa;
boot_svsm_caa_pa = caa;
+
+ /* Advertise the SVSM presence via CPUID. */
+ cpuid_table = (struct snp_cpuid_table *)snp_cpuid_get_table();
+ for (i = 0; i < cpuid_table->count; i++) {
+ struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
+
+ if (fn->eax_in == 0x8000001f)
+ fn->eax |= BIT(28);
+ }
}
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 3f4342b31736..69a756781d90 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1326,12 +1326,29 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
return 0;
}

+static enum es_result vc_handle_svsm_caa_msr(struct es_em_ctxt *ctxt)
+{
+ struct pt_regs *regs = ctxt->regs;
+
+ /* Writes to the SVSM CAA msr are ignored */
+ if (ctxt->insn.opcode.bytes[1] == 0x30)
+ return ES_OK;
+
+ regs->ax = lower_32_bits(this_cpu_read(svsm_caa_pa));
+ regs->dx = upper_32_bits(this_cpu_read(svsm_caa_pa));
+
+ return ES_OK;
+}
+
static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
{
struct pt_regs *regs = ctxt->regs;
enum es_result ret;
u64 exit_info_1;

+ if (regs->cx == MSR_SVSM_CAA)
+ return vc_handle_svsm_caa_msr(ctxt);
+
/* Is it a WRMSR? */
exit_info_1 = (ctxt->insn.opcode.bytes[1] == 0x30) ? 1 : 0;

--
2.43.2


2024-04-24 16:31:26

by Tom Lendacky

Subject: [PATCH v4 11/15] configfs-tsm: Allow the privlevel_floor attribute to be updated

With the introduction of an SVSM, Linux will be running at a non-zero
VMPL. Any request for an attestation report at a more privileged VMPL
than the one Linux is currently running at will result in an error.
Allow the privlevel_floor attribute to be updated dynamically so that
it can reflect the VMPL at which Linux is running. Set the
privlevel_floor attribute to the value of the vmpck_id being used.

Signed-off-by: Tom Lendacky <[email protected]>
---
drivers/virt/coco/sev-guest/sev-guest.c | 5 ++++-
include/linux/tsm.h | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index e7dd7df86427..2abb51bd034f 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -885,7 +885,7 @@ static int sev_report_new(struct tsm_report *report, void *data)
return 0;
}

-static const struct tsm_ops sev_tsm_ops = {
+static struct tsm_ops sev_tsm_ops = {
.name = KBUILD_MODNAME,
.report_new = sev_report_new,
};
@@ -972,6 +972,9 @@ static int __init sev_guest_probe(struct platform_device *pdev)
snp_dev->input.resp_gpa = __pa(snp_dev->response);
snp_dev->input.data_gpa = __pa(snp_dev->certs_data);

+ /* Set the privlevel_floor attribute based on the vmpck_id */
+ sev_tsm_ops.privlevel_floor = vmpck_id;
+
ret = tsm_register(&sev_tsm_ops, snp_dev, &tsm_report_extra_type);
if (ret)
goto e_free_cert_data;
diff --git a/include/linux/tsm.h b/include/linux/tsm.h
index de8324a2223c..50c5769657d8 100644
--- a/include/linux/tsm.h
+++ b/include/linux/tsm.h
@@ -54,7 +54,7 @@ struct tsm_report {
*/
struct tsm_ops {
const char *name;
- const unsigned int privlevel_floor;
+ unsigned int privlevel_floor;
int (*report_new)(struct tsm_report *report, void *data);
};

--
2.43.2


2024-04-24 16:33:16

by Tom Lendacky

Subject: [PATCH v4 06/15] x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0

The PVALIDATE instruction can only be performed at VMPL0. An SVSM will
be present when running at VMPL1 or a lower privilege level.

When an SVSM is present, use the SVSM_CORE_PVALIDATE call to perform
memory validation instead of issuing the PVALIDATE instruction directly.

The validation of a single 4K page is now explicitly identified as such
in the function name, pvalidate_4k_page(). The pvalidate_pages() function
is used for validating 1 or more pages at either 4K or 2M in size. Each
function, however, determines whether it can issue the PVALIDATE directly
or whether the SVSM needs to be invoked.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/boot/compressed/sev.c | 45 ++++++++-
arch/x86/include/asm/sev.h | 22 ++++
arch/x86/kernel/sev-shared.c | 179 ++++++++++++++++++++++++++++++++-
arch/x86/kernel/sev.c | 25 +++--
4 files changed, 253 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index cb771b380a6b..32a1e98ffaa9 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -130,6 +130,34 @@ static bool fault_in_kernel_space(unsigned long address)
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"

+static struct svsm_ca *__svsm_get_caa(void)
+{
+ return boot_svsm_caa;
+}
+
+static u64 __svsm_get_caa_pa(void)
+{
+ return boot_svsm_caa_pa;
+}
+
+static int svsm_protocol(struct svsm_call *call)
+{
+ struct ghcb *ghcb;
+ int ret;
+
+ if (boot_ghcb)
+ ghcb = boot_ghcb;
+ else
+ ghcb = NULL;
+
+ do {
+ ret = ghcb ? __svsm_ghcb_protocol(ghcb, call)
+ : __svsm_msr_protocol(call);
+ } while (ret == SVSM_ERR_BUSY);
+
+ return ret;
+}
+
bool sev_snp_enabled(void)
{
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
@@ -146,8 +174,8 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
* If private -> shared then invalidate the page before requesting the
* state change in the RMP table.
*/
- if (op == SNP_PAGE_STATE_SHARED && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+ if (op == SNP_PAGE_STATE_SHARED)
+ pvalidate_4k_page(paddr, paddr, 0);

/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
@@ -162,8 +190,8 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
* Now that page state is changed in the RMP table, validate it so that it is
* consistent with the RMP entry.
*/
- if (op == SNP_PAGE_STATE_PRIVATE && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+ if (op == SNP_PAGE_STATE_PRIVATE)
+ pvalidate_4k_page(paddr, paddr, 1);
}

void snp_set_page_private(unsigned long paddr)
@@ -256,6 +284,15 @@ void sev_es_shutdown_ghcb(void)
if (!sev_es_check_cpu_features())
error("SEV-ES CPU Features missing.");

+ /*
+ * The boot_ghcb value is used to determine whether to use the GHCB MSR
+ * protocol or the GHCB shared page to perform a GHCB request. Since the
+ * GHCB page is being changed to encrypted, it can't be used to perform
+ * GHCB requests. Clear the boot_ghcb variable so that the GHCB MSR
+ * protocol is used to change the GHCB page over to an encrypted page.
+ */
+ boot_ghcb = NULL;
+
/*
* GHCB Page must be flushed from the cache and mapped encrypted again.
* Otherwise the running kernel will see strange cache effects when
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 8f180fd3cbf0..e6f1ed3f6ce3 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -187,6 +187,27 @@ struct svsm_ca {
#define SVSM_ERR_INVALID_PARAMETER 0x80000005
#define SVSM_ERR_INVALID_REQUEST 0x80000006
#define SVSM_ERR_BUSY 0x80000007
+#define SVSM_PVALIDATE_FAIL_SIZEMISMATCH 0x80001006
+
+/*
+ * The SVSM PVALIDATE related structures
+ */
+struct svsm_pvalidate_entry {
+ u64 page_size : 2,
+ action : 1,
+ ignore_cf : 1,
+ rsvd : 8,
+ pfn : 52;
+};
+
+struct svsm_pvalidate_call {
+ u16 entries;
+ u16 next;
+
+ u8 rsvd1[4];
+
+ struct svsm_pvalidate_entry entry[];
+};

/*
* SVSM protocol structure
@@ -207,6 +228,7 @@ struct svsm_call {

#define SVSM_CORE_CALL(x) ((0ULL << 32) | (x))
#define SVSM_CORE_REMAP_CA 0
+#define SVSM_CORE_PVALIDATE 1

#ifdef CONFIG_AMD_MEM_ENCRYPT
extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 6f57eb804e70..b415b10a0823 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -40,6 +40,9 @@ static u8 vmpl __ro_after_init;
static struct svsm_ca *boot_svsm_caa __ro_after_init;
static u64 boot_svsm_caa_pa __ro_after_init;

+static struct svsm_ca *__svsm_get_caa(void);
+static u64 __svsm_get_caa_pa(void);
+
/* I/O parameters for CPUID-related helpers */
struct cpuid_leaf {
u32 fn;
@@ -102,6 +105,8 @@ static u32 cpuid_std_range_max __ro_after_init;
static u32 cpuid_hyp_range_max __ro_after_init;
static u32 cpuid_ext_range_max __ro_after_init;

+static int svsm_protocol(struct svsm_call *call);
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -1186,7 +1191,65 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
}
}

-static void pvalidate_pages(struct snp_psc_desc *desc)
+static int base_pvalidate_4k_page(unsigned long vaddr, bool validate)
+{
+ return pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+}
+
+static int svsm_pvalidate_4k_page(unsigned long paddr, bool validate)
+{
+ struct svsm_pvalidate_call *pvalidate_call;
+ struct svsm_call call = {};
+ u64 pvalidate_call_pa;
+ unsigned long flags;
+ int ret;
+
+ /*
+ * This can be called very early in the boot, use native functions in
+ * order to avoid paravirt issues.
+ */
+ flags = native_save_fl();
+ if (flags & X86_EFLAGS_IF)
+ native_irq_disable();
+
+ call.caa = __svsm_get_caa();
+
+ pvalidate_call = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
+ pvalidate_call_pa = __svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
+
+ pvalidate_call->entries = 1;
+ pvalidate_call->next = 0;
+ pvalidate_call->entry[0].page_size = RMP_PG_SIZE_4K;
+ pvalidate_call->entry[0].action = validate;
+ pvalidate_call->entry[0].ignore_cf = 0;
+ pvalidate_call->entry[0].pfn = paddr >> PAGE_SHIFT;
+
+ /* Protocol 0, Call ID 1 */
+ call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
+ call.rcx = pvalidate_call_pa;
+
+ ret = svsm_protocol(&call);
+
+ if (flags & X86_EFLAGS_IF)
+ native_irq_enable();
+
+ return ret;
+}
+
+static void pvalidate_4k_page(unsigned long vaddr, unsigned long paddr, bool validate)
+{
+ int ret;
+
+ ret = vmpl ? svsm_pvalidate_4k_page(paddr, validate)
+ : base_pvalidate_4k_page(vaddr, validate);
+
+ if (ret) {
+ WARN(1, "Failed to validate address 0x%lx ret %d", vaddr, ret);
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+ }
+}
+
+static void base_pvalidate_pages(struct snp_psc_desc *desc)
{
struct psc_entry *e;
unsigned long vaddr;
@@ -1220,6 +1283,120 @@ static void pvalidate_pages(struct snp_psc_desc *desc)
}
}

+static void svsm_pvalidate_pages(struct snp_psc_desc *desc)
+{
+ struct svsm_pvalidate_call *pvalidate_call;
+ struct svsm_pvalidate_entry *pe;
+ unsigned int call_count, i;
+ struct svsm_call call = {};
+ u64 pvalidate_call_pa;
+ struct psc_entry *e;
+ unsigned long flags;
+ unsigned long vaddr;
+ bool action;
+ int ret;
+
+ /*
+ * This can be called very early in the boot, use native functions in
+ * order to avoid paravirt issues.
+ */
+ flags = native_save_fl();
+ if (flags & X86_EFLAGS_IF)
+ native_irq_disable();
+
+ call.caa = __svsm_get_caa();
+
+ pvalidate_call = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
+ pvalidate_call_pa = __svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
+
+ /* Calculate how many entries the CA buffer can hold */
+ call_count = sizeof(call.caa->svsm_buffer);
+ call_count -= offsetof(struct svsm_pvalidate_call, entry);
+ call_count /= sizeof(pvalidate_call->entry[0]);
+
+ /* Protocol 0, Call ID 1 */
+ call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
+ call.rcx = pvalidate_call_pa;
+
+ pvalidate_call->entries = 0;
+ pvalidate_call->next = 0;
+
+ for (i = 0; i <= desc->hdr.end_entry; i++) {
+ e = &desc->entries[i];
+ pe = &pvalidate_call->entry[pvalidate_call->entries];
+
+ pe->page_size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
+ pe->action = e->operation == SNP_PAGE_STATE_PRIVATE;
+ pe->ignore_cf = 0;
+ pe->pfn = e->gfn;
+
+ pvalidate_call->entries++;
+ if (pvalidate_call->entries < call_count && i != desc->hdr.end_entry)
+ continue;
+
+ ret = svsm_protocol(&call);
+ if (ret == SVSM_PVALIDATE_FAIL_SIZEMISMATCH &&
+ pvalidate_call->entry[pvalidate_call->next].page_size == RMP_PG_SIZE_2M) {
+ u64 pfn, pfn_end;
+
+ /*
+ * The "next" field is the index of the failed entry. Calculate the
+ * index of the entry after the failed entry before the fields are
+ * cleared so that processing can continue on from that point (take
+ * into account the for loop adding 1 to the entry).
+ */
+ i -= pvalidate_call->entries - pvalidate_call->next;
+ i += 1;
+
+ action = pvalidate_call->entry[pvalidate_call->next].action;
+ pfn = pvalidate_call->entry[pvalidate_call->next].pfn;
+ pfn_end = pfn + 511;
+
+ pvalidate_call->entries = 0;
+ pvalidate_call->next = 0;
+ for (; pfn <= pfn_end; pfn++) {
+ pe = &pvalidate_call->entry[pvalidate_call->entries];
+
+ pe->page_size = RMP_PG_SIZE_4K;
+ pe->action = action;
+ pe->ignore_cf = 0;
+ pe->pfn = pfn;
+
+ pvalidate_call->entries++;
+ if (pvalidate_call->entries < call_count && pfn != pfn_end)
+ continue;
+
+ ret = svsm_protocol(&call);
+ if (ret != SVSM_SUCCESS)
+ break;
+
+ pvalidate_call->entries = 0;
+ pvalidate_call->next = 0;
+ }
+ }
+
+ if (ret != SVSM_SUCCESS) {
+ pe = &pvalidate_call->entry[pvalidate_call->next];
+ vaddr = (unsigned long)pfn_to_kaddr(pe->pfn);
+
+ WARN(1, "Failed to validate address %lx ret=%#x (%d)", vaddr, ret, ret);
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+ }
+
+ pvalidate_call->entries = 0;
+ pvalidate_call->next = 0;
+ }
+
+ if (flags & X86_EFLAGS_IF)
+ native_irq_enable();
+}
+
+static void pvalidate_pages(struct snp_psc_desc *desc)
+{
+ vmpl ? svsm_pvalidate_pages(desc)
+ : base_pvalidate_pages(desc);
+}
+
static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc)
{
int cur_entry, end_entry, ret = 0;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 21f3cc40d662..49cf4a6f1f31 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -622,6 +622,12 @@ static struct svsm_ca *__svsm_get_caa(void)
: boot_svsm_caa;
}

+static u64 __svsm_get_caa_pa(void)
+{
+ return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa_pa)
+ : boot_svsm_caa_pa;
+}
+
static noinstr void __sev_put_ghcb(struct ghcb_state *state)
{
struct sev_es_runtime_data *data;
@@ -792,7 +798,6 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
{
unsigned long paddr_end;
u64 val;
- int ret;

vaddr = vaddr & PAGE_MASK;

@@ -800,12 +805,9 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
paddr_end = paddr + (npages << PAGE_SHIFT);

while (paddr < paddr_end) {
- if (op == SNP_PAGE_STATE_SHARED) {
- /* Page validation must be rescinded before changing to shared */
- ret = pvalidate(vaddr, RMP_PG_SIZE_4K, false);
- if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
- goto e_term;
- }
+ /* Page validation must be rescinded before changing to shared */
+ if (op == SNP_PAGE_STATE_SHARED)
+ pvalidate_4k_page(vaddr, paddr, false);

/*
* Use the MSR protocol because this function can be called before
@@ -827,12 +829,9 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
paddr, GHCB_MSR_PSC_RESP_VAL(val)))
goto e_term;

- if (op == SNP_PAGE_STATE_PRIVATE) {
- /* Page validation must be performed after changing to private */
- ret = pvalidate(vaddr, RMP_PG_SIZE_4K, true);
- if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
- goto e_term;
- }
+ /* Page validation must be performed after changing to private */
+ if (op == SNP_PAGE_STATE_PRIVATE)
+ pvalidate_4k_page(vaddr, paddr, true);

vaddr += PAGE_SIZE;
paddr += PAGE_SIZE;
--
2.43.2


2024-04-24 16:35:47

by Tom Lendacky

Subject: [PATCH v4 01/15] x86/sev: Shorten snp_secrets_page_layout to snp_secrets_page

Ending a struct name with "layout" is a little redundant, so shorten the
snp_secrets_page_layout name to just snp_secrets_page.

No functional change.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/sev.h | 2 +-
arch/x86/kernel/sev.c | 4 ++--
drivers/virt/coco/sev-guest/sev-guest.c | 6 +++---
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7f57382afee4..48bc397db649 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -140,7 +140,7 @@ struct secrets_os_area {
#define VMPCK_KEY_LEN 32

/* See the SNP spec version 0.9 for secrets page format */
-struct snp_secrets_page_layout {
+struct snp_secrets_page {
u32 version;
u32 imien : 1,
rsvd1 : 31;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 995f94467101..6949fbccec40 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -648,7 +648,7 @@ static u64 __init get_secrets_page(void)

static u64 __init get_snp_jump_table_addr(void)
{
- struct snp_secrets_page_layout *layout;
+ struct snp_secrets_page *layout;
void __iomem *mem;
u64 pa, addr;

@@ -662,7 +662,7 @@ static u64 __init get_snp_jump_table_addr(void)
return 0;
}

- layout = (__force struct snp_secrets_page_layout *)mem;
+ layout = (__force struct snp_secrets_page *)mem;

addr = layout->os_area.ap_jump_table_pa;
iounmap(mem);
diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index 87f241825bc3..04a7bd1e4314 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -59,7 +59,7 @@ struct snp_guest_dev {
*/
struct snp_guest_msg secret_request, secret_response;

- struct snp_secrets_page_layout *layout;
+ struct snp_secrets_page *layout;
struct snp_req_data input;
union {
struct snp_report_req report;
@@ -743,7 +743,7 @@ static const struct file_operations snp_guest_fops = {
.unlocked_ioctl = snp_guest_ioctl,
};

-static u8 *get_vmpck(int id, struct snp_secrets_page_layout *layout, u32 **seqno)
+static u8 *get_vmpck(int id, struct snp_secrets_page *layout, u32 **seqno)
{
u8 *key = NULL;

@@ -897,8 +897,8 @@ static void unregister_sev_tsm(void *data)

static int __init sev_guest_probe(struct platform_device *pdev)
{
- struct snp_secrets_page_layout *layout;
struct sev_guest_platform_data *data;
+ struct snp_secrets_page *layout;
struct device *dev = &pdev->dev;
struct snp_guest_dev *snp_dev;
struct miscdevice *misc;
--
2.43.2


2024-04-24 16:36:30

by Tom Lendacky

Subject: [PATCH v4 07/15] x86/sev: Use the SVSM to create a vCPU when not in VMPL0

Using the RMPADJUST instruction, the VMSA attribute of a page can only
be changed at VMPL0. An SVSM will be present when running at VMPL1 or a
lower privilege level.

When an SVSM is present, use the SVSM_CORE_CREATE_VCPU call or the
SVSM_CORE_DESTROY_VCPU call to perform VMSA attribute changes. Use the
VMPL level supplied by the SVSM for the VMSA when starting the AP.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/sev.h | 2 ++
arch/x86/kernel/sev.c | 60 +++++++++++++++++++++++++++++++++-----
2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index e6f1ed3f6ce3..a7312b936d16 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -229,6 +229,8 @@ struct svsm_call {
#define SVSM_CORE_CALL(x) ((0ULL << 32) | (x))
#define SVSM_CORE_REMAP_CA 0
#define SVSM_CORE_PVALIDATE 1
+#define SVSM_CORE_CREATE_VCPU 2
+#define SVSM_CORE_DELETE_VCPU 3

#ifdef CONFIG_AMD_MEM_ENCRYPT
extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 49cf4a6f1f31..3f4342b31736 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -995,7 +995,7 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
}

-static int snp_set_vmsa(void *va, bool vmsa)
+static int base_snp_set_vmsa(void *va, bool vmsa)
{
u64 attrs;

@@ -1013,6 +1013,40 @@ static int snp_set_vmsa(void *va, bool vmsa)
return rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
}

+static int svsm_snp_set_vmsa(void *va, void *caa, int apic_id, bool vmsa)
+{
+ struct svsm_call call = {};
+ unsigned long flags;
+ int ret;
+
+ local_irq_save(flags);
+
+ call.caa = this_cpu_read(svsm_caa);
+ call.rcx = __pa(va);
+
+ if (vmsa) {
+ /* Protocol 0, Call ID 2 */
+ call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
+ call.rdx = __pa(caa);
+ call.r8 = apic_id;
+ } else {
+ /* Protocol 0, Call ID 3 */
+ call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
+ }
+
+ ret = svsm_protocol(&call);
+
+ local_irq_restore(flags);
+
+ return ret;
+}
+
+static int snp_set_vmsa(void *va, void *caa, int apic_id, bool vmsa)
+{
+ return vmpl ? svsm_snp_set_vmsa(va, caa, apic_id, vmsa)
+ : base_snp_set_vmsa(va, vmsa);
+}
+
#define __ATTR_BASE (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
#define INIT_CS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
#define INIT_DS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
@@ -1044,11 +1078,11 @@ static void *snp_alloc_vmsa_page(int cpu)
return page_address(p + 1);
}

-static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa)
+static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
{
int err;

- err = snp_set_vmsa(vmsa, false);
+ err = snp_set_vmsa(vmsa, NULL, apic_id, false);
if (err)
pr_err("clear VMSA page failed (%u), leaking page\n", err);
else
@@ -1059,6 +1093,7 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
{
struct sev_es_save_area *cur_vmsa, *vmsa;
struct ghcb_state state;
+ struct svsm_ca *caa;
unsigned long flags;
struct ghcb *ghcb;
u8 sipi_vector;
@@ -1105,6 +1140,12 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
if (!vmsa)
return -ENOMEM;

+ /*
+ * If an SVSM is present, then the SVSM CAA per-CPU variable will
+ * have a value, otherwise it will be NULL.
+ */
+ caa = per_cpu(svsm_caa, cpu);
+
/* CR4 should maintain the MCE value */
cr4 = native_read_cr4() & X86_CR4_MCE;

@@ -1152,11 +1193,11 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
* VMPL level
* SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
*/
- vmsa->vmpl = 0;
+ vmsa->vmpl = vmpl;
vmsa->sev_features = sev_status >> 2;

/* Switch the page over to a VMSA page now that it is initialized */
- ret = snp_set_vmsa(vmsa, true);
+ ret = snp_set_vmsa(vmsa, caa, apic_id, true);
if (ret) {
pr_err("set VMSA page failed (%u)\n", ret);
free_page((unsigned long)vmsa);
@@ -1172,7 +1213,10 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
vc_ghcb_invalidate(ghcb);
ghcb_set_rax(ghcb, vmsa->sev_features);
ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
- ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
+ ghcb_set_sw_exit_info_1(ghcb,
+ ((u64)apic_id << 32) |
+ ((u64)vmpl << 16) |
+ SVM_VMGEXIT_AP_CREATE);
ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));

sev_es_wr_ghcb_msr(__pa(ghcb));
@@ -1190,13 +1234,13 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)

/* Perform cleanup if there was an error */
if (ret) {
- snp_cleanup_vmsa(vmsa);
+ snp_cleanup_vmsa(vmsa, apic_id);
vmsa = NULL;
}

/* Free up any previous VMSA page */
if (cur_vmsa)
- snp_cleanup_vmsa(cur_vmsa);
+ snp_cleanup_vmsa(cur_vmsa, apic_id);

/* Record the current VMSA page */
per_cpu(sev_vmsa, cpu) = vmsa;
--
2.43.2


2024-04-25 13:51:43

by Borislav Petkov

Subject: Re: [PATCH v4 01/15] x86/sev: Shorten snp_secrets_page_layout to snp_secrets_page

On Wed, Apr 24, 2024 at 10:57:57AM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 995f94467101..6949fbccec40 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -648,7 +648,7 @@ static u64 __init get_secrets_page(void)
>
> static u64 __init get_snp_jump_table_addr(void)
> {
> - struct snp_secrets_page_layout *layout;
> + struct snp_secrets_page *layout;

Yes, and I'd go change that "layout" name to "secrets" too because
layout doesn't make any sense when talking about a secrets page.

This, OTOH:

addr = secrets->os_area.ap_jump_table_pa;

means something: the address comes from the secrets page. Not from the
"layout". :-)

IOW, diff ontop:

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 25056346bc18..790e4818f7c6 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -648,7 +648,7 @@ static u64 __init get_secrets_page(void)

static u64 __init get_snp_jump_table_addr(void)
{
- struct snp_secrets_page *layout;
+ struct snp_secrets_page *secrets;
void __iomem *mem;
u64 pa, addr;

@@ -662,9 +662,9 @@ static u64 __init get_snp_jump_table_addr(void)
return 0;
}

- layout = (__force struct snp_secrets_page *)mem;
+ secrets = (__force struct snp_secrets_page *)mem;

- addr = layout->os_area.ap_jump_table_pa;
+ addr = secrets->os_area.ap_jump_table_pa;
iounmap(mem);

return addr;
diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index 04a7bd1e4314..654290a8e1ba 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -59,7 +59,7 @@ struct snp_guest_dev {
*/
struct snp_guest_msg secret_request, secret_response;

- struct snp_secrets_page *layout;
+ struct snp_secrets_page *secrets;
struct snp_req_data input;
union {
struct snp_report_req report;
@@ -743,26 +743,26 @@ static const struct file_operations snp_guest_fops = {
.unlocked_ioctl = snp_guest_ioctl,
};

-static u8 *get_vmpck(int id, struct snp_secrets_page *layout, u32 **seqno)
+static u8 *get_vmpck(int id, struct snp_secrets_page *secrets, u32 **seqno)
{
u8 *key = NULL;

switch (id) {
case 0:
- *seqno = &layout->os_area.msg_seqno_0;
- key = layout->vmpck0;
+ *seqno = &secrets->os_area.msg_seqno_0;
+ key = secrets->vmpck0;
break;
case 1:
- *seqno = &layout->os_area.msg_seqno_1;
- key = layout->vmpck1;
+ *seqno = &secrets->os_area.msg_seqno_1;
+ key = secrets->vmpck1;
break;
case 2:
- *seqno = &layout->os_area.msg_seqno_2;
- key = layout->vmpck2;
+ *seqno = &secrets->os_area.msg_seqno_2;
+ key = secrets->vmpck2;
break;
case 3:
- *seqno = &layout->os_area.msg_seqno_3;
- key = layout->vmpck3;
+ *seqno = &secrets->os_area.msg_seqno_3;
+ key = secrets->vmpck3;
break;
default:
break;
@@ -898,7 +898,7 @@ static void unregister_sev_tsm(void *data)
static int __init sev_guest_probe(struct platform_device *pdev)
{
struct sev_guest_platform_data *data;
- struct snp_secrets_page *layout;
+ struct snp_secrets_page *secrets;
struct device *dev = &pdev->dev;
struct snp_guest_dev *snp_dev;
struct miscdevice *misc;
@@ -916,7 +916,7 @@ static int __init sev_guest_probe(struct platform_device *pdev)
if (!mapping)
return -ENODEV;

- layout = (__force void *)mapping;
+ secrets = (__force void *)mapping;

ret = -ENOMEM;
snp_dev = devm_kzalloc(&pdev->dev, sizeof(struct snp_guest_dev), GFP_KERNEL);
@@ -924,7 +924,7 @@ static int __init sev_guest_probe(struct platform_device *pdev)
goto e_unmap;

ret = -EINVAL;
- snp_dev->vmpck = get_vmpck(vmpck_id, layout, &snp_dev->os_area_msg_seqno);
+ snp_dev->vmpck = get_vmpck(vmpck_id, secrets, &snp_dev->os_area_msg_seqno);
if (!snp_dev->vmpck) {
dev_err(dev, "invalid vmpck id %d\n", vmpck_id);
goto e_unmap;
@@ -938,7 +938,7 @@ static int __init sev_guest_probe(struct platform_device *pdev)

platform_set_drvdata(pdev, snp_dev);
snp_dev->dev = dev;
- snp_dev->layout = layout;
+ snp_dev->secrets = secrets;

/* Allocate the shared page used for the request and response message. */
snp_dev->request = alloc_shared_pages(dev, sizeof(struct snp_guest_msg));

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-26 20:52:07

by Dan Williams

Subject: Re: [PATCH v4 11/15] configfs-tsm: Allow the privlevel_floor attribute to be updated

Tom Lendacky wrote:
> With the introduction of an SVSM, Linux will be running at a non-zero
> VMPL. Any request for an attestation report at a higher privilege VMPL
> than what Linux is currently running will result in an error. Allow for
> the privlevel_floor attribute to be updated dynamically so that the
> attribute may be set dynamically. Set the privlevel_floor attribute to
> be the value of the vmpck_id being used.
>
> Signed-off-by: Tom Lendacky <[email protected]>

This addresses my comment on the last version.

Acked-by: Dan Williams <[email protected]>

2024-05-03 11:39:22

by Joerg Roedel

Subject: Re: [svsm-devel] [PATCH v4 00/15] Provide SEV-SNP support for running under an SVSM

On Wed, Apr 24, 2024 at 10:57:56AM -0500, Tom Lendacky wrote:
> Tom Lendacky (15):
> x86/sev: Shorten snp_secrets_page_layout to snp_secrets_page
> x86/sev: Rename snp_init() in the boot/compressed/sev.c file
> x86/sev: Make the VMPL0 checking more straight forward
> x86/sev: Check for the presence of an SVSM in the SNP Secrets page
> x86/sev: Use kernel provided SVSM Calling Areas
> x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0
> x86/sev: Use the SVSM to create a vCPU when not in VMPL0
> x86/sev: Provide SVSM discovery support
> x86/sev: Provide guest VMPL level to userspace
> virt: sev-guest: Choose the VMPCK key based on executing VMPL
> configfs-tsm: Allow the privlevel_floor attribute to be updated
> fs/configfs: Add a callback to determine attribute visibility
> x86/sev: Take advantage of configfs visibility support in TSM
> x86/sev: Extend the config-fs attestation support for an SVSM
> x86/sev: Allow non-VMPL0 execution when an SVSM is present

I tested these on latest COCONUT-SVSM upstream and found no issues.

Tested-by: Joerg Roedel <[email protected]>

2024-05-22 18:24:47

by Borislav Petkov

Subject: Re: [PATCH v4 06/15] x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0

On Wed, Apr 24, 2024 at 10:58:02AM -0500, Tom Lendacky wrote:
> The PVALIDATE instruction can only be performed at VMPL0. An SVSM will
> be present when running at VMPL1 or a lower privilege level.

s/when running/when the guest itself is running/

> When an SVSM is present, use the SVSM_CORE_PVALIDATE call to perform
> memory validation instead of issuing the PVALIDATE instruction directly.
>
> The validation of a single 4K page is now explicitly identified as such
> in the function name, pvalidate_4k_page(). The pvalidate_pages() function
> is used for validating 1 or more pages at either 4K or 2M in size. Each
> function, however, determines whether it can issue the PVALIDATE directly
> or whether the SVSM needs to be invoked.

This all talks more about what this is doing and I'm missing the "why".

It sounds like when you're running an SVSM under the guest, you cannot
use PVALIDATE but the SVSM should do it for you. And you should start
with that.

The rest/details should be visible from the diff.

> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/boot/compressed/sev.c | 45 ++++++++-
> arch/x86/include/asm/sev.h | 22 ++++
> arch/x86/kernel/sev-shared.c | 179 ++++++++++++++++++++++++++++++++-
> arch/x86/kernel/sev.c | 25 +++--
> 4 files changed, 253 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index cb771b380a6b..32a1e98ffaa9 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -130,6 +130,34 @@ static bool fault_in_kernel_space(unsigned long address)
> /* Include code for early handlers */
> #include "../../kernel/sev-shared.c"
>
> +static struct svsm_ca *__svsm_get_caa(void)
> +{
> + return boot_svsm_caa;
> +}
> +
> +static u64 __svsm_get_caa_pa(void)
> +{
> + return boot_svsm_caa_pa;
> +}
> +
> +static int svsm_protocol(struct svsm_call *call)

Function name needs a verb.

> +{
> + struct ghcb *ghcb;
> + int ret;
> +
> + if (boot_ghcb)
> + ghcb = boot_ghcb;
> + else
> + ghcb = NULL;
> +
> + do {
> + ret = ghcb ? __svsm_ghcb_protocol(ghcb, call)
> + : __svsm_msr_protocol(call);
> + } while (ret == SVSM_ERR_BUSY);
> +
> + return ret;
> +}
> +
> bool sev_snp_enabled(void)
> {
> return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> @@ -146,8 +174,8 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
> * If private -> shared then invalidate the page before requesting the
> * state change in the RMP table.
> */
> - if (op == SNP_PAGE_STATE_SHARED && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
> - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
> + if (op == SNP_PAGE_STATE_SHARED)
> + pvalidate_4k_page(paddr, paddr, 0);

That is a bool so put a "false" in there.

> /* Issue VMGEXIT to change the page state in RMP table. */
> sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> @@ -162,8 +190,8 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
> * Now that page state is changed in the RMP table, validate it so that it is
> * consistent with the RMP entry.
> */
> - if (op == SNP_PAGE_STATE_PRIVATE && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
> - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
> + if (op == SNP_PAGE_STATE_PRIVATE)
> + pvalidate_4k_page(paddr, paddr, 1);

Ditto, but "true".

> void snp_set_page_private(unsigned long paddr)
> @@ -256,6 +284,15 @@ void sev_es_shutdown_ghcb(void)
> if (!sev_es_check_cpu_features())
> error("SEV-ES CPU Features missing.");
>
> + /*
> + * The boot_ghcb value is used to determine whether to use the GHCB MSR

s/The boot_ghcb value /This is used/

> + * protocol or the GHCB shared page to perform a GHCB request. Since the
> + * GHCB page is being changed to encrypted, it can't be used to perform
> + * GHCB requests. Clear the boot_ghcb variable so that the GHCB MSR
> + * protocol is used to change the GHCB page over to an encrypted page.
> + */
> + boot_ghcb = NULL;
> +
> /*
> * GHCB Page must be flushed from the cache and mapped encrypted again.
> * Otherwise the running kernel will see strange cache effects when
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 8f180fd3cbf0..e6f1ed3f6ce3 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -187,6 +187,27 @@ struct svsm_ca {
> #define SVSM_ERR_INVALID_PARAMETER 0x80000005
> #define SVSM_ERR_INVALID_REQUEST 0x80000006
> #define SVSM_ERR_BUSY 0x80000007
> +#define SVSM_PVALIDATE_FAIL_SIZEMISMATCH 0x80001006
> +
> +/*
> + * The SVSM PVALIDATE related structures
> + */
> +struct svsm_pvalidate_entry {
> + u64 page_size : 2,
> + action : 1,
> + ignore_cf : 1,
> + rsvd : 8,
> + pfn : 52;
> +};
> +
> +struct svsm_pvalidate_call {
> + u16 entries;
> + u16 next;
> +
> + u8 rsvd1[4];
> +
> + struct svsm_pvalidate_entry entry[];
> +};
>
> /*
> * SVSM protocol structure
> @@ -207,6 +228,7 @@ struct svsm_call {
>
> #define SVSM_CORE_CALL(x) ((0ULL << 32) | (x))
> #define SVSM_CORE_REMAP_CA 0
> +#define SVSM_CORE_PVALIDATE 1
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> extern void __sev_es_ist_enter(struct pt_regs *regs);
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 6f57eb804e70..b415b10a0823 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -40,6 +40,9 @@ static u8 vmpl __ro_after_init;
> static struct svsm_ca *boot_svsm_caa __ro_after_init;
> static u64 boot_svsm_caa_pa __ro_after_init;
>
> +static struct svsm_ca *__svsm_get_caa(void);
> +static u64 __svsm_get_caa_pa(void);

Are we being lazy again? :)

So I know the below is bigger than two silly forward declarations but
forward declarations are a sign that our source code placement is
lacking and if we keep piling on that, it'll become a mess soon. And
guess who gets to mop up after y'all who don't have time to do it
because you have to enable features? :-\

So in order to avoid that, we'll re-position it to where we think it'll
be better going forward.

Btw, this is the second time, at least, where I think that that
sev-shared.c thing is starting to become more of a nuisance than a code
savings thing but I don't have a better idea for it yet.

So let's extend that ifdeffery at the top which provides things which
are called the same but defined differently depending on whether we're
in the decompressor or kernel proper.

IOW, something like this, ontop:

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 32a1e98ffaa9..9d89fc67574b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -130,16 +130,6 @@ static bool fault_in_kernel_space(unsigned long address)
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"

-static struct svsm_ca *__svsm_get_caa(void)
-{
- return boot_svsm_caa;
-}
-
-static u64 __svsm_get_caa_pa(void)
-{
- return boot_svsm_caa_pa;
-}
-
static int svsm_protocol(struct svsm_call *call)
{
struct ghcb *ghcb;
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b415b10a0823..b4f1fd780925 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -11,20 +11,6 @@

#include <asm/setup_data.h>

-#ifndef __BOOT_COMPRESSED
-#define error(v) pr_err(v)
-#define has_cpuflag(f) boot_cpu_has(f)
-#define sev_printk(fmt, ...) printk(fmt, ##__VA_ARGS__)
-#define sev_printk_rtl(fmt, ...) printk_ratelimited(fmt, ##__VA_ARGS__)
-#else
-#undef WARN
-#define WARN(condition, format...) (!!(condition))
-#define sev_printk(fmt, ...)
-#define sev_printk_rtl(fmt, ...)
-#undef vc_forward_exception
-#define vc_forward_exception(c) panic("SNP: Hypervisor requested exception\n")
-#endif
-
/*
* SVSM related information:
* When running under an SVSM, the VMPL that Linux is executing at must be
@@ -40,8 +26,47 @@ static u8 vmpl __ro_after_init;
static struct svsm_ca *boot_svsm_caa __ro_after_init;
static u64 boot_svsm_caa_pa __ro_after_init;

-static struct svsm_ca *__svsm_get_caa(void);
-static u64 __svsm_get_caa_pa(void);
+#ifdef __BOOT_COMPRESSED
+
+#undef WARN
+#define WARN(condition, format...) (!!(condition))
+#define sev_printk(fmt, ...)
+#define sev_printk_rtl(fmt, ...)
+#undef vc_forward_exception
+#define vc_forward_exception(c) panic("SNP: Hypervisor requested exception\n")
+
+static struct svsm_ca *__svsm_get_caa(void)
+{
+ return boot_svsm_caa;
+}
+
+static u64 __svsm_get_caa_pa(void)
+{
+ return boot_svsm_caa_pa;
+}
+
+#else /* __BOOT_COMPRESSED */
+
+#define error(v) pr_err(v)
+#define has_cpuflag(f) boot_cpu_has(f)
+#define sev_printk(fmt, ...) printk(fmt, ##__VA_ARGS__)
+#define sev_printk_rtl(fmt, ...) printk_ratelimited(fmt, ##__VA_ARGS__)
+
+static DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
+
+static struct svsm_ca *__svsm_get_caa(void)
+{
+ return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa)
+ : boot_svsm_caa;
+}
+
+static u64 __svsm_get_caa_pa(void)
+{
+ return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa_pa)
+ : boot_svsm_caa_pa;
+}
+
+#endif /* __BOOT_COMPRESSED */

/* I/O parameters for CPUID-related helpers */
struct cpuid_leaf {
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index bb6455ff45a2..db895a7a9401 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -138,7 +138,6 @@ static struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);

static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
static DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
-static DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
static DEFINE_PER_CPU(u64, svsm_caa_pa);

struct sev_config {
@@ -616,18 +615,6 @@ static __always_inline void vc_forward_exception(struct es_em_ctxt *ctxt)
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"

-static struct svsm_ca *__svsm_get_caa(void)
-{
- return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa)
- : boot_svsm_caa;
-}
-
-static u64 __svsm_get_caa_pa(void)
-{
- return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa_pa)
- : boot_svsm_caa_pa;
-}
-
static noinstr void __sev_put_ghcb(struct ghcb_state *state)
{
struct sev_es_runtime_data *data;

> +
> /* I/O parameters for CPUID-related helpers */
> struct cpuid_leaf {
> u32 fn;
> @@ -102,6 +105,8 @@ static u32 cpuid_std_range_max __ro_after_init;
> static u32 cpuid_hyp_range_max __ro_after_init;
> static u32 cpuid_ext_range_max __ro_after_init;
>
> +static int svsm_protocol(struct svsm_call *call);

You get the idea...

> static bool __init sev_es_check_cpu_features(void)
> {
> if (!has_cpuflag(X86_FEATURE_RDRAND)) {
> @@ -1186,7 +1191,65 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> }
> }
>
> -static void pvalidate_pages(struct snp_psc_desc *desc)
> +static int base_pvalidate_4k_page(unsigned long vaddr, bool validate)
> +{
> + return pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
> +}

There are those silly wrappers again. Kill it pls.

> +
> +static int svsm_pvalidate_4k_page(unsigned long paddr, bool validate)

Will there ever be a pvalidate_2M_page?

If so, then you need to redesign this to have a lower-level helper

__svsm_pvalidate_page(... ,size, );

and the 4K and 2M things call it.
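
For example, a lower-level helper along these lines (a rough sketch
based on the svsm_pvalidate_4k_page() in this patch, with the IRQ
handling elided; not code from the series):

static int __svsm_pvalidate_page(unsigned long paddr, unsigned int page_size,
				 bool validate)
{
	struct svsm_pvalidate_call *pc;
	struct svsm_call call = {};
	u64 pc_pa;

	call.caa = __svsm_get_caa();

	pc = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
	pc_pa = __svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);

	pc->entries = 1;
	pc->next = 0;
	pc->entry[0].page_size = page_size;	/* RMP_PG_SIZE_4K or RMP_PG_SIZE_2M */
	pc->entry[0].action = validate;
	pc->entry[0].ignore_cf = 0;
	pc->entry[0].pfn = paddr >> PAGE_SHIFT;

	/* Protocol 0, Call ID 1 */
	call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
	call.rcx = pc_pa;

	return svsm_protocol(&call);
}

static int svsm_pvalidate_4k_page(unsigned long paddr, bool validate)
{
	return __svsm_pvalidate_page(paddr, RMP_PG_SIZE_4K, validate);
}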

> +{
> + struct svsm_pvalidate_call *pvalidate_call;

Too long:

struct svsm_pvalidate_call *pvl_call;

> + struct svsm_call call = {};

I guess this needs to be

struct svsm_call svsm_call = {};

so that you know what kind of call it is - you have two.

> + u64 pvalidate_call_pa;
> + unsigned long flags;
> + int ret;
> +
> + /*
> + * This can be called very early in the boot, use native functions in
> + * order to avoid paravirt issues.
> + */
> + flags = native_save_fl();
> + if (flags & X86_EFLAGS_IF)
> + native_irq_disable();

Yeah, this'll change.

> + call.caa = __svsm_get_caa();
> +
> + pvalidate_call = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;

That's almost a page worth of data, we don't zero it. How sensitive
would this be if the SVSM sees some old data?

Or we trust the SVSM and all is good?

> + pvalidate_call_pa = __svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
> +
> + pvalidate_call->entries = 1;
> + pvalidate_call->next = 0;
> + pvalidate_call->entry[0].page_size = RMP_PG_SIZE_4K;
> + pvalidate_call->entry[0].action = validate;
> + pvalidate_call->entry[0].ignore_cf = 0;
> + pvalidate_call->entry[0].pfn = paddr >> PAGE_SHIFT;
> +
> + /* Protocol 0, Call ID 1 */
> + call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
> + call.rcx = pvalidate_call_pa;
> +
> + ret = svsm_protocol(&call);
> +
> + if (flags & X86_EFLAGS_IF)
> + native_irq_enable();
> +
> + return ret;
> +}
> +
> +static void pvalidate_4k_page(unsigned long vaddr, unsigned long paddr, bool validate)
> +{
> + int ret;
> +
> + ret = vmpl ? svsm_pvalidate_4k_page(paddr, validate)
> + : base_pvalidate_4k_page(vaddr, validate);

if (vmpl)
ret = svsm_pvalidate_4k_page(paddr, validate);
else
ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);

No need for silly wrappers.

> +
> + if (ret) {
> + WARN(1, "Failed to validate address 0x%lx ret %d", vaddr, ret);
> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
> + }
> +}
> +
> +static void base_pvalidate_pages(struct snp_psc_desc *desc)
> {
> struct psc_entry *e;
> unsigned long vaddr;
> @@ -1220,6 +1283,120 @@ static void pvalidate_pages(struct snp_psc_desc *desc)
> }
> }
>
> +static void svsm_pvalidate_pages(struct snp_psc_desc *desc)
> +{
> + struct svsm_pvalidate_call *pvalidate_call;

shorten to "pvl_call" or so.

> + struct svsm_pvalidate_entry *pe;

See, like this. :-P

> + unsigned int call_count, i;
> + struct svsm_call call = {};
> + u64 pvalidate_call_pa;
> + struct psc_entry *e;
> + unsigned long flags;
> + unsigned long vaddr;
> + bool action;
> + int ret;
> +
> + /*
> + * This can be called very early in the boot, use native functions in
> + * order to avoid paravirt issues.
> + */
> + flags = native_save_fl();
> + if (flags & X86_EFLAGS_IF)
> + native_irq_disable();
> +
> + call.caa = __svsm_get_caa();
> +
> + pvalidate_call = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
> + pvalidate_call_pa = __svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);

As above.

> + /* Calculate how many entries the CA buffer can hold */
> + call_count = sizeof(call.caa->svsm_buffer);
> + call_count -= offsetof(struct svsm_pvalidate_call, entry);
> + call_count /= sizeof(pvalidate_call->entry[0]);
> +
> + /* Protocol 0, Call ID 1 */
> + call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
> + call.rcx = pvalidate_call_pa;
> +
> + pvalidate_call->entries = 0;
> + pvalidate_call->next = 0;

Or you simply memset the whole thing and be safe.

> + for (i = 0; i <= desc->hdr.end_entry; i++) {
> + e = &desc->entries[i];
> + pe = &pvalidate_call->entry[pvalidate_call->entries];
> +
> + pe->page_size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
> + pe->action = e->operation == SNP_PAGE_STATE_PRIVATE;
> + pe->ignore_cf = 0;
> + pe->pfn = e->gfn;
> +
> + pvalidate_call->entries++;
> + if (pvalidate_call->entries < call_count && i != desc->hdr.end_entry)
> + continue;
> +
> + ret = svsm_protocol(&call);
> + if (ret == SVSM_PVALIDATE_FAIL_SIZEMISMATCH &&
> + pvalidate_call->entry[pvalidate_call->next].page_size == RMP_PG_SIZE_2M) {
> + u64 pfn, pfn_end;
> +
> + /*
> + * The "next" field is the index of the failed entry. Calculate the
> + * index of the entry after the failed entry before the fields are
> + * cleared so that processing can continue on from that point (take
> + * into account the for loop adding 1 to the entry).
> + */
> + i -= pvalidate_call->entries - pvalidate_call->next;
> + i += 1;
> +
> + action = pvalidate_call->entry[pvalidate_call->next].action;
> + pfn = pvalidate_call->entry[pvalidate_call->next].pfn;
> + pfn_end = pfn + 511;
> +
> + pvalidate_call->entries = 0;
> + pvalidate_call->next = 0;

You did that above before the loop. Looks weird doing it again.

> + for (; pfn <= pfn_end; pfn++) {
> + pe = &pvalidate_call->entry[pvalidate_call->entries];
> +
> + pe->page_size = RMP_PG_SIZE_4K;
> + pe->action = action;
> + pe->ignore_cf = 0;
> + pe->pfn = pfn;
> +
> + pvalidate_call->entries++;
> + if (pvalidate_call->entries < call_count && pfn != pfn_end)
> + continue;
> +
> + ret = svsm_protocol(&call);
> + if (ret != SVSM_SUCCESS)
> + break;
> +
> + pvalidate_call->entries = 0;
> + pvalidate_call->next = 0;
> + }
> + }

I have no clue what's going on in this function. Sounds like it needs
splitting. And commenting too. Like the loop body should be something
like svsm_pvalidate_entry() or so.

And then that second loop wants to be a separate function too as it is
calling the SVSM protocol again.

> +
> + if (ret != SVSM_SUCCESS) {
> + pe = &pvalidate_call->entry[pvalidate_call->next];
> + vaddr = (unsigned long)pfn_to_kaddr(pe->pfn);
> +
> + WARN(1, "Failed to validate address %lx ret=%#x (%d)", vaddr, ret, ret);
> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
> + }
> +
> + pvalidate_call->entries = 0;
> + pvalidate_call->next = 0;

And here it is again. If anything, splitting and comments are needed
here at least.

..

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-22 21:15:22

by Tom Lendacky

Subject: Re: [PATCH v4 06/15] x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0

On 5/22/24 13:24, Borislav Petkov wrote:
> On Wed, Apr 24, 2024 at 10:58:02AM -0500, Tom Lendacky wrote:
>> The PVALIDATE instruction can only be performed at VMPL0. An SVSM will
>> be present when running at VMPL1 or a lower privilege level.
>
> s/when running/when the guest itself is running/
>
>> When an SVSM is present, use the SVSM_CORE_PVALIDATE call to perform
>> memory validation instead of issuing the PVALIDATE instruction directly.
>>
>> The validation of a single 4K page is now explicitly identified as such
>> in the function name, pvalidate_4k_page(). The pvalidate_pages() function
>> is used for validating 1 or more pages at either 4K or 2M in size. Each
>> function, however, determines whether it can issue the PVALIDATE directly
>> or whether the SVSM needs to be invoked.
>
> This all talks more about what this is doing and I'm missing the "why".

It's sort of there in the first two paragraphs. I'll re-word it.

>
> It sounds like when you're running a SVSM under the guest, you cannot
> use PVALIDATE but the SVSM should do it for you. And you should start
> with that.
>
> The rest/details should be visible from the diff.
>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/boot/compressed/sev.c | 45 ++++++++-
>> arch/x86/include/asm/sev.h | 22 ++++
>> arch/x86/kernel/sev-shared.c | 179 ++++++++++++++++++++++++++++++++-
>> arch/x86/kernel/sev.c | 25 +++--
>> 4 files changed, 253 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
>> index cb771b380a6b..32a1e98ffaa9 100644
>> --- a/arch/x86/boot/compressed/sev.c
>> +++ b/arch/x86/boot/compressed/sev.c
>> @@ -130,6 +130,34 @@ static bool fault_in_kernel_space(unsigned long address)
>> /* Include code for early handlers */
>> #include "../../kernel/sev-shared.c"
>>
>> +static struct svsm_ca *__svsm_get_caa(void)
>> +{
>> + return boot_svsm_caa;
>> +}
>> +
>> +static u64 __svsm_get_caa_pa(void)
>> +{
>> + return boot_svsm_caa_pa;
>> +}
>> +
>> +static int svsm_protocol(struct svsm_call *call)
>
> Function name needs a verb.

Yep, taken care of based on an earlier patch.

>
>> +{
>> + struct ghcb *ghcb;
>> + int ret;
>> +
>> + if (boot_ghcb)
>> + ghcb = boot_ghcb;
>> + else
>> + ghcb = NULL;
>> +
>> + do {
>> + ret = ghcb ? __svsm_ghcb_protocol(ghcb, call)
>> + : __svsm_msr_protocol(call);
>> + } while (ret == SVSM_ERR_BUSY);
>> +
>> + return ret;
>> +}
>> +
>> bool sev_snp_enabled(void)
>> {
>> return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
>> @@ -146,8 +174,8 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
>> * If private -> shared then invalidate the page before requesting the
>> * state change in the RMP table.
>> */
>> - if (op == SNP_PAGE_STATE_SHARED && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
>> - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
>> + if (op == SNP_PAGE_STATE_SHARED)
>> + pvalidate_4k_page(paddr, paddr, 0);
>
> That is a bool so put a "false" in there.

Ok

>
>> /* Issue VMGEXIT to change the page state in RMP table. */
>> sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
>> @@ -162,8 +190,8 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
>> * Now that page state is changed in the RMP table, validate it so that it is
>> * consistent with the RMP entry.
>> */
>> - if (op == SNP_PAGE_STATE_PRIVATE && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
>> - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
>> + if (op == SNP_PAGE_STATE_PRIVATE)
>> + pvalidate_4k_page(paddr, paddr, 1);
>
> Ditto, but "true".

Ok

>
>> void snp_set_page_private(unsigned long paddr)
>> @@ -256,6 +284,15 @@ void sev_es_shutdown_ghcb(void)
>> if (!sev_es_check_cpu_features())
>> error("SEV-ES CPU Features missing.");
>>
>> + /*
>> + * The boot_ghcb value is used to determine whether to use the GHCB MSR
>
> s/The boot_ghcb value /This is used/

Ok

>
>> + * protocol or the GHCB shared page to perform a GHCB request. Since the
>> + * GHCB page is being changed to encrypted, it can't be used to perform
>> + * GHCB requests. Clear the boot_ghcb variable so that the GHCB MSR
>> + * protocol is used to change the GHCB page over to an encrypted page.
>> + */
>> + boot_ghcb = NULL;
>> +
>> /*
>> * GHCB Page must be flushed from the cache and mapped encrypted again.
>> * Otherwise the running kernel will see strange cache effects when
>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>> index 8f180fd3cbf0..e6f1ed3f6ce3 100644
>> --- a/arch/x86/include/asm/sev.h
>> +++ b/arch/x86/include/asm/sev.h
>> @@ -187,6 +187,27 @@ struct svsm_ca {
>> #define SVSM_ERR_INVALID_PARAMETER 0x80000005
>> #define SVSM_ERR_INVALID_REQUEST 0x80000006
>> #define SVSM_ERR_BUSY 0x80000007
>> +#define SVSM_PVALIDATE_FAIL_SIZEMISMATCH 0x80001006
>> +
>> +/*
>> + * The SVSM PVALIDATE related structures
>> + */
>> +struct svsm_pvalidate_entry {
>> + u64 page_size : 2,
>> + action : 1,
>> + ignore_cf : 1,
>> + rsvd : 8,
>> + pfn : 52;
>> +};
>> +
>> +struct svsm_pvalidate_call {
>> + u16 entries;
>> + u16 next;
>> +
>> + u8 rsvd1[4];
>> +
>> + struct svsm_pvalidate_entry entry[];
>> +};
>>
>> /*
>> * SVSM protocol structure
>> @@ -207,6 +228,7 @@ struct svsm_call {
>>
>> #define SVSM_CORE_CALL(x) ((0ULL << 32) | (x))
>> #define SVSM_CORE_REMAP_CA 0
>> +#define SVSM_CORE_PVALIDATE 1
>>
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>> extern void __sev_es_ist_enter(struct pt_regs *regs);
>> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
>> index 6f57eb804e70..b415b10a0823 100644
>> --- a/arch/x86/kernel/sev-shared.c
>> +++ b/arch/x86/kernel/sev-shared.c
>> @@ -40,6 +40,9 @@ static u8 vmpl __ro_after_init;
>> static struct svsm_ca *boot_svsm_caa __ro_after_init;
>> static u64 boot_svsm_caa_pa __ro_after_init;
>>
>> +static struct svsm_ca *__svsm_get_caa(void);
>> +static u64 __svsm_get_caa_pa(void);
>
> Are we being lazy again? :)
>
> So I know the below is bigger than two silly forward declarations but
> forward declarations are a sign that our source code placement is
> lacking and if we keep piling on that, it'll become a mess soon. And
> guess who gets to mop up after y'all who don't have time to do it
> because you have to enable features? :-\
>
> So in order to avoid that, we'll re-position it to where we think it'll
> be better going forward.
>
> Btw, this is the second time, at least, where I think that that
> sev-shared.c thing is starting to become more of a nuisance than a code
> savings thing but I don't have a better idea for it yet.
>
> So let's extend that ifdeffery at the top which provides things which
> are called the same but defined differently depending on whether we're
> in the decompressor or kernel proper.
>
> IOW, something like this, ontop:
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 32a1e98ffaa9..9d89fc67574b 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -130,16 +130,6 @@ static bool fault_in_kernel_space(unsigned long address)
> /* Include code for early handlers */
> #include "../../kernel/sev-shared.c"
>
> -static struct svsm_ca *__svsm_get_caa(void)
> -{
> - return boot_svsm_caa;
> -}
> -
> -static u64 __svsm_get_caa_pa(void)
> -{
> - return boot_svsm_caa_pa;
> -}
> -
> static int svsm_protocol(struct svsm_call *call)
> {
> struct ghcb *ghcb;
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index b415b10a0823..b4f1fd780925 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -11,20 +11,6 @@
>
> #include <asm/setup_data.h>
>
> -#ifndef __BOOT_COMPRESSED
> -#define error(v) pr_err(v)
> -#define has_cpuflag(f) boot_cpu_has(f)
> -#define sev_printk(fmt, ...) printk(fmt, ##__VA_ARGS__)
> -#define sev_printk_rtl(fmt, ...) printk_ratelimited(fmt, ##__VA_ARGS__)
> -#else
> -#undef WARN
> -#define WARN(condition, format...) (!!(condition))
> -#define sev_printk(fmt, ...)
> -#define sev_printk_rtl(fmt, ...)
> -#undef vc_forward_exception
> -#define vc_forward_exception(c) panic("SNP: Hypervisor requested exception\n")
> -#endif
> -
> /*
> * SVSM related information:
> * When running under an SVSM, the VMPL that Linux is executing at must be
> @@ -40,8 +26,47 @@ static u8 vmpl __ro_after_init;
> static struct svsm_ca *boot_svsm_caa __ro_after_init;
> static u64 boot_svsm_caa_pa __ro_after_init;
>
> -static struct svsm_ca *__svsm_get_caa(void);
> -static u64 __svsm_get_caa_pa(void);
> +#ifdef __BOOT_COMPRESSED
> +
> +#undef WARN
> +#define WARN(condition, format...) (!!(condition))
> +#define sev_printk(fmt, ...)
> +#define sev_printk_rtl(fmt, ...)
> +#undef vc_forward_exception
> +#define vc_forward_exception(c) panic("SNP: Hypervisor requested exception\n")
> +
> +static struct svsm_ca *__svsm_get_caa(void)
> +{
> + return boot_svsm_caa;
> +}
> +
> +static u64 __svsm_get_caa_pa(void)
> +{
> + return boot_svsm_caa_pa;
> +}
> +
> +#else /* __BOOT_COMPRESSED */
> +
> +#define error(v) pr_err(v)
> +#define has_cpuflag(f) boot_cpu_has(f)
> +#define sev_printk(fmt, ...) printk(fmt, ##__VA_ARGS__)
> +#define sev_printk_rtl(fmt, ...) printk_ratelimited(fmt, ##__VA_ARGS__)
> +
> +static DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
> +
> +static struct svsm_ca *__svsm_get_caa(void)
> +{
> + return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa)
> + : boot_svsm_caa;
> +}
> +
> +static u64 __svsm_get_caa_pa(void)
> +{
> + return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa_pa)
> + : boot_svsm_caa_pa;
> +}
> +
> +#endif /* __BOOT_COMPRESSED */
>
> /* I/O parameters for CPUID-related helpers */
> struct cpuid_leaf {
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index bb6455ff45a2..db895a7a9401 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -138,7 +138,6 @@ static struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
>
> static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
> static DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
> -static DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
> static DEFINE_PER_CPU(u64, svsm_caa_pa);
>
> struct sev_config {
> @@ -616,18 +615,6 @@ static __always_inline void vc_forward_exception(struct es_em_ctxt *ctxt)
> /* Include code shared with pre-decompression boot stage */
> #include "sev-shared.c"
>
> -static struct svsm_ca *__svsm_get_caa(void)
> -{
> - return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa)
> - : boot_svsm_caa;
> -}
> -
> -static u64 __svsm_get_caa_pa(void)
> -{
> - return sev_cfg.cas_initialized ? this_cpu_read(svsm_caa_pa)
> - : boot_svsm_caa_pa;
> -}
> -
> static noinstr void __sev_put_ghcb(struct ghcb_state *state)
> {
> struct sev_es_runtime_data *data;
>
>> +
>> /* I/O parameters for CPUID-related helpers */
>> struct cpuid_leaf {
>> u32 fn;
>> @@ -102,6 +105,8 @@ static u32 cpuid_std_range_max __ro_after_init;
>> static u32 cpuid_hyp_range_max __ro_after_init;
>> static u32 cpuid_ext_range_max __ro_after_init;
>>
>> +static int svsm_protocol(struct svsm_call *call);
>
> You get the idea...

I'll do a pre-patch that moves and reverses the __BOOT_COMPRESSED ifdef
and then re-apply and adjust the patches based on this.

>
>> static bool __init sev_es_check_cpu_features(void)
>> {
>> if (!has_cpuflag(X86_FEATURE_RDRAND)) {
>> @@ -1186,7 +1191,65 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
>> }
>> }
>>
>> -static void pvalidate_pages(struct snp_psc_desc *desc)
>> +static int base_pvalidate_4k_page(unsigned long vaddr, bool validate)
>> +{
>> + return pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
>> +}
>
> There are those silly wrappers again. Kill it pls.

Done.

>
>> +
>> +static int svsm_pvalidate_4k_page(unsigned long paddr, bool validate)
>
> Will there ever be a pvalidate_2M_page?

Not really. There is a multi-page state change request structure that
can contain a mix of page sizes; it is processed in large batches by
looping through the entries.

>
> If so, then you need to redesign this to have a lower-level helper
>
> __svsm_pvalidate_page(... ,size, );
>
> and the 4K and 2M things call it.
>
>> +{
>> + struct svsm_pvalidate_call *pvalidate_call;
>
> Too long:
>
> struct svsm_pvalidate_call *pvl_call;

Sure.

>
>> + struct svsm_call call = {};
>
> I guess this needs to be
>
> struct svsm_call svsm_call = {};
>
> so that you know what kind of call it is - you have two.

Ok.

>
>> + u64 pvalidate_call_pa;
>> + unsigned long flags;
>> + int ret;
>> +
>> + /*
>> + * This can be called very early in the boot, use native functions in
>> + * order to avoid paravirt issues.
>> + */
>> + flags = native_save_fl();
>> + if (flags & X86_EFLAGS_IF)
>> + native_irq_disable();
>
> Yeah, this'll change.

Right.

>
>> + call.caa = __svsm_get_caa();
>> +
>> + pvalidate_call = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
>
> That's almost a page worth of data, we don't zero it. How sensitive
> would this be if the SVSM sees some old data?
>
> Or we trust the SVSM and all is good?

Correct. The SVSM can look at all of the guest memory anyway.

>
>> + pvalidate_call_pa = __svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
>> +
>> + pvalidate_call->entries = 1;
>> + pvalidate_call->next = 0;
>> + pvalidate_call->entry[0].page_size = RMP_PG_SIZE_4K;
>> + pvalidate_call->entry[0].action = validate;
>> + pvalidate_call->entry[0].ignore_cf = 0;
>> + pvalidate_call->entry[0].pfn = paddr >> PAGE_SHIFT;
>> +
>> + /* Protocol 0, Call ID 1 */
>> + call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
>> + call.rcx = pvalidate_call_pa;
>> +
>> + ret = svsm_protocol(&call);
>> +
>> + if (flags & X86_EFLAGS_IF)
>> + native_irq_enable();
>> +
>> + return ret;
>> +}
>> +
>> +static void pvalidate_4k_page(unsigned long vaddr, unsigned long paddr, bool validate)
>> +{
>> + int ret;
>> +
>> + ret = vmpl ? svsm_pvalidate_4k_page(paddr, validate)
>> + : base_pvalidate_4k_page(vaddr, validate);
>
> if (vmpl)
> ret = svsm_pvalidate_4k_page(paddr, validate);
> else
> ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
>
> No need for silly wrappers.

Right, I'm already changing that based on the previous comment. But are you
also asking that the if/else style be used?

>
>> +
>> + if (ret) {
>> + WARN(1, "Failed to validate address 0x%lx ret %d", vaddr, ret);
>> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
>> + }
>> +}
>> +
>> +static void base_pvalidate_pages(struct snp_psc_desc *desc)
>> {
>> struct psc_entry *e;
>> unsigned long vaddr;
>> @@ -1220,6 +1283,120 @@ static void pvalidate_pages(struct snp_psc_desc *desc)
>> }
>> }
>>
>> +static void svsm_pvalidate_pages(struct snp_psc_desc *desc)
>> +{
>> + struct svsm_pvalidate_call *pvalidate_call;
>
> shorten to "pvl_call" or so.

Ok

>
>> + struct svsm_pvalidate_entry *pe;
>
> See, like this. :-P
>
>> + unsigned int call_count, i;
>> + struct svsm_call call = {};
>> + u64 pvalidate_call_pa;
>> + struct psc_entry *e;
>> + unsigned long flags;
>> + unsigned long vaddr;
>> + bool action;
>> + int ret;
>> +
>> + /*
>> + * This can be called very early in the boot, use native functions in
>> + * order to avoid paravirt issues.
>> + */
>> + flags = native_save_fl();
>> + if (flags & X86_EFLAGS_IF)
>> + native_irq_disable();
>> +
>> + call.caa = __svsm_get_caa();
>> +
>> + pvalidate_call = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
>> + pvalidate_call_pa = __svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
>
> As above.

Yep.

>
>> + /* Calculate how many entries the CA buffer can hold */
>> + call_count = sizeof(call.caa->svsm_buffer);
>> + call_count -= offsetof(struct svsm_pvalidate_call, entry);
>> + call_count /= sizeof(pvalidate_call->entry[0]);
>> +
>> + /* Protocol 0, Call ID 1 */
>> + call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
>> + call.rcx = pvalidate_call_pa;
>> +
>> + pvalidate_call->entries = 0;
>> + pvalidate_call->next = 0;
>
> Or you simply memset the whole thing and be safe.

We could, but that would be clearing a 4K page each time... and the SVSM
only relies on the entries and next values anyway.

>
>> + for (i = 0; i <= desc->hdr.end_entry; i++) {
>> + e = &desc->entries[i];
>> + pe = &pvalidate_call->entry[pvalidate_call->entries];
>> +
>> + pe->page_size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
>> + pe->action = e->operation == SNP_PAGE_STATE_PRIVATE;
>> + pe->ignore_cf = 0;
>> + pe->pfn = e->gfn;
>> +
>> + pvalidate_call->entries++;
>> + if (pvalidate_call->entries < call_count && i != desc->hdr.end_entry)
>> + continue;
>> +
>> + ret = svsm_protocol(&call);
>> + if (ret == SVSM_PVALIDATE_FAIL_SIZEMISMATCH &&
>> + pvalidate_call->entry[pvalidate_call->next].page_size == RMP_PG_SIZE_2M) {
>> + u64 pfn, pfn_end;
>> +
>> + /*
>> + * The "next" field is the index of the failed entry. Calculate the
>> + * index of the entry after the failed entry before the fields are
>> + * cleared so that processing can continue on from that point (take
>> + * into account the for loop adding 1 to the entry).
>> + */
>> + i -= pvalidate_call->entries - pvalidate_call->next;
>> + i += 1;
>> +
>> + action = pvalidate_call->entry[pvalidate_call->next].action;
>> + pfn = pvalidate_call->entry[pvalidate_call->next].pfn;
>> + pfn_end = pfn + 511;
>> +
>> + pvalidate_call->entries = 0;
>> + pvalidate_call->next = 0;
>
> You did that above before the loop. Looks weird doing it again.

It's because the 2MB page now has to be broken down into multiple 4K
entries. So we start over and enter the 512 4K entries.
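
When I split that second loop out into its own helper, per your comment
further down, it would look roughly like this (a sketch only, the helper
name is just a placeholder; it reuses the svsm_pvalidate_call layout and
svsm_protocol() from this patch):

static int svsm_pvalidate_2m_fallback(struct svsm_call *svsm_call,
                                      struct svsm_pvalidate_call *pvl_call,
                                      unsigned int call_count,
                                      u64 pfn, bool action)
{
        struct svsm_pvalidate_entry *pe;
        u64 pfn_end = pfn + 511;        /* a 2MB region backs 512 4K PFNs */
        int ret;

        pvl_call->entries = 0;
        pvl_call->next = 0;

        for (; pfn <= pfn_end; pfn++) {
                pe = &pvl_call->entry[pvl_call->entries];

                pe->page_size = RMP_PG_SIZE_4K;
                pe->action = action;
                pe->ignore_cf = 0;
                pe->pfn = pfn;

                pvl_call->entries++;

                /* Keep filling until the buffer is full or this is the last PFN */
                if (pvl_call->entries < call_count && pfn != pfn_end)
                        continue;

                ret = svsm_protocol(svsm_call);
                if (ret != SVSM_SUCCESS)
                        return ret;

                /* This batch was handled, start over with an empty buffer */
                pvl_call->entries = 0;
                pvl_call->next = 0;
        }

        return SVSM_SUCCESS;
}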

>
>> + for (; pfn <= pfn_end; pfn++) {
>> + pe = &pvalidate_call->entry[pvalidate_call->entries];
>> +
>> + pe->page_size = RMP_PG_SIZE_4K;
>> + pe->action = action;
>> + pe->ignore_cf = 0;
>> + pe->pfn = pfn;
>> +
>> + pvalidate_call->entries++;
>> + if (pvalidate_call->entries < call_count && pfn != pfn_end)
>> + continue;
>> +
>> + ret = svsm_protocol(&call);
>> + if (ret != SVSM_SUCCESS)
>> + break;
>> +
>> + pvalidate_call->entries = 0;
>> + pvalidate_call->next = 0;
>> + }
>> + }
>
> I have no clue what's going on in this function. Sounds like it needs
> splitting. And commenting too. Like the loop body should be something
> like svsm_pvalidate_entry() or so.

Sure, I'll add more comments or expand the comment above.

>
> And then that second loop wants to be a separate function too as it is
> calling the SVSM protocol again.
>
>> +
>> + if (ret != SVSM_SUCCESS) {
>> + pe = &pvalidate_call->entry[pvalidate_call->next];
>> + vaddr = (unsigned long)pfn_to_kaddr(pe->pfn);
>> +
>> + WARN(1, "Failed to validate address %lx ret=%#x (%d)", vaddr, ret, ret);
>> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
>> + }
>> +
>> + pvalidate_call->entries = 0;
>> + pvalidate_call->next = 0;
>
> And here it is again. If anything, splitting and comments are needed
> here at least.

That's because you can only pass a certain number of entries to the SVSM
to handle at one time. If the kernel is in the process of validating,
say, 1,000 entries, but the CA can only hold 511 at a time, then after
it reaches the 511th entry, the SVSM must be called. Upon return, the
kernel resets the CA area and builds up the entries in there again,
calling the SVSM again when the area is again full or we reach the last
entry to be validated.

I'll add more detail in the comments.
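
And to make that portioning easier to follow, the descriptor walk itself
could become its own helper, roughly along these lines (again just a
sketch, the name is a placeholder and the 2MB size-mismatch fallback
handling is left out here):

static int svsm_pvalidate_desc_entries(struct svsm_call *svsm_call,
                                       struct svsm_pvalidate_call *pvl_call,
                                       unsigned int call_count,
                                       struct snp_psc_desc *desc)
{
        struct svsm_pvalidate_entry *pe;
        struct psc_entry *e;
        unsigned int i;
        int ret;

        pvl_call->entries = 0;
        pvl_call->next = 0;

        for (i = 0; i <= desc->hdr.end_entry; i++) {
                e = &desc->entries[i];
                pe = &pvl_call->entry[pvl_call->entries];

                pe->page_size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
                pe->action = e->operation == SNP_PAGE_STATE_PRIVATE;
                pe->ignore_cf = 0;
                pe->pfn = e->gfn;

                pvl_call->entries++;

                /* Portion is not full and more entries remain, keep filling */
                if (pvl_call->entries < call_count && i != desc->hdr.end_entry)
                        continue;

                ret = svsm_protocol(svsm_call);
                if (ret != SVSM_SUCCESS)
                        return ret;

                /* This portion was handled, start over with an empty buffer */
                pvl_call->entries = 0;
                pvl_call->next = 0;
        }

        return SVSM_SUCCESS;
}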

Thanks,
Tom

>
> ...
>
> Thx.
>

2024-05-27 12:02:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 06/15] x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0

On Wed, May 22, 2024 at 04:14:54PM -0500, Tom Lendacky wrote:
> Right, I'm already changing that based on the previous comment. But are you
> also asking that the if/else style be used?

Yeah, please. It is trivial and thus more readable this way.

> Sure, I'll add more comments or expand the comment above.

Yes, and pls split it into helpers. Those bunch of nested loops are
kinda begging to be separate function helpers.

> That's because you can only pass a certain number of entries to the SVSM to
> handle at one time. If the kernel is in the process of validating, say,
> 1,000 entries, but the CA can only hold 511 at a time, then after it reaches
> the 511th entry, the SVSM must be called. Upon return, the kernel resets the
> CA area and builds up the entries in there again, calling the SVSM again
> when the area is again full or we reach the last entry to be validated.
>
> I'll add more detail in the comments.

Yeah, that "portioning" must be explained there.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-27 12:41:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 07/15] x86/sev: Use the SVSM to create a vCPU when not in VMPL0

On Wed, Apr 24, 2024 at 10:58:03AM -0500, Tom Lendacky wrote:
> -static int snp_set_vmsa(void *va, bool vmsa)
> +static int base_snp_set_vmsa(void *va, bool vmsa)

s/base_/__/

The svsm_-prefixed ones are already a good enough distinction...

> {
> u64 attrs;
>
> @@ -1013,6 +1013,40 @@ static int snp_set_vmsa(void *va, bool vmsa)
> return rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
> }
>
> +static int svsm_snp_set_vmsa(void *va, void *caa, int apic_id, bool vmsa)
^^^^^^^^^^^

bool create_vmsa or so, to denote what this arg means.

> +{
> + struct svsm_call call = {};
> + unsigned long flags;
> + int ret;
> +
> + local_irq_save(flags);
> +
> + call.caa = this_cpu_read(svsm_caa);
> + call.rcx = __pa(va);
> +
> + if (vmsa) {
> + /* Protocol 0, Call ID 2 */
> + call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
> + call.rdx = __pa(caa);
> + call.r8 = apic_id;
> + } else {
> + /* Protocol 0, Call ID 3 */
> + call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
> + }
> +
> + ret = svsm_protocol(&call);
> +
> + local_irq_restore(flags);
> +
> + return ret;
> +}
> +
> +static int snp_set_vmsa(void *va, void *caa, int apic_id, bool vmsa)
> +{
> + return vmpl ? svsm_snp_set_vmsa(va, caa, apic_id, vmsa)
> + : base_snp_set_vmsa(va, vmsa);

Why do you even need helpers if you're not going to use them somewhere
else? Just put the whole logic inside snp_set_vmsa().

> +}
> +
> #define __ATTR_BASE (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
> #define INIT_CS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
> #define INIT_DS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
> @@ -1044,11 +1078,11 @@ static void *snp_alloc_vmsa_page(int cpu)
> return page_address(p + 1);
> }
>
> -static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa)
> +static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
> {
> int err;
>
> - err = snp_set_vmsa(vmsa, false);
> + err = snp_set_vmsa(vmsa, NULL, apic_id, false);
> if (err)
> pr_err("clear VMSA page failed (%u), leaking page\n", err);
> else
> @@ -1059,6 +1093,7 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
> {
> struct sev_es_save_area *cur_vmsa, *vmsa;
> struct ghcb_state state;
> + struct svsm_ca *caa;
> unsigned long flags;
> struct ghcb *ghcb;
> u8 sipi_vector;
> @@ -1105,6 +1140,12 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
> if (!vmsa)
> return -ENOMEM;
>
> + /*
> + * If an SVSM is present, then the SVSM CAA per-CPU variable will
> + * have a value, otherwise it will be NULL.
> + */

/* If an SVSM is present, the SVSM per-CPU CAA will be !NULL. */

Shorter.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-27 13:31:48

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 08/15] x86/sev: Provide SVSM discovery support

On Wed, Apr 24, 2024 at 10:58:04AM -0500, Tom Lendacky wrote:
> The SVSM specification documents an alternative method of discovery for
> the SVSM using a reserved CPUID bit and a reserved MSR.

Yes, and all your code places where you do

if (vmpl)

to check whether the guest is running over a SVSM should do the CPUID
check. We should not be hardcoding the VMPL level to mean a SVSM is
present.

>
> For the CPUID support, the SNP CPUID table is updated to set bit 28 of

s/is updated/update the.../

> the EAX register of the 0x8000001f leaf when an SVSM is present. This bit
> has been reserved for use in this capacity.
>
> For the MSR support, a new reserved MSR 0xc001f000 has been defined. A #VC
> should be generated when accessing this MSR. The #VC handler is expected
> to ignore writes to this MSR and return the physical calling area address
> (CAA) on reads of this MSR.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/msr-index.h | 2 ++
> arch/x86/kernel/sev-shared.c | 11 +++++++++++
> arch/x86/kernel/sev.c | 17 +++++++++++++++++
> 4 files changed, 31 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 3c7434329661..a17a81b3189b 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -446,6 +446,7 @@
> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
> +#define X86_FEATURE_SVSM_PRESENT (19*32+28) /* "" SNP SVSM is present */

X86_FEATURE_SVSM is better right?

And then we might even want to show it in /proc/cpuinfo here to really
say that we're running over a SVSM as that might be useful info. Think
alternate injection support for one.

> /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
> #define X86_FEATURE_NO_NESTED_DATA_BP (20*32+ 0) /* "" No Nested Data Breakpoints */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e022e6eb766c..45ffa27569f4 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -660,6 +660,8 @@
> #define MSR_AMD64_RMP_BASE 0xc0010132
> #define MSR_AMD64_RMP_END 0xc0010133
>
> +#define MSR_SVSM_CAA 0xc001f000
> +
> /* AMD Collaborative Processor Performance Control MSRs */
> #define MSR_AMD_CPPC_CAP1 0xc00102b0
> #define MSR_AMD_CPPC_ENABLE 0xc00102b1
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index b415b10a0823..50db783f151e 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -1561,6 +1561,8 @@ static enum es_result vc_check_opcode_bytes(struct es_em_ctxt *ctxt,
> static void __head setup_svsm_ca(const struct cc_blob_sev_info *cc_info)
> {
> struct snp_secrets_page *secrets_page;
> + struct snp_cpuid_table *cpuid_table;
> + unsigned int i;
> u64 caa;
>
> BUILD_BUG_ON(sizeof(*secrets_page) != PAGE_SIZE);
> @@ -1607,4 +1609,13 @@ static void __head setup_svsm_ca(const struct cc_blob_sev_info *cc_info)
> */
> boot_svsm_caa = (struct svsm_ca *)caa;
> boot_svsm_caa_pa = caa;
> +
> + /* Advertise the SVSM presence via CPUID. */
> + cpuid_table = (struct snp_cpuid_table *)snp_cpuid_get_table();
> + for (i = 0; i < cpuid_table->count; i++) {
> + struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
> +
> + if (fn->eax_in == 0x8000001f)
> + fn->eax |= BIT(28);
> + }
> }
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 3f4342b31736..69a756781d90 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -1326,12 +1326,29 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
> return 0;
> }
>
> +static enum es_result vc_handle_svsm_caa_msr(struct es_em_ctxt *ctxt)

No need for that helper. And you can reuse the exit_info_1 assignment.
Diff ontop:

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 40eb547d0d6c..7a248d66227e 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1326,31 +1326,25 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
return 0;
}

-static enum es_result vc_handle_svsm_caa_msr(struct es_em_ctxt *ctxt)
-{
- struct pt_regs *regs = ctxt->regs;
-
- /* Writes to the SVSM CAA msr are ignored */
- if (ctxt->insn.opcode.bytes[1] == 0x30)
- return ES_OK;
-
- regs->ax = lower_32_bits(this_cpu_read(svsm_caa_pa));
- regs->dx = upper_32_bits(this_cpu_read(svsm_caa_pa));
-
- return ES_OK;
-}
-
static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
{
struct pt_regs *regs = ctxt->regs;
enum es_result ret;
u64 exit_info_1;

- if (regs->cx == MSR_SVSM_CAA)
- return vc_handle_svsm_caa_msr(ctxt);
-
/* Is it a WRMSR? */
- exit_info_1 = (ctxt->insn.opcode.bytes[1] == 0x30) ? 1 : 0;
+ exit_info_1 = !!(ctxt->insn.opcode.bytes[1] == 0x30);
+
+ if (regs->cx == MSR_SVSM_CAA) {
+ /* Writes to the SVSM CAA msr are ignored */
+ if (exit_info_1)
+ return ES_OK;
+
+ regs->ax = lower_32_bits(this_cpu_read(svsm_caa_pa));
+ regs->dx = upper_32_bits(this_cpu_read(svsm_caa_pa));
+
+ return ES_OK;
+ }

ghcb_set_rcx(ghcb, regs->cx);
if (exit_info_1) {

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-28 21:04:12

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v4 08/15] x86/sev: Provide SVSM discovery support

On 5/27/24 08:10, Borislav Petkov wrote:
> On Wed, Apr 24, 2024 at 10:58:04AM -0500, Tom Lendacky wrote:
>> The SVSM specification documents an alternative method of discovery for
>> the SVSM using a reserved CPUID bit and a reserved MSR.
>
> Yes, and all your code places where you do
>
> if (vmpl)
>
> to check whether the guest is running over a SVSM should do the CPUID
> check. We should not be hardcoding the VMPL level to mean a SVSM is
> present.

The alternative method is really meant for things like UEFI runtime
services (which use the kernel's #VC handler), not the kernel directly.

Some of those checks have to be made very early; I'll see if it is
feasible to rely on the CPUID check / cpu_feature_enabled() support.

We can separate out SVSM vs VMPL, but if the kernel isn't running at
VMPL0 then it requires that an SVSM be present.

>
>>
>> For the CPUID support, the SNP CPUID table is updated to set bit 28 of
>
> s/is updated/update the.../

Ok.

>
>> the EAX register of the 0x8000001f leaf when an SVSM is present. This bit
>> has been reserved for use in this capacity.
>>
>> For the MSR support, a new reserved MSR 0xc001f000 has been defined. A #VC
>> should be generated when accessing this MSR. The #VC handler is expected
>> to ignore writes to this MSR and return the physical calling area address
>> (CAA) on reads of this MSR.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/include/asm/msr-index.h | 2 ++
>> arch/x86/kernel/sev-shared.c | 11 +++++++++++
>> arch/x86/kernel/sev.c | 17 +++++++++++++++++
>> 4 files changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 3c7434329661..a17a81b3189b 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -446,6 +446,7 @@
>> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
>> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
>> #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
>> +#define X86_FEATURE_SVSM_PRESENT (19*32+28) /* "" SNP SVSM is present */
>
> X86_FEATURE_SVSM is better right?
>
> And then we might even want to show it in /proc/cpuinfo here to really
> say that we're running over a SVSM as that might be useful info. Think
> alternate injection support for one.

Yep, will do.

>
>> /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
>> #define X86_FEATURE_NO_NESTED_DATA_BP (20*32+ 0) /* "" No Nested Data Breakpoints */
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e022e6eb766c..45ffa27569f4 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -660,6 +660,8 @@
>> #define MSR_AMD64_RMP_BASE 0xc0010132
>> #define MSR_AMD64_RMP_END 0xc0010133
>>
>> +#define MSR_SVSM_CAA 0xc001f000
>> +
>> /* AMD Collaborative Processor Performance Control MSRs */
>> #define MSR_AMD_CPPC_CAP1 0xc00102b0
>> #define MSR_AMD_CPPC_ENABLE 0xc00102b1
>> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
>> index b415b10a0823..50db783f151e 100644
>> --- a/arch/x86/kernel/sev-shared.c
>> +++ b/arch/x86/kernel/sev-shared.c
>> @@ -1561,6 +1561,8 @@ static enum es_result vc_check_opcode_bytes(struct es_em_ctxt *ctxt,
>> static void __head setup_svsm_ca(const struct cc_blob_sev_info *cc_info)
>> {
>> struct snp_secrets_page *secrets_page;
>> + struct snp_cpuid_table *cpuid_table;
>> + unsigned int i;
>> u64 caa;
>>
>> BUILD_BUG_ON(sizeof(*secrets_page) != PAGE_SIZE);
>> @@ -1607,4 +1609,13 @@ static void __head setup_svsm_ca(const struct cc_blob_sev_info *cc_info)
>> */
>> boot_svsm_caa = (struct svsm_ca *)caa;
>> boot_svsm_caa_pa = caa;
>> +
>> + /* Advertise the SVSM presence via CPUID. */
>> + cpuid_table = (struct snp_cpuid_table *)snp_cpuid_get_table();
>> + for (i = 0; i < cpuid_table->count; i++) {
>> + struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
>> +
>> + if (fn->eax_in == 0x8000001f)
>> + fn->eax |= BIT(28);
>> + }
>> }
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 3f4342b31736..69a756781d90 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -1326,12 +1326,29 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
>> return 0;
>> }
>>
>> +static enum es_result vc_handle_svsm_caa_msr(struct es_em_ctxt *ctxt)
>
> No need for that helper. And you can reuse the exit_info_1 assignment.
> Diff ontop:

I'll incorporate this, but probably won't change the way exit_info_1 is
assigned.

Thanks,
Tom

>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 40eb547d0d6c..7a248d66227e 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -1326,31 +1326,25 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
> return 0;
> }
>
> -static enum es_result vc_handle_svsm_caa_msr(struct es_em_ctxt *ctxt)
> -{
> - struct pt_regs *regs = ctxt->regs;
> -
> - /* Writes to the SVSM CAA msr are ignored */
> - if (ctxt->insn.opcode.bytes[1] == 0x30)
> - return ES_OK;
> -
> - regs->ax = lower_32_bits(this_cpu_read(svsm_caa_pa));
> - regs->dx = upper_32_bits(this_cpu_read(svsm_caa_pa));
> -
> - return ES_OK;
> -}
> -
> static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
> {
> struct pt_regs *regs = ctxt->regs;
> enum es_result ret;
> u64 exit_info_1;
>
> - if (regs->cx == MSR_SVSM_CAA)
> - return vc_handle_svsm_caa_msr(ctxt);
> -
> /* Is it a WRMSR? */
> - exit_info_1 = (ctxt->insn.opcode.bytes[1] == 0x30) ? 1 : 0;
> + exit_info_1 = !!(ctxt->insn.opcode.bytes[1] == 0x30);
> +
> + if (regs->cx == MSR_SVSM_CAA) {
> + /* Writes to the SVSM CAA msr are ignored */
> + if (exit_info_1)
> + return ES_OK;
> +
> + regs->ax = lower_32_bits(this_cpu_read(svsm_caa_pa));
> + regs->dx = upper_32_bits(this_cpu_read(svsm_caa_pa));
> +
> + return ES_OK;
> + }
>
> ghcb_set_rcx(ghcb, regs->cx);
> if (exit_info_1) {
>

2024-05-28 21:04:47

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v4 07/15] x86/sev: Use the SVSM to create a vCPU when not in VMPL0

On 5/27/24 07:33, Borislav Petkov wrote:
> On Wed, Apr 24, 2024 at 10:58:03AM -0500, Tom Lendacky wrote:
>> -static int snp_set_vmsa(void *va, bool vmsa)
>> +static int base_snp_set_vmsa(void *va, bool vmsa)
>
> s/base_/__/

Ok.

>
> The svsm_-prefixed ones are already a good enough distinction...
>
>> {
>> u64 attrs;
>>
>> @@ -1013,6 +1013,40 @@ static int snp_set_vmsa(void *va, bool vmsa)
>> return rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
>> }
>>
>> +static int svsm_snp_set_vmsa(void *va, void *caa, int apic_id, bool vmsa)
> ^^^^^^^^^^^
>
> bool create_vmsa or so, to denote what this arg means.

Ok. I'll change it on the original function, too.

>
>> +{
>> + struct svsm_call call = {};
>> + unsigned long flags;
>> + int ret;
>> +
>> + local_irq_save(flags);
>> +
>> + call.caa = this_cpu_read(svsm_caa);
>> + call.rcx = __pa(va);
>> +
>> + if (vmsa) {
>> + /* Protocol 0, Call ID 2 */
>> + call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU);
>> + call.rdx = __pa(caa);
>> + call.r8 = apic_id;
>> + } else {
>> + /* Protocol 0, Call ID 3 */
>> + call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU);
>> + }
>> +
>> + ret = svsm_protocol(&call);
>> +
>> + local_irq_restore(flags);
>> +
>> + return ret;
>> +}
>> +
>> +static int snp_set_vmsa(void *va, void *caa, int apic_id, bool vmsa)
>> +{
>> + return vmpl ? svsm_snp_set_vmsa(va, caa, apic_id, vmsa)
>> + : base_snp_set_vmsa(va, vmsa);
>
> Why do you even need helpers if you're not going to use them somewhere
> else? Just put the whole logic inside snp_set_vmsa().

I just think it's easier to follow, with specific functions for the
situation and less indentation. But if you want, I can put it all in one
function.

>
>> +}
>> +
>> #define __ATTR_BASE (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
>> #define INIT_CS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
>> #define INIT_DS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
>> @@ -1044,11 +1078,11 @@ static void *snp_alloc_vmsa_page(int cpu)
>> return page_address(p + 1);
>> }
>>
>> -static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa)
>> +static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
>> {
>> int err;
>>
>> - err = snp_set_vmsa(vmsa, false);
>> + err = snp_set_vmsa(vmsa, NULL, apic_id, false);
>> if (err)
>> pr_err("clear VMSA page failed (%u), leaking page\n", err);
>> else
>> @@ -1059,6 +1093,7 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
>> {
>> struct sev_es_save_area *cur_vmsa, *vmsa;
>> struct ghcb_state state;
>> + struct svsm_ca *caa;
>> unsigned long flags;
>> struct ghcb *ghcb;
>> u8 sipi_vector;
>> @@ -1105,6 +1140,12 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
>> if (!vmsa)
>> return -ENOMEM;
>>
>> + /*
>> + * If an SVSM is present, then the SVSM CAA per-CPU variable will
>> + * have a value, otherwise it will be NULL.
>> + */
>
> /* If an SVSM is present, the SVSM per-CPU CAA will be !NULL. */
>
> Shorter.

Yep.

Thanks,
Tom

>

2024-05-31 12:30:58

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 07/15] x86/sev: Use the SVSM to create a vCPU when not in VMPL0

On Tue, May 28, 2024 at 03:28:28PM -0500, Tom Lendacky wrote:
> I just think it's easier to follow, with specific functions for the
> situation and less indentation. But if you want, I can put it all in one
> function.

Well, if the function were huge and hard to read, sure, but right now it
is simple and has a single indentation level. There's the other side of
having too many small helpers, which leads to not seeing the flow. The
logic we should follow is: if the function is big and fat and has too
many indentation levels, you split it into smaller functions.

The "Functions" section in Documentation/process/coding-style.rst has
some blurb on the matter.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-31 12:49:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 08/15] x86/sev: Provide SVSM discovery support

On Tue, May 28, 2024 at 03:57:10PM -0500, Tom Lendacky wrote:
> The alternative method is really meant for things like UEFI runtime services
> (which use the kernel's #VC handler), not the kernel directly.
>
> Some of those checks have to be made very early; I'll see if it is feasible
> to rely on the CPUID check / cpu_feature_enabled() support.

Put that in the commit message.

> We can separate out SVSM vs VMPL, but if the kernel isn't running at VMPL0
> then it requires that an SVSM be present.

Ok, I guess the two things are identical.

> I'll incorporate this, but probably won't change the way exit_info_1 is
> assigned.

Oh, but we love our '!!' construct:

git grep -E '\s!![^!]' *.[ch] | wc -l
7776

At least that many; my pattern is not precise.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette