This part of the Secure Nested Paging (SEV-SNP) series focuses on the changes
required in a guest OS for SEV-SNP support.
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.
This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by SEV-SNP,
such as interrupt protection.
Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to an SEV-SNP
VM requires a two-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to the
guest-invalid state. Second, the guest validates the page using the new
PVALIDATE instruction. SEV-SNP VMs can use the new "Page State Change
Request NAE" defined in the GHCB specification to ask the hypervisor to add
or remove pages from the RMP table.
Each page assigned to an SEV-SNP VM can be either validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for page validation: pre-validation and
lazy validation.
Under pre-validation, pages are validated prior to first use. Under lazy
validation, pages are validated when first accessed. An access to an
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the guest OS.
At this time we only support pre-validation: the OVMF guest BIOS validates
the entire RAM before control is handed over to the guest kernel.
The early_set_memory_{encrypted,decrypted} and set_memory_{encrypted,decrypted}
helpers are enlightened to perform page validation or invalidation while
setting or clearing the encryption attribute from the page table.
This series does not yet provide support for interrupt security; that will
be added after the base support.
The series is based on tip/master
a6d06ef25c4e (origin/master, origin/HEAD, master) Merge branch 'irq/core'
Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf
(section 15.36)
GHCB spec:
https://developer.amd.com/wp-content/resources/56421.pdf
SEV-SNP firmware specification:
https://developer.amd.com/sev/
v5: https://lore.kernel.org/lkml/[email protected]/
Changes since v5:
* move the seqno allocation in the sevguest driver.
* extend snp_issue_guest_request() to accept the exit_info to simplify the logic.
* use smaller structure names based on feedback.
* explicitly clear the memory after the SNP guest request is completed.
* cpuid validation: use a local copy of cpuid table instead of keeping
firmware table mapped throughout boot.
* cpuid validation: coding style fix-ups and refactor cpuid-related helpers
as suggested.
* cpuid validation: drop a number of BOOT_COMPRESSED-guarded defs/declarations
by moving things like snp_cpuid_init*() out of sev-shared.c and keeping only
the common bits there.
* Break up EFI config table helpers and related acpi.c changes into separate
patches.
* re-enable stack protection for 32-bit kernels as well, not just 64-bit
Changes since v4:
* Address the cpuid specific review comment
* Simplified the macro based on the review feedback
* Move macro definition to the patch that needs it
* Fix the issues reported by checkpatch
* Address the AP creation specific review comment
Changes since v3:
* Add support to use the PSP filtered CPUID.
* Add support for the extended guest request.
* Move sevguest driver in driver/virt/coco.
* Add documentation for sevguest ioctl.
* Add support to check the vmpl0.
* Pass the VM encryption key and id to be used for encrypting guest messages
through the platform drv data.
* Multiple cleanup and fixes to address the review feedbacks.
Changes since v2:
* Add support for AP startup using SNP specific vmgexit.
* Add snp_prep_memory() helper.
* Drop sev_snp_active() helper.
* Add sev_feature_enabled() helper to check which SEV feature is active.
* Sync the SNP guest message request header with latest SNP FW spec.
* Multiple cleanup and fixes to address the review feedbacks.
Changes since v1:
* Integrate the SNP support in sev.{ch}.
* Add support to query the hypervisor feature and detect whether SNP is supported.
* Define Linux specific reason code for the SNP guest termination.
* Extend the setup_header to provide a way for the hypervisor to pass secret and cpuid pages.
* Add support to create a platform device and driver to query the attestation report
and derive a key.
* Multiple cleanup and fixes to address Boris's review feedback.
Borislav Petkov (3):
x86/sev: Get rid of excessive use of defines
x86/head64: Carve out the guest encryption postprocessing into a
helper
x86/sev: Remove do_early_exception() forward declarations
Brijesh Singh (22):
x86/mm: Extend cc_attr to include AMD SEV-SNP
x86/sev: Shorten GHCB terminate macro names
x86/sev: Define the Linux specific guest termination reasons
x86/sev: Save the negotiated GHCB version
x86/sev: Add support for hypervisor feature VMGEXIT
x86/sev: Check SEV-SNP features support
x86/sev: Add a helper for the PVALIDATE instruction
x86/sev: Check the vmpl level
x86/compressed: Add helper for validating pages in the decompression
stage
x86/compressed: Register GHCB memory when SEV-SNP is active
x86/sev: Register GHCB memory when SEV-SNP is active
x86/sev: Add helper for validating pages in early enc attribute
changes
x86/kernel: Make the bss.decrypted section shared in RMP table
x86/kernel: Validate rom memory before accessing when SEV-SNP is
active
x86/mm: Add support to validate memory when changing C-bit
KVM: SVM: Define sev_features and vmpl field in the VMSA
x86/boot: Add Confidential Computing type to setup_data
x86/sev: Provide support for SNP guest request NAEs
x86/sev: Register SNP guest request platform device
virt: Add SEV-SNP guest driver
virt: sevguest: Add support to derive key
virt: sevguest: Add support to get extended report
Michael Roth (13):
x86/sev-es: initialize sev_status/features within #VC handler
x86/head: re-enable stack protection for 32/64-bit builds
x86/sev: move MSR-based VMGEXITs for CPUID to helper
KVM: x86: move lookup of indexed CPUID leafs to helper
x86/compressed/acpi: move EFI system table lookup to helper
x86/compressed/acpi: move EFI config table lookup to helper
x86/compressed/acpi: move EFI vendor table lookup to helper
x86/compressed/64: add support for SEV-SNP CPUID table in #VC handlers
boot/compressed/64: use firmware-validated CPUID for SEV-SNP guests
x86/boot: add a pointer to Confidential Computing blob in bootparams
x86/compressed/64: store Confidential Computing blob address in
bootparams
x86/compressed/64: add identity mapping for Confidential Computing
blob
x86/sev: use firmware-validated CPUID for SEV-SNP guests
Tom Lendacky (4):
KVM: SVM: Create a separate mapping for the SEV-ES save area
KVM: SVM: Create a separate mapping for the GHCB save area
KVM: SVM: Update the SEV-ES save area mapping
x86/sev: Use SEV-SNP AP creation to start secondary CPUs
Documentation/virt/coco/sevguest.rst | 117 ++++
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/acpi.c | 120 +---
arch/x86/boot/compressed/efi.c | 171 +++++
arch/x86/boot/compressed/head_64.S | 1 +
arch/x86/boot/compressed/ident_map_64.c | 44 +-
arch/x86/boot/compressed/idt_64.c | 5 +-
arch/x86/boot/compressed/misc.h | 42 ++
arch/x86/boot/compressed/sev.c | 189 +++++-
arch/x86/include/asm/bootparam_utils.h | 1 +
arch/x86/include/asm/cpuid.h | 26 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/setup.h | 2 +-
arch/x86/include/asm/sev-common.h | 137 +++-
arch/x86/include/asm/sev.h | 80 ++-
arch/x86/include/asm/svm.h | 167 ++++-
arch/x86/include/uapi/asm/bootparam.h | 4 +-
arch/x86/include/uapi/asm/svm.h | 13 +
arch/x86/kernel/Makefile | 1 -
arch/x86/kernel/cc_platform.c | 2 +
arch/x86/kernel/head64.c | 79 ++-
arch/x86/kernel/head_64.S | 24 +
arch/x86/kernel/probe_roms.c | 13 +-
arch/x86/kernel/sev-shared.c | 569 +++++++++++++++-
arch/x86/kernel/sev.c | 860 ++++++++++++++++++++++--
arch/x86/kernel/smpboot.c | 3 +
arch/x86/kvm/cpuid.c | 17 +-
arch/x86/kvm/svm/sev.c | 24 +-
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/mm/mem_encrypt.c | 55 +-
arch/x86/mm/pat/set_memory.c | 15 +
drivers/virt/Kconfig | 3 +
drivers/virt/Makefile | 1 +
drivers/virt/coco/sevguest/Kconfig | 9 +
drivers/virt/coco/sevguest/Makefile | 2 +
drivers/virt/coco/sevguest/sevguest.c | 703 +++++++++++++++++++
drivers/virt/coco/sevguest/sevguest.h | 98 +++
include/linux/cc_platform.h | 8 +
include/linux/efi.h | 1 +
include/uapi/linux/sev-guest.h | 81 +++
41 files changed, 3389 insertions(+), 307 deletions(-)
create mode 100644 Documentation/virt/coco/sevguest.rst
create mode 100644 arch/x86/boot/compressed/efi.c
create mode 100644 arch/x86/include/asm/cpuid.h
create mode 100644 drivers/virt/coco/sevguest/Kconfig
create mode 100644 drivers/virt/coco/sevguest/Makefile
create mode 100644 drivers/virt/coco/sevguest/sevguest.c
create mode 100644 drivers/virt/coco/sevguest/sevguest.h
create mode 100644 include/uapi/linux/sev-guest.h
--
2.25.1
From: Borislav Petkov <[email protected]>
Remove all the defines of masks and bit positions for the GHCB MSR
protocol and use comments instead which correspond directly to the spec
so that following those can be a lot easier and straightforward with the
spec opened in parallel to the code.
Align vertically while at it.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/sev-common.h | 51 +++++++++++++++++--------------
1 file changed, 28 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 855b0ec9c4e8..aac44c3f839c 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -18,20 +18,19 @@
/* SEV Information Request/Response */
#define GHCB_MSR_SEV_INFO_RESP 0x001
#define GHCB_MSR_SEV_INFO_REQ 0x002
-#define GHCB_MSR_VER_MAX_POS 48
-#define GHCB_MSR_VER_MAX_MASK 0xffff
-#define GHCB_MSR_VER_MIN_POS 32
-#define GHCB_MSR_VER_MIN_MASK 0xffff
-#define GHCB_MSR_CBIT_POS 24
-#define GHCB_MSR_CBIT_MASK 0xff
-#define GHCB_MSR_SEV_INFO(_max, _min, _cbit) \
- ((((_max) & GHCB_MSR_VER_MAX_MASK) << GHCB_MSR_VER_MAX_POS) | \
- (((_min) & GHCB_MSR_VER_MIN_MASK) << GHCB_MSR_VER_MIN_POS) | \
- (((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) | \
+
+#define GHCB_MSR_SEV_INFO(_max, _min, _cbit) \
+ /* GHCBData[63:48] */ \
+ ((((_max) & 0xffff) << 48) | \
+ /* GHCBData[47:32] */ \
+ (((_min) & 0xffff) << 32) | \
+ /* GHCBData[31:24] */ \
+ (((_cbit) & 0xff) << 24) | \
GHCB_MSR_SEV_INFO_RESP)
+
#define GHCB_MSR_INFO(v) ((v) & 0xfffUL)
-#define GHCB_MSR_PROTO_MAX(v) (((v) >> GHCB_MSR_VER_MAX_POS) & GHCB_MSR_VER_MAX_MASK)
-#define GHCB_MSR_PROTO_MIN(v) (((v) >> GHCB_MSR_VER_MIN_POS) & GHCB_MSR_VER_MIN_MASK)
+#define GHCB_MSR_PROTO_MAX(v) (((v) >> 48) & 0xffff)
+#define GHCB_MSR_PROTO_MIN(v) (((v) >> 32) & 0xffff)
/* CPUID Request/Response */
#define GHCB_MSR_CPUID_REQ 0x004
@@ -46,27 +45,33 @@
#define GHCB_CPUID_REQ_EBX 1
#define GHCB_CPUID_REQ_ECX 2
#define GHCB_CPUID_REQ_EDX 3
-#define GHCB_CPUID_REQ(fn, reg) \
- (GHCB_MSR_CPUID_REQ | \
- (((unsigned long)reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS) | \
- (((unsigned long)fn) << GHCB_MSR_CPUID_FUNC_POS))
+#define GHCB_CPUID_REQ(fn, reg) \
+ /* GHCBData[11:0] */ \
+ (GHCB_MSR_CPUID_REQ | \
+ /* GHCBData[31:12] */ \
+ (((unsigned long)(reg) & 0x3) << 30) | \
+ /* GHCBData[63:32] */ \
+ (((unsigned long)fn) << 32))
/* AP Reset Hold */
-#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
-#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
+#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
/* GHCB Hypervisor Feature Request/Response */
-#define GHCB_MSR_HV_FT_REQ 0x080
-#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_REQ 0x080
+#define GHCB_MSR_HV_FT_RESP 0x081
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
#define GHCB_MSR_TERM_REASON_POS 16
#define GHCB_MSR_TERM_REASON_MASK 0xff
-#define GHCB_SEV_TERM_REASON(reason_set, reason_val) \
- (((((u64)reason_set) & GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) | \
- ((((u64)reason_val) & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS))
+
+#define GHCB_SEV_TERM_REASON(reason_set, reason_val) \
+ /* GHCBData[15:12] */ \
+ (((((u64)reason_set) & 0xf) << 12) | \
+ /* GHCBData[23:16] */ \
+ ((((u64)reason_val) & 0xff) << 16))
#define GHCB_SEV_ES_GEN_REQ 0
#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
--
2.25.1
Version 2 of the GHCB specification introduced advertisement of features
that are supported by the hypervisor. Add support to query the HV
features on boot.
Version 2 of the GHCB specification adds several new NAEs; most of them are
optional except the hypervisor feature NAE. Now that the hypervisor feature
NAE is implemented, bump the GHCB maximum supported protocol version.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 3 +++
arch/x86/include/asm/sev.h | 2 +-
arch/x86/include/uapi/asm/svm.h | 2 ++
arch/x86/kernel/sev-shared.c | 30 ++++++++++++++++++++++++++++++
4 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 3278ee578937..891569c07ed7 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -60,6 +60,9 @@
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_RESP_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7ec91b1359df..134a7c9d91b6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -13,7 +13,7 @@
#include <asm/sev-common.h>
#define GHCB_PROTOCOL_MIN 1ULL
-#define GHCB_PROTOCOL_MAX 1ULL
+#define GHCB_PROTOCOL_MAX 2ULL
#define GHCB_DEFAULT_USAGE 0ULL
#define VMGEXIT() { asm volatile("rep; vmmcall\n\r"); }
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index efa969325ede..b0ad00f4c1e1 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
+#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
/* Exit code reserved for hypervisor/software use */
@@ -218,6 +219,7 @@
{ SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
+ { SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 0eb22528ec87..8ee27d07c1cd 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -23,6 +23,9 @@
*/
static u16 __ro_after_init ghcb_version;
+/* Bitmap of SEV features supported by the hypervisor */
+static u64 __ro_after_init sev_hv_features;
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -48,6 +51,30 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}
+/*
+ * The hypervisor features are available from GHCB version 2 onward.
+ */
+static bool get_hv_features(void)
+{
+ u64 val;
+
+ sev_hv_features = 0;
+
+ if (ghcb_version < 2)
+ return false;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_HV_FT_RESP)
+ return false;
+
+ sev_hv_features = GHCB_MSR_HV_FT_RESP_VAL(val);
+
+ return true;
+}
+
static bool sev_es_negotiate_protocol(void)
{
u64 val;
@@ -66,6 +93,9 @@ static bool sev_es_negotiate_protocol(void)
ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+ if (!get_hv_features())
+ return false;
+
return true;
}
--
2.25.1
From: Borislav Petkov <[email protected]>
Carve it out so that it is abstracted out of the main boot path. All
other encrypted guest-relevant processing should be placed in there.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/head64.c | 60 +++++++++++++++++++++-------------------
1 file changed, 31 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fc5371a7e9d1..3be9dd213dad 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -126,6 +126,36 @@ static bool __head check_la57_support(unsigned long physaddr)
}
#endif
+static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+{
+ unsigned long vaddr, vaddr_end;
+ int i;
+
+ /* Encrypt the kernel and related (if SME is active) */
+ sme_encrypt_kernel(bp);
+
+ /*
+ * Clear the memory encryption mask from the .bss..decrypted section.
+ * The bss section will be memset to zero later in the initialization so
+ * there is no need to zero it after changing the memory encryption
+ * attribute.
+ */
+ if (sme_get_me_mask()) {
+ vaddr = (unsigned long)__start_bss_decrypted;
+ vaddr_end = (unsigned long)__end_bss_decrypted;
+ for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ i = pmd_index(vaddr);
+ pmd[i] -= sme_get_me_mask();
+ }
+ }
+
+ /*
+ * Return the SME encryption mask (if SME is active) to be used as a
+ * modifier for the initial pgdir entry programmed into CR3.
+ */
+ return sme_get_me_mask();
+}
+
/* Code in __startup_64() can be relocated during execution, but the compiler
* doesn't have to generate PC-relative relocations when accessing globals from
* that function. Clang actually does not generate them, which leads to
@@ -135,7 +165,6 @@ static bool __head check_la57_support(unsigned long physaddr)
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
{
- unsigned long vaddr, vaddr_end;
unsigned long load_delta, *p;
unsigned long pgtable_flags;
pgdval_t *pgd;
@@ -276,34 +305,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
*/
*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
- /* Encrypt the kernel and related (if SME is active) */
- sme_encrypt_kernel(bp);
-
- /*
- * Clear the memory encryption mask from the .bss..decrypted section.
- * The bss section will be memset to zero later in the initialization so
- * there is no need to zero it after changing the memory encryption
- * attribute.
- *
- * This is early code, use an open coded check for SME instead of
- * using cc_platform_has(). This eliminates worries about removing
- * instrumentation or checking boot_cpu_data in the cc_platform_has()
- * function.
- */
- if (sme_get_me_mask()) {
- vaddr = (unsigned long)__start_bss_decrypted;
- vaddr_end = (unsigned long)__end_bss_decrypted;
- for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
- i = pmd_index(vaddr);
- pmd[i] -= sme_get_me_mask();
- }
- }
-
- /*
- * Return the SME encryption mask (if SME is active) to be used as a
- * modifier for the initial pgdir entry programmed into CR3.
- */
- return sme_get_me_mask();
+ return sme_postprocess_startup(bp, pmd);
}
unsigned long __startup_secondary_64(void)
--
2.25.1
An SNP-active guest uses the PVALIDATE instruction to validate or
rescind the validation of a guest page's RMP entry. Upon completion,
a return code is stored in EAX and rFLAGS bits are set based on the
return code. If the instruction completed successfully, the carry flag
(CF) indicates whether the contents of the RMP entry were changed or not.
See AMD APM Volume 3 for additional details.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 134a7c9d91b6..b308815a2c01 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -59,6 +59,9 @@ extern void vc_no_ghcb(void);
extern void vc_boot_ghcb(void);
extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
+/* Software defined (when rFlags.CF = 1) */
+#define PVALIDATE_FAIL_NOUPDATE 255
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -81,12 +84,30 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
+{
+ bool no_rmpupdate;
+ int rc;
+
+ /* "pvalidate" mnemonic support in binutils 2.36 and newer */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
+ CC_SET(c)
+ : CC_OUT(c) (no_rmpupdate), "=a"(rc)
+ : "a"(vaddr), "c"(rmp_psize), "d"(validate)
+ : "memory", "cc");
+
+ if (no_rmpupdate)
+ return PVALIDATE_FAIL_NOUPDATE;
+
+ return rc;
+}
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
#endif
#endif
--
2.25.1
Version 2 of the GHCB specification added the advertisement of features
that are supported by the hypervisor. If the hypervisor supports SEV-SNP,
then it must set the SEV-SNP features bit to indicate that base SEV-SNP
is supported.
Check for the SEV-SNP feature while establishing the GHCB; if the feature
is not advertised, terminate the guest.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 16 ++++++++++++++--
arch/x86/include/asm/sev-common.h | 3 +++
arch/x86/kernel/sev.c | 8 ++++++--
3 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 7760959fe96d..8b0f892c072b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -119,11 +119,23 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"
-static bool early_setup_sev_es(void)
+static inline bool sev_snp_enabled(void)
+{
+ return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+}
+
+static bool do_early_sev_setup(void)
{
if (!sev_es_negotiate_protocol())
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_PROT_UNSUPPORTED);
+ /*
+ * If SEV-SNP is enabled, then check if the hypervisor supports the SEV-SNP
+ * features.
+ */
+ if (sev_snp_enabled() && !(sev_hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
@@ -174,7 +186,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
struct es_em_ctxt ctxt;
enum es_result result;
- if (!boot_ghcb && !early_setup_sev_es())
+ if (!boot_ghcb && !do_early_sev_setup())
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 891569c07ed7..f80a3cde2086 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -64,6 +64,8 @@
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
+#define GHCB_HV_FT_SNP BIT_ULL(0)
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
@@ -80,6 +82,7 @@
#define SEV_TERM_SET_GEN 0
#define GHCB_SEV_ES_GEN_REQ 0
#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
+#define GHCB_SNP_UNSUPPORTED 2
/* Linux-specific reason codes (used with reason set 1) */
#define SEV_TERM_SET_LINUX 1
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 427b1c6d08a8..2290fbcc1844 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -631,12 +631,16 @@ static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
* This function runs on the first #VC exception after the kernel
* switched to virtual addresses.
*/
-static bool __init sev_es_setup_ghcb(void)
+static bool __init setup_ghcb(void)
{
/* First make sure the hypervisor talks a supported protocol. */
if (!sev_es_negotiate_protocol())
return false;
+ /* If SNP is active, make sure that hypervisor supports the feature. */
+ if (cc_platform_has(CC_ATTR_SEV_SNP) && !(sev_hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
/*
* Clear the boot_ghcb. The first exception comes in before the bss
* section is cleared.
@@ -1444,7 +1448,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
enum es_result result;
/* Do initial setup or terminate the guest */
- if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
+ if (unlikely(!boot_ghcb && !setup_ghcb()))
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
--
2.25.1
Many of the integrity guarantees of SEV-SNP are enforced through the
Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
particular page of DRAM should be mapped. The VMs can request that the
hypervisor add pages to the RMP table via the Page State Change VMGEXIT
defined in the GHCB specification. Inside each RMP entry is a Validated
flag; this flag is automatically cleared to 0 by the CPU hardware when a
new RMP entry is created for a guest. Each VM page can be either
validated or invalidated, as indicated by the Validated flag in the RMP
entry. A memory access to a private page that is not validated generates
a #VC. A VM must use the PVALIDATE instruction to validate a private
page before using it.
To maintain the security guarantee of SEV-SNP guests, when transitioning
pages from private to shared, the guest must invalidate the pages before
asking the hypervisor to change the page state to shared in the RMP table.
After the pages are mapped private in the page table, the guest must issue
a page state change VMGEXIT to make the pages private in the RMP table and
validate them.
On boot, the BIOS should have validated the entire system memory. During
the kernel decompression stage, the #VC handler uses set_memory_decrypted()
to make the GHCB page shared (i.e. clear the encryption attribute), and
while exiting from decompression it calls set_page_encrypted() to make the
page private.
Add snp_set_page_{private,shared}() helpers that are used by
set_memory_{decrypted,encrypted}() to change the page state in the RMP table.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 18 ++++++++++-
arch/x86/boot/compressed/misc.h | 6 ++++
arch/x86/boot/compressed/sev.c | 41 +++++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 26 ++++++++++++++++
4 files changed, 90 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..3cf7a7575f5c 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -275,15 +275,31 @@ static int set_clr_page_flags(struct x86_mapping_info *info,
* Changing encryption attributes of a page requires to flush it from
* the caches.
*/
- if ((set | clr) & _PAGE_ENC)
+ if ((set | clr) & _PAGE_ENC) {
clflush_page(address);
+ /*
+ * If the encryption attribute is being cleared, then change
+ * the page state to shared in the RMP table.
+ */
+ if (clr)
+ snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT);
+ }
+
/* Update PTE */
pte = *ptep;
pte = pte_set_flags(pte, set);
pte = pte_clear_flags(pte, clr);
set_pte(ptep, pte);
+ /*
+ * If the encryption attribute is being set, then change the page state to
+ * private in the RMP entry. The page state must be done after the PTE
+ * is updated.
+ */
+ if (set & _PAGE_ENC)
+ snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT);
+
/* Flush TLB after changing encryption attribute */
write_cr3(top_level_pgt);
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 31139256859f..822e0c254b9a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -121,12 +121,18 @@ void set_sev_encryption_mask(void);
#ifdef CONFIG_AMD_MEM_ENCRYPT
void sev_es_shutdown_ghcb(void);
extern bool sev_es_check_ghcb_fault(unsigned long address);
+void snp_set_page_private(unsigned long paddr);
+void snp_set_page_shared(unsigned long paddr);
+
#else
static inline void sev_es_shutdown_ghcb(void) { }
static inline bool sev_es_check_ghcb_fault(unsigned long address)
{
return false;
}
+static inline void snp_set_page_private(unsigned long paddr) { }
+static inline void snp_set_page_shared(unsigned long paddr) { }
+
#endif
/* acpi.c */
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index cf24cc2af40a..c644f260098e 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -154,6 +154,47 @@ static bool is_vmpl0(void)
return true;
}
+static void __page_state_change(unsigned long paddr, enum psc_op op)
+{
+ u64 val;
+
+ if (!sev_snp_enabled())
+ return;
+
+ /*
+ * If private -> shared then invalidate the page before requesting the
+ * state change in the RMP table.
+ */
+ if (op == SNP_PAGE_STATE_SHARED && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+
+ /* Issue VMGEXIT to change the page state in RMP table. */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ /* Read the response of the VMGEXIT. */
+ val = sev_es_rd_ghcb_msr();
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ /*
+ * Now that page is added in the RMP table, validate it so that it is
+ * consistent with the RMP entry.
+ */
+ if (op == SNP_PAGE_STATE_PRIVATE && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+}
+
+void snp_set_page_private(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+}
+
+void snp_set_page_shared(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+}
+
static bool do_early_sev_setup(void)
{
if (!sev_es_negotiate_protocol())
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index d426c30ae7b4..1c76b6b775cc 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -57,6 +57,32 @@
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+/*
+ * SNP Page State Change Operation
+ *
+ * GHCBData[55:52] - Page operation:
+ * 0x0001 – Page assignment, Private
+ * 0x0002 – Page assignment, Shared
+ */
+enum psc_op {
+ SNP_PAGE_STATE_PRIVATE = 1,
+ SNP_PAGE_STATE_SHARED,
+};
+
+#define GHCB_MSR_PSC_REQ 0x014
+#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
+ /* GHCBData[55:52] */ \
+ (((u64)((op) & 0xf) << 52) | \
+ /* GHCBData[51:12] */ \
+ ((u64)((gfn) & GENMASK_ULL(39, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_PSC_REQ)
+
+#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_RESP_VAL(val) \
+ /* GHCBData[63:32] */ \
+ (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
+
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
--
2.25.1
The hypervisor uses the sev_features field (offset 3B0h) in the Save State
Area to control the SEV-SNP guest features such as SNPActive, vTOM,
ReflectVC etc. An SEV-SNP guest can read the SEV_FEATURES field through
the SEV_STATUS MSR.
While at it, update the dump_vmcb() to log the VMPL level.
See APM2 Table 15-34 and B-4 for more details.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 6 ++++--
arch/x86/kvm/svm/svm.c | 4 ++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index b00dbc5fac2b..7c9cf4f3c164 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -238,7 +238,8 @@ struct vmcb_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u8 reserved_1[43];
+ u8 reserved_1[42];
+ u8 vmpl;
u8 cpl;
u8 reserved_2[4];
u64 efer;
@@ -303,7 +304,8 @@ struct vmcb_save_area {
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
- u8 reserved_11[56];
+ u64 sev_features;
+ u8 reserved_11[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index aa4828274557..2b932e074256 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3210,8 +3210,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
"tr:",
save01->tr.selector, save01->tr.attrib,
save01->tr.limit, save01->tr.base);
- pr_err("cpl: %d efer: %016llx\n",
- save->cpl, save->efer);
+ pr_err("vmpl: %d cpl: %d efer: %016llx\n",
+ save->vmpl, save->cpl, save->efer);
pr_err("%-15s %016llx %-13s %016llx\n",
"cr0:", save->cr0, "cr2:", save->cr2);
pr_err("%-15s %016llx %-13s %016llx\n",
--
2.25.1
The encryption attribute for the bss.decrypted region is cleared in the
initial page table build. This is because the section contains the data
that needs to be shared between the guest and the hypervisor.
When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. The page state also needs to be updated in the
RMP table.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/head64.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 3be9dd213dad..3c0bfed3b58e 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -143,7 +143,14 @@ static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *p
if (sme_get_me_mask()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
+
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ /*
+ * When SEV-SNP is active, transition the page to shared in the RMP
+ * table so that it is consistent with the page table attribute change.
+ */
+ early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+
i = pmd_index(vaddr);
pmd[i] -= sme_get_me_mask();
}
--
2.25.1
The set_memory_{encrypted,decrypted}() helpers are used to change pages
from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.
If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:
1. Issue the page state change VMGEXIT to add the memory region in
the RMP table.
2. Validate the memory region after the RMP entry is added.
To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before the encryption
attribute is removed from the page table:
1. Invalidate the page.
2. Issue the page state change VMGEXIT to remove the page from RMP table.
To change the page state in the RMP table, use the Page State Change
VMGEXIT defined in the GHCB specification.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 22 ++++
arch/x86/include/asm/sev.h | 4 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kernel/sev.c | 165 ++++++++++++++++++++++++++++++
arch/x86/mm/pat/set_memory.c | 15 +++
5 files changed, 208 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b82fff9d607b..c2c5d60f0da0 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -105,6 +105,28 @@ enum psc_op {
#define GHCB_HV_FT_SNP BIT_ULL(0)
+/* SNP Page State Change NAE event */
+#define VMGEXIT_PSC_MAX_ENTRY 253
+
+struct psc_hdr {
+ u16 cur_entry;
+ u16 end_entry;
+ u32 reserved;
+} __packed;
+
+struct psc_entry {
+ u64 cur_page : 12,
+ gfn : 40,
+ operation : 4,
+ pagesize : 1,
+ reserved : 7;
+} __packed;
+
+struct snp_psc_desc {
+ struct psc_hdr hdr;
+ struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
+} __packed;
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ecd8cd8c5908..005f230d0406 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -109,6 +109,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
unsigned int npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -121,6 +123,8 @@ early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned
static inline void __init
early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
+static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
#endif
#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index b0ad00f4c1e1..0dcdb6e0c913 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
+#define SVM_VMGEXIT_PSC 0x80000010
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
@@ -219,6 +220,7 @@
{ SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
+ { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 488011479678..80fdfd83770a 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -655,6 +655,171 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
WARN(1, "invalid memory op %d\n", op);
}
+static int vmgexit_psc(struct snp_psc_desc *desc)
+{
+ int cur_entry, end_entry, ret = 0;
+ struct snp_psc_desc *data;
+ struct ghcb_state state;
+ struct ghcb *ghcb;
+ struct psc_hdr *hdr;
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+ if (unlikely(!ghcb))
+ panic("SEV-SNP: Failed to get GHCB\n");
+
+ /* Copy the input desc into GHCB shared buffer */
+ data = (struct snp_psc_desc *)ghcb->shared_buffer;
+ memcpy(ghcb->shared_buffer, desc, sizeof(*desc));
+
+ hdr = &data->hdr;
+ cur_entry = hdr->cur_entry;
+ end_entry = hdr->end_entry;
+
+ /*
+ * As per the GHCB specification, the hypervisor can resume the guest
+ * before processing all the entries. Check whether all the entries
+ * are processed. If not, then keep retrying.
+ *
+ * The strategy here is to wait for the hypervisor to change the page
+ * state in the RMP table before the guest accesses the memory pages.
+ * If the page state change was not successful, then later memory
+ * access will result in a crash.
+ */
+ while (hdr->cur_entry <= hdr->end_entry) {
+ ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+ ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
+
+ /*
+ * Page State Change VMGEXIT can pass error code through
+ * exit_info_2.
+ */
+ if (WARN(ret || ghcb->save.sw_exit_info_2,
+ "SEV-SNP: PSC failed ret=%d exit_info_2=%llx\n",
+ ret, ghcb->save.sw_exit_info_2)) {
+ ret = 1;
+ goto out;
+ }
+
+ /*
+ * Sanity check that entry processing is not going backward.
+ * This can only happen if the hypervisor is misbehaving.
+ */
+ if (WARN(hdr->end_entry > end_entry || cur_entry > hdr->cur_entry,
+"SEV-SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
+ end_entry, hdr->end_entry, cur_entry, hdr->cur_entry)) {
+ ret = 1;
+ goto out;
+ }
+
+ /* Verify that reserved bit is not set */
+ if (WARN(hdr->reserved, "Reserved bit is set in the PSC header\n")) {
+ ret = 1;
+ goto out;
+ }
+ }
+
+out:
+ __sev_put_ghcb(&state);
+ local_irq_restore(flags);
+
+ return ret;
+}
+
+static void __set_page_state(struct snp_psc_desc *data, unsigned long vaddr,
+ unsigned long vaddr_end, int op)
+{
+ struct psc_hdr *hdr;
+ struct psc_entry *e;
+ unsigned long pfn;
+ int i;
+
+ hdr = &data->hdr;
+ e = data->entries;
+
+ memset(data, 0, sizeof(*data));
+ i = 0;
+
+ while (vaddr < vaddr_end) {
+ if (is_vmalloc_addr((void *)vaddr))
+ pfn = vmalloc_to_pfn((void *)vaddr);
+ else
+ pfn = __pa(vaddr) >> PAGE_SHIFT;
+
+ e->gfn = pfn;
+ e->operation = op;
+ hdr->end_entry = i;
+
+ /*
+ * The GHCB specification allows either a 4K or 2MB page size
+ * to be used in the RMP table. The current SNP support does
+ * not keep track of the page size used in the RMP table. To
+ * avoid requesting a state change that overlaps an existing
+ * 2MB entry, always use the 4K page size in the RMP table.
+ */
+ e->pagesize = RMP_PG_SIZE_4K;
+
+ vaddr = vaddr + PAGE_SIZE;
+ e++;
+ i++;
+ }
+
+ if (vmgexit_psc(data))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+}
+
+static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
+{
+ unsigned long vaddr_end, next_vaddr;
+ struct snp_psc_desc *desc;
+
+ vaddr = vaddr & PAGE_MASK;
+ vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+ desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
+ if (!desc)
+ panic("SEV-SNP: failed to allocate memory for PSC descriptor\n");
+
+ while (vaddr < vaddr_end) {
+ /*
+ * Calculate the last vaddr that can be fit in one
+ * struct snp_psc_desc.
+ */
+ next_vaddr = min_t(unsigned long, vaddr_end,
+ (VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);
+
+ __set_page_state(desc, vaddr, next_vaddr, op);
+
+ vaddr = next_vaddr;
+ }
+
+ kfree(desc);
+}
+
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+{
+ if (!cc_platform_has(CC_ATTR_SEV_SNP))
+ return;
+
+ pvalidate_pages(vaddr, npages, 0);
+
+ set_page_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
+}
+
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+{
+ if (!cc_platform_has(CC_ATTR_SEV_SNP))
+ return;
+
+ set_page_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+ pvalidate_pages(vaddr, npages, 1);
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 527957586f3c..ffe51944606a 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -30,6 +30,7 @@
#include <asm/proto.h>
#include <asm/memtype.h>
#include <asm/set_memory.h>
+#include <asm/sev.h>
#include "../mm_internal.h"
@@ -2010,8 +2011,22 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
*/
cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
+ /*
+ * To maintain the security guarantees of an SEV-SNP guest, invalidate
+ * the memory before clearing the encryption attribute.
+ */
+ if (!enc)
+ snp_set_memory_shared(addr, numpages);
+
ret = __change_page_attr_set_clr(&cpa, 1);
+ /*
+ * Now that the memory is mapped encrypted in the page table, validate
+ * it so that it is consistent with the above page state.
+ */
+ if (!ret && enc)
+ snp_set_memory_private(addr, numpages);
+
/*
* After changing the encryption attribute, we need to flush TLBs again
* in case any speculative TLB caching occurred (but no need to flush
--
2.25.1
From: Michael Roth <[email protected]>
Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID
leaf indicates SEV support. With SEV-SNP, CPUID responses from the
hypervisor are not considered trustworthy, particularly for 0x8000001F.
SEV-SNP provides a firmware-validated CPUID table to use as an
alternative, but prior to checking MSR_AMD64_SEV there are no
guarantees that this is even an SEV-SNP guest.
Rather than relying on these CPUID values early on, allow SEV-ES and
SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and
have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe
to access MSR_AMD64_SEV if a #VC has triggered.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev-shared.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 8ee27d07c1cd..2796c524d174 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
if (exit_code != SVM_EXIT_CPUID)
goto fail;
+ /*
+ * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV
+ * MSR is also available. Go ahead and initialize sev_status here to
+ * allow SEV features to be checked without relying solely on the SEV
+ * cpuid bit to indicate whether it is safe to do so.
+ */
+ if (!sev_status) {
+ unsigned long lo, hi;
+
+ asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
+ : "c" (MSR_AMD64_SEV));
+ sev_status = (hi << 32) | lo;
+ }
+
sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
VMGEXIT();
val = sev_es_rd_ghcb_msr();
--
2.25.1
From: Borislav Petkov <[email protected]>
There's a perfectly fine prototype in the asm/setup.h header. Use it.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 4c891d5d9651..ad3fefb741e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -26,6 +26,7 @@
#include <asm/fpu/internal.h>
#include <asm/processor.h>
#include <asm/realmode.h>
+#include <asm/setup.h>
#include <asm/traps.h>
#include <asm/svm.h>
#include <asm/smp.h>
@@ -93,9 +94,6 @@ struct ghcb_state {
static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
-/* Needed in vc_early_forward_exception */
-void do_early_exception(struct pt_regs *regs, int trapnr);
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -167,9 +165,6 @@ void noinstr __sev_es_ist_exit(void)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
}
-/* Needed in vc_early_forward_exception */
-void do_early_exception(struct pt_regs *regs, int trapnr);
-
static inline u64 sev_es_rd_ghcb_msr(void)
{
return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
--
2.25.1
From: Tom Lendacky <[email protected]>
The initial implementation of the GHCB spec was based on trying to keep
the register state offsets the same relative to the VM save area. However,
the hardware save area for SEV-ES has since changed, causing its layout
to diverge from the GHCB save area.
This is the second step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create a GHCB save area that matches the GHCB specification.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 48 +++++++++++++++++++++++++++++++++++---
1 file changed, 45 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 0df489a70945..4a4de2454ca3 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -354,9 +354,49 @@ struct sev_es_save_area {
u64 x87_state_gpa;
} __packed;
+struct ghcb_save_area {
+ u8 reserved_1[203];
+ u8 cpl;
+ u8 reserved_2[116];
+ u64 xss;
+ u8 reserved_3[24];
+ u64 dr7;
+ u8 reserved_4[16];
+ u64 rip;
+ u8 reserved_5[88];
+ u64 rsp;
+ u8 reserved_6[24];
+ u64 rax;
+ u8 reserved_7[264];
+ u64 rcx;
+ u64 rdx;
+ u64 rbx;
+ u8 reserved_8[8];
+ u64 rbp;
+ u64 rsi;
+ u64 rdi;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+ u8 reserved_9[16];
+ u64 sw_exit_code;
+ u64 sw_exit_info_1;
+ u64 sw_exit_info_2;
+ u64 sw_scratch;
+ u8 reserved_10[56];
+ u64 xcr0;
+ u8 valid_bitmap[16];
+ u64 x87_state_gpa;
+} __packed;
+
struct ghcb {
- struct sev_es_save_area save;
- u8 reserved_save[2048 - sizeof(struct sev_es_save_area)];
+ struct ghcb_save_area save;
+ u8 reserved_save[2048 - sizeof(struct ghcb_save_area)];
u8 shared_buffer[2032];
@@ -367,6 +407,7 @@ struct ghcb {
#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
+#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE
@@ -374,6 +415,7 @@ struct ghcb {
static inline void __unused_size_checks(void)
{
BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
+ BUILD_BUG_ON(sizeof(struct ghcb_save_area) != EXPECTED_GHCB_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
@@ -444,7 +486,7 @@ struct vmcb {
/* GHCB Accessor functions */
#define GHCB_BITMAP_IDX(field) \
- (offsetof(struct sev_es_save_area, field) / sizeof(u64))
+ (offsetof(struct ghcb_save_area, field) / sizeof(u64))
#define DEFINE_GHCB_ACCESSORS(field) \
static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
--
2.25.1
From: Tom Lendacky <[email protected]>
The save area for SEV-ES/SEV-SNP guests, as used by the hardware, is
different from the save area of a non-SEV-ES/SEV-SNP guest.
This is the first step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create an SEV-ES/SEV-SNP save area and adjust usage to the new
save area definition where needed.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 83 +++++++++++++++++++++++++++++---------
arch/x86/kvm/svm/sev.c | 24 +++++------
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 77 insertions(+), 32 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 7c9cf4f3c164..0df489a70945 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -227,6 +227,7 @@ struct vmcb_seg {
u64 base;
} __packed;
+/* Save area definition for legacy and SEV-MEM guests */
struct vmcb_save_area {
struct vmcb_seg es;
struct vmcb_seg cs;
@@ -243,8 +244,58 @@ struct vmcb_save_area {
u8 cpl;
u8 reserved_2[4];
u64 efer;
+ u8 reserved_3[112];
+ u64 cr4;
+ u64 cr3;
+ u64 cr0;
+ u64 dr7;
+ u64 dr6;
+ u64 rflags;
+ u64 rip;
+ u8 reserved_4[88];
+ u64 rsp;
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
+ u64 rax;
+ u64 star;
+ u64 lstar;
+ u64 cstar;
+ u64 sfmask;
+ u64 kernel_gs_base;
+ u64 sysenter_cs;
+ u64 sysenter_esp;
+ u64 sysenter_eip;
+ u64 cr2;
+ u8 reserved_5[32];
+ u64 g_pat;
+ u64 dbgctl;
+ u64 br_from;
+ u64 br_to;
+ u64 last_excp_from;
+ u64 last_excp_to;
+ u8 reserved_6[72];
+ u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
+} __packed;
+
+/* Save area definition for SEV-ES and SEV-SNP guests */
+struct sev_es_save_area {
+ struct vmcb_seg es;
+ struct vmcb_seg cs;
+ struct vmcb_seg ss;
+ struct vmcb_seg ds;
+ struct vmcb_seg fs;
+ struct vmcb_seg gs;
+ struct vmcb_seg gdtr;
+ struct vmcb_seg ldtr;
+ struct vmcb_seg idtr;
+ struct vmcb_seg tr;
+ u8 reserved_1[43];
+ u8 cpl;
+ u8 reserved_2[4];
+ u64 efer;
u8 reserved_3[104];
- u64 xss; /* Valid for SEV-ES only */
+ u64 xss;
u64 cr4;
u64 cr3;
u64 cr0;
@@ -272,22 +323,14 @@ struct vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
-
- /*
- * The following part of the save area is valid only for
- * SEV-ES guests when referenced through the GHCB or for
- * saving to the host save area.
- */
- u8 reserved_7[72];
- u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
- u8 reserved_7b[4];
+ u8 reserved_7[80];
u32 pkru;
- u8 reserved_7a[20];
- u64 reserved_8; /* rax already available at 0x01f8 */
+ u8 reserved_9[20];
+ u64 reserved_10; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_9; /* rsp already available at 0x01d8 */
+ u64 reserved_11; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -299,21 +342,21 @@ struct vmcb_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_10[16];
+ u8 reserved_12[16];
u64 sw_exit_code;
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
u64 sev_features;
- u8 reserved_11[48];
+ u8 reserved_13[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
} __packed;
struct ghcb {
- struct vmcb_save_area save;
- u8 reserved_save[2048 - sizeof(struct vmcb_save_area)];
+ struct sev_es_save_area save;
+ u8 reserved_save[2048 - sizeof(struct sev_es_save_area)];
u8 shared_buffer[2032];
@@ -323,13 +366,15 @@ struct ghcb {
} __packed;
-#define EXPECTED_VMCB_SAVE_AREA_SIZE 1032
+#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE
static inline void __unused_size_checks(void)
{
BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
+ BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
}
@@ -399,7 +444,7 @@ struct vmcb {
/* GHCB Accessor functions */
#define GHCB_BITMAP_IDX(field) \
- (offsetof(struct vmcb_save_area, field) / sizeof(u64))
+ (offsetof(struct sev_es_save_area, field) / sizeof(u64))
#define DEFINE_GHCB_ACCESSORS(field) \
static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c36b5fe4c27c..4d3c5b302586 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -551,12 +551,20 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
- struct vmcb_save_area *save = &svm->vmcb->save;
+ struct sev_es_save_area *save = svm->vmsa;
/* Check some debug related fields before encrypting the VMSA */
- if (svm->vcpu.guest_debug || (save->dr7 & ~DR7_FIXED_1))
+ if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
return -EINVAL;
+ /*
+ * SEV-ES will use a VMSA that is pointed to by the VMCB, not
+ * the traditional VMSA that is part of the VMCB. Copy the
+ * traditional VMSA as it has been built so far (in prep
+ * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
+ */
+ memcpy(save, &svm->vmcb->save, sizeof(svm->vmcb->save));
+
+ /* Sync registers */
save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX];
save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX];
@@ -584,14 +592,6 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;
- /*
- * SEV-ES will use a VMSA that is pointed to by the VMCB, not
- * the traditional VMSA that is part of the VMCB. Copy the
- * traditional VMSA as it has been built so far (in prep
- * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
- */
- memcpy(svm->vmsa, save, sizeof(*save));
-
return 0;
}
@@ -2645,7 +2645,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm)
void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
{
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
- struct vmcb_save_area *hostsa;
+ struct sev_es_save_area *hostsa;
/*
* As an SEV-ES guest, hardware will restore the host state on VMEXIT,
@@ -2655,7 +2655,7 @@ void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
vmsave(__sme_page_pa(sd->save_area));
/* XCR0 is restored on VMEXIT, save the current host value */
- hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+ hostsa = (struct sev_es_save_area *)(page_address(sd->save_area) + 0x400);
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
/* PKRU is restored on VMEXIT, save the current host value */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 58350deb428b..689d99cd7b9d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -184,7 +184,7 @@ struct vcpu_svm {
} shadow_msr_intercept;
/* SEV-ES support */
- struct vmcb_save_area *vmsa;
+ struct sev_es_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
bool received_first_sipi;
--
2.25.1
From: Tom Lendacky <[email protected]>
To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP
AP Creation NAE event. This allows for guest control over the AP register
state rather than trusting the hypervisor with the SEV-ES Jump Table
address.
During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if
SEV-SNP is active, will set/override apic->wakeup_secondary_cpu. This
will allow the SEV-SNP AP Creation NAE event method to be used to boot
the APs. As a result of installing the override when SEV-SNP is active,
this method of starting the APs becomes the required method. The override
function will fail to start the AP if the hypervisor does not have
support for AP creation.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 1 +
arch/x86/include/asm/sev.h | 4 +
arch/x86/include/uapi/asm/svm.h | 5 +
arch/x86/kernel/sev.c | 205 ++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 3 +
5 files changed, 218 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index c2c5d60f0da0..c380aba9fc8d 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -104,6 +104,7 @@ enum psc_op {
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
#define GHCB_HV_FT_SNP BIT_ULL(0)
+#define GHCB_HV_FT_SNP_AP_CREATION (BIT_ULL(1) | GHCB_HV_FT_SNP)
/* SNP Page State Change NAE event */
#define VMGEXIT_PSC_MAX_ENTRY 253
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 005f230d0406..7f063127aa66 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -65,6 +65,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMPADJUST_VMSA_PAGE_BIT BIT(16)
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -111,6 +113,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+void snp_set_wakeup_secondary_cpu(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -125,6 +128,7 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned i
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_wakeup_secondary_cpu(void) { }
#endif
#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 0dcdb6e0c913..8b4c57baec52 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,10 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_AP_CREATION 0x80000013
+#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
+#define SVM_VMGEXIT_AP_CREATE 1
+#define SVM_VMGEXIT_AP_DESTROY 2
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
@@ -221,6 +225,7 @@
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 80fdfd83770a..dfb5b2920933 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -18,6 +18,7 @@
#include <linux/memblock.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/cpumask.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -31,6 +32,7 @@
#include <asm/svm.h>
#include <asm/smp.h>
#include <asm/cpu.h>
+#include <asm/apic.h>
#define DR7_RESET_VALUE 0x400
@@ -94,6 +96,8 @@ struct ghcb_state {
static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
+static DEFINE_PER_CPU(struct sev_es_save_area *, snp_vmsa);
+
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -820,6 +824,207 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
pvalidate_pages(vaddr, npages, 1);
}
+static int rmpadjust(void *va, bool vmsa)
+{
+ u64 attrs;
+ int err;
+
+ /*
+ * The RMPADJUST instruction is used to set or clear the VMSA bit for
+ * a page. A change to the VMSA bit is only performed when running
+ * at VMPL0 and is ignored at other VMPL levels. If too low of a target
+ * VMPL level is specified, the instruction can succeed without changing
+ * the VMSA bit should the kernel not be in VMPL0. Using a target VMPL
+ * level of 1 will return a FAIL_PERMISSION error if the kernel is not
+ * at VMPL0, thus ensuring that the VMSA bit has been properly set when
+ * no error is returned.
+ */
+ attrs = 1;
+ if (vmsa)
+ attrs |= RMPADJUST_VMSA_PAGE_BIT;
+
+ /* Instruction mnemonic supported in binutils versions v2.36 and later */
+ asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
+ : "=a" (err)
+ : "a" (va), "c" (RMP_PG_SIZE_4K), "d" (attrs)
+ : "memory", "cc");
+
+ return err;
+}
+
+#define __ATTR_BASE (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
+#define INIT_CS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
+#define INIT_DS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
+
+#define INIT_LDTR_ATTRIBS (SVM_SELECTOR_P_MASK | 2)
+#define INIT_TR_ATTRIBS (SVM_SELECTOR_P_MASK | 3)
+
+static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
+{
+ struct sev_es_save_area *cur_vmsa, *vmsa;
+ struct ghcb_state state;
+ unsigned long flags;
+ struct ghcb *ghcb;
+ int cpu, err, ret;
+ u8 sipi_vector;
+ u64 cr4;
+
+ if ((sev_hv_features & GHCB_HV_FT_SNP_AP_CREATION) != GHCB_HV_FT_SNP_AP_CREATION)
+ return -EOPNOTSUPP;
+
+ /*
+ * Verify the desired start IP against the known trampoline start IP
+ * to catch any future new trampolines that may be introduced that
+ * would require a new protected guest entry point.
+ */
+ if (WARN_ONCE(start_ip != real_mode_header->trampoline_start,
+ "Unsupported SEV-SNP start_ip: %lx\n", start_ip))
+ return -EINVAL;
+
+ /* Override start_ip with known protected guest start IP */
+ start_ip = real_mode_header->sev_es_trampoline_start;
+
+ /* Find the logical CPU for the APIC ID */
+ for_each_present_cpu(cpu) {
+ if (arch_match_cpu_phys_id(cpu, apic_id))
+ break;
+ }
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ cur_vmsa = per_cpu(snp_vmsa, cpu);
+
+ /*
+ * A new VMSA is created each time because there is no guarantee that
+ * the current VMSA is the kernel's or that the vCPU is not running. If
+ * an attempt was done to use the current VMSA with a running vCPU, a
+ * #VMEXIT of that vCPU would wipe out all of the settings being done
+ * here.
+ */
+ vmsa = (struct sev_es_save_area *)get_zeroed_page(GFP_KERNEL);
+ if (!vmsa)
+ return -ENOMEM;
+
+ /* CR4 should maintain the MCE value */
+ cr4 = native_read_cr4() & X86_CR4_MCE;
+
+ /* Set the CS value based on the start_ip converted to a SIPI vector */
+ sipi_vector = (start_ip >> 12);
+ vmsa->cs.base = sipi_vector << 12;
+ vmsa->cs.limit = 0xffff;
+ vmsa->cs.attrib = INIT_CS_ATTRIBS;
+ vmsa->cs.selector = sipi_vector << 8;
+
+ /* Set the RIP value based on start_ip */
+ vmsa->rip = start_ip & 0xfff;
+
+ /* Set VMSA entries to the INIT values as documented in the APM */
+ vmsa->ds.limit = 0xffff;
+ vmsa->ds.attrib = INIT_DS_ATTRIBS;
+ vmsa->es = vmsa->ds;
+ vmsa->fs = vmsa->ds;
+ vmsa->gs = vmsa->ds;
+ vmsa->ss = vmsa->ds;
+
+ vmsa->gdtr.limit = 0xffff;
+ vmsa->ldtr.limit = 0xffff;
+ vmsa->ldtr.attrib = INIT_LDTR_ATTRIBS;
+ vmsa->idtr.limit = 0xffff;
+ vmsa->tr.limit = 0xffff;
+ vmsa->tr.attrib = INIT_TR_ATTRIBS;
+
+ vmsa->efer = 0x1000; /* Must set SVME bit */
+ vmsa->cr4 = cr4;
+ vmsa->cr0 = 0x60000010;
+ vmsa->dr7 = 0x400;
+ vmsa->dr6 = 0xffff0ff0;
+ vmsa->rflags = 0x2;
+ vmsa->g_pat = 0x0007040600070406ULL;
+ vmsa->xcr0 = 0x1;
+ vmsa->mxcsr = 0x1f80;
+ vmsa->x87_ftw = 0x5555;
+ vmsa->x87_fcw = 0x0040;
+
+ /*
+ * Set the SNP-specific fields for this VMSA:
+ * VMPL level
+ * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
+ */
+ vmsa->vmpl = 0;
+ vmsa->sev_features = sev_status >> 2;
+
+ /* Switch the page over to a VMSA page now that it is initialized */
+ ret = rmpadjust(vmsa, true);
+ if (ret) {
+ pr_err("set VMSA page failed (%u)\n", ret);
+ free_page((unsigned long)vmsa);
+
+ return -EINVAL;
+ }
+
+ /* Issue VMGEXIT AP Creation NAE event */
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+
+ vc_ghcb_invalidate(ghcb);
+ ghcb_set_rax(ghcb, vmsa->sev_features);
+ ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
+ ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
+ ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+ VMGEXIT();
+
+ if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
+ lower_32_bits(ghcb->save.sw_exit_info_1)) {
+ pr_alert("SNP AP Creation error\n");
+ ret = -EINVAL;
+ }
+
+ __sev_put_ghcb(&state);
+
+ local_irq_restore(flags);
+
+ /* Perform cleanup if there was an error */
+ if (ret) {
+ err = rmpadjust(vmsa, false);
+ if (err)
+ pr_err("clear VMSA page failed (%u), leaking page\n", err);
+ else
+ free_page((unsigned long)vmsa);
+
+ vmsa = NULL;
+ }
+
+ /* Free up any previous VMSA page */
+ if (cur_vmsa) {
+ err = rmpadjust(cur_vmsa, false);
+ if (err)
+ pr_err("clear VMSA page failed (%u), leaking page\n", err);
+ else
+ free_page((unsigned long)cur_vmsa);
+ }
+
+ /* Record the current VMSA page */
+ per_cpu(snp_vmsa, cpu) = vmsa;
+
+ return ret;
+}
+
+void snp_set_wakeup_secondary_cpu(void)
+{
+ if (!cc_platform_has(CC_ATTR_SEV_SNP))
+ return;
+
+ /*
+ * Always set this override if SEV-SNP is enabled. This makes it the
+ * required method to start APs under SEV-SNP. If the hypervisor does
+ * not support AP creation, then no APs will be started.
+ */
+ apic->wakeup_secondary_cpu = wakeup_cpu_via_vmgexit;
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c453b825a57f..b04cf8ebcb37 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -82,6 +82,7 @@
#include <asm/spec-ctrl.h>
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
+#include <asm/sev.h>
#ifdef CONFIG_ACPI_CPPC_LIB
#include <acpi/cppc_acpi.h>
@@ -1380,6 +1381,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
smp_quirk_init_udelay();
speculative_store_bypass_ht_init();
+
+ snp_set_wakeup_secondary_cpu();
}
void arch_thaw_secondary_cpus_begin(void)
--
2.25.1
Virtual Machine Privilege Level (VMPL) is an optional feature of the
SEV-SNP architecture that allows a guest VM to divide its address space
into four privilege levels. These levels can be used to provide
hardware-isolated abstraction layers within a VM. VMPL0 is the most
privileged level and VMPL3 the least. Certain operations must be done by
VMPL0 software, such as:
* Validate or invalidate memory range (PVALIDATE instruction)
* Allocate VMSA page (RMPADJUST instruction when VMSA=1)
The initial SEV-SNP support assumes that the guest kernel is running at
VMPL0. Add a check to make sure that the kernel is running at VMPL0
before continuing the boot. There is no easy method to query the current
VMPL level, so use the RMPADJUST instruction to determine whether the
guest was booted at VMPL0.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 41 ++++++++++++++++++++++++++++---
arch/x86/include/asm/sev-common.h | 1 +
arch/x86/include/asm/sev.h | 3 +++
3 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 8b0f892c072b..cf24cc2af40a 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -124,6 +124,36 @@ static inline bool sev_snp_enabled(void)
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}
+static bool is_vmpl0(void)
+{
+ u64 attrs, va;
+ int err;
+
+ /*
+ * There is no straightforward way to query the current VMPL level. The
+ * simplest method is to use the RMPADJUST instruction to change a page
+ * permission for VMPL1; if the guest kernel is not running at VMPL0,
+ * the RMPADJUST instruction will return an error.
+ */
+ attrs = 1;
+
+ /*
+ * Any page-aligned virtual address is sufficient to test the VMPL level.
+ * The boot_ghcb_page is page-aligned memory, so let's use it for the test.
+ */
+ va = (u64)&boot_ghcb_page;
+
+ /* Instruction mnemonic supported in binutils versions v2.36 and later */
+ asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
+ : "=a" (err)
+ : "a" (va), "c" (RMP_PG_SIZE_4K), "d" (attrs)
+ : "memory", "cc");
+ if (err)
+ return false;
+
+ return true;
+}
+
static bool do_early_sev_setup(void)
{
if (!sev_es_negotiate_protocol())
@@ -131,10 +161,15 @@ static bool do_early_sev_setup(void)
/*
* If SEV-SNP is enabled, then check if the hypervisor supports the SEV-SNP
- * features.
+ * features and is launched at VMPL-0 level.
*/
- if (sev_snp_enabled() && !(sev_hv_features & GHCB_HV_FT_SNP))
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+ if (sev_snp_enabled()) {
+ if (!(sev_hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
+ if (!is_vmpl0())
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
+ }
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index f80a3cde2086..d426c30ae7b4 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -89,6 +89,7 @@
#define GHCB_TERM_REGISTER 0 /* GHCB GPA registration failure */
#define GHCB_TERM_PSC 1 /* Page State Change failure */
#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
+#define GHCB_TERM_NOT_VMPL0 3 /* SNP guest is not running at VMPL-0 */
#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index b308815a2c01..242af1154e49 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -62,6 +62,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* Software defined (when rFlags.CF = 1) */
#define PVALIDATE_FAIL_NOUPDATE 255
+/* RMP page size */
+#define RMP_PG_SIZE_4K 0
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
--
2.25.1
The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification section GHCB GPA Registration.
During boot, init_ghcb() allocates a per-CPU GHCB page. On the very first
#VC exception, the exception handler switches to using the per-CPU GHCB page
allocated by init_ghcb(). The GHCB page must be registered in the
current vCPU context.
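For reference, the MSR-protocol registration exchange uses request code 0x012 and response code 0x013, with the GHCB GPA carried in bits 63:12. A minimal encoding sketch (the constants follow the GHCB specification; the helper names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define GHCB_MSR_INFO_MASK	0xfffULL	/* bits 11:0 carry the request/response code */
#define GHCB_MSR_REG_GPA_REQ	0x012ULL
#define GHCB_MSR_REG_GPA_RESP	0x013ULL

/* Build the MSR value that asks the hypervisor to register @gpa. */
static uint64_t ghcb_gpa_register_req(uint64_t gpa)
{
	return GHCB_MSR_REG_GPA_REQ | (gpa & ~GHCB_MSR_INFO_MASK);
}

/* Registration succeeded if the response echoes the same GPA back. */
static int ghcb_gpa_register_ok(uint64_t resp, uint64_t gpa)
{
	return (resp & GHCB_MSR_INFO_MASK) == GHCB_MSR_REG_GPA_RESP &&
	       (resp & ~GHCB_MSR_INFO_MASK) == (gpa & ~GHCB_MSR_INFO_MASK);
}
```

The guest writes the request to the GHCB MSR, issues VMGEXIT, and reads the response; a mismatched GPA in the response means the hypervisor wants a different GHCB location.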
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev.c | 124 +++++++++++++++++++++++++-----------------
1 file changed, 75 insertions(+), 49 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 2290fbcc1844..4c891d5d9651 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -77,6 +77,13 @@ struct sev_es_runtime_data {
* is currently unsupported in SEV-ES guests.
*/
unsigned long dr7;
+
+ /*
+ * SEV-SNP requires that the GHCB must be registered before using it.
+ * The flag below indicates whether the GHCB is registered; if it is
+ * not registered, then __sev_get_ghcb() will perform the registration.
+ */
+ bool snp_ghcb_registered;
};
struct ghcb_state {
@@ -160,55 +167,6 @@ void noinstr __sev_es_ist_exit(void)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
}
-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
-{
- struct sev_es_runtime_data *data;
- struct ghcb *ghcb;
-
- WARN_ON(!irqs_disabled());
-
- data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;
-
- if (unlikely(data->ghcb_active)) {
- /* GHCB is already in use - save its contents */
-
- if (unlikely(data->backup_ghcb_active)) {
- /*
- * Backup-GHCB is also already in use. There is no way
- * to continue here so just kill the machine. To make
- * panic() work, mark GHCBs inactive so that messages
- * can be printed out.
- */
- data->ghcb_active = false;
- data->backup_ghcb_active = false;
-
- instrumentation_begin();
- panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
- instrumentation_end();
- }
-
- /* Mark backup_ghcb active before writing to it */
- data->backup_ghcb_active = true;
-
- state->ghcb = &data->backup_ghcb;
-
- /* Backup GHCB content */
- *state->ghcb = *ghcb;
- } else {
- state->ghcb = NULL;
- data->ghcb_active = true;
- }
-
- return ghcb;
-}
-
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
@@ -464,6 +422,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"
+static void snp_register_ghcb(struct sev_es_runtime_data *data, unsigned long paddr)
+{
+ if (data->snp_ghcb_registered)
+ return;
+
+ snp_register_ghcb_early(paddr);
+
+ data->snp_ghcb_registered = true;
+}
+
+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (unlikely(data->ghcb_active)) {
+ /* GHCB is already in use - save its contents */
+
+ if (unlikely(data->backup_ghcb_active)) {
+ /*
+ * Backup-GHCB is also already in use. There is no way
+ * to continue here so just kill the machine. To make
+ * panic() work, mark GHCBs inactive so that messages
+ * can be printed out.
+ */
+ data->ghcb_active = false;
+ data->backup_ghcb_active = false;
+
+ instrumentation_begin();
+ panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
+ instrumentation_end();
+ }
+
+ /* Mark backup_ghcb active before writing to it */
+ data->backup_ghcb_active = true;
+
+ state->ghcb = &data->backup_ghcb;
+
+ /* Backup GHCB content */
+ *state->ghcb = *ghcb;
+ } else {
+ state->ghcb = NULL;
+ data->ghcb_active = true;
+ }
+
+ /* SEV-SNP guest requires that GHCB must be registered. */
+ if (cc_platform_has(CC_ATTR_SEV_SNP))
+ snp_register_ghcb(data, __pa(ghcb));
+
+ return ghcb;
+}
+
static noinstr void __sev_put_ghcb(struct ghcb_state *state)
{
struct sev_es_runtime_data *data;
@@ -650,6 +671,10 @@ static bool __init setup_ghcb(void)
/* Alright - Make the boot-ghcb public */
boot_ghcb = &boot_ghcb_page;
+ /* SEV-SNP guest requires that GHCB GPA must be registered. */
+ if (cc_platform_has(CC_ATTR_SEV_SNP))
+ snp_register_ghcb_early(__pa(&boot_ghcb_page));
+
return true;
}
@@ -739,6 +764,7 @@ static void __init init_ghcb(int cpu)
data->ghcb_active = false;
data->backup_ghcb_active = false;
+ data->snp_ghcb_registered = false;
}
void __init sev_es_init_vc_handling(void)
--
2.25.1
From: Michael Roth <[email protected]>
As of commit 103a4908ad4d ("x86/head/64: Disable stack protection for
head$(BITS).o") kernel/head64.c is compiled with -fno-stack-protector
to allow a call to set_bringup_idt_handler(), which would otherwise
have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG. While
sufficient for that case, there may still be issues with calls to any
external functions that were compiled with stack protection enabled that
in turn make stack-protected calls, or if the exception handlers set up
by set_bringup_idt_handler() make calls to stack-protected functions.
As part of 103a4908ad4d, stack protection was also disabled for
kernel/head32.c as a precaution.
Subsequent patches for SEV-SNP CPUID validation support will introduce
both such cases. Attempting to disable stack protection for everything
in scope to address that is prohibitive since much of the code, like
SEV-ES #VC handler, is shared code that remains in use after boot and
could benefit from having stack protection enabled. Attempting to inline
calls is brittle and can quickly balloon out to library/helper code
where that's not really an option.
Instead, re-enable stack protection for head32.c/head64.c and make the
appropriate changes to ensure the segment used for the stack canary is
initialized in advance of any stack-protected C calls.
for head64.c:
- The BSP will enter from startup_64 and call into C code
(startup_64_setup_env) shortly after setting up the stack, which may
result in calls to stack-protected code. Set up %gs early to allow
for this safely.
- APs will enter from secondary_startup_64*, and %gs will be set up
soon after. There is one call to C code prior to this
(__startup_secondary_64), but it is only to fetch sme_me_mask, and
unlikely to be stack-protected, so leave things as they are, but add
a note about this in case things change in the future.
for head32.c:
- BSPs/APs will set %fs to __BOOT_DS prior to any C calls. In recent
kernels, the compiler is configured to access the stack canary at
%fs:__stack_chk_guard, which overlaps with the initial per-cpu
__stack_chk_guard variable in the initial/'master' .data..percpu
area. This is sufficient to allow access to the canary for use
during initial startup, so no changes are needed there.
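The %gs fixup added to startup_64 in the diff applies the same rebase as fixup_pointer(): subtract the link-time _text address and add the runtime load address. A sketch of that arithmetic, with illustrative addresses:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rebase a link-time virtual address onto the actual load address, as
 * fixup_pointer() does during early boot before the kernel page tables
 * (and therefore link-time virtual addresses) are usable.
 */
static uint64_t fixup_pointer(uint64_t ptr, uint64_t link_text,
			      uint64_t phys_text)
{
	return ptr - link_text + phys_text;
}
```

In the assembly above, %rdi holds the runtime _text address and the fixed-up initial_gs value is written to MSR_GS_BASE so the canary at %gs:40 resolves correctly.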
Suggested-by: Joerg Roedel <[email protected]> #for 64-bit %gs set up
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/Makefile | 1 -
arch/x86/kernel/head_64.S | 24 ++++++++++++++++++++++++
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..4df8c8f7d2ac 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -48,7 +48,6 @@ endif
# non-deterministic coverage.
KCOV_INSTRUMENT := n
-CFLAGS_head$(BITS).o += -fno-stack-protector
CFLAGS_cc_platform.o += -fno-stack-protector
CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8b3ebd2bb85..7074ebf2b47b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -65,6 +65,22 @@ SYM_CODE_START_NOALIGN(startup_64)
leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp
leaq _text(%rip), %rdi
+
+ /*
+ * initial_gs points to the initial fixed_percpu_data struct with storage for
+ * the stack protector canary. Global pointer fixups are needed at this
+ * stage, so apply them as is done in fixup_pointer(), and initialize %gs
+ * such that the canary can be accessed at %gs:40 for subsequent C calls.
+ */
+ movl $MSR_GS_BASE, %ecx
+ movq initial_gs(%rip), %rax
+ movq $_text, %rdx
+ subq %rdx, %rax
+ addq %rdi, %rax
+ movq %rax, %rdx
+ shrq $32, %rdx
+ wrmsr
+
pushq %rsi
call startup_64_setup_env
popq %rsi
@@ -133,6 +149,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* added to the initial pgdir entry that will be programmed into CR3.
*/
pushq %rsi
+ /*
+ * NOTE: %gs at this point is a stale data segment left over from the
+ * real-mode trampoline, so the default stack protector canary location
+ * at %gs:40 does not yet coincide with the expected fixed_percpu_data struct
+ * that contains storage for the stack canary. So take care not to add
+ * anything to the C functions in this path that would result in stack
+ * protected C code being generated.
+ */
call __startup_secondary_64
popq %rsi
--
2.25.1
From: Michael Roth <[email protected]>
Future patches for SEV-SNP-validated CPUID will also require early
parsing of the EFI configuration. Incrementally move the related code
into a set of helpers that can be re-used for that purpose.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/acpi.c | 18 ++++-----
arch/x86/boot/compressed/efi.c | 64 +++++++++++++++++++++++++++++++
arch/x86/boot/compressed/misc.h | 14 +++++++
4 files changed, 87 insertions(+), 10 deletions(-)
create mode 100644 arch/x86/boot/compressed/efi.c
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 431bf7f846c3..d364192c2367 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -100,6 +100,7 @@ endif
vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
+vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
$(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 8bcbcee54aa1..255f6959c090 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -86,8 +86,8 @@ static acpi_physical_address kexec_get_rsdp_addr(void)
{
efi_system_table_64_t *systab;
struct efi_setup_data *esd;
- struct efi_info *ei;
- char *sig;
+ bool efi_64;
+ int ret;
esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
if (!esd)
@@ -98,18 +98,16 @@ static acpi_physical_address kexec_get_rsdp_addr(void)
return 0;
}
- ei = &boot_params->efi_info;
- sig = (char *)&ei->efi_loader_signature;
- if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+ /* Get systab from boot params. */
+ ret = efi_get_system_table(boot_params, (unsigned long *)&systab, &efi_64);
+ if (ret)
+ error("EFI system table not found in kexec boot_params.");
+
+ if (!efi_64) {
debug_putstr("Wrong kexec EFI loader signature.\n");
return 0;
}
- /* Get systab from boot params. */
- systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
- if (!systab)
- error("EFI system table not found in kexec boot_params.");
-
return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables, true);
}
#else
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
new file mode 100644
index 000000000000..306b287b7368
--- /dev/null
+++ b/arch/x86/boot/compressed/efi.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Helpers for early access to EFI configuration table
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Michael Roth <[email protected]>
+ */
+
+#include "misc.h"
+#include <linux/efi.h>
+#include <asm/efi.h>
+
+/**
+ * Given boot_params, retrieve the physical address of EFI system table.
+ *
+ * @boot_params: pointer to boot_params
+ * @sys_tbl_pa: location to store physical address of system table
+ * @is_efi_64: location to store whether using 64-bit EFI or not
+ *
+ * Returns 0 on success. On error, return params are left unchanged.
+ */
+int efi_get_system_table(struct boot_params *boot_params, unsigned long *sys_tbl_pa,
+ bool *is_efi_64)
+{
+ unsigned long sys_tbl;
+ struct efi_info *ei;
+ bool efi_64;
+ char *sig;
+
+ if (!sys_tbl_pa || !is_efi_64)
+ return -EINVAL;
+
+ ei = &boot_params->efi_info;
+ sig = (char *)&ei->efi_loader_signature;
+
+ if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+ efi_64 = true;
+ } else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4)) {
+ efi_64 = false;
+ } else {
+ debug_putstr("Wrong EFI loader signature.\n");
+ return -ENOENT;
+ }
+
+ /* Get systab from boot params. */
+#ifdef CONFIG_X86_64
+ sys_tbl = ei->efi_systab | ((__u64)ei->efi_systab_hi << 32);
+#else
+ if (ei->efi_systab_hi || ei->efi_memmap_hi) {
+ debug_putstr("Error: EFI system table located above 4GB.\n");
+ return -EINVAL;
+ }
+ sys_tbl = ei->efi_systab;
+#endif
+ if (!sys_tbl) {
+ debug_putstr("EFI system table not found.\n");
+ return -ENOENT;
+ }
+
+ *sys_tbl_pa = sys_tbl;
+ *is_efi_64 = efi_64;
+ return 0;
+}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 822e0c254b9a..f86ff866fd7a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -21,6 +21,7 @@
#include <linux/screen_info.h>
#include <linux/elf.h>
#include <linux/io.h>
+#include <linux/efi.h>
#include <asm/page.h>
#include <asm/boot.h>
#include <asm/bootparam.h>
@@ -174,4 +175,17 @@ void boot_stage2_vc(void);
unsigned long sev_verify_cbit(unsigned long cr3);
+#ifdef CONFIG_EFI
+/* helpers for early EFI config table access */
+int efi_get_system_table(struct boot_params *boot_params,
+ unsigned long *sys_tbl_pa, bool *is_efi_64);
+#else
+static inline int
+efi_get_system_table(struct boot_params *boot_params,
+ unsigned long *sys_tbl_pa, bool *is_efi_64)
+{
+ return -ENOENT;
+}
+#endif /* CONFIG_EFI */
+
#endif /* BOOT_COMPRESSED_MISC_H */
--
2.25.1
The early_set_memory_{encrypt,decrypt}() helpers are used to change a
page from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.
If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:
1. Issue the page state change VMGEXIT to add the page as private
in the RMP table.
2. Validate the page after it is successfully added to the RMP table.
To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before clearing the
encryption attribute from the page table.
1. Invalidate the page.
2. Issue the page state change VMGEXIT to make the page shared in the
RMP table.
Since early_set_memory_{encrypt,decrypt}() can be called before the GHCB
is set up, use the SNP page state MSR protocol defined in the GHCB
specification to request the page state change in the RMP table.
While at it, add a helper snp_prep_memory() that can be used outside
the SEV-specific files to change the page state for a specified memory
range.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 10 ++++
arch/x86/kernel/sev.c | 102 +++++++++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt.c | 51 +++++++++++++++++--
3 files changed, 159 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 242af1154e49..ecd8cd8c5908 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -104,6 +104,11 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
return rc;
}
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages);
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages);
+void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -111,6 +116,11 @@ static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { ret
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
+static inline void __init
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+static inline void __init
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
#endif
#endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index ad3fefb741e1..488011479678 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -553,6 +553,108 @@ static u64 get_jump_table_addr(void)
return ret;
}
+static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
+{
+ unsigned long vaddr_end;
+ int rc;
+
+ vaddr = vaddr & PAGE_MASK;
+ vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+ while (vaddr < vaddr_end) {
+ rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+ if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+
+ vaddr = vaddr + PAGE_SIZE;
+ }
+}
+
+static void __init early_set_page_state(unsigned long paddr, unsigned int npages, enum psc_op op)
+{
+ unsigned long paddr_end;
+ u64 val;
+
+ paddr = paddr & PAGE_MASK;
+ paddr_end = paddr + (npages << PAGE_SHIFT);
+
+ while (paddr < paddr_end) {
+ /*
+ * Use the MSR protocol because this function can be called before the GHCB
+ * is established.
+ */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ if (WARN(GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP,
+ "Wrong PSC response code: 0x%x\n",
+ (unsigned int)GHCB_RESP_CODE(val)))
+ goto e_term;
+
+ if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
+ "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
+ op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
+ paddr, GHCB_MSR_PSC_RESP_VAL(val)))
+ goto e_term;
+
+ paddr = paddr + PAGE_SIZE;
+ }
+
+ return;
+
+e_term:
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+}
+
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages)
+{
+ if (!cc_platform_has(CC_ATTR_SEV_SNP))
+ return;
+
+ /*
+ * Ask the hypervisor to mark the memory pages as private in the RMP
+ * table.
+ */
+ early_set_page_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+ /* Validate the memory pages after they've been added in the RMP table. */
+ pvalidate_pages(vaddr, npages, 1);
+}
+
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages)
+{
+ if (!cc_platform_has(CC_ATTR_SEV_SNP))
+ return;
+
+ /*
+ * Invalidate the memory pages before they are marked shared in the
+ * RMP table.
+ */
+ pvalidate_pages(vaddr, npages, 0);
+
+ /* Ask hypervisor to mark the memory pages shared in the RMP table. */
+ early_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
+}
+
+void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op)
+{
+ unsigned long vaddr, npages;
+
+ vaddr = (unsigned long)__va(paddr);
+ npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ if (op == SNP_PAGE_STATE_PRIVATE)
+ early_snp_set_memory_private(vaddr, paddr, npages);
+ else if (op == SNP_PAGE_STATE_SHARED)
+ early_snp_set_memory_shared(vaddr, paddr, npages);
+ else
+ WARN(1, "invalid memory op %d\n", op);
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 534c2c82fbec..d01bb95f7aef 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -31,6 +31,7 @@
#include <asm/processor-flags.h>
#include <asm/msr.h>
#include <asm/cmdline.h>
+#include <asm/sev.h>
#include "mm_internal.h"
@@ -49,6 +50,34 @@ EXPORT_SYMBOL_GPL(sev_enable_key);
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
+/*
+ * When SNP is active, change the page state from private to shared before
+ * copying the data from the source to destination and restore after the copy.
+ * This is required because the source address is mapped as decrypted by the
+ * caller of the routine.
+ */
+static inline void __init snp_memcpy(void *dst, void *src, size_t sz,
+ unsigned long paddr, bool decrypt)
+{
+ unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ if (!cc_platform_has(CC_ATTR_SEV_SNP) || !decrypt) {
+ memcpy(dst, src, sz);
+ return;
+ }
+
+ /*
+ * With SNP, the paddr needs to be accessed decrypted, mark the page
+ * shared in the RMP table before copying it.
+ */
+ early_snp_set_memory_shared((unsigned long)__va(paddr), paddr, npages);
+
+ memcpy(dst, src, sz);
+
+ /* Restore the page state after the memcpy. */
+ early_snp_set_memory_private((unsigned long)__va(paddr), paddr, npages);
+}
+
/*
* This routine does not change the underlying encryption setting of the
* page(s) that map this memory. It assumes that eventually the memory is
@@ -97,8 +126,8 @@ static void __init __sme_early_enc_dec(resource_size_t paddr,
* Use a temporary buffer, of cache-line multiple size, to
* avoid data corruption as documented in the APM.
*/
- memcpy(sme_early_buffer, src, len);
- memcpy(dst, sme_early_buffer, len);
+ snp_memcpy(sme_early_buffer, src, len, paddr, enc);
+ snp_memcpy(dst, sme_early_buffer, len, paddr, !enc);
early_memunmap(dst, len);
early_memunmap(src, len);
@@ -273,14 +302,28 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
clflush_cache_range(__va(pa), size);
/* Encrypt/decrypt the contents in-place */
- if (enc)
+ if (enc) {
sme_early_encrypt(pa, size);
- else
+ } else {
sme_early_decrypt(pa, size);
+ /*
+ * On SNP, the page state change in the RMP table must happen
+ * before the page table updates.
+ */
+ early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);
+ }
+
/* Change the page encryption mask. */
new_pte = pfn_pte(pfn, new_prot);
set_pte_atomic(kpte, new_pte);
+
+ /*
+ * If page is set encrypted in the page table, then update the RMP table to
+ * add this page as private.
+ */
+ if (enc)
+ early_snp_set_memory_private((unsigned long)__va(pa), pa, 1);
}
static int __init early_set_memory_enc_dec(unsigned long vaddr,
--
2.25.1
From: Michael Roth <[email protected]>
This code will also be used later for SEV-SNP-validated CPUID code in
some cases, so move it to a common helper.
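The moved helper fetches each register with one MSR-protocol round trip. The encoding it relies on, per the GHCB specification, puts the request code 0x004 in bits 11:0, the register index in bits 31:30, and the CPUID function in bits 63:32; the response (code 0x005) returns the value in bits 63:32. A sketch of just the encoding side:

```c
#include <assert.h>
#include <stdint.h>

#define GHCB_MSR_CPUID_REQ	0x004ULL
#define GHCB_MSR_CPUID_RESP	0x005ULL
#define GHCB_CPUID_REQ_EAX	0
#define GHCB_CPUID_REQ_EBX	1
#define GHCB_CPUID_REQ_ECX	2
#define GHCB_CPUID_REQ_EDX	3

/*
 * One MSR-protocol CPUID request: function in GHCBData[63:32],
 * register index in GHCBData[31:30], request code 0x004 in GHCBInfo.
 */
static uint64_t ghcb_cpuid_req(uint32_t func, int reg)
{
	return GHCB_MSR_CPUID_REQ | (((uint64_t)reg & 3) << 30) |
	       ((uint64_t)func << 32);
}

/* The requested register value comes back in bits 63:32 of the response. */
static uint32_t ghcb_cpuid_resp_val(uint64_t resp)
{
	return (uint32_t)(resp >> 32);
}
```

Fetching all four registers therefore costs four VMGEXITs, which is why the GHCB-based protocol is preferred once a GHCB page is available.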
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev-shared.c | 84 +++++++++++++++++++++++++-----------
1 file changed, 58 insertions(+), 26 deletions(-)
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 2b53b622108f..402b19f1c75d 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -193,6 +193,58 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
return ret;
}
+static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+ u32 *ecx, u32 *edx)
+{
+ u64 val;
+
+ if (eax) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_EAX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *eax = (val >> 32);
+ }
+
+ if (ebx) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_EBX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *ebx = (val >> 32);
+ }
+
+ if (ecx) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_ECX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *ecx = (val >> 32);
+ }
+
+ if (edx) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_EDX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *edx = (val >> 32);
+ }
+
+ return 0;
+}
+
/*
* Boot VC Handler - This is the first VC handler during boot, there is no GHCB
* page yet, so it only supports the MSR based communication with the
@@ -201,7 +253,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
unsigned int fn = lower_bits(regs->ax, 32);
- unsigned long val;
+ u32 eax, ebx, ecx, edx;
/* Only CPUID is supported via MSR protocol */
if (exit_code != SVM_EXIT_CPUID)
@@ -221,33 +273,13 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
sev_status = (hi << 32) | lo;
}
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ if (sev_cpuid_hv(fn, 0, &eax, &ebx, &ecx, &edx))
goto fail;
- regs->ax = val >> 32;
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EBX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
- goto fail;
- regs->bx = val >> 32;
-
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_ECX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
- goto fail;
- regs->cx = val >> 32;
-
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EDX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
- goto fail;
- regs->dx = val >> 32;
+ regs->ax = eax;
+ regs->bx = ebx;
+ regs->cx = ecx;
+ regs->dx = edx;
/*
* This is a VC handler and the #VC is only raised when SEV-ES is
--
2.25.1
From: Michael Roth <[email protected]>
Future patches for SEV-SNP-validated CPUID will also require early
parsing of the EFI configuration. Incrementally move the related code
into a set of helpers that can be re-used for that purpose.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/acpi.c | 50 ++++++++-----------------
arch/x86/boot/compressed/efi.c | 65 +++++++++++++++++++++++++++++++++
arch/x86/boot/compressed/misc.h | 9 +++++
3 files changed, 90 insertions(+), 34 deletions(-)
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index d43ff3ff573b..f2a6092738cd 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -20,46 +20,28 @@
*/
struct mem_vector immovable_mem[MAX_NUMNODES*2];
-/*
- * Search EFI system tables for RSDP. If both ACPI_20_TABLE_GUID and
- * ACPI_TABLE_GUID are found, take the former, which has more features.
- */
static acpi_physical_address
-__efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
- bool efi_64)
+__efi_get_rsdp_addr(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len, bool efi_64)
{
acpi_physical_address rsdp_addr = 0;
#ifdef CONFIG_EFI
- int i;
-
- /* Get EFI tables from systab. */
- for (i = 0; i < nr_tables; i++) {
- acpi_physical_address table;
- efi_guid_t guid;
-
- if (efi_64) {
- efi_config_table_64_t *tbl = (efi_config_table_64_t *)config_tables + i;
-
- guid = tbl->guid;
- table = tbl->table;
-
- if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
- debug_putstr("Error getting RSDP address: EFI config table located above 4GB.\n");
- return 0;
- }
- } else {
- efi_config_table_32_t *tbl = (efi_config_table_32_t *)config_tables + i;
-
- guid = tbl->guid;
- table = tbl->table;
- }
+ int ret;
- if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
- rsdp_addr = table;
- else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
- return table;
- }
+ /*
+ * Search EFI system tables for RSDP. ACPI_20_TABLE_GUID is preferred
+ * over ACPI_TABLE_GUID because it has more features.
+ */
+ ret = efi_find_vendor_table(cfg_tbl_pa, cfg_tbl_len, ACPI_20_TABLE_GUID,
+ efi_64, (unsigned long *)&rsdp_addr);
+ if (!ret)
+ return rsdp_addr;
+
+ /* No ACPI_20_TABLE_GUID found, fallback to ACPI_TABLE_GUID. */
+ ret = efi_find_vendor_table(cfg_tbl_pa, cfg_tbl_len, ACPI_TABLE_GUID,
+ efi_64, (unsigned long *)&rsdp_addr);
+ if (ret)
+ debug_putstr("Error getting RSDP address.\n");
#endif
return rsdp_addr;
}
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
index e5f39b3f5665..9817e3020207 100644
--- a/arch/x86/boot/compressed/efi.c
+++ b/arch/x86/boot/compressed/efi.c
@@ -104,3 +104,68 @@ int efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_p
return 0;
}
+
+/* Get vendor table address/guid from EFI config table at the given index */
+static int get_vendor_table(void *cfg_tbl, unsigned int idx,
+ unsigned long *vendor_tbl_pa,
+ efi_guid_t *vendor_tbl_guid,
+ bool efi_64)
+{
+ if (efi_64) {
+ efi_config_table_64_t *tbl_entry =
+ (efi_config_table_64_t *)cfg_tbl + idx;
+
+ if (!IS_ENABLED(CONFIG_X86_64) && tbl_entry->table >> 32) {
+ debug_putstr("Error: EFI config table entry located above 4GB.\n");
+ return -EINVAL;
+ }
+
+ *vendor_tbl_pa = tbl_entry->table;
+ *vendor_tbl_guid = tbl_entry->guid;
+
+ } else {
+ efi_config_table_32_t *tbl_entry =
+ (efi_config_table_32_t *)cfg_tbl + idx;
+
+ *vendor_tbl_pa = tbl_entry->table;
+ *vendor_tbl_guid = tbl_entry->guid;
+ }
+
+ return 0;
+}
+
+/**
+ * Given EFI config table, search it for the physical address of the vendor
+ * table associated with GUID.
+ *
+ * @cfg_tbl_pa: pointer to EFI configuration table
+ * @cfg_tbl_len: number of entries in EFI configuration table
+ * @guid: GUID of vendor table
+ * @efi_64: true if using 64-bit EFI
+ * @vendor_tbl_pa: location to store physical address of vendor table
+ *
+ * Returns 0 on success. On error, return params are left unchanged.
+ */
+int efi_find_vendor_table(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len,
+ efi_guid_t guid, bool efi_64, unsigned long *vendor_tbl_pa)
+{
+ unsigned int i;
+
+ for (i = 0; i < cfg_tbl_len; i++) {
+ unsigned long vendor_tbl_pa_tmp;
+ efi_guid_t vendor_tbl_guid;
+ int ret;
+
+ if (get_vendor_table((void *)cfg_tbl_pa, i,
+ &vendor_tbl_pa_tmp,
+ &vendor_tbl_guid, efi_64))
+ return -EINVAL;
+
+ if (!efi_guidcmp(guid, vendor_tbl_guid)) {
+ *vendor_tbl_pa = vendor_tbl_pa_tmp;
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index b72fd860362a..d4a26f3d3580 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -181,6 +181,8 @@ int efi_get_system_table(struct boot_params *boot_params,
unsigned long *sys_tbl_pa, bool *is_efi_64);
int efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_pa,
unsigned int *cfg_tbl_len, bool *is_efi_64);
+int efi_find_vendor_table(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len,
+ efi_guid_t guid, bool efi_64, unsigned long *vendor_tbl_pa);
#else
static inline int
efi_get_system_table(struct boot_params *boot_params,
@@ -195,6 +197,13 @@ efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_pa,
{
return -ENOENT;
}
+
+static inline int
+efi_find_vendor_table(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len,
+ efi_guid_t guid, bool efi_64, unsigned long *vendor_tbl_pa)
+{
+ return -ENOENT;
+}
#endif /* CONFIG_EFI */
#endif /* BOOT_COMPRESSED_MISC_H */
--
2.25.1
While launching encrypted guests, the hypervisor may need to provide
some additional information during guest boot. When booting under an
EFI-based BIOS, the EFI configuration table contains an entry for the
confidential computing blob that holds the required information.
To support booting encrypted guests on non-EFI VMs, the hypervisor needs
to pass this additional information to the kernel with a different method.
For this purpose, introduce the SETUP_CC_BLOB type in setup_data to hold
the physical address of the confidential computing blob. The boot loader
or hypervisor may choose to use this method instead of the EFI
configuration table. The CC blob location scanning should give preference
to setup_data over the EFI configuration table.
In AMD SEV-SNP, the CC blob contains the addresses of the secrets and
CPUID pages. The secrets page includes information such as a VM-to-PSP
communication key, and the CPUID page contains PSP-filtered CPUID values.
Define the AMD SEV confidential computing blob structure.
While at it, define the EFI GUID for the confidential computing blob.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 12 ++++++++++++
arch/x86/include/uapi/asm/bootparam.h | 1 +
include/linux/efi.h | 1 +
3 files changed, 14 insertions(+)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7f063127aa66..534fa1c4c881 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -44,6 +44,18 @@ struct es_em_ctxt {
void do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code);
+/* AMD SEV Confidential computing blob structure */
+#define CC_BLOB_SEV_HDR_MAGIC 0x45444d41
+struct cc_blob_sev_info {
+ u32 magic;
+ u16 version;
+ u16 reserved;
+ u64 secrets_phys;
+ u32 secrets_len;
+ u64 cpuid_phys;
+ u32 cpuid_len;
+};
+
static inline u64 lower_bits(u64 val, unsigned int bits)
{
u64 mask = (1ULL << bits) - 1;
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index b25d3f82c2f3..1ac5acca72ce 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -10,6 +10,7 @@
#define SETUP_EFI 4
#define SETUP_APPLE_PROPERTIES 5
#define SETUP_JAILHOUSE 6
+#define SETUP_CC_BLOB 7
#define SETUP_INDIRECT (1<<31)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 6b5d36babfcc..75aeb2a56888 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -344,6 +344,7 @@ void efi_native_runtime_setup(void);
#define EFI_CERT_SHA256_GUID EFI_GUID(0xc1c41626, 0x504c, 0x4092, 0xac, 0xa9, 0x41, 0xf9, 0x36, 0x93, 0x43, 0x28)
#define EFI_CERT_X509_GUID EFI_GUID(0xa5c059a1, 0x94e4, 0x4aa7, 0x87, 0xb5, 0xab, 0x15, 0x5c, 0x2b, 0xf0, 0x72)
#define EFI_CERT_X509_SHA256_GUID EFI_GUID(0x3bd2a492, 0x96c0, 0x4079, 0xb4, 0x20, 0xfc, 0xf9, 0x8e, 0xf1, 0x03, 0xed)
+#define EFI_CC_BLOB_GUID EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42)
/*
* This GUID is used to pass to the kernel proper the struct screen_info
--
2.25.1
probe_roms() accesses the memory range (0xc0000 - 0x100000) to probe
various ROMs. The memory range is not part of the E820 system RAM
range and is mapped as private (i.e., encrypted) in the page table.
When SEV-SNP is active, all private memory must be validated before
access. The ROM range was not part of the E820 map, so the guest BIOS
did not validate it. An access to unvalidated memory will cause a #VC
exception. The guest does not support handling the unvalidated-memory #VC
exception yet, so validate the ROM memory regions before they are accessed.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/probe_roms.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 9e1def3744f2..9c09df86d167 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -21,6 +21,7 @@
#include <asm/sections.h>
#include <asm/io.h>
#include <asm/setup_arch.h>
+#include <asm/sev.h>
static struct resource system_rom_resource = {
.name = "System ROM",
@@ -197,11 +198,21 @@ static int __init romchecksum(const unsigned char *rom, unsigned long length)
void __init probe_roms(void)
{
- const unsigned char *rom;
unsigned long start, length, upper;
+ const unsigned char *rom;
unsigned char c;
int i;
+ /*
+ * The ROM memory is not part of the E820 system RAM and is not pre-validated
+ * by the BIOS. The kernel page table maps the ROM region as encrypted memory,
+ * and SEV-SNP requires that encrypted memory be validated before access.
+ * Validate the ROM region before accessing it.
+ */
+ snp_prep_memory(video_rom_resource.start,
+ ((system_rom_resource.end + 1) - video_rom_resource.start),
+ SNP_PAGE_STATE_PRIVATE);
+
/* video rom */
upper = adapter_rom_resources[0].start;
for (start = video_rom_resource.start; start < upper; start += 2048) {
--
2.25.1
The SNP_GET_DERIVED_KEY ioctl interface can be used by the SNP guest to
ask the firmware to provide a key derived from a root key. The derived
key may be used by the guest for any purpose it chooses, such as a
sealing key or communicating with external entities.
See the SEV-SNP firmware spec for more information.
Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 19 ++++++++++-
drivers/virt/coco/sevguest/sevguest.c | 49 +++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 24 +++++++++++++
3 files changed, 91 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 002c90946b8a..4b524d1de37c 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -64,10 +64,27 @@ The SNP_GET_REPORT ioctl can be used to query the attestation report from the
SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
provided by the SEV-SNP firmware to query the attestation report.
-On success, the snp_report_resp.data will contains the report. The report
+On success, the snp_report_resp.data will contain the report. The report
will contain the format described in the SEV-SNP specification. See the SEV-SNP
specification for further details.
+2.2 SNP_GET_DERIVED_KEY
+-----------------------
+:Technology: sev-snp
+:Type: guest ioctl
+:Parameters (in): struct snp_derived_key_req
+:Returns (out): struct snp_derived_key_resp on success, -negative on error
+
+The SNP_GET_DERIVED_KEY ioctl can be used to get a key derived from a root key.
+The derived key can be used by the guest for any purpose, such as sealing keys
+or communicating with external entities.
+
+The ioctl uses the SNP_GUEST_REQUEST (MSG_KEY_REQ) command provided by the
+SEV-SNP firmware to derive the key. See SEV-SNP specification for further details
+on the various fileds passed in the key derivation request.
+
+On success, the snp_derived_key_resp.data will contain the derived key value.
+See the SEV-SNP specification for further details.
Reference
---------
diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
index 2d313fb2ffae..c6ca7d861a3a 100644
--- a/drivers/virt/coco/sevguest/sevguest.c
+++ b/drivers/virt/coco/sevguest/sevguest.c
@@ -364,6 +364,52 @@ static int get_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_io
return rc;
}
+static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_derived_key_resp resp = {0};
+ struct snp_derived_key_req req;
+ int rc, resp_len;
+ u8 buf[89];
+
+ if (!arg->req_data || !arg->resp_data)
+ return -EINVAL;
+
+ /* Copy the request payload from userspace */
+ if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
+ return -EFAULT;
+
+ /* Message version must be non-zero */
+ if (!req.msg_version)
+ return -EINVAL;
+
+ /*
+ * The intermediate response buffer is used while decrypting the
+ * response payload. Make sure that it has enough space to cover the
+ * authtag.
+ */
+ resp_len = sizeof(resp.data) + crypto->a_len;
+ if (sizeof(buf) < resp_len)
+ return -ENOMEM;
+
+ /* Issue the command to derive the key */
+ rc = handle_guest_request(snp_dev, SVM_VMGEXIT_GUEST_REQUEST, req.msg_version,
+ SNP_MSG_KEY_REQ, &req.data, sizeof(req.data), buf, resp_len,
+ &arg->fw_err);
+ if (rc)
+ goto e_free;
+
+ /* Copy the response payload to userspace */
+ memcpy(resp.data, buf, sizeof(resp.data));
+ if (copy_to_user((void __user *)arg->resp_data, &resp, sizeof(resp)))
+ rc = -EFAULT;
+
+e_free:
+ memzero_explicit(buf, sizeof(buf));
+ memzero_explicit(&resp, sizeof(resp));
+ return rc;
+}
+
static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
struct snp_guest_dev *snp_dev = to_snp_dev(file);
@@ -382,6 +428,9 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
case SNP_GET_REPORT:
ret = get_report(snp_dev, &input);
break;
+ case SNP_GET_DERIVED_KEY:
+ ret = get_derived_key(snp_dev, &input);
+ break;
default:
break;
}
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index eda7edcffda8..f6d9c136ff4d 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -36,9 +36,33 @@ struct snp_guest_request_ioctl {
__u64 fw_err;
};
+struct __snp_derived_key_req {
+ __u32 root_key_select;
+ __u32 rsvd;
+ __u64 guest_field_select;
+ __u32 vmpl;
+ __u32 guest_svn;
+ __u64 tcb_version;
+};
+
+struct snp_derived_key_req {
+ /* message version number (must be non-zero) */
+ __u8 msg_version;
+
+ struct __snp_derived_key_req data;
+};
+
+struct snp_derived_key_resp {
+ /* response data, see SEV-SNP spec for the format */
+ __u8 data[64];
+};
+
#define SNP_GUEST_REQ_IOC_TYPE 'S'
/* Get SNP attestation report */
#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_guest_request_ioctl)
+/* Get a derived key from the root */
+#define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_guest_request_ioctl)
+
#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1
From: Tom Lendacky <[email protected]>
This is the final step in defining the multiple save areas to keep them
separate and ensure proper operation amongst the different types of
guests. Update the SEV-ES/SEV-SNP save area to match the APM. This save
area will be used for the upcoming SEV-SNP AP Creation NAE event support.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 66 +++++++++++++++++++++++++++++---------
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 4a4de2454ca3..c75f46cf27db 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -290,7 +290,13 @@ struct sev_es_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u8 reserved_1[43];
+ u64 vmpl0_ssp;
+ u64 vmpl1_ssp;
+ u64 vmpl2_ssp;
+ u64 vmpl3_ssp;
+ u64 u_cet;
+ u8 reserved_1[2];
+ u8 vmpl;
u8 cpl;
u8 reserved_2[4];
u64 efer;
@@ -303,9 +309,19 @@ struct sev_es_save_area {
u64 dr6;
u64 rflags;
u64 rip;
- u8 reserved_4[88];
+ u64 dr0;
+ u64 dr1;
+ u64 dr2;
+ u64 dr3;
+ u64 dr0_addr_mask;
+ u64 dr1_addr_mask;
+ u64 dr2_addr_mask;
+ u64 dr3_addr_mask;
+ u8 reserved_4[24];
u64 rsp;
- u8 reserved_5[24];
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
u64 rax;
u64 star;
u64 lstar;
@@ -316,7 +332,7 @@ struct sev_es_save_area {
u64 sysenter_esp;
u64 sysenter_eip;
u64 cr2;
- u8 reserved_6[32];
+ u8 reserved_5[32];
u64 g_pat;
u64 dbgctl;
u64 br_from;
@@ -325,12 +341,12 @@ struct sev_es_save_area {
u64 last_excp_to;
u8 reserved_7[80];
u32 pkru;
- u8 reserved_9[20];
- u64 reserved_10; /* rax already available at 0x01f8 */
+ u8 reserved_8[20];
+ u64 reserved_9; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_11; /* rsp already available at 0x01d8 */
+ u64 reserved_10; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -342,16 +358,34 @@ struct sev_es_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_12[16];
- u64 sw_exit_code;
- u64 sw_exit_info_1;
- u64 sw_exit_info_2;
- u64 sw_scratch;
+ u8 reserved_11[16];
+ u64 guest_exit_info_1;
+ u64 guest_exit_info_2;
+ u64 guest_exit_int_info;
+ u64 guest_nrip;
u64 sev_features;
- u8 reserved_13[48];
+ u64 vintr_ctrl;
+ u64 guest_exit_code;
+ u64 virtual_tom;
+ u64 tlb_id;
+ u64 pcpu_id;
+ u64 event_inj;
u64 xcr0;
- u8 valid_bitmap[16];
- u64 x87_state_gpa;
+ u8 reserved_12[16];
+
+ /* Floating point area */
+ u64 x87_dp;
+ u32 mxcsr;
+ u16 x87_ftw;
+ u16 x87_fsw;
+ u16 x87_fcw;
+ u16 x87_fop;
+ u16 x87_ds;
+ u16 x87_cs;
+ u64 x87_rip;
+ u8 fpreg_x87[80];
+ u8 fpreg_xmm[256];
+ u8 fpreg_ymm[256];
} __packed;
struct ghcb_save_area {
@@ -408,7 +442,7 @@ struct ghcb {
#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
-#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1648
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE
--
2.25.1
The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification.
If the hypervisor cannot work with the guest-provided GPA, then terminate
the guest boot.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 4 ++++
arch/x86/include/asm/sev-common.h | 13 +++++++++++++
arch/x86/kernel/sev-shared.c | 16 ++++++++++++++++
3 files changed, 33 insertions(+)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index c644f260098e..e8308ada610d 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -223,6 +223,10 @@ static bool do_early_sev_setup(void)
/* Initialize lookup tables for the instruction decoder */
inat_init_tables();
+ /* The SEV-SNP guest requires that the GHCB GPA be registered */
+ if (sev_snp_enabled())
+ snp_register_ghcb_early(__pa(&boot_ghcb_page));
+
return true;
}
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1c76b6b775cc..b82fff9d607b 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -57,6 +57,19 @@
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+/* GHCB GPA Register */
+#define GHCB_MSR_REG_GPA_REQ 0x012
+#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)((v) & GENMASK_ULL(51, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_REG_GPA_REQ)
+
+#define GHCB_MSR_REG_GPA_RESP 0x013
+#define GHCB_MSR_REG_GPA_RESP_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
+
/*
* SNP Page State Change Operation
*
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 2796c524d174..2b53b622108f 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -75,6 +75,22 @@ static bool get_hv_features(void)
return true;
}
+static void snp_register_ghcb_early(unsigned long paddr)
+{
+ unsigned long pfn = paddr >> PAGE_SHIFT;
+ u64 val;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ /* If the response GPA is not ours then abort the guest */
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP) ||
+ (GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
+}
+
static bool sev_es_negotiate_protocol(void)
{
u64 val;
--
2.25.1
The SEV-ES guest calls sev_es_negotiate_protocol() to negotiate the
GHCB protocol version before establishing the GHCB. Cache the negotiated
GHCB version so that it can be used later.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 2 +-
arch/x86/kernel/sev-shared.c | 17 ++++++++++++++---
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index fa5cd05d3b5b..7ec91b1359df 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -12,7 +12,7 @@
#include <asm/insn.h>
#include <asm/sev-common.h>
-#define GHCB_PROTO_OUR 0x0001UL
+#define GHCB_PROTOCOL_MIN 1ULL
#define GHCB_PROTOCOL_MAX 1ULL
#define GHCB_DEFAULT_USAGE 0ULL
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index a010d6b41a04..0eb22528ec87 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -14,6 +14,15 @@
#define has_cpuflag(f) boot_cpu_has(f)
#endif
+/*
+ * Since feature negotiation related variables are set early in the boot
+ * process they must reside in the .data section so as not to be zeroed
+ * out when the .bss section is later cleared.
+ *
+ * GHCB protocol version negotiated with the hypervisor.
+ */
+static u16 __ro_after_init ghcb_version;
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -51,10 +60,12 @@ static bool sev_es_negotiate_protocol(void)
if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
return false;
- if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTO_OUR ||
- GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR)
+ if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
+ GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
return false;
+ ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+
return true;
}
@@ -99,7 +110,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
enum es_result ret;
/* Fill in protocol and format specifiers */
- ghcb->protocol_version = GHCB_PROTOCOL_MAX;
+ ghcb->protocol_version = ghcb_version;
ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
ghcb_set_sw_exit_code(ghcb, exit_code);
--
2.25.1
The SEV-SNP specification provides the guest a mechanism to communicate
with the PSP without risk from a malicious hypervisor who wishes to read,
alter, drop or replay the messages sent. The driver uses
snp_issue_guest_request() to issue GHCB SNP_GUEST_REQUEST or
SNP_EXT_GUEST_REQUEST NAE events to submit the request to the PSP.
The PSP requires that all communication be encrypted using the key
specified through the platform_data.
The userspace can use SNP_GET_REPORT ioctl() to query the guest
attestation report.
See SEV-SNP spec section Guest Messages for more details.
Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 77 ++++
drivers/virt/Kconfig | 3 +
drivers/virt/Makefile | 1 +
drivers/virt/coco/sevguest/Kconfig | 9 +
drivers/virt/coco/sevguest/Makefile | 2 +
drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
drivers/virt/coco/sevguest/sevguest.h | 98 +++++
include/uapi/linux/sev-guest.h | 44 ++
8 files changed, 795 insertions(+)
create mode 100644 Documentation/virt/coco/sevguest.rst
create mode 100644 drivers/virt/coco/sevguest/Kconfig
create mode 100644 drivers/virt/coco/sevguest/Makefile
create mode 100644 drivers/virt/coco/sevguest/sevguest.c
create mode 100644 drivers/virt/coco/sevguest/sevguest.h
create mode 100644 include/uapi/linux/sev-guest.h
diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
new file mode 100644
index 000000000000..002c90946b8a
--- /dev/null
+++ b/Documentation/virt/coco/sevguest.rst
@@ -0,0 +1,77 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================================
+The Definitive SEV Guest API Documentation
+===================================================================
+
+1. General description
+======================
+
+The SEV API is a set of ioctls that are used by the guest or hypervisor
+to get or set certain aspects of the SEV virtual machine. The ioctls belong
+to the following classes:
+
+ - Hypervisor ioctls: These query and set global attributes which affect the
+ whole SEV firmware. These ioctls are used by platform provisioning tools.
+
+ - Guest ioctls: These query and set attributes of the SEV virtual machine.
+
+2. API description
+==================
+
+This section describes ioctls that can be used to query or set SEV guests.
+For each ioctl, the following information is provided along with a
+description:
+
+ Technology:
+ which SEV technology provides this ioctl. sev, sev-es, sev-snp or all.
+
+ Type:
+ hypervisor or guest. The ioctl can be used inside the guest or the
+ hypervisor.
+
+ Parameters:
+ what parameters are accepted by the ioctl.
+
+ Returns:
+ the return value. General error numbers (ENOMEM, EINVAL)
+ are not detailed, but errors with specific meanings are.
+
+The guest ioctl should be issued on a file descriptor of the /dev/sev-guest device.
+The ioctl accepts struct snp_guest_request_ioctl. The input and output structures
+are specified through the req_data and resp_data fields respectively. If the ioctl
+fails to execute due to a firmware error, then the fw_err code will be set.
+
+::
+ struct snp_guest_request_ioctl {
+ /* Request and response structure address */
+ __u64 req_data;
+ __u64 resp_data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u64 fw_err;
+ };
+
+2.1 SNP_GET_REPORT
+------------------
+
+:Technology: sev-snp
+:Type: guest ioctl
+:Parameters (in): struct snp_report_req
+:Returns (out): struct snp_report_resp on success, -negative on error
+
+The SNP_GET_REPORT ioctl can be used to query the attestation report from the
+SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
+provided by the SEV-SNP firmware to query the attestation report.
+
+On success, the snp_report_resp.data will contains the report. The report
+will contain the format described in the SEV-SNP specification. See the SEV-SNP
+specification for further details.
+
+
+Reference
+---------
+
+SEV-SNP and GHCB specification: developer.amd.com/sev
+
+The driver is based on SEV-SNP firmware spec 0.9 and GHCB spec version 2.0.
diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..e457e47610d3 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
source "drivers/virt/nitro_enclaves/Kconfig"
source "drivers/virt/acrn/Kconfig"
+
+source "drivers/virt/coco/sevguest/Kconfig"
+
endif
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..9c704a6fdcda 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -8,3 +8,4 @@ obj-y += vboxguest/
obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
obj-$(CONFIG_ACRN_HSM) += acrn/
+obj-$(CONFIG_SEV_GUEST) += coco/sevguest/
diff --git a/drivers/virt/coco/sevguest/Kconfig b/drivers/virt/coco/sevguest/Kconfig
new file mode 100644
index 000000000000..96190919cca8
--- /dev/null
+++ b/drivers/virt/coco/sevguest/Kconfig
@@ -0,0 +1,9 @@
+config SEV_GUEST
+ tristate "AMD SEV Guest driver"
+ default y
+ depends on AMD_MEM_ENCRYPT && CRYPTO_AEAD2
+ help
+ The driver can be used by the SEV-SNP guest to communicate with the PSP to
+ request the attestation report and more.
+
+ If you choose 'M' here, this module will be called sevguest.
diff --git a/drivers/virt/coco/sevguest/Makefile b/drivers/virt/coco/sevguest/Makefile
new file mode 100644
index 000000000000..b1ffb2b4177b
--- /dev/null
+++ b/drivers/virt/coco/sevguest/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_SEV_GUEST) += sevguest.o
diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
new file mode 100644
index 000000000000..2d313fb2ffae
--- /dev/null
+++ b/drivers/virt/coco/sevguest/sevguest.c
@@ -0,0 +1,561 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD Secure Encrypted Virtualization Nested Paging (SEV-SNP) guest request interface
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
+#include <linux/set_memory.h>
+#include <linux/fs.h>
+#include <crypto/aead.h>
+#include <linux/scatterlist.h>
+#include <linux/psp-sev.h>
+#include <uapi/linux/sev-guest.h>
+#include <uapi/linux/psp-sev.h>
+
+#include <asm/svm.h>
+#include <asm/sev.h>
+
+#include "sevguest.h"
+
+#define DEVICE_NAME "sev-guest"
+#define AAD_LEN 48
+#define MSG_HDR_VER 1
+
+struct snp_guest_crypto {
+ struct crypto_aead *tfm;
+ u8 *iv, *authtag;
+ int iv_len, a_len;
+};
+
+struct snp_guest_dev {
+ struct device *dev;
+ struct miscdevice misc;
+
+ struct snp_guest_crypto *crypto;
+ struct snp_guest_msg *request, *response;
+ struct snp_secrets_page_layout *layout;
+ struct snp_req_data input;
+ u32 *os_area_msg_seqno;
+};
+
+static u32 vmpck_id;
+module_param(vmpck_id, uint, 0444);
+MODULE_PARM_DESC(vmpck_id, "The VMPCK ID to use when communicating with the PSP.");
+
+static DEFINE_MUTEX(snp_cmd_mutex);
+
+static inline u64 __snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
+{
+ u64 count;
+
+ /* Read the current message sequence counter from the secrets page */
+ count = *snp_dev->os_area_msg_seqno;
+
+ return count + 1;
+}
+
+/* Return a non-zero message sequence number on success */
+static u64 snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
+{
+ u64 count = __snp_get_msg_seqno(snp_dev);
+
+ /*
+ * The message sequence counter for the SNP guest request is a 64-bit
+ * value, but version 2 of the GHCB specification defines only 32-bit
+ * storage for it. Return zero if the counter exceeds the 32-bit range.
+ * The caller should check the return value, but even if the caller
+ * fails to check it and uses it anyway, the firmware treats zero as
+ * an invalid sequence number and will fail the message request.
+ */
+ if (count >= UINT_MAX) {
+ pr_err_ratelimited("SNP guest request message sequence counter overflow\n");
+ return 0;
+ }
+
+ return count;
+}
+
+static void snp_inc_msg_seqno(struct snp_guest_dev *snp_dev)
+{
+ /*
+ * The counter is also incremented by the PSP, so increment it by 2
+ * and save in secrets page.
+ */
+ *snp_dev->os_area_msg_seqno += 2;
+}
+
+static inline struct snp_guest_dev *to_snp_dev(struct file *file)
+{
+ struct miscdevice *dev = file->private_data;
+
+ return container_of(dev, struct snp_guest_dev, misc);
+}
+
+static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, u8 *key, size_t keylen)
+{
+ struct snp_guest_crypto *crypto;
+
+ crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
+ if (!crypto)
+ return NULL;
+
+ crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
+ if (IS_ERR(crypto->tfm))
+ goto e_free;
+
+ if (crypto_aead_setkey(crypto->tfm, key, keylen))
+ goto e_free_crypto;
+
+ crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
+ if (crypto->iv_len < 12) {
+ dev_err(snp_dev->dev, "IV length is less than 12.\n");
+ goto e_free_crypto;
+ }
+
+ crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
+ if (!crypto->iv)
+ goto e_free_crypto;
+
+ if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
+ if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
+ dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
+ goto e_free_crypto;
+ }
+ }
+
+ crypto->a_len = crypto_aead_authsize(crypto->tfm);
+ crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
+ if (!crypto->authtag)
+ goto e_free_crypto;
+
+ return crypto;
+
+e_free_crypto:
+ crypto_free_aead(crypto->tfm);
+e_free:
+ kfree(crypto->iv);
+ kfree(crypto->authtag);
+ kfree(crypto);
+
+ return NULL;
+}
+
+static void deinit_crypto(struct snp_guest_crypto *crypto)
+{
+ crypto_free_aead(crypto->tfm);
+ kfree(crypto->iv);
+ kfree(crypto->authtag);
+ kfree(crypto);
+}
+
+static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
+ u8 *src_buf, u8 *dst_buf, size_t len, bool enc)
+{
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+ struct scatterlist src[3], dst[3];
+ DECLARE_CRYPTO_WAIT(wait);
+ struct aead_request *req;
+ int ret;
+
+ req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+
+ /*
+ * AEAD memory operations:
+ * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
+ * | msg header | plaintext | hdr->authtag |
+ * | bytes 30h - 5Fh | or | |
+ * | | cipher | |
+ * +------------------+------------------+----------------+
+ */
+ sg_init_table(src, 3);
+ sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
+ sg_set_buf(&src[1], src_buf, hdr->msg_sz);
+ sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
+
+ sg_init_table(dst, 3);
+ sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
+ sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
+ sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
+
+ aead_request_set_ad(req, AAD_LEN);
+ aead_request_set_tfm(req, crypto->tfm);
+ aead_request_set_callback(req, 0, crypto_req_done, &wait);
+
+ aead_request_set_crypt(req, src, dst, len, crypto->iv);
+ ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
+
+ aead_request_free(req);
+ return ret;
+}
+
+static int __enc_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
+ void *plaintext, size_t len)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+
+ memset(crypto->iv, 0, crypto->iv_len);
+ memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
+
+ return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
+}
+
+static int dec_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
+ void *plaintext, size_t len)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+
+ /* Build IV with response buffer sequence number */
+ memset(crypto->iv, 0, crypto->iv_len);
+ memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
+
+ return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
+}
+
+static int verify_and_dec_payload(struct snp_guest_dev *snp_dev, void *payload, u32 sz)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg *resp = snp_dev->response;
+ struct snp_guest_msg *req = snp_dev->request;
+ struct snp_guest_msg_hdr *req_hdr = &req->hdr;
+ struct snp_guest_msg_hdr *resp_hdr = &resp->hdr;
+
+ dev_dbg(snp_dev->dev, "response [seqno %lld type %d version %d sz %d]\n",
+ resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
+
+ /* Verify that the sequence counter is incremented by 1 */
+ if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
+ return -EBADMSG;
+
+ /* Verify response message type and version number. */
+ if (resp_hdr->msg_type != (req_hdr->msg_type + 1) ||
+ resp_hdr->msg_version != req_hdr->msg_version)
+ return -EBADMSG;
+
+ /*
+ * If the message size is greater than our buffer length then return
+ * an error.
+ */
+ if (unlikely((resp_hdr->msg_sz + crypto->a_len) > sz))
+ return -EBADMSG;
+
+ return dec_payload(snp_dev, resp, payload, resp_hdr->msg_sz + crypto->a_len);
+}
+
+static int enc_payload(struct snp_guest_dev *snp_dev, u64 seqno, int version, u8 type,
+ void *payload, size_t sz)
+{
+ struct snp_guest_msg *req = snp_dev->request;
+ struct snp_guest_msg_hdr *hdr = &req->hdr;
+
+ memset(req, 0, sizeof(*req));
+
+ hdr->algo = SNP_AEAD_AES_256_GCM;
+ hdr->hdr_version = MSG_HDR_VER;
+ hdr->hdr_sz = sizeof(*hdr);
+ hdr->msg_type = type;
+ hdr->msg_version = version;
+ hdr->msg_seqno = seqno;
+ hdr->msg_vmpck = vmpck_id;
+ hdr->msg_sz = sz;
+
+ /* Verify the sequence number is non-zero */
+ if (!hdr->msg_seqno)
+ return -ENOSR;
+
+ dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
+ hdr->msg_seqno, hdr->msg_type, hdr->msg_version, hdr->msg_sz);
+
+ return __enc_payload(snp_dev, req, payload, sz);
+}
+
+static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, int msg_ver,
+ u8 type, void *req_buf, size_t req_sz, void *resp_buf,
+ u32 resp_sz, __u64 *fw_err)
+{
+ unsigned long err;
+ u64 seqno;
+ int rc;
+
+ /* Get the message sequence number and verify that it is non-zero */
+ seqno = snp_get_msg_seqno(snp_dev);
+ if (!seqno)
+ return -EIO;
+
+ memset(snp_dev->response, 0, sizeof(*snp_dev->response));
+
+ /* Encrypt the userspace provided payload */
+ rc = enc_payload(snp_dev, seqno, msg_ver, type, req_buf, req_sz);
+ if (rc)
+ return rc;
+
+ /* Call firmware to process the request */
+ rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
+ if (fw_err)
+ *fw_err = err;
+
+ if (rc)
+ return rc;
+
+ rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
+ if (rc)
+ return rc;
+
+ /* Increment to new message sequence after the command is successful. */
+ snp_inc_msg_seqno(snp_dev);
+
+ return 0;
+}
+
+static int get_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_report_resp *resp;
+ struct snp_report_req req;
+ int rc, resp_len;
+
+ if (!arg->req_data || !arg->resp_data)
+ return -EINVAL;
+
+ /* Copy the request payload from userspace */
+ if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
+ return -EFAULT;
+
+ /* Message version must be non-zero */
+ if (!req.msg_version)
+ return -EINVAL;
+
+ /*
+ * The intermediate response buffer is used while decrypting the
+ * response payload. Make sure that it has enough space to cover the
+ * authtag.
+ */
+ resp_len = sizeof(resp->data) + crypto->a_len;
+ resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
+ if (!resp)
+ return -ENOMEM;
+
+ /* Issue the command to get the attestation report */
+ rc = handle_guest_request(snp_dev, SVM_VMGEXIT_GUEST_REQUEST, req.msg_version,
+ SNP_MSG_REPORT_REQ, &req.user_data, sizeof(req.user_data),
+ resp->data, resp_len, &arg->fw_err);
+ if (rc)
+ goto e_free;
+
+ /* Copy the response payload to userspace */
+ if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
+ rc = -EFAULT;
+
+e_free:
+ kfree(resp);
+ return rc;
+}
+
+static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
+{
+ struct snp_guest_dev *snp_dev = to_snp_dev(file);
+ void __user *argp = (void __user *)arg;
+ struct snp_guest_request_ioctl input;
+ int ret = -ENOTTY;
+
+ if (copy_from_user(&input, argp, sizeof(input)))
+ return -EFAULT;
+
+ input.fw_err = 0;
+
+ mutex_lock(&snp_cmd_mutex);
+
+ switch (ioctl) {
+ case SNP_GET_REPORT:
+ ret = get_report(snp_dev, &input);
+ break;
+ default:
+ break;
+ }
+
+ mutex_unlock(&snp_cmd_mutex);
+
+ if (input.fw_err && copy_to_user(argp, &input, sizeof(input)))
+ return -EFAULT;
+
+ return ret;
+}
+
+static void free_shared_pages(void *buf, size_t sz)
+{
+ unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ if (!buf)
+ return;
+
+ /* If we fail to restore the encryption mask, leak the pages. */
+ if (WARN_ONCE(set_memory_encrypted((unsigned long)buf, npages),
+ "Failed to restore encryption mask (leak it)\n"))
+ return;
+
+ __free_pages(virt_to_page(buf), get_order(sz));
+}
+
+static void *alloc_shared_pages(size_t sz)
+{
+ unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+ struct page *page;
+ int ret;
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(sz));
+ if (!page)
+ return NULL;
+
+ ret = set_memory_decrypted((unsigned long)page_address(page), npages);
+ if (ret) {
+ pr_err("SEV-SNP: failed to mark page shared, ret=%d\n", ret);
+ __free_pages(page, get_order(sz));
+ return NULL;
+ }
+
+ return page_address(page);
+}
+
+static const struct file_operations snp_guest_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = snp_guest_ioctl,
+};
+
+static u8 *get_vmpck(int id, struct snp_secrets_page_layout *layout, u32 **seqno)
+{
+ u8 *key = NULL;
+
+ switch (id) {
+ case 0:
+ *seqno = &layout->os_area.msg_seqno_0;
+ key = layout->vmpck0;
+ break;
+ case 1:
+ *seqno = &layout->os_area.msg_seqno_1;
+ key = layout->vmpck1;
+ break;
+ case 2:
+ *seqno = &layout->os_area.msg_seqno_2;
+ key = layout->vmpck2;
+ break;
+ case 3:
+ *seqno = &layout->os_area.msg_seqno_3;
+ key = layout->vmpck3;
+ break;
+ default:
+ break;
+ }
+
+ return key;
+}
+
+static int __init snp_guest_probe(struct platform_device *pdev)
+{
+ struct snp_secrets_page_layout *layout;
+ struct snp_guest_platform_data *data;
+ struct device *dev = &pdev->dev;
+ struct snp_guest_dev *snp_dev;
+ struct miscdevice *misc;
+ u8 *vmpck;
+ int ret;
+
+ if (!dev->platform_data)
+ return -ENODEV;
+
+ data = (struct snp_guest_platform_data *)dev->platform_data;
+ layout = (__force void *)ioremap_encrypted(data->secrets_gpa, PAGE_SIZE);
+ if (!layout)
+ return -ENODEV;
+
+ ret = -ENOMEM;
+ snp_dev = devm_kzalloc(&pdev->dev, sizeof(struct snp_guest_dev), GFP_KERNEL);
+ if (!snp_dev)
+ goto e_fail;
+
+ ret = -EINVAL;
+ vmpck = get_vmpck(vmpck_id, layout, &snp_dev->os_area_msg_seqno);
+ if (!vmpck) {
+ dev_err(dev, "invalid vmpck id %d\n", vmpck_id);
+ goto e_fail;
+ }
+
+ platform_set_drvdata(pdev, snp_dev);
+ snp_dev->dev = dev;
+ snp_dev->layout = layout;
+
+ /* Allocate the shared page used for the request and response message. */
+ snp_dev->request = alloc_shared_pages(sizeof(struct snp_guest_msg));
+ if (!snp_dev->request)
+ goto e_fail;
+
+ snp_dev->response = alloc_shared_pages(sizeof(struct snp_guest_msg));
+ if (!snp_dev->response)
+ goto e_fail;
+
+ ret = -EIO;
+ snp_dev->crypto = init_crypto(snp_dev, vmpck, VMPCK_KEY_LEN);
+ if (!snp_dev->crypto)
+ goto e_fail;
+
+ misc = &snp_dev->misc;
+ misc->minor = MISC_DYNAMIC_MINOR;
+ misc->name = DEVICE_NAME;
+ misc->fops = &snp_guest_fops;
+
+ /* Initialize the input addresses for the guest request */
+ snp_dev->input.req_gpa = __pa(snp_dev->request);
+ snp_dev->input.resp_gpa = __pa(snp_dev->response);
+
+ ret = misc_register(misc);
+ if (ret)
+ goto e_fail;
+
+ dev_dbg(dev, "Initialized SNP guest driver (using vmpck_id %d)\n", vmpck_id);
+ return 0;
+
+e_fail:
+ iounmap(layout);
+ /* snp_dev is NULL if the devm_kzalloc() above failed */
+ if (snp_dev) {
+ free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
+ free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
+ }
+
+ return ret;
+}
+
+static int __exit snp_guest_remove(struct platform_device *pdev)
+{
+ struct snp_guest_dev *snp_dev = platform_get_drvdata(pdev);
+
+ free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
+ free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
+ deinit_crypto(snp_dev->crypto);
+ misc_deregister(&snp_dev->misc);
+
+ return 0;
+}
+
+static struct platform_driver snp_guest_driver = {
+ .remove = __exit_p(snp_guest_remove),
+ .driver = {
+ .name = "snp-guest",
+ },
+};
+
+module_platform_driver_probe(snp_guest_driver, snp_guest_probe);
+
+MODULE_AUTHOR("Brijesh Singh <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1.0.0");
+MODULE_DESCRIPTION("AMD SNP Guest Driver");
diff --git a/drivers/virt/coco/sevguest/sevguest.h b/drivers/virt/coco/sevguest/sevguest.h
new file mode 100644
index 000000000000..cfa76cf8a21a
--- /dev/null
+++ b/drivers/virt/coco/sevguest/sevguest.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ * SEV-SNP API spec is available at https://developer.amd.com/sev
+ */
+
+#ifndef __LINUX_SEVGUEST_H_
+#define __LINUX_SEVGUEST_H_
+
+#include <linux/types.h>
+
+#define MAX_AUTHTAG_LEN 32
+
+/* See SNP spec SNP_GUEST_REQUEST section for the structure */
+enum msg_type {
+ SNP_MSG_TYPE_INVALID = 0,
+ SNP_MSG_CPUID_REQ,
+ SNP_MSG_CPUID_RSP,
+ SNP_MSG_KEY_REQ,
+ SNP_MSG_KEY_RSP,
+ SNP_MSG_REPORT_REQ,
+ SNP_MSG_REPORT_RSP,
+ SNP_MSG_EXPORT_REQ,
+ SNP_MSG_EXPORT_RSP,
+ SNP_MSG_IMPORT_REQ,
+ SNP_MSG_IMPORT_RSP,
+ SNP_MSG_ABSORB_REQ,
+ SNP_MSG_ABSORB_RSP,
+ SNP_MSG_VMRK_REQ,
+ SNP_MSG_VMRK_RSP,
+
+ SNP_MSG_TYPE_MAX
+};
+
+enum aead_algo {
+ SNP_AEAD_INVALID,
+ SNP_AEAD_AES_256_GCM,
+};
+
+struct snp_guest_msg_hdr {
+ u8 authtag[MAX_AUTHTAG_LEN];
+ u64 msg_seqno;
+ u8 rsvd1[8];
+ u8 algo;
+ u8 hdr_version;
+ u16 hdr_sz;
+ u8 msg_type;
+ u8 msg_version;
+ u16 msg_sz;
+ u32 rsvd2;
+ u8 msg_vmpck;
+ u8 rsvd3[35];
+} __packed;
+
+struct snp_guest_msg {
+ struct snp_guest_msg_hdr hdr;
+ u8 payload[4000];
+} __packed;
+
+/*
+ * The secrets page contains 96-bytes of reserved field that can be used by
+ * the guest OS. The guest OS uses the area to save the message sequence
+ * number for each VMPCK.
+ *
+ * See the GHCB spec section Secret page layout for the format for this area.
+ */
+struct secrets_os_area {
+ u32 msg_seqno_0;
+ u32 msg_seqno_1;
+ u32 msg_seqno_2;
+ u32 msg_seqno_3;
+ u64 ap_jump_table_pa;
+ u8 rsvd[40];
+ u8 guest_usage[32];
+} __packed;
+
+#define VMPCK_KEY_LEN 32
+
+/* See the SNP spec version 0.9 for secrets page format */
+struct snp_secrets_page_layout {
+ u32 version;
+ u32 imien : 1,
+ rsvd1 : 31;
+ u32 fms;
+ u32 rsvd2;
+ u8 gosvw[16];
+ u8 vmpck0[VMPCK_KEY_LEN];
+ u8 vmpck1[VMPCK_KEY_LEN];
+ u8 vmpck2[VMPCK_KEY_LEN];
+ u8 vmpck3[VMPCK_KEY_LEN];
+ struct secrets_os_area os_area;
+ u8 rsvd3[3840];
+} __packed;
+
+#endif /* __LINUX_SEVGUEST_H_ */
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
new file mode 100644
index 000000000000..eda7edcffda8
--- /dev/null
+++ b/include/uapi/linux/sev-guest.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * Userspace interface for AMD SEV and SEV-SNP guest driver.
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ * SEV API specification is available at: https://developer.amd.com/sev/
+ */
+
+#ifndef __UAPI_LINUX_SEV_GUEST_H_
+#define __UAPI_LINUX_SEV_GUEST_H_
+
+#include <linux/types.h>
+
+struct snp_report_req {
+ /* message version number (must be non-zero) */
+ __u8 msg_version;
+
+ /* user data that should be included in the report */
+ __u8 user_data[64];
+};
+
+struct snp_report_resp {
+ /* response data, see SEV-SNP spec for the format */
+ __u8 data[4000];
+};
+
+struct snp_guest_request_ioctl {
+ /* Request and response structure address */
+ __u64 req_data;
+ __u64 resp_data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u64 fw_err;
+};
+
+#define SNP_GUEST_REQ_IOC_TYPE 'S'
+
+/* Get SNP attestation report */
+#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_guest_request_ioctl)
+
+#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1
From: Michael Roth <[email protected]>
When the Confidential Computing blob is located by the boot/compressed
kernel, store a pointer to it in bootparams->cc_blob_address to avoid
the need for the run-time kernel to rescan the EFI config table to find
it again.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 60885d80bf5f..9d6a2ecb609f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -375,4 +375,11 @@ void snp_cpuid_init_boot(struct boot_params *bp)
/* It should be safe to read SEV MSR and check features now. */
if (!sev_snp_enabled())
sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /*
+ * Pass the run-time kernel a pointer to the CC blob via boot_params, so
+ * the EFI config table doesn't need to be searched again during the
+ * early startup phase.
+ */
+ bp->cc_blob_address = (u32)(unsigned long)cc_info;
}
--
2.25.1
From: Michael Roth <[email protected]>
CPUID instructions generate a #VC exception for SEV-ES/SEV-SNP guests,
which the early #VC handlers are currently set up to handle. In the case
of SEV-SNP, a guest can use a configurable location in guest memory
that has been pre-populated with a firmware-validated CPUID table to
look up the relevant CPUID values rather than requesting them from the
hypervisor via a VMGEXIT. Add the various hooks in the #VC handlers to
allow CPUID instructions to be handled via the table. The code to
actually configure/enable the table will be added in a subsequent
commit.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 1 +
arch/x86/include/asm/sev-common.h | 2 +
arch/x86/kernel/sev-shared.c | 308 ++++++++++++++++++++++++++++++
arch/x86/kernel/sev.c | 1 +
4 files changed, 312 insertions(+)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index e8308ada610d..11c459809d4c 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -20,6 +20,7 @@
#include <asm/fpu/xcr.h>
#include <asm/ptrace.h>
#include <asm/svm.h>
+#include <asm/cpuid.h>
#include "error.h"
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index c380aba9fc8d..45c535eb75f1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -152,6 +152,8 @@ struct snp_psc_desc {
#define GHCB_TERM_PSC 1 /* Page State Change failure */
#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
#define GHCB_TERM_NOT_VMPL0 3 /* SNP guest is not running at VMPL-0 */
+#define GHCB_TERM_CPUID 4 /* CPUID-validation failure */
+#define GHCB_TERM_CPUID_HV 5 /* CPUID failure during hypervisor fallback */
#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 402b19f1c75d..193ca49a1689 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -14,6 +14,41 @@
#define has_cpuflag(f) boot_cpu_has(f)
#endif
+/*
+ * Individual entries of the SEV-SNP CPUID table, as defined by the SEV-SNP
+ * Firmware ABI, Revision 0.9, Section 7.1, Table 14. Note that the XCR0_IN
+ * and XSS_IN are denoted here as __unused/__unused2, since they are not
+ * needed for the current guest implementation, where the size of the buffers
+ * needed to store enabled XSAVE-saved features are calculated rather than
+ * encoded in the CPUID table for each possible combination of XCR0_IN/XSS_IN
+ * to save space.
+ */
+struct snp_cpuid_fn {
+ u32 eax_in;
+ u32 ecx_in;
+ u64 __unused;
+ u64 __unused2;
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+ u64 __reserved;
+} __packed;
+
+/*
+ * SEV-SNP CPUID table header, as defined by the SEV-SNP Firmware ABI,
+ * Revision 0.9, Section 8.14.2.6. Also noted there is the SEV-SNP
+ * firmware-enforced limit of 64 entries per CPUID table.
+ */
+#define SNP_CPUID_COUNT_MAX 64
+
+struct snp_cpuid_info {
+ u32 count;
+ u32 __reserved1;
+ u64 __reserved2;
+ struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
+} __packed;
+
/*
* Since feature negotiation related variables are set early in the boot
* process they must reside in the .data section so as not to be zeroed
@@ -26,6 +61,28 @@ static u16 __ro_after_init ghcb_version;
/* Bitmap of SEV features supported by the hypervisor */
static u64 __ro_after_init sev_hv_features;
+/*
+ * These are stored in .data section to avoid the need to re-parse boot_params
+ * and regenerate the CPUID table/pointer when .bss is cleared.
+ */
+
+/*
+ * The CPUID info can't always be referenced directly due to the need for
+ * pointer fixups during initial startup phase of kernel proper, so access must
+ * be done through this pointer, which will be fixed up as-needed during boot.
+ */
+static const struct snp_cpuid_info *cpuid_info __ro_after_init;
+
+/*
+ * These will be initialized based on CPUID table so that non-present
+ * all-zero leaves (for sparse tables) can be differentiated from
+ * invalid/out-of-range leaves. This is needed since all-zero leaves
+ * still need to be post-processed.
+ */
+u32 cpuid_std_range_max __ro_after_init;
+u32 cpuid_hyp_range_max __ro_after_init;
+u32 cpuid_ext_range_max __ro_after_init;
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -245,6 +302,224 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
return 0;
}
+static inline bool snp_cpuid_active(void)
+{
+ return !!cpuid_info;
+}
+
+static int snp_cpuid_calc_xsave_size(u64 xfeatures_en, u32 base_size,
+ u32 *xsave_size, bool compacted)
+{
+ u32 xsave_size_total = base_size;
+ u64 xfeatures_found = 0;
+ int i;
+
+ for (i = 0; i < cpuid_info->count; i++) {
+ const struct snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+ if (!(fn->eax_in == 0xD && fn->ecx_in > 1 && fn->ecx_in < 64))
+ continue;
+ if (!(xfeatures_en & (BIT_ULL(fn->ecx_in))))
+ continue;
+ if (xfeatures_found & (BIT_ULL(fn->ecx_in)))
+ continue;
+
+ xfeatures_found |= (BIT_ULL(fn->ecx_in));
+
+ if (compacted)
+ xsave_size_total += fn->eax;
+ else
+ xsave_size_total = max(xsave_size_total,
+ fn->eax + fn->ebx);
+ }
+
+ /*
+ * Either the guest set unsupported XCR0/XSS bits, or the corresponding
+ * entries in the CPUID table were not present. This is not a valid
+ * state to be in.
+ */
+ if (xfeatures_found != (xfeatures_en & GENMASK_ULL(63, 2)))
+ return -EINVAL;
+
+ *xsave_size = xsave_size_total;
+
+ return 0;
+}
+
+static void snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
+ u32 *edx)
+{
+ /*
+ * MSR protocol does not support fetching indexed subfunction, but is
+ * sufficient to handle current fallback cases. Should that change,
+ * make sure to terminate rather than ignoring the index and grabbing
+ * random values. If this issue arises in the future, handling can be
+ * added here to use GHCB-page protocol for cases that occur late
+ * enough in boot that GHCB page is available.
+ */
+ if (cpuid_function_is_indexed(func) && subfunc)
+ sev_es_terminate(1, GHCB_TERM_CPUID_HV);
+
+ if (sev_cpuid_hv(func, 0, eax, ebx, ecx, edx))
+ sev_es_terminate(1, GHCB_TERM_CPUID_HV);
+}
+
+static bool
+snp_cpuid_find_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+ u32 *ecx, u32 *edx)
+{
+ int i;
+
+ for (i = 0; i < cpuid_info->count; i++) {
+ const struct snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+ if (fn->eax_in != func)
+ continue;
+
+ if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
+ continue;
+
+ *eax = fn->eax;
+ *ebx = fn->ebx;
+ *ecx = fn->ecx;
+ *edx = fn->edx;
+
+ return true;
+ }
+
+ return false;
+}
+
+static bool snp_cpuid_check_range(u32 func)
+{
+ if (func <= cpuid_std_range_max ||
+ (func >= 0x40000000 && func <= cpuid_hyp_range_max) ||
+ (func >= 0x80000000 && func <= cpuid_ext_range_max))
+ return true;
+
+ return false;
+}
+
+static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+ u32 *ecx, u32 *edx)
+{
+ u32 ebx2, ecx2, edx2;
+
+ switch (func) {
+ case 0x1:
+ snp_cpuid_hv(func, subfunc, NULL, &ebx2, NULL, &edx2);
+
+ /* initial APIC ID */
+ *ebx = (ebx2 & GENMASK(31, 24)) | (*ebx & GENMASK(23, 0));
+ /* APIC enabled bit */
+ *edx = (edx2 & BIT(9)) | (*edx & ~BIT(9));
+
+ /* OSXSAVE enabled bit */
+ if (native_read_cr4() & X86_CR4_OSXSAVE)
+ *ecx |= BIT(27);
+ break;
+ case 0x7:
+ /* OSPKE enabled bit */
+ *ecx &= ~BIT(4);
+ if (native_read_cr4() & X86_CR4_PKE)
+ *ecx |= BIT(4);
+ break;
+ case 0xB:
+ /* extended APIC ID */
+ snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx);
+ break;
+ case 0xD: {
+ bool compacted = false;
+ u64 xcr0 = 1, xss = 0;
+ u32 xsave_size;
+
+ if (subfunc != 0 && subfunc != 1)
+ return 0;
+
+ if (native_read_cr4() & X86_CR4_OSXSAVE)
+ xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+ if (subfunc == 1) {
+ /* Get XSS value if XSAVES is enabled. */
+ if (*eax & BIT(3)) {
+ unsigned long lo, hi;
+
+ asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
+ : "c" (MSR_IA32_XSS));
+ xss = (hi << 32) | lo;
+ }
+
+ /*
+ * The PPR and APM aren't clear on what size should be
+ * encoded in 0xD:0x1:EBX when compaction is not enabled
+ * by either XSAVEC (feature bit 1) or XSAVES (feature
+ * bit 3) since SNP-capable hardware has these feature
+ * bits fixed as 1. KVM sets it to 0 in this case, but
+ * to avoid this becoming an issue it's safer to simply
+ * treat this as unsupported for SEV-SNP guests.
+ */
+ if (!(*eax & (BIT(1) | BIT(3))))
+ return -EINVAL;
+
+ compacted = true;
+ }
+
+ if (snp_cpuid_calc_xsave_size(xcr0 | xss, *ebx, &xsave_size,
+ compacted))
+ return -EINVAL;
+
+ *ebx = xsave_size;
+ }
+ break;
+ case 0x8000001E:
+ /* extended APIC ID */
+ snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL);
+ /* compute ID */
+ *ebx = (*ebx & GENMASK(31, 8)) | (ebx2 & GENMASK(7, 0));
+ /* node ID */
+ *ecx = (*ecx & GENMASK(31, 8)) | (ecx2 & GENMASK(7, 0));
+ break;
+ default:
+ /* No fix-ups needed, use values as-is. */
+ break;
+ }
+
+ return 0;
+}
+
+/*
+ * Returns -EOPNOTSUPP if feature not enabled. Any other return value should be
+ * treated as fatal by caller.
+ */
+static int snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
+ u32 *edx)
+{
+ if (!snp_cpuid_active())
+ return -EOPNOTSUPP;
+
+ if (!snp_cpuid_find_validated_func(func, subfunc, eax, ebx, ecx, edx)) {
+ /*
+ * Some hypervisors will avoid keeping track of CPUID entries
+ * where all values are zero, since they can be handled the
+ * same as out-of-range values (all-zero). This is useful here
+ * as well as it allows virtually all guest configurations to
+ * work using a single SEV-SNP CPUID table.
+ *
+ * To allow for this, there is a need to distinguish between
+ * out-of-range entries and in-range zero entries, since the
+ * CPUID table entries are only a template that may need to be
+ * augmented with additional values for things like
+ * CPU-specific information during post-processing. So if it's
+ * not in the table, but is still in the valid range, proceed
+ * with the post-processing. Otherwise, just return zeros.
+ */
+ *eax = *ebx = *ecx = *edx = 0;
+ if (!snp_cpuid_check_range(func))
+ return 0;
+ }
+
+ return snp_cpuid_postprocess(func, subfunc, eax, ebx, ecx, edx);
+}
+
/*
* Boot VC Handler - This is the first VC handler during boot, there is no GHCB
* page yet, so it only supports the MSR based communication with the
@@ -252,8 +527,10 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
*/
void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
+ unsigned int subfn = lower_bits(regs->cx, 32);
unsigned int fn = lower_bits(regs->ax, 32);
u32 eax, ebx, ecx, edx;
+ int ret;
/* Only CPUID is supported via MSR protocol */
if (exit_code != SVM_EXIT_CPUID)
@@ -273,9 +550,17 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
sev_status = (hi << 32) | lo;
}
+ ret = snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx);
+ if (ret == 0)
+ goto cpuid_done;
+
+ if (ret != -EOPNOTSUPP)
+ goto fail;
+
if (sev_cpuid_hv(fn, 0, &eax, &ebx, &ecx, &edx))
goto fail;
+cpuid_done:
regs->ax = eax;
regs->bx = ebx;
regs->cx = ecx;
@@ -569,12 +854,35 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}
+static int vc_handle_cpuid_snp(struct pt_regs *regs)
+{
+ u32 eax, ebx, ecx, edx;
+ int ret;
+
+ ret = snp_cpuid(regs->ax, regs->cx, &eax, &ebx, &ecx, &edx);
+ if (ret == 0) {
+ regs->ax = eax;
+ regs->bx = ebx;
+ regs->cx = ecx;
+ regs->dx = edx;
+ }
+
+ return ret;
+}
+
static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
struct es_em_ctxt *ctxt)
{
struct pt_regs *regs = ctxt->regs;
u32 cr4 = native_read_cr4();
enum es_result ret;
+ int snp_cpuid_ret;
+
+ snp_cpuid_ret = vc_handle_cpuid_snp(regs);
+ if (snp_cpuid_ret == 0)
+ return ES_OK;
+ if (snp_cpuid_ret != -EOPNOTSUPP)
+ return ES_VMM_ERROR;
ghcb_set_rax(ghcb, regs->ax);
ghcb_set_rcx(ghcb, regs->cx);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index dfb5b2920933..d348ad027df8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -33,6 +33,7 @@
#include <asm/smp.h>
#include <asm/cpu.h>
#include <asm/apic.h>
+#include <asm/cpuid.h>
#define DR7_RESET_VALUE 0x400
--
2.25.1
Version 2 of the GHCB specification defines a Non-Automatic Exit (NAE) event to
get the extended guest report. It is similar to the SNP_GET_REPORT ioctl.
The main difference is the additional data that is returned: a certificate
blob that can be used by the SNP guest user. The certificate blob layout is
defined in the GHCB specification. The driver simply treats the blob as opaque
data and copies it to userspace.
Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 23 +++++++
drivers/virt/coco/sevguest/sevguest.c | 97 ++++++++++++++++++++++++++-
include/uapi/linux/sev-guest.h | 13 ++++
3 files changed, 131 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 4b524d1de37c..071dc93aad6c 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -86,6 +86,29 @@ on the various fileds passed in the key derivation request.
On success, the snp_derived_key_resp.data will contains the derived key value. See
the SEV-SNP specification for further details.
+
+2.3 SNP_GET_EXT_REPORT
+----------------------
+:Technology: sev-snp
+:Type: guest ioctl
+:Parameters (in/out): struct snp_ext_report_req
+:Returns (out): struct snp_report_resp on success, -negative on error
+
+The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
+the additional certificate data that is returned with the report. The
+certificate data is provided by the hypervisor through the SNP_SET_EXT_CONFIG.
+
+The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
+firmware to get the attestation report.
+
+On success, the snp_ext_report_resp.data will contain the attestation report
+and snp_ext_report_req.certs_address will contain the certificate blob. If the
+length of the blob is smaller than expected then snp_ext_report_req.certs_len will
+be updated with the expected value.
+
+See the GHCB specification for further details on how to parse the certificate blob.
+
Reference
---------
diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
index c6ca7d861a3a..f7115adc4378 100644
--- a/drivers/virt/coco/sevguest/sevguest.c
+++ b/drivers/virt/coco/sevguest/sevguest.c
@@ -41,6 +41,7 @@ struct snp_guest_dev {
struct device *dev;
struct miscdevice misc;
+ void *certs_data;
struct snp_guest_crypto *crypto;
struct snp_guest_msg *request, *response;
struct snp_secrets_page_layout *layout;
@@ -410,6 +411,88 @@ static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_guest_reque
return rc;
}
+static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_ext_report_req req;
+ struct snp_report_resp *resp;
+ int ret, npages = 0, resp_len;
+
+ if (!arg->req_data || !arg->resp_data)
+ return -EINVAL;
+
+ /* Copy the request payload from userspace */
+ if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
+ return -EFAULT;
+
+ /* Message version must be non-zero */
+ if (!req.data.msg_version)
+ return -EINVAL;
+
+ if (req.certs_len) {
+ if (req.certs_len > SEV_FW_BLOB_MAX_SIZE ||
+ !IS_ALIGNED(req.certs_len, PAGE_SIZE))
+ return -EINVAL;
+ }
+
+ if (req.certs_address && req.certs_len) {
+ if (!access_ok(req.certs_address, req.certs_len))
+ return -EFAULT;
+
+ /*
+ * Initialize the intermediate buffer with all zeros. This buffer
+ * is used in the guest request message to get the certs blob from
+ * the host. If the host does not supply any certs in it, then copy
+ * zeros to indicate that certificate data was not provided.
+ */
+ memset(snp_dev->certs_data, 0, req.certs_len);
+
+ npages = req.certs_len >> PAGE_SHIFT;
+ }
+
+ /*
+ * The intermediate response buffer is used while decrypting the
+ * response payload. Make sure that it has enough space to cover the
+ * authtag.
+ */
+ resp_len = sizeof(resp->data) + crypto->a_len;
+ resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
+ if (!resp)
+ return -ENOMEM;
+
+ snp_dev->input.data_npages = npages;
+ ret = handle_guest_request(snp_dev, SVM_VMGEXIT_EXT_GUEST_REQUEST, req.data.msg_version,
+ SNP_MSG_REPORT_REQ, &req.data.user_data,
+ sizeof(req.data.user_data), resp->data, resp_len, &arg->fw_err);
+
+ /* If certs length is invalid then copy the returned length */
+ if (arg->fw_err == SNP_GUEST_REQ_INVALID_LEN) {
+ req.certs_len = snp_dev->input.data_npages << PAGE_SHIFT;
+
+ if (copy_to_user((void __user *)arg->req_data, &req, sizeof(req)))
+ ret = -EFAULT;
+ }
+
+ if (ret)
+ goto e_free;
+
+ /* Copy the certificate data blob to userspace */
+ if (req.certs_address && req.certs_len &&
+ copy_to_user((void __user *)req.certs_address, snp_dev->certs_data,
+ req.certs_len)) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ /* Copy the response payload to userspace */
+ if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
+ ret = -EFAULT;
+
+e_free:
+ kfree(resp);
+ return ret;
+}
+
static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
struct snp_guest_dev *snp_dev = to_snp_dev(file);
@@ -431,6 +514,9 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
case SNP_GET_DERIVED_KEY:
ret = get_derived_key(snp_dev, &input);
break;
+ case SNP_GET_EXT_REPORT:
+ ret = get_ext_report(snp_dev, &input);
+ break;
default:
break;
}
@@ -508,7 +594,7 @@ static u8 *get_vmpck(int id, struct snp_secrets_page_layout *layout, u32 **seqno
break;
}
- return NULL;
+ return key;
}
static int __init snp_guest_probe(struct platform_device *pdev)
@@ -554,6 +640,10 @@ static int __init snp_guest_probe(struct platform_device *pdev)
if (!snp_dev->response)
goto e_fail;
+ snp_dev->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
+ if (!snp_dev->certs_data)
+ goto e_fail;
+
ret = -EIO;
snp_dev->crypto = init_crypto(snp_dev, vmpck, VMPCK_KEY_LEN);
if (!snp_dev->crypto)
@@ -567,16 +657,18 @@ static int __init snp_guest_probe(struct platform_device *pdev)
/* initial the input address for guest request */
snp_dev->input.req_gpa = __pa(snp_dev->request);
snp_dev->input.resp_gpa = __pa(snp_dev->response);
+ snp_dev->input.data_gpa = __pa(snp_dev->certs_data);
ret = misc_register(misc);
if (ret)
goto e_fail;
- dev_dbg(dev, "Initialized SNP guest driver (using vmpck_id %d)\n", vmpck_id);
+ dev_info(dev, "Initialized SNP guest driver (using vmpck_id %d)\n", vmpck_id);
return 0;
e_fail:
iounmap(layout);
+ free_shared_pages(snp_dev->certs_data, SEV_FW_BLOB_MAX_SIZE);
free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
@@ -589,6 +681,7 @@ static int __exit snp_guest_remove(struct platform_device *pdev)
free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
+ free_shared_pages(snp_dev->certs_data, SEV_FW_BLOB_MAX_SIZE);
deinit_crypto(snp_dev->crypto);
misc_deregister(&snp_dev->misc);
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index f6d9c136ff4d..3f6a9d694a47 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -57,6 +57,16 @@ struct snp_derived_key_resp {
__u8 data[64];
};
+struct snp_ext_report_req {
+ struct snp_report_req data;
+
+ /* where to copy the certificate blob */
+ __u64 certs_address;
+
+ /* length of the certificate blob */
+ __u32 certs_len;
+};
+
#define SNP_GUEST_REQ_IOC_TYPE 'S'
/* Get SNP attestation report */
@@ -65,4 +75,7 @@ struct snp_derived_key_resp {
/* Get a derived key from the root */
#define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_guest_request_ioctl)
+/* Get SNP extended report as defined in the GHCB specification version 2. */
+#define SNP_GET_EXT_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x2, struct snp_guest_request_ioctl)
+
#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1
From: Michael Roth <[email protected]>
Determining which CPUID leaves have significant ECX/index values is
also needed by guest kernel code when doing SEV-SNP-validated CPUID
lookups. Move this to common code to keep future updates in sync.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/cpuid.h | 26 ++++++++++++++++++++++++++
arch/x86/kvm/cpuid.c | 17 ++---------------
2 files changed, 28 insertions(+), 15 deletions(-)
create mode 100644 arch/x86/include/asm/cpuid.h
diff --git a/arch/x86/include/asm/cpuid.h b/arch/x86/include/asm/cpuid.h
new file mode 100644
index 000000000000..61426eb1f665
--- /dev/null
+++ b/arch/x86/include/asm/cpuid.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CPUID_H
+#define _ASM_X86_CPUID_H
+
+static __always_inline bool cpuid_function_is_indexed(u32 function)
+{
+ switch (function) {
+ case 4:
+ case 7:
+ case 0xb:
+ case 0xd:
+ case 0xf:
+ case 0x10:
+ case 0x12:
+ case 0x14:
+ case 0x17:
+ case 0x18:
+ case 0x1f:
+ case 0x8000001d:
+ return true;
+ }
+
+ return false;
+}
+
+#endif /* _ASM_X86_CPUID_H */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 751aa85a3001..312b0382e541 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -19,6 +19,7 @@
#include <asm/user.h>
#include <asm/fpu/xstate.h>
#include <asm/sgx.h>
+#include <asm/cpuid.h>
#include "cpuid.h"
#include "lapic.h"
#include "mmu.h"
@@ -582,22 +583,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
cpuid_count(entry->function, entry->index,
&entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
- switch (function) {
- case 4:
- case 7:
- case 0xb:
- case 0xd:
- case 0xf:
- case 0x10:
- case 0x12:
- case 0x14:
- case 0x17:
- case 0x18:
- case 0x1f:
- case 0x8000001d:
+ if (cpuid_function_is_indexed(function))
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
- break;
- }
return entry;
}
--
2.25.1
Version 2 of the GHCB specification defines the SNP_GUEST_REQUEST and
SNP_EXT_GUEST_REQUEST NAE events that an SNP guest can use to communicate
with the PSP.
While at it, add a snp_issue_guest_request() helper that drivers or other
subsystems can use to issue requests to the PSP.
See the SEV-SNP and GHCB specifications for more details.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 3 ++
arch/x86/include/asm/sev.h | 13 ++++++++
arch/x86/include/uapi/asm/svm.h | 4 +++
arch/x86/kernel/sev.c | 50 +++++++++++++++++++++++++++++++
4 files changed, 70 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 45c535eb75f1..cf66600b1c68 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -128,6 +128,9 @@ struct snp_psc_desc {
struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
} __packed;
+/* Guest message request error code */
+#define SNP_GUEST_REQ_INVALID_LEN BIT_ULL(32)
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 1c58060b48b7..4ea8e2f73d37 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -80,6 +80,14 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
#define RMPADJUST_VMSA_PAGE_BIT BIT(16)
+/* SNP Guest message request */
+struct snp_req_data {
+ unsigned long req_gpa;
+ unsigned long resp_gpa;
+ unsigned long data_gpa;
+ unsigned int data_npages;
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -129,6 +137,7 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
void snp_cpuid_init_startup(struct boot_params *bp, unsigned long physaddr);
void snp_cpuid_init(void);
+int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -146,6 +155,10 @@ static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npag
static inline void snp_set_wakeup_secondary_cpu(void) { }
static inline void snp_cpuid_init_startup(struct boot_params *bp, unsigned long physbase) { }
static inline void snp_cpuid_init(void) { }
+static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err)
+{
+ return -ENOTTY;
+}
#endif
#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8b4c57baec52..5b8bc2b65a5e 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,8 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
+#define SVM_VMGEXIT_EXT_GUEST_REQUEST 0x80000012
#define SVM_VMGEXIT_AP_CREATION 0x80000013
#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
#define SVM_VMGEXIT_AP_CREATE 1
@@ -225,6 +227,8 @@
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_GUEST_REQUEST, "vmgexit_guest_request" }, \
+ { SVM_VMGEXIT_EXT_GUEST_REQUEST, "vmgexit_ext_guest_request" }, \
{ SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1e6152fe27ba..c29a78f868ed 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2121,3 +2121,53 @@ static int __init snp_cpuid_check_status(void)
}
arch_initcall(snp_cpuid_check_status);
+
+int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err)
+{
+ struct ghcb_state state;
+ unsigned long flags;
+ struct ghcb *ghcb;
+ int ret;
+
+ if (!cc_platform_has(CC_ATTR_SEV_SNP))
+ return -ENODEV;
+
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+ if (!ghcb) {
+ ret = -EIO;
+ goto e_restore_irq;
+ }
+
+ vc_ghcb_invalidate(ghcb);
+
+ if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST) {
+ ghcb_set_rax(ghcb, input->data_gpa);
+ ghcb_set_rbx(ghcb, input->data_npages);
+ }
+
+ ret = sev_es_ghcb_hv_call(ghcb, NULL, exit_code, input->req_gpa, input->resp_gpa);
+ if (ret)
+ goto e_put;
+
+ if (ghcb->save.sw_exit_info_2) {
+ /* Number of expected pages are returned in RBX */
+ if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST &&
+ ghcb->save.sw_exit_info_2 == SNP_GUEST_REQ_INVALID_LEN)
+ input->data_npages = ghcb_get_rbx(ghcb);
+
+ if (fw_err)
+ *fw_err = ghcb->save.sw_exit_info_2;
+
+ ret = -EIO;
+ }
+
+e_put:
+ __sev_put_ghcb(&state);
+e_restore_irq:
+ local_irq_restore(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(snp_issue_guest_request);
--
2.25.1
From: Michael Roth <[email protected]>
Future patches for SEV-SNP-validated CPUID will also require early
parsing of the EFI configuration. Incrementally move the related code
into a set of helpers that can be reused for that purpose.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/acpi.c | 52 +++++----------------------------
arch/x86/boot/compressed/efi.c | 42 ++++++++++++++++++++++++++
arch/x86/boot/compressed/misc.h | 9 ++++++
3 files changed, 58 insertions(+), 45 deletions(-)
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 255f6959c090..d43ff3ff573b 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -117,54 +117,16 @@ static acpi_physical_address kexec_get_rsdp_addr(void) { return 0; }
static acpi_physical_address efi_get_rsdp_addr(void)
{
#ifdef CONFIG_EFI
- unsigned long systab, config_tables;
- unsigned int nr_tables;
- struct efi_info *ei;
+ unsigned long cfg_tbl_pa = 0;
+ unsigned int cfg_tbl_len;
bool efi_64;
- char *sig;
-
- ei = &boot_params->efi_info;
- sig = (char *)&ei->efi_loader_signature;
-
- if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
- efi_64 = true;
- } else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4)) {
- efi_64 = false;
- } else {
- debug_putstr("Wrong EFI loader signature.\n");
- return 0;
- }
-
- /* Get systab from boot params. */
-#ifdef CONFIG_X86_64
- systab = ei->efi_systab | ((__u64)ei->efi_systab_hi << 32);
-#else
- if (ei->efi_systab_hi || ei->efi_memmap_hi) {
- debug_putstr("Error getting RSDP address: EFI system table located above 4GB.\n");
- return 0;
- }
- systab = ei->efi_systab;
-#endif
- if (!systab)
- error("EFI system table not found.");
-
- /* Handle EFI bitness properly */
- if (efi_64) {
- efi_system_table_64_t *stbl = (efi_system_table_64_t *)systab;
-
- config_tables = stbl->tables;
- nr_tables = stbl->nr_tables;
- } else {
- efi_system_table_32_t *stbl = (efi_system_table_32_t *)systab;
-
- config_tables = stbl->tables;
- nr_tables = stbl->nr_tables;
- }
+ int ret;
- if (!config_tables)
- error("EFI config tables not found.");
+ ret = efi_get_conf_table(boot_params, &cfg_tbl_pa, &cfg_tbl_len, &efi_64);
+ if (ret || !cfg_tbl_pa)
+ error("EFI config table not found.");
- return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
+ return __efi_get_rsdp_addr(cfg_tbl_pa, cfg_tbl_len, efi_64);
#else
return 0;
#endif
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
index 306b287b7368..e5f39b3f5665 100644
--- a/arch/x86/boot/compressed/efi.c
+++ b/arch/x86/boot/compressed/efi.c
@@ -62,3 +62,45 @@ int efi_get_system_table(struct boot_params *boot_params, unsigned long *sys_tbl
*is_efi_64 = efi_64;
return 0;
}
+
+/**
+ * Given boot_params, locate EFI system table from it and return the physical
+ * address EFI configuration table.
+ *
+ * @boot_params: pointer to boot_params
+ * @cfg_tbl_pa: location to store physical address of config table
+ * @cfg_tbl_len: location to store number of config table entries
+ * @is_efi_64: location to store whether using 64-bit EFI or not
+ *
+ * Returns 0 on success. On error, return params are left unchanged.
+ */
+int efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_pa,
+ unsigned int *cfg_tbl_len, bool *is_efi_64)
+{
+ unsigned long sys_tbl_pa = 0;
+ int ret;
+
+ if (!cfg_tbl_pa || !cfg_tbl_len || !is_efi_64)
+ return -EINVAL;
+
+ ret = efi_get_system_table(boot_params, &sys_tbl_pa, is_efi_64);
+ if (ret)
+ return ret;
+
+ /* Handle EFI bitness properly */
+ if (*is_efi_64) {
+ efi_system_table_64_t *stbl =
+ (efi_system_table_64_t *)sys_tbl_pa;
+
+ *cfg_tbl_pa = stbl->tables;
+ *cfg_tbl_len = stbl->nr_tables;
+ } else {
+ efi_system_table_32_t *stbl =
+ (efi_system_table_32_t *)sys_tbl_pa;
+
+ *cfg_tbl_pa = stbl->tables;
+ *cfg_tbl_len = stbl->nr_tables;
+ }
+
+ return 0;
+}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index f86ff866fd7a..b72fd860362a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -179,6 +179,8 @@ unsigned long sev_verify_cbit(unsigned long cr3);
/* helpers for early EFI config table access */
int efi_get_system_table(struct boot_params *boot_params,
unsigned long *sys_tbl_pa, bool *is_efi_64);
+int efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_pa,
+ unsigned int *cfg_tbl_len, bool *is_efi_64);
#else
static inline int
efi_get_system_table(struct boot_params *boot_params,
@@ -186,6 +188,13 @@ efi_get_system_table(struct boot_params *boot_params,
{
return -ENOENT;
}
+
+static inline int
+efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_pa,
+ unsigned int *cfg_tbl_len, bool *is_efi_64)
+{
+ return -ENOENT;
+}
#endif /* CONFIG_EFI */
#endif /* BOOT_COMPRESSED_MISC_H */
--
2.25.1
From: Michael Roth <[email protected]>
SEV-SNP guests will be provided the location of special 'secrets' and
'CPUID' pages via the Confidential Computing blob. This blob is
provided to the run-time kernel either through a boot_params field that
was initialized by the boot/compressed kernel, or via a setup_data
structure as defined by the Linux Boot Protocol.
Locate the Confidential Computing blob from these sources and, if found,
use the provided CPUID page/table address to create a copy that the
run-time kernel will use when servicing CPUID instructions via a #VC
handler.
This must be set up during early startup, before any CPUID instructions
are issued. As a result, some pointer fixups are needed early on that
must be adjusted later in boot, which is why there are two init routines.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 2 +-
arch/x86/include/asm/setup.h | 2 +-
arch/x86/include/asm/sev.h | 17 +----
arch/x86/kernel/head64.c | 12 ++-
arch/x86/kernel/sev-shared.c | 23 +++++-
arch/x86/kernel/sev.c | 135 +++++++++++++++++++++++++++++++++
6 files changed, 170 insertions(+), 21 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 1b77b819ddb4..2f31f69715d0 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -361,7 +361,7 @@ void snp_cpuid_init_boot(struct boot_params *bp)
if (!cc_info)
return;
- snp_cpuid_info_create(cc_info);
+ snp_cpuid_info_create(cc_info, 0);
/* SEV-SNP CPUID table is set up now. Do some sanity checks. */
if (!snp_cpuid_active())
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index a12458a7a8d4..cee1e816fdcd 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -50,7 +50,7 @@ extern void reserve_standard_io_resources(void);
extern void i386_reserve_resources(void);
extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
extern unsigned long __startup_secondary_64(void);
-extern void startup_64_setup_env(unsigned long physbase);
+extern void startup_64_setup_env(unsigned long physbase, struct boot_params *bp);
extern void early_setup_idt(void);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7c88762cdb23..1c58060b48b7 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -127,17 +127,8 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
-/*
- * TODO: These are exported only temporarily while boot/compressed/sev.c is
- * the only user. This is to avoid unused function warnings for kernel/sev.c
- * during the build of kernel proper.
- *
- * Once the code is added to consume these in kernel proper these functions
- * can be moved back to being statically-scoped to units that pull in
- * sev-shared.c via #include and these declarations can be dropped.
- */
-void __init snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info);
-struct cc_blob_sev_info *snp_find_cc_blob_setup_data(struct boot_params *bp);
+void snp_cpuid_init_startup(struct boot_params *bp, unsigned long physaddr);
+void snp_cpuid_init(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -153,8 +144,8 @@ static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz,
static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_wakeup_secondary_cpu(void) { }
-void snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info) { }
-struct cc_blob_sev_info *snp_find_cc_blob_setup_data(struct boot_params *bp) { }
+static inline void snp_cpuid_init_startup(struct boot_params *bp, unsigned long physbase) { }
+static inline void snp_cpuid_init(void) { }
#endif
#endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 3c0bfed3b58e..ef5efa484efa 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -571,7 +571,7 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
}
/* This runs while still in the direct mapping */
-static void startup_64_load_idt(unsigned long physbase)
+static void startup_64_load_idt(unsigned long physbase, struct boot_params *bp)
{
struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
@@ -587,6 +587,9 @@ static void startup_64_load_idt(unsigned long physbase)
desc->address = (unsigned long)idt;
native_load_idt(desc);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ snp_cpuid_init_startup(bp, physbase);
}
/* This is used when running on kernel addresses */
@@ -598,12 +601,15 @@ void early_setup_idt(void)
bringup_idt_descr.address = (unsigned long)bringup_idt_table;
native_load_idt(&bringup_idt_descr);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ snp_cpuid_init();
}
/*
* Setup boot CPU state needed before kernel switches to virtual addresses.
*/
-void __head startup_64_setup_env(unsigned long physbase)
+void __head startup_64_setup_env(unsigned long physbase, struct boot_params *bp)
{
/* Load GDT */
startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
@@ -614,5 +620,5 @@ void __head startup_64_setup_env(unsigned long physbase)
"movl %%eax, %%ss\n"
"movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
- startup_64_load_idt(physbase);
+ startup_64_load_idt(physbase, bp);
}
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b321c1b7d07c..341ea8800b9f 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -976,7 +976,7 @@ static struct cc_setup_data *get_cc_setup_data(struct boot_params *bp)
* Search for a Confidential Computing blob passed in as a setup_data entry
* via the Linux Boot Protocol.
*/
-struct cc_blob_sev_info *
+static struct cc_blob_sev_info *
snp_find_cc_blob_setup_data(struct boot_params *bp)
{
struct cc_setup_data *sd;
@@ -988,6 +988,22 @@ snp_find_cc_blob_setup_data(struct boot_params *bp)
return (struct cc_blob_sev_info *)(unsigned long)sd->cc_blob_address;
}
+static const struct snp_cpuid_info *
+snp_cpuid_info_get_ptr(unsigned long physbase)
+{
+ void *ptr = &cpuid_info_copy;
+
+ /* physbase is only 0 when the caller doesn't need adjustments */
+ if (!physbase)
+ return ptr;
+
+ /*
+ * Handle relocation adjustments for global pointers, as done by
+ * fixup_pointer() in __startup_64().
+ */
+ return ptr - (void *)_text + (void *)physbase;
+}
+
/*
* Initialize the kernel's copy of the SEV-SNP CPUID table, and set up the
* pointer that will be used to access it.
@@ -997,7 +1013,8 @@ snp_find_cc_blob_setup_data(struct boot_params *bp)
* mapping needs to be updated in sync with all the changes to virtual memory
* layout and related mapping facilities throughout the boot process.
*/
-void __init snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info)
+static void __init snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info,
+ unsigned long physbase)
{
const struct snp_cpuid_info *cpuid_info_fw;
@@ -1008,7 +1025,7 @@ void __init snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info)
if (!cpuid_info_fw->count || cpuid_info_fw->count > SNP_CPUID_COUNT_MAX)
sev_es_terminate(1, GHCB_TERM_CPUID);
- cpuid_info = &cpuid_info_copy;
+ cpuid_info = snp_cpuid_info_get_ptr(physbase);
memcpy((void *)cpuid_info, cpuid_info_fw, sizeof(*cpuid_info));
snp_cpuid_set_ranges();
}
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index d348ad027df8..1e6152fe27ba 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1986,3 +1986,138 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
while (true)
halt();
}
+
+/*
+ * Initial set up of SEV-SNP CPUID table relies on information provided
+ * by the Confidential Computing blob, which can be passed to the kernel
+ * in the following ways, depending on how it is booted:
+ *
+ * - when booted via the boot/decompress kernel:
+ * - via boot_params
+ *
+ * - when booted directly by firmware/bootloader (e.g. CONFIG_PVH):
+ * - via a setup_data entry, as defined by the Linux Boot Protocol
+ *
+ * Scan for the blob in that order.
+ */
+struct cc_blob_sev_info *snp_find_cc_blob(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info;
+
+ /* Boot kernel would have passed the CC blob via boot_params. */
+ if (bp->cc_blob_address) {
+ cc_info = (struct cc_blob_sev_info *)
+ (unsigned long)bp->cc_blob_address;
+ goto found_cc_info;
+ }
+
+ /*
+ * If kernel was booted directly, without the use of the
+ * boot/decompression kernel, the CC blob may have been passed via
+ * setup_data instead.
+ */
+ cc_info = snp_find_cc_blob_setup_data(bp);
+ if (!cc_info)
+ return NULL;
+
+found_cc_info:
+ if (cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
+ sev_es_terminate(1, GHCB_SNP_UNSUPPORTED);
+
+ return cc_info;
+}
+
+/*
+ * Initial set up of SEV-SNP CPUID table during early startup when still
+ * using identity-mapped addresses.
+ *
+ * Since this is during early startup, physbase is needed to generate the
+ * correct pointer to the initialized CPUID table. This pointer will be
+ * adjusted again later via snp_cpuid_init() after the kernel switches over
+ * to virtual addresses and pointer fixups are no longer needed.
+ */
+void __init snp_cpuid_init_startup(struct boot_params *bp,
+ unsigned long physbase)
+{
+ struct cc_blob_sev_info *cc_info;
+ u32 eax;
+
+ if (!bp)
+ return;
+
+ cc_info = snp_find_cc_blob(bp);
+ if (!cc_info)
+ return;
+
+ snp_cpuid_info_create(cc_info, physbase);
+
+ /* SEV-SNP CPUID table is set up now. Do some sanity checks. */
+ if (!snp_cpuid_active())
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /* SEV (bit 1) and SEV-SNP (bit 4) should be enabled in CPUID. */
+ eax = native_cpuid_eax(0x8000001f);
+ if (!(eax & (BIT(4) | BIT(1))))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /* #VC generated by CPUID above will set sev_status based on SEV MSR. */
+ if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /*
+ * The CC blob will be used later to access the secrets page. Cache
+ * it here like the boot kernel does.
+ */
+ bp->cc_blob_address = (u32)(unsigned long)cc_info;
+}
+
+/*
+ * This is called after the kernel switches over to virtual addresses. Fixup
+ * offsets are no longer needed at this point, so update the CPUID table
+ * pointer accordingly.
+ */
+void snp_cpuid_init(void)
+{
+ if (!cc_platform_has(CC_ATTR_SEV_SNP)) {
+ /* Firmware should not have advertised the feature. */
+ if (snp_cpuid_active())
+ panic("Invalid use of SEV-SNP CPUID table.");
+ return;
+ }
+
+ /* CPUID table should always be available when SEV-SNP is enabled. */
+ if (!snp_cpuid_active())
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /* Remove the fixup offset from the cpuid_info pointer. */
+ cpuid_info = snp_cpuid_info_get_ptr(0);
+}
+
+/*
+ * It is useful from an auditing/testing perspective to provide an easy way
+ * for the guest owner to know that the CPUID table has been initialized as
+ * expected, but that initialization happens too early in boot to print any
+ * sort of indicator, and there's not really any other good place to do it. So
+ * do it here, and while at it, go ahead and re-verify that nothing strange has
+ * happened between early boot and now.
+ */
+static int __init snp_cpuid_check_status(void)
+{
+ if (!cc_platform_has(CC_ATTR_SEV_SNP)) {
+ /* Firmware should not have advertised the feature. */
+ if (snp_cpuid_active())
+ panic("Invalid use of SEV-SNP CPUID table.");
+ return 0;
+ }
+
+ /* CPUID table should always be available when SEV-SNP is enabled. */
+ if (!snp_cpuid_active())
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ pr_info("Using SEV-SNP CPUID table, %d entries present.\n",
+ cpuid_info->count);
+
+ return 0;
+}
+
+arch_initcall(snp_cpuid_check_status);
--
2.25.1
Version 2 of the GHCB specification provides a Non-Automatic Exit (NAE) event
that the SNP guest can use to communicate with the PSP without risk from a
malicious hypervisor that wishes to read, alter, drop or replay the messages
sent.
SNP_LAUNCH_UPDATE can insert two special pages into the guest's memory:
the secrets page and the CPUID page. The PSP firmware populates the contents
of the secrets page. The secrets page contains encryption keys used by the
guest to interact with the firmware. Because the secrets page is encrypted
with the guest's memory encryption key, the hypervisor cannot read the keys.
See the SNP FW ABI spec for further details about the secrets page.
Create a platform device that the SNP guest driver can bind to in order to
obtain platform resources, such as the encryption key and message ID, used
to communicate with the PSP. The SNP guest driver provides a userspace
interface to get the attestation report, key derivation, extended
attestation report, etc.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 4 +++
arch/x86/kernel/sev.c | 61 ++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 4ea8e2f73d37..2a9e6ea11242 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -88,6 +88,10 @@ struct snp_req_data {
unsigned int data_npages;
};
+struct snp_guest_platform_data {
+ u64 secrets_gpa;
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index c29a78f868ed..01505ac9c7b2 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -19,6 +19,9 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/efi.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -34,6 +37,7 @@
#include <asm/cpu.h>
#include <asm/apic.h>
#include <asm/cpuid.h>
+#include <asm/setup.h>
#define DR7_RESET_VALUE 0x400
@@ -2171,3 +2175,60 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned
return ret;
}
EXPORT_SYMBOL_GPL(snp_issue_guest_request);
+
+static struct platform_device guest_req_device = {
+ .name = "snp-guest",
+ .id = -1,
+};
+
+static u64 get_secrets_page(void)
+{
+ u64 pa_data = boot_params.cc_blob_address;
+ struct cc_blob_sev_info info;
+ void *map;
+
+ /*
+ * The CC blob contains the address of the secrets page, check if the
+ * blob is present.
+ */
+ if (!pa_data)
+ return 0;
+
+ map = early_memremap(pa_data, sizeof(info));
+ memcpy(&info, map, sizeof(info));
+ early_memunmap(map, sizeof(info));
+
+ /* smoke-test the secrets page passed */
+ if (!info.secrets_phys || info.secrets_len != PAGE_SIZE)
+ return 0;
+
+ return info.secrets_phys;
+}
+
+static int __init init_snp_platform_device(void)
+{
+ struct snp_guest_platform_data data;
+ u64 gpa;
+
+ if (!cc_platform_has(CC_ATTR_SEV_SNP))
+ return -ENODEV;
+
+ gpa = get_secrets_page();
+ if (!gpa)
+ return -ENODEV;
+
+ data.secrets_gpa = gpa;
+ if (platform_device_add_data(&guest_req_device, &data, sizeof(data)))
+ goto e_fail;
+
+ if (platform_device_register(&guest_req_device))
+ goto e_fail;
+
+ pr_info("SNP guest platform device initialized.\n");
+ return 0;
+
+e_fail:
+ pr_err("Failed to initialize SNP guest device\n");
+ return -ENODEV;
+}
+device_initcall(init_snp_platform_device);
--
2.25.1
From: Michael Roth <[email protected]>
The run-time kernel will need to access the Confidential Computing
blob very early in boot to locate the CPUID table it points to. At
that stage of boot it will be relying on the identity-mapped page table
set up by the boot/compressed kernel, so make sure the blob and the CPUID
table it points to are mapped in advance.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 26 ++++++++++++++++++++++++-
arch/x86/boot/compressed/misc.h | 2 ++
arch/x86/boot/compressed/sev.c | 2 +-
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 3cf7a7575f5c..10ecbc53f8bc 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -37,6 +37,8 @@
#include <asm/setup.h> /* For COMMAND_LINE_SIZE */
#undef _SETUP
+#include <asm/sev.h> /* For ConfidentialComputing blob */
+
extern unsigned long get_cmd_line_ptr(void);
/* Used by PAGE_KERN* macros: */
@@ -106,6 +108,27 @@ static void add_identity_map(unsigned long start, unsigned long end)
error("Error: kernel_ident_mapping_init() failed\n");
}
+void sev_prep_identity_maps(void)
+{
+ /*
+ * The ConfidentialComputing blob is used very early in uncompressed
+ * kernel to find the in-memory cpuid table to handle cpuid
+ * instructions. Make sure an identity-mapping exists so it can be
+ * accessed after switchover.
+ */
+ if (sev_snp_enabled()) {
+ struct cc_blob_sev_info *cc_info =
+ (void *)(unsigned long)boot_params->cc_blob_address;
+
+ add_identity_map((unsigned long)cc_info,
+ (unsigned long)cc_info + sizeof(*cc_info));
+ add_identity_map((unsigned long)cc_info->cpuid_phys,
+ (unsigned long)cc_info->cpuid_phys + cc_info->cpuid_len);
+ }
+
+ sev_verify_cbit(top_level_pgt);
+}
+
/* Locates and clears a region for a new top level page table. */
void initialize_identity_maps(void *rmode)
{
@@ -163,8 +186,9 @@ void initialize_identity_maps(void *rmode)
cmdline = get_cmd_line_ptr();
add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
+ sev_prep_identity_maps();
+
/* Load the new page-table. */
- sev_verify_cbit(top_level_pgt);
write_cr3(top_level_pgt);
}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 9b66a8bf336e..ce1a884e8322 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -125,6 +125,7 @@ extern bool sev_es_check_ghcb_fault(unsigned long address);
void snp_set_page_private(unsigned long paddr);
void snp_set_page_shared(unsigned long paddr);
void snp_cpuid_init_boot(struct boot_params *bp);
+bool sev_snp_enabled(void);
#else
static inline void sev_es_shutdown_ghcb(void) { }
@@ -135,6 +136,7 @@ static inline bool sev_es_check_ghcb_fault(unsigned long address)
static inline void snp_set_page_private(unsigned long paddr) { }
static inline void snp_set_page_shared(unsigned long paddr) { }
static inline void snp_cpuid_init_boot(struct boot_params *bp) { }
+static inline bool sev_snp_enabled(void) { return false; }
#endif
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 9d6a2ecb609f..1b77b819ddb4 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -120,7 +120,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"
-static inline bool sev_snp_enabled(void)
+bool sev_snp_enabled(void)
{
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}
--
2.25.1
From: Michael Roth <[email protected]>
SEV-SNP guests will be provided the location of special 'secrets' and
'CPUID' pages via the Confidential Computing blob. This blob is
provided to the boot kernel either through an EFI config table entry,
or via a setup_data structure as defined by the Linux Boot Protocol.
Locate the Confidential Computing blob from these sources and, if found,
use the provided CPUID page/table address to create a copy that the
boot kernel will use when servicing CPUID instructions via a #VC
handler.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 1 +
arch/x86/boot/compressed/idt_64.c | 5 +-
arch/x86/boot/compressed/misc.h | 2 +
arch/x86/boot/compressed/sev.c | 79 ++++++++++++++++++++++++++++++
arch/x86/include/asm/sev.h | 14 ++++++
arch/x86/kernel/sev-shared.c | 78 +++++++++++++++++++++++++++++
6 files changed, 178 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 572c535cf45b..c9252f0b0e81 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -444,6 +444,7 @@ SYM_CODE_START(startup_64)
.Lon_kernel_cs:
pushq %rsi
+ movq %rsi, %rdi /* real mode address */
call load_stage1_idt
popq %rsi
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 9b93567d663a..3c0f7c8d9152 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -28,7 +28,7 @@ static void load_boot_idt(const struct desc_ptr *dtr)
}
/* Setup IDT before kernel jumping to .Lrelocated */
-void load_stage1_idt(void)
+void load_stage1_idt(void *rmode)
{
boot_idt_desc.address = (unsigned long)boot_idt;
@@ -37,6 +37,9 @@ void load_stage1_idt(void)
set_idt_entry(X86_TRAP_VC, boot_stage1_vc);
load_boot_idt(&boot_idt_desc);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ snp_cpuid_init_boot(rmode);
}
/* Setup IDT after kernel jumping to .Lrelocated */
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index d4a26f3d3580..9b66a8bf336e 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -124,6 +124,7 @@ void sev_es_shutdown_ghcb(void);
extern bool sev_es_check_ghcb_fault(unsigned long address);
void snp_set_page_private(unsigned long paddr);
void snp_set_page_shared(unsigned long paddr);
+void snp_cpuid_init_boot(struct boot_params *bp);
#else
static inline void sev_es_shutdown_ghcb(void) { }
@@ -133,6 +134,7 @@ static inline bool sev_es_check_ghcb_fault(unsigned long address)
}
static inline void snp_set_page_private(unsigned long paddr) { }
static inline void snp_set_page_shared(unsigned long paddr) { }
+static inline void snp_cpuid_init_boot(struct boot_params *bp) { }
#endif
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 11c459809d4c..60885d80bf5f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -297,3 +297,82 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
else if (result != ES_RETRY)
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
}
+
+/* Search for Confidential Computing blob in the EFI config table. */
+static struct cc_blob_sev_info *snp_find_cc_blob_efi(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info;
+ unsigned long conf_table_pa;
+ unsigned int conf_table_len;
+ bool efi_64;
+ int ret;
+
+ ret = efi_get_conf_table(bp, &conf_table_pa, &conf_table_len, &efi_64);
+ if (ret)
+ return NULL;
+
+ ret = efi_find_vendor_table(conf_table_pa, conf_table_len,
+ EFI_CC_BLOB_GUID, efi_64,
+ (unsigned long *)&cc_info);
+ if (ret)
+ return NULL;
+
+ return cc_info;
+}
+
+/*
+ * Initial set up of SEV-SNP CPUID table relies on information provided
+ * by the Confidential Computing blob, which can be passed to the boot kernel
+ * by firmware/bootloader in the following ways:
+ *
+ * - via an entry in the EFI config table
+ * - via a setup_data structure, as defined by the Linux Boot Protocol
+ *
+ * Scan for the blob in that order.
+ */
+struct cc_blob_sev_info *snp_find_cc_blob(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info;
+
+ cc_info = snp_find_cc_blob_efi(bp);
+ if (cc_info)
+ goto found_cc_info;
+
+ cc_info = snp_find_cc_blob_setup_data(bp);
+ if (!cc_info)
+ return NULL;
+
+found_cc_info:
+ if (cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
+ sev_es_terminate(0, GHCB_SNP_UNSUPPORTED);
+
+ return cc_info;
+}
+
+void snp_cpuid_init_boot(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info;
+ u32 eax;
+
+ if (!bp)
+ return;
+
+ cc_info = snp_find_cc_blob(bp);
+ if (!cc_info)
+ return;
+
+ snp_cpuid_info_create(cc_info);
+
+ /* SEV-SNP CPUID table is set up now. Do some sanity checks. */
+ if (!snp_cpuid_active())
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /* CPUID bits for SEV (bit 1) and SEV-SNP (bit 4) should be enabled. */
+ eax = native_cpuid_eax(0x8000001f);
+ if (!(eax & (BIT(4) | BIT(1))))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /* It should be safe to read SEV MSR and check features now. */
+ if (!sev_snp_enabled())
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+}
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 534fa1c4c881..7c88762cdb23 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include <asm/insn.h>
#include <asm/sev-common.h>
+#include <asm/bootparam.h>
#define GHCB_PROTOCOL_MIN 1ULL
#define GHCB_PROTOCOL_MAX 2ULL
@@ -126,6 +127,17 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
+/*
+ * TODO: These are exported only temporarily while boot/compressed/sev.c is
+ * the only user. This is to avoid unused function warnings for kernel/sev.c
+ * during the build of kernel proper.
+ *
+ * Once the code is added to consume these in kernel proper these functions
+ * can be moved back to being statically-scoped to units that pull in
+ * sev-shared.c via #include and these declarations can be dropped.
+ */
+void __init snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info);
+struct cc_blob_sev_info *snp_find_cc_blob_setup_data(struct boot_params *bp);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -141,6 +153,8 @@ static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz,
static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_wakeup_secondary_cpu(void) { }
+static inline void snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info) { }
+static inline struct cc_blob_sev_info *snp_find_cc_blob_setup_data(struct boot_params *bp) { return NULL; }
#endif
#endif
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 193ca49a1689..b321c1b7d07c 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -66,6 +66,9 @@ static u64 __ro_after_init sev_hv_features;
* and regenerate the CPUID table/pointer when .bss is cleared.
*/
+/* Copy of the SNP firmware's CPUID page. */
+static struct snp_cpuid_info cpuid_info_copy __ro_after_init;
+
/*
* The CPUID info can't always be referenced directly due to the need for
* pointer fixups during initial startup phase of kernel proper, so access must
@@ -390,6 +393,22 @@ snp_cpuid_find_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
return false;
}
+static void __init snp_cpuid_set_ranges(void)
+{
+ int i;
+
+ for (i = 0; i < cpuid_info->count; i++) {
+ const struct snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+ if (fn->eax_in == 0x0)
+ cpuid_std_range_max = fn->eax;
+ else if (fn->eax_in == 0x40000000)
+ cpuid_hyp_range_max = fn->eax;
+ else if (fn->eax_in == 0x80000000)
+ cpuid_ext_range_max = fn->eax;
+ }
+}
+
static bool snp_cpuid_check_range(u32 func)
{
if (func <= cpuid_std_range_max ||
@@ -934,3 +953,62 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
return ES_OK;
}
+
+struct cc_setup_data {
+ struct setup_data header;
+ u32 cc_blob_address;
+};
+
+static struct cc_setup_data *get_cc_setup_data(struct boot_params *bp)
+{
+ struct setup_data *hdr = (struct setup_data *)bp->hdr.setup_data;
+
+ while (hdr) {
+ if (hdr->type == SETUP_CC_BLOB)
+ return (struct cc_setup_data *)hdr;
+ hdr = (struct setup_data *)hdr->next;
+ }
+
+ return NULL;
+}
+
+/*
+ * Search for a Confidential Computing blob passed in as a setup_data entry
+ * via the Linux Boot Protocol.
+ */
+struct cc_blob_sev_info *
+snp_find_cc_blob_setup_data(struct boot_params *bp)
+{
+ struct cc_setup_data *sd;
+
+ sd = get_cc_setup_data(bp);
+ if (!sd)
+ return NULL;
+
+ return (struct cc_blob_sev_info *)(unsigned long)sd->cc_blob_address;
+}
+
+/*
+ * Initialize the kernel's copy of the SEV-SNP CPUID table, and set up the
+ * pointer that will be used to access it.
+ *
+ * Maintaining a direct mapping of the SEV-SNP CPUID table used by firmware
+ * would be possible as an alternative, but the approach is brittle since the
+ * mapping needs to be updated in sync with all the changes to virtual memory
+ * layout and related mapping facilities throughout the boot process.
+ */
+void __init snp_cpuid_info_create(const struct cc_blob_sev_info *cc_info)
+{
+ const struct snp_cpuid_info *cpuid_info_fw;
+
+ if (!cc_info || !cc_info->cpuid_phys || cc_info->cpuid_len < PAGE_SIZE)
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ cpuid_info_fw = (const struct snp_cpuid_info *)cc_info->cpuid_phys;
+ if (!cpuid_info_fw->count || cpuid_info_fw->count > SNP_CPUID_COUNT_MAX)
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ cpuid_info = &cpuid_info_copy;
+ memcpy((void *)cpuid_info, cpuid_info_fw, sizeof(*cpuid_info));
+ snp_cpuid_set_ranges();
+}
--
2.25.1
From: Michael Roth <[email protected]>
The previously defined Confidential Computing blob is provided to the
kernel via a setup_data structure or an EFI config table entry. Currently
both are checked for by the boot/compressed kernel in order to access the
CPUID table address within the blob for use with SEV-SNP CPUID enforcement.
To also enable SEV-SNP CPUID enforcement for the run-time kernel,
similar access to the CPUID table is needed early on, while the kernel
is still using the identity-mapped page table set up by boot/compressed
and global pointers need to be accessed via fixup_pointer().
This isn't much of an issue for accessing setup_data, and the EFI
config table helper code currently used in boot/compressed *could* be
used in this case as well since both rely on identity-mapping.
However, that code relies on EFI helpers/string constants that
would need to be accessed via fixup_pointer(), and fixing it up while
making it shareable between boot/compressed and the run-time kernel is
fragile and introduces a good bit of ugliness.
Instead, add a boot_params->cc_blob_address pointer that the
boot/compressed kernel can initialize so that the run-time kernel can
access the CC blob from there instead of re-scanning the EFI config
table.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/bootparam_utils.h | 1 +
arch/x86/include/uapi/asm/bootparam.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
index 981fe923a59f..53e9b0620d96 100644
--- a/arch/x86/include/asm/bootparam_utils.h
+++ b/arch/x86/include/asm/bootparam_utils.h
@@ -74,6 +74,7 @@ static void sanitize_boot_params(struct boot_params *boot_params)
BOOT_PARAM_PRESERVE(hdr),
BOOT_PARAM_PRESERVE(e820_table),
BOOT_PARAM_PRESERVE(eddbuf),
+ BOOT_PARAM_PRESERVE(cc_blob_address),
};
memset(&scratch, 0, sizeof(scratch));
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 1ac5acca72ce..bea5cdcdf532 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -188,7 +188,8 @@ struct boot_params {
__u32 ext_ramdisk_image; /* 0x0c0 */
__u32 ext_ramdisk_size; /* 0x0c4 */
__u32 ext_cmd_line_ptr; /* 0x0c8 */
- __u8 _pad4[116]; /* 0x0cc */
+ __u8 _pad4[112]; /* 0x0cc */
+ __u32 cc_blob_address; /* 0x13c */
struct edid_info edid_info; /* 0x140 */
struct efi_info efi_info; /* 0x1c0 */
__u32 alt_mem_k; /* 0x1e0 */
--
2.25.1
Hi Brijesh,
On 08/10/2021 21:04, Brijesh Singh wrote:
> SEV-SNP specification provides the guest a mechanisum to communicate with
> the PSP without risk from a malicious hypervisor who wishes to read, alter,
> drop or replay the messages sent. The driver uses snp_issue_guest_request()
> to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
> submit the request to PSP.
>
> The PSP requires that all communication should be encrypted using key
> specified through the platform_data.
>
> The userspace can use SNP_GET_REPORT ioctl() to query the guest
> attestation report.
>
> See SEV-SNP spec section Guest Messages for more details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> Documentation/virt/coco/sevguest.rst | 77 ++++
> drivers/virt/Kconfig | 3 +
> drivers/virt/Makefile | 1 +
> drivers/virt/coco/sevguest/Kconfig | 9 +
> drivers/virt/coco/sevguest/Makefile | 2 +
> drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
> drivers/virt/coco/sevguest/sevguest.h | 98 +++++
> include/uapi/linux/sev-guest.h | 44 ++
> 8 files changed, 795 insertions(+)
> create mode 100644 Documentation/virt/coco/sevguest.rst
> create mode 100644 drivers/virt/coco/sevguest/Kconfig
> create mode 100644 drivers/virt/coco/sevguest/Makefile
> create mode 100644 drivers/virt/coco/sevguest/sevguest.c
> create mode 100644 drivers/virt/coco/sevguest/sevguest.h
> create mode 100644 include/uapi/linux/sev-guest.h
>
[...]
> +
> +static u8 *get_vmpck(int id, struct snp_secrets_page_layout *layout, u32 **seqno)
> +{
> + u8 *key = NULL;
> +
> + switch (id) {
> + case 0:
> + *seqno = &layout->os_area.msg_seqno_0;
> + key = layout->vmpck0;
> + break;
> + case 1:
> + *seqno = &layout->os_area.msg_seqno_1;
> + key = layout->vmpck1;
> + break;
> + case 2:
> + *seqno = &layout->os_area.msg_seqno_2;
> + key = layout->vmpck2;
> + break;
> + case 3:
> + *seqno = &layout->os_area.msg_seqno_3;
> + key = layout->vmpck3;
> + break;
> + default:
> + break;
> + }
> +
> + return NULL;
This should be 'return key', right?
-Dov
> +}
> +
On Fri, Oct 08, 2021 at 01:04:14PM -0500, Brijesh Singh wrote:
> From: Borislav Petkov <[email protected]>
>
> Remove all the defines of masks and bit positions for the GHCB MSR
> protocol and use comments instead which correspond directly to the spec
> so that following those can be a lot easier and straightforward with the
> spec opened in parallel to the code.
>
> Aligh vertically while at it.
>
> No functional changes.
>
> Signed-off-by: Borislav Petkov <[email protected]>
When you handle someone else's patch, you need to add your SOB
underneath to state that fact. I'll add it now but don't forget the rule,
as it is important to be able to show how a patch found its way upstream.
Like you've done for the next patch. :)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Dov,
On 10/10/21 10:51 AM, Dov Murik wrote:
> Hi Brijesh,
>
> On 08/10/2021 21:04, Brijesh Singh wrote:
>> SEV-SNP specification provides the guest a mechanisum to communicate with
>> the PSP without risk from a malicious hypervisor who wishes to read, alter,
>> drop or replay the messages sent. The driver uses snp_issue_guest_request()
>> to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
>> submit the request to PSP.
>>
>> The PSP requires that all communication should be encrypted using key
>> specified through the platform_data.
>>
>> The userspace can use SNP_GET_REPORT ioctl() to query the guest
>> attestation report.
>>
>> See SEV-SNP spec section Guest Messages for more details.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> Documentation/virt/coco/sevguest.rst | 77 ++++
>> drivers/virt/Kconfig | 3 +
>> drivers/virt/Makefile | 1 +
>> drivers/virt/coco/sevguest/Kconfig | 9 +
>> drivers/virt/coco/sevguest/Makefile | 2 +
>> drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
>> drivers/virt/coco/sevguest/sevguest.h | 98 +++++
>> include/uapi/linux/sev-guest.h | 44 ++
>> 8 files changed, 795 insertions(+)
>> create mode 100644 Documentation/virt/coco/sevguest.rst
>> create mode 100644 drivers/virt/coco/sevguest/Kconfig
>> create mode 100644 drivers/virt/coco/sevguest/Makefile
>> create mode 100644 drivers/virt/coco/sevguest/sevguest.c
>> create mode 100644 drivers/virt/coco/sevguest/sevguest.h
>> create mode 100644 include/uapi/linux/sev-guest.h
>>
> [...]
>
>
>> +
>> +static u8 *get_vmpck(int id, struct snp_secrets_page_layout *layout, u32 **seqno)
>> +{
>> + u8 *key = NULL;
>> +
>> + switch (id) {
>> + case 0:
>> + *seqno = &layout->os_area.msg_seqno_0;
>> + key = layout->vmpck0;
>> + break;
>> + case 1:
>> + *seqno = &layout->os_area.msg_seqno_1;
>> + key = layout->vmpck1;
>> + break;
>> + case 2:
>> + *seqno = &layout->os_area.msg_seqno_2;
>> + key = layout->vmpck2;
>> + break;
>> + case 3:
>> + *seqno = &layout->os_area.msg_seqno_3;
>> + key = layout->vmpck3;
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + return NULL;
> This should be 'return key', right?
Yes, I did catch that during my testing, and the hunk to fix it is in
42/42. I missed merging the hunk into this patch and will take care of
it in the next rev. Thanks.
On Fri, Oct 08, 2021 at 01:04:18PM -0500, Brijesh Singh wrote:
> Version 2 of GHCB specification introduced advertisement of a features
> that are supported by the hypervisor. Add support to query the HV
> features on boot.
>
> Version 2 of GHCB specification adds several new NAEs, most of them are
> optional except the hypervisor feature. Now that hypervisor feature NAE
> is implemented, so bump the GHCB maximum support protocol version.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev-common.h | 3 +++
> arch/x86/include/asm/sev.h | 2 +-
> arch/x86/include/uapi/asm/svm.h | 2 ++
> arch/x86/kernel/sev-shared.c | 30 ++++++++++++++++++++++++++++++
> 4 files changed, 36 insertions(+), 1 deletion(-)
For the next version, when you add those variables, do this too pls:
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 8ee27d07c1cd..7a2176e0d0ad 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -21,10 +21,10 @@
*
* GHCB protocol version negotiated with the hypervisor.
*/
-static u16 __ro_after_init ghcb_version;
+static u16 ghcb_version __ro_after_init;
/* Bitmap of SEV features supported by the hypervisor */
-static u64 __ro_after_init sev_hv_features;
+static u64 sev_hv_features __ro_after_init;
static bool __init sev_es_check_cpu_features(void)
{
I didn't realize this earlier but we put that annotation at the end.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 08, 2021 at 01:04:19PM -0500, Brijesh Singh wrote:
> From: Michael Roth <[email protected]>
>
> Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID
> leaf indicates SEV support. With SEV-SNP, CPUID responses from the
> hypervisor are not considered trustworthy, particularly for 0x8000001F.
> SEV-SNP provides a firmware-validated CPUID table to use as an
> alternative, but prior to checking MSR_AMD64_SEV there are no
> guarantees that this is even an SEV-SNP guest.
>
> Rather than relying on these CPUID values early on, allow SEV-ES and
> SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and
> have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe
> to access MSR_AMD64_SEV if a #VC has triggered.
>
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev-shared.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 8ee27d07c1cd..2796c524d174 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> if (exit_code != SVM_EXIT_CPUID)
> goto fail;
>
> + /*
> + * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV
> + * MSR is also available. Go ahead and initialize sev_status here to
> + * allow SEV features to be checked without relying solely on the SEV
> + * cpuid bit to indicate whether it is safe to do so.
> + */
> + if (!sev_status) {
> + unsigned long lo, hi;
> +
> + asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
> + : "c" (MSR_AMD64_SEV));
> + sev_status = (hi << 32) | lo;
> + }
> +
> sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
> VMGEXIT();
> val = sev_es_rd_ghcb_msr();
> --
Ok, you guys are killing me. ;-\
How is bolting some pretty much unrelated code into the early #VC
handler not a hack? Do you not see it?
So sme_enable() is reading MSR_AMD64_SEV and setting up everything
there, including sev_status. If a SNP guest does not trust CPUID, why
can't you attempt to read that MSR there, even if CPUID has lied to the
guest?
And not just slap it somewhere just because it works?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Oct 18, 2021 at 04:29:07PM +0200, Borislav Petkov wrote:
> On Fri, Oct 08, 2021 at 01:04:19PM -0500, Brijesh Singh wrote:
> > From: Michael Roth <[email protected]>
> >
> > Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID
> > leaf indicates SEV support. With SEV-SNP, CPUID responses from the
> > hypervisor are not considered trustworthy, particularly for 0x8000001F.
> > SEV-SNP provides a firmware-validated CPUID table to use as an
> > alternative, but prior to checking MSR_AMD64_SEV there are no
> > guarantees that this is even an SEV-SNP guest.
> >
> > Rather than relying on these CPUID values early on, allow SEV-ES and
> > SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and
> > have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe
> > to access MSR_AMD64_SEV if a #VC has triggered.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/kernel/sev-shared.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> > index 8ee27d07c1cd..2796c524d174 100644
> > --- a/arch/x86/kernel/sev-shared.c
> > +++ b/arch/x86/kernel/sev-shared.c
> > @@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> > if (exit_code != SVM_EXIT_CPUID)
> > goto fail;
> >
> > + /*
> > + * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV
> > + * MSR is also available. Go ahead and initialize sev_status here to
> > + * allow SEV features to be checked without relying solely on the SEV
> > + * cpuid bit to indicate whether it is safe to do so.
> > + */
> > + if (!sev_status) {
> > + unsigned long lo, hi;
> > +
> > + asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
> > + : "c" (MSR_AMD64_SEV));
> > + sev_status = (hi << 32) | lo;
> > + }
> > +
> > sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
> > VMGEXIT();
> > val = sev_es_rd_ghcb_msr();
> > --
>
> Ok, you guys are killing me. ;-\
>
> How is bolting some pretty much unrelated code into the early #VC
> handler not a hack? Do you not see it?
This was the result of my proposal in v5:
> More specifically, the general protocol to determine SNP is enabled
> seems
> to be:
>
> 1) check cpuid 0x8000001f to determine if SEV bit is enabled and SEV
> MSR is available
> 2) check the SEV MSR to see if SEV-SNP bit is set
>
> but the conundrum here is the CPUID page is only valid if SNP is
> enabled, otherwise it can be garbage. So the code to set up the page
> skips those checks initially, and relies on the expectation that UEFI,
> or whatever the initial guest blob was, will only provide a CC_BLOB if
> it already determined SNP is enabled.
>
> It's still possible something goes awry and the kernel gets handed a
> bogus CC_BLOB even though SNP isn't actually enabled. In this case the
> cpuid values could be bogus as well, but the guest will fail
> attestation then and no secrets should be exposed.
>
> There is one thing that could tighten up the check a bit though. Some
> bits of SEV-ES code will use the generation of a #VC as an indicator
> of SEV-ES support, which implies SEV MSR is available without relying
> on hypervisor-provided CPUID bits. I could add a one-time check in
> the cpuid #VC to check SEV MSR for SNP bit, but it would likely
> involve another static __ro_after_init variable to store state. If that
> seems worthwhile I can look into that more as well.
Yes, the skipping of checks above sounds weird: why don't you simply
keep the checks order: SEV, -ES, -SNP and then parse CPUID. It'll fail
at attestation eventually, but you'll have the usual flow like with the
rest of the SEV- feature picking apart.
https://lore.kernel.org/lkml/[email protected]/
I'd thought you didn't like the previous approach of having snp_cpuid_init()
defer the CPUID/MSR checks until sme_enable() sets up sev_status later on,
then failing the boot retroactively if SNP bit isn't set but CPUID table
was advertised. So I added those checks in snp_cpuid_init(), along with the
additional #VC-based indicator of SEV-ES/SEV-SNP support as an additional
sanity check of what EFI firmware was providing, since I thought that was
the key concern here.
Now I'm realizing that perhaps your suggestion was to actually defer the
entire CPUID page setup until after sme_enable(). Is that correct?
>
> So sme_enable() is reading MSR_AMD64_SEV and setting up everything
> there, including sev_status. If a SNP guest does not trust CPUID, why
> can't you attempt to read that MSR there, even if CPUID has lied to the
> guest?
If CPUID has lied, that would result in a #GP, rather than a controlled
termination in the various checkers/callers. The latter is easier to
debug.
Additionally, #VC is arguably a better indicator of SEV MSR availability
for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware
and doesn't rely directly on hypervisor/EFI-provided CPUID values. It
doesn't work for SEV guests, but I don't think it's a bad idea to allow
SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make
use of the added assurance.
Is it just the way it's currently implemented as something
cpuid-table-specific that's at issue, or are you opposed to doing so in
general?
Thanks,
Mike
>
> And not just slap it somewhere just because it works?
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Oct 18, 2021 at 01:40:03PM -0500, Michael Roth wrote:
> If CPUID has lied, that would result in a #GP, rather than a controlled
> termination in the various checkers/callers. The latter is easier to
> debug.
>
> Additionally, #VC is arguably a better indicator of SEV MSR availability
> for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware
> and doesn't rely directly on hypervisor/EFI-provided CPUID values. It
> doesn't work for SEV guests, but I don't think it's a bad idea to allow
> SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make
> use of the added assurance.
Ok, let's take a step back and analyze what we're trying to solve first.
So I'm looking at sme_enable():
1. Code checks SME/SEV support leaf. HV lies and says there's none. So
guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
won't be able to give confidentiality to its users => users go away or
do unencrypted like now.
Problem is solved by political and economical pressure.
2. Check SEV and SME bit. HV lies here. Oh well, same as the above.
3. HV lies about 1. and 2. but says that SME/SEV is supported.
> Guest attempts to read the MSR. Guest explodes due to the #GP. The same
political/economical pressure thing happens.
If the MSR is really there, we've landed at the place where we read the
SEV MSR. Moment of truth - SEV/SNP guests have a communication protocol
which is independent from the HV and all good.
Now, which case am I missing here which justifies the need to do those
acrobatics of causing #VCs just to detect the SEV MSR?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 08, 2021 at 01:04:20PM -0500, Brijesh Singh wrote:
> +static bool do_early_sev_setup(void)
> {
> if (!sev_es_negotiate_protocol())
> sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_PROT_UNSUPPORTED);
>
> + /*
> + * If SEV-SNP is enabled, then check if the hypervisor supports the SEV-SNP
> + * features.
This and the other comment should say something along the lines of:
"SNP is supported in v2 of the GHCB spec which mandates support for HV
features."
because it wasn't clear to me why we're enforcing that support here.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Oct 18, 2021 at 09:18:13PM +0200, Borislav Petkov wrote:
> On Mon, Oct 18, 2021 at 01:40:03PM -0500, Michael Roth wrote:
> > If CPUID has lied, that would result in a #GP, rather than a controlled
> > termination in the various checkers/callers. The latter is easier to
> > debug.
> >
> > Additionally, #VC is arguably a better indicator of SEV MSR availability
> > for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware
> > and doesn't rely directly on hypervisor/EFI-provided CPUID values. It
> > doesn't work for SEV guests, but I don't think it's a bad idea to allow
> > SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make
> > use of the added assurance.
>
[Sorry for the wall of text, just trying to work through everything.]
> Ok, let's take a step back and analyze what we're trying to solve first.
> So I'm looking at sme_enable():
I'm not sure if this is pertaining to using the CPUID table prior to
sme_enable(), or just the #VC-based SEV MSR read. The following comments
assume the former. If that assumption is wrong you can basically ignore
the rest of this email :)
[The #VC-based SEV MSR read is not necessary for anything in sme_enable(),
it's simply a way to determine whether the guest is an SNP guest, without
any reliance on CPUID, which seemed useful in the context of doing some
additional sanity checks against the SNP CPUID table and determining that
it's appropriate to use it early on (rather than just trust that this is an
SNP guest by virtue of the CC blob being present, and then failing later
once sme_enable() checks for the SNP feature bits through the normal
mechanism, as was done in v5).]
>
> 1. Code checks SME/SEV support leaf. HV lies and says there's none. So
> guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
> won't be able to give confidentiality to its users => users go away or
> do unencrypted like now.
>
> Problem is solved by political and economical pressure.
>
> 2. Check SEV and SME bit. HV lies here. Oh well, same as the above.
I'd be worried about the possibility that, through some additional exploits
or failures in the attestation flow, a guest owner was tricked into booting
unencrypted on a compromised host and exposing their secrets. Their
attestation process might even do some additional CPUID sanity checks, which
would at that point be via the SNP CPUID table and look legitimate, unaware
that the kernel didn't actually use the SNP CPUID table until after
0x8000001F was parsed (if we were to only initialize it after/as-part-of
sme_enable()).
Fortunately in this scenario I think the guest kernel actually would fail to
boot due to the SNP hardware unconditionally treating code/page tables as
encrypted pages. I tested some of these scenarios just to check, but not
all, and I still don't feel confident enough about it to say that there's
not some way to exploit this by someone who is more clever/persistent than
me.
>
> 3. HV lies about 1. and 2. but says that SME/SEV is supported.
>
> Guest attempts to read the MSR Guest explodes due to the #GP. The same
> political/economical pressure thing happens.
That seems likely, but maybe some future hardware bug, or some other
exploit, makes it possible to intercept that MSR read? I don't know, but
if that particular branch of execution can be made less likely by utilizing
SNP CPUID validation I think it makes sense to make use of it.
>
> If the MSR is really there, we've landed at the place where we read the
> SEV MSR. Moment of truth - SEV/SNP guests have a communication protocol
> which is independent from the HV and all good.
At which point we then switch to using the CPUID table? But at that
point all the previous CPUID checks, both SEV-related/non-SEV-related,
are now possibly not consistent with what's in the CPUID table. Do we
then revalidate? Even a non-malicious hypervisor might provide
inconsistent values between the two sources due to bugs, or SNP
validation suppressing certain feature bits that hypervisor otherwise
exposes, etc. Now all the code after sme_enable() can potentially take
unexpected execution paths, where post-sme_enable() code makes
assumptions about pre-sme_enable() checks that may no longer hold true.
Also, it would be useful from an attestation perspective that the CPUID
bits visible to userspace correspond to what the kernel used during boot,
which wouldn't necessarily be the case if hypervisor-provided values were
used during early boot and potentially put the kernel into some unexpected
state that could persist beyond the point of attestation.
Code-wise, thanks in large part to your suggestions, it really isn't all
that much more complicated to hook in the CPUID table lookup in the #VC
handlers (which are already needed anyway for SEV-ES) early on so all
these checks are against the same trusted (or more-trusted at least)
CPUID source.
>
> Now, which case am I missing here which justifies the need to do those
> acrobatics of causing #VCs just to detect the SEV MSR?
There are a few more places where cpuid is utilized prior to
sme_enable():
# In boot/compressed
paging_prepare():
ecx = cpuid(7, 0) # SNP-verified against host values
# check ecx for LA57
# In boot/compressed and kernel proper
verify_cpu():
eax, ebx, ecx, edx = cpuid(0, 0) # SNP-verified against host values
# check eax for range > 0
# check ebx, ecx, edx for "AuthenticAMD" or "GenuineIntel"
if_amd:
edx = cpuid(1, 0) # SNP-verified against host values
# check edx feature bits against REQUIRED_MASK0 (PAE|FPU|PSE|etc.)
eax = cpuid(0x80000001, 0) # SNP-verified against host values
# check eax against REQUIRED_MASK1 (LM|3DNOW)
edx = cpuid(1, 0) # SNP-verified against host values
# check eax against SSE_MASK
# if not set, try to force it on via MSR_K7_HWCR if this is an AMD CPU
# if forcing fails, report no_longmode available
if_intel:
# completely different stuff
It's possible that various lies about the values checked for in
REQUIRED_MASK0/REQUIRED_MASK1, LA57 enablement, etc., can be audited in
similar fashion as you've done above to find nothing concerning, but
what about 5 years from now? And these are all checks/configuration that
can put the kernel in unexpected states that persist beyond the point of
attestation, where we really need to care about the possible effects. If
SNP CPUID validation isn't utilized until after-the-fact, we'd end up
not utilizing it for some of the more 'interesting' CPUID bits.
It's also worth noting that TDX guards against most of this through
CPUID virtualization, where hardware/microcode provides similar
validation for these sorts of CPUID bits in early boot. It's only because
the SEV-SNP CPUID 'virtualization' lives in the guest code that we have to
deal with the additional complexity of initializing the CPUID table early
on. But if both platforms are capable of providing similar assurances then
it seems worthwhile to pursue that.
> acrobatics of causing #VCs just to detect the SEV MSR?
The CPUID calls in snp_cpuid_init() weren't added specifically to induce
the #VC-based SEV MSR read, they were added only because I thought the
gist of your earlier suggestions were to do more validation against the
CPUID table advertised by EFI rather than the v5 approach of deferring
them till later after sev_status gets set by sme_enable(), and then
failing later if turned out that SNP CPUID feature bit wasn't set by
sme_enable(). I thought this was motivated by a desire to be more
paranoid about what EFI provides, so once I had the cpuid checks added
in snp_cpuid_init() it seemed like a logical step to further
sanity-check the SNP CPUID bit using a mechanism that was completely
independent of the CPUID table.
I'm not dead set on that at all, however; it was only added based on me
[mis-]interpreting your comments as a desire to be less trusting of EFI
as to whether this is an SNP guest or not. But I also don't think it's
a good idea to not utilize the CPUID table until after sme_enable(),
based on the above reasons.
What if we simply add a check in sme_enable() that terminates the guest
if the cc_blob/cpuid table is provided, but the CPUID/MSR checks in
sme_enable() determine that this isn't an SNP guest? That would be similar
to the v5 approach, but in a less roundabout way, and then the cpuid/MSR
checks could be dropped from snp_cpuid_init().
If we did decide that it is useful to use #VC-based initialization of
sev_status there, it would all be self-contained there (but again, that's
a separate thing that I don't have a strong opinion on).
Thanks,
Mike
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> [Sorry for the wall of text, just trying to work through everything.]
And I'm going to respond in a couple of mails just for my own sanity.
> I'm not sure if this is pertaining to using the CPUID table prior to
> sme_enable(), or just the #VC-based SEV MSR read. The following comments
> assume the former. If that assumption is wrong you can basically ignore
> the rest of this email :)
This is pertaining to me wanting to show you that the design of this SNP
support needs to be sane and maintainable and every function needs to
make sense not only now but in the future.
In this particular example, we should set sev_status *once*, *before*
anything accesses it so that it is prepared when something needs it. Not
do a #VC and go, "oh, btw, is sev_status set? No? Ok, lemme set it."
which basically means our design is seriously lacking.
And I had suggested a similar thing for TDX and tglx was 100% right in
shooting it down because we do properly designed things - not, get stuff
in so that vendor is happy and then, once the vendor programmers have
disappeared to do their next enablement task, the maintainers get to mop
up and maintain it forever.
Because this mopping up doesn't scale - trust me.
> [The #VC-based SEV MSR read is not necessary for anything in sme_enable(),
> it's simply a way to determine whether the guest is an SNP guest, without
> any reliance on CPUID, which seemed useful in the context of doing some
> additional sanity checks against the SNP CPUID table and determining that
> it's appropriate to use it early on (rather than just trust that this is an
> SNP guest by virtue of the CC blob being present, and then failing later
> once sme_enable() checks for the SNP feature bits through the normal
> mechanism, as was done in v5).]
So you need to make up your mind here design-wise, what you wanna do.
The proper thing to do would be, to detect *everything*, detect whether
this is an SNP guest, yadda yadda, everything your code is going to need
later on, and then be done with it.
Then you continue with the boot and now your other code queries
everything that has been detected up til now and uses it.
End of mail 1.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > 1. Code checks SME/SEV support leaf. HV lies and says there's none. So
> > guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
> > won't be able to give confidentiality to its users => users go away or
> > do unencrypted like now.
> >
> > Problem is solved by political and economical pressure.
> >
> > 2. Check SEV and SME bit. HV lies here. Oh well, same as the above.
>
> I'd be worried about the possibility that, through some additional exploits
> or failures in the attestation flow,
Well, that puts forward an important question: how do you verify
*reliably* that this is an SNP guest?
- attestation?
- CPUID?
- anything else?
I don't see this written down anywhere. Because this assumption will
guide the design in the kernel.
> a guest owner was tricked into booting unencrypted on a compromised
> host and exposing their secrets. Their attestation process might even
> do some additional CPUID sanity checks, which would at that point
> be via the SNP CPUID table and look legitimate, unaware that the
> kernel didn't actually use the SNP CPUID table until after 0x8000001F
> was parsed (if we were to only initialize it after/as-part-of
> sme_enable()).
So what happens with that guest owner later?
How is she to notice that she booted unencrypted?
> Fortunately in this scenario I think the guest kernel actually would fail to
> boot due to the SNP hardware unconditionally treating code/page tables as
> encrypted pages. I tested some of these scenarios just to check, but not
> all, and I still don't feel confident enough about it to say that there's
> not some way to exploit this by someone who is more clever/persistent than
> me.
All this design needs to be preceded with: "We protect against cases A,
B and C and not against D, E, etc."
So that it is clear to all parties involved what we're working with and
what we're protecting against and what we're *not* protecting against.
End of mail 2, more later.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 8, 2021 at 12:06 PM Brijesh Singh <[email protected]> wrote:
>
> SEV-SNP specification provides the guest a mechanism to communicate with
> the PSP without risk from a malicious hypervisor who wishes to read, alter,
> drop or replay the messages sent. The driver uses snp_issue_guest_request()
> to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
> submit the request to PSP.
>
> The PSP requires that all communication should be encrypted using key
> specified through the platform_data.
>
> The userspace can use SNP_GET_REPORT ioctl() to query the guest
> attestation report.
>
> See SEV-SNP spec section Guest Messages for more details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> Documentation/virt/coco/sevguest.rst | 77 ++++
> drivers/virt/Kconfig | 3 +
> drivers/virt/Makefile | 1 +
> drivers/virt/coco/sevguest/Kconfig | 9 +
> drivers/virt/coco/sevguest/Makefile | 2 +
> drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
> drivers/virt/coco/sevguest/sevguest.h | 98 +++++
> include/uapi/linux/sev-guest.h | 44 ++
> 8 files changed, 795 insertions(+)
> create mode 100644 Documentation/virt/coco/sevguest.rst
> create mode 100644 drivers/virt/coco/sevguest/Kconfig
> create mode 100644 drivers/virt/coco/sevguest/Makefile
> create mode 100644 drivers/virt/coco/sevguest/sevguest.c
> create mode 100644 drivers/virt/coco/sevguest/sevguest.h
> create mode 100644 include/uapi/linux/sev-guest.h
>
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> new file mode 100644
> index 000000000000..002c90946b8a
> --- /dev/null
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -0,0 +1,77 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===================================================================
> +The Definitive SEV Guest API Documentation
> +===================================================================
> +
> +1. General description
> +======================
> +
> +The SEV API is a set of ioctls that are used by the guest or hypervisor
> +to get or set certain aspects of the SEV virtual machine. The ioctls belong
> +to the following classes:
> +
> + - Hypervisor ioctls: These query and set global attributes which affect the
> + whole SEV firmware. These ioctls are used by platform provisioning tools.
> +
> + - Guest ioctls: These query and set attributes of the SEV virtual machine.
> +
> +2. API description
> +==================
> +
> +This section describes ioctls that can be used to query or set SEV guests.
> +For each ioctl, the following information is provided along with a
> +description:
> +
> + Technology:
> + which SEV technology provides this ioctl: sev, sev-es, sev-snp, or all.
> +
> + Type:
> + hypervisor or guest. The ioctl can be used inside the guest or the
> + hypervisor.
> +
> + Parameters:
> + what parameters are accepted by the ioctl.
> +
> + Returns:
> + the return value. General error numbers (ENOMEM, EINVAL)
> + are not detailed, but errors with specific meanings are.
> +
> +The guest ioctl should be issued on a file descriptor of the /dev/sev-guest device.
> +The ioctl accepts struct snp_guest_request_ioctl. The input and output structures are
> +specified through the req_data and resp_data fields, respectively. If the ioctl fails
> +to execute due to a firmware error, then the fw_err code will be set.
> +
> +::
> + struct snp_guest_request_ioctl {
> + /* Request and response structure address */
> + __u64 req_data;
> + __u64 resp_data;
> +
> + /* firmware error code on failure (see psp-sev.h) */
> + __u64 fw_err;
> + };
> +
> +2.1 SNP_GET_REPORT
> +------------------
> +
> +:Technology: sev-snp
> +:Type: guest ioctl
> +:Parameters (in): struct snp_report_req
> +:Returns (out): struct snp_report_resp on success, -negative on error
> +
> +The SNP_GET_REPORT ioctl can be used to query the attestation report from the
> +SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
> +provided by the SEV-SNP firmware to query the attestation report.
> +
> +On success, snp_report_resp.data will contain the report. The report
> +uses the format described in the SEV-SNP specification. See the SEV-SNP
> +specification for further details.
> +
> +
> +Reference
> +---------
> +
> +SEV-SNP and GHCB specification: developer.amd.com/sev
> +
> +The driver is based on SEV-SNP firmware spec 0.9 and GHCB spec version 2.0.
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 8061e8ef449f..e457e47610d3 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
> source "drivers/virt/nitro_enclaves/Kconfig"
>
> source "drivers/virt/acrn/Kconfig"
> +
> +source "drivers/virt/coco/sevguest/Kconfig"
> +
> endif
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index 3e272ea60cd9..9c704a6fdcda 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -8,3 +8,4 @@ obj-y += vboxguest/
>
> obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
> obj-$(CONFIG_ACRN_HSM) += acrn/
> +obj-$(CONFIG_SEV_GUEST) += coco/sevguest/
> diff --git a/drivers/virt/coco/sevguest/Kconfig b/drivers/virt/coco/sevguest/Kconfig
> new file mode 100644
> index 000000000000..96190919cca8
> --- /dev/null
> +++ b/drivers/virt/coco/sevguest/Kconfig
> @@ -0,0 +1,9 @@
> +config SEV_GUEST
> + tristate "AMD SEV Guest driver"
> + default y
> + depends on AMD_MEM_ENCRYPT && CRYPTO_AEAD2
> + help
> + The driver can be used by the SEV-SNP guest to communicate with the PSP to
> + request the attestation report and more.
> +
> + If you choose 'M' here, this module will be called sevguest.
> diff --git a/drivers/virt/coco/sevguest/Makefile b/drivers/virt/coco/sevguest/Makefile
> new file mode 100644
> index 000000000000..b1ffb2b4177b
> --- /dev/null
> +++ b/drivers/virt/coco/sevguest/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_SEV_GUEST) += sevguest.o
> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
> new file mode 100644
> index 000000000000..2d313fb2ffae
> --- /dev/null
> +++ b/drivers/virt/coco/sevguest/sevguest.c
> @@ -0,0 +1,561 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AMD Secure Encrypted Virtualization Nested Paging (SEV-SNP) guest request interface
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <[email protected]>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/mutex.h>
> +#include <linux/io.h>
> +#include <linux/platform_device.h>
> +#include <linux/miscdevice.h>
> +#include <linux/set_memory.h>
> +#include <linux/fs.h>
> +#include <crypto/aead.h>
> +#include <linux/scatterlist.h>
> +#include <linux/psp-sev.h>
> +#include <uapi/linux/sev-guest.h>
> +#include <uapi/linux/psp-sev.h>
> +
> +#include <asm/svm.h>
> +#include <asm/sev.h>
> +
> +#include "sevguest.h"
> +
> +#define DEVICE_NAME "sev-guest"
> +#define AAD_LEN 48
> +#define MSG_HDR_VER 1
> +
> +struct snp_guest_crypto {
> + struct crypto_aead *tfm;
> + u8 *iv, *authtag;
> + int iv_len, a_len;
> +};
> +
> +struct snp_guest_dev {
> + struct device *dev;
> + struct miscdevice misc;
> +
> + struct snp_guest_crypto *crypto;
> + struct snp_guest_msg *request, *response;
> + struct snp_secrets_page_layout *layout;
> + struct snp_req_data input;
> + u32 *os_area_msg_seqno;
> +};
> +
> +static u32 vmpck_id;
> +module_param(vmpck_id, uint, 0444);
> +MODULE_PARM_DESC(vmpck_id, "The VMPCK ID to use when communicating with the PSP.");
> +
> +static DEFINE_MUTEX(snp_cmd_mutex);
> +
> +static inline u64 __snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
> +{
> + u64 count;
> +
> > + /* Read the current message sequence counter from the secrets page */
> + count = *snp_dev->os_area_msg_seqno;
> +
> + return count + 1;
> +}
> +
> > +/* Return a non-zero sequence number on success */
> +static u64 snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
> +{
> + u64 count = __snp_get_msg_seqno(snp_dev);
> +
> + /*
> + * The message sequence counter for the SNP guest request is a 64-bit
> > + * value, but version 2 of the GHCB specification defines 32-bit storage
> > + * for it. If the counter exceeds the 32-bit value, return zero. The
> > + * caller should check the return value, but even if the caller happens
> > + * to not check it and uses the zero, the firmware treats zero as an
> > + * invalid sequence number and will fail the message request.
> + */
> + if (count >= UINT_MAX) {
> + pr_err_ratelimited("SNP guest request message sequence counter overflow\n");
> + return 0;
> + }
> +
> + return count;
> +}
> +
> +static void snp_inc_msg_seqno(struct snp_guest_dev *snp_dev)
> +{
> + /*
> + * The counter is also incremented by the PSP, so increment it by 2
> > + * and save it in the secrets page.
> + */
> + *snp_dev->os_area_msg_seqno += 2;
> +}
> +
> +static inline struct snp_guest_dev *to_snp_dev(struct file *file)
> +{
> + struct miscdevice *dev = file->private_data;
> +
> + return container_of(dev, struct snp_guest_dev, misc);
> +}
> +
> +static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, u8 *key, size_t keylen)
> +{
> + struct snp_guest_crypto *crypto;
> +
> + crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
> + if (!crypto)
> + return NULL;
> +
> + crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
> + if (IS_ERR(crypto->tfm))
> + goto e_free;
> +
> + if (crypto_aead_setkey(crypto->tfm, key, keylen))
> + goto e_free_crypto;
> +
> + crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
> + if (crypto->iv_len < 12) {
> + dev_err(snp_dev->dev, "IV length is less than 12.\n");
> + goto e_free_crypto;
> + }
> +
> + crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
> + if (!crypto->iv)
> + goto e_free_crypto;
> +
> + if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
> + if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
> + dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
> + goto e_free_crypto;
> + }
> + }
> +
> + crypto->a_len = crypto_aead_authsize(crypto->tfm);
> + crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
> + if (!crypto->authtag)
> + goto e_free_crypto;
> +
> + return crypto;
> +
> +e_free_crypto:
> + crypto_free_aead(crypto->tfm);
> +e_free:
> + kfree(crypto->iv);
> + kfree(crypto->authtag);
> + kfree(crypto);
> +
> + return NULL;
> +}
> +
> +static void deinit_crypto(struct snp_guest_crypto *crypto)
> +{
> + crypto_free_aead(crypto->tfm);
> + kfree(crypto->iv);
> + kfree(crypto->authtag);
> + kfree(crypto);
> +}
> +
> +static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
> + u8 *src_buf, u8 *dst_buf, size_t len, bool enc)
> +{
> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> + struct scatterlist src[3], dst[3];
> + DECLARE_CRYPTO_WAIT(wait);
> + struct aead_request *req;
> + int ret;
> +
> + req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
> + if (!req)
> + return -ENOMEM;
> +
> + /*
> + * AEAD memory operations:
> + * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
> + * | msg header | plaintext | hdr->authtag |
> + * | bytes 30h - 5Fh | or | |
> + * | | cipher | |
> + * +------------------+------------------+----------------+
> + */
> + sg_init_table(src, 3);
> + sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
> + sg_set_buf(&src[1], src_buf, hdr->msg_sz);
> + sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
> +
> + sg_init_table(dst, 3);
> + sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
> + sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
> + sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
> +
> + aead_request_set_ad(req, AAD_LEN);
> + aead_request_set_tfm(req, crypto->tfm);
> + aead_request_set_callback(req, 0, crypto_req_done, &wait);
> +
> + aead_request_set_crypt(req, src, dst, len, crypto->iv);
> + ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
> +
> + aead_request_free(req);
> + return ret;
> +}
> +
> +static int __enc_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
> + void *plaintext, size_t len)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> +
> + memset(crypto->iv, 0, crypto->iv_len);
> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
> +
> + return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
> +}
> +
> +static int dec_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
> + void *plaintext, size_t len)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> +
> + /* Build IV with response buffer sequence number */
> + memset(crypto->iv, 0, crypto->iv_len);
> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
> +
> + return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
> +}
> +
> +static int verify_and_dec_payload(struct snp_guest_dev *snp_dev, void *payload, u32 sz)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_guest_msg *resp = snp_dev->response;
> + struct snp_guest_msg *req = snp_dev->request;
> + struct snp_guest_msg_hdr *req_hdr = &req->hdr;
> + struct snp_guest_msg_hdr *resp_hdr = &resp->hdr;
> +
> + dev_dbg(snp_dev->dev, "response [seqno %lld type %d version %d sz %d]\n",
> + resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
> +
> + /* Verify that the sequence counter is incremented by 1 */
> + if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
> + return -EBADMSG;
> +
> + /* Verify response message type and version number. */
> + if (resp_hdr->msg_type != (req_hdr->msg_type + 1) ||
> + resp_hdr->msg_version != req_hdr->msg_version)
> + return -EBADMSG;
> +
> + /*
> + * If the message size is greater than our buffer length then return
> + * an error.
> + */
> + if (unlikely((resp_hdr->msg_sz + crypto->a_len) > sz))
> + return -EBADMSG;
> +
> + return dec_payload(snp_dev, resp, payload, resp_hdr->msg_sz + crypto->a_len);
> +}
> +
> +static int enc_payload(struct snp_guest_dev *snp_dev, u64 seqno, int version, u8 type,
> + void *payload, size_t sz)
> +{
> + struct snp_guest_msg *req = snp_dev->request;
> + struct snp_guest_msg_hdr *hdr = &req->hdr;
> +
> + memset(req, 0, sizeof(*req));
> +
> + hdr->algo = SNP_AEAD_AES_256_GCM;
> + hdr->hdr_version = MSG_HDR_VER;
> + hdr->hdr_sz = sizeof(*hdr);
> + hdr->msg_type = type;
> + hdr->msg_version = version;
> + hdr->msg_seqno = seqno;
> + hdr->msg_vmpck = vmpck_id;
> + hdr->msg_sz = sz;
> +
> + /* Verify the sequence number is non-zero */
> + if (!hdr->msg_seqno)
> + return -ENOSR;
> +
> + dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
> + hdr->msg_seqno, hdr->msg_type, hdr->msg_version, hdr->msg_sz);
> +
> + return __enc_payload(snp_dev, req, payload, sz);
> +}
> +
> +static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, int msg_ver,
> + u8 type, void *req_buf, size_t req_sz, void *resp_buf,
> + u32 resp_sz, __u64 *fw_err)
> +{
> + unsigned long err;
> + u64 seqno;
> + int rc;
> +
> +	/* Get the message sequence counter and verify that it's non-zero */
> + seqno = snp_get_msg_seqno(snp_dev);
> + if (!seqno)
> + return -EIO;
> +
> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
> +
> + /* Encrypt the userspace provided payload */
> + rc = enc_payload(snp_dev, seqno, msg_ver, type, req_buf, req_sz);
> + if (rc)
> + return rc;
> +
> + /* Call firmware to process the request */
> + rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
> + if (fw_err)
> + *fw_err = err;
> +
> + if (rc)
> + return rc;
> +
> + rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
> + if (rc)
> + return rc;
> +
> + /* Increment to new message sequence after the command is successful. */
> + snp_inc_msg_seqno(snp_dev);
Thanks for updating this sequence number logic, but I still have some
concerns. In verify_and_dec_payload() we check the encryption header, but
all of these fields are accessible to the hypervisor, meaning it can
change the header and cause this sequence number to not get incremented.
We would then reuse the sequence number for the next command, which isn't
great for AES-GCM. It seems very hard to tell whether the FW actually got
our request and created a response, thereby incrementing the sequence
number by 2, or whether the hypervisor is acting in bad faith. It seems
like, to be safe, we need to completely stop using this VMPCK if we
cannot confirm that the PSP has received our request and created a
response. Thoughts?
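To make the concern concrete, here is a minimal userspace sketch (not the driver code; all names are hypothetical) of the "retire the VMPCK" behavior being suggested: once a response fails verification, the key is wiped and never used again, so a possibly-reused IV can never be fed to AES-GCM under that key:

```c
#include <string.h>

#define VMPCK_KEY_LEN 32

struct vmpck_state {
	unsigned char key[VMPCK_KEY_LEN];
	int disabled;		/* set once any response fails verification */
};

/* Wipe the key so no further AES-GCM operation can reuse an IV under it. */
static void vmpck_disable(struct vmpck_state *s)
{
	memset(s->key, 0, sizeof(s->key));
	s->disabled = 1;
}

/*
 * Returns 0 when the response verified cleanly. On any unverifiable
 * response we cannot tell whether the PSP consumed our request (and thus
 * advanced the sequence number), so the only safe option is to retire
 * the key. Returns -1 once the key has been retired.
 */
static int vmpck_handle_response(struct vmpck_state *s, int verified)
{
	if (s->disabled)
		return -1;	/* key already retired, refuse the request */
	if (!verified) {
		vmpck_disable(s);
		return -1;
	}
	return 0;
}
```

The point being that the "disabled" state is sticky: one unverifiable response poisons the key permanently rather than risking nonce reuse on the next command.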
> +
> + return 0;
> +}
> +
> +static int get_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_report_resp *resp;
> + struct snp_report_req req;
> + int rc, resp_len;
> +
> + if (!arg->req_data || !arg->resp_data)
> + return -EINVAL;
> +
> + /* Copy the request payload from userspace */
> + if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
> + return -EFAULT;
> +
> + /* Message version must be non-zero */
> + if (!req.msg_version)
> + return -EINVAL;
> +
> + /*
> + * The intermediate response buffer is used while decrypting the
> + * response payload. Make sure that it has enough space to cover the
> + * authtag.
> + */
> + resp_len = sizeof(resp->data) + crypto->a_len;
> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
> + if (!resp)
> + return -ENOMEM;
> +
> + /* Issue the command to get the attestation report */
> + rc = handle_guest_request(snp_dev, SVM_VMGEXIT_GUEST_REQUEST, req.msg_version,
> + SNP_MSG_REPORT_REQ, &req.user_data, sizeof(req.user_data),
> + resp->data, resp_len, &arg->fw_err);
> + if (rc)
> + goto e_free;
> +
> + /* Copy the response payload to userspace */
> + if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
> + rc = -EFAULT;
> +
> +e_free:
> + kfree(resp);
> + return rc;
> +}
> +
> +static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> +{
> + struct snp_guest_dev *snp_dev = to_snp_dev(file);
> + void __user *argp = (void __user *)arg;
> + struct snp_guest_request_ioctl input;
> + int ret = -ENOTTY;
> +
> + if (copy_from_user(&input, argp, sizeof(input)))
> + return -EFAULT;
> +
> + input.fw_err = 0;
> +
> + mutex_lock(&snp_cmd_mutex);
> +
> + switch (ioctl) {
> + case SNP_GET_REPORT:
> + ret = get_report(snp_dev, &input);
> + break;
> + default:
> + break;
> + }
> +
> + mutex_unlock(&snp_cmd_mutex);
> +
> + if (input.fw_err && copy_to_user(argp, &input, sizeof(input)))
> + return -EFAULT;
> +
> + return ret;
> +}
> +
> +static void free_shared_pages(void *buf, size_t sz)
> +{
> + unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
> +
> + if (!buf)
> + return;
> +
> +	/* If we fail to restore the encryption mask, leak the pages. */
> + if (WARN_ONCE(set_memory_encrypted((unsigned long)buf, npages),
> + "Failed to restore encryption mask (leak it)\n"))
> + return;
> +
> + __free_pages(virt_to_page(buf), get_order(sz));
> +}
> +
> +static void *alloc_shared_pages(size_t sz)
> +{
> + unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
> + struct page *page;
> + int ret;
> +
> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(sz));
> +	if (!page)
> + return NULL;
> +
> + ret = set_memory_decrypted((unsigned long)page_address(page), npages);
> + if (ret) {
> + pr_err("SEV-SNP: failed to mark page shared, ret=%d\n", ret);
> + __free_pages(page, get_order(sz));
> + return NULL;
> + }
> +
> + return page_address(page);
> +}
> +
> +static const struct file_operations snp_guest_fops = {
> + .owner = THIS_MODULE,
> + .unlocked_ioctl = snp_guest_ioctl,
> +};
> +
> +static u8 *get_vmpck(int id, struct snp_secrets_page_layout *layout, u32 **seqno)
> +{
> + u8 *key = NULL;
> +
> + switch (id) {
> + case 0:
> + *seqno = &layout->os_area.msg_seqno_0;
> + key = layout->vmpck0;
> + break;
> + case 1:
> + *seqno = &layout->os_area.msg_seqno_1;
> + key = layout->vmpck1;
> + break;
> + case 2:
> + *seqno = &layout->os_area.msg_seqno_2;
> + key = layout->vmpck2;
> + break;
> + case 3:
> + *seqno = &layout->os_area.msg_seqno_3;
> + key = layout->vmpck3;
> + break;
> + default:
> + break;
> + }
> +
> +	return key;
> +}
> +
> +static int __init snp_guest_probe(struct platform_device *pdev)
> +{
> + struct snp_secrets_page_layout *layout;
> + struct snp_guest_platform_data *data;
> + struct device *dev = &pdev->dev;
> + struct snp_guest_dev *snp_dev;
> + struct miscdevice *misc;
> + u8 *vmpck;
> + int ret;
> +
> + if (!dev->platform_data)
> + return -ENODEV;
> +
> + data = (struct snp_guest_platform_data *)dev->platform_data;
> + layout = (__force void *)ioremap_encrypted(data->secrets_gpa, PAGE_SIZE);
> + if (!layout)
> + return -ENODEV;
> +
> +	snp_dev = devm_kzalloc(&pdev->dev, sizeof(struct snp_guest_dev), GFP_KERNEL);
> +	if (!snp_dev) {
> +		iounmap(layout);
> +		return -ENOMEM;
> +	}
> +
> + ret = -EINVAL;
> + vmpck = get_vmpck(vmpck_id, layout, &snp_dev->os_area_msg_seqno);
> + if (!vmpck) {
> + dev_err(dev, "invalid vmpck id %d\n", vmpck_id);
> + goto e_fail;
> + }
> +
> + platform_set_drvdata(pdev, snp_dev);
> + snp_dev->dev = dev;
> + snp_dev->layout = layout;
> +
> + /* Allocate the shared page used for the request and response message. */
> + snp_dev->request = alloc_shared_pages(sizeof(struct snp_guest_msg));
> + if (!snp_dev->request)
> + goto e_fail;
> +
> + snp_dev->response = alloc_shared_pages(sizeof(struct snp_guest_msg));
> + if (!snp_dev->response)
> + goto e_fail;
> +
> + ret = -EIO;
> + snp_dev->crypto = init_crypto(snp_dev, vmpck, VMPCK_KEY_LEN);
> + if (!snp_dev->crypto)
> + goto e_fail;
> +
> + misc = &snp_dev->misc;
> + misc->minor = MISC_DYNAMIC_MINOR;
> + misc->name = DEVICE_NAME;
> + misc->fops = &snp_guest_fops;
> +
> +	/* Initialize the input addresses for the guest request */
> + snp_dev->input.req_gpa = __pa(snp_dev->request);
> + snp_dev->input.resp_gpa = __pa(snp_dev->response);
> +
> + ret = misc_register(misc);
> + if (ret)
> + goto e_fail;
> +
> + dev_dbg(dev, "Initialized SNP guest driver (using vmpck_id %d)\n", vmpck_id);
> + return 0;
> +
> +e_fail:
> + iounmap(layout);
> + free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
> + free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
> +
> + return ret;
> +}
> +
> +static int __exit snp_guest_remove(struct platform_device *pdev)
> +{
> + struct snp_guest_dev *snp_dev = platform_get_drvdata(pdev);
> +
> + free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
> + free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
> + deinit_crypto(snp_dev->crypto);
> + misc_deregister(&snp_dev->misc);
> +
> + return 0;
> +}
> +
> +static struct platform_driver snp_guest_driver = {
> + .remove = __exit_p(snp_guest_remove),
> + .driver = {
> + .name = "snp-guest",
> + },
> +};
> +
> +module_platform_driver_probe(snp_guest_driver, snp_guest_probe);
> +
> +MODULE_AUTHOR("Brijesh Singh <[email protected]>");
> +MODULE_LICENSE("GPL");
> +MODULE_VERSION("1.0.0");
> +MODULE_DESCRIPTION("AMD SNP Guest Driver");
> diff --git a/drivers/virt/coco/sevguest/sevguest.h b/drivers/virt/coco/sevguest/sevguest.h
> new file mode 100644
> index 000000000000..cfa76cf8a21a
> --- /dev/null
> +++ b/drivers/virt/coco/sevguest/sevguest.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <[email protected]>
> + *
> + * SEV-SNP API spec is available at https://developer.amd.com/sev
> + */
> +
> +#ifndef __LINUX_SEVGUEST_H_
> +#define __LINUX_SEVGUEST_H_
> +
> +#include <linux/types.h>
> +
> +#define MAX_AUTHTAG_LEN 32
> +
> +/* See SNP spec SNP_GUEST_REQUEST section for the structure */
> +enum msg_type {
> + SNP_MSG_TYPE_INVALID = 0,
> + SNP_MSG_CPUID_REQ,
> + SNP_MSG_CPUID_RSP,
> + SNP_MSG_KEY_REQ,
> + SNP_MSG_KEY_RSP,
> + SNP_MSG_REPORT_REQ,
> + SNP_MSG_REPORT_RSP,
> + SNP_MSG_EXPORT_REQ,
> + SNP_MSG_EXPORT_RSP,
> + SNP_MSG_IMPORT_REQ,
> + SNP_MSG_IMPORT_RSP,
> + SNP_MSG_ABSORB_REQ,
> + SNP_MSG_ABSORB_RSP,
> + SNP_MSG_VMRK_REQ,
> + SNP_MSG_VMRK_RSP,
> +
> + SNP_MSG_TYPE_MAX
> +};
> +
> +enum aead_algo {
> + SNP_AEAD_INVALID,
> + SNP_AEAD_AES_256_GCM,
> +};
> +
> +struct snp_guest_msg_hdr {
> + u8 authtag[MAX_AUTHTAG_LEN];
> + u64 msg_seqno;
> + u8 rsvd1[8];
> + u8 algo;
> + u8 hdr_version;
> + u16 hdr_sz;
> + u8 msg_type;
> + u8 msg_version;
> + u16 msg_sz;
> + u32 rsvd2;
> + u8 msg_vmpck;
> + u8 rsvd3[35];
> +} __packed;
> +
> +struct snp_guest_msg {
> + struct snp_guest_msg_hdr hdr;
> + u8 payload[4000];
> +} __packed;
> +
> +/*
> + * The secrets page contains 96-bytes of reserved field that can be used by
> + * the guest OS. The guest OS uses the area to save the message sequence
> + * number for each VMPCK.
> + *
> + * See the GHCB spec section Secret page layout for the format for this area.
> + */
> +struct secrets_os_area {
> + u32 msg_seqno_0;
> + u32 msg_seqno_1;
> + u32 msg_seqno_2;
> + u32 msg_seqno_3;
> + u64 ap_jump_table_pa;
> + u8 rsvd[40];
> + u8 guest_usage[32];
> +} __packed;
> +
> +#define VMPCK_KEY_LEN 32
> +
> +/* See the SNP spec version 0.9 for secrets page format */
> +struct snp_secrets_page_layout {
> + u32 version;
> + u32 imien : 1,
> + rsvd1 : 31;
> + u32 fms;
> + u32 rsvd2;
> + u8 gosvw[16];
> + u8 vmpck0[VMPCK_KEY_LEN];
> + u8 vmpck1[VMPCK_KEY_LEN];
> + u8 vmpck2[VMPCK_KEY_LEN];
> + u8 vmpck3[VMPCK_KEY_LEN];
> + struct secrets_os_area os_area;
> + u8 rsvd3[3840];
> +} __packed;
> +
> +#endif /* __LINUX_SEVGUEST_H_ */
> diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
> new file mode 100644
> index 000000000000..eda7edcffda8
> --- /dev/null
> +++ b/include/uapi/linux/sev-guest.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +/*
> + * Userspace interface for AMD SEV and SEV-SNP guest driver.
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <[email protected]>
> + *
> + * SEV API specification is available at: https://developer.amd.com/sev/
> + */
> +
> +#ifndef __UAPI_LINUX_SEV_GUEST_H_
> +#define __UAPI_LINUX_SEV_GUEST_H_
> +
> +#include <linux/types.h>
> +
> +struct snp_report_req {
> + /* message version number (must be non-zero) */
> + __u8 msg_version;
> +
> + /* user data that should be included in the report */
> + __u8 user_data[64];
> +};
> +
> +struct snp_report_resp {
> + /* response data, see SEV-SNP spec for the format */
> + __u8 data[4000];
> +};
> +
> +struct snp_guest_request_ioctl {
> + /* Request and response structure address */
> + __u64 req_data;
> + __u64 resp_data;
> +
> + /* firmware error code on failure (see psp-sev.h) */
> + __u64 fw_err;
> +};
> +
> +#define SNP_GUEST_REQ_IOC_TYPE 'S'
> +
> +/* Get SNP attestation report */
> +#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_guest_request_ioctl)
> +
> +#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
> --
> 2.25.1
>
On Wed, Oct 20, 2021 at 08:01:07PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > [Sorry for the wall of text, just trying to work through everything.]
>
> And I'm going to respond in a couple of mails just for my own sanity.
>
> > I'm not sure if this is pertaining to using the CPUID table prior to
> > sme_enable(), or just the #VC-based SEV MSR read. The following comments
> > assume the former. If that assumption is wrong you can basically ignore
> > the rest of this email :)
>
> This is pertaining to me wanting to show you that the design of this SNP
> support needs to be sane and maintainable and every function needs to
> make sense not only now but in the future.
Absolutely.
>
> In this particular example, we should set sev_status *once*, *before*
> anything accesses it so that it is prepared when something needs it. Not
> do a #VC and go, "oh, btw, is sev_status set? No? Ok, lemme set it."
> which basically means our design is seriously lacking.
Yes, taking a step back there are some things that could probably be
improved upon there.
currently:
- boot kernel initializes sev_status in set_sev_encryption_mask()
- run-time kernel initializes sev_status in sme_enable()
with this series the following are introduced:
- boot kernel initializes sev_status on-demand in sev_snp_enabled()
- initially used by snp_cpuid_init_boot(), which happens before
set_sev_encryption_mask()
- run-time kernel initializes sev_status on-demand via #VC handler
- initially used by snp_cpuid_init(), which happens before
sme_enable()
Fortunately, all the code makes use of sev_status to get at the SEV MSR
bits, so breaking the appropriate bits out of sme_enable() into an earlier
sev_init() routine that's the exclusive writer of sev_status sounds like a
promising approach.
It makes sense to do it immediately after the first #VC handler is set
up, so CPUID is available, and since that's where SNP CPUID table
initialization would need to happen if it's to be made available in
#VC handler.
It may even be similar enough between boot/compressed and run-time kernel
that it could be a shared routine in sev-shared.c. But then again it also
sounds like the appropriate place to move the snp_cpuid_init*() calls and
the locating of the cc_blob, and since there are differences there it
might make sense to keep the boot/compressed and kernel-proper sev_init()
routines separate to avoid #ifdeffery.
Not to get ahead of myself though. Just seems like a good starting point
for how to consolidate the various users.
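As a throwaway sketch of that "single writer" shape (purely illustrative, all names hypothetical, nothing like the final code): sev_init() would be the only place that stores sev_status, and every later consumer would be a plain read, never a lazy on-demand initialization:

```c
/* Hypothetical model of the proposed flow, not actual kernel code. */
static unsigned long long sev_status;	/* SEV_STATUS MSR bits, set once */
static int sev_status_set;

/* Called exactly once, right after the first #VC handler is installed. */
static void sev_init(unsigned long long msr_bits)
{
	if (sev_status_set)
		return;		/* a second writer would be the design bug */
	sev_status = msr_bits;
	sev_status_set = 1;
}

/* Consumers (sme_enable(), snp_cpuid_init(), ...) only ever read. */
static unsigned long long sev_get_status(void)
{
	return sev_status_set ? sev_status : 0;
}
```

That removes the "oh, btw, is sev_status set?" pattern: anything running before sev_init() simply has no business looking at sev_status.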
>
> And I had suggested a similar thing for TDX and tglx was 100% right in
> shooting it down because we do properly designed things - not, get stuff
> in so that vendor is happy and then, once the vendor programmers have
> disappeared to do their next enablement task, the maintainers get to mop
> up and maintain it forever.
>
> Because this mopping up doesn't scale - trust me.
Got it, and my apologies if I've given you that impression as it's
certainly not my intent. (though I'm sure you've heard that before.)
>
> > [The #VC-based SEV MSR read is not necessary for anything in sme_enable(),
> > it's simply a way to determine whether the guest is an SNP guest, without
> > any reliance on CPUID, which seemed useful in the context of doing some
> > additional sanity checks against the SNP CPUID table and determining that
> > it's appropriate to use it early on (rather than just trust that this is an
> > SNP guest by virtue of the CC blob being present, and then failing later
> > once sme_enable() checks for the SNP feature bits through the normal
> > mechanism, as was done in v5).]
>
> So you need to make up your mind here design-wise, what you wanna do.
>
> The proper thing to do would be, to detect *everything*, detect whether
> this is an SNP guest, yadda yadda, everything your code is going to need
> later on, and then be done with it.
>
> Then you continue with the boot and now your other code queries
> everything that has been detected up til now and uses it.
Agreed, if we need to check SEV MSR early for the purposes of SNP it makes
sense to move the overall SEV feature detection code earlier as well. I
should have looked into that aspect more closely before introducing the
changes.
>
> End of mail 1.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 20, 2021 at 08:08:39PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > > 1. Code checks SME/SEV support leaf. HV lies and says there's none. So
> > > guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
> > > won't be able to give confidentiality to its users => users go away or
> > > do unencrypted like now.
> > >
> > > Problem is solved by political and economical pressure.
> > >
> > > 2. Check SEV and SME bit. HV lies here. Oh well, same as the above.
> >
> > I'd be worried about the possibility that, through some additional exploits
> > or failures in the attestation flow,
>
> Well, that puts forward an important question: how do you verify
> *reliably* that this is an SNP guest?
>
> - attestation?
>
> - CPUID?
>
> - anything else?
>
> I don't see this written down anywhere. Because this assumption will
> guide the design in the kernel.
According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the
SEV MSR is the appropriate source for guests to use. This is what is used
in the EFI code as well. So that seems to be the right way to make the
initial determination.
There's a dependency there on the SEV CPUID bit however, since setting the
bit to 0 would generally result in a guest skipping the SEV MSR read and
assuming 0. So for SNP it would be more reliable to make use of the CPUID
table at that point, since it's less susceptible to manipulation, or do the
#VC-based SEV MSR read (or both).
>
> > a guest owner was tricked into booting unencrypted on a compromised
> > host and exposing their secrets. Their attestation process might even
> > do some additional CPUID sanity checks, which would at the point
> > be via the SNP CPUID table and look legitimate, unaware that the
> > kernel didn't actually use the SNP CPUID table until after 0x8000001F
> > was parsed (if we were to only initialize it after/as-part-of
> > sme_enable()).
>
> So what happens with that guest owner later?
>
> How is she to notice that she booted unencrypted?
Fully-unencrypted should result in a crash due to the reasons below.
But there may exist some carefully crafted outside influences that could
goad the guest into, perhaps, not marking certain pages as private. The
best that can be done to prevent that is to audit/harden all the code in the
boot stack so that it is less susceptible to that kind of outside
manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID
table, SNP restricted injection, etc.)
Then of course that boot stack needs to be part of the attestation process
to provide any meaningful assurances about the resulting guest state.
Outside of the boot stack, the guest owner might take some extra precautions.
Perhaps a custom kernel driver to verify the encryption/validated status of
guest pages, or some checks against the CPUID table to verify it contains sane
values, but it's not really worth speculating on that aspect as it will
ultimately depend on how the cloud vendor decides to handle things after
boot.
>
> > Fortunately in this scenario I think the guest kernel actually would fail to
> > boot due to the SNP hardware unconditionally treating code/page tables as
> > encrypted pages. I tested some of these scenarios just to check, but not
> > all, and I still don't feel confident enough about it to say that there's
> > not some way to exploit this by someone who is more clever/persistant than
> > me.
>
> All this design needs to be preceded with: "We protect against cases A,
> B and C and not against D, E, etc."
>
> So that it is clear to all parties involved what we're working with and
> what we're protecting against and what we're *not* protecting against.
That would indeed be useful. Perhaps as a nice big comment in sme_enable()
and/or the proposed sev_init() so that those invariants can be maintained,
or updated in sync with future changes. I'll look into that for the next
spin and check with Brijesh on the details.
>
> End of mail 2, more later.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 20, 2021 at 07:35:35PM -0500, Michael Roth wrote:
> Fortunately, all the code makes use of sev_status to get at the SEV MSR
> bits, so breaking the appropriate bits out of sme_enable() into an earlier
> sev_init() routine that's the exclusive writer of sev_status sounds like a
> promising approach.
Ack.
> It makes sense to do it immediately after the first #VC handler is set
> up, so CPUID is available, and since that's where SNP CPUID table
> initialization would need to happen if it's to be made available in
> #VC handler.
Right, and you can do all your init/CPUID prep there.
> It may even be similar enough between boot/compressed and run-time kernel
> that it could be a shared routine in sev-shared.c.
Uuh, bonus points! :-)
> But then again it also sounds like the appropriate place to move the
> snp_cpuid_init*() calls, and locating the cc_blob, and since there's
> differences there it might make sense to keep the boot/compressed and
> kernel proper sev_init() routines separate to avoid #ifdeffery).
>
> Not to get ahead of myself though. Just seems like a good starting point
> for how to consolidate the various users.
I like how you're thinking. :)
> Got it, and my apologies if I've given you that impression as it's
> certainly not my intent. (though I'm sure you've heard that before.)
Nothing to apologize - all good.
> Agreed, if we need to check SEV MSR early for the purposes of SNP it makes
> sense to move the overall SEV feature detection code earlier as well. I
> should have looked into that aspect more closely before introducing the
> changes.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 20, 2021 at 09:05:42PM -0500, Michael Roth wrote:
> According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the
> SEV MSR is the appropriate source for guests to use. This is what is used
> in the EFI code as well. So that seems to be the right way to make the
> initial determination.
Yap.
> There's a dependency there on the SEV CPUID bit however, since setting the
> bit to 0 would generally result in a guest skipping the SEV MSR read and
> assuming 0. So for SNP it would be more reliable to make use of the CPUID
> table at that point, since it's less-susceptible to manipulation, or do the
> #VC-based SEV MSR read (or both).
So the CPUID page is supplied by the firmware, right?
Then, you parse it and see that the CPUID bit is 1, then you start using
the SEV_STATUS MSR and all good.
If there *is* a CPUID page but that bit is 0, then you can safely assume
that something is playing tricks on ya so you simply refuse booting.
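That decision tree is small enough to sketch (purely illustrative, hypothetical names, not proposed kernel code):

```c
/*
 * Hedged model of the rule above: a present (firmware-validated) CPUID
 * page with the SEV bit clear is self-contradictory, so the only safe
 * reaction is to refuse to boot.
 */
enum boot_verdict { BOOT_LEGACY, BOOT_SNP, BOOT_REFUSE };

static enum boot_verdict cpuid_page_verdict(int page_present, int sev_bit_set)
{
	if (!page_present)
		return BOOT_LEGACY;	/* no SNP CPUID page: not an SNP guest */
	if (!sev_bit_set)
		return BOOT_REFUSE;	/* page present but SEV=0: tampering */
	return BOOT_SNP;		/* page present, SEV=1: trust SEV_STATUS */
}
```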
> Fully-unencrypted should result in a crash due to the reasons below.
Crash is a good thing in confidential computing. :)
> But there may exist some carefully crafted outside influences that could
> goad the guest into, perhaps, not marking certain pages as private. The
> best that can be done to prevent that is to audit/harden all the code in the
> boot stack so that it is less susceptible to that kind of outside
> manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID
> table, SNP restricted injection, etc.)
So to me I wonder why would one use anything *else* but an SNP guest. We
all know that those previous technologies were just the stepping stones
towards SNP.
> Then of course that boot stack needs to be part of the attestation process
> to provide any meaningful assurances about the resulting guest state.
>
> Outside of the boot stack the guest owner might take some extra precautions.
> Perhaps custom some kernel driver to verify encryption/validated status of
> guest pages, some checks against the CPUID table to verify it contains sane
> values, but not really worth speculating on that aspect as it will be
> ultimately dependent on how the cloud vendor decides to handle things after
> boot.
Well, I've always advocated having a best-practices writeup somewhere
goes a long way to explain this technology to people and how to get
their feet wet. And there you can give hints how such verification could
look like in detail...
> That would indeed be useful. Perhaps as a nice big comment in sme_enable()
> and/or the proposed sev_init() so that those invariants can be maintained,
> or updated in sync with future changes. I'll look into that for the next
> spin and check with Brijesh on the details.
There is Documentation/x86/amd-memory-encryption.rst, for example.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> At which point we then switch to using the CPUID table? But at that
> point all the previous CPUID checks, both SEV-related/non-SEV-related,
> are now possibly not consistent with what's in the CPUID table. Do we
> then revalidate?
Well, that's a tough question. That's basically the same question as,
does Linux support heterogeneous cores and can it handle hardware
features which get enabled after boot. The perfect example is, late
microcode loading which changes CPUID bits and adds new functionality.
And the answer to that is, well, hard. You need to decide this on a
case-by-case basis.
But isn't it that the SNP CPUID page will be parsed early enough anyway
so that kernel proper will see only SNP CPUID info and init properly
using that?
> Even a non-malicious hypervisor might provide inconsistent values
> between the two sources due to bugs, or SNP validation suppressing
> certain feature bits that hypervisor otherwise exposes, etc.
There's also migration, lemme point to a very recent example:
https://lore.kernel.org/r/[email protected]
which is exactly what you say - a non-malicious HV taking care of its
migration pool. So how do you handle that?
> Now all the code after sme_enable() can potentially take unexpected
> execution paths, where post-sme_enable() code makes assumptions about
> pre-sme_enable() checks that may no longer hold true.
So as I said above, if you parse SNP CPUID page early enough, you don't
have to worry about feature rediscovery. Early enough means, before
identify_boot_cpu().
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> the #VC-based SEV MSR read, they were added only because I thought the
> gist of your earlier suggestions were to do more validation against the
> CPUID table advertised by EFI
Well, if EFI is providing us with the CPUID table, who verified it? The
attestation process? Is it signed with the AMD platform key?
Because if we can verify the firmware is ok, then we can trust the CPUID
page, right?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Borislav Petkov ([email protected]) wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > At which point we then switch to using the CPUID table? But at that
> > point all the previous CPUID checks, both SEV-related/non-SEV-related,
> > are now possibly not consistent with what's in the CPUID table. Do we
> > then revalidate?
>
> Well, that's a tough question. That's basically the same question as,
> does Linux support heterogeneous cores and can it handle hardware
> features which get enabled after boot. The perfect example is, late
> microcode loading which changes CPUID bits and adds new functionality.
>
> And the answer to that is, well, hard. You need to decide this on a
> case-by-case basis.
I can imagine a malicious hypervisor trying to return different cpuid
answers to different threads or even the same thread at different times.
> But isn't it that the SNP CPUID page will be parsed early enough anyway
> so that kernel proper will see only SNP CPUID info and init properly
> using that?
>
> > Even a non-malicious hypervisor might provide inconsistent values
> > between the two sources due to bugs, or SNP validation suppressing
> > certain feature bits that hypervisor otherwise exposes, etc.
>
> There's also migration, lemme point to a very recent example:
>
> https://lore.kernel.org/r/[email protected]
Ewww.
> which is exactly what you say - a non-malicious HV taking care of its
> migration pool. So how do you handle that?
Well, the spec (AMD 56860 SEV spec) says:
'If firmware encounters a CPUID function that is in the standard or extended ranges, then the
firmware performs a check to ensure that the provided output would not lead to an insecure guest
state'
so I take that 'firmware' to be the PSP; that wording doesn't say that
it checks that the CPUID is identical, just that it 'would not lead to
an insecure guest' - so a hypervisor could hide any 'no longer affected
by' flag for all the CPUs in its migration pool and the firmware
shouldn't complain; so it should be OK to pessimise.
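Pessimising across a pool amounts to advertising only the bitwise intersection of what each member could claim, something like this toy sketch (illustrative only, not a proposal):

```c
/*
 * Hedged sketch: a migration pool can only safely advertise the
 * intersection of the per-host "no longer affected by" feature masks,
 * since a guest may land on any member of the pool.
 */
static unsigned int pool_safe_features(const unsigned int *host_masks, int n)
{
	unsigned int common = ~0u;
	int i;

	for (i = 0; i < n; i++)
		common &= host_masks[i];
	return common;
}
```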
Dave
> > Now all the code after sme_enable() can potentially take unexpected
> > execution paths, where post-sme_enable() code makes assumptions about
> > pre-sme_enable() checks that may no longer hold true.
>
> So as I said above, if you parse SNP CPUID page early enough, you don't
> have to worry about feature rediscovery. Early enough means, before
> identify_boot_cpu().
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
On Thu, Oct 21, 2021 at 04:56:09PM +0100, Dr. David Alan Gilbert wrote:
> I can imagine a malicious hypervisor trying to return different cpuid
> answers to different threads or even the same thread at different times.
Haha, I guess that will fail not because of SEV* but because of the
kernel not really being able to handle heterogeneous CPUIDs.
> Well, the spec (AMD 56860 SEV spec) says:
>
> 'If firmware encounters a CPUID function that is in the standard or extended ranges, then the
> firmware performs a check to ensure that the provided output would not lead to an insecure guest
> state'
>
> so I take that 'firmware' to be the PSP; that wording doesn't say that
> it checks that the CPUID is identical, just that it 'would not lead to
> an insecure guest' - so a hypervisor could hide any 'no longer affected
> by' flag for all the CPUs in its migration pool and the firmware
> shouldn't complain; so it should be OK to pessimise.
AFAIU this, I think this would depend on "[t]he policy used by the
firmware to assess CPUID function output can be found in [PPR]."
So if the HV sets the "no longer affected by" flag but the firmware
deems this set flag as insecure, I'm assuming the firmware will clear
it when it returns the CPUID leaves. I guess I need to go find that
policy...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Borislav Petkov ([email protected]) wrote:
> On Thu, Oct 21, 2021 at 04:56:09PM +0100, Dr. David Alan Gilbert wrote:
> > I can imagine a malicious hypervisor trying to return different cpuid
> > answers to different threads or even the same thread at different times.
>
> Haha, I guess that will fail not because of SEV* but because of the
> kernel not really being able to handle heterogeneous CPUIDs.
My worry is if it fails cleanly or fails in a way an evil hypervisor can
exploit.
> > Well, the spec (AMD 56860 SEV spec) says:
> >
> > 'If firmware encounters a CPUID function that is in the standard or extended ranges, then the
> > firmware performs a check to ensure that the provided output would not lead to an insecure guest
> > state'
> >
> > so I take that 'firmware' to be the PSP; that wording doesn't say that
> > it checks that the CPUID is identical, just that it 'would not lead to
> > an insecure guest' - so a hypervisor could hide any 'no longer affected
> > by' flag for all the CPUs in its migration pool and the firmware
> > shouldn't complain; so it should be OK to pessimise.
>
> AFAIU this, I think this would depend on "[t]he policy used by the
> firmware to assess CPUID function output can be found in [PPR]."
>
> So if the HV sets the "no longer affected by" flag but the firmware
> deems this set flag as insecure, I'm assuming the firmware will clear
> it when it returns the CPUID leaves. I guess I need to go find that
> policy...
<digs - ppr_B1_pub_1 55898 rev 0.50 >
OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109
then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as
'bitmask'
'bits set in the GuestVal must also be set in HostVal.
This is often applied to feature fields where each bit indicates
support for a feature'
So that's right isn't it?
Dave
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
On Thu, Oct 21, 2021 at 06:12:53PM +0100, Dr. David Alan Gilbert wrote:
> OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109
>
> then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as
> 'bitmask'
> 'bits set in the GuestVal must also be set in HostVal.
> This is often applied to feature fields where each bit indicates
> support for a feature'
>
> So that's right isn't it?
Yap, AFAIRC, it would fail the check if:
(GuestVal & HostVal) != GuestVal
and GuestVal is "the CPUID result value created by the hypervisor that
it wants to give to the guest". Let's say it clears bit 6 there.
Then HostVal comes in which is "the actual CPUID result value specified
in this PPR" and there the guest catches the HV lying its *ss off.
:-)
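For reference, the "bitmask" policy class described above boils down to a one-line check. This is only an illustrative sketch (the real enforcement happens inside the PSP firmware, per PPR section 2.1.5.3), with a hypothetical helper name:

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Sketch of the "bitmask" CPUID policy class: every bit set in the
 * hypervisor-provided GuestVal must also be set in the actual HostVal.
 * Clearing a host-supported bit is allowed; advertising a bit the host
 * does not support is not.
 */
static bool cpuid_bitmask_policy_ok(uint32_t guest_val, uint32_t host_val)
{
	return (guest_val & host_val) == guest_val;
}
```

So a hypervisor hiding bit 6 (guest_val has it clear) passes the check, while claiming bit 6 on hardware that lacks it fails.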
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Borislav Petkov ([email protected]) wrote:
> On Thu, Oct 21, 2021 at 06:12:53PM +0100, Dr. David Alan Gilbert wrote:
> > OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109
> >
> > then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as
> > 'bitmask'
> > 'bits set in the GuestVal must also be set in HostVal.
> > This is often applied to feature fields where each bit indicates
> > support for a feature'
> >
> > So that's right isn't it?
>
> Yap, AFAIRC, it would fail the check if:
>
> (GuestVal & HostVal) != GuestVal
>
> and GuestVal is "the CPUID result value created by the hypervisor that
> it wants to give to the guest". Let's say it clears bit 6 there.
^^^^^^^
> Then HostVal comes in which is "the actual CPUID result value specified
> in this PPR" and there the guest catches the HV lying its *ss off.
>
> :-)
Hang on, I think it's perfectly fine for it to clear that bit - it just
gets caught if it *sets* it (i.e. claims to be a chip unaffected by the
bug).
i.e. if guestval=0 then (GuestVal & whatever) == GuestVal
fine
?
Dave
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
On Thu, Oct 21, 2021 at 06:47:50PM +0100, Dr. David Alan Gilbert wrote:
> Hang on, I think it's perfectly fine for it to clear that bit - it just
> gets caught if it *sets* it (i.e. claims to be a chip unaffected by the
> bug).
>
> i.e. if guestval=0 then (GuestVal & whatever) == GuestVal
> fine
>
> ?
Bah, ofc. The name of the bit is NullSelectorClearsBase - so when it is
clear, we will note we're affected, as that patch does:
+ /*
+ * CPUID bit above wasn't set. If this kernel is still running
+ * as a HV guest, then the HV has decided not to advertize
+ * that CPUID bit for whatever reason. For example, one
+ * member of the migration pool might be vulnerable. Which
+ * means, the bug is present: set the BUG flag and return.
+ */
+ if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
+ set_cpu_bug(c, X86_BUG_NULL_SEG);
+ return;
+ }
I have managed to flip the meaning in my mind.
Ok, that makes more sense.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> > the #VC-based SEV MSR read, they were added only because I thought the
> > gist of your earlier suggestions were to do more validation against the
> > CPUID table advertised by EFI
>
> Well, if EFI is providing us with the CPUID table, who verified it? The
> attestation process? Is it signed with the AMD platform key?
For CPUID table pages, the only thing that's assured/attested to by firmware
is that:
1) it is present at the expected guest physical address (that address
is generally baked into the EFI firmware, which *is* attested to)
2) its contents have been validated by the PSP against the current host
CPUID capabilities as defined by the AMD PPR (Publication #55898),
Section 2.1.5.3, "CPUID Policy Enforcement"
3) it is encrypted with the guest key
4) it is in a validated state at launch
The actual contents of the CPUID table are *not* attested to, so in theory
it can still be manipulated by a malicious hypervisor via the initial
SNP_LAUNCH_UPDATE firmware commands, which provide the initial plain-text
encoding of the CPUID table to the PSP. It's also not signed in any way
(apparently there were some security reasons for that decision, though I
don't know the full details).
[A guest owner can still validate their CPUID values against known good
ones as part of their attestation flow, but that is not part of the
attestation report as reported by SNP firmware. (So long as there is some
care taken to ensure the source of the CPUID values visible to
userspace/guest attestation process are the same as what was used by the boot
stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same
initial address, or in cases where a copy is used, that copy is placed in
encrypted/private/validated guest memory so it can't be tampered with during
boot.]
So, while it's more difficult to do, and the scope of influence is reduced,
there are still some games that can be played to mess with boot via
manipulation of the initial CPUID table values, so long as they are within
the constraints set by the CPUID enforcement policy defined in the PPR.
Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F,
EAX, is not enforced by the PSP. The only thing enforced there is that the
hypervisor cannot advertise bits that aren't supported by hardware. So
no matter how much the boot stack is trusted, the CPUID table does not
inherit that trust, and even values that we *know* should be true should be
verified rather than assumed.
But I think there are a couple approaches for verifying this is an SNP
guest that are robust against this sort of scenario. You've touched on
some of them in your other replies, so I'll respond there.
>
> Because if we can verify the firmware is ok, then we can trust the CPUID
> page, right?
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 21, 2021 at 04:48:16PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > At which point we then switch to using the CPUID table? But at that
> > point all the previous CPUID checks, both SEV-related/non-SEV-related,
> > are now possibly not consistent with what's in the CPUID table. Do we
> > then revalidate?
>
> Well, that's a tough question. That's basically the same question as,
> does Linux support heterogeneous cores and can it handle hardware
> features which get enabled after boot. The perfect example is, late
> microcode loading which changes CPUID bits and adds new functionality.
>
> And the answer to that is, well, hard. You need to decide this on a
> case-by-case basis.
>
> But isn't it that the SNP CPUID page will be parsed early enough anyway
> so that kernel proper will see only SNP CPUID info and init properly
> using that?
At the time I wrote that I thought you were suggesting moving the SNP CPUID
table initialization to where sme_enable() is in current upstream, so it
seemed worth mentioning, but since the idea was actually to move all the
sev_status initialization in sme_enable() earlier in the code to where
SNP CPUID table init needs to happen (before the first CPUID calls are made), I
think this scenario is avoided.
>
> > Even a non-malicious hypervisor might provide inconsistent values
> > between the two sources due to bugs, or SNP validation suppressing
> > certain feature bits that hypervisor otherwise exposes, etc.
>
> There's also migration, lemme point to a very recent example:
>
> https://lore.kernel.org/r/20211021104744.24126-1-jane.malalane@citrix.com
>
> which is exactly what you say - a non-malicious HV taking care of its
> migration pool. So how do you handle that?
I concur with David's assessment on that solution being compatible with
CPUID enforcement policy. But it's certainly something to consider more
generally.
Fortunately I think I misspoke earlier: I thought there were a case or two
where bits were suppressed rather than causing a validation failure, but
looking back through the PPR it doesn't seem like that's actually the
case. Which is good, since that would indeed be painful to deal with in
the context of migration.
On Thu, Oct 21, 2021 at 04:39:31PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 09:05:42PM -0500, Michael Roth wrote:
> > According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the
> > SEV MSR is the appropriate source for guests to use. This is what is used
> > in the EFI code as well. So that seems to be the right way to make the
> > initial determination.
>
> Yap.
>
> > There's a dependency there on the SEV CPUID bit however, since setting the
> > bit to 0 would generally result in a guest skipping the SEV MSR read and
> > assuming 0. So for SNP it would be more reliable to make use of the CPUID
> > table at that point, since it's less-susceptible to manipulation, or do the
> > #VC-based SEV MSR read (or both).
>
> So the CPUID page is supplied by the firmware, right?
Yes.
>
> Then, you parse it and see that the CPUID bit is 1, then you start using
> the SEV_STATUS MSR and all good.
>
> If there *is* a CPUID page but that bit is 0, then you can safely assume
> that something is playing tricks on ya so you simply refuse booting.
I think that's a good way to deal with this.
I was going to suggest we could assume the presence of the SEV status MSR
by virtue of EFI/bootloader/etc. having provided a cc_blob, and just read it
right away to confirm this is SNP. But with your approach we could basically
just set up the table early, based on the presence of the cc_blob, do all
the checks in sme_enable() in the same order as with SEV/SEV-ES, and then
have additional sanity checks against the CPUID/MSR response values to
ensure the SNP bits are present for the cases where a CPUID table / cc_blob
are provided.
I'll work on implementing things in this way and see how it goes.
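The sanity check sketched in that plan could look something like the following. This is a rough model only, with hypothetical helper and macro names; the real early-boot code would read the CPUID table and the SEV_STATUS MSR directly:

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions modeled on CPUID 0x8000001F EAX and the SEV_STATUS MSR */
#define CPUID_SEV_BIT     (1u << 1)	/* SEV supported */
#define MSR_SEV_SNP_BIT   (1u << 2)	/* SNP active in SEV_STATUS */

/*
 * Sketch of the sanity check: if firmware handed us a cc_blob/CPUID
 * table, this must be an SNP guest, so the SEV CPUID bit and the SNP
 * bit in SEV_STATUS must both be present; otherwise refuse to trust
 * the boot environment.
 */
static bool snp_boot_sane(bool have_cc_blob, uint32_t cpuid_8000001f_eax,
			  uint64_t sev_status)
{
	if (!have_cc_blob)
		return true;	/* not an SNP boot path, nothing to enforce */

	return (cpuid_8000001f_eax & CPUID_SEV_BIT) &&
	       (sev_status & MSR_SEV_SNP_BIT);
}
```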
>
> > Fully-unencrypted should result in a crash due to the reasons below.
>
> Crash is a good thing in confidential computing. :)
>
> > But there may exist some carefully crafted outside influences that could
> > goad the guest into, perhaps, not marking certain pages as private. The
> > best that can be done to prevent that is to audit/harden all the code in the
> > boot stack so that it is less susceptible to that kind of outside
> > manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID
> > table, SNP restricted injection, etc.)
>
> So to me I wonder why would one use anything *else* but an SNP guest. We
> all know that those previous technologies were just the stepping stones
> towards SNP.
Yah, I think ultimately that's where things are headed.
>
> > Then of course that boot stack needs to be part of the attestation process
> > to provide any meaningful assurances about the resulting guest state.
> >
> > Outside of the boot stack the guest owner might take some extra precautions.
> > Perhaps custom some kernel driver to verify encryption/validated status of
> > guest pages, some checks against the CPUID table to verify it contains sane
> > values, but not really worth speculating on that aspect as it will be
> > ultimately dependent on how the cloud vendor decides to handle things after
> > boot.
>
> Well, I've always advocated having a best-practices writeup somewhere
> goes a long way to explain this technology to people and how to get
> their feet wet. And there you can give hints how such verification could
> look like in detail...
Our security team is working on some initial reference designs / tooling
for attestation. It'll eventually make its way to here:
https://github.com/AMDESE/sev-guest
but it's still mostly an internal effort, so nothing is there ATM. Hopefully
that will fill in some of these gaps. I agree an accompanying best-practices
document highlighting these considerations would also be worthwhile; I'll
need to check whether anything like that is already in the works.
>
> > That would indeed be useful. Perhaps as a nice big comment in sme_enable()
> > and/or the proposed sev_init() so that those invariants can be maintained,
> > or updated in sync with future changes. I'll look into that for the next
> > spin and check with Brijesh on the details.
>
> There is Documentation/x86/amd-memory-encryption.rst, for example.
Makes sense, will work with Brijesh on this.
Thanks!
-Mike
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Oct 25, 2021 at 01:04:10PM +0200, Borislav Petkov wrote:
> On Thu, Oct 21, 2021 at 03:41:49PM -0500, Michael Roth wrote:
> > On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote:
> > > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > > > The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> > > > the #VC-based SEV MSR read, they were added only because I thought the
> > > > gist of your earlier suggestions were to do more validation against the
> > > > CPUID table advertised by EFI
> > >
> > > Well, if EFI is providing us with the CPUID table, who verified it? The
> > > attestation process? Is it signed with the AMD platform key?
> >
> > For CPUID table pages, the only thing that's assured/attested to by firmware
> > is that:
> >
> > 1) it is present at the expected guest physical address (that address
> > is generally baked into the EFI firmware, which *is* attested to)
> > 2) its contents have been validated by the PSP against the current host
> > CPUID capabilities as defined by the AMD PPR (Publication #55898),
> > Section 2.1.5.3, "CPUID Policy Enforcement"
> > 3) it is encrypted with the guest key
> > 4) it is in a validated state at launch
> >
> > The actual contents of the CPUID table are *not* attested to,
>
> Why?
As counter-intuitive as it sounds, it actually doesn't buy us anything if the CPUID
table is part of the PSP attestation report, since:
- the boot stack is attested to, and if the boot stack isn't careful to
use the CPUID table at all times, then attesting the CPUID table after
boot doesn't provide any assurance that the boot wasn't manipulated
via CPUID
- given the boot stack must take these precautions, guest-specific
attestation code is just as capable of attesting the CPUID table
contents/values, since it has the same view of the CPUID values that
were used during boot.
So leaving it to the guest owner to attest it provides some flexibility
to guest owners to implement it as they see fit, whereas making it part
of the attestation report means that the guest owner needs the exact
contents of the CPUID page for a particular guest configuration so it can
be incorporated into the measurement they are expecting. That would likely
require some tooling provided by the cloud vendor, since every different
guest configuration, or even a change like the ordering of entries in the
table, would affect the measurement, so it's not something that could
easily be derived independently with minimal involvement from the cloud
vendor.
And even if the cloud vendor provided a simple way to export the table
contents for measurement, can you really trust it? If you have to audit
individual entries to be sure there's nothing fishy, why not just
incorporate those checks into the guest owner's attestation flow and
leave the vendor out of it completely?
So not including it in the measurement meshes well with the overall
SEV-SNP approach of reducing the cloud vendor's involvement in the
overall attestation process.
>
> > so in theory it can still be manipulated by a malicious hypervisor as
> > part of the initial SNP_LAUNCH_UPDATE firmware commands that provides
> > the initial plain-text encoding of the CPUID table that is provided
> > to the PSP via SNP_LAUNCH_UPDATE. It's also not signed in any way
> > (apparently there were some security reasons for that decision, though
> > I don't know the full details).
>
> So this sounds like an unnecessary complication. I'm sure there are
> reasons to do it this way but my simple thinking would simply want the
> CPUID page to be read-only and signed so that the guest can trust it
> unconditionally.
The thing here is that it's not just a specific CPUID page that's valid
for all guests for a particular host. Booting a guest with additional
vCPUs changes the contents, different CPU models/flags changes the
contents, etc. So it needs to be generated for each specific guest
configuration, and can't just be a read-only page.
Some sort of signature indicating the PSP's stamp of approval on a
particular CPUID page would be nice, but we do sort of have this in the
sense that the CPUID page 'address' is part of the measurement, and the
page can only contain values that were blessed by the PSP. The problem
then becomes ensuring that only that address is used for CPUID lookups,
and that its contents weren't manipulated in a way where they're 'valid'
as far as the PSP is concerned, but still not the 'expected' values for a
particular guest (which is where the attestation mentioned above would
come into play).
>
> > [A guest owner can still validate their CPUID values against known good
> > ones as part of their attestation flow, but that is not part of the
> > attestation report as reported by SNP firmware. (So long as there is some
> > care taken to ensure the source of the CPUID values visible to
> > userspace/guest attestion process are the same as what was used by the boot
> > stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same
> > initial address, or in cases where a copy is used, that copy is placed in
> > encrypted/private/validated guest memory so it can't be tampered with during
> > boot.]
>
> This sounds like the good practices advice to guest owners would be,
> "Hey, I just booted your SNP guest but for full trust, you should go and
> verify the CPUID page's contents."
>
> "And if I were you, I wouldn't want to run any verification of CPUID
> pages' contents on the same guest because it itself hasn't been verified
> yet."
>
> It all sounds weird.
Yes, understandably so. But the only way to avoid that sort of weirdness
in general is for *all* guest state to be measured, all pages, all
registers, etc. Baking that directly into the SEV-SNP attestation report
would be a non-starter for most, since computing the measurement for all
that state independently would require lots of additional inputs from the
cloud vendor (who we don't necessarily trust in the first place), and
constant updates of measurement values, since they would change with
every guest configuration change, every different starting TSC offset,
maybe even the order in which vCPUs were onlined, stuff that the
kernel prints to log buffers, etc.
But, if a guest owner wants to attempt clever ways to account for some/all
of that in their attestation flow, they are welcome to try. That's sort of
the idea behind SNP attestation vs. SEV. Things like page
validation/encryption, cpuid enforcement, etc., reduce some of the
variables/possibilities guest owners need to account for during attestation
to make the process more secure/tenable, but they don't rule out all
possibilities, just as Trusted Boot doesn't necessarily mean you can fully
trust your OS state immediately after boot; there are still outside
influences at play, and the boot stack should guard against them
wherever possible.
>
> > So, while it's more difficult to do, and the scope of influence is reduced,
> > there are still some games that can be played to mess with boot via
> > manipulation of the initial CPUID table values, so long as they are within
> > the constraints set by the CPUID enforcement policy defined in the PPR.
> >
> > Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F,
> > EAX, are not enforced by PSP. The only thing enforced there is that the
> > hypervisor cannot advertise bits that aren't supported by hardware. So
> > no matter how much the boot stack is trusted, the CPUID table does not
> > inherit that trust, and even values that we *know* should be true should be
> > verified rather than assumed.
> >
> > But I think there are a couple approaches for verifying this is an SNP
> > guest that are robust against this sort of scenario. You've touched on
> > some of them in your other replies, so I'll respond there.
>
> Yah, I guess the kernel can do good enough verification and then the
> full thing needs to be done by the guest owner and in *some* userspace
> - not necessarily on the currently booted, unverified guest - but
> somewhere, where you have maximal flexibility.
Exactly, moving attestation into the guest allows for more of these
unexpected states to be accounted for at whatever level of paranoia a guest
owner sees fit, while still allowing firmware to provide some basic
assurances via attestation report and various features to reduce common
attack vectors during/after boot.
>
> IMHO.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 21, 2021 at 03:41:49PM -0500, Michael Roth wrote:
> On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote:
> > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > > The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> > > the #VC-based SEV MSR read, they were added only because I thought the
> > > gist of your earlier suggestions were to do more validation against the
> > > CPUID table advertised by EFI
> >
> > Well, if EFI is providing us with the CPUID table, who verified it? The
> > attestation process? Is it signed with the AMD platform key?
>
> For CPUID table pages, the only thing that's assured/attested to by firmware
> is that:
>
> 1) it is present at the expected guest physical address (that address
> is generally baked into the EFI firmware, which *is* attested to)
> 2) its contents have been validated by the PSP against the current host
> CPUID capabilities as defined by the AMD PPR (Publication #55898),
> Section 2.1.5.3, "CPUID Policy Enforcement"
> 3) it is encrypted with the guest key
> 4) it is in a validated state at launch
>
> The actual contents of the CPUID table are *not* attested to,
Why?
> so in theory it can still be manipulated by a malicious hypervisor as
> part of the initial SNP_LAUNCH_UPDATE firmware commands that provides
> the initial plain-text encoding of the CPUID table that is provided
> to the PSP via SNP_LAUNCH_UPDATE. It's also not signed in any way
> (apparently there were some security reasons for that decision, though
> I don't know the full details).
So this sounds like an unnecessary complication. I'm sure there are
reasons to do it this way but my simple thinking would simply want the
CPUID page to be read-only and signed so that the guest can trust it
unconditionally.
> [A guest owner can still validate their CPUID values against known good
> ones as part of their attestation flow, but that is not part of the
> attestation report as reported by SNP firmware. (So long as there is some
> care taken to ensure the source of the CPUID values visible to
> userspace/guest attestation process are the same as what was used by the boot
> stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same
> initial address, or in cases where a copy is used, that copy is placed in
> encrypted/private/validated guest memory so it can't be tampered with during
> boot.]
This sounds like the good practices advice to guest owners would be,
"Hey, I just booted your SNP guest but for full trust, you should go and
verify the CPUID page's contents."
"And if I were you, I wouldn't want to run any verification of CPUID
pages' contents on the same guest because it itself hasn't been verified
yet."
It all sounds weird.
> So, while it's more difficult to do, and the scope of influence is reduced,
> there are still some games that can be played to mess with boot via
> manipulation of the initial CPUID table values, so long as they are within
> the constraints set by the CPUID enforcement policy defined in the PPR.
>
> Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F,
> EAX, are not enforced by PSP. The only thing enforced there is that the
> hypervisor cannot advertise bits that aren't supported by hardware. So
> no matter how much the boot stack is trusted, the CPUID table does not
> inherit that trust, and even values that we *know* should be true should be
> verified rather than assumed.
>
> But I think there are a couple approaches for verifying this is an SNP
> guest that are robust against this sort of scenario. You've touched on
> some of them in your other replies, so I'll respond there.
Yah, I guess the kernel can do good enough verification and then the
full thing needs to be done by the guest owner and in *some* userspace
- not necessarily on the currently booted, unverified guest - but
somewhere, where you have maximal flexibility.
IMHO.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Oct 25, 2021 at 11:35:18AM -0500, Michael Roth wrote:
> As counter-intuitive as it sounds, it actually doesn't buy us anything if the CPUID
> table is part of the PSP attestation report, since:
Thanks for taking the time to explain in detail - I think I know now
what's going on, and David explained some additional stuff to me
yesterday.
So, to cut to the chase:
- yeah, ok, I guess guest owner attestation is what should happen.
- as to the boot detection, I think you should do this in sme_enable(), in
pseudo:
bool snp_guest_detected;
if (CPUID page address) {
read SEV_STATUS;
snp_guest_detected = SEV_STATUS & MSR_AMD64_SEV_SNP_ENABLED;
}
/* old SME/SEV detection path */
read 0x8000_001F_EAX and look at bits SME and SEV, yadda yadda.
if (snp_guest_detected && (!SME || !SEV))
/*
* HV is lying to me, do something there, dunno what. I guess we can
* continue booting unencrypted so that the guest owner knows that
* detection has failed and maybe the HV didn't want us to force SNP.
* This way, attestation will fail and the user will know why.
* Or something like that.
*/
/* normal feature detection continues. */
How does that sound?
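That pseudocode could be rendered roughly as follows. A sketch only, with stand-in accessor parameters instead of the real early-boot MSR/CPUID primitives, and macro values assumed from the Linux/APM definitions (MSR_AMD64_SEV_SNP_ENABLED as bit 2 of SEV_STATUS, SME/SEV as bits 0/1 of CPUID 0x8000001F EAX):

```c
#include <stdint.h>
#include <stdbool.h>

#define MSR_AMD64_SEV_SNP_ENABLED (1u << 2)
#define EAX_SME_BIT               (1u << 0)
#define EAX_SEV_BIT               (1u << 1)

/*
 * Sketch of the proposed detection flow: only consult SEV_STATUS when a
 * CPUID page exists, then cross-check against 0x8000001F EAX. Returns
 * false when SNP was detected but the hypervisor's CPUID contradicts it
 * (i.e. the HV is lying), so the caller can react, e.g. by continuing
 * unencrypted so that attestation visibly fails.
 */
static bool sev_detection_consistent(uint64_t cpuid_page_addr,
				     uint64_t sev_status,
				     uint32_t eax_8000001f)
{
	bool snp_guest_detected = false;

	if (cpuid_page_addr)
		snp_guest_detected = sev_status & MSR_AMD64_SEV_SNP_ENABLED;

	if (snp_guest_detected &&
	    (!(eax_8000001f & EAX_SME_BIT) || !(eax_8000001f & EAX_SEV_BIT)))
		return false;	/* SNP detected but SME/SEV bits missing */

	return true;
}
```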
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 27, 2021 at 01:17:11PM +0200, Borislav Petkov wrote:
> On Mon, Oct 25, 2021 at 11:35:18AM -0500, Michael Roth wrote:
> > As counter-intuitive as it sounds, it actually doesn't buy us anything if the CPUID
> > table is part of the PSP attestation report, since:
>
> Thanks for taking the time to explain in detail - I think I know now
> what's going on, and David explained some additional stuff to me
> yesterday.
>
> So, to cut to the chase:
>
> - yeah, ok, I guess guest owner attestation is what should happen.
>
> - as to the boot detection, I think you should do in sme_enable(), in
> pseudo:
>
> bool snp_guest_detected;
>
> if (CPUID page address) {
> read SEV_STATUS;
>
> snp_guest_detected = SEV_STATUS & MSR_AMD64_SEV_SNP_ENABLED;
> }
>
> /* old SME/SEV detection path */
> read 0x8000_001F_EAX and look at bits SME and SEV, yadda yadda.
>
> if (snp_guest_detected && (!SME || !SEV))
> /*
> * HV is lying to me, do something there, dunno what. I guess we can
> * continue booting unencrypted so that the guest owner knows that
> * detection has failed and maybe the HV didn't want us to force SNP.
> * This way, attestation will fail and the user will know why.
> * Or something like that.
> */
>
>
> /* normal feature detection continues. */
>
> How does that sound?
That seems promising. I've been testing a similar approach in conjunction with
moving sme_enable() to after the initial #VC handler is set up and things seem
to work out pretty nicely.
boot/compressed is a little less straightforward since the sme_enable()
equivalent is set_sev_encryption_mask(), which sets sev_status and is written
in assembly, whereas the SNP-specific bits we're adding rely on C code that
handles stuff like scanning the EFI config table, so it's probably worthwhile
to see if everything can be redone in C. But then there's
get_sev_encryption_bit(), which needs to stay in assembly since it's also
called from the 32-bit entry path. It doesn't actually rely on anything set
by set_sev_encryption_mask() though, so it seems like it should be okay to
split set_sev_encryption_mask() out into a separate C routine.
Will work on implementing/testing that approach, but if you or Joerg are
aware of any showstoppers there just let me know.
Thanks!
-Mike
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
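Boris's sme_enable() pseudocode above can be sketched as a small, self-contained C helper. This is only an illustration of the proposed consistency check, not kernel code: the names and bit layout below are made up for the example, and the real code would read SEV_STATUS via rdmsr and locate the CPUID page through the CC blob.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag layout loosely mirroring the MSR_AMD64_SEV bits. */
#define SEV_ENABLED     (1ULL << 0)
#define SEV_ES_ENABLED  (1ULL << 1)
#define SEV_SNP_ENABLED (1ULL << 2)

/*
 * Model of the check proposed for sme_enable(): SNP is detected early
 * via the CPUID page, and if the legacy CPUID 0x8000001F path then
 * claims SME or SEV is absent, the hypervisor is lying and the guest
 * should fall back to booting unencrypted so attestation fails visibly.
 */
static bool hv_is_lying(uint64_t sev_status, bool cpuid_page_present,
                        bool sme_bit, bool sev_bit)
{
    bool snp_guest_detected = false;

    if (cpuid_page_present)
        snp_guest_detected = sev_status & SEV_SNP_ENABLED;

    return snp_guest_detected && (!sme_bit || !sev_bit);
}
```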
Hi Peter,
Somehow this email was filtered out as spam and never reached my
inbox. Sorry for the delay in the response.
On 10/20/21 4:33 PM, Peter Gonda wrote:
> On Fri, Oct 8, 2021 at 12:06 PM Brijesh Singh <[email protected]> wrote:
>>
>> The SEV-SNP specification provides the guest a mechanism to communicate with
>> the PSP without risk from a malicious hypervisor who wishes to read, alter,
>> drop or replay the messages sent. The driver uses snp_issue_guest_request()
>> to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
>> submit the request to the PSP.
>>
>> The PSP requires that all communication should be encrypted using the key
>> specified through the platform_data.
>>
>> Userspace can use the SNP_GET_REPORT ioctl() to query the guest
>> attestation report.
>>
>> See SEV-SNP spec section Guest Messages for more details.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> Documentation/virt/coco/sevguest.rst | 77 ++++
>> drivers/virt/Kconfig | 3 +
>> drivers/virt/Makefile | 1 +
>> drivers/virt/coco/sevguest/Kconfig | 9 +
>> drivers/virt/coco/sevguest/Makefile | 2 +
>> drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
>> drivers/virt/coco/sevguest/sevguest.h | 98 +++++
>> include/uapi/linux/sev-guest.h | 44 ++
>> 8 files changed, 795 insertions(+)
>> create mode 100644 Documentation/virt/coco/sevguest.rst
>> create mode 100644 drivers/virt/coco/sevguest/Kconfig
>> create mode 100644 drivers/virt/coco/sevguest/Makefile
>> create mode 100644 drivers/virt/coco/sevguest/sevguest.c
>> create mode 100644 drivers/virt/coco/sevguest/sevguest.h
>> create mode 100644 include/uapi/linux/sev-guest.h
>>
>> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
>> new file mode 100644
>> index 000000000000..002c90946b8a
>> --- /dev/null
>> +++ b/Documentation/virt/coco/sevguest.rst
>> @@ -0,0 +1,77 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +===================================================================
>> +The Definitive SEV Guest API Documentation
>> +===================================================================
>> +
>> +1. General description
>> +======================
>> +
>> +The SEV API is a set of ioctls that are used by the guest or hypervisor
>> +to get or set certain aspects of the SEV virtual machine. The ioctls belong
>> +to the following classes:
>> +
>> + - Hypervisor ioctls: These query and set global attributes which affect the
>> + whole SEV firmware. These ioctls are used by platform provisioning tools.
>> +
>> + - Guest ioctls: These query and set attributes of the SEV virtual machine.
>> +
>> +2. API description
>> +==================
>> +
>> +This section describes ioctls that can be used to query or set SEV guests.
>> +For each ioctl, the following information is provided along with a
>> +description:
>> +
>> + Technology:
>> + which SEV technology provides this ioctl. sev, sev-es, sev-snp or all.
>> +
>> + Type:
>> + hypervisor or guest. The ioctl can be used inside the guest or the
>> + hypervisor.
>> +
>> + Parameters:
>> + what parameters are accepted by the ioctl.
>> +
>> + Returns:
>> + the return value. General error numbers (ENOMEM, EINVAL)
>> + are not detailed, but errors with specific meanings are.
>> +
>> +The guest ioctl should be issued on a file descriptor of the /dev/sev-guest device.
>> +The ioctl accepts struct snp_guest_request_ioctl. The input and output structures are
>> +specified through the req_data and resp_data fields respectively. If the ioctl fails
>> +to execute due to a firmware error, then the fw_err field will be set.
>> +
>> +::
>> +
>> + struct snp_guest_request_ioctl {
>> + /* Request and response structure address */
>> + __u64 req_data;
>> + __u64 resp_data;
>> +
>> + /* firmware error code on failure (see psp-sev.h) */
>> + __u64 fw_err;
>> + };
>> +
>> +2.1 SNP_GET_REPORT
>> +------------------
>> +
>> +:Technology: sev-snp
>> +:Type: guest ioctl
>> +:Parameters (in): struct snp_report_req
>> +:Returns (out): struct snp_report_resp on success, -negative on error
>> +
>> +The SNP_GET_REPORT ioctl can be used to query the attestation report from the
>> +SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
>> +provided by the SEV-SNP firmware to query the attestation report.
>> +
>> +On success, snp_report_resp.data will contain the report in the format
>> +described in the SEV-SNP specification. See the specification for further
>> +details.
>> +
>> +
>> +Reference
>> +---------
>> +
>> +SEV-SNP and GHCB specification: developer.amd.com/sev
>> +
>> +The driver is based on SEV-SNP firmware spec 0.9 and GHCB spec version 2.0.
>> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
>> index 8061e8ef449f..e457e47610d3 100644
>> --- a/drivers/virt/Kconfig
>> +++ b/drivers/virt/Kconfig
>> @@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
>> source "drivers/virt/nitro_enclaves/Kconfig"
>>
>> source "drivers/virt/acrn/Kconfig"
>> +
>> +source "drivers/virt/coco/sevguest/Kconfig"
>> +
>> endif
>> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
>> index 3e272ea60cd9..9c704a6fdcda 100644
>> --- a/drivers/virt/Makefile
>> +++ b/drivers/virt/Makefile
>> @@ -8,3 +8,4 @@ obj-y += vboxguest/
>>
>> obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
>> obj-$(CONFIG_ACRN_HSM) += acrn/
>> +obj-$(CONFIG_SEV_GUEST) += coco/sevguest/
>> diff --git a/drivers/virt/coco/sevguest/Kconfig b/drivers/virt/coco/sevguest/Kconfig
>> new file mode 100644
>> index 000000000000..96190919cca8
>> --- /dev/null
>> +++ b/drivers/virt/coco/sevguest/Kconfig
>> @@ -0,0 +1,9 @@
>> +config SEV_GUEST
>> + tristate "AMD SEV Guest driver"
>> + default y
>> + depends on AMD_MEM_ENCRYPT && CRYPTO_AEAD2
>> + help
>> + The driver can be used by the SEV-SNP guest to communicate with the PSP to
>> + request the attestation report and more.
>> +
>> + If you choose 'M' here, this module will be called sevguest.
>> diff --git a/drivers/virt/coco/sevguest/Makefile b/drivers/virt/coco/sevguest/Makefile
>> new file mode 100644
>> index 000000000000..b1ffb2b4177b
>> --- /dev/null
>> +++ b/drivers/virt/coco/sevguest/Makefile
>> @@ -0,0 +1,2 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +obj-$(CONFIG_SEV_GUEST) += sevguest.o
>> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
>> new file mode 100644
>> index 000000000000..2d313fb2ffae
>> --- /dev/null
>> +++ b/drivers/virt/coco/sevguest/sevguest.c
>> @@ -0,0 +1,561 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * AMD Secure Encrypted Virtualization - Secure Nested Paging (SEV-SNP) guest request interface
>> + *
>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Brijesh Singh <[email protected]>
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/kernel.h>
>> +#include <linux/types.h>
>> +#include <linux/mutex.h>
>> +#include <linux/io.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/miscdevice.h>
>> +#include <linux/set_memory.h>
>> +#include <linux/fs.h>
>> +#include <crypto/aead.h>
>> +#include <linux/scatterlist.h>
>> +#include <linux/psp-sev.h>
>> +#include <uapi/linux/sev-guest.h>
>> +#include <uapi/linux/psp-sev.h>
>> +
>> +#include <asm/svm.h>
>> +#include <asm/sev.h>
>> +
>> +#include "sevguest.h"
>> +
>> +#define DEVICE_NAME "sev-guest"
>> +#define AAD_LEN 48
>> +#define MSG_HDR_VER 1
>> +
>> +struct snp_guest_crypto {
>> + struct crypto_aead *tfm;
>> + u8 *iv, *authtag;
>> + int iv_len, a_len;
>> +};
>> +
>> +struct snp_guest_dev {
>> + struct device *dev;
>> + struct miscdevice misc;
>> +
>> + struct snp_guest_crypto *crypto;
>> + struct snp_guest_msg *request, *response;
>> + struct snp_secrets_page_layout *layout;
>> + struct snp_req_data input;
>> + u32 *os_area_msg_seqno;
>> +};
>> +
>> +static u32 vmpck_id;
>> +module_param(vmpck_id, uint, 0444);
>> +MODULE_PARM_DESC(vmpck_id, "The VMPCK ID to use when communicating with the PSP.");
>> +
>> +static DEFINE_MUTEX(snp_cmd_mutex);
>> +
>> +static inline u64 __snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
>> +{
>> + u64 count;
>> +
>> + /* Read the current message sequence counter from the secrets page */
>> + count = *snp_dev->os_area_msg_seqno;
>> +
>> + return count + 1;
>> +}
>> +
>> +/* Return a non-zero sequence number on success */
>> +static u64 snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
>> +{
>> + u64 count = __snp_get_msg_seqno(snp_dev);
>> +
>> + /*
>> + * The message sequence counter for the SNP guest request is a 64-bit
>> + * value but version 2 of the GHCB specification defines only 32-bit
>> + * storage for it. If the counter exceeds the 32-bit range then return
>> + * zero. The caller should check the return value, but if the caller
>> + * happens to not check it and uses it anyway, the firmware treats zero
>> + * as an invalid sequence number and will fail the message request.
>> + */
>> + if (count >= UINT_MAX) {
>> + pr_err_ratelimited("SNP guest request message sequence counter overflow\n");
>> + return 0;
>> + }
>> +
>> + return count;
>> +}
>> +
>> +static void snp_inc_msg_seqno(struct snp_guest_dev *snp_dev)
>> +{
>> + /*
>> + * The counter is also incremented by the PSP, so increment it by 2
>> + * and save in secrets page.
>> + */
>> + *snp_dev->os_area_msg_seqno += 2;
>> +}
>> +
>> +static inline struct snp_guest_dev *to_snp_dev(struct file *file)
>> +{
>> + struct miscdevice *dev = file->private_data;
>> +
>> + return container_of(dev, struct snp_guest_dev, misc);
>> +}
>> +
>> +static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, u8 *key, size_t keylen)
>> +{
>> + struct snp_guest_crypto *crypto;
>> +
>> + crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
>> + if (!crypto)
>> + return NULL;
>> +
>> + crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
>> + if (IS_ERR(crypto->tfm))
>> + goto e_free;
>> +
>> + if (crypto_aead_setkey(crypto->tfm, key, keylen))
>> + goto e_free_crypto;
>> +
>> + crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
>> + if (crypto->iv_len < 12) {
>> + dev_err(snp_dev->dev, "IV length is less than 12.\n");
>> + goto e_free_crypto;
>> + }
>> +
>> + crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
>> + if (!crypto->iv)
>> + goto e_free_crypto;
>> +
>> + if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
>> + if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
>> + dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
>> + goto e_free_crypto;
>> + }
>> + }
>> +
>> + crypto->a_len = crypto_aead_authsize(crypto->tfm);
>> + crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
>> + if (!crypto->authtag)
>> + goto e_free_crypto;
>> +
>> + return crypto;
>> +
>> +e_free_crypto:
>> + crypto_free_aead(crypto->tfm);
>> +e_free:
>> + kfree(crypto->iv);
>> + kfree(crypto->authtag);
>> + kfree(crypto);
>> +
>> + return NULL;
>> +}
>> +
>> +static void deinit_crypto(struct snp_guest_crypto *crypto)
>> +{
>> + crypto_free_aead(crypto->tfm);
>> + kfree(crypto->iv);
>> + kfree(crypto->authtag);
>> + kfree(crypto);
>> +}
>> +
>> +static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
>> + u8 *src_buf, u8 *dst_buf, size_t len, bool enc)
>> +{
>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
>> + struct scatterlist src[3], dst[3];
>> + DECLARE_CRYPTO_WAIT(wait);
>> + struct aead_request *req;
>> + int ret;
>> +
>> + req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
>> + if (!req)
>> + return -ENOMEM;
>> +
>> + /*
>> + * AEAD memory operations:
>> + * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
>> + * | msg header | plaintext | hdr->authtag |
>> + * | bytes 30h - 5Fh | or | |
>> + * | | cipher | |
>> + * +------------------+------------------+----------------+
>> + */
>> + sg_init_table(src, 3);
>> + sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
>> + sg_set_buf(&src[1], src_buf, hdr->msg_sz);
>> + sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
>> +
>> + sg_init_table(dst, 3);
>> + sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
>> + sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
>> + sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
>> +
>> + aead_request_set_ad(req, AAD_LEN);
>> + aead_request_set_tfm(req, crypto->tfm);
>> + aead_request_set_callback(req, 0, crypto_req_done, &wait);
>> +
>> + aead_request_set_crypt(req, src, dst, len, crypto->iv);
>> + ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
>> +
>> + aead_request_free(req);
>> + return ret;
>> +}
>> +
>> +static int __enc_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
>> + void *plaintext, size_t len)
>> +{
>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
>> +
>> + memset(crypto->iv, 0, crypto->iv_len);
>> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
>> +
>> + return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
>> +}
>> +
>> +static int dec_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
>> + void *plaintext, size_t len)
>> +{
>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
>> +
>> + /* Build IV with response buffer sequence number */
>> + memset(crypto->iv, 0, crypto->iv_len);
>> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
>> +
>> + return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
>> +}
>> +
>> +static int verify_and_dec_payload(struct snp_guest_dev *snp_dev, void *payload, u32 sz)
>> +{
>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
>> + struct snp_guest_msg *resp = snp_dev->response;
>> + struct snp_guest_msg *req = snp_dev->request;
>> + struct snp_guest_msg_hdr *req_hdr = &req->hdr;
>> + struct snp_guest_msg_hdr *resp_hdr = &resp->hdr;
>> +
>> + dev_dbg(snp_dev->dev, "response [seqno %lld type %d version %d sz %d]\n",
>> + resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
>> +
>> + /* Verify that the sequence counter is incremented by 1 */
>> + if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
>> + return -EBADMSG;
>> +
>> + /* Verify response message type and version number. */
>> + if (resp_hdr->msg_type != (req_hdr->msg_type + 1) ||
>> + resp_hdr->msg_version != req_hdr->msg_version)
>> + return -EBADMSG;
>> +
>> + /*
>> + * If the message size is greater than our buffer length then return
>> + * an error.
>> + */
>> + if (unlikely((resp_hdr->msg_sz + crypto->a_len) > sz))
>> + return -EBADMSG;
>> +
>> + return dec_payload(snp_dev, resp, payload, resp_hdr->msg_sz + crypto->a_len);
>> +}
>> +
>> +static int enc_payload(struct snp_guest_dev *snp_dev, u64 seqno, int version, u8 type,
>> + void *payload, size_t sz)
>> +{
>> + struct snp_guest_msg *req = snp_dev->request;
>> + struct snp_guest_msg_hdr *hdr = &req->hdr;
>> +
>> + memset(req, 0, sizeof(*req));
>> +
>> + hdr->algo = SNP_AEAD_AES_256_GCM;
>> + hdr->hdr_version = MSG_HDR_VER;
>> + hdr->hdr_sz = sizeof(*hdr);
>> + hdr->msg_type = type;
>> + hdr->msg_version = version;
>> + hdr->msg_seqno = seqno;
>> + hdr->msg_vmpck = vmpck_id;
>> + hdr->msg_sz = sz;
>> +
>> + /* Verify the sequence number is non-zero */
>> + if (!hdr->msg_seqno)
>> + return -ENOSR;
>> +
>> + dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
>> + hdr->msg_seqno, hdr->msg_type, hdr->msg_version, hdr->msg_sz);
>> +
>> + return __enc_payload(snp_dev, req, payload, sz);
>> +}
>> +
>> +static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, int msg_ver,
>> + u8 type, void *req_buf, size_t req_sz, void *resp_buf,
>> + u32 resp_sz, __u64 *fw_err)
>> +{
>> + unsigned long err;
>> + u64 seqno;
>> + int rc;
>> +
>> + /* Get the message sequence number and verify that it's non-zero */
>> + seqno = snp_get_msg_seqno(snp_dev);
>> + if (!seqno)
>> + return -EIO;
>> +
>> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
>> +
>> + /* Encrypt the userspace provided payload */
>> + rc = enc_payload(snp_dev, seqno, msg_ver, type, req_buf, req_sz);
>> + if (rc)
>> + return rc;
>> +
>> + /* Call firmware to process the request */
>> + rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
>> + if (fw_err)
>> + *fw_err = err;
>> +
>> + if (rc)
>> + return rc;
>> +
>> + rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
>> + if (rc)
>> + return rc;
>> +
>> + /* Increment to new message sequence after the command is successful. */
>> + snp_inc_msg_seqno(snp_dev);
>
> Thanks for updating this sequence number logic. But I still have some
> concerns. In verify_and_dec_payload() we check the encryption header
> but all these fields are accessible to the hypervisor, meaning it can
> change the header and cause this sequence number to not get
> incremented. We then will reuse the sequence number for the next
> command, which isn't great for AES GCM. It seems very hard to tell if
> the FW actually got our request and created a response there by
> incrementing the sequence number by 2, or if the hypervisor is acting
> in bad faith. It seems like to be safe we need to completely stop
> using this vmpck if we cannot confirm the PSP has gotten our request
> and created a response. Thoughts?
>
Very good point, I think we can detect this condition by rearranging the
checks. verify_and_dec_payload() is called only after the command is
successful and does the following:

1) Verifies the header
2) Decrypts the payload
3) Later we increment the sequence number

If we rearrange to the below order then we can avoid this condition:

1) Decrypt the payload
2) Increment the sequence number
3) Verify the header

The decryption will succeed only if the PSP constructed the payload.
Does this make sense?
thanks
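The sequence-counter rules this patch implements (a 64-bit protocol counter squeezed into the GHCB's 32-bit storage, and an increment of 2 because the PSP consumes one value for its response) can be modeled in a few lines of plain C. This is a simplified, self-contained sketch with the secrets-page field stood in by a global variable, not the driver code itself:

```c
#include <stdint.h>
#include <limits.h>

/* Stand-in for the u32 message sequence counter kept in the SNP
 * secrets page (os_area_msg_seqno in the driver). */
static uint32_t os_area_msg_seqno;

/* Next sequence number to use: stored value + 1. Version 2 of the
 * GHCB spec provides only 32 bits of storage, so a value that no
 * longer fits is reported as 0, which the firmware rejects as an
 * invalid sequence number. */
static uint64_t snp_get_msg_seqno(void)
{
    uint64_t count = (uint64_t)os_area_msg_seqno + 1;

    if (count >= UINT_MAX)
        return 0;

    return count;
}

/* The PSP bumps the counter once for its response message, so after
 * a successful request/response round trip the guest advances the
 * saved value by 2. */
static void snp_inc_msg_seqno(void)
{
    os_area_msg_seqno += 2;
}
```

Starting from a fresh counter, the guest uses 1, the PSP responds with 2, and the next guest message uses 3, matching the +2 increment above.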
On Wed, Oct 27, 2021 at 10:08 AM Brijesh Singh <[email protected]> wrote:
>
> Hi Peter,
>
> Somehow this email was filtered out as spam and never reached to my
> inbox. Sorry for the delay in the response.
>
> On 10/20/21 4:33 PM, Peter Gonda wrote:
> > On Fri, Oct 8, 2021 at 12:06 PM Brijesh Singh <[email protected]> wrote:
> >>
> >> SEV-SNP specification provides the guest a mechanisum to communicate with
> >> the PSP without risk from a malicious hypervisor who wishes to read, alter,
> >> drop or replay the messages sent. The driver uses snp_issue_guest_request()
> >> to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
> >> submit the request to PSP.
> >>
> >> The PSP requires that all communication should be encrypted using key
> >> specified through the platform_data.
> >>
> >> The userspace can use SNP_GET_REPORT ioctl() to query the guest
> >> attestation report.
> >>
> >> See SEV-SNP spec section Guest Messages for more details.
> >>
> >> Signed-off-by: Brijesh Singh <[email protected]>
> >> ---
> >> Documentation/virt/coco/sevguest.rst | 77 ++++
> >> drivers/virt/Kconfig | 3 +
> >> drivers/virt/Makefile | 1 +
> >> drivers/virt/coco/sevguest/Kconfig | 9 +
> >> drivers/virt/coco/sevguest/Makefile | 2 +
> >> drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
> >> drivers/virt/coco/sevguest/sevguest.h | 98 +++++
> >> include/uapi/linux/sev-guest.h | 44 ++
> >> 8 files changed, 795 insertions(+)
> >> create mode 100644 Documentation/virt/coco/sevguest.rst
> >> create mode 100644 drivers/virt/coco/sevguest/Kconfig
> >> create mode 100644 drivers/virt/coco/sevguest/Makefile
> >> create mode 100644 drivers/virt/coco/sevguest/sevguest.c
> >> create mode 100644 drivers/virt/coco/sevguest/sevguest.h
> >> create mode 100644 include/uapi/linux/sev-guest.h
> >>
> >> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> >> new file mode 100644
> >> index 000000000000..002c90946b8a
> >> --- /dev/null
> >> +++ b/Documentation/virt/coco/sevguest.rst
> >> @@ -0,0 +1,77 @@
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +
> >> +===================================================================
> >> +The Definitive SEV Guest API Documentation
> >> +===================================================================
> >> +
> >> +1. General description
> >> +======================
> >> +
> >> +The SEV API is a set of ioctls that are used by the guest or hypervisor
> >> +to get or set certain aspect of the SEV virtual machine. The ioctls belong
> >> +to the following classes:
> >> +
> >> + - Hypervisor ioctls: These query and set global attributes which affect the
> >> + whole SEV firmware. These ioctl are used by platform provision tools.
> >> +
> >> + - Guest ioctls: These query and set attributes of the SEV virtual machine.
> >> +
> >> +2. API description
> >> +==================
> >> +
> >> +This section describes ioctls that can be used to query or set SEV guests.
> >> +For each ioctl, the following information is provided along with a
> >> +description:
> >> +
> >> + Technology:
> >> + which SEV techology provides this ioctl. sev, sev-es, sev-snp or all.
> >> +
> >> + Type:
> >> + hypervisor or guest. The ioctl can be used inside the guest or the
> >> + hypervisor.
> >> +
> >> + Parameters:
> >> + what parameters are accepted by the ioctl.
> >> +
> >> + Returns:
> >> + the return value. General error numbers (ENOMEM, EINVAL)
> >> + are not detailed, but errors with specific meanings are.
> >> +
> >> +The guest ioctl should be issued on a file descriptor of the /dev/sev-guest device.
> >> +The ioctl accepts struct snp_user_guest_request. The input and output structure is
> >> +specified through the req_data and resp_data field respectively. If the ioctl fails
> >> +to execute due to a firmware error, then fw_err code will be set.
> >> +
> >> +::
> >> + struct snp_guest_request_ioctl {
> >> + /* Request and response structure address */
> >> + __u64 req_data;
> >> + __u64 resp_data;
> >> +
> >> + /* firmware error code on failure (see psp-sev.h) */
> >> + __u64 fw_err;
> >> + };
> >> +
> >> +2.1 SNP_GET_REPORT
> >> +------------------
> >> +
> >> +:Technology: sev-snp
> >> +:Type: guest ioctl
> >> +:Parameters (in): struct snp_report_req
> >> +:Returns (out): struct snp_report_resp on success, -negative on error
> >> +
> >> +The SNP_GET_REPORT ioctl can be used to query the attestation report from the
> >> +SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
> >> +provided by the SEV-SNP firmware to query the attestation report.
> >> +
> >> +On success, the snp_report_resp.data will contains the report. The report
> >> +will contain the format described in the SEV-SNP specification. See the SEV-SNP
> >> +specification for further details.
> >> +
> >> +
> >> +Reference
> >> +---------
> >> +
> >> +SEV-SNP and GHCB specification: developer.amd.com/sev
> >> +
> >> +The driver is based on SEV-SNP firmware spec 0.9 and GHCB spec version 2.0.
> >> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> >> index 8061e8ef449f..e457e47610d3 100644
> >> --- a/drivers/virt/Kconfig
> >> +++ b/drivers/virt/Kconfig
> >> @@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
> >> source "drivers/virt/nitro_enclaves/Kconfig"
> >>
> >> source "drivers/virt/acrn/Kconfig"
> >> +
> >> +source "drivers/virt/coco/sevguest/Kconfig"
> >> +
> >> endif
> >> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> >> index 3e272ea60cd9..9c704a6fdcda 100644
> >> --- a/drivers/virt/Makefile
> >> +++ b/drivers/virt/Makefile
> >> @@ -8,3 +8,4 @@ obj-y += vboxguest/
> >>
> >> obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
> >> obj-$(CONFIG_ACRN_HSM) += acrn/
> >> +obj-$(CONFIG_SEV_GUEST) += coco/sevguest/
> >> diff --git a/drivers/virt/coco/sevguest/Kconfig b/drivers/virt/coco/sevguest/Kconfig
> >> new file mode 100644
> >> index 000000000000..96190919cca8
> >> --- /dev/null
> >> +++ b/drivers/virt/coco/sevguest/Kconfig
> >> @@ -0,0 +1,9 @@
> >> +config SEV_GUEST
> >> + tristate "AMD SEV Guest driver"
> >> + default y
> >> + depends on AMD_MEM_ENCRYPT && CRYPTO_AEAD2
> >> + help
> >> + The driver can be used by the SEV-SNP guest to communicate with the PSP to
> >> + request the attestation report and more.
> >> +
> >> + If you choose 'M' here, this module will be called sevguest.
> >> diff --git a/drivers/virt/coco/sevguest/Makefile b/drivers/virt/coco/sevguest/Makefile
> >> new file mode 100644
> >> index 000000000000..b1ffb2b4177b
> >> --- /dev/null
> >> +++ b/drivers/virt/coco/sevguest/Makefile
> >> @@ -0,0 +1,2 @@
> >> +# SPDX-License-Identifier: GPL-2.0-only
> >> +obj-$(CONFIG_SEV_GUEST) += sevguest.o
> >> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
> >> new file mode 100644
> >> index 000000000000..2d313fb2ffae
> >> --- /dev/null
> >> +++ b/drivers/virt/coco/sevguest/sevguest.c
> >> @@ -0,0 +1,561 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * AMD Secure Encrypted Virtualization Nested Paging (SEV-SNP) guest request interface
> >> + *
> >> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> >> + *
> >> + * Author: Brijesh Singh <[email protected]>
> >> + */
> >> +
> >> +#include <linux/module.h>
> >> +#include <linux/kernel.h>
> >> +#include <linux/types.h>
> >> +#include <linux/mutex.h>
> >> +#include <linux/io.h>
> >> +#include <linux/platform_device.h>
> >> +#include <linux/miscdevice.h>
> >> +#include <linux/set_memory.h>
> >> +#include <linux/fs.h>
> >> +#include <crypto/aead.h>
> >> +#include <linux/scatterlist.h>
> >> +#include <linux/psp-sev.h>
> >> +#include <uapi/linux/sev-guest.h>
> >> +#include <uapi/linux/psp-sev.h>
> >> +
> >> +#include <asm/svm.h>
> >> +#include <asm/sev.h>
> >> +
> >> +#include "sevguest.h"
> >> +
> >> +#define DEVICE_NAME "sev-guest"
> >> +#define AAD_LEN 48
> >> +#define MSG_HDR_VER 1
> >> +
> >> +struct snp_guest_crypto {
> >> + struct crypto_aead *tfm;
> >> + u8 *iv, *authtag;
> >> + int iv_len, a_len;
> >> +};
> >> +
> >> +struct snp_guest_dev {
> >> + struct device *dev;
> >> + struct miscdevice misc;
> >> +
> >> + struct snp_guest_crypto *crypto;
> >> + struct snp_guest_msg *request, *response;
> >> + struct snp_secrets_page_layout *layout;
> >> + struct snp_req_data input;
> >> + u32 *os_area_msg_seqno;
> >> +};
> >> +
> >> +static u32 vmpck_id;
> >> +module_param(vmpck_id, uint, 0444);
> >> +MODULE_PARM_DESC(vmpck_id, "The VMPCK ID to use when communicating with the PSP.");
> >> +
> >> +static DEFINE_MUTEX(snp_cmd_mutex);
> >> +
> >> +static inline u64 __snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
> >> +{
> >> + u64 count;
> >> +
> >> + /* Read the current message sequence counter from secrets pages */
> >> + count = *snp_dev->os_area_msg_seqno;
> >> +
> >> + return count + 1;
> >> +}
> >> +
> >> +/* Return a non-zero on success */
> >> +static u64 snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
> >> +{
> >> + u64 count = __snp_get_msg_seqno(snp_dev);
> >> +
> >> + /*
> >> + * The message sequence counter for the SNP guest request is a 64-bit
> >> + * value but the version 2 of GHCB specification defines a 32-bit storage
> >> + * for the it. If the counter exceeds the 32-bit value then return zero.
> >> + * The caller should check the return value, but if the caller happen to
> >> + * not check the value and use it, then the firmware treats zero as an
> >> + * invalid number and will fail the message request.
> >> + */
> >> + if (count >= UINT_MAX) {
> >> + pr_err_ratelimited("SNP guest request message sequence counter overflow\n");
> >> + return 0;
> >> + }
> >> +
> >> + return count;
> >> +}
> >> +
> >> +static void snp_inc_msg_seqno(struct snp_guest_dev *snp_dev)
> >> +{
> >> + /*
> >> + * The counter is also incremented by the PSP, so increment it by 2
> >> + * and save in secrets page.
> >> + */
> >> + *snp_dev->os_area_msg_seqno += 2;
> >> +}
> >> +
> >> +static inline struct snp_guest_dev *to_snp_dev(struct file *file)
> >> +{
> >> + struct miscdevice *dev = file->private_data;
> >> +
> >> + return container_of(dev, struct snp_guest_dev, misc);
> >> +}
> >> +
> >> +static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, u8 *key, size_t keylen)
> >> +{
> >> + struct snp_guest_crypto *crypto;
> >> +
> >> + crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
> >> + if (!crypto)
> >> + return NULL;
> >> +
> >> + crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
> >> + if (IS_ERR(crypto->tfm))
> >> + goto e_free;
> >> +
> >> + if (crypto_aead_setkey(crypto->tfm, key, keylen))
> >> + goto e_free_crypto;
> >> +
> >> + crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
> >> + if (crypto->iv_len < 12) {
> >> + dev_err(snp_dev->dev, "IV length is less than 12.\n");
> >> + goto e_free_crypto;
> >> + }
> >> +
> >> + crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
> >> + if (!crypto->iv)
> >> + goto e_free_crypto;
> >> +
> >> + if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
> >> + if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
> >> + dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
> >> + goto e_free_crypto;
> >> + }
> >> + }
> >> +
> >> + crypto->a_len = crypto_aead_authsize(crypto->tfm);
> >> + crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
> >> + if (!crypto->authtag)
> >> + goto e_free_crypto;
> >> +
> >> + return crypto;
> >> +
> >> +e_free_crypto:
> >> + crypto_free_aead(crypto->tfm);
> >> +e_free:
> >> + kfree(crypto->iv);
> >> + kfree(crypto->authtag);
> >> + kfree(crypto);
> >> +
> >> + return NULL;
> >> +}
> >> +
> >> +static void deinit_crypto(struct snp_guest_crypto *crypto)
> >> +{
> >> + crypto_free_aead(crypto->tfm);
> >> + kfree(crypto->iv);
> >> + kfree(crypto->authtag);
> >> + kfree(crypto);
> >> +}
> >> +
> >> +static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
> >> + u8 *src_buf, u8 *dst_buf, size_t len, bool enc)
> >> +{
> >> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> >> + struct scatterlist src[3], dst[3];
> >> + DECLARE_CRYPTO_WAIT(wait);
> >> + struct aead_request *req;
> >> + int ret;
> >> +
> >> + req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
> >> + if (!req)
> >> + return -ENOMEM;
> >> +
> >> + /*
> >> + * AEAD memory operations:
> >> + * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
> >> + * | msg header | plaintext | hdr->authtag |
> >> + * | bytes 30h - 5Fh | or | |
> >> + * | | cipher | |
> >> + * +------------------+------------------+----------------+
> >> + */
> >> + sg_init_table(src, 3);
> >> + sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
> >> + sg_set_buf(&src[1], src_buf, hdr->msg_sz);
> >> + sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
> >> +
> >> + sg_init_table(dst, 3);
> >> + sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
> >> + sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
> >> + sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
> >> +
> >> + aead_request_set_ad(req, AAD_LEN);
> >> + aead_request_set_tfm(req, crypto->tfm);
> >> + aead_request_set_callback(req, 0, crypto_req_done, &wait);
> >> +
> >> + aead_request_set_crypt(req, src, dst, len, crypto->iv);
> >> + ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
> >> +
> >> + aead_request_free(req);
> >> + return ret;
> >> +}
> >> +
> >> +static int __enc_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
> >> + void *plaintext, size_t len)
> >> +{
> >> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> >> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> >> +
> >> + memset(crypto->iv, 0, crypto->iv_len);
> >> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
> >> +
> >> + return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
> >> +}
> >> +
> >> +static int dec_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
> >> + void *plaintext, size_t len)
> >> +{
> >> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> >> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> >> +
> >> + /* Build IV with response buffer sequence number */
> >> + memset(crypto->iv, 0, crypto->iv_len);
> >> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
> >> +
> >> + return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
> >> +}
> >> +
> >> +static int verify_and_dec_payload(struct snp_guest_dev *snp_dev, void *payload, u32 sz)
> >> +{
> >> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> >> + struct snp_guest_msg *resp = snp_dev->response;
> >> + struct snp_guest_msg *req = snp_dev->request;
> >> + struct snp_guest_msg_hdr *req_hdr = &req->hdr;
> >> + struct snp_guest_msg_hdr *resp_hdr = &resp->hdr;
> >> +
> >> + dev_dbg(snp_dev->dev, "response [seqno %lld type %d version %d sz %d]\n",
> >> + resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
> >> +
> >> + /* Verify that the sequence counter is incremented by 1 */
> >> + if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
> >> + return -EBADMSG;
> >> +
> >> + /* Verify response message type and version number. */
> >> + if (resp_hdr->msg_type != (req_hdr->msg_type + 1) ||
> >> + resp_hdr->msg_version != req_hdr->msg_version)
> >> + return -EBADMSG;
> >> +
> >> + /*
> >> + * If the message size is greater than our buffer length then return
> >> + * an error.
> >> + */
> >> + if (unlikely((resp_hdr->msg_sz + crypto->a_len) > sz))
> >> + return -EBADMSG;
> >> +
> >> + return dec_payload(snp_dev, resp, payload, resp_hdr->msg_sz + crypto->a_len);
> >> +}
> >> +
> >> +static int enc_payload(struct snp_guest_dev *snp_dev, u64 seqno, int version, u8 type,
> >> + void *payload, size_t sz)
> >> +{
> >> + struct snp_guest_msg *req = snp_dev->request;
> >> + struct snp_guest_msg_hdr *hdr = &req->hdr;
> >> +
> >> + memset(req, 0, sizeof(*req));
> >> +
> >> + hdr->algo = SNP_AEAD_AES_256_GCM;
> >> + hdr->hdr_version = MSG_HDR_VER;
> >> + hdr->hdr_sz = sizeof(*hdr);
> >> + hdr->msg_type = type;
> >> + hdr->msg_version = version;
> >> + hdr->msg_seqno = seqno;
> >> + hdr->msg_vmpck = vmpck_id;
> >> + hdr->msg_sz = sz;
> >> +
> >> + /* Verify the sequence number is non-zero */
> >> + if (!hdr->msg_seqno)
> >> + return -ENOSR;
> >> +
> >> + dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
> >> + hdr->msg_seqno, hdr->msg_type, hdr->msg_version, hdr->msg_sz);
> >> +
> >> + return __enc_payload(snp_dev, req, payload, sz);
> >> +}
> >> +
> >> +static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, int msg_ver,
> >> + u8 type, void *req_buf, size_t req_sz, void *resp_buf,
> >> + u32 resp_sz, __u64 *fw_err)
> >> +{
> >> + unsigned long err;
> >> + u64 seqno;
> >> + int rc;
> >> +
> >> +	/* Get the message sequence number and verify that it is non-zero */
> >> + seqno = snp_get_msg_seqno(snp_dev);
> >> + if (!seqno)
> >> + return -EIO;
> >> +
> >> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
> >> +
> >> + /* Encrypt the userspace provided payload */
> >> + rc = enc_payload(snp_dev, seqno, msg_ver, type, req_buf, req_sz);
> >> + if (rc)
> >> + return rc;
> >> +
> >> + /* Call firmware to process the request */
> >> + rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
> >> + if (fw_err)
> >> + *fw_err = err;
> >> +
> >> + if (rc)
> >> + return rc;
> >> +
> >> + rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
> >> + if (rc)
> >> + return rc;
> >> +
> >> + /* Increment to new message sequence after the command is successful. */
> >> + snp_inc_msg_seqno(snp_dev);
> >
> > Thanks for updating this sequence number logic. But I still have some
> > concerns. In verify_and_dec_payload() we check the encryption header
> > but all these fields are accessible to the hypervisor, meaning it can
> > change the header and cause this sequence number to not get
> > incremented. We then will reuse the sequence number for the next
> > command, which isn't great for AES GCM. It seems very hard to tell if
> > the FW actually got our request and created a response, thereby
> > incrementing the sequence number by 2, or if the hypervisor is acting
> > in bad faith. It seems like to be safe we need to completely stop
> > using this vmpck if we cannot confirm the PSP has gotten our request
> > and created a response. Thoughts?
> >
>
> Very good point, I think we can detect this condition by rearranging the
> checks. The verify_and_dec_payload() is called only after the command is
> successful and does the following checks:
>
> 1) Verifies the header
> 2) Decrypts the payload
> 3) Later we increment the sequence
>
> If we rearrange to the order below, then we can avoid this condition.
> 1) Decrypt the payload
> 2) Increment the sequence number
> 3) Verify the header
>
> The decryption will succeed only if the PSP constructed the payload.
>
> Does this make sense?
Either ordering seems fine to me. I don't think it changes much, though,
since the header (bytes 30h-5Fh according to the spec) is included in
the authenticated data of the encryption. So any hypervisor modifications
will lead to a decryption failure, right?
In either case, if we do fail the decryption, what are your thoughts on
not allowing further use of that VMPCK?
>
> thanks
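The ordering under discussion (decrypt first, advance the sequence number immediately on success, retire the key on failure) can be modeled in a few lines of standalone C. This is an illustrative sketch only: the struct and function names are hypothetical, not the driver's API, and aead_open() merely stands in for the real AES-GCM tag check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the guest's message state. */
struct guest_ctx {
	uint64_t seqno;      /* next request sequence number */
	bool     key_usable; /* cleared on any decryption failure */
};

/* Stand-in for the AES-GCM open: the header (bytes 30h-5Fh) is part
 * of the AAD, so any hypervisor modification fails the tag check. */
static bool aead_open(bool hv_tampered)
{
	return !hv_tampered;
}

/*
 * Returns 0 on success, -1 on failure. The sequence number advances
 * by 2 (request + response) as soon as the authenticated decryption
 * succeeds, before any header sanity checks, so the same (key, IV)
 * pair is never reused. On a tag failure the PSP's state is unknown,
 * so the key is retired rather than risked again.
 */
static int handle_response(struct guest_ctx *ctx, bool hv_tampered)
{
	if (!ctx->key_usable)
		return -1;

	if (!aead_open(hv_tampered)) {
		ctx->key_usable = false;
		return -1;
	}

	ctx->seqno += 2;
	/* ... msg_type/msg_version checks would go here ... */
	return 0;
}
```

With this ordering a tampered response can still burn the key (a denial of service the hypervisor can always cause anyway), but it can never silently lead to two different plaintexts being encrypted under the same IV.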
On Wed, Oct 27, 2021 at 2:48 PM Brijesh Singh <[email protected]> wrote:
>
>
>
> On 10/27/21 3:10 PM, Peter Gonda wrote:
> > On Wed, Oct 27, 2021 at 10:08 AM Brijesh Singh <[email protected]> wrote:
> >>
> >> Hi Peter,
> >>
> >> Somehow this email was filtered out as spam and never reached my
> >> inbox. Sorry for the delay in the response.
> >>
> >> On 10/20/21 4:33 PM, Peter Gonda wrote:
> >>> On Fri, Oct 8, 2021 at 12:06 PM Brijesh Singh <[email protected]> wrote:
> >>>>
> >>>> The SEV-SNP specification provides the guest a mechanism to communicate with
> >>>> the PSP without risk from a malicious hypervisor who wishes to read, alter,
> >>>> drop or replay the messages sent. The driver uses snp_issue_guest_request()
> >>>> to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
> >>>> submit the request to PSP.
> >>>>
> >>>> The PSP requires that all communication be encrypted using the key
> >>>> specified through the platform_data.
> >>>>
> >>>> Userspace can use the SNP_GET_REPORT ioctl() to query the guest
> >>>> attestation report.
> >>>>
> >>>> See SEV-SNP spec section Guest Messages for more details.
> >>>>
> >>>> Signed-off-by: Brijesh Singh <[email protected]>
> >>>> ---
> >>>> Documentation/virt/coco/sevguest.rst | 77 ++++
> >>>> drivers/virt/Kconfig | 3 +
> >>>> drivers/virt/Makefile | 1 +
> >>>> drivers/virt/coco/sevguest/Kconfig | 9 +
> >>>> drivers/virt/coco/sevguest/Makefile | 2 +
> >>>> drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
> >>>> drivers/virt/coco/sevguest/sevguest.h | 98 +++++
> >>>> include/uapi/linux/sev-guest.h | 44 ++
> >>>> 8 files changed, 795 insertions(+)
> >>>> create mode 100644 Documentation/virt/coco/sevguest.rst
> >>>> create mode 100644 drivers/virt/coco/sevguest/Kconfig
> >>>> create mode 100644 drivers/virt/coco/sevguest/Makefile
> >>>> create mode 100644 drivers/virt/coco/sevguest/sevguest.c
> >>>> create mode 100644 drivers/virt/coco/sevguest/sevguest.h
> >>>> create mode 100644 include/uapi/linux/sev-guest.h
> >>>>
> >>>> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> >>>> new file mode 100644
> >>>> index 000000000000..002c90946b8a
> >>>> --- /dev/null
> >>>> +++ b/Documentation/virt/coco/sevguest.rst
> >>>> @@ -0,0 +1,77 @@
> >>>> +.. SPDX-License-Identifier: GPL-2.0
> >>>> +
> >>>> +===================================================================
> >>>> +The Definitive SEV Guest API Documentation
> >>>> +===================================================================
> >>>> +
> >>>> +1. General description
> >>>> +======================
> >>>> +
> >>>> +The SEV API is a set of ioctls that are used by the guest or hypervisor
> >>>> +to get or set certain aspects of the SEV virtual machine. The ioctls belong
> >>>> +to the following classes:
> >>>> +
> >>>> + - Hypervisor ioctls: These query and set global attributes which affect the
> >>>> +   whole SEV firmware. These ioctls are used by platform provisioning tools.
> >>>> +
> >>>> + - Guest ioctls: These query and set attributes of the SEV virtual machine.
> >>>> +
> >>>> +2. API description
> >>>> +==================
> >>>> +
> >>>> +This section describes ioctls that can be used to query or set SEV guests.
> >>>> +For each ioctl, the following information is provided along with a
> >>>> +description:
> >>>> +
> >>>> + Technology:
> >>>> +      which SEV technology provides this ioctl: sev, sev-es, sev-snp, or all.
> >>>> +
> >>>> + Type:
> >>>> + hypervisor or guest. The ioctl can be used inside the guest or the
> >>>> + hypervisor.
> >>>> +
> >>>> + Parameters:
> >>>> + what parameters are accepted by the ioctl.
> >>>> +
> >>>> + Returns:
> >>>> + the return value. General error numbers (ENOMEM, EINVAL)
> >>>> + are not detailed, but errors with specific meanings are.
> >>>> +
> >>>> +The guest ioctl should be issued on a file descriptor of the /dev/sev-guest device.
> >>>> +The ioctl accepts struct snp_user_guest_request. The input and output structures are
> >>>> +specified through the req_data and resp_data fields, respectively. If the ioctl fails
> >>>> +to execute due to a firmware error, then fw_err code will be set.
> >>>> +
> >>>> +::
> >>>> + struct snp_guest_request_ioctl {
> >>>> + /* Request and response structure address */
> >>>> + __u64 req_data;
> >>>> + __u64 resp_data;
> >>>> +
> >>>> + /* firmware error code on failure (see psp-sev.h) */
> >>>> + __u64 fw_err;
> >>>> + };
> >>>> +
> >>>> +2.1 SNP_GET_REPORT
> >>>> +------------------
> >>>> +
> >>>> +:Technology: sev-snp
> >>>> +:Type: guest ioctl
> >>>> +:Parameters (in): struct snp_report_req
> >>>> +:Returns (out): struct snp_report_resp on success, -negative on error
> >>>> +
> >>>> +The SNP_GET_REPORT ioctl can be used to query the attestation report from the
> >>>> +SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
> >>>> +provided by the SEV-SNP firmware to query the attestation report.
> >>>> +
> >>>> +On success, snp_report_resp.data will contain the report. The report
> >>>> +will be in the format described in the SEV-SNP specification. See the SEV-SNP
> >>>> +specification for further details.
> >>>> +
> >>>> +
> >>>> +Reference
> >>>> +---------
> >>>> +
> >>>> +SEV-SNP and GHCB specification: developer.amd.com/sev
> >>>> +
> >>>> +The driver is based on SEV-SNP firmware spec 0.9 and GHCB spec version 2.0.
> >>>> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> >>>> index 8061e8ef449f..e457e47610d3 100644
> >>>> --- a/drivers/virt/Kconfig
> >>>> +++ b/drivers/virt/Kconfig
> >>>> @@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
> >>>> source "drivers/virt/nitro_enclaves/Kconfig"
> >>>>
> >>>> source "drivers/virt/acrn/Kconfig"
> >>>> +
> >>>> +source "drivers/virt/coco/sevguest/Kconfig"
> >>>> +
> >>>> endif
> >>>> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> >>>> index 3e272ea60cd9..9c704a6fdcda 100644
> >>>> --- a/drivers/virt/Makefile
> >>>> +++ b/drivers/virt/Makefile
> >>>> @@ -8,3 +8,4 @@ obj-y += vboxguest/
> >>>>
> >>>> obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
> >>>> obj-$(CONFIG_ACRN_HSM) += acrn/
> >>>> +obj-$(CONFIG_SEV_GUEST) += coco/sevguest/
> >>>> diff --git a/drivers/virt/coco/sevguest/Kconfig b/drivers/virt/coco/sevguest/Kconfig
> >>>> new file mode 100644
> >>>> index 000000000000..96190919cca8
> >>>> --- /dev/null
> >>>> +++ b/drivers/virt/coco/sevguest/Kconfig
> >>>> @@ -0,0 +1,9 @@
> >>>> +config SEV_GUEST
> >>>> + tristate "AMD SEV Guest driver"
> >>>> + default y
> >>>> + depends on AMD_MEM_ENCRYPT && CRYPTO_AEAD2
> >>>> + help
> >>>> + The driver can be used by the SEV-SNP guest to communicate with the PSP to
> >>>> + request the attestation report and more.
> >>>> +
> >>>> + If you choose 'M' here, this module will be called sevguest.
> >>>> diff --git a/drivers/virt/coco/sevguest/Makefile b/drivers/virt/coco/sevguest/Makefile
> >>>> new file mode 100644
> >>>> index 000000000000..b1ffb2b4177b
> >>>> --- /dev/null
> >>>> +++ b/drivers/virt/coco/sevguest/Makefile
> >>>> @@ -0,0 +1,2 @@
> >>>> +# SPDX-License-Identifier: GPL-2.0-only
> >>>> +obj-$(CONFIG_SEV_GUEST) += sevguest.o
> >>>> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
> >>>> new file mode 100644
> >>>> index 000000000000..2d313fb2ffae
> >>>> --- /dev/null
> >>>> +++ b/drivers/virt/coco/sevguest/sevguest.c
> >>>> @@ -0,0 +1,561 @@
> >>>> +// SPDX-License-Identifier: GPL-2.0-only
> >>>> +/*
> >>>> + * AMD Secure Encrypted Virtualization - Secure Nested Paging (SEV-SNP) guest request interface
> >>>> + *
> >>>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> >>>> + *
> >>>> + * Author: Brijesh Singh <[email protected]>
> >>>> + */
> >>>> +
> >>>> +#include <linux/module.h>
> >>>> +#include <linux/kernel.h>
> >>>> +#include <linux/types.h>
> >>>> +#include <linux/mutex.h>
> >>>> +#include <linux/io.h>
> >>>> +#include <linux/platform_device.h>
> >>>> +#include <linux/miscdevice.h>
> >>>> +#include <linux/set_memory.h>
> >>>> +#include <linux/fs.h>
> >>>> +#include <crypto/aead.h>
> >>>> +#include <linux/scatterlist.h>
> >>>> +#include <linux/psp-sev.h>
> >>>> +#include <uapi/linux/sev-guest.h>
> >>>> +#include <uapi/linux/psp-sev.h>
> >>>> +
> >>>> +#include <asm/svm.h>
> >>>> +#include <asm/sev.h>
> >>>> +
> >>>> +#include "sevguest.h"
> >>>> +
> >>>> +#define DEVICE_NAME "sev-guest"
> >>>> +#define AAD_LEN 48
> >>>> +#define MSG_HDR_VER 1
> >>>> +
> >>>> +struct snp_guest_crypto {
> >>>> + struct crypto_aead *tfm;
> >>>> + u8 *iv, *authtag;
> >>>> + int iv_len, a_len;
> >>>> +};
> >>>> +
> >>>> +struct snp_guest_dev {
> >>>> + struct device *dev;
> >>>> + struct miscdevice misc;
> >>>> +
> >>>> + struct snp_guest_crypto *crypto;
> >>>> + struct snp_guest_msg *request, *response;
> >>>> + struct snp_secrets_page_layout *layout;
> >>>> + struct snp_req_data input;
> >>>> + u32 *os_area_msg_seqno;
> >>>> +};
> >>>> +
> >>>> +static u32 vmpck_id;
> >>>> +module_param(vmpck_id, uint, 0444);
> >>>> +MODULE_PARM_DESC(vmpck_id, "The VMPCK ID to use when communicating with the PSP.");
> >>>> +
> >>>> +static DEFINE_MUTEX(snp_cmd_mutex);
> >>>> +
> >>>> +static inline u64 __snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
> >>>> +{
> >>>> + u64 count;
> >>>> +
> >>>> + /* Read the current message sequence counter from secrets pages */
> >>>> + count = *snp_dev->os_area_msg_seqno;
> >>>> +
> >>>> + return count + 1;
> >>>> +}
> >>>> +
> >>>> +/* Return a non-zero on success */
> >>>> +static u64 snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
> >>>> +{
> >>>> + u64 count = __snp_get_msg_seqno(snp_dev);
> >>>> +
> >>>> + /*
> >>>> + * The message sequence counter for the SNP guest request is a 64-bit
> >>>> + * value, but version 2 of the GHCB specification defines a 32-bit storage
> >>>> + * for it. If the counter exceeds the 32-bit value then return zero.
> >>>> + * The caller should check the return value, but if the caller happens to
> >>>> + * not check the value and uses it, then the firmware treats zero as an
> >>>> + * invalid number and will fail the message request.
> >>>> + */
> >>>> + if (count >= UINT_MAX) {
> >>>> + pr_err_ratelimited("SNP guest request message sequence counter overflow\n");
> >>>> + return 0;
> >>>> + }
> >>>> +
> >>>> + return count;
> >>>> +}
> >>>> +
> >>>> +static void snp_inc_msg_seqno(struct snp_guest_dev *snp_dev)
> >>>> +{
> >>>> + /*
> >>>> + * The counter is also incremented by the PSP, so increment it by 2
> >>>> + * and save in secrets page.
> >>>> + */
> >>>> + *snp_dev->os_area_msg_seqno += 2;
> >>>> +}
> >>>> +
> >>>> +static inline struct snp_guest_dev *to_snp_dev(struct file *file)
> >>>> +{
> >>>> + struct miscdevice *dev = file->private_data;
> >>>> +
> >>>> + return container_of(dev, struct snp_guest_dev, misc);
> >>>> +}
> >>>> +
> >>>> +static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, u8 *key, size_t keylen)
> >>>> +{
> >>>> + struct snp_guest_crypto *crypto;
> >>>> +
> >>>> + crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
> >>>> + if (!crypto)
> >>>> + return NULL;
> >>>> +
> >>>> + crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
> >>>> + if (IS_ERR(crypto->tfm))
> >>>> + goto e_free;
> >>>> +
> >>>> + if (crypto_aead_setkey(crypto->tfm, key, keylen))
> >>>> + goto e_free_crypto;
> >>>> +
> >>>> + crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
> >>>> + if (crypto->iv_len < 12) {
> >>>> + dev_err(snp_dev->dev, "IV length is less than 12.\n");
> >>>> + goto e_free_crypto;
> >>>> + }
> >>>> +
> >>>> + crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
> >>>> + if (!crypto->iv)
> >>>> + goto e_free_crypto;
> >>>> +
> >>>> + if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
> >>>> + if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
> >>>> + dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
> >>>> + goto e_free_crypto;
> >>>> + }
> >>>> + }
> >>>> +
> >>>> + crypto->a_len = crypto_aead_authsize(crypto->tfm);
> >>>> + crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
> >>>> + if (!crypto->authtag)
> >>>> + goto e_free_crypto;
> >>>> +
> >>>> + return crypto;
> >>>> +
> >>>> +e_free_crypto:
> >>>> + crypto_free_aead(crypto->tfm);
> >>>> +e_free:
> >>>> + kfree(crypto->iv);
> >>>> + kfree(crypto->authtag);
> >>>> + kfree(crypto);
> >>>> +
> >>>> + return NULL;
> >>>> +}
> >>>> +
> >>>> +static void deinit_crypto(struct snp_guest_crypto *crypto)
> >>>> +{
> >>>> + crypto_free_aead(crypto->tfm);
> >>>> + kfree(crypto->iv);
> >>>> + kfree(crypto->authtag);
> >>>> + kfree(crypto);
> >>>> +}
> >>>> +
> >>>> +static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
> >>>> + u8 *src_buf, u8 *dst_buf, size_t len, bool enc)
> >>>> +{
> >>>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> >>>> + struct scatterlist src[3], dst[3];
> >>>> + DECLARE_CRYPTO_WAIT(wait);
> >>>> + struct aead_request *req;
> >>>> + int ret;
> >>>> +
> >>>> + req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
> >>>> + if (!req)
> >>>> + return -ENOMEM;
> >>>> +
> >>>> + /*
> >>>> + * AEAD memory operations:
> >>>> + * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
> >>>> + * | msg header | plaintext | hdr->authtag |
> >>>> + * | bytes 30h - 5Fh | or | |
> >>>> + * | | cipher | |
> >>>> + * +------------------+------------------+----------------+
> >>>> + */
> >>>> + sg_init_table(src, 3);
> >>>> + sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
> >>>> + sg_set_buf(&src[1], src_buf, hdr->msg_sz);
> >>>> + sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
> >>>> +
> >>>> + sg_init_table(dst, 3);
> >>>> + sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
> >>>> + sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
> >>>> + sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
> >>>> +
> >>>> + aead_request_set_ad(req, AAD_LEN);
> >>>> + aead_request_set_tfm(req, crypto->tfm);
> >>>> + aead_request_set_callback(req, 0, crypto_req_done, &wait);
> >>>> +
> >>>> + aead_request_set_crypt(req, src, dst, len, crypto->iv);
> >>>> + ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
> >>>> +
> >>>> + aead_request_free(req);
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +static int __enc_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
> >>>> + void *plaintext, size_t len)
> >>>> +{
> >>>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> >>>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> >>>> +
> >>>> + memset(crypto->iv, 0, crypto->iv_len);
> >>>> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
> >>>> +
> >>>> + return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
> >>>> +}
> >>>> +
> >>>> +static int dec_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
> >>>> + void *plaintext, size_t len)
> >>>> +{
> >>>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> >>>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
> >>>> +
> >>>> + /* Build IV with response buffer sequence number */
> >>>> + memset(crypto->iv, 0, crypto->iv_len);
> >>>> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
> >>>> +
> >>>> + return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
> >>>> +}
> >>>> +
> >>>> +static int verify_and_dec_payload(struct snp_guest_dev *snp_dev, void *payload, u32 sz)
> >>>> +{
> >>>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> >>>> + struct snp_guest_msg *resp = snp_dev->response;
> >>>> + struct snp_guest_msg *req = snp_dev->request;
> >>>> + struct snp_guest_msg_hdr *req_hdr = &req->hdr;
> >>>> + struct snp_guest_msg_hdr *resp_hdr = &resp->hdr;
> >>>> +
> >>>> + dev_dbg(snp_dev->dev, "response [seqno %lld type %d version %d sz %d]\n",
> >>>> + resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
> >>>> +
> >>>> + /* Verify that the sequence counter is incremented by 1 */
> >>>> + if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
> >>>> + return -EBADMSG;
> >>>> +
> >>>> + /* Verify response message type and version number. */
> >>>> + if (resp_hdr->msg_type != (req_hdr->msg_type + 1) ||
> >>>> + resp_hdr->msg_version != req_hdr->msg_version)
> >>>> + return -EBADMSG;
> >>>> +
> >>>> + /*
> >>>> + * If the message size is greater than our buffer length then return
> >>>> + * an error.
> >>>> + */
> >>>> + if (unlikely((resp_hdr->msg_sz + crypto->a_len) > sz))
> >>>> + return -EBADMSG;
> >>>> +
> >>>> + return dec_payload(snp_dev, resp, payload, resp_hdr->msg_sz + crypto->a_len);
> >>>> +}
> >>>> +
> >>>> +static int enc_payload(struct snp_guest_dev *snp_dev, u64 seqno, int version, u8 type,
> >>>> + void *payload, size_t sz)
> >>>> +{
> >>>> + struct snp_guest_msg *req = snp_dev->request;
> >>>> + struct snp_guest_msg_hdr *hdr = &req->hdr;
> >>>> +
> >>>> + memset(req, 0, sizeof(*req));
> >>>> +
> >>>> + hdr->algo = SNP_AEAD_AES_256_GCM;
> >>>> + hdr->hdr_version = MSG_HDR_VER;
> >>>> + hdr->hdr_sz = sizeof(*hdr);
> >>>> + hdr->msg_type = type;
> >>>> + hdr->msg_version = version;
> >>>> + hdr->msg_seqno = seqno;
> >>>> + hdr->msg_vmpck = vmpck_id;
> >>>> + hdr->msg_sz = sz;
> >>>> +
> >>>> + /* Verify the sequence number is non-zero */
> >>>> + if (!hdr->msg_seqno)
> >>>> + return -ENOSR;
> >>>> +
> >>>> + dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
> >>>> + hdr->msg_seqno, hdr->msg_type, hdr->msg_version, hdr->msg_sz);
> >>>> +
> >>>> + return __enc_payload(snp_dev, req, payload, sz);
> >>>> +}
> >>>> +
> >>>> +static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, int msg_ver,
> >>>> + u8 type, void *req_buf, size_t req_sz, void *resp_buf,
> >>>> + u32 resp_sz, __u64 *fw_err)
> >>>> +{
> >>>> + unsigned long err;
> >>>> + u64 seqno;
> >>>> + int rc;
> >>>> +
> >>>> +	/* Get the message sequence number and verify that it is non-zero */
> >>>> + seqno = snp_get_msg_seqno(snp_dev);
> >>>> + if (!seqno)
> >>>> + return -EIO;
> >>>> +
> >>>> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
> >>>> +
> >>>> + /* Encrypt the userspace provided payload */
> >>>> + rc = enc_payload(snp_dev, seqno, msg_ver, type, req_buf, req_sz);
> >>>> + if (rc)
> >>>> + return rc;
> >>>> +
> >>>> + /* Call firmware to process the request */
> >>>> + rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
> >>>> + if (fw_err)
> >>>> + *fw_err = err;
> >>>> +
> >>>> + if (rc)
> >>>> + return rc;
> >>>> +
> >>>> + rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
> >>>> + if (rc)
> >>>> + return rc;
> >>>> +
> >>>> + /* Increment to new message sequence after the command is successful. */
> >>>> + snp_inc_msg_seqno(snp_dev);
> >>>
> >>> Thanks for updating this sequence number logic. But I still have some
> >>> concerns. In verify_and_dec_payload() we check the encryption header
> >>> but all these fields are accessible to the hypervisor, meaning it can
> >>> change the header and cause this sequence number to not get
> >>> incremented. We then will reuse the sequence number for the next
> >>> command, which isn't great for AES GCM. It seems very hard to tell if
> >>> the FW actually got our request and created a response, thereby
> >>> incrementing the sequence number by 2, or if the hypervisor is acting
> >>> in bad faith. It seems like to be safe we need to completely stop
> >>> using this vmpck if we cannot confirm the PSP has gotten our request
> >>> and created a response. Thoughts?
> >>>
> >>
> >> Very good point, I think we can detect this condition by rearranging the
> >> checks. The verify_and_dec_payload() is called only after the command is
> >> successful and does the following checks:
> >>
> >> 1) Verifies the header
> >> 2) Decrypts the payload
> >> 3) Later we increment the sequence
> >>
> >> If we rearrange to the order below, then we can avoid this condition.
> >> 1) Decrypt the payload
> >> 2) Increment the sequence number
> >> 3) Verify the header
> >>
> >> The decryption will succeed only if the PSP constructed the payload.
> >>
> >> Does this make sense?
> >
> > Either ordering seems fine to me. I don't think it changes much, though,
> > since the header (bytes 30h-5Fh according to the spec) is included in
> > the authenticated data of the encryption. So any hypervisor modifications
> > will lead to a decryption failure, right?
> >
> > In either case, if we do fail the decryption, what are your thoughts on
> > not allowing further use of that VMPCK?
> >
>
> We have a limited number of VMPCKs (three total). I am not sure switching to
> a different one will change much; the HV can quickly exhaust them. Once we
> have SVSM in place, it's possible that SVSM may make use of a VMPCK. If the
> decryption failed, then maybe it's safe to erase the key from the secrets
> page (in other words, the guest OS cannot use that key for any further
> communication). A guest can reload the driver with a different VMPCK id
> and try again.
SNP cannot really cover DoS at all, since the VMM could just never
schedule the VM. In this case we know that the hypervisor is trying to
mess with the guest, so my preference would be to stop sending guest
messages to prevent that duplicated IV usage. If one caller gets an
EBADMSG it knows it's in this case, but the rest of userspace has no
idea. Maybe log an error?
>
> thanks
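One way to make "stop using this VMPCK" concrete, along the lines of erasing the key from the secrets page, is sketched below in standalone C. The names are illustrative, not the driver's actual code; in the kernel the wipe would target the key's copy in the secrets page (and use memzero_explicit() rather than a plain memset(), so the compiler cannot elide it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMPCK_KEY_LEN 32 /* VMPCKs are 256-bit AES-GCM keys */

/* Hypothetical stand-in for the driver's per-device state. */
struct snp_dev_model {
	uint8_t vmpck[VMPCK_KEY_LEN];
	bool    vmpck_valid;
};

/* Retire the VMPCK: wipe the key material so nothing further can be
 * encrypted with it, and record that it is gone so callers get a
 * clear error instead of ciphertext under a possibly reused IV. */
static void disable_vmpck(struct snp_dev_model *dev)
{
	memset(dev->vmpck, 0, VMPCK_KEY_LEN);
	dev->vmpck_valid = false;
}

/* Guest message send path: refuses outright once the key is retired. */
static int send_guest_msg(struct snp_dev_model *dev)
{
	if (!dev->vmpck_valid)
		return -1;
	/* ... encrypt the payload and issue SNP_GUEST_REQUEST ... */
	return 0;
}
```

A guest could then, as suggested above, reload the driver with a different vmpck_id and try again, while the retired key can never be paired with a duplicate IV.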
On 10/27/21 3:10 PM, Peter Gonda wrote:
> On Wed, Oct 27, 2021 at 10:08 AM Brijesh Singh <[email protected]> wrote:
>>
>> Hi Peter,
>>
>> Somehow this email was filtered out as spam and never reached my
>> inbox. Sorry for the delay in the response.
>>
>> On 10/20/21 4:33 PM, Peter Gonda wrote:
>>> On Fri, Oct 8, 2021 at 12:06 PM Brijesh Singh <[email protected]> wrote:
>>>>
>>>> The SEV-SNP specification provides the guest a mechanism to communicate with
>>>> the PSP without risk from a malicious hypervisor who wishes to read, alter,
>>>> drop or replay the messages sent. The driver uses snp_issue_guest_request()
>>>> to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
>>>> submit the request to PSP.
>>>>
>>>> The PSP requires that all communication be encrypted using the key
>>>> specified through the platform_data.
>>>>
>>>> Userspace can use the SNP_GET_REPORT ioctl() to query the guest
>>>> attestation report.
>>>>
>>>> See SEV-SNP spec section Guest Messages for more details.
>>>>
>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>> ---
>>>> Documentation/virt/coco/sevguest.rst | 77 ++++
>>>> drivers/virt/Kconfig | 3 +
>>>> drivers/virt/Makefile | 1 +
>>>> drivers/virt/coco/sevguest/Kconfig | 9 +
>>>> drivers/virt/coco/sevguest/Makefile | 2 +
>>>> drivers/virt/coco/sevguest/sevguest.c | 561 ++++++++++++++++++++++++++
>>>> drivers/virt/coco/sevguest/sevguest.h | 98 +++++
>>>> include/uapi/linux/sev-guest.h | 44 ++
>>>> 8 files changed, 795 insertions(+)
>>>> create mode 100644 Documentation/virt/coco/sevguest.rst
>>>> create mode 100644 drivers/virt/coco/sevguest/Kconfig
>>>> create mode 100644 drivers/virt/coco/sevguest/Makefile
>>>> create mode 100644 drivers/virt/coco/sevguest/sevguest.c
>>>> create mode 100644 drivers/virt/coco/sevguest/sevguest.h
>>>> create mode 100644 include/uapi/linux/sev-guest.h
>>>>
>>>> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
>>>> new file mode 100644
>>>> index 000000000000..002c90946b8a
>>>> --- /dev/null
>>>> +++ b/Documentation/virt/coco/sevguest.rst
>>>> @@ -0,0 +1,77 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0
>>>> +
>>>> +===================================================================
>>>> +The Definitive SEV Guest API Documentation
>>>> +===================================================================
>>>> +
>>>> +1. General description
>>>> +======================
>>>> +
>>>> +The SEV API is a set of ioctls that are used by the guest or hypervisor
>>>> +to get or set certain aspects of the SEV virtual machine. The ioctls belong
>>>> +to the following classes:
>>>> +
>>>> + - Hypervisor ioctls: These query and set global attributes which affect the
>>>> +   whole SEV firmware. These ioctls are used by platform provisioning tools.
>>>> +
>>>> + - Guest ioctls: These query and set attributes of the SEV virtual machine.
>>>> +
>>>> +2. API description
>>>> +==================
>>>> +
>>>> +This section describes ioctls that can be used to query or set SEV guests.
>>>> +For each ioctl, the following information is provided along with a
>>>> +description:
>>>> +
>>>> + Technology:
>>>> +      which SEV technology provides this ioctl: sev, sev-es, sev-snp, or all.
>>>> +
>>>> + Type:
>>>> + hypervisor or guest. The ioctl can be used inside the guest or the
>>>> + hypervisor.
>>>> +
>>>> + Parameters:
>>>> + what parameters are accepted by the ioctl.
>>>> +
>>>> + Returns:
>>>> + the return value. General error numbers (ENOMEM, EINVAL)
>>>> + are not detailed, but errors with specific meanings are.
>>>> +
>>>> +The guest ioctl should be issued on a file descriptor of the /dev/sev-guest device.
>>>> +The ioctl accepts struct snp_user_guest_request. The input and output structures are
>>>> +specified through the req_data and resp_data fields, respectively. If the ioctl fails
>>>> +to execute due to a firmware error, then fw_err code will be set.
>>>> +
>>>> +::
>>>> + struct snp_guest_request_ioctl {
>>>> + /* Request and response structure address */
>>>> + __u64 req_data;
>>>> + __u64 resp_data;
>>>> +
>>>> + /* firmware error code on failure (see psp-sev.h) */
>>>> + __u64 fw_err;
>>>> + };
>>>> +
>>>> +2.1 SNP_GET_REPORT
>>>> +------------------
>>>> +
>>>> +:Technology: sev-snp
>>>> +:Type: guest ioctl
>>>> +:Parameters (in): struct snp_report_req
>>>> +:Returns (out): struct snp_report_resp on success, negative error code on failure
>>>> +
>>>> +The SNP_GET_REPORT ioctl can be used to query the attestation report from the
>>>> +SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
>>>> +provided by the SEV-SNP firmware to query the attestation report.
>>>> +
>>>> +On success, snp_report_resp.data will contain the attestation report in the
>>>> +format described in the SEV-SNP specification. See the SEV-SNP specification
>>>> +for further details.
>>>> +
>>>> +
>>>> +Reference
>>>> +---------
>>>> +
>>>> +SEV-SNP and GHCB specification: developer.amd.com/sev
>>>> +
>>>> +The driver is based on SEV-SNP firmware spec 0.9 and GHCB spec version 2.0.
>>>> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
>>>> index 8061e8ef449f..e457e47610d3 100644
>>>> --- a/drivers/virt/Kconfig
>>>> +++ b/drivers/virt/Kconfig
>>>> @@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
>>>> source "drivers/virt/nitro_enclaves/Kconfig"
>>>>
>>>> source "drivers/virt/acrn/Kconfig"
>>>> +
>>>> +source "drivers/virt/coco/sevguest/Kconfig"
>>>> +
>>>> endif
>>>> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
>>>> index 3e272ea60cd9..9c704a6fdcda 100644
>>>> --- a/drivers/virt/Makefile
>>>> +++ b/drivers/virt/Makefile
>>>> @@ -8,3 +8,4 @@ obj-y += vboxguest/
>>>>
>>>> obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
>>>> obj-$(CONFIG_ACRN_HSM) += acrn/
>>>> +obj-$(CONFIG_SEV_GUEST) += coco/sevguest/
>>>> diff --git a/drivers/virt/coco/sevguest/Kconfig b/drivers/virt/coco/sevguest/Kconfig
>>>> new file mode 100644
>>>> index 000000000000..96190919cca8
>>>> --- /dev/null
>>>> +++ b/drivers/virt/coco/sevguest/Kconfig
>>>> @@ -0,0 +1,9 @@
>>>> +config SEV_GUEST
>>>> + tristate "AMD SEV Guest driver"
>>>> + default y
>>>> + depends on AMD_MEM_ENCRYPT && CRYPTO_AEAD2
>>>> + help
>>>> + The driver can be used by the SEV-SNP guest to communicate with the PSP to
>>>> + request the attestation report and more.
>>>> +
>>>> + If you choose 'M' here, this module will be called sevguest.
>>>> diff --git a/drivers/virt/coco/sevguest/Makefile b/drivers/virt/coco/sevguest/Makefile
>>>> new file mode 100644
>>>> index 000000000000..b1ffb2b4177b
>>>> --- /dev/null
>>>> +++ b/drivers/virt/coco/sevguest/Makefile
>>>> @@ -0,0 +1,2 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +obj-$(CONFIG_SEV_GUEST) += sevguest.o
>>>> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
>>>> new file mode 100644
>>>> index 000000000000..2d313fb2ffae
>>>> --- /dev/null
>>>> +++ b/drivers/virt/coco/sevguest/sevguest.c
>>>> @@ -0,0 +1,561 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/*
>>>> + * AMD Secure Encrypted Virtualization - Secure Nested Paging (SEV-SNP) guest request interface
>>>> + *
>>>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>>>> + *
>>>> + * Author: Brijesh Singh <[email protected]>
>>>> + */
>>>> +
>>>> +#include <linux/module.h>
>>>> +#include <linux/kernel.h>
>>>> +#include <linux/types.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/io.h>
>>>> +#include <linux/platform_device.h>
>>>> +#include <linux/miscdevice.h>
>>>> +#include <linux/set_memory.h>
>>>> +#include <linux/fs.h>
>>>> +#include <crypto/aead.h>
>>>> +#include <linux/scatterlist.h>
>>>> +#include <linux/psp-sev.h>
>>>> +#include <uapi/linux/sev-guest.h>
>>>> +#include <uapi/linux/psp-sev.h>
>>>> +
>>>> +#include <asm/svm.h>
>>>> +#include <asm/sev.h>
>>>> +
>>>> +#include "sevguest.h"
>>>> +
>>>> +#define DEVICE_NAME "sev-guest"
>>>> +#define AAD_LEN 48
>>>> +#define MSG_HDR_VER 1
>>>> +
>>>> +struct snp_guest_crypto {
>>>> + struct crypto_aead *tfm;
>>>> + u8 *iv, *authtag;
>>>> + int iv_len, a_len;
>>>> +};
>>>> +
>>>> +struct snp_guest_dev {
>>>> + struct device *dev;
>>>> + struct miscdevice misc;
>>>> +
>>>> + struct snp_guest_crypto *crypto;
>>>> + struct snp_guest_msg *request, *response;
>>>> + struct snp_secrets_page_layout *layout;
>>>> + struct snp_req_data input;
>>>> + u32 *os_area_msg_seqno;
>>>> +};
>>>> +
>>>> +static u32 vmpck_id;
>>>> +module_param(vmpck_id, uint, 0444);
>>>> +MODULE_PARM_DESC(vmpck_id, "The VMPCK ID to use when communicating with the PSP.");
>>>> +
>>>> +static DEFINE_MUTEX(snp_cmd_mutex);
>>>> +
>>>> +static inline u64 __snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
>>>> +{
>>>> + u64 count;
>>>> +
>>>> + /* Read the current message sequence counter from the secrets page */
>>>> + count = *snp_dev->os_area_msg_seqno;
>>>> +
>>>> + return count + 1;
>>>> +}
>>>> +
>>>> +/* Return a non-zero sequence number on success */
>>>> +static u64 snp_get_msg_seqno(struct snp_guest_dev *snp_dev)
>>>> +{
>>>> + u64 count = __snp_get_msg_seqno(snp_dev);
>>>> +
>>>> + /*
>>>> + * The message sequence counter for the SNP guest request is a 64-bit
>>>> + * value but version 2 of the GHCB specification defines a 32-bit storage
>>>> + * for it. If the counter exceeds the 32-bit value then return zero. The
>>>> + * caller should check the return value, but if the caller happens not to
>>>> + * check it and uses it anyway, the firmware treats zero as an invalid
>>>> + * sequence number and will fail the message request.
>>>> + */
>>>> + if (count >= UINT_MAX) {
>>>> + pr_err_ratelimited("SNP guest request message sequence counter overflow\n");
>>>> + return 0;
>>>> + }
>>>> +
>>>> + return count;
>>>> +}
>>>> +
>>>> +static void snp_inc_msg_seqno(struct snp_guest_dev *snp_dev)
>>>> +{
>>>> + /*
>>>> + * The counter is also incremented by the PSP, so increment it by 2
>>>> + * and save in secrets page.
>>>> + */
>>>> + *snp_dev->os_area_msg_seqno += 2;
>>>> +}
>>>> +
>>>> +static inline struct snp_guest_dev *to_snp_dev(struct file *file)
>>>> +{
>>>> + struct miscdevice *dev = file->private_data;
>>>> +
>>>> + return container_of(dev, struct snp_guest_dev, misc);
>>>> +}
>>>> +
>>>> +static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, u8 *key, size_t keylen)
>>>> +{
>>>> + struct snp_guest_crypto *crypto;
>>>> +
>>>> + crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
>>>> + if (!crypto)
>>>> + return NULL;
>>>> +
>>>> + crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
>>>> + if (IS_ERR(crypto->tfm))
>>>> + goto e_free;
>>>> +
>>>> + if (crypto_aead_setkey(crypto->tfm, key, keylen))
>>>> + goto e_free_crypto;
>>>> +
>>>> + crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
>>>> + if (crypto->iv_len < 12) {
>>>> + dev_err(snp_dev->dev, "IV length is less than 12.\n");
>>>> + goto e_free_crypto;
>>>> + }
>>>> +
>>>> + crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
>>>> + if (!crypto->iv)
>>>> + goto e_free_crypto;
>>>> +
>>>> + if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
>>>> + if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
>>>> + dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
>>>> + goto e_free_crypto;
>>>> + }
>>>> + }
>>>> +
>>>> + crypto->a_len = crypto_aead_authsize(crypto->tfm);
>>>> + crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
>>>> + if (!crypto->authtag)
>>>> + goto e_free_crypto;
>>>> +
>>>> + return crypto;
>>>> +
>>>> +e_free_crypto:
>>>> + crypto_free_aead(crypto->tfm);
>>>> +e_free:
>>>> + kfree(crypto->iv);
>>>> + kfree(crypto->authtag);
>>>> + kfree(crypto);
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static void deinit_crypto(struct snp_guest_crypto *crypto)
>>>> +{
>>>> + crypto_free_aead(crypto->tfm);
>>>> + kfree(crypto->iv);
>>>> + kfree(crypto->authtag);
>>>> + kfree(crypto);
>>>> +}
>>>> +
>>>> +static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
>>>> + u8 *src_buf, u8 *dst_buf, size_t len, bool enc)
>>>> +{
>>>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
>>>> + struct scatterlist src[3], dst[3];
>>>> + DECLARE_CRYPTO_WAIT(wait);
>>>> + struct aead_request *req;
>>>> + int ret;
>>>> +
>>>> + req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
>>>> + if (!req)
>>>> + return -ENOMEM;
>>>> +
>>>> + /*
>>>> + * AEAD memory operations:
>>>> + * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
>>>> + * | msg header | plaintext | hdr->authtag |
>>>> + * | bytes 30h - 5Fh | or | |
>>>> + * | | cipher | |
>>>> + * +------------------+------------------+----------------+
>>>> + */
>>>> + sg_init_table(src, 3);
>>>> + sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
>>>> + sg_set_buf(&src[1], src_buf, hdr->msg_sz);
>>>> + sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
>>>> +
>>>> + sg_init_table(dst, 3);
>>>> + sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
>>>> + sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
>>>> + sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
>>>> +
>>>> + aead_request_set_ad(req, AAD_LEN);
>>>> + aead_request_set_tfm(req, crypto->tfm);
>>>> + aead_request_set_callback(req, 0, crypto_req_done, &wait);
>>>> +
>>>> + aead_request_set_crypt(req, src, dst, len, crypto->iv);
>>>> + ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
>>>> +
>>>> + aead_request_free(req);
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static int __enc_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
>>>> + void *plaintext, size_t len)
>>>> +{
>>>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
>>>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
>>>> +
>>>> + memset(crypto->iv, 0, crypto->iv_len);
>>>> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
>>>> +
>>>> + return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
>>>> +}
>>>> +
>>>> +static int dec_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
>>>> + void *plaintext, size_t len)
>>>> +{
>>>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
>>>> + struct snp_guest_msg_hdr *hdr = &msg->hdr;
>>>> +
>>>> + /* Build IV with response buffer sequence number */
>>>> + memset(crypto->iv, 0, crypto->iv_len);
>>>> + memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
>>>> +
>>>> + return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
>>>> +}
>>>> +
>>>> +static int verify_and_dec_payload(struct snp_guest_dev *snp_dev, void *payload, u32 sz)
>>>> +{
>>>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
>>>> + struct snp_guest_msg *resp = snp_dev->response;
>>>> + struct snp_guest_msg *req = snp_dev->request;
>>>> + struct snp_guest_msg_hdr *req_hdr = &req->hdr;
>>>> + struct snp_guest_msg_hdr *resp_hdr = &resp->hdr;
>>>> +
>>>> + dev_dbg(snp_dev->dev, "response [seqno %lld type %d version %d sz %d]\n",
>>>> + resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
>>>> +
>>>> + /* Verify that the sequence counter is incremented by 1 */
>>>> + if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
>>>> + return -EBADMSG;
>>>> +
>>>> + /* Verify response message type and version number. */
>>>> + if (resp_hdr->msg_type != (req_hdr->msg_type + 1) ||
>>>> + resp_hdr->msg_version != req_hdr->msg_version)
>>>> + return -EBADMSG;
>>>> +
>>>> + /*
>>>> + * If the message size is greater than our buffer length then return
>>>> + * an error.
>>>> + */
>>>> + if (unlikely((resp_hdr->msg_sz + crypto->a_len) > sz))
>>>> + return -EBADMSG;
>>>> +
>>>> + return dec_payload(snp_dev, resp, payload, resp_hdr->msg_sz + crypto->a_len);
>>>> +}
>>>> +
>>>> +static int enc_payload(struct snp_guest_dev *snp_dev, u64 seqno, int version, u8 type,
>>>> + void *payload, size_t sz)
>>>> +{
>>>> + struct snp_guest_msg *req = snp_dev->request;
>>>> + struct snp_guest_msg_hdr *hdr = &req->hdr;
>>>> +
>>>> + memset(req, 0, sizeof(*req));
>>>> +
>>>> + hdr->algo = SNP_AEAD_AES_256_GCM;
>>>> + hdr->hdr_version = MSG_HDR_VER;
>>>> + hdr->hdr_sz = sizeof(*hdr);
>>>> + hdr->msg_type = type;
>>>> + hdr->msg_version = version;
>>>> + hdr->msg_seqno = seqno;
>>>> + hdr->msg_vmpck = vmpck_id;
>>>> + hdr->msg_sz = sz;
>>>> +
>>>> + /* Verify the sequence number is non-zero */
>>>> + if (!hdr->msg_seqno)
>>>> + return -ENOSR;
>>>> +
>>>> + dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
>>>> + hdr->msg_seqno, hdr->msg_type, hdr->msg_version, hdr->msg_sz);
>>>> +
>>>> + return __enc_payload(snp_dev, req, payload, sz);
>>>> +}
>>>> +
>>>> +static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, int msg_ver,
>>>> + u8 type, void *req_buf, size_t req_sz, void *resp_buf,
>>>> + u32 resp_sz, __u64 *fw_err)
>>>> +{
>>>> + unsigned long err;
>>>> + u64 seqno;
>>>> + int rc;
>>>> +
>>>> + /* Get the message sequence counter and verify that it is non-zero */
>>>> + seqno = snp_get_msg_seqno(snp_dev);
>>>> + if (!seqno)
>>>> + return -EIO;
>>>> +
>>>> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
>>>> +
>>>> + /* Encrypt the userspace provided payload */
>>>> + rc = enc_payload(snp_dev, seqno, msg_ver, type, req_buf, req_sz);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* Call firmware to process the request */
>>>> + rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
>>>> + if (fw_err)
>>>> + *fw_err = err;
>>>> +
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* Increment to new message sequence after the command is successful. */
>>>> + snp_inc_msg_seqno(snp_dev);
>>>
>>> Thanks for updating this sequence number logic. But I still have some
>>> concerns. In verify_and_dec_payload() we check the encryption header
>>> but all these fields are accessible to the hypervisor, meaning it can
>>> change the header and cause this sequence number to not get
>>> incremented. We then will reuse the sequence number for the next
>>> command, which isn't great for AES GCM. It seems very hard to tell if
>>> the FW actually got our request and created a response there by
>>> incrementing the sequence number by 2, or if the hypervisor is acting
>>> in bad faith. It seems like to be safe we need to completely stop
>>> using this vmpck if we cannot confirm the PSP has gotten our request
>>> and created a response. Thoughts?
>>>
>>
>> Very good point, I think we can detect this condition by rearranging the
>> checks. The verify_and_dec_payload() is called only after the command is
>> successful and does the following checks:
>>
>> 1) Verifies the header
>> 2) Decrypts the payload
>> 3) Later we increment the sequence
>>
>> If we rearrange to the order below then we can avoid this condition:
>> 1) Decrypt the payload
>> 2) Increment the sequence number
>> 3) Verify the header
>>
>> The decryption will succeed only if the PSP constructed the payload.
>>
>> Does this make sense?
>
> Either ordering seems fine to me. I don't think it changes much though
> since the header (bytes 30-50 according to the spec) are included in
> the authenticated data of the encryption. So any hypervisor modifications
> will lead to a decryption failure right?
>
> In either case, if we do fail the decryption, what are your thoughts on
> not allowing further use of that VMPCK?
>
We have a limited number of VMPCKs (three in total). I am not sure switching
to a different one will change much; the HV can quickly exhaust them. Once we
have an SVSM in place, it's possible that the SVSM may make use of a VMPCK.
If the decryption fails, then maybe it's safe to erase the key from the
secrets page (in other words, the guest OS cannot use that key for any
further communication). A guest can reload the driver with a different
VMPCK id and try again.
thanks
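The ordering change proposed above (decrypt first, then bump the sequence counter, then sanity-check the header) can be sketched as a small userspace model. This is not the driver code: the AEAD step is stubbed out and every name here (model_dev, handle_response, stub_decrypt) is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the driver state; all names are invented. */
struct model_dev {
	uint32_t msg_seqno;   /* OS half of the secrets-page counter   */
	bool     vmpck_valid; /* cleared once the channel is poisoned  */
};

/* Stubbed AEAD open: succeeds only when the "firmware" built the reply. */
static int stub_decrypt(bool reply_from_psp)
{
	return reply_from_psp ? 0 : -1;
}

/*
 * Proposed ordering: 1) decrypt, 2) increment the sequence counter,
 * 3) verify the header.  A decryption failure means we cannot know the
 * PSP's counter state, so the key is retired rather than risking IV reuse.
 */
static int handle_response(struct model_dev *dev, bool reply_from_psp,
			   uint32_t resp_seqno, uint32_t req_seqno)
{
	if (!dev->vmpck_valid)
		return -1;

	if (stub_decrypt(reply_from_psp)) {
		dev->vmpck_valid = false; /* never reuse this key/IV pair */
		return -1;
	}

	dev->msg_seqno += 2; /* the PSP consumed two sequence numbers */

	return (resp_seqno == req_seqno + 1) ? 0 : -2;
}
```

The point the model makes is that the counter only advances once the AEAD open proves the PSP built the reply, so a tampered response can never cause a counter/IV reuse for the next request.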
On Wed, Oct 27, 2021 at 3:13 PM Brijesh Singh <[email protected]> wrote:
>
>
>
> On 10/27/21 4:05 PM, Peter Gonda wrote:
> ....
>
> >>>>>
> >>>>> Thanks for updating this sequence number logic. But I still have some
> >>>>> concerns. In verify_and_dec_payload() we check the encryption header
> >>>>> but all these fields are accessible to the hypervisor, meaning it can
> >>>>> change the header and cause this sequence number to not get
> >>>>> incremented. We then will reuse the sequence number for the next
> >>>>> command, which isn't great for AES GCM. It seems very hard to tell if
> >>>>> the FW actually got our request and created a response there by
> >>>>> incrementing the sequence number by 2, or if the hypervisor is acting
> >>>>> in bad faith. It seems like to be safe we need to completely stop
> >>>>> using this vmpck if we cannot confirm the PSP has gotten our request
> >>>>> and created a response. Thoughts?
> >>>>>
> >>>>
> >>>> Very good point, I think we can detect this condition by rearranging the
> >>>> checks. The verify_and_dec_payload() is called only after the command is
> >>>> succesful and does the following checks
> >>>>
> >>>> 1) Verifies the header
> >>>> 2) Decrypts the payload
> >>>> 3) Later we increment the sequence
> >>>>
> >>>> If we arrange to the below order then we can avoid this condition.
> >>>> 1) Decrypt the payload
> >>>> 2) Increment the sequence number
> >>>> 3) Verify the header
> >>>>
> >>>> The descryption will succeed only if PSP constructed the payload.
> >>>>
> >>>> Does this make sense ?
> >>>
> >>> Either ordering seems fine to me. I don't think it changes much though
> >>> since the header (bytes 30-50 according to the spec) are included in
> >>> the authenticated data of the encryption. So any hypervisor modictions
> >>> will lead to a decryption failure right?
> >>>
> >>> Either case if we do fail the decryption, what are your thoughts on
> >>> not allowing further use of that VMPCK?
> >>>
> >>
> >> We have limited number of VMPCK (total 3). I am not sure switching to
> >> different will change much. HV can quickly exaust it. Once we have SVSM
> >> in-place then its possible that SVSM may use of the VMPCK. If the
> >> decryption failed, then maybe its safe to erase the key from the secrets
> >> page (in other words guest OS cannot use that key for any further
> >> communication). A guest can reload the driver will different VMPCK id
> >> and try again.
> >
> > SNP cannot really cover DOS at all since the VMM could just never
> > schedule the VM. In this case we know that the hypervisor is trying to
> > mess with the guest, so my preference would be to stop sending guest
> > messages to prevent that duplicated IV usage. If one caller gets an
> > EBADMSG it knows its in this case but the rest of userspace has no
> > idea. Maybe log an error?
> >
> >>
>
> Yap, we cannot protect against the DOS. This is why I was saying that we
> zero the key from secrets page so that guest cannot use that key for any
> future communication (whether its from rest of userspace or kexec
> kernel). I can update the driver to log the message and ensure that
> future messages will *not* use that key. The VMPCK ID is a module
> params, so a guest can reload the driver to use different VMPCK.
Duh! Sorry I thought you said we needed a VMPL0 SVSM to do that. That
sounds great.
>
>
> >> thanks
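The key-wiping idea discussed above (zero the VMPCK in the secrets page after a decryption failure so no later consumer, including userspace or a kexec'd kernel, can reuse it) amounts to very little code. A minimal sketch, with a mock secrets-page slice and invented names; the 32-byte key length matches the SNP ABI, but the four-slot array size is illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMPCK_KEY_LEN 32  /* each VMPCK is 256 bits per the SNP ABI */

/* Mock of the relevant slice of the secrets page; layout is illustrative. */
struct mock_secrets {
	uint8_t vmpck[4][VMPCK_KEY_LEN];
};

/* Returns true if the key still looks usable (not all-zero). */
static bool vmpck_usable(const struct mock_secrets *s, unsigned int id)
{
	static const uint8_t zero[VMPCK_KEY_LEN];

	return id < 4 && memcmp(s->vmpck[id], zero, VMPCK_KEY_LEN) != 0;
}

/* Wipe the key so no later consumer (userspace, kexec kernel) can use it. */
static void snp_disable_vmpck(struct mock_secrets *s, unsigned int id)
{
	if (id < 4)
		memset(s->vmpck[id], 0, VMPCK_KEY_LEN);
}
```

Since the firmware treats an all-zero VMPCK slot as unusable, zeroing doubles as both the wipe and the "channel disabled" marker that survives a driver reload or kexec.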
On Fri, Oct 08, 2021 at 01:04:22PM -0500, Brijesh Singh wrote:
> +static bool is_vmpl0(void)
> +{
> + u64 attrs, va;
That local variable va is not needed.
> + int err;
> +
> + /*
> + * There is no straightforward way to query the current VMPL level. The
> + * simplest method is to use the RMPADJUST instruction to change a page
> + * permission to a VMPL level-1, and if the guest kernel is launched at
> + * a level <= 1, then RMPADJUST instruction will return an error.
> + */
> + attrs = 1;
> +
> + /*
> + * Any page aligned virtual address is sufficent to test the VMPL level.
"page-aligned" ... "sufficient"
> + * The boot_ghcb_page is page aligned memory, so lets use for the test.
> + */
> + va = (u64)&boot_ghcb_page;
> +
> + /* Instruction mnemonic supported in binutils versions v2.36 and later */
> + asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
> + : "=a" (err)
> + : "a" (va), "c" (RMP_PG_SIZE_4K), "d" (attrs)
> + : "memory", "cc");
You're adding a separate rmpadjust() primitive function in patch 24.
In order to avoid duplication, define that primitive first, just like
you've done for PVALIDATE in the previous patch and use said primitive
at both call sites.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
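Boris's suggestion would look roughly like the fragment below: a shared rmpadjust() primitive, mirroring the PVALIDATE wrapper pattern, used by is_vmpl0(). This is a sketch of the suggested refactor, not the exact code from patch 24; boot_ghcb_page and RMP_PG_SIZE_4K are taken from the quoted patch above.

```c
/* Raw RMPADJUST, opcode-encoded for pre-2.36 binutils; returns rc in eax. */
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize,
			    unsigned long attrs)
{
	int rc;

	asm volatile(".byte 0xf3,0x0f,0x01,0xfe\n\t"
		     : "=a" (rc)
		     : "a" (vaddr), "c" (rmp_psize), "d" (attrs)
		     : "memory", "cc");

	return rc;
}

static bool is_vmpl0(void)
{
	/*
	 * RMPADJUST modifies the permissions of a lesser-privileged
	 * (higher-numbered) VMPL and fails if the guest is not running
	 * at VMPL0. Any page-aligned address works for the probe.
	 */
	return rmpadjust((unsigned long)&boot_ghcb_page, RMP_PG_SIZE_4K, 1) == 0;
}
```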
On Fri, Oct 08, 2021 at 01:04:24PM -0500, Brijesh Singh wrote:
> The SEV-SNP guest is required to perform GHCB GPA registration. This is
Let's write that out and make this more readable for outsiders too:
"... is required by the GHCB spec to register the GHCB's Guest Physical
Address (GPA)."
> because the hypervisor may prefer that a guest use a consistent and/or
> specific GPA for the GHCB associated with a vCPU. For more information,
> see the GHCB specification.
Giving a section in that spec would be helpful.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 08, 2021 at 01:04:25PM -0500, Brijesh Singh wrote:
> + /* SEV-SNP guest requires that GHCB must be registered. */
> + if (cc_platform_has(CC_ATTR_SEV_SNP))
> + snp_register_ghcb(data, __pa(ghcb));
This looks like more of that "let's register a GHCB at the time the
first #VC fires".
And there already is setup_ghcb() which is called in the #VC handler.
And that thing registers a GHCB GPA.
But then you have to do it here again.
I think this should be changed together with the CPUID page detection
stuff we talked about earlier, where, after you've established that this
is an SNP guest, you call setup_ghcb() *once* and after that you have
everything set up, including the GHCB GPA. And then the #VC exceptions
can come.
Right?
Or is there a chicken-and-an-egg issue here which I'm not thinking
about?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 08, 2021 at 01:04:26PM -0500, Brijesh Singh wrote:
> From: Borislav Petkov <[email protected]>
>
> There's a perfectly fine prototype in the asm/setup.h header. Use it.
>
> No functional changes.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
Right, for the next and all future submissions, it is a lot easier if
you put all fixes and cleanups and code reorganizations at the beginning
of the patchset. Because then they can simply get applied earlier -
they're useful cleanups and fixes, after all - and this way you'll
unload some of the patches quicker and have to deal with a smaller set.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris,
On 11/2/21 11:53 AM, Borislav Petkov wrote:
> On Fri, Oct 08, 2021 at 01:04:25PM -0500, Brijesh Singh wrote:
>> + /* SEV-SNP guest requires that GHCB must be registered. */
>> + if (cc_platform_has(CC_ATTR_SEV_SNP))
>> + snp_register_ghcb(data, __pa(ghcb));
>
> This looks like more of that "let's register a GHCB at the time the
> first #VC fires".
>
There are two #VC handlers:
1) early exception handler [do_vc_no_ghcb()]. The handler uses the MSR
protocol based VMGEXIT.
https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/sev-shared.c#L147
2) exception handler setup during the idt bringup
[handle_vc_boot_ghcb()]. The handler uses the full GHCB.
https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/sev.c#L1472
To answer your question, the GHCB is registered at the time of the first #VC
handling by the second exception handler. Mike can correct me, but the CPUID
page check is going to happen on the first #VC handling inside the early
exception handler (i.e., case 1). The early exception handler uses the MSR
protocol, so there is no need to register the GHCB page. Before
registering the page we need to map it unencrypted.
> And there already is setup_ghcb() which is called in the #VC handler.
> And that thing registers a GHCB GPA.
>
There are two cases that need to be covered 1) BSP GHCB page and 2) APs
GHCB page. The setup_ghcb() is called for the BSP. Later on, per-cpu
GHCB page is used by the APs. APs need to register their GHCB page
before using it.
> But then you have to do it here again.
>
> I think this should be changed together with the CPUID page detection
> stuff we talked about earlier, where, after you've established that this
> is an SNP guest, you call setup_ghcb() *once* and after that you have
> everything set up, including the GHCB GPA. And then the #VC exceptions
> can come.
See if my above explanation makes sense. Based on it, I don't think it
makes sense to register the GHCB during the CPUID page detection. The
CPUID page detection will occur in early #VC handling.
> Right?
>
> Or is there a chicken-and-an-egg issue here which I'm not thinking
> about?
>
On Tue, Nov 02, 2021 at 01:24:01PM -0500, Brijesh Singh wrote:
> To answer your question, GHCB is registered at the time of first #VC
> handling by the second exception handler.
And this is what I don't like - register at use. Instead of init
everything *before* use.
> Mike can correct me, the CPUID page check is going to happen on first
> #VC handling inside the early exception handler (i.e case 1).
What is the "CPUID page check"?
And no, you don't want to do any detection when an exception happens -
you want to detect *everything* *first* and then do exceptions.
> See if my above explanation make sense. Based on it, I don't think it
> makes sense to register the GHCB during the CPUID page detection. The
> CPUID page detection will occur in early VC handling.
See above. If this needs more discussion, we can talk on IRC.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris,
On 11/2/21 1:44 PM, Borislav Petkov wrote:
> On Tue, Nov 02, 2021 at 01:24:01PM -0500, Brijesh Singh wrote:
>> To answer your question, GHCB is registered at the time of first #VC
>> handling by the second exception handler.
>
> And this is what I don't like - register at use. Instead of init
> everything *before* use.
>
>> Mike can correct me, the CPUID page check is going to happen on first
>> #VC handling inside the early exception handler (i.e case 1).
>
> What is the "CPUID page check"?
>
> And no, you don't want to do any detection when an exception happens -
> you want to detect *everything* *first* and then do exceptions.
>
>> See if my above explanation make sense. Based on it, I don't think it
>> makes sense to register the GHCB during the CPUID page detection. The
>> CPUID page detection will occur in early VC handling.
>
> See above. If this needs more discussion, we can talk on IRC.
>
Looking at the secondary CPU bring-up path, it seems that we will not be
getting a #VC until early_setup_idt() is called. I am thinking of adding a
function to register the GHCB from early_setup_idt():
early_setup_idt()
{
...
if (IS_ENABLED(CONFIG_MEM_ENCRYPT))
sev_snp_register_ghcb()
...
}
The above will cover the APs, and for the BSP case I can call the same
function just after the final IDT is loaded:
cpu_init_exception_handling()
{
...
...
/* Finally load the IDT */
load_current_idt();
if (IS_ENABLED(CONFIG_MEM_ENCRYPT))
sev_snp_register_ghcb()
}
Please let me know if something like above is acceptable.
thanks
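For reference, the registration that a sev_snp_register_ghcb() helper would perform goes over the GHCB MSR protocol, and the request/response encoding is pure bit manipulation that can be sketched and checked in isolation. The constants below follow my reading of the GHCB spec v2 (GHCBInfo 0x012 for the GPA registration request, 0x013 for the response, GFN in bits 63:12); verify them against the spec before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* GHCB MSR protocol: low 12 bits = GHCBInfo, bits 63:12 = GHCBData (GFN). */
#define GHCB_MSR_INFO_MASK    0xfffULL
#define GHCB_MSR_REG_GPA_REQ  0x012ULL /* GPA registration request  */
#define GHCB_MSR_REG_GPA_RESP 0x013ULL /* GPA registration response */

/* Value to write to the GHCB MSR to request registration of this GPA. */
static uint64_t ghcb_gpa_register_req(uint64_t gpa)
{
	return (gpa & ~GHCB_MSR_INFO_MASK) | GHCB_MSR_REG_GPA_REQ;
}

/* The hypervisor must echo the same GFN back in the response. */
static int ghcb_gpa_register_ok(uint64_t resp, uint64_t gpa)
{
	return (resp & GHCB_MSR_INFO_MASK) == GHCB_MSR_REG_GPA_RESP &&
	       (resp & ~GHCB_MSR_INFO_MASK) == (gpa & ~GHCB_MSR_INFO_MASK);
}
```

Registering the same GPA twice just re-runs this exchange with identical values, which is why doing it for both the BSP and AP paths is harmless.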
On Wed, Nov 03, 2021 at 03:10:16PM -0500, Brijesh Singh wrote:
> Looking at the secondary CPU bring up path it seems that we will not be
> getting #VC until the early_setup_idt() is called. I am thinking to add
> function to register the GHCB from the early_setup_idt()
>
> early_setup_idt()
> {
> ...
> if (IS_ENABLED(CONFIG_MEM_ENCRYPT))
> sev_snp_register_ghcb()
> ...
> }
>
> The above will cover the APs
That will cover the APs during early boot as that is being called from
asm.
> and for BSP case I can call the same function just after the final IDT
> is loaded
Why after and not before?
> cpu_init_exception_handling()
> {
> ...
> ...
> /* Finally load the IDT */
> load_current_idt();
>
> if (IS_ENABLED(CONFIG_MEM_ENCRYPT))
> sev_snp_register_ghcb()
>
> }
That is also called on the APs - not only the BSP. trap_init() calls it
from start_kernel() which is the BSP and cpu_init_secondary() calls it
too, which is ofc the APs.
I guess that should be ok since you're calling the same function from
both but WTH do I know...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 11/4/21 8:58 AM, Borislav Petkov wrote:
> On Wed, Nov 03, 2021 at 03:10:16PM -0500, Brijesh Singh wrote:
>> Looking at the secondary CPU bring up path it seems that we will not be
>> getting #VC until the early_setup_idt() is called. I am thinking to add
>> function to register the GHCB from the early_setup_idt()
>>
>> early_setup_idt()
>> {
>> ...
>> if (IS_ENABLED(CONFIG_MEM_ENCRYPT))
>> sev_snp_register_ghcb()
>> ...
>> }
>>
>> The above will cover the APs
>
> That will cover the APs during early boot as that is being called from
> asm.
>
>> and for BSP case I can call the same function just after the final IDT
>> is loaded
>
> Why after and not before?
>
I just looked at load_current_idt() and we should not get a #VC before
loading the new IDT, so it's safe to do it before.
>> cpu_init_exception_handling()
>> {
>> ...
>> ...
>> /* Finally load the IDT */
>> load_current_idt();
>>
>> if (IS_ENABLED(CONFIG_MEM_ENCRYPT))
>> sev_snp_register_ghcb()
>>
>> }
>
> That is also called on the APs - not only the BSP. trap_init() calls it
> from start_kernel() which is the BSP and cpu_init_secondary() calls it
> too, which is ofc the APs.
>
> I guess that should be ok since you're calling the same function from
> both but WTH do I know...
>
For the AP case, we will be registering the same GHCB GPA twice; that should
not be an issue, as the GHCB spec does not restrict us from registering the
same GPA twice.
Of course, the current patch does not suffer from it. Let me know your
preference.
thanks
On November 4, 2021 3:26:56 PM UTC, Brijesh Singh <[email protected]> wrote:
>Of course, the current patch does not suffer from it. Let me know your
>preference.
Whatever keeps the code simpler.
Thx.
--
Sent from a small device: formatting sux and brevity is inevitable.
On Fri, Oct 08, 2021 at 01:04:30PM -0500, Brijesh Singh wrote:
> +static int vmgexit_psc(struct snp_psc_desc *desc)
> +{
> + int cur_entry, end_entry, ret;
> + struct snp_psc_desc *data;
> + struct ghcb_state state;
> + struct ghcb *ghcb;
> + struct psc_hdr *hdr;
> + unsigned long flags;
int cur_entry, end_entry, ret;
struct snp_psc_desc *data;
struct ghcb_state state;
struct psc_hdr *hdr;
unsigned long flags;
struct ghcb *ghcb;
that's properly sorted.
> +
> + local_irq_save(flags);
What is that protecting against? Comment about it?
Aha, __sev_get_ghcb() needs to run with IRQs disabled because it is
using the per-CPU GHCB.
> +
> + ghcb = __sev_get_ghcb(&state);
> + if (unlikely(!ghcb))
> + panic("SEV-SNP: Failed to get GHCB\n");
> +
> + /* Copy the input desc into GHCB shared buffer */
> + data = (struct snp_psc_desc *)ghcb->shared_buffer;
> + memcpy(ghcb->shared_buffer, desc, sizeof(*desc));
That shared buffer has a size - check it vs the size of the desc thing.
> +
> + hdr = &data->hdr;
Why do you need this and why can't you use data->hdr simply?
/me continues reading and realizes why
Oh no, this is tricky. The HV call will modify what @data points to and
thus @hdr will point to new contents. Only then your backwards processing
check below makes sense.
So then you *absolutely* want to use data->hdr everywhere, and also explain
in a comment above the check that data gets updated by the HV call.
> + cur_entry = hdr->cur_entry;
> + end_entry = hdr->end_entry;
> +
> + /*
> + * As per the GHCB specification, the hypervisor can resume the guest
> + * before processing all the entries. Checks whether all the entries
Check
> + * are processed. If not, then keep retrying.
> + *
> + * The stragtegy here is to wait for the hypervisor to change the page
> + * state in the RMP table before guest access the memory pages. If the
accesses
> + * page state was not successful, then later memory access will result
"If the page state *change* was not ..."
> + * in the crash.
"in a crash."
> + */
> + while (hdr->cur_entry <= hdr->end_entry) {
> + ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
> +
> + ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
This should be
ret = sev_es_ghcb_hv_call(ghcb, true, NULL, SVM_VMGEXIT_PSC, 0, 0);
as we changed it in the meantime to accommodate Hyper-V isolation VMs.
> +
> + /*
> + * Page State Change VMGEXIT can pass error code through
> + * exit_info_2.
> + */
> + if (WARN(ret || ghcb->save.sw_exit_info_2,
> + "SEV-SNP: PSC failed ret=%d exit_info_2=%llx\n",
> + ret, ghcb->save.sw_exit_info_2)) {
> + ret = 1;
That ret = 1 goes unused with that "return 0" at the end. It should be
"return ret" at the end. Ditto for the others. Audit all your exit
paths in this function.
> + goto out;
> + }
> +
> + /*
> + * Sanity check that entry processing is not going backward.
> + * This will happen only if hypervisor is tricking us.
> + */
> + if (WARN(hdr->end_entry > end_entry || cur_entry > hdr->cur_entry,
> +"SEV-SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
> + end_entry, hdr->end_entry, cur_entry, hdr->cur_entry)) {
> + ret = 1;
> + goto out;
> + }
> +
> + /* Verify that reserved bit is not set */
> + if (WARN(hdr->reserved, "Reserved bit is set in the PSC header\n")) {
Shouldn't that thing happen first after the HV call?
> + ret = 1;
> + goto out;
> + }
> + }
> +
> +out:
> + __sev_put_ghcb(&state);
> + local_irq_restore(flags);
> +
> + return 0;
> +}
> +
> +static void __set_page_state(struct snp_psc_desc *data, unsigned long vaddr,
> + unsigned long vaddr_end, int op)
> +{
> + struct psc_hdr *hdr;
> + struct psc_entry *e;
> + unsigned long pfn;
> + int i;
> +
> + hdr = &data->hdr;
> + e = data->entries;
> +
> + memset(data, 0, sizeof(*data));
> + i = 0;
> +
> + while (vaddr < vaddr_end) {
> + if (is_vmalloc_addr((void *)vaddr))
> + pfn = vmalloc_to_pfn((void *)vaddr);
> + else
> + pfn = __pa(vaddr) >> PAGE_SHIFT;
> +
> + e->gfn = pfn;
> + e->operation = op;
> + hdr->end_entry = i;
> +
> + /*
> + * The GHCB specification provides the flexibility to
> + * use either 4K or 2MB page size in the RMP table.
> + * The current SNP support does not keep track of the
> + * page size used in the RMP table. To avoid the
> + * overlap request,
"avoid overlap request"?
No clue what that means. In general, that comment is talking about
something in the future and is more confusing than explaining stuff.
> use the 4K page size in the RMP
> + * table.
> + */
> + e->pagesize = RMP_PG_SIZE_4K;
> +
> + vaddr = vaddr + PAGE_SIZE;
> + e++;
> + i++;
> + }
> +
> + if (vmgexit_psc(data))
> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
> +}
> +
> +static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
Yeah, so this should be named
set_pages_state - notice the plural "pages"
because it works on multiple pages, @npages exactly.
> +{
> + unsigned long vaddr_end, next_vaddr;
> + struct snp_psc_desc *desc;
> +
> + vaddr = vaddr & PAGE_MASK;
> + vaddr_end = vaddr + (npages << PAGE_SHIFT);
Take those two...
> +
> + desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
> + if (!desc)
> + panic("SEV-SNP: failed to allocate memory for PSC descriptor\n");
... and put them here.
<---
> +
> + while (vaddr < vaddr_end) {
> + /*
> + * Calculate the last vaddr that can be fit in one
> + * struct snp_psc_desc.
> + */
> + next_vaddr = min_t(unsigned long, vaddr_end,
> + (VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);
> +
> + __set_page_state(desc, vaddr, next_vaddr, op);
> +
> + vaddr = next_vaddr;
> + }
> +
> + kfree(desc);
> +}
> +
> +void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
> +{
> + if (!cc_platform_has(CC_ATTR_SEV_SNP))
> + return;
> +
> + pvalidate_pages(vaddr, npages, 0);
> +
> + set_page_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
> +}
> +
> +void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
> +{
> + if (!cc_platform_has(CC_ATTR_SEV_SNP))
> + return;
> +
> + set_page_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
> +
> + pvalidate_pages(vaddr, npages, 1);
> +}
> +
> int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
> {
> u16 startup_cs, startup_ip;
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 527957586f3c..ffe51944606a 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -30,6 +30,7 @@
> #include <asm/proto.h>
> #include <asm/memtype.h>
> #include <asm/set_memory.h>
> +#include <asm/sev.h>
>
> #include "../mm_internal.h"
>
> @@ -2010,8 +2011,22 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> */
> cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>
> + /*
> + * To maintain the security gurantees of SEV-SNP guest invalidate the memory
"guarantees"
Your spellchecker broke again.
> + * before clearing the encryption attribute.
> + */
> + if (!enc)
> + snp_set_memory_shared(addr, numpages);
> +
> ret = __change_page_attr_set_clr(&cpa, 1);
>
> + /*
> + * Now that memory is mapped encrypted in the page table, validate it
> + * so that is consistent with the above page state.
> + */
> + if (!ret && enc)
> + snp_set_memory_private(addr, numpages);
> +
> /*
> * After changing the encryption attribute, we need to flush TLBs again
> * in case any speculative TLB caching occurred (but no need to flush
> --
> 2.25.1
>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 11/9/21 1:34 PM, Borislav Petkov wrote:
> On Fri, Oct 08, 2021 at 01:04:30PM -0500, Brijesh Singh wrote:
>> +static int vmgexit_psc(struct snp_psc_desc *desc)
>> +{
>> + int cur_entry, end_entry, ret;
>> + struct snp_psc_desc *data;
>> + struct ghcb_state state;
>> + struct ghcb *ghcb;
>> + struct psc_hdr *hdr;
>> + unsigned long flags;
>
> int cur_entry, end_entry, ret;
> struct snp_psc_desc *data;
> struct ghcb_state state;
> struct psc_hdr *hdr;
> unsigned long flags;
> struct ghcb *ghcb;
>
> that's properly sorted.
>
Noted.
>> +
>> + local_irq_save(flags);
>
> What is that protecting against? Comment about it?
>
> Aha, __sev_get_ghcb() needs to run with IRQs disabled because it is
> using the per-CPU GHCB.
>
I will add a comment to clarify it.
>> +
>> + ghcb = __sev_get_ghcb(&state);
>> + if (unlikely(!ghcb))
>> + panic("SEV-SNP: Failed to get GHCB\n");
>> +
>> + /* Copy the input desc into GHCB shared buffer */
>> + data = (struct snp_psc_desc *)ghcb->shared_buffer;
>> + memcpy(ghcb->shared_buffer, desc, sizeof(*desc));
>
> That shared buffer has a size - check it vs the size of the desc thing.
>
I am assuming you mean add some compile-time check to ensure that desc
will fit in the shared buffer?
...
>> + if (WARN(ret || ghcb->save.sw_exit_info_2,
>> + "SEV-SNP: PSC failed ret=%d exit_info_2=%llx\n",
>> + ret, ghcb->save.sw_exit_info_2)) {
>> + ret = 1;
>
> That ret = 1 goes unused with that "return 0" at the end. It should be
> "return ret" at the end. Ditto for the others. Audit all your exit
> paths in this function.
Noted.
>> +
>> + /* Verify that reserved bit is not set */
>> + if (WARN(hdr->reserved, "Reserved bit is set in the PSC header\n")) {
>
> Shouldn't that thing happen first after the HV call?
>
I am okay with moving this check before the going-backward check.
>> +
>> + /*
>> + * The GHCB specification provides the flexibility to
>> + * use either 4K or 2MB page size in the RMP table.
>> + * The current SNP support does not keep track of the
>> + * page size used in the RMP table. To avoid the
>> + * overlap request,
>
> "avoid overlap request"?
>
> No clue what that means. In general, that comment is talking about
> something in the future and is more confusing than explaining stuff.
>
I can drop the overlap comment to avoid the confusion; as you pointed out,
it is more of a future thing. Basically, the overlap is the condition below:
set_memory_private(gfn=0, page_size=2m)
set_memory_private(gfn=10, page_size=4k)
The RMPUPDATE instruction will detect the overlap on the second call and
return an error to the guest. After we add support to track the page
validation state (either in a bitmap or a page flag), the second call will
not be issued, thus avoiding overlap errors. For now, we use page_size=4k
for all the page state changes from the kernel.
>> use the 4K page size in the RMP
>> + * table.
>> + */
>> + e->pagesize = RMP_PG_SIZE_4K;
>> +
>> + vaddr = vaddr + PAGE_SIZE;
>> + e++;
>> + i++;
>> + }
>> +
>> + if (vmgexit_psc(data))
>> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
>> +}
>> +
>> +static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
>
> Yeah, so this should be named
>
> set_pages_state - notice the plural "pages"
>
> because it works on multiple pages, @npages exactly.
>
Ah, I thought I had it as pages, but maybe it got renamed somewhere back in
the series.
>> +{
>> + unsigned long vaddr_end, next_vaddr;
>> + struct snp_psc_desc *desc;
>> +
>> + vaddr = vaddr & PAGE_MASK;
>> + vaddr_end = vaddr + (npages << PAGE_SHIFT);
>
> Take those two...
>
>> +
>> + desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
>> + if (!desc)
>> + panic("SEV-SNP: failed to allocate memory for PSC descriptor\n");
>
>
> ... and put them here.
>
Noted.
thanks
On Wed, Nov 10, 2021 at 08:21:21AM -0600, Brijesh Singh wrote:
> I am assuming you mean add some compile-time check to ensure that desc
> will fit in the shared buffer?
No:
struct ghcb {
...
u8 shared_buffer[2032];
so that memcpy needs to do:
memcpy(ghcb->shared_buffer, desc, min_t(int, 2032, sizeof(*desc)));
with that 2032 behind a proper define, ofc.
> I can drop the overlap comment to avoid the confusion; as you pointed out,
> it is more of a future thing. Basically, the overlap is the condition below:
>
> set_memory_private(gfn=0, page_size=2m)
> set_memory_private(gfn=10, page_size=4k)
>
> The RMPUPDATE instruction will detect the overlap on the second call and
> return an error to the guest. After we add support to track the page
> validation state (either in a bitmap or a page flag), the second call will
> not be issued, thus avoiding overlap errors. For now, we use page_size=4k
> for all the page state changes from the kernel.
Yah, sounds like the comment is not needed now. You could put this in
the commit message, though.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 11/10/21 12:43 PM, Borislav Petkov wrote:
> On Wed, Nov 10, 2021 at 08:21:21AM -0600, Brijesh Singh wrote:
>> I am assuming you mean add some compile time check to ensure that desc will
>> fit in the shared buffer ?
>
> No:
>
> struct ghcb {
>
> ...
>
> u8 shared_buffer[2032];
>
> so that memcpy needs to do:
>
> memcpy(ghcb->shared_buffer, desc, min_t(int, 2032, sizeof(*desc)));
>
> with that 2032 behind a proper define, ofc.
2032 => sizeof(ghcb->shared_buffer) ?
The idea is that a full snp_psc_desc structure is meant to fit completely
in the shared_buffer area. So if there are no compile time checks, then
the code on the HV side will need to ensure that the input doesn't cause
the HV to access the structure outside of the shared_buffer area - which,
IIRC, it does (think protect against a malicious guest), so the min_t() on
the memcpy should be safe on the guest side.
But given the snp_psc_desc is sized/meant to fit completely in the
shared_buffer, a compile time check would be a good idea, too, right?
Thanks,
Tom
>
>> I can drop the overlap comment to avoid the confusion; as you pointed out,
>> it is more of a future thing. Basically, the overlap is the condition below:
>>
>> set_memory_private(gfn=0, page_size=2m)
>> set_memory_private(gfn=10, page_size=4k)
>>
>> The RMPUPDATE instruction will detect the overlap on the second call and
>> return an error to the guest. After we add support to track the page
>> validation state (either in a bitmap or a page flag), the second call will
>> not be issued, thus avoiding overlap errors. For now, we use page_size=4k
>> for all the page state changes from the kernel.
>
> Yah, sounds like the comment is not needed now. You could put this in
> the commit message, though.
>
> Thx.
>
On Thu, Nov 11, 2021 at 08:49:49AM -0600, Tom Lendacky wrote:
> 2032 => sizeof(ghcb->shared_buffer) ?
Or that.
> The idea is that a full snp_psc_desc structure is meant to fit completely in
> the shared_buffer area. So if there are no compile time checks, then the
> code on the HV side will need to ensure that the input doesn't cause the HV
> to access the structure outside of the shared_buffer area - which, IIRC, it
> does (think protect against a malicious guest), so the min_t() on the memcpy
> should be safe on the guest side.
>
> But given the snp_psc_desc is sized/meant to fit completely in the
> shared_buffer, a compile time check would be a good idea, too, right?
If the desc thing is meant to fit, then a compile-time check is also a
good way to express that intention. So yeah.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette