This part of the Secure Nested Paging (SEV-SNP) series focuses on the changes
required in a guest OS for SEV-SNP support.
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as data
replay, memory re-mapping, and more, in order to create an isolated memory
encryption environment.
This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by SEV-SNP,
such as interrupt protection.
Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to an SEV-SNP
VM requires a two-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to the
guest-invalid state. Second, the guest validates the page using the new
PVALIDATE instruction. SEV-SNP VMs can use the new "Page State Change Request
NAE" defined in the GHCB specification to ask the hypervisor to add or remove
pages from the RMP table.
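As an illustration, below is a minimal guest-side sketch of this two-step flow
using the MSR-based Page State Change protocol and the pvalidate()/termination
helpers added later in this series. snp_make_page_private() is purely
illustrative and the check of the PSC response code is elided here:

  /*
   * Sketch only: make one 4K page private and validate it. A real caller
   * must also check the GHCB_MSR_PSC_RESP response from the hypervisor.
   */
  static void __init snp_make_page_private(unsigned long vaddr, unsigned long paddr)
  {
          /* Step 1: ask the hypervisor to assign the GPA as private in the RMP. */
          sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT,
                                                  SNP_PAGE_STATE_PRIVATE));
          VMGEXIT();

          /* Step 2: the guest validates the page, setting Validated in its RMP entry. */
          if (pvalidate(vaddr, RMP_PG_SIZE_4K, true))
                  sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
  }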
Each page assigned to an SEV-SNP VM can be either validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for page validation: pre-validation and
lazy validation.
Under pre-validation, pages are validated prior to first use; under
lazy validation, pages are validated when first accessed. An access to an
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the guest OS.
At this time only pre-validation is supported: the OVMF guest BIOS validates
the entire RAM before control is handed over to the guest kernel.
The early_set_memory_{encrypt,decrypt}() and set_memory_{encrypt,decrypt}()
helpers are enlightened to perform page validation or invalidation while
setting or clearing the encryption attribute in the page tables.
This series does not yet provide support for interrupt security; that will
be added after the base support.
The series is based on tip/master
f6a71a5ebe23 (origin/master, origin/HEAD) Merge branch 'locking/core'
Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf
(section 15.36)
GHCB spec:
https://developer.amd.com/wp-content/resources/56421.pdf
SEV-SNP firmware specification:
https://developer.amd.com/sev/
Changes since v4:
* Address the cpuid specific review comment
* Simplified the macro based on the review feedback
* Move macro definition to the patch that needs it
* Fix the issues reported by checkpatch
* Address the AP creation specific review comment
Changes since v3:
* Add support to use the PSP filtered CPUID.
* Add support for the extended guest request.
* Move the sevguest driver to drivers/virt/coco.
* Add documentation for the sevguest ioctl.
* Add support to check for VMPL0.
* Pass the VM encryption key and ID to be used for encrypting guest messages
  through the platform driver data.
* Multiple cleanups and fixes to address the review feedback.
Changes since v2:
* Add support for AP startup using SNP specific vmgexit.
* Add snp_prep_memory() helper.
* Drop sev_snp_active() helper.
* Add sev_feature_enabled() helper to check which SEV feature is active.
* Sync the SNP guest message request header with latest SNP FW spec.
* Multiple cleanups and fixes to address the review feedback.
Changes since v1:
* Integrate the SNP support into sev.{ch}.
* Add support to query the hypervisor features and detect whether SNP is supported.
* Define Linux-specific reason codes for the SNP guest termination.
* Extend the setup_header to provide a way for the hypervisor to pass the secrets
  and CPUID pages.
* Add support to create a platform device and driver to query the attestation report
  and derive a key.
* Multiple cleanups and fixes to address Boris's review feedback.
Borislav Petkov (2):
x86/sev: Get rid of excessive use of defines
x86/head64: Carve out the guest encryption postprocessing into a
helper
Brijesh Singh (23):
x86/mm: Add sev_feature_enabled() helper
x86/sev: Shorten GHCB terminate macro names
x86/sev: Define the Linux specific guest termination reasons
x86/sev: Save the negotiated GHCB version
x86/sev: Add support for hypervisor feature VMGEXIT
x86/sev: Check SEV-SNP features support
x86/sev: Add a helper for the PVALIDATE instruction
x86/sev: Check the vmpl level
x86/compressed: Add helper for validating pages in the decompression
stage
x86/compressed: Register GHCB memory when SEV-SNP is active
x86/sev: Register GHCB memory when SEV-SNP is active
x86/sev: Add helper for validating pages in early enc attribute
changes
x86/kernel: Make the bss.decrypted section shared in RMP table
x86/kernel: Validate rom memory before accessing when SEV-SNP is
active
x86/mm: Add support to validate memory when changing C-bit
KVM: SVM: Define sev_features and vmpl field in the VMSA
x86/boot: Add Confidential Computing type to setup_data
x86/sev: Provide support for SNP guest request NAEs
x86/sev: Add snp_msg_seqno() helper
x86/sev: Register SNP guest request platform device
virt: Add SEV-SNP guest driver
virt: sevguest: Add support to derive key
virt: sevguest: Add support to get extended report
Michael Roth (9):
x86/head/64: set up a startup %gs for stack protector
x86/sev: move MSR-based VMGEXITs for CPUID to helper
KVM: x86: move lookup of indexed CPUID leafs to helper
x86/compressed/acpi: move EFI config table access to common code
x86/compressed/64: enable SEV-SNP-validated CPUID in #VC handler
x86/boot: add a pointer to Confidential Computing blob in bootparams
x86/compressed/64: store Confidential Computing blob address in
bootparams
x86/compressed/64: add identity mapping for Confidential Computing
blob
x86/sev: enable SEV-SNP-validated CPUID in #VC handlers
Tom Lendacky (4):
KVM: SVM: Create a separate mapping for the SEV-ES save area
KVM: SVM: Create a separate mapping for the GHCB save area
KVM: SVM: Update the SEV-ES save area mapping
x86/sev: Use SEV-SNP AP creation to start secondary CPUs
Documentation/virt/coco/sevguest.rst | 109 ++++
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/acpi.c | 113 +---
arch/x86/boot/compressed/efi.c | 179 ++++++
arch/x86/boot/compressed/head_64.S | 1 +
arch/x86/boot/compressed/ident_map_64.c | 36 +-
arch/x86/boot/compressed/idt_64.c | 7 +-
arch/x86/boot/compressed/misc.h | 50 ++
arch/x86/boot/compressed/sev.c | 115 +++-
arch/x86/include/asm/bootparam_utils.h | 1 +
arch/x86/include/asm/cpuid.h | 26 +
arch/x86/include/asm/mem_encrypt.h | 10 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/realmode.h | 1 +
arch/x86/include/asm/setup.h | 5 +-
arch/x86/include/asm/sev-common.h | 130 ++++-
arch/x86/include/asm/sev.h | 73 ++-
arch/x86/include/asm/svm.h | 167 +++++-
arch/x86/include/uapi/asm/bootparam.h | 4 +-
arch/x86/include/uapi/asm/svm.h | 13 +
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/head64.c | 103 +++-
arch/x86/kernel/head_64.S | 6 +-
arch/x86/kernel/probe_roms.c | 13 +-
arch/x86/kernel/setup.c | 3 +
arch/x86/kernel/sev-internal.h | 12 +
arch/x86/kernel/sev-shared.c | 628 +++++++++++++++++++--
arch/x86/kernel/sev.c | 720 +++++++++++++++++++++++-
arch/x86/kernel/smpboot.c | 5 +
arch/x86/kvm/cpuid.c | 17 +-
arch/x86/kvm/svm/sev.c | 24 +-
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/mm/mem_encrypt.c | 65 ++-
arch/x86/mm/pat/set_memory.c | 15 +
drivers/virt/Kconfig | 3 +
drivers/virt/Makefile | 1 +
drivers/virt/coco/sevguest/Kconfig | 9 +
drivers/virt/coco/sevguest/Makefile | 2 +
drivers/virt/coco/sevguest/sevguest.c | 622 ++++++++++++++++++++
drivers/virt/coco/sevguest/sevguest.h | 63 +++
include/linux/efi.h | 1 +
include/linux/sev-guest.h | 90 +++
include/uapi/linux/sev-guest.h | 81 +++
44 files changed, 3287 insertions(+), 247 deletions(-)
create mode 100644 Documentation/virt/coco/sevguest.rst
create mode 100644 arch/x86/boot/compressed/efi.c
create mode 100644 arch/x86/include/asm/cpuid.h
create mode 100644 arch/x86/kernel/sev-internal.h
create mode 100644 drivers/virt/coco/sevguest/Kconfig
create mode 100644 drivers/virt/coco/sevguest/Makefile
create mode 100644 drivers/virt/coco/sevguest/sevguest.c
create mode 100644 drivers/virt/coco/sevguest/sevguest.h
create mode 100644 include/linux/sev-guest.h
create mode 100644 include/uapi/linux/sev-guest.h
--
2.17.1
The sev_feature_enabled() helper can be used by the guest to query whether
the SNP (Secure Nested Paging) feature is active.
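For example, a caller could gate SNP-specific work on the helper; a minimal
sketch (snp_setup() here is a hypothetical SNP-only setup routine):

  if (sev_feature_enabled(SEV_SNP))
          snp_setup();            /* hypothetical SNP-only setup */
  else if (sev_feature_enabled(SEV_ES))
          pr_info("SEV-ES active without SNP\n");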
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 8 ++++++++
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/mm/mem_encrypt.c | 14 ++++++++++++++
3 files changed, 24 insertions(+)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9c80c68d75b5..df14291d65de 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -16,6 +16,12 @@
#include <asm/bootparam.h>
+enum sev_feature_type {
+ SEV,
+ SEV_ES,
+ SEV_SNP
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern u64 sme_me_mask;
@@ -53,6 +59,7 @@ void __init sev_es_init_vc_handling(void);
bool sme_active(void);
bool sev_active(void);
bool sev_es_active(void);
+bool sev_feature_enabled(unsigned int feature_type);
#define __bss_decrypted __section(".bss..decrypted")
@@ -85,6 +92,7 @@ static inline int __init
early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
static inline void mem_encrypt_free_decrypted_mem(void) { }
+static bool sev_feature_enabled(unsigned int feature_type) { return false; }
#define __bss_decrypted
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c413432b33..37589da0282e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -481,8 +481,10 @@
#define MSR_AMD64_SEV 0xc0010131
#define MSR_AMD64_SEV_ENABLED_BIT 0
#define MSR_AMD64_SEV_ES_ENABLED_BIT 1
+#define MSR_AMD64_SEV_SNP_ENABLED_BIT 2
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
+#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ff08dc463634..63e7799a9a86 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -389,6 +389,16 @@ bool noinstr sev_es_active(void)
return sev_status & MSR_AMD64_SEV_ES_ENABLED;
}
+bool sev_feature_enabled(unsigned int type)
+{
+ switch (type) {
+ case SEV: return sev_status & MSR_AMD64_SEV_ENABLED;
+ case SEV_ES: return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+ case SEV_SNP: return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+ default: return false;
+ }
+}
+
/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
bool force_dma_unencrypted(struct device *dev)
{
@@ -461,6 +471,10 @@ static void print_mem_encrypt_feature_info(void)
if (sev_es_active())
pr_cont(" SEV-ES");
+ /* Secure Nested Paging */
+ if (sev_feature_enabled(SEV_SNP))
+ pr_cont(" SEV-SNP");
+
pr_cont("\n");
}
--
2.17.1
From: Borislav Petkov <[email protected]>
Remove all the defines of masks and bit positions for the GHCB MSR
protocol and use comments instead which correspond directly to the spec
so that following those can be a lot easier and straightforward with the
spec opened in parallel to the code.
Align vertically while at it.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/sev-common.h | 51 +++++++++++++++++--------------
1 file changed, 28 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 855b0ec9c4e8..aac44c3f839c 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -18,20 +18,19 @@
/* SEV Information Request/Response */
#define GHCB_MSR_SEV_INFO_RESP 0x001
#define GHCB_MSR_SEV_INFO_REQ 0x002
-#define GHCB_MSR_VER_MAX_POS 48
-#define GHCB_MSR_VER_MAX_MASK 0xffff
-#define GHCB_MSR_VER_MIN_POS 32
-#define GHCB_MSR_VER_MIN_MASK 0xffff
-#define GHCB_MSR_CBIT_POS 24
-#define GHCB_MSR_CBIT_MASK 0xff
-#define GHCB_MSR_SEV_INFO(_max, _min, _cbit) \
- ((((_max) & GHCB_MSR_VER_MAX_MASK) << GHCB_MSR_VER_MAX_POS) | \
- (((_min) & GHCB_MSR_VER_MIN_MASK) << GHCB_MSR_VER_MIN_POS) | \
- (((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) | \
+
+#define GHCB_MSR_SEV_INFO(_max, _min, _cbit) \
+ /* GHCBData[63:48] */ \
+ ((((_max) & 0xffff) << 48) | \
+ /* GHCBData[47:32] */ \
+ (((_min) & 0xffff) << 32) | \
+ /* GHCBData[31:24] */ \
+ (((_cbit) & 0xff) << 24) | \
GHCB_MSR_SEV_INFO_RESP)
+
#define GHCB_MSR_INFO(v) ((v) & 0xfffUL)
-#define GHCB_MSR_PROTO_MAX(v) (((v) >> GHCB_MSR_VER_MAX_POS) & GHCB_MSR_VER_MAX_MASK)
-#define GHCB_MSR_PROTO_MIN(v) (((v) >> GHCB_MSR_VER_MIN_POS) & GHCB_MSR_VER_MIN_MASK)
+#define GHCB_MSR_PROTO_MAX(v) (((v) >> 48) & 0xffff)
+#define GHCB_MSR_PROTO_MIN(v) (((v) >> 32) & 0xffff)
/* CPUID Request/Response */
#define GHCB_MSR_CPUID_REQ 0x004
@@ -46,27 +45,33 @@
#define GHCB_CPUID_REQ_EBX 1
#define GHCB_CPUID_REQ_ECX 2
#define GHCB_CPUID_REQ_EDX 3
-#define GHCB_CPUID_REQ(fn, reg) \
- (GHCB_MSR_CPUID_REQ | \
- (((unsigned long)reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS) | \
- (((unsigned long)fn) << GHCB_MSR_CPUID_FUNC_POS))
+#define GHCB_CPUID_REQ(fn, reg) \
+ /* GHCBData[11:0] */ \
+ (GHCB_MSR_CPUID_REQ | \
+ /* GHCBData[31:12] */ \
+ (((unsigned long)(reg) & 0x3) << 30) | \
+ /* GHCBData[63:32] */ \
+ (((unsigned long)fn) << 32))
/* AP Reset Hold */
-#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
-#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
+#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
/* GHCB Hypervisor Feature Request/Response */
-#define GHCB_MSR_HV_FT_REQ 0x080
-#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_REQ 0x080
+#define GHCB_MSR_HV_FT_RESP 0x081
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
#define GHCB_MSR_TERM_REASON_POS 16
#define GHCB_MSR_TERM_REASON_MASK 0xff
-#define GHCB_SEV_TERM_REASON(reason_set, reason_val) \
- (((((u64)reason_set) & GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) | \
- ((((u64)reason_val) & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS))
+
+#define GHCB_SEV_TERM_REASON(reason_set, reason_val) \
+ /* GHCBData[15:12] */ \
+ (((((u64)reason_set) & 0xf) << 12) | \
+ /* GHCBData[23:16] */ \
+ ((((u64)reason_val) & 0xff) << 16))
#define GHCB_SEV_ES_GEN_REQ 0
#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
--
2.17.1
The GHCB specification defines the reason codes for reason set 0. The reason
codes defined in set 0 do not cover all possible causes for a guest
to request termination.
Reason sets 1 to 255 are reserved for vendor-specific codes. Reserve
reason set 1 for the Linux guest and define error codes for it.
While at it, change sev_es_terminate() to accept a reason set
parameter.
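With the new signature, callers select the reason set explicitly, e.g.
(sketch based on the defines added by this patch):

  /* Reason set 0: generic GHCB-defined reasons. */
  sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);

  /* Reason set 1: Linux-specific reasons, e.g. a PVALIDATE failure. */
  sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);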
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 6 +++---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kernel/sev-shared.c | 11 ++++-------
arch/x86/kernel/sev.c | 4 ++--
4 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 28bcf04c022e..7760959fe96d 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -122,7 +122,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
static bool early_setup_sev_es(void)
{
if (!sev_es_negotiate_protocol())
- sev_es_terminate(GHCB_SEV_ES_PROT_UNSUPPORTED);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_PROT_UNSUPPORTED);
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
@@ -175,7 +175,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
enum es_result result;
if (!boot_ghcb && !early_setup_sev_es())
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
result = vc_init_em_ctxt(&ctxt, regs, exit_code);
@@ -202,5 +202,5 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
if (result == ES_OK)
vc_finish_insn(&ctxt);
else if (result != ES_RETRY)
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
}
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index aac44c3f839c..3278ee578937 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -73,9 +73,17 @@
/* GHCBData[23:16] */ \
((((u64)reason_val) & 0xff) << 16))
+/* Error codes from reason set 0 */
+#define SEV_TERM_SET_GEN 0
#define GHCB_SEV_ES_GEN_REQ 0
#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
+/* Linux-specific reason codes (used with reason set 1) */
+#define SEV_TERM_SET_LINUX 1
+#define GHCB_TERM_REGISTER 0 /* GHCB GPA registration failure */
+#define GHCB_TERM_PSC 1 /* Page State Change failure */
+#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
+
#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
#endif
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 114f62fe2529..dab73fec74ec 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -24,15 +24,12 @@ static bool __init sev_es_check_cpu_features(void)
return true;
}
-static void __noreturn sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;
- /*
- * Tell the hypervisor what went wrong - only reason-set 0 is
- * currently supported.
- */
- val |= GHCB_SEV_TERM_REASON(0, reason);
+ /* Tell the hypervisor what went wrong. */
+ val |= GHCB_SEV_TERM_REASON(set, reason);
/* Request Guest Termination from Hypvervisor */
sev_es_wr_ghcb_msr(val);
@@ -208,7 +205,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
fail:
/* Terminate the guest */
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
}
static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 71744ee0add6..646912709334 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1429,7 +1429,7 @@ DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
show_regs(regs);
/* Ask hypervisor to sev_es_terminate */
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
/* If that fails and we get here - just panic */
panic("Returned from Terminate-Request to Hypervisor\n");
@@ -1477,7 +1477,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
/* Do initial setup or terminate the guest */
if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
--
2.17.1
Reviewed-by: Venu Busireddy <[email protected]>
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 6 +++---
arch/x86/include/asm/sev-common.h | 4 ++--
arch/x86/kernel/sev-shared.c | 2 +-
arch/x86/kernel/sev.c | 4 ++--
4 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 670e998fe930..28bcf04c022e 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -122,7 +122,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
static bool early_setup_sev_es(void)
{
if (!sev_es_negotiate_protocol())
- sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
+ sev_es_terminate(GHCB_SEV_ES_PROT_UNSUPPORTED);
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
@@ -175,7 +175,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
enum es_result result;
if (!boot_ghcb && !early_setup_sev_es())
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
result = vc_init_em_ctxt(&ctxt, regs, exit_code);
@@ -202,5 +202,5 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
if (result == ES_OK)
vc_finish_insn(&ctxt);
else if (result != ES_RETRY)
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
}
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 2cef6c5a52c2..855b0ec9c4e8 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -68,8 +68,8 @@
(((((u64)reason_set) & GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) | \
((((u64)reason_val) & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS))
-#define GHCB_SEV_ES_REASON_GENERAL_REQUEST 0
-#define GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED 1
+#define GHCB_SEV_ES_GEN_REQ 0
+#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 9f90f460a28c..114f62fe2529 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -208,7 +208,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
fail:
/* Terminate the guest */
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
}
static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a6895e440bc3..71744ee0add6 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1429,7 +1429,7 @@ DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
show_regs(regs);
/* Ask hypervisor to sev_es_terminate */
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
/* If that fails and we get here - just panic */
panic("Returned from Terminate-Request to Hypervisor\n");
@@ -1477,7 +1477,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
/* Do initial setup or terminate the guest */
if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
--
2.17.1
The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification.
If the hypervisor cannot work with the guest-provided GPA then terminate the
guest boot.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 4 ++++
arch/x86/include/asm/sev-common.h | 13 +++++++++++++
arch/x86/kernel/sev-shared.c | 16 ++++++++++++++++
3 files changed, 33 insertions(+)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 5c4ba211bcef..6e8d97c280aa 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -233,6 +233,10 @@ static bool do_early_sev_setup(void)
/* Initialize lookup tables for the instruction decoder */
inat_init_tables();
+ /* SEV-SNP guest requires the GHCB GPA must be registered */
+ if (sev_snp_enabled())
+ snp_register_ghcb_early(__pa(&boot_ghcb_page));
+
return true;
}
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1cd8ce838af8..37aa77565726 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -57,6 +57,19 @@
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+/* GHCB GPA Register */
+#define GHCB_MSR_REG_GPA_REQ 0x012
+#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)((v) & GENMASK_ULL(51, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_REG_GPA_REQ)
+
+#define GHCB_MSR_REG_GPA_RESP 0x013
+#define GHCB_MSR_REG_GPA_RESP_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
+
/* SNP Page State Change */
enum psc_op {
SNP_PAGE_STATE_PRIVATE = 1,
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 8bd67087d79e..1adc74ab97c0 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -67,6 +67,22 @@ static bool get_hv_features(void)
return true;
}
+static void snp_register_ghcb_early(unsigned long paddr)
+{
+ unsigned long pfn = paddr >> PAGE_SHIFT;
+ u64 val;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ /* If the response GPA is not ours then abort the guest */
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP) ||
+ (GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
+}
+
static bool sev_es_negotiate_protocol(void)
{
u64 val;
--
2.17.1
The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification section GHCB GPA Registration.
During boot, init_ghcb() allocates a per-CPU GHCB page. On the very first
#VC exception, the exception handler switches to using the per-CPU GHCB page
allocated by init_ghcb(). The GHCB page must be registered in the current
vCPU context.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev-internal.h | 12 ++++++++++++
arch/x86/kernel/sev.c | 28 ++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)
create mode 100644 arch/x86/kernel/sev-internal.h
diff --git a/arch/x86/kernel/sev-internal.h b/arch/x86/kernel/sev-internal.h
new file mode 100644
index 000000000000..0fb7324803b4
--- /dev/null
+++ b/arch/x86/kernel/sev-internal.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Forward declarations for sev-shared.c
+ *
+ * Author: Brijesh Singh <[email protected]>
+ */
+
+#ifndef __X86_SEV_INTERNAL_H__
+
+static void snp_register_ghcb_early(unsigned long paddr);
+
+#endif /* __X86_SEV_INTERNAL_H__ */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 06e6914cdc26..9ab541b893c2 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -31,6 +31,8 @@
#include <asm/smp.h>
#include <asm/cpu.h>
+#include "sev-internal.h"
+
#define DR7_RESET_VALUE 0x400
/* For early boot hypervisor communication in SEV-ES enabled guests */
@@ -87,6 +89,13 @@ struct sev_es_runtime_data {
* is currently unsupported in SEV-ES guests.
*/
unsigned long dr7;
+
+ /*
+ * SEV-SNP requires that the GHCB must be registered before using it.
+ * The flag below will indicate whether the GHCB is registered, if its
+ * not registered then sev_es_get_ghcb() will perform the registration.
+ */
+ bool snp_ghcb_registered;
};
struct ghcb_state {
@@ -191,6 +200,16 @@ void noinstr __sev_es_ist_exit(void)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
}
+static void snp_register_ghcb(struct sev_es_runtime_data *data, unsigned long paddr)
+{
+ if (data->snp_ghcb_registered)
+ return;
+
+ snp_register_ghcb_early(paddr);
+
+ data->snp_ghcb_registered = true;
+}
+
/*
* Nothing shall interrupt this code path while holding the per-CPU
* GHCB. The backup GHCB is only for NMIs interrupting this path.
@@ -237,6 +256,10 @@ static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
data->ghcb_active = true;
}
+ /* SEV-SNP guest requires that GHCB must be registered. */
+ if (sev_feature_enabled(SEV_SNP))
+ snp_register_ghcb(data, __pa(ghcb));
+
return ghcb;
}
@@ -681,6 +704,10 @@ static bool __init setup_ghcb(void)
/* Alright - Make the boot-ghcb public */
boot_ghcb = &boot_ghcb_page;
+ /* SEV-SNP guest requires that GHCB GPA must be registered. */
+ if (sev_feature_enabled(SEV_SNP))
+ snp_register_ghcb_early(__pa(&boot_ghcb_page));
+
return true;
}
@@ -770,6 +797,7 @@ static void __init init_ghcb(int cpu)
data->ghcb_active = false;
data->backup_ghcb_active = false;
+ data->snp_ghcb_registered = false;
}
void __init sev_es_init_vc_handling(void)
--
2.17.1
An SNP-active guest uses the PVALIDATE instruction to validate or
rescind the validation of a guest page’s RMP entry. Upon completion,
a return code is stored in EAX and rFLAGS bits are set based on the
return code. If the instruction completed successfully, the carry flag (CF)
indicates whether the contents of the RMP entry were changed or not.
See AMD APM Volume 3 for additional details.
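A minimal sketch of how a caller might check the helper's three outcomes
(success, a hardware error code, and the CF=1 no-update case). RMP_PG_SIZE_4K
and GHCB_TERM_PVALIDATE are defines added elsewhere in this series, and vaddr
stands in for a page-aligned guest virtual address:

  int rc = pvalidate(vaddr, RMP_PG_SIZE_4K, true);

  if (rc == PVALIDATE_FAIL_NOUPDATE) {
          /* rFLAGS.CF=1: the RMP entry was already in the requested state. */
  } else if (rc) {
          /* Non-zero return code in EAX: treat as fatal. */
          sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
  }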
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 134a7c9d91b6..b308815a2c01 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -59,6 +59,9 @@ extern void vc_no_ghcb(void);
extern void vc_boot_ghcb(void);
extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
+/* Software defined (when rFlags.CF = 1) */
+#define PVALIDATE_FAIL_NOUPDATE 255
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -81,12 +84,30 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
+{
+ bool no_rmpupdate;
+ int rc;
+
+ /* "pvalidate" mnemonic support in binutils 2.36 and newer */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
+ CC_SET(c)
+ : CC_OUT(c) (no_rmpupdate), "=a"(rc)
+ : "a"(vaddr), "c"(rmp_psize), "d"(validate)
+ : "memory", "cc");
+
+ if (no_rmpupdate)
+ return PVALIDATE_FAIL_NOUPDATE;
+
+ return rc;
+}
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
#endif
#endif
--
2.17.1
Version 2 of the GHCB specification introduced advertisement of the features
that are supported by the hypervisor. Add support to query the hypervisor
features on boot.
Version 2 of the GHCB specification adds several new NAEs, most of which are
optional except for the hypervisor feature NAE. Now that the hypervisor
feature NAE is implemented, bump the maximum supported GHCB protocol version.
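Once protocol negotiation succeeds, sev_hv_features holds GHCBData[63:12] of
the response and can be consulted by later code. For example, a later patch
in this series terminates an SNP guest when the hypervisor does not advertise
SNP support (sketch):

  if (sev_snp_enabled() && !(sev_hv_features & GHCB_HV_FT_SNP))
          sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);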
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 2 ++
arch/x86/include/asm/sev-common.h | 3 +++
arch/x86/include/asm/sev.h | 2 +-
arch/x86/include/uapi/asm/svm.h | 2 ++
arch/x86/kernel/sev-shared.c | 23 +++++++++++++++++++++++
5 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index df14291d65de..fb857f2e72cb 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,7 @@ enum sev_feature_type {
extern u64 sme_me_mask;
extern u64 sev_status;
+extern u64 sev_hv_features;
void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
unsigned long decrypted_kernel_vaddr,
@@ -66,6 +67,7 @@ bool sev_feature_enabled(unsigned int feature_type);
#else /* !CONFIG_AMD_MEM_ENCRYPT */
#define sme_me_mask 0ULL
+#define sev_hv_features 0ULL
static inline void __init sme_early_encrypt(resource_size_t paddr,
unsigned long size) { }
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 3278ee578937..891569c07ed7 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -60,6 +60,9 @@
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_RESP_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7ec91b1359df..134a7c9d91b6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -13,7 +13,7 @@
#include <asm/sev-common.h>
#define GHCB_PROTOCOL_MIN 1ULL
-#define GHCB_PROTOCOL_MAX 1ULL
+#define GHCB_PROTOCOL_MAX 2ULL
#define GHCB_DEFAULT_USAGE 0ULL
#define VMGEXIT() { asm volatile("rep; vmmcall\n\r"); }
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index efa969325ede..b0ad00f4c1e1 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
+#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
/* Exit code reserved for hypervisor/software use */
@@ -218,6 +219,7 @@
{ SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
+ { SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 58a6efb1f327..8bd67087d79e 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -23,6 +23,9 @@
*/
static u16 __ro_after_init ghcb_version;
+/* Bitmap of SEV features supported by the hypervisor */
+u64 __ro_after_init sev_hv_features = 0;
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -48,6 +51,22 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}
+static bool get_hv_features(void)
+{
+ u64 val;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_HV_FT_RESP)
+ return false;
+
+ sev_hv_features = GHCB_MSR_HV_FT_RESP_VAL(val);
+
+ return true;
+}
+
static bool sev_es_negotiate_protocol(void)
{
u64 val;
@@ -66,6 +85,10 @@ static bool sev_es_negotiate_protocol(void)
ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+ /* The hypervisor features are available from version 2 onward. */
+ if (ghcb_version >= 2 && !get_hv_features())
+ return false;
+
return true;
}
--
2.17.1
Virtual Machine Privilege Level (VMPL) is an optional feature in the
SEV-SNP architecture which allows a guest VM to divide its address space
into four levels. The levels can be used to provide hardware-isolated
abstraction layers within a VM. VMPL0 is the highest privilege level, and
VMPL3 is the least privileged. Certain operations must be done by the VMPL0
software, such as:
* Validate or invalidate memory ranges (PVALIDATE instruction)
* Allocate a VMSA page (RMPADJUST instruction when VMSA=1)
The initial SEV-SNP support assumes that the guest kernel is running at
VMPL0. Add a check to make sure that the kernel is running at VMPL0
before continuing the boot. There is no easy method to query the current
VMPL level, so use the RMPADJUST instruction to determine whether the guest
is booted at VMPL0.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 41 ++++++++++++++++++++++++++++---
arch/x86/include/asm/sev-common.h | 1 +
arch/x86/include/asm/sev.h | 3 +++
3 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 7be325d9b09f..ec765527546f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -134,6 +134,36 @@ static inline bool sev_snp_enabled(void)
return msr_sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}
+static bool is_vmpl0(void)
+{
+ u64 attrs, va;
+ int err;
+
+ /*
+ * There is no straightforward way to query the current VMPL level. The
+ * simplest method is to use the RMPADJUST instruction to change a page
+ * permission to a VMPL level-1, and if the guest kernel is launched at
+ * a level <= 1, then RMPADJUST instruction will return an error.
+ */
+ attrs = 1;
+
+ /*
+	 * Any page aligned virtual address is sufficient to test the VMPL level.
+	 * The boot_ghcb_page is page aligned memory, so let's use it for the test.
+ */
+ va = (u64)&boot_ghcb_page;
+
+ /* Instruction mnemonic supported in binutils versions v2.36 and later */
+ asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
+ : "=a" (err)
+ : "a" (va), "c" (RMP_PG_SIZE_4K), "d" (attrs)
+ : "memory", "cc");
+ if (err)
+ return false;
+
+ return true;
+}
+
static bool do_early_sev_setup(void)
{
if (!sev_es_negotiate_protocol())
@@ -141,10 +171,15 @@ static bool do_early_sev_setup(void)
/*
* If SEV-SNP is enabled, then check if the hypervisor supports the SEV-SNP
- * features.
+ * features and is launched at VMPL-0 level.
*/
- if (sev_snp_enabled() && !(sev_hv_features & GHCB_HV_FT_SNP))
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+ if (sev_snp_enabled()) {
+ if (!(sev_hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
+ if (!is_vmpl0())
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
+ }
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index f80a3cde2086..d426c30ae7b4 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -89,6 +89,7 @@
#define GHCB_TERM_REGISTER 0 /* GHCB GPA registration failure */
#define GHCB_TERM_PSC 1 /* Page State Change failure */
#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
+#define GHCB_TERM_NOT_VMPL0 3 /* SNP guest is not running at VMPL-0 */
#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index b308815a2c01..242af1154e49 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -62,6 +62,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* Software defined (when rFlags.CF = 1) */
#define PVALIDATE_FAIL_NOUPDATE 255
+/* RMP page size */
+#define RMP_PG_SIZE_4K 0
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
--
2.17.1
The early_set_memory_{encrypt,decrypt}() helpers are used for changing a
page from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.
If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:
1. Issue the page state change VMGEXIT to add the page as private
   in the RMP table.
2. Validate the page after it has been successfully added in the RMP table.
To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before clearing the
encryption attribute from the page table:
1. Invalidate the page.
2. Issue the page state change VMGEXIT to make the page shared in the
   RMP table.
The early_set_memory_{encrypt,decrypt}() helpers can be called before the
GHCB is established, so use the SNP page state MSR protocol VMGEXIT defined
in the GHCB specification to request the page state change in the RMP table.
While at it, add a helper snp_prep_memory() that can be used outside the
SEV-specific files to change the page state for a specified memory range.
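A minimal usage sketch of the new helper; paddr and size describe an
illustrative physical range:

  /* Make a physical range private and validated before first use ... */
  snp_prep_memory(paddr, size, SNP_PAGE_STATE_PRIVATE);

  /* ... and hand it back to the hypervisor as shared when done. */
  snp_prep_memory(paddr, size, SNP_PAGE_STATE_SHARED);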
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 10 ++++
arch/x86/kernel/sev.c | 102 +++++++++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt.c | 51 +++++++++++++++++--
3 files changed, 159 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 242af1154e49..ecd8cd8c5908 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -104,6 +104,11 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
return rc;
}
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages);
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages);
+void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -111,6 +116,11 @@ static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { ret
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
+static inline void __init
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+static inline void __init
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
#endif
#endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 9ab541b893c2..0ddc032fd252 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -591,6 +591,108 @@ static u64 get_jump_table_addr(void)
return ret;
}
+static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
+{
+ unsigned long vaddr_end;
+ int rc;
+
+ vaddr = vaddr & PAGE_MASK;
+ vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+ while (vaddr < vaddr_end) {
+ rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+ if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+
+ vaddr = vaddr + PAGE_SIZE;
+ }
+}
+
+static void __init early_set_page_state(unsigned long paddr, unsigned int npages, enum psc_op op)
+{
+ unsigned long paddr_end;
+ u64 val;
+
+ paddr = paddr & PAGE_MASK;
+ paddr_end = paddr + (npages << PAGE_SHIFT);
+
+ while (paddr < paddr_end) {
+ /*
+ * Use the MSR protocol because this function can be called before the GHCB
+ * is established.
+ */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ if (WARN(GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP,
+ "Wrong PSC response code: 0x%x\n",
+ (unsigned int)GHCB_RESP_CODE(val)))
+ goto e_term;
+
+ if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
+ "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
+ op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
+ paddr, GHCB_MSR_PSC_RESP_VAL(val)))
+ goto e_term;
+
+ paddr = paddr + PAGE_SIZE;
+ }
+
+ return;
+
+e_term:
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+}
+
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ /*
+ * Ask the hypervisor to mark the memory pages as private in the RMP
+ * table.
+ */
+ early_set_page_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+ /* Validate the memory pages after they've been added in the RMP table. */
+ pvalidate_pages(vaddr, npages, 1);
+}
+
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ /*
+ * Invalidate the memory pages before they are marked shared in the
+ * RMP table.
+ */
+ pvalidate_pages(vaddr, npages, 0);
+
+ /* Ask hypervisor to mark the memory pages shared in the RMP table. */
+ early_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
+}
+
+void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op)
+{
+ unsigned long vaddr, npages;
+
+ vaddr = (unsigned long)__va(paddr);
+ npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ if (op == SNP_PAGE_STATE_PRIVATE)
+ early_snp_set_memory_private(vaddr, paddr, npages);
+ else if (op == SNP_PAGE_STATE_SHARED)
+ early_snp_set_memory_shared(vaddr, paddr, npages);
+ else
+ WARN(1, "invalid memory op %d\n", op);
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 63e7799a9a86..d434376568de 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -30,6 +30,7 @@
#include <asm/processor-flags.h>
#include <asm/msr.h>
#include <asm/cmdline.h>
+#include <asm/sev.h>
#include "mm_internal.h"
@@ -48,6 +49,34 @@ EXPORT_SYMBOL_GPL(sev_enable_key);
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
+/*
+ * When SNP is active, change the page state from private to shared before
+ * copying the data from the source to destination and restore after the copy.
+ * This is required because the source address is mapped as decrypted by the
+ * caller of the routine.
+ */
+static inline void __init snp_memcpy(void *dst, void *src, size_t sz,
+ unsigned long paddr, bool decrypt)
+{
+ unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ if (!sev_feature_enabled(SEV_SNP) || !decrypt) {
+ memcpy(dst, src, sz);
+ return;
+ }
+
+ /*
+ * With SNP, the paddr needs to be accessed decrypted, mark the page
+ * shared in the RMP table before copying it.
+ */
+ early_snp_set_memory_shared((unsigned long)__va(paddr), paddr, npages);
+
+ memcpy(dst, src, sz);
+
+ /* Restore the page state after the memcpy. */
+ early_snp_set_memory_private((unsigned long)__va(paddr), paddr, npages);
+}
+
/*
* This routine does not change the underlying encryption setting of the
* page(s) that map this memory. It assumes that eventually the memory is
@@ -96,8 +125,8 @@ static void __init __sme_early_enc_dec(resource_size_t paddr,
* Use a temporary buffer, of cache-line multiple size, to
* avoid data corruption as documented in the APM.
*/
- memcpy(sme_early_buffer, src, len);
- memcpy(dst, sme_early_buffer, len);
+ snp_memcpy(sme_early_buffer, src, len, paddr, enc);
+ snp_memcpy(dst, sme_early_buffer, len, paddr, !enc);
early_memunmap(dst, len);
early_memunmap(src, len);
@@ -272,14 +301,28 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
clflush_cache_range(__va(pa), size);
/* Encrypt/decrypt the contents in-place */
- if (enc)
+ if (enc) {
sme_early_encrypt(pa, size);
- else
+ } else {
sme_early_decrypt(pa, size);
+ /*
+		 * On SNP, the page state change in the RMP table must happen
+ * before the page table updates.
+ */
+ early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);
+ }
+
/* Change the page encryption mask. */
new_pte = pfn_pte(pfn, new_prot);
set_pte_atomic(kpte, new_pte);
+
+ /*
+ * If page is set encrypted in the page table, then update the RMP table to
+ * add this page as private.
+ */
+ if (enc)
+ early_snp_set_memory_private((unsigned long)__va(pa), pa, 1);
}
static int __init early_set_memory_enc_dec(unsigned long vaddr,
--
2.17.1
From: Michael Roth <[email protected]>
Future patches for SEV-SNP-validated CPUID will also require early
parsing of the EFI configuration. Move the related code into a set of
helpers that can be re-used for that purpose.
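A minimal sketch of how a caller can locate a vendor table with the new
helpers; TARGET_TABLE_GUID stands in for whichever GUID the caller needs:

  unsigned long conf_table_pa, vendor_table_pa;
  unsigned int conf_table_len;
  bool efi_64;

  if (efi_get_conf_table(boot_params, &conf_table_pa, &conf_table_len, &efi_64))
          return 0;

  if (efi_find_vendor_table(conf_table_pa, conf_table_len,
                            TARGET_TABLE_GUID, efi_64, &vendor_table_pa))
          return 0;
  /* vendor_table_pa now holds the physical address of the requested table. */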
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/acpi.c | 113 +++++--------------
arch/x86/boot/compressed/efi.c | 178 ++++++++++++++++++++++++++++++
arch/x86/boot/compressed/misc.h | 43 ++++++++
4 files changed, 251 insertions(+), 84 deletions(-)
create mode 100644 arch/x86/boot/compressed/efi.c
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 431bf7f846c3..d364192c2367 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -100,6 +100,7 @@ endif
vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
+vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
$(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 8bcbcee54aa1..3a3f997d7210 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -25,41 +25,22 @@ struct mem_vector immovable_mem[MAX_NUMNODES*2];
* ACPI_TABLE_GUID are found, take the former, which has more features.
*/
static acpi_physical_address
-__efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
- bool efi_64)
+__efi_get_rsdp_addr(unsigned long config_table_pa,
+ unsigned int config_table_len, bool efi_64)
{
acpi_physical_address rsdp_addr = 0;
-
#ifdef CONFIG_EFI
- int i;
-
- /* Get EFI tables from systab. */
- for (i = 0; i < nr_tables; i++) {
- acpi_physical_address table;
- efi_guid_t guid;
-
- if (efi_64) {
- efi_config_table_64_t *tbl = (efi_config_table_64_t *)config_tables + i;
-
- guid = tbl->guid;
- table = tbl->table;
-
- if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
- debug_putstr("Error getting RSDP address: EFI config table located above 4GB.\n");
- return 0;
- }
- } else {
- efi_config_table_32_t *tbl = (efi_config_table_32_t *)config_tables + i;
-
- guid = tbl->guid;
- table = tbl->table;
- }
+ int ret;
- if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
- rsdp_addr = table;
- else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
- return table;
- }
+ ret = efi_find_vendor_table(config_table_pa, config_table_len,
+ ACPI_20_TABLE_GUID, efi_64,
+ (unsigned long *)&rsdp_addr);
+ if (ret == -ENOENT)
+ ret = efi_find_vendor_table(config_table_pa, config_table_len,
+ ACPI_TABLE_GUID, efi_64,
+ (unsigned long *)&rsdp_addr);
+ if (ret)
+ debug_putstr("Error getting RSDP address.\n");
#endif
return rsdp_addr;
}
@@ -87,7 +68,9 @@ static acpi_physical_address kexec_get_rsdp_addr(void)
efi_system_table_64_t *systab;
struct efi_setup_data *esd;
struct efi_info *ei;
+ bool efi_64;
char *sig;
+ int ret;
esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
if (!esd)
@@ -98,18 +81,16 @@ static acpi_physical_address kexec_get_rsdp_addr(void)
return 0;
}
- ei = &boot_params->efi_info;
- sig = (char *)&ei->efi_loader_signature;
- if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+ /* Get systab from boot params. */
+ ret = efi_get_system_table(boot_params, (unsigned long *)&systab, &efi_64);
+ if (ret)
+ error("EFI system table not found in kexec boot_params.");
+
+ if (!efi_64) {
debug_putstr("Wrong kexec EFI loader signature.\n");
return 0;
}
- /* Get systab from boot params. */
- systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
- if (!systab)
- error("EFI system table not found in kexec boot_params.");
-
return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables, true);
}
#else
@@ -119,54 +100,18 @@ static acpi_physical_address kexec_get_rsdp_addr(void) { return 0; }
static acpi_physical_address efi_get_rsdp_addr(void)
{
#ifdef CONFIG_EFI
- unsigned long systab, config_tables;
- unsigned int nr_tables;
- struct efi_info *ei;
+ unsigned long config_table_pa = 0;
+ unsigned int config_table_len;
bool efi_64;
- char *sig;
-
- ei = &boot_params->efi_info;
- sig = (char *)&ei->efi_loader_signature;
-
- if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
- efi_64 = true;
- } else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4)) {
- efi_64 = false;
- } else {
- debug_putstr("Wrong EFI loader signature.\n");
- return 0;
- }
-
- /* Get systab from boot params. */
-#ifdef CONFIG_X86_64
- systab = ei->efi_systab | ((__u64)ei->efi_systab_hi << 32);
-#else
- if (ei->efi_systab_hi || ei->efi_memmap_hi) {
- debug_putstr("Error getting RSDP address: EFI system table located above 4GB.\n");
- return 0;
- }
- systab = ei->efi_systab;
-#endif
- if (!systab)
- error("EFI system table not found.");
-
- /* Handle EFI bitness properly */
- if (efi_64) {
- efi_system_table_64_t *stbl = (efi_system_table_64_t *)systab;
-
- config_tables = stbl->tables;
- nr_tables = stbl->nr_tables;
- } else {
- efi_system_table_32_t *stbl = (efi_system_table_32_t *)systab;
-
- config_tables = stbl->tables;
- nr_tables = stbl->nr_tables;
- }
+ int ret;
- if (!config_tables)
- error("EFI config tables not found.");
+ ret = efi_get_conf_table(boot_params, &config_table_pa,
+ &config_table_len, &efi_64);
+ if (ret || !config_table_pa)
+ error("EFI config table not found.");
- return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
+ return __efi_get_rsdp_addr(config_table_pa, config_table_len,
+ efi_64);
#else
return 0;
#endif
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
new file mode 100644
index 000000000000..16ff5cb9a1fb
--- /dev/null
+++ b/arch/x86/boot/compressed/efi.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Helpers for early access to EFI configuration table
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Michael Roth <[email protected]>
+ */
+
+#include "misc.h"
+#include <linux/efi.h>
+#include <asm/efi.h>
+
+/* Get vendor table address/guid from EFI config table at the given index */
+static int get_vendor_table(void *conf_table, unsigned int idx,
+ unsigned long *vendor_table_pa,
+ efi_guid_t *vendor_table_guid,
+ bool efi_64)
+{
+ if (efi_64) {
+ efi_config_table_64_t *table_entry =
+ (efi_config_table_64_t *)conf_table + idx;
+
+ if (!IS_ENABLED(CONFIG_X86_64) &&
+ table_entry->table >> 32) {
+ debug_putstr("Error: EFI config table entry located above 4GB.\n");
+ return -EINVAL;
+ }
+
+ *vendor_table_pa = table_entry->table;
+ *vendor_table_guid = table_entry->guid;
+
+ } else {
+ efi_config_table_32_t *table_entry =
+ (efi_config_table_32_t *)conf_table + idx;
+
+ *vendor_table_pa = table_entry->table;
+ *vendor_table_guid = table_entry->guid;
+ }
+
+ return 0;
+}
+
+/**
+ * Given EFI config table, search it for the physical address of the vendor
+ * table associated with GUID.
+ *
+ * @conf_table: pointer to EFI configuration table
+ * @conf_table_len: number of entries in EFI configuration table
+ * @guid: GUID of vendor table
+ * @efi_64: true if using 64-bit EFI
+ * @vendor_table_pa: location to store physical address of vendor table
+ *
+ * Returns 0 on success. On error, return params are left unchanged.
+ */
+int
+efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
+ efi_guid_t guid, bool efi_64,
+ unsigned long *vendor_table_pa)
+{
+ unsigned int i;
+
+ for (i = 0; i < conf_table_len; i++) {
+ unsigned long vendor_table_pa_tmp;
+ efi_guid_t vendor_table_guid;
+ int ret;
+
+ if (get_vendor_table((void *)conf_table_pa, i,
+ &vendor_table_pa_tmp,
+ &vendor_table_guid, efi_64))
+ return -EINVAL;
+
+ if (!efi_guidcmp(guid, vendor_table_guid)) {
+ *vendor_table_pa = vendor_table_pa_tmp;
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+/**
+ * Given boot_params, retrieve the physical address of EFI system table.
+ *
+ * @boot_params: pointer to boot_params
+ * @sys_table_pa: location to store physical address of system table
+ * @is_efi_64: location to store whether using 64-bit EFI or not
+ *
+ * Returns 0 on success. On error, return params are left unchanged.
+ */
+int
+efi_get_system_table(struct boot_params *boot_params,
+ unsigned long *sys_table_pa, bool *is_efi_64)
+{
+ unsigned long sys_table;
+ struct efi_info *ei;
+ bool efi_64;
+ char *sig;
+
+ if (!sys_table_pa || !is_efi_64)
+ return -EINVAL;
+
+ ei = &boot_params->efi_info;
+ sig = (char *)&ei->efi_loader_signature;
+
+ if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+ efi_64 = true;
+ } else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4)) {
+ efi_64 = false;
+ } else {
+ debug_putstr("Wrong EFI loader signature.\n");
+ return -ENOENT;
+ }
+
+ /* Get systab from boot params. */
+#ifdef CONFIG_X86_64
+ sys_table = ei->efi_systab | ((__u64)ei->efi_systab_hi << 32);
+#else
+ if (ei->efi_systab_hi || ei->efi_memmap_hi) {
+ debug_putstr("Error: EFI system table located above 4GB.\n");
+ return -EINVAL;
+ }
+ sys_table = ei->efi_systab;
+#endif
+ if (!sys_table) {
+ debug_putstr("EFI system table not found.");
+ return -ENOENT;
+ }
+
+ *sys_table_pa = sys_table;
+ *is_efi_64 = efi_64;
+ return 0;
+}
+
+/**
+ * Given boot_params, locate EFI system table from it and return the physical
+ * address EFI configuration table.
+ *
+ * @boot_params: pointer to boot_params
+ * @conf_table_pa: location to store physical address of config table
+ * @conf_table_len: location to store number of config table entries
+ * @is_efi_64: location to store whether using 64-bit EFI or not
+ *
+ * Returns 0 on success. On error, return params are left unchanged.
+ */
+int
+efi_get_conf_table(struct boot_params *boot_params,
+ unsigned long *conf_table_pa,
+ unsigned int *conf_table_len,
+ bool *is_efi_64)
+{
+ unsigned long sys_table_pa = 0;
+ int ret;
+
+ if (!conf_table_pa || !conf_table_len || !is_efi_64)
+ return -EINVAL;
+
+ ret = efi_get_system_table(boot_params, &sys_table_pa, is_efi_64);
+ if (ret)
+ return ret;
+
+ /* Handle EFI bitness properly */
+ if (*is_efi_64) {
+ efi_system_table_64_t *stbl =
+ (efi_system_table_64_t *)sys_table_pa;
+
+ *conf_table_pa = stbl->tables;
+ *conf_table_len = stbl->nr_tables;
+ } else {
+ efi_system_table_32_t *stbl =
+ (efi_system_table_32_t *)sys_table_pa;
+
+ *conf_table_pa = stbl->tables;
+ *conf_table_len = stbl->nr_tables;
+ }
+
+ return 0;
+}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 822e0c254b9a..16b092fd7aa1 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -21,6 +21,7 @@
#include <linux/screen_info.h>
#include <linux/elf.h>
#include <linux/io.h>
+#include <linux/efi.h>
#include <asm/page.h>
#include <asm/boot.h>
#include <asm/bootparam.h>
@@ -174,4 +175,46 @@ void boot_stage2_vc(void);
unsigned long sev_verify_cbit(unsigned long cr3);
+#ifdef CONFIG_EFI
+/* helpers for early EFI config table access */
+int
+efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
+ efi_guid_t guid, bool efi_64,
+ unsigned long *vendor_table_pa);
+
+int efi_get_system_table(struct boot_params *boot_params,
+ unsigned long *sys_table_pa,
+ bool *is_efi_64);
+
+int efi_get_conf_table(struct boot_params *boot_params,
+ unsigned long *conf_table_pa,
+ unsigned int *conf_table_len,
+ bool *is_efi_64);
+#else
+static inline int
+efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
+ efi_guid_t guid, bool efi_64,
+ unsigned long *vendor_table_pa)
+{
+ return -ENOENT;
+}
+
+static inline int
+efi_get_system_table(struct boot_params *boot_params,
+ unsigned long *sys_table_pa,
+ bool *is_efi_64)
+{
+ return -ENOENT;
+}
+
+static inline int
+efi_get_conf_table(struct boot_params *boot_params,
+ unsigned long *conf_table_pa,
+ unsigned int *conf_table_len,
+ bool *is_efi_64)
+{
+ return -ENOENT;
+}
+#endif /* CONFIG_EFI */
+
#endif /* BOOT_COMPRESSED_MISC_H */
--
2.17.1
From: Michael Roth <[email protected]>
As of commit 103a4908ad4d ("x86/head/64: Disable stack protection for
head$(BITS).o") kernel/head64.c is compiled with -fno-stack-protector
to allow a call to set_bringup_idt_handler(), which would otherwise
have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG. While
sufficient for that case, this will still cause issues if we attempt to
call out to any external functions that were compiled with stack
protection enabled and that in turn make stack-protected calls, or if the
exception handlers set up by set_bringup_idt_handler() make calls to
stack-protected functions.
Subsequent patches for SEV-SNP CPUID validation support will introduce
both such cases. Attempting to disable stack protection for everything
in scope to address that is prohibitive since much of the code, like
SEV-ES #VC handler, is shared code that remains in use after boot and
could benefit from having stack protection enabled. Attempting to inline
calls is brittle and can quickly balloon out to library/helper code
where that's not really an option.
Instead, set up %gs to point to a buffer that the stack protector can use
for canary values when needed.
In doing so, it is likely we can stop using -fno-stack-protector for
head64.c, but that hasn't been tested yet, and head32.c would need a
similar solution to be safe, so that is left as a potential follow-up.
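For illustration, the layout this relies on is roughly the following (a
hedged sketch; the struct and field names are made up for this example,
only the %gs:0x28 offset and the 64-byte buffer size come from the patch
below):
  /*
   * Illustrative only: with MSR_GS_BASE pointing at startup_gs_area,
   * the compiler's canary accesses at %gs:0x28 land at offset 0x28 (40)
   * inside the 64-byte buffer.
   */
  struct startup_canary_layout {
          char          pad[0x28];      /* filler up to the canary slot */
          unsigned long canary;         /* read/written at %gs:0x28 */
          char          rest[64 - 0x28 - sizeof(unsigned long)];
  };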
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/head64.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3e625c61f008..5abdfd0dbbc3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -46,7 +46,7 @@ endif
# non-deterministic coverage.
KCOV_INSTRUMENT := n
-CFLAGS_head$(BITS).o += -fno-stack-protector
+CFLAGS_head32.o += -fno-stack-protector
CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a1711c4594fa..f1b76a54c84e 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -74,6 +74,11 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] = {
[GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
};
+/* For use by stack protector code before switching to virtual addresses */
+#ifdef CONFIG_STACKPROTECTOR
+static char startup_gs_area[64];
+#endif
+
/*
* Address needs to be set at runtime because it references the startup_gdt
* while the kernel still uses a direct mapping.
@@ -605,6 +610,8 @@ void early_setup_idt(void)
*/
void __head startup_64_setup_env(unsigned long physbase)
{
+	u64 gs_area __maybe_unused;
+
/* Load GDT */
startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
native_load_gdt(&startup_gdt_descr);
@@ -614,5 +621,18 @@ void __head startup_64_setup_env(unsigned long physbase)
"movl %%eax, %%ss\n"
"movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
+ /*
+ * GCC stack protection needs a place to store canary values. The
+ * default is %gs:0x28, which is what the kernel currently uses.
+ * Point GS base to a buffer that can be used for this purpose.
+ * Note that newer GCCs now allow this location to be configured,
+ * so if we change from the default in the future we need to ensure
+ * that this buffer overlaps whatever address ends up being used.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+	gs_area = (u64)fixup_pointer(startup_gs_area, physbase);
+	asm volatile("movl %%eax, %%gs\n" : : "a"(__KERNEL_DS) : "memory");
+	native_wrmsr(MSR_GS_BASE, gs_area, gs_area >> 32);
+#endif
startup_64_load_idt(physbase);
}
--
2.17.1
From: Michael Roth <[email protected]>
When the Confidential Computing blob is located by the boot/compressed
kernel, store a pointer to it in boot_params->cc_blob_address to avoid
the need for the run-time kernel to rescan the EFI config table to find
it again.
Since this function is also shared by the run-time kernel, this patch
also adds the logic to make use of boot_params->cc_blob_address when it
has been initialized.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev-shared.c | 40 ++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 651980ddbd65..6f70ba293c5e 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -868,7 +868,6 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
return ES_OK;
}
-#ifdef __BOOT_COMPRESSED
static struct setup_data *get_cc_setup_data(struct boot_params *bp)
{
struct setup_data *hdr = (struct setup_data *)bp->hdr.setup_data;
@@ -888,6 +887,16 @@ static struct setup_data *get_cc_setup_data(struct boot_params *bp)
* 1) Search for CC blob in the following order/precedence:
* - via linux boot protocol / setup_data entry
* - via EFI configuration table
+ * 2) If found, initialize boot_params->cc_blob_address to point to the
+ * blob so that uncompressed kernel can easily access it during very
+ * early boot without the need to re-parse EFI config table
+ * 3) Return a pointer to the CC blob, NULL otherwise.
+ *
+ * For run-time/uncompressed kernel:
+ *
+ * 1) Search for CC blob in the following order/precedence:
+ * - via linux boot protocol / setup_data entry
+ * - via boot_params->cc_blob_address
* 2) Return a pointer to the CC blob, NULL otherwise.
*/
static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
@@ -897,9 +906,11 @@ static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
struct setup_data header;
u32 cc_blob_address;
} *sd;
+#ifdef __BOOT_COMPRESSED
unsigned long conf_table_pa;
unsigned int conf_table_len;
bool efi_64;
+#endif
/* Try to get CC blob via setup_data */
sd = (struct setup_data_cc *)get_cc_setup_data(bp);
@@ -908,29 +919,36 @@ static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
goto out_verify;
}
+#ifdef __BOOT_COMPRESSED
/* CC blob isn't in setup_data, see if it's in the EFI config table */
if (!efi_get_conf_table(bp, &conf_table_pa, &conf_table_len, &efi_64))
(void)efi_find_vendor_table(conf_table_pa, conf_table_len,
EFI_CC_BLOB_GUID, efi_64,
(unsigned long *)&cc_info);
+#else
+ /*
+ * CC blob isn't in setup_data, see if boot kernel passed it via
+ * boot_params.
+ */
+ if (bp->cc_blob_address)
+ cc_info = (struct cc_blob_sev_info *)(unsigned long)bp->cc_blob_address;
+#endif
out_verify:
/* CC blob should be either valid or not present. Fail otherwise. */
if (cc_info && cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
sev_es_terminate(1, GHCB_SNP_UNSUPPORTED);
+#ifdef __BOOT_COMPRESSED
+ /*
+ * Pass run-time kernel a pointer to CC info via boot_params for easier
+ * access during early boot.
+ */
+ bp->cc_blob_address = (u32)(unsigned long)cc_info;
+#endif
+
return cc_info;
}
-#else
-/*
- * Probing for CC blob for run-time kernel will be enabled in a subsequent
- * patch. For now we need to stub this out.
- */
-static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
-{
- return NULL;
-}
-#endif
/*
* Initial set up of CPUID table when running identity-mapped.
--
2.17.1
From: Michael Roth <[email protected]>
CPUID instructions generate a #VC exception for SEV-ES/SEV-SNP guests,
which the early handlers are currently set up to handle. In the case
of SEV-SNP, guests can use a special location in guest memory address
space that has been pre-populated with firmware-validated CPUID
information to look up the relevant CPUID values rather than
requesting them from the hypervisor via a VMGEXIT.
Determine the location of this CPUID memory in advance of any
CPUID instructions/exceptions and, when available, use it to handle
the CPUID lookup.
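In rough terms, once the table location is known a CPUID #VC is handled
along these lines (a simplified sketch of the flow added below; error
handling and the handful of leaf-specific fix-ups are omitted):
  if (sev_snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx) == 0) {
          /* served from the firmware-validated CPUID page */
  } else {
          /* no CPUID table available: fall back to the GHCB MSR protocol */
          if (sev_cpuid_hv(fn, 0, &eax, &ebx, &ecx, &edx))
                  sev_es_terminate(1, GHCB_TERM_CPUID_HV);
  }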
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/efi.c | 1 +
arch/x86/boot/compressed/head_64.S | 1 +
arch/x86/boot/compressed/idt_64.c | 7 +-
arch/x86/boot/compressed/misc.h | 1 +
arch/x86/boot/compressed/sev.c | 3 +
arch/x86/include/asm/sev-common.h | 2 +
arch/x86/include/asm/sev.h | 3 +
arch/x86/kernel/sev-shared.c | 374 +++++++++++++++++++++++++++++
arch/x86/kernel/sev.c | 4 +
9 files changed, 394 insertions(+), 2 deletions(-)
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
index 16ff5cb9a1fb..a1529a230ea7 100644
--- a/arch/x86/boot/compressed/efi.c
+++ b/arch/x86/boot/compressed/efi.c
@@ -176,3 +176,4 @@ efi_get_conf_table(struct boot_params *boot_params,
return 0;
}
+
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a2347ded77ea..1c1658693fc9 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -441,6 +441,7 @@ SYM_CODE_START(startup_64)
.Lon_kernel_cs:
pushq %rsi
+ movq %rsi, %rdi /* real mode address */
call load_stage1_idt
popq %rsi
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 9b93567d663a..1f6511a6625d 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -3,6 +3,7 @@
#include <asm/segment.h>
#include <asm/trapnr.h>
#include "misc.h"
+#include <asm/sev.h>
static void set_idt_entry(int vector, void (*handler)(void))
{
@@ -28,13 +29,15 @@ static void load_boot_idt(const struct desc_ptr *dtr)
}
/* Setup IDT before kernel jumping to .Lrelocated */
-void load_stage1_idt(void)
+void load_stage1_idt(void *rmode)
{
boot_idt_desc.address = (unsigned long)boot_idt;
- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ sev_snp_cpuid_init(rmode);
set_idt_entry(X86_TRAP_VC, boot_stage1_vc);
+ }
load_boot_idt(&boot_idt_desc);
}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 16b092fd7aa1..cdd328aa42c2 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -190,6 +190,7 @@ int efi_get_conf_table(struct boot_params *boot_params,
unsigned long *conf_table_pa,
unsigned int *conf_table_len,
bool *is_efi_64);
+
#else
static inline int
efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 6e8d97c280aa..910bf5cf010e 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -20,6 +20,9 @@
#include <asm/fpu/xcr.h>
#include <asm/ptrace.h>
#include <asm/svm.h>
+#include <asm/cpuid.h>
+#include <linux/efi.h>
+#include <linux/log2.h>
#include "error.h"
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 072540dfb129..5f134c172dbf 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -148,6 +148,8 @@ struct snp_psc_desc {
#define GHCB_TERM_PSC 1 /* Page State Change failure */
#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
#define GHCB_TERM_NOT_VMPL0 3 /* SNP guest is not running at VMPL-0 */
+#define GHCB_TERM_CPUID 4 /* CPUID-validation failure */
+#define GHCB_TERM_CPUID_HV 5 /* CPUID failure during hypervisor fallback */
#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 534fa1c4c881..c73931548346 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include <asm/insn.h>
#include <asm/sev-common.h>
+#include <asm/bootparam.h>
#define GHCB_PROTOCOL_MIN 1ULL
#define GHCB_PROTOCOL_MAX 2ULL
@@ -126,6 +127,7 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
+void sev_snp_cpuid_init(struct boot_params *bp);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -141,6 +143,7 @@ static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz,
static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_wakeup_secondary_cpu(void) { }
+static inline void sev_snp_cpuid_init(struct boot_params *bp) { }
#endif
#endif
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index ae4556925485..651980ddbd65 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -14,6 +14,25 @@
#define has_cpuflag(f) boot_cpu_has(f)
#endif
+struct sev_snp_cpuid_fn {
+ u32 eax_in;
+ u32 ecx_in;
+ u64 unused;
+ u64 unused2;
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+ u64 reserved;
+} __packed;
+
+struct sev_snp_cpuid_info {
+ u32 count;
+ u32 reserved1;
+ u64 reserved2;
+ struct sev_snp_cpuid_fn fn[0];
+} __packed;
+
/*
* Since feature negotiation related variables are set early in the boot
* process they must reside in the .data section so as not to be zeroed
@@ -26,6 +45,15 @@ static u16 __ro_after_init ghcb_version;
/* Bitmap of SEV features supported by the hypervisor */
u64 __ro_after_init sev_hv_features = 0;
+/*
+ * These are also stored in .data section to avoid the need to re-parse
+ * boot_params and re-determine CPUID memory range when .bss is cleared.
+ */
+static int sev_snp_cpuid_enabled __section(".data");
+static unsigned long sev_snp_cpuid_pa __section(".data");
+static unsigned long sev_snp_cpuid_sz __section(".data");
+static const struct sev_snp_cpuid_info *cpuid_info __section(".data");
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -236,6 +264,219 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
return 0;
}
+static bool sev_snp_cpuid_active(void)
+{
+ return sev_snp_cpuid_enabled;
+}
+
+static int sev_snp_cpuid_xsave_size(u64 xfeatures_en, u32 base_size,
+ u32 *xsave_size, bool compacted)
+{
+ u64 xfeatures_found = 0;
+ int i;
+
+ *xsave_size = base_size;
+
+ for (i = 0; i < cpuid_info->count; i++) {
+ const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+ if (!(fn->eax_in == 0xd && fn->ecx_in > 1 && fn->ecx_in < 64))
+ continue;
+ if (!(xfeatures_en & (1UL << fn->ecx_in)))
+ continue;
+ if (xfeatures_found & (1UL << fn->ecx_in))
+ continue;
+
+ xfeatures_found |= (1UL << fn->ecx_in);
+ if (compacted)
+ *xsave_size += fn->eax;
+ else
+ *xsave_size = max(*xsave_size, fn->eax + fn->ebx);
+ }
+
+ /*
+ * Either the guest set unsupported XCR0/XSS bits, or the corresponding
+ * entries in the CPUID table were not present. This is not a valid
+ * state to be in.
+ */
+ if (xfeatures_found != (xfeatures_en & ~3ULL))
+ return -EINVAL;
+
+ return 0;
+}
+
+static void sev_snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+ u32 *ecx, u32 *edx)
+{
+ /*
+	 * Currently the MSR protocol is sufficient to handle fallback cases, but
+	 * should that change, make sure we terminate rather than grab random
+	 * values. Handling can be added in the future to use the GHCB-page
+	 * protocol for cases late enough in boot that the GHCB page is available.
+ */
+ if (cpuid_function_is_indexed(func) && subfunc != 0)
+ sev_es_terminate(1, GHCB_TERM_CPUID_HV);
+
+ if (sev_cpuid_hv(func, 0, eax, ebx, ecx, edx))
+ sev_es_terminate(1, GHCB_TERM_CPUID_HV);
+}
+
+static bool sev_snp_cpuid_find(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+ u32 *ecx, u32 *edx)
+{
+ int i;
+ bool found = false;
+
+ for (i = 0; i < cpuid_info->count; i++) {
+ const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+ if (fn->eax_in != func)
+ continue;
+
+ if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
+ continue;
+
+ *eax = fn->eax;
+ *ebx = fn->ebx;
+ *ecx = fn->ecx;
+ *edx = fn->edx;
+ found = true;
+
+ break;
+ }
+
+ return found;
+}
+
+static bool sev_snp_cpuid_in_range(u32 func)
+{
+ int i;
+ u32 std_range_min = 0;
+ u32 std_range_max = 0;
+ u32 hyp_range_min = 0x40000000;
+ u32 hyp_range_max = 0;
+ u32 ext_range_min = 0x80000000;
+ u32 ext_range_max = 0;
+
+ for (i = 0; i < cpuid_info->count; i++) {
+ const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+ if (fn->eax_in == std_range_min)
+ std_range_max = fn->eax;
+ else if (fn->eax_in == hyp_range_min)
+ hyp_range_max = fn->eax;
+ else if (fn->eax_in == ext_range_min)
+ ext_range_max = fn->eax;
+ }
+
+ if ((func >= std_range_min && func <= std_range_max) ||
+ (func >= hyp_range_min && func <= hyp_range_max) ||
+ (func >= ext_range_min && func <= ext_range_max))
+ return true;
+
+ return false;
+}
+
+/*
+ * Returns -EOPNOTSUPP if feature not enabled. Any other return value should be
+ * treated as fatal by caller since we cannot fall back to hypervisor to fetch
+ * the values for security reasons (outside of the specific cases handled here)
+ */
+static int sev_snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
+ u32 *edx)
+{
+ if (!sev_snp_cpuid_active())
+ return -EOPNOTSUPP;
+
+ if (!cpuid_info)
+ return -EIO;
+
+ if (!sev_snp_cpuid_find(func, subfunc, eax, ebx, ecx, edx)) {
+ /*
+ * Some hypervisors will avoid keeping track of CPUID entries
+ * where all values are zero, since they can be handled the
+ * same as out-of-range values (all-zero). In our case, we want
+ * to be able to distinguish between out-of-range entries and
+ * in-range zero entries, since the CPUID table entries are
+ * only a template that may need to be augmented with
+ * additional values for things like CPU-specific information.
+ * So if it's not in the table, but is still in the valid
+ * range, proceed with the fix-ups below. Otherwise, just return
+ * zeros.
+ */
+ *eax = *ebx = *ecx = *edx = 0;
+ if (!sev_snp_cpuid_in_range(func))
+ goto out;
+ }
+
+ if (func == 0x1) {
+ u32 ebx2, edx2;
+
+ sev_snp_cpuid_hv(func, subfunc, NULL, &ebx2, NULL, &edx2);
+ /* initial APIC ID */
+ *ebx = (*ebx & 0x00FFFFFF) | (ebx2 & 0xFF000000);
+ /* APIC enabled bit */
+ *edx = (*edx & ~BIT_ULL(9)) | (edx2 & BIT_ULL(9));
+
+ /* OSXSAVE enabled bit */
+ if (native_read_cr4() & X86_CR4_OSXSAVE)
+ *ecx |= BIT_ULL(27);
+ } else if (func == 0x7) {
+ /* OSPKE enabled bit */
+ *ecx &= ~BIT_ULL(4);
+ if (native_read_cr4() & X86_CR4_PKE)
+ *ecx |= BIT_ULL(4);
+ } else if (func == 0xB) {
+ /* extended APIC ID */
+ sev_snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx);
+ } else if (func == 0xd && (subfunc == 0x0 || subfunc == 0x1)) {
+ bool compacted = false;
+ u64 xcr0 = 1, xss = 0;
+ u32 xsave_size;
+
+ if (native_read_cr4() & X86_CR4_OSXSAVE)
+ xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+ if (subfunc == 1) {
+ /* boot/compressed doesn't set XSS so 0 is fine there */
+#ifndef __BOOT_COMPRESSED
+ if (*eax & 0x8) /* XSAVES */
+ if (boot_cpu_has(X86_FEATURE_XSAVES))
+ rdmsrl(MSR_IA32_XSS, xss);
+#endif
+ /*
+ * The PPR and APM aren't clear on what size should be
+ * encoded in 0xD:0x1:EBX when compaction is not enabled
+ * by either XSAVEC or XSAVES since SNP-capable hardware
+ * has the entries fixed as 1. KVM sets it to 0 in this
+ * case, but to avoid this becoming an issue it's safer
+ * to simply treat this as unsupported or SNP guests.
+ */
+ if (!(*eax & 0xA)) /* (XSAVEC|XSAVES) */
+ return -EINVAL;
+
+ compacted = true;
+ }
+
+ if (sev_snp_cpuid_xsave_size(xcr0 | xss, *ebx, &xsave_size,
+ compacted))
+ return -EINVAL;
+
+ *ebx = xsave_size;
+ } else if (func == 0x8000001E) {
+ u32 ebx2, ecx2;
+
+ /* extended APIC ID */
+ sev_snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL);
+ /* compute ID */
+		*ebx = (*ebx & 0xFFFFFF00) | (ebx2 & 0x000000FF);
+ /* node ID */
+		*ecx = (*ecx & 0xFFFFFF00) | (ecx2 & 0x000000FF);
+ }
+
+out:
+ return 0;
+}
+
/*
* Boot VC Handler - This is the first VC handler during boot, there is no GHCB
* page yet, so it only supports the MSR based communication with the
@@ -244,15 +485,25 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
unsigned int fn = lower_bits(regs->ax, 32);
+ unsigned int subfn = lower_bits(regs->cx, 32);
u32 eax, ebx, ecx, edx;
+ int ret;
/* Only CPUID is supported via MSR protocol */
if (exit_code != SVM_EXIT_CPUID)
goto fail;
+ ret = sev_snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx);
+ if (ret == 0)
+ goto out;
+
+ if (ret != -EOPNOTSUPP)
+ goto fail;
+
if (sev_cpuid_hv(fn, 0, &eax, &ebx, &ecx, &edx))
goto fail;
+out:
regs->ax = eax;
regs->bx = ebx;
regs->cx = ecx;
@@ -552,6 +803,19 @@ static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
struct pt_regs *regs = ctxt->regs;
u32 cr4 = native_read_cr4();
enum es_result ret;
+ u32 eax, ebx, ecx, edx;
+ int cpuid_ret;
+
+ cpuid_ret = sev_snp_cpuid(regs->ax, regs->cx, &eax, &ebx, &ecx, &edx);
+ if (cpuid_ret == 0) {
+ regs->ax = eax;
+ regs->bx = ebx;
+ regs->cx = ecx;
+ regs->dx = edx;
+ return ES_OK;
+ }
+ if (cpuid_ret != -EOPNOTSUPP)
+ return ES_VMM_ERROR;
ghcb_set_rax(ghcb, regs->ax);
ghcb_set_rcx(ghcb, regs->cx);
@@ -603,3 +867,113 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
return ES_OK;
}
+
+#ifdef __BOOT_COMPRESSED
+static struct setup_data *get_cc_setup_data(struct boot_params *bp)
+{
+ struct setup_data *hdr = (struct setup_data *)bp->hdr.setup_data;
+
+ while (hdr) {
+ if (hdr->type == SETUP_CC_BLOB)
+ return hdr;
+ hdr = (struct setup_data *)hdr->next;
+ }
+
+ return NULL;
+}
+
+/*
+ * For boot/compressed kernel:
+ *
+ * 1) Search for CC blob in the following order/precedence:
+ * - via linux boot protocol / setup_data entry
+ * - via EFI configuration table
+ * 2) Return a pointer to the CC blob, NULL otherwise.
+ */
+static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info = NULL;
+ struct setup_data_cc {
+ struct setup_data header;
+ u32 cc_blob_address;
+ } *sd;
+ unsigned long conf_table_pa;
+ unsigned int conf_table_len;
+ bool efi_64;
+
+ /* Try to get CC blob via setup_data */
+ sd = (struct setup_data_cc *)get_cc_setup_data(bp);
+ if (sd) {
+ cc_info = (struct cc_blob_sev_info *)(unsigned long)sd->cc_blob_address;
+ goto out_verify;
+ }
+
+ /* CC blob isn't in setup_data, see if it's in the EFI config table */
+ if (!efi_get_conf_table(bp, &conf_table_pa, &conf_table_len, &efi_64))
+ (void)efi_find_vendor_table(conf_table_pa, conf_table_len,
+ EFI_CC_BLOB_GUID, efi_64,
+ (unsigned long *)&cc_info);
+
+out_verify:
+ /* CC blob should be either valid or not present. Fail otherwise. */
+ if (cc_info && cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
+ sev_es_terminate(1, GHCB_SNP_UNSUPPORTED);
+
+ return cc_info;
+}
+#else
+/*
+ * Probing for CC blob for run-time kernel will be enabled in a subsequent
+ * patch. For now we need to stub this out.
+ */
+static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
+{
+ return NULL;
+}
+#endif
+
+/*
+ * Initial set up of CPUID table when running identity-mapped.
+ *
+ * NOTE: Since SEV_SNP feature partly relies on CPUID checks that can't
+ * happen until we access CPUID page, we skip the check and hope the
+ * bootloader is providing sane values. Current code relies on all CPUID
+ * page lookups originating from #VC handler, which at least provides
+ * indication that SEV-ES is enabled. Subsequent init levels will check for
+ * SEV_SNP feature once available to also take SEV MSR value into account.
+ */
+void sev_snp_cpuid_init(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info;
+
+ if (!bp)
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ cc_info = sev_snp_probe_cc_blob(bp);
+
+ if (!cc_info)
+ return;
+
+ sev_snp_cpuid_pa = cc_info->cpuid_phys;
+ sev_snp_cpuid_sz = cc_info->cpuid_len;
+
+ /*
+ * These should always be valid values for SNP, even if guest isn't
+ * actually configured to use the CPUID table.
+ */
+ if (!sev_snp_cpuid_pa || sev_snp_cpuid_sz < PAGE_SIZE)
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ cpuid_info = (const struct sev_snp_cpuid_info *)sev_snp_cpuid_pa;
+
+ /*
+ * We should be able to trust the 'count' value in the CPUID table
+ * area, but ensure it agrees with CC blob value to be safe.
+ */
+ if (sev_snp_cpuid_sz < (sizeof(struct sev_snp_cpuid_info) +
+ sizeof(struct sev_snp_cpuid_fn) *
+ cpuid_info->count))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ sev_snp_cpuid_enabled = 1;
+}
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index ddf8ced4a879..d7b6f7420551 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -19,6 +19,8 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/log2.h>
+#include <linux/efi.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -32,6 +34,8 @@
#include <asm/smp.h>
#include <asm/cpu.h>
#include <asm/apic.h>
+#include <asm/efi.h>
+#include <asm/cpuid.h>
#include "sev-internal.h"
--
2.17.1
probe_roms() accesses the memory range (0xc0000 - 0x100000) to probe for
various option ROMs. This range is not part of the E820 system RAM and is
mapped as private (i.e. encrypted) in the page table.
When SEV-SNP is active, all private memory must be validated before it is
accessed. Since the ROM range is not part of the E820 map, the guest BIOS
did not validate it, and accessing it would cause a #VC exception for an
unvalidated page. The guest does not yet support handling #VC exceptions
for unvalidated pages, so validate the ROM memory region before it is
accessed.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/probe_roms.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 9e1def3744f2..9c09df86d167 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -21,6 +21,7 @@
#include <asm/sections.h>
#include <asm/io.h>
#include <asm/setup_arch.h>
+#include <asm/sev.h>
static struct resource system_rom_resource = {
.name = "System ROM",
@@ -197,11 +198,21 @@ static int __init romchecksum(const unsigned char *rom, unsigned long length)
void __init probe_roms(void)
{
- const unsigned char *rom;
unsigned long start, length, upper;
+ const unsigned char *rom;
unsigned char c;
int i;
+ /*
+	 * The ROM memory range is not part of the E820 system RAM and is not
+	 * pre-validated by the BIOS. The kernel page table maps it as encrypted
+	 * memory, and SEV-SNP requires that encrypted memory be validated before
+	 * it is accessed. Validate the ROM range before accessing it.
+ */
+ snp_prep_memory(video_rom_resource.start,
+ ((system_rom_resource.end + 1) - video_rom_resource.start),
+ SNP_PAGE_STATE_PRIVATE);
+
/* video rom */
upper = adapter_rom_resources[0].start;
for (start = video_rom_resource.start; start < upper; start += 2048) {
--
2.17.1
From: Michael Roth <[email protected]>
Determining which CPUID leafs have significant ECX/index values is
also needed by guest kernel code when doing SEV-SNP-validated CPUID
lookups. Move this to common code to keep future updates in sync.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/cpuid.h | 26 ++++++++++++++++++++++++++
arch/x86/kvm/cpuid.c | 17 ++---------------
2 files changed, 28 insertions(+), 15 deletions(-)
create mode 100644 arch/x86/include/asm/cpuid.h
diff --git a/arch/x86/include/asm/cpuid.h b/arch/x86/include/asm/cpuid.h
new file mode 100644
index 000000000000..61426eb1f665
--- /dev/null
+++ b/arch/x86/include/asm/cpuid.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CPUID_H
+#define _ASM_X86_CPUID_H
+
+static __always_inline bool cpuid_function_is_indexed(u32 function)
+{
+ switch (function) {
+ case 4:
+ case 7:
+ case 0xb:
+ case 0xd:
+ case 0xf:
+ case 0x10:
+ case 0x12:
+ case 0x14:
+ case 0x17:
+ case 0x18:
+ case 0x1f:
+ case 0x8000001d:
+ return true;
+ }
+
+ return false;
+}
+
+#endif /* _ASM_X86_CPUID_H */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 739be5da3bca..9ef13775f29e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -19,6 +19,7 @@
#include <asm/user.h>
#include <asm/fpu/xstate.h>
#include <asm/sgx.h>
+#include <asm/cpuid.h>
#include "cpuid.h"
#include "lapic.h"
#include "mmu.h"
@@ -608,22 +609,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
cpuid_count(entry->function, entry->index,
&entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
- switch (function) {
- case 4:
- case 7:
- case 0xb:
- case 0xd:
- case 0xf:
- case 0x10:
- case 0x12:
- case 0x14:
- case 0x17:
- case 0x18:
- case 0x1f:
- case 0x8000001d:
+ if (cpuid_function_is_indexed(function))
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
- break;
- }
return entry;
}
--
2.17.1
From: Borislav Petkov <[email protected]>
Carve it out so that it is abstracted out of the main boot path. All
other encrypted guest-relevant processing should be placed in there.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/head64.c | 55 ++++++++++++++++++++++------------------
1 file changed, 31 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..eee24b427237 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -126,6 +126,36 @@ static bool __head check_la57_support(unsigned long physaddr)
}
#endif
+static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+{
+ unsigned long vaddr, vaddr_end;
+ int i;
+
+ /* Encrypt the kernel and related (if SME is active) */
+ sme_encrypt_kernel(bp);
+
+ /*
+ * Clear the memory encryption mask from the .bss..decrypted section.
+ * The bss section will be memset to zero later in the initialization so
+ * there is no need to zero it after changing the memory encryption
+ * attribute.
+ */
+ if (mem_encrypt_active()) {
+ vaddr = (unsigned long)__start_bss_decrypted;
+ vaddr_end = (unsigned long)__end_bss_decrypted;
+ for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ i = pmd_index(vaddr);
+ pmd[i] -= sme_get_me_mask();
+ }
+ }
+
+ /*
+ * Return the SME encryption mask (if SME is active) to be used as a
+ * modifier for the initial pgdir entry programmed into CR3.
+ */
+ return sme_get_me_mask();
+}
+
/* Code in __startup_64() can be relocated during execution, but the compiler
* doesn't have to generate PC-relative relocations when accessing globals from
* that function. Clang actually does not generate them, which leads to
@@ -135,7 +165,6 @@ static bool __head check_la57_support(unsigned long physaddr)
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
{
- unsigned long vaddr, vaddr_end;
unsigned long load_delta, *p;
unsigned long pgtable_flags;
pgdval_t *pgd;
@@ -276,29 +305,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
*/
*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
- /* Encrypt the kernel and related (if SME is active) */
- sme_encrypt_kernel(bp);
-
- /*
- * Clear the memory encryption mask from the .bss..decrypted section.
- * The bss section will be memset to zero later in the initialization so
- * there is no need to zero it after changing the memory encryption
- * attribute.
- */
- if (mem_encrypt_active()) {
- vaddr = (unsigned long)__start_bss_decrypted;
- vaddr_end = (unsigned long)__end_bss_decrypted;
- for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
- i = pmd_index(vaddr);
- pmd[i] -= sme_get_me_mask();
- }
- }
-
- /*
- * Return the SME encryption mask (if SME is active) to be used as a
- * modifier for the initial pgdir entry programmed into CR3.
- */
- return sme_get_me_mask();
+ return sme_postprocess_startup(bp, pmd);
}
unsigned long __startup_secondary_64(void)
--
2.17.1
While launching encrypted guests, the hypervisor may need to provide some
additional information during guest boot. When booting under an EFI-based
BIOS, the EFI configuration table contains an entry for the confidential
computing blob that holds the required information.
To support booting encrypted guests on non-EFI VMs, the hypervisor needs to
pass this additional information to the kernel with a different method. For
this purpose, introduce the SETUP_CC_BLOB type in setup_data to hold the
physical address of the confidential computing blob. The boot loader or
hypervisor may choose to use this method instead of the EFI configuration
table. The CC blob location scanning should give preference to the
setup_data entry over the EFI configuration table.
In AMD SEV-SNP, the CC blob contains the addresses of the secrets and CPUID
pages. The secrets page includes information such as a VM-to-PSP
communication key, and the CPUID page contains the PSP-filtered CPUID values.
Define the AMD SEV confidential computing blob structure.
While at it, define the EFI GUID for the confidential computing blob.
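As an illustration, a non-EFI boot loader could pass the blob roughly as
follows (a hedged sketch; cc_blob_paddr and the local variable names are
placeholders, only SETUP_CC_BLOB and the setup_data/boot_params layout from
asm/bootparam.h are assumed):
  struct setup_data_cc {
          struct setup_data header;
          __u32 cc_blob_address;
  } cc_sd = {
          .header.type     = SETUP_CC_BLOB,
          .header.len      = sizeof(__u32),
          .cc_blob_address = cc_blob_paddr, /* phys addr of cc_blob_sev_info */
  };
  /* chain the new entry into the existing setup_data list */
  cc_sd.header.next          = boot_params.hdr.setup_data;
  boot_params.hdr.setup_data = (__u64)(unsigned long)&cc_sd;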
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 12 ++++++++++++
arch/x86/include/uapi/asm/bootparam.h | 1 +
include/linux/efi.h | 1 +
3 files changed, 14 insertions(+)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7f063127aa66..534fa1c4c881 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -44,6 +44,18 @@ struct es_em_ctxt {
void do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code);
+/* AMD SEV Confidential computing blob structure */
+#define CC_BLOB_SEV_HDR_MAGIC 0x45444d41
+struct cc_blob_sev_info {
+ u32 magic;
+ u16 version;
+ u16 reserved;
+ u64 secrets_phys;
+ u32 secrets_len;
+ u64 cpuid_phys;
+ u32 cpuid_len;
+};
+
static inline u64 lower_bits(u64 val, unsigned int bits)
{
u64 mask = (1ULL << bits) - 1;
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index b25d3f82c2f3..1ac5acca72ce 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -10,6 +10,7 @@
#define SETUP_EFI 4
#define SETUP_APPLE_PROPERTIES 5
#define SETUP_JAILHOUSE 6
+#define SETUP_CC_BLOB 7
#define SETUP_INDIRECT (1<<31)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 6b5d36babfcc..75aeb2a56888 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -344,6 +344,7 @@ void efi_native_runtime_setup(void);
#define EFI_CERT_SHA256_GUID EFI_GUID(0xc1c41626, 0x504c, 0x4092, 0xac, 0xa9, 0x41, 0xf9, 0x36, 0x93, 0x43, 0x28)
#define EFI_CERT_X509_GUID EFI_GUID(0xa5c059a1, 0x94e4, 0x4aa7, 0x87, 0xb5, 0xab, 0x15, 0x5c, 0x2b, 0xf0, 0x72)
#define EFI_CERT_X509_SHA256_GUID EFI_GUID(0x3bd2a492, 0x96c0, 0x4079, 0xb4, 0x20, 0xfc, 0xf9, 0x8e, 0xf1, 0x03, 0xed)
+#define EFI_CC_BLOB_GUID EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42)
/*
* This GUID is used to pass to the kernel proper the struct screen_info
--
2.17.1
From: Michael Roth <[email protected]>
This code will also be used later for SEV-SNP-validated CPUID code in
some cases, so move it to a common helper.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev-shared.c | 84 +++++++++++++++++++++++++-----------
1 file changed, 58 insertions(+), 26 deletions(-)
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 1adc74ab97c0..ae4556925485 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -184,6 +184,58 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
return ret;
}
+static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+ u32 *ecx, u32 *edx)
+{
+ u64 val;
+
+ if (eax) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_EAX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *eax = (val >> 32);
+ }
+
+ if (ebx) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_EBX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *ebx = (val >> 32);
+ }
+
+ if (ecx) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_ECX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *ecx = (val >> 32);
+ }
+
+ if (edx) {
+ sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(func, GHCB_CPUID_REQ_EDX));
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ return -EIO;
+
+ *edx = (val >> 32);
+ }
+
+ return 0;
+}
+
/*
* Boot VC Handler - This is the first VC handler during boot, there is no GHCB
* page yet, so it only supports the MSR based communication with the
@@ -192,39 +244,19 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
unsigned int fn = lower_bits(regs->ax, 32);
- unsigned long val;
+ u32 eax, ebx, ecx, edx;
/* Only CPUID is supported via MSR protocol */
if (exit_code != SVM_EXIT_CPUID)
goto fail;
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
+ if (sev_cpuid_hv(fn, 0, &eax, &ebx, &ecx, &edx))
goto fail;
- regs->ax = val >> 32;
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EBX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
- goto fail;
- regs->bx = val >> 32;
-
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_ECX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
- goto fail;
- regs->cx = val >> 32;
-
- sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EDX));
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
- if (GHCB_RESP_CODE(val) != GHCB_MSR_CPUID_RESP)
- goto fail;
- regs->dx = val >> 32;
+ regs->ax = eax;
+ regs->bx = ebx;
+ regs->cx = ecx;
+ regs->dx = edx;
/*
* This is a VC handler and the #VC is only raised when SEV-ES is
--
2.17.1
Version 2 of the GHCB specification provides NAEs that the SNP guest can use
to communicate with the PSP without risk from a malicious hypervisor that
wishes to read, alter, drop or replay the messages sent.
In order to communicate with the PSP, the guest needs to locate the secrets
page inserted by the hypervisor during the SEV-SNP guest launch. The
secrets page contains the communication keys used to send and receive the
encrypted messages between the guest and the PSP. The secrets page location
is passed through the setup_data.
Create a platform device that the SNP guest driver can bind to in order to
get the platform resources, such as the encryption key and message ID, used
to communicate with the PSP. The SNP guest driver can then provide a
userspace interface to get the attestation report, derived keys, the
extended attestation report, etc.
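For reference, the driver side might consume this roughly as follows (an
illustrative sketch only; the probe function name is made up, while
dev_get_platdata() and struct snp_guest_platform_data are as used in this
series):
  static int snp_guest_probe(struct platform_device *pdev)
  {
          struct snp_guest_platform_data *data = dev_get_platdata(&pdev->dev);

          if (!data)
                  return -ENODEV;
          /* data->vmpck_id selects the VMPCK, data->vmpck holds the key bytes */
          return 0;
  }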
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/sev.c | 68 +++++++++++++++++++++++++++++++++++++++
include/linux/sev-guest.h | 5 +++
2 files changed, 73 insertions(+)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index f42cd5a8e7bb..ab17c93634e9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -22,6 +22,8 @@
#include <linux/log2.h>
#include <linux/efi.h>
#include <linux/sev-guest.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -37,6 +39,7 @@
#include <asm/apic.h>
#include <asm/efi.h>
#include <asm/cpuid.h>
+#include <asm/setup.h>
#include "sev-internal.h"
@@ -2164,3 +2167,68 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
return ret;
}
EXPORT_SYMBOL_GPL(snp_issue_guest_request);
+
+static struct platform_device guest_req_device = {
+ .name = "snp-guest",
+ .id = -1,
+};
+
+static u64 find_secrets_paddr(void)
+{
+ u64 pa_data = boot_params.cc_blob_address;
+ struct cc_blob_sev_info info;
+ void *map;
+
+ /*
+ * The CC blob contains the address of the secrets page, check if the
+ * blob is present.
+ */
+ if (!pa_data)
+ return 0;
+
+ map = early_memremap(pa_data, sizeof(info));
+ memcpy(&info, map, sizeof(info));
+ early_memunmap(map, sizeof(info));
+
+ /* Verify that secrets page address is passed */
+ if (info.secrets_phys && info.secrets_len == PAGE_SIZE)
+ return info.secrets_phys;
+
+ return 0;
+}
+
+static int __init add_snp_guest_request(void)
+{
+ struct snp_secrets_page_layout *layout;
+ struct snp_guest_platform_data data;
+
+ if (!sev_feature_enabled(SEV_SNP))
+ return -ENODEV;
+
+ snp_secrets_phys = find_secrets_paddr();
+ if (!snp_secrets_phys)
+ return -ENODEV;
+
+ layout = snp_map_secrets_page();
+ if (!layout)
+ return -ENODEV;
+
+ /*
+	 * The secrets page contains three VMPCKs that can be used for
+	 * communicating with the PSP. Choose VMPCK0 to encrypt the guest
+	 * messages sent and received by Linux. Provide the key and its
+	 * id to the driver through the platform data.
+ */
+ data.vmpck_id = 0;
+ memcpy_fromio(data.vmpck, layout->vmpck0, sizeof(data.vmpck));
+
+ iounmap(layout);
+
+ platform_device_add_data(&guest_req_device, &data, sizeof(data));
+
+ if (!platform_device_register(&guest_req_device))
+ dev_info(&guest_req_device.dev, "secret phys 0x%llx\n", snp_secrets_phys);
+
+ return 0;
+}
+device_initcall(add_snp_guest_request);
diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
index 16b6af24fda7..e1cb3f7dd034 100644
--- a/include/linux/sev-guest.h
+++ b/include/linux/sev-guest.h
@@ -68,6 +68,11 @@ struct snp_guest_request_data {
unsigned int data_npages;
};
+struct snp_guest_platform_data {
+ u8 vmpck_id;
+ char vmpck[VMPCK_KEY_LEN];
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
unsigned long *fw_err);
--
2.17.1
The encryption attribute for the .bss..decrypted section is cleared in the
initial page table build because the section contains data that needs to be
shared between the guest and the hypervisor.
When SEV-SNP is active, just clearing the encryption attribute in the page
table is not enough. The page state also needs to be updated in the RMP
table.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/head64.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index eee24b427237..a1711c4594fa 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -143,7 +143,14 @@ static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *p
if (mem_encrypt_active()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
+
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ /*
+			 * When SEV-SNP is active, transition the page to shared in the RMP
+			 * table so that it is consistent with the page table attribute change.
+ */
+ early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+
i = pmd_index(vaddr);
pmd[i] -= sme_get_me_mask();
}
--
2.17.1
From: Tom Lendacky <[email protected]>
To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP
AP Creation NAE event. This allows for guest control over the AP register
state rather than trusting the hypervisor with the SEV-ES Jump Table
address.
During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if
SEV-SNP is active, will set/override apic->wakeup_secondary_cpu. This
will allow the SEV-SNP AP Creation NAE event method to be used to boot
the APs. As a result of installing the override when SEV-SNP is active,
this method of starting the APs becomes the required method. The override
function will fail to start the AP if the hypervisor does not have
support for AP creation.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 1 +
arch/x86/include/asm/sev.h | 4 +
arch/x86/include/uapi/asm/svm.h | 5 +
arch/x86/kernel/sev.c | 205 ++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 3 +
5 files changed, 218 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 3388db814fd0..072540dfb129 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -100,6 +100,7 @@ enum psc_op {
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
#define GHCB_HV_FT_SNP BIT_ULL(0)
+#define GHCB_HV_FT_SNP_AP_CREATION (BIT_ULL(1) | GHCB_HV_FT_SNP)
/* SNP Page State Change NAE event */
#define VMGEXIT_PSC_MAX_ENTRY 253
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 005f230d0406..7f063127aa66 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -65,6 +65,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMPADJUST_VMSA_PAGE_BIT BIT(16)
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -111,6 +113,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+void snp_set_wakeup_secondary_cpu(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -125,6 +128,7 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned i
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_wakeup_secondary_cpu(void) { }
#endif
#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 0dcdb6e0c913..8b4c57baec52 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,10 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_AP_CREATION 0x80000013
+#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
+#define SVM_VMGEXIT_AP_CREATE 1
+#define SVM_VMGEXIT_AP_DESTROY 2
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
@@ -221,6 +225,7 @@
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 106b4aaddfde..ddf8ced4a879 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -18,6 +18,7 @@
#include <linux/memblock.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/cpumask.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -30,6 +31,7 @@
#include <asm/svm.h>
#include <asm/smp.h>
#include <asm/cpu.h>
+#include <asm/apic.h>
#include "sev-internal.h"
@@ -105,6 +107,8 @@ struct ghcb_state {
static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
+static DEFINE_PER_CPU(struct sev_es_save_area *, snp_vmsa);
+
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
@@ -858,6 +862,207 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
pvalidate_pages(vaddr, npages, 1);
}
+static int rmpadjust(void *va, bool vmsa)
+{
+ u64 attrs;
+ int err;
+
+ /*
+ * The RMPADJUST instruction is used to set or clear the VMSA bit for
+ * a page. A change to the VMSA bit is only performed when running
+ * at VMPL0 and is ignored at other VMPL levels. If too low of a target
+ * VMPL level is specified, the instruction can succeed without changing
+ * the VMSA bit should the kernel not be in VMPL0. Using a target VMPL
+ * level of 1 will return a FAIL_PERMISSION error if the kernel is not
+ * at VMPL0, thus ensuring that the VMSA bit has been properly set when
+ * no error is returned.
+ */
+ attrs = 1;
+ if (vmsa)
+ attrs |= RMPADJUST_VMSA_PAGE_BIT;
+
+ /* Instruction mnemonic supported in binutils versions v2.36 and later */
+ asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
+ : "=a" (err)
+ : "a" (va), "c" (RMP_PG_SIZE_4K), "d" (attrs)
+ : "memory", "cc");
+
+ return err;
+}
+
+#define __ATTR_BASE (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
+#define INIT_CS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
+#define INIT_DS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
+
+#define INIT_LDTR_ATTRIBS (SVM_SELECTOR_P_MASK | 2)
+#define INIT_TR_ATTRIBS (SVM_SELECTOR_P_MASK | 3)
+
+static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
+{
+ struct sev_es_save_area *cur_vmsa, *vmsa;
+ struct ghcb_state state;
+ unsigned long flags;
+ struct ghcb *ghcb;
+ int cpu, err, ret;
+ u8 sipi_vector;
+ u64 cr4;
+
+ if ((sev_hv_features & GHCB_HV_FT_SNP_AP_CREATION) != GHCB_HV_FT_SNP_AP_CREATION)
+ return -EOPNOTSUPP;
+
+ /*
+ * Verify the desired start IP against the known trampoline start IP
+ * to catch any future new trampolines that may be introduced that
+ * would require a new protected guest entry point.
+ */
+ if (WARN_ONCE(start_ip != real_mode_header->trampoline_start,
+ "Unsupported SEV-SNP start_ip: %lx\n", start_ip))
+ return -EINVAL;
+
+ /* Override start_ip with known protected guest start IP */
+ start_ip = real_mode_header->sev_es_trampoline_start;
+
+ /* Find the logical CPU for the APIC ID */
+ for_each_present_cpu(cpu) {
+ if (arch_match_cpu_phys_id(cpu, apic_id))
+ break;
+ }
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ cur_vmsa = per_cpu(snp_vmsa, cpu);
+
+ /*
+ * A new VMSA is created each time because there is no guarantee that
+	 * the current VMSA is the kernel's or that the vCPU is not running. If
+	 * an attempt were made to use the current VMSA with a running vCPU, a
+ * #VMEXIT of that vCPU would wipe out all of the settings being done
+ * here.
+ */
+ vmsa = (struct sev_es_save_area *)get_zeroed_page(GFP_KERNEL);
+ if (!vmsa)
+ return -ENOMEM;
+
+ /* CR4 should maintain the MCE value */
+ cr4 = native_read_cr4() & ~X86_CR4_MCE;
+
+ /* Set the CS value based on the start_ip converted to a SIPI vector */
+ sipi_vector = (start_ip >> 12);
+ vmsa->cs.base = sipi_vector << 12;
+ vmsa->cs.limit = 0xffff;
+ vmsa->cs.attrib = INIT_CS_ATTRIBS;
+ vmsa->cs.selector = sipi_vector << 8;
+
+ /* Set the RIP value based on start_ip */
+ vmsa->rip = start_ip & 0xfff;
+
+ /* Set VMSA entries to the INIT values as documented in the APM */
+ vmsa->ds.limit = 0xffff;
+ vmsa->ds.attrib = INIT_DS_ATTRIBS;
+ vmsa->es = vmsa->ds;
+ vmsa->fs = vmsa->ds;
+ vmsa->gs = vmsa->ds;
+ vmsa->ss = vmsa->ds;
+
+ vmsa->gdtr.limit = 0xffff;
+ vmsa->ldtr.limit = 0xffff;
+ vmsa->ldtr.attrib = INIT_LDTR_ATTRIBS;
+ vmsa->idtr.limit = 0xffff;
+ vmsa->tr.limit = 0xffff;
+ vmsa->tr.attrib = INIT_TR_ATTRIBS;
+
+ vmsa->efer = 0x1000; /* Must set SVME bit */
+ vmsa->cr4 = cr4;
+ vmsa->cr0 = 0x60000010;
+ vmsa->dr7 = 0x400;
+ vmsa->dr6 = 0xffff0ff0;
+ vmsa->rflags = 0x2;
+ vmsa->g_pat = 0x0007040600070406ULL;
+ vmsa->xcr0 = 0x1;
+ vmsa->mxcsr = 0x1f80;
+ vmsa->x87_ftw = 0x5555;
+ vmsa->x87_fcw = 0x0040;
+
+ /*
+ * Set the SNP-specific fields for this VMSA:
+ * VMPL level
+ * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
+ */
+ vmsa->vmpl = 0;
+ vmsa->sev_features = sev_status >> 2;
+
+ /* Switch the page over to a VMSA page now that it is initialized */
+ ret = rmpadjust(vmsa, true);
+ if (ret) {
+ pr_err("set VMSA page failed (%u)\n", ret);
+ free_page((unsigned long)vmsa);
+
+ return -EINVAL;
+ }
+
+ /* Issue VMGEXIT AP Creation NAE event */
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+
+ vc_ghcb_invalidate(ghcb);
+ ghcb_set_rax(ghcb, vmsa->sev_features);
+ ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
+ ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
+ ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+ VMGEXIT();
+
+ if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
+ lower_32_bits(ghcb->save.sw_exit_info_1)) {
+ pr_alert("SNP AP Creation error\n");
+ ret = -EINVAL;
+ }
+
+ __sev_put_ghcb(&state);
+
+ local_irq_restore(flags);
+
+ /* Perform cleanup if there was an error */
+ if (ret) {
+ err = rmpadjust(vmsa, false);
+ if (err)
+ pr_err("clear VMSA page failed (%u), leaking page\n", err);
+ else
+ free_page((unsigned long)vmsa);
+
+ vmsa = NULL;
+ }
+
+ /* Free up any previous VMSA page */
+ if (cur_vmsa) {
+ err = rmpadjust(cur_vmsa, false);
+ if (err)
+ pr_err("clear VMSA page failed (%u), leaking page\n", err);
+ else
+ free_page((unsigned long)cur_vmsa);
+ }
+
+ /* Record the current VMSA page */
+ per_cpu(snp_vmsa, cpu) = vmsa;
+
+ return ret;
+}
+
+void snp_set_wakeup_secondary_cpu(void)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ /*
+ * Always set this override if SEV-SNP is enabled. This makes it the
+ * required method to start APs under SEV-SNP. If the hypervisor does
+ * not support AP creation, then no APs will be started.
+ */
+ apic->wakeup_secondary_cpu = wakeup_cpu_via_vmgexit;
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 85f6e242b6b4..ca78711620e0 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -82,6 +82,7 @@
#include <asm/spec-ctrl.h>
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
+#include <asm/sev.h>
#ifdef CONFIG_ACPI_CPPC_LIB
#include <acpi/cppc_acpi.h>
@@ -1380,6 +1381,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
smp_quirk_init_udelay();
speculative_store_bypass_ht_init();
+
+ snp_set_wakeup_secondary_cpu();
}
void arch_thaw_secondary_cpus_begin(void)
--
2.17.1
The SNP_GET_DERIVED_KEY ioctl interface can be used by the SNP guest to
ask the firmware to provide a key derived from a root key. The derived
key may be used by the guest for any purpose it chooses, such as a
sealing key or communicating with external entities.
See SEV-SNP firmware spec for more information.
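A minimal userspace sketch of the ioctl usage (illustrative only; the device
node name is an assumption, the request/response structures and
SNP_GET_DERIVED_KEY come from the uapi header added in this series, and
<fcntl.h>, <sys/ioctl.h> and <linux/sev-guest.h> are assumed to be included):
  struct snp_derived_key_req req = { .msg_version = 1 };
  struct snp_derived_key_resp resp = {};
  struct snp_user_guest_request guest_req = {
          .req_data  = (__u64)(unsigned long)&req,
          .resp_data = (__u64)(unsigned long)&resp,
  };
  int fd = open("/dev/sev-guest", O_RDWR);  /* node name is an assumption */

  if (fd >= 0 && ioctl(fd, SNP_GET_DERIVED_KEY, &guest_req) == 0) {
          /* resp.data now holds the derived key material */
  } else {
          /* on failure, guest_req.fw_err holds the firmware error code */
  }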
Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 18 ++++++++++
drivers/virt/coco/sevguest/sevguest.c | 48 +++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 24 ++++++++++++++
3 files changed, 90 insertions(+)
diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 52d5915037ef..25446670d816 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -67,3 +67,21 @@ provided by the SEV-SNP firmware to query the attestation report.
On success, the snp_report_resp.data will contains the report. The report
format is described in the SEV-SNP specification. See the SEV-SNP specification
for further details.
+
+2.2 SNP_GET_DERIVED_KEY
+-----------------------
+:Technology: sev-snp
+:Type: guest ioctl
+:Parameters (in): struct snp_derived_key_req
+:Returns (out): struct snp_derived_key_resp on success, -negative on error
+
+The SNP_GET_DERIVED_KEY ioctl can be used to get a key derived from a root key.
+The derived key can be used by the guest for any purpose, such as sealing keys
+or communicating with external entities.
+
+The ioctl uses the SNP_GUEST_REQUEST (MSG_KEY_REQ) command provided by the
+SEV-SNP firmware to derive the key. See SEV-SNP specification for further details
+on the various fields passed in the key derivation request.
+
+On success, the snp_derived_key_resp.data will contain the derived key
+value.
diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
index d029a98ad088..621b1c5a9cfc 100644
--- a/drivers/virt/coco/sevguest/sevguest.c
+++ b/drivers/virt/coco/sevguest/sevguest.c
@@ -303,6 +303,50 @@ static int get_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_reque
return rc;
}
+static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_derived_key_resp *resp;
+ struct snp_derived_key_req req;
+ int rc, resp_len;
+
+ if (!arg->req_data || !arg->resp_data)
+ return -EINVAL;
+
+ /* Copy the request payload from the userspace */
+ if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
+ return -EFAULT;
+
+ /* Message version must be non-zero */
+ if (!req.msg_version)
+ return -EINVAL;
+
+ /*
+ * The intermediate response buffer is used while decrypting the
+ * response payload. Make sure that it has enough space to cover the
+ * authtag.
+ */
+ resp_len = sizeof(resp->data) + crypto->a_len;
+ resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
+ if (!resp)
+ return -ENOMEM;
+
+ /* Issue the command to get the derived key */
+ rc = handle_guest_request(snp_dev, req.msg_version, SNP_MSG_KEY_REQ,
+ &req.data, sizeof(req.data), resp->data, resp_len,
+ &arg->fw_err);
+ if (rc)
+ goto e_free;
+
+ /* Copy the response payload to userspace */
+ if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
+ rc = -EFAULT;
+
+e_free:
+ kfree(resp);
+ return rc;
+}
+
static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
struct snp_guest_dev *snp_dev = to_snp_dev(file);
@@ -320,6 +364,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
ret = get_report(snp_dev, &input);
break;
}
+ case SNP_GET_DERIVED_KEY: {
+ ret = get_derived_key(snp_dev, &input);
+ break;
+ }
default:
break;
}
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index e8cfd15133f3..621a9167df7a 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -36,9 +36,33 @@ struct snp_user_guest_request {
__u64 fw_err;
};
+struct __snp_derived_key_req {
+ __u32 root_key_select;
+ __u32 rsvd;
+ __u64 guest_field_select;
+ __u32 vmpl;
+ __u32 guest_svn;
+ __u64 tcb_version;
+};
+
+struct snp_derived_key_req {
+ /* message version number (must be non-zero) */
+ __u8 msg_version;
+
+ struct __snp_derived_key_req data;
+};
+
+struct snp_derived_key_resp {
+ /* response data, see SEV-SNP spec for the format */
+ __u8 data[64];
+};
+
#define SNP_GUEST_REQ_IOC_TYPE 'S'
/* Get SNP attestation report */
#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_user_guest_request)
+/* Get a derived key from the root */
+#define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)
+
#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.17.1
From: Tom Lendacky <[email protected]>
The save area for SEV-ES/SEV-SNP guests, as used by the hardware, is
different from the save area of a non-SEV-ES/SEV-SNP guest.
This is the first step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create an SEV-ES/SEV-SNP save area and adjust usage to the new
save area definition where needed.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 83 +++++++++++++++++++++++++++++---------
arch/x86/kvm/svm/sev.c | 24 +++++------
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 77 insertions(+), 32 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 5ac691c27dcc..edd4a9fe050f 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -225,6 +225,7 @@ struct vmcb_seg {
u64 base;
} __packed;
+/* Save area definition for legacy and SEV-MEM guests */
struct vmcb_save_area {
struct vmcb_seg es;
struct vmcb_seg cs;
@@ -241,8 +242,58 @@ struct vmcb_save_area {
u8 cpl;
u8 reserved_2[4];
u64 efer;
+ u8 reserved_3[112];
+ u64 cr4;
+ u64 cr3;
+ u64 cr0;
+ u64 dr7;
+ u64 dr6;
+ u64 rflags;
+ u64 rip;
+ u8 reserved_4[88];
+ u64 rsp;
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
+ u64 rax;
+ u64 star;
+ u64 lstar;
+ u64 cstar;
+ u64 sfmask;
+ u64 kernel_gs_base;
+ u64 sysenter_cs;
+ u64 sysenter_esp;
+ u64 sysenter_eip;
+ u64 cr2;
+ u8 reserved_5[32];
+ u64 g_pat;
+ u64 dbgctl;
+ u64 br_from;
+ u64 br_to;
+ u64 last_excp_from;
+ u64 last_excp_to;
+ u8 reserved_6[72];
+ u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
+} __packed;
+
+/* Save area definition for SEV-ES and SEV-SNP guests */
+struct sev_es_save_area {
+ struct vmcb_seg es;
+ struct vmcb_seg cs;
+ struct vmcb_seg ss;
+ struct vmcb_seg ds;
+ struct vmcb_seg fs;
+ struct vmcb_seg gs;
+ struct vmcb_seg gdtr;
+ struct vmcb_seg ldtr;
+ struct vmcb_seg idtr;
+ struct vmcb_seg tr;
+ u8 reserved_1[43];
+ u8 cpl;
+ u8 reserved_2[4];
+ u64 efer;
u8 reserved_3[104];
- u64 xss; /* Valid for SEV-ES only */
+ u64 xss;
u64 cr4;
u64 cr3;
u64 cr0;
@@ -270,22 +321,14 @@ struct vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
-
- /*
- * The following part of the save area is valid only for
- * SEV-ES guests when referenced through the GHCB or for
- * saving to the host save area.
- */
- u8 reserved_7[72];
- u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
- u8 reserved_7b[4];
+ u8 reserved_7[80];
u32 pkru;
- u8 reserved_7a[20];
- u64 reserved_8; /* rax already available at 0x01f8 */
+ u8 reserved_9[20];
+ u64 reserved_10; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_9; /* rsp already available at 0x01d8 */
+ u64 reserved_11; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -297,21 +340,21 @@ struct vmcb_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_10[16];
+ u8 reserved_12[16];
u64 sw_exit_code;
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
u64 sev_features;
- u8 reserved_11[48];
+ u8 reserved_13[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
} __packed;
struct ghcb {
- struct vmcb_save_area save;
- u8 reserved_save[2048 - sizeof(struct vmcb_save_area)];
+ struct sev_es_save_area save;
+ u8 reserved_save[2048 - sizeof(struct sev_es_save_area)];
u8 shared_buffer[2032];
@@ -321,13 +364,15 @@ struct ghcb {
} __packed;
-#define EXPECTED_VMCB_SAVE_AREA_SIZE 1032
+#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE
static inline void __unused_size_checks(void)
{
BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
+ BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
}
@@ -397,7 +442,7 @@ struct vmcb {
/* GHCB Accessor functions */
#define GHCB_BITMAP_IDX(field) \
- (offsetof(struct vmcb_save_area, field) / sizeof(u64))
+ (offsetof(struct sev_es_save_area, field) / sizeof(u64))
#define DEFINE_GHCB_ACCESSORS(field) \
static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6710d9ee2e4b..6ce9bafe768c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -553,12 +553,20 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
- struct vmcb_save_area *save = &svm->vmcb->save;
+ struct sev_es_save_area *save = svm->vmsa;
/* Check some debug related fields before encrypting the VMSA */
- if (svm->vcpu.guest_debug || (save->dr7 & ~DR7_FIXED_1))
+ if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
return -EINVAL;
+ /*
+ * SEV-ES will use a VMSA that is pointed to by the VMCB, not
+ * the traditional VMSA that is part of the VMCB. Copy the
+ * traditional VMSA as it has been built so far (in prep
+ * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
+ */
+ memcpy(save, &svm->vmcb->save, sizeof(svm->vmcb->save));
+
/* Sync registgers */
save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX];
save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX];
@@ -585,14 +593,6 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->pkru = svm->vcpu.arch.pkru;
save->xss = svm->vcpu.arch.ia32_xss;
- /*
- * SEV-ES will use a VMSA that is pointed to by the VMCB, not
- * the traditional VMSA that is part of the VMCB. Copy the
- * traditional VMSA as it has been built so far (in prep
- * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
- */
- memcpy(svm->vmsa, save, sizeof(*save));
-
return 0;
}
@@ -2609,7 +2609,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm)
void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
{
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
- struct vmcb_save_area *hostsa;
+ struct sev_es_save_area *hostsa;
/*
* As an SEV-ES guest, hardware will restore the host state on VMEXIT,
@@ -2619,7 +2619,7 @@ void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
vmsave(__sme_page_pa(sd->save_area));
/* XCR0 is restored on VMEXIT, save the current host value */
- hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+ hostsa = (struct sev_es_save_area *)(page_address(sd->save_area) + 0x400);
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
/* PKRU is restored on VMEXIT, save the current host value */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bd0fe94c2920..8f4cdb98d8ee 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -184,7 +184,7 @@ struct vcpu_svm {
} shadow_msr_intercept;
/* SEV-ES support */
- struct vmcb_save_area *vmsa;
+ struct sev_es_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
bool received_first_sipi;
--
2.17.1
From: Michael Roth <[email protected]>
The run-time kernel will need to access the Confidential Computing
blob very early in boot to access the CPUID table it points to. At that
stage of boot it will be relying on the identity-mapped page table set
up by boot/compressed kernel, so make sure we have both of them mapped
in advance.
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 18 ++++++++++++++++++
arch/x86/boot/compressed/sev.c | 2 +-
arch/x86/include/asm/sev.h | 6 ++++++
3 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 3cf7a7575f5c..54374e0f0257 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -37,6 +37,9 @@
#include <asm/setup.h> /* For COMMAND_LINE_SIZE */
#undef _SETUP
+#define __BOOT_COMPRESSED
+#include <asm/sev.h> /* For sev_snp_active() + ConfidentialComputing blob */
+
extern unsigned long get_cmd_line_ptr(void);
/* Used by PAGE_KERN* macros: */
@@ -163,6 +166,21 @@ void initialize_identity_maps(void *rmode)
cmdline = get_cmd_line_ptr();
add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
+ /*
+ * The ConfidentialComputing blob is used very early in the uncompressed
+ * kernel to find the CPUID memory used to handle cpuid instructions. Make
+ * sure identity mappings exist so both can be accessed after the switchover.
+ */
+ if (sev_snp_enabled()) {
+ struct cc_blob_sev_info *cc_info =
+ (void *)(unsigned long)boot_params->cc_blob_address;
+
+ add_identity_map((unsigned long)cc_info,
+ (unsigned long)cc_info + sizeof(*cc_info));
+ add_identity_map((unsigned long)cc_info->cpuid_phys,
+ (unsigned long)cc_info->cpuid_phys + cc_info->cpuid_len);
+ }
+
/* Load the new page-table. */
sev_verify_cbit(top_level_pgt);
write_cr3(top_level_pgt);
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 910bf5cf010e..d1ecba457350 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -123,7 +123,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"
-static inline bool sev_snp_enabled(void)
+bool sev_snp_enabled(void)
{
unsigned long low, high;
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index c73931548346..345740aa5559 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -127,6 +127,9 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
+#ifdef __BOOT_COMPRESSED
+bool sev_snp_enabled(void);
+#endif /* __BOOT_COMPRESSED */
void sev_snp_cpuid_init(struct boot_params *bp);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
@@ -144,6 +147,9 @@ static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npage
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_wakeup_secondary_cpu(void) { }
static inline void sev_snp_cpuid_init(struct boot_params *bp) { }
+#ifdef __BOOT_COMPRESSED
+static inline bool sev_snp_enabled(void) { return false; }
+#endif /*__BOOT_COMPRESSED */
#endif
#endif
--
2.17.1
From: Michael Roth <[email protected]>
The previously defined Confidential Computing blob is provided to the
kernel via a setup_data structure or EFI config table entry. Currently
these are both checked for by boot/compressed kernel to access the
CPUID table address within it for use with SEV-SNP CPUID enforcement.
To also enable SEV-SNP CPUID enforcement for the run-time kernel,
similar early access to the CPUID table is needed while it's
still using the identity-mapped page table set up by boot/compressed,
where global pointers need to be accessed via fixup_pointer().
This isn't much of an issue for accessing setup_data, and the EFI config
table helper code currently used in boot/compressed *could* be used in
this case as well since they both rely on identity-mapping. However, it
has some reliance on EFI helpers/string constants that would need to be
accessed via fixup_pointer(), and fixing it up while making it
shareable between boot/compressed and run-time kernel is fragile and
introduces a good bit of ugliness.
Instead, this patch adds a boot_params->cc_blob_address pointer that
boot/compressed can initialize so that the run-time kernel can access
the previously-located CC blob that way.
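As a rough sketch of how that initialization might look on the
boot/compressed side (the find_cc_blob() helper name here is purely
illustrative, not a function added by this patch; cc_blob_address itself is
the field added below):

  /* boot/compressed side, illustrative only */
  struct cc_blob_sev_info *cc_info;

  cc_info = find_cc_blob(boot_params);  /* from setup_data or the EFI config table */
  if (cc_info)
          boot_params->cc_blob_address = (u32)(unsigned long)cc_info;

Since cc_blob_address is a __u32, this assumes the blob lives below 4GB when
boot/compressed records it.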
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/bootparam_utils.h | 1 +
arch/x86/include/uapi/asm/bootparam.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
index 981fe923a59f..53e9b0620d96 100644
--- a/arch/x86/include/asm/bootparam_utils.h
+++ b/arch/x86/include/asm/bootparam_utils.h
@@ -74,6 +74,7 @@ static void sanitize_boot_params(struct boot_params *boot_params)
BOOT_PARAM_PRESERVE(hdr),
BOOT_PARAM_PRESERVE(e820_table),
BOOT_PARAM_PRESERVE(eddbuf),
+ BOOT_PARAM_PRESERVE(cc_blob_address),
};
memset(&scratch, 0, sizeof(scratch));
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 1ac5acca72ce..bea5cdcdf532 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -188,7 +188,8 @@ struct boot_params {
__u32 ext_ramdisk_image; /* 0x0c0 */
__u32 ext_ramdisk_size; /* 0x0c4 */
__u32 ext_cmd_line_ptr; /* 0x0c8 */
- __u8 _pad4[116]; /* 0x0cc */
+ __u8 _pad4[112]; /* 0x0cc */
+ __u32 cc_blob_address; /* 0x13c */
struct edid_info edid_info; /* 0x140 */
struct efi_info efi_info; /* 0x1c0 */
__u32 alt_mem_k; /* 0x1e0 */
--
2.17.1
From: Tom Lendacky <[email protected]>
The initial implementation of the GHCB spec was based on trying to keep
the register state offsets the same relative to the VM save area. However,
the save area for SEV-ES has changed within the hardware causing the
relation between the SEV-ES save area to change relative to the GHCB save
area.
This is the second step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create a GHCB save area that matches the GHCB specification.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 48 +++++++++++++++++++++++++++++++++++---
1 file changed, 45 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index edd4a9fe050f..748fe1c82a2b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -352,9 +352,49 @@ struct sev_es_save_area {
u64 x87_state_gpa;
} __packed;
+struct ghcb_save_area {
+ u8 reserved_1[203];
+ u8 cpl;
+ u8 reserved_2[116];
+ u64 xss;
+ u8 reserved_3[24];
+ u64 dr7;
+ u8 reserved_4[16];
+ u64 rip;
+ u8 reserved_5[88];
+ u64 rsp;
+ u8 reserved_6[24];
+ u64 rax;
+ u8 reserved_7[264];
+ u64 rcx;
+ u64 rdx;
+ u64 rbx;
+ u8 reserved_8[8];
+ u64 rbp;
+ u64 rsi;
+ u64 rdi;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+ u8 reserved_9[16];
+ u64 sw_exit_code;
+ u64 sw_exit_info_1;
+ u64 sw_exit_info_2;
+ u64 sw_scratch;
+ u8 reserved_10[56];
+ u64 xcr0;
+ u8 valid_bitmap[16];
+ u64 x87_state_gpa;
+} __packed;
+
struct ghcb {
- struct sev_es_save_area save;
- u8 reserved_save[2048 - sizeof(struct sev_es_save_area)];
+ struct ghcb_save_area save;
+ u8 reserved_save[2048 - sizeof(struct ghcb_save_area)];
u8 shared_buffer[2032];
@@ -365,6 +405,7 @@ struct ghcb {
#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
+#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE
@@ -372,6 +413,7 @@ struct ghcb {
static inline void __unused_size_checks(void)
{
BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
+ BUILD_BUG_ON(sizeof(struct ghcb_save_area) != EXPECTED_GHCB_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
@@ -442,7 +484,7 @@ struct vmcb {
/* GHCB Accessor functions */
#define GHCB_BITMAP_IDX(field) \
- (offsetof(struct sev_es_save_area, field) / sizeof(u64))
+ (offsetof(struct ghcb_save_area, field) / sizeof(u64))
#define DEFINE_GHCB_ACCESSORS(field) \
static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
--
2.17.1
The set_memory_{encrypt,decrypt}() are used for changing the pages
from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.
If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:
1. Issue the page state change VMGEXIT to add the memory region in
the RMP table.
2. Validate the memory region after the RMP entry is added.
To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before the encryption
attribute is removed from the page table:
1. Invalidate the page.
2. Issue the page state change VMGEXIT to remove the page from RMP table.
To change the page state in the RMP table, use the Page State Change
VMGEXIT defined in the GHCB specification.
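The resulting ordering, summarized as an informational sketch of the hunks
that follow (not in-tree code):

  /*
   * private -> shared (set_memory_decrypted() path):
   *   1. snp_set_memory_shared(): pvalidate(..., 0) to invalidate, then the
   *      Page State Change VMGEXIT to make the pages shared in the RMP table
   *   2. clear the encryption attribute in the page tables
   *
   * shared -> private (set_memory_encrypted() path):
   *   1. set the encryption attribute in the page tables
   *   2. snp_set_memory_private(): Page State Change VMGEXIT to make the
   *      pages private in the RMP table, then pvalidate(..., 1) to validate
   */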
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 24 +++++
arch/x86/include/asm/sev.h | 4 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kernel/sev.c | 165 ++++++++++++++++++++++++++++++
arch/x86/mm/pat/set_memory.c | 15 +++
5 files changed, 210 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 37aa77565726..3388db814fd0 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -74,6 +74,8 @@
enum psc_op {
SNP_PAGE_STATE_PRIVATE = 1,
SNP_PAGE_STATE_SHARED,
+ SNP_PAGE_STATE_PSMASH,
+ SNP_PAGE_STATE_UNSMASH,
};
#define GHCB_MSR_PSC_REQ 0x014
@@ -99,6 +101,28 @@ enum psc_op {
#define GHCB_HV_FT_SNP BIT_ULL(0)
+/* SNP Page State Change NAE event */
+#define VMGEXIT_PSC_MAX_ENTRY 253
+
+struct psc_hdr {
+ u16 cur_entry;
+ u16 end_entry;
+ u32 reserved;
+} __packed;
+
+struct psc_entry {
+ u64 cur_page : 12,
+ gfn : 40,
+ operation : 4,
+ pagesize : 1,
+ reserved : 7;
+} __packed;
+
+struct snp_psc_desc {
+ struct psc_hdr hdr;
+ struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
+} __packed;
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ecd8cd8c5908..005f230d0406 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -109,6 +109,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
unsigned int npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -121,6 +123,8 @@ early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned
static inline void __init
early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
+static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
#endif
#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index b0ad00f4c1e1..0dcdb6e0c913 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
+#define SVM_VMGEXIT_PSC 0x80000010
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
@@ -219,6 +220,7 @@
{ SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
+ { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 0ddc032fd252..106b4aaddfde 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -693,6 +693,171 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
WARN(1, "invalid memory op %d\n", op);
}
+static int vmgexit_psc(struct snp_psc_desc *desc)
+{
+ int cur_entry, end_entry, ret = 0;
+ struct snp_psc_desc *data;
+ struct ghcb_state state;
+ struct ghcb *ghcb;
+ struct psc_hdr *hdr;
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+ if (unlikely(!ghcb))
+ panic("SEV-SNP: Failed to get GHCB\n");
+
+ /* Copy the input desc into GHCB shared buffer */
+ data = (struct snp_psc_desc *)ghcb->shared_buffer;
+ memcpy(ghcb->shared_buffer, desc, sizeof(*desc));
+
+ hdr = &data->hdr;
+ cur_entry = hdr->cur_entry;
+ end_entry = hdr->end_entry;
+
+ /*
+ * As per the GHCB specification, the hypervisor can resume the guest
+ * before processing all the entries. Check whether all the entries
+ * are processed. If not, then keep retrying.
+ *
+ * The strategy here is to wait for the hypervisor to change the page
+ * state in the RMP table before the guest accesses the memory pages. If
+ * the page state change was not successful, then later memory access
+ * will result in a crash.
+ */
+ while (hdr->cur_entry <= hdr->end_entry) {
+ ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+ ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
+
+ /*
+ * Page State Change VMGEXIT can pass error code through
+ * exit_info_2.
+ */
+ if (WARN(ret || ghcb->save.sw_exit_info_2,
+ "SEV-SNP: PSC failed ret=%d exit_info_2=%llx\n",
+ ret, ghcb->save.sw_exit_info_2)) {
+ ret = 1;
+ goto out;
+ }
+
+ /*
+ * Sanity check that entry processing is not going backward.
+ * This will happen only if hypervisor is tricking us.
+ */
+ if (WARN(hdr->end_entry > end_entry || cur_entry > hdr->cur_entry,
+ "SEV-SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
+ end_entry, hdr->end_entry, cur_entry, hdr->cur_entry)) {
+ ret = 1;
+ goto out;
+ }
+
+ /* Verify that reserved bit is not set */
+ if (WARN(hdr->reserved, "Reserved bit is set in the PSC header\n")) {
+ ret = 1;
+ goto out;
+ }
+ }
+
+out:
+ __sev_put_ghcb(&state);
+ local_irq_restore(flags);
+
+ return ret;
+}
+
+static void __set_page_state(struct snp_psc_desc *data, unsigned long vaddr,
+ unsigned long vaddr_end, int op)
+{
+ struct psc_hdr *hdr;
+ struct psc_entry *e;
+ unsigned long pfn;
+ int i;
+
+ hdr = &data->hdr;
+ e = data->entries;
+
+ memset(data, 0, sizeof(*data));
+ i = 0;
+
+ while (vaddr < vaddr_end) {
+ if (is_vmalloc_addr((void *)vaddr))
+ pfn = vmalloc_to_pfn((void *)vaddr);
+ else
+ pfn = __pa(vaddr) >> PAGE_SHIFT;
+
+ e->gfn = pfn;
+ e->operation = op;
+ hdr->end_entry = i;
+
+ /*
+ * The GHCB specification provides the flexibility to
+ * use either 4K or 2MB page size in the RMP table.
+ * The current SNP support does not keep track of the
+ * page size used in the RMP table. To avoid the
+ * overlapping requests, use the 4K page size in the RMP
+ * table.
+ */
+ e->pagesize = RMP_PG_SIZE_4K;
+
+ vaddr = vaddr + PAGE_SIZE;
+ e++;
+ i++;
+ }
+
+ if (vmgexit_psc(data))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+}
+
+static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
+{
+ unsigned long vaddr_end, next_vaddr;
+ struct snp_psc_desc *desc;
+
+ vaddr = vaddr & PAGE_MASK;
+ vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+ desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
+ if (!desc)
+ panic("SEV-SNP: failed to alloc memory for PSC descriptor\n");
+
+ while (vaddr < vaddr_end) {
+ /*
+ * Calculate the last vaddr that can be fit in one
+ * struct snp_psc_desc.
+ */
+ next_vaddr = min_t(unsigned long, vaddr_end,
+ (VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);
+
+ __set_page_state(desc, vaddr, next_vaddr, op);
+
+ vaddr = next_vaddr;
+ }
+
+ kfree(desc);
+}
+
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ pvalidate_pages(vaddr, npages, 0);
+
+ set_page_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
+}
+
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ set_page_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+ pvalidate_pages(vaddr, npages, 1);
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..8e6952d626ec 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -29,6 +29,7 @@
#include <asm/proto.h>
#include <asm/memtype.h>
#include <asm/set_memory.h>
+#include <asm/sev.h>
#include "../mm_internal.h"
@@ -2009,8 +2010,22 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
*/
cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
+ /*
+ * To maintain the security guarantees of the SEV-SNP guest, invalidate the
+ * memory before clearing the encryption attribute.
+ */
+ if (!enc)
+ snp_set_memory_shared(addr, numpages);
+
ret = __change_page_attr_set_clr(&cpa, 1);
+ /*
+ * Now that memory is mapped encrypted in the page table, validate it
+ * so that it is consistent with the above page state.
+ */
+ if (!ret && enc)
+ snp_set_memory_private(addr, numpages);
+
/*
* After changing the encryption attribute, we need to flush TLBs again
* in case any speculative TLB caching occurred (but no need to flush
--
2.17.1
From: Tom Lendacky <[email protected]>
This is the final step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Update the SEV-ES/SEV-SNP save area to match the APM. This save
area will be used for the upcoming SEV-SNP AP Creation NAE event support.
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 66 +++++++++++++++++++++++++++++---------
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 748fe1c82a2b..44a3f920f886 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -288,7 +288,13 @@ struct sev_es_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u8 reserved_1[43];
+ u64 vmpl0_ssp;
+ u64 vmpl1_ssp;
+ u64 vmpl2_ssp;
+ u64 vmpl3_ssp;
+ u64 u_cet;
+ u8 reserved_1[2];
+ u8 vmpl;
u8 cpl;
u8 reserved_2[4];
u64 efer;
@@ -301,9 +307,19 @@ struct sev_es_save_area {
u64 dr6;
u64 rflags;
u64 rip;
- u8 reserved_4[88];
+ u64 dr0;
+ u64 dr1;
+ u64 dr2;
+ u64 dr3;
+ u64 dr0_addr_mask;
+ u64 dr1_addr_mask;
+ u64 dr2_addr_mask;
+ u64 dr3_addr_mask;
+ u8 reserved_4[24];
u64 rsp;
- u8 reserved_5[24];
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
u64 rax;
u64 star;
u64 lstar;
@@ -314,7 +330,7 @@ struct sev_es_save_area {
u64 sysenter_esp;
u64 sysenter_eip;
u64 cr2;
- u8 reserved_6[32];
+ u8 reserved_5[32];
u64 g_pat;
u64 dbgctl;
u64 br_from;
@@ -323,12 +339,12 @@ struct sev_es_save_area {
u64 last_excp_to;
u8 reserved_7[80];
u32 pkru;
- u8 reserved_9[20];
- u64 reserved_10; /* rax already available at 0x01f8 */
+ u8 reserved_8[20];
+ u64 reserved_9; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_11; /* rsp already available at 0x01d8 */
+ u64 reserved_10; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -340,16 +356,34 @@ struct sev_es_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_12[16];
- u64 sw_exit_code;
- u64 sw_exit_info_1;
- u64 sw_exit_info_2;
- u64 sw_scratch;
+ u8 reserved_11[16];
+ u64 guest_exit_info_1;
+ u64 guest_exit_info_2;
+ u64 guest_exit_int_info;
+ u64 guest_nrip;
u64 sev_features;
- u8 reserved_13[48];
+ u64 vintr_ctrl;
+ u64 guest_exit_code;
+ u64 virtual_tom;
+ u64 tlb_id;
+ u64 pcpu_id;
+ u64 event_inj;
u64 xcr0;
- u8 valid_bitmap[16];
- u64 x87_state_gpa;
+ u8 reserved_12[16];
+
+ /* Floating point area */
+ u64 x87_dp;
+ u32 mxcsr;
+ u16 x87_ftw;
+ u16 x87_fsw;
+ u16 x87_fcw;
+ u16 x87_fop;
+ u16 x87_ds;
+ u16 x87_cs;
+ u64 x87_rip;
+ u8 fpreg_x87[80];
+ u8 fpreg_xmm[256];
+ u8 fpreg_ymm[256];
} __packed;
struct ghcb_save_area {
@@ -406,7 +440,7 @@ struct ghcb {
#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
-#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1648
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE
--
2.17.1
The SEV-ES guest calls sev_es_negotiate_protocol() to negotiate the
GHCB protocol version before establishing the GHCB. Cache the negotiated
GHCB version so that it can be used later.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 2 +-
arch/x86/kernel/sev-shared.c | 17 ++++++++++++++---
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index fa5cd05d3b5b..7ec91b1359df 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -12,7 +12,7 @@
#include <asm/insn.h>
#include <asm/sev-common.h>
-#define GHCB_PROTO_OUR 0x0001UL
+#define GHCB_PROTOCOL_MIN 1ULL
#define GHCB_PROTOCOL_MAX 1ULL
#define GHCB_DEFAULT_USAGE 0ULL
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index dab73fec74ec..58a6efb1f327 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -14,6 +14,15 @@
#define has_cpuflag(f) boot_cpu_has(f)
#endif
+/*
+ * Since feature negotiation related variables are set early in the boot
+ * process they must reside in the .data section so as not to be zeroed
+ * out when the .bss section is later cleared.
+ *
+ * GHCB protocol version negotiated with the hypervisor.
+ */
+static u16 __ro_after_init ghcb_version;
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -51,10 +60,12 @@ static bool sev_es_negotiate_protocol(void)
if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
return false;
- if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTO_OUR ||
- GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR)
+ if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
+ GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
return false;
+ ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+
return true;
}
@@ -99,7 +110,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
enum es_result ret;
/* Fill in protocol and format specifiers */
- ghcb->protocol_version = GHCB_PROTOCOL_MAX;
+ ghcb->protocol_version = ghcb_version;
ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
ghcb_set_sw_exit_code(ghcb, exit_code);
--
2.17.1
Version 2 of the GHCB specification provides the SNP_GUEST_REQUEST and
SNP_EXT_GUEST_REQUEST NAE events that can be used by the SNP guest to
communicate with the PSP.
While at it, add a snp_issue_guest_request() helper that can be used by a
driver or other subsystem to issue the request to the PSP.
See SEV-SNP and GHCB spec for more details.
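A minimal sketch of a kernel caller is shown below; the request and response
pages (request_page/response_page) are placeholders, and setting up and
encrypting their contents is handled by the guest driver added later in this
series:

  /* Illustrative caller of the new helper, error handling kept minimal */
  struct snp_guest_request_data input = {
          .req_gpa  = __pa(request_page),
          .resp_gpa = __pa(response_page),
  };
  unsigned long fw_err = 0;
  int rc;

  rc = snp_issue_guest_request(GUEST_REQUEST, &input, &fw_err);
  if (rc)
          pr_err("SNP guest request failed, rc=%d fw_err=%lx\n", rc, fw_err);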
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/uapi/asm/svm.h | 4 +++
arch/x86/kernel/sev.c | 57 +++++++++++++++++++++++++++++++++
include/linux/sev-guest.h | 48 +++++++++++++++++++++++++++
3 files changed, 109 insertions(+)
create mode 100644 include/linux/sev-guest.h
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8b4c57baec52..5b8bc2b65a5e 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,8 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
+#define SVM_VMGEXIT_EXT_GUEST_REQUEST 0x80000012
#define SVM_VMGEXIT_AP_CREATION 0x80000013
#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
#define SVM_VMGEXIT_AP_CREATE 1
@@ -225,6 +227,8 @@
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_GUEST_REQUEST, "vmgexit_guest_request" }, \
+ { SVM_VMGEXIT_EXT_GUEST_REQUEST, "vmgexit_ext_guest_request" }, \
{ SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index d7b6f7420551..319a40fc57ce 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -21,6 +21,7 @@
#include <linux/cpumask.h>
#include <linux/log2.h>
#include <linux/efi.h>
+#include <linux/sev-guest.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -2028,3 +2029,59 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
while (true)
halt();
}
+
+int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
+{
+ struct ghcb_state state;
+ unsigned long id, flags;
+ struct ghcb *ghcb;
+ int ret;
+
+ if (!sev_feature_enabled(SEV_SNP))
+ return -ENODEV;
+
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+ if (!ghcb) {
+ ret = -EIO;
+ goto e_restore_irq;
+ }
+
+ vc_ghcb_invalidate(ghcb);
+
+ if (type == GUEST_REQUEST) {
+ id = SVM_VMGEXIT_GUEST_REQUEST;
+ } else if (type == EXT_GUEST_REQUEST) {
+ id = SVM_VMGEXIT_EXT_GUEST_REQUEST;
+ ghcb_set_rax(ghcb, input->data_gpa);
+ ghcb_set_rbx(ghcb, input->data_npages);
+ } else {
+ ret = -EINVAL;
+ goto e_put;
+ }
+
+ ret = sev_es_ghcb_hv_call(ghcb, NULL, id, input->req_gpa, input->resp_gpa);
+ if (ret)
+ goto e_put;
+
+ if (ghcb->save.sw_exit_info_2) {
+ /* Number of expected pages is returned in RBX */
+ if (id == EXT_GUEST_REQUEST &&
+ ghcb->save.sw_exit_info_2 == SNP_GUEST_REQ_INVALID_LEN)
+ input->data_npages = ghcb_get_rbx(ghcb);
+
+ if (fw_err)
+ *fw_err = ghcb->save.sw_exit_info_2;
+
+ ret = -EIO;
+ }
+
+e_put:
+ __sev_put_ghcb(&state);
+e_restore_irq:
+ local_irq_restore(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(snp_issue_guest_request);
diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
new file mode 100644
index 000000000000..24dd17507789
--- /dev/null
+++ b/include/linux/sev-guest.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * AMD Secure Encrypted Virtualization (SEV) guest driver interface
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ */
+
+#ifndef __LINUX_SEV_GUEST_H_
+#define __LINUX_SEV_GUEST_H_
+
+#include <linux/types.h>
+
+enum vmgexit_type {
+ GUEST_REQUEST,
+ EXT_GUEST_REQUEST,
+
+ GUEST_REQUEST_MAX
+};
+
+/*
+ * The error code when the data_npages is too small. The error code
+ * is defined in the GHCB specification.
+ */
+#define SNP_GUEST_REQ_INVALID_LEN 0x100000000ULL
+
+struct snp_guest_request_data {
+ unsigned long req_gpa;
+ unsigned long resp_gpa;
+ unsigned long data_gpa;
+ unsigned int data_npages;
+};
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
+ unsigned long *fw_err);
+#else
+
+static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
+ unsigned long *fw_err)
+{
+ return -ENODEV;
+}
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+#endif /* __LINUX_SEV_GUEST_H_ */
--
2.17.1
From: Michael Roth <[email protected]>
This adds support for utilizing the SEV-SNP-validated CPUID table in
the various #VC handler routines used throughout boot/run-time. Mostly
this is handled by re-using the CPUID lookup code introduced earlier
for the boot/compressed kernel, but at various stages of boot some work
needs to be done to ensure the CPUID table is set up and remains
accessible throughout. The following init routines are introduced to
handle this:
sev_snp_cpuid_init():
This sets up access to the CPUID memory range for the #VC handler
that gets set up just after entry to startup_64(). Since the code is
still using an identity mapping, the existing sev_snp_cpuid_init()
used by boot/compressed is used here as well, but annotated as __init
so it can be cleaned up later (boot/compressed/sev.c already defines
away __init when it pulls in shared SEV code). The boot/compressed
kernel handles any necessary lookup of ConfidentialComputing blob
from EFI and puts it into boot_params if present, so only boot_params
needs to be checked.
sev_snp_cpuid_init_virtual():
This is called when the previous identity mapping is gone and the
memory used for the CPUID memory range needs to be mapped into the
new page table with encryption bit set and accessed via __va().
Since this path is also entered later by APs to set up their initial
VC handlers, a function pointer is used to switch them to a handler
that doesn't attempt to re-initialize the SNP CPUID feature, as at
that point it will have already been set up.
sev_snp_cpuid_init_remap_early():
This is called when the previous mapping of CPUID memory range is no
longer present. early_memremap() is now available, so use that to
create a new one that can be used until memremap() is available.
sev_snp_cpuid_init_remap():
This switches away from using early_memremap() to ioremap_encrypted()
to map CPUID memory range, otherwise the leak detector will complain.
This mapping is what gets used for the remaining life of the guest.
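For reference, the boot-stage call sites wired up by the hunks below are, in
order:

  startup_64_load_idt()  -> sev_snp_cpuid_init()              (identity-mapped access)
  early_setup_idt()      -> sev_snp_cpuid_init_virtual()      (__va() mapping)
  setup_arch()           -> sev_snp_cpuid_init_remap_early()  (early_memremap())
  arch_initcall          -> sev_snp_cpuid_init_remap()        (ioremap_encrypted())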
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/realmode.h | 1 +
arch/x86/include/asm/setup.h | 5 +-
arch/x86/include/asm/sev.h | 6 +++
arch/x86/kernel/head64.c | 21 ++++++--
arch/x86/kernel/head_64.S | 6 ++-
arch/x86/kernel/setup.c | 3 ++
arch/x86/kernel/sev-shared.c | 95 ++++++++++++++++++++++++++++++++-
arch/x86/kernel/smpboot.c | 2 +
8 files changed, 129 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d083c873..ff0eecee4235 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -63,6 +63,7 @@ extern unsigned long initial_stack;
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern unsigned long initial_vc_handler;
#endif
+extern unsigned long initial_idt_setup;
extern unsigned char real_mode_blob[];
extern unsigned char real_mode_relocs[];
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index a12458a7a8d4..12fc52894ad8 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -50,8 +50,9 @@ extern void reserve_standard_io_resources(void);
extern void i386_reserve_resources(void);
extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
extern unsigned long __startup_secondary_64(void);
-extern void startup_64_setup_env(unsigned long physbase);
-extern void early_setup_idt(void);
+extern void startup_64_setup_env(unsigned long physbase, struct boot_params *bp);
+extern void early_setup_idt_common(void *rmode);
+extern void __init early_setup_idt(void *rmode);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
#ifdef CONFIG_X86_INTEL_MID
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 345740aa5559..a5f0a1c3ccbe 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -129,6 +129,9 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
#ifdef __BOOT_COMPRESSED
bool sev_snp_enabled(void);
+#else
+void sev_snp_cpuid_init_virtual(void);
+void sev_snp_cpuid_init_remap_early(void);
#endif /* __BOOT_COMPRESSED */
void sev_snp_cpuid_init(struct boot_params *bp);
#else
@@ -149,6 +152,9 @@ static inline void snp_set_wakeup_secondary_cpu(void) { }
static inline void sev_snp_cpuid_init(struct boot_params *bp) { }
#ifdef __BOOT_COMPRESSED
static inline bool sev_snp_enabled(void) { return false; }
+#else
+static inline void sev_snp_cpuid_init_virtual(void) { }
+static inline void sev_snp_cpuid_init_remap_early(void) { }
#endif /*__BOOT_COMPRESSED */
#endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index f1b76a54c84e..4700926deb52 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -576,7 +576,7 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
}
/* This runs while still in the direct mapping */
-static void startup_64_load_idt(unsigned long physbase)
+static void startup_64_load_idt(unsigned long physbase, struct boot_params *bp)
{
struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
@@ -586,6 +586,7 @@ static void startup_64_load_idt(unsigned long physbase)
void *handler;
/* VMM Communication Exception */
+ sev_snp_cpuid_init(bp); /* used by #VC handler */
handler = fixup_pointer(vc_no_ghcb, physbase);
set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
}
@@ -594,8 +595,8 @@ static void startup_64_load_idt(unsigned long physbase)
native_load_idt(desc);
}
-/* This is used when running on kernel addresses */
-void early_setup_idt(void)
+/* Used for all CPUs */
+void early_setup_idt_common(void *rmode)
{
/* VMM Communication Exception */
if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
@@ -605,10 +606,20 @@ void early_setup_idt(void)
native_load_idt(&bringup_idt_descr);
}
+/* This is used by boot processor when running on kernel addresses */
+void __init early_setup_idt(void *rmode)
+{
+ /* SEV-SNP CPUID setup for use by #VC handler */
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ sev_snp_cpuid_init_virtual();
+
+ early_setup_idt_common(rmode);
+}
+
/*
* Setup boot CPU state needed before kernel switches to virtual addresses.
*/
-void __head startup_64_setup_env(unsigned long physbase)
+void __head startup_64_setup_env(unsigned long physbase, struct boot_params *bp)
{
u64 gs_area = (u64)fixup_pointer(startup_gs_area, physbase);
@@ -634,5 +645,5 @@ void __head startup_64_setup_env(unsigned long physbase)
native_wrmsr(MSR_GS_BASE, gs_area, gs_area >> 32);
#endif
- startup_64_load_idt(physbase);
+ startup_64_load_idt(physbase, bp);
}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8b3ebd2bb85..78f35e446498 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -218,7 +218,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
/* Setup and Load IDT */
pushq %rsi
- call early_setup_idt
+ movq %rsi, %rdi
+ movq initial_idt_setup(%rip), %rax
+ ANNOTATE_RETPOLINE_SAFE
+ call *%rax
popq %rsi
/* Check if nx is implemented */
@@ -341,6 +344,7 @@ SYM_DATA(initial_gs, .quad INIT_PER_CPU_VAR(fixed_percpu_data))
#ifdef CONFIG_AMD_MEM_ENCRYPT
SYM_DATA(initial_vc_handler, .quad handle_vc_boot_ghcb)
#endif
+SYM_DATA(initial_idt_setup, .quad early_setup_idt)
/*
* The FRAME_SIZE gap is a convention which helps the in-kernel unwinder
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bff3a784aec5..e81fc19657b7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -48,6 +48,7 @@
#include <asm/thermal.h>
#include <asm/unwind.h>
#include <asm/vsyscall.h>
+#include <asm/sev.h>
#include <linux/vmalloc.h>
/*
@@ -1075,6 +1076,8 @@ void __init setup_arch(char **cmdline_p)
init_mem_mapping();
+ sev_snp_cpuid_init_remap_early();
+
idt_setup_early_pf();
/*
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 6f70ba293c5e..e257df79830c 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -264,7 +264,7 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
return 0;
}
-static bool sev_snp_cpuid_active(void)
+static inline bool sev_snp_cpuid_active(void)
{
return sev_snp_cpuid_enabled;
}
@@ -960,7 +960,7 @@ static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
* indication that SEV-ES is enabled. Subsequent init levels will check for
* SEV_SNP feature once available to also take SEV MSR value into account.
*/
-void sev_snp_cpuid_init(struct boot_params *bp)
+void __init sev_snp_cpuid_init(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;
@@ -995,3 +995,94 @@ void sev_snp_cpuid_init(struct boot_params *bp)
sev_snp_cpuid_enabled = 1;
}
+
+#ifndef __BOOT_COMPRESSED
+
+static bool __init early_make_pgtable_enc(unsigned long physaddr)
+{
+ pmdval_t pmd;
+
+ /* early_pmd_flags hasn't been updated with SME bit yet; add it */
+ pmd = (physaddr & PMD_MASK) + early_pmd_flags + sme_get_me_mask();
+
+ return __early_make_pgtable((unsigned long)__va(physaddr), pmd);
+}
+
+/*
+ * This is called when we switch to virtual kernel addresses, before #PF
+ * handler is set up. boot_params have already been parsed at this point,
+ * but CPUID page is no longer identity-mapped so we need to create a
+ * virtual mapping.
+ */
+void __init sev_snp_cpuid_init_virtual(void)
+{
+ /*
+ * We rely on sev_snp_cpuid_init() to do initial parsing of bootparams
+ * and initial setup. If that didn't enable the feature then don't try
+ * to enable it here.
+ */
+ if (!sev_snp_cpuid_active())
+ return;
+
+ /*
+ * Either boot_params/EFI advertised the feature even though SNP isn't
+ * enabled, or something else went wrong. Bail out.
+ */
+ if (!sev_feature_enabled(SEV_SNP))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /* If feature is enabled, but we can't map CPUID info, we're hosed */
+ if (!early_make_pgtable_enc(sev_snp_cpuid_pa))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ cpuid_info = (const struct sev_snp_cpuid_info *)__va(sev_snp_cpuid_pa);
+}
+
+/* Called after early_ioremap_init() */
+void __init sev_snp_cpuid_init_remap_early(void)
+{
+ if (!sev_snp_cpuid_active())
+ return;
+
+ /*
+ * This really shouldn't be possible at this point.
+ */
+ if (!sev_feature_enabled(SEV_SNP))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ cpuid_info = early_memremap(sev_snp_cpuid_pa, sev_snp_cpuid_sz);
+}
+
+/* Final switch to run-time mapping */
+static int __init sev_snp_cpuid_init_remap(void)
+{
+ if (!sev_snp_cpuid_active())
+ return 0;
+
+ pr_info("Using SNP CPUID page, %d entries present.\n", cpuid_info->count);
+
+ /*
+ * This really shouldn't be possible at this point either.
+ */
+ if (!sev_feature_enabled(SEV_SNP))
+ sev_es_terminate(1, GHCB_TERM_CPUID);
+
+ /* Clean up earlier mapping. */
+ if (cpuid_info)
+ early_memunmap((void *)cpuid_info, sev_snp_cpuid_sz);
+
+ /*
+ * We need ioremap_encrypted() to get an encrypted mapping, but this
+ * is normal RAM so it can be accessed directly.
+ */
+ cpuid_info = (__force void *)ioremap_encrypted(sev_snp_cpuid_pa,
+ sev_snp_cpuid_sz);
+ if (!cpuid_info)
+ return -EIO;
+
+ return 0;
+}
+
+arch_initcall(sev_snp_cpuid_init_remap);
+
+#endif /* __BOOT_COMPRESSED */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ca78711620e0..02c172ab97de 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1044,6 +1044,8 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
initial_code = (unsigned long)start_secondary;
initial_stack = idle->thread.sp;
+ /* don't repeat IDT setup work specific to the BSP */
+ initial_idt_setup = (unsigned long)early_setup_idt_common;
/* Enable the espfix hack for this CPU */
init_espfix_ap(cpu);
--
2.17.1
Many of the integrity guarantees of SEV-SNP are enforced through the
Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
particular page of DRAM should be mapped. The VMs can request the
hypervisor to add pages in the RMP table via the Page State Change VMGEXIT
defined in the GHCB specification. Inside each RMP entry is a Validated
flag; this flag is automatically cleared to 0 by the CPU hardware when a
new RMP entry is created for a guest. Each VM page can be either
validated or invalidated, as indicated by the Validated flag in the RMP
entry. Memory access to a private page that is not validated generates
a #VC. A VM must use the PVALIDATE instruction to validate the private page
before using it.
To maintain the security guarantee of SEV-SNP guests, when transitioning
pages from private to shared, the guest must invalidate the pages before
asking the hypervisor to change the page state to shared in the RMP table.
After the pages are mapped private in the page table, the guest must issue
a page state change VMGEXIT to make the pages private in the RMP table and
validate them.
On boot, the BIOS should have validated the entire system memory. During
the kernel decompression stage, the #VC handler uses set_memory_decrypted()
to make the GHCB page shared (i.e. clear the encryption attribute). While
exiting from decompression, it calls set_page_encrypted() to make the page
private.
Add sev_snp_set_page_{private,shared}() helper that is used by the
set_memory_{decrypt,encrypt}() to change the page state in the RMP table.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 18 ++++++++++-
arch/x86/boot/compressed/misc.h | 6 ++++
arch/x86/boot/compressed/sev.c | 41 +++++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 20 ++++++++++++
4 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..3cf7a7575f5c 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -275,15 +275,31 @@ static int set_clr_page_flags(struct x86_mapping_info *info,
* Changing encryption attributes of a page requires to flush it from
* the caches.
*/
- if ((set | clr) & _PAGE_ENC)
+ if ((set | clr) & _PAGE_ENC) {
clflush_page(address);
+ /*
+ * If the encryption attribute is being cleared, then change
+ * the page state to shared in the RMP table.
+ */
+ if (clr)
+ snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT);
+ }
+
/* Update PTE */
pte = *ptep;
pte = pte_set_flags(pte, set);
pte = pte_clear_flags(pte, clr);
set_pte(ptep, pte);
+ /*
+ * If the encryption attribute is being set, then change the page state to
+ * private in the RMP entry. The page state must be done after the PTE
+ * is updated.
+ */
+ if (set & _PAGE_ENC)
+ snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT);
+
/* Flush TLB after changing encryption attribute */
write_cr3(top_level_pgt);
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 31139256859f..822e0c254b9a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -121,12 +121,18 @@ void set_sev_encryption_mask(void);
#ifdef CONFIG_AMD_MEM_ENCRYPT
void sev_es_shutdown_ghcb(void);
extern bool sev_es_check_ghcb_fault(unsigned long address);
+void snp_set_page_private(unsigned long paddr);
+void snp_set_page_shared(unsigned long paddr);
+
#else
static inline void sev_es_shutdown_ghcb(void) { }
static inline bool sev_es_check_ghcb_fault(unsigned long address)
{
return false;
}
+static inline void snp_set_page_private(unsigned long paddr) { }
+static inline void snp_set_page_shared(unsigned long paddr) { }
+
#endif
/* acpi.c */
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index ec765527546f..5c4ba211bcef 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -164,6 +164,47 @@ static bool is_vmpl0(void)
return true;
}
+static void __page_state_change(unsigned long paddr, enum psc_op op)
+{
+ u64 val;
+
+ if (!sev_snp_enabled())
+ return;
+
+ /*
+ * If private -> shared then invalidate the page before requesting the
+ * state change in the RMP table.
+ */
+ if (op == SNP_PAGE_STATE_SHARED && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+
+ /* Issue VMGEXIT to change the page state in RMP table. */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ /* Read the response of the VMGEXIT. */
+ val = sev_es_rd_ghcb_msr();
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ /*
+ * Now that page is added in the RMP table, validate it so that it is
+ * consistent with the RMP entry.
+ */
+ if (op == SNP_PAGE_STATE_PRIVATE && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+}
+
+void snp_set_page_private(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+}
+
+void snp_set_page_shared(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+}
+
static bool do_early_sev_setup(void)
{
if (!sev_es_negotiate_protocol())
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index d426c30ae7b4..1cd8ce838af8 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -57,6 +57,26 @@
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+/* SNP Page State Change */
+enum psc_op {
+ SNP_PAGE_STATE_PRIVATE = 1,
+ SNP_PAGE_STATE_SHARED,
+};
+
+#define GHCB_MSR_PSC_REQ 0x014
+#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
+ /* GHCBData[55:52] */ \
+ (((u64)((op) & 0xf) << 52) | \
+ /* GHCBData[51:12] */ \
+ ((u64)((gfn) & GENMASK_ULL(39, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_PSC_REQ)
+
+#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_RESP_VAL(val) \
+ /* GHCBData[63:32] */ \
+ (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
+
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
--
2.17.1
The SEV-SNP specification provides the guest a mechanism to communicate with
the PSP without the risk of a malicious hypervisor reading, altering, dropping
or replaying the messages sent. The driver uses snp_issue_guest_request()
to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to
submit the request to the PSP.
The PSP requires that all communication be encrypted using the key
specified through the platform_data.
Userspace can use the SNP_GET_REPORT ioctl() to query the guest
attestation report.
See the SEV-SNP spec section Guest Messages for more details.
Signed-off-by: Brijesh Singh <[email protected]>
---
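Not part of the patch: a rough userspace sketch of how the SNP_GET_REPORT
ioctl described above is expected to be driven. The device node, ioctl number
and structure layouts come from this patch; everything else (error handling,
message version choice) is illustrative only.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/sev-guest.h>

    int main(void)
    {
            struct snp_user_guest_request guest_req = {};
            struct snp_report_resp resp = {};
            struct snp_report_req req = {};
            int fd, ret;

            req.msg_version = 1;                    /* must be non-zero */
            memset(req.user_data, 0, sizeof(req.user_data));

            guest_req.req_data = (__u64)(unsigned long)&req;
            guest_req.resp_data = (__u64)(unsigned long)&resp;

            fd = open("/dev/sev-guest", O_RDWR);
            if (fd < 0)
                    return 1;

            ret = ioctl(fd, SNP_GET_REPORT, &guest_req);
            if (ret)
                    fprintf(stderr, "SNP_GET_REPORT failed, fw_err=%llx\n",
                            (unsigned long long)guest_req.fw_err);

            /* On success, resp.data holds the attestation report blob. */
            close(fd);
            return ret ? 1 : 0;
    }
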
Documentation/virt/coco/sevguest.rst | 69 ++++
drivers/virt/Kconfig | 3 +
drivers/virt/Makefile | 1 +
drivers/virt/coco/sevguest/Kconfig | 9 +
drivers/virt/coco/sevguest/Makefile | 2 +
drivers/virt/coco/sevguest/sevguest.c | 448 ++++++++++++++++++++++++++
drivers/virt/coco/sevguest/sevguest.h | 63 ++++
include/uapi/linux/sev-guest.h | 44 +++
8 files changed, 639 insertions(+)
create mode 100644 Documentation/virt/coco/sevguest.rst
create mode 100644 drivers/virt/coco/sevguest/Kconfig
create mode 100644 drivers/virt/coco/sevguest/Makefile
create mode 100644 drivers/virt/coco/sevguest/sevguest.c
create mode 100644 drivers/virt/coco/sevguest/sevguest.h
create mode 100644 include/uapi/linux/sev-guest.h
diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
new file mode 100644
index 000000000000..52d5915037ef
--- /dev/null
+++ b/Documentation/virt/coco/sevguest.rst
@@ -0,0 +1,69 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================================
+The Definitive SEV Guest API Documentation
+===================================================================
+
+1. General description
+======================
+
+The SEV API is a set of ioctls that are used by the guest or
+hypervisor to get or set certain aspects of the SEV virtual machine.
+The ioctls belong to the following classes:
+
+ - Hypervisor ioctls: These query and set global attributes which affect the
+ whole SEV firmware. These ioctls are used by platform provisioning tools.
+
+ - Guest ioctls: These query and set attributes of the SEV virtual machine.
+
+2. API description
+==================
+
+This section describes ioctls that can be used to query or set SEV guests.
+For each ioctl, the following information is provided along with a
+description:
+
+ Technology:
+ which SEV technology provides this ioctl. sev, sev-es, sev-snp or all.
+
+ Type:
+ hypervisor or guest. The ioctl can be used inside the guest or the
+ hypervisor.
+
+ Parameters:
+ what parameters are accepted by the ioctl.
+
+ Returns:
+ the return value. General error numbers (ENOMEM, EINVAL)
+ are not detailed, but errors with specific meanings are.
+
+The guest ioctls should be issued on the /dev/sev-guest device. The ioctl accepts
+struct snp_user_guest_request. The input and output structures are specified
+through the req_data and resp_data fields respectively. If the ioctl fails
+to execute due to a firmware error, then the fw_err field will be set.
+
+::
+ struct snp_user_guest_request {
+ /* Request and response structure address */
+ __u64 req_data;
+ __u64 resp_data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u64 fw_err;
+ };
+
+2.1 SNP_GET_REPORT
+------------------
+
+:Technology: sev-snp
+:Type: guest ioctl
+:Parameters (in): struct snp_report_req
+:Returns (out): struct snp_report_resp on success, -negative on error
+
+The SNP_GET_REPORT ioctl can be used to query the attestation report from the
+SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
+provided by the SEV-SNP firmware to query the attestation report.
+
+On success, the snp_report_resp.data will contain the report. The report
+format is described in the SEV-SNP specification. See the SEV-SNP specification
+for further details.
diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..e457e47610d3 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
source "drivers/virt/nitro_enclaves/Kconfig"
source "drivers/virt/acrn/Kconfig"
+
+source "drivers/virt/coco/sevguest/Kconfig"
+
endif
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..9c704a6fdcda 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -8,3 +8,4 @@ obj-y += vboxguest/
obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
obj-$(CONFIG_ACRN_HSM) += acrn/
+obj-$(CONFIG_SEV_GUEST) += coco/sevguest/
diff --git a/drivers/virt/coco/sevguest/Kconfig b/drivers/virt/coco/sevguest/Kconfig
new file mode 100644
index 000000000000..96190919cca8
--- /dev/null
+++ b/drivers/virt/coco/sevguest/Kconfig
@@ -0,0 +1,9 @@
+config SEV_GUEST
+ tristate "AMD SEV Guest driver"
+ default y
+ depends on AMD_MEM_ENCRYPT && CRYPTO_AEAD2
+ help
+ The driver can be used by the SEV-SNP guest to communicate with the PSP to
+ request the attestation report and more.
+
+ If you choose 'M' here, this module will be called sevguest.
diff --git a/drivers/virt/coco/sevguest/Makefile b/drivers/virt/coco/sevguest/Makefile
new file mode 100644
index 000000000000..b1ffb2b4177b
--- /dev/null
+++ b/drivers/virt/coco/sevguest/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_SEV_GUEST) += sevguest.o
diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
new file mode 100644
index 000000000000..d029a98ad088
--- /dev/null
+++ b/drivers/virt/coco/sevguest/sevguest.c
@@ -0,0 +1,448 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD Secure Encrypted Virtualization - Secure Nested Paging (SEV-SNP) guest request interface
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
+#include <linux/set_memory.h>
+#include <linux/fs.h>
+#include <crypto/aead.h>
+#include <linux/scatterlist.h>
+#include <linux/sev-guest.h>
+#include <linux/psp-sev.h>
+#include <uapi/linux/sev-guest.h>
+#include <uapi/linux/psp-sev.h>
+
+#include "sevguest.h"
+
+#define DEVICE_NAME "sev-guest"
+#define AAD_LEN 48
+#define MSG_HDR_VER 1
+
+struct snp_guest_crypto {
+ struct crypto_aead *tfm;
+ u8 *iv, *authtag;
+ int iv_len, a_len;
+};
+
+struct snp_guest_dev {
+ struct device *dev;
+ struct miscdevice misc;
+
+ struct snp_guest_crypto *crypto;
+ struct snp_guest_msg *request, *response;
+};
+
+static u8 vmpck_id;
+static DEFINE_MUTEX(snp_cmd_mutex);
+
+static inline struct snp_guest_dev *to_snp_dev(struct file *file)
+{
+ struct miscdevice *dev = file->private_data;
+
+ return container_of(dev, struct snp_guest_dev, misc);
+}
+
+static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, u8 *key, size_t keylen)
+{
+ struct snp_guest_crypto *crypto;
+
+ crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
+ if (!crypto)
+ return NULL;
+
+ crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
+ if (IS_ERR(crypto->tfm))
+ goto e_free;
+
+ if (crypto_aead_setkey(crypto->tfm, key, keylen))
+ goto e_free_crypto;
+
+ crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
+ if (crypto->iv_len < 12) {
+ dev_err(snp_dev->dev, "IV length is less than 12.\n");
+ goto e_free_crypto;
+ }
+
+ crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
+ if (!crypto->iv)
+ goto e_free_crypto;
+
+ if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
+ if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
+ dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
+ goto e_free_crypto;
+ }
+ }
+
+ crypto->a_len = crypto_aead_authsize(crypto->tfm);
+ crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
+ if (!crypto->authtag)
+ goto e_free_crypto;
+
+ return crypto;
+
+e_free_crypto:
+ crypto_free_aead(crypto->tfm);
+e_free:
+ kfree(crypto->iv);
+ kfree(crypto->authtag);
+ kfree(crypto);
+
+ return NULL;
+}
+
+static void deinit_crypto(struct snp_guest_crypto *crypto)
+{
+ crypto_free_aead(crypto->tfm);
+ kfree(crypto->iv);
+ kfree(crypto->authtag);
+ kfree(crypto);
+}
+
+static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
+ u8 *src_buf, u8 *dst_buf, size_t len, bool enc)
+{
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+ struct scatterlist src[3], dst[3];
+ DECLARE_CRYPTO_WAIT(wait);
+ struct aead_request *req;
+ int ret;
+
+ req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+
+ /*
+ * AEAD memory operations:
+ * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
+ * | msg header | plaintext | hdr->authtag |
+ * | bytes 30h - 5Fh | or | |
+ * | | cipher | |
+ * +------------------+------------------+----------------+
+ */
+ sg_init_table(src, 3);
+ sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
+ sg_set_buf(&src[1], src_buf, hdr->msg_sz);
+ sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
+
+ sg_init_table(dst, 3);
+ sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
+ sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
+ sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
+
+ aead_request_set_ad(req, AAD_LEN);
+ aead_request_set_tfm(req, crypto->tfm);
+ aead_request_set_callback(req, 0, crypto_req_done, &wait);
+
+ aead_request_set_crypt(req, src, dst, len, crypto->iv);
+ ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
+
+ aead_request_free(req);
+ return ret;
+}
+
+static int __enc_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
+ void *plaintext, size_t len)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+
+ memset(crypto->iv, 0, crypto->iv_len);
+ memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
+
+ return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
+}
+
+static int dec_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
+ void *plaintext, size_t len)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+
+ /* Build IV with response buffer sequence number */
+ memset(crypto->iv, 0, crypto->iv_len);
+ memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
+
+ return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
+}
+
+static int verify_and_dec_payload(struct snp_guest_dev *snp_dev, void *payload, u32 sz)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg *resp = snp_dev->response;
+ struct snp_guest_msg *req = snp_dev->request;
+ struct snp_guest_msg_hdr *req_hdr = &req->hdr;
+ struct snp_guest_msg_hdr *resp_hdr = &resp->hdr;
+
+ dev_dbg(snp_dev->dev, "response [seqno %lld type %d version %d sz %d]\n",
+ resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
+
+ /* Verify that the sequence counter is incremented by 1 */
+ if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
+ return -EBADMSG;
+
+ /* Verify response message type and version number. */
+ if (resp_hdr->msg_type != (req_hdr->msg_type + 1) ||
+ resp_hdr->msg_version != req_hdr->msg_version)
+ return -EBADMSG;
+
+ /*
+ * If the message size is greater than our buffer length then return
+ * an error.
+ */
+ if (unlikely((resp_hdr->msg_sz + crypto->a_len) > sz))
+ return -EBADMSG;
+
+ return dec_payload(snp_dev, resp, payload, resp_hdr->msg_sz + crypto->a_len);
+}
+
+static int enc_payload(struct snp_guest_dev *snp_dev, int version, u8 type,
+ void *payload, size_t sz)
+{
+ struct snp_guest_msg *req = snp_dev->request;
+ struct snp_guest_msg_hdr *hdr = &req->hdr;
+
+ memset(req, 0, sizeof(*req));
+
+ hdr->algo = SNP_AEAD_AES_256_GCM;
+ hdr->hdr_version = MSG_HDR_VER;
+ hdr->hdr_sz = sizeof(*hdr);
+ hdr->msg_type = type;
+ hdr->msg_version = version;
+ hdr->msg_seqno = snp_msg_seqno();
+ hdr->msg_vmpck = vmpck_id;
+ hdr->msg_sz = sz;
+
+ dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
+ hdr->msg_seqno, hdr->msg_type, hdr->msg_version, hdr->msg_sz);
+
+ return __enc_payload(snp_dev, req, payload, sz);
+}
+
+static int handle_guest_request(struct snp_guest_dev *snp_dev, int version, u8 type,
+ void *req_buf, size_t req_sz, void *resp_buf,
+ u32 resp_sz, __u64 *fw_err)
+{
+ struct snp_guest_request_data data;
+ unsigned long err;
+ int rc;
+
+ memset(snp_dev->response, 0, sizeof(*snp_dev->response));
+
+ /* Encrypt the userspace provided payload */
+ rc = enc_payload(snp_dev, version, type, req_buf, req_sz);
+ if (rc)
+ return rc;
+
+ /* Call firmware to process the request */
+ data.req_gpa = __pa(snp_dev->request);
+ data.resp_gpa = __pa(snp_dev->response);
+ rc = snp_issue_guest_request(GUEST_REQUEST, &data, &err);
+
+ if (fw_err)
+ *fw_err = err;
+
+ if (rc)
+ return rc;
+
+ return verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
+}
+
+static int get_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_report_resp *resp;
+ struct snp_report_req req;
+ int rc, resp_len;
+
+ if (!arg->req_data || !arg->resp_data)
+ return -EINVAL;
+
+ /* Copy the request payload from the userspace */
+ if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
+ return -EFAULT;
+
+ /* Message version must be non-zero */
+ if (!req.msg_version)
+ return -EINVAL;
+
+ /*
+ * The intermediate response buffer is used while decrypting the
+ * response payload. Make sure that it has enough space to cover the
+ * authtag.
+ */
+ resp_len = sizeof(resp->data) + crypto->a_len;
+ resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
+ if (!resp)
+ return -ENOMEM;
+
+ /* Issue the command to get the attestation report */
+ rc = handle_guest_request(snp_dev, req.msg_version, SNP_MSG_REPORT_REQ,
+ &req.user_data, sizeof(req.user_data), resp->data, resp_len,
+ &arg->fw_err);
+ if (rc)
+ goto e_free;
+
+ /* Copy the response payload to userspace */
+ if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
+ rc = -EFAULT;
+
+e_free:
+ kfree(resp);
+ return rc;
+}
+
+static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
+{
+ struct snp_guest_dev *snp_dev = to_snp_dev(file);
+ void __user *argp = (void __user *)arg;
+ struct snp_user_guest_request input;
+ int ret = -ENOTTY;
+
+ if (copy_from_user(&input, argp, sizeof(input)))
+ return -EFAULT;
+
+ mutex_lock(&snp_cmd_mutex);
+
+ switch (ioctl) {
+ case SNP_GET_REPORT: {
+ ret = get_report(snp_dev, &input);
+ break;
+ }
+ default:
+ break;
+ }
+
+ mutex_unlock(&snp_cmd_mutex);
+
+ if (copy_to_user(argp, &input, sizeof(input)))
+ return -EFAULT;
+
+ return ret;
+}
+
+static void free_shared_pages(void *buf, size_t sz)
+{
+ unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ /* If we fail to restore the encryption mask, then leak the pages. */
+ if (set_memory_encrypted((unsigned long)buf, npages))
+ return;
+
+ __free_pages(virt_to_page(buf), get_order(sz));
+}
+
+static void *alloc_shared_pages(size_t sz)
+{
+ unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+ struct page *page;
+ int ret;
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(sz));
+ if (IS_ERR(page))
+ return NULL;
+
+ ret = set_memory_decrypted((unsigned long)page_address(page), npages);
+ if (ret) {
+ __free_pages(page, get_order(sz));
+ return NULL;
+ }
+
+ return page_address(page);
+}
+
+static const struct file_operations snp_guest_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = snp_guest_ioctl,
+};
+
+static int __init snp_guest_probe(struct platform_device *pdev)
+{
+ struct snp_guest_platform_data *data;
+ struct device *dev = &pdev->dev;
+ struct snp_guest_dev *snp_dev;
+ struct miscdevice *misc;
+ int ret;
+
+ if (!dev->platform_data)
+ return -ENODEV;
+
+ data = (struct snp_guest_platform_data *)dev->platform_data;
+ vmpck_id = data->vmpck_id;
+
+ snp_dev = devm_kzalloc(&pdev->dev, sizeof(struct snp_guest_dev), GFP_KERNEL);
+ if (!snp_dev)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, snp_dev);
+ snp_dev->dev = dev;
+
+ snp_dev->crypto = init_crypto(snp_dev, data->vmpck, sizeof(data->vmpck));
+ if (!snp_dev->crypto)
+ return -EIO;
+
+ /* Allocate the shared page used for the request and response message. */
+ snp_dev->request = alloc_shared_pages(sizeof(struct snp_guest_msg));
+ if (IS_ERR(snp_dev->request)) {
+ ret = PTR_ERR(snp_dev->request);
+ goto e_free_crypto;
+ }
+
+ snp_dev->response = alloc_shared_pages(sizeof(struct snp_guest_msg));
+ if (IS_ERR(snp_dev->response)) {
+ ret = PTR_ERR(snp_dev->response);
+ goto e_free_req;
+ }
+
+ misc = &snp_dev->misc;
+ misc->minor = MISC_DYNAMIC_MINOR;
+ misc->name = DEVICE_NAME;
+ misc->fops = &snp_guest_fops;
+
+ return misc_register(misc);
+
+e_free_req:
+ free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
+
+e_free_crypto:
+ deinit_crypto(snp_dev->crypto);
+
+ return ret;
+}
+
+static int __exit snp_guest_remove(struct platform_device *pdev)
+{
+ struct snp_guest_dev *snp_dev = platform_get_drvdata(pdev);
+
+ free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
+ free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
+ deinit_crypto(snp_dev->crypto);
+ misc_deregister(&snp_dev->misc);
+
+ return 0;
+}
+
+static struct platform_driver snp_guest_driver = {
+ .remove = __exit_p(snp_guest_remove),
+ .driver = {
+ .name = "snp-guest",
+ },
+};
+
+module_platform_driver_probe(snp_guest_driver, snp_guest_probe);
+
+MODULE_AUTHOR("Brijesh Singh <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1.0.0");
+MODULE_DESCRIPTION("AMD SNP Guest Driver");
diff --git a/drivers/virt/coco/sevguest/sevguest.h b/drivers/virt/coco/sevguest/sevguest.h
new file mode 100644
index 000000000000..4cd2f8b81154
--- /dev/null
+++ b/drivers/virt/coco/sevguest/sevguest.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ * SEV-SNP API spec is available at https://developer.amd.com/sev
+ */
+
+#ifndef __LINUX_SEVGUEST_H_
+#define __LINUX_SEVGUEST_H_
+
+#include <linux/types.h>
+
+#define MAX_AUTHTAG_LEN 32
+
+/* See SNP spec SNP_GUEST_REQUEST section for the structure */
+enum msg_type {
+ SNP_MSG_TYPE_INVALID = 0,
+ SNP_MSG_CPUID_REQ,
+ SNP_MSG_CPUID_RSP,
+ SNP_MSG_KEY_REQ,
+ SNP_MSG_KEY_RSP,
+ SNP_MSG_REPORT_REQ,
+ SNP_MSG_REPORT_RSP,
+ SNP_MSG_EXPORT_REQ,
+ SNP_MSG_EXPORT_RSP,
+ SNP_MSG_IMPORT_REQ,
+ SNP_MSG_IMPORT_RSP,
+ SNP_MSG_ABSORB_REQ,
+ SNP_MSG_ABSORB_RSP,
+ SNP_MSG_VMRK_REQ,
+ SNP_MSG_VMRK_RSP,
+
+ SNP_MSG_TYPE_MAX
+};
+
+enum aead_algo {
+ SNP_AEAD_INVALID,
+ SNP_AEAD_AES_256_GCM,
+};
+
+struct snp_guest_msg_hdr {
+ u8 authtag[MAX_AUTHTAG_LEN];
+ u64 msg_seqno;
+ u8 rsvd1[8];
+ u8 algo;
+ u8 hdr_version;
+ u16 hdr_sz;
+ u8 msg_type;
+ u8 msg_version;
+ u16 msg_sz;
+ u32 rsvd2;
+ u8 msg_vmpck;
+ u8 rsvd3[35];
+} __packed;
+
+struct snp_guest_msg {
+ struct snp_guest_msg_hdr hdr;
+ u8 payload[4000];
+} __packed;
+
+#endif /* __LINUX_SNP_GUEST_H__ */
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
new file mode 100644
index 000000000000..e8cfd15133f3
--- /dev/null
+++ b/include/uapi/linux/sev-guest.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * Userspace interface for AMD SEV and SEV-SNP guest driver.
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ * SEV API specification is available at: https://developer.amd.com/sev/
+ */
+
+#ifndef __UAPI_LINUX_SEV_GUEST_H_
+#define __UAPI_LINUX_SEV_GUEST_H_
+
+#include <linux/types.h>
+
+struct snp_report_req {
+ /* message version number (must be non-zero) */
+ __u8 msg_version;
+
+ /* user data that should be included in the report */
+ __u8 user_data[64];
+};
+
+struct snp_report_resp {
+ /* response data, see SEV-SNP spec for the format */
+ __u8 data[4000];
+};
+
+struct snp_user_guest_request {
+ /* Request and response structure address */
+ __u64 req_data;
+ __u64 resp_data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u64 fw_err;
+};
+
+#define SNP_GUEST_REQ_IOC_TYPE 'S'
+
+/* Get SNP attestation report */
+#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_user_guest_request)
+
+#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.17.1
Version 2 of the GHCB specification defines an NAE event to get the extended
guest request. It is similar to the SNP_GET_REPORT ioctl; the main difference
is the additional data that can be returned. The additional data returned is a
certificate blob that can be used by the SNP guest user. The certificate blob
layout is defined in the GHCB specification. The driver simply treats the blob
as opaque data and copies it to userspace.
Signed-off-by: Brijesh Singh <[email protected]>
---
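Not part of the patch: an illustrative userspace sketch for SNP_GET_EXT_REPORT.
The structure layouts, the page-alignment requirement for certs_len and the
certs_len update on a too-small buffer come from this patch; the 16 KB buffer
size and the helper name are arbitrary, and 'fd' is assumed to be an open
/dev/sev-guest descriptor as in the earlier SNP_GET_REPORT sketch.

    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/sev-guest.h>

    static int get_ext_report_example(int fd)
    {
            struct snp_user_guest_request guest_req = {};
            struct snp_ext_report_req ext_req = {};
            struct snp_report_resp resp = {};
            void *certs;
            int ret;

            /* certs_len must be page aligned; 16K is just an example size. */
            ext_req.certs_len = 4 * 4096;
            certs = aligned_alloc(4096, ext_req.certs_len);
            if (!certs)
                    return -1;

            ext_req.certs_address = (__u64)(unsigned long)certs;
            ext_req.data.msg_version = 1;           /* must be non-zero */

            guest_req.req_data = (__u64)(unsigned long)&ext_req;
            guest_req.resp_data = (__u64)(unsigned long)&resp;

            ret = ioctl(fd, SNP_GET_EXT_REPORT, &guest_req);

            /*
             * If the supplied buffer was too small, the driver updates
             * ext_req.certs_len with the expected length so the call can be
             * retried; otherwise resp.data holds the report and 'certs'
             * holds the certificate blob.
             */
            free(certs);
            return ret;
    }
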
Documentation/virt/coco/sevguest.rst | 22 +++++
drivers/virt/coco/sevguest/sevguest.c | 126 ++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 13 +++
3 files changed, 161 insertions(+)
diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 25446670d816..7acb8696fca4 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -85,3 +85,25 @@ on the various fileds passed in the key derivation request.
On success, the snp_derived_key_resp.data will contains the derived key
value.
+
+2.2 SNP_GET_EXT_REPORT
+----------------------
+:Technology: sev-snp
+:Type: guest ioctl
+:Parameters (in/out): struct snp_ext_report_req
+:Returns (out): struct snp_report_resp on success, -negative on error
+
+The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
+related to the additional certificate data that is returned with the report.
+The certificate data returned is provided by the hypervisor through the
+SNP_SET_EXT_CONFIG.
+
+The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
+firmware to get the attestation report.
+
+On success, snp_report_resp.data will contain the attestation report and the
+buffer at snp_ext_report_req.certs_address will contain the certificate blob. If
+the length of the blob is smaller than expected, then snp_ext_report_req.certs_len
+will be updated with the expected value.
+
+See GHCB specification for further detail on how to parse the certificate blob.
diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
index 621b1c5a9cfc..d978eb432c4c 100644
--- a/drivers/virt/coco/sevguest/sevguest.c
+++ b/drivers/virt/coco/sevguest/sevguest.c
@@ -39,6 +39,7 @@ struct snp_guest_dev {
struct device *dev;
struct miscdevice misc;
+ void *certs_data;
struct snp_guest_crypto *crypto;
struct snp_guest_msg *request, *response;
};
@@ -347,6 +348,117 @@ static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_
return rc;
}
+static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_request_data input = {};
+ struct snp_ext_report_req req;
+ int ret, npages = 0, resp_len;
+ struct snp_report_resp *resp;
+ struct snp_report_req *rreq;
+ unsigned long fw_err = 0;
+
+ if (!arg->req_data || !arg->resp_data)
+ return -EINVAL;
+
+ /* Copy the request payload from the userspace */
+ if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
+ return -EFAULT;
+
+ rreq = &req.data;
+
+ /* Message version must be non-zero */
+ if (!rreq->msg_version)
+ return -EINVAL;
+
+ if (req.certs_len) {
+ if (req.certs_len > SEV_FW_BLOB_MAX_SIZE ||
+ !IS_ALIGNED(req.certs_len, PAGE_SIZE))
+ return -EINVAL;
+ }
+
+ if (req.certs_address && req.certs_len) {
+ if (!access_ok(req.certs_address, req.certs_len))
+ return -EFAULT;
+
+ /*
+ * Initialize the intermediate buffer with all zeros. This buffer
+ * is used in the guest request message to get the certs blob from
+ * the host. If the host does not supply any certs in it, then we copy
+ * zeros to indicate that certificate data was not provided.
+ */
+ memset(snp_dev->certs_data, 0, req.certs_len);
+
+ input.data_gpa = __pa(snp_dev->certs_data);
+ npages = req.certs_len >> PAGE_SHIFT;
+ }
+
+ /*
+ * The intermediate response buffer is used while decrypting the
+ * response payload. Make sure that it has enough space to cover the
+ * authtag.
+ */
+ resp_len = sizeof(resp->data) + crypto->a_len;
+ resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
+ if (!resp)
+ return -ENOMEM;
+
+ if (copy_from_user(resp, (void __user *)arg->resp_data, sizeof(*resp))) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ /* Encrypt the userspace provided payload */
+ ret = enc_payload(snp_dev, rreq->msg_version, SNP_MSG_REPORT_REQ,
+ &rreq->user_data, sizeof(rreq->user_data));
+ if (ret)
+ goto e_free;
+
+ /* Call firmware to process the request */
+ input.req_gpa = __pa(snp_dev->request);
+ input.resp_gpa = __pa(snp_dev->response);
+ input.data_npages = npages;
+ memset(snp_dev->response, 0, sizeof(*snp_dev->response));
+ ret = snp_issue_guest_request(EXT_GUEST_REQUEST, &input, &fw_err);
+
+ /* Propagate any firmware error to the userspace */
+ arg->fw_err = fw_err;
+
+ /* If certs length is invalid then copy the returned length */
+ if (arg->fw_err == SNP_GUEST_REQ_INVALID_LEN) {
+ req.certs_len = input.data_npages << PAGE_SHIFT;
+
+ if (copy_to_user((void __user *)arg->req_data, &req, sizeof(req)))
+ ret = -EFAULT;
+
+ goto e_free;
+ }
+
+ if (ret)
+ goto e_free;
+
+ /* Decrypt the response payload */
+ ret = verify_and_dec_payload(snp_dev, resp->data, resp_len);
+ if (ret)
+ goto e_free;
+
+ /* Copy the certificate data blob to userspace */
+ if (req.certs_address &&
+ copy_to_user((void __user *)req.certs_address, snp_dev->certs_data,
+ req.certs_len)) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ /* Copy the response payload to userspace */
+ if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
+ ret = -EFAULT;
+
+e_free:
+ kfree(resp);
+ return ret;
+}
+
static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
struct snp_guest_dev *snp_dev = to_snp_dev(file);
@@ -368,6 +480,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
ret = get_derived_key(snp_dev, &input);
break;
}
+ case SNP_GET_EXT_REPORT: {
+ ret = get_ext_report(snp_dev, &input);
+ break;
+ }
default:
break;
}
@@ -453,6 +569,12 @@ static int __init snp_guest_probe(struct platform_device *pdev)
goto e_free_req;
}
+ snp_dev->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
+ if (IS_ERR(snp_dev->certs_data)) {
+ ret = PTR_ERR(snp_dev->certs_data);
+ goto e_free_resp;
+ }
+
misc = &snp_dev->misc;
misc->minor = MISC_DYNAMIC_MINOR;
misc->name = DEVICE_NAME;
@@ -460,6 +582,9 @@ static int __init snp_guest_probe(struct platform_device *pdev)
return misc_register(misc);
+e_free_resp:
+ free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
+
e_free_req:
free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
@@ -475,6 +600,7 @@ static int __exit snp_guest_remove(struct platform_device *pdev)
free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
+ free_shared_pages(snp_dev->certs_data, SEV_FW_BLOB_MAX_SIZE);
deinit_crypto(snp_dev->crypto);
misc_deregister(&snp_dev->misc);
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 621a9167df7a..23659215fcfb 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -57,6 +57,16 @@ struct snp_derived_key_resp {
__u8 data[64];
};
+struct snp_ext_report_req {
+ struct snp_report_req data;
+
+ /* where to copy the certificate blob */
+ __u64 certs_address;
+
+ /* length of the certificate blob */
+ __u32 certs_len;
+};
+
#define SNP_GUEST_REQ_IOC_TYPE 'S'
/* Get SNP attestation report */
@@ -65,4 +75,7 @@ struct snp_derived_key_resp {
/* Get a derived key from the root */
#define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)
+/* Get SNP extended report as defined in the GHCB specification version 2. */
+#define SNP_GET_EXT_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x2, struct snp_user_guest_request)
+
#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.17.1
The SNP guest request message header contains a message count. The
message count is used while building the IV. The PSP firmware increments
the message count by 1, and expects that the next message will use the
incremented count. The snp_msg_seqno() helper will be used by the driver to
get the message sequence counter used in the request message header,
and it will be automatically incremented after a successful request.
The incremented value is saved in the secrets page so that a kexec'ed
kernel knows where to begin.
Signed-off-by: Brijesh Singh <[email protected]>
---
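Not part of the patch: a condensed restatement of the counter bookkeeping added
below, for readers who want the protocol in one place. 'stored' stands in for
the msg_seqno_0 slot in the secrets page; names and types are only for
illustration and follow kernel conventions.

    /* Value placed in the request header (and hence in the AES-GCM IV). */
    static u64 example_msg_seqno(u32 stored)
    {
            u64 count = (u64)stored + 1;

            /* GHCB spec v2 only defines 32 bits of storage for the counter. */
            return (count >= UINT_MAX) ? 0 : count;
    }

    /*
     * After a successful request the PSP has consumed one value, so the
     * guest stores stored + 2; the next request (or a kexec'ed kernel)
     * continues from there.
     */
    static u32 example_stored_seqno_after_success(u32 stored)
    {
            return stored + 2;
    }
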
arch/x86/kernel/sev.c | 79 +++++++++++++++++++++++++++++++++++++++
include/linux/sev-guest.h | 37 ++++++++++++++++++
2 files changed, 116 insertions(+)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 319a40fc57ce..f42cd5a8e7bb 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -51,6 +51,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
*/
static struct ghcb __initdata *boot_ghcb;
+static u64 snp_secrets_phys;
+
/* #VC handler runtime per-CPU data */
struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -2030,6 +2032,80 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
halt();
}
+static struct snp_secrets_page_layout *snp_map_secrets_page(void)
+{
+ u16 __iomem *secrets;
+
+ if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
+ return NULL;
+
+ secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
+ if (!secrets)
+ return NULL;
+
+ return (struct snp_secrets_page_layout *)secrets;
+}
+
+static inline u64 snp_read_msg_seqno(void)
+{
+ struct snp_secrets_page_layout *layout;
+ u64 count;
+
+ layout = snp_map_secrets_page();
+ if (!layout)
+ return 0;
+
+ /* Read the current message sequence counter from secrets pages */
+ count = readl(&layout->os_area.msg_seqno_0);
+
+ iounmap(layout);
+
+ /* The sequence counter must begin with 1 */
+ if (!count)
+ return 1;
+
+ return count + 1;
+}
+
+u64 snp_msg_seqno(void)
+{
+ u64 count = snp_read_msg_seqno();
+
+ if (unlikely(!count))
+ return 0;
+
+ /*
+ * The message sequence counter for the SNP guest request is a
+ * 64-bit value but version 2 of the GHCB specification defines a
+ * 32-bit storage for it.
+ */
+ if (count >= UINT_MAX)
+ return 0;
+
+ return count;
+}
+EXPORT_SYMBOL_GPL(snp_msg_seqno);
+
+static void snp_gen_msg_seqno(void)
+{
+ struct snp_secrets_page_layout *layout;
+ u64 count;
+
+ layout = snp_map_secrets_page();
+ if (!layout)
+ return;
+
+ /*
+ * The counter is also incremented by the PSP, so increment it by 2
+ * and save in secrets page.
+ */
+ count = readl(&layout->os_area.msg_seqno_0);
+ count += 2;
+
+ writel(count, &layout->os_area.msg_seqno_0);
+ iounmap(layout);
+}
+
int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
{
struct ghcb_state state;
@@ -2077,6 +2153,9 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
ret = -EIO;
}
+ /* The command was successful, increment the sequence counter */
+ snp_gen_msg_seqno();
+
e_put:
__sev_put_ghcb(&state);
e_restore_irq:
diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
index 24dd17507789..16b6af24fda7 100644
--- a/include/linux/sev-guest.h
+++ b/include/linux/sev-guest.h
@@ -20,6 +20,41 @@ enum vmgexit_type {
GUEST_REQUEST_MAX
};
+/*
+ * The secrets page contains 96 bytes of reserved space that can be used by
+ * the guest OS. The guest OS uses the area to save the message sequence
+ * number for each VMPCK.
+ *
+ * See the GHCB spec section Secret page layout for the format for this area.
+ */
+struct secrets_os_area {
+ u32 msg_seqno_0;
+ u32 msg_seqno_1;
+ u32 msg_seqno_2;
+ u32 msg_seqno_3;
+ u64 ap_jump_table_pa;
+ u8 rsvd[40];
+ u8 guest_usage[32];
+} __packed;
+
+#define VMPCK_KEY_LEN 32
+
+/* See the SNP spec for secrets page format */
+struct snp_secrets_page_layout {
+ u32 version;
+ u32 imien : 1,
+ rsvd1 : 31;
+ u32 fms;
+ u32 rsvd2;
+ u8 gosvw[16];
+ u8 vmpck0[VMPCK_KEY_LEN];
+ u8 vmpck1[VMPCK_KEY_LEN];
+ u8 vmpck2[VMPCK_KEY_LEN];
+ u8 vmpck3[VMPCK_KEY_LEN];
+ struct secrets_os_area os_area;
+ u8 rsvd3[3840];
+} __packed;
+
/*
* The error code when the data_npages is too small. The error code
* is defined in the GHCB specification.
@@ -36,6 +71,7 @@ struct snp_guest_request_data {
#ifdef CONFIG_AMD_MEM_ENCRYPT
int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
unsigned long *fw_err);
+u64 snp_msg_seqno(void);
#else
static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
@@ -43,6 +79,7 @@ static inline int snp_issue_guest_request(int type, struct snp_guest_request_dat
{
return -ENODEV;
}
+static inline u64 snp_msg_seqno(void) { return 0; }
#endif /* CONFIG_AMD_MEM_ENCRYPT */
#endif /* __LINUX_SEV_GUEST_H__ */
--
2.17.1
The hypervisor uses the sev_features field (offset 3B0h) in the Save State
Area to control the SEV-SNP guest features such as SNPActive, vTOM,
ReflectVC, etc. An SEV-SNP guest can read the SEV_FEATURES field through
the SEV_STATUS MSR.
While at it, update dump_vmcb() to log the VMPL level.
See APM2 Table 15-34 and B-4 for more details.
Signed-off-by: Brijesh Singh <[email protected]>
---
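Not part of the patch: a minimal guest-side sketch of reading the reflected
SNPActive state, built on the MSR_AMD64_SEV and MSR_AMD64_SEV_SNP_ENABLED
definitions used elsewhere in this series; the function name is illustrative.

    static bool snp_active_example(void)
    {
            u64 status = __rdmsr(MSR_AMD64_SEV);

            /* SEV_FEATURES[SNPActive] is reflected into the SEV_STATUS MSR. */
            return status & MSR_AMD64_SEV_SNP_ENABLED;
    }
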
arch/x86/include/asm/svm.h | 6 ++++--
arch/x86/kvm/svm/svm.c | 4 ++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e322676039f4..5ac691c27dcc 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -236,7 +236,8 @@ struct vmcb_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u8 reserved_1[43];
+ u8 reserved_1[42];
+ u8 vmpl;
u8 cpl;
u8 reserved_2[4];
u64 efer;
@@ -301,7 +302,8 @@ struct vmcb_save_area {
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
- u8 reserved_11[56];
+ u64 sev_features;
+ u8 reserved_11[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e8ccab50ebf6..25773bf72158 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3237,8 +3237,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
"tr:",
save01->tr.selector, save01->tr.attrib,
save01->tr.limit, save01->tr.base);
- pr_err("cpl: %d efer: %016llx\n",
- save->cpl, save->efer);
+ pr_err("vmpl: %d cpl: %d efer: %016llx\n",
+ save->vmpl, save->cpl, save->efer);
pr_err("%-15s %016llx %-13s %016llx\n",
"cr0:", save->cr0, "cr2:", save->cr2);
pr_err("%-15s %016llx %-13s %016llx\n",
--
2.17.1
Version 2 of the GHCB specification added the advertisement of features
that are supported by the hypervisor. If the hypervisor supports SEV-SNP,
then it must set the SEV-SNP feature bit to indicate that the base
SEV-SNP is supported.
Check the SEV-SNP feature while establishing the GHCB; if the check fails,
terminate the guest.
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 26 ++++++++++++++++++++++++--
arch/x86/include/asm/sev-common.h | 3 +++
arch/x86/kernel/sev.c | 8 ++++++--
3 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 7760959fe96d..7be325d9b09f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -25,6 +25,7 @@
struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
struct ghcb *boot_ghcb;
+static u64 msr_sev_status;
/*
* Copy a version of this function here - insn-eval.c can't be used in
@@ -119,11 +120,32 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"
-static bool early_setup_sev_es(void)
+static inline bool sev_snp_enabled(void)
+{
+ unsigned long low, high;
+
+ if (!msr_sev_status) {
+ asm volatile("rdmsr\n"
+ : "=a" (low), "=d" (high)
+ : "c" (MSR_AMD64_SEV));
+ msr_sev_status = (high << 32) | low;
+ }
+
+ return msr_sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+}
+
+static bool do_early_sev_setup(void)
{
if (!sev_es_negotiate_protocol())
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_PROT_UNSUPPORTED);
+ /*
+ * If SEV-SNP is enabled, then check if the hypervisor supports the SEV-SNP
+ * features.
+ */
+ if (sev_snp_enabled() && !(sev_hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
@@ -174,7 +196,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
struct es_em_ctxt ctxt;
enum es_result result;
- if (!boot_ghcb && !early_setup_sev_es())
+ if (!boot_ghcb && !do_early_sev_setup())
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 891569c07ed7..f80a3cde2086 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -64,6 +64,8 @@
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
+#define GHCB_HV_FT_SNP BIT_ULL(0)
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
@@ -80,6 +82,7 @@
#define SEV_TERM_SET_GEN 0
#define GHCB_SEV_ES_GEN_REQ 0
#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
+#define GHCB_SNP_UNSUPPORTED 2
/* Linux-specific reason codes (used with reason set 1) */
#define SEV_TERM_SET_LINUX 1
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 646912709334..06e6914cdc26 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -662,12 +662,16 @@ static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
* This function runs on the first #VC exception after the kernel
* switched to virtual addresses.
*/
-static bool __init sev_es_setup_ghcb(void)
+static bool __init setup_ghcb(void)
{
/* First make sure the hypervisor talks a supported protocol. */
if (!sev_es_negotiate_protocol())
return false;
+ /* If SNP is active, make sure that hypervisor supports the feature. */
+ if (sev_feature_enabled(SEV_SNP) && !(sev_hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
/*
* Clear the boot_ghcb. The first exception comes in before the bss
* section is cleared.
@@ -1476,7 +1480,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
enum es_result result;
/* Do initial setup or terminate the guest */
- if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
+ if (unlikely(!boot_ghcb && !setup_ghcb()))
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
vc_ghcb_invalidate(boot_ghcb);
--
2.17.1
On Fri, Aug 20, 2021 at 10:19:02AM -0500, Brijesh Singh wrote:
> Version 2 of GHCB specification introduced advertisement of a features
> that are supported by the hypervisor. Add support to query the HV
> features on boot.
>
> Version 2 of GHCB specification adds several new NAEs, most of them are
> optional except the hypervisor feature. Now that hypervisor feature NAE
> is implemented, so bump the GHCB maximum support protocol version.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 2 ++
> arch/x86/include/asm/sev-common.h | 3 +++
> arch/x86/include/asm/sev.h | 2 +-
> arch/x86/include/uapi/asm/svm.h | 2 ++
> arch/x86/kernel/sev-shared.c | 23 +++++++++++++++++++++++
> 5 files changed, 31 insertions(+), 1 deletion(-)
I think you can simplify more.
The HV features are read twice - once in the decompressor stub and again
in kernel proper - but I guess that's not such a big deal.
Also, sev_hv_features can be static.
Diff ontop:
---
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index fb857f2e72cb..df14291d65de 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,7 +26,6 @@ enum sev_feature_type {
extern u64 sme_me_mask;
extern u64 sev_status;
-extern u64 sev_hv_features;
void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
unsigned long decrypted_kernel_vaddr,
@@ -67,7 +66,6 @@ bool sev_feature_enabled(unsigned int feature_type);
#else /* !CONFIG_AMD_MEM_ENCRYPT */
#define sme_me_mask 0ULL
-#define sev_hv_features 0ULL
static inline void __init sme_early_encrypt(resource_size_t paddr,
unsigned long size) { }
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 8bd67087d79e..d657c2c5a1ee 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -24,7 +24,7 @@
static u16 __ro_after_init ghcb_version;
/* Bitmap of SEV features supported by the hypervisor */
-u64 __ro_after_init sev_hv_features = 0;
+static u64 __ro_after_init sev_hv_features;
static bool __init sev_es_check_cpu_features(void)
{
@@ -51,10 +51,18 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}
+/*
+ * The hypervisor features are available from GHCB version 2 onward.
+ */
static bool get_hv_features(void)
{
u64 val;
+ sev_hv_features = 0;
+
+ if (ghcb_version < 2)
+ return false;
+
sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
VMGEXIT();
@@ -85,8 +93,7 @@ static bool sev_es_negotiate_protocol(void)
ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
- /* The hypervisor features are available from version 2 onward. */
- if (ghcb_version >= 2 && !get_hv_features())
+ if (!get_hv_features())
return false;
return true;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:06AM -0500, Brijesh Singh wrote:
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index d426c30ae7b4..1cd8ce838af8 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -57,6 +57,26 @@
> #define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
> #define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
>
> +/* SNP Page State Change */
Let's make it very clear here that those cmd numbers below are actually
part of the protocol and not randomly chosen:
/*
* ...
*
* 0x014 – SNP Page State Change Request
*
* GHCBData[55:52] – Page operation:
* 0x0001 – Page assignment, Private
* 0x0002 – Page assignment, Shared
*/
> +enum psc_op {
> + SNP_PAGE_STATE_PRIVATE = 1,
> + SNP_PAGE_STATE_SHARED,
> +};
> +
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:08AM -0500, Brijesh Singh wrote:
> The SEV-SNP guest is required to perform GHCB GPA registration. This is
> because the hypervisor may prefer that a guest use a consistent and/or
> specific GPA for the GHCB associated with a vCPU. For more information,
> see the GHCB specification section GHCB GPA Registration.
>
> During the boot, init_ghcb() allocates a per-cpu GHCB page. On very first
> VC exception, the exception handler switch to using the per-cpu GHCB page
> allocated during the init_ghcb(). The GHCB page must be registered in
> the current vcpu context.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev-internal.h | 12 ++++++++++++
> arch/x86/kernel/sev.c | 28 ++++++++++++++++++++++++++++
> 2 files changed, 40 insertions(+)
> create mode 100644 arch/x86/kernel/sev-internal.h
>
> diff --git a/arch/x86/kernel/sev-internal.h b/arch/x86/kernel/sev-internal.h
> new file mode 100644
> index 000000000000..0fb7324803b4
> --- /dev/null
> +++ b/arch/x86/kernel/sev-internal.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Forward declarations for sev-shared.c
> + *
> + * Author: Brijesh Singh <[email protected]>
> + */
> +
> +#ifndef __X86_SEV_INTERNAL_H__
> +
> +static void snp_register_ghcb_early(unsigned long paddr);
> +
> +#endif /* __X86_SEV_INTERNAL_H__ */
I believe you don't need that header if you move __sev_get_ghcb()
and snp_register_ghcb() under the #include "sev-shared.c" so that
snp_register_ghcb_early() is visible by then.
diff --git a/arch/x86/kernel/sev-internal.h b/arch/x86/kernel/sev-internal.h
deleted file mode 100644
index 0fb7324803b4..000000000000
--- a/arch/x86/kernel/sev-internal.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Forward declarations for sev-shared.c
- *
- * Author: Brijesh Singh <[email protected]>
- */
-
-#ifndef __X86_SEV_INTERNAL_H__
-
-static void snp_register_ghcb_early(unsigned long paddr);
-
-#endif /* __X86_SEV_INTERNAL_H__ */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 9ab541b893c2..0ec0602e4bed 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -31,8 +31,6 @@
#include <asm/smp.h>
#include <asm/cpu.h>
-#include "sev-internal.h"
-
#define DR7_RESET_VALUE 0x400
/* For early boot hypervisor communication in SEV-ES enabled guests */
@@ -200,69 +198,6 @@ void noinstr __sev_es_ist_exit(void)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
}
-static void snp_register_ghcb(struct sev_es_runtime_data *data, unsigned long paddr)
-{
- if (data->snp_ghcb_registered)
- return;
-
- snp_register_ghcb_early(paddr);
-
- data->snp_ghcb_registered = true;
-}
-
-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
-{
- struct sev_es_runtime_data *data;
- struct ghcb *ghcb;
-
- WARN_ON(!irqs_disabled());
-
- data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;
-
- if (unlikely(data->ghcb_active)) {
- /* GHCB is already in use - save its contents */
-
- if (unlikely(data->backup_ghcb_active)) {
- /*
- * Backup-GHCB is also already in use. There is no way
- * to continue here so just kill the machine. To make
- * panic() work, mark GHCBs inactive so that messages
- * can be printed out.
- */
- data->ghcb_active = false;
- data->backup_ghcb_active = false;
-
- instrumentation_begin();
- panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
- instrumentation_end();
- }
-
- /* Mark backup_ghcb active before writing to it */
- data->backup_ghcb_active = true;
-
- state->ghcb = &data->backup_ghcb;
-
- /* Backup GHCB content */
- *state->ghcb = *ghcb;
- } else {
- state->ghcb = NULL;
- data->ghcb_active = true;
- }
-
- /* SEV-SNP guest requires that GHCB must be registered. */
- if (sev_feature_enabled(SEV_SNP))
- snp_register_ghcb(data, __pa(ghcb));
-
- return ghcb;
-}
-
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
@@ -518,6 +453,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"
+static void snp_register_ghcb(struct sev_es_runtime_data *data, unsigned long paddr)
+{
+ if (data->snp_ghcb_registered)
+ return;
+
+ snp_register_ghcb_early(paddr);
+
+ data->snp_ghcb_registered = true;
+}
+
+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (unlikely(data->ghcb_active)) {
+ /* GHCB is already in use - save its contents */
+
+ if (unlikely(data->backup_ghcb_active)) {
+ /*
+ * Backup-GHCB is also already in use. There is no way
+ * to continue here so just kill the machine. To make
+ * panic() work, mark GHCBs inactive so that messages
+ * can be printed out.
+ */
+ data->ghcb_active = false;
+ data->backup_ghcb_active = false;
+
+ instrumentation_begin();
+ panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
+ instrumentation_end();
+ }
+
+ /* Mark backup_ghcb active before writing to it */
+ data->backup_ghcb_active = true;
+
+ state->ghcb = &data->backup_ghcb;
+
+ /* Backup GHCB content */
+ *state->ghcb = *ghcb;
+ } else {
+ state->ghcb = NULL;
+ data->ghcb_active = true;
+ }
+
+ /* SEV-SNP guest requires that GHCB must be registered. */
+ if (sev_feature_enabled(SEV_SNP))
+ snp_register_ghcb(data, __pa(ghcb));
+
+ return ghcb;
+}
+
static noinstr void __sev_put_ghcb(struct ghcb_state *state)
{
struct sev_es_runtime_data *data;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/23/21 4:47 AM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:02AM -0500, Brijesh Singh wrote:
>> Version 2 of GHCB specification introduced advertisement of a features
>> that are supported by the hypervisor. Add support to query the HV
>> features on boot.
>>
>> Version 2 of GHCB specification adds several new NAEs, most of them are
>> optional except the hypervisor feature. Now that hypervisor feature NAE
>> is implemented, so bump the GHCB maximum support protocol version.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/include/asm/mem_encrypt.h | 2 ++
>> arch/x86/include/asm/sev-common.h | 3 +++
>> arch/x86/include/asm/sev.h | 2 +-
>> arch/x86/include/uapi/asm/svm.h | 2 ++
>> arch/x86/kernel/sev-shared.c | 23 +++++++++++++++++++++++
>> 5 files changed, 31 insertions(+), 1 deletion(-)
>
> I think you can simplify more.
>
> The HV features are read twice - once in the decompressor stub and again
> in kernel proper - but I guess that's not such a big deal.
>
> Also, sev_hv_features can be static.
>
> Diff ontop:
>
sev_hv_features is also referenced during AP creation. By caching
the value in sev-shared.c and exporting it to others, we wanted to
minimize VMGEXITs during AP creation.
If we go with your patch below, then we will need to cache
sev_hv_features in sev.c so that it can later be used by the AP
creation code (see patch #22).
thanks
On 8/23/21 1:25 PM, Brijesh Singh wrote:
>
>
> On 8/23/21 4:47 AM, Borislav Petkov wrote:
>> On Fri, Aug 20, 2021 at 10:19:02AM -0500, Brijesh Singh wrote:
>>> Version 2 of GHCB specification introduced advertisement of a features
>>> that are supported by the hypervisor. Add support to query the HV
>>> features on boot.
>>>
>>> Version 2 of GHCB specification adds several new NAEs, most of them are
>>> optional except the hypervisor feature. Now that hypervisor feature NAE
>>> is implemented, so bump the GHCB maximum support protocol version.
>>>
>>> Signed-off-by: Brijesh Singh <[email protected]>
>>> ---
>>> arch/x86/include/asm/mem_encrypt.h | 2 ++
>>> arch/x86/include/asm/sev-common.h | 3 +++
>>> arch/x86/include/asm/sev.h | 2 +-
>>> arch/x86/include/uapi/asm/svm.h | 2 ++
>>> arch/x86/kernel/sev-shared.c | 23 +++++++++++++++++++++++
>>> 5 files changed, 31 insertions(+), 1 deletion(-)
>>
>> I think you can simplify more.
>>
>> The HV features are read twice - once in the decompressor stub and again
>> in kernel proper - but I guess that's not such a big deal.
>>
>> Also, sev_hv_features can be static.
>>
>> Diff ontop:
>>
>
> The sev_hv_features is also referred during the AP creation. By caching
> the value in sev-shared.c and exporting it to others, we wanted to
> minimize VMGEXITs during the AP creation.
>
> If we go with your patch below, then we will need to cache the
> sev_hv_features in sev.c, so that it can be later used by the AP
> creation code (see patch#22).
>
Let me take that back, I didn't realize that sev.c includes
sev-shared.c. So your patch will work fine. Sorry about the noise.
On 8/23/21 9:16 AM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:06AM -0500, Brijesh Singh wrote:
>> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
>> index d426c30ae7b4..1cd8ce838af8 100644
>> --- a/arch/x86/include/asm/sev-common.h
>> +++ b/arch/x86/include/asm/sev-common.h
>> @@ -57,6 +57,26 @@
>> #define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
>> #define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
>>
>> +/* SNP Page State Change */
>
> Let's make it very clear here that those cmd numbers below are actually
> part of the protocol and not randomly chosen:
>
> /*
> * ...
> *
> * 0x014 – SNP Page State Change Request
> *
> * GHCBData[55:52] – Page operation:
> * 0x0001 – Page assignment, Private
> * 0x0002 – Page assignment, Shared
> */
>
Noted.
thanks
On 8/23/21 12:35 PM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:08AM -0500, Brijesh Singh wrote:
>> The SEV-SNP guest is required to perform GHCB GPA registration. This is
>> because the hypervisor may prefer that a guest use a consistent and/or
>> specific GPA for the GHCB associated with a vCPU. For more information,
>> see the GHCB specification section GHCB GPA Registration.
>>
>> During the boot, init_ghcb() allocates a per-cpu GHCB page. On very first
>> VC exception, the exception handler switch to using the per-cpu GHCB page
>> allocated during the init_ghcb(). The GHCB page must be registered in
>> the current vcpu context.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/kernel/sev-internal.h | 12 ++++++++++++
>> arch/x86/kernel/sev.c | 28 ++++++++++++++++++++++++++++
>> 2 files changed, 40 insertions(+)
>> create mode 100644 arch/x86/kernel/sev-internal.h
>>
>> diff --git a/arch/x86/kernel/sev-internal.h b/arch/x86/kernel/sev-internal.h
>> new file mode 100644
>> index 000000000000..0fb7324803b4
>> --- /dev/null
>> +++ b/arch/x86/kernel/sev-internal.h
>> @@ -0,0 +1,12 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Forward declarations for sev-shared.c
>> + *
>> + * Author: Brijesh Singh <[email protected]>
>> + */
>> +
>> +#ifndef __X86_SEV_INTERNAL_H__
>> +
>> +static void snp_register_ghcb_early(unsigned long paddr);
>> +
>> +#endif /* __X86_SEV_INTERNAL_H__ */
>
> I believe you don't need that header if you move __sev_get_ghcb()
> and snp_register_ghcb() under the #include "sev-shared.c" so that
> snp_register_ghcb_early() is visible by then.
>
thanks, I will merge this in next version.
On Mon, Aug 23, 2021 at 01:56:06PM -0500, Brijesh Singh wrote:
> thanks, I will merge this in next version.
Thx.
One more thing I stumbled upon while staring at this, see below. Can you
add it to your set or should I simply apply it now?
Thx.
---
From: Borislav Petkov <[email protected]>
Date: Mon, 23 Aug 2021 20:01:35 +0200
Subject: [PATCH] x86/sev: Remove do_early_exception() forward declarations
There's a perfectly fine prototype in the asm/setup.h header. Use it.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/sev.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a6895e440bc3..700ef31d32f8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -26,6 +26,7 @@
#include <asm/fpu/internal.h>
#include <asm/processor.h>
#include <asm/realmode.h>
+#include <asm/setup.h>
#include <asm/traps.h>
#include <asm/svm.h>
#include <asm/smp.h>
@@ -96,9 +97,6 @@ struct ghcb_state {
static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
-/* Needed in vc_early_forward_exception */
-void do_early_exception(struct pt_regs *regs, int trapnr);
-
static void __init setup_vc_stacks(int cpu)
{
struct sev_es_runtime_data *data;
@@ -240,9 +238,6 @@ static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
return ghcb;
}
-/* Needed in vc_early_forward_exception */
-void do_early_exception(struct pt_regs *regs, int trapnr);
-
static inline u64 sev_es_rd_ghcb_msr(void)
{
return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
--
2.29.2
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/23/21 2:45 PM, Borislav Petkov wrote:
> On Mon, Aug 23, 2021 at 01:56:06PM -0500, Brijesh Singh wrote:
>> Thanks, I will merge this in the next version.
>
> Thx.
>
> One more thing I stumbled upon while staring at this, see below. Can you
> add it to your set or should I simply apply it now?
>
I can include it in my series. thanks
On Fri, Aug 20, 2021 at 10:19:12AM -0500, Brijesh Singh wrote:
> + while (hdr->cur_entry <= hdr->end_entry) {
> + ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
> +
> + ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
> +
> + /*
> + * Page State Change VMGEXIT can pass error code through
> + * exit_info_2.
> + */
> + if (WARN(ret || ghcb->save.sw_exit_info_2,
> + "SEV-SNP: PSC failed ret=%d exit_info_2=%llx\n",
> + ret, ghcb->save.sw_exit_info_2)) {
> + ret = 1;
> + goto out;
> + }
> +
> + /*
> + * Sanity check that entry processing is not going backward.
> + * This will happen only if hypervisor is tricking us.
> + */
> + if (WARN(hdr->end_entry > end_entry || cur_entry > hdr->cur_entry,
> + "SEV-SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
I really meant putting the beginning of that string at the very first
position on the line:
if (WARN(hdr->end_entry > end_entry || cur_entry > hdr->cur_entry,
"SEV-SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
end_entry, hdr->end_entry, cur_entry, hdr->cur_entry)) {
Exactly like this!
...
> +static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
> +{
> + unsigned long vaddr_end, next_vaddr;
> + struct snp_psc_desc *desc;
> +
> + vaddr = vaddr & PAGE_MASK;
> + vaddr_end = vaddr + (npages << PAGE_SHIFT);
> +
> + desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
And again, from previous review:
kzalloc() so that you don't have to memset() later in
__set_page_state().
> + if (!desc)
> + panic("SEV-SNP: failed to alloc memory for PSC descriptor\n");
"allocate" fits just fine too.
> +
> + while (vaddr < vaddr_end) {
> + /*
> + * Calculate the last vaddr that can be fit in one
> + * struct snp_psc_desc.
> + */
> + next_vaddr = min_t(unsigned long, vaddr_end,
> + (VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);
> +
> + __set_page_state(desc, vaddr, next_vaddr, op);
> +
> + vaddr = next_vaddr;
> + }
> +
> + kfree(desc);
> +}
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Aug 25, 2021 at 08:54:31AM -0500, Brijesh Singh wrote:
> I replied to your previous comment. Depending on the npages value,
> __set_page_state() will be called multiple times, and on each call it
> needs to clear desc before populating it.
Ah, now I missed it, sorry.
> So, I do not see a strong reason to use kzalloc() during the desc
> allocation.
Yeah, then you don't need it.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/25/21 6:06 AM, Borislav Petkov wrote:
>
> I really meant putting the beginning of that string at the very first
> position on the line:
>
> if (WARN(hdr->end_entry > end_entry || cur_entry > hdr->cur_entry,
> "SEV-SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
> end_entry, hdr->end_entry, cur_entry, hdr->cur_entry)) {
>
> Exactly like this!
>
Noted.
> ...
>
>> +static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
>> +{
>> + unsigned long vaddr_end, next_vaddr;
>> + struct snp_psc_desc *desc;
>> +
>> + vaddr = vaddr & PAGE_MASK;
>> + vaddr_end = vaddr + (npages << PAGE_SHIFT);
>> +
>> + desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
>
> And again, from previous review:
>
> kzalloc() so that you don't have to memset() later in
> __set_page_state().
>
I replied to your previous comment. Depending on the npages value,
__set_page_state() will be called multiple times, and on each call it
needs to clear desc before populating it. So, I do not see a strong
reason to use kzalloc() during the desc allocation. I thought you were
okay with that explanation.
>> + if (!desc)
>> + panic("SEV-SNP: failed to alloc memory for PSC descriptor\n");
>
> "allocate" fits just fine too.
>
Noted.
thanks
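For reference, the pattern described above in rough sketch form -- the
descriptor/entry field layout here is an assumption based on the GHCB spec,
not a quote of the actual patch:

static void __set_page_state(struct snp_psc_desc *desc, unsigned long vaddr,
                             unsigned long vaddr_end, int op)
{
        struct psc_hdr *hdr = &desc->hdr;
        int i = 0;

        /*
         * Cleared on every call, which is why a kzalloc() in the caller
         * would only duplicate work.
         */
        memset(desc, 0, sizeof(*desc));

        while (vaddr < vaddr_end && i < VMGEXIT_PSC_MAX_ENTRY) {
                desc->entries[i].gfn = __pa(vaddr) >> PAGE_SHIFT;
                desc->entries[i].operation = op;
                vaddr += PAGE_SIZE;
                i++;
        }

        hdr->cur_entry = 0;
        hdr->end_entry = i - 1;

        /* ... issue the Page State Change VMGEXIT for this descriptor ... */
}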
On Fri, Aug 20, 2021 at 10:19:18AM -0500, Brijesh Singh wrote:
> void __head startup_64_setup_env(unsigned long physbase)
> {
> + u64 gs_area = (u64)fixup_pointer(startup_gs_area, physbase);
> +
This breaks as soon as the compiler decides that startup_64_setup_env()
needs stack protection too.
And startup_gs_area is also not needed; there is already initial_gs for
that.
What you need is something along these lines (untested):
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8b3ebd2bb85..3c7c59bc9903 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -65,6 +65,16 @@ SYM_CODE_START_NOALIGN(startup_64)
leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp
leaq _text(%rip), %rdi
+
+ movl $MSR_GS_BASE, %ecx
+ movq initial_gs(%rip), %rax
+ movq $_text, %rdx
+ subq %rdx, %rax
+ addq %rdi, %rax
+ movq %rax, %rdx
+ shrq $32, %rdx
+ wrmsr
+
pushq %rsi
call startup_64_setup_env
popq %rsi
It loads the initial_gs pointer, applies the fixup on it and loads it
into MSR_GS_BASE.
On Fri, Aug 20, 2021 at 10:19:21AM -0500, Brijesh Singh wrote:
> From: Michael Roth <[email protected]>
>
> Future patches for SEV-SNP-validated CPUID will also require early
> parsing of the EFI configuration. Move the related code into a set of
> helpers that can be re-used for that purpose.
>
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/boot/compressed/acpi.c | 113 +++++--------------
> arch/x86/boot/compressed/efi.c | 178 ++++++++++++++++++++++++++++++
> arch/x86/boot/compressed/misc.h | 43 ++++++++
> 4 files changed, 251 insertions(+), 84 deletions(-)
> create mode 100644 arch/x86/boot/compressed/efi.c
Ok, better, but this patch needs splitting. And I have a good idea how:
in at least three patches:
1. Add efi_get_system_table() and use it
2. Add efi_get_conf_table() and use it
3. Add efi_find_vendor_table() and use it
This will facilitate review immensely.
Also, here's a diff on top showing what to do style-wise.
- change how you look for the preferred vendor table along with commenting what you do
- shorten variable names so that you don't have so many line breaks.
Thx.
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 3a3f997d7210..c22b21e94a95 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -20,27 +20,29 @@
*/
struct mem_vector immovable_mem[MAX_NUMNODES*2];
-/*
- * Search EFI system tables for RSDP. If both ACPI_20_TABLE_GUID and
- * ACPI_TABLE_GUID are found, take the former, which has more features.
- */
static acpi_physical_address
-__efi_get_rsdp_addr(unsigned long config_table_pa,
- unsigned int config_table_len, bool efi_64)
+__efi_get_rsdp_addr(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len, bool efi_64)
{
acpi_physical_address rsdp_addr = 0;
+
#ifdef CONFIG_EFI
int ret;
- ret = efi_find_vendor_table(config_table_pa, config_table_len,
- ACPI_20_TABLE_GUID, efi_64,
- (unsigned long *)&rsdp_addr);
- if (ret == -ENOENT)
- ret = efi_find_vendor_table(config_table_pa, config_table_len,
- ACPI_TABLE_GUID, efi_64,
- (unsigned long *)&rsdp_addr);
+ /*
+ * Search EFI system tables for RSDP. Preferred is ACPI_20_TABLE_GUID to
+ * ACPI_TABLE_GUID because it has more features.
+ */
+ ret = efi_find_vendor_table(cfg_tbl_pa, cfg_tbl_len, ACPI_20_TABLE_GUID,
+ efi_64, (unsigned long *)&rsdp_addr);
+ if (!ret)
+ return rsdp_addr;
+
+ /* No ACPI_20_TABLE_GUID found, fallback to ACPI_TABLE_GUID. */
+ ret = efi_find_vendor_table(cfg_tbl_pa, cfg_tbl_len, ACPI_TABLE_GUID,
+ efi_64, (unsigned long *)&rsdp_addr);
if (ret)
debug_putstr("Error getting RSDP address.\n");
+
#endif
return rsdp_addr;
}
@@ -100,18 +102,16 @@ static acpi_physical_address kexec_get_rsdp_addr(void) { return 0; }
static acpi_physical_address efi_get_rsdp_addr(void)
{
#ifdef CONFIG_EFI
- unsigned long config_table_pa = 0;
- unsigned int config_table_len;
+ unsigned long cfg_tbl_pa = 0;
+ unsigned int cfg_tbl_len;
bool efi_64;
int ret;
- ret = efi_get_conf_table(boot_params, &config_table_pa,
- &config_table_len, &efi_64);
- if (ret || !config_table_pa)
+ ret = efi_get_conf_table(boot_params, &cfg_tbl_pa, &cfg_tbl_len, &efi_64);
+ if (ret || !cfg_tbl_pa)
error("EFI config table not found.");
- return __efi_get_rsdp_addr(config_table_pa, config_table_len,
- efi_64);
+ return __efi_get_rsdp_addr(cfg_tbl_pa, cfg_tbl_len, efi_64);
#else
return 0;
#endif
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
index 16ff5cb9a1fb..7ed31b943c04 100644
--- a/arch/x86/boot/compressed/efi.c
+++ b/arch/x86/boot/compressed/efi.c
@@ -12,14 +12,14 @@
#include <asm/efi.h>
/* Get vendor table address/guid from EFI config table at the given index */
-static int get_vendor_table(void *conf_table, unsigned int idx,
+static int get_vendor_table(void *cfg_tbl, unsigned int idx,
unsigned long *vendor_table_pa,
efi_guid_t *vendor_table_guid,
bool efi_64)
{
if (efi_64) {
efi_config_table_64_t *table_entry =
- (efi_config_table_64_t *)conf_table + idx;
+ (efi_config_table_64_t *)cfg_tbl + idx;
if (!IS_ENABLED(CONFIG_X86_64) &&
table_entry->table >> 32) {
@@ -32,7 +32,7 @@ static int get_vendor_table(void *conf_table, unsigned int idx,
} else {
efi_config_table_32_t *table_entry =
- (efi_config_table_32_t *)conf_table + idx;
+ (efi_config_table_32_t *)cfg_tbl + idx;
*vendor_table_pa = table_entry->table;
*vendor_table_guid = table_entry->guid;
@@ -45,27 +45,25 @@ static int get_vendor_table(void *conf_table, unsigned int idx,
* Given EFI config table, search it for the physical address of the vendor
* table associated with GUID.
*
- * @conf_table: pointer to EFI configuration table
- * @conf_table_len: number of entries in EFI configuration table
+ * @cfg_tbl: pointer to EFI configuration table
+ * @cfg_tbl_len: number of entries in EFI configuration table
* @guid: GUID of vendor table
* @efi_64: true if using 64-bit EFI
* @vendor_table_pa: location to store physical address of vendor table
*
* Returns 0 on success. On error, return params are left unchanged.
*/
-int
-efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
- efi_guid_t guid, bool efi_64,
- unsigned long *vendor_table_pa)
+int efi_find_vendor_table(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len,
+ efi_guid_t guid, bool efi_64, unsigned long *vendor_table_pa)
{
unsigned int i;
- for (i = 0; i < conf_table_len; i++) {
+ for (i = 0; i < cfg_tbl_len; i++) {
unsigned long vendor_table_pa_tmp;
efi_guid_t vendor_table_guid;
int ret;
- if (get_vendor_table((void *)conf_table_pa, i,
+ if (get_vendor_table((void *)cfg_tbl_pa, i,
&vendor_table_pa_tmp,
&vendor_table_guid, efi_64))
return -EINVAL;
@@ -88,9 +86,8 @@ efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
*
* Returns 0 on success. On error, return params are left unchanged.
*/
-int
-efi_get_system_table(struct boot_params *boot_params,
- unsigned long *sys_table_pa, bool *is_efi_64)
+int efi_get_system_table(struct boot_params *boot_params, unsigned long *sys_table_pa,
+ bool *is_efi_64)
{
unsigned long sys_table;
struct efi_info *ei;
@@ -137,22 +134,19 @@ efi_get_system_table(struct boot_params *boot_params,
* address EFI configuration table.
*
* @boot_params: pointer to boot_params
- * @conf_table_pa: location to store physical address of config table
- * @conf_table_len: location to store number of config table entries
+ * @cfg_tbl_pa: location to store physical address of config table
+ * @cfg_tbl_len: location to store number of config table entries
* @is_efi_64: location to store whether using 64-bit EFI or not
*
* Returns 0 on success. On error, return params are left unchanged.
*/
-int
-efi_get_conf_table(struct boot_params *boot_params,
- unsigned long *conf_table_pa,
- unsigned int *conf_table_len,
- bool *is_efi_64)
+int efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_pa,
+ unsigned int *cfg_tbl_len, bool *is_efi_64)
{
unsigned long sys_table_pa = 0;
int ret;
- if (!conf_table_pa || !conf_table_len || !is_efi_64)
+ if (!cfg_tbl_pa || !cfg_tbl_len || !is_efi_64)
return -EINVAL;
ret = efi_get_system_table(boot_params, &sys_table_pa, is_efi_64);
@@ -164,14 +158,14 @@ efi_get_conf_table(struct boot_params *boot_params,
efi_system_table_64_t *stbl =
(efi_system_table_64_t *)sys_table_pa;
- *conf_table_pa = stbl->tables;
- *conf_table_len = stbl->nr_tables;
+ *cfg_tbl_pa = stbl->tables;
+ *cfg_tbl_len = stbl->nr_tables;
} else {
efi_system_table_32_t *stbl =
(efi_system_table_32_t *)sys_table_pa;
- *conf_table_pa = stbl->tables;
- *conf_table_len = stbl->nr_tables;
+ *cfg_tbl_pa = stbl->tables;
+ *cfg_tbl_len = stbl->nr_tables;
}
return 0;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Aug 25, 2021 at 04:29:13PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:18AM -0500, Brijesh Singh wrote:
> > From: Michael Roth <[email protected]>
> >
> > As of commit 103a4908ad4d ("x86/head/64: Disable stack protection for
> > head$(BITS).o") kernel/head64.c is compiled with -fno-stack-protector
> > to allow a call to set_bringup_idt_handler(), which would otherwise
> > have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG. While
> > sufficient for that case, this will still cause issues if we attempt to
> > call out to any external functions that were compiled with stack
> > protection enabled that in-turn make stack-protected calls, or if the
> > exception handlers set up by set_bringup_idt_handler() make calls to
> > stack-protected functions.
> >
> > Subsequent patches for SEV-SNP CPUID validation support will introduce
> > both such cases. Attempting to disable stack protection for everything
> > in scope to address that is prohibitive since much of the code, like
> > SEV-ES #VC handler, is shared code that remains in use after boot and
> > could benefit from having stack protection enabled. Attempting to inline
> > calls is brittle and can quickly balloon out to library/helper code
> > where that's not really an option.
> >
> > Instead, set up %gs to point a buffer that stack protector can use for
> > canary values when needed.
> >
> > In doing so, it's likely we can stop using -no-stack-protector for
> > head64.c, but that hasn't been tested yet, and head32.c would need a
> > similar solution to be safe, so that is left as a potential follow-up.
>
> That...
Argh! I had this fixed up but I think it got clobbered in the patch
shuffle. I'll make sure to fix this, and remember to actually test
without CONFIG_STACKPROTECTOR this time. Sorry for the screw-up.
>
> > Signed-off-by: Michael Roth <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/kernel/Makefile | 2 +-
> > arch/x86/kernel/head64.c | 20 ++++++++++++++++++++
> > 2 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> > index 3e625c61f008..5abdfd0dbbc3 100644
> > --- a/arch/x86/kernel/Makefile
> > +++ b/arch/x86/kernel/Makefile
> > @@ -46,7 +46,7 @@ endif
> > # non-deterministic coverage.
> > KCOV_INSTRUMENT := n
> >
> > -CFLAGS_head$(BITS).o += -fno-stack-protector
> > +CFLAGS_head32.o += -fno-stack-protector
>
> ... and that needs to be taken care of too.
I didn't realize that the 32-bit path was something you were suggesting
to have added in this patch, but I'll take a look at that as well.
On Fri, Aug 20, 2021 at 10:19:18AM -0500, Brijesh Singh wrote:
> From: Michael Roth <[email protected]>
>
> As of commit 103a4908ad4d ("x86/head/64: Disable stack protection for
> head$(BITS).o") kernel/head64.c is compiled with -fno-stack-protector
> to allow a call to set_bringup_idt_handler(), which would otherwise
> have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG. While
> sufficient for that case, this will still cause issues if we attempt to
> call out to any external functions that were compiled with stack
> protection enabled that in-turn make stack-protected calls, or if the
> exception handlers set up by set_bringup_idt_handler() make calls to
> stack-protected functions.
>
> Subsequent patches for SEV-SNP CPUID validation support will introduce
> both such cases. Attempting to disable stack protection for everything
> in scope to address that is prohibitive since much of the code, like
> SEV-ES #VC handler, is shared code that remains in use after boot and
> could benefit from having stack protection enabled. Attempting to inline
> calls is brittle and can quickly balloon out to library/helper code
> where that's not really an option.
>
> Instead, set up %gs to point a buffer that stack protector can use for
> canary values when needed.
>
> In doing so, it's likely we can stop using -no-stack-protector for
> head64.c, but that hasn't been tested yet, and head32.c would need a
> similar solution to be safe, so that is left as a potential follow-up.
That...
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/Makefile | 2 +-
> arch/x86/kernel/head64.c | 20 ++++++++++++++++++++
> 2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 3e625c61f008..5abdfd0dbbc3 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -46,7 +46,7 @@ endif
> # non-deterministic coverage.
> KCOV_INSTRUMENT := n
>
> -CFLAGS_head$(BITS).o += -fno-stack-protector
> +CFLAGS_head32.o += -fno-stack-protector
... and that needs to be taken care of too.
> CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
>
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index a1711c4594fa..f1b76a54c84e 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -74,6 +74,11 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] = {
> [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
> };
>
> +/* For use by stack protector code before switching to virtual addresses */
> +#if CONFIG_STACKPROTECTOR
That's "#ifdef". Below too.
Did you even build this with CONFIG_STACKPROTECTOR disabled?
Because if you did, you would've seen this:
arch/x86/kernel/head64.c:78:5: warning: "CONFIG_STACKPROTECTOR" is not defined, evaluates to 0 [-Wundef]
78 | #if CONFIG_STACKPROTECTOR
| ^~~~~~~~~~~~~~~~~~~~~
arch/x86/kernel/head64.c: In function ‘startup_64_setup_env’:
arch/x86/kernel/head64.c:613:35: error: ‘startup_gs_area’ undeclared (first use in this function)
613 | u64 gs_area = (u64)fixup_pointer(startup_gs_area, physbase);
| ^~~~~~~~~~~~~~~
arch/x86/kernel/head64.c:613:35: note: each undeclared identifier is reported only once for each function it appears in
arch/x86/kernel/head64.c:632:5: warning: "CONFIG_STACKPROTECTOR" is not defined, evaluates to 0 [-Wundef]
632 | #if CONFIG_STACKPROTECTOR
| ^~~~~~~~~~~~~~~~~~~~~
arch/x86/kernel/head64.c:613:6: warning: unused variable ‘gs_area’ [-Wunused-variable]
613 | u64 gs_area = (u64)fixup_pointer(startup_gs_area, physbase);
| ^~~~~~~
make[2]: *** [scripts/Makefile.build:271: arch/x86/kernel/head64.o] Error 1
make[1]: *** [scripts/Makefile.build:514: arch/x86/kernel] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1851: arch/x86] Error 2
make: *** Waiting for unfinished jobs....
> +static char startup_gs_area[64];
> +#endif
> +
> /*
> * Address needs to be set at runtime because it references the startup_gdt
> * while the kernel still uses a direct mapping.
> @@ -605,6 +610,8 @@ void early_setup_idt(void)
> */
> void __head startup_64_setup_env(unsigned long physbase)
> {
> + u64 gs_area = (u64)fixup_pointer(startup_gs_area, physbase);
> +
> /* Load GDT */
> startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
> native_load_gdt(&startup_gdt_descr);
> @@ -614,5 +621,18 @@ void __head startup_64_setup_env(unsigned long physbase)
> "movl %%eax, %%ss\n"
> "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
>
> + /*
> + * GCC stack protection needs a place to store canary values. The
> + * default is %gs:0x28, which is what the kernel currently uses.
> + * Point GS base to a buffer that can be used for this purpose.
> + * Note that newer GCCs now allow this location to be configured,
> + * so if we change from the default in the future we need to ensure
> + * that this buffer overlaps whatever address ends up being used.
> + */
> +#if CONFIG_STACKPROTECTOR
> + asm volatile("movl %%eax, %%gs\n" : : "a"(__KERNEL_DS) : "memory");
> + native_wrmsr(MSR_GS_BASE, gs_area, gs_area >> 32);
> +#endif
> +
> startup_64_load_idt(physbase);
> }
> --
> 2.17.1
>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
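For reference, the guard being asked for in the review above looks like this
(sketch only; Kconfig bool symbols are either defined or not defined at all,
so #ifdef is the right test):

#ifdef CONFIG_STACKPROTECTOR
/* For use by stack protector code before switching to virtual addresses */
static char startup_gs_area[64];
#endif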
On Wed, Aug 25, 2021 at 05:18:20PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:21AM -0500, Brijesh Singh wrote:
> > From: Michael Roth <[email protected]>
> >
> > Future patches for SEV-SNP-validated CPUID will also require early
> > parsing of the EFI configuration. Move the related code into a set of
> > helpers that can be re-used for that purpose.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/boot/compressed/acpi.c | 113 +++++--------------
> > arch/x86/boot/compressed/efi.c | 178 ++++++++++++++++++++++++++++++
> > arch/x86/boot/compressed/misc.h | 43 ++++++++
> > 4 files changed, 251 insertions(+), 84 deletions(-)
> > create mode 100644 arch/x86/boot/compressed/efi.c
>
> Ok, better, but this patch needs splitting. And I have a good idea how:
> in at least three patches:
>
> 1. Add efi_get_system_table() and use it
> 2. Add efi_get_conf_table() and use it
> 3. Add efi_find_vendor_table() and use it
>
> This will facilitate review immensely.
Ok, that makes sense.
>
> Also, here's a diff on top showing what to do style-wise.
>
> - change how you look for the preferred vendor table along with commenting what you do
> - shorten variable names so that you don't have so many line breaks.
Thanks for the suggestions, I'll incorporate those changes in the next spin as
well.
>
> Thx.
>
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index 3a3f997d7210..c22b21e94a95 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -20,27 +20,29 @@
> */
> struct mem_vector immovable_mem[MAX_NUMNODES*2];
>
> -/*
> - * Search EFI system tables for RSDP. If both ACPI_20_TABLE_GUID and
> - * ACPI_TABLE_GUID are found, take the former, which has more features.
> - */
> static acpi_physical_address
> -__efi_get_rsdp_addr(unsigned long config_table_pa,
> - unsigned int config_table_len, bool efi_64)
> +__efi_get_rsdp_addr(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len, bool efi_64)
> {
> acpi_physical_address rsdp_addr = 0;
> +
> #ifdef CONFIG_EFI
> int ret;
>
> - ret = efi_find_vendor_table(config_table_pa, config_table_len,
> - ACPI_20_TABLE_GUID, efi_64,
> - (unsigned long *)&rsdp_addr);
> - if (ret == -ENOENT)
> - ret = efi_find_vendor_table(config_table_pa, config_table_len,
> - ACPI_TABLE_GUID, efi_64,
> - (unsigned long *)&rsdp_addr);
> + /*
> + * Search EFI system tables for RSDP. Preferred is ACPI_20_TABLE_GUID to
> + * ACPI_TABLE_GUID because it has more features.
> + */
> + ret = efi_find_vendor_table(cfg_tbl_pa, cfg_tbl_len, ACPI_20_TABLE_GUID,
> + efi_64, (unsigned long *)&rsdp_addr);
> + if (!ret)
> + return rsdp_addr;
> +
> + /* No ACPI_20_TABLE_GUID found, fallback to ACPI_TABLE_GUID. */
> + ret = efi_find_vendor_table(cfg_tbl_pa, cfg_tbl_len, ACPI_TABLE_GUID,
> + efi_64, (unsigned long *)&rsdp_addr);
> if (ret)
> debug_putstr("Error getting RSDP address.\n");
> +
> #endif
> return rsdp_addr;
> }
> @@ -100,18 +102,16 @@ static acpi_physical_address kexec_get_rsdp_addr(void) { return 0; }
> static acpi_physical_address efi_get_rsdp_addr(void)
> {
> #ifdef CONFIG_EFI
> - unsigned long config_table_pa = 0;
> - unsigned int config_table_len;
> + unsigned long cfg_tbl_pa = 0;
> + unsigned int cfg_tbl_len;
> bool efi_64;
> int ret;
>
> - ret = efi_get_conf_table(boot_params, &config_table_pa,
> - &config_table_len, &efi_64);
> - if (ret || !config_table_pa)
> + ret = efi_get_conf_table(boot_params, &cfg_tbl_pa, &cfg_tbl_len, &efi_64);
> + if (ret || !cfg_tbl_pa)
> error("EFI config table not found.");
>
> - return __efi_get_rsdp_addr(config_table_pa, config_table_len,
> - efi_64);
> + return __efi_get_rsdp_addr(cfg_tbl_pa, cfg_tbl_len, efi_64);
> #else
> return 0;
> #endif
> diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
> index 16ff5cb9a1fb..7ed31b943c04 100644
> --- a/arch/x86/boot/compressed/efi.c
> +++ b/arch/x86/boot/compressed/efi.c
> @@ -12,14 +12,14 @@
> #include <asm/efi.h>
>
> /* Get vendor table address/guid from EFI config table at the given index */
> -static int get_vendor_table(void *conf_table, unsigned int idx,
> +static int get_vendor_table(void *cfg_tbl, unsigned int idx,
> unsigned long *vendor_table_pa,
> efi_guid_t *vendor_table_guid,
> bool efi_64)
> {
> if (efi_64) {
> efi_config_table_64_t *table_entry =
> - (efi_config_table_64_t *)conf_table + idx;
> + (efi_config_table_64_t *)cfg_tbl + idx;
>
> if (!IS_ENABLED(CONFIG_X86_64) &&
> table_entry->table >> 32) {
> @@ -32,7 +32,7 @@ static int get_vendor_table(void *conf_table, unsigned int idx,
>
> } else {
> efi_config_table_32_t *table_entry =
> - (efi_config_table_32_t *)conf_table + idx;
> + (efi_config_table_32_t *)cfg_tbl + idx;
>
> *vendor_table_pa = table_entry->table;
> *vendor_table_guid = table_entry->guid;
> @@ -45,27 +45,25 @@ static int get_vendor_table(void *conf_table, unsigned int idx,
> * Given EFI config table, search it for the physical address of the vendor
> * table associated with GUID.
> *
> - * @conf_table: pointer to EFI configuration table
> - * @conf_table_len: number of entries in EFI configuration table
> + * @cfg_tbl: pointer to EFI configuration table
> + * @cfg_tbl_len: number of entries in EFI configuration table
> * @guid: GUID of vendor table
> * @efi_64: true if using 64-bit EFI
> * @vendor_table_pa: location to store physical address of vendor table
> *
> * Returns 0 on success. On error, return params are left unchanged.
> */
> -int
> -efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
> - efi_guid_t guid, bool efi_64,
> - unsigned long *vendor_table_pa)
> +int efi_find_vendor_table(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len,
> + efi_guid_t guid, bool efi_64, unsigned long *vendor_table_pa)
> {
> unsigned int i;
>
> - for (i = 0; i < conf_table_len; i++) {
> + for (i = 0; i < cfg_tbl_len; i++) {
> unsigned long vendor_table_pa_tmp;
> efi_guid_t vendor_table_guid;
> int ret;
>
> - if (get_vendor_table((void *)conf_table_pa, i,
> + if (get_vendor_table((void *)cfg_tbl_pa, i,
> &vendor_table_pa_tmp,
> &vendor_table_guid, efi_64))
> return -EINVAL;
> @@ -88,9 +86,8 @@ efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
> *
> * Returns 0 on success. On error, return params are left unchanged.
> */
> -int
> -efi_get_system_table(struct boot_params *boot_params,
> - unsigned long *sys_table_pa, bool *is_efi_64)
> +int efi_get_system_table(struct boot_params *boot_params, unsigned long *sys_table_pa,
> + bool *is_efi_64)
> {
> unsigned long sys_table;
> struct efi_info *ei;
> @@ -137,22 +134,19 @@ efi_get_system_table(struct boot_params *boot_params,
> * address EFI configuration table.
> *
> * @boot_params: pointer to boot_params
> - * @conf_table_pa: location to store physical address of config table
> - * @conf_table_len: location to store number of config table entries
> + * @cfg_tbl_pa: location to store physical address of config table
> + * @cfg_tbl_len: location to store number of config table entries
> * @is_efi_64: location to store whether using 64-bit EFI or not
> *
> * Returns 0 on success. On error, return params are left unchanged.
> */
> -int
> -efi_get_conf_table(struct boot_params *boot_params,
> - unsigned long *conf_table_pa,
> - unsigned int *conf_table_len,
> - bool *is_efi_64)
> +int efi_get_conf_table(struct boot_params *boot_params, unsigned long *cfg_tbl_pa,
> + unsigned int *cfg_tbl_len, bool *is_efi_64)
> {
> unsigned long sys_table_pa = 0;
> int ret;
>
> - if (!conf_table_pa || !conf_table_len || !is_efi_64)
> + if (!cfg_tbl_pa || !cfg_tbl_len || !is_efi_64)
> return -EINVAL;
>
> ret = efi_get_system_table(boot_params, &sys_table_pa, is_efi_64);
> @@ -164,14 +158,14 @@ efi_get_conf_table(struct boot_params *boot_params,
> efi_system_table_64_t *stbl =
> (efi_system_table_64_t *)sys_table_pa;
>
> - *conf_table_pa = stbl->tables;
> - *conf_table_len = stbl->nr_tables;
> + *cfg_tbl_pa = stbl->tables;
> + *cfg_tbl_len = stbl->nr_tables;
> } else {
> efi_system_table_32_t *stbl =
> (efi_system_table_32_t *)sys_table_pa;
>
> - *conf_table_pa = stbl->tables;
> - *conf_table_len = stbl->nr_tables;
> + *cfg_tbl_pa = stbl->tables;
> + *cfg_tbl_len = stbl->nr_tables;
> }
>
> return 0;
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Aug 25, 2021 at 05:07:26PM +0200, Joerg Roedel wrote:
> On Fri, Aug 20, 2021 at 10:19:18AM -0500, Brijesh Singh wrote:
> > void __head startup_64_setup_env(unsigned long physbase)
> > {
> > + u64 gs_area = (u64)fixup_pointer(startup_gs_area, physbase);
> > +
>
> This breaks as soon as the compiler decides that startup_64_setup_env()
> needs stack protection too.
Good point.
>
> And the startup_gs_area is also not needed, there is initial_gs for
> that.
>
> What you need is something along these lines (untested):
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d8b3ebd2bb85..3c7c59bc9903 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -65,6 +65,16 @@ SYM_CODE_START_NOALIGN(startup_64)
> leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp
>
> leaq _text(%rip), %rdi
> +
> + movl $MSR_GS_BASE, %ecx
> + movq initial_gs(%rip), %rax
> + movq $_text, %rdx
> + subq %rdx, %rax
> + addq %rdi, %rax
> + movq %rax, %rdx
> + shrq $32, %rdx
> + wrmsr
> +
> pushq %rsi
> call startup_64_setup_env
> popq %rsi
>
>
> It loads the initial_gs pointer, applies the fixup on it and loads it
> into MSR_GS_BASE.
This seems to do the trick, and is probably closer to what the 32-bit
version would look like. Thanks for the suggestion!
On Wed, Aug 25, 2021 at 10:18:35AM -0500, Michael Roth wrote:
> On Wed, Aug 25, 2021 at 04:29:13PM +0200, Borislav Petkov wrote:
> > On Fri, Aug 20, 2021 at 10:19:18AM -0500, Brijesh Singh wrote:
> > > From: Michael Roth <[email protected]>
> > >
> > > As of commit 103a4908ad4d ("x86/head/64: Disable stack protection for
> > > head$(BITS).o") kernel/head64.c is compiled with -fno-stack-protector
> > > to allow a call to set_bringup_idt_handler(), which would otherwise
> > > have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG. While
> > > sufficient for that case, this will still cause issues if we attempt to
^^^
I'm tired of repeating the same review comments with you guys:
Who's "we"?
Please use passive voice in your text: no "we" or "I", etc.
Personal pronouns are ambiguous in text, especially with so many
parties/companies/etc developing the kernel so let's avoid them please.
How about you pay more attention?
> I didn't realize that the 32-bit path was something you were suggesting
> to have added in this patch, but I'll take a look at that as well.
If you're going to remove the -no-stack-protector thing for that file,
then pls remove it for both 32- and 64-bit. I.e., revert what
103a4908ad4d did.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:23AM -0500, Brijesh Singh wrote:
> From: Michael Roth <[email protected]>
>
> CPUID instructions generate a #VC exception for SEV-ES/SEV-SNP guests,
> which early handlers are currently set up to handle. In the case
> of SEV-SNP, guests can use a special location in guest memory address
> space that has been pre-populated with firmware-validated CPUID
> information to look up the relevant CPUID values rather than
> requesting them from the hypervisor via a VMGEXIT.
>
> Determine the location of the CPUID memory address in advance of any
> CPUID instructions/exceptions and, when available, use it to handle
> the CPUID lookup.
>
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/boot/compressed/efi.c | 1 +
> arch/x86/boot/compressed/head_64.S | 1 +
> arch/x86/boot/compressed/idt_64.c | 7 +-
> arch/x86/boot/compressed/misc.h | 1 +
> arch/x86/boot/compressed/sev.c | 3 +
> arch/x86/include/asm/sev-common.h | 2 +
> arch/x86/include/asm/sev.h | 3 +
> arch/x86/kernel/sev-shared.c | 374 +++++++++++++++++++++++++++++
> arch/x86/kernel/sev.c | 4 +
> 9 files changed, 394 insertions(+), 2 deletions(-)
Another huuge patch. I wonder if it can be split...
> diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
> index 16ff5cb9a1fb..a1529a230ea7 100644
> --- a/arch/x86/boot/compressed/efi.c
> +++ b/arch/x86/boot/compressed/efi.c
> @@ -176,3 +176,4 @@ efi_get_conf_table(struct boot_params *boot_params,
>
> return 0;
> }
> +
Applying: x86/compressed/64: Enable SEV-SNP-validated CPUID in #VC handler
.git/rebase-apply/patch:21: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
That looks like a stray hunk which doesn't belong.
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index a2347ded77ea..1c1658693fc9 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -441,6 +441,7 @@ SYM_CODE_START(startup_64)
> .Lon_kernel_cs:
>
> pushq %rsi
> + movq %rsi, %rdi /* real mode address */
> call load_stage1_idt
> popq %rsi
>
> diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
> index 9b93567d663a..1f6511a6625d 100644
> --- a/arch/x86/boot/compressed/idt_64.c
> +++ b/arch/x86/boot/compressed/idt_64.c
> @@ -3,6 +3,7 @@
> #include <asm/segment.h>
> #include <asm/trapnr.h>
> #include "misc.h"
> +#include <asm/sev.h>
asm/ namespaced headers should go together, before the private ones,
i.e., above the misc.h line.
> static void set_idt_entry(int vector, void (*handler)(void))
> {
> @@ -28,13 +29,15 @@ static void load_boot_idt(const struct desc_ptr *dtr)
> }
>
> /* Setup IDT before kernel jumping to .Lrelocated */
> -void load_stage1_idt(void)
> +void load_stage1_idt(void *rmode)
> {
> boot_idt_desc.address = (unsigned long)boot_idt;
>
>
> - if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
> + sev_snp_cpuid_init(rmode);
> set_idt_entry(X86_TRAP_VC, boot_stage1_vc);
> + }
>
> load_boot_idt(&boot_idt_desc);
> }
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 16b092fd7aa1..cdd328aa42c2 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -190,6 +190,7 @@ int efi_get_conf_table(struct boot_params *boot_params,
> unsigned long *conf_table_pa,
> unsigned int *conf_table_len,
> bool *is_efi_64);
> +
Another stray hunk.
> #else
> static inline int
> efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 6e8d97c280aa..910bf5cf010e 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -20,6 +20,9 @@
> #include <asm/fpu/xcr.h>
> #include <asm/ptrace.h>
> #include <asm/svm.h>
> +#include <asm/cpuid.h>
> +#include <linux/efi.h>
> +#include <linux/log2.h>
What are those includes for?
Polluting the decompressor namespace with kernel proper defines is a
real pain to untangle as it is. What do you need those for and can you
do it without them?
> #include "error.h"
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 072540dfb129..5f134c172dbf 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -148,6 +148,8 @@ struct snp_psc_desc {
> #define GHCB_TERM_PSC 1 /* Page State Change failure */
> #define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
> #define GHCB_TERM_NOT_VMPL0 3 /* SNP guest is not running at VMPL-0 */
> +#define GHCB_TERM_CPUID 4 /* CPUID-validation failure */
> +#define GHCB_TERM_CPUID_HV 5 /* CPUID failure during hypervisor fallback */
>
> #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 534fa1c4c881..c73931548346 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -11,6 +11,7 @@
> #include <linux/types.h>
> #include <asm/insn.h>
> #include <asm/sev-common.h>
> +#include <asm/bootparam.h>
>
> #define GHCB_PROTOCOL_MIN 1ULL
> #define GHCB_PROTOCOL_MAX 2ULL
> @@ -126,6 +127,7 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
> void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
> void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
> void snp_set_wakeup_secondary_cpu(void);
> +void sev_snp_cpuid_init(struct boot_params *bp);
> #else
> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> static inline void sev_es_ist_exit(void) { }
> @@ -141,6 +143,7 @@ static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz,
> static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
> static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
> static inline void snp_set_wakeup_secondary_cpu(void) { }
> +static inline void sev_snp_cpuid_init(struct boot_params *bp) { }
> #endif
>
> #endif
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index ae4556925485..651980ddbd65 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -14,6 +14,25 @@
> #define has_cpuflag(f) boot_cpu_has(f)
> #endif
>
> +struct sev_snp_cpuid_fn {
> + u32 eax_in;
> + u32 ecx_in;
> + u64 unused;
> + u64 unused2;
What are those for? Padding? Or are they spec-ed somewhere and left for
future use?
Seeing how the struct is __packed, they probably are part of a spec
definition somewhere.
Link pls.
> + u32 eax;
> + u32 ebx;
> + u32 ecx;
> + u32 edx;
> + u64 reserved;
Ditto.
Please prefix all those unused/reserved members with "__".
> +} __packed;
> +
> +struct sev_snp_cpuid_info {
> + u32 count;
> + u32 reserved1;
> + u64 reserved2;
Ditto.
> + struct sev_snp_cpuid_fn fn[0];
> +} __packed;
> +
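I.e., taking the first struct as an example, something like this (sketch
only, the field semantics stay exactly as quoted above):

struct sev_snp_cpuid_fn {
        u32 eax_in;
        u32 ecx_in;
        u64 __unused;
        u64 __unused2;
        u32 eax;
        u32 ebx;
        u32 ecx;
        u32 edx;
        u64 __reserved;
} __packed;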
> /*
> * Since feature negotiation related variables are set early in the boot
> * process they must reside in the .data section so as not to be zeroed
> @@ -26,6 +45,15 @@ static u16 __ro_after_init ghcb_version;
> /* Bitmap of SEV features supported by the hypervisor */
> u64 __ro_after_init sev_hv_features = 0;
>
> +/*
> + * These are also stored in .data section to avoid the need to re-parse
> + * boot_params and re-determine CPUID memory range when .bss is cleared.
> + */
> +static int sev_snp_cpuid_enabled __section(".data");
That will become part of prot_guest_has() or cc_platform_has() or
whatever its name is going to be.
> +static unsigned long sev_snp_cpuid_pa __section(".data");
> +static unsigned long sev_snp_cpuid_sz __section(".data");
> +static const struct sev_snp_cpuid_info *cpuid_info __section(".data");
All those: __ro_after_init?
Also, just like the ones above have a short comment explaining what they
are, add such comments for those too pls and perhaps what they're used
for.
> +
> static bool __init sev_es_check_cpu_features(void)
> {
> if (!has_cpuflag(X86_FEATURE_RDRAND)) {
> @@ -236,6 +264,219 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
> return 0;
> }
>
> +static bool sev_snp_cpuid_active(void)
> +{
> + return sev_snp_cpuid_enabled;
> +}
That too will become part of prot_guest_has() or cc_platform_has() or
whatever its name is going to be.
> +
> +static int sev_snp_cpuid_xsave_size(u64 xfeatures_en, u32 base_size,
> + u32 *xsave_size, bool compacted)
Function name needs a verb. Please audit all your patches.
> +{
> + u64 xfeatures_found = 0;
> + int i;
> +
> + *xsave_size = base_size;
Set that xsave_size only...
> +
> + for (i = 0; i < cpuid_info->count; i++) {
> + const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
> +
> + if (!(fn->eax_in == 0xd && fn->ecx_in > 1 && fn->ecx_in < 64))
> + continue;
> + if (!(xfeatures_en & (1UL << fn->ecx_in)))
> + continue;
> + if (xfeatures_found & (1UL << fn->ecx_in))
> + continue;
> +
> + xfeatures_found |= (1UL << fn->ecx_in);
For all use BIT_ULL().
> + if (compacted)
> + *xsave_size += fn->eax;
> + else
> + *xsave_size = max(*xsave_size, fn->eax + fn->ebx);
... not here ...
> + }
> +
> + /*
> + * Either the guest set unsupported XCR0/XSS bits, or the corresponding
> + * entries in the CPUID table were not present. This is not a valid
> + * state to be in.
> + */
> + if (xfeatures_found != (xfeatures_en & ~3ULL))
> + return -EINVAL;
... but here, where you're not going to return an error; otherwise callers
will see that value change temporarily, which is not clean.
Also, you need to set it once - not during each loop iteration.
> +
> + return 0;
> +}
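Putting those two points together -- assign the out-parameter once and use
BIT_ULL() -- the function could be reshuffled along these lines (untested
sketch, existing names kept):

static int sev_snp_cpuid_xsave_size(u64 xfeatures_en, u32 base_size,
                                    u32 *xsave_size, bool compacted)
{
        u64 xfeatures_found = 0;
        u32 xsave_sz = base_size;
        int i;

        for (i = 0; i < cpuid_info->count; i++) {
                const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];

                if (!(fn->eax_in == 0xd && fn->ecx_in > 1 && fn->ecx_in < 64))
                        continue;
                if (!(xfeatures_en & BIT_ULL(fn->ecx_in)))
                        continue;
                if (xfeatures_found & BIT_ULL(fn->ecx_in))
                        continue;

                xfeatures_found |= BIT_ULL(fn->ecx_in);

                if (compacted)
                        xsave_sz += fn->eax;
                else
                        xsave_sz = max(xsave_sz, fn->eax + fn->ebx);
        }

        /*
         * Either the guest set unsupported XCR0/XSS bits, or the
         * corresponding entries in the CPUID table were not present.
         * This is not a valid state to be in.
         */
        if (xfeatures_found != (xfeatures_en & ~3ULL))
                return -EINVAL;

        /* Assign the out-parameter only on the success path. */
        *xsave_size = xsave_sz;

        return 0;
}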
> +
> +static void sev_snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
> + u32 *ecx, u32 *edx)
> +{
> + /*
> + * Currently MSR protocol is sufficient to handle fallback cases, but
> + * should that change make sure we terminate rather than grabbing random
Fix the "we"s please. Please audit all your patches.
> + * values. Handling can be added in future to use GHCB-page protocol for
> + * cases that occur late enough in boot that GHCB page is available
End comment sentences with a fullstop. Please audit all your patches.
> + */
Also, put that comment over the function.
> + if (cpuid_function_is_indexed(func) && subfunc != 0)
In all your patches:
s/ != 0//g
> + sev_es_terminate(1, GHCB_TERM_CPUID_HV);
> +
> + if (sev_cpuid_hv(func, 0, eax, ebx, ecx, edx))
> + sev_es_terminate(1, GHCB_TERM_CPUID_HV);
> +}
> +
> +static bool sev_snp_cpuid_find(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
I guess
find_validated_cpuid_func()
or so to denote where it picks it out from.
> + u32 *ecx, u32 *edx)
> +{
> + int i;
> + bool found = false;
The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::
struct long_struct_name *descriptive_name;
unsigned long foo, bar;
unsigned int tmp;
int ret;
The above is faster to parse than the reverse ordering::
int ret;
unsigned int tmp;
unsigned long foo, bar;
struct long_struct_name *descriptive_name;
And even more so than random ordering::
unsigned long foo, bar;
int ret;
struct long_struct_name *descriptive_name;
unsigned int tmp;
Audit all your patches pls.
> +
> + for (i = 0; i < cpuid_info->count; i++) {
> + const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
> +
> + if (fn->eax_in != func)
> + continue;
> +
> + if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
> + continue;
> +
> + *eax = fn->eax;
> + *ebx = fn->ebx;
> + *ecx = fn->ecx;
> + *edx = fn->edx;
> + found = true;
> +
> + break;
That's just silly. Simply:
return true;
> + }
> +
> + return found;
return false;
here and the "found" variable can go.
> +}
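I.e., something like this (untested sketch; the find_validated_cpuid_func()
rename suggested above is a separate change and is left out here):

static bool sev_snp_cpuid_find(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
                               u32 *ecx, u32 *edx)
{
        int i;

        for (i = 0; i < cpuid_info->count; i++) {
                const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];

                if (fn->eax_in != func)
                        continue;

                if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
                        continue;

                *eax = fn->eax;
                *ebx = fn->ebx;
                *ecx = fn->ecx;
                *edx = fn->edx;

                return true;
        }

        return false;
}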
> +
> +static bool sev_snp_cpuid_in_range(u32 func)
> +{
> + int i;
> + u32 std_range_min = 0;
> + u32 std_range_max = 0;
> + u32 hyp_range_min = 0x40000000;
> + u32 hyp_range_max = 0;
> + u32 ext_range_min = 0x80000000;
> + u32 ext_range_max = 0;
> +
> + for (i = 0; i < cpuid_info->count; i++) {
> + const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
> +
> + if (fn->eax_in == std_range_min)
> + std_range_max = fn->eax;
> + else if (fn->eax_in == hyp_range_min)
> + hyp_range_max = fn->eax;
> + else if (fn->eax_in == ext_range_min)
> + ext_range_max = fn->eax;
> + }
So this loop which determines those ranges will run each time
sev_snp_cpuid_find() doesn't find @func among the validated CPUID leaves.
Why don't you do that determination once at init...
> +
> + if ((func >= std_range_min && func <= std_range_max) ||
> + (func >= hyp_range_min && func <= hyp_range_max) ||
> + (func >= ext_range_min && func <= ext_range_max))
... so that this function becomes only this check?
This is unnecessary work as it is.
> + return true;
> +
> + return false;
> +}
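One way to do that determination once, at init time -- an untested sketch;
the maxima would be filled in while sev_snp_cpuid_init() first walks the
table, and would need the same .data placement as the other CPUID state
mentioned earlier:

/* Highest leaf in each range, determined once from the CPUID table. */
static u32 cpuid_std_range_max __section(".data");
static u32 cpuid_hyp_range_max __section(".data");
static u32 cpuid_ext_range_max __section(".data");

static bool sev_snp_cpuid_in_range(u32 func)
{
        if (func <= cpuid_std_range_max)
                return true;
        if (func >= 0x40000000 && func <= cpuid_hyp_range_max)
                return true;
        if (func >= 0x80000000 && func <= cpuid_ext_range_max)
                return true;

        return false;
}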
> +
> +/*
> + * Returns -EOPNOTSUPP if feature not enabled. Any other return value should be
> + * treated as fatal by caller since we cannot fall back to hypervisor to fetch
> + * the values for security reasons (outside of the specific cases handled here)
> + */
> +static int sev_snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
> + u32 *edx)
> +{
> + if (!sev_snp_cpuid_active())
> + return -EOPNOTSUPP;
> +
> + if (!cpuid_info)
> + return -EIO;
> +
> + if (!sev_snp_cpuid_find(func, subfunc, eax, ebx, ecx, edx)) {
> + /*
> + * Some hypervisors will avoid keeping track of CPUID entries
> + * where all values are zero, since they can be handled the
> + * same as out-of-range values (all-zero). In our case, we want
> + * to be able to distinguish between out-of-range entries and
> + * in-range zero entries, since the CPUID table entries are
> + * only a template that may need to be augmented with
> + * additional values for things like CPU-specific information.
> + * So if it's not in the table, but is still in the valid
> + * range, proceed with the fix-ups below. Otherwise, just return
> + * zeros.
> + */
> + *eax = *ebx = *ecx = *edx = 0;
> + if (!sev_snp_cpuid_in_range(func))
> + goto out;
That label is not needed.
> + }
All that from here on looks like it should go into a separate function
called
snp_cpuid_postprocess()
where you can do a switch-case on func and have it nice, readable and
extensible there, in case more functions get added.
> + if (func == 0x1) {
> + u32 ebx2, edx2;
> +
> + sev_snp_cpuid_hv(func, subfunc, NULL, &ebx2, NULL, &edx2);
> + /* initial APIC ID */
> + *ebx = (*ebx & 0x00FFFFFF) | (ebx2 & 0xFF000000);
For all hex masks: use GENMASK_ULL.
> + /* APIC enabled bit */
> + *edx = (*edx & ~BIT_ULL(9)) | (edx2 & BIT_ULL(9));
> +
> + /* OSXSAVE enabled bit */
> + if (native_read_cr4() & X86_CR4_OSXSAVE)
> + *ecx |= BIT_ULL(27);
> + } else if (func == 0x7) {
> + /* OSPKE enabled bit */
> + *ecx &= ~BIT_ULL(4);
> + if (native_read_cr4() & X86_CR4_PKE)
> + *ecx |= BIT_ULL(4);
> + } else if (func == 0xB) {
> + /* extended APIC ID */
> + sev_snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx);
> + } else if (func == 0xd && (subfunc == 0x0 || subfunc == 0x1)) {
> + bool compacted = false;
> + u64 xcr0 = 1, xss = 0;
> + u32 xsave_size;
> +
> + if (native_read_cr4() & X86_CR4_OSXSAVE)
> + xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> + if (subfunc == 1) {
> + /* boot/compressed doesn't set XSS so 0 is fine there */
> +#ifndef __BOOT_COMPRESSED
> + if (*eax & 0x8) /* XSAVES */
> + if (boot_cpu_has(X86_FEATURE_XSAVES))
cpu_feature_enabled()
> + rdmsrl(MSR_IA32_XSS, xss);
> +#endif
> + /*
> + * The PPR and APM aren't clear on what size should be
> + * encoded in 0xD:0x1:EBX when compaction is not enabled
> + * by either XSAVEC or XSAVES since SNP-capable hardware
> + * has the entries fixed as 1. KVM sets it to 0 in this
> + * case, but to avoid this becoming an issue it's safer
> + * to simply treat this as unsupported for SNP guests.
> + */
> + if (!(*eax & 0xA)) /* (XSAVEC|XSAVES) */
Please put side comments over the line they comment.
> + return -EINVAL;
> +
> + compacted = true;
> + }
> +
> + if (sev_snp_cpuid_xsave_size(xcr0 | xss, *ebx, &xsave_size,
> + compacted))
No need for that linebreak.
> + return -EINVAL;
> +
> + *ebx = xsave_size;
> + } else if (func == 0x8000001E) {
> + u32 ebx2, ecx2;
> +
> + /* extended APIC ID */
> + sev_snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL);
> + /* compute ID */
> + *ebx = (*ebx & 0xFFFFFFF00) | (ebx2 & 0x000000FF);
> + /* node ID */
> + *ecx = (*ecx & 0xFFFFFFF00) | (ecx2 & 0x000000FF);
> + }
> +
> +out:
> + return 0;
> +}
> +
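Tying this back to the snp_cpuid_postprocess() suggestion above, a bare
skeleton of that split could look like this (sketch only, with the per-leaf
fixups from the if/else chain elided):

static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
                                 u32 *ecx, u32 *edx)
{
        switch (func) {
        case 0x1:
                /* initial APIC ID, APIC enabled bit, OSXSAVE bit */
                break;
        case 0x7:
                /* OSPKE bit */
                break;
        case 0xb:
                /* extended APIC ID */
                sev_snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx);
                break;
        case 0xd:
                /* XSAVE size fixups for subfuncs 0 and 1 */
                break;
        case 0x8000001e:
                /* extended APIC ID, compute unit ID, node ID */
                break;
        default:
                break;
        }

        return 0;
}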
> /*
> * Boot VC Handler - This is the first VC handler during boot, there is no GHCB
> * page yet, so it only supports the MSR based communication with the
Is that comment...
> @@ -244,15 +485,25 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
> void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> {
> unsigned int fn = lower_bits(regs->ax, 32);
> + unsigned int subfn = lower_bits(regs->cx, 32);
> u32 eax, ebx, ecx, edx;
> + int ret;
>
> /* Only CPUID is supported via MSR protocol */
... and that still valid?
> if (exit_code != SVM_EXIT_CPUID)
> goto fail;
>
> + ret = sev_snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx);
> + if (ret == 0)
> + goto out;
I think you mean here "goto cpuid_done;" or so.
> +
> + if (ret != -EOPNOTSUPP)
> + goto fail;
> +
> if (sev_cpuid_hv(fn, 0, &eax, &ebx, &ecx, &edx))
> goto fail;
>
> +out:
> regs->ax = eax;
> regs->bx = ebx;
> regs->cx = ecx;
> @@ -552,6 +803,19 @@ static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
> struct pt_regs *regs = ctxt->regs;
> u32 cr4 = native_read_cr4();
> enum es_result ret;
> + u32 eax, ebx, ecx, edx;
> + int cpuid_ret;
> +
> + cpuid_ret = sev_snp_cpuid(regs->ax, regs->cx, &eax, &ebx, &ecx, &edx);
> + if (cpuid_ret == 0) {
> + regs->ax = eax;
> + regs->bx = ebx;
> + regs->cx = ecx;
> + regs->dx = edx;
> + return ES_OK;
> + }
> + if (cpuid_ret != -EOPNOTSUPP)
> + return ES_VMM_ERROR;
I don't like this thing slapped inside the function. Pls put it in a separate
vc_handle_cpuid_snp()
which is called by vc_handle_cpuid() instead.
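Something along these lines, for illustration (untested sketch):

static enum es_result vc_handle_cpuid_snp(struct pt_regs *regs)
{
        u32 eax, ebx, ecx, edx;
        int ret;

        ret = sev_snp_cpuid(regs->ax, regs->cx, &eax, &ebx, &ecx, &edx);
        if (!ret) {
                regs->ax = eax;
                regs->bx = ebx;
                regs->cx = ecx;
                regs->dx = edx;
                return ES_OK;
        }

        return ret == -EOPNOTSUPP ? ES_UNSUPPORTED : ES_VMM_ERROR;
}

vc_handle_cpuid() would then call this first and only fall through to the
GHCB protocol when it gets ES_UNSUPPORTED back.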
>
> ghcb_set_rax(ghcb, regs->ax);
> ghcb_set_rcx(ghcb, regs->cx);
> @@ -603,3 +867,113 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
>
> return ES_OK;
> }
> +
> +#ifdef BOOT_COMPRESSED
> +static struct setup_data *get_cc_setup_data(struct boot_params *bp)
> +{
> + struct setup_data *hdr = (struct setup_data *)bp->hdr.setup_data;
> +
> + while (hdr) {
> + if (hdr->type == SETUP_CC_BLOB)
> + return hdr;
> + hdr = (struct setup_data *)hdr->next;
> + }
> +
> + return NULL;
> +}
> +
> +/*
> + * For boot/compressed kernel:
> + *
> + * 1) Search for CC blob in the following order/precedence:
> + * - via linux boot protocol / setup_data entry
> + * - via EFI configuration table
> + * 2) Return a pointer to the CC blob, NULL otherwise.
> + */
> +static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
snp_find_cc_blob() simply.
> +{
> + struct cc_blob_sev_info *cc_info = NULL;
> + struct setup_data_cc {
> + struct setup_data header;
> + u32 cc_blob_address;
> + } *sd;
Define that struct above the function and call it "cc_setup_data" like
the rest of the stuff which deals with that.
> + unsigned long conf_table_pa;
> + unsigned int conf_table_len;
> + bool efi_64;
> +
> + /* Try to get CC blob via setup_data */
> + sd = (struct setup_data_cc *)get_cc_setup_data(bp);
> + if (sd) {
> + cc_info = (struct cc_blob_sev_info *)(unsigned long)sd->cc_blob_address;
> + goto out_verify;
> + }
> +
> + /* CC blob isn't in setup_data, see if it's in the EFI config table */
> + if (!efi_get_conf_table(bp, &conf_table_pa, &conf_table_len, &efi_64))
> + (void)efi_find_vendor_table(conf_table_pa, conf_table_len,
> + EFI_CC_BLOB_GUID, efi_64,
> + (unsigned long *)&cc_info);
Yah, check that retval pls with a proper ret variable. No need to cram
it all together.
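I.e., assuming an int ret local is added to the function, the call site could
look like this (untested sketch):

        /* CC blob isn't in setup_data, see if it's in the EFI config table. */
        ret = efi_get_conf_table(bp, &conf_table_pa, &conf_table_len, &efi_64);
        if (ret)
                goto out_verify;

        ret = efi_find_vendor_table(conf_table_pa, conf_table_len,
                                    EFI_CC_BLOB_GUID, efi_64,
                                    (unsigned long *)&cc_info);
        if (ret)
                cc_info = NULL; /* fall through to out_verify with no blob */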
> +
> +out_verify:
> + /* CC blob should be either valid or not present. Fail otherwise. */
> + if (cc_info && cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
> + sev_es_terminate(1, GHCB_SNP_UNSUPPORTED);
> +
> + return cc_info;
> +}
> +#else
> +/*
> + * Probing for CC blob for run-time kernel will be enabled in a subsequent
> + * patch. For now we need to stub this out.
> + */
> +static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
> +{
> + return NULL;
> +}
> +#endif
> +
> +/*
> + * Initial set up of CPUID table when running identity-mapped.
> + *
> + * NOTE: Since SEV_SNP feature partly relies on CPUID checks that can't
> + * happen until we access CPUID page, we skip the check and hope the
> + * bootloader is providing sane values.
So I don't like the sound of that even one bit. We shouldn't hope
anything here...
> Current code relies on all CPUID
> + * page lookups originating from #VC handler, which at least provides
> + * indication that SEV-ES is enabled. Subsequent init levels will check for
> + * SEV_SNP feature once available to also take SEV MSR value into account.
> + */
> +void sev_snp_cpuid_init(struct boot_params *bp)
snp_cpuid_init()
In general, prefix all SNP-specific variables, structs, functions, etc
with "snp_" simply.
> +{
> + struct cc_blob_sev_info *cc_info;
> +
> + if (!bp)
> + sev_es_terminate(1, GHCB_TERM_CPUID);
> +
> + cc_info = sev_snp_probe_cc_blob(bp);
> +
^ Superfluous newline.
> + if (!cc_info)
> + return;
> +
> + sev_snp_cpuid_pa = cc_info->cpuid_phys;
> + sev_snp_cpuid_sz = cc_info->cpuid_len;
You can do those assignments ...
> +
> + /*
> + * These should always be valid values for SNP, even if guest isn't
> + * actually configured to use the CPUID table.
> + */
> + if (!sev_snp_cpuid_pa || sev_snp_cpuid_sz < PAGE_SIZE)
> + sev_es_terminate(1, GHCB_TERM_CPUID);
... here, after you've verified them.
> +
> + cpuid_info = (const struct sev_snp_cpuid_info *)sev_snp_cpuid_pa;
> +
> + /*
> + * We should be able to trust the 'count' value in the CPUID table
> + * area, but ensure it agrees with CC blob value to be safe.
> + */
> + if (sev_snp_cpuid_sz < (sizeof(struct sev_snp_cpuid_info) +
> + sizeof(struct sev_snp_cpuid_fn) *
> + cpuid_info->count))
Yah, this is the type of paranoia I'm talking about!
> + sev_es_terminate(1, GHCB_TERM_CPUID);
> +
> + sev_snp_cpuid_enabled = 1;
> +}
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index ddf8ced4a879..d7b6f7420551 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -19,6 +19,8 @@
> #include <linux/kernel.h>
> #include <linux/mm.h>
> #include <linux/cpumask.h>
> +#include <linux/log2.h>
> +#include <linux/efi.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -32,6 +34,8 @@
> #include <asm/smp.h>
> #include <asm/cpu.h>
> #include <asm/apic.h>
> +#include <asm/efi.h>
> +#include <asm/cpuid.h>
>
> #include "sev-internal.h"
What are those includes for?
Looks like a leftover...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:24AM -0500, Brijesh Singh wrote:
> From: Michael Roth <[email protected]>
>
> The previously defined Confidential Computing blob is provided to the
> kernel via a setup_data structure or EFI config table entry. Currently
> these are both checked for by boot/compressed kernel to access the
> CPUID table address within it for use with SEV-SNP CPUID enforcement.
>
> To also enable SEV-SNP CPUID enforcement for the run-time kernel,
> similar early access to the CPUID table is needed early on while it's
> still using the identity-mapped page table set up by boot/compressed,
> where global pointers need to be accessed via fixup_pointer().
>
> This isn't much of an issue for accessing setup_data, and the EFI config
> table helper code currently used in boot/compressed *could* be used in
> this case as well since they both rely on identity-mapping. However, it
> has some reliance on EFI helpers/string constants that would need to be
> accessed via fixup_pointer(), and fixing it up while making it
> shareable between boot/compressed and run-time kernel is fragile and
> introduces a good bit of ugliness.
>
> Instead, this patch adds a boot_params->cc_blob_address pointer that
Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.
Also, do
$ git grep 'This patch' Documentation/process
for more details.
> boot/compressed can initialize so that the run-time kernel can access
> the pre-located CC blob that way instead.
>
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/bootparam_utils.h | 1 +
> arch/x86/include/uapi/asm/bootparam.h | 3 ++-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
> index 981fe923a59f..53e9b0620d96 100644
> --- a/arch/x86/include/asm/bootparam_utils.h
> +++ b/arch/x86/include/asm/bootparam_utils.h
> @@ -74,6 +74,7 @@ static void sanitize_boot_params(struct boot_params *boot_params)
> BOOT_PARAM_PRESERVE(hdr),
> BOOT_PARAM_PRESERVE(e820_table),
> BOOT_PARAM_PRESERVE(eddbuf),
> + BOOT_PARAM_PRESERVE(cc_blob_address),
> };
>
> memset(&scratch, 0, sizeof(scratch));
> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> index 1ac5acca72ce..bea5cdcdf532 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -188,7 +188,8 @@ struct boot_params {
> __u32 ext_ramdisk_image; /* 0x0c0 */
> __u32 ext_ramdisk_size; /* 0x0c4 */
> __u32 ext_cmd_line_ptr; /* 0x0c8 */
> - __u8 _pad4[116]; /* 0x0cc */
> + __u8 _pad4[112]; /* 0x0cc */
> + __u32 cc_blob_address; /* 0x13c */
So I know I've heard grub being mentioned in conjunction with this: if
you are ever going to pass this through the boot loader, then you'd need
to update Documentation/x86/zero-page.rst too to state that this field
can be written by the boot loader too.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:25AM -0500, Brijesh Singh wrote:
> From: Michael Roth <[email protected]>
>
> When the Confidential Computing blob is located by the boot/compressed
> kernel, store a pointer to it in bootparams->cc_blob_address to avoid
> the need for the run-time kernel to rescan the EFI config table to find
> it again.
>
> Since this function is also shared by the run-time kernel, this patch
Here's "this patch" again... but you know what to do.
> also adds the logic to make use of bootparams->cc_blob_address when it
> has been initialized.
>
> Signed-off-by: Michael Roth <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev-shared.c | 40 ++++++++++++++++++++++++++----------
> 1 file changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 651980ddbd65..6f70ba293c5e 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -868,7 +868,6 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
> return ES_OK;
> }
>
> -#ifdef BOOT_COMPRESSED
> static struct setup_data *get_cc_setup_data(struct boot_params *bp)
> {
> struct setup_data *hdr = (struct setup_data *)bp->hdr.setup_data;
> @@ -888,6 +887,16 @@ static struct setup_data *get_cc_setup_data(struct boot_params *bp)
> * 1) Search for CC blob in the following order/precedence:
> * - via linux boot protocol / setup_data entry
> * - via EFI configuration table
> + * 2) If found, initialize boot_params->cc_blob_address to point to the
> + * blob so that uncompressed kernel can easily access it during very
> + * early boot without the need to re-parse EFI config table
> + * 3) Return a pointer to the CC blob, NULL otherwise.
> + *
> + * For run-time/uncompressed kernel:
> + *
> + * 1) Search for CC blob in the following order/precedence:
> + * - via linux boot protocol / setup_data entry
Why would you do this again if the boot/compressed kernel has already
searched for it?
> + * - via boot_params->cc_blob_address
Yes, that is the only thing you need to do in the runtime kernel - see
if cc_blob_address is not 0. And all the work has been done by the
decompressor kernel already.
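I.e., the kernel proper side could then collapse to something like (sketch
only):

	/* Kernel proper: boot/compressed has already done the probing. */
	if (!bp->cc_blob_address)
		return NULL;

	return (struct cc_blob_sev_info *)(unsigned long)bp->cc_blob_address;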
> * 2) Return a pointer to the CC blob, NULL otherwise.
> */
> static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
> @@ -897,9 +906,11 @@ static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
> struct setup_data header;
> u32 cc_blob_address;
> } *sd;
> +#ifdef __BOOT_COMPRESSED
> unsigned long conf_table_pa;
> unsigned int conf_table_len;
> bool efi_64;
> +#endif
That function turns into an unreadable mess with that #ifdef
__BOOT_COMPRESSED slapped everywhere.
It seems the cleanest thing to do is to do what we do with
acpi_rsdp_addr: do all the parsing in boot/compressed/ and pass it on
through boot_params. Kernel proper simply reads the pointer.
Which means, you can stick all that cc_blob figuring out functionality
in arch/x86/boot/compressed/sev.c instead.
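I.e., once the decompressor has located the blob, all it needs to hand over
is (sketch):

	/* boot/compressed: stash the blob for kernel proper. */
	bp->cc_blob_address = (u32)(unsigned long)cc_info;

and kernel proper simply reads that field back, just like it does for
acpi_rsdp_addr.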
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:26AM -0500, Brijesh Singh wrote:
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index 3cf7a7575f5c..54374e0f0257 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -37,6 +37,9 @@
> #include <asm/setup.h> /* For COMMAND_LINE_SIZE */
> #undef _SETUP
>
> +#define __BOOT_COMPRESSED
> +#include <asm/sev.h> /* For sev_snp_active() + ConfidentialComputing blob */
> +
When you move all the cc_blob parsing to the compressed kernel, all that
ugly ifdeffery won't be needed.
> extern unsigned long get_cmd_line_ptr(void);
>
> /* Used by PAGE_KERN* macros: */
> @@ -163,6 +166,21 @@ void initialize_identity_maps(void *rmode)
> cmdline = get_cmd_line_ptr();
> add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
Carve that ...
> + /*
> + * The ConfidentialComputing blob is used very early in uncompressed
> + * kernel to find CPUID memory to handle cpuid instructions. Make sure
> + * an identity-mapping exists so they can be accessed after switchover.
> + */
> + if (sev_snp_enabled()) {
> + struct cc_blob_sev_info *cc_info =
> + (void *)(unsigned long)boot_params->cc_blob_address;
> +
> + add_identity_map((unsigned long)cc_info,
> + (unsigned long)cc_info + sizeof(*cc_info));
> + add_identity_map((unsigned long)cc_info->cpuid_phys,
> + (unsigned long)cc_info->cpuid_phys + cc_info->cpuid_len);
> + }
> +
> /* Load the new page-table. */
> sev_verify_cbit(top_level_pgt);
... up to here into a separate function called sev_prep_identity_maps()
so that SEV-specific code flow is not in the generic code path.
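I.e., something like this (a sketch; the helper name is only a suggestion):

static void sev_prep_identity_maps(void)
{
	/*
	 * The ConfidentialComputing blob and the CPUID table it points to
	 * are used very early by the uncompressed kernel, so make sure an
	 * identity mapping exists for them before the switchover.
	 */
	if (sev_snp_enabled()) {
		struct cc_blob_sev_info *cc_info =
			(void *)(unsigned long)boot_params->cc_blob_address;

		add_identity_map((unsigned long)cc_info,
				 (unsigned long)cc_info + sizeof(*cc_info));
		add_identity_map((unsigned long)cc_info->cpuid_phys,
				 (unsigned long)cc_info->cpuid_phys + cc_info->cpuid_len);
	}
}

and initialize_identity_maps() then contains only the single call.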
> write_cr3(top_level_pgt);
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:27AM -0500, Brijesh Singh wrote:
> From: Michael Roth <[email protected]>
>
> This adds support for utilizing the SEV-SNP-validated CPUID table in
s/This adds support for utilizing/Utilize/
Yap, it can really be that simple. :)
> the various #VC handler routines used throughout boot/run-time. Mostly
> this is handled by re-using the CPUID lookup code introduced earlier
> for the boot/compressed kernel, but at various stages of boot some work
> needs to be done to ensure the CPUID table is set up and remains
> accessible throughout. The following init routines are introduced to
> handle this:
Do not talk about what your patch does - that should hopefully be
visible in the diff itself. Rather, talk about *why* you're doing what
you're doing.
> sev_snp_cpuid_init():
This one is not really introduced - it is already there.
<snip all the complex rest>
So this patch is making my head spin. It seems we're dancing a lot of
dance just to have our CPUID page present at all times. Which begs the
question: do we need it during the whole lifetime of the guest?
Regardless, I think this can be simplified by orders of
magnitude if we allocated statically 4K for that CPUID page in
arch/x86/boot/compressed/mem_encrypt.S, copied the supplied CPUID page
from the firmware to it and from now on, work with our own copy.
You probably would need to still remap it for kernel proper but it would
get rid of all that crazy in this patch here.
Hmmm?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/27/21 10:18 AM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:27AM -0500, Brijesh Singh wrote:
>> From: Michael Roth <[email protected]>
>>
>> This adds support for utilizing the SEV-SNP-validated CPUID table in
> s/This adds support for utilizing/Utilize/
>
> Yap, it can really be that simple. :)
>
>> the various #VC handler routines used throughout boot/run-time. Mostly
>> this is handled by re-using the CPUID lookup code introduced earlier
>> for the boot/compressed kernel, but at various stages of boot some work
>> needs to be done to ensure the CPUID table is set up and remains
>> accessible throughout. The following init routines are introduced to
>> handle this:
> Do not talk about what your patch does - that should hopefully be
> visible in the diff itself. Rather, talk about *why* you're doing what
> you're doing.
>
>> sev_snp_cpuid_init():
> This one is not really introduced - it is already there.
>
> <snip all the complex rest>
>
> So this patch is making my head spin. It seems we're dancing a lot of
> dance just to have our CPUID page present at all times. Which begs the
> question: do we need it during the whole lifetime of the guest?
Mike can correct me, but we need it for the entire lifetime of the guest.
Whenever the guest needs a CPUID value, the #VC handler will refer to this
page.
> Regardless, I think this can be simplified by orders of
> magnitude if we allocated statically 4K for that CPUID page in
> arch/x86/boot/compressed/mem_encrypt.S, copied the supplied CPUID page
> from the firmware to it and from now on, work with our own copy.
Actually, a VMM could populate more than one page for the CPUID table. One
page can hold 64 entries, and I believe Mike is already running into that
limit (with QEMU) and exploring ideas to extend it to more than a page.
> You probably would need to still remap it for kernel proper but it would
> get rid of all that crazy in this patch here.
>
> Hmmm?
>
On Fri, Aug 27, 2021 at 10:47:42AM -0500, Brijesh Singh wrote:
> Actually, a VMM could populate more than one page for the CPUID table. One
> page can hold 64 entries, and I believe Mike is already running into that
> limit (with QEMU) and exploring ideas to extend it to more than a page.
You mean, like, 2 pages?
:-)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:12AM -0500, Brijesh Singh wrote:
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 37aa77565726..3388db814fd0 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -74,6 +74,8 @@
> enum psc_op {
> SNP_PAGE_STATE_PRIVATE = 1,
> SNP_PAGE_STATE_SHARED,
> + SNP_PAGE_STATE_PSMASH,
> + SNP_PAGE_STATE_UNSMASH,
Those two are unused in this set, AFAICT.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:28AM -0500, Brijesh Singh wrote:
> +int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
> +{
> + struct ghcb_state state;
> + unsigned long id, flags;
> + struct ghcb *ghcb;
> + int ret;
> +
> + if (!sev_feature_enabled(SEV_SNP))
> + return -ENODEV;
> +
> + local_irq_save(flags);
> +
> + ghcb = __sev_get_ghcb(&state);
> + if (!ghcb) {
> + ret = -EIO;
> + goto e_restore_irq;
> + }
> +
> + vc_ghcb_invalidate(ghcb);
> +
> + if (type == GUEST_REQUEST) {
> + id = SVM_VMGEXIT_GUEST_REQUEST;
> + } else if (type == EXT_GUEST_REQUEST) {
> + id = SVM_VMGEXIT_EXT_GUEST_REQUEST;
> + ghcb_set_rax(ghcb, input->data_gpa);
> + ghcb_set_rbx(ghcb, input->data_npages);
Hmmm, now I'm not sure. We did enum psc_op where you simply pass in the
op directly to the hardware because the enum uses the same numbers as
the actual command.
But here that @type thing is simply used to translate to the SVM_VMGEXIT
thing. So maybe you don't need it here and you can hand in the exit_code
directly:
int snp_issue_guest_request(u64 exit_code, struct snp_guest_request_data *input,
unsigned long *fw_err)
which you then pass in directly to...
> + } else {
> + ret = -EINVAL;
> + goto e_put;
> + }
> +
> + ret = sev_es_ghcb_hv_call(ghcb, NULL, id, input->req_gpa, input->resp_gpa);
... this guy here:
ret = sev_es_ghcb_hv_call(ghcb, NULL, exit_code, input->req_gpa, input->resp_gpa);
> + if (ret)
> + goto e_put;
> +
> + if (ghcb->save.sw_exit_info_2) {
> + /* Number of expected pages are returned in RBX */
> + if (id == EXT_GUEST_REQUEST &&
> + ghcb->save.sw_exit_info_2 == SNP_GUEST_REQ_INVALID_LEN)
> + input->data_npages = ghcb_get_rbx(ghcb);
> +
> + if (fw_err)
> + *fw_err = ghcb->save.sw_exit_info_2;
> +
> + ret = -EIO;
> + }
> +
> +e_put:
> + __sev_put_ghcb(&state);
> +e_restore_irq:
> + local_irq_restore(flags);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(snp_issue_guest_request);
> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
Why is this a separate header in the include/linux/ namespace?
Is SNP available on something which is !x86, all of a sudden?
> new file mode 100644
> index 000000000000..24dd17507789
> --- /dev/null
> +++ b/include/linux/sev-guest.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * AMD Secure Encrypted Virtualization (SEV) guest driver interface
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <[email protected]>
> + *
> + */
> +
> +#ifndef __LINUX_SEV_GUEST_H_
> +#define __LINUX_SEV_GUEST_H_
> +
> +#include <linux/types.h>
> +
> +enum vmgexit_type {
> + GUEST_REQUEST,
> + EXT_GUEST_REQUEST,
> +
> + GUEST_REQUEST_MAX
> +};
> +
> +/*
> + * The error code when the data_npages is too small. The error code
> + * is defined in the GHCB specification.
> + */
> +#define SNP_GUEST_REQ_INVALID_LEN 0x100000000ULL
so basically
BIT_ULL(32)
> +
> +struct snp_guest_request_data {
"snp_req_data" I guess is shorter. And having "guest" in there is
probably not needed because snp is always guest-related anyway.
> + unsigned long req_gpa;
> + unsigned long resp_gpa;
> + unsigned long data_gpa;
> + unsigned int data_npages;
> +};
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
> + unsigned long *fw_err);
> +#else
> +
> +static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
> + unsigned long *fw_err)
> +{
> + return -ENODEV;
> +}
> +
> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
> +#endif /* __LINUX_SEV_GUEST_H__ */
> --
> 2.17.1
>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/27/21 12:44 PM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:28AM -0500, Brijesh Singh wrote:
>> +int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
>> +{
>> + struct ghcb_state state;
>> + unsigned long id, flags;
>> + struct ghcb *ghcb;
>> + int ret;
>> +
>> + if (!sev_feature_enabled(SEV_SNP))
>> + return -ENODEV;
>> +
>> + local_irq_save(flags);
>> +
>> + ghcb = __sev_get_ghcb(&state);
>> + if (!ghcb) {
>> + ret = -EIO;
>> + goto e_restore_irq;
>> + }
>> +
>> + vc_ghcb_invalidate(ghcb);
>> +
>> + if (type == GUEST_REQUEST) {
>> + id = SVM_VMGEXIT_GUEST_REQUEST;
>> + } else if (type == EXT_GUEST_REQUEST) {
>> + id = SVM_VMGEXIT_EXT_GUEST_REQUEST;
>> + ghcb_set_rax(ghcb, input->data_gpa);
>> + ghcb_set_rbx(ghcb, input->data_npages);
> Hmmm, now I'm not sure. We did enum psc_op where you simply pass in the
> op directly to the hardware because the enum uses the same numbers as
> the actual command.
>
> But here that @type thing is simply used to translate to the SVM_VMGEXIT
> thing. So maybe you don't need it here and you can hand in the exit_code
> directly:
>
> int snp_issue_guest_request(u64 exit_code, struct snp_guest_request_data *input,
> unsigned long *fw_err)
>
> which you then pass in directly to...
Okay, works for me. The main reason I chose the enum was to not expose the
GHCB exit code to the driver, but I guess the attestation driver needs to
know which VMGEXIT to call, so it's okay to have it pass the VMGEXIT number
instead of the enum.
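So the driver side would then simply do something along the lines of
(illustrative only):

	ret = snp_issue_guest_request(SVM_VMGEXIT_EXT_GUEST_REQUEST, &input, &fw_err);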
>> + } else {
>> + ret = -EINVAL;
>> + goto e_put;
>> + }
>> +
>> + ret = sev_es_ghcb_hv_call(ghcb, NULL, id, input->req_gpa, input->resp_gpa);
> ... this guy here:
>
> ret = sev_es_ghcb_hv_call(ghcb, NULL, exit_code, input->req_gpa, input->resp_gpa);
>
>> + if (ret)
>> + goto e_put;
>> +
>> + if (ghcb->save.sw_exit_info_2) {
>> + /* Number of expected pages are returned in RBX */
>> + if (id == EXT_GUEST_REQUEST &&
>> + ghcb->save.sw_exit_info_2 == SNP_GUEST_REQ_INVALID_LEN)
>> + input->data_npages = ghcb_get_rbx(ghcb);
>> +
>> + if (fw_err)
>> + *fw_err = ghcb->save.sw_exit_info_2;
>> +
>> + ret = -EIO;
>> + }
>> +
>> +e_put:
>> + __sev_put_ghcb(&state);
>> +e_restore_irq:
>> + local_irq_restore(flags);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(snp_issue_guest_request);
>> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
> Why is this a separate header in the include/linux/ namespace?
>
> Is SNP available on something which is !x86, all of a sudden?
So far most of the changes were in x86-specific files. However, the
attestation service is exposed to userspace through a driver. The driver
needs to use the VMGEXIT routines provided by the x86 core. I created the
said header file so that the driver does not need to include <asm/sev.h>,
<asm/sev-common.h>, etc. and will compile for !x86.
>> new file mode 100644
>> index 000000000000..24dd17507789
>> --- /dev/null
>> +++ b/include/linux/sev-guest.h
>> @@ -0,0 +1,48 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * AMD Secure Encrypted Virtualization (SEV) guest driver interface
>> + *
>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Brijesh Singh <[email protected]>
>> + *
>> + */
>> +
>> +#ifndef __LINUX_SEV_GUEST_H_
>> +#define __LINUX_SEV_GUEST_H_
>> +
>> +#include <linux/types.h>
>> +
>> +enum vmgexit_type {
>> + GUEST_REQUEST,
>> + EXT_GUEST_REQUEST,
>> +
>> + GUEST_REQUEST_MAX
>> +};
>> +
>> +/*
>> + * The error code when the data_npages is too small. The error code
>> + * is defined in the GHCB specification.
>> + */
>> +#define SNP_GUEST_REQ_INVALID_LEN 0x100000000ULL
> so basically
>
> BIT_ULL(32)
Noted.
>
>> +
>> +struct snp_guest_request_data {
> "snp_req_data" I guess is shorter. And having "guest" in there is
> probably not needed because snp is always guest-related anyway.
Noted.
>> + unsigned long req_gpa;
>> + unsigned long resp_gpa;
>> + unsigned long data_gpa;
>> + unsigned int data_npages;
>> +};
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
>> + unsigned long *fw_err);
>> +#else
>> +
>> +static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
>> + unsigned long *fw_err)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
>> +#endif /* __LINUX_SEV_GUEST_H__ */
>> --
>> 2.17.1
>>
On Fri, Aug 27, 2021 at 01:07:59PM -0500, Brijesh Singh wrote:
> Okay, works for me. The main reason I chose the enum was to not expose the
> GHCB exit code to the driver
Why does that matter?
> but I guess the attestation driver needs to know which VMGEXIT to call, so
> it's okay to have it pass the VMGEXIT number instead of the enum.
Well, they're in an uapi header - can't be more public than that. :-)
> So far most of the changes were in x86-specific files. However, the
> attestation service is exposed to userspace through a driver. The driver
> needs to use the VMGEXIT routines provided by the x86 core. I created the
> said header file so that the driver does not need to include <asm/sev.h>,
> <asm/sev-common.h>, etc. and will compile for !x86.
Lemme ask my question again:
Is SNP available on something which is !x86, all of a sudden?
Why would you want to compile that driver on anything but x86?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/27/21 1:13 PM, Borislav Petkov wrote:
> On Fri, Aug 27, 2021 at 01:07:59PM -0500, Brijesh Singh wrote:
>> Okay, works for me. The main reason why I choose the enum was to not
>> expose the GHCB exit code to the driver
> Why does that matter?
Those definitions are present in <asm/xxx>. Somewhere I read that, if
possible, new drivers should avoid including <asm/xxx>. This is one of the
motivations for creating a new file that provides only the needed
definitions.
> Lemme ask my question again:
>
> Is SNP available on something which is !x86, all of a sudden?
Nope
>
> Why would you want to compile that driver on anything but x86?
>
Nobody should compile it for !x86. If my understanding about including
<asm/xxx> in drivers is wrong, then I do not see any need for creating a new
header file. I can drop the header file inclusion in the next update.
thanks
On Fri, Aug 27, 2021 at 01:27:14PM -0500, Brijesh Singh wrote:
> Those definitions are present in <asm/xxx>. Somewhere I read that, if
> possible, new drivers should avoid including <asm/xxx>.
Where?
That is news to me. It is likely possible that I might've missed that
rule but it doesn't look like there's a rule like that at the moment:
$ git grep -E "include.*asm" drivers/ | wc -l
4475
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:29AM -0500, Brijesh Singh wrote:
> The SNP guest request message header contains a message count. The
> message count is used while building the IV. The PSP firmware increments
> the message count by 1, and expects that next message will be using the
> incremented count. The snp_msg_seqno() helper will be used by driver to
> get the message sequence counter used in the request message header,
> and it will be automatically incremented after the request is successful.
> The incremented value is saved in the secrets page so that the kexec'ed
> kernel knows from where to begin.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev.c | 79 +++++++++++++++++++++++++++++++++++++++
> include/linux/sev-guest.h | 37 ++++++++++++++++++
> 2 files changed, 116 insertions(+)
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 319a40fc57ce..f42cd5a8e7bb 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -51,6 +51,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> */
> static struct ghcb __initdata *boot_ghcb;
>
Explain what that is in a comment above it.
> +static u64 snp_secrets_phys;
snp_secrets_pa;
is the usual convention when a variable is supposed to contain a
physical address.
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -2030,6 +2032,80 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
> halt();
> }
>
> +static struct snp_secrets_page_layout *snp_map_secrets_page(void)
> +{
> + u16 __iomem *secrets;
> +
> + if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
> + return NULL;
> +
> + secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
> + if (!secrets)
> + return NULL;
> +
> + return (struct snp_secrets_page_layout *)secrets;
> +}
Or simply:
static struct snp_secrets_page_layout *map_secrets_page(void)
{
if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
return NULL;
return ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
}
?
> +
> +static inline u64 snp_read_msg_seqno(void)
Drop that "snp_" prefix from all those static function names. This one
is even inline, which means its name doesn't matter at all.
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return 0;
> +
> + /* Read the current message sequence counter from secrets pages */
> + count = readl(&layout->os_area.msg_seqno_0);
> +
> + iounmap(layout);
> +
> + /* The sequence counter must begin with 1 */
That sounds weird. Why? 0 is special?
> + if (!count)
> + return 1;
> +
> + return count + 1;
> +}
> +
> +u64 snp_msg_seqno(void)
Function name needs a verb. I.e.,
snp_get_msg_seqno()
> +{
> + u64 count = snp_read_msg_seqno();
> +
> + if (unlikely(!count))
That looks like a left-over from a previous version as it can't happen.
Or are you handling the case where the u64 count will wraparound to 0?
But "The sequence counter must begin with 1" so that read function above
needs more love.
> + return 0;
> +
> + /*
> + * The message sequence counter for the SNP guest request is a
> + * 64-bit value but the version 2 of GHCB specification defines a
> + * 32-bit storage for the it.
> + */
> + if (count >= UINT_MAX)
> + return 0;
Huh, WTF? So when the internal counter goes over u32, this function will
return 0 only? More weird.
> +
> + return count;
> +}
> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
> +
> +static void snp_gen_msg_seqno(void)
That's not "gen" - that's "inc" what this function does. IOW,
snp_inc_msg_seqno
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return;
> +
> + /*
> + * The counter is also incremented by the PSP, so increment it by 2
> + * and save in secrets page.
> + */
> + count = readl(&layout->os_area.msg_seqno_0);
> + count += 2;
> +
> + writel(count, &layout->os_area.msg_seqno_0);
> + iounmap(layout);
Why does this need to constantly map and unmap the secrets page? Why
don't you map it once on init and unmap it on exit?
> +}
> +
> int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
> {
> struct ghcb_state state;
> @@ -2077,6 +2153,9 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
> ret = -EIO;
> }
>
> + /* The command was successful, increment the sequence counter */
> + snp_gen_msg_seqno();
> +
> e_put:
> __sev_put_ghcb(&state);
> e_restore_irq:
> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
> index 24dd17507789..16b6af24fda7 100644
> --- a/include/linux/sev-guest.h
> +++ b/include/linux/sev-guest.h
> @@ -20,6 +20,41 @@ enum vmgexit_type {
> GUEST_REQUEST_MAX
> };
>
> +/*
> + * The secrets page contains 96-bytes of reserved field that can be used by
> + * the guest OS. The guest OS uses the area to save the message sequence
> + * number for each VMPCK.
> + *
> + * See the GHCB spec section Secret page layout for the format for this area.
> + */
> +struct secrets_os_area {
> + u32 msg_seqno_0;
> + u32 msg_seqno_1;
> + u32 msg_seqno_2;
> + u32 msg_seqno_3;
> + u64 ap_jump_table_pa;
> + u8 rsvd[40];
> + u8 guest_usage[32];
> +} __packed;
So those are differently named there:
struct secrets_page_os_area {
uint32 vmpl0_message_seq_num;
uint32 vmpl1_message_seq_num;
...
and they have "vmpl" in there which makes a lot more sense for that
they're used than msg_seqno_* does.
> +
> +#define VMPCK_KEY_LEN 32
> +
> +/* See the SNP spec for secrets page format */
> +struct snp_secrets_page_layout {
Simply
struct snp_secrets
That name says all you need to know about what that struct represents.
> + u32 version;
> + u32 imien : 1,
> + rsvd1 : 31;
> + u32 fms;
> + u32 rsvd2;
> + u8 gosvw[16];
> + u8 vmpck0[VMPCK_KEY_LEN];
> + u8 vmpck1[VMPCK_KEY_LEN];
> + u8 vmpck2[VMPCK_KEY_LEN];
> + u8 vmpck3[VMPCK_KEY_LEN];
> + struct secrets_os_area os_area;
My SNP spec copy has here
0A0h–FFFh Reserved.
and no os area. I guess
SEV Secure Nested Paging Firmware ABI Specification 56860 Rev. 0.8 August 2020
needs updating...
> + u8 rsvd3[3840];
> +} __packed;
> +
> /*
> * The error code when the data_npages is too small. The error code
> * is defined in the GHCB specification.
> @@ -36,6 +71,7 @@ struct snp_guest_request_data {
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
> unsigned long *fw_err);
> +u64 snp_msg_seqno(void);
> #else
>
> static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
> @@ -43,6 +79,7 @@ static inline int snp_issue_guest_request(int type, struct snp_guest_request_dat
> {
> return -ENODEV;
> }
> +static inline u64 snp_msg_seqno(void) { return 0; }
>
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
> #endif /* __LINUX_SEV_GUEST_H__ */
> --
> 2.17.1
>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/27/21 1:07 PM, Brijesh Singh wrote:
> On 8/27/21 12:44 PM, Borislav Petkov wrote:
>> On Fri, Aug 20, 2021 at 10:19:28AM -0500, Brijesh Singh wrote:
...
>>> +
>>> +/*
>>> + * The error code when the data_npages is too small. The error code
>>> + * is defined in the GHCB specification.
>>> + */
>>> +#define SNP_GUEST_REQ_INVALID_LEN 0x100000000ULL
>> so basically
>>
>> BIT_ULL(32)
>
> Noted.
The main thing about this is that it is an error code from the HV on
extended guest requests. The HV error code sits in the high-order 32-bits
of the SW_EXIT_INFO_2 field. So defining it either way seems a bit
confusing. To me, the value should just be 1ULL and then it should be
shifted when assigning it to the SW_EXIT_INFO_2.
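Roughly, on the guest side that would look something like this (the shift
define is made up here, purely to illustrate):

#define SNP_GUEST_REQ_ERR_SHIFT	32

	/* The HV return code sits in bits 63:32 of SW_EXITINFO2. */
	if (id == SVM_VMGEXIT_EXT_GUEST_REQUEST &&
	    (ghcb->save.sw_exit_info_2 >> SNP_GUEST_REQ_ERR_SHIFT) == 1)
		input->data_npages = ghcb_get_rbx(ghcb);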
Thanks,
Tom
>
On Fri, Aug 27, 2021 at 02:57:11PM -0500, Tom Lendacky wrote:
> The main thing about this is that it is an error code from the HV on
> extended guest requests. The HV error code sits in the high-order 32-bits of
> the SW_EXIT_INFO_2 field. So defining it either way seems a bit confusing.
> To me, the value should just be 1ULL and then it should be shifted when
> assigning it to the SW_EXIT_INFO_2.
Err, that's from the GHCB spec:
"The hypervisor must validate that the guest has supplied enough pages
...
certificate data in the RBX register and set the SW_EXITINFO2 field to
0x0000000100000000."
So if you wanna do the above, you need to fix the spec first. I'd say.
:-)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/27/21 3:17 PM, Borislav Petkov wrote:
> On Fri, Aug 27, 2021 at 02:57:11PM -0500, Tom Lendacky wrote:
>> The main thing about this is that it is an error code from the HV on
>> extended guest requests. The HV error code sits in the high-order 32-bits of
>> the SW_EXIT_INFO_2 field. So defining it either way seems a bit confusing.
>> To me, the value should just be 1ULL and then it should be shifted when
>> assigning it to the SW_EXIT_INFO_2.
>
> Err, that's from the GHCB spec:
>
> "The hypervisor must validate that the guest has supplied enough pages
>
> ...
>
> certificate data in the RBX register and set the SW_EXITINFO2 field to
> 0x0000000100000000."
>
> So if you wanna do the above, you need to fix the spec first. I'd say.
>
> :-)
See the NAE Event table that documents "State from Hypervisor" where it
says the upper 32-bits (63:32) will contain the return code from the
hypervisor.
In the case you quoted, that specific situation is documented to return a
hypervisor return code of 1 (since the hypervisor return code occupies
bits 63:32). The hypervisor is free to return other values, that need not
be documented in spec, if it encounters other types of unforeseeable errors.
Thanks,
Tom
>
On 8/27/21 1:41 PM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:29AM -0500, Brijesh Singh wrote:
>> The SNP guest request message header contains a message count. The
>> message count is used while building the IV. The PSP firmware increments
>> the message count by 1, and expects that next message will be using the
>> incremented count. The snp_msg_seqno() helper will be used by driver to
>> get the message sequence counter used in the request message header,
>> and it will be automatically incremented after the request is successful.
>> The incremented value is saved in the secrets page so that the kexec'ed
>> kernel knows from where to begin.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/kernel/sev.c | 79 +++++++++++++++++++++++++++++++++++++++
>> include/linux/sev-guest.h | 37 ++++++++++++++++++
>> 2 files changed, 116 insertions(+)
>>
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 319a40fc57ce..f42cd5a8e7bb 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -51,6 +51,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>> */
>> static struct ghcb __initdata *boot_ghcb;
>>
>
> Explain what that is in a comment above it.
>
>> +static u64 snp_secrets_phys;
>
> snp_secrets_pa;
>
> is the usual convention when a variable is supposed to contain a
> physical address.
>
Noted.
>> +
>> /* #VC handler runtime per-CPU data */
>> struct sev_es_runtime_data {
>> struct ghcb ghcb_page;
>> @@ -2030,6 +2032,80 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
>> halt();
>> }
>>
>> +static struct snp_secrets_page_layout *snp_map_secrets_page(void)
>> +{
>> + u16 __iomem *secrets;
>> +
>> + if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
>> + return NULL;
>> +
>> + secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
>> + if (!secrets)
>> + return NULL;
>> +
>> + return (struct snp_secrets_page_layout *)secrets;
>> +}
>
> Or simply:
>
> static struct snp_secrets_page_layout *map_secrets_page(void)
> {
> if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
> return NULL;
>
> return ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
> }
>
> ?
>
Yes that also works.
>> +
>> +static inline u64 snp_read_msg_seqno(void)
>
> Drop that "snp_" prefix from all those static function names. This one
> is even inline, which means its name doesn't matter at all.
>
>> +{
>> + struct snp_secrets_page_layout *layout;
>> + u64 count;
>> +
>> + layout = snp_map_secrets_page();
>> + if (!layout)
>> + return 0;
>> +
>> + /* Read the current message sequence counter from secrets pages */
>> + count = readl(&layout->os_area.msg_seqno_0);
>> +
>> + iounmap(layout);
>> +
>> + /* The sequence counter must begin with 1 */
>
> That sounds weird. Why? 0 is special?
The SNP firmware spec says that the counter must begin at 1.
>
>> + if (!count)
>> + return 1;
>> +
>> + return count + 1;
>> +}
>> +
>> +u64 snp_msg_seqno(void)
>
> Function name needs a verb. I.e.,
>
> snp_get_msg_seqno()
>
Ok.
>> +{
>> + u64 count = snp_read_msg_seqno();
>> +
>> + if (unlikely(!count))
>
> That looks like a left-over from a previous version as it can't happen.
>
> Or are you handling the case where the u64 count will wraparound to 0?
>
> But "The sequence counter must begin with 1" so that read function above
> needs more love.
>
Yes, I will cleanup a bit more.
>> + return 0;
>
>
>> +
>> + /*
>> + * The message sequence counter for the SNP guest request is a
>> + * 64-bit value but the version 2 of GHCB specification defines a
>> + * 32-bit storage for the it.
>> + */
>> + if (count >= UINT_MAX)
>> + return 0;
>
> Huh, WTF? So when the internal counter goes over u32, this function will
> return 0 only? More weird.
>
When the GHCB spec was being written, the seqno used to be a 32-bit value,
hence the spec chose 32-bit storage, but recently the SNP firmware changed
it from 32 to 64 bits. So now we are left with the option of limiting the
sequence number to 32 bits. If we go beyond 32 bits then all we can do is
fail the call. If we pass a value of zero then the FW will fail the call.
>> +
>> + return count;
>> +}
>> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
>> +
>> +static void snp_gen_msg_seqno(void)
>
> That's not "gen" - that's "inc" what this function does. IOW,
>
> snp_inc_msg_seqno
>
I agree. I will update it.
>> +{
>> + struct snp_secrets_page_layout *layout;
>> + u64 count;
>> +
>> + layout = snp_map_secrets_page();
>> + if (!layout)
>> + return;
>> +
>> + /*
>> + * The counter is also incremented by the PSP, so increment it by 2
>> + * and save in secrets page.
>> + */
>> + count = readl(&layout->os_area.msg_seqno_0);
>> + count += 2;
>> +
>> + writel(count, &layout->os_area.msg_seqno_0);
>> + iounmap(layout);
>
> Why does this need to constantly map and unmap the secrets page? Why
> don't you map it once on init and unmap it on exit?
>
Yes, I can remove that with:
secrets_va = (__force void *)ioremap_encrypted(pa...)
And then use secrets_va instead of doing readl/writel.
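I.e., map once at init and keep the pointer around, roughly like this
(untested sketch; the names are illustrative):

static struct snp_secrets_page_layout *secrets_layout;

static int __init init_secrets_page(void)
{
	if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
		return -ENODEV;

	secrets_layout = (__force void *)ioremap_encrypted(snp_secrets_phys,
							   PAGE_SIZE);
	return secrets_layout ? 0 : -ENOMEM;
}

static u64 get_msg_seqno(void)
{
	/* The stored value is the last sequence number used, return the next. */
	u64 count = secrets_layout->os_area.msg_seqno_0 + 1;

	/*
	 * The GHCB v2 message header only has 32 bits of storage for the
	 * sequence number, so refuse to go past that instead of wrapping.
	 */
	if (count >= UINT_MAX)
		return 0;

	return count;
}

static void inc_msg_seqno(void)
{
	/* The PSP also increments the counter, hence the +2. */
	secrets_layout->os_area.msg_seqno_0 += 2;
}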
>> +}
>> +
>> int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
>> {
>> struct ghcb_state state;
>> @@ -2077,6 +2153,9 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
>> ret = -EIO;
>> }
>>
>> + /* The command was successful, increment the sequence counter */
>> + snp_gen_msg_seqno();
>> +
>> e_put:
>> __sev_put_ghcb(&state);
>> e_restore_irq:
>> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
>> index 24dd17507789..16b6af24fda7 100644
>> --- a/include/linux/sev-guest.h
>> +++ b/include/linux/sev-guest.h
>> @@ -20,6 +20,41 @@ enum vmgexit_type {
>> GUEST_REQUEST_MAX
>> };
>>
>> +/*
>> + * The secrets page contains 96-bytes of reserved field that can be used by
>> + * the guest OS. The guest OS uses the area to save the message sequence
>> + * number for each VMPCK.
>> + *
>> + * See the GHCB spec section Secret page layout for the format for this area.
>> + */
>> +struct secrets_os_area {
>> + u32 msg_seqno_0;
>> + u32 msg_seqno_1;
>> + u32 msg_seqno_2;
>> + u32 msg_seqno_3;
>> + u64 ap_jump_table_pa;
>> + u8 rsvd[40];
>> + u8 guest_usage[32];
>> +} __packed;
>
> So those are differently named there:
>
> struct secrets_page_os_area {
> uint32 vmpl0_message_seq_num;
> uint32 vmpl1_message_seq_num;
> ...
>
> and they have "vmpl" in there which makes a lot more sense for that
> they're used than msg_seqno_* does.
>
I just chose the shorter name, but I have no issue matching the spec. Also,
those keys do not have anything to do with the VMPL level. The secrets page
provides 4 different keys, they are referred to as vmpck0..3, and each of
them has a sequence number associated with it. In GHCB v3 we probably need
to rework the structure name.
>> +
>> +#define VMPCK_KEY_LEN 32
>> +
>> +/* See the SNP spec for secrets page format */
>> +struct snp_secrets_page_layout {
>
> Simply
>
> struct snp_secrets
>
> That name says all you need to know about what that struct represents.
>
>> + u32 version;
>> + u32 imien : 1,
>> + rsvd1 : 31;
>> + u32 fms;
>> + u32 rsvd2;
>> + u8 gosvw[16];
>> + u8 vmpck0[VMPCK_KEY_LEN];
>> + u8 vmpck1[VMPCK_KEY_LEN];
>> + u8 vmpck2[VMPCK_KEY_LEN];
>> + u8 vmpck3[VMPCK_KEY_LEN];
>> + struct secrets_os_area os_area;
>
> My SNP spec copy has here
>
> 0A0h–FFFh Reserved.
>
> and no os area. I guess
>
> SEV Secure Nested Paging Firmware ABI Specification 56860 Rev. 0.8 August 2020
>
> needs updating...
The latest SNP spec is here:
https://www.amd.com/system/files/TechDocs/56860.pdf
We are at spec version 0.9.
>
>> + u8 rsvd3[3840];
>> +} __packed;
>> +
>> /*
>> * The error code when the data_npages is too small. The error code
>> * is defined in the GHCB specification.
>> @@ -36,6 +71,7 @@ struct snp_guest_request_data {
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>> int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
>> unsigned long *fw_err);
>> +u64 snp_msg_seqno(void);
>> #else
>>
>> static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
>> @@ -43,6 +79,7 @@ static inline int snp_issue_guest_request(int type, struct snp_guest_request_dat
>> {
>> return -ENODEV;
>> }
>> +static inline u64 snp_msg_seqno(void) { return 0; }
>>
>> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>> #endif /* __LINUX_SEV_GUEST_H__ */
>> --
>> 2.17.1
>>
>
On Wed, Aug 25, 2021 at 06:29:31PM +0200, Borislav Petkov wrote:
> On Wed, Aug 25, 2021 at 10:18:35AM -0500, Michael Roth wrote:
> > On Wed, Aug 25, 2021 at 04:29:13PM +0200, Borislav Petkov wrote:
> > > On Fri, Aug 20, 2021 at 10:19:18AM -0500, Brijesh Singh wrote:
> > > > From: Michael Roth <[email protected]>
> > > >
> > > > As of commit 103a4908ad4d ("x86/head/64: Disable stack protection for
> > > > head$(BITS).o") kernel/head64.c is compiled with -fno-stack-protector
> > > > to allow a call to set_bringup_idt_handler(), which would otherwise
> > > > have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG. While
> > > > sufficient for that case, this will still cause issues if we attempt to
> ^^^
>
> I'm tired of repeating the same review comments with you guys:
>
> Who's "we"?
>
> Please use passive voice in your text: no "we" or "I", etc.
> Personal pronouns are ambiguous in text, especially with so many
> parties/companies/etc developing the kernel so let's avoid them please.
That had also been fixed in the commit message fixup that got clobbered, but
I still missed it in one of the comments as well, so I'll be more careful
about this.
>
> How about you pay more attention?
I've been periodically revising/rewording my comments since I saw your
original comments to Brijesh a few versions back, but it's how I normally
talk when discussing code with people, so it keeps managing to sneak back in.
I've added a git hook to check for this and found other instances that need
fixing as well, so hopefully with the help of technology I can get them all
sorted for the next spin.
>
> > I didn't realize the the 32-bit path was something you were suggesting
> > to have added in this patch, but I'll take a look at that as well.
>
> If you're going to remove the -no-stack-protector thing for that file,
> then pls remove it for both 32- and 64-bit. I.e., the revert what
> 103a4908ad4d did.
Got it, will do.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Aug 25, 2021 at 09:19:18PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:23AM -0500, Brijesh Singh wrote:
> > From: Michael Roth <[email protected]>
> >
> > CPUID instructions generate a #VC exception for SEV-ES/SEV-SNP guests,
> > for which early handlers are currently set up to handle. In the case
> > of SEV-SNP, guests can use a special location in guest memory address
> > space that has been pre-populated with firmware-validated CPUID
> > information to look up the relevant CPUID values rather than
> > requesting them from hypervisor via a VMGEXIT.
> >
> > Determine the location of the CPUID memory address in advance of any
> > CPUID instructions/exceptions and, when available, use it to handle
> > the CPUID lookup.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/boot/compressed/efi.c | 1 +
> > arch/x86/boot/compressed/head_64.S | 1 +
> > arch/x86/boot/compressed/idt_64.c | 7 +-
> > arch/x86/boot/compressed/misc.h | 1 +
> > arch/x86/boot/compressed/sev.c | 3 +
> > arch/x86/include/asm/sev-common.h | 2 +
> > arch/x86/include/asm/sev.h | 3 +
> > arch/x86/kernel/sev-shared.c | 374 +++++++++++++++++++++++++++++
> > arch/x86/kernel/sev.c | 4 +
> > 9 files changed, 394 insertions(+), 2 deletions(-)
>
> Another huuge patch. I wonder if it can be split...
I think I can split out at least sev_snp_cpuid_init() and
sev_snp_probe_cc_blob(). As for adding the actual CPUID lookup and related
code to the #VC handler, though, I'm not sure there's much that can be split
out there.
>
> > diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
> > index 16ff5cb9a1fb..a1529a230ea7 100644
> > --- a/arch/x86/boot/compressed/efi.c
> > +++ b/arch/x86/boot/compressed/efi.c
> > @@ -176,3 +176,4 @@ efi_get_conf_table(struct boot_params *boot_params,
> >
> > return 0;
> > }
> > +
>
> Applying: x86/compressed/64: Enable SEV-SNP-validated CPUID in #VC handler
> .git/rebase-apply/patch:21: new blank line at EOF.
> +
> warning: 1 line adds whitespace errors.
>
> That looks like a stray hunk which doesn't belong.
Will get this fixed up. I should've noticed these checkpatch warnings, so
I've modified my git hook to flag them more prominently.
>
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index a2347ded77ea..1c1658693fc9 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -441,6 +441,7 @@ SYM_CODE_START(startup_64)
> > .Lon_kernel_cs:
> >
> > pushq %rsi
> > + movq %rsi, %rdi /* real mode address */
> > call load_stage1_idt
> > popq %rsi
> >
> > diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
> > index 9b93567d663a..1f6511a6625d 100644
> > --- a/arch/x86/boot/compressed/idt_64.c
> > +++ b/arch/x86/boot/compressed/idt_64.c
> > @@ -3,6 +3,7 @@
> > #include <asm/segment.h>
> > #include <asm/trapnr.h>
> > #include "misc.h"
> > +#include <asm/sev.h>
>
> asm/ namespaced headers should go together, before the private ones,
> i.e., above the misc.h line.
Will make sure to group these together, but there seems to be a convention
of including misc.h first, since it does some fixups for subsequent
includes. So maybe that should be moved to the top? There's a comment in
boot/compressed/sev.c:
/*
* misc.h needs to be first because it knows how to include the other kernel
* headers in the pre-decompression code in a way that does not break
* compilation.
*/
And while it's not an issue here, asm/sev.h now needs to have
__BOOT_COMPRESSED #define'd in advance. So maybe that #define should be
moved into misc.h so it doesn't have to happen before each include?
>
> > static void set_idt_entry(int vector, void (*handler)(void))
> > {
> > @@ -28,13 +29,15 @@ static void load_boot_idt(const struct desc_ptr *dtr)
> > }
> >
> > /* Setup IDT before kernel jumping to .Lrelocated */
> > -void load_stage1_idt(void)
> > +void load_stage1_idt(void *rmode)
> > {
> > boot_idt_desc.address = (unsigned long)boot_idt;
> >
> >
> > - if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> > + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
> > + sev_snp_cpuid_init(rmode);
> > set_idt_entry(X86_TRAP_VC, boot_stage1_vc);
> > + }
> >
> > load_boot_idt(&boot_idt_desc);
> > }
> > diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> > index 16b092fd7aa1..cdd328aa42c2 100644
> > --- a/arch/x86/boot/compressed/misc.h
> > +++ b/arch/x86/boot/compressed/misc.h
> > @@ -190,6 +190,7 @@ int efi_get_conf_table(struct boot_params *boot_params,
> > unsigned long *conf_table_pa,
> > unsigned int *conf_table_len,
> > bool *is_efi_64);
> > +
>
> Another stray hunk.
>
> > #else
> > static inline int
> > efi_find_vendor_table(unsigned long conf_table_pa, unsigned int conf_table_len,
> > diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> > index 6e8d97c280aa..910bf5cf010e 100644
> > --- a/arch/x86/boot/compressed/sev.c
> > +++ b/arch/x86/boot/compressed/sev.c
> > @@ -20,6 +20,9 @@
> > #include <asm/fpu/xcr.h>
> > #include <asm/ptrace.h>
> > #include <asm/svm.h>
> > +#include <asm/cpuid.h>
> > +#include <linux/efi.h>
> > +#include <linux/log2.h>
>
> What are those includes for?
>
> Polluting the decompressor namespace with kernel proper defines is a
> real pain to untangle as it is. What do you need those for and can you
> do it without them?
cpuid.h is for cpuid_function_is_indexed(), which was introduced in this
series with patch "KVM: x86: move lookup of indexed CPUID leafs to helper".
efi.h is for EFI_CC_BLOB_GUID, which gets referenced by sev-shared.c
when it gets included here. However, misc.h seems to already include it,
so it can be safely dropped from this patch.
log2.h seems to be an artifact, I'll get that cleaned up.
>
> > #include "error.h"
> >
> > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > index 072540dfb129..5f134c172dbf 100644
> > --- a/arch/x86/include/asm/sev-common.h
> > +++ b/arch/x86/include/asm/sev-common.h
> > @@ -148,6 +148,8 @@ struct snp_psc_desc {
> > #define GHCB_TERM_PSC 1 /* Page State Change failure */
> > #define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
> > #define GHCB_TERM_NOT_VMPL0 3 /* SNP guest is not running at VMPL-0 */
> > +#define GHCB_TERM_CPUID 4 /* CPUID-validation failure */
> > +#define GHCB_TERM_CPUID_HV 5 /* CPUID failure during hypervisor fallback */
> >
> > #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
> >
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index 534fa1c4c881..c73931548346 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -11,6 +11,7 @@
> > #include <linux/types.h>
> > #include <asm/insn.h>
> > #include <asm/sev-common.h>
> > +#include <asm/bootparam.h>
> >
> > #define GHCB_PROTOCOL_MIN 1ULL
> > #define GHCB_PROTOCOL_MAX 2ULL
> > @@ -126,6 +127,7 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
> > void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
> > void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
> > void snp_set_wakeup_secondary_cpu(void);
> > +void sev_snp_cpuid_init(struct boot_params *bp);
> > #else
> > static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> > static inline void sev_es_ist_exit(void) { }
> > @@ -141,6 +143,7 @@ static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz,
> > static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
> > static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
> > static inline void snp_set_wakeup_secondary_cpu(void) { }
> > +static inline void sev_snp_cpuid_init(struct boot_params *bp) { }
> > #endif
> >
> > #endif
> > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> > index ae4556925485..651980ddbd65 100644
> > --- a/arch/x86/kernel/sev-shared.c
> > +++ b/arch/x86/kernel/sev-shared.c
> > @@ -14,6 +14,25 @@
> > #define has_cpuflag(f) boot_cpu_has(f)
> > #endif
> >
> > +struct sev_snp_cpuid_fn {
> > + u32 eax_in;
> > + u32 ecx_in;
> > + u64 unused;
> > + u64 unused2;
>
> What are those for? Padding? Or are they spec-ed somewhere and left for
> future use?
>
> Seeing how the struct is __packed, they probably are part of a spec
> definition somewhere.
>
> Link pls.
>
> > + u32 eax;
> > + u32 ebx;
> > + u32 ecx;
> > + u32 edx;
> > + u64 reserved;
>
> Ditto.
>
> Please prefix all those unused/reserved members with "__".
Will do.
>
> > +} __packed;
> > +
> > +struct sev_snp_cpuid_info {
> > + u32 count;
> > + u32 reserved1;
> > + u64 reserved2;
>
> Ditto.
The 'reserved' fields here are documented in SEV-SNP Firmware ABI
revision 0.9, section 8.14.2.6 (CPUID page), and the above 'reserved'
fields of sev_snp_cpuid_fn are documented in section 7.1 (CPUID Reporting)
Table 14:
https://www.amd.com/system/files/TechDocs/56860.pdf
The 'unused' / 'unused2' fields correspond to 'XCR0_IN' and 'XSS_IN' in
section 7.1 Table 14. They are meant to allow a hypervisor to encode
CPUID leaf 0xD subleaf 0x0:0x1 entries that are specific to a certain
set of XSAVE features enabled via XCR0/XSS registers, so a guest can
look up the specific entry based on its current XCR0/XSS register
values.
This doesn't scale very well as more XSAVE features are added however,
and was more useful for the CPUID guest message documented in 7.1, as
opposed to the static CPUID page implemented here.
Instead, it is simpler and just as safe to have the guest calculate the
appropriate values based on CPUID leaf 0xD, subleaves 0x2-0x3F, like
what sev_snp_cpuid_xsave_size() does below. So they are marked unused
here to try to make that clearer.
Some of these hypervisor-specific implementation notes have been summarized
into a document posted to the sev-snp mailing list in June:
"Guest/Hypervisor Implementation Notes for SEV-SNP CPUID Enforcement"
It's currently in RFC v2, but there has been a change relating to the
CPUID range checks that needs to be added for v3, I'll get that sent
out soon. We are hoping to get these included in an official spec to
help with interoperability between hypervisors, but for now it is only
a reference to aid implementations.
>
> > + struct sev_snp_cpuid_fn fn[0];
> > +} __packed;
> > +
> > /*
> > * Since feature negotiation related variables are set early in the boot
> > * process they must reside in the .data section so as not to be zeroed
> > @@ -26,6 +45,15 @@ static u16 __ro_after_init ghcb_version;
> > /* Bitmap of SEV features supported by the hypervisor */
> > u64 __ro_after_init sev_hv_features = 0;
> >
> > +/*
> > + * These are also stored in .data section to avoid the need to re-parse
> > + * boot_params and re-determine CPUID memory range when .bss is cleared.
> > + */
> > +static int sev_snp_cpuid_enabled __section(".data");
>
> That will become part of prot_guest_has() or cc_platform_has() or
> whatever its name is going to be.
Ok, will look at working this into there.
>
> > +static unsigned long sev_snp_cpuid_pa __section(".data");
> > +static unsigned long sev_snp_cpuid_sz __section(".data");
> > +static const struct sev_snp_cpuid_info *cpuid_info __section(".data");
>
> All those: __ro_after_init?
Makes sense.
>
> Also, just like the ones above have a short comment explaining what they
> are, add such comments for those too pls and perhaps what they're used
> for.
Will do.
>
> > +
> > static bool __init sev_es_check_cpu_features(void)
> > {
> > if (!has_cpuflag(X86_FEATURE_RDRAND)) {
> > @@ -236,6 +264,219 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
> > return 0;
> > }
> >
> > +static bool sev_snp_cpuid_active(void)
> > +{
> > + return sev_snp_cpuid_enabled;
> > +}
>
> That too will become part of prot_guest_has() or cc_platform_has() or
> whatever its name is going to be.
>
> > +
> > +static int sev_snp_cpuid_xsave_size(u64 xfeatures_en, u32 base_size,
> > + u32 *xsave_size, bool compacted)
>
> Function name needs a verb. Please audit all your patches.
>
> > +{
> > + u64 xfeatures_found = 0;
> > + int i;
> > +
> > + *xsave_size = base_size;
>
> Set that xsave_size only...
> > +
> > + for (i = 0; i < cpuid_info->count; i++) {
> > + const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
> > +
> > + if (!(fn->eax_in == 0xd && fn->ecx_in > 1 && fn->ecx_in < 64))
> > + continue;
> > + if (!(xfeatures_en & (1UL << fn->ecx_in)))
> > + continue;
> > + if (xfeatures_found & (1UL << fn->ecx_in))
> > + continue;
> > +
> > + xfeatures_found |= (1UL << fn->ecx_in);
>
> For all use BIT_ULL().
>
> > + if (compacted)
> > + *xsave_size += fn->eax;
> > + else
> > + *xsave_size = max(*xsave_size, fn->eax + fn->ebx);
>
> ... not here ...
>
> > + }
> > +
> > + /*
> > + * Either the guest set unsupported XCR0/XSS bits, or the corresponding
> > + * entries in the CPUID table were not present. This is not a valid
> > + * state to be in.
> > + */
> > + if (xfeatures_found != (xfeatures_en & ~3ULL))
> > + return -EINVAL;
>
> ... but here when you're not going to return an error because callers
> will see that value change temporarily which is not clean.
>
> Also, you need to set it once - not during each loop iteration.
Much nicer, will do.
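For reference, a rough reshuffle along those lines could look like this
(untested; the function name is just a suggestion):

static int snp_cpuid_calc_xsave_size(u64 xfeatures_en, u32 base_size,
				     u32 *xsave_size, bool compacted)
{
	u64 xfeatures_found = 0;
	u32 xsave_size_total = base_size;
	int i;

	for (i = 0; i < cpuid_info->count; i++) {
		const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];

		if (!(fn->eax_in == 0xd && fn->ecx_in > 1 && fn->ecx_in < 64))
			continue;
		if (!(xfeatures_en & BIT_ULL(fn->ecx_in)))
			continue;
		if (xfeatures_found & BIT_ULL(fn->ecx_in))
			continue;

		xfeatures_found |= BIT_ULL(fn->ecx_in);

		if (compacted)
			xsave_size_total += fn->eax;
		else
			xsave_size_total = max(xsave_size_total, fn->eax + fn->ebx);
	}

	/* Unsupported XCR0/XSS bits set, or entries missing from the table. */
	if (xfeatures_found != (xfeatures_en & ~3ULL))
		return -EINVAL;

	/* Only touch the caller's variable once everything checked out. */
	*xsave_size = xsave_size_total;

	return 0;
}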
>
> > +
> > + return 0;
> > +}
> > +
> > +static void sev_snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
> > + u32 *ecx, u32 *edx)
> > +{
> > + /*
> > + * Currently MSR protocol is sufficient to handle fallback cases, but
> > + * should that change make sure we terminate rather than grabbing random
>
> Fix the "we"s please. Please audit all your patches.
>
> > + * values. Handling can be added in future to use GHCB-page protocol for
> > + * cases that occur late enough in boot that GHCB page is available
>
> End comment sentences with a fullstop. Please audit all your patches.
>
> > + */
>
> Also, put that comment over the function.
>
> > + if (cpuid_function_is_indexed(func) && subfunc != 0)
>
> In all your patches:
>
> s/ != 0//g
>
> > + sev_es_terminate(1, GHCB_TERM_CPUID_HV);
> > +
> > + if (sev_cpuid_hv(func, 0, eax, ebx, ecx, edx))
> > + sev_es_terminate(1, GHCB_TERM_CPUID_HV);
> > +}
> > +
> > +static bool sev_snp_cpuid_find(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
>
> I guess
>
> find_validated_cpuid_func()
>
> or so to denote where it picks it out from.
>
> > + u32 *ecx, u32 *edx)
> > +{
> > + int i;
> > + bool found = false;
>
> The tip-tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
>
> struct long_struct_name *descriptive_name;
> unsigned long foo, bar;
> unsigned int tmp;
> int ret;
>
> The above is faster to parse than the reverse ordering::
>
> int ret;
> unsigned int tmp;
> unsigned long foo, bar;
> struct long_struct_name *descriptive_name;
>
> And even more so than random ordering::
>
> unsigned long foo, bar;
> int ret;
> struct long_struct_name *descriptive_name;
> unsigned int tmp;
>
> Audit all your patches pls.
>
> > +
> > + for (i = 0; i < cpuid_info->count; i++) {
> > + const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
> > +
> > + if (fn->eax_in != func)
> > + continue;
> > +
> > + if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
> > + continue;
> > +
> > + *eax = fn->eax;
> > + *ebx = fn->ebx;
> > + *ecx = fn->ecx;
> > + *edx = fn->edx;
> > + found = true;
> > +
> > + break;
>
> That's just silly. Simply:
>
> return true;
>
>
> > + }
> > +
> > + return found;
>
> return false;
>
> here and the "found" variable can go.
Will do. Missed this cleanup when I originally moved this out to a
separate helper.
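i.e. roughly (untested, using the name you suggested as a placeholder):

static bool snp_find_validated_cpuid_func(u32 func, u32 subfunc, u32 *eax,
                                          u32 *ebx, u32 *ecx, u32 *edx)
{
        int i;

        for (i = 0; i < cpuid_info->count; i++) {
                const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];

                if (fn->eax_in != func)
                        continue;

                if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
                        continue;

                *eax = fn->eax;
                *ebx = fn->ebx;
                *ecx = fn->ecx;
                *edx = fn->edx;

                return true;
        }

        return false;
}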
>
> > +}
> > +
> > +static bool sev_snp_cpuid_in_range(u32 func)
> > +{
> > + int i;
> > + u32 std_range_min = 0;
> > + u32 std_range_max = 0;
> > + u32 hyp_range_min = 0x40000000;
> > + u32 hyp_range_max = 0;
> > + u32 ext_range_min = 0x80000000;
> > + u32 ext_range_max = 0;
> > +
> > + for (i = 0; i < cpuid_info->count; i++) {
> > + const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];
> > +
> > + if (fn->eax_in == std_range_min)
> > + std_range_max = fn->eax;
> > + else if (fn->eax_in == hyp_range_min)
> > + hyp_range_max = fn->eax;
> > + else if (fn->eax_in == ext_range_min)
> > + ext_range_max = fn->eax;
> > + }
>
> So this loop which determines those ranges will run each time
> sev_snp_cpuid_find() doesn't find @func among the validated CPUID leafs.
>
> Why don't you do that determination once at init...
>
> > +
> > + if ((func >= std_range_min && func <= std_range_max) ||
> > + (func >= hyp_range_min && func <= hyp_range_max) ||
> > + (func >= ext_range_min && func <= ext_range_max))
>
> ... so that this function becomes only this check?
>
> This is unnecessary work as it is.
That makes sense. I was treating this as an edge case but it could actually
happen fairly often in some cases. I'll plan to add __ro_after_init
variables to store these values.
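e.g. something along these lines (rough sketch, untested; the helper names are
made up, and whether __ro_after_init buys anything in the boot/compressed
environment is a separate question):

static u32 cpuid_std_range_max __ro_after_init;
static u32 cpuid_hyp_range_max __ro_after_init;
static u32 cpuid_ext_range_max __ro_after_init;

/* Called once while setting up the CPUID table. */
static void snp_cpuid_set_ranges(void)
{
        int i;

        for (i = 0; i < cpuid_info->count; i++) {
                const struct sev_snp_cpuid_fn *fn = &cpuid_info->fn[i];

                if (fn->eax_in == 0x0)
                        cpuid_std_range_max = fn->eax;
                else if (fn->eax_in == 0x40000000)
                        cpuid_hyp_range_max = fn->eax;
                else if (fn->eax_in == 0x80000000)
                        cpuid_ext_range_max = fn->eax;
        }
}

static bool snp_cpuid_in_range(u32 func)
{
        return (func <= cpuid_std_range_max) ||
               (func >= 0x40000000 && func <= cpuid_hyp_range_max) ||
               (func >= 0x80000000 && func <= cpuid_ext_range_max);
}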
>
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +/*
> > + * Returns -EOPNOTSUPP if feature not enabled. Any other return value should be
> > + * treated as fatal by caller since we cannot fall back to hypervisor to fetch
> > + * the values for security reasons (outside of the specific cases handled here)
> > + */
> > +static int sev_snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
> > + u32 *edx)
> > +{
> > + if (!sev_snp_cpuid_active())
> > + return -EOPNOTSUPP;
> > +
> > + if (!cpuid_info)
> > + return -EIO;
> > +
> > + if (!sev_snp_cpuid_find(func, subfunc, eax, ebx, ecx, edx)) {
> > + /*
> > + * Some hypervisors will avoid keeping track of CPUID entries
> > + * where all values are zero, since they can be handled the
> > + * same as out-of-range values (all-zero). In our case, we want
> > + * to be able to distinguish between out-of-range entries and
> > + * in-range zero entries, since the CPUID table entries are
> > + * only a template that may need to be augmented with
> > + * additional values for things like CPU-specific information.
> > + * So if it's not in the table, but is still in the valid
> > + * range, proceed with the fix-ups below. Otherwise, just return
> > + * zeros.
> > + */
> > + *eax = *ebx = *ecx = *edx = 0;
> > + if (!sev_snp_cpuid_in_range(func))
> > + goto out;
>
> That label is not needed.
>
> > + }
>
> All that from here on looks like it should go into a separate function
> called
>
> snp_cpuid_postprocess()
>
> where you can do a switch-case on func and have it nice, readable and
> extensible there, in case more functions get added.
Sounds good.
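i.e. a skeleton along these lines, with the leaf list below only a placeholder
for whatever fix-ups the current code actually does:

static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
                                 u32 *ecx, u32 *edx)
{
        switch (func) {
        case 0x1:
                /* e.g. fix up the local APIC ID and OSXSAVE bit */
                break;
        case 0xb:
                /* e.g. extended topology: report the current APIC ID */
                break;
        case 0xd:
                /* e.g. compute the XSAVE size for the enabled features */
                break;
        case 0x8000001e:
                /* e.g. fill in the core/node IDs from the hypervisor */
                break;
        default:
                /* No fix-ups needed */
                break;
        }

        return 0;
}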
> > /*
> > * Boot VC Handler - This is the first VC handler during boot, there is no GHCB
> > * page yet, so it only supports the MSR based communication with the
>
> Is that comment...
Technically it supports MSR communication *and* CPUID page lookups now.
Assuming that's what you're referring to, I'll get that added.
>
> > @@ -244,15 +485,25 @@ static int sev_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
> > void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> > {
> > unsigned int fn = lower_bits(regs->ax, 32);
> > + unsigned int subfn = lower_bits(regs->cx, 32);
> > u32 eax, ebx, ecx, edx;
> > + int ret;
> >
> > /* Only CPUID is supported via MSR protocol */
>
> ... and that still valid?
"Only CPUID #VCs can be handled without using a GHCB page" might be a
bit more to the point now. I'll update it.
> > +
> > +out_verify:
> > + /* CC blob should be either valid or not present. Fail otherwise. */
> > + if (cc_info && cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
> > + sev_es_terminate(1, GHCB_SNP_UNSUPPORTED);
> > +
> > + return cc_info;
> > +}
> > +#else
> > +/*
> > + * Probing for CC blob for run-time kernel will be enabled in a subsequent
> > + * patch. For now we need to stub this out.
> > + */
> > +static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
> > +{
> > + return NULL;
> > +}
> > +#endif
> > +
> > +/*
> > + * Initial set up of CPUID table when running identity-mapped.
> > + *
> > + * NOTE: Since SEV_SNP feature partly relies on CPUID checks that can't
> > + * happen until we access CPUID page, we skip the check and hope the
> > + * bootloader is providing sane values.
>
> So I don't like the sound of that even one bit. We shouldn't hope
> anything here...
More specifically, the general protocol to determine SNP is enabled seems
to be:
1) check cpuid 0x8000001f to determine if SEV bit is enabled and SEV
MSR is available
2) check the SEV MSR to see if SEV-SNP bit is set
but the conundrum here is the CPUID page is only valid if SNP is
enabled, otherwise it can be garbage. So the code to set up the page
skips those checks initially, and relies on the expectation that UEFI,
or whatever the initial guest blob was, will only provide a CC_BLOB if
it already determined SNP is enabled.
It's still possible something goes awry and the kernel gets handed a
bogus CC_BLOB even though SNP isn't actually enabled. In this case the
cpuid values could be bogus as well, but the guest will fail
attestation then and no secrets should be exposed.
There is one thing that could tighten up the check a bit though. Some
bits of SEV-ES code will use the generation of a #VC as an indicator
of SEV-ES support, which implies SEV MSR is available without relying
on hypervisor-provided CPUID bits. I could add a one-time check in
the cpuid #VC to check SEV MSR for SNP bit, but it would likely
involve another static __ro_after_init variable to store state. If that
seems worthwhile I can look into that more as well.
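If it is, a rough sketch of what I have in mind (untested; the SNP bit position
is taken from the SEV-SNP spec for the SEV_STATUS MSR, the flag/helper names
are made up, and this assumes the common MSR helpers are usable by the time
the first CPUID #VC fires):

/* SEV_STATUS MSR bit for SEV-SNP, per the SEV-SNP spec */
#define MSR_AMD64_SEV_SNP_ENABLED       BIT_ULL(2)

static bool sev_status_checked;

static void snp_check_sev_status(void)
{
        u64 status;

        /* Only relevant when the SNP CPUID table is actually in use. */
        if (!sev_snp_cpuid_active() || sev_status_checked)
                return;

        status = __rdmsr(MSR_AMD64_SEV);
        if (!(status & MSR_AMD64_SEV_SNP_ENABLED))
                sev_es_terminate(1, GHCB_SNP_UNSUPPORTED);

        sev_status_checked = true;
}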
>
> > Current code relies on all CPUID
> > + * page lookups originating from #VC handler, which at least provides
> > + * indication that SEV-ES is enabled. Subsequent init levels will check for
> > + * SEV_SNP feature once available to also take SEV MSR value into account.
> > + */
> > +void sev_snp_cpuid_init(struct boot_params *bp)
>
> snp_cpuid_init()
>
> In general, prefix all SNP-specific variables, structs, functions, etc
> with "snp_" simply.
>
> > +{
> > + struct cc_blob_sev_info *cc_info;
> > +
> > + if (!bp)
> > + sev_es_terminate(1, GHCB_TERM_CPUID);
> > +
> > + cc_info = sev_snp_probe_cc_blob(bp);
> > +
>
> ^ Superfluous newline.
>
> > + if (!cc_info)
> > + return;
> > +
> > + sev_snp_cpuid_pa = cc_info->cpuid_phys;
> > + sev_snp_cpuid_sz = cc_info->cpuid_len;
>
> You can do those assignments ...
>
> > +
> > + /*
> > + * These should always be valid values for SNP, even if guest isn't
> > + * actually configured to use the CPUID table.
> > + */
> > + if (!sev_snp_cpuid_pa || sev_snp_cpuid_sz < PAGE_SIZE)
> > + sev_es_terminate(1, GHCB_TERM_CPUID);
>
>
> ... here, after you've verified them.
>
> > +
> > + cpuid_info = (const struct sev_snp_cpuid_info *)sev_snp_cpuid_pa;
> > +
> > + /*
> > + * We should be able to trust the 'count' value in the CPUID table
> > + * area, but ensure it agrees with CC blob value to be safe.
> > + */
> > + if (sev_snp_cpuid_sz < (sizeof(struct sev_snp_cpuid_info) +
> > + sizeof(struct sev_snp_cpuid_fn) *
> > + cpuid_info->count))
>
> Yah, this is the type of paranoia I'm talking about!
>
> > + sev_es_terminate(1, GHCB_TERM_CPUID);
> > +
> > + sev_snp_cpuid_enabled = 1;
> > +}
> > diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> > index ddf8ced4a879..d7b6f7420551 100644
> > --- a/arch/x86/kernel/sev.c
> > +++ b/arch/x86/kernel/sev.c
> > @@ -19,6 +19,8 @@
> > #include <linux/kernel.h>
> > #include <linux/mm.h>
> > #include <linux/cpumask.h>
> > +#include <linux/log2.h>
> > +#include <linux/efi.h>
> >
> > #include <asm/cpu_entry_area.h>
> > #include <asm/stacktrace.h>
> > @@ -32,6 +34,8 @@
> > #include <asm/smp.h>
> > #include <asm/cpu.h>
> > #include <asm/apic.h>
> > +#include <asm/efi.h>
> > +#include <asm/cpuid.h>
> >
> > #include "sev-internal.h"
>
> What are those includes for?
These are also for EFI_CC_BLOB_GUID and cpuid_function_is_indexed,
respectively. Will add comments to clarify.
Thanks for the thorough review, will address all comments.
>
> Looks like a leftover...
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 27, 2021 at 10:47:42AM -0500, Brijesh Singh wrote:
>
> On 8/27/21 10:18 AM, Borislav Petkov wrote:
> > On Fri, Aug 20, 2021 at 10:19:27AM -0500, Brijesh Singh wrote:
> >> From: Michael Roth <[email protected]>
> >>
> >> This adds support for utilizing the SEV-SNP-validated CPUID table in
> > s/This adds support for utilizing/Utilize/
> >
> > Yap, it can really be that simple. :)
> >
> >> the various #VC handler routines used throughout boot/run-time. Mostly
> >> this is handled by re-using the CPUID lookup code introduced earlier
> >> for the boot/compressed kernel, but at various stages of boot some work
> >> needs to be done to ensure the CPUID table is set up and remains
> >> accessible throughout. The following init routines are introduced to
> >> handle this:
> > Do not talk about what your patch does - that should hopefully be
> > visible in the diff itself. Rather, talk about *why* you're doing what
> > you're doing.
> >
> >> sev_snp_cpuid_init():
> > This one is not really introduced - it is already there.
> >
> > <snip all the complex rest>
> >
> > So this patch is making my head spin. It seems we're dancing a lot of
> > dance just to have our CPUID page present at all times. Which begs the
> > question: do we need it during the whole lifetime of the guest?
>
> Mike can correct me, but we need it for the entire lifetime of the guest.
> Whenever the guest needs a CPUID value, the #VC handler will refer to this
> page.
That's right, and cpuid instructions can get introduced at pretty much
every stage of the boot process.
>
>
> > Regardless, I think this can be simplified by orders of
> > magnitude if we allocated statically 4K for that CPUID page in
> > arch/x86/boot/compressed/mem_encrypt.S, copied the supplied CPUID page
> > from the firmware to it and from now on, work with our own copy.
>
> Actually a VMM could populate more than one page for the CPUID. One
> page can include 64 entries, and I believe Mike is already running into
> limits (with QEMU) and exploring ideas to extend it to more than a page.
I added the range checks in this version so that a hypervisor can still
leave out all-zero entries, so I think it can be avoided near-term at
least. But yes, it's still possible we'll need an extra page in the
future. I'm not sure how scarce storage is for stuff like __ro_after_init,
so it's worth considering.
On Fri, Aug 27, 2021 at 05:18:49PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:27AM -0500, Brijesh Singh wrote:
> > From: Michael Roth <[email protected]>
> >
> > This adds support for utilizing the SEV-SNP-validated CPUID table in
>
> s/This adds support for utilizing/Utilize/
>
> Yap, it can really be that simple. :)
>
> > the various #VC handler routines used throughout boot/run-time. Mostly
> > this is handled by re-using the CPUID lookup code introduced earlier
> > for the boot/compressed kernel, but at various stages of boot some work
> > needs to be done to ensure the CPUID table is set up and remains
> > accessible throughout. The following init routines are introduced to
> > handle this:
>
> Do not talk about what your patch does - that should hopefully be
> visible in the diff itself. Rather, talk about *why* you're doing what
> you're doing.
I'll get this cleaned up.
>
> > sev_snp_cpuid_init():
>
> This one is not really introduced - it is already there.
>
> <snip all the complex rest>
>
> So this patch is making my head spin. It seems we're dancing a lot of
> dance just to have our CPUID page present at all times. Which begs the
> question: do we need it during the whole lifetime of the guest?
>
> Regardless, I think this can be simplified by orders of
> magnitude if we allocated statically 4K for that CPUID page in
> arch/x86/boot/compressed/mem_encrypt.S, copied the supplied CPUID page
> from the firmware to it and from now on, work with our own copy.
That makes sense. I was thinking it was safer to work with the FW page
since it would be less susceptible to something like a buffer overflow
modifying the CPUID table, but __ro_after_init seems like it would
provide similar protections. And yes, would definitely be great to avoid
the need for so many [re-]init routines.
>
> You probably would need to still remap it for kernel proper but it would
> get rid of all that crazy in this patch here.
>
> Hmmm?
If the memory is allocated in boot/compressed/mem_encrypt.S, wouldn't
kernel proper still need to create a static buffer for its copy? And if
not, wouldn't boot compressed still need a way to pass the PA of this
buffer? That seems like it would need to be done via boot_params. It
seems like it would also need to be marked as reserved as well since
kernel proper could no longer rely on the EFI map to handle it.
I've been testing a similar approach based on your suggestion that seems
to work out pretty well, but there's still some ugliness due to the
fixup_pointer() stuff that's needed early during snp_cpuid_init() in
kernel proper, which results in the need for 2 init routines there. Not
sure if there's a better way to handle it, but it's a lot better than 4
init routines at least, and with this there is no longer any need to
store the address/size of the FW page:
in arch/x86/kernel/sev-shared.c:
/* Firmware-enforced limit on CPUID table entries */
#define SNP_CPUID_COUNT_MAX 64
struct sev_snp_cpuid_info {
u32 count;
u32 __reserved1;
u64 __reserved2;
struct sev_snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
} __packed;
static struct snp_cpuid_info cpuid_info_copy __ro_after_init;
static const struct snp_cpuid_info *cpuid_info __ro_after_init;
static int sev_snp_cpuid_enabled __ro_after_init;
/*
* Initial set up of CPUID table when running identity-mapped.
*/
#ifdef __BOOT_COMPRESSED
void sev_snp_cpuid_init(struct boot_params *bp)
#else
void __init sev_snp_cpuid_init(struct boot_params *bp, unsigned long physaddr)
#endif
{
const struct sev_snp_cpuid_info *cpuid_info_fw;
cpuid_info_fw = snp_probe_cpuid_info(bp);
if (!cpuid_info_fw)
return;
#ifdef __BOOT_COMPRESSED
cpuid_info2 = &cpuid_info_copy;
#else
/* Kernel proper calls this while pointer fixups are still needed. */
cpuid_info2 = (const struct sev_snp_cpuid_info *)
((void *)&cpuid_info_copy - (void *)_text + physaddr);
#endif
memcpy((struct sev_snp_cpuid_info *)cpuid_info2, cpuid_info_fw,
sizeof(*cpuid_info2));
sev_snp_cpuid_enabled = 1;
}
#ifndef __BOOT_COMPRESSED
/*
* This is called after the switch to virtual kernel addresses. At this
* point pointer fixups are no longer needed, and the virtual address of
* the CPUID info buffer has changed, so re-initialize the pointer.
*/
void __init sev_snp_cpuid_init_virtual(void)
{
/*
* sev_snp_cpuid_init() already did the initial parsing of bootparams
* and initial setup. If that didn't enable the feature then don't try
* to enable it here.
*/
if (!sev_snp_cpuid_active())
return;
/*
* Either boot_params/EFI advertised the feature even though SNP isn't
* enabled, or something else went wrong. Bail out.
*/
if (!sev_feature_enabled(SEV_SNP))
sev_es_terminate(1, GHCB_TERM_CPUID);
cpuid_info = &cpuid_info_copy;
}
#endif
Then the rest of the code just accesses cpuid_info directly as it does now.
Would that be a reasonable approach for v6?
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 27, 2021 at 03:51:29PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:24AM -0500, Brijesh Singh wrote:
> > From: Michael Roth <[email protected]>
> >
> > The previously defined Confidential Computing blob is provided to the
> > kernel via a setup_data structure or EFI config table entry. Currently
> > these are both checked for by boot/compressed kernel to access the
> > CPUID table address within it for use with SEV-SNP CPUID enforcement.
> >
> > To also enable SEV-SNP CPUID enforcement for the run-time kernel,
> > similar early access to the CPUID table is needed early on while it's
> > still using the identity-mapped page table set up by boot/compressed,
> > where global pointers need to be accessed via fixup_pointer().
> >
> > This is much of an issue for accessing setup_data, and the EFI config
> > table helper code currently used in boot/compressed *could* be used in
> > this case as well since they both rely on identity-mapping. However, it
> > has some reliance on EFI helpers/string constants that would need to be
> > accessed via fixup_pointer(), and fixing it up while making it
> > shareable between boot/compressed and run-time kernel is fragile and
> > introduces a good bit of uglyness.
> >
> > Instead, this patch adds a boot_params->cc_blob_address pointer that
>
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
>
> Also, do
>
> $ git grep 'This patch' Documentation/process
>
> for more details.
>
> > boot/compressed can initialize so that the run-time kernel can access
> > the prelocated CC blob that way instead.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/include/asm/bootparam_utils.h | 1 +
> > arch/x86/include/uapi/asm/bootparam.h | 3 ++-
> > 2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
> > index 981fe923a59f..53e9b0620d96 100644
> > --- a/arch/x86/include/asm/bootparam_utils.h
> > +++ b/arch/x86/include/asm/bootparam_utils.h
> > @@ -74,6 +74,7 @@ static void sanitize_boot_params(struct boot_params *boot_params)
> > BOOT_PARAM_PRESERVE(hdr),
> > BOOT_PARAM_PRESERVE(e820_table),
> > BOOT_PARAM_PRESERVE(eddbuf),
> > + BOOT_PARAM_PRESERVE(cc_blob_address),
> > };
> >
> > memset(&scratch, 0, sizeof(scratch));
> > diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> > index 1ac5acca72ce..bea5cdcdf532 100644
> > --- a/arch/x86/include/uapi/asm/bootparam.h
> > +++ b/arch/x86/include/uapi/asm/bootparam.h
> > @@ -188,7 +188,8 @@ struct boot_params {
> > __u32 ext_ramdisk_image; /* 0x0c0 */
> > __u32 ext_ramdisk_size; /* 0x0c4 */
> > __u32 ext_cmd_line_ptr; /* 0x0c8 */
> > - __u8 _pad4[116]; /* 0x0cc */
> > + __u8 _pad4[112]; /* 0x0cc */
> > + __u32 cc_blob_address; /* 0x13c */
>
> So I know I've heard grub being mentioned in conjunction with this: if
> you are ever going to pass this through the boot loader, then you'd need
> to update Documentation/x86/zero-page.rst too to state that this field
> can be written by the boot loader too.
Right, I think we had discussed this back in v3 or so. But for grub, or
other bootloaders, the idea would be for them to pass the CC blob
via a struct setup_data corresponding to SETUP_CC_BLOB, introduced in:
x86/boot: Add Confidential Computing type to setup_data
the boot_params field is only used internally to allow boot/compressed
to hand the CC blob over to kernel proper without kernel proper needing
to rescan for EFI blob (and thus needing all the efi config parsing
stuff).
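For illustration, a bootloader going the setup_data route would do something
roughly like this (sketch only; SETUP_CC_BLOB and the cc_setup_data layout
come from the earlier patch in the series, the helper name is made up):

struct cc_setup_data {
        struct setup_data header;
        u32 cc_blob_address;
};

static void pass_cc_blob(struct boot_params *bp, u32 cc_blob_pa)
{
        static struct cc_setup_data csd;

        csd.header.type     = SETUP_CC_BLOB;
        csd.header.len      = sizeof(csd) - sizeof(csd.header);
        csd.cc_blob_address = cc_blob_pa;

        /* Chain it onto the front of the existing setup_data list. */
        csd.header.next     = bp->hdr.setup_data;
        bp->hdr.setup_data  = (unsigned long)&csd;
}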
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 27, 2021 at 04:15:14PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:25AM -0500, Brijesh Singh wrote:
> > From: Michael Roth <[email protected]>
> >
> > When the Confidential Computing blob is located by the boot/compressed
> > kernel, store a pointer to it in bootparams->cc_blob_address to avoid
> > the need for the run-time kernel to rescan the EFI config table to find
> > it again.
> >
> > Since this function is also shared by the run-time kernel, this patch
>
> Here's "this patch" again... but you know what to do.
>
> > also adds the logic to make use of bootparams->cc_blob_address when it
> > has been initialized.
> >
> > Signed-off-by: Michael Roth <[email protected]>
> > Signed-off-by: Brijesh Singh <[email protected]>
> > ---
> > arch/x86/kernel/sev-shared.c | 40 ++++++++++++++++++++++++++----------
> > 1 file changed, 29 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> > index 651980ddbd65..6f70ba293c5e 100644
> > --- a/arch/x86/kernel/sev-shared.c
> > +++ b/arch/x86/kernel/sev-shared.c
> > @@ -868,7 +868,6 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
> > return ES_OK;
> > }
> >
> > -#ifdef BOOT_COMPRESSED
> > static struct setup_data *get_cc_setup_data(struct boot_params *bp)
> > {
> > struct setup_data *hdr = (struct setup_data *)bp->hdr.setup_data;
> > @@ -888,6 +887,16 @@ static struct setup_data *get_cc_setup_data(struct boot_params *bp)
> > * 1) Search for CC blob in the following order/precedence:
> > * - via linux boot protocol / setup_data entry
> > * - via EFI configuration table
> > + * 2) If found, initialize boot_params->cc_blob_address to point to the
> > + * blob so that uncompressed kernel can easily access it during very
> > + * early boot without the need to re-parse EFI config table
> > + * 3) Return a pointer to the CC blob, NULL otherwise.
> > + *
> > + * For run-time/uncompressed kernel:
> > + *
> > + * 1) Search for CC blob in the following order/precedence:
> > + * - via linux boot protocol / setup_data entry
>
> Why would you do this again if the boot/compressed kernel has already
> searched for it?
In some cases it's possible to boot directly to kernel proper without
going through the decompression kernel (e.g. CONFIG_PVH), so this is to allow
a way for boot loaders of this sort to provide a CC blob without relying
on EFI. It could be relevant for things like fast/virtualized containers.
>
> > + * - via boot_params->cc_blob_address
>
> Yes, that is the only thing you need to do in the runtime kernel - see
> if cc_blob_address is not 0. And all the work has been done by the
> decompressor kernel already.
>
> > * 2) Return a pointer to the CC blob, NULL otherwise.
> > */
> > static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
> > @@ -897,9 +906,11 @@ static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
> > struct setup_data header;
> > u32 cc_blob_address;
> > } *sd;
> > +#ifdef __BOOT_COMPRESSED
> > unsigned long conf_table_pa;
> > unsigned int conf_table_len;
> > bool efi_64;
> > +#endif
>
> That function turns into an unreadable mess with that #ifdef
> __BOOT_COMPRESSED slapped everywhere.
>
> It seems the cleanest thing to do is to do what we do with
> acpi_rsdp_addr: do all the parsing in boot/compressed/ and pass it on
> through boot_params. Kernel proper simply reads the pointer.
>
> Which means, you can stick all that cc_blob figuring out functionality
> in arch/x86/boot/compressed/sev.c instead.
Most of the #ifdef'ery is due to the EFI scan, so I moved that part out
to a separate helper, snp_probe_cc_blob_efi(), that lives in
boot/compressed/sev.c. Still not pretty, but would this be acceptable?
/*
* For boot/compressed kernel:
*
* 1) Search for CC blob in the following order/precedence:
* - via linux boot protocol / setup_data entry
* - via EFI configuration table
* 2) If found, initialize boot_params->cc_blob_address to point to the
* blob so that uncompressed kernel can easily access it during very
* early boot without the need to re-parse EFI config table
* 3) Return a pointer to the CC blob, NULL otherwise.
*
* For run-time/uncompressed kernel:
*
* 1) Search for CC blob in the following order/precedence:
* - via boot_params->cc_blob_address
* - via linux boot protocol / setup_data entry
* 2) Return a pointer to the CC blob, NULL otherwise.
*/
static struct cc_blob_sev_info *sev_snp_probe_cc_blob(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info = NULL;
struct cc_setup_data *sd;
#ifndef __BOOT_COMPRESSED
/*
* CC blob isn't in setup_data, see if boot kernel passed it via
* boot_params.
*/
if (bp->cc_blob_address) {
cc_info = (struct cc_blob_sev_info *)(unsigned long)bp->cc_blob_address;
goto out_verify;
}
#endif
/* Try to get CC blob via setup_data */
sd = get_cc_setup_data(bp);
if (sd) {
cc_info = (struct cc_blob_sev_info *)(unsigned long)sd->cc_blob_address;
goto out_verify;
}
#ifdef __BOOT_COMPRESSED
cc_info = snp_probe_cc_blob_efi(bp);
#endif
out_verify:
/* CC blob should be either valid or not present. Fail otherwise. */
if (cc_info && cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
sev_es_terminate(1, GHCB_SNP_UNSUPPORTED);
#ifdef __BOOT_COMPRESSED
/*
* Pass run-time kernel a pointer to CC info via boot_params for easier
* access during early boot.
*/
bp->cc_blob_address = (u32)(unsigned long)cc_info;
#endif
return cc_info;
}
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 27, 2021 at 01:32:40PM -0500, Michael Roth wrote:
> On Fri, Aug 27, 2021 at 05:18:49PM +0200, Borislav Petkov wrote:
> > On Fri, Aug 20, 2021 at 10:19:27AM -0500, Brijesh Singh wrote:
> > > From: Michael Roth <[email protected]>
> > >
> > > This adds support for utilizing the SEV-SNP-validated CPUID table in
> >
> > s/This adds support for utilizing/Utilize/
> >
> > Yap, it can really be that simple. :)
> >
> > > the various #VC handler routines used throughout boot/run-time. Mostly
> > > this is handled by re-using the CPUID lookup code introduced earlier
> > > for the boot/compressed kernel, but at various stages of boot some work
> > > needs to be done to ensure the CPUID table is set up and remains
> > > accessible throughout. The following init routines are introduced to
> > > handle this:
> >
> > Do not talk about what your patch does - that should hopefully be
> > visible in the diff itself. Rather, talk about *why* you're doing what
> > you're doing.
>
> I'll get this cleaned up.
>
> >
> > > sev_snp_cpuid_init():
> >
> > This one is not really introduced - it is already there.
> >
> > <snip all the complex rest>
> >
> > So this patch is making my head spin. It seems we're dancing a lot of
> > dance just to have our CPUID page present at all times. Which begs the
> > question: do we need it during the whole lifetime of the guest?
> >
> > Regardless, I think this can be simplified by orders of
> > magnitude if we allocated statically 4K for that CPUID page in
> > arch/x86/boot/compressed/mem_encrypt.S, copied the supplied CPUID page
> > from the firmware to it and from now on, work with our own copy.
>
> That makes sense. I was thinking it was safer to work with the FW page
> since it would be less susceptible to something like a buffer overflow
> modifying the CPUID table, but __ro_after_init seems like it would
> provide similar protections. And yes, would definitely be great to avoid
> the need for so many [re-]init routines.
>
> >
> > You probably would need to still remap it for kernel proper but it would
> > get rid of all that crazy in this patch here.
> >
> > Hmmm?
>
> If the memory is allocated in boot/compressed/mem_encrypt.S, wouldn't
> kernel proper still need to create a static buffer for its copy? And if
> not, wouldn't boot compressed still need a way to pass the PA of this
> buffer? That seems like it would need to be done via boot_params. It
> seems like it would also need to be marked as reserved as well since
> kernel proper could no longer rely on the EFI map to handle it.
>
> I've been testing a similar approach based on your suggestion that seems
> to work out pretty well, but there's still some ugliness due to the
> fixup_pointer() stuff that's needed early during snp_cpuid_init() in
> kernel proper, which results in the need for 2 init routines there. Not
> sure if there's a better way to handle it, but it's a lot better than 4
> init routines at least, and with this there is no longer any need to
> store the address/size of the FW page:
>
> in arch/x86/kernel/sev-shared.c:
>
> /* Firmware-enforced limit on CPUID table entries */
> #define SNP_CPUID_COUNT_MAX 64
>
> struct sev_snp_cpuid_info {
> u32 count;
> u32 __reserved1;
> u64 __reserved2;
> struct sev_snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
> } __packed;
>
> static struct snp_cpuid_info cpuid_info_copy __ro_after_init;
> static const struct snp_cpuid_info *cpuid_info __ro_after_init;
> static int sev_snp_cpuid_enabled __ro_after_init;
>
> /*
> * Initial set up of CPUID table when running identity-mapped.
> */
> #ifdef __BOOT_COMPRESSED
> void sev_snp_cpuid_init(struct boot_params *bp)
> #else
> void __init sev_snp_cpuid_init(struct boot_params *bp, unsigned long physaddr)
> #endif
> {
> const struct sev_snp_cpuid_info *cpuid_info_fw;
>
> cpuid_info_fw = snp_probe_cpuid_info(bp);
> if (!cpuid_info_fw)
> return;
>
> #ifdef __BOOT_COMPRESSED
> cpuid_info2 = &cpuid_info_copy;
> #else
> /* Kernel proper calls this while pointer fixups are still needed. */
> cpuid_info2 = (const struct sev_snp_cpuid_info *)
> ((void *)&cpuid_info_copy - (void *)_text + physaddr);
> #endif
> memcpy((struct sev_snp_cpuid_info *)cpuid_info2, cpuid_info_fw,
> sizeof(*cpuid_info2));
These should be cpuid_info, not cpuid_info2.
On Fri, Aug 27, 2021 at 08:38:31AM -0500, Michael Roth wrote:
> I've been periodically revising/rewording my comments since I saw your
> original comments to Brijesh a few versions back, but it's how I normally
> talk when discussing code with people so it keeps managing to sneak back in.
Oh sure, happens to me too and I know it is hard to keep out but when
you start doing git archeology and start going through old commit
messages, wondering why stuff was done the way it is sitting there,
you'd be very grateful if someone actually took the time to write up the
"why" properly. Why was it done this way, what the constraints were,
yadda yadda.
And when you see a "we" there, you sometimes wonder, who's "we"? Was it
the party who submitted the code, was it the person who's submitting the
code but talking with the generic voice of a programmer who means "we"
the community writing the kernel, etc.
So yes, it is ambiguous and it probably wasn't a big deal at all when
the people writing the kernel all knew each other back then but that
long ain't the case anymore. So we (see, snuck in on me too :)) ... so
maintainers need to pay attention to those things now too.
Oh look, the last "we" above meant "maintainers".
I believe that should explain in greater detail what I mean.
:-)
> I've added a git hook to check for this and found other instances that need
> fixing as well, so hopefully with the help of technology I can get them all
> sorted for the next spin.
Thanks, very much appreciated!
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 27, 2021 at 11:46:01AM -0500, Michael Roth wrote:
> I think I can split out at least sev_snp_cpuid_init() and
> sev_snp_probe_cc_blob(). Adding the actual cpuid lookup and related code to
> #VC handler though I'm not sure there's much that can be done there.
Ok.
Will get this fixed up. I should've noticed these checkpatch warnings, so
I've modified my git hook to flag these a bit more prominently.
Yeah, that comes from git actually.
Will make sure to group these together, but there seems to be a convention
> of including misc.h first, since it does some fixups for subsequent
> includes. So maybe that should be moved to the top? There's a comment in
> boot/compressed/sev.c:
>
> /*
> * misc.h needs to be first because it knows how to include the other kernel
> * headers in the pre-decompression code in a way that does not break
> * compilation.
> */
>
> And while it's not an issue here, asm/sev.h now needs to have
> __BOOT_COMPRESSED #define'd in advance. So maybe that #define should be
> moved into misc.h so it doesn't have to happen before each include?
Actually, I'd like to avoid all such nasty games, if possible, with the
compressed kernel includes because this is where it leads us: sprinkling
defines left and right and all kinds of magic include order which is
fragile and error prone.
So please try to be very conservative here with all the including games.
So I'd like to understand first *why* asm/sev.h needs to have
__BOOT_COMPRESSED defined and can that be avoided? Maybe in a separate
mail because this one already deals with a bunch of things.
> cpuid.h is for cpuid_function_is_indexed(), which was introduced in this
> series with patch "KVM: x86: move lookup of indexed CPUID leafs to helper".
Ok, if we keep cpuid.h only strictly with cpuid-specific helpers, I
guess that's fine.
> efi.h is for EFI_CC_BLOB_GUID, which gets referenced by sev-shared.c
> when it gets included here. However, misc.h seems to already include it,
> so it can be safely dropped from this patch.
Yeah, and this is what I mean: efi.h includes a bunch of linux/
namespace headers and then we have to go deal with compressed
pulling all kinds of definitions from kernel proper, with hacks like
__BOOT_COMPRESSED, for example.
That EFI_CC_BLOB_GUID is only needed in the compressed kernel, right?
That is, if you move all the CC blob parsing to the compressed kernel
and supply the thusly parsed info to kernel proper. In that case, you
can simply define it in there, in efi.c or so.
> The 'reserved' fields here are documented in SEV-SNP Firmware ABI
> revision 0.9, section 8.14.2.6 (CPUID page), and the above 'reserved'
> fields of sev_snp_cpuid_fn are documented in section 7.1 (CPUID Reporting)
> Table 14:
>
> https://www.amd.com/system/files/TechDocs/56860.pdf
>
> The 'unused' / 'unused2' fields correspond to 'XCR0_IN' and 'XSS_IN' in
> section 7.1 Table 14. They are meant to allow a hypervisor to encode
> CPUID leaf 0xD subleaf 0x0:0x1 entries that are specific to a certain
> set of XSAVE features enabled via XCR0/XSS registers, so a guest can
> look up the specific entry based on its current XCR0/XSS register
> values.
>
> This doesn't scale very well as more XSAVE features are added however,
> and was more useful for the CPUID guest message documented in 7.1, as
> opposed to the static CPUID page implemented here.
>
> Instead, it is simpler and just as safe to have the guest calculate the
> appropriate values based on CPUID leaf 0xD, subleaves 0x2-0x3F, like
> what sev_snp_cpuid_xsave_size() does below. So they are marked unused
> here to try to make that clearer.
>
> Some of these hypervisor-specific implementation notes have been summarized
> into a document posted to the sev-snp mailing list in June:
>
> "Guest/Hypervisor Implementation Notes for SEV-SNP CPUID Enforcement"
>
> It's currently in RFC v2, but there has been a change relating to the
> CPUID range checks that needs to be added for v3, I'll get that sent
> out soon. We are hoping to get these included in an official spec to
> help with interoperability between hypervisors, but for now it is only
> a reference to aid implementations.
Thanks for explaining all that. What I mean here is to have some
reference above it to the official spec so that people can find it. With
SEV-*, there are a *lot* of specs so I'd like to have at least pointers
to the docs where one can find the text about it.
> Ok, will look at working this into there.
Yeah, first you need to kick Tom to send a new version. :-)
> More specifically, the general protocol to determine SNP is enabled seems
> to be:
>
> 1) check cpuid 0x8000001f to determine if SEV bit is enabled and SEV
> MSR is available
> 2) check the SEV MSR to see if SEV-SNP bit is set
>
> but the conundrum here is the CPUID page is only valid if SNP is
> enabled, otherwise it can be garbage. So the code to set up the page
> skips those checks initially, and relies on the expectation that UEFI,
> or whatever the initial guest blob was, will only provide a CC_BLOB if
> it already determined SNP is enabled.
>
> It's still possible something goes awry and the kernel gets handed a
> bogus CC_BLOB even though SNP isn't actually enabled. In this case the
> cpuid values could be bogus as well, but the guest will fail
> attestation then and no secrets should be exposed.
>
> There is one thing that could tighten up the check a bit though. Some
> bits of SEV-ES code will use the generation of a #VC as an indicator
> of SEV-ES support, which implies SEV MSR is available without relying
> on hypervisor-provided CPUID bits. I could add a one-time check in
> the cpuid #VC to check SEV MSR for SNP bit, but it would likely
> involve another static __ro_after_init variable to store state. If that
> seems worthwhile I can look into that more as well.
Yes, the skipping of checks above sounds weird: why don't you simply
keep the checks order: SEV, -ES, -SNP and then parse CPUID. It'll fail
at attestation eventually, but you'll have the usual flow like with the
rest of the SEV feature detection.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 27, 2021 at 02:09:55PM -0500, Michael Roth wrote:
> Most of the #ifdef'ery is due to the EFI scan, so I moved that part out
> to a separate helper, snp_probe_cc_blob_efi(), that lives in
> boot/compressed/sev.c. Still not pretty, but would this be acceptable?
It is still ugly... :)
I guess you should simply do two separate functions - one doing the
compressed dance and the other the later parsing and keep 'em separate.
It's not like you're duplicating a ton of code so...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Brijesh,
On 20/08/2021 18:19, Brijesh Singh wrote:
> Version 2 of GHCB specification provides NAEs that can be used by the SNP
> guest to communicate with the PSP without risk from a malicious hypervisor
> who wishes to read, alter, drop or replay the messages sent.
>
> In order to communicate with the PSP, the guest need to locate the secrets
> page inserted by the hypervisor during the SEV-SNP guest launch. The
> secrets page contains the communication keys used to send and receive the
> encrypted messages between the guest and the PSP. The secrets page location
> is passed through the setup_data.
>
> Create a platform device that the SNP guest driver can bind to get the
> platform resources such as encryption key and message id to use to
> communicate with the PSP. The SNP guest driver can provide userspace
> interface to get the attestation report, key derivation, extended
> attestation report etc.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev.c | 68 +++++++++++++++++++++++++++++++++++++++
> include/linux/sev-guest.h | 5 +++
> 2 files changed, 73 insertions(+)
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index f42cd5a8e7bb..ab17c93634e9 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -22,6 +22,8 @@
> #include <linux/log2.h>
> #include <linux/efi.h>
> #include <linux/sev-guest.h>
> +#include <linux/platform_device.h>
> +#include <linux/io.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -37,6 +39,7 @@
> #include <asm/apic.h>
> #include <asm/efi.h>
> #include <asm/cpuid.h>
> +#include <asm/setup.h>
>
> #include "sev-internal.h"
>
> @@ -2164,3 +2167,68 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
> return ret;
> }
> EXPORT_SYMBOL_GPL(snp_issue_guest_request);
> +
> +static struct platform_device guest_req_device = {
> + .name = "snp-guest",
> + .id = -1,
> +};
> +
> +static u64 find_secrets_paddr(void)
> +{
> + u64 pa_data = boot_params.cc_blob_address;
> + struct cc_blob_sev_info info;
> + void *map;
> +
> + /*
> + * The CC blob contains the address of the secrets page, check if the
> + * blob is present.
> + */
> + if (!pa_data)
> + return 0;
> +
> + map = early_memremap(pa_data, sizeof(info));
> + memcpy(&info, map, sizeof(info));
> + early_memunmap(map, sizeof(info));
> +
> + /* Verify that secrets page address is passed */
> + if (info.secrets_phys && info.secrets_len == PAGE_SIZE)
> + return info.secrets_phys;
> +
> + return 0;
> +}
> +
> +static int __init add_snp_guest_request(void)
> +{
> + struct snp_secrets_page_layout *layout;
> + struct snp_guest_platform_data data;
> +
> + if (!sev_feature_enabled(SEV_SNP))
> + return -ENODEV;
> +
> + snp_secrets_phys = find_secrets_paddr();
> + if (!snp_secrets_phys)
> + return -ENODEV;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return -ENODEV;
> +
> + /*
> + * The secrets page contains three VMPCK that can be used for
> + * communicating with the PSP. We choose the VMPCK0 to encrypt guest
> + * messages send and receive by the Linux. Provide the key and
> + * id through the platform data to the driver.
> + */
> + data.vmpck_id = 0;
> + memcpy_fromio(data.vmpck, layout->vmpck0, sizeof(data.vmpck));
> +
> + iounmap(layout);
> +
> + platform_device_add_data(&guest_req_device, &data, sizeof(data));
> +
> + if (!platform_device_register(&guest_req_device))
> + dev_info(&guest_req_device.dev, "secret phys 0x%llx\n", snp_secrets_phys);
Should you return the error code from platform_device_register() in case
it fails (returns something other than zero)?
-Dov
> +
> + return 0;
> +}
> +device_initcall(add_snp_guest_request);
> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
> index 16b6af24fda7..e1cb3f7dd034 100644
> --- a/include/linux/sev-guest.h
> +++ b/include/linux/sev-guest.h
> @@ -68,6 +68,11 @@ struct snp_guest_request_data {
> unsigned int data_npages;
> };
>
> +struct snp_guest_platform_data {
> + u8 vmpck_id;
> + char vmpck[VMPCK_KEY_LEN];
> +};
> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
> unsigned long *fw_err);
>
Hi Dov,
On 8/31/21 6:37 AM, Dov Murik wrote:
>> +
>> + if (!platform_device_register(&guest_req_device))
>> + dev_info(&guest_req_device.dev, "secret phys 0x%llx\n", snp_secrets_phys);
>
> Should you return the error code from platform_device_register() in case
> it fails (returns something other than zero)?
>
Yes, I will fix it in the next rev. Will return a non-zero value on failure
to register the device.
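i.e. roughly (also checking the platform_device_add_data() return while at it):

        ret = platform_device_add_data(&guest_req_device, &data, sizeof(data));
        if (ret)
                return ret;

        ret = platform_device_register(&guest_req_device);
        if (ret)
                return ret;

        dev_info(&guest_req_device.dev, "secret phys 0x%llx\n", snp_secrets_phys);

        return 0;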
thanks
On Fri, Aug 27, 2021 at 01:32:40PM -0500, Michael Roth wrote:
> If the memory is allocated in boot/compressed/mem_encrypt.S, wouldn't
> kernel proper still need to create a static buffer for its copy?
Just like the other variables like sme_me_mask etc that file allocates
at the bottom. Or do you have a better idea?
> Would that be a reasonable approach for v6?
I don't like the ifdeffery one bit, TBH. I guess you should split it
and have a boot/compressed page and a kernel proper one and keep 'em
separate. That should make everything nice and clean at the cost of 2*4K
which is nothing nowadays.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Brijesh,
On 20/08/2021 18:19, Brijesh Singh wrote:
> The SNP_GET_DERIVED_KEY ioctl interface can be used by the SNP guest to
> ask the firmware to provide a key derived from a root key. The derived
> key may be used by the guest for any purposes it choose, such as a
> sealing key or communicating with the external entities.
>
> See SEV-SNP firmware spec for more information.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> Documentation/virt/coco/sevguest.rst | 18 ++++++++++
> drivers/virt/coco/sevguest/sevguest.c | 48 +++++++++++++++++++++++++++
> include/uapi/linux/sev-guest.h | 24 ++++++++++++++
> 3 files changed, 90 insertions(+)
>
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> index 52d5915037ef..25446670d816 100644
> --- a/Documentation/virt/coco/sevguest.rst
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -67,3 +67,21 @@ provided by the SEV-SNP firmware to query the attestation report.
> On success, the snp_report_resp.data will contains the report. The report
> format is described in the SEV-SNP specification. See the SEV-SNP specification
> for further details.
> +
> +2.2 SNP_GET_DERIVED_KEY
> +-----------------------
> +:Technology: sev-snp
> +:Type: guest ioctl
> +:Parameters (in): struct snp_derived_key_req
> +:Returns (out): struct snp_derived_key_req on success, -negative on error
> +
> +The SNP_GET_DERIVED_KEY ioctl can be used to get a key derive from a root key.
> +The derived key can be used by the guest for any purpose, such as sealing keys
> +or communicating with external entities.
> +
> +The ioctl uses the SNP_GUEST_REQUEST (MSG_KEY_REQ) command provided by the
> +SEV-SNP firmware to derive the key. See SEV-SNP specification for further details
> +on the various fileds passed in the key derivation request.
> +
> +On success, the snp_derived_key_resp.data will contains the derived key
> +value.
> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
> index d029a98ad088..621b1c5a9cfc 100644
> --- a/drivers/virt/coco/sevguest/sevguest.c
> +++ b/drivers/virt/coco/sevguest/sevguest.c
> @@ -303,6 +303,50 @@ static int get_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_reque
> return rc;
> }
>
> +static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_derived_key_resp *resp;
> + struct snp_derived_key_req req;
> + int rc, resp_len;
> +
> + if (!arg->req_data || !arg->resp_data)
> + return -EINVAL;
> +
> + /* Copy the request payload from the userspace */
> + if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
> + return -EFAULT;
> +
> + /* Message version must be non-zero */
> + if (!req.msg_version)
> + return -EINVAL;
> +
> + /*
> + * The intermediate response buffer is used while decrypting the
> + * response payload. Make sure that it has enough space to cover the
> + * authtag.
> + */
> + resp_len = sizeof(resp->data) + crypto->a_len;
> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
The length of resp->data is 64 bytes; I assume crypto->a_len is not a
lot more (and probably known in advance for AES GCM). Maybe use a
buffer on the stack instead of allocating and freeing?
> + if (!resp)
> + return -ENOMEM;
> +
> + /* Issue the command to get the attestation report */
> + rc = handle_guest_request(snp_dev, req.msg_version, SNP_MSG_KEY_REQ,
> + &req.data, sizeof(req.data), resp->data, resp_len,
> + &arg->fw_err);
> + if (rc)
> + goto e_free;
> +
> + /* Copy the response payload to userspace */
> + if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
> + rc = -EFAULT;
> +
> +e_free:
> + kfree(resp);
Since resp contains key material, I think you should memzero_explicit()
it before freeing, so the key bytes don't linger around in unused
memory. I'm not sure if any copies are made inside the
handle_guest_request call above; maybe zero these as well.
-Dov
> + return rc;
> +}
> +
> static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> {
> struct snp_guest_dev *snp_dev = to_snp_dev(file);
> @@ -320,6 +364,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
> ret = get_report(snp_dev, &input);
> break;
> }
> + case SNP_GET_DERIVED_KEY: {
> + ret = get_derived_key(snp_dev, &input);
> + break;
> + }
> default:
> break;
> }
> diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
> index e8cfd15133f3..621a9167df7a 100644
> --- a/include/uapi/linux/sev-guest.h
> +++ b/include/uapi/linux/sev-guest.h
> @@ -36,9 +36,33 @@ struct snp_user_guest_request {
> __u64 fw_err;
> };
>
> +struct __snp_derived_key_req {
> + __u32 root_key_select;
> + __u32 rsvd;
> + __u64 guest_field_select;
> + __u32 vmpl;
> + __u32 guest_svn;
> + __u64 tcb_version;
> +};
> +
> +struct snp_derived_key_req {
> + /* message version number (must be non-zero) */
> + __u8 msg_version;
> +
> + struct __snp_derived_key_req data;
> +};
> +
> +struct snp_derived_key_resp {
> + /* response data, see SEV-SNP spec for the format */
> + __u8 data[64];
> +};
> +
> #define SNP_GUEST_REQ_IOC_TYPE 'S'
>
> /* Get SNP attestation report */
> #define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_user_guest_request)
>
> +/* Get a derived key from the root */
> +#define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)
> +
> #endif /* __UAPI_LINUX_SEV_GUEST_H_ */
>
Hi Brijesh,
On 20/08/2021 18:19, Brijesh Singh wrote:
> Version 2 of GHCB specification defines NAE to get the extended guest
> request. It is similar to the SNP_GET_REPORT ioctl. The main difference
> is related to the additional data that be returned. The additional
> data returned is a certificate blob that can be used by the SNP guest
> user.
It seems like the SNP_GET_EXT_REPORT ioctl does everything that the
SNP_GET_REPORT ioctl does, and more. Why expose SNP_GET_REPORT to
userspace at all?
-Dov
> The certificate blob layout is defined in the GHCB specification.
> The driver simply treats the blob as a opaque data and copies it to
> userspace.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> Documentation/virt/coco/sevguest.rst | 22 +++++
> drivers/virt/coco/sevguest/sevguest.c | 126 ++++++++++++++++++++++++++
> include/uapi/linux/sev-guest.h | 13 +++
> 3 files changed, 161 insertions(+)
>
[...]
Hi Brijesh,
On 20/08/2021 18:19, Brijesh Singh wrote:
> The SNP guest request message header contains a message count. The
> message count is used while building the IV. The PSP firmware increments
> the message count by 1, and expects that next message will be using the
> incremented count. The snp_msg_seqno() helper will be used by driver to
> get the message sequence counter used in the request message header,
> and it will be automatically incremented after the request is successful.
> The incremented value is saved in the secrets page so that the kexec'ed
> kernel knows from where to begin.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev.c | 79 +++++++++++++++++++++++++++++++++++++++
> include/linux/sev-guest.h | 37 ++++++++++++++++++
> 2 files changed, 116 insertions(+)
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 319a40fc57ce..f42cd5a8e7bb 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -51,6 +51,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> */
> static struct ghcb __initdata *boot_ghcb;
>
> +static u64 snp_secrets_phys;
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -2030,6 +2032,80 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
> halt();
> }
>
> +static struct snp_secrets_page_layout *snp_map_secrets_page(void)
> +{
> + u16 __iomem *secrets;
You never dereference 'secrets'. Maybe s/u16/void/ ?
> +
> + if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
> + return NULL;
> +
> + secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
> + if (!secrets)
> + return NULL;
> +
> + return (struct snp_secrets_page_layout *)secrets;
> +}
> +
> +static inline u64 snp_read_msg_seqno(void)
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return 0;
> +
> + /* Read the current message sequence counter from secrets pages */
> + count = readl(&layout->os_area.msg_seqno_0);
> +
> + iounmap(layout);
> +
> + /* The sequence counter must begin with 1 */
> + if (!count)
> + return 1;
> +
> + return count + 1;
As Borislav noted, you can remove the "if (!count) return 1" because in
that case (count==0) the "return count+1" will return exactly 1.
-Dov
> +}
> +
> +u64 snp_msg_seqno(void)
> +{
> + u64 count = snp_read_msg_seqno();
> +
> + if (unlikely(!count))
> + return 0;
> +
> + /*
> + * The message sequence counter for the SNP guest request is a
> + * 64-bit value but the version 2 of GHCB specification defines a
> + * 32-bit storage for the it.
> + */
> + if (count >= UINT_MAX)
> + return 0;
> +
> + return count;
> +}
> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
> +
> +static void snp_gen_msg_seqno(void)
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return;
> +
> + /*
> + * The counter is also incremented by the PSP, so increment it by 2
> + * and save in secrets page.
> + */
> + count = readl(&layout->os_area.msg_seqno_0);
> + count += 2;
> +
> + writel(count, &layout->os_area.msg_seqno_0);
> + iounmap(layout);
> +}
> +
> int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
> {
> struct ghcb_state state;
> @@ -2077,6 +2153,9 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
> ret = -EIO;
> }
>
> + /* The command was successful, increment the sequence counter */
> + snp_gen_msg_seqno();
> +
> e_put:
> __sev_put_ghcb(&state);
> e_restore_irq:
> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
> index 24dd17507789..16b6af24fda7 100644
> --- a/include/linux/sev-guest.h
> +++ b/include/linux/sev-guest.h
> @@ -20,6 +20,41 @@ enum vmgexit_type {
> GUEST_REQUEST_MAX
> };
>
> +/*
> + * The secrets page contains 96-bytes of reserved field that can be used by
> + * the guest OS. The guest OS uses the area to save the message sequence
> + * number for each VMPCK.
> + *
> + * See the GHCB spec section Secret page layout for the format for this area.
> + */
> +struct secrets_os_area {
> + u32 msg_seqno_0;
> + u32 msg_seqno_1;
> + u32 msg_seqno_2;
> + u32 msg_seqno_3;
> + u64 ap_jump_table_pa;
> + u8 rsvd[40];
> + u8 guest_usage[32];
> +} __packed;
> +
> +#define VMPCK_KEY_LEN 32
> +
> +/* See the SNP spec for secrets page format */
> +struct snp_secrets_page_layout {
> + u32 version;
> + u32 imien : 1,
> + rsvd1 : 31;
> + u32 fms;
> + u32 rsvd2;
> + u8 gosvw[16];
> + u8 vmpck0[VMPCK_KEY_LEN];
> + u8 vmpck1[VMPCK_KEY_LEN];
> + u8 vmpck2[VMPCK_KEY_LEN];
> + u8 vmpck3[VMPCK_KEY_LEN];
> + struct secrets_os_area os_area;
> + u8 rsvd3[3840];
> +} __packed;
> +
> /*
> * The error code when the data_npages is too small. The error code
> * is defined in the GHCB specification.
> @@ -36,6 +71,7 @@ struct snp_guest_request_data {
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
> unsigned long *fw_err);
> +u64 snp_msg_seqno(void);
> #else
>
> static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
> @@ -43,6 +79,7 @@ static inline int snp_issue_guest_request(int type, struct snp_guest_request_dat
> {
> return -ENODEV;
> }
> +static inline u64 snp_msg_seqno(void) { return 0; }
>
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
> #endif /* __LINUX_SEV_GUEST_H__ */
>
Hi Dov,
On 8/31/21 1:59 PM, Dov Murik wrote:
>> +
>> + /*
>> + * The intermediate response buffer is used while decrypting the
>> + * response payload. Make sure that it has enough space to cover the
>> + * authtag.
>> + */
>> + resp_len = sizeof(resp->data) + crypto->a_len;
>> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
>
> The length of resp->data is 64 bytes; I assume crypto->a_len is not a
> lot more (and probably known in advance for AES GCM). Maybe use a
> buffer on the stack instead of allocating and freeing?
>
The authtag size can be up to 16 bytes, so I guess I can allocate 80
bytes on the stack and avoid the kzalloc().
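Something along these lines maybe (untested sketch; assumes
sizeof(resp->data) stays at 64 bytes and crypto->a_len never exceeds 16):

        u8 buf[64 + 16] = {};

        /* Issue the command to get the derived key */
        rc = handle_guest_request(snp_dev, req.msg_version, SNP_MSG_KEY_REQ,
                                  &req.data, sizeof(req.data), buf, sizeof(buf),
                                  &arg->fw_err);
        if (rc)
                return rc;

        /* Copy only the response structure back to userspace */
        if (copy_to_user((void __user *)arg->resp_data, buf,
                         sizeof(struct snp_derived_key_resp)))
                return -EFAULT;

        return 0;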
>
>> + if (!resp)
>> + return -ENOMEM;
>> +
>> + /* Issue the command to get the attestation report */
>> + rc = handle_guest_request(snp_dev, req.msg_version, SNP_MSG_KEY_REQ,
>> + &req.data, sizeof(req.data), resp->data, resp_len,
>> + &arg->fw_err);
>> + if (rc)
>> + goto e_free;
>> +
>> + /* Copy the response payload to userspace */
>> + if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
>> + rc = -EFAULT;
>> +
>> +e_free:
>> + kfree(resp);
>
> Since resp contains key material, I think you should explicit_memzero()
> it before freeing, so the key bytes don't linger around in unused
> memory. I'm not sure if any copies are made inside the
> handle_guest_request call above; maybe zero these as well.
>
I can do that, but I am trying to find a reason for it. The resp
buffer is an encrypted page, so the key is protected from hypervisor
access. Are you thinking about an attack from within the guest OS?
-Brijesh
On 8/31/21 3:22 PM, Dov Murik wrote:
> Hi Brijesh,
>
> On 20/08/2021 18:19, Brijesh Singh wrote:
>> Version 2 of GHCB specification defines NAE to get the extended guest
>> request. It is similar to the SNP_GET_REPORT ioctl. The main difference
>> is related to the additional data that be returned. The additional
>> data returned is a certificate blob that can be used by the SNP guest
>> user.
>
> It seems like the SNP_GET_EXT_REPORT ioctl does everything that the
> SNP_GET_REPORT ioctl does, and more. Why expose SNP_GET_REPORT to
> userspace at all?
>
>
Since both of these options are provided by the GHCB protocol, I
exposed both. It is possible that some applications may not care about
the extended certificate blob. In those cases, if the hypervisor is
programmed with the extended certificate blob and the caller does not
supply enough pages to copy the blob, the command should fail. This
would force such a guest application to allocate extra memory, e.g.:
1. Hypervisor is programmed with a system wide certificate blob using
the SNP_SET_EXT_CONFIG ioctl().
2. Guest wants to get the report but does not care about the certificate
blob.
3. Guest issues an extended guest report with npages = 0. The command
will fail with an invalid-length error, and the required number of
pages will be returned in the response.
4. Guest will now need to allocate memory to hold the certificate and
reissue the command.
Step #4 is unnecessary for a guest that does not want the certificate
blob. In that case, the guest can simply request the attestation report
without asking for the certificate blob. Please see the GHCB spec for
more details.
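To make the flow concrete, roughly what a guest application would do
(untested userspace sketch; the struct and field names follow the uapi
proposed in this series and may still change, and where exactly
SNP_GUEST_REQ_INVALID_LEN gets exposed to userspace is not settled):

        /* Fragment; error handling trimmed. Needs <sys/ioctl.h>,
         * <stdint.h> and the uapi header added by this series. */
        static int get_certs_len_hint(int fd)
        {
                struct snp_report_resp resp = {};
                struct snp_ext_report_req ext_req = {
                        .certs_address = 0,     /* no certificate buffer yet */
                        .certs_len = 0,
                };
                struct snp_user_guest_request guest_req = {
                        .req_data = (__u64)(uintptr_t)&ext_req,
                        .resp_data = (__u64)(uintptr_t)&resp,
                };

                ext_req.data.msg_version = 1;

                if (ioctl(fd, SNP_GET_EXT_REPORT, &guest_req) &&
                    guest_req.fw_err == SNP_GUEST_REQ_INVALID_LEN) {
                        /*
                         * certs_len now holds the size the hypervisor
                         * wants to return; allocate page-aligned memory of
                         * that size and reissue the ioctl.
                         */
                        return ext_req.certs_len;
                }

                return 0;
        }

A guest that does not care about the certificate blob uses
SNP_GET_REPORT instead and skips this dance entirely.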
thanks
On 8/31/21 3:46 PM, Dov Murik wrote:
>> +
>> + return count + 1;
>
> As Borislav noted, you can remove the "if (!count) return 1" because in
> that case (count==0) the "return count+1" will return exactly 1.
Yep, I am working to simplify it based on Boris' feedback. I will
incorporate the feedback in v6. thanks
On Tue, Aug 31, 2021 at 12:04:33PM +0200, Borislav Petkov wrote:
> On Fri, Aug 27, 2021 at 11:46:01AM -0500, Michael Roth wrote:
> > Will make sure to great these together, but there seems to be a convention
> > of including misc.h first, since it does some fixups for subsequent
> > includes. So maybe that should be moved to the top? There's a comment in
> > boot/compressed/sev.c:
> >
> > /*
> > * misc.h needs to be first because it knows how to include the other kernel
> > * headers in the pre-decompression code in a way that does not break
> > * compilation.
> > */
> >
> > And while it's not an issue here, asm/sev.h now needs to have
> > __BOOT_COMPRESSED #define'd in advance. So maybe that #define should be
> > moved into misc.h so it doesn't have to happen before each include?
>
> Actually, I'd like to avoid all such nasty games, if possible, with the
> compressed kernel includes because this is where it leads us: sprinkling
> defines left and right and all kinds of magic include order which is
> fragile and error prone.
>
> So please try to be very conservative here with all the including games.
>
> So I'd like to understand first *why* asm/sev.h needs to have
> __BOOT_COMPRESSED defined and can that be avoided? Maybe in a separate
> mail because this one already deals with a bunch of things.
I think I just convinced myself at some point that that's where all
these sev-shared.c declarations are supposed to go, but you're right, I
could just as easily move all the __BOOT_COMPRESSED-only definitions
into boot/compressed/misc.h and avoid the mess.
That'll make it nicer if I can get some of the __BOOT_COMPRESSED-guarded
definitions in sev-shared.c moved out to boot/compressed/sev.c and
kernel/sev.c as well, with the help of some common setter/getter helpers
to still keep most of the core logic/data structures contained in
sev-shared.c.
>
> > cpuid.h is for cpuid_function_is_indexed(), which was introduced in this
> > series with patch "KVM: x86: move lookup of indexed CPUID leafs to helper".
>
> Ok, if we keep cpuid.h only strictly with cpuid-specific helpers, I
> guess that's fine.
>
> > efi.h is for EFI_CC_BLOB_GUID, which gets referenced by sev-shared.c
> > when it gets included here. However, misc.h seems to already include it,
> > so it can be safely dropped from this patch.
>
> Yeah, and this is what I mean: efi.h includes a bunch of linux/
> namespace headers and then we have to go deal with compressed
> pulling all kinds of definitions from kernel proper, with hacks like
> __BOOT_COMPRESSED, for example.
>
> That EFI_CC_BLOB_GUID is only needed in the compressed kernel, right?
> That is, if you move all the CC blob parsing to the compressed kernel
> and supply the thusly parsed info to kernel proper. In that case, you
> can simply define in there, in efi.c or so.
It was used previously in kernel proper to get at the secrets page later,
but now it's obtained via the cached entry in boot_params.cc_blob_address.
Unfortunately it uses the EFI_GUID() macro, so maybe efi.c or misc.h
would be the better place to add a copy of the macro?
On Tue, Aug 31, 2021 at 10:03:12AM +0200, Borislav Petkov wrote:
> On Fri, Aug 27, 2021 at 08:38:31AM -0500, Michael Roth wrote:
> > I've been periodically revising/rewording my comments since I saw you're
> > original comments to Brijesh a few versions back, but it's how I normally
> > talk when discussing code with people so it keeps managing to sneak back in.
>
> Oh sure, happens to me too and I know it is hard to keep out but when
> you start doing git archeology and start going through old commit
> messages, wondering why stuff was done the way it is sitting there,
> you'd be very grateful if someone actually took the time to write up the
> "why" properly. Why was it done this way, what the constraints were,
> yadda yadda.
>
> And when you see a "we" there, you sometimes wonder, who's "we"? Was it
> the party who submitted the code, was it the person who's submitting the
> code but talking with the generic voice of a programmer who means "we"
> the community writing the kernel, etc.
>
> So yes, it is ambiguous and it probably wasn't a big deal at all when
> the people writing the kernel all knew each other back then but that
> long ain't the case anymore. So we (see, snuck in on me too :)) ... so
> maintainers need to pay attention to those things now too.
>
> Oh look, the last "we" above meant "maintainers".
>
> I believe that should explain with a greater detail what I mean.
>
> :-)
Thanks for the explanation, makes perfect sense. Just need to get my brain
on the same page. :)
On Tue, Aug 31, 2021 at 06:22:44PM +0200, Borislav Petkov wrote:
> On Fri, Aug 27, 2021 at 01:32:40PM -0500, Michael Roth wrote:
> > If the memory is allocated in boot/compressed/mem_encrypt.S, wouldn't
> > kernel proper still need to create a static buffer for its copy?
>
> Just like the other variables like sme_me_mask etc that file allocates
> at the bottom. Or do you have a better idea?
What did you think of the suggestion of defining it in sev-shared.c
as a static buffer/struct as __ro_after_init? It would be nice to
declare/reserve the memory in one place. Another benefit is it doesn't
need to be exported, and could just be local with all the other
snp_cpuid* helpers that access it in sev-shared.c
>
> > Would that be a reasonable approach for v6?
>
> I don't like the ifdeffery one bit, TBH. I guess you should split it
> and have a boot/compressed page and a kernel proper one and keep 'em
> separate. That should make everything nice and clean at the cost of 2*4K
> which is nothing nowadays.
I think I can address the ifdeffery by splitting the boot/proper routines
into separate self-contained routines (and maybe move them out into
boot/compressed/sev.c and kernel/sev.c, respectively), then having them
just initialize the table pointer and create the copy using a common setter
function, e.g.
snp_cpuid_table_create(cc_blob, fixup_offset)
and for boot/compressed.c fixup_offset would just be passed in as 0.
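Rough sketch of the setter I have in mind (names are placeholders, and
I'm assuming the CC blob carries the CPUID page address the same way it
carries the secrets page address):

        /* The single copy of the CPUID table, local to sev-shared.c */
        static struct snp_cpuid_info cpuid_info __ro_after_init;

        void snp_cpuid_table_create(const struct cc_blob_sev_info *cc_blob,
                                    unsigned long fixup_offset)
        {
                const struct snp_cpuid_info *src;

                /*
                 * boot/compressed runs identity-mapped and passes
                 * fixup_offset == 0; kernel proper passes whatever
                 * adjustment it needs to reach the page through its own
                 * mapping.
                 */
                src = (const struct snp_cpuid_info *)
                        ((unsigned long)cc_blob->cpuid_phys + fixup_offset);

                memcpy(&cpuid_info, src, sizeof(cpuid_info));
        }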
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On 01/09/2021 0:04, Brijesh Singh wrote:
> Hi Dov,
>
>
> On 8/31/21 1:59 PM, Dov Murik wrote:
>>> +
>>> + /*
>>> + * The intermediate response buffer is used while decrypting the
>>> + * response payload. Make sure that it has enough space to cover
>>> the
>>> + * authtag.
>>> + */
>>> + resp_len = sizeof(resp->data) + crypto->a_len;
>>> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
>>
>> The length of resp->data is 64 bytes; I assume crypto->a_len is not a
>> lot more (and probably known in advance for AES GCM). Maybe use a
>> buffer on the stack instead of allocating and freeing?
>>
>
> The authtag size can be up to 16 bytes, so I guess I can allocate 80
> bytes on stack and avoid the kzalloc().
>
>>
>>> + if (!resp)
>>> + return -ENOMEM;
>>> +
>>> + /* Issue the command to get the attestation report */
>>> + rc = handle_guest_request(snp_dev, req.msg_version,
>>> SNP_MSG_KEY_REQ,
>>> + &req.data, sizeof(req.data), resp->data, resp_len,
>>> + &arg->fw_err);
>>> + if (rc)
>>> + goto e_free;
>>> +
>>> + /* Copy the response payload to userspace */
>>> + if (copy_to_user((void __user *)arg->resp_data, resp,
>>> sizeof(*resp)))
>>> + rc = -EFAULT;
>>> +
>>> +e_free:
>>> + kfree(resp);
>>
>> Since resp contains key material, I think you should explicit_memzero()
>> it before freeing, so the key bytes don't linger around in unused
>> memory. I'm not sure if any copies are made inside the
>> handle_guest_request call above; maybe zero these as well.
>>
>
> I can do that, but I guess I am trying to find a reason for it. The resp
> buffer is encrypted page, so, the key is protected from the hypervisor
> access. Are you thinking about an attack within the VM guest OS ?
>
Yes, that's the concern, specifically with sensitive buffers (keys).
You don't want many copies floating around in unused memory.
-Dov
On 01/09/2021 0:11, Brijesh Singh wrote:
>
>
> On 8/31/21 3:22 PM, Dov Murik wrote:
>> Hi Brijesh,
>>
>> On 20/08/2021 18:19, Brijesh Singh wrote:
>>> Version 2 of GHCB specification defines NAE to get the extended guest
>>> request. It is similar to the SNP_GET_REPORT ioctl. The main difference
>>> is related to the additional data that be returned. The additional
>>> data returned is a certificate blob that can be used by the SNP guest
>>> user.
>>
>> It seems like the SNP_GET_EXT_REPORT ioctl does everything that the
>> SNP_GET_REPORT ioctl does, and more. Why expose SNP_GET_REPORT to
>> userspace at all?
>>
>>
>
> Since both of these options are provided by the GHCB protocol so I
> exposed it. Its possible that some applications may not care about the
> extended certificate blob. And in those case, if the hypervisor is
> programmed with the extended certificate blob and caller does not supply
> the enough number of pages to copy the blob then command should fail.
> This will enforce a new requirement on that guest application to
> allocate an extra memory. e.g:
>
> 1. Hypervisor is programmed with a system wide certificate blob using
> the SNP_SET_EXT_CONFIG ioctl().
>
> 2. Guest wants to get the report but does not care about the certificate
> blob.
>
> 3. Guest issues a extended guest report with the npages = 0. The command
> will fail with invalid length and number of pages will be returned in
> the response.
>
> 4. Guest will not need to allocate memory to hold the certificate and
> reissue the command.
>
> The #4 is unnecessary for a guest which does not want to get. In this
> case, a guest can simply call the attestation report without asking for
> certificate blob. Please see the GHCB spec for more details.
>
OK. Originally I thought that by passing certs_address=NULL and
certs_len=0 the user program can say "I don't want this extra data"; but
now I understand that this will return an error (invalid length) with
the number of pages needed.
-Dov
On Tue, Aug 31, 2021 at 08:16:58PM -0500, Michael Roth wrote:
> What did you think of the suggestion of defining it in sev-shared.c
> as a static buffer/struct as __ro_after_init? It would be nice to
> declare/reserve the memory in one place. Another benefit is it doesn't
> need to be exported, and could just be local with all the other
> snp_cpuid* helpers that access it in sev-shared.c
Yap.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Aug 30, 2021 at 10:07:39AM -0500, Brijesh Singh wrote:
> The SNP firmware spec says that counter must begin with the 1.
So put that in the comment and explain what 0 is: magic or invalid or
whatnot and why is that so and that it is spec-ed this way, etc.
Just having it there without a reasoning makes one wonder whether that's
some arbitrary limitation or so.
> During the GHCB writing the seqno use to be 32-bit value and hence the GHCB
> spec choose the 32-bit value but recently the SNP firmware changed it from
> the 32 to 64. So, now we are left with the option of limiting the sequence
> number to 32-bit. If we go beyond 32-bit then all we can do is fail the
> call. If we pass the value of zero then FW will fail the call.
That sounds weird again. So make it 64-bit like the FW and fix the spec.
> I just choose the smaller name but I have no issues matching with the spec.
> Also those keys does not have anything to do with the VMPL level. The
> secrets page provides 4 different keys and they are referred as vmpck0..3
> and each of them have a sequence numbers associated with it.
>
> In GHCB v3 we probably need to rework the structure name.
You can point to the spec section so that readers can find the struct
layout there.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Aug 31, 2021 at 08:03:25PM -0500, Michael Roth wrote:
> It was used previously in kernel proper to get at the secrets page later,
> but now it's obtained via the cached entry in boot_params.cc_blob_address.
> Unfortunately it uses EFI_GUID() macro, so maybe efi.c or misc.h where
> it makes more sense to add a copy of the macro?
A copy?
arch/x86/boot/compressed/efi.c already includes linux/efi.h where that
macro is defined.
That ship has already sailed. ;-\
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:30AM -0500, Brijesh Singh wrote:
> Version 2 of GHCB specification provides NAEs that can be used by the SNP
Resolve the "NAE" abbreviation here so that it is clear what this means.
> guest to communicate with the PSP without risk from a malicious hypervisor
> who wishes to read, alter, drop or replay the messages sent.
This here says "malicious hypervisor" from which we protect from...
> In order to communicate with the PSP, the guest need to locate the secrets
> page inserted by the hypervisor during the SEV-SNP guest launch. The
... but this here says the secrets page is inserted by the same
hypervisor from which we're actually protecting.
You wanna rephrase that to explain what exactly happens so that it
doesn't sound like we're really trusting the HV with the secrets page.
> secrets page contains the communication keys used to send and receive the
> encrypted messages between the guest and the PSP. The secrets page location
> is passed through the setup_data.
>
> Create a platform device that the SNP guest driver can bind to get the
> platform resources such as encryption key and message id to use to
> communicate with the PSP. The SNP guest driver can provide userspace
> interface to get the attestation report, key derivation, extended
> attestation report etc.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev.c | 68 +++++++++++++++++++++++++++++++++++++++
> include/linux/sev-guest.h | 5 +++
> 2 files changed, 73 insertions(+)
...
> +static u64 find_secrets_paddr(void)
> +{
> + u64 pa_data = boot_params.cc_blob_address;
> + struct cc_blob_sev_info info;
> + void *map;
> +
> + /*
> + * The CC blob contains the address of the secrets page, check if the
> + * blob is present.
> + */
> + if (!pa_data)
> + return 0;
> +
> + map = early_memremap(pa_data, sizeof(info));
> + memcpy(&info, map, sizeof(info));
> + early_memunmap(map, sizeof(info));
> +
> + /* Verify that secrets page address is passed */
That's hardly verifying something - if anything, it should say
/* smoke-test the secrets page passed */
> + if (info.secrets_phys && info.secrets_len == PAGE_SIZE)
> + return info.secrets_phys;
... which begs the question: how do we verify the HV is not passing some
garbage instead of an actual secrets page?
I guess it is that:
"SNP_LAUNCH_UPDATE can insert two special pages into the guest’s
memory: the secrets page and the CPUID page. The secrets page contains
encryption keys used by the guest to interact with the firmware. Because
the secrets page is encrypted with the guest’s memory encryption
key, the hypervisor cannot read the keys. The CPUID page contains
hypervisor provided CPUID function values that it passes to the guest.
The firmware validates these values to ensure the hypervisor is not
providing out-of-range values."
From "4.5 Launching a Guest" in the SNP FW ABI spec.
I think that explanation above is very important wrt to explaining the
big picture how this all works with those pages injected into the guest
so I guess somewhere around here a comment should say
"See section 4.5 Launching a Guest in the SNP FW ABI spec for details
about those special pages."
or so.
> +
> + return 0;
> +}
> +
> +static int __init add_snp_guest_request(void)
If anything, that should be called
init_snp_platform_device()
or so.
> +{
> + struct snp_secrets_page_layout *layout;
> + struct snp_guest_platform_data data;
> +
> + if (!sev_feature_enabled(SEV_SNP))
> + return -ENODEV;
> +
> + snp_secrets_phys = find_secrets_paddr();
> + if (!snp_secrets_phys)
> + return -ENODEV;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return -ENODEV;
> +
> + /*
> + * The secrets page contains three VMPCK that can be used for
What's VMPCK?
> + * communicating with the PSP. We choose the VMPCK0 to encrypt guest
"We" is?
> + * messages send and receive by the Linux. Provide the key and
"... by the Linux."?! That sentence needs more love.
> + * id through the platform data to the driver.
> + */
> + data.vmpck_id = 0;
> + memcpy_fromio(data.vmpck, layout->vmpck0, sizeof(data.vmpck));
> +
> + iounmap(layout);
> +
> + platform_device_add_data(&guest_req_device, &data, sizeof(data));
Oh look, that function can return an error.
> +
> + if (!platform_device_register(&guest_req_device))
> + dev_info(&guest_req_device.dev, "secret phys 0x%llx\n", snp_secrets_phys);
Make that message human-readable - not a debug one.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 9/2/21 11:40 AM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:30AM -0500, Brijesh Singh wrote:
>> Version 2 of GHCB specification provides NAEs that can be used by the SNP
>
> Resolve the "NAE" abbreviation here so that it is clear what this means.
>
Noted.
>> guest to communicate with the PSP without risk from a malicious hypervisor
>> who wishes to read, alter, drop or replay the messages sent.
>
> This here says "malicious hypervisor" from which we protect from...
>
>> In order to communicate with the PSP, the guest need to locate the secrets
>> page inserted by the hypervisor during the SEV-SNP guest launch. The
>
> ... but this here says the secrets page is inserted by the same
> hypervisor from which we're actually protecting.
>
The content of the secrets page is populated by the PSP. The hypervisor
cannot alter the contents; all it can do is tell the guest where the
secrets page is present in memory. The guest will read the secrets page
to get the VM communication key and use that key to encrypt the
messages sent between the guest and the PSP.
> You wanna rephrase that to explain what exactly happens so that it
> doesn't sound like we're really trusting the HV with the secrets page.
>
Sure, I will expand it a bit more.
>> secrets page contains the communication keys used to send and receive the
>> encrypted messages between the guest and the PSP. The secrets page location
>> is passed through the setup_data.
>>
>> Create a platform device that the SNP guest driver can bind to get the
>> platform resources such as encryption key and message id to use to
>> communicate with the PSP. The SNP guest driver can provide userspace
>> interface to get the attestation report, key derivation, extended
>> attestation report etc.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/kernel/sev.c | 68 +++++++++++++++++++++++++++++++++++++++
>> include/linux/sev-guest.h | 5 +++
>> 2 files changed, 73 insertions(+)
>
> ...
>
>> +static u64 find_secrets_paddr(void)
>> +{
>> + u64 pa_data = boot_params.cc_blob_address;
>> + struct cc_blob_sev_info info;
>> + void *map;
>> +
>> + /*
>> + * The CC blob contains the address of the secrets page, check if the
>> + * blob is present.
>> + */
>> + if (!pa_data)
>> + return 0;
>> +
>> + map = early_memremap(pa_data, sizeof(info));
>> + memcpy(&info, map, sizeof(info));
>> + early_memunmap(map, sizeof(info));
>> +
>> + /* Verify that secrets page address is passed */
>
> That's hardly verifying something - if anything, it should say
>
> /* smoke-test the secrets page passed */
>
Noted.
>> + if (info.secrets_phys && info.secrets_len == PAGE_SIZE)
>> + return info.secrets_phys;
>
> ... which begs the question: how do we verify the HV is not passing some
> garbage instead of an actual secrets page?
>
Unfortunately, the secrets page does not contain a magic header or UUID
which a guest can read to verify that the page was actually populated by
the PSP. But since the page is encrypted before launch, it is always
accessed encrypted. If the hypervisor is tricking us, all that means is
the guest OS will get a wrong key and will not be able to communicate
with the PSP to get the attestation reports etc.
> I guess it is that:
>
> "SNP_LAUNCH_UPDATE can insert two special pages into the guest’s
> memory: the secrets page and the CPUID page. The secrets page contains
> encryption keys used by the guest to interact with the firmware. Because
> the secrets page is encrypted with the guest’s memory encryption
> key, the hypervisor cannot read the keys. The CPUID page contains
> hypervisor provided CPUID function values that it passes to the guest.
> The firmware validates these values to ensure the hypervisor is not
> providing out-of-range values."
>
> From "4.5 Launching a Guest" in the SNP FW ABI spec.
>
> I think that explanation above is very important wrt to explaining the
> big picture how this all works with those pages injected into the guest
> so I guess somewhere around here a comment should say
>
I will add more explanation.
> "See section 4.5 Launching a Guest in the SNP FW ABI spec for details
> about those special pages."
>
> or so.
>
>> +
>> + return 0;
>> +}
>> +
>> +static int __init add_snp_guest_request(void)
>
> If anything, that should be called
>
> init_snp_platform_device()
>
> or so.
>
Noted.
>> +{
>> + struct snp_secrets_page_layout *layout;
>> + struct snp_guest_platform_data data;
>> +
>> + if (!sev_feature_enabled(SEV_SNP))
>> + return -ENODEV;
>> +
>> + snp_secrets_phys = find_secrets_paddr();
>> + if (!snp_secrets_phys)
>> + return -ENODEV;
>> +
>> + layout = snp_map_secrets_page();
>> + if (!layout)
>> + return -ENODEV;
>> +
>> + /*
>> + * The secrets page contains three VMPCK that can be used for
>
> What's VMPCK?
>
VM platform communication key.
>> + * communicating with the PSP. We choose the VMPCK0 to encrypt guest
>
> "We" is?
>
>> + * messages send and receive by the Linux. Provide the key and
>
> "... by the Linux."?! That sentence needs more love.
>
I will expand the comment a bit more.
>> + * id through the platform data to the driver.
>> + */
>> + data.vmpck_id = 0;
>> + memcpy_fromio(data.vmpck, layout->vmpck0, sizeof(data.vmpck));
>> +
>> + iounmap(layout);
>> +
>> + platform_device_add_data(&guest_req_device, &data, sizeof(data));
>
> Oh look, that function can return an error.
>
Yes, after seeing Dov's comment I am adding more checks and returning a failure.
>> +
>> + if (!platform_device_register(&guest_req_device))
>> + dev_info(&guest_req_device.dev, "secret phys 0x%llx\n", snp_secrets_phys);
>
> Make that message human-readable - not a debug one.
>
Sure.
thanks
On 9/2/21 6:26 AM, Borislav Petkov wrote:
> On Mon, Aug 30, 2021 at 10:07:39AM -0500, Brijesh Singh wrote:
>> The SNP firmware spec says that counter must begin with the 1.
>
> So put that in the comment and explain what 0 is: magic or invalid or
> whatnot and why is that so and that it is spec-ed this way, etc.
>
> Just having it there without a reasoning makes one wonder whether that's
> some arbitrary limitation or so.
Agreed, I will add a comment explaining it.
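Something along these lines maybe (wording still to be checked against
the spec text):

        /*
         * The SNP firmware treats a message sequence number of 0 as
         * invalid and will fail the guest request, i.e. the counter must
         * begin with 1. Returning count + 1 covers both the fresh case
         * (count == 0 in the secrets page) and the normal increment.
         */
        return count + 1;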
>
>> During the GHCB writing the seqno use to be 32-bit value and hence the GHCB
>> spec choose the 32-bit value but recently the SNP firmware changed it from
>> the 32 to 64. So, now we are left with the option of limiting the sequence
>> number to 32-bit. If we go beyond 32-bit then all we can do is fail the
>> call. If we pass the value of zero then FW will fail the call.
>
> That sounds weird again. So make it 64-bit like the FW and fix the spec.
>
>> I just choose the smaller name but I have no issues matching with the spec.
>> Also those keys does not have anything to do with the VMPL level. The
>> secrets page provides 4 different keys and they are referred as vmpck0..3
>> and each of them have a sequence numbers associated with it.
>>
>> In GHCB v3 we probably need to rework the structure name.
>
> You can point to the spec section so that readers can find the struct
> layout there.
>
I will add a comment that this is for spec 0.9+.
thanks
On 02/09/2021 22:58, Brijesh Singh wrote:
>
>
> On 9/2/21 11:40 AM, Borislav Petkov wrote:
[...]
>>
>>> +static u64 find_secrets_paddr(void)
>>> +{
>>> + u64 pa_data = boot_params.cc_blob_address;
>>> + struct cc_blob_sev_info info;
>>> + void *map;
>>> +
>>> + /*
>>> + * The CC blob contains the address of the secrets page, check
>>> if the
>>> + * blob is present.
>>> + */
>>> + if (!pa_data)
>>> + return 0;
>>> +
>>> + map = early_memremap(pa_data, sizeof(info));
>>> + memcpy(&info, map, sizeof(info));
>>> + early_memunmap(map, sizeof(info));
>>> +
>>> + /* Verify that secrets page address is passed */
>>
>> That's hardly verifying something - if anything, it should say
>>
>> /* smoke-test the secrets page passed */
>>
> Noted.
>
>>> + if (info.secrets_phys && info.secrets_len == PAGE_SIZE)
>>> + return info.secrets_phys;
>>
>> ... which begs the question: how do we verify the HV is not passing some
>> garbage instead of an actual secrets page?
>>
>
> Unfortunately, the secrets page does not contain a magic header or uuid
> which a guest can read to verify that the page is actually populated by
> the PSP.
In the SNP FW ABI document section 8.14.2.5 there's a Table 61 titled
Secrets Page Format, which states that the first field in that page is a
u32 VERSION field which should equal 2h.
While not as strict as a GUID header, this can help detect early that
the content of the SNP secrets page is invalid.
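i.e. something as simple as this in snp_map_secrets_page() (sketch):

        /* Bail out if this does not look like a v2 secrets page */
        if (readl(&layout->version) != 2) {
                iounmap(layout);
                return NULL;
        }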
-Dov
> But since the page is encrypted before the launch so this page
> is always accessed encrypted. If hypervisor is tricking us then all that
> means is guest OS will get a wrong key and will not be able to
> communicate with the PSP to get the attestation reports etc.
>
>
>> I guess it is that:
>>
>> "SNP_LAUNCH_UPDATE can insert two special pages into the guest’s
>> memory: the secrets page and the CPUID page. The secrets page contains
>> encryption keys used by the guest to interact with the firmware. Because
>> the secrets page is encrypted with the guest’s memory encryption
>> key, the hypervisor cannot read the keys. The CPUID page contains
>> hypervisor provided CPUID function values that it passes to the guest.
>> The firmware validates these values to ensure the hypervisor is not
>> providing out-of-range values."
>>
>> From "4.5 Launching a Guest" in the SNP FW ABI spec.
>>
>> I think that explanation above is very important wrt to explaining the
>> big picture how this all works with those pages injected into the guest
>> so I guess somewhere around here a comment should say
>>
>
> I will add more explanation.
>
>> "See section 4.5 Launching a Guest in the SNP FW ABI spec for details
>> about those special pages."
>>
>> or so.
>>
On 9/3/21 3:15 AM, Dov Murik wrote:
>> Unfortunately, the secrets page does not contain a magic header or uuid
>> which a guest can read to verify that the page is actually populated by
>> the PSP.
> In the SNP FW ABI document section 8.14.2.5 there's a Table 61 titled
> Secrets Page Format, which states that the first field in that page is a
> u32 VERSION field which should equal 2h.
>
> While not as strict as GUID header, this can help detect early that the
> content of the SNP secrets page is invalid.
The description indicates that the field is a version number of the
secrets page format; it will get bumped every time the spec steals the
reserved bytes for something new. IMHO, we should not depend on the
version number.
thanks
On Fri, Aug 20, 2021 at 10:19:31AM -0500, Brijesh Singh wrote:
> +===================================================================
> +The Definitive SEV Guest API Documentation
> +===================================================================
> +
> +1. General description
> +======================
> +
> +The SEV API is a set of ioctls that are issued to by the guest or
issued to by?
Issued by the guest or hypervisor, you mean..
> +hypervisor to get or set certain aspect of the SEV virtual machine.
> +The ioctls belong to the following classes:
> +
> + - Hypervisor ioctls: These query and set global attributes which affect the
> + whole SEV firmware. These ioctl is used by platform provision tools.
"These ioctls are used ... "
> +
> + - Guest ioctls: These query and set attribute of the SEV virtual machine.
"... attributes... "
> +
> +2. API description
> +==================
> +
> +This section describes ioctls that can be used to query or set SEV guests.
> +For each ioctl, the following information is provided along with a
> +description:
> +
> + Technology:
> + which SEV techology provides this ioctl. sev, sev-es, sev-snp or all.
> +
> + Type:
> + hypervisor or guest. The ioctl can be used inside the guest or the
> + hypervisor.
> +
> + Parameters:
> + what parameters are accepted by the ioctl.
> +
> + Returns:
> + the return value. General error numbers (ENOMEM, EINVAL)
> + are not detailed, but errors with specific meanings are.
> +
> +The guest ioctl should be called to /dev/sev-guest device. The ioctl accepts
s/called to/issued on a file descriptor of the/
> +struct snp_user_guest_request. The input and output structure is specified
> +through the req_data and resp_data field respectively. If the ioctl fails
> +to execute due to the firmware error, then fw_err code will be set.
"... due to a ... "
> +
> +::
> + struct snp_user_guest_request {
So you said earlier:
> I followed the naming convension you recommended during the initial SEV driver
> developement. IIRC, the main reason for us having to add "user" in it because
> we wanted to distinguious that this structure is not exactly same as the what
> is defined in the SEV-SNP firmware spec.
but looking at the current variant in the code, the structure in the SNP spec is
Table 91. Layout of the CMDBUF_SNP_GUEST_REQUEST Structure
which corresponds to struct snp_guest_request_data so you can call this one:
struct snp_guest_request_ioctl
and then it is perfectly clear what is what.
> + /* Request and response structure address */
> + __u64 req_data;
> + __u64 resp_data;
> +
> + /* firmware error code on failure (see psp-sev.h) */
> + __u64 fw_err;
> + };
> +
> +2.1 SNP_GET_REPORT
> +------------------
> +
> +:Technology: sev-snp
> +:Type: guest ioctl
> +:Parameters (in): struct snp_report_req
> +:Returns (out): struct snp_report_resp on success, -negative on error
> +
> +The SNP_GET_REPORT ioctl can be used to query the attestation report from the
> +SEV-SNP firmware. The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command
> +provided by the SEV-SNP firmware to query the attestation report.
> +
> +On success, the snp_report_resp.data will contains the report. The report
"... will contain... "
> +format is described in the SEV-SNP specification. See the SEV-SNP specification
> +for further details.
"... which can be found at https://developer.amd.com/sev/."
assuming that URL will keep its validity in the foreseeable future.
> +static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> +{
> + struct snp_guest_dev *snp_dev = to_snp_dev(file);
> + void __user *argp = (void __user *)arg;
> + struct snp_user_guest_request input;
> + int ret = -ENOTTY;
> +
> + if (copy_from_user(&input, argp, sizeof(input)))
> + return -EFAULT;
> +
> + mutex_lock(&snp_cmd_mutex);
> +
> + switch (ioctl) {
> + case SNP_GET_REPORT: {
> + ret = get_report(snp_dev, &input);
> + break;
> + }
No need for those {} brackets around the case.
> + default:
> + break;
> + }
> +
> + mutex_unlock(&snp_cmd_mutex);
> +
> + if (copy_to_user(argp, &input, sizeof(input)))
> + return -EFAULT;
> +
> + return ret;
> +}
> +
> +static void free_shared_pages(void *buf, size_t sz)
> +{
> + unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
> +
> + /* If fail to restore the encryption mask then leak it. */
> + if (set_memory_encrypted((unsigned long)buf, npages))
Hmm, this sounds like an abnormal condition about which we should at
least warn...
> + return;
> +
> + __free_pages(virt_to_page(buf), get_order(sz));
> +}
> +
> +static void *alloc_shared_pages(size_t sz)
> +{
> + unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
> + struct page *page;
> + int ret;
> +
> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(sz));
> + if (IS_ERR(page))
> + return NULL;
> +
> + ret = set_memory_decrypted((unsigned long)page_address(page), npages);
> + if (ret) {
> + __free_pages(page, get_order(sz));
> + return NULL;
> + }
> +
> + return page_address(page);
> +}
> +
> +static const struct file_operations snp_guest_fops = {
> + .owner = THIS_MODULE,
> + .unlocked_ioctl = snp_guest_ioctl,
> +};
> +
> +static int __init snp_guest_probe(struct platform_device *pdev)
> +{
> + struct snp_guest_platform_data *data;
> + struct device *dev = &pdev->dev;
> + struct snp_guest_dev *snp_dev;
> + struct miscdevice *misc;
> + int ret;
> +
> + if (!dev->platform_data)
> + return -ENODEV;
> +
> + data = (struct snp_guest_platform_data *)dev->platform_data;
> + vmpck_id = data->vmpck_id;
> +
> + snp_dev = devm_kzalloc(&pdev->dev, sizeof(struct snp_guest_dev), GFP_KERNEL);
> + if (!snp_dev)
> + return -ENOMEM;
> +
> + platform_set_drvdata(pdev, snp_dev);
> + snp_dev->dev = dev;
> +
> + snp_dev->crypto = init_crypto(snp_dev, data->vmpck, sizeof(data->vmpck));
> + if (!snp_dev->crypto)
> + return -EIO;
I guess you should put the crypto init...
> +
> + /* Allocate the shared page used for the request and response message. */
> + snp_dev->request = alloc_shared_pages(sizeof(struct snp_guest_msg));
> + if (IS_ERR(snp_dev->request)) {
> + ret = PTR_ERR(snp_dev->request);
> + goto e_free_crypto;
> + }
> +
> + snp_dev->response = alloc_shared_pages(sizeof(struct snp_guest_msg));
> + if (IS_ERR(snp_dev->response)) {
> + ret = PTR_ERR(snp_dev->response);
> + goto e_free_req;
> + }
... here, after the page allocation to save yourself all the setup work
if the shared pages allocation fails.
> +
> + misc = &snp_dev->misc;
> + misc->minor = MISC_DYNAMIC_MINOR;
> + misc->name = DEVICE_NAME;
> + misc->fops = &snp_guest_fops;
> +
> + return misc_register(misc);
> +
> +e_free_req:
> + free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
> +
> +e_free_crypto:
> + deinit_crypto(snp_dev->crypto);
> +
> + return ret;
> +}
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris,
I will update the doc and the commit message per your feedback.
On 9/6/21 12:38 PM, Borislav Petkov wrote:
>
> So you said earlier:
>
>> I followed the naming convension you recommended during the initial SEV driver
>> developement. IIRC, the main reason for us having to add "user" in it because
>> we wanted to distinguious that this structure is not exactly same as the what
>> is defined in the SEV-SNP firmware spec.
>
> but looking at the current variant in the code, the structure in the SNP spec is
>
> Table 91. Layout of the CMDBUF_SNP_GUEST_REQUEST Structure
>
> which corresponds to struct snp_guest_request_data so you can call this one:
>
> struct snp_guest_request_ioctl
>
> and then it is perfectly clear what is what.
Noted.
>
>
> "... which can be found at https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdeveloper.amd.com%2Fsev%2F&data=04%7C01%7Cbrijesh.singh%40amd.com%7C9bc8f642dbad48a2a78008d9715d1edd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637665467074351191%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Iiz3emjR%2Blx8H73g2N0bOfPHeXXv%2FhLtlljOWNoD2mQ%3D&reserved=0."
>
> assuming that URL will keep its validity in the foreseeable future.
Unfortunately, the doc folks are replacing the current spec with a new
one, and previous URLs are no longer valid. I will spell out the spec
version number so that anyone downloading the spec from bugzilla will
be able to locate it.
thanks
Brijesh
On Tue, Sep 07, 2021 at 08:35:13AM -0500, Brijesh Singh wrote:
> Unfortunately, the doc folks are replacing the current spec with the new,
> and previous URLs are no longer valid. I will spell out the spec version
> number so that anyone downloading the spec from bugzilla will able to locate
> it.
Yap, this is yet another example why we need a stable collection for
docs, outside of the vendor domains which change way too often and URLs
end up disappearing.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:32AM -0500, Brijesh Singh wrote:
> +2.2 SNP_GET_DERIVED_KEY
> +-----------------------
> +:Technology: sev-snp
> +:Type: guest ioctl
> +:Parameters (in): struct snp_derived_key_req
> +:Returns (out): struct snp_derived_key_req on success, -negative on error
> +
> +The SNP_GET_DERIVED_KEY ioctl can be used to get a key derive from a root key.
> +The derived key can be used by the guest for any purpose, such as sealing keys
> +or communicating with external entities.
> +
> +The ioctl uses the SNP_GUEST_REQUEST (MSG_KEY_REQ) command provided by the
> +SEV-SNP firmware to derive the key. See SEV-SNP specification for further details
> +on the various fileds passed in the key derivation request.
> +
> +On success, the snp_derived_key_resp.data will contains the derived key
"will contain"
> +value.
> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
> index d029a98ad088..621b1c5a9cfc 100644
> --- a/drivers/virt/coco/sevguest/sevguest.c
> +++ b/drivers/virt/coco/sevguest/sevguest.c
> @@ -303,6 +303,50 @@ static int get_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_reque
> return rc;
> }
>
> +static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_derived_key_resp *resp;
> + struct snp_derived_key_req req;
> + int rc, resp_len;
> +
> + if (!arg->req_data || !arg->resp_data)
> + return -EINVAL;
> +
> + /* Copy the request payload from the userspace */
"from userspace"
> + if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
> + return -EFAULT;
> +
> + /* Message version must be non-zero */
> + if (!req.msg_version)
> + return -EINVAL;
> +
> + /*
> + * The intermediate response buffer is used while decrypting the
> + * response payload. Make sure that it has enough space to cover the
> + * authtag.
> + */
> + resp_len = sizeof(resp->data) + crypto->a_len;
> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
> + if (!resp)
> + return -ENOMEM;
> +
> + /* Issue the command to get the attestation report */
> + rc = handle_guest_request(snp_dev, req.msg_version, SNP_MSG_KEY_REQ,
> + &req.data, sizeof(req.data), resp->data, resp_len,
> + &arg->fw_err);
> + if (rc)
> + goto e_free;
> +
> + /* Copy the response payload to userspace */
> + if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
> + rc = -EFAULT;
> +
> +e_free:
> + kfree(resp);
> + return rc;
> +}
> +
> static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> {
> struct snp_guest_dev *snp_dev = to_snp_dev(file);
> @@ -320,6 +364,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
> ret = get_report(snp_dev, &input);
> break;
> }
> + case SNP_GET_DERIVED_KEY: {
> + ret = get_derived_key(snp_dev, &input);
> + break;
> + }
{} brackets are not needed.
What, however, is bothering me more in this function is that you call
the respective ioctl function which might fail, you do not look at the
return value and copy_to_user() unconditionally.
Looking at get_derived_key(), for example, if it returns after:
if (!arg->req_data || !arg->resp_data)
return -EINVAL;
you will be copying the same thing back to the user that you copied in
earlier. That doesn't make any sense to me.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 20, 2021 at 10:19:33AM -0500, Brijesh Singh wrote:
> Version 2 of GHCB specification defines NAE to get the extended guest
Resolve "NAE" pls.
> request. It is similar to the SNP_GET_REPORT ioctl. The main difference
> is related to the additional data that be returned. The additional
"that will be returned"
> data returned is a certificate blob that can be used by the SNP guest
> user. The certificate blob layout is defined in the GHCB specification.
> The driver simply treats the blob as a opaque data and copies it to
> userspace.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> Documentation/virt/coco/sevguest.rst | 22 +++++
> drivers/virt/coco/sevguest/sevguest.c | 126 ++++++++++++++++++++++++++
> include/uapi/linux/sev-guest.h | 13 +++
> 3 files changed, 161 insertions(+)
>
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> index 25446670d816..7acb8696fca4 100644
> --- a/Documentation/virt/coco/sevguest.rst
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -85,3 +85,25 @@ on the various fileds passed in the key derivation request.
>
> On success, the snp_derived_key_resp.data will contains the derived key
> value.
> +
> +2.2 SNP_GET_EXT_REPORT
> +----------------------
> +:Technology: sev-snp
> +:Type: guest ioctl
> +:Parameters (in/out): struct snp_ext_report_req
> +:Returns (out): struct snp_report_resp on success, -negative on error
> +
> +The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
> +related to the additional certificate data that is returned with the report.
> +The certificate data returned is being provided by the hypervisor through the
> +SNP_SET_EXT_CONFIG.
> +
> +The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
> +firmware to get the attestation report.
> +
> +On success, the snp_ext_report_resp.data will contains the attestation report
"will contain"
> +and snp_ext_report_req.certs_address will contains the certificate blob. If the
ditto.
> +length of the blob is lesser than expected then snp_ext_report_req.certs_len will
"is smaller"
> +be updated with the expected value.
> +
> +See GHCB specification for further detail on how to parse the certificate blob.
> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
> index 621b1c5a9cfc..d978eb432c4c 100644
> --- a/drivers/virt/coco/sevguest/sevguest.c
> +++ b/drivers/virt/coco/sevguest/sevguest.c
> @@ -39,6 +39,7 @@ struct snp_guest_dev {
> struct device *dev;
> struct miscdevice misc;
>
> + void *certs_data;
> struct snp_guest_crypto *crypto;
> struct snp_guest_msg *request, *response;
> };
> @@ -347,6 +348,117 @@ static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_
> return rc;
> }
>
> +static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_guest_request_data input = {};
> + struct snp_ext_report_req req;
> + int ret, npages = 0, resp_len;
> + struct snp_report_resp *resp;
> + struct snp_report_req *rreq;
> + unsigned long fw_err = 0;
> +
> + if (!arg->req_data || !arg->resp_data)
> + return -EINVAL;
> +
> + /* Copy the request payload from the userspace */
"from userspace"
> + if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
> + return -EFAULT;
> +
> + rreq = &req.data;
> +
> + /* Message version must be non-zero */
> + if (!rreq->msg_version)
> + return -EINVAL;
> +
> + if (req.certs_len) {
> + if (req.certs_len > SEV_FW_BLOB_MAX_SIZE ||
> + !IS_ALIGNED(req.certs_len, PAGE_SIZE))
> + return -EINVAL;
> + }
> +
> + if (req.certs_address && req.certs_len) {
> + if (!access_ok(req.certs_address, req.certs_len))
> + return -EFAULT;
> +
> + /*
> + * Initialize the intermediate buffer with all zero's. This buffer
> + * is used in the guest request message to get the certs blob from
> + * the host. If host does not supply any certs in it, then we copy
Please use passive voice: no "we" or "I", etc,
> + * zeros to indicate that certificate data was not provided.
> + */
> + memset(snp_dev->certs_data, 0, req.certs_len);
> +
> + input.data_gpa = __pa(snp_dev->certs_data);
> + npages = req.certs_len >> PAGE_SHIFT;
> + }
> +
> + /*
> + * The intermediate response buffer is used while decrypting the
> + * response payload. Make sure that it has enough space to cover the
> + * authtag.
> + */
> + resp_len = sizeof(resp->data) + crypto->a_len;
> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
> + if (!resp)
> + return -ENOMEM;
> +
> + if (copy_from_user(resp, (void __user *)arg->resp_data, sizeof(*resp))) {
> + ret = -EFAULT;
> + goto e_free;
> + }
> +
> + /* Encrypt the userspace provided payload */
> + ret = enc_payload(snp_dev, rreq->msg_version, SNP_MSG_REPORT_REQ,
> + &rreq->user_data, sizeof(rreq->user_data));
> + if (ret)
> + goto e_free;
> +
> + /* Call firmware to process the request */
> + input.req_gpa = __pa(snp_dev->request);
> + input.resp_gpa = __pa(snp_dev->response);
> + input.data_npages = npages;
> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
> + ret = snp_issue_guest_request(EXT_GUEST_REQUEST, &input, &fw_err);
> +
> + /* Popogate any firmware error to the userspace */
> + arg->fw_err = fw_err;
> +
> + /* If certs length is invalid then copy the returned length */
> + if (arg->fw_err == SNP_GUEST_REQ_INVALID_LEN) {
> + req.certs_len = input.data_npages << PAGE_SHIFT;
> +
> + if (copy_to_user((void __user *)arg->req_data, &req, sizeof(req)))
> + ret = -EFAULT;
> +
> + goto e_free;
> + }
> +
> + if (ret)
> + goto e_free;
This one is really confusing. You assign ret in the if branch
above but then you test ret outside too, just in case the
snp_issue_guest_request() call above has failed.
But then if that call has failed, you still go and do some cleanup work
for invalid certs length...
So that get_ext_report() function is doing too many things at once and
is crying to be split.
For example, the glue around snp_issue_guest_request() is already carved
out in handle_guest_request(). Why aren't you calling that function here
too?
That'll keep the enc, request, dec payload game separate and then the
rest of the logic can remain in get_ext_report()...
...
> static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> {
> struct snp_guest_dev *snp_dev = to_snp_dev(file);
> @@ -368,6 +480,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
> ret = get_derived_key(snp_dev, &input);
> break;
> }
> + case SNP_GET_EXT_REPORT: {
> + ret = get_ext_report(snp_dev, &input);
> + break;
> + }
> default:
> break;
> }
> @@ -453,6 +569,12 @@ static int __init snp_guest_probe(struct platform_device *pdev)
> goto e_free_req;
> }
>
> + snp_dev->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
> + if (IS_ERR(snp_dev->certs_data)) {
> + ret = PTR_ERR(snp_dev->certs_data);
> + goto e_free_resp;
> + }
Same comments here as for patch 37.
> +
> misc = &snp_dev->misc;
> misc->minor = MISC_DYNAMIC_MINOR;
> misc->name = DEVICE_NAME;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 9/8/21 9:00 AM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:19:32AM -0500, Brijesh Singh wrote:
>> +2.2 SNP_GET_DERIVED_KEY
>> +-----------------------
>> +:Technology: sev-snp
>> +:Type: guest ioctl
>> +:Parameters (in): struct snp_derived_key_req
>> +:Returns (out): struct snp_derived_key_req on success, -negative on error
>> +
>> +The SNP_GET_DERIVED_KEY ioctl can be used to get a key derive from a root key.
>> +The derived key can be used by the guest for any purpose, such as sealing keys
>> +or communicating with external entities.
>> +
>> +The ioctl uses the SNP_GUEST_REQUEST (MSG_KEY_REQ) command provided by the
>> +SEV-SNP firmware to derive the key. See SEV-SNP specification for further details
>> +on the various fileds passed in the key derivation request.
>> +
>> +On success, the snp_derived_key_resp.data will contains the derived key
>
> "will contain"
Noted.
>> +
>> + /* Copy the request payload from the userspace */
>
> "from userspace"
Noted.
>> +
>> static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>> {
>> struct snp_guest_dev *snp_dev = to_snp_dev(file);
>> @@ -320,6 +364,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
>> ret = get_report(snp_dev, &input);
>> break;
>> }
>> + case SNP_GET_DERIVED_KEY: {
>> + ret = get_derived_key(snp_dev, &input);
>> + break;
>> + }
>
> {} brackets are not needed.
>
> What, however, is bothering me more in this function is that you call
> the respective ioctl function which might fail, you do not look at the
> return value and copy_to_user() unconditionally.
>
> Looking at get_derived_key(), for example, if it returns after:
>
> if (!arg->req_data || !arg->resp_data)
> return -EINVAL;
>
> you will be copying the same thing back to the user, you copied in
> earlier. That doesn't make any sense to me.
I will look into improving it to copy back to userspace only if there
is a firmware error.
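Roughly what I have in mind (sketch):

        mutex_unlock(&snp_cmd_mutex);

        /* Only copy back if the handler filled in something the caller
         * needs to see, e.g. a firmware error code. */
        if (input.fw_err && copy_to_user(argp, &input, sizeof(input)))
                return -EFAULT;

        return ret;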
thanks
On Fri, Aug 20, 2021 at 9:22 AM Brijesh Singh <[email protected]> wrote:
>
> The SNP guest request message header contains a message count. The
> message count is used while building the IV. The PSP firmware increments
> the message count by 1, and expects that next message will be using the
> incremented count. The snp_msg_seqno() helper will be used by driver to
> get the message sequence counter used in the request message header,
> and it will be automatically incremented after the request is successful.
> The incremented value is saved in the secrets page so that the kexec'ed
> kernel knows from where to begin.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/sev.c | 79 +++++++++++++++++++++++++++++++++++++++
> include/linux/sev-guest.h | 37 ++++++++++++++++++
> 2 files changed, 116 insertions(+)
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 319a40fc57ce..f42cd5a8e7bb 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -51,6 +51,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> */
> static struct ghcb __initdata *boot_ghcb;
>
> +static u64 snp_secrets_phys;
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -2030,6 +2032,80 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
> halt();
> }
>
> +static struct snp_secrets_page_layout *snp_map_secrets_page(void)
> +{
> + u16 __iomem *secrets;
> +
> + if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
> + return NULL;
> +
> + secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
> + if (!secrets)
> + return NULL;
> +
> + return (struct snp_secrets_page_layout *)secrets;
> +}
> +
> +static inline u64 snp_read_msg_seqno(void)
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return 0;
> +
> + /* Read the current message sequence counter from secrets pages */
> + count = readl(&layout->os_area.msg_seqno_0);
> +
> + iounmap(layout);
> +
> + /* The sequence counter must begin with 1 */
> + if (!count)
> + return 1;
> +
> + return count + 1;
> +}
> +
> +u64 snp_msg_seqno(void)
> +{
> + u64 count = snp_read_msg_seqno();
> +
> + if (unlikely(!count))
> + return 0;
> +
> + /*
> + * The message sequence counter for the SNP guest request is a
> +	 * 64-bit value, but version 2 of the GHCB specification defines a
> +	 * 32-bit storage for it.
> + */
> + if (count >= UINT_MAX)
> + return 0;
> +
> + return count;
> +}
> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
Do we need some sort of "get sequence number, then ack that the
sequence number was used" API? Taking your host changes in Part2 V5 as
an example: if 'snp_setup_guest_buf' fails, the given sequence number is
never actually used by a message to the PSP. So the guest will have
the wrong current sequence number, an off-by-1 error, right?
Also it seems like there is a concurrency error waiting to happen
here. If 2 callers call snp_msg_seqno() before either actually places
a call to the PSP, and the first caller's request doesn't reach the PSP
before the second caller's request, both calls will fail. And again I
think the sequence numbers in the guest will be incorrect and
unrecoverable.
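e.g. something like this (purely illustrative, reusing the helpers from
this patch):

        static DEFINE_MUTEX(snp_seqno_lock);

        /* Hand out the next sequence number and hold the lock across the
         * whole exchange with the PSP. */
        u64 snp_msg_seqno_get(void)
        {
                mutex_lock(&snp_seqno_lock);
                return snp_read_msg_seqno();
        }

        /* Only bump the stored counter if the firmware actually consumed
         * the sequence number, then let the next caller in. */
        void snp_msg_seqno_put(bool fw_consumed)
        {
                if (fw_consumed)
                        snp_gen_msg_seqno();
                mutex_unlock(&snp_seqno_lock);
        }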
> +
> +static void snp_gen_msg_seqno(void)
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (!layout)
> + return;
> +
> + /*
> + * The counter is also incremented by the PSP, so increment it by 2
> + * and save in secrets page.
> + */
> + count = readl(&layout->os_area.msg_seqno_0);
> + count += 2;
> +
> + writel(count, &layout->os_area.msg_seqno_0);
> + iounmap(layout);
> +}
> +
> int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
> {
> struct ghcb_state state;
> @@ -2077,6 +2153,9 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
> ret = -EIO;
> }
>
> + /* The command was successful, increment the sequence counter */
> + snp_gen_msg_seqno();
> +
> e_put:
> __sev_put_ghcb(&state);
> e_restore_irq:
> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
> index 24dd17507789..16b6af24fda7 100644
> --- a/include/linux/sev-guest.h
> +++ b/include/linux/sev-guest.h
> @@ -20,6 +20,41 @@ enum vmgexit_type {
> GUEST_REQUEST_MAX
> };
>
> +/*
> + * The secrets page contains 96-bytes of reserved field that can be used by
> + * the guest OS. The guest OS uses the area to save the message sequence
> + * number for each VMPCK.
> + *
> + * See the GHCB spec section Secret page layout for the format for this area.
> + */
> +struct secrets_os_area {
> + u32 msg_seqno_0;
> + u32 msg_seqno_1;
> + u32 msg_seqno_2;
> + u32 msg_seqno_3;
> + u64 ap_jump_table_pa;
> + u8 rsvd[40];
> + u8 guest_usage[32];
> +} __packed;
> +
> +#define VMPCK_KEY_LEN 32
> +
> +/* See the SNP spec for secrets page format */
> +struct snp_secrets_page_layout {
> + u32 version;
> + u32 imien : 1,
> + rsvd1 : 31;
> + u32 fms;
> + u32 rsvd2;
> + u8 gosvw[16];
> + u8 vmpck0[VMPCK_KEY_LEN];
> + u8 vmpck1[VMPCK_KEY_LEN];
> + u8 vmpck2[VMPCK_KEY_LEN];
> + u8 vmpck3[VMPCK_KEY_LEN];
> + struct secrets_os_area os_area;
> + u8 rsvd3[3840];
> +} __packed;
> +
> /*
> * The error code when the data_npages is too small. The error code
> * is defined in the GHCB specification.
> @@ -36,6 +71,7 @@ struct snp_guest_request_data {
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
> unsigned long *fw_err);
> +u64 snp_msg_seqno(void);
> #else
>
> static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
> @@ -43,6 +79,7 @@ static inline int snp_issue_guest_request(int type, struct snp_guest_request_dat
> {
> return -ENODEV;
> }
> +static inline u64 snp_msg_seqno(void) { return 0; }
>
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
> #endif /* __LINUX_SEV_GUEST_H__ */
> --
> 2.17.1
>
>
On 9/9/21 9:54 AM, Peter Gonda wrote:
> On Fri, Aug 20, 2021 at 9:22 AM Brijesh Singh <[email protected]> wrote:
>>
>> The SNP guest request message header contains a message count. The
>> message count is used while building the IV. The PSP firmware increments
>> the message count by 1, and expects that the next message will be using the
>> incremented count. The snp_msg_seqno() helper will be used by the driver to
>> get the message sequence counter used in the request message header,
>> and it will be automatically incremented after the request is successful.
>> The incremented value is saved in the secrets page so that the kexec'ed
>> kernel knows from where to begin.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/kernel/sev.c | 79 +++++++++++++++++++++++++++++++++++++++
>> include/linux/sev-guest.h | 37 ++++++++++++++++++
>> 2 files changed, 116 insertions(+)
>>
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 319a40fc57ce..f42cd5a8e7bb 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -51,6 +51,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>> */
>> static struct ghcb __initdata *boot_ghcb;
>>
>> +static u64 snp_secrets_phys;
>> +
>> /* #VC handler runtime per-CPU data */
>> struct sev_es_runtime_data {
>> struct ghcb ghcb_page;
>> @@ -2030,6 +2032,80 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
>> halt();
>> }
>>
>> +static struct snp_secrets_page_layout *snp_map_secrets_page(void)
>> +{
>> + u16 __iomem *secrets;
>> +
>> + if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
>> + return NULL;
>> +
>> + secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
>> + if (!secrets)
>> + return NULL;
>> +
>> + return (struct snp_secrets_page_layout *)secrets;
>> +}
>> +
>> +static inline u64 snp_read_msg_seqno(void)
>> +{
>> + struct snp_secrets_page_layout *layout;
>> + u64 count;
>> +
>> + layout = snp_map_secrets_page();
>> + if (!layout)
>> + return 0;
>> +
>> + /* Read the current message sequence counter from secrets pages */
>> + count = readl(&layout->os_area.msg_seqno_0);
>> +
>> + iounmap(layout);
>> +
>> + /* The sequence counter must begin with 1 */
>> + if (!count)
>> + return 1;
>> +
>> + return count + 1;
>> +}
>> +
>> +u64 snp_msg_seqno(void)
>> +{
>> + u64 count = snp_read_msg_seqno();
>> +
>> + if (unlikely(!count))
>> + return 0;
>> +
>> + /*
>> + * The message sequence counter for the SNP guest request is a
>> + * 64-bit value but the version 2 of GHCB specification defines a
>> + * 32-bit storage for the it.
>> + */
>> + if (count >= UINT_MAX)
>> + return 0;
>> +
>> + return count;
>> +}
>> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
>
> Do we need some sort of get sequence number, then ack that sequence
> number was used API? Taking your host changes in Part2 V5 as an
> example. If 'snp_setup_guest_buf' fails the given sequence number is
> never actually used by a message to the PSP. So the guest will have
> the wrong current sequence number, an off by 1 error, right?
>
The sequence number should be incremented only after the command is
successful. In this particular case the next caller should not get the
updated sequence number.
Having said that, there is a bug in the current code that will cause us to
increment the sequence number on failure. I noticed it last week and have
it fixed in the v6 WIP branch.
int snp_issue_guest_request(....)
{
.....
.....
ret = sev_es_ghcb_hv_call(ghcb, NULL, id, input->req_gpa, input->resp_gpa);
if (ret)
goto e_put;
if (ghcb->save.sw_exit_info_2) {
...
...
ret = -EIO;
goto e_put; /** THIS WAS MISSING */
}
/* The command was successful, increment the sequence counter. */
snp_gen_msg_seqno();
e_put:
....
}
Does this address your concern?
> Also it seems like there is a concurrency error waiting to happen
> here. If 2 callers call snp_msg_seqno() before either actually places
> a call to the PSP, if the first caller's request doesn't reach the PSP
> before the second caller's request both calls will fail. And again I
> think the sequence numbers in the guest will be incorrect and
> unrecoverable.
>
So far, the only user of snp_msg_seqno() is the attestation driver,
and the driver is designed to serialize the vmgexit requests, so we
should not run into a concurrency issue.
>> +
>> +static void snp_gen_msg_seqno(void)
>> +{
>> + struct snp_secrets_page_layout *layout;
>> + u64 count;
>> +
>> + layout = snp_map_secrets_page();
>> + if (!layout)
>> + return;
>> +
>> + /*
>> + * The counter is also incremented by the PSP, so increment it by 2
>> + * and save in secrets page.
>> + */
>> + count = readl(&layout->os_area.msg_seqno_0);
>> + count += 2;
>> +
>> + writel(count, &layout->os_area.msg_seqno_0);
>> + iounmap(layout);
>> +}
>> +
>> int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
>> {
>> struct ghcb_state state;
>> @@ -2077,6 +2153,9 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
>> ret = -EIO;
>> }
>>
>> + /* The command was successful, increment the sequence counter */
>> + snp_gen_msg_seqno();
>> +
>> e_put:
>> __sev_put_ghcb(&state);
>> e_restore_irq:
>> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
>> index 24dd17507789..16b6af24fda7 100644
>> --- a/include/linux/sev-guest.h
>> +++ b/include/linux/sev-guest.h
>> @@ -20,6 +20,41 @@ enum vmgexit_type {
>> GUEST_REQUEST_MAX
>> };
>>
>> +/*
>> + * The secrets page contains 96-bytes of reserved field that can be used by
>> + * the guest OS. The guest OS uses the area to save the message sequence
>> + * number for each VMPCK.
>> + *
>> + * See the GHCB spec section Secret page layout for the format for this area.
>> + */
>> +struct secrets_os_area {
>> + u32 msg_seqno_0;
>> + u32 msg_seqno_1;
>> + u32 msg_seqno_2;
>> + u32 msg_seqno_3;
>> + u64 ap_jump_table_pa;
>> + u8 rsvd[40];
>> + u8 guest_usage[32];
>> +} __packed;
>> +
>> +#define VMPCK_KEY_LEN 32
>> +
>> +/* See the SNP spec for secrets page format */
>> +struct snp_secrets_page_layout {
>> + u32 version;
>> + u32 imien : 1,
>> + rsvd1 : 31;
>> + u32 fms;
>> + u32 rsvd2;
>> + u8 gosvw[16];
>> + u8 vmpck0[VMPCK_KEY_LEN];
>> + u8 vmpck1[VMPCK_KEY_LEN];
>> + u8 vmpck2[VMPCK_KEY_LEN];
>> + u8 vmpck3[VMPCK_KEY_LEN];
>> + struct secrets_os_area os_area;
>> + u8 rsvd3[3840];
>> +} __packed;
>> +
>> /*
>> * The error code when the data_npages is too small. The error code
>> * is defined in the GHCB specification.
>> @@ -36,6 +71,7 @@ struct snp_guest_request_data {
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>> int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
>> unsigned long *fw_err);
>> +u64 snp_msg_seqno(void);
>> #else
>>
>> static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
>> @@ -43,6 +79,7 @@ static inline int snp_issue_guest_request(int type, struct snp_guest_request_dat
>> {
>> return -ENODEV;
>> }
>> +static inline u64 snp_msg_seqno(void) { return 0; }
>>
>> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>> #endif /* __LINUX_SEV_GUEST_H__ */
>> --
>> 2.17.1
>>
>>
On Thu, Sep 9, 2021 at 9:26 AM Brijesh Singh <[email protected]> wrote:
>
>
>
> On 9/9/21 9:54 AM, Peter Gonda wrote:
> > On Fri, Aug 20, 2021 at 9:22 AM Brijesh Singh <[email protected]> wrote:
> >>
> >> The SNP guest request message header contains a message count. The
> >> message count is used while building the IV. The PSP firmware increments
> >> the message count by 1, and expects that next message will be using the
> >> incremented count. The snp_msg_seqno() helper will be used by driver to
> >> get the message sequence counter used in the request message header,
> >> and it will be automatically incremented after the request is successful.
> >> The incremented value is saved in the secrets page so that the kexec'ed
> >> kernel knows from where to begin.
> >>
> >> Signed-off-by: Brijesh Singh <[email protected]>
> >> ---
> >> arch/x86/kernel/sev.c | 79 +++++++++++++++++++++++++++++++++++++++
> >> include/linux/sev-guest.h | 37 ++++++++++++++++++
> >> 2 files changed, 116 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> >> index 319a40fc57ce..f42cd5a8e7bb 100644
> >> --- a/arch/x86/kernel/sev.c
> >> +++ b/arch/x86/kernel/sev.c
> >> @@ -51,6 +51,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> >> */
> >> static struct ghcb __initdata *boot_ghcb;
> >>
> >> +static u64 snp_secrets_phys;
> >> +
> >> /* #VC handler runtime per-CPU data */
> >> struct sev_es_runtime_data {
> >> struct ghcb ghcb_page;
> >> @@ -2030,6 +2032,80 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
> >> halt();
> >> }
> >>
> >> +static struct snp_secrets_page_layout *snp_map_secrets_page(void)
> >> +{
> >> + u16 __iomem *secrets;
> >> +
> >> + if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
> >> + return NULL;
> >> +
> >> + secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
> >> + if (!secrets)
> >> + return NULL;
> >> +
> >> + return (struct snp_secrets_page_layout *)secrets;
> >> +}
> >> +
> >> +static inline u64 snp_read_msg_seqno(void)
> >> +{
> >> + struct snp_secrets_page_layout *layout;
> >> + u64 count;
> >> +
> >> + layout = snp_map_secrets_page();
> >> + if (!layout)
> >> + return 0;
> >> +
> >> + /* Read the current message sequence counter from secrets pages */
> >> + count = readl(&layout->os_area.msg_seqno_0);
> >> +
> >> + iounmap(layout);
> >> +
> >> + /* The sequence counter must begin with 1 */
> >> + if (!count)
> >> + return 1;
> >> +
> >> + return count + 1;
> >> +}
> >> +
> >> +u64 snp_msg_seqno(void)
> >> +{
> >> + u64 count = snp_read_msg_seqno();
> >> +
> >> + if (unlikely(!count))
> >> + return 0;
> >> +
> >> + /*
> >> + * The message sequence counter for the SNP guest request is a
> >> + * 64-bit value but the version 2 of GHCB specification defines a
> >> + * 32-bit storage for the it.
> >> + */
> >> + if (count >= UINT_MAX)
> >> + return 0;
> >> +
> >> + return count;
> >> +}
> >> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
> >
> > Do we need some sort of get sequence number, then ack that sequence
> > number was used API? Taking your host changes in Part2 V5 as an
> > example. If 'snp_setup_guest_buf' fails the given sequence number is
> > never actually used by a message to the PSP. So the guest will have
> > the wrong current sequence number, an off by 1 error, right?
> >
>
> The sequence number should be incremented only after the command is
> successful. In this particular case the next caller should not get the
> updated sequence number.
>
> Having said so, there is a bug in current code that will cause us to
> increment the sequence number on failure. I notice it last week and have
> it fixed in v6 wip branch.
>
> int snp_issue_guest_request(....)
> {
>
> .....
> .....
>
> ret = sev_es_ghcb_hv_call(ghcb, NULL, id, input->req_gpa, input->resp_gpa);
> if (ret)
> goto e_put;
>
> if (ghcb->save.sw_exit_info_2) {
> ...
> ...
>
> ret = -EIO;
> goto e_put; /** THIS WAS MISSING */
> }
>
> /* The command was successful, increment the sequence counter. */
> snp_gen_msg_seqno();
> e_put:
> ....
> }
>
> Does this address your concern?
So the 'snp_msg_seqno()' call in 'enc_payload' will not increment the
counter, it's only incremented by 'snp_gen_msg_seqno()'? If that's
correct, that addresses my first concern.
>
>
> > Also it seems like there is a concurrency error waiting to happen
> > here. If 2 callers call snp_msg_seqno() before either actually places
> > a call to the PSP, if the first caller's request doesn't reach the PSP
> > before the second caller's request both calls will fail. And again I
> > think the sequence numbers in the guest will be incorrect and
> > unrecoverable.
> >
>
> So far, the only user for the snp_msg_seqno() is the attestation driver.
> And the driver is designed to serialize the vmgexit request and thus we
> should not run into concurrence issue.
That seems a little dangerous, as any new in-tree code or out-of-tree
module could use this function and expose this race condition,
right? Could we at least have a comment on these functions
(snp_msg_seqno and snp_gen_msg_seqno) noting this?
>
> >> +
> >> +static void snp_gen_msg_seqno(void)
> >> +{
> >> + struct snp_secrets_page_layout *layout;
> >> + u64 count;
> >> +
> >> + layout = snp_map_secrets_page();
> >> + if (!layout)
> >> + return;
> >> +
> >> + /*
> >> + * The counter is also incremented by the PSP, so increment it by 2
> >> + * and save in secrets page.
> >> + */
> >> + count = readl(&layout->os_area.msg_seqno_0);
> >> + count += 2;
> >> +
> >> + writel(count, &layout->os_area.msg_seqno_0);
> >> + iounmap(layout);
> >> +}
> >> +
> >> int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsigned long *fw_err)
> >> {
> >> struct ghcb_state state;
> >> @@ -2077,6 +2153,9 @@ int snp_issue_guest_request(int type, struct snp_guest_request_data *input, unsi
> >> ret = -EIO;
> >> }
> >>
> >> + /* The command was successful, increment the sequence counter */
> >> + snp_gen_msg_seqno();
> >> +
> >> e_put:
> >> __sev_put_ghcb(&state);
> >> e_restore_irq:
> >> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
> >> index 24dd17507789..16b6af24fda7 100644
> >> --- a/include/linux/sev-guest.h
> >> +++ b/include/linux/sev-guest.h
> >> @@ -20,6 +20,41 @@ enum vmgexit_type {
> >> GUEST_REQUEST_MAX
> >> };
> >>
> >> +/*
> >> + * The secrets page contains 96-bytes of reserved field that can be used by
> >> + * the guest OS. The guest OS uses the area to save the message sequence
> >> + * number for each VMPCK.
> >> + *
> >> + * See the GHCB spec section Secret page layout for the format for this area.
> >> + */
> >> +struct secrets_os_area {
> >> + u32 msg_seqno_0;
> >> + u32 msg_seqno_1;
> >> + u32 msg_seqno_2;
> >> + u32 msg_seqno_3;
> >> + u64 ap_jump_table_pa;
> >> + u8 rsvd[40];
> >> + u8 guest_usage[32];
> >> +} __packed;
> >> +
> >> +#define VMPCK_KEY_LEN 32
> >> +
> >> +/* See the SNP spec for secrets page format */
> >> +struct snp_secrets_page_layout {
> >> + u32 version;
> >> + u32 imien : 1,
> >> + rsvd1 : 31;
> >> + u32 fms;
> >> + u32 rsvd2;
> >> + u8 gosvw[16];
> >> + u8 vmpck0[VMPCK_KEY_LEN];
> >> + u8 vmpck1[VMPCK_KEY_LEN];
> >> + u8 vmpck2[VMPCK_KEY_LEN];
> >> + u8 vmpck3[VMPCK_KEY_LEN];
> >> + struct secrets_os_area os_area;
> >> + u8 rsvd3[3840];
> >> +} __packed;
> >> +
> >> /*
> >> * The error code when the data_npages is too small. The error code
> >> * is defined in the GHCB specification.
> >> @@ -36,6 +71,7 @@ struct snp_guest_request_data {
> >> #ifdef CONFIG_AMD_MEM_ENCRYPT
> >> int snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input,
> >> unsigned long *fw_err);
> >> +u64 snp_msg_seqno(void);
> >> #else
> >>
> >> static inline int snp_issue_guest_request(int type, struct snp_guest_request_data *input,
> >> @@ -43,6 +79,7 @@ static inline int snp_issue_guest_request(int type, struct snp_guest_request_dat
> >> {
> >> return -ENODEV;
> >> }
> >> +static inline u64 snp_msg_seqno(void) { return 0; }
> >>
> >> #endif /* CONFIG_AMD_MEM_ENCRYPT */
> >> #endif /* __LINUX_SEV_GUEST_H__ */
> >> --
> >> 2.17.1
> >>
> >>
On 9/9/21 10:43 AM, Peter Gonda wrote:
...
>>
>> Does this address your concern?
>
> So the 'snp_msg_seqno()' call in 'enc_payload' will not increment the
> counter, its only incremented on 'snp_gen_msg_seqno()'? If thats
> correct, that addresses my first concern.
>
Yes, that is the goal.
>>>
>>
>> So far, the only user for the snp_msg_seqno() is the attestation driver.
>> And the driver is designed to serialize the vmgexit request and thus we
>> should not run into concurrence issue.
>
> That seems a little dangerous as any module new code or out-of-tree
> module could use this function thus revealing this race condition
> right? Could we at least have a comment on these functions
> (snp_msg_seqno and snp_gen_msg_seqno) noting this?
>
Yes, if the driver is not performing the serialization then we will run
into a race condition.
One way to avoid this requirement is to do all the crypto inside
snp_issue_guest_request() and eliminate the need to export
snp_msg_seqno().
I will add a comment about it in the function.
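Something along these lines, perhaps (wording only a suggestion):

/*
 * Return the message sequence number to use for the next SNP guest
 * request.
 *
 * Callers must serialize against each other: the stored counter is only
 * advanced by snp_gen_msg_seqno() after a successful
 * snp_issue_guest_request(), so two concurrent callers would build their
 * messages with the same sequence number (and hence the same IV).
 */
u64 snp_msg_seqno(void)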
thanks
On Thu, Sep 9, 2021 at 10:17 AM Brijesh Singh <[email protected]> wrote:
>
>
>
> On 9/9/21 10:43 AM, Peter Gonda wrote:
> ...
>
> >>
> >> Does this address your concern?
> >
> > So the 'snp_msg_seqno()' call in 'enc_payload' will not increment the
> > counter, its only incremented on 'snp_gen_msg_seqno()'? If thats
> > correct, that addresses my first concern.
> >
>
> Yes, that is goal.
>
> >>>
> >>
> >> So far, the only user for the snp_msg_seqno() is the attestation driver.
> >> And the driver is designed to serialize the vmgexit request and thus we
> >> should not run into concurrence issue.
> >
> > That seems a little dangerous as any module new code or out-of-tree
> > module could use this function thus revealing this race condition
> > right? Could we at least have a comment on these functions
> > (snp_msg_seqno and snp_gen_msg_seqno) noting this?
> >
>
> Yes, if the driver is not performing the serialization then we will get
> into race condition.
>
> One way to avoid this requirement is to do all the crypto inside the
> snp_issue_guest_request() and eliminate the need to export the
> snp_msg_seqno().
>
> I will add the comment about it in the function.
Actually, I forgot that the sequence number is the only component of
the AES-GCM IV, as seen in 'enc_payload'. Given that the AES-GCM spec
requires uniqueness of the IV, I think we should try a little harder
than a comment to guarantee we never expose two requests encrypted with
the same sequence number / IV. It's more than just a DoS against the
guest's ability to issue PSP requests; it could also be a guest security
issue. Thoughts?
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
(Section 8 page 18)
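For reference, the IV construction in 'enc_payload' is essentially the
sequence number zero-padded into the 12-byte GCM IV, i.e. roughly this
(sketch only, the actual code is in the sevguest driver patch):

	u8 iv[12] = {};			/* AES-GCM IV */
	u64 seqno = snp_msg_seqno();

	/* The seqno is the only varying part of the IV. */
	memcpy(iv, &seqno, min_t(size_t, sizeof(iv), sizeof(seqno)));

	/* Reusing a seqno therefore reuses the IV under the same VMPCK. */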
>
> thanks
On 9/9/21 11:21 AM, Peter Gonda wrote:
> On Thu, Sep 9, 2021 at 10:17 AM Brijesh Singh <[email protected]> wrote:
>>
>>
>>
>> On 9/9/21 10:43 AM, Peter Gonda wrote:
>> ...
>>
>>>>
>>>> Does this address your concern?
>>>
>>> So the 'snp_msg_seqno()' call in 'enc_payload' will not increment the
>>> counter, its only incremented on 'snp_gen_msg_seqno()'? If thats
>>> correct, that addresses my first concern.
>>>
>>
>> Yes, that is goal.
>>
>>>>>
>>>>
>>>> So far, the only user for the snp_msg_seqno() is the attestation driver.
>>>> And the driver is designed to serialize the vmgexit request and thus we
>>>> should not run into concurrence issue.
>>>
>>> That seems a little dangerous as any module new code or out-of-tree
>>> module could use this function thus revealing this race condition
>>> right? Could we at least have a comment on these functions
>>> (snp_msg_seqno and snp_gen_msg_seqno) noting this?
>>>
>>
>> Yes, if the driver is not performing the serialization then we will get
>> into race condition.
>>
>> One way to avoid this requirement is to do all the crypto inside the
>> snp_issue_guest_request() and eliminate the need to export the
>> snp_msg_seqno().
>>
>> I will add the comment about it in the function.
>
> Actually I forgot that the sequence number is the only component of
> the AES-GCM IV. Seen in 'enc_payload'. Given the AES-GCM spec requires
> uniqueness of the IV. I think we should try a little harder than a
> comment to guarantee we never expose 2 requests encrypted with the
> same sequence number / IV. It's more than just a DOS against the
> guest's PSP request ability but also could be a guest security issue,
> thoughts?
>
Ah, good point, we should avoid a request with the same IV. Maybe move
the sequence number increment and save into the sevguest driver. Then the
driver can do the sequence get, vmgexit and increment under a lock.
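i.e. something along these lines in the driver (rough sketch; the lock
name is illustrative, and it assumes snp_gen_msg_seqno() or an
equivalent is made available to the driver rather than staying static
in sev.c):

static DEFINE_MUTEX(snp_cmd_mutex);	/* illustrative driver-side lock */

static int snp_send_guest_msg(struct snp_guest_dev *snp_dev, int vmgexit_type,
			      int msg_ver, u8 msg_type, void *req, size_t req_sz,
			      unsigned long *fw_err)
{
	struct snp_guest_request_data input = {};
	int rc;

	mutex_lock(&snp_cmd_mutex);

	/* Build the message; enc_payload() derives the IV from the current seqno. */
	rc = enc_payload(snp_dev, msg_ver, msg_type, req, req_sz);
	if (rc)
		goto unlock;

	input.req_gpa = __pa(snp_dev->request);
	input.resp_gpa = __pa(snp_dev->response);

	rc = snp_issue_guest_request(vmgexit_type, &input, fw_err);

	/* Advance the stored counter only once the PSP has consumed it. */
	if (!rc)
		snp_gen_msg_seqno();

unlock:
	mutex_unlock(&snp_cmd_mutex);
	return rc;
}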
thanks
> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
> (Section 8 page 18)
>
>>
>> thanks
>
* Brijesh Singh ([email protected]) wrote:
> Version 2 of GHCB specification defines NAE to get the extended guest
> request. It is similar to the SNP_GET_REPORT ioctl. The main difference
^^^^^^^^^ is that 'report' not request?
> is related to the additional data that can be returned. The additional
> data returned is a certificate blob that can be used by the SNP guest
> user. The certificate blob layout is defined in the GHCB specification.
> The driver simply treats the blob as opaque data and copies it to
> userspace.
>
> Signed-off-by: Brijesh Singh <[email protected]>
I'm confused by snp_dev->certs_data - who writes to that, and when?
I see it's allocated as shared by the probe function and then passed as
input data in get_ext_report - but get_ext_report memsets it.
What happens if two threads were to try and get an extended report at
the same time?
Dave
> ---
> Documentation/virt/coco/sevguest.rst | 22 +++++
> drivers/virt/coco/sevguest/sevguest.c | 126 ++++++++++++++++++++++++++
> include/uapi/linux/sev-guest.h | 13 +++
> 3 files changed, 161 insertions(+)
>
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> index 25446670d816..7acb8696fca4 100644
> --- a/Documentation/virt/coco/sevguest.rst
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -85,3 +85,25 @@ on the various fileds passed in the key derivation request.
>
> On success, the snp_derived_key_resp.data will contains the derived key
> value.
> +
> +2.2 SNP_GET_EXT_REPORT
> +----------------------
> +:Technology: sev-snp
> +:Type: guest ioctl
> +:Parameters (in/out): struct snp_ext_report_req
> +:Returns (out): struct snp_report_resp on success, -negative on error
> +
> +The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
> +related to the additional certificate data that is returned with the report.
> +The certificate data returned is being provided by the hypervisor through the
> +SNP_SET_EXT_CONFIG.
> +
> +The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
> +firmware to get the attestation report.
> +
> +On success, the snp_ext_report_resp.data will contain the attestation report
> +and snp_ext_report_req.certs_address will contain the certificate blob. If the
> +length of the blob is smaller than expected then snp_ext_report_req.certs_len will
> +be updated with the expected value.
> +
> +See GHCB specification for further detail on how to parse the certificate blob.
> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
> index 621b1c5a9cfc..d978eb432c4c 100644
> --- a/drivers/virt/coco/sevguest/sevguest.c
> +++ b/drivers/virt/coco/sevguest/sevguest.c
> @@ -39,6 +39,7 @@ struct snp_guest_dev {
> struct device *dev;
> struct miscdevice misc;
>
> + void *certs_data;
> struct snp_guest_crypto *crypto;
> struct snp_guest_msg *request, *response;
> };
> @@ -347,6 +348,117 @@ static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_
> return rc;
> }
>
> +static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct snp_guest_request_data input = {};
> + struct snp_ext_report_req req;
> + int ret, npages = 0, resp_len;
> + struct snp_report_resp *resp;
> + struct snp_report_req *rreq;
> + unsigned long fw_err = 0;
> +
> + if (!arg->req_data || !arg->resp_data)
> + return -EINVAL;
> +
> + /* Copy the request payload from the userspace */
> + if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
> + return -EFAULT;
> +
> + rreq = &req.data;
> +
> + /* Message version must be non-zero */
> + if (!rreq->msg_version)
> + return -EINVAL;
> +
> + if (req.certs_len) {
> + if (req.certs_len > SEV_FW_BLOB_MAX_SIZE ||
> + !IS_ALIGNED(req.certs_len, PAGE_SIZE))
> + return -EINVAL;
> + }
> +
> + if (req.certs_address && req.certs_len) {
> + if (!access_ok(req.certs_address, req.certs_len))
> + return -EFAULT;
> +
> + /*
> + * Initialize the intermediate buffer with all zero's. This buffer
> + * is used in the guest request message to get the certs blob from
> + * the host. If host does not supply any certs in it, then we copy
> + * zeros to indicate that certificate data was not provided.
> + */
> + memset(snp_dev->certs_data, 0, req.certs_len);
> +
> + input.data_gpa = __pa(snp_dev->certs_data);
> + npages = req.certs_len >> PAGE_SHIFT;
> + }
> +
> + /*
> + * The intermediate response buffer is used while decrypting the
> + * response payload. Make sure that it has enough space to cover the
> + * authtag.
> + */
> + resp_len = sizeof(resp->data) + crypto->a_len;
> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
> + if (!resp)
> + return -ENOMEM;
> +
> + if (copy_from_user(resp, (void __user *)arg->resp_data, sizeof(*resp))) {
> + ret = -EFAULT;
> + goto e_free;
> + }
> +
> + /* Encrypt the userspace provided payload */
> + ret = enc_payload(snp_dev, rreq->msg_version, SNP_MSG_REPORT_REQ,
> + &rreq->user_data, sizeof(rreq->user_data));
> + if (ret)
> + goto e_free;
> +
> + /* Call firmware to process the request */
> + input.req_gpa = __pa(snp_dev->request);
> + input.resp_gpa = __pa(snp_dev->response);
> + input.data_npages = npages;
> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
> + ret = snp_issue_guest_request(EXT_GUEST_REQUEST, &input, &fw_err);
> +
> + /* Propagate any firmware error to userspace */
> + arg->fw_err = fw_err;
> +
> + /* If certs length is invalid then copy the returned length */
> + if (arg->fw_err == SNP_GUEST_REQ_INVALID_LEN) {
> + req.certs_len = input.data_npages << PAGE_SHIFT;
> +
> + if (copy_to_user((void __user *)arg->req_data, &req, sizeof(req)))
> + ret = -EFAULT;
> +
> + goto e_free;
> + }
> +
> + if (ret)
> + goto e_free;
> +
> + /* Decrypt the response payload */
> + ret = verify_and_dec_payload(snp_dev, resp->data, resp_len);
> + if (ret)
> + goto e_free;
> +
> + /* Copy the certificate data blob to userspace */
> + if (req.certs_address &&
> + copy_to_user((void __user *)req.certs_address, snp_dev->certs_data,
> + req.certs_len)) {
> + ret = -EFAULT;
> + goto e_free;
> + }
> +
> + /* Copy the response payload to userspace */
> + if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
> + ret = -EFAULT;
> +
> +e_free:
> + kfree(resp);
> + return ret;
> +}
> +
> static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> {
> struct snp_guest_dev *snp_dev = to_snp_dev(file);
> @@ -368,6 +480,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
> ret = get_derived_key(snp_dev, &input);
> break;
> }
> + case SNP_GET_EXT_REPORT: {
> + ret = get_ext_report(snp_dev, &input);
> + break;
> + }
> default:
> break;
> }
> @@ -453,6 +569,12 @@ static int __init snp_guest_probe(struct platform_device *pdev)
> goto e_free_req;
> }
>
> + snp_dev->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
> + if (IS_ERR(snp_dev->certs_data)) {
> + ret = PTR_ERR(snp_dev->certs_data);
> + goto e_free_resp;
> + }
> +
> misc = &snp_dev->misc;
> misc->minor = MISC_DYNAMIC_MINOR;
> misc->name = DEVICE_NAME;
> @@ -460,6 +582,9 @@ static int __init snp_guest_probe(struct platform_device *pdev)
>
> return misc_register(misc);
>
> +e_free_resp:
> + free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
> +
> e_free_req:
> free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
>
> @@ -475,6 +600,7 @@ static int __exit snp_guest_remove(struct platform_device *pdev)
>
> free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
> free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
> + free_shared_pages(snp_dev->certs_data, SEV_FW_BLOB_MAX_SIZE);
> deinit_crypto(snp_dev->crypto);
> misc_deregister(&snp_dev->misc);
>
> diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
> index 621a9167df7a..23659215fcfb 100644
> --- a/include/uapi/linux/sev-guest.h
> +++ b/include/uapi/linux/sev-guest.h
> @@ -57,6 +57,16 @@ struct snp_derived_key_resp {
> __u8 data[64];
> };
>
> +struct snp_ext_report_req {
> + struct snp_report_req data;
> +
> + /* where to copy the certificate blob */
> + __u64 certs_address;
> +
> + /* length of the certificate blob */
> + __u32 certs_len;
> +};
> +
> #define SNP_GUEST_REQ_IOC_TYPE 'S'
>
> /* Get SNP attestation report */
> @@ -65,4 +75,7 @@ struct snp_derived_key_resp {
> /* Get a derived key from the root */
> #define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)
>
> +/* Get SNP extended report as defined in the GHCB specification version 2. */
> +#define SNP_GET_EXT_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x2, struct snp_user_guest_request)
> +
> #endif /* __UAPI_LINUX_SEV_GUEST_H_ */
> --
> 2.17.1
>
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
Hi Boris,
On 9/8/21 12:53 PM, Borislav Petkov wrote:
>> +
>> + /* If certs length is invalid then copy the returned length */
>> + if (arg->fw_err == SNP_GUEST_REQ_INVALID_LEN) {
>> + req.certs_len = input.data_npages << PAGE_SHIFT;
>> +
>> + if (copy_to_user((void __user *)arg->req_data, &req, sizeof(req)))
>> + ret = -EFAULT;
>> +
>> + goto e_free;
>> + }
>> +
>> + if (ret)
>> + goto e_free;
> This one is really confusing. You assign ret in the if branch
> above but then you test ret outside too, just in case the
> snp_issue_guest_request() call above has failed.
>
> But then if that call has failed, you still go and do some cleanup work
> for invalid certs length...
>
> So that get_ext_report() function is doing too many things at once and
> is crying to be split.
I will try to see what I can come up with to make it easy to read.
>
> For example, the glue around snp_issue_guest_request() is already carved
> out in handle_guest_request(). Why aren't you calling that function here
> too?
The handle_guest_request() uses the VMGEXIT_GUEST_REQUEST, which does not
require the memory for the certificate blob etc. But your earlier comment
that we should let the driver use the VMGEXIT code rather than an enum
will help in this case. I will rework handle_guest_request() so that it
can be used for both cases (with and without the certificate).
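i.e. roughly something like this (sketch only; exact parameters are still
to be worked out, and it assumes snp_issue_guest_request() is switched to
take the VMGEXIT exit code as suggested):

static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code,
				int msg_ver, u8 msg_type,
				void *req_buf, size_t req_sz,
				void *resp_buf, u32 resp_sz,
				struct snp_guest_request_data *input,
				unsigned long *fw_err)
{
	int rc;

	/* Encrypt the request payload with the current message seqno. */
	rc = enc_payload(snp_dev, msg_ver, msg_type, req_buf, req_sz);
	if (rc)
		return rc;

	/*
	 * The caller fills in input->req_gpa/resp_gpa, and for the extended
	 * request also input->data_gpa/data_npages for the certificate blob.
	 */
	rc = snp_issue_guest_request(exit_code, input, fw_err);
	if (rc)
		return rc;

	/* Decrypt and verify the response payload. */
	return verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
}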
> That'll keep the enc, request, dec payload game separate and then the
> rest of the logic can remain in get_ext_report()...
>
> ...
>
>> static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>> {
>> struct snp_guest_dev *snp_dev = to_snp_dev(file);
>> @@ -368,6 +480,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
>> ret = get_derived_key(snp_dev, &input);
>> break;
>> }
>> + case SNP_GET_EXT_REPORT: {
>> + ret = get_ext_report(snp_dev, &input);
>> + break;
>> + }
>> default:
>> break;
>> }
>> @@ -453,6 +569,12 @@ static int __init snp_guest_probe(struct platform_device *pdev)
>> goto e_free_req;
>> }
>>
>> + snp_dev->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
>> + if (IS_ERR(snp_dev->certs_data)) {
>> + ret = PTR_ERR(snp_dev->certs_data);
>> + goto e_free_resp;
>> + }
> Same comments here as for patch 37.
>
>> +
>> misc = &snp_dev->misc;
>> misc->minor = MISC_DYNAMIC_MINOR;
>> misc->name = DEVICE_NAME;
>
On 9/15/21 5:02 AM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh ([email protected]) wrote:
>> Version 2 of GHCB specification defines NAE to get the extended guest
>> request. It is similar to the SNP_GET_REPORT ioctl. The main difference
> ^^^^^^^^^ is that 'report' not request?
>
>> is related to the additional data that be returned. The additional
>> data returned is a certificate blob that can be used by the SNP guest
>> user. The certificate blob layout is defined in the GHCB specification.
>> The driver simply treats the blob as a opaque data and copies it to
>> userspace.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
> I'm confused by snp_dev->certs_data - who writes to that, and when?
> I see it's allocated as shared by the probe function but then passed in
> input data in get_ext_report - but get_ext_report memset's it.
> What happens if two threads were to try and get an extended report at
> the same time?
The certs are system-wide and are programmed by the hypervisor during
platform provisioning. The hypervisor copies the cert blob into the guest
memory while responding to the extended guest message request vmgexit.
The call to the guest message request function is serialized, i.e. there
is a mutex_lock() before the get_ext_report().
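In other words, the ioctl path looks roughly like this (sketch only; the
lock name is illustrative, and the copy of the ioctl argument and the
fw_err propagation back to userspace are paraphrased):

static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
	struct snp_guest_dev *snp_dev = to_snp_dev(file);
	struct snp_user_guest_request input;
	long ret = -ENOTTY;

	if (copy_from_user(&input, (void __user *)arg, sizeof(input)))
		return -EFAULT;

	/*
	 * All guest message requests are serialized here, so only one
	 * thread at a time can touch snp_dev->certs_data.
	 */
	mutex_lock(&snp_cmd_mutex);	/* lock name is illustrative */

	if (ioctl == SNP_GET_EXT_REPORT)
		ret = get_ext_report(snp_dev, &input);
	/* ... SNP_GET_REPORT / SNP_GET_DERIVED_KEY dispatched the same way ... */

	mutex_unlock(&snp_cmd_mutex);

	return ret;
}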
> Dave
>
>
>> ---
>> Documentation/virt/coco/sevguest.rst | 22 +++++
>> drivers/virt/coco/sevguest/sevguest.c | 126 ++++++++++++++++++++++++++
>> include/uapi/linux/sev-guest.h | 13 +++
>> 3 files changed, 161 insertions(+)
>>
>> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
>> index 25446670d816..7acb8696fca4 100644
>> --- a/Documentation/virt/coco/sevguest.rst
>> +++ b/Documentation/virt/coco/sevguest.rst
>> @@ -85,3 +85,25 @@ on the various fileds passed in the key derivation request.
>>
>> On success, the snp_derived_key_resp.data will contains the derived key
>> value.
>> +
>> +2.2 SNP_GET_EXT_REPORT
>> +----------------------
>> +:Technology: sev-snp
>> +:Type: guest ioctl
>> +:Parameters (in/out): struct snp_ext_report_req
>> +:Returns (out): struct snp_report_resp on success, -negative on error
>> +
>> +The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
>> +related to the additional certificate data that is returned with the report.
>> +The certificate data returned is being provided by the hypervisor through the
>> +SNP_SET_EXT_CONFIG.
>> +
>> +The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
>> +firmware to get the attestation report.
>> +
>> +On success, the snp_ext_report_resp.data will contains the attestation report
>> +and snp_ext_report_req.certs_address will contains the certificate blob. If the
>> +length of the blob is lesser than expected then snp_ext_report_req.certs_len will
>> +be updated with the expected value.
>> +
>> +See GHCB specification for further detail on how to parse the certificate blob.
>> diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
>> index 621b1c5a9cfc..d978eb432c4c 100644
>> --- a/drivers/virt/coco/sevguest/sevguest.c
>> +++ b/drivers/virt/coco/sevguest/sevguest.c
>> @@ -39,6 +39,7 @@ struct snp_guest_dev {
>> struct device *dev;
>> struct miscdevice misc;
>>
>> + void *certs_data;
>> struct snp_guest_crypto *crypto;
>> struct snp_guest_msg *request, *response;
>> };
>> @@ -347,6 +348,117 @@ static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_
>> return rc;
>> }
>>
>> +static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *arg)
>> +{
>> + struct snp_guest_crypto *crypto = snp_dev->crypto;
>> + struct snp_guest_request_data input = {};
>> + struct snp_ext_report_req req;
>> + int ret, npages = 0, resp_len;
>> + struct snp_report_resp *resp;
>> + struct snp_report_req *rreq;
>> + unsigned long fw_err = 0;
>> +
>> + if (!arg->req_data || !arg->resp_data)
>> + return -EINVAL;
>> +
>> + /* Copy the request payload from the userspace */
>> + if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
>> + return -EFAULT;
>> +
>> + rreq = &req.data;
>> +
>> + /* Message version must be non-zero */
>> + if (!rreq->msg_version)
>> + return -EINVAL;
>> +
>> + if (req.certs_len) {
>> + if (req.certs_len > SEV_FW_BLOB_MAX_SIZE ||
>> + !IS_ALIGNED(req.certs_len, PAGE_SIZE))
>> + return -EINVAL;
>> + }
>> +
>> + if (req.certs_address && req.certs_len) {
>> + if (!access_ok(req.certs_address, req.certs_len))
>> + return -EFAULT;
>> +
>> + /*
>> + * Initialize the intermediate buffer with all zero's. This buffer
>> + * is used in the guest request message to get the certs blob from
>> + * the host. If host does not supply any certs in it, then we copy
>> + * zeros to indicate that certificate data was not provided.
>> + */
>> + memset(snp_dev->certs_data, 0, req.certs_len);
>> +
>> + input.data_gpa = __pa(snp_dev->certs_data);
>> + npages = req.certs_len >> PAGE_SHIFT;
>> + }
>> +
>> + /*
>> + * The intermediate response buffer is used while decrypting the
>> + * response payload. Make sure that it has enough space to cover the
>> + * authtag.
>> + */
>> + resp_len = sizeof(resp->data) + crypto->a_len;
>> + resp = kzalloc(resp_len, GFP_KERNEL_ACCOUNT);
>> + if (!resp)
>> + return -ENOMEM;
>> +
>> + if (copy_from_user(resp, (void __user *)arg->resp_data, sizeof(*resp))) {
>> + ret = -EFAULT;
>> + goto e_free;
>> + }
>> +
>> + /* Encrypt the userspace provided payload */
>> + ret = enc_payload(snp_dev, rreq->msg_version, SNP_MSG_REPORT_REQ,
>> + &rreq->user_data, sizeof(rreq->user_data));
>> + if (ret)
>> + goto e_free;
>> +
>> + /* Call firmware to process the request */
>> + input.req_gpa = __pa(snp_dev->request);
>> + input.resp_gpa = __pa(snp_dev->response);
>> + input.data_npages = npages;
>> + memset(snp_dev->response, 0, sizeof(*snp_dev->response));
>> + ret = snp_issue_guest_request(EXT_GUEST_REQUEST, &input, &fw_err);
>> +
>> + /* Popogate any firmware error to the userspace */
>> + arg->fw_err = fw_err;
>> +
>> + /* If certs length is invalid then copy the returned length */
>> + if (arg->fw_err == SNP_GUEST_REQ_INVALID_LEN) {
>> + req.certs_len = input.data_npages << PAGE_SHIFT;
>> +
>> + if (copy_to_user((void __user *)arg->req_data, &req, sizeof(req)))
>> + ret = -EFAULT;
>> +
>> + goto e_free;
>> + }
>> +
>> + if (ret)
>> + goto e_free;
>> +
>> + /* Decrypt the response payload */
>> + ret = verify_and_dec_payload(snp_dev, resp->data, resp_len);
>> + if (ret)
>> + goto e_free;
>> +
>> + /* Copy the certificate data blob to userspace */
>> + if (req.certs_address &&
>> + copy_to_user((void __user *)req.certs_address, snp_dev->certs_data,
>> + req.certs_len)) {
>> + ret = -EFAULT;
>> + goto e_free;
>> + }
>> +
>> + /* Copy the response payload to userspace */
>> + if (copy_to_user((void __user *)arg->resp_data, resp, sizeof(*resp)))
>> + ret = -EFAULT;
>> +
>> +e_free:
>> + kfree(resp);
>> + return ret;
>> +}
>> +
>> static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>> {
>> struct snp_guest_dev *snp_dev = to_snp_dev(file);
>> @@ -368,6 +480,10 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
>> ret = get_derived_key(snp_dev, &input);
>> break;
>> }
>> + case SNP_GET_EXT_REPORT: {
>> + ret = get_ext_report(snp_dev, &input);
>> + break;
>> + }
>> default:
>> break;
>> }
>> @@ -453,6 +569,12 @@ static int __init snp_guest_probe(struct platform_device *pdev)
>> goto e_free_req;
>> }
>>
>> + snp_dev->certs_data = alloc_shared_pages(SEV_FW_BLOB_MAX_SIZE);
>> + if (IS_ERR(snp_dev->certs_data)) {
>> + ret = PTR_ERR(snp_dev->certs_data);
>> + goto e_free_resp;
>> + }
>> +
>> misc = &snp_dev->misc;
>> misc->minor = MISC_DYNAMIC_MINOR;
>> misc->name = DEVICE_NAME;
>> @@ -460,6 +582,9 @@ static int __init snp_guest_probe(struct platform_device *pdev)
>>
>> return misc_register(misc);
>>
>> +e_free_resp:
>> + free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
>> +
>> e_free_req:
>> free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
>>
>> @@ -475,6 +600,7 @@ static int __exit snp_guest_remove(struct platform_device *pdev)
>>
>> free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
>> free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
>> + free_shared_pages(snp_dev->certs_data, SEV_FW_BLOB_MAX_SIZE);
>> deinit_crypto(snp_dev->crypto);
>> misc_deregister(&snp_dev->misc);
>>
>> diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
>> index 621a9167df7a..23659215fcfb 100644
>> --- a/include/uapi/linux/sev-guest.h
>> +++ b/include/uapi/linux/sev-guest.h
>> @@ -57,6 +57,16 @@ struct snp_derived_key_resp {
>> __u8 data[64];
>> };
>>
>> +struct snp_ext_report_req {
>> + struct snp_report_req data;
>> +
>> + /* where to copy the certificate blob */
>> + __u64 certs_address;
>> +
>> + /* length of the certificate blob */
>> + __u32 certs_len;
>> +};
>> +
>> #define SNP_GUEST_REQ_IOC_TYPE 'S'
>>
>> /* Get SNP attestation report */
>> @@ -65,4 +75,7 @@ struct snp_derived_key_resp {
>> /* Get a derived key from the root */
>> #define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)
>>
>> +/* Get SNP extended report as defined in the GHCB specification version 2. */
>> +#define SNP_GET_EXT_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x2, struct snp_user_guest_request)
>> +
>> #endif /* __UAPI_LINUX_SEV_GUEST_H_ */
>> --
>> 2.17.1
>>
>>