2021-06-02 14:05:26

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 00/22] Add AMD Secure Nested Paging (SEV-SNP) Guest Support

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a guest OS for SEV-SNP support.

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as data
replay and memory re-mapping, in order to create an isolated memory
encryption environment.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to an SEV-SNP
VM is a two-step process. First, the hypervisor assigns the page to the
guest using the new RMPUPDATE instruction; this transitions the page to the
guest-invalid state. Second, the guest validates the page using the new
PVALIDATE instruction. SEV-SNP VMs can use the new "Page State Change
Request" NAE event defined in the GHCB specification to ask the hypervisor
to add or remove pages from the RMP table.
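
As a rough sketch of the guest side of that second step, the PVALIDATE
wrapper added later in this series looks roughly like the following (the
CC_SET()/CC_OUT() macros come from <asm/asm.h>; treat the exact error
handling here as illustrative):

  /* Returns PVALIDATE_FAIL_NOUPDATE if the Validated bit was already in the requested state. */
  static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
  {
          bool no_rmpupdate;
          int rc;

          /* PVALIDATE encoding F2 0F 01 FF; inputs: rAX=vaddr, rCX=page size, rDX=validate */
          asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
                       CC_SET(c)
                       : CC_OUT(c) (no_rmpupdate), "=a" (rc)
                       : "a" (vaddr), "c" (rmp_psize), "d" (validate)
                       : "memory", "cc");

          if (no_rmpupdate)
                  return PVALIDATE_FAIL_NOUPDATE;

          return rc;
  }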

Each page assigned to an SEV-SNP VM can be either validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for page validation: pre-validation and
lazy validation.

Under pre-validation, pages are validated prior to first use. Under lazy
validation, pages are validated when they are first accessed. An access to an
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the guest OS.

At this time only pre-validation is supported: the OVMF guest BIOS validates
the entire RAM before control is handed over to the guest kernel. The
early_set_memory_{encrypt,decrypt}() and set_memory_{encrypt,decrypt}()
routines are enlightened to perform page validation or invalidation while
setting or clearing the encryption attribute in the page tables.
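
The ordering matters in both directions: when making a page shared, the page
must be invalidated and its RMP entry flipped before the C-bit is cleared in
the page tables; when making it private, the RMP change and validation happen
after the C-bit is set. A simplified sketch of the shared direction, using
the pvalidate_pages() and early_set_page_state() helpers added later in this
series (the sketch_make_shared() name and the omitted error handling are
illustrative):

  static void __init sketch_make_shared(unsigned long vaddr, unsigned long paddr,
                                        unsigned int npages)
  {
          /* 1. Invalidate the pages (clears the Validated flag via PVALIDATE). */
          pvalidate_pages(vaddr, npages, false);

          /* 2. Ask the hypervisor to flip the RMP entries to the shared state. */
          early_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);

          /* 3. Only now may the C-bit be cleared in the guest page tables. */
  }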

This series does not provide support for the following SEV-SNP features yet:

* Extended Guest request
* CPUID filtering
* Lazy validation
* Interrupt security

The series is based on tip/master commit
493a0d4559fd (origin/master, origin/HEAD) Merge branch 'perf/core'

Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf
(section 15.36)

GHCB spec:
https://developer.amd.com/wp-content/resources/56421.pdf

SEV-SNP firmware specification:
https://developer.amd.com/sev/

Changes since v2:
* Add support for AP startup.
* Add snp_prep_memory() and sev_snp_active() helper.
* Drop sev_snp_active() helper.
* Add sev_feature_enabled() helper to check which SEV feature is active.
* Shorten the GHCB NAE macro names.
* Add snp_msg_seqno() to get the message counter used while building the request for the attestation report.
* Sync the SNP guest message request header with latest SNP FW spec.
* Multiple cleanups and fixes to address the review feedback.

Changes since v1:
* Integrate the SNP support into sev.{c,h}.
* Add support to query the hypervisor feature and detect whether SNP is supported.
* Define Linux specific reason code for the SNP guest termination.
* Extend the setup_header to provide a way for the hypervisor to pass the secrets and CPUID pages.
* Add support to create a platform device and driver to query the attestation report
and to derive a key.
* Multiple cleanups and fixes to address Boris's review feedback.

Brijesh Singh (18):
x86/sev: shorten GHCB terminate macro names
x86/sev: Define the Linux specific guest termination reasons
x86/sev: Save the negotiated GHCB version
x86/mm: Add sev_feature_enabled() helper
x86/sev: Add support for hypervisor feature VMGEXIT
x86/sev: check SEV-SNP features support
x86/sev: Add a helper for the PVALIDATE instruction
x86/compressed: Add helper for validating pages in the decompression
stage
x86/compressed: Register GHCB memory when SEV-SNP is active
x86/sev: Register GHCB memory when SEV-SNP is active
x86/sev: Add helper for validating pages in early enc attribute
changes
x86/kernel: Make the bss.decrypted section shared in RMP table
x86/kernel: Validate rom memory before accessing when SEV-SNP is
active
x86/mm: Add support to validate memory when changing C-bit
KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area
x86/boot: Add Confidential Computing address to setup_header
x86/sev: Register SNP guest request platform device
virt: Add SEV-SNP guest driver

Tom Lendacky (4):
KVM: SVM: Create a separate mapping for the SEV-ES save area
KVM: SVM: Create a separate mapping for the GHCB save area
KVM: SVM: Update the SEV-ES save area mapping
x86/sev-snp: SEV-SNP AP creation support

Documentation/x86/boot.rst | 27 +
arch/x86/boot/compressed/ident_map_64.c | 17 +-
arch/x86/boot/compressed/misc.h | 6 +
arch/x86/boot/compressed/sev.c | 78 ++-
arch/x86/boot/header.S | 7 +-
arch/x86/include/asm/mem_encrypt.h | 9 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/sev-common.h | 76 ++-
arch/x86/include/asm/sev.h | 76 ++-
arch/x86/include/asm/svm.h | 167 ++++++-
arch/x86/include/uapi/asm/bootparam.h | 1 +
arch/x86/include/uapi/asm/svm.h | 9 +
arch/x86/kernel/head64.c | 7 +
arch/x86/kernel/probe_roms.c | 13 +-
arch/x86/kernel/sev-internal.h | 12 +
arch/x86/kernel/sev-shared.c | 74 ++-
arch/x86/kernel/sev.c | 630 +++++++++++++++++++++++-
arch/x86/kernel/smpboot.c | 3 +
arch/x86/kvm/svm/sev.c | 24 +-
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/mm/mem_encrypt.c | 61 ++-
arch/x86/mm/pat/set_memory.c | 14 +
arch/x86/platform/efi/efi.c | 2 +
drivers/virt/Kconfig | 3 +
drivers/virt/Makefile | 1 +
drivers/virt/sevguest/Kconfig | 10 +
drivers/virt/sevguest/Makefile | 4 +
drivers/virt/sevguest/snp.c | 448 +++++++++++++++++
drivers/virt/sevguest/snp.h | 63 +++
include/linux/efi.h | 1 +
include/linux/sev-guest.h | 76 +++
include/uapi/linux/sev-guest.h | 56 +++
33 files changed, 1926 insertions(+), 57 deletions(-)
create mode 100644 arch/x86/kernel/sev-internal.h
create mode 100644 drivers/virt/sevguest/Kconfig
create mode 100644 drivers/virt/sevguest/Makefile
create mode 100644 drivers/virt/sevguest/snp.c
create mode 100644 drivers/virt/sevguest/snp.h
create mode 100644 include/linux/sev-guest.h
create mode 100644 include/uapi/linux/sev-guest.h

--
2.17.1


2021-06-02 14:05:54

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 01/22] x86/sev: shorten GHCB terminate macro names

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 6 +++---
arch/x86/include/asm/sev-common.h | 4 ++--
arch/x86/kernel/sev-shared.c | 2 +-
arch/x86/kernel/sev.c | 4 ++--
4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 670e998fe930..28bcf04c022e 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -122,7 +122,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
static bool early_setup_sev_es(void)
{
if (!sev_es_negotiate_protocol())
- sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
+ sev_es_terminate(GHCB_SEV_ES_PROT_UNSUPPORTED);

if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
@@ -175,7 +175,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
enum es_result result;

if (!boot_ghcb && !early_setup_sev_es())
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);

vc_ghcb_invalidate(boot_ghcb);
result = vc_init_em_ctxt(&ctxt, regs, exit_code);
@@ -202,5 +202,5 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
if (result == ES_OK)
vc_finish_insn(&ctxt);
else if (result != ES_RETRY)
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
}
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 629c3df243f0..11b7d9cea775 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -54,8 +54,8 @@
(((((u64)reason_set) & GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) | \
((((u64)reason_val) & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS))

-#define GHCB_SEV_ES_REASON_GENERAL_REQUEST 0
-#define GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED 1
+#define GHCB_SEV_ES_GEN_REQ 0
+#define GHCB_SEV_ES_PROT_UNSUPPORTED 1

#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 6ec8b3bfd76e..14198075ff8b 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -207,7 +207,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)

fail:
/* Terminate the guest */
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
}

static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 9578c82832aa..460717e3f72d 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1383,7 +1383,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
show_regs(regs);

/* Ask hypervisor to sev_es_terminate */
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);

/* If that fails and we get here - just panic */
panic("Returned from Terminate-Request to Hypervisor\n");
@@ -1416,7 +1416,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)

/* Do initial setup or terminate the guest */
if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
- sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+ sev_es_terminate(GHCB_SEV_ES_GEN_REQ);

vc_ghcb_invalidate(boot_ghcb);

--
2.17.1

2021-06-02 14:05:55

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 02/22] x86/sev: Define the Linux specific guest termination reasons

The GHCB specification defines the reason codes for reason set 0. The reason
codes defined in set 0 do not cover all possible causes for a guest to
request termination.

Reason sets 1 to 255 are reserved for vendor-specific codes. Reserve reason
set 1 for the Linux guest and define error codes for it.

While at it, change sev_es_terminate() to accept a reason-set parameter.
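
For example, later patches in this series report a PVALIDATE failure using
the Linux-specific set, roughly as:

  /* Reason set 1 is the Linux-specific set; GHCB_TERM_PVALIDATE is defined below. */
  sev_es_terminate(1, GHCB_TERM_PVALIDATE);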

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 6 +++---
arch/x86/include/asm/sev-common.h | 5 +++++
arch/x86/kernel/sev-shared.c | 6 +++---
arch/x86/kernel/sev.c | 4 ++--
4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 28bcf04c022e..87621f4e4703 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -122,7 +122,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
static bool early_setup_sev_es(void)
{
if (!sev_es_negotiate_protocol())
- sev_es_terminate(GHCB_SEV_ES_PROT_UNSUPPORTED);
+ sev_es_terminate(0, GHCB_SEV_ES_PROT_UNSUPPORTED);

if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
@@ -175,7 +175,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
enum es_result result;

if (!boot_ghcb && !early_setup_sev_es())
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);

vc_ghcb_invalidate(boot_ghcb);
result = vc_init_em_ctxt(&ctxt, regs, exit_code);
@@ -202,5 +202,5 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
if (result == ES_OK)
vc_finish_insn(&ctxt);
else if (result != ES_RETRY)
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);
}
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 11b7d9cea775..f1e2aacb0d61 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,4 +59,9 @@

#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)

+/* Linux specific reason codes (used with reason set 1) */
+#define GHCB_TERM_REGISTER 0 /* GHCB GPA registration failure */
+#define GHCB_TERM_PSC 1 /* Page State Change failure */
+#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
+
#endif
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 14198075ff8b..de0e7e6c52b8 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
return true;
}

-static void __noreturn sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;

@@ -32,7 +32,7 @@ static void __noreturn sev_es_terminate(unsigned int reason)
* Tell the hypervisor what went wrong - only reason-set 0 is
* currently supported.
*/
- val |= GHCB_SEV_TERM_REASON(0, reason);
+ val |= GHCB_SEV_TERM_REASON(set, reason);

/* Request Guest Termination from Hypvervisor */
sev_es_wr_ghcb_msr(val);
@@ -207,7 +207,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)

fail:
/* Terminate the guest */
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);
}

static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 460717e3f72d..77a754365ba9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1383,7 +1383,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
show_regs(regs);

/* Ask hypervisor to sev_es_terminate */
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);

/* If that fails and we get here - just panic */
panic("Returned from Terminate-Request to Hypervisor\n");
@@ -1416,7 +1416,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)

/* Do initial setup or terminate the guest */
if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);

vc_ghcb_invalidate(boot_ghcb);

--
2.17.1

2021-06-02 14:06:17

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 03/22] x86/sev: Save the negotiated GHCB version

An SEV-ES guest calls sev_es_negotiate_protocol() to negotiate the GHCB
protocol version before establishing the GHCB. Cache the negotiated GHCB
version so that it can be used later.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 2 +-
arch/x86/kernel/sev-shared.c | 15 ++++++++++++---
2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index fa5cd05d3b5b..7ec91b1359df 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -12,7 +12,7 @@
#include <asm/insn.h>
#include <asm/sev-common.h>

-#define GHCB_PROTO_OUR 0x0001UL
+#define GHCB_PROTOCOL_MIN 1ULL
#define GHCB_PROTOCOL_MAX 1ULL
#define GHCB_DEFAULT_USAGE 0ULL

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index de0e7e6c52b8..70f181f20d92 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -14,6 +14,13 @@
#define has_cpuflag(f) boot_cpu_has(f)
#endif

+/*
+ * Since feature negotiation related variables are set early in the boot
+ * process they must reside in the .data section so as not to be zeroed
+ * out when the .bss section is later cleared.
+ */
+static u16 ghcb_version __section(".data");
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -54,10 +61,12 @@ static bool sev_es_negotiate_protocol(void)
if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
return false;

- if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTO_OUR ||
- GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR)
+ if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
+ GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
return false;

+ ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+
return true;
}

@@ -101,7 +110,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
enum es_result ret;

/* Fill in protocol and format specifiers */
- ghcb->protocol_version = GHCB_PROTOCOL_MAX;
+ ghcb->protocol_version = ghcb_version;
ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;

ghcb_set_sw_exit_code(ghcb, exit_code);
--
2.17.1

2021-06-02 14:06:35

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 06/22] x86/sev: check SEV-SNP features support

Version 2 of the GHCB specification added the advertisement of features
that are supported by the hypervisor. If the hypervisor supports SEV-SNP,
then it must set the SEV-SNP feature bit to indicate that base SEV-SNP is
supported.

Check for the SEV-SNP feature while establishing the GHCB; if the check
fails, terminate the guest.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 22 ++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 3 +++
arch/x86/kernel/sev-shared.c | 11 +++++++++++
arch/x86/kernel/sev.c | 4 ++++
4 files changed, 40 insertions(+)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 87621f4e4703..0745ea61d32e 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -25,6 +25,7 @@

struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
struct ghcb *boot_ghcb;
+static u64 msr_sev_status;

/*
* Copy a version of this function here - insn-eval.c can't be used in
@@ -119,11 +120,32 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"

+static inline bool sev_snp_enabled(void)
+{
+ unsigned long low, high;
+
+ if (!msr_sev_status) {
+ asm volatile("rdmsr\n"
+ : "=a" (low), "=d" (high)
+ : "c" (MSR_AMD64_SEV));
+ msr_sev_status = (high << 32) | low;
+ }
+
+ return msr_sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+}
+
static bool early_setup_sev_es(void)
{
if (!sev_es_negotiate_protocol())
sev_es_terminate(0, GHCB_SEV_ES_PROT_UNSUPPORTED);

+ /*
+ * If SEV-SNP is enabled, then check if the hypervisor supports the SEV-SNP
+ * features.
+ */
+ if (sev_snp_enabled() && !sev_snp_check_hypervisor_features())
+ sev_es_terminate(0, GHCB_SEV_ES_SNP_UNSUPPORTED);
+
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 981fff2257b9..3ebf00772f26 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -51,6 +51,8 @@
#define GHCB_MSR_HV_FT_POS 12
#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)

+#define GHCB_HV_FT_SNP BIT_ULL(0)
+
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
(((unsigned long)((v) & GHCB_MSR_HV_FT_MASK) >> GHCB_MSR_HV_FT_POS))

@@ -65,6 +67,7 @@

#define GHCB_SEV_ES_GEN_REQ 0
#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
+#define GHCB_SEV_ES_SNP_UNSUPPORTED 2

#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 94957c5bdb51..b8312ad66120 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -32,6 +32,17 @@ static bool __init sev_es_check_cpu_features(void)
return true;
}

+static bool __init sev_snp_check_hypervisor_features(void)
+{
+ if (ghcb_version < 2)
+ return false;
+
+ if (!(hv_features & GHCB_HV_FT_SNP))
+ return false;
+
+ return true;
+}
+
static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 77a754365ba9..9b70b7332614 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -609,6 +609,10 @@ static bool __init sev_es_setup_ghcb(void)
if (!sev_es_negotiate_protocol())
return false;

+ /* If SNP is active, make sure that hypervisor supports the feature. */
+ if (sev_feature_enabled(SEV_SNP) && !sev_snp_check_hypervisor_features())
+ sev_es_terminate(0, GHCB_SEV_ES_SNP_UNSUPPORTED);
+
/*
* Clear the boot_ghcb. The first exception comes in before the bss
* section is cleared.
--
2.17.1

2021-06-02 14:06:35

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 04/22] x86/mm: Add sev_feature_enabled() helper

The sev_feature_enabled() helper can be used by the guest to query whether
a given SEV feature, such as SNP (Secure Nested Paging), is active.
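
Callers pass one of the enum sev_feature_type values; for instance
(illustrative only):

  if (sev_feature_enabled(SEV_SNP))
          pr_info("SEV-SNP is active\n");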

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 9 +++++++++
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/mm/mem_encrypt.c | 14 ++++++++++++++
3 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9c80c68d75b5..bcc00d0d7c20 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -16,6 +16,12 @@

#include <asm/bootparam.h>

+enum sev_feature_type {
+ SEV,
+ SEV_ES,
+ SEV_SNP
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT

extern u64 sme_me_mask;
@@ -53,6 +59,7 @@ void __init sev_es_init_vc_handling(void);
bool sme_active(void);
bool sev_active(void);
bool sev_es_active(void);
+bool sev_feature_enabled(unsigned int feature_type);

#define __bss_decrypted __section(".bss..decrypted")

@@ -78,6 +85,7 @@ static inline void sev_es_init_vc_handling(void) { }
static inline bool sme_active(void) { return false; }
static inline bool sev_active(void) { return false; }
static inline bool sev_es_active(void) { return false; }
+static inline bool sev_snp_active(void) { return false; }

static inline int __init
early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
@@ -85,6 +93,7 @@ static inline int __init
early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }

static inline void mem_encrypt_free_decrypted_mem(void) { }
+static inline bool sev_feature_enabled(unsigned int feature_type) { return false; }

#define __bss_decrypted

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 211ba3375ee9..69ce50fa3565 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -481,8 +481,10 @@
#define MSR_AMD64_SEV 0xc0010131
#define MSR_AMD64_SEV_ENABLED_BIT 0
#define MSR_AMD64_SEV_ES_ENABLED_BIT 1
+#define MSR_AMD64_SEV_SNP_ENABLED_BIT 2
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
+#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)

#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ff08dc463634..63e7799a9a86 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -389,6 +389,16 @@ bool noinstr sev_es_active(void)
return sev_status & MSR_AMD64_SEV_ES_ENABLED;
}

+bool sev_feature_enabled(unsigned int type)
+{
+ switch (type) {
+ case SEV: return sev_status & MSR_AMD64_SEV_ENABLED;
+ case SEV_ES: return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+ case SEV_SNP: return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+ default: return false;
+ }
+}
+
/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
bool force_dma_unencrypted(struct device *dev)
{
@@ -461,6 +471,10 @@ static void print_mem_encrypt_feature_info(void)
if (sev_es_active())
pr_cont(" SEV-ES");

+ /* Secure Nested Paging */
+ if (sev_feature_enabled(SEV_SNP))
+ pr_cont(" SEV-SNP");
+
pr_cont("\n");
}

--
2.17.1

2021-06-02 14:06:52

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 05/22] x86/sev: Add support for hypervisor feature VMGEXIT

Version 2 of the GHCB specification introduced the advertisement of features
that are supported by the hypervisor. Define the GHCB MSR protocol and NAE
event for the hypervisor feature request, and query the features during GHCB
protocol negotiation. See the GHCB specification for more details.

Version 2 of the GHCB specification adds several new NAE events; all of them
are optional except the hypervisor feature request. Now that the hypervisor
feature NAE event is implemented, bump the maximum supported GHCB protocol
version.
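
The negotiated feature bitmap is cached in hv_features, so later code only
needs to test the relevant bit. As a sketch, the SEV-SNP support check added
by a subsequent patch boils down to:

  /* GHCB_HV_FT_SNP (bit 0) is defined by a later patch in this series. */
  if (ghcb_version < 2 || !(hv_features & GHCB_HV_FT_SNP))
          return false;   /* hypervisor does not advertise SEV-SNP support */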

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 9 +++++++++
arch/x86/include/asm/sev.h | 2 +-
arch/x86/kernel/sev-shared.c | 21 +++++++++++++++++++++
3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index f1e2aacb0d61..981fff2257b9 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -45,6 +45,15 @@
(((unsigned long)reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS) | \
(((unsigned long)fn) << GHCB_MSR_CPUID_FUNC_POS))

+/* GHCB Hypervisor Feature Request */
+#define GHCB_MSR_HV_FT_REQ 0x080
+#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_HV_FT_RESP_VAL(v) \
+ (((unsigned long)((v) & GHCB_MSR_HV_FT_MASK) >> GHCB_MSR_HV_FT_POS))
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7ec91b1359df..134a7c9d91b6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -13,7 +13,7 @@
#include <asm/sev-common.h>

#define GHCB_PROTOCOL_MIN 1ULL
-#define GHCB_PROTOCOL_MAX 1ULL
+#define GHCB_PROTOCOL_MAX 2ULL
#define GHCB_DEFAULT_USAGE 0ULL

#define VMGEXIT() { asm volatile("rep; vmmcall\n\r"); }
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 70f181f20d92..94957c5bdb51 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -20,6 +20,7 @@
* out when the .bss section is later cleared.
*/
static u16 ghcb_version __section(".data");
+static u64 hv_features __section(".data");

static bool __init sev_es_check_cpu_features(void)
{
@@ -49,6 +50,22 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}

+static bool get_hv_features(void)
+{
+ u64 val;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_HV_FT_RESP)
+ return false;
+
+ hv_features = GHCB_MSR_HV_FT_RESP_VAL(val);
+
+ return true;
+}
+
static bool sev_es_negotiate_protocol(void)
{
u64 val;
@@ -67,6 +84,10 @@ static bool sev_es_negotiate_protocol(void)

ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);

+ /* The hypervisor features are available from version 2 onward. */
+ if ((ghcb_version >= 2) && !get_hv_features())
+ return false;
+
return true;
}

--
2.17.1

2021-06-02 14:07:00

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 17/22] KVM: SVM: Create a separate mapping for the GHCB save area

From: Tom Lendacky <[email protected]>

The initial implementation of the GHCB spec was based on trying to keep the
register state offsets the same relative to the VM save area. However, the
hardware has since changed the SEV-ES save area layout, so the SEV-ES save
area no longer lines up with the GHCB save area.

This is the second step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create a GHCB save area that matches the GHCB specification.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 48 +++++++++++++++++++++++++++++++++++---
1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 482fb20104da..f5edfc552240 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -346,9 +346,49 @@ struct sev_es_save_area {
u64 x87_state_gpa;
} __packed;

+struct ghcb_save_area {
+ u8 reserved_1[203];
+ u8 cpl;
+ u8 reserved_2[116];
+ u64 xss;
+ u8 reserved_3[24];
+ u64 dr7;
+ u8 reserved_4[16];
+ u64 rip;
+ u8 reserved_5[88];
+ u64 rsp;
+ u8 reserved_6[24];
+ u64 rax;
+ u8 reserved_7[264];
+ u64 rcx;
+ u64 rdx;
+ u64 rbx;
+ u8 reserved_8[8];
+ u64 rbp;
+ u64 rsi;
+ u64 rdi;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+ u8 reserved_9[16];
+ u64 sw_exit_code;
+ u64 sw_exit_info_1;
+ u64 sw_exit_info_2;
+ u64 sw_scratch;
+ u8 reserved_10[56];
+ u64 xcr0;
+ u8 valid_bitmap[16];
+ u64 x87_state_gpa;
+} __packed;
+
struct ghcb {
- struct sev_es_save_area save;
- u8 reserved_save[2048 - sizeof(struct sev_es_save_area)];
+ struct ghcb_save_area save;
+ u8 reserved_save[2048 - sizeof(struct ghcb_save_area)];

u8 shared_buffer[2032];

@@ -359,6 +399,7 @@ struct ghcb {


#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
+#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 272
#define EXPECTED_GHCB_SIZE PAGE_SIZE
@@ -366,6 +407,7 @@ struct ghcb {
static inline void __unused_size_checks(void)
{
BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
+ BUILD_BUG_ON(sizeof(struct ghcb_save_area) != EXPECTED_GHCB_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
@@ -437,7 +479,7 @@ struct vmcb {
/* GHCB Accessor functions */

#define GHCB_BITMAP_IDX(field) \
- (offsetof(struct sev_es_save_area, field) / sizeof(u64))
+ (offsetof(struct ghcb_save_area, field) / sizeof(u64))

#define DEFINE_GHCB_ACCESSORS(field) \
static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
--
2.17.1

2021-06-02 14:07:15

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

The early_set_memory_{encrypt,decrypt}() routines are used to change a
page from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.

If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:

1. Issue the page state change VMGEXIT to add the page as private
in the RMP table.
2. Validate the page after it has been successfully added in the RMP table.

To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before clearing the
encryption attribute from the page table.

1. Invalidate the page.
2. Issue the page state change VMGEXIT to make the page shared in the
RMP table.

The early_set_memory_{encrypt,decrypt}() routines can be called before the
GHCB is established, so use the SNP page state MSR protocol VMGEXIT defined
in the GHCB specification to request the page state change in the RMP table.

While at it, add a helper snp_prep_memory() that can be used outside the
SEV-specific files to change the page state for a specified memory
range.
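
For example, a caller outside the SEV core could make a physical range
private (RMP-assigned and then validated) with something like the following,
where paddr and size are illustrative:

  /* Transition the range [paddr, paddr + size) to the private state. */
  snp_prep_memory(paddr, size, MEMORY_PRIVATE);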

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 20 +++++++
arch/x86/kernel/sev.c | 105 +++++++++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt.c | 47 ++++++++++++++++-
3 files changed, 170 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index c41c786d69fe..7c2cb5300e43 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -65,6 +65,12 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMP page size */
#define RMP_PG_SIZE_4K 0

+/* Memory operation for snp_prep_memory() */
+enum snp_mem_op {
+ MEMORY_PRIVATE,
+ MEMORY_SHARED
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -103,6 +109,11 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)

return rc;
}
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages);
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages);
+void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -110,6 +121,15 @@ static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { ret
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
+static inline void __init
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages)
+{
+}
+static inline void __init
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages)
+{
+}
+static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) { }
#endif

#endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 455c09a9b2c2..6e9b45bb38ab 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -532,6 +532,111 @@ static u64 get_jump_table_addr(void)
return ret;
}

+static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
+{
+ unsigned long vaddr_end;
+ int rc;
+
+ vaddr = vaddr & PAGE_MASK;
+ vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+ while (vaddr < vaddr_end) {
+ rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+ if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
+ sev_es_terminate(1, GHCB_TERM_PVALIDATE);
+
+ vaddr = vaddr + PAGE_SIZE;
+ }
+}
+
+static void __init early_set_page_state(unsigned long paddr, unsigned int npages, int op)
+{
+ unsigned long paddr_end;
+ u64 val;
+
+ paddr = paddr & PAGE_MASK;
+ paddr_end = paddr + (npages << PAGE_SHIFT);
+
+ while (paddr < paddr_end) {
+ /*
+ * Use the MSR protocol because this function can be called before the GHCB
+ * is established.
+ */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
+ goto e_term;
+
+ if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
+ "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
+ op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
+ paddr, GHCB_MSR_PSC_RESP_VAL(val)))
+ goto e_term;
+
+ paddr = paddr + PAGE_SIZE;
+ }
+
+ return;
+
+e_term:
+ sev_es_terminate(1, GHCB_TERM_PSC);
+}
+
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ /* Ask hypervisor to add the memory pages in RMP table as a 'private'. */
+ early_set_page_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+ /* Validate the memory pages after they've been added in the RMP table. */
+ pvalidate_pages(vaddr, npages, 1);
+}
+
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ /*
+ * Invalidate the memory pages before they are marked shared in the
+ * RMP table.
+ */
+ pvalidate_pages(vaddr, npages, 0);
+
+ /* Ask hypervisor to make the memory pages shared in the RMP table. */
+ early_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
+}
+
+void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
+{
+ unsigned long vaddr, npages;
+
+ vaddr = (unsigned long)__va(paddr);
+ npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ switch (op) {
+ case MEMORY_PRIVATE: {
+ early_snp_set_memory_private(vaddr, paddr, npages);
+ return;
+ }
+ case MEMORY_SHARED: {
+ early_snp_set_memory_shared(vaddr, paddr, npages);
+ return;
+ }
+ default:
+ break;
+ }
+
+ WARN(1, "invalid memory op %d\n", op);
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 63e7799a9a86..45d9feb0151a 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -30,6 +30,7 @@
#include <asm/processor-flags.h>
#include <asm/msr.h>
#include <asm/cmdline.h>
+#include <asm/sev.h>

#include "mm_internal.h"

@@ -48,6 +49,34 @@ EXPORT_SYMBOL_GPL(sev_enable_key);
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);

+/*
+ * When SNP is active, changes the page state from private to shared before
+ * copying the data from the source to destination and restore after the copy.
+ * This is required because the source address is mapped as decrypted by the
+ * caller of the routine.
+ */
+static inline void __init snp_memcpy(void *dst, void *src, size_t sz,
+ unsigned long paddr, bool decrypt)
+{
+ unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ if (!sev_feature_enabled(SEV_SNP) || !decrypt) {
+ memcpy(dst, src, sz);
+ return;
+ }
+
+ /*
+ * If the paddr needs to be accessed decrypted, mark the page
+ * shared in the RMP table before copying it.
+ */
+ early_snp_set_memory_shared((unsigned long)__va(paddr), paddr, npages);
+
+ memcpy(dst, src, sz);
+
+ /* Restore the page state after the memcpy. */
+ early_snp_set_memory_private((unsigned long)__va(paddr), paddr, npages);
+}
+
/*
* This routine does not change the underlying encryption setting of the
* page(s) that map this memory. It assumes that eventually the memory is
@@ -96,8 +125,8 @@ static void __init __sme_early_enc_dec(resource_size_t paddr,
* Use a temporary buffer, of cache-line multiple size, to
* avoid data corruption as documented in the APM.
*/
- memcpy(sme_early_buffer, src, len);
- memcpy(dst, sme_early_buffer, len);
+ snp_memcpy(sme_early_buffer, src, len, paddr, enc);
+ snp_memcpy(dst, sme_early_buffer, len, paddr, !enc);

early_memunmap(dst, len);
early_memunmap(src, len);
@@ -277,9 +306,23 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
else
sme_early_decrypt(pa, size);

+ /*
+ * If page is getting mapped decrypted in the page table, then the page state
+ * change in the RMP table must happen before the page table updates.
+ */
+ if (!enc)
+ early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);
+
/* Change the page encryption mask. */
new_pte = pfn_pte(pfn, new_prot);
set_pte_atomic(kpte, new_pte);
+
+ /*
+ * If page is set encrypted in the page table, then update the RMP table to
+ * add this page as private.
+ */
+ if (enc)
+ early_snp_set_memory_private((unsigned long)__va(pa), pa, 1);
}

static int __init early_set_memory_enc_dec(unsigned long vaddr,
--
2.17.1

2021-06-02 14:08:15

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 12/22] x86/kernel: Make the bss.decrypted section shared in RMP table

The encryption attribute for the bss.decrypted region is cleared in the
initial page table build. This is because the section contains data
that needs to be shared between the guest and the hypervisor.

When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. The page state also needs to be updated in the
RMP table.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/head64.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..f4c3e632345a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -288,7 +288,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
if (mem_encrypt_active()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
+
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ /*
+ * When SEV-SNP is active then transition the page to shared in the RMP
+ * table so that it is consistent with the page table attribute change.
+ */
+ early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+
i = pmd_index(vaddr);
pmd[i] -= sme_get_me_mask();
}
--
2.17.1

2021-06-02 14:08:26

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 18/22] KVM: SVM: Update the SEV-ES save area mapping

From: Tom Lendacky <[email protected]>

This is the final step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Update the SEV-ES/SEV-SNP save area to match the APM. This save
area will be used for the upcoming SEV-SNP AP Creation NAE event support.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 66 +++++++++++++++++++++++++++++---------
1 file changed, 50 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index f5edfc552240..b6f358d6b975 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -282,7 +282,13 @@ struct sev_es_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u8 reserved_1[43];
+ u64 vmpl0_ssp;
+ u64 vmpl1_ssp;
+ u64 vmpl2_ssp;
+ u64 vmpl3_ssp;
+ u64 u_cet;
+ u8 reserved_1[2];
+ u8 vmpl;
u8 cpl;
u8 reserved_2[4];
u64 efer;
@@ -295,9 +301,19 @@ struct sev_es_save_area {
u64 dr6;
u64 rflags;
u64 rip;
- u8 reserved_4[88];
+ u64 dr0;
+ u64 dr1;
+ u64 dr2;
+ u64 dr3;
+ u64 dr0_addr_mask;
+ u64 dr1_addr_mask;
+ u64 dr2_addr_mask;
+ u64 dr3_addr_mask;
+ u8 reserved_4[24];
u64 rsp;
- u8 reserved_5[24];
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
u64 rax;
u64 star;
u64 lstar;
@@ -308,7 +324,7 @@ struct sev_es_save_area {
u64 sysenter_esp;
u64 sysenter_eip;
u64 cr2;
- u8 reserved_6[32];
+ u8 reserved_5[32];
u64 g_pat;
u64 dbgctl;
u64 br_from;
@@ -317,12 +333,12 @@ struct sev_es_save_area {
u64 last_excp_to;
u8 reserved_7[80];
u32 pkru;
- u8 reserved_9[20];
- u64 reserved_10; /* rax already available at 0x01f8 */
+ u8 reserved_8[20];
+ u64 reserved_9; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_11; /* rsp already available at 0x01d8 */
+ u64 reserved_10; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -334,16 +350,34 @@ struct sev_es_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_12[16];
- u64 sw_exit_code;
- u64 sw_exit_info_1;
- u64 sw_exit_info_2;
- u64 sw_scratch;
+ u8 reserved_11[16];
+ u64 guest_exit_info_1;
+ u64 guest_exit_info_2;
+ u64 guest_exit_int_info;
+ u64 guest_nrip;
u64 sev_features;
- u8 reserved_13[48];
+ u64 vintr_ctrl;
+ u64 guest_exit_code;
+ u64 virtual_tom;
+ u64 tlb_id;
+ u64 pcpu_id;
+ u64 event_inj;
u64 xcr0;
- u8 valid_bitmap[16];
- u64 x87_state_gpa;
+ u8 reserved_12[16];
+
+ /* Floating point area */
+ u64 x87_dp;
+ u32 mxcsr;
+ u16 x87_ftw;
+ u16 x87_fsw;
+ u16 x87_fcw;
+ u16 x87_fop;
+ u16 x87_ds;
+ u16 x87_cs;
+ u64 x87_rip;
+ u8 fpreg_x87[80];
+ u8 fpreg_xmm[256];
+ u8 fpreg_ymm[256];
} __packed;

struct ghcb_save_area {
@@ -400,7 +434,7 @@ struct ghcb {

#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
-#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1648
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 272
#define EXPECTED_GHCB_SIZE PAGE_SIZE

--
2.17.1

2021-06-02 14:08:37

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device

Version 2 of the GHCB specification provides NAE events that the SNP guest
can use to communicate with the PSP without risk from a malicious hypervisor
that wishes to read, alter, drop, or replay the messages sent.

The hypervisor uses the SNP_GUEST_REQUEST command interface provided by
the SEV-SNP firmware to forward the guest messages to the PSP.

In order to communicate with the PSP, the guest needs to locate the secrets
page inserted by the hypervisor during the SEV-SNP guest launch. The
secrets page contains the communication keys used to send and receive the
encrypted messages between the guest and the PSP.

The secrets page is located either through the setup_data cc_blob_address
or the EFI configuration table.

Create a platform device that the SNP guest driver can bind to in order to
get the platform resources. The SNP guest driver can then provide a
userspace interface to get the attestation report, derive keys, etc.

The helper snp_issue_guest_request() will be used by drivers to send the
guest message request to the hypervisor. The guest message header contains a
message count, which is used in the IV. The firmware increments the message
count by 1 and expects that the next message will use the incremented count.

The helper snp_msg_seqno() will be used by the driver to get the next message
sequence counter, which is automatically incremented by
snp_issue_guest_request(). The incremented value is saved in the secrets
page so that a kexec'ed kernel knows where to begin.
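
A driver using this interface would, roughly, read the next sequence number,
place it in the message header (and thus the IV), and hand the physical
addresses of its request and response pages to the helper. A sketch, where
req_page/resp_page and the error handling are illustrative:

  struct snp_guest_request_data input = {
          .req_gpa  = __pa(req_page),   /* encrypted guest message for the PSP */
          .resp_gpa = __pa(resp_page),  /* PSP writes the encrypted reply here */
  };
  u64 seqno = snp_msg_seqno();          /* 0 means the counter cannot be used */
  unsigned long rc;

  if (!seqno)
          return -EIO;

  /* seqno goes into the message header and is used to construct the IV. */
  rc = snp_issue_guest_request(GUEST_REQUEST, &input);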

See SEV-SNP and GHCB spec for more details.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 12 +++
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kernel/sev.c | 176 ++++++++++++++++++++++++++++++++
arch/x86/platform/efi/efi.c | 2 +
include/linux/efi.h | 1 +
include/linux/sev-guest.h | 76 ++++++++++++++
6 files changed, 269 insertions(+)
create mode 100644 include/linux/sev-guest.h

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 640108402ae9..da2f757cd9bc 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -59,6 +59,18 @@ extern void vc_no_ghcb(void);
extern void vc_boot_ghcb(void);
extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

+/* AMD SEV Confidential computing blob structure */
+#define CC_BLOB_SEV_HDR_MAGIC 0x45444d41
+struct cc_blob_sev_info {
+ u32 magic;
+ u16 version;
+ u16 reserved;
+ u64 secrets_phys;
+ u32 secrets_len;
+ u64 cpuid_phys;
+ u32 cpuid_len;
+};
+
/* Software defined (when rFlags.CF = 1) */
#define PVALIDATE_FAIL_NOUPDATE 255

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index c0152186a008..bd64f2b98ac7 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,7 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
#define SVM_VMGEXIT_AP_CREATION 0x80000013
#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
#define SVM_VMGEXIT_AP_CREATE 1
@@ -222,6 +223,7 @@
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
{ SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
+ { SVM_VMGEXIT_GUEST_REQUEST, "vmgexit_guest_request" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }


diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 8f7ef35a25ef..8aae1166f52e 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -9,6 +9,7 @@

#define pr_fmt(fmt) "SEV-ES: " fmt

+#include <linux/platform_device.h>
#include <linux/sched/debug.h> /* For show_regs() */
#include <linux/percpu-defs.h>
#include <linux/mem_encrypt.h>
@@ -16,10 +17,13 @@
#include <linux/printk.h>
#include <linux/mm_types.h>
#include <linux/set_memory.h>
+#include <linux/sev-guest.h>
#include <linux/memblock.h>
#include <linux/kernel.h>
+#include <linux/efi.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/io.h>

#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -33,6 +37,7 @@
#include <asm/smp.h>
#include <asm/cpu.h>
#include <asm/apic.h>
+#include <asm/setup.h> /* For struct boot_params */

#include "sev-internal.h"

@@ -47,6 +52,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
*/
static struct ghcb __initdata *boot_ghcb;

+static unsigned long snp_secrets_phys;
+
/* #VC handler runtime per-CPU data */
struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -105,6 +112,10 @@ struct ghcb_state {
struct ghcb *ghcb;
};

+#ifdef CONFIG_EFI
+extern unsigned long cc_blob_phys;
+#endif
+
static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);

@@ -1909,3 +1920,168 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
while (true)
halt();
}
+
+static struct resource guest_req_res[1];
+static struct platform_device guest_req_device = {
+ .name = "snp-guest",
+ .id = -1,
+ .resource = guest_req_res,
+ .num_resources = 1,
+};
+
+static struct snp_secrets_page_layout *snp_map_secrets_page(void)
+{
+ u16 __iomem *secrets;
+
+ if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
+ return NULL;
+
+ secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
+ if (!secrets)
+ return NULL;
+
+ return (struct snp_secrets_page_layout *)secrets;
+}
+
+u64 snp_msg_seqno(void)
+{
+ struct snp_secrets_page_layout *layout;
+ u64 count;
+
+ layout = snp_map_secrets_page();
+ if (layout == NULL)
+ return 0;
+
+ /* Read the current message sequence counter from secrets pages */
+ count = readl(&layout->os_area.msg_seqno_0);
+
+ iounmap(layout);
+
+ /*
+ * The message sequence counter for the SNP guest request is a 64-bit value
+ * but the version 2 of GHCB specification defines the 32-bit storage for the
+ * it.
+ */
+ if ((count + 1) >= INT_MAX)
+ return 0;
+
+ return count + 1;
+}
+EXPORT_SYMBOL_GPL(snp_msg_seqno);
+
+static void snp_gen_msg_seqno(void)
+{
+ struct snp_secrets_page_layout *layout;
+ u64 count;
+
+ layout = snp_map_secrets_page();
+ if (layout == NULL)
+ return;
+
+ /* Increment the sequence counter by 2 and save in secrets page. */
+ count = readl(&layout->os_area.msg_seqno_0);
+ count += 2;
+
+ writel(count, &layout->os_area.msg_seqno_0);
+ iounmap(layout);
+}
+
+static int get_snp_secrets_resource(struct resource *res)
+{
+ struct setup_header *hdr = &boot_params.hdr;
+ struct cc_blob_sev_info *info;
+ unsigned long paddr;
+ int ret = -ENODEV;
+
+ /*
+ * The secret page contains the VM encryption key used for encrypting the
+ * messages between the guest and the PSP. The secrets page location is
+ * available either through the setup_data or EFI configuration table.
+ */
+ if (hdr->cc_blob_address) {
+ paddr = hdr->cc_blob_address;
+ } else if (efi_enabled(EFI_CONFIG_TABLES)) {
+#ifdef CONFIG_EFI
+ paddr = cc_blob_phys;
+#else
+ return -ENODEV;
+#endif
+ } else {
+ return -ENODEV;
+ }
+
+ info = memremap(paddr, sizeof(*info), MEMREMAP_WB);
+ if (!info)
+ return -ENOMEM;
+
+ /* Verify the header that its a valid SEV_SNP CC header */
+ if ((info->magic == CC_BLOB_SEV_HDR_MAGIC) &&
+ info->secrets_phys &&
+ (info->secrets_len == PAGE_SIZE)) {
+ res->start = info->secrets_phys;
+ res->end = info->secrets_phys + info->secrets_len;
+ res->flags = IORESOURCE_MEM;
+ snp_secrets_phys = info->secrets_phys;
+ ret = 0;
+ }
+
+ memunmap(info);
+ return ret;
+}
+
+static int __init add_snp_guest_request(void)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return -ENODEV;
+
+ if (get_snp_secrets_resource(&guest_req_res[0]))
+ return -ENODEV;
+
+ platform_device_register(&guest_req_device);
+ dev_info(&guest_req_device.dev, "registered [secret 0x%llx - 0x%llx]\n",
+ guest_req_res[0].start, guest_req_res[0].end);
+
+ return 0;
+}
+device_initcall(add_snp_guest_request);
+
+unsigned long snp_issue_guest_request(int type, struct snp_guest_request_data *input)
+{
+ struct ghcb_state state;
+ struct ghcb *ghcb;
+ unsigned long id;
+ int ret;
+
+ if (!sev_feature_enabled(SEV_SNP))
+ return -ENODEV;
+
+ if (type == GUEST_REQUEST)
+ id = SVM_VMGEXIT_GUEST_REQUEST;
+ else
+ return -EINVAL;
+
+ ghcb = sev_es_get_ghcb(&state);
+ if (!ghcb)
+ return -ENODEV;
+
+ vc_ghcb_invalidate(ghcb);
+ ghcb_set_rax(ghcb, input->data_gpa);
+ ghcb_set_rbx(ghcb, input->data_npages);
+
+ ret = sev_es_ghcb_hv_call(ghcb, NULL, id, input->req_gpa, input->resp_gpa);
+ if (ret)
+ goto e_put;
+
+ if (ghcb->save.sw_exit_info_2) {
+ ret = ghcb->save.sw_exit_info_2;
+ goto e_put;
+ }
+
+ /* Command was successful, increment the message sequence counter. */
+ snp_gen_msg_seqno();
+
+e_put:
+ sev_es_put_ghcb(&state);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(snp_issue_guest_request);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 8a26e705cb06..2cca9ee6e1d4 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -57,6 +57,7 @@ static unsigned long efi_systab_phys __initdata;
static unsigned long prop_phys = EFI_INVALID_TABLE_ADDR;
static unsigned long uga_phys = EFI_INVALID_TABLE_ADDR;
static unsigned long efi_runtime, efi_nr_tables;
+unsigned long cc_blob_phys;

unsigned long efi_fw_vendor, efi_config_table;

@@ -66,6 +67,7 @@ static const efi_config_table_type_t arch_tables[] __initconst = {
#ifdef CONFIG_X86_UV
{UV_SYSTEM_TABLE_GUID, &uv_systab_phys, "UVsystab" },
#endif
+ {EFI_CC_BLOB_GUID, &cc_blob_phys, "CC blob" },
{},
};

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 6b5d36babfcc..75aeb2a56888 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -344,6 +344,7 @@ void efi_native_runtime_setup(void);
#define EFI_CERT_SHA256_GUID EFI_GUID(0xc1c41626, 0x504c, 0x4092, 0xac, 0xa9, 0x41, 0xf9, 0x36, 0x93, 0x43, 0x28)
#define EFI_CERT_X509_GUID EFI_GUID(0xa5c059a1, 0x94e4, 0x4aa7, 0x87, 0xb5, 0xab, 0x15, 0x5c, 0x2b, 0xf0, 0x72)
#define EFI_CERT_X509_SHA256_GUID EFI_GUID(0x3bd2a492, 0x96c0, 0x4079, 0xb4, 0x20, 0xfc, 0xf9, 0x8e, 0xf1, 0x03, 0xed)
+#define EFI_CC_BLOB_GUID EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42)

/*
* This GUID is used to pass to the kernel proper the struct screen_info
diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
new file mode 100644
index 000000000000..51277448a108
--- /dev/null
+++ b/include/linux/sev-guest.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * AMD Secure Encrypted Virtualization (SEV) guest driver interface
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ */
+
+#ifndef __LINUX_SEV_GUEST_H_
+#define __LINUX_SEV_GUEST_H_
+
+#include <linux/types.h>
+
+enum vmgexit_type {
+ GUEST_REQUEST,
+
+ GUEST_REQUEST_MAX
+};
+
+/*
+ * The secrets page contains 96-bytes of reserved field that can be used by
+ * the guest OS. The guest OS uses the area to save the message sequence
+ * number for each VMPL level.
+ *
+ * See the GHCB spec section Secret page layout for the format for this area.
+ */
+struct secrets_os_area {
+ u32 msg_seqno_0;
+ u32 msg_seqno_1;
+ u32 msg_seqno_2;
+ u32 msg_seqno_3;
+ u64 ap_jump_table_pa;
+ u8 rsvd[40];
+ u8 guest_usage[32];
+} __packed;
+
+#define VMPCK_KEY_LEN 32
+
+/* See the SNP spec secrets page layout section for the structure */
+struct snp_secrets_page_layout {
+ u32 version;
+ u32 imiEn : 1,
+ rsvd1 : 31;
+ u32 fms;
+ u32 rsvd2;
+ u8 gosvw[16];
+ u8 vmpck0[VMPCK_KEY_LEN];
+ u8 vmpck1[VMPCK_KEY_LEN];
+ u8 vmpck2[VMPCK_KEY_LEN];
+ u8 vmpck3[VMPCK_KEY_LEN];
+ struct secrets_os_area os_area;
+ u8 rsvd3[3840];
+} __packed;
+
+struct snp_guest_request_data {
+ unsigned long req_gpa;
+ unsigned long resp_gpa;
+ unsigned long data_gpa;
+ unsigned int data_npages;
+};
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+unsigned long snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input);
+u64 snp_msg_seqno(void);
+#else
+
+static inline unsigned long snp_issue_guest_request(int type,
+ struct snp_guest_request_data *input)
+{
+ return -ENODEV;
+}
+static inline u64 snp_msg_seqno(void) { return 0; }
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+#endif /* __LINUX_SEV_GUEST_H__ */
--
2.17.1

2021-06-02 14:08:47

by Brijesh Singh

Subject: [PATCH Part1 RFC v3 08/22] x86/compressed: Add helper for validating pages in the decompression stage

Many of the integrity guarantees of SEV-SNP are enforced through the
Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
particular page of DRAM should be mapped. The VMs can request the
hypervisor to add pages to the RMP table via the Page State Change VMGEXIT
defined in the GHCB specification. Inside each RMP entry is a Validated
flag; this flag is automatically cleared to 0 by the CPU hardware when a
new RMP entry is created for a guest. Each VM page can be either
validated or invalidated, as indicated by the Validated flag in the RMP
entry. A memory access to a private page that is not validated generates
a #VC. A VM must use the PVALIDATE instruction to validate a private page
before using it.

To maintain the security guarantee of SEV-SNP guests, when transitioning
pages from private to shared, the guest must invalidate the pages before
asking the hypervisor to change the page state to shared in the RMP table.

After the pages are mapped private in the page table, the guest must issue
a page state change VMGEXIT to make the pages private in the RMP table and
validate them.

On boot, the BIOS should have validated the entire system memory. During
the kernel decompression stage, the #VC handler uses set_memory_decrypted()
to make the GHCB page shared (i.e. clear the encryption attribute), and
while exiting from decompression it calls set_page_encrypted() to make the
page private again.

Add the snp_set_page_{private,shared}() helpers, which are used by
set_memory_{decrypt,encrypt}() to change the page state in the RMP table.
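
For orientation, here is a hedged sketch of the MSR-based Page State Change
exchange that the new __page_state_change() helper below performs, using the
GHCB_MSR_PSC_* macros added in this patch. The physical address is purely
illustrative:

    u64 val;

    /*
     * Ask the hypervisor to mark the 4K page at physical address 0x7f000
     * (GFN 0x7f) as shared. SNP_PAGE_STATE_SHARED == 2, so the request
     * encodes to (2UL << 52) | (0x7fUL << 12) | 0x014 == 0x002000000007f014.
     */
    sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(0x7f, SNP_PAGE_STATE_SHARED));
    VMGEXIT();
    val = sev_es_rd_ghcb_msr();

    /* The response must carry code 0x015 and a zero error field (bits 63:32). */
    if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
        sev_es_terminate(1, GHCB_TERM_PSC);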

Signed-off-by: Brijesh Singh <[email protected]>
---
Hi Boris,

As you pointed out in the v2 feedback, the RMP_PG_SIZE_4K macro is later
moved from sev-common.h to a generic header file. You wanted to avoid the
move and define the macro in the generic header from the get-go. But that
generic file is not included in part1 of the series, so I kept the macro
definition in sev-common.h and move it to the generic header in the part2
series. This is mainly to make sure that part1 compiles independently.

-Brijesh

arch/x86/boot/compressed/ident_map_64.c | 17 ++++++++-
arch/x86/boot/compressed/misc.h | 6 ++++
arch/x86/boot/compressed/sev.c | 46 +++++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 19 ++++++++++
arch/x86/include/asm/sev.h | 3 ++
5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..59befc610993 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -274,16 +274,31 @@ static int set_clr_page_flags(struct x86_mapping_info *info,
/*
* Changing encryption attributes of a page requires to flush it from
* the caches.
+ *
+ * If the encryption attribute is being cleared, then change the page
+ * state to shared in the RMP table.
*/
- if ((set | clr) & _PAGE_ENC)
+ if ((set | clr) & _PAGE_ENC) {
clflush_page(address);

+ if (clr)
+ snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT);
+ }
+
/* Update PTE */
pte = *ptep;
pte = pte_set_flags(pte, set);
pte = pte_clear_flags(pte, clr);
set_pte(ptep, pte);

+ /*
+ * If the encryption attribute is being set, then change the page state to
+ * private in the RMP entry. The page state change must be done after
+ * the PTE is updated.
+ */
+ if (set & _PAGE_ENC)
+ snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT);
+
/* Flush TLB after changing encryption attribute */
write_cr3(top_level_pgt);

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index e5612f035498..49a2a5848eec 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -121,12 +121,18 @@ void set_sev_encryption_mask(void);
#ifdef CONFIG_AMD_MEM_ENCRYPT
void sev_es_shutdown_ghcb(void);
extern bool sev_es_check_ghcb_fault(unsigned long address);
+void snp_set_page_private(unsigned long paddr);
+void snp_set_page_shared(unsigned long paddr);
+
#else
static inline void sev_es_shutdown_ghcb(void) { }
static inline bool sev_es_check_ghcb_fault(unsigned long address)
{
return false;
}
+static inline void snp_set_page_private(unsigned long paddr) { }
+static inline void snp_set_page_shared(unsigned long paddr) { }
+
#endif

/* acpi.c */
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 0745ea61d32e..808fe1f6b170 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -134,6 +134,52 @@ static inline bool sev_snp_enabled(void)
return msr_sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}

+static void __page_state_change(unsigned long paddr, int op)
+{
+ u64 val;
+
+ if (!sev_snp_enabled())
+ return;
+
+ /*
+ * If private -> shared then invalidate the page before requesting the
+ * state change in the RMP table.
+ */
+ if ((op == SNP_PAGE_STATE_SHARED) && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
+ goto e_pvalidate;
+
+ /* Issue VMGEXIT to change the page state in RMP table. */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ /* Read the response of the VMGEXIT. */
+ val = sev_es_rd_ghcb_msr();
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
+ sev_es_terminate(1, GHCB_TERM_PSC);
+
+ /*
+ * Now that page is added in the RMP table, validate it so that it is
+ * consistent with the RMP entry.
+ */
+ if ((op == SNP_PAGE_STATE_PRIVATE) && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
+ goto e_pvalidate;
+
+ return;
+
+e_pvalidate:
+ sev_es_terminate(1, GHCB_TERM_PVALIDATE);
+}
+
+void snp_set_page_private(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+}
+
+void snp_set_page_shared(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+}
+
static bool early_setup_sev_es(void)
{
if (!sev_es_negotiate_protocol())
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 3ebf00772f26..1424b8ffde0b 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,25 @@
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
(((unsigned long)((v) & GHCB_MSR_HV_FT_MASK) >> GHCB_MSR_HV_FT_POS))

+#define GHCB_HV_FT_SNP BIT_ULL(0)
+
+/* SNP Page State Change */
+#define GHCB_MSR_PSC_REQ 0x014
+#define SNP_PAGE_STATE_PRIVATE 1
+#define SNP_PAGE_STATE_SHARED 2
+#define GHCB_MSR_PSC_GFN_POS 12
+#define GHCB_MSR_PSC_GFN_MASK GENMASK_ULL(39, 0)
+#define GHCB_MSR_PSC_OP_POS 52
+#define GHCB_MSR_PSC_OP_MASK 0xf
+#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
+ (((unsigned long)((op) & GHCB_MSR_PSC_OP_MASK) << GHCB_MSR_PSC_OP_POS) | \
+ ((unsigned long)((gfn) & GHCB_MSR_PSC_GFN_MASK) << GHCB_MSR_PSC_GFN_POS) | \
+ GHCB_MSR_PSC_REQ)
+
+#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_ERROR_POS 32
+#define GHCB_MSR_PSC_RESP_VAL(val) ((val) >> GHCB_MSR_PSC_ERROR_POS)
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 1b7a172b832b..c41c786d69fe 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -62,6 +62,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* Software defined (when rFlags.CF = 1) */
#define PVALIDATE_FAIL_NOUPDATE 255

+/* RMP page size */
+#define RMP_PG_SIZE_4K 0
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
--
2.17.1

2021-06-02 14:09:18

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 22/22] virt: Add SEV-SNP guest driver

The SEV-SNP specification provides the guest a mechanism to communicate with
the PSP without risk from a malicious hypervisor who wishes to read, alter,
drop or replay the messages sent. The driver uses snp_issue_guest_request()
to issue the GHCB SNP_GUEST_REQUEST NAE event, which constructs a trusted
channel between the guest and the PSP firmware.

Userspace can use the following ioctls provided by the driver (see the usage
sketch below):

1. Request an attestation report that can be used to attest the identity
and security configuration of the guest.
2. Ask the firmware to provide a key derived from a root key.

See SEV-SNP spec section Guest Messages for more details.
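
To make the intended userspace flow concrete, below is a minimal, hedged
usage sketch based on the uapi added by this patch. The /dev/sev-guest node
name follows from the misc device registration (DEVICE_NAME), and error
handling is omitted:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/sev-guest.h>

    int request_report(void)
    {
        struct snp_user_report report = {};
        struct snp_user_guest_request input = {};
        int fd, ret;

        /* Caller-supplied data that the firmware echoes back in the report */
        memset(report.req.user_data, 0xaa, sizeof(report.req.user_data));

        input.msg_version = 1;                       /* must be non-zero */
        input.data = (__u64)(unsigned long)&report;  /* request/response buffer */

        fd = open("/dev/sev-guest", O_RDWR);
        ret = ioctl(fd, SNP_GET_REPORT, &input);
        /*
         * On success, report.response holds the attestation report; on
         * failure, input.fw_err holds the firmware error code.
         */
        close(fd);
        return ret;
    }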

Signed-off-by: Brijesh Singh <[email protected]>
---
drivers/virt/Kconfig | 3 +
drivers/virt/Makefile | 1 +
drivers/virt/sevguest/Kconfig | 10 +
drivers/virt/sevguest/Makefile | 4 +
drivers/virt/sevguest/snp.c | 448 +++++++++++++++++++++++++++++++++
drivers/virt/sevguest/snp.h | 63 +++++
include/uapi/linux/sev-guest.h | 56 +++++
7 files changed, 585 insertions(+)
create mode 100644 drivers/virt/sevguest/Kconfig
create mode 100644 drivers/virt/sevguest/Makefile
create mode 100644 drivers/virt/sevguest/snp.c
create mode 100644 drivers/virt/sevguest/snp.h
create mode 100644 include/uapi/linux/sev-guest.h

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..4de714c5ee9a 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
source "drivers/virt/nitro_enclaves/Kconfig"

source "drivers/virt/acrn/Kconfig"
+
+source "drivers/virt/sevguest/Kconfig"
+
endif
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..b2d1a8131c90 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -8,3 +8,4 @@ obj-y += vboxguest/

obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
obj-$(CONFIG_ACRN_HSM) += acrn/
+obj-$(CONFIG_SEV_GUEST) += sevguest/
diff --git a/drivers/virt/sevguest/Kconfig b/drivers/virt/sevguest/Kconfig
new file mode 100644
index 000000000000..e88a85527bf6
--- /dev/null
+++ b/drivers/virt/sevguest/Kconfig
@@ -0,0 +1,10 @@
+config SEV_GUEST
+ tristate "AMD SEV Guest driver"
+ default y
+ depends on AMD_MEM_ENCRYPT
+ help
+ Provides the AMD SNP guest request driver. The driver can be used by the
+ guest to communicate with the hypervisor to request the attestation report
+ and more.
+
+ If you choose 'M' here, this module will be called sevguest.
diff --git a/drivers/virt/sevguest/Makefile b/drivers/virt/sevguest/Makefile
new file mode 100644
index 000000000000..1505df437682
--- /dev/null
+++ b/drivers/virt/sevguest/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+sevguest-y := snp.o
+
+obj-$(CONFIG_SEV_GUEST) += sevguest.o
diff --git a/drivers/virt/sevguest/snp.c b/drivers/virt/sevguest/snp.c
new file mode 100644
index 000000000000..00d8e8fddf2c
--- /dev/null
+++ b/drivers/virt/sevguest/snp.c
@@ -0,0 +1,448 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD Secure Encrypted Virtualization Nested Paging (SEV-SNP) guest request interface
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
+#include <linux/set_memory.h>
+#include <linux/fs.h>
+#include <crypto/aead.h>
+#include <linux/scatterlist.h>
+#include <linux/sev-guest.h>
+#include <uapi/linux/sev-guest.h>
+
+#include "snp.h"
+
+#define DEVICE_NAME "sev-guest"
+#define AAD_LEN 48
+#define MSG_HDR_VER 1
+
+struct snp_guest_crypto {
+ struct crypto_aead *tfm;
+ uint8_t *iv, *authtag;
+ int iv_len, a_len;
+};
+
+struct snp_guest_dev {
+ struct device *dev;
+ struct miscdevice misc;
+
+ struct snp_guest_crypto *crypto;
+ struct snp_guest_msg *request, *response;
+};
+
+static DEFINE_MUTEX(snp_cmd_mutex);
+
+static inline struct snp_guest_dev *to_snp_dev(struct file *file)
+{
+ struct miscdevice *dev = file->private_data;
+
+ return container_of(dev, struct snp_guest_dev, misc);
+}
+
+static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, uint8_t *key,
+ size_t keylen)
+{
+ struct snp_guest_crypto *crypto;
+
+ crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
+ if (!crypto)
+ return NULL;
+
+ crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
+ if (IS_ERR(crypto->tfm))
+ goto e_free;
+
+ if (crypto_aead_setkey(crypto->tfm, key, keylen))
+ goto e_free_crypto;
+
+ crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
+ if (crypto->iv_len < 12) {
+ dev_err(snp_dev->dev, "IV length is less than 12.\n");
+ goto e_free_crypto;
+ }
+
+ crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
+ if (!crypto->iv)
+ goto e_free_crypto;
+
+ if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
+ if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
+ dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
+ goto e_free_crypto;
+ }
+ }
+
+ crypto->a_len = crypto_aead_authsize(crypto->tfm);
+ crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
+ if (!crypto->authtag)
+ goto e_free_crypto;
+
+ return crypto;
+
+e_free_crypto:
+ crypto_free_aead(crypto->tfm);
+e_free:
+ kfree(crypto->iv);
+ kfree(crypto->authtag);
+ kfree(crypto);
+
+ return NULL;
+}
+
+static void deinit_crypto(struct snp_guest_crypto *crypto)
+{
+ crypto_free_aead(crypto->tfm);
+ kfree(crypto->iv);
+ kfree(crypto->authtag);
+ kfree(crypto);
+}
+
+static int enc_dec_message(struct snp_guest_crypto *crypto, struct snp_guest_msg *msg,
+ uint8_t *src_buf, uint8_t *dst_buf, size_t len, bool enc)
+{
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+ struct scatterlist src[3], dst[3];
+ DECLARE_CRYPTO_WAIT(wait);
+ struct aead_request *req;
+ int ret;
+
+ req = aead_request_alloc(crypto->tfm, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+
+ /*
+ * AEAD memory operations:
+ * +------ AAD -------+------- DATA -----+---- AUTHTAG----+
+ * | msg header | plaintext | hdr->authtag |
+ * | bytes 30h - 5Fh | or | |
+ * | | cipher | |
+ * +------------------+------------------+----------------+
+ */
+ sg_init_table(src, 3);
+ sg_set_buf(&src[0], &hdr->algo, AAD_LEN);
+ sg_set_buf(&src[1], src_buf, hdr->msg_sz);
+ sg_set_buf(&src[2], hdr->authtag, crypto->a_len);
+
+ sg_init_table(dst, 3);
+ sg_set_buf(&dst[0], &hdr->algo, AAD_LEN);
+ sg_set_buf(&dst[1], dst_buf, hdr->msg_sz);
+ sg_set_buf(&dst[2], hdr->authtag, crypto->a_len);
+
+ aead_request_set_ad(req, AAD_LEN);
+ aead_request_set_tfm(req, crypto->tfm);
+ aead_request_set_callback(req, 0, crypto_req_done, &wait);
+
+ aead_request_set_crypt(req, src, dst, len, crypto->iv);
+ ret = crypto_wait_req(enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req), &wait);
+
+ aead_request_free(req);
+ return ret;
+}
+
+static int encrypt_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
+ void *plaintext, size_t len)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+
+ memset(crypto->iv, 0, crypto->iv_len);
+ memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
+
+ return enc_dec_message(crypto, msg, plaintext, msg->payload, len, true);
+}
+
+static int decrypt_payload(struct snp_guest_dev *snp_dev, struct snp_guest_msg *msg,
+ void *plaintext, size_t len)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_msg_hdr *hdr = &msg->hdr;
+
+ /* Build IV with response buffer sequence number */
+ memset(crypto->iv, 0, crypto->iv_len);
+ memcpy(crypto->iv, &hdr->msg_seqno, sizeof(hdr->msg_seqno));
+
+ return enc_dec_message(crypto, msg, msg->payload, plaintext, len, false);
+}
+
+static int __handle_guest_request(struct snp_guest_dev *snp_dev, int msg_type,
+ struct snp_user_guest_request *input, uint8_t *req_buf,
+ size_t req_sz, uint8_t *resp_buf, size_t resp_sz, size_t *msg_sz)
+{
+ struct snp_guest_msg *response = snp_dev->response;
+ struct snp_guest_msg_hdr *resp_hdr = &response->hdr;
+ struct snp_guest_msg *request = snp_dev->request;
+ struct snp_guest_msg_hdr *req_hdr = &request->hdr;
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_guest_request_data data;
+ int ret;
+
+ memset(request, 0, sizeof(*request));
+
+ /* Populate the request header */
+ req_hdr->algo = SNP_AEAD_AES_256_GCM;
+ req_hdr->hdr_version = MSG_HDR_VER;
+ req_hdr->hdr_sz = sizeof(*req_hdr);
+ req_hdr->msg_type = msg_type;
+ req_hdr->msg_version = input->msg_version;
+ req_hdr->msg_seqno = snp_msg_seqno();
+ req_hdr->msg_vmpck = 0;
+ req_hdr->msg_sz = req_sz;
+
+ dev_dbg(snp_dev->dev, "request [seqno %lld type %d version %d sz %d]\n",
+ req_hdr->msg_seqno, req_hdr->msg_type, req_hdr->msg_version, req_hdr->msg_sz);
+
+ /* Encrypt the request message buffer */
+ ret = encrypt_payload(snp_dev, request, req_buf, req_sz);
+ if (ret)
+ return ret;
+
+ /* Call firmware to process the request */
+ data.req_gpa = __pa(request);
+ data.resp_gpa = __pa(response);
+ ret = snp_issue_guest_request(GUEST_REQUEST, &data);
+ input->fw_err = ret;
+ if (ret)
+ return ret;
+
+ dev_dbg(snp_dev->dev, "response [msg_seqno %lld msg_type %d msg_version %d msg_sz %d]\n",
+ resp_hdr->msg_seqno, resp_hdr->msg_type, resp_hdr->msg_version, resp_hdr->msg_sz);
+
+ /* Verify that the sequence counter is incremented by 1 */
+ if (unlikely(resp_hdr->msg_seqno != (req_hdr->msg_seqno + 1)))
+ return -EBADMSG;
+
+ /* Verify response message type and version */
+ if ((resp_hdr->msg_type != (req_hdr->msg_type + 1)) ||
+ (resp_hdr->msg_version != req_hdr->msg_version))
+ return -EBADMSG;
+
+ /*
+ * If the message size is greater than our buffer length then return
+ * an error.
+ */
+ if (unlikely((resp_hdr->msg_sz + crypto->a_len) > resp_sz))
+ return -EBADMSG;
+
+ /* Decrypt the payload */
+ ret = decrypt_payload(snp_dev, response, resp_buf, resp_hdr->msg_sz + crypto->a_len);
+ if (ret)
+ return ret;
+
+ *msg_sz = resp_hdr->msg_sz;
+ return 0;
+}
+
+static int handle_guest_request(struct snp_guest_dev *snp_dev, int msg_type,
+ struct snp_user_guest_request *input, void *req_buf,
+ size_t req_len, void __user *resp_buf, size_t resp_len)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct page *page;
+ size_t msg_len;
+ int ret;
+
+ /* Allocate the buffer to hold response */
+ resp_len += crypto->a_len;
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(resp_len));
+ if (!page)
+ return -ENOMEM;
+
+ ret = __handle_guest_request(snp_dev, msg_type, input, req_buf, req_len,
+ page_address(page), resp_len, &msg_len);
+ if (ret)
+ goto e_free;
+
+ if (copy_to_user(resp_buf, page_address(page), msg_len))
+ ret = -EFAULT;
+
+e_free:
+ __free_pages(page, get_order(resp_len));
+
+ return ret;
+}
+
+static int get_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *input)
+{
+ struct snp_user_report __user *report = (struct snp_user_report *)input->data;
+ struct snp_user_report_req req;
+
+ if (copy_from_user(&req, &report->req, sizeof(req)))
+ return -EFAULT;
+
+ return handle_guest_request(snp_dev, SNP_MSG_REPORT_REQ, input, &req.user_data,
+ sizeof(req.user_data), report->response, sizeof(report->response));
+}
+
+static int derive_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *input)
+{
+ struct snp_user_derive_key __user *key = (struct snp_user_derive_key *)input->data;
+ struct snp_user_derive_key_req req;
+
+ if (copy_from_user(&req, &key->req, sizeof(req)))
+ return -EFAULT;
+
+ return handle_guest_request(snp_dev, SNP_MSG_KEY_REQ, input, &req, sizeof(req),
+ key->response, sizeof(key->response));
+}
+
+static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
+{
+ struct snp_guest_dev *snp_dev = to_snp_dev(file);
+ struct snp_user_guest_request input;
+ void __user *argp = (void __user *)arg;
+ int ret = -ENOTTY;
+
+ if (copy_from_user(&input, argp, sizeof(input)))
+ return -EFAULT;
+
+ mutex_lock(&snp_cmd_mutex);
+ switch (ioctl) {
+ case SNP_GET_REPORT: {
+ ret = get_report(snp_dev, &input);
+ break;
+ }
+ case SNP_DERIVE_KEY: {
+ ret = derive_key(snp_dev, &input);
+ break;
+ }
+ default:
+ break;
+ }
+
+ mutex_unlock(&snp_cmd_mutex);
+
+ if (copy_to_user(argp, &input, sizeof(input)))
+ return -EFAULT;
+
+ return ret;
+}
+
+static void free_shared_pages(void *buf, size_t sz)
+{
+ unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ /* If we fail to restore the encryption mask then leak the pages. */
+ if (set_memory_encrypted((unsigned long)buf, npages))
+ return;
+
+ __free_pages(virt_to_page(buf), get_order(sz));
+}
+
+static void *alloc_shared_pages(size_t sz)
+{
+ unsigned int npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+ struct page *page;
+ int ret;
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(sz));
+ if (!page)
+ return NULL;
+
+ ret = set_memory_decrypted((unsigned long)page_address(page), npages);
+ if (ret) {
+ __free_pages(page, get_order(sz));
+ return NULL;
+ }
+
+ return page_address(page);
+}
+
+static const struct file_operations snp_guest_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = snp_guest_ioctl,
+};
+
+static int __init snp_guest_probe(struct platform_device *pdev)
+{
+ struct snp_secrets_page_layout *secrets;
+ struct device *dev = &pdev->dev;
+ struct snp_guest_dev *snp_dev;
+ uint8_t key[VMPCK_KEY_LEN];
+ struct miscdevice *misc;
+ struct resource *res;
+ void __iomem *base;
+ int ret;
+
+ snp_dev = devm_kzalloc(&pdev->dev, sizeof(struct snp_guest_dev), GFP_KERNEL);
+ if (!snp_dev)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, snp_dev);
+ snp_dev->dev = dev;
+
+ res = platform_get_mem_or_io(pdev, 0);
+ if (!res)
+ return -ENODEV;
+
+ /* Map the secrets page to get the key */
+ base = ioremap_encrypted(res->start, resource_size(res));
+ if (!base)
+ return -ENOMEM;
+
+ secrets = (struct snp_secrets_page_layout *)base;
+ memcpy_fromio(key, secrets->vmpck0, sizeof(key));
+ iounmap(base);
+
+ snp_dev->crypto = init_crypto(snp_dev, key, sizeof(key));
+ if (!snp_dev->crypto)
+ return -EIO;
+
+ /* Allocate the shared page used for the request and response message. */
+ snp_dev->request = alloc_shared_pages(sizeof(struct snp_guest_msg));
+ if (!snp_dev->request)
+ return -ENOMEM;
+
+ snp_dev->response = alloc_shared_pages(sizeof(struct snp_guest_msg));
+ if (!snp_dev->response) {
+ ret = -ENOMEM;
+ goto e_free_req;
+ }
+
+ misc = &snp_dev->misc;
+ misc->minor = MISC_DYNAMIC_MINOR;
+ misc->name = DEVICE_NAME;
+ misc->fops = &snp_guest_fops;
+
+ return misc_register(misc);
+
+e_free_req:
+ free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
+ return ret;
+}
+
+static int __exit snp_guest_remove(struct platform_device *pdev)
+{
+ struct snp_guest_dev *snp_dev = platform_get_drvdata(pdev);
+
+ free_shared_pages(snp_dev->request, sizeof(struct snp_guest_msg));
+ free_shared_pages(snp_dev->response, sizeof(struct snp_guest_msg));
+ deinit_crypto(snp_dev->crypto);
+ misc_deregister(&snp_dev->misc);
+
+ return 0;
+}
+
+static struct platform_driver snp_guest_driver = {
+ .remove = __exit_p(snp_guest_remove),
+ .driver = {
+ .name = "snp-guest",
+ },
+};
+
+module_platform_driver_probe(snp_guest_driver, snp_guest_probe);
+
+MODULE_AUTHOR("Brijesh Singh <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1.0.0");
+MODULE_DESCRIPTION("AMD SNP Guest Driver");
diff --git a/drivers/virt/sevguest/snp.h b/drivers/virt/sevguest/snp.h
new file mode 100644
index 000000000000..930ffc0f4be3
--- /dev/null
+++ b/drivers/virt/sevguest/snp.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ * SEV-SNP API spec is available at https://developer.amd.com/sev
+ */
+
+#ifndef __LINUX_SNP_GUEST_H_
+#define __LINUX_SNP_GUEST_H_
+
+#include <linux/types.h>
+
+#define MAX_AUTHTAG_LEN 32
+
+/* See SNP spec SNP_GUEST_REQUEST section for the structure */
+enum msg_type {
+ SNP_MSG_TYPE_INVALID = 0,
+ SNP_MSG_CPUID_REQ,
+ SNP_MSG_CPUID_RSP,
+ SNP_MSG_KEY_REQ,
+ SNP_MSG_KEY_RSP,
+ SNP_MSG_REPORT_REQ,
+ SNP_MSG_REPORT_RSP,
+ SNP_MSG_EXPORT_REQ,
+ SNP_MSG_EXPORT_RSP,
+ SNP_MSG_IMPORT_REQ,
+ SNP_MSG_IMPORT_RSP,
+ SNP_MSG_ABSORB_REQ,
+ SNP_MSG_ABSORB_RSP,
+ SNP_MSG_VMRK_REQ,
+ SNP_MSG_VMRK_RSP,
+
+ SNP_MSG_TYPE_MAX
+};
+
+enum aead_algo {
+ SNP_AEAD_INVALID,
+ SNP_AEAD_AES_256_GCM,
+};
+
+struct snp_guest_msg_hdr {
+ u8 authtag[MAX_AUTHTAG_LEN];
+ u64 msg_seqno;
+ u8 rsvd1[8];
+ u8 algo;
+ u8 hdr_version;
+ u16 hdr_sz;
+ u8 msg_type;
+ u8 msg_version;
+ u16 msg_sz;
+ u32 rsvd2;
+ u8 msg_vmpck;
+ u8 rsvd3[35];
+} __packed;
+
+struct snp_guest_msg {
+ struct snp_guest_msg_hdr hdr;
+ u8 payload[4000];
+} __packed;
+
+#endif /* __LINUX_SNP_GUEST_H_ */
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
new file mode 100644
index 000000000000..0a8454631605
--- /dev/null
+++ b/include/uapi/linux/sev-guest.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * Userspace interface for AMD SEV and SEV-SNP guest driver.
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <[email protected]>
+ *
+ * SEV-SNP API specification is available at: https://developer.amd.com/sev/
+ */
+
+#ifndef __UAPI_LINUX_SEV_GUEST_H_
+#define __UAPI_LINUX_SEV_GUEST_H_
+
+#include <linux/types.h>
+
+struct snp_user_report_req {
+ __u8 user_data[64];
+};
+
+struct snp_user_report {
+ struct snp_user_report_req req;
+
+ /* see SEV-SNP spec for the response format */
+ __u8 response[4000];
+};
+
+struct snp_user_derive_key_req {
+ __u8 root_key_select;
+ __u64 guest_field_select;
+ __u32 vmpl;
+ __u32 guest_svn;
+ __u64 tcb_version;
+};
+
+struct snp_user_derive_key {
+ struct snp_user_derive_key_req req;
+
+ /* see SEV-SNP spec for the response format */
+ __u8 response[64];
+};
+
+struct snp_user_guest_request {
+ /* Message version number (must be non-zero) */
+ __u8 msg_version;
+ __u64 data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u32 fw_err;
+};
+
+#define SNP_GUEST_REQ_IOC_TYPE 'S'
+#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_user_guest_request)
+#define SNP_DERIVE_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)
+
+#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.17.1

2021-06-02 14:09:23

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 09/22] x86/compressed: Register GHCB memory when SEV-SNP is active

The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification.

If the hypervisor cannot work with the guest-provided GPA, then terminate
the guest boot.
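
As a hedged illustration of the MSR protocol used here (the macros below are
added by this patch), registering a GHCB page whose GFN is 0x809 would look
roughly like this; the GFN value is made up for the example:

    u64 val;

    sev_es_wr_ghcb_msr(GHCB_MSR_GPA_REQ_GFN_VAL(0x809));    /* == 0x809012 */
    VMGEXIT();
    val = sev_es_rd_ghcb_msr();

    /* The response must carry code 0x013 and echo back the same GFN. */
    if ((GHCB_RESP_CODE(val) != GHCB_MSR_GPA_REG_RESP) ||
        (GHCB_MSR_GPA_REG_RESP_VAL(val) != 0x809))
        sev_es_terminate(1, GHCB_TERM_REGISTER);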

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 4 ++++
arch/x86/include/asm/sev-common.h | 11 +++++++++++
arch/x86/kernel/sev-shared.c | 16 ++++++++++++++++
3 files changed, 31 insertions(+)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 808fe1f6b170..4acade02267b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -203,6 +203,10 @@ static bool early_setup_sev_es(void)
/* Initialize lookup tables for the instruction decoder */
inat_init_tables();

+ /* The SEV-SNP guest requires that the GHCB GPA be registered */
+ if (sev_snp_enabled())
+ snp_register_ghcb_early(__pa(&boot_ghcb_page));
+
return true;
}

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1424b8ffde0b..ae99a8a756fe 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -75,6 +75,17 @@
#define GHCB_MSR_PSC_ERROR_POS 32
#define GHCB_MSR_PSC_RESP_VAL(val) ((val) >> GHCB_MSR_PSC_ERROR_POS)

+/* GHCB GPA Register */
+#define GHCB_MSR_GPA_REG_REQ 0x012
+#define GHCB_MSR_GPA_REG_VALUE_POS 12
+#define GHCB_MSR_GPA_REG_GFN_MASK GENMASK_ULL(51, 0)
+#define GHCB_MSR_GPA_REQ_GFN_VAL(v) \
+ (((unsigned long)((v) & GHCB_MSR_GPA_REG_GFN_MASK) << GHCB_MSR_GPA_REG_VALUE_POS)| \
+ GHCB_MSR_GPA_REG_REQ)
+
+#define GHCB_MSR_GPA_REG_RESP 0x013
+#define GHCB_MSR_GPA_REG_RESP_VAL(v) ((v) >> GHCB_MSR_GPA_REG_VALUE_POS)
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b8312ad66120..b62226bf51b9 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -77,6 +77,22 @@ static bool get_hv_features(void)
return true;
}

+static void snp_register_ghcb_early(unsigned long paddr)
+{
+ unsigned long pfn = paddr >> PAGE_SHIFT;
+ u64 val;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_GPA_REQ_GFN_VAL(pfn));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ /* If the response GPA is not ours then abort the guest */
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_GPA_REG_RESP) ||
+ (GHCB_MSR_GPA_REG_RESP_VAL(val) != pfn))
+ sev_es_terminate(1, GHCB_TERM_REGISTER);
+}
+
static bool sev_es_negotiate_protocol(void)
{
u64 val;
--
2.17.1

2021-06-02 14:09:33

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 14/22] x86/mm: Add support to validate memory when changing C-bit

The set_memory_{encrypt,decrypt}() functions are used for changing pages
from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.

If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:

1. Issue the page state change VMGEXIT to add the memory region in
the RMP table.
2. Validate the memory region after the RMP entry is added.

To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before the encryption attribute
is removed from the page table:

1. Invalidate the page.
2. Issue the page state change VMGEXIT to remove the page from RMP table.

To change the page state in the RMP table, use the Page State Change
VMGEXIT defined in the GHCB specification.
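
A compact, hedged sketch of that ordering, expressed with the helpers used by
this patch (not the literal implementation):

    /* Shared -> private: add the pages to the RMP table, then validate. */
    set_page_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
    pvalidate_pages(vaddr, npages, 1);

    /* Private -> shared: invalidate first, then remove from the RMP table. */
    pvalidate_pages(vaddr, npages, 0);
    set_page_state(vaddr, npages, SNP_PAGE_STATE_SHARED);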

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 24 +++++++
arch/x86/include/asm/sev.h | 4 ++
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kernel/sev.c | 107 ++++++++++++++++++++++++++++++
arch/x86/mm/pat/set_memory.c | 14 ++++
5 files changed, 151 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index ae99a8a756fe..86bb185b5ec1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -62,6 +62,8 @@
#define GHCB_MSR_PSC_REQ 0x014
#define SNP_PAGE_STATE_PRIVATE 1
#define SNP_PAGE_STATE_SHARED 2
+#define SNP_PAGE_STATE_PSMASH 3
+#define SNP_PAGE_STATE_UNSMASH 4
#define GHCB_MSR_PSC_GFN_POS 12
#define GHCB_MSR_PSC_GFN_MASK GENMASK_ULL(39, 0)
#define GHCB_MSR_PSC_OP_POS 52
@@ -86,6 +88,28 @@
#define GHCB_MSR_GPA_REG_RESP 0x013
#define GHCB_MSR_GPA_REG_RESP_VAL(v) ((v) >> GHCB_MSR_GPA_REG_VALUE_POS)

+/* SNP Page State Change NAE event */
+#define VMGEXIT_PSC_MAX_ENTRY 253
+
+struct __packed snp_page_state_header {
+ u16 cur_entry;
+ u16 end_entry;
+ u32 reserved;
+};
+
+struct __packed snp_page_state_entry {
+ u64 cur_page : 12,
+ gfn : 40,
+ operation : 4,
+ pagesize : 1,
+ reserved : 7;
+};
+
+struct __packed snp_page_state_change {
+ struct snp_page_state_header header;
+ struct snp_page_state_entry entry[VMGEXIT_PSC_MAX_ENTRY];
+};
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7c2cb5300e43..e2141fc28058 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -114,6 +114,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
unsigned int npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op);
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -130,6 +132,8 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned i
{
}
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) { }
+static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
#endif

#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 554f75fe013c..41573cf44470 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
+#define SVM_VMGEXIT_PSC 0x80000010
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff

#define SVM_EXIT_ERR -1
@@ -215,6 +216,7 @@
{ SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
+ { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }


diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 6e9b45bb38ab..4847ac81cca3 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -637,6 +637,113 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
WARN(1, "invalid memory op %d\n", op);
}

+static int page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data)
+{
+ struct snp_page_state_header *hdr;
+ int ret = 0;
+
+ hdr = &data->header;
+
+ /*
+ * As per the GHCB specification, the hypervisor can resume the guest before
+ * processing all the entries. The loop checks whether all the entries are
+ * processed. If not, then keep retrying.
+ */
+ while (hdr->cur_entry <= hdr->end_entry) {
+
+ ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+ ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
+
+ /* Page State Change VMGEXIT can pass error code through exit_info_2. */
+ if (WARN(ret || ghcb->save.sw_exit_info_2,
+ "SEV-SNP: page state change failed ret=%d exit_info_2=%llx\n",
+ ret, ghcb->save.sw_exit_info_2))
+ return 1;
+ }
+
+ return 0;
+}
+
+static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
+{
+ struct snp_page_state_change *data;
+ struct snp_page_state_header *hdr;
+ struct snp_page_state_entry *e;
+ unsigned long vaddr_end;
+ struct ghcb_state state;
+ struct ghcb *ghcb;
+ int idx;
+
+ vaddr = vaddr & PAGE_MASK;
+ vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+ ghcb = sev_es_get_ghcb(&state);
+ if (unlikely(!ghcb))
+ panic("SEV-SNP: Failed to get GHCB\n");
+
+ data = (struct snp_page_state_change *)ghcb->shared_buffer;
+ hdr = &data->header;
+
+ while (vaddr < vaddr_end) {
+ e = data->entry;
+ memset(data, 0, sizeof(*data));
+
+ for (idx = 0; idx < VMGEXIT_PSC_MAX_ENTRY; idx++, e++) {
+ unsigned long pfn;
+
+ if (is_vmalloc_addr((void *)vaddr))
+ pfn = vmalloc_to_pfn((void *)vaddr);
+ else
+ pfn = __pa(vaddr) >> PAGE_SHIFT;
+
+ e->gfn = pfn;
+ e->operation = op;
+ hdr->end_entry = idx;
+
+ /*
+ * The GHCB specification provides the flexibility to
+ * use either 4K or 2MB page size in the RMP table.
+ * The current SNP support does not keep track of the
+ * page size used in the RMP table. To avoid
+ * overlapping requests, use the 4K page size in the
+ * RMP table.
+ */
+ e->pagesize = RMP_PG_SIZE_4K;
+ vaddr = vaddr + PAGE_SIZE;
+
+ if (vaddr >= vaddr_end)
+ break;
+ }
+
+ /* Terminate the guest on page state change failure. */
+ if (page_state_vmgexit(ghcb, data))
+ sev_es_terminate(1, GHCB_TERM_PSC);
+ }
+
+ sev_es_put_ghcb(&state);
+}
+
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ pvalidate_pages(vaddr, npages, 0);
+
+ set_page_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
+}
+
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ set_page_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+ pvalidate_pages(vaddr, npages, 1);
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 156cd235659f..20cd5ebc972f 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -29,6 +29,7 @@
#include <asm/proto.h>
#include <asm/memtype.h>
#include <asm/set_memory.h>
+#include <asm/sev.h>

#include "../mm_internal.h"

@@ -2009,8 +2010,21 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
*/
cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));

+ /*
+ * To maintain the security guarantees of SEV-SNP guest, invalidate
+ * the memory before clearing the encryption attribute.
+ */
+ if (!enc)
+ snp_set_memory_shared(addr, numpages);
+
ret = __change_page_attr_set_clr(&cpa, 1);

+ /*
+ * Now that memory is marked encrypted in the page table, validate it.
+ */
+ if (!ret && enc)
+ snp_set_memory_private(addr, numpages);
+
/*
* After changing the encryption attribute, we need to flush TLBs again
* in case any speculative TLB caching occurred (but no need to flush
--
2.17.1

2021-06-02 14:09:35

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 07/22] x86/sev: Add a helper for the PVALIDATE instruction

An SNP-active guest uses the PVALIDATE instruction to validate or
rescind the validation of a guest page’s RMP entry. Upon completion,
a return code is stored in EAX and rFLAGS bits are set based on the
return code. If the instruction completed successfully, the CF flag
indicates whether the contents of the RMP entry were changed or not.

See AMD APM Volume 3 for additional details.
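
A hedged usage sketch of the new helper; RMP_PG_SIZE_4K and the termination
reason code come from other patches in this series, and the surrounding
context is illustrative:

    int rc;

    /* Validate one 4K private page at virtual address vaddr. */
    rc = pvalidate(vaddr, RMP_PG_SIZE_4K, 1);
    if (rc == PVALIDATE_FAIL_NOUPDATE) {
        /* rFlags.CF was set: the RMP entry was already in the requested state. */
    } else if (rc) {
        /* A non-zero EAX return code is a hard failure. */
        sev_es_terminate(1, GHCB_TERM_PVALIDATE);
    }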

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 134a7c9d91b6..1b7a172b832b 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -59,6 +59,9 @@ extern void vc_no_ghcb(void);
extern void vc_boot_ghcb(void);
extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

+/* Software defined (when rFlags.CF = 1) */
+#define PVALIDATE_FAIL_NOUPDATE 255
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -81,12 +84,29 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
+{
+ bool no_rmpupdate;
+ int rc;
+
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
+ CC_SET(c)
+ : CC_OUT(c) (no_rmpupdate), "=a"(rc)
+ : "a"(vaddr), "c"(rmp_psize), "d"(validate)
+ : "memory", "cc");
+
+ if (no_rmpupdate)
+ return PVALIDATE_FAIL_NOUPDATE;
+
+ return rc;
+}
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
#endif

#endif
--
2.17.1

2021-06-02 14:09:47

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 13/22] x86/kernel: Validate rom memory before accessing when SEV-SNP is active

probe_roms() accesses the memory range (0xc0000 - 0x100000) to probe
various ROMs. The memory range is not part of the E820 system RAM
range and is mapped as private (i.e. encrypted) in the page table.

When SEV-SNP is active, all private memory must be validated before it is
accessed. The ROM range was not part of the E820 map, so the guest BIOS
did not validate it. An access to unvalidated memory will cause a #VC
exception, and the guest does not yet support handling #VC exceptions for
unvalidated memory, so validate the ROM memory regions before they are
accessed.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/probe_roms.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 9e1def3744f2..04b59ee77e42 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -21,6 +21,7 @@
#include <asm/sections.h>
#include <asm/io.h>
#include <asm/setup_arch.h>
+#include <asm/sev.h>

static struct resource system_rom_resource = {
.name = "System ROM",
@@ -197,11 +198,21 @@ static int __init romchecksum(const unsigned char *rom, unsigned long length)

void __init probe_roms(void)
{
- const unsigned char *rom;
unsigned long start, length, upper;
+ const unsigned char *rom;
unsigned char c;
int i;

+ /*
+ * The ROM memory is not part of the E820 system RAM and is not pre-validated
+ * by the BIOS. The kernel page table maps the ROM region as encrypted memory,
+ * and SEV-SNP requires that encrypted memory be validated before it is
+ * accessed. Validate the ROM region before accessing it.
+ */
+ snp_prep_memory(video_rom_resource.start,
+ ((system_rom_resource.end + 1) - video_rom_resource.start),
+ MEMORY_PRIVATE);
+
/* video rom */
upper = adapter_rom_resources[0].start;
for (start = video_rom_resource.start; start < upper; start += 2048) {
--
2.17.1

2021-06-02 14:09:58

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 16/22] KVM: SVM: Create a separate mapping for the SEV-ES save area

From: Tom Lendacky <[email protected]>

The save area for SEV-ES/SEV-SNP guests, as used by the hardware, is
different from the save area of a non-SEV-ES/SEV-SNP guest.

This is the first step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create an SEV-ES/SEV-SNP save area and adjust usage to the new
save area definition where needed.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 83 +++++++++++++++++++++++++++++---------
arch/x86/kvm/svm/sev.c | 24 +++++------
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 77 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index fb38fae3d5ab..482fb20104da 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -219,6 +219,7 @@ struct vmcb_seg {
u64 base;
} __packed;

+/* Save area definition for legacy and SEV-MEM guests */
struct vmcb_save_area {
struct vmcb_seg es;
struct vmcb_seg cs;
@@ -235,8 +236,58 @@ struct vmcb_save_area {
u8 cpl;
u8 reserved_2[4];
u64 efer;
+ u8 reserved_3[112];
+ u64 cr4;
+ u64 cr3;
+ u64 cr0;
+ u64 dr7;
+ u64 dr6;
+ u64 rflags;
+ u64 rip;
+ u8 reserved_4[88];
+ u64 rsp;
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
+ u64 rax;
+ u64 star;
+ u64 lstar;
+ u64 cstar;
+ u64 sfmask;
+ u64 kernel_gs_base;
+ u64 sysenter_cs;
+ u64 sysenter_esp;
+ u64 sysenter_eip;
+ u64 cr2;
+ u8 reserved_5[32];
+ u64 g_pat;
+ u64 dbgctl;
+ u64 br_from;
+ u64 br_to;
+ u64 last_excp_from;
+ u64 last_excp_to;
+ u8 reserved_6[72];
+ u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
+} __packed;
+
+/* Save area definition for SEV-ES and SEV-SNP guests */
+struct sev_es_save_area {
+ struct vmcb_seg es;
+ struct vmcb_seg cs;
+ struct vmcb_seg ss;
+ struct vmcb_seg ds;
+ struct vmcb_seg fs;
+ struct vmcb_seg gs;
+ struct vmcb_seg gdtr;
+ struct vmcb_seg ldtr;
+ struct vmcb_seg idtr;
+ struct vmcb_seg tr;
+ u8 reserved_1[43];
+ u8 cpl;
+ u8 reserved_2[4];
+ u64 efer;
u8 reserved_3[104];
- u64 xss; /* Valid for SEV-ES only */
+ u64 xss;
u64 cr4;
u64 cr3;
u64 cr0;
@@ -264,22 +315,14 @@ struct vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
-
- /*
- * The following part of the save area is valid only for
- * SEV-ES guests when referenced through the GHCB or for
- * saving to the host save area.
- */
- u8 reserved_7[72];
- u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
- u8 reserved_7b[4];
+ u8 reserved_7[80];
u32 pkru;
- u8 reserved_7a[20];
- u64 reserved_8; /* rax already available at 0x01f8 */
+ u8 reserved_9[20];
+ u64 reserved_10; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_9; /* rsp already available at 0x01d8 */
+ u64 reserved_11; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -291,21 +334,21 @@ struct vmcb_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_10[16];
+ u8 reserved_12[16];
u64 sw_exit_code;
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
u64 sev_features;
- u8 reserved_11[48];
+ u8 reserved_13[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
} __packed;

struct ghcb {
- struct vmcb_save_area save;
- u8 reserved_save[2048 - sizeof(struct vmcb_save_area)];
+ struct sev_es_save_area save;
+ u8 reserved_save[2048 - sizeof(struct sev_es_save_area)];

u8 shared_buffer[2032];

@@ -315,13 +358,15 @@ struct ghcb {
} __packed;


-#define EXPECTED_VMCB_SAVE_AREA_SIZE 1032
+#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 272
#define EXPECTED_GHCB_SIZE PAGE_SIZE

static inline void __unused_size_checks(void)
{
BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
+ BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
}
@@ -392,7 +437,7 @@ struct vmcb {
/* GHCB Accessor functions */

#define GHCB_BITMAP_IDX(field) \
- (offsetof(struct vmcb_save_area, field) / sizeof(u64))
+ (offsetof(struct sev_es_save_area, field) / sizeof(u64))

#define DEFINE_GHCB_ACCESSORS(field) \
static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5bc887e9a986..d93a1c368b61 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -542,12 +542,20 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)

static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
- struct vmcb_save_area *save = &svm->vmcb->save;
+ struct sev_es_save_area *save = svm->vmsa;

/* Check some debug related fields before encrypting the VMSA */
- if (svm->vcpu.guest_debug || (save->dr7 & ~DR7_FIXED_1))
+ if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
return -EINVAL;

+ /*
+ * SEV-ES will use a VMSA that is pointed to by the VMCB, not
+ * the traditional VMSA that is part of the VMCB. Copy the
+ * traditional VMSA as it has been built so far (in prep
+ * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
+ */
+ memcpy(save, &svm->vmcb->save, sizeof(svm->vmcb->save));
+
+ /* Sync registers */
save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX];
save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX];
@@ -574,14 +582,6 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->pkru = svm->vcpu.arch.pkru;
save->xss = svm->vcpu.arch.ia32_xss;

- /*
- * SEV-ES will use a VMSA that is pointed to by the VMCB, not
- * the traditional VMSA that is part of the VMCB. Copy the
- * traditional VMSA as it has been built so far (in prep
- * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
- */
- memcpy(svm->vmsa, save, sizeof(*save));
-
return 0;
}

@@ -2598,7 +2598,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm)
void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
{
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
- struct vmcb_save_area *hostsa;
+ struct sev_es_save_area *hostsa;

/*
* As an SEV-ES guest, hardware will restore the host state on VMEXIT,
@@ -2608,7 +2608,7 @@ void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
vmsave(__sme_page_pa(sd->save_area));

/* XCR0 is restored on VMEXIT, save the current host value */
- hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+ hostsa = (struct sev_es_save_area *)(page_address(sd->save_area) + 0x400);
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);

/* PKRU is restored on VMEXIT, save the current host value */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2c9ece618b29..0b89aee51b74 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -170,7 +170,7 @@ struct vcpu_svm {
} shadow_msr_intercept;

/* SEV-ES support */
- struct vmcb_save_area *vmsa;
+ struct sev_es_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
bool received_first_sipi;
--
2.17.1

2021-06-02 14:10:28

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 15/22] KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area

The hypervisor uses the SEV_FEATURES field (offset 3B0h) in the Save State
Area to control the SEV-SNP guest features such as SNPActive, vTOM,
ReflectVC, etc. An SEV-SNP guest can read the SEV_FEATURES field through
the SEV_STATUS MSR.

While at it, update dump_vmcb() to log the VMPL level.

See APM2 Table 15-34 and B-4 for more details.
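
For illustration only (not part of this patch): since SEV_FEATURES occupies
the SEV_STATUS MSR starting at bit 2, a guest-side sketch of recovering it
looks like:

    u64 sev_status, sev_features;

    rdmsrl(MSR_AMD64_SEV, sev_status);
    /* Strip the SEV and SEV-ES enable bits; the remainder is SEV_FEATURES. */
    sev_features = sev_status >> 2;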

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 6 ++++--
arch/x86/kvm/svm/svm.c | 4 ++--
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 772e60efe243..fb38fae3d5ab 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -230,7 +230,8 @@ struct vmcb_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u8 reserved_1[43];
+ u8 reserved_1[42];
+ u8 vmpl;
u8 cpl;
u8 reserved_2[4];
u64 efer;
@@ -295,7 +296,8 @@ struct vmcb_save_area {
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
- u8 reserved_11[56];
+ u64 sev_features;
+ u8 reserved_11[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 05eca131eaf2..2acf187a3100 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3186,8 +3186,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
"tr:",
save01->tr.selector, save01->tr.attrib,
save01->tr.limit, save01->tr.base);
- pr_err("cpl: %d efer: %016llx\n",
- save->cpl, save->efer);
+ pr_err("vmpl: %d cpl: %d efer: %016llx\n",
+ save->vmpl, save->cpl, save->efer);
pr_err("%-15s %016llx %-13s %016llx\n",
"cr0:", save->cr0, "cr2:", save->cr2);
pr_err("%-15s %016llx %-13s %016llx\n",
--
2.17.1

2021-06-02 14:10:56

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH Part1 RFC v3 19/22] x86/sev-snp: SEV-SNP AP creation support

From: Tom Lendacky <[email protected]>

To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP
AP Creation NAE event. This allows for guest control over the AP register
state rather than trusting the hypervisor with the SEV-ES Jump Table
address.

During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if
SEV-SNP is active, will set or override apic->wakeup_secondary_cpu. This
will allow the SEV-SNP AP Creation NAE event method to be used to boot
the APs.
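
Condensed, hedged sketch of the AP Creation NAE event exchange performed by
snp_wakeup_cpu_via_vmgexit() below (GHCB setup and the full VMSA
initialization are omitted):

    ghcb_set_rax(ghcb, vmsa->sev_features);     /* SEV_FEATURES the vCPU runs with */
    ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
    ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
    ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));  /* GPA of the target VMSA page */

    sev_es_wr_ghcb_msr(__pa(ghcb));
    VMGEXIT();

    /* A non-zero low 32 bits of sw_exit_info_1 indicates a creation failure. */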

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 1 +
arch/x86/include/asm/sev.h | 13 ++
arch/x86/include/uapi/asm/svm.h | 5 +
arch/x86/kernel/sev-shared.c | 5 +
arch/x86/kernel/sev.c | 206 ++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 3 +
6 files changed, 233 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 86bb185b5ec1..47aa57bf654a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -57,6 +57,7 @@
(((unsigned long)((v) & GHCB_MSR_HV_FT_MASK) >> GHCB_MSR_HV_FT_POS))

#define GHCB_HV_FT_SNP BIT_ULL(0)
+#define GHCB_HV_FT_SNP_AP_CREATION (BIT_ULL(1) | GHCB_HV_FT_SNP)

/* SNP Page State Change */
#define GHCB_MSR_PSC_REQ 0x014
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index e2141fc28058..640108402ae9 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -71,6 +71,13 @@ enum snp_mem_op {
MEMORY_SHARED
};

+#define RMPADJUST_VMPL_MAX 3
+#define RMPADJUST_VMPL_MASK GENMASK(7, 0)
+#define RMPADJUST_VMPL_SHIFT 0
+#define RMPADJUST_PERM_MASK_MASK GENMASK(7, 0)
+#define RMPADJUST_PERM_MASK_SHIFT 8
+#define RMPADJUST_VMSA_PAGE_BIT BIT(16)
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -116,6 +123,9 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op);
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+
+void snp_setup_wakeup_secondary_cpu(void);
+
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -134,6 +144,9 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned i
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) { }
static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
+
+static inline void snp_setup_wakeup_secondary_cpu(void) { }
+
#endif

#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 41573cf44470..c0152186a008 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,10 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_AP_CREATION 0x80000013
+#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
+#define SVM_VMGEXIT_AP_CREATE 1
+#define SVM_VMGEXIT_AP_DESTROY 2
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff

#define SVM_EXIT_ERR -1
@@ -217,6 +221,7 @@
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }


diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b62226bf51b9..7139c9ba59b2 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -32,6 +32,11 @@ static bool __init sev_es_check_cpu_features(void)
return true;
}

+static bool snp_ap_creation_supported(void)
+{
+ return (hv_features & GHCB_HV_FT_SNP_AP_CREATION) == GHCB_HV_FT_SNP_AP_CREATION;
+}
+
static bool __init sev_snp_check_hypervisor_features(void)
{
if (ghcb_version < 2)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 4847ac81cca3..8f7ef35a25ef 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -19,6 +19,7 @@
#include <linux/memblock.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/cpumask.h>

#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -31,6 +32,7 @@
#include <asm/svm.h>
#include <asm/smp.h>
#include <asm/cpu.h>
+#include <asm/apic.h>

#include "sev-internal.h"

@@ -106,6 +108,8 @@ struct ghcb_state {
static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);

+static DEFINE_PER_CPU(struct sev_es_save_area *, snp_vmsa);
+
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);

@@ -744,6 +748,208 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
pvalidate_pages(vaddr, npages, 1);
}

+static int snp_rmpadjust(void *va, unsigned int vmpl, unsigned int perm_mask, bool vmsa)
+{
+ unsigned int attrs;
+ int err;
+
+ attrs = (vmpl & RMPADJUST_VMPL_MASK) << RMPADJUST_VMPL_SHIFT;
+ attrs |= (perm_mask & RMPADJUST_PERM_MASK_MASK) << RMPADJUST_PERM_MASK_SHIFT;
+ if (vmsa)
+ attrs |= RMPADJUST_VMSA_PAGE_BIT;
+
+ /* Perform RMPADJUST */
+ asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
+ : "=a" (err)
+ : "a" (va), "c" (0), "d" (attrs)
+ : "memory", "cc");
+
+ return err;
+}
+
+static int snp_clear_vmsa(void *vmsa)
+{
+ /*
+ * Clear the VMSA attribute for the page:
+ * RDX[7:0] = RMPADJUST_VMPL_MAX, target VMPL level, must be
+ * numerically higher than the current level (VMPL0)
+ * RDX[15:8] = 0, Target permission mask (not used)
+ * RDX[16] = 0, Not a VMSA page
+ */
+ return snp_rmpadjust(vmsa, RMPADJUST_VMPL_MAX, 0, false);
+}
+
+static int snp_set_vmsa(void *vmsa)
+{
+ /*
+ * To set the VMSA attribute for the page:
+ * RDX[7:0] = RMPADJUST_VMPL_MAX, target VMPL level, must be
+ * numerically higher than the current level (VMPL0)
+ * RDX[15:8] = 0, Target permission mask (not used)
+ * RDX[16] = 1, VMSA page
+ */
+ return snp_rmpadjust(vmsa, RMPADJUST_VMPL_MAX, 0, true);
+}
+
+#define INIT_CS_ATTRIBS (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
+#define INIT_DS_ATTRIBS (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK | SVM_SELECTOR_WRITE_MASK)
+
+#define INIT_LDTR_ATTRIBS (SVM_SELECTOR_P_MASK | 2)
+#define INIT_TR_ATTRIBS (SVM_SELECTOR_P_MASK | 3)
+
+static int snp_wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
+{
+ struct sev_es_save_area *cur_vmsa;
+ struct sev_es_save_area *vmsa;
+ struct ghcb_state state;
+ struct ghcb *ghcb;
+ unsigned long flags;
+ u8 sipi_vector;
+ u64 cr4;
+ int cpu;
+ int ret;
+
+ if (!snp_ap_creation_supported())
+ return -ENOTSUPP;
+
+ /* Override start_ip with known SEV-ES/SEV-SNP starting RIP */
+ if (start_ip == real_mode_header->trampoline_start) {
+ start_ip = real_mode_header->sev_es_trampoline_start;
+ } else {
+ WARN_ONCE(1, "unsupported SEV-SNP start_ip: %lx\n", start_ip);
+ return -EINVAL;
+ }
+
+ /* Find the logical CPU for the APIC ID */
+ for_each_present_cpu(cpu) {
+ if (arch_match_cpu_phys_id(cpu, apic_id))
+ break;
+ }
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ cur_vmsa = per_cpu(snp_vmsa, cpu);
+ vmsa = (struct sev_es_save_area *)get_zeroed_page(GFP_KERNEL);
+ if (!vmsa)
+ return -ENOMEM;
+
+ /* CR4 should maintain the MCE value */
+ cr4 = native_read_cr4() & ~X86_CR4_MCE;
+
+ /* Set the CS value based on the start_ip converted to a SIPI vector */
+ sipi_vector = (start_ip >> 12);
+ vmsa->cs.base = sipi_vector << 12;
+ vmsa->cs.limit = 0xffff;
+ vmsa->cs.attrib = INIT_CS_ATTRIBS;
+ vmsa->cs.selector = sipi_vector << 8;
+
+ /* Set the RIP value based on start_ip */
+ vmsa->rip = start_ip & 0xfff;
+
+ /* Set VMSA entries to the INIT values as documented in the APM */
+ vmsa->ds.limit = 0xffff;
+ vmsa->ds.attrib = INIT_DS_ATTRIBS;
+ vmsa->es = vmsa->ds;
+ vmsa->fs = vmsa->ds;
+ vmsa->gs = vmsa->ds;
+ vmsa->ss = vmsa->ds;
+
+ vmsa->gdtr.limit = 0xffff;
+ vmsa->ldtr.limit = 0xffff;
+ vmsa->ldtr.attrib = INIT_LDTR_ATTRIBS;
+ vmsa->idtr.limit = 0xffff;
+ vmsa->tr.limit = 0xffff;
+ vmsa->tr.attrib = INIT_TR_ATTRIBS;
+
+ vmsa->efer = 0x1000; /* Must set SVME bit */
+ vmsa->cr4 = cr4;
+ vmsa->cr0 = 0x60000010;
+ vmsa->dr7 = 0x400;
+ vmsa->dr6 = 0xffff0ff0;
+ vmsa->rflags = 0x2;
+ vmsa->g_pat = 0x0007040600070406ULL;
+ vmsa->xcr0 = 0x1;
+ vmsa->mxcsr = 0x1f80;
+ vmsa->x87_ftw = 0x5555;
+ vmsa->x87_fcw = 0x0040;
+
+ /*
+ * Set the SNP-specific fields for this VMSA:
+ * VMPL level
+ * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
+ */
+ vmsa->vmpl = 0;
+ vmsa->sev_features = sev_status >> 2;
+
+ /* Switch the page over to a VMSA page now that it is initialized */
+ ret = snp_set_vmsa(vmsa);
+ if (ret) {
+ pr_err("set VMSA page failed (%u)\n", ret);
+ free_page((unsigned long)vmsa);
+
+ return -EINVAL;
+ }
+
+ /* Issue VMGEXIT AP Creation NAE event */
+ local_irq_save(flags);
+
+ ghcb = sev_es_get_ghcb(&state);
+
+ vc_ghcb_invalidate(ghcb);
+ ghcb_set_rax(ghcb, vmsa->sev_features);
+ ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
+ ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
+ ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+ VMGEXIT();
+
+ if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
+ lower_32_bits(ghcb->save.sw_exit_info_1)) {
+ pr_alert("SNP AP Creation error\n");
+ ret = -EINVAL;
+ }
+
+ sev_es_put_ghcb(&state);
+
+ local_irq_restore(flags);
+
+ /* Perform cleanup if there was an error */
+ if (ret) {
+ int err = snp_clear_vmsa(vmsa);
+
+ if (err)
+ pr_err("clear VMSA page failed (%u), leaking page\n", err);
+ else
+ free_page((unsigned long)vmsa);
+
+ vmsa = NULL;
+ }
+
+ /* Free up any previous VMSA page */
+ if (cur_vmsa) {
+ int err = snp_clear_vmsa(cur_vmsa);
+
+ if (err)
+ pr_err("clear VMSA page failed (%u), leaking page\n", err);
+ else
+ free_page((unsigned long)cur_vmsa);
+ }
+
+ /* Record the current VMSA page */
+ cur_vmsa = vmsa;
+
+ return ret;
+}
+
+void snp_setup_wakeup_secondary_cpu(void)
+{
+ if (!sev_feature_enabled(SEV_SNP))
+ return;
+
+ apic->wakeup_secondary_cpu = snp_wakeup_cpu_via_vmgexit;
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0ad5214f598a..973145081818 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -82,6 +82,7 @@
#include <asm/spec-ctrl.h>
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
+#include <asm/sev.h>

#ifdef CONFIG_ACPI_CPPC_LIB
#include <acpi/cppc_acpi.h>
@@ -1379,6 +1380,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
smp_quirk_init_udelay();

speculative_store_bypass_ht_init();
+
+ snp_setup_wakeup_secondary_cpu();
}

void arch_thaw_secondary_cpus_begin(void)
--
2.17.1

2021-06-03 19:58:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 03/22] x86/sev: Save the negotiated GHCB version

On Wed, Jun 02, 2021 at 09:03:57AM -0500, Brijesh Singh wrote:
> +/*
> + * Since feature negotiation related variables are set early in the boot
> + * process they must reside in the .data section so as not to be zeroed
> + * out when the .bss section is later cleared.
> + */

From previous review:

...

*
* GHCB protocol version negotiated with the hypervisor.
*/

You need to document what this variable is used for so add a comment
over it please.

> +static u16 ghcb_version __section(".data");
> +

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-04 11:29:36

by Sergio Lopez Pascual

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device

On Wed, Jun 02, 2021 at 09:04:15AM -0500, Brijesh Singh wrote:
> Version 2 of GHCB specification provides NAEs that can be used by the SNP
> guest to communicate with the PSP without risk from a malicious hypervisor
> who wishes to read, alter, drop or replay the messages sent.
>
> The hypervisor uses the SNP_GUEST_REQUEST command interface provided by
> the SEV-SNP firmware to forward the guest messages to the PSP.
>
> In order to communicate with the PSP, the guest needs to locate the secrets
> page inserted by the hypervisor during the SEV-SNP guest launch. The
> secrets page contains the communication keys used to send and receive the
> encrypted messages between the guest and the PSP.
>
> The secrets page is located either through the setup_data cc_blob_address
> or EFI configuration table.
>
> Create a platform device that the SNP guest driver can bind to get the
> platform resources. The SNP guest driver can provide userspace interface
> to get the attestation report, key derivation etc.
>
> The helper snp_issue_guest_request() will be used by the drivers to
> send the guest message request to the hypervisor. The guest message header
> contains a message count. The message count is used in the IV. The
> firmware increments the message count by 1, and expects that next message
> will be using the incremented count.
>
> The helper snp_msg_seqno() will be used by the driver to get the message
> sequence counter, and it will be automatically incremented by
> snp_issue_guest_request(). The incremented value is saved in the
> secrets page so that the kexec'ed kernel knows from where to begin.
>
> See SEV-SNP and GHCB spec for more details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 12 +++
> arch/x86/include/uapi/asm/svm.h | 2 +
> arch/x86/kernel/sev.c | 176 ++++++++++++++++++++++++++++++++
> arch/x86/platform/efi/efi.c | 2 +
> include/linux/efi.h | 1 +
> include/linux/sev-guest.h | 76 ++++++++++++++
> 6 files changed, 269 insertions(+)
> create mode 100644 include/linux/sev-guest.h
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 640108402ae9..da2f757cd9bc 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -59,6 +59,18 @@ extern void vc_no_ghcb(void);
> extern void vc_boot_ghcb(void);
> extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> +/* AMD SEV Confidential computing blob structure */
> +#define CC_BLOB_SEV_HDR_MAGIC 0x45444d41
> +struct cc_blob_sev_info {
> + u32 magic;
> + u16 version;
> + u16 reserved;
> + u64 secrets_phys;
> + u32 secrets_len;
> + u64 cpuid_phys;
> + u32 cpuid_len;
> +};
> +
> /* Software defined (when rFlags.CF = 1) */
> #define PVALIDATE_FAIL_NOUPDATE 255
>
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index c0152186a008..bd64f2b98ac7 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -109,6 +109,7 @@
> #define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
> #define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
> #define SVM_VMGEXIT_PSC 0x80000010
> +#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
> #define SVM_VMGEXIT_AP_CREATION 0x80000013
> #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
> #define SVM_VMGEXIT_AP_CREATE 1
> @@ -222,6 +223,7 @@
> { SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
> { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
> { SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
> + { SVM_VMGEXIT_GUEST_REQUEST, "vmgexit_guest_request" }, \
> { SVM_EXIT_ERR, "invalid_guest_state" }
>
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 8f7ef35a25ef..8aae1166f52e 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -9,6 +9,7 @@
>
> #define pr_fmt(fmt) "SEV-ES: " fmt
>
> +#include <linux/platform_device.h>
> #include <linux/sched/debug.h> /* For show_regs() */
> #include <linux/percpu-defs.h>
> #include <linux/mem_encrypt.h>
> @@ -16,10 +17,13 @@
> #include <linux/printk.h>
> #include <linux/mm_types.h>
> #include <linux/set_memory.h>
> +#include <linux/sev-guest.h>
> #include <linux/memblock.h>
> #include <linux/kernel.h>
> +#include <linux/efi.h>
> #include <linux/mm.h>
> #include <linux/cpumask.h>
> +#include <linux/io.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -33,6 +37,7 @@
> #include <asm/smp.h>
> #include <asm/cpu.h>
> #include <asm/apic.h>
> +#include <asm/setup.h> /* For struct boot_params */
>
> #include "sev-internal.h"
>
> @@ -47,6 +52,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> */
> static struct ghcb __initdata *boot_ghcb;
>
> +static unsigned long snp_secrets_phys;
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -105,6 +112,10 @@ struct ghcb_state {
> struct ghcb *ghcb;
> };
>
> +#ifdef CONFIG_EFI
> +extern unsigned long cc_blob_phys;
> +#endif
> +
> static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
> DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
>
> @@ -1909,3 +1920,168 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
> while (true)
> halt();
> }
> +
> +static struct resource guest_req_res[0];
> +static struct platform_device guest_req_device = {
> + .name = "snp-guest",
> + .id = -1,
> + .resource = guest_req_res,
> + .num_resources = 1,
> +};

Perhaps I'm missing something, but I can't find where the memory for
"guest_req_res" is allocated. In my tests I had to turn this
zero-length array into a single struct to prevent the kernel from
crashing.

Thanks,
Sergio.
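
As an illustration of the fix Sergio describes above, a minimal sketch
(field values copied from the patch hunk, not a tested implementation)
would simply give the array one real element so that .num_resources = 1
points at backing storage:

	/* One backing element so the registered resource has real storage */
	static struct resource guest_req_res[1];
	static struct platform_device guest_req_device = {
		.name		= "snp-guest",
		.id		= -1,
		.resource	= guest_req_res,
		.num_resources	= 1,
	};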



2021-06-05 10:55:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 04/22] x86/mm: Add sev_feature_enabled() helper

On Wed, Jun 02, 2021 at 09:03:58AM -0500, Brijesh Singh wrote:
> @@ -78,6 +85,7 @@ static inline void sev_es_init_vc_handling(void) { }
> static inline bool sme_active(void) { return false; }
> static inline bool sev_active(void) { return false; }
> static inline bool sev_es_active(void) { return false; }
> +static inline bool sev_snp_active(void) { return false; }

Leftover from the previous version, can go.

> +bool sev_feature_enabled(unsigned int type)
> +{
> + switch (type) {
> + case SEV: return sev_status & MSR_AMD64_SEV_ENABLED;
> + case SEV_ES: return sev_status & MSR_AMD64_SEV_ES_ENABLED;
> + case SEV_SNP: return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> + default: return false;
> + }
> +}

Yeah, btw, we might even do a generic one, see:

https://lkml.kernel.org/r/[email protected]

and the following mail.

But that doesn't matter as sev_feature_enabled()'s body can go into
sev_protected_guest_has() or whatever we end up calling it.
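
As a rough illustration only (the attribute names below are hypothetical and
not taken from this series or the linked thread), the body of
sev_feature_enabled() could be folded into such a generic predicate like this:

	/* Hypothetical generic helper; the attribute names are illustrative */
	enum sev_guest_attr {
		SEV_GUEST_MEM_ENCRYPT,
		SEV_GUEST_STATE_ENCRYPT,
		SEV_GUEST_SNP,
	};

	static bool sev_protected_guest_has(enum sev_guest_attr attr)
	{
		switch (attr) {
		case SEV_GUEST_MEM_ENCRYPT:	return sev_status & MSR_AMD64_SEV_ENABLED;
		case SEV_GUEST_STATE_ENCRYPT:	return sev_status & MSR_AMD64_SEV_ES_ENABLED;
		case SEV_GUEST_SNP:		return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
		default:			return false;
		}
	}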

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-07 14:21:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 05/22] x86/sev: Add support for hypervisor feature VMGEXIT

On Wed, Jun 02, 2021 at 09:03:59AM -0500, Brijesh Singh wrote:
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 70f181f20d92..94957c5bdb51 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c

I'm guessing this is in sev-shared.c because it is going to be used by
both stages?

> @@ -20,6 +20,7 @@
> * out when the .bss section is later cleared.
> */
> static u16 ghcb_version __section(".data");

State what this is:

/* Bitmap of SEV features supported by the hypervisor */

> +static u64 hv_features __section(".data");

Also, I'm assuming that bitmap remains immutable during the guest
lifetime so you can do:

static u64 hv_features __ro_after_init;

instead, which will do:

static u64 hv_features __attribute__((__section__(".data..ro_after_init")));

and it'll be in the data section and then also marked read-only after
init, after mark_rodata_ro() more specifically.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-07 14:57:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 06/22] x86/sev: check SEV-SNP features support

On Wed, Jun 02, 2021 at 09:04:00AM -0500, Brijesh Singh wrote:
> static bool early_setup_sev_es(void)

This function is doing SNP init now too, so it should be called
something generic like

do_early_sev_setup()

or so.

> #define GHCB_SEV_ES_GEN_REQ 0
> #define GHCB_SEV_ES_PROT_UNSUPPORTED 1
> +#define GHCB_SEV_ES_SNP_UNSUPPORTED 2

GHCB_SNP_UNSUPPORTED

> +static bool __init sev_snp_check_hypervisor_features(void)

check_hv_features()

is nice and short.

> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 77a754365ba9..9b70b7332614 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -609,6 +609,10 @@ static bool __init sev_es_setup_ghcb(void)

Ditto for this one: setup_ghcb()

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-07 14:59:13

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 05/22] x86/sev: Add support for hypervisor feature VMGEXIT


On 6/7/21 9:19 AM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:03:59AM -0500, Brijesh Singh wrote:
>> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
>> index 70f181f20d92..94957c5bdb51 100644
>> --- a/arch/x86/kernel/sev-shared.c
>> +++ b/arch/x86/kernel/sev-shared.c
> I'm guessing this is in sev-shared.c because it is going to be used by
> both stages?

Yes, the function is used by both stages.


>> @@ -20,6 +20,7 @@
>> * out when the .bss section is later cleared.
>> */
>> static u16 ghcb_version __section(".data");
> State what this is:
>
> /* Bitmap of SEV features supported by the hypervisor */

Noted.


>
>> +static u64 hv_features __section(".data");
> Also, I'm assuming that bitmap remains immutable during the guest
> lifetime so you can do:
>
> static u64 hv_features __ro_after_init;
>
> instead, which will do:
>
> static u64 hv_features __attribute__((__section__(".data..ro_after_init")));
>
> and it'll be in the data section and then also marked read-only after
> init, after mark_rodata_ro() more specifically.

Yes, it should be immutable. I will put it in the ro_after_init section to
mark it read-only. Thanks.


2021-06-07 15:36:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 07/22] x86/sev: Add a helper for the PVALIDATE instruction

On Wed, Jun 02, 2021 at 09:04:01AM -0500, Brijesh Singh wrote:
> +static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
> +{
> + bool no_rmpupdate;
> + int rc;

From a previous review:

Please put over the opcode bytes line:

/* "pvalidate" mnemonic support in binutils 2.36 and newer */

> + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
> + CC_SET(c)
> + : CC_OUT(c) (no_rmpupdate), "=a"(rc)
> + : "a"(vaddr), "c"(rmp_psize), "d"(validate)
> + : "memory", "cc");
> +
> + if (no_rmpupdate)
> + return PVALIDATE_FAIL_NOUPDATE;
> +
> + return rc;
> +}

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-07 16:02:56

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 06/22] x86/sev: check SEV-SNP features support


On 6/7/21 9:54 AM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:04:00AM -0500, Brijesh Singh wrote:
>> static bool early_setup_sev_es(void)
> This function is doing SNP init now too, so it should be called
> something generic like
>
> do_early_sev_setup()
>
> or so.

Okay, noted.


>> #define GHCB_SEV_ES_GEN_REQ 0
>> #define GHCB_SEV_ES_PROT_UNSUPPORTED 1
>> +#define GHCB_SEV_ES_SNP_UNSUPPORTED 2
> GHCB_SNP_UNSUPPORTED

Noted.


>
>> +static bool __init sev_snp_check_hypervisor_features(void)
> check_hv_features()
>
> is nice and short.

Noted.


>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 77a754365ba9..9b70b7332614 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -609,6 +609,10 @@ static bool __init sev_es_setup_ghcb(void)
> Ditto for this one: setup_ghcb()

Noted.


>
> Thx.
>

2021-06-07 19:17:40

by Venu Busireddy

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 00/22] Add AMD Secure Nested Paging (SEV-SNP) Guest Support

On 2021-06-02 09:03:54 -0500, Brijesh Singh wrote:

[ snip ]

> The series is based on tip/master commit
> 493a0d4559fd (origin/master, origin/HEAD) Merge branch 'perf/core'

I could not find that commit (493a0d4559fd) either in
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git repo, or in
git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git repo. Which
repo can I use to apply this series?

Venu

2021-06-07 19:18:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 00/22] Add AMD Secure Nested Paging (SEV-SNP) Guest Support

On Mon, Jun 07, 2021 at 02:15:22PM -0500, Venu Busireddy wrote:
> I could not find that commit (493a0d4559fd) either in
> git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git repo, or in
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git repo. Which
> repo can I use to apply this series?

Use the current tip/master, whichever it is, in the former repo.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-08 11:14:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 08/22] x86/compressed: Add helper for validating pages in the decompression stage

On Wed, Jun 02, 2021 at 09:04:02AM -0500, Brijesh Singh wrote:
> +static void __page_state_change(unsigned long paddr, int op)
> +{
> + u64 val;
> +
> + if (!sev_snp_enabled())
> + return;
> +
> + /*
> + * If private -> shared then invalidate the page before requesting the
> + * state change in the RMP table.
> + */
> + if ((op == SNP_PAGE_STATE_SHARED) && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
> + goto e_pvalidate;
> +
> + /* Issue VMGEXIT to change the page state in RMP table. */
> + sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> + VMGEXIT();
> +
> + /* Read the response of the VMGEXIT. */
> + val = sev_es_rd_ghcb_msr();
> + if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
> + sev_es_terminate(1, GHCB_TERM_PSC);
> +
> + /*
> + * Now that page is added in the RMP table, validate it so that it is
> + * consistent with the RMP entry.
> + */
> + if ((op == SNP_PAGE_STATE_PRIVATE) && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
> + goto e_pvalidate;
> +
> + return;
> +
> +e_pvalidate:
> + sev_es_terminate(1, GHCB_TERM_PVALIDATE);
> +}

You don't even need that label, diff ontop:

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 808fe1f6b170..dd0f22386fd2 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -146,7 +146,7 @@ static void __page_state_change(unsigned long paddr, int op)
* state change in the RMP table.
*/
if ((op == SNP_PAGE_STATE_SHARED) && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
- goto e_pvalidate;
+ sev_es_terminate(1, GHCB_TERM_PVALIDATE);

/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
@@ -162,12 +162,7 @@ static void __page_state_change(unsigned long paddr, int op)
* consistent with the RMP entry.
*/
if ((op == SNP_PAGE_STATE_PRIVATE) && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
- goto e_pvalidate;
-
- return;
-
-e_pvalidate:
- sev_es_terminate(1, GHCB_TERM_PVALIDATE);
+ sev_es_terminate(1, GHCB_TERM_PVALIDATE);
}

void snp_set_page_private(unsigned long paddr)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-08 15:59:05

by Venu Busireddy

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 01/22] x86/sev: shorten GHCB terminate macro names

On 2021-06-02 09:03:55 -0500, Brijesh Singh wrote:
> Suggested-by: Borislav Petkov <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>

Reviewed-by: Venu Busireddy <[email protected]>

> ---
> arch/x86/boot/compressed/sev.c | 6 +++---
> arch/x86/include/asm/sev-common.h | 4 ++--
> arch/x86/kernel/sev-shared.c | 2 +-
> arch/x86/kernel/sev.c | 4 ++--
> 4 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 670e998fe930..28bcf04c022e 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -122,7 +122,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
> static bool early_setup_sev_es(void)
> {
> if (!sev_es_negotiate_protocol())
> - sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
> + sev_es_terminate(GHCB_SEV_ES_PROT_UNSUPPORTED);
>
> if (set_page_decrypted((unsigned long)&boot_ghcb_page))
> return false;
> @@ -175,7 +175,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
> enum es_result result;
>
> if (!boot_ghcb && !early_setup_sev_es())
> - sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> + sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
>
> vc_ghcb_invalidate(boot_ghcb);
> result = vc_init_em_ctxt(&ctxt, regs, exit_code);
> @@ -202,5 +202,5 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
> if (result == ES_OK)
> vc_finish_insn(&ctxt);
> else if (result != ES_RETRY)
> - sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> + sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
> }
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 629c3df243f0..11b7d9cea775 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -54,8 +54,8 @@
> (((((u64)reason_set) & GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) | \
> ((((u64)reason_val) & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS))
>
> -#define GHCB_SEV_ES_REASON_GENERAL_REQUEST 0
> -#define GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED 1
> +#define GHCB_SEV_ES_GEN_REQ 0
> +#define GHCB_SEV_ES_PROT_UNSUPPORTED 1
>
> #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
>
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 6ec8b3bfd76e..14198075ff8b 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -207,7 +207,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
>
> fail:
> /* Terminate the guest */
> - sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> + sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
> }
>
> static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 9578c82832aa..460717e3f72d 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -1383,7 +1383,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
> show_regs(regs);
>
> /* Ask hypervisor to sev_es_terminate */
> - sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> + sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
>
> /* If that fails and we get here - just panic */
> panic("Returned from Terminate-Request to Hypervisor\n");
> @@ -1416,7 +1416,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
>
> /* Do initial setup or terminate the guest */
> if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
> - sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> + sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
>
> vc_ghcb_invalidate(boot_ghcb);
>
> --
> 2.17.1
>

2021-06-08 16:01:23

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 08/22] x86/compressed: Add helper for validating pages in the decompression stage


On 6/8/21 6:12 AM, Borislav Petkov wrote:
>
> You don't even need that label, diff ontop:

I will merge your diff ontop. Thanks


2021-06-08 16:03:09

by Venu Busireddy

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 02/22] x86/sev: Define the Linux specific guest termination reasons

On 2021-06-02 09:03:56 -0500, Brijesh Singh wrote:
> GHCB specification defines the reason code for reason set 0. The reason
> codes defined in the set 0 do not cover all possible causes for a guest
> to request termination.
>
> Reason sets 1 to 255 are reserved for vendor-specific codes.
> Reserve reason set 1 for the Linux guest. Define error codes for
> reason set 1.
>
> While at it, change the sev_es_terminate() to accept the reason set
> parameter.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/boot/compressed/sev.c | 6 +++---
> arch/x86/include/asm/sev-common.h | 5 +++++
> arch/x86/kernel/sev-shared.c | 6 +++---
> arch/x86/kernel/sev.c | 4 ++--
> 4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 28bcf04c022e..87621f4e4703 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -122,7 +122,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
> static bool early_setup_sev_es(void)
> {
> if (!sev_es_negotiate_protocol())
> - sev_es_terminate(GHCB_SEV_ES_PROT_UNSUPPORTED);
> + sev_es_terminate(0, GHCB_SEV_ES_PROT_UNSUPPORTED);
>
> if (set_page_decrypted((unsigned long)&boot_ghcb_page))
> return false;
> @@ -175,7 +175,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
> enum es_result result;
>
> if (!boot_ghcb && !early_setup_sev_es())
> - sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
> + sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);
>
> vc_ghcb_invalidate(boot_ghcb);
> result = vc_init_em_ctxt(&ctxt, regs, exit_code);
> @@ -202,5 +202,5 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
> if (result == ES_OK)
> vc_finish_insn(&ctxt);
> else if (result != ES_RETRY)
> - sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
> + sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);
> }
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 11b7d9cea775..f1e2aacb0d61 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -59,4 +59,9 @@
>
> #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
>
> +/* Linux specific reason codes (used with reason set 1) */
> +#define GHCB_TERM_REGISTER 0 /* GHCB GPA registration failure */
> +#define GHCB_TERM_PSC 1 /* Page State Change failure */
> +#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
> +
> #endif
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 14198075ff8b..de0e7e6c52b8 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
> return true;
> }
>
> -static void __noreturn sev_es_terminate(unsigned int reason)
> +static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
> {
> u64 val = GHCB_MSR_TERM_REQ;
>
> @@ -32,7 +32,7 @@ static void __noreturn sev_es_terminate(unsigned int reason)
> * Tell the hypervisor what went wrong - only reason-set 0 is
> * currently supported.
> */

Since reason set 0 is not the only set supported anymore, maybe the part
about reason set 0 should be removed from the above comment?

Venu

> - val |= GHCB_SEV_TERM_REASON(0, reason);
> + val |= GHCB_SEV_TERM_REASON(set, reason);
>
> /* Request Guest Termination from Hypvervisor */
> sev_es_wr_ghcb_msr(val);
> @@ -207,7 +207,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
>
> fail:
> /* Terminate the guest */
> - sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
> + sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);
> }
>
> static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 460717e3f72d..77a754365ba9 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -1383,7 +1383,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
> show_regs(regs);
>
> /* Ask hypervisor to sev_es_terminate */
> - sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
> + sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);
>
> /* If that fails and we get here - just panic */
> panic("Returned from Terminate-Request to Hypervisor\n");
> @@ -1416,7 +1416,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
>
> /* Do initial setup or terminate the guest */
> if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
> - sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
> + sev_es_terminate(0, GHCB_SEV_ES_GEN_REQ);
>
> vc_ghcb_invalidate(boot_ghcb);
>
> --
> 2.17.1
>

2021-06-08 16:53:42

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 02/22] x86/sev: Define the Linux specific guest termination reasons


On 6/8/21 10:59 AM, Venu Busireddy wrote:
>
>> {
>> u64 val = GHCB_MSR_TERM_REQ;
>>
>> @@ -32,7 +32,7 @@ static void __noreturn sev_es_terminate(unsigned int reason)
>> * Tell the hypervisor what went wrong - only reason-set 0 is
>> * currently supported.
>> */
> Since reason set 0 is not the only set supported anymore, maybe the part
> about reason set 0 should be removed from the above comment?

Sure, I will update the comment. thanks


2021-06-08 17:38:23

by Venu Busireddy

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 03/22] x86/sev: Save the negotiated GHCB version

On 2021-06-02 09:03:57 -0500, Brijesh Singh wrote:
> The SEV-ES guest calls the sev_es_negotiate_protocol() to negotiate the
> GHCB protocol version before establishing the GHCB. Cache the negotiated
> GHCB version so that it can be used later.
>
> Signed-off-by: Brijesh Singh <[email protected]>

Reviewed-by: Venu Busireddy <[email protected]>

> ---
> arch/x86/include/asm/sev.h | 2 +-
> arch/x86/kernel/sev-shared.c | 15 ++++++++++++---
> 2 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index fa5cd05d3b5b..7ec91b1359df 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -12,7 +12,7 @@
> #include <asm/insn.h>
> #include <asm/sev-common.h>
>
> -#define GHCB_PROTO_OUR 0x0001UL
> +#define GHCB_PROTOCOL_MIN 1ULL
> #define GHCB_PROTOCOL_MAX 1ULL
> #define GHCB_DEFAULT_USAGE 0ULL
>
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index de0e7e6c52b8..70f181f20d92 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -14,6 +14,13 @@
> #define has_cpuflag(f) boot_cpu_has(f)
> #endif
>
> +/*
> + * Since feature negotiation related variables are set early in the boot
> + * process they must reside in the .data section so as not to be zeroed
> + * out when the .bss section is later cleared.
> + */
> +static u16 ghcb_version __section(".data");
> +
> static bool __init sev_es_check_cpu_features(void)
> {
> if (!has_cpuflag(X86_FEATURE_RDRAND)) {
> @@ -54,10 +61,12 @@ static bool sev_es_negotiate_protocol(void)
> if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
> return false;
>
> - if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTO_OUR ||
> - GHCB_MSR_PROTO_MIN(val) > GHCB_PROTO_OUR)
> + if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
> + GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
> return false;
>
> + ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
> +
> return true;
> }
>
> @@ -101,7 +110,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> enum es_result ret;
>
> /* Fill in protocol and format specifiers */
> - ghcb->protocol_version = GHCB_PROTOCOL_MAX;
> + ghcb->protocol_version = ghcb_version;
> ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
>
> ghcb_set_sw_exit_code(ghcb, exit_code);
> --
> 2.17.1
>

2021-06-09 18:15:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 09/22] x86/compressed: Register GHCB memory when SEV-SNP is active

On Wed, Jun 02, 2021 at 09:04:03AM -0500, Brijesh Singh wrote:
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 1424b8ffde0b..ae99a8a756fe 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -75,6 +75,17 @@
> #define GHCB_MSR_PSC_ERROR_POS 32
> #define GHCB_MSR_PSC_RESP_VAL(val) ((val) >> GHCB_MSR_PSC_ERROR_POS)
>
> +/* GHCB GPA Register */
> +#define GHCB_MSR_GPA_REG_REQ 0x012
> +#define GHCB_MSR_GPA_REG_VALUE_POS 12
> +#define GHCB_MSR_GPA_REG_GFN_MASK GENMASK_ULL(51, 0)
> +#define GHCB_MSR_GPA_REQ_GFN_VAL(v) \
> + (((unsigned long)((v) & GHCB_MSR_GPA_REG_GFN_MASK) << GHCB_MSR_GPA_REG_VALUE_POS)| \
> + GHCB_MSR_GPA_REG_REQ)
> +
> +#define GHCB_MSR_GPA_REG_RESP 0x013
> +#define GHCB_MSR_GPA_REG_RESP_VAL(v) ((v) >> GHCB_MSR_GPA_REG_VALUE_POS)
> +

Can we pls pay attention to having those REQuests sorted by their
number, like in the GHCB spec, for faster finding?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-09 19:27:31

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device

* Brijesh Singh ([email protected]) wrote:
> Version 2 of GHCB specification provides NAEs that can be used by the SNP
> guest to communicate with the PSP without risk from a malicious hypervisor
> who wishes to read, alter, drop or replay the messages sent.
>
> The hypervisor uses the SNP_GUEST_REQUEST command interface provided by
> the SEV-SNP firmware to forward the guest messages to the PSP.
>
> In order to communicate with the PSP, the guest needs to locate the secrets
> page inserted by the hypervisor during the SEV-SNP guest launch. The
> secrets page contains the communication keys used to send and receive the
> encrypted messages between the guest and the PSP.
>
> The secrets page is located either through the setup_data cc_blob_address
> or EFI configuration table.
>
> Create a platform device that the SNP guest driver can bind to get the
> platform resources. The SNP guest driver can provide userspace interface
> to get the attestation report, key derivation etc.
>
> The helper snp_issue_guest_request() will be used by the drivers to
> send the guest message request to the hypervisor. The guest message header
> contains a message count. The message count is used in the IV. The
> firmware increments the message count by 1, and expects that next message
> will be using the incremented count.
>
> The helper snp_msg_seqno() will be used by the driver to get the message
> sequence counter, and it will be automatically incremented by
> snp_issue_guest_request(). The incremented value is saved in the
> secrets page so that the kexec'ed kernel knows from where to begin.
>
> See SEV-SNP and GHCB spec for more details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 12 +++
> arch/x86/include/uapi/asm/svm.h | 2 +
> arch/x86/kernel/sev.c | 176 ++++++++++++++++++++++++++++++++
> arch/x86/platform/efi/efi.c | 2 +
> include/linux/efi.h | 1 +
> include/linux/sev-guest.h | 76 ++++++++++++++
> 6 files changed, 269 insertions(+)
> create mode 100644 include/linux/sev-guest.h
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 640108402ae9..da2f757cd9bc 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -59,6 +59,18 @@ extern void vc_no_ghcb(void);
> extern void vc_boot_ghcb(void);
> extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> +/* AMD SEV Confidential computing blob structure */
> +#define CC_BLOB_SEV_HDR_MAGIC 0x45444d41
> +struct cc_blob_sev_info {
> + u32 magic;
> + u16 version;
> + u16 reserved;
> + u64 secrets_phys;
> + u32 secrets_len;
> + u64 cpuid_phys;
> + u32 cpuid_len;
> +};
> +
> /* Software defined (when rFlags.CF = 1) */
> #define PVALIDATE_FAIL_NOUPDATE 255
>
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index c0152186a008..bd64f2b98ac7 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -109,6 +109,7 @@
> #define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
> #define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
> #define SVM_VMGEXIT_PSC 0x80000010
> +#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
> #define SVM_VMGEXIT_AP_CREATION 0x80000013
> #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
> #define SVM_VMGEXIT_AP_CREATE 1
> @@ -222,6 +223,7 @@
> { SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
> { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
> { SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
> + { SVM_VMGEXIT_GUEST_REQUEST, "vmgexit_guest_request" }, \
> { SVM_EXIT_ERR, "invalid_guest_state" }
>
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 8f7ef35a25ef..8aae1166f52e 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -9,6 +9,7 @@
>
> #define pr_fmt(fmt) "SEV-ES: " fmt
>
> +#include <linux/platform_device.h>
> #include <linux/sched/debug.h> /* For show_regs() */
> #include <linux/percpu-defs.h>
> #include <linux/mem_encrypt.h>
> @@ -16,10 +17,13 @@
> #include <linux/printk.h>
> #include <linux/mm_types.h>
> #include <linux/set_memory.h>
> +#include <linux/sev-guest.h>
> #include <linux/memblock.h>
> #include <linux/kernel.h>
> +#include <linux/efi.h>
> #include <linux/mm.h>
> #include <linux/cpumask.h>
> +#include <linux/io.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -33,6 +37,7 @@
> #include <asm/smp.h>
> #include <asm/cpu.h>
> #include <asm/apic.h>
> +#include <asm/setup.h> /* For struct boot_params */
>
> #include "sev-internal.h"
>
> @@ -47,6 +52,8 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> */
> static struct ghcb __initdata *boot_ghcb;
>
> +static unsigned long snp_secrets_phys;
> +
> /* #VC handler runtime per-CPU data */
> struct sev_es_runtime_data {
> struct ghcb ghcb_page;
> @@ -105,6 +112,10 @@ struct ghcb_state {
> struct ghcb *ghcb;
> };
>
> +#ifdef CONFIG_EFI
> +extern unsigned long cc_blob_phys;
> +#endif
> +
> static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
> DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
>
> @@ -1909,3 +1920,168 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
> while (true)
> halt();
> }
> +
> +static struct resource guest_req_res[0];
> +static struct platform_device guest_req_device = {
> + .name = "snp-guest",
> + .id = -1,
> + .resource = guest_req_res,
> + .num_resources = 1,
> +};
> +
> +static struct snp_secrets_page_layout *snp_map_secrets_page(void)
> +{
> + u16 __iomem *secrets;
> +
> + if (!snp_secrets_phys || !sev_feature_enabled(SEV_SNP))
> + return NULL;
> +
> + secrets = ioremap_encrypted(snp_secrets_phys, PAGE_SIZE);
> + if (!secrets)
> + return NULL;
> +
> + return (struct snp_secrets_page_layout *)secrets;
> +}
> +
> +u64 snp_msg_seqno(void)
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (layout == NULL)
> + return 0;
> +
> + /* Read the current message sequence counter from secrets pages */
> + count = readl(&layout->os_area.msg_seqno_0);

Why is this seqno_0 - is that because it's the count of talking to the
PSP?

> + iounmap(layout);
> +
> + /*
> + * The message sequence counter for the SNP guest request is a 64-bit value
> + * but version 2 of the GHCB specification defines a 32-bit storage for it.
> + */
> + if ((count + 1) >= INT_MAX)
> + return 0;

Is that UINT_MAX?

> +
> + return count + 1;
> +}
> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
> +
> +static void snp_gen_msg_seqno(void)
> +{
> + struct snp_secrets_page_layout *layout;
> + u64 count;
> +
> + layout = snp_map_secrets_page();
> + if (layout == NULL)
> + return;
> +
> + /* Increment the sequence counter by 2 and save in secrets page. */
> + count = readl(&layout->os_area.msg_seqno_0);
> + count += 2;

Why 2 not 1 ?

> + writel(count, &layout->os_area.msg_seqno_0);
> + iounmap(layout);
> +}
> +
> +static int get_snp_secrets_resource(struct resource *res)
> +{
> + struct setup_header *hdr = &boot_params.hdr;
> + struct cc_blob_sev_info *info;
> + unsigned long paddr;
> + int ret = -ENODEV;
> +
> + /*
> + * The secret page contains the VM encryption key used for encrypting the
> + * messages between the guest and the PSP. The secrets page location is
> + * available either through the setup_data or EFI configuration table.
> + */
> + if (hdr->cc_blob_address) {
> + paddr = hdr->cc_blob_address;

Can you trust the paddr the host has given you or do you need to do some
form of validation?

Dave
> + } else if (efi_enabled(EFI_CONFIG_TABLES)) {
> +#ifdef CONFIG_EFI
> + paddr = cc_blob_phys;
> +#else
> + return -ENODEV;
> +#endif
> + } else {
> + return -ENODEV;
> + }
> +
> + info = memremap(paddr, sizeof(*info), MEMREMAP_WB);
> + if (!info)
> + return -ENOMEM;
> +
> + /* Verify the header that its a valid SEV_SNP CC header */
> + if ((info->magic == CC_BLOB_SEV_HDR_MAGIC) &&
> + info->secrets_phys &&
> + (info->secrets_len == PAGE_SIZE)) {
> + res->start = info->secrets_phys;
> + res->end = info->secrets_phys + info->secrets_len;
> + res->flags = IORESOURCE_MEM;
> + snp_secrets_phys = info->secrets_phys;
> + ret = 0;
> + }
> +
> + memunmap(info);
> + return ret;
> +}
> +
> +static int __init add_snp_guest_request(void)
> +{
> + if (!sev_feature_enabled(SEV_SNP))
> + return -ENODEV;
> +
> + if (get_snp_secrets_resource(&guest_req_res[0]))
> + return -ENODEV;
> +
> + platform_device_register(&guest_req_device);
> + dev_info(&guest_req_device.dev, "registered [secret 0x%llx - 0x%llx]\n",
> + guest_req_res[0].start, guest_req_res[0].end);
> +
> + return 0;
> +}
> +device_initcall(add_snp_guest_request);
> +
> +unsigned long snp_issue_guest_request(int type, struct snp_guest_request_data *input)
> +{
> + struct ghcb_state state;
> + struct ghcb *ghcb;
> + unsigned long id;
> + int ret;
> +
> + if (!sev_feature_enabled(SEV_SNP))
> + return -ENODEV;
> +
> + if (type == GUEST_REQUEST)
> + id = SVM_VMGEXIT_GUEST_REQUEST;
> + else
> + return -EINVAL;
> +
> + ghcb = sev_es_get_ghcb(&state);
> + if (!ghcb)
> + return -ENODEV;
> +
> + vc_ghcb_invalidate(ghcb);
> + ghcb_set_rax(ghcb, input->data_gpa);
> + ghcb_set_rbx(ghcb, input->data_npages);
> +
> + ret = sev_es_ghcb_hv_call(ghcb, NULL, id, input->req_gpa, input->resp_gpa);
> + if (ret)
> + goto e_put;
> +
> + if (ghcb->save.sw_exit_info_2) {
> + ret = ghcb->save.sw_exit_info_2;
> + goto e_put;
> + }
> +
> + /* Command was successful, increment the message sequence counter. */
> + snp_gen_msg_seqno();
> +
> +e_put:
> + sev_es_put_ghcb(&state);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(snp_issue_guest_request);
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 8a26e705cb06..2cca9ee6e1d4 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -57,6 +57,7 @@ static unsigned long efi_systab_phys __initdata;
> static unsigned long prop_phys = EFI_INVALID_TABLE_ADDR;
> static unsigned long uga_phys = EFI_INVALID_TABLE_ADDR;
> static unsigned long efi_runtime, efi_nr_tables;
> +unsigned long cc_blob_phys;
>
> unsigned long efi_fw_vendor, efi_config_table;
>
> @@ -66,6 +67,7 @@ static const efi_config_table_type_t arch_tables[] __initconst = {
> #ifdef CONFIG_X86_UV
> {UV_SYSTEM_TABLE_GUID, &uv_systab_phys, "UVsystab" },
> #endif
> + {EFI_CC_BLOB_GUID, &cc_blob_phys, "CC blob" },
> {},
> };
>
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 6b5d36babfcc..75aeb2a56888 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -344,6 +344,7 @@ void efi_native_runtime_setup(void);
> #define EFI_CERT_SHA256_GUID EFI_GUID(0xc1c41626, 0x504c, 0x4092, 0xac, 0xa9, 0x41, 0xf9, 0x36, 0x93, 0x43, 0x28)
> #define EFI_CERT_X509_GUID EFI_GUID(0xa5c059a1, 0x94e4, 0x4aa7, 0x87, 0xb5, 0xab, 0x15, 0x5c, 0x2b, 0xf0, 0x72)
> #define EFI_CERT_X509_SHA256_GUID EFI_GUID(0x3bd2a492, 0x96c0, 0x4079, 0xb4, 0x20, 0xfc, 0xf9, 0x8e, 0xf1, 0x03, 0xed)
> +#define EFI_CC_BLOB_GUID EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42)
>
> /*
> * This GUID is used to pass to the kernel proper the struct screen_info
> diff --git a/include/linux/sev-guest.h b/include/linux/sev-guest.h
> new file mode 100644
> index 000000000000..51277448a108
> --- /dev/null
> +++ b/include/linux/sev-guest.h
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * AMD Secure Encrypted Virtualization (SEV) guest driver interface
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <[email protected]>
> + *
> + */
> +
> +#ifndef __LINUX_SEV_GUEST_H_
> +#define __LINUX_SEV_GUEST_H_
> +
> +#include <linux/types.h>
> +
> +enum vmgexit_type {
> + GUEST_REQUEST,
> +
> + GUEST_REQUEST_MAX
> +};
> +
> +/*
> + * The secrets page contains 96-bytes of reserved field that can be used by
> + * the guest OS. The guest OS uses the area to save the message sequence
> + * number for each VMPL level.
> + *
> + * See the GHCB spec section Secret page layout for the format for this area.
> + */
> +struct secrets_os_area {
> + u32 msg_seqno_0;
> + u32 msg_seqno_1;
> + u32 msg_seqno_2;
> + u32 msg_seqno_3;
> + u64 ap_jump_table_pa;
> + u8 rsvd[40];
> + u8 guest_usage[32];
> +} __packed;
> +
> +#define VMPCK_KEY_LEN 32
> +
> +/* See the SNP spec secrets page layout section for the structure */
> +struct snp_secrets_page_layout {
> + u32 version;
> + u32 imiEn : 1,
> + rsvd1 : 31;
> + u32 fms;
> + u32 rsvd2;
> + u8 gosvw[16];
> + u8 vmpck0[VMPCK_KEY_LEN];
> + u8 vmpck1[VMPCK_KEY_LEN];
> + u8 vmpck2[VMPCK_KEY_LEN];
> + u8 vmpck3[VMPCK_KEY_LEN];
> + struct secrets_os_area os_area;
> + u8 rsvd3[3840];
> +} __packed;
> +
> +struct snp_guest_request_data {
> + unsigned long req_gpa;
> + unsigned long resp_gpa;
> + unsigned long data_gpa;
> + unsigned int data_npages;
> +};
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +unsigned long snp_issue_guest_request(int vmgexit_type, struct snp_guest_request_data *input);
> +u64 snp_msg_seqno(void);
> +#else
> +
> +static inline unsigned long snp_issue_guest_request(int type,
> + struct snp_guest_request_data *input)
> +{
> + return -ENODEV;
> +}
> +static inline u64 snp_msg_seqno(void) { return 0; }
> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
> +#endif /* __LINUX_SEV_GUEST_H__ */
> --
> 2.17.1
>
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2021-06-10 15:51:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

On Wed, Jun 02, 2021 at 09:04:05AM -0500, Brijesh Singh wrote:
> @@ -65,6 +65,12 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
>
> +/* Memory opertion for snp_prep_memory() */
> +enum snp_mem_op {
> + MEMORY_PRIVATE,
> + MEMORY_SHARED

See below.

> +};
> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> extern struct static_key_false sev_es_enable_key;
> extern void __sev_es_ist_enter(struct pt_regs *regs);
> @@ -103,6 +109,11 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
>
> return rc;
> }
> +void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
> + unsigned int npages);
> +void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> + unsigned int npages);

Align arguments on the opening brace.

> +void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op);
> #else
> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> static inline void sev_es_ist_exit(void) { }
> @@ -110,6 +121,15 @@ static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { ret
> static inline void sev_es_nmi_complete(void) { }
> static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
> static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
> +static inline void __init
> +early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages)

Put those { } at the end of the line:

early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }

no need for separate lines. Ditto below.

> +{
> +}
> +static inline void __init
> +early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages)
> +{
> +}
> +static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) { }
> #endif
>
> #endif
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 455c09a9b2c2..6e9b45bb38ab 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -532,6 +532,111 @@ static u64 get_jump_table_addr(void)
> return ret;
> }
>
> +static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
> +{
> + unsigned long vaddr_end;
> + int rc;
> +
> + vaddr = vaddr & PAGE_MASK;
> + vaddr_end = vaddr + (npages << PAGE_SHIFT);
> +
> + while (vaddr < vaddr_end) {
> + rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
> + if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
> + sev_es_terminate(1, GHCB_TERM_PVALIDATE);
^^

I guess that 1 should be a define too, if we have to be correct:

sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);

or so. Ditto for all other calls of this.
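
For example (the names are only a suggestion, mirroring the reason-set values
the series already passes to sev_es_terminate()):

	/* Reason sets passed as the first argument to sev_es_terminate() */
	#define SEV_TERM_SET_GEN	0	/* GHCB-spec defined reason codes */
	#define SEV_TERM_SET_LINUX	1	/* Linux-specific reason codes */

	sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);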

> +
> + vaddr = vaddr + PAGE_SIZE;
> + }
> +}
> +
> +static void __init early_set_page_state(unsigned long paddr, unsigned int npages, int op)
> +{
> + unsigned long paddr_end;
> + u64 val;
> +
> + paddr = paddr & PAGE_MASK;
> + paddr_end = paddr + (npages << PAGE_SHIFT);
> +
> + while (paddr < paddr_end) {
> + /*
> + * Use the MSR protocol because this function can be called before the GHCB
> + * is established.
> + */
> + sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> + VMGEXIT();
> +
> + val = sev_es_rd_ghcb_msr();
> +
> + if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)

From a previous review:

Does that one need a warning too or am I being too paranoid?

> + goto e_term;
> +
> + if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
> + "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
> + op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
> + paddr, GHCB_MSR_PSC_RESP_VAL(val)))
> + goto e_term;
> +
> + paddr = paddr + PAGE_SIZE;
> + }
> +
> + return;
> +
> +e_term:
> + sev_es_terminate(1, GHCB_TERM_PSC);
> +}
> +
> +void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
> + unsigned int npages)
> +{
> + if (!sev_feature_enabled(SEV_SNP))
> + return;
> +
> + /* Ask hypervisor to add the memory pages in RMP table as a 'private'. */

Ask the hypervisor to mark the memory pages as private in the RMP table.

> + early_set_page_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
> +
> + /* Validate the memory pages after they've been added in the RMP table. */
> + pvalidate_pages(vaddr, npages, 1);
> +}
> +
> +void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> + unsigned int npages)
> +{
> + if (!sev_feature_enabled(SEV_SNP))
> + return;
> +
> + /*
> + * Invalidate the memory pages before they are marked shared in the
> + * RMP table.
> + */
> + pvalidate_pages(vaddr, npages, 0);
> +
> + /* Ask hypervisor to make the memory pages shared in the RMP table. */

mark

> + early_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
> +}
> +
> +void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
> +{
> + unsigned long vaddr, npages;
> +
> + vaddr = (unsigned long)__va(paddr);
> + npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
> +
> + switch (op) {
> + case MEMORY_PRIVATE: {
> + early_snp_set_memory_private(vaddr, paddr, npages);
> + return;
> + }
> + case MEMORY_SHARED: {
> + early_snp_set_memory_shared(vaddr, paddr, npages);
> + return;
> + }
> + default:
> + break;
> + }
> +
> + WARN(1, "invalid memory op %d\n", op);

A lot easier, diff ontop of your patch:

---
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7c2cb5300e43..2ad4b5ab3f6c 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -65,12 +65,6 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMP page size */
#define RMP_PG_SIZE_4K 0

-/* Memory opertion for snp_prep_memory() */
-enum snp_mem_op {
- MEMORY_PRIVATE,
- MEMORY_SHARED
-};
-
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 2a5dce42af35..991d7964cee9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -662,20 +662,13 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
vaddr = (unsigned long)__va(paddr);
npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;

- switch (op) {
- case MEMORY_PRIVATE: {
+ if (op == SNP_PAGE_STATE_PRIVATE)
early_snp_set_memory_private(vaddr, paddr, npages);
- return;
- }
- case MEMORY_SHARED: {
+ else if (op == SNP_PAGE_STATE_SHARED)
early_snp_set_memory_shared(vaddr, paddr, npages);
- return;
+ else {
+ WARN(1, "invalid memory page op %d\n", op);
}
- default:
- break;
- }
-
- WARN(1, "invalid memory op %d\n", op);
}

int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
---

> static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>
> +/*
> + * When SNP is active, changes the page state from private to shared before

s/changes/change/

> + * copying the data from the source to destination and restore after the copy.
> + * This is required because the source address is mapped as decrypted by the
> + * caller of the routine.
> + */
> +static inline void __init snp_memcpy(void *dst, void *src, size_t sz,
> + unsigned long paddr, bool decrypt)
> +{
> + unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
> +
> + if (!sev_feature_enabled(SEV_SNP) || !decrypt) {
> + memcpy(dst, src, sz);
> + return;
> + }
> +
> + /*
> + * If the paddr needs to be accessed decrypted, mark the page

What do you mean "If" - this is the SNP version of memcpy. Just say:

/*
* With SNP, the page address needs to be ...
*/

> + * shared in the RMP table before copying it.
> + */
> + early_snp_set_memory_shared((unsigned long)__va(paddr), paddr, npages);
> +
> + memcpy(dst, src, sz);
> +
> + /* Restore the page state after the memcpy. */
> + early_snp_set_memory_private((unsigned long)__va(paddr), paddr, npages);
> +}
> +
> /*
> * This routine does not change the underlying encryption setting of the
> * page(s) that map this memory. It assumes that eventually the memory is
> @@ -96,8 +125,8 @@ static void __init __sme_early_enc_dec(resource_size_t paddr,
> * Use a temporary buffer, of cache-line multiple size, to
> * avoid data corruption as documented in the APM.
> */
> - memcpy(sme_early_buffer, src, len);
> - memcpy(dst, sme_early_buffer, len);
> + snp_memcpy(sme_early_buffer, src, len, paddr, enc);
> + snp_memcpy(dst, sme_early_buffer, len, paddr, !enc);
>
> early_memunmap(dst, len);
> early_memunmap(src, len);
> @@ -277,9 +306,23 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> else
> sme_early_decrypt(pa, size);
>
> + /*
> + * If page is getting mapped decrypted in the page table, then the page state
> + * change in the RMP table must happen before the page table updates.
> + */
> + if (!enc)
> + early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);

Merge the two branches:

/* Encrypt/decrypt the contents in-place */
if (enc) {
sme_early_encrypt(pa, size);
} else {
sme_early_decrypt(pa, size);

/*
* On SNP, the page state change in the RMP table must happen
* before the page table updates.
*/
early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);
}

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-10 16:07:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 12/22] x86/kernel: Make the bss.decrypted section shared in RMP table

On Wed, Jun 02, 2021 at 09:04:06AM -0500, Brijesh Singh wrote:
> The encryption attribute for the bss.decrypted region is cleared in the
> initial page table build. This is because the section contains the data
> that need to be shared between the guest and the hypervisor.
>
> When SEV-SNP is active, just clearing the encryption attribute in the
> page table is not enough. The page state need to be updated in the RMP
> table.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/kernel/head64.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index de01903c3735..f4c3e632345a 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -288,7 +288,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> if (mem_encrypt_active()) {
> vaddr = (unsigned long)__start_bss_decrypted;
> vaddr_end = (unsigned long)__end_bss_decrypted;
> +
> for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> + /*
> + * When SEV-SNP is active then transition the page to shared in the RMP
> + * table so that it is consistent with the page table attribute change.
> + */
> + early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
> +
> i = pmd_index(vaddr);
> pmd[i] -= sme_get_me_mask();
> }
> --

It seems to me that all that code from the sme_encrypt_kernel(bp); call
to the end of the function should be in a separate function in sev.c
called sev_prepare_kernel(...args...) to be at least abstracted away
from the main boot path.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-11 09:49:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 14/22] x86/mm: Add support to validate memory when changing C-bit

On Wed, Jun 02, 2021 at 09:04:08AM -0500, Brijesh Singh wrote:
> +/* SNP Page State Change NAE event */
> +#define VMGEXIT_PSC_MAX_ENTRY 253
> +
> +struct __packed snp_page_state_header {

psc_hdr

> + u16 cur_entry;
> + u16 end_entry;
> + u32 reserved;
> +};
> +
> +struct __packed snp_page_state_entry {

psc_entry

> + u64 cur_page : 12,
> + gfn : 40,
> + operation : 4,
> + pagesize : 1,
> + reserved : 7;
> +};
> +
> +struct __packed snp_page_state_change {

snp_psc_desc

or so.

> + struct snp_page_state_header header;
> + struct snp_page_state_entry entry[VMGEXIT_PSC_MAX_ENTRY];
> +};

Which would make this struct a lot more readable:

struct __packed snp_psc_desc {
struct psc_hdr hdr;
struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];

> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 6e9b45bb38ab..4847ac81cca3 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -637,6 +637,113 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
> WARN(1, "invalid memory op %d\n", op);
> }
>
> +static int page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data)

vmgexit_psc

> +{
> + struct snp_page_state_header *hdr;
> + int ret = 0;
> +
> + hdr = &data->header;

Make sure to verify that snp_page_state_header.reserved field is always
0 before working more on the header so that people don't put stuff in
there which you cannot change later because it becomes ABI or whatnot.
Ditto for the other reserved fields.

> +
> + /*
> + * As per the GHCB specification, the hypervisor can resume the guest before
> + * processing all the entries. The loop checks whether all the entries are

s/The loop checks/Check/

> + * processed. If not, then keep retrying.

What guarantees that that loop will terminate eventually?

> + */
> + while (hdr->cur_entry <= hdr->end_entry) {

I see that "[t]he hypervisor should ensure that cur_entry and end_entry
represent values within the limits of the GHCB Shared Buffer." but let's
sanity-check that HV here too. We don't trust it, remember? :)

> +
> + ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
> +
> + ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
> +
> + /* Page State Change VMGEXIT can pass error code through exit_info_2. */
> + if (WARN(ret || ghcb->save.sw_exit_info_2,
> + "SEV-SNP: page state change failed ret=%d exit_info_2=%llx\n",
> + ret, ghcb->save.sw_exit_info_2))
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> +static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
> +{
> + struct snp_page_state_change *data;
> + struct snp_page_state_header *hdr;
> + struct snp_page_state_entry *e;
> + unsigned long vaddr_end;
> + struct ghcb_state state;
> + struct ghcb *ghcb;
> + int idx;
> +
> + vaddr = vaddr & PAGE_MASK;
> + vaddr_end = vaddr + (npages << PAGE_SHIFT);

Move those...

> +
> + ghcb = sev_es_get_ghcb(&state);
> + if (unlikely(!ghcb))
> + panic("SEV-SNP: Failed to get GHCB\n");

<--- ... here.

> +
> + data = (struct snp_page_state_change *)ghcb->shared_buffer;
> + hdr = &data->header;
> +
> + while (vaddr < vaddr_end) {
> + e = data->entry;
> + memset(data, 0, sizeof(*data));
> +
> + for (idx = 0; idx < VMGEXIT_PSC_MAX_ENTRY; idx++, e++) {
> + unsigned long pfn;
> +
> + if (is_vmalloc_addr((void *)vaddr))
> + pfn = vmalloc_to_pfn((void *)vaddr);
> + else
> + pfn = __pa(vaddr) >> PAGE_SHIFT;
> +
> + e->gfn = pfn;
> + e->operation = op;
> + hdr->end_entry = idx;
> +
> + /*
> + * The GHCB specification provides the flexibility to
> + * use either 4K or 2MB page size in the RMP table.
> + * The current SNP support does not keep track of the
> + * page size used in the RMP table. To avoid the
> + * overlap request, use the 4K page size in the RMP
> + * table.
> + */
> + e->pagesize = RMP_PG_SIZE_4K;
> + vaddr = vaddr + PAGE_SIZE;

Please put that
e++;

here.

It took me a while to find it hidden at the end of the loop and was
scratching my head as to why are we overwriting e-> everytime.

> +
> + if (vaddr >= vaddr_end)
> + break;

Instead of this silly check here, you can compute the range starting at
vaddr, VMGEXIT_PSC_MAX_ENTRY pages worth, carve out that second for-loop
in a helper called

__set_page_state()

which does the data preparation and does the vmgexit at the end.

Then the outer loop does only the computation and calls that helper.
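For reference, a minimal sketch of what that restructuring could look like, using
the psc_*/vmgexit_psc() renames suggested earlier in this thread (this is an
illustration of the suggestion, not the actual follow-up patch; GHCB teardown and
error handling are omitted):

static void __set_page_state(struct ghcb *ghcb, struct snp_psc_desc *data,
			     unsigned long vaddr, unsigned long vaddr_end, int op)
{
	struct psc_entry *e = data->entries;
	unsigned int i;

	memset(data, 0, sizeof(*data));

	for (i = 0; vaddr < vaddr_end && i < VMGEXIT_PSC_MAX_ENTRY;
	     i++, e++, vaddr += PAGE_SIZE) {
		unsigned long pfn;

		if (is_vmalloc_addr((void *)vaddr))
			pfn = vmalloc_to_pfn((void *)vaddr);
		else
			pfn = __pa(vaddr) >> PAGE_SHIFT;

		e->gfn       = pfn;
		e->operation = op;
		e->pagesize  = RMP_PG_SIZE_4K;

		data->hdr.end_entry = i;
	}

	/* vmgexit_psc() is the renamed page_state_vmgexit() from above */
	if (vmgexit_psc(ghcb, data))
		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
}

static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
{
	unsigned long next, vaddr_end;
	struct ghcb_state state;
	struct ghcb *ghcb;

	vaddr	  = vaddr & PAGE_MASK;
	vaddr_end = vaddr + (npages << PAGE_SHIFT);

	ghcb = sev_es_get_ghcb(&state);
	if (unlikely(!ghcb))
		panic("SEV-SNP: Failed to get GHCB\n");

	while (vaddr < vaddr_end) {
		/* One batch is at most VMGEXIT_PSC_MAX_ENTRY 4K pages */
		next = min(vaddr + VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE, vaddr_end);

		__set_page_state(ghcb, (struct snp_psc_desc *)ghcb->shared_buffer,
				 vaddr, next, op);

		vaddr = next;
	}
}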

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-11 13:20:29

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device

On 6/9/21 2:24 PM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh ([email protected]) wrote:
>> Version 2 of GHCB specification provides NAEs that can be used by the SNP
>> guest to communicate with the PSP without risk from a malicious hypervisor
>> who wishes to read, alter, drop or replay the messages sent.
>>
>> The hypervisor uses the SNP_GUEST_REQUEST command interface provided by
>> the SEV-SNP firmware to forward the guest messages to the PSP.
>>
>> In order to communicate with the PSP, the guest need to locate the secrets
>> page inserted by the hypervisor during the SEV-SNP guest launch. The
>> secrets page contains the communication keys used to send and receive the
>> encrypted messages between the guest and the PSP.
>>
>> The secrets page is located either through the setup_data cc_blob_address
>> or EFI configuration table.
>>
>> Create a platform device that the SNP guest driver can bind to get the
>> platform resources. The SNP guest driver can provide userspace interface
>> to get the attestation report, key derivation etc.
>>
>> The helper snp_issue_guest_request() will be used by the drivers to
>> send the guest message request to the hypervisor. The guest message header
>> contains a message count. The message count is used in the IV. The
>> firmware increments the message count by 1, and expects that next message
>> will be using the incremented count.
>>
>> The helper snp_msg_seqno() will be used by driver to get and message
>> sequence counter, and it will be automatically incremented by the
>> snp_issue_guest_request(). The incremented value is be saved in the
>> secrets page so that the kexec'ed kernel knows from where to begin.
>>
>> See SEV-SNP and GHCB spec for more details.
>>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/include/asm/sev.h | 12 +++
>> arch/x86/include/uapi/asm/svm.h | 2 +
>> arch/x86/kernel/sev.c | 176 ++++++++++++++++++++++++++++++++
>> arch/x86/platform/efi/efi.c | 2 +
>> include/linux/efi.h | 1 +
>> include/linux/sev-guest.h | 76 ++++++++++++++
>> 6 files changed, 269 insertions(+)
>> create mode 100644 include/linux/sev-guest.h
>>

>> +u64 snp_msg_seqno(void)
>> +{
>> + struct snp_secrets_page_layout *layout;
>> + u64 count;
>> +
>> + layout = snp_map_secrets_page();
>> + if (layout == NULL)
>> + return 0;
>> +
>> + /* Read the current message sequence counter from secrets pages */
>> + count = readl(&layout->os_area.msg_seqno_0);
>
> Why is this seqno_0 - is that because it's the count of talking to the
> PSP?

Yes, the sequence number is an ever increasing value that is used in
communicating with the PSP. The PSP maintains the next expected sequence
number and will reject messages which have a sequence number that is not
in sync with the PSP. The 0 refers to the VMPL level. Each VMPL level has
its own sequence number.
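For orientation, a rough sketch of the corresponding part of the secrets page OS
area; only msg_seqno_0 is referenced by the patch above, the remaining per-VMPL
fields and the struct name are assumptions based on this description and the
SEV-SNP firmware spec:

/* Illustrative only -- per-VMPL message sequence counters in the OS area */
struct secrets_os_area {
	u32 msg_seqno_0;	/* sequence counter for VMPL0 (the one used here) */
	u32 msg_seqno_1;	/* VMPL1 */
	u32 msg_seqno_2;	/* VMPL2 */
	u32 msg_seqno_3;	/* VMPL3 */
	/* further OS-area fields omitted */
};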

>
>> + iounmap(layout);
>> +
>> + /*
>> + * The message sequence counter for the SNP guest request is a 64-bit value
>> + * but the version 2 of GHCB specification defines the 32-bit storage for the
>> + * it.
>> + */
>> + if ((count + 1) >= INT_MAX)
>> + return 0;
>
> Is that UINT_MAX?
>
>> +
>> + return count + 1;
>> +}
>> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
>> +
>> +static void snp_gen_msg_seqno(void)
>> +{
>> + struct snp_secrets_page_layout *layout;
>> + u64 count;
>> +
>> + layout = snp_map_secrets_page();
>> + if (layout == NULL)
>> + return;
>> +
>> + /* Increment the sequence counter by 2 and save in secrets page. */
>> + count = readl(&layout->os_area.msg_seqno_0);
>> + count += 2;
>
> Why 2 not 1 ?

The return message by the PSP also increments the sequence number, hence
the increment by 2 instead of 1 for the next message to be submitted.
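To make the counting concrete, an illustrative walk-through based on the code
quoted above (the numbers are examples only): with the saved counter at 0,
snp_msg_seqno() returns 1 and the guest request goes out with sequence number 1;
the PSP's reply consumes sequence number 2; snp_gen_msg_seqno() then stores 2 in
the secrets page, so the next call to snp_msg_seqno() returns 3.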

I'll let Brijesh address the other questions.

Thanks,
Tom

2021-06-14 11:04:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 16/22] KVM: SVM: Create a separate mapping for the SEV-ES save area

On Wed, Jun 02, 2021 at 09:04:10AM -0500, Brijesh Singh wrote:
> +/* Save area definition for SEV-ES and SEV-SNP guests */
> +struct sev_es_save_area {

Can we agree on a convention here to denote SEV-ES and later
variants VS earlier ones so that you don't have "SEV-ES" in the name
sev_es_save_area but to mean that this applies to SNP and future stuff
too?

What about SEV-only guests? I'm assuming those use the old variant.

Which would mean you can call this

struct prot_guest_save_area

or so, so that it doesn't have "sev" in the name and so that there's no
confusion...

Ditto for the size defines.

> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 5bc887e9a986..d93a1c368b61 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -542,12 +542,20 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
> static int sev_es_sync_vmsa(struct vcpu_svm *svm)

Not SEV-ES only anymore, so I guess sev_snp_sync_vmca() or so.

> - struct vmcb_save_area *save = &svm->vmcb->save;
> + struct sev_es_save_area *save = svm->vmsa;
>
> /* Check some debug related fields before encrypting the VMSA */
> - if (svm->vcpu.guest_debug || (save->dr7 & ~DR7_FIXED_1))
> + if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
> return -EINVAL;
>
> + /*
> + * SEV-ES will use a VMSA that is pointed to by the VMCB, not
> + * the traditional VMSA that is part of the VMCB. Copy the
> + * traditional VMSA as it has been built so far (in prep
> + * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.

Ditto - nomenclature.

> + */
> + memcpy(save, &svm->vmcb->save, sizeof(svm->vmcb->save));
> +
> /* Sync registgers */
^^^^^^^^^^

typo. Might as well fix while at it.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-14 12:30:14

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 09/22] x86/compressed: Register GHCB memory when SEV-SNP is active


On 6/9/21 12:47 PM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:04:03AM -0500, Brijesh Singh wrote:
>> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
>> index 1424b8ffde0b..ae99a8a756fe 100644
>> --- a/arch/x86/include/asm/sev-common.h
>> +++ b/arch/x86/include/asm/sev-common.h
>> @@ -75,6 +75,17 @@
>> #define GHCB_MSR_PSC_ERROR_POS 32
>> #define GHCB_MSR_PSC_RESP_VAL(val) ((val) >> GHCB_MSR_PSC_ERROR_POS)
>>
>> +/* GHCB GPA Register */
>> +#define GHCB_MSR_GPA_REG_REQ 0x012
>> +#define GHCB_MSR_GPA_REG_VALUE_POS 12
>> +#define GHCB_MSR_GPA_REG_GFN_MASK GENMASK_ULL(51, 0)
>> +#define GHCB_MSR_GPA_REQ_GFN_VAL(v) \
>> + (((unsigned long)((v) & GHCB_MSR_GPA_REG_GFN_MASK) << GHCB_MSR_GPA_REG_VALUE_POS)| \
>> + GHCB_MSR_GPA_REG_REQ)
>> +
>> +#define GHCB_MSR_GPA_REG_RESP 0x013
>> +#define GHCB_MSR_GPA_REG_RESP_VAL(v) ((v) >> GHCB_MSR_GPA_REG_VALUE_POS)
>> +
> Can we pls pay attention to having those REQuests sorted by their
> number, like in the GHCB spec, for faster finding?

Sure, I will keep them sorted. Thanks.

2021-06-14 12:46:05

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes


On 6/10/21 10:50 AM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:04:05AM -0500, Brijesh Singh wrote:
>> @@ -65,6 +65,12 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>> /* RMP page size */
>> #define RMP_PG_SIZE_4K 0
>>
>> +/* Memory opertion for snp_prep_memory() */
>> +enum snp_mem_op {
>> + MEMORY_PRIVATE,
>> + MEMORY_SHARED
> See below.
>
>> +};
>> +
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>> extern struct static_key_false sev_es_enable_key;
>> extern void __sev_es_ist_enter(struct pt_regs *regs);
>> @@ -103,6 +109,11 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
>>
>> return rc;
>> }
>> +void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
>> + unsigned int npages);
>> +void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
>> + unsigned int npages);
> Align arguments on the opening brace.

Noted.


>
>> +void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op);
>> #else
>> static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>> static inline void sev_es_ist_exit(void) { }
>> @@ -110,6 +121,15 @@ static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { ret
>> static inline void sev_es_nmi_complete(void) { }
>> static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
>> static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
>> +static inline void __init
>> +early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages)
> Put those { } at the end of the line:
>
> early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
>
> no need for separate lines. Ditto below.

Noted.


>
>> +{
>> +}
>> +static inline void __init
>> +early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages)
>> +{
>> +}
>> +static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) { }
>> #endif
>>
>> #endif
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 455c09a9b2c2..6e9b45bb38ab 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -532,6 +532,111 @@ static u64 get_jump_table_addr(void)
>> return ret;
>> }
>>
>> +static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
>> +{
>> + unsigned long vaddr_end;
>> + int rc;
>> +
>> + vaddr = vaddr & PAGE_MASK;
>> + vaddr_end = vaddr + (npages << PAGE_SHIFT);
>> +
>> + while (vaddr < vaddr_end) {
>> + rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
>> + if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
>> + sev_es_terminate(1, GHCB_TERM_PVALIDATE);
> ^^
>
> I guess that 1 should be a define too, if we have to be correct:
>
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
>
> or so. Ditto for all other calls of this.

Sure, I will define a macro for it.


>
>> +
>> + vaddr = vaddr + PAGE_SIZE;
>> + }
>> +}
>> +
>> +static void __init early_set_page_state(unsigned long paddr, unsigned int npages, int op)
>> +{
>> + unsigned long paddr_end;
>> + u64 val;
>> +
>> + paddr = paddr & PAGE_MASK;
>> + paddr_end = paddr + (npages << PAGE_SHIFT);
>> +
>> + while (paddr < paddr_end) {
>> + /*
>> + * Use the MSR protocol because this function can be called before the GHCB
>> + * is established.
>> + */
>> + sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
>> + VMGEXIT();
>> +
>> + val = sev_es_rd_ghcb_msr();
>> +
>> + if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
> From a previous review:
>
> Does that one need a warning too or am I being too paranoid?

IMO, there is no need to add a warning. This case should only happen if it's
either a hypervisor bug or the hypervisor does not follow the GHCB
specification. I followed the SEV-ES vmgexit handling and it does not
warn if the hypervisor returns a wrong response code. We simply
terminate the guest.


>
>> + goto e_term;
>> +
>> + if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
>> + "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
>> + op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
>> + paddr, GHCB_MSR_PSC_RESP_VAL(val)))
>> + goto e_term;
>> +
>> + paddr = paddr + PAGE_SIZE;
>> + }
>> +
>> + return;
>> +
>> +e_term:
>> + sev_es_terminate(1, GHCB_TERM_PSC);
>> +}
>> +
>> +void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
>> + unsigned int npages)
>> +{
>> + if (!sev_feature_enabled(SEV_SNP))
>> + return;
>> +
>> + /* Ask hypervisor to add the memory pages in RMP table as a 'private'. */
> Ask the hypervisor to mark the memory pages as private in the RMP table.

Noted.


>
>> + early_set_page_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
>> +
>> + /* Validate the memory pages after they've been added in the RMP table. */
>> + pvalidate_pages(vaddr, npages, 1);
>> +}
>> +
>> +void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
>> + unsigned int npages)
>> +{
>> + if (!sev_feature_enabled(SEV_SNP))
>> + return;
>> +
>> + /*
>> + * Invalidate the memory pages before they are marked shared in the
>> + * RMP table.
>> + */
>> + pvalidate_pages(vaddr, npages, 0);
>> +
>> + /* Ask hypervisor to make the memory pages shared in the RMP table. */
> mark

Noted.


>> + early_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
>> +}
>> +
>> +void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
>> +{
>> + unsigned long vaddr, npages;
>> +
>> + vaddr = (unsigned long)__va(paddr);
>> + npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
>> +
>> + switch (op) {
>> + case MEMORY_PRIVATE: {
>> + early_snp_set_memory_private(vaddr, paddr, npages);
>> + return;
>> + }
>> + case MEMORY_SHARED: {
>> + early_snp_set_memory_shared(vaddr, paddr, npages);
>> + return;
>> + }
>> + default:
>> + break;
>> + }
>> +
>> + WARN(1, "invalid memory op %d\n", op);
> A lot easier, diff ontop of your patch:

thanks. I will apply it.

I did think about reusing the VMGEXIT-defined macros
SNP_PAGE_STATE_{PRIVATE, SHARED} but I was not sure if you would be okay
with that. Additionally, now both the function name and the macro name will
include "SNP". The call will look like this:

snp_prep_memory(paddr, SNP_PAGE_STATE_PRIVATE)

>
> ---
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 7c2cb5300e43..2ad4b5ab3f6c 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -65,12 +65,6 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
>
> -/* Memory opertion for snp_prep_memory() */
> -enum snp_mem_op {
> - MEMORY_PRIVATE,
> - MEMORY_SHARED
> -};
> -
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> extern struct static_key_false sev_es_enable_key;
> extern void __sev_es_ist_enter(struct pt_regs *regs);
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 2a5dce42af35..991d7964cee9 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -662,20 +662,13 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
> vaddr = (unsigned long)__va(paddr);
> npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
>
> - switch (op) {
> - case MEMORY_PRIVATE: {
> + if (op == SNP_PAGE_STATE_PRIVATE)
> early_snp_set_memory_private(vaddr, paddr, npages);
> - return;
> - }
> - case MEMORY_SHARED: {
> + else if (op == SNP_PAGE_STATE_SHARED)
> early_snp_set_memory_shared(vaddr, paddr, npages);
> - return;
> + else {
> + WARN(1, "invalid memory page op %d\n", op);
> }
> - default:
> - break;
> - }
> -
> - WARN(1, "invalid memory op %d\n", op);
> }
>
> int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
> ---
>
>> static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>>
>> +/*
>> + * When SNP is active, changes the page state from private to shared before
> s/changes/change/

Noted.


>
>> + * copying the data from the source to destination and restore after the copy.
>> + * This is required because the source address is mapped as decrypted by the
>> + * caller of the routine.
>> + */
>> +static inline void __init snp_memcpy(void *dst, void *src, size_t sz,
>> + unsigned long paddr, bool decrypt)
>> +{
>> + unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
>> +
>> + if (!sev_feature_enabled(SEV_SNP) || !decrypt) {
>> + memcpy(dst, src, sz);
>> + return;
>> + }
>> +
>> + /*
>> + * If the paddr needs to be accessed decrypted, mark the page
> What do you mean "If" - this is the SNP version of memcpy. Just say:
>
> /*
> * With SNP, the page address needs to be ...
> */
>
>> + * shared in the RMP table before copying it.
>> + */
>> + early_snp_set_memory_shared((unsigned long)__va(paddr), paddr, npages);
>> +
>> + memcpy(dst, src, sz);
>> +
>> + /* Restore the page state after the memcpy. */
>> + early_snp_set_memory_private((unsigned long)__va(paddr), paddr, npages);
>> +}
>> +
>> /*
>> * This routine does not change the underlying encryption setting of the
>> * page(s) that map this memory. It assumes that eventually the memory is
>> @@ -96,8 +125,8 @@ static void __init __sme_early_enc_dec(resource_size_t paddr,
>> * Use a temporary buffer, of cache-line multiple size, to
>> * avoid data corruption as documented in the APM.
>> */
>> - memcpy(sme_early_buffer, src, len);
>> - memcpy(dst, sme_early_buffer, len);
>> + snp_memcpy(sme_early_buffer, src, len, paddr, enc);
>> + snp_memcpy(dst, sme_early_buffer, len, paddr, !enc);
>>
>> early_memunmap(dst, len);
>> early_memunmap(src, len);
>> @@ -277,9 +306,23 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>> else
>> sme_early_decrypt(pa, size);
>>
>> + /*
>> + * If page is getting mapped decrypted in the page table, then the page state
>> + * change in the RMP table must happen before the page table updates.
>> + */
>> + if (!enc)
>> + early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);
> Merge the two branches:

Noted.


>
> /* Encrypt/decrypt the contents in-place */
> if (enc) {
> sme_early_encrypt(pa, size);
> } else {
> sme_early_decrypt(pa, size);
>
> /*
> * On SNP, the page state change in the RMP table must happen
> * before the page table updates.
> */
> early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);
> }

- Brijesh

2021-06-14 13:06:41

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 14/22] x86/mm: Add support to validate memory when changing C-bit


On 6/11/21 4:44 AM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:04:08AM -0500, Brijesh Singh wrote:
>> +/* SNP Page State Change NAE event */
>> +#define VMGEXIT_PSC_MAX_ENTRY 253
>> +
>> +struct __packed snp_page_state_header {
> psc_hdr

Noted.


>> + u16 cur_entry;
>> + u16 end_entry;
>> + u32 reserved;
>> +};
>> +
>> +struct __packed snp_page_state_entry {
> psc_entry

Noted.


>
>> + u64 cur_page : 12,
>> + gfn : 40,
>> + operation : 4,
>> + pagesize : 1,
>> + reserved : 7;
>> +};
>> +
>> +struct __packed snp_page_state_change {
> snp_psc_desc
>
> or so.

Noted.


>
>> + struct snp_page_state_header header;
>> + struct snp_page_state_entry entry[VMGEXIT_PSC_MAX_ENTRY];
>> +};
> Which would make this struct a lot more readable:
>
> struct __packed snp_psc_desc {
> struct psc_hdr hdr;
> struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
>
Agreed.


>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 6e9b45bb38ab..4847ac81cca3 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -637,6 +637,113 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op)
>> WARN(1, "invalid memory op %d\n", op);
>> }
>>
>> +static int page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data)
> vmgexit_psc

Noted.


>> +{
>> + struct snp_page_state_header *hdr;
>> + int ret = 0;
>> +
>> + hdr = &data->header;
> Make sure to verify that snp_page_state_header.reserved field is always
> 0 before working more on the header so that people don't put stuff in
> there which you cannot change later because it becomes ABI or whatnot.
> Ditto for the other reserved fields.
>
Good point, let me go through both the hypervisor and guest to make sure
that reserved fields are all zero (as defined by the GHCB spec).


>> +
>> + /*
>> + * As per the GHCB specification, the hypervisor can resume the guest before
>> + * processing all the entries. The loop checks whether all the entries are
> s/The loop checks/Check/

Noted.


>
>> + * processed. If not, then keep retrying.
> What guarantees that that loop will terminate eventually?

The guest OS depends on the hypervisor to assist in this operation. The loop
will terminate only after the hypervisor completes the requested
operation. The guest is not protecting itself from DoS-type attacks. A
guest should not proceed until the hypervisor performs the requested page
state change in the RMP table.


>> + */
>> + while (hdr->cur_entry <= hdr->end_entry) {
> I see that "[t]he hypervisor should ensure that cur_entry and end_entry
> represent values within the limits of the GHCB Shared Buffer." but let's
> sanity-check that HV here too. We don't trust it, remember? :)

Let me understand: are you saying that the hypervisor could trick us into
believing that the page state change completed without actually changing it?


>> +
>> + ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
>> +
>> + ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
>> +
>> + /* Page State Change VMGEXIT can pass error code through exit_info_2. */
>> + if (WARN(ret || ghcb->save.sw_exit_info_2,
>> + "SEV-SNP: page state change failed ret=%d exit_info_2=%llx\n",
>> + ret, ghcb->save.sw_exit_info_2))
>> + return 1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void set_page_state(unsigned long vaddr, unsigned int npages, int op)
>> +{
>> + struct snp_page_state_change *data;
>> + struct snp_page_state_header *hdr;
>> + struct snp_page_state_entry *e;
>> + unsigned long vaddr_end;
>> + struct ghcb_state state;
>> + struct ghcb *ghcb;
>> + int idx;
>> +
>> + vaddr = vaddr & PAGE_MASK;
>> + vaddr_end = vaddr + (npages << PAGE_SHIFT);
> Move those...
>
>> +
>> + ghcb = sev_es_get_ghcb(&state);
>> + if (unlikely(!ghcb))
>> + panic("SEV-SNP: Failed to get GHCB\n");
> <--- ... here.

Noted.


>
>> +
>> + data = (struct snp_page_state_change *)ghcb->shared_buffer;
>> + hdr = &data->header;
>> +
>> + while (vaddr < vaddr_end) {
>> + e = data->entry;
>> + memset(data, 0, sizeof(*data));
>> +
>> + for (idx = 0; idx < VMGEXIT_PSC_MAX_ENTRY; idx++, e++) {
>> + unsigned long pfn;
>> +
>> + if (is_vmalloc_addr((void *)vaddr))
>> + pfn = vmalloc_to_pfn((void *)vaddr);
>> + else
>> + pfn = __pa(vaddr) >> PAGE_SHIFT;
>> +
>> + e->gfn = pfn;
>> + e->operation = op;
>> + hdr->end_entry = idx;
>> +
>> + /*
>> + * The GHCB specification provides the flexibility to
>> + * use either 4K or 2MB page size in the RMP table.
>> + * The current SNP support does not keep track of the
>> + * page size used in the RMP table. To avoid the
>> + * overlap request, use the 4K page size in the RMP
>> + * table.
>> + */
>> + e->pagesize = RMP_PG_SIZE_4K;
>> + vaddr = vaddr + PAGE_SIZE;
> Please put that
> e++;
>
> here.
>
> It took me a while to find it hidden at the end of the loop and was
> scratching my head as to why are we overwriting e-> everytime.

Ah, sure I will do it.


>> +
>> + if (vaddr >= vaddr_end)
>> + break;
> Instead of this silly check here, you can compute the range starting at
> vaddr, VMGEXIT_PSC_MAX_ENTRY pages worth, carve out that second for-loop
> in a helper called
>
> __set_page_state()
>
> which does the data preparation and does the vmgexit at the end.
>
> Then the outer loop does only the computation and calls that helper.

Okay, I will look into rearranging the code a bit more to address your
feedback.

-Brijesh

2021-06-14 13:21:45

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device

I see that Tom answered a few comments. I will cover the others.


On 6/9/21 2:24 PM, Dr. David Alan Gilbert wrote:
+ /*
>> + * The message sequence counter for the SNP guest request is a 64-bit value
>> + * but the version 2 of GHCB specification defines the 32-bit storage for the
>> + * it.
>> + */
>> + if ((count + 1) >= INT_MAX)
>> + return 0;
> Is that UINT_MAX?

Good catch. It should be UINT_MAX.


> + /*
> + * The secret page contains the VM encryption key used for encrypting the
> + * messages between the guest and the PSP. The secrets page location is
> + * available either through the setup_data or EFI configuration table.
> + */
> + if (hdr->cc_blob_address) {
> + paddr = hdr->cc_blob_address;
> Can you trust the paddr the host has given you or do you need to do some
> form of validation?
The paddr is mapped encrypted. That means that the data at the paddr must
be encrypted either through the guest or the PSP. After locating the paddr,
we perform a simple sanity check (32-bit magic string "AMDE"). See the
verify-header check below. Unfortunately the secrets page itself does
not contain any magic key which we can use to ensure that
hdr->secret_paddr is actually pointing to the secrets page, but all of
this memory is accessed encrypted so it's safe to access it. If the VMM is
lying to us, that basically means the guest will not be able to communicate
with the PSP and can't do the attestation etc.

>
> Dave
> + } else if (efi_enabled(EFI_CONFIG_TABLES)) {
> +#ifdef CONFIG_EFI
> + paddr = cc_blob_phys;
> +#else
> + return -ENODEV;
> +#endif
> + } else {
> + return -ENODEV;
> + }
> +
> + info = memremap(paddr, sizeof(*info), MEMREMAP_WB);
> + if (!info)
> + return -ENOMEM;
> +
> + /* Verify the header that its a valid SEV_SNP CC header */
> + if ((info->magic == CC_BLOB_SEV_HDR_MAGIC) &&
> + info->secrets_phys &&
> + (info->secrets_len == PAGE_SIZE)) {
> + res->start = info->secrets_phys;
> + res->end = info->secrets_phys + info->secrets_len;
> + res->flags = IORESOURCE_MEM;
> + snp_secrets_phys = info->secrets_phys;
> + ret = 0;
> + }
> +
> + memunmap(info);
> + return ret;
> +}
> +

2021-06-14 17:16:07

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device

* Tom Lendacky ([email protected]) wrote:
> On 6/9/21 2:24 PM, Dr. David Alan Gilbert wrote:
> > * Brijesh Singh ([email protected]) wrote:
> >> Version 2 of GHCB specification provides NAEs that can be used by the SNP
> >> guest to communicate with the PSP without risk from a malicious hypervisor
> >> who wishes to read, alter, drop or replay the messages sent.
> >>
> >> The hypervisor uses the SNP_GUEST_REQUEST command interface provided by
> >> the SEV-SNP firmware to forward the guest messages to the PSP.
> >>
> >> In order to communicate with the PSP, the guest need to locate the secrets
> >> page inserted by the hypervisor during the SEV-SNP guest launch. The
> >> secrets page contains the communication keys used to send and receive the
> >> encrypted messages between the guest and the PSP.
> >>
> >> The secrets page is located either through the setup_data cc_blob_address
> >> or EFI configuration table.
> >>
> >> Create a platform device that the SNP guest driver can bind to get the
> >> platform resources. The SNP guest driver can provide userspace interface
> >> to get the attestation report, key derivation etc.
> >>
> >> The helper snp_issue_guest_request() will be used by the drivers to
> >> send the guest message request to the hypervisor. The guest message header
> >> contains a message count. The message count is used in the IV. The
> >> firmware increments the message count by 1, and expects that next message
> >> will be using the incremented count.
> >>
> >> The helper snp_msg_seqno() will be used by driver to get and message
> >> sequence counter, and it will be automatically incremented by the
> >> snp_issue_guest_request(). The incremented value is be saved in the
> >> secrets page so that the kexec'ed kernel knows from where to begin.
> >>
> >> See SEV-SNP and GHCB spec for more details.
> >>
> >> Signed-off-by: Brijesh Singh <[email protected]>
> >> ---
> >> arch/x86/include/asm/sev.h | 12 +++
> >> arch/x86/include/uapi/asm/svm.h | 2 +
> >> arch/x86/kernel/sev.c | 176 ++++++++++++++++++++++++++++++++
> >> arch/x86/platform/efi/efi.c | 2 +
> >> include/linux/efi.h | 1 +
> >> include/linux/sev-guest.h | 76 ++++++++++++++
> >> 6 files changed, 269 insertions(+)
> >> create mode 100644 include/linux/sev-guest.h
> >>
>
> >> +u64 snp_msg_seqno(void)
> >> +{
> >> + struct snp_secrets_page_layout *layout;
> >> + u64 count;
> >> +
> >> + layout = snp_map_secrets_page();
> >> + if (layout == NULL)
> >> + return 0;
> >> +
> >> + /* Read the current message sequence counter from secrets pages */
> >> + count = readl(&layout->os_area.msg_seqno_0);
> >
> > Why is this seqno_0 - is that because it's the count of talking to the
> > PSP?
>
> Yes, the sequence number is an ever increasing value that is used in
> communicating with the PSP. The PSP maintains the next expected sequence
> number and will reject messages which have a sequence number that is not
> in sync with the PSP. The 0 refers to the VMPL level. Each VMPL level has
> its own sequence number.

Can you just clarify; is that the VMPL of the caller or the destination?
What I'm partially asking here is whether it matters which VMPL the
kernel is running at (which I'm assuming could well be non-0)

> >
> >> + iounmap(layout);
> >> +
> >> + /*
> >> + * The message sequence counter for the SNP guest request is a 64-bit value
> >> + * but the version 2 of GHCB specification defines the 32-bit storage for the
> >> + * it.
> >> + */
> >> + if ((count + 1) >= INT_MAX)
> >> + return 0;
> >
> > Is that UINT_MAX?
> >
> >> +
> >> + return count + 1;
> >> +}
> >> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
> >> +
> >> +static void snp_gen_msg_seqno(void)
> >> +{
> >> + struct snp_secrets_page_layout *layout;
> >> + u64 count;
> >> +
> >> + layout = snp_map_secrets_page();
> >> + if (layout == NULL)
> >> + return;
> >> +
> >> + /* Increment the sequence counter by 2 and save in secrets page. */
> >> + count = readl(&layout->os_area.msg_seqno_0);
> >> + count += 2;
> >
> > Why 2 not 1 ?
>
> The return message by the PSP also increments the sequence number, hence
> the increment by 2 instead of 1 for the next message to be submitted.

OK

Dave

> I'll let Brijesh address the other questions.
>
> Thanks,
> Tom
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2021-06-14 17:24:10

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device

* Brijesh Singh ([email protected]) wrote:
> I see that Tom answered few comments. I will cover others.
>
>
> On 6/9/21 2:24 PM, Dr. David Alan Gilbert wrote:
> + /*
> >> + * The message sequence counter for the SNP guest request is a 64-bit value
> >> + * but the version 2 of GHCB specification defines the 32-bit storage for the
> >> + * it.
> >> + */
> >> + if ((count + 1) >= INT_MAX)
> >> + return 0;
> > Is that UINT_MAX?
>
> Good catch. It should be UINT_MAX.

OK, but I'm also confused by two things:
a) Why +1 given that Tom's reply says this gets incremented by 2 each
time (once for the message, once for the reply)
b) Why >= ? I think here, if count was INT_MAX-1, you'd skip to 0,
skipping INT_MAX - is that what you want?

>
> > + /*
> > + * The secret page contains the VM encryption key used for encrypting the
> > + * messages between the guest and the PSP. The secrets page location is
> > + * available either through the setup_data or EFI configuration table.
> > + */
> > + if (hdr->cc_blob_address) {
> > + paddr = hdr->cc_blob_address;
> > Can you trust the paddr the host has given you or do you need to do some
> > form of validation?
> The paddr is mapped encrypted. That means that data in the paddr must
> be encrypted either through the guest or PSP. After locating the paddr,
> we perform a simply sanity check (32-bit magic string "AMDE"). See the
> verify header check below. Unfortunately the secrets page itself does
> not contain any magic key which we can use to ensure that
> hdr->secret_paddr is actually pointing to the secrets pages but all of
> these memory is accessed encrypted so its safe to access it. If VMM
> lying to us that basically means guest will not be able to communicate
> with the PSP and can't do the attestation etc.

OK; that nails pretty much anything bad that can happen - I was just
thinking if the host did something odd like give you an address in the
middle of some other useful structure.

Dave

> >
> > Dave
> > + } else if (efi_enabled(EFI_CONFIG_TABLES)) {
> > +#ifdef CONFIG_EFI
> > + paddr = cc_blob_phys;
> > +#else
> > + return -ENODEV;
> > +#endif
> > + } else {
> > + return -ENODEV;
> > + }
> > +
> > + info = memremap(paddr, sizeof(*info), MEMREMAP_WB);
> > + if (!info)
> > + return -ENOMEM;
> > +
> > + /* Verify the header that its a valid SEV_SNP CC header */
> > + if ((info->magic == CC_BLOB_SEV_HDR_MAGIC) &&
> > + info->secrets_phys &&
> > + (info->secrets_len == PAGE_SIZE)) {
> > + res->start = info->secrets_phys;
> > + res->end = info->secrets_phys + info->secrets_len;
> > + res->flags = IORESOURCE_MEM;
> > + snp_secrets_phys = info->secrets_phys;
> > + ret = 0;
> > + }
> > +
> > + memunmap(info);
> > + return ret;
> > +}
> > +
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2021-06-14 18:25:16

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device


On 6/14/21 12:15 PM, Dr. David Alan Gilbert wrote:
> * Tom Lendacky ([email protected]) wrote:
>> On 6/9/21 2:24 PM, Dr. David Alan Gilbert wrote:
>>> * Brijesh Singh ([email protected]) wrote:
>>>> Version 2 of GHCB specification provides NAEs that can be used by the SNP
>>>> guest to communicate with the PSP without risk from a malicious hypervisor
>>>> who wishes to read, alter, drop or replay the messages sent.
>>>>
>>>> The hypervisor uses the SNP_GUEST_REQUEST command interface provided by
>>>> the SEV-SNP firmware to forward the guest messages to the PSP.
>>>>
>>>> In order to communicate with the PSP, the guest need to locate the secrets
>>>> page inserted by the hypervisor during the SEV-SNP guest launch. The
>>>> secrets page contains the communication keys used to send and receive the
>>>> encrypted messages between the guest and the PSP.
>>>>
>>>> The secrets page is located either through the setup_data cc_blob_address
>>>> or EFI configuration table.
>>>>
>>>> Create a platform device that the SNP guest driver can bind to get the
>>>> platform resources. The SNP guest driver can provide userspace interface
>>>> to get the attestation report, key derivation etc.
>>>>
>>>> The helper snp_issue_guest_request() will be used by the drivers to
>>>> send the guest message request to the hypervisor. The guest message header
>>>> contains a message count. The message count is used in the IV. The
>>>> firmware increments the message count by 1, and expects that next message
>>>> will be using the incremented count.
>>>>
>>>> The helper snp_msg_seqno() will be used by driver to get and message
>>>> sequence counter, and it will be automatically incremented by the
>>>> snp_issue_guest_request(). The incremented value is be saved in the
>>>> secrets page so that the kexec'ed kernel knows from where to begin.
>>>>
>>>> See SEV-SNP and GHCB spec for more details.
>>>>
>>>> Signed-off-by: Brijesh Singh <[email protected]>
>>>> ---
>>>> arch/x86/include/asm/sev.h | 12 +++
>>>> arch/x86/include/uapi/asm/svm.h | 2 +
>>>> arch/x86/kernel/sev.c | 176 ++++++++++++++++++++++++++++++++
>>>> arch/x86/platform/efi/efi.c | 2 +
>>>> include/linux/efi.h | 1 +
>>>> include/linux/sev-guest.h | 76 ++++++++++++++
>>>> 6 files changed, 269 insertions(+)
>>>> create mode 100644 include/linux/sev-guest.h
>>>>
>>>> +u64 snp_msg_seqno(void)
>>>> +{
>>>> + struct snp_secrets_page_layout *layout;
>>>> + u64 count;
>>>> +
>>>> + layout = snp_map_secrets_page();
>>>> + if (layout == NULL)
>>>> + return 0;
>>>> +
>>>> + /* Read the current message sequence counter from secrets pages */
>>>> + count = readl(&layout->os_area.msg_seqno_0);
>>> Why is this seqno_0 - is that because it's the count of talking to the
>>> PSP?
>> Yes, the sequence number is an ever increasing value that is used in
>> communicating with the PSP. The PSP maintains the next expected sequence
>> number and will reject messages which have a sequence number that is not
>> in sync with the PSP. The 0 refers to the VMPL level. Each VMPL level has
>> its own sequence number.
> Can you just clarify; is that the VMPL of the caller or the destination?
> What I'm partially asking here is whether it matters which VMPL the
> kernel is running at (which I'm assuming could well be non-0)


The caller's VMPL number. Each VMPL has different communication keys;
please see the secrets page layout as described in the SEV-SNP firmware
spec, section 8.14.2.5 [1].

As indicated in the cover letter, the guest and hypervisor patches are
targeted at VMPL0, so we are using the sequence number and key from
VMPL0 only.

[1] https://www.amd.com/system/files/TechDocs/56860.pdf

>
>>>> + iounmap(layout);
>>>> +
>>>> + /*
>>>> + * The message sequence counter for the SNP guest request is a 64-bit value
>>>> + * but the version 2 of GHCB specification defines the 32-bit storage for the
>>>> + * it.
>>>> + */
>>>> + if ((count + 1) >= INT_MAX)
>>>> + return 0;
>>> Is that UINT_MAX?
>>>
>>>> +
>>>> + return count + 1;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(snp_msg_seqno);
>>>> +
>>>> +static void snp_gen_msg_seqno(void)
>>>> +{
>>>> + struct snp_secrets_page_layout *layout;
>>>> + u64 count;
>>>> +
>>>> + layout = snp_map_secrets_page();
>>>> + if (layout == NULL)
>>>> + return;
>>>> +
>>>> + /* Increment the sequence counter by 2 and save in secrets page. */
>>>> + count = readl(&layout->os_area.msg_seqno_0);
>>>> + count += 2;
>>> Why 2 not 1 ?
>> The return message by the PSP also increments the sequence number, hence
>> the increment by 2 instead of 1 for the next message to be submitted.
> OK
>
> Dave
>
>> I'll let Brijesh address the other questions.
>>
>> Thanks,
>> Tom
>>

2021-06-14 19:03:53

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

On Mon, Jun 14, 2021 at 07:45:11AM -0500, Brijesh Singh wrote:
> IMO, there is no need to add a warning. This case should happen if its
> either a hypervisor bug or hypervisor does not follow the GHCB
> specification. I followed the SEV-ES vmgexit handling  and it does not
> warn if the hypervisor returns a wrong response code. We simply
> terminate the guest.

This brings my regular user-friendliness question: will the guest user
know what happened or will the guest simply disappear/freeze without any
hint as to what has happened so that a post-mortem analysis would turn
out hard to decipher?

> I did thought about reusing the VMGEXIT defined macro
> SNP_PAGE_STATE_{PRIVATE, SHARED} but I was not sure if you will be okay
> with that.

Yeah, I think that makes stuff simpler. Unless there's something
speaking against it which we both are not thinking of right now.

> Additionally now both the function name and macro name will
> include the "SNP". The call will look like this:
>
> snp_prep_memory(paddr, SNP_PAGE_STATE_PRIVATE)

Yap, looks ok to me.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-14 19:29:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 14/22] x86/mm: Add support to validate memory when changing C-bit

On Mon, Jun 14, 2021 at 08:05:51AM -0500, Brijesh Singh wrote:
> Guest OS depend on the hypervisor to assist in this operation. The loop
> will terminate only after the hypervisor completes the requested
> operation. Guest is not protecting itself from DoS type of attack. A
> guest should not proceed until hypervisor performs the request page
> state change in the RMP table.

Some of that could be in a comment over that loop, so that it is clear
what the guest strategy is.

> Let me understand, are you saying that hypervisor could trick us into
> believing that page state change completed without actually changing it ?

Nah, I'm just saying that you should verify those ->cur_entry and
->end_entry values.

Of course the guest doesn't protect itself against DoS types of attacks
but this function page_state_vmgexit() here could save ->cur_entry
and ->end_entry on function entry and then compare it each time the
hypercall returns to make sure HV is not doing some shenanigans with
the entries range or even has a bug or so. I.e., it has not changed
->end_entry or ->cur_entry is not going backwards into the buffer.

I know, if uncaught here, it probably will explode later but a cheap
sanity check like that doesn't hurt to have just in case.
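A minimal sketch of such a check inside the (renamed) vmgexit_psc() helper; the
exact message, placement and field names beyond what the posted patch shows are
assumptions:

static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *data)
{
	struct psc_hdr *hdr = &data->hdr;
	u16 cur_entry = hdr->cur_entry;
	u16 end_entry = hdr->end_entry;
	int ret;

	while (hdr->cur_entry <= hdr->end_entry) {
		ghcb_set_sw_scratch(ghcb, (u64)__pa(data));

		ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0);
		if (WARN(ret || ghcb->save.sw_exit_info_2,
			 "SEV-SNP: page state change failed ret=%d exit_info_2=%llx\n",
			 ret, ghcb->save.sw_exit_info_2))
			return 1;

		/*
		 * Sanity-check the window the HV hands back: end_entry must
		 * not grow and cur_entry must not move backwards into the
		 * shared buffer.
		 */
		if (WARN(hdr->end_entry > end_entry || hdr->cur_entry < cur_entry,
			 "SEV-SNP: PSC processed out of order\n"))
			return 1;

		cur_entry = hdr->cur_entry;
	}

	return 0;
}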

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-14 19:34:55

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 16/22] KVM: SVM: Create a separate mapping for the SEV-ES save area

On 6/14/21 5:58 AM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:04:10AM -0500, Brijesh Singh wrote:
>> +/* Save area definition for SEV-ES and SEV-SNP guests */
>> +struct sev_es_save_area {
>
> Can we agree on a convention here to denote SEV-ES and later
> variants VS earlier ones so that you don't have "SEV-ES" in the name
> sev_es_save_area but to mean that this applies to SNP and future stuff
> too?

I was just following the APM, which lists it as the "State Save Area for
SEV-ES."

>
> What about SEV-only guests? I'm assuming those use the old variant.

Correct.

>
> Which would mean you can call this
>
> struct prot_guest_save_area
>
> or so, so that it doesn't have "sev" in the name and so that there's no
> confusion...

I guess we can call it just prot_save_area or protected_save_area or even
encrypted_save_area (no need for guest, since guest is implied, e.g. we
don't call the normal save area guest_save_area).

>
> Ditto for the size defines.
>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 5bc887e9a986..d93a1c368b61 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -542,12 +542,20 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>
>> static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>
> Not SEV-ES only anymore, so I guess sev_snp_sync_vmca() or so.
>
>> - struct vmcb_save_area *save = &svm->vmcb->save;
>> + struct sev_es_save_area *save = svm->vmsa;
>>
>> /* Check some debug related fields before encrypting the VMSA */
>> - if (svm->vcpu.guest_debug || (save->dr7 & ~DR7_FIXED_1))
>> + if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
>> return -EINVAL;
>>
>> + /*
>> + * SEV-ES will use a VMSA that is pointed to by the VMCB, not
>> + * the traditional VMSA that is part of the VMCB. Copy the
>> + * traditional VMSA as it has been built so far (in prep
>> + * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
>
> Ditto - nomenclature.

Yup, that can be made more generic.

>
>> + */
>> + memcpy(save, &svm->vmcb->save, sizeof(svm->vmcb->save));
>> +
>> /* Sync registgers */
> ^^^^^^^^^^
>
> typo. Might as well fix while at it.

Will do.

Thanks,
Tom

>

2021-06-14 19:51:24

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 16/22] KVM: SVM: Create a separate mapping for the SEV-ES save area

On Mon, Jun 14, 2021 at 02:34:03PM -0500, Tom Lendacky wrote:
> I guess we can call it just prot_save_area or protected_save_area or even
> encrypted_save_area (no need for guest, since guest is implied, e.g. we
> don't call the normal save area guest_save_area).

All three sound good to me.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-14 20:52:30

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device


On 6/14/21 12:23 PM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh ([email protected]) wrote:
>> I see that Tom answered few comments. I will cover others.
>>
>>
>> On 6/9/21 2:24 PM, Dr. David Alan Gilbert wrote:
>> + /*
>>>> + * The message sequence counter for the SNP guest request is a 64-bit value
>>>> + * but the version 2 of GHCB specification defines the 32-bit storage for the
>>>> + * it.
>>>> + */
>>>> + if ((count + 1) >= INT_MAX)
>>>> + return 0;
>>> Is that UINT_MAX?
>> Good catch. It should be UINT_MAX.
> OK, but I'm also confused by two things:
> a) Why +1 given that Tom's reply says this gets incremented by 2 each
> time (once for the message, once for the reply)
> b) Why >= ? I think here is count was INT_MAX-1 you'd skip to 0,
> skipping INT_MAX - is that what you want?

That's a bug. I noticed it after you pointed out the INT_MAX check and asked
the question about why 2. I will fix it in the next iteration.


>>> + /*
>>> + * The secret page contains the VM encryption key used for encrypting the
>>> + * messages between the guest and the PSP. The secrets page location is
>>> + * available either through the setup_data or EFI configuration table.
>>> + */
>>> + if (hdr->cc_blob_address) {
>>> + paddr = hdr->cc_blob_address;
>>> Can you trust the paddr the host has given you or do you need to do some
>>> form of validation?
>> The paddr is mapped encrypted. That means that data  in the paddr must
>> be encrypted either through the guest or PSP. After locating the paddr,
>> we perform a simply sanity check (32-bit magic string "AMDE"). See the
>> verify header check below. Unfortunately the secrets page itself does
>> not contain any magic key which we can use to ensure that
>> hdr->secret_paddr is actually pointing to the secrets pages but all of
>> these memory is accessed encrypted so its safe to access it. If VMM
>> lying to us that basically means guest will not be able to communicate
>> with the PSP and can't do the attestation etc.
> OK; that nails pretty much anything bad that can happen - I was just
> thinking if the host did something odd like give you an address in the
> middle of some other useful structure.
>
> Dave
>
>>> Dave
>>> + } else if (efi_enabled(EFI_CONFIG_TABLES)) {
>>> +#ifdef CONFIG_EFI
>>> + paddr = cc_blob_phys;
>>> +#else
>>> + return -ENODEV;
>>> +#endif
>>> + } else {
>>> + return -ENODEV;
>>> + }
>>> +
>>> + info = memremap(paddr, sizeof(*info), MEMREMAP_WB);
>>> + if (!info)
>>> + return -ENOMEM;
>>> +
>>> + /* Verify the header that its a valid SEV_SNP CC header */
>>> + if ((info->magic == CC_BLOB_SEV_HDR_MAGIC) &&
>>> + info->secrets_phys &&
>>> + (info->secrets_len == PAGE_SIZE)) {
>>> + res->start = info->secrets_phys;
>>> + res->end = info->secrets_phys + info->secrets_len;
>>> + res->flags = IORESOURCE_MEM;
>>> + snp_secrets_phys = info->secrets_phys;
>>> + ret = 0;
>>> + }
>>> +
>>> + memunmap(info);
>>> + return ret;
>>> +}
>>> +

2021-06-14 21:02:32

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes


On 6/14/21 2:03 PM, Borislav Petkov wrote:
> On Mon, Jun 14, 2021 at 07:45:11AM -0500, Brijesh Singh wrote:
>> IMO, there is no need to add a warning. This case should happen if its
>> either a hypervisor bug or hypervisor does not follow the GHCB
>> specification. I followed the SEV-ES vmgexit handling  and it does not
>> warn if the hypervisor returns a wrong response code. We simply
>> terminate the guest.
> This brings my regular user-friendliness question: will the guest user
> know what happened or will the guest simply disappear/freeze without any
> hint as to what has happened so that a post-mortem analysis would turn
> out hard to decipher?

When a guest requests to terminate, the guest user (aka VMM) will be
notified through the hypervisor that the guest has requested
termination. KVM defines a fixed set of reason codes that is passed to
the guest user, see
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/kvm.h#L237.
In this particular case the guest user probably gets KVM_EXIT_SHUTDOWN --
i.e. the guest asked to be terminated. If the user wants to see the actual GHCB
reason code then they must look in the KVM log.

Now that we have defined a Linux-specific reason set, we could
potentially define a new error code, "Invalid response code", and return
that instead of the generic termination error in this particular case, so
that when the user looks at the KVM log they see "invalid response code"
instead of the generic GHCB error.

If we go with that approach then I think it makes sense to cover it for
SEV-ES guests too.
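For illustration, such an addition might look roughly like this; the macro name
and value are hypothetical and not part of the GHCB spec or this series, only
SEV_TERM_SET_LINUX and the MSR-protocol response check come from earlier in the
thread:

/* Hypothetical Linux-specific (reason set 1) termination reason code */
#define GHCB_TERM_PSC_RESP_INVALID	3	/* HV sent a non-PSC response code */

		val = sev_es_rd_ghcb_msr();

		if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
			sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC_RESP_INVALID);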


>> I did thought about reusing the VMGEXIT defined macro
>> SNP_PAGE_STATE_{PRIVATE, SHARED} but I was not sure if you will be okay
>> with that.
> Yeah, I think that makes stuff simpler. Unless there's something
> speaking against it which we both are not thinking of right now.
>
>> Additionally now both the function name and macro name will
>> include the "SNP". The call will look like this:
>>
>> snp_prep_memory(paddr, SNP_PAGE_STATE_PRIVATE)
> Yap, looks ok to me.
>
> Thx.
>

2021-06-16 10:08:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

On Mon, Jun 14, 2021 at 04:01:44PM -0500, Brijesh Singh wrote:
> Now that we have to defined a Linux specific reason set, we could
> potentially define a new error code "Invalid response code" and return

I don't understand - you have a WARN for the following check:

if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
"Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",

what's wrong with doing:

if (WARN(GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP,
"Wrong PSC response code: 0x%x\n",
(unsigned int)GHCB_RESP_CODE(val)))
goto e_term;


above it too?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-16 10:23:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 08/22] x86/compressed: Add helper for validating pages in the decompression stage

On Wed, Jun 02, 2021 at 09:04:02AM -0500, Brijesh Singh wrote:
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 3ebf00772f26..1424b8ffde0b 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -56,6 +56,25 @@
> #define GHCB_MSR_HV_FT_RESP_VAL(v) \
> (((unsigned long)((v) & GHCB_MSR_HV_FT_MASK) >> GHCB_MSR_HV_FT_POS))
>
> +#define GHCB_HV_FT_SNP BIT_ULL(0)

That define is already added by

x86/sev: Check SEV-SNP features support

earlier.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-16 11:00:55

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes


On 6/16/21 5:07 AM, Borislav Petkov wrote:
> On Mon, Jun 14, 2021 at 04:01:44PM -0500, Brijesh Singh wrote:
>> Now that we have to defined a Linux specific reason set, we could
>> potentially define a new error code "Invalid response code" and return
> I don't understand - you have a WARN for the following check:
>
> if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
> "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",


This WARN indicates that the command execution failed, but it does not mean
that the hypervisor violated the GHCB protocol.


>
> what's wrong with doing:
>
> if (WARN(GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP,
> "Wrong PSC response code: 0x%x\n",
> (unsigned int)GHCB_RESP_CODE(val)))
> goto e_term;

There are around six MSR-based VMGEXITs, and every VMGEXIT is paired with
a request and a response. The guest uses the request code while submitting
the command, and the hypervisor provides the result through the response
code. If the guest sees that the hypervisor didn't use the response code
(i.e. violated the GHCB protocol), then it terminates itself with the
reason code set to "General termination". See e.g. do_vc_no_ghcb():

https://elixir.bootlin.com/linux/v5.13-rc6/source/arch/x86/kernel/sev-shared.c#L157

I am trying to be consistent with previous VMGEXIT implementations: if
the command itself failed, then use the command-specific error code to
tell the hypervisor why we terminated, but if the hypervisor violated the
GHCB specification, then use the "general request termination".
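
To make that convention concrete, here is a minimal sketch of the two
checks being described (macro and helper names are taken from the patches
quoted in this thread; the exact sev_es_terminate() signature and the
reason-set plumbing from this series are elided):

	val = sev_es_rd_ghcb_msr();

	/* Protocol violation: the HV did not answer with a PSC response at all */
	if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
		sev_es_terminate(GHCB_SEV_ES_GEN_REQ);	/* "general request termination" */

	/* The HV answered, but the page state change itself failed */
	if (GHCB_MSR_PSC_RESP_VAL(val))
		sev_es_terminate(GHCB_TERM_PSC);	/* PSC-specific reason code */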


>
> above it too?
>

2021-06-16 12:04:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

On Wed, Jun 16, 2021 at 06:00:09AM -0500, Brijesh Singh wrote:
> I am trying to be consistent with previous VMGEXIT implementations. If
> the command itself failed then use the command specific error code to
> tell hypervisor why we terminated but if the hypervisor violated the
> GHCB specification then use the "general request termination".

I feel like we're running in circles here: I ask about debuggability
and telling the user what exactly failed and you're giving me some
explanation about what the error codes mean. I can see what they mean.

So let me try again:

Imagine you're a guest owner and you haven't written the SNP code and
you don't know how it works.

You start a guest in the public cloud and it fails because the
hypervisor violates the GHCB protocol and all that guest prints before
it dies is

"general request termination"

How are you - the guest owner - going to find out what exactly happened?

Call support?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-16 12:50:11

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes


On 6/16/21 7:03 AM, Borislav Petkov wrote:
> On Wed, Jun 16, 2021 at 06:00:09AM -0500, Brijesh Singh wrote:
>> I am trying to be consistent with previous VMGEXIT implementations. If
>> the command itself failed then use the command specific error code to
>> tell hypervisor why we terminated but if the hypervisor violated the
>> GHCB specification then use the "general request termination".
> I feel like we're running in circles here: I ask about debuggability
> and telling the user what exactly failed and you're giving me some
> explanation about what the error codes mean. I can see what they mean.
>
> So let me try again:
>
> Imagine you're a guest owner and you haven't written the SNP code and
> you don't know how it works.
>
> You start a guest in the public cloud and it fails because the
> hypervisor violates the GHCB protocol and all that guest prints before
> it dies is
>
> "general request termination"


The GHCB specification does not define a unique error code for every
possible condition. Now that we have reserved reason set 1 for the
Linux-specific error codes, we could add a new error code to cover the
cases of protocol violation. I was highlighting that we should not
overload the meaning of GHCB_TERM_PSC. In my mind, the GHCB_TERM_PSC
error code is used when the guest sees that the hypervisor failed to
change the state. The failure may be because the guest provided a bogus
GPA or an invalid operation code, because RMPUPDATE failed, or because
the HV does not support the SNP feature, etc. But in this case the
failure was due to a protocol error, and IMO we should not use
GHCB_TERM_PSC. Additionally, we should also update CPUID and other
VMGEXITs to use the new error code instead of "general request
termination" so that it's consistent.


If you still think that GHCB_TERM_PSC is valid here, then I am okay with it.
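
For illustration, a possible shape of such a Linux-specific reason code
(the GHCB_TERM_GHCB_PROTOCOL name and its value are purely illustrative,
not defined by the GHCB spec or by this series; this assumes the
sev_es_terminate() variant that takes a (reason set, reason code) pair):

	/* reason set 1 is the Linux-specific set mentioned above */
	#define SEV_TERM_SET_LINUX		1
	/* illustrative only: HV violated the GHCB MSR protocol */
	#define GHCB_TERM_GHCB_PROTOCOL		4

	if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_GHCB_PROTOCOL);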

-Brijesh


2021-06-16 13:03:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

On Wed, Jun 16, 2021 at 07:49:25AM -0500, Brijesh Singh wrote:
> If you still think ...

I think you should answer my question first:

> Imagine you're a guest owner and you haven't written the SNP code and
> you don't know how it works.
>
> You start a guest in the public cloud and it fails because the
> hypervisor violates the GHCB protocol and all that guest prints before
> it dies is
>
> "general request termination"
>
> How are you - the guest owner - going to find out what exactly happened?
>
> Call support?

And let me paraphrase it again: if the error condition with which the
guest terminates is not uniquely identifiable but simply a "general
request", how are such conditions going to be debugged?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-16 13:07:41

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

* Brijesh Singh ([email protected]) wrote:
>
> On 6/16/21 7:03 AM, Borislav Petkov wrote:
> > On Wed, Jun 16, 2021 at 06:00:09AM -0500, Brijesh Singh wrote:
> >> I am trying to be consistent with previous VMGEXIT implementations. If
> >> the command itself failed then use the command specific error code to
> >> tell hypervisor why we terminated but if the hypervisor violated the
> >> GHCB specification then use the "general request termination".
> > I feel like we're running in circles here: I ask about debuggability
> > and telling the user what exactly failed and you're giving me some
> > explanation about what the error codes mean. I can see what they mean.
> >
> > So let me try again:
> >
> > Imagine you're a guest owner and you haven't written the SNP code and
> > you don't know how it works.
> >
> > You start a guest in the public cloud and it fails because the
> > hypervisor violates the GHCB protocol and all that guest prints before
> > it dies is
> >
> > "general request termination"
>
>
> The GHCB specification does not define a unique error code for every
> possible condition. Now that we have reserved reason set 1 for the
> Linux-specific error code, we could add a new error code to cover the
> cases for the protocol violation. I was highlighting that we should not
> overload the meaning of GHCB_TERM_PSC. In my mind, the GHCB_TERM_PSC
> error code is used when the guest sees that the hypervisor failed to
> change the state . The failure maybe because the guest provided a bogus
> GPA or invalid operation code, or RMPUPDATE failure or HV does not
> support SNP feature etc etc. But in this case, the failure was due to
> the protocol error, and IMO we should not use the GHCB_TERM_PSC.
> Additionally, we should also update CPUID and other VMGEXITs to use the
> new error code instead of "general request termination" so that its
> consistent.
>
>
> If you still think that GHCB_TERM_PSC is valid here, then I am okay with it.

I'd kind of agree with Borislav, the more hints we can have as to the
actual failure reason the better - so if you've got multiple cases
where the guest thinks the hypervisor has screwed up, find a way to give
an error code to tell us which one.

Dave

> -Brijesh
>
>
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2021-06-16 13:08:16

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 19/22] x86/sev-snp: SEV-SNP AP creation support

On Wed, Jun 02, 2021 at 09:04:13AM -0500, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>

> Subject: Re: [PATCH Part1 RFC v3 19/22] x86/sev-snp: SEV-SNP AP creation support

The condensed patch description in the subject line should be written in
imperative tone. I.e., it needs a verb.

And to simplify it even more, let's prefix all SEV-* stuff with
"x86/sev: " from now on to mean the whole encrypted virt area.

> To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP
> AP Creation NAE event. This allows for guest control over the AP register
> state rather than trusting the hypervisor with the SEV-ES Jump Table
> address.
>
> During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if
> SEV-SNP is active, will set or override apic->wakeup_secondary_cpu. This
> will allow the SEV-SNP AP Creation NAE event method to be used to boot
> the APs.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> arch/x86/include/asm/sev-common.h | 1 +
> arch/x86/include/asm/sev.h | 13 ++
> arch/x86/include/uapi/asm/svm.h | 5 +
> arch/x86/kernel/sev-shared.c | 5 +
> arch/x86/kernel/sev.c | 206 ++++++++++++++++++++++++++++++
> arch/x86/kernel/smpboot.c | 3 +
> 6 files changed, 233 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 86bb185b5ec1..47aa57bf654a 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -57,6 +57,7 @@
> (((unsigned long)((v) & GHCB_MSR_HV_FT_MASK) >> GHCB_MSR_HV_FT_POS))
>
> #define GHCB_HV_FT_SNP BIT_ULL(0)
> +#define GHCB_HV_FT_SNP_AP_CREATION (BIT_ULL(1) | GHCB_HV_FT_SNP)
>
> /* SNP Page State Change */
> #define GHCB_MSR_PSC_REQ 0x014
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index e2141fc28058..640108402ae9 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -71,6 +71,13 @@ enum snp_mem_op {
> MEMORY_SHARED
> };
>
> +#define RMPADJUST_VMPL_MAX 3
> +#define RMPADJUST_VMPL_MASK GENMASK(7, 0)
> +#define RMPADJUST_VMPL_SHIFT 0
> +#define RMPADJUST_PERM_MASK_MASK GENMASK(7, 0)

mask mask huh?

How about "perm mask" and "perm shift" ?

> +#define RMPADJUST_PERM_MASK_SHIFT 8
> +#define RMPADJUST_VMSA_PAGE_BIT BIT(16)
> +
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> extern struct static_key_false sev_es_enable_key;
> extern void __sev_es_ist_enter(struct pt_regs *regs);
> @@ -116,6 +123,9 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
> void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op);
> void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
> void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
> +

No need for the newlines here - it is all function prototypes lumped
together - the only one who reads them is the compiler.

> +void snp_setup_wakeup_secondary_cpu(void);

"setup" "wakeup" huh?

snp_set_wakeup_secondary_cpu() looks just fine to me. :)

> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index b62226bf51b9..7139c9ba59b2 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -32,6 +32,11 @@ static bool __init sev_es_check_cpu_features(void)
> return true;
> }
>
> +static bool snp_ap_creation_supported(void)
> +{
> + return (hv_features & GHCB_HV_FT_SNP_AP_CREATION) == GHCB_HV_FT_SNP_AP_CREATION;
> +}

Can we get rid of those silly accessors pls?

We established earlier that hv_features is going to be __ro_after_init
so we might just as well export it to sev.c for direct querying -
there's no worry that something'll change it during runtime.

> static bool __init sev_snp_check_hypervisor_features(void)
> {
> if (ghcb_version < 2)
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 4847ac81cca3..8f7ef35a25ef 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -19,6 +19,7 @@
> #include <linux/memblock.h>
> #include <linux/kernel.h>
> #include <linux/mm.h>
> +#include <linux/cpumask.h>
>
> #include <asm/cpu_entry_area.h>
> #include <asm/stacktrace.h>
> @@ -31,6 +32,7 @@
> #include <asm/svm.h>
> #include <asm/smp.h>
> #include <asm/cpu.h>
> +#include <asm/apic.h>
>
> #include "sev-internal.h"
>
> @@ -106,6 +108,8 @@ struct ghcb_state {
> static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
> DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
>
> +static DEFINE_PER_CPU(struct sev_es_save_area *, snp_vmsa);
> +
> /* Needed in vc_early_forward_exception */
> void do_early_exception(struct pt_regs *regs, int trapnr);
>
> @@ -744,6 +748,208 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
> pvalidate_pages(vaddr, npages, 1);
> }
>
> +static int snp_rmpadjust(void *va, unsigned int vmpl, unsigned int perm_mask, bool vmsa)

No need for the "snp_" prefix. Drop it for all static functions here too
pls.

@vmpl can be a u8 so that you don't need to mask it off. The same for
@perm_mask. And then you can drop the mask defines too.

> +{
> + unsigned int attrs;
> + int err;
> +
> + attrs = (vmpl & RMPADJUST_VMPL_MASK) << RMPADJUST_VMPL_SHIFT;

Shift by 0 huh? Can we drop this silliness pls?

/* Make sure Reserved[63:17] is 0 */
attrs = 0;

attrs |= vmpl;

Plain and simple.

> + attrs |= (perm_mask & RMPADJUST_PERM_MASK_MASK) << RMPADJUST_PERM_MASK_SHIFT;

perm_mask is always 0 - you don't even have to pass it in as a function
argument.

> + if (vmsa)
> + attrs |= RMPADJUST_VMSA_PAGE_BIT;
> +
> + /* Perform RMPADJUST */

Add:

/* Instruction mnemonic supported in binutils versions v2.36 and later */

> + asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
> + : "=a" (err)

here you should do:

: ... "c" (RMP_PG_SIZE_4K), ...

so that it is clear what goes into %rcx.
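
Folding these suggestions together, the helper might end up looking
roughly like this (a sketch only; RMPADJUST_VMSA_PAGE_BIT is the define
from the patch and RMP_PG_SIZE_4K is assumed to be the existing 4K
page-size encoding, i.e. 0):

static int rmpadjust(void *va, u8 vmpl, bool vmsa)
{
	u64 attrs;
	int err;

	/* Reserved[63:17] and the permission mask stay 0; VMPL goes in bits [7:0] */
	attrs = vmpl;

	if (vmsa)
		attrs |= RMPADJUST_VMSA_PAGE_BIT;

	/* Instruction mnemonic supported in binutils versions v2.36 and later */
	asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
		      : "=a" (err)
		      : "a" (va), "c" (RMP_PG_SIZE_4K), "d" (attrs)
		      : "memory", "cc");

	return err;
}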

> + : "a" (va), "c" (0), "d" (attrs)
> + : "memory", "cc");
> +
> + return err;
> +}
> +
> +static int snp_clear_vmsa(void *vmsa)
> +{
> + /*
> + * Clear the VMSA attribute for the page:
> + * RDX[7:0] = 1, Target VMPL level, must be numerically
> + * higher than current level (VMPL0)

But RMPADJUST_VMPL_MAX is 3?!

> + * RDX[15:8] = 0, Target permission mask (not used)
> + * RDX[16] = 0, Not a VMSA page
> + */
> + return snp_rmpadjust(vmsa, RMPADJUST_VMPL_MAX, 0, false);
> +}
> +
> +static int snp_set_vmsa(void *vmsa)
> +{
> + /*
> + * To set the VMSA attribute for the page:
> + * RDX[7:0] = 1, Target VMPL level, must be numerically
> + * higher than current level (VMPL0)
> + * RDX[15:8] = 0, Target permission mask (not used)
> + * RDX[16] = 1, VMSA page
> + */
> + return snp_rmpadjust(vmsa, RMPADJUST_VMPL_MAX, 0, true);
> +}
> +
> +#define INIT_CS_ATTRIBS (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
> +#define INIT_DS_ATTRIBS (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK | SVM_SELECTOR_WRITE_MASK)
> +

Put SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK in a helper define and share it in the two
definitions above pls.
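
For example (the INIT_SEG_ATTRIBS name here is only illustrative):

#define INIT_SEG_ATTRIBS	(SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
#define INIT_CS_ATTRIBS		(INIT_SEG_ATTRIBS | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
#define INIT_DS_ATTRIBS		(INIT_SEG_ATTRIBS | SVM_SELECTOR_WRITE_MASK)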

> +#define INIT_LDTR_ATTRIBS (SVM_SELECTOR_P_MASK | 2)
> +#define INIT_TR_ATTRIBS (SVM_SELECTOR_P_MASK | 3)
> +
> +static int snp_wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
> +{
> + struct sev_es_save_area *cur_vmsa;
> + struct sev_es_save_area *vmsa;
> + struct ghcb_state state;
> + struct ghcb *ghcb;
> + unsigned long flags;
> + u8 sipi_vector;
> + u64 cr4;
> + int cpu;
> + int ret;

Remember the reversed xmas tree. And you can combine the variables of
the same type into a single line.

> +
> + if (!snp_ap_creation_supported())
> + return -ENOTSUPP;

WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP
#320: FILE: arch/x86/kernel/sev.c:813:
+ return -ENOTSUPP;

> + /* Override start_ip with known SEV-ES/SEV-SNP starting RIP */
> + if (start_ip == real_mode_header->trampoline_start) {
> + start_ip = real_mode_header->sev_es_trampoline_start;
> + } else {
> + WARN_ONCE(1, "unsupported SEV-SNP start_ip: %lx\n", start_ip);
> + return -EINVAL;
> + }

What's all that checking for? Why not simply and unconditionally doing:

start_ip = real_mode_header->sev_es_trampoline_start;

?

We are waking up an SNP guest so who cares what the previous start_ip
value was?

> + /* Find the logical CPU for the APIC ID */
> + for_each_present_cpu(cpu) {
> + if (arch_match_cpu_phys_id(cpu, apic_id))
> + break;
> + }
> + if (cpu >= nr_cpu_ids)
> + return -EINVAL;
> +
> + cur_vmsa = per_cpu(snp_vmsa, cpu);

Where is that snp_vmsa thing used? I don't see it anywhere in the whole
patchset.

> + vmsa = (struct sev_es_save_area *)get_zeroed_page(GFP_KERNEL);
> + if (!vmsa)
> + return -ENOMEM;
> +
> + /* CR4 should maintain the MCE value */
> + cr4 = native_read_cr4() & ~X86_CR4_MCE;
> +
> + /* Set the CS value based on the start_ip converted to a SIPI vector */
> + sipi_vector = (start_ip >> 12);
> + vmsa->cs.base = sipi_vector << 12;
> + vmsa->cs.limit = 0xffff;
> + vmsa->cs.attrib = INIT_CS_ATTRIBS;
> + vmsa->cs.selector = sipi_vector << 8;
> +
> + /* Set the RIP value based on start_ip */
> + vmsa->rip = start_ip & 0xfff;
> +
> + /* Set VMSA entries to the INIT values as documented in the APM */
> + vmsa->ds.limit = 0xffff;
> + vmsa->ds.attrib = INIT_DS_ATTRIBS;
> + vmsa->es = vmsa->ds;
> + vmsa->fs = vmsa->ds;
> + vmsa->gs = vmsa->ds;
> + vmsa->ss = vmsa->ds;
> +
> + vmsa->gdtr.limit = 0xffff;
> + vmsa->ldtr.limit = 0xffff;
> + vmsa->ldtr.attrib = INIT_LDTR_ATTRIBS;
> + vmsa->idtr.limit = 0xffff;
> + vmsa->tr.limit = 0xffff;
> + vmsa->tr.attrib = INIT_TR_ATTRIBS;
> +
> + vmsa->efer = 0x1000; /* Must set SVME bit */
> + vmsa->cr4 = cr4;
> + vmsa->cr0 = 0x60000010;
> + vmsa->dr7 = 0x400;
> + vmsa->dr6 = 0xffff0ff0;
> + vmsa->rflags = 0x2;
> + vmsa->g_pat = 0x0007040600070406ULL;
> + vmsa->xcr0 = 0x1;
> + vmsa->mxcsr = 0x1f80;
> + vmsa->x87_ftw = 0x5555;
> + vmsa->x87_fcw = 0x0040;

Align them all on a single vertical line pls.

> + /*
> + * Set the SNP-specific fields for this VMSA:
> + * VMPL level
> + * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
> + */
> + vmsa->vmpl = 0;
> + vmsa->sev_features = sev_status >> 2;
> +
> + /* Switch the page over to a VMSA page now that it is initialized */
> + ret = snp_set_vmsa(vmsa);
> + if (ret) {
> + pr_err("set VMSA page failed (%u)\n", ret);
> + free_page((unsigned long)vmsa);
> +
> + return -EINVAL;
> + }
> +
> + /* Issue VMGEXIT AP Creation NAE event */
> + local_irq_save(flags);
> +
> + ghcb = sev_es_get_ghcb(&state);
> +
> + vc_ghcb_invalidate(ghcb);
> + ghcb_set_rax(ghcb, vmsa->sev_features);
> + ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
> + ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
> + ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
> +
> + sev_es_wr_ghcb_msr(__pa(ghcb));
> + VMGEXIT();
> +
> + if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
> + lower_32_bits(ghcb->save.sw_exit_info_1)) {
> + pr_alert("SNP AP Creation error\n");
> + ret = -EINVAL;
> + }
> +
> + sev_es_put_ghcb(&state);
> +
> + local_irq_restore(flags);
> +
> + /* Perform cleanup if there was an error */
> + if (ret) {
> + int err = snp_clear_vmsa(vmsa);
> +


^ Superfluous newline.

> + if (err)
> + pr_err("clear VMSA page failed (%u), leaking page\n", err);
> + else
> + free_page((unsigned long)vmsa);
> +
> + vmsa = NULL;
> + }
> +
> + /* Free up any previous VMSA page */
> + if (cur_vmsa) {
> + int err = snp_clear_vmsa(cur_vmsa);
> +


^ Superfluous newline.

> + if (err)
> + pr_err("clear VMSA page failed (%u), leaking page\n", err);
> + else
> + free_page((unsigned long)cur_vmsa);
> + }
> +
> + /* Record the current VMSA page */
> + cur_vmsa = vmsa;
> +
> + return ret;
> +}

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-16 13:10:55

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes


On 6/16/21 8:02 AM, Borislav Petkov wrote:
> On Wed, Jun 16, 2021 at 07:49:25AM -0500, Brijesh Singh wrote:
>> If you still think ...
> I think you should answer my question first:
>
>> Imagine you're a guest owner and you haven't written the SNP code and
>> you don't know how it works.
>>
>> You start a guest in the public cloud and it fails because the
>> hypervisor violates the GHCB protocol and all that guest prints before
>> it dies is
>>
>> "general request termination"
>>
>> How are you - the guest owner - going to find out what exactly happened?
>>
>> Call support?
> And let me paraphrase it again: if the error condition with which the
> guest terminates is not uniquely identifiable but simply a "general
> request", how are such conditions going to be debugged?

I thought I said it somewhere in our previous conversation: I would look
at the KVM trace log; each VMGEXIT entry and exit is logged. The log
contains the full GHCB MSR value, and from it you can see both the
request and response codes and decode the failure reason.

-Brijesh

2021-06-16 14:37:34

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes



On 6/16/2021 8:10 AM, Brijesh Singh wrote:
>
> On 6/16/21 8:02 AM, Borislav Petkov wrote:
>> On Wed, Jun 16, 2021 at 07:49:25AM -0500, Brijesh Singh wrote:
>>> If you still think ...
>> I think you should answer my question first:
>>
>>> Imagine you're a guest owner and you haven't written the SNP code and
>>> you don't know how it works.
>>>
>>> You start a guest in the public cloud and it fails because the
>>> hypervisor violates the GHCB protocol and all that guest prints before
>>> it dies is
>>>
>>> "general request termination"
>>>
>>> How are you - the guest owner - going to find out what exactly happened?
>>>
>>> Call support?
>> And let me paraphrase it again: if the error condition with which the
>> guest terminates is not uniquely identifiable but simply a "general
>> request", how are such conditions going to be debugged?
> > I thought I said it somewhere in our previous conversation, I would look
> at the KVM trace log, each vmgexit entry and exit are logged. The log
> contains full GHCB MSR value, and in it you can see both the request and
> response code and decode the failure reason.
>

I now realize that in this case we may not have the traces. It's
a production environment and my development machine :(. I will go ahead
and add an error message when the guest sees an invalid response code
before terminating. I will add a similar error message in the
decompression path.

-Brijesh

2021-06-16 14:38:02

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes



On 6/16/2021 9:36 AM, Brijesh Singh wrote:
>
>
> On 6/16/2021 8:10 AM, Brijesh Singh wrote:
>>
>> On 6/16/21 8:02 AM, Borislav Petkov wrote:
>>> On Wed, Jun 16, 2021 at 07:49:25AM -0500, Brijesh Singh wrote:
>>>> If you still think ...
>>> I think you should answer my question first:
>>>
>>>> Imagine you're a guest owner and you haven't written the SNP code and
>>>> you don't know how it works.
>>>>
>>>> You start a guest in the public cloud and it fails because the
>>>> hypervisor violates the GHCB protocol and all that guest prints before
>>>> it dies is
>>>>
>>>> "general request termination"
>>>>
>>>> How are you - the guest owner - going to find out what exactly happened?
>>>>
>>>> Call support?
>>> And let me paraphrase it again: if the error condition with which the
>>> guest terminates is not uniquely identifiable but simply a "general
>>> request", how are such conditions going to be debugged?
>>> I thought I said it somewhere in our previous conversation, I would look
>> at the KVM trace log, each vmgexit entry and exit are logged. The log
>> contains full GHCB MSR value, and in it you can see both the request and
>> response code and decode the failure reason.
>>
>
> I now realize that in this case we may not have the traces. It's
> a production environment and my development machine :(.

I meant to say *not* my development machine

2021-06-16 16:14:07

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 19/22] x86/sev-snp: SEV-SNP AP creation support

On 6/16/21 8:07 AM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:04:13AM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>
>> Subject: Re: [PATCH Part1 RFC v3 19/22] x86/sev-snp: SEV-SNP AP creation support
>
> The condensed patch description in the subject line should be written in
> imperative tone. I.e., it needs a verb.
>
> And to simplify it even more, let's prefix all SEV-* stuff with
> "x86/sev: " from now on to mean the whole encrypted virt area.

Yup, I'll make those changes.

>
>> To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP
>> AP Creation NAE event. This allows for guest control over the AP register
>> state rather than trusting the hypervisor with the SEV-ES Jump Table
>> address.
>>
>> During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if
>> SEV-SNP is active, will set or override apic->wakeup_secondary_cpu. This
>> will allow the SEV-SNP AP Creation NAE event method to be used to boot
>> the APs.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> ---
>> arch/x86/include/asm/sev-common.h | 1 +
>> arch/x86/include/asm/sev.h | 13 ++
>> arch/x86/include/uapi/asm/svm.h | 5 +
>> arch/x86/kernel/sev-shared.c | 5 +
>> arch/x86/kernel/sev.c | 206 ++++++++++++++++++++++++++++++
>> arch/x86/kernel/smpboot.c | 3 +
>> 6 files changed, 233 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
>> index 86bb185b5ec1..47aa57bf654a 100644
>> --- a/arch/x86/include/asm/sev-common.h
>> +++ b/arch/x86/include/asm/sev-common.h
>> @@ -57,6 +57,7 @@
>> (((unsigned long)((v) & GHCB_MSR_HV_FT_MASK) >> GHCB_MSR_HV_FT_POS))
>>
>> #define GHCB_HV_FT_SNP BIT_ULL(0)
>> +#define GHCB_HV_FT_SNP_AP_CREATION (BIT_ULL(1) | GHCB_HV_FT_SNP)
>>
>> /* SNP Page State Change */
>> #define GHCB_MSR_PSC_REQ 0x014
>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>> index e2141fc28058..640108402ae9 100644
>> --- a/arch/x86/include/asm/sev.h
>> +++ b/arch/x86/include/asm/sev.h
>> @@ -71,6 +71,13 @@ enum snp_mem_op {
>> MEMORY_SHARED
>> };
>>
>> +#define RMPADJUST_VMPL_MAX 3
>> +#define RMPADJUST_VMPL_MASK GENMASK(7, 0)
>> +#define RMPADJUST_VMPL_SHIFT 0
>> +#define RMPADJUST_PERM_MASK_MASK GENMASK(7, 0)
>
> mask mask huh?
>
> How about "perm mask" and "perm shift" ?

Yeah, I debated on that one. I wanted to stay close to what the APM has,
which is PERM_MASK (TARGET_PERM_MASK actually), but it does look odd. I'll
get rid of the MASK portion of PERM_MASK.

>
>> +#define RMPADJUST_PERM_MASK_SHIFT 8
>> +#define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>> +
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>> extern struct static_key_false sev_es_enable_key;
>> extern void __sev_es_ist_enter(struct pt_regs *regs);
>> @@ -116,6 +123,9 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
>> void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op);
>> void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
>> void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
>> +
>
> No need for the newlines here - it is all function prototypes lumped
> together - the only one who reads them is the compiler.
>
>> +void snp_setup_wakeup_secondary_cpu(void);
>
> "setup" "wakeup" huh?
>
> snp_set_wakeup_secondary_cpu() looks just fine to me. :)

Will do.

>
>> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
>> index b62226bf51b9..7139c9ba59b2 100644
>> --- a/arch/x86/kernel/sev-shared.c
>> +++ b/arch/x86/kernel/sev-shared.c
>> @@ -32,6 +32,11 @@ static bool __init sev_es_check_cpu_features(void)
>> return true;
>> }
>>
>> +static bool snp_ap_creation_supported(void)
>> +{
>> + return (hv_features & GHCB_HV_FT_SNP_AP_CREATION) == GHCB_HV_FT_SNP_AP_CREATION;
>> +}
>
> Can we get rid of those silly accessors pls?

Will do.

>
> We established earlier that hv_features is going to be __ro_after_init
> so we might just as well export it to sev.c for direct querying -
> there's no worry that something'll change it during runtime.
>
>> static bool __init sev_snp_check_hypervisor_features(void)
>> {
>> if (ghcb_version < 2)
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 4847ac81cca3..8f7ef35a25ef 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -19,6 +19,7 @@
>> #include <linux/memblock.h>
>> #include <linux/kernel.h>
>> #include <linux/mm.h>
>> +#include <linux/cpumask.h>
>>
>> #include <asm/cpu_entry_area.h>
>> #include <asm/stacktrace.h>
>> @@ -31,6 +32,7 @@
>> #include <asm/svm.h>
>> #include <asm/smp.h>
>> #include <asm/cpu.h>
>> +#include <asm/apic.h>
>>
>> #include "sev-internal.h"
>>
>> @@ -106,6 +108,8 @@ struct ghcb_state {
>> static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
>> DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
>>
>> +static DEFINE_PER_CPU(struct sev_es_save_area *, snp_vmsa);
>> +
>> /* Needed in vc_early_forward_exception */
>> void do_early_exception(struct pt_regs *regs, int trapnr);
>>
>> @@ -744,6 +748,208 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
>> pvalidate_pages(vaddr, npages, 1);
>> }
>>
>> +static int snp_rmpadjust(void *va, unsigned int vmpl, unsigned int perm_mask, bool vmsa)
>
> No need for the "snp_" prefix. Drop it for all static functions here too
> pls.
>
> @vmpl can be a u8 so that you don't need to mask it off. The same for
> @perm_mask. And then you can drop the mask defines too.

Will do.

>
>> +{
>> + unsigned int attrs;
>> + int err;
>> +
>> + attrs = (vmpl & RMPADJUST_VMPL_MASK) << RMPADJUST_VMPL_SHIFT;
>
> Shift by 0 huh? Can we drop this silliness pls?
>
> /* Make sure Reserved[63:17] is 0 */
> attrs = 0;
>
> attrs |= vmpl;
>
> Plain and simple.
>
>> + attrs |= (perm_mask & RMPADJUST_PERM_MASK_MASK) << RMPADJUST_PERM_MASK_SHIFT;
>
> perm_mask is always 0 - you don't even have to pass it in as a function
> argument.

Will do. The compiler should be smart enough to do the right thing; I was
just trying to show the structure of the input. But it's easy enough to drop it.

>
>> + if (vmsa)
>> + attrs |= RMPADJUST_VMSA_PAGE_BIT;
>> +
>> + /* Perform RMPADJUST */
>
> Add:
>
> /* Instruction mnemonic supported in binutils versions v2.36 and later */

Will do.

>
>> + asm volatile (".byte 0xf3,0x0f,0x01,0xfe\n\t"
>> + : "=a" (err)
>
> here you should do:
>
> : ... "c" (RMP_PG_SIZE_4K), ...
>
> so that it is clear what goes into %rcx.

Will do.

>
>> + : "a" (va), "c" (0), "d" (attrs)
>> + : "memory", "cc");
>> +
>> + return err;
>> +}
>> +
>> +static int snp_clear_vmsa(void *vmsa)
>> +{
>> + /*
>> + * Clear the VMSA attribute for the page:
>> + * RDX[7:0] = 1, Target VMPL level, must be numerically
>> + * higher than current level (VMPL0)
>
> But RMPADJUST_VMPL_MAX is 3?!

Yup, I'll make this change. And actually, the max VMPL is defined via
CPUID, so I may look at saving that off somewhere or just retrieve it here
and use that.

>
>> + * RDX[15:8] = 0, Target permission mask (not used)
>> + * RDX[16] = 0, Not a VMSA page
>> + */
>> + return snp_rmpadjust(vmsa, RMPADJUST_VMPL_MAX, 0, false);
>> +}
>> +
>> +static int snp_set_vmsa(void *vmsa)
>> +{
>> + /*
>> + * To set the VMSA attribute for the page:
>> + * RDX[7:0] = 1, Target VMPL level, must be numerically
>> + * higher than current level (VMPL0)
>> + * RDX[15:8] = 0, Target permission mask (not used)
>> + * RDX[16] = 1, VMSA page
>> + */
>> + return snp_rmpadjust(vmsa, RMPADJUST_VMPL_MAX, 0, true);
>> +}
>> +
>> +#define INIT_CS_ATTRIBS (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
>> +#define INIT_DS_ATTRIBS (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK | SVM_SELECTOR_WRITE_MASK)
>> +
>
> Put SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK in a helper define and share it in the two
> definitions above pls.

Will do.

>
>> +#define INIT_LDTR_ATTRIBS (SVM_SELECTOR_P_MASK | 2)
>> +#define INIT_TR_ATTRIBS (SVM_SELECTOR_P_MASK | 3)
>> +
>> +static int snp_wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
>> +{
>> + struct sev_es_save_area *cur_vmsa;
>> + struct sev_es_save_area *vmsa;
>> + struct ghcb_state state;
>> + struct ghcb *ghcb;
>> + unsigned long flags;
>> + u8 sipi_vector;
>> + u64 cr4;
>> + int cpu;
>> + int ret;
>
> Remember the reversed xmas tree. And you can combine the variables of
> the same type into a single line.

Yup.

>
>> +
>> + if (!snp_ap_creation_supported())
>> + return -ENOTSUPP;
>
> WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP
> #320: FILE: arch/x86/kernel/sev.c:813:
> + return -ENOTSUPP;

Yup.

>
>> + /* Override start_ip with known SEV-ES/SEV-SNP starting RIP */
>> + if (start_ip == real_mode_header->trampoline_start) {
>> + start_ip = real_mode_header->sev_es_trampoline_start;
>> + } else {
>> + WARN_ONCE(1, "unsupported SEV-SNP start_ip: %lx\n", start_ip);
>> + return -EINVAL;
>> + }
>
> What's all that checking for? Why not simply and unconditionally doing:
>
> start_ip = real_mode_header->sev_es_trampoline_start;
>
> ?
>
> We are waking up an SNP guest so who cares what the previous start_ip
> value was?

It is to catch any future change that adds a new trampoline start
location that may do something different. If that happens, then this will
do the wrong thing, so it's just a safeguard.

>
>> + /* Find the logical CPU for the APIC ID */
>> + for_each_present_cpu(cpu) {
>> + if (arch_match_cpu_phys_id(cpu, apic_id))
>> + break;
>> + }
>> + if (cpu >= nr_cpu_ids)
>> + return -EINVAL;
>> +
>> + cur_vmsa = per_cpu(snp_vmsa, cpu);
>
> Where is that snp_vmsa thing used? I don't see it anywhere in the whole
> patchset.

Ah, good catch. It should be set at the end of the function in place of
the "cur_vmsa = vmsa" statement.

>
>> + vmsa = (struct sev_es_save_area *)get_zeroed_page(GFP_KERNEL);
>> + if (!vmsa)
>> + return -ENOMEM;
>> +
>> + /* CR4 should maintain the MCE value */
>> + cr4 = native_read_cr4() & ~X86_CR4_MCE;
>> +
>> + /* Set the CS value based on the start_ip converted to a SIPI vector */
>> + sipi_vector = (start_ip >> 12);
>> + vmsa->cs.base = sipi_vector << 12;
>> + vmsa->cs.limit = 0xffff;
>> + vmsa->cs.attrib = INIT_CS_ATTRIBS;
>> + vmsa->cs.selector = sipi_vector << 8;
>> +
>> + /* Set the RIP value based on start_ip */
>> + vmsa->rip = start_ip & 0xfff;
>> +
>> + /* Set VMSA entries to the INIT values as documented in the APM */
>> + vmsa->ds.limit = 0xffff;
>> + vmsa->ds.attrib = INIT_DS_ATTRIBS;
>> + vmsa->es = vmsa->ds;
>> + vmsa->fs = vmsa->ds;
>> + vmsa->gs = vmsa->ds;
>> + vmsa->ss = vmsa->ds;
>> +
>> + vmsa->gdtr.limit = 0xffff;
>> + vmsa->ldtr.limit = 0xffff;
>> + vmsa->ldtr.attrib = INIT_LDTR_ATTRIBS;
>> + vmsa->idtr.limit = 0xffff;
>> + vmsa->tr.limit = 0xffff;
>> + vmsa->tr.attrib = INIT_TR_ATTRIBS;
>> +
>> + vmsa->efer = 0x1000; /* Must set SVME bit */
>> + vmsa->cr4 = cr4;
>> + vmsa->cr0 = 0x60000010;
>> + vmsa->dr7 = 0x400;
>> + vmsa->dr6 = 0xffff0ff0;
>> + vmsa->rflags = 0x2;
>> + vmsa->g_pat = 0x0007040600070406ULL;
>> + vmsa->xcr0 = 0x1;
>> + vmsa->mxcsr = 0x1f80;
>> + vmsa->x87_ftw = 0x5555;
>> + vmsa->x87_fcw = 0x0040;
>
> Align them all on a single vertical line pls.

Will do.

>
>> + /*
>> + * Set the SNP-specific fields for this VMSA:
>> + * VMPL level
>> + * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
>> + */
>> + vmsa->vmpl = 0;
>> + vmsa->sev_features = sev_status >> 2;
>> +
>> + /* Switch the page over to a VMSA page now that it is initialized */
>> + ret = snp_set_vmsa(vmsa);
>> + if (ret) {
>> + pr_err("set VMSA page failed (%u)\n", ret);
>> + free_page((unsigned long)vmsa);
>> +
>> + return -EINVAL;
>> + }
>> +
>> + /* Issue VMGEXIT AP Creation NAE event */
>> + local_irq_save(flags);
>> +
>> + ghcb = sev_es_get_ghcb(&state);
>> +
>> + vc_ghcb_invalidate(ghcb);
>> + ghcb_set_rax(ghcb, vmsa->sev_features);
>> + ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
>> + ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
>> + ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
>> +
>> + sev_es_wr_ghcb_msr(__pa(ghcb));
>> + VMGEXIT();
>> +
>> + if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
>> + lower_32_bits(ghcb->save.sw_exit_info_1)) {
>> + pr_alert("SNP AP Creation error\n");
>> + ret = -EINVAL;
>> + }
>> +
>> + sev_es_put_ghcb(&state);
>> +
>> + local_irq_restore(flags);
>> +
>> + /* Perform cleanup if there was an error */
>> + if (ret) {
>> + int err = snp_clear_vmsa(vmsa);
>> +
>
>
> ^ Superfluous newline.

Ok.

>
>> + if (err)
>> + pr_err("clear VMSA page failed (%u), leaking page\n", err);
>> + else
>> + free_page((unsigned long)vmsa);
>> +
>> + vmsa = NULL;
>> + }
>> +
>> + /* Free up any previous VMSA page */
>> + if (cur_vmsa) {
>> + int err = snp_clear_vmsa(cur_vmsa);
>> +
>
>
> ^ Superfluous newline.

Ok.

Thanks,
Tom

>
>> + if (err)
>> + pr_err("clear VMSA page failed (%u), leaking page\n", err);
>> + else
>> + free_page((unsigned long)cur_vmsa);
>> + }
>> +
>> + /* Record the current VMSA page */
>> + cur_vmsa = vmsa;
>> +
>> + return ret;
>> +}
>

2021-06-17 20:38:33

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 06/22] x86/sev: check SEV-SNP features support

Hi Boris,


On 6/7/2021 9:54 AM, Borislav Petkov wrote:
> On Wed, Jun 02, 2021 at 09:04:00AM -0500, Brijesh Singh wrote:
>> static bool early_setup_sev_es(void)
>
> This function is doing SNP init now too, so it should be called
> something generic like
>
> do_early_sev_setup()
>
> or so.
>
>> #define GHCB_SEV_ES_GEN_REQ 0
>> #define GHCB_SEV_ES_PROT_UNSUPPORTED 1
>> +#define GHCB_SEV_ES_SNP_UNSUPPORTED 2
>
> GHCB_SNP_UNSUPPORTED
>
>> +static bool __init sev_snp_check_hypervisor_features(void)
>
> check_hv_features()
>

Based on your feedback on the AP creation patch to not use the accessors, I am
inclined to remove this helper and have the caller directly check the feature
bit. Is that okay?

something like:

if (sev_snp_enabled() && !(hv_features & GHCB_HV_FT_SNP))
sev_es_terminate(GHCB_SNP_UNSUPPORTED);

Let me know if you think I should still keep the accessors.

-Brijesh

> is nice and short.
>
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 77a754365ba9..9b70b7332614 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -609,6 +609,10 @@ static bool __init sev_es_setup_ghcb(void)
>
> Ditto for this one: setup_ghcb()
>
> Thx.
>

2021-06-18 06:20:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 06/22] x86/sev: check SEV-SNP features support

On Thu, Jun 17, 2021 at 01:46:08PM -0500, Brijesh Singh wrote:
> Based on your feedback on AP creation patch to not use the accessors, I am inclined to
> remove this helper and have the caller directly check the feature bit, is that okay ?
>
> something like:
>
> if (sev_snp_enabled() && !(hv_features & GHCB_HV_FT_SNP))
> sev_es_terminate(GHCB_SNP_UNSUPPORTED);
>
> Let me know if you think I should still keep the accessors.

Yeah, looks about right. Let's keep hv_features in a sev-specific
header so that there are no name clashes. Or maybe we should call it
sev_hv_features since it is going to be read-only anyway.
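
For instance, something along these lines (a sketch only; where exactly
the variable and the check end up living depends on how the series shakes
out):

	/* declared once in the SEV code, not writable after init */
	u64 sev_hv_features __ro_after_init;

	/* callers then query it directly instead of going through an accessor */
	if ((sev_hv_features & GHCB_HV_FT_SNP_AP_CREATION) != GHCB_HV_FT_SNP_AP_CREATION)
		return -EOPNOTSUPP;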

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-18 14:00:29

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 21/22] x86/sev: Register SNP guest request platform device



On 6/18/2021 4:46 AM, Borislav Petkov wrote:
> Please split it this way before I take a look.

Ack, I will split it into multiple patches. Thanks.

-Brijesh

2021-06-30 13:36:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 22/22] virt: Add SEV-SNP guest driver

On Wed, Jun 02, 2021 at 09:04:16AM -0500, Brijesh Singh wrote:
> SEV-SNP specification provides the guest a mechanism to communicate with
> the PSP without risk from a malicious hypervisor who wishes to read, alter,
> drop or replay the messages sent. The driver uses snp_issue_guest_request()
> to issue GHCB SNP_GUEST_REQUEST NAE event. This command constructs a
> trusted channel between the guest and the PSP firmware.
>
> The userspace can use the following ioctls provided by the driver:
>
> 1. Request an attestation report that can be used to assume the identity
> and security configuration of the guest.
> 2. Ask the firmware to provide a key derived from a root key.
>
> See SEV-SNP spec section Guest Messages for more details.
>
> Signed-off-by: Brijesh Singh <[email protected]>
> ---
> drivers/virt/Kconfig | 3 +
> drivers/virt/Makefile | 1 +
> drivers/virt/sevguest/Kconfig | 10 +
> drivers/virt/sevguest/Makefile | 4 +
> drivers/virt/sevguest/snp.c | 448 +++++++++++++++++++++++++++++++++
> drivers/virt/sevguest/snp.h | 63 +++++
> include/uapi/linux/sev-guest.h | 56 +++++
> 7 files changed, 585 insertions(+)
> create mode 100644 drivers/virt/sevguest/Kconfig
> create mode 100644 drivers/virt/sevguest/Makefile
> create mode 100644 drivers/virt/sevguest/snp.c
> create mode 100644 drivers/virt/sevguest/snp.h
> create mode 100644 include/uapi/linux/sev-guest.h

Seeing how there are a bunch of such driver things for SEV stuff, I'd
say to put it under:

drivers/virt/coco/

where we can collect all those confidential computing supporting
drivers.

>
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 8061e8ef449f..4de714c5ee9a 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -36,4 +36,7 @@ source "drivers/virt/vboxguest/Kconfig"
> source "drivers/virt/nitro_enclaves/Kconfig"
>
> source "drivers/virt/acrn/Kconfig"
> +
> +source "drivers/virt/sevguest/Kconfig"
> +
> endif
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index 3e272ea60cd9..b2d1a8131c90 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -8,3 +8,4 @@ obj-y += vboxguest/
>
> obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
> obj-$(CONFIG_ACRN_HSM) += acrn/
> +obj-$(CONFIG_SEV_GUEST) += sevguest/
> diff --git a/drivers/virt/sevguest/Kconfig b/drivers/virt/sevguest/Kconfig
> new file mode 100644
> index 000000000000..e88a85527bf6
> --- /dev/null
> +++ b/drivers/virt/sevguest/Kconfig
> @@ -0,0 +1,10 @@
> +config SEV_GUEST
> + tristate "AMD SEV Guest driver"
> + default y
> + depends on AMD_MEM_ENCRYPT
> + help
> + Provides AMD SNP guest request driver. The driver can be used by the

s/Provides AMD SNP guest request driver. //

> + guest to communicate with the hypervisor to request the attestation report

to communicate with the PSP, I thought, not the hypervisor?

> + and more.
> +
> + If you choose 'M' here, this module will be called sevguest.
> diff --git a/drivers/virt/sevguest/Makefile b/drivers/virt/sevguest/Makefile
> new file mode 100644
> index 000000000000..1505df437682
> --- /dev/null
> +++ b/drivers/virt/sevguest/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +sevguest-y := snp.o

What's that for?

Why isn't the filename simply called:

drivers/virt/coco/sevguest.c

?

Or is more coming?

And below there's

.name = "snp-guest",

so you need to get the naming in order here.

> +obj-$(CONFIG_SEV_GUEST) += sevguest.o
> diff --git a/drivers/virt/sevguest/snp.c b/drivers/virt/sevguest/snp.c
> new file mode 100644
> index 000000000000..00d8e8fddf2c
> --- /dev/null
> +++ b/drivers/virt/sevguest/snp.c
> @@ -0,0 +1,448 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AMD Secure Encrypted Virtualization Nested Paging (SEV-SNP) guest request interface
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <[email protected]>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/mutex.h>
> +#include <linux/io.h>
> +#include <linux/platform_device.h>
> +#include <linux/miscdevice.h>
> +#include <linux/set_memory.h>
> +#include <linux/fs.h>
> +#include <crypto/aead.h>
> +#include <linux/scatterlist.h>
> +#include <linux/sev-guest.h>
> +#include <uapi/linux/sev-guest.h>
> +
> +#include "snp.h"
> +
> +#define DEVICE_NAME "sev-guest"
> +#define AAD_LEN 48
> +#define MSG_HDR_VER 1
> +
> +struct snp_guest_crypto {
> + struct crypto_aead *tfm;
> + uint8_t *iv, *authtag;
> + int iv_len, a_len;
> +};
> +
> +struct snp_guest_dev {
> + struct device *dev;
> + struct miscdevice misc;
> +
> + struct snp_guest_crypto *crypto;
> + struct snp_guest_msg *request, *response;
> +};
> +
> +static DEFINE_MUTEX(snp_cmd_mutex);
> +
> +static inline struct snp_guest_dev *to_snp_dev(struct file *file)
> +{
> + struct miscdevice *dev = file->private_data;
> +
> + return container_of(dev, struct snp_guest_dev, misc);
> +}
> +
> +static struct snp_guest_crypto *init_crypto(struct snp_guest_dev *snp_dev, uint8_t *key,
> + size_t keylen)
> +{
> + struct snp_guest_crypto *crypto;
> +
> + crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
> + if (!crypto)
> + return NULL;
> +
> + crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);

I know that it is hard to unselect CONFIG_CRYPTO_AEAD2 which provides
this but you better depend on it in the Makefile so that some random
config still builds.

> + if (IS_ERR(crypto->tfm))
> + goto e_free;
> +
> + if (crypto_aead_setkey(crypto->tfm, key, keylen))
> + goto e_free_crypto;
> +
> + crypto->iv_len = crypto_aead_ivsize(crypto->tfm);
> + if (crypto->iv_len < 12) {
> + dev_err(snp_dev->dev, "IV length is less than 12.\n");
> + goto e_free_crypto;
> + }
> +
> + crypto->iv = kmalloc(crypto->iv_len, GFP_KERNEL_ACCOUNT);
> + if (!crypto->iv)
> + goto e_free_crypto;
> +
> + if (crypto_aead_authsize(crypto->tfm) > MAX_AUTHTAG_LEN) {
> + if (crypto_aead_setauthsize(crypto->tfm, MAX_AUTHTAG_LEN)) {
> + dev_err(snp_dev->dev, "failed to set authsize to %d\n", MAX_AUTHTAG_LEN);
> + goto e_free_crypto;
> + }
> + }
> +
> + crypto->a_len = crypto_aead_authsize(crypto->tfm);
> + crypto->authtag = kmalloc(crypto->a_len, GFP_KERNEL_ACCOUNT);
> + if (!crypto->authtag)
> + goto e_free_crypto;
> +
> + return crypto;
> +
> +e_free_crypto:
> + crypto_free_aead(crypto->tfm);
> +e_free:
> + kfree(crypto->iv);
> + kfree(crypto->authtag);
> + kfree(crypto);
> +
> + return NULL;
> +}

...

> +static int handle_guest_request(struct snp_guest_dev *snp_dev, int msg_type,
> + struct snp_user_guest_request *input, void *req_buf,
> + size_t req_len, void __user *resp_buf, size_t resp_len)
> +{
> + struct snp_guest_crypto *crypto = snp_dev->crypto;
> + struct page *page;
> + size_t msg_len;
> + int ret;
> +
> + /* Allocate the buffer to hold response */
> + resp_len += crypto->a_len;
> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(resp_len));
> + if (!page)
> + return -ENOMEM;
> +
> + ret = __handle_guest_request(snp_dev, msg_type, input, req_buf, req_len,
> + page_address(page), resp_len, &msg_len);

Align arguments on the opening brace.

Check the whole patch too for other similar cases.

> + if (ret)
> + goto e_free;
> +
> + if (copy_to_user(resp_buf, page_address(page), msg_len))
> + ret = -EFAULT;
> +
> +e_free:
> + __free_pages(page, get_order(resp_len));
> +
> + return ret;
> +}
> +
> +static int get_report(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *input)
> +{
> + struct snp_user_report __user *report = (struct snp_user_report *)input->data;
> + struct snp_user_report_req req;
> +
> + if (copy_from_user(&req, &report->req, sizeof(req)))

What guarantees that that __user report thing is valid and is not going
to trick the kernel into doing a NULL pointer access in the ->req access
here?

IOW, you need to verify all your user data being passed through before
using it.

> + return -EFAULT;
> +
> + return handle_guest_request(snp_dev, SNP_MSG_REPORT_REQ, input, &req.user_data,
> + sizeof(req.user_data), report->response, sizeof(report->response));
> +}
> +
> +static int derive_key(struct snp_guest_dev *snp_dev, struct snp_user_guest_request *input)
> +{
> + struct snp_user_derive_key __user *key = (struct snp_user_derive_key *)input->data;
> + struct snp_user_derive_key_req req;
> +
> + if (copy_from_user(&req, &key->req, sizeof(req)))
> + return -EFAULT;
> +
> + return handle_guest_request(snp_dev, SNP_MSG_KEY_REQ, input, &req, sizeof(req),
> + key->response, sizeof(key->response));
> +}
> +
> +static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> +{
> + struct snp_guest_dev *snp_dev = to_snp_dev(file);
> + struct snp_user_guest_request input;
> + void __user *argp = (void __user *)arg;
> + int ret = -ENOTTY;
> +
> + if (copy_from_user(&input, argp, sizeof(input)))
> + return -EFAULT;
> +
> + mutex_lock(&snp_cmd_mutex);
> + switch (ioctl) {
> + case SNP_GET_REPORT: {
> + ret = get_report(snp_dev, &input);
> + break;
> + }
> + case SNP_DERIVE_KEY: {
> + ret = derive_key(snp_dev, &input);
> + break;
> + }
> + default:
> + break;
> + }

If only two ioctls, you don't need the switch-case thing.

> +
> + mutex_unlock(&snp_cmd_mutex);
> +
> + if (copy_to_user(argp, &input, sizeof(input)))
> + return -EFAULT;
> +
> + return ret;
> +}

...

> diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
> new file mode 100644
> index 000000000000..0a8454631605
> --- /dev/null
> +++ b/include/uapi/linux/sev-guest.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +/*
> + * Userspace interface for AMD SEV and SEV-SNP guest driver.
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <[email protected]>
> + *
> + * SEV-SNP API specification is available at: https://developer.amd.com/sev/
> + */
> +
> +#ifndef __UAPI_LINUX_SEV_GUEST_H_
> +#define __UAPI_LINUX_SEV_GUEST_H_
> +
> +#include <linux/types.h>
> +
> +struct snp_user_report_req {
> + __u8 user_data[64];
> +};
> +
> +struct snp_user_report {
> + struct snp_user_report_req req;
> +
> + /* see SEV-SNP spec for the response format */
> + __u8 response[4000];
> +};
> +
> +struct snp_user_derive_key_req {
> + __u8 root_key_select;
> + __u64 guest_field_select;
> + __u32 vmpl;
> + __u32 guest_svn;
> + __u64 tcb_version;
> +};
> +
> +struct snp_user_derive_key {
> + struct snp_user_derive_key_req req;
> +
> + /* see SEV-SNP spec for the response format */
> + __u8 response[64];
> +};
> +
> +struct snp_user_guest_request {
> + /* Message version number (must be non-zero) */
> + __u8 msg_version;
> + __u64 data;
> +
> + /* firmware error code on failure (see psp-sev.h) */
> + __u32 fw_err;
> +};

All those struct names have a "snp_user" prefix. It seems to me that
that "user" is superfluous.

> +
> +#define SNP_GUEST_REQ_IOC_TYPE 'S'
> +#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_user_guest_request)
> +#define SNP_DERIVE_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)

Where are those ioctls documented so that userspace can know how to use
them?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-30 16:28:28

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 22/22] virt: Add SEV-SNP guest driver



On 6/30/2021 8:35 AM, Borislav Petkov wrote:
>
> Seeing how there are a bunch of such driver things for SEV stuff, I'd
> say to put it under:
>
> drivers/virt/coco/
>
> where we can collect all those confidential computing supporting
> drivers.
>
Sounds good to me.

>>
>> + depends on AMD_MEM_ENCRYPT
>> + help
>> + Provides AMD SNP guest request driver. The driver can be used by the
>
> s/Provides AMD SNP guest request driver. //
>
>> + guest to communicate with the hypervisor to request the attestation report
>
> to communicate with the PSP, I thought, not the hypervisor?

Yes, the guest communicates directly with the PSP through the hypervisor. I will fix
the wording.

>
>> + and more.
>> +
>> + If you choose 'M' here, this module will be called sevguest.
>> diff --git a/drivers/virt/sevguest/Makefile b/drivers/virt/sevguest/Makefile
>> new file mode 100644
>> index 000000000000..1505df437682
>> --- /dev/null
>> +++ b/drivers/virt/sevguest/Makefile
>> @@ -0,0 +1,4 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +sevguest-y := snp.o
>
> What's that for?
>
> Why isn't the filename simply called:
>
> drivers/virt/coco/sevguest.c
>
> ?
>
> Or is more coming?
>
> And below there's
>
> .name = "snp-guest",
>
> so you need to get the naming in order here.
>

As you have noticed, Dov is submitting the SEV-specific driver. I was thinking
it would be nice if we had one driver that covers both SEV and SEV-SNP. That
driver can be called "sevguest". The kernel will install the appropriate
platform device. The sevguest driver can probe for both the "sev-guest" and
"snp-guest" devices and delegate the ioctl handling accordingly.

In the kernel the directory structure may look like this:

virt/coco/sevguest
sevguest.c // common code
snp.c // SNP specific ioctl implementation
sev.c // SEV specific ioctl or sysfs implementation

Thoughts ?

>> + struct snp_guest_crypto *crypto;
>> +
>> + crypto = kzalloc(sizeof(*crypto), GFP_KERNEL_ACCOUNT);
>> + if (!crypto)
>> + return NULL;
>> +
>> + crypto->tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
>
> I know that it is hard to unselect CONFIG_CRYPTO_AEAD2 which provides
> this but you better depend on it in the Makefile so that some random
> config still builds.
>

Noted.

>> + if (IS_ERR(crypto->tfm))
>> + goto e_free;
>> +
>> + if (crypto_aead_setkey(crypto->tfm, key, keylen))
>> +
>> + ret = __handle_guest_request(snp_dev, msg_type, input, req_buf, req_len,
>> + page_address(page), resp_len, &msg_len);
>
> Align arguments on the opening brace.
>
> Check the whole patch too for other similar cases.

Noted.

>
>> + struct snp_user_report __user *report = (struct snp_user_report *)input->data;
>> + struct snp_user_report_req req;
>> +
>> + if (copy_from_user(&req, &report->req, sizeof(req)))
>
> What guarantees that that __user report thing is valid and is not going
> to trick the kernel into doing a NULL pointer access in the ->req access
> here?
>
> IOW, you need to verify all your user data being passed through before
> using it.

Let me go through it and make sure that we don't get into a NULL
dereference situation.

>
>> + case SNP_GET_REPORT: {
>> + ret = get_report(snp_dev, &input);
>> + break;
>> + }
>> + case SNP_DERIVE_KEY: {
>> + ret = derive_key(snp_dev, &input);
>> + break;
>> + }
>> + default:
>> + break;
>> + }
>
> If only two ioctls, you don't need the switch-case thing.
>

I am working to add support for the "extended guest request", which will make it 3 ioctls.

>> +
>> +struct snp_user_guest_request {
>> + /* Message version number (must be non-zero) */
>> + __u8 msg_version;
>> + __u64 data;
>> +
>> + /* firmware error code on failure (see psp-sev.h) */
>> + __u32 fw_err;
>> +};
>
> All those struct names have a "snp_user" prefix. It seems to me that
> that "user" is superfluous.
>

I followed the naming convention you recommended during the initial SEV driver
development. IIRC, the main reason for adding "user" to it was that we wanted
to make clear that this structure is not exactly the same as what is defined
in the SEV-SNP firmware spec.


>> +
>> +#define SNP_GUEST_REQ_IOC_TYPE 'S'
>> +#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_user_guest_request)
>> +#define SNP_DERIVE_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_user_guest_request)
>
> Where are those ioctls documented so that userspace can know how to use
> them?

Good question, I am not able to find a generic place to document it. Should we
create "Documentation/virt/coco/sevguest-api.rst" for it? I am open to other
suggestions.

-Brijesh

2021-07-01 18:05:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 22/22] virt: Add SEV-SNP guest driver

On Wed, Jun 30, 2021 at 11:26:46AM -0500, Brijesh Singh wrote:
> As you have noticed that Dov is submitting the SEV specific driver.

Well, reportedly that driver is generic-ish as it only handles the
EFI-provided sekrits and is not SEV-specific - the SEV use is only
exemplary.

> I was thinking that it will be nice if we have one driver that covers
> both the SEV and SEV-SNP. That driver can be called "sevguest". The
> kernel will install the appropriate platform device. The sevguest
> driver can probe for both the "sev-guest" and "snp-guest" and delegate
> the ioctl handling accordingly.
>
> In the kernel the directory structure may look like this:
>
> virt/coco/sevguest
> sevguest.c // common code
> snp.c // SNP specific ioctl implementation
> sev.c // SEV specific ioctl or sysfs implementation
>
> Thoughts ?

Sure, but I'd call it sevguest.c and will have it deal with both SEV and
SNP ioctls depending on what has been detected in the hardware. Or is
there some special reason for having snp.c and sev.c separate?

> I followed the naming convension you recommended during the initial SEV driver
> developement. IIRC, the main reason for us having to add "user" in it because
> we wanted to distinguious that this structure is not exactly same as the what
> is defined in the SEV-SNP firmware spec.

I most definitely have forgotten about this. Can you point me to the
details of that discussion and why there's a need to distinguish?

> Good question, I am not able to find a generic place to document it. Should we
> create a document, "Documentation/virt/coco/sevguest-api.rst", for it? I am
> open to other suggestions.

Well, grepping the tree for "ioctl" I see:

Documentation/driver-api/ioctl.rst
Documentation/process/botching-up-ioctls.rst
Documentation/userspace-api/ioctl/cdrom.rst
Documentation/userspace-api/ioctl/hdio.rst
Documentation/userspace-api/ioctl/index.rst
Documentation/userspace-api/ioctl/ioctl-decoding.rst
Documentation/userspace-api/ioctl/ioctl-number.rst
Documentation/userspace-api/media/cec/cec-func-ioctl.rst
Documentation/userspace-api/media/mediactl/media-func-ioctl.rst
Documentation/userspace-api/media/mediactl/request-func-ioctl.rst
Documentation/userspace-api/media/v4l/func-ioctl.rst

and there's some good info as to what to do.

In any case, Documentation/virt/coco/sevguest-api.rst doesn't sound too
bad either, actually, as it collects everything under virt/

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-07-01 21:33:59

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 22/22] virt: Add SEV-SNP guest driver



On 7/1/2021 1:03 PM, Borislav Petkov wrote:
>
> Sure, but I'd call it sevguest.c and will have it deal with both SEV and
> SNP ioctls depending on what has been detected in the hardware. Or is
> there some special reason for having snp.c and sev.c separate?
>

I don't have any strong reason. I am okay with putting all the SNP
stuff in sevguest.c to begin with.
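
To illustrate, binding a single sevguest driver to whichever platform
device the kernel registered could look roughly like the sketch below
(illustrative only, not the actual patch; the device names are from my
earlier mail, everything else is made up for the example):

#include <linux/module.h>
#include <linux/platform_device.h>

static int sevguest_probe(struct platform_device *pdev)
{
	/* pdev->name is either "sev-guest" or "snp-guest"; remember which
	 * and register a single misc device whose ioctl handler services
	 * only the commands valid for that flavor. */
	return 0;
}

static const struct platform_device_id sevguest_ids[] = {
	{ .name = "sev-guest" },
	{ .name = "snp-guest" },
	{ }
};
MODULE_DEVICE_TABLE(platform, sevguest_ids);

static struct platform_driver sevguest_driver = {
	.probe		= sevguest_probe,
	.id_table	= sevguest_ids,
	.driver		= {
		.name	= "sevguest",
	},
};
module_platform_driver(sevguest_driver);

MODULE_LICENSE("GPL");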


>> I followed the naming convention you recommended during the initial SEV driver
>> development. IIRC, the main reason for us having to add "user" in it is that we
>> wanted to make it clear that this structure is not exactly the same as what is
>> defined in the SEV-SNP firmware spec.
>
> I most definitely have forgotten about this. Can you point me to the
> details of that discussion and why there's a need to distinguish?
>
>> Good question, I am not able to find a generic place to document it. Should we
>> create a document, "Documentation/virt/coco/sevguest-api.rst", for it? I am
>> open to other suggestions.
>

The spec definitions are present in include/linux/psp-sev.h, but sometimes we
don't expose the spec defs as-is to userspace. Several of the SEV/SEV-SNP
definitions do not need to be exposed to userspace at all; for those which do,
we provide a slightly modified Linux uapi, and for the SEV drivers we chose the
"_user" prefix.

e.g., a spec definition for the PEK import in include/linux/psp-sev.h is:

struct sev_data_pek_cert_import {
	u64 pdh_cert_address;		/* system physical address */
	u32 pdh_cert_len;
	u32 reserved;
	...
};

But its corresponding userspace structure def in include/uapi/linux/psp-sev.h is:

struct sev_user_data_pek_cert_import {
	__u64 pek_cert_uaddr;		/* userspace address */
	__u32 pek_cert_len;
	...
};

The ioctl handling takes care of mapping from uaddr to pa and other things as
required. So, I took a similar approach for the SEV-SNP guest ioctl. In this
particular case the guest request structure defined in the spec contains
multiple fields, but many of those fields are managed internally by the kernel
(e.g. seqno, IV, etc.).
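
As a rough sketch of that pattern (kernel-side pseudo-ish code, not the
actual SEV driver; the struct copies are simplified per the example
above and the firmware-command helper at the end is a labeled stand-in):

#include <linux/err.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>
#include <asm/page.h>

/* Simplified local copies of the two definitions shown above; the real
 * ones live in include/linux/psp-sev.h and include/uapi/linux/psp-sev.h. */
struct sev_data_pek_cert_import {
	u64 pdh_cert_address;	/* system physical address */
	u32 pdh_cert_len;
	u32 reserved;
};

struct sev_user_data_pek_cert_import {
	__u64 pek_cert_uaddr;	/* userspace address */
	__u32 pek_cert_len;
};

/* Stand-in for the firmware-command helper in the SEV driver. */
int sev_issue_pek_cert_import(struct sev_data_pek_cert_import *data);

static int sev_ioctl_pek_cert_import(void __user *argp)
{
	struct sev_user_data_pek_cert_import input;	/* uapi view */
	struct sev_data_pek_cert_import data = {};	/* spec view */
	void *pek_cert;
	int ret;

	if (copy_from_user(&input, argp, sizeof(input)))
		return -EFAULT;

	/* Pull the certificate blob out of userspace into kernel memory ... */
	pek_cert = memdup_user((void __user *)input.pek_cert_uaddr,
			       input.pek_cert_len);
	if (IS_ERR(pek_cert))
		return PTR_ERR(pek_cert);

	/* ... and give the firmware the physical address the spec expects
	 * (the real driver wraps this as __psp_pa()). */
	data.pdh_cert_address = __pa(pek_cert);
	data.pdh_cert_len = input.pek_cert_len;

	ret = sev_issue_pek_cert_import(&data);

	kfree(pek_cert);
	return ret;
}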

-Brijesh

2021-07-03 16:21:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 22/22] virt: Add SEV-SNP guest driver

On Thu, Jul 01, 2021 at 04:32:25PM -0500, Brijesh Singh wrote:
> The spec definitions are present in include/linux/psp-sev.h, but sometimes
> we don't expose the spec defs as-is to userspace.

Why?

Having such undocumented and maybe unwarranted differences - I still
don't see a clear reason why - is calling for additional and unnecessary
confusion.

> Several of the SEV/SEV-SNP definitions do not need to be exposed to
> userspace at all; for those which do, we provide a slightly modified Linux
> uapi, and for the SEV drivers we chose the "_user" prefix.

Is that documented somewhere?

Because "user" doesn't tell me it is a modified structure which is
different from the spec.

> e.g., a spec definition for the PEK import in include/linux/psp-sev.h is:
>
> struct sev_data_pek_cert_import {
>	u64 pdh_cert_address;		/* system physical address */
>	u32 pdh_cert_len;
>	u32 reserved;
>	...
> };
>
> But its corresponding userspace structure def in include/uapi/linux/psp-sev.h is:
>
> struct sev_user_data_pek_cert_import {
>	__u64 pek_cert_uaddr;		/* userspace address */
>	__u32 pek_cert_len;
>	...
> };

And the difference is a single "u32 reserved"?

Dunno, from where I'm standing this looks like unnecessary confusion to
me.

> The ioctl handling takes care of mapping from uaddr to pa and other things
> as required. So, I took a similar approach for the SEV-SNP guest ioctl. In
> this particular case the guest request structure defined in the spec
> contains multiple fields, but many of those fields are managed internally
> by the kernel (e.g. seqno, IV, etc.).

Ok, multiple fields sounds like you wanna save on the data that is
shovelled between kernel and user space and then some of the fields
don't mean a thing for the user API. Ok.

But again, where is this documented and stated clear so that people are
aware?

Or are you assuming that since the user counterparts are in

include/uapi/linux/psp-sev.h
^^^^

and it being an uapi header, then that should state that?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-07-05 10:39:54

by Brijesh Singh

[permalink] [raw]
Subject: Re: [PATCH Part1 RFC v3 22/22] virt: Add SEV-SNP guest driver


On 7/3/21 11:19 AM, Borislav Petkov wrote:
> On Thu, Jul 01, 2021 at 04:32:25PM -0500, Brijesh Singh wrote:
>> The spec definitions are present in include/linux/psp-sev.h, but sometimes
>> we don't expose the spec defs as-is to userspace.
> Why?
>
> Having such undocumented and maybe unwarranted differences - I still
> don't see a clear reason why - is calling for additional and unnecessary
> confusion.

Because some of the fields don't make any sense for the userspace interface.


>
>> Several of the SEV/SEV-SNP definitions do not need to be exposed to
>> userspace at all; for those which do, we provide a slightly modified Linux
>> uapi, and for the SEV drivers we chose the "_user" prefix.
> Is that documented somewhere?
>
> Because "user" doesn't tell me it is a modified structure which is
> different from the spec.

We have good documentation for the SEV ioctls and structures provided
through the KVM interface.

Unfortunately, the documentation for the ioctls and structures provided
through /dev/sev does not exist. We could look into adding that
documentation outside this series. The structures provided through
/dev/sev are identical to the structures documented in the spec, with
minor changes such as not exposing reserved fields or renaming paddr to
uaddr, etc.


>> e.g., a spec definition for the PEK import in include/linux/psp-sev.h is:
>>
>> struct sev_data_pek_cert_import {
>>	u64 pdh_cert_address;		/* system physical address */
>>	u32 pdh_cert_len;
>>	u32 reserved;
>>	...
>> };
>>
>> But its corresponding userspace structure def in include/uapi/linux/psp-sev.h is:
>>
>> struct sev_user_data_pek_cert_import {
>>	__u64 pek_cert_uaddr;		/* userspace address */
>>	__u32 pek_cert_len;
>>	...
>> };
> And the difference is a single "u32 reserved"?

Mostly yes.

>
> Dunno, from where I'm standing this looks like unnecessary confusion to
> me.
>
>> The ioctl handling takes care of mapping from uaddr to pa and other things
>> as required. So, I took a similar approach for the SEV-SNP guest ioctl. In
>> this particular case the guest request structure defined in the spec
>> contains multiple fields, but many of those fields are managed internally
>> by the kernel (e.g. seqno, IV, etc.).
> Ok, multiple fields sounds like you wanna save on the data that is
> shovelled between kernel and user space and then some of the fields
> don't mean a thing for the user API. Ok.
>
> But again, where is this documented and stated clear so that people are
> aware?
>
> Or are you assuming that since the user counterparts are in
>
> include/uapi/linux/psp-sev.h
> ^^^^
>
> and it being an uapi header, then that should state that?
>
Yes, the assumption is that a user wanting to communicate with the PSP
through /dev/sev will need to include psp-sev.h from
uapi/linux/psp-sev.h. The header file itself documents the field
definitions, and the user then needs to refer to the SEV spec for
further details. I could start documenting the SNP-specific ioctls in
Documentation/virt/coco/sevguest.rst, and it can later be expanded to
cover SEV and SEV-ES.
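
For example, talking to /dev/sev from userspace today looks roughly like
the minimal sketch below (illustrative only, error handling trimmed; it
just queries the platform status through the existing uapi header):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/psp-sev.h>	/* the uapi header discussed above */

int main(void)
{
	struct sev_user_data_status status;
	struct sev_issue_cmd arg;
	int fd, ret;

	fd = open("/dev/sev", O_RDWR | O_CLOEXEC);
	if (fd < 0)
		return 1;

	memset(&arg, 0, sizeof(arg));
	arg.cmd = SEV_PLATFORM_STATUS;
	arg.data = (__u64)(unsigned long)&status;

	ret = ioctl(fd, SEV_ISSUE_CMD, &arg);
	if (ret)
		fprintf(stderr, "SEV_PLATFORM_STATUS failed: fw error %u\n",
			arg.error);
	else
		printf("SEV API version %d.%d, platform state %d\n",
		       status.api_major, status.api_minor, status.state);

	close(fd);
	return ret ? 1 : 0;
}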

-Brijesh