2022-02-09 19:02:54

by Brijesh Singh

Subject: [PATCH v10 00/45] Add AMD Secure Nested Paging (SEV-SNP) Guest Support

This part of the Secure Nested Paging (SEV-SNP) series focuses on the changes
required in a guest OS for SEV-SNP support.

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by SEV-SNP,
such as interrupt protection.

Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to an SEV-SNP
VM requires a two-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to the
guest-invalid state. Second, the guest validates the page using the new
PVALIDATE instruction. SEV-SNP VMs can use the new "Page State Change" NAE
event defined in the GHCB specification to ask the hypervisor to add or
remove pages from the RMP table.
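
For reference, below is a minimal sketch of a PVALIDATE wrapper along the
lines of the helper this series adds. The instruction is emitted as raw
opcode bytes since older assemblers lack the mnemonic; CC_SET()/CC_OUT()
are the kernel's condition-code helpers from <asm/asm.h>, and
PVALIDATE_FAIL_NOUPDATE is a Linux-defined return code for the carry-flag
case:

  static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
  {
          bool no_rmpupdate;
          int rc;

          /* PVALIDATE opcode bytes; RAX=vaddr, ECX=page size, EDX=validate */
          asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
                       CC_SET(c)
                       : CC_OUT(c) (no_rmpupdate), "=a"(rc)
                       : "a"(vaddr), "c"(rmp_psize), "d"(validate)
                       : "memory", "cc");

          /* rFLAGS.CF set means the RMP entry was left unchanged */
          if (no_rmpupdate)
                  return PVALIDATE_FAIL_NOUPDATE;

          return rc;
  }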

Each page assigned to an SEV-SNP VM can be either validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for page validation: pre-validation and
lazy validation.

Under pre-validation, pages are validated prior to first use. Under lazy
validation, pages are validated when first accessed. An access to an
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the guest OS.

At this time, only pre-validation is supported: the OVMF guest BIOS validates
the entire RAM before control is handed over to the guest kernel. The
early_set_memory_{encrypted,decrypted}() and set_memory_{encrypted,decrypted}()
helpers are enlightened to perform the page validation or invalidation while
setting or clearing the encryption attribute in the page tables.

This series does not yet provide support for interrupt security; that will
be added on top of this base support.

The series is based on tip/master
63812a9c80a Merge x86/cpu into tip/master

The complete branch is at https://github.com/AMDESE/linux/tree/sev-snp-v10

Patches 1-4 define multiple VMSA save areas to support SEV, SEV-ES and SEV-SNP
guests. They are a pre-requisite for the SEV-SNP guest support and are
included in the series for completeness. These patches apply cleanly to
kvm/next and are also posted on the KVM mailing list:
https://lore.kernel.org/lkml/[email protected]/T/#m7d6868f3e81624323ea933d3a63a68949b286103

Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf
(section 15.36)

GHCB spec:
https://developer.amd.com/wp-content/resources/56421.pdf

SEV-SNP firmware specification:
https://developer.amd.com/sev/

v9: https://lore.kernel.org/linux-mm/[email protected]
v8: https://lore.kernel.org/lkml/[email protected]/
v7: https://lore.kernel.org/linux-mm/[email protected]/
v6: https://lore.kernel.org/linux-mm/[email protected]/
v5: https://lore.kernel.org/lkml/[email protected]/

Changes since v9:
* Removed unnecessary checks on CPUID table contents, added a kernel param to dump the CPUID table during boot
* Added boot_{rd,wr}msr() helpers
* Renamed/refactored SNP CPUID code/definitions for clarity/consistency
* Re-worked comments for clarity and to avoid redundancies
* Moved SNP CPUID table documentation to Documentation/virt/coco/sevguest.rst
* Documented cc_blob_address/acpi_rsdp_addr in zero-page.rst

Changes since v8:
* Setup the GHCB before taking the first #VC.
* Make the CC blob structure size invariant.
* Define the AP INIT macros and update the AP creation to use those macros
instead of the hardcoded values.
* Expand the comments to cover some of the previous feedback.
* Fix the commit messages based on the feedback.
* Multiple fixes/cleanup on cpuid patches (based on Boris and Dave feedback)
* drop is_efi64 return arguments in favor of a separate efi_get_type() helper.
* drop is_efi64 input arguments in favor of calling efi_get_type() as-needed.
* move acpi.c's kexec-specific handling into library code.
* fix stack protection for 32/64-bit builds.
* Export add_identity_map() to avoid SEV-specific code in ident_map_64.c.
* use snp_abort() when terminating via initial ccblob scan.
* fix the copyright header after the code refactor.
* remove code duplication wherever possible.

Changes since v7:
* sevguest: extend the get report structure to accept the vmpl from userspace.
* In the compressed path, move the GHCB protocol negotiation from VC handler
to sev_enable().
* sev_enable(): don't expect SEV bit in status MSR when cpuid bit is present, update comments.
* sme_enable(): call directly from head_64.S rather than as part of startup_64_setup_env, add comments
* snp_find_cc_blob(), sev_prep_identity_maps(): add missing 'static' keywords to function prototypes

Changes since v6:
* Add rmpadjust() helper to be used by AP creation and vmpl0 detect function.
* Clear the VM communication key if the guest detects that the hypervisor is
modifying the SNP_GUEST_REQ response header.
* Move the per-cpu GHCB registration from first #VC to idt setup.
* Consolidate initial SEV/SME setup into a common entry point that gets called
early enough to also be used for SEV-SNP CPUID table setup.
* SNP CPUID: separate initial SEV-SNP feature detection out into standalone
snp_init() routines, then add CPUID table setup to it as a separate patch.
* SNP CPUID: fix boot issue with Seabios due to ACPI relying on certain EFI
config table lookup failures as fallthrough cases rather than error cases.
* SNP CPUID: drop the use of a separate init routines to handle pointer fixups
after switching to kernel virtual addresses, instead use a helper that uses
RIP-relative addressing to access CPUID table when either on identity mapping
or kernel virtual addresses.

Changes since v5:
* move the seqno allocation into the sevguest driver.
* extend snp_issue_guest_request() to accept the exit_info to simplify the logic.
* use smaller structure names based on feedback.
* explicitly clear the memory after the SNP guest request is completed.
* cpuid validation: use a local copy of cpuid table instead of keeping
firmware table mapped throughout boot.
* cpuid validation: coding style fix-ups and refactor cpuid-related helpers
as suggested.
* cpuid validation: drop a number of BOOT_COMPRESSED-guarded defs/declarations
by moving things like snp_cpuid_init*() out of sev-shared.c and keeping only
the common bits there.
* Break up EFI config table helpers and related acpi.c changes into separate
patches.
* re-enable stack protection for 32-bit kernels as well, not just 64-bit

Changes since v4:
* Address the cpuid specific review comment
* Simplified the macro based on the review feedback
* Move macro definition to the patch that needs it
* Fix the issues reported by checkpatch
* Address the AP creation specific review comment

Changes since v3:
* Add support to use the PSP filtered CPUID.
* Add support for the extended guest request.
* Move sevguest driver in driver/virt/coco.
* Add documentation for sevguest ioctl.
* Add support to check the vmpl0.
* Pass the VM encryption key and id to be used for encrypting guest messages
through the platform drv data.
* Multiple cleanups and fixes to address the review feedback.

Changes since v2:
* Add support for AP startup using SNP specific vmgexit.
* Add snp_prep_memory() helper.
* Drop sev_snp_active() helper.
* Add sev_feature_enabled() helper to check which SEV feature is active.
* Sync the SNP guest message request header with latest SNP FW spec.
* Multiple cleanups and fixes to address the review feedback.

Changes since v1:
* Integrate the SNP support in sev.{ch}.
* Add support to query the hypervisor feature and detect whether SNP is supported.
* Define Linux specific reason code for the SNP guest termination.
* Extend the setup_header to provide a way for the hypervisor to pass the secrets and CPUID pages.
* Add support to create a platform device and driver to query the attestation
report and derive a key.
* Multiple cleanups and fixes to address Boris's review feedback.

Brijesh Singh (20):
KVM: SVM: Define sev_features and vmpl field in the VMSA
x86/mm: Extend cc_attr to include AMD SEV-SNP
x86/sev: Define the Linux specific guest termination reasons
x86/sev: Save the negotiated GHCB version
x86/sev: Check SEV-SNP features support
x86/sev: Add a helper for the PVALIDATE instruction
x86/sev: Check the vmpl level
x86/compressed: Add helper for validating pages in the decompression
stage
x86/compressed: Register GHCB memory when SEV-SNP is active
x86/sev: Register GHCB memory when SEV-SNP is active
x86/sev: Add helper for validating pages in early enc attribute
changes
x86/kernel: Make the .bss..decrypted section shared in RMP table
x86/kernel: Validate ROM memory before accessing when SEV-SNP is
active
x86/mm: Add support to validate memory when changing C-bit
x86/boot: Add Confidential Computing type to setup_data
x86/sev: Provide support for SNP guest request NAEs
x86/sev: Register SEV-SNP guest request platform device
virt: Add SEV-SNP guest driver
virt: sevguest: Add support to derive key
virt: sevguest: Add support to get extended report

Michael Roth (21):
x86/boot: Introduce helpers for MSR reads/writes
x86/boot: Use MSR read/write helpers instead of inline assembly
x86/compressed/64: Detect/setup SEV/SME features earlier in boot
x86/sev: Detect/setup SEV/SME features earlier in boot
x86/head/64: Re-enable stack protection
x86/compressed/acpi: Move EFI detection to helper
x86/compressed/acpi: Move EFI system table lookup to helper
x86/compressed/acpi: Move EFI config table lookup to helper
x86/compressed/acpi: Move EFI vendor table lookup to helper
x86/compressed/acpi: Move EFI kexec handling into common code
KVM: x86: Move lookup of indexed CPUID leafs to helper
x86/sev: Move MSR-based VMGEXITs for CPUID to helper
x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers
x86/boot: Add a pointer to Confidential Computing blob in bootparams
x86/compressed: Add SEV-SNP feature detection/setup
x86/compressed: Use firmware-validated CPUID leaves for SEV-SNP guests
x86/compressed: Export and rename add_identity_map()
x86/compressed/64: Add identity mapping for Confidential Computing
blob
x86/sev: Add SEV-SNP feature detection/setup
x86/sev: Use firmware-validated CPUID for SEV-SNP guests
virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement

Tom Lendacky (4):
KVM: SVM: Create a separate mapping for the SEV-ES save area
KVM: SVM: Create a separate mapping for the GHCB save area
KVM: SVM: Update the SEV-ES save area mapping
x86/sev: Use SEV-SNP AP creation to start secondary CPUs

.../admin-guide/kernel-parameters.txt | 4 +
Documentation/virt/coco/sevguest.rst | 155 ++++
Documentation/virt/index.rst | 1 +
Documentation/x86/zero-page.rst | 2 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/acpi.c | 173 +---
arch/x86/boot/compressed/efi.c | 238 ++++++
arch/x86/boot/compressed/head_64.S | 37 +-
arch/x86/boot/compressed/ident_map_64.c | 39 +-
arch/x86/boot/compressed/idt_64.c | 18 +-
arch/x86/boot/compressed/mem_encrypt.S | 36 -
arch/x86/boot/compressed/misc.h | 55 +-
arch/x86/boot/compressed/sev.c | 263 +++++-
arch/x86/boot/cpucheck.c | 30 +-
arch/x86/boot/msr.h | 28 +
arch/x86/include/asm/bootparam_utils.h | 1 +
arch/x86/include/asm/cpuid.h | 34 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/msr.h | 11 +-
arch/x86/include/asm/setup.h | 1 -
arch/x86/include/asm/sev-common.h | 82 ++
arch/x86/include/asm/sev.h | 102 ++-
arch/x86/include/asm/shared/msr.h | 15 +
arch/x86/include/asm/svm.h | 171 +++-
arch/x86/include/uapi/asm/bootparam.h | 4 +-
arch/x86/include/uapi/asm/svm.h | 13 +
arch/x86/kernel/Makefile | 1 -
arch/x86/kernel/cc_platform.c | 2 +
arch/x86/kernel/cpu/common.c | 4 +
arch/x86/kernel/head64.c | 29 +-
arch/x86/kernel/head_64.S | 37 +-
arch/x86/kernel/probe_roms.c | 13 +-
arch/x86/kernel/sev-shared.c | 529 +++++++++++-
arch/x86/kernel/sev.c | 802 +++++++++++++++++-
arch/x86/kernel/smpboot.c | 3 +
arch/x86/kvm/cpuid.c | 19 +-
arch/x86/kvm/svm/sev.c | 24 +-
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/mm/mem_encrypt.c | 4 +
arch/x86/mm/mem_encrypt_amd.c | 58 +-
arch/x86/mm/mem_encrypt_identity.c | 8 +
arch/x86/mm/pat/set_memory.c | 15 +
drivers/virt/Kconfig | 3 +
drivers/virt/Makefile | 1 +
drivers/virt/coco/sevguest/Kconfig | 12 +
drivers/virt/coco/sevguest/Makefile | 2 +
drivers/virt/coco/sevguest/sevguest.c | 736 ++++++++++++++++
drivers/virt/coco/sevguest/sevguest.h | 98 +++
include/linux/cc_platform.h | 8 +
include/linux/efi.h | 1 +
include/uapi/linux/sev-guest.h | 80 ++
52 files changed, 3639 insertions(+), 372 deletions(-)
create mode 100644 Documentation/virt/coco/sevguest.rst
create mode 100644 arch/x86/boot/compressed/efi.c
create mode 100644 arch/x86/boot/msr.h
create mode 100644 arch/x86/include/asm/cpuid.h
create mode 100644 arch/x86/include/asm/shared/msr.h
create mode 100644 drivers/virt/coco/sevguest/Kconfig
create mode 100644 drivers/virt/coco/sevguest/Makefile
create mode 100644 drivers/virt/coco/sevguest/sevguest.c
create mode 100644 drivers/virt/coco/sevguest/sevguest.h
create mode 100644 include/uapi/linux/sev-guest.h

--
2.25.1



2022-02-09 19:05:50

by Brijesh Singh

Subject: [PATCH v10 19/45] x86/kernel: Make the .bss..decrypted section shared in RMP table

The encryption attribute for the .bss..decrypted section is cleared in the
initial page table build. This is because the section contains data that
needs to be shared between the guest and the hypervisor.

When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. The page state also needs to be updated in the
RMP table.

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/kernel/head64.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8075e91cff2b..4a24b121a2ba 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -143,7 +143,20 @@ static unsigned long sme_postprocess_startup(struct boot_params *bp, pmdval_t *p
if (sme_get_me_mask()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
+
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ /*
+ * On SNP, transition the page to shared in the RMP table so that
+ * it is consistent with the page table attribute change.
+ *
+ * __start_bss_decrypted has a virtual address in the high range
+ * mapping (kernel .text). PVALIDATE, by way of
+ * early_snp_set_memory_shared(), requires a valid virtual
+ * address but the kernel is currently running off of the identity
+ * mapping so use __pa() to get a *currently* valid virtual address.
+ */
+ early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+
i = pmd_index(vaddr);
pmd[i] -= sme_get_me_mask();
}
--
2.25.1


2022-02-09 19:11:01

by Brijesh Singh

Subject: [PATCH v10 33/45] x86/boot: Add a pointer to Confidential Computing blob in bootparams

From: Michael Roth <[email protected]>

The previously defined Confidential Computing blob is provided to the
kernel via a setup_data structure or EFI config table entry. Currently
these are both checked for by the boot/compressed kernel to access the
CPUID table address within it for use with SEV-SNP CPUID enforcement.

To also enable SEV-SNP CPUID enforcement for the run-time kernel,
similar early access to the CPUID table is needed while the kernel is
still using the identity-mapped page table set up by boot/compressed,
where global pointers need to be accessed via fixup_pointer().

This isn't much of an issue for accessing setup_data, and the EFI
config table helper code currently used in boot/compressed *could* be
used in this case as well, since both rely on identity-mapping.
However, it has some reliance on EFI helpers/string constants that
would need to be accessed via fixup_pointer(), and fixing it up while
making it shareable between the boot/compressed and run-time kernels is
fragile and introduces a good bit of ugliness.

Instead, add a boot_params->cc_blob_address pointer that the
boot/compressed kernel can initialize so that the run-time kernel can
access the CC blob from there instead of re-scanning the EFI config
table.

Also document these in Documentation/x86/zero-page.rst. While there,
add missing documentation for the acpi_rsdp_addr field, which serves a
similar purpose in providing the run-time kernel a pointer to the ACPI
RSDP table so that it does not need to [re-]scan the EFI configuration
table.

Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/x86/zero-page.rst | 2 ++
arch/x86/include/asm/bootparam_utils.h | 1 +
arch/x86/include/uapi/asm/bootparam.h | 3 ++-
3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst
index f088f5881666..45aa9cceb4f1 100644
--- a/Documentation/x86/zero-page.rst
+++ b/Documentation/x86/zero-page.rst
@@ -19,6 +19,7 @@ Offset/Size Proto Name Meaning
058/008 ALL tboot_addr Physical address of tboot shared page
060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information
(struct ist_info)
+070/008 ALL acpi_rsdp_addr Physical address of ACPI RSDP table
080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!!
090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!!
0A0/010 ALL sys_desc_table System description table (struct sys_desc_table),
@@ -27,6 +28,7 @@ Offset/Size Proto Name Meaning
0C0/004 ALL ext_ramdisk_image ramdisk_image high 32bits
0C4/004 ALL ext_ramdisk_size ramdisk_size high 32bits
0C8/004 ALL ext_cmd_line_ptr cmd_line_ptr high 32bits
+13C/004 ALL cc_blob_address Physical address of Confidential Computing blob
140/080 ALL edid_info Video mode setup (struct edid_info)
1C0/020 ALL efi_info EFI 32 information (struct efi_info)
1E0/004 ALL alt_mem_k Alternative mem check, in KB
diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
index 981fe923a59f..53e9b0620d96 100644
--- a/arch/x86/include/asm/bootparam_utils.h
+++ b/arch/x86/include/asm/bootparam_utils.h
@@ -74,6 +74,7 @@ static void sanitize_boot_params(struct boot_params *boot_params)
BOOT_PARAM_PRESERVE(hdr),
BOOT_PARAM_PRESERVE(e820_table),
BOOT_PARAM_PRESERVE(eddbuf),
+ BOOT_PARAM_PRESERVE(cc_blob_address),
};

memset(&scratch, 0, sizeof(scratch));
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 1ac5acca72ce..bea5cdcdf532 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -188,7 +188,8 @@ struct boot_params {
__u32 ext_ramdisk_image; /* 0x0c0 */
__u32 ext_ramdisk_size; /* 0x0c4 */
__u32 ext_cmd_line_ptr; /* 0x0c8 */
- __u8 _pad4[116]; /* 0x0cc */
+ __u8 _pad4[112]; /* 0x0cc */
+ __u32 cc_blob_address; /* 0x13c */
struct edid_info edid_info; /* 0x140 */
struct efi_info efi_info; /* 0x1c0 */
__u32 alt_mem_k; /* 0x1e0 */
--
2.25.1


2022-02-09 19:11:14

by Brijesh Singh

Subject: [PATCH v10 04/45] KVM: SVM: Update the SEV-ES save area mapping

From: Tom Lendacky <[email protected]>

This is the final step in defining the multiple save areas, keeping them
separate and ensuring proper operation amongst the different types of
guests. Update the SEV-ES/SEV-SNP save area to match the APM. This save
area will be used for the upcoming SEV-SNP AP Creation NAE event support.

Reviewed-by: Venu Busireddy <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 66 +++++++++++++++++++++++++++++---------
1 file changed, 50 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 5ff1fa364a31..7d90321e7775 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -290,7 +290,13 @@ struct sev_es_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u8 reserved_1[43];
+ u64 vmpl0_ssp;
+ u64 vmpl1_ssp;
+ u64 vmpl2_ssp;
+ u64 vmpl3_ssp;
+ u64 u_cet;
+ u8 reserved_1[2];
+ u8 vmpl;
u8 cpl;
u8 reserved_2[4];
u64 efer;
@@ -303,9 +309,19 @@ struct sev_es_save_area {
u64 dr6;
u64 rflags;
u64 rip;
- u8 reserved_4[88];
+ u64 dr0;
+ u64 dr1;
+ u64 dr2;
+ u64 dr3;
+ u64 dr0_addr_mask;
+ u64 dr1_addr_mask;
+ u64 dr2_addr_mask;
+ u64 dr3_addr_mask;
+ u8 reserved_4[24];
u64 rsp;
- u8 reserved_5[24];
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
u64 rax;
u64 star;
u64 lstar;
@@ -316,7 +332,7 @@ struct sev_es_save_area {
u64 sysenter_esp;
u64 sysenter_eip;
u64 cr2;
- u8 reserved_6[32];
+ u8 reserved_5[32];
u64 g_pat;
u64 dbgctl;
u64 br_from;
@@ -325,12 +341,12 @@ struct sev_es_save_area {
u64 last_excp_to;
u8 reserved_7[80];
u32 pkru;
- u8 reserved_9[20];
- u64 reserved_10; /* rax already available at 0x01f8 */
+ u8 reserved_8[20];
+ u64 reserved_9; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_11; /* rsp already available at 0x01d8 */
+ u64 reserved_10; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -342,16 +358,34 @@ struct sev_es_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_12[16];
- u64 sw_exit_code;
- u64 sw_exit_info_1;
- u64 sw_exit_info_2;
- u64 sw_scratch;
+ u8 reserved_11[16];
+ u64 guest_exit_info_1;
+ u64 guest_exit_info_2;
+ u64 guest_exit_int_info;
+ u64 guest_nrip;
u64 sev_features;
- u8 reserved_13[48];
+ u64 vintr_ctrl;
+ u64 guest_exit_code;
+ u64 virtual_tom;
+ u64 tlb_id;
+ u64 pcpu_id;
+ u64 event_inj;
u64 xcr0;
- u8 valid_bitmap[16];
- u64 x87_state_gpa;
+ u8 reserved_12[16];
+
+ /* Floating point area */
+ u64 x87_dp;
+ u32 mxcsr;
+ u16 x87_ftw;
+ u16 x87_fsw;
+ u16 x87_fcw;
+ u16 x87_fop;
+ u16 x87_ds;
+ u16 x87_cs;
+ u64 x87_rip;
+ u8 fpreg_x87[80];
+ u8 fpreg_xmm[256];
+ u8 fpreg_ymm[256];
} __packed;

struct ghcb_save_area {
@@ -410,7 +444,7 @@ struct ghcb {

#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
#define EXPECTED_GHCB_SAVE_AREA_SIZE 1032
-#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1648
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE

--
2.25.1


2022-02-09 19:11:47

by Brijesh Singh

Subject: [PATCH v10 41/45] x86/sev: Register SEV-SNP guest request platform device

Version 2 of the GHCB specification provides a Non Automatic Exit (NAE)
event type that can be used by the SEV-SNP guest to communicate with the
PSP without risk from a malicious hypervisor that wishes to read, alter,
drop or replay the messages sent.

SNP_LAUNCH_UPDATE can insert two special pages into the guest’s memory:
the secrets page and the CPUID page. The PSP firmware populates the
contents of the secrets page. The secrets page contains encryption keys
used by the guest to interact with the firmware. Because the secrets
page is encrypted with the guest’s memory encryption key, the hypervisor
cannot read the keys. See SEV-SNP firmware spec for further details on
the secrets page format.

Create a platform device that the SEV-SNP guest driver can bind to in
order to get the platform resources, such as the encryption key and
message ID, used to communicate with the PSP. The SEV-SNP guest driver
provides a userspace interface to get the attestation report, derive
keys, get the extended attestation report, etc.
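
For illustration, below is a minimal sketch of how the guest driver can
consume this platform data through the standard platform bus API;
dev_get_platdata() is the stock driver-core accessor, while the probe
body itself is hypothetical:

  static int sev_guest_probe(struct platform_device *pdev)
  {
          struct snp_guest_platform_data *data = dev_get_platdata(&pdev->dev);

          if (!data)
                  return -ENODEV;

          /*
           * Map the secrets page, e.g. via
           * ioremap_encrypted(data->secrets_gpa, PAGE_SIZE), to obtain
           * the VM communication keys.
           */
          return 0;
  }

  static struct platform_driver sev_guest_driver = {
          .probe  = sev_guest_probe,
          .driver = { .name = "snp-guest" },  /* matches the device above */
  };
  module_platform_driver(sev_guest_driver);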

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 4 +++
arch/x86/kernel/sev.c | 56 ++++++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 9830ee1d6ef0..ca977493eb72 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -95,6 +95,10 @@ struct snp_req_data {
unsigned int data_npages;
};

+struct snp_guest_platform_data {
+ u64 secrets_gpa;
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index cafced2237f3..7784067df7fa 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -19,6 +19,9 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/efi.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>

#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -2148,3 +2151,56 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned
return ret;
}
EXPORT_SYMBOL_GPL(snp_issue_guest_request);
+
+static struct platform_device guest_req_device = {
+ .name = "snp-guest",
+ .id = -1,
+};
+
+static u64 get_secrets_page(void)
+{
+ u64 pa_data = boot_params.cc_blob_address;
+ struct cc_blob_sev_info info;
+ void *map;
+
+ /*
+ * The CC blob contains the address of the secrets page, check if the
+ * blob is present.
+ */
+ if (!pa_data)
+ return 0;
+
+ map = early_memremap(pa_data, sizeof(info));
+ memcpy(&info, map, sizeof(info));
+ early_memunmap(map, sizeof(info));
+
+ /* smoke-test the secrets page passed */
+ if (!info.secrets_phys || info.secrets_len != PAGE_SIZE)
+ return 0;
+
+ return info.secrets_phys;
+}
+
+static int __init snp_init_platform_device(void)
+{
+ struct snp_guest_platform_data data;
+ u64 gpa;
+
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return -ENODEV;
+
+ gpa = get_secrets_page();
+ if (!gpa)
+ return -ENODEV;
+
+ data.secrets_gpa = gpa;
+ if (platform_device_add_data(&guest_req_device, &data, sizeof(data)))
+ return -ENODEV;
+
+ if (platform_device_register(&guest_req_device))
+ return -ENODEV;
+
+ pr_info("SNP guest platform device initialized.\n");
+ return 0;
+}
+device_initcall(snp_init_platform_device);
--
2.25.1


2022-02-09 19:12:53

by Brijesh Singh

Subject: [PATCH v10 29/45] x86/boot: Add Confidential Computing type to setup_data

While launching encrypted guests, the hypervisor may need to provide
some additional information during the guest boot. When booting under an
EFI-based BIOS, the EFI configuration table contains an entry for the
confidential computing blob that contains the required information.

To support booting encrypted guests on a non-EFI VM, the hypervisor needs
to pass this additional information to the kernel with a different method.

For this purpose, introduce the SETUP_CC_BLOB type in setup_data to hold
the physical address of the confidential computing blob location. The boot
loader or hypervisor may choose to use this method instead of the EFI
configuration table. The CC blob location scanning should give preference
to setup_data over the EFI configuration table.

In AMD SEV-SNP, the CC blob contains the address of the secrets and CPUID
pages. The secrets page includes information such as a VM-to-PSP
communication key, and the CPUID page contains PSP-filtered CPUID values.
Define the AMD SEV confidential computing blob structure.

While at it, define the EFI GUID for the confidential computing blob.
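
As an illustration, scanning the setup_data chain for the new type could
look like the sketch below. Note that struct cc_setup_data is a
hypothetical wrapper pairing the setup_data header with the blob address;
the series adds its own lookup helper:

  struct cc_setup_data {
          struct setup_data header;
          u32 cc_blob_address;
  };

  /* Walk boot_params.hdr.setup_data looking for a SETUP_CC_BLOB entry. */
  static u64 find_cc_blob_setup_data(struct boot_params *bp)
  {
          struct setup_data *hdr = (struct setup_data *)bp->hdr.setup_data;

          while (hdr) {
                  if (hdr->type == SETUP_CC_BLOB)
                          return ((struct cc_setup_data *)hdr)->cc_blob_address;
                  hdr = (struct setup_data *)hdr->next;
          }

          return 0;       /* not found - fall back to the EFI config table */
  }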

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 18 ++++++++++++++++++
arch/x86/include/uapi/asm/bootparam.h | 1 +
include/linux/efi.h | 1 +
3 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index a3203b2caaca..1a7e21bb6eea 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -42,6 +42,24 @@ struct es_em_ctxt {
struct es_fault_info fi;
};

+/*
+ * AMD SEV Confidential computing blob structure. The structure is
+ * defined in OVMF UEFI firmware header:
+ * https://github.com/tianocore/edk2/blob/master/OvmfPkg/Include/Guid/ConfidentialComputingSevSnpBlob.h
+ */
+#define CC_BLOB_SEV_HDR_MAGIC 0x45444d41
+struct cc_blob_sev_info {
+ u32 magic;
+ u16 version;
+ u16 reserved;
+ u64 secrets_phys;
+ u32 secrets_len;
+ u32 rsvd1;
+ u64 cpuid_phys;
+ u32 cpuid_len;
+ u32 rsvd2;
+};
+
void do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code);

static inline u64 lower_bits(u64 val, unsigned int bits)
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index b25d3f82c2f3..1ac5acca72ce 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -10,6 +10,7 @@
#define SETUP_EFI 4
#define SETUP_APPLE_PROPERTIES 5
#define SETUP_JAILHOUSE 6
+#define SETUP_CC_BLOB 7

#define SETUP_INDIRECT (1<<31)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index ccd4d3f91c98..984aa688997a 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -390,6 +390,7 @@ void efi_native_runtime_setup(void);
#define EFI_CERT_SHA256_GUID EFI_GUID(0xc1c41626, 0x504c, 0x4092, 0xac, 0xa9, 0x41, 0xf9, 0x36, 0x93, 0x43, 0x28)
#define EFI_CERT_X509_GUID EFI_GUID(0xa5c059a1, 0x94e4, 0x4aa7, 0x87, 0xb5, 0xab, 0x15, 0x5c, 0x2b, 0xf0, 0x72)
#define EFI_CERT_X509_SHA256_GUID EFI_GUID(0x3bd2a492, 0x96c0, 0x4079, 0xb4, 0x20, 0xfc, 0xf9, 0x8e, 0xf1, 0x03, 0xed)
+#define EFI_CC_BLOB_GUID EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42)

/*
* This GUID is used to pass to the kernel proper the struct screen_info
--
2.25.1


2022-02-09 19:13:05

by Brijesh Singh

Subject: [PATCH v10 32/45] x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers

From: Michael Roth <[email protected]>

CPUID instructions generate a #VC exception for SEV-ES/SEV-SNP guests,
which the early handlers are currently set up to handle. In the case
of SEV-SNP, guests can use a configurable location in guest memory
that has been pre-populated with a firmware-validated CPUID table to
look up the relevant CPUID values rather than requesting them from the
hypervisor via a VMGEXIT. Add the various hooks in the #VC handlers to
allow CPUID instructions to be handled via the table. The code to
actually configure/enable the table will be added in a subsequent
commit.

Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 2 +
arch/x86/kernel/sev-shared.c | 324 ++++++++++++++++++++++++++++++
2 files changed, 326 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index e9b6815b3b3d..0759af9b1acf 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -152,6 +152,8 @@ struct snp_psc_desc {
#define GHCB_TERM_PSC 1 /* Page State Change failure */
#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
#define GHCB_TERM_NOT_VMPL0 3 /* SNP guest is not running at VMPL-0 */
+#define GHCB_TERM_CPUID 4 /* CPUID-validation failure */
+#define GHCB_TERM_CPUID_HV 5 /* CPUID failure during hypervisor fallback */

#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b4d5558c9d0a..0f1375164ff0 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -24,6 +24,36 @@ struct cpuid_leaf {
u32 edx;
};

+/*
+ * Individual entries of the SNP CPUID table, as defined by the SNP
+ * Firmware ABI, Revision 0.9, Section 7.1, Table 14.
+ */
+struct snp_cpuid_fn {
+ u32 eax_in;
+ u32 ecx_in;
+ u64 xcr0_in;
+ u64 xss_in;
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+ u64 __reserved;
+} __packed;
+
+/*
+ * SNP CPUID table, as defined by the SNP Firmware ABI, Revision 0.9,
+ * Section 8.14.2.6. Also noted there is the SNP firmware-enforced limit
+ * of 64 entries per CPUID table.
+ */
+#define SNP_CPUID_COUNT_MAX 64
+
+struct snp_cpuid_table {
+ u32 count;
+ u32 __reserved1;
+ u64 __reserved2;
+ struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
+} __packed;
+
/*
* Since feature negotiation related variables are set early in the boot
* process they must reside in the .data section so as not to be zeroed
@@ -33,6 +63,19 @@ struct cpuid_leaf {
*/
static u16 ghcb_version __ro_after_init;

+/* Copy of the SNP firmware's CPUID page. */
+static struct snp_cpuid_table cpuid_table_copy __ro_after_init;
+
+/*
+ * These will be initialized based on CPUID table so that non-present
+ * all-zero leaves (for sparse tables) can be differentiated from
+ * invalid/out-of-range leaves. This is needed since all-zero leaves
+ * still need to be post-processed.
+ */
+static u32 cpuid_std_range_max __ro_after_init;
+static u32 cpuid_hyp_range_max __ro_after_init;
+static u32 cpuid_ext_range_max __ro_after_init;
+
static bool __init sev_es_check_cpu_features(void)
{
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -242,6 +285,252 @@ static int sev_cpuid_hv(struct cpuid_leaf *leaf)
return ret;
}

+/*
+ * This may be called early while still running on the initial identity
+ * mapping. Use RIP-relative addressing to obtain the correct address
+ * while running with the initial identity mapping as well as the
+ * switch-over to kernel virtual addresses later.
+ */
+static const struct snp_cpuid_table *snp_cpuid_get_table(void)
+{
+ void *ptr;
+
+ asm ("lea cpuid_table_copy(%%rip), %0"
+ : "=r" (ptr)
+ : "p" (&cpuid_table_copy));
+
+ return ptr;
+}
+
+/*
+ * The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
+ * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
+ * and 1 based on the corresponding features enabled by a particular
+ * combination of XCR0 and XSS registers so that a guest can look up the
+ * version corresponding to the features currently enabled in its XCR0/XSS
+ * registers. The only values that differ between these versions/table
+ * entries is the enabled XSAVE area size advertised via EBX.
+ *
+ * While hypervisors may choose to make use of this support, it is more
+ * robust/secure for a guest to simply find the entry corresponding to the
+ * base/legacy XSAVE area size (XCR0=1 or XCR0=3), and then calculate the
+ * XSAVE area size using subfunctions 2 through 64, as documented in APM
+ * Volume 3, Rev 3.31, Appendix E.3.8, which is what is done here.
+ *
+ * Since base/legacy XSAVE area size is documented as 0x240, use that value
+ * directly rather than relying on the base size in the CPUID table.
+ *
+ * Return: XSAVE area size on success, 0 otherwise.
+ */
+static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
+{
+ const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ u64 xfeatures_found = 0;
+ u32 xsave_size = 0x240;
+ int i;
+
+ for (i = 0; i < cpuid_table->count; i++) {
+ const struct snp_cpuid_fn *e = &cpuid_table->fn[i];
+
+ if (!(e->eax_in == 0xD && e->ecx_in > 1 && e->ecx_in < 64))
+ continue;
+ if (!(xfeatures_en & (BIT_ULL(e->ecx_in))))
+ continue;
+ if (xfeatures_found & (BIT_ULL(e->ecx_in)))
+ continue;
+
+ xfeatures_found |= (BIT_ULL(e->ecx_in));
+
+ if (compacted)
+ xsave_size += e->eax;
+ else
+ xsave_size = max(xsave_size, e->eax + e->ebx);
+ }
+
+ /*
+ * Either the guest set unsupported XCR0/XSS bits, or the corresponding
+ * entries in the CPUID table were not present. This is not a valid
+ * state to be in.
+ */
+ if (xfeatures_found != (xfeatures_en & GENMASK_ULL(63, 2)))
+ return 0;
+
+ return xsave_size;
+}
+
+static bool
+snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
+{
+ const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ int i;
+
+ for (i = 0; i < cpuid_table->count; i++) {
+ const struct snp_cpuid_fn *e = &cpuid_table->fn[i];
+
+ if (e->eax_in != leaf->fn)
+ continue;
+
+ if (cpuid_function_is_indexed(leaf->fn) && e->ecx_in != leaf->subfn)
+ continue;
+
+ /*
+ * For 0xD subfunctions 0 and 1, only use the entry corresponding
+ * to the base/legacy XSAVE area size (XCR0=1 or XCR0=3, XSS=0).
+ * See the comments above snp_cpuid_calc_xsave_size() for more
+ * details.
+ */
+ if (e->eax_in == 0xD && (e->ecx_in == 0 || e->ecx_in == 1))
+ if (!(e->xcr0_in == 1 || e->xcr0_in == 3) || e->xss_in)
+ continue;
+
+ leaf->eax = e->eax;
+ leaf->ebx = e->ebx;
+ leaf->ecx = e->ecx;
+ leaf->edx = e->edx;
+
+ return true;
+ }
+
+ return false;
+}
+
+static void snp_cpuid_hv(struct cpuid_leaf *leaf)
+{
+ if (sev_cpuid_hv(leaf))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
+}
+
+static int snp_cpuid_postprocess(struct cpuid_leaf *leaf)
+{
+ struct cpuid_leaf leaf_hv = *leaf;
+
+ switch (leaf->fn) {
+ case 0x1:
+ snp_cpuid_hv(&leaf_hv);
+
+ /* initial APIC ID */
+ leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
+ /* APIC enabled bit */
+ leaf->edx = (leaf_hv.edx & BIT(9)) | (leaf->edx & ~BIT(9));
+
+ /* OSXSAVE enabled bit */
+ if (native_read_cr4() & X86_CR4_OSXSAVE)
+ leaf->ecx |= BIT(27);
+ break;
+ case 0x7:
+ /* OSPKE enabled bit */
+ leaf->ecx &= ~BIT(4);
+ if (native_read_cr4() & X86_CR4_PKE)
+ leaf->ecx |= BIT(4);
+ break;
+ case 0xB:
+ leaf_hv.subfn = 0;
+ snp_cpuid_hv(&leaf_hv);
+
+ /* extended APIC ID */
+ leaf->edx = leaf_hv.edx;
+ break;
+ case 0xD: {
+ bool compacted = false;
+ u64 xcr0 = 1, xss = 0;
+ u32 xsave_size;
+
+ if (leaf->subfn != 0 && leaf->subfn != 1)
+ return 0;
+
+ if (native_read_cr4() & X86_CR4_OSXSAVE)
+ xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+ if (leaf->subfn == 1) {
+ /* Get XSS value if XSAVES is enabled. */
+ if (leaf->eax & BIT(3)) {
+ unsigned long lo, hi;
+
+ asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
+ : "c" (MSR_IA32_XSS));
+ xss = (hi << 32) | lo;
+ }
+
+ /*
+ * The PPR and APM aren't clear on what size should be
+ * encoded in 0xD:0x1:EBX when compaction is not enabled
+ * by either XSAVEC (feature bit 1) or XSAVES (feature
+ * bit 3) since SNP-capable hardware has these feature
+ * bits fixed as 1. KVM sets it to 0 in this case, but
+ * to avoid this becoming an issue it's safer to simply
+ * treat this as unsupported for SNP guests.
+ */
+ if (!(leaf->eax & (BIT(1) | BIT(3))))
+ return -EINVAL;
+
+ compacted = true;
+ }
+
+ xsave_size = snp_cpuid_calc_xsave_size(xcr0 | xss, compacted);
+ if (!xsave_size)
+ return -EINVAL;
+
+ leaf->ebx = xsave_size;
+ }
+ break;
+ case 0x8000001E:
+ snp_cpuid_hv(&leaf_hv);
+
+ /* extended APIC ID */
+ leaf->eax = leaf_hv.eax;
+ /* compute ID */
+ leaf->ebx = (leaf->ebx & GENMASK(31, 8)) | (leaf_hv.ebx & GENMASK(7, 0));
+ /* node ID */
+ leaf->ecx = (leaf->ecx & GENMASK(31, 8)) | (leaf_hv.ecx & GENMASK(7, 0));
+ break;
+ default:
+ /* No fix-ups needed, use values as-is. */
+ break;
+ }
+
+ return 0;
+}
+
+/*
+ * Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
+ * should be treated as fatal by caller.
+ */
+static int snp_cpuid(struct cpuid_leaf *leaf)
+{
+ const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+
+ if (!cpuid_table->count)
+ return -EOPNOTSUPP;
+
+ if (!snp_cpuid_get_validated_func(leaf)) {
+ /*
+ * Some hypervisors will avoid keeping track of CPUID entries
+ * where all values are zero, since they can be handled the
+ * same as out-of-range values (all-zero). This is useful here
+ * as well as it allows virtually all guest configurations to
+ * work using a single SNP CPUID table.
+ *
+ * To allow for this, there is a need to distinguish between
+ * out-of-range entries and in-range zero entries, since the
+ * CPUID table entries are only a template that may need to be
+ * augmented with additional values for things like
+ * CPU-specific information during post-processing. So if it's
+ * not in the table, set the values to zero. Then, if they are
+ * within a valid CPUID range, proceed with post-processing
+ * using zeros as the initial values. Otherwise, skip
+ * post-processing and just return zeros immediately.
+ */
+ leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
+
+ /* Skip post-processing for out-of-range zero leafs. */
+ if (!(leaf->fn <= cpuid_std_range_max ||
+ (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
+ (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
+ return 0;
+ }
+
+ return snp_cpuid_postprocess(leaf);
+}
+
/*
* Boot VC Handler - This is the first VC handler during boot, there is no GHCB
* page yet, so it only supports the MSR based communication with the
@@ -252,6 +541,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
unsigned int subfn = lower_bits(regs->cx, 32);
unsigned int fn = lower_bits(regs->ax, 32);
struct cpuid_leaf leaf;
+ int ret;

/* Only CPUID is supported via MSR protocol */
if (exit_code != SVM_EXIT_CPUID)
@@ -259,9 +549,18 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)

leaf.fn = fn;
leaf.subfn = subfn;
+
+ ret = snp_cpuid(&leaf);
+ if (!ret)
+ goto cpuid_done;
+
+ if (ret != -EOPNOTSUPP)
+ goto fail;
+
if (sev_cpuid_hv(&leaf))
goto fail;

+cpuid_done:
regs->ax = leaf.eax;
regs->bx = leaf.ebx;
regs->cx = leaf.ecx;
@@ -556,12 +855,37 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}

+static int vc_handle_cpuid_snp(struct pt_regs *regs)
+{
+ struct cpuid_leaf leaf;
+ int ret;
+
+ leaf.fn = regs->ax;
+ leaf.subfn = regs->cx;
+ ret = snp_cpuid(&leaf);
+ if (!ret) {
+ regs->ax = leaf.eax;
+ regs->bx = leaf.ebx;
+ regs->cx = leaf.ecx;
+ regs->dx = leaf.edx;
+ }
+
+ return ret;
+}
+
static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
struct es_em_ctxt *ctxt)
{
struct pt_regs *regs = ctxt->regs;
u32 cr4 = native_read_cr4();
enum es_result ret;
+ int snp_cpuid_ret;
+
+ snp_cpuid_ret = vc_handle_cpuid_snp(regs);
+ if (!snp_cpuid_ret)
+ return ES_OK;
+ if (snp_cpuid_ret != -EOPNOTSUPP)
+ return ES_VMM_ERROR;

ghcb_set_rax(ghcb, regs->ax);
ghcb_set_rcx(ghcb, regs->cx);
--
2.25.1


2022-02-09 19:13:54

by Brijesh Singh

Subject: [PATCH v10 07/45] x86/compressed/64: Detect/setup SEV/SME features earlier in boot

From: Michael Roth <[email protected]>

With upcoming SEV-SNP support, SEV-related features need to be
initialized earlier in boot, at the same point the initial #VC handler
is set up, so that the SEV-SNP CPUID table can be utilized during the
initial feature checks. Also, SEV-SNP feature detection will rely on
EFI helper functions to scan the EFI config table for the Confidential
Computing blob, and so would need to be implemented at least partially
in C.

Currently set_sev_encryption_mask() is used to initialize the
sev_status and sme_me_mask globals that advertise what SEV/SME features
are available in a guest. Rename it to sev_enable() to better reflect
that (SME is only enabled in the case of SEV guests in the
boot/compressed kernel), and move it to just after the stage1 #VC
handler is set up so that it can be used to initialize SEV-SNP as well
in future patches.

While at it, re-implement it as C code so that all SEV feature
detection can be better consolidated with upcoming SEV-SNP feature
detection, which will also be in C.

The 32-bit entry path remains unchanged, as it never relied on the
set_sev_encryption_mask() initialization to begin with, possibly due to
the normal rva() helper for accessing globals only being usable by code
in .head.text. Either way, 32-bit entry for SEV-SNP would likely only
be supported for non-EFI boot paths, and so wouldn't rely on existing
EFI helper functions, and so could be handled by a separate/simpler
32-bit initializer in the future if needed.

Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 37 +++++++++++++++-----------
arch/x86/boot/compressed/mem_encrypt.S | 36 -------------------------
arch/x86/boot/compressed/misc.h | 4 +--
arch/x86/boot/compressed/sev.c | 36 +++++++++++++++++++++++++
4 files changed, 60 insertions(+), 53 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fd9441f40457..4cd83afb9531 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -189,11 +189,11 @@ SYM_FUNC_START(startup_32)
subl $32, %eax /* Encryption bit is always above bit 31 */
bts %eax, %edx /* Set encryption mask for page tables */
/*
- * Mark SEV as active in sev_status so that startup32_check_sev_cbit()
- * will do a check. The sev_status memory will be fully initialized
- * with the contents of MSR_AMD_SEV_STATUS later in
- * set_sev_encryption_mask(). For now it is sufficient to know that SEV
- * is active.
+ * Set MSR_AMD64_SEV_ENABLED_BIT in sev_status so that
+ * startup32_check_sev_cbit() will do a check. sev_enable() will
+ * initialize sev_status with all the bits reported by
+ * MSR_AMD_SEV_STATUS later, but only MSR_AMD64_SEV_ENABLED_BIT
+ * needs to be set for now.
*/
movl $1, rva(sev_status)(%ebp)
1:
@@ -447,6 +447,23 @@ SYM_CODE_START(startup_64)
call load_stage1_idt
popq %rsi

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ /*
+ * Now that the stage1 interrupt handlers are set up, #VC exceptions from
+ * CPUID instructions can be properly handled for SEV-ES guests.
+ *
+ * For SEV-SNP, the CPUID table also needs to be set up in advance of any
+ * CPUID instructions being issued, so go ahead and do that now via
+ * sev_enable(), which will also handle the rest of the SEV-related
+ * detection/setup to ensure that has been done in advance of any dependent
+ * code.
+ */
+ pushq %rsi
+ movq %rsi, %rdi /* real mode address */
+ call sev_enable
+ popq %rsi
+#endif
+
/*
* paging_prepare() sets up the trampoline and checks if we need to
* enable 5-level paging.
@@ -559,17 +576,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrq $3, %rcx
rep stosq

-/*
- * If running as an SEV guest, the encryption mask is required in the
- * page-table setup code below. When the guest also has SEV-ES enabled
- * set_sev_encryption_mask() will cause #VC exceptions, but the stage2
- * handler can't map its GHCB because the page-table is not set up yet.
- * So set up the encryption mask here while still on the stage1 #VC
- * handler. Then load stage2 IDT and switch to the kernel's own
- * page-table.
- */
pushq %rsi
- call set_sev_encryption_mask
call load_stage2_idt

/* Pass boot_params to initialize_identity_maps() */
diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
index a63424d13627..a73e4d783cae 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -187,42 +187,6 @@ SYM_CODE_END(startup32_vc_handler)
.code64

#include "../../kernel/sev_verify_cbit.S"
-SYM_FUNC_START(set_sev_encryption_mask)
-#ifdef CONFIG_AMD_MEM_ENCRYPT
- push %rbp
- push %rdx
-
- movq %rsp, %rbp /* Save current stack pointer */
-
- call get_sev_encryption_bit /* Get the encryption bit position */
- testl %eax, %eax
- jz .Lno_sev_mask
-
- bts %rax, sme_me_mask(%rip) /* Create the encryption mask */
-
- /*
- * Read MSR_AMD64_SEV again and store it to sev_status. Can't do this in
- * get_sev_encryption_bit() because this function is 32-bit code and
- * shared between 64-bit and 32-bit boot path.
- */
- movl $MSR_AMD64_SEV, %ecx /* Read the SEV MSR */
- rdmsr
-
- /* Store MSR value in sev_status */
- shlq $32, %rdx
- orq %rdx, %rax
- movq %rax, sev_status(%rip)
-
-.Lno_sev_mask:
- movq %rbp, %rsp /* Restore original stack pointer */
-
- pop %rdx
- pop %rbp
-#endif
-
- xor %rax, %rax
- RET
-SYM_FUNC_END(set_sev_encryption_mask)

.data

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 16ed360b6692..23e0e395084a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -120,12 +120,12 @@ static inline void console_init(void)
{ }
#endif

-void set_sev_encryption_mask(void);
-
#ifdef CONFIG_AMD_MEM_ENCRYPT
+void sev_enable(struct boot_params *bp);
void sev_es_shutdown_ghcb(void);
extern bool sev_es_check_ghcb_fault(unsigned long address);
#else
+static inline void sev_enable(struct boot_params *bp) { }
static inline void sev_es_shutdown_ghcb(void) { }
static inline bool sev_es_check_ghcb_fault(unsigned long address)
{
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 4e82101b7d13..27ccd5a5ff60 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -201,3 +201,39 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
else if (result != ES_RETRY)
sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
}
+
+void sev_enable(struct boot_params *bp)
+{
+ unsigned int eax, ebx, ecx, edx;
+ struct msr m;
+
+ /* Check for the SME/SEV support leaf */
+ eax = 0x80000000;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if (eax < 0x8000001f)
+ return;
+
+ /*
+ * Check for the SME/SEV feature:
+ * CPUID Fn8000_001F[EAX]
+ * - Bit 0 - Secure Memory Encryption support
+ * - Bit 1 - Secure Encrypted Virtualization support
+ * CPUID Fn8000_001F[EBX]
+ * - Bits 5:0 - Pagetable bit position used to indicate encryption
+ */
+ eax = 0x8000001f;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ /* Check whether SEV is supported */
+ if (!(eax & BIT(1)))
+ return;
+
+ /* Set the SME mask if this is an SEV guest. */
+ boot_rdmsr(MSR_AMD64_SEV, &m);
+ sev_status = m.q;
+ if (!(sev_status & MSR_AMD64_SEV_ENABLED))
+ return;
+
+ sme_me_mask = BIT_ULL(ebx & 0x3f);
+}
--
2.25.1


2022-02-09 19:14:32

by Brijesh Singh

Subject: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

The set_memory_{encrypted,decrypted}() functions are used for changing
pages from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps performed by the guest.

If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:

1. Issue the page state change VMGEXIT to add the memory region in
the RMP table.
2. Validate the memory region after the RMP entry is added.

To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before the encryption
attribute is removed from the page table:

1. Invalidate the page.
2. Issue the page state change VMGEXIT to remove the page from RMP table.

To change the page state in the RMP table, use the Page State Change
VMGEXIT defined in the GHCB specification.

The GHCB specification provides the flexibility to use either a 4K or 2MB
page size during the page state change (PSC) request. For now, use the
4K page size for all PSC requests until RMP page size tracking is supported
in the kernel.
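
For context, the typical caller-side flow that exercises these paths is an
ordinary driver sharing a buffer with the hypervisor. A sketch using the
existing set_memory API (the SNP-specific PSC/PVALIDATE steps happen
transparently underneath):

  unsigned long vaddr = (unsigned long)page_address(page);

  /* private -> shared: invalidate pages, PSC to shared, clear the C-bit */
  if (set_memory_decrypted(vaddr, npages))
          return -EIO;

  /* ... exchange data with the hypervisor ... */

  /* shared -> private: set the C-bit, PSC to private, validate pages */
  if (set_memory_encrypted(vaddr, npages))
          return -EIO;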

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 22 ++++
arch/x86/include/asm/sev.h | 4 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kernel/sev.c | 168 ++++++++++++++++++++++++++++++
arch/x86/mm/pat/set_memory.c | 15 +++
5 files changed, 211 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index f077a6c95e67..1aa72b5c2490 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -105,6 +105,28 @@ enum psc_op {

#define GHCB_HV_FT_SNP BIT_ULL(0)

+/* SNP Page State Change NAE event */
+#define VMGEXIT_PSC_MAX_ENTRY 253
+
+struct psc_hdr {
+ u16 cur_entry;
+ u16 end_entry;
+ u32 reserved;
+} __packed;
+
+struct psc_entry {
+ u64 cur_page : 12,
+ gfn : 40,
+ operation : 4,
+ pagesize : 1,
+ reserved : 7;
+} __packed;
+
+struct snp_psc_desc {
+ struct psc_hdr hdr;
+ struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
+} __packed;
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index f65d257e3d4a..feeb93e6ec97 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -128,6 +128,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
unsigned int npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -142,6 +144,8 @@ early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned
static inline void __init
early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
+static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
#endif

#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index b0ad00f4c1e1..0dcdb6e0c913 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
+#define SVM_VMGEXIT_PSC 0x80000010
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff

@@ -219,6 +220,7 @@
{ SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
+ { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1e8dc71e7ba6..4315be1602d1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -655,6 +655,174 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
WARN(1, "invalid memory op %d\n", op);
}

+static int vmgexit_psc(struct snp_psc_desc *desc)
+{
+ int cur_entry, end_entry, ret = 0;
+ struct snp_psc_desc *data;
+ struct ghcb_state state;
+ struct es_em_ctxt ctxt;
+ unsigned long flags;
+ struct ghcb *ghcb;
+
+ /*
+ * __sev_get_ghcb() needs to run with IRQs disabled because it is using
+ * a per-CPU GHCB.
+ */
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+ if (!ghcb) {
+ ret = 1;
+ goto out_unlock;
+ }
+
+ /* Copy the input desc into GHCB shared buffer */
+ data = (struct snp_psc_desc *)ghcb->shared_buffer;
+ memcpy(ghcb->shared_buffer, desc, min_t(int, GHCB_SHARED_BUF_SIZE, sizeof(*desc)));
+
+ /*
+ * As per the GHCB specification, the hypervisor can resume the guest
+ * before processing all the entries. Check whether all the entries
+ * are processed. If not, then keep retrying. Note, the hypervisor
+ * will update the data memory directly to indicate the status, so
+ * reference the data->hdr everywhere.
+ *
+ * The strategy here is to wait for the hypervisor to change the page
+ * state in the RMP table before guest accesses the memory pages. If the
+ * page state change was not successful, then later memory access will
+ * result in a crash.
+ */
+ cur_entry = data->hdr.cur_entry;
+ end_entry = data->hdr.end_entry;
+
+ while (data->hdr.cur_entry <= data->hdr.end_entry) {
+ ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+ /* This will advance the shared buffer data points to. */
+ ret = sev_es_ghcb_hv_call(ghcb, true, &ctxt, SVM_VMGEXIT_PSC, 0, 0);
+
+ /*
+ * Page State Change VMGEXIT can pass error code through
+ * exit_info_2.
+ */
+ if (WARN(ret || ghcb->save.sw_exit_info_2,
+ "SNP: PSC failed ret=%d exit_info_2=%llx\n",
+ ret, ghcb->save.sw_exit_info_2)) {
+ ret = 1;
+ goto out;
+ }
+
+ /* Verify that reserved bit is not set */
+ if (WARN(data->hdr.reserved, "Reserved bit is set in the PSC header\n")) {
+ ret = 1;
+ goto out;
+ }
+
+ /*
+ * Sanity check that entry processing is not going backwards.
+ * This will happen only if hypervisor is tricking us.
+ */
+ if (WARN(data->hdr.end_entry > end_entry || cur_entry > data->hdr.cur_entry,
+"SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
+ end_entry, data->hdr.end_entry, cur_entry, data->hdr.cur_entry)) {
+ ret = 1;
+ goto out;
+ }
+ }
+
+out:
+ __sev_put_ghcb(&state);
+
+out_unlock:
+ local_irq_restore(flags);
+
+ return ret;
+}
+
+static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
+ unsigned long vaddr_end, int op)
+{
+ struct psc_hdr *hdr;
+ struct psc_entry *e;
+ unsigned long pfn;
+ int i;
+
+ hdr = &data->hdr;
+ e = data->entries;
+
+ memset(data, 0, sizeof(*data));
+ i = 0;
+
+ while (vaddr < vaddr_end) {
+ if (is_vmalloc_addr((void *)vaddr))
+ pfn = vmalloc_to_pfn((void *)vaddr);
+ else
+ pfn = __pa(vaddr) >> PAGE_SHIFT;
+
+ e->gfn = pfn;
+ e->operation = op;
+ hdr->end_entry = i;
+
+ /*
+ * The current SNP implementation doesn't keep track of the RMP page
+ * size, so use 4K for simplicity.
+ */
+ e->pagesize = RMP_PG_SIZE_4K;
+
+ vaddr = vaddr + PAGE_SIZE;
+ e++;
+ i++;
+ }
+
+ if (vmgexit_psc(data))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+}
+
+static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
+{
+ unsigned long vaddr_end, next_vaddr;
+ struct snp_psc_desc *desc;
+
+ desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
+ if (!desc)
+ panic("SNP: failed to allocate memory for PSC descriptor\n");
+
+ vaddr = vaddr & PAGE_MASK;
+ vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+ while (vaddr < vaddr_end) {
+ /* Calculate the last vaddr that fits in one struct snp_psc_desc. */
+ next_vaddr = min_t(unsigned long, vaddr_end,
+ (VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);
+
+ __set_pages_state(desc, vaddr, next_vaddr, op);
+
+ vaddr = next_vaddr;
+ }
+
+ kfree(desc);
+}
+
+void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+{
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ pvalidate_pages(vaddr, npages, false);
+
+ set_pages_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
+}
+
+void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+{
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+ pvalidate_pages(vaddr, npages, true);
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index b4072115c8ef..e58d57b038ee 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -32,6 +32,7 @@
#include <asm/set_memory.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
+#include <asm/sev.h>

#include "../mm_internal.h"

@@ -2012,8 +2013,22 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
*/
cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));

+ /*
+ * To maintain the security guarantees of SEV-SNP guests, make sure
+ * to invalidate the memory before clearing the encryption attribute.
+ */
+ if (!enc)
+ snp_set_memory_shared(addr, numpages);
+
ret = __change_page_attr_set_clr(&cpa, 1);

+ /*
+ * Now that memory is mapped encrypted in the page table, validate it
+ * so that it is consistent with the above page state.
+ */
+ if (!ret && enc)
+ snp_set_memory_private(addr, numpages);
+
/*
* After changing the encryption attribute, we need to flush TLBs again
* in case any speculative TLB caching occurred (but no need to flush
--
2.25.1


2022-02-09 19:14:41

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 43/45] virt: sevguest: Add support to derive key

The SNP_GET_DERIVED_KEY ioctl interface can be used by an SNP guest to
ask the firmware to provide a key derived from a root key. The derived
key may be used by the guest for any purpose it chooses, such as a
sealing key or for communicating with external entities.

See SEV-SNP firmware spec for more information.
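
For illustration, a minimal guest userspace sketch of issuing the ioctl
could look as follows. The /dev/sev-guest node is the device created by
this driver; the zeroed request fields and the minimal error handling are
simplifications for the sketch, not part of this patch:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/sev-guest.h>

  int main(void)
  {
          struct snp_guest_request_ioctl input = {};
          struct snp_derived_key_resp resp = {};
          struct snp_derived_key_req req = {};
          int fd;

          fd = open("/dev/sev-guest", O_RDWR);
          if (fd < 0)
                  return 1;

          req.root_key_select = 0;        /* 0 = derive from VCEK, see the SNP firmware spec */
          input.msg_version = 1;          /* must be non-zero */
          input.req_data = (__u64)&req;
          input.resp_data = (__u64)&resp;

          if (ioctl(fd, SNP_GET_DERIVED_KEY, &input) < 0)
                  return 1;

          /* resp.data now holds the 64-byte derived key. */
          return 0;
  }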

Reviewed-by: Liam Merwick <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 17 ++++++++++
drivers/virt/coco/sevguest/sevguest.c | 45 +++++++++++++++++++++++++++
include/uapi/linux/sev-guest.h | 17 ++++++++++
3 files changed, 79 insertions(+)

diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 34feff6d5940..ae2e76f59435 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
On success, snp_report_resp.data will contain the report. The report is
in the format described in the SEV-SNP specification. See the SEV-SNP
specification for further details.

+2.2 SNP_GET_DERIVED_KEY
+-----------------------
+:Technology: sev-snp
+:Type: guest ioctl
+:Parameters (in): struct snp_derived_key_req
+:Returns (out): struct snp_derived_key_resp on success, -negative on error
+
+The SNP_GET_DERIVED_KEY ioctl can be used to get a key derived from a root key.
+The derived key can be used by the guest for any purpose, such as sealing keys
+or communicating with external entities.
+
+The ioctl uses the SNP_GUEST_REQUEST (MSG_KEY_REQ) command provided by the
+SEV-SNP firmware to derive the key. See SEV-SNP specification for further details
+on the various fields passed in the key derivation request.
+
+On success, the snp_derived_key_resp.data contains the derived key value. See
+the SEV-SNP specification for further details.

Reference
---------
diff --git a/drivers/virt/coco/sevguest/sevguest.c b/drivers/virt/coco/sevguest/sevguest.c
index 4fac82fd3e4c..2062f94fdc38 100644
--- a/drivers/virt/coco/sevguest/sevguest.c
+++ b/drivers/virt/coco/sevguest/sevguest.c
@@ -389,6 +389,48 @@ static int get_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_io
return rc;
}

+static int get_derived_key(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
+{
+ struct snp_guest_crypto *crypto = snp_dev->crypto;
+ struct snp_derived_key_resp resp = {0};
+ struct snp_derived_key_req req = {0};
+ int rc, resp_len;
+ /* Response data is 64 bytes and max authsize for GCM is 16 bytes. */
+ u8 buf[64 + 16];
+
+ lockdep_assert_held(&snp_cmd_mutex);
+
+ if (!arg->req_data || !arg->resp_data)
+ return -EINVAL;
+
+ /*
+ * The intermediate response buffer is used while decrypting the
+ * response payload. Make sure that it has enough space to cover the
+ * authtag.
+ */
+ resp_len = sizeof(resp.data) + crypto->a_len;
+ if (sizeof(buf) < resp_len)
+ return -ENOMEM;
+
+ if (copy_from_user(&req, (void __user *)arg->req_data, sizeof(req)))
+ return -EFAULT;
+
+ rc = handle_guest_request(snp_dev, SVM_VMGEXIT_GUEST_REQUEST, arg->msg_version,
+ SNP_MSG_KEY_REQ, &req, sizeof(req), buf, resp_len,
+ &arg->fw_err);
+ if (rc)
+ return rc;
+
+ memcpy(resp.data, buf, sizeof(resp.data));
+ if (copy_to_user((void __user *)arg->resp_data, &resp, sizeof(resp)))
+ rc = -EFAULT;
+
+ /* The response buffer contains the sensitive data, explicitly clear it. */
+ memzero_explicit(buf, sizeof(buf));
+ memzero_explicit(&resp, sizeof(resp));
+ return rc;
+}
+
static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
struct snp_guest_dev *snp_dev = to_snp_dev(file);
@@ -418,6 +460,9 @@ static long snp_guest_ioctl(struct file *file, unsigned int ioctl, unsigned long
case SNP_GET_REPORT:
ret = get_report(snp_dev, &input);
break;
+ case SNP_GET_DERIVED_KEY:
+ ret = get_derived_key(snp_dev, &input);
+ break;
default:
break;
}
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 38f11d723c68..598367f12064 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -30,6 +30,20 @@ struct snp_report_resp {
__u8 data[4000];
};

+struct snp_derived_key_req {
+ __u32 root_key_select;
+ __u32 rsvd;
+ __u64 guest_field_select;
+ __u32 vmpl;
+ __u32 guest_svn;
+ __u64 tcb_version;
+};
+
+struct snp_derived_key_resp {
+ /* response data, see SEV-SNP spec for the format */
+ __u8 data[64];
+};
+
struct snp_guest_request_ioctl {
/* message version number (must be non-zero) */
__u8 msg_version;
@@ -47,4 +61,7 @@ struct snp_guest_request_ioctl {
/* Get SNP attestation report */
#define SNP_GET_REPORT _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x0, struct snp_guest_request_ioctl)

+/* Get a derived key from the root */
+#define SNP_GET_DERIVED_KEY _IOWR(SNP_GUEST_REQ_IOC_TYPE, 0x1, struct snp_guest_request_ioctl)
+
#endif /* __UAPI_LINUX_SEV_GUEST_H_ */
--
2.25.1


2022-02-09 19:14:47

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 45/45] virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement

From: Michael Roth <[email protected]>

Update the documentation with details of SEV-SNP CPUID enforcement and
what sort of assurances it provides to guests.

Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
Documentation/virt/coco/sevguest.rst | 29 ++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 0f352056572d..48d66e10305b 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -118,6 +118,35 @@ be updated with the expected value.

See GHCB specification for further detail on how to parse the certificate blob.

+3. SEV-SNP CPUID Enforcement
+============================
+
+SEV-SNP guests can access a special page that contains a table of CPUID values
+that have been validated by the PSP as part of the SNP_LAUNCH_UPDATE firmware
+command. It provides the following assurances regarding the validity of CPUID
+values:
+
+ - Its address is obtained via bootloader/firmware (via CC blob), and those
+ binaries will be measured as part of the SEV-SNP attestation report.
+ - Its initial state will be encrypted/pvalidated, so attempts to modify
+ it during run-time will result in garbage being written, or #VC exceptions
+ being generated due to changes in validation state if the hypervisor tries
+ to swap the backing page.
+ - Attempts to bypass PSP checks by the hypervisor by using a normal page, or
+ a non-CPUID encrypted page will change the measurement provided by the
+ SEV-SNP attestation report.
+ - The CPUID page contents are *not* measured, but attempts to modify the
+ expected contents of a CPUID page as part of guest initialization will be
+ gated by the PSP CPUID enforcement policy checks performed on the page
+ during SNP_LAUNCH_UPDATE, and noticeable later if the guest owner
+ implements their own checks of the CPUID values.
+
+It is important to note that this last assurance is only useful if the kernel
+has taken care to make use of the SEV-SNP CPUID table throughout all stages of boot.
+Otherwise, guest owner attestation provides no assurance that the kernel wasn't
+fed incorrect values at some point during boot.
+
+
Reference
---------

--
2.25.1


2022-02-09 19:17:10

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 40/45] x86/sev: Provide support for SNP guest request NAEs

Version 2 of the GHCB specification provides the SNP_GUEST_REQUEST and
SNP_EXT_GUEST_REQUEST NAE events that an SNP guest can use to communicate
with the PSP.

Also add a snp_issue_guest_request() helper that drivers and other
subsystems can use to issue requests to the PSP.

See SEV-SNP firmware and GHCB spec for more details.
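
A rough sketch of how a caller (e.g. an attestation driver) might use the
helper; req_page and resp_page are hypothetical pages that have already
been shared with the hypervisor:

  struct snp_req_data input = {};
  unsigned long fw_err = 0;
  int rc;

  input.req_gpa  = __pa(req_page);
  input.resp_gpa = __pa(resp_page);

  rc = snp_issue_guest_request(SVM_VMGEXIT_GUEST_REQUEST, &input, &fw_err);
  if (rc)
          pr_err("SNP guest request failed: rc=%d fw_err=0x%lx\n", rc, fw_err);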

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 3 ++
arch/x86/include/asm/sev.h | 14 ++++++++
arch/x86/include/uapi/asm/svm.h | 4 +++
arch/x86/kernel/sev.c | 55 +++++++++++++++++++++++++++++++
4 files changed, 76 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 0759af9b1acf..b8357d6ecd47 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -128,6 +128,9 @@ struct snp_psc_desc {
struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
} __packed;

+/* Guest message request error code */
+#define SNP_GUEST_REQ_INVALID_LEN BIT_ULL(32)
+
#define GHCB_MSR_TERM_REQ 0x100
#define GHCB_MSR_TERM_REASON_SET_POS 12
#define GHCB_MSR_TERM_REASON_SET_MASK 0xf
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 219abb4590f2..9830ee1d6ef0 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -87,6 +87,14 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

+/* SNP Guest message request */
+struct snp_req_data {
+ unsigned long req_gpa;
+ unsigned long resp_gpa;
+ unsigned long data_gpa;
+ unsigned int data_npages;
+};
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -154,6 +162,7 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void snp_abort(void);
+int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -173,6 +182,11 @@ static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npag
static inline void snp_set_wakeup_secondary_cpu(void) { }
static inline bool snp_init(struct boot_params *bp) { return false; }
static inline void snp_abort(void) { }
+static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input,
+ unsigned long *fw_err)
+{
+ return -ENOTTY;
+}
#endif

#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8b4c57baec52..5b8bc2b65a5e 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,8 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
+#define SVM_VMGEXIT_EXT_GUEST_REQUEST 0x80000012
#define SVM_VMGEXIT_AP_CREATION 0x80000013
#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
#define SVM_VMGEXIT_AP_CREATE 1
@@ -225,6 +227,8 @@
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_GUEST_REQUEST, "vmgexit_guest_request" }, \
+ { SVM_VMGEXIT_EXT_GUEST_REQUEST, "vmgexit_ext_guest_request" }, \
{ SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 7bef422b428f..cafced2237f3 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2093,3 +2093,58 @@ static int __init report_cpuid_table(void)
}

arch_initcall(report_cpuid_table);
+
+int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err)
+{
+ struct ghcb_state state;
+ struct es_em_ctxt ctxt;
+ unsigned long flags;
+ struct ghcb *ghcb;
+ int ret;
+
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return -ENODEV;
+
+ /*
+ * __sev_get_ghcb() needs to run with IRQs disabled because it is using
+ * a per-CPU GHCB.
+ */
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+ if (!ghcb) {
+ ret = -EIO;
+ goto e_restore_irq;
+ }
+
+ vc_ghcb_invalidate(ghcb);
+
+ if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST) {
+ ghcb_set_rax(ghcb, input->data_gpa);
+ ghcb_set_rbx(ghcb, input->data_npages);
+ }
+
+ ret = sev_es_ghcb_hv_call(ghcb, true, &ctxt, exit_code, input->req_gpa, input->resp_gpa);
+ if (ret)
+ goto e_put;
+
+ if (ghcb->save.sw_exit_info_2) {
+ /* The number of expected pages is returned in RBX */
+ if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST &&
+ ghcb->save.sw_exit_info_2 == SNP_GUEST_REQ_INVALID_LEN)
+ input->data_npages = ghcb_get_rbx(ghcb);
+
+ if (fw_err)
+ *fw_err = ghcb->save.sw_exit_info_2;
+
+ ret = -EIO;
+ }
+
+e_put:
+ __sev_put_ghcb(&state);
+e_restore_irq:
+ local_irq_restore(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(snp_issue_guest_request);
--
2.25.1


2022-02-09 19:21:24

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 17/45] x86/sev: Register GHCB memory when SEV-SNP is active

The SEV-SNP guest is required by the GHCB spec to register the GHCB's
Guest Physical Address (GPA). This is because the hypervisor may prefer
that a guest use a consistent and/or specific GPA for the GHCB associated
with a vCPU. For more information, see the GHCB specification section
"GHCB GPA Registration".

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev.h | 2 ++
arch/x86/kernel/cpu/common.c | 4 +++
arch/x86/kernel/head64.c | 4 ++-
arch/x86/kernel/sev-shared.c | 2 +-
arch/x86/kernel/sev.c | 47 +++++++++++++++++++++++++++---------
5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index e37451849165..48df02713ee0 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -122,6 +122,7 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)

return rc;
}
+void setup_ghcb(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -130,6 +131,7 @@ static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
+static inline void setup_ghcb(void) { }
#endif

#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 64deb7727d00..2e0dd7f4018e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -59,6 +59,7 @@
#include <asm/cpu_device_id.h>
#include <asm/uv/uv.h>
#include <asm/sigframe.h>
+#include <asm/sev.h>

#include "cpu.h"

@@ -2067,6 +2068,9 @@ void cpu_init_exception_handling(void)

load_TR_desc();

+ /* The GHCB needs to be set up to handle #VC exceptions. */
+ setup_ghcb();
+
/* Finally load the IDT */
load_current_idt();
}
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 66363f51a3ad..8075e91cff2b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -597,8 +597,10 @@ static void startup_64_load_idt(unsigned long physbase)
void early_setup_idt(void)
{
/* VMM Communication Exception */
- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ setup_ghcb();
set_bringup_idt_handler(bringup_idt_table, X86_TRAP_VC, vc_boot_ghcb);
+ }

bringup_idt_descr.address = (unsigned long)bringup_idt_table;
native_load_idt(&bringup_idt_descr);
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index e9ff13cd90b0..3aaef1a18ffe 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -68,7 +68,7 @@ static u64 get_hv_features(void)
return GHCB_MSR_HV_FT_RESP_VAL(val);
}

-static void __maybe_unused snp_register_ghcb_early(unsigned long paddr)
+static void snp_register_ghcb_early(unsigned long paddr)
{
unsigned long pfn = paddr >> PAGE_SHIFT;
u64 val;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index cb20fb0c608e..cc382c4f89ef 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -41,7 +41,7 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
* Needs to be in the .data section because we need it NULL before bss is
* cleared
*/
-static struct ghcb __initdata *boot_ghcb;
+static struct ghcb *boot_ghcb __section(".data");

/* Bitmap of SEV features supported by the hypervisor */
static u64 sev_hv_features __ro_after_init;
@@ -647,15 +647,40 @@ static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}

-/*
- * This function runs on the first #VC exception after the kernel
- * switched to virtual addresses.
- */
-static bool __init sev_es_setup_ghcb(void)
+static void snp_register_per_cpu_ghcb(void)
{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ snp_register_ghcb_early(__pa(ghcb));
+}
+
+void setup_ghcb(void)
+{
+ if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
+ return;
+
/* First make sure the hypervisor talks a supported protocol. */
if (!sev_es_negotiate_protocol())
- return false;
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+ /*
+ * Check whether the runtime #VC exception handler is active.
+ * The runtime exception handler uses the per-CPU GHCB page, and
+ * the GHCB page is set up by sev_es_init_vc_handling().
+ *
+ * If SNP is active, then register the per-CPU GHCB page so that
+ * the runtime exception handler can use it.
+ */
+ if (initial_vc_handler == (unsigned long)kernel_exc_vmm_communication) {
+ if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ snp_register_per_cpu_ghcb();
+
+ return;
+ }

/*
* Clear the boot_ghcb. The first exception comes in before the bss
@@ -666,7 +691,9 @@ static bool __init sev_es_setup_ghcb(void)
/* Alright - Make the boot-ghcb public */
boot_ghcb = &boot_ghcb_page;

- return true;
+ /* SNP guests require the GHCB GPA to be registered. */
+ if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ snp_register_ghcb_early(__pa(&boot_ghcb_page));
}

#ifdef CONFIG_HOTPLUG_CPU
@@ -1397,10 +1424,6 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
struct es_em_ctxt ctxt;
enum es_result result;

- /* Do initial setup or terminate the guest */
- if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-
vc_ghcb_invalidate(boot_ghcb);

result = vc_init_em_ctxt(&ctxt, regs, exit_code);
--
2.25.1


2022-02-09 19:22:43

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 24/45] x86/compressed/acpi: Move EFI detection to helper

From: Michael Roth <[email protected]>

Future patches for SEV-SNP-validated CPUID will also require early
parsing of the EFI configuration. Incrementally move the related
code into a set of helpers that can be re-used for that purpose.
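
The new efi_get_type() helper is then used as in this sketch, mirroring
the kexec check below:

  enum efi_type et = efi_get_type(boot_params);

  if (et != EFI_TYPE_64) {
          debug_putstr("Unexpected kexec EFI environment (expected 64-bit EFI).\n");
          return 0;
  }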

Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/acpi.c | 28 +++++++----------
arch/x86/boot/compressed/efi.c | 50 +++++++++++++++++++++++++++++++
arch/x86/boot/compressed/misc.h | 16 ++++++++++
4 files changed, 77 insertions(+), 18 deletions(-)
create mode 100644 arch/x86/boot/compressed/efi.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 6115274fe10f..e69c3d2e0628 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -103,6 +103,7 @@ endif
vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o

vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
+vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a

$(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 8bcbcee54aa1..db6c561920f0 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -87,7 +87,7 @@ static acpi_physical_address kexec_get_rsdp_addr(void)
efi_system_table_64_t *systab;
struct efi_setup_data *esd;
struct efi_info *ei;
- char *sig;
+ enum efi_type et;

esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
if (!esd)
@@ -98,10 +98,9 @@ static acpi_physical_address kexec_get_rsdp_addr(void)
return 0;
}

- ei = &boot_params->efi_info;
- sig = (char *)&ei->efi_loader_signature;
- if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
- debug_putstr("Wrong kexec EFI loader signature.\n");
+ et = efi_get_type(boot_params);
+ if (et != EFI_TYPE_64) {
+ debug_putstr("Unexpected kexec EFI environment (expected 64-bit EFI).\n");
return 0;
}

@@ -122,29 +121,22 @@ static acpi_physical_address efi_get_rsdp_addr(void)
unsigned long systab, config_tables;
unsigned int nr_tables;
struct efi_info *ei;
+ enum efi_type et;
bool efi_64;
- char *sig;
-
- ei = &boot_params->efi_info;
- sig = (char *)&ei->efi_loader_signature;

- if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+ et = efi_get_type(boot_params);
+ if (et == EFI_TYPE_64)
efi_64 = true;
- } else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4)) {
+ else if (et == EFI_TYPE_32)
efi_64 = false;
- } else {
- debug_putstr("Wrong EFI loader signature.\n");
+ else
return 0;
- }

/* Get systab from boot params. */
+ ei = &boot_params->efi_info;
#ifdef CONFIG_X86_64
systab = ei->efi_systab | ((__u64)ei->efi_systab_hi << 32);
#else
- if (ei->efi_systab_hi || ei->efi_memmap_hi) {
- debug_putstr("Error getting RSDP address: EFI system table located above 4GB.\n");
- return 0;
- }
systab = ei->efi_systab;
#endif
if (!systab)
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
new file mode 100644
index 000000000000..bad0ce3e2ef6
--- /dev/null
+++ b/arch/x86/boot/compressed/efi.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Helpers for early access to EFI configuration table.
+ *
+ * Originally derived from arch/x86/boot/compressed/acpi.c
+ */
+
+#include "misc.h"
+#include <linux/efi.h>
+#include <asm/efi.h>
+
+/**
+ * efi_get_type - Given a pointer to boot_params, determine the type of EFI environment.
+ *
+ * @bp: pointer to boot_params
+ *
+ * Return: EFI_TYPE_{32,64} for valid EFI environments, EFI_TYPE_NONE otherwise.
+ */
+enum efi_type efi_get_type(struct boot_params *bp)
+{
+ struct efi_info *ei;
+ enum efi_type et;
+ const char *sig;
+
+ ei = &bp->efi_info;
+ sig = (char *)&ei->efi_loader_signature;
+
+ if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+ et = EFI_TYPE_64;
+ } else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4)) {
+ et = EFI_TYPE_32;
+ } else {
+ debug_putstr("No EFI environment detected.\n");
+ et = EFI_TYPE_NONE;
+ }
+
+#ifndef CONFIG_X86_64
+ /*
+ * Existing callers like acpi.c treat this case as an indicator to
+ * fall-through to non-EFI, rather than an error, so maintain that
+ * functionality here as well.
+ */
+ if (ei->efi_systab_hi || ei->efi_memmap_hi) {
+ debug_putstr("EFI system table is located above 4GB and cannot be accessed.\n");
+ et = EFI_TYPE_NONE;
+ }
+#endif
+
+ return et;
+}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 01cc13c12059..fede1afa39e9 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -176,4 +176,20 @@ void boot_stage2_vc(void);

unsigned long sev_verify_cbit(unsigned long cr3);

+enum efi_type {
+ EFI_TYPE_64,
+ EFI_TYPE_32,
+ EFI_TYPE_NONE,
+};
+
+#ifdef CONFIG_EFI
+/* helpers for early EFI config table access */
+enum efi_type efi_get_type(struct boot_params *bp);
+#else
+static inline enum efi_type efi_get_type(struct boot_params *bp)
+{
+ return EFI_TYPE_NONE;
+}
+#endif /* CONFIG_EFI */
+
#endif /* BOOT_COMPRESSED_MISC_H */
--
2.25.1


2022-02-09 19:23:57

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 10/45] x86/sev: Define the Linux specific guest termination reasons

The GHCB specification defines reason codes for reason set 0. The reason
codes defined in set 0 do not cover all possible causes for a guest to
request termination.

Reason sets 1 to 255 are reserved for vendor-specific codes. Reserve
reason set 1 for the Linux guest. Define the error codes for reason set 1
so that one can have meaningful termination reasons and thus better guest
failure diagnosis.

While at it, change sev_es_terminate() to accept a reason-set parameter.
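
To illustrate the encoding, a Linux-specific termination request composes
the GHCB MSR value roughly like this, using the macros from this patch:

  u64 val = GHCB_MSR_TERM_REQ;

  /* Reason set 1 (Linux), reason code GHCB_TERM_PSC. */
  val |= GHCB_SEV_TERM_REASON(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);

  sev_es_wr_ghcb_msr(val);
  VMGEXIT();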

Reviewed-by: Venu Busireddy <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 6 +++---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kernel/sev-shared.c | 11 ++++-------
arch/x86/kernel/sev.c | 4 ++--
4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 27ccd5a5ff60..56e941d5e092 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -119,7 +119,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
static bool early_setup_sev_es(void)
{
if (!sev_es_negotiate_protocol())
- sev_es_terminate(GHCB_SEV_ES_PROT_UNSUPPORTED);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_PROT_UNSUPPORTED);

if (set_page_decrypted((unsigned long)&boot_ghcb_page))
return false;
@@ -172,7 +172,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
enum es_result result;

if (!boot_ghcb && !early_setup_sev_es())
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);

vc_ghcb_invalidate(boot_ghcb);
result = vc_init_em_ctxt(&ctxt, regs, exit_code);
@@ -199,7 +199,7 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
if (result == ES_OK)
vc_finish_insn(&ctxt);
else if (result != ES_RETRY)
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
}

void sev_enable(struct boot_params *bp)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1b2fd32b42fe..94f0ea574049 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -73,9 +73,17 @@
/* GHCBData[23:16] */ \
((((u64)reason_val) & 0xff) << 16))

+/* Error codes from reason set 0 */
+#define SEV_TERM_SET_GEN 0
#define GHCB_SEV_ES_GEN_REQ 0
#define GHCB_SEV_ES_PROT_UNSUPPORTED 1

+/* Linux-specific reason codes (used with reason set 1) */
+#define SEV_TERM_SET_LINUX 1
+#define GHCB_TERM_REGISTER 0 /* GHCB GPA registration failure */
+#define GHCB_TERM_PSC 1 /* Page State Change failure */
+#define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */
+
#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)

/*
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index ce987688bbc0..2abf8a7d75e5 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -24,15 +24,12 @@ static bool __init sev_es_check_cpu_features(void)
return true;
}

-static void __noreturn sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;

- /*
- * Tell the hypervisor what went wrong - only reason-set 0 is
- * currently supported.
- */
- val |= GHCB_SEV_TERM_REASON(0, reason);
+ /* Tell the hypervisor what went wrong. */
+ val |= GHCB_SEV_TERM_REASON(set, reason);

/* Request Guest Termination from Hypervisor */
sev_es_wr_ghcb_msr(val);
@@ -221,7 +218,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)

fail:
/* Terminate the guest */
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
}

static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e6d316a01fdd..19ad09712902 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1337,7 +1337,7 @@ DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
show_regs(regs);

/* Ask hypervisor to sev_es_terminate */
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);

/* If that fails and we get here - just panic */
panic("Returned from Terminate-Request to Hypervisor\n");
@@ -1385,7 +1385,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)

/* Do initial setup or terminate the guest */
if (unlikely(boot_ghcb == NULL && !sev_es_setup_ghcb()))
- sev_es_terminate(GHCB_SEV_ES_GEN_REQ);
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);

vc_ghcb_invalidate(boot_ghcb);

--
2.25.1


2022-02-09 19:29:51

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 28/45] x86/compressed/acpi: Move EFI kexec handling into common code

From: Michael Roth <[email protected]>

Future patches for SEV-SNP-validated CPUID will also require early
parsing of the EFI configuration. Incrementally move the related code
into a set of helpers that can be re-used for that purpose.

In this instance, the current acpi.c kexec handling is mainly used to
get the alternative EFI config table address provided by kexec via a
setup_data entry of type SETUP_EFI. If not present, the code then falls
back to the normal EFI config table address provided by the EFI system table.
This would need to be done by all call-sites attempting to access the
EFI config table, so just have efi_get_conf_table() handle that
automatically.
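
With that, a hypothetical call-site needs no kexec awareness at all; a
sketch using the prototype added earlier in this series:

  unsigned long cfg_tbl_pa = 0;
  unsigned int cfg_tbl_len = 0;

  /* Works for both normal EFI boot and kexec'd kernels. */
  if (efi_get_conf_table(boot_params, &cfg_tbl_pa, &cfg_tbl_len))
          return 0;       /* no usable EFI configuration table */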

Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/acpi.c | 59 ---------------------------------
arch/x86/boot/compressed/efi.c | 46 ++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 60 deletions(-)

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index b0c1dffc5510..64b172dabd5c 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -47,57 +47,6 @@ __efi_get_rsdp_addr(unsigned long cfg_tbl_pa, unsigned int cfg_tbl_len)
return 0;
}

-/* EFI/kexec support is 64-bit only. */
-#ifdef CONFIG_X86_64
-static struct efi_setup_data *get_kexec_setup_data_addr(void)
-{
- struct setup_data *data;
- u64 pa_data;
-
- pa_data = boot_params->hdr.setup_data;
- while (pa_data) {
- data = (struct setup_data *)pa_data;
- if (data->type == SETUP_EFI)
- return (struct efi_setup_data *)(pa_data + sizeof(struct setup_data));
-
- pa_data = data->next;
- }
- return NULL;
-}
-
-static acpi_physical_address kexec_get_rsdp_addr(void)
-{
- efi_system_table_64_t *systab;
- struct efi_setup_data *esd;
- struct efi_info *ei;
- enum efi_type et;
-
- esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
- if (!esd)
- return 0;
-
- if (!esd->tables) {
- debug_putstr("Wrong kexec SETUP_EFI data.\n");
- return 0;
- }
-
- et = efi_get_type(boot_params);
- if (et != EFI_TYPE_64) {
- debug_putstr("Unexpected kexec EFI environment (expected 64-bit EFI).\n");
- return 0;
- }
-
- /* Get systab from boot params. */
- systab = (efi_system_table_64_t *)efi_get_system_table(boot_params);
- if (!systab)
- error("EFI system table not found in kexec boot_params.");
-
- return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables);
-}
-#else
-static acpi_physical_address kexec_get_rsdp_addr(void) { return 0; }
-#endif /* CONFIG_X86_64 */
-
static acpi_physical_address efi_get_rsdp_addr(void)
{
#ifdef CONFIG_EFI
@@ -210,14 +159,6 @@ acpi_physical_address get_rsdp_addr(void)

pa = boot_params->acpi_rsdp_addr;

- /*
- * Try to get EFI data from setup_data. This can happen when we're a
- * kexec'ed kernel and kexec(1) has passed all the required EFI info to
- * us.
- */
- if (!pa)
- pa = kexec_get_rsdp_addr();
-
if (!pa)
pa = efi_get_rsdp_addr();

diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
index f8d26db22659..ff2e2eaba1d4 100644
--- a/arch/x86/boot/compressed/efi.c
+++ b/arch/x86/boot/compressed/efi.c
@@ -78,6 +78,46 @@ unsigned long efi_get_system_table(struct boot_params *bp)
return sys_tbl_pa;
}

+/*
+ * The EFI config table address changes to a virtual address after boot, which
+ * may not be accessible to the kexec'd kernel. To address this, kexec provides
+ * the initial physical address via a struct setup_data entry, which is
+ * checked for here, along with some sanity checks.
+ */
+static struct efi_setup_data *get_kexec_setup_data(struct boot_params *bp,
+ enum efi_type et)
+{
+#ifdef CONFIG_X86_64
+ struct efi_setup_data *esd = NULL;
+ struct setup_data *data;
+ u64 pa_data;
+
+ pa_data = bp->hdr.setup_data;
+ while (pa_data) {
+ data = (struct setup_data *)pa_data;
+ if (data->type == SETUP_EFI) {
+ esd = (struct efi_setup_data *)(pa_data + sizeof(struct setup_data));
+ break;
+ }
+
+ pa_data = data->next;
+ }
+
+ /*
+ * The original ACPI code falls back to attempting normal EFI boot in these
+ * cases, so maintain that behavior by indicating a non-kexec environment
+ * to the caller, but print a debug message.
+ */
+ if (esd && !esd->tables) {
+ debug_putstr("kexec EFI environment missing valid configuration table.\n");
+ return NULL;
+ }
+
+ return esd;
+#endif
+ return NULL;
+}
+
/**
* efi_get_conf_table - Given a pointer to boot_params, locate and return the physical
* address of EFI configuration table.
@@ -106,8 +146,12 @@ int efi_get_conf_table(struct boot_params *bp, unsigned long *cfg_tbl_pa,
et = efi_get_type(bp);
if (et == EFI_TYPE_64) {
efi_system_table_64_t *stbl = (efi_system_table_64_t *)sys_tbl_pa;
+ struct efi_setup_data *esd;

- *cfg_tbl_pa = stbl->tables;
+ /* kexec provides an alternative EFI conf table, check for it. */
+ esd = get_kexec_setup_data(bp, et);
+
+ *cfg_tbl_pa = esd ? esd->tables : stbl->tables;
*cfg_tbl_len = stbl->nr_tables;
} else if (et == EFI_TYPE_32) {
efi_system_table_32_t *stbl = (efi_system_table_32_t *)sys_tbl_pa;
--
2.25.1


2022-02-09 19:53:10

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 38/45] x86/sev: Add SEV-SNP feature detection/setup

From: Michael Roth <[email protected]>

Initial/preliminary detection of SEV-SNP is done via the Confidential
Computing blob. Check for it prior to the normal SEV/SME feature
initialization, and add some sanity checks to confirm it agrees with
SEV-SNP CPUID/MSR bits.
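
For context, the MSR_AMD64_SEV status MSR carries a separate enable bit
for each feature level, so the sanity check amounts to cross-checking the
blob-based detection against the SNP bit; a sketch of the relevant bits:

  u64 status = __rdmsr(MSR_AMD64_SEV);
  bool sev    = status & MSR_AMD64_SEV_ENABLED;      /* bit 0 */
  bool sev_es = status & MSR_AMD64_SEV_ES_ENABLED;   /* bit 1 */
  bool snp    = status & MSR_AMD64_SEV_SNP_ENABLED;  /* bit 2 */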

Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/sev.c | 27 -------------
arch/x86/include/asm/sev.h | 2 +
arch/x86/kernel/sev-shared.c | 27 +++++++++++++
arch/x86/kernel/sev.c | 64 ++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt_identity.c | 8 ++++
5 files changed, 101 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 2a48f3a3f372..2911137bf37f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -352,33 +352,6 @@ static struct cc_blob_sev_info *find_cc_blob_efi(struct boot_params *bp)
EFI_CC_BLOB_GUID);
}

-struct cc_setup_data {
- struct setup_data header;
- u32 cc_blob_address;
-};
-
-/*
- * Search for a Confidential Computing blob passed in as a setup_data entry
- * via the Linux Boot Protocol.
- */
-static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
-{
- struct cc_setup_data *sd = NULL;
- struct setup_data *hdr;
-
- hdr = (struct setup_data *)bp->hdr.setup_data;
-
- while (hdr) {
- if (hdr->type == SETUP_CC_BLOB) {
- sd = (struct cc_setup_data *)hdr;
- return (struct cc_blob_sev_info *)(unsigned long)sd->cc_blob_address;
- }
- hdr = (struct setup_data *)hdr->next;
- }
-
- return NULL;
-}
-
/*
* Initial set up of SNP relies on information provided by the
* Confidential Computing blob, which can be passed to the boot kernel
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 4e3909042001..219abb4590f2 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -153,6 +153,7 @@ void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
+void snp_abort(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -171,6 +172,7 @@ static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npage
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_wakeup_secondary_cpu(void) { }
static inline bool snp_init(struct boot_params *bp) { return false; }
+static inline void snp_abort(void) { }
#endif

#endif
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 0f1375164ff0..a7a1c0fb298e 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -937,3 +937,30 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,

return ES_OK;
}
+
+struct cc_setup_data {
+ struct setup_data header;
+ u32 cc_blob_address;
+};
+
+/*
+ * Search for a Confidential Computing blob passed in as a setup_data entry
+ * via the Linux Boot Protocol.
+ */
+static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
+{
+ struct cc_setup_data *sd = NULL;
+ struct setup_data *hdr;
+
+ hdr = (struct setup_data *)bp->hdr.setup_data;
+
+ while (hdr) {
+ if (hdr->type == SETUP_CC_BLOB) {
+ sd = (struct cc_setup_data *)hdr;
+ return (struct cc_blob_sev_info *)(unsigned long)sd->cc_blob_address;
+ }
+ hdr = (struct setup_data *)hdr->next;
+ }
+
+ return NULL;
+}
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b876b1d989eb..a79ddacf0478 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1984,3 +1984,67 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
while (true)
halt();
}
+
+/*
+ * Initial set up of SNP relies on information provided by the
+ * Confidential Computing blob, which can be passed to the kernel
+ * in the following ways, depending on how it is booted:
+ *
+ * - when booted via the boot/decompress kernel:
+ * - via boot_params
+ *
+ * - when booted directly by firmware/bootloader (e.g. CONFIG_PVH):
+ * - via a setup_data entry, as defined by the Linux Boot Protocol
+ *
+ * Scan for the blob in that order.
+ */
+static __init struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info;
+
+ /* Boot kernel would have passed the CC blob via boot_params. */
+ if (bp->cc_blob_address) {
+ cc_info = (struct cc_blob_sev_info *)(unsigned long)bp->cc_blob_address;
+ goto found_cc_info;
+ }
+
+ /*
+ * If kernel was booted directly, without the use of the
+ * boot/decompression kernel, the CC blob may have been passed via
+ * setup_data instead.
+ */
+ cc_info = find_cc_blob_setup_data(bp);
+ if (!cc_info)
+ return NULL;
+
+found_cc_info:
+ if (cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
+ snp_abort();
+
+ return cc_info;
+}
+
+bool __init snp_init(struct boot_params *bp)
+{
+ struct cc_blob_sev_info *cc_info;
+
+ if (!bp)
+ return false;
+
+ cc_info = find_cc_blob(bp);
+ if (!cc_info)
+ return false;
+
+ /*
+ * The CC blob will be used later to access the secrets page. Cache
+ * it here like the boot kernel does.
+ */
+ bp->cc_blob_address = (u32)(unsigned long)cc_info;
+
+ return true;
+}
+
+void __init snp_abort(void)
+{
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+}
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 3f0abb403340..2f723e106ed3 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -44,6 +44,7 @@
#include <asm/setup.h>
#include <asm/sections.h>
#include <asm/cmdline.h>
+#include <asm/sev.h>

#include "mm_internal.h"

@@ -508,8 +509,11 @@ void __init sme_enable(struct boot_params *bp)
bool active_by_default;
unsigned long me_mask;
char buffer[16];
+ bool snp;
u64 msr;

+ snp = snp_init(bp);
+
/* Check for the SME/SEV support leaf */
eax = 0x80000000;
ecx = 0;
@@ -541,6 +545,10 @@ void __init sme_enable(struct boot_params *bp)
sev_status = __rdmsr(MSR_AMD64_SEV);
feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;

+ /* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
+ if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ snp_abort();
+
/* Check if memory encryption is enabled */
if (feature_mask == AMD_SME_BIT) {
/*
--
2.25.1


2022-02-09 20:10:40

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 15/45] x86/compressed: Add helper for validating pages in the decompression stage

Many of the integrity guarantees of SEV-SNP are enforced through the
Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
particular page of DRAM should be mapped. The VMs can request the
hypervisor to add pages in the RMP table via the Page State Change VMGEXIT
defined in the GHCB specification. Inside each RMP entry is a Validated
flag; this flag is automatically cleared to 0 by the CPU hardware when a
new RMP entry is created for a guest. Each VM page can be either
validated or invalidated, as indicated by the Validated flag in the RMP
entry. A memory access to a private page that is not validated generates
a #VC exception. A VM must use the PVALIDATE instruction to validate a
private page before using it.

To maintain the security guarantee of SEV-SNP guests, when transitioning
pages from private to shared, the guest must invalidate the pages before
asking the hypervisor to change the page state to shared in the RMP table.

After the pages are mapped private in the page table, the guest must issue
a page state change VMGEXIT to make the pages private in the RMP table and
validate them.

On boot, the BIOS should have validated all of system memory. During the
kernel decompression stage, early_setup_ghcb() uses set_page_decrypted()
to make the GHCB page shared (i.e. clear the encryption attribute), and
on exiting decompression it calls set_page_encrypted() to make the page
private again.

Add snp_set_page_{private,shared}() helpers that are used by
set_page_{decrypted,encrypted}() to change the page state in the RMP
table.
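
For reference, the pvalidate() helper used by these routines, added
earlier in the series, is essentially a thin wrapper around the PVALIDATE
instruction; roughly:

  static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
  {
          bool no_rmpupdate;
          int rc;

          /* Encoded opcode: the "pvalidate" mnemonic needs binutils 2.36+. */
          asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
                       CC_SET(c)
                       : CC_OUT(c) (no_rmpupdate), "=a"(rc)
                       : "a"(vaddr), "c"(rmp_psize), "d"(validate)
                       : "memory", "cc");

          /* CF set means no RMP entry was updated. */
          if (no_rmpupdate)
                  return PVALIDATE_FAIL_NOUPDATE;

          return rc;
  }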

Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 18 +++++++++-
arch/x86/boot/compressed/misc.h | 4 +++
arch/x86/boot/compressed/sev.c | 46 +++++++++++++++++++++++++
arch/x86/include/asm/sev-common.h | 26 ++++++++++++++
4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..3d566964b829 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -275,15 +275,31 @@ static int set_clr_page_flags(struct x86_mapping_info *info,
* Changing encryption attributes of a page requires to flush it from
* the caches.
*/
- if ((set | clr) & _PAGE_ENC)
+ if ((set | clr) & _PAGE_ENC) {
clflush_page(address);

+ /*
+ * If the encryption attribute is being cleared, then change
+ * the page state to shared in the RMP table.
+ */
+ if (clr)
+ snp_set_page_shared(__pa(address & PAGE_MASK));
+ }
+
/* Update PTE */
pte = *ptep;
pte = pte_set_flags(pte, set);
pte = pte_clear_flags(pte, clr);
set_pte(ptep, pte);

+ /*
+ * If the encryption attribute is being set, then change the page state to
+ * private in the RMP entry. The page state change must be done after the PTE
+ * is updated.
+ */
+ if (set & _PAGE_ENC)
+ snp_set_page_private(__pa(address & PAGE_MASK));
+
/* Flush TLB after changing encryption attribute */
write_cr3(top_level_pgt);

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 23e0e395084a..01cc13c12059 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -124,6 +124,8 @@ static inline void console_init(void)
void sev_enable(struct boot_params *bp);
void sev_es_shutdown_ghcb(void);
extern bool sev_es_check_ghcb_fault(unsigned long address);
+void snp_set_page_private(unsigned long paddr);
+void snp_set_page_shared(unsigned long paddr);
#else
static inline void sev_enable(struct boot_params *bp) { }
static inline void sev_es_shutdown_ghcb(void) { }
@@ -131,6 +133,8 @@ static inline bool sev_es_check_ghcb_fault(unsigned long address)
{
return false;
}
+static inline void snp_set_page_private(unsigned long paddr) { }
+static inline void snp_set_page_shared(unsigned long paddr) { }
#endif

/* acpi.c */
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 84e7d45afa9e..23978d858297 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -116,6 +116,52 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"

+static inline bool sev_snp_enabled(void)
+{
+ return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+}
+
+static void __page_state_change(unsigned long paddr, enum psc_op op)
+{
+ u64 val;
+
+ if (!sev_snp_enabled())
+ return;
+
+ /*
+ * If private -> shared then invalidate the page before requesting the
+ * state change in the RMP table.
+ */
+ if (op == SNP_PAGE_STATE_SHARED && pvalidate(paddr, RMP_PG_SIZE_4K, 0))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+
+ /* Issue VMGEXIT to change the page state in RMP table. */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ /* Read the response of the VMGEXIT. */
+ val = sev_es_rd_ghcb_msr();
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ /*
+ * Now that page state is changed in the RMP table, validate it so that it is
+ * consistent with the RMP entry.
+ */
+ if (op == SNP_PAGE_STATE_PRIVATE && pvalidate(paddr, RMP_PG_SIZE_4K, 1))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+}
+
+void snp_set_page_private(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+}
+
+void snp_set_page_shared(unsigned long paddr)
+{
+ __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+}
+
static bool early_setup_ghcb(void)
{
if (set_page_decrypted((unsigned long)&boot_ghcb_page))
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 7ac5842e32b6..fe7fe16e5fd5 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -57,6 +57,32 @@
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007

+/*
+ * SNP Page State Change Operation
+ *
+ * GHCBData[55:52] - Page operation:
+ * 0x0001 Page assignment, Private
+ * 0x0002 Page assignment, Shared
+ */
+enum psc_op {
+ SNP_PAGE_STATE_PRIVATE = 1,
+ SNP_PAGE_STATE_SHARED,
+};
+
+#define GHCB_MSR_PSC_REQ 0x014
+#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
+ /* GHCBData[55:52] */ \
+ (((u64)((op) & 0xf) << 52) | \
+ /* GHCBData[51:12] */ \
+ ((u64)((gfn) & GENMASK_ULL(39, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_PSC_REQ)
+
+#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_RESP_VAL(val) \
+ /* GHCBData[63:32] */ \
+ (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
+
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
--
2.25.1


2022-02-09 20:16:21

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 05/45] x86/boot: Introduce helpers for MSR reads/writes

From: Michael Roth <[email protected]>

The current set of helpers used throughout the run-time kernel has
dependencies on code/facilities outside of the boot kernel, so there
are a number of call-sites throughout the boot kernel where inline
assembly is used instead. More will be added with subsequent patches
that add support for SEV-SNP, so take the opportunity to provide a basic
set of helpers that can be used by the boot kernel to reduce reliance on
inline assembly.

Use a boot_* prefix so that it's clear these helpers are specific to the
boot kernel, avoiding any confusion with the various other MSR read/write
helpers.
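
A typical use in the decompressor then looks like the following sketch,
reading the SEV status MSR as later patches in this series do:

  struct msr m;
  bool sev_enabled;

  boot_rdmsr(MSR_AMD64_SEV, &m);
  sev_enabled = m.q & MSR_AMD64_SEV_ENABLED;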

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/boot/msr.h | 28 ++++++++++++++++++++++++++++
arch/x86/include/asm/msr.h | 11 +----------
arch/x86/include/asm/shared/msr.h | 15 +++++++++++++++
3 files changed, 44 insertions(+), 10 deletions(-)
create mode 100644 arch/x86/boot/msr.h
create mode 100644 arch/x86/include/asm/shared/msr.h

diff --git a/arch/x86/boot/msr.h b/arch/x86/boot/msr.h
new file mode 100644
index 000000000000..b6bb2161da27
--- /dev/null
+++ b/arch/x86/boot/msr.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Helpers/definitions related to MSR access.
+ */
+
+#ifndef BOOT_MSR_H
+#define BOOT_MSR_H
+
+#include <asm/shared/msr.h>
+
+/*
+ * The kernel proper already defines rdmsr()/wrmsr(), but they are not for the
+ * boot kernel since they rely on tracepoint/exception handling infrastructure
+ * that's not available here, hence these boot_{rd,wr}msr helpers which serve
+ * the singular purpose of wrapping the RDMSR/WRMSR instructions to reduce the
+ * need for inline assembly calls throughout the boot kernel code.
+ */
+static inline void boot_rdmsr(unsigned int msr, struct msr *m)
+{
+ asm volatile("rdmsr" : "=a" (m->l), "=d" (m->h) : "c" (msr));
+}
+
+static inline void boot_wrmsr(unsigned int msr, const struct msr *m)
+{
+ asm volatile("wrmsr" : : "c" (msr), "a"(m->l), "d" (m->h) : "memory");
+}
+
+#endif /* BOOT_MSR_H */
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index d42e6c6b47b1..65ec1965cd28 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -10,16 +10,7 @@
#include <asm/errno.h>
#include <asm/cpumask.h>
#include <uapi/asm/msr.h>
-
-struct msr {
- union {
- struct {
- u32 l;
- u32 h;
- };
- u64 q;
- };
-};
+#include <asm/shared/msr.h>

struct msr_info {
u32 msr_no;
diff --git a/arch/x86/include/asm/shared/msr.h b/arch/x86/include/asm/shared/msr.h
new file mode 100644
index 000000000000..1e6ec10b3a15
--- /dev/null
+++ b/arch/x86/include/asm/shared/msr.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_SHARED_MSR_H
+#define _ASM_X86_SHARED_MSR_H
+
+struct msr {
+ union {
+ struct {
+ u32 l;
+ u32 h;
+ };
+ u64 q;
+ };
+};
+
+#endif /* _ASM_X86_SHARED_MSR_H */
--
2.25.1


2022-02-09 20:36:30

by Brijesh Singh

[permalink] [raw]
Subject: [PATCH v10 02/45] KVM: SVM: Create a separate mapping for the SEV-ES save area

From: Tom Lendacky <[email protected]>

The save area for SEV-ES/SEV-SNP guests, as used by the hardware, is
different from the save area of a non-SEV-ES/SEV-SNP guest.

This is the first step in defining the multiple save areas to keep them
separate and ensuring proper operation amongst the different types of
guests. Create an SEV-ES/SEV-SNP save area and adjust usage to the new
save area definition where needed.

Reviewed-by: Venu Busireddy <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/svm.h | 87 +++++++++++++++++++++++++++++---------
arch/x86/kvm/svm/sev.c | 24 +++++------
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 80 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 7c9cf4f3c164..3ce2e575a2de 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -227,6 +227,7 @@ struct vmcb_seg {
u64 base;
} __packed;

+/* Save area definition for legacy and SEV-MEM guests */
struct vmcb_save_area {
struct vmcb_seg es;
struct vmcb_seg cs;
@@ -243,8 +244,58 @@ struct vmcb_save_area {
u8 cpl;
u8 reserved_2[4];
u64 efer;
+ u8 reserved_3[112];
+ u64 cr4;
+ u64 cr3;
+ u64 cr0;
+ u64 dr7;
+ u64 dr6;
+ u64 rflags;
+ u64 rip;
+ u8 reserved_4[88];
+ u64 rsp;
+ u64 s_cet;
+ u64 ssp;
+ u64 isst_addr;
+ u64 rax;
+ u64 star;
+ u64 lstar;
+ u64 cstar;
+ u64 sfmask;
+ u64 kernel_gs_base;
+ u64 sysenter_cs;
+ u64 sysenter_esp;
+ u64 sysenter_eip;
+ u64 cr2;
+ u8 reserved_5[32];
+ u64 g_pat;
+ u64 dbgctl;
+ u64 br_from;
+ u64 br_to;
+ u64 last_excp_from;
+ u64 last_excp_to;
+ u8 reserved_6[72];
+ u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
+} __packed;
+
+/* Save area definition for SEV-ES and SEV-SNP guests */
+struct sev_es_save_area {
+ struct vmcb_seg es;
+ struct vmcb_seg cs;
+ struct vmcb_seg ss;
+ struct vmcb_seg ds;
+ struct vmcb_seg fs;
+ struct vmcb_seg gs;
+ struct vmcb_seg gdtr;
+ struct vmcb_seg ldtr;
+ struct vmcb_seg idtr;
+ struct vmcb_seg tr;
+ u8 reserved_1[43];
+ u8 cpl;
+ u8 reserved_2[4];
+ u64 efer;
u8 reserved_3[104];
- u64 xss; /* Valid for SEV-ES only */
+ u64 xss;
u64 cr4;
u64 cr3;
u64 cr0;
@@ -272,22 +323,14 @@ struct vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
-
- /*
- * The following part of the save area is valid only for
- * SEV-ES guests when referenced through the GHCB or for
- * saving to the host save area.
- */
- u8 reserved_7[72];
- u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
- u8 reserved_7b[4];
+ u8 reserved_7[80];
u32 pkru;
- u8 reserved_7a[20];
- u64 reserved_8; /* rax already available at 0x01f8 */
+ u8 reserved_9[20];
+ u64 reserved_10; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
u64 rbx;
- u64 reserved_9; /* rsp already available at 0x01d8 */
+ u64 reserved_11; /* rsp already available at 0x01d8 */
u64 rbp;
u64 rsi;
u64 rdi;
@@ -299,23 +342,25 @@ struct vmcb_save_area {
u64 r13;
u64 r14;
u64 r15;
- u8 reserved_10[16];
+ u8 reserved_12[16];
u64 sw_exit_code;
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
u64 sev_features;
- u8 reserved_11[48];
+ u8 reserved_13[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
} __packed;

+#define GHCB_SHARED_BUF_SIZE 2032
+
struct ghcb {
- struct vmcb_save_area save;
- u8 reserved_save[2048 - sizeof(struct vmcb_save_area)];
+ struct sev_es_save_area save;
+ u8 reserved_save[2048 - sizeof(struct sev_es_save_area)];

- u8 shared_buffer[2032];
+ u8 shared_buffer[GHCB_SHARED_BUF_SIZE];

u8 reserved_1[10];
u16 protocol_version; /* negotiated SEV-ES/GHCB protocol version */
@@ -323,13 +368,15 @@ struct ghcb {
} __packed;


-#define EXPECTED_VMCB_SAVE_AREA_SIZE 1032
+#define EXPECTED_VMCB_SAVE_AREA_SIZE 740
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 1024
#define EXPECTED_GHCB_SIZE PAGE_SIZE

static inline void __unused_size_checks(void)
{
BUILD_BUG_ON(sizeof(struct vmcb_save_area) != EXPECTED_VMCB_SAVE_AREA_SIZE);
+ BUILD_BUG_ON(sizeof(struct sev_es_save_area) != EXPECTED_SEV_ES_SAVE_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct vmcb_control_area) != EXPECTED_VMCB_CONTROL_AREA_SIZE);
BUILD_BUG_ON(sizeof(struct ghcb) != EXPECTED_GHCB_SIZE);
}
@@ -399,7 +446,7 @@ struct vmcb {
/* GHCB Accessor functions */

#define GHCB_BITMAP_IDX(field) \
- (offsetof(struct vmcb_save_area, field) / sizeof(u64))
+ (offsetof(struct sev_es_save_area, field) / sizeof(u64))

#define DEFINE_GHCB_ACCESSORS(field) \
static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 17b53457d866..b6245298f805 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -558,12 +558,20 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)

static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
- struct vmcb_save_area *save = &svm->vmcb->save;
+ struct sev_es_save_area *save = svm->sev_es.vmsa;

/* Check some debug related fields before encrypting the VMSA */
- if (svm->vcpu.guest_debug || (save->dr7 & ~DR7_FIXED_1))
+ if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
return -EINVAL;

+ /*
+ * SEV-ES will use a VMSA that is pointed to by the VMCB, not
+ * the traditional VMSA that is part of the VMCB. Copy the
+ * traditional VMSA as it has been built so far (in prep
+ * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
+ */
+ memcpy(save, &svm->vmcb->save, sizeof(svm->vmcb->save));
+
+ /* Sync registers */
save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX];
save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX];
@@ -591,14 +599,6 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;

- /*
- * SEV-ES will use a VMSA that is pointed to by the VMCB, not
- * the traditional VMSA that is part of the VMCB. Copy the
- * traditional VMSA as it has been built so far (in prep
- * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
- */
- memcpy(svm->sev_es.vmsa, save, sizeof(*save));
-
return 0;
}

@@ -2910,7 +2910,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
{
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
- struct vmcb_save_area *hostsa;
+ struct sev_es_save_area *hostsa;

/*
* As an SEV-ES guest, hardware will restore the host state on VMEXIT,
@@ -2920,7 +2920,7 @@ void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
vmsave(__sme_page_pa(sd->save_area));

/* XCR0 is restored on VMEXIT, save the current host value */
- hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+ hostsa = (struct sev_es_save_area *)(page_address(sd->save_area) + 0x400);
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);

/* PKRU is restored on VMEXIT, save the current host value */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 73525353e424..5c626fd0e8e4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -167,7 +167,7 @@ struct svm_nested_state {

struct vcpu_sev_es_state {
/* SEV-ES support */
- struct vmcb_save_area *vmsa;
+ struct sev_es_save_area *vmsa;
struct ghcb *ghcb;
struct kvm_host_map ghcb_map;
bool received_first_sipi;
--
2.25.1


2022-02-09 20:39:11

by Brijesh Singh

Subject: [PATCH v10 22/45] x86/sev: Use SEV-SNP AP creation to start secondary CPUs

From: Tom Lendacky <[email protected]>

To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP
AP Creation NAE event. This allows for guest control over the AP register
state rather than trusting the hypervisor with the SEV-ES Jump Table
address.

During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if
SEV-SNP is active, will set/override apic->wakeup_secondary_cpu. This
will allow the SEV-SNP AP Creation NAE event method to be used to boot
the APs. As a result of installing the override when SEV-SNP is active,
this method of starting the APs becomes the required method. The override
function will fail to start the AP if the hypervisor does not have
support for AP creation.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
---
arch/x86/include/asm/sev-common.h | 1 +
arch/x86/include/asm/sev.h | 4 +
arch/x86/include/uapi/asm/svm.h | 5 +
arch/x86/kernel/sev.c | 251 ++++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 3 +
5 files changed, 264 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1aa72b5c2490..e9b6815b3b3d 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -104,6 +104,7 @@ enum psc_op {
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)

#define GHCB_HV_FT_SNP BIT_ULL(0)
+#define GHCB_HV_FT_SNP_AP_CREATION BIT_ULL(1)

/* SNP Page State Change NAE event */
#define VMGEXIT_PSC_MAX_ENTRY 253
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index feeb93e6ec97..a3203b2caaca 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -66,6 +66,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
/* RMP page size */
#define RMP_PG_SIZE_4K 0

+#define RMPADJUST_VMSA_PAGE_BIT BIT(16)
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -130,6 +132,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+void snp_set_wakeup_secondary_cpu(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -146,6 +149,7 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned i
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_wakeup_secondary_cpu(void) { }
#endif

#endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 0dcdb6e0c913..8b4c57baec52 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -109,6 +109,10 @@
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
#define SVM_VMGEXIT_PSC 0x80000010
+#define SVM_VMGEXIT_AP_CREATION 0x80000013
+#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
+#define SVM_VMGEXIT_AP_CREATE 1
+#define SVM_VMGEXIT_AP_DESTROY 2
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff

@@ -221,6 +225,7 @@
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
{ SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 4315be1602d1..439c2f963e17 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -18,6 +18,7 @@
#include <linux/memblock.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/cpumask.h>

#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
@@ -31,9 +32,26 @@
#include <asm/svm.h>
#include <asm/smp.h>
#include <asm/cpu.h>
+#include <asm/apic.h>

#define DR7_RESET_VALUE 0x400

+/* AP INIT values as documented in the APM2 section "Processor Initialization State" */
+#define AP_INIT_CS_LIMIT 0xffff
+#define AP_INIT_DS_LIMIT 0xffff
+#define AP_INIT_LDTR_LIMIT 0xffff
+#define AP_INIT_GDTR_LIMIT 0xffff
+#define AP_INIT_IDTR_LIMIT 0xffff
+#define AP_INIT_TR_LIMIT 0xffff
+#define AP_INIT_RFLAGS_DEFAULT 0x2
+#define AP_INIT_DR6_DEFAULT 0xffff0ff0
+#define AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
+#define AP_INIT_XCR0_DEFAULT 0x1
+#define AP_INIT_X87_FTW_DEFAULT 0x5555
+#define AP_INIT_X87_FCW_DEFAULT 0x0040
+#define AP_INIT_CR0_DEFAULT 0x60000010
+#define AP_INIT_MXCSR_DEFAULT 0x1f80
+
/* For early boot hypervisor communication in SEV-ES enabled guests */
static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);

@@ -90,6 +108,8 @@ struct ghcb_state {
static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);

+static DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
+
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -823,6 +843,237 @@ void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
pvalidate_pages(vaddr, npages, true);
}

+static int snp_set_vmsa(void *va, bool vmsa)
+{
+ u64 attrs;
+
+ /*
+ * Running at VMPL0 allows the kernel to change the VMSA bit for a page
+ * using the RMPADJUST instruction. However, for the instruction to
+ * succeed it must target the permissions of a lesser privileged
+ * VMPL level, so use VMPL1 (refer to the RMPADJUST instruction in the
+ * AMD64 APM Volume 3).
+ */
+ attrs = 1;
+ if (vmsa)
+ attrs |= RMPADJUST_VMSA_PAGE_BIT;
+
+ return rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs);
+}
+
+#define __ATTR_BASE (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK)
+#define INIT_CS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK)
+#define INIT_DS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_WRITE_MASK)
+
+#define INIT_LDTR_ATTRIBS (SVM_SELECTOR_P_MASK | 2)
+#define INIT_TR_ATTRIBS (SVM_SELECTOR_P_MASK | 3)
+
+static void *snp_alloc_vmsa_page(void)
+{
+ unsigned long pfn;
+ struct page *p;
+
+ /*
+ * Allocate VMSA page to work around the SNP erratum where the CPU will
+ * incorrectly signal an RMP violation #PF if a large page (2MB or 1GB)
+ * collides with the RMP entry of a VMSA page. The recommended workaround
+ * is to not use a large page.
+ *
+ * Allocate one extra page, use a page which is not 2MB-aligned
+ * and free the other.
+ */
+ p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+ if (!p)
+ return NULL;
+
+ split_page(p, 1);
+
+ pfn = page_to_pfn(p);
+ if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
+ pfn++;
+ __free_page(p);
+ } else {
+ __free_page(pfn_to_page(pfn + 1));
+ }
+
+ return page_address(pfn_to_page(pfn));
+}
+
+static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa)
+{
+ int err;
+
+ err = snp_set_vmsa(vmsa, false);
+ if (err)
+ pr_err("clear VMSA page failed (%u), leaking page\n", err);
+ else
+ free_page((unsigned long)vmsa);
+}
+
+static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
+{
+ struct sev_es_save_area *cur_vmsa, *vmsa;
+ struct ghcb_state state;
+ unsigned long flags;
+ struct ghcb *ghcb;
+ u8 sipi_vector;
+ int cpu, ret;
+ u64 cr4;
+
+ /*
+ * SNP AP creation requires that the hypervisor supports SNP and the
+ * AP creation feature. The SNP feature was already verified prior to
+ * getting here, so just check for the AP_CREATION feature
+ * flag.
+ */
+ if (!(sev_hv_features & GHCB_HV_FT_SNP_AP_CREATION))
+ return -EOPNOTSUPP;
+
+ /*
+ * Verify the desired start IP against the known trampoline start IP
+ * to catch any future new trampolines that may be introduced that
+ * would require a new protected guest entry point.
+ */
+ if (WARN_ONCE(start_ip != real_mode_header->trampoline_start,
+ "Unsupported SNP start_ip: %lx\n", start_ip))
+ return -EINVAL;
+
+ /* Override start_ip with known protected guest start IP */
+ start_ip = real_mode_header->sev_es_trampoline_start;
+
+ /* Find the logical CPU for the APIC ID */
+ for_each_present_cpu(cpu) {
+ if (arch_match_cpu_phys_id(cpu, apic_id))
+ break;
+ }
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ cur_vmsa = per_cpu(sev_vmsa, cpu);
+
+ /*
+ * A new VMSA is created each time because there is no guarantee that
+ * the current VMSA is the kernel's or that the vCPU is not running. If
+ * an attempt were made to use the current VMSA with a running vCPU, a
+ * #VMEXIT of that vCPU would wipe out all of the settings being done
+ * here.
+ */
+ vmsa = (struct sev_es_save_area *)snp_alloc_vmsa_page();
+ if (!vmsa)
+ return -ENOMEM;
+
+ /* CR4 should maintain the MCE value */
+ cr4 = native_read_cr4() & X86_CR4_MCE;
+
+ /* Set the CS value based on the start_ip converted to a SIPI vector */
+ sipi_vector = (start_ip >> 12);
+ vmsa->cs.base = sipi_vector << 12;
+ vmsa->cs.limit = AP_INIT_CS_LIMIT;
+ vmsa->cs.attrib = INIT_CS_ATTRIBS;
+ vmsa->cs.selector = sipi_vector << 8;
+
+ /* Set the RIP value based on start_ip */
+ vmsa->rip = start_ip & 0xfff;
+
+ /* Set AP INIT defaults as documented in the APM */
+ vmsa->ds.limit = AP_INIT_DS_LIMIT;
+ vmsa->ds.attrib = INIT_DS_ATTRIBS;
+ vmsa->es = vmsa->ds;
+ vmsa->fs = vmsa->ds;
+ vmsa->gs = vmsa->ds;
+ vmsa->ss = vmsa->ds;
+
+ vmsa->gdtr.limit = AP_INIT_GDTR_LIMIT;
+ vmsa->ldtr.limit = AP_INIT_LDTR_LIMIT;
+ vmsa->ldtr.attrib = INIT_LDTR_ATTRIBS;
+ vmsa->idtr.limit = AP_INIT_IDTR_LIMIT;
+ vmsa->tr.limit = AP_INIT_TR_LIMIT;
+ vmsa->tr.attrib = INIT_TR_ATTRIBS;
+
+ vmsa->cr4 = cr4;
+ vmsa->cr0 = AP_INIT_CR0_DEFAULT;
+ vmsa->dr7 = DR7_RESET_VALUE;
+ vmsa->dr6 = AP_INIT_DR6_DEFAULT;
+ vmsa->rflags = AP_INIT_RFLAGS_DEFAULT;
+ vmsa->g_pat = AP_INIT_GPAT_DEFAULT;
+ vmsa->xcr0 = AP_INIT_XCR0_DEFAULT;
+ vmsa->mxcsr = AP_INIT_MXCSR_DEFAULT;
+ vmsa->x87_ftw = AP_INIT_X87_FTW_DEFAULT;
+ vmsa->x87_fcw = AP_INIT_X87_FCW_DEFAULT;
+
+ /* SVME must be set. */
+ vmsa->efer = EFER_SVME;
+
+ /*
+ * Set the SNP-specific fields for this VMSA:
+ * VMPL level
+ * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
+ */
+ vmsa->vmpl = 0;
+ vmsa->sev_features = sev_status >> 2;
+
+ /* Switch the page over to a VMSA page now that it is initialized */
+ ret = snp_set_vmsa(vmsa, true);
+ if (ret) {
+ pr_err("set VMSA page failed (%u)\n", ret);
+ free_page((unsigned long)vmsa);
+
+ return -EINVAL;
+ }
+
+ /* Issue VMGEXIT AP Creation NAE event */
+ local_irq_save(flags);
+
+ ghcb = __sev_get_ghcb(&state);
+
+ vc_ghcb_invalidate(ghcb);
+ ghcb_set_rax(ghcb, vmsa->sev_features);
+ ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
+ ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
+ ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+ VMGEXIT();
+
+ if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
+ lower_32_bits(ghcb->save.sw_exit_info_1)) {
+ pr_err("SNP AP Creation error\n");
+ ret = -EINVAL;
+ }
+
+ __sev_put_ghcb(&state);
+
+ local_irq_restore(flags);
+
+ /* Perform cleanup if there was an error */
+ if (ret) {
+ snp_cleanup_vmsa(vmsa);
+ vmsa = NULL;
+ }
+
+ /* Free up any previous VMSA page */
+ if (cur_vmsa)
+ snp_cleanup_vmsa(cur_vmsa);
+
+ /* Record the current VMSA page */
+ per_cpu(sev_vmsa, cpu) = vmsa;
+
+ return ret;
+}
+
+void snp_set_wakeup_secondary_cpu(void)
+{
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ /*
+ * Always set this override if SNP is enabled. This makes it the
+ * required method to start APs under SNP. If the hypervisor does
+ * not support AP creation, then no APs will be started.
+ */
+ apic->wakeup_secondary_cpu = wakeup_cpu_via_vmgexit;
+}
+
int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
{
u16 startup_cs, startup_ip;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 617012f4619f..ad23d53b39ac 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -82,6 +82,7 @@
#include <asm/spec-ctrl.h>
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
+#include <asm/sev.h>

#ifdef CONFIG_ACPI_CPPC_LIB
#include <acpi/cppc_acpi.h>
@@ -1436,6 +1437,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
smp_quirk_init_udelay();

speculative_store_bypass_ht_init();
+
+ snp_set_wakeup_secondary_cpu();
}

void arch_thaw_secondary_cpus_begin(void)
--
2.25.1


2022-02-11 08:50:14

by Borislav Petkov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Wed, Feb 09, 2022 at 12:10:15PM -0600, Brijesh Singh wrote:
> The set_memory_{encrypted,decrypted}() functions are used for changing pages
> from decrypted (shared) to encrypted (private) and vice versa.
> When SEV-SNP is active, the page state transition needs to go through
> additional steps done by the guest.
>
> If the page is transitioned from shared to private, then perform the
> following after the encryption attribute is set in the page table:
>
> 1. Issue the page state change VMGEXIT to add the memory region in
> the RMP table.
> 2. Validate the memory region after the RMP entry is added.
>
> To maintain the security guarantees, if the page is transitioned from
> private to shared, then perform the following before encryption attribute
> is removed from the page table:
>
> 1. Invalidate the page.
> 2. Issue the page state change VMGEXIT to remove the page from RMP table.
>
> To change the page state in the RMP table, use the Page State Change
> VMGEXIT defined in the GHCB specification.
>
> The GHCB specification provides the flexibility to use either 4K or 2MB
> page size during the page state change (PSC) request. For now use the
> 4K page size for all the PSC until RMP page size tracking is supported
> in the kernel.

This commit message sounds familiar because I've read it before - patch
18 - and it looks copied. So I've turned it into a simple one which says
it all:

x86/mm: Validate memory when changing the C-bit

Add the needed functionality to change pages state from shared
to private and vice-versa using the Page State Change VMGEXIT as
documented in the GHCB spec.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-11 16:06:53

by Borislav Petkov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

+ Kirill.

On Wed, Feb 09, 2022 at 12:10:15PM -0600, Brijesh Singh wrote:
> @@ -2012,8 +2013,22 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
> */
> cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>
> + /*
> + * To maintain the security guarantees of SEV-SNP guests, make sure
> + * to invalidate the memory before clearing the encryption attribute.
> + */
> + if (!enc)
> + snp_set_memory_shared(addr, numpages);
> +
> ret = __change_page_attr_set_clr(&cpa, 1);
>
> + /*
> + * Now that memory is mapped encrypted in the page table, validate it
> + * so that it is consistent with the above page state.
> + */
> + if (!ret && enc)
> + snp_set_memory_private(addr, numpages);
> +
> /*
> * After changing the encryption attribute, we need to flush TLBs again
> * in case any speculative TLB caching occurred (but no need to flush
> --

Right, as tglx rightfully points out here:

https://lore.kernel.org/r/875ypyvz07.ffs@tglx

this piece of code needs careful coordinated design so that it is clean
for both SEV and TDX.

First, as we've said here:

https://lore.kernel.org/r/[email protected]

we'd need generic functions which turn a pgprot into an encrypted or
decrypted pgprot on both SEV and TDX so we could do:

cc_pgprot_enc()
cc_pgprot_dec()

which do the required conversion on each guest type.

Also, I think adding required functions to x86_platform.guest. is a very
nice way to solve the ugly if (guest_type) querying all over the place.

Also, I was thinking of sme_me_mask and the corresponding
tdx_shared_mask I threw into the mix here:

https://lore.kernel.org/r/[email protected]

and we should simply add those without ifdeffery but unconditionally.

Simply have them always present. They will have !0 values on the
respective guest types and 0 otherwise. This should simplify a lot of
code and another unconditionally present u64 won't be the end of the
world.

Any other aspect I'm missing?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-11 22:14:50

by Brijesh Singh

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit


On 2/11/22 8:55 AM, Borislav Petkov wrote:
>
> Simply have them always present. They will have !0 values on the
> respective guest types and 0 otherwise. This should simplify a lot of
> code and another unconditionally present u64 won't be the end of the
> world.
>
> Any other aspect I'm missing?

I think that's mostly it. IIUC, the recommendation is to define a
new callback in x86_platform_op. The callback will be invoked
unconditionally; the default implementation is a NOP, and TDX and
SEV will override it with their platform-specific implementations.
I think we may be able to handle everything in one callback hook,
but having pre and post hooks would be more desirable. Here is why
I am thinking so:

* On SNP, the page must be invalidated before clearing the _PAGE_ENC
from the page table attribute

* On SNP, the page must be validated after setting the _PAGE_ENC in the
page table attribute.

~Brijesh
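
For illustration, a minimal sketch of what those two hooks could look
like on SNP, reusing snp_set_memory_{shared,private}() from this series;
the hook names and signatures here are assumptions, not code posted in
the thread:

static void snp_enc_status_change_prepare(unsigned long vaddr, int npages,
					  bool enc)
{
	/* Invalidate the pages before the _PAGE_ENC attribute is cleared */
	if (!enc)
		snp_set_memory_shared(vaddr, npages);
}

static void snp_enc_status_change_finish(unsigned long vaddr, int npages,
					 bool enc)
{
	/* Validate the pages after the _PAGE_ENC attribute has been set */
	if (enc)
		snp_set_memory_private(vaddr, npages);
}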

2022-02-13 17:07:53

by Borislav Petkov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Fri, Feb 11, 2022 at 11:27:54AM -0600, Brijesh Singh wrote:
> > Simply have them always present. They will have !0 values on the
> > respective guest types and 0 otherwise. This should simplify a lot of
> > code and another unconditionally present u64 won't be the end of the
> > world.
> >
> > Any other aspect I'm missing?
>
> I think that's mostly it. IIUC, the recommendation is to define a
> new callback in x86_platform_op. The callback will be invoked
> unconditionally; the default implementation is a NOP, and TDX and
> SEV will override it with their platform-specific implementations.
> I think we may be able to handle everything in one callback hook,
> but having pre and post hooks would be more desirable. Here is why
> I am thinking so:
>
> * On SNP, the page must be invalidated before clearing the _PAGE_ENC
> from the page table attribute
>
> * On SNP, the page must be validated after setting the _PAGE_ENC in the
> page table attribute.

Right, we could have a pre- and post- callback, if that would make
things simpler/clearer.

Also, in thinking further about the encryption mask, we could make it a
*single*, *global* variable called cc_mask, which each guest type sets
as it wants to.

Then, it would use it in the vendor-specific encrypt/decrypt helpers
accordingly and that would simplify a lot of code. And we can get rid of
all the ifdeffery around it too.

So I think the way to go should be we do the common functionality, I
queue it on the common tip:x86/cc branch and then SNP and TDX will be
both based on top of it.

Thoughts?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-13 20:52:01

by Tom Lendacky

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On 2/13/22 06:15, Borislav Petkov wrote:
> On Fri, Feb 11, 2022 at 11:27:54AM -0600, Brijesh Singh wrote:
>>> Simply have them always present. They will have !0 values on the
>>> respective guest types and 0 otherwise. This should simplify a lot of
>>> code and another unconditionally present u64 won't be the end of the
>>> world.
>>>
>>> Any other aspect I'm missing?
>>
>> I think that's mostly it. IIUC, the recommendation is to define a
>> new callback in x86_platform_op. The callback will be invoked
>> unconditionally; the default implementation is a NOP, and TDX and
>> SEV will override it with their platform-specific implementations.
>> I think we may be able to handle everything in one callback hook,
>> but having pre and post hooks would be more desirable. Here is why
>> I am thinking so:
>>
>> * On SNP, the page must be invalidated before clearing the _PAGE_ENC
>> from the page table attribute
>>
>> * On SNP, the page must be validated after setting the _PAGE_ENC in the
>> page table attribute.
>
> Right, we could have a pre- and post- callback, if that would make
> things simpler/clearer.
>
> Also, in thinking further about the encryption mask, we could make it a
> *single*, *global* variable called cc_mask, which each guest type sets
> as it wants to.
>
> Then, it would use it in the vendor-specific encrypt/decrypt helpers
> accordingly and that would simplify a lot of code. And we can get rid of
> all the ifdeffery around it too.
>
> So I think the way to go should be we do the common functionality, I
> queue it on the common tip:x86/cc branch and then SNP and TDX will be
> both based on top of it.
>
> Thoughts?

I think there were a lot of assumptions that only SME/SEV would set
sme_me_mask and that is used, for example, in the cc_platform_has()
routine to figure out whether we're AMD or Intel. If you go the cc_mask
route, I think we'll need to add a cc_vendor variable that would then be
checked in cc_platform_has(). All other uses of sme_me_mask would need to
be audited to see whether cc_vendor would need to be checked, too.

Thanks,
Tom
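
For illustration, a sketch of the cc_vendor dispatch Tom suggests; the
enum and the way it replaces the sme_me_mask test are assumptions here,
not code from this thread, and amd/intel_cc_platform_has() stand for the
existing per-vendor attribute checks:

enum cc_vendor {
	CC_VENDOR_NONE,
	CC_VENDOR_AMD,
	CC_VENDOR_INTEL,
};

static enum cc_vendor cc_vendor;

bool cc_platform_has(enum cc_attr attr)
{
	/* Dispatch on an explicit vendor instead of testing sme_me_mask */
	switch (cc_vendor) {
	case CC_VENDOR_AMD:
		return amd_cc_platform_has(attr);
	case CC_VENDOR_INTEL:
		return intel_cc_platform_has(attr);
	default:
		return false;
	}
}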


2022-02-14 10:21:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Sun, Feb 13, 2022 at 08:50:48AM -0600, Tom Lendacky wrote:
> I think there were a lot of assumptions that only SME/SEV would set
> sme_me_mask and that is used, for example, in the cc_platform_has() routine
> to figure out whether we're AMD or Intel. If you go the cc_mask route, I
> think we'll need to add a cc_vendor variable that would then be checked in
> cc_platform_has().

Right, or cc_platform_type or whatever. It would probably be a good
idea to have a variable that explicitly states what the active coco flavor is
anyway, as we had some ambiguity questions in the past along the lines
of, what does cc_platform_has() need to return when running as a guest
on the respective platform.

If you have it explicitly, then it would be unambiguous and simple. And
then we can get rid of CC_ATTR_GUEST_SEV_SNP or CC_ATTR_GUEST_TDX which
is clumsy.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-15 13:33:46

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Sun, Feb 13, 2022 at 01:15:23PM +0100, Borislav Petkov wrote:
> On Fri, Feb 11, 2022 at 11:27:54AM -0600, Brijesh Singh wrote:
> > > Simply have them always present. They will have !0 values on the
> > > respective guest types and 0 otherwise. This should simplify a lot of
> > > code and another unconditionally present u64 won't be the end of the
> > > world.
> > >
> > > Any other aspect I'm missing?
> >
> > I think that's mostly it. IIUC, the recommendation is to define a
> > new callback in x86_platform_op. The callback will be invoked
> > unconditionally; the default implementation is a NOP, and TDX and
> > SEV will override it with their platform-specific implementations.
> > I think we may be able to handle everything in one callback hook,
> > but having pre and post hooks would be more desirable. Here is why
> > I am thinking so:
> >
> > * On SNP, the page must be invalidated before clearing the _PAGE_ENC
> > from the page table attribute
> >
> > * On SNP, the page must be validated after setting the _PAGE_ENC in the
> > page table attribute.
>
> Right, we could have a pre- and post- callback, if that would make
> things simpler/clearer.
>
> Also, in thinking further about the encryption mask, we could make it a
> *single*, *global* variable called cc_mask, which each guest type sets
> as it wants to.

I don't think it works. TDX and SME/SEV have opposite polarities of the mask.
SME/SEV has to clear the mask to share the page. TDX has to set it.

Making a single global mask only increases confusion.

--
Kirill A. Shutemov

2022-02-15 14:18:09

by Kirill A. Shutemov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Tue, Feb 15, 2022 at 01:54:48PM +0100, Borislav Petkov wrote:
> On Tue, Feb 15, 2022 at 03:43:31PM +0300, Kirill A. Shutemov wrote:
> > I don't think it works. TDX and SME/SEV have opposite polarities of the mask.
> > SME/SEV has to clear the mask to share the page. TDX has to set it.
> >
> > Making a single global mask only increases confusion.
>
> Didn't you read the rest of the thread with Tom's suggestion? I think
> there's merit in having a cc_vendor or so which explicitly states what
> type of HV the kernel runs on...

I have no problem with the cc_vendor idea. It looks good.

Regarding the masks, if we want to have common ground here we can add two
masks: cc_enc_mask and cc_dec_mask. And then

pgprotval_t cc_enc(pgprotval_t protval)
{
protval |= cc_enc_mask;
protval &= ~cc_dec_mask;
return protval;
}

pgprotval_t cc_dec(pgprotval_t protval)
{
protval |= cc_dec_mask;
protval &= ~cc_enc_mask;
return protval;
}

It assumes (cc_enc_mask & cc_dec_mask) == 0.

Any opinions?

--
Kirill A. Shutemov

2022-02-15 15:38:17

by Borislav Petkov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Tue, Feb 15, 2022 at 04:15:22PM +0300, Kirill A. Shutemov wrote:
> I have no problem with the cc_vendor idea. It looks good.

Good.

> Regarding the masks, if we want to have common ground here we can add two
> masks: cc_enc_mask and cc_dec_mask. And then

If we do two masks, then we can just as well leave the SME and TDX
masks. The point of the whole exercise is to have simpler code and less
ifdeffery.

If you "hide" how the mask works on each vendor in the respective
functions - and yes, cc_pgprot_dec/enc() reads better - then it doesn't
matter how the mask is defined.

Because you don't need two masks to encrypt/decrypt pages - you need a
single mask, applied differently.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
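
For illustration, a sketch of that single-mask idea, reusing the
cc_vendor notion from above; cc_mask and the helpers follow the
cc_pgprot_enc()/cc_pgprot_dec() naming floated earlier in the thread,
and everything here is an assumption rather than posted code:

static u64 cc_mask;	/* C-bit on SEV, shared bit on TDX, 0 otherwise */

pgprotval_t cc_pgprot_enc(pgprotval_t protval)
{
	/* Encrypted: set the C-bit on SEV, clear the shared bit on TDX */
	return cc_vendor == CC_VENDOR_AMD ? protval | cc_mask
					  : protval & ~cc_mask;
}

pgprotval_t cc_pgprot_dec(pgprotval_t protval)
{
	/* Decrypted/shared: the exact inverse */
	return cc_vendor == CC_VENDOR_AMD ? protval & ~cc_mask
					  : protval | cc_mask;
}

With cc_mask == 0 on bare metal, both helpers are no-ops, which is what
makes the unconditional, ifdef-free definition workable.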

2022-02-15 16:52:12

by Borislav Petkov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Tue, Feb 15, 2022 at 03:43:31PM +0300, Kirill A. Shutemov wrote:
> I don't think it works. TDX and SME/SEV have opposite polarities of the mask.
> SME/SEV has to clear the mask to share the page. TDX has to set it.
>
> Making a single global mask only increases confusion.

Didn't you read the rest of the thread with Tom's suggestion? I think
there's merit in having a cc_vendor or so which explicitly states what
type of HV the kernel runs on...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-16 14:58:23

by Borislav Petkov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Fri, Feb 11, 2022 at 03:55:23PM +0100, Borislav Petkov wrote:
> Also, I think adding required functions to x86_platform.guest. is a very
> nice way to solve the ugly if (guest_type) querying all over the place.

So I guess something like below. It builds here...

---
arch/x86/include/asm/set_memory.h | 1 -
arch/x86/include/asm/sev.h | 2 ++
arch/x86/include/asm/x86_init.h | 12 ++++++++++++
arch/x86/kernel/sev.c | 2 ++
arch/x86/mm/mem_encrypt_amd.c | 6 +++---
arch/x86/mm/pat/set_memory.c | 2 +-
6 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index ff0f2d90338a..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -84,7 +84,6 @@ int set_pages_rw(struct page *page, int numpages);
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
bool kernel_page_present(struct page *page);
-void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc);

extern int kernel_set_to_readonly;

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ec060c433589..2435b0ca6cfc 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -95,4 +95,6 @@ static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
#endif

+void amd_notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc);
+
#endif
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 22b7412c08f6..226663e2d769 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -141,6 +141,17 @@ struct x86_init_acpi {
void (*reduced_hw_early_init)(void);
};

+/**
+ * struct x86_guest - Functions used by misc guest incarnations like SEV, TDX,
+ * etc.
+ *
+ * @enc_status_change Notify HV about change of encryption status of a
+ * range of pages
+ */
+struct x86_guest {
+ void (*enc_status_change)(unsigned long vaddr, int npages, bool enc);
+};
+
/**
* struct x86_init_ops - functions for platform specific setup
*
@@ -287,6 +298,7 @@ struct x86_platform_ops {
struct x86_legacy_features legacy;
void (*set_legacy_features)(void);
struct x86_hyper_runtime hyper;
+ struct x86_guest guest;
};

struct x86_apic_ops {
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e6d316a01fdd..e645e868a49b 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -766,6 +766,8 @@ void __init sev_es_init_vc_handling(void)
if (!sev_es_check_cpu_features())
panic("SEV-ES CPU Features missing");

+ x86_platform.guest.enc_status_change = amd_notify_range_enc_status_changed;
+
/* Enable SEV-ES special handling */
static_branch_enable(&sev_es_enable_key);

diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 2b2d018ea345..7038a9f7ae55 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -256,7 +256,7 @@ static unsigned long pg_level_to_pfn(int level, pte_t *kpte, pgprot_t *ret_prot)
return pfn;
}

-void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc)
+void amd_notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc)
{
#ifdef CONFIG_PARAVIRT
unsigned long sz = npages << PAGE_SHIFT;
@@ -392,7 +392,7 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,

ret = 0;

- notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT, enc);
+ amd_notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT, enc);
out:
__flush_tlb_all();
return ret;
@@ -410,7 +410,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)

void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages, bool enc)
{
- notify_range_enc_status_changed(vaddr, npages, enc);
+ amd_notify_range_enc_status_changed(vaddr, npages, enc);
}

void __init mem_encrypt_free_decrypted_mem(void)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index b4072115c8ef..0acc52a3a5b7 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2027,7 +2027,7 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
* Notify hypervisor that a given memory range is mapped encrypted
* or decrypted.
*/
- notify_range_enc_status_changed(addr, numpages, enc);
+ x86_platform.guest.enc_status_change(addr, numpages, enc);

return ret;
}
--
2.29.2

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-16 19:09:33

by Brijesh Singh

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Fri, Feb 11, 2022 at 03:55:23PM +0100, Borislav Petkov wrote:
>> Also, I think adding required functions to x86_platform.guest. is a very
>> nice way to solve the ugly if (guest_type) querying all over the place.

> So I guess something like below. It builds here...

I made a small change to use prepare() and finish(). The idea is that prepare()
will be called before the encryption mask is changed in the page table, and
finish() will be called after the change.

I have not tried integrating the SNP PSC yet, but want to check if this
approach will work for Kirill.

---
arch/x86/include/asm/set_memory.h | 1 -
arch/x86/include/asm/sev.h | 3 +++
arch/x86/include/asm/x86_init.h | 15 +++++++++++++++
arch/x86/kernel/sev.c | 3 +++
arch/x86/mm/mem_encrypt_amd.c | 14 ++++++++++----
arch/x86/mm/pat/set_memory.c | 13 +++++++------
6 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index ff0f2d90338a..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -84,7 +84,6 @@ int set_pages_rw(struct page *page, int numpages);
int set_direct_map_invalid_noflush(struct page *page);
int set_direct_map_default_noflush(struct page *page);
bool kernel_page_present(struct page *page);
-void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc);

extern int kernel_set_to_readonly;

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ec060c433589..2ebd8c225257 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -87,6 +87,9 @@ extern enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
struct es_em_ctxt *ctxt,
u64 exit_code, u64 exit_info_1,
u64 exit_info_2);
+void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc);
+void amd_enc_status_change_finish(unsigned long vaddr, int npages, bool enc);
+
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 22b7412c08f6..da7fc1c0b917 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -141,6 +141,20 @@ struct x86_init_acpi {
void (*reduced_hw_early_init)(void);
};

+/**
+ * struct x86_guest - Functions used by misc guest incarnations like SEV, TDX, etc.
+ *
+ * @enc_status_change_prepare Notify HV before the encryption status of a range
+ * is changed.
+ *
+ * @enc_status_change_finish Notify HV after the encryption status of a range
+ * is changed.
+ */
+struct x86_guest {
+ void (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
+ void (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
+};
+
/**
* struct x86_init_ops - functions for platform specific setup
*
@@ -287,6 +301,7 @@ struct x86_platform_ops {
struct x86_legacy_features legacy;
void (*set_legacy_features)(void);
struct x86_hyper_runtime hyper;
+ struct x86_guest guest;
};

struct x86_apic_ops {
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e6d316a01fdd..3b2133fd6682 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -760,6 +760,9 @@ void __init sev_es_init_vc_handling(void)

BUILD_BUG_ON(offsetof(struct sev_es_runtime_data, ghcb_page) % PAGE_SIZE);

+ x86_platform.guest.enc_status_change_prepare = amd_enc_status_change_prepare;
+ x86_platform.guest.enc_status_change_finish = amd_enc_status_change_finish;
+
if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return;

diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 2b2d018ea345..c93b6c2fc6a3 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -256,7 +256,11 @@ static unsigned long pg_level_to_pfn(int level, pte_t *kpte, pgprot_t *ret_prot)
return pfn;
}

-void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc)
+void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
+{
+}
+
+void amd_enc_status_change_finish(unsigned long vaddr, int npages, bool enc)
{
#ifdef CONFIG_PARAVIRT
unsigned long sz = npages << PAGE_SHIFT;
@@ -280,7 +284,7 @@ void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc)
psize = page_level_size(level);
pmask = page_level_mask(level);

- notify_page_enc_status_changed(pfn, psize >> PAGE_SHIFT, enc);
+ amd_enc_status_change_finish(pfn, psize >> PAGE_SHIFT, enc);

vaddr = (vaddr & pmask) + psize;
}
@@ -341,6 +345,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
vaddr_next = vaddr;
vaddr_end = vaddr + size;

+ amd_enc_status_change_prepare(start, PAGE_ALIGN(size) >> PAGE_SHIFT, enc);
+
for (; vaddr < vaddr_end; vaddr = vaddr_next) {
kpte = lookup_address(vaddr, &level);
if (!kpte || pte_none(*kpte)) {
@@ -392,7 +398,7 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,

ret = 0;

- notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT, enc);
+ amd_enc_status_change_finish(start, PAGE_ALIGN(size) >> PAGE_SHIFT, enc);
out:
__flush_tlb_all();
return ret;
@@ -410,7 +416,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)

void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages, bool enc)
{
- notify_range_enc_status_changed(vaddr, npages, enc);
+ amd_enc_status_change_finish(vaddr, npages, enc);
}

void __init mem_encrypt_free_decrypted_mem(void)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index b4072115c8ef..a55477a6e578 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2012,8 +2012,15 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
*/
cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));

+ /* Notify HV that we are about to set/clr encryption attribute. */
+ x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
+
ret = __change_page_attr_set_clr(&cpa, 1);

+ /* Notify HV that we have successfully set/clr encryption attribute. */
+ if (!ret)
+ x86_platform.guest.enc_status_change_finish(addr, numpages, enc);
+
/*
* After changing the encryption attribute, we need to flush TLBs again
* in case any speculative TLB caching occurred (but no need to flush
@@ -2023,12 +2030,6 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
*/
cpa_flush(&cpa, 0);

- /*
- * Notify hypervisor that a given memory range is mapped encrypted
- * or decrypted.
- */
- notify_range_enc_status_changed(addr, numpages, enc);
-
return ret;
}

--
2.25.1

2022-02-21 22:18:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Mon, Feb 21, 2022 at 08:41:21PM +0300, Kirill A. Shutemov wrote:
> On Wed, Feb 16, 2022 at 10:04:57AM -0600, Brijesh Singh wrote:
> > @@ -287,6 +301,7 @@ struct x86_platform_ops {
> > struct x86_legacy_features legacy;
> > void (*set_legacy_features)(void);
> > struct x86_hyper_runtime hyper;
> > + struct x86_guest guest;
> > };
>
> I used 'cc' instead of 'guest'. 'guest' looks too generic.

But guest is what is needed there. Not all cases where the kernel runs
as a guest are confidential ones.

Later, that hyperv thing should be merged into the guest one too, as
hyperv is a guest too. AFAICT.

> Also, I'm not sure why not to use a pointer to an ops struct instead of
> storing them directly in x86_platform. Yes, it is consistent with
> 'hyper', but I don't see it as a strong argument.

There should be no big difference, but we're doing it with direct struct
member assignment so far, so we should keep doing the same and not start
doing pointers now, all of a sudden.

> This doesn't cover the difference in flushing requirements. Can we get that too?

What are the requirements you have for TDX on this path?

This is the main reason why I'm asking you to review this - I'd like to
have one version which works for both and then I'll queue it on a common
branch.

This is also why I'd like for you and SEV folks to agree on all the
common code so that I can apply it and you can both base your patchsets
on top.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-22 05:17:26

by Brijesh Singh

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit



On 2/21/22 11:41, Kirill A. Shutemov wrote:
> On Wed, Feb 16, 2022 at 10:04:57AM -0600, Brijesh Singh wrote:
>> @@ -287,6 +301,7 @@ struct x86_platform_ops {
>> struct x86_legacy_features legacy;
>> void (*set_legacy_features)(void);
>> struct x86_hyper_runtime hyper;
>> + struct x86_guest guest;
>> };
>
> I used 'cc' instead of 'guest'. 'guest' looks too generic.

I am fine with either of them.

>
> Also, I'm not sure why not to use a pointer to an ops struct instead of
> storing them directly in x86_platform. Yes, it is consistent with
> 'hyper', but I don't see it as a strong argument.
>
>>
>> index b4072115c8ef..a55477a6e578 100644
>> --- a/arch/x86/mm/pat/set_memory.c
>> +++ b/arch/x86/mm/pat/set_memory.c
>> @@ -2012,8 +2012,15 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>> */
>> cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>>
>> + /* Notify HV that we are about to set/clr encryption attribute. */
>> + x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
>> +
>> ret = __change_page_attr_set_clr(&cpa, 1);
>
> This doesn't cover the difference in flushing requirements. Can we get that too?
>

Yes, we can work to include that too.

>>
>> + /* Notify HV that we have successfully set/clr encryption attribute. */
>> + if (!ret)
>> + x86_platform.guest.enc_status_change_finish(addr, numpages, enc);
>> +
>
> Any particular reason you moved it above cpa_flush()? I don't think it
> makes a difference for TDX, but still.
>

It does not make any difference for SNP either. We can keep it
where it was.


>> /*
>> * After changing the encryption attribute, we need to flush TLBs again
>> * in case any speculative TLB caching occurred (but no need to flush
>> @@ -2023,12 +2030,6 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>> */
>> cpa_flush(&cpa, 0);
>>
>> - /*
>> - * Notify hypervisor that a given memory range is mapped encrypted
>> - * or decrypted.
>> - */
>> - notify_range_enc_status_changed(addr, numpages, enc);
>> -
>> return ret;
>> }
>>
>> --
>> 2.25.1
>>
>

2022-02-22 05:46:37

by Kirill A. Shutemov

Subject: Re: [PATCH v10 21/45] x86/mm: Add support to validate memory when changing C-bit

On Wed, Feb 16, 2022 at 10:04:57AM -0600, Brijesh Singh wrote:
> @@ -287,6 +301,7 @@ struct x86_platform_ops {
> struct x86_legacy_features legacy;
> void (*set_legacy_features)(void);
> struct x86_hyper_runtime hyper;
> + struct x86_guest guest;
> };

I used 'cc' instead of 'guest'. 'guest' looks too generic.

Also, I'm not sure why not to use a pointer to an ops struct instead of
storing them directly in x86_platform. Yes, it is consistent with
'hyper', but I don't see it as a strong argument.

>
> index b4072115c8ef..a55477a6e578 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2012,8 +2012,15 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
> */
> cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>
> + /* Notify HV that we are about to set/clr encryption attribute. */
> + x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
> +
> ret = __change_page_attr_set_clr(&cpa, 1);

This doesn't cover the difference in flushing requirements. Can we get that too?

>
> + /* Notify HV that we have successfully set/clr encryption attribute. */
> + if (!ret)
> + x86_platform.guest.enc_status_change_finish(addr, numpages, enc);
> +

Any particular reason you moved it above cpa_flush()? I don't think it
makes a difference for TDX, but still.

> /*
> * After changing the encryption attribute, we need to flush TLBs again
> * in case any speculative TLB caching occurred (but no need to flush
> @@ -2023,12 +2030,6 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
> */
> cpa_flush(&cpa, 0);
>
> - /*
> - * Notify hypervisor that a given memory range is mapped encrypted
> - * or decrypted.
> - */
> - notify_range_enc_status_changed(addr, numpages, enc);
> -
> return ret;
> }
>
> --
> 2.25.1
>

--
Kirill A. Shutemov