The focus of the TrenchBoot project (https://github.com/TrenchBoot) is to
enhance boot security and integrity. This requires the Linux kernel to
be directly invoked by an x86 dynamic launch in order to establish a
Dynamic Root of Trust for Measurement (DRTM). The dynamic launch will
be initiated by a boot loader with the associated support added to it;
the first targeted boot loader is GRUB2. An integral part of
establishing the DRTM involves measuring everything that is intended to
be run (kernel image, initrd, etc.) and everything that will configure
that kernel to run (command line, boot params, etc.) into specific PCRs,
the DRTM PCRs (17-22), in the TPM. Another key aspect is that the dynamic
launch is rooted in hardware; that is to say, the hardware (CPU) takes
the first measurement in the chain of integrity measurements. On Intel
this is done using the GETSEC instruction provided by Intel TXT, and on
AMD using the SKINIT instruction provided by AMD-V. Information on these
technologies can be readily found online. This patchset introduces Intel
TXT support.
To enable the kernel to be launched by GETSEC, a stub must be built
into the setup section of the compressed kernel to handle the specific
state that the dynamic launch process leaves the BSP in. This stub must
also measure everything that is going to be used as early as possible.
The stub code and subsequent code must also deal with the specific
state that the dynamic launch leaves the APs in.
A quick note on terminology. The larger open source project itself is
called TrenchBoot, which is hosted on GitHub (links below). The kernel
feature enabling the use of the x86 technology is referred to as "Secure
Launch" within the kernel code. As such, the prefixes sl_/SL_ or
slaunch/SLAUNCH will be seen in the code. The stub code discussed above
is referred to as the SL stub.
Note that patch 1 was authored by Arvind Sankar. We were not able to get
a status on this patch but Secure Launch depends on it so it is included
with the set.
The basic flow is:
- Entry from the dynamic launch jumps to the SL stub
- SL stub fixes up the world on the BSP
- For TXT, SL stub wakes the APs, fixes up their worlds
- For TXT, APs are left halted waiting for an NMI to wake them
- SL stub jumps to startup_32
- SL main locates the TPM event log and writes the measurements of
configuration and module information into it.
- Kernel boot proceeds normally from this point.
- During early setup, slaunch_setup() runs to finish some validation
and setup tasks.
- The SMP bringup code is modified to wake the waiting APs. APs vector
to rmpiggy and start up normally from that point.
- SL platform module is registered as a late initcall module. It reads
the TPM event log and extends the measurements taken into the TPM PCRs.
- SL platform module initializes the securityfs interface to allow
access to the TPM event log and TXT public registers.
- Kernel boot finishes normally.
- SEXIT support to leave SMX mode is present on the kexec path and
the various reboot paths (poweroff, reset, halt).
Links:
The TrenchBoot project including documentation:
https://github.com/trenchboot
Intel TXT is documented in its own specification and in the SDM Instruction Set volume:
https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
https://software.intel.com/en-us/articles/intel-sdm
AMD SKINIT is documented in the System Programming manual:
https://www.amd.com/system/files/TechDocs/24593.pdf
GRUB2 pre-launch support patchset (WIP):
https://lists.gnu.org/archive/html/grub-devel/2020-05/msg00011.html
Thanks
Ross Philipson and Daniel P. Smith
Changes in v2:
- Modified 32b entry code to prevent causing relocations in the compressed
kernel.
- Dropped patches for compressed kernel TPM PCR extender.
- Modified event log code to insert log delimiter events and not rely
on TPM access.
- Stop extending PCRs in the early Secure Launch stub code.
- Removed Kconfig options for hash algorithms and use the algorithms the
ACM used.
- Match Secure Launch measurement algorithm use to those reported in the
TPM 2.0 event log.
- Read the TPM events out of the TPM and extend them into the PCRs using
the mainline TPM driver. This is done in the late initcall module.
- Allow use of alternate PCR 19 and 20 for post ACM measurements.
- Add Kconfig constraints needed by Secure Launch (disable KASLR
and add x2apic dependency).
- Fix testing of SL_FLAGS when determining if Secure Launch is active
and the architecture is TXT.
- Use SYM_DATA_START_LOCAL macros in early entry point code.
- Security audit changes:
- Validate buffers passed to MLE do not overlap the MLE and are
properly laid out.
- Validate buffers and memory regions used by the MLE are
protected by IOMMU PMRs.
- Force IOMMU to not use passthrough mode during a Secure Launch.
- Prevent KASLR use during a Secure Launch.
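To illustrate the SL_FLAGS fix listed above, here is a minimal
user-space sketch of why both flags have to be compared against the
full mask. The flag values are the ones from include/linux/slaunch.h in
this series; the helper names sl_active_on_txt(), sl_any_flag_set() and
sl_flags_demo() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in include/linux/slaunch.h in this series. */
#define SL_FLAG_ACTIVE      0x00000001
#define SL_FLAG_ARCH_SKINIT 0x00000002
#define SL_FLAG_ARCH_TXT    0x00000004

/* Hypothetical helper: true only when BOTH flags are set. */
static bool sl_active_on_txt(uint32_t flags)
{
	const uint32_t want = SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT;

	return (flags & want) == want;
}

/* Hypothetical helper showing the weaker test: true when EITHER is set. */
static bool sl_any_flag_set(uint32_t flags)
{
	return (flags & (SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT)) != 0;
}

/* With only SL_FLAG_ACTIVE set, the two tests disagree. */
static void sl_flags_demo(void)
{
	assert(sl_any_flag_set(SL_FLAG_ACTIVE));
	assert(!sl_active_on_txt(SL_FLAG_ACTIVE));
}
```

The strict form matches the test used by slaunch_finalize() in the kexec
patch of this set.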
Arvind Sankar (1):
x86/boot: Place kernel_info at a fixed offset
Daniel P. Smith (2):
x86: Add early SHA support for Secure Launch early measurements
x86: Secure Launch late initcall platform module
Ross Philipson (9):
x86: Secure Launch Kconfig
x86: Secure Launch main header file
x86: Secure Launch kernel early boot stub
x86: Secure Launch kernel late boot stub
x86: Secure Launch SMP bringup support
kexec: Secure Launch kexec SEXIT support
reboot: Secure Launch SEXIT support on reboot paths
tpm: Allow locality 2 to be set when initializing the TPM for Secure
Launch
iommu: Do not allow IOMMU passthrough with Secure Launch
Documentation/x86/boot.rst | 13 +
arch/x86/Kconfig | 32 ++
arch/x86/boot/compressed/Makefile | 3 +
arch/x86/boot/compressed/early_sha1.c | 103 +++++
arch/x86/boot/compressed/early_sha1.h | 17 +
arch/x86/boot/compressed/early_sha256.c | 7 +
arch/x86/boot/compressed/head_64.S | 37 ++
arch/x86/boot/compressed/kaslr.c | 11 +
arch/x86/boot/compressed/kernel_info.S | 52 ++-
arch/x86/boot/compressed/kernel_info.h | 12 +
arch/x86/boot/compressed/sl_main.c | 523 +++++++++++++++++++++++++
arch/x86/boot/compressed/sl_stub.S | 667 ++++++++++++++++++++++++++++++++
arch/x86/boot/compressed/vmlinux.lds.S | 6 +
arch/x86/include/asm/realmode.h | 3 +
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/asm-offsets.c | 19 +
arch/x86/kernel/reboot.c | 10 +
arch/x86/kernel/setup.c | 3 +
arch/x86/kernel/slaunch.c | 543 ++++++++++++++++++++++++++
arch/x86/kernel/slmodule.c | 495 ++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 86 ++++
arch/x86/realmode/rm/header.S | 3 +
arch/x86/realmode/rm/trampoline_64.S | 37 ++
drivers/char/tpm/tpm-chip.c | 13 +-
drivers/iommu/intel/dmar.c | 4 +
drivers/iommu/intel/iommu.c | 5 +
drivers/iommu/iommu.c | 6 +-
include/linux/slaunch.h | 540 ++++++++++++++++++++++++++
kernel/kexec_core.c | 4 +
lib/crypto/sha256.c | 8 +
lib/sha1.c | 4 +
31 files changed, 3261 insertions(+), 7 deletions(-)
create mode 100644 arch/x86/boot/compressed/early_sha1.c
create mode 100644 arch/x86/boot/compressed/early_sha1.h
create mode 100644 arch/x86/boot/compressed/early_sha256.c
create mode 100644 arch/x86/boot/compressed/kernel_info.h
create mode 100644 arch/x86/boot/compressed/sl_main.c
create mode 100644 arch/x86/boot/compressed/sl_stub.S
create mode 100644 arch/x86/kernel/slaunch.c
create mode 100644 arch/x86/kernel/slmodule.c
create mode 100644 include/linux/slaunch.h
--
1.8.3.1
Initial bits to bring in Secure Launch functionality. Add Kconfig
options for compiling in/out the Secure Launch code.
Signed-off-by: Ross Philipson <[email protected]>
---
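As an illustration of what the two alternate PCR options select, here is
a small user-space sketch. The PCR numbers come from the Kconfig help
text in this patch. Everything else (the SL_DEF_*/SL_ALT_* names, struct
sl_pcr_choice and sl_pick_pcrs()) is hypothetical, and the runtime
booleans only stand in for the Kconfig symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* PCR numbers taken from the Kconfig help text in this patch. */
#define SL_DEF_CONFIG_PCR 18
#define SL_DEF_IMAGE_PCR  17
#define SL_ALT_CONFIG_PCR 19
#define SL_ALT_IMAGE_PCR  20

/* Hypothetical container for the two PCR indexes (not in the series). */
struct sl_pcr_choice {
	int config_pcr;	/* configuration information */
	int image_pcr;	/* image data such as an external initrd */
};

/*
 * Hypothetical runtime model of SECURE_LAUNCH_ALT_PCR19 and
 * SECURE_LAUNCH_ALT_PCR20. Each switch moves one class of
 * measurement away from its default PCR.
 */
static void sl_pick_pcrs(bool alt_pcr19, bool alt_pcr20,
			 struct sl_pcr_choice *out)
{
	assert(out != NULL);

	out->config_pcr = alt_pcr19 ? SL_ALT_CONFIG_PCR : SL_DEF_CONFIG_PCR;
	out->image_pcr = alt_pcr20 ? SL_ALT_IMAGE_PCR : SL_DEF_IMAGE_PCR;
}
```

The two options are independent, so enabling only one of them leaves the
other class of measurement in its default PCR.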
arch/x86/Kconfig | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0045e1b..65d69f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1989,6 +1989,38 @@ config EFI_MIXED
If unsure, say N.
+config SECURE_LAUNCH
+ bool "Secure Launch support"
+ default n
+ depends on X86_64 && X86_X2APIC
+ help
+ The Secure Launch feature allows a kernel to be loaded
+ directly through an Intel TXT measured launch. Intel TXT
+ establishes a Dynamic Root of Trust for Measurement (DRTM)
+ where the CPU measures the kernel image. This feature then
+ continues the measurement chain over kernel configuration
+ information and init images.
+
+config SECURE_LAUNCH_ALT_PCR19
+ bool "Secure Launch Alternate PCR 19 usage"
+ default n
+ depends on SECURE_LAUNCH
+ help
+ In the post ACM environment, Secure Launch by default measures
+ configuration information into PCR18. This feature allows finer
+ control over measurements by moving configuration measurements
+ into PCR19.
+
+config SECURE_LAUNCH_ALT_PCR20
+ bool "Secure Launch Alternate PCR 20 usage"
+ default n
+ depends on SECURE_LAUNCH
+ help
+ In the post ACM environment, Secure Launch by default measures
+ image data like any external initrd into PCR17. This feature
+ allows finer control over measurements by moving image measurements
+ into PCR20.
+
source "kernel/Kconfig.hz"
config KEXEC
--
1.8.3.1
Prior to running the next kernel via kexec, the Secure Launch code
closes down private SMX resources and performs an SEXIT. This allows the
next kernel to start normally, without any issues bringing up the APs.
Signed-off-by: Ross Philipson <[email protected]>
---
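The ordering of the private register writes in slaunch_finalize() can be
sketched in user space with a mock register bank that only records
accesses. The register offsets are the ones from include/linux/slaunch.h
in this series; struct mock_txt and the mock_*() helpers are
hypothetical and do not touch real MMIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets from include/linux/slaunch.h in this series. */
#define TXT_CR_CMD_CLOSE_PRIVATE     0x0048
#define TXT_CR_CMD_UNLOCK_MEM_CONFIG 0x0218
#define TXT_CR_CMD_NO_SECRETS        0x08e8
#define TXT_CR_E2STS                 0x08f0

#define MOCK_LOG_MAX 16

/* Hypothetical mock of a register bank: records accesses in order. */
struct mock_txt {
	uint32_t offsets[MOCK_LOG_MAX];
	int is_write[MOCK_LOG_MAX];
	size_t count;
};

static void mock_access(struct mock_txt *t, uint32_t off, int is_write)
{
	assert(t != NULL && t->count < MOCK_LOG_MAX);

	t->offsets[t->count] = off;
	t->is_write[t->count] = is_write;
	t->count++;
}

/* Write a command register, then read E2STS back as a fence. */
static void mock_cmd(struct mock_txt *t, uint32_t cmd)
{
	mock_access(t, cmd, 1);
	mock_access(t, TXT_CR_E2STS, 0);
}

/* Same command order as the private register part of slaunch_finalize(). */
static void mock_finalize(struct mock_txt *t)
{
	mock_cmd(t, TXT_CR_CMD_NO_SECRETS);
	mock_cmd(t, TXT_CR_CMD_UNLOCK_MEM_CONFIG);
	mock_cmd(t, TXT_CR_CMD_CLOSE_PRIVATE);
}
```

Each command write is followed by a read of TXT_CR_E2STS, mirroring the
write-then-read pairs in the patch.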
arch/x86/kernel/slaunch.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++
kernel/kexec_core.c | 4 +++
2 files changed, 75 insertions(+)
diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
index a24d384..8db557a 100644
--- a/arch/x86/kernel/slaunch.c
+++ b/arch/x86/kernel/slaunch.c
@@ -470,3 +470,74 @@ void __init slaunch_setup(void)
vendor[3] == INTEL_CPUID_MFGID_EDX)
slaunch_setup_intel();
}
+
+static inline void smx_getsec_sexit(void)
+{
+ asm volatile (".byte 0x0f,0x37\n"
+ : : "a" (SMX_X86_GETSEC_SEXIT));
+}
+
+void slaunch_finalize(int do_sexit)
+{
+ void __iomem *config;
+ u64 one = TXT_REGVALUE_ONE, val;
+
+ if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) !=
+ (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT))
+ return;
+
+ config = ioremap(TXT_PRIV_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+ PAGE_SIZE);
+ if (!config) {
+ pr_emerg("Error SEXIT failed to ioremap TXT private regs\n");
+ return;
+ }
+
+ /* Clear secrets bit for SEXIT */
+ memcpy_toio(config + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one));
+ memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+ /* Unlock memory configurations */
+ memcpy_toio(config + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one));
+ memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+ /* Close the TXT private register space */
+ memcpy_toio(config + TXT_CR_CMD_CLOSE_PRIVATE, &one, sizeof(one));
+ memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+ /*
+ * Calls to iounmap are not being done because of the state of the
+ * system this late in the kexec process. Local IRQs are disabled and
+ * iounmap causes a TLB flush which in turn causes a warning. Leaving
+ * these mappings is not an issue since the next kernel is going to
+ * completely re-setup memory management.
+ */
+
+ /* Map public registers and do a final read fence */
+ config = ioremap(TXT_PUB_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+ PAGE_SIZE);
+ if (!config) {
+ pr_emerg("Error SEXIT failed to ioremap TXT public regs\n");
+ return;
+ }
+
+ memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+ pr_emerg("TXT clear secrets bit and unlock memory complete.\n");
+
+ if (!do_sexit)
+ return;
+
+ if (smp_processor_id() != 0) {
+ pr_emerg("Error TXT SEXIT must be called on CPU 0\n");
+ return;
+ }
+
+ /* Ensure CR4.SMXE is set, GETSEC requires SMX to be enabled */
+ cr4_set_bits(X86_CR4_SMXE);
+
+ /* Do the SEXIT SMX operation */
+ smx_getsec_sexit();
+
+ pr_emerg("TXT SEXIT complete.\n");
+}
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index f099bae..1dcf20c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -38,6 +38,7 @@
#include <linux/hugetlb.h>
#include <linux/objtool.h>
#include <linux/kmsg_dump.h>
+#include <linux/slaunch.h>
#include <asm/page.h>
#include <asm/sections.h>
@@ -1178,6 +1179,9 @@ int kernel_kexec(void)
cpu_hotplug_enable();
pr_notice("Starting new kernel\n");
machine_shutdown();
+
+ /* Finalize TXT registers and do SEXIT */
+ slaunch_finalize(1);
}
kmsg_dump(KMSG_DUMP_SHUTDOWN);
--
1.8.3.1
The routine slaunch_setup is called out of the x86 specific setup_arch
routine during early kernel boot. After determining what platform is
present, various operations specific to that platform occur. This
includes finalizing settings for the platform late launch and verifying
that memory protections are in place.
For TXT, this code also reserves the original compressed kernel setup
area where the APs were left looping so that this memory cannot be used.
Signed-off-by: Ross Philipson <[email protected]>
---
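The table-at-a-time heap walk done by txt_early_get_heap_table() can be
sketched in user space over an in-memory copy of the heap. This is a
simplified illustration, not the kernel code: it assumes each table
starts with a u64 size that includes the size field itself, and the
helper heap_find_table() is hypothetical. Unlike the kernel routine, it
returns NULL on a malformed heap instead of doing a TXT reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Table numbering from include/linux/slaunch.h in this series. */
#define TXT_BIOS_DATA_TABLE      1
#define TXT_OS_MLE_DATA_TABLE    2
#define TXT_OS_SINIT_DATA_TABLE  3
#define TXT_SINIT_MLE_DATA_TABLE 4
#define TXT_SINIT_TABLE_MAX      TXT_SINIT_MLE_DATA_TABLE

/*
 * Hypothetical walk over a copy of the heap. Returns a pointer to the
 * payload of table 'type' (just past its u64 size field) and stores the
 * payload length, or returns NULL if the heap looks malformed.
 */
static const uint8_t *heap_find_table(const uint8_t *heap, size_t heap_len,
				      unsigned int type, uint64_t *payload_len)
{
	size_t pos = 0;
	uint64_t size = 0;
	unsigned int i;

	assert(heap != NULL && payload_len != NULL);

	if (type < TXT_BIOS_DATA_TABLE || type > TXT_SINIT_TABLE_MAX)
		return NULL;

	for (i = TXT_BIOS_DATA_TABLE; ; i++) {
		/* There must be room left for the size field. */
		if (heap_len - pos < sizeof(size))
			return NULL;

		memcpy(&size, heap + pos, sizeof(size));

		/* A zero or undersized entry means the heap is corrupted. */
		if (size < sizeof(size) || size > heap_len - pos)
			return NULL;

		if (i == type)
			break;

		/* Step over this table to the start of the next one. */
		pos += (size_t)size;
	}

	*payload_len = size - sizeof(size);
	return heap + pos + sizeof(size);
}
```

As in the kernel routine, the returned pointer skips the size field at
the head of the table, which is why the callers in this patch subtract 8
from offsets that are relative to the start of the table.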
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/setup.c | 3 +
arch/x86/kernel/slaunch.c | 472 +++++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/intel/dmar.c | 4 +
4 files changed, 480 insertions(+)
create mode 100644 arch/x86/kernel/slaunch.c
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f66682..574e643 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -80,6 +80,7 @@ obj-$(CONFIG_X86_32) += tls.o
obj-$(CONFIG_IA32_EMULATION) += tls.o
obj-y += step.o
obj-$(CONFIG_INTEL_TXT) += tboot.o
+obj-$(CONFIG_SECURE_LAUNCH) += slaunch.o
obj-$(CONFIG_ISA_DMA_API) += i8237.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += cpu/
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1e72062..7f5d12a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -18,6 +18,7 @@
#include <linux/root_dev.h>
#include <linux/hugetlb.h>
#include <linux/tboot.h>
+#include <linux/slaunch.h>
#include <linux/usb/xhci-dbgp.h>
#include <linux/static_call.h>
#include <linux/swiotlb.h>
@@ -993,6 +994,8 @@ void __init setup_arch(char **cmdline_p)
early_gart_iommu_check();
#endif
+ slaunch_setup();
+
/*
* partially used pages are not usable - thus
* we are rounding upwards:
diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
new file mode 100644
index 00000000..a24d384
--- /dev/null
+++ b/arch/x86/kernel/slaunch.c
@@ -0,0 +1,472 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure Launch late validation/setup and finalization support.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/linkage.h>
+#include <linux/mm.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+#include <linux/security.h>
+#include <linux/memblock.h>
+#include <asm/segment.h>
+#include <asm/sections.h>
+#include <asm/tlbflush.h>
+#include <asm/e820/api.h>
+#include <asm/setup.h>
+#include <linux/slaunch.h>
+
+static u32 sl_flags;
+static struct sl_ap_wake_info ap_wake_info;
+static u64 evtlog_addr;
+static u32 evtlog_size;
+static u64 vtd_pmr_lo_size;
+
+/* This should be plenty of room */
+static u8 txt_dmar[PAGE_SIZE] __aligned(16);
+
+u32 slaunch_get_flags(void)
+{
+ return sl_flags;
+}
+EXPORT_SYMBOL(slaunch_get_flags);
+
+struct sl_ap_wake_info *slaunch_get_ap_wake_info(void)
+{
+ return &ap_wake_info;
+}
+
+struct acpi_table_header *slaunch_get_dmar_table(struct acpi_table_header *dmar)
+{
+ /* The DMAR is only stashed and provided via TXT on Intel systems */
+ if (memcmp(txt_dmar, "DMAR", 4))
+ return dmar;
+
+ return (struct acpi_table_header *)(&txt_dmar[0]);
+}
+
+void __noreturn slaunch_txt_reset(void __iomem *txt,
+ const char *msg, u64 error)
+{
+ u64 one = 1, val;
+
+ pr_err("%s", msg);
+
+ /*
+ * This performs a TXT reset with a sticky error code. The reads of
+ * TXT_CR_E2STS act as barriers.
+ */
+ memcpy_toio(txt + TXT_CR_ERRORCODE, &error, sizeof(error));
+ memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+ memcpy_toio(txt + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one));
+ memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+ memcpy_toio(txt + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one));
+ memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+ memcpy_toio(txt + TXT_CR_CMD_RESET, &one, sizeof(one));
+
+ for ( ; ; )
+ asm volatile ("hlt");
+
+ unreachable();
+}
+
+/*
+ * The TXT heap is too big to map all at once with early_ioremap
+ * so it is done a table at a time.
+ */
+static void __init *txt_early_get_heap_table(void __iomem *txt, u32 type,
+ u32 bytes)
+{
+ void *heap;
+ u64 base, size, offset = 0;
+ int i;
+
+ if (type > TXT_SINIT_TABLE_MAX)
+ slaunch_txt_reset(txt,
+ "Error invalid table type for early heap walk\n",
+ SL_ERROR_HEAP_WALK);
+
+ memcpy_fromio(&base, txt + TXT_CR_HEAP_BASE, sizeof(base));
+ memcpy_fromio(&size, txt + TXT_CR_HEAP_SIZE, sizeof(size));
+
+ /* Iterate over heap tables looking for table of "type" */
+ for (i = 0; i < type; i++) {
+ base += offset;
+ heap = early_memremap(base, sizeof(u64));
+ if (!heap)
+ slaunch_txt_reset(txt,
+ "Error early_memremap of heap for heap walk\n",
+ SL_ERROR_HEAP_MAP);
+
+ offset = *((u64 *)heap);
+
+ /*
+ * After the first iteration, any offset of zero is invalid and
+ * implies the TXT heap is corrupted.
+ */
+ if (!offset)
+ slaunch_txt_reset(txt,
+ "Error invalid 0 offset in heap walk\n",
+ SL_ERROR_HEAP_ZERO_OFFSET);
+
+ early_memunmap(heap, sizeof(u64));
+ }
+
+ /* Skip the size field at the head of each table */
+ base += sizeof(u64);
+ heap = early_memremap(base, bytes);
+ if (!heap)
+ slaunch_txt_reset(txt,
+ "Error early_memremap of heap section\n",
+ SL_ERROR_HEAP_MAP);
+
+ return heap;
+}
+
+static void __init txt_early_put_heap_table(void *addr, unsigned long size)
+{
+ early_memunmap(addr, size);
+}
+
+/*
+ * TXT uses a special set of VT-d registers to protect all of memory from DMA
+ * until the IOMMU can be programmed to protect memory. There is the low
+ * memory PMR that can protect all memory up to 4G. The high memory PMR can
+ * be set up to protect all memory beyond 4G. Validate that these values cover
+ * what is expected.
+ */
+static void __init slaunch_verify_pmrs(void __iomem *txt)
+{
+ struct txt_os_sinit_data *os_sinit_data;
+ unsigned long last_pfn;
+ u32 field_offset, err = 0;
+ const char *errmsg = "";
+
+ field_offset = offsetof(struct txt_os_sinit_data, lcp_po_base);
+ os_sinit_data = txt_early_get_heap_table(txt, TXT_OS_SINIT_DATA_TABLE,
+ field_offset);
+
+ /* Save a copy */
+ vtd_pmr_lo_size = os_sinit_data->vtd_pmr_lo_size;
+
+ last_pfn = e820__end_of_ram_pfn();
+
+ /*
+ * First make sure the hi PMR covers all memory above 4G. In the
+ * unlikely case where there is < 4G on the system, the hi PMR will
+ * not be set.
+ */
+ if (os_sinit_data->vtd_pmr_hi_base != 0x0ULL) {
+ if (os_sinit_data->vtd_pmr_hi_base != 0x100000000ULL) {
+ err = SL_ERROR_HI_PMR_BASE;
+ errmsg = "Error hi PMR base\n";
+ goto out;
+ }
+
+ if (PFN_PHYS(last_pfn) > os_sinit_data->vtd_pmr_hi_base +
+ os_sinit_data->vtd_pmr_hi_size) {
+ err = SL_ERROR_HI_PMR_SIZE;
+ errmsg = "Error hi PMR size\n";
+ goto out;
+ }
+ }
+
+ /*
+ * Lo PMR base should always be 0. This was already checked in
+ * early stub.
+ */
+
+ /*
+ * Check that if the kernel was loaded below 4G, that it is protected
+ * by the lo PMR. Note this is the decompressed kernel. The ACM would
+ * have ensured the compressed kernel (the MLE image) was protected.
+ */
+ if ((__pa_symbol(_end) < 0x100000000ULL) &&
+ (__pa_symbol(_end) > os_sinit_data->vtd_pmr_lo_size)) {
+ err = SL_ERROR_LO_PMR_MLE;
+ errmsg = "Error lo PMR does not cover MLE kernel\n";
+ }
+
+ /*
+ * Other regions of interest like boot param, AP wake block, cmdline
+ * already checked for PMR coverage in the early stub code.
+ */
+
+out:
+ txt_early_put_heap_table(os_sinit_data, field_offset);
+
+ if (err)
+ slaunch_txt_reset(txt, errmsg, err);
+}
+
+static void __init slaunch_txt_reserve_range(u64 base, u64 size)
+{
+ int type;
+
+ type = e820__get_entry_type(base, base + size - 1);
+ if (type == E820_TYPE_RAM) {
+ pr_info("memblock reserve base: %llx size: %llx\n", base, size);
+ memblock_reserve(base, size);
+ }
+}
+
+/*
+ * For Intel, certain regions of memory must be marked as reserved by putting
+ * them on the memblock reserved list if they are not already e820 reserved.
+ * This includes:
+ * - The TXT HEAP
+ * - The ACM area
+ * - The TXT private register bank
+ * - The MDR list sent to the MLE by the ACM (see TXT specification)
+ * (Normally the above are properly reserved by firmware but if it was not
+ * done, reserve them now)
+ * - The AP wake block
+ * - TPM log external to the TXT heap
+ *
+ * Also if the low PMR doesn't cover all memory < 4G, any RAM regions above
+ * the low PMR must be reserved too.
+ */
+static void __init slaunch_txt_reserve(void __iomem *txt)
+{
+ struct txt_sinit_memory_descriptor_record *mdr;
+ struct txt_sinit_mle_data *sinit_mle_data;
+ void *mdrs;
+ u64 base, size, heap_base, heap_size;
+ u32 field_offset, mdrnum, mdroffset, mdrslen, i;
+
+ base = TXT_PRIV_CONFIG_REGS_BASE;
+ size = TXT_PUB_CONFIG_REGS_BASE - TXT_PRIV_CONFIG_REGS_BASE;
+ slaunch_txt_reserve_range(base, size);
+
+ memcpy_fromio(&heap_base, txt + TXT_CR_HEAP_BASE, sizeof(heap_base));
+ memcpy_fromio(&heap_size, txt + TXT_CR_HEAP_SIZE, sizeof(heap_size));
+ slaunch_txt_reserve_range(heap_base, heap_size);
+
+ memcpy_fromio(&base, txt + TXT_CR_SINIT_BASE, sizeof(base));
+ memcpy_fromio(&size, txt + TXT_CR_SINIT_SIZE, sizeof(size));
+ slaunch_txt_reserve_range(base, size);
+
+ field_offset = offsetof(struct txt_sinit_mle_data,
+ sinit_vtd_dmar_table_size);
+ sinit_mle_data = txt_early_get_heap_table(txt, TXT_SINIT_MLE_DATA_TABLE,
+ field_offset);
+
+ mdrnum = sinit_mle_data->num_of_sinit_mdrs;
+ mdroffset = sinit_mle_data->sinit_mdrs_table_offset;
+
+ txt_early_put_heap_table(sinit_mle_data, field_offset);
+
+ if (!mdrnum)
+ goto nomdr;
+
+ mdrslen = (mdrnum * sizeof(struct txt_sinit_memory_descriptor_record));
+
+ mdrs = txt_early_get_heap_table(txt, TXT_SINIT_MLE_DATA_TABLE,
+ mdroffset + mdrslen - 8);
+
+ mdr = (struct txt_sinit_memory_descriptor_record *)
+ (mdrs + mdroffset - 8);
+
+ for (i = 0; i < mdrnum; i++, mdr++) {
+ /* Spec says some entries can have length 0, ignore them */
+ if (mdr->type > 0 && mdr->length > 0)
+ slaunch_txt_reserve_range(mdr->address, mdr->length);
+ }
+
+ txt_early_put_heap_table(mdrs, mdroffset + mdrslen - 8);
+
+nomdr:
+ slaunch_txt_reserve_range(ap_wake_info.ap_wake_block,
+ ap_wake_info.ap_wake_block_size);
+
+ /*
+ * Earlier checks ensured that the event log was properly situated
+ * either inside the TXT heap or outside. This is a check to see if the
+ * event log needs to be reserved. If it is in the TXT heap, it is
+ * already reserved.
+ */
+ if (evtlog_addr < heap_base || evtlog_addr >= (heap_base + heap_size))
+ slaunch_txt_reserve_range(evtlog_addr, evtlog_size);
+
+ for (i = 0; i < e820_table->nr_entries; i++) {
+ base = e820_table->entries[i].addr;
+ size = e820_table->entries[i].size;
+ if ((base >= vtd_pmr_lo_size) && (base < 0x100000000ULL))
+ slaunch_txt_reserve_range(base, size);
+ else if ((base < vtd_pmr_lo_size) &&
+ (base + size > vtd_pmr_lo_size))
+ slaunch_txt_reserve_range(vtd_pmr_lo_size,
+ base + size - vtd_pmr_lo_size);
+ }
+}
+
+/*
+ * TXT stashes a safe copy of the DMAR ACPI table to prevent tampering.
+ * It is stored in the TXT heap. Fetch it from there and make it available
+ * to the IOMMU driver.
+ */
+static void __init slaunch_copy_dmar_table(void __iomem *txt)
+{
+ struct txt_sinit_mle_data *sinit_mle_data;
+ void *dmar;
+ u32 field_offset, dmar_size, dmar_offset;
+
+ memset(&txt_dmar, 0, PAGE_SIZE);
+
+ field_offset = offsetof(struct txt_sinit_mle_data,
+ processor_scrtm_status);
+ sinit_mle_data = txt_early_get_heap_table(txt, TXT_SINIT_MLE_DATA_TABLE,
+ field_offset);
+
+ dmar_size = sinit_mle_data->sinit_vtd_dmar_table_size;
+ dmar_offset = sinit_mle_data->sinit_vtd_dmar_table_offset;
+
+ txt_early_put_heap_table(sinit_mle_data, field_offset);
+
+ if (!dmar_size || !dmar_offset)
+ slaunch_txt_reset(txt,
+ "Error invalid DMAR table values\n",
+ SL_ERROR_HEAP_INVALID_DMAR);
+
+ if (unlikely(dmar_size > PAGE_SIZE))
+ slaunch_txt_reset(txt,
+ "Error DMAR too big to store\n",
+ SL_ERROR_HEAP_DMAR_SIZE);
+
+
+ dmar = txt_early_get_heap_table(txt, TXT_SINIT_MLE_DATA_TABLE,
+ dmar_offset + dmar_size - 8);
+ if (!dmar)
+ slaunch_txt_reset(txt,
+ "Error early_memremap of DMAR\n",
+ SL_ERROR_HEAP_DMAR_MAP);
+
+ memcpy(&txt_dmar[0], dmar + dmar_offset - 8, dmar_size);
+
+ txt_early_put_heap_table(dmar, dmar_offset + dmar_size - 8);
+}
+
+/*
+ * The location of the safe AP wake code block is stored in the TXT heap.
+ * Fetch it here in the early init code for later use in SMP startup.
+ *
+ * Also get the TPM event log values that may have to be put on the
+ * memblock reserve list later.
+ */
+static void __init slaunch_fetch_os_mle_fields(void __iomem *txt)
+{
+ struct txt_os_mle_data *os_mle_data;
+ u8 *jmp_offset;
+
+ os_mle_data = txt_early_get_heap_table(txt, TXT_OS_MLE_DATA_TABLE,
+ sizeof(*os_mle_data));
+
+ ap_wake_info.ap_wake_block = os_mle_data->ap_wake_block;
+ ap_wake_info.ap_wake_block_size = os_mle_data->ap_wake_block_size;
+
+ jmp_offset = os_mle_data->mle_scratch + SL_SCRATCH_AP_JMP_OFFSET;
+ ap_wake_info.ap_jmp_offset = *((u32 *)jmp_offset);
+
+ evtlog_addr = os_mle_data->evtlog_addr;
+ evtlog_size = os_mle_data->evtlog_size;
+
+ txt_early_put_heap_table(os_mle_data, sizeof(*os_mle_data));
+}
+
+/*
+ * Intel specific late stub setup and validation.
+ */
+static void __init slaunch_setup_intel(void)
+{
+ void __iomem *txt;
+ u64 one = TXT_REGVALUE_ONE, val;
+
+ /*
+ * First see if SENTER was done and not by TBOOT by reading the status
+ * register in the public space.
+ */
+ txt = early_ioremap(TXT_PUB_CONFIG_REGS_BASE,
+ TXT_NR_CONFIG_PAGES * PAGE_SIZE);
+ if (!txt) {
+ /* This is really bad, nowhere to go from here */
+ panic("Error early_ioremap of TXT pub registers\n");
+ }
+
+ memcpy_fromio(&val, txt + TXT_CR_STS, sizeof(val));
+ early_iounmap(txt, TXT_NR_CONFIG_PAGES * PAGE_SIZE);
+
+ /* Was SENTER done? */
+ if (!(val & TXT_SENTER_DONE_STS))
+ return;
+
+ /* Was it done by TBOOT? */
+ if (boot_params.tboot_addr)
+ return;
+
+ /* Now we want to use the private register space */
+ txt = early_ioremap(TXT_PRIV_CONFIG_REGS_BASE,
+ TXT_NR_CONFIG_PAGES * PAGE_SIZE);
+ if (!txt) {
+ /* This is really bad, nowhere to go from here */
+ panic("Error early_ioremap of TXT priv registers\n");
+ }
+
+ /*
+ * Try to read the Intel VID from the TXT private registers to see if
+ * TXT measured launch happened properly and the private space is
+ * available.
+ */
+ memcpy_fromio(&val, txt + TXT_CR_DIDVID, sizeof(val));
+ if ((u16)(val & 0xffff) != 0x8086) {
+ /*
+ * Can't do a proper TXT reset since it appears something is
+ * wrong even though SENTER happened and it should be in SMX
+ * mode.
+ */
+ panic("Invalid TXT vendor ID, not in SMX mode\n");
+ }
+
+ /* Set flags so subsequent code knows the status of the launch */
+ sl_flags |= (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT);
+
+ /*
+ * Reading the proper DIDVID from the private register space means we
+ * are in SMX mode and private registers are open for read/write.
+ */
+
+ /* On Intel, have to handle TPM localities via TXT */
+ memcpy_toio(txt + TXT_CR_CMD_SECRETS, &one, sizeof(one));
+ memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+ memcpy_toio(txt + TXT_CR_CMD_OPEN_LOCALITY1, &one, sizeof(one));
+ memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+
+ slaunch_fetch_os_mle_fields(txt);
+
+ slaunch_verify_pmrs(txt);
+
+ slaunch_txt_reserve(txt);
+
+ slaunch_copy_dmar_table(txt);
+
+ early_iounmap(txt, TXT_NR_CONFIG_PAGES * PAGE_SIZE);
+
+ pr_info("Intel TXT setup complete\n");
+}
+
+void __init slaunch_setup(void)
+{
+ u32 vendor[4];
+
+ /* Get manufacturer string with CPUID 0 */
+ cpuid(0, &vendor[0], &vendor[1], &vendor[2], &vendor[3]);
+
+ /* Only Intel TXT is supported at this point */
+ if (vendor[1] == INTEL_CPUID_MFGID_EBX &&
+ vendor[2] == INTEL_CPUID_MFGID_ECX &&
+ vendor[3] == INTEL_CPUID_MFGID_EDX)
+ slaunch_setup_intel();
+}
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 84057cb..ba7100c 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -29,6 +29,7 @@
#include <linux/iommu.h>
#include <linux/numa.h>
#include <linux/limits.h>
+#include <linux/slaunch.h>
#include <asm/irq_remapping.h>
#include <asm/iommu_table.h>
#include <trace/events/intel_iommu.h>
@@ -662,6 +663,9 @@ static inline int dmar_walk_dmar_table(struct acpi_table_dmar *dmar,
*/
dmar_tbl = tboot_get_dmar_table(dmar_tbl);
+ /* If Secure Launch is active, it has similar logic */
+ dmar_tbl = slaunch_get_dmar_table(dmar_tbl);
+
dmar = (struct acpi_table_dmar *)dmar_tbl;
if (!dmar)
return -ENODEV;
--
1.8.3.1
During a Secure Launch, the IOMMU must always use the default translated
domain type so that the MLE stays protected from DMA after the PMRs are
disabled.
Signed-off-by: Ross Philipson <[email protected]>
---
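The effect of the iommu_set_default_passthrough() change can be modeled
in user space as follows. SL_FLAG_ACTIVE has the value from
include/linux/slaunch.h in this series; enum demo_domain and
demo_set_default_passthrough() are hypothetical stand-ins for the
kernel's domain types and function, used here only to show the decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value from include/linux/slaunch.h in this series. */
#define SL_FLAG_ACTIVE 0x00000001

/* Hypothetical stand-ins for the kernel's default domain types. */
enum demo_domain {
	DEMO_DOMAIN_IDENTITY,	/* passthrough, no DMA translation */
	DEMO_DOMAIN_DMA,	/* translated, DMA goes through the IOMMU */
};

/*
 * Hypothetical model of a passthrough request: the default domain type
 * only becomes identity when Secure Launch is not active. Otherwise the
 * request is ignored and the current type is left untouched.
 */
static void demo_set_default_passthrough(uint32_t sl_flags,
					 enum demo_domain *def_type)
{
	assert(def_type != NULL);

	if (!(sl_flags & SL_FLAG_ACTIVE))
		*def_type = DEMO_DOMAIN_IDENTITY;
}
```

With Secure Launch active, a passthrough request therefore has no effect
on the default domain type.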
drivers/iommu/intel/iommu.c | 5 +++++
drivers/iommu/iommu.c | 6 +++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284..4f0256d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -41,6 +41,7 @@
#include <linux/dma-direct.h>
#include <linux/crash_dump.h>
#include <linux/numa.h>
+#include <linux/slaunch.h>
#include <asm/irq_remapping.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
@@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev)
*/
static int device_def_domain_type(struct device *dev)
{
+ /* Do not allow identity domain when Secure Launch is active */
+ if (slaunch_get_flags() & SL_FLAG_ACTIVE)
+ return IOMMU_DOMAIN_DMA;
+
if (dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(dev);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d..d49b7dd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -23,6 +23,7 @@
#include <linux/property.h>
#include <linux/fsl/mc.h>
#include <linux/module.h>
+#include <linux/slaunch.h>
#include <trace/events/iommu.h>
static struct kset *iommu_group_kset;
@@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
{
if (cmd_line)
iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
- iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
+
+ /* Do not allow identity domain when Secure Launch is active */
+ if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
+ iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
}
void iommu_set_default_translated(bool cmd_line)
--
1.8.3.1
Introduce the main Secure Launch header file used in the early SL stub
and the early setup code.
Signed-off-by: Ross Philipson <[email protected]>
---
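The CPUID manufacturer ID constants in this header are the vendor string
packed four bytes per register, least significant byte first. A small
user-space sketch shows how they decode. The constants are copied from
the header below; put_reg() and cpuid_vendor_string() are hypothetical
helpers that take the register values as arguments rather than
executing CPUID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Manufacturer ID constants from include/linux/slaunch.h in this series. */
#define INTEL_CPUID_MFGID_EBX 0x756e6547 /* Genu */
#define INTEL_CPUID_MFGID_EDX 0x49656e69 /* ineI */
#define INTEL_CPUID_MFGID_ECX 0x6c65746e /* ntel */

#define AMD_CPUID_MFGID_EBX   0x68747541 /* Auth */
#define AMD_CPUID_MFGID_EDX   0x69746e65 /* enti */
#define AMD_CPUID_MFGID_ECX   0x444d4163 /* cAMD */

/* Store the four bytes of a register, least significant byte first. */
static void put_reg(char *dst, uint32_t reg)
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/*
 * Hypothetical helper: CPUID leaf 0 returns the vendor string in
 * EBX, EDX, ECX order. 'out' must have room for 13 bytes.
 */
static void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
				char *out)
{
	assert(out != NULL);

	put_reg(out, ebx);
	put_reg(out + 4, edx);
	put_reg(out + 8, ecx);
	out[12] = '\0';
}
```

Note the EBX, EDX, ECX order of the string, which differs from the EBX,
ECX, EDX order in which the kernel's cpuid() helper returns the
registers.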
include/linux/slaunch.h | 540 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 540 insertions(+)
create mode 100644 include/linux/slaunch.h
diff --git a/include/linux/slaunch.h b/include/linux/slaunch.h
new file mode 100644
index 00000000..dd8d92e
--- /dev/null
+++ b/include/linux/slaunch.h
@@ -0,0 +1,540 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Main Secure Launch header file.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#ifndef _LINUX_SLAUNCH_H
+#define _LINUX_SLAUNCH_H
+
+/*
+ * Secure Launch Defined State Flags
+ */
+#define SL_FLAG_ACTIVE 0x00000001
+#define SL_FLAG_ARCH_SKINIT 0x00000002
+#define SL_FLAG_ARCH_TXT 0x00000004
+
+/*
+ * Secure Launch CPU Type
+ */
+#define SL_CPU_AMD 1
+#define SL_CPU_INTEL 2
+
+#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
+
+#define __SL32_CS 0x0008
+#define __SL32_DS 0x0010
+
+#define INTEL_CPUID_MFGID_EBX 0x756e6547 /* Genu */
+#define INTEL_CPUID_MFGID_EDX 0x49656e69 /* ineI */
+#define INTEL_CPUID_MFGID_ECX 0x6c65746e /* ntel */
+
+#define AMD_CPUID_MFGID_EBX 0x68747541 /* Auth */
+#define AMD_CPUID_MFGID_EDX 0x69746e65 /* enti */
+#define AMD_CPUID_MFGID_ECX 0x444d4163 /* cAMD */
+
+/*
+ * Intel Safer Mode Extensions (SMX)
+ *
+ * Intel SMX provides a programming interface to establish a Measured Launched
+ * Environment (MLE). The measurement and protection mechanisms are supported
+ * by the capabilities of an Intel Trusted Execution Technology (TXT) platform.
+ * SMX is the processor's programming interface in an Intel TXT platform.
+ *
+ * See Intel SDM Volume 2 - 6.1 "Safer Mode Extensions Reference"
+ */
+
+/*
+ * SMX GETSEC Leaf Functions
+ */
+#define SMX_X86_GETSEC_SEXIT 5
+#define SMX_X86_GETSEC_SMCTRL 7
+#define SMX_X86_GETSEC_WAKEUP 8
+
+/*
+ * Intel Trusted Execution Technology MMIO Registers Banks
+ */
+#define TXT_PUB_CONFIG_REGS_BASE 0xfed30000
+#define TXT_PRIV_CONFIG_REGS_BASE 0xfed20000
+#define TXT_NR_CONFIG_PAGES ((TXT_PUB_CONFIG_REGS_BASE - \
+ TXT_PRIV_CONFIG_REGS_BASE) >> PAGE_SHIFT)
+
+/*
+ * Intel Trusted Execution Technology (TXT) Registers
+ */
+#define TXT_CR_STS 0x0000
+#define TXT_CR_ESTS 0x0008
+#define TXT_CR_ERRORCODE 0x0030
+#define TXT_CR_CMD_RESET 0x0038
+#define TXT_CR_CMD_CLOSE_PRIVATE 0x0048
+#define TXT_CR_DIDVID 0x0110
+#define TXT_CR_VER_EMIF 0x0200
+#define TXT_CR_CMD_UNLOCK_MEM_CONFIG 0x0218
+#define TXT_CR_SINIT_BASE 0x0270
+#define TXT_CR_SINIT_SIZE 0x0278
+#define TXT_CR_MLE_JOIN 0x0290
+#define TXT_CR_HEAP_BASE 0x0300
+#define TXT_CR_HEAP_SIZE 0x0308
+#define TXT_CR_SCRATCHPAD 0x0378
+#define TXT_CR_CMD_OPEN_LOCALITY1 0x0380
+#define TXT_CR_CMD_CLOSE_LOCALITY1 0x0388
+#define TXT_CR_CMD_OPEN_LOCALITY2 0x0390
+#define TXT_CR_CMD_CLOSE_LOCALITY2 0x0398
+#define TXT_CR_CMD_SECRETS 0x08e0
+#define TXT_CR_CMD_NO_SECRETS 0x08e8
+#define TXT_CR_E2STS 0x08f0
+
+/* TXT default register value */
+#define TXT_REGVALUE_ONE 0x1ULL
+
+/* TXT_CR_STS status bits */
+#define TXT_SENTER_DONE_STS (1<<0)
+#define TXT_SEXIT_DONE_STS (1<<1)
+
+/*
+ * SINIT/MLE Capabilities Field Bit Definitions
+ */
+#define TXT_SINIT_MLE_CAP_WAKE_GETSEC 0
+#define TXT_SINIT_MLE_CAP_WAKE_MONITOR 1
+
+/*
+ * OS/MLE Secure Launch Specific Definitions
+ */
+#define TXT_OS_MLE_STRUCT_VERSION 1
+#define TXT_OS_MLE_MAX_VARIABLE_MTRRS 32
+
+/*
+ * TXT Heap Table Enumeration
+ */
+#define TXT_BIOS_DATA_TABLE 1
+#define TXT_OS_MLE_DATA_TABLE 2
+#define TXT_OS_SINIT_DATA_TABLE 3
+#define TXT_SINIT_MLE_DATA_TABLE 4
+#define TXT_SINIT_TABLE_MAX TXT_SINIT_MLE_DATA_TABLE
+
+/*
+ * Secure Launch Defined Error Codes used in MLE-initiated TXT resets.
+ *
+ * TXT Specification
+ * Appendix I ACM Error Codes
+ */
+#define SL_ERROR_GENERIC 0xc0008001
+#define SL_ERROR_TPM_INIT 0xc0008002
+#define SL_ERROR_TPM_INVALID_LOG20 0xc0008003
+#define SL_ERROR_TPM_LOGGING_FAILED 0xc0008004
+#define SL_ERROR_REGION_STRADDLE_4GB 0xc0008005
+#define SL_ERROR_TPM_EXTEND 0xc0008006
+#define SL_ERROR_MTRR_INV_VCNT 0xc0008007
+#define SL_ERROR_MTRR_INV_DEF_TYPE 0xc0008008
+#define SL_ERROR_MTRR_INV_BASE 0xc0008009
+#define SL_ERROR_MTRR_INV_MASK 0xc000800a
+#define SL_ERROR_MSR_INV_MISC_EN 0xc000800b
+#define SL_ERROR_INV_AP_INTERRUPT 0xc000800c
+#define SL_ERROR_INTEGER_OVERFLOW 0xc000800d
+#define SL_ERROR_HEAP_WALK 0xc000800e
+#define SL_ERROR_HEAP_MAP 0xc000800f
+#define SL_ERROR_REGION_ABOVE_4GB 0xc0008010
+#define SL_ERROR_HEAP_INVALID_DMAR 0xc0008011
+#define SL_ERROR_HEAP_DMAR_SIZE 0xc0008012
+#define SL_ERROR_HEAP_DMAR_MAP 0xc0008013
+#define SL_ERROR_HI_PMR_BASE 0xc0008014
+#define SL_ERROR_HI_PMR_SIZE 0xc0008015
+#define SL_ERROR_LO_PMR_BASE 0xc0008016
+#define SL_ERROR_LO_PMR_MLE 0xc0008017
+#define SL_ERROR_INITRD_TOO_BIG 0xc0008018
+#define SL_ERROR_HEAP_ZERO_OFFSET 0xc0008019
+#define SL_ERROR_WAKE_BLOCK_TOO_SMALL 0xc000801a
+#define SL_ERROR_MLE_BUFFER_OVERLAP 0xc000801b
+#define SL_ERROR_BUFFER_BEYOND_PMR 0xc000801c
+#define SL_ERROR_OS_SINIT_BAD_VERSION 0xc000801d
+#define SL_ERROR_EVENTLOG_MAP 0xc000801e
+#define SL_ERROR_TPM_NUMBER_ALGS 0xc000801f
+#define SL_ERROR_TPM_UNKNOWN_DIGEST 0xc0008020
+#define SL_ERROR_TPM_INVALID_EVENT 0xc0008021
+
+/*
+ * Secure Launch Defined Limits
+ */
+#define TXT_MAX_CPUS 512
+#define TXT_BOOT_STACK_SIZE 24
+
+/*
+ * Secure Launch event log entry type. The TXT specification defines the
+ * base event value as 0x400 for DRTM values.
+ */
+#define TXT_EVTYPE_BASE 0x400
+#define TXT_EVTYPE_SLAUNCH (TXT_EVTYPE_BASE + 0x102)
+#define TXT_EVTYPE_SLAUNCH_START (TXT_EVTYPE_BASE + 0x103)
+#define TXT_EVTYPE_SLAUNCH_END (TXT_EVTYPE_BASE + 0x104)
+
+/*
+ * Measured Launch PCRs
+ */
+#define SL_DEF_IMAGE_PCR17 17 /* TCG Details PCR */
+#define SL_DEF_CONFIG_PCR18 18 /* TCG Authorities PCR */
+#define SL_ALT_CONFIG_PCR19 19
+#define SL_ALT_IMAGE_PCR20 20
+
+/*
+ * MLE scratch area offsets
+ */
+#define SL_SCRATCH_AP_EBX 0
+#define SL_SCRATCH_AP_JMP_OFFSET 4
+#define SL_SCRATCH_AP_PAUSE 8
+
+#ifndef __ASSEMBLY__
+
+#include <linux/io.h>
+#include <linux/tpm.h>
+#include <linux/tpm_eventlog.h>
+
+/*
+ * Secure Launch AP wakeup information fetched in SMP boot code.
+ */
+struct sl_ap_wake_info {
+ u32 ap_wake_block;
+ u32 ap_wake_block_size;
+ u32 ap_jmp_offset;
+};
+
+/*
+ * TXT heap extended data elements.
+ */
+struct txt_heap_ext_data_element {
+ u32 type;
+ u32 size;
+ /* Data */
+} __packed;
+
+#define TXT_HEAP_EXTDATA_TYPE_END 0
+
+struct txt_heap_end_element {
+ u32 type;
+ u32 size;
+} __packed;
+
+#define TXT_HEAP_EXTDATA_TYPE_TPM_EVENT_LOG_PTR 5
+
+struct txt_heap_event_log_element {
+ u64 event_log_phys_addr;
+} __packed;
+
+#define TXT_HEAP_EXTDATA_TYPE_EVENT_LOG_POINTER2_1 8
+
+struct txt_heap_event_log_pointer2_1_element {
+ u64 phys_addr;
+ u32 allocated_event_container_size;
+ u32 first_record_offset;
+ u32 next_record_offset;
+} __packed;
+
+/*
+ * Secure Launch defined MTRR saving structures
+ */
+struct txt_mtrr_pair {
+ u64 mtrr_physbase;
+ u64 mtrr_physmask;
+} __packed;
+
+struct txt_mtrr_state {
+ u64 default_mem_type;
+ u64 mtrr_vcnt;
+ struct txt_mtrr_pair mtrr_pair[TXT_OS_MLE_MAX_VARIABLE_MTRRS];
+} __packed;
+
+/*
+ * Secure Launch defined OS/MLE TXT Heap table
+ */
+struct txt_os_mle_data {
+ u32 version;
+ u32 boot_params_addr;
+ u64 saved_misc_enable_msr;
+ struct txt_mtrr_state saved_bsp_mtrrs;
+ u32 ap_wake_block;
+ u32 ap_wake_block_size;
+ u64 evtlog_addr;
+ u32 evtlog_size;
+ u8 mle_scratch[64];
+} __packed;
+
+/*
+ * TXT specification defined BIOS data TXT Heap table
+ */
+struct txt_bios_data {
+ u32 version; /* Currently 5 for TPM 1.2 and 6 for TPM 2.0 */
+ u32 bios_sinit_size;
+ u64 reserved1;
+ u64 reserved2;
+ u32 num_logical_procs;
+ /* Versions >= 5 with updates in version 6 */
+ u32 sinit_flags;
+ u32 mle_flags;
+ /* Versions >= 4 */
+ /* Ext Data Elements */
+} __packed;
+
+/*
+ * TXT specification defined OS/SINIT TXT Heap table
+ */
+struct txt_os_sinit_data {
+ u32 version; /* Currently 6 for TPM 1.2 and 7 for TPM 2.0 */
+ u32 flags;
+ u64 mle_ptab;
+ u64 mle_size;
+ u64 mle_hdr_base;
+ u64 vtd_pmr_lo_base;
+ u64 vtd_pmr_lo_size;
+ u64 vtd_pmr_hi_base;
+ u64 vtd_pmr_hi_size;
+ u64 lcp_po_base;
+ u64 lcp_po_size;
+ u32 capabilities;
+ /* Version = 5 */
+ u64 efi_rsdt_ptr;
+ /* Versions >= 6 */
+ /* Ext Data Elements */
+} __packed;
+
+/*
+ * TXT specification defined SINIT/MLE TXT Heap table
+ */
+struct txt_sinit_mle_data {
+ u32 version; /* Current values are 6 through 9 */
+ /* Versions <= 8 */
+ u8 bios_acm_id[20];
+ u32 edx_senter_flags;
+ u64 mseg_valid;
+ u8 sinit_hash[20];
+ u8 mle_hash[20];
+ u8 stm_hash[20];
+ u8 lcp_policy_hash[20];
+ u32 lcp_policy_control;
+ /* Versions >= 7 */
+ u32 rlp_wakeup_addr;
+ u32 reserved;
+ u32 num_of_sinit_mdrs;
+ u32 sinit_mdrs_table_offset;
+ u32 sinit_vtd_dmar_table_size;
+ u32 sinit_vtd_dmar_table_offset;
+ /* Versions >= 8 */
+ u32 processor_scrtm_status;
+ /* Versions >= 9 */
+ /* Ext Data Elements */
+} __packed;
+
+/*
+ * TXT data reporting structure for memory types
+ */
+struct txt_sinit_memory_descriptor_record {
+ u64 address;
+ u64 length;
+ u8 type;
+ u8 reserved[7];
+} __packed;
+
+/*
+ * TXT data structure used by a responsive local processor (RLP) to start
+ * execution in response to a GETSEC[WAKEUP].
+ */
+struct smx_rlp_mle_join {
+ u32 rlp_gdt_limit;
+ u32 rlp_gdt_base;
+ u32 rlp_seg_sel; /* cs (ds, es, ss are seg_sel+8) */
+ u32 rlp_entry_point; /* phys addr */
+} __packed;
+
+/*
+ * TPM event log structures defined in both the TXT specification and
+ * the TCG documentation.
+ */
+#define TPM12_EVTLOG_SIGNATURE "TXT Event Container"
+
+struct tpm12_event_log_header {
+ char signature[20];
+ char reserved[12];
+ u8 container_ver_major;
+ u8 container_ver_minor;
+ u8 pcr_event_ver_major;
+ u8 pcr_event_ver_minor;
+ u32 container_size;
+ u32 pcr_events_offset;
+ u32 next_event_offset;
+ /* PCREvents[] */
+} __packed;
+
+/*
+ * Functions to extract data from the Intel TXT Heap Memory. The layout
+ * of the heap is as follows:
+ * +----------------------------+
+ * | Size Bios Data table (u64) |
+ * +----------------------------+
+ * | Bios Data table |
+ * +----------------------------+
+ * | Size OS MLE table (u64) |
+ * +----------------------------+
+ * | OS MLE table |
+ * +----------------------------+
+ * | Size OS SINIT table (u64) |
+ * +----------------------------+
+ * | OS SINIT table |
+ * +----------------------------+
+ * | Size SINIT MLE table (u64) |
+ * +----------------------------+
+ * | SINIT MLE table |
+ * +----------------------------+
+ *
+ * NOTE: the table size fields include the 8 byte size field itself.
+ */
+static inline u64 txt_bios_data_size(void *heap)
+{
+ return *((u64 *)heap);
+}
+
+static inline void *txt_bios_data_start(void *heap)
+{
+ return heap + sizeof(u64);
+}
+
+static inline u64 txt_os_mle_data_size(void *heap)
+{
+ return *((u64 *)(heap + txt_bios_data_size(heap)));
+}
+
+static inline void *txt_os_mle_data_start(void *heap)
+{
+ return heap + txt_bios_data_size(heap) + sizeof(u64);
+}
+
+static inline u64 txt_os_sinit_data_size(void *heap)
+{
+ return *((u64 *)(heap + txt_bios_data_size(heap) +
+ txt_os_mle_data_size(heap)));
+}
+
+static inline void *txt_os_sinit_data_start(void *heap)
+{
+ return heap + txt_bios_data_size(heap) +
+ txt_os_mle_data_size(heap) + sizeof(u64);
+}
+
+static inline u64 txt_sinit_mle_data_size(void *heap)
+{
+ return *((u64 *)(heap + txt_bios_data_size(heap) +
+ txt_os_mle_data_size(heap) +
+ txt_os_sinit_data_size(heap)));
+}
+
+static inline void *txt_sinit_mle_data_start(void *heap)
+{
+ return heap + txt_bios_data_size(heap) +
+ txt_os_mle_data_size(heap) +
+ txt_os_sinit_data_size(heap) + sizeof(u64);
+}
+
+/*
+ * TPM event logging functions.
+ */
+static inline struct txt_heap_event_log_pointer2_1_element*
+tpm20_find_log2_1_element(struct txt_os_sinit_data *os_sinit_data)
+{
+ struct txt_heap_ext_data_element *ext_elem;
+
+ /* The extended element array is at the end of this table */
+ ext_elem = (struct txt_heap_ext_data_element *)
+ ((u8 *)os_sinit_data + sizeof(struct txt_os_sinit_data));
+
+ while (ext_elem->type != TXT_HEAP_EXTDATA_TYPE_END) {
+ if (ext_elem->type ==
+ TXT_HEAP_EXTDATA_TYPE_EVENT_LOG_POINTER2_1) {
+ return (struct txt_heap_event_log_pointer2_1_element *)
+ ((u8 *)ext_elem +
+ sizeof(struct txt_heap_ext_data_element));
+ }
+ ext_elem =
+ (struct txt_heap_ext_data_element *)
+ ((u8 *)ext_elem + ext_elem->size);
+ }
+
+ return NULL;
+}
+
+static inline int tpm12_log_event(void *evtlog_base, u32 evtlog_size,
+ u32 event_size, void *event)
+{
+ struct tpm12_event_log_header *evtlog =
+ (struct tpm12_event_log_header *)evtlog_base;
+
+ if (memcmp(evtlog->signature, TPM12_EVTLOG_SIGNATURE,
+ sizeof(TPM12_EVTLOG_SIGNATURE)))
+ return -EINVAL;
+
+ if (evtlog->container_size > evtlog_size)
+ return -EINVAL;
+
+ if (evtlog->next_event_offset + event_size > evtlog->container_size)
+ return -E2BIG;
+
+ memcpy(evtlog_base + evtlog->next_event_offset, event, event_size);
+ evtlog->next_event_offset += event_size;
+
+ return 0;
+}
+
+static inline int tpm20_log_event(struct txt_heap_event_log_pointer2_1_element *elem,
+ void *evtlog_base, u32 evtlog_size,
+ u32 event_size, void *event)
+{
+ struct tcg_pcr_event *header =
+ (struct tcg_pcr_event *)evtlog_base;
+
+ /* Has to be at least big enough for the signature */
+ if (header->event_size < sizeof(TCG_SPECID_SIG))
+ return -EINVAL;
+
+ if (memcmp((u8 *)header + sizeof(struct tcg_pcr_event),
+ TCG_SPECID_SIG, sizeof(TCG_SPECID_SIG)))
+ return -EINVAL;
+
+ if (elem->allocated_event_container_size > evtlog_size)
+ return -EINVAL;
+
+ if (elem->next_record_offset + event_size >
+ elem->allocated_event_container_size)
+ return -E2BIG;
+
+ memcpy(evtlog_base + elem->next_record_offset, event, event_size);
+ elem->next_record_offset += event_size;
+
+ return 0;
+}
+
+/*
+ * External functions available in compressed kernel.
+ */
+extern u32 slaunch_get_cpu_type(void);
+
+/*
+ * External functions available in mainline kernel.
+ */
+extern void slaunch_setup(void);
+extern u32 slaunch_get_flags(void);
+extern struct sl_ap_wake_info *slaunch_get_ap_wake_info(void);
+extern struct acpi_table_header *slaunch_get_dmar_table(struct acpi_table_header *dmar);
+extern void __noreturn slaunch_txt_reset(void __iomem *txt,
+ const char *msg, u64 error);
+extern void slaunch_finalize(int do_sexit);
+
+#endif /* !__ASSEMBLY__ */
+
+#else
+
+#define slaunch_get_cpu_type() 0
+#define slaunch_setup() do { } while (0)
+#define slaunch_get_flags() 0
+#define slaunch_get_dmar_table(d) (d)
+#define slaunch_finalize(d) do { } while (0)
+
+#endif /* !IS_ENABLED(CONFIG_SECURE_LAUNCH) */
+
+#endif /* _LINUX_SLAUNCH_H */
--
1.8.3.1
On Intel, the APs are left in a well documented state after TXT performs
the late launch. Specifically, they cannot have #INIT asserted on them, so
a standard startup via INIT/SIPI/SIPI cannot be performed. Instead, the
early SL stub code parks the APs in a pause/jmp loop waiting for an NMI.
The modified SMP boot code is called for the Secure Launch case. The
jump address for the RM piggy entry point is fixed up in the jump where
the APs are waiting and an NMI IPI is sent to the AP. The AP vectors to
the Secure Launch entry point in the RM piggy, which mimics what the real
mode code would do and then jumps to the standard RM piggy protected mode
entry point.
Signed-off-by: Ross Philipson <[email protected]>
---
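Reviewer note: two small pieces of logic in this patch are easy to model
outside the kernel. The first is the flag test that selects the TXT wake
path, which requires both SL_FLAG_ACTIVE and SL_FLAG_ARCH_TXT. The second
is the one-shot fixup of the AP jump target. The sketch below is a
user-space approximation using C11 atomics in place of the kernel's
atomic_dec_and_test(). struct ap_wake_info_model, txt_launch_active()
and fixup_jump_vector_once() are hypothetical names, and unlike
slaunch_fixup_jump_vector() the sketch returns whether the write was
performed so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SL_FLAG_ACTIVE   0x00000001
#define SL_FLAG_ARCH_TXT 0x00000004

/* Hypothetical stand-in for struct sl_ap_wake_info, using a pointer. */
struct ap_wake_info_model {
	uint8_t *wake_block;
	uint32_t wake_block_size;
	uint32_t jmp_offset;
};

/* Starts at 1 so that only the first caller sees the old value 1. */
static atomic_int first_ap_only = 1;

/* Both flags must be set for the TXT specific AP wake path. */
static int txt_launch_active(uint32_t flags)
{
	const uint32_t need = SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT;

	return (flags & need) == need;
}

/*
 * Patch the 32-bit jump target inside the wake block exactly once.
 * Returns 1 if this call did the write, 0 if it was already done.
 */
static int fixup_jump_vector_once(const struct ap_wake_info_model *info,
				  uint32_t entry)
{
	if (atomic_fetch_sub(&first_ap_only, 1) != 1)
		return 0;

	/* The patched word must lie inside the wake block. */
	assert(info->jmp_offset + sizeof(entry) <= info->wake_block_size);
	memcpy(info->wake_block + info->jmp_offset, &entry, sizeof(entry));

	return 1;
}
```

The first AP to be booted pays for the fixup; every later call sees a
counter value other than 1 and leaves the wake block untouched.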
arch/x86/include/asm/realmode.h | 3 ++
arch/x86/kernel/smpboot.c | 86 ++++++++++++++++++++++++++++++++++++
arch/x86/realmode/rm/header.S | 3 ++
arch/x86/realmode/rm/trampoline_64.S | 37 ++++++++++++++++
4 files changed, 129 insertions(+)
diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d08..ef37bf1 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -37,6 +37,9 @@ struct real_mode_header {
#ifdef CONFIG_X86_64
u32 machine_real_restart_seg;
#endif
+#ifdef CONFIG_SECURE_LAUNCH
+ u32 sl_trampoline_start32;
+#endif
};
/* This must match data at realmode/rm/trampoline_{32,64}.S */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7770245..c324b04 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -57,6 +57,7 @@
#include <linux/pgtable.h>
#include <linux/overflow.h>
#include <linux/syscore_ops.h>
+#include <linux/slaunch.h>
#include <asm/acpi.h>
#include <asm/desc.h>
@@ -1023,6 +1024,83 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
return 0;
}
+#ifdef CONFIG_SECURE_LAUNCH
+
+static atomic_t first_ap_only = {1};
+
+/*
+ * Called to fix the long jump address for the waiting APs to vector to
+ * the correct startup location in the Secure Launch stub in the rmpiggy.
+ */
+static int
+slaunch_fixup_jump_vector(void)
+{
+ struct sl_ap_wake_info *ap_wake_info;
+ u32 *ap_jmp_ptr = NULL;
+
+ if (!atomic_dec_and_test(&first_ap_only))
+ return 0;
+
+ ap_wake_info = slaunch_get_ap_wake_info();
+
+ ap_jmp_ptr = (u32 *)__va(ap_wake_info->ap_wake_block +
+ ap_wake_info->ap_jmp_offset);
+
+ *ap_jmp_ptr = real_mode_header->sl_trampoline_start32;
+
+ pr_info("TXT AP long jump address updated\n");
+
+ return 0;
+}
+
+/*
+ * TXT AP startup is quite different than normal. The APs cannot have #INIT
+ * asserted on them or receive SIPIs. The early Secure Launch code has parked
+ * the APs in a pause loop waiting to receive an NMI. This will wake the APs
+ * and have them jump to the protected mode code in the rmpiggy where the rest
+ * of the SMP boot of the AP will proceed normally.
+ */
+static int
+slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
+{
+ unsigned long send_status = 0, accept_status = 0;
+
+ /* Only done once */
+ if (slaunch_fixup_jump_vector())
+ return -1;
+
+ /* Send NMI IPI to idling AP and wake it up */
+ apic_icr_write(APIC_DM_NMI, apicid);
+
+ if (init_udelay == 0)
+ udelay(10);
+ else
+ udelay(300);
+
+ send_status = safe_apic_wait_icr_idle();
+
+ if (init_udelay == 0)
+ udelay(10);
+ else
+ udelay(300);
+
+ accept_status = (apic_read(APIC_ESR) & 0xEF);
+
+ if (send_status)
+ pr_err("Secure Launch IPI never delivered???\n");
+ if (accept_status)
+ pr_err("Secure Launch IPI delivery error (%lx)\n",
+ accept_status);
+
+ return (send_status | accept_status);
+}
+
+#else
+
+#define slaunch_wakeup_cpu_from_txt(cpu, apicid) 0
+
+#endif /* !CONFIG_SECURE_LAUNCH */
+
/*
* NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
* (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -1077,6 +1155,13 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
cpumask_clear_cpu(cpu, cpu_initialized_mask);
smp_mb();
+ /* With Intel TXT, the AP startup is totally different */
+ if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) ==
+ (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) {
+ boot_error = slaunch_wakeup_cpu_from_txt(cpu, apicid);
+ goto txt_wake;
+ }
+
/*
* Wake up a CPU in difference cases:
* - Use the method in the APIC driver if it's defined
@@ -1089,6 +1174,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
cpu0_nmi_registered);
+txt_wake:
if (!boot_error) {
/*
* Wait 10s total for first sign of life from AP
diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S
index 8c1db5b..9136bd5 100644
--- a/arch/x86/realmode/rm/header.S
+++ b/arch/x86/realmode/rm/header.S
@@ -36,6 +36,9 @@ SYM_DATA_START(real_mode_header)
#ifdef CONFIG_X86_64
.long __KERNEL32_CS
#endif
+#ifdef CONFIG_SECURE_LAUNCH
+ .long pa_sl_trampoline_start32
+#endif
SYM_DATA_END(real_mode_header)
/* End signature, used to verify integrity */
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index cc8391f..cdfc2c2 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -104,6 +104,43 @@ SYM_CODE_END(sev_es_trampoline_start)
.section ".text32","ax"
.code32
+#ifdef CONFIG_SECURE_LAUNCH
+ .balign 4
+SYM_CODE_START(sl_trampoline_start32)
+ /*
+ * The early secure launch stub AP wakeup code has taken care of all
+ * the vagaries of launching out of TXT. This bit just mimics what the
+ * 16b entry code does and jumps off to the real startup_32.
+ */
+ cli
+ wbinvd
+
+ /*
+ * The %ebx provided is not terribly useful since it is the physical
+ * address of sl_trampoline_start32 and not the base of the image.
+ * Use pa_real_mode_base, which is fixed up, to get a run time
+ * base register to use for offsets to locations that do not have
+ * pa_ symbols.
+ */
+ movl $pa_real_mode_base, %ebx
+
+ /*
+ * This may seem a little odd but this is what %esp would have had in
+ * it on the jmp from real mode because all real mode fixups were done
+ * via the code segment. The base is added at the 32b entry.
+ */
+ movl rm_stack_end, %esp
+
+ lgdt tr_gdt(%ebx)
+ lidt tr_idt(%ebx)
+
+ movw $__KERNEL_DS, %dx # Data segment descriptor
+
+ /* Jump to where the 16b code would have jumped */
+ ljmpl $__KERNEL32_CS, $pa_startup_32
+SYM_CODE_END(sl_trampoline_start32)
+#endif
+
.balign 4
SYM_CODE_START(startup_32)
movl %edx, %ss
--
1.8.3.1
The Secure Launch (SL) stub provides the entry point for Intel TXT (and
later AMD SKINIT) to vector to during the late launch. The symbol
sl_stub_entry is that entry point and its offset into the kernel is
conveyed to the launching code using the MLE (Measured Launch
Environment) header in the structure named mle_header. The offset of the
MLE header is set in the kernel_info. The routine sl_stub contains the
very early late launch setup code responsible for setting up the basic
environment to allow the normal kernel startup_32 code to proceed. It is
also responsible for properly waking and handling the APs on Intel
platforms. The routine sl_main, which runs after entering 64b mode, is
responsible for measuring configuration and module information, such as
the boot params, the kernel command line, the TXT heap and an external
initramfs, before it is used.
Signed-off-by: Ross Philipson <[email protected]>
---
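Reviewer note: the sketch below shows how launching code might use the
new kernel_info field to find and sanity check the MLE header before
reading the sl_stub_entry offset. It is a user-space illustration only.
struct mle_header_model and find_mle_header() are hypothetical names; the
field order and the UUID words are taken from the mle_header definition
in kernel_info.S in this patch. The sketch assumes the offset is relative
to the start of the buffer passed in and that the buffer uses the host
byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of the 13 u32 words emitted for mle_header. */
struct mle_header_model {
	uint32_t uuid[4];
	uint32_t header_len;
	uint32_t version;
	uint32_t entry_point;
	uint32_t first_valid_page;
	uint32_t mle_start;
	uint32_t mle_end;
	uint32_t capabilities;
	uint32_t cmdline_start;
	uint32_t cmdline_end;
};

static const uint32_t mle_uuid[4] = {
	0x9082ac5a, 0x74a7476f, 0xa2555c0f, 0x42b651cb
};

/*
 * Copy the MLE header found at image + offset into *out.
 * Returns 0 on success and -1 if the kernel has no Secure Launch
 * support (offset 0), the header does not fit, or the UUID or the
 * header length does not match.
 */
static int find_mle_header(const uint8_t *image, size_t image_len,
			   uint32_t offset, struct mle_header_model *out)
{
	if (offset == 0)
		return -1;

	/* Written to avoid overflow in offset + sizeof(*out). */
	if (offset > image_len || image_len - offset < sizeof(*out))
		return -1;

	memcpy(out, image + offset, sizeof(*out));

	if (memcmp(out->uuid, mle_uuid, sizeof(mle_uuid)) != 0)
		return -1;

	if (out->header_len != sizeof(*out))
		return -1;

	return 0;
}
```

The header length check works because 13 u32 words are 52 bytes, which
is the 0x34 value stored in the header itself.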
Documentation/x86/boot.rst | 13 +
arch/x86/boot/compressed/Makefile | 3 +-
arch/x86/boot/compressed/head_64.S | 37 ++
arch/x86/boot/compressed/kaslr.c | 11 +
arch/x86/boot/compressed/kernel_info.S | 33 ++
arch/x86/boot/compressed/sl_main.c | 523 ++++++++++++++++++++++++++
arch/x86/boot/compressed/sl_stub.S | 667 +++++++++++++++++++++++++++++++++
arch/x86/kernel/asm-offsets.c | 19 +
8 files changed, 1305 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/boot/compressed/sl_main.c
create mode 100644 arch/x86/boot/compressed/sl_stub.S
diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index fc84491..7623f60 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1026,6 +1026,19 @@ Offset/size: 0x000c/4
This field contains maximal allowed type for setup_data and setup_indirect structs.
+============ =================
+Field name: mle_header_offset
+Offset/size: 0x0010/4
+============ =================
+
+ This field contains the offset to the Secure Launch Measured Launch Environment
+ (MLE) header. This offset is used to locate information needed during a secure
+ late launch using Intel TXT. If the offset is zero, the kernel does not have
+ Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
+ following a successful measured launch. The specific state of the processors is
+ outlined in the TXT Software Development Guide, the latest can be found here:
+ https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
+
The Image Checksum
==================
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 059d49a..1fe55a5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,7 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
-vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
+ $(obj)/sl_main.o $(obj)/sl_stub.o
$(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a2347de..b35e072 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -498,6 +498,17 @@ trampoline_return:
pushq $0
popfq
+#ifdef CONFIG_SECURE_LAUNCH
+ pushq %rsi
+
+ /* Ensure the relocation region is covered by a PMR */
+ movq %rbx, %rdi
+ movl $(_bss - startup_32), %esi
+ callq sl_check_region
+
+ popq %rsi
+#endif
+
/*
* Copy the compressed kernel to the end of our buffer
* where decompression in place becomes safe.
@@ -556,6 +567,32 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrq $3, %rcx
rep stosq
+#ifdef CONFIG_SECURE_LAUNCH
+ /*
+ * Have to do the final early sl stub work in 64b area.
+ *
+ * *********** NOTE ***********
+ *
+ * Several boot params get used before we get a chance to measure
+ * them in this call. This is a known issue and we currently don't
+ * have a solution. The scratch field doesn't matter. There is no
+ * obvious way to do anything about the use of kernel_alignment or
+ * init_size though these seem low risk with all the PMR and overlap
+ * checks in place.
+ */
+ pushq %rsi
+
+ movq %rsi, %rdi
+ callq sl_main
+
+ /* Ensure the decompression location is covered by a PMR */
+ movq %rbp, %rdi
+ movq output_len(%rip), %rsi
+ callq sl_check_region
+
+ popq %rsi
+#endif
+
/*
* If running as an SEV guest, the encryption mask is required in the
* page-table setup code below. When the guest also has SEV-ES enabled
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index e366907..f935468 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -29,6 +29,7 @@
#include <linux/utsname.h>
#include <linux/ctype.h>
#include <linux/efi.h>
+#include <linux/slaunch.h>
#include <generated/utsrelease.h>
#include <asm/efi.h>
@@ -840,6 +841,16 @@ void choose_random_location(unsigned long input,
return;
}
+ /*
+ * If a secure launch is in progress, KASLR cannot be used
+ * since knowing the exact location of things is a crucial
+ * part of the secure launch.
+ */
+ if (slaunch_get_cpu_type() & SL_CPU_INTEL) {
+ warn("KASLR disabled: by Secure Launch");
+ return;
+ }
+
boot_params->hdr.loadflags |= KASLR_FLAG;
if (IS_ENABLED(CONFIG_X86_32))
diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
index c18f071..91914eb 100644
--- a/arch/x86/boot/compressed/kernel_info.S
+++ b/arch/x86/boot/compressed/kernel_info.S
@@ -28,6 +28,39 @@ SYM_DATA_START(kernel_info)
/* Maximal allowed type for setup_data and setup_indirect structs. */
.long SETUP_TYPE_MAX
+ /* Offset to the MLE header structure */
+#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
+ .long rva(mle_header)
+#else
+ .long 0
+#endif
+
kernel_info_var_len_data:
/* Empty for time being... */
SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
+
+#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
+ /*
+ * The MLE Header per the TXT Specification, section 2.1
+ * MLE capabilities, see table 4. Capabilities set:
+ * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
+ * bit 1: Support for RLP wakeup using MONITOR address
+ * bit 5: TPM 1.2 family: Details/authorities PCR usage support
+ * bit 9: Supported format of TPM 2.0 event log - TCG compliant
+ */
+SYM_DATA_START(mle_header)
+ .long 0x9082ac5a /* UUID0 */
+ .long 0x74a7476f /* UUID1 */
+ .long 0xa2555c0f /* UUID2 */
+ .long 0x42b651cb /* UUID3 */
+ .long 0x00000034 /* MLE header size */
+ .long 0x00020002 /* MLE version 2.2 */
+ .long rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
+ .long 0x00000000 /* First valid page of MLE */
+ .long 0x00000000 /* Offset within binary of first byte of MLE */
+ .long rva(_edata) /* Offset within binary of last byte + 1 of MLE */
+ .long 0x00000223 /* Bit vector of MLE-supported capabilities */
+ .long 0x00000000 /* Starting linear address of command line (unused) */
+ .long 0x00000000 /* Ending linear address of command line (unused) */
+SYM_DATA_END(mle_header)
+#endif
diff --git a/arch/x86/boot/compressed/sl_main.c b/arch/x86/boot/compressed/sl_main.c
new file mode 100644
index 00000000..64ada2a
--- /dev/null
+++ b/arch/x86/boot/compressed/sl_main.c
@@ -0,0 +1,523 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure Launch early measurement and validation routines.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/linkage.h>
+#include <linux/efi.h>
+#include <asm/segment.h>
+#include <asm/boot.h>
+#include <asm/msr.h>
+#include <asm/io.h>
+#include <asm/mtrr.h>
+#include <asm/processor-flags.h>
+#include <asm/asm-offsets.h>
+#include <asm/bootparam.h>
+#include <asm/efi.h>
+#include <asm/bootparam_utils.h>
+#include <linux/slaunch.h>
+#include <crypto/sha1.h>
+#include <crypto/sha2.h>
+
+#include "misc.h"
+#include "early_sha1.h"
+
+#define CAPS_VARIABLE_MTRR_COUNT_MASK 0xff
+
+#define SL_TPM12_LOG 1
+#define SL_TPM20_LOG 2
+
+#define SL_TPM20_MAX_ALGS 2
+
+#define SL_MAX_EVENT_DATA 64
+#define SL_TPM12_LOG_SIZE (sizeof(struct tcg_pcr_event) + \
+ SL_MAX_EVENT_DATA)
+#define SL_TPM20_LOG_SIZE (sizeof(struct tcg_pcr_event2_head) + \
+ SHA1_DIGEST_SIZE + SHA256_DIGEST_SIZE + \
+ sizeof(struct tcg_event_field) + \
+ SL_MAX_EVENT_DATA)
+
+static void *evtlog_base;
+static u32 evtlog_size;
+static struct txt_heap_event_log_pointer2_1_element *log20_elem;
+static u32 tpm_log_ver = SL_TPM12_LOG;
+struct tcg_efi_specid_event_algs tpm_algs[SL_TPM20_MAX_ALGS] = {0};
+
+#if !IS_ENABLED(CONFIG_SECURE_LAUNCH_ALT_PCR19)
+static u32 pcr_config = SL_DEF_CONFIG_PCR18;
+#else
+static u32 pcr_config = SL_ALT_CONFIG_PCR19;
+#endif
+
+#if !IS_ENABLED(CONFIG_SECURE_LAUNCH_ALT_PCR20)
+static u32 pcr_image = SL_DEF_IMAGE_PCR17;
+#else
+static u32 pcr_image = SL_ALT_IMAGE_PCR20;
+#endif
+
+extern u32 sl_cpu_type;
+extern u32 sl_mle_start;
+
+u32 slaunch_get_cpu_type(void)
+{
+ return sl_cpu_type;
+}
+
+static u64 sl_txt_read(u32 reg)
+{
+ return readq((void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
+}
+
+static void sl_txt_write(u32 reg, u64 val)
+{
+ writeq(val, (void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
+}
+
+static void __noreturn sl_txt_reset(u64 error)
+{
+ /* Reading the E2STS register acts as a barrier for TXT registers */
+ sl_txt_write(TXT_CR_ERRORCODE, error);
+ sl_txt_read(TXT_CR_E2STS);
+ sl_txt_write(TXT_CR_CMD_UNLOCK_MEM_CONFIG, 1);
+ sl_txt_read(TXT_CR_E2STS);
+ sl_txt_write(TXT_CR_CMD_RESET, 1);
+
+ for ( ; ; )
+ asm volatile ("hlt");
+
+ unreachable();
+}
+
+static u64 sl_rdmsr(u32 reg)
+{
+ u64 lo, hi;
+
+ asm volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (reg));
+
+ return (hi << 32) | lo;
+}
+
+static void sl_check_pmr_coverage(void *base, u32 size, bool allow_hi)
+{
+ void *end = base + size;
+ struct txt_os_sinit_data *os_sinit_data;
+ void *txt_heap;
+
+ if (!(sl_cpu_type & SL_CPU_INTEL))
+ return;
+
+ txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
+ os_sinit_data = txt_os_sinit_data_start(txt_heap);
+
+ if ((end >= (void *)0x100000000ULL) &&
+ (base < (void *)0x100000000ULL))
+ sl_txt_reset(SL_ERROR_REGION_STRADDLE_4GB);
+
+ /*
+ * Note that the late stub code validates that the hi PMR covers
+ * all memory above 4G. At this point the code can only check that
+ * regions are within the hi PMR but that is sufficient.
+ */
+ if ((end > (void *)0x100000000ULL) &&
+ (base >= (void *)0x100000000ULL)) {
+ if (allow_hi) {
+ if (end >= (void *)(os_sinit_data->vtd_pmr_hi_base +
+ os_sinit_data->vtd_pmr_hi_size))
+ sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
+ } else
+ sl_txt_reset(SL_ERROR_REGION_ABOVE_4GB);
+ }
+
+ if (end >= (void *)os_sinit_data->vtd_pmr_lo_size)
+ sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
+}
+
+/*
+ * Some MSRs are modified by the pre-launch code including the MTRRs.
+ * The early MLE code has to restore these values. This code validates
+ * the values after they are measured.
+ */
+static void sl_txt_validate_msrs(struct txt_os_mle_data *os_mle_data)
+{
+ u64 mtrr_caps, mtrr_def_type, mtrr_var, misc_en_msr;
+ u32 vcnt, i;
+ struct txt_mtrr_state *saved_bsp_mtrrs =
+ &(os_mle_data->saved_bsp_mtrrs);
+
+ mtrr_caps = sl_rdmsr(MSR_MTRRcap);
+ vcnt = (u32)(mtrr_caps & CAPS_VARIABLE_MTRR_COUNT_MASK);
+
+ if (saved_bsp_mtrrs->mtrr_vcnt > vcnt)
+ sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
+ if (saved_bsp_mtrrs->mtrr_vcnt > TXT_OS_MLE_MAX_VARIABLE_MTRRS)
+ sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
+
+ mtrr_def_type = sl_rdmsr(MSR_MTRRdefType);
+ if (saved_bsp_mtrrs->default_mem_type != mtrr_def_type)
+ sl_txt_reset(SL_ERROR_MTRR_INV_DEF_TYPE);
+
+ for (i = 0; i < saved_bsp_mtrrs->mtrr_vcnt; i++) {
+ mtrr_var = sl_rdmsr(MTRRphysBase_MSR(i));
+ if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physbase != mtrr_var)
+ sl_txt_reset(SL_ERROR_MTRR_INV_BASE);
+ mtrr_var = sl_rdmsr(MTRRphysMask_MSR(i));
+ if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physmask != mtrr_var)
+ sl_txt_reset(SL_ERROR_MTRR_INV_MASK);
+ }
+
+ misc_en_msr = sl_rdmsr(MSR_IA32_MISC_ENABLE);
+ if (os_mle_data->saved_misc_enable_msr != misc_en_msr)
+ sl_txt_reset(SL_ERROR_MSR_INV_MISC_EN);
+}
+
+static void sl_find_event_log(void)
+{
+ struct txt_os_mle_data *os_mle_data;
+ struct txt_os_sinit_data *os_sinit_data;
+ void *txt_heap;
+
+ txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
+
+ os_mle_data = txt_os_mle_data_start(txt_heap);
+ evtlog_base = (void *)os_mle_data->evtlog_addr;
+ evtlog_size = os_mle_data->evtlog_size;
+
+ /*
+ * For TPM 2.0, the event log 2.1 extended data structure has to also
+ * be located and fixed up.
+ */
+ os_sinit_data = txt_os_sinit_data_start(txt_heap);
+
+ /*
+ * Only support version 6 and later that properly handle the
+ * list of ExtDataElements in the OS-SINIT structure.
+ */
+ if (os_sinit_data->version < 6)
+ sl_txt_reset(SL_ERROR_OS_SINIT_BAD_VERSION);
+
+ /* Find the TPM2.0 logging extended heap element */
+ log20_elem = tpm20_find_log2_1_element(os_sinit_data);
+
+ /* If found, this implies TPM20 log and family */
+ if (log20_elem)
+ tpm_log_ver = SL_TPM20_LOG;
+}
+
+static void sl_validate_event_log_buffer(void)
+{
+ void *mle_base = (void *)(u64)sl_mle_start;
+ void *mle_end;
+ struct txt_os_sinit_data *os_sinit_data;
+ void *txt_heap;
+ void *txt_heap_end;
+ void *evtlog_end;
+
+ if ((u64)evtlog_size > (LLONG_MAX - (u64)evtlog_base))
+ sl_txt_reset(SL_ERROR_INTEGER_OVERFLOW);
+ evtlog_end = evtlog_base + evtlog_size;
+
+ txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
+ txt_heap_end = txt_heap + sl_txt_read(TXT_CR_HEAP_SIZE);
+ os_sinit_data = txt_os_sinit_data_start(txt_heap);
+
+ mle_end = mle_base + os_sinit_data->mle_size;
+
+ /*
+ * This check is to ensure the event log buffer does not overlap with
+ * the MLE image.
+ */
+ if ((evtlog_base >= mle_end) &&
+ (evtlog_end > mle_end))
+ goto pmr_check; /* above */
+
+ if ((evtlog_end <= mle_base) &&
+ (evtlog_base < mle_base))
+ goto pmr_check; /* below */
+
+ sl_txt_reset(SL_ERROR_MLE_BUFFER_OVERLAP);
+
+pmr_check:
+ /*
+ * The TXT heap is protected by the DPR. If the TPM event log is
+ * inside the TXT heap, there is no need for a PMR check.
+ */
+ if ((evtlog_base > txt_heap) &&
+ (evtlog_end < txt_heap_end))
+ return;
+
+ sl_check_pmr_coverage(evtlog_base, evtlog_size, true);
+}
+
+static void sl_find_event_log_algorithms(void)
+{
+ struct tcg_efi_specid_event_head *efi_head =
+ (struct tcg_efi_specid_event_head *)(evtlog_base +
+ log20_elem->first_record_offset +
+ sizeof(struct tcg_pcr_event));
+
+ if (efi_head->num_algs == 0 || efi_head->num_algs > 2)
+ sl_txt_reset(SL_ERROR_TPM_NUMBER_ALGS);
+
+ memcpy(&tpm_algs[0], &efi_head->digest_sizes[0],
+ sizeof(struct tcg_efi_specid_event_algs) * efi_head->num_algs);
+}
+
+static void sl_tpm12_log_event(u32 pcr, u32 event_type,
+ const u8 *data, u32 length,
+ const u8 *event_data, u32 event_size)
+{
+ struct tcg_pcr_event *pcr_event;
+ struct sha1_state sctx = {0};
+ u32 total_size;
+ u8 log_buf[SL_TPM12_LOG_SIZE] = {0};
+ u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
+
+ pcr_event = (struct tcg_pcr_event *)log_buf;
+ pcr_event->pcr_idx = pcr;
+ pcr_event->event_type = event_type;
+ if (length > 0) {
+ early_sha1_init(&sctx);
+ early_sha1_update(&sctx, data, length);
+ early_sha1_final(&sctx, &sha1_hash[0]);
+ memcpy(&pcr_event->digest[0], &sha1_hash[0], SHA1_DIGEST_SIZE);
+ }
+ pcr_event->event_size = event_size;
+ if (event_size > 0)
+ memcpy((u8 *)pcr_event + sizeof(struct tcg_pcr_event),
+ event_data, event_size);
+
+ total_size = sizeof(struct tcg_pcr_event) + event_size;
+
+ if (tpm12_log_event(evtlog_base, evtlog_size, total_size, pcr_event))
+ sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
+}
+
+static void sl_tpm20_log_event(u32 pcr, u32 event_type,
+ const u8 *data, u32 length,
+ const u8 *event_data, u32 event_size)
+{
+ struct tcg_pcr_event2_head *head;
+ struct tcg_event_field *event;
+ struct sha1_state sctx1 = {0};
+ struct sha256_state sctx256 = {0};
+ u32 total_size;
+ u16 *alg_ptr;
+ u8 *dgst_ptr;
+ u8 log_buf[SL_TPM20_LOG_SIZE] = {0};
+ u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
+ u8 sha256_hash[SHA256_DIGEST_SIZE] = {0};
+
+ head = (struct tcg_pcr_event2_head *)log_buf;
+ head->pcr_idx = pcr;
+ head->event_type = event_type;
+ total_size = sizeof(struct tcg_pcr_event2_head);
+ alg_ptr = (u16 *)(log_buf + sizeof(struct tcg_pcr_event2_head));
+
+ for ( ; head->count < 2; head->count++) {
+ if (!tpm_algs[head->count].alg_id)
+ break;
+
+ *alg_ptr = tpm_algs[head->count].alg_id;
+ dgst_ptr = (u8 *)alg_ptr + sizeof(u16);
+
+ if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256 &&
+ length) {
+ sha256_init(&sctx256);
+ sha256_update(&sctx256, data, length);
+ sha256_final(&sctx256, &sha256_hash[0]);
+ } else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1 &&
+ length) {
+ early_sha1_init(&sctx1);
+ early_sha1_update(&sctx1, data, length);
+ early_sha1_final(&sctx1, &sha1_hash[0]);
+ }
+
+ if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256) {
+ memcpy(dgst_ptr, &sha256_hash[0], SHA256_DIGEST_SIZE);
+ total_size += SHA256_DIGEST_SIZE + sizeof(u16);
+ alg_ptr = (u16 *)((u8 *)alg_ptr + SHA256_DIGEST_SIZE + sizeof(u16));
+ } else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1) {
+ memcpy(dgst_ptr, &sha1_hash[0], SHA1_DIGEST_SIZE);
+ total_size += SHA1_DIGEST_SIZE + sizeof(u16);
+ alg_ptr = (u16 *)((u8 *)alg_ptr + SHA1_DIGEST_SIZE + sizeof(u16));
+ } else
+ sl_txt_reset(SL_ERROR_TPM_UNKNOWN_DIGEST);
+ }
+
+ event = (struct tcg_event_field *)(log_buf + total_size);
+ event->event_size = event_size;
+ if (event_size > 0)
+ memcpy((u8 *)event + sizeof(struct tcg_event_field),
+ event_data, event_size);
+ total_size += sizeof(struct tcg_event_field) + event_size;
+
+ if (tpm20_log_event(log20_elem, evtlog_base, evtlog_size,
+ total_size, &log_buf[0]))
+ sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
+}
+
+static void sl_tpm_extend_evtlog(u32 pcr, u32 type,
+ const u8 *data, u32 length, const char *desc)
+{
+ if (tpm_log_ver == SL_TPM20_LOG)
+ sl_tpm20_log_event(pcr, type, data, length,
+ (const u8 *)desc, strlen(desc));
+ else
+ sl_tpm12_log_event(pcr, type, data, length,
+ (const u8 *)desc, strlen(desc));
+}
+
+asmlinkage __visible void sl_check_region(void *base, u32 size)
+{
+ sl_check_pmr_coverage(base, size, false);
+}
+
+asmlinkage __visible void sl_main(void *bootparams)
+{
+ struct boot_params *bp;
+ struct setup_data *data;
+ struct txt_os_mle_data *os_mle_data;
+ struct txt_os_mle_data os_mle_tmp = {0};
+ const char *signature;
+ unsigned long mmap = 0;
+ void *txt_heap;
+ u32 data_count;
+
+ /*
+ * Currently only Intel TXT is supported for Secure Launch. Testing
+ * this value also indicates that the kernel was booted successfully
+ * through the Secure Launch entry point and is in SMX mode.
+ */
+ if (!(sl_cpu_type & SL_CPU_INTEL))
+ return;
+
+ /* Locate the TPM event log. */
+ sl_find_event_log();
+
+ /* Validate the location of the event log buffer before using it */
+ sl_validate_event_log_buffer();
+
+ /*
+ * Find the TPM hash algorithms used by the ACM and recorded in the
+ * event log.
+ */
+ if (tpm_log_ver == SL_TPM20_LOG)
+ sl_find_event_log_algorithms();
+
+ /* Sanitize them before measuring */
+ boot_params = (struct boot_params *)bootparams;
+ sanitize_boot_params(boot_params);
+
+ /* Place event log NO_ACTION tags before and after measurements */
+ sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_START, NULL, 0, "");
+
+ sl_check_pmr_coverage(bootparams, PAGE_SIZE, false);
+
+ /* Measure the zero page/boot params */
+ sl_tpm_extend_evtlog(pcr_config, TXT_EVTYPE_SLAUNCH,
+ bootparams, PAGE_SIZE,
+ "Measured boot parameters");
+
+ /* Now safe to use boot params */
+ bp = (struct boot_params *)bootparams;
+
+ /* Measure the command line */
+ if (bp->hdr.cmdline_size > 0) {
+ u64 cmdline = (u64)bp->hdr.cmd_line_ptr;
+
+ if (bp->ext_cmd_line_ptr > 0)
+ cmdline = cmdline | ((u64)bp->ext_cmd_line_ptr << 32);
+
+ sl_check_pmr_coverage((void *)cmdline,
+ bp->hdr.cmdline_size, true);
+
+ sl_tpm_extend_evtlog(pcr_config, TXT_EVTYPE_SLAUNCH,
+ (u8 *)cmdline,
+ bp->hdr.cmdline_size,
+ "Measured Kernel command line");
+ }
+
+ /*
+ * Measuring the boot params measured the fixed e820 memory map.
+ * Measure any setup_data entries including e820 extended entries.
+ */
+ data = (struct setup_data *)(unsigned long)bp->hdr.setup_data;
+ while (data) {
+ sl_check_pmr_coverage(((u8 *)data) + sizeof(struct setup_data),
+ data->len, true);
+
+ sl_tpm_extend_evtlog(pcr_config, TXT_EVTYPE_SLAUNCH,
+ ((u8 *)data) + sizeof(struct setup_data),
+ data->len,
+ "Measured Kernel setup_data");
+
+ data = (struct setup_data *)(unsigned long)data->next;
+ }
+
+ /* If bootloader was EFI, measure the memory map passed across */
+ signature =
+ (const char *)&bp->efi_info.efi_loader_signature;
+
+ if (!strncmp(signature, EFI32_LOADER_SIGNATURE, 4))
+ mmap = bp->efi_info.efi_memmap;
+ else if (!strncmp(signature, EFI64_LOADER_SIGNATURE, 4))
+ mmap = (bp->efi_info.efi_memmap |
+ ((u64)bp->efi_info.efi_memmap_hi << 32));
+
+ if (mmap)
+ sl_tpm_extend_evtlog(pcr_config, TXT_EVTYPE_SLAUNCH,
+ (void *)mmap,
+ bp->efi_info.efi_memmap_size,
+ "Measured EFI memory map");
+
+ /* Measure any external initrd */
+ if (bp->hdr.ramdisk_image != 0 && bp->hdr.ramdisk_size != 0) {
+ u64 ramdisk = (u64)bp->hdr.ramdisk_image;
+
+ if (bp->ext_ramdisk_size > 0)
+ sl_txt_reset(SL_ERROR_INITRD_TOO_BIG);
+
+ if (bp->ext_ramdisk_image > 0)
+ ramdisk = ramdisk |
+ ((u64)bp->ext_ramdisk_image << 32);
+
+ sl_check_pmr_coverage((void *)ramdisk,
+ bp->hdr.ramdisk_size, true);
+
+ sl_tpm_extend_evtlog(pcr_image, TXT_EVTYPE_SLAUNCH,
+ (u8 *)(ramdisk),
+ bp->hdr.ramdisk_size,
+ "Measured initramfs");
+ }
+
+ /*
+ * Some extra work to do on Intel, have to measure the OS-MLE
+ * heap area.
+ */
+ txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
+ os_mle_data = txt_os_mle_data_start(txt_heap);
+
+ /* Measure only portions of OS-MLE data, not addresses/sizes etc. */
+ os_mle_tmp.version = os_mle_data->version;
+ os_mle_tmp.saved_misc_enable_msr = os_mle_data->saved_misc_enable_msr;
+ os_mle_tmp.saved_bsp_mtrrs = os_mle_data->saved_bsp_mtrrs;
+
+ /* No PMR check is needed, the TXT heap is covered by the DPR */
+
+ sl_tpm_extend_evtlog(pcr_config, TXT_EVTYPE_SLAUNCH,
+ (u8 *)&os_mle_tmp,
+ sizeof(struct txt_os_mle_data),
+ "Measured TXT OS-MLE data");
+
+ sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_END, NULL, 0, "");
+
+ /*
+ * Now that the OS-MLE data is measured, ensure the MTRR and
+ * misc enable MSRs are what we expect.
+ */
+ sl_txt_validate_msrs(os_mle_data);
+}
diff --git a/arch/x86/boot/compressed/sl_stub.S b/arch/x86/boot/compressed/sl_stub.S
new file mode 100644
index 00000000..311539c7
--- /dev/null
+++ b/arch/x86/boot/compressed/sl_stub.S
@@ -0,0 +1,667 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Secure Launch protected mode entry point.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+ .code32
+ .text
+#include <linux/linkage.h>
+#include <asm/segment.h>
+#include <asm/msr.h>
+#include <asm/processor-flags.h>
+#include <asm/asm-offsets.h>
+#include <asm/bootparam.h>
+#include <asm/page_types.h>
+#include <asm/irq_vectors.h>
+#include <linux/slaunch.h>
+
+/* Can't include apicdef.h in asm */
+#define XAPIC_ENABLE (1 << 11)
+#define X2APIC_ENABLE (1 << 10)
+
+/* Can't include traps.h in asm */
+#define X86_TRAP_NMI 2
+
+/* Can't include mtrr.h in asm */
+#define MTRRphysBase0 0x200
+
+#define IDT_VECTOR_LO_BITS 0
+#define IDT_VECTOR_HI_BITS 6
+
+/*
+ * See the comment in head_64.S for detailed information on what this macro
+ * is used for.
+ */
+#define rva(X) ((X) - sl_stub_entry)
+
+/*
+ * The GETSEC op code is open coded because older versions of
+ * GCC do not support the getsec mnemonic.
+ */
+.macro GETSEC leaf
+ pushl %ebx
+ xorl %ebx, %ebx /* Must be zero for SMCTRL */
+ movl \leaf, %eax /* Leaf function */
+ .byte 0x0f, 0x37 /* GETSEC opcode */
+ popl %ebx
+.endm
+
+.macro TXT_RESET error
+ /*
+ * Set a sticky error value and reset. Note the movs to %eax act as
+ * TXT register barriers.
+ */
+ movl \error, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
+ movl (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
+ movl $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_NO_SECRETS)
+ movl (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
+ movl $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_UNLOCK_MEM_CONFIG)
+ movl (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
+ movl $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_RESET)
+1:
+ hlt
+ jmp 1b
+.endm
+
+ .code32
+SYM_FUNC_START(sl_stub_entry)
+ cli
+ cld
+
+ /*
+ * On entry, %ebx has the entry abs offset to sl_stub_entry. This
+ * will be correctly scaled using the rva macro and avoid causing
+ * relocations. Only %cs and %ds segments are known good.
+ */
+
+ /* Load GDT, set segment regs and lret to __SL32_CS */
+ leal rva(sl_gdt_desc)(%ebx), %eax
+ addl %eax, 2(%eax)
+ lgdt (%eax)
+
+ movl $(__SL32_DS), %eax
+ movw %ax, %ds
+ movw %ax, %es
+ movw %ax, %fs
+ movw %ax, %gs
+ movw %ax, %ss
+
+ /*
+ * Now that %ss is known good, take the first stack for the BSP. The
+ * AP stacks are only used on Intel.
+ */
+ leal rva(sl_stacks_end)(%ebx), %esp
+
+ leal rva(.Lsl_cs)(%ebx), %eax
+ pushl $(__SL32_CS)
+ pushl %eax
+ lret
+
+.Lsl_cs:
+ /* Save our base pointer reg */
+ pushl %ebx
+
+ /* Now see if it is GenuineIntel. CPUID 0 returns the manufacturer */
+ xorl %eax, %eax
+ cpuid
+ cmpl $(INTEL_CPUID_MFGID_EBX), %ebx
+ jnz .Ldo_unknown_cpu
+ cmpl $(INTEL_CPUID_MFGID_EDX), %edx
+ jnz .Ldo_unknown_cpu
+ cmpl $(INTEL_CPUID_MFGID_ECX), %ecx
+ jnz .Ldo_unknown_cpu
+
+ popl %ebx
+
+ /* Know it is Intel */
+ movl $(SL_CPU_INTEL), rva(sl_cpu_type)(%ebx)
+
+ /* Increment CPU count for BSP */
+ incl rva(sl_txt_cpu_count)(%ebx)
+
+ /* Enable SMI with GETSEC[SMCTRL] */
+ GETSEC $(SMX_X86_GETSEC_SMCTRL)
+
+ /* IRET-to-self can be used to enable NMIs which SENTER disabled */
+ leal rva(.Lnmi_enabled)(%ebx), %eax
+ pushfl
+ pushl $(__SL32_CS)
+ pushl %eax
+ iret
+
+.Lnmi_enabled:
+ /* Clear the TXT error registers for a clean start of day */
+ movl $0, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
+ movl $0xffffffff, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ESTS)
+
+ /* On Intel, the zero page address is passed in the TXT heap */
+ /* Read physical base of heap into EAX */
+ movl (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
+ /* Read the size of the BIOS data into ECX (first 8 bytes) */
+ movl (%eax), %ecx
+ /* Skip over BIOS data and size of OS to MLE data section */
+ leal 8(%eax, %ecx), %eax
+
+ /* Need to verify the values in the OS-MLE struct passed in */
+ call sl_txt_verify_os_mle_struct
+
+ /*
+ * Get the boot params address from the heap. Note %esi and %ebx MUST
+ * be preserved across calls and operations.
+ */
+ movl SL_boot_params_addr(%eax), %esi
+
+ /* Save %ebx so the APs can find their way home */
+ movl %ebx, (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax)
+
+ /* Fetch the AP wake code block address from the heap */
+ movl SL_ap_wake_block(%eax), %edi
+ movl %edi, rva(sl_txt_ap_wake_block)(%ebx)
+
+ /* Store the offset in the AP wake block to the jmp address */
+ movl $(sl_ap_jmp_offset - sl_txt_ap_wake_begin), \
+ (SL_mle_scratch + SL_SCRATCH_AP_JMP_OFFSET)(%eax)
+
+ /* %eax still is the base of the OS-MLE block, save it */
+ pushl %eax
+
+ /* Relocate the AP wake code to the safe block */
+ call sl_txt_reloc_ap_wake
+
+ /*
+ * Wake up all APs that are blocked in the ACM and wait for them to
+ * halt. This should be done before restoring the MTRRs so the ACM is
+ * still properly in WB memory.
+ */
+ call sl_txt_wake_aps
+
+ /*
+ * Pop OS-MLE base address (was in %eax above) for call to load
+ * MTRRs/MISC MSR
+ */
+ popl %edi
+ call sl_txt_load_regs
+
+ jmp .Lcpu_setup_done
+
+.Ldo_unknown_cpu:
+ /* Non-Intel CPUs are not yet supported */
+ ud2
+
+.Lcpu_setup_done:
+ /*
+ * Don't enable MCE at this point. The kernel will enable
+ * it on the BSP later when it is ready.
+ */
+
+ /* Done, jump to normal 32b pm entry */
+ jmp startup_32
+SYM_FUNC_END(sl_stub_entry)
+
+SYM_FUNC_START(sl_check_buffer_mle_overlap)
+ /* %ecx: buffer begin %edx: buffer end */
+ /* %ebx: MLE begin %edi: MLE end */
+
+ cmpl %edi, %ecx
+ jb .Lnext_check
+ cmpl %edi, %edx
+ jbe .Lnext_check
+ jmp .Lvalid /* Buffer above MLE */
+
+.Lnext_check:
+ cmpl %ebx, %edx
+ ja .Linvalid
+ cmpl %ebx, %ecx
+ jae .Linvalid
+ jmp .Lvalid /* Buffer below MLE */
+
+.Linvalid:
+ TXT_RESET $(SL_ERROR_MLE_BUFFER_OVERLAP)
+
+.Lvalid:
+ ret
+SYM_FUNC_END(sl_check_buffer_mle_overlap)
+
+SYM_FUNC_START(sl_txt_verify_os_mle_struct)
+ /*
+ * %eax points to the base of the OS-MLE struct. Need to also
+ * read some values from the OS-SINIT struct too.
+ */
+ movl -8(%eax), %ecx
+ /* Skip over OS to MLE data section and size of OS-SINIT structure */
+ leal (%eax, %ecx), %edx
+
+ /* Save MLE image base for sl_main's use */
+ movl %ebx, rva(sl_mle_start)(%ebx)
+
+ /* Verify the value of the low PMR base. It should always be 0. */
+ movl SL_vtd_pmr_lo_base(%edx), %esi
+ cmpl $0, %esi
+ jz .Lvalid_pmr_base
+ TXT_RESET $(SL_ERROR_LO_PMR_BASE)
+
+.Lvalid_pmr_base:
+ /* Grab some values from OS-SINIT structure */
+ movl SL_mle_size(%edx), %edi
+ addl %ebx, %edi
+ jc .Loverflow_detected
+ movl SL_vtd_pmr_lo_size(%edx), %esi
+
+ /* Check the AP wake block */
+ movl SL_ap_wake_block(%eax), %ecx
+ movl SL_ap_wake_block_size(%eax), %edx
+ addl %ecx, %edx
+ jc .Loverflow_detected
+ call sl_check_buffer_mle_overlap
+ cmpl %esi, %edx
+ ja .Lbuffer_beyond_pmr
+
+ /* Check the boot params */
+ movl SL_boot_params_addr(%eax), %ecx
+ movl $(PAGE_SIZE), %edx
+ addl %ecx, %edx
+ jc .Loverflow_detected
+ call sl_check_buffer_mle_overlap
+ cmpl %esi, %edx
+ ja .Lbuffer_beyond_pmr
+
+ /* Check that the AP wake block is big enough */
+ cmpl $(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), \
+ SL_ap_wake_block_size(%eax)
+ jae .Lwake_block_ok
+ TXT_RESET $(SL_ERROR_WAKE_BLOCK_TOO_SMALL)
+
+.Lwake_block_ok:
+ ret
+
+.Loverflow_detected:
+ TXT_RESET $(SL_ERROR_INTEGER_OVERFLOW)
+
+.Lbuffer_beyond_pmr:
+ TXT_RESET $(SL_ERROR_BUFFER_BEYOND_PMR)
+SYM_FUNC_END(sl_txt_verify_os_mle_struct)
+
+SYM_FUNC_START(sl_txt_ap_entry)
+ cli
+ cld
+ /*
+ * The %cs and %ds segments are known good after waking the AP.
+ * First order of business is to find where we are and
+ * save it in %ebx.
+ */
+
+ /* Read physical base of heap into EAX */
+ movl (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
+ /* Read the size of the BIOS data into ECX (first 8 bytes) */
+ movl (%eax), %ecx
+ /* Skip over BIOS data and size of OS to MLE data section */
+ leal 8(%eax, %ecx), %eax
+
+ /* Restore the %ebx saved by the BSP and stash the OS-MLE pointer */
+ movl (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax), %ebx
+ /* Save OS-MLE base in %edi for call to sl_txt_load_regs */
+ movl %eax, %edi
+
+ /* Lock and get our stack index */
+ movl $1, %ecx
+.Lspin:
+ xorl %eax, %eax
+ lock cmpxchgl %ecx, rva(sl_txt_spin_lock)(%ebx)
+ pause
+ jnz .Lspin
+
+ /* Increment the stack index and use the next value inside lock */
+ incl rva(sl_txt_stack_index)(%ebx)
+ movl rva(sl_txt_stack_index)(%ebx), %eax
+
+ /* Unlock */
+ movl $0, rva(sl_txt_spin_lock)(%ebx)
+
+ /* Location of the relocated AP wake block */
+ movl rva(sl_txt_ap_wake_block)(%ebx), %ecx
+
+ /* Load reloc GDT, set segment regs and lret to __SL32_CS */
+ lgdt (sl_ap_gdt_desc - sl_txt_ap_wake_begin)(%ecx)
+
+ movl $(__SL32_DS), %edx
+ movw %dx, %ds
+ movw %dx, %es
+ movw %dx, %fs
+ movw %dx, %gs
+ movw %dx, %ss
+
+ /* Load our reloc AP stack */
+ movl $(TXT_BOOT_STACK_SIZE), %edx
+ mull %edx
+ leal (sl_stacks_end - sl_txt_ap_wake_begin)(%ecx), %esp
+ subl %eax, %esp
+
+ /* Switch to AP code segment */
+ leal rva(.Lsl_ap_cs)(%ebx), %eax
+ pushl $(__SL32_CS)
+ pushl %eax
+ lret
+
+.Lsl_ap_cs:
+ /* Load the relocated AP IDT */
+ lidt (sl_ap_idt_desc - sl_txt_ap_wake_begin)(%ecx)
+
+ /* Fixup MTRRs and misc enable MSR on APs too */
+ call sl_txt_load_regs
+
+ /* Enable SMI with GETSEC[SMCTRL] */
+ GETSEC $(SMX_X86_GETSEC_SMCTRL)
+
+ /* IRET-to-self can be used to enable NMIs which SENTER disabled */
+ leal rva(.Lnmi_enabled_ap)(%ebx), %eax
+ pushfl
+ pushl $(__SL32_CS)
+ pushl %eax
+ iret
+
+.Lnmi_enabled_ap:
+ /* Put APs in X2APIC mode like the BSP */
+ movl $(MSR_IA32_APICBASE), %ecx
+ rdmsr
+ orl $(XAPIC_ENABLE | X2APIC_ENABLE), %eax
+ wrmsr
+
+ /*
+ * Basically done, increment the CPU count and jump off to the AP
+ * wake block to wait.
+ */
+ lock incl rva(sl_txt_cpu_count)(%ebx)
+
+ movl rva(sl_txt_ap_wake_block)(%ebx), %eax
+ jmp *%eax
+SYM_FUNC_END(sl_txt_ap_entry)
+
+SYM_FUNC_START(sl_txt_reloc_ap_wake)
+ /* Save boot params register */
+ pushl %esi
+
+ movl rva(sl_txt_ap_wake_block)(%ebx), %edi
+
+ /* Fixup AP IDT and GDT descriptor before relocating */
+ leal rva(sl_ap_idt_desc)(%ebx), %eax
+ addl %edi, 2(%eax)
+ leal rva(sl_ap_gdt_desc)(%ebx), %eax
+ addl %edi, 2(%eax)
+
+ /*
+ * Copy the AP wake code and AP GDT/IDT to the protected wake block
+ * provided by the loader. Destination already in %edi.
+ */
+ movl $(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), %ecx
+ leal rva(sl_txt_ap_wake_begin)(%ebx), %esi
+ rep movsb
+
+ /* Setup the IDT for the APs to use in the relocation block */
+ movl rva(sl_txt_ap_wake_block)(%ebx), %ecx
+ addl $(sl_ap_idt - sl_txt_ap_wake_begin), %ecx
+ xorl %edx, %edx
+
+ /* Form the default reset vector relocation address */
+ movl rva(sl_txt_ap_wake_block)(%ebx), %esi
+ addl $(sl_txt_int_reset - sl_txt_ap_wake_begin), %esi
+
+1:
+ cmpw $(NR_VECTORS), %dx
+ jz .Lap_idt_done
+
+ cmpw $(X86_TRAP_NMI), %dx
+ jz 2f
+
+ /* Load all other fixed vectors with reset handler */
+ movl %esi, %eax
+ movw %ax, (IDT_VECTOR_LO_BITS)(%ecx)
+ shrl $16, %eax
+ movw %ax, (IDT_VECTOR_HI_BITS)(%ecx)
+ jmp 3f
+
+2:
+ /* Load single wake NMI IPI vector at the relocation address */
+ movl rva(sl_txt_ap_wake_block)(%ebx), %eax
+ addl $(sl_txt_int_ipi_wake - sl_txt_ap_wake_begin), %eax
+ movw %ax, (IDT_VECTOR_LO_BITS)(%ecx)
+ shrl $16, %eax
+ movw %ax, (IDT_VECTOR_HI_BITS)(%ecx)
+
+3:
+ incw %dx
+ addl $8, %ecx
+ jmp 1b
+
+.Lap_idt_done:
+ popl %esi
+ ret
+SYM_FUNC_END(sl_txt_reloc_ap_wake)
+
+SYM_FUNC_START(sl_txt_load_regs)
+ /* Save base pointer register */
+ pushl %ebx
+
+ /*
+ * On Intel, the original variable MTRRs and Misc Enable MSR are
+ * restored on the BSP at early boot. Each AP will also restore
+ * its MTRRs and Misc Enable MSR.
+ */
+ pushl %edi
+ addl $(SL_saved_bsp_mtrrs), %edi
+ movl (%edi), %ebx
+ pushl %ebx /* default_mem_type lo */
+ addl $4, %edi
+ movl (%edi), %ebx
+ pushl %ebx /* default_mem_type hi */
+ addl $4, %edi
+ movl (%edi), %ebx /* mtrr_vcnt lo, don't care about hi part */
+ addl $8, %edi /* now at MTRR pair array */
+ /* Write the variable MTRRs */
+ movl $(MTRRphysBase0), %ecx
+1:
+ cmpl $0, %ebx
+ jz 2f
+
+ movl (%edi), %eax /* MTRRphysBaseX lo */
+ addl $4, %edi
+ movl (%edi), %edx /* MTRRphysBaseX hi */
+ wrmsr
+ addl $4, %edi
+ incl %ecx
+ movl (%edi), %eax /* MTRRphysMaskX lo */
+ addl $4, %edi
+ movl (%edi), %edx /* MTRRphysMaskX hi */
+ wrmsr
+ addl $4, %edi
+ incl %ecx
+
+ decl %ebx
+ jmp 1b
+2:
+ /* Write the default MTRR register */
+ popl %edx
+ popl %eax
+ movl $(MSR_MTRRdefType), %ecx
+ wrmsr
+
+ /* Return to beginning and write the misc enable msr */
+ popl %edi
+ addl $(SL_saved_misc_enable_msr), %edi
+ movl (%edi), %eax /* saved_misc_enable_msr lo */
+ addl $4, %edi
+ movl (%edi), %edx /* saved_misc_enable_msr hi */
+ movl $(MSR_IA32_MISC_ENABLE), %ecx
+ wrmsr
+
+ popl %ebx
+ ret
+SYM_FUNC_END(sl_txt_load_regs)
+
+SYM_FUNC_START(sl_txt_wake_aps)
+ /* Save boot params register */
+ pushl %esi
+
+ /* First setup the MLE join structure and load it into TXT reg */
+ leal rva(sl_gdt)(%ebx), %eax
+ leal rva(sl_txt_ap_entry)(%ebx), %ecx
+ leal rva(sl_smx_rlp_mle_join)(%ebx), %edx
+ movl %eax, SL_rlp_gdt_base(%edx)
+ movl %ecx, SL_rlp_entry_point(%edx)
+ movl %edx, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_MLE_JOIN)
+
+ /* Another TXT heap walk to find various values needed to wake APs */
+ movl (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
+ /* At BIOS data size, find the number of logical processors */
+ movl (SL_num_logical_procs + 8)(%eax), %edx
+ /* Skip over BIOS data */
+ movl (%eax), %ecx
+ addl %ecx, %eax
+ /* Skip over OS to MLE */
+ movl (%eax), %ecx
+ addl %ecx, %eax
+ /* At OS-SINIT size, get capabilities to know how to wake up the APs */
+ movl (SL_capabilities + 8)(%eax), %esi
+ /* Skip over OS to SINIT */
+ movl (%eax), %ecx
+ addl %ecx, %eax
+ /* At SINIT-MLE size, get the AP wake MONITOR address */
+ movl (SL_rlp_wakeup_addr + 8)(%eax), %edi
+
+ /* Determine how to wake up the APs */
+ testl $(1 << TXT_SINIT_MLE_CAP_WAKE_MONITOR), %esi
+ jz .Lwake_getsec
+
+ /* Wake using MWAIT MONITOR */
+ movl $1, (%edi)
+ jmp .Laps_awake
+
+.Lwake_getsec:
+ /* Wake using GETSEC[WAKEUP] */
+ GETSEC $(SMX_X86_GETSEC_WAKEUP)
+
+.Laps_awake:
+ /*
+ * All of the APs are woken up and rendezvous in the relocated wake
+ * block starting at sl_txt_ap_wake_begin. Wait for all of them to
+ * halt.
+ */
+ pause
+ cmpl rva(sl_txt_cpu_count)(%ebx), %edx
+ jne .Laps_awake
+
+ popl %esi
+ ret
+SYM_FUNC_END(sl_txt_wake_aps)
+
+/* This is the beginning of the relocated AP wake code block */
+ .global sl_txt_ap_wake_begin
+sl_txt_ap_wake_begin:
+
+ /*
+ * Wait for NMI IPI in the relocated AP wake block which was provided
+ * and protected in the memory map by the prelaunch code. Leave all
+ * other interrupts masked since we do not expect anything but an NMI.
+ */
+ xorl %edx, %edx
+
+1:
+ hlt
+ testl %edx, %edx
+ jz 1b
+
+ /*
+ * This is the long absolute jump to the 32b Secure Launch protected
+ * mode stub code in the rmpiggy. The jump address will be fixed in
+ * the SMP boot code when the first AP is brought up. This whole area
+ * is provided and protected in the memory map by the prelaunch code.
+ */
+ .byte 0xea
+sl_ap_jmp_offset:
+ .long 0x00000000
+ .word __SL32_CS
+
+SYM_FUNC_START(sl_txt_int_ipi_wake)
+ movl $1, %edx
+
+ /* NMI context, just IRET */
+ iret
+SYM_FUNC_END(sl_txt_int_ipi_wake)
+
+SYM_FUNC_START(sl_txt_int_reset)
+ TXT_RESET $(SL_ERROR_INV_AP_INTERRUPT)
+SYM_FUNC_END(sl_txt_int_reset)
+
+ .balign 8
+SYM_DATA_START_LOCAL(sl_ap_idt_desc)
+ .word sl_ap_idt_end - sl_ap_idt - 1 /* Limit */
+ .long sl_ap_idt - sl_txt_ap_wake_begin /* Base */
+SYM_DATA_END_LABEL(sl_ap_idt_desc, SYM_L_LOCAL, sl_ap_idt_desc_end)
+
+ .balign 8
+SYM_DATA_START_LOCAL(sl_ap_idt)
+ .rept NR_VECTORS
+ .word 0x0000 /* Offset 15 to 0 */
+ .word __SL32_CS /* Segment selector */
+ .word 0x8e00 /* Present, DPL=0, 32b Vector, Interrupt */
+ .word 0x0000 /* Offset 31 to 16 */
+ .endr
+SYM_DATA_END_LABEL(sl_ap_idt, SYM_L_LOCAL, sl_ap_idt_end)
+
+ .balign 8
+SYM_DATA_START_LOCAL(sl_ap_gdt_desc)
+ .word sl_ap_gdt_end - sl_ap_gdt - 1
+ .long sl_ap_gdt - sl_txt_ap_wake_begin
+SYM_DATA_END_LABEL(sl_ap_gdt_desc, SYM_L_LOCAL, sl_ap_gdt_desc_end)
+
+ .balign 8
+SYM_DATA_START_LOCAL(sl_ap_gdt)
+ .quad 0x0000000000000000 /* NULL */
+ .quad 0x00cf9a000000ffff /* __SL32_CS */
+ .quad 0x00cf92000000ffff /* __SL32_DS */
+SYM_DATA_END_LABEL(sl_ap_gdt, SYM_L_LOCAL, sl_ap_gdt_end)
+
+ /* Small stacks for BSP and APs to work with */
+ .balign 4
+SYM_DATA_START_LOCAL(sl_stacks)
+ .fill (TXT_MAX_CPUS * TXT_BOOT_STACK_SIZE), 1, 0
+SYM_DATA_END_LABEL(sl_stacks, SYM_L_LOCAL, sl_stacks_end)
+
+/* This is the end of the relocated AP wake code block */
+ .global sl_txt_ap_wake_end
+sl_txt_ap_wake_end:
+
+ .data
+ .balign 8
+SYM_DATA_START_LOCAL(sl_gdt_desc)
+ .word sl_gdt_end - sl_gdt - 1
+ .long sl_gdt - sl_gdt_desc
+SYM_DATA_END_LABEL(sl_gdt_desc, SYM_L_LOCAL, sl_gdt_desc_end)
+
+ .balign 8
+SYM_DATA_START_LOCAL(sl_gdt)
+ .quad 0x0000000000000000 /* NULL */
+ .quad 0x00cf9a000000ffff /* __SL32_CS */
+ .quad 0x00cf92000000ffff /* __SL32_DS */
+SYM_DATA_END_LABEL(sl_gdt, SYM_L_LOCAL, sl_gdt_end)
+
+ .balign 8
+SYM_DATA_START_LOCAL(sl_smx_rlp_mle_join)
+ .long sl_gdt_end - sl_gdt - 1 /* GDT limit */
+ .long 0x00000000 /* GDT base */
+ .long __SL32_CS /* Seg Sel - CS (DS, ES, SS = seg_sel+8) */
+ .long 0x00000000 /* Entry point physical address */
+SYM_DATA_END(sl_smx_rlp_mle_join)
+
+SYM_DATA(sl_cpu_type, .long 0x00000000)
+
+SYM_DATA(sl_mle_start, .long 0x00000000)
+
+SYM_DATA_LOCAL(sl_txt_spin_lock, .long 0x00000000)
+
+SYM_DATA_LOCAL(sl_txt_stack_index, .long 0x00000000)
+
+SYM_DATA_LOCAL(sl_txt_cpu_count, .long 0x00000000)
+
+SYM_DATA_LOCAL(sl_txt_ap_wake_block, .long 0x00000000)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ecd3fd6..b665c51 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -12,6 +12,7 @@
#include <linux/hardirq.h>
#include <linux/suspend.h>
#include <linux/kbuild.h>
+#include <linux/slaunch.h>
#include <asm/processor.h>
#include <asm/thread_info.h>
#include <asm/sigframe.h>
@@ -93,4 +94,22 @@ static void __used common(void)
OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
+
+#ifdef CONFIG_SECURE_LAUNCH
+ BLANK();
+ OFFSET(SL_boot_params_addr, txt_os_mle_data, boot_params_addr);
+ OFFSET(SL_saved_misc_enable_msr, txt_os_mle_data, saved_misc_enable_msr);
+ OFFSET(SL_saved_bsp_mtrrs, txt_os_mle_data, saved_bsp_mtrrs);
+ OFFSET(SL_ap_wake_block, txt_os_mle_data, ap_wake_block);
+ OFFSET(SL_ap_wake_block_size, txt_os_mle_data, ap_wake_block_size);
+ OFFSET(SL_mle_scratch, txt_os_mle_data, mle_scratch);
+ OFFSET(SL_num_logical_procs, txt_bios_data, num_logical_procs);
+ OFFSET(SL_capabilities, txt_os_sinit_data, capabilities);
+ OFFSET(SL_mle_size, txt_os_sinit_data, mle_size);
+ OFFSET(SL_vtd_pmr_lo_base, txt_os_sinit_data, vtd_pmr_lo_base);
+ OFFSET(SL_vtd_pmr_lo_size, txt_os_sinit_data, vtd_pmr_lo_size);
+ OFFSET(SL_rlp_wakeup_addr, txt_sinit_mle_data, rlp_wakeup_addr);
+ OFFSET(SL_rlp_gdt_base, smx_rlp_mle_join, rlp_gdt_base);
+ OFFSET(SL_rlp_entry_point, smx_rlp_mle_join, rlp_entry_point);
+#endif
}
--
1.8.3.1
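
For readers following the buffer checks in sl_validate_event_log_buffer()
and sl_check_buffer_mle_overlap above, the two accepted cases ("above" and
"below" the MLE) and the wrap-around guard can be modeled in plain C. This
is only an illustrative userspace sketch, not part of the patch: the names
buffer_clear_of_mle() and end_overflows() are invented for the example, and
the 64-bit limit used in the overflow guard is an assumption (the patch
itself compares against LLONG_MAX in C and relies on the carry flag in the
32-bit stub).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: would base + size wrap around a 64-bit address? */
static bool end_overflows(uint64_t base, uint64_t size)
{
	return size > UINT64_MAX - base;
}

/*
 * Hypothetical helper: true when the buffer [buf_base, buf_end) lies
 * entirely above or entirely below the MLE image [mle_base, mle_end).
 * The two tests mirror the "above" and "below" branches in the patch;
 * anything else is treated as an overlap.
 */
static bool buffer_clear_of_mle(uint64_t buf_base, uint64_t buf_end,
				uint64_t mle_base, uint64_t mle_end)
{
	/* Callers are expected to pass well-formed ranges */
	assert(buf_base <= buf_end && mle_base <= mle_end);

	if (buf_base >= mle_end && buf_end > mle_end)
		return true;	/* above */

	if (buf_end <= mle_base && buf_base < mle_base)
		return true;	/* below */

	return false;		/* overlaps the MLE */
}
```

A buffer that starts exactly at the MLE end, or ends exactly at the MLE
base, is accepted by these tests; a buffer that shares any byte with the
MLE is rejected, which is where the patch calls sl_txt_reset().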
On 2021-06-18 17:12, Ross Philipson wrote:
> The IOMMU should always be set to default translated type after
> the PMRs are disabled to protect the MLE from DMA.
>
> Signed-off-by: Ross Philipson <[email protected]>
> ---
> drivers/iommu/intel/iommu.c | 5 +++++
> drivers/iommu/iommu.c | 6 +++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index be35284..4f0256d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -41,6 +41,7 @@
> #include <linux/dma-direct.h>
> #include <linux/crash_dump.h>
> #include <linux/numa.h>
> +#include <linux/slaunch.h>
> #include <asm/irq_remapping.h>
> #include <asm/cacheflush.h>
> #include <asm/iommu.h>
> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev)
> */
> static int device_def_domain_type(struct device *dev)
> {
> + /* Do not allow identity domain when Secure Launch is configured */
> + if (slaunch_get_flags() & SL_FLAG_ACTIVE)
> + return IOMMU_DOMAIN_DMA;
Is this specific to Intel? It seems like it could easily be done
commonly like the check for untrusted external devices.
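
To make the question concrete, here is a minimal userspace model of doing
the test in one driver-independent place, next to the untrusted-device
check, before any driver hook is consulted. It is a sketch under stated
assumptions rather than kernel code: model_dev, model_get_def_domain_type()
and the DOMAIN_* values are invented for illustration, and the
slaunch_active parameter stands in for
(slaunch_get_flags() & SL_FLAG_ACTIVE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values only; the real IOMMU_DOMAIN_* constants differ */
enum { DOMAIN_UNSET = 0, DOMAIN_IDENTITY = 1, DOMAIN_DMA = 2 };

/* Hypothetical device model with an optional per-driver preference hook */
struct model_dev {
	bool untrusted;
	int (*def_domain_type)(const struct model_dev *dev);
};

/* Models a driver that would ask for an identity domain */
static int model_driver_wants_identity(const struct model_dev *dev)
{
	(void)dev;
	return DOMAIN_IDENTITY;
}

/* Common policy first, driver preference second */
static int model_get_def_domain_type(const struct model_dev *dev,
				     bool slaunch_active)
{
	assert(dev != NULL);

	/* Policy that applies regardless of which IOMMU driver is in use */
	if (dev->untrusted || slaunch_active)
		return DOMAIN_DMA;

	if (dev->def_domain_type)
		return dev->def_domain_type(dev);

	return DOMAIN_UNSET;	/* no preference */
}

static const struct model_dev model_internal_dev = {
	.untrusted = false,
	.def_domain_type = model_driver_wants_identity,
};

static const struct model_dev model_plain_dev = {
	.untrusted = false,
	.def_domain_type = NULL,
};
```

With the check in the common helper, a driver that prefers identity
mapping is overridden whenever the flag is set, and no per-driver change
is needed.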
> +
> if (dev_is_pci(dev)) {
> struct pci_dev *pdev = to_pci_dev(dev);
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d..d49b7dd 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -23,6 +23,7 @@
> #include <linux/property.h>
> #include <linux/fsl/mc.h>
> #include <linux/module.h>
> +#include <linux/slaunch.h>
> #include <trace/events/iommu.h>
>
> static struct kset *iommu_group_kset;
> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
> {
> if (cmd_line)
> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
> +
> + /* Do not allow identity domain when Secure Launch is configured */
> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
Quietly ignoring the setting and possibly leaving iommu_def_domain_type
uninitialised (note that 0 is not actually a usable type) doesn't seem
great. AFAICS this probably warrants similar treatment to the
mem_encrypt_active() case - there doesn't seem a great deal of value in
trying to save users from themselves if they care about measured boot
yet explicitly pass options which may compromise measured boot. If you
really want to go down that route there's at least the sysfs interface
you'd need to nobble as well, not to mention the various ways of
completely disabling IOMMUs...
It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on
!SECURE_LAUNCH for clarity though.
Robin.
> }
>
> void iommu_set_default_translated(bool cmd_line)
>
On 6/18/21 2:32 PM, Robin Murphy wrote:
> On 2021-06-18 17:12, Ross Philipson wrote:
>> The IOMMU should always be set to default translated type after
>> the PMRs are disabled to protect the MLE from DMA.
>>
>> Signed-off-by: Ross Philipson <[email protected]>
>> ---
>> drivers/iommu/intel/iommu.c | 5 +++++
>> drivers/iommu/iommu.c | 6 +++++-
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index be35284..4f0256d 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -41,6 +41,7 @@
>> #include <linux/dma-direct.h>
>> #include <linux/crash_dump.h>
>> #include <linux/numa.h>
>> +#include <linux/slaunch.h>
>> #include <asm/irq_remapping.h>
>> #include <asm/cacheflush.h>
>> #include <asm/iommu.h>
>> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
>> *dev)
>> */
>> static int device_def_domain_type(struct device *dev)
>> {
>> + /* Do not allow identity domain when Secure Launch is configured */
>> + if (slaunch_get_flags() & SL_FLAG_ACTIVE)
>> + return IOMMU_DOMAIN_DMA;
>
> Is this specific to Intel? It seems like it could easily be done
> commonly like the check for untrusted external devices.
It is currently Intel only but that will change. I will look into what
you suggest.
>
>> +
>> if (dev_is_pci(dev)) {
>> struct pci_dev *pdev = to_pci_dev(dev);
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 808ab70d..d49b7dd 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -23,6 +23,7 @@
>> #include <linux/property.h>
>> #include <linux/fsl/mc.h>
>> #include <linux/module.h>
>> +#include <linux/slaunch.h>
>> #include <trace/events/iommu.h>
>> static struct kset *iommu_group_kset;
>> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
>> {
>> if (cmd_line)
>> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
>> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>> +
>> + /* Do not allow identity domain when Secure Launch is configured */
>> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
>> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>
> Quietly ignoring the setting and possibly leaving iommu_def_domain_type
> uninitialised (note that 0 is not actually a usable type) doesn't seem
> great. AFAICS this probably warrants similar treatment to the
Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA even
though passthrough was requested. Or perhaps something more is needed here?
> mem_encrypt_active() case - there doesn't seem a great deal of value in
> trying to save users from themselves if they care about measured boot
> yet explicitly pass options which may compromise measured boot. If you
> really want to go down that route there's at least the sysfs interface
> you'd need to nobble as well, not to mention the various ways of
> completely disabling IOMMUs...
Doing a secure launch with the kernel is not a general purpose user use
case. A lot of work is done to secure the environment. Allowing
passthrough mode would leave the secure launch kernel exposed to DMA. I
think what we are trying to do here is what we intend though there may
be a better way or perhaps it is incomplete as you suggest.
>
> It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on
> !SECURE_LAUNCH for clarity though.
This came from a specific request to not make disabling IOMMU modes
build time dependent. This is because a secure launch enabled kernel can
also be booted as a general purpose kernel in cases where this is desired.
Thank you,
Ross
>
> Robin.
>
>> }
>> void iommu_set_default_translated(bool cmd_line)
>>
On Mon, Jun 21, 2021 at 10:51 AM Ross Philipson
<[email protected]> wrote:
>
> On 6/18/21 2:32 PM, Robin Murphy wrote:
> > On 2021-06-18 17:12, Ross Philipson wrote:
> >> The IOMMU should always be set to default translated type after
> >> the PMRs are disabled to protect the MLE from DMA.
> >>
> >> Signed-off-by: Ross Philipson <[email protected]>
> >> ---
> >> drivers/iommu/intel/iommu.c | 5 +++++
> >> drivers/iommu/iommu.c | 6 +++++-
> >> 2 files changed, 10 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> >> index be35284..4f0256d 100644
> >> --- a/drivers/iommu/intel/iommu.c
> >> +++ b/drivers/iommu/intel/iommu.c
> >> @@ -41,6 +41,7 @@
> >> #include <linux/dma-direct.h>
> >> #include <linux/crash_dump.h>
> >> #include <linux/numa.h>
> >> +#include <linux/slaunch.h>
> >> #include <asm/irq_remapping.h>
> >> #include <asm/cacheflush.h>
> >> #include <asm/iommu.h>
> >> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
> >> *dev)
> >> */
> >> static int device_def_domain_type(struct device *dev)
> >> {
> >> + /* Do not allow identity domain when Secure Launch is configured */
> >> + if (slaunch_get_flags() & SL_FLAG_ACTIVE)
> >> + return IOMMU_DOMAIN_DMA;
> >
> > Is this specific to Intel? It seems like it could easily be done
> > commonly like the check for untrusted external devices.
>
> It is currently Intel only but that will change. I will look into what
> you suggest.
>
> >
> >> +
> >> if (dev_is_pci(dev)) {
> >> struct pci_dev *pdev = to_pci_dev(dev);
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index 808ab70d..d49b7dd 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -23,6 +23,7 @@
> >> #include <linux/property.h>
> >> #include <linux/fsl/mc.h>
> >> #include <linux/module.h>
> >> +#include <linux/slaunch.h>
> >> #include <trace/events/iommu.h>
> >> static struct kset *iommu_group_kset;
> >> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
> >> {
> >> if (cmd_line)
> >> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
> >> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
> >> +
> >> + /* Do not allow identity domain when Secure Launch is configured */
> >> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
> >> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
> >
> > Quietly ignoring the setting and possibly leaving iommu_def_domain_type
> > uninitialised (note that 0 is not actually a usable type) doesn't seem
> > great. AFAICS this probably warrants similar treatment to the
>
> Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA even
> though passthrough was requested. Or perhaps something more is needed here?
>
> > mem_encrypt_active() case - there doesn't seem a great deal of value in
> > trying to save users from themselves if they care about measured boot
> > yet explicitly pass options which may compromise measured boot. If you
> > really want to go down that route there's at least the sysfs interface
> > you'd need to nobble as well, not to mention the various ways of
> > completely disabling IOMMUs...
>
> Doing a secure launch with the kernel is not a general purpose user use
> case. A lot of work is done to secure the environment. Allowing
> passthrough mode would leave the secure launch kernel exposed to DMA. I
> think what we are trying to do here is what we intend though there may
> be a better way or perhaps it is incomplete as you suggest.
>
I don't really like all these special cases. Generically, what you're
trying to do is (AFAICT) to get the kernel to run in a mode in which
it does its best not to trust attached devices. Nothing about this is
specific to Secure Launch. There are plenty of scenarios in which
this is the case:
- Virtual devices in a VM host outside the TCB, e.g. VDUSE, Xen
device domains (did I get the name right), whatever tricks QEMU has,
etc.
- SRTM / DRTM technologies (including but not limited to Secure
Launch -- plain old Secure Boot can work like this too).
- Secure guest technologies, including but not limited to TDX and SEV.
- Any computer with a USB-C port or other external DMA-capable port.
- Regular computers in which the admin wants to enable this mode for
whatever reason.
Can you folks all please agree on a coordinated way for a Linux kernel
to configure itself appropriately? Or to be configured via initramfs,
boot option, or some other trusted source of configuration supplied at
boot time? We don't need a whole bunch of if (TDX), if (SEV), if
(secure launch), if (I have a USB-C port with PCIe exposed), if
(running on Xen), and similar checks all over the place.
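As a rough illustration of what a coordinated approach could look like,
here is a minimal user-space C sketch in which every technology records
why devices should be distrusted, and consumers such as the IOMMU code
ask a single question. Everything in it is hypothetical and invented for
illustration (the DISTRUST_* reasons, struct device_trust_policy and the
policy_*() helpers); no such interface exists in the kernel as far as
this thread shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons for distrusting attached devices (one bit each). */
enum distrust_reason {
	DISTRUST_DRTM_LAUNCH        = 1u << 0, /* e.g. Secure Launch */
	DISTRUST_CONFIDENTIAL_GUEST = 1u << 1, /* e.g. TDX or SEV guest */
	DISTRUST_EXTERNAL_DMA_PORT  = 1u << 2, /* e.g. USB-C with PCIe exposed */
	DISTRUST_UNTRUSTED_BACKEND  = 1u << 3, /* virtual devices outside the TCB */
	DISTRUST_ADMIN_REQUEST      = 1u << 4, /* boot option or initramfs policy */
};

/* Hypothetical policy object, filled in once from trusted boot-time input. */
struct device_trust_policy {
	unsigned int reasons;
};

/* Each technology registers its reason; none of them touches the consumers. */
void policy_add_reason(struct device_trust_policy *p, enum distrust_reason r)
{
	assert(p != NULL);
	p->reasons |= (unsigned int)r;
}

/* The one predicate that consumers would call. */
bool policy_distrust_devices(const struct device_trust_policy *p)
{
	assert(p != NULL);
	return p->reasons != 0;
}

/* Example consumer: identity (passthrough) domains only if devices are trusted. */
bool policy_allow_identity_domain(const struct device_trust_policy *p)
{
	return !policy_distrust_devices(p);
}
```

With something of this shape, adding a new technology means adding one
reason bit rather than another "if (...)" at every call site.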
On 2021-06-21 18:51, Ross Philipson wrote:
> On 6/18/21 2:32 PM, Robin Murphy wrote:
>> On 2021-06-18 17:12, Ross Philipson wrote:
>>> The IOMMU should always be set to default translated type after
>>> the PMRs are disabled to protect the MLE from DMA.
>>>
>>> Signed-off-by: Ross Philipson <[email protected]>
>>> ---
>>> drivers/iommu/intel/iommu.c | 5 +++++
>>> drivers/iommu/iommu.c | 6 +++++-
>>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>> index be35284..4f0256d 100644
>>> --- a/drivers/iommu/intel/iommu.c
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -41,6 +41,7 @@
>>> #include <linux/dma-direct.h>
>>> #include <linux/crash_dump.h>
>>> #include <linux/numa.h>
>>> +#include <linux/slaunch.h>
>>> #include <asm/irq_remapping.h>
>>> #include <asm/cacheflush.h>
>>> #include <asm/iommu.h>
>>> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
>>> *dev)
>>> */
>>> static int device_def_domain_type(struct device *dev)
>>> {
>>> + /* Do not allow identity domain when Secure Launch is configured */
>>> + if (slaunch_get_flags() & SL_FLAG_ACTIVE)
>>> + return IOMMU_DOMAIN_DMA;
>>
>> Is this specific to Intel? It seems like it could easily be done
>> commonly like the check for untrusted external devices.
>
> It is currently Intel only but that will change. I will look into what
> you suggest.
Yeah, it's simple and unobtrusive enough that I reckon it's worth going
straight to the common version if it's worth doing at all.
>>> +
>>> if (dev_is_pci(dev)) {
>>> struct pci_dev *pdev = to_pci_dev(dev);
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 808ab70d..d49b7dd 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -23,6 +23,7 @@
>>> #include <linux/property.h>
>>> #include <linux/fsl/mc.h>
>>> #include <linux/module.h>
>>> +#include <linux/slaunch.h>
>>> #include <trace/events/iommu.h>
>>> static struct kset *iommu_group_kset;
>>> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
>>> {
>>> if (cmd_line)
>>> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
>>> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>> +
>>> + /* Do not allow identity domain when Secure Launch is configured */
>>> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
>>> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>
>> Quietly ignoring the setting and possibly leaving iommu_def_domain_type
>> uninitialised (note that 0 is not actually a usable type) doesn't seem
>> great. AFAICS this probably warrants similar treatment to the
>
> Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA even
> though passthrough was requested. Or perhaps something more is needed here?
>
>> mem_encrypt_active() case - there doesn't seem a great deal of value in
>> trying to save users from themselves if they care about measured boot
>> yet explicitly pass options which may compromise measured boot. If you
>> really want to go down that route there's at least the sysfs interface
>> you'd need to nobble as well, not to mention the various ways of
>> completely disabling IOMMUs...
>
> Doing a secure launch with the kernel is not a general purpose user use
> case. A lot of work is done to secure the environment. Allowing
> passthrough mode would leave the secure launch kernel exposed to DMA. I
> think what we are trying to do here is what we intend though there may
> be a better way or perhaps it is incomplete as you suggest.
On second thoughts this is overkill anyway - if you do hook
iommu_get_def_domain_type(), you're done (in terms of the kernel-managed
setting, at least); it doesn't matter what iommu_def_domain_type gets
set to if it will never get used. However, since this isn't really a
per-device thing, it might be more semantically appropriate to leave
that alone and instead only massage the default type in
iommu_subsys_init(), as for memory encryption.
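To make that concrete, here is a minimal stand-alone C sketch of picking
the default type once, at init time, and overriding a passthrough
request in that single place. It is a user-space model rather than
kernel code: the enum, struct boot_state and pick_default_domain() are
hypothetical names made up for illustration, and the real
iommu_subsys_init() logic may well differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's domain types; names and values are illustrative. */
enum domain_type {
	DOMAIN_UNSET = 0,	/* 0 is deliberately not a usable type */
	DOMAIN_IDENTITY,	/* passthrough */
	DOMAIN_DMA,		/* translated */
};

/* Hypothetical snapshot of what is known when the subsystem initialises. */
struct boot_state {
	bool cmdline_set;		/* a default was requested on the command line */
	bool cmdline_passthrough;	/* ...and that request was passthrough */
	bool config_passthrough;	/* build-time default is passthrough */
	bool mem_encrypt_active;	/* existing reason to force translation */
	bool secure_launch_active;	/* proposed additional reason */
};

/* Choose the default domain type; the result is never DOMAIN_UNSET. */
enum domain_type pick_default_domain(const struct boot_state *s)
{
	bool passthrough;

	assert(s != NULL);

	/* An explicit command-line request wins over the build-time default. */
	passthrough = s->cmdline_set ? s->cmdline_passthrough
				     : s->config_passthrough;

	/* Override passthrough in one place instead of ignoring the setter. */
	if (passthrough && (s->mem_encrypt_active || s->secure_launch_active))
		return DOMAIN_DMA;

	return passthrough ? DOMAIN_IDENTITY : DOMAIN_DMA;
}
```

The point of the shape is that the setter can record whatever the user
asked for, while the override happens once and always yields a usable
type.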
When you say "secure the environment", what's the actual threat model
here, i.e. who's securing what against whom? If it's a device lockdown
type thing where the system owner wants to defend against the end user
trying to mess with the software stack or gain access to parts they
shouldn't, then possibly you can trust the command line, but there are
definitely other places which need consideration. If on the other hand
it's more about giving the end user confidence that their choice of
software stack isn't being interfered with by a malicious host or
external third parties, then it probably leans towards the opposite
being true...
If the command line *is* within the threat model, consider "iommu=off"
and/or "intel_iommu=off" for example: I don't know how PMRs work, but I
can only imagine that that's liable to leave things either wide open, or
blocked to the point of no DMA working at all, neither of which seems to
be what you want. I'm guessing "intel_iommu=tboot_noforce" might have
some relevant implications too.
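If the command line does need vetting, the check could look roughly like
the stand-alone C sketch below, which scans a command line for exact
tokens that disable or bypass the IOMMU. The function name
find_risky_iommu_param() and the contents of the list are assumptions
made for illustration: the list holds the options named above plus two
passthrough spellings, it is not exhaustive, whether tboot_noforce
belongs in it is only a guess, and comma-separated option values are not
handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative, non-exhaustive list of options to flag. */
static const char *const risky_params[] = {
	"iommu=off",
	"iommu=pt",
	"iommu.passthrough=1",
	"intel_iommu=off",
	"intel_iommu=tboot_noforce",
};

/*
 * Return the first listed parameter that appears as a whole
 * whitespace-separated token in cmdline, or NULL if none does.
 */
const char *find_risky_iommu_param(const char *cmdline)
{
	size_t n = sizeof(risky_params) / sizeof(risky_params[0]);

	assert(cmdline != NULL);

	while (*cmdline) {
		size_t len, i;

		cmdline += strspn(cmdline, " \t");	/* skip separators */
		len = strcspn(cmdline, " \t");		/* length of this token */

		for (i = 0; i < n; i++) {
			if (len == strlen(risky_params[i]) &&
			    strncmp(cmdline, risky_params[i], len) == 0)
				return risky_params[i];
		}
		cmdline += len;
	}
	return NULL;
}
```

Matching whole tokens avoids false hits on strings such as
"iommu=offline", but it also means any real policy would have to follow
the kernel's own parsing rules rather than this simplified one.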
>> It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on
>> !SECURE_LAUNCH for clarity though.
>
> This came from a specific request to not make disabling IOMMU modes
> build time dependent. This is because a secure launch enabled kernel can
> also be booted as a general purpose kernel in cases where this is desired.
Ah, thanks for clarifying - I was wondering about that aspect. FWIW,
note that that wouldn't actually change any functionality - it's a
non-default config option anyway, and users could still override it
either way in a non-secure-launch setup - but it sounds like it might be
effectively superfluous if you do need to make a more active runtime
decision anyway.
Cheers,
Robin.
On 6/22/21 7:06 AM, Robin Murphy wrote:
> On 2021-06-21 18:51, Ross Philipson wrote:
>> On 6/18/21 2:32 PM, Robin Murphy wrote:
>>> On 2021-06-18 17:12, Ross Philipson wrote:
>>>> The IOMMU should always be set to default translated type after
>>>> the PMRs are disabled to protect the MLE from DMA.
>>>>
>>>> Signed-off-by: Ross Philipson <[email protected]>
>>>> ---
>>>> drivers/iommu/intel/iommu.c | 5 +++++
>>>> drivers/iommu/iommu.c | 6 +++++-
>>>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>>> index be35284..4f0256d 100644
>>>> --- a/drivers/iommu/intel/iommu.c
>>>> +++ b/drivers/iommu/intel/iommu.c
>>>> @@ -41,6 +41,7 @@
>>>> #include <linux/dma-direct.h>
>>>> #include <linux/crash_dump.h>
>>>> #include <linux/numa.h>
>>>> +#include <linux/slaunch.h>
>>>> #include <asm/irq_remapping.h>
>>>> #include <asm/cacheflush.h>
>>>> #include <asm/iommu.h>
>>>> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
>>>> *dev)
>>>> */
>>>> static int device_def_domain_type(struct device *dev)
>>>> {
>>>> + /* Do not allow identity domain when Secure Launch is
>>>> configured */
>>>> + if (slaunch_get_flags() & SL_FLAG_ACTIVE)
>>>> + return IOMMU_DOMAIN_DMA;
>>>
>>> Is this specific to Intel? It seems like it could easily be done
>>> commonly like the check for untrusted external devices.
>>
>> It is currently Intel only but that will change. I will look into what
>> you suggest.
>
> Yeah, it's simple and unobtrusive enough that I reckon it's worth going
> straight to the common version if it's worth doing at all.
>
>>>> +
>>>> if (dev_is_pci(dev)) {
>>>> struct pci_dev *pdev = to_pci_dev(dev);
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index 808ab70d..d49b7dd 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -23,6 +23,7 @@
>>>> #include <linux/property.h>
>>>> #include <linux/fsl/mc.h>
>>>> #include <linux/module.h>
>>>> +#include <linux/slaunch.h>
>>>> #include <trace/events/iommu.h>
>>>> static struct kset *iommu_group_kset;
>>>> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool
>>>> cmd_line)
>>>> {
>>>> if (cmd_line)
>>>> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
>>>> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>>> +
>>>> + /* Do not allow identity domain when Secure Launch is
>>>> configured */
>>>> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
>>>> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>>
>>> Quietly ignoring the setting and possibly leaving iommu_def_domain_type
>>> uninitialised (note that 0 is not actually a usable type) doesn't seem
>>> great. AFAICS this probably warrants similar treatment to the
>>
>> Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA even
>> though passthrough was requested. Or perhaps something more is needed
>> here?
>>
>>> mem_encrypt_active() case - there doesn't seem a great deal of value in
>>> trying to save users from themselves if they care about measured boot
>>> yet explicitly pass options which may compromise measured boot. If you
>>> really want to go down that route there's at least the sysfs interface
>>> you'd need to nobble as well, not to mention the various ways of
>>> completely disabling IOMMUs...
>>
>> Doing a secure launch with the kernel is not a general purpose user use
>> case. A lot of work is done to secure the environment. Allowing
>> passthrough mode would leave the secure launch kernel exposed to DMA. I
>> think what we are trying to do here is what we intend though there may
>> be a better way or perhaps it is incomplete as you suggest.
>
> On second thoughts this is overkill anyway - if you do hook
> iommu_get_def_domain_type(), you're done (in terms of the kernel-managed
> setting, at least); it doesn't matter what iommu_def_domain_type gets
> set to if it will never get used. However, since this isn't really a
> per-device thing, it might be more semantically appropriate to leave
> that alone and instead only massage the default type in
> iommu_subsys_init(), as for memory encryption.
>
> When you say "secure the environment", what's the actual threat model
> here, i.e. who's securing what against whom? If it's a device lockdown
> type thing where the system owner wants to defend against the end user
> trying to mess with the software stack or gain access to parts they
> shouldn't, then possibly you can trust the command line, but there are
> definitely other places which need consideration. If on the other hand
> it's more about giving the end user confidence that their choice of
> software stack isn't being interfered with by a malicious host or
> external third parties, then it probably leans towards the opposite
> being true...
>
> If the command line *is* within the threat model, consider "iommu=off"
> and/or "intel_iommu=off" for example: I don't know how PMRs work, but I
> can only imagine that that's liable to leave things either wide open, or
> blocked to the point of no DMA working at all, neither of which seems to
> be what you want. I'm guessing "intel_iommu=tboot_noforce" might have
> some relevant implications too.
Thank you for your suggestions and feedback. Sorry we did not get back
sooner. After the comments from you and Andy Lutomirski we decided we
needed to re-imagine what we are trying to accomplish here and how else
we might approach it.
Ross
>
>>> It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on
>>> !SECURE_LAUNCH for clarity though.
>>
>> This came from a specific request to not make disabling IOMMU modes
>> build time dependent. This is because a secure launch enabled kernel can
>> also be booted as a general purpose kernel in cases where this is
>> desired.
>
> Ah, thanks for clarifying - I was wondering about that aspect. FWIW,
> note that that wouldn't actually change any functionality - it's a
> non-default config option anyway, and users could still override it
> either way in a non-secure-launch setup - but it sounds like it might be
> effectively superfluous if you do need to make a more active runtime
> decision anyway.
>
> Cheers,
> Robin.
On 6/21/21 5:15 PM, Andy Lutomirski wrote:
> On Mon, Jun 21, 2021 at 10:51 AM Ross Philipson
> <[email protected]> wrote:
>>
>> On 6/18/21 2:32 PM, Robin Murphy wrote:
>>> On 2021-06-18 17:12, Ross Philipson wrote:
>>>> The IOMMU should always be set to default translated type after
>>>> the PMRs are disabled to protect the MLE from DMA.
>>>>
>>>> Signed-off-by: Ross Philipson <[email protected]>
>>>> ---
>>>> drivers/iommu/intel/iommu.c | 5 +++++
>>>> drivers/iommu/iommu.c | 6 +++++-
>>>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>>> index be35284..4f0256d 100644
>>>> --- a/drivers/iommu/intel/iommu.c
>>>> +++ b/drivers/iommu/intel/iommu.c
>>>> @@ -41,6 +41,7 @@
>>>> #include <linux/dma-direct.h>
>>>> #include <linux/crash_dump.h>
>>>> #include <linux/numa.h>
>>>> +#include <linux/slaunch.h>
>>>> #include <asm/irq_remapping.h>
>>>> #include <asm/cacheflush.h>
>>>> #include <asm/iommu.h>
>>>> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
>>>> *dev)
>>>> */
>>>> static int device_def_domain_type(struct device *dev)
>>>> {
>>>> + /* Do not allow identity domain when Secure Launch is configured */
>>>> + if (slaunch_get_flags() & SL_FLAG_ACTIVE)
>>>> + return IOMMU_DOMAIN_DMA;
>>>
>>> Is this specific to Intel? It seems like it could easily be done
>>> commonly like the check for untrusted external devices.
>>
>> It is currently Intel only but that will change. I will look into what
>> you suggest.
>>
>>>
>>>> +
>>>> if (dev_is_pci(dev)) {
>>>> struct pci_dev *pdev = to_pci_dev(dev);
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index 808ab70d..d49b7dd 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -23,6 +23,7 @@
>>>> #include <linux/property.h>
>>>> #include <linux/fsl/mc.h>
>>>> #include <linux/module.h>
>>>> +#include <linux/slaunch.h>
>>>> #include <trace/events/iommu.h>
>>>> static struct kset *iommu_group_kset;
>>>> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
>>>> {
>>>> if (cmd_line)
>>>> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
>>>> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>>> +
>>>> + /* Do not allow identity domain when Secure Launch is configured */
>>>> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
>>>> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>>
>>> Quietly ignoring the setting and possibly leaving iommu_def_domain_type
>>> uninitialised (note that 0 is not actually a usable type) doesn't seem
>>> great. AFAICS this probably warrants similar treatment to the
>>
>> Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA even
>> though passthrough was requested. Or perhaps something more is needed here?
>>
>>> mem_encrypt_active() case - there doesn't seem a great deal of value in
>>> trying to save users from themselves if they care about measured boot
>>> yet explicitly pass options which may compromise measured boot. If you
>>> really want to go down that route there's at least the sysfs interface
>>> you'd need to nobble as well, not to mention the various ways of
>>> completely disabling IOMMUs...
>>
>> Doing a secure launch with the kernel is not a general purpose user use
>> case. A lot of work is done to secure the environment. Allowing
>> passthrough mode would leave the secure launch kernel exposed to DMA. I
>> think what we are trying to do here is what we intend though there may
>> be a better way or perhaps it is incomplete as you suggest.
>>
>
> I don't really like all these special cases. Generically, what you're
> trying to do is (AFAICT) to get the kernel to run in a mode in which
> it does its best not to trust attached devices. Nothing about this is
> specific to Secure Launch. There are plenty of scenarios in which
> this is the case:
>
> - Virtual devices in a VM host outside the TCB, e.g. VDUSE, Xen
> device domains (did I get the name right), whatever tricks QEMU has,
> etc.
> - SRTM / DRTM technologies (including but not limited to Secure
> Launch -- plain old Secure Boot can work like this too).
> - Secure guest technologies, including but not limited to TDX and SEV.
> - Any computer with a USB-C port or other external DMA-capable port.
> - Regular computers in which the admin wants to enable this mode for
> whatever reason.
>
> Can you folks all please agree on a coordinated way for a Linux kernel
> to configure itself appropriately? Or to be configured via initramfs,
> boot option, or some other trusted source of configuration supplied at
> boot time? We don't need a whole bunch of if (TDX), if (SEV), if
> (secure launch), if (I have a USB-C port with PCIe exposed), if
> (running on Xen), and similar checks all over the place.
>
I replied to Robin Murphy in another thread. As far as the IOMMU is
concerned, I think we need to rethink our approach. As to the other
technologies you mention here, we have not considered special casing
anything at this point.
Thanks
Ross
On 6/21/21 5:15 PM, Andy Lutomirski wrote:
> On Mon, Jun 21, 2021 at 10:51 AM Ross Philipson
> <[email protected]> wrote:
>>
>> On 6/18/21 2:32 PM, Robin Murphy wrote:
>>> On 2021-06-18 17:12, Ross Philipson wrote:
>>>> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
>>>> {
>>>> if (cmd_line)
>>>> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
>>>> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>>> +
>>>> + /* Do not allow identity domain when Secure Launch is configured */
>>>> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
>>>> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>>>
>>> Quietly ignoring the setting and possibly leaving iommu_def_domain_type
>>> uninitialised (note that 0 is not actually a usable type) doesn't seem
>>> great. AFAICS this probably warrants similar treatment to the
>>
>> Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA even
>> though passthrough was requested. Or perhaps something more is needed here?
>>
>>> mem_encrypt_active() case - there doesn't seem a great deal of value in
>>> trying to save users from themselves if they care about measured boot
>>> yet explicitly pass options which may compromise measured boot. If you
>>> really want to go down that route there's at least the sysfs interface
>>> you'd need to nobble as well, not to mention the various ways of
>>> completely disabling IOMMUs...
>>
>> Doing a secure launch with the kernel is not a general purpose user use
>> case. A lot of work is done to secure the environment. Allowing
>> passthrough mode would leave the secure launch kernel exposed to DMA. I
>> think what we are trying to do here is what we intend though there may
>> be a better way or perhaps it is incomplete as you suggest.
>>
>
> I don't really like all these special cases. Generically, what you're
> trying to do is (AFAICT) to get the kernel to run in a mode in which
> it does its best not to trust attached devices. Nothing about this is
> specific to Secure Launch. There are plenty of scenarios in which
> this is the case:
>
> - Virtual devices in a VM host outside the TCB, e.g. VDUSE, Xen
> device domains (did I get the name right), whatever tricks QEMU has,
> etc.
> - SRTM / DRTM technologies (including but not limited to Secure
> Launch -- plain old Secure Boot can work like this too).
> - Secure guest technologies, including but not limited to TDX and SEV.
> - Any computer with a USB-C port or other external DMA-capable port.
> - Regular computers in which the admin wants to enable this mode for
> whatever reason.
>
> Can you folks all please agree on a coordinated way for a Linux kernel
> to configure itself appropriately? Or to be configured via initramfs,
> boot option, or some other trusted source of configuration supplied at
> boot time? We don't need a whole bunch of if (TDX), if (SEV), if
> (secure launch), if (I have a USB-C port with PCIe exposed), if
> (running on Xen), and similar checks all over the place.
Hey Andy,
On behalf of Ross and myself I wanted to follow up on the points raised
here. While there is an interest to ensure a system is properly
configured we should not be blocking the user from configuring the
system as they desire. Instead we are taking the approach to document
the SecureLaunch capability, in particular the recommended way to
configure the kernel to appropriately use the capability using the
already existing methods such as using kernel parameters. Hopefully that
will address the concerns in the short term. Looking forward, we do have
a vested interest in ensuring there is an ability to configure access
control for security and safety critical solutions and would be grateful
if we would be included in any discussions or working groups that might
be looking into unifying how all these security technologies should be
configuring the Linux kernel.
V/r,
Daniel P. Smith