This patch series provides support for AMD's new Secure Memory Encryption (SME)
feature.
SME can be used to mark individual pages of memory as encrypted through the
page tables. A page of memory that is marked encrypted will be automatically
decrypted when read from DRAM and will be automatically encrypted when
written to DRAM. Details on SME can be found in the links below.
The SME feature is identified through a CPUID function and enabled through
the SYSCFG MSR. Once enabled, page table entries will determine how the
memory is accessed. If a page table entry has the memory encryption mask set,
then that memory will be accessed as encrypted memory. The memory encryption
mask (as well as other related information) is determined from settings
returned through the same CPUID function that identifies the presence of the
feature.
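As a quick illustration (not part of this series), the presence of the
feature can be probed with the CPUID instruction; whether BIOS has actually
enabled SME additionally requires reading the SYSCFG MSR (bit 23), which
this hypothetical userspace sketch omits:

/* Hypothetical standalone probe: gcc -o sme-probe sme-probe.c */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx) || !(eax & 1)) {
		puts("SME not reported by CPUID");
		return 0;
	}

	/* EBX[5:0] = encryption bit position, EBX[11:6] = phys addr reduction */
	printf("SME reported: C-bit %u, physical address reduction %u bits\n",
	       ebx & 0x3f, (ebx >> 6) & 0x3f);

	return 0;
}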
The approach that this patch series takes is to encrypt everything possible,
starting early in the boot, where the kernel itself is encrypted. Using the
page table macros, the encryption mask can be incorporated into all page
table entries and page allocations. By updating the protection map, userspace
allocations are also marked encrypted. Certain data must be accounted for
as having been placed in memory before SME was enabled (EFI, initrd, etc.)
and accessed accordingly.
This patch series is a pre-cursor to another AMD processor feature called
Secure Encrypted Virtualization (SEV). The support for SEV will build upon
the SME support and will be submitted later. Details on SEV can be found
in the links below.
The following links provide additional detail:
AMD Memory Encryption whitepaper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34
---
This patch series is based on the master branch of the tip tree.
Commit 53614fbd7961 ("Merge branch 'WIP.x86/fpu'")
Source code is also available at https://github.com/codomania/tip/tree/sme-v6
Still to do:
- Kdump support, including using memremap() instead of ioremap_cache()
Changes since v5:
- Added support for 5-level paging
- Added IOMMU support
- Created a generic asm/mem_encrypt.h in order to remove a bunch of
#ifndef/#define entries
- Removed changes to the __va() macro and defined a function to return
the true physical address in cr3
- Removed sysfs support as it was determined not to be needed
- General code cleanup based on feedback
- General cleanup of patch subjects and descriptions
Changes since v4:
- Re-worked mapping of setup data to not use a fixed list. Rather, check
dynamically whether the requested early_memremap()/memremap() call
needs to be mapped decrypted.
- Moved SME cpu feature into scattered features
- Moved some declarations into header files
- Cleared the encryption mask from the __PHYSICAL_MASK so that users
of macros such as pmd_pfn_mask() don't have to worry/know about the
encryption mask
- Updated some return types and values related to EFI and e820 functions
so that an error could be returned
- During cpu shutdown, removed cache disabling and added a check for kexec
in progress to use wbinvd followed immediately by halt in order to avoid
any memory corruption
- Updated how persistent memory is identified
- Added a function to find command line arguments and their values
- Added sysfs support
- General code cleanup based on feedback
- General cleanup of patch subjects and descriptions
Changes since v3:
- Broke out some of the patches into smaller individual patches
- Updated Documentation
- Added a message to indicate why the IOMMU was disabled
- Updated CPU feature support for SME by taking into account whether
BIOS has enabled SME
- Eliminated redundant functions
- Added some warning messages for DMA usage of bounce buffers when SME
is active
- Added support for persistent memory
- Added support to determine when setup data is being mapped and be sure
to map it un-encrypted
- Added CONFIG support to set the default action of whether to activate
SME if it is supported/enabled
- Added support for (re)booting with kexec
Changes since v2:
- Updated Documentation
- Make the encryption mask available outside of arch/x86 through a
standard include file
- Conversion of assembler routines to C where possible (not everything
could be converted, e.g. the routine that does the actual encryption
needs to be copied into a safe location and it is difficult to
determine the actual length of the function in order to copy it)
- Fix SME feature use of scattered CPUID feature
- Creation of SME specific functions for things like encrypting
the setup data, ramdisk, etc.
- New take on early_memremap / memremap encryption support
- Additional support for accessing video buffers (fbdev/gpu) as
un-encrypted
- Disable IOMMU for now - need to investigate further in relation to
how it needs to be programmed relative to accessing physical memory
Changes since v1:
- Added Documentation.
- Removed AMD vendor check for setting the PAT write protect mode
- Updated naming of trampoline flag for SME as well as moving of the
SME check to before paging is enabled.
- Change to early_memremap to identify the data being mapped as either
boot data or kernel data. The idea being that boot data will have
been placed in memory as un-encrypted data and would need to be accessed
as such.
- Updated debugfs support for the bootparams to access the data properly.
- Do not set the SYSCFG[MEME] bit, only check it. The setting of the
MemEncryptionModeEn bit results in a reduction of physical address size
of the processor. It is possible that BIOS could have configured resources
into a range that will now not be addressable. To prevent this,
rely on BIOS to set the SYSCFG[MEME] bit and only then enable memory
encryption support in the kernel.
Tom Lendacky (34):
x86: Document AMD Secure Memory Encryption (SME)
x86/mm/pat: Set write-protect cache mode for full PAT support
x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap for RAM mappings
x86/CPU/AMD: Add the Secure Memory Encryption CPU feature
x86/CPU/AMD: Handle SME reduction in physical address size
x86/mm: Add Secure Memory Encryption (SME) support
x86/mm: Don't use phys_to_virt in ioremap() if SME is active
x86/mm: Add support to enable SME in early boot processing
x86/mm: Simplify p[gum]d_page() macros
x86, x86/mm, x86/xen, olpc: Use __va() against just the physical address in cr3
x86/mm: Provide general kernel support for memory encryption
x86/mm: Extend early_memremap() support with additional attrs
x86/mm: Add support for early encrypt/decrypt of memory
x86/mm: Insure that boot memory areas are mapped properly
x86/boot/e820: Add support to determine the E820 type of an address
efi: Add an EFI table address match function
efi: Update efi_mem_type() to return an error rather than 0
x86/efi: Update EFI pagetable creation to work with SME
x86/mm: Add support to access boot related data in the clear
x86, mpparse: Use memremap to map the mpf and mpc data
x86/mm: Add support to access persistent memory in the clear
x86/mm: Add support for changing the memory encryption attribute
x86, realmode: Decrypt trampoline area if memory encryption is active
x86, swiotlb: Add memory encryption support
swiotlb: Add warnings for use of bounce buffers with SME
iommu/amd: Allow the AMD IOMMU to work with memory encryption
x86, realmode: Check for memory encryption on the APs
x86, drm, fbdev: Do not specify encrypted memory for video mappings
kvm: x86: svm: Support Secure Memory Encryption within KVM
x86/mm, kexec: Allow kexec to be used with SME
x86/mm: Use proper encryption attributes with /dev/mem
x86/mm: Add support to encrypt the kernel in-place
x86/boot: Add early cmdline parsing for options with arguments
x86/mm: Add support to make use of Secure Memory Encryption
Documentation/admin-guide/kernel-parameters.txt | 11
Documentation/x86/amd-memory-encryption.txt | 68 ++
arch/ia64/kernel/efi.c | 4
arch/x86/Kconfig | 26 +
arch/x86/boot/compressed/pagetable.c | 7
arch/x86/include/asm/cmdline.h | 2
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/dma-mapping.h | 5
arch/x86/include/asm/dmi.h | 8
arch/x86/include/asm/e820/api.h | 2
arch/x86/include/asm/fixmap.h | 20 +
arch/x86/include/asm/init.h | 1
arch/x86/include/asm/io.h | 7
arch/x86/include/asm/kexec.h | 8
arch/x86/include/asm/kvm_host.h | 2
arch/x86/include/asm/mem_encrypt.h | 112 ++++
arch/x86/include/asm/msr-index.h | 2
arch/x86/include/asm/page_types.h | 2
arch/x86/include/asm/pgtable.h | 28 +
arch/x86/include/asm/pgtable_types.h | 54 +-
arch/x86/include/asm/processor.h | 3
arch/x86/include/asm/realmode.h | 12
arch/x86/include/asm/set_memory.h | 3
arch/x86/include/asm/special_insns.h | 9
arch/x86/include/asm/vga.h | 14
arch/x86/kernel/acpi/boot.c | 6
arch/x86/kernel/cpu/amd.c | 17 +
arch/x86/kernel/cpu/scattered.c | 1
arch/x86/kernel/e820.c | 26 +
arch/x86/kernel/espfix_64.c | 2
arch/x86/kernel/head64.c | 42 +
arch/x86/kernel/head_64.S | 80 ++-
arch/x86/kernel/kdebugfs.c | 34 -
arch/x86/kernel/ksysfs.c | 28 -
arch/x86/kernel/machine_kexec_64.c | 35 +
arch/x86/kernel/mpparse.c | 108 +++-
arch/x86/kernel/pci-dma.c | 11
arch/x86/kernel/pci-nommu.c | 2
arch/x86/kernel/pci-swiotlb.c | 15 -
arch/x86/kernel/process.c | 17 +
arch/x86/kernel/setup.c | 9
arch/x86/kvm/mmu.c | 12
arch/x86/kvm/mmu.h | 2
arch/x86/kvm/svm.c | 35 +
arch/x86/kvm/vmx.c | 3
arch/x86/kvm/x86.c | 3
arch/x86/lib/cmdline.c | 105 ++++
arch/x86/mm/Makefile | 3
arch/x86/mm/fault.c | 10
arch/x86/mm/ident_map.c | 12
arch/x86/mm/ioremap.c | 277 +++++++++-
arch/x86/mm/kasan_init_64.c | 4
arch/x86/mm/mem_encrypt.c | 667 +++++++++++++++++++++++
arch/x86/mm/mem_encrypt_boot.S | 150 +++++
arch/x86/mm/pageattr.c | 67 ++
arch/x86/mm/pat.c | 9
arch/x86/pci/common.c | 4
arch/x86/platform/efi/efi.c | 6
arch/x86/platform/efi/efi_64.c | 15 -
arch/x86/platform/olpc/olpc-xo1-pm.c | 2
arch/x86/power/hibernate_64.c | 2
arch/x86/realmode/init.c | 15 +
arch/x86/realmode/rm/trampoline_64.S | 24 +
arch/x86/xen/mmu_pv.c | 6
drivers/firmware/dmi-sysfs.c | 5
drivers/firmware/efi/efi.c | 33 +
drivers/firmware/pcdp.c | 4
drivers/gpu/drm/drm_gem.c | 2
drivers/gpu/drm/drm_vm.c | 4
drivers/gpu/drm/ttm/ttm_bo_vm.c | 7
drivers/gpu/drm/udl/udl_fb.c | 4
drivers/iommu/amd_iommu.c | 36 +
drivers/iommu/amd_iommu_init.c | 18 -
drivers/iommu/amd_iommu_proto.h | 10
drivers/iommu/amd_iommu_types.h | 2
drivers/sfi/sfi_core.c | 22 -
drivers/video/fbdev/core/fbmem.c | 12
include/asm-generic/early_ioremap.h | 2
include/asm-generic/mem_encrypt.h | 45 ++
include/asm-generic/pgtable.h | 8
include/linux/dma-mapping.h | 9
include/linux/efi.h | 9
include/linux/io.h | 2
include/linux/kexec.h | 14
include/linux/mem_encrypt.h | 18 +
include/linux/swiotlb.h | 1
init/main.c | 13
kernel/kexec_core.c | 6
kernel/memremap.c | 20 +
lib/swiotlb.c | 59 ++
mm/early_ioremap.c | 30 +
91 files changed, 2411 insertions(+), 261 deletions(-)
create mode 100644 Documentation/x86/amd-memory-encryption.txt
create mode 100644 arch/x86/include/asm/mem_encrypt.h
create mode 100644 arch/x86/mm/mem_encrypt.c
create mode 100644 arch/x86/mm/mem_encrypt_boot.S
create mode 100644 include/asm-generic/mem_encrypt.h
create mode 100644 include/linux/mem_encrypt.h
--
Tom Lendacky
Create a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature and add documentation for the mem_encrypt=
kernel parameter.
Reviewed-by: Borislav Petkov <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 11 ++++
Documentation/x86/amd-memory-encryption.txt | 68 +++++++++++++++++++++++
2 files changed, 79 insertions(+)
create mode 100644 Documentation/x86/amd-memory-encryption.txt
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4e4c340..abb65da 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2188,6 +2188,17 @@
memory contents and reserves bad memory
regions that are detected.
+ mem_encrypt= [X86-64] AMD Secure Memory Encryption (SME) control
+ Valid arguments: on, off
+ Default (depends on kernel configuration option):
+ on (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
+ off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
+ mem_encrypt=on: Activate SME
+ mem_encrypt=off: Do not activate SME
+
+ Refer to Documentation/x86/amd-memory-encryption.txt
+ for details on when memory encryption can be activated.
+
mem_sleep_default= [SUSPEND] Default system suspend mode:
s2idle - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 0000000..f512ab7
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,68 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables. A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM. SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below on how to determine its position). The encryption bit can also be
+specified in the cr3 register, allowing the PGD table to be encrypted. Each
+successive level of page tables can also be encrypted by setting the encryption
+bit in the page table entry that points to the next table. This allows the full
+page table hierarchy to be encrypted. Note that just because the encryption
+bit is set in cr3 doesn't imply that the full hierarchy is encrypted.
+Each page table entry in the hierarchy needs to have the encryption bit set to
+achieve that. So, theoretically, you could have the encryption bit set in cr3
+so that the PGD is encrypted, but not set the encryption bit in the PGD entry
+for a PUD, which results in the PUD pointed to by that entry not being
+encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+function 0x8000001f reports information related to SME:
+
+ 0x8000001f[eax]:
+ Bit[0] indicates support for SME
+ 0x8000001f[ebx]:
+ Bits[5:0] pagetable bit number used to activate memory
+ encryption
+ Bits[11:6] reduction in physical address space, in bits, when
+ memory encryption is enabled (this only affects
+ system physical addresses, not guest physical
+ addresses)
+
+If support for SME is present, MSR 0xc0010010 (MSR_K8_SYSCFG) can be used to
+determine if SME is enabled and/or to enable memory encryption:
+
+ 0xc0010010:
+ Bit[23] 0 = memory encryption features are disabled
+ 1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system. If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+The state of SME in the Linux kernel can be documented as follows:
+ - Supported:
+ The CPU supports SME (determined through CPUID instruction).
+
+ - Enabled:
+ Supported and bit 23 of MSR_K8_SYSCFG is set.
+
+ - Active:
+ Supported, Enabled and the Linux kernel is actively applying
+ the encryption bit to page table entries (the SME mask in the
+ kernel is non-zero).
+
+SME can also be enabled and activated in the BIOS. If SME is enabled and
+activated in the BIOS, then all memory accesses will be encrypted and it will
+not be necessary to activate the Linux memory encryption support. If the BIOS
+merely enables SME (sets bit 23 of the MSR_K8_SYSCFG), then Linux can activate
+memory encryption by default (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y) or
+by supplying mem_encrypt=on on the kernel command line. However, if BIOS does
+not enable SME, then Linux will not be able to activate memory encryption, even
+if configured to do so by default or the mem_encrypt=on command line parameter
+is specified.
For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/mm/pat.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 9b78685..6753d9c 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -295,7 +295,7 @@ static void init_cache_modes(void)
* pat_init - Initialize PAT MSR and PAT table
*
* This function initializes PAT MSR and PAT table with an OS-defined value
- * to enable additional cache attributes, WC and WT.
+ * to enable additional cache attributes, WC, WT and WP.
*
* This function must be called on all CPUs using the specific sequence of
* operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
@@ -356,7 +356,7 @@ void pat_init(void)
* 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
* 011 3 UC : _PAGE_CACHE_MODE_UC
* 100 4 WB : Reserved
- * 101 5 WC : Reserved
+ * 101 5 WP : _PAGE_CACHE_MODE_WP
* 110 6 UC-: Reserved
* 111 7 WT : _PAGE_CACHE_MODE_WT
*
@@ -364,7 +364,7 @@ void pat_init(void)
* corresponding types in the presence of PAT errata.
*/
pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+ PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
}
if (!boot_cpu_done) {
The ioremap() function is intended for mapping MMIO. For RAM, the
memremap() function should be used. Convert calls from ioremap() to
memremap() when re-mapping RAM.
This will be used later by SME to control how the encryption mask is
applied to memory mappings, with certain memory locations being mapped
decrypted vs encrypted.
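For reference, the resulting pattern for RAM-backed data looks roughly like
the sketch below (hypothetical helper, kernel context assumed; the actual
conversions are in the hunks that follow):

/* Sketch only: map a setup_data entry that lives in RAM. */
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/printk.h>
#include <linux/types.h>
#include <asm/bootparam.h>

static int example_show_setup_data(u64 pa_data)
{
	struct setup_data *data;

	data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
	if (!data)
		return -ENOMEM;

	pr_info("setup_data: type 0x%x, len %u\n", data->type, data->len);

	memunmap(data);

	return 0;
}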
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/dmi.h | 8 ++++----
arch/x86/kernel/acpi/boot.c | 6 +++---
arch/x86/kernel/kdebugfs.c | 34 +++++++++++-----------------------
arch/x86/kernel/ksysfs.c | 28 ++++++++++++++--------------
arch/x86/kernel/mpparse.c | 10 +++++-----
arch/x86/pci/common.c | 4 ++--
drivers/firmware/dmi-sysfs.c | 5 +++--
drivers/firmware/pcdp.c | 4 ++--
drivers/sfi/sfi_core.c | 22 +++++++++++-----------
9 files changed, 55 insertions(+), 66 deletions(-)
diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
index 3c69fed..a8e15b0 100644
--- a/arch/x86/include/asm/dmi.h
+++ b/arch/x86/include/asm/dmi.h
@@ -13,9 +13,9 @@ static __always_inline __init void *dmi_alloc(unsigned len)
}
/* Use early IO mappings for DMI because it's initialized early */
-#define dmi_early_remap early_ioremap
-#define dmi_early_unmap early_iounmap
-#define dmi_remap ioremap_cache
-#define dmi_unmap iounmap
+#define dmi_early_remap early_memremap
+#define dmi_early_unmap early_memunmap
+#define dmi_remap(_x, _l) memremap(_x, _l, MEMREMAP_WB)
+#define dmi_unmap(_x) memunmap(_x)
#endif /* _ASM_X86_DMI_H */
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 6bb6806..850160a 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -115,7 +115,7 @@
#define ACPI_INVALID_GSI INT_MIN
/*
- * This is just a simple wrapper around early_ioremap(),
+ * This is just a simple wrapper around early_memremap(),
* with sanity checks for phys == 0 and size == 0.
*/
char *__init __acpi_map_table(unsigned long phys, unsigned long size)
@@ -124,7 +124,7 @@ char *__init __acpi_map_table(unsigned long phys, unsigned long size)
if (!phys || !size)
return NULL;
- return early_ioremap(phys, size);
+ return early_memremap(phys, size);
}
void __init __acpi_unmap_table(char *map, unsigned long size)
@@ -132,7 +132,7 @@ void __init __acpi_unmap_table(char *map, unsigned long size)
if (!map || !size)
return;
- early_iounmap(map, size);
+ early_memunmap(map, size);
}
#ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index 38b6458..fd6f8fb 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -33,7 +33,6 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
struct setup_data_node *node = file->private_data;
unsigned long remain;
loff_t pos = *ppos;
- struct page *pg;
void *p;
u64 pa;
@@ -47,18 +46,13 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
count = node->len - pos;
pa = node->paddr + sizeof(struct setup_data) + pos;
- pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
- if (PageHighMem(pg)) {
- p = ioremap_cache(pa, count);
- if (!p)
- return -ENXIO;
- } else
- p = __va(pa);
+ p = memremap(pa, count, MEMREMAP_WB);
+ if (!p)
+ return -ENOMEM;
remain = copy_to_user(user_buf, p, count);
- if (PageHighMem(pg))
- iounmap(p);
+ memunmap(p);
if (remain)
return -EFAULT;
@@ -109,7 +103,6 @@ static int __init create_setup_data_nodes(struct dentry *parent)
struct setup_data *data;
int error;
struct dentry *d;
- struct page *pg;
u64 pa_data;
int no = 0;
@@ -126,16 +119,12 @@ static int __init create_setup_data_nodes(struct dentry *parent)
goto err_dir;
}
- pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
- if (PageHighMem(pg)) {
- data = ioremap_cache(pa_data, sizeof(*data));
- if (!data) {
- kfree(node);
- error = -ENXIO;
- goto err_dir;
- }
- } else
- data = __va(pa_data);
+ data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
+ if (!data) {
+ kfree(node);
+ error = -ENOMEM;
+ goto err_dir;
+ }
node->paddr = pa_data;
node->type = data->type;
@@ -143,8 +132,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
error = create_setup_data_node(d, no, node);
pa_data = data->next;
- if (PageHighMem(pg))
- iounmap(data);
+ memunmap(data);
if (error)
goto err_dir;
no++;
diff --git a/arch/x86/kernel/ksysfs.c b/arch/x86/kernel/ksysfs.c
index 4afc67f..ee51db9 100644
--- a/arch/x86/kernel/ksysfs.c
+++ b/arch/x86/kernel/ksysfs.c
@@ -16,8 +16,8 @@
#include <linux/stat.h>
#include <linux/slab.h>
#include <linux/mm.h>
+#include <linux/io.h>
-#include <asm/io.h>
#include <asm/setup.h>
static ssize_t version_show(struct kobject *kobj,
@@ -79,12 +79,12 @@ static int get_setup_data_paddr(int nr, u64 *paddr)
*paddr = pa_data;
return 0;
}
- data = ioremap_cache(pa_data, sizeof(*data));
+ data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
if (!data)
return -ENOMEM;
pa_data = data->next;
- iounmap(data);
+ memunmap(data);
i++;
}
return -EINVAL;
@@ -97,17 +97,17 @@ static int __init get_setup_data_size(int nr, size_t *size)
u64 pa_data = boot_params.hdr.setup_data;
while (pa_data) {
- data = ioremap_cache(pa_data, sizeof(*data));
+ data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
if (!data)
return -ENOMEM;
if (nr == i) {
*size = data->len;
- iounmap(data);
+ memunmap(data);
return 0;
}
pa_data = data->next;
- iounmap(data);
+ memunmap(data);
i++;
}
return -EINVAL;
@@ -127,12 +127,12 @@ static ssize_t type_show(struct kobject *kobj,
ret = get_setup_data_paddr(nr, &paddr);
if (ret)
return ret;
- data = ioremap_cache(paddr, sizeof(*data));
+ data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
if (!data)
return -ENOMEM;
ret = sprintf(buf, "0x%x\n", data->type);
- iounmap(data);
+ memunmap(data);
return ret;
}
@@ -154,7 +154,7 @@ static ssize_t setup_data_data_read(struct file *fp,
ret = get_setup_data_paddr(nr, &paddr);
if (ret)
return ret;
- data = ioremap_cache(paddr, sizeof(*data));
+ data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
if (!data)
return -ENOMEM;
@@ -170,15 +170,15 @@ static ssize_t setup_data_data_read(struct file *fp,
goto out;
ret = count;
- p = ioremap_cache(paddr + sizeof(*data), data->len);
+ p = memremap(paddr + sizeof(*data), data->len, MEMREMAP_WB);
if (!p) {
ret = -ENOMEM;
goto out;
}
memcpy(buf, p + off, count);
- iounmap(p);
+ memunmap(p);
out:
- iounmap(data);
+ memunmap(data);
return ret;
}
@@ -250,13 +250,13 @@ static int __init get_setup_data_total_num(u64 pa_data, int *nr)
*nr = 0;
while (pa_data) {
*nr += 1;
- data = ioremap_cache(pa_data, sizeof(*data));
+ data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
if (!data) {
ret = -ENOMEM;
goto out;
}
pa_data = data->next;
- iounmap(data);
+ memunmap(data);
}
out:
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 0d904d7..fd37f39 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -436,9 +436,9 @@ static unsigned long __init get_mpc_size(unsigned long physptr)
struct mpc_table *mpc;
unsigned long size;
- mpc = early_ioremap(physptr, PAGE_SIZE);
+ mpc = early_memremap(physptr, PAGE_SIZE);
size = mpc->length;
- early_iounmap(mpc, PAGE_SIZE);
+ early_memunmap(mpc, PAGE_SIZE);
apic_printk(APIC_VERBOSE, " mpc: %lx-%lx\n", physptr, physptr + size);
return size;
@@ -450,7 +450,7 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
unsigned long size;
size = get_mpc_size(mpf->physptr);
- mpc = early_ioremap(mpf->physptr, size);
+ mpc = early_memremap(mpf->physptr, size);
/*
* Read the physical hardware table. Anything here will
* override the defaults.
@@ -461,10 +461,10 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
#endif
pr_err("BIOS bug, MP table errors detected!...\n");
pr_cont("... disabling SMP support. (tell your hw vendor)\n");
- early_iounmap(mpc, size);
+ early_memunmap(mpc, size);
return -1;
}
- early_iounmap(mpc, size);
+ early_memunmap(mpc, size);
if (early)
return -1;
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 190e718..08cf71c 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -691,7 +691,7 @@ int pcibios_add_device(struct pci_dev *dev)
pa_data = boot_params.hdr.setup_data;
while (pa_data) {
- data = ioremap(pa_data, sizeof(*rom));
+ data = memremap(pa_data, sizeof(*rom), MEMREMAP_WB);
if (!data)
return -ENOMEM;
@@ -710,7 +710,7 @@ int pcibios_add_device(struct pci_dev *dev)
}
}
pa_data = data->next;
- iounmap(data);
+ memunmap(data);
}
set_dma_domain_ops(dev);
set_dev_domain_options(dev);
diff --git a/drivers/firmware/dmi-sysfs.c b/drivers/firmware/dmi-sysfs.c
index ef76e5e..d5de6ee 100644
--- a/drivers/firmware/dmi-sysfs.c
+++ b/drivers/firmware/dmi-sysfs.c
@@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <linux/list.h>
#include <linux/io.h>
+#include <asm/dmi.h>
#define MAX_ENTRY_TYPE 255 /* Most of these aren't used, but we consider
the top entry type is only 8 bits */
@@ -380,7 +381,7 @@ static ssize_t dmi_sel_raw_read_phys32(struct dmi_sysfs_entry *entry,
u8 __iomem *mapped;
ssize_t wrote = 0;
- mapped = ioremap(sel->access_method_address, sel->area_length);
+ mapped = dmi_remap(sel->access_method_address, sel->area_length);
if (!mapped)
return -EIO;
@@ -390,7 +391,7 @@ static ssize_t dmi_sel_raw_read_phys32(struct dmi_sysfs_entry *entry,
wrote++;
}
- iounmap(mapped);
+ dmi_unmap(mapped);
return wrote;
}
diff --git a/drivers/firmware/pcdp.c b/drivers/firmware/pcdp.c
index 75273a25..e83d6ae 100644
--- a/drivers/firmware/pcdp.c
+++ b/drivers/firmware/pcdp.c
@@ -95,7 +95,7 @@
if (efi.hcdp == EFI_INVALID_TABLE_ADDR)
return -ENODEV;
- pcdp = early_ioremap(efi.hcdp, 4096);
+ pcdp = early_memremap(efi.hcdp, 4096);
printk(KERN_INFO "PCDP: v%d at 0x%lx\n", pcdp->rev, efi.hcdp);
if (strstr(cmdline, "console=hcdp")) {
@@ -131,6 +131,6 @@
}
out:
- early_iounmap(pcdp, 4096);
+ early_memunmap(pcdp, 4096);
return rc;
}
diff --git a/drivers/sfi/sfi_core.c b/drivers/sfi/sfi_core.c
index 296db7a..d5ce534 100644
--- a/drivers/sfi/sfi_core.c
+++ b/drivers/sfi/sfi_core.c
@@ -86,13 +86,13 @@
/*
* FW creates and saves the SFI tables in memory. When these tables get
* used, they may need to be mapped to virtual address space, and the mapping
- * can happen before or after the ioremap() is ready, so a flag is needed
+ * can happen before or after the memremap() is ready, so a flag is needed
* to indicating this
*/
-static u32 sfi_use_ioremap __read_mostly;
+static u32 sfi_use_memremap __read_mostly;
/*
- * sfi_un/map_memory calls early_ioremap/iounmap which is a __init function
+ * sfi_un/map_memory calls early_memremap/memunmap which is a __init function
* and introduces section mismatch. So use __ref to make it calm.
*/
static void __iomem * __ref sfi_map_memory(u64 phys, u32 size)
@@ -100,10 +100,10 @@ static void __iomem * __ref sfi_map_memory(u64 phys, u32 size)
if (!phys || !size)
return NULL;
- if (sfi_use_ioremap)
- return ioremap_cache(phys, size);
+ if (sfi_use_memremap)
+ return memremap(phys, size, MEMREMAP_WB);
else
- return early_ioremap(phys, size);
+ return early_memremap(phys, size);
}
static void __ref sfi_unmap_memory(void __iomem *virt, u32 size)
@@ -111,10 +111,10 @@ static void __ref sfi_unmap_memory(void __iomem *virt, u32 size)
if (!virt || !size)
return;
- if (sfi_use_ioremap)
- iounmap(virt);
+ if (sfi_use_memremap)
+ memunmap(virt);
else
- early_iounmap(virt, size);
+ early_memunmap(virt, size);
}
static void sfi_print_table_header(unsigned long long pa,
@@ -507,8 +507,8 @@ void __init sfi_init_late(void)
length = syst_va->header.len;
sfi_unmap_memory(syst_va, sizeof(struct sfi_table_simple));
- /* Use ioremap now after it is ready */
- sfi_use_ioremap = 1;
+ /* Use memremap now after it is ready */
+ sfi_use_memremap = 1;
syst_va = sfi_map_memory(syst_pa, length);
sfi_acpi_init();
Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.
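As an assumed usage sketch (not a hunk from this series) of how later
patches and, eventually, drivers are expected to consume these definitions:

#include <linux/printk.h>
#include <linux/mem_encrypt.h>

/* Hypothetical helper: fold the encryption mask into a page table value. */
static unsigned long example_encrypted_entry(unsigned long pa,
					     unsigned long prot)
{
	return pa | prot | sme_me_mask;
}

static void example_report_sme(void)
{
	if (sme_active())
		pr_info("SME active, encryption mask 0x%lx\n", sme_me_mask);
	else
		pr_info("SME not active\n");
}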
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/Kconfig | 22 ++++++++++++++++++++++
arch/x86/include/asm/mem_encrypt.h | 35 +++++++++++++++++++++++++++++++++++
arch/x86/mm/Makefile | 1 +
arch/x86/mm/mem_encrypt.c | 21 +++++++++++++++++++++
include/asm-generic/mem_encrypt.h | 27 +++++++++++++++++++++++++++
include/linux/mem_encrypt.h | 18 ++++++++++++++++++
6 files changed, 124 insertions(+)
create mode 100644 arch/x86/include/asm/mem_encrypt.h
create mode 100644 arch/x86/mm/mem_encrypt.c
create mode 100644 include/asm-generic/mem_encrypt.h
create mode 100644 include/linux/mem_encrypt.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc..11f2fdb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1407,6 +1407,28 @@ config X86_DIRECT_GBPAGES
supports them), so don't confuse the user by printing
that we have them enabled.
+config AMD_MEM_ENCRYPT
+ bool "AMD Secure Memory Encryption (SME) support"
+ depends on X86_64 && CPU_SUP_AMD
+ ---help---
+ Say yes to enable support for the encryption of system memory.
+ This requires an AMD processor that supports Secure Memory
+ Encryption (SME).
+
+config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
+ bool "Activate AMD Secure Memory Encryption (SME) by default"
+ default y
+ depends on AMD_MEM_ENCRYPT
+ ---help---
+ Say yes to have system memory encrypted by default if running on
+ an AMD processor that supports Secure Memory Encryption (SME).
+
+ If set to Y, then the encryption of system memory can be
+ deactivated with the mem_encrypt=off command line option.
+
+ If set to N, then the encryption of system memory can be
+ activated with the mem_encrypt=on command line option.
+
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
new file mode 100644
index 0000000..5008fd9
--- /dev/null
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -0,0 +1,35 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __X86_MEM_ENCRYPT_H__
+#define __X86_MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern unsigned long sme_me_mask;
+
+#else /* !CONFIG_AMD_MEM_ENCRYPT */
+
+#define sme_me_mask 0UL
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+static inline bool sme_active(void)
+{
+ return !!sme_me_mask;
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 96d2b84..44d4d21 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
new file mode 100644
index 0000000..b99d469
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt.c
@@ -0,0 +1,21 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+/*
+ * Since SME related variables are set early in the boot process they must
+ * reside in the .data section so as not to be zeroed out when the .bss
+ * section is later cleared.
+ */
+unsigned long sme_me_mask __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sme_me_mask);
diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
new file mode 100644
index 0000000..563c918
--- /dev/null
+++ b/include/asm-generic/mem_encrypt.h
@@ -0,0 +1,27 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_GENERIC_MEM_ENCRYPT_H__
+#define __ASM_GENERIC_MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#define sme_me_mask 0UL
+
+static inline bool sme_active(void)
+{
+ return false;
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_GENERIC_MEM_ENCRYPT_H__ */
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
new file mode 100644
index 0000000..1d8e063
--- /dev/null
+++ b/include/linux/mem_encrypt.h
@@ -0,0 +1,18 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __MEM_ENCRYPT_H__
+#define __MEM_ENCRYPT_H__
+
+#include <asm/mem_encrypt.h>
+
+#endif /* __MEM_ENCRYPT_H__ */
When Secure Memory Encryption (SME) is enabled, the physical address
space is reduced. Adjust the x86_phys_bits value to reflect this
reduction.
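For example (hypothetical numbers), on a processor reporting 48 physical
address bits, a CPUID 0x8000001f[ebx] bits 11:6 value of 5 would reduce
x86_phys_bits from 48 to 43 once BIOS has enabled SME.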
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c47ceee..5bdcbd4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -613,15 +613,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
set_cpu_bug(c, X86_BUG_AMD_E400);
/*
- * BIOS support is required for SME. If BIOS has not enabled SME
- * then don't advertise the feature (set in scattered.c)
+ * BIOS support is required for SME. If BIOS has enabled SME then
+ * adjust x86_phys_bits by the SME physical address space reduction
+ * value. If BIOS has not enabled SME then don't advertise the
+ * feature (set in scattered.c).
*/
if (cpu_has(c, X86_FEATURE_SME)) {
u64 msr;
/* Check if SME is enabled */
rdmsrl(MSR_K8_SYSCFG, msr);
- if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
+ c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
+ else
clear_cpu_cap(c, X86_FEATURE_SME);
}
}
Add support to the early boot code to use Secure Memory Encryption (SME).
Since the kernel has been loaded into memory in a decrypted state, encrypt
the kernel in place and update the early pagetables with the memory
encryption mask so that new pagetable entries will use memory encryption.
The routines to set the encryption mask and perform the encryption are
stub routines for now with functionality to be added in a later patch.
Because of the need to have the routines available to head_64.S,
mem_encrypt.c is always built, and #ifdefs in mem_encrypt.c provide
functionality or stub routines depending on CONFIG_AMD_MEM_ENCRYPT.
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kernel/head_64.S | 61 ++++++++++++++++++++++++++++++++++++++++++++-
arch/x86/mm/Makefile | 4 +--
arch/x86/mm/mem_encrypt.c | 26 +++++++++++++++++++
3 files changed, 86 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ac9d327..222630c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -91,6 +91,23 @@ startup_64:
jnz bad_address
/*
+ * Activate Secure Memory Encryption (SME), if supported and enabled.
+ * The real_mode_data address is in %rsi and that register can be
+ * clobbered by the called function so be sure to save it.
+ * Save the returned mask in %r12 for later use.
+ */
+ push %rsi
+ call sme_enable
+ pop %rsi
+ movq %rax, %r12
+
+ /*
+ * Add the memory encryption mask to %rbp to include it in the page
+ * table fixups.
+ */
+ addq %r12, %rbp
+
+ /*
* Fixup the physical addresses in the page table
*/
addq %rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
@@ -113,6 +130,7 @@ startup_64:
shrq $PGDIR_SHIFT, %rax
leaq (PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
+ addq %r12, %rdx
movq %rdx, 0(%rbx,%rax,8)
movq %rdx, 8(%rbx,%rax,8)
@@ -129,6 +147,7 @@ startup_64:
movq %rdi, %rax
shrq $PMD_SHIFT, %rdi
addq $(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+ addq %r12, %rax
leaq (_end - 1)(%rip), %rcx
shrq $PMD_SHIFT, %rcx
subq %rdi, %rcx
@@ -142,6 +161,12 @@ startup_64:
decl %ecx
jnz 1b
+ /*
+ * Determine if any fixups are required. This includes fixups
+ * based on where the kernel was loaded and whether SME is
+ * active. If %rbp is zero, then we can skip both the fixups
+ * and the call to encrypt the kernel.
+ */
test %rbp, %rbp
jz .Lskip_fixup
@@ -162,11 +187,30 @@ startup_64:
cmp %r8, %rdi
jne 1b
- /* Fixup phys_base */
+ /*
+ * Fixup phys_base - remove the memory encryption mask from %rbp
+ * to obtain the true physical address.
+ */
+ subq %r12, %rbp
addq %rbp, phys_base(%rip)
+ /*
+ * Encrypt the kernel if SME is active.
+ * The real_mode_data address is in %rsi and that register can be
+ * clobbered by the called function so be sure to save it.
+ */
+ push %rsi
+ call sme_encrypt_kernel
+ pop %rsi
+
.Lskip_fixup:
+ /*
+ * The encryption mask is in %r12. We ADD this to %rax to be sure
+ * that the encryption mask is part of the value that will be
+ * stored in %cr3.
+ */
movq $(early_level4_pgt - __START_KERNEL_map), %rax
+ addq %r12, %rax
jmp 1f
ENTRY(secondary_startup_64)
/*
@@ -186,7 +230,20 @@ ENTRY(secondary_startup_64)
/* Sanitize CPU configuration */
call verify_cpu
- movq $(init_level4_pgt - __START_KERNEL_map), %rax
+ /*
+ * Get the SME encryption mask.
+ * The encryption mask will be returned in %rax so we do an ADD
+ * below to be sure that the encryption mask is part of the
+ * value that will be stored in %cr3.
+ *
+ * The real_mode_data address is in %rsi and that register can be
+ * clobbered by the called function so be sure to save it.
+ */
+ push %rsi
+ call sme_get_me_mask
+ pop %rsi
+
+ addq $(init_level4_pgt - __START_KERNEL_map), %rax
1:
/* Enable PAE mode and PGE */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 44d4d21..88ee454 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -2,7 +2,7 @@
KCOV_INSTRUMENT_tlb.o := n
obj-y := init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
- pat.o pgtable.o physaddr.o gup.o setup_nx.o tlb.o
+ pat.o pgtable.o physaddr.o gup.o setup_nx.o tlb.o mem_encrypt.o
# Make sure __phys_addr has no stackprotector
nostackp := $(call cc-option, -fno-stack-protector)
@@ -38,5 +38,3 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
-
-obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index b99d469..cc00d8b 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -11,6 +11,9 @@
*/
#include <linux/linkage.h>
+#include <linux/init.h>
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
/*
* Since SME related variables are set early in the boot process they must
@@ -19,3 +22,26 @@
*/
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);
+
+void __init sme_encrypt_kernel(void)
+{
+}
+
+unsigned long __init sme_enable(void)
+{
+ return sme_me_mask;
+}
+
+unsigned long sme_get_me_mask(void)
+{
+ return sme_me_mask;
+}
+
+#else /* !CONFIG_AMD_MEM_ENCRYPT */
+
+void __init sme_encrypt_kernel(void) { }
+unsigned long __init sme_enable(void) { return 0; }
+
+unsigned long sme_get_me_mask(void) { return 0; }
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
The cr3 register entry can contain the SME encryption bit that indicates
the PGD is encrypted. The encryption bit should not be used when creating
a virtual address for the PGD table.
Create a new function, read_cr3_pa(), that will extract the physical
address from the cr3 register. This function is then used where a virtual
address of the PGD needs to be created/used from the cr3 register.
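For example (hypothetical values), with the encryption bit at position 47
and the encryption mask cleared from __PHYSICAL_MASK as done elsewhere in
this series, a cr3 value of 0x800000345000 yields 0x345000 from
read_cr3_pa(), the true physical address of the PGD.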
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/special_insns.h | 9 +++++++++
arch/x86/kernel/head64.c | 2 +-
arch/x86/mm/fault.c | 10 +++++-----
arch/x86/mm/ioremap.c | 2 +-
arch/x86/platform/olpc/olpc-xo1-pm.c | 2 +-
arch/x86/power/hibernate_64.c | 2 +-
arch/x86/xen/mmu_pv.c | 6 +++---
7 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 12af3e3..d8e8ace 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -234,6 +234,15 @@ static inline void clwb(volatile void *__p)
#define nop() asm volatile ("nop")
+static inline unsigned long native_read_cr3_pa(void)
+{
+ return (native_read_cr3() & PHYSICAL_PAGE_MASK);
+}
+
+static inline unsigned long read_cr3_pa(void)
+{
+ return (read_cr3() & PHYSICAL_PAGE_MASK);
+}
#endif /* __KERNEL__ */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 43b7002..dc03624 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -55,7 +55,7 @@ int __init early_make_pgtable(unsigned long address)
pmdval_t pmd, *pmd_p;
/* Invalid address or early pgt is done ? */
- if (physaddr >= MAXMEM || read_cr3() != __pa_nodebug(early_level4_pgt))
+ if (physaddr >= MAXMEM || read_cr3_pa() != __pa_nodebug(early_level4_pgt))
return -1;
again:
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8ad91a0..2a1fa10c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -346,7 +346,7 @@ static noinline int vmalloc_fault(unsigned long address)
* Do _not_ use "current" here. We might be inside
* an interrupt in the middle of a task switch..
*/
- pgd_paddr = read_cr3();
+ pgd_paddr = read_cr3_pa();
pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
if (!pmd_k)
return -1;
@@ -388,7 +388,7 @@ static bool low_pfn(unsigned long pfn)
static void dump_pagetable(unsigned long address)
{
- pgd_t *base = __va(read_cr3());
+ pgd_t *base = __va(read_cr3_pa());
pgd_t *pgd = &base[pgd_index(address)];
p4d_t *p4d;
pud_t *pud;
@@ -451,7 +451,7 @@ static noinline int vmalloc_fault(unsigned long address)
* happen within a race in page table update. In the later
* case just flush:
*/
- pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);
+ pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
pgd_ref = pgd_offset_k(address);
if (pgd_none(*pgd_ref))
return -1;
@@ -555,7 +555,7 @@ static int bad_address(void *p)
static void dump_pagetable(unsigned long address)
{
- pgd_t *base = __va(read_cr3() & PHYSICAL_PAGE_MASK);
+ pgd_t *base = __va(read_cr3_pa());
pgd_t *pgd = base + pgd_index(address);
p4d_t *p4d;
pud_t *pud;
@@ -700,7 +700,7 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
pgd_t *pgd;
pte_t *pte;
- pgd = __va(read_cr3() & PHYSICAL_PAGE_MASK);
+ pgd = __va(read_cr3_pa());
pgd += pgd_index(address);
pte = lookup_address_in_pgd(pgd, address, &level);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 2a0fa89..e6305dd 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -427,7 +427,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
{
/* Don't assume we're using swapper_pg_dir at this point */
- pgd_t *base = __va(read_cr3());
+ pgd_t *base = __va(read_cr3_pa());
pgd_t *pgd = &base[pgd_index(addr)];
p4d_t *p4d = p4d_offset(pgd, addr);
pud_t *pud = pud_offset(p4d, addr);
diff --git a/arch/x86/platform/olpc/olpc-xo1-pm.c b/arch/x86/platform/olpc/olpc-xo1-pm.c
index c5350fd..0668aaf 100644
--- a/arch/x86/platform/olpc/olpc-xo1-pm.c
+++ b/arch/x86/platform/olpc/olpc-xo1-pm.c
@@ -77,7 +77,7 @@ static int xo1_power_state_enter(suspend_state_t pm_state)
asmlinkage __visible int xo1_do_sleep(u8 sleep_state)
{
- void *pgd_addr = __va(read_cr3());
+ void *pgd_addr = __va(read_cr3_pa());
/* Program wakeup mask (using dword access to CS5536_PM1_EN) */
outl(wakeup_mask << 16, acpi_base + CS5536_PM1_STS);
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index a6e21fe..0a7650d 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -150,7 +150,7 @@ static int relocate_restore_code(void)
memcpy((void *)relocated_restore_code, &core_restore_code, PAGE_SIZE);
/* Make the page containing the relocated code executable */
- pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
+ pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(relocated_restore_code);
p4d = p4d_offset(pgd, relocated_restore_code);
if (p4d_large(*p4d)) {
set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 1f386d7..2dc5243 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2022,7 +2022,7 @@ static phys_addr_t __init xen_early_virt_to_phys(unsigned long vaddr)
pmd_t pmd;
pte_t pte;
- pa = read_cr3();
+ pa = read_cr3_pa();
pgd = native_make_pgd(xen_read_phys_ulong(pa + pgd_index(vaddr) *
sizeof(pgd)));
if (!pgd_present(pgd))
@@ -2102,7 +2102,7 @@ void __init xen_relocate_p2m(void)
pt_phys = pmd_phys + PFN_PHYS(n_pmd);
p2m_pfn = PFN_DOWN(pt_phys) + n_pt;
- pgd = __va(read_cr3());
+ pgd = __va(read_cr3_pa());
new_p2m = (unsigned long *)(2 * PGDIR_SIZE);
idx_p4d = 0;
save_pud = n_pud;
@@ -2209,7 +2209,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
{
unsigned long pfn = PFN_DOWN(__pa(swapper_pg_dir));
- BUG_ON(read_cr3() != __pa(initial_page_table));
+ BUG_ON(read_cr3_pa() != __pa(initial_page_table));
BUG_ON(cr3 != __pa(swapper_pg_dir));
/*
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros. Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active. Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.
The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.
Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask. These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based on
the physical address without the encryption mask, thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.
Also, an early initialization function is added for SME. If SME is active,
this function (a rough sketch follows the list):
- Updates the early_pmd_flags so that early page faults create mappings
with the encryption mask.
- Updates the __supported_pte_mask to include the encryption mask.
- Updates the protection_map entries to include the encryption mask so
that user-space allocations will automatically have the encryption mask
applied.
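A rough sketch of what such an early initialization function looks like,
assuming the helpers introduced by this patch (not necessarily the exact
hunk added to arch/x86/mm/mem_encrypt.c):

#include <linux/init.h>
#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/mem_encrypt.h>

void __init sme_early_init(void)
{
	unsigned int i;

	if (!sme_me_mask)
		return;

	/* Early page faults must create encrypted mappings. */
	early_pmd_flags = __sme_set(early_pmd_flags);

	/* Allow the encryption mask to be retained in pte values. */
	__supported_pte_mask = __sme_set(__supported_pte_mask);

	/* Userspace mappings pick up the encryption mask automatically. */
	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
		protection_map[i] = pgprot_encrypted(protection_map[i]);
}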
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/boot/compressed/pagetable.c | 7 +++++
arch/x86/include/asm/fixmap.h | 7 +++++
arch/x86/include/asm/mem_encrypt.h | 25 +++++++++++++++++++
arch/x86/include/asm/page_types.h | 2 +-
arch/x86/include/asm/pgtable.h | 9 +++++++
arch/x86/include/asm/pgtable_types.h | 45 ++++++++++++++++++++++------------
arch/x86/include/asm/processor.h | 3 ++
arch/x86/kernel/espfix_64.c | 2 +-
arch/x86/kernel/head64.c | 10 +++++++-
arch/x86/kernel/head_64.S | 18 +++++++-------
arch/x86/mm/kasan_init_64.c | 4 ++-
arch/x86/mm/mem_encrypt.c | 18 ++++++++++++++
arch/x86/mm/pageattr.c | 3 ++
include/asm-generic/mem_encrypt.h | 8 ++++++
include/asm-generic/pgtable.h | 8 ++++++
15 files changed, 138 insertions(+), 31 deletions(-)
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 1d78f17..05455ff 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -15,6 +15,13 @@
#define __pa(x) ((unsigned long)(x))
#define __va(x) ((void *)((unsigned long)(x)))
+/*
+ * The pgtable.h and mm/ident_map.c includes make use of the SME related
+ * information which is not used in the compressed image support. Un-define
+ * the SME support to avoid any compile and link errors.
+ */
+#undef CONFIG_AMD_MEM_ENCRYPT
+
#include "misc.h"
/* These actually do the work of building the kernel identity maps. */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index b65155c..d9ff226 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -157,6 +157,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
}
#endif
+/*
+ * FIXMAP_PAGE_NOCACHE is used for MMIO. Memory encryption is not
+ * supported for MMIO addresses, so make sure that the memory encryption
+ * mask is not part of the page attributes.
+ */
+#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+
#include <asm-generic/fixmap.h>
#define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 5008fd9..f1c4c29 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -15,14 +15,22 @@
#ifndef __ASSEMBLY__
+#include <linux/init.h>
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern unsigned long sme_me_mask;
+void __init sme_early_init(void);
+
#else /* !CONFIG_AMD_MEM_ENCRYPT */
#define sme_me_mask 0UL
+static inline void __init sme_early_init(void)
+{
+}
+
#endif /* CONFIG_AMD_MEM_ENCRYPT */
static inline bool sme_active(void)
@@ -30,6 +38,23 @@ static inline bool sme_active(void)
return !!sme_me_mask;
}
+/*
+ * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
+ * writing to or comparing values from the cr3 register. Having the
+ * encryption mask set in cr3 enables the PGD entry to be encrypted and
+ * avoid special case handling of PGD allocations.
+ */
+#define __sme_pa(x) (__pa(x) | sme_me_mask)
+#define __sme_pa_nodebug(x) (__pa_nodebug(x) | sme_me_mask)
+
+/*
+ * The __sme_set() and __sme_clr() macros are useful for adding or removing
+ * the encryption mask from a value (e.g. when dealing with pagetable
+ * entries).
+ */
+#define __sme_set(x) ((unsigned long)(x) | sme_me_mask)
+#define __sme_clr(x) ((unsigned long)(x) & ~sme_me_mask)
+
#endif /* __ASSEMBLY__ */
#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 7bd0099..fead0a5 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -15,7 +15,7 @@
#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
-#define __PHYSICAL_MASK ((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#define __PHYSICAL_MASK ((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))
#define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1)
/* Cast *PAGE_MASK to a signed type so that it is sign-extended if
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 96b6b83..3f789ec 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -3,6 +3,7 @@
#include <asm/page.h>
#include <asm/pgtable_types.h>
+#include <asm/mem_encrypt.h>
/*
* Macro to mark a page protection value as UC-
@@ -13,6 +14,12 @@
cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS))) \
: (prot))
+/*
+ * Macros to add or remove encryption attribute
+ */
+#define pgprot_encrypted(prot) __pgprot(__sme_set(pgprot_val(prot)))
+#define pgprot_decrypted(prot) __pgprot(__sme_clr(pgprot_val(prot)))
+
#ifndef __ASSEMBLY__
#include <asm/x86_init.h>
@@ -38,6 +45,8 @@
extern struct mm_struct *pgd_page_get_mm(struct page *page);
+extern pmdval_t early_pmd_flags;
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else /* !CONFIG_PARAVIRT */
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index bf9638e..d3ae99c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -2,7 +2,9 @@
#define _ASM_X86_PGTABLE_DEFS_H
#include <linux/const.h>
+
#include <asm/page_types.h>
+#include <asm/mem_encrypt.h>
#define FIRST_USER_ADDRESS 0UL
@@ -121,10 +123,10 @@
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
-#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
- _PAGE_ACCESSED | _PAGE_DIRTY)
-#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
- _PAGE_DIRTY)
+#define _PAGE_TABLE_NOENC (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
+ _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _KERNPG_TABLE_NOENC (_PAGE_PRESENT | _PAGE_RW | \
+ _PAGE_ACCESSED | _PAGE_DIRTY)
/*
* Set of bits not changed in pte_modify. The pte's
@@ -191,18 +193,29 @@ enum page_cache_mode {
#define __PAGE_KERNEL_IO (__PAGE_KERNEL)
#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE)
-#define PAGE_KERNEL __pgprot(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO)
-#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX)
-#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE)
-#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE)
-#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC)
-#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VVAR __pgprot(__PAGE_KERNEL_VVAR)
-
-#define PAGE_KERNEL_IO __pgprot(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE __pgprot(__PAGE_KERNEL_IO_NOCACHE)
+#ifndef __ASSEMBLY__
+
+#define _PAGE_ENC (_AT(pteval_t, sme_me_mask))
+
+#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
+ _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC)
+#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
+ _PAGE_DIRTY | _PAGE_ENC)
+
+#define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC)
+#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
+#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL | _PAGE_ENC)
+#define PAGE_KERNEL_VVAR __pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+
+#define PAGE_KERNEL_IO __pgprot(__PAGE_KERNEL_IO)
+#define PAGE_KERNEL_IO_NOCACHE __pgprot(__PAGE_KERNEL_IO_NOCACHE)
+
+#endif /* __ASSEMBLY__ */
/* xwr */
#define __P000 PAGE_NONE
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3cada99..61e055d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -22,6 +22,7 @@
#include <asm/nops.h>
#include <asm/special_insns.h>
#include <asm/fpu/types.h>
+#include <asm/mem_encrypt.h>
#include <linux/personality.h>
#include <linux/cache.h>
@@ -233,7 +234,7 @@ static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
static inline void load_cr3(pgd_t *pgdir)
{
- write_cr3(__pa(pgdir));
+ write_cr3(__sme_pa(pgdir));
}
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 8e598a1..0955ec7 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -195,7 +195,7 @@ void init_espfix_ap(int cpu)
pte_p = pte_offset_kernel(&pmd, addr);
stack_page = page_address(alloc_pages_node(node, GFP_KERNEL, 0));
- pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask));
+ pte = __pte(__pa(stack_page) | ((__PAGE_KERNEL_RO | _PAGE_ENC) & ptemask));
for (n = 0; n < ESPFIX_PTE_CLONES; n++)
set_pte(&pte_p[n*PTE_STRIDE], pte);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index dc03624..00ae2c5 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -29,6 +29,7 @@
#include <asm/bootparam_utils.h>
#include <asm/microcode.h>
#include <asm/kasan.h>
+#include <asm/mem_encrypt.h>
/*
* Manage page tables very early on.
@@ -43,7 +44,7 @@ static void __init reset_early_page_tables(void)
{
memset(early_level4_pgt, 0, sizeof(pgd_t)*(PTRS_PER_PGD-1));
next_early_pgt = 0;
- write_cr3(__pa_nodebug(early_level4_pgt));
+ write_cr3(__sme_pa_nodebug(early_level4_pgt));
}
/* Create a new PMD entry */
@@ -158,6 +159,13 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
clear_page(init_level4_pgt);
+ /*
+ * SME support may update early_pmd_flags to include the memory
+ * encryption mask, so it needs to be called before anything
+ * that may generate a page fault.
+ */
+ sme_early_init();
+
kasan_early_init();
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 222630c..1fe944b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -129,7 +129,7 @@ startup_64:
movq %rdi, %rax
shrq $PGDIR_SHIFT, %rax
- leaq (PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
+ leaq (PAGE_SIZE + _KERNPG_TABLE_NOENC)(%rbx), %rdx
addq %r12, %rdx
movq %rdx, 0(%rbx,%rax,8)
movq %rdx, 8(%rbx,%rax,8)
@@ -476,7 +476,7 @@ GLOBAL(name)
__INITDATA
NEXT_PAGE(early_level4_pgt)
.fill 511,8,0
- .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -488,15 +488,15 @@ NEXT_PAGE(init_level4_pgt)
.fill 512,8,0
#else
NEXT_PAGE(init_level4_pgt)
- .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_level4_pgt + L4_PAGE_OFFSET*8, 0
- .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_level4_pgt + L4_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
- .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
NEXT_PAGE(level3_ident_pgt)
- .quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.fill 511, 8, 0
NEXT_PAGE(level2_ident_pgt)
/* Since I easily can, map the first 1G.
@@ -508,8 +508,8 @@ NEXT_PAGE(level2_ident_pgt)
NEXT_PAGE(level3_kernel_pgt)
.fill L3_START_KERNEL,8,0
/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
- .quad level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
- .quad level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .quad level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
NEXT_PAGE(level2_kernel_pgt)
/*
@@ -527,7 +527,7 @@ NEXT_PAGE(level2_kernel_pgt)
NEXT_PAGE(level2_fixmap_pgt)
.fill 506,8,0
- .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
.fill 5,8,0
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0c7d812..6f1837a 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -92,7 +92,7 @@ static int kasan_die_handler(struct notifier_block *self,
void __init kasan_early_init(void)
{
int i;
- pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL;
+ pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL | _PAGE_ENC;
pmdval_t pmd_val = __pa_nodebug(kasan_zero_pte) | _KERNPG_TABLE;
pudval_t pud_val = __pa_nodebug(kasan_zero_pmd) | _KERNPG_TABLE;
p4dval_t p4d_val = __pa_nodebug(kasan_zero_pud) | _KERNPG_TABLE;
@@ -158,7 +158,7 @@ void __init kasan_init(void)
*/
memset(kasan_zero_page, 0, PAGE_SIZE);
for (i = 0; i < PTRS_PER_PTE; i++) {
- pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO);
+ pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO | _PAGE_ENC);
set_pte(&kasan_zero_pte[i], pte);
}
/* Flush TLBs again to be sure that write protection applied. */
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index cc00d8b..8ca93e5 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -15,6 +15,8 @@
#ifdef CONFIG_AMD_MEM_ENCRYPT
+#include <linux/mm.h>
+
/*
* Since SME related variables are set early in the boot process they must
* reside in the .data section so as not to be zeroed out when the .bss
@@ -23,6 +25,22 @@
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);
+void __init sme_early_init(void)
+{
+ unsigned int i;
+
+ if (!sme_me_mask)
+ return;
+
+ early_pmd_flags = __sme_set(early_pmd_flags);
+
+ __supported_pte_mask = __sme_set(__supported_pte_mask);
+
+ /* Update the protection map with memory encryption mask */
+ for (i = 0; i < ARRAY_SIZE(protection_map); i++)
+ protection_map[i] = pgprot_encrypted(protection_map[i]);
+}
+
void __init sme_encrypt_kernel(void)
{
}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index c8520b2..e7d3866 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2014,6 +2014,9 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
if (!(page_flags & _PAGE_RW))
cpa.mask_clr = __pgprot(_PAGE_RW);
+ if (!(page_flags & _PAGE_ENC))
+ cpa.mask_clr = pgprot_encrypted(cpa.mask_clr);
+
cpa.mask_set = __pgprot(_PAGE_PRESENT | page_flags);
retval = __change_page_attr_set_clr(&cpa, 0);
diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
index 563c918..b55c3f9 100644
--- a/include/asm-generic/mem_encrypt.h
+++ b/include/asm-generic/mem_encrypt.h
@@ -22,6 +22,14 @@ static inline bool sme_active(void)
return false;
}
+/*
+ * The __sme_set() and __sme_clr() macros are useful for adding or removing
+ * the encryption mask from a value (e.g. when dealing with pagetable
+ * entries).
+ */
+#define __sme_set(x) (x)
+#define __sme_clr(x) (x)
+
#endif /* __ASSEMBLY__ */
#endif /* __MEM_ENCRYPT_H__ */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 7dfa767..882cb5d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -424,6 +424,14 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
#define pgprot_device pgprot_noncached
#endif
+#ifndef pgprot_encrypted
+#define pgprot_encrypted(prot) (prot)
+#endif
+
+#ifndef pgprot_decrypted
+#define pgprot_decrypted(prot) (prot)
+#endif
+
#ifndef pgprot_modify
#define pgprot_modify pgprot_modify
static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
The boot data and command line data are present in memory in a decrypted
state and are copied early in the boot process. The early page fault
support will map these areas as encrypted, so before attempting to copy
them, add decrypted mappings so the data is accessed properly when copied.
For the initrd, encrypt the data in place. Since the initrd area will later be
mapped as encrypted, the data will then be accessed properly.
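In sketch form, the copy path follows this pattern (simplified from the
head64.c change in this patch; error handling and the command line copy are
omitted):

    sme_map_bootdata(real_mode_data);       /* add decrypted mappings */
    memcpy(&boot_params, real_mode_data, sizeof(boot_params));
    sanitize_boot_params(&boot_params);
    /* ... copy the command line the same way ... */
    sme_unmap_bootdata(real_mode_data);     /* remove the decrypted mappings */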
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 11 +++++
arch/x86/include/asm/pgtable.h | 3 +
arch/x86/kernel/head64.c | 30 ++++++++++++--
arch/x86/kernel/setup.c | 9 ++++
arch/x86/mm/mem_encrypt.c | 77 ++++++++++++++++++++++++++++++++++++
5 files changed, 126 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 7c395cf..61a7049 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,9 @@ void __init sme_early_encrypt(resource_size_t paddr,
void __init sme_early_decrypt(resource_size_t paddr,
unsigned long size);
+void __init sme_map_bootdata(char *real_mode_data);
+void __init sme_unmap_bootdata(char *real_mode_data);
+
void __init sme_early_init(void);
#else /* !CONFIG_AMD_MEM_ENCRYPT */
@@ -42,6 +45,14 @@ static inline void __init sme_early_decrypt(resource_size_t paddr,
{
}
+static inline void __init sme_map_bootdata(char *real_mode_data)
+{
+}
+
+static inline void __init sme_unmap_bootdata(char *real_mode_data)
+{
+}
+
static inline void __init sme_early_init(void)
{
}
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3f789ec..16657e7 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,9 @@
#ifndef __ASSEMBLY__
#include <asm/x86_init.h>
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
+
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 00ae2c5..f1fe5df 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -34,7 +34,6 @@
/*
* Manage page tables very early on.
*/
-extern pgd_t early_level4_pgt[PTRS_PER_PGD];
extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
static unsigned int __initdata next_early_pgt = 2;
pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
@@ -48,12 +47,12 @@ static void __init reset_early_page_tables(void)
}
/* Create a new PMD entry */
-int __init early_make_pgtable(unsigned long address)
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
{
unsigned long physaddr = address - __PAGE_OFFSET;
pgdval_t pgd, *pgd_p;
pudval_t pud, *pud_p;
- pmdval_t pmd, *pmd_p;
+ pmdval_t *pmd_p;
/* Invalid address or early pgt is done ? */
if (physaddr >= MAXMEM || read_cr3_pa() != __pa_nodebug(early_level4_pgt))
@@ -95,12 +94,21 @@ int __init early_make_pgtable(unsigned long address)
memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
}
- pmd = (physaddr & PMD_MASK) + early_pmd_flags;
pmd_p[pmd_index(address)] = pmd;
return 0;
}
+int __init early_make_pgtable(unsigned long address)
+{
+ unsigned long physaddr = address - __PAGE_OFFSET;
+ pmdval_t pmd;
+
+ pmd = (physaddr & PMD_MASK) + early_pmd_flags;
+
+ return __early_make_pgtable(address, pmd);
+}
+
/* Don't add a printk in there. printk relies on the PDA which is not initialized
yet. */
static void __init clear_bss(void)
@@ -123,6 +131,12 @@ static void __init copy_bootdata(char *real_mode_data)
char * command_line;
unsigned long cmd_line_ptr;
+ /*
+ * If SME is active, this will create decrypted mappings of the
+ * boot data in advance of the copy operations.
+ */
+ sme_map_bootdata(real_mode_data);
+
memcpy(&boot_params, real_mode_data, sizeof boot_params);
sanitize_boot_params(&boot_params);
cmd_line_ptr = get_cmd_line_ptr();
@@ -130,6 +144,14 @@ static void __init copy_bootdata(char *real_mode_data)
command_line = __va(cmd_line_ptr);
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
}
+
+ /*
+ * The old boot data is no longer needed and won't be reserved,
+ * freeing up that memory for use by the system. If SME is active,
+ * we need to remove the mappings that were created so that the
+ * memory doesn't remain mapped as decrypted.
+ */
+ sme_unmap_bootdata(real_mode_data);
}
asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f818236..d1414a1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -115,6 +115,7 @@
#include <asm/microcode.h>
#include <asm/mmu_context.h>
#include <asm/kaslr.h>
+#include <asm/mem_encrypt.h>
/*
* max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -374,6 +375,14 @@ static void __init reserve_initrd(void)
!ramdisk_image || !ramdisk_size)
return; /* No initrd provided by bootloader */
+ /*
+ * If SME is active, this memory will be marked encrypted by the
+ * kernel when it is accessed (including relocation). However, the
+ * ramdisk image was loaded decrypted by the bootloader, so make
+ * sure that it is encrypted before accessing it.
+ */
+ sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
+
initrd_start = 0;
mapped_size = memblock_mem_size(max_pfn_mapped);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 18c0887..2321f05 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -19,6 +19,8 @@
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
+#include <asm/setup.h>
+#include <asm/bootparam.h>
/*
* Since SME related variables are set early in the boot process they must
@@ -101,6 +103,81 @@ void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
__sme_early_enc_dec(paddr, size, false);
}
+static void __init sme_early_pgtable_flush(void)
+{
+ write_cr3(__sme_pa_nodebug(early_level4_pgt));
+}
+
+static void __init __sme_early_map_unmap_mem(void *vaddr, unsigned long size,
+ bool map)
+{
+ unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
+ pmdval_t pmd_flags, pmd;
+
+ /* Use early_pmd_flags but remove the encryption mask */
+ pmd_flags = __sme_clr(early_pmd_flags);
+
+ do {
+ pmd = map ? (paddr & PMD_MASK) + pmd_flags : 0;
+ __early_make_pgtable((unsigned long)vaddr, pmd);
+
+ vaddr += PMD_SIZE;
+ paddr += PMD_SIZE;
+ size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
+ } while (size);
+}
+
+static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
+{
+ struct boot_params *boot_data;
+ unsigned long cmdline_paddr;
+
+ __sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), map);
+ boot_data = (struct boot_params *)real_mode_data;
+
+ /*
+ * Determine the command line address only after having established
+ * the decrypted mapping.
+ */
+ cmdline_paddr = boot_data->hdr.cmd_line_ptr |
+ ((u64)boot_data->ext_cmd_line_ptr << 32);
+
+ if (cmdline_paddr)
+ __sme_early_map_unmap_mem(__va(cmdline_paddr),
+ COMMAND_LINE_SIZE, map);
+}
+
+void __init sme_unmap_bootdata(char *real_mode_data)
+{
+ /* If SME is not active, the bootdata is in the correct state */
+ if (!sme_active())
+ return;
+
+ /*
+ * The bootdata and command line aren't needed anymore so clear
+ * any mapping of them.
+ */
+ __sme_map_unmap_bootdata(real_mode_data, false);
+
+ sme_early_pgtable_flush();
+}
+
+void __init sme_map_bootdata(char *real_mode_data)
+{
+ /* If SME is not active, the bootdata is in the correct state */
+ if (!sme_active())
+ return;
+
+ /*
+ * The bootdata and command line will not be encrypted, so they
+ * need to be mapped as decrypted memory so they can be copied
+ * properly.
+ */
+ __sme_map_unmap_bootdata(real_mode_data, true);
+
+ sme_early_pgtable_flush();
+}
+
void __init sme_early_init(void)
{
unsigned int i;
Add a function that will return the E820 type associated with an address
range.
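A caller is expected to use the new helper along these lines (a minimal
sketch; paddr and size are hypothetical inputs):

    int type = e820__get_entry_type(paddr, paddr + size - 1);

    if (type < 0)
        return false;                   /* no single E820 entry covers the range */

    return type == E820_TYPE_RESERVED;  /* example: treat reserved areas specially */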
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/e820/api.h | 2 ++
arch/x86/kernel/e820.c | 26 +++++++++++++++++++++++---
2 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 8e0f8b8..3641f5f 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -38,6 +38,8 @@
extern void e820__reallocate_tables(void);
extern void e820__register_nosave_regions(unsigned long limit_pfn);
+extern int e820__get_entry_type(u64 start, u64 end);
+
/*
* Returns true iff the specified range [start,end) is completely contained inside
* the ISA region.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index d78a586..46c9b65 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -84,7 +84,8 @@ bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
* Note: this function only works correctly once the E820 table is sorted and
* not-overlapping (at least for the range specified), which is the case normally.
*/
-bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+static struct e820_entry *__e820__mapped_all(u64 start, u64 end,
+ enum e820_type type)
{
int i;
@@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
* coverage of the desired range exists:
*/
if (start >= end)
- return 1;
+ return entry;
}
- return 0;
+
+ return NULL;
+}
+
+/*
+ * This function checks if the entire range <start,end> is mapped with type.
+ */
+bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+{
+ return __e820__mapped_all(start, end, type);
+}
+
+/*
+ * This function returns the type associated with the range <start,end>.
+ */
+int e820__get_entry_type(u64 start, u64 end)
+{
+ struct e820_entry *entry = __e820__mapped_all(start, end, 0);
+
+ return entry ? entry->type : -EINVAL;
}
/*
Add a function that will determine if a supplied physical address matches
the address of an EFI table.
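The intended use is a simple membership test, roughly (sketch; phys_addr is a
hypothetical input):

    /* Firmware created the EFI tables before SME was active, so such
     * addresses need to be mapped decrypted */
    if (efi_is_table_address(phys_addr))
        return true;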
Signed-off-by: Tom Lendacky <[email protected]>
---
drivers/firmware/efi/efi.c | 33 +++++++++++++++++++++++++++++++++
include/linux/efi.h | 7 +++++++
2 files changed, 40 insertions(+)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b372aad..983675d 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -55,6 +55,25 @@ struct efi __read_mostly efi = {
};
EXPORT_SYMBOL(efi);
+static unsigned long *efi_tables[] = {
+ &efi.mps,
+ &efi.acpi,
+ &efi.acpi20,
+ &efi.smbios,
+ &efi.smbios3,
+ &efi.sal_systab,
+ &efi.boot_info,
+ &efi.hcdp,
+ &efi.uga,
+ &efi.uv_systab,
+ &efi.fw_vendor,
+ &efi.runtime,
+ &efi.config_table,
+ &efi.esrt,
+ &efi.properties_table,
+ &efi.mem_attr_table,
+};
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -854,6 +873,20 @@ int efi_status_to_err(efi_status_t status)
return err;
}
+bool efi_is_table_address(unsigned long phys_addr)
+{
+ unsigned int i;
+
+ if (phys_addr == EFI_INVALID_TABLE_ADDR)
+ return false;
+
+ for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
+ if (*(efi_tables[i]) == phys_addr)
+ return true;
+
+ return false;
+}
+
#ifdef CONFIG_KEXEC
static int update_efi_random_seed(struct notifier_block *nb,
unsigned long code, void *unused)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index ec36f42..504fa85 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1079,6 +1079,8 @@ static inline bool efi_enabled(int feature)
return test_bit(feature, &efi.flags) != 0;
}
extern void efi_reboot(enum reboot_mode reboot_mode, const char *__unused);
+
+extern bool efi_is_table_address(unsigned long phys_addr);
#else
static inline bool efi_enabled(int feature)
{
@@ -1092,6 +1094,11 @@ static inline bool efi_enabled(int feature)
{
return false;
}
+
+static inline bool efi_is_table_address(unsigned long phys_addr)
+{
+ return false;
+}
#endif
extern int efi_status_to_err(efi_status_t status);
The efi_mem_type() function currently returns 0, which maps to
EFI_RESERVED_TYPE, when it is unable to find a memmap entry for the supplied
physical address. Returning EFI_RESERVED_TYPE implies that a memmap entry
exists when it does not. Instead of returning 0, change the function to
return a negative error value when no memmap entry is found.
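Callers should now check for a negative return instead of comparing against
EFI_RESERVED_TYPE; a minimal sketch (phys_addr is hypothetical):

    int type = efi_mem_type(phys_addr);

    if (type < 0)
        return false;   /* no memmap entry describes this address */

    if (type == EFI_BOOT_SERVICES_DATA || type == EFI_RUNTIME_SERVICES_DATA)
        return true;    /* firmware data, map decrypted */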
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/ia64/kernel/efi.c | 4 ++--
arch/x86/platform/efi/efi.c | 6 +++---
include/linux/efi.h | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 1212956..8141600 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -757,14 +757,14 @@ static void __init handle_palo(unsigned long phys_addr)
return 0;
}
-u32
+int
efi_mem_type (unsigned long phys_addr)
{
efi_memory_desc_t *md = efi_memory_descriptor(phys_addr);
if (md)
return md->type;
- return 0;
+ return -EINVAL;
}
u64
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 43b96f5..a6a26cc 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1034,12 +1034,12 @@ void __init efi_enter_virtual_mode(void)
/*
* Convenience functions to obtain memory types and attributes
*/
-u32 efi_mem_type(unsigned long phys_addr)
+int efi_mem_type(unsigned long phys_addr)
{
efi_memory_desc_t *md;
if (!efi_enabled(EFI_MEMMAP))
- return 0;
+ return -ENOTSUPP;
for_each_efi_memory_desc(md) {
if ((md->phys_addr <= phys_addr) &&
@@ -1047,7 +1047,7 @@ u32 efi_mem_type(unsigned long phys_addr)
(md->num_pages << EFI_PAGE_SHIFT))))
return md->type;
}
- return 0;
+ return -EINVAL;
}
static int __init arch_parse_efi_cmdline(char *str)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 504fa85..8bcb271 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -973,7 +973,7 @@ static inline void efi_esrt_init(void) { }
extern int efi_config_parse_tables(void *config_tables, int count, int sz,
efi_config_table_type_t *arch_tables);
extern u64 efi_get_iobase (void);
-extern u32 efi_mem_type (unsigned long phys_addr);
+extern int efi_mem_type(unsigned long phys_addr);
extern u64 efi_mem_attributes (unsigned long phys_addr);
extern u64 efi_mem_attribute (unsigned long phys_addr, unsigned long size);
extern int __init efi_uart_console_only (void);
Boot data (such as EFI-related data) is not encrypted when the system is
booted because UEFI/BIOS does not run with SME active. In order to access
this data properly it needs to be mapped decrypted.
Update early_memremap() to provide an arch-specific routine to modify the
pagetable protection attributes before they are applied to the new
mapping. This is used to remove the encryption mask for boot-related data.
Update memremap() to provide an arch-specific routine to determine if RAM
remapping is allowed. RAM remapping would create an encrypted mapping. By
preventing RAM remapping, ioremap_cache() is used instead, which provides a
decrypted mapping of the boot-related data.
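With the new MEMREMAP_DEC flag a caller can also request a decrypted mapping
explicitly; a minimal sketch (paddr and size are hypothetical):

    void *va = memremap(paddr, size, MEMREMAP_WB | MEMREMAP_DEC);

    if (!va)
        return -ENOMEM;
    /* ... access the firmware data as plain text ... */
    memunmap(va);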
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/io.h | 4 +
arch/x86/mm/ioremap.c | 179 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/io.h | 2 +
kernel/memremap.c | 20 ++++-
mm/early_ioremap.c | 18 ++++-
5 files changed, 216 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 7afb0e2..9eac5a5 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -381,4 +381,8 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
#define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
#endif
+extern bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size,
+ unsigned long flags);
+#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
+
#endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 792db75..34ed59d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -13,6 +13,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/mmiotrace.h>
+#include <linux/efi.h>
#include <asm/set_memory.h>
#include <asm/e820/api.h>
@@ -22,6 +23,7 @@
#include <asm/pgalloc.h>
#include <asm/pat.h>
#include <asm/mem_encrypt.h>
+#include <asm/setup.h>
#include "physaddr.h"
@@ -422,6 +424,183 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}
+/*
+ * Examine the physical address to determine if it is an area of memory
+ * that should be mapped decrypted. If the memory is not part of the
+ * kernel usable area it was accessed and created decrypted, so these
+ * areas should be mapped decrypted.
+ */
+static bool memremap_should_map_decrypted(resource_size_t phys_addr,
+ unsigned long size)
+{
+ /* Check if the address is outside kernel usable area */
+ switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
+ case E820_TYPE_RESERVED:
+ case E820_TYPE_ACPI:
+ case E820_TYPE_NVS:
+ case E820_TYPE_UNUSABLE:
+ return true;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Examine the physical address to determine if it is EFI data. Check
+ * it against the boot params structure and EFI tables and memory types.
+ */
+static bool memremap_is_efi_data(resource_size_t phys_addr,
+ unsigned long size)
+{
+ u64 paddr;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ if (!efi_enabled(EFI_BOOT))
+ return false;
+
+ paddr = boot_params.efi_info.efi_memmap_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_memmap;
+ if (phys_addr == paddr)
+ return true;
+
+ paddr = boot_params.efi_info.efi_systab_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_systab;
+ if (phys_addr == paddr)
+ return true;
+
+ if (efi_is_table_address(phys_addr))
+ return true;
+
+ switch (efi_mem_type(phys_addr)) {
+ case EFI_BOOT_SERVICES_DATA:
+ case EFI_RUNTIME_SERVICES_DATA:
+ return true;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Examine the physical address to determine if it is boot data by checking
+ * it against the boot params setup_data chain.
+ */
+static bool memremap_is_setup_data(resource_size_t phys_addr,
+ unsigned long size)
+{
+ struct setup_data *data;
+ u64 paddr, paddr_next;
+
+ paddr = boot_params.hdr.setup_data;
+ while (paddr) {
+ unsigned int len;
+
+ if (phys_addr == paddr)
+ return true;
+
+ data = memremap(paddr, sizeof(*data),
+ MEMREMAP_WB | MEMREMAP_DEC);
+
+ paddr_next = data->next;
+ len = data->len;
+
+ memunmap(data);
+
+ if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
+ return true;
+
+ paddr = paddr_next;
+ }
+
+ return false;
+}
+
+/*
+ * Examine the physical address to determine if it is boot data by checking
+ * it against the boot params setup_data chain (early boot version).
+ */
+static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
+ unsigned long size)
+{
+ struct setup_data *data;
+ u64 paddr, paddr_next;
+
+ paddr = boot_params.hdr.setup_data;
+ while (paddr) {
+ unsigned int len;
+
+ if (phys_addr == paddr)
+ return true;
+
+ data = early_memremap_decrypted(paddr, sizeof(*data));
+
+ paddr_next = data->next;
+ len = data->len;
+
+ early_memunmap(data, sizeof(*data));
+
+ if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
+ return true;
+
+ paddr = paddr_next;
+ }
+
+ return false;
+}
+
+/*
+ * Architecture function to determine if RAM remap is allowed. By default, a
+ * RAM remap will map the data as encrypted. Determine if a RAM remap should
+ * not be done so that the data will be mapped decrypted.
+ */
+bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
+ unsigned long flags)
+{
+ if (!sme_active())
+ return true;
+
+ if (flags & MEMREMAP_ENC)
+ return true;
+
+ if (flags & MEMREMAP_DEC)
+ return false;
+
+ if (memremap_is_setup_data(phys_addr, size) ||
+ memremap_is_efi_data(phys_addr, size) ||
+ memremap_should_map_decrypted(phys_addr, size))
+ return false;
+
+ return true;
+}
+
+/*
+ * Architecture override of __weak function to adjust the protection attributes
+ * used when remapping memory. By default, early_memremap() will map the data
+ * as encrypted. Determine if an encrypted mapping should not be done and set
+ * the appropriate protection attributes.
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ if (!sme_active())
+ return prot;
+
+ if (early_memremap_is_setup_data(phys_addr, size) ||
+ memremap_is_efi_data(phys_addr, size) ||
+ memremap_should_map_decrypted(phys_addr, size))
+ prot = pgprot_decrypted(prot);
+ else
+ prot = pgprot_encrypted(prot);
+
+ return prot;
+}
+
#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
/* Remap memory with encryption */
void __init *early_memremap_encrypted(resource_size_t phys_addr,
diff --git a/include/linux/io.h b/include/linux/io.h
index 2195d9e..32e30e8 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -157,6 +157,8 @@ enum {
MEMREMAP_WB = 1 << 0,
MEMREMAP_WT = 1 << 1,
MEMREMAP_WC = 1 << 2,
+ MEMREMAP_ENC = 1 << 3,
+ MEMREMAP_DEC = 1 << 4,
};
void *memremap(resource_size_t offset, size_t size, unsigned long flags);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 23a6483..9b4fbd5 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -34,13 +34,24 @@ static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
}
#endif
-static void *try_ram_remap(resource_size_t offset, size_t size)
+#ifndef arch_memremap_can_ram_remap
+static bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size,
+ unsigned long flags)
+{
+ return true;
+}
+#endif
+
+static void *try_ram_remap(resource_size_t offset, size_t size,
+ unsigned long flags)
{
unsigned long pfn = PHYS_PFN(offset);
/* In the simple case just return the existing linear address */
- if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
+ if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)) &&
+ arch_memremap_can_ram_remap(offset, size, flags))
return __va(offset);
+
return NULL; /* fallback to arch_memremap_wb */
}
@@ -48,7 +59,8 @@ static void *try_ram_remap(resource_size_t offset, size_t size)
* memremap() - remap an iomem_resource as cacheable memory
* @offset: iomem resource start address
* @size: size of remap
- * @flags: any of MEMREMAP_WB, MEMREMAP_WT and MEMREMAP_WC
+ * @flags: any of MEMREMAP_WB, MEMREMAP_WT, MEMREMAP_WC,
+ * MEMREMAP_ENC, MEMREMAP_DEC
*
* memremap() is "ioremap" for cases where it is known that the resource
* being mapped does not have i/o side effects and the __iomem
@@ -95,7 +107,7 @@ void *memremap(resource_size_t offset, size_t size, unsigned long flags)
* the requested range is potentially in System RAM.
*/
if (is_ram == REGION_INTERSECTS)
- addr = try_ram_remap(offset, size);
+ addr = try_ram_remap(offset, size, flags);
if (!addr)
addr = arch_memremap_wb(offset, size);
}
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index d7d30da..b1dd4a9 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -30,6 +30,13 @@ static int __init early_ioremap_debug_setup(char *str)
static int after_paging_init __initdata;
+pgprot_t __init __weak early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ return prot;
+}
+
void __init __weak early_ioremap_shutdown(void)
{
}
@@ -215,14 +222,19 @@ void __init early_iounmap(void __iomem *addr, unsigned long size)
void __init *
early_memremap(resource_size_t phys_addr, unsigned long size)
{
- return (__force void *)__early_ioremap(phys_addr, size,
- FIXMAP_PAGE_NORMAL);
+ pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+ FIXMAP_PAGE_NORMAL);
+
+ return (__force void *)__early_ioremap(phys_addr, size, prot);
}
#ifdef FIXMAP_PAGE_RO
void __init *
early_memremap_ro(resource_size_t phys_addr, unsigned long size)
{
- return (__force void *)__early_ioremap(phys_addr, size, FIXMAP_PAGE_RO);
+ pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+ FIXMAP_PAGE_RO);
+
+ return (__force void *)__early_ioremap(phys_addr, size, prot);
}
#endif
Persistent memory is expected to persist across reboots. The encryption
key used by SME changes across reboots, which would corrupt any persistent
memory that had been mapped encrypted. Persistent memory is handed out by
block devices through memory remapping functions, so be sure not to map this
memory as encrypted.
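The check mirrors the ioremap.c hunk below; in sketch form:

    /* Any intersection with a persistent memory region means the
     * mapping must not carry the encryption mask */
    if (region_intersects(phys_addr, size, IORESOURCE_MEM,
                          IORES_DESC_PERSISTENT_MEMORY) != REGION_DISJOINT)
        return true;    /* map decrypted */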
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/mm/ioremap.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 34ed59d..99cda55 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -428,17 +428,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
* Examine the physical address to determine if it is an area of memory
* that should be mapped decrypted. If the memory is not part of the
* kernel usable area it was accessed and created decrypted, so these
- * areas should be mapped decrypted.
+ * areas should be mapped decrypted. And since the encryption key can
+ * change across reboots, persistent memory should also be mapped
+ * decrypted.
*/
static bool memremap_should_map_decrypted(resource_size_t phys_addr,
unsigned long size)
{
+ int is_pmem;
+
+ /*
+ * Check if the address is part of a persistent memory region.
+ * This check covers areas added by E820, EFI and ACPI.
+ */
+ is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
+ IORES_DESC_PERSISTENT_MEMORY);
+ if (is_pmem != REGION_DISJOINT)
+ return true;
+
+ /*
+ * Check if the non-volatile attribute is set for an EFI
+ * reserved area.
+ */
+ if (efi_enabled(EFI_BOOT)) {
+ switch (efi_mem_type(phys_addr)) {
+ case EFI_RESERVED_TYPE:
+ if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
+ return true;
+ break;
+ default:
+ break;
+ }
+ }
+
/* Check if the address is outside kernel usable area */
switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
case E820_TYPE_RESERVED:
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+ case E820_TYPE_PRAM:
return true;
default:
break;
Add support for changing the memory encryption attribute for one or more
memory pages. This will be useful when the AP trampoline area must be made
decrypted, or when the SWIOTLB area must be made decrypted in support of
devices whose DMA mask cannot cover the address range implied by the
encryption mask.
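Typical usage is expected to look roughly like this (a sketch; vaddr and
numpages are hypothetical and must describe page-aligned memory):

    int ret;

    ret = set_memory_decrypted(vaddr, numpages);
    if (ret)
        return ret;

    /* ... the area is now shared (unencrypted) with the device/firmware ... */

    ret = set_memory_encrypted(vaddr, numpages);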
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/set_memory.h | 3 ++
arch/x86/mm/pageattr.c | 62 +++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index eaec6c3..cd71273 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -11,6 +11,7 @@
* Executability : eXeutable, NoteXecutable
* Read/Write : ReadOnly, ReadWrite
* Presence : NotPresent
+ * Encryption : Encrypted, Decrypted
*
* Within a category, the attributes are mutually exclusive.
*
@@ -42,6 +43,8 @@
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_np(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
int set_memory_array_uc(unsigned long *addr, int addrinarray);
int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e7d3866..d9e09fb 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1769,6 +1769,68 @@ int set_memory_4k(unsigned long addr, int numpages)
__pgprot(0), 1, 0, NULL);
}
+static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+ struct cpa_data cpa;
+ unsigned long start;
+ int ret;
+
+ /* Nothing to do if the SME is not active */
+ if (!sme_active())
+ return 0;
+
+ /* Should not be working on unaligned addresses */
+ if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
+ addr &= PAGE_MASK;
+
+ start = addr;
+
+ memset(&cpa, 0, sizeof(cpa));
+ cpa.vaddr = &addr;
+ cpa.numpages = numpages;
+ cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
+ cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
+ cpa.pgd = init_mm.pgd;
+
+ /* Must avoid aliasing mappings in the highmem code */
+ kmap_flush_unused();
+ vm_unmap_aliases();
+
+ /*
+ * Before changing the encryption attribute, we need to flush caches.
+ */
+ if (static_cpu_has(X86_FEATURE_CLFLUSH))
+ cpa_flush_range(start, numpages, 1);
+ else
+ cpa_flush_all(1);
+
+ ret = __change_page_attr_set_clr(&cpa, 1);
+
+ /*
+ * After changing the encryption attribute, we need to flush TLBs
+ * again in case any speculative TLB caching occurred (but no need
+ * to flush caches again). We could just use cpa_flush_all(), but
+ * in case TLB flushing gets optimized in the cpa_flush_range()
+ * path use the same logic as above.
+ */
+ if (static_cpu_has(X86_FEATURE_CLFLUSH))
+ cpa_flush_range(start, numpages, 0);
+ else
+ cpa_flush_all(0);
+
+ return ret;
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+ return __set_memory_enc_dec(addr, numpages, true);
+}
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+ return __set_memory_enc_dec(addr, numpages, false);
+}
+
int set_pages_uc(struct page *page, int numpages)
{
unsigned long addr = (unsigned long)page_address(page);
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48 bits. SWIOTLB will be
initialized to create decrypted bounce buffers for use by these devices.
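For example, if the encryption mask were bit 47 (an illustrative value only,
with a hypothetical buffer address):

    phys_addr_t paddr = 0x12345000;                  /* hypothetical buffer */
    dma_addr_t dma_addr = paddr | (1ULL << 47);      /* what __sme_set() produces */

    /* dma_addr no longer fits in a 32-bit or 40-bit DMA mask, so such
     * devices are routed through the decrypted SWIOTLB bounce buffers. */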
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 5 ++-
arch/x86/include/asm/mem_encrypt.h | 5 +++
arch/x86/kernel/pci-dma.c | 11 +++++--
arch/x86/kernel/pci-nommu.c | 2 +
arch/x86/kernel/pci-swiotlb.c | 15 ++++++++--
arch/x86/mm/mem_encrypt.c | 22 ++++++++++++++
include/linux/swiotlb.h | 1 +
init/main.c | 13 ++++++++
lib/swiotlb.c | 56 +++++++++++++++++++++++++++++++-----
9 files changed, 113 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 08a0838..d75430a 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
#include <asm/io.h>
#include <asm/swiotlb.h>
#include <linux/dma-contiguous.h>
+#include <asm/mem_encrypt.h>
#ifdef CONFIG_ISA
# define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -62,12 +63,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
{
- return paddr;
+ return __sme_set(paddr);
}
static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
{
- return daddr;
+ return __sme_clr(daddr);
}
#endif /* CONFIG_X86_DMA_REMAP */
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 61a7049..f1215a4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -31,6 +31,11 @@ void __init sme_early_decrypt(resource_size_t paddr,
void __init sme_early_init(void);
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void);
+
+void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);
+
#else /* !CONFIG_AMD_MEM_ENCRYPT */
#define sme_me_mask 0UL
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 3a216ec..72d96d4 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -93,9 +93,12 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
if (gfpflags_allow_blocking(flag)) {
page = dma_alloc_from_contiguous(dev, count, get_order(size),
flag);
- if (page && page_to_phys(page) + size > dma_mask) {
- dma_release_from_contiguous(dev, page, count);
- page = NULL;
+ if (page) {
+ addr = phys_to_dma(dev, page_to_phys(page));
+ if (addr + size > dma_mask) {
+ dma_release_from_contiguous(dev, page, count);
+ page = NULL;
+ }
}
}
/* fallback */
@@ -104,7 +107,7 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
if (!page)
return NULL;
- addr = page_to_phys(page);
+ addr = phys_to_dma(dev, page_to_phys(page));
if (addr + size > dma_mask) {
__free_pages(page, get_order(size));
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index a88952e..98b576a 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- dma_addr_t bus = page_to_phys(page) + offset;
+ dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 1e23577..cc1e106 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@
#include <asm/dma.h>
#include <asm/xen/swiotlb-xen.h>
#include <asm/iommu_table.h>
+#include <asm/mem_encrypt.h>
+
int swiotlb __read_mostly;
void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -79,8 +81,8 @@ int __init pci_swiotlb_detect_override(void)
pci_swiotlb_late_init);
/*
- * if 4GB or more detected (and iommu=off not set) return 1
- * and set swiotlb to 1.
+ * If 4GB or more detected (and iommu=off not set) or if SME is active
+ * then set swiotlb to 1 and return 1.
*/
int __init pci_swiotlb_detect_4gb(void)
{
@@ -89,6 +91,15 @@ int __init pci_swiotlb_detect_4gb(void)
if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
swiotlb = 1;
#endif
+
+ /*
+ * If SME is active then swiotlb will be set to 1 so that bounce
+ * buffers are allocated and used for devices that do not support
+ * the addressing range required for the encryption mask.
+ */
+ if (sme_active())
+ swiotlb = 1;
+
return swiotlb;
}
IOMMU_INIT(pci_swiotlb_detect_4gb,
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 2321f05..5d7c51d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -16,11 +16,14 @@
#ifdef CONFIG_AMD_MEM_ENCRYPT
#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
#include <asm/setup.h>
#include <asm/bootparam.h>
+#include <asm/set_memory.h>
/*
* Since SME related variables are set early in the boot process they must
@@ -194,6 +197,25 @@ void __init sme_early_init(void)
protection_map[i] = pgprot_encrypted(protection_map[i]);
}
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+ if (!sme_me_mask)
+ return;
+
+ /* Call into SWIOTLB to update the SWIOTLB DMA buffers */
+ swiotlb_update_mem_attributes();
+}
+
+void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
+{
+ WARN(PAGE_ALIGN(size) != size,
+ "size is not page-aligned (%#lx)\n", size);
+
+ /* Make the SWIOTLB buffer area decrypted */
+ set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
+}
+
void __init sme_encrypt_kernel(void)
{
}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4ee479f..15e7160 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -35,6 +35,7 @@ enum swiotlb_force {
extern unsigned long swiotlb_nr_tbl(void);
unsigned long swiotlb_size_or_default(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
+extern void __init swiotlb_update_mem_attributes(void);
/*
* Enumeration for sync targets
diff --git a/init/main.c b/init/main.c
index df58a41..7125b5f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -488,6 +488,10 @@ void __init __weak thread_stack_cache_init(void)
}
#endif
+void __init __weak mem_encrypt_init(void)
+{
+}
+
/*
* Set up kernel memory allocators
*/
@@ -640,6 +644,15 @@ asmlinkage __visible void __init start_kernel(void)
*/
locking_selftest();
+ /*
+ * This needs to be called before any devices perform DMA
+ * operations that might use the SWIOTLB bounce buffers.
+ * This call will mark the bounce buffers as decrypted so
+ * that their usage will not cause "plain-text" data to be
+ * decrypted when accessed.
+ */
+ mem_encrypt_init();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index a8d74a7..74d6557 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -30,6 +30,7 @@
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/scatterlist.h>
+#include <linux/mem_encrypt.h>
#include <asm/io.h>
#include <asm/dma.h>
@@ -155,6 +156,17 @@ unsigned long swiotlb_size_or_default(void)
return size ? size : (IO_TLB_DEFAULT_SIZE);
}
+void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
+{
+}
+
+/* For swiotlb, clear memory encryption mask from dma addresses */
+static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
+ phys_addr_t address)
+{
+ return __sme_clr(phys_to_dma(hwdev, address));
+}
+
/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
volatile void *address)
@@ -183,6 +195,31 @@ void swiotlb_print_info(void)
bytes >> 20, vstart, vend - 1);
}
+/*
+ * Early SWIOTLB allocation may be too early to allow an architecture to
+ * perform the desired operations. This function allows the architecture to
+ * call SWIOTLB when the operations are possible. It needs to be called
+ * before the SWIOTLB memory is used.
+ */
+void __init swiotlb_update_mem_attributes(void)
+{
+ void *vaddr;
+ unsigned long bytes;
+
+ if (no_iotlb_memory || late_alloc)
+ return;
+
+ vaddr = phys_to_virt(io_tlb_start);
+ bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_mem_attributes(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+
+ vaddr = phys_to_virt(io_tlb_overflow_buffer);
+ bytes = PAGE_ALIGN(io_tlb_overflow);
+ swiotlb_set_mem_attributes(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+}
+
int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
{
void *v_overflow_buffer;
@@ -320,6 +357,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
io_tlb_start = virt_to_phys(tlb);
io_tlb_end = io_tlb_start + bytes;
+ swiotlb_set_mem_attributes(tlb, bytes);
memset(tlb, 0, bytes);
/*
@@ -330,6 +368,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (!v_overflow_buffer)
goto cleanup2;
+ swiotlb_set_mem_attributes(v_overflow_buffer, io_tlb_overflow);
+ memset(v_overflow_buffer, 0, io_tlb_overflow);
io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);
/*
@@ -581,7 +621,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
return SWIOTLB_MAP_ERROR;
}
- start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+ start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size,
dir, attrs);
}
@@ -702,7 +742,7 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
goto err_warn;
ret = phys_to_virt(paddr);
- dev_addr = phys_to_dma(hwdev, paddr);
+ dev_addr = swiotlb_phys_to_dma(hwdev, paddr);
/* Confirm address can be DMA'd by device */
if (dev_addr + size - 1 > dma_mask) {
@@ -812,10 +852,10 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
map = map_single(dev, phys, size, dir, attrs);
if (map == SWIOTLB_MAP_ERROR) {
swiotlb_full(dev, size, dir, 1);
- return phys_to_dma(dev, io_tlb_overflow_buffer);
+ return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
}
- dev_addr = phys_to_dma(dev, map);
+ dev_addr = swiotlb_phys_to_dma(dev, map);
/* Ensure that the address returned is DMA'ble */
if (dma_capable(dev, dev_addr, size))
@@ -824,7 +864,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
attrs |= DMA_ATTR_SKIP_CPU_SYNC;
swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
- return phys_to_dma(dev, io_tlb_overflow_buffer);
+ return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
}
EXPORT_SYMBOL_GPL(swiotlb_map_page);
@@ -958,7 +998,7 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
sg_dma_len(sgl) = 0;
return 0;
}
- sg->dma_address = phys_to_dma(hwdev, map);
+ sg->dma_address = swiotlb_phys_to_dma(hwdev, map);
} else
sg->dma_address = dev_addr;
sg_dma_len(sg) = sg->length;
@@ -1026,7 +1066,7 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
int
swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
{
- return (dma_addr == phys_to_dma(hwdev, io_tlb_overflow_buffer));
+ return (dma_addr == swiotlb_phys_to_dma(hwdev, io_tlb_overflow_buffer));
}
EXPORT_SYMBOL(swiotlb_dma_mapping_error);
@@ -1039,6 +1079,6 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
int
swiotlb_dma_supported(struct device *hwdev, u64 mask)
{
- return phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
+ return swiotlb_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
}
EXPORT_SYMBOL(swiotlb_dma_supported);
Add warnings to let the user know when bounce buffers are being used for
DMA while SME is active. Since the bounce buffers are not in encrypted
memory, these notifications allow the user to determine whether any action
is needed.
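For instance, a driver setting a small DMA mask will now trigger the warning
(a sketch; pdev is a hypothetical PCI device):

    if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
        return -EIO;
    /* With SME active this logs:
     * "SME is active, device will require DMA bounce buffers" */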
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 8 ++++++++
include/asm-generic/mem_encrypt.h | 5 +++++
include/linux/dma-mapping.h | 9 +++++++++
lib/swiotlb.c | 3 +++
4 files changed, 25 insertions(+)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index f1215a4..c7a2525 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -69,6 +69,14 @@ static inline bool sme_active(void)
return !!sme_me_mask;
}
+static inline u64 sme_dma_mask(void)
+{
+ if (!sme_me_mask)
+ return 0ULL;
+
+ return ((u64)sme_me_mask << 1) - 1;
+}
+
/*
* The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
* writing to or comparing values from the cr3 register. Having the
diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
index b55c3f9..fb02ff0 100644
--- a/include/asm-generic/mem_encrypt.h
+++ b/include/asm-generic/mem_encrypt.h
@@ -22,6 +22,11 @@ static inline bool sme_active(void)
return false;
}
+static inline u64 sme_dma_mask(void)
+{
+ return 0ULL;
+}
+
/*
* The __sme_set() and __sme_clr() macros are useful for adding or removing
* the encryption mask from a value (e.g. when dealing with pagetable
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4f3eece..e2c5fda 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -10,6 +10,7 @@
#include <linux/scatterlist.h>
#include <linux/kmemcheck.h>
#include <linux/bug.h>
+#include <linux/mem_encrypt.h>
/**
* List of possible attributes associated with a DMA mapping. The semantics
@@ -577,6 +578,10 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
+
+ if (sme_active() && (mask < sme_dma_mask()))
+ dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
+
*dev->dma_mask = mask;
return 0;
}
@@ -596,6 +601,10 @@ static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
{
if (!dma_supported(dev, mask))
return -EIO;
+
+ if (sme_active() && (mask < sme_dma_mask()))
+ dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
+
dev->coherent_dma_mask = mask;
return 0;
}
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 74d6557..f78906a 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -509,6 +509,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+ if (sme_active())
+ pr_warn_once("SME is active and system is using DMA bounce buffers\n");
+
mask = dma_get_seg_boundary(hwdev);
tbl_dma_addr &= mask;
Since video memory needs to be accessed decrypted, be sure that the
memory encryption mask is not set for the video ranges.
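The effect on the legacy VGA window, in sketch form (the address and size
shown are only illustrative):

    /* With CONFIG_AMD_MEM_ENCRYPT, the updated macro clears the
     * encryption attribute before handing back the mapping */
    unsigned long vram = VGA_MAP_MEM(0xa0000, 0x10000);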
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/vga.h | 14 +++++++++++++-
arch/x86/mm/pageattr.c | 2 ++
drivers/gpu/drm/drm_gem.c | 2 ++
drivers/gpu/drm/drm_vm.c | 4 ++++
drivers/gpu/drm/ttm/ttm_bo_vm.c | 7 +++++--
drivers/gpu/drm/udl/udl_fb.c | 4 ++++
drivers/video/fbdev/core/fbmem.c | 12 ++++++++++++
7 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
index c4b9dc2..9f42bee 100644
--- a/arch/x86/include/asm/vga.h
+++ b/arch/x86/include/asm/vga.h
@@ -7,12 +7,24 @@
#ifndef _ASM_X86_VGA_H
#define _ASM_X86_VGA_H
+#include <asm/set_memory.h>
+
/*
* On the PC, we can just recalculate addresses and then
* access the videoram directly without any black magic.
+ * To support memory encryption however, we need to access
+ * the videoram as decrypted memory.
*/
-#define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
+#define VGA_MAP_MEM(x, s) \
+({ \
+ unsigned long start = (unsigned long)phys_to_virt(x); \
+ \
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) \
+ set_memory_decrypted(start, (s) >> PAGE_SHIFT); \
+ \
+ start; \
+})
#define vga_readb(x) (*(x))
#define vga_writeb(x, y) (*(y) = (x))
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d9e09fb..13fc5db 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1825,11 +1825,13 @@ int set_memory_encrypted(unsigned long addr, int numpages)
{
return __set_memory_enc_dec(addr, numpages, true);
}
+EXPORT_SYMBOL_GPL(set_memory_encrypted);
int set_memory_decrypted(unsigned long addr, int numpages)
{
return __set_memory_enc_dec(addr, numpages, false);
}
+EXPORT_SYMBOL_GPL(set_memory_decrypted);
int set_pages_uc(struct page *page, int numpages)
{
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index b1e28c9..019f48c 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -36,6 +36,7 @@
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
#include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>
#include <drm/drmP.h>
#include <drm/drm_vma_manager.h>
#include <drm/drm_gem.h>
@@ -928,6 +929,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
vma->vm_ops = dev->driver->gem_vm_ops;
vma->vm_private_data = obj;
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+ vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
/* Take a ref for this mapping of the object, so that the fault
* handler can dereference the mmap offset's pointer to the object.
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index 1170b32..ed4bcbf 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -40,6 +40,7 @@
#include <linux/efi.h>
#include <linux/slab.h>
#endif
+#include <linux/mem_encrypt.h>
#include <asm/pgtable.h>
#include "drm_internal.h"
#include "drm_legacy.h"
@@ -58,6 +59,9 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
{
pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
+ /* We don't want graphics memory to be mapped encrypted */
+ tmp = pgprot_decrypted(tmp);
+
#if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
tmp = pgprot_noncached(tmp);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 9f53df9..622dab6 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -39,6 +39,7 @@
#include <linux/rbtree.h>
#include <linux/module.h>
#include <linux/uaccess.h>
+#include <linux/mem_encrypt.h>
#define TTM_BO_VM_NUM_PREFAULT 16
@@ -230,9 +231,11 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
* first page.
*/
for (i = 0; i < TTM_BO_VM_NUM_PREFAULT; ++i) {
- if (bo->mem.bus.is_iomem)
+ if (bo->mem.bus.is_iomem) {
+ /* Iomem should not be marked encrypted */
+ cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
pfn = bdev->driver->io_mem_pfn(bo, page_offset);
- else {
+ } else {
page = ttm->pages[page_offset];
if (unlikely(!page && i == 0)) {
retval = VM_FAULT_OOM;
diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c
index 4a65003..92e1690 100644
--- a/drivers/gpu/drm/udl/udl_fb.c
+++ b/drivers/gpu/drm/udl/udl_fb.c
@@ -14,6 +14,7 @@
#include <linux/slab.h>
#include <linux/fb.h>
#include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>
#include <drm/drmP.h>
#include <drm/drm_crtc.h>
@@ -169,6 +170,9 @@ static int udl_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
pr_notice("mmap() framebuffer addr:%lu size:%lu\n",
pos, size);
+ /* We don't want the framebuffer to be mapped encrypted */
+ vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
+
while (size > 0) {
page = vmalloc_to_pfn((void *)pos);
if (remap_pfn_range(vma, start, page, PAGE_SIZE, PAGE_SHARED))
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 069fe79..b5e7c33 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -32,6 +32,7 @@
#include <linux/device.h>
#include <linux/efi.h>
#include <linux/fb.h>
+#include <linux/mem_encrypt.h>
#include <asm/fb.h>
@@ -1405,6 +1406,12 @@ static long fb_compat_ioctl(struct file *file, unsigned int cmd,
mutex_lock(&info->mm_lock);
if (fb->fb_mmap) {
int res;
+
+ /*
+ * The framebuffer needs to be accessed decrypted, be sure
+ * SME protection is removed ahead of the call
+ */
+ vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
res = fb->fb_mmap(info, vma);
mutex_unlock(&info->mm_lock);
return res;
@@ -1430,6 +1437,11 @@ static long fb_compat_ioctl(struct file *file, unsigned int cmd,
mutex_unlock(&info->mm_lock);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+ /*
+ * The framebuffer needs to be accessed decrypted, be sure
+ * SME protection is removed
+ */
+ vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
fb_pgprotect(file, vma, start);
return vm_iomap_memory(vma, start, len);
Update the KVM support to work with SME. The VMCB has a number of fields
where physical addresses are used and these addresses must contain the
memory encryption mask in order to properly access the encrypted memory.
Also, use the memory encryption mask when creating and using the nested
page tables.
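For reference, a minimal sketch of the pattern applied throughout this
patch (illustration only; __sme_set()/__sme_clr() are the helpers
introduced earlier in the series that add or strip sme_me_mask on a
physical address):

    /* Physical addresses placed in the VMCB carry the encryption mask */
    svm->vmcb_pa = __sme_set(page_to_pfn(page) << PAGE_SHIFT);

    /* The mask is stripped again before converting back to a pfn */
    __free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));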
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 12 ++++++++----
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/svm.c | 35 ++++++++++++++++++-----------------
arch/x86/kvm/vmx.c | 3 ++-
arch/x86/kvm/x86.c | 3 ++-
6 files changed, 32 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605e..6d1267f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1069,7 +1069,7 @@ struct kvm_arch_async_pf {
void kvm_mmu_uninit_vm(struct kvm *kvm);
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
- u64 acc_track_mask);
+ u64 acc_track_mask, u64 me_mask);
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5d3376f..892b7bd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -107,7 +107,7 @@ enum {
(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define PT64_BASE_ADDR_MASK __sme_clr((((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
#define PT64_DIR_BASE_ADDR_MASK \
(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
#define PT64_LVL_ADDR_MASK(level) \
@@ -125,7 +125,7 @@ enum {
* PT32_LEVEL_BITS))) - 1))
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
- | shadow_x_mask | shadow_nx_mask)
+ | shadow_x_mask | shadow_nx_mask | shadow_me_mask)
#define ACC_EXEC_MASK 1
#define ACC_WRITE_MASK PT_WRITABLE_MASK
@@ -184,6 +184,7 @@ struct kvm_shadow_walk_iterator {
static u64 __read_mostly shadow_dirty_mask;
static u64 __read_mostly shadow_mmio_mask;
static u64 __read_mostly shadow_present_mask;
+static u64 __read_mostly shadow_me_mask;
/*
* The mask/value to distinguish a PTE that has been marked not-present for
@@ -317,7 +318,7 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
- u64 acc_track_mask)
+ u64 acc_track_mask, u64 me_mask)
{
if (acc_track_mask != 0)
acc_track_mask |= SPTE_SPECIAL_MASK;
@@ -330,6 +331,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
shadow_present_mask = p_mask;
shadow_acc_track_mask = acc_track_mask;
WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
+ shadow_me_mask = me_mask;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
@@ -2398,7 +2400,8 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
- shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+ shadow_user_mask | shadow_x_mask | shadow_accessed_mask |
+ shadow_me_mask;
mmu_spte_set(sptep, spte);
@@ -2700,6 +2703,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
pte_access &= ~ACC_WRITE_MASK;
spte |= (u64)pfn << PAGE_SHIFT;
+ spte |= shadow_me_mask;
if (pte_access & ACC_WRITE_MASK) {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2797580..9694ff9 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,7 @@
static inline u64 rsvd_bits(int s, int e)
{
- return ((1ULL << (e - s + 1)) - 1) << s;
+ return __sme_clr(((1ULL << (e - s + 1)) - 1) << s);
}
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba9891a..d2e9fca 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1138,9 +1138,9 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
{
struct vmcb *vmcb = svm->vmcb;
struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
- phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
- phys_addr_t lpa = page_to_phys(vm_data->avic_logical_id_table_page);
- phys_addr_t ppa = page_to_phys(vm_data->avic_physical_id_table_page);
+ phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
+ phys_addr_t lpa = __sme_set(page_to_phys(vm_data->avic_logical_id_table_page));
+ phys_addr_t ppa = __sme_set(page_to_phys(vm_data->avic_physical_id_table_page));
vmcb->control.avic_backing_page = bpa & AVIC_HPA_MASK;
vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
@@ -1203,8 +1203,8 @@ static void init_vmcb(struct vcpu_svm *svm)
set_intercept(svm, INTERCEPT_MWAIT);
}
- control->iopm_base_pa = iopm_base;
- control->msrpm_base_pa = __pa(svm->msrpm);
+ control->iopm_base_pa = __sme_set(iopm_base);
+ control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
control->int_ctl = V_INTR_MASKING_MASK;
init_seg(&save->es);
@@ -1338,9 +1338,9 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
return -EINVAL;
new_entry = READ_ONCE(*entry);
- new_entry = (page_to_phys(svm->avic_backing_page) &
- AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK) |
- AVIC_PHYSICAL_ID_ENTRY_VALID_MASK;
+ new_entry = __sme_set((page_to_phys(svm->avic_backing_page) &
+ AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK) |
+ AVIC_PHYSICAL_ID_ENTRY_VALID_MASK);
WRITE_ONCE(*entry, new_entry);
svm->avic_physical_id_cache = entry;
@@ -1608,7 +1608,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->vmcb = page_address(page);
clear_page(svm->vmcb);
- svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
+ svm->vmcb_pa = __sme_set(page_to_pfn(page) << PAGE_SHIFT);
svm->asid_generation = 0;
init_vmcb(svm);
@@ -1636,7 +1636,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- __free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
+ __free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
__free_page(virt_to_page(svm->nested.hsave));
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
@@ -2303,7 +2303,7 @@ static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
u64 pdpte;
int ret;
- ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte,
+ ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(__sme_clr(cr3)), &pdpte,
offset_in_page(cr3) + index * 8, 8);
if (ret)
return 0;
@@ -2315,7 +2315,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
{
struct vcpu_svm *svm = to_svm(vcpu);
- svm->vmcb->control.nested_cr3 = root;
+ svm->vmcb->control.nested_cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_NPT);
svm_flush_tlb(vcpu);
}
@@ -2803,7 +2803,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
svm->nested.msrpm[p] = svm->msrpm[p] | value;
}
- svm->vmcb->control.msrpm_base_pa = __pa(svm->nested.msrpm);
+ svm->vmcb->control.msrpm_base_pa = __sme_set(__pa(svm->nested.msrpm));
return true;
}
@@ -4435,7 +4435,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
pr_debug("SVM: %s: use GA mode for irq %u\n", __func__,
irq.vector);
*svm = to_svm(vcpu);
- vcpu_info->pi_desc_addr = page_to_phys((*svm)->avic_backing_page);
+ vcpu_info->pi_desc_addr = __sme_set(page_to_phys((*svm)->avic_backing_page));
vcpu_info->vector = irq.vector;
return 0;
@@ -4486,7 +4486,8 @@ static int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
struct amd_iommu_pi_data pi;
/* Try to enable guest_mode in IRTE */
- pi.base = page_to_phys(svm->avic_backing_page) & AVIC_HPA_MASK;
+ pi.base = __sme_set(page_to_phys(svm->avic_backing_page) &
+ AVIC_HPA_MASK);
pi.ga_tag = AVIC_GATAG(kvm->arch.avic_vm_id,
svm->vcpu.vcpu_id);
pi.is_guest_mode = true;
@@ -4911,7 +4912,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
{
struct vcpu_svm *svm = to_svm(vcpu);
- svm->vmcb->save.cr3 = root;
+ svm->vmcb->save.cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_CR);
svm_flush_tlb(vcpu);
}
@@ -4920,7 +4921,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
{
struct vcpu_svm *svm = to_svm(vcpu);
- svm->vmcb->control.nested_cr3 = root;
+ svm->vmcb->control.nested_cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_NPT);
/* Also sync guest cr3 here in case we live migrate */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b4b5d6..dd3dd26 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6443,7 +6443,8 @@ void vmx_enable_tdp(void)
enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull,
0ull, VMX_EPT_EXECUTABLE_MASK,
cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
- enable_ept_ad_bits ? 0ull : VMX_EPT_RWX_MASK);
+ enable_ept_ad_bits ? 0ull : VMX_EPT_RWX_MASK,
+ 0ull);
ept_set_mmio_spte_mask();
kvm_enable_tdp();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2cd099..d232b98 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -66,6 +66,7 @@
#include <asm/pvclock.h>
#include <asm/div64.h>
#include <asm/irq_remapping.h>
+#include <asm/mem_encrypt.h>
#define CREATE_TRACE_POINTS
#include "trace.h"
@@ -6095,7 +6096,7 @@ int kvm_arch_init(void *opaque)
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0,
- PT_PRESENT_MASK, 0);
+ PT_PRESENT_MASK, 0, sme_me_mask);
kvm_timer_init();
perf_register_guest_info_callbacks(&kvm_guest_cbs);
Provide support so that kexec can be used to boot a kernel when SME is
enabled.
Support is needed to allocate pages for kexec without encryption. This
is needed in order to be able to boot into the new kernel in the same
manner as the current kernel was originally booted.
Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice-versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.
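A minimal sketch of the resulting control-page lifetime under SME
(illustration only; the hook names are the ones added below):

    /* kimage_alloc_pages() -> arch_kexec_post_alloc_pages() */
    set_memory_decrypted((unsigned long)vaddr, pages);
    if (gfp & __GFP_ZERO)
        memset(vaddr, 0, pages * PAGE_SIZE);  /* re-zero via the new mapping */

    /* ... kexec boots the new kernel, which reads the pages unencrypted ... */

    /* kimage_free_pages() -> arch_kexec_pre_free_pages() */
    set_memory_encrypted((unsigned long)vaddr, pages);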
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/init.h | 1 +
arch/x86/include/asm/kexec.h | 8 ++++++++
arch/x86/include/asm/pgtable_types.h | 1 +
arch/x86/kernel/machine_kexec_64.c | 35 +++++++++++++++++++++++++++++++++-
arch/x86/kernel/process.c | 17 +++++++++++++++--
arch/x86/mm/ident_map.c | 12 ++++++++----
include/linux/kexec.h | 14 ++++++++++++++
kernel/kexec_core.c | 6 ++++++
8 files changed, 87 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 474eb8c..05c4aa0 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -7,6 +7,7 @@ struct x86_mapping_info {
unsigned long page_flag; /* page flag for PMD or PUD entry */
unsigned long offset; /* ident mapping offset */
bool direct_gbpages; /* PUD level 1GB page support */
+ unsigned long kernpg_flag; /* kernel pagetable flag override */
};
int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 70ef205..e8183ac 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -207,6 +207,14 @@ struct kexec_entry64_regs {
uint64_t r15;
uint64_t rip;
};
+
+extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
+ gfp_t gfp);
+#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
+
+extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
+#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
+
#endif
typedef void crash_vmclear_fn(void);
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index ce8cb1c..0f326f4 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -213,6 +213,7 @@ enum page_cache_mode {
#define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC)
#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC_NOENC __pgprot(__PAGE_KERNEL_EXEC)
#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 6f5ca4e..35e069a 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -87,7 +87,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
- set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+ set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
return 0;
err:
free_transition_pgtable(image);
@@ -115,6 +115,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
.alloc_pgt_page = alloc_pgt_page,
.context = image,
.page_flag = __PAGE_KERNEL_LARGE_EXEC,
+ .kernpg_flag = _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
pgd_t *level4p;
@@ -602,3 +603,35 @@ void arch_kexec_unprotect_crashkres(void)
{
kexec_mark_crashkres(false);
}
+
+int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
+{
+ int ret;
+
+ if (sme_active()) {
+ /*
+ * If SME is active we need to be sure that kexec pages are
+ * not encrypted because when we boot to the new kernel the
+ * pages won't be accessed encrypted (initially).
+ */
+ ret = set_memory_decrypted((unsigned long)vaddr, pages);
+ if (ret)
+ return ret;
+
+ if (gfp & __GFP_ZERO)
+ memset(vaddr, 0, pages * PAGE_SIZE);
+ }
+
+ return 0;
+}
+
+void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
+{
+ if (sme_active()) {
+ /*
+ * If SME is active we need to reset the pages back to being
+ * an encrypted mapping before freeing them.
+ */
+ set_memory_encrypted((unsigned long)vaddr, pages);
+ }
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0bb8842..fdad0fb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -355,6 +355,7 @@ bool xen_set_default_idle(void)
return ret;
}
#endif
+
void stop_this_cpu(void *dummy)
{
local_irq_disable();
@@ -365,8 +366,20 @@ void stop_this_cpu(void *dummy)
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
- for (;;)
- halt();
+ for (;;) {
+ /*
+ * Use wbinvd followed by hlt to stop the processor. This
+ * provides support for kexec on a processor that supports
+ * SME. With kexec, going from SME inactive to SME active
+ * requires clearing cache entries so that addresses without
+ * the encryption bit set don't corrupt the same physical
+ * address that has the encryption bit set when caches are
+ * flushed. To achieve this a wbinvd is performed followed by
+ * a hlt. Even if the processor is not in the kexec/SME
+ * scenario this only adds a wbinvd to a halting processor.
+ */
+ asm volatile("wbinvd; hlt" : : : "memory");
+ }
}
/*
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index adab159..31cea98 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -51,7 +51,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
if (!pmd)
return -ENOMEM;
ident_pmd_init(info, pmd, addr, next);
- set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+ set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
}
return 0;
@@ -79,7 +79,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
if (!pud)
return -ENOMEM;
ident_pud_init(info, pud, addr, next);
- set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
+ set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
}
return 0;
@@ -93,6 +93,10 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
unsigned long next;
int result;
+ /* Set the default pagetable flags if not supplied */
+ if (!info->kernpg_flag)
+ info->kernpg_flag = _KERNPG_TABLE;
+
for (; addr < end; addr = next) {
pgd_t *pgd = pgd_page + pgd_index(addr);
p4d_t *p4d;
@@ -116,14 +120,14 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
if (result)
return result;
if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
- set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
+ set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag));
} else {
/*
* With p4d folded, pgd is equal to p4d.
* The pgd entry has to point to the pud page table in this case.
*/
pud_t *pud = pud_offset(p4d, 0);
- set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+ set_pgd(pgd, __pgd(__pa(pud) | info->kernpg_flag));
}
}
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index c9481eb..5d17fd6 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -334,6 +334,20 @@ static inline void *boot_phys_to_virt(unsigned long entry)
return phys_to_virt(boot_phys_to_phys(entry));
}
+#ifndef arch_kexec_post_alloc_pages
+static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
+ gfp_t gfp)
+{
+ return 0;
+}
+#endif
+
+#ifndef arch_kexec_pre_free_pages
+static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
+{
+}
+#endif
+
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
struct task_struct;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ae1a3ba..ecab7b3 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -309,6 +309,9 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
count = 1 << order;
for (i = 0; i < count; i++)
SetPageReserved(pages + i);
+
+ arch_kexec_post_alloc_pages(page_address(pages), count,
+ gfp_mask);
}
return pages;
@@ -320,6 +323,9 @@ static void kimage_free_pages(struct page *page)
order = page_private(page);
count = 1 << order;
+
+ arch_kexec_pre_free_pages(page_address(page), count);
+
for (i = 0; i < count; i++)
ClearPageReserved(page + i);
__free_pages(page, order);
When accessing memory using /dev/mem (or /dev/kmem) use the proper
encryption attributes when mapping the memory.
To ensure the proper attributes are applied when reading or writing
/dev/mem, update the xlate_dev_mem_ptr() function to use memremap(),
which will essentially perform the same steps: apply __va() for RAM
or fall back to ioremap() otherwise.
To ensure the proper attributes are applied when mmapping /dev/mem,
update phys_mem_access_prot() to call phys_mem_access_encrypted(), a
new function which checks whether the memory should be mapped
encrypted. If it is not to be mapped encrypted, then the VMA
protection value is updated to remove the encryption bit.
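Sketched out (illustration only), the two paths look roughly like this
after the change:

    /* read/write path: xlate_dev_mem_ptr() */
    vaddr = memremap(phys & PAGE_MASK, PAGE_SIZE, MEMREMAP_WB);

    /* mmap path: phys_mem_access_prot() */
    if (!phys_mem_access_encrypted(pfn << PAGE_SHIFT, size))
        vma_prot = pgprot_decrypted(vma_prot);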
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/io.h | 3 +++
arch/x86/mm/ioremap.c | 18 +++++++++---------
arch/x86/mm/pat.c | 3 +++
3 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 9eac5a5..db163d7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -385,4 +385,7 @@ extern bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size,
unsigned long flags);
#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
+extern bool phys_mem_access_encrypted(unsigned long phys_addr,
+ unsigned long size);
+
#endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 99cda55..56dd5b2 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -404,12 +404,10 @@ void *xlate_dev_mem_ptr(phys_addr_t phys)
unsigned long offset = phys & ~PAGE_MASK;
void *vaddr;
- /* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
- if (page_is_ram(start >> PAGE_SHIFT))
- return __va(phys);
+ /* memremap() maps if RAM, otherwise falls back to ioremap() */
+ vaddr = memremap(start, PAGE_SIZE, MEMREMAP_WB);
- vaddr = ioremap_cache(start, PAGE_SIZE);
- /* Only add the offset on success and return NULL if the ioremap() failed: */
+ /* Only add the offset on success and return NULL if memremap() failed */
if (vaddr)
vaddr += offset;
@@ -418,10 +416,7 @@ void *xlate_dev_mem_ptr(phys_addr_t phys)
void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
{
- if (page_is_ram(phys >> PAGE_SHIFT))
- return;
-
- iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
+ memunmap((void *)((unsigned long)addr & PAGE_MASK));
}
/*
@@ -630,6 +625,11 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
return prot;
}
+bool phys_mem_access_encrypted(unsigned long phys_addr, unsigned long size)
+{
+ return arch_memremap_can_ram_remap(phys_addr, size, 0);
+}
+
#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
/* Remap memory with encryption */
void __init *early_memremap_encrypted(resource_size_t phys_addr,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 6753d9c..b970c95 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -748,6 +748,9 @@ void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size)
pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
{
+ if (!phys_mem_access_encrypted(pfn << PAGE_SHIFT, size))
+ vma_prot = pgprot_decrypted(vma_prot);
+
return vma_prot;
}
Add support to check if SME has been enabled and if memory encryption
should be activated (the command line option is checked against the
configured default state). If memory encryption is to be
activated, then the encryption mask is set and the kernel is encrypted
"in place."
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kernel/head_64.S | 1
arch/x86/mm/mem_encrypt.c | 93 +++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 89 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 1fe944b..660bf8e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -97,6 +97,7 @@ startup_64:
* Save the returned mask in %r12 for later use.
*/
push %rsi
+ movq %rsi, %rdi
call sme_enable
pop %rsi
movq %rax, %r12
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 6129477..d624058 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -12,6 +12,7 @@
#include <linux/linkage.h>
#include <linux/init.h>
+#include <asm/bootparam.h>
#ifdef CONFIG_AMD_MEM_ENCRYPT
@@ -22,10 +23,23 @@
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
#include <asm/setup.h>
-#include <asm/bootparam.h>
#include <asm/set_memory.h>
#include <asm/cacheflush.h>
#include <asm/sections.h>
+#include <asm/mem_encrypt.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <asm/cmdline.h>
+
+static char sme_cmdline_arg[] __initdata = "mem_encrypt";
+static char sme_cmdline_on[] __initdata = "on";
+static char sme_cmdline_off[] __initdata = "off";
+
+/*
+ * Some SME functions run very early causing issues with the stack-protector
+ * support. Provide a way to turn off this support on a per-function basis.
+ */
+#define SME_NOSTACKP __attribute__((__optimize__("no-stack-protector")))
/*
* Since SME related variables are set early in the boot process they must
@@ -237,6 +251,8 @@ void __init mem_encrypt_init(void)
/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
swiotlb_update_mem_attributes();
+
+ pr_info("AMD Secure Memory Encryption (SME) active\n");
}
void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
@@ -564,8 +580,75 @@ void __init sme_encrypt_kernel(void)
native_write_cr3(native_read_cr3());
}
-unsigned long __init sme_enable(void)
+unsigned long __init SME_NOSTACKP sme_enable(struct boot_params *bp)
{
+ const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+ unsigned int eax, ebx, ecx, edx;
+ bool active_by_default;
+ unsigned long me_mask;
+ char buffer[16];
+ u64 msr;
+
+ /* Check for the SME support leaf */
+ eax = 0x80000000;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if (eax < 0x8000001f)
+ goto out;
+
+ /*
+ * Check for the SME feature:
+ * CPUID Fn8000_001F[EAX] - Bit 0
+ * Secure Memory Encryption support
+ * CPUID Fn8000_001F[EBX] - Bits 5:0
+ * Pagetable bit position used to indicate encryption
+ */
+ eax = 0x8000001f;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if (!(eax & 1))
+ goto out;
+
+ me_mask = 1UL << (ebx & 0x3f);
+
+ /* Check if SME is enabled */
+ msr = __rdmsr(MSR_K8_SYSCFG);
+ if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ goto out;
+
+ /*
+ * Fixups have not been applied to phys_base yet and we're running
+ * identity mapped, so we must obtain the address to the SME command
+ * line argument data using rip-relative addressing.
+ */
+ asm ("lea sme_cmdline_arg(%%rip), %0"
+ : "=r" (cmdline_arg)
+ : "p" (sme_cmdline_arg));
+ asm ("lea sme_cmdline_on(%%rip), %0"
+ : "=r" (cmdline_on)
+ : "p" (sme_cmdline_on));
+ asm ("lea sme_cmdline_off(%%rip), %0"
+ : "=r" (cmdline_off)
+ : "p" (sme_cmdline_off));
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
+ active_by_default = true;
+ else
+ active_by_default = false;
+
+ cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
+ ((u64)bp->ext_cmd_line_ptr << 32));
+
+ cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
+
+ if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
+ sme_me_mask = me_mask;
+ else if (strncmp(buffer, cmdline_off, sizeof(buffer)) == 0)
+ sme_me_mask = 0;
+ else
+ sme_me_mask = active_by_default ? me_mask : 0;
+
+out:
return sme_me_mask;
}
@@ -576,9 +659,9 @@ unsigned long sme_get_me_mask(void)
#else /* !CONFIG_AMD_MEM_ENCRYPT */
-void __init sme_encrypt_kernel(void) { }
-unsigned long __init sme_enable(void) { return 0; }
+void __init sme_encrypt_kernel(void) { }
+unsigned long __init sme_enable(struct boot_params *bp) { return 0; }
-unsigned long sme_get_me_mask(void) { return 0; }
+unsigned long sme_get_me_mask(void) { return 0; }
#endif /* CONFIG_AMD_MEM_ENCRYPT */
Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is copied into a supplied buffer and its
length (regardless of whether it fits in the supplied buffer) is
returned, with -1 indicating that the option was not found.
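A hypothetical caller (illustration only) would look something like:

    char buffer[16];
    int len;

    len = cmdline_find_option(cmdline_ptr, "mem_encrypt", buffer,
                              sizeof(buffer));
    if (len == 2 && !strcmp(buffer, "on"))
        /* "mem_encrypt=on" was specified */;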
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/cmdline.h | 2 +
arch/x86/lib/cmdline.c | 105 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 107 insertions(+)
diff --git a/arch/x86/include/asm/cmdline.h b/arch/x86/include/asm/cmdline.h
index e01f7f7..84ae170 100644
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
#define _ASM_X86_CMDLINE_H
int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+ char *buffer, int bufsize);
#endif /* _ASM_X86_CMDLINE_H */
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 5cc78bf..3261abb 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -104,7 +104,112 @@ static inline int myisspace(u8 c)
return 0; /* Buffer overrun */
}
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of whether it was
+ * truncated to fit in the buffer), or -1 on not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+ const char *option, char *buffer, int bufsize)
+{
+ char c;
+ int pos = 0, len = -1;
+ const char *opptr = NULL;
+ char *bufptr = buffer;
+ enum {
+ st_wordstart = 0, /* Start of word/after whitespace */
+ st_wordcmp, /* Comparing this word */
+ st_wordskip, /* Miscompare, skip */
+ st_bufcpy, /* Copying this to buffer */
+ } state = st_wordstart;
+
+ if (!cmdline)
+ return -1; /* No command line */
+
+ /*
+ * This 'pos' check ensures we do not overrun
+ * a non-NULL-terminated 'cmdline'
+ */
+ while (pos++ < max_cmdline_size) {
+ c = *(char *)cmdline++;
+ if (!c)
+ break;
+
+ switch (state) {
+ case st_wordstart:
+ if (myisspace(c))
+ break;
+
+ state = st_wordcmp;
+ opptr = option;
+ /* fall through */
+
+ case st_wordcmp:
+ if ((c == '=') && !*opptr) {
+ /*
+ * We matched all the way to the end of the
+ * option we were looking for, prepare to
+ * copy the argument.
+ */
+ len = 0;
+ bufptr = buffer;
+ state = st_bufcpy;
+ break;
+ } else if (c == *opptr++) {
+ /*
+ * We are currently matching, so continue
+ * to the next character on the cmdline.
+ */
+ break;
+ }
+ state = st_wordskip;
+ /* fall through */
+
+ case st_wordskip:
+ if (myisspace(c))
+ state = st_wordstart;
+ break;
+
+ case st_bufcpy:
+ if (myisspace(c)) {
+ state = st_wordstart;
+ } else {
+ /*
+ * Increment len, but don't overrun the
+ * supplied buffer and leave room for the
+ * NULL terminator.
+ */
+ if (++len < bufsize)
+ *bufptr++ = c;
+ }
+ break;
+ }
+ }
+
+ if (bufsize)
+ *bufptr = '\0';
+
+ return len;
+}
+
int cmdline_find_option_bool(const char *cmdline, const char *option)
{
return __cmdline_find_option_bool(cmdline, COMMAND_LINE_SIZE, option);
}
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+ int bufsize)
+{
+ return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
+ buffer, bufsize);
+}
Add support to encrypt the kernel in place. This is done by creating
new page mappings for the kernel - a decrypted write-protected mapping
and an encrypted mapping. The kernel is encrypted by copying it through
a temporary buffer.
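At a high level, the flow implemented by sme_encrypt_kernel() and
sme_encrypt_execute() below is roughly:

    1. Build a new pagetable that maps the kernel twice: an encrypted
       identity mapping and a decrypted, write-protected mapping at a
       different virtual offset.
    2. Place a workarea just after the kernel image containing a stack
       page, a copy of the encryption routine and a 2MB intermediate
       copy buffer.
    3. From the workarea, copy the kernel 2MB at a time through the
       intermediate buffer: decrypted mapping -> buffer -> encrypted
       mapping, re-writing each chunk to the same physical address.
    4. Remove the decrypted mappings and flush the TLB.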
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 6 +
arch/x86/mm/Makefile | 2
arch/x86/mm/mem_encrypt.c | 314 ++++++++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt_boot.S | 150 +++++++++++++++++
4 files changed, 472 insertions(+)
create mode 100644 arch/x86/mm/mem_encrypt_boot.S
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index d86e544..e0a8edc 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,12 @@
extern unsigned long sme_me_mask;
+void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
+ unsigned long decrypted_kernel_vaddr,
+ unsigned long kernel_len,
+ unsigned long encryption_wa,
+ unsigned long encryption_pgd);
+
void __init sme_early_encrypt(resource_size_t paddr,
unsigned long size);
void __init sme_early_decrypt(resource_size_t paddr,
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 88ee454..47b26ea 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 018b58a..6129477 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -24,6 +24,8 @@
#include <asm/setup.h>
#include <asm/bootparam.h>
#include <asm/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/sections.h>
/*
* Since SME related variables are set early in the boot process they must
@@ -246,8 +248,320 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
}
+static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
+ unsigned long end)
+{
+ unsigned long pgd_start, pgd_end, pgd_size;
+ pgd_t *pgd_p;
+
+ pgd_start = start & PGDIR_MASK;
+ pgd_end = end & PGDIR_MASK;
+
+ pgd_size = (((pgd_end - pgd_start) / PGDIR_SIZE) + 1);
+ pgd_size *= sizeof(pgd_t);
+
+ pgd_p = pgd_base + pgd_index(start);
+
+ memset(pgd_p, 0, pgd_size);
+}
+
+#ifndef CONFIG_X86_5LEVEL
+#define native_make_p4d(_x) (p4d_t) { .pgd = native_make_pgd(_x) }
+#endif
+
+#define PGD_FLAGS _KERNPG_TABLE_NOENC
+#define P4D_FLAGS _KERNPG_TABLE_NOENC
+#define PUD_FLAGS _KERNPG_TABLE_NOENC
+#define PMD_FLAGS (__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
+
+static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
+ unsigned long vaddr, pmdval_t pmd_val)
+{
+ pgd_t *pgd_p;
+ p4d_t *p4d_p;
+ pud_t *pud_p;
+ pmd_t *pmd_p;
+
+ pgd_p = pgd_base + pgd_index(vaddr);
+ if (native_pgd_val(*pgd_p)) {
+ if (IS_ENABLED(CONFIG_X86_5LEVEL))
+ p4d_p = (p4d_t *)(native_pgd_val(*pgd_p) & ~PTE_FLAGS_MASK);
+ else
+ pud_p = (pud_t *)(native_pgd_val(*pgd_p) & ~PTE_FLAGS_MASK);
+ } else {
+ pgd_t pgd;
+
+ if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+ p4d_p = pgtable_area;
+ memset(p4d_p, 0, sizeof(*p4d_p) * PTRS_PER_P4D);
+ pgtable_area += sizeof(*p4d_p) * PTRS_PER_P4D;
+
+ pgd = native_make_pgd((pgdval_t)p4d_p + PGD_FLAGS);
+ } else {
+ pud_p = pgtable_area;
+ memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
+ pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+
+ pgd = native_make_pgd((pgdval_t)pud_p + PGD_FLAGS);
+ }
+ native_set_pgd(pgd_p, pgd);
+ }
+
+ if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+ p4d_p += p4d_index(vaddr);
+ if (native_p4d_val(*p4d_p)) {
+ pud_p = (pud_t *)(native_p4d_val(*p4d_p) & ~PTE_FLAGS_MASK);
+ } else {
+ p4d_t p4d;
+
+ pud_p = pgtable_area;
+ memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
+ pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+
+ p4d = native_make_p4d((p4dval_t)pud_p + P4D_FLAGS);
+ native_set_p4d(p4d_p, p4d);
+ }
+ }
+
+ pud_p += pud_index(vaddr);
+ if (native_pud_val(*pud_p)) {
+ if (native_pud_val(*pud_p) & _PAGE_PSE)
+ goto out;
+
+ pmd_p = (pmd_t *)(native_pud_val(*pud_p) & ~PTE_FLAGS_MASK);
+ } else {
+ pud_t pud;
+
+ pmd_p = pgtable_area;
+ memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
+ pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;
+
+ pud = native_make_pud((pudval_t)pmd_p + PUD_FLAGS);
+ native_set_pud(pud_p, pud);
+ }
+
+ pmd_p += pmd_index(vaddr);
+ if (!native_pmd_val(*pmd_p) || !(native_pmd_val(*pmd_p) & _PAGE_PSE))
+ native_set_pmd(pmd_p, native_make_pmd(pmd_val));
+
+out:
+ return pgtable_area;
+}
+
+static unsigned long __init sme_pgtable_calc(unsigned long len)
+{
+ unsigned long p4d_size, pud_size, pmd_size;
+ unsigned long total;
+
+ /*
+ * Perform a relatively simplistic calculation of the pagetable
+ * entries that are needed. The mappings will be covered by 2MB
+ * PMD entries so we can conservatively calculate the required
+ * number of P4D, PUD and PMD structures needed to perform the
+ * mappings. Incrementing the count for each covers the case where
+ * the addresses cross entries.
+ */
+ if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+ p4d_size = (ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE) + 1;
+ p4d_size *= sizeof(p4d_t) * PTRS_PER_P4D;
+ pud_size = (ALIGN(len, P4D_SIZE) / P4D_SIZE) + 1;
+ pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
+ } else {
+ p4d_size = 0;
+ pud_size = (ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE) + 1;
+ pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
+ }
+ pmd_size = (ALIGN(len, PUD_SIZE) / PUD_SIZE) + 1;
+ pmd_size *= sizeof(pmd_t) * PTRS_PER_PMD;
+
+ total = p4d_size + pud_size + pmd_size;
+
+ /*
+ * Now calculate the added pagetable structures needed to populate
+ * the new pagetables.
+ */
+ if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
+ p4d_size = ALIGN(total, PGDIR_SIZE) / PGDIR_SIZE;
+ p4d_size *= sizeof(p4d_t) * PTRS_PER_P4D;
+ pud_size = ALIGN(total, P4D_SIZE) / P4D_SIZE;
+ pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
+ } else {
+ p4d_size = 0;
+ pud_size = ALIGN(total, PGDIR_SIZE) / PGDIR_SIZE;
+ pud_size *= sizeof(pud_t) * PTRS_PER_PUD;
+ }
+ pmd_size = ALIGN(total, PUD_SIZE) / PUD_SIZE;
+ pmd_size *= sizeof(pmd_t) * PTRS_PER_PMD;
+
+ total += p4d_size + pud_size + pmd_size;
+
+ return total;
+}
+
void __init sme_encrypt_kernel(void)
{
+ unsigned long workarea_start, workarea_end, workarea_len;
+ unsigned long execute_start, execute_end, execute_len;
+ unsigned long kernel_start, kernel_end, kernel_len;
+ unsigned long pgtable_area_len;
+ unsigned long paddr, pmd_flags;
+ unsigned long decrypted_base;
+ void *pgtable_area;
+ pgd_t *pgd;
+
+ if (!sme_active())
+ return;
+
+ /*
+ * Prepare for encrypting the kernel by building new pagetables with
+ * the necessary attributes needed to encrypt the kernel in place.
+ *
+ * One range of virtual addresses will map the memory occupied
+ * by the kernel as encrypted.
+ *
+ * Another range of virtual addresses will map the memory occupied
+ * by the kernel as decrypted and write-protected.
+ *
+ * The use of write-protect attribute will prevent any of the
+ * memory from being cached.
+ */
+
+ /* Physical addresses give us the identity mapped virtual addresses */
+ kernel_start = __pa_symbol(_text);
+ kernel_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
+ kernel_len = kernel_end - kernel_start;
+
+ /* Set the encryption workarea to be immediately after the kernel */
+ workarea_start = kernel_end;
+
+ /*
+ * Calculate required number of workarea bytes needed:
+ * executable encryption area size:
+ * stack page (PAGE_SIZE)
+ * encryption routine page (PAGE_SIZE)
+ * intermediate copy buffer (PMD_PAGE_SIZE)
+ * pagetable structures for the encryption of the kernel
+ * pagetable structures for workarea (in case not currently mapped)
+ */
+ execute_start = workarea_start;
+ execute_end = execute_start + (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
+ execute_len = execute_end - execute_start;
+
+ /*
+ * One PGD for both encrypted and decrypted mappings and a set of
+ * PUDs and PMDs for each of the encrypted and decrypted mappings.
+ */
+ pgtable_area_len = sizeof(pgd_t) * PTRS_PER_PGD;
+ pgtable_area_len += sme_pgtable_calc(execute_end - kernel_start) * 2;
+
+ /* PUDs and PMDs needed in the current pagetables for the workarea */
+ pgtable_area_len += sme_pgtable_calc(execute_len + pgtable_area_len);
+
+ /*
+ * The total workarea includes the executable encryption area and
+ * the pagetable area.
+ */
+ workarea_len = execute_len + pgtable_area_len;
+ workarea_end = workarea_start + workarea_len;
+
+ /*
+ * Set the address to the start of where newly created pagetable
+ * structures (PGDs, PUDs and PMDs) will be allocated. New pagetable
+ * structures are created when the workarea is added to the current
+ * pagetables and when the new encrypted and decrypted kernel
+ * mappings are populated.
+ */
+ pgtable_area = (void *)execute_end;
+
+ /*
+ * Make sure the current pagetable structure has entries for
+ * addressing the workarea.
+ */
+ pgd = (pgd_t *)native_read_cr3_pa();
+ paddr = workarea_start;
+ while (paddr < workarea_end) {
+ pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+ paddr,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Flush the TLB - no globals so cr3 is enough */
+ native_write_cr3(native_read_cr3());
+
+ /*
+ * A new pagetable structure is being built to allow for the kernel
+ * to be encrypted. It starts with an empty PGD that will then be
+ * populated with new PUDs and PMDs as the encrypted and decrypted
+ * kernel mappings are created.
+ */
+ pgd = pgtable_area;
+ memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
+ pgtable_area += sizeof(*pgd) * PTRS_PER_PGD;
+
+ /* Add encrypted kernel (identity) mappings */
+ pmd_flags = PMD_FLAGS | _PAGE_ENC;
+ paddr = kernel_start;
+ while (paddr < kernel_end) {
+ pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+ paddr,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /*
+ * A different PGD index/entry must be used to get different
+ * pagetable entries for the decrypted mapping. Choose the next
+ * PGD index and convert it to a virtual address to be used as
+ * the base of the mapping.
+ */
+ decrypted_base = (pgd_index(workarea_end) + 1) & (PTRS_PER_PGD - 1);
+ decrypted_base <<= PGDIR_SHIFT;
+
+ /* Add decrypted, write-protected kernel (non-identity) mappings */
+ pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
+ paddr = kernel_start;
+ while (paddr < kernel_end) {
+ pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+ paddr + decrypted_base,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Add decrypted workarea mappings to both kernel mappings */
+ paddr = workarea_start;
+ while (paddr < workarea_end) {
+ pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+ paddr,
+ paddr + PMD_FLAGS);
+
+ pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+ paddr + decrypted_base,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Perform the encryption */
+ sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
+ kernel_len, workarea_start, (unsigned long)pgd);
+
+ /*
+ * At this point we are running encrypted. Remove the mappings for
+ * the decrypted areas - all that is needed for this is to remove
+ * the PGD entry/entries.
+ */
+ sme_clear_pgd(pgd, kernel_start + decrypted_base,
+ kernel_end + decrypted_base);
+
+ sme_clear_pgd(pgd, workarea_start + decrypted_base,
+ workarea_end + decrypted_base);
+
+ /* Flush the TLB - no globals so cr3 is enough */
+ native_write_cr3(native_read_cr3());
}
unsigned long __init sme_enable(void)
diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
new file mode 100644
index 0000000..7720b00
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -0,0 +1,150 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+
+ .text
+ .code64
+ENTRY(sme_encrypt_execute)
+
+ /*
+ * Entry parameters:
+ * RDI - virtual address for the encrypted kernel mapping
+ * RSI - virtual address for the decrypted kernel mapping
+ * RDX - length of kernel
+ * RCX - virtual address of the encryption workarea, including:
+ * - stack page (PAGE_SIZE)
+ * - encryption routine page (PAGE_SIZE)
+ * - intermediate copy buffer (PMD_PAGE_SIZE)
+ * R8 - physical address of the pagetables to use for encryption
+ */
+
+ push %rbp
+ push %r12
+
+ /* Set up a one page stack in the non-encrypted memory area */
+ movq %rsp, %rbp /* Save current stack pointer */
+ movq %rcx, %rax /* Workarea stack page */
+ movq %rax, %rsp /* Set new stack pointer */
+ addq $PAGE_SIZE, %rsp /* Stack grows from the bottom */
+ addq $PAGE_SIZE, %rax /* Workarea encryption routine */
+
+ movq %rdi, %r10 /* Encrypted kernel */
+ movq %rsi, %r11 /* Decrypted kernel */
+ movq %rdx, %r12 /* Kernel length */
+
+ /* Copy encryption routine into the workarea */
+ movq %rax, %rdi /* Workarea encryption routine */
+ leaq __enc_copy(%rip), %rsi /* Encryption routine */
+ movq $(.L__enc_copy_end - __enc_copy), %rcx /* Encryption routine length */
+ rep movsb
+
+ /* Setup registers for call */
+ movq %r10, %rdi /* Encrypted kernel */
+ movq %r11, %rsi /* Decrypted kernel */
+ movq %r8, %rdx /* Pagetables used for encryption */
+ movq %r12, %rcx /* Kernel length */
+ movq %rax, %r8 /* Workarea encryption routine */
+ addq $PAGE_SIZE, %r8 /* Workarea intermediate copy buffer */
+
+ call *%rax /* Call the encryption routine */
+
+ movq %rbp, %rsp /* Restore original stack pointer */
+
+ pop %r12
+ pop %rbp
+
+ ret
+ENDPROC(sme_encrypt_execute)
+
+ENTRY(__enc_copy)
+/*
+ * Routine used to encrypt kernel.
+ * This routine must be run outside of the kernel proper since
+ * the kernel will be encrypted during the process. So this
+ * routine is defined here and then copied to an area outside
+ * of the kernel where it will remain and run decrypted
+ * during execution.
+ *
+ * On entry the registers must be:
+ * RDI - virtual address for the encrypted kernel mapping
+ * RSI - virtual address for the decrypted kernel mapping
+ * RDX - address of the pagetables to use for encryption
+ * RCX - length of kernel
+ * R8 - intermediate copy buffer
+ *
+ * RAX - points to this routine
+ *
+ * The kernel will be encrypted by copying from the non-encrypted
+ * kernel space to an intermediate buffer and then copying from the
+ * intermediate buffer back to the encrypted kernel space. The physical
+ * addresses of the two kernel space mappings are the same which
+ * results in the kernel being encrypted "in place".
+ */
+ /* Enable the new page tables */
+ mov %rdx, %cr3
+
+ /* Flush any global TLBs */
+ mov %cr4, %rdx
+ andq $~X86_CR4_PGE, %rdx
+ mov %rdx, %cr4
+ orq $X86_CR4_PGE, %rdx
+ mov %rdx, %cr4
+
+ /* Set the PAT register PA5 entry to write-protect */
+ push %rcx
+ movl $MSR_IA32_CR_PAT, %ecx
+ rdmsr
+ push %rdx /* Save original PAT value */
+ andl $0xffff00ff, %edx /* Clear PA5 */
+ orl $0x00000500, %edx /* Set PA5 to WP */
+ wrmsr
+ pop %rdx /* RDX contains original PAT value */
+ pop %rcx
+
+ movq %rcx, %r9 /* Save kernel length */
+ movq %rdi, %r10 /* Save encrypted kernel address */
+ movq %rsi, %r11 /* Save decrypted kernel address */
+
+ wbinvd /* Invalidate any cache entries */
+
+ /* Copy/encrypt 2MB at a time */
+1:
+ movq %r11, %rsi /* Source - decrypted kernel */
+ movq %r8, %rdi /* Dest - intermediate copy buffer */
+ movq $PMD_PAGE_SIZE, %rcx /* 2MB length */
+ rep movsb
+
+ movq %r8, %rsi /* Source - intermediate copy buffer */
+ movq %r10, %rdi /* Dest - encrypted kernel */
+ movq $PMD_PAGE_SIZE, %rcx /* 2MB length */
+ rep movsb
+
+ addq $PMD_PAGE_SIZE, %r11
+ addq $PMD_PAGE_SIZE, %r10
+ subq $PMD_PAGE_SIZE, %r9 /* Kernel length decrement */
+ jnz 1b /* Kernel length not zero? */
+
+ /* Restore PAT register */
+ push %rdx /* Save original PAT value */
+ movl $MSR_IA32_CR_PAT, %ecx
+ rdmsr
+ pop %rdx /* Restore original PAT value */
+ wrmsr
+
+ ret
+.L__enc_copy_end:
+ENDPROC(__enc_copy)
The IOMMU is programmed with physical addresses for the various tables
and buffers that are used to communicate between the device and the
driver. When the driver allocates this memory, it is encrypted. In order
for the IOMMU to access the memory as encrypted, the encryption mask needs
to be included in these physical addresses during configuration.
The PTE entries created by the IOMMU should also include the encryption
mask so that when the device behind the IOMMU performs a DMA, the DMA
will be performed to encrypted memory.
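A distilled sketch of the conversion pattern used throughout the
driver changes below (illustration only; local variable names are
hypothetical):

    /* programming a device-visible physical address: add the mask */
    entry = iommu_virt_to_phys(iommu->cmd_buf);   /* __sme_set(virt_to_phys()) */

    /* converting one back for CPU use: strip the mask again */
    cmd = iommu_phys_to_virt(entry & PAGE_MASK);  /* phys_to_virt(__sme_clr()) */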
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 7 +++++++
arch/x86/mm/mem_encrypt.c | 30 ++++++++++++++++++++++++++++++
drivers/iommu/amd_iommu.c | 36 +++++++++++++++++++-----------------
drivers/iommu/amd_iommu_init.c | 18 ++++++++++++------
drivers/iommu/amd_iommu_proto.h | 10 ++++++++++
drivers/iommu/amd_iommu_types.h | 2 +-
include/asm-generic/mem_encrypt.h | 5 +++++
7 files changed, 84 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index c7a2525..d86e544 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -31,6 +31,8 @@ void __init sme_early_decrypt(resource_size_t paddr,
void __init sme_early_init(void);
+bool sme_iommu_supported(void);
+
/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void);
@@ -62,6 +64,11 @@ static inline void __init sme_early_init(void)
{
}
+static inline bool sme_iommu_supported(void)
+{
+ return true;
+}
+
#endif /* CONFIG_AMD_MEM_ENCRYPT */
static inline bool sme_active(void)
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5d7c51d..018b58a 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -197,6 +197,36 @@ void __init sme_early_init(void)
protection_map[i] = pgprot_encrypted(protection_map[i]);
}
+bool sme_iommu_supported(void)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ if (!sme_me_mask || (c->x86 != 0x17))
+ return true;
+
+ /* For Fam17h, a specific level of support is required */
+ switch (c->microcode & 0xf000) {
+ case 0x0000:
+ return false;
+ case 0x1000:
+ switch (c->microcode & 0x0f00) {
+ case 0x0000:
+ return false;
+ case 0x0100:
+ if ((c->microcode & 0xff) < 0x26)
+ return false;
+ break;
+ case 0x0200:
+ if ((c->microcode & 0xff) < 0x05)
+ return false;
+ break;
+ }
+ break;
+ }
+
+ return true;
+}
+
/* Architecture __weak replacement functions */
void __init mem_encrypt_init(void)
{
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 63cacf5..94eb130 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -544,7 +544,7 @@ static void dump_dte_entry(u16 devid)
static void dump_command(unsigned long phys_addr)
{
- struct iommu_cmd *cmd = phys_to_virt(phys_addr);
+ struct iommu_cmd *cmd = iommu_phys_to_virt(phys_addr);
int i;
for (i = 0; i < 4; ++i)
@@ -863,13 +863,15 @@ static void copy_cmd_to_buffer(struct amd_iommu *iommu,
writel(tail, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
}
-static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
+static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
{
+ u64 address = iommu_virt_to_phys((void *)sem);
+
WARN_ON(address & 0x7ULL);
memset(cmd, 0, sizeof(*cmd));
- cmd->data[0] = lower_32_bits(__pa(address)) | CMD_COMPL_WAIT_STORE_MASK;
- cmd->data[1] = upper_32_bits(__pa(address));
+ cmd->data[0] = lower_32_bits(address) | CMD_COMPL_WAIT_STORE_MASK;
+ cmd->data[1] = upper_32_bits(address);
cmd->data[2] = 1;
CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
}
@@ -1033,7 +1035,7 @@ static int __iommu_queue_command_sync(struct amd_iommu *iommu,
iommu->cmd_sem = 0;
- build_completion_wait(&sync_cmd, (u64)&iommu->cmd_sem);
+ build_completion_wait(&sync_cmd, &iommu->cmd_sem);
copy_cmd_to_buffer(iommu, &sync_cmd, tail);
if ((ret = wait_on_sem(&iommu->cmd_sem)) != 0)
@@ -1083,7 +1085,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
return 0;
- build_completion_wait(&cmd, (u64)&iommu->cmd_sem);
+ build_completion_wait(&cmd, &iommu->cmd_sem);
spin_lock_irqsave(&iommu->lock, flags);
@@ -1328,7 +1330,7 @@ static bool increase_address_space(struct protection_domain *domain,
return false;
*pte = PM_LEVEL_PDE(domain->mode,
- virt_to_phys(domain->pt_root));
+ iommu_virt_to_phys(domain->pt_root));
domain->pt_root = pte;
domain->mode += 1;
domain->updated = true;
@@ -1365,7 +1367,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
if (!page)
return NULL;
- __npte = PM_LEVEL_PDE(level, virt_to_phys(page));
+ __npte = PM_LEVEL_PDE(level, iommu_virt_to_phys(page));
/* pte could have been changed somewhere. */
if (cmpxchg64(pte, __pte, __npte) != __pte) {
@@ -1481,10 +1483,10 @@ static int iommu_map_page(struct protection_domain *dom,
return -EBUSY;
if (count > 1) {
- __pte = PAGE_SIZE_PTE(phys_addr, page_size);
+ __pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size);
__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
} else
- __pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
+ __pte = __sme_set(phys_addr) | IOMMU_PTE_P | IOMMU_PTE_FC;
if (prot & IOMMU_PROT_IR)
__pte |= IOMMU_PTE_IR;
@@ -1700,7 +1702,7 @@ static void free_gcr3_tbl_level1(u64 *tbl)
if (!(tbl[i] & GCR3_VALID))
continue;
- ptr = __va(tbl[i] & PAGE_MASK);
+ ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
free_page((unsigned long)ptr);
}
@@ -1715,7 +1717,7 @@ static void free_gcr3_tbl_level2(u64 *tbl)
if (!(tbl[i] & GCR3_VALID))
continue;
- ptr = __va(tbl[i] & PAGE_MASK);
+ ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
free_gcr3_tbl_level1(ptr);
}
@@ -1807,7 +1809,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
u64 flags = 0;
if (domain->mode != PAGE_MODE_NONE)
- pte_root = virt_to_phys(domain->pt_root);
+ pte_root = iommu_virt_to_phys(domain->pt_root);
pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
@@ -1819,7 +1821,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
flags |= DTE_FLAG_IOTLB;
if (domain->flags & PD_IOMMUV2_MASK) {
- u64 gcr3 = __pa(domain->gcr3_tbl);
+ u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
u64 glx = domain->glx;
u64 tmp;
@@ -3470,10 +3472,10 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
if (root == NULL)
return NULL;
- *pte = __pa(root) | GCR3_VALID;
+ *pte = iommu_virt_to_phys(root) | GCR3_VALID;
}
- root = __va(*pte & PAGE_MASK);
+ root = iommu_phys_to_virt(*pte & PAGE_MASK);
level -= 1;
}
@@ -3652,7 +3654,7 @@ static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
dte = amd_iommu_dev_table[devid].data[2];
dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
- dte |= virt_to_phys(table->table);
+ dte |= iommu_virt_to_phys(table->table);
dte |= DTE_IRQ_REMAP_INTCTL;
dte |= DTE_IRQ_TABLE_LEN;
dte |= DTE_IRQ_REMAP_ENABLE;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5a11328..2870a6b 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -29,6 +29,7 @@
#include <linux/export.h>
#include <linux/iommu.h>
#include <linux/kmemleak.h>
+#include <linux/mem_encrypt.h>
#include <asm/pci-direct.h>
#include <asm/iommu.h>
#include <asm/gart.h>
@@ -346,7 +347,7 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
BUG_ON(iommu->mmio_base == NULL);
- entry = virt_to_phys(amd_iommu_dev_table);
+ entry = iommu_virt_to_phys(amd_iommu_dev_table);
entry |= (dev_table_size >> 12) - 1;
memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
&entry, sizeof(entry));
@@ -602,7 +603,7 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
BUG_ON(iommu->cmd_buf == NULL);
- entry = (u64)virt_to_phys(iommu->cmd_buf);
+ entry = iommu_virt_to_phys(iommu->cmd_buf);
entry |= MMIO_CMD_SIZE_512;
memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
@@ -631,7 +632,7 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
BUG_ON(iommu->evt_buf == NULL);
- entry = (u64)virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
+ entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
&entry, sizeof(entry));
@@ -664,7 +665,7 @@ static void iommu_enable_ppr_log(struct amd_iommu *iommu)
if (iommu->ppr_log == NULL)
return;
- entry = (u64)virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512;
+ entry = iommu_virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512;
memcpy_toio(iommu->mmio_base + MMIO_PPR_LOG_OFFSET,
&entry, sizeof(entry));
@@ -744,10 +745,10 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
if (!iommu->ga_log_tail)
goto err_out;
- entry = (u64)virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
+ entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
&entry, sizeof(entry));
- entry = ((u64)virt_to_phys(iommu->ga_log) & 0xFFFFFFFFFFFFFULL) & ~7ULL;
+ entry = (iommu_virt_to_phys(iommu->ga_log) & 0xFFFFFFFFFFFFFULL) & ~7ULL;
memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
&entry, sizeof(entry));
writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
@@ -2552,6 +2553,11 @@ int __init amd_iommu_detect(void)
if (amd_iommu_disabled)
return -ENODEV;
+ if (!sme_iommu_supported()) {
+ pr_notice("AMD-Vi: IOMMU not supported when SME is active\n");
+ return -ENODEV;
+ }
+
ret = iommu_go_to_state(IOMMU_IVRS_DETECTED);
if (ret)
return ret;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 466260f..3f12fb2 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -87,4 +87,14 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
return !!(iommu->features & f);
}
+static inline u64 iommu_virt_to_phys(void *vaddr)
+{
+ return (u64)__sme_set(virt_to_phys(vaddr));
+}
+
+static inline void *iommu_phys_to_virt(unsigned long paddr)
+{
+ return phys_to_virt(__sme_clr(paddr));
+}
+
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 4de8f41..3ce587d 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -343,7 +343,7 @@
#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_P)
-#define IOMMU_PTE_PAGE(pte) (phys_to_virt((pte) & IOMMU_PAGE_MASK))
+#define IOMMU_PTE_PAGE(pte) (iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK))
#define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
#define IOMMU_PROT_MASK 0x03
diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
index fb02ff0..bbc49e1 100644
--- a/include/asm-generic/mem_encrypt.h
+++ b/include/asm-generic/mem_encrypt.h
@@ -27,6 +27,11 @@ static inline u64 sme_dma_mask(void)
return 0ULL;
}
+static inline bool sme_iommu_supported(void)
+{
+ return true;
+}
+
/*
* The __sme_set() and __sme_clr() macros are useful for adding or removing
* the encryption mask from a value (e.g. when dealing with pagetable
Add support to check whether memory encryption is active in the kernel and
whether it has been enabled on the AP. If memory encryption is active in the
kernel but has not been enabled on the AP, then set the memory encryption bit
(bit 23) of MSR_K8_SYSCFG to enable memory encryption on that AP and allow
the AP to continue its startup.
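For reference, the 32-bit assembly added to the trampoline below is logically
equivalent to the following C sketch (the helper and argument names here are
illustrative only; the real check must run before paging is enabled and
therefore lives in trampoline_64.S):

static void sme_enable_on_ap(u32 th_flags)
{
	u64 syscfg;

	/* Nothing to do if the boot CPU did not activate SME */
	if (!(th_flags & TH_FLAGS_SME_ACTIVE))
		return;

	rdmsrl(MSR_K8_SYSCFG, syscfg);
	if (syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT)
		return;		/* BIOS already set the bit on this AP */

	/* SME is active but not yet enabled on this AP, so enable it */
	syscfg |= MSR_K8_SYSCFG_MEM_ENCRYPT;
	wrmsrl(MSR_K8_SYSCFG, syscfg);
}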
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/realmode.h | 12 ++++++++++++
arch/x86/realmode/init.c | 4 ++++
arch/x86/realmode/rm/trampoline_64.S | 24 ++++++++++++++++++++++++
3 files changed, 40 insertions(+)
diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..90d9152 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
#ifndef _ARCH_X86_REALMODE_H
#define _ARCH_X86_REALMODE_H
+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * in the CONFIG_X86_64 variant.
+ */
+#define TH_FLAGS_SME_ACTIVE_BIT 0
+#define TH_FLAGS_SME_ACTIVE BIT(TH_FLAGS_SME_ACTIVE_BIT)
+
+#ifndef __ASSEMBLY__
+
#include <linux/types.h>
#include <asm/io.h>
@@ -38,6 +47,7 @@ struct trampoline_header {
u64 start;
u64 efer;
u32 cr4;
+ u32 flags;
#endif
};
@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
void set_real_mode_mem(phys_addr_t mem, size_t size);
void reserve_real_mode(void);
+#endif /* __ASSEMBLY__ */
+
#endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 195ba29..60373d0 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -101,6 +101,10 @@ static void __init setup_real_mode(void)
trampoline_cr4_features = &trampoline_header->cr4;
*trampoline_cr4_features = mmu_cr4_features;
+ trampoline_header->flags = 0;
+ if (sme_active())
+ trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
trampoline_pgd[511] = init_level4_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..614fd70 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
#include <asm/msr.h>
#include <asm/segment.h>
#include <asm/processor-flags.h>
+#include <asm/realmode.h>
#include "realmode.h"
.text
@@ -92,6 +93,28 @@ ENTRY(startup_32)
movl %edx, %fs
movl %edx, %gs
+ /*
+ * Check for memory encryption support. This is a safety net in
+ * case BIOS hasn't done the necessary step of setting the bit in
+ * the MSR for this AP. If SME is active and we've gotten this far
+ * then it is safe for us to set the MSR bit and continue. If we
+ * don't, we'll eventually crash trying to execute encrypted
+ * instructions.
+ */
+ bt $TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
+ jnc .Ldone
+ movl $MSR_K8_SYSCFG, %ecx
+ rdmsr
+ bts $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+ jc .Ldone
+
+ /*
+ * Memory encryption is enabled but the SME enable bit for this
+ * CPU has not been set. It is safe to set it, so do so.
+ */
+ wrmsr
+.Ldone:
+
movl pa_tr_cr4, %eax
movl %eax, %cr4 # Enable PAE mode
@@ -147,6 +170,7 @@ GLOBAL(trampoline_header)
tr_start: .space 8
GLOBAL(tr_efer) .space 8
GLOBAL(tr_cr4) .space 4
+ GLOBAL(tr_flags) .space 4
END(trampoline_header)
#include "trampoline_common.S"
When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.
A recent change that added a new system_state value exposed a warning
issued by early_memremap() when the system_state was not SYSTEM_BOOTING.
At the stage where the trampoline area is decrypted, the system_state is
now SYSTEM_SCHEDULING. The check was changed to issue a warning if the
system_state is greater than or equal to SYSTEM_RUNNING.
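A minimal sketch of the two steps involved, using the helpers introduced
earlier in this series (shown here only to illustrate the ordering; the
actual change is in set_real_mode_permissions() below):

static void __init make_trampoline_decrypted(void *base, unsigned long size)
{
	if (!sme_active())
		return;

	/* Re-write the contents so they read correctly once decrypted */
	sme_early_decrypt(__pa(base), size);

	/* Clear the encryption bit from the kernel mapping of the area */
	set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
}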
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/realmode/init.c | 11 +++++++++++
mm/early_ioremap.c | 2 +-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index a163a90..195ba29 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/realmode.h>
#include <asm/tlbflush.h>
+#include <asm/mem_encrypt.h>
struct real_mode_header *real_mode_header;
u32 *trampoline_cr4_features;
@@ -130,6 +131,16 @@ static void __init set_real_mode_permissions(void)
unsigned long text_start =
(unsigned long) __va(real_mode_header->text_start);
+ /*
+ * If SME is active, the trampoline area will need to be in
+ * decrypted memory in order to bring up other processors
+ * successfully.
+ */
+ if (sme_active()) {
+ sme_early_decrypt(__pa(base), size);
+ set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+ }
+
set_memory_nx((unsigned long) base, size >> PAGE_SHIFT);
set_memory_ro((unsigned long) base, ro_size >> PAGE_SHIFT);
set_memory_x((unsigned long) text_start, text_size >> PAGE_SHIFT);
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index b1dd4a9..01d13ae 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -110,7 +110,7 @@ static int __init check_early_ioremap_leak(void)
enum fixed_addresses idx;
int i, slot;
- WARN_ON(system_state != SYSTEM_BOOTING);
+ WARN_ON(system_state >= SYSTEM_RUNNING);
slot = -1;
for (i = 0; i < FIX_BTMAPS_SLOTS; i++) {
When SME is active, pagetable entries created for EFI need to have the
encryption mask set as necessary.
When the new pagetable pages are allocated they are mapped encrypted, so
update the efi_pgt value that will be loaded into cr3 to include the
encryption mask so that the PGD table can be read successfully. The pagetable
pages, as well as the kernel, are also added to the pagetable as encrypted
mappings. All other EFI mappings are mapped decrypted (tables, etc.).
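The key point is that the value loaded into cr3 may carry the encryption
mask; assuming the __sme_pa() helper introduced earlier in this series
expands to roughly __pa(x) | sme_me_mask, the assignment below is equivalent
to:

	/*
	 * Hypothetical expansion of __sme_pa(); the hardware page-table
	 * walker treats the PGD page as encrypted when the mask is set.
	 */
	efi_scratch.efi_pgt = (pgd_t *)(__pa(efi_pgd) | sme_me_mask);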
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/platform/efi/efi_64.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index eb8dff1..ed37fa3 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -327,7 +327,7 @@ void efi_sync_low_kernel_mappings(void)
int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
{
- unsigned long pfn, text;
+ unsigned long pfn, text, pf;
struct page *page;
unsigned npages;
pgd_t *pgd;
@@ -335,7 +335,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+ /*
+ * Since the PGD is encrypted, set the encryption mask so that when
+ * this value is loaded into cr3 the PGD will be decrypted during
+ * the pagetable walk.
+ */
+ efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
pgd = efi_pgd;
/*
@@ -345,7 +350,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
* phys_efi_set_virtual_address_map().
*/
pfn = pa_memmap >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
+ pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
+ if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
return 1;
}
@@ -388,7 +394,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
text = __pa(_text);
pfn = text >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+ pf = _PAGE_RW | _PAGE_ENC;
+ if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
}
Add support to encrypt or decrypt data in place during
the early stages of booting the kernel. This does not change the memory
encryption attribute - it is used for ensuring that data present in either
an encrypted or decrypted memory area is in the proper state (for example
the initrd will have been loaded by the boot loader and will not be
encrypted, but the memory that it resides in is marked as encrypted).
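As a usage sketch (hypothetical call site; the actual initrd handling appears
elsewhere in this series), encrypting a boot-loader-provided region in place
looks like:

static void __init sme_fixup_initrd(unsigned long paddr, unsigned long size)
{
	if (!sme_active())
		return;

	/*
	 * Contents were written decrypted, but the pages will be mapped
	 * encrypted, so convert the data in place.
	 */
	sme_early_encrypt(paddr, size);
}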
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 15 +++++++
arch/x86/mm/mem_encrypt.c | 76 ++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index f1c4c29..7c395cf 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,12 +21,27 @@
extern unsigned long sme_me_mask;
+void __init sme_early_encrypt(resource_size_t paddr,
+ unsigned long size);
+void __init sme_early_decrypt(resource_size_t paddr,
+ unsigned long size);
+
void __init sme_early_init(void);
#else /* !CONFIG_AMD_MEM_ENCRYPT */
#define sme_me_mask 0UL
+static inline void __init sme_early_encrypt(resource_size_t paddr,
+ unsigned long size)
+{
+}
+
+static inline void __init sme_early_decrypt(resource_size_t paddr,
+ unsigned long size)
+{
+}
+
static inline void __init sme_early_init(void)
{
}
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 8ca93e5..18c0887 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -17,6 +17,9 @@
#include <linux/mm.h>
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+
/*
* Since SME related variables are set early in the boot process they must
* reside in the .data section so as not to be zeroed out when the .bss
@@ -25,6 +28,79 @@
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);
+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as either encrypted or decrypted but the contents
+ * are currently not in the desired state.
+ *
+ * This routine follows the steps outlined in the AMD64 Architecture
+ * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
+ */
+static void __init __sme_early_enc_dec(resource_size_t paddr,
+ unsigned long size, bool enc)
+{
+ void *src, *dst;
+ size_t len;
+
+ if (!sme_me_mask)
+ return;
+
+ local_flush_tlb();
+ wbinvd();
+
+ /*
+ * There are a limited number of early mapping slots, so map (at most)
+ * one page at a time.
+ */
+ while (size) {
+ len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+ /*
+ * Create mappings for the current and desired format of
+ * the memory. Use a write-protected mapping for the source.
+ */
+ src = enc ? early_memremap_decrypted_wp(paddr, len) :
+ early_memremap_encrypted_wp(paddr, len);
+
+ dst = enc ? early_memremap_encrypted(paddr, len) :
+ early_memremap_decrypted(paddr, len);
+
+ /*
+ * If a mapping can't be obtained to perform the operation,
+ * then eventual access of that area in the desired mode
+ * will cause a crash.
+ */
+ BUG_ON(!src || !dst);
+
+ /*
+ * Use a temporary buffer, of cache-line multiple size, to
+ * avoid data corruption as documented in the APM.
+ */
+ memcpy(sme_early_buffer, src, len);
+ memcpy(dst, sme_early_buffer, len);
+
+ early_memunmap(dst, len);
+ early_memunmap(src, len);
+
+ paddr += len;
+ size -= len;
+ }
+}
+
+void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
+{
+ __sme_early_enc_dec(paddr, size, true);
+}
+
+void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
+{
+ __sme_early_enc_dec(paddr, size, false);
+}
+
void __init sme_early_init(void)
{
unsigned int i;
Add early_memremap() support to be able to specify encrypted and
decrypted mappings with and without write-protection. The use of
write-protection is necessary when encrypting data "in place". The
write-protect attribute is considered cacheable for loads, but not
stores. This implies that the hardware will never give the core a
dirty line with this memtype.
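A small usage sketch (hypothetical caller): reading a region in its current,
encrypted form without risking a dirty cache line being written back is
exactly what the write-protect variants are for.

static void __init peek_encrypted(resource_size_t paddr, size_t len, void *out)
{
	void *src = early_memremap_encrypted_wp(paddr, len);

	if (!src)
		return;		/* WP PAT entry not available before pat_init() */

	memcpy(out, src, len);
	early_memunmap(src, len);
}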
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/Kconfig | 4 +++
arch/x86/include/asm/fixmap.h | 13 ++++++++++
arch/x86/include/asm/pgtable_types.h | 8 ++++++
arch/x86/mm/ioremap.c | 44 ++++++++++++++++++++++++++++++++++
include/asm-generic/early_ioremap.h | 2 ++
mm/early_ioremap.c | 10 ++++++++
6 files changed, 81 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 11f2fdb..8002530 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1429,6 +1429,10 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
If set to N, then the encryption of system memory can be
activated with the mem_encrypt=on command line option.
+config ARCH_USE_MEMREMAP_PROT
+ def_bool y
+ depends on AMD_MEM_ENCRYPT
+
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d9ff226..dcd9fb5 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -164,6 +164,19 @@ static inline void __set_fixmap(enum fixed_addresses idx,
*/
#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+/*
+ * Early memremap routines used for in-place encryption. The mappings created
+ * by these routines are intended to be used as temporary mappings.
+ */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+ unsigned long size);
+
#include <asm-generic/fixmap.h>
#define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d3ae99c..ce8cb1c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -161,6 +161,7 @@ enum page_cache_mode {
#define _PAGE_CACHE_MASK (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
#define _PAGE_NOCACHE (cachemode2protval(_PAGE_CACHE_MODE_UC))
+#define _PAGE_CACHE_WP (cachemode2protval(_PAGE_CACHE_MODE_WP))
#define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
#define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
@@ -189,6 +190,7 @@ enum page_cache_mode {
#define __PAGE_KERNEL_VVAR (__PAGE_KERNEL_RO | _PAGE_USER)
#define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE)
#define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE)
+#define __PAGE_KERNEL_WP (__PAGE_KERNEL | _PAGE_CACHE_WP)
#define __PAGE_KERNEL_IO (__PAGE_KERNEL)
#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE)
@@ -202,6 +204,12 @@ enum page_cache_mode {
#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
_PAGE_DIRTY | _PAGE_ENC)
+#define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
+#define __PAGE_KERNEL_ENC_WP (__PAGE_KERNEL_WP | _PAGE_ENC)
+
+#define __PAGE_KERNEL_NOENC (__PAGE_KERNEL)
+#define __PAGE_KERNEL_NOENC_WP (__PAGE_KERNEL_WP)
+
#define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC)
#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e6305dd..792db75 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -422,6 +422,50 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+/* Remap memory with encryption */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+ unsigned long size)
+{
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
+}
+
+/*
+ * Remap memory with encryption and write-protection - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+ unsigned long size)
+{
+ /* Be sure the write-protect PAT entry is set for write-protect */
+ if (__pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] != _PAGE_CACHE_MODE_WP)
+ return NULL;
+
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC_WP);
+}
+
+/* Remap memory without encryption */
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+ unsigned long size)
+{
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_NOENC);
+}
+
+/*
+ * Remap memory without encryption but with write-protection - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+ unsigned long size)
+{
+ /* Be sure the write-protect PAT entry is set for write-protect */
+ if (__pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] != _PAGE_CACHE_MODE_WP)
+ return NULL;
+
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_NOENC_WP);
+}
+#endif /* CONFIG_ARCH_USE_MEMREMAP_PROT */
+
static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
diff --git a/include/asm-generic/early_ioremap.h b/include/asm-generic/early_ioremap.h
index 734ad4d..2edef8d 100644
--- a/include/asm-generic/early_ioremap.h
+++ b/include/asm-generic/early_ioremap.h
@@ -13,6 +13,8 @@ extern void *early_memremap(resource_size_t phys_addr,
unsigned long size);
extern void *early_memremap_ro(resource_size_t phys_addr,
unsigned long size);
+extern void *early_memremap_prot(resource_size_t phys_addr,
+ unsigned long size, unsigned long prot_val);
extern void early_iounmap(void __iomem *addr, unsigned long size);
extern void early_memunmap(void *addr, unsigned long size);
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index 6d5717b..d7d30da 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -226,6 +226,16 @@ void __init early_iounmap(void __iomem *addr, unsigned long size)
}
#endif
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+void __init *
+early_memremap_prot(resource_size_t phys_addr, unsigned long size,
+ unsigned long prot_val)
+{
+ return (__force void *)__early_ioremap(phys_addr, size,
+ __pgprot(prot_val));
+}
+#endif
+
#define MAX_MAP_CHUNK (NR_FIX_BTMAPS << PAGE_SHIFT)
void __init copy_from_early_mem(void *dest, phys_addr_t src, unsigned long size)
Create a pgd_pfn() helper similar to the p[um]d_pfn() helpers and then
use the p[gum]d_pfn() helpers in the p[gum]d_page() macros instead of
duplicating the code.
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/pgtable.h | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f5af95a..96b6b83 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -195,6 +195,11 @@ static inline unsigned long p4d_pfn(p4d_t p4d)
return (p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT;
}
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+ return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
static inline int p4d_large(p4d_t p4d)
{
/* No 512 GiB pages yet */
@@ -699,8 +704,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pmd_page(pmd) \
- pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd))
/*
* the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -768,8 +772,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pud_page(pud) \
- pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud) pfn_to_page(pud_pfn(pud))
/* Find an entry in the second-level page table.. */
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -819,8 +822,7 @@ static inline unsigned long p4d_page_vaddr(p4d_t p4d)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define p4d_page(p4d) \
- pfn_to_page((p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT)
+#define p4d_page(p4d) pfn_to_page(p4d_pfn(p4d))
/* Find an entry in the third-level page table.. */
static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
@@ -854,7 +856,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pgd_page(pgd) pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd) pfn_to_page(pgd_pfn(pgd))
/* to find an entry in a page-table-directory. */
static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
Currently there is a check whether the address being mapped is in the ISA
range (is_ISA_range()), and if it is, phys_to_virt() is used to perform the
mapping. When SME is active, however, this results in the mapping having the
encryption bit set when an ioremap() mapping is expected not to have the
encryption bit set. So only use phys_to_virt() if SME is not active.
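For illustration, a hypothetical caller mapping the VGA text buffer (which
lies in the ISA range) now gets a freshly built, decrypted mapping when SME
is active, instead of a pointer into the always-present direct map, which
would carry the encryption bit:

static void __iomem *map_vga_text_buffer(void)
{
	/* 0xb8000 is within the ISA range checked by is_ISA_range() */
	return ioremap(0xb8000, PAGE_SIZE);
}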
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/mm/ioremap.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bbc558b..2a0fa89 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -21,6 +21,7 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#include <asm/pat.h>
+#include <asm/mem_encrypt.h>
#include "physaddr.h"
@@ -106,9 +107,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
}
/*
- * Don't remap the low PCI/ISA area, it's always mapped..
+ * Don't remap the low PCI/ISA area, it's always mapped.
+ * But if SME is active, skip this so that the encryption bit
+ * doesn't get set.
*/
- if (is_ISA_range(phys_addr, last_addr))
+ if (is_ISA_range(phys_addr, last_addr) && !sme_active())
return (__force void __iomem *)phys_to_virt(phys_addr);
/*
Update the CPU features to include identifying and reporting on the
Secure Memory Encryption (SME) feature. SME is identified by CPUID
0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG). Only show the SME feature as available if reported by
CPUID and enabled by BIOS.
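The combined check is roughly the following (a sketch of the logic added
below; CPUID Fn8000_001F[EAX] bit 0 reports SME and the BIOS-controlled
SYSCFG bit gates it):

static bool sme_detect(struct cpuinfo_x86 *c)
{
	u64 syscfg;

	if (!cpu_has(c, X86_FEATURE_SME))	/* set from CPUID 0x8000001f */
		return false;

	rdmsrl(MSR_K8_SYSCFG, syscfg);
	return !!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT);
}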
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/amd.c | 13 +++++++++++++
arch/x86/kernel/cpu/scattered.c | 1 +
4 files changed, 17 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 2701e5f..2b692df 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -196,6 +196,7 @@
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
+#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18b1623..460ac01 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -352,6 +352,8 @@
#define MSR_K8_TOP_MEM1 0xc001001a
#define MSR_K8_TOP_MEM2 0xc001001d
#define MSR_K8_SYSCFG 0xc0010010
+#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT 23
+#define MSR_K8_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index bb5abe8..c47ceee 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -611,6 +611,19 @@ static void early_init_amd(struct cpuinfo_x86 *c)
*/
if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_E400);
+
+ /*
+ * BIOS support is required for SME. If BIOS has not enabled SME
+ * then don't advertise the feature (set in scattered.c)
+ */
+ if (cpu_has(c, X86_FEATURE_SME)) {
+ u64 msr;
+
+ /* Check if SME is enabled */
+ rdmsrl(MSR_K8_SYSCFG, msr);
+ if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ clear_cpu_cap(c, X86_FEATURE_SME);
+ }
}
static void init_amd_k8(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 23c2350..05459ad 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -31,6 +31,7 @@ struct cpuid_bit {
{ X86_FEATURE_HW_PSTATE, CPUID_EDX, 7, 0x80000007, 0 },
{ X86_FEATURE_CPB, CPUID_EDX, 9, 0x80000007, 0 },
{ X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
+ { X86_FEATURE_SME, CPUID_EAX, 0, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
};
The SMP MP-table is built by UEFI and placed in memory in a decrypted
state. These tables are accessed using a mix of early_memremap(),
early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
to use early_memremap()/early_memunmap(). This allows for proper setting
of the encryption mask so that the data can be successfully accessed when
SME is active.
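The general access pattern after the rework is to keep only the physical
address and map the structure around each use (sketch; field consumption
elided):

static void __init use_mpf(unsigned long mpf_base)
{
	struct mpf_intel *mpf;

	mpf = early_memremap(mpf_base, sizeof(*mpf));
	if (!mpf) {
		pr_err("MPTABLE: mpf early_memremap() failed\n");
		return;
	}

	/* ... read mpf->physptr, mpf->feature1, etc. here ... */

	early_memunmap(mpf, sizeof(*mpf));
}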
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kernel/mpparse.c | 98 ++++++++++++++++++++++++++++++++-------------
1 file changed, 70 insertions(+), 28 deletions(-)
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index fd37f39..44b5d582 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -429,7 +429,7 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
}
}
-static struct mpf_intel *mpf_found;
+static unsigned long mpf_base;
static unsigned long __init get_mpc_size(unsigned long physptr)
{
@@ -451,6 +451,7 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
size = get_mpc_size(mpf->physptr);
mpc = early_memremap(mpf->physptr, size);
+
/*
* Read the physical hardware table. Anything here will
* override the defaults.
@@ -497,12 +498,12 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
*/
void __init default_get_smp_config(unsigned int early)
{
- struct mpf_intel *mpf = mpf_found;
+ struct mpf_intel *mpf;
if (!smp_found_config)
return;
- if (!mpf)
+ if (!mpf_base)
return;
if (acpi_lapic && early)
@@ -515,6 +516,12 @@ void __init default_get_smp_config(unsigned int early)
if (acpi_lapic && acpi_ioapic)
return;
+ mpf = early_memremap(mpf_base, sizeof(*mpf));
+ if (!mpf) {
+ pr_err("MPTABLE: mpf early_memremap() failed\n");
+ return;
+ }
+
pr_info("Intel MultiProcessor Specification v1.%d\n",
mpf->specification);
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
@@ -529,7 +536,7 @@ void __init default_get_smp_config(unsigned int early)
/*
* Now see if we need to read further.
*/
- if (mpf->feature1 != 0) {
+ if (mpf->feature1) {
if (early) {
/*
* local APIC has default address
@@ -542,8 +549,10 @@ void __init default_get_smp_config(unsigned int early)
construct_default_ISA_mptable(mpf->feature1);
} else if (mpf->physptr) {
- if (check_physptr(mpf, early))
+ if (check_physptr(mpf, early)) {
+ early_memunmap(mpf, sizeof(*mpf));
return;
+ }
} else
BUG();
@@ -552,6 +561,8 @@ void __init default_get_smp_config(unsigned int early)
/*
* Only use the first configuration found.
*/
+
+ early_memunmap(mpf, sizeof(*mpf));
}
static void __init smp_reserve_memory(struct mpf_intel *mpf)
@@ -561,15 +572,16 @@ static void __init smp_reserve_memory(struct mpf_intel *mpf)
static int __init smp_scan_config(unsigned long base, unsigned long length)
{
- unsigned int *bp = phys_to_virt(base);
+ unsigned int *bp;
struct mpf_intel *mpf;
- unsigned long mem;
+ int ret = 0;
apic_printk(APIC_VERBOSE, "Scan for SMP in [mem %#010lx-%#010lx]\n",
base, base + length - 1);
BUILD_BUG_ON(sizeof(*mpf) != 16);
while (length > 0) {
+ bp = early_memremap(base, length);
mpf = (struct mpf_intel *)bp;
if ((*bp == SMP_MAGIC_IDENT) &&
(mpf->length == 1) &&
@@ -579,24 +591,26 @@ static int __init smp_scan_config(unsigned long base, unsigned long length)
#ifdef CONFIG_X86_LOCAL_APIC
smp_found_config = 1;
#endif
- mpf_found = mpf;
+ mpf_base = base;
- pr_info("found SMP MP-table at [mem %#010llx-%#010llx] mapped at [%p]\n",
- (unsigned long long) virt_to_phys(mpf),
- (unsigned long long) virt_to_phys(mpf) +
- sizeof(*mpf) - 1, mpf);
+ pr_info("found SMP MP-table at [mem %#010lx-%#010lx] mapped at [%p]\n",
+ base, base + sizeof(*mpf) - 1, mpf);
- mem = virt_to_phys(mpf);
- memblock_reserve(mem, sizeof(*mpf));
+ memblock_reserve(base, sizeof(*mpf));
if (mpf->physptr)
smp_reserve_memory(mpf);
- return 1;
+ ret = 1;
}
- bp += 4;
+ early_memunmap(bp, length);
+
+ if (ret)
+ break;
+
+ base += 16;
length -= 16;
}
- return 0;
+ return ret;
}
void __init default_find_smp_config(void)
@@ -838,29 +852,40 @@ static int __init update_mp_table(void)
char oem[10];
struct mpf_intel *mpf;
struct mpc_table *mpc, *mpc_new;
+ unsigned long size;
if (!enable_update_mptable)
return 0;
- mpf = mpf_found;
- if (!mpf)
+ if (!mpf_base)
+ return 0;
+
+ mpf = early_memremap(mpf_base, sizeof(*mpf));
+ if (!mpf) {
+ pr_err("MPTABLE: mpf early_memremap() failed\n");
return 0;
+ }
/*
* Now see if we need to go further.
*/
- if (mpf->feature1 != 0)
- return 0;
+ if (mpf->feature1)
+ goto do_unmap_mpf;
if (!mpf->physptr)
- return 0;
+ goto do_unmap_mpf;
- mpc = phys_to_virt(mpf->physptr);
+ size = get_mpc_size(mpf->physptr);
+ mpc = early_memremap(mpf->physptr, size);
+ if (!mpc) {
+ pr_err("MPTABLE: mpc early_memremap() failed\n");
+ goto do_unmap_mpf;
+ }
if (!smp_check_mpc(mpc, oem, str))
- return 0;
+ goto do_unmap_mpc;
- pr_info("mpf: %llx\n", (u64)virt_to_phys(mpf));
+ pr_info("mpf: %llx\n", (u64)mpf_base);
pr_info("physptr: %x\n", mpf->physptr);
if (mpc_new_phys && mpc->length > mpc_new_length) {
@@ -878,21 +903,32 @@ static int __init update_mp_table(void)
new = mpf_checksum((unsigned char *)mpc, mpc->length);
if (old == new) {
pr_info("mpc is readonly, please try alloc_mptable instead\n");
- return 0;
+ goto do_unmap_mpc;
}
pr_info("use in-position replacing\n");
} else {
+ mpc_new = early_memremap(mpc_new_phys, mpc_new_length);
+ if (!mpc_new) {
+ pr_err("MPTABLE: new mpc early_memremap() failed\n");
+ goto do_unmap_mpc;
+ }
mpf->physptr = mpc_new_phys;
- mpc_new = phys_to_virt(mpc_new_phys);
memcpy(mpc_new, mpc, mpc->length);
+ early_memunmap(mpc, size);
mpc = mpc_new;
+ size = mpc_new_length;
/* check if we can modify that */
if (mpc_new_phys - mpf->physptr) {
struct mpf_intel *mpf_new;
/* steal 16 bytes from [0, 1k) */
+ mpf_new = early_memremap(0x400 - 16, sizeof(*mpf_new));
+ if (!mpf_new) {
+ pr_err("MPTABLE: new mpf early_memremap() failed\n");
+ goto do_unmap_mpc;
+ }
pr_info("mpf new: %x\n", 0x400 - 16);
- mpf_new = phys_to_virt(0x400 - 16);
memcpy(mpf_new, mpf, 16);
+ early_memunmap(mpf, sizeof(*mpf));
mpf = mpf_new;
mpf->physptr = mpc_new_phys;
}
@@ -909,6 +945,12 @@ static int __init update_mp_table(void)
*/
replace_intsrc_all(mpc, mpc_new_phys, mpc_new_length);
+do_unmap_mpc:
+ early_memunmap(mpc, size);
+
+do_unmap_mpf:
+ early_memunmap(mpf, sizeof(*mpf));
+
return 0;
}
On 06/07/2017 03:14 PM, Tom Lendacky wrote:
> The cr3 register entry can contain the SME encryption bit that indicates
> the PGD is encrypted. The encryption bit should not be used when creating
> a virtual address for the PGD table.
>
> Create a new function, read_cr3_pa(), that will extract the physical
> address from the cr3 register. This function is then used where a virtual
> address of the PGD needs to be created/used from the cr3 register.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/special_insns.h | 9 +++++++++
> arch/x86/kernel/head64.c | 2 +-
> arch/x86/mm/fault.c | 10 +++++-----
> arch/x86/mm/ioremap.c | 2 +-
> arch/x86/platform/olpc/olpc-xo1-pm.c | 2 +-
> arch/x86/power/hibernate_64.c | 2 +-
> arch/x86/xen/mmu_pv.c | 6 +++---
> 7 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 12af3e3..d8e8ace 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -234,6 +234,15 @@ static inline void clwb(volatile void *__p)
>
> #define nop() asm volatile ("nop")
>
> +static inline unsigned long native_read_cr3_pa(void)
> +{
> + return (native_read_cr3() & PHYSICAL_PAGE_MASK);
> +}
> +
> +static inline unsigned long read_cr3_pa(void)
> +{
> + return (read_cr3() & PHYSICAL_PAGE_MASK);
> +}
>
> #endif /* __KERNEL__ */
>
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 43b7002..dc03624 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -55,7 +55,7 @@ int __init early_make_pgtable(unsigned long address)
> pmdval_t pmd, *pmd_p;
>
> /* Invalid address or early pgt is done ? */
> - if (physaddr >= MAXMEM || read_cr3() != __pa_nodebug(early_level4_pgt))
> + if (physaddr >= MAXMEM || read_cr3_pa() != __pa_nodebug(early_level4_pgt))
> return -1;
>
> again:
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 8ad91a0..2a1fa10c 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -346,7 +346,7 @@ static noinline int vmalloc_fault(unsigned long address)
> * Do _not_ use "current" here. We might be inside
> * an interrupt in the middle of a task switch..
> */
> - pgd_paddr = read_cr3();
> + pgd_paddr = read_cr3_pa();
> pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
> if (!pmd_k)
> return -1;
> @@ -388,7 +388,7 @@ static bool low_pfn(unsigned long pfn)
>
> static void dump_pagetable(unsigned long address)
> {
> - pgd_t *base = __va(read_cr3());
> + pgd_t *base = __va(read_cr3_pa());
> pgd_t *pgd = &base[pgd_index(address)];
> p4d_t *p4d;
> pud_t *pud;
> @@ -451,7 +451,7 @@ static noinline int vmalloc_fault(unsigned long address)
> * happen within a race in page table update. In the later
> * case just flush:
> */
> - pgd = (pgd_t *)__va(read_cr3()) + pgd_index(address);
> + pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
> pgd_ref = pgd_offset_k(address);
> if (pgd_none(*pgd_ref))
> return -1;
> @@ -555,7 +555,7 @@ static int bad_address(void *p)
>
> static void dump_pagetable(unsigned long address)
> {
> - pgd_t *base = __va(read_cr3() & PHYSICAL_PAGE_MASK);
> + pgd_t *base = __va(read_cr3_pa());
> pgd_t *pgd = base + pgd_index(address);
> p4d_t *p4d;
> pud_t *pud;
> @@ -700,7 +700,7 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
> pgd_t *pgd;
> pte_t *pte;
>
> - pgd = __va(read_cr3() & PHYSICAL_PAGE_MASK);
> + pgd = __va(read_cr3_pa());
> pgd += pgd_index(address);
>
> pte = lookup_address_in_pgd(pgd, address, &level);
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 2a0fa89..e6305dd 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -427,7 +427,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
> static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
> {
> /* Don't assume we're using swapper_pg_dir at this point */
> - pgd_t *base = __va(read_cr3());
> + pgd_t *base = __va(read_cr3_pa());
> pgd_t *pgd = &base[pgd_index(addr)];
> p4d_t *p4d = p4d_offset(pgd, addr);
> pud_t *pud = pud_offset(p4d, addr);
> diff --git a/arch/x86/platform/olpc/olpc-xo1-pm.c b/arch/x86/platform/olpc/olpc-xo1-pm.c
> index c5350fd..0668aaf 100644
> --- a/arch/x86/platform/olpc/olpc-xo1-pm.c
> +++ b/arch/x86/platform/olpc/olpc-xo1-pm.c
> @@ -77,7 +77,7 @@ static int xo1_power_state_enter(suspend_state_t pm_state)
>
> asmlinkage __visible int xo1_do_sleep(u8 sleep_state)
> {
> - void *pgd_addr = __va(read_cr3());
> + void *pgd_addr = __va(read_cr3_pa());
>
> /* Program wakeup mask (using dword access to CS5536_PM1_EN) */
> outl(wakeup_mask << 16, acpi_base + CS5536_PM1_STS);
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index a6e21fe..0a7650d 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -150,7 +150,7 @@ static int relocate_restore_code(void)
> memcpy((void *)relocated_restore_code, &core_restore_code, PAGE_SIZE);
>
> /* Make the page containing the relocated code executable */
> - pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
> + pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(relocated_restore_code);
> p4d = p4d_offset(pgd, relocated_restore_code);
> if (p4d_large(*p4d)) {
> set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index 1f386d7..2dc5243 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -2022,7 +2022,7 @@ static phys_addr_t __init xen_early_virt_to_phys(unsigned long vaddr)
> pmd_t pmd;
> pte_t pte;
>
> - pa = read_cr3();
> + pa = read_cr3_pa();
> pgd = native_make_pgd(xen_read_phys_ulong(pa + pgd_index(vaddr) *
> sizeof(pgd)));
> if (!pgd_present(pgd))
> @@ -2102,7 +2102,7 @@ void __init xen_relocate_p2m(void)
> pt_phys = pmd_phys + PFN_PHYS(n_pmd);
> p2m_pfn = PFN_DOWN(pt_phys) + n_pt;
>
> - pgd = __va(read_cr3());
> + pgd = __va(read_cr3_pa());
> new_p2m = (unsigned long *)(2 * PGDIR_SIZE);
> idx_p4d = 0;
> save_pud = n_pud;
> @@ -2209,7 +2209,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
> {
> unsigned long pfn = PFN_DOWN(__pa(swapper_pg_dir));
>
> - BUG_ON(read_cr3() != __pa(initial_page_table));
> + BUG_ON(read_cr3_pa() != __pa(initial_page_table));
> BUG_ON(cr3 != __pa(swapper_pg_dir));
>
> /*
(Please copy Xen maintainers when modifying xen-related files.)
Given that page tables for Xen PV guests are controlled by the
hypervisor, I don't think this change (although harmless) is necessary.
What may be needed is making sure X86_FEATURE_SME is not set for PV guests.
-boris
On Wed, Jun 7, 2017 at 3:17 PM, Tom Lendacky <[email protected]> wrote:
> The IOMMU is programmed with physical addresses for the various tables
> and buffers that are used to communicate between the device and the
> driver. When the driver allocates this memory it is encrypted. In order
> for the IOMMU to access the memory as encrypted the encryption mask needs
> to be included in these physical addresses during configuration.
>
> The PTE entries created by the IOMMU should also include the encryption
> mask so that when the device behind the IOMMU performs a DMA, the DMA
> will be performed to encrypted memory.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 7 +++++++
> arch/x86/mm/mem_encrypt.c | 30 ++++++++++++++++++++++++++++++
> drivers/iommu/amd_iommu.c | 36 +++++++++++++++++++-----------------
> drivers/iommu/amd_iommu_init.c | 18 ++++++++++++------
> drivers/iommu/amd_iommu_proto.h | 10 ++++++++++
> drivers/iommu/amd_iommu_types.h | 2 +-
> include/asm-generic/mem_encrypt.h | 5 +++++
> 7 files changed, 84 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index c7a2525..d86e544 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -31,6 +31,8 @@ void __init sme_early_decrypt(resource_size_t paddr,
>
> void __init sme_early_init(void);
>
> +bool sme_iommu_supported(void);
> +
> /* Architecture __weak replacement functions */
> void __init mem_encrypt_init(void);
>
> @@ -62,6 +64,11 @@ static inline void __init sme_early_init(void)
> {
> }
>
> +static inline bool sme_iommu_supported(void)
> +{
> + return true;
> +}
> +
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>
> static inline bool sme_active(void)
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 5d7c51d..018b58a 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -197,6 +197,36 @@ void __init sme_early_init(void)
> protection_map[i] = pgprot_encrypted(protection_map[i]);
> }
>
> +bool sme_iommu_supported(void)
> +{
> + struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> + if (!sme_me_mask || (c->x86 != 0x17))
> + return true;
> +
> + /* For Fam17h, a specific level of support is required */
> + switch (c->microcode & 0xf000) {
> + case 0x0000:
> + return false;
> + case 0x1000:
> + switch (c->microcode & 0x0f00) {
> + case 0x0000:
> + return false;
> + case 0x0100:
> + if ((c->microcode & 0xff) < 0x26)
> + return false;
> + break;
> + case 0x0200:
> + if ((c->microcode & 0xff) < 0x05)
> + return false;
> + break;
> + }
> + break;
> + }
> +
> + return true;
> +}
> +
> /* Architecture __weak replacement functions */
> void __init mem_encrypt_init(void)
> {
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 63cacf5..94eb130 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -544,7 +544,7 @@ static void dump_dte_entry(u16 devid)
>
> static void dump_command(unsigned long phys_addr)
> {
> - struct iommu_cmd *cmd = phys_to_virt(phys_addr);
> + struct iommu_cmd *cmd = iommu_phys_to_virt(phys_addr);
> int i;
>
> for (i = 0; i < 4; ++i)
> @@ -863,13 +863,15 @@ static void copy_cmd_to_buffer(struct amd_iommu *iommu,
> writel(tail, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
> }
>
> -static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
> +static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
> {
> + u64 address = iommu_virt_to_phys((void *)sem);
> +
> WARN_ON(address & 0x7ULL);
>
> memset(cmd, 0, sizeof(*cmd));
> - cmd->data[0] = lower_32_bits(__pa(address)) | CMD_COMPL_WAIT_STORE_MASK;
> - cmd->data[1] = upper_32_bits(__pa(address));
> + cmd->data[0] = lower_32_bits(address) | CMD_COMPL_WAIT_STORE_MASK;
> + cmd->data[1] = upper_32_bits(address);
> cmd->data[2] = 1;
> CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
> }
> @@ -1033,7 +1035,7 @@ static int __iommu_queue_command_sync(struct amd_iommu *iommu,
>
> iommu->cmd_sem = 0;
>
> - build_completion_wait(&sync_cmd, (u64)&iommu->cmd_sem);
> + build_completion_wait(&sync_cmd, &iommu->cmd_sem);
> copy_cmd_to_buffer(iommu, &sync_cmd, tail);
>
> if ((ret = wait_on_sem(&iommu->cmd_sem)) != 0)
> @@ -1083,7 +1085,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
> return 0;
>
>
> - build_completion_wait(&cmd, (u64)&iommu->cmd_sem);
> + build_completion_wait(&cmd, &iommu->cmd_sem);
>
> spin_lock_irqsave(&iommu->lock, flags);
>
> @@ -1328,7 +1330,7 @@ static bool increase_address_space(struct protection_domain *domain,
> return false;
>
> *pte = PM_LEVEL_PDE(domain->mode,
> - virt_to_phys(domain->pt_root));
> + iommu_virt_to_phys(domain->pt_root));
> domain->pt_root = pte;
> domain->mode += 1;
> domain->updated = true;
> @@ -1365,7 +1367,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
> if (!page)
> return NULL;
>
> - __npte = PM_LEVEL_PDE(level, virt_to_phys(page));
> + __npte = PM_LEVEL_PDE(level, iommu_virt_to_phys(page));
>
> /* pte could have been changed somewhere. */
> if (cmpxchg64(pte, __pte, __npte) != __pte) {
> @@ -1481,10 +1483,10 @@ static int iommu_map_page(struct protection_domain *dom,
> return -EBUSY;
>
> if (count > 1) {
> - __pte = PAGE_SIZE_PTE(phys_addr, page_size);
> + __pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size);
> __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
> } else
> - __pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
> + __pte = __sme_set(phys_addr) | IOMMU_PTE_P | IOMMU_PTE_FC;
>
> if (prot & IOMMU_PROT_IR)
> __pte |= IOMMU_PTE_IR;
> @@ -1700,7 +1702,7 @@ static void free_gcr3_tbl_level1(u64 *tbl)
> if (!(tbl[i] & GCR3_VALID))
> continue;
>
> - ptr = __va(tbl[i] & PAGE_MASK);
> + ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
>
> free_page((unsigned long)ptr);
> }
> @@ -1715,7 +1717,7 @@ static void free_gcr3_tbl_level2(u64 *tbl)
> if (!(tbl[i] & GCR3_VALID))
> continue;
>
> - ptr = __va(tbl[i] & PAGE_MASK);
> + ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
>
> free_gcr3_tbl_level1(ptr);
> }
> @@ -1807,7 +1809,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
> u64 flags = 0;
>
> if (domain->mode != PAGE_MODE_NONE)
> - pte_root = virt_to_phys(domain->pt_root);
> + pte_root = iommu_virt_to_phys(domain->pt_root);
>
> pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
> << DEV_ENTRY_MODE_SHIFT;
> @@ -1819,7 +1821,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
> flags |= DTE_FLAG_IOTLB;
>
> if (domain->flags & PD_IOMMUV2_MASK) {
> - u64 gcr3 = __pa(domain->gcr3_tbl);
> + u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
> u64 glx = domain->glx;
> u64 tmp;
>
> @@ -3470,10 +3472,10 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
> if (root == NULL)
> return NULL;
>
> - *pte = __pa(root) | GCR3_VALID;
> + *pte = iommu_virt_to_phys(root) | GCR3_VALID;
> }
>
> - root = __va(*pte & PAGE_MASK);
> + root = iommu_phys_to_virt(*pte & PAGE_MASK);
>
> level -= 1;
> }
> @@ -3652,7 +3654,7 @@ static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
>
> dte = amd_iommu_dev_table[devid].data[2];
> dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
> - dte |= virt_to_phys(table->table);
> + dte |= iommu_virt_to_phys(table->table);
> dte |= DTE_IRQ_REMAP_INTCTL;
> dte |= DTE_IRQ_TABLE_LEN;
> dte |= DTE_IRQ_REMAP_ENABLE;
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 5a11328..2870a6b 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -29,6 +29,7 @@
> #include <linux/export.h>
> #include <linux/iommu.h>
> #include <linux/kmemleak.h>
> +#include <linux/mem_encrypt.h>
> #include <asm/pci-direct.h>
> #include <asm/iommu.h>
> #include <asm/gart.h>
> @@ -346,7 +347,7 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
>
> BUG_ON(iommu->mmio_base == NULL);
>
> - entry = virt_to_phys(amd_iommu_dev_table);
> + entry = iommu_virt_to_phys(amd_iommu_dev_table);
> entry |= (dev_table_size >> 12) - 1;
> memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
> &entry, sizeof(entry));
> @@ -602,7 +603,7 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
>
> BUG_ON(iommu->cmd_buf == NULL);
>
> - entry = (u64)virt_to_phys(iommu->cmd_buf);
> + entry = iommu_virt_to_phys(iommu->cmd_buf);
> entry |= MMIO_CMD_SIZE_512;
>
> memcpy_toio(iommu->mmio_base + MMIO_CMD_BUF_OFFSET,
> @@ -631,7 +632,7 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
>
> BUG_ON(iommu->evt_buf == NULL);
>
> - entry = (u64)virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
> + entry = iommu_virt_to_phys(iommu->evt_buf) | EVT_LEN_MASK;
>
> memcpy_toio(iommu->mmio_base + MMIO_EVT_BUF_OFFSET,
> &entry, sizeof(entry));
> @@ -664,7 +665,7 @@ static void iommu_enable_ppr_log(struct amd_iommu *iommu)
> if (iommu->ppr_log == NULL)
> return;
>
> - entry = (u64)virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512;
> + entry = iommu_virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512;
>
> memcpy_toio(iommu->mmio_base + MMIO_PPR_LOG_OFFSET,
> &entry, sizeof(entry));
> @@ -744,10 +745,10 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
> if (!iommu->ga_log_tail)
> goto err_out;
>
> - entry = (u64)virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
> + entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
> memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
> &entry, sizeof(entry));
> - entry = ((u64)virt_to_phys(iommu->ga_log) & 0xFFFFFFFFFFFFFULL) & ~7ULL;
> + entry = (iommu_virt_to_phys(iommu->ga_log) & 0xFFFFFFFFFFFFFULL) & ~7ULL;
> memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
> &entry, sizeof(entry));
> writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
> @@ -2552,6 +2553,11 @@ int __init amd_iommu_detect(void)
> if (amd_iommu_disabled)
> return -ENODEV;
>
> + if (!sme_iommu_supported()) {
> + pr_notice("AMD-Vi: IOMMU not supported when SME is active\n");
> + return -ENODEV;
> + }
> +
> ret = iommu_go_to_state(IOMMU_IVRS_DETECTED);
> if (ret)
> return ret;
> diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
> index 466260f..3f12fb2 100644
> --- a/drivers/iommu/amd_iommu_proto.h
> +++ b/drivers/iommu/amd_iommu_proto.h
> @@ -87,4 +87,14 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
> return !!(iommu->features & f);
> }
>
> +static inline u64 iommu_virt_to_phys(void *vaddr)
> +{
> + return (u64)__sme_set(virt_to_phys(vaddr));
> +}
> +
> +static inline void *iommu_phys_to_virt(unsigned long paddr)
> +{
> + return phys_to_virt(__sme_clr(paddr));
> +}
> +
> #endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
> diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
> index 4de8f41..3ce587d 100644
> --- a/drivers/iommu/amd_iommu_types.h
> +++ b/drivers/iommu/amd_iommu_types.h
> @@ -343,7 +343,7 @@
>
> #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
> #define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_P)
> -#define IOMMU_PTE_PAGE(pte) (phys_to_virt((pte) & IOMMU_PAGE_MASK))
> +#define IOMMU_PTE_PAGE(pte) (iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK))
> #define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
>
> #define IOMMU_PROT_MASK 0x03
> diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
> index fb02ff0..bbc49e1 100644
> --- a/include/asm-generic/mem_encrypt.h
> +++ b/include/asm-generic/mem_encrypt.h
> @@ -27,6 +27,11 @@ static inline u64 sme_dma_mask(void)
> return 0ULL;
> }
>
> +static inline bool sme_iommu_supported(void)
> +{
> + return true;
> +}
> +
> /*
> * The __sme_set() and __sme_clr() macros are useful for adding or removing
> * the encryption mask from a value (e.g. when dealing with pagetable
>
> _______________________________________________
> iommu mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
Hi Tom,
This sounds like a cool feature. I'm trying to test it on my Ryzen
system, but c->microcode & 0xf000 is evaluating as 0, so IOMMU is not
being enabled on my system. I'm using the latest microcode for AGESA
1.0.0.6, 0x08001126. Is this work reliant on a future microcode
update, or is there some other issue?
Thanks,
Sarnex
On Wed, Jun 7, 2017 at 3:13 PM, Tom Lendacky <[email protected]> wrote:
> This patch series provides support for AMD's new Secure Memory Encryption (SME)
> feature.
>
> SME can be used to mark individual pages of memory as encrypted through the
> page tables. A page of memory that is marked encrypted will be automatically
> decrypted when read from DRAM and will be automatically encrypted when
> written to DRAM. Details on SME can found in the links below.
>
> The SME feature is identified through a CPUID function and enabled through
> the SYSCFG MSR. Once enabled, page table entries will determine how the
> memory is accessed. If a page table entry has the memory encryption mask set,
> then that memory will be accessed as encrypted memory. The memory encryption
> mask (as well as other related information) is determined from settings
> returned through the same CPUID function that identifies the presence of the
> feature.
>
> The approach that this patch series takes is to encrypt everything possible
> starting early in the boot where the kernel is encrypted. Using the page
> table macros the encryption mask can be incorporated into all page table
> entries and page allocations. By updating the protection map, userspace
> allocations are also marked encrypted. Certain data must be accounted for
> as having been placed in memory before SME was enabled (EFI, initrd, etc.)
> and accessed accordingly.
>
> This patch series is a pre-cursor to another AMD processor feature called
> Secure Encrypted Virtualization (SEV). The support for SEV will build upon
> the SME support and will be submitted later. Details on SEV can be found
> in the links below.
>
> The following links provide additional detail:
>
> AMD Memory Encryption whitepaper:
> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>
> AMD64 Architecture Programmer's Manual:
> http://support.amd.com/TechDocs/24593.pdf
> SME is section 7.10
> SEV is section 15.34
>
> ---
>
> This patch series is based off of the master branch of tip.
> Commit 53614fbd7961 ("Merge branch 'WIP.x86/fpu'")
>
> Source code is also available at https://github.com/codomania/tip/tree/sme-v6
>
>
> Still to do:
> - Kdump support, including using memremap() instead of ioremap_cache()
>
> Changes since v5:
> - Added support for 5-level paging
> - Added IOMMU support
> - Created a generic asm/mem_encrypt.h in order to remove a bunch of
> #ifndef/#define entries
> - Removed changes to the __va() macro and defined a function to return
> the true physical address in cr3
> - Removed sysfs support as it was determined not to be needed
> - General code cleanup based on feedback
> - General cleanup of patch subjects and descriptions
>
> Changes since v4:
> - Re-worked mapping of setup data to not use a fixed list. Rather, check
> dynamically whether the requested early_memremap()/memremap() call
> needs to be mapped decrypted.
> - Moved SME cpu feature into scattered features
> - Moved some declarations into header files
> - Cleared the encryption mask from the __PHYSICAL_MASK so that users
> of macros such as pmd_pfn_mask() don't have to worry/know about the
> encryption mask
> - Updated some return types and values related to EFI and e820 functions
> so that an error could be returned
> - During cpu shutdown, removed cache disabling and added a check for kexec
> in progress to use wbinvd followed immediately by halt in order to avoid
> any memory corruption
> - Update how persistent memory is identified
> - Added a function to find command line arguments and their values
> - Added sysfs support
> - General code cleanup based on feedback
> - General cleanup of patch subjects and descriptions
>
>
> Changes since v3:
> - Broke out some of the patches into smaller individual patches
> - Updated Documentation
> - Added a message to indicate why the IOMMU was disabled
> - Updated CPU feature support for SME by taking into account whether
> BIOS has enabled SME
> - Eliminated redundant functions
> - Added some warning messages for DMA usage of bounce buffers when SME
> is active
> - Added support for persistent memory
> - Added support to determine when setup data is being mapped and be sure
> to map it un-encrypted
> - Added CONFIG support to set the default action of whether to activate
> SME if it is supported/enabled
> - Added support for (re)booting with kexec
>
> Changes since v2:
> - Updated Documentation
> - Make the encryption mask available outside of arch/x86 through a
> standard include file
> - Conversion of assembler routines to C where possible (not everything
> could be converted, e.g. the routine that does the actual encryption
> needs to be copied into a safe location and it is difficult to
> determine the actual length of the function in order to copy it)
> - Fix SME feature use of scattered CPUID feature
> - Creation of SME specific functions for things like encrypting
> the setup data, ramdisk, etc.
> - New take on early_memremap / memremap encryption support
> - Additional support for accessing video buffers (fbdev/gpu) as
> un-encrypted
> - Disable IOMMU for now - need to investigate further in relation to
> how it needs to be programmed relative to accessing physical memory
>
> Changes since v1:
> - Added Documentation.
> - Removed AMD vendor check for setting the PAT write protect mode
> - Updated naming of trampoline flag for SME as well as moving of the
> SME check to before paging is enabled.
> - Change to early_memremap to identify the data being mapped as either
> boot data or kernel data. The idea being that boot data will have
> been placed in memory as un-encrypted data and would need to be accessed
> as such.
> - Updated debugfs support for the bootparams to access the data properly.
> - Do not set the SYSCFG[MEME] bit, only check it. The setting of the
> MemEncryptionModeEn bit results in a reduction of physical address size
> of the processor. It is possible that BIOS could have configured resources
> resources into a range that will now not be addressable. To prevent this,
> rely on BIOS to set the SYSCFG[MEME] bit and only then enable memory
> encryption support in the kernel.
>
> Tom Lendacky (34):
> x86: Document AMD Secure Memory Encryption (SME)
> x86/mm/pat: Set write-protect cache mode for full PAT support
> x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap for RAM mappings
> x86/CPU/AMD: Add the Secure Memory Encryption CPU feature
> x86/CPU/AMD: Handle SME reduction in physical address size
> x86/mm: Add Secure Memory Encryption (SME) support
> x86/mm: Don't use phys_to_virt in ioremap() if SME is active
> x86/mm: Add support to enable SME in early boot processing
> x86/mm: Simplify p[gum]d_page() macros
> x86, x86/mm, x86/xen, olpc: Use __va() against just the physical address in cr3
> x86/mm: Provide general kernel support for memory encryption
> x86/mm: Extend early_memremap() support with additional attrs
> x86/mm: Add support for early encrypt/decrypt of memory
> x86/mm: Insure that boot memory areas are mapped properly
> x86/boot/e820: Add support to determine the E820 type of an address
> efi: Add an EFI table address match function
> efi: Update efi_mem_type() to return an error rather than 0
> x86/efi: Update EFI pagetable creation to work with SME
> x86/mm: Add support to access boot related data in the clear
> x86, mpparse: Use memremap to map the mpf and mpc data
> x86/mm: Add support to access persistent memory in the clear
> x86/mm: Add support for changing the memory encryption attribute
> x86, realmode: Decrypt trampoline area if memory encryption is active
> x86, swiotlb: Add memory encryption support
> swiotlb: Add warnings for use of bounce buffers with SME
> iommu/amd: Allow the AMD IOMMU to work with memory encryption
> x86, realmode: Check for memory encryption on the APs
> x86, drm, fbdev: Do not specify encrypted memory for video mappings
> kvm: x86: svm: Support Secure Memory Encryption within KVM
> x86/mm, kexec: Allow kexec to be used with SME
> x86/mm: Use proper encryption attributes with /dev/mem
> x86/mm: Add support to encrypt the kernel in-place
> x86/boot: Add early cmdline parsing for options with arguments
> x86/mm: Add support to make use of Secure Memory Encryption
>
>
> Documentation/admin-guide/kernel-parameters.txt | 11
> Documentation/x86/amd-memory-encryption.txt | 68 ++
> arch/ia64/kernel/efi.c | 4
> arch/x86/Kconfig | 26 +
> arch/x86/boot/compressed/pagetable.c | 7
> arch/x86/include/asm/cmdline.h | 2
> arch/x86/include/asm/cpufeatures.h | 1
> arch/x86/include/asm/dma-mapping.h | 5
> arch/x86/include/asm/dmi.h | 8
> arch/x86/include/asm/e820/api.h | 2
> arch/x86/include/asm/fixmap.h | 20 +
> arch/x86/include/asm/init.h | 1
> arch/x86/include/asm/io.h | 7
> arch/x86/include/asm/kexec.h | 8
> arch/x86/include/asm/kvm_host.h | 2
> arch/x86/include/asm/mem_encrypt.h | 112 ++++
> arch/x86/include/asm/msr-index.h | 2
> arch/x86/include/asm/page_types.h | 2
> arch/x86/include/asm/pgtable.h | 28 +
> arch/x86/include/asm/pgtable_types.h | 54 +-
> arch/x86/include/asm/processor.h | 3
> arch/x86/include/asm/realmode.h | 12
> arch/x86/include/asm/set_memory.h | 3
> arch/x86/include/asm/special_insns.h | 9
> arch/x86/include/asm/vga.h | 14
> arch/x86/kernel/acpi/boot.c | 6
> arch/x86/kernel/cpu/amd.c | 17 +
> arch/x86/kernel/cpu/scattered.c | 1
> arch/x86/kernel/e820.c | 26 +
> arch/x86/kernel/espfix_64.c | 2
> arch/x86/kernel/head64.c | 42 +
> arch/x86/kernel/head_64.S | 80 ++-
> arch/x86/kernel/kdebugfs.c | 34 -
> arch/x86/kernel/ksysfs.c | 28 -
> arch/x86/kernel/machine_kexec_64.c | 35 +
> arch/x86/kernel/mpparse.c | 108 +++-
> arch/x86/kernel/pci-dma.c | 11
> arch/x86/kernel/pci-nommu.c | 2
> arch/x86/kernel/pci-swiotlb.c | 15 -
> arch/x86/kernel/process.c | 17 +
> arch/x86/kernel/setup.c | 9
> arch/x86/kvm/mmu.c | 12
> arch/x86/kvm/mmu.h | 2
> arch/x86/kvm/svm.c | 35 +
> arch/x86/kvm/vmx.c | 3
> arch/x86/kvm/x86.c | 3
> arch/x86/lib/cmdline.c | 105 ++++
> arch/x86/mm/Makefile | 3
> arch/x86/mm/fault.c | 10
> arch/x86/mm/ident_map.c | 12
> arch/x86/mm/ioremap.c | 277 +++++++++-
> arch/x86/mm/kasan_init_64.c | 4
> arch/x86/mm/mem_encrypt.c | 667 +++++++++++++++++++++++
> arch/x86/mm/mem_encrypt_boot.S | 150 +++++
> arch/x86/mm/pageattr.c | 67 ++
> arch/x86/mm/pat.c | 9
> arch/x86/pci/common.c | 4
> arch/x86/platform/efi/efi.c | 6
> arch/x86/platform/efi/efi_64.c | 15 -
> arch/x86/platform/olpc/olpc-xo1-pm.c | 2
> arch/x86/power/hibernate_64.c | 2
> arch/x86/realmode/init.c | 15 +
> arch/x86/realmode/rm/trampoline_64.S | 24 +
> arch/x86/xen/mmu_pv.c | 6
> drivers/firmware/dmi-sysfs.c | 5
> drivers/firmware/efi/efi.c | 33 +
> drivers/firmware/pcdp.c | 4
> drivers/gpu/drm/drm_gem.c | 2
> drivers/gpu/drm/drm_vm.c | 4
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 7
> drivers/gpu/drm/udl/udl_fb.c | 4
> drivers/iommu/amd_iommu.c | 36 +
> drivers/iommu/amd_iommu_init.c | 18 -
> drivers/iommu/amd_iommu_proto.h | 10
> drivers/iommu/amd_iommu_types.h | 2
> drivers/sfi/sfi_core.c | 22 -
> drivers/video/fbdev/core/fbmem.c | 12
> include/asm-generic/early_ioremap.h | 2
> include/asm-generic/mem_encrypt.h | 45 ++
> include/asm-generic/pgtable.h | 8
> include/linux/dma-mapping.h | 9
> include/linux/efi.h | 9
> include/linux/io.h | 2
> include/linux/kexec.h | 14
> include/linux/mem_encrypt.h | 18 +
> include/linux/swiotlb.h | 1
> init/main.c | 13
> kernel/kexec_core.c | 6
> kernel/memremap.c | 20 +
> lib/swiotlb.c | 59 ++
> mm/early_ioremap.c | 30 +
> 91 files changed, 2411 insertions(+), 261 deletions(-)
> create mode 100644 Documentation/x86/amd-memory-encryption.txt
> create mode 100644 arch/x86/include/asm/mem_encrypt.h
> create mode 100644 arch/x86/mm/mem_encrypt.c
> create mode 100644 arch/x86/mm/mem_encrypt_boot.S
> create mode 100644 include/asm-generic/mem_encrypt.h
> create mode 100644 include/linux/mem_encrypt.h
>
> --
> Tom Lendacky
Hi Tom,
Thanks for your work on this. This may be a stupid question, but is
using bounce buffers for the GPU(s) expected to reduce performance in
any noticeable way? I'm hitting another issue which I've already
sent mail about, so I can't test it for myself at the moment.
Thanks,
Sarnex
Hi Tom,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.12-rc4 next-20170607]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Tom-Lendacky/x86-Secure-Memory-Encryption-AMD/20170608-104147
config: i386-randconfig-x077-06040719 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All error/warnings (new ones prefixed by >>):
In file included from arch/x86/include/asm/dma.h:12:0,
from include/linux/bootmem.h:9,
from arch/x86/mm/ioremap.c:9:
>> arch/x86/include/asm/io.h:386:37: error: conflicting types for 'arch_memremap_can_ram_remap'
#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
^
>> arch/x86/mm/ioremap.c:561:6: note: in expansion of macro 'arch_memremap_can_ram_remap'
bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/io.h:384:13: note: previous declaration of 'arch_memremap_can_ram_remap' was here
extern bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~
--
In file included from arch/x86/include/asm/dma.h:12:0,
from include/linux/bootmem.h:9,
from arch/x86//mm/ioremap.c:9:
>> arch/x86/include/asm/io.h:386:37: error: conflicting types for 'arch_memremap_can_ram_remap'
#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
^
arch/x86//mm/ioremap.c:561:6: note: in expansion of macro 'arch_memremap_can_ram_remap'
bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/io.h:384:13: note: previous declaration of 'arch_memremap_can_ram_remap' was here
extern bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/arch_memremap_can_ram_remap +386 arch/x86/include/asm/io.h
380 extern void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size);
381 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
382 #endif
383
384 extern bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size,
385 unsigned long flags);
> 386 #define arch_memremap_can_ram_remap arch_memremap_can_ram_remap
387
388 #endif /* _ASM_X86_IO_H */
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
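For readers following along: the conflict above is a type mismatch between the
declaration in asm/io.h (size_t size) and the definition in ioremap.c
(unsigned long size), which only collides on 32-bit builds where size_t is
unsigned int. A minimal sketch of one way to resolve it, assuming the
unsigned long form is kept on both sides:

/* arch/x86/include/asm/io.h: declaration (sketch) */
extern bool arch_memremap_can_ram_remap(resource_size_t offset,
                                        unsigned long size,
                                        unsigned long flags);
#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap

/* arch/x86/mm/ioremap.c: the definition must use identical types */
bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
                                 unsigned long flags)
{
        /* SME-specific checks on the range would go here */
        return true;
}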
Hi Tom,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.12-rc4 next-20170607]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Tom-Lendacky/x86-Secure-Memory-Encryption-AMD/20170608-104147
config: sparc-defconfig (attached as .config)
compiler: sparc-linux-gcc (GCC) 6.2.0
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc
All errors (new ones prefixed by >>):
In file included from include/linux/dma-mapping.h:13:0,
from include/linux/skbuff.h:34,
from include/linux/filter.h:12,
from kernel//bpf/core.c:24:
>> include/linux/mem_encrypt.h:16:29: fatal error: asm/mem_encrypt.h: No such file or directory
#include <asm/mem_encrypt.h>
^
compilation terminated.
vim +16 include/linux/mem_encrypt.h
2d7c2ec4 Tom Lendacky 2017-06-07 10 * published by the Free Software Foundation.
2d7c2ec4 Tom Lendacky 2017-06-07 11 */
2d7c2ec4 Tom Lendacky 2017-06-07 12
2d7c2ec4 Tom Lendacky 2017-06-07 13 #ifndef __MEM_ENCRYPT_H__
2d7c2ec4 Tom Lendacky 2017-06-07 14 #define __MEM_ENCRYPT_H__
2d7c2ec4 Tom Lendacky 2017-06-07 15
2d7c2ec4 Tom Lendacky 2017-06-07 @16 #include <asm/mem_encrypt.h>
2d7c2ec4 Tom Lendacky 2017-06-07 17
2d7c2ec4 Tom Lendacky 2017-06-07 18 #endif /* __MEM_ENCRYPT_H__ */
:::::: The code at line 16 was first introduced by commit
:::::: 2d7c2ec4c60e83432b27bfb32042706f404d4158 x86/mm: Add Secure Memory Encryption (SME) support
:::::: TO: Tom Lendacky <[email protected]>
:::::: CC: 0day robot <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Wed, Jun 7, 2017 at 12:14 PM, Tom Lendacky <[email protected]> wrote:
> The cr3 register entry can contain the SME encryption bit that indicates
> the PGD is encrypted. The encryption bit should not be used when creating
> a virtual address for the PGD table.
>
> Create a new function, read_cr3_pa(), that will extract the physical
> address from the cr3 register. This function is then used where a virtual
> address of the PGD needs to be created/used from the cr3 register.
This is going to conflict with:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=555c81e5d01a62b629ec426a2f50d27e2127c1df
We're both encountering the fact that CR3 munges the page table PA
with some other stuff, and some readers want to see the actual CR3
value and other readers just want the PA. The thing I prefer about my
patch is that I get rid of read_cr3() entirely, forcing the patch to
update every single reader, making review and conflict resolution much
safer.
I'd be willing to send a patch tomorrow that just does the split into
__read_cr3() and read_cr3_pa() (I like your name better) and then we
can both base on top of it. Would that make sense?
Also:
> +static inline unsigned long read_cr3_pa(void)
> +{
> + return (read_cr3() & PHYSICAL_PAGE_MASK);
> +}
Is there any guarantee that the magic encryption bit is masked out in
PHYSICAL_PAGE_MASK? The docs make it sound like it could be any bit.
(But if it's one of the low 12 bits, that would be quite confusing.)
--Andy
Hi Tom,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.12-rc4 next-20170607]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Tom-Lendacky/x86-Secure-Memory-Encryption-AMD/20170608-104147
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=x86_64
All errors (new ones prefixed by >>):
In file included from arch/x86/include/asm/cacheflush.h:6:0,
from include/linux/highmem.h:11,
from net/core/sock.c:116:
arch/x86/include/asm/special_insns.h: In function 'native_read_cr3_pa':
>> arch/x86/include/asm/special_insns.h:239:30: error: 'PHYSICAL_PAGE_MASK' undeclared (first use in this function)
return (native_read_cr3() & PHYSICAL_PAGE_MASK);
^~~~~~~~~~~~~~~~~~
arch/x86/include/asm/special_insns.h:239:30: note: each undeclared identifier is reported only once for each function it appears in
arch/x86/include/asm/special_insns.h: In function 'read_cr3_pa':
arch/x86/include/asm/special_insns.h:244:23: error: 'PHYSICAL_PAGE_MASK' undeclared (first use in this function)
return (read_cr3() & PHYSICAL_PAGE_MASK);
^~~~~~~~~~~~~~~~~~
vim +/PHYSICAL_PAGE_MASK +239 arch/x86/include/asm/special_insns.h
233 }
234
235 #define nop() asm volatile ("nop")
236
237 static inline unsigned long native_read_cr3_pa(void)
238 {
> 239 return (native_read_cr3() & PHYSICAL_PAGE_MASK);
240 }
241
242 static inline unsigned long read_cr3_pa(void)
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Wed, Jun 07, 2017 at 02:17:32PM -0500, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active. Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.
And what would the action be? Do we need a boot or other option to
disallow this fallback for people who care deeply?
On 6/7/2017 5:06 PM, Boris Ostrovsky wrote:
> On 06/07/2017 03:14 PM, Tom Lendacky wrote:
>> The cr3 register entry can contain the SME encryption bit that indicates
>> the PGD is encrypted. The encryption bit should not be used when creating
>> a virtual address for the PGD table.
>>
>> Create a new function, read_cr3_pa(), that will extract the physical
>> address from the cr3 register. This function is then used where a virtual
>> address of the PGD needs to be created/used from the cr3 register.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/include/asm/special_insns.h | 9 +++++++++
>> arch/x86/kernel/head64.c | 2 +-
>> arch/x86/mm/fault.c | 10 +++++-----
>> arch/x86/mm/ioremap.c | 2 +-
>> arch/x86/platform/olpc/olpc-xo1-pm.c | 2 +-
>> arch/x86/power/hibernate_64.c | 2 +-
>> arch/x86/xen/mmu_pv.c | 6 +++---
>> 7 files changed, 21 insertions(+), 12 deletions(-)
>>
...
>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>> index 1f386d7..2dc5243 100644
>> --- a/arch/x86/xen/mmu_pv.c
>> +++ b/arch/x86/xen/mmu_pv.c
>> @@ -2022,7 +2022,7 @@ static phys_addr_t __init xen_early_virt_to_phys(unsigned long vaddr)
>> pmd_t pmd;
>> pte_t pte;
>>
>> - pa = read_cr3();
>> + pa = read_cr3_pa();
>> pgd = native_make_pgd(xen_read_phys_ulong(pa + pgd_index(vaddr) *
>> sizeof(pgd)));
>> if (!pgd_present(pgd))
>> @@ -2102,7 +2102,7 @@ void __init xen_relocate_p2m(void)
>> pt_phys = pmd_phys + PFN_PHYS(n_pmd);
>> p2m_pfn = PFN_DOWN(pt_phys) + n_pt;
>>
>> - pgd = __va(read_cr3());
>> + pgd = __va(read_cr3_pa());
>> new_p2m = (unsigned long *)(2 * PGDIR_SIZE);
>> idx_p4d = 0;
>> save_pud = n_pud;
>> @@ -2209,7 +2209,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
>> {
>> unsigned long pfn = PFN_DOWN(__pa(swapper_pg_dir));
>>
>> - BUG_ON(read_cr3() != __pa(initial_page_table));
>> + BUG_ON(read_cr3_pa() != __pa(initial_page_table));
>> BUG_ON(cr3 != __pa(swapper_pg_dir));
>>
>> /*
>
>
> (Please copy Xen maintainers when modifying xen-related files.)
Sorry about that, missed adding the Xen maintainers when I added this
change.
>
> Given that page tables for Xen PV guests are controlled by the
> hypervisor I don't think this change (although harmless) is necessary.
I can back this change out if the Xen maintainers think that's best.
> What may be needed is making sure X86_FEATURE_SME is not set for PV guests.
And that may be something that Xen will need to control through either
CPUID or MSR support for the PV guests.
Thanks,
Tom
>
> -boris
>
On 6/7/2017 9:38 PM, Nick Sarnie wrote:
> On Wed, Jun 7, 2017 at 3:17 PM, Tom Lendacky <[email protected]> wrote:
>> The IOMMU is programmed with physical addresses for the various tables
>> and buffers that are used to communicate between the device and the
>> driver. When the driver allocates this memory it is encrypted. In order
>> for the IOMMU to access the memory as encrypted the encryption mask needs
>> to be included in these physical addresses during configuration.
>>
>> The PTE entries created by the IOMMU should also include the encryption
>> mask so that when the device behind the IOMMU performs a DMA, the DMA
>> will be performed to encrypted memory.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/include/asm/mem_encrypt.h | 7 +++++++
>> arch/x86/mm/mem_encrypt.c | 30 ++++++++++++++++++++++++++++++
>> drivers/iommu/amd_iommu.c | 36 +++++++++++++++++++-----------------
>> drivers/iommu/amd_iommu_init.c | 18 ++++++++++++------
>> drivers/iommu/amd_iommu_proto.h | 10 ++++++++++
>> drivers/iommu/amd_iommu_types.h | 2 +-
>> include/asm-generic/mem_encrypt.h | 5 +++++
>> 7 files changed, 84 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index c7a2525..d86e544 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -31,6 +31,8 @@ void __init sme_early_decrypt(resource_size_t paddr,
>>
>> void __init sme_early_init(void);
>>
>> +bool sme_iommu_supported(void);
>> +
>> /* Architecture __weak replacement functions */
>> void __init mem_encrypt_init(void);
>>
>> @@ -62,6 +64,11 @@ static inline void __init sme_early_init(void)
>> {
>> }
>>
>> +static inline bool sme_iommu_supported(void)
>> +{
>> + return true;
>> +}
>> +
>> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>>
>> static inline bool sme_active(void)
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 5d7c51d..018b58a 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -197,6 +197,36 @@ void __init sme_early_init(void)
>> protection_map[i] = pgprot_encrypted(protection_map[i]);
>> }
>>
>> +bool sme_iommu_supported(void)
>> +{
>> + struct cpuinfo_x86 *c = &boot_cpu_data;
>> +
>> + if (!sme_me_mask || (c->x86 != 0x17))
>> + return true;
>> +
>> + /* For Fam17h, a specific level of support is required */
>> + switch (c->microcode & 0xf000) {
>> + case 0x0000:
>> + return false;
>> + case 0x1000:
>> + switch (c->microcode & 0x0f00) {
>> + case 0x0000:
>> + return false;
>> + case 0x0100:
>> + if ((c->microcode & 0xff) < 0x26)
>> + return false;
>> + break;
>> + case 0x0200:
>> + if ((c->microcode & 0xff) < 0x05)
>> + return false;
>> + break;
>> + }
>> + break;
>> + }
>> +
>> + return true;
>> +}
>> +
>> /* Architecture __weak replacement functions */
>> void __init mem_encrypt_init(void)
>> {
>
...
>
> Hi Tom,
>
> This sounds like a cool feature. I'm trying to test it on my Ryzen
> system, but c->microcode & 0xf000 is evaluating as 0, so IOMMU is not
> being enabled on my system. I'm using the latest microcode for AGESA
> 1.0.0.6, 0x08001126. Is this work reliant on a future microcode
> update, or is there some other issue?
This is my mistake. I moved the check and didn't re-test. At this point
the c->microcode field hasn't been filled in so I'll need to read
MSR_AMD64_PATCH_LEVEL directly in the sme_iommu_supported() function.
Thanks,
Tom
>
> Thanks,
> Sarnex
>
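For reference, a minimal sketch of the fix Tom describes, i.e. reading the
patch-level MSR directly instead of relying on boot_cpu_data.microcode, which
has not been filled in yet at this point (the surrounding level checks are
elided and the exact placement is an assumption):

bool sme_iommu_supported(void)
{
        struct cpuinfo_x86 *c = &boot_cpu_data;
        u32 ucode, dummy;

        if (!sme_me_mask || (c->x86 != 0x17))
                return true;

        /* c->microcode is not populated yet, so read the MSR directly */
        rdmsr(MSR_AMD64_PATCH_LEVEL, ucode, dummy);

        /* ... apply the same Fam17h level checks as above to 'ucode' ... */

        return true;
}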
On 6/7/2017 9:40 PM, Nick Sarnie wrote:
> On Wed, Jun 7, 2017 at 3:13 PM, Tom Lendacky <[email protected]> wrote:
>> This patch series provides support for AMD's new Secure Memory Encryption (SME)
>> feature.
...
>
>
> Hi Tom,
>
> Thanks for your work on this. This may be a stupid question, but is
> using bounce buffers for the GPU(s) expected to reduce performance in
> any noticeable way? I'm hitting another issue which I've already
> sent mail about, so I can't test it for myself at the moment.
That all depends on the workload, how much DMA is being performed, etc.
But it is extra overhead to use bounce buffers.
Thanks,
Tom
>
> Thanks,
> Sarnex
>
>
>> What may be needed is making sure X86_FEATURE_SME is not set for PV
>> guests.
>
> And that may be something that Xen will need to control through either
> CPUID or MSR support for the PV guests.
Only on newer versions of Xen. On earlier versions (2-3 years old) leaf
0x80000007 is passed to the guest unchanged. And so is MSR_K8_SYSCFG.
-boris
On 6/8/2017 3:51 PM, Boris Ostrovsky wrote:
>
>>
>>> What may be needed is making sure X86_FEATURE_SME is not set for PV
>>> guests.
>>
>> And that may be something that Xen will need to control through either
>> CPUID or MSR support for the PV guests.
>
>
> Only on newer versions of Xen. On earlier versions (2-3 years old) leaf
> 0x80000007 is passed to the guest unchanged. And so is MSR_K8_SYSCFG.
The SME feature is in leaf 0x8000001f, is that leaf passed to the guest
unchanged?
Thanks,
Tom
>
> -boris
>
On 6/8/2017 12:53 AM, kbuild test robot wrote:
> Hi Tom,
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.12-rc4 next-20170607]
> [cannot apply to tip/x86/core]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Tom-Lendacky/x86-Secure-Memory-Encryption-AMD/20170608-104147
> config: sparc-defconfig (attached as .config)
> compiler: sparc-linux-gcc (GCC) 6.2.0
> reproduce:
> wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=sparc
>
> All errors (new ones prefixed by >>):
>
> In file included from include/linux/dma-mapping.h:13:0,
> from include/linux/skbuff.h:34,
> from include/linux/filter.h:12,
> from kernel//bpf/core.c:24:
>>> include/linux/mem_encrypt.h:16:29: fatal error: asm/mem_encrypt.h: No such file or directory
> #include <asm/mem_encrypt.h>
> ^
> compilation terminated.
Okay, I had the wrong understanding of the asm-generic directory. The
next series will fix this so it is not an issue for other arches.
Thanks,
Tom
>
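A minimal sketch of the shape such a fix could take (the Kconfig symbol name
used here is an assumption): have include/linux/mem_encrypt.h only pull in an
architecture header when the architecture provides one, and fall back to a
zero mask otherwise:

#ifndef __MEM_ENCRYPT_H__
#define __MEM_ENCRYPT_H__

#ifdef CONFIG_ARCH_HAS_MEM_ENCRYPT
#include <asm/mem_encrypt.h>
#else
/* Architectures without memory encryption support see a zero mask */
#define sme_me_mask     0UL
#endif

#endif  /* __MEM_ENCRYPT_H__ */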
On 06/08/2017 05:02 PM, Tom Lendacky wrote:
> On 6/8/2017 3:51 PM, Boris Ostrovsky wrote:
>>
>>>
>>>> What may be needed is making sure X86_FEATURE_SME is not set for PV
>>>> guests.
>>>
>>> And that may be something that Xen will need to control through either
>>> CPUID or MSR support for the PV guests.
>>
>>
>> Only on newer versions of Xen. On earlier versions (2-3 years old) leaf
>> 0x80000007 is passed to the guest unchanged. And so is MSR_K8_SYSCFG.
>
> The SME feature is in leaf 0x8000001f, is that leaf passed to the guest
> unchanged?
Oh, I misread the patch where X86_FEATURE_SME is defined. Then all
versions, including the current one, pass it unchanged.
All that's needed is setup_clear_cpu_cap(X86_FEATURE_SME) in
xen_init_capabilities().
-boris
On 08/06/2017 22:17, Boris Ostrovsky wrote:
> On 06/08/2017 05:02 PM, Tom Lendacky wrote:
>> On 6/8/2017 3:51 PM, Boris Ostrovsky wrote:
>>>>> What may be needed is making sure X86_FEATURE_SME is not set for PV
>>>>> guests.
>>>> And that may be something that Xen will need to control through either
>>>> CPUID or MSR support for the PV guests.
>>>
>>> Only on newer versions of Xen. On earlier versions (2-3 years old) leaf
>>> 0x80000007 is passed to the guest unchanged. And so is MSR_K8_SYSCFG.
>> The SME feature is in leaf 0x8000001f, is that leaf passed to the guest
>> unchanged?
> Oh, I misread the patch where X86_FEATURE_SME is defined. Then all
> versions, including the current one, pass it unchanged.
>
> All that's needed is setup_clear_cpu_cap(X86_FEATURE_SME) in
> xen_init_capabilities().
AMD processors still don't support CPUID Faulting (or at least, I
couldn't find any reference to it in the latest docs), so we cannot
actually hide SME from a guest which goes looking at native CPUID.
Furthermore, I'm not aware of any CPUID masking support covering that leaf.
However, if Linux is using the paravirtual cpuid hook, things are
slightly better.
On Xen 4.9 and later, no guests will see the feature. On earlier
versions of Xen (before I fixed the logic), plain domUs will not see the
feature, while dom0 will.
For safety, I'd recommend unilaterally clobbering the feature as Boris
suggested. There is no way SME will be supportable on a per-PV guest
basis, although (as far as I am aware) Xen as a whole would be able to
encompass itself and all of its PV guests inside one single SME instance.
~Andrew
On 6/8/2017 1:05 AM, Andy Lutomirski wrote:
> On Wed, Jun 7, 2017 at 12:14 PM, Tom Lendacky <[email protected]> wrote:
>> The cr3 register entry can contain the SME encryption bit that indicates
>> the PGD is encrypted. The encryption bit should not be used when creating
>> a virtual address for the PGD table.
>>
>> Create a new function, read_cr3_pa(), that will extract the physical
>> address from the cr3 register. This function is then used where a virtual
>> address of the PGD needs to be created/used from the cr3 register.
>
> This is going to conflict with:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=555c81e5d01a62b629ec426a2f50d27e2127c1df
>
> We're both encountering the fact that CR3 munges the page table PA
> with some other stuff, and some readers want to see the actual CR3
> value and other readers just want the PA. The thing I prefer about my
> patch is that I get rid of read_cr3() entirely, forcing the patch to
> update every single reader, making review and conflict resolution much
> safer.
>
> I'd be willing to send a patch tomorrow that just does the split into
> __read_cr3() and read_cr3_pa() (I like your name better) and then we
> can both base on top of it. Would that make sense?
That makes sense to me.
>
> Also:
>
>> +static inline unsigned long read_cr3_pa(void)
>> +{
>> + return (read_cr3() & PHYSICAL_PAGE_MASK);
>> +}
>
> Is there any guarantee that the magic encryption bit is masked out in
> PHYSICAL_PAGE_MASK? The docs make it sound like it could be any bit.
> (But if it's one of the low 12 bits, that would be quite confusing.)
Right now it's bit 47 and we're steering away from any of the currently
reserved bits so we should be safe.
Thanks,
Tom
>
> --Andy
>
On 6/8/2017 2:58 AM, Christoph Hellwig wrote:
> On Wed, Jun 07, 2017 at 02:17:32PM -0500, Tom Lendacky wrote:
>> Add warnings to let the user know when bounce buffers are being used for
>> DMA when SME is active. Since the bounce buffers are not in encrypted
>> memory, these notifications are to allow the user to determine some
>> appropriate action - if necessary.
>
> And what would the action be? Do we need a boot or other option to
> disallow this fallback for people who care deeply?
The action could be to enable the IOMMU so that lower DMA addresses are
used, to replace the device with one that supports 64-bit DMA or, if
the device is not used much, to just ignore it.
I'm not sure we need an option to disallow the fallback. Are you
thinking along the lines of disabling the device?
Thanks,
Tom
>
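For context, a rough sketch of the kind of notification being discussed; the
helper name and its placement in the DMA-mapping path are hypothetical:

/* Hypothetical helper: warn once per device when SME is active and the
 * device's DMA mask means unencrypted bounce buffers will be used. */
static void sme_dma_mask_check(struct device *dev, u64 dma_mask)
{
        if (sme_active() && (dma_mask < DMA_BIT_MASK(64)))
                dev_warn_once(dev,
                              "SME is active and the device will require DMA bounce buffers\n");
}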
On Wed, Jun 07, 2017 at 02:13:53PM -0500, Tom Lendacky wrote:
> Update the CPU features to include identifying and reporting on the
> Secure Memory Encryption (SME) feature. SME is identified by CPUID
> 0x8000001f, but requires BIOS support to enable it (set bit 23 of
> MSR_K8_SYSCFG). Only show the SME feature as available if reported by
> CPUID and enabled by BIOS.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/msr-index.h | 2 ++
> arch/x86/kernel/cpu/amd.c | 13 +++++++++++++
> arch/x86/kernel/cpu/scattered.c | 1 +
> 4 files changed, 17 insertions(+)
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:14:04PM -0500, Tom Lendacky wrote:
> When System Memory Encryption (SME) is enabled, the physical address
> space is reduced. Adjust the x86_phys_bits value to reflect this
> reduction.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/kernel/cpu/amd.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:14:16PM -0500, Tom Lendacky wrote:
> Add support for Secure Memory Encryption (SME). This initial support
> provides a Kconfig entry to build the SME support into the kernel and
> defines the memory encryption mask that will be used in subsequent
> patches to mark pages as encrypted.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/Kconfig | 22 ++++++++++++++++++++++
> arch/x86/include/asm/mem_encrypt.h | 35 +++++++++++++++++++++++++++++++++++
> arch/x86/mm/Makefile | 1 +
> arch/x86/mm/mem_encrypt.c | 21 +++++++++++++++++++++
> include/asm-generic/mem_encrypt.h | 27 +++++++++++++++++++++++++++
> include/linux/mem_encrypt.h | 18 ++++++++++++++++++
> 6 files changed, 124 insertions(+)
> create mode 100644 arch/x86/include/asm/mem_encrypt.h
> create mode 100644 arch/x86/mm/mem_encrypt.c
> create mode 100644 include/asm-generic/mem_encrypt.h
> create mode 100644 include/linux/mem_encrypt.h
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/8/2017 5:01 PM, Andrew Cooper wrote:
> On 08/06/2017 22:17, Boris Ostrovsky wrote:
>> On 06/08/2017 05:02 PM, Tom Lendacky wrote:
>>> On 6/8/2017 3:51 PM, Boris Ostrovsky wrote:
>>>>>> What may be needed is making sure X86_FEATURE_SME is not set for PV
>>>>>> guests.
>>>>> And that may be something that Xen will need to control through either
>>>>> CPUID or MSR support for the PV guests.
>>>>
>>>> Only on newer versions of Xen. On earlier versions (2-3 years old) leaf
>>>> 0x80000007 is passed to the guest unchanged. And so is MSR_K8_SYSCFG.
>>> The SME feature is in leaf 0x8000001f, is that leaf passed to the guest
>>> unchanged?
>> Oh, I misread the patch where X86_FEATURE_SME is defined. Then all
>> versions, including the current one, pass it unchanged.
>>
>> All that's needed is setup_clear_cpu_cap(X86_FEATURE_SME) in
>> xen_init_capabilities().
>
> AMD processors still don't support CPUID Faulting (or at least, I
> couldn't find any reference to it in the latest docs), so we cannot
> actually hide SME from a guest which goes looking at native CPUID.
> Furthermore, I'm not aware of any CPUID masking support covering that leaf.
>
> However, if Linux is using the paravirtual cpuid hook, things are
> slightly better.
>
> On Xen 4.9 and later, no guests will see the feature. On earlier
> versions of Xen (before I fixed the logic), plain domUs will not see the
> feature, while dom0 will.
>
> For safety, I'd recommend unilaterally clobbering the feature as Boris
> suggested. There is no way SME will be supportable on a per-PV guest
That may be too late. Early boot support in head_64.S will make calls to
check for the feature (through CPUID and MSR), set the sme_me_mask and
encrypt the kernel in place. Is there another way to approach this?
> basis, although (as far as I am aware) Xen as a whole would be able to
> encompass itself and all of its PV guests inside one single SME instance.
Yes, that is correct.
Thanks,
Tom
>
> ~Andrew
>
On 06/09/2017 02:36 PM, Tom Lendacky wrote:
> On 6/8/2017 5:01 PM, Andrew Cooper wrote:
>> On 08/06/2017 22:17, Boris Ostrovsky wrote:
>>> On 06/08/2017 05:02 PM, Tom Lendacky wrote:
>>>> On 6/8/2017 3:51 PM, Boris Ostrovsky wrote:
>>>>>>> What may be needed is making sure X86_FEATURE_SME is not set for PV
>>>>>>> guests.
>>>>>> And that may be something that Xen will need to control through
>>>>>> either
>>>>>> CPUID or MSR support for the PV guests.
>>>>>
>>>>> Only on newer versions of Xen. On earlier versions (2-3 years old)
>>>>> leaf
>>>>> 0x80000007 is passed to the guest unchanged. And so is MSR_K8_SYSCFG.
>>>> The SME feature is in leaf 0x8000001f, is that leaf passed to the
>>>> guest
>>>> unchanged?
>>> Oh, I misread the patch where X86_FEATURE_SME is defined. Then all
>>> versions, including the current one, pass it unchanged.
>>>
>>> All that's needed is setup_clear_cpu_cap(X86_FEATURE_SME) in
>>> xen_init_capabilities().
>>
>> AMD processors still don't support CPUID Faulting (or at least, I
>> couldn't find any reference to it in the latest docs), so we cannot
>> actually hide SME from a guest which goes looking at native CPUID.
>> Furthermore, I'm not aware of any CPUID masking support covering that
>> leaf.
>>
>> However, if Linux is using the paravirtual cpuid hook, things are
>> slightly better.
>>
>> On Xen 4.9 and later, no guests will see the feature. On earlier
>> versions of Xen (before I fixed the logic), plain domUs will not see the
>> feature, while dom0 will.
>>
>> For safety, I'd recommend unilaterally clobbering the feature as Boris
>> suggested. There is no way SME will be supportable on a per-PV guest
>
> That may be too late. Early boot support in head_64.S will make calls to
> check for the feature (through CPUID and MSR), set the sme_me_mask and
> encrypt the kernel in place. Is there another way to approach this?
PV guests don't go through Linux x86 early boot code. They start at
xen_start_kernel() (well, xen-head.S:startup_xen(), really) and merge
with baremetal path at x86_64_start_reservations() (for 64-bit).
-boris
>
>> basis, although (as far as I am aware) Xen as a whole would be able to
>> encompass itself and all of its PV guests inside one single SME
>> instance.
>
> Yes, that is correct.
>
> Thanks,
> Tom
>
>>
>> ~Andrew
>>
On Thu, Jun 8, 2017 at 3:38 PM, Tom Lendacky <[email protected]> wrote:
> On 6/8/2017 1:05 AM, Andy Lutomirski wrote:
>>
>> On Wed, Jun 7, 2017 at 12:14 PM, Tom Lendacky <[email protected]>
>> wrote:
>>>
>>> The cr3 register entry can contain the SME encryption bit that indicates
>>> the PGD is encrypted. The encryption bit should not be used when
>>> creating
>>> a virtual address for the PGD table.
>>>
>>> Create a new function, read_cr3_pa(), that will extract the physical
>>> address from the cr3 register. This function is then used where a virtual
>>> address of the PGD needs to be created/used from the cr3 register.
>>
>>
>> This is going to conflict with:
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=555c81e5d01a62b629ec426a2f50d27e2127c1df
>>
>> We're both encountering the fact that CR3 munges the page table PA
>> with some other stuff, and some readers want to see the actual CR3
>> value and other readers just want the PA. The thing I prefer about my
>> patch is that I get rid of read_cr3() entirely, forcing the patch to
>> update every single reader, making review and conflict resolution much
>> safer.
>>
>> I'd be willing to send a patch tomorrow that just does the split into
>> __read_cr3() and read_cr3_pa() (I like your name better) and then we
>> can both base on top of it. Would that make sense?
>
>
> That makes sense to me.
Draft patch:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/read_cr3&id=9adebbc1071f066421a27b4f6e040190f1049624
>
>>
>> Also:
>>
>>> +static inline unsigned long read_cr3_pa(void)
>>> +{
>>> + return (read_cr3() & PHYSICAL_PAGE_MASK);
>>> +}
>>
>>
>> Is there any guarantee that the magic encryption bit is masked out in
>> PHYSICAL_PAGE_MASK? The docs make it sound like it could be any bit.
>> (But if it's one of the low 12 bits, that would be quite confusing.)
>
>
> Right now it's bit 47 and we're steering away from any of the currently
> reserved bits so we should be safe.
Should the SME init code check that it's a usable bit (i.e. outside
our physical address mask and not one of the bottom twelve bits)? If
some future CPU daftly picks, say, bit 12, we'll regret it if we
enable SME.
--Andy
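A rough sketch of the kind of sanity check Andy is suggesting; the CPUID field
for the encryption-bit position follows the SME documentation, but the
function shape and the upper bound are assumptions for illustration:

/* CPUID 0x8000001f: EBX[5:0] reports the page table bit used for encryption */
static unsigned long __init sme_get_me_mask(void)
{
        unsigned int eax, ebx, ecx, edx, me_bit;

        cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
        me_bit = ebx & 0x3f;

        /* Refuse to enable SME if the bit would land somewhere unusable */
        if (me_bit < PAGE_SHIFT || me_bit >= 52)
                return 0;

        return 1UL << me_bit;
}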
On 09/06/17 19:43, Boris Ostrovsky wrote:
> On 06/09/2017 02:36 PM, Tom Lendacky wrote:
>>> basis, although (as far as I am aware) Xen as a whole would be able to
>>> encompass itself and all of its PV guests inside one single SME
>>> instance.
>> Yes, that is correct.
Thinking more about this, it would only be possible if all the PV guests
were SME-aware and understood not to choke when they find a frame with a
high address bit set.
I expect the only viable way to implement this (should we wish) is to
have PV guests explicitly signal support (probably via an ELF note),
after which a guest needs to know about the existence of SME, the meaning
of the encrypted bit in PTEs, and to defer all configuration
responsibility to Xen.
~Andrew
On 6/9/2017 1:43 PM, Boris Ostrovsky wrote:
> On 06/09/2017 02:36 PM, Tom Lendacky wrote:
>> On 6/8/2017 5:01 PM, Andrew Cooper wrote:
>>> On 08/06/2017 22:17, Boris Ostrovsky wrote:
>>>> On 06/08/2017 05:02 PM, Tom Lendacky wrote:
>>>>> On 6/8/2017 3:51 PM, Boris Ostrovsky wrote:
>>>>>>>> What may be needed is making sure X86_FEATURE_SME is not set for PV
>>>>>>>> guests.
>>>>>>> And that may be something that Xen will need to control through
>>>>>>> either
>>>>>>> CPUID or MSR support for the PV guests.
>>>>>>
>>>>>> Only on newer versions of Xen. On earlier versions (2-3 years old)
>>>>>> leaf
>>>>>> 0x80000007 is passed to the guest unchanged. And so is MSR_K8_SYSCFG.
>>>>> The SME feature is in leaf 0x8000001f, is that leaf passed to the
>>>>> guest
>>>>> unchanged?
>>>> Oh, I misread the patch where X86_FEATURE_SME is defined. Then all
>>>> versions, including the current one, pass it unchanged.
>>>>
>>>> All that's needed is setup_clear_cpu_cap(X86_FEATURE_SME) in
>>>> xen_init_capabilities().
>>>
>>> AMD processors still don't support CPUID Faulting (or at least, I
>>> couldn't find any reference to it in the latest docs), so we cannot
>>> actually hide SME from a guest which goes looking at native CPUID.
>>> Furthermore, I'm not aware of any CPUID masking support covering that
>>> leaf.
>>>
>>> However, if Linux is using the paravirtual cpuid hook, things are
>>> slightly better.
>>>
>>> On Xen 4.9 and later, no guests will see the feature. On earlier
>>> versions of Xen (before I fixed the logic), plain domUs will not see the
>>> feature, while dom0 will.
>>>
>>> For safety, I'd recommend unilaterally clobbering the feature as Boris
>>> suggested. There is no way SME will be supportable on a per-PV guest
>>
>> That may be too late. Early boot support in head_64.S will make calls to
>> check for the feature (through CPUID and MSR), set the sme_me_mask and
>> encrypt the kernel in place. Is there another way to approach this?
>
>
> PV guests don't go through Linux x86 early boot code. They start at
> xen_start_kernel() (well, xen-head.S:startup_xen(), really) and merge
> with baremetal path at x86_64_start_reservations() (for 64-bit).
>
Ok, I don't think anything needs to be done then. The sme_me_mask is set
in sme_enable() which is only called from head_64.S. If the sme_me_mask
isn't set then SME won't be active. The feature will just report the
capability of the processor, but that doesn't mean it is active. If you
still want the feature to be clobbered we can do that, though.
Thanks,
Tom
>
> -boris
>
>>
>>> basis, although (as far as I am aware) Xen as a whole would be able to
>>> encompass itself and all of its PV guests inside one single SME
>>> instance.
>>
>> Yes, that is correct.
>>
>> Thanks,
>> Tom
>>
>>>
>>> ~Andrew
>>>
>
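For reference, a minimal sketch of the activation model Tom describes (names
follow the series, definitions paraphrased): sme_me_mask stays zero unless
sme_enable() sets it during early boot, and sme_active() simply tests it, so
a PV guest that never runs the head_64.S path never activates SME:

/* Zero by default; only sme_enable(), called from head_64.S, ever sets it */
unsigned long sme_me_mask;

static inline bool sme_active(void)
{
        return !!sme_me_mask;
}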
>>
>> PV guests don't go through Linux x86 early boot code. They start at
>> xen_start_kernel() (well, xen-head.S:startup_xen(), really) and merge
>> with baremetal path at x86_64_start_reservations() (for 64-bit).
>>
>
> Ok, I don't think anything needs to be done then. The sme_me_mask is set
> in sme_enable() which is only called from head_64.S. If the sme_me_mask
> isn't set then SME won't be active. The feature will just report the
> capability of the processor, but that doesn't mean it is active. If you
> still want the feature to be clobbered we can do that, though.
I'd prefer to explicitly clear to avoid any ambiguity.
-boris
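A minimal sketch of the explicit clearing Boris prefers, assuming it lands in
the existing xen_init_capabilities() path:

static void __init xen_init_capabilities(void)
{
        /* ... existing capability fixups ... */

        /* SME cannot be used by a PV guest; hide the feature entirely */
        setup_clear_cpu_cap(X86_FEATURE_SME);
}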
On 6/9/2017 1:46 PM, Andy Lutomirski wrote:
> On Thu, Jun 8, 2017 at 3:38 PM, Tom Lendacky <[email protected]> wrote:
>> On 6/8/2017 1:05 AM, Andy Lutomirski wrote:
>>>
>>> On Wed, Jun 7, 2017 at 12:14 PM, Tom Lendacky <[email protected]>
>>> wrote:
>>>>
>>>> The cr3 register entry can contain the SME encryption bit that indicates
>>>> the PGD is encrypted. The encryption bit should not be used when
>>>> creating
>>>> a virtual address for the PGD table.
>>>>
>>>> Create a new function, read_cr3_pa(), that will extract the physical
>>>> address from the cr3 register. This function is then used where a virtual
>>>> address of the PGD needs to be created/used from the cr3 register.
>>>
>>>
>>> This is going to conflict with:
>>>
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=555c81e5d01a62b629ec426a2f50d27e2127c1df
>>>
>>> We're both encountering the fact that CR3 munges the page table PA
>>> with some other stuff, and some readers want to see the actual CR3
>>> value and other readers just want the PA. The thing I prefer about my
>>> patch is that I get rid of read_cr3() entirely, forcing the patch to
>>> update every single reader, making review and conflict resolution much
>>> safer.
>>>
>>> I'd be willing to send a patch tomorrow that just does the split into
>>> __read_cr3() and read_cr3_pa() (I like your name better) and then we
>>> can both base on top of it. Would that make sense?
>>
>>
>> That makes sense to me.
>
> Draft patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/read_cr3&id=9adebbc1071f066421a27b4f6e040190f1049624
Looks good to me. I'll look at how to best mask off the encryption bit
in CR3_ADDR_MASK for SME support. I should be able to just do an
__sme_clr() against it.
>
>>
>>>
>>> Also:
>>>
>>>> +static inline unsigned long read_cr3_pa(void)
>>>> +{
>>>> + return (read_cr3() & PHYSICAL_PAGE_MASK);
>>>> +}
>>>
>>>
>>> Is there any guarantee that the magic encryption bit is masked out in
>>> PHYSICAL_PAGE_MASK? The docs make it sound like it could be any bit.
>>> (But if it's one of the low 12 bits, that would be quite confusing.)
>>
>>
>> Right now it's bit 47 and we're steering away from any of the currently
>> reserved bits so we should be safe.
>
> Should the SME init code check that it's a usable bit (i.e. outside
> our physical address mask and not one of the bottom twelve bits)? If
> some future CPU daftly picks, say, bit 12, we'll regret it if we
> enable SME.
I think I can safely say that it will never be any of the lower 12 bits,
but let me talk to some of the hardware folks and see about the other
end of the range.
Thanks,
Tom
>
> --Andy
>
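A sketch of what Tom describes, assuming Andy's __read_cr3()/CR3_ADDR_MASK
split is in place and that __sme_clr() strips the encryption mask from a
value (the mask constant is illustrative):

/* The SME encryption bit is never part of the physical address, so clear
 * it from the CR3 address mask itself. */
#define CR3_ADDR_MASK   __sme_clr(0x7FFFFFFFFFFFF000ull)

static inline unsigned long read_cr3_pa(void)
{
        return __read_cr3() & CR3_ADDR_MASK;
}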
On Wed, Jun 07, 2017 at 02:14:45PM -0500, Tom Lendacky wrote:
> Create a pgd_pfn() macro similar to the p[um]d_pfn() macros and then
> use the p[gum]d_pfn() macros in the p[gum]d_page() macros instead of
> duplicating the code.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/pgtable.h | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
For patches 7-9:
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:15:27PM -0500, Tom Lendacky wrote:
> Add support to be able to either encrypt or decrypt data in place during
> the early stages of booting the kernel. This does not change the memory
> encryption attribute - it is used for ensuring that data present in either
> an encrypted or decrypted memory area is in the proper state (for example
> the initrd will have been loaded by the boot loader and will not be
> encrypted, but the memory that it resides in is marked as encrypted).
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 15 +++++++
> arch/x86/mm/mem_encrypt.c | 76 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 91 insertions(+)
Patches 11-13:
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:15:39PM -0500, Tom Lendacky wrote:
> The boot data and command line data are present in memory in a decrypted
> state and are copied early in the boot process. The early page fault
> support will map these areas as encrypted, so before attempting to copy
> them, add decrypted mappings so the data is accessed properly when copied.
>
> For the initrd, encrypt this data in place. Since the future mapping of the
> initrd area will be mapped as encrypted the data will be accessed properly.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 11 +++++
> arch/x86/include/asm/pgtable.h | 3 +
> arch/x86/kernel/head64.c | 30 ++++++++++++--
> arch/x86/kernel/setup.c | 9 ++++
> arch/x86/mm/mem_encrypt.c | 77 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 126 insertions(+), 4 deletions(-)
Some cleanups ontop in case you get to send v7:
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 61a704945294..5959a42dd4d5 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -45,13 +45,8 @@ static inline void __init sme_early_decrypt(resource_size_t paddr,
{
}
-static inline void __init sme_map_bootdata(char *real_mode_data)
-{
-}
-
-static inline void __init sme_unmap_bootdata(char *real_mode_data)
-{
-}
+static inline void __init sme_map_bootdata(char *real_mode_data) { }
+static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
static inline void __init sme_early_init(void)
{
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 2321f05045e5..32ebbe0ab04d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -132,6 +132,10 @@ static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
struct boot_params *boot_data;
unsigned long cmdline_paddr;
+ /* If SME is not active, the bootdata is in the correct state */
+ if (!sme_active())
+ return;
+
__sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), map);
boot_data = (struct boot_params *)real_mode_data;
@@ -142,40 +146,22 @@ static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
cmdline_paddr = boot_data->hdr.cmd_line_ptr |
((u64)boot_data->ext_cmd_line_ptr << 32);
- if (cmdline_paddr)
- __sme_early_map_unmap_mem(__va(cmdline_paddr),
- COMMAND_LINE_SIZE, map);
+ if (!cmdline_paddr)
+ return;
+
+ __sme_early_map_unmap_mem(__va(cmdline_paddr), COMMAND_LINE_SIZE, map);
+
+ sme_early_pgtable_flush();
}
void __init sme_unmap_bootdata(char *real_mode_data)
{
- /* If SME is not active, the bootdata is in the correct state */
- if (!sme_active())
- return;
-
- /*
- * The bootdata and command line aren't needed anymore so clear
- * any mapping of them.
- */
__sme_map_unmap_bootdata(real_mode_data, false);
-
- sme_early_pgtable_flush();
}
void __init sme_map_bootdata(char *real_mode_data)
{
- /* If SME is not active, the bootdata is in the correct state */
- if (!sme_active())
- return;
-
- /*
- * The bootdata and command line will not be encrypted, so they
- * need to be mapped as decrypted memory so they can be copied
- * properly.
- */
__sme_map_unmap_bootdata(real_mode_data, true);
-
- sme_early_pgtable_flush();
}
void __init sme_early_init(void)
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:16:27PM -0500, Tom Lendacky wrote:
> When SME is active, pagetable entries created for EFI need to have the
> encryption mask set as necessary.
>
> When the new pagetable pages are allocated they are mapped encrypted. So,
> update the efi_pgt value that will be used in cr3 to include the encryption
> mask so that the PGD table can be read successfully. The pagetable mapping
> as well as the kernel are also added to the pagetable mapping as encrypted.
> All other EFI mappings are mapped decrypted (tables, etc.).
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/platform/efi/efi_64.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
patches 15-18:
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/10/2017 11:01 AM, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:15:39PM -0500, Tom Lendacky wrote:
>> The boot data and command line data are present in memory in a decrypted
>> state and are copied early in the boot process. The early page fault
>> support will map these areas as encrypted, so before attempting to copy
>> them, add decrypted mappings so the data is accessed properly when copied.
>>
>> For the initrd, encrypt this data in place. Since the future mapping of the
>> initrd area will be mapped as encrypted the data will be accessed properly.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/include/asm/mem_encrypt.h | 11 +++++
>> arch/x86/include/asm/pgtable.h | 3 +
>> arch/x86/kernel/head64.c | 30 ++++++++++++--
>> arch/x86/kernel/setup.c | 9 ++++
>> arch/x86/mm/mem_encrypt.c | 77 ++++++++++++++++++++++++++++++++++++
>> 5 files changed, 126 insertions(+), 4 deletions(-)
>
> Some cleanups ontop in case you get to send v7:
There will be a v7.
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 61a704945294..5959a42dd4d5 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -45,13 +45,8 @@ static inline void __init sme_early_decrypt(resource_size_t paddr,
> {
> }
>
> -static inline void __init sme_map_bootdata(char *real_mode_data)
> -{
> -}
> -
> -static inline void __init sme_unmap_bootdata(char *real_mode_data)
> -{
> -}
> +static inline void __init sme_map_bootdata(char *real_mode_data) { }
> +static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
>
> static inline void __init sme_early_init(void)
> {
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 2321f05045e5..32ebbe0ab04d 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -132,6 +132,10 @@ static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
> struct boot_params *boot_data;
> unsigned long cmdline_paddr;
>
> + /* If SME is not active, the bootdata is in the correct state */
> + if (!sme_active())
> + return;
> +
> __sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), map);
> boot_data = (struct boot_params *)real_mode_data;
>
> @@ -142,40 +146,22 @@ static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
> cmdline_paddr = boot_data->hdr.cmd_line_ptr |
> ((u64)boot_data->ext_cmd_line_ptr << 32);
>
> - if (cmdline_paddr)
> - __sme_early_map_unmap_mem(__va(cmdline_paddr),
> - COMMAND_LINE_SIZE, map);
> + if (!cmdline_paddr)
> + return;
> +
> + __sme_early_map_unmap_mem(__va(cmdline_paddr), COMMAND_LINE_SIZE, map);
> +
> + sme_early_pgtable_flush();
Yup, overall it definitely simplifies things.
I have to call sme_early_pgtable_flush() even if cmdline_paddr is NULL,
so I'll either keep the if and have one flush at the end or I can move
the flush into __sme_early_map_unmap_mem(). I'm leaning towards the
latter.
Thanks,
Tom
> }
>
> void __init sme_unmap_bootdata(char *real_mode_data)
> {
> - /* If SME is not active, the bootdata is in the correct state */
> - if (!sme_active())
> - return;
> -
> - /*
> - * The bootdata and command line aren't needed anymore so clear
> - * any mapping of them.
> - */
> __sme_map_unmap_bootdata(real_mode_data, false);
> -
> - sme_early_pgtable_flush();
> }
>
> void __init sme_map_bootdata(char *real_mode_data)
> {
> - /* If SME is not active, the bootdata is in the correct state */
> - if (!sme_active())
> - return;
> -
> - /*
> - * The bootdata and command line will not be encrypted, so they
> - * need to be mapped as decrypted memory so they can be copied
> - * properly.
> - */
> __sme_map_unmap_bootdata(real_mode_data, true);
> -
> - sme_early_pgtable_flush();
> }
>
> void __init sme_early_init(void)
>
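A minimal sketch of the option Tom leans towards, with the flush moved into
the helper itself (signature inferred from the callers above, body elided):

static void __init __sme_early_map_unmap_mem(void *vaddr, unsigned long size,
                                             bool map)
{
        /* ... map or unmap the range one page at a time ... */

        /* Flush here so every caller gets the pagetable flush for free */
        sme_early_pgtable_flush();
}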
On Wed, Jun 07, 2017 at 02:16:43PM -0500, Tom Lendacky wrote:
> The SMP MP-table is built by UEFI and placed in memory in a decrypted
> state. These tables are accessed using a mix of early_memremap(),
> early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
> to use early_memremap()/early_memunmap(). This allows for proper setting
> of the encryption mask so that the data can be successfully accessed when
> SME is active.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/kernel/mpparse.c | 98 ++++++++++++++++++++++++++++++++-------------
> 1 file changed, 70 insertions(+), 28 deletions(-)
...
> @@ -515,6 +516,12 @@ void __init default_get_smp_config(unsigned int early)
> if (acpi_lapic && acpi_ioapic)
> return;
>
> + mpf = early_memremap(mpf_base, sizeof(*mpf));
> + if (!mpf) {
> + pr_err("MPTABLE: mpf early_memremap() failed\n");
If you're going to introduce new prefixes then add:
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
at the beginning of the file so that they all say "mpparse:" instead.
And pls make that message more user-friendly: "Error mapping MP table"
or so.
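Spelled out, that convention would look roughly like this near the top of
arch/x86/kernel/mpparse.c (illustrative only):

#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

so that, for example,

        pr_err("Error mapping MP table\n");

comes out as "mpparse: Error mapping MP table".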
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:17:09PM -0500, Tom Lendacky wrote:
> When Secure Memory Encryption is enabled, the trampoline area must not
> be encrypted. A CPU running in real mode will not be able to decrypt
> memory that has been encrypted because it will not be able to use addresses
> with the memory encryption mask.
>
> A recent change that added a new system_state value exposed a warning
> issued by early_ioremap() when the system_state was not SYSTEM_BOOTING.
> At the stage where the trampoline area is decrypted, the system_state is
> now SYSTEM_SCHEDULING. The check was changed to issue a warning if the
> system_state is greater than or equal to SYSTEM_RUNNING.
This piece along with the hunk touching system_state absolutely needs to
be a separate patch as it is unrelated.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:17:00PM -0500, Tom Lendacky wrote:
> Add support for changing the memory encryption attribute for one or more
> memory pages. This will be useful when we have to change the AP trampoline
> area to not be encrypted. Or when we need to change the SWIOTLB area to
> not be encrypted in support of devices that can't support the encryption
> mask range.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/set_memory.h | 3 ++
> arch/x86/mm/pageattr.c | 62 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 65 insertions(+)
Patches 21-22:
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/14/2017 11:24 AM, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:17:09PM -0500, Tom Lendacky wrote:
>> When Secure Memory Encryption is enabled, the trampoline area must not
>> be encrypted. A CPU running in real mode will not be able to decrypt
>> memory that has been encrypted because it will not be able to use addresses
>> with the memory encryption mask.
>>
>> A recent change that added a new system_state value exposed a warning
>> issued by early_ioremap() when the system_state was not SYSTEM_BOOTING.
>> At the stage where the trampoline area is decrypted, the system_state is
>> now SYSTEM_SCHEDULING. The check was changed to issue a warning if the
>> system_state is greater than or equal to SYSTEM_RUNNING.
>
> This piece along with the hunk touching system_state absolutely needs to
> be a separate patch as it is unrelated.
Yup, will do.
Thanks,
Tom
>
On Wed, Jun 14, 2017 at 06:24:16PM +0200, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:17:09PM -0500, Tom Lendacky wrote:
> > When Secure Memory Encryption is enabled, the trampoline area must not
> > be encrypted. A CPU running in real mode will not be able to decrypt
> > memory that has been encrypted because it will not be able to use addresses
> > with the memory encryption mask.
> >
> > A recent change that added a new system_state value exposed a warning
> > issued by early_ioremap() when the system_state was not SYSTEM_BOOTING.
> > At the stage where the trampoline area is decrypted, the system_state is
> > now SYSTEM_SCHEDULING. The check was changed to issue a warning if the
> > system_state is greater than or equal to SYSTEM_RUNNING.
>
> This piece along with the hunk touching system_state absolutely needs to
> be a separate patch as it is unrelated.
Btw, pls send this now and separate from the patchset as it is a bugfix
that should go into sched/core.
Thanks.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:17:21PM -0500, Tom Lendacky wrote:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create decrypted bounce buffers for use by these devices.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
...
> diff --git a/init/main.c b/init/main.c
> index df58a41..7125b5f 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -488,6 +488,10 @@ void __init __weak thread_stack_cache_init(void)
> }
> #endif
>
> +void __init __weak mem_encrypt_init(void)
> +{
> +}
void __init __weak mem_encrypt_init(void) { }
saves some real estate. Please do that for the rest of the stubs you're
adding, for the next version.
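i.e., applied to the two stubs in this patch, something along the lines of:

void __init __weak mem_encrypt_init(void) { }

void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size) { }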
> +
> /*
> * Set up kernel memory allocators
> */
> @@ -640,6 +644,15 @@ asmlinkage __visible void __init start_kernel(void)
> */
> locking_selftest();
>
> + /*
> + * This needs to be called before any devices perform DMA
> + * operations that might use the SWIOTLB bounce buffers.
> + * This call will mark the bounce buffers as decrypted so
> + * that their usage will not cause "plain-text" data to be
> + * decrypted when accessed.
s/This call/It/
> + */
> + mem_encrypt_init();
> +
> #ifdef CONFIG_BLK_DEV_INITRD
> if (initrd_start && !initrd_below_start_ok &&
> page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index a8d74a7..74d6557 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -30,6 +30,7 @@
> #include <linux/highmem.h>
> #include <linux/gfp.h>
> #include <linux/scatterlist.h>
> +#include <linux/mem_encrypt.h>
>
> #include <asm/io.h>
> #include <asm/dma.h>
> @@ -155,6 +156,17 @@ unsigned long swiotlb_size_or_default(void)
> return size ? size : (IO_TLB_DEFAULT_SIZE);
> }
>
> +void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
> +{
> +}
As above.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:17:32PM -0500, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active. Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 8 ++++++++
> include/asm-generic/mem_encrypt.h | 5 +++++
> include/linux/dma-mapping.h | 9 +++++++++
> lib/swiotlb.c | 3 +++
> 4 files changed, 25 insertions(+)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index f1215a4..c7a2525 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -69,6 +69,14 @@ static inline bool sme_active(void)
> return !!sme_me_mask;
> }
>
> +static inline u64 sme_dma_mask(void)
> +{
> + if (!sme_me_mask)
> + return 0ULL;
> +
> + return ((u64)sme_me_mask << 1) - 1;
> +}
> +
> /*
> * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
> * writing to or comparing values from the cr3 register. Having the
> diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
> index b55c3f9..fb02ff0 100644
> --- a/include/asm-generic/mem_encrypt.h
> +++ b/include/asm-generic/mem_encrypt.h
> @@ -22,6 +22,11 @@ static inline bool sme_active(void)
> return false;
> }
>
> +static inline u64 sme_dma_mask(void)
> +{
> + return 0ULL;
> +}
> +
> /*
> * The __sme_set() and __sme_clr() macros are useful for adding or removing
> * the encryption mask from a value (e.g. when dealing with pagetable
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 4f3eece..e2c5fda 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -10,6 +10,7 @@
> #include <linux/scatterlist.h>
> #include <linux/kmemcheck.h>
> #include <linux/bug.h>
> +#include <linux/mem_encrypt.h>
>
> /**
> * List of possible attributes associated with a DMA mapping. The semantics
> @@ -577,6 +578,10 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>
> if (!dev->dma_mask || !dma_supported(dev, mask))
> return -EIO;
> +
> + if (sme_active() && (mask < sme_dma_mask()))
> + dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
Something looks strange here:
you're checking sme_active() before calling sme_dma_mask() and yet in
it, you're checking !sme_me_mask again. What gives?
Why not move the sme_active() check into sme_dma_mask() and thus
simplify callers?
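A sketch of that simplification, keeping the helper's current behaviour of
returning 0 when SME is not active:

static inline u64 sme_dma_mask(void)
{
        if (!sme_active())
                return 0ULL;

        return ((u64)sme_me_mask << 1) - 1;
}

and on the caller side, since mask is unsigned, the comparison alone is
enough:

        if (mask < sme_dma_mask())
                dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");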
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/14/2017 11:07 AM, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:16:43PM -0500, Tom Lendacky wrote:
>> The SMP MP-table is built by UEFI and placed in memory in a decrypted
>> state. These tables are accessed using a mix of early_memremap(),
>> early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
>> to use early_memremap()/early_memunmap(). This allows for proper setting
>> of the encryption mask so that the data can be successfully accessed when
>> SME is active.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/kernel/mpparse.c | 98 ++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 70 insertions(+), 28 deletions(-)
>
> ...
>
>> @@ -515,6 +516,12 @@ void __init default_get_smp_config(unsigned int early)
>> if (acpi_lapic && acpi_ioapic)
>> return;
>>
>> + mpf = early_memremap(mpf_base, sizeof(*mpf));
>> + if (!mpf) {
>> + pr_err("MPTABLE: mpf early_memremap() failed\n");
>
> If you're going to introduce new prefixes then add:
This isn't new... there are a number of messages issued in this file
with that prefix, so I was just following convention. Changing the
prefix could be a follow-on patch.
>
> #undef pr_fmt
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> at the beginning of the file so that they all say "mpparse:" instead.
>
> And pls make that message more user-friendly: "Error mapping MP table"
> or so.
Can do.
Thanks,
Tom
>
On Wed, Jun 14, 2017 at 12:06:54PM -0500, Tom Lendacky wrote:
> This isn't new... there are a number of messages issued in this file
> with that prefix, so I was just following convention.
The "convention" that some of the messages are prefixed and some aren't?
:-)
> Changing the prefix could be a follow-on patch.
Ok. As some of those print statements have prefixes and some don't,
let's unify them.
Thanks.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:17:45PM -0500, Tom Lendacky wrote:
> The IOMMU is programmed with physical addresses for the various tables
> and buffers that are used to communicate between the device and the
> driver. When the driver allocates this memory it is encrypted. In order
> for the IOMMU to access the memory as encrypted the encryption mask needs
> to be included in these physical addresses during configuration.
>
> The PTE entries created by the IOMMU should also include the encryption
> mask so that when the device behind the IOMMU performs a DMA, the DMA
> will be performed to encrypted memory.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 7 +++++++
> arch/x86/mm/mem_encrypt.c | 30 ++++++++++++++++++++++++++++++
> drivers/iommu/amd_iommu.c | 36 +++++++++++++++++++-----------------
> drivers/iommu/amd_iommu_init.c | 18 ++++++++++++------
> drivers/iommu/amd_iommu_proto.h | 10 ++++++++++
> drivers/iommu/amd_iommu_types.h | 2 +-
> include/asm-generic/mem_encrypt.h | 5 +++++
> 7 files changed, 84 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index c7a2525..d86e544 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -31,6 +31,8 @@ void __init sme_early_decrypt(resource_size_t paddr,
>
> void __init sme_early_init(void);
>
> +bool sme_iommu_supported(void);
> +
> /* Architecture __weak replacement functions */
> void __init mem_encrypt_init(void);
>
> @@ -62,6 +64,11 @@ static inline void __init sme_early_init(void)
> {
> }
>
> +static inline bool sme_iommu_supported(void)
> +{
> + return true;
> +}
Some more file real-estate saving:
static inline bool sme_iommu_supported(void) { return true; }
> +
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>
> static inline bool sme_active(void)
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 5d7c51d..018b58a 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -197,6 +197,36 @@ void __init sme_early_init(void)
> protection_map[i] = pgprot_encrypted(protection_map[i]);
> }
>
> +bool sme_iommu_supported(void)
Why is this one exported with all the header file declarations if it is
going to be used in the iommu code only? IOW, you can make it a static
function there and save yourself all the exporting.
> +{
> + struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> + if (!sme_me_mask || (c->x86 != 0x17))
me_mask or sme_active()?
Or is the IOMMU "disabled" in a way the moment the BIOS decides that SME
can be enabled?
Also, family checks are always a bad idea for enablement. Why do we need
the family check? Because future families will work with the IOMMU? :-)
> + return true;
> +
> + /* For Fam17h, a specific level of support is required */
> + switch (c->microcode & 0xf000) {
Also, you said in another mail on this subthread that c->microcode
is not yet set. Are you saying, that the iommu init gunk runs before
init_amd(), where we do set c->microcode?
If so, we can move the setting to early_init_amd() or so.
> + case 0x0000:
> + return false;
> + case 0x1000:
> + switch (c->microcode & 0x0f00) {
> + case 0x0000:
> + return false;
> + case 0x0100:
> + if ((c->microcode & 0xff) < 0x26)
> + return false;
> + break;
> + case 0x0200:
> + if ((c->microcode & 0xff) < 0x05)
> + return false;
> + break;
> + }
So this is the microcode revision, why those complex compares? Can't you
simply check a range of values?
> + break;
> + }
> +
> + return true;
> +}
> +
> /* Architecture __weak replacement functions */
> void __init mem_encrypt_init(void)
> {
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 63cacf5..94eb130 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -544,7 +544,7 @@ static void dump_dte_entry(u16 devid)
>
> static void dump_command(unsigned long phys_addr)
> {
> - struct iommu_cmd *cmd = phys_to_virt(phys_addr);
> + struct iommu_cmd *cmd = iommu_phys_to_virt(phys_addr);
> int i;
>
> for (i = 0; i < 4; ++i)
> @@ -863,13 +863,15 @@ static void copy_cmd_to_buffer(struct amd_iommu *iommu,
> writel(tail, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
> }
>
> -static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
> +static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
#134: FILE: drivers/iommu/amd_iommu.c:866:
+static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
> {
> + u64 address = iommu_virt_to_phys((void *)sem);
> +
> WARN_ON(address & 0x7ULL);
>
> memset(cmd, 0, sizeof(*cmd));
> - cmd->data[0] = lower_32_bits(__pa(address)) | CMD_COMPL_WAIT_STORE_MASK;
> - cmd->data[1] = upper_32_bits(__pa(address));
> + cmd->data[0] = lower_32_bits(address) | CMD_COMPL_WAIT_STORE_MASK;
> + cmd->data[1] = upper_32_bits(address);
> cmd->data[2] = 1;
> CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
<... snip stuff which Joerg needs to review... >
> diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
> index fb02ff0..bbc49e1 100644
> --- a/include/asm-generic/mem_encrypt.h
> +++ b/include/asm-generic/mem_encrypt.h
> @@ -27,6 +27,11 @@ static inline u64 sme_dma_mask(void)
> return 0ULL;
> }
>
> +static inline bool sme_iommu_supported(void)
> +{
> + return true;
> +}
Save some more file real-estate... you get the idea by now, I'm sure.
:-)
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/14/2017 11:45 AM, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:17:21PM -0500, Tom Lendacky wrote:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48-bits. SWIOTLB will be
>> initialized to create decrypted bounce buffers for use by these devices.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>
> ...
>
>
>> diff --git a/init/main.c b/init/main.c
>> index df58a41..7125b5f 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -488,6 +488,10 @@ void __init __weak thread_stack_cache_init(void)
>> }
>> #endif
>>
>> +void __init __weak mem_encrypt_init(void)
>> +{
>> +}
>
> void __init __weak mem_encrypt_init(void) { }
>
> saves some real estate. Please do that for the rest of the stubs you're
> adding, for the next version.
Ok, will do.
Thanks,
Tom
>
>> +
>> /*
>> * Set up kernel memory allocators
>> */
>> @@ -640,6 +644,15 @@ asmlinkage __visible void __init start_kernel(void)
>> */
>> locking_selftest();
>>
>> + /*
>> + * This needs to be called before any devices perform DMA
>> + * operations that might use the SWIOTLB bounce buffers.
>> + * This call will mark the bounce buffers as decrypted so
>> + * that their usage will not cause "plain-text" data to be
>> + * decrypted when accessed.
>
> s/This call/It/
>
>> + */
>> + mem_encrypt_init();
>> +
>> #ifdef CONFIG_BLK_DEV_INITRD
>> if (initrd_start && !initrd_below_start_ok &&
>> page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
>> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
>> index a8d74a7..74d6557 100644
>> --- a/lib/swiotlb.c
>> +++ b/lib/swiotlb.c
>> @@ -30,6 +30,7 @@
>> #include <linux/highmem.h>
>> #include <linux/gfp.h>
>> #include <linux/scatterlist.h>
>> +#include <linux/mem_encrypt.h>
>>
>> #include <asm/io.h>
>> #include <asm/dma.h>
>> @@ -155,6 +156,17 @@ unsigned long swiotlb_size_or_default(void)
>> return size ? size : (IO_TLB_DEFAULT_SIZE);
>> }
>>
>> +void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>> +{
>> +}
>
> As above.
>
On 6/14/2017 11:50 AM, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:17:32PM -0500, Tom Lendacky wrote:
>> Add warnings to let the user know when bounce buffers are being used for
>> DMA when SME is active. Since the bounce buffers are not in encrypted
>> memory, these notifications are to allow the user to determine some
>> appropriate action - if necessary.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/include/asm/mem_encrypt.h | 8 ++++++++
>> include/asm-generic/mem_encrypt.h | 5 +++++
>> include/linux/dma-mapping.h | 9 +++++++++
>> lib/swiotlb.c | 3 +++
>> 4 files changed, 25 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index f1215a4..c7a2525 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -69,6 +69,14 @@ static inline bool sme_active(void)
>> return !!sme_me_mask;
>> }
>>
>> +static inline u64 sme_dma_mask(void)
>> +{
>> + if (!sme_me_mask)
>> + return 0ULL;
>> +
>> + return ((u64)sme_me_mask << 1) - 1;
>> +}
>> +
>> /*
>> * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
>> * writing to or comparing values from the cr3 register. Having the
>> diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
>> index b55c3f9..fb02ff0 100644
>> --- a/include/asm-generic/mem_encrypt.h
>> +++ b/include/asm-generic/mem_encrypt.h
>> @@ -22,6 +22,11 @@ static inline bool sme_active(void)
>> return false;
>> }
>>
>> +static inline u64 sme_dma_mask(void)
>> +{
>> + return 0ULL;
>> +}
>> +
>> /*
>> * The __sme_set() and __sme_clr() macros are useful for adding or removing
>> * the encryption mask from a value (e.g. when dealing with pagetable
>> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> index 4f3eece..e2c5fda 100644
>> --- a/include/linux/dma-mapping.h
>> +++ b/include/linux/dma-mapping.h
>> @@ -10,6 +10,7 @@
>> #include <linux/scatterlist.h>
>> #include <linux/kmemcheck.h>
>> #include <linux/bug.h>
>> +#include <linux/mem_encrypt.h>
>>
>> /**
>> * List of possible attributes associated with a DMA mapping. The semantics
>> @@ -577,6 +578,10 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>>
>> if (!dev->dma_mask || !dma_supported(dev, mask))
>> return -EIO;
>> +
>> + if (sme_active() && (mask < sme_dma_mask()))
>> + dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
>
> Something looks strange here:
>
> you're checking sme_active() before calling sme_dma_mask() and yet in
> it, you're checking !sme_me_mask again. What gives?
>
I guess I don't need the sme_active() check since the second part of the
if statement can only ever be true if SME is active (since mask is
unsigned).
Thanks,
Tom
> Why not move the sme_active() check into sme_dma_mask() and thus
> simplify callers?
>
On 6/14/2017 12:42 PM, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:17:45PM -0500, Tom Lendacky wrote:
>> The IOMMU is programmed with physical addresses for the various tables
>> and buffers that are used to communicate between the device and the
>> driver. When the driver allocates this memory it is encrypted. In order
>> for the IOMMU to access the memory as encrypted the encryption mask needs
>> to be included in these physical addresses during configuration.
>>
>> The PTE entries created by the IOMMU should also include the encryption
>> mask so that when the device behind the IOMMU performs a DMA, the DMA
>> will be performed to encrypted memory.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/include/asm/mem_encrypt.h | 7 +++++++
>> arch/x86/mm/mem_encrypt.c | 30 ++++++++++++++++++++++++++++++
>> drivers/iommu/amd_iommu.c | 36 +++++++++++++++++++-----------------
>> drivers/iommu/amd_iommu_init.c | 18 ++++++++++++------
>> drivers/iommu/amd_iommu_proto.h | 10 ++++++++++
>> drivers/iommu/amd_iommu_types.h | 2 +-
>> include/asm-generic/mem_encrypt.h | 5 +++++
>> 7 files changed, 84 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index c7a2525..d86e544 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -31,6 +31,8 @@ void __init sme_early_decrypt(resource_size_t paddr,
>>
>> void __init sme_early_init(void);
>>
>> +bool sme_iommu_supported(void);
>> +
>> /* Architecture __weak replacement functions */
>> void __init mem_encrypt_init(void);
>>
>> @@ -62,6 +64,11 @@ static inline void __init sme_early_init(void)
>> {
>> }
>>
>> +static inline bool sme_iommu_supported(void)
>> +{
>> + return true;
>> +}
>
> Some more file real-estate saving:
>
> static inline bool sme_iommu_supported(void) { return true; }
>
>> +
>> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>>
>> static inline bool sme_active(void)
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 5d7c51d..018b58a 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -197,6 +197,36 @@ void __init sme_early_init(void)
>> protection_map[i] = pgprot_encrypted(protection_map[i]);
>> }
>>
>> +bool sme_iommu_supported(void)
>
> Why is this one exported with all the header file declarations if it is
> going to be used in the iommu code only? IOW, you can make it a static
> function there and save yourself all the exporting.
I was trying to keep all the logic for it here in the SME related files
rather than put it in the iommu code itself. But it is easy enough to
move if you think it's worth it.
>
>> +{
>> + struct cpuinfo_x86 *c = &boot_cpu_data;
>> +
>> + if (!sme_me_mask || (c->x86 != 0x17))
>
> me_mask or sme_active()?
I like using sme_active() outside of the SME-specific files and using
sme_me_mask in the SME-specific files to save any changes that will have
to be made once SEV comes around.
>
> Or is the IOMMU "disabled" in a way the moment the BIOS decides that SME
> can be enabled?
There's a fix in the AGESA layer of the BIOS that permits the IOMMU to
function properly when SME is enabled. Unfortunately, the only easy way
to determine if that fix is available is through the patch level check.
>
> Also, family checks are always a bad idea for enablement. Why do we need
> the family check? Because future families will work with the IOMMU? :-)
Yes, any future family that supports SME will (should) work with the
IOMMU without having to check patch levels.
>
>> + return true;
>> +
>> + /* For Fam17h, a specific level of support is required */
>> + switch (c->microcode & 0xf000) {
>
> Also, you said in another mail on this subthread that c->microcode
> is not yet set. Are you saying, that the iommu init gunk runs before
> init_amd(), where we do set c->microcode?
>
> If so, we can move the setting to early_init_amd() or so.
I'll look into that.
>
>> + case 0x0000:
>> + return false;
>> + case 0x1000:
>> + switch (c->microcode & 0x0f00) {
>> + case 0x0000:
>> + return false;
>> + case 0x0100:
>> + if ((c->microcode & 0xff) < 0x26)
>> + return false;
>> + break;
>> + case 0x0200:
>> + if ((c->microcode & 0xff) < 0x05)
>> + return false;
>> + break;
>> + }
>
> So this is the microcode revision, why those complex compares? Can't you
> simply check a range of values?
I'll look into simplifying the checks.
>
>> + break;
>> + }
>> +
>> + return true;
>> +}
>> +
>> /* Architecture __weak replacement functions */
>> void __init mem_encrypt_init(void)
>> {
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 63cacf5..94eb130 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -544,7 +544,7 @@ static void dump_dte_entry(u16 devid)
>>
>> static void dump_command(unsigned long phys_addr)
>> {
>> - struct iommu_cmd *cmd = phys_to_virt(phys_addr);
>> + struct iommu_cmd *cmd = iommu_phys_to_virt(phys_addr);
>> int i;
>>
>> for (i = 0; i < 4; ++i)
>> @@ -863,13 +863,15 @@ static void copy_cmd_to_buffer(struct amd_iommu *iommu,
>> writel(tail, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
>> }
>>
>> -static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
>> +static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
>
> WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
> #134: FILE: drivers/iommu/amd_iommu.c:866:
> +static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
>
The semaphore area is written to by the device so the use of volatile is
appropriate in this case.
Thanks,
Tom
>> {
>> + u64 address = iommu_virt_to_phys((void *)sem);
>> +
>> WARN_ON(address & 0x7ULL);
>>
>> memset(cmd, 0, sizeof(*cmd));
>> - cmd->data[0] = lower_32_bits(__pa(address)) | CMD_COMPL_WAIT_STORE_MASK;
>> - cmd->data[1] = upper_32_bits(__pa(address));
>> + cmd->data[0] = lower_32_bits(address) | CMD_COMPL_WAIT_STORE_MASK;
>> + cmd->data[1] = upper_32_bits(address);
>> cmd->data[2] = 1;
>> CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
>
> <... snip stuff which Joerg needs to review... >
>
>> diff --git a/include/asm-generic/mem_encrypt.h b/include/asm-generic/mem_encrypt.h
>> index fb02ff0..bbc49e1 100644
>> --- a/include/asm-generic/mem_encrypt.h
>> +++ b/include/asm-generic/mem_encrypt.h
>> @@ -27,6 +27,11 @@ static inline u64 sme_dma_mask(void)
>> return 0ULL;
>> }
>>
>> +static inline bool sme_iommu_supported(void)
>> +{
>> + return true;
>> +}
>
> Save some more file real-estate... you get the idea by now, I'm sure.
>
> :-)
>
On Wed, Jun 14, 2017 at 02:49:02PM -0500, Tom Lendacky wrote:
> I guess I don't need the sme_active() check since the second part of the
> if statement can only ever be true if SME is active (since mask is
> unsigned).
... and you can define sme_me_mask as a u64 directly (it is that already,
practically, as we don't do SME on 32-bit) and then get rid of the cast.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 14, 2017 at 03:40:28PM -0500, Tom Lendacky wrote:
> I was trying to keep all the logic for it here in the SME related files
> rather than put it in the iommu code itself. But it is easy enough to
> move if you think it's worth it.
Yes please - the less needlessly global symbols, the better.
> > Also, you said in another mail on this subthread that c->microcode
> > is not yet set. Are you saying, that the iommu init gunk runs before
> > init_amd(), where we do set c->microcode?
> >
> > If so, we can move the setting to early_init_amd() or so.
>
> I'll look into that.
And I don't think c->microcode is still unset by the time we init the
iommu because, AFAICT, we do the latter in pci_iommu_init() and that's a
rootfs_initcall(), which happens later than the CPU init stuff.
> I'll look into simplifying the checks.
Something like this maybe?
if (rev >= 0x1205)
return true;
if (rev <= 0x11ff && rev >= 0x1126)
return true;
return false;
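Put together, the helper could shrink to something like this sketch; it
assumes, as the switch above does, that only the low 16 bits of
c->microcode carry the patch level being compared:

static bool sme_iommu_supported(void)
{
        struct cpuinfo_x86 *c = &boot_cpu_data;
        u16 rev = c->microcode & 0xffff;

        if (!sme_me_mask || (c->x86 != 0x17))
                return true;

        /* For Fam17h, a specific level of support is required */
        if (rev >= 0x1205)
                return true;
        if (rev <= 0x11ff && rev >= 0x1126)
                return true;

        return false;
}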
> > WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
> > #134: FILE: drivers/iommu/amd_iommu.c:866:
> > +static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
> >
>
> The semaphore area is written to by the device so the use of volatile is
> appropriate in this case.
Do you mean this is like the last exception case in that document above:
"
- Pointers to data structures in coherent memory which might be modified
by I/O devices can, sometimes, legitimately be volatile. A ring buffer
used by a network adapter, where that adapter changes pointers to
indicate which descriptors have been processed, is an example of this
type of situation."
?
If so, it did work fine until now, without the volatile. Why is it
needed now, all of a sudden?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:18:15PM -0500, Tom Lendacky wrote:
> Update the KVM support to work with SME. The VMCB has a number of fields
> where physical addresses are used and these addresses must contain the
> memory encryption mask in order to properly access the encrypted memory.
> Also, use the memory encryption mask when creating and using the nested
> page tables.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/mmu.c | 12 ++++++++----
> arch/x86/kvm/mmu.h | 2 +-
> arch/x86/kvm/svm.c | 35 ++++++++++++++++++-----------------
> arch/x86/kvm/vmx.c | 3 ++-
> arch/x86/kvm/x86.c | 3 ++-
> 6 files changed, 32 insertions(+), 25 deletions(-)
Patches 27-29:
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Wed, Jun 07, 2017 at 02:18:27PM -0500, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
>
> Support is needed to allocate pages for kexec without encryption. This
> is needed in order to be able to reboot in the kernel in the same manner
> as originally booted.
>
> Additionally, when shutting down all of the CPUs we need to be sure to
> flush the caches and then halt. This is needed when booting from a state
> where SME was not active into a state where SME is active (or vice-versa).
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
>
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/init.h | 1 +
> arch/x86/include/asm/kexec.h | 8 ++++++++
> arch/x86/include/asm/pgtable_types.h | 1 +
> arch/x86/kernel/machine_kexec_64.c | 35 +++++++++++++++++++++++++++++++++-
> arch/x86/kernel/process.c | 17 +++++++++++++++--
> arch/x86/mm/ident_map.c | 12 ++++++++----
> include/linux/kexec.h | 14 ++++++++++++++
> kernel/kexec_core.c | 6 ++++++
> 8 files changed, 87 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 474eb8c..05c4aa0 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -7,6 +7,7 @@ struct x86_mapping_info {
> unsigned long page_flag; /* page flag for PMD or PUD entry */
> unsigned long offset; /* ident mapping offset */
> bool direct_gbpages; /* PUD level 1GB page support */
> + unsigned long kernpg_flag; /* kernel pagetable flag override */
> };
>
> int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 70ef205..e8183ac 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
> uint64_t r15;
> uint64_t rip;
> };
> +
> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
> + gfp_t gfp);
> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
> +
> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
> +
> #endif
>
> typedef void crash_vmclear_fn(void);
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index ce8cb1c..0f326f4 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -213,6 +213,7 @@ enum page_cache_mode {
> #define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC)
> #define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
> #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
> +#define PAGE_KERNEL_EXEC_NOENC __pgprot(__PAGE_KERNEL_EXEC)
> #define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
> #define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
> #define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 6f5ca4e..35e069a 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -87,7 +87,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
> }
> pte = pte_offset_kernel(pmd, vaddr);
> - set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
> + set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
> return 0;
> err:
> free_transition_pgtable(image);
> @@ -115,6 +115,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
> .alloc_pgt_page = alloc_pgt_page,
> .context = image,
> .page_flag = __PAGE_KERNEL_LARGE_EXEC,
> + .kernpg_flag = _KERNPG_TABLE_NOENC,
> };
> unsigned long mstart, mend;
> pgd_t *level4p;
> @@ -602,3 +603,35 @@ void arch_kexec_unprotect_crashkres(void)
> {
> kexec_mark_crashkres(false);
> }
> +
> +int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
> +{
> + int ret;
> +
> + if (sme_active()) {
What happened to flipping the logic and saving an indentation level here?
> + /*
> + * If SME is active we need to be sure that kexec pages are
> + * not encrypted because when we boot to the new kernel the
> + * pages won't be accessed encrypted (initially).
> + */
> + ret = set_memory_decrypted((unsigned long)vaddr, pages);
> + if (ret)
> + return ret;
> +
> + if (gfp & __GFP_ZERO)
> + memset(vaddr, 0, pages * PAGE_SIZE);
> + }
This is still zeroing the memory a second time. That function has missed
all my comments from last time.
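For illustration, the early-return shape being asked about might look like
this (a sketch only; it leaves the separate question of the redundant
zeroing untouched):

int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
{
        int ret;

        /* If SME is not active, the pages are already in the desired state */
        if (!sme_active())
                return 0;

        /*
         * Kexec pages must not be encrypted because the new kernel will
         * not access them encrypted (initially).
         */
        ret = set_memory_decrypted((unsigned long)vaddr, pages);
        if (ret)
                return ret;

        if (gfp & __GFP_ZERO)
                memset(vaddr, 0, pages * PAGE_SIZE);

        return 0;
}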
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/15/2017 4:08 AM, Borislav Petkov wrote:
> On Wed, Jun 14, 2017 at 02:49:02PM -0500, Tom Lendacky wrote:
>> I guess I don't need the sme_active() check since the second part of the
>> if statement can only ever be true if SME is active (since mask is
>> unsigned).
>
> ... and you can define sme_me_mask as an u64 directly (it is that already,
> practically, as we don't do SME on 32-bit) and then get rid of the cast.
Let me look into that. There are so many places that expect an
unsigned long that I'll have to see how it all works out from a build
perspective.
Thanks,
Tom
>
On 6/15/2017 4:41 AM, Borislav Petkov wrote:
> On Wed, Jun 14, 2017 at 03:40:28PM -0500, Tom Lendacky wrote:
>> I was trying to keep all the logic for it here in the SME related files
>> rather than put it in the iommu code itself. But it is easy enough to
>> move if you think it's worth it.
>
> Yes please - the less needlessly global symbols, the better.
Ok.
>
>>> Also, you said in another mail on this subthread that c->microcode
>>> is not yet set. Are you saying, that the iommu init gunk runs before
>>> init_amd(), where we do set c->microcode?
>>>
>>> If so, we can move the setting to early_init_amd() or so.
>>
>> I'll look into that.
>
> And I don't think c->microcode is not set by the time we init the iommu
> because, AFAICT, we do the latter in pci_iommu_init() and that's a
> rootfs_initcall() which happens later then the CPU init stuff.
Actually the detection routine, amd_iommu_detect(), is part of the
IOMMU_INIT_FINISH macro support which is called early through mm_init()
from start_kernel() and that routine is called before init_amd().
>
>> I'll look into simplifying the checks.
>
> Something like this maybe?
>
> if (rev >= 0x1205)
> return true;
>
> if (rev <= 0x11ff && rev >= 0x1126)
> return true;
>
> return false;
Yup, something like that.
>
>>> WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
>>> #134: FILE: drivers/iommu/amd_iommu.c:866:
>>> +static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
>>>
>>
>> The semaphore area is written to by the device so the use of volatile is
>> appropriate in this case.
>
> Do you mean this is like the last exception case in that document above:
>
> "
> - Pointers to data structures in coherent memory which might be modified
> by I/O devices can, sometimes, legitimately be volatile. A ring buffer
> used by a network adapter, where that adapter changes pointers to
> indicate which descriptors have been processed, is an example of this
> type of situation."
>
> ?
>
> If so, it did work fine until now, without the volatile. Why is it
> needed now, all of a sudden?
If you run checkpatch against the whole amd_iommu.c file you'll see the
same warning for the wait_on_sem() function. The checkpatch warning
shows up now because I modified the build_completion_wait() function as
part of the change to use iommu_virt_to_phys().
Since I'm casting the arg to iommu_virt_to_phys() no matter what, I can
avoid the signature change to build_completion_wait() and avoid this
confusion in the future.
Thanks,
Tom
>
On Thu, Jun 15, 2017 at 09:59:45AM -0500, Tom Lendacky wrote:
> Actually the detection routine, amd_iommu_detect(), is part of the
> IOMMU_INIT_FINISH macro support which is called early through mm_init()
> from start_kernel() and that routine is called before init_amd().
Ah, we do that there too:
for (p = __iommu_table; p < __iommu_table_end; p++) {
Can't say that that code with the special section and whatnot is
obvious. :-\
Oh, well, early_init_amd() then. That is called in
start_kernel->setup_arch->early_cpu_init and thus before mm_init().
> > If so, it did work fine until now, without the volatile. Why is it
> > needed now, all of a sudden?
>
> If you run checkpatch against the whole amd_iommu.c file you'll see that
I'm, of course, not talking about the signature change: I'm *actually*
questioning the need to make this argument volatile, all of a sudden.
If there's a need, please explain why. It worked fine until now. If it
didn't, we would've seen it.
If it is a bug, then it needs a proper explanation, a *separate* patch
and so on. But not like now, a drive-by change in an IOMMU enablement
patch.
If it is wrong, then wait_on_sem() needs to be fixed too. AFAICT,
wait_on_sem() gets called in both cases with interrupts disabled, while
holding a lock, so I'd like to know why, even in that case, this
variable needs to be volatile.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/15/2017 10:33 AM, Borislav Petkov wrote:
> On Thu, Jun 15, 2017 at 09:59:45AM -0500, Tom Lendacky wrote:
>> Actually the detection routine, amd_iommu_detect(), is part of the
>> IOMMU_INIT_FINISH macro support which is called early through mm_init()
>> from start_kernel() and that routine is called before init_amd().
>
> Ah, we do that there too:
>
> for (p = __iommu_table; p < __iommu_table_end; p++) {
>
> Can't say that that code with the special section and whatnot is
> obvious. :-\
>
> Oh, well, early_init_amd() then. That is called in
> start_kernel->setup_arch->early_cpu_init and thus before mm_init().
>
>>> If so, it did work fine until now, without the volatile. Why is it
>>> needed now, all of a sudden?
>>
>> If you run checkpatch against the whole amd_iommu.c file you'll see that
>
> I'm, of course, not talking about the signature change: I'm *actually*
> questioning the need to make this argument volatile, all of a sudden.
Understood.
>
> If there's a need, please explain why. It worked fine until now. If it
> didn't, we would've seen it.
The original reason for the change was to make the use of
iommu_virt_to_phys() straightforward. Removing the cast and changing
build_completion_wait() to take a u64 * (without volatile) resulted in a
warning because cmd_sem is defined in the amd_iommu struct as volatile,
which required a cast on the call to iommu_virt_to_phys() anyway. Since
it worked fine previously and the whole volatile question is beyond the
scope of this patchset, I'll change back to how the function was
originally called.
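In other words, something along these lines, keeping the original u64
parameter and doing the conversion internally (a sketch; the exact cast is
whatever the reworked patch ends up using):

static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
{
        u64 paddr = iommu_virt_to_phys((void *)(unsigned long)address);

        WARN_ON(address & 0x7ULL);

        memset(cmd, 0, sizeof(*cmd));
        cmd->data[0] = lower_32_bits(paddr) | CMD_COMPL_WAIT_STORE_MASK;
        cmd->data[1] = upper_32_bits(paddr);
        cmd->data[2] = 1;
        CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
}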
>
> If it is a bug, then it needs a proper explanation, a *separate* patch
> and so on. But not like now, a drive-by change in an IOMMU enablement
> patch.
>
> If it is wrong, then wait_on_sem() needs to be fixed too. AFAICT,
> wait_on_sem() gets called in both cases with interrupts disabled, while
> holding a lock so I'd like to pls know why, even in that case, does this
> variable need to be volatile
Changing the signature back reverts to the original way, so this can be
looked at separately from this patchset.
Thanks,
Tom
>
On 6/15/2017 5:03 AM, Borislav Petkov wrote:
> On Wed, Jun 07, 2017 at 02:18:27PM -0500, Tom Lendacky wrote:
>> Provide support so that kexec can be used to boot a kernel when SME is
>> enabled.
>>
>> Support is needed to allocate pages for kexec without encryption. This
>> is needed in order to be able to reboot in the kernel in the same manner
>> as originally booted.
>>
>> Additionally, when shutting down all of the CPUs we need to be sure to
>> flush the caches and then halt. This is needed when booting from a state
>> where SME was not active into a state where SME is active (or vice-versa).
>> Without these steps, it is possible for cache lines to exist for the same
>> physical location but tagged both with and without the encryption bit. This
>> can cause random memory corruption when caches are flushed depending on
>> which cacheline is written last.
>>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> ---
>> arch/x86/include/asm/init.h | 1 +
>> arch/x86/include/asm/kexec.h | 8 ++++++++
>> arch/x86/include/asm/pgtable_types.h | 1 +
>> arch/x86/kernel/machine_kexec_64.c | 35 +++++++++++++++++++++++++++++++++-
>> arch/x86/kernel/process.c | 17 +++++++++++++++--
>> arch/x86/mm/ident_map.c | 12 ++++++++----
>> include/linux/kexec.h | 14 ++++++++++++++
>> kernel/kexec_core.c | 6 ++++++
>> 8 files changed, 87 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
>> index 474eb8c..05c4aa0 100644
>> --- a/arch/x86/include/asm/init.h
>> +++ b/arch/x86/include/asm/init.h
>> @@ -7,6 +7,7 @@ struct x86_mapping_info {
>> unsigned long page_flag; /* page flag for PMD or PUD entry */
>> unsigned long offset; /* ident mapping offset */
>> bool direct_gbpages; /* PUD level 1GB page support */
>> + unsigned long kernpg_flag; /* kernel pagetable flag override */
>> };
>>
>> int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 70ef205..e8183ac 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
>> uint64_t r15;
>> uint64_t rip;
>> };
>> +
>> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
>> + gfp_t gfp);
>> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
>> +
>> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
>> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
>> +
>> #endif
>>
>> typedef void crash_vmclear_fn(void);
>> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
>> index ce8cb1c..0f326f4 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -213,6 +213,7 @@ enum page_cache_mode {
>> #define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC)
>> #define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>> #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
>> +#define PAGE_KERNEL_EXEC_NOENC __pgprot(__PAGE_KERNEL_EXEC)
>> #define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>> #define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>> #define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 6f5ca4e..35e069a 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -87,7 +87,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>> set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>> }
>> pte = pte_offset_kernel(pmd, vaddr);
>> - set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
>> + set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>> return 0;
>> err:
>> free_transition_pgtable(image);
>> @@ -115,6 +115,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>> .alloc_pgt_page = alloc_pgt_page,
>> .context = image,
>> .page_flag = __PAGE_KERNEL_LARGE_EXEC,
>> + .kernpg_flag = _KERNPG_TABLE_NOENC,
>> };
>> unsigned long mstart, mend;
>> pgd_t *level4p;
>> @@ -602,3 +603,35 @@ void arch_kexec_unprotect_crashkres(void)
>> {
>> kexec_mark_crashkres(false);
>> }
>> +
>> +int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>> +{
>> + int ret;
>> +
>> + if (sme_active()) {
>
> What happened to flipping the logic and saving an indentation level here?
>
>> + /*
>> + * If SME is active we need to be sure that kexec pages are
>> + * not encrypted because when we boot to the new kernel the
>> + * pages won't be accessed encrypted (initially).
>> + */
>> + ret = set_memory_decrypted((unsigned long)vaddr, pages);
>> + if (ret)
>> + return ret;
>> +
>> + if (gfp & __GFP_ZERO)
>> + memset(vaddr, 0, pages * PAGE_SIZE);
>> + }
>
> This is still zeroing the memory a second time. That function has missed
> all my comments from last time.
Hmmm... not sure what happened, I thought I made changes here. I'll
take care of it.
Thanks,
Tom
>
On June 15, 2017 11:33:22 AM EDT, Borislav Petkov <[email protected]> wrote:
>On Thu, Jun 15, 2017 at 09:59:45AM -0500, Tom Lendacky wrote:
>> Actually the detection routine, amd_iommu_detect(), is part of the
>> IOMMU_INIT_FINISH macro support which is called early through mm_init()
>> from start_kernel() and that routine is called before init_amd().
>
>Ah, we do that there too:
>
> for (p = __iommu_table; p < __iommu_table_end; p++) {
>
>Can't say that that code with the special section and whatnot is
>obvious. :-\
>
Patches to make it more obvious would always be welcome!
Thanks!
On Thu, Jun 15, 2017 at 11:33:41AM -0500, Tom Lendacky wrote:
> Changing the signature back reverts to the original way, so this can be
> looked at separate from this patchset then.
Right, the patch which added the volatile thing was this one:
4bf5beef578e ("iommu/amd: Don't put completion-wait semaphore on stack")
and the commit message doesn't say why the thing needs to be volatile at
all.
Joerg?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Jun 15, 2017 at 11:41:12AM +0200, Borislav Petkov wrote:
> On Wed, Jun 14, 2017 at 03:40:28PM -0500, Tom Lendacky wrote:
> > > WARNING: Use of volatile is usually wrong: see Documentation/process/volatile-considered-harmful.rst
> > > #134: FILE: drivers/iommu/amd_iommu.c:866:
> > > +static void build_completion_wait(struct iommu_cmd *cmd, volatile u64 *sem)
> > >
> >
> > The semaphore area is written to by the device so the use of volatile is
> > appropriate in this case.
>
> Do you mean this is like the last exception case in that document above:
>
> "
> - Pointers to data structures in coherent memory which might be modified
> by I/O devices can, sometimes, legitimately be volatile. A ring buffer
> used by a network adapter, where that adapter changes pointers to
> indicate which descriptors have been processed, is an example of this
> type of situation."
>
> ?
So currently (without this patch) the build_completion_wait function
does not take a volatile parameter, only wait_on_sem() does.
wait_on_sem() needs it because its purpose is to poll a memory location
which is changed by the IOMMU hardware when it's done with command
processing.
But the 'volatile' in build_completion_wait() looks unnecessary, because
the function does not poll the memory location. It only uses the
pointer, converts it to a physical address and writes it to the command
to be queued.
Regards,
Joerg
On Wed, Jun 21, 2017 at 05:37:22PM +0200, Joerg Roedel wrote:
> > Do you mean this is like the last exception case in that document above:
> >
> > "
> > - Pointers to data structures in coherent memory which might be modified
> > by I/O devices can, sometimes, legitimately be volatile. A ring buffer
> > used by a network adapter, where that adapter changes pointers to
> > indicate which descriptors have been processed, is an example of this
> > type of situation."
> >
> > ?
>
> So currently (without this patch) the build_completion_wait function
> does not take a volatile parameter, only wait_on_sem() does.
>
> Wait_on_sem() needs it because its purpose is to poll a memory location
> which is changed by the iommu-hardware when its done with command
> processing.
Right, the reason above - memory modifiable by an IO device. You could
add a comment there explaining the need for the volatile.
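Such a comment might look roughly like this (illustrative; wait_on_sem()'s
body isn't shown in this thread):

/*
 * The IOMMU hardware writes the completion-wait semaphore when the
 * COMPLETION_WAIT command finishes, so force a fresh read of the
 * location on every iteration of the polling loop - hence the
 * volatile pointer.
 */
static int wait_on_sem(volatile u64 *sem)
{
        /* ... existing polling loop, unchanged ... */
}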
> But the 'volatile' in build_completion_wait() looks unnecessary, because
> the function does not poll the memory location. It only uses the
> pointer, converts it to a physical address and writes it to the command
> to be queued.
Ok.
Thanks.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 6/21/2017 11:59 AM, Borislav Petkov wrote:
> On Wed, Jun 21, 2017 at 05:37:22PM +0200, Joerg Roedel wrote:
>>> Do you mean this is like the last exception case in that document above:
>>>
>>> "
>>> - Pointers to data structures in coherent memory which might be modified
>>> by I/O devices can, sometimes, legitimately be volatile. A ring buffer
>>> used by a network adapter, where that adapter changes pointers to
>>> indicate which descriptors have been processed, is an example of this
>>> type of situation."
>>>
>>> ?
>>
>> So currently (without this patch) the build_completion_wait function
>> does not take a volatile parameter, only wait_on_sem() does.
>>
>> Wait_on_sem() needs it because its purpose is to poll a memory location
>> which is changed by the iommu-hardware when its done with command
>> processing.
>
> Right, the reason above - memory modifiable by an IO device. You could
> add a comment there explaining the need for the volatile.
>
>> But the 'volatile' in build_completion_wait() looks unnecessary, because
>> the function does not poll the memory location. It only uses the
>> pointer, converts it to a physical address and writes it to the command
>> to be queued.
>
> Ok.
Ok, so the (now) current version of the patch that doesn't change the
function signature is the right way to go.
Thanks,
Tom
>
> Thanks.
>