2018-03-05 16:34:10

by Kirill A. Shutemov

Subject: [RFC, PATCH 00/22] Partial MKTME enabling

Hi everybody,

Here's an updated version of my patchset that brings support for MKTME.
It's not yet complete, but I think it is worth sharing to get early feedback.

Things that are missing:

- kmap() is not yet wired up to support temporary mappings of encrypted
pages. It's required to allow the kernel to access encrypted memory.

- Interface to manipulate encryption keys.

- Interface to create encrypted userspace mappings.

- IOMMU support.

What has been done:

- PCONFIG, TME and MKTME enumeration.

- In-kernel helper that allows programming encryption keys into the CPU.

- Allocation and freeing of encrypted pages.

- Helpers to find out if a VMA/anon_vma/page is encrypted and with what
KeyID.

Any feedback is welcome.

------------------------------------------------------------------------------

Multikey Total Memory Encryption (MKTME)[1] is a technology that allows
transparent memory encryption in upcoming Intel platforms.

MKTME is built on top of TME. TME allows encryption of the entirety of
system memory using a single key. MKTME allows multiple encryption
domains, each with its own key -- different memory pages can be encrypted
with different keys.

Key design points of Intel MKTME:

- Initial HW implementation would support up to 63 keys (plus one default
TME key). But the number of keys may be as low as 3, depending on SKU
and BIOS settings.

- To access encrypted memory you need to use a mapping with the proper
KeyID in the page table entry. The KeyID is encoded in the upper bits of
the PFN in the page table entry (a short sketch after the reference below
illustrates the encoding).

This means we cannot use the direct map to access encrypted memory from
the kernel side. My idea is to re-use the kmap() interface to get a proper
temporary mapping on the kernel side.

- The CPU does not enforce coherency between mappings of the same physical
page with different KeyIDs or encryption keys. We would need to take
care of flushing the cache on allocation of an encrypted page and on
returning it back to the free pool.

- For managing keys, there's the MKTME_KEY_PROGRAM leaf of the new PCONFIG
(platform configuration) instruction. It allows loading and clearing keys
associated with a KeyID. You can also ask the CPU to generate a key for
you or to disable memory encryption when a KeyID is used.

[1] https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
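
To make the encoding concrete, here is a minimal illustrative sketch (not part
of the series; it assumes the mktme_keyid_mask and mktme_keyid_shift variables
introduced later in the patchset) of how a KeyID would be packed into and
pulled out of a page table entry value:

/* Illustration only: mktme_keyid_mask/mktme_keyid_shift come from the series */
static inline u64 pteval_set_keyid(u64 pteval, int keyid)
{
	/* Drop any previous KeyID, then encode the new one in the upper PFN bits */
	pteval &= ~(u64)mktme_keyid_mask;
	return pteval | ((u64)keyid << mktme_keyid_shift);
}

static inline int pteval_keyid(u64 pteval)
{
	return (pteval & mktme_keyid_mask) >> mktme_keyid_shift;
}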

Kirill A. Shutemov (22):
x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
x86/tme: Detect if TME and MKTME is activated by BIOS
x86/cpufeatures: Add Intel PCONFIG cpufeature
x86/pconfig: Detect PCONFIG targets
x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
x86/mm: Mask out KeyID bits from page table entry pfn
mm: Introduce __GFP_ENCRYPT
mm, rmap: Add arch-specific field into anon_vma
mm/shmem: Zero out unused vma fields in shmem_pseudo_vma_init()
mm: Use __GFP_ENCRYPT for pages in encrypted VMAs
mm: Do not merge vma with different encryption KeyIDs
mm, rmap: Free encrypted pages once mapcount drops to zero
mm, khugepaged: Do not collapse pages in encrypted VMAs
x86/mm: Introduce variables to store number, shift and mask of KeyIDs
x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
x86/mm: Implement vma_is_encrypted() and vma_keyid()
x86/mm: Handle allocation of encrypted pages
x86/mm: Implement free_encrypt_page()
x86/mm: Implement anon_vma_encrypted() and anon_vma_keyid()
x86/mm: Introduce page_keyid() and page_encrypted()
x86: Introduce CONFIG_X86_INTEL_MKTME

arch/x86/Kconfig | 21 +++++++
arch/x86/boot/compressed/kaslr_64.c | 3 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/intel_pconfig.h | 65 +++++++++++++++++++
arch/x86/include/asm/mktme.h | 56 +++++++++++++++++
arch/x86/include/asm/page.h | 13 +++-
arch/x86/include/asm/page_types.h | 8 ++-
arch/x86/include/asm/pgtable_types.h | 7 ++-
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/intel.c | 119 +++++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/intel_pconfig.c | 82 ++++++++++++++++++++++++
arch/x86/mm/Makefile | 2 +
arch/x86/mm/mem_encrypt_identity.c | 3 +
arch/x86/mm/mktme.c | 101 +++++++++++++++++++++++++++++
arch/x86/mm/pgtable.c | 5 ++
include/linux/gfp.h | 29 +++++++--
include/linux/mm.h | 17 +++++
include/linux/rmap.h | 6 ++
include/trace/events/mmflags.h | 1 +
mm/Kconfig | 3 +
mm/khugepaged.c | 2 +
mm/mempolicy.c | 3 +
mm/mmap.c | 3 +-
mm/page_alloc.c | 3 +
mm/rmap.c | 49 +++++++++++++--
mm/shmem.c | 3 +-
tools/perf/builtin-kmem.c | 1 +
27 files changed, 590 insertions(+), 19 deletions(-)
create mode 100644 arch/x86/include/asm/intel_pconfig.h
create mode 100644 arch/x86/include/asm/mktme.h
create mode 100644 arch/x86/kernel/cpu/intel_pconfig.c
create mode 100644 arch/x86/mm/mktme.c

--
2.16.1



2018-03-05 16:27:59

by Kirill A. Shutemov

Subject: [RFC, PATCH 21/22] x86/mm: Introduce page_keyid() and page_encrypted()

The new helpers check whether a page is encrypted and with which KeyID.
They use the anon_vma to get the information.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 14 ++++++++++++++
arch/x86/mm/mktme.c | 17 +++++++++++++++++
2 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 56c7e9b14ab6..dd81fe167e25 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -33,10 +33,24 @@ bool anon_vma_encrypted(struct anon_vma *anon_vma);

#define anon_vma_keyid anon_vma_keyid
int anon_vma_keyid(struct anon_vma *anon_vma);
+
+int page_keyid(struct page *page);
#else
+
#define mktme_keyid_mask ((phys_addr_t)0)
#define mktme_nr_keyids 0
#define mktme_keyid_shift 0
+
+static inline int page_keyid(struct page *page)
+{
+ return 0;
+}
#endif

+static inline bool page_encrypted(struct page *page)
+{
+ /* All pages with non-zero KeyID are encrypted */
+ return page_keyid(page) != 0;
+}
+
#endif
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 69172aabc07c..0ab795dfb1a4 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -39,6 +39,23 @@ int anon_vma_keyid(struct anon_vma *anon_vma)
return anon_vma->arch_anon_vma.keyid;
}

+int page_keyid(struct page *page)
+{
+ struct anon_vma *anon_vma;
+ int keyid = 0;
+
+ if (!PageAnon(page))
+ return 0;
+
+ anon_vma = page_get_anon_vma(page);
+ if (anon_vma) {
+ keyid = anon_vma_keyid(anon_vma);
+ put_anon_vma(anon_vma);
+ }
+
+ return keyid;
+}
+
void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order)
{
void *v = page_to_virt(page);
--
2.16.1
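
As a usage illustration for the helpers above, here is a hedged sketch (the
caller is hypothetical and not part of the series; it only relies on
page_encrypted() and page_keyid() as defined in this patch):

/* Hypothetical caller: only encrypted pages need special treatment */
static void example_handle_page(struct page *page)
{
	if (!page_encrypted(page))
		return;

	/* Non-zero KeyID: the page belongs to an encrypted anon_vma */
	pr_debug("page uses KeyID %d\n", page_keyid(page));
}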


2018-03-05 16:28:24

by Kirill A. Shutemov

Subject: [RFC, PATCH 04/22] x86/pconfig: Detect PCONFIG targets

Intel PCONFIG targets are enumerated via the new CPUID leaf 0x1b. This patch
detects all supported targets of PCONFIG and implements a helper to check
whether a target is supported.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/intel_pconfig.h | 15 +++++++
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/intel_pconfig.c | 82 ++++++++++++++++++++++++++++++++++++
3 files changed, 98 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/include/asm/intel_pconfig.h
create mode 100644 arch/x86/kernel/cpu/intel_pconfig.c

diff --git a/arch/x86/include/asm/intel_pconfig.h b/arch/x86/include/asm/intel_pconfig.h
new file mode 100644
index 000000000000..fb7a37c3798b
--- /dev/null
+++ b/arch/x86/include/asm/intel_pconfig.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_INTEL_PCONFIG_H
+#define _ASM_X86_INTEL_PCONFIG_H
+
+#include <asm/asm.h>
+#include <asm/processor.h>
+
+enum pconfig_target {
+ INVALID_TARGET = 0,
+ MKTME_TARGET = 1,
+ PCONFIG_TARGET_NR
+};
+
+int pconfig_target_supported(enum pconfig_target target);
+
+#endif /* _ASM_X86_INTEL_PCONFIG_H */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 570e8bb1f386..a66229f51b12 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -28,7 +28,7 @@ obj-y += cpuid-deps.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o

-obj-$(CONFIG_CPU_SUP_INTEL) += intel.o
+obj-$(CONFIG_CPU_SUP_INTEL) += intel.o intel_pconfig.o
obj-$(CONFIG_CPU_SUP_AMD) += amd.o
obj-$(CONFIG_CPU_SUP_CYRIX_32) += cyrix.o
obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
diff --git a/arch/x86/kernel/cpu/intel_pconfig.c b/arch/x86/kernel/cpu/intel_pconfig.c
new file mode 100644
index 000000000000..0771a905b286
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_pconfig.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel PCONFIG instruction support.
+ *
+ * Copyright (C) 2017 Intel Corporation
+ *
+ * Author:
+ * Kirill A. Shutemov <[email protected]>
+ */
+
+#include <asm/cpufeature.h>
+#include <asm/intel_pconfig.h>
+
+#define PCONFIG_CPUID 0x1b
+
+#define PCONFIG_CPUID_SUBLEAF_MASK ((1 << 12) - 1)
+
+/* Subleaf type (EAX) for PCONFIG CPUID leaf (0x1B) */
+enum {
+ PCONFIG_CPUID_SUBLEAF_INVALID = 0,
+ PCONFIG_CPUID_SUBLEAF_TARGETID = 1,
+};
+
+/* Bitmask of supported targets */
+static u64 targets_supported __read_mostly;
+
+int pconfig_target_supported(enum pconfig_target target)
+{
+ /*
+ * We would need to re-think the implementation once we get > 64
+ * PCONFIG targets. Spec allows up to 2^32 targets.
+ */
+ BUILD_BUG_ON(PCONFIG_TARGET_NR >= 64);
+
+ if (WARN_ON_ONCE(target >= 64))
+ return 0;
+ return targets_supported & (1ULL << target);
+}
+
+static int __init intel_pconfig_init(void)
+{
+ int subleaf;
+
+ if (!boot_cpu_has(X86_FEATURE_PCONFIG))
+ return 0;
+
+ /*
+ * Scan subleafs of PCONFIG CPUID leaf.
+ *
+ * Subleafs of the same type need not be consecutive.
+ *
+ * Stop on the first invalid subleaf type. All subleafs after the first
+ * invalid are invalid too.
+ */
+ for (subleaf = 0; subleaf < INT_MAX; subleaf++) {
+ struct cpuid_regs regs;
+
+ cpuid_count(PCONFIG_CPUID, subleaf,
+ &regs.eax, &regs.ebx, &regs.ecx, &regs.edx);
+
+ switch (regs.eax & PCONFIG_CPUID_SUBLEAF_MASK) {
+ case PCONFIG_CPUID_SUBLEAF_INVALID:
+ /* Stop on the first invalid subleaf */
+ goto out;
+ case PCONFIG_CPUID_SUBLEAF_TARGETID:
+ /* Mark supported PCONFIG targets */
+ if (regs.ebx < 64)
+ targets_supported |= (1ULL << regs.ebx);
+ if (regs.ecx < 64)
+ targets_supported |= (1ULL << regs.ecx);
+ if (regs.edx < 64)
+ targets_supported |= (1ULL << regs.edx);
+ break;
+ default:
+ /* Unknown CPUID.PCONFIG subleaf: ignore */
+ break;
+ }
+ }
+out:
+ return 0;
+}
+arch_initcall(intel_pconfig_init);
--
2.16.1


2018-03-05 16:28:35

by Kirill A. Shutemov

Subject: [RFC, PATCH 03/22] x86/cpufeatures: Add Intel PCONFIG cpufeature

CPUID.0x7.0x0:EDX[18] indicates whether the Intel CPU supports the PCONFIG instruction.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d5f42a303e74..755353d79ae9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -330,6 +330,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
#define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */
#define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
--
2.16.1


2018-03-05 16:28:54

by Kirill A. Shutemov

Subject: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

As on allocation of an encrypted page, we need to flush the cache before
returning the page to the free pool. Failing to do this may lead to data
corruption.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/mm/mktme.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 1129ad25b22a..ef0eb1eb8d6e 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -45,6 +45,19 @@ void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order)
WARN_ONCE(gfp & __GFP_ZERO, "__GFP_ZERO is useless for encrypted pages");
}

+void free_encrypt_page(struct page *page, int keyid, unsigned int order)
+{
+ int i;
+ void *v;
+
+ for (i = 0; i < (1 << order); i++) {
+ v = kmap_atomic_keyid(page, keyid + i);
+ /* See comment in prep_encrypt_page() */
+ clflush_cache_range(v, PAGE_SIZE);
+ kunmap_atomic(v);
+ }
+}
+
struct page *__alloc_zeroed_encrypted_user_highpage(gfp_t gfp,
struct vm_area_struct *vma, unsigned long vaddr)
{
--
2.16.1


2018-03-05 16:29:04

by Kirill A. Shutemov

Subject: [RFC, PATCH 22/22] x86: Introduce CONFIG_X86_INTEL_MKTME

Add a new config option to enable/disable Multi-Key Total Memory
Encryption support.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/Kconfig | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 99aecb2caed3..e1b377443899 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1540,6 +1540,23 @@ config ARCH_USE_MEMREMAP_PROT
def_bool y
depends on AMD_MEM_ENCRYPT

+config X86_INTEL_MKTME
+ bool "Intel Multi-Key Total Memory Encryption"
+ select DYNAMIC_PHYSICAL_MASK
+ select ARCH_WANTS_GFP_ENCRYPT
+ depends on X86_64 && CPU_SUP_INTEL
+ ---help---
+ Say yes to enable support for Multi-Key Total Memory Encryption.
+ This requires an Intel processor that supports the feature.
+
+ Multikey Total Memory Encryption (MKTME) is a technology that allows
+ transparent memory encryption in upcoming Intel platforms.
+
+ MKTME is built on top of TME. TME allows encryption of the entirety
+ of system memory using a single key. MKTME allows multiple
+ encryption domains, each with its own key -- different memory pages
+ can be encrypted with different keys.
+
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
--
2.16.1


2018-03-05 16:29:37

by Kirill A. Shutemov

Subject: [RFC, PATCH 18/22] x86/mm: Handle allocation of encrypted pages

The hardware/CPU does not enforce coherency between mappings of the
same physical page with different KeyIDs or encryption keys.
We are responsible for cache management.

We have to flush the cache on allocation and freeing of an encrypted page.
Failing to do this may lead to data corruption.

Zeroing of an encrypted page has to be done with the correct KeyID. Normally,
kmap() takes care of creating a temporary mapping for the page.
But in the allocation path the page doesn't have page->mapping set yet.

kmap_atomic_keyid() will map the page with the specified KeyID.
For now it's a dummy implementation that will be replaced later.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 3 +++
arch/x86/include/asm/page.h | 13 +++++++++++--
arch/x86/mm/mktme.c | 38 ++++++++++++++++++++++++++++++++++++++
3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 08f613953207..c8f41837351a 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -5,6 +5,9 @@

struct vm_area_struct;

+struct page *__alloc_zeroed_encrypted_user_highpage(gfp_t gfp,
+ struct vm_area_struct *vma, unsigned long vaddr);
+
#ifdef CONFIG_X86_INTEL_MKTME
extern phys_addr_t mktme_keyid_mask;
extern int mktme_nr_keyids;
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..8f808723f676 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,7 @@
struct page;

#include <linux/range.h>
+#include <asm/mktme.h>
extern struct range pfn_mapped[];
extern int nr_pfn_mapped;

@@ -34,9 +35,17 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
copy_page(to, from);
}

-#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
- alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+({ \
+ struct page *page; \
+ gfp_t gfp = movableflags | GFP_HIGHUSER; \
+ if (vma_is_encrypted(vma)) \
+ page = __alloc_zeroed_encrypted_user_highpage(gfp, vma, vaddr); \
+ else \
+ page = alloc_page_vma(gfp | __GFP_ZERO, vma, vaddr); \
+ page; \
+})

#ifndef __pa
#define __pa(x) __phys_addr((unsigned long)(x))
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 3b2f28a21d99..1129ad25b22a 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,10 +1,17 @@
#include <linux/mm.h>
+#include <linux/highmem.h>
#include <asm/mktme.h>

phys_addr_t mktme_keyid_mask;
int mktme_nr_keyids;
int mktme_keyid_shift;

+void *kmap_atomic_keyid(struct page *page, int keyid)
+{
+ /* Dummy implementation. To be replaced. */
+ return kmap_atomic(page);
+}
+
bool vma_is_encrypted(struct vm_area_struct *vma)
{
return pgprot_val(vma->vm_page_prot) & mktme_keyid_mask;
@@ -20,3 +27,34 @@ int vma_keyid(struct vm_area_struct *vma)
prot = pgprot_val(vma->vm_page_prot);
return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
}
+
+void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order)
+{
+ void *v = page_to_virt(page);
+
+ /*
+ * The hardware/CPU does not enforce coherency between mappings of the
+ * same physical page with different KeyIDs or encryption keys.
+ * We are responsible for cache management.
+ *
+ * We have to flush cache on allocation and freeing of encrypted page.
+ * Failing to do this may lead to data corruption.
+ */
+ clflush_cache_range(v, PAGE_SIZE << order);
+
+ WARN_ONCE(gfp & __GFP_ZERO, "__GFP_ZERO is useless for encrypted pages");
+}
+
+struct page *__alloc_zeroed_encrypted_user_highpage(gfp_t gfp,
+ struct vm_area_struct *vma, unsigned long vaddr)
+{
+ struct page *page;
+ void *v;
+
+ page = alloc_page_vma(gfp | GFP_HIGHUSER, vma, vaddr);
+ v = kmap_atomic_keyid(page, vma_keyid(vma));
+ clear_page(v);
+ kunmap_atomic(v);
+
+ return page;
+}
--
2.16.1


2018-03-05 16:30:01

by Kirill A. Shutemov

Subject: [RFC, PATCH 17/22] x86/mm: Implement vma_is_encrypted() and vma_keyid()

We store the KeyID in the upper bits of vm_page_prot, matching the position
of the KeyID in the PTE. vma_keyid() extracts the KeyID from vm_page_prot.

A VMA is encrypted if its KeyID is non-zero. vma_is_encrypted() checks that.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 9 +++++++++
arch/x86/mm/mktme.c | 17 +++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index df31876ec48c..08f613953207 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -3,10 +3,19 @@

#include <linux/types.h>

+struct vm_area_struct;
+
#ifdef CONFIG_X86_INTEL_MKTME
extern phys_addr_t mktme_keyid_mask;
extern int mktme_nr_keyids;
extern int mktme_keyid_shift;
+
+#define vma_is_encrypted vma_is_encrypted
+bool vma_is_encrypted(struct vm_area_struct *vma);
+
+#define vma_keyid vma_keyid
+int vma_keyid(struct vm_area_struct *vma);
+
#else
#define mktme_keyid_mask ((phys_addr_t)0)
#define mktme_nr_keyids 0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 467f1b26c737..3b2f28a21d99 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,5 +1,22 @@
+#include <linux/mm.h>
#include <asm/mktme.h>

phys_addr_t mktme_keyid_mask;
int mktme_nr_keyids;
int mktme_keyid_shift;
+
+bool vma_is_encrypted(struct vm_area_struct *vma)
+{
+ return pgprot_val(vma->vm_page_prot) & mktme_keyid_mask;
+}
+
+int vma_keyid(struct vm_area_struct *vma)
+{
+ pgprotval_t prot;
+
+ if (!vma_is_anonymous(vma))
+ return 0;
+
+ prot = pgprot_val(vma->vm_page_prot);
+ return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
+}
--
2.16.1
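
For illustration, a hedged sketch of how generic code could consult these
helpers. It mirrors what the "Use __GFP_ENCRYPT for pages in encrypted VMAs"
patch does; the wrapper itself is hypothetical:

/* Hypothetical wrapper: adjust GFP flags based on the VMA */
static gfp_t example_vma_gfp(struct vm_area_struct *vma, gfp_t gfp)
{
	if (vma_is_encrypted(vma)) {
		/* The KeyID comes from the upper PFN bits of vm_page_prot */
		pr_debug("allocating for KeyID %d\n", vma_keyid(vma));
		gfp |= __GFP_ENCRYPT;
	}
	return gfp;
}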


2018-03-05 16:30:17

by Kirill A. Shutemov

Subject: [RFC, PATCH 12/22] mm: Do not merge vma with different encryption KeyIDs

VMAs with different KeyIDs do not mix together. Only VMAs with the same
KeyID are compatible.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/mm.h | 7 +++++++
mm/mmap.c | 3 ++-
2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc7b32d0189b..7a4285f09c99 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1486,6 +1486,13 @@ static inline bool vma_is_encrypted(struct vm_area_struct *vma)
}
#endif

+#ifndef vma_keyid
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_SHMEM
/*
* The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/mmap.c b/mm/mmap.c
index 9efdc021ad22..fa218d1c6bfa 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1208,7 +1208,8 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
mpol_equal(vma_policy(a), vma_policy(b)) &&
a->vm_file == b->vm_file &&
!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC|VM_SOFTDIRTY)) &&
- b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
+ b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT) &&
+ vma_keyid(a) == vma_keyid(b);
}

/*
--
2.16.1


2018-03-05 16:30:47

by Kirill A. Shutemov

Subject: [RFC, PATCH 16/22] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()

An encrypted VMA will have its KeyID stored in vma->vm_page_prot. This way we
don't need to do anything special to set up encrypted page table entries
and don't need to reserve space for a KeyID in the VMA.

This patch changes _PAGE_CHG_MASK to include the KeyID bits. Otherwise they
would be stripped from vm_page_prot on the first pgprot_modify().

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 246f15b4e64c..800f66770163 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -121,8 +121,13 @@
* protection key is treated like _PAGE_RW, for
* instance, and is *not* included in this mask since
* pte_modify() does modify it.
+ *
+ * It includes full range of PFN bits regardless if they were claimed for KeyID
+ * or not: we want to preserve KeyID on pte_modify() and pgprot_modify().
*/
-#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
+#define PTE_PFN_MASK_MAX \
+ (((signed long)PAGE_MASK) & ((1UL << __PHYSICAL_MASK_SHIFT) - 1))
+#define _PAGE_CHG_MASK (PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT | \
_PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
_PAGE_SOFT_DIRTY)
#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
--
2.16.1


2018-03-05 16:31:03

by Kirill A. Shutemov

Subject: [RFC, PATCH 15/22] x86/mm: Introduce variables to store number, shift and mask of KeyIDs

mktme_nr_keyids holds the number of KeyIDs available for MKTME, excluding
KeyID zero, which is used by TME. MKTME KeyIDs start from 1.

mktme_keyid_shift holds the shift of the KeyID within the physical address.

mktme_keyid_mask holds the mask to extract the KeyID from the physical address.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 16 ++++++++++++++++
arch/x86/kernel/cpu/intel.c | 13 +++++++++----
arch/x86/mm/Makefile | 2 ++
arch/x86/mm/mktme.c | 5 +++++
4 files changed, 32 insertions(+), 4 deletions(-)
create mode 100644 arch/x86/include/asm/mktme.h
create mode 100644 arch/x86/mm/mktme.c

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
new file mode 100644
index 000000000000..df31876ec48c
--- /dev/null
+++ b/arch/x86/include/asm/mktme.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_MKTME_H
+#define _ASM_X86_MKTME_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_INTEL_MKTME
+extern phys_addr_t mktme_keyid_mask;
+extern int mktme_nr_keyids;
+extern int mktme_keyid_shift;
+#else
+#define mktme_keyid_mask ((phys_addr_t)0)
+#define mktme_nr_keyids 0
+#define mktme_keyid_shift 0
+#endif
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 35436bbadd0b..77b5dc937ac6 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -589,11 +589,13 @@ static void detect_tme(struct cpuinfo_x86 *c)
* and number of bits claimed for KeyID is 6, bits 51:46 of
* physical address is unusable.
*/
- phys_addr_t keyid_mask;
+ mktme_keyid_mask = 1ULL << c->x86_phys_bits;
+ mktme_keyid_mask -= 1ULL << (c->x86_phys_bits - keyid_bits);
+ physical_mask &= ~mktme_keyid_mask;

- keyid_mask = 1ULL << c->x86_phys_bits;
- keyid_mask -= 1ULL << (c->x86_phys_bits - keyid_bits);
- physical_mask &= ~keyid_mask;
+
+ mktme_nr_keyids = nr_keyids;
+ mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
} else {
/*
* Reset __PHYSICAL_MASK.
@@ -601,6 +603,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
* between CPUs.
*/
physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+ mktme_keyid_mask = 0;
+ mktme_keyid_shift = 0;
+ mktme_nr_keyids = 0;
}
#endif

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..4ebee899c363 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
+
+obj-$(CONFIG_X86_INTEL_MKTME) += mktme.o
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
new file mode 100644
index 000000000000..467f1b26c737
--- /dev/null
+++ b/arch/x86/mm/mktme.c
@@ -0,0 +1,5 @@
+#include <asm/mktme.h>
+
+phys_addr_t mktme_keyid_mask;
+int mktme_nr_keyids;
+int mktme_keyid_shift;
--
2.16.1


2018-03-05 16:31:52

by Kirill A. Shutemov

Subject: [RFC, PATCH 08/22] mm: Introduce __GFP_ENCRYPT

The patch adds a new GFP flag to indicate that we're allocating an encrypted
page.

Architecture code may need to do special preparation for encrypted
pages, such as flushing the cache to avoid aliasing.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/gfp.h | 12 ++++++++++++
include/linux/mm.h | 2 ++
include/trace/events/mmflags.h | 1 +
mm/Kconfig | 3 +++
mm/page_alloc.c | 3 +++
tools/perf/builtin-kmem.c | 1 +
6 files changed, 22 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a4582b44d32..43a93ca11c3c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -24,6 +24,11 @@ struct vm_area_struct;
#define ___GFP_HIGH 0x20u
#define ___GFP_IO 0x40u
#define ___GFP_FS 0x80u
+#ifdef CONFIG_ARCH_WANTS_GFP_ENCRYPT
+#define ___GFP_ENCRYPT 0x100u
+#else
+#define ___GFP_ENCRYPT 0
+#endif
#define ___GFP_NOWARN 0x200u
#define ___GFP_RETRY_MAYFAIL 0x400u
#define ___GFP_NOFAIL 0x800u
@@ -188,6 +193,13 @@ struct vm_area_struct;
#define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL)
#define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY)

+/*
+ * Allocate encrypted page.
+ *
+ * Architectural code may need to do special preparation for encrypted pages
+ * such as flushing cache to avoid aliasing.
+ */
+#define __GFP_ENCRYPT ((__force gfp_t)___GFP_ENCRYPT)
/*
* Action modifiers
*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42adb1a..6791eccdb740 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1966,6 +1966,8 @@ extern void mem_init_print_info(const char *str);

extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);

+extern void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order);
+
/* Free the reserved page into the buddy system, so it gets managed. */
static inline void __free_reserved_page(struct page *page)
{
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index dbe1bb058c09..43cc3f7170bc 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -32,6 +32,7 @@
{(unsigned long)__GFP_ATOMIC, "__GFP_ATOMIC"}, \
{(unsigned long)__GFP_IO, "__GFP_IO"}, \
{(unsigned long)__GFP_FS, "__GFP_FS"}, \
+ {(unsigned long)__GFP_ENCRYPT, "__GFP_ENCRYPT"}, \
{(unsigned long)__GFP_NOWARN, "__GFP_NOWARN"}, \
{(unsigned long)__GFP_RETRY_MAYFAIL, "__GFP_RETRY_MAYFAIL"}, \
{(unsigned long)__GFP_NOFAIL, "__GFP_NOFAIL"}, \
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..e08583c0498e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -149,6 +149,9 @@ config NO_BOOTMEM
config MEMORY_ISOLATION
bool

+config ARCH_WANTS_GFP_ENCRYPT
+ bool
+
#
# Only be set on architectures that have completely implemented memory hotplug
# feature. If you are not sure, don't touch it.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..8d049445b827 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1829,6 +1829,9 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
set_page_pfmemalloc(page);
else
clear_page_pfmemalloc(page);
+
+ if (gfp_flags & __GFP_ENCRYPT)
+ prep_encrypt_page(page, gfp_flags, order);
}

/*
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index ae11e4c3516a..1eeb2425cb01 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -641,6 +641,7 @@ static const struct {
{ "__GFP_ATOMIC", "_A" },
{ "__GFP_IO", "I" },
{ "__GFP_FS", "F" },
+ { "__GFP_ENCRYPT", "E" },
{ "__GFP_NOWARN", "NWR" },
{ "__GFP_RETRY_MAYFAIL", "R" },
{ "__GFP_NOFAIL", "NF" },
--
2.16.1


2018-03-05 16:32:03

by Kirill A. Shutemov

Subject: [RFC, PATCH 20/22] x86/mm: Implement anon_vma_encrypted() and anon_vma_keyid()

This patch implements helpers to check whether a given anon_vma is encrypted
and with which KeyID.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 14 ++++++++++++++
arch/x86/mm/mktme.c | 11 +++++++++++
2 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index c8f41837351a..56c7e9b14ab6 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -4,11 +4,20 @@
#include <linux/types.h>

struct vm_area_struct;
+struct anon_vma;

struct page *__alloc_zeroed_encrypted_user_highpage(gfp_t gfp,
struct vm_area_struct *vma, unsigned long vaddr);

#ifdef CONFIG_X86_INTEL_MKTME
+#define arch_anon_vma arch_anon_vma
+struct arch_anon_vma {
+ int keyid;
+};
+
+#define arch_anon_vma_init(anon_vma, vma) \
+ anon_vma->arch_anon_vma.keyid = vma_keyid(vma);
+
extern phys_addr_t mktme_keyid_mask;
extern int mktme_nr_keyids;
extern int mktme_keyid_shift;
@@ -19,6 +28,11 @@ bool vma_is_encrypted(struct vm_area_struct *vma);
#define vma_keyid vma_keyid
int vma_keyid(struct vm_area_struct *vma);

+#define anon_vma_encrypted anon_vma_encrypted
+bool anon_vma_encrypted(struct anon_vma *anon_vma);
+
+#define anon_vma_keyid anon_vma_keyid
+int anon_vma_keyid(struct anon_vma *anon_vma);
#else
#define mktme_keyid_mask ((phys_addr_t)0)
#define mktme_nr_keyids 0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index ef0eb1eb8d6e..69172aabc07c 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,4 +1,5 @@
#include <linux/mm.h>
+#include <linux/rmap.h>
#include <linux/highmem.h>
#include <asm/mktme.h>

@@ -28,6 +29,16 @@ int vma_keyid(struct vm_area_struct *vma)
return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
}

+bool anon_vma_encrypted(struct anon_vma *anon_vma)
+{
+ return anon_vma_keyid(anon_vma);
+}
+
+int anon_vma_keyid(struct anon_vma *anon_vma)
+{
+ return anon_vma->arch_anon_vma.keyid;
+}
+
void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order)
{
void *v = page_to_virt(page);
--
2.16.1


2018-03-05 16:32:23

by Kirill A. Shutemov

Subject: [RFC, PATCH 10/22] mm/shmem: Zero out unused vma fields in shmem_pseudo_vma_init()

shmem/tmpfs uses a pseudo VMA to allocate pages with the correct NUMA policy.

The pseudo VMA doesn't have vm_page_prot set. We are going to encode the
encryption KeyID in vm_page_prot, and having garbage there causes problems.

Zero out all unused fields in the pseudo VMA.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
mm/shmem.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 1907688b75ee..e0e87b6aad26 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1395,10 +1395,9 @@ static void shmem_pseudo_vma_init(struct vm_area_struct *vma,
struct shmem_inode_info *info, pgoff_t index)
{
/* Create a pseudo vma that just contains the policy */
- vma->vm_start = 0;
+ memset(vma, 0, sizeof(*vma));
/* Bias interleave by inode number to distribute better across nodes */
vma->vm_pgoff = index + info->vfs_inode.i_ino;
- vma->vm_ops = NULL;
vma->vm_policy = mpol_shared_policy_lookup(&info->policy, index);
}

--
2.16.1


2018-03-05 16:32:23

by Kirill A. Shutemov

Subject: [RFC, PATCH 02/22] x86/tme: Detect if TME and MKTME is activated by BIOS

The IA32_TME_ACTIVATE MSR (0x982) can be used to check whether the BIOS has
enabled TME and MKTME. It includes which encryption policy/algorithm is
selected for TME or available for MKTME. For MKTME, the MSR also enumerates
how many KeyIDs are available.

We would need to exclude KeyID bits from the physical address bits.
detect_tme() adjusts cpuinfo_x86::x86_phys_bits accordingly.

We have to do this even if we are not going to use the KeyID bits
ourselves. VM guests still have to know that these bits are not usable
for physical addresses.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 90 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 90 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index d19e903214b4..c770689490b5 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -503,6 +503,93 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
}
}

+#define MSR_IA32_TME_ACTIVATE 0x982
+
+/* Helpers to access TME_ACTIVATE MSR */
+#define TME_ACTIVATE_LOCKED(x) (x & 0x1)
+#define TME_ACTIVATE_ENABLED(x) (x & 0x2)
+
+#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */
+#define TME_ACTIVATE_POLICY_AES_XTS_128 0
+
+#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */
+
+#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */
+#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1
+
+/* Values for mktme_status (SW only construct) */
+#define MKTME_ENABLED 0
+#define MKTME_DISABLED 1
+#define MKTME_UNINITIALIZED 2
+static int mktme_status = MKTME_UNINITIALIZED;
+
+static void detect_tme(struct cpuinfo_x86 *c)
+{
+ u64 tme_activate, tme_policy, tme_crypto_algs;
+ int keyid_bits = 0, nr_keyids = 0;
+ static u64 tme_activate_cpu0 = 0;
+
+ rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
+
+ if (mktme_status != MKTME_UNINITIALIZED) {
+ if (tme_activate != tme_activate_cpu0) {
+ /* Broken BIOS? */
+ pr_err_once("x86/tme: configuation is inconsistent between CPUs\n");
+ pr_err_once("x86/tme: MKTME is not usable\n");
+ mktme_status = MKTME_DISABLED;
+
+ /* Proceed. We may need to exclude bits from x86_phys_bits. */
+ }
+ } else {
+ tme_activate_cpu0 = tme_activate;
+ }
+
+ if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
+ pr_info_once("x86/tme: not enabled by BIOS\n");
+ mktme_status = MKTME_DISABLED;
+ return;
+ }
+
+ if (mktme_status != MKTME_UNINITIALIZED)
+ goto detect_keyid_bits;
+
+ pr_info("x86/tme: enabled by BIOS\n");
+
+ tme_policy = TME_ACTIVATE_POLICY(tme_activate);
+ if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
+ pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
+
+ tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
+ if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
+ pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
+ tme_crypto_algs);
+ mktme_status = MKTME_DISABLED;
+ }
+detect_keyid_bits:
+ keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
+ nr_keyids = (1UL << keyid_bits) - 1;
+ if (nr_keyids) {
+ pr_info_once("x86/mktme: enabled by BIOS\n");
+ pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
+ } else {
+ pr_info_once("x86/mktme: disabled by BIOS\n");
+ }
+
+ if (mktme_status == MKTME_UNINITIALIZED) {
+ /* MKTME is usable */
+ mktme_status = MKTME_ENABLED;
+ }
+
+ /*
+ * Exclude KeyID bits from physical address bits.
+ *
+ * We have to do this even if we are not going to use KeyID bits
+ * ourself. VM guests still have to know that these bits are not usable
+ * for physical address.
+ */
+ c->x86_phys_bits -= keyid_bits;
+}
+
static void init_intel_energy_perf(struct cpuinfo_x86 *c)
{
u64 epb;
@@ -673,6 +760,9 @@ static void init_intel(struct cpuinfo_x86 *c)
if (cpu_has(c, X86_FEATURE_VMX))
detect_vmx_virtcap(c);

+ if (cpu_has(c, X86_FEATURE_TME))
+ detect_tme(c);
+
init_intel_energy_perf(c);

init_intel_misc_features(c);
--
2.16.1
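
To make the MSR layout concrete, here is a small sketch decoding a made-up
IA32_TME_ACTIVATE value with the macros added above (the value is an
assumption chosen purely for illustration, not taken from real hardware):

/* Illustration only: decode a hypothetical IA32_TME_ACTIVATE value */
static void __maybe_unused tme_activate_example(void)
{
	u64 val = 0x0001000400000003ULL;	/* assumed value */

	WARN_ON(!TME_ACTIVATE_LOCKED(val));		/* bit 0 is set */
	WARN_ON(!TME_ACTIVATE_ENABLED(val));		/* bit 1 is set */
	WARN_ON(TME_ACTIVATE_POLICY(val) !=
		TME_ACTIVATE_POLICY_AES_XTS_128);	/* bits 7:4 == 0 */
	WARN_ON(TME_ACTIVATE_KEYID_BITS(val) != 4);	/* 2^4 - 1 = 15 KeyIDs */
	WARN_ON(!(TME_ACTIVATE_CRYPTO_ALGS(val) &
		  TME_ACTIVATE_CRYPTO_AES_XTS_128));	/* bit 48 is set */
}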


2018-03-05 16:32:49

by Kirill A. Shutemov

Subject: [RFC, PATCH 07/22] x86/mm: Mask out KeyID bits from page table entry pfn

MKTME claims several upper bits of the physical address in a page table
entry to encode the KeyID. It effectively shrinks the number of bits
available for the physical address. We should exclude KeyID bits from
physical addresses.

For instance, if a CPU enumerates 52 physical address bits and the number of
bits claimed for the KeyID is 6, bits 51:46 must not be treated as part of
the physical address.

This patch adjusts __PHYSICAL_MASK during MKTME enumeration.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index c770689490b5..35436bbadd0b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -580,6 +580,30 @@ static void detect_tme(struct cpuinfo_x86 *c)
mktme_status = MKTME_ENABLED;
}

+#ifdef CONFIG_X86_INTEL_MKTME
+ if (mktme_status == MKTME_ENABLED && nr_keyids) {
+ /*
+ * Mask out bits claimed from KeyID from physical address mask.
+ *
+ * For instance, if a CPU enumerates 52 physical address bits
+ * and number of bits claimed for KeyID is 6, bits 51:46 of
+ * physical address is unusable.
+ */
+ phys_addr_t keyid_mask;
+
+ keyid_mask = 1ULL << c->x86_phys_bits;
+ keyid_mask -= 1ULL << (c->x86_phys_bits - keyid_bits);
+ physical_mask &= ~keyid_mask;
+ } else {
+ /*
+ * Reset __PHYSICAL_MASK.
+ * Maybe needed if there's inconsistent configuration
+ * between CPUs.
+ */
+ physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+ }
+#endif
+
/*
* Exclude KeyID bits from physical address bits.
*
--
2.16.1
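
As a worked example of the mask computation for the case from the commit
message (52 physical address bits, 6 KeyID bits), here is a sketch that
mirrors the logic added to detect_tme(); the function itself is illustrative
only:

/* Illustration only: compute the KeyID mask for an assumed configuration */
static phys_addr_t __maybe_unused example_keyid_mask(void)
{
	int x86_phys_bits = 52;		/* assumed CPU enumeration */
	int keyid_bits = 6;		/* assumed BIOS configuration */
	phys_addr_t mask;

	mask  = 1ULL << x86_phys_bits;			/* 1 << 52 */
	mask -= 1ULL << (x86_phys_bits - keyid_bits);	/* minus 1 << 46 */

	/* mask == 0x000fc00000000000: bits 51:46 hold the KeyID */
	return mask;
}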


2018-03-05 16:33:07

by Kirill A. Shutemov

Subject: [RFC, PATCH 11/22] mm: Use __GFP_ENCRYPT for pages in encrypted VMAs

Change the page allocation path to pass __GFP_ENCRYPT when allocating pages
for encrypted VMAs.

There are two different paths where __GFP_ENCRYPT has to be set: one for
kernels compiled with CONFIG_NUMA enabled and one for kernels without NUMA
support.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/gfp.h | 17 +++++++++++------
include/linux/mm.h | 7 +++++++
mm/mempolicy.c | 3 +++
3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 43a93ca11c3c..c2e6f99a7fc6 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -506,21 +506,26 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
struct vm_area_struct *vma, unsigned long addr,
int node, bool hugepage);
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
- alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
#else
#define alloc_pages(gfp_mask, order) \
alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
- alloc_pages(gfp_mask, order)
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
- alloc_pages(gfp_mask, order)
+
+static inline struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
+ struct vm_area_struct *vma, unsigned long addr,
+ int node, bool hugepage)
+{
+ if (vma_is_encrypted(vma))
+ gfp_mask |= __GFP_ENCRYPT;
+ return alloc_pages(gfp_mask, order);
+}
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
#define alloc_page_vma(gfp_mask, vma, addr) \
alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
#define alloc_page_vma_node(gfp_mask, vma, addr, node) \
alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+ alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)

extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6791eccdb740..bc7b32d0189b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1479,6 +1479,13 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma)
return !vma->vm_ops;
}

+#ifndef vma_is_encrypted
+static inline bool vma_is_encrypted(struct vm_area_struct *vma)
+{
+ return false;
+}
+#endif
+
#ifdef CONFIG_SHMEM
/*
* The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d879f1d8a44a..da989273de40 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1977,6 +1977,9 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
int preferred_nid;
nodemask_t *nmask;

+ if (vma_is_encrypted(vma))
+ gfp |= __GFP_ENCRYPT;
+
pol = get_vma_policy(vma, addr);

if (pol->mode == MPOL_INTERLEAVE) {
--
2.16.1


2018-03-05 16:33:30

by Kirill A. Shutemov

Subject: [RFC, PATCH 09/22] mm, rmap: Add arch-specific field into anon_vma

MKTME enabling requires a way to find out which encryption KeyID has to
be used to access a page. There's not enough space in struct page to
store this information.

As a way out, we can store it in the anon_vma of the page: all pages in the
same anon_vma tree will be encrypted with the same KeyID.

This patch adds an arch-specific field to anon_vma. On x86 it will be
used to store the KeyID.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/rmap.h | 6 ++++++
mm/rmap.c | 15 ++++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 988d176472df..54c7ea330827 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -12,6 +12,10 @@
#include <linux/memcontrol.h>
#include <linux/highmem.h>

+#ifndef arch_anon_vma
+struct arch_anon_vma {};
+#endif
+
/*
* The anon_vma heads a list of private "related" vmas, to scan if
* an anonymous page pointing to this anon_vma needs to be unmapped:
@@ -59,6 +63,8 @@ struct anon_vma {

/* Interval tree of private "related" vmas */
struct rb_root_cached rb_root;
+
+ struct arch_anon_vma arch_anon_vma;
};

/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 47db27f8049e..c0470a69a4c9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -74,7 +74,14 @@
static struct kmem_cache *anon_vma_cachep;
static struct kmem_cache *anon_vma_chain_cachep;

-static inline struct anon_vma *anon_vma_alloc(void)
+#ifndef arch_anon_vma_init
+static inline void arch_anon_vma_init(struct anon_vma *anon_vma,
+ struct vm_area_struct *vma)
+{
+}
+#endif
+
+static inline struct anon_vma *anon_vma_alloc(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma;

@@ -88,6 +95,8 @@ static inline struct anon_vma *anon_vma_alloc(void)
* from fork, the root will be reset to the parents anon_vma.
*/
anon_vma->root = anon_vma;
+
+ arch_anon_vma_init(anon_vma, vma);
}

return anon_vma;
@@ -186,7 +195,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
anon_vma = find_mergeable_anon_vma(vma);
allocated = NULL;
if (!anon_vma) {
- anon_vma = anon_vma_alloc();
+ anon_vma = anon_vma_alloc(vma);
if (unlikely(!anon_vma))
goto out_enomem_free_avc;
allocated = anon_vma;
@@ -337,7 +346,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
return 0;

/* Then add our own anon_vma. */
- anon_vma = anon_vma_alloc();
+ anon_vma = anon_vma_alloc(vma);
if (!anon_vma)
goto out_error;
avc = anon_vma_chain_alloc(GFP_KERNEL);
--
2.16.1


2018-03-05 16:33:32

by Kirill A. Shutemov

Subject: [RFC, PATCH 14/22] mm, khugepaged: Do not collapse pages in encrypted VMAs

Pages for encrypted VMAs have to be allocated in a special way:
we would need to propagate down not only the desired NUMA node but also
whether the page is encrypted.

This complicates the not-so-trivial routine of huge page allocation in
khugepaged even more. It also puts more pressure on the page allocator:
we cannot re-use pages allocated for an encrypted VMA to collapse
pages in an unencrypted one, or vice versa.

I think for now it is worth skipping encrypted VMAs. We can return
to this topic later.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
mm/khugepaged.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b7e2268dfc9a..601151678414 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -830,6 +830,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
return false;
if (is_vma_temporary_stack(vma))
return false;
+ if (vma_is_encrypted(vma))
+ return false;
return !(vma->vm_flags & VM_NO_KHUGEPAGED);
}

--
2.16.1


2018-03-05 16:33:41

by Kirill A. Shutemov

Subject: [RFC, PATCH 06/22] x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

AMD SME claims one bit of the physical address to indicate whether a
page is encrypted or not. To achieve that we clear out the bit from
__PHYSICAL_MASK.

The capability to adjust __PHYSICAL_MASK is required beyond AMD SME,
for instance for the upcoming Intel Multi-Key Total Memory Encryption.

Let's factor it out into a separate feature with its own Kconfig handle.

It also helps with the overhead of AMD SME. It saves more than 3k of .text
on defconfig + AMD_MEM_ENCRYPT:

add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)

We would need to return to this once we have infrastructure to patch
constants in code. That's a good candidate for it.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/Kconfig | 4 ++++
arch/x86/boot/compressed/kaslr_64.c | 3 +++
arch/x86/include/asm/page_types.h | 8 +++++++-
arch/x86/mm/mem_encrypt_identity.c | 3 +++
arch/x86/mm/pgtable.c | 5 +++++
5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bdfd503065d3..99aecb2caed3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -332,6 +332,9 @@ config ARCH_SUPPORTS_UPROBES
config FIX_EARLYCON_MEM
def_bool y

+config DYNAMIC_PHYSICAL_MASK
+ bool
+
config PGTABLE_LEVELS
int
default 5 if X86_5LEVEL
@@ -1513,6 +1516,7 @@ config ARCH_HAS_MEM_ENCRYPT
config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+ select DYNAMIC_PHYSICAL_MASK
---help---
Say yes to enable support for the encryption of system memory.
This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/boot/compressed/kaslr_64.c b/arch/x86/boot/compressed/kaslr_64.c
index b5e5e02f8cde..4318ac0af815 100644
--- a/arch/x86/boot/compressed/kaslr_64.c
+++ b/arch/x86/boot/compressed/kaslr_64.c
@@ -16,6 +16,9 @@
#define __pa(x) ((unsigned long)(x))
#define __va(x) ((void *)((unsigned long)(x)))

+/* No need for an adjustable __PHYSICAL_MASK during the decompression phase */
+#undef CONFIG_DYNAMIC_PHYSICAL_MASK
+
/*
* The pgtable.h and mm/ident_map.c includes make use of the SME related
* information which is not used in the compressed image support. Un-define
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 1e53560a84bb..c85e15010f48 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -17,7 +17,6 @@
#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))

-#define __PHYSICAL_MASK ((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))
#define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1)

/* Cast *PAGE_MASK to a signed type so that it is sign-extended if
@@ -55,6 +54,13 @@

#ifndef __ASSEMBLY__

+#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
+extern phys_addr_t physical_mask;
+#define __PHYSICAL_MASK physical_mask
+#else
+#define __PHYSICAL_MASK ((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#endif
+
extern int devmem_is_allowed(unsigned long pagenr);

extern unsigned long max_low_pfn_mapped;
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 1b2197d13832..7ae36868aed2 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -527,6 +527,7 @@ void __init sme_enable(struct boot_params *bp)
/* SEV state cannot be controlled by a command line option */
sme_me_mask = me_mask;
sev_enabled = true;
+ physical_mask &= ~sme_me_mask;
return;
}

@@ -561,4 +562,6 @@ void __init sme_enable(struct boot_params *bp)
sme_me_mask = 0;
else
sme_me_mask = active_by_default ? me_mask : 0;
+
+ physical_mask &= ~sme_me_mask;
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 004abf9ebf12..a4dfe85f2fd8 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -7,6 +7,11 @@
#include <asm/fixmap.h>
#include <asm/mtrr.h>

+#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
+phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+EXPORT_SYMBOL(physical_mask);
+#endif
+
#define PGALLOC_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO)

#ifdef CONFIG_HIGHPTE
--
2.16.1


2018-03-05 16:33:58

by Kirill A. Shutemov

Subject: [RFC, PATCH 05/22] x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf

MKTME_KEY_PROG allows manipulating MKTME keys in the CPU.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/intel_pconfig.h | 50 ++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/arch/x86/include/asm/intel_pconfig.h b/arch/x86/include/asm/intel_pconfig.h
index fb7a37c3798b..3cb002b1d0f9 100644
--- a/arch/x86/include/asm/intel_pconfig.h
+++ b/arch/x86/include/asm/intel_pconfig.h
@@ -12,4 +12,54 @@ enum pconfig_target {

int pconfig_target_supported(enum pconfig_target target);

+enum pconfig_leaf {
+ MKTME_KEY_PROGRAM = 0,
+ PCONFIG_LEAF_INVALID,
+};
+
+#define PCONFIG ".byte 0x0f, 0x01, 0xc5"
+
+/* Defines and structure for MKTME_KEY_PROGRAM of PCONFIG instruction */
+
+/* mktme_key_program::keyid_ctrl COMMAND, bits [7:0] */
+#define MKTME_KEYID_SET_KEY_DIRECT 0
+#define MKTME_KEYID_SET_KEY_RANDOM 1
+#define MKTME_KEYID_CLEAR_KEY 2
+#define MKTME_KEYID_NO_ENCRYPT 3
+
+/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
+#define MKTME_AES_XTS_128 (1 << 8)
+
+/* Return codes from the PCONFIG MKTME_KEY_PROGRAM */
+#define MKTME_PROG_SUCCESS 0
+#define MKTME_INVALID_PROG_CMD 1
+#define MKTME_ENTROPY_ERROR 2
+#define MKTME_INVALID_KEYID 3
+#define MKTME_INVALID_ENC_ALG 4
+#define MKTME_DEVICE_BUSY 5
+
+/* Hardware requires the structure to be 256-byte aligned. Otherwise #GP(0). */
+struct mktme_key_program {
+ u16 keyid;
+ u32 keyid_ctrl;
+ u8 __rsvd[58];
+ u8 key_field_1[64];
+ u8 key_field_2[64];
+} __packed __aligned(256);
+
+static inline int mktme_key_program(struct mktme_key_program *key_program)
+{
+ unsigned long rax = MKTME_KEY_PROGRAM;
+
+ if (!pconfig_target_supported(MKTME_TARGET))
+ return -ENXIO;
+
+ asm volatile(PCONFIG
+ : "=a" (rax), "=b" (key_program)
+ : "0" (rax), "1" (key_program)
+ : "memory", "cc");
+
+ return rax;
+}
+
#endif /* _ASM_X86_INTEL_PCONFIG_H */
--
2.16.1
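
A hedged sketch of how the helper might be used to program a CPU-generated key
for a KeyID. The caller, the allocation strategy and the error handling below
are assumptions and not part of the patch; selecting the right package/CPU to
run PCONFIG on is also not handled here:

#include <linux/slab.h>
#include <asm/intel_pconfig.h>

/* Illustration only: ask the CPU to generate a random key for a KeyID */
static int __maybe_unused example_program_random_key(u16 keyid)
{
	struct mktme_key_program *prog;
	int ret;

	/*
	 * The structure must be 256-byte aligned; this sketch assumes a
	 * 256-byte kmalloc allocation provides that alignment.
	 */
	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
	if (!prog)
		return -ENOMEM;

	prog->keyid = keyid;
	prog->keyid_ctrl = MKTME_KEYID_SET_KEY_RANDOM | MKTME_AES_XTS_128;

	ret = mktme_key_program(prog);
	kfree(prog);

	return ret == MKTME_PROG_SUCCESS ? 0 : -EIO;
}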


2018-03-05 16:34:34

by Kirill A. Shutemov

Subject: [RFC, PATCH 13/22] mm, rmap: Free encrypted pages once mapcount drops to zero

Freeing encrypted pages may require special treatment, such as flushing the
cache to avoid aliasing.

Anonymous pages cannot be mapped back once the last mapcount is gone.
That's a good place to add a hook to free an encrypted page. At a later
point we may not have a valid anon_vma around to get the KeyID.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/mm.h | 1 +
mm/rmap.c | 34 ++++++++++++++++++++++++++++++++--
2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7a4285f09c99..7ab5e39e3195 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1981,6 +1981,7 @@ extern void mem_init_print_info(const char *str);
extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);

extern void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order);
+extern void free_encrypt_page(struct page *page, int keyid, unsigned int order);

/* Free the reserved page into the buddy system, so it gets managed. */
static inline void __free_reserved_page(struct page *page)
diff --git a/mm/rmap.c b/mm/rmap.c
index c0470a69a4c9..4bff992fc106 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -81,6 +81,21 @@ static inline void arch_anon_vma_init(struct anon_vma *anon_vma,
}
#endif

+#ifndef anon_vma_encrypted
+static inline bool anon_vma_encrypted(struct anon_vma *anon_vma)
+{
+ return false;
+}
+#endif
+
+#ifndef anon_vma_keyid
+static inline int anon_vma_keyid(struct anon_vma *anon_vma)
+{
+ BUILD_BUG();
+ return 0;
+}
+#endif
+
static inline struct anon_vma *anon_vma_alloc(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma;
@@ -1258,6 +1273,7 @@ static void page_remove_file_rmap(struct page *page, bool compound)

static void page_remove_anon_compound_rmap(struct page *page)
{
+ struct anon_vma *anon_vma;
int i, nr;

if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
@@ -1292,6 +1308,12 @@ static void page_remove_anon_compound_rmap(struct page *page)
__mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr);
deferred_split_huge_page(page);
}
+
+ anon_vma = page_anon_vma(page);
+ if (anon_vma_encrypted(anon_vma)) {
+ int keyid = anon_vma_keyid(anon_vma);
+ free_encrypt_page(page, keyid, compound_order(page));
+ }
}

/**
@@ -1303,6 +1325,9 @@ static void page_remove_anon_compound_rmap(struct page *page)
*/
void page_remove_rmap(struct page *page, bool compound)
{
+ struct page *head;
+ struct anon_vma *anon_vma;
+
if (!PageAnon(page))
return page_remove_file_rmap(page, compound);

@@ -1323,8 +1348,13 @@ void page_remove_rmap(struct page *page, bool compound)
if (unlikely(PageMlocked(page)))
clear_page_mlock(page);

- if (PageTransCompound(page))
- deferred_split_huge_page(compound_head(page));
+ head = compound_head(page);
+ if (PageTransHuge(head))
+ deferred_split_huge_page(head);
+
+ anon_vma = page_anon_vma(head);
+ if (anon_vma_encrypted(anon_vma))
+ free_encrypt_page(page, anon_vma_keyid(anon_vma), 0);

/*
* It would be tidy to reset the PageAnon mapping here,
--
2.16.1


2018-03-05 16:34:39

by Kirill A. Shutemov

Subject: [RFC, PATCH 01/22] x86/cpufeatures: Add Intel Total Memory Encryption cpufeature

CPUID.0x7.0x0:ECX[13] indicates whether the CPU supports Intel Total Memory
Encryption.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3508318b5d23..d5f42a303e74 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -317,6 +317,7 @@
#define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */
#define X86_FEATURE_AVX512_VNNI (16*32+11) /* Vector Neural Network Instructions */
#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */
+#define X86_FEATURE_TME (16*32+13) /* Intel Total Memory Encryption */
#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */
#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */
--
2.16.1


2018-03-05 17:10:35

by Dave Hansen

Subject: Re: [RFC, PATCH 21/22] x86/mm: Introduce page_keyid() and page_encrypted()

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> +static inline bool page_encrypted(struct page *page)
> +{
> + /* All pages with non-zero KeyID are encrypted */
> + return page_keyid(page) != 0;
> +}

Is this true? I thought there was a KEYID_NO_ENCRYPT "Do not encrypt
memory when this KeyID is in use." Is that really only limited to key 0?

2018-03-05 18:33:56

by Christoph Hellwig

Subject: Re: [RFC, PATCH 00/22] Partial MKTME enabling

On Mon, Mar 05, 2018 at 07:25:48PM +0300, Kirill A. Shutemov wrote:
> Hi everybody,
>
> Here's updated version of my patchset that brings support of MKTME.

It would really help if you'd explain what "MKTME" is..

2018-03-05 19:01:21

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> +void free_encrypt_page(struct page *page, int keyid, unsigned int order)
> +{
> + int i;
> + void *v;
> +
> + for (i = 0; i < (1 << order); i++) {
> + v = kmap_atomic_keyid(page, keyid + i);
> + /* See comment in prep_encrypt_page() */
> + clflush_cache_range(v, PAGE_SIZE);
> + kunmap_atomic(v);
> + }
> +}

Did you miss adding the call sites for this?


2018-03-05 19:05:06

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 18/22] x86/mm: Handle allocation of encrypted pages

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> -#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
> - alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
> #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
> +#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
> +({ \
> + struct page *page; \
> + gfp_t gfp = movableflags | GFP_HIGHUSER; \
> + if (vma_is_encrypted(vma)) \
> + page = __alloc_zeroed_encrypted_user_highpage(gfp, vma, vaddr); \
> + else \
> + page = alloc_page_vma(gfp | __GFP_ZERO, vma, vaddr); \
> + page; \
> +})

This is pretty darn ugly and also adds a big old branch into the hottest
path in the page allocator.

It's also really odd that you strip __GFP_ZERO and then go ahead and
zero the encrypted page unconditionally. It really makes me wonder if
this is the right spot to be doing this.

Can we not, for instance do it inside alloc_page_vma()?

2018-03-05 19:08:04

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC, PATCH 00/22] Partial MKTME enabling

On Mon, Mar 05, 2018 at 10:30:50AM -0800, Christoph Hellwig wrote:
> On Mon, Mar 05, 2018 at 07:25:48PM +0300, Kirill A. Shutemov wrote:
> > Hi everybody,
> >
> > Here's updated version of my patchset that brings support of MKTME.
>
> It would really help if you'd explain what "MKTME" is..

You needed to keep reading, to below the -------------- line.

I agree though, that should have been up top.

2018-03-05 19:08:21

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> +void free_encrypt_page(struct page *page, int keyid, unsigned int order)
> +{
> + int i;
> + void *v;
> +
> + for (i = 0; i < (1 << order); i++) {
> + v = kmap_atomic_keyid(page, keyid + i);
> + /* See comment in prep_encrypt_page() */
> + clflush_cache_range(v, PAGE_SIZE);
> + kunmap_atomic(v);
> + }
> +}

Have you measured how slow this is?

It's an optimization, but can we find a way to only do this dance when
we *actually* change the keyid? Right now, we're doing mapping at alloc
and free, clflushing at free and zeroing at alloc. Let's say somebody does:

ptr = malloc(PAGE_SIZE);
*ptr = foo;
free(ptr);

ptr = malloc(PAGE_SIZE);
*ptr = bar;
free(ptr);

And let's say ptr is in encrypted memory and that we actually munmap()
at free(). We can theoretically skip the clflush, right?

2018-03-05 19:09:13

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 18/22] x86/mm: Handle allocation of encrypted pages

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> kmap_atomic_keyid() would map the page with the specified KeyID.
> For now it's dummy implementation that would be replaced later.

I think you need to explain the tradeoffs here. We could just change
the linear map around, but you don't. Why?

2018-03-05 19:10:53

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 16/22] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> + * It includes full range of PFN bits regardless if they were claimed for KeyID
> + * or not: we want to preserve KeyID on pte_modify() and pgprot_modify().
> */
> -#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
> +#define PTE_PFN_MASK_MAX \
> + (((signed long)PAGE_MASK) & ((1UL << __PHYSICAL_MASK_SHIFT) - 1))
> +#define _PAGE_CHG_MASK (PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT | \
> _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
> _PAGE_SOFT_DIRTY)

Is there a way to make this:

#define _PAGE_CHG_MASK (PTE_PFN_MASK | PTE_KEY_MASK...? | _PAGE_PCD |

That would be a lot more understandable.

2018-03-05 19:13:26

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/22] mm, rmap: Free encrypted pages once mapcount drops to zero

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> extern void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order);
> +extern void free_encrypt_page(struct page *page, int keyid, unsigned int order);

The grammar here is weird, I think.

Why not free_encrypted_page()?

2018-03-05 19:16:03

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/22] mm, rmap: Free encrypted pages once mapcount drops to zero

On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> @@ -1292,6 +1308,12 @@ static void page_remove_anon_compound_rmap(struct page *page)
> __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr);
> deferred_split_huge_page(page);
> }
> +
> + anon_vma = page_anon_vma(page);
> + if (anon_vma_encrypted(anon_vma)) {
> + int keyid = anon_vma_keyid(anon_vma);
> + free_encrypt_page(page, keyid, compound_order(page));
> + }
> }

It's not covered in the description and I'm too lazy to dig into it, so:
Without this code, where do they get freed? Why does it not cause any
problems to free them here?

2018-03-06 08:19:41

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/22] mm, rmap: Free encrypted pages once mapcount drops to zero

On Mon, Mar 05, 2018 at 11:12:15AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > extern void prep_encrypt_page(struct page *page, gfp_t gfp, unsigned int order);
> > +extern void free_encrypt_page(struct page *page, int keyid, unsigned int order);
>
> The grammar here is weird, I think.
>
> Why not free_encrypted_page()?

Okay, I'll fix this.

--
Kirill A. Shutemov

2018-03-06 08:29:09

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/22] mm, rmap: Free encrypted pages once mapcount drops to zero

On Mon, Mar 05, 2018 at 11:13:36AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > @@ -1292,6 +1308,12 @@ static void page_remove_anon_compound_rmap(struct page *page)
> > __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr);
> > deferred_split_huge_page(page);
> > }
> > +
> > + anon_vma = page_anon_vma(page);
> > + if (anon_vma_encrypted(anon_vma)) {
> > + int keyid = anon_vma_keyid(anon_vma);
> > + free_encrypt_page(page, keyid, compound_order(page));
> > + }
> > }
>
> It's not covered in the description and I'm too lazy to dig into it, so:
> Without this code, where do they get freed? Why does it not cause any
> problems to free them here?

It's the only place where we get it "freed". "Freeing" is not the best
terminology here, but I failed to come up with something better.
We prepare the encrypted page for being freed: flush the cache in the
MKTME case.

The page itself gets freed later in the usual manner: once its refcount
drops to zero. The problem is that we may no longer have a valid anon_vma
around by that point, so we have to do the "freeing" here, when the
mapcount drops to zero.

For anonymous memory, once the mapcount has dropped to zero there's no way
the page will get mapped back into userspace.

The kernel will still be able to access the page with kmap() and I will
need to be very careful to get that right wrt cache management.

I'll update the description.

--
Kirill A. Shutemov
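
To make the ordering concrete, here is a simplified sketch of the lifecycle
being described (illustration only; the function below is hypothetical and
not part of the series):

/* Where the two "free" steps happen for an encrypted anonymous page. */
static void unmap_one_encrypted_page(struct page *page)
{
	/*
	 * Mapcount drops to zero here. page_remove_rmap() can still look
	 * up the anon_vma, so this is where free_encrypt_page() flushes
	 * the caches for the page's KeyID.
	 */
	page_remove_rmap(page, false);

	/*
	 * The refcount drops to zero later, possibly much later. The page
	 * then goes back to the buddy allocator in the usual way; no
	 * anon_vma is needed at that point.
	 */
	put_page(page);
}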

2018-03-06 08:31:42

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 16/22] x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()

On Mon, Mar 05, 2018 at 11:09:23AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > + * It includes full range of PFN bits regardless if they were claimed for KeyID
> > + * or not: we want to preserve KeyID on pte_modify() and pgprot_modify().
> > */
> > -#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
> > +#define PTE_PFN_MASK_MAX \
> > + (((signed long)PAGE_MASK) & ((1UL << __PHYSICAL_MASK_SHIFT) - 1))
> > +#define _PAGE_CHG_MASK (PTE_PFN_MASK_MAX | _PAGE_PCD | _PAGE_PWT | \
> > _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
> > _PAGE_SOFT_DIRTY)
>
> Is there a way to make this:
>
> #define _PAGE_CHG_MASK (PTE_PFN_MASK | PTE_KEY_MASK...? | _PAGE_PCD |
>
> That would be a lot more understandable.

Yes, it would.

But it means we will have *two* variables referenced from _PAGE_CHG_MASK:
one for PTE_PFN_MASK and one for PTE_KEY_MASK as both of them are dynamic.

With this patch we would get rid of both of them.

I'll update the description.

--
Kirill A. Shutemov

2018-03-06 08:37:17

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 18/22] x86/mm: Handle allocation of encrypted pages

On Mon, Mar 05, 2018 at 11:03:55AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > -#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
> > - alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
> > #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
> > +#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
> > +({ \
> > + struct page *page; \
> > + gfp_t gfp = movableflags | GFP_HIGHUSER; \
> > + if (vma_is_encrypted(vma)) \
> > + page = __alloc_zeroed_encrypted_user_highpage(gfp, vma, vaddr); \
> > + else \
> > + page = alloc_page_vma(gfp | __GFP_ZERO, vma, vaddr); \
> > + page; \
> > +})
>
> This is pretty darn ugly and also adds a big old branch into the hottest
> path in the page allocator.
>
> It's also really odd that you strip __GFP_ZERO and then go ahead and
> zero the encrypted page unconditionally. It really makes me wonder if
> this is the right spot to be doing this.
>
> Can we not, for instance do it inside alloc_page_vma()?

Yes we can.

It would require a substantial change to the page allocation path for
CONFIG_NUMA=n, as we don't pass the vma down at the moment. And without
the vma we don't have a way to know which KeyID to use.

I will explore how it would fit together.

--
Kirill A. Shutemov
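
Purely as a sketch of that direction (hypothetical, not part of this
series): the branch could move into a vma-aware allocation wrapper so the
per-arch macro stays trivial. This mirrors the logic of the macro quoted
above, just expressed as a function:

static struct page *alloc_zeroed_user_page_vma(gfp_t gfp,
					       struct vm_area_struct *vma,
					       unsigned long vaddr)
{
	/* Zeroing of encrypted pages must go through the KeyID mapping,
	 * so __GFP_ZERO is dropped for that case, as in the macro. */
	if (vma_is_encrypted(vma))
		return __alloc_zeroed_encrypted_user_highpage(gfp, vma, vaddr);

	return alloc_page_vma(gfp | __GFP_ZERO, vma, vaddr);
}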

2018-03-06 08:38:18

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 18/22] x86/mm: Handle allocation of encrypted pages

On Mon, Mar 05, 2018 at 11:07:55AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > kmap_atomic_keyid() would map the page with the specified KeyID.
> > For now it's dummy implementation that would be replaced later.
>
> I think you need to explain the tradeoffs here. We could just change
> the linear map around, but you don't. Why?

I don't think we had settled on an implementation at this point: kmap() is
only an interface and doesn't imply what it actually does underneath. I
*can* change the linear mapping instead, if we choose to go that way.

I will explain the kmap() implementation in the patches that actually
implement it.

--
Kirill A. Shutemov

2018-03-06 08:40:33

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On Mon, Mar 05, 2018 at 11:00:00AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > +void free_encrypt_page(struct page *page, int keyid, unsigned int order)
> > +{
> > + int i;
> > + void *v;
> > +
> > + for (i = 0; i < (1 << order); i++) {
> > + v = kmap_atomic_keyid(page, keyid + i);
> > + /* See comment in prep_encrypt_page() */
> > + clflush_cache_range(v, PAGE_SIZE);
> > + kunmap_atomic(v);
> > + }
> > +}
>
> Did you miss adding the call sites for this?

No. It is in "mm, rmap: Free encrypted pages once mapcount drops to zero".
But the call is optimized out since anon_vma_encrypted() is always false
so far.

--
Kirill A. Shutemov

2018-03-06 08:55:51

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On Mon, Mar 05, 2018 at 11:07:16AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > +void free_encrypt_page(struct page *page, int keyid, unsigned int order)
> > +{
> > + int i;
> > + void *v;
> > +
> > + for (i = 0; i < (1 << order); i++) {
> > + v = kmap_atomic_keyid(page, keyid + i);
> > + /* See comment in prep_encrypt_page() */
> > + clflush_cache_range(v, PAGE_SIZE);
> > + kunmap_atomic(v);
> > + }
> > +}
>
> Have you measured how slow this is?

No, I have not.

> It's an optimization, but can we find a way to only do this dance when
> we *actually* change the keyid? Right now, we're doing mapping at alloc
> and free, clflushing at free and zeroing at alloc. Let's say somebody does:
>
> ptr = malloc(PAGE_SIZE);
> *ptr = foo;
> free(ptr);
>
> ptr = malloc(PAGE_SIZE);
> *ptr = bar;
> free(ptr);
>
> And let's say ptr is in encrypted memory and that we actually munmap()
> at free(). We can theoretically skip the clflush, right?

Yes we can. Theoretically. We would need to find a way to keep the KeyID
around after the page is removed from the rmap. That's not trivial as far
as I can see.

I will look into optimizations after I've got the functionality in place.

--
Kirill A. Shutemov

2018-03-06 08:59:14

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 21/22] x86/mm: Introduce page_keyid() and page_encrypted()

On Mon, Mar 05, 2018 at 09:08:53AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > +static inline bool page_encrypted(struct page *page)
> > +{
> > + /* All pages with non-zero KeyID are encrypted */
> > + return page_keyid(page) != 0;
> > +}
>
> Is this true? I thought there was a KEYID_NO_ENCRYPT "Do not encrypt
> memory when this KeyID is in use." Is that really only limited to key 0.

Well, it depends on what we mean by "encrypted". For memory management
purposes we care whether the page is encrypted with a KeyID different from
the default one. All pages with a non-default KeyID are treated the same
by memory management.

So far we don't have users for the interface. We may reconsider
the meaning once we get users.

--
Kirill A. Shutemov
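
For reference, one plausible shape of the two helpers under discussion,
assuming (as elsewhere in this series) that the KeyID for anonymous memory
comes from the anon_vma; page_encrypted() is as in the patch, while the
page_keyid() body below is only a sketch:

static inline int page_keyid(struct page *page)
{
	struct anon_vma *anon_vma = page_anon_vma(page);

	/* Default KeyID 0: the page is accessible through the direct map */
	if (!anon_vma || !anon_vma_encrypted(anon_vma))
		return 0;

	return anon_vma_keyid(anon_vma);
}

static inline bool page_encrypted(struct page *page)
{
	/* All pages with non-zero KeyID are encrypted */
	return page_keyid(page) != 0;
}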

2018-03-06 08:59:54

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 00/22] Partial MKTME enabling

On Mon, Mar 05, 2018 at 11:05:50AM -0800, Matthew Wilcox wrote:
> On Mon, Mar 05, 2018 at 10:30:50AM -0800, Christoph Hellwig wrote:
> > On Mon, Mar 05, 2018 at 07:25:48PM +0300, Kirill A. Shutemov wrote:
> > > Hi everybody,
> > >
> > > Here's updated version of my patchset that brings support of MKTME.
> >
> > It would really help if you'd explain what "MKTME" is..
>
> You needed to keep reading, to below the -------------- line.
>
> I agree though, that should have been up top.

My bad. Will update it for future postings.

--
Kirill A. Shutemov

2018-03-06 13:54:34

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On 03/06/2018 12:54 AM, Kirill A. Shutemov wrote:
>> Have you measured how slow this is?
> No, I have not.

It would be handy to do this. I *think* you can do it on normal
hardware, even if it does not have "real" support for memory encryption.
Just don't set the encryption bits in the PTEs but go through all the
motions of cache flushing.

I think that will help tell us whether this is a really specialized
thing a la hugetlbfs or whether it's something we really want to support
as a first-class citizen in the VM.


2018-03-06 14:11:18

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On Tue, Mar 06, 2018 at 05:52:44AM -0800, Dave Hansen wrote:
> On 03/06/2018 12:54 AM, Kirill A. Shutemov wrote:
> >> Have you measured how slow this is?
> > No, I have not.
>
> It would be handy to do this. I *think* you can do it on normal
> hardware, even if it does not have "real" support for memory encryption.
> Just don't set the encryption bits in the PTEs but go through all the
> motions of cache flushing.

Yes, allocation/freeing and KeyID interfaces can be tested without MKTME
support in hardware. I did most of my testing this way.

> I think that will help tell us whether this is a really specialized
> thing a la hugetlbfs or whether it's something we really want to support
> as a first-class citizen in the VM.

I will benchmark this. But not right now.

--
Kirill A. Shutemov

2018-03-06 14:58:37

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 21/22] x86/mm: Introduce page_keyid() and page_encrypted()

On 03/06/2018 12:57 AM, Kirill A. Shutemov wrote:
> On Mon, Mar 05, 2018 at 09:08:53AM -0800, Dave Hansen wrote:
>> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
>>> +static inline bool page_encrypted(struct page *page)
>>> +{
>>> + /* All pages with non-zero KeyID are encrypted */
>>> + return page_keyid(page) != 0;
>>> +}
>>
>> Is this true? I thought there was a KEYID_NO_ENCRYPT "Do not encrypt
>> memory when this KeyID is in use." Is that really only limited to key 0.
>
> Well, it depends on what we mean by "encrypted". For memory management
> pruposes we care if the page is encrypted with KeyID different from
> default one. All pages with non-default KeyID threated the same by memory
> management.

Doesn't it really mean "am I able to use the direct map to get this
page's contents?"

2018-03-06 15:01:02

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/22] mm, rmap: Free encrypted pages once mapcount drops to zero

On 03/06/2018 12:27 AM, Kirill A. Shutemov wrote:
> + anon_vma = page_anon_vma(page);
> + if (anon_vma_encrypted(anon_vma)) {
> + int keyid = anon_vma_keyid(anon_vma);
> + free_encrypt_page(page, keyid, compound_order(page));
> + }
> }

So, just double-checking: free_encrypt_page() neither "frees and
encrypts the page" nor "frees an encrypted page"?

That seems a bit suboptimal. :)

2018-03-06 15:01:40

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 21/22] x86/mm: Introduce page_keyid() and page_encrypted()

On Tue, Mar 06, 2018 at 02:56:08PM +0000, Dave Hansen wrote:
> On 03/06/2018 12:57 AM, Kirill A. Shutemov wrote:
> > On Mon, Mar 05, 2018 at 09:08:53AM -0800, Dave Hansen wrote:
> >> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> >>> +static inline bool page_encrypted(struct page *page)
> >>> +{
> >>> + /* All pages with non-zero KeyID are encrypted */
> >>> + return page_keyid(page) != 0;
> >>> +}
> >>
> >> Is this true? I thought there was a KEYID_NO_ENCRYPT "Do not encrypt
> >> memory when this KeyID is in use." Is that really only limited to key 0.
> >
> > Well, it depends on what we mean by "encrypted". For memory management
> > pruposes we care if the page is encrypted with KeyID different from
> > default one. All pages with non-default KeyID threated the same by memory
> > management.
>
> Doesn't it really mean "am I able to use the direct map to get this
> page's contents?"

Yes.

Any proposal for better helper name?

--
Kirill A. Shutemov

2018-03-06 15:04:20

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/22] mm, rmap: Free encrypted pages once mapcount drops to zero

On Tue, Mar 06, 2018 at 06:59:04AM -0800, Dave Hansen wrote:
> On 03/06/2018 12:27 AM, Kirill A. Shutemov wrote:
> > + anon_vma = page_anon_vma(page);
> > + if (anon_vma_encrypted(anon_vma)) {
> > + int keyid = anon_vma_keyid(anon_vma);
> > + free_encrypt_page(page, keyid, compound_order(page));
> > + }
> > }
>
> So, just double-checking: free_encrypt_page() neither "frees and
> encrypts the page"" nor "free an encrypted page"?
>
> That seems a bit suboptimal. :)

Yes, I'm bad with words :)

--
Kirill A. Shutemov

2018-03-06 15:05:57

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC, PATCH 21/22] x86/mm: Introduce page_keyid() and page_encrypted()

On 03/06/2018 06:58 AM, Kirill A. Shutemov wrote:
>> Doesn't it really mean "am I able to use the direct map to get this
>> page's contents?"
> Yes.
>
> Any proposal for better helper name?

Let's see how it gets used.

Subject: [tip:x86/mm] x86/cpufeatures: Add Intel Total Memory Encryption cpufeature

Commit-ID: 1da961d72ab0cfbe8b7c26cba731dc2bb6b9494b
Gitweb: https://git.kernel.org/tip/1da961d72ab0cfbe8b7c26cba731dc2bb6b9494b
Author: Kirill A. Shutemov <[email protected]>
AuthorDate: Mon, 5 Mar 2018 19:25:49 +0300
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 12 Mar 2018 12:09:53 +0100

x86/cpufeatures: Add Intel Total Memory Encryption cpufeature

CPUID.0x7.0x0:ECX[13] indicates whether CPU supports Intel Total Memory
Encryption.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kai Huang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f41079da38c5..16898eb813f5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -316,6 +316,7 @@
#define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */
#define X86_FEATURE_AVX512_VNNI (16*32+11) /* Vector Neural Network Instructions */
#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */
+#define X86_FEATURE_TME (16*32+13) /* Intel Total Memory Encryption */
#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */
#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */

Subject: [tip:x86/mm] x86/cpufeatures: Add Intel PCONFIG cpufeature

Commit-ID: 7958b2246fadf54b7ff820a2a5a2c5ca1554716f
Gitweb: https://git.kernel.org/tip/7958b2246fadf54b7ff820a2a5a2c5ca1554716f
Author: Kirill A. Shutemov <[email protected]>
AuthorDate: Mon, 5 Mar 2018 19:25:51 +0300
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 12 Mar 2018 12:09:53 +0100

x86/cpufeatures: Add Intel PCONFIG cpufeature

CPUID.0x7.0x0:EDX[18] indicates whether the Intel CPU supports the PCONFIG instruction.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kai Huang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 16898eb813f5..d554c11e01ff 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -329,6 +329,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
#define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */
#define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */

Subject: [tip:x86/mm] x86/pconfig: Detect PCONFIG targets

Commit-ID: be7825c19b4866ddc7b1431740b69ede2eeb93c1
Gitweb: https://git.kernel.org/tip/be7825c19b4866ddc7b1431740b69ede2eeb93c1
Author: Kirill A. Shutemov <[email protected]>
AuthorDate: Mon, 5 Mar 2018 19:25:52 +0300
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 12 Mar 2018 12:10:54 +0100

x86/pconfig: Detect PCONFIG targets

Intel PCONFIG targets are enumerated via the new CPUID leaf 0x1b. This patch
detects all supported targets of PCONFIG and implements a helper to check
whether a given target is supported.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kai Huang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/intel_pconfig.h | 15 +++++++
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/intel_pconfig.c | 82 ++++++++++++++++++++++++++++++++++++
3 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_pconfig.h b/arch/x86/include/asm/intel_pconfig.h
new file mode 100644
index 000000000000..fb7a37c3798b
--- /dev/null
+++ b/arch/x86/include/asm/intel_pconfig.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_INTEL_PCONFIG_H
+#define _ASM_X86_INTEL_PCONFIG_H
+
+#include <asm/asm.h>
+#include <asm/processor.h>
+
+enum pconfig_target {
+ INVALID_TARGET = 0,
+ MKTME_TARGET = 1,
+ PCONFIG_TARGET_NR
+};
+
+int pconfig_target_supported(enum pconfig_target target);
+
+#endif /* _ASM_X86_INTEL_PCONFIG_H */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 570e8bb1f386..a66229f51b12 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -28,7 +28,7 @@ obj-y += cpuid-deps.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o

-obj-$(CONFIG_CPU_SUP_INTEL) += intel.o
+obj-$(CONFIG_CPU_SUP_INTEL) += intel.o intel_pconfig.o
obj-$(CONFIG_CPU_SUP_AMD) += amd.o
obj-$(CONFIG_CPU_SUP_CYRIX_32) += cyrix.o
obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
diff --git a/arch/x86/kernel/cpu/intel_pconfig.c b/arch/x86/kernel/cpu/intel_pconfig.c
new file mode 100644
index 000000000000..0771a905b286
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_pconfig.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel PCONFIG instruction support.
+ *
+ * Copyright (C) 2017 Intel Corporation
+ *
+ * Author:
+ * Kirill A. Shutemov <[email protected]>
+ */
+
+#include <asm/cpufeature.h>
+#include <asm/intel_pconfig.h>
+
+#define PCONFIG_CPUID 0x1b
+
+#define PCONFIG_CPUID_SUBLEAF_MASK ((1 << 12) - 1)
+
+/* Subleaf type (EAX) for PCONFIG CPUID leaf (0x1B) */
+enum {
+ PCONFIG_CPUID_SUBLEAF_INVALID = 0,
+ PCONFIG_CPUID_SUBLEAF_TARGETID = 1,
+};
+
+/* Bitmask of supported targets */
+static u64 targets_supported __read_mostly;
+
+int pconfig_target_supported(enum pconfig_target target)
+{
+ /*
+ * We would need to re-think the implementation once we get > 64
+ * PCONFIG targets. Spec allows up to 2^32 targets.
+ */
+ BUILD_BUG_ON(PCONFIG_TARGET_NR >= 64);
+
+ if (WARN_ON_ONCE(target >= 64))
+ return 0;
+ return targets_supported & (1ULL << target);
+}
+
+static int __init intel_pconfig_init(void)
+{
+ int subleaf;
+
+ if (!boot_cpu_has(X86_FEATURE_PCONFIG))
+ return 0;
+
+ /*
+ * Scan subleafs of PCONFIG CPUID leaf.
+ *
+ * Subleafs of the same type need not be consecutive.
+ *
+ * Stop on the first invalid subleaf type. All subleafs after the first
+ * invalid are invalid too.
+ */
+ for (subleaf = 0; subleaf < INT_MAX; subleaf++) {
+ struct cpuid_regs regs;
+
+ cpuid_count(PCONFIG_CPUID, subleaf,
+ &regs.eax, &regs.ebx, &regs.ecx, &regs.edx);
+
+ switch (regs.eax & PCONFIG_CPUID_SUBLEAF_MASK) {
+ case PCONFIG_CPUID_SUBLEAF_INVALID:
+ /* Stop on the first invalid subleaf */
+ goto out;
+ case PCONFIG_CPUID_SUBLEAF_TARGETID:
+ /* Mark supported PCONFIG targets */
+ if (regs.ebx < 64)
+ targets_supported |= (1ULL << regs.ebx);
+ if (regs.ecx < 64)
+ targets_supported |= (1ULL << regs.ecx);
+ if (regs.edx < 64)
+ targets_supported |= (1ULL << regs.edx);
+ break;
+ default:
+ /* Unknown CPUID.PCONFIG subleaf: ignore */
+ break;
+ }
+ }
+out:
+ return 0;
+}
+arch_initcall(intel_pconfig_init);
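
For illustration, a later consumer would typically gate MKTME key
management on this helper; a minimal, hypothetical caller might look like:

/* Hypothetical example: bail out early when MKTME is not a supported
 * PCONFIG target on this system. */
static int __init mktme_keys_init(void)
{
	if (!pconfig_target_supported(MKTME_TARGET))
		return -ENXIO;

	/* ... set up MKTME key management here ... */
	return 0;
}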

Subject: [tip:x86/mm] x86/tme: Detect if TME and MKTME is activated by BIOS

Commit-ID: cb06d8e3d020c30fe10ae711c925a5319ab82c88
Gitweb: https://git.kernel.org/tip/cb06d8e3d020c30fe10ae711c925a5319ab82c88
Author: Kirill A. Shutemov <[email protected]>
AuthorDate: Mon, 5 Mar 2018 19:25:50 +0300
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 12 Mar 2018 12:10:54 +0100

x86/tme: Detect if TME and MKTME is activated by BIOS

IA32_TME_ACTIVATE MSR (0x982) can be used to check if BIOS has enabled
TME and MKTME. It includes which encryption policy/algorithm is selected
for TME or available for MKTME. For MKTME, the MSR also enumerates how
many KeyIDs are available.

We would need to exclude KeyID bits from physical address bits.
detect_tme() would adjust cpuinfo_x86::x86_phys_bits accordingly.

We have to do this even if we are not going to use KeyID bits
ourself. VM guests still have to know that these bits are not usable
for physical address.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kai Huang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 90 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 90 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4aa9fd379390..b862067bb33c 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -510,6 +510,93 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
}
}

+#define MSR_IA32_TME_ACTIVATE 0x982
+
+/* Helpers to access TME_ACTIVATE MSR */
+#define TME_ACTIVATE_LOCKED(x) (x & 0x1)
+#define TME_ACTIVATE_ENABLED(x) (x & 0x2)
+
+#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */
+#define TME_ACTIVATE_POLICY_AES_XTS_128 0
+
+#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */
+
+#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */
+#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1
+
+/* Values for mktme_status (SW only construct) */
+#define MKTME_ENABLED 0
+#define MKTME_DISABLED 1
+#define MKTME_UNINITIALIZED 2
+static int mktme_status = MKTME_UNINITIALIZED;
+
+static void detect_tme(struct cpuinfo_x86 *c)
+{
+ u64 tme_activate, tme_policy, tme_crypto_algs;
+ int keyid_bits = 0, nr_keyids = 0;
+ static u64 tme_activate_cpu0 = 0;
+
+ rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
+
+ if (mktme_status != MKTME_UNINITIALIZED) {
+ if (tme_activate != tme_activate_cpu0) {
+ /* Broken BIOS? */
+ pr_err_once("x86/tme: configuation is inconsistent between CPUs\n");
+ pr_err_once("x86/tme: MKTME is not usable\n");
+ mktme_status = MKTME_DISABLED;
+
+ /* Proceed. We may need to exclude bits from x86_phys_bits. */
+ }
+ } else {
+ tme_activate_cpu0 = tme_activate;
+ }
+
+ if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
+ pr_info_once("x86/tme: not enabled by BIOS\n");
+ mktme_status = MKTME_DISABLED;
+ return;
+ }
+
+ if (mktme_status != MKTME_UNINITIALIZED)
+ goto detect_keyid_bits;
+
+ pr_info("x86/tme: enabled by BIOS\n");
+
+ tme_policy = TME_ACTIVATE_POLICY(tme_activate);
+ if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
+ pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
+
+ tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
+ if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
+ pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
+ tme_crypto_algs);
+ mktme_status = MKTME_DISABLED;
+ }
+detect_keyid_bits:
+ keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
+ nr_keyids = (1UL << keyid_bits) - 1;
+ if (nr_keyids) {
+ pr_info_once("x86/mktme: enabled by BIOS\n");
+ pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
+ } else {
+ pr_info_once("x86/mktme: disabled by BIOS\n");
+ }
+
+ if (mktme_status == MKTME_UNINITIALIZED) {
+ /* MKTME is usable */
+ mktme_status = MKTME_ENABLED;
+ }
+
+ /*
+ * Exclude KeyID bits from physical address bits.
+ *
+ * We have to do this even if we are not going to use KeyID bits
+ * ourself. VM guests still have to know that these bits are not usable
+ * for physical address.
+ */
+ c->x86_phys_bits -= keyid_bits;
+}
+
static void init_intel_energy_perf(struct cpuinfo_x86 *c)
{
u64 epb;
@@ -680,6 +767,9 @@ static void init_intel(struct cpuinfo_x86 *c)
if (cpu_has(c, X86_FEATURE_VMX))
detect_vmx_virtcap(c);

+ if (cpu_has(c, X86_FEATURE_TME))
+ detect_tme(c);
+
init_intel_energy_perf(c);

init_intel_misc_features(c);
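
To make the bit layout concrete, here is how the helpers above decode a
made-up IA32_TME_ACTIVATE value (illustration only; the value is not from
real hardware):

	/* Locked, enabled, AES-XTS-128 policy, 6 KeyID bits, AES-XTS-128
	 * advertised for MKTME: */
	u64 tme_activate = (1ULL << 0) |	/* TME_ACTIVATE_LOCKED()  -> true */
			   (1ULL << 1) |	/* TME_ACTIVATE_ENABLED() -> true */
			   (0ULL << 4) |	/* policy: AES_XTS_128 */
			   (6ULL << 32) |	/* TME_ACTIVATE_KEYID_BITS() -> 6 */
			   (1ULL << 48);	/* TME_ACTIVATE_CRYPTO_AES_XTS_128 */

	int keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);	/* 6 */
	int nr_keyids = (1UL << keyid_bits) - 1;			/* 63 KeyIDs */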

Subject: [tip:x86/mm] x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf

Commit-ID: 24c517856af6511be1339dd55edd131160e37aac
Gitweb: https://git.kernel.org/tip/24c517856af6511be1339dd55edd131160e37aac
Author: Kirill A. Shutemov <[email protected]>
AuthorDate: Mon, 5 Mar 2018 19:25:53 +0300
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 12 Mar 2018 12:10:54 +0100

x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf

MKTME_KEY_PROG allows manipulation of MKTME keys in the CPU.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kai Huang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/intel_pconfig.h | 50 ++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/arch/x86/include/asm/intel_pconfig.h b/arch/x86/include/asm/intel_pconfig.h
index fb7a37c3798b..3cb002b1d0f9 100644
--- a/arch/x86/include/asm/intel_pconfig.h
+++ b/arch/x86/include/asm/intel_pconfig.h
@@ -12,4 +12,54 @@ enum pconfig_target {

int pconfig_target_supported(enum pconfig_target target);

+enum pconfig_leaf {
+ MKTME_KEY_PROGRAM = 0,
+ PCONFIG_LEAF_INVALID,
+};
+
+#define PCONFIG ".byte 0x0f, 0x01, 0xc5"
+
+/* Defines and structure for MKTME_KEY_PROGRAM of PCONFIG instruction */
+
+/* mktme_key_program::keyid_ctrl COMMAND, bits [7:0] */
+#define MKTME_KEYID_SET_KEY_DIRECT 0
+#define MKTME_KEYID_SET_KEY_RANDOM 1
+#define MKTME_KEYID_CLEAR_KEY 2
+#define MKTME_KEYID_NO_ENCRYPT 3
+
+/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
+#define MKTME_AES_XTS_128 (1 << 8)
+
+/* Return codes from the PCONFIG MKTME_KEY_PROGRAM */
+#define MKTME_PROG_SUCCESS 0
+#define MKTME_INVALID_PROG_CMD 1
+#define MKTME_ENTROPY_ERROR 2
+#define MKTME_INVALID_KEYID 3
+#define MKTME_INVALID_ENC_ALG 4
+#define MKTME_DEVICE_BUSY 5
+
+/* Hardware requires the structure to be 256 byte aligned. Otherwise #GP(0). */
+struct mktme_key_program {
+ u16 keyid;
+ u32 keyid_ctrl;
+ u8 __rsvd[58];
+ u8 key_field_1[64];
+ u8 key_field_2[64];
+} __packed __aligned(256);
+
+static inline int mktme_key_program(struct mktme_key_program *key_program)
+{
+ unsigned long rax = MKTME_KEY_PROGRAM;
+
+ if (!pconfig_target_supported(MKTME_TARGET))
+ return -ENXIO;
+
+ asm volatile(PCONFIG
+ : "=a" (rax), "=b" (key_program)
+ : "0" (rax), "1" (key_program)
+ : "memory", "cc");
+
+ return rax;
+}
+
#endif /* _ASM_X86_INTEL_PCONFIG_H */
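
As an illustration of how this helper might be used once a key-management
interface exists (this series does not provide one yet), programming a
CPU-generated key for a KeyID could look roughly like the sketch below;
the function name and the lack of locking are illustrative only:

/* Static storage keeps the 256-byte alignment the hardware requires;
 * serialization against concurrent callers is omitted here. */
static struct mktme_key_program mktme_prog;

static int mktme_program_random_key(u16 keyid)
{
	int ret;

	memset(&mktme_prog, 0, sizeof(mktme_prog));
	mktme_prog.keyid = keyid;
	mktme_prog.keyid_ctrl = MKTME_KEYID_SET_KEY_RANDOM | MKTME_AES_XTS_128;

	ret = mktme_key_program(&mktme_prog);
	return ret == MKTME_PROG_SUCCESS ? 0 : -EIO;
}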

2018-03-13 02:13:14

by Kai Huang

[permalink] [raw]
Subject: Re: [tip:x86/mm] x86/tme: Detect if TME and MKTME is activated by BIOS

On Mon, 2018-03-12 at 05:21 -0700, tip-bot for Kirill A. Shutemov
wrote:
> Commit-ID: cb06d8e3d020c30fe10ae711c925a5319ab82c88
> Gitweb: https://git.kernel.org/tip/cb06d8e3d020c30fe10ae711c925a5319ab82c88
> Author: Kirill A. Shutemov <[email protected]>
> AuthorDate: Mon, 5 Mar 2018 19:25:50 +0300
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Mon, 12 Mar 2018 12:10:54 +0100
>
> x86/tme: Detect if TME and MKTME is activated by BIOS
>
> IA32_TME_ACTIVATE MSR (0x982) can be used to check if BIOS has enabled
> TME and MKTME. It includes which encryption policy/algorithm is selected
> for TME or available for MKTME. For MKTME, the MSR also enumerates how
> many KeyIDs are available.
>
> We would need to exclude KeyID bits from physical address bits.
> detect_tme() would adjust cpuinfo_x86::x86_phys_bits accordingly.
>
> We have to do this even if we are not going to use KeyID bits
> ourself. VM guests still have to know that these bits are not usable
> for physical address.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Kai Huang <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/kernel/cpu/intel.c | 90 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 90 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 4aa9fd379390..b862067bb33c 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -510,6 +510,93 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
> }
> }
>
> +#define MSR_IA32_TME_ACTIVATE 0x982
> +
> +/* Helpers to access TME_ACTIVATE MSR */
> +#define TME_ACTIVATE_LOCKED(x) (x & 0x1)
> +#define TME_ACTIVATE_ENABLED(x) (x & 0x2)
> +
> +#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */
> +#define TME_ACTIVATE_POLICY_AES_XTS_128 0
> +
> +#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */
> +
> +#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */
> +#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1
> +
> +/* Values for mktme_status (SW only construct) */
> +#define MKTME_ENABLED 0
> +#define MKTME_DISABLED 1
> +#define MKTME_UNINITIALIZED 2
> +static int mktme_status = MKTME_UNINITIALIZED;
> +
> +static void detect_tme(struct cpuinfo_x86 *c)
> +{
> + u64 tme_activate, tme_policy, tme_crypto_algs;
> + int keyid_bits = 0, nr_keyids = 0;
> + static u64 tme_activate_cpu0 = 0;
> +
> + rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
> +
> + if (mktme_status != MKTME_UNINITIALIZED) {
> + if (tme_activate != tme_activate_cpu0) {
> + /* Broken BIOS? */
> + pr_err_once("x86/tme: configuation is
> inconsistent between CPUs\n");
> + pr_err_once("x86/tme: MKTME is not
> usable\n");
> + mktme_status = MKTME_DISABLED;
> +
> + /* Proceed. We may need to exclude bits from
> x86_phys_bits. */
> + }
> + } else {
> + tme_activate_cpu0 = tme_activate;
> + }
> +
> + if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
> + pr_info_once("x86/tme: not enabled by BIOS\n");
> + mktme_status = MKTME_DISABLED;
> + return;
> + }
> +
> + if (mktme_status != MKTME_UNINITIALIZED)
> + goto detect_keyid_bits;
> +
> + pr_info("x86/tme: enabled by BIOS\n");
> +
> + tme_policy = TME_ACTIVATE_POLICY(tme_activate);
> + if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
> + pr_warn("x86/tme: Unknown policy is active:
> %#llx\n", tme_policy);
> +
> + tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
> + if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
> + pr_err("x86/mktme: No known encryption algorithm is
> supported: %#llx\n",
> + tme_crypto_algs);
> + mktme_status = MKTME_DISABLED;
> + }
> +detect_keyid_bits:
> + keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
> + nr_keyids = (1UL << keyid_bits) - 1;
> + if (nr_keyids) {
> + pr_info_once("x86/mktme: enabled by BIOS\n");
> + pr_info_once("x86/mktme: %d KeyIDs available\n",
> nr_keyids);
> + } else {
> + pr_info_once("x86/mktme: disabled by BIOS\n");
> + }
> +
> + if (mktme_status == MKTME_UNINITIALIZED) {
> + /* MKTME is usable */
> + mktme_status = MKTME_ENABLED;
> + }
> +
> + /*
> + * Exclude KeyID bits from physical address bits.
> + *
> + * We have to do this even if we are not going to use KeyID bits
> + * ourself. VM guests still have to know that these bits are not usable
> + * for physical address.
> + */
> + c->x86_phys_bits -= keyid_bits;

It seems setup_pku() will call get_cpu_cap to restore c->x86_phys_bits
later? In which case I think you need to change setup_pku as well.

And the comment here could be refined. It is true that a VM guest needs
to know the number of physical address bits, but this info is not used
only by VMs. I think we need to update it simply because that is the
fact of the hardware.

Thanks,
-Kai

> +}
> +
> static void init_intel_energy_perf(struct cpuinfo_x86 *c)
> {
> u64 epb;
> @@ -680,6 +767,9 @@ static void init_intel(struct cpuinfo_x86 *c)
> if (cpu_has(c, X86_FEATURE_VMX))
> detect_vmx_virtcap(c);
>
> + if (cpu_has(c, X86_FEATURE_TME))
> + detect_tme(c);
> +
> init_intel_energy_perf(c);
>
> init_intel_misc_features(c);

2018-03-13 12:52:09

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [tip:x86/mm] x86/tme: Detect if TME and MKTME is activated by BIOS

On Tue, Mar 13, 2018 at 03:12:02PM +1300, Kai Huang wrote:
> It seems setup_pku() will call get_cpu_cap to restore c->x86_phys_bits
> later? In which case I think you need to change setup_pku as well.

Thanks for catching this.

I think setup_pku() shouldn't call get_cpu_cap().

Any objections against this:

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 348cf4821240..ce10d8ae4cd6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -362,6 +362,8 @@ static bool pku_disabled;

static __always_inline void setup_pku(struct cpuinfo_x86 *c)
{
+ u32 eax, ebx, ecx, edx;
+
/* check the boot processor, plus compile options for PKU: */
if (!cpu_feature_enabled(X86_FEATURE_PKU))
return;
@@ -377,7 +379,8 @@ static __always_inline void setup_pku(struct cpuinfo_x86 *c)
* cpuid bit to be set. We need to ensure that we
* update that bit in this CPU's "cpu_info".
*/
- get_cpu_cap(c);
+ cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx);
+ c->x86_capability[CPUID_7_ECX] = ecx;
}

#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS

> And for the comments here, I think it can be refined. It is true that
> VM guest needs to know bits of physical address, but this info is not
> used only by VM. I think the reason we need to update is this is simply
> the fact.

Fair enough. Like this?

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index e8ddc6dcfd53..ac45ba7398d9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -612,11 +612,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
#endif

/*
- * Exclude KeyID bits from physical address bits.
- *
- * We have to do this even if we are not going to use KeyID bits
- * ourself. VM guests still have to know that these bits are not usable
- * for physical address.
+ * KeyID bits effectively lower number of physical address bits.
+ * Let's update cpuinfo_x86::x86_phys_bits to reflect the fact.
*/
c->x86_phys_bits -= keyid_bits;
}
--
Kirill A. Shutemov

2018-03-13 15:12:58

by Dave Hansen

[permalink] [raw]
Subject: Re: [tip:x86/mm] x86/tme: Detect if TME and MKTME is activated by BIOS

On 03/13/2018 05:49 AM, Kirill A. Shutemov wrote:
> On Tue, Mar 13, 2018 at 03:12:02PM +1300, Kai Huang wrote:
>> It seems setup_pku() will call get_cpu_cap to restore c->x86_phys_bits
>> later? In which case I think you need to change setup_pku as well.
> Thanks for catching this.
>
> I think setup_pku() shouldn't call get_cpu_cap().

I think if you want to make it illegal to call get_cpu_cap() twice, you
should enforce that.

2018-03-13 22:09:16

by Kai Huang

[permalink] [raw]
Subject: Re: [tip:x86/mm] x86/tme: Detect if TME and MKTME is activated by BIOS

On Tue, 2018-03-13 at 15:49 +0300, Kirill A. Shutemov wrote:
> On Tue, Mar 13, 2018 at 03:12:02PM +1300, Kai Huang wrote:
> > It seems setup_pku() will call get_cpu_cap to restore c->x86_phys_bits
> > later? In which case I think you need to change setup_pku as well.
>
> Thanks for catching this.
>
> I think setup_pku() shouldn't call get_cpu_cap().
>
> Any objections against this:
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 348cf4821240..ce10d8ae4cd6 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -362,6 +362,8 @@ static bool pku_disabled;
>
> static __always_inline void setup_pku(struct cpuinfo_x86 *c)
> {
> + u32 eax, ebx, ecx, edx;
> +
> /* check the boot processor, plus compile options for PKU: */
> if (!cpu_feature_enabled(X86_FEATURE_PKU))
> return;
> @@ -377,7 +379,8 @@ static __always_inline void setup_pku(struct cpuinfo_x86 *c)
> * cpuid bit to be set. We need to ensure that we
> * update that bit in this CPU's "cpu_info".
> */
> - get_cpu_cap(c);
> + cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx);
> + c->x86_capability[CPUID_7_ECX] = ecx;
> }
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>
> > And the comment here could be refined. It is true that a VM guest needs
> > to know the number of physical address bits, but this info is not used
> > only by VMs. I think we need to update it simply because that is the
> > fact of the hardware.
>
> Fair enough. Like this?

Yes good to me. Thanks.

Thanks,
-Kai
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index e8ddc6dcfd53..ac45ba7398d9 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -612,11 +612,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
> #endif
>
> /*
> - * Exclude KeyID bits from physical address bits.
> - *
> - * We have to do this even if we are not going to use KeyID bits
> - * ourself. VM guests still have to know that these bits are not usable
> - * for physical address.
> + * KeyID bits effectively lower number of physical address bits.
> + * Let's update cpuinfo_x86::x86_phys_bits to reflect the fact.
> */
> c->x86_phys_bits -= keyid_bits;
> }

2018-03-20 12:52:25

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On Mon, Mar 05, 2018 at 11:07:16AM -0800, Dave Hansen wrote:
> On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > +void free_encrypt_page(struct page *page, int keyid, unsigned int order)
> > +{
> > + int i;
> > + void *v;
> > +
> > + for (i = 0; i < (1 << order); i++) {
> > + v = kmap_atomic_keyid(page, keyid + i);
> > + /* See comment in prep_encrypt_page() */
> > + clflush_cache_range(v, PAGE_SIZE);
> > + kunmap_atomic(v);
> > + }
> > +}
>
> Have you measured how slow this is?

Well, it's pretty bad.

A tight loop of allocating and freeing a page (measured from within the
kernel) is 4-6 times slower:

Encryption off
Order-0, 10000000 iterations: 50496616 cycles
Order-0, 10000000 iterations: 46900080 cycles
Order-0, 10000000 iterations: 46873540 cycles

Encryption on
Order-0, 10000000 iterations: 222021882 cycles
Order-0, 10000000 iterations: 222315381 cycles
Order-0, 10000000 iterations: 222289110 cycles

Encryption off
Order-9, 100000 iterations: 46829632 cycles
Order-9, 100000 iterations: 46919952 cycles
Order-9, 100000 iterations: 37647873 cycles

Encryption on
Order-9, 100000 iterations: 222407715 cycles
Order-9, 100000 iterations: 222111657 cycles
Order-9, 100000 iterations: 222335352 cycles

On a macro benchmark it's not as dramatic, but still bad -- 16% slower:

Encryption off

Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

6769369.623773 task-clock (msec) # 33.869 CPUs utilized ( +- 0.02% )
1,086,729 context-switches # 0.161 K/sec ( +- 0.83% )
193,153 cpu-migrations # 0.029 K/sec ( +- 0.72% )
104,971,541 page-faults # 0.016 M/sec ( +- 0.01% )
20,179,502,944,932 cycles # 2.981 GHz ( +- 0.02% )
15,244,481,306,390 stalled-cycles-frontend # 75.54% frontend cycles idle ( +- 0.02% )
11,548,852,154,412 instructions # 0.57 insn per cycle
# 1.32 stalled cycles per insn ( +- 0.00% )
2,488,836,449,779 branches # 367.661 M/sec ( +- 0.00% )
94,445,965,563 branch-misses # 3.79% of all branches ( +- 0.01% )

199.871815231 seconds time elapsed ( +- 0.17% )

Encryption on

Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

8099514.432371 task-clock (msec) # 34.959 CPUs utilized ( +- 0.01% )
1,169,589 context-switches # 0.144 K/sec ( +- 0.51% )
198,008 cpu-migrations # 0.024 K/sec ( +- 0.77% )
104,953,906 page-faults # 0.013 M/sec ( +- 0.01% )
24,158,282,050,086 cycles # 2.983 GHz ( +- 0.01% )
19,183,031,041,329 stalled-cycles-frontend # 79.41% frontend cycles idle ( +- 0.01% )
11,600,772,560,767 instructions # 0.48 insn per cycle
# 1.65 stalled cycles per insn ( +- 0.00% )
2,501,453,131,164 branches # 308.840 M/sec ( +- 0.00% )
94,566,437,048 branch-misses # 3.78% of all branches ( +- 0.01% )

231.684539584 seconds time elapsed ( +- 0.15% )

I'll check what we can do here.

--
Kirill A. Shutemov

2018-03-22 15:57:11

by Punit Agrawal

[permalink] [raw]
Subject: Re: [RFC, PATCH 07/22] x86/mm: Mask out KeyID bits from page table entry pfn

Hi Kirill,

A flyby comment below.

"Kirill A. Shutemov" <[email protected]> writes:

> MKTME claims several upper bits of the physical address in a page table
> entry to encode KeyID. It effectively shrinks number of bits for
> physical address. We should exclude KeyID bits from physical addresses.
>
> For instance, if CPU enumerates 52 physical address bits and number of
> bits claimed for KeyID is 6, bits 51:46 must not be treated as part of
> the physical address.
>
> This patch adjusts __PHYSICAL_MASK during MKTME enumeration.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> arch/x86/kernel/cpu/intel.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index c770689490b5..35436bbadd0b 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -580,6 +580,30 @@ static void detect_tme(struct cpuinfo_x86 *c)
> mktme_status = MKTME_ENABLED;
> }
>
> +#ifdef CONFIG_X86_INTEL_MKTME
> + if (mktme_status == MKTME_ENABLED && nr_keyids) {
> + /*
> + * Mask out bits claimed from KeyID from physical address mask.
> + *
> + * For instance, if a CPU enumerates 52 physical address bits
> + * and number of bits claimed for KeyID is 6, bits 51:46 of
> + * physical address is unusable.
> + */
> + phys_addr_t keyid_mask;
> +
> + keyid_mask = 1ULL << c->x86_phys_bits;
> + keyid_mask -= 1ULL << (c->x86_phys_bits - keyid_bits);
> + physical_mask &= ~keyid_mask;

You could use GENMASK_ULL() to construct the keyid_mask instead of
rolling your own here.

Thanks,
Punit


> + } else {
> + /*
> + * Reset __PHYSICAL_MASK.
> + * Maybe needed if there's inconsistent configuration
> + * between CPUs.
> + */
> + physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
> + }
> +#endif
> +
> /*
> * Exclude KeyID bits from physical address bits.
> *
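
For reference, the GENMASK_ULL() form of the same mask would be:

	/* Bits [x86_phys_bits - 1 : x86_phys_bits - keyid_bits] hold the KeyID */
	keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1,
				 c->x86_phys_bits - keyid_bits);
	physical_mask &= ~keyid_mask;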

2018-03-22 16:04:31

by Punit Agrawal

[permalink] [raw]
Subject: Re: [RFC, PATCH 08/22] mm: Introduce __GFP_ENCRYPT

"Kirill A. Shutemov" <[email protected]> writes:

> The patch adds new gfp flag to indicate that we're allocating encrypted
> page.
>
> Architectural code may need to do special preparation for encrypted
> pages such as flushing cache to avoid aliasing.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> include/linux/gfp.h | 12 ++++++++++++
> include/linux/mm.h | 2 ++
> include/trace/events/mmflags.h | 1 +
> mm/Kconfig | 3 +++
> mm/page_alloc.c | 3 +++
> tools/perf/builtin-kmem.c | 1 +
> 6 files changed, 22 insertions(+)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 1a4582b44d32..43a93ca11c3c 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -24,6 +24,11 @@ struct vm_area_struct;
> #define ___GFP_HIGH 0x20u
> #define ___GFP_IO 0x40u
> #define ___GFP_FS 0x80u
> +#ifdef CONFIG_ARCH_WANTS_GFP_ENCRYPT
> +#define ___GFP_ENCYPT 0x100u
> +#else
> +#define ___GFP_ENCYPT 0

s/___GFP_ENCYPT/___GFP_ENCRYPT?

Thanks,
Punit

[...]


2018-03-27 14:46:12

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC, PATCH 19/22] x86/mm: Implement free_encrypt_page()

On Tue, Mar 20, 2018 at 03:50:46PM +0300, Kirill A. Shutemov wrote:
> On Mon, Mar 05, 2018 at 11:07:16AM -0800, Dave Hansen wrote:
> > On 03/05/2018 08:26 AM, Kirill A. Shutemov wrote:
> > > +void free_encrypt_page(struct page *page, int keyid, unsigned int order)
> > > +{
> > > + int i;
> > > + void *v;
> > > +
> > > + for (i = 0; i < (1 << order); i++) {
> > > + v = kmap_atomic_keyid(page, keyid + i);
> > > + /* See comment in prep_encrypt_page() */
> > > + clflush_cache_range(v, PAGE_SIZE);
> > > + kunmap_atomic(v);
> > > + }
> > > +}
> >
> > Have you measured how slow this is?
>
> Well, it's pretty bad.
>
> Tight loop of allocation/free a page (measured from within kernel) is
> 4-6 times slower:
>
> Encryption off
> Order-0, 10000000 iterations: 50496616 cycles
> Order-0, 10000000 iterations: 46900080 cycles
> Order-0, 10000000 iterations: 46873540 cycles
>
> Encryption on
> Order-0, 10000000 iterations: 222021882 cycles
> Order-0, 10000000 iterations: 222315381 cycles
> Order-0, 10000000 iterations: 222289110 cycles
>
> Encryption off
> Order-9, 100000 iterations: 46829632 cycles
> Order-9, 100000 iterations: 46919952 cycles
> Order-9, 100000 iterations: 37647873 cycles
>
> Encryption on
> Order-9, 100000 iterations: 222407715 cycles
> Order-9, 100000 iterations: 222111657 cycles
> Order-9, 100000 iterations: 222335352 cycles
>
> On macro benchmark it's not that dramatic, but still bad -- 16% down:
>
> Encryption off
>
> Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
>
> 6769369.623773 task-clock (msec) # 33.869 CPUs utilized ( +- 0.02% )
> 1,086,729 context-switches # 0.161 K/sec ( +- 0.83% )
> 193,153 cpu-migrations # 0.029 K/sec ( +- 0.72% )
> 104,971,541 page-faults # 0.016 M/sec ( +- 0.01% )
> 20,179,502,944,932 cycles # 2.981 GHz ( +- 0.02% )
> 15,244,481,306,390 stalled-cycles-frontend # 75.54% frontend cycles idle ( +- 0.02% )
> 11,548,852,154,412 instructions # 0.57 insn per cycle
> # 1.32 stalled cycles per insn ( +- 0.00% )
> 2,488,836,449,779 branches # 367.661 M/sec ( +- 0.00% )
> 94,445,965,563 branch-misses # 3.79% of all branches ( +- 0.01% )
>
> 199.871815231 seconds time elapsed ( +- 0.17% )
>
> Encryption on
>
> Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):
>
> 8099514.432371 task-clock (msec) # 34.959 CPUs utilized ( +- 0.01% )
> 1,169,589 context-switches # 0.144 K/sec ( +- 0.51% )
> 198,008 cpu-migrations # 0.024 K/sec ( +- 0.77% )
> 104,953,906 page-faults # 0.013 M/sec ( +- 0.01% )
> 24,158,282,050,086 cycles # 2.983 GHz ( +- 0.01% )
> 19,183,031,041,329 stalled-cycles-frontend # 79.41% frontend cycles idle ( +- 0.01% )
> 11,600,772,560,767 instructions # 0.48 insn per cycle
> # 1.65 stalled cycles per insn ( +- 0.00% )
> 2,501,453,131,164 branches # 308.840 M/sec ( +- 0.00% )
> 94,566,437,048 branch-misses # 3.78% of all branches ( +- 0.01% )
>
> 231.684539584 seconds time elapsed ( +- 0.15% )
>
> I'll check what we can do here.

Okay, I've reworked the patchset (will post later) to store the KeyID
per-page in page_ext->flags. The KeyID is preserved for freed pages and we
can avoid the cache flushing if the new KeyID we want to use for the page
matches the previous one.

With this change the microbenchmark I used before is useless, as it keeps
allocating the same page and therefore avoids cache flushes all the time.

On the macrobenchmark (kernel build) we still see a slowdown, but it's
~3.6% instead of 16%. That's more acceptable.

I guess we can do better than this and I will look more into performance
once the whole stack is functional.

Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

7045275.657792 task-clock (msec) # 34.007 CPUs utilized ( +- 0.02% )
1,122,659 context-switches # 0.159 K/sec ( +- 0.50% )
197,678 cpu-migrations # 0.028 K/sec ( +- 0.50% )
104,958,956 page-faults # 0.015 M/sec ( +- 0.01% )
21,003,977,611,574 cycles # 2.981 GHz ( +- 0.02% )
16,057,772,099,500 stalled-cycles-frontend # 76.45% frontend cycles idle ( +- 0.02% )
11,563,935,077,599 instructions # 0.55 insn per cycle
# 1.39 stalled cycles per insn ( +- 0.00% )
2,492,841,089,612 branches # 353.832 M/sec ( +- 0.00% )
94,613,299,643 branch-misses # 3.80% of all branches ( +- 0.02% )

207.171360888 seconds time elapsed ( +- 0.07% )

--
Kirill A. Shutemov
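
A rough sketch of the reworked scheme described above -- remembering the
last KeyID a page was used with in page_ext and flushing only when it
changes. The mask and helper names below are made up for illustration;
the actual patches may differ:

/* Hypothetical: low bits of page_ext->flags hold the page's last KeyID. */
#define MKTME_LAST_KEYID_MASK	((1UL << mktme_nr_keyid_bits) - 1)

static bool page_needs_flush(struct page *page, int new_keyid)
{
	struct page_ext *page_ext = lookup_page_ext(page);
	int last_keyid = page_ext->flags & MKTME_LAST_KEYID_MASK;

	return last_keyid != new_keyid;
}

static void mktme_prep_one_page(struct page *page, int keyid)
{
	struct page_ext *page_ext = lookup_page_ext(page);
	void *v;

	if (!page_needs_flush(page, keyid))
		return;

	/* Flush to avoid aliasing between the old and new KeyID mappings
	 * (see the comment in prep_encrypt_page()). */
	v = kmap_atomic_keyid(page, keyid);
	clflush_cache_range(v, PAGE_SIZE);
	kunmap_atomic(v);

	/* Remember the KeyID for the next time this page is allocated. */
	page_ext->flags &= ~MKTME_LAST_KEYID_MASK;
	page_ext->flags |= keyid;
}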