= Intro =
This patchset enables Intel Multi-Key Total Memory Encryption.
It consists of changes to multiple subsystems:
* Core MM: infrastructure for allocating pages, dealing with encrypted VMAs
and providing an API to set up encrypted mappings.
* arch/x86: feature enumeration, programming keys into hardware, setting up
page table entries for encrypted pages and more.
* Key management service: setup and management of encryption keys.
* DMA/IOMMU: dealing with encrypted memory on the IO side.
* KVM: interaction with virtualization side.
* Documentation: description of APIs and usage examples.
Please review. Any feedback is welcome.
= Overview =
Multi-Key Total Memory Encryption (MKTME)[1] is a technology that allows
transparent memory encryption in upcoming Intel platforms. It uses a new
instruction (PCONFIG) for key setup and selects a key for individual pages
by repurposing physical address bits in the page tables.
These patches add support for MKTME into the existing kernel keyring
subsystem and add a new mprotect_encrypt() system call that can be used by
applications to encrypt anonymous memory with keys obtained from the
keyring.
This architecture supports encrypting both normal, volatile DRAM and
persistent memory. However, these patches do not implement persistent
memory support. We anticipate adding that support next.
== Hardware Background ==
MKTME is built on top of an existing single-key technology called TME.
TME encrypts all system memory using a single key generated by the CPU on
every boot of the system. TME provides robust mitigation against
single-read physical attacks, such as physically removing a DIMM and
inspecting its contents. TME provides weaker mitigations against
multiple-read physical attacks.
MKTME enables the use of multiple encryption keys[2], allowing selection
of the encryption key per-page using the page tables. Encryption keys are
programmed into each memory controller and the same set of keys is
available to all entities on the system with access to that memory (all
cores, DMA engines, etc...).
MKTME inherits many of the mitigations against hardware attacks from TME.
Like TME, MKTME does not fully mitigate vulnerable or malicious operating
systems or virtual machine managers. MKTME offers additional mitigations
when compared to TME.
TME and MKTME use the AES encryption algorithm in the AES-XTS mode. This
mode, typically used for block-based storage devices, takes the physical
address of the data into account when encrypting each block. This ensures
that the effective key is different for each block of memory. Moving
encrypted content across physical addresses results in garbage on read,
mitigating block-relocation attacks. This property is the reason many of
the discussed attacks require control of a shared physical page to be
handed from the victim to the attacker.
== MKTME-Provided Mitigations ==
MKTME adds a few mitigations against attacks that are not mitigated when
using TME alone. The first set consists of mitigations against software
attacks that are familiar today:
* Kernel Mapping Attacks: information disclosures that leverage the
kernel direct map are mitigated against disclosing user data.
* Freed Data Leak Attacks: removing an encryption key from the
hardware mitigates future user information disclosure.
The next set consists of attacks that depend on specialized hardware, such
as an “evil DIMM” or a DDR interposer:
* Cross-Domain Replay Attack: data is captured from one domain
(guest) and replayed to another at a later time.
* Cross-Domain Capture and Delayed Compare Attack: data is captured
and later analyzed to discover secrets.
* Key Wear-out Attack: data is captured and analyzed in order to
later write precise changes to plaintext.
More details on these attacks are below.
MKTME does not mitigate all attacks that can be performed with an “evil
DIMM” or a DDR interposer. In determining MKTME’s security value in an
environment, the ease and effectiveness of the above attacks mitigated by
MKTME should be compared with those which are not mitigated. Some key
examples of unmitigated attacks follow:
* Random Data Modification Attack: An attacker writes random
ciphertext, which causes the victim to consume random data.
This can be used to flip security-sensitive bits.
* Same-Domain Replay Attacks: Data can be captured and replayed
within a single domain. An attacker could, for instance, replay
an old ‘struct cred’ value to a newer, less-privileged process.
* Ciphertext Side Channel Attacks: Similar to delayed-compare
attacks, useful information might be inferred even from ciphertext.
This information might be leveraged to infer information about
secrets such as private keys.
=== Kernel Mapping Attacks ===
Information disclosure vulnerabilities leverage the kernel direct map
because many vulnerabilities involve manipulation of kernel data
structures (examples: CVE-2017-7277, CVE-2017-9605). We normally think of
these bugs as leaking valuable *kernel* data, but they can leak
application data when application pages are recycled for kernel use.
With this MKTME implementation, there is a direct map created for each
MKTME KeyID which is used whenever the kernel needs to access plaintext.
But, all kernel data structures are accessed via the direct map for
KeyID-0. Thus, memory reads which are not coordinated with the KeyID get
garbage (for example, accessing KeyID-4 data with the KeyID-0 mapping).
This means that if sensitive data encrypted using MKTME is leaked via the
KeyID-0 direct map, ciphertext decrypted with the wrong key will be
disclosed. To disclose plaintext, an attacker must “pivot” to the correct
direct mapping, which is non-trivial because there are no kernel data
structures in the KeyID!=0 direct mapping.
=== Freed Data Leak Attack ===
The kernel has a history of bugs around uninitialized data. Usually, we
think of these bugs as leaking sensitive kernel data, but they can also be
used to leak application secrets.
MKTME can help mitigate the case where application secrets are leaked:
* App (or VM) places a secret in a page
* App exits or frees memory to kernel allocator
* Page added to allocator free list
* Attacker reallocates page to a purpose where it can read the page
Now, imagine MKTME was in use on the memory being leaked. The data can
only be leaked as long as the key is programmed in the hardware. If the
key is de-programmed, for example after all pages are freed when a guest
is shut down, any future reads will just see ciphertext.
Basically, the key is a convenient choke-point: you can be more confident
that data encrypted with it is inaccessible once the key is removed.
=== Cross-Domain Replay Attack ===
MKTME mitigates cross-domain replay attacks where an attacker replaces an
encrypted block owned by one domain with a block owned by another domain.
MKTME does not prevent this replacement from occurring, but it does
keep plaintext from being disclosed if the domains use different keys.
With TME, the attack could be executed by:
* A victim places a secret in memory, at a given physical address.
Note: AES-XTS is what restricts the attack to being performed at a
single physical address instead of across different physical
addresses
* Attacker captures victim secret’s ciphertext
* Later on, after victim frees the physical address, attacker gains
ownership
* Attacker puts the ciphertext at the address and gets the secret
plaintext
But, due to the presumably different keys used by the attacker and the
victim, the attacker can not successfully decrypt old ciphertext.
=== Cross-Domain Capture and Delayed Compare Attack ===
This is also referred to as a kind of dictionary attack.
Similarly, MKTME protects against cross-domain capture-and-compare
attacks. Consider the following scenario:
* A victim places a secret in memory, at a known physical address
* Attacker captures victim’s ciphertext
* Attacker gains control of the target physical address, perhaps
after the victim’s VM is shut down or its memory reclaimed.
* Attacker computes and writes many possible plaintexts until new
ciphertext matches content captured previously.
Secrets which have low (plaintext) entropy are more vulnerable to this
attack because they reduce the number of possible plaintexts an attacker
has to compute and write.
The attack will not work if the attacker and victim use different keys.
=== Key Wear-out Attack ===
Repeated use of an encryption key might be used by an attacker to infer
information about the key or the plaintext, weakening the encryption. The
higher the bandwidth of the encryption engine, the more vulnerable the key
is to wear-out. The MKTME memory encryption hardware works at the speed
of the memory bus, which has high bandwidth.
This attack requires capturing potentially large amounts of ciphertext,
processing it, then replaying modified ciphertext. Our expectation is that
most attackers would opt for lower-cost attacks like the replay attack
mentioned above.
For this implementation, the kernel always uses KeyID-0, which is always
vulnerable to wear-out since it cannot be rotated. KeyID-0 wear-out can
be mitigated by limiting the bandwidth with which an attacker can write
KeyID-0-encrypted data.
Such a weakness has been demonstrated[3] on a theoretical cipher with
properties similar to those of AES-XTS.
An attack would take the following steps:
* Victim system is using TME with AES-XTS-128
* Attacker repeatedly captures ciphertext/plaintext pairs (can be
performed with online hardware attack like an interposer).
* Attacker compels repeated use of the key under attack for a
sustained time period without a system reboot[4].
* Attacker discovers a ‘plaintext XOR ciphertext’ collision pair
* Attacker can induce controlled modifications to the targeted
plaintext by modifying the colliding ciphertext
MKTME mitigates key wear-out in two ways:
* Keys can be rotated periodically to mitigate wear-out. Since TME
keys are generated at boot, rotation of TME keys requires a
reboot. In contrast, MKTME allows rotation while the system is
booted. An application could implement a policy to rotate keys at
a frequency which is not feasible to attack.
* In the case that MKTME is used to encrypt two guests’ memory with
two different keys, an attack on one guest’s key would not weaken
the key used in the second guest.
== Userspace API ==
Here's an overview of the anonymous memory encryption process
as viewed from user space:
* Allocate an MKTME Key:
key = add_key("mktme", "name", "type=cpu algorithm=aes-xts-128", @u);
* Map memory:
ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
* Protect memory:
ret = syscall(SYS_encrypt_mprotect, ptr, size, PROT_READ|PROT_WRITE,
key);
*Enjoy the encrypted memory*
* Free memory:
ret = munmap(ptr, size);
* Free the MKTME key:
ret = keyctl(KEYCTL_INVALIDATE, key);
See the documentation patches for more info and a demo program.
This update removes support for user type keys. This closes a security
gap where the encryption keys were exposed to user space. Additionally,
memory hotplug support was largely removed from the API. Only skeleton
support remains to enforce the rule that no new memory may be added to the
MKTME system. This is a deferral of memory hot add support until the
platform support is in place.
== Changelog ==
v2:
- Add comments in allocation and free paths on how ordering is ensured.
- Modify pageattr code to sync direct mapping after the canonical direct
mapping is modified.
- Introduce helpers to access number of KeyIDs, KeyID shift and KeyID
mask.
- Drop unneeded EXPORT_SYMBOL_GPL().
- User type key support, keys in which users bring their own encryption
keys, has been removed. CPU generated keys remain and should be used
instead of USER type keys. (removes security gap, reduces complexity)
- Adding a CPU generated key no longer offers the user an option of
supplying additional entropy to the data and tweak key. (reduces
complexity)
- Memory hotplug add support is removed. This is basically a deferral
of the feature until we have platform support for the feature.
(reduces complexity)
- Documentation is updated to match changes to the add key API.
- Documentation adds an index in the x86 index, and corrects a typo.
- Reference counting: code and commit message comments are updated
to reflect the general nature of the ref counter. Previous comments
said it counted VMAs only.
- Replace a GFP_ATOMIC with GFP_KERNEL in mktme_keys.c
--
[1] https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
[2] The MKTME architecture supports up to 16 bits of KeyIDs, so a
maximum of 65535 keys on top of the “TME key” at KeyID-0. The
first implementation is expected to support 6 bits, making 63 keys
available to applications. However, this is not guaranteed. The
number of available keys could be reduced if, for instance,
additional physical address space is desired over additional
KeyIDs.
[3] http://web.cs.ucdavis.edu/~rogaway/papers/offsets.pdf
[4] This sustained time required for an attack could vary from days
to years depending on the attacker’s goals.
Alison Schofield (30):
x86/pconfig: Set an activated algorithm in all MKTME commands
keys/mktme: Introduce a Kernel Key Service for MKTME
keys/mktme: Preparse the MKTME key payload
keys/mktme: Instantiate MKTME keys
keys/mktme: Destroy MKTME keys
keys/mktme: Move the MKTME payload into a cache aligned structure
keys/mktme: Set up PCONFIG programming targets for MKTME keys
keys/mktme: Program MKTME keys into the platform hardware
keys/mktme: Set up a percpu_ref_count for MKTME keys
keys/mktme: Clear the key programming from the MKTME hardware
keys/mktme: Require CAP_SYS_RESOURCE capability for MKTME keys
acpi: Remove __init from acpi table parsing functions
acpi/hmat: Determine existence of an ACPI HMAT
keys/mktme: Require ACPI HMAT to register the MKTME Key Service
acpi/hmat: Evaluate topology presented in ACPI HMAT for MKTME
keys/mktme: Do not allow key creation in unsafe topologies
keys/mktme: Support CPU hotplug for MKTME key service
keys/mktme: Block memory hotplug additions when MKTME is enabled
mm: Generalize the mprotect implementation to support extensions
syscall/x86: Wire up a system call for MKTME encryption keys
x86/mm: Set KeyIDs in encrypted VMAs for MKTME
mm: Add the encrypt_mprotect() system call for MKTME
x86/mm: Keep reference counts on hardware key usage for MKTME
mm: Restrict MKTME memory encryption to anonymous VMAs
x86/mktme: Overview of Multi-Key Total Memory Encryption
x86/mktme: Document the MKTME provided security mitigations
x86/mktme: Document the MKTME kernel configuration requirements
x86/mktme: Document the MKTME Key Service API
x86/mktme: Document the MKTME API for anonymous memory encryption
x86/mktme: Demonstration program using the MKTME APIs
Jacob Pan (3):
iommu/vt-d: Support MKTME in DMA remapping
x86/mm: introduce common code for mem encryption
x86/mm: Use common code for DMA memory encryption
Kai Huang (1):
kvm, x86, mmu: setup MKTME keyID to spte for given PFN
Kirill A. Shutemov (25):
mm: Do no merge VMAs with different encryption KeyIDs
mm: Add helpers to setup zero page mappings
mm/ksm: Do not merge pages with different KeyIDs
mm/page_alloc: Unify alloc_hugepage_vma()
mm/page_alloc: Handle allocation for encrypted memory
mm/khugepaged: Handle encrypted pages
x86/mm: Mask out KeyID bits from page table entry pfn
x86/mm: Introduce helpers to read number, shift and mask of KeyIDs
x86/mm: Store bitmask of the encryption algorithms supported by MKTME
x86/mm: Preserve KeyID on pte_modify() and pgprot_modify()
x86/mm: Detect MKTME early
x86/mm: Add a helper to retrieve KeyID for a page
x86/mm: Add a helper to retrieve KeyID for a VMA
x86/mm: Add hooks to allocate and free encrypted pages
x86/mm: Map zero pages into encrypted mappings correctly
x86/mm: Rename CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING
x86/mm: Allow to disable MKTME after enumeration
x86/mm: Calculate direct mapping size
x86/mm: Implement syncing per-KeyID direct mappings
x86/mm: Handle encrypted memory in page_to_virt() and __pa()
mm/page_ext: Export lookup_page_ext() symbol
mm/rmap: Clear vma->anon_vma on unlink_anon_vmas()
x86/mm: Disable MKTME on incompatible platform configurations
x86/mm: Disable MKTME if not all system memory supports encryption
x86: Introduce CONFIG_X86_INTEL_MKTME
Documentation/x86/index.rst | 1 +
Documentation/x86/mktme/index.rst | 13 +
.../x86/mktme/mktme_configuration.rst | 6 +
Documentation/x86/mktme/mktme_demo.rst | 53 ++
Documentation/x86/mktme/mktme_encrypt.rst | 56 ++
Documentation/x86/mktme/mktme_keys.rst | 61 ++
Documentation/x86/mktme/mktme_mitigations.rst | 151 ++++
Documentation/x86/mktme/mktme_overview.rst | 57 ++
Documentation/x86/x86_64/mm.rst | 4 +
arch/alpha/include/asm/page.h | 2 +-
arch/x86/Kconfig | 31 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/intel_pconfig.h | 14 +-
arch/x86/include/asm/mem_encrypt.h | 29 +
arch/x86/include/asm/mktme.h | 96 +++
arch/x86/include/asm/page.h | 4 +
arch/x86/include/asm/page_32.h | 1 +
arch/x86/include/asm/page_64.h | 4 +-
arch/x86/include/asm/pgtable.h | 19 +
arch/x86/include/asm/pgtable_types.h | 23 +-
arch/x86/include/asm/setup.h | 6 +
arch/x86/kernel/cpu/intel.c | 65 +-
arch/x86/kernel/head64.c | 4 +
arch/x86/kernel/setup.c | 3 +
arch/x86/kvm/mmu.c | 18 +-
arch/x86/mm/Makefile | 3 +
arch/x86/mm/init_64.c | 65 ++
arch/x86/mm/kaslr.c | 11 +-
arch/x86/mm/mem_encrypt.c | 30 -
arch/x86/mm/mem_encrypt_common.c | 52 ++
arch/x86/mm/mktme.c | 683 ++++++++++++++++++
arch/x86/mm/pageattr.c | 27 +
drivers/acpi/hmat/hmat.c | 67 ++
drivers/acpi/tables.c | 10 +-
drivers/firmware/efi/efi.c | 25 +-
drivers/iommu/intel-iommu.c | 29 +-
fs/dax.c | 3 +-
fs/exec.c | 4 +-
fs/userfaultfd.c | 7 +-
include/asm-generic/pgtable.h | 8 +
include/keys/mktme-type.h | 31 +
include/linux/acpi.h | 9 +-
include/linux/dma-direct.h | 4 +-
include/linux/efi.h | 1 +
include/linux/gfp.h | 56 +-
include/linux/intel-iommu.h | 9 +-
include/linux/mem_encrypt.h | 23 +-
include/linux/migrate.h | 14 +-
include/linux/mm.h | 27 +-
include/linux/page_ext.h | 11 +-
include/linux/syscalls.h | 2 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/fork.c | 2 +
kernel/sys_ni.c | 2 +
mm/compaction.c | 3 +
mm/huge_memory.c | 6 +-
mm/khugepaged.c | 10 +
mm/ksm.c | 17 +
mm/madvise.c | 2 +-
mm/memory.c | 3 +-
mm/mempolicy.c | 30 +-
mm/migrate.c | 4 +-
mm/mlock.c | 2 +-
mm/mmap.c | 31 +-
mm/mprotect.c | 98 ++-
mm/page_alloc.c | 74 ++
mm/page_ext.c | 5 +
mm/rmap.c | 4 +-
mm/userfaultfd.c | 3 +-
security/keys/Makefile | 1 +
security/keys/mktme_keys.c | 590 +++++++++++++++
72 files changed, 2670 insertions(+), 155 deletions(-)
create mode 100644 Documentation/x86/mktme/index.rst
create mode 100644 Documentation/x86/mktme/mktme_configuration.rst
create mode 100644 Documentation/x86/mktme/mktme_demo.rst
create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
create mode 100644 Documentation/x86/mktme/mktme_keys.rst
create mode 100644 Documentation/x86/mktme/mktme_mitigations.rst
create mode 100644 Documentation/x86/mktme/mktme_overview.rst
create mode 100644 arch/x86/include/asm/mktme.h
create mode 100644 arch/x86/mm/mem_encrypt_common.c
create mode 100644 arch/x86/mm/mktme.c
create mode 100644 include/keys/mktme-type.h
create mode 100644 security/keys/mktme_keys.c
--
2.21.0
page_keyid() is an inline function that uses lookup_page_ext(). KVM is
going to use page_keyid(), and since KVM can be built as a module,
lookup_page_ext() has to be exported.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
mm/page_ext.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/page_ext.c b/mm/page_ext.c
index c52b77c13cd9..eeca218891e7 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -139,6 +139,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
MAX_ORDER_NR_PAGES);
return get_entry(base, index);
}
+EXPORT_SYMBOL_GPL(lookup_page_ext);
static int __init alloc_node_page_ext(int nid)
{
@@ -209,6 +210,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
return NULL;
return get_entry(section->page_ext, pfn);
}
+EXPORT_SYMBOL_GPL(lookup_page_ext);
static void *__meminit alloc_page_ext(size_t size, int nid)
{
--
2.21.0
From: Alison Schofield <[email protected]>
Finally, the keys are programmed into the hardware by each
lead CPU. Every package has to be programmed successfully;
no partial success is allowed here.
A retry scheme is included for two errors that may succeed
on retry: MKTME_DEVICE_BUSY and MKTME_ENTROPY_ERROR.
However, it's not clear if even those errors should be retried
at this level. Perhaps they too, should be returned to user space
for handling.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 92 +++++++++++++++++++++++++++++++++++++-
1 file changed, 91 insertions(+), 1 deletion(-)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 272bff8591b7..3c641f3ee794 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -83,6 +83,96 @@ static const match_table_t mktme_token = {
{OPT_ERROR, NULL}
};
+struct mktme_hw_program_info {
+ struct mktme_key_program *key_program;
+ int *status;
+};
+
+struct mktme_err_table {
+ const char *msg;
+ bool retry;
+};
+
+static const struct mktme_err_table mktme_error[] = {
+/* MKTME_PROG_SUCCESS */ {"KeyID was successfully programmed", false},
+/* MKTME_INVALID_PROG_CMD */ {"Invalid KeyID programming command", false},
+/* MKTME_ENTROPY_ERROR */ {"Insufficient entropy", true},
+/* MKTME_INVALID_KEYID */ {"KeyID not valid", false},
+/* MKTME_INVALID_ENC_ALG */ {"Invalid encryption algorithm chosen", false},
+/* MKTME_DEVICE_BUSY */ {"Failure to access key table", true},
+};
+
+static int mktme_parse_program_status(int status[])
+{
+ int cpu, sum = 0;
+
+ /* Success: all CPU(s) programmed all key table(s) */
+ for_each_cpu(cpu, mktme_leadcpus)
+ sum += status[cpu];
+ if (!sum)
+ return MKTME_PROG_SUCCESS;
+
+ /* Invalid Parameters: log the error and return the error. */
+ for_each_cpu(cpu, mktme_leadcpus) {
+ switch (status[cpu]) {
+ case MKTME_INVALID_KEYID:
+ case MKTME_INVALID_PROG_CMD:
+ case MKTME_INVALID_ENC_ALG:
+ pr_err("mktme: %s\n", mktme_error[status[cpu]].msg);
+ return status[cpu];
+
+ default:
+ break;
+ }
+ }
+ /*
+ * Device Busy or Insufficient Entropy: do not log the
+ * error. These will be retried and if retries (time or
+ * count runs out) caller will log the error.
+ */
+ for_each_cpu(cpu, mktme_leadcpus) {
+ if (status[cpu] == MKTME_DEVICE_BUSY)
+ return status[cpu];
+ }
+ return MKTME_ENTROPY_ERROR;
+}
+
+/* Program a single key using one CPU. */
+static void mktme_do_program(void *hw_program_info)
+{
+ struct mktme_hw_program_info *info = hw_program_info;
+ int cpu;
+
+ cpu = smp_processor_id();
+ info->status[cpu] = mktme_key_program(info->key_program);
+}
+
+static int mktme_program_all_keytables(struct mktme_key_program *key_program)
+{
+ struct mktme_hw_program_info info;
+ int err, retries = 10; /* Maybe users should handle retries */
+
+ info.key_program = key_program;
+ info.status = kcalloc(num_possible_cpus(), sizeof(info.status[0]),
+ GFP_KERNEL);
+
+ while (retries--) {
+ get_online_cpus();
+ on_each_cpu_mask(mktme_leadcpus, mktme_do_program,
+ &info, 1);
+ put_online_cpus();
+
+ err = mktme_parse_program_status(info.status);
+ if (!err) /* Success */
+ return err;
+ else if (!mktme_error[err].retry) /* Error no retry */
+ return -ENOKEY;
+ }
+ /* Ran out of retries */
+ pr_err("mktme: %s\n", mktme_error[err].msg);
+ return err;
+}
+
/* Copy the payload to the HW programming structure and program this KeyID */
static int mktme_program_keyid(int keyid, u32 payload)
{
@@ -97,7 +187,7 @@ static int mktme_program_keyid(int keyid, u32 payload)
kprog->keyid = keyid;
kprog->keyid_ctrl = payload;
- ret = MKTME_PROG_SUCCESS; /* Future programming call */
+ ret = mktme_program_all_keytables(kprog);
kmem_cache_free(mktme_prog_cache, kprog);
return ret;
}
--
2.21.0
MKTME claims several upper bits of the physical address in a page table
entry to encode the KeyID. It effectively shrinks the number of bits
available for the physical address, so the KeyID bits must be excluded
from physical addresses.
For instance, if the CPU enumerates 52 physical address bits and the
number of bits claimed for the KeyID is 6, bits 51:46 must not be treated
as part of the physical address.
This patch adjusts __PHYSICAL_MASK during MKTME enumeration.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8d6d92ebeb54..f03eee666761 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -616,6 +616,29 @@ static void detect_tme(struct cpuinfo_x86 *c)
mktme_status = MKTME_ENABLED;
}
+#ifdef CONFIG_X86_INTEL_MKTME
+ if (mktme_status == MKTME_ENABLED && nr_keyids) {
+ /*
+ * Mask out bits claimed from KeyID from physical address mask.
+ *
+ * For instance, if a CPU enumerates 52 physical address bits
+ * and number of bits claimed for KeyID is 6, bits 51:46 of
+ * physical address is unusable.
+ */
+ phys_addr_t keyid_mask;
+
+ keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
+ physical_mask &= ~keyid_mask;
+ } else {
+ /*
+ * Reset __PHYSICAL_MASK.
+ * Maybe needed if there's an inconsistent configuration
+ * between CPUs.
+ */
+ physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+ }
+#endif
+
/*
* KeyID bits effectively lower the number of physical address
* bits. Update cpuinfo_x86::x86_phys_bits accordingly.
--
2.21.0
From: Alison Schofield <[email protected]>
MKTME depends upon at least one online CPU capable of programming
each memory controller in the platform.
An unsafe topology for MKTME is a memory-only package or a package
with no online CPUs. Key creation with unsafe topologies will fail
with EINVAL and a warning will be logged one time.
For example:
[ ] MKTME: no online CPU in proximity domain
[ ] MKTME: topology does not support key creation
These are recoverable errors. CPUs may be brought online that are
capable of programming a previously unprogrammable memory controller.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 36 ++++++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 6265b62801e9..70662e882674 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -23,6 +23,7 @@ static unsigned int mktme_available_keyids; /* Free Hardware KeyIDs */
static struct kmem_cache *mktme_prog_cache; /* Hardware programming cache */
static unsigned long *mktme_target_map; /* PCONFIG programming target */
static cpumask_var_t mktme_leadcpus; /* One CPU per PCONFIG target */
+static bool mktme_allow_keys; /* HW topology supports keys */
enum mktme_keyid_state {
KEYID_AVAILABLE, /* Available to be assigned */
@@ -253,32 +254,55 @@ static void mktme_destroy_key(struct key *key)
percpu_ref_kill(&encrypt_count[keyid]);
}
+static void mktme_update_pconfig_targets(void);
/* Key Service Method to create a new key. Payload is preparsed. */
int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
{
u32 *payload = prep->payload.data[0];
unsigned long flags;
+ int ret = -ENOKEY;
int keyid;
spin_lock_irqsave(&mktme_lock, flags);
+
+ /* Topology supports key creation */
+ if (mktme_allow_keys)
+ goto get_key;
+
+ /* Topology unknown, check it. */
+ if (!mktme_hmat_evaluate()) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /* Keys are now allowed. Update the programming targets. */
+ mktme_update_pconfig_targets();
+ mktme_allow_keys = true;
+
+get_key:
keyid = mktme_reserve_keyid(key);
spin_unlock_irqrestore(&mktme_lock, flags);
if (!keyid)
- return -ENOKEY;
+ goto out;
if (percpu_ref_init(&encrypt_count[keyid], mktme_percpu_ref_release,
0, GFP_KERNEL))
- goto err_out;
+ goto out_free_key;
- if (!mktme_program_keyid(keyid, *payload))
- return MKTME_PROG_SUCCESS;
+ ret = mktme_program_keyid(keyid, *payload);
+ if (ret == MKTME_PROG_SUCCESS)
+ goto out;
+ /* Key programming failed */
percpu_ref_exit(&encrypt_count[keyid]);
-err_out:
+
+out_free_key:
spin_lock_irqsave(&mktme_lock, flags);
mktme_release_keyid(keyid);
+out_unlock:
spin_unlock_irqrestore(&mktme_lock, flags);
- return -ENOKEY;
+out:
+ return ret;
}
/* Make sure arguments are correct for the TYPE of key requested */
--
2.21.0
From: Kai Huang <[email protected]>
Set the KeyID in the SPTE, which will eventually be programmed into the
shadow MMU or EPT table, according to the page's associated KeyID, so that
the guest is able to use the correct KeyID to access guest memory.
Note that the current shadow_me_mask does not suit MKTME's needs: for
MKTME there is no fixed memory encryption mask, as it can vary from
KeyID 1 to the maximum KeyID. Therefore shadow_me_mask remains 0 for MKTME.
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8f72526e2f68..b8742e6219f6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2936,6 +2936,22 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
#define SET_SPTE_WRITE_PROTECTED_PT BIT(0)
#define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
+static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
+{
+#ifdef CONFIG_X86_INTEL_MKTME
+ struct page *page;
+
+ if (!pfn_valid(pfn))
+ return 0;
+
+ page = pfn_to_page(pfn);
+
+ return ((u64)page_keyid(page)) << mktme_keyid_shift();
+#else
+ return shadow_me_mask;
+#endif
+}
+
static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
unsigned pte_access, int level,
gfn_t gfn, kvm_pfn_t pfn, bool speculative,
@@ -2982,7 +2998,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
pte_access &= ~ACC_WRITE_MASK;
if (!kvm_is_mmio_pfn(pfn))
- spte |= shadow_me_mask;
+ spte |= get_phys_encryption_mask(pfn);
spte |= (u64)pfn << PAGE_SHIFT;
--
2.21.0
From: Alison Schofield <[email protected]>
The MKTME encryption hardware resides on each physical package.
The encryption hardware includes 'Key Tables' that must be
programmed identically across all physical packages in the
platform. Although every CPU in a package can program its key
table, the kernel uses one lead CPU per package for programming.
CPU Hotplug Teardown
--------------------
MKTME manages CPU hotplug teardown to make sure the ability to
program all packages is preserved when MKTME keys are present.
When MKTME keys are not currently programmed, simply allow
the teardown, and set "mktme_allow_keys" to false. This will
force a re-evaluation of the platform topology before the next
key creation. If this CPU teardown mattered, the MKTME key service
will report an error and fail to create the key. (The user can
online that CPU and try again.)
When MKTME keys are currently programmed, allow teardown
of non-lead CPUs, and of lead CPUs where another core-sibling
CPU can take over as lead. Do not allow teardown of any
lead CPU that would render a hardware key table unreachable!
CPU Hotplug Startup
-------------------
CPUs coming online are of interest to the key service, but since
the service never needs to block a CPU startup event, nor does it
need to prepare for an onlining CPU, a callback is not implemented.
MKTME will catch the availability of the new CPU, if it is
needed, at the next key creation time. If keys are not allowed,
that new CPU will be part of the topology evaluation to determine
if keys should now be allowed.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 47 +++++++++++++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 70662e882674..b042df73899d 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -460,9 +460,46 @@ static int mktme_alloc_pconfig_targets(void)
return 0;
}
+static int mktme_cpu_teardown(unsigned int cpu)
+{
+ int new_leadcpu, ret = 0;
+ unsigned long flags;
+
+ /* Do not allow key programming during cpu hotplug event */
+ spin_lock_irqsave(&mktme_lock, flags);
+
+ /*
+ * When no keys are in use, allow the teardown, and set
+ * mktme_allow_keys to FALSE. That forces an evaluation
+ * of the topology before the next key creation.
+ */
+ if (mktme_available_keyids == mktme_nr_keyids()) {
+ mktme_allow_keys = false;
+ goto out;
+ }
+ /* Teardown CPU is not a lead CPU. Allow teardown. */
+ if (!cpumask_test_cpu(cpu, mktme_leadcpus))
+ goto out;
+
+ /* Teardown CPU is a lead CPU. Look for a new lead CPU. */
+ new_leadcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+ if (new_leadcpu < nr_cpumask_bits) {
+ /* New lead CPU found. Update the programming mask */
+ __cpumask_clear_cpu(cpu, mktme_leadcpus);
+ __cpumask_set_cpu(new_leadcpu, mktme_leadcpus);
+ } else {
+ /* New lead CPU not found. Do not allow CPU teardown */
+ ret = -1;
+ }
+out:
+ spin_unlock_irqrestore(&mktme_lock, flags);
+ return ret;
+}
+
static int __init init_mktme(void)
{
- int ret;
+ int ret, cpuhp;
/* Verify keys are present */
if (mktme_nr_keyids() < 1)
@@ -500,10 +537,18 @@ static int __init init_mktme(void)
if (!encrypt_count)
goto free_targets;
+ cpuhp = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "keys/mktme_keys:online",
+ NULL, mktme_cpu_teardown);
+ if (cpuhp < 0)
+ goto free_encrypt;
+
ret = register_key_type(&key_type_mktme);
if (!ret)
return ret; /* SUCCESS */
+ cpuhp_remove_state_nocalls(cpuhp);
+free_encrypt:
kvfree(encrypt_count);
free_targets:
free_cpumask_var(mktme_leadcpus);
--
2.21.0
From: Alison Schofield <[email protected]>
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/mktme/index.rst | 1 +
Documentation/x86/mktme/mktme_keys.rst | 61 ++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
create mode 100644 Documentation/x86/mktme/mktme_keys.rst
diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 0f021cc4a2db..8cf2b7d62091 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -8,3 +8,4 @@ Multi-Key Total Memory Encryption (MKTME)
mktme_overview
mktme_mitigations
mktme_configuration
+ mktme_keys
diff --git a/Documentation/x86/mktme/mktme_keys.rst b/Documentation/x86/mktme/mktme_keys.rst
new file mode 100644
index 000000000000..5d9125eb7950
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_keys.rst
@@ -0,0 +1,61 @@
+MKTME Key Service API
+=====================
+MKTME is a new key service type added to the Linux Kernel Key Service.
+
+The MKTME Key Service type is available when CONFIG_X86_INTEL_MKTME is
+enabled on Intel platforms that support the MKTME feature.
+
+The MKTME Key Service type manages the allocation of hardware encryption
+keys. Users can request an MKTME type key and then use that key to
+encrypt memory with the encrypt_mprotect() system call.
+
+Usage
+-----
+ When using the Kernel Key Service to request an *mktme* key,
+ specify the *payload* as follows:
+
+ type=
+ *cpu* User requests a CPU generated encryption key.
+ The CPU generates and assigns an ephemeral key.
+
+ *no-encrypt*
+ User requests that hardware does not encrypt
+ memory when this key is in use.
+
+ algorithm=
+ When type=cpu the algorithm field must be *aes-xts-128*
+ *aes-xts-128* is the only supported encryption algorithm
+
+ When type=no-encrypt the algorithm field must not be
+ present in the payload.
+
+ERRORS
+------
+ In addition to the errors returned from the Kernel Key Service
+ add_key(2) and keyctl(1) commands, the MKTME Key Service type may
+ return the following errors:
+
+ EINVAL for any payload specification that does not match the
+ MKTME type payload as defined above.
+
+ EACCES for access denied. The MKTME key type uses capabilities
+ to restrict the allocation of keys to privileged users.
+ CAP_SYS_RESOURCE is required, but it will accept the
+ broader capability of CAP_SYS_ADMIN. See capabilities(7).
+
+ ENOKEY if a hardware key cannot be allocated. Additional error
+ messages will describe the hardware programming errors.
+
+EXAMPLES
+--------
+ Add a 'cpu' type key::
+
+ char \*options_CPU = "type=cpu algorithm=aes-xts-128";
+
+ key = add_key("mktme", "name", options_CPU, strlen(options_CPU),
+ KEY_SPEC_THREAD_KEYRING);
+
+ Add a 'no-encrypt' type key::
+
+ key = add_key("mktme", "name", "type=no-encrypt",
+ strlen("type=no-encrypt"), KEY_SPEC_THREAD_KEYRING);
--
2.21.0
For MKTME we use per-KeyID direct mappings. This allows the kernel to
have access to encrypted memory.
sync_direct_mapping() syncs the per-KeyID direct mappings with the
canonical one -- KeyID-0.
The function tracks changes in the canonical mapping:
- creating or removing chunks of the translation tree;
- changes in mapping flags (i.e. protection bits);
- splitting a huge page mapping into a page table;
- replacing a page table with a huge page mapping.
The function needs to be called on every change to the direct mapping:
hotplug, hotremove, changes in permission bits, etc.
The function is a nop until MKTME is enabled.
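The layout the sync relies on can be summarized with a small userspace sketch: the per-KeyID mappings sit back to back above the canonical one, so the alias of a KeyID-0 address is a fixed offset. The constants below are invented for the example, not the kernel's values.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up stand-ins for PAGE_OFFSET and direct_mapping_size. */
#define SKETCH_PAGE_OFFSET	0xffff888000000000ULL
#define SKETCH_DIRECT_SIZE	(64ULL << 40)		/* 64 TiB per KeyID */

/* Virtual address of the same physical page in the KeyID-N direct mapping. */
static uint64_t keyid_alias(uint64_t vaddr, unsigned int keyid)
{
	return vaddr + (uint64_t)keyid * SKETCH_DIRECT_SIZE;
}
```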
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 6 +
arch/x86/mm/init_64.c | 7 +
arch/x86/mm/mktme.c | 439 +++++++++++++++++++++++++++++++++++
arch/x86/mm/pageattr.c | 27 +++
4 files changed, 479 insertions(+)
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 3fc246acc279..d26ada6b65f7 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -62,6 +62,8 @@ static inline void arch_free_page(struct page *page, int order)
free_encrypted_page(page, order);
}
+int sync_direct_mapping(unsigned long start, unsigned long end);
+
#else
#define mktme_keyid_mask() ((phys_addr_t)0)
#define mktme_nr_keyids() 0
@@ -76,6 +78,10 @@ static inline bool mktme_enabled(void)
static inline void mktme_disable(void) {}
+static inline int sync_direct_mapping(unsigned long start, unsigned long end)
+{
+ return 0;
+}
#endif
#endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4c1f93df47a5..6769650ad18d 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -726,6 +726,7 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
{
bool pgd_changed = false;
unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
+ int ret;
paddr_last = paddr_end;
vaddr = (unsigned long)__va(paddr_start);
@@ -762,6 +763,9 @@ __kernel_physical_mapping_init(unsigned long paddr_start,
pgd_changed = true;
}
+ ret = sync_direct_mapping(vaddr_start, vaddr_end);
+ WARN_ON(ret);
+
if (pgd_changed)
sync_global_pgds(vaddr_start, vaddr_end - 1);
@@ -1201,10 +1205,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
static void __meminit
kernel_physical_mapping_remove(unsigned long start, unsigned long end)
{
+ int ret;
start = (unsigned long)__va(start);
end = (unsigned long)__va(end);
remove_pagetable(start, end, true, NULL);
+ ret = sync_direct_mapping(start, end);
+ WARN_ON(ret);
}
void __ref arch_remove_memory(int nid, u64 start, u64 size,
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 1e8d662e5bff..ed13967bb543 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,6 +1,8 @@
#include <linux/mm.h>
#include <linux/highmem.h>
#include <asm/mktme.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
/* Mask to extract KeyID from physical address. */
phys_addr_t __mktme_keyid_mask;
@@ -54,6 +56,8 @@ static bool need_page_mktme(void)
static void init_page_mktme(void)
{
static_branch_enable(&mktme_enabled_key);
+
+ sync_direct_mapping(PAGE_OFFSET, PAGE_OFFSET + direct_mapping_size);
}
struct page_ext_operations page_mktme_ops = {
@@ -148,3 +152,438 @@ void free_encrypted_page(struct page *page, int order)
page++;
}
}
+
+static int sync_direct_mapping_pte(unsigned long keyid,
+ pmd_t *dst_pmd, pmd_t *src_pmd,
+ unsigned long addr, unsigned long end)
+{
+ pte_t *src_pte, *dst_pte;
+ pte_t *new_pte = NULL;
+ bool remove_pte;
+
+ /*
+ * We want to unmap and free the page table if the source is empty and
+ * the range covers whole page table.
+ */
+ remove_pte = !src_pmd && PAGE_ALIGNED(addr) && PAGE_ALIGNED(end);
+
+ /*
+ * PMD page got split into page table.
+ * Clear PMD mapping. Page table will be established instead.
+ */
+ if (pmd_large(*dst_pmd)) {
+ spin_lock(&init_mm.page_table_lock);
+ pmd_clear(dst_pmd);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ /* Allocate a new page table if needed. */
+ if (pmd_none(*dst_pmd)) {
+ new_pte = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!new_pte)
+ return -ENOMEM;
+ dst_pte = new_pte + pte_index(addr + keyid * direct_mapping_size);
+ } else {
+ dst_pte = pte_offset_map(dst_pmd, addr + keyid * direct_mapping_size);
+ }
+ src_pte = src_pmd ? pte_offset_map(src_pmd, addr) : NULL;
+
+ spin_lock(&init_mm.page_table_lock);
+
+ do {
+ pteval_t val;
+
+ if (!src_pte || pte_none(*src_pte)) {
+ set_pte(dst_pte, __pte(0));
+ goto next;
+ }
+
+ if (!pte_none(*dst_pte)) {
+ /*
+ * Sanity check: PFNs must match between source
+ * and destination even if the rest doesn't.
+ */
+ BUG_ON(pte_pfn(*dst_pte) != pte_pfn(*src_pte));
+ }
+
+ /* Copy entry, but set KeyID. */
+ val = pte_val(*src_pte) | keyid << mktme_keyid_shift();
+ val &= __supported_pte_mask;
+ set_pte(dst_pte, __pte(val));
+next:
+ addr += PAGE_SIZE;
+ dst_pte++;
+ if (src_pte)
+ src_pte++;
+ } while (addr != end);
+
+ if (new_pte)
+ pmd_populate_kernel(&init_mm, dst_pmd, new_pte);
+
+ if (remove_pte) {
+ __free_page(pmd_page(*dst_pmd));
+ pmd_clear(dst_pmd);
+ }
+
+ spin_unlock(&init_mm.page_table_lock);
+
+ return 0;
+}
+
+static int sync_direct_mapping_pmd(unsigned long keyid,
+ pud_t *dst_pud, pud_t *src_pud,
+ unsigned long addr, unsigned long end)
+{
+ pmd_t *src_pmd, *dst_pmd;
+ pmd_t *new_pmd = NULL;
+ bool remove_pmd = false;
+ unsigned long next;
+ int ret = 0;
+
+ /*
+ * We want to unmap and free the page table if the source is empty and
+ * the range covers whole page table.
+ */
+ remove_pmd = !src_pud && IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(end, PUD_SIZE);
+
+ /*
+ * PUD page got split into page table.
+ * Clear PUD mapping. Page table will be established instead.
+ */
+ if (pud_large(*dst_pud)) {
+ spin_lock(&init_mm.page_table_lock);
+ pud_clear(dst_pud);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ /* Allocate a new page table if needed. */
+ if (pud_none(*dst_pud)) {
+ new_pmd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!new_pmd)
+ return -ENOMEM;
+ dst_pmd = new_pmd + pmd_index(addr + keyid * direct_mapping_size);
+ } else {
+ dst_pmd = pmd_offset(dst_pud, addr + keyid * direct_mapping_size);
+ }
+ src_pmd = src_pud ? pmd_offset(src_pud, addr) : NULL;
+
+ do {
+ pmd_t *__src_pmd = src_pmd;
+
+ next = pmd_addr_end(addr, end);
+ if (!__src_pmd || pmd_none(*__src_pmd)) {
+ if (pmd_none(*dst_pmd))
+ goto next;
+ if (pmd_large(*dst_pmd)) {
+ spin_lock(&init_mm.page_table_lock);
+ set_pmd(dst_pmd, __pmd(0));
+ spin_unlock(&init_mm.page_table_lock);
+ goto next;
+ }
+ __src_pmd = NULL;
+ }
+
+ if (__src_pmd && pmd_large(*__src_pmd)) {
+ pmdval_t val;
+
+ if (pmd_large(*dst_pmd)) {
+ /*
+ * Sanity check: PFNs must match between source
+ * and destination even if the rest doesn't.
+ */
+ BUG_ON(pmd_pfn(*dst_pmd) != pmd_pfn(*__src_pmd));
+ } else if (!pmd_none(*dst_pmd)) {
+ /*
+ * Page table is replaced with a PMD page.
+ * Free and unmap the page table.
+ */
+ __free_page(pmd_page(*dst_pmd));
+ spin_lock(&init_mm.page_table_lock);
+ pmd_clear(dst_pmd);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ /* Copy entry, but set KeyID. */
+ val = pmd_val(*__src_pmd) | keyid << mktme_keyid_shift();
+ val &= __supported_pte_mask;
+ spin_lock(&init_mm.page_table_lock);
+ set_pmd(dst_pmd, __pmd(val));
+ spin_unlock(&init_mm.page_table_lock);
+ goto next;
+ }
+
+ ret = sync_direct_mapping_pte(keyid, dst_pmd, __src_pmd,
+ addr, next);
+next:
+ addr = next;
+ dst_pmd++;
+ if (src_pmd)
+ src_pmd++;
+ } while (addr != end && !ret);
+
+ if (new_pmd) {
+ spin_lock(&init_mm.page_table_lock);
+ pud_populate(&init_mm, dst_pud, new_pmd);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ if (remove_pmd) {
+ spin_lock(&init_mm.page_table_lock);
+ __free_page(pud_page(*dst_pud));
+ pud_clear(dst_pud);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ return ret;
+}
+
+static int sync_direct_mapping_pud(unsigned long keyid,
+ p4d_t *dst_p4d, p4d_t *src_p4d,
+ unsigned long addr, unsigned long end)
+{
+ pud_t *src_pud, *dst_pud;
+ pud_t *new_pud = NULL;
+ bool remove_pud = false;
+ unsigned long next;
+ int ret = 0;
+
+ /*
+ * We want to unmap and free the page table if the source is empty and
+ * the range covers whole page table.
+ */
+ remove_pud = !src_p4d && IS_ALIGNED(addr, P4D_SIZE) && IS_ALIGNED(end, P4D_SIZE);
+
+ /*
+ * P4D page got split into page table.
+ * Clear P4D mapping. Page table will be established instead.
+ */
+ if (p4d_large(*dst_p4d)) {
+ spin_lock(&init_mm.page_table_lock);
+ p4d_clear(dst_p4d);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ /* Allocate a new page table if needed. */
+ if (p4d_none(*dst_p4d)) {
+ new_pud = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!new_pud)
+ return -ENOMEM;
+ dst_pud = new_pud + pud_index(addr + keyid * direct_mapping_size);
+ } else {
+ dst_pud = pud_offset(dst_p4d, addr + keyid * direct_mapping_size);
+ }
+ src_pud = src_p4d ? pud_offset(src_p4d, addr) : NULL;
+
+ do {
+ pud_t *__src_pud = src_pud;
+
+ next = pud_addr_end(addr, end);
+ if (!__src_pud || pud_none(*__src_pud)) {
+ if (pud_none(*dst_pud))
+ goto next;
+ if (pud_large(*dst_pud)) {
+ spin_lock(&init_mm.page_table_lock);
+ set_pud(dst_pud, __pud(0));
+ spin_unlock(&init_mm.page_table_lock);
+ goto next;
+ }
+ __src_pud = NULL;
+ }
+
+ if (__src_pud && pud_large(*__src_pud)) {
+ pudval_t val;
+
+ if (pud_large(*dst_pud)) {
+ /*
+ * Sanity check: PFNs must match between source
+ * and destination even if the rest doesn't.
+ */
+ BUG_ON(pud_pfn(*dst_pud) != pud_pfn(*__src_pud));
+ } else if (!pud_none(*dst_pud)) {
+ /*
+ * Page table is replaced with a pud page.
+ * Free and unmap the page table.
+ */
+ __free_page(pud_page(*dst_pud));
+ spin_lock(&init_mm.page_table_lock);
+ pud_clear(dst_pud);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ /* Copy entry, but set KeyID. */
+ val = pud_val(*__src_pud) | keyid << mktme_keyid_shift();
+ val &= __supported_pte_mask;
+ spin_lock(&init_mm.page_table_lock);
+ set_pud(dst_pud, __pud(val));
+ spin_unlock(&init_mm.page_table_lock);
+ goto next;
+ }
+
+ ret = sync_direct_mapping_pmd(keyid, dst_pud, __src_pud,
+ addr, next);
+next:
+ addr = next;
+ dst_pud++;
+ if (src_pud)
+ src_pud++;
+ } while (addr != end && !ret);
+
+ if (new_pud) {
+ spin_lock(&init_mm.page_table_lock);
+ p4d_populate(&init_mm, dst_p4d, new_pud);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ if (remove_pud) {
+ spin_lock(&init_mm.page_table_lock);
+ __free_page(p4d_page(*dst_p4d));
+ p4d_clear(dst_p4d);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ return ret;
+}
+
+static int sync_direct_mapping_p4d(unsigned long keyid,
+ pgd_t *dst_pgd, pgd_t *src_pgd,
+ unsigned long addr, unsigned long end)
+{
+ p4d_t *src_p4d, *dst_p4d;
+ p4d_t *new_p4d_1 = NULL, *new_p4d_2 = NULL;
+ bool remove_p4d = false;
+ unsigned long next;
+ int ret = 0;
+
+ /*
+ * We want to unmap and free the page table if the source is empty and
+ * the range covers whole page table.
+ */
+ remove_p4d = !src_pgd && IS_ALIGNED(addr, PGDIR_SIZE) && IS_ALIGNED(end, PGDIR_SIZE);
+
+ /* Allocate a new page table if needed. */
+ if (pgd_none(*dst_pgd)) {
+ new_p4d_1 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!new_p4d_1)
+ return -ENOMEM;
+ dst_p4d = new_p4d_1 + p4d_index(addr + keyid * direct_mapping_size);
+ } else {
+ dst_p4d = p4d_offset(dst_pgd, addr + keyid * direct_mapping_size);
+ }
+ src_p4d = src_pgd ? p4d_offset(src_pgd, addr) : NULL;
+
+ do {
+ p4d_t *__src_p4d = src_p4d;
+
+ next = p4d_addr_end(addr, end);
+ if (!__src_p4d || p4d_none(*__src_p4d)) {
+ if (p4d_none(*dst_p4d))
+ goto next;
+ __src_p4d = NULL;
+ }
+
+ ret = sync_direct_mapping_pud(keyid, dst_p4d, __src_p4d,
+ addr, next);
+next:
+ addr = next;
+ dst_p4d++;
+
+ /*
+ * Direct mappings are 1TiB-aligned. With 5-level paging it
+ * means that on PGD level there can be misalignment between
+ * source and destination.
+ *
+ * Allocate the new page table if dst_p4d crosses page table
+ * boundary.
+ */
+ if (!((unsigned long)dst_p4d & ~PAGE_MASK) && addr != end) {
+ if (pgd_none(dst_pgd[1])) {
+ new_p4d_2 = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!new_p4d_2)
+ ret = -ENOMEM;
+ dst_p4d = new_p4d_2;
+ } else {
+ dst_p4d = p4d_offset(dst_pgd + 1, 0);
+ }
+ }
+ if (src_p4d)
+ src_p4d++;
+ } while (addr != end && !ret);
+
+ if (new_p4d_1 || new_p4d_2) {
+ spin_lock(&init_mm.page_table_lock);
+ if (new_p4d_1)
+ pgd_populate(&init_mm, dst_pgd, new_p4d_1);
+ if (new_p4d_2)
+ pgd_populate(&init_mm, dst_pgd + 1, new_p4d_2);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ if (remove_p4d) {
+ spin_lock(&init_mm.page_table_lock);
+ __free_page(pgd_page(*dst_pgd));
+ pgd_clear(dst_pgd);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ return ret;
+}
+
+static int sync_direct_mapping_keyid(unsigned long keyid,
+ unsigned long addr, unsigned long end)
+{
+ pgd_t *src_pgd, *dst_pgd;
+ unsigned long next;
+ int ret = 0;
+
+ dst_pgd = pgd_offset_k(addr + keyid * direct_mapping_size);
+ src_pgd = pgd_offset_k(addr);
+
+ do {
+ pgd_t *__src_pgd = src_pgd;
+
+ next = pgd_addr_end(addr, end);
+ if (pgd_none(*__src_pgd)) {
+ if (pgd_none(*dst_pgd))
+ continue;
+ __src_pgd = NULL;
+ }
+
+ ret = sync_direct_mapping_p4d(keyid, dst_pgd, __src_pgd,
+ addr, next);
+ } while (dst_pgd++, src_pgd++, addr = next, addr != end && !ret);
+
+ return ret;
+}
+
+/*
+ * For MKTME we maintain per-KeyID direct mappings. This allows the kernel
+ * to have access to encrypted memory.
+ *
+ * sync_direct_mapping() syncs the per-KeyID direct mappings with the
+ * canonical one -- KeyID-0.
+ *
+ * The function tracks changes in the canonical mapping:
+ * - creating or removing chunks of the translation tree;
+ * - changes in mapping flags (i.e. protection bits);
+ * - splitting a huge page mapping into a page table;
+ * - replacing a page table with a huge page mapping;
+ *
+ * The function needs to be called on every change to the direct mapping:
+ * hotplug, hotremove, changes in permission bits, etc.
+ *
+ * The function is a nop until MKTME is enabled.
+ */
+int sync_direct_mapping(unsigned long start, unsigned long end)
+{
+ int i, ret = 0;
+
+ if (!mktme_enabled())
+ return 0;
+
+ for (i = 1; !ret && i <= mktme_nr_keyids(); i++)
+ ret = sync_direct_mapping_keyid(i, start, end);
+
+ flush_tlb_all();
+
+ return ret;
+}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6a9a77a403c9..f4e3205d2cdd 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -347,6 +347,33 @@ static void cpa_flush(struct cpa_data *data, int cache)
BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
+ if (mktme_enabled()) {
+ unsigned long start, end;
+
+ start = PAGE_OFFSET + (cpa->pfn << PAGE_SHIFT);
+ end = start + cpa->numpages * PAGE_SIZE;
+
+ /* Round to cover huge page possibly split by the change */
+ start = round_down(start, direct_gbpages ? PUD_SIZE : PMD_SIZE);
+ end = round_up(end, direct_gbpages ? PUD_SIZE : PMD_SIZE);
+
+ /* Sync all direct mapping for an array */
+ if (cpa->flags & CPA_ARRAY) {
+ start = PAGE_OFFSET;
+ end = PAGE_OFFSET + direct_mapping_size;
+ }
+
+ /*
+ * Sync per-KeyID direct mappings with the canonical one
+ * (KeyID-0).
+ *
+ * sync_direct_mapping() does full TLB flush.
+ */
+ sync_direct_mapping(start, end);
+ if (!cache)
+ return;
+ }
+
if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
cpa_flush_all(cache);
return;
--
2.21.0
From: Alison Schofield <[email protected]>
Destroy is a method invoked by the kernel key service when a
userspace key is being removed (invalidate, revoke, timeout).
During destroy, MKTME returns the hardware KeyID to the pool
of available KeyIDs.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index beca852db01a..10fcdbf5a08f 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -50,6 +50,23 @@ int mktme_reserve_keyid(struct key *key)
return 0;
}
+static void mktme_release_keyid(int keyid)
+{
+ mktme_map[keyid].state = KEYID_AVAILABLE;
+ mktme_available_keyids++;
+}
+
+int mktme_keyid_from_key(struct key *key)
+{
+ int i;
+
+ for (i = 1; i <= mktme_nr_keyids(); i++) {
+ if (mktme_map[i].key == key)
+ return i;
+ }
+ return 0;
+}
+
enum mktme_opt_id {
OPT_ERROR,
OPT_TYPE,
@@ -62,6 +79,17 @@ static const match_table_t mktme_token = {
{OPT_ERROR, NULL}
};
+/* Key Service Method called when a Userspace Key is garbage collected. */
+static void mktme_destroy_key(struct key *key)
+{
+ int keyid = mktme_keyid_from_key(key);
+ unsigned long flags;
+
+ spin_lock_irqsave(&mktme_lock, flags);
+ mktme_release_keyid(keyid);
+ spin_unlock_irqrestore(&mktme_lock, flags);
+}
+
/* Key Service Method to create a new key. Payload is preparsed. */
int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
{
@@ -198,6 +226,7 @@ struct key_type key_type_mktme = {
.free_preparse = mktme_free_preparsed_payload,
.instantiate = mktme_instantiate_key,
.describe = user_describe,
+ .destroy = mktme_destroy_key,
};
static int __init init_mktme(void)
--
2.21.0
From: Alison Schofield <[email protected]>
Memory encryption is only supported for mappings that are ANONYMOUS.
Test the VMAs in an encrypt_mprotect() request to make sure they all
meet that requirement before encrypting any.
The encrypt_mprotect() syscall will return -EINVAL and will not
encrypt any VMAs if this check fails.
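The all-or-nothing check can be illustrated with a toy userspace model. struct toy_vma and supports_encryption() are invented for the example; the patch below walks the real vm_area_struct list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for vm_area_struct: a sorted singly-linked list of ranges. */
struct toy_vma {
	unsigned long vm_start, vm_end;
	bool anonymous;
	struct toy_vma *vm_next;
};

/*
 * Every VMA overlapping [vma->vm_start, end) must be anonymous before
 * any of them is encrypted; one failure rejects the whole request.
 */
static bool supports_encryption(struct toy_vma *vma, unsigned long end)
{
	for (; vma && vma->vm_start < end; vma = vma->vm_next)
		if (!vma->anonymous)
			return false;
	return true;
}
```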
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
mm/mprotect.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 518d75582e7b..4b079e1b2d6f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -347,6 +347,24 @@ static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
return walk_page_range(start, end, &prot_none_walk);
}
+/*
+ * Encrypted mprotect is only supported on anonymous mappings.
+ * If this test fails on any single VMA, the entire mprotect
+ * request fails.
+ */
+static bool mem_supports_encryption(struct vm_area_struct *vma, unsigned long end)
+{
+ struct vm_area_struct *test_vma = vma;
+
+ do {
+ if (!vma_is_anonymous(test_vma))
+ return false;
+
+ test_vma = test_vma->vm_next;
+ } while (test_vma && test_vma->vm_start < end);
+ return true;
+}
+
int
mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
unsigned long start, unsigned long end, unsigned long newflags,
@@ -533,6 +551,12 @@ static int do_mprotect_ext(unsigned long start, size_t len,
goto out;
}
}
+
+ if (keyid > 0 && !mem_supports_encryption(vma, end)) {
+ error = -EINVAL;
+ goto out;
+ }
+
if (start > vma->vm_start)
prev = vma;
--
2.21.0
From: Alison Schofield <[email protected]>
Describe the security benefits of Multi-Key Total Memory
Encryption (MKTME) over Total Memory Encryption (TME) alone.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/mktme/index.rst | 1 +
Documentation/x86/mktme/mktme_mitigations.rst | 151 ++++++++++++++++++
2 files changed, 152 insertions(+)
create mode 100644 Documentation/x86/mktme/mktme_mitigations.rst
diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 1614b52dd3e9..a3a29577b013 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -6,3 +6,4 @@ Multi-Key Total Memory Encryption (MKTME)
.. toctree::
mktme_overview
+ mktme_mitigations
diff --git a/Documentation/x86/mktme/mktme_mitigations.rst b/Documentation/x86/mktme/mktme_mitigations.rst
new file mode 100644
index 000000000000..c593784851fb
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_mitigations.rst
@@ -0,0 +1,151 @@
+MKTME-Provided Mitigations
+==========================
+:Author: Dave Hansen <[email protected]>
+
+MKTME adds a few mitigations against attacks that are not
+mitigated when using TME alone. The first set are mitigations
+against software attacks that are familiar today:
+
+ * Kernel Mapping Attacks: information disclosures that leverage
+ the kernel direct map are mitigated against disclosing user
+ data.
+ * Freed Data Leak Attacks: removing an encryption key from the
+ hardware mitigates future user information disclosure.
+
+The next set are attacks that depend on specialized hardware,
+such as an “evil DIMM” or a DDR interposer:
+
+ * Cross-Domain Replay Attack: data is captured from one domain
+   (guest) and replayed to another at a later time.
+ * Cross-Domain Capture and Delayed Compare Attack: data is
+   captured and later analyzed to discover secrets.
+ * Key Wear-out Attack: data is captured and analyzed in order
+   to weaken the AES encryption itself.
+
+More details on these attacks are below.
+
+Kernel Mapping Attacks
+----------------------
+Information disclosure vulnerabilities leverage the kernel direct
+map because many vulnerabilities involve manipulation of kernel
+data structures (examples: CVE-2017-7277, CVE-2017-9605). We
+normally think of these bugs as leaking valuable *kernel* data,
+but they can leak application data when application pages are
+recycled for kernel use.
+
+With this MKTME implementation, there is a direct map created for
+each MKTME KeyID which is used whenever the kernel needs to
+access plaintext. But, all kernel data structures are accessed
+via the direct map for KeyID-0. Thus, memory reads which are not
+coordinated with the KeyID get garbage (for example, accessing
+KeyID-4 data with the KeyID-0 mapping).
+
+This means that if sensitive data encrypted using MKTME is leaked
+via the KeyID-0 direct map, ciphertext decrypted with the wrong
+key will be disclosed. To disclose plaintext, an attacker must
+“pivot” to the correct direct mapping, which is non-trivial
+because there are no kernel data structures in the KeyID!=0
+direct mapping.
+
+Freed Data Leak Attack
+----------------------
+The kernel has a history of bugs around uninitialized data.
+Usually, we think of these bugs as leaking sensitive kernel data,
+but they can also be used to leak application secrets.
+
+MKTME can help mitigate the case where application secrets are
+leaked:
+
+ * App (or VM) places a secret in a page
+ * App exits or frees memory to kernel allocator
+ * Page added to allocator free list
+ * Attacker reallocates page to a purpose where it can read the page
+
+Now, imagine MKTME was in use on the memory being leaked. The
+data can only be leaked as long as the key is programmed in the
+hardware. If the key is de-programmed, like after all pages are
+freed after a guest is shut down, any future reads will just see
+ciphertext.
+
+Basically, the key is a convenient choke-point: you can be more
+confident that data encrypted with it is inaccessible once the
+key is removed.
+
+Cross-Domain Replay Attack
+--------------------------
+MKTME mitigates cross-domain replay attacks where an attacker
+replaces an encrypted block owned by one domain with a block
+owned by another domain. MKTME does not prevent this replacement
+from occurring, but it does mitigate plaintext from being
+disclosed if the domains use different keys.
+
+With TME, the attack could be executed by:
+ * A victim places secret in memory, at a given physical address.
+ Note: AES-XTS is what restricts the attack to being performed
+ at a single physical address instead of across different
+ physical addresses
+ * Attacker captures the victim secret’s ciphertext
+ * Later on, after the victim frees the physical address, attacker
+   gains ownership
+ * Attacker puts the ciphertext at the address and gets the secret
+   plaintext
+
+But, due to the presumably different keys used by the attacker
+and the victim, the attacker cannot successfully decrypt old
+ciphertext.
+
+Cross-Domain Capture and Delayed Compare Attack
+-----------------------------------------------
+This is also referred to as a kind of dictionary attack.
+
+Similarly, MKTME protects against cross-domain capture-and-compare
+attacks. Consider the following scenario:
+ * A victim places a secret in memory, at a known physical address
+ * Attacker captures victim’s ciphertext
+ * Attacker gains control of the target physical address, perhaps
+ after the victim’s VM is shut down or its memory reclaimed.
+ * Attacker computes and writes many possible plaintexts until new
+ ciphertext matches content captured previously.
+
+Secrets which have low (plaintext) entropy are more vulnerable to
+this attack because they reduce the number of possible plaintexts
+an attacker has to compute and write.
+
+The attack will not work if the attacker and victim use different
+keys.
+
+Key Wear-out Attack
+-------------------
+Repeated use of an encryption key might be used by an attacker to
+infer information about the key or the plaintext, weakening the
+encryption. The higher the bandwidth of the encryption engine,
+the more vulnerable the key is to wear-out. The MKTME memory
+encryption hardware works at the speed of the memory bus, which
+has high bandwidth.
+
+Such a weakness has been demonstrated[1] on a theoretical cipher
+with similar properties as AES-XTS.
+
+An attack would take the following steps:
+ * Victim system is using TME with AES-XTS-128
+ * Attacker repeatedly captures ciphertext/plaintext pairs (can
+   be performed with an online hardware attack like an interposer).
+ * Attacker compels repeated use of the key under attack for a
+ sustained time period without a system reboot[2].
+ * Attacker discovers a ciphertext collision (two plaintexts
+   translating to the same ciphertext)
+ * Attacker can induce controlled modifications to the targeted
+ plaintext by modifying the colliding ciphertext
+
+MKTME mitigates key wear-out in two ways:
+ * Keys can be rotated periodically to mitigate wear-out. Since
+ TME keys are generated at boot, rotation of TME keys requires a
+ reboot. In contrast, MKTME allows rotation while the system is
+ booted. An application could implement a policy to rotate keys
+ at a frequency which is not feasible to attack.
+ * In the case that MKTME is used to encrypt two guests’ memory
+ with two different keys, an attack on one guest’s key would not
+ weaken the key used in the second guest.
+
+--
+1. http://web.cs.ucdavis.edu/~rogaway/papers/offsets.pdf
+2. This sustained time required for an attack could vary from days
+ to years depending on the attacker’s goals.
--
2.21.0
From: Alison Schofield <[email protected]>
Intel platforms supporting MKTME need the ability to evaluate
the memory topology before allowing new memory to go online.
That evaluation determines whether the kernel can program the
memory controller. Every memory controller needs to have an
online CPU capable of programming its MKTME keys.
The kernel uses the ACPI HMAT at boot time to determine a safe
MKTME topology, but at run time there is no update to the HMAT.
Run time support will come in the future with platform and
kernel support for the _HMA method.
Meanwhile, be safe, and do not allow any MEM_GOING_ONLINE events
while MKTME is enabled.
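The refusal amounts to a one-line notifier. A userspace model, with toy constants standing in for the linux/notifier.h and linux/memory.h values:

```c
#include <assert.h>

/* Toy stand-ins for NOTIFY_OK/NOTIFY_BAD and the memory hotplug actions. */
#define TOY_NOTIFY_OK		0x0001
#define TOY_NOTIFY_BAD		0x8002
#define TOY_MEM_GOING_ONLINE	0x4
#define TOY_MEM_ONLINE		0x2

/*
 * While there is no runtime HMAT (_HMA) support, every MEM_GOING_ONLINE
 * event is refused; all other memory events pass through.
 */
static int toy_memory_callback(unsigned long action)
{
	return action == TOY_MEM_GOING_ONLINE ? TOY_NOTIFY_BAD : TOY_NOTIFY_OK;
}
```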
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index b042df73899d..f804d780fc91 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -8,6 +8,7 @@
#include <linux/init.h>
#include <linux/key.h>
#include <linux/key-type.h>
+#include <linux/memory.h>
#include <linux/mm.h>
#include <linux/parser.h>
#include <linux/percpu-refcount.h>
@@ -497,6 +498,26 @@ static int mktme_cpu_teardown(unsigned int cpu)
return ret;
}
+static int mktme_memory_callback(struct notifier_block *nb,
+ unsigned long action, void *arg)
+{
+ /*
+ * Do not allow the hot add of memory until run time
+ * support of the ACPI HMAT is available via an _HMA
+ * method. Without it, the new memory cannot be
+ * evaluated to determine an MKTME safe topology.
+ */
+ if (action == MEM_GOING_ONLINE)
+ return NOTIFY_BAD;
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block mktme_memory_nb = {
+ .notifier_call = mktme_memory_callback,
+ .priority = 99, /* priority ? */
+};
+
static int __init init_mktme(void)
{
int ret, cpuhp;
@@ -543,10 +564,15 @@ static int __init init_mktme(void)
if (cpuhp < 0)
goto free_encrypt;
+ if (register_memory_notifier(&mktme_memory_nb))
+ goto remove_cpuhp;
+
ret = register_key_type(&key_type_mktme);
if (!ret)
return ret; /* SUCCESS */
+ unregister_memory_notifier(&mktme_memory_nb);
+remove_cpuhp:
cpuhp_remove_state_nocalls(cpuhp);
free_encrypt:
kvfree(encrypt_count);
--
2.21.0
From: Alison Schofield <[email protected]>
The MKTME key service needs to keep usage counts on the encryption
keys in order to know when it is safe to free a key for reuse.
A percpu_ref fits well here because the key service takes the
initial reference and typically holds that reference while the
intermediary references are get/put. The intermediaries in this
case are encrypted VMAs.
Align percpu_ref_init and percpu_ref_kill with the key service
instantiate and destroy methods, respectively.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 39 +++++++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 3c641f3ee794..18cb57be5193 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -8,6 +8,7 @@
#include <linux/key-type.h>
#include <linux/mm.h>
#include <linux/parser.h>
+#include <linux/percpu-refcount.h>
#include <linux/string.h>
#include <asm/intel_pconfig.h>
#include <keys/mktme-type.h>
@@ -71,6 +72,26 @@ int mktme_keyid_from_key(struct key *key)
return 0;
}
+struct percpu_ref *encrypt_count;
+void mktme_percpu_ref_release(struct percpu_ref *ref)
+{
+ unsigned long flags;
+ int keyid;
+
+ for (keyid = 1; keyid <= mktme_nr_keyids(); keyid++) {
+ if (&encrypt_count[keyid] == ref)
+ break;
+ }
+ if (&encrypt_count[keyid] != ref) {
+ pr_debug("%s: invalid ref counter\n", __func__);
+ return;
+ }
+ percpu_ref_exit(ref);
+ spin_lock_irqsave(&mktme_lock, flags);
+ mktme_release_keyid(keyid);
+ spin_unlock_irqrestore(&mktme_lock, flags);
+}
+
enum mktme_opt_id {
OPT_ERROR,
OPT_TYPE,
@@ -199,8 +220,10 @@ static void mktme_destroy_key(struct key *key)
unsigned long flags;
spin_lock_irqsave(&mktme_lock, flags);
- mktme_release_keyid(keyid);
+ mktme_map[keyid].key = NULL;
+ mktme_map[keyid].state = KEYID_REF_KILLED;
spin_unlock_irqrestore(&mktme_lock, flags);
+ percpu_ref_kill(&encrypt_count[keyid]);
}
/* Key Service Method to create a new key. Payload is preparsed. */
@@ -216,9 +239,15 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
if (!keyid)
return -ENOKEY;
+ if (percpu_ref_init(&encrypt_count[keyid], mktme_percpu_ref_release,
+ 0, GFP_KERNEL))
+ goto err_out;
+
if (!mktme_program_keyid(keyid, *payload))
return MKTME_PROG_SUCCESS;
+ percpu_ref_exit(&encrypt_count[keyid]);
+err_out:
spin_lock_irqsave(&mktme_lock, flags);
mktme_release_keyid(keyid);
spin_unlock_irqrestore(&mktme_lock, flags);
@@ -405,10 +434,18 @@ static int __init init_mktme(void)
/* Initialize first programming targets */
mktme_update_pconfig_targets();
+ /* Reference counters to protect in use KeyIDs */
+ encrypt_count = kvcalloc(mktme_nr_keyids() + 1, sizeof(encrypt_count[0]),
+ GFP_KERNEL);
+ if (!encrypt_count)
+ goto free_targets;
+
ret = register_key_type(&key_type_mktme);
if (!ret)
return ret; /* SUCCESS */
+ kvfree(encrypt_count);
+free_targets:
free_cpumask_var(mktme_leadcpus);
bitmap_free(mktme_target_map);
free_cache:
--
2.21.0
From: Alison Schofield <[email protected]>
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/mktme/index.rst | 1 +
Documentation/x86/mktme/mktme_configuration.rst | 6 ++++++
2 files changed, 7 insertions(+)
create mode 100644 Documentation/x86/mktme/mktme_configuration.rst
diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index a3a29577b013..0f021cc4a2db 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -7,3 +7,4 @@ Multi-Key Total Memory Encryption (MKTME)
mktme_overview
mktme_mitigations
+ mktme_configuration
diff --git a/Documentation/x86/mktme/mktme_configuration.rst b/Documentation/x86/mktme/mktme_configuration.rst
new file mode 100644
index 000000000000..7d56596360cb
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_configuration.rst
@@ -0,0 +1,6 @@
+MKTME Configuration
+===================
+
+CONFIG_X86_INTEL_MKTME
+ MKTME is enabled by selecting CONFIG_X86_INTEL_MKTME on Intel
+ platforms supporting the MKTME feature.
--
2.21.0
From: Alison Schofield <[email protected]>
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/mktme/index.rst | 1 +
Documentation/x86/mktme/mktme_encrypt.rst | 56 +++++++++++++++++++++++
2 files changed, 57 insertions(+)
create mode 100644 Documentation/x86/mktme/mktme_encrypt.rst
diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index 8cf2b7d62091..ca3c76adc596 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -9,3 +9,4 @@ Multi-Key Total Memory Encryption (MKTME)
mktme_mitigations
mktme_configuration
mktme_keys
+ mktme_encrypt
diff --git a/Documentation/x86/mktme/mktme_encrypt.rst b/Documentation/x86/mktme/mktme_encrypt.rst
new file mode 100644
index 000000000000..6dc8ae11f1cb
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_encrypt.rst
@@ -0,0 +1,56 @@
+MKTME API: system call encrypt_mprotect()
+=========================================
+
+Synopsis
+--------
+int encrypt_mprotect(void \*addr, size_t len, int prot, key_serial_t serial);
+
+Where *key_serial_t serial* is the serial number of a key allocated
+using the MKTME Key Service.
+
+Description
+-----------
+ encrypt_mprotect() encrypts the memory pages containing any part
+ of the address range in the interval [*addr*, *addr* + *len* - 1].
+
+ encrypt_mprotect() supports the legacy mprotect() behavior plus
+ the enabling of memory encryption. That means that in addition
+ to encrypting the memory, the protection flags will be updated
+ as requested in the call.
+
+ Both *addr* and *len* must be aligned to a page boundary.
+
+ The caller must have *KEY_NEED_VIEW* permission on the key.
+
+ The memory that is to be protected must be mapped *ANONYMOUS*.
+
+Errors
+------
+ In addition to the errors returned from legacy mprotect(),
+ encrypt_mprotect() will return:
+
+ ENOKEY *serial* parameter does not represent a valid key.
+
+ EINVAL *len* parameter is not page aligned.
+
+ EACCES Caller does not have *KEY_NEED_VIEW* permission on the key.
+
+Example
+-------
+ Allocate an MKTME Key::
+ serial = add_key("mktme", "name", "type=cpu algorithm=aes-xts-128", 30, KEY_SPEC_USER_KEYRING);
+
+ Map ANONYMOUS memory::
+ ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+
+ Protect memory::
+ ret = syscall(SYS_encrypt_mprotect, ptr, size, PROT_READ|PROT_WRITE,
+ serial);
+
+ Use the encrypted memory
+
+ Free memory::
+ ret = munmap(ptr, size);
+
+ Free the key resource::
+ ret = keyctl(KEYCTL_INVALIDATE, serial);
--
2.21.0
From: Alison Schofield <[email protected]>
MKTME, Multi-Key Total Memory Encryption, is a feature on Intel
platforms. The ACPI HMAT table can be used to verify that the
platform topology is safe for the use of MKTME.
The kernel must be capable of programming every memory controller
on the platform. This means that there must be an online CPU in
the same proximity domain as each memory controller.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
drivers/acpi/hmat/hmat.c | 54 ++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 1 +
2 files changed, 55 insertions(+)
diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
index 38e3341f569f..936a403c0694 100644
--- a/drivers/acpi/hmat/hmat.c
+++ b/drivers/acpi/hmat/hmat.c
@@ -677,3 +677,57 @@ bool acpi_hmat_present(void)
acpi_put_table(tbl);
return true;
}
+
+static int mktme_parse_proximity_domains(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_hmat_proximity_domain *mar = (void *)header;
+ struct acpi_hmat_structure *hdr = (void *)header;
+
+ const struct cpumask *tmp_mask;
+
+ if (!hdr || hdr->type != ACPI_HMAT_TYPE_PROXIMITY)
+ return -EINVAL;
+
+ if (mar->header.length != sizeof(*mar)) {
+ pr_warn("MKTME: invalid header length in HMAT\n");
+ return -1;
+ }
+ /*
+ * Require a valid processor proximity domain.
+ * This will catch memory only physical packages with
+ * no processor capable of programming the key table.
+ */
+ if (!(mar->flags & ACPI_HMAT_PROCESSOR_PD_VALID)) {
+ pr_warn("MKTME: no valid processor proximity domain\n");
+ return -1;
+ }
+ /* Require an online CPU in the processor proximity domain. */
+ tmp_mask = cpumask_of_node(pxm_to_node(mar->processor_PD));
+ if (!cpumask_intersects(tmp_mask, cpu_online_mask)) {
+ pr_warn("MKTME: no online CPU in proximity domain\n");
+ return -1;
+ }
+ return 0;
+}
+
+/* Returns true if topology is safe for MKTME key creation */
+bool mktme_hmat_evaluate(void)
+{
+ struct acpi_table_header *tbl;
+ bool ret = true;
+ acpi_status status;
+
+ status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
+ if (ACPI_FAILURE(status))
+ return false;
+
+ if (acpi_table_parse_entries(ACPI_SIG_HMAT,
+ sizeof(struct acpi_table_hmat),
+ ACPI_HMAT_TYPE_PROXIMITY,
+ mktme_parse_proximity_domains, 0) < 0) {
+ ret = false;
+ }
+ acpi_put_table(tbl);
+ return ret;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index d27f4d17dfb3..8854ae942e37 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1337,6 +1337,7 @@ acpi_platform_notify(struct device *dev, enum kobject_action action)
#ifdef CONFIG_X86_INTEL_MKTME
extern bool acpi_hmat_present(void);
+extern bool mktme_hmat_evaluate(void);
#endif /* CONFIG_X86_INTEL_MKTME */
#endif /*_LINUX_ACPI_H*/
--
2.21.0
From: Alison Schofield <[email protected]>
ACPI table parsing functions are useful after init time.
For example, the MKTME (Multi-Key Total Memory Encryption) key
service will evaluate the ACPI HMAT table when the first key
creation request occurs. This will happen after init time.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
drivers/acpi/tables.c | 10 +++++-----
include/linux/acpi.h | 4 ++--
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index b32327759380..9d40af7f07fb 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -33,7 +33,7 @@ static char *mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
static struct acpi_table_desc initial_tables[ACPI_MAX_TABLES] __initdata;
-static int acpi_apic_instance __initdata;
+static int acpi_apic_instance;
enum acpi_subtable_type {
ACPI_SUBTABLE_COMMON,
@@ -49,7 +49,7 @@ struct acpi_subtable_entry {
* Disable table checksum verification for the early stage due to the size
* limitation of the current x86 early mapping implementation.
*/
-static bool acpi_verify_table_checksum __initdata = false;
+static bool acpi_verify_table_checksum = false;
void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
{
@@ -280,7 +280,7 @@ acpi_get_subtable_type(char *id)
* On success returns sum of all matching entries for all proc handlers.
* Otherwise, -ENODEV or -EINVAL is returned.
*/
-static int __init acpi_parse_entries_array(char *id, unsigned long table_size,
+static int acpi_parse_entries_array(char *id, unsigned long table_size,
struct acpi_table_header *table_header,
struct acpi_subtable_proc *proc, int proc_num,
unsigned int max_entries)
@@ -355,7 +355,7 @@ static int __init acpi_parse_entries_array(char *id, unsigned long table_size,
return errs ? -EINVAL : count;
}
-int __init acpi_table_parse_entries_array(char *id,
+int acpi_table_parse_entries_array(char *id,
unsigned long table_size,
struct acpi_subtable_proc *proc, int proc_num,
unsigned int max_entries)
@@ -386,7 +386,7 @@ int __init acpi_table_parse_entries_array(char *id,
return count;
}
-int __init acpi_table_parse_entries(char *id,
+int acpi_table_parse_entries(char *id,
unsigned long table_size,
int entry_id,
acpi_tbl_entry_handler handler,
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9426b9aaed86..fc1e7d4648bf 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -228,11 +228,11 @@ int acpi_numa_init (void);
int acpi_table_init (void);
int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-int __init acpi_table_parse_entries(char *id, unsigned long table_size,
+int acpi_table_parse_entries(char *id, unsigned long table_size,
int entry_id,
acpi_tbl_entry_handler handler,
unsigned int max_entries);
-int __init acpi_table_parse_entries_array(char *id, unsigned long table_size,
+int acpi_table_parse_entries_array(char *id, unsigned long table_size,
struct acpi_subtable_proc *proc, int proc_num,
unsigned int max_entries);
int acpi_table_parse_madt(enum acpi_madt_type id,
--
2.21.0
Per-KeyID direct mappings require changes to how we find the right
virtual address for a page and how we do virt-to-phys address
translations. The page_to_virt() definition overrides the default
macro provided by <linux/mm.h>.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/page.h | 3 +++
arch/x86/include/asm/page_64.h | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 39af59487d5f..aff30554f38e 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -72,6 +72,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
extern bool __virt_addr_valid(unsigned long kaddr);
#define virt_addr_valid(kaddr) __virt_addr_valid((unsigned long) (kaddr))
+#define page_to_virt(x) \
+ (__va(PFN_PHYS(page_to_pfn(x))) + page_keyid(x) * direct_mapping_size)
+
#endif /* __ASSEMBLY__ */
#include <asm-generic/memory_model.h>
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index f57fc3cc2246..a4f394e3471d 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -24,7 +24,7 @@ static inline unsigned long __phys_addr_nodebug(unsigned long x)
/* use the carry flag to determine if x was < __START_KERNEL_map */
x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
- return x;
+ return x & direct_mapping_mask;
}
#ifdef CONFIG_DEBUG_VIRTUAL
--
2.21.0
From: Alison Schofield <[email protected]>
The MKTME architecture requires the KeyID to be placed in PTE bits 51:46.
To create an encrypted VMA, place the KeyID in the upper bits of
vm_page_prot that matches the position of those PTE bits.
When the VMA is assigned a KeyID it is always considered a KeyID
change. The VMA is either going from not encrypted to encrypted,
or from encrypted with any KeyID to encrypted with any other KeyID.
To make the change safely, remove the user pages held by the VMA
and unlink the VMA's anonymous chain.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 4 ++++
arch/x86/mm/mktme.c | 26 ++++++++++++++++++++++++++
include/linux/mm.h | 6 ++++++
3 files changed, 36 insertions(+)
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index d26ada6b65f7..e8f7f80bb013 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -16,6 +16,10 @@ extern int __mktme_nr_keyids;
extern int mktme_nr_keyids(void);
extern unsigned int mktme_algs;
+/* Set the encryption keyid bits in a VMA */
+extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+ unsigned long start, unsigned long end);
+
DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
static inline bool mktme_enabled(void)
{
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index ed13967bb543..05bbf5058ade 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,5 +1,6 @@
#include <linux/mm.h>
#include <linux/highmem.h>
+#include <linux/rmap.h>
#include <asm/mktme.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
@@ -71,6 +72,31 @@ int __vma_keyid(struct vm_area_struct *vma)
return (prot & mktme_keyid_mask()) >> mktme_keyid_shift();
}
+/* Set the encryption keyid bits in a VMA */
+void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
+ unsigned long start, unsigned long end)
+{
+ int oldkeyid = vma_keyid(vma);
+ pgprotval_t newprot;
+
+ /* Unmap pages with old KeyID if there's any. */
+ zap_page_range(vma, start, end - start);
+
+ if (oldkeyid == newkeyid)
+ return;
+
+ newprot = pgprot_val(vma->vm_page_prot);
+ newprot &= ~mktme_keyid_mask();
+ newprot |= (unsigned long)newkeyid << mktme_keyid_shift();
+ vma->vm_page_prot = __pgprot(newprot);
+
+ /*
+ * The VMA doesn't have any inherited pages.
+ * Start anon VMA tree from scratch.
+ */
+ unlink_anon_vmas(vma);
+}
+
/* Prepare page to be used for encryption. Called from page allocator. */
void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
{
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3f9640f388ac..98a6d2bd66a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2905,5 +2905,11 @@ void __init setup_nr_node_ids(void);
static inline void setup_nr_node_ids(void) {}
#endif
+#ifndef CONFIG_X86_INTEL_MKTME
+static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
+ int newkeyid,
+ unsigned long start,
+ unsigned long end) {}
+#endif /* CONFIG_X86_INTEL_MKTME */
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
--
2.21.0
From: Alison Schofield <[email protected]>
Today mprotect is implemented to support legacy mprotect behavior
plus an extension for memory protection keys. Make it more generic
so that it can support additional extensions in the future.
This is done in preparation for adding a new system call for memory
encryption keys. The intent is that the new encrypted mprotect will be
another extension to legacy mprotect.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
mm/mprotect.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 82d7b194a918..4d55725228e3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,6 +35,8 @@
#include "internal.h"
+#define NO_KEY -1
+
static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
int dirty_accountable, int prot_numa)
@@ -453,9 +455,9 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
}
/*
- * pkey==-1 when doing a legacy mprotect()
+ * When pkey==NO_KEY we get legacy mprotect behavior here.
*/
-static int do_mprotect_pkey(unsigned long start, size_t len,
+static int do_mprotect_ext(unsigned long start, size_t len,
unsigned long prot, int pkey)
{
unsigned long nstart, end, tmp, reqprot;
@@ -579,7 +581,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
unsigned long, prot)
{
- return do_mprotect_pkey(start, len, prot, -1);
+ return do_mprotect_ext(start, len, prot, NO_KEY);
}
#ifdef CONFIG_ARCH_HAS_PKEYS
@@ -587,7 +589,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
unsigned long, prot, int, pkey)
{
- return do_mprotect_pkey(start, len, prot, pkey);
+ return do_mprotect_ext(start, len, prot, pkey);
}
SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
--
2.21.0
From: Jacob Pan <[email protected]>
When MKTME is enabled, the KeyID is stored in the high-order bits of
the physical address. For DMA transactions targeting encrypted
physical memory, the KeyID must be included in the IOVA-to-physical
address translation.
This patch appends the page KeyID when setting up the IOMMU PTEs. In
the reverse direction, KeyID bits are cleared in the physical address
lookup. The mapping functions of both DMA ops and IOMMU ops are covered.
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
drivers/iommu/intel-iommu.c | 29 +++++++++++++++++++++++++++--
include/linux/intel-iommu.h | 9 ++++++++-
2 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ac4172c02244..32d22872656b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -867,6 +867,28 @@ static void free_context_table(struct intel_iommu *iommu)
spin_unlock_irqrestore(&iommu->lock, flags);
}
+static inline void set_pte_mktme_keyid(unsigned long phys_pfn,
+ phys_addr_t *pteval)
+{
+ unsigned long keyid;
+
+ if (!pfn_valid(phys_pfn))
+ return;
+
+ keyid = page_keyid(pfn_to_page(phys_pfn));
+
+#ifdef CONFIG_X86_INTEL_MKTME
+ /*
+ * When MKTME is enabled, set keyid in PTE such that DMA
+ * remapping will include keyid in the translation from IOVA
+ * to physical address. This applies to both user and kernel
+ * allocated DMA memory.
+ */
+ *pteval &= ~mktme_keyid_mask();
+ *pteval |= keyid << mktme_keyid_shift();
+#endif
+}
+
static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
unsigned long pfn, int *target_level)
{
@@ -893,7 +915,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
break;
if (!dma_pte_present(pte)) {
- uint64_t pteval;
+ phys_addr_t pteval;
tmp_page = alloc_pgtable_page(domain->nid);
@@ -901,7 +923,8 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
return NULL;
domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE);
- pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+ pteval = (virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+ set_pte_mktme_keyid(virt_to_dma_pfn(tmp_page), &pteval);
if (cmpxchg64(&pte->val, 0ULL, pteval))
/* Someone else set it while we were thinking; use theirs. */
free_pgtable_page(tmp_page);
@@ -2214,6 +2237,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
}
}
+ set_pte_mktme_keyid(phys_pfn, &pteval);
+
/* We don't need lock here, nobody else
* touches the iova range
*/
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index f2ae8a006ff8..8fbb9353d5a6 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -22,6 +22,8 @@
#include <asm/cacheflush.h>
#include <asm/iommu.h>
+#include <asm/page.h>
+
/*
* VT-d hardware uses 4KiB page size regardless of host page size.
@@ -608,7 +610,12 @@ static inline void dma_clear_pte(struct dma_pte *pte)
static inline u64 dma_pte_addr(struct dma_pte *pte)
{
#ifdef CONFIG_64BIT
- return pte->val & VTD_PAGE_MASK;
+ u64 addr = pte->val;
+ addr &= VTD_PAGE_MASK;
+#ifdef CONFIG_X86_INTEL_MKTME
+ addr &= ~mktme_keyid_mask();
+#endif
+ return addr;
#else
/* Must have a full atomic 64-bit read */
return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
--
2.21.0
From: Alison Schofield <[email protected]>
In preparation for programming the key into the hardware, move
the key payload into a cache-aligned structure. This alignment
is a requirement of the MKTME hardware.
Use the slab allocator to have this structure readily available.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 37 +++++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 10fcdbf5a08f..8ac75b1e6188 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -16,6 +16,7 @@
static DEFINE_SPINLOCK(mktme_lock);
static unsigned int mktme_available_keyids; /* Free Hardware KeyIDs */
+static struct kmem_cache *mktme_prog_cache; /* Hardware programming cache */
enum mktme_keyid_state {
KEYID_AVAILABLE, /* Available to be assigned */
@@ -79,6 +80,25 @@ static const match_table_t mktme_token = {
{OPT_ERROR, NULL}
};
+/* Copy the payload to the HW programming structure and program this KeyID */
+static int mktme_program_keyid(int keyid, u32 payload)
+{
+ struct mktme_key_program *kprog = NULL;
+ int ret;
+
+ kprog = kmem_cache_zalloc(mktme_prog_cache, GFP_KERNEL);
+ if (!kprog)
+ return -ENOMEM;
+
+ /* Hardware programming requires a cache-aligned struct */
+ kprog->keyid = keyid;
+ kprog->keyid_ctrl = payload;
+
+ ret = MKTME_PROG_SUCCESS; /* Future programming call */
+ kmem_cache_free(mktme_prog_cache, kprog);
+ return ret;
+}
+
/* Key Service Method called when a Userspace Key is garbage collected. */
static void mktme_destroy_key(struct key *key)
{
@@ -93,6 +113,7 @@ static void mktme_destroy_key(struct key *key)
/* Key Service Method to create a new key. Payload is preparsed. */
int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
{
+ u32 *payload = prep->payload.data[0];
unsigned long flags;
int keyid;
@@ -101,7 +122,14 @@ int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
spin_unlock_irqrestore(&mktme_lock, flags);
if (!keyid)
return -ENOKEY;
- return 0;
+
+ if (!mktme_program_keyid(keyid, *payload))
+ return MKTME_PROG_SUCCESS;
+
+ spin_lock_irqsave(&mktme_lock, flags);
+ mktme_release_keyid(keyid);
+ spin_unlock_irqrestore(&mktme_lock, flags);
+ return -ENOKEY;
}
/* Make sure arguments are correct for the TYPE of key requested */
@@ -245,10 +273,15 @@ static int __init init_mktme(void)
if (!mktme_map)
return -ENOMEM;
+ /* Used to program the hardware key tables */
+ mktme_prog_cache = KMEM_CACHE(mktme_key_program, SLAB_PANIC);
+ if (!mktme_prog_cache)
+ goto free_map;
+
ret = register_key_type(&key_type_mktme);
if (!ret)
return ret; /* SUCCESS */
-
+free_map:
kvfree(mktme_map);
return -ENOMEM;
--
2.21.0
From: Alison Schofield <[email protected]>
The MKTME key service maintains the hardware key tables. These key tables
are package scoped per the MKTME hardware definition. This means that
each physical package on the system needs its key table programmed.
These physical packages are the targets of the new PCONFIG programming
command. So, introduce a PCONFIG targets bitmap as well as a CPU mask
that includes the lead CPUs capable of programming the targets.
The lead CPU mask will be used every time a new key is programmed into
the hardware.
Keep the PCONFIG targets bitmap around for future use during CPU
hotplug events.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 42 ++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 8ac75b1e6188..272bff8591b7 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
/* Documentation/x86/mktme/ */
+#include <linux/cpu.h>
#include <linux/init.h>
#include <linux/key.h>
#include <linux/key-type.h>
@@ -17,6 +18,8 @@
static DEFINE_SPINLOCK(mktme_lock);
static unsigned int mktme_available_keyids; /* Free Hardware KeyIDs */
static struct kmem_cache *mktme_prog_cache; /* Hardware programming cache */
+static unsigned long *mktme_target_map; /* PCONFIG programming target */
+static cpumask_var_t mktme_leadcpus; /* One CPU per PCONFIG target */
enum mktme_keyid_state {
KEYID_AVAILABLE, /* Available to be assigned */
@@ -257,6 +260,33 @@ struct key_type key_type_mktme = {
.destroy = mktme_destroy_key,
};
+static void mktme_update_pconfig_targets(void)
+{
+ int cpu, target_id;
+
+ cpumask_clear(mktme_leadcpus);
+ bitmap_clear(mktme_target_map, 0, topology_max_packages());
+
+ for_each_online_cpu(cpu) {
+ target_id = topology_physical_package_id(cpu);
+ if (!__test_and_set_bit(target_id, mktme_target_map))
+ __cpumask_set_cpu(cpu, mktme_leadcpus);
+ }
+}
+
+static int mktme_alloc_pconfig_targets(void)
+{
+ if (!alloc_cpumask_var(&mktme_leadcpus, GFP_KERNEL))
+ return -ENOMEM;
+
+ mktme_target_map = bitmap_alloc(topology_max_packages(), GFP_KERNEL);
+ if (!mktme_target_map) {
+ free_cpumask_var(mktme_leadcpus);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
static int __init init_mktme(void)
{
int ret;
@@ -278,9 +308,21 @@ static int __init init_mktme(void)
if (!mktme_prog_cache)
goto free_map;
+ /* Hardware programming targets */
+ if (mktme_alloc_pconfig_targets())
+ goto free_cache;
+
+ /* Initialize first programming targets */
+ mktme_update_pconfig_targets();
+
ret = register_key_type(&key_type_mktme);
if (!ret)
return ret; /* SUCCESS */
+
+ free_cpumask_var(mktme_leadcpus);
+ bitmap_free(mktme_target_map);
+free_cache:
+ kmem_cache_destroy(mktme_prog_cache);
free_map:
kvfree(mktme_map);
--
2.21.0
From: Jacob Pan <[email protected]>
Both Intel MKTME and AMD SME need to support DMA address
translation with encryption-related bits. This patch introduces
common helpers to keep the generic DMA code abstracted.
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/Kconfig | 8 +++--
arch/x86/mm/Makefile | 1 +
arch/x86/mm/mem_encrypt.c | 30 ------------------
arch/x86/mm/mem_encrypt_common.c | 52 ++++++++++++++++++++++++++++++++
4 files changed, 59 insertions(+), 32 deletions(-)
create mode 100644 arch/x86/mm/mem_encrypt_common.c
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2eb2867db5fa..f2cc88fe8ada 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1521,12 +1521,16 @@ config X86_CPA_STATISTICS
config ARCH_HAS_MEM_ENCRYPT
def_bool y
+config X86_MEM_ENCRYPT_COMMON
+ select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+ select DYNAMIC_PHYSICAL_MASK
+ def_bool n
+
config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
- select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
- select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+ select X86_MEM_ENCRYPT_COMMON
---help---
Say yes to enable support for the encryption of system memory.
This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 600d18691876..608e57cda784 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -55,3 +55,4 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
obj-$(CONFIG_X86_INTEL_MKTME) += mktme.o
+obj-$(CONFIG_X86_MEM_ENCRYPT_COMMON) += mem_encrypt_common.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index fece30ca8b0c..e94e0a62ba92 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -15,10 +15,6 @@
#include <linux/dma-direct.h>
#include <linux/swiotlb.h>
#include <linux/mem_encrypt.h>
-#include <linux/device.h>
-#include <linux/kernel.h>
-#include <linux/bitops.h>
-#include <linux/dma-mapping.h>
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
@@ -352,32 +348,6 @@ bool sev_active(void)
}
EXPORT_SYMBOL(sev_active);
-/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
-bool force_dma_unencrypted(struct device *dev)
-{
- /*
- * For SEV, all DMA must be to unencrypted addresses.
- */
- if (sev_active())
- return true;
-
- /*
- * For SME, all DMA must be to unencrypted addresses if the
- * device does not support DMA to addresses that include the
- * encryption mask.
- */
- if (sme_active()) {
- u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
- u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
- dev->bus_dma_mask);
-
- if (dma_dev_mask <= dma_enc_mask)
- return true;
- }
-
- return false;
-}
-
/* Architecture __weak replacement functions */
void __init mem_encrypt_free_decrypted_mem(void)
{
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
new file mode 100644
index 000000000000..c11d70151735
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -0,0 +1,52 @@
+#include <linux/mm.h>
+#include <linux/mem_encrypt.h>
+#include <linux/dma-mapping.h>
+#include <asm/mktme.h>
+
+/*
+ * Encryption bits need to be set and cleared for both Intel MKTME and
+ * AMD SME when converting between DMA address and physical address.
+ */
+dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+ unsigned long keyid;
+
+ if (sme_active())
+ return __sme_set(daddr);
+ keyid = page_keyid(pfn_to_page(__phys_to_pfn(paddr)));
+
+ return (daddr & ~mktme_keyid_mask()) | (keyid << mktme_keyid_shift());
+}
+
+phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+ if (sme_active())
+ return __sme_clr(paddr);
+
+ return paddr & ~mktme_keyid_mask();
+}
+
+/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
+bool force_dma_unencrypted(struct device *dev)
+{
+ u64 dma_enc_mask, dma_dev_mask;
+
+ /*
+ * For SEV, all DMA must be to unencrypted addresses.
+ */
+ if (sev_active())
+ return true;
+
+ /*
+ * For SME and MKTME, all DMA must be to unencrypted addresses if the
+ * device does not support DMA to addresses that include the encryption
+ * mask.
+ */
+ if (!sme_active() && !mktme_enabled())
+ return false;
+
+ dma_enc_mask = sme_me_mask | mktme_keyid_mask();
+ dma_dev_mask = min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);
+
+ return (dma_dev_mask & dma_enc_mask) != dma_enc_mask;
+}
--
2.21.0
UEFI memory attribute EFI_MEMORY_CPU_CRYPTO indicates whether the memory
region supports encryption.
The kernel doesn't handle the situation where only part of the system
memory supports encryption.
Disable MKTME if not all system memory supports encryption.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/mm/mktme.c | 35 +++++++++++++++++++++++++++++++++++
drivers/firmware/efi/efi.c | 25 +++++++++++++------------
include/linux/efi.h | 1 +
3 files changed, 49 insertions(+), 12 deletions(-)
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 17366d81c21b..4e00c244478b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,9 +1,11 @@
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/rmap.h>
+#include <linux/efi.h>
#include <asm/mktme.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
+#include <asm/e820/api.h>
/* Mask to extract KeyID from physical address. */
phys_addr_t __mktme_keyid_mask;
@@ -48,9 +50,42 @@ void mktme_disable(void)
static bool need_page_mktme(void)
{
+ int nid;
+
/* Make sure keyid doesn't collide with extended page flags */
BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
+ if (!mktme_nr_keyids())
+ return 0;
+
+ for_each_node_state(nid, N_MEMORY) {
+ const efi_memory_desc_t *md;
+ unsigned long node_start, node_end;
+
+ node_start = node_start_pfn(nid) << PAGE_SHIFT;
+ node_end = node_end_pfn(nid) << PAGE_SHIFT;
+
+ for_each_efi_memory_desc(md) {
+ u64 efi_start = md->phys_addr;
+ u64 efi_end = md->phys_addr + PAGE_SIZE * md->num_pages;
+
+ if (md->attribute & EFI_MEMORY_CPU_CRYPTO)
+ continue;
+ if (efi_start > node_end)
+ continue;
+ if (efi_end < node_start)
+ continue;
+ if (!e820__mapped_any(efi_start, efi_end, E820_TYPE_RAM))
+ continue;
+
+ printk("Memory range %#llx-%#llx: doesn't support encryption\n",
+ efi_start, efi_end);
+ printk("Disable MKTME\n");
+ mktme_disable();
+ break;
+ }
+ }
+
return !!mktme_nr_keyids();
}
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index ad3b1f4866b3..fc19da5da3e8 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -852,25 +852,26 @@ char * __init efi_md_typeattr_format(char *buf, size_t size,
if (attr & ~(EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT |
EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO |
EFI_MEMORY_WP | EFI_MEMORY_RP | EFI_MEMORY_XP |
- EFI_MEMORY_NV |
+ EFI_MEMORY_NV | EFI_MEMORY_CPU_CRYPTO |
EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE))
snprintf(pos, size, "|attr=0x%016llx]",
(unsigned long long)attr);
else
snprintf(pos, size,
- "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
+ "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
attr & EFI_MEMORY_RUNTIME ? "RUN" : "",
attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "",
- attr & EFI_MEMORY_NV ? "NV" : "",
- attr & EFI_MEMORY_XP ? "XP" : "",
- attr & EFI_MEMORY_RP ? "RP" : "",
- attr & EFI_MEMORY_WP ? "WP" : "",
- attr & EFI_MEMORY_RO ? "RO" : "",
- attr & EFI_MEMORY_UCE ? "UCE" : "",
- attr & EFI_MEMORY_WB ? "WB" : "",
- attr & EFI_MEMORY_WT ? "WT" : "",
- attr & EFI_MEMORY_WC ? "WC" : "",
- attr & EFI_MEMORY_UC ? "UC" : "");
+ attr & EFI_MEMORY_NV ? "NV" : "",
+ attr & EFI_MEMORY_CPU_CRYPTO ? "CR" : "",
+ attr & EFI_MEMORY_XP ? "XP" : "",
+ attr & EFI_MEMORY_RP ? "RP" : "",
+ attr & EFI_MEMORY_WP ? "WP" : "",
+ attr & EFI_MEMORY_RO ? "RO" : "",
+ attr & EFI_MEMORY_UCE ? "UCE" : "",
+ attr & EFI_MEMORY_WB ? "WB" : "",
+ attr & EFI_MEMORY_WT ? "WT" : "",
+ attr & EFI_MEMORY_WC ? "WC" : "",
+ attr & EFI_MEMORY_UC ? "UC" : "");
return buf;
}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f87fabea4a85..4ac54a168ffe 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -112,6 +112,7 @@ typedef struct {
#define EFI_MEMORY_MORE_RELIABLE \
((u64)0x0000000000010000ULL) /* higher reliability */
#define EFI_MEMORY_RO ((u64)0x0000000000020000ULL) /* read-only */
+#define EFI_MEMORY_CPU_CRYPTO ((u64)0x0000000000080000ULL) /* memory encryption supported */
#define EFI_MEMORY_RUNTIME ((u64)0x8000000000000000ULL) /* range requires runtime mapping */
#define EFI_MEMORY_DESCRIPTOR_VERSION 1
--
2.21.0
From: Alison Schofield <[email protected]>
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/mktme/index.rst | 1 +
Documentation/x86/mktme/mktme_demo.rst | 53 ++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
create mode 100644 Documentation/x86/mktme/mktme_demo.rst
diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
index ca3c76adc596..3af322d13225 100644
--- a/Documentation/x86/mktme/index.rst
+++ b/Documentation/x86/mktme/index.rst
@@ -10,3 +10,4 @@ Multi-Key Total Memory Encryption (MKTME)
mktme_configuration
mktme_keys
mktme_encrypt
+ mktme_demo
diff --git a/Documentation/x86/mktme/mktme_demo.rst b/Documentation/x86/mktme/mktme_demo.rst
new file mode 100644
index 000000000000..5af78617f887
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_demo.rst
@@ -0,0 +1,53 @@
+Demonstration Program using MKTME APIs
+=======================================
+
+/* Compile with the keyutils library: cc -o mdemo mdemo.c -lkeyutils */
+
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <keyutils.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
+#define sys_encrypt_mprotect 434
+
+void main(void)
+{
+ char *options_CPU = "algorithm=aes-xts-128 type=cpu";
+ long size = PAGE_SIZE;
+ key_serial_t key;
+ void *ptra;
+ int ret;
+
+ /* Allocate an MKTME Key */
+ key = add_key("mktme", "testkey", options_CPU, strlen(options_CPU),
+ KEY_SPEC_THREAD_KEYRING);
+
+ if (key == -1) {
+ printf("addkey FAILED\n");
+ return;
+ }
+ /* Map a page of ANONYMOUS memory */
+ ptra = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ if (ptra == MAP_FAILED) {
+ printf("failed to mmap");
+ goto inval_key;
+ }
+ /* Encrypt that page of memory with the MKTME Key */
+ ret = syscall(sys_encrypt_mprotect, ptra, size, PROT_NONE, key);
+ if (ret)
+ printf("mprotect error [%d]\n", ret);
+
+ /* Enjoy that page of encrypted memory */
+
+ /* Free the memory */
+ ret = munmap(ptra, size);
+
+inval_key:
+ /* Free the Key */
+ if (keyctl(KEYCTL_INVALIDATE, key) == -1)
+ printf("invalidate failed on key [%d]\n", key);
+}
--
2.21.0
From: Alison Schofield <[email protected]>
The MKTME (Multi-Key Total Memory Encryption) Key Service needs to
reference count key usage. This reference count is used to determine
when a hardware encryption KeyID is no longer in use and can be freed
and reassigned to another Userspace Key.
The MKTME Key Service does the percpu_ref_init() and percpu_ref_kill().
Encrypted VMAs and encrypted pages are included in the per-KeyID
reference counts.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 5 +++++
arch/x86/mm/mktme.c | 37 ++++++++++++++++++++++++++++++++++--
include/linux/mm.h | 2 ++
kernel/fork.c | 2 ++
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index e8f7f80bb013..a5f664d3805b 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -20,6 +20,11 @@ extern unsigned int mktme_algs;
extern void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
unsigned long start, unsigned long end);
+/* MKTME encrypt_count for VMAs */
+extern struct percpu_ref *encrypt_count;
+extern void vma_get_encrypt_ref(struct vm_area_struct *vma);
+extern void vma_put_encrypt_ref(struct vm_area_struct *vma);
+
DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
static inline bool mktme_enabled(void)
{
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 05bbf5058ade..17366d81c21b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -84,11 +84,12 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
if (oldkeyid == newkeyid)
return;
-
+ vma_put_encrypt_ref(vma);
newprot = pgprot_val(vma->vm_page_prot);
newprot &= ~mktme_keyid_mask();
newprot |= (unsigned long)newkeyid << mktme_keyid_shift();
vma->vm_page_prot = __pgprot(newprot);
+ vma_get_encrypt_ref(vma);
/*
* The VMA doesn't have any inherited pages.
@@ -97,6 +98,18 @@ void mprotect_set_encrypt(struct vm_area_struct *vma, int newkeyid,
unlink_anon_vmas(vma);
}
+void vma_get_encrypt_ref(struct vm_area_struct *vma)
+{
+ if (vma_keyid(vma))
+ percpu_ref_get(&encrypt_count[vma_keyid(vma)]);
+}
+
+void vma_put_encrypt_ref(struct vm_area_struct *vma)
+{
+ if (vma_keyid(vma))
+ percpu_ref_put(&encrypt_count[vma_keyid(vma)]);
+}
+
/* Prepare page to be used for encryption. Called from page allocator. */
void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
{
@@ -137,6 +150,22 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
page++;
}
+
+ /*
+ * Make sure the KeyID cannot be freed until the last page that
+ * uses the KeyID is gone.
+ *
+ * This is required because the page may live longer than VMA it
+ * is mapped into (i.e. in get_user_pages() case) and having
+ * refcounting per-VMA is not enough.
+ *
+ * Taking a reference per-4K helps in case if the page will be
+ * split after the allocation. free_encrypted_page() will balance
+ * out the refcount even if the page was split and freed as bunch
+ * of 4K pages.
+ */
+
+ percpu_ref_get_many(&encrypt_count[keyid], 1 << order);
}
/*
@@ -145,7 +174,9 @@ void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
*/
void free_encrypted_page(struct page *page, int order)
{
- int i;
+ int i, keyid;
+
+ keyid = page_keyid(page);
/*
* The hardware/CPU does not enforce coherency between mappings
@@ -177,6 +208,8 @@ void free_encrypted_page(struct page *page, int order)
lookup_page_ext(page)->keyid = 0;
page++;
}
+
+ percpu_ref_put_many(&encrypt_count[keyid], 1 << order);
}
static int sync_direct_mapping_pte(unsigned long keyid,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8551b5ebdedf..be27cb0cc0c7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2911,6 +2911,8 @@ static inline void mprotect_set_encrypt(struct vm_area_struct *vma,
int newkeyid,
unsigned long start,
unsigned long end) {}
+static inline void vma_get_encrypt_ref(struct vm_area_struct *vma) {}
+static inline void vma_put_encrypt_ref(struct vm_area_struct *vma) {}
#endif /* CONFIG_X86_INTEL_MKTME */
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..00735092d370 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -349,12 +349,14 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
if (new) {
*new = *orig;
INIT_LIST_HEAD(&new->anon_vma_chain);
+ vma_get_encrypt_ref(new);
}
return new;
}
void vm_area_free(struct vm_area_struct *vma)
{
+ vma_put_encrypt_ref(vma);
kmem_cache_free(vm_area_cachep, vma);
}
--
2.21.0
From: Alison Schofield <[email protected]>
The MKTME key type uses capabilities to restrict the allocation
of keys to privileged users. CAP_SYS_RESOURCE is required, but
the broader capability of CAP_SYS_ADMIN is accepted.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 1e2afcce7d85..2d90cc83e5ce 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
/* Documentation/x86/mktme/ */
+#include <linux/cred.h>
#include <linux/cpu.h>
#include <linux/init.h>
#include <linux/key.h>
@@ -371,6 +372,9 @@ int mktme_preparse_payload(struct key_preparsed_payload *prep)
char *options;
int ret;
+ if (!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
if (datalen <= 0 || datalen > 1024 || !prep->data)
return -EINVAL;
--
2.21.0
From: Alison Schofield <[email protected]>
MKTME (Multi-Key Total Memory Encryption) is a technology that allows
transparent memory encryption in upcoming Intel platforms. MKTME will
support multiple encryption domains, each with its own key.
The MKTME key service will manage the hardware encryption keys. It
will map Userspace Keys to Hardware KeyIDs and program the hardware
with the user requested encryption options.
Here the mapping structure is introduced, as well as the key service
initialization and registration.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/Makefile | 1 +
security/keys/mktme_keys.c | 60 ++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+)
create mode 100644 security/keys/mktme_keys.c
diff --git a/security/keys/Makefile b/security/keys/Makefile
index 9cef54064f60..28799be801a9 100644
--- a/security/keys/Makefile
+++ b/security/keys/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += keyctl_pkey.o
obj-$(CONFIG_BIG_KEYS) += big_key.o
obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
+obj-$(CONFIG_X86_INTEL_MKTME) += mktme_keys.o
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
new file mode 100644
index 000000000000..d262e0f348e4
--- /dev/null
+++ b/security/keys/mktme_keys.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Documentation/x86/mktme/ */
+
+#include <linux/init.h>
+#include <linux/key.h>
+#include <linux/key-type.h>
+#include <linux/mm.h>
+#include <keys/user-type.h>
+
+#include "internal.h"
+
+static unsigned int mktme_available_keyids; /* Free Hardware KeyIDs */
+
+enum mktme_keyid_state {
+ KEYID_AVAILABLE, /* Available to be assigned */
+ KEYID_ASSIGNED, /* Assigned to a userspace key */
+ KEYID_REF_KILLED, /* Userspace key has been destroyed */
+ KEYID_REF_RELEASED, /* Last reference is released */
+};
+
+/* 1:1 Mapping between Userspace Keys (struct key) and Hardware KeyIDs */
+struct mktme_mapping {
+ struct key *key;
+ enum mktme_keyid_state state;
+};
+
+static struct mktme_mapping *mktme_map;
+
+struct key_type key_type_mktme = {
+ .name = "mktme",
+ .describe = user_describe,
+};
+
+static int __init init_mktme(void)
+{
+ int ret;
+
+ /* Verify keys are present */
+ if (mktme_nr_keyids() < 1)
+ return 0;
+
+ mktme_available_keyids = mktme_nr_keyids();
+
+ /* Mapping of Userspace Keys to Hardware KeyIDs */
+ mktme_map = kvzalloc((sizeof(*mktme_map) * (mktme_nr_keyids() + 1)),
+ GFP_KERNEL);
+ if (!mktme_map)
+ return -ENOMEM;
+
+ ret = register_key_type(&key_type_mktme);
+ if (!ret)
+ return ret; /* SUCCESS */
+
+ kvfree(mktme_map);
+
+ return -ENOMEM;
+}
+
+late_initcall(init_mktme);
--
2.21.0
From: Alison Schofield <[email protected]>
Platforms that need to confirm the presence of an HMAT table
can use this function, which simply reports the HMAT's existence.
This is added in support of the Multi-Key Total Memory Encryption
(MKTME), a feature on future Intel platforms. These platforms will
need to confirm an HMAT is present at init time.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
drivers/acpi/hmat/hmat.c | 13 +++++++++++++
include/linux/acpi.h | 4 ++++
2 files changed, 17 insertions(+)
diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c
index 96b7d39a97c6..38e3341f569f 100644
--- a/drivers/acpi/hmat/hmat.c
+++ b/drivers/acpi/hmat/hmat.c
@@ -664,3 +664,16 @@ static __init int hmat_init(void)
return 0;
}
subsys_initcall(hmat_init);
+
+bool acpi_hmat_present(void)
+{
+ struct acpi_table_header *tbl;
+ acpi_status status;
+
+ status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl);
+ if (ACPI_FAILURE(status))
+ return false;
+
+ acpi_put_table(tbl);
+ return true;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index fc1e7d4648bf..d27f4d17dfb3 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1335,4 +1335,8 @@ acpi_platform_notify(struct device *dev, enum kobject_action action)
}
#endif
+#ifdef CONFIG_X86_INTEL_MKTME
+extern bool acpi_hmat_present(void);
+#endif /* CONFIG_X86_INTEL_MKTME */
+
#endif /*_LINUX_ACPI_H*/
--
2.21.0
From: Alison Schofield <[email protected]>
Send a request to the MKTME hardware to clear a previously
programmed key. This will be used when userspace keys are
destroyed and the key slot is no longer in use. No longer
in use means that the reference has been released, and its
usage count has returned to zero.
This clear command is not offered as an option to userspace,
since the key service can execute it automatically, and at
the right time, safely.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 18cb57be5193..1e2afcce7d85 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -72,6 +72,9 @@ int mktme_keyid_from_key(struct key *key)
return 0;
}
+static void mktme_clear_hardware_keyid(struct work_struct *work);
+static DECLARE_WORK(mktme_clear_work, mktme_clear_hardware_keyid);
+
struct percpu_ref *encrypt_count;
void mktme_percpu_ref_release(struct percpu_ref *ref)
{
@@ -88,8 +91,9 @@ void mktme_percpu_ref_release(struct percpu_ref *ref)
}
percpu_ref_exit(ref);
spin_lock_irqsave(&mktme_lock, flags);
- mktme_release_keyid(keyid);
+ mktme_map[keyid].state = KEYID_REF_RELEASED;
spin_unlock_irqrestore(&mktme_lock, flags);
+ schedule_work(&mktme_clear_work);
}
enum mktme_opt_id {
@@ -213,6 +217,27 @@ static int mktme_program_keyid(int keyid, u32 payload)
return ret;
}
+static void mktme_clear_hardware_keyid(struct work_struct *work)
+{
+ u32 clear_payload = MKTME_KEYID_CLEAR_KEY;
+ unsigned long flags;
+ int keyid, ret;
+
+ for (keyid = 1; keyid <= mktme_nr_keyids(); keyid++) {
+ if (mktme_map[keyid].state != KEYID_REF_RELEASED)
+ continue;
+
+ ret = mktme_program_keyid(keyid, clear_payload);
+ if (ret != MKTME_PROG_SUCCESS)
+ pr_debug("mktme: clear key failed [%s]\n",
+ mktme_error[ret].msg);
+
+ spin_lock_irqsave(&mktme_lock, flags);
+ mktme_release_keyid(keyid);
+ spin_unlock_irqrestore(&mktme_lock, flags);
+ }
+}
+
/* Key Service Method called when a Userspace Key is garbage collected. */
static void mktme_destroy_key(struct key *key)
{
--
2.21.0
We don't need separate implementations of alloc_hugepage_vma() for
NUMA and non-NUMA. Using a variant based on alloc_pages_vma() covers
both cases.
This is a preparation patch for allocating encrypted pages.
alloc_pages_vma() will handle allocation of encrypted pages. With this
change we don't need to cover alloc_hugepage_vma() separately.
The change makes a typo in Alpha's implementation of
__alloc_zeroed_user_highpage() visible. Fix it too.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/alpha/include/asm/page.h | 2 +-
include/linux/gfp.h | 6 ++----
2 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index f3fb2848470a..9a6fbb5269f3 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -18,7 +18,7 @@ extern void clear_page(void *page);
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
- alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vmaddr)
+ alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
extern void copy_page(void * _to, void * _from);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fb07b503dc45..3d4cb9fea417 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -511,21 +511,19 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
struct vm_area_struct *vma, unsigned long addr,
int node, bool hugepage);
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
- alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
#else
#define alloc_pages(gfp_mask, order) \
alloc_pages_node(numa_node_id(), gfp_mask, order)
#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
alloc_pages(gfp_mask, order)
-#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
- alloc_pages(gfp_mask, order)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
#define alloc_page_vma(gfp_mask, vma, addr) \
alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
#define alloc_page_vma_node(gfp_mask, vma, addr, node) \
alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+ alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
--
2.21.0
Store a bitmask of the supported encryption algorithms in 'mktme_algs'.
It will be used by the key management service.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 2 ++
arch/x86/kernel/cpu/intel.c | 6 +++++-
arch/x86/mm/mktme.c | 2 ++
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index b9ba2ea5b600..42a3b1b44669 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -10,6 +10,8 @@ extern int __mktme_keyid_shift;
extern int mktme_keyid_shift(void);
extern int __mktme_nr_keyids;
extern int mktme_nr_keyids(void);
+extern unsigned int mktme_algs;
+
#else
#define mktme_keyid_mask() ((phys_addr_t)0)
#define mktme_nr_keyids() 0
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 7ba44825be42..991bdcb2a55a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -553,6 +553,8 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */
#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1
+#define TME_ACTIVATE_CRYPTO_KNOWN_ALGS TME_ACTIVATE_CRYPTO_AES_XTS_128
+
/* Values for mktme_status (SW only construct) */
#define MKTME_ENABLED 0
#define MKTME_DISABLED 1
@@ -596,7 +598,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
- if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
+ if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_KNOWN_ALGS)) {
pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
tme_crypto_algs);
mktme_status = MKTME_DISABLED;
@@ -631,6 +633,8 @@ static void detect_tme(struct cpuinfo_x86 *c)
__mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift());
physical_mask &= ~mktme_keyid_mask();
+ tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
+ mktme_algs = tme_crypto_algs & TME_ACTIVATE_CRYPTO_KNOWN_ALGS;
} else {
/*
* Reset __PHYSICAL_MASK.
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 0f48ef2720cc..755afc6935b5 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -25,3 +25,5 @@ int mktme_nr_keyids(void)
{
return __mktme_nr_keyids;
}
+
+unsigned int mktme_algs;
--
2.21.0
Rename the option to CONFIG_MEMORY_PHYSICAL_PADDING. It will be used
not only for KASLR.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/Kconfig | 2 +-
arch/x86/mm/kaslr.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 222855cc0158..2eb2867db5fa 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2214,7 +2214,7 @@ config RANDOMIZE_MEMORY
If unsure, say Y.
-config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+config MEMORY_PHYSICAL_PADDING
hex "Physical memory mapping padding" if EXPERT
depends on RANDOMIZE_MEMORY
default "0xa" if MEMORY_HOTPLUG
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index dc6182eecefa..580b82c2621b 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -104,7 +104,7 @@ void __init kernel_randomize_memory(void)
*/
BUG_ON(kaslr_regions[0].base != &page_offset_base);
memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
- CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+ CONFIG_MEMORY_PHYSICAL_PADDING;
/* Adapt phyiscal memory region size based on available memory */
if (memory_tb < kaslr_regions[0].size_tb)
--
2.21.0
From: Jacob Pan <[email protected]>
Replace the sme_*() code with common x86 memory encryption code so that
Intel MKTME can be supported underneath the generic DMA code.
The dma_to_phys() and phys_to_dma() results will be modified at runtime
by the memory encryption code.
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 29 +++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt_common.c | 2 +-
include/linux/dma-direct.h | 4 ++--
include/linux/mem_encrypt.h | 23 ++++++++++-------------
4 files changed, 42 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 0c196c47d621..62a1493f389c 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -52,8 +52,19 @@ bool sev_active(void);
#define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
+/*
+ * The __sme_set() and __sme_clr() macros are useful for adding or removing
+ * the encryption mask from a value (e.g. when dealing with pagetable
+ * entries).
+ */
+#define __sme_set(x) ((x) | sme_me_mask)
+#define __sme_clr(x) ((x) & ~sme_me_mask)
+
#else /* !CONFIG_AMD_MEM_ENCRYPT */
+#define __sme_set(x) (x)
+#define __sme_clr(x) (x)
+
#define sme_me_mask 0ULL
static inline void __init sme_early_encrypt(resource_size_t paddr,
@@ -94,4 +105,22 @@ extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypte
#endif /* __ASSEMBLY__ */
+#ifdef CONFIG_X86_MEM_ENCRYPT_COMMON
+
+extern dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr);
+extern phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr);
+
+#else
+static inline dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+ return daddr;
+}
+
+static inline phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+ return paddr;
+}
+#endif /* CONFIG_X86_MEM_ENCRYPT_COMMON */
+
+
#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/mem_encrypt_common.c b/arch/x86/mm/mem_encrypt_common.c
index c11d70151735..588d6ea45624 100644
--- a/arch/x86/mm/mem_encrypt_common.c
+++ b/arch/x86/mm/mem_encrypt_common.c
@@ -1,6 +1,6 @@
#include <linux/mm.h>
-#include <linux/mem_encrypt.h>
#include <linux/dma-mapping.h>
+#include <asm/mem_encrypt.h>
#include <asm/mktme.h>
/*
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..6ce96b06c440 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -49,12 +49,12 @@ static inline bool force_dma_unencrypted(struct device *dev)
*/
static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
{
- return __sme_set(__phys_to_dma(dev, paddr));
+ return __mem_encrypt_dma_set(__phys_to_dma(dev, paddr), paddr);
}
static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
{
- return __sme_clr(__dma_to_phys(dev, daddr));
+ return __mem_encrypt_dma_clear(__dma_to_phys(dev, daddr));
}
u64 dma_direct_get_required_mask(struct device *dev);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 470bd53a89df..88724aa7c065 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -23,6 +23,16 @@
static inline bool sme_active(void) { return false; }
static inline bool sev_active(void) { return false; }
+static inline dma_addr_t __mem_encrypt_dma_set(dma_addr_t daddr, phys_addr_t paddr)
+{
+ return daddr;
+}
+
+static inline phys_addr_t __mem_encrypt_dma_clear(phys_addr_t paddr)
+{
+ return paddr;
+}
+
#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
static inline bool mem_encrypt_active(void)
@@ -35,19 +45,6 @@ static inline u64 sme_get_me_mask(void)
return sme_me_mask;
}
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-/*
- * The __sme_set() and __sme_clr() macros are useful for adding or removing
- * the encryption mask from a value (e.g. when dealing with pagetable
- * entries).
- */
-#define __sme_set(x) ((x) | sme_me_mask)
-#define __sme_clr(x) ((x) & ~sme_me_mask)
-#else
-#define __sme_set(x) (x)
-#define __sme_clr(x) (x)
-#endif
-
#endif /* __ASSEMBLY__ */
#endif /* __MEM_ENCRYPT_H__ */
--
2.21.0
From: Alison Schofield <[email protected]>
The ACPI HMAT will be used by the MKTME key service to identify
topologies that support the safe programming of encryption keys.
Those decisions will happen at key creation time and during
hotplug events.
To enable this, we at least need to have the ACPI HMAT present
at init time. If it's not present, do not register the type.
If the HMAT is not present, failure looks like this:
[ ] MKTME: Registration failed. ACPI HMAT not present.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index 2d90cc83e5ce..6265b62801e9 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -2,6 +2,7 @@
/* Documentation/x86/mktme/ */
+#include <linux/acpi.h>
#include <linux/cred.h>
#include <linux/cpu.h>
#include <linux/init.h>
@@ -445,6 +446,12 @@ static int __init init_mktme(void)
mktme_available_keyids = mktme_nr_keyids();
+ /* Require an ACPI HMAT to identify MKTME safe topologies */
+ if (!acpi_hmat_present()) {
+ pr_warn("MKTME: Registration failed. ACPI HMAT not present.\n");
+ return -EINVAL;
+ }
+
/* Mapping of Userspace Keys to Hardware KeyIDs */
mktme_map = kvzalloc((sizeof(*mktme_map) * (mktme_nr_keyids() + 1)),
GFP_KERNEL);
--
2.21.0
From: Alison Schofield <[email protected]>
The Intel MKTME architecture specification requires an activated
encryption algorithm for all command types.
For commands that actually perform encryption, SET_KEY_DIRECT and
SET_KEY_RANDOM, the user specifies the algorithm when requesting the
key through the MKTME Key Service.
For the CLEAR_KEY and NO_ENCRYPT commands, do not require the user to
specify an algorithm. Define a default algorithm, 'any activated
algorithm', to cover those two special cases.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/intel_pconfig.h | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/intel_pconfig.h b/arch/x86/include/asm/intel_pconfig.h
index 3cb002b1d0f9..4f27b0c532ee 100644
--- a/arch/x86/include/asm/intel_pconfig.h
+++ b/arch/x86/include/asm/intel_pconfig.h
@@ -21,14 +21,20 @@ enum pconfig_leaf {
/* Defines and structure for MKTME_KEY_PROGRAM of PCONFIG instruction */
+/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
+#define MKTME_AES_XTS_128 (1 << 8)
+#define MKTME_ANY_ACTIVATED_ALG (1 << __ffs(mktme_algs) << 8)
+
/* mktme_key_program::keyid_ctrl COMMAND, bits [7:0] */
#define MKTME_KEYID_SET_KEY_DIRECT 0
#define MKTME_KEYID_SET_KEY_RANDOM 1
-#define MKTME_KEYID_CLEAR_KEY 2
-#define MKTME_KEYID_NO_ENCRYPT 3
-/* mktme_key_program::keyid_ctrl ENC_ALG, bits [23:8] */
-#define MKTME_AES_XTS_128 (1 << 8)
+/*
+ * CLEAR_KEY and NO_ENCRYPT require the COMMAND in bits [7:0]
+ * and any activated encryption algorithm, ENC_ALG, in bits [23:8]
+ */
+#define MKTME_KEYID_CLEAR_KEY (2 | MKTME_ANY_ACTIVATED_ALG)
+#define MKTME_KEYID_NO_ENCRYPT (3 | MKTME_ANY_ACTIVATED_ALG)
/* Return codes from the PCONFIG MKTME_KEY_PROGRAM */
#define MKTME_PROG_SUCCESS 0
--
2.21.0
page_ext allows storing additional per-page information without growing
the main struct page. The additional space can be requested at boot time.
Store KeyID in bits 31:16 of extended page flags. These bits are unused.
page_keyid() returns zero until page_ext is ready. page_ext initializer
enables a static branch to indicate that page_keyid() can use page_ext.
The same static branch will gate MKTME readiness in general.
We don't yet set KeyID for the page. It will come in the following
patch that implements prep_encrypted_page(). All pages have KeyID-0 for
now.
page_keyid() will be used by KVM, which can be built as a module. We need
to export mktme_enabled_key to be able to inline page_keyid().
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 26 ++++++++++++++++++++++++++
arch/x86/include/asm/page.h | 1 +
arch/x86/mm/mktme.c | 21 +++++++++++++++++++++
include/linux/mm.h | 2 +-
include/linux/page_ext.h | 11 ++++++++++-
mm/page_ext.c | 3 +++
6 files changed, 62 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 42a3b1b44669..46041075f617 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -2,6 +2,8 @@
#define _ASM_X86_MKTME_H
#include <linux/types.h>
+#include <linux/page_ext.h>
+#include <linux/jump_label.h>
#ifdef CONFIG_X86_INTEL_MKTME
extern phys_addr_t __mktme_keyid_mask;
@@ -12,10 +14,34 @@ extern int __mktme_nr_keyids;
extern int mktme_nr_keyids(void);
extern unsigned int mktme_algs;
+DECLARE_STATIC_KEY_FALSE(mktme_enabled_key);
+static inline bool mktme_enabled(void)
+{
+ return static_branch_unlikely(&mktme_enabled_key);
+}
+
+extern struct page_ext_operations page_mktme_ops;
+
+#define page_keyid page_keyid
+static inline int page_keyid(const struct page *page)
+{
+ if (!mktme_enabled())
+ return 0;
+
+ return lookup_page_ext(page)->keyid;
+}
+
#else
#define mktme_keyid_mask() ((phys_addr_t)0)
#define mktme_nr_keyids() 0
#define mktme_keyid_shift() 0
+
+#define page_keyid(page) 0
+
+static inline bool mktme_enabled(void)
+{
+ return false;
+}
#endif
#endif
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..39af59487d5f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,7 @@
struct page;
#include <linux/range.h>
+#include <asm/mktme.h>
extern struct range pfn_mapped[];
extern int nr_pfn_mapped;
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 755afc6935b5..48c2d4c97356 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -27,3 +27,24 @@ int mktme_nr_keyids(void)
}
unsigned int mktme_algs;
+
+DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
+EXPORT_SYMBOL_GPL(mktme_enabled_key);
+
+static bool need_page_mktme(void)
+{
+ /* Make sure keyid doesn't collide with extended page flags */
+ BUILD_BUG_ON(__NR_PAGE_EXT_FLAGS > 16);
+
+ return !!mktme_nr_keyids();
+}
+
+static void init_page_mktme(void)
+{
+ static_branch_enable(&mktme_enabled_key);
+}
+
+struct page_ext_operations page_mktme_ops = {
+ .need = need_page_mktme,
+ .init = init_page_mktme,
+};
diff --git a/include/linux/mm.h b/include/linux/mm.h
index af1a56ff6764..3f9640f388ac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1645,7 +1645,7 @@ static inline int vma_keyid(struct vm_area_struct *vma)
#endif
#ifndef page_keyid
-static inline int page_keyid(struct page *page)
+static inline int page_keyid(const struct page *page)
{
return 0;
}
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 09592951725c..a9fa95ae9847 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -22,6 +22,7 @@ enum page_ext_flags {
PAGE_EXT_YOUNG,
PAGE_EXT_IDLE,
#endif
+ __NR_PAGE_EXT_FLAGS
};
/*
@@ -32,7 +33,15 @@ enum page_ext_flags {
* then the page_ext for pfn always exists.
*/
struct page_ext {
- unsigned long flags;
+ union {
+ unsigned long flags;
+#ifdef CONFIG_X86_INTEL_MKTME
+ struct {
+ unsigned short __pad;
+ unsigned short keyid;
+ };
+#endif
+ };
};
extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 5f5769c7db3b..c52b77c13cd9 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -65,6 +65,9 @@ static struct page_ext_operations *page_ext_ops[] = {
#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
&page_idle_ops,
#endif
+#ifdef CONFIG_X86_INTEL_MKTME
+ &page_mktme_ops,
+#endif
};
static unsigned long total_usage;
--
2.21.0
KSM compares plain text. It might try to merge two pages that have the
same plain text but different ciphertext and possibly different
encryption keys. When the kernel encrypted the page, it promised that
it would keep it encrypted with _that_ key. That makes it impossible to
merge two pages encrypted with different keys.
Never merge encrypted pages with different KeyIDs.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/mm.h | 7 +++++++
mm/ksm.c | 17 +++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5bfd3dd121c1..af1a56ff6764 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1644,6 +1644,13 @@ static inline int vma_keyid(struct vm_area_struct *vma)
}
#endif
+#ifndef page_keyid
+static inline int page_keyid(struct page *page)
+{
+ return 0;
+}
+#endif
+
extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len,
diff --git a/mm/ksm.c b/mm/ksm.c
index 3dc4346411e4..7d4ef634f38e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1228,6 +1228,23 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
if (!PageAnon(page))
goto out;
+ /*
+ * KeyID indicates what key to use to encrypt and decrypt page's
+ * content.
+ *
+ * KSM compares plain text instead (transparently to KSM code).
+ *
+ * But we still need to make sure that pages with identical plain
+ * text will not be merged together if they are encrypted with
+ * different keys.
+ *
+ * To make it work kernel only allows merging pages with the same KeyID.
+ * The approach guarantees that the merged page can be read by all
+ * users.
+ */
+ if (kpage && page_keyid(page) != page_keyid(kpage))
+ goto out;
+
/*
* We need the page lock to read a stable PageSwapCache in
* write_protect_page(). We use trylock_page() instead of
--
2.21.0
The new helper mktme_disable() allows disabling MKTME even if it has been
enumerated successfully. MKTME initialization may fail, and this
functionality allows the system to boot regardless of the failure.
MKTME needs a per-KeyID direct mapping. That requires a lot more virtual
address space, which may be a problem in 4-level paging mode. If the
system has more physical memory than we can handle with MKTME, this
helper allows us to fail MKTME but still boot the system successfully.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 5 +++++
arch/x86/kernel/cpu/intel.c | 5 +----
arch/x86/mm/mktme.c | 10 ++++++++++
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index a61b45fca4b1..3fc246acc279 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -22,6 +22,8 @@ static inline bool mktme_enabled(void)
return static_branch_unlikely(&mktme_enabled_key);
}
+void mktme_disable(void);
+
extern struct page_ext_operations page_mktme_ops;
#define page_keyid page_keyid
@@ -71,6 +73,9 @@ static inline bool mktme_enabled(void)
{
return false;
}
+
+static inline void mktme_disable(void) {}
+
#endif
#endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4c2d70287eb4..9852580340b9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -650,10 +650,7 @@ static void detect_tme(struct cpuinfo_x86 *c)
* We must not allow onlining secondary CPUs with non-matching
* configuration.
*/
- physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
- __mktme_keyid_mask = 0;
- __mktme_keyid_shift = 0;
- __mktme_nr_keyids = 0;
+ mktme_disable();
}
#endif
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index 8015e7822c9b..1e8d662e5bff 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -33,6 +33,16 @@ unsigned int mktme_algs;
DEFINE_STATIC_KEY_FALSE(mktme_enabled_key);
EXPORT_SYMBOL_GPL(mktme_enabled_key);
+void mktme_disable(void)
+{
+ physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+ __mktme_keyid_mask = 0;
+ __mktme_keyid_shift = 0;
+ __mktme_nr_keyids = 0;
+ if (mktme_enabled())
+ static_branch_disable(&mktme_enabled_key);
+}
+
static bool need_page_mktme(void)
{
/* Make sure keyid doesn't collide with extended page flags */
--
2.21.0
From: Alison Schofield <[email protected]>
Instantiate is a Kernel Key Service method invoked when a key is
added (add_key, request_key) by the user.
During instantiation, MKTME allocates an available hardware KeyID
and maps it to the Userspace Key.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
security/keys/mktme_keys.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/security/keys/mktme_keys.c b/security/keys/mktme_keys.c
index fe119a155235..beca852db01a 100644
--- a/security/keys/mktme_keys.c
+++ b/security/keys/mktme_keys.c
@@ -14,6 +14,7 @@
#include "internal.h"
+static DEFINE_SPINLOCK(mktme_lock);
static unsigned int mktme_available_keyids; /* Free Hardware KeyIDs */
enum mktme_keyid_state {
@@ -31,6 +32,24 @@ struct mktme_mapping {
static struct mktme_mapping *mktme_map;
+int mktme_reserve_keyid(struct key *key)
+{
+ int i;
+
+ if (!mktme_available_keyids)
+ return 0;
+
+ for (i = 1; i <= mktme_nr_keyids(); i++) {
+ if (mktme_map[i].state == KEYID_AVAILABLE) {
+ mktme_map[i].state = KEYID_ASSIGNED;
+ mktme_map[i].key = key;
+ mktme_available_keyids--;
+ return i;
+ }
+ }
+ return 0;
+}
+
enum mktme_opt_id {
OPT_ERROR,
OPT_TYPE,
@@ -43,6 +62,20 @@ static const match_table_t mktme_token = {
{OPT_ERROR, NULL}
};
+/* Key Service Method to create a new key. Payload is preparsed. */
+int mktme_instantiate_key(struct key *key, struct key_preparsed_payload *prep)
+{
+ unsigned long flags;
+ int keyid;
+
+ spin_lock_irqsave(&mktme_lock, flags);
+ keyid = mktme_reserve_keyid(key);
+ spin_unlock_irqrestore(&mktme_lock, flags);
+ if (!keyid)
+ return -ENOKEY;
+ return 0;
+}
+
/* Make sure arguments are correct for the TYPE of key requested */
static int mktme_check_options(u32 *payload, unsigned long token_mask,
enum mktme_type type, enum mktme_alg alg)
@@ -163,6 +196,7 @@ struct key_type key_type_mktme = {
.name = "mktme",
.preparse = mktme_preparse_payload,
.free_preparse = mktme_free_preparsed_payload,
+ .instantiate = mktme_instantiate_key,
.describe = user_describe,
};
--
2.21.0
VMAs with different KeyIDs do not mix together. Only VMAs with the same
KeyID are compatible.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/userfaultfd.c | 7 ++++---
include/linux/mm.h | 9 ++++++++-
mm/madvise.c | 2 +-
mm/mempolicy.c | 3 ++-
mm/mlock.c | 2 +-
mm/mmap.c | 31 +++++++++++++++++++------------
mm/mprotect.c | 2 +-
7 files changed, 36 insertions(+), 20 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index ccbdbd62f0d8..3b845a6a44d0 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -911,7 +911,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX);
+ NULL_VM_UFFD_CTX, vma_keyid(vma));
if (prev)
vma = prev;
else
@@ -1461,7 +1461,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- ((struct vm_userfaultfd_ctx){ ctx }));
+ ((struct vm_userfaultfd_ctx){ ctx }),
+ vma_keyid(vma));
if (prev) {
vma = prev;
goto next;
@@ -1623,7 +1624,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX);
+ NULL_VM_UFFD_CTX, vma_keyid(vma));
if (prev) {
vma = prev;
goto next;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..5bfd3dd121c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1637,6 +1637,13 @@ int clear_page_dirty_for_io(struct page *page);
int get_cmdline(struct task_struct *task, char *buffer, int buflen);
+#ifndef vma_keyid
+static inline int vma_keyid(struct vm_area_struct *vma)
+{
+ return 0;
+}
+#endif
+
extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len,
@@ -2301,7 +2308,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
extern struct vm_area_struct *vma_merge(struct mm_struct *,
struct vm_area_struct *prev, unsigned long addr, unsigned long end,
unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
- struct mempolicy *, struct vm_userfaultfd_ctx);
+ struct mempolicy *, struct vm_userfaultfd_ctx, int keyid);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
unsigned long addr, int new_below);
diff --git a/mm/madvise.c b/mm/madvise.c
index 968df3aa069f..00216780a630 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -138,7 +138,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_keyid(vma));
if (*prev) {
vma = *prev;
goto success;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f48693f75b37..14ee933b1ff7 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -731,7 +731,8 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
((vmstart - vma->vm_start) >> PAGE_SHIFT);
prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx);
+ new_pol, vma->vm_userfaultfd_ctx,
+ vma_keyid(vma));
if (prev) {
vma = prev;
next = vma->vm_next;
diff --git a/mm/mlock.c b/mm/mlock.c
index a90099da4fb4..3d0a31bf214c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -535,7 +535,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_keyid(vma));
if (*prev) {
vma = *prev;
goto success;
diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..715438a1fb93 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1008,7 +1008,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
*/
static inline int is_mergeable_vma(struct vm_area_struct *vma,
struct file *file, unsigned long vm_flags,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ int keyid)
{
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1022,6 +1023,8 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
return 0;
if (vma->vm_file != file)
return 0;
+ if (vma_keyid(vma) != keyid)
+ return 0;
if (vma->vm_ops && vma->vm_ops->close)
return 0;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
@@ -1058,9 +1061,10 @@ static int
can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
pgoff_t vm_pgoff,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ int keyid)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+ if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
if (vma->vm_pgoff == vm_pgoff)
return 1;
@@ -1079,9 +1083,10 @@ static int
can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
pgoff_t vm_pgoff,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ int keyid)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+ if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, keyid) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
pgoff_t vm_pglen;
vm_pglen = vma_pages(vma);
@@ -1136,7 +1141,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
unsigned long end, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ int keyid)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
struct vm_area_struct *area, *next;
@@ -1169,7 +1175,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
mpol_equal(vma_policy(prev), policy) &&
can_vma_merge_after(prev, vm_flags,
anon_vma, file, pgoff,
- vm_userfaultfd_ctx)) {
+ vm_userfaultfd_ctx, keyid)) {
/*
* OK, it can. Can we now merge in the successor as well?
*/
@@ -1178,7 +1184,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
can_vma_merge_before(next, vm_flags,
anon_vma, file,
pgoff+pglen,
- vm_userfaultfd_ctx) &&
+ vm_userfaultfd_ctx,
+ keyid) &&
is_mergeable_anon_vma(prev->anon_vma,
next->anon_vma, NULL)) {
/* cases 1, 6 */
@@ -1201,7 +1208,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
mpol_equal(policy, vma_policy(next)) &&
can_vma_merge_before(next, vm_flags,
anon_vma, file, pgoff+pglen,
- vm_userfaultfd_ctx)) {
+ vm_userfaultfd_ctx, keyid)) {
if (prev && addr < prev->vm_end) /* case 4 */
err = __vma_adjust(prev, prev->vm_start,
addr, prev->vm_pgoff, NULL, next);
@@ -1746,7 +1753,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* Can we just expand an old mapping?
*/
vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
- NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+ NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
if (vma)
goto out;
@@ -3025,7 +3032,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
/* Can we just expand an old private anonymous mapping? */
vma = vma_merge(mm, prev, addr, addr + len, flags,
- NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+ NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, 0);
if (vma)
goto out;
@@ -3223,7 +3230,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
return NULL; /* should never get here */
new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_keyid(vma));
if (new_vma) {
/*
* Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index bf38dfbbb4b4..82d7b194a918 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -400,7 +400,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*pprev = vma_merge(mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_keyid(vma));
if (*pprev) {
vma = *pprev;
VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
--
2.21.0
mktme_nr_keyids() returns the number of KeyIDs available for MKTME,
excluding KeyID zero, which is used by TME. MKTME KeyIDs start from 1.
mktme_keyid_shift() returns the shift of KeyID within physical address.
mktme_keyid_mask() returns the mask to extract KeyID from physical address.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 19 +++++++++++++++++++
arch/x86/kernel/cpu/intel.c | 15 ++++++++++++---
arch/x86/mm/Makefile | 2 ++
arch/x86/mm/mktme.c | 27 +++++++++++++++++++++++++++
4 files changed, 60 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/include/asm/mktme.h
create mode 100644 arch/x86/mm/mktme.c
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
new file mode 100644
index 000000000000..b9ba2ea5b600
--- /dev/null
+++ b/arch/x86/include/asm/mktme.h
@@ -0,0 +1,19 @@
+#ifndef _ASM_X86_MKTME_H
+#define _ASM_X86_MKTME_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_INTEL_MKTME
+extern phys_addr_t __mktme_keyid_mask;
+extern phys_addr_t mktme_keyid_mask(void);
+extern int __mktme_keyid_shift;
+extern int mktme_keyid_shift(void);
+extern int __mktme_nr_keyids;
+extern int mktme_nr_keyids(void);
+#else
+#define mktme_keyid_mask() ((phys_addr_t)0)
+#define mktme_nr_keyids() 0
+#define mktme_keyid_shift() 0
+#endif
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f03eee666761..7ba44825be42 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -618,6 +618,9 @@ static void detect_tme(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_INTEL_MKTME
if (mktme_status == MKTME_ENABLED && nr_keyids) {
+ __mktme_nr_keyids = nr_keyids;
+ __mktme_keyid_shift = c->x86_phys_bits - keyid_bits;
+
/*
* Mask out bits claimed from KeyID from physical address mask.
*
@@ -625,17 +628,23 @@ static void detect_tme(struct cpuinfo_x86 *c)
* and number of bits claimed for KeyID is 6, bits 51:46 of
* physical address is unusable.
*/
- phys_addr_t keyid_mask;
+ __mktme_keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, mktme_keyid_shift());
+ physical_mask &= ~mktme_keyid_mask();
- keyid_mask = GENMASK_ULL(c->x86_phys_bits - 1, c->x86_phys_bits - keyid_bits);
- physical_mask &= ~keyid_mask;
} else {
/*
* Reset __PHYSICAL_MASK.
* Maybe needed if there's inconsistent configuation
* between CPUs.
+ *
+ * FIXME: broken for hotplug.
+ * We must not allow onlining secondary CPUs with non-matching
+ * configuration.
*/
physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+ __mktme_keyid_mask = 0;
+ __mktme_keyid_shift = 0;
+ __mktme_nr_keyids = 0;
}
#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 84373dc9b341..600d18691876 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -53,3 +53,5 @@ obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
+
+obj-$(CONFIG_X86_INTEL_MKTME) += mktme.o
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
new file mode 100644
index 000000000000..0f48ef2720cc
--- /dev/null
+++ b/arch/x86/mm/mktme.c
@@ -0,0 +1,27 @@
+#include <asm/mktme.h>
+
+/* Mask to extract KeyID from physical address. */
+phys_addr_t __mktme_keyid_mask;
+phys_addr_t mktme_keyid_mask(void)
+{
+ return __mktme_keyid_mask;
+}
+EXPORT_SYMBOL_GPL(mktme_keyid_mask);
+
+/* Shift of KeyID within physical address. */
+int __mktme_keyid_shift;
+int mktme_keyid_shift(void)
+{
+ return __mktme_keyid_shift;
+}
+EXPORT_SYMBOL_GPL(mktme_keyid_shift);
+
+/*
+ * Number of KeyIDs available for MKTME.
+ * Excludes KeyID-0, which is used by TME. MKTME KeyIDs start from 1.
+ */
+int __mktme_nr_keyids;
+int mktme_nr_keyids(void)
+{
+ return __mktme_nr_keyids;
+}
--
2.21.0
The kernel needs a way to access encrypted memory. We have two options
for how to approach it:
- Create temporary mappings every time the kernel needs access to
encrypted memory. That basically brings back highmem and its overhead.
- Create multiple direct mappings, one per-KeyID. In this setup we
don't need to create temporary mappings on the fly -- encrypted
memory is permanently available in kernel address space.
We take the second approach as it has lower overhead.
It's worth noting that with per-KeyID direct mappings, a compromised
kernel would give access to decrypted data right away, without any
additional tricks to get memory mapped with the correct KeyID.
Per-KeyID mappings require a lot more virtual address space. On a
4-level machine with 64 KeyIDs, we max out the 46-bit virtual address
space dedicated to the direct mapping with just 1 TiB of RAM. Given that
we round up any calculation of the direct mapping size to 1 TiB, we
effectively claim the entire 46-bit address space for the direct mapping
on such a machine, regardless of RAM size.
Increased usage of virtual address space has implications for KASLR:
we have less space for randomization. With 64 TiB claimed for the direct
mapping on 4-level paging, we are left with 27 TiB of entropy in which
to place page_offset_base, vmalloc_base and vmemmap_base.
5-level paging provides a much wider virtual address space, so KASLR
doesn't suffer significantly from per-KeyID direct mappings. Running
MKTME with 5-level paging is preferred.
The direct mappings for each KeyID are placed next to each other in the
virtual address space. We need a way to find the boundaries of the
direct mapping for a particular KeyID.
The new variable direct_mapping_size specifies the size of a single
direct mapping. With it, finding the direct mapping for KeyID-N is
trivial: PAGE_OFFSET + N * direct_mapping_size.
The size of the direct mapping is calculated during KASLR setup. If
KASLR is disabled, it happens during MKTME initialization.
With MKTME, the size of the direct mapping has to be a power of 2, which
makes the implementation of __pa() efficient.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/x86_64/mm.rst | 4 +++
arch/x86/include/asm/page_32.h | 1 +
arch/x86/include/asm/page_64.h | 2 ++
arch/x86/include/asm/setup.h | 6 ++++
arch/x86/kernel/head64.c | 4 +++
arch/x86/kernel/setup.c | 3 ++
arch/x86/mm/init_64.c | 58 +++++++++++++++++++++++++++++++++
arch/x86/mm/kaslr.c | 11 +++++--
8 files changed, 86 insertions(+), 3 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
index 267fc4808945..7978afe6c396 100644
--- a/Documentation/x86/x86_64/mm.rst
+++ b/Documentation/x86/x86_64/mm.rst
@@ -140,6 +140,10 @@ The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).
+With MKTME, we have multiple direct mappings. One per-KeyID. They are put
+next to each other. PAGE_OFFSET + N * direct_mapping_size can be used to
+find direct mapping for KeyID-N.
+
vmalloc space is lazily synchronized into the different PML4/PML5 pages of
the processes using the page fault handler, with init_top_pgt as
reference.
diff --git a/arch/x86/include/asm/page_32.h b/arch/x86/include/asm/page_32.h
index 94dbd51df58f..8bce788f9ca9 100644
--- a/arch/x86/include/asm/page_32.h
+++ b/arch/x86/include/asm/page_32.h
@@ -6,6 +6,7 @@
#ifndef __ASSEMBLY__
+#define direct_mapping_size 0
#define __phys_addr_nodebug(x) ((x) - PAGE_OFFSET)
#ifdef CONFIG_DEBUG_VIRTUAL
extern unsigned long __phys_addr(unsigned long);
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..f57fc3cc2246 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -14,6 +14,8 @@ extern unsigned long phys_base;
extern unsigned long page_offset_base;
extern unsigned long vmalloc_base;
extern unsigned long vmemmap_base;
+extern unsigned long direct_mapping_size;
+extern unsigned long direct_mapping_mask;
static inline unsigned long __phys_addr_nodebug(unsigned long x)
{
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ed8ec011a9fd..d2861074cf83 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -62,6 +62,12 @@ extern void x86_ce4100_early_setup(void);
static inline void x86_ce4100_early_setup(void) { }
#endif
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void calculate_direct_mapping_size(void);
+#else
+static inline void calculate_direct_mapping_size(void) { }
+#endif
+
#ifndef _SETUP
#include <asm/espfix.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 29ffa495bd1c..006d3ff46afe 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -60,6 +60,10 @@ EXPORT_SYMBOL(vmalloc_base);
unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
EXPORT_SYMBOL(vmemmap_base);
#endif
+unsigned long direct_mapping_size __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_size);
+unsigned long direct_mapping_mask __ro_after_init = -1UL;
+EXPORT_SYMBOL(direct_mapping_mask);
#define __head __section(.head.text)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbe35bf879f5..d12431e20876 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1077,6 +1077,9 @@ void __init setup_arch(char **cmdline_p)
*/
init_cache_modes();
+ /* direct_mapping_size has to be initialized before KASLR and MKTME */
+ calculate_direct_mapping_size();
+
/*
* Define random base addresses for memory sections after max_pfn is
* defined and before each memory section base is used.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a6b5c653727b..4c1f93df47a5 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1440,6 +1440,64 @@ unsigned long memory_block_size_bytes(void)
return memory_block_size_probed;
}
+#ifdef CONFIG_MEMORY_PHYSICAL_PADDING
+void __init calculate_direct_mapping_size(void)
+{
+ unsigned long available_va;
+
+	/* 1/4 of virtual address space is dedicated for direct mapping */
+ available_va = 1UL << (__VIRTUAL_MASK_SHIFT - 1);
+
+	/* How much memory does the system have? */
+ direct_mapping_size = max_pfn << PAGE_SHIFT;
+ direct_mapping_size = round_up(direct_mapping_size, 1UL << 40);
+
+ if (!mktme_nr_keyids())
+ goto out;
+
+ /*
+ * For MKTME we need direct_mapping_size to be power-of-2.
+ * It makes __pa() implementation efficient.
+ */
+ direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+ /*
+ * Not enough virtual address space to address all physical memory with
+ * MKTME enabled. Even without padding.
+ *
+ * Disable MKTME instead.
+ */
+ if (direct_mapping_size > available_va / (mktme_nr_keyids() + 1)) {
+ pr_err("x86/mktme: Disabled. Not enough virtual address space\n");
+ pr_err("x86/mktme: Consider switching to 5-level paging\n");
+ mktme_disable();
+ goto out;
+ }
+
+ /*
+ * Virtual address space is divided between per-KeyID direct mappings.
+ */
+ available_va /= mktme_nr_keyids() + 1;
+out:
+ /* Add padding, if there's enough virtual address space */
+ direct_mapping_size += (1UL << 40) * CONFIG_MEMORY_PHYSICAL_PADDING;
+ if (mktme_nr_keyids())
+ direct_mapping_size = roundup_pow_of_two(direct_mapping_size);
+
+ if (direct_mapping_size > available_va)
+ direct_mapping_size = available_va;
+
+ /*
+ * For MKTME, make sure direct_mapping_size is still power-of-2
+ * after adding padding and calculate mask that is used in __pa().
+ */
+ if (mktme_nr_keyids()) {
+ direct_mapping_size = rounddown_pow_of_two(direct_mapping_size);
+ direct_mapping_mask = direct_mapping_size - 1;
+ }
+}
+#endif
+
#ifdef CONFIG_SPARSEMEM_VMEMMAP
/*
* Initialise the sparsemem vmemmap using huge-pages at the PMD level.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 580b82c2621b..83af41d289ed 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -103,10 +103,15 @@ void __init kernel_randomize_memory(void)
* add padding if needed (especially for memory hotplug support).
*/
BUG_ON(kaslr_regions[0].base != &page_offset_base);
- memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
- CONFIG_MEMORY_PHYSICAL_PADDING;
- /* Adapt phyiscal memory region size based on available memory */
+ /*
+ * Calculate space required to map all physical memory.
+ * In case of MKTME, we map physical memory multiple times, one for
+ * each KeyID. If MKTME is disabled mktme_nr_keyids() is 0.
+ */
+ memory_tb = (direct_mapping_size * (mktme_nr_keyids() + 1)) >> TB_SHIFT;
+
+ /* Adapt physical memory region size based on available memory */
if (memory_tb < kaslr_regions[0].size_tb)
kaslr_regions[0].size_tb = memory_tb;
--
2.21.0
Hook into the page allocator to allocate and free encrypted pages
properly.
The hardware/CPU does not enforce coherency between mappings of the same
physical page with different KeyIDs or encryption keys.
We are responsible for cache management.
Flush cache on allocating encrypted page and on returning the page to
the free pool.
prep_encrypted_page() also takes care of zeroing the page. We have to
do this after the KeyID is set for the page.
The patch relies on page_address() to return the virtual address of the
page mapping with the current KeyID. It will be implemented later in
the patchset.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/mktme.h | 17 ++++++++
arch/x86/mm/mktme.c | 83 ++++++++++++++++++++++++++++++++++++
2 files changed, 100 insertions(+)
diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index 52b115b30a42..a61b45fca4b1 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -43,6 +43,23 @@ static inline int vma_keyid(struct vm_area_struct *vma)
return __vma_keyid(vma);
}
+#define prep_encrypted_page prep_encrypted_page
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
+static inline void prep_encrypted_page(struct page *page, int order,
+ int keyid, bool zero)
+{
+ if (keyid)
+ __prep_encrypted_page(page, order, keyid, zero);
+}
+
+#define HAVE_ARCH_FREE_PAGE
+void free_encrypted_page(struct page *page, int order);
+static inline void arch_free_page(struct page *page, int order)
+{
+ if (page_keyid(page))
+ free_encrypted_page(page, order);
+}
+
#else
#define mktme_keyid_mask() ((phys_addr_t)0)
#define mktme_nr_keyids() 0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index d02867212e33..8015e7822c9b 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,4 +1,5 @@
#include <linux/mm.h>
+#include <linux/highmem.h>
#include <asm/mktme.h>
/* Mask to extract KeyID from physical address. */
@@ -55,3 +56,85 @@ int __vma_keyid(struct vm_area_struct *vma)
pgprotval_t prot = pgprot_val(vma->vm_page_prot);
return (prot & mktme_keyid_mask()) >> mktme_keyid_shift();
}
+
+/* Prepare page to be used for encryption. Called from page allocator. */
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
+{
+ int i;
+
+ /*
+ * The hardware/CPU does not enforce coherency between mappings
+ * of the same physical page with different KeyIDs or
+ * encryption keys. We are responsible for cache management.
+ *
+ * Flush cache lines with KeyID-0. page_address() returns virtual
+ * address of the page mapping with the current (zero) KeyID.
+ */
+ clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+ for (i = 0; i < (1 << order); i++) {
+ /* All pages coming out of the allocator should have KeyID 0 */
+ WARN_ON_ONCE(lookup_page_ext(page)->keyid);
+
+ /*
+ * Change KeyID. From now on page_address() will return address
+ * of the page mapping with the new KeyID.
+ *
+ * We don't need barrier() before the KeyID change because
+ * clflush_cache_range() above stops the compiler from reordering
+ * past the point with mb().
+ *
+ * And we don't need a barrier() after the assignment because
+ * any future reference to the KeyID (i.e. from page_address())
+ * will create an address dependency and the compiler is not
+ * allowed to mess with this.
+ */
+ lookup_page_ext(page)->keyid = keyid;
+
+ /* Clear the page after the KeyID is set. */
+ if (zero)
+ clear_highpage(page);
+
+ page++;
+ }
+}
+
+/*
+ * Handles freeing of encrypted page.
+ * Called from page allocator on freeing encrypted page.
+ */
+void free_encrypted_page(struct page *page, int order)
+{
+ int i;
+
+ /*
+ * The hardware/CPU does not enforce coherency between mappings
+ * of the same physical page with different KeyIDs or
+ * encryption keys. We are responsible for cache management.
+ *
+ * Flush cache lines with non-0 KeyID. page_address() returns virtual
+ * address of the page mapping with the current (non-zero) KeyID.
+ */
+ clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+ for (i = 0; i < (1 << order); i++) {
+ /* Check if the page has a reasonable KeyID */
+ WARN_ON_ONCE(!lookup_page_ext(page)->keyid);
+ WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids());
+
+ /*
+ * Switch the page back to zero KeyID.
+ *
+ * We don't need barrier() before the KeyID change because
+ * clflush_cache_range() above stops the compiler from reordering
+ * past the point with mb().
+ *
+ * And we don't need a barrier() after the assignment because
+ * any future reference to the KeyID (i.e. from page_address())
+ * will create an address dependency and the compiler is not
+ * allowed to mess with this.
+ */
+ lookup_page_ext(page)->keyid = 0;
+ page++;
+ }
+}
--
2.21.0
Icelake Server requires an additional check to make sure that MKTME
usage is safe on Linux.
The kernel needs a way to access encrypted memory. There can be
different approaches to this: create a temporary mapping to access the
page (using the kmap() interface), or modify the kernel's direct
mapping on allocation of an encrypted page.
In order to minimize runtime overhead, the Linux MKTME implementation
uses multiple direct mappings, one per KeyID. The kernel uses the
direct mapping that is relevant for the page at the moment.
Icelake Server in some configurations doesn't allow a page to be mapped
with multiple KeyIDs at the same time, even if only one of the KeyIDs
is actively used. This conflicts with the Linux MKTME implementation.
The OS can check whether it's safe to map the same page with multiple
KeyIDs by examining bit 8 of MSR 0x6F. If the bit is set, we cannot
safely use MKTME on Linux.
The user can disable Directory Mode in the BIOS setup to get the
platform into a Linux-compatible mode.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 9852580340b9..3583bea0a5b9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,7 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -560,6 +561,16 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
#define TME_ACTIVATE_CRYPTO_KNOWN_ALGS TME_ACTIVATE_CRYPTO_AES_XTS_128
+#define MSR_ICX_MKTME_STATUS 0x6F
+#define MKTME_ALIASES_FORBIDDEN(x) ((x) & BIT(8))
+
+/* Need to check MSR_ICX_MKTME_STATUS for these CPUs */
+static const struct x86_cpu_id mktme_status_msr_ids[] = {
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ICELAKE_X },
+ { X86_VENDOR_INTEL, 6, INTEL_FAM6_ICELAKE_XEON_D },
+ {}
+};
+
/* Values for mktme_status (SW only construct) */
#define MKTME_ENABLED 0
#define MKTME_DISABLED 1
@@ -593,6 +604,17 @@ static void detect_tme(struct cpuinfo_x86 *c)
return;
}
+ /* Icelake Server quirk: do not enable MKTME if aliases are forbidden */
+ if (x86_match_cpu(mktme_status_msr_ids)) {
+ u64 status;
+ rdmsrl(MSR_ICX_MKTME_STATUS, status);
+
+ if (MKTME_ALIASES_FORBIDDEN(status)) {
+ pr_err_once("x86/tme: Directory Mode is enabled in BIOS\n");
+ mktme_status = MKTME_DISABLED;
+ }
+ }
+
if (mktme_status != MKTME_UNINITIALIZED)
goto detect_keyid_bits;
--
2.21.0
For !NUMA, khugepaged allocates the page in advance, before we have
found a VMA for collapse. We don't yet know which KeyID to use for the
allocation.
The page is allocated with KeyID-0. Once we know that the VMA is
suitable for collapsing, we prepare the page for the KeyID we need,
based on vma_keyid().
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
mm/khugepaged.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eaaa21b23215..ae9bd3b18aa1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1059,6 +1059,16 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
anon_vma_unlock_write(vma->anon_vma);
+ /*
+ * At this point new_page is allocated as non-encrypted.
+ * If the VMA's KeyID is non-zero, we need to prepare the page to
+ * be encrypted before copying data.
+ */
+ if (vma_keyid(vma)) {
+ prep_encrypted_page(new_page, HPAGE_PMD_ORDER,
+ vma_keyid(vma), false);
+ }
+
__collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
pte_unmap(pte);
__SetPageUptodate(new_page);
--
2.21.0
From: Alison Schofield <[email protected]>
Provide an overview of MKTME on Intel Platforms.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/index.rst | 1 +
Documentation/x86/mktme/index.rst | 8 +++
Documentation/x86/mktme/mktme_overview.rst | 57 ++++++++++++++++++++++
3 files changed, 66 insertions(+)
create mode 100644 Documentation/x86/mktme/index.rst
create mode 100644 Documentation/x86/mktme/mktme_overview.rst
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index af64c4bb4447..449bb6abeb0e 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -22,6 +22,7 @@ x86-specific Documentation
intel_mpx
intel-iommu
intel_txt
+ mktme/index
amd-memory-encryption
pti
mds
diff --git a/Documentation/x86/mktme/index.rst b/Documentation/x86/mktme/index.rst
new file mode 100644
index 000000000000..1614b52dd3e9
--- /dev/null
+++ b/Documentation/x86/mktme/index.rst
@@ -0,0 +1,8 @@
+
+=========================================
+Multi-Key Total Memory Encryption (MKTME)
+=========================================
+
+.. toctree::
+
+ mktme_overview
diff --git a/Documentation/x86/mktme/mktme_overview.rst b/Documentation/x86/mktme/mktme_overview.rst
new file mode 100644
index 000000000000..64c3268a508e
--- /dev/null
+++ b/Documentation/x86/mktme/mktme_overview.rst
@@ -0,0 +1,57 @@
+Overview
+=========
+Multi-Key Total Memory Encryption (MKTME)[1] is a technology that
+allows transparent memory encryption in upcoming Intel platforms.
+It uses a new instruction (PCONFIG) for key setup and selects a
+key for individual pages by repurposing physical address bits in
+the page tables.
+
+Support for MKTME is added to the existing kernel keyring subsystem
+and via a new mprotect_encrypt() system call that can be used by
+applications to encrypt anonymous memory with keys obtained from
+the keyring.
+
+This architecture supports encrypting both normal, volatile DRAM
+and persistent memory. However, persistent memory support is
+not included in the Linux kernel implementation at this time.
+(We anticipate adding that support next.)
+
+Hardware Background
+===================
+
+MKTME is built on top of an existing single-key technology called
+TME. TME encrypts all system memory using a single key generated
+by the CPU on every boot of the system. TME provides mitigation
+against physical attacks, such as physically removing a DIMM or
+watching memory bus traffic.
+
+MKTME enables the use of multiple encryption keys[2], allowing
+selection of the encryption key per-page using the page tables.
+Encryption keys are programmed into each memory controller and
+the same set of keys is available to all entities on the system
+with access to that memory (all cores, DMA engines, etc...).
+
+MKTME inherits many of the mitigations against hardware attacks
+from TME. Like TME, MKTME does not mitigate vulnerable or
+malicious operating systems or virtual machine managers. MKTME
+offers additional mitigations when compared to TME.
+
+TME and MKTME use the AES encryption algorithm in the AES-XTS
+mode. This mode, typically used for block-based storage devices,
+takes the physical address of the data into account when
+encrypting each block. This ensures that the effective key is
+different for each block of memory. Moving encrypted content
+across physical addresses results in garbage on read, mitigating
+block-relocation attacks. This property is the reason many of
+the discussed attacks require control of a shared physical page
+to be handed from the victim to the attacker.
+
+--
+1. https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf
+2. The MKTME architecture supports up to 16 bits of KeyIDs, so a
+ maximum of 65535 keys on top of the “TME key” at KeyID-0. The
+ first implementation is expected to support 6 bits, making 63
+ keys available to applications. However, this is not guaranteed.
+ The number of available keys could be reduced if, for instance,
+ additional physical address space is desired over additional
+ KeyIDs.
--
2.21.0
If all pages in the VMA got unmapped there's no reason to link it into
the original anon VMA hierarchy: it cannot possibly share any pages
with any other VMA.
Set vma->anon_vma to NULL in unlink_anon_vmas(). With this change the
VMA can be reused. A new anon VMA will be allocated on the first fault.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
mm/rmap.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..911367b5fb40 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -400,8 +400,10 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
list_del(&avc->same_vma);
anon_vma_chain_free(avc);
}
- if (vma->anon_vma)
+ if (vma->anon_vma) {
vma->anon_vma->degree--;
+ vma->anon_vma = NULL;
+ }
unlock_anon_vma_root(root);
/*
--
2.21.0
Add a new config option to enable/disable Multi-Key Total Memory
Encryption support.
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/Kconfig | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f2cc88fe8ada..d8551b612f3b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1550,6 +1550,25 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
If set to N, then the encryption of system memory can be
activated with the mem_encrypt=on command line option.
+config X86_INTEL_MKTME
+ bool "Intel Multi-Key Total Memory Encryption"
+ depends on X86_64 && CPU_SUP_INTEL && !KASAN
+ select X86_MEM_ENCRYPT_COMMON
+ select PAGE_EXTENSION
+ select KEYS
+ select ACPI_HMAT
+ ---help---
+ Say yes to enable support for Multi-Key Total Memory Encryption.
+ This requires an Intel processor that supports the feature.
+
+ Multikey Total Memory Encryption (MKTME) is a technology that allows
+ transparent memory encryption in upcoming Intel platforms.
+
+ MKTME is built on top of TME. TME allows encryption of the entirety
+ of system memory using a single key. MKTME allows having multiple
+ encryption domains, each with its own key -- different memory pages can
+ be encrypted with different keys.
+
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
@@ -2220,7 +2239,7 @@ config RANDOMIZE_MEMORY
config MEMORY_PHYSICAL_PADDING
hex "Physical memory mapping padding" if EXPERT
- depends on RANDOMIZE_MEMORY
+ depends on RANDOMIZE_MEMORY || X86_INTEL_MKTME
default "0xa" if MEMORY_HOTPLUG
default "0x0"
range 0x1 0x40 if MEMORY_HOTPLUG
--
2.21.0
On Wed, Jul 31, 2019 at 18:08:11 +0300, Kirill A. Shutemov wrote:
> + key = add_key("mktme", "name", "no-encrypt", strlen(options_CPU),
> + KEY_SPEC_THREAD_KEYRING);
Should this be `type=no-encrypt` here? Also, seems like copy/paste from
the `type=cpu` case for the `strlen` call.
--Ben
On Mon, Aug 05, 2019 at 07:58:37AM -0400, Ben Boeckel wrote:
> On Wed, Jul 31, 2019 at 18:08:11 +0300, Kirill A. Shutemov wrote:
> > + key = add_key("mktme", "name", "no-encrypt", strlen(options_CPU),
> > + KEY_SPEC_THREAD_KEYRING);
>
> Should this be `type=no-encrypt` here? Also, seems like copy/paste from
> the `type=cpu` case for the `strlen` call.
>
> --Ben
Yes. Fixed up as follows:
Add a "no-encrypt" type key::
char \*options_NOENCRYPT = "type=no-encrypt";
key = add_key("mktme", "name", options_NOENCRYPT,
strlen(options_NOENCRYPT), KEY_SPEC_THREAD_KEYRING);
Thanks for the review,
Alison
On 7/31/19 10:08 AM, Kirill A. Shutemov wrote:
> From: Kai Huang <[email protected]>
>
> Setup keyID to SPTE, which will be eventually programmed to shadow MMU
> or EPT table, according to page's associated keyID, so that guest is
> able to use correct keyID to access guest memory.
>
> Note current shadow_me_mask doesn't suit MKTME's needs, since for MKTME
> there's no fixed memory encryption mask, but can vary from keyID 1 to
> maximum keyID, therefore shadow_me_mask remains 0 for MKTME.
>
> Signed-off-by: Kai Huang <[email protected]>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 8f72526e2f68..b8742e6219f6 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2936,6 +2936,22 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
> #define SET_SPTE_WRITE_PROTECTED_PT BIT(0)
> #define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
>
> +static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
> +{
> +#ifdef CONFIG_X86_INTEL_MKTME
> + struct page *page;
> +
> + if (!pfn_valid(pfn))
> + return 0;
> +
> + page = pfn_to_page(pfn);
> +
> + return ((u64)page_keyid(page)) << mktme_keyid_shift();
> +#else
> + return shadow_me_mask;
> +#endif
> +}
This patch breaks AMD virtualization (SVM) in general (non-SEV and SEV
guests) when SME is active. Shouldn't this be a run time, vs build time,
check for MKTME being active?
Thanks,
Tom
> +
> static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> unsigned pte_access, int level,
> gfn_t gfn, kvm_pfn_t pfn, bool speculative,
> @@ -2982,7 +2998,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> pte_access &= ~ACC_WRITE_MASK;
>
> if (!kvm_is_mmio_pfn(pfn))
> - spte |= shadow_me_mask;
> + spte |= get_phys_encryption_mask(pfn);
>
> spte |= (u64)pfn << PAGE_SHIFT;
>
>
On Tue, Aug 06, 2019 at 08:26:52PM +0000, Lendacky, Thomas wrote:
> On 7/31/19 10:08 AM, Kirill A. Shutemov wrote:
> > From: Kai Huang <[email protected]>
> >
> > Setup keyID to SPTE, which will be eventually programmed to shadow MMU
> > or EPT table, according to page's associated keyID, so that guest is
> > able to use correct keyID to access guest memory.
> >
> > Note current shadow_me_mask doesn't suit MKTME's needs, since for MKTME
> > there's no fixed memory encryption mask, but can vary from keyID 1 to
> > maximum keyID, therefore shadow_me_mask remains 0 for MKTME.
> >
> > Signed-off-by: Kai Huang <[email protected]>
> > Signed-off-by: Kirill A. Shutemov <[email protected]>
> > ---
> > arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
> > 1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index 8f72526e2f68..b8742e6219f6 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2936,6 +2936,22 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
> > #define SET_SPTE_WRITE_PROTECTED_PT BIT(0)
> > #define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
> >
> > +static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
> > +{
> > +#ifdef CONFIG_X86_INTEL_MKTME
> > + struct page *page;
> > +
> > + if (!pfn_valid(pfn))
> > + return 0;
> > +
> > + page = pfn_to_page(pfn);
> > +
> > + return ((u64)page_keyid(page)) << mktme_keyid_shift();
> > +#else
> > + return shadow_me_mask;
> > +#endif
> > +}
>
> This patch breaks AMD virtualization (SVM) in general (non-SEV and SEV
> guests) when SME is active. Shouldn't this be a run time, vs build time,
> check for MKTME being active?
Thanks, I've missed this.
This fixup should help:
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 00d17bdfec0f..54931acf260e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2947,18 +2947,17 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
static u64 get_phys_encryption_mask(kvm_pfn_t pfn)
{
-#ifdef CONFIG_X86_INTEL_MKTME
struct page *page;
+ if (!mktme_enabled())
+ return shadow_me_mask;
+
if (!pfn_valid(pfn))
return 0;
page = pfn_to_page(pfn);
return ((u64)page_keyid(page)) << mktme_keyid_shift();
-#else
- return shadow_me_mask;
-#endif
}
static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
--
Kirill A. Shutemov
On Mon, Aug 05, 2019 at 13:44:53 -0700, Alison Schofield wrote:
> Yes. Fixed up as follows:
>
> Add a "no-encrypt" type key::
>
> char \*options_NOENCRYPT = "type=no-encrypt";
>
> key = add_key("mktme", "name", options_NOENCRYPT,
> strlen(options_NOENCRYPT), KEY_SPEC_THREAD_KEYRING);
Thanks. Looks good to me.
Reviewed-by: Ben Boeckel <[email protected]>
--Ben