2020-07-14 07:08:43

by Ira Weiny

Subject: [RFC PATCH 00/15] PKS: Add Protection Keys Supervisor (PKS) support

From: Ira Weiny <[email protected]>

This RFC series has been reviewed by Dave Hansen.

This patch set introduces a new page protection mechanism for supervisor pages,
Protection Keys Supervisor (PKS), and an initial user of it, persistent memory
(PMEM).

PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to those pages beyond the normal paging protections. Supervisor
pkeys work in a similar fashion to user space pkeys (PKU). Like PKU, supervisor
pkeys are checked in addition to normal paging protections, and access or
writes can be disabled via an MSR update without TLB flushes when permissions
change. A page mapping is assigned to a domain by setting a pkey in the page
table entry.

Unlike User pkeys no new instructions are added; rather WRMSR/RDMSR are used to
update the PKRS register.
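
As a rough illustration of the register layout described above, the following
user-space sketch models the PKRS value as a plain 32-bit integer: 16 keys, two
bits each, where the low bit of each pair disables all access and the high bit
disables writes. The constant values are assumed to match PKEY_DISABLE_ACCESS,
PKEY_DISABLE_WRITE and PKR_BITS_PER_PKEY as used by the test code in this
series. The helper names (pkrs_set_perm, pkrs_get_perm, pkrs_faults) are
hypothetical and exist only for this example; in the kernel the resulting value
would be written with WRMSR rather than kept in a variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the kernel definitions used by this series. */
#define PKEY_DISABLE_ACCESS 0x1u
#define PKEY_DISABLE_WRITE  0x2u
#define PKR_BITS_PER_PKEY   2
#define PKS_NUM_KEYS        16

/* Hypothetical helper: return pkrs with the two bits of 'pkey' set to 'perm'. */
static uint32_t pkrs_set_perm(uint32_t pkrs, unsigned int pkey, uint32_t perm)
{
	unsigned int shift;

	assert(pkey < PKS_NUM_KEYS);
	shift = pkey * PKR_BITS_PER_PKEY;
	pkrs &= ~(0x3u << shift);        /* clear the old AD/WD bits */
	pkrs |= (perm & 0x3u) << shift;  /* install the new ones */
	return pkrs;
}

/* Hypothetical helper: extract the AD/WD bits for 'pkey'. */
static uint32_t pkrs_get_perm(uint32_t pkrs, unsigned int pkey)
{
	assert(pkey < PKS_NUM_KEYS);
	return (pkrs >> (pkey * PKR_BITS_PER_PKEY)) & 0x3u;
}

/* Would the pkey check reject this access to a mapping tagged with 'pkey'? */
static bool pkrs_faults(uint32_t pkrs, unsigned int pkey, bool is_write)
{
	uint32_t perm = pkrs_get_perm(pkrs, pkey);

	if (perm & PKEY_DISABLE_ACCESS)
		return true;
	return is_write && (perm & PKEY_DISABLE_WRITE);
}
```

For example, pkrs_set_perm(0, 1, PKEY_DISABLE_WRITE) yields 0x8: reads through
pkey 1 are still allowed, writes are rejected, and pkey 0 is unaffected.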

XSAVE is not supported for the PKRS MSR. To reduce software complexity the
implementation saves/restores the MSR across context switches but not during
irqs. This is a compromise which hardens the kernel against unwanted access
without providing absolute restriction.
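
The save/restore behaviour described above can be pictured with a small
user-space model. This is only a sketch under assumptions: the variable
mock_pkrs_msr stands in for the real MSR, and struct mock_thread,
mock_pkrs_write() and mock_switch_to() are hypothetical names, not functions
from this series. It assumes the saved copy is updated whenever the register is
written and is reloaded when a thread is switched in.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the per-CPU PKRS MSR; real code would use RDMSR/WRMSR. */
static uint32_t mock_pkrs_msr;

/* Hypothetical per-thread state holding the saved copy of the register. */
struct mock_thread {
	uint32_t saved_pkrs;
};

/* Hypothetical update path: change the register and remember the value. */
static void mock_pkrs_write(struct mock_thread *cur, uint32_t val)
{
	assert(cur);
	cur->saved_pkrs = val;
	mock_pkrs_msr = val;
}

/* On a context switch, load the incoming thread's saved value. */
static void mock_switch_to(struct mock_thread *next)
{
	assert(next);
	mock_pkrs_msr = next->saved_pkrs;
}
```

In this model a thread that sets its permissions, is switched out, and is later
switched back in observes the same register value it wrote, which is the
property the context switch test in this series checks.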

For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.

Other keys (1-15) are handed out by an allocator, which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail, either because of key exhaustion or because PKS is not supported on the
arch and/or CPU instance.
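
To make the failure handling concrete, here is a user-space mock of such an
allocator and a caller that degrades gracefully. Everything prefixed with
mock_ is hypothetical and is not the API added by this series (the test code in
the series calls pks_key_alloc() and pks_key_free()); the -ENOSPC return value
is likewise an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PKS_NUM_KEYS 16

/* Bit n set means key n is in use; key 0 is reserved and never handed out. */
static uint16_t mock_keys_in_use = 0x1;

/* Hypothetical allocator: returns a key in 1..15, or -ENOSPC when exhausted. */
static int mock_pks_key_alloc(void)
{
	for (int key = 1; key < PKS_NUM_KEYS; key++) {
		if (!(mock_keys_in_use & (1u << key))) {
			mock_keys_in_use |= (uint16_t)(1u << key);
			return key;
		}
	}
	return -ENOSPC;
}

/* Hypothetical release: mark a previously allocated key as free again. */
static void mock_pks_key_free(int key)
{
	assert(key > 0 && key < PKS_NUM_KEYS);
	mock_keys_in_use &= (uint16_t)~(1u << key);
}

/*
 * A caller prepared for failure: if no key is available it falls back to
 * pkey 0, i.e. the mapping keeps only the normal paging protections.
 */
static int mock_protect_domain(int *pkey_out)
{
	int key = mock_pks_key_alloc();

	if (key < 0) {
		*pkey_out = 0;
		return key;
	}
	*pkey_out = key;
	return 0;
}
```

The point of the fallback is that a user of the allocator still works when no
key is available; it simply loses the additional protection.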

Protecting against stray writes is particularly important for PMEM because,
unlike writes to anonymous memory, writes to PMEM persist across a reboot.
Thus data corruption could result in permanent loss of data.

The following attributes of PKS make it well suited as a mechanism to protect
PMEM from stray access within the kernel:

1) Fast switching of permissions
2) Prevents access without page table manipulations
3) Works on a per thread basis
4) No TLB flushes required

The second half of this series thus uses the PKS mechanism to protect PMEM from
stray access.


Implementation details
----------------------

Modifications of task struct in patches:
(x86/pks: Preserve the PKRS MSR on context switch)
(memremap: Add zone device access protection)

Because pkey access is per-thread, 2 modifications are made to the task struct.
The first is a saved copy of the MSR during context switches. The second
reference counts access to the device domain to correctly handle kmap nesting.
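
The nesting problem the reference count addresses can be sketched in user space
as follows. This is a simplified model, not the code from the memremap patch:
struct mock_task and the two helpers are hypothetical names, the boolean stands
in for the pkey bits in PKRS, and the interrupt protection discussed in the
next section is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; in the series this lives in the task struct. */
struct mock_task {
	unsigned int dev_access_ref;  /* nesting depth of enabled access */
	bool dev_access_enabled;      /* stands in for the pkey bits in PKRS */
};

/* Only the outermost enable needs to touch the (mock) register. */
static void mock_dev_access_enable(struct mock_task *t)
{
	if (t->dev_access_ref++ == 0)
		t->dev_access_enabled = true;
}

/* Only the outermost disable revokes access again. */
static void mock_dev_access_disable(struct mock_task *t)
{
	assert(t->dev_access_ref > 0);
	if (--t->dev_access_ref == 0)
		t->dev_access_enabled = false;
}
```

With this scheme an inner kmap/kunmap pair does not revoke access that an outer
kmap still relies on; access is dropped only when the count returns to zero.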


Maintain PKS setting in a re-entrant manner in patch:
(memremap: Add zone device access protection)

Using local_irq_save() seems to be the safest and fastest way to keep kmap
re-entrant, but there may be a better way. spin_lock_irq() and atomic counters
were considered. However, atomic counters do not properly protect the pkey
update, and spin_lock_irq() is unnecessary as the pkey protections are thread
local. Suggestions are welcome.


The use of kmap in patch:
(kmap: Add stray write protection for device pages)

To keep general access to PMEM pages general, we piggy back on the kmap()
interface, as there are many places in the kernel that do not have, nor should
be required to have, a priori knowledge that a page is PMEM. The modifications
to the kmap code are careful to quickly determine which pages don't require
special handling, to reduce overhead for non-PMEM pages.



Breakdown of patches
--------------------

Implement PKS within x86 arch:

x86/pkeys: Create pkeys_internal.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Preserve the PKRS MSR on context switch
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code

pre-req bug fixes for dax:

fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses

Add stray write protection to PMEM:

memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection

Fenghua Yu (4):
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys

Ira Weiny (11):
x86/pkeys: Create pkeys_internal.h
x86/pks: Preserve the PKRS MSR on context switch
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code
fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses
memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection

Documentation/core-api/protection-keys.rst | 81 +++-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 13 +-
arch/x86/include/asm/pgtable_types.h | 4 +
arch/x86/include/asm/pkeys.h | 43 ++
arch/x86/include/asm/pkeys_internal.h | 35 ++
arch/x86/include/asm/processor.h | 13 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 17 +
arch/x86/kernel/fpu/xstate.c | 17 +-
arch/x86/kernel/process.c | 35 ++
arch/x86/mm/fault.c | 16 +-
arch/x86/mm/pkeys.c | 174 +++++++-
drivers/dax/device.c | 2 +
drivers/dax/super.c | 5 +-
drivers/nvdimm/pmem.c | 6 +
fs/dax.c | 13 +-
include/linux/highmem.h | 32 +-
include/linux/memremap.h | 1 +
include/linux/mm.h | 33 ++
include/linux/pkeys.h | 18 +
include/linux/sched.h | 3 +
init/init_task.c | 3 +
kernel/fork.c | 3 +
lib/Kconfig.debug | 12 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 452 ++++++++++++++++++++
mm/Kconfig | 15 +
mm/memremap.c | 111 +++++
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 65 +++
34 files changed, 1175 insertions(+), 61 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_internal.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c

--
2.25.1


2020-07-14 07:12:48

by Ira Weiny

Subject: [RFC PATCH 08/15] x86/pks: Add PKS Test code

From: Ira Weiny <[email protected]>

The core PKS functionality provides an interface for kernel users to
reserve keys for their domains, set up the page tables with those keys, and
control access to those domains when needed.

Define test code which exercises the core functionality of PKS via a
debugfs entry. Basic checks can be triggered on boot with a kernel
command line option while both basic and preemption checks can be
triggered with separate debugfs values.

debugfs controls are:

'0' -- Run access tests with a single pkey
'1' -- Set up the pkey register with no access for the pkey allocated to
this fd
'2' -- Check that the pkey register updated in '1' is still the same.
(To be used after a forced context switch.)
'3' -- Allocate all pkeys possible and run tests on each pkey allocated.
DEFAULT when run at boot.

Closing the fd will clean up and release the pkey; therefore, to fully
exercise context switch testing, a user space program is provided in:

.../tools/testing/selftests/x86/test_pks.c

Co-developed-by: Fenghua Yu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
arch/x86/include/asm/pkeys.h | 9 +
arch/x86/mm/fault.c | 16 +-
include/linux/pkeys.h | 4 +
lib/Kconfig.debug | 12 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 452 +++++++++++++++++++++++++
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 65 ++++
9 files changed, 562 insertions(+), 5 deletions(-)
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index e30ea907abb6..097abca7784c 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -168,4 +168,13 @@ static inline void pks_update_protection(int pkey, unsigned long protection)
}
#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */

+#if defined(CONFIG_PKS_TESTING)
+bool pks_test_armed_and_clear(void);
+#else
+static inline bool pks_test_armed_and_clear(void)
+{
+ return false;
+}
+#endif
+
#endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 1ead568c0101..483fbf5b7957 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -18,6 +18,7 @@
#include <linux/uaccess.h> /* faulthandler_disabled() */
#include <linux/efi.h> /* efi_recover_from_page_fault()*/
#include <linux/mm_types.h>
+#include <linux/pkeys.h>

#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/traps.h> /* dotraplinkage, ... */
@@ -1105,11 +1106,18 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
unsigned long address)
{
/*
- * Protection keys exceptions only happen on user pages. We
- * have no user pages in the kernel portion of the address
- * space, so do not expect them here.
+ * If we get a protection key exception it could be because we are
+ * running the PKS test. If so, pks_test_armed_and_clear() will clear
+ * the protection mechanism and we can safely return.
+ *
+ * Otherwise we warn the user that something has gone wrong and
+ * continue with the fault.
*/
- WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+ if (hw_error_code & X86_PF_PK) {
+ if (pks_test_armed_and_clear())
+ return;
+ WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+ }

/* Was the fault spurious, caused by lazy TLB invalidation? */
if (spurious_kernel_fault(hw_error_code, address))
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index e4bff77d7b49..1d84ab7c12d4 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -48,6 +48,10 @@ static inline void copy_init_pkru_to_fpregs(void)
{
}

+static inline bool pks_test_armed_and_clear(void)
+{
+ return false;
+}
#endif /* ! CONFIG_ARCH_HAS_PKEYS */

#ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9ad9210d70a1..aa876ebb4c8b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2329,6 +2329,18 @@ config HYPERV_TESTING
help
Select this option to enable Hyper-V vmbus testing.

+config PKS_TESTING
+ bool "PKey(S)upervisor testing"
+ default n
+ depends on ARCH_HAS_SUPERVISOR_PKEYS
+ help
+ Select this option to enable testing of PKS core software and
+ hardware. The PKS core provides a mechanism to allocate keys as well
+ as maintain the protection settings across context switches.
+ Answer N if you don't know what supervisor keys are.
+
+ If unsure, say N.
+
endmenu # "Kernel Testing and Coverage"

endmenu # Kernel hacking
diff --git a/lib/Makefile b/lib/Makefile
index b1c42c10073b..667dea28cf7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -318,3 +318,6 @@ obj-$(CONFIG_OBJAGG) += objagg.o
# KUnit tests
obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
+
+# PKS test
+obj-y += pks/
diff --git a/lib/pks/Makefile b/lib/pks/Makefile
new file mode 100644
index 000000000000..7d1df7563db9
--- /dev/null
+++ b/lib/pks/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_PKS_TESTING) += pks_test.o
diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
new file mode 100644
index 000000000000..6d8172734f97
--- /dev/null
+++ b/lib/pks/pks_test.c
@@ -0,0 +1,452 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright(c) 2020 Intel Corporation. All rights reserved.
+ *
+ * Implement PKS testing
+ * Access to run this test can be with a command line parameter
+ * ("pks-test-on-boot") or more detailed tests can be triggered through:
+ *
+ * /sys/kernel/debug/x86/run_pks
+ *
+ * debugfs controls are:
+ *
+ * '0' -- Run access tests with a single pkey
+ *
+ * '1' -- Set up the pkey register with no access for the pkey allocated to
+ * this fd
+ * '2' -- Check that the pkey register updated in '1' is still the same. (To
+ * be used after a forced context switch.)
+ *
+ * '3' -- Allocate all pkeys possible and run tests on each pkey allocated.
+ * DEFAULT when run at boot.
+ *
+ * Closing the fd will cleanup and release the pkey.
+ *
+ * A companion user space program is provided in:
+ *
+ * .../tools/testing/selftests/x86/test_pks.c
+ *
+ * which will better test the context switching.
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/pgtable.h>
+
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/mman.h>
+#include <linux/module.h>
+#include <linux/pkeys.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+
+#define PKS_TEST_MEM_SIZE (PAGE_SIZE)
+
+/*
+ * run_on_boot defaults to false; checkpatch complains about explicitly
+ * initializing a static to false, so we don't
+ */
+static bool run_on_boot;
+static struct dentry *pks_test_dentry;
+
+/*
+ * We must lock the following globals for brief periods while the fault handler
+ * checks/updates them.
+ */
+static DEFINE_SPINLOCK(test_lock);
+static int test_armed_key;
+static unsigned long prev_cnt;
+static unsigned long fault_cnt;
+
+struct pks_test_ctx {
+ bool pass;
+ bool pks_cpu_enabled;
+ int pkey;
+ char data[64];
+};
+
+static pte_t *walk_table(void *ptr)
+{
+ struct page *page = NULL;
+ pgd_t *pgdp;
+ p4d_t *p4dp;
+ pud_t *pudp;
+ pmd_t *pmdp;
+ pte_t *ret = NULL;
+
+ pgdp = pgd_offset_k((unsigned long)ptr);
+ if (pgd_none(*pgdp) || pgd_bad(*pgdp))
+ goto error;
+
+ p4dp = p4d_offset(pgdp, (unsigned long)ptr);
+ if (p4d_none(*p4dp) || p4d_bad(*p4dp))
+ goto error;
+
+ pudp = pud_offset(p4dp, (unsigned long)ptr);
+ if (pud_none(*pudp) || pud_bad(*pudp))
+ goto error;
+
+ pmdp = pmd_offset(pudp, (unsigned long)ptr);
+ if (pmd_none(*pmdp) || pmd_bad(*pmdp))
+ goto error;
+
+ ret = pte_offset_map(pmdp, (unsigned long)ptr);
+ if (pte_present(*ret)) {
+ page = pte_page(*ret);
+ if (!page) {
+ pte_unmap(ret);
+ goto error;
+ }
+ pr_info("page 0x%lx; flags 0x%lx\n",
+ (unsigned long)page, page->flags);
+ }
+
+error:
+ return ret;
+}
+
+/**
+ * pks_test_armed_and_clear() is exported so that the fault handler can detect
+ * and report back status of intentional faults.
+ *
+ * NOTE: It clears the protection key from the page such that the fault handler
+ * will not re-trigger.
+ */
+bool pks_test_armed_and_clear(void)
+{
+ bool armed = (test_armed_key != 0);
+
+ if (armed) {
+ /* Enable read and write to stop faults */
+ pks_update_protection(test_armed_key, 0);
+ fault_cnt++;
+ }
+
+ return armed;
+}
+EXPORT_SYMBOL(pks_test_armed_and_clear);
+
+static bool exception_caught(void)
+{
+ bool ret = (fault_cnt != prev_cnt);
+
+ prev_cnt = fault_cnt;
+ return ret;
+}
+
+static void report_pkey_settings(void *unused)
+{
+ u8 pkey;
+ unsigned long long msr = 0;
+ unsigned int cpu = smp_processor_id();
+
+ rdmsrl(MSR_IA32_PKRS, msr);
+
+ pr_info("for CPU %d : 0x%llx\n", cpu, msr);
+ for (pkey = 0; pkey < PKS_NUM_KEYS; pkey++) {
+ int ad, wd;
+
+ ad = (msr >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_DISABLE_ACCESS;
+ wd = (msr >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_DISABLE_WRITE;
+ pr_info(" %u: A:%d W:%d\n", pkey, ad, wd);
+ }
+}
+
+struct pks_access_test {
+ int ad;
+ int wd;
+ bool write;
+ bool exception;
+};
+
+static struct pks_access_test pkey_test_ary[] = {
+ /* disable both */
+ { PKEY_DISABLE_ACCESS, PKEY_DISABLE_WRITE, true, true },
+ { PKEY_DISABLE_ACCESS, PKEY_DISABLE_WRITE, false, true },
+
+ /* enable both */
+ { 0, 0, true, false },
+ { 0, 0, false, false },
+
+ /* enable read only */
+ { 0, PKEY_DISABLE_WRITE, true, true },
+ { 0, PKEY_DISABLE_WRITE, false, false },
+};
+
+static int run_access_test(struct pks_test_ctx *ctx,
+ struct pks_access_test *test,
+ void *ptr)
+{
+ int ret = 0;
+ bool exception;
+
+ pks_update_protection(ctx->pkey, test->ad | test->wd);
+
+ spin_lock(&test_lock);
+ test_armed_key = ctx->pkey;
+
+ if (test->write)
+ memcpy(ptr, ctx->data, 8);
+ else
+ memcpy(ctx->data, ptr, 8);
+
+ exception = exception_caught();
+
+ test_armed_key = 0;
+ spin_unlock(&test_lock);
+
+ if (test->exception != exception) {
+ pr_err("pkey test FAILED: ad %d; wd %d; write %s; exception %s != %s\n",
+ test->ad, test->wd,
+ test->write ? "TRUE" : "FALSE",
+ test->exception ? "TRUE" : "FALSE",
+ exception ? "TRUE" : "FALSE");
+ ret = -EFAULT;
+ }
+
+ return ret;
+}
+
+static void test_mem_access(struct pks_test_ctx *ctx)
+{
+ int i, rc;
+ u8 pkey;
+ void *ptr = NULL;
+ pte_t *ptep;
+
+ ptr = __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, VMALLOC_END,
+ GFP_KERNEL, PAGE_KERNEL_PKEY(ctx->pkey),
+ 0, NUMA_NO_NODE, __builtin_return_address(0));
+ if (!ptr) {
+ pr_err("Failed to vmalloc page???\n");
+ ctx->pass = false;
+ return;
+ }
+
+ ptep = walk_table(ptr);
+ if (!ptep) {
+ pr_err("Failed to walk table???\n");
+ ctx->pass = false;
+ goto done;
+ }
+
+ pkey = pte_flags_pkey(ptep->pte);
+ pr_info("ptep flags 0x%lx pkey %u\n",
+ (unsigned long)ptep->pte, pkey);
+
+ if (pkey != ctx->pkey) {
+ pr_err("invalid pkey found: %u, test_pkey: %u\n",
+ pkey, ctx->pkey);
+ ctx->pass = false;
+ goto unmap;
+ }
+
+ if (!ctx->pks_cpu_enabled) {
+ pr_err("not CPU enabled; skipping access tests...\n");
+ ctx->pass = true;
+ goto unmap;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(pkey_test_ary); i++) {
+ rc = run_access_test(ctx, &pkey_test_ary[i], ptr);
+
+ /* only save last error is fine */
+ if (rc)
+ ctx->pass = false;
+ }
+
+unmap:
+ pte_unmap(ptep);
+done:
+ vfree(ptr);
+}
+
+static void pks_run_test(struct pks_test_ctx *ctx)
+{
+ ctx->pass = true;
+
+ pr_info("\n");
+ pr_info("\n");
+ pr_info(" ***** BEGIN: Testing (CPU enabled : %s) *****\n",
+ ctx->pks_cpu_enabled ? "TRUE" : "FALSE");
+
+ if (ctx->pks_cpu_enabled)
+ on_each_cpu(report_pkey_settings, NULL, 1);
+
+ pr_info(" BEGIN: pkey %d Testing\n", ctx->pkey);
+ test_mem_access(ctx);
+ pr_info(" END: PAGE_KERNEL_PKEY Testing : %s\n",
+ ctx->pass ? "PASS" : "FAIL");
+
+ pr_info(" ***** END: Testing *****\n");
+ pr_info("\n");
+ pr_info("\n");
+}
+
+static ssize_t pks_read_file(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct pks_test_ctx *ctx = file->private_data;
+ char buf[32];
+ unsigned int len;
+
+ if (!ctx)
+ len = sprintf(buf, "not run\n");
+ else
+ len = sprintf(buf, "%s\n", ctx->pass ? "PASS" : "FAIL");
+
+ return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static struct pks_test_ctx *alloc_ctx(const char *name)
+{
+ struct pks_test_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+
+ if (!ctx) {
+ pr_err("Failed to allocate memory for test context\n");
+ return ERR_PTR(-ENOMEM);
+ }
+
+ ctx->pkey = pks_key_alloc(name);
+ if (ctx->pkey <= 0) {
+ pr_err("Failed to allocate a pkey for test context\n");
+ kfree(ctx);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ ctx->pks_cpu_enabled = cpu_feature_enabled(X86_FEATURE_PKS);
+ sprintf(ctx->data, "%s", "DEADBEEF");
+ return ctx;
+}
+
+static void free_ctx(struct pks_test_ctx *ctx)
+{
+ pks_key_free(ctx->pkey);
+ kfree(ctx);
+}
+
+static void run_all(void)
+{
+ struct pks_test_ctx *ctx[PKS_NUM_KEYS];
+ static char name[PKS_NUM_KEYS][64];
+ int i;
+
+ for (i = 1; i < PKS_NUM_KEYS; i++) {
+ sprintf(name[i], "pks ctx %d", i);
+ ctx[i] = alloc_ctx((const char *)name[i]);
+ }
+
+ for (i = 1; i < PKS_NUM_KEYS; i++) {
+ if (!IS_ERR(ctx[i]))
+ pks_run_test(ctx[i]);
+ }
+
+ for (i = 1; i < PKS_NUM_KEYS; i++) {
+ if (!IS_ERR(ctx[i]))
+ free_ctx(ctx[i]);
+ }
+}
+
+static ssize_t pks_write_file(struct file *file, const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ char buf[2];
+ struct pks_test_ctx *ctx = file->private_data;
+
+ if (copy_from_user(buf, user_buf, 1))
+ return -EFAULT;
+ buf[1] = '\0';
+
+ /*
+ * Test "3" will test allocating all keys. Do it first without
+ * using "ctx".
+ */
+ if (!strcmp(buf, "3"))
+ run_all();
+
+ if (!ctx) {
+ ctx = alloc_ctx("pks test");
+ if (IS_ERR(ctx))
+ return -ENOMEM;
+ file->private_data = ctx;
+ }
+
+ if (!strcmp(buf, "0"))
+ pks_run_test(ctx);
+
+ /* start of context switch test */
+ if (!strcmp(buf, "1")) {
+ /* Ensure a known state to test context switch */
+ pks_update_protection(ctx->pkey,
+ PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
+ }
+
+ /* After context switch msr should be restored */
+ if (!strcmp(buf, "2") && ctx->pks_cpu_enabled) {
+ unsigned long reg_pkrs;
+ int access;
+
+ rdmsrl(MSR_IA32_PKRS, reg_pkrs);
+
+ access = (reg_pkrs >> (ctx->pkey * PKR_BITS_PER_PKEY)) &
+ PKEY_ACCESS_MASK;
+ if (access != (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) {
+ ctx->pass = false;
+ pr_err("Context switch check failed\n");
+ }
+ }
+
+ return count;
+}
+
+static int pks_release_file(struct inode *inode, struct file *file)
+{
+ struct pks_test_ctx *ctx = file->private_data;
+
+ if (!ctx)
+ return 0;
+
+ free_ctx(ctx);
+ return 0;
+}
+
+static const struct file_operations fops_init_pks = {
+ .read = pks_read_file,
+ .write = pks_write_file,
+ .llseek = default_llseek,
+ .release = pks_release_file,
+};
+
+static int __init parse_pks_test_options(char *str)
+{
+ run_on_boot = true;
+
+ return 0;
+}
+early_param("pks-test-on-boot", parse_pks_test_options);
+
+static int __init pks_test_init(void)
+{
+ if (cpu_feature_enabled(X86_FEATURE_PKS)) {
+ if (run_on_boot)
+ run_all();
+
+ pks_test_dentry = debugfs_create_file("run_pks", 0600, arch_debugfs_dir,
+ NULL, &fops_init_pks);
+ }
+
+ return 0;
+}
+late_initcall(pks_test_init);
+
+static void __exit pks_test_exit(void)
+{
+ debugfs_remove(pks_test_dentry);
+ pr_info("test exit\n");
+}
+module_exit(pks_test_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL v2");
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index d2796ea98c5a..3572dfb25c0a 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -13,7 +13,8 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
check_initial_reg_state sigreturn iopl ioperm \
test_vdso test_vsyscall mov_ss_trap \
- syscall_arg_fault
+ syscall_arg_fault test_pks
+
TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c
new file mode 100644
index 000000000000..8037a2a9ff5f
--- /dev/null
+++ b/tools/testing/selftests/x86/test_pks.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <assert.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+int main(void)
+{
+ cpu_set_t cpuset;
+ char result[32] = { 0 }; /* zeroed so the bytes read below stay NUL-terminated */
+ pid_t pid;
+ int fd;
+
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ /* Two processes run on CPU 0 so that they go through context switch. */
+ sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);
+
+ pid = fork();
+ if (pid == 0) {
+ fd = open("/sys/kernel/debug/x86/run_pks", O_RDWR);
+ if (fd < 0) {
+ printf("cannot open file\n");
+ return -1;
+ }
+
+ /* Allocate test_pkey1 and run test. */
+ write(fd, "0", 1);
+
+ /* Arm for context switch test */
+ write(fd, "1", 1);
+
+ /* Context switch out... */
+ sleep(4);
+
+ /* Check msr restored */
+ write(fd, "2", 1);
+ } else {
+ sleep(2);
+
+ fd = open("/sys/kernel/debug/x86/run_pks", O_RDWR);
+ if (fd < 0) {
+ printf("cannot open file\n");
+ return -1;
+ }
+
+ /* run test with alternate pkey */
+ write(fd, "0", 1);
+ }
+
+ read(fd, result, 10);
+ printf("#PF, context switch, pkey allocation and free tests: %s\n",
+ result);
+
+ close(fd);
+
+ return 0;
+}
--
2.25.1