Memory protection keys enable an application to protect its
address space from inadvertent access or corruption by
itself.
These patches, along with the pte-bit freeing patch series,
enable the protection key feature on powerpc 4k and 64k
hashpage kernels. They also change the generic and x86
code to expose memkey features through sysfs. Finally,
the testcases and Documentation are updated.
All patches can be found at --
https://github.com/rampai/memorykeys.git memkey.v9
The overall idea:
-----------------
A process allocates a key and associates it with
an address range within its address space.
The process can then dynamically set read/write
permissions on the key without involving the
kernel. Any code that violates the permissions
of the address range, as defined by its associated
key, will receive a segmentation fault.
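The permission update that the process performs on its own can be sketched as plain bit manipulation. This is a minimal illustration, not code from the series: pkey_set_bits() is a hypothetical helper, the constants are the powerpc values from the selftest header in this series, and the shift mirrors the pkeyshift() macro in arch/powerpc/mm/pkeys.c assuming two bits per key.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the powerpc selftest header in this series. */
#define NR_PKEYS            32
#define BITS_PER_PKEY       2      /* assumption: two AMR bits per key */
#define PKEY_DISABLE_ACCESS 0x3ULL /* disable read and write */
#define PKEY_DISABLE_WRITE  0x2ULL

/* Key 0 occupies the two most significant bits of the 64-bit register. */
static unsigned int pkey_shift(int pkey)
{
	assert(pkey >= 0 && pkey < NR_PKEYS);
	return 64 - (pkey + 1) * BITS_PER_PKEY;
}

/* Hypothetical helper: return `reg` with the field for `pkey` set to `bits`. */
static uint64_t pkey_set_bits(uint64_t reg, int pkey, uint64_t bits)
{
	unsigned int shift = pkey_shift(pkey);

	reg &= ~(0x3ULL << shift);               /* clear the old two-bit field */
	return reg | ((bits & 0x3ULL) << shift); /* install the new permissions */
}
```

On real hardware the resulting value would then be written to the AMR from userspace, which is why no system call is needed for a permission change.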
This patch series enables the feature on PPC64 HPTE
platform.
ISA3.0 section 5.7.13 describes the detailed
specifications.
High-level view of the design:
------------------------------
When an application associates a key with an address
range, program the key in the Linux PTE.
When the MMU detects a page fault, allocate a hash
page and program the key into the HPTE. Finally,
when the MMU detects a key violation due to an
invalid application access, invoke the registered
signal handler and provide the violated key number.
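The three stages above can be pictured with a toy model. Everything in it is hypothetical (the toy_* types and functions are stand-ins, not kernel structures), and it treats the read-deny and write-deny settings of a key as independent flags. It only shows how the key travels from the PTE to the hash entry and how a violation yields the key number.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PKEYS 32

/* Toy stand-ins, not the kernel's structures. */
struct toy_pte  { int pkey; bool hashed; };
struct toy_hpte { int pkey; bool valid; };
struct toy_amr  { bool deny_read[NR_PKEYS]; bool deny_write[NR_PKEYS]; };

/* Stage 1: associating a key with a range records it in the Linux PTE. */
static void toy_assign_key(struct toy_pte *pte, int pkey)
{
	assert(pkey >= 0 && pkey < NR_PKEYS);
	pte->pkey = pkey;
}

/* Stage 2: on a page fault, a hash entry is created carrying the PTE's key. */
static void toy_hash_fault(struct toy_pte *pte, struct toy_hpte *hpte)
{
	hpte->pkey = pte->pkey;
	hpte->valid = true;
	pte->hashed = true;
}

/*
 * Stage 3: an access is checked against the key in the hash entry.
 * Returns -1 if the access is permitted, otherwise the violated key
 * number, which is what the signal handler would be given.
 */
static int toy_access(const struct toy_amr *amr, const struct toy_hpte *hpte,
		      bool is_write)
{
	int k = hpte->pkey;
	bool denied;

	assert(hpte->valid);
	denied = is_write ? amr->deny_write[k] : amr->deny_read[k];
	return denied ? k : -1;
}
```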
Testing:
-------
This patch series has passed all the protection key
tests available in the selftest directory. The
tests are updated to work on both x86 and powerpc.
The selftests have passed on x86 and powerpc hardware.
History:
-------
version v9:
(1) used jump-labels to optimize code
-- Balbir
(2) fixed a register initialization bug noted
by Balbir
(3) fixed inappropriate use of paca to pass
siginfo and keys to signal handler
(4) Cleaned up comment style so that it is not
right justified -- mpe
(5) restructured the patches to depend on the
availability of VM_PKEY_BIT4 in
include/linux/mm.h
(6) Incorporated comments from Dave Hansen
towards changes to selftest and got
them tested on x86.
version v8:
(1) Contents of the AMR register withdrawn from
the siginfo structure. Applications can always
read the AMR register.
(2) AMR/IAMR/UAMOR are now available through
ptrace system call. -- thanks to Thiago
(3) code changes to handle legacy power cpus
that do not support execute-disable.
(4) incorporates many code improvement
suggestions.
version v7:
(1) refers to device tree property to enable
protection keys.
(2) adds 4K PTE support.
(3) fixes a couple of bugs noticed by Thiago
(4) decouples this patch series from arch-
independent code. This patch series can
now stand by itself, with one kludge
patch(2).
version v6:
(1) selftest changes are broken down into 20
incremental patches.
(2) A separate key allocation mask that
includes PKEY_DISABLE_EXECUTE is
added for powerpc
(3) pkey feature is enabled for 64K HPT case
only. RPT and 4k HPT are disabled.
(4) Documentation is updated to better
capture the semantics.
(5) introduced arch_pkeys_enabled() to find
if an arch enables pkeys. Correspondingly
changed the logic that displays the
key value in smaps.
(6) code rearranged in many places based on
comments from Dave Hansen, Balbir,
Anshuman.
(7) fixed one bug where a bogus key could be
associated successfully in
pkey_mprotect().
version v5:
(1) reverted back to the old design -- store
the key in the pte, instead of bypassing
it. The v4 design slowed down the hash
page path.
(2) detects key violation when kernel is told
to access user pages.
(3) further refined the patches into smaller
consumable units
(4) page fault handlers capture the faulting
key from the pte instead of the vma. This
closes a race between the key update in
the vma and a key fault caused by the
key programmed in the pte.
(5) a key created with access-denied should
also set it up to deny write. Fixed it.
(6) protection-key number is displayed in
smaps the x86 way.
version v4:
(1) patches no longer depend on the pte bits
to program the hpte
-- comment by Balbir
(2) documentation updates
(3) fixed a bug in the selftest.
(4) unlike x86, powerpc lets signal handler
change key permission bits; the
change will persist across signal
handler boundaries. Earlier we
allowed the signal handler to
modify a field in the siginfo
structure which would then be used
by the kernel to program the key
protection register (AMR)
-- resolves an issue raised by Ben.
"Calls to sys_swapcontext with a
made-up context will end up with a
crap AMR if done by code who didn't
know about that register".
(5) these changes enable protection keys on
4k-page kernel as well.
version v3:
(1) split the patches into smaller consumable
patches.
(2) added the ability to disable execute
permission on a key at creation.
(3) rename calc_pte_to_hpte_pkey_bits() to
pte_to_hpte_pkey_bits()
-- suggested by Anshuman
(4) some code optimization and clarity in
do_page_fault()
(5) A bug fix while invalidating a hpte slot
in __hash_page_4K()
-- noticed by Aneesh
version v2:
(1) documentation and selftest added.
(2) fixed a bug in 4k hpte backed 64k pte
where page invalidation was not
done correctly, and initialization
of second-part-of-the-pte was not
done correctly if the pte was not
yet Hashed with a hpte.
-- Reported by Aneesh.
(3) Fixed ABI breakage caused in siginfo
structure.
-- Reported by Anshuman.
version v1: Initial version
Ram Pai (47):
mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS
is enabled
mm, powerpc, x86: introduce an additional vma bit for powerpc pkey
powerpc: initial pkey plumbing
powerpc: track allocation status of all pkeys
powerpc: helper function to read,write AMR,IAMR,UAMOR registers
powerpc: helper functions to initialize AMR, IAMR and UAMOR registers
powerpc: cleanup AMR, IAMR when a key is allocated or freed
powerpc: implementation for arch_set_user_pkey_access()
powerpc: ability to create execute-disabled pkeys
powerpc: store and restore the pkey state across context switches
powerpc: introduce execute-only pkey
powerpc: ability to associate pkey to a vma
powerpc: implementation for arch_override_mprotect_pkey()
powerpc: map vma key-protection bits to pte key bits.
powerpc: Program HPTE key protection bits
powerpc: helper to validate key-access permissions of a pte
powerpc: check key protection for user page access
powerpc: implementation for arch_vma_access_permitted()
powerpc: Handle exceptions caused by pkey violation
powerpc: introduce get_mm_addr_key() helper
powerpc: Deliver SEGV signal on pkey violation
powerpc: Enable pkey subsystem
powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
powerpc: sys_pkey_mprotect() system call
powerpc: add sys_pkey_modify() system call
mm, x86 : introduce arch_pkeys_enabled()
mm: display pkey in smaps if arch_pkeys_enabled() is true
Documentation/x86: Move protecton key documentation to arch neutral
directory
Documentation/vm: PowerPC specific updates to memory protection keys
selftest/x86: Move protecton key selftest to arch neutral directory
selftest/vm: rename all references to pkru to a generic name
selftest/vm: move generic definitions to header file
selftest/vm: typecast the pkey register
selftest/vm: generic function to handle shadow key register
selftest/vm: fix the wrong assert in pkey_disable_set()
selftest/vm: fixed bugs in pkey_disable_clear()
selftest/vm: clear the bits in shadow reg when a pkey is freed.
selftest/vm: fix alloc_random_pkey() to make it really random
selftest/vm: introduce two arch independent abstraction
selftest/vm: pkey register should match shadow pkey
selftest/vm: generic cleanup
selftest/vm: powerpc implementation for generic abstraction
selftest/vm: fix an assertion in test_pkey_alloc_exhaust()
selftest/vm: associate key on a mapped page and detect access
violation
selftest/vm: associate key on a mapped page and detect write
violation
selftest/vm: detect write violation on a mapped access-denied-key
page
selftest/vm: sub-page allocator
Thiago Jung Bauermann (4):
powerpc/ptrace: Add memory protection key regset
mm/mprotect, powerpc/mm/pkeys, x86/mm/pkeys: Add sysfs interface
selftests/powerpc: Add ptrace tests for Protection Key register
selftests/powerpc: Add core file test for Protection Key register
Documentation/vm/protection-keys.txt | 161 +++
Documentation/x86/protection-keys.txt | 85 --
arch/powerpc/Kconfig | 15 +
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +
arch/powerpc/include/asm/book3s/64/mmu.h | 10 +
arch/powerpc/include/asm/book3s/64/pgtable.h | 42 +-
arch/powerpc/include/asm/bug.h | 1 +
arch/powerpc/include/asm/cputable.h | 15 +-
arch/powerpc/include/asm/mman.h | 13 +-
arch/powerpc/include/asm/mmu.h | 9 +
arch/powerpc/include/asm/mmu_context.h | 24 +
arch/powerpc/include/asm/pkeys.h | 247 ++++
arch/powerpc/include/asm/processor.h | 5 +
arch/powerpc/include/asm/systbl.h | 4 +
arch/powerpc/include/asm/unistd.h | 6 +-
arch/powerpc/include/uapi/asm/elf.h | 1 +
arch/powerpc/include/uapi/asm/mman.h | 6 +
arch/powerpc/include/uapi/asm/unistd.h | 4 +
arch/powerpc/kernel/entry_64.S | 9 +
arch/powerpc/kernel/process.c | 7 +
arch/powerpc/kernel/prom.c | 18 +
arch/powerpc/kernel/ptrace.c | 66 +
arch/powerpc/kernel/traps.c | 19 +-
arch/powerpc/mm/Makefile | 1 +
arch/powerpc/mm/fault.c | 49 +-
arch/powerpc/mm/hash_utils_64.c | 29 +
arch/powerpc/mm/mmu_context_book3s64.c | 2 +
arch/powerpc/mm/pkeys.c | 463 +++++++
arch/x86/include/asm/mmu_context.h | 4 +-
arch/x86/include/asm/pkeys.h | 2 +
arch/x86/kernel/fpu/xstate.c | 5 +
arch/x86/kernel/setup.c | 8 -
arch/x86/mm/pkeys.c | 9 +
fs/proc/task_mmu.c | 16 +-
include/linux/mm.h | 12 +-
include/linux/pkeys.h | 7 +-
include/uapi/linux/elf.h | 1 +
mm/mprotect.c | 88 ++
tools/testing/selftests/powerpc/include/reg.h | 1 +
tools/testing/selftests/powerpc/ptrace/Makefile | 5 +-
tools/testing/selftests/powerpc/ptrace/core-pkey.c | 438 ++++++
.../testing/selftests/powerpc/ptrace/ptrace-pkey.c | 443 ++++++
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/pkey-helpers.h | 405 ++++++
tools/testing/selftests/vm/protection_keys.c | 1464 ++++++++++++++++++++
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/pkey-helpers.h | 220 ---
tools/testing/selftests/x86/protection_keys.c | 1395 -------------------
48 files changed, 4095 insertions(+), 1747 deletions(-)
create mode 100644 Documentation/vm/protection-keys.txt
delete mode 100644 Documentation/x86/protection-keys.txt
create mode 100644 arch/powerpc/include/asm/pkeys.h
create mode 100644 arch/powerpc/mm/pkeys.c
create mode 100644 tools/testing/selftests/powerpc/ptrace/core-pkey.c
create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
create mode 100644 tools/testing/selftests/vm/protection_keys.c
delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
delete mode 100644 tools/testing/selftests/x86/protection_keys.c
From 1583962828613438233@xxx Mon Nov 13 14:51:07 +0000 2017
PAPR defines the 'ibm,processor-storage-keys' property. It exports two
values. The first value holds the number of data-access keys and the
second holds the number of instruction-access keys. Due to a bug in
the firmware, the number of instruction-access keys is always reported
as zero. However, any key can be configured to disable data access
and/or execution access. The unavailability of the second value is not
a big handicap, though it could have been used to determine whether the
platform supports disabling execution access.
Non-PAPR platforms do not define this property in the device tree yet.
For those, we hardcode the CPUs that support pkeys by consulting
PowerISA3.0.
This patch calculates the number of keys supported by the platform.
It also determines the platform's support for read/write/execute access
control on pkeys.
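The property decoding described above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: be32_cell(), parse_storage_keys() and effective_pkeys_total() are hypothetical names, and the raw property is modelled as a byte buffer of big-endian 32-bit cells. A missing cell is treated as zero, and a zero data-key count falls back to 32, as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 32-bit cell without relying on host byte order. */
static uint32_t be32_cell(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical userspace analogue of the property parsing in this patch.
 * `prop` is the raw property value, `len` its length in bytes.
 * Missing cells are reported as zero.
 */
static void parse_storage_keys(const unsigned char *prop, size_t len,
			       uint32_t *total_data, uint32_t *total_execute)
{
	size_t cells = len / sizeof(uint32_t);

	assert(total_data && total_execute);
	*total_data    = (prop && cells >= 1) ? be32_cell(prop)     : 0;
	*total_execute = (prop && cells >= 2) ? be32_cell(prop + 4) : 0;
}

/* The patch assumes 32 keys when no count was supplied. */
static uint32_t effective_pkeys_total(uint32_t total_data)
{
	return total_data ? total_data : 32;
}
```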
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/cputable.h | 15 +++++++++----
arch/powerpc/include/asm/mmu_context.h | 1 +
arch/powerpc/include/asm/pkeys.h | 10 +++++++++
arch/powerpc/kernel/prom.c | 18 +++++++++++++++++
arch/powerpc/mm/pkeys.c | 33 +++++++++++++++++++++----------
5 files changed, 61 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 53b31c2..b288735 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -215,7 +215,9 @@ enum {
#define CPU_FTR_DAWR LONG_ASM_CONST(0x0400000000000000)
#define CPU_FTR_DABRX LONG_ASM_CONST(0x0800000000000000)
#define CPU_FTR_PMAO_BUG LONG_ASM_CONST(0x1000000000000000)
+#define CPU_FTR_PKEY LONG_ASM_CONST(0x2000000000000000)
#define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000000000000000)
+#define CPU_FTR_PKEY_EXECUTE LONG_ASM_CONST(0x8000000000000000)
#ifndef __ASSEMBLY__
@@ -436,7 +438,8 @@ enum {
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | CPU_FTR_PURR | \
- CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_DABRX)
+ CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_DABRX | \
+ CPU_FTR_PKEY)
#define CPU_FTRS_POWER6 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -444,7 +447,7 @@ enum {
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_CFAR | \
- CPU_FTR_DABRX)
+ CPU_FTR_DABRX | CPU_FTR_PKEY)
#define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -453,7 +456,7 @@ enum {
CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | \
- CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX)
+ CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY)
#define CPU_FTRS_POWER8 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -463,7 +466,8 @@ enum {
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
- CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
+ CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_PKEY |\
+ CPU_FTR_PKEY_EXECUTE)
#define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
#define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
#define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
@@ -475,7 +479,8 @@ enum {
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
- CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
+ CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | \
+ CPU_FTR_PKEY | CPU_FTR_PKEY_EXECUTE)
#define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
(~CPU_FTR_SAO))
#define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 95a3288..5a15d37 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -152,6 +152,7 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
#define thread_pkey_regs_save(thread)
#define thread_pkey_regs_restore(new_thread, old_thread)
#define thread_pkey_regs_init(thread)
+#define pkey_mmu_values(total_data, total_execute)
static inline int vma_pkey(struct vm_area_struct *vma)
{
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 9ee4731..333fb28 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -13,6 +13,7 @@
#define _ASM_POWERPC_KEYS_H
#include <linux/jump_label.h>
+#include <asm/firmware.h>
DECLARE_STATIC_KEY_TRUE(pkey_disabled);
extern int pkeys_total; /* total pkeys as per device tree */
@@ -227,6 +228,15 @@ static inline void pkey_mm_init(struct mm_struct *mm)
mm->context.execute_only_pkey = -1;
}
+static inline void pkey_mmu_values(int total_data, int total_execute)
+{
+ /*
+ * Since any pkey can be used for data or execute, we will just treat
+ * all keys as equal and track them as one entity.
+ */
+ pkeys_total = total_data;
+}
+
extern void thread_pkey_regs_save(struct thread_struct *thread);
extern void thread_pkey_regs_restore(struct thread_struct *new_thread,
struct thread_struct *old_thread);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index f830562..8b75e9b 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -35,6 +35,7 @@
#include <linux/of_fdt.h>
#include <linux/libfdt.h>
#include <linux/cpu.h>
+#include <linux/pkeys.h>
#include <asm/prom.h>
#include <asm/rtas.h>
@@ -228,6 +229,22 @@ static void __init check_cpu_pa_features(unsigned long node)
ibm_pa_features, ARRAY_SIZE(ibm_pa_features));
}
+static void __init check_cpu_pkey_feature(unsigned long node)
+{
+ const __be32 *ftrs;
+ int len, total_data, total_execute;
+
+ ftrs = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", &len);
+ if (ftrs == NULL)
+ return;
+
+ len /= sizeof(int);
+ total_execute = (len >= 2) ? be32_to_cpu(ftrs[1]) : 0;
+ total_data = (len >= 1) ? be32_to_cpu(ftrs[0]) : 0;
+ pkey_mmu_values(total_data, total_execute);
+}
+
+
#ifdef CONFIG_PPC_STD_MMU_64
static void __init init_mmu_slb_size(unsigned long node)
{
@@ -391,6 +408,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
check_cpu_feature_properties(node);
check_cpu_pa_features(node);
+ check_cpu_pkey_feature(node);
}
identical_pvr_fixup(node);
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 3b221bd..5047371 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -26,6 +26,14 @@
#define PKEY_REG_BITS (sizeof(u64)*8)
#define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
+static inline bool pkey_mmu_enabled(void)
+{
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ return pkeys_total;
+ else
+ return cpu_has_feature(CPU_FTR_PKEY);
+}
+
void __init pkey_initialize(void)
{
int os_reserved, i;
@@ -46,14 +54,9 @@ void __init pkey_initialize(void)
__builtin_popcountl(ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT)
!= (sizeof(u64) * BITS_PER_BYTE));
- /*
- * Disable the pkey system till everything is in place. A subsequent
- * patch will enable it.
- */
- static_branch_enable(&pkey_disabled);
-
- /* Lets assume 32 keys */
- pkeys_total = 32;
+ /* Let's assume 32 keys if we are not told the number of pkeys. */
+ if (!pkeys_total)
+ pkeys_total = 32;
/*
* Adjust the upper limit, based on the number of bits supported by
@@ -62,11 +65,19 @@ void __init pkey_initialize(void)
pkeys_total = min_t(int, pkeys_total,
(ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT));
+ if (!pkey_mmu_enabled() || radix_enabled() || !pkeys_total)
+ static_branch_enable(&pkey_disabled);
+ else
+ static_branch_disable(&pkey_disabled);
+
+ if (static_branch_likely(&pkey_disabled))
+ return;
+
/*
- * Disable execute_disable support for now. A subsequent patch will
- * enable it.
+ * The device tree cannot be relied on for execute_disable support.
+ * Hence we depend on CPU FTR.
*/
- pkey_execute_disable_supported = false;
+ pkey_execute_disable_supported = cpu_has_feature(CPU_FTR_PKEY_EXECUTE);
#ifdef CONFIG_PPC_4K_PAGES
/*
--
1.7.1
From 1583844545044099149@xxx Sun Nov 12 07:31:03 +0000 2017
Introduce the powerpc implementation for the different
abstractions.
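One of the per-architecture differences is the huge page size, which also selects the sysfs file the test opens. The patch hardcodes one path per architecture. As an alternative sketch, the path can be derived from the HPAGE_SIZE values defined in pkey-helpers.h. hugepage_sysfs_path() and the two HPAGE_SIZE_* names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HPAGE_SIZE values from pkey-helpers.h in this patch. */
#define HPAGE_SIZE_X86       (1UL << 21)
#define HPAGE_SIZE_POWERPC64 (1UL << 24)

/*
 * Hypothetical helper: build the nr_hugepages path for a huge page size
 * given in bytes. Returns 0 on success, -1 if `buf` is too small.
 */
static int hugepage_sysfs_path(char *buf, size_t buflen,
			       unsigned long hpage_size)
{
	int n;

	assert(buf && buflen > 0);
	n = snprintf(buf, buflen,
		     "/sys/kernel/mm/hugepages/hugepages-%lukB/nr_hugepages",
		     hpage_size >> 10); /* bytes to kB */

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With the x86 value this yields the hugepages-2048kB path, and with the powerpc value the hugepages-16384kB path, matching the two strings used in open_hugepage_file().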
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/pkey-helpers.h | 109 ++++++++++++++++++++++----
tools/testing/selftests/vm/protection_keys.c | 38 ++++++----
2 files changed, 117 insertions(+), 30 deletions(-)
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index 30755be..f764d66 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -18,27 +18,54 @@
#define u16 uint16_t
#define u32 uint32_t
#define u64 uint64_t
-#define pkey_reg_t u32
-#ifdef __i386__
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+
+#ifdef __i386__ /* arch */
#define SYS_mprotect_key 380
-#define SYS_pkey_alloc 381
-#define SYS_pkey_free 382
+#define SYS_pkey_alloc 381
+#define SYS_pkey_free 382
#define REG_IP_IDX REG_EIP
#define si_pkey_offset 0x14
-#else
+#elif __x86_64__
#define SYS_mprotect_key 329
-#define SYS_pkey_alloc 330
-#define SYS_pkey_free 331
+#define SYS_pkey_alloc 330
+#define SYS_pkey_free 331
#define REG_IP_IDX REG_RIP
#define si_pkey_offset 0x20
-#endif
+#endif /* __x86_64__ */
+
+#define NR_PKEYS 16
+#define NR_RESERVED_PKEYS 1
+#define PKEY_BITS_PER_PKEY 2
+#define PKEY_DISABLE_ACCESS 0x1
+#define PKEY_DISABLE_WRITE 0x2
+#define HPAGE_SIZE (1UL<<21)
+#define pkey_reg_t u32
-#define NR_PKEYS 16
-#define PKEY_BITS_PER_PKEY 2
-#define PKEY_DISABLE_ACCESS 0x1
-#define PKEY_DISABLE_WRITE 0x2
-#define HPAGE_SIZE (1UL<<21)
+#elif __powerpc64__ /* arch */
+
+#define SYS_mprotect_key 386
+#define SYS_pkey_alloc 384
+#define SYS_pkey_free 385
+#define si_pkey_offset 0x20
+#define REG_IP_IDX PT_NIP
+#define REG_TRAPNO PT_TRAP
+#define gregs gp_regs
+#define fpregs fp_regs
+
+#define NR_PKEYS 32
+#define NR_RESERVED_PKEYS_4K 26
+#define NR_RESERVED_PKEYS_64K 3
+#define PKEY_BITS_PER_PKEY 2
+#define PKEY_DISABLE_ACCESS 0x3 /* disable read and write */
+#define PKEY_DISABLE_WRITE 0x2
+#define HPAGE_SIZE (1UL<<24)
+#define pkey_reg_t u64
+
+#else /* arch */
+ NOT SUPPORTED
+#endif /* arch */
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
@@ -47,7 +74,11 @@
static inline u32 pkey_to_shift(int pkey)
{
+#if defined(__i386__) || defined(__x86_64__) /* arch */
return pkey * PKEY_BITS_PER_PKEY;
+#elif __powerpc64__ /* arch */
+ return (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
+#endif /* arch */
}
static inline pkey_reg_t reset_bits(int pkey, pkey_reg_t bits)
@@ -108,6 +139,7 @@ static inline void sigsafe_printf(const char *format, ...)
extern pkey_reg_t shadow_pkey_reg;
static inline pkey_reg_t __rdpkey_reg(void)
{
+#if defined(__i386__) || defined(__x86_64__) /* arch */
unsigned int eax, edx;
unsigned int ecx = 0;
pkey_reg_t pkey_reg;
@@ -115,7 +147,13 @@ static inline pkey_reg_t __rdpkey_reg(void)
asm volatile(".byte 0x0f,0x01,0xee\n\t"
: "=a" (eax), "=d" (edx)
: "c" (ecx));
- pkey_reg = eax;
+#elif __powerpc64__ /* arch */
+ pkey_reg_t eax;
+ pkey_reg_t pkey_reg;
+
+ asm volatile("mfspr %0, 0xd" : "=r" ((pkey_reg_t)(eax)));
+#endif /* arch */
+ pkey_reg = (pkey_reg_t)eax;
return pkey_reg;
}
@@ -135,6 +173,7 @@ static inline pkey_reg_t _rdpkey_reg(int line)
static inline void __wrpkey_reg(pkey_reg_t pkey_reg)
{
pkey_reg_t eax = pkey_reg;
+#if defined(__i386__) || defined(__x86_64__) /* arch */
pkey_reg_t ecx = 0;
pkey_reg_t edx = 0;
@@ -143,6 +182,14 @@ static inline void __wrpkey_reg(pkey_reg_t pkey_reg)
asm volatile(".byte 0x0f,0x01,0xef\n\t"
: : "a" (eax), "c" (ecx), "d" (edx));
assert(pkey_reg == __rdpkey_reg());
+
+#elif __powerpc64__ /* arch */
+ dprintf4("%s() changing %llx to %llx\n",
+ __func__, __rdpkey_reg(), pkey_reg);
+ asm volatile("mtspr 0xd, %0" : : "r" ((unsigned long)(eax)) : "memory");
+#endif /* arch */
+ dprintf4("%s() pkey register after changing %016lx to %016lx\n",
+ __func__, __rdpkey_reg(), pkey_reg);
}
static inline void wrpkey_reg(pkey_reg_t pkey_reg)
@@ -189,6 +236,8 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write)
dprintf4("pkey_reg now: %08x\n", rdpkey_reg());
}
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+
#define PAGE_SIZE 4096
#define MB (1<<20)
@@ -271,8 +320,18 @@ static inline void __page_o_noops(void)
/* 8-bytes of instruction * 512 bytes = 1 page */
asm(".rept 512 ; nopl 0x7eeeeeee(%eax) ; .endr");
}
+#elif __powerpc64__ /* arch */
-#endif /* _PKEYS_HELPER_H */
+#define PAGE_SIZE (0x1UL << 16)
+static inline int cpu_has_pku(void)
+{
+ return 1;
+}
+
+/* 8-bytes of instruction * 16384bytes = 1 page */
+#define __page_o_noops() asm(".rept 16384 ; nop; .endr")
+
+#endif /* arch */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1))
@@ -304,11 +363,29 @@ static inline void __page_o_noops(void)
static inline int open_hugepage_file(int flag)
{
- return open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages",
+ int fd;
+
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+ fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages",
O_RDONLY);
+#elif __powerpc64__ /* arch */
+ fd = open("/sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages",
+ O_RDONLY);
+#else /* arch */
+ NOT SUPPORTED
+#endif /* arch */
+ return fd;
}
static inline int get_start_key(void)
{
+#if defined(__i386__) || defined(__x86_64__) /* arch */
return 1;
+#elif __powerpc64__ /* arch */
+ return 0;
+#else /* arch */
+ NOT SUPPORTED
+#endif /* arch */
}
+
+#endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 3868434..4fe42cc 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -186,17 +186,20 @@ void dump_mem(void *dumpme, int len_bytes)
int pkey_faults;
int last_si_pkey = -1;
+void pkey_access_allow(int pkey);
void signal_handler(int signum, siginfo_t *si, void *vucontext)
{
ucontext_t *uctxt = vucontext;
int trapno;
unsigned long ip;
char *fpregs;
+#if defined(__i386__) || defined(__x86_64__) /* arch */
pkey_reg_t *pkey_reg_ptr;
- u32 si_pkey;
- u32 *si_pkey_ptr;
int pkey_reg_offset;
fpregset_t fpregset;
+#endif /* defined(__i386__) || defined(__x86_64__) */
+ u32 si_pkey;
+ u32 *si_pkey_ptr;
dprint_in_signal = 1;
dprintf1(">>>>===============SIGSEGV============================\n");
@@ -206,12 +209,14 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
- fpregset = uctxt->uc_mcontext.fpregs;
- fpregs = (void *)fpregset;
+ fpregs = (char *) uctxt->uc_mcontext.fpregs;
dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
__func__, trapno, ip, si_code_str(si->si_code),
si->si_code);
+
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+
#ifdef __i386__
/*
* 32-bit has some extra padding so that userspace can tell whether
@@ -219,20 +224,21 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
* state. We just assume that it is here.
*/
fpregs += 0x70;
-#endif
- pkey_reg_offset = pkey_reg_xstate_offset();
- pkey_reg_ptr = (void *)(&fpregs[pkey_reg_offset]);
+#endif /* __i386__ */
- dprintf1("siginfo: %p\n", si);
- dprintf1(" fpregs: %p\n", fpregs);
+ pkey_reg_ptr = (void *)(&fpregs[pkey_reg_xstate_offset()]);
/*
- * If we got a PKEY fault, we *HAVE* to have at least one bit set in
+ * If we got a key fault, we *HAVE* to have at least one bit set in
* here.
*/
dprintf1("pkey_reg_xstate_offset: %d\n", pkey_reg_xstate_offset());
if (DEBUG_LEVEL > 4)
dump_mem(pkey_reg_ptr - 128, 256);
pkey_assert(*pkey_reg_ptr);
+#endif /* defined(__i386__) || defined(__x86_64__) */
+
+ dprintf1("siginfo: %p\n", si);
+ dprintf1(" fpregs: %p\n", fpregs);
si_pkey_ptr = (u32 *)(((u8 *)si) + si_pkey_offset);
dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
@@ -248,19 +254,23 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
exit(4);
}
- dprintf1("signal pkey_reg from xsave: %016lx\n", *pkey_reg_ptr);
/*
* need __rdpkey_reg() version so we do not do shadow_pkey_reg
* checking
*/
dprintf1("signal pkey_reg from pkey_reg: %016lx\n", __rdpkey_reg());
- dprintf1("si_pkey from siginfo: %jx\n", si_pkey);
- *(u64 *)pkey_reg_ptr = 0x00000000;
+ dprintf1("si_pkey from siginfo: %lx\n", si_pkey);
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+ dprintf1("signal pkey_reg from xsave: %016lx\n", *pkey_reg_ptr);
+ *(u64 *)pkey_reg_ptr &= reset_bits(si_pkey, PKEY_DISABLE_ACCESS);
+#elif __powerpc64__
+ pkey_access_allow(si_pkey);
+#endif
+ shadow_pkey_reg &= reset_bits(si_pkey, PKEY_DISABLE_ACCESS);
dprintf1("WARNING: set PKEY_REG=0 to allow faulting instruction "
"to continue\n");
pkey_faults++;
dprintf1("<<<<==================================================\n");
- return;
}
int wait_all_children(void)
--
1.7.1
From 1583976752448880040@xxx Mon Nov 13 18:32:26 +0000 2017
From: Thiago Jung Bauermann <[email protected]>
The AMR, IAMR and UAMOR registers are part of the program context.
Allow them to be accessed via ptrace and through core files.
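When the AMR is written through ptrace, only the bits that UAMOR marks as user-modifiable are taken from the new value. The masking can be shown in isolation as below. merge_amr() is a hypothetical name for the expression used in pkey_set(), and the mask in the example is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits set in `uamor` may be changed by the requester. All other bits
 * of the current AMR are preserved.
 */
static uint64_t merge_amr(uint64_t cur_amr, uint64_t new_amr, uint64_t uamor)
{
	return (new_amr & uamor) | (cur_amr & ~uamor);
}

/* Small self-check illustrating the masking. */
static void merge_amr_example(void)
{
	/* Only the low byte is user-modifiable in this made-up mask. */
	assert(merge_amr(0xAA00, 0x00FF, 0x00FF) == 0xAAFF);
}
```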
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
---
arch/powerpc/include/asm/pkeys.h | 5 +++
arch/powerpc/include/uapi/asm/elf.h | 1 +
arch/powerpc/kernel/ptrace.c | 66 +++++++++++++++++++++++++++++++++++
arch/powerpc/kernel/traps.c | 7 ++++
include/uapi/linux/elf.h | 1 +
5 files changed, 80 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 3437a50..9ee4731 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -213,6 +213,11 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
return __arch_set_user_pkey_access(tsk, pkey, init_val);
}
+static inline bool arch_pkeys_enabled(void)
+{
+ return !static_branch_likely(&pkey_disabled);
+}
+
static inline void pkey_mm_init(struct mm_struct *mm)
{
if (static_branch_likely(&pkey_disabled))
diff --git a/arch/powerpc/include/uapi/asm/elf.h b/arch/powerpc/include/uapi/asm/elf.h
index 5f201d4..860c592 100644
--- a/arch/powerpc/include/uapi/asm/elf.h
+++ b/arch/powerpc/include/uapi/asm/elf.h
@@ -97,6 +97,7 @@
#define ELF_NTMSPRREG 3 /* include tfhar, tfiar, texasr */
#define ELF_NEBB 3 /* includes ebbrr, ebbhr, bescr */
#define ELF_NPMU 5 /* includes siar, sdar, sier, mmcr2, mmcr0 */
+#define ELF_NPKEY 3 /* includes amr, iamr, uamor */
typedef unsigned long elf_greg_t64;
typedef elf_greg_t64 elf_gregset_t64[ELF_NGREG];
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index f52ad5b..3718a04 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -35,6 +35,7 @@
#include <linux/context_tracking.h>
#include <linux/uaccess.h>
+#include <linux/pkeys.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/switch_to.h>
@@ -1775,6 +1776,61 @@ static int pmu_set(struct task_struct *target,
return ret;
}
#endif
+
+#ifdef CONFIG_PPC_MEM_KEYS
+static int pkey_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ if (!arch_pkeys_enabled())
+ return -ENODEV;
+
+ return regset->n;
+}
+
+static int pkey_get(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ BUILD_BUG_ON(TSO(amr) + sizeof(unsigned long) != TSO(iamr));
+ BUILD_BUG_ON(TSO(iamr) + sizeof(unsigned long) != TSO(uamor));
+
+ if (!arch_pkeys_enabled())
+ return -ENODEV;
+
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.amr, 0,
+ ELF_NPKEY * sizeof(unsigned long));
+}
+
+static int pkey_set(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ u64 new_amr;
+ int ret;
+
+ if (!arch_pkeys_enabled())
+ return -ENODEV;
+
+ /* Only the AMR can be set from userspace */
+ if (pos != 0 || count != sizeof(new_amr))
+ return -EINVAL;
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ &new_amr, 0, sizeof(new_amr));
+ if (ret)
+ return ret;
+
+ /* UAMOR determines which bits of the AMR can be set from userspace. */
+ target->thread.amr = (new_amr & target->thread.uamor) |
+ (target->thread.amr & ~target->thread.uamor);
+
+ return 0;
+}
+#endif /* CONFIG_PPC_MEM_KEYS */
+
/*
* These are our native regset flavors.
*/
@@ -1809,6 +1865,9 @@ enum powerpc_regset {
REGSET_EBB, /* EBB registers */
REGSET_PMR, /* Performance Monitor Registers */
#endif
+#ifdef CONFIG_PPC_MEM_KEYS
+ REGSET_PKEY, /* AMR, IAMR and UAMOR registers */
+#endif
};
static const struct user_regset native_regsets[] = {
@@ -1914,6 +1973,13 @@ enum powerpc_regset {
.active = pmu_active, .get = pmu_get, .set = pmu_set
},
#endif
+#ifdef CONFIG_PPC_MEM_KEYS
+ [REGSET_PKEY] = {
+ .core_note_type = NT_PPC_PKEY, .n = ELF_NPKEY,
+ .size = sizeof(u64), .align = sizeof(u64),
+ .active = pkey_active, .get = pkey_get, .set = pkey_set
+ },
+#endif
};
static const struct user_regset_view user_ppc_native_view = {
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ed1c39b..f449dc5 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -291,6 +291,13 @@ void _exception_pkey(int signr, struct pt_regs *regs, int code, unsigned long ad
local_irq_enable();
current->thread.trap_nr = code;
+
+ /*
+ * Save all the pkey registers AMR/IAMR/UAMOR. Eg: Core dumps need
+ * to capture the content, if the task gets killed.
+ */
+ thread_pkey_regs_save(&current->thread);
+
memset(&info, 0, sizeof(info));
info.si_signo = signr;
info.si_code = code;
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index c58627c..c017818 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -396,6 +396,7 @@
#define NT_PPC_TM_CTAR 0x10d /* TM checkpointed Target Address Register */
#define NT_PPC_TM_CPPR 0x10e /* TM checkpointed Program Priority Register */
#define NT_PPC_TM_CDSCR 0x10f /* TM checkpointed Data Stream Control Register */
+#define NT_PPC_PKEY 0x110 /* Memory Protection Keys registers */
#define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */
#define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */
#define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */
--
1.7.1
From 1583365134760703369@xxx Tue Nov 07 00:31:02 +0000 2017
This patch provides the implementation of
arch_vma_access_permitted(). It returns true if the
requested access is allowed by the pkey associated with
the vma.
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/mmu_context.h | 5 +++-
arch/powerpc/mm/pkeys.c | 34 ++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index a557735..95a3288 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -136,6 +136,10 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
{
}
+#ifdef CONFIG_PPC_MEM_KEYS
+bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
+ bool execute, bool foreign);
+#else /* CONFIG_PPC_MEM_KEYS */
static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
bool write, bool execute, bool foreign)
{
@@ -143,7 +147,6 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
return true;
}
-#ifndef CONFIG_PPC_MEM_KEYS
#define pkey_initialize()
#define pkey_mm_init(mm)
#define thread_pkey_regs_save(thread)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 13902be..3b221bd 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -375,3 +375,37 @@ bool arch_pte_access_permitted(u64 pte, bool write, bool execute)
return pkey_access_permitted(pte_to_pkey_bits(pte), write, execute);
}
+
+/*
+ * We only want to enforce protection keys on the current thread because we
+ * effectively have no access to AMR/IAMR for other threads or any way to tell
+ * which AMR/IAMR in a threaded process we could use.
+ *
+ * So do not enforce things if the VMA is not from the current mm, or if we are
+ * in a kernel thread.
+ */
+static inline bool vma_is_foreign(struct vm_area_struct *vma)
+{
+ if (!current->mm)
+ return true;
+
+ /* if it is not our ->mm, it has to be foreign */
+ if (current->mm != vma->vm_mm)
+ return true;
+
+ return false;
+}
+
+bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
+ bool execute, bool foreign)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return true;
+ /*
+ * Do not enforce our key-permissions on a foreign vma.
+ */
+ if (foreign || vma_is_foreign(vma))
+ return true;
+
+ return pkey_access_permitted(vma_pkey(vma), write, execute);
+}
--
1.7.1
From 1586055917895043477@xxx Wed Dec 06 17:19:53 +0000 2017
Add documentation updates that capture PowerPC-specific changes.
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
Documentation/vm/protection-keys.txt | 126 +++++++++++++++++++++++++++-------
1 files changed, 101 insertions(+), 25 deletions(-)
diff --git a/Documentation/vm/protection-keys.txt b/Documentation/vm/protection-keys.txt
index fa46dcb..bc079b3 100644
--- a/Documentation/vm/protection-keys.txt
+++ b/Documentation/vm/protection-keys.txt
@@ -1,22 +1,46 @@
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
-which will be found on future Intel CPUs.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
-
-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register. The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs. These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
+Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature found on
+future Intel CPUs and on PowerPC 5 and higher CPUs.
+
+Memory Protection Keys provide a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables when an
+application changes protection domains.
+
+It works by dedicating bits in each page table entry to a "protection key".
+There is also a user-accessible register with two separate bits for each
+key. Being a CPU register, the user-accessible register is inherently
+thread-local, potentially giving each thread a different set of protections
+from every other thread.
+
+On Intel:
+
+ Four previously ignored page table entry bits are used, giving 16 possible keys.
+
+ The user-accessible register (PKRU) has two bits per key: one to disable
+ access and one to disable write.
+
+ The feature is only available in 64-bit mode, even though there is
+ theoretically space in the PAE PTEs. These permissions are enforced on
+ data access only and have no effect on instruction fetches.
+
+On PowerPC:
+
+ Five bits in the page table entry are used, giving 32 possible keys.
+ This support is currently for Hash Page Table mode only.
+
+ The user-accessible register (AMR) has two bits per key: one to disable
+ read and one to disable write. Access disable can be achieved by
+ disabling both read and write.
+
+ 'mfspr mem, 0xd' reads the AMR register
+ 'mtspr 0xd, mem' writes into the AMR register.
+
+ Execution can be disabled by allocating a key with execute-disabled
+ permission. The execute permission on the key, however, cannot be
+ changed through a user-accessible register. Instead, a powerpc-specific
+ system call sys_pkey_modify() must be used. The CPU will not allow
+ execution of instructions in pages that are associated with an
+ execute-disabled key.
+
=========================== Syscalls ===========================
@@ -28,9 +52,9 @@ There are 3 system calls which directly interact with pkeys:
unsigned long prot, int pkey);
Before a pkey can be used, it must first be allocated with
-pkey_alloc(). An application calls the WRPKRU instruction
+pkey_alloc(). An application writes the PKRU (x86) or AMR (powerpc) register
directly in order to change access permissions to memory covered
-with a key. In this example WRPKRU is wrapped by a C function
+with a key. In this example the register write is wrapped by a C function
called pkey_set().
int real_prot = PROT_READ|PROT_WRITE;
@@ -52,11 +76,11 @@ is no longer in use:
munmap(ptr, PAGE_SIZE);
pkey_free(pkey);
-(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
+(Note: pkey_set() wraps RDPKRU/WRPKRU on x86 and AMR accesses on powerpc.
An example implementation can be found in
- tools/testing/selftests/x86/protection_keys.c)
+ tools/testing/selftests/vm/protection_keys.c)
-=========================== Behavior ===========================
+=========================== Behavior =================================
The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance if you do this:
@@ -66,7 +90,7 @@ behavior of a plain mprotect(). For instance if you do this:
you can expect the same effects with protection keys when doing this:
- pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
+ pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
something(ptr);
@@ -83,3 +107,55 @@ with a read():
The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
+
+========================== sysfs Interface ==========================
+
+Information about support of protection keys on the system can be
+found in the /sys/kernel/mm/protection_keys directory, which
+contains the following files:
+
+- total_keys: Shows the number of keys supported by the hardware.
+ Not all of those keys may be available for use by a process
+ because the platform or operating system may reserve some keys
+ for their own use.
+
+- usable_keys: Shows the minimum number of keys guaranteed to be
+ available for use by a process. In other words: total_keys minus
+ the keys reserved by the platform or operating system. This
+ number doesn't change to reflect keys that are already being
+ used by the process reading the file.
+
+ There may be one more key available than what is advertised in
+ this file because the kernel may use one key for mprotect()
+ calls setting up memory with execute-only permissions. This file
+ assumes that this key is being used, but if it is not the
+ process will have one more key it can use for other purposes.
+
+- disable_access_supported: Shows 'true' if the system supports keys
+ which disallow reading from a given page (i.e., the
+ PKEY_DISABLE_ACCESS flag is supported).
+
+- disable_write_supported: Shows 'true' if the system supports keys
+ which disallow writing to a given page (i.e., the
+ PKEY_DISABLE_WRITE flag is supported).
+
+- disable_execute_supported: Shows 'true' if the system supports keys
+ which disallow code execution from a given page (i.e., the
+ PKEY_DISABLE_EXECUTE flag is supported).
+
+====================================================================
+ Differences
+
+The following differences exist between x86 and powerpc.
+
+a) powerpc (PowerPC8 onwards) *also* allows creation of a key with
+ execute-disabled permission.
+ The following is allowed:
+ pkey = pkey_alloc(0, PKEY_DISABLE_EXECUTE);
+
+b) On powerpc the access/write permission on a key can be modified by
+ programming the AMR register from the signal handler. The changes
+ persist across signal boundaries. On x86, the PKRU specific fpregs
+ entry has to be modified to change the access/write permission on
+ a key.
+=====================================================================
--
1.7.1
From 1583306948647228406@xxx Mon Nov 06 09:06:11 +0000 2017
Clean up the code to satisfy the kernel coding style.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 81 ++++++++++++++------------
1 files changed, 43 insertions(+), 38 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 2600f7a..3868434 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -4,7 +4,7 @@
*
* There are examples in here of:
* * how to set protection keys on memory
- * * how to set/clear bits in pkey registers (the rights register)
+ * * how to set/clear bits in Protection Key registers (the rights register)
* * how to handle SEGV_PKUERR signals and extract pkey-relevant
* information from the siginfo
*
@@ -13,13 +13,18 @@
* prefault pages in at malloc, or not
* protect MPX bounds tables with protection keys?
* make sure VMA splitting/merging is working correctly
- * OOMs can destroy mm->mmap (see exit_mmap()), so make sure it is immune to pkeys
- * look for pkey "leaks" where it is still set on a VMA but "freed" back to the kernel
- * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
+ * OOMs can destroy mm->mmap (see exit_mmap()),
+ * so make sure it is immune to pkeys
+ * look for pkey "leaks" where it is still set on a VMA
+ * but "freed" back to the kernel
+ * do a plain mprotect() to a mprotect_pkey() area and make
+ * sure the pkey sticks
*
* Compile like this:
- * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
- * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
+ * gcc -o protection_keys -O2 -g -std=gnu99
+ * -pthread -Wall protection_keys.c -lrt -ldl -lm
+ * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99
+ * -pthread -Wall protection_keys.c -lrt -ldl -lm
*/
#define _GNU_SOURCE
#include <errno.h>
@@ -251,26 +256,11 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
dprintf1("signal pkey_reg from pkey_reg: %016lx\n", __rdpkey_reg());
dprintf1("si_pkey from siginfo: %jx\n", si_pkey);
*(u64 *)pkey_reg_ptr = 0x00000000;
- dprintf1("WARNING: set PRKU=0 to allow faulting instruction to continue\n");
+ dprintf1("WARNING: set PKEY_REG=0 to allow faulting instruction "
+ "to continue\n");
pkey_faults++;
dprintf1("<<<<==================================================\n");
return;
- if (trapno == 14) {
- fprintf(stderr,
- "ERROR: In signal handler, page fault, trapno = %d, ip = %016lx\n",
- trapno, ip);
- fprintf(stderr, "si_addr %p\n", si->si_addr);
- fprintf(stderr, "REG_ERR: %lx\n",
- (unsigned long)uctxt->uc_mcontext.gregs[REG_ERR]);
- exit(1);
- } else {
- fprintf(stderr, "unexpected trap %d! at 0x%lx\n", trapno, ip);
- fprintf(stderr, "si_addr %p\n", si->si_addr);
- fprintf(stderr, "REG_ERR: %lx\n",
- (unsigned long)uctxt->uc_mcontext.gregs[REG_ERR]);
- exit(2);
- }
- dprint_in_signal = 0;
}
int wait_all_children(void)
@@ -415,7 +405,7 @@ void pkey_disable_set(int pkey, int flags)
{
unsigned long syscall_flags = 0;
int ret;
- int pkey_rights;
+ u32 pkey_rights;
pkey_reg_t orig_pkey_reg = rdpkey_reg();
dprintf1("START->%s(%d, 0x%x)\n", __func__,
@@ -453,7 +443,7 @@ void pkey_disable_clear(int pkey, int flags)
{
unsigned long syscall_flags = 0;
int ret;
- int pkey_rights = pkey_get(pkey, syscall_flags);
+ u32 pkey_rights = pkey_get(pkey, syscall_flags);
pkey_reg_t orig_pkey_reg = rdpkey_reg();
pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
@@ -516,9 +506,10 @@ int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
return sret;
}
-int sys_pkey_alloc(unsigned long flags, unsigned long init_val)
+int sys_pkey_alloc(unsigned long flags, u64 init_val)
{
int ret = syscall(SYS_pkey_alloc, flags, init_val);
+
dprintf1("%s(flags=%lx, init_val=%lx) syscall ret: %d errno: %d\n",
__func__, flags, init_val, ret, errno);
return ret;
@@ -542,7 +533,7 @@ void pkey_set_shadow(u32 key, u64 init_val)
int alloc_pkey(void)
{
int ret;
- unsigned long init_val = 0x0;
+ u64 init_val = 0x0;
dprintf1("%s()::%d, pkey_reg: 0x%016lx shadow: %016lx\n", __func__,
__LINE__, __rdpkey_reg(), shadow_pkey_reg);
@@ -692,7 +683,9 @@ void record_pkey_malloc(void *ptr, long size)
/* every record is full */
size_t old_nr_records = nr_pkey_malloc_records;
size_t new_nr_records = (nr_pkey_malloc_records * 2 + 1);
- size_t new_size = new_nr_records * sizeof(struct pkey_malloc_record);
+ size_t new_size = new_nr_records *
+ sizeof(struct pkey_malloc_record);
+
dprintf2("new_nr_records: %zd\n", new_nr_records);
dprintf2("new_size: %zd\n", new_size);
pkey_malloc_records = realloc(pkey_malloc_records, new_size);
@@ -716,9 +709,11 @@ void free_pkey_malloc(void *ptr)
{
long i;
int ret;
+
dprintf3("%s(%p)\n", __func__, ptr);
for (i = 0; i < nr_pkey_malloc_records; i++) {
struct pkey_malloc_record *rec = &pkey_malloc_records[i];
+
dprintf4("looking for ptr %p at record[%ld/%p]: {%p, %ld}\n",
ptr, i, rec, rec->ptr, rec->size);
if ((ptr < rec->ptr) ||
@@ -799,11 +794,13 @@ void setup_hugetlbfs(void)
char buf[] = "123";
if (geteuid() != 0) {
- fprintf(stderr, "WARNING: not run as root, can not do hugetlb test\n");
+ fprintf(stderr,
+ "WARNING: not run as root, can not do hugetlb test\n");
return;
}
- cat_into_file(__stringify(GET_NR_HUGE_PAGES), "/proc/sys/vm/nr_hugepages");
+ cat_into_file(__stringify(GET_NR_HUGE_PAGES),
+ "/proc/sys/vm/nr_hugepages");
/*
* Now go make sure that we got the pages and that they
@@ -824,7 +821,8 @@ void setup_hugetlbfs(void)
}
if (atoi(buf) != GET_NR_HUGE_PAGES) {
- fprintf(stderr, "could not confirm 2M pages, got: '%s' expected %d\n",
+ fprintf(stderr, "could not confirm 2M pages, got:"
+ " '%s' expected %d\n",
buf, GET_NR_HUGE_PAGES);
return;
}
@@ -957,6 +955,7 @@ void __save_test_fd(int fd)
int get_test_read_fd(void)
{
int test_fd = open("/etc/passwd", O_RDONLY);
+
__save_test_fd(test_fd);
return test_fd;
}
@@ -998,7 +997,8 @@ void test_read_of_access_disabled_region(int *ptr, u16 pkey)
{
int ptr_contents;
- dprintf1("disabling access to PKEY[%02d], doing read @ %p\n", pkey, ptr);
+ dprintf1("disabling access to PKEY[%02d], doing read @ %p\n",
+ pkey, ptr);
rdpkey_reg();
pkey_access_deny(pkey);
ptr_contents = read_ptr(ptr);
@@ -1120,13 +1120,14 @@ void test_pkey_syscalls_bad_args(int *ptr, u16 pkey)
/* Assumes that all pkeys other than 'pkey' are unallocated */
void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
{
- int err;
+ int err = 0;
int allocated_pkeys[NR_PKEYS] = {0};
int nr_allocated_pkeys = 0;
int i;
for (i = 0; i < NR_PKEYS*2; i++) {
int new_pkey;
+
dprintf1("%s() alloc loop: %d\n", __func__, i);
new_pkey = alloc_pkey();
dprintf4("%s()::%d, err: %d pkey_reg: 0x%016lx "
@@ -1134,9 +1135,11 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
__func__, __LINE__, err, __rdpkey_reg(),
shadow_pkey_reg);
rdpkey_reg(); /* for shadow checking */
- dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, ENOSPC);
+ dprintf2("%s() errno: %d ENOSPC: %d\n",
+ __func__, errno, ENOSPC);
if ((new_pkey == -1) && (errno == ENOSPC)) {
- dprintf2("%s() failed to allocate pkey after %d tries\n",
+ dprintf2("%s() failed to allocate pkey "
+ "after %d tries\n",
__func__, nr_allocated_pkeys);
break;
}
@@ -1338,7 +1341,8 @@ void run_tests_once(void)
tracing_off();
close_test_fds();
- printf("test %2d PASSED (iteration %d)\n", test_nr, iteration_nr);
+ printf("test %2d PASSED (iteration %d)\n",
+ test_nr, iteration_nr);
dprintf1("======================\n\n");
}
iteration_nr++;
@@ -1350,7 +1354,7 @@ int main(void)
setup_handlers();
- printf("has pku: %d\n", cpu_has_pku());
+ printf("has pkey: %d\n", cpu_has_pku());
if (!cpu_has_pku()) {
int size = PAGE_SIZE;
@@ -1358,7 +1362,8 @@ int main(void)
printf("running PKEY tests for unsupported CPU/OS\n");
- ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ ptr = mmap(NULL, size, PROT_NONE,
+ MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
assert(ptr != (void *)-1);
test_mprotect_pkey_on_unsupported_cpu(ptr, 1);
exit(0);
--
1.7.1
From 1583468015852099816@xxx Wed Nov 08 03:46:17 +0000 2017
The maximum number of keys that can be allocated has to take into
account that some keys are reserved by the architecture for specific
purposes and hence cannot be allocated.
Fix the assertion in test_pkey_alloc_exhaust() accordingly.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/pkey-helpers.h | 14 ++++++++++++++
tools/testing/selftests/vm/protection_keys.c | 9 ++++-----
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index f764d66..3ea3e06 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -388,4 +388,18 @@ static inline int get_start_key(void)
#endif /* arch */
}
+static inline int arch_reserved_keys(void)
+{
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+ return NR_RESERVED_PKEYS;
+#elif __powerpc64__ /* arch */
+ if (sysconf(_SC_PAGESIZE) == 4096)
+ return NR_RESERVED_PKEYS_4K;
+ else
+ return NR_RESERVED_PKEYS_64K;
+#else /* arch */
+ NOT SUPPORTED
+#endif /* arch */
+}
+
#endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 4fe42cc..8f0dd94 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1166,12 +1166,11 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
pkey_assert(i < NR_PKEYS*2);
/*
- * There are 16 pkeys supported in hardware. One is taken
- * up for the default (0) and another can be taken up by
- * an execute-only mapping. Ensure that we can allocate
- * at least 14 (16-2).
+ * There are NR_PKEYS pkeys supported in hardware, of which
+ * arch_reserved_keys() are reserved and one more can be taken up by an
+ * execute-only mapping. Ensure that we can allocate at least the rest.
*/
- pkey_assert(i >= NR_PKEYS-2);
+ pkey_assert(i >= (NR_PKEYS-arch_reserved_keys()-1));
for (i = 0; i < nr_allocated_pkeys; i++) {
err = sys_pkey_free(allocated_pkeys[i]);
--
1.7.1
From 1583323091302696872@xxx Mon Nov 06 13:22:46 +0000 2017
Currently only 4 bits are allocated in the vma flags to hold 16
keys. This is sufficient for x86. PowerPC supports 32 keys,
which needs 5 bits. This patch allocates an additional bit.
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
fs/proc/task_mmu.c | 1 +
include/linux/mm.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 677866e..fad19a0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -683,6 +683,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_PKEY_BIT1)] = "",
[ilog2(VM_PKEY_BIT2)] = "",
[ilog2(VM_PKEY_BIT3)] = "",
+ [ilog2(VM_PKEY_BIT4)] = "",
#endif /* CONFIG_ARCH_HAS_PKEYS */
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2c5ea48..f5330a9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -221,9 +221,10 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
#ifdef CONFIG_ARCH_HAS_PKEYS
# define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
# define VM_PKEY_BIT0 VM_HIGH_ARCH_0 /* A protection key is a 4-bit value */
-# define VM_PKEY_BIT1 VM_HIGH_ARCH_1
+# define VM_PKEY_BIT1 VM_HIGH_ARCH_1 /* on x86 and 5-bit value on ppc64 */
# define VM_PKEY_BIT2 VM_HIGH_ARCH_2
# define VM_PKEY_BIT3 VM_HIGH_ARCH_3
+# define VM_PKEY_BIT4 VM_HIGH_ARCH_4
#endif /* CONFIG_ARCH_HAS_PKEYS */
#if defined(CONFIG_X86)
--
1.7.1
From 1583345000571417051@xxx Mon Nov 06 19:11:01 +0000 2017
Detect an access violation on a page that has an access-disabled
key associated with it well after the page has been mapped.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 8f0dd94..998a44f 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1015,6 +1015,24 @@ void test_read_of_access_disabled_region(int *ptr, u16 pkey)
dprintf1("*ptr: %d\n", ptr_contents);
expected_pkey_fault(pkey);
}
+
+void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr,
+ u16 pkey)
+{
+ int ptr_contents;
+
+ dprintf1("disabling access to PKEY[%02d], doing read @ %p\n",
+ pkey, ptr);
+ ptr_contents = read_ptr(ptr);
+ dprintf1("reading ptr before disabling the read : %d\n",
+ ptr_contents);
+ rdpkey_reg();
+ pkey_access_deny(pkey);
+ ptr_contents = read_ptr(ptr);
+ dprintf1("*ptr: %d\n", ptr_contents);
+ expected_pkey_fault(pkey);
+}
+
void test_write_of_write_disabled_region(int *ptr, u16 pkey)
{
dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
@@ -1309,6 +1327,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_read_of_write_disabled_region,
test_read_of_access_disabled_region,
+ test_read_of_access_disabled_region_with_page_already_mapped,
test_write_of_write_disabled_region,
test_write_of_access_disabled_region,
test_kernel_write_of_access_disabled_region,
--
1.7.1
From 1583368138859087934@xxx Tue Nov 07 01:18:47 +0000 2017
This is in preparation for accommodating a pkey register whose
size differs across architectures.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/pkey-helpers.h | 27 +++++-----
tools/testing/selftests/vm/protection_keys.c | 71 ++++++++++++++------------
2 files changed, 52 insertions(+), 46 deletions(-)
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index 1b15b54..b03f7e5 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -18,6 +18,7 @@
#define u16 uint16_t
#define u32 uint32_t
#define u64 uint64_t
+#define pkey_reg_t u32
#ifdef __i386__
#define SYS_mprotect_key 380
@@ -77,12 +78,12 @@ static inline void sigsafe_printf(const char *format, ...)
#define dprintf3(args...) dprintf_level(3, args)
#define dprintf4(args...) dprintf_level(4, args)
-extern unsigned int shadow_pkey_reg;
-static inline unsigned int __rdpkey_reg(void)
+extern pkey_reg_t shadow_pkey_reg;
+static inline pkey_reg_t __rdpkey_reg(void)
{
unsigned int eax, edx;
unsigned int ecx = 0;
- unsigned int pkey_reg;
+ pkey_reg_t pkey_reg;
asm volatile(".byte 0x0f,0x01,0xee\n\t"
: "=a" (eax), "=d" (edx)
@@ -91,11 +92,11 @@ static inline unsigned int __rdpkey_reg(void)
return pkey_reg;
}
-static inline unsigned int _rdpkey_reg(int line)
+static inline pkey_reg_t _rdpkey_reg(int line)
{
- unsigned int pkey_reg = __rdpkey_reg();
+ pkey_reg_t pkey_reg = __rdpkey_reg();
- dprintf4("rdpkey_reg(line=%d) pkey_reg: %x shadow: %x\n",
+ dprintf4("rdpkey_reg(line=%d) pkey_reg: %016lx shadow: %016lx\n",
line, pkey_reg, shadow_pkey_reg);
assert(pkey_reg == shadow_pkey_reg);
@@ -104,11 +105,11 @@ static inline unsigned int _rdpkey_reg(int line)
#define rdpkey_reg() _rdpkey_reg(__LINE__)
-static inline void __wrpkey_reg(unsigned int pkey_reg)
+static inline void __wrpkey_reg(pkey_reg_t pkey_reg)
{
- unsigned int eax = pkey_reg;
- unsigned int ecx = 0;
- unsigned int edx = 0;
+ pkey_reg_t eax = pkey_reg;
+ pkey_reg_t ecx = 0;
+ pkey_reg_t edx = 0;
dprintf4("%s() changing %08x to %08x\n", __func__,
__rdpkey_reg(), pkey_reg);
@@ -117,7 +118,7 @@ static inline void __wrpkey_reg(unsigned int pkey_reg)
assert(pkey_reg == __rdpkey_reg());
}
-static inline void wrpkey_reg(unsigned int pkey_reg)
+static inline void wrpkey_reg(pkey_reg_t pkey_reg)
{
dprintf4("%s() changing %08x to %08x\n", __func__,
__rdpkey_reg(), pkey_reg);
@@ -135,7 +136,7 @@ static inline void wrpkey_reg(unsigned int pkey_reg)
*/
static inline void __pkey_access_allow(int pkey, int do_allow)
{
- unsigned int pkey_reg = rdpkey_reg();
+ pkey_reg_t pkey_reg = rdpkey_reg();
int bit = pkey * 2;
if (do_allow)
@@ -149,7 +150,7 @@ static inline void __pkey_access_allow(int pkey, int do_allow)
static inline void __pkey_write_allow(int pkey, int do_allow_write)
{
- long pkey_reg = rdpkey_reg();
+ pkey_reg_t pkey_reg = rdpkey_reg();
int bit = pkey * 2 + 1;
if (do_allow_write)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index dec05e0..2e8de01 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -48,7 +48,7 @@
int iteration_nr = 1;
int test_nr;
-unsigned int shadow_pkey_reg;
+pkey_reg_t shadow_pkey_reg;
int dprint_in_signal;
char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
@@ -158,7 +158,7 @@ void dump_mem(void *dumpme, int len_bytes)
for (i = 0; i < len_bytes; i += sizeof(u64)) {
u64 *ptr = (u64 *)(c + i);
- dprintf1("dump[%03d][@%p]: %016jx\n", i, ptr, *ptr);
+ dprintf1("dump[%03d][@%p]: %016lx\n", i, ptr, *ptr);
}
}
@@ -186,15 +186,16 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
int trapno;
unsigned long ip;
char *fpregs;
- u32 *pkey_reg_ptr;
- u64 si_pkey;
+ pkey_reg_t *pkey_reg_ptr;
+ u32 si_pkey;
u32 *si_pkey_ptr;
int pkey_reg_offset;
fpregset_t fpregset;
dprint_in_signal = 1;
dprintf1(">>>>===============SIGSEGV============================\n");
- dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__,
+ dprintf1("%s()::%d, pkey_reg: 0x%016lx shadow: %016lx\n",
+ __func__, __LINE__,
__rdpkey_reg(), shadow_pkey_reg);
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
@@ -202,8 +203,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
fpregset = uctxt->uc_mcontext.fpregs;
fpregs = (void *)fpregset;
- dprintf2("%s() trapno: %d ip: 0x%lx info->si_code: %s/%d\n", __func__,
- trapno, ip, si_code_str(si->si_code), si->si_code);
+ dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
+ __func__, trapno, ip, si_code_str(si->si_code),
+ si->si_code);
#ifdef __i386__
/*
* 32-bit has some extra padding so that userspace can tell whether
@@ -240,12 +242,12 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
exit(4);
}
- dprintf1("signal pkey_reg from xsave: %08x\n", *pkey_reg_ptr);
+ dprintf1("signal pkey_reg from xsave: %016lx\n", *pkey_reg_ptr);
/*
* need __rdpkey_reg() version so we do not do shadow_pkey_reg
* checking
*/
- dprintf1("signal pkey_reg from pkey_reg: %08x\n", __rdpkey_reg());
+ dprintf1("signal pkey_reg from pkey_reg: %016lx\n", __rdpkey_reg());
dprintf1("si_pkey from siginfo: %jx\n", si_pkey);
*(u64 *)pkey_reg_ptr = 0x00000000;
dprintf1("WARNING: set PRKU=0 to allow faulting instruction to continue\n");
@@ -364,8 +366,8 @@ void dumpit(char *f)
u32 pkey_get(int pkey, unsigned long flags)
{
u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
- u32 pkey_reg = __rdpkey_reg();
- u32 shifted_pkey_reg;
+ pkey_reg_t pkey_reg = __rdpkey_reg();
+ pkey_reg_t shifted_pkey_reg;
u32 masked_pkey_reg;
dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
@@ -386,8 +388,8 @@ u32 pkey_get(int pkey, unsigned long flags)
int pkey_set(int pkey, unsigned long rights, unsigned long flags)
{
u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
- u32 old_pkey_reg = __rdpkey_reg();
- u32 new_pkey_reg;
+ pkey_reg_t old_pkey_reg = __rdpkey_reg();
+ pkey_reg_t new_pkey_reg;
/* make sure that 'rights' only contains the bits we expect: */
assert(!(rights & ~mask));
@@ -401,10 +403,10 @@ int pkey_set(int pkey, unsigned long rights, unsigned long flags)
__wrpkey_reg(new_pkey_reg);
- dprintf3("%s(pkey=%d, rights=%lx, flags=%lx) = %x"
- " pkey_reg now: %x old_pkey_reg: %x\n",
- __func__, pkey, rights, flags, 0, __rdpkey_reg(),
- old_pkey_reg);
+ dprintf3("%s(pkey=%d, rights=%lx, flags=%lx) = %x "
+ "pkey_reg now: %016lx old_pkey_reg: %016lx\n",
+ __func__, pkey, rights, flags,
+ 0, __rdpkey_reg(), old_pkey_reg);
return 0;
}
@@ -413,7 +415,7 @@ void pkey_disable_set(int pkey, int flags)
unsigned long syscall_flags = 0;
int ret;
int pkey_rights;
- u32 orig_pkey_reg = rdpkey_reg();
+ pkey_reg_t orig_pkey_reg = rdpkey_reg();
dprintf1("START->%s(%d, 0x%x)\n", __func__,
pkey, flags);
@@ -421,8 +423,6 @@ void pkey_disable_set(int pkey, int flags)
pkey_rights = pkey_get(pkey, syscall_flags);
- dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
- pkey, pkey, pkey_rights);
pkey_assert(pkey_rights >= 0);
pkey_rights |= flags;
@@ -431,7 +431,8 @@ void pkey_disable_set(int pkey, int flags)
assert(!ret);
/*pkey_reg and flags have the same format */
shadow_pkey_reg |= flags << (pkey * 2);
- dprintf1("%s(%d) shadow: 0x%x\n", __func__, pkey, shadow_pkey_reg);
+ dprintf1("%s(%d) shadow: 0x%016lx\n",
+ __func__, pkey, shadow_pkey_reg);
pkey_assert(ret >= 0);
@@ -439,7 +440,8 @@ void pkey_disable_set(int pkey, int flags)
dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
pkey, pkey, pkey_rights);
- dprintf1("%s(%d) pkey_reg: 0x%x\n", __func__, pkey, rdpkey_reg());
+ dprintf1("%s(%d) pkey_reg: 0x%lx\n",
+ __func__, pkey, rdpkey_reg());
if (flags)
pkey_assert(rdpkey_reg() > orig_pkey_reg);
dprintf1("END<---%s(%d, 0x%x)\n", __func__,
@@ -451,7 +453,7 @@ void pkey_disable_clear(int pkey, int flags)
unsigned long syscall_flags = 0;
int ret;
int pkey_rights = pkey_get(pkey, syscall_flags);
- u32 orig_pkey_reg = rdpkey_reg();
+ pkey_reg_t orig_pkey_reg = rdpkey_reg();
pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
@@ -470,7 +472,8 @@ void pkey_disable_clear(int pkey, int flags)
dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
pkey, pkey, pkey_rights);
- dprintf1("%s(%d) pkey_reg: 0x%x\n", __func__, pkey, rdpkey_reg());
+ dprintf1("%s(%d) pkey_reg: 0x%016lx\n", __func__,
+ pkey, rdpkey_reg());
if (flags)
assert(rdpkey_reg() > orig_pkey_reg);
}
@@ -525,20 +528,21 @@ int alloc_pkey(void)
int ret;
unsigned long init_val = 0x0;
- dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__,
+ dprintf1("%s()::%d, pkey_reg: 0x%016lx shadow: %016lx\n", __func__,
__LINE__, __rdpkey_reg(), shadow_pkey_reg);
ret = sys_pkey_alloc(0, init_val);
/*
* pkey_alloc() sets PKEY register, so we need to reflect it in
* shadow_pkey_reg:
*/
- dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%016lx shadow: 0x%016lx\n",
__func__, __LINE__, ret, __rdpkey_reg(),
shadow_pkey_reg);
if (ret) {
/* clear both the bits: */
shadow_pkey_reg &= ~(0x3 << (ret * 2));
- dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%016lx "
+ "shadow: 0x%016lx\n",
__func__,
__LINE__, ret, __rdpkey_reg(),
shadow_pkey_reg);
@@ -548,13 +552,13 @@ int alloc_pkey(void)
*/
shadow_pkey_reg |= (init_val << (ret * 2));
}
- dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%016lx shadow: 0x%016lx\n",
__func__, __LINE__, ret, __rdpkey_reg(),
shadow_pkey_reg);
dprintf1("%s()::%d errno: %d\n", __func__, __LINE__, errno);
/* for shadow checking: */
rdpkey_reg();
- dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%016lx shadow: 0x%016lx\n",
__func__, __LINE__, ret, __rdpkey_reg(),
shadow_pkey_reg);
return ret;
@@ -1103,9 +1107,10 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
int new_pkey;
dprintf1("%s() alloc loop: %d\n", __func__, i);
new_pkey = alloc_pkey();
- dprintf4("%s()::%d, err: %d pkey_reg: 0x%x shadow: 0x%x\n",
- __func__, __LINE__, err, __rdpkey_reg(),
- shadow_pkey_reg);
+ dprintf4("%s()::%d, err: %d pkey_reg: 0x%016lx "
+ "shadow: 0x%016lx\n",
+ __func__, __LINE__, err, __rdpkey_reg(),
+ shadow_pkey_reg);
rdpkey_reg(); /* for shadow checking */
dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, ENOSPC);
if ((new_pkey == -1) && (errno == ENOSPC)) {
@@ -1343,7 +1348,7 @@ int main(void)
}
pkey_setup_shadow();
- printf("startup pkey_reg: %x\n", rdpkey_reg());
+ printf("startup pkey_reg: 0x%016lx\n", rdpkey_reg());
setup_hugetlbfs();
while (nr_iterations-- > 0)
--
1.7.1
From 1583331380271091644@xxx Mon Nov 06 15:34:31 +0000 2017
From: Thiago Jung Bauermann <[email protected]>
This test exercises ptrace read and write access to the AMR, and
verifies that ptrace refuses writes to the IAMR and UAMOR.
Signed-off-by: Thiago Jung Bauermann <[email protected]>
---
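The test builds its expected register values from pkeyshift(), which
places key 0 in the two most significant bits of the 64-bit AMR. The
arithmetic can be sketched in plain user-space C as below. Note that
amr_mask_for_pkey() is a hypothetical helper name, not part of the patch,
and the bound of 32 keys is an assumption derived from 64 bits at 2 bits
per key.

```c
#include <assert.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY 2
#define PKEY_REG_BITS (sizeof(uint64_t) * 8)
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

/*
 * Hypothetical helper: AMR value with both access-disable bits set for
 * @pkey and every other key left untouched (all zero).
 */
static uint64_t amr_mask_for_pkey(int pkey)
{
	/* Assumption: 64-bit register, 2 bits per key, so keys 0..31. */
	assert(pkey >= 0 && pkey < 32);
	return UINT64_C(3) << pkeyshift(pkey);
}
```

With this layout, key 0 maps to 0xC000000000000000 and key 1 to
0x3000000000000000, which matches how the child computes amr1 and amr2.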
tools/testing/selftests/powerpc/include/reg.h | 1 +
tools/testing/selftests/powerpc/ptrace/Makefile | 5 +-
.../testing/selftests/powerpc/ptrace/ptrace-pkey.c | 443 ++++++++++++++++++++
3 files changed, 448 insertions(+), 1 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
diff --git a/tools/testing/selftests/powerpc/include/reg.h b/tools/testing/selftests/powerpc/include/reg.h
index 4afdebc..7f348c0 100644
--- a/tools/testing/selftests/powerpc/include/reg.h
+++ b/tools/testing/selftests/powerpc/include/reg.h
@@ -54,6 +54,7 @@
#define SPRN_DSCR_PRIV 0x11 /* Privilege State DSCR */
#define SPRN_DSCR 0x03 /* Data Stream Control Register */
#define SPRN_PPR 896 /* Program Priority Register */
+#define SPRN_AMR 13 /* Authority Mask Register - problem state */
/* TEXASR register bits */
#define TEXASR_FC 0xFE00000000000000
diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile b/tools/testing/selftests/powerpc/ptrace/Makefile
index 4803052..fd896b2 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
TEST_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \
ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx ptrace-tm-vsx \
- ptrace-tm-spd-vsx ptrace-tm-spr
+ ptrace-tm-spd-vsx ptrace-tm-spr ptrace-pkey
include ../../lib.mk
@@ -9,6 +9,9 @@ all: $(TEST_PROGS)
CFLAGS += -m64 -I../../../../../usr/include -I../tm -mhtm -fno-pie
+ptrace-pkey: ../harness.c ../utils.c ../lib/reg.S ptrace.h ptrace-pkey.c
+ $(LINK.c) $^ $(LDLIBS) -pthread -o $@
+
$(TEST_PROGS): ../harness.c ../utils.c ../lib/reg.S ptrace.h
clean:
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
new file mode 100644
index 0000000..2e5b676
--- /dev/null
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -0,0 +1,443 @@
+/*
+ * Ptrace test for Memory Protection Key registers
+ *
+ * Copyright (C) 2015 Anshuman Khandual, IBM Corporation.
+ * Copyright (C) 2017 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <semaphore.h>
+#include "ptrace.h"
+
+#ifndef __NR_pkey_alloc
+#define __NR_pkey_alloc 384
+#endif
+
+#ifndef __NR_pkey_free
+#define __NR_pkey_free 385
+#endif
+
+#ifndef NT_PPC_PKEY
+#define NT_PPC_PKEY 0x110
+#endif
+
+#ifndef PKEY_DISABLE_EXECUTE
+#define PKEY_DISABLE_EXECUTE 0x4
+#endif
+
+#define AMR_BITS_PER_PKEY 2
+#define PKEY_REG_BITS (sizeof(u64) * 8)
+#define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey + 1) * AMR_BITS_PER_PKEY))
+
+static const char user_read[] = "[User Read (Running)]";
+static const char user_write[] = "[User Write (Running)]";
+static const char ptrace_read_running[] = "[Ptrace Read (Running)]";
+static const char ptrace_write_running[] = "[Ptrace Write (Running)]";
+
+/* Information shared between the parent and the child. */
+struct shared_info {
+ /* AMR value the parent expects to read from the child. */
+ unsigned long amr1;
+
+ /* AMR value the parent is expected to write to the child. */
+ unsigned long amr2;
+
+ /* AMR value that ptrace should refuse to write to the child. */
+ unsigned long amr3;
+
+ /* IAMR value the parent expects to read from the child. */
+ unsigned long expected_iamr;
+
+ /* UAMOR value the parent expects to read from the child. */
+ unsigned long expected_uamor;
+
+ /*
+ * IAMR and UAMOR values that ptrace should refuse to write to the child
+ * (even though they're valid ones) because userspace doesn't have
+ * access to those registers.
+ */
+ unsigned long new_iamr;
+ unsigned long new_uamor;
+
+ /* The parent waits on this semaphore. */
+ sem_t sem_parent;
+
+ /* If true, the child should give up as well. */
+ bool parent_gave_up;
+
+ /* The child waits on this semaphore. */
+ sem_t sem_child;
+
+ /* If true, the parent should give up as well. */
+ bool child_gave_up;
+};
+
+#define CHILD_FAIL_IF(x, info) \
+ do { \
+ if ((x)) { \
+ fprintf(stderr, \
+ "[FAIL] Test FAILED on line %d\n", __LINE__); \
+ (info)->child_gave_up = true; \
+ prod_parent(info); \
+ return 1; \
+ } \
+ } while (0)
+
+#define PARENT_FAIL_IF(x, info) \
+ do { \
+ if ((x)) { \
+ fprintf(stderr, \
+ "[FAIL] Test FAILED on line %d\n", __LINE__); \
+ (info)->parent_gave_up = true; \
+ prod_child(info); \
+ return 1; \
+ } \
+ } while (0)
+
+static int wait_child(struct shared_info *info)
+{
+ int ret;
+
+ /* Wait until the child prods us. */
+ ret = sem_wait(&info->sem_parent);
+ if (ret) {
+ perror("Error waiting for child");
+ return TEST_FAIL;
+ }
+
+ return info->child_gave_up ? TEST_FAIL : TEST_PASS;
+}
+
+static int prod_child(struct shared_info *info)
+{
+ int ret;
+
+ /* Unblock the child now. */
+ ret = sem_post(&info->sem_child);
+ if (ret) {
+ perror("Error prodding child");
+ return TEST_FAIL;
+ }
+
+ return TEST_PASS;
+}
+
+static int wait_parent(struct shared_info *info)
+{
+ int ret;
+
+ /* Wait until the parent prods us. */
+ ret = sem_wait(&info->sem_child);
+ if (ret) {
+ perror("Error waiting for parent");
+ return TEST_FAIL;
+ }
+
+ return info->parent_gave_up ? TEST_FAIL : TEST_PASS;
+}
+
+static int prod_parent(struct shared_info *info)
+{
+ int ret;
+
+ /* Unblock the parent now. */
+ ret = sem_post(&info->sem_parent);
+ if (ret) {
+ perror("Error prodding parent");
+ return TEST_FAIL;
+ }
+
+ return TEST_PASS;
+}
+
+static int sys_pkey_alloc(unsigned long flags, unsigned long init_access_rights)
+{
+ return syscall(__NR_pkey_alloc, flags, init_access_rights);
+}
+
+static int sys_pkey_free(int pkey)
+{
+ return syscall(__NR_pkey_free, pkey);
+}
+
+static int ptrace_read_regs(pid_t child, unsigned long regs[], int n)
+{
+ struct iovec iov;
+ long ret;
+
+ FAIL_IF(start_trace(child));
+
+ iov.iov_base = regs;
+ iov.iov_len = n * sizeof(unsigned long);
+
+ ret = ptrace(PTRACE_GETREGSET, child, NT_PPC_PKEY, &iov);
+ FAIL_IF(ret != 0);
+
+ FAIL_IF(stop_trace(child));
+
+ return TEST_PASS;
+}
+
+static long ptrace_write_regs(pid_t child, unsigned long regs[], int n)
+{
+ struct iovec iov;
+ long ret;
+
+ FAIL_IF(start_trace(child));
+
+ iov.iov_base = regs;
+ iov.iov_len = n * sizeof(unsigned long);
+
+ ret = ptrace(PTRACE_SETREGSET, child, NT_PPC_PKEY, &iov);
+
+ FAIL_IF(stop_trace(child));
+
+ return ret;
+}
+
+static int child(struct shared_info *info)
+{
+ unsigned long reg;
+ bool disable_execute = true;
+ int pkey1, pkey2, pkey3;
+ int ret;
+
+ /* Get some pkeys so that we can change their bits in the AMR. */
+ pkey1 = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
+ if (pkey1 < 0) {
+ pkey1 = sys_pkey_alloc(0, 0);
+ CHILD_FAIL_IF(pkey1 < 0, info);
+
+ disable_execute = false;
+ }
+
+ pkey2 = sys_pkey_alloc(0, 0);
+ CHILD_FAIL_IF(pkey2 < 0, info);
+
+ pkey3 = sys_pkey_alloc(0, 0);
+ CHILD_FAIL_IF(pkey3 < 0, info);
+
+ info->amr1 = 3ul << pkeyshift(pkey1);
+ info->amr2 = 3ul << pkeyshift(pkey2);
+ info->amr3 = info->amr2 | 3ul << pkeyshift(pkey3);
+
+ if (disable_execute)
+ info->expected_iamr = 1ul << pkeyshift(pkey1);
+ else
+ info->expected_iamr = 0;
+
+ info->expected_uamor = 3ul << pkeyshift(pkey1) |
+ 3ul << pkeyshift(pkey2);
+ info->new_iamr = 1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2);
+ info->new_uamor = 3ul << pkeyshift(pkey1);
+
+ /*
+ * We won't use pkey3. We just want a plausible but invalid key to test
+ * whether ptrace will let us write to AMR bits we are not supposed to.
+ *
+ * This also tests whether the kernel restores the UAMOR permissions
+ * after a key is freed.
+ */
+ sys_pkey_free(pkey3);
+
+ printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
+ user_write, info->amr1, pkey1, pkey2, pkey3);
+
+ mtspr(SPRN_AMR, info->amr1);
+
+ /* Wait for parent to read our AMR value and write a new one. */
+ ret = prod_parent(info);
+ CHILD_FAIL_IF(ret, info);
+
+ ret = wait_parent(info);
+ if (ret)
+ return ret;
+
+ reg = mfspr(SPRN_AMR);
+
+ printf("%-30s AMR: %016lx\n", user_read, reg);
+
+ CHILD_FAIL_IF(reg != info->amr2, info);
+
+ /*
+ * Wait for parent to try to write an invalid AMR value.
+ */
+ ret = prod_parent(info);
+ CHILD_FAIL_IF(ret, info);
+
+ ret = wait_parent(info);
+ if (ret)
+ return ret;
+
+ reg = mfspr(SPRN_AMR);
+
+ printf("%-30s AMR: %016lx\n", user_read, reg);
+
+ CHILD_FAIL_IF(reg != info->amr2, info);
+
+ /*
+ * Wait for parent to try to write an IAMR and a UAMOR value. We can't
+ * verify them, but we can verify that the AMR didn't change.
+ */
+ ret = prod_parent(info);
+ CHILD_FAIL_IF(ret, info);
+
+ ret = wait_parent(info);
+ if (ret)
+ return ret;
+
+ reg = mfspr(SPRN_AMR);
+
+ printf("%-30s AMR: %016lx\n", user_read, reg);
+
+ CHILD_FAIL_IF(reg != info->amr2, info);
+
+ /* Now let the parent know that we are finished. */
+
+ ret = prod_parent(info);
+ CHILD_FAIL_IF(ret, info);
+
+ return TEST_PASS;
+}
+
+static int parent(struct shared_info *info, pid_t pid)
+{
+ unsigned long regs[4];
+ int ret, status;
+
+ ret = wait_child(info);
+ if (ret)
+ return ret;
+
+ /* Verify that we can read the pkey registers from the child. */
+ ret = ptrace_read_regs(pid, regs, 3);
+ PARENT_FAIL_IF(ret, info);
+
+ printf("%-30s AMR: %016lx IAMR: %016lx UAMOR: %016lx\n",
+ ptrace_read_running, regs[0], regs[1], regs[2]);
+
+ PARENT_FAIL_IF(regs[0] != info->amr1, info);
+ PARENT_FAIL_IF(regs[1] != info->expected_iamr, info);
+ PARENT_FAIL_IF(regs[2] != info->expected_uamor, info);
+
+ /* Write valid AMR value in child. */
+ ret = ptrace_write_regs(pid, &info->amr2, 1);
+ PARENT_FAIL_IF(ret, info);
+
+ printf("%-30s AMR: %016lx\n", ptrace_write_running, info->amr2);
+
+ /* Wake up child so that it can verify it changed. */
+ ret = prod_child(info);
+ PARENT_FAIL_IF(ret, info);
+
+ ret = wait_child(info);
+ if (ret)
+ return ret;
+
+ /* Write invalid AMR value in child. */
+ ret = ptrace_write_regs(pid, &info->amr3, 1);
+ PARENT_FAIL_IF(ret, info);
+
+ printf("%-30s AMR: %016lx\n", ptrace_write_running, info->amr3);
+
+ /* Wake up child so that it can verify it didn't change. */
+ ret = prod_child(info);
+ PARENT_FAIL_IF(ret, info);
+
+ ret = wait_child(info);
+ if (ret)
+ return ret;
+
+ /* Try to write to IAMR. */
+ regs[0] = info->amr1;
+ regs[1] = info->new_iamr;
+ ret = ptrace_write_regs(pid, regs, 2);
+ PARENT_FAIL_IF(!ret, info);
+
+ printf("%-30s AMR: %016lx IAMR: %016lx\n",
+ ptrace_write_running, regs[0], regs[1]);
+
+ /* Try to write to IAMR and UAMOR. */
+ regs[2] = info->new_uamor;
+ ret = ptrace_write_regs(pid, regs, 3);
+ PARENT_FAIL_IF(!ret, info);
+
+ printf("%-30s AMR: %016lx IAMR: %016lx UAMOR: %016lx\n",
+ ptrace_write_running, regs[0], regs[1], regs[2]);
+
+ /* Verify that all registers still have their expected values. */
+ ret = ptrace_read_regs(pid, regs, 3);
+ PARENT_FAIL_IF(ret, info);
+
+ printf("%-30s AMR: %016lx IAMR: %016lx UAMOR: %016lx\n",
+ ptrace_read_running, regs[0], regs[1], regs[2]);
+
+ PARENT_FAIL_IF(regs[0] != info->amr2, info);
+ PARENT_FAIL_IF(regs[1] != info->expected_iamr, info);
+ PARENT_FAIL_IF(regs[2] != info->expected_uamor, info);
+
+ /* Wake up child so that it can verify AMR didn't change and wrap up. */
+ ret = prod_child(info);
+ PARENT_FAIL_IF(ret, info);
+
+ ret = wait(&status);
+ if (ret != pid) {
+ printf("Child's exit status not captured\n");
+ ret = TEST_PASS;
+ } else if (!WIFEXITED(status)) {
+ printf("Child exited abnormally\n");
+ ret = TEST_FAIL;
+ } else
+ ret = WEXITSTATUS(status) ? TEST_FAIL : TEST_PASS;
+
+ return ret;
+}
+
+static int ptrace_pkey(void)
+{
+ struct shared_info *info;
+ int shm_id;
+ int ret;
+ pid_t pid;
+
+ shm_id = shmget(IPC_PRIVATE, sizeof(*info), 0777 | IPC_CREAT);
+ info = shmat(shm_id, NULL, 0);
+
+ ret = sem_init(&info->sem_parent, 1, 0);
+ if (ret) {
+ perror("Semaphore initialization failed");
+ return TEST_FAIL;
+ }
+ ret = sem_init(&info->sem_child, 1, 0);
+ if (ret) {
+ perror("Semaphore initialization failed");
+ return TEST_FAIL;
+ }
+
+ pid = fork();
+ if (pid < 0) {
+ perror("fork() failed");
+ ret = TEST_FAIL;
+ } else if (pid == 0)
+ ret = child(info);
+ else
+ ret = parent(info, pid);
+
+ shmdt(info);
+
+ if (pid) {
+ sem_destroy(&info->sem_parent);
+ sem_destroy(&info->sem_child);
+ shmctl(shm_id, IPC_RMID, NULL);
+ }
+
+ return ret;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(ptrace_pkey, "ptrace_pkey");
+}
--
1.7.1
From 1584414697086186794@xxx Sat Nov 18 14:33:23 +0000 2017
sys_pkey_modify() is a powerpc-specific system call that lets an
application modify *any* attribute of a key.
Since powerpc disallows modification of the IAMR from user space,
an application cannot change a key's execute attribute on its own.
This system call fills that gap.
Signed-off-by: Ram Pai <[email protected]>
---
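The argument checks performed by the new system call can be modelled in
user space as a rough sketch. The names model_pkey_modify_check() and
MODEL_ACCESS_MASK are hypothetical. The contents of the mask are an
assumption, since PKEY_ACCESS_MASK is not defined in this patch, and the
allocation test is reduced to a single bit test on a 32-bit map, which
may be simpler than what mm_pkey_is_allocated() really does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PKEY_DISABLE_ACCESS  0x1
#define PKEY_DISABLE_WRITE   0x2
#define PKEY_DISABLE_EXECUTE 0x4

/* Assumption: the accepted mask covers all three disable bits. */
#define MODEL_ACCESS_MASK \
	(PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE | PKEY_DISABLE_EXECUTE)

/*
 * Hypothetical model of the validation order: reject unknown bits in
 * @new_val first, then reject keys that are not marked in
 * @allocation_map. Returns 0 when the request would be accepted.
 */
static long model_pkey_modify_check(uint32_t allocation_map, int pkey,
				    unsigned long new_val)
{
	if (new_val & ~(unsigned long)MODEL_ACCESS_MASK)
		return -EINVAL;

	if (pkey < 0 || pkey >= 32)
		return -EINVAL;

	if (!(allocation_map & (UINT32_C(1) << pkey)))
		return -EINVAL;

	return 0;
}
```

In this model a request to set PKEY_DISABLE_EXECUTE on an allocated key
passes, while an unallocated key or an unknown bit yields -EINVAL.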
arch/powerpc/include/asm/systbl.h | 1 +
arch/powerpc/include/asm/unistd.h | 2 +-
arch/powerpc/include/uapi/asm/unistd.h | 1 +
arch/powerpc/kernel/entry_64.S | 9 +++++++++
arch/powerpc/mm/pkeys.c | 17 +++++++++++++++++
5 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index d61f9c9..533cdc5 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -392,3 +392,4 @@
SYSCALL(pkey_alloc)
SYSCALL(pkey_free)
SYSCALL(pkey_mprotect)
+PPC64ONLY(pkey_modify)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index daf1ba9..1e97086 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
#include <uapi/asm/unistd.h>
-#define NR_syscalls 387
+#define NR_syscalls 388
#define __NR__exit __NR_exit
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 389c36f..318cd79 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -398,5 +398,6 @@
#define __NR_pkey_alloc 384
#define __NR_pkey_free 385
#define __NR_pkey_mprotect 386
+#define __NR_pkey_modify 387
#endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 4a0fd4f..47c85f9 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -455,6 +455,15 @@ _GLOBAL(ppc_switch_endian)
bl sys_switch_endian
b .Lsyscall_exit
+_GLOBAL(ppc_pkey_modify)
+ bl save_nvgprs
+#ifdef CONFIG_PPC_MEM_KEYS
+ bl sys_pkey_modify
+#else
+ bl sys_ni_syscall
+#endif
+ b .Lsyscall_exit
+
_GLOBAL(ret_from_fork)
bl schedule_tail
REST_NVGPRS(r1)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 5047371..2612f61 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -420,3 +420,20 @@ bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
return pkey_access_permitted(vma_pkey(vma), write, execute);
}
+
+long sys_pkey_modify(int pkey, unsigned long new_val)
+{
+ bool ret;
+ /* Check for unsupported init values */
+ if (new_val & ~PKEY_ACCESS_MASK)
+ return -EINVAL;
+
+ down_write(&current->mm->mmap_sem);
+ ret = mm_pkey_is_allocated(current->mm, pkey);
+ up_write(&current->mm->mmap_sem);
+
+ if (!ret)
+ return -EINVAL;
+
+ return __arch_set_user_pkey_access(current, pkey, new_val);
+}
--
1.7.1
From 1583307962777472105@xxx Mon Nov 06 09:22:19 +0000 2017
The get_mm_addr_key() helper returns the pkey associated with
an address in the address space of a given mm_struct.
Signed-off-by: Ram Pai <[email protected]>
---
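The helper relies on pte_to_pkey_bits() to turn five PTE flag bits into a
5-bit key number, with H_PTE_PKEY_BIT0 as the most significant bit. A
user-space sketch of that packing follows. The SKETCH_PTE_PKEY_BIT*
positions are invented for illustration only and do not reflect the real
H_PTE_PKEY_BIT* values, and sketch_pte_to_pkey() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag positions, chosen only so the sketch compiles. */
#define SKETCH_PTE_PKEY_BIT0 (UINT64_C(1) << 60)
#define SKETCH_PTE_PKEY_BIT1 (UINT64_C(1) << 59)
#define SKETCH_PTE_PKEY_BIT2 (UINT64_C(1) << 58)
#define SKETCH_PTE_PKEY_BIT3 (UINT64_C(1) << 57)
#define SKETCH_PTE_PKEY_BIT4 (UINT64_C(1) << 56)

/* Gather the five scattered flag bits into a key number in 0..31. */
static uint16_t sketch_pte_to_pkey(uint64_t pteflags)
{
	return (uint16_t)(((pteflags & SKETCH_PTE_PKEY_BIT0) ? 0x10 : 0x0) |
			  ((pteflags & SKETCH_PTE_PKEY_BIT1) ? 0x8 : 0x0) |
			  ((pteflags & SKETCH_PTE_PKEY_BIT2) ? 0x4 : 0x0) |
			  ((pteflags & SKETCH_PTE_PKEY_BIT3) ? 0x2 : 0x0) |
			  ((pteflags & SKETCH_PTE_PKEY_BIT4) ? 0x1 : 0x0));
}
```

A PTE with no key bits set maps to key 0, which is also what the helper
returns when no PTE is found for the address.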
arch/powerpc/include/asm/mmu.h | 9 +++++++++
arch/powerpc/mm/hash_utils_64.c | 24 ++++++++++++++++++++++++
2 files changed, 33 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 6364f5c..bb38312 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -260,6 +260,15 @@ static inline bool early_radix_enabled(void)
}
#endif
+#ifdef CONFIG_PPC_MEM_KEYS
+extern u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address);
+#else
+static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)
+{
+ return 0;
+}
+#endif /* CONFIG_PPC_MEM_KEYS */
+
#endif /* !__ASSEMBLY__ */
/* The kernel use the constants below to index in the page sizes array.
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ddfc673..0108d12 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1575,6 +1575,30 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
local_irq_restore(flags);
}
+#ifdef CONFIG_PPC_MEM_KEYS
+/*
+ * Return the protection key associated with the given address and the
+ * mm_struct.
+ */
+u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)
+{
+ pte_t *ptep;
+ u16 pkey = 0;
+ unsigned long flags;
+
+ if (!mm || !mm->pgd)
+ return 0;
+
+ local_irq_save(flags);
+ ptep = find_linux_pte(mm->pgd, address, NULL, NULL);
+ if (ptep)
+ pkey = pte_to_pkey_bits(pte_val(READ_ONCE(*ptep)));
+ local_irq_restore(flags);
+
+ return pkey;
+}
+#endif /* CONFIG_PPC_MEM_KEYS */
+
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
static inline void tm_flush_hash_page(int local)
{
--
1.7.1
From 1583309536161015528@xxx Mon Nov 06 09:47:19 +0000 2017
If the flag is 0, no bits will be set. Hence we can't expect
the resulting bitmap to have a higher value than it had
earlier.
Signed-off-by: Ram Pai <[email protected]>
---
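The reasoning can be checked with a small model of the register update.
Setting bits with OR can only keep the value or increase it, so ">="
always holds, while ">" fails whenever no new bit is set, for example
when the flags are 0 or the requested bits are already set. The helper
name sketch_disable_set() is hypothetical, and the layout (two bits per
key starting at bit 0, 16 keys) follows the x86 shadow register update
in the selftest.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2

/*
 * Hypothetical model of the shadow update in pkey_disable_set():
 * OR @flags into the 2-bit slot that belongs to @pkey.
 */
static uint64_t sketch_disable_set(uint64_t reg, int pkey, unsigned int flags)
{
	/* Assumption: 16 keys, as NR_PKEYS in the selftest header. */
	assert(pkey >= 0 && pkey < 16);
	return reg | ((uint64_t)flags << (pkey * 2));
}
```

Calling the model twice with the same flags returns the same value the
second time, which is exactly the case the strict comparison rejected.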
tools/testing/selftests/vm/protection_keys.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 8e2e277..5aba137 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -443,7 +443,7 @@ void pkey_disable_set(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%lx\n",
__func__, pkey, rdpkey_reg());
if (flags)
- pkey_assert(rdpkey_reg() > orig_pkey_reg);
+ pkey_assert(rdpkey_reg() >= orig_pkey_reg);
dprintf1("END<---%s(%d, 0x%x)\n", __func__,
pkey, flags);
}
--
1.7.1
From 1585213471462168994@xxx Mon Nov 27 10:09:33 +0000 2017
This patch provides the detailed implementation for
a user to allocate a key and enable it in the hardware.
It provides the plumbing, but it cannot be used till
the system call is implemented. The next patch will
do so.
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
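The mapping from the generic PKEY_DISABLE_* flags to AMR bits can be
tried out in user space with the sketch below. sketch_set_pkey_access()
is a hypothetical name, and it operates on a plain 64-bit value instead
of the real register. Clearing the key's old two bits before setting the
new ones is an assumption about what init_amr() does, since its full
body is not part of this diff.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2

#define AMR_BITS_PER_PKEY 2
#define AMR_RD_BIT UINT64_C(0x1)
#define AMR_WR_BIT UINT64_C(0x2)
#define PKEY_REG_BITS (sizeof(uint64_t) * 8)
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

/* Return the AMR value that results from applying @init_val to @pkey. */
static uint64_t sketch_set_pkey_access(uint64_t amr, int pkey,
				       unsigned long init_val)
{
	uint64_t bits = 0;

	assert(pkey >= 0 && pkey < 32);

	/* Disabling access implies disabling both read and write. */
	if (init_val & PKEY_DISABLE_ACCESS)
		bits |= AMR_RD_BIT | AMR_WR_BIT;
	else if (init_val & PKEY_DISABLE_WRITE)
		bits |= AMR_WR_BIT;

	/* Assumption: the old bits for this key are replaced, not merged. */
	amr &= ~(UINT64_C(3) << pkeyshift(pkey));
	return amr | (bits << pkeyshift(pkey));
}
```

PKEY_DISABLE_ACCESS takes precedence when both flags are passed, since
the write bit is already covered by it.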
arch/powerpc/include/asm/pkeys.h | 6 ++++-
arch/powerpc/mm/pkeys.c | 40 ++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0d00a54..652c750 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -139,10 +139,14 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
return 0;
}
+extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+ unsigned long init_val);
static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val)
{
- return 0;
+ if (static_branch_likely(&pkey_disabled))
+ return -EINVAL;
+ return __arch_set_user_pkey_access(tsk, pkey, init_val);
}
static inline void pkey_mm_init(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index f3bf661..4a01c2f 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -9,6 +9,7 @@
* (at your option) any later version.
*/
+#include <asm/mman.h>
#include <linux/pkeys.h>
DEFINE_STATIC_KEY_TRUE(pkey_disabled);
@@ -17,6 +18,9 @@
u32 initial_allocation_mask; /* Bits set for reserved keys */
#define AMR_BITS_PER_PKEY 2
+#define AMR_RD_BIT 0x1UL
+#define AMR_WR_BIT 0x2UL
+#define IAMR_EX_BIT 0x1UL
#define PKEY_REG_BITS (sizeof(u64)*8)
#define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
@@ -102,6 +106,20 @@ static inline void write_uamor(u64 value)
mtspr(SPRN_UAMOR, value);
}
+static bool is_pkey_enabled(int pkey)
+{
+ u64 uamor = read_uamor();
+ u64 pkey_bits = 0x3ul << pkeyshift(pkey);
+ u64 uamor_pkey_bits = (uamor & pkey_bits);
+
+ /*
+ * Both the bits in UAMOR corresponding to the key should be set or
+ * reset.
+ */
+ WARN_ON(uamor_pkey_bits && (uamor_pkey_bits != pkey_bits));
+ return !!(uamor_pkey_bits);
+}
+
static inline void init_amr(int pkey, u8 init_bits)
{
u64 new_amr_bits = (((u64)init_bits & 0x3UL) << pkeyshift(pkey));
@@ -144,3 +162,25 @@ void __arch_deactivate_pkey(int pkey)
{
pkey_status_change(pkey, false);
}
+
+/*
+ * Set the access rights in the AMR, IAMR and UAMOR registers for @pkey to
+ * those specified in @init_val.
+ */
+int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+ unsigned long init_val)
+{
+ u64 new_amr_bits = 0x0ul;
+
+ if (!is_pkey_enabled(pkey))
+ return -EINVAL;
+
+ /* Set the bits we need in AMR: */
+ if (init_val & PKEY_DISABLE_ACCESS)
+ new_amr_bits |= AMR_RD_BIT | AMR_WR_BIT;
+ else if (init_val & PKEY_DISABLE_WRITE)
+ new_amr_bits |= AMR_WR_BIT;
+
+ init_amr(pkey, new_amr_bits);
+ return 0;
+}
--
1.7.1
From 1583309180487644457@xxx Mon Nov 06 09:41:40 +0000 2017
Implement helper functions to read and write the key-related
registers: AMR, IAMR and UAMOR.
The AMR register tracks the read and write permissions of a key.
The IAMR register tracks the execute permission of a key.
The UAMOR register enables and disables a key.
Acked-by: Balbir Singh <[email protected]>
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
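How the UAMOR enables a key can be illustrated with a user-space sketch
that treats the register as a plain 64-bit value. It follows the check in
is_pkey_enabled() elsewhere in this series, where a key counts as enabled
when its two UAMOR bits are set. sketch_pkey_enabled() is a hypothetical
name, and the assert stands in for the kernel's WARN_ON on a half-set
pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY 2
#define PKEY_REG_BITS (sizeof(uint64_t) * 8)
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

/* Report whether @pkey is enabled in the given UAMOR value. */
static bool sketch_pkey_enabled(uint64_t uamor, int pkey)
{
	uint64_t pkey_bits;
	uint64_t uamor_pkey_bits;

	assert(pkey >= 0 && pkey < 32);

	pkey_bits = UINT64_C(3) << pkeyshift(pkey);
	uamor_pkey_bits = uamor & pkey_bits;

	/* Both bits of a key are expected to be set or clear together. */
	assert(uamor_pkey_bits == 0 || uamor_pkey_bits == pkey_bits);
	return uamor_pkey_bits != 0;
}
```

With only key 1 enabled, the UAMOR value is 0x3000000000000000 and every
other key reports as disabled.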
arch/powerpc/mm/pkeys.c | 36 ++++++++++++++++++++++++++++++++++++
1 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 512bdf2..b6bdfdf 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -61,3 +61,39 @@ void __init pkey_initialize(void)
for (i = 2; i < (pkeys_total - os_reserved); i++)
initial_allocation_mask &= ~(0x1 << i);
}
+
+static inline u64 read_amr(void)
+{
+ return mfspr(SPRN_AMR);
+}
+
+static inline void write_amr(u64 value)
+{
+ mtspr(SPRN_AMR, value);
+}
+
+static inline u64 read_iamr(void)
+{
+ if (!likely(pkey_execute_disable_supported))
+ return 0x0UL;
+
+ return mfspr(SPRN_IAMR);
+}
+
+static inline void write_iamr(u64 value)
+{
+ if (!likely(pkey_execute_disable_supported))
+ return;
+
+ mtspr(SPRN_IAMR, value);
+}
+
+static inline u64 read_uamor(void)
+{
+ return mfspr(SPRN_UAMOR);
+}
+
+static inline void write_uamor(u64 value)
+{
+ mtspr(SPRN_UAMOR, value);
+}
--
1.7.1
From 1583309458835695531@xxx Mon Nov 06 09:46:05 +0000 2017
Add a helper function that checks whether a read, write or execute
access is allowed on the pte.
Signed-off-by: Ram Pai <[email protected]>
---
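A simplified user-space model of the permission check may help when
reading the diff. It keeps the three registers in a struct instead of
reading SPRs, and the names struct sketch_regs and
sketch_access_permitted() are hypothetical. Unlike
pkey_access_permitted() in the patch, the model decides execute requests
from the IAMR bit alone. That is a deliberate simplification of the
sketch, not the patch's exact control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY 2
#define AMR_RD_BIT  UINT64_C(0x1)
#define AMR_WR_BIT  UINT64_C(0x2)
#define IAMR_EX_BIT UINT64_C(0x1)
#define PKEY_REG_BITS (sizeof(uint64_t) * 8)
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

/* Stand-in for the three special purpose registers. */
struct sketch_regs {
	uint64_t amr;
	uint64_t iamr;
	uint64_t uamor;
};

static bool sketch_access_permitted(const struct sketch_regs *r, int pkey,
				    bool write, bool execute)
{
	size_t shift;

	assert(pkey >= 0 && pkey < 32);

	/* Key 0 is never restricted. */
	if (pkey == 0)
		return true;

	shift = pkeyshift(pkey);

	/* A key that is not enabled in UAMOR imposes no restriction. */
	if (!(r->uamor & (UINT64_C(3) << shift)))
		return true;

	/* Simplification: execute depends on the IAMR bit only. */
	if (execute)
		return !(r->iamr & (IAMR_EX_BIT << shift));

	if (write)
		return !(r->amr & (AMR_WR_BIT << shift));

	return !(r->amr & (AMR_RD_BIT << shift));
}
```

For a key whose write-disable bit is set, the model allows reads and
execution but denies writes. Keys that are not enabled are always
allowed.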
arch/powerpc/include/asm/book3s/64/pgtable.h | 4 +++
arch/powerpc/include/asm/pkeys.h | 9 ++++++++
arch/powerpc/mm/pkeys.c | 28 ++++++++++++++++++++++++++
3 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 4c1ee6e..c277a63 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -462,6 +462,10 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 1);
}
+#ifdef CONFIG_PPC_MEM_KEYS
+extern bool arch_pte_access_permitted(u64 pte, bool write, bool execute);
+#endif /* CONFIG_PPC_MEM_KEYS */
+
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 06a58fe..3437a50 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -82,6 +82,15 @@ static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL));
}
+static inline u16 pte_to_pkey_bits(u64 pteflags)
+{
+ return (((pteflags & H_PTE_PKEY_BIT0) ? 0x10 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT1) ? 0x8 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT2) ? 0x4 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT3) ? 0x2 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT4) ? 0x1 : 0x0UL));
+}
+
#define pkey_alloc_mask(pkey) (0x1 << pkey)
#define mm_pkey_allocation_map(mm) (mm->context.pkey_allocation_map)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index f1c6195..13902be 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -347,3 +347,31 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot,
/* Nothing to override. */
return vma_pkey(vma);
}
+
+static bool pkey_access_permitted(int pkey, bool write, bool execute)
+{
+ int pkey_shift;
+ u64 amr;
+
+ if (!pkey)
+ return true;
+
+ if (!is_pkey_enabled(pkey))
+ return true;
+
+ pkey_shift = pkeyshift(pkey);
+ if (execute && !(read_iamr() & (IAMR_EX_BIT << pkey_shift)))
+ return true;
+
+ amr = read_amr(); /* Delay reading amr until absolutely needed */
+ return ((!write && !(amr & (AMR_RD_BIT << pkey_shift))) ||
+ (write && !(amr & (AMR_WR_BIT << pkey_shift))));
+}
+
+bool arch_pte_access_permitted(u64 pte, bool write, bool execute)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return true;
+
+ return pkey_access_permitted(pte_to_pkey_bits(pte), write, execute);
+}
--
1.7.1
From 1583346504288942528@xxx Mon Nov 06 19:34:55 +0000 2017
Move all the generic definitions and helper functions to the
header file.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/pkey-helpers.h | 62 +++++++++++++++++++++++--
tools/testing/selftests/vm/protection_keys.c | 54 ----------------------
2 files changed, 57 insertions(+), 59 deletions(-)
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index 2d91d34..1b15b54 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -13,8 +13,31 @@
#include <ucontext.h>
#include <sys/mman.h>
+/* Define some kernel-like types */
+#define u8 uint8_t
+#define u16 uint16_t
+#define u32 uint32_t
+#define u64 uint64_t
+
+#ifdef __i386__
+#define SYS_mprotect_key 380
+#define SYS_pkey_alloc 381
+#define SYS_pkey_free 382
+#define REG_IP_IDX REG_EIP
+#define si_pkey_offset 0x14
+#else
+#define SYS_mprotect_key 329
+#define SYS_pkey_alloc 330
+#define SYS_pkey_free 331
+#define REG_IP_IDX REG_RIP
+#define si_pkey_offset 0x20
+#endif
+
#define NR_PKEYS 16
#define PKEY_BITS_PER_PKEY 2
+#define PKEY_DISABLE_ACCESS 0x1
+#define PKEY_DISABLE_WRITE 0x2
+#define HPAGE_SIZE (1UL<<21)
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
@@ -138,11 +161,6 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write)
dprintf4("pkey_reg now: %08x\n", rdpkey_reg());
}
-#define PROT_PKEY0 0x10 /* protection key value (bit 0) */
-#define PROT_PKEY1 0x20 /* protection key value (bit 1) */
-#define PROT_PKEY2 0x40 /* protection key value (bit 2) */
-#define PROT_PKEY3 0x80 /* protection key value (bit 3) */
-
#define PAGE_SIZE 4096
#define MB (1<<20)
@@ -220,4 +238,38 @@ int pkey_reg_xstate_offset(void)
return xstate_offset;
}
+static inline void __page_o_noops(void)
+{
+ /* 8-bytes of instruction * 512 bytes = 1 page */
+ asm(".rept 512 ; nopl 0x7eeeeeee(%eax) ; .endr");
+}
+
#endif /* _PKEYS_HELPER_H */
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1))
+#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
+#define ALIGN_PTR_UP(p, ptr_align_to) \
+ ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to))
+#define ALIGN_PTR_DOWN(p, ptr_align_to) \
+ ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to))
+#define __stringify_1(x...) #x
+#define __stringify(x...) __stringify_1(x)
+
+#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
+
+int dprint_in_signal;
+char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
+
+extern void abort_hooks(void);
+#define pkey_assert(condition) do { \
+ if (!(condition)) { \
+ dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
+ __FILE__, __LINE__, \
+ test_nr, iteration_nr); \
+ dprintf0("errno at assert: %d", errno); \
+ abort_hooks(); \
+ assert(condition); \
+ } \
+} while (0)
+#define raw_assert(cond) assert(cond)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 27b11e6..dec05e0 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -49,34 +49,9 @@
int test_nr;
unsigned int shadow_pkey_reg;
-
-#define HPAGE_SIZE (1UL<<21)
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
-#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1))
-#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
-#define ALIGN_PTR_UP(p, ptr_align_to) ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to))
-#define ALIGN_PTR_DOWN(p, ptr_align_to) ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to))
-#define __stringify_1(x...) #x
-#define __stringify(x...) __stringify_1(x)
-
-#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
-
int dprint_in_signal;
char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
-extern void abort_hooks(void);
-#define pkey_assert(condition) do { \
- if (!(condition)) { \
- dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
- __FILE__, __LINE__, \
- test_nr, iteration_nr); \
- dprintf0("errno at assert: %d", errno); \
- abort_hooks(); \
- assert(condition); \
- } \
-} while (0)
-#define raw_assert(cond) assert(cond)
-
void cat_into_file(char *str, char *file)
{
int fd = open(file, O_RDWR);
@@ -154,12 +129,6 @@ void abort_hooks(void)
#endif
}
-static inline void __page_o_noops(void)
-{
- /* 8-bytes of instruction * 512 bytes = 1 page */
- asm(".rept 512 ; nopl 0x7eeeeeee(%eax) ; .endr");
-}
-
/*
* This attempts to have roughly a page of instructions followed by a few
* instructions that do a write, and another page of instructions. That
@@ -182,26 +151,6 @@ void lots_o_noops_around_write(int *write_to_me)
dprintf3("%s() done\n", __func__);
}
-/* Define some kernel-like types */
-#define u8 uint8_t
-#define u16 uint16_t
-#define u32 uint32_t
-#define u64 uint64_t
-
-#ifdef __i386__
-#define SYS_mprotect_key 380
-#define SYS_pkey_alloc 381
-#define SYS_pkey_free 382
-#define REG_IP_IDX REG_EIP
-#define si_pkey_offset 0x14
-#else
-#define SYS_mprotect_key 329
-#define SYS_pkey_alloc 330
-#define SYS_pkey_free 331
-#define REG_IP_IDX REG_RIP
-#define si_pkey_offset 0x20
-#endif
-
void dump_mem(void *dumpme, int len_bytes)
{
char *c = (void *)dumpme;
@@ -412,9 +361,6 @@ void dumpit(char *f)
close(fd);
}
-#define PKEY_DISABLE_ACCESS 0x1
-#define PKEY_DISABLE_WRITE 0x2
-
u32 pkey_get(int pkey, unsigned long flags)
{
u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
--
1.7.1
From 1583308442037330083@xxx Mon Nov 06 09:29:56 +0000 2017
Store and restore the AMR, IAMR and UAMOR register state of the task
before scheduling out and after scheduling in, respectively.
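
The save/restore scheme can be modelled in user space. The sketch below
is only an illustration: struct fake_cpu and the regs_save()/regs_restore()
helpers are hypothetical stand-ins for the real SPR accessors, and the
write counter exists only to show that a register is rewritten solely
when the incoming thread's value differs from the outgoing thread's.

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread copy of the key registers (same three fields the patch adds). */
struct thread_regs {
	uint64_t amr;
	uint64_t iamr;
	uint64_t uamor;
};

/* Hypothetical stand-in for the hardware registers; counts register writes. */
struct fake_cpu {
	struct thread_regs regs;
	unsigned int writes;
};

/* Scheduling out: snapshot the live registers into the thread. */
static void regs_save(const struct fake_cpu *cpu, struct thread_regs *t)
{
	assert(cpu && t);
	*t = cpu->regs;
}

/* Scheduling in: write a register only if the two threads disagree on it. */
static void regs_restore(struct fake_cpu *cpu,
			 const struct thread_regs *new_t,
			 const struct thread_regs *old_t)
{
	assert(cpu && new_t && old_t);
	if (old_t->amr != new_t->amr) {
		cpu->regs.amr = new_t->amr;
		cpu->writes++;
	}
	if (old_t->iamr != new_t->iamr) {
		cpu->regs.iamr = new_t->iamr;
		cpu->writes++;
	}
	if (old_t->uamor != new_t->uamor) {
		cpu->regs.uamor = new_t->uamor;
		cpu->writes++;
	}
}
```

Comparing against the outgoing thread's saved values avoids a register
write when both threads share the same key state, which is presumably
the common case for tasks that never allocate a key.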
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/mmu_context.h | 3 ++
arch/powerpc/include/asm/pkeys.h | 4 ++
arch/powerpc/include/asm/processor.h | 5 +++
arch/powerpc/kernel/process.c | 7 ++++
arch/powerpc/mm/pkeys.c | 49 +++++++++++++++++++++++++++++++-
5 files changed, 67 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 6d7c4f1..4eccc2f 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -146,6 +146,9 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
#ifndef CONFIG_PPC_MEM_KEYS
#define pkey_initialize()
#define pkey_mm_init(mm)
+#define thread_pkey_regs_save(thread)
+#define thread_pkey_regs_restore(new_thread, old_thread)
+#define thread_pkey_regs_init(thread)
#endif /* CONFIG_PPC_MEM_KEYS */
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 652c750..0b2d9f0 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -156,5 +156,9 @@ static inline void pkey_mm_init(struct mm_struct *mm)
mm_pkey_allocation_map(mm) = initial_allocation_mask;
}
+extern void thread_pkey_regs_save(struct thread_struct *thread);
+extern void thread_pkey_regs_restore(struct thread_struct *new_thread,
+ struct thread_struct *old_thread);
+extern void thread_pkey_regs_init(struct thread_struct *thread);
extern void pkey_initialize(void);
#endif /*_ASM_POWERPC_KEYS_H */
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index fab7ff8..e3c417c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -309,6 +309,11 @@ struct thread_struct {
struct thread_vr_state ckvr_state; /* Checkpointed VR state */
unsigned long ckvrsave; /* Checkpointed VRSAVE */
#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#ifdef CONFIG_PPC_MEM_KEYS
+ unsigned long amr;
+ unsigned long iamr;
+ unsigned long uamor;
+#endif
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void* kvm_shadow_vcpu; /* KVM internal data */
#endif /* CONFIG_KVM_BOOK3S_32_HANDLER */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bb..148b934 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -42,6 +42,7 @@
#include <linux/hw_breakpoint.h>
#include <linux/uaccess.h>
#include <linux/elf-randomize.h>
+#include <linux/pkeys.h>
#include <asm/pgtable.h>
#include <asm/io.h>
@@ -1085,6 +1086,8 @@ static inline void save_sprs(struct thread_struct *t)
t->tar = mfspr(SPRN_TAR);
}
#endif
+
+ thread_pkey_regs_save(t);
}
static inline void restore_sprs(struct thread_struct *old_thread,
@@ -1120,6 +1123,8 @@ static inline void restore_sprs(struct thread_struct *old_thread,
mtspr(SPRN_TAR, new_thread->tar);
}
#endif
+
+ thread_pkey_regs_restore(new_thread, old_thread);
}
#ifdef CONFIG_PPC_BOOK3S_64
@@ -1705,6 +1710,8 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
current->thread.tm_tfiar = 0;
current->thread.load_tm = 0;
#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
+ thread_pkey_regs_init(&current->thread);
}
EXPORT_SYMBOL(start_thread);
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 3ddc13a..469f370 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -16,6 +16,8 @@
bool pkey_execute_disable_supported;
int pkeys_total; /* Total pkeys as per device tree */
u32 initial_allocation_mask; /* Bits set for reserved keys */
+u64 pkey_amr_uamor_mask; /* Bits in AMR/UAMOR not to be touched */
+u64 pkey_iamr_mask; /* Bits in IAMR not to be touched */
#define AMR_BITS_PER_PKEY 2
#define AMR_RD_BIT 0x1UL
@@ -74,8 +76,16 @@ void __init pkey_initialize(void)
* programming note.
*/
initial_allocation_mask = ~0x0;
- for (i = 2; i < (pkeys_total - os_reserved); i++)
+
+ /* register mask is in BE format */
+ pkey_amr_uamor_mask = ~0x0ul;
+ pkey_iamr_mask = ~0x0ul;
+
+ for (i = 2; i < (pkeys_total - os_reserved); i++) {
initial_allocation_mask &= ~(0x1 << i);
+ pkey_amr_uamor_mask &= ~(0x3ul << pkeyshift(i));
+ pkey_iamr_mask &= ~(0x1ul << pkeyshift(i));
+ }
}
static inline u64 read_amr(void)
@@ -200,3 +210,40 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
init_amr(pkey, new_amr_bits);
return 0;
}
+
+void thread_pkey_regs_save(struct thread_struct *thread)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return;
+
+ /*
+ * TODO: Skip saving registers if @thread hasn't used any keys yet.
+ */
+ thread->amr = read_amr();
+ thread->iamr = read_iamr();
+ thread->uamor = read_uamor();
+}
+
+void thread_pkey_regs_restore(struct thread_struct *new_thread,
+ struct thread_struct *old_thread)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return;
+
+ /*
+ * TODO: Just set UAMOR to zero if @new_thread hasn't used any keys yet.
+ */
+ if (old_thread->amr != new_thread->amr)
+ write_amr(new_thread->amr);
+ if (old_thread->iamr != new_thread->iamr)
+ write_iamr(new_thread->iamr);
+ if (old_thread->uamor != new_thread->uamor)
+ write_uamor(new_thread->uamor);
+}
+
+void thread_pkey_regs_init(struct thread_struct *thread)
+{
+ write_amr(read_amr() & pkey_amr_uamor_mask);
+ write_iamr(read_iamr() & pkey_iamr_mask);
+ write_uamor(read_uamor() & pkey_amr_uamor_mask);
+}
--
1.7.1
From 1583307853746238184@xxx Mon Nov 06 09:20:35 +0000 2017
Detect a write violation on a page to which a write-disabled
key is associated long after the page is mapped.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 998a44f..0b7b826 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1033,6 +1033,17 @@ void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr,
expected_pkey_fault(pkey);
}
+void test_write_of_write_disabled_region_with_page_already_mapped(int *ptr,
+ u16 pkey)
+{
+ *ptr = __LINE__;
+ dprintf1("disabling write access; after accessing the page, "
+ "to PKEY[%02d], doing write\n", pkey);
+ pkey_write_deny(pkey);
+ *ptr = __LINE__;
+ expected_pkey_fault(pkey);
+}
+
void test_write_of_write_disabled_region(int *ptr, u16 pkey)
{
dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
@@ -1329,6 +1340,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
test_read_of_access_disabled_region,
test_read_of_access_disabled_region_with_page_already_mapped,
test_write_of_write_disabled_region,
+ test_write_of_write_disabled_region_with_page_already_mapped,
test_write_of_access_disabled_region,
test_kernel_write_of_access_disabled_region,
test_kernel_write_of_write_disabled_region,
--
1.7.1
From 1583309483287227405@xxx Mon Nov 06 09:46:29 +0000 2017
Introduce a new allocator that allocates 4k hardware pages to back
a 64k Linux page. This allocator is only applicable on powerpc.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 30 ++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index c790bff..7b3649f 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -765,6 +765,35 @@ void free_pkey_malloc(void *ptr)
return ptr;
}
+void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
+{
+#ifdef __powerpc64__
+ void *ptr;
+ int ret;
+
+ dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
+ size, prot, pkey);
+ pkey_assert(pkey < NR_PKEYS);
+ ptr = mmap(NULL, size, prot, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ pkey_assert(ptr != (void *)-1);
+
+ ret = syscall(__NR_subpage_prot, ptr, size, NULL);
+ if (ret) {
+ perror("subpage_prot");
+ return PTR_ERR_ENOTSUP;
+ }
+
+ ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey);
+ pkey_assert(!ret);
+ record_pkey_malloc(ptr, size);
+
+ dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr);
+ return ptr;
+#else /* __powerpc64__ */
+ return PTR_ERR_ENOTSUP;
+#endif /* __powerpc64__ */
+}
+
void *malloc_pkey_anon_huge(long size, int prot, u16 pkey)
{
int ret;
@@ -887,6 +916,7 @@ void setup_hugetlbfs(void)
void *(*pkey_malloc[])(long size, int prot, u16 pkey) = {
malloc_pkey_with_mprotect,
+ malloc_pkey_with_mprotect_subpage,
malloc_pkey_anon_huge,
malloc_pkey_hugetlb
/* can not do direct with the pkey_mprotect() API:
--
1.7.1
From 1583317021580141345@xxx Mon Nov 06 11:46:18 +0000 2017
The value of the pkey whose protection was violated
is made available in the si_pkey field of the siginfo structure.
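
From user space, a SIGSEGV handler can pick the key out of siginfo. The
following is a minimal sketch, assuming a libc whose <signal.h> exposes
si_pkey and SEGV_PKUERR (recent glibc does); fault_pkey(), segv_handler()
and install_segv_handler() are hypothetical names used for illustration
and are not part of this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

#ifndef SEGV_PKUERR
#define SEGV_PKUERR 4	/* Linux value; assumed when the libc headers lack it */
#endif

/* Last violated key seen by the handler, or -1 if none was recorded. */
static volatile sig_atomic_t last_fault_pkey = -1;

/* Return the violated key, or -1 if this is not a protection-key fault. */
static int fault_pkey(const siginfo_t *si)
{
	assert(si);
	if (si->si_signo != SIGSEGV || si->si_code != SEGV_PKUERR)
		return -1;
	return (int)si->si_pkey;
}

/* SA_SIGINFO handler: record the key reported by the kernel. */
static void segv_handler(int signum, siginfo_t *si, void *ucontext)
{
	(void)signum;
	(void)ucontext;
	last_fault_pkey = fault_pkey(si);
}

/* Install the handler; returns 0 on success, -1 on failure (see errno). */
static int install_segv_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

A real handler would also have to re-enable access on the key or
otherwise recover, because returning from the handler re-executes the
faulting instruction.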
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/bug.h | 1 +
arch/powerpc/kernel/traps.c | 12 ++++++++-
arch/powerpc/mm/fault.c | 55 ++++++++++++++++++++++-----------------
3 files changed, 43 insertions(+), 25 deletions(-)
diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 3c04249..97c3847 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -133,6 +133,7 @@
extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
extern void bad_page_fault(struct pt_regs *, unsigned long, int);
extern void _exception(int, struct pt_regs *, int, unsigned long);
+extern void _exception_pkey(int, struct pt_regs *, int, unsigned long, int);
extern void die(const char *, struct pt_regs *, long);
extern bool die_will_crash(void);
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 13c9dcd..ed1c39b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -20,6 +20,7 @@
#include <linux/sched/debug.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/pkeys.h>
#include <linux/stddef.h>
#include <linux/unistd.h>
#include <linux/ptrace.h>
@@ -265,7 +266,9 @@ void user_single_step_siginfo(struct task_struct *tsk,
info->si_addr = (void __user *)regs->nip;
}
-void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
+
+void _exception_pkey(int signr, struct pt_regs *regs, int code, unsigned long addr,
+ int key)
{
siginfo_t info;
const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
@@ -292,9 +295,16 @@ void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
info.si_signo = signr;
info.si_code = code;
info.si_addr = (void __user *) addr;
+ info.si_pkey = key;
+
force_sig_info(signr, &info, current);
}
+void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
+{
+ _exception_pkey(signr, regs, code, addr, 0);
+}
+
void system_reset_exception(struct pt_regs *regs)
{
/*
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index dfcd0e4..84523ed 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -107,7 +107,8 @@ static bool store_updates_sp(struct pt_regs *regs)
*/
static int
-__bad_area_nosemaphore(struct pt_regs *regs, unsigned long address, int si_code)
+__bad_area_nosemaphore(struct pt_regs *regs, unsigned long address, int si_code,
+ int pkey)
{
/*
* If we are in kernel mode, bail out with a SEGV, this will
@@ -117,17 +118,18 @@ static bool store_updates_sp(struct pt_regs *regs)
if (!user_mode(regs))
return SIGSEGV;
- _exception(SIGSEGV, regs, si_code, address);
+ _exception_pkey(SIGSEGV, regs, si_code, address, pkey);
return 0;
}
static noinline int bad_area_nosemaphore(struct pt_regs *regs, unsigned long address)
{
- return __bad_area_nosemaphore(regs, address, SEGV_MAPERR);
+ return __bad_area_nosemaphore(regs, address, SEGV_MAPERR, 0);
}
-static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
+static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code,
+ int pkey)
{
struct mm_struct *mm = current->mm;
@@ -137,30 +139,18 @@ static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
*/
up_read(&mm->mmap_sem);
- return __bad_area_nosemaphore(regs, address, si_code);
+ return __bad_area_nosemaphore(regs, address, si_code, pkey);
}
static noinline int bad_area(struct pt_regs *regs, unsigned long address)
{
- return __bad_area(regs, address, SEGV_MAPERR);
+ return __bad_area(regs, address, SEGV_MAPERR, 0);
}
-static int bad_page_fault_exception(struct pt_regs *regs, unsigned long address,
- int si_code)
+static int bad_key_fault_exception(struct pt_regs *regs, unsigned long address,
+ int pkey)
{
- int sig = SIGBUS;
- int code = BUS_OBJERR;
-
-#ifdef CONFIG_PPC_MEM_KEYS
- if (si_code & DSISR_KEYFAULT) {
- perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
- sig = SIGSEGV;
- code = SEGV_PKUERR;
- }
-#endif /* CONFIG_PPC_MEM_KEYS */
-
- _exception(sig, regs, code, address);
- return 0;
+ return __bad_area_nosemaphore(regs, address, SEGV_PKUERR, pkey);
}
static int do_sigbus(struct pt_regs *regs, unsigned long address,
@@ -411,7 +401,16 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
if (unlikely(page_fault_is_bad(error_code))) {
if (!is_user)
return SIGBUS;
- return bad_page_fault_exception(regs, address, error_code);
+
+ if (error_code & DSISR_KEYFAULT) {
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs,
+ address);
+ return bad_key_fault_exception(regs, address,
+ get_mm_addr_key(current->mm, address));
+ }
+
+ _exception_pkey(SIGBUS, regs, BUS_OBJERR, address, 0);
+ return 0;
}
/* Additional sanity check(s) */
@@ -516,8 +515,16 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
fault = handle_mm_fault(vma, address, flags);
#ifdef CONFIG_PPC_MEM_KEYS
- if (unlikely(fault & VM_FAULT_SIGSEGV))
- return __bad_area(regs, address, SEGV_PKUERR);
+ if (unlikely(fault & VM_FAULT_SIGSEGV)) {
+ /*
+ * The PGD-PDT...PMD-PTE tree may not have been fully set up,
+ * so we cannot walk the tree to locate the PTE and, from it,
+ * the key. Use vma_pkey() to get the key instead of
+ * get_mm_addr_key().
+ */
+ up_read(&current->mm->mmap_sem);
+ return bad_key_fault_exception(regs, address, vma_pkey(vma));
+ }
#endif /* CONFIG_PPC_MEM_KEYS */
major |= fault & VM_FAULT_MAJOR;
--
1.7.1
From 1583289397657835511@xxx Mon Nov 06 04:27:13 +0000 2017
Make sure that the kernel does not access user pages without
checking their key-protection.
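
The key check itself boils down to testing one disable bit in the AMR.
The sketch below is a simplified user-space model, not the kernel code:
it takes the AMR value as a parameter instead of reading the register,
ignores the execute (IAMR) case, and assumes that pkeyshift() places
key 0 in the two most significant bits and that the write-disable bit
is 0x2. amr_access_permitted() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY	2
#define AMR_RD_BIT		0x1ULL	/* read-disable bit */
#define AMR_WR_BIT		0x2ULL	/* write-disable bit (assumed value) */

/* Assumed layout: key 0 sits in the top two bits, key 31 in the bottom two. */
static int pkeyshift(int pkey)
{
	return 64 - (pkey + 1) * AMR_BITS_PER_PKEY;
}

/*
 * Model of the data-access check: key 0 is never restricted; otherwise a
 * read is allowed when the read-disable bit is clear and a write is
 * allowed when the write-disable bit is clear.
 */
static bool amr_access_permitted(uint64_t amr, int pkey, bool write)
{
	uint64_t bit = write ? AMR_WR_BIT : AMR_RD_BIT;

	assert(pkey >= 0 && pkey < 32);
	if (pkey == 0)
		return true;
	return !(amr & (bit << pkeyshift(pkey)));
}
```

With the write-disable bit set for key 3 only, reads through key 3 and
any access through other keys remain permitted, while writes through
key 3 are refused.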
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/book3s/64/pgtable.h | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index c277a63..5ecb846 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -464,6 +464,19 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
#ifdef CONFIG_PPC_MEM_KEYS
extern bool arch_pte_access_permitted(u64 pte, bool write, bool execute);
+
+#define pte_access_permitted(pte, write) \
+ (pte_present(pte) && \
+ ((!(write) || pte_write(pte)) && \
+ arch_pte_access_permitted(pte_val(pte), !!write, 0)))
+
+/*
+ * We store the key in the pmd for hugetlb pages, so we need to check for key protection.
+ */
+#define pmd_access_permitted(pmd, write) \
+ (pmd_present(pmd) && \
+ ((!(write) || pmd_write(pmd)) && \
+ arch_pte_access_permitted(pmd_val(pmd), !!write, 0)))
#endif /* CONFIG_PPC_MEM_KEYS */
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
--
1.7.1
From 1583299646624991096@xxx Mon Nov 06 07:10:08 +0000 2017
Some pkru references are renamed to pkey_reg
and some pkru references are renamed to pkey.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/pkey-helpers.h | 85 +++++-----
tools/testing/selftests/vm/protection_keys.c | 227 ++++++++++++++------------
2 files changed, 164 insertions(+), 148 deletions(-)
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index 3818f25..2d91d34 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -14,7 +14,7 @@
#include <sys/mman.h>
#define NR_PKEYS 16
-#define PKRU_BITS_PER_PKEY 2
+#define PKEY_BITS_PER_PKEY 2
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
@@ -54,85 +54,88 @@ static inline void sigsafe_printf(const char *format, ...)
#define dprintf3(args...) dprintf_level(3, args)
#define dprintf4(args...) dprintf_level(4, args)
-extern unsigned int shadow_pkru;
-static inline unsigned int __rdpkru(void)
+extern unsigned int shadow_pkey_reg;
+static inline unsigned int __rdpkey_reg(void)
{
unsigned int eax, edx;
unsigned int ecx = 0;
- unsigned int pkru;
+ unsigned int pkey_reg;
asm volatile(".byte 0x0f,0x01,0xee\n\t"
: "=a" (eax), "=d" (edx)
: "c" (ecx));
- pkru = eax;
- return pkru;
+ pkey_reg = eax;
+ return pkey_reg;
}
-static inline unsigned int _rdpkru(int line)
+static inline unsigned int _rdpkey_reg(int line)
{
- unsigned int pkru = __rdpkru();
+ unsigned int pkey_reg = __rdpkey_reg();
- dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n",
- line, pkru, shadow_pkru);
- assert(pkru == shadow_pkru);
+ dprintf4("rdpkey_reg(line=%d) pkey_reg: %x shadow: %x\n",
+ line, pkey_reg, shadow_pkey_reg);
+ assert(pkey_reg == shadow_pkey_reg);
- return pkru;
+ return pkey_reg;
}
-#define rdpkru() _rdpkru(__LINE__)
+#define rdpkey_reg() _rdpkey_reg(__LINE__)
-static inline void __wrpkru(unsigned int pkru)
+static inline void __wrpkey_reg(unsigned int pkey_reg)
{
- unsigned int eax = pkru;
+ unsigned int eax = pkey_reg;
unsigned int ecx = 0;
unsigned int edx = 0;
- dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+ dprintf4("%s() changing %08x to %08x\n", __func__,
+ __rdpkey_reg(), pkey_reg);
asm volatile(".byte 0x0f,0x01,0xef\n\t"
: : "a" (eax), "c" (ecx), "d" (edx));
- assert(pkru == __rdpkru());
+ assert(pkey_reg == __rdpkey_reg());
}
-static inline void wrpkru(unsigned int pkru)
+static inline void wrpkey_reg(unsigned int pkey_reg)
{
- dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+ dprintf4("%s() changing %08x to %08x\n", __func__,
+ __rdpkey_reg(), pkey_reg);
/* will do the shadow check for us: */
- rdpkru();
- __wrpkru(pkru);
- shadow_pkru = pkru;
- dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru());
+ rdpkey_reg();
+ __wrpkey_reg(pkey_reg);
+ shadow_pkey_reg = pkey_reg;
+ dprintf4("%s(%08x) pkey_reg: %08x\n", __func__,
+ pkey_reg, __rdpkey_reg());
}
/*
* These are technically racy. since something could
- * change PKRU between the read and the write.
+ * change PKEY register between the read and the write.
*/
static inline void __pkey_access_allow(int pkey, int do_allow)
{
- unsigned int pkru = rdpkru();
+ unsigned int pkey_reg = rdpkey_reg();
int bit = pkey * 2;
if (do_allow)
- pkru &= (1<<bit);
+ pkey_reg &= (1<<bit);
else
- pkru |= (1<<bit);
+ pkey_reg |= (1<<bit);
- dprintf4("pkru now: %08x\n", rdpkru());
- wrpkru(pkru);
+ dprintf4("pkey_reg now: %08x\n", rdpkey_reg());
+ wrpkey_reg(pkey_reg);
}
static inline void __pkey_write_allow(int pkey, int do_allow_write)
{
- long pkru = rdpkru();
+ long pkey_reg = rdpkey_reg();
int bit = pkey * 2 + 1;
if (do_allow_write)
- pkru &= (1<<bit);
+ pkey_reg &= (1<<bit);
else
- pkru |= (1<<bit);
+ pkey_reg |= (1<<bit);
- wrpkru(pkru);
- dprintf4("pkru now: %08x\n", rdpkru());
+ wrpkey_reg(pkey_reg);
+ dprintf4("pkey_reg now: %08x\n", rdpkey_reg());
}
#define PROT_PKEY0 0x10 /* protection key value (bit 0) */
@@ -182,10 +185,10 @@ static inline int cpu_has_pku(void)
return 1;
}
-#define XSTATE_PKRU_BIT (9)
-#define XSTATE_PKRU 0x200
+#define XSTATE_PKEY_BIT (9)
+#define XSTATE_PKEY 0x200
-int pkru_xstate_offset(void)
+int pkey_reg_xstate_offset(void)
{
unsigned int eax;
unsigned int ebx;
@@ -196,21 +199,21 @@ int pkru_xstate_offset(void)
unsigned long XSTATE_CPUID = 0xd;
int leaf;
- /* assume that XSTATE_PKRU is set in XCR0 */
- leaf = XSTATE_PKRU_BIT;
+ /* assume that XSTATE_PKEY is set in XCR0 */
+ leaf = XSTATE_PKEY_BIT;
{
eax = XSTATE_CPUID;
ecx = leaf;
__cpuid(&eax, &ebx, &ecx, &edx);
- if (leaf == XSTATE_PKRU_BIT) {
+ if (leaf == XSTATE_PKEY_BIT) {
xstate_offset = ebx;
xstate_size = eax;
}
}
if (xstate_size == 0) {
- printf("could not find size/offset of PKRU in xsave state\n");
+ printf("could not find size/offset of PKEY in xsave state\n");
return 0;
}
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 555e43c..27b11e6 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1,11 +1,11 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Tests x86 Memory Protection Keys (see Documentation/x86/protection-keys.txt)
+ * Tests Memory Protection Keys (see Documentation/vm/protection-keys.txt)
*
* There are examples in here of:
* * how to set protection keys on memory
- * * how to set/clear bits in PKRU (the rights register)
- * * how to handle SEGV_PKRU signals and extract pkey-relevant
+ * * how to set/clear bits in pkey registers (the rights register)
+ * * how to handle SEGV_PKUERR signals and extract pkey-relevant
* information from the siginfo
*
* Things to add:
@@ -48,7 +48,7 @@
int iteration_nr = 1;
int test_nr;
-unsigned int shadow_pkru;
+unsigned int shadow_pkey_reg;
#define HPAGE_SIZE (1UL<<21)
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
@@ -229,7 +229,7 @@ void dump_mem(void *dumpme, int len_bytes)
return "UNKNOWN";
}
-int pkru_faults;
+int pkey_faults;
int last_si_pkey = -1;
void signal_handler(int signum, siginfo_t *si, void *vucontext)
{
@@ -237,16 +237,16 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
int trapno;
unsigned long ip;
char *fpregs;
- u32 *pkru_ptr;
+ u32 *pkey_reg_ptr;
u64 si_pkey;
u32 *si_pkey_ptr;
- int pkru_offset;
+ int pkey_reg_offset;
fpregset_t fpregset;
dprint_in_signal = 1;
dprintf1(">>>>===============SIGSEGV============================\n");
- dprintf1("%s()::%d, pkru: 0x%x shadow: %x\n", __func__, __LINE__,
- __rdpkru(), shadow_pkru);
+ dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__,
+ __rdpkey_reg(), shadow_pkey_reg);
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
@@ -263,19 +263,19 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
*/
fpregs += 0x70;
#endif
- pkru_offset = pkru_xstate_offset();
- pkru_ptr = (void *)(&fpregs[pkru_offset]);
+ pkey_reg_offset = pkey_reg_xstate_offset();
+ pkey_reg_ptr = (void *)(&fpregs[pkey_reg_offset]);
dprintf1("siginfo: %p\n", si);
dprintf1(" fpregs: %p\n", fpregs);
/*
- * If we got a PKRU fault, we *HAVE* to have at least one bit set in
+ * If we got a PKEY fault, we *HAVE* to have at least one bit set in
* here.
*/
- dprintf1("pkru_xstate_offset: %d\n", pkru_xstate_offset());
+ dprintf1("pkey_reg_xstate_offset: %d\n", pkey_reg_xstate_offset());
if (DEBUG_LEVEL > 4)
- dump_mem(pkru_ptr - 128, 256);
- pkey_assert(*pkru_ptr);
+ dump_mem(pkey_reg_ptr - 128, 256);
+ pkey_assert(*pkey_reg_ptr);
si_pkey_ptr = (u32 *)(((u8 *)si) + si_pkey_offset);
dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
@@ -291,13 +291,16 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
exit(4);
}
- dprintf1("signal pkru from xsave: %08x\n", *pkru_ptr);
- /* need __rdpkru() version so we do not do shadow_pkru checking */
- dprintf1("signal pkru from pkru: %08x\n", __rdpkru());
+ dprintf1("signal pkey_reg from xsave: %08x\n", *pkey_reg_ptr);
+ /*
+ * need __rdpkey_reg() version so we do not do shadow_pkey_reg
+ * checking
+ */
+ dprintf1("signal pkey_reg from pkey_reg: %08x\n", __rdpkey_reg());
dprintf1("si_pkey from siginfo: %jx\n", si_pkey);
- *(u64 *)pkru_ptr = 0x00000000;
+ *(u64 *)pkey_reg_ptr = 0x00000000;
dprintf1("WARNING: set PRKU=0 to allow faulting instruction to continue\n");
- pkru_faults++;
+ pkey_faults++;
dprintf1("<<<<==================================================\n");
return;
if (trapno == 14) {
@@ -415,45 +418,47 @@ void dumpit(char *f)
u32 pkey_get(int pkey, unsigned long flags)
{
u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
- u32 pkru = __rdpkru();
- u32 shifted_pkru;
- u32 masked_pkru;
+ u32 pkey_reg = __rdpkey_reg();
+ u32 shifted_pkey_reg;
+ u32 masked_pkey_reg;
dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
__func__, pkey, flags, 0, 0);
- dprintf2("%s() raw pkru: %x\n", __func__, pkru);
+ dprintf2("%s() raw pkey_reg: %x\n", __func__, pkey_reg);
- shifted_pkru = (pkru >> (pkey * PKRU_BITS_PER_PKEY));
- dprintf2("%s() shifted_pkru: %x\n", __func__, shifted_pkru);
- masked_pkru = shifted_pkru & mask;
- dprintf2("%s() masked pkru: %x\n", __func__, masked_pkru);
+ shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY));
+ dprintf2("%s() shifted_pkey_reg: %x\n", __func__, shifted_pkey_reg);
+ masked_pkey_reg = shifted_pkey_reg & mask;
+ dprintf2("%s() masked pkey_reg: %x\n", __func__, masked_pkey_reg);
/*
* shift down the relevant bits to the lowest two, then
* mask off all the other high bits.
*/
- return masked_pkru;
+ return masked_pkey_reg;
}
int pkey_set(int pkey, unsigned long rights, unsigned long flags)
{
u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
- u32 old_pkru = __rdpkru();
- u32 new_pkru;
+ u32 old_pkey_reg = __rdpkey_reg();
+ u32 new_pkey_reg;
/* make sure that 'rights' only contains the bits we expect: */
assert(!(rights & ~mask));
- /* copy old pkru */
- new_pkru = old_pkru;
+ /* copy old pkey_reg */
+ new_pkey_reg = old_pkey_reg;
/* mask out bits from pkey in old value: */
- new_pkru &= ~(mask << (pkey * PKRU_BITS_PER_PKEY));
+ new_pkey_reg &= ~(mask << (pkey * PKEY_BITS_PER_PKEY));
/* OR in new bits for pkey: */
- new_pkru |= (rights << (pkey * PKRU_BITS_PER_PKEY));
+ new_pkey_reg |= (rights << (pkey * PKEY_BITS_PER_PKEY));
- __wrpkru(new_pkru);
+ __wrpkey_reg(new_pkey_reg);
- dprintf3("%s(pkey=%d, rights=%lx, flags=%lx) = %x pkru now: %x old_pkru: %x\n",
- __func__, pkey, rights, flags, 0, __rdpkru(), old_pkru);
+ dprintf3("%s(pkey=%d, rights=%lx, flags=%lx) = %x"
+ " pkey_reg now: %x old_pkey_reg: %x\n",
+ __func__, pkey, rights, flags, 0, __rdpkey_reg(),
+ old_pkey_reg);
return 0;
}
@@ -462,7 +467,7 @@ void pkey_disable_set(int pkey, int flags)
unsigned long syscall_flags = 0;
int ret;
int pkey_rights;
- u32 orig_pkru = rdpkru();
+ u32 orig_pkey_reg = rdpkey_reg();
dprintf1("START->%s(%d, 0x%x)\n", __func__,
pkey, flags);
@@ -478,9 +483,9 @@ void pkey_disable_set(int pkey, int flags)
ret = pkey_set(pkey, pkey_rights, syscall_flags);
assert(!ret);
- /*pkru and flags have the same format */
- shadow_pkru |= flags << (pkey * 2);
- dprintf1("%s(%d) shadow: 0x%x\n", __func__, pkey, shadow_pkru);
+ /*pkey_reg and flags have the same format */
+ shadow_pkey_reg |= flags << (pkey * 2);
+ dprintf1("%s(%d) shadow: 0x%x\n", __func__, pkey, shadow_pkey_reg);
pkey_assert(ret >= 0);
@@ -488,9 +493,9 @@ void pkey_disable_set(int pkey, int flags)
dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
pkey, pkey, pkey_rights);
- dprintf1("%s(%d) pkru: 0x%x\n", __func__, pkey, rdpkru());
+ dprintf1("%s(%d) pkey_reg: 0x%x\n", __func__, pkey, rdpkey_reg());
if (flags)
- pkey_assert(rdpkru() > orig_pkru);
+ pkey_assert(rdpkey_reg() > orig_pkey_reg);
dprintf1("END<---%s(%d, 0x%x)\n", __func__,
pkey, flags);
}
@@ -500,7 +505,7 @@ void pkey_disable_clear(int pkey, int flags)
unsigned long syscall_flags = 0;
int ret;
int pkey_rights = pkey_get(pkey, syscall_flags);
- u32 orig_pkru = rdpkru();
+ u32 orig_pkey_reg = rdpkey_reg();
pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
@@ -511,17 +516,17 @@ void pkey_disable_clear(int pkey, int flags)
pkey_rights |= flags;
ret = pkey_set(pkey, pkey_rights, 0);
- /* pkru and flags have the same format */
- shadow_pkru &= ~(flags << (pkey * 2));
+ /* pkey_reg and flags have the same format */
+ shadow_pkey_reg &= ~(flags << (pkey * 2));
pkey_assert(ret >= 0);
pkey_rights = pkey_get(pkey, syscall_flags);
dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
pkey, pkey, pkey_rights);
- dprintf1("%s(%d) pkru: 0x%x\n", __func__, pkey, rdpkru());
+ dprintf1("%s(%d) pkey_reg: 0x%x\n", __func__, pkey, rdpkey_reg());
if (flags)
- assert(rdpkru() > orig_pkru);
+ assert(rdpkey_reg() > orig_pkey_reg);
}
void pkey_write_allow(int pkey)
@@ -574,33 +579,38 @@ int alloc_pkey(void)
int ret;
unsigned long init_val = 0x0;
- dprintf1("alloc_pkey()::%d, pkru: 0x%x shadow: %x\n",
- __LINE__, __rdpkru(), shadow_pkru);
+ dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__,
+ __LINE__, __rdpkey_reg(), shadow_pkey_reg);
ret = sys_pkey_alloc(0, init_val);
/*
- * pkey_alloc() sets PKRU, so we need to reflect it in
- * shadow_pkru:
+ * pkey_alloc() sets PKEY register, so we need to reflect it in
+ * shadow_pkey_reg:
*/
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ __func__, __LINE__, ret, __rdpkey_reg(),
+ shadow_pkey_reg);
if (ret) {
/* clear both the bits: */
- shadow_pkru &= ~(0x3 << (ret * 2));
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
+ shadow_pkey_reg &= ~(0x3 << (ret * 2));
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ __func__,
+ __LINE__, ret, __rdpkey_reg(),
+ shadow_pkey_reg);
/*
* move the new state in from init_val
- * (remember, we cheated and init_val == pkru format)
+ * (remember, we cheated and init_val == pkey_reg format)
*/
- shadow_pkru |= (init_val << (ret * 2));
+ shadow_pkey_reg |= (init_val << (ret * 2));
}
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
- dprintf1("alloc_pkey()::%d errno: %d\n", __LINE__, errno);
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ __func__, __LINE__, ret, __rdpkey_reg(),
+ shadow_pkey_reg);
+ dprintf1("%s()::%d errno: %d\n", __func__, __LINE__, errno);
/* for shadow checking: */
- rdpkru();
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
+ rdpkey_reg();
+ dprintf4("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ __func__, __LINE__, ret, __rdpkey_reg(),
+ shadow_pkey_reg);
return ret;
}
@@ -651,8 +661,8 @@ int alloc_random_pkey(void)
free_ret = sys_pkey_free(alloced_pkeys[i]);
pkey_assert(!free_ret);
}
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
+ dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n", __func__,
+ __LINE__, ret, __rdpkey_reg(), shadow_pkey_reg);
return ret;
}
@@ -670,11 +680,13 @@ int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
if (nr_iterations-- < 0)
break;
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
+ dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ __func__, __LINE__, ret, __rdpkey_reg(),
+ shadow_pkey_reg);
sys_pkey_free(rpkey);
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
+ dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ __func__, __LINE__, ret, __rdpkey_reg(),
+ shadow_pkey_reg);
}
pkey_assert(pkey < NR_PKEYS);
@@ -682,8 +694,8 @@ int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
dprintf1("mprotect_pkey(%p, %zx, prot=0x%lx, pkey=%ld) ret: %d\n",
ptr, size, orig_prot, pkey, ret);
pkey_assert(!ret);
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
+ dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n", __func__,
+ __LINE__, ret, __rdpkey_reg(), shadow_pkey_reg);
return ret;
}
@@ -761,7 +773,7 @@ void free_pkey_malloc(void *ptr)
void *ptr;
int ret;
- rdpkru();
+ rdpkey_reg();
dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
size, prot, pkey);
pkey_assert(pkey < NR_PKEYS);
@@ -770,7 +782,7 @@ void free_pkey_malloc(void *ptr)
ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey);
pkey_assert(!ret);
record_pkey_malloc(ptr, size);
- rdpkru();
+ rdpkey_reg();
dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr);
return ptr;
@@ -933,31 +945,31 @@ void setup_hugetlbfs(void)
return ret;
}
-int last_pkru_faults;
-void expected_pk_fault(int pkey)
+int last_pkey_faults;
+void expected_pkey_fault(int pkey)
{
- dprintf2("%s(): last_pkru_faults: %d pkru_faults: %d\n",
- __func__, last_pkru_faults, pkru_faults);
+ dprintf2("%s(): last_pkey_faults: %d pkey_faults: %d\n",
+ __func__, last_pkey_faults, pkey_faults);
dprintf2("%s(%d): last_si_pkey: %d\n", __func__, pkey, last_si_pkey);
- pkey_assert(last_pkru_faults + 1 == pkru_faults);
+ pkey_assert(last_pkey_faults + 1 == pkey_faults);
pkey_assert(last_si_pkey == pkey);
/*
- * The signal handler shold have cleared out PKRU to let the
+ * The signal handler should have cleared out PKEY register to let the
* test program continue. We now have to restore it.
*/
- if (__rdpkru() != 0)
+ if (__rdpkey_reg() != 0)
pkey_assert(0);
- __wrpkru(shadow_pkru);
- dprintf1("%s() set PKRU=%x to restore state after signal nuked it\n",
- __func__, shadow_pkru);
- last_pkru_faults = pkru_faults;
+ __wrpkey_reg(shadow_pkey_reg);
+ dprintf1("%s() set pkey_reg=%x to restore state after signal "
+ "nuked it\n", __func__, shadow_pkey_reg);
+ last_pkey_faults = pkey_faults;
last_si_pkey = -1;
}
-void do_not_expect_pk_fault(void)
+void do_not_expect_pkey_fault(void)
{
- pkey_assert(last_pkru_faults == pkru_faults);
+ pkey_assert(last_pkey_faults == pkey_faults);
}
int test_fds[10] = { -1 };
@@ -1015,25 +1027,25 @@ void test_read_of_access_disabled_region(int *ptr, u16 pkey)
int ptr_contents;
dprintf1("disabling access to PKEY[%02d], doing read @ %p\n", pkey, ptr);
- rdpkru();
+ rdpkey_reg();
pkey_access_deny(pkey);
ptr_contents = read_ptr(ptr);
dprintf1("*ptr: %d\n", ptr_contents);
- expected_pk_fault(pkey);
+ expected_pkey_fault(pkey);
}
void test_write_of_write_disabled_region(int *ptr, u16 pkey)
{
dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
pkey_write_deny(pkey);
*ptr = __LINE__;
- expected_pk_fault(pkey);
+ expected_pkey_fault(pkey);
}
void test_write_of_access_disabled_region(int *ptr, u16 pkey)
{
dprintf1("disabling access to PKEY[%02d], doing write\n", pkey);
pkey_access_deny(pkey);
*ptr = __LINE__;
- expected_pk_fault(pkey);
+ expected_pkey_fault(pkey);
}
void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey)
{
@@ -1145,9 +1157,10 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
int new_pkey;
dprintf1("%s() alloc loop: %d\n", __func__, i);
new_pkey = alloc_pkey();
- dprintf4("%s()::%d, err: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, err, __rdpkru(), shadow_pkru);
- rdpkru(); /* for shadow checking */
+ dprintf4("%s()::%d, err: %d pkey_reg: 0x%x shadow: 0x%x\n",
+ __func__, __LINE__, err, __rdpkey_reg(),
+ shadow_pkey_reg);
+ rdpkey_reg(); /* for shadow checking */
dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, ENOSPC);
if ((new_pkey == -1) && (errno == ENOSPC)) {
dprintf2("%s() failed to allocate pkey after %d tries\n",
@@ -1177,7 +1190,7 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
for (i = 0; i < nr_allocated_pkeys; i++) {
err = sys_pkey_free(allocated_pkeys[i]);
pkey_assert(!err);
- rdpkru(); /* for shadow checking */
+ rdpkey_reg(); /* for shadow checking */
}
}
@@ -1234,7 +1247,7 @@ void test_ptrace_of_child(int *ptr, u16 pkey)
pkey_assert(ret != -1);
/* Now access from the current task, and expect an exception: */
peek_result = read_ptr(ptr);
- expected_pk_fault(pkey);
+ expected_pkey_fault(pkey);
/*
* Try to access the NON-pkey-protected "plain_ptr" via ptrace:
@@ -1244,7 +1257,7 @@ void test_ptrace_of_child(int *ptr, u16 pkey)
pkey_assert(ret != -1);
/* Now access from the current task, and expect NO exception: */
peek_result = read_ptr(plain_ptr);
- do_not_expect_pk_fault();
+ do_not_expect_pkey_fault();
ret = ptrace(PTRACE_DETACH, child_pid, ignored, 0);
pkey_assert(ret != -1);
@@ -1281,17 +1294,17 @@ void test_executing_on_unreadable_memory(int *ptr, u16 pkey)
pkey_assert(!ret);
pkey_access_deny(pkey);
- dprintf2("pkru: %x\n", rdpkru());
+ dprintf2("pkey_reg: %x\n", rdpkey_reg());
/*
* Make sure this is an *instruction* fault
*/
madvise(p1, PAGE_SIZE, MADV_DONTNEED);
lots_o_noops_around_write(&scratch);
- do_not_expect_pk_fault();
+ do_not_expect_pkey_fault();
ptr_contents = read_ptr(p1);
dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
- expected_pk_fault(pkey);
+ expected_pkey_fault(pkey);
}
void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
@@ -1331,7 +1344,7 @@ void run_tests_once(void)
for (test_nr = 0; test_nr < ARRAY_SIZE(pkey_tests); test_nr++) {
int pkey;
- int orig_pkru_faults = pkru_faults;
+ int orig_pkey_faults = pkey_faults;
dprintf1("======================\n");
dprintf1("test %d preparing...\n", test_nr);
@@ -1346,8 +1359,8 @@ void run_tests_once(void)
free_pkey_malloc(ptr);
sys_pkey_free(pkey);
- dprintf1("pkru_faults: %d\n", pkru_faults);
- dprintf1("orig_pkru_faults: %d\n", orig_pkru_faults);
+ dprintf1("pkey_faults: %d\n", pkey_faults);
+ dprintf1("orig_pkey_faults: %d\n", orig_pkey_faults);
tracing_off();
close_test_fds();
@@ -1360,7 +1373,7 @@ void run_tests_once(void)
void pkey_setup_shadow(void)
{
- shadow_pkru = __rdpkru();
+ shadow_pkey_reg = __rdpkey_reg();
}
int main(void)
@@ -1384,7 +1397,7 @@ int main(void)
}
pkey_setup_shadow();
- printf("startup pkru: %x\n", rdpkru());
+ printf("startup pkey_reg: %x\n", rdpkey_reg());
setup_hugetlbfs();
while (nr_iterations-- > 0)
--
1.7.1
From 1583369014046566887@xxx Tue Nov 07 01:32:42 +0000 2017
Handle data and instruction exceptions caused by memory
protection-key violations.
The CPU detects the key fault if the HPTE is already
programmed with the key.
However, if the HPTE is not yet hashed, the hardware cannot
detect a key fault. In that case the software detects the
pkey violation.
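The signal selection described above can be sketched in isolation as follows. This is a minimal user-space model, not the kernel code: `classify_bad_fault()`, `struct sketch_siginfo` and `SKETCH_DSISR_KEYFAULT` are hypothetical names introduced for illustration, and the bit value used for the key-fault flag is an assumption standing in for the real `DSISR_KEYFAULT` definition in the powerpc headers.

```c
#include <signal.h>

/* Assumed stand-in for the powerpc DSISR key-fault bit (illustrative value). */
#define SKETCH_DSISR_KEYFAULT 0x00200000u

/* Fallbacks for C libraries that do not expose these si_code values. */
#ifndef BUS_OBJERR
#define BUS_OBJERR 3
#endif
#ifndef SEGV_PKUERR
#define SEGV_PKUERR 4
#endif

/* Hypothetical container for the signal number and si_code to deliver. */
struct sketch_siginfo {
	int signo;
	int code;
};

/*
 * Model of the decision for a "bad" user fault: a key fault reported
 * by the hardware becomes SIGSEGV/SEGV_PKUERR, anything else keeps
 * the pre-existing SIGBUS/BUS_OBJERR behaviour.
 */
static struct sketch_siginfo classify_bad_fault(unsigned int error_code)
{
	struct sketch_siginfo si = { SIGBUS, BUS_OBJERR };

	if (error_code & SKETCH_DSISR_KEYFAULT) {
		si.signo = SIGSEGV;
		si.code = SEGV_PKUERR;
	}
	return si;
}
```

The sketch covers only the hardware-detected path. The software-detected path (HPTE not yet hashed) ends in the same SIGSEGV/SEGV_PKUERR pair, but is reached through the generic fault handler's return value rather than through the DSISR bits.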
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/mm/fault.c | 32 +++++++++++++++++++++++++++-----
1 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 4797d08..dfcd0e4 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -145,6 +145,24 @@ static noinline int bad_area(struct pt_regs *regs, unsigned long address)
return __bad_area(regs, address, SEGV_MAPERR);
}
+static int bad_page_fault_exception(struct pt_regs *regs, unsigned long address,
+ int si_code)
+{
+ int sig = SIGBUS;
+ int code = BUS_OBJERR;
+
+#ifdef CONFIG_PPC_MEM_KEYS
+ if (si_code & DSISR_KEYFAULT) {
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+ sig = SIGSEGV;
+ code = SEGV_PKUERR;
+ }
+#endif /* CONFIG_PPC_MEM_KEYS */
+
+ _exception(sig, regs, code, address);
+ return 0;
+}
+
static int do_sigbus(struct pt_regs *regs, unsigned long address,
unsigned int fault)
{
@@ -391,11 +409,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
return 0;
if (unlikely(page_fault_is_bad(error_code))) {
- if (is_user) {
- _exception(SIGBUS, regs, BUS_OBJERR, address);
- return 0;
- }
- return SIGBUS;
+ if (!is_user)
+ return SIGBUS;
+ return bad_page_fault_exception(regs, address, error_code);
}
/* Additional sanity check(s) */
@@ -498,6 +514,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/
fault = handle_mm_fault(vma, address, flags);
+
+#ifdef CONFIG_PPC_MEM_KEYS
+ if (unlikely(fault & VM_FAULT_SIGSEGV))
+ return __bad_area(regs, address, SEGV_PKUERR);
+#endif /* CONFIG_PPC_MEM_KEYS */
+
major |= fault & VM_FAULT_MAJOR;
/*
--
1.7.1
From 1583307760926008874@xxx Mon Nov 06 09:19:06 +0000 2017
This patch provides the implementation of the execute-only pkey.
The architecture-independent layer expects the arch-dependent
layer to support the ability to create and enable a special
key which has execute-only permission.
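The "allocate once, then reuse" behaviour of the execute-only key can be sketched with a small stand-alone model. Everything here is hypothetical and for illustration only: `struct sketch_mm`, `sketch_pkey_alloc()`, `sketch_execute_only_pkey()` and `SKETCH_NR_PKEYS` are not kernel names, and the sketch deliberately leaves out the AMR programming that denies read and write access on the chosen key.

```c
#include <stdint.h>

#define SKETCH_NR_PKEYS 32	/* assumed key count for the model */

/* Hypothetical per-process state, loosely modelled on mm_context_t. */
struct sketch_mm {
	uint32_t allocation_map;	/* bit set -> key in use */
	int execute_only_pkey;		/* -1 -> unallocated or invalid */
};

/* Hand out the lowest free key, or -1 if every key is taken. */
static int sketch_pkey_alloc(struct sketch_mm *mm)
{
	for (int key = 0; key < SKETCH_NR_PKEYS; key++) {
		uint32_t bit = UINT32_C(1) << key;

		if (!(mm->allocation_map & bit)) {
			mm->allocation_map |= bit;
			return key;
		}
	}
	return -1;
}

/*
 * Return the key dedicated to execute-only mappings, allocating it
 * on first use. A failed allocation leaves the cached value at -1,
 * so a later call simply tries again.
 */
static int sketch_execute_only_pkey(struct sketch_mm *mm)
{
	if (mm->execute_only_pkey == -1)
		mm->execute_only_pkey = sketch_pkey_alloc(mm);
	return mm->execute_only_pkey;
}
```

In this model a process that starts with key 0 reserved gets key 1 on the first call and the same key on every later call, while a process with no free keys gets -1 and falls back to running without execute-only protection.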
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
arch/powerpc/include/asm/pkeys.h | 8 ++++-
arch/powerpc/mm/pkeys.c | 56 ++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index df17fbc..44dbc91 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -116,6 +116,7 @@ struct patb_entry {
* bit unset -> key available for allocation
*/
u32 pkey_allocation_map;
+ s16 execute_only_pkey; /* key holding execute-only protection */
#endif
} mm_context_t;
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0b2d9f0..20d1f0e 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -128,9 +128,13 @@ static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
* Try to dedicate one of the protection keys to be used as an
* execute-only protection key.
*/
+extern int __execute_only_pkey(struct mm_struct *mm);
static inline int execute_only_pkey(struct mm_struct *mm)
{
- return 0;
+ if (static_branch_likely(&pkey_disabled))
+ return -1;
+
+ return __execute_only_pkey(mm);
}
static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
@@ -154,6 +158,8 @@ static inline void pkey_mm_init(struct mm_struct *mm)
if (static_branch_likely(&pkey_disabled))
return;
mm_pkey_allocation_map(mm) = initial_allocation_mask;
+ /* -1 means unallocated or invalid */
+ mm->context.execute_only_pkey = -1;
}
extern void thread_pkey_regs_save(struct thread_struct *thread);
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 469f370..5da94fe 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -247,3 +247,59 @@ void thread_pkey_regs_init(struct thread_struct *thread)
write_iamr(read_iamr() & pkey_iamr_mask);
write_uamor(read_uamor() & pkey_amr_uamor_mask);
}
+
+static inline bool pkey_allows_readwrite(int pkey)
+{
+ int pkey_shift = pkeyshift(pkey);
+
+ if (!is_pkey_enabled(pkey))
+ return true;
+
+ return !(read_amr() & ((AMR_RD_BIT|AMR_WR_BIT) << pkey_shift));
+}
+
+int __execute_only_pkey(struct mm_struct *mm)
+{
+ bool need_to_set_mm_pkey = false;
+ int execute_only_pkey = mm->context.execute_only_pkey;
+ int ret;
+
+ /* Do we need to assign a pkey for mm's execute-only maps? */
+ if (execute_only_pkey == -1) {
+ /* Go allocate one to use, which might fail */
+ execute_only_pkey = mm_pkey_alloc(mm);
+ if (execute_only_pkey < 0)
+ return -1;
+ need_to_set_mm_pkey = true;
+ }
+
+ /*
+ * We do not want to go through the relatively costly dance to set AMR
+ * if we do not need to. Check it first and assume that if the
+ * execute-only pkey is readwrite-disabled then we do not have to set it
+ * ourselves.
+ */
+ if (!need_to_set_mm_pkey && !pkey_allows_readwrite(execute_only_pkey))
+ return execute_only_pkey;
+
+ /*
+ * Set up AMR so that it denies access for everything other than
+ * execution.
+ */
+ ret = __arch_set_user_pkey_access(current, execute_only_pkey,
+ PKEY_DISABLE_ACCESS |
+ PKEY_DISABLE_WRITE);
+ /*
+ * If the AMR-set operation failed somehow, just return -1 and
+ * effectively disable execute-only support.
+ */
+ if (ret) {
+ mm_pkey_free(mm, execute_only_pkey);
+ return -1;
+ }
+
+ /* We got one, store it and use it from here on out */
+ if (need_to_set_mm_pkey)
+ mm->context.execute_only_pkey = execute_only_pkey;
+ return execute_only_pkey;
+}
--
1.7.1
From 1583309598365649608@xxx Mon Nov 06 09:48:18 +0000 2017
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/pkey-helpers.h | 220 ++++
tools/testing/selftests/vm/protection_keys.c | 1395 +++++++++++++++++++++++++
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/pkey-helpers.h | 220 ----
tools/testing/selftests/x86/protection_keys.c | 1395 -------------------------
6 files changed, 1617 insertions(+), 1616 deletions(-)
create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
create mode 100644 tools/testing/selftests/vm/protection_keys.c
delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
delete mode 100644 tools/testing/selftests/x86/protection_keys.c
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index e49eca1..6f18ef4 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,6 +18,7 @@ TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += userfaultfd
TEST_GEN_FILES += mlock-random-test
TEST_GEN_FILES += virtual_address_range
+TEST_GEN_FILES += protection_keys
TEST_PROGS := run_vmtests
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
new file mode 100644
index 0000000..3818f25
--- /dev/null
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _PKEYS_HELPER_H
+#define _PKEYS_HELPER_H
+#define _GNU_SOURCE
+#include <string.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <ucontext.h>
+#include <sys/mman.h>
+
+#define NR_PKEYS 16
+#define PKRU_BITS_PER_PKEY 2
+
+#ifndef DEBUG_LEVEL
+#define DEBUG_LEVEL 0
+#endif
+#define DPRINT_IN_SIGNAL_BUF_SIZE 4096
+extern int dprint_in_signal;
+extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
+static inline void sigsafe_printf(const char *format, ...)
+{
+ va_list ap;
+
+ va_start(ap, format);
+ if (!dprint_in_signal) {
+ vprintf(format, ap);
+ } else {
+ int len = vsnprintf(dprint_in_signal_buffer,
+ DPRINT_IN_SIGNAL_BUF_SIZE,
+ format, ap);
+ /*
+ * len is amount that would have been printed,
+ * but actual write is truncated at BUF_SIZE.
+ */
+ if (len > DPRINT_IN_SIGNAL_BUF_SIZE)
+ len = DPRINT_IN_SIGNAL_BUF_SIZE;
+ write(1, dprint_in_signal_buffer, len);
+ }
+ va_end(ap);
+}
+#define dprintf_level(level, args...) do { \
+ if (level <= DEBUG_LEVEL) \
+ sigsafe_printf(args); \
+ fflush(NULL); \
+} while (0)
+#define dprintf0(args...) dprintf_level(0, args)
+#define dprintf1(args...) dprintf_level(1, args)
+#define dprintf2(args...) dprintf_level(2, args)
+#define dprintf3(args...) dprintf_level(3, args)
+#define dprintf4(args...) dprintf_level(4, args)
+
+extern unsigned int shadow_pkru;
+static inline unsigned int __rdpkru(void)
+{
+ unsigned int eax, edx;
+ unsigned int ecx = 0;
+ unsigned int pkru;
+
+ asm volatile(".byte 0x0f,0x01,0xee\n\t"
+ : "=a" (eax), "=d" (edx)
+ : "c" (ecx));
+ pkru = eax;
+ return pkru;
+}
+
+static inline unsigned int _rdpkru(int line)
+{
+ unsigned int pkru = __rdpkru();
+
+ dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n",
+ line, pkru, shadow_pkru);
+ assert(pkru == shadow_pkru);
+
+ return pkru;
+}
+
+#define rdpkru() _rdpkru(__LINE__)
+
+static inline void __wrpkru(unsigned int pkru)
+{
+ unsigned int eax = pkru;
+ unsigned int ecx = 0;
+ unsigned int edx = 0;
+
+ dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+ asm volatile(".byte 0x0f,0x01,0xef\n\t"
+ : : "a" (eax), "c" (ecx), "d" (edx));
+ assert(pkru == __rdpkru());
+}
+
+static inline void wrpkru(unsigned int pkru)
+{
+ dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+ /* will do the shadow check for us: */
+ rdpkru();
+ __wrpkru(pkru);
+ shadow_pkru = pkru;
+ dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru());
+}
+
+/*
+ * These are technically racy, since something could
+ * change PKRU between the read and the write.
+ */
+static inline void __pkey_access_allow(int pkey, int do_allow)
+{
+ unsigned int pkru = rdpkru();
+ int bit = pkey * 2;
+
+ if (do_allow)
+ pkru &= ~(1<<bit);
+ else
+ pkru |= (1<<bit);
+
+ dprintf4("pkru now: %08x\n", rdpkru());
+ wrpkru(pkru);
+}
+
+static inline void __pkey_write_allow(int pkey, int do_allow_write)
+{
+ long pkru = rdpkru();
+ int bit = pkey * 2 + 1;
+
+ if (do_allow_write)
+ pkru &= ~(1<<bit);
+ else
+ pkru |= (1<<bit);
+
+ wrpkru(pkru);
+ dprintf4("pkru now: %08x\n", rdpkru());
+}
+
+#define PROT_PKEY0 0x10 /* protection key value (bit 0) */
+#define PROT_PKEY1 0x20 /* protection key value (bit 1) */
+#define PROT_PKEY2 0x40 /* protection key value (bit 2) */
+#define PROT_PKEY3 0x80 /* protection key value (bit 3) */
+
+#define PAGE_SIZE 4096
+#define MB (1<<20)
+
+static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
+ unsigned int *ecx, unsigned int *edx)
+{
+ /* ecx is often an input as well as an output. */
+ asm volatile(
+ "cpuid;"
+ : "=a" (*eax),
+ "=b" (*ebx),
+ "=c" (*ecx),
+ "=d" (*edx)
+ : "0" (*eax), "2" (*ecx));
+}
+
+/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
+#define X86_FEATURE_PKU (1<<3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE (1<<4) /* OS Protection Keys Enable */
+
+static inline int cpu_has_pku(void)
+{
+ unsigned int eax;
+ unsigned int ebx;
+ unsigned int ecx;
+ unsigned int edx;
+
+ eax = 0x7;
+ ecx = 0x0;
+ __cpuid(&eax, &ebx, &ecx, &edx);
+
+ if (!(ecx & X86_FEATURE_PKU)) {
+ dprintf2("cpu does not have PKU\n");
+ return 0;
+ }
+ if (!(ecx & X86_FEATURE_OSPKE)) {
+ dprintf2("cpu does not have OSPKE\n");
+ return 0;
+ }
+ return 1;
+}
+
+#define XSTATE_PKRU_BIT (9)
+#define XSTATE_PKRU 0x200
+
+int pkru_xstate_offset(void)
+{
+ unsigned int eax;
+ unsigned int ebx;
+ unsigned int ecx;
+ unsigned int edx;
+ int xstate_offset;
+ int xstate_size;
+ unsigned long XSTATE_CPUID = 0xd;
+ int leaf;
+
+ /* assume that XSTATE_PKRU is set in XCR0 */
+ leaf = XSTATE_PKRU_BIT;
+ {
+ eax = XSTATE_CPUID;
+ ecx = leaf;
+ __cpuid(&eax, &ebx, &ecx, &edx);
+
+ if (leaf == XSTATE_PKRU_BIT) {
+ xstate_offset = ebx;
+ xstate_size = eax;
+ }
+ }
+
+ if (xstate_size == 0) {
+ printf("could not find size/offset of PKRU in xsave state\n");
+ return 0;
+ }
+
+ return xstate_offset;
+}
+
+#endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
new file mode 100644
index 0000000..555e43c
--- /dev/null
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -0,0 +1,1395 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tests x86 Memory Protection Keys (see Documentation/x86/protection-keys.txt)
+ *
+ * There are examples in here of:
+ * * how to set protection keys on memory
+ * * how to set/clear bits in PKRU (the rights register)
+ * * how to handle SEGV_PKRU signals and extract pkey-relevant
+ * information from the siginfo
+ *
+ * Things to add:
+ * make sure KSM and KSM COW breaking works
+ * prefault pages in at malloc, or not
+ * protect MPX bounds tables with protection keys?
+ * make sure VMA splitting/merging is working correctly
+ * OOMs can destroy mm->mmap (see exit_mmap()), so make sure it is immune to pkeys
+ * look for pkey "leaks" where it is still set on a VMA but "freed" back to the kernel
+ * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
+ *
+ * Compile like this:
+ * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
+ * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
+ */
+#define _GNU_SOURCE
+#include <errno.h>
+#include <linux/futex.h>
+#include <sys/time.h>
+#include <sys/syscall.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <ucontext.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/ptrace.h>
+#include <setjmp.h>
+
+#include "pkey-helpers.h"
+
+int iteration_nr = 1;
+int test_nr;
+
+unsigned int shadow_pkru;
+
+#define HPAGE_SIZE (1UL<<21)
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1))
+#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
+#define ALIGN_PTR_UP(p, ptr_align_to) ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to))
+#define ALIGN_PTR_DOWN(p, ptr_align_to) ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to))
+#define __stringify_1(x...) #x
+#define __stringify(x...) __stringify_1(x)
+
+#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
+
+int dprint_in_signal;
+char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
+
+extern void abort_hooks(void);
+#define pkey_assert(condition) do { \
+ if (!(condition)) { \
+ dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
+ __FILE__, __LINE__, \
+ test_nr, iteration_nr); \
+ dprintf0("errno at assert: %d", errno); \
+ abort_hooks(); \
+ assert(condition); \
+ } \
+} while (0)
+#define raw_assert(cond) assert(cond)
+
+void cat_into_file(char *str, char *file)
+{
+ int fd = open(file, O_RDWR);
+ int ret;
+
+ dprintf2("%s(): writing '%s' to '%s'\n", __func__, str, file);
+ /*
+ * these need to be raw because they are called under
+ * pkey_assert()
+ */
+ raw_assert(fd >= 0);
+ ret = write(fd, str, strlen(str));
+ if (ret != strlen(str)) {
+ perror("write to file failed");
+ fprintf(stderr, "filename: '%s' str: '%s'\n", file, str);
+ raw_assert(0);
+ }
+ close(fd);
+}
+
+#if CONTROL_TRACING > 0
+static int warned_tracing;
+int tracing_root_ok(void)
+{
+ if (geteuid() != 0) {
+ if (!warned_tracing)
+ fprintf(stderr, "WARNING: not run as root, "
+ "can not do tracing control\n");
+ warned_tracing = 1;
+ return 0;
+ }
+ return 1;
+}
+#endif
+
+void tracing_on(void)
+{
+#if CONTROL_TRACING > 0
+#define TRACEDIR "/sys/kernel/debug/tracing"
+ char pidstr[32];
+
+ if (!tracing_root_ok())
+ return;
+
+ sprintf(pidstr, "%d", getpid());
+ cat_into_file("0", TRACEDIR "/tracing_on");
+ cat_into_file("\n", TRACEDIR "/trace");
+ if (1) {
+ cat_into_file("function_graph", TRACEDIR "/current_tracer");
+ cat_into_file("1", TRACEDIR "/options/funcgraph-proc");
+ } else {
+ cat_into_file("nop", TRACEDIR "/current_tracer");
+ }
+ cat_into_file(pidstr, TRACEDIR "/set_ftrace_pid");
+ cat_into_file("1", TRACEDIR "/tracing_on");
+ dprintf1("enabled tracing\n");
+#endif
+}
+
+void tracing_off(void)
+{
+#if CONTROL_TRACING > 0
+ if (!tracing_root_ok())
+ return;
+ cat_into_file("0", "/sys/kernel/debug/tracing/tracing_on");
+#endif
+}
+
+void abort_hooks(void)
+{
+ fprintf(stderr, "running %s()...\n", __func__);
+ tracing_off();
+#ifdef SLEEP_ON_ABORT
+ sleep(SLEEP_ON_ABORT);
+#endif
+}
+
+static inline void __page_o_noops(void)
+{
+ /* 8-bytes of instruction * 512 bytes = 1 page */
+ asm(".rept 512 ; nopl 0x7eeeeeee(%eax) ; .endr");
+}
+
+/*
+ * This attempts to have roughly a page of instructions followed by a few
+ * instructions that do a write, and another page of instructions. That
+ * way, we are pretty sure that the write is in the second page of
+ * instructions and has at least a page of padding behind it.
+ *
+ * *That* lets us be sure to madvise() away the write instruction, which
+ * will then fault, which makes sure that the fault code handles
+ * execute-only memory properly.
+ */
+__attribute__((__aligned__(PAGE_SIZE)))
+void lots_o_noops_around_write(int *write_to_me)
+{
+ dprintf3("running %s()\n", __func__);
+ __page_o_noops();
+ /* Assume this happens in the second page of instructions: */
+ *write_to_me = __LINE__;
+ /* pad out by another page: */
+ __page_o_noops();
+ dprintf3("%s() done\n", __func__);
+}
+
+/* Define some kernel-like types */
+#define u8 uint8_t
+#define u16 uint16_t
+#define u32 uint32_t
+#define u64 uint64_t
+
+#ifdef __i386__
+#define SYS_mprotect_key 380
+#define SYS_pkey_alloc 381
+#define SYS_pkey_free 382
+#define REG_IP_IDX REG_EIP
+#define si_pkey_offset 0x14
+#else
+#define SYS_mprotect_key 329
+#define SYS_pkey_alloc 330
+#define SYS_pkey_free 331
+#define REG_IP_IDX REG_RIP
+#define si_pkey_offset 0x20
+#endif
+
+void dump_mem(void *dumpme, int len_bytes)
+{
+ char *c = (void *)dumpme;
+ int i;
+
+ for (i = 0; i < len_bytes; i += sizeof(u64)) {
+ u64 *ptr = (u64 *)(c + i);
+ dprintf1("dump[%03d][@%p]: %016jx\n", i, ptr, *ptr);
+ }
+}
+
+#define SEGV_BNDERR 3 /* failed address bound checks */
+#define SEGV_PKUERR 4
+
+static char *si_code_str(int si_code)
+{
+ if (si_code == SEGV_MAPERR)
+ return "SEGV_MAPERR";
+ if (si_code == SEGV_ACCERR)
+ return "SEGV_ACCERR";
+ if (si_code == SEGV_BNDERR)
+ return "SEGV_BNDERR";
+ if (si_code == SEGV_PKUERR)
+ return "SEGV_PKUERR";
+ return "UNKNOWN";
+}
+
+int pkru_faults;
+int last_si_pkey = -1;
+void signal_handler(int signum, siginfo_t *si, void *vucontext)
+{
+ ucontext_t *uctxt = vucontext;
+ int trapno;
+ unsigned long ip;
+ char *fpregs;
+ u32 *pkru_ptr;
+ u64 si_pkey;
+ u32 *si_pkey_ptr;
+ int pkru_offset;
+ fpregset_t fpregset;
+
+ dprint_in_signal = 1;
+ dprintf1(">>>>===============SIGSEGV============================\n");
+ dprintf1("%s()::%d, pkru: 0x%x shadow: %x\n", __func__, __LINE__,
+ __rdpkru(), shadow_pkru);
+
+ trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
+ ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
+ fpregset = uctxt->uc_mcontext.fpregs;
+ fpregs = (void *)fpregset;
+
+ dprintf2("%s() trapno: %d ip: 0x%lx info->si_code: %s/%d\n", __func__,
+ trapno, ip, si_code_str(si->si_code), si->si_code);
+#ifdef __i386__
+ /*
+ * 32-bit has some extra padding so that userspace can tell whether
+ * the XSTATE header is present in addition to the "legacy" FPU
+ * state. We just assume that it is here.
+ */
+ fpregs += 0x70;
+#endif
+ pkru_offset = pkru_xstate_offset();
+ pkru_ptr = (void *)(&fpregs[pkru_offset]);
+
+ dprintf1("siginfo: %p\n", si);
+ dprintf1(" fpregs: %p\n", fpregs);
+ /*
+ * If we got a PKRU fault, we *HAVE* to have at least one bit set in
+ * here.
+ */
+ dprintf1("pkru_xstate_offset: %d\n", pkru_xstate_offset());
+ if (DEBUG_LEVEL > 4)
+ dump_mem(pkru_ptr - 128, 256);
+ pkey_assert(*pkru_ptr);
+
+ si_pkey_ptr = (u32 *)(((u8 *)si) + si_pkey_offset);
+ dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
+ dump_mem(si_pkey_ptr - 8, 24);
+ si_pkey = *si_pkey_ptr;
+ pkey_assert(si_pkey < NR_PKEYS);
+ last_si_pkey = si_pkey;
+
+ if ((si->si_code == SEGV_MAPERR) ||
+ (si->si_code == SEGV_ACCERR) ||
+ (si->si_code == SEGV_BNDERR)) {
+ printf("non-PK si_code, exiting...\n");
+ exit(4);
+ }
+
+ dprintf1("signal pkru from xsave: %08x\n", *pkru_ptr);
+ /* need __rdpkru() version so we do not do shadow_pkru checking */
+ dprintf1("signal pkru from pkru: %08x\n", __rdpkru());
+ dprintf1("si_pkey from siginfo: %jx\n", si_pkey);
+ *(u64 *)pkru_ptr = 0x00000000;
+ dprintf1("WARNING: set PKRU=0 to allow faulting instruction to continue\n");
+ pkru_faults++;
+ dprintf1("<<<<==================================================\n");
+ return;
+ if (trapno == 14) {
+ fprintf(stderr,
+ "ERROR: In signal handler, page fault, trapno = %d, ip = %016lx\n",
+ trapno, ip);
+ fprintf(stderr, "si_addr %p\n", si->si_addr);
+ fprintf(stderr, "REG_ERR: %lx\n",
+ (unsigned long)uctxt->uc_mcontext.gregs[REG_ERR]);
+ exit(1);
+ } else {
+ fprintf(stderr, "unexpected trap %d! at 0x%lx\n", trapno, ip);
+ fprintf(stderr, "si_addr %p\n", si->si_addr);
+ fprintf(stderr, "REG_ERR: %lx\n",
+ (unsigned long)uctxt->uc_mcontext.gregs[REG_ERR]);
+ exit(2);
+ }
+ dprint_in_signal = 0;
+}
+
+int wait_all_children(void)
+{
+ int status;
+ return waitpid(-1, &status, 0);
+}
+
+void sig_chld(int x)
+{
+ dprint_in_signal = 1;
+ dprintf2("[%d] SIGCHLD: %d\n", getpid(), x);
+ dprint_in_signal = 0;
+}
+
+void setup_sigsegv_handler(void)
+{
+ int r, rs;
+ struct sigaction newact;
+ struct sigaction oldact;
+
+ /* #PF is mapped to sigsegv */
+ int signum = SIGSEGV;
+
+ newact.sa_handler = 0;
+ newact.sa_sigaction = signal_handler;
+
+ /* sigset_t - signals to block while in the handler */
+ /* get the old signal mask. */
+ rs = sigprocmask(SIG_SETMASK, 0, &newact.sa_mask);
+ pkey_assert(rs == 0);
+
+ /* call sa_sigaction, not sa_handler */
+ newact.sa_flags = SA_SIGINFO;
+
+ newact.sa_restorer = 0; /* void(*)(), obsolete */
+ r = sigaction(signum, &newact, &oldact);
+ r = sigaction(SIGALRM, &newact, &oldact);
+ pkey_assert(r == 0);
+}
+
+void setup_handlers(void)
+{
+ signal(SIGCHLD, &sig_chld);
+ setup_sigsegv_handler();
+}
+
+pid_t fork_lazy_child(void)
+{
+ pid_t forkret;
+
+ forkret = fork();
+ pkey_assert(forkret >= 0);
+ dprintf3("[%d] fork() ret: %d\n", getpid(), forkret);
+
+ if (!forkret) {
+ /* in the child */
+ while (1) {
+ dprintf1("child sleeping...\n");
+ sleep(30);
+ }
+ }
+ return forkret;
+}
+
+void davecmp(void *_a, void *_b, int len)
+{
+ int i;
+ unsigned long *a = _a;
+ unsigned long *b = _b;
+
+ for (i = 0; i < len / sizeof(*a); i++) {
+ if (a[i] == b[i])
+ continue;
+
+ dprintf3("[%3d]: a: %016lx b: %016lx\n", i, a[i], b[i]);
+ }
+}
+
+void dumpit(char *f)
+{
+ int fd = open(f, O_RDONLY);
+ char buf[100];
+ int nr_read;
+
+ dprintf2("maps fd: %d\n", fd);
+ do {
+ nr_read = read(fd, &buf[0], sizeof(buf));
+ write(1, buf, nr_read);
+ } while (nr_read > 0);
+ close(fd);
+}
+
+#define PKEY_DISABLE_ACCESS 0x1
+#define PKEY_DISABLE_WRITE 0x2
+
+u32 pkey_get(int pkey, unsigned long flags)
+{
+ u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
+ u32 pkru = __rdpkru();
+ u32 shifted_pkru;
+ u32 masked_pkru;
+
+ dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
+ __func__, pkey, flags, 0, 0);
+ dprintf2("%s() raw pkru: %x\n", __func__, pkru);
+
+ shifted_pkru = (pkru >> (pkey * PKRU_BITS_PER_PKEY));
+ dprintf2("%s() shifted_pkru: %x\n", __func__, shifted_pkru);
+ masked_pkru = shifted_pkru & mask;
+ dprintf2("%s() masked pkru: %x\n", __func__, masked_pkru);
+ /*
+ * shift down the relevant bits to the lowest two, then
+ * mask off all the other high bits.
+ */
+ return masked_pkru;
+}
+
+int pkey_set(int pkey, unsigned long rights, unsigned long flags)
+{
+ u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
+ u32 old_pkru = __rdpkru();
+ u32 new_pkru;
+
+ /* make sure that 'rights' only contains the bits we expect: */
+ assert(!(rights & ~mask));
+
+ /* copy old pkru */
+ new_pkru = old_pkru;
+ /* mask out bits from pkey in old value: */
+ new_pkru &= ~(mask << (pkey * PKRU_BITS_PER_PKEY));
+ /* OR in new bits for pkey: */
+ new_pkru |= (rights << (pkey * PKRU_BITS_PER_PKEY));
+
+ __wrpkru(new_pkru);
+
+ dprintf3("%s(pkey=%d, rights=%lx, flags=%lx) = %x pkru now: %x old_pkru: %x\n",
+ __func__, pkey, rights, flags, 0, __rdpkru(), old_pkru);
+ return 0;
+}
+
+void pkey_disable_set(int pkey, int flags)
+{
+ unsigned long syscall_flags = 0;
+ int ret;
+ int pkey_rights;
+ u32 orig_pkru = rdpkru();
+
+ dprintf1("START->%s(%d, 0x%x)\n", __func__,
+ pkey, flags);
+ pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
+
+ pkey_rights = pkey_get(pkey, syscall_flags);
+
+ dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
+ pkey, pkey, pkey_rights);
+ pkey_assert(pkey_rights >= 0);
+
+ pkey_rights |= flags;
+
+ ret = pkey_set(pkey, pkey_rights, syscall_flags);
+ assert(!ret);
+ /* pkru and flags have the same format */
+ shadow_pkru |= flags << (pkey * 2);
+ dprintf1("%s(%d) shadow: 0x%x\n", __func__, pkey, shadow_pkru);
+
+ pkey_assert(ret >= 0);
+
+ pkey_rights = pkey_get(pkey, syscall_flags);
+ dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
+ pkey, pkey, pkey_rights);
+
+ dprintf1("%s(%d) pkru: 0x%x\n", __func__, pkey, rdpkru());
+ if (flags)
+ pkey_assert(rdpkru() >= orig_pkru);
+ dprintf1("END<---%s(%d, 0x%x)\n", __func__,
+ pkey, flags);
+}
+
+void pkey_disable_clear(int pkey, int flags)
+{
+ unsigned long syscall_flags = 0;
+ int ret;
+ int pkey_rights = pkey_get(pkey, syscall_flags);
+ u32 orig_pkru = rdpkru();
+
+ pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
+
+ dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
+ pkey, pkey, pkey_rights);
+ pkey_assert(pkey_rights >= 0);
+
+ pkey_rights &= ~flags;
+
+ ret = pkey_set(pkey, pkey_rights, 0);
+ /* pkru and flags have the same format */
+ shadow_pkru &= ~(flags << (pkey * 2));
+ pkey_assert(ret >= 0);
+
+ pkey_rights = pkey_get(pkey, syscall_flags);
+ dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
+ pkey, pkey, pkey_rights);
+
+ dprintf1("%s(%d) pkru: 0x%x\n", __func__, pkey, rdpkru());
+ if (flags)
+ assert(rdpkru() <= orig_pkru);
+}
+
+void pkey_write_allow(int pkey)
+{
+ pkey_disable_clear(pkey, PKEY_DISABLE_WRITE);
+}
+void pkey_write_deny(int pkey)
+{
+ pkey_disable_set(pkey, PKEY_DISABLE_WRITE);
+}
+void pkey_access_allow(int pkey)
+{
+ pkey_disable_clear(pkey, PKEY_DISABLE_ACCESS);
+}
+void pkey_access_deny(int pkey)
+{
+ pkey_disable_set(pkey, PKEY_DISABLE_ACCESS);
+}
+
+int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
+ unsigned long pkey)
+{
+ int sret;
+
+ dprintf2("%s(0x%p, %zx, prot=%lx, pkey=%lx)\n", __func__,
+ ptr, size, orig_prot, pkey);
+
+ errno = 0;
+ sret = syscall(SYS_mprotect_key, ptr, size, orig_prot, pkey);
+ if (errno) {
+ dprintf2("SYS_mprotect_key sret: %d\n", sret);
+ dprintf2("SYS_mprotect_key prot: 0x%lx\n", orig_prot);
+ dprintf2("SYS_mprotect_key failed, errno: %d\n", errno);
+ if (DEBUG_LEVEL >= 2)
+ perror("SYS_mprotect_pkey");
+ }
+ return sret;
+}
+
+int sys_pkey_alloc(unsigned long flags, unsigned long init_val)
+{
+ int ret = syscall(SYS_pkey_alloc, flags, init_val);
+ dprintf1("%s(flags=%lx, init_val=%lx) syscall ret: %d errno: %d\n",
+ __func__, flags, init_val, ret, errno);
+ return ret;
+}
+
+int alloc_pkey(void)
+{
+ int ret;
+ unsigned long init_val = 0x0;
+
+ dprintf1("alloc_pkey()::%d, pkru: 0x%x shadow: %x\n",
+ __LINE__, __rdpkru(), shadow_pkru);
+ ret = sys_pkey_alloc(0, init_val);
+ /*
+ * pkey_alloc() sets PKRU, so we need to reflect it in
+ * shadow_pkru:
+ */
+ dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ if (ret) {
+ /* clear both the bits: */
+ shadow_pkru &= ~(0x3 << (ret * 2));
+ dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ /*
+ * move the new state in from init_val
+ * (remember, we cheated and init_val == pkru format)
+ */
+ shadow_pkru |= (init_val << (ret * 2));
+ }
+ dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ dprintf1("alloc_pkey()::%d errno: %d\n", __LINE__, errno);
+ /* for shadow checking: */
+ rdpkru();
+ dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ return ret;
+}
+
+int sys_pkey_free(unsigned long pkey)
+{
+ int ret = syscall(SYS_pkey_free, pkey);
+ dprintf1("%s(pkey=%ld) syscall ret: %d\n", __func__, pkey, ret);
+ return ret;
+}
+
+/*
+ * I had a bug where pkey bits could be set by mprotect() but
+ * not cleared. This ensures we get lots of random bit sets
+ * and clears on the vma and pte pkey bits.
+ */
+int alloc_random_pkey(void)
+{
+ int max_nr_pkey_allocs;
+ int ret;
+ int i;
+ int alloced_pkeys[NR_PKEYS];
+ int nr_alloced = 0;
+ int random_index;
+ memset(alloced_pkeys, 0, sizeof(alloced_pkeys));
+
+ /* allocate every possible key and make a note of which ones we got */
+ max_nr_pkey_allocs = NR_PKEYS;
+ max_nr_pkey_allocs = 1;
+ for (i = 0; i < max_nr_pkey_allocs; i++) {
+ int new_pkey = alloc_pkey();
+ if (new_pkey < 0)
+ break;
+ alloced_pkeys[nr_alloced++] = new_pkey;
+ }
+
+ pkey_assert(nr_alloced > 0);
+ /* select a random one out of the allocated ones */
+ random_index = rand() % nr_alloced;
+ ret = alloced_pkeys[random_index];
+ /* now zero it out so we don't free it next */
+ alloced_pkeys[random_index] = 0;
+
+ /* go through the allocated ones that we did not want and free them */
+ for (i = 0; i < nr_alloced; i++) {
+ int free_ret;
+ if (!alloced_pkeys[i])
+ continue;
+ free_ret = sys_pkey_free(alloced_pkeys[i]);
+ pkey_assert(!free_ret);
+ }
+ dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ return ret;
+}
+
+int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
+ unsigned long pkey)
+{
+ int nr_iterations = random() % 100;
+ int ret;
+
+ while (0) {
+ int rpkey = alloc_random_pkey();
+ ret = sys_mprotect_pkey(ptr, size, orig_prot, pkey);
+ dprintf1("sys_mprotect_pkey(%p, %zx, prot=0x%lx, pkey=%ld) ret: %d\n",
+ ptr, size, orig_prot, pkey, ret);
+ if (nr_iterations-- < 0)
+ break;
+
+ dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ sys_pkey_free(rpkey);
+ dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ }
+ pkey_assert(pkey < NR_PKEYS);
+
+ ret = sys_mprotect_pkey(ptr, size, orig_prot, pkey);
+ dprintf1("mprotect_pkey(%p, %zx, prot=0x%lx, pkey=%ld) ret: %d\n",
+ ptr, size, orig_prot, pkey, ret);
+ pkey_assert(!ret);
+ dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
+ __LINE__, ret, __rdpkru(), shadow_pkru);
+ return ret;
+}
+
+struct pkey_malloc_record {
+ void *ptr;
+ long size;
+};
+struct pkey_malloc_record *pkey_malloc_records;
+long nr_pkey_malloc_records;
+void record_pkey_malloc(void *ptr, long size)
+{
+ long i;
+ struct pkey_malloc_record *rec = NULL;
+
+ for (i = 0; i < nr_pkey_malloc_records; i++) {
+ rec = &pkey_malloc_records[i];
+ /* find a free record */
+ if (rec)
+ break;
+ }
+ if (!rec) {
+ /* every record is full */
+ size_t old_nr_records = nr_pkey_malloc_records;
+ size_t new_nr_records = (nr_pkey_malloc_records * 2 + 1);
+ size_t new_size = new_nr_records * sizeof(struct pkey_malloc_record);
+ dprintf2("new_nr_records: %zd\n", new_nr_records);
+ dprintf2("new_size: %zd\n", new_size);
+ pkey_malloc_records = realloc(pkey_malloc_records, new_size);
+ pkey_assert(pkey_malloc_records != NULL);
+ rec = &pkey_malloc_records[nr_pkey_malloc_records];
+ /*
+ * realloc() does not initialize memory, so zero it from
+ * the first new record all the way to the end.
+ */
+ for (i = 0; i < new_nr_records - old_nr_records; i++)
+ memset(rec + i, 0, sizeof(*rec));
+ }
+ dprintf3("filling malloc record[%d/%p]: {%p, %ld}\n",
+ (int)(rec - pkey_malloc_records), rec, ptr, size);
+ rec->ptr = ptr;
+ rec->size = size;
+ nr_pkey_malloc_records++;
+}
+
+void free_pkey_malloc(void *ptr)
+{
+ long i;
+ int ret;
+ dprintf3("%s(%p)\n", __func__, ptr);
+ for (i = 0; i < nr_pkey_malloc_records; i++) {
+ struct pkey_malloc_record *rec = &pkey_malloc_records[i];
+ dprintf4("looking for ptr %p at record[%ld/%p]: {%p, %ld}\n",
+ ptr, i, rec, rec->ptr, rec->size);
+ if ((ptr < rec->ptr) ||
+ (ptr >= rec->ptr + rec->size))
+ continue;
+
+ dprintf3("found ptr %p at record[%ld/%p]: {%p, %ld}\n",
+ ptr, i, rec, rec->ptr, rec->size);
+ nr_pkey_malloc_records--;
+ ret = munmap(rec->ptr, rec->size);
+ dprintf3("munmap ret: %d\n", ret);
+ pkey_assert(!ret);
+ dprintf3("clearing rec->ptr, rec: %p\n", rec);
+ rec->ptr = NULL;
+ dprintf3("done clearing rec->ptr, rec: %p\n", rec);
+ return;
+ }
+ pkey_assert(false);
+}
+
+
+void *malloc_pkey_with_mprotect(long size, int prot, u16 pkey)
+{
+ void *ptr;
+ int ret;
+
+ rdpkru();
+ dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
+ size, prot, pkey);
+ pkey_assert(pkey < NR_PKEYS);
+ ptr = mmap(NULL, size, prot, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ pkey_assert(ptr != (void *)-1);
+ ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey);
+ pkey_assert(!ret);
+ record_pkey_malloc(ptr, size);
+ rdpkru();
+
+ dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr);
+ return ptr;
+}
+
+void *malloc_pkey_anon_huge(long size, int prot, u16 pkey)
+{
+ int ret;
+ void *ptr;
+
+ dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
+ size, prot, pkey);
+ /*
+ * Guarantee we can fit at least one huge page in the resulting
+ * allocation by allocating space for 2:
+ */
+ size = ALIGN_UP(size, HPAGE_SIZE * 2);
+ ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ pkey_assert(ptr != (void *)-1);
+ record_pkey_malloc(ptr, size);
+ mprotect_pkey(ptr, size, prot, pkey);
+
+ dprintf1("unaligned ptr: %p\n", ptr);
+ ptr = ALIGN_PTR_UP(ptr, HPAGE_SIZE);
+ dprintf1(" aligned ptr: %p\n", ptr);
+ ret = madvise(ptr, HPAGE_SIZE, MADV_HUGEPAGE);
+ dprintf1("MADV_HUGEPAGE ret: %d\n", ret);
+ ret = madvise(ptr, HPAGE_SIZE, MADV_WILLNEED);
+ dprintf1("MADV_WILLNEED ret: %d\n", ret);
+ memset(ptr, 0, HPAGE_SIZE);
+
+ dprintf1("mmap()'d thp for pkey %d @ %p\n", pkey, ptr);
+ return ptr;
+}
+
+int hugetlb_setup_ok;
+#define GET_NR_HUGE_PAGES 10
+void setup_hugetlbfs(void)
+{
+ int err;
+ int fd;
+ char buf[] = "123";
+
+ if (geteuid() != 0) {
+ fprintf(stderr, "WARNING: not run as root, can not do hugetlb test\n");
+ return;
+ }
+
+ cat_into_file(__stringify(GET_NR_HUGE_PAGES), "/proc/sys/vm/nr_hugepages");
+
+ /*
+ * Now go make sure that we got the pages and that they
+ * are 2M pages. Someone might have made 1G the default.
+ */
+ fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", O_RDONLY);
+ if (fd < 0) {
+ perror("opening sysfs 2M hugetlb config");
+ return;
+ }
+
+ /* -1 to guarantee leaving the trailing \0 */
+ err = read(fd, buf, sizeof(buf)-1);
+ close(fd);
+ if (err <= 0) {
+ perror("reading sysfs 2M hugetlb config");
+ return;
+ }
+
+ if (atoi(buf) != GET_NR_HUGE_PAGES) {
+ fprintf(stderr, "could not confirm 2M pages, got: '%s' expected %d\n",
+ buf, GET_NR_HUGE_PAGES);
+ return;
+ }
+
+ hugetlb_setup_ok = 1;
+}
+
+void *malloc_pkey_hugetlb(long size, int prot, u16 pkey)
+{
+ void *ptr;
+ int flags = MAP_ANONYMOUS|MAP_PRIVATE|MAP_HUGETLB;
+
+ if (!hugetlb_setup_ok)
+ return PTR_ERR_ENOTSUP;
+
+ dprintf1("doing %s(%ld, %x, %x)\n", __func__, size, prot, pkey);
+ size = ALIGN_UP(size, HPAGE_SIZE * 2);
+ pkey_assert(pkey < NR_PKEYS);
+ ptr = mmap(NULL, size, PROT_NONE, flags, -1, 0);
+ pkey_assert(ptr != (void *)-1);
+ mprotect_pkey(ptr, size, prot, pkey);
+
+ record_pkey_malloc(ptr, size);
+
+ dprintf1("mmap()'d hugetlbfs for pkey %d @ %p\n", pkey, ptr);
+ return ptr;
+}
+
+void *malloc_pkey_mmap_dax(long size, int prot, u16 pkey)
+{
+ void *ptr;
+ int fd;
+
+ dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
+ size, prot, pkey);
+ pkey_assert(pkey < NR_PKEYS);
+ fd = open("/dax/foo", O_RDWR);
+ pkey_assert(fd >= 0);
+
+ ptr = mmap(0, size, prot, MAP_SHARED, fd, 0);
+ pkey_assert(ptr != (void *)-1);
+
+ mprotect_pkey(ptr, size, prot, pkey);
+
+ record_pkey_malloc(ptr, size);
+
+ dprintf1("mmap()'d for pkey %d @ %p\n", pkey, ptr);
+ close(fd);
+ return ptr;
+}
+
+void *(*pkey_malloc[])(long size, int prot, u16 pkey) = {
+
+ malloc_pkey_with_mprotect,
+ malloc_pkey_anon_huge,
+ malloc_pkey_hugetlb
+/* can not do direct with the pkey_mprotect() API:
+ malloc_pkey_mmap_direct,
+ malloc_pkey_mmap_dax,
+*/
+};
+
+void *malloc_pkey(long size, int prot, u16 pkey)
+{
+ void *ret;
+ static int malloc_type;
+ int nr_malloc_types = ARRAY_SIZE(pkey_malloc);
+
+ pkey_assert(pkey < NR_PKEYS);
+
+ while (1) {
+ pkey_assert(malloc_type < nr_malloc_types);
+
+ ret = pkey_malloc[malloc_type](size, prot, pkey);
+ pkey_assert(ret != (void *)-1);
+
+ malloc_type++;
+ if (malloc_type >= nr_malloc_types)
+ malloc_type = (random()%nr_malloc_types);
+
+ /* try again if the malloc_type we tried is unsupported */
+ if (ret == PTR_ERR_ENOTSUP)
+ continue;
+
+ break;
+ }
+
+ dprintf3("%s(%ld, prot=%x, pkey=%x) returning: %p\n", __func__,
+ size, prot, pkey, ret);
+ return ret;
+}
+
+int last_pkru_faults;
+void expected_pk_fault(int pkey)
+{
+ dprintf2("%s(): last_pkru_faults: %d pkru_faults: %d\n",
+ __func__, last_pkru_faults, pkru_faults);
+ dprintf2("%s(%d): last_si_pkey: %d\n", __func__, pkey, last_si_pkey);
+ pkey_assert(last_pkru_faults + 1 == pkru_faults);
+ pkey_assert(last_si_pkey == pkey);
+ /*
+ * The signal handler should have cleared out PKRU to let the
+ * test program continue. We now have to restore it.
+ */
+ if (__rdpkru() != 0)
+ pkey_assert(0);
+
+ __wrpkru(shadow_pkru);
+ dprintf1("%s() set PKRU=%x to restore state after signal nuked it\n",
+ __func__, shadow_pkru);
+ last_pkru_faults = pkru_faults;
+ last_si_pkey = -1;
+}
+
+void do_not_expect_pk_fault(void)
+{
+ pkey_assert(last_pkru_faults == pkru_faults);
+}
+
+int test_fds[10] = { -1 };
+int nr_test_fds;
+void __save_test_fd(int fd)
+{
+ pkey_assert(fd >= 0);
+ pkey_assert(nr_test_fds < ARRAY_SIZE(test_fds));
+ test_fds[nr_test_fds] = fd;
+ nr_test_fds++;
+}
+
+int get_test_read_fd(void)
+{
+ int test_fd = open("/etc/passwd", O_RDONLY);
+ __save_test_fd(test_fd);
+ return test_fd;
+}
+
+void close_test_fds(void)
+{
+ int i;
+
+ for (i = 0; i < nr_test_fds; i++) {
+ if (test_fds[i] < 0)
+ continue;
+ close(test_fds[i]);
+ test_fds[i] = -1;
+ }
+ nr_test_fds = 0;
+}
+
+#define barrier() __asm__ __volatile__("": : :"memory")
+__attribute__((noinline)) int read_ptr(int *ptr)
+{
+ /*
+ * Keep GCC from optimizing this away somehow
+ */
+ barrier();
+ return *ptr;
+}
+
+void test_read_of_write_disabled_region(int *ptr, u16 pkey)
+{
+ int ptr_contents;
+
+ dprintf1("disabling write access to PKEY[%02d], doing read\n", pkey);
+ pkey_write_deny(pkey);
+ ptr_contents = read_ptr(ptr);
+ dprintf1("*ptr: %d\n", ptr_contents);
+ dprintf1("\n");
+}
+void test_read_of_access_disabled_region(int *ptr, u16 pkey)
+{
+ int ptr_contents;
+
+ dprintf1("disabling access to PKEY[%02d], doing read @ %p\n", pkey, ptr);
+ rdpkru();
+ pkey_access_deny(pkey);
+ ptr_contents = read_ptr(ptr);
+ dprintf1("*ptr: %d\n", ptr_contents);
+ expected_pk_fault(pkey);
+}
+void test_write_of_write_disabled_region(int *ptr, u16 pkey)
+{
+ dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
+ pkey_write_deny(pkey);
+ *ptr = __LINE__;
+ expected_pk_fault(pkey);
+}
+void test_write_of_access_disabled_region(int *ptr, u16 pkey)
+{
+ dprintf1("disabling access to PKEY[%02d], doing write\n", pkey);
+ pkey_access_deny(pkey);
+ *ptr = __LINE__;
+ expected_pk_fault(pkey);
+}
+void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey)
+{
+ int ret;
+ int test_fd = get_test_read_fd();
+
+ dprintf1("disabling access to PKEY[%02d], "
+ "having kernel read() to buffer\n", pkey);
+ pkey_access_deny(pkey);
+ ret = read(test_fd, ptr, 1);
+ dprintf1("read ret: %d\n", ret);
+ pkey_assert(ret);
+}
+void test_kernel_write_of_write_disabled_region(int *ptr, u16 pkey)
+{
+ int ret;
+ int test_fd = get_test_read_fd();
+
+ pkey_write_deny(pkey);
+ ret = read(test_fd, ptr, 100);
+ dprintf1("read ret: %d\n", ret);
+ if (ret < 0 && (DEBUG_LEVEL > 0))
+ perror("verbose read result (OK for this to be bad)");
+ pkey_assert(ret);
+}
+
+void test_kernel_gup_of_access_disabled_region(int *ptr, u16 pkey)
+{
+ int pipe_ret, vmsplice_ret;
+ struct iovec iov;
+ int pipe_fds[2];
+
+ pipe_ret = pipe(pipe_fds);
+
+ pkey_assert(pipe_ret == 0);
+ dprintf1("disabling access to PKEY[%02d], "
+ "having kernel vmsplice from buffer\n", pkey);
+ pkey_access_deny(pkey);
+ iov.iov_base = ptr;
+ iov.iov_len = PAGE_SIZE;
+ vmsplice_ret = vmsplice(pipe_fds[1], &iov, 1, SPLICE_F_GIFT);
+ dprintf1("vmsplice() ret: %d\n", vmsplice_ret);
+ pkey_assert(vmsplice_ret == -1);
+
+ close(pipe_fds[0]);
+ close(pipe_fds[1]);
+}
+
+void test_kernel_gup_write_to_write_disabled_region(int *ptr, u16 pkey)
+{
+ int ignored = 0xdada;
+ int futex_ret;
+ int some_int = __LINE__;
+
+ dprintf1("disabling write to PKEY[%02d], "
+ "doing futex gunk in buffer\n", pkey);
+ *ptr = some_int;
+ pkey_write_deny(pkey);
+ futex_ret = syscall(SYS_futex, ptr, FUTEX_WAIT, some_int-1, NULL,
+ &ignored, ignored);
+ if (DEBUG_LEVEL > 0)
+ perror("futex");
+ dprintf1("futex() ret: %d\n", futex_ret);
+}
+
+/* Assumes that all pkeys other than 'pkey' are unallocated */
+void test_pkey_syscalls_on_non_allocated_pkey(int *ptr, u16 pkey)
+{
+ int err;
+ int i;
+
+ /* Note: 0 is the default pkey, so don't mess with it */
+ for (i = 1; i < NR_PKEYS; i++) {
+ if (pkey == i)
+ continue;
+
+ dprintf1("trying get/set/free to non-allocated pkey: %2d\n", i);
+ err = sys_pkey_free(i);
+ pkey_assert(err);
+
+ err = sys_pkey_free(i);
+ pkey_assert(err);
+
+ err = sys_mprotect_pkey(ptr, PAGE_SIZE, PROT_READ, i);
+ pkey_assert(err);
+ }
+}
+
+/* Assumes that all pkeys other than 'pkey' are unallocated */
+void test_pkey_syscalls_bad_args(int *ptr, u16 pkey)
+{
+ int err;
+ int bad_pkey = NR_PKEYS+99;
+
+ /* pass a known-invalid pkey in: */
+ err = sys_mprotect_pkey(ptr, PAGE_SIZE, PROT_READ, bad_pkey);
+ pkey_assert(err);
+}
+
+/* Assumes that all pkeys other than 'pkey' are unallocated */
+void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
+{
+ int err;
+ int allocated_pkeys[NR_PKEYS] = {0};
+ int nr_allocated_pkeys = 0;
+ int i;
+
+ for (i = 0; i < NR_PKEYS*2; i++) {
+ int new_pkey;
+ dprintf1("%s() alloc loop: %d\n", __func__, i);
+ new_pkey = alloc_pkey();
+ dprintf4("%s()::%d, new_pkey: %d pkru: 0x%x shadow: 0x%x\n", __func__,
+ __LINE__, new_pkey, __rdpkru(), shadow_pkru);
+ rdpkru(); /* for shadow checking */
+ dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, ENOSPC);
+ if ((new_pkey == -1) && (errno == ENOSPC)) {
+ dprintf2("%s() failed to allocate pkey after %d tries\n",
+ __func__, nr_allocated_pkeys);
+ break;
+ }
+ pkey_assert(nr_allocated_pkeys < NR_PKEYS);
+ allocated_pkeys[nr_allocated_pkeys++] = new_pkey;
+ }
+
+ dprintf3("%s()::%d\n", __func__, __LINE__);
+
+ /*
+ * ensure it did not reach the end of the loop without
+ * failure:
+ */
+ pkey_assert(i < NR_PKEYS*2);
+
+ /*
+ * There are 16 pkeys supported in hardware. One is taken
+ * up for the default (0) and another can be taken up by
+ * an execute-only mapping. Ensure that we can allocate
+ * at least 14 (16-2).
+ */
+ pkey_assert(i >= NR_PKEYS-2);
+
+ for (i = 0; i < nr_allocated_pkeys; i++) {
+ err = sys_pkey_free(allocated_pkeys[i]);
+ pkey_assert(!err);
+ rdpkru(); /* for shadow checking */
+ }
+}
+
+void test_ptrace_of_child(int *ptr, u16 pkey)
+{
+ __attribute__((__unused__)) int peek_result;
+ pid_t child_pid;
+ void *ignored = 0;
+ long ret;
+ int status;
+ /*
+ * This is the "control" for our little experiment. Make sure
+ * we can always access it when ptracing.
+ */
+ int *plain_ptr_unaligned = malloc(HPAGE_SIZE);
+ int *plain_ptr = ALIGN_PTR_UP(plain_ptr_unaligned, PAGE_SIZE);
+
+ /*
+ * Fork a child which is an exact copy of this process, of course.
+ * That means we can do all of our tests via ptrace() and then plain
+ * memory access and ensure they work differently.
+ */
+ child_pid = fork_lazy_child();
+ dprintf1("[%d] child pid: %d\n", getpid(), child_pid);
+
+ ret = ptrace(PTRACE_ATTACH, child_pid, ignored, ignored);
+ if (ret)
+ perror("attach");
+ dprintf1("[%d] attach ret: %ld %d\n", getpid(), ret, __LINE__);
+ pkey_assert(ret != -1);
+ ret = waitpid(child_pid, &status, WUNTRACED);
+ if ((ret != child_pid) || !(WIFSTOPPED(status))) {
+ fprintf(stderr, "weird waitpid result %ld stat %x\n",
+ ret, status);
+ pkey_assert(0);
+ }
+ dprintf2("waitpid ret: %ld\n", ret);
+ dprintf2("waitpid status: %d\n", status);
+
+ pkey_access_deny(pkey);
+ pkey_write_deny(pkey);
+
+ /* Write access, untested for now:
+ ret = ptrace(PTRACE_POKEDATA, child_pid, peek_at, data);
+ pkey_assert(ret != -1);
+ dprintf1("poke at %p: %ld\n", peek_at, ret);
+ */
+
+ /*
+ * Try to access the pkey-protected "ptr" via ptrace:
+ */
+ ret = ptrace(PTRACE_PEEKDATA, child_pid, ptr, ignored);
+ /* expect it to work, without an error: */
+ pkey_assert(ret != -1);
+ /* Now access from the current task, and expect an exception: */
+ peek_result = read_ptr(ptr);
+ expected_pk_fault(pkey);
+
+ /*
+ * Try to access the NON-pkey-protected "plain_ptr" via ptrace:
+ */
+ ret = ptrace(PTRACE_PEEKDATA, child_pid, plain_ptr, ignored);
+ /* expect it to work, without an error: */
+ pkey_assert(ret != -1);
+ /* Now access from the current task, and expect NO exception: */
+ peek_result = read_ptr(plain_ptr);
+ do_not_expect_pk_fault();
+
+ ret = ptrace(PTRACE_DETACH, child_pid, ignored, 0);
+ pkey_assert(ret != -1);
+
+ ret = kill(child_pid, SIGKILL);
+ pkey_assert(ret != -1);
+
+ wait(&status);
+
+ free(plain_ptr_unaligned);
+}
+
+void test_executing_on_unreadable_memory(int *ptr, u16 pkey)
+{
+ void *p1;
+ int scratch;
+ int ptr_contents;
+ int ret;
+
+ p1 = ALIGN_PTR_UP(&lots_o_noops_around_write, PAGE_SIZE);
+ dprintf3("&lots_o_noops: %p\n", &lots_o_noops_around_write);
+ /* lots_o_noops_around_write should be page-aligned already */
+ assert(p1 == &lots_o_noops_around_write);
+
+ /* Point 'p1' at the *second* page of the function: */
+ p1 += PAGE_SIZE;
+
+ madvise(p1, PAGE_SIZE, MADV_DONTNEED);
+ lots_o_noops_around_write(&scratch);
+ ptr_contents = read_ptr(p1);
+ dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
+
+ ret = mprotect_pkey(p1, PAGE_SIZE, PROT_EXEC, (u64)pkey);
+ pkey_assert(!ret);
+ pkey_access_deny(pkey);
+
+ dprintf2("pkru: %x\n", rdpkru());
+
+ /*
+ * Make sure this is an *instruction* fault
+ */
+ madvise(p1, PAGE_SIZE, MADV_DONTNEED);
+ lots_o_noops_around_write(&scratch);
+ do_not_expect_pk_fault();
+ ptr_contents = read_ptr(p1);
+ dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
+ expected_pk_fault(pkey);
+}
+
+void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
+{
+ int size = PAGE_SIZE;
+ int sret;
+
+ if (cpu_has_pku()) {
+ dprintf1("SKIP: %s: CPU supports pkeys\n", __func__);
+ return;
+ }
+
+ sret = syscall(SYS_mprotect_key, ptr, size, PROT_READ, pkey);
+ pkey_assert(sret < 0);
+}
+
+void (*pkey_tests[])(int *ptr, u16 pkey) = {
+ test_read_of_write_disabled_region,
+ test_read_of_access_disabled_region,
+ test_write_of_write_disabled_region,
+ test_write_of_access_disabled_region,
+ test_kernel_write_of_access_disabled_region,
+ test_kernel_write_of_write_disabled_region,
+ test_kernel_gup_of_access_disabled_region,
+ test_kernel_gup_write_to_write_disabled_region,
+ test_executing_on_unreadable_memory,
+ test_ptrace_of_child,
+ test_pkey_syscalls_on_non_allocated_pkey,
+ test_pkey_syscalls_bad_args,
+ test_pkey_alloc_exhaust,
+};
+
+void run_tests_once(void)
+{
+ int *ptr;
+ int prot = PROT_READ|PROT_WRITE;
+
+ for (test_nr = 0; test_nr < ARRAY_SIZE(pkey_tests); test_nr++) {
+ int pkey;
+ int orig_pkru_faults = pkru_faults;
+
+ dprintf1("======================\n");
+ dprintf1("test %d preparing...\n", test_nr);
+
+ tracing_on();
+ pkey = alloc_random_pkey();
+ dprintf1("test %d starting with pkey: %d\n", test_nr, pkey);
+ ptr = malloc_pkey(PAGE_SIZE, prot, pkey);
+ dprintf1("test %d starting...\n", test_nr);
+ pkey_tests[test_nr](ptr, pkey);
+ dprintf1("freeing test memory: %p\n", ptr);
+ free_pkey_malloc(ptr);
+ sys_pkey_free(pkey);
+
+ dprintf1("pkru_faults: %d\n", pkru_faults);
+ dprintf1("orig_pkru_faults: %d\n", orig_pkru_faults);
+
+ tracing_off();
+ close_test_fds();
+
+ printf("test %2d PASSED (iteration %d)\n", test_nr, iteration_nr);
+ dprintf1("======================\n\n");
+ }
+ iteration_nr++;
+}
+
+void pkey_setup_shadow(void)
+{
+ shadow_pkru = __rdpkru();
+}
+
+int main(void)
+{
+ int nr_iterations = 22;
+
+ setup_handlers();
+
+ printf("has pku: %d\n", cpu_has_pku());
+
+ if (!cpu_has_pku()) {
+ int size = PAGE_SIZE;
+ int *ptr;
+
+ printf("running PKEY tests for unsupported CPU/OS\n");
+
+ ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ assert(ptr != (void *)-1);
+ test_mprotect_pkey_on_unsupported_cpu(ptr, 1);
+ exit(0);
+ }
+
+ pkey_setup_shadow();
+ printf("startup pkru: %x\n", rdpkru());
+ setup_hugetlbfs();
+
+ while (nr_iterations-- > 0)
+ run_tests_once();
+
+ printf("done (all tests OK)\n");
+ return 0;
+}
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 7b1adee..9687501 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -7,7 +7,7 @@ include ../lib.mk
TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall test_mremap_vdso \
check_initial_reg_state sigreturn ldt_gdt iopl mpx-mini-test ioperm \
- protection_keys test_vdso
+ test_vdso
TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
diff --git a/tools/testing/selftests/x86/pkey-helpers.h b/tools/testing/selftests/x86/pkey-helpers.h
deleted file mode 100644
index 3818f25..0000000
--- a/tools/testing/selftests/x86/pkey-helpers.h
+++ /dev/null
@@ -1,220 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _PKEYS_HELPER_H
-#define _PKEYS_HELPER_H
-#define _GNU_SOURCE
-#include <string.h>
-#include <stdarg.h>
-#include <stdio.h>
-#include <stdint.h>
-#include <stdbool.h>
-#include <signal.h>
-#include <assert.h>
-#include <stdlib.h>
-#include <ucontext.h>
-#include <sys/mman.h>
-
-#define NR_PKEYS 16
-#define PKRU_BITS_PER_PKEY 2
-
-#ifndef DEBUG_LEVEL
-#define DEBUG_LEVEL 0
-#endif
-#define DPRINT_IN_SIGNAL_BUF_SIZE 4096
-extern int dprint_in_signal;
-extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
-static inline void sigsafe_printf(const char *format, ...)
-{
- va_list ap;
-
- va_start(ap, format);
- if (!dprint_in_signal) {
- vprintf(format, ap);
- } else {
- int len = vsnprintf(dprint_in_signal_buffer,
- DPRINT_IN_SIGNAL_BUF_SIZE,
- format, ap);
- /*
- * len is amount that would have been printed,
- * but actual write is truncated at BUF_SIZE.
- */
- if (len > DPRINT_IN_SIGNAL_BUF_SIZE)
- len = DPRINT_IN_SIGNAL_BUF_SIZE;
- write(1, dprint_in_signal_buffer, len);
- }
- va_end(ap);
-}
-#define dprintf_level(level, args...) do { \
- if (level <= DEBUG_LEVEL) \
- sigsafe_printf(args); \
- fflush(NULL); \
-} while (0)
-#define dprintf0(args...) dprintf_level(0, args)
-#define dprintf1(args...) dprintf_level(1, args)
-#define dprintf2(args...) dprintf_level(2, args)
-#define dprintf3(args...) dprintf_level(3, args)
-#define dprintf4(args...) dprintf_level(4, args)
-
-extern unsigned int shadow_pkru;
-static inline unsigned int __rdpkru(void)
-{
- unsigned int eax, edx;
- unsigned int ecx = 0;
- unsigned int pkru;
-
- asm volatile(".byte 0x0f,0x01,0xee\n\t"
- : "=a" (eax), "=d" (edx)
- : "c" (ecx));
- pkru = eax;
- return pkru;
-}
-
-static inline unsigned int _rdpkru(int line)
-{
- unsigned int pkru = __rdpkru();
-
- dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n",
- line, pkru, shadow_pkru);
- assert(pkru == shadow_pkru);
-
- return pkru;
-}
-
-#define rdpkru() _rdpkru(__LINE__)
-
-static inline void __wrpkru(unsigned int pkru)
-{
- unsigned int eax = pkru;
- unsigned int ecx = 0;
- unsigned int edx = 0;
-
- dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
- asm volatile(".byte 0x0f,0x01,0xef\n\t"
- : : "a" (eax), "c" (ecx), "d" (edx));
- assert(pkru == __rdpkru());
-}
-
-static inline void wrpkru(unsigned int pkru)
-{
- dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
- /* will do the shadow check for us: */
- rdpkru();
- __wrpkru(pkru);
- shadow_pkru = pkru;
- dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru());
-}
-
-/*
- * These are technically racy. since something could
- * change PKRU between the read and the write.
- */
-static inline void __pkey_access_allow(int pkey, int do_allow)
-{
- unsigned int pkru = rdpkru();
- int bit = pkey * 2;
-
- if (do_allow)
- pkru &= (1<<bit);
- else
- pkru |= (1<<bit);
-
- dprintf4("pkru now: %08x\n", rdpkru());
- wrpkru(pkru);
-}
-
-static inline void __pkey_write_allow(int pkey, int do_allow_write)
-{
- long pkru = rdpkru();
- int bit = pkey * 2 + 1;
-
- if (do_allow_write)
- pkru &= (1<<bit);
- else
- pkru |= (1<<bit);
-
- wrpkru(pkru);
- dprintf4("pkru now: %08x\n", rdpkru());
-}
-
-#define PROT_PKEY0 0x10 /* protection key value (bit 0) */
-#define PROT_PKEY1 0x20 /* protection key value (bit 1) */
-#define PROT_PKEY2 0x40 /* protection key value (bit 2) */
-#define PROT_PKEY3 0x80 /* protection key value (bit 3) */
-
-#define PAGE_SIZE 4096
-#define MB (1<<20)
-
-static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
- unsigned int *ecx, unsigned int *edx)
-{
- /* ecx is often an input as well as an output. */
- asm volatile(
- "cpuid;"
- : "=a" (*eax),
- "=b" (*ebx),
- "=c" (*ecx),
- "=d" (*edx)
- : "0" (*eax), "2" (*ecx));
-}
-
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
-#define X86_FEATURE_PKU (1<<3) /* Protection Keys for Userspace */
-#define X86_FEATURE_OSPKE (1<<4) /* OS Protection Keys Enable */
-
-static inline int cpu_has_pku(void)
-{
- unsigned int eax;
- unsigned int ebx;
- unsigned int ecx;
- unsigned int edx;
-
- eax = 0x7;
- ecx = 0x0;
- __cpuid(&eax, &ebx, &ecx, &edx);
-
- if (!(ecx & X86_FEATURE_PKU)) {
- dprintf2("cpu does not have PKU\n");
- return 0;
- }
- if (!(ecx & X86_FEATURE_OSPKE)) {
- dprintf2("cpu does not have OSPKE\n");
- return 0;
- }
- return 1;
-}
-
-#define XSTATE_PKRU_BIT (9)
-#define XSTATE_PKRU 0x200
-
-int pkru_xstate_offset(void)
-{
- unsigned int eax;
- unsigned int ebx;
- unsigned int ecx;
- unsigned int edx;
- int xstate_offset;
- int xstate_size;
- unsigned long XSTATE_CPUID = 0xd;
- int leaf;
-
- /* assume that XSTATE_PKRU is set in XCR0 */
- leaf = XSTATE_PKRU_BIT;
- {
- eax = XSTATE_CPUID;
- ecx = leaf;
- __cpuid(&eax, &ebx, &ecx, &edx);
-
- if (leaf == XSTATE_PKRU_BIT) {
- xstate_offset = ebx;
- xstate_size = eax;
- }
- }
-
- if (xstate_size == 0) {
- printf("could not find size/offset of PKRU in xsave state\n");
- return 0;
- }
-
- return xstate_offset;
-}
-
-#endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/x86/protection_keys.c b/tools/testing/selftests/x86/protection_keys.c
deleted file mode 100644
index 555e43c..0000000
--- a/tools/testing/selftests/x86/protection_keys.c
+++ /dev/null
@@ -1,1395 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Tests x86 Memory Protection Keys (see Documentation/x86/protection-keys.txt)
- *
- * There are examples in here of:
- * * how to set protection keys on memory
- * * how to set/clear bits in PKRU (the rights register)
- * * how to handle SEGV_PKRU signals and extract pkey-relevant
- * information from the siginfo
- *
- * Things to add:
- * make sure KSM and KSM COW breaking works
- * prefault pages in at malloc, or not
- * protect MPX bounds tables with protection keys?
- * make sure VMA splitting/merging is working correctly
- * OOMs can destroy mm->mmap (see exit_mmap()), so make sure it is immune to pkeys
- * look for pkey "leaks" where it is still set on a VMA but "freed" back to the kernel
- * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
- *
- * Compile like this:
- * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
- * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
- */
-#define _GNU_SOURCE
-#include <errno.h>
-#include <linux/futex.h>
-#include <sys/time.h>
-#include <sys/syscall.h>
-#include <string.h>
-#include <stdio.h>
-#include <stdint.h>
-#include <stdbool.h>
-#include <signal.h>
-#include <assert.h>
-#include <stdlib.h>
-#include <ucontext.h>
-#include <sys/mman.h>
-#include <sys/types.h>
-#include <sys/wait.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <sys/ptrace.h>
-#include <setjmp.h>
-
-#include "pkey-helpers.h"
-
-int iteration_nr = 1;
-int test_nr;
-
-unsigned int shadow_pkru;
-
-#define HPAGE_SIZE (1UL<<21)
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
-#define ALIGN_UP(x, align_to) (((x) + ((align_to)-1)) & ~((align_to)-1))
-#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
-#define ALIGN_PTR_UP(p, ptr_align_to) ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to))
-#define ALIGN_PTR_DOWN(p, ptr_align_to) ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to))
-#define __stringify_1(x...) #x
-#define __stringify(x...) __stringify_1(x)
-
-#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
-
-int dprint_in_signal;
-char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
-
-extern void abort_hooks(void);
-#define pkey_assert(condition) do { \
- if (!(condition)) { \
- dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
- __FILE__, __LINE__, \
- test_nr, iteration_nr); \
- dprintf0("errno at assert: %d", errno); \
- abort_hooks(); \
- assert(condition); \
- } \
-} while (0)
-#define raw_assert(cond) assert(cond)
-
-void cat_into_file(char *str, char *file)
-{
- int fd = open(file, O_RDWR);
- int ret;
-
- dprintf2("%s(): writing '%s' to '%s'\n", __func__, str, file);
- /*
- * these need to be raw because they are called under
- * pkey_assert()
- */
- raw_assert(fd >= 0);
- ret = write(fd, str, strlen(str));
- if (ret != strlen(str)) {
- perror("write to file failed");
- fprintf(stderr, "filename: '%s' str: '%s'\n", file, str);
- raw_assert(0);
- }
- close(fd);
-}
-
-#if CONTROL_TRACING > 0
-static int warned_tracing;
-int tracing_root_ok(void)
-{
- if (geteuid() != 0) {
- if (!warned_tracing)
- fprintf(stderr, "WARNING: not run as root, "
- "can not do tracing control\n");
- warned_tracing = 1;
- return 0;
- }
- return 1;
-}
-#endif
-
-void tracing_on(void)
-{
-#if CONTROL_TRACING > 0
-#define TRACEDIR "/sys/kernel/debug/tracing"
- char pidstr[32];
-
- if (!tracing_root_ok())
- return;
-
- sprintf(pidstr, "%d", getpid());
- cat_into_file("0", TRACEDIR "/tracing_on");
- cat_into_file("\n", TRACEDIR "/trace");
- if (1) {
- cat_into_file("function_graph", TRACEDIR "/current_tracer");
- cat_into_file("1", TRACEDIR "/options/funcgraph-proc");
- } else {
- cat_into_file("nop", TRACEDIR "/current_tracer");
- }
- cat_into_file(pidstr, TRACEDIR "/set_ftrace_pid");
- cat_into_file("1", TRACEDIR "/tracing_on");
- dprintf1("enabled tracing\n");
-#endif
-}
-
-void tracing_off(void)
-{
-#if CONTROL_TRACING > 0
- if (!tracing_root_ok())
- return;
- cat_into_file("0", "/sys/kernel/debug/tracing/tracing_on");
-#endif
-}
-
-void abort_hooks(void)
-{
- fprintf(stderr, "running %s()...\n", __func__);
- tracing_off();
-#ifdef SLEEP_ON_ABORT
- sleep(SLEEP_ON_ABORT);
-#endif
-}
-
-static inline void __page_o_noops(void)
-{
- /* 8-bytes of instruction * 512 bytes = 1 page */
- asm(".rept 512 ; nopl 0x7eeeeeee(%eax) ; .endr");
-}
-
-/*
- * This attempts to have roughly a page of instructions followed by a few
- * instructions that do a write, and another page of instructions. That
- * way, we are pretty sure that the write is in the second page of
- * instructions and has at least a page of padding behind it.
- *
- * *That* lets us be sure to madvise() away the write instruction, which
- * will then fault, which makes sure that the fault code handles
- * execute-only memory properly.
- */
-__attribute__((__aligned__(PAGE_SIZE)))
-void lots_o_noops_around_write(int *write_to_me)
-{
- dprintf3("running %s()\n", __func__);
- __page_o_noops();
- /* Assume this happens in the second page of instructions: */
- *write_to_me = __LINE__;
- /* pad out by another page: */
- __page_o_noops();
- dprintf3("%s() done\n", __func__);
-}
-
-/* Define some kernel-like types */
-#define u8 uint8_t
-#define u16 uint16_t
-#define u32 uint32_t
-#define u64 uint64_t
-
-#ifdef __i386__
-#define SYS_mprotect_key 380
-#define SYS_pkey_alloc 381
-#define SYS_pkey_free 382
-#define REG_IP_IDX REG_EIP
-#define si_pkey_offset 0x14
-#else
-#define SYS_mprotect_key 329
-#define SYS_pkey_alloc 330
-#define SYS_pkey_free 331
-#define REG_IP_IDX REG_RIP
-#define si_pkey_offset 0x20
-#endif
-
-void dump_mem(void *dumpme, int len_bytes)
-{
- char *c = (void *)dumpme;
- int i;
-
- for (i = 0; i < len_bytes; i += sizeof(u64)) {
- u64 *ptr = (u64 *)(c + i);
- dprintf1("dump[%03d][@%p]: %016jx\n", i, ptr, *ptr);
- }
-}
-
-#define SEGV_BNDERR 3 /* failed address bound checks */
-#define SEGV_PKUERR 4
-
-static char *si_code_str(int si_code)
-{
- if (si_code == SEGV_MAPERR)
- return "SEGV_MAPERR";
- if (si_code == SEGV_ACCERR)
- return "SEGV_ACCERR";
- if (si_code == SEGV_BNDERR)
- return "SEGV_BNDERR";
- if (si_code == SEGV_PKUERR)
- return "SEGV_PKUERR";
- return "UNKNOWN";
-}
-
-int pkru_faults;
-int last_si_pkey = -1;
-void signal_handler(int signum, siginfo_t *si, void *vucontext)
-{
- ucontext_t *uctxt = vucontext;
- int trapno;
- unsigned long ip;
- char *fpregs;
- u32 *pkru_ptr;
- u64 si_pkey;
- u32 *si_pkey_ptr;
- int pkru_offset;
- fpregset_t fpregset;
-
- dprint_in_signal = 1;
- dprintf1(">>>>===============SIGSEGV============================\n");
- dprintf1("%s()::%d, pkru: 0x%x shadow: %x\n", __func__, __LINE__,
- __rdpkru(), shadow_pkru);
-
- trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
- ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
- fpregset = uctxt->uc_mcontext.fpregs;
- fpregs = (void *)fpregset;
-
- dprintf2("%s() trapno: %d ip: 0x%lx info->si_code: %s/%d\n", __func__,
- trapno, ip, si_code_str(si->si_code), si->si_code);
-#ifdef __i386__
- /*
- * 32-bit has some extra padding so that userspace can tell whether
- * the XSTATE header is present in addition to the "legacy" FPU
- * state. We just assume that it is here.
- */
- fpregs += 0x70;
-#endif
- pkru_offset = pkru_xstate_offset();
- pkru_ptr = (void *)(&fpregs[pkru_offset]);
-
- dprintf1("siginfo: %p\n", si);
- dprintf1(" fpregs: %p\n", fpregs);
- /*
- * If we got a PKRU fault, we *HAVE* to have at least one bit set in
- * here.
- */
- dprintf1("pkru_xstate_offset: %d\n", pkru_xstate_offset());
- if (DEBUG_LEVEL > 4)
- dump_mem(pkru_ptr - 128, 256);
- pkey_assert(*pkru_ptr);
-
- si_pkey_ptr = (u32 *)(((u8 *)si) + si_pkey_offset);
- dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
- dump_mem(si_pkey_ptr - 8, 24);
- si_pkey = *si_pkey_ptr;
- pkey_assert(si_pkey < NR_PKEYS);
- last_si_pkey = si_pkey;
-
- if ((si->si_code == SEGV_MAPERR) ||
- (si->si_code == SEGV_ACCERR) ||
- (si->si_code == SEGV_BNDERR)) {
- printf("non-PK si_code, exiting...\n");
- exit(4);
- }
-
- dprintf1("signal pkru from xsave: %08x\n", *pkru_ptr);
- /* need __rdpkru() version so we do not do shadow_pkru checking */
- dprintf1("signal pkru from pkru: %08x\n", __rdpkru());
- dprintf1("si_pkey from siginfo: %jx\n", si_pkey);
- *(u64 *)pkru_ptr = 0x00000000;
- dprintf1("WARNING: set PRKU=0 to allow faulting instruction to continue\n");
- pkru_faults++;
- dprintf1("<<<<==================================================\n");
- return;
- if (trapno == 14) {
- fprintf(stderr,
- "ERROR: In signal handler, page fault, trapno = %d, ip = %016lx\n",
- trapno, ip);
- fprintf(stderr, "si_addr %p\n", si->si_addr);
- fprintf(stderr, "REG_ERR: %lx\n",
- (unsigned long)uctxt->uc_mcontext.gregs[REG_ERR]);
- exit(1);
- } else {
- fprintf(stderr, "unexpected trap %d! at 0x%lx\n", trapno, ip);
- fprintf(stderr, "si_addr %p\n", si->si_addr);
- fprintf(stderr, "REG_ERR: %lx\n",
- (unsigned long)uctxt->uc_mcontext.gregs[REG_ERR]);
- exit(2);
- }
- dprint_in_signal = 0;
-}
-
-int wait_all_children(void)
-{
- int status;
- return waitpid(-1, &status, 0);
-}
-
-void sig_chld(int x)
-{
- dprint_in_signal = 1;
- dprintf2("[%d] SIGCHLD: %d\n", getpid(), x);
- dprint_in_signal = 0;
-}
-
-void setup_sigsegv_handler(void)
-{
- int r, rs;
- struct sigaction newact;
- struct sigaction oldact;
-
- /* #PF is mapped to sigsegv */
- int signum = SIGSEGV;
-
- newact.sa_handler = 0;
- newact.sa_sigaction = signal_handler;
-
- /*sigset_t - signals to block while in the handler */
- /* get the old signal mask. */
- rs = sigprocmask(SIG_SETMASK, 0, &newact.sa_mask);
- pkey_assert(rs == 0);
-
- /* call sa_sigaction, not sa_handler*/
- newact.sa_flags = SA_SIGINFO;
-
- newact.sa_restorer = 0; /* void(*)(), obsolete */
- r = sigaction(signum, &newact, &oldact);
- r = sigaction(SIGALRM, &newact, &oldact);
- pkey_assert(r == 0);
-}
-
-void setup_handlers(void)
-{
- signal(SIGCHLD, &sig_chld);
- setup_sigsegv_handler();
-}
-
-pid_t fork_lazy_child(void)
-{
- pid_t forkret;
-
- forkret = fork();
- pkey_assert(forkret >= 0);
- dprintf3("[%d] fork() ret: %d\n", getpid(), forkret);
-
- if (!forkret) {
- /* in the child */
- while (1) {
- dprintf1("child sleeping...\n");
- sleep(30);
- }
- }
- return forkret;
-}
-
-void davecmp(void *_a, void *_b, int len)
-{
- int i;
- unsigned long *a = _a;
- unsigned long *b = _b;
-
- for (i = 0; i < len / sizeof(*a); i++) {
- if (a[i] == b[i])
- continue;
-
- dprintf3("[%3d]: a: %016lx b: %016lx\n", i, a[i], b[i]);
- }
-}
-
-void dumpit(char *f)
-{
- int fd = open(f, O_RDONLY);
- char buf[100];
- int nr_read;
-
- dprintf2("maps fd: %d\n", fd);
- do {
- nr_read = read(fd, &buf[0], sizeof(buf));
- write(1, buf, nr_read);
- } while (nr_read > 0);
- close(fd);
-}
-
-#define PKEY_DISABLE_ACCESS 0x1
-#define PKEY_DISABLE_WRITE 0x2
-
-u32 pkey_get(int pkey, unsigned long flags)
-{
- u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
- u32 pkru = __rdpkru();
- u32 shifted_pkru;
- u32 masked_pkru;
-
- dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
- __func__, pkey, flags, 0, 0);
- dprintf2("%s() raw pkru: %x\n", __func__, pkru);
-
- shifted_pkru = (pkru >> (pkey * PKRU_BITS_PER_PKEY));
- dprintf2("%s() shifted_pkru: %x\n", __func__, shifted_pkru);
- masked_pkru = shifted_pkru & mask;
- dprintf2("%s() masked pkru: %x\n", __func__, masked_pkru);
- /*
- * shift down the relevant bits to the lowest two, then
- * mask off all the other high bits.
- */
- return masked_pkru;
-}
-
-int pkey_set(int pkey, unsigned long rights, unsigned long flags)
-{
- u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
- u32 old_pkru = __rdpkru();
- u32 new_pkru;
-
- /* make sure that 'rights' only contains the bits we expect: */
- assert(!(rights & ~mask));
-
- /* copy old pkru */
- new_pkru = old_pkru;
- /* mask out bits from pkey in old value: */
- new_pkru &= ~(mask << (pkey * PKRU_BITS_PER_PKEY));
- /* OR in new bits for pkey: */
- new_pkru |= (rights << (pkey * PKRU_BITS_PER_PKEY));
-
- __wrpkru(new_pkru);
-
- dprintf3("%s(pkey=%d, rights=%lx, flags=%lx) = %x pkru now: %x old_pkru: %x\n",
- __func__, pkey, rights, flags, 0, __rdpkru(), old_pkru);
- return 0;
-}
-
-void pkey_disable_set(int pkey, int flags)
-{
- unsigned long syscall_flags = 0;
- int ret;
- int pkey_rights;
- u32 orig_pkru = rdpkru();
-
- dprintf1("START->%s(%d, 0x%x)\n", __func__,
- pkey, flags);
- pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
-
- pkey_rights = pkey_get(pkey, syscall_flags);
-
- dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
- pkey, pkey, pkey_rights);
- pkey_assert(pkey_rights >= 0);
-
- pkey_rights |= flags;
-
- ret = pkey_set(pkey, pkey_rights, syscall_flags);
- assert(!ret);
- /*pkru and flags have the same format */
- shadow_pkru |= flags << (pkey * 2);
- dprintf1("%s(%d) shadow: 0x%x\n", __func__, pkey, shadow_pkru);
-
- pkey_assert(ret >= 0);
-
- pkey_rights = pkey_get(pkey, syscall_flags);
- dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
- pkey, pkey, pkey_rights);
-
- dprintf1("%s(%d) pkru: 0x%x\n", __func__, pkey, rdpkru());
- if (flags)
- pkey_assert(rdpkru() > orig_pkru);
- dprintf1("END<---%s(%d, 0x%x)\n", __func__,
- pkey, flags);
-}
-
-void pkey_disable_clear(int pkey, int flags)
-{
- unsigned long syscall_flags = 0;
- int ret;
- int pkey_rights = pkey_get(pkey, syscall_flags);
- u32 orig_pkru = rdpkru();
-
- pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
-
- dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
- pkey, pkey, pkey_rights);
- pkey_assert(pkey_rights >= 0);
-
- pkey_rights |= flags;
-
- ret = pkey_set(pkey, pkey_rights, 0);
- /* pkru and flags have the same format */
- shadow_pkru &= ~(flags << (pkey * 2));
- pkey_assert(ret >= 0);
-
- pkey_rights = pkey_get(pkey, syscall_flags);
- dprintf1("%s(%d) pkey_get(%d): %x\n", __func__,
- pkey, pkey, pkey_rights);
-
- dprintf1("%s(%d) pkru: 0x%x\n", __func__, pkey, rdpkru());
- if (flags)
- assert(rdpkru() > orig_pkru);
-}
-
-void pkey_write_allow(int pkey)
-{
- pkey_disable_clear(pkey, PKEY_DISABLE_WRITE);
-}
-void pkey_write_deny(int pkey)
-{
- pkey_disable_set(pkey, PKEY_DISABLE_WRITE);
-}
-void pkey_access_allow(int pkey)
-{
- pkey_disable_clear(pkey, PKEY_DISABLE_ACCESS);
-}
-void pkey_access_deny(int pkey)
-{
- pkey_disable_set(pkey, PKEY_DISABLE_ACCESS);
-}
-
-int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
- unsigned long pkey)
-{
- int sret;
-
- dprintf2("%s(0x%p, %zx, prot=%lx, pkey=%lx)\n", __func__,
- ptr, size, orig_prot, pkey);
-
- errno = 0;
- sret = syscall(SYS_mprotect_key, ptr, size, orig_prot, pkey);
- if (errno) {
- dprintf2("SYS_mprotect_key sret: %d\n", sret);
- dprintf2("SYS_mprotect_key prot: 0x%lx\n", orig_prot);
- dprintf2("SYS_mprotect_key failed, errno: %d\n", errno);
- if (DEBUG_LEVEL >= 2)
- perror("SYS_mprotect_pkey");
- }
- return sret;
-}
-
-int sys_pkey_alloc(unsigned long flags, unsigned long init_val)
-{
- int ret = syscall(SYS_pkey_alloc, flags, init_val);
- dprintf1("%s(flags=%lx, init_val=%lx) syscall ret: %d errno: %d\n",
- __func__, flags, init_val, ret, errno);
- return ret;
-}
-
-int alloc_pkey(void)
-{
- int ret;
- unsigned long init_val = 0x0;
-
- dprintf1("alloc_pkey()::%d, pkru: 0x%x shadow: %x\n",
- __LINE__, __rdpkru(), shadow_pkru);
- ret = sys_pkey_alloc(0, init_val);
- /*
- * pkey_alloc() sets PKRU, so we need to reflect it in
- * shadow_pkru:
- */
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
- if (ret) {
- /* clear both the bits: */
- shadow_pkru &= ~(0x3 << (ret * 2));
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
- /*
- * move the new state in from init_val
- * (remember, we cheated and init_val == pkru format)
- */
- shadow_pkru |= (init_val << (ret * 2));
- }
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
- dprintf1("alloc_pkey()::%d errno: %d\n", __LINE__, errno);
- /* for shadow checking: */
- rdpkru();
- dprintf4("alloc_pkey()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n",
- __LINE__, ret, __rdpkru(), shadow_pkru);
- return ret;
-}
-
-int sys_pkey_free(unsigned long pkey)
-{
- int ret = syscall(SYS_pkey_free, pkey);
- dprintf1("%s(pkey=%ld) syscall ret: %d\n", __func__, pkey, ret);
- return ret;
-}
-
-/*
- * I had a bug where pkey bits could be set by mprotect() but
- * not cleared. This ensures we get lots of random bit sets
- * and clears on the vma and pte pkey bits.
- */
-int alloc_random_pkey(void)
-{
- int max_nr_pkey_allocs;
- int ret;
- int i;
- int alloced_pkeys[NR_PKEYS];
- int nr_alloced = 0;
- int random_index;
- memset(alloced_pkeys, 0, sizeof(alloced_pkeys));
-
- /* allocate every possible key and make a note of which ones we got */
- max_nr_pkey_allocs = NR_PKEYS;
- max_nr_pkey_allocs = 1;
- for (i = 0; i < max_nr_pkey_allocs; i++) {
- int new_pkey = alloc_pkey();
- if (new_pkey < 0)
- break;
- alloced_pkeys[nr_alloced++] = new_pkey;
- }
-
- pkey_assert(nr_alloced > 0);
- /* select a random one out of the allocated ones */
- random_index = rand() % nr_alloced;
- ret = alloced_pkeys[random_index];
- /* now zero it out so we don't free it next */
- alloced_pkeys[random_index] = 0;
-
- /* go through the allocated ones that we did not want and free them */
- for (i = 0; i < nr_alloced; i++) {
- int free_ret;
- if (!alloced_pkeys[i])
- continue;
- free_ret = sys_pkey_free(alloced_pkeys[i]);
- pkey_assert(!free_ret);
- }
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
- return ret;
-}
-
-int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
- unsigned long pkey)
-{
- int nr_iterations = random() % 100;
- int ret;
-
- while (0) {
- int rpkey = alloc_random_pkey();
- ret = sys_mprotect_pkey(ptr, size, orig_prot, pkey);
- dprintf1("sys_mprotect_pkey(%p, %zx, prot=0x%lx, pkey=%ld) ret: %d\n",
- ptr, size, orig_prot, pkey, ret);
- if (nr_iterations-- < 0)
- break;
-
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
- sys_pkey_free(rpkey);
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
- }
- pkey_assert(pkey < NR_PKEYS);
-
- ret = sys_mprotect_pkey(ptr, size, orig_prot, pkey);
- dprintf1("mprotect_pkey(%p, %zx, prot=0x%lx, pkey=%ld) ret: %d\n",
- ptr, size, orig_prot, pkey, ret);
- pkey_assert(!ret);
- dprintf1("%s()::%d, ret: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkru(), shadow_pkru);
- return ret;
-}
-
-struct pkey_malloc_record {
- void *ptr;
- long size;
-};
-struct pkey_malloc_record *pkey_malloc_records;
-long nr_pkey_malloc_records;
-void record_pkey_malloc(void *ptr, long size)
-{
- long i;
- struct pkey_malloc_record *rec = NULL;
-
- for (i = 0; i < nr_pkey_malloc_records; i++) {
- rec = &pkey_malloc_records[i];
- /* find a free record */
- if (rec)
- break;
- }
- if (!rec) {
- /* every record is full */
- size_t old_nr_records = nr_pkey_malloc_records;
- size_t new_nr_records = (nr_pkey_malloc_records * 2 + 1);
- size_t new_size = new_nr_records * sizeof(struct pkey_malloc_record);
- dprintf2("new_nr_records: %zd\n", new_nr_records);
- dprintf2("new_size: %zd\n", new_size);
- pkey_malloc_records = realloc(pkey_malloc_records, new_size);
- pkey_assert(pkey_malloc_records != NULL);
- rec = &pkey_malloc_records[nr_pkey_malloc_records];
- /*
- * realloc() does not initialize memory, so zero it from
- * the first new record all the way to the end.
- */
- for (i = 0; i < new_nr_records - old_nr_records; i++)
- memset(rec + i, 0, sizeof(*rec));
- }
- dprintf3("filling malloc record[%d/%p]: {%p, %ld}\n",
- (int)(rec - pkey_malloc_records), rec, ptr, size);
- rec->ptr = ptr;
- rec->size = size;
- nr_pkey_malloc_records++;
-}
-
-void free_pkey_malloc(void *ptr)
-{
- long i;
- int ret;
- dprintf3("%s(%p)\n", __func__, ptr);
- for (i = 0; i < nr_pkey_malloc_records; i++) {
- struct pkey_malloc_record *rec = &pkey_malloc_records[i];
- dprintf4("looking for ptr %p at record[%ld/%p]: {%p, %ld}\n",
- ptr, i, rec, rec->ptr, rec->size);
- if ((ptr < rec->ptr) ||
- (ptr >= rec->ptr + rec->size))
- continue;
-
- dprintf3("found ptr %p at record[%ld/%p]: {%p, %ld}\n",
- ptr, i, rec, rec->ptr, rec->size);
- nr_pkey_malloc_records--;
- ret = munmap(rec->ptr, rec->size);
- dprintf3("munmap ret: %d\n", ret);
- pkey_assert(!ret);
- dprintf3("clearing rec->ptr, rec: %p\n", rec);
- rec->ptr = NULL;
- dprintf3("done clearing rec->ptr, rec: %p\n", rec);
- return;
- }
- pkey_assert(false);
-}
-
-
-void *malloc_pkey_with_mprotect(long size, int prot, u16 pkey)
-{
- void *ptr;
- int ret;
-
- rdpkru();
- dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
- size, prot, pkey);
- pkey_assert(pkey < NR_PKEYS);
- ptr = mmap(NULL, size, prot, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
- pkey_assert(ptr != (void *)-1);
- ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey);
- pkey_assert(!ret);
- record_pkey_malloc(ptr, size);
- rdpkru();
-
- dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr);
- return ptr;
-}
-
-void *malloc_pkey_anon_huge(long size, int prot, u16 pkey)
-{
- int ret;
- void *ptr;
-
- dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
- size, prot, pkey);
- /*
- * Guarantee we can fit at least one huge page in the resulting
- * allocation by allocating space for 2:
- */
- size = ALIGN_UP(size, HPAGE_SIZE * 2);
- ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
- pkey_assert(ptr != (void *)-1);
- record_pkey_malloc(ptr, size);
- mprotect_pkey(ptr, size, prot, pkey);
-
- dprintf1("unaligned ptr: %p\n", ptr);
- ptr = ALIGN_PTR_UP(ptr, HPAGE_SIZE);
- dprintf1(" aligned ptr: %p\n", ptr);
- ret = madvise(ptr, HPAGE_SIZE, MADV_HUGEPAGE);
- dprintf1("MADV_HUGEPAGE ret: %d\n", ret);
- ret = madvise(ptr, HPAGE_SIZE, MADV_WILLNEED);
- dprintf1("MADV_WILLNEED ret: %d\n", ret);
- memset(ptr, 0, HPAGE_SIZE);
-
- dprintf1("mmap()'d thp for pkey %d @ %p\n", pkey, ptr);
- return ptr;
-}
-
-int hugetlb_setup_ok;
-#define GET_NR_HUGE_PAGES 10
-void setup_hugetlbfs(void)
-{
- int err;
- int fd;
- char buf[] = "123";
-
- if (geteuid() != 0) {
- fprintf(stderr, "WARNING: not run as root, can not do hugetlb test\n");
- return;
- }
-
- cat_into_file(__stringify(GET_NR_HUGE_PAGES), "/proc/sys/vm/nr_hugepages");
-
- /*
- * Now go make sure that we got the pages and that they
- * are 2M pages. Someone might have made 1G the default.
- */
- fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", O_RDONLY);
- if (fd < 0) {
- perror("opening sysfs 2M hugetlb config");
- return;
- }
-
- /* -1 to guarantee leaving the trailing \0 */
- err = read(fd, buf, sizeof(buf)-1);
- close(fd);
- if (err <= 0) {
- perror("reading sysfs 2M hugetlb config");
- return;
- }
-
- if (atoi(buf) != GET_NR_HUGE_PAGES) {
- fprintf(stderr, "could not confirm 2M pages, got: '%s' expected %d\n",
- buf, GET_NR_HUGE_PAGES);
- return;
- }
-
- hugetlb_setup_ok = 1;
-}
-
-void *malloc_pkey_hugetlb(long size, int prot, u16 pkey)
-{
- void *ptr;
- int flags = MAP_ANONYMOUS|MAP_PRIVATE|MAP_HUGETLB;
-
- if (!hugetlb_setup_ok)
- return PTR_ERR_ENOTSUP;
-
- dprintf1("doing %s(%ld, %x, %x)\n", __func__, size, prot, pkey);
- size = ALIGN_UP(size, HPAGE_SIZE * 2);
- pkey_assert(pkey < NR_PKEYS);
- ptr = mmap(NULL, size, PROT_NONE, flags, -1, 0);
- pkey_assert(ptr != (void *)-1);
- mprotect_pkey(ptr, size, prot, pkey);
-
- record_pkey_malloc(ptr, size);
-
- dprintf1("mmap()'d hugetlbfs for pkey %d @ %p\n", pkey, ptr);
- return ptr;
-}
-
-void *malloc_pkey_mmap_dax(long size, int prot, u16 pkey)
-{
- void *ptr;
- int fd;
-
- dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
- size, prot, pkey);
- pkey_assert(pkey < NR_PKEYS);
- fd = open("/dax/foo", O_RDWR);
- pkey_assert(fd >= 0);
-
- ptr = mmap(0, size, prot, MAP_SHARED, fd, 0);
- pkey_assert(ptr != (void *)-1);
-
- mprotect_pkey(ptr, size, prot, pkey);
-
- record_pkey_malloc(ptr, size);
-
- dprintf1("mmap()'d for pkey %d @ %p\n", pkey, ptr);
- close(fd);
- return ptr;
-}
-
-void *(*pkey_malloc[])(long size, int prot, u16 pkey) = {
-
- malloc_pkey_with_mprotect,
- malloc_pkey_anon_huge,
- malloc_pkey_hugetlb
-/* can not do direct with the pkey_mprotect() API:
- malloc_pkey_mmap_direct,
- malloc_pkey_mmap_dax,
-*/
-};
-
-void *malloc_pkey(long size, int prot, u16 pkey)
-{
- void *ret;
- static int malloc_type;
- int nr_malloc_types = ARRAY_SIZE(pkey_malloc);
-
- pkey_assert(pkey < NR_PKEYS);
-
- while (1) {
- pkey_assert(malloc_type < nr_malloc_types);
-
- ret = pkey_malloc[malloc_type](size, prot, pkey);
- pkey_assert(ret != (void *)-1);
-
- malloc_type++;
- if (malloc_type >= nr_malloc_types)
- malloc_type = (random()%nr_malloc_types);
-
- /* try again if the malloc_type we tried is unsupported */
- if (ret == PTR_ERR_ENOTSUP)
- continue;
-
- break;
- }
-
- dprintf3("%s(%ld, prot=%x, pkey=%x) returning: %p\n", __func__,
- size, prot, pkey, ret);
- return ret;
-}
-
-int last_pkru_faults;
-void expected_pk_fault(int pkey)
-{
- dprintf2("%s(): last_pkru_faults: %d pkru_faults: %d\n",
- __func__, last_pkru_faults, pkru_faults);
- dprintf2("%s(%d): last_si_pkey: %d\n", __func__, pkey, last_si_pkey);
- pkey_assert(last_pkru_faults + 1 == pkru_faults);
- pkey_assert(last_si_pkey == pkey);
- /*
- * The signal handler shold have cleared out PKRU to let the
- * test program continue. We now have to restore it.
- */
- if (__rdpkru() != 0)
- pkey_assert(0);
-
- __wrpkru(shadow_pkru);
- dprintf1("%s() set PKRU=%x to restore state after signal nuked it\n",
- __func__, shadow_pkru);
- last_pkru_faults = pkru_faults;
- last_si_pkey = -1;
-}
-
-void do_not_expect_pk_fault(void)
-{
- pkey_assert(last_pkru_faults == pkru_faults);
-}
-
-int test_fds[10] = { -1 };
-int nr_test_fds;
-void __save_test_fd(int fd)
-{
- pkey_assert(fd >= 0);
- pkey_assert(nr_test_fds < ARRAY_SIZE(test_fds));
- test_fds[nr_test_fds] = fd;
- nr_test_fds++;
-}
-
-int get_test_read_fd(void)
-{
- int test_fd = open("/etc/passwd", O_RDONLY);
- __save_test_fd(test_fd);
- return test_fd;
-}
-
-void close_test_fds(void)
-{
- int i;
-
- for (i = 0; i < nr_test_fds; i++) {
- if (test_fds[i] < 0)
- continue;
- close(test_fds[i]);
- test_fds[i] = -1;
- }
- nr_test_fds = 0;
-}
-
-#define barrier() __asm__ __volatile__("": : :"memory")
-__attribute__((noinline)) int read_ptr(int *ptr)
-{
- /*
- * Keep GCC from optimizing this away somehow
- */
- barrier();
- return *ptr;
-}
-
-void test_read_of_write_disabled_region(int *ptr, u16 pkey)
-{
- int ptr_contents;
-
- dprintf1("disabling write access to PKEY[1], doing read\n");
- pkey_write_deny(pkey);
- ptr_contents = read_ptr(ptr);
- dprintf1("*ptr: %d\n", ptr_contents);
- dprintf1("\n");
-}
-void test_read_of_access_disabled_region(int *ptr, u16 pkey)
-{
- int ptr_contents;
-
- dprintf1("disabling access to PKEY[%02d], doing read @ %p\n", pkey, ptr);
- rdpkru();
- pkey_access_deny(pkey);
- ptr_contents = read_ptr(ptr);
- dprintf1("*ptr: %d\n", ptr_contents);
- expected_pk_fault(pkey);
-}
-void test_write_of_write_disabled_region(int *ptr, u16 pkey)
-{
- dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
- pkey_write_deny(pkey);
- *ptr = __LINE__;
- expected_pk_fault(pkey);
-}
-void test_write_of_access_disabled_region(int *ptr, u16 pkey)
-{
- dprintf1("disabling access to PKEY[%02d], doing write\n", pkey);
- pkey_access_deny(pkey);
- *ptr = __LINE__;
- expected_pk_fault(pkey);
-}
-void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey)
-{
- int ret;
- int test_fd = get_test_read_fd();
-
- dprintf1("disabling access to PKEY[%02d], "
- "having kernel read() to buffer\n", pkey);
- pkey_access_deny(pkey);
- ret = read(test_fd, ptr, 1);
- dprintf1("read ret: %d\n", ret);
- pkey_assert(ret);
-}
-void test_kernel_write_of_write_disabled_region(int *ptr, u16 pkey)
-{
- int ret;
- int test_fd = get_test_read_fd();
-
- pkey_write_deny(pkey);
- ret = read(test_fd, ptr, 100);
- dprintf1("read ret: %d\n", ret);
- if (ret < 0 && (DEBUG_LEVEL > 0))
- perror("verbose read result (OK for this to be bad)");
- pkey_assert(ret);
-}
-
-void test_kernel_gup_of_access_disabled_region(int *ptr, u16 pkey)
-{
- int pipe_ret, vmsplice_ret;
- struct iovec iov;
- int pipe_fds[2];
-
- pipe_ret = pipe(pipe_fds);
-
- pkey_assert(pipe_ret == 0);
- dprintf1("disabling access to PKEY[%02d], "
- "having kernel vmsplice from buffer\n", pkey);
- pkey_access_deny(pkey);
- iov.iov_base = ptr;
- iov.iov_len = PAGE_SIZE;
- vmsplice_ret = vmsplice(pipe_fds[1], &iov, 1, SPLICE_F_GIFT);
- dprintf1("vmsplice() ret: %d\n", vmsplice_ret);
- pkey_assert(vmsplice_ret == -1);
-
- close(pipe_fds[0]);
- close(pipe_fds[1]);
-}
-
-void test_kernel_gup_write_to_write_disabled_region(int *ptr, u16 pkey)
-{
- int ignored = 0xdada;
- int futex_ret;
- int some_int = __LINE__;
-
- dprintf1("disabling write to PKEY[%02d], "
- "doing futex gunk in buffer\n", pkey);
- *ptr = some_int;
- pkey_write_deny(pkey);
- futex_ret = syscall(SYS_futex, ptr, FUTEX_WAIT, some_int-1, NULL,
- &ignored, ignored);
- if (DEBUG_LEVEL > 0)
- perror("futex");
- dprintf1("futex() ret: %d\n", futex_ret);
-}
-
-/* Assumes that all pkeys other than 'pkey' are unallocated */
-void test_pkey_syscalls_on_non_allocated_pkey(int *ptr, u16 pkey)
-{
- int err;
- int i;
-
- /* Note: 0 is the default pkey, so don't mess with it */
- for (i = 1; i < NR_PKEYS; i++) {
- if (pkey == i)
- continue;
-
- dprintf1("trying get/set/free to non-allocated pkey: %2d\n", i);
- err = sys_pkey_free(i);
- pkey_assert(err);
-
- err = sys_pkey_free(i);
- pkey_assert(err);
-
- err = sys_mprotect_pkey(ptr, PAGE_SIZE, PROT_READ, i);
- pkey_assert(err);
- }
-}
-
-/* Assumes that all pkeys other than 'pkey' are unallocated */
-void test_pkey_syscalls_bad_args(int *ptr, u16 pkey)
-{
- int err;
- int bad_pkey = NR_PKEYS+99;
-
- /* pass a known-invalid pkey in: */
- err = sys_mprotect_pkey(ptr, PAGE_SIZE, PROT_READ, bad_pkey);
- pkey_assert(err);
-}
-
-/* Assumes that all pkeys other than 'pkey' are unallocated */
-void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
-{
- int err;
- int allocated_pkeys[NR_PKEYS] = {0};
- int nr_allocated_pkeys = 0;
- int i;
-
- for (i = 0; i < NR_PKEYS*2; i++) {
- int new_pkey;
- dprintf1("%s() alloc loop: %d\n", __func__, i);
- new_pkey = alloc_pkey();
- dprintf4("%s()::%d, err: %d pkru: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, err, __rdpkru(), shadow_pkru);
- rdpkru(); /* for shadow checking */
- dprintf2("%s() errno: %d ENOSPC: %d\n", __func__, errno, ENOSPC);
- if ((new_pkey == -1) && (errno == ENOSPC)) {
- dprintf2("%s() failed to allocate pkey after %d tries\n",
- __func__, nr_allocated_pkeys);
- break;
- }
- pkey_assert(nr_allocated_pkeys < NR_PKEYS);
- allocated_pkeys[nr_allocated_pkeys++] = new_pkey;
- }
-
- dprintf3("%s()::%d\n", __func__, __LINE__);
-
- /*
- * ensure it did not reach the end of the loop without
- * failure:
- */
- pkey_assert(i < NR_PKEYS*2);
-
- /*
- * There are 16 pkeys supported in hardware. One is taken
- * up for the default (0) and another can be taken up by
- * an execute-only mapping. Ensure that we can allocate
- * at least 14 (16-2).
- */
- pkey_assert(i >= NR_PKEYS-2);
-
- for (i = 0; i < nr_allocated_pkeys; i++) {
- err = sys_pkey_free(allocated_pkeys[i]);
- pkey_assert(!err);
- rdpkru(); /* for shadow checking */
- }
-}
-
-void test_ptrace_of_child(int *ptr, u16 pkey)
-{
- __attribute__((__unused__)) int peek_result;
- pid_t child_pid;
- void *ignored = 0;
- long ret;
- int status;
- /*
- * This is the "control" for our little expermient. Make sure
- * we can always access it when ptracing.
- */
- int *plain_ptr_unaligned = malloc(HPAGE_SIZE);
- int *plain_ptr = ALIGN_PTR_UP(plain_ptr_unaligned, PAGE_SIZE);
-
- /*
- * Fork a child which is an exact copy of this process, of course.
- * That means we can do all of our tests via ptrace() and then plain
- * memory access and ensure they work differently.
- */
- child_pid = fork_lazy_child();
- dprintf1("[%d] child pid: %d\n", getpid(), child_pid);
-
- ret = ptrace(PTRACE_ATTACH, child_pid, ignored, ignored);
- if (ret)
- perror("attach");
- dprintf1("[%d] attach ret: %ld %d\n", getpid(), ret, __LINE__);
- pkey_assert(ret != -1);
- ret = waitpid(child_pid, &status, WUNTRACED);
- if ((ret != child_pid) || !(WIFSTOPPED(status))) {
- fprintf(stderr, "weird waitpid result %ld stat %x\n",
- ret, status);
- pkey_assert(0);
- }
- dprintf2("waitpid ret: %ld\n", ret);
- dprintf2("waitpid status: %d\n", status);
-
- pkey_access_deny(pkey);
- pkey_write_deny(pkey);
-
- /* Write access, untested for now:
- ret = ptrace(PTRACE_POKEDATA, child_pid, peek_at, data);
- pkey_assert(ret != -1);
- dprintf1("poke at %p: %ld\n", peek_at, ret);
- */
-
- /*
- * Try to access the pkey-protected "ptr" via ptrace:
- */
- ret = ptrace(PTRACE_PEEKDATA, child_pid, ptr, ignored);
- /* expect it to work, without an error: */
- pkey_assert(ret != -1);
- /* Now access from the current task, and expect an exception: */
- peek_result = read_ptr(ptr);
- expected_pk_fault(pkey);
-
- /*
- * Try to access the NON-pkey-protected "plain_ptr" via ptrace:
- */
- ret = ptrace(PTRACE_PEEKDATA, child_pid, plain_ptr, ignored);
- /* expect it to work, without an error: */
- pkey_assert(ret != -1);
- /* Now access from the current task, and expect NO exception: */
- peek_result = read_ptr(plain_ptr);
- do_not_expect_pk_fault();
-
- ret = ptrace(PTRACE_DETACH, child_pid, ignored, 0);
- pkey_assert(ret != -1);
-
- ret = kill(child_pid, SIGKILL);
- pkey_assert(ret != -1);
-
- wait(&status);
-
- free(plain_ptr_unaligned);
-}
-
-void test_executing_on_unreadable_memory(int *ptr, u16 pkey)
-{
- void *p1;
- int scratch;
- int ptr_contents;
- int ret;
-
- p1 = ALIGN_PTR_UP(&lots_o_noops_around_write, PAGE_SIZE);
- dprintf3("&lots_o_noops: %p\n", &lots_o_noops_around_write);
- /* lots_o_noops_around_write should be page-aligned already */
- assert(p1 == &lots_o_noops_around_write);
-
- /* Point 'p1' at the *second* page of the function: */
- p1 += PAGE_SIZE;
-
- madvise(p1, PAGE_SIZE, MADV_DONTNEED);
- lots_o_noops_around_write(&scratch);
- ptr_contents = read_ptr(p1);
- dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
-
- ret = mprotect_pkey(p1, PAGE_SIZE, PROT_EXEC, (u64)pkey);
- pkey_assert(!ret);
- pkey_access_deny(pkey);
-
- dprintf2("pkru: %x\n", rdpkru());
-
- /*
- * Make sure this is an *instruction* fault
- */
- madvise(p1, PAGE_SIZE, MADV_DONTNEED);
- lots_o_noops_around_write(&scratch);
- do_not_expect_pk_fault();
- ptr_contents = read_ptr(p1);
- dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
- expected_pk_fault(pkey);
-}
-
-void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
-{
- int size = PAGE_SIZE;
- int sret;
-
- if (cpu_has_pku()) {
- dprintf1("SKIP: %s: no CPU support\n", __func__);
- return;
- }
-
- sret = syscall(SYS_mprotect_key, ptr, size, PROT_READ, pkey);
- pkey_assert(sret < 0);
-}
-
-void (*pkey_tests[])(int *ptr, u16 pkey) = {
- test_read_of_write_disabled_region,
- test_read_of_access_disabled_region,
- test_write_of_write_disabled_region,
- test_write_of_access_disabled_region,
- test_kernel_write_of_access_disabled_region,
- test_kernel_write_of_write_disabled_region,
- test_kernel_gup_of_access_disabled_region,
- test_kernel_gup_write_to_write_disabled_region,
- test_executing_on_unreadable_memory,
- test_ptrace_of_child,
- test_pkey_syscalls_on_non_allocated_pkey,
- test_pkey_syscalls_bad_args,
- test_pkey_alloc_exhaust,
-};
-
-void run_tests_once(void)
-{
- int *ptr;
- int prot = PROT_READ|PROT_WRITE;
-
- for (test_nr = 0; test_nr < ARRAY_SIZE(pkey_tests); test_nr++) {
- int pkey;
- int orig_pkru_faults = pkru_faults;
-
- dprintf1("======================\n");
- dprintf1("test %d preparing...\n", test_nr);
-
- tracing_on();
- pkey = alloc_random_pkey();
- dprintf1("test %d starting with pkey: %d\n", test_nr, pkey);
- ptr = malloc_pkey(PAGE_SIZE, prot, pkey);
- dprintf1("test %d starting...\n", test_nr);
- pkey_tests[test_nr](ptr, pkey);
- dprintf1("freeing test memory: %p\n", ptr);
- free_pkey_malloc(ptr);
- sys_pkey_free(pkey);
-
- dprintf1("pkru_faults: %d\n", pkru_faults);
- dprintf1("orig_pkru_faults: %d\n", orig_pkru_faults);
-
- tracing_off();
- close_test_fds();
-
- printf("test %2d PASSED (iteration %d)\n", test_nr, iteration_nr);
- dprintf1("======================\n\n");
- }
- iteration_nr++;
-}
-
-void pkey_setup_shadow(void)
-{
- shadow_pkru = __rdpkru();
-}
-
-int main(void)
-{
- int nr_iterations = 22;
-
- setup_handlers();
-
- printf("has pku: %d\n", cpu_has_pku());
-
- if (!cpu_has_pku()) {
- int size = PAGE_SIZE;
- int *ptr;
-
- printf("running PKEY tests for unsupported CPU/OS\n");
-
- ptr = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
- assert(ptr != (void *)-1);
- test_mprotect_pkey_on_unsupported_cpu(ptr, 1);
- exit(0);
- }
-
- pkey_setup_shadow();
- printf("startup pkru: %x\n", rdpkru());
- setup_hugetlbfs();
-
- while (nr_iterations-- > 0)
- run_tests_once();
-
- printf("done (all tests OK)\n");
- return 0;
-}
--
1.7.1
From 1583300115295991621@xxx Mon Nov 06 07:17:35 +0000 2017
pkey_disable_clear() was setting the bits instead of clearing
them. Fixed it.
Also fixed a wrong assertion in that function. When bits are
cleared, the resulting register value is less than the original.
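The intended behaviour can be modelled outside the selftest. The following is only a sketch under assumptions: rights_disable_clear() and the DEMO_PKEY_DISABLE_* constants are hypothetical names used for illustration and are not part of protection_keys.c.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; the selftest takes the real ones from headers. */
#define DEMO_PKEY_DISABLE_ACCESS 0x1u
#define DEMO_PKEY_DISABLE_WRITE  0x2u

/*
 * Clear the given disable flags from a rights value.
 * The buggy variant computed "rights | flags", which sets them instead.
 */
static uint32_t rights_disable_clear(uint32_t rights, uint32_t flags)
{
	uint32_t updated = rights & ~flags;

	/* Clearing bits can never increase the value. */
	assert(updated <= rights);
	return updated;
}
```

Note that the result is strictly smaller than the original only when at least one of the requested flags was set beforehand; otherwise the value is unchanged.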
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 5aba137..384cc9a 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -461,7 +461,7 @@ void pkey_disable_clear(int pkey, int flags)
pkey, pkey, pkey_rights);
pkey_assert(pkey_rights >= 0);
- pkey_rights |= flags;
+ pkey_rights &= ~flags;
ret = pkey_set(pkey, pkey_rights, 0);
/* pkey_reg and flags have the same format */
@@ -475,7 +475,7 @@ void pkey_disable_clear(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%016lx\n", __func__,
pkey, rdpkey_reg());
if (flags)
- assert(rdpkey_reg() > orig_pkey_reg);
+ assert(rdpkey_reg() < orig_pkey_reg);
}
void pkey_write_allow(int pkey)
--
1.7.1
From 1583279823135774746@xxx Mon Nov 06 01:55:03 +0000 2017
Detect a write violation on a page whose associated key is
access-disabled well after the page has been mapped.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 0b7b826..c790bff 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1058,6 +1058,18 @@ void test_write_of_access_disabled_region(int *ptr, u16 pkey)
*ptr = __LINE__;
expected_pkey_fault(pkey);
}
+
+void test_write_of_access_disabled_region_with_page_already_mapped(int *ptr,
+ u16 pkey)
+{
+ *ptr = __LINE__;
+ dprintf1("disabling access to PKEY[%02d] after accessing the page, "
+ "doing write\n", pkey);
+ pkey_access_deny(pkey);
+ *ptr = __LINE__;
+ expected_pkey_fault(pkey);
+}
+
void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey)
{
int ret;
@@ -1342,6 +1354,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
test_write_of_write_disabled_region,
test_write_of_write_disabled_region_with_page_already_mapped,
test_write_of_access_disabled_region,
+ test_write_of_access_disabled_region_with_page_already_mapped,
test_kernel_write_of_access_disabled_region,
test_kernel_write_of_write_disabled_region,
test_kernel_gup_of_access_disabled_region,
--
1.7.1
From 1583310302350409398@xxx Mon Nov 06 09:59:30 +0000 2017
A total of 32 keys are available on POWER7 and above. However,
pkeys 0 and 1 are reserved, so effectively we have 30 pkeys.
On 4K kernels, we do not have 5 bits in the PTE to
represent all the keys; we have only 3 bits, which gives 8 keys.
Two of those keys are reserved, pkey 0 and pkey 1, so effectively
we have 6 pkeys.
This patch keeps track of reserved keys, allocated keys
and keys that are currently free.
It also adds skeletal functions and macros that the
architecture-independent code expects to be available.
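The bookkeeping described above can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel code: the demo_* names are hypothetical, and a plain loop stands in for the kernel's ffz() helper.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits set for every key that must never be handed out. */
static uint32_t demo_initial_mask(int usable_total)
{
	uint32_t mask = ~0u;
	int i;

	/* Keys 0 and 1 stay reserved; keys 2..usable_total-1 become free. */
	for (i = 2; i < usable_total; i++)
		mask &= ~(1u << i);
	return mask;
}

/* Return the lowest free key and mark it allocated, or -1 if none is left. */
static int demo_pkey_alloc(uint32_t *map)
{
	int key;

	if (*map == ~0u)
		return -1;
	for (key = 0; key < 32; key++) {
		if (!(*map & (1u << key))) {
			*map |= 1u << key;
			return key;
		}
	}
	return -1;
}

/* Free a key; reserved or unallocated keys are rejected with -1. */
static int demo_pkey_free(uint32_t *map, uint32_t reserved, int key)
{
	if (key < 0 || key >= 32)
		return -1;
	if ((reserved & (1u << key)) || !(*map & (1u << key)))
		return -1;
	*map &= ~(1u << key);
	return 0;
}
```

With usable_total set to 8 (the 4K case), the model hands out keys 2 through 7 and then fails, matching the count of 6 usable pkeys given above. With usable_total set to 32, only bits 0 and 1 remain set in the initial mask.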
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/book3s/64/mmu.h | 9 +++
arch/powerpc/include/asm/mmu_context.h | 1 +
arch/powerpc/include/asm/pkeys.h | 95 ++++++++++++++++++++++++++++-
arch/powerpc/mm/mmu_context_book3s64.c | 2 +
arch/powerpc/mm/pkeys.c | 33 ++++++++++
5 files changed, 136 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 37fdede..df17fbc 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -108,6 +108,15 @@ struct patb_entry {
#ifdef CONFIG_SPAPR_TCE_IOMMU
struct list_head iommu_group_mem_list;
#endif
+
+#ifdef CONFIG_PPC_MEM_KEYS
+ /*
+ * Each bit represents one protection key.
+ * bit set -> key allocated
+ * bit unset -> key available for allocation
+ */
+ u32 pkey_allocation_map;
+#endif
} mm_context_t;
/*
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 2c24447..6d7c4f1 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -145,6 +145,7 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
#ifndef CONFIG_PPC_MEM_KEYS
#define pkey_initialize()
+#define pkey_mm_init(mm)
#endif /* CONFIG_PPC_MEM_KEYS */
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index a54cb39..e5deac7 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -15,21 +15,101 @@
#include <linux/jump_label.h>
DECLARE_STATIC_KEY_TRUE(pkey_disabled);
-#define ARCH_VM_PKEY_FLAGS 0
+extern int pkeys_total; /* total pkeys as per device tree */
+extern u32 initial_allocation_mask; /* bits set for reserved keys */
+
+/*
+ * powerpc needs VM_PKEY_BIT* bit to enable pkey system.
+ * Without them, at least compilation needs to succeed.
+ */
+#ifndef VM_PKEY_BIT0
+#define VM_PKEY_SHIFT 0
+#define VM_PKEY_BIT0 0
+#define VM_PKEY_BIT1 0
+#define VM_PKEY_BIT2 0
+#define VM_PKEY_BIT3 0
+#endif
+
+/*
+ * powerpc needs an additional vma bit to support 32 keys. Till the additional
+ * vma bit lands in include/linux/mm.h we can only support 16 keys.
+ */
+#ifndef VM_PKEY_BIT4
+#define VM_PKEY_BIT4 0
+#endif
+
+#define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
+ VM_PKEY_BIT3 | VM_PKEY_BIT4)
+
+#define arch_max_pkey() pkeys_total
+
+#define pkey_alloc_mask(pkey) (0x1 << (pkey))
+
+#define mm_pkey_allocation_map(mm) (mm->context.pkey_allocation_map)
+
+#define __mm_pkey_allocated(mm, pkey) { \
+ mm_pkey_allocation_map(mm) |= pkey_alloc_mask(pkey); \
+}
+
+#define __mm_pkey_free(mm, pkey) { \
+ mm_pkey_allocation_map(mm) &= ~pkey_alloc_mask(pkey); \
+}
+
+#define __mm_pkey_is_allocated(mm, pkey) \
+ (mm_pkey_allocation_map(mm) & pkey_alloc_mask(pkey))
+
+#define __mm_pkey_is_reserved(pkey) (initial_allocation_mask & \
+ pkey_alloc_mask(pkey))
static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
{
- return false;
+ /* A reserved key is never considered as 'explicitly allocated' */
+ return ((pkey < arch_max_pkey()) &&
+ !__mm_pkey_is_reserved(pkey) &&
+ __mm_pkey_is_allocated(mm, pkey));
}
+/*
+ * Returns a positive, 5-bit key on success, or -1 on failure.
+ * Relies on the mmap_sem to protect against concurrency in mm_pkey_alloc() and
+ * mm_pkey_free().
+ */
static inline int mm_pkey_alloc(struct mm_struct *mm)
{
- return -1;
+ /*
+ * Note: this is the one and only place we make sure that the pkey is
+ * valid as far as the hardware is concerned. The rest of the kernel
+ * trusts that only good, valid pkeys come out of here.
+ */
+ u32 all_pkeys_mask = (u32)(~(0x0));
+ int ret;
+
+ if (static_branch_likely(&pkey_disabled))
+ return -1;
+
+ /*
+ * Are we out of pkeys? We must handle this specially because ffz()
+ * behavior is undefined if there are no zeros.
+ */
+ if (mm_pkey_allocation_map(mm) == all_pkeys_mask)
+ return -1;
+
+ ret = ffz((u32)mm_pkey_allocation_map(mm));
+ __mm_pkey_allocated(mm, ret);
+ return ret;
}
static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
{
- return -EINVAL;
+ if (static_branch_likely(&pkey_disabled))
+ return -1;
+
+ if (!mm_pkey_is_allocated(mm, pkey))
+ return -EINVAL;
+
+ __mm_pkey_free(mm, pkey);
+
+ return 0;
}
/*
@@ -53,5 +133,12 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
return 0;
}
+static inline void pkey_mm_init(struct mm_struct *mm)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return;
+ mm_pkey_allocation_map(mm) = initial_allocation_mask;
+}
+
extern void pkey_initialize(void);
#endif /*_ASM_POWERPC_KEYS_H */
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 05e1538..5df223a 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -16,6 +16,7 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/mm.h>
+#include <linux/pkeys.h>
#include <linux/spinlock.h>
#include <linux/idr.h>
#include <linux/export.h>
@@ -118,6 +119,7 @@ static int hash__init_new_context(struct mm_struct *mm)
subpage_prot_init_new_context(mm);
+ pkey_mm_init(mm);
return index;
}
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index c97a7a0..512bdf2 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -13,18 +13,51 @@
DEFINE_STATIC_KEY_TRUE(pkey_disabled);
bool pkey_execute_disable_supported;
+int pkeys_total; /* Total pkeys as per device tree */
+u32 initial_allocation_mask; /* Bits set for reserved keys */
void __init pkey_initialize(void)
{
+ int os_reserved, i;
+
/*
* Disable the pkey system till everything is in place. A subsequent
* patch will enable it.
*/
static_branch_enable(&pkey_disabled);
+ /* Let's assume 32 keys */
+ pkeys_total = 32;
+
+ /*
+ * Adjust the upper limit, based on the number of bits supported by
+ * arch-neutral code.
+ */
+ pkeys_total = min_t(int, pkeys_total,
+ ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1));
+
/*
* Disable execute_disable support for now. A subsequent patch will
* enable it.
*/
pkey_execute_disable_supported = false;
+
+#ifdef CONFIG_PPC_4K_PAGES
+ /*
+ * The OS can manage only 8 pkeys due to its inability to represent them
+ * in the Linux 4K PTE.
+ */
+ os_reserved = pkeys_total - 8;
+#else
+ os_reserved = 0;
+#endif
+ /*
+ * Bits are in LE format. NOTE: keys 0 and 1 are reserved.
+ * key 0 is the default key, which allows read/write/execute.
+ * key 1 is recommended not to be used. PowerISA(3.0) page 1015,
+ * programming note.
+ */
+ initial_allocation_mask = ~0x0;
+ for (i = 2; i < (pkeys_total - os_reserved); i++)
+ initial_allocation_mask &= ~(0x1 << i);
}
--
1.7.1
From 1583346967196310157@xxx Mon Nov 06 19:42:16 +0000 2017
Basic plumbing to initialize the pkey system.
Nothing is enabled yet. A later patch will enable it
once all the infrastructure is in place.
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/Kconfig | 15 ++++++++
arch/powerpc/include/asm/mmu_context.h | 5 +++
arch/powerpc/include/asm/pkeys.h | 57 ++++++++++++++++++++++++++++++++
arch/powerpc/mm/Makefile | 1 +
arch/powerpc/mm/hash_utils_64.c | 4 ++
arch/powerpc/mm/pkeys.c | 30 +++++++++++++++++
6 files changed, 112 insertions(+), 0 deletions(-)
create mode 100644 arch/powerpc/include/asm/pkeys.h
create mode 100644 arch/powerpc/mm/pkeys.c
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cb782ac..9fd389b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -865,6 +865,21 @@ config SECCOMP
If unsure, say Y. Only embedded should say N here.
+config PPC_MEM_KEYS
+ prompt "PowerPC Memory Protection Keys"
+ def_bool y
+ depends on PPC_BOOK3S_64
+ select ARCH_USES_HIGH_VMA_FLAGS
+ select ARCH_HAS_PKEYS
+ help
+ Memory Protection Keys provides a mechanism for enforcing
+ page-based protections, but without requiring modification of the
+ page tables when an application changes protection domains.
+
+ For details, see Documentation/vm/protection-keys.txt
+
+ If unsure, say y.
+
endmenu
config ISA_DMA_API
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 492d814..2c24447 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -142,5 +142,10 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
/* by default, allow everything */
return true;
}
+
+#ifndef CONFIG_PPC_MEM_KEYS
+#define pkey_initialize()
+#endif /* CONFIG_PPC_MEM_KEYS */
+
#endif /* __KERNEL__ */
#endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
new file mode 100644
index 0000000..a54cb39
--- /dev/null
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -0,0 +1,57 @@
+/*
+ * PowerPC Memory Protection Keys management
+ * Copyright (c) 2017, IBM Corporation.
+ * Author: Ram Pai <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _ASM_POWERPC_KEYS_H
+#define _ASM_POWERPC_KEYS_H
+
+#include <linux/jump_label.h>
+
+DECLARE_STATIC_KEY_TRUE(pkey_disabled);
+#define ARCH_VM_PKEY_FLAGS 0
+
+static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
+{
+ return false;
+}
+
+static inline int mm_pkey_alloc(struct mm_struct *mm)
+{
+ return -1;
+}
+
+static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
+{
+ return -EINVAL;
+}
+
+/*
+ * Try to dedicate one of the protection keys to be used as an
+ * execute-only protection key.
+ */
+static inline int execute_only_pkey(struct mm_struct *mm)
+{
+ return 0;
+}
+
+static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
+ int prot, int pkey)
+{
+ return 0;
+}
+
+static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+ unsigned long init_val)
+{
+ return 0;
+}
+
+extern void pkey_initialize(void);
+#endif /*_ASM_POWERPC_KEYS_H */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index a0c327d..823b03d 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o
obj-$(CONFIG_SPAPR_TCE_IOMMU) += mmu_context_iommu.o
obj-$(CONFIG_PPC_PTDUMP) += dump_linuxpagetables.o
obj-$(CONFIG_PPC_HTDUMP) += dump_hashpagetable.o
+obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 578d5a3..1e74590 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -35,6 +35,7 @@
#include <linux/memblock.h>
#include <linux/context_tracking.h>
#include <linux/libfdt.h>
+#include <linux/pkeys.h>
#include <asm/debugfs.h>
#include <asm/processor.h>
@@ -1050,6 +1051,9 @@ void __init hash__early_init_mmu(void)
pr_info("Initializing hash mmu with SLB\n");
/* Initialize SLB management */
slb_initialize();
+
+ /* initialize the key subsystem */
+ pkey_initialize();
}
#ifdef CONFIG_SMP
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
new file mode 100644
index 0000000..c97a7a0
--- /dev/null
+++ b/arch/powerpc/mm/pkeys.c
@@ -0,0 +1,30 @@
+/*
+ * PowerPC Memory Protection Keys management
+ * Copyright (c) 2017, IBM Corporation.
+ * Author: Ram Pai <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/pkeys.h>
+
+DEFINE_STATIC_KEY_TRUE(pkey_disabled);
+bool pkey_execute_disable_supported;
+
+void __init pkey_initialize(void)
+{
+ /*
+ * Disable the pkey system till everything is in place. A subsequent
+ * patch will enable it.
+ */
+ static_branch_enable(&pkey_disabled);
+
+ /*
+ * Disable execute_disable support for now. A subsequent patch will
+ * enable it.
+ */
+ pkey_execute_disable_supported = false;
+}
--
1.7.1
From 1583203878920771763@xxx Sun Nov 05 05:47:56 +0000 2017
Currently the architecture-specific code is expected to
display the protection keys in smaps for a given vma.
This can lead to redundant code and possibly to divergent
formats in which the key gets displayed.
This patch changes the implementation. It displays the
pkey only if the architecture supports pkeys.
The x86 arch_show_smap() function is not needed anymore.
Delete it.
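For reference, a userspace consumer could pick up the resulting line as sketched below. This is only an illustration under assumptions: the demo_* helpers are hypothetical, and the only detail taken from this patch is the "ProtectionKey: %8u" output format.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one smaps line. Returns 1 and stores the key in *pkey when the
 * line is a "ProtectionKey:" entry, 0 otherwise.
 */
static int demo_parse_pkey_line(const char *line, unsigned int *pkey)
{
	assert(line && pkey);
	return sscanf(line, "ProtectionKey: %u", pkey) == 1;
}

/* Count the ProtectionKey entries in an smaps-style file; -1 on open failure. */
static int demo_count_pkey_lines(const char *path)
{
	char line[256];
	unsigned int pkey;
	int count = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		count += demo_parse_pkey_line(line, &pkey);
	fclose(f);
	return count;
}
```

Calling demo_count_pkey_lines("/proc/self/smaps") on a kernel where arch_pkeys_enabled() is false would be expected to report zero entries, since the line is then not emitted at all.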
Signed-off-by: Ram Pai <[email protected]>
---
arch/x86/kernel/setup.c | 8 --------
fs/proc/task_mmu.c | 11 ++++++-----
2 files changed, 6 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0957dd7..b8b8d0e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1357,11 +1357,3 @@ static int __init register_kernel_offset_dumper(void)
return 0;
}
__initcall(register_kernel_offset_dumper);
-
-void arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
-{
- if (!boot_cpu_has(X86_FEATURE_OSPKE))
- return;
-
- seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
-}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fad19a0..5ce3ec0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -18,6 +18,7 @@
#include <linux/page_idle.h>
#include <linux/shmem_fs.h>
#include <linux/uaccess.h>
+#include <linux/pkeys.h>
#include <asm/elf.h>
#include <asm/tlb.h>
@@ -731,10 +732,6 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
}
#endif /* HUGETLB_PAGE */
-void __weak arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
-{
-}
-
static int show_smap(struct seq_file *m, void *v, int is_pid)
{
struct proc_maps_private *priv = m->private;
@@ -854,9 +851,13 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
(unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
if (!rollup_mode) {
- arch_show_smap(m, vma);
+#ifdef CONFIG_ARCH_HAS_PKEYS
+ if (arch_pkeys_enabled())
+ seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
+#endif
show_smap_vma_flags(m, vma);
}
+
m_cache_vma(m, vma);
return ret;
}
--
1.7.1
From 1583245100946443742@xxx Sun Nov 05 16:43:09 +0000 2017
expected_pkey_fault() compares the contents of the pkey
register with 0. This may not hold all the time: the
architecture could set bits by default that can never be
changed. Hence compare the value against the shadow pkey
register, which is supposed to track the bits accurately
throughout.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 19ae991..2600f7a 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -926,10 +926,10 @@ void expected_pkey_fault(int pkey)
pkey_assert(last_pkey_faults + 1 == pkey_faults);
pkey_assert(last_si_pkey == pkey);
/*
- * The signal handler shold have cleared out PKEY register to let the
+ * The signal handler should have cleared out pkey-register to let the
* test program continue. We now have to restore it.
*/
- if (__rdpkey_reg() != 0)
+ if (__rdpkey_reg() != shadow_pkey_reg)
pkey_assert(0);
__wrpkey_reg(shadow_pkey_reg);
--
1.7.1
From 1584619336918074391@xxx Mon Nov 20 20:46:02 +0000 2017
From: Thiago Jung Bauermann <[email protected]>
This test verifies that the AMR is being written to a
process' core file.
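The AMR value the test expects to find is built from two-bit fields, one per key, with key 0 in the most significant bits (see the pkeyshift() macro in the test). A standalone sketch of that layout follows; the demo_* names are hypothetical and are not used by the test itself.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_AMR_BITS_PER_PKEY 2
#define DEMO_PKEY_REG_BITS     64

/* Shift of the two-bit field for 'pkey'; key 0 occupies the top two bits. */
static unsigned int demo_pkeyshift(unsigned int pkey)
{
	assert(pkey < DEMO_PKEY_REG_BITS / DEMO_AMR_BITS_PER_PKEY);
	return DEMO_PKEY_REG_BITS - (pkey + 1) * DEMO_AMR_BITS_PER_PKEY;
}

/* Return 'amr' with the two-bit field for 'pkey' replaced by 'bits'. */
static uint64_t demo_amr_set(uint64_t amr, unsigned int pkey, uint64_t bits)
{
	unsigned int shift = demo_pkeyshift(pkey);

	amr &= ~((uint64_t)3 << shift);
	return amr | ((bits & 3) << shift);
}
```

The test writes the value 3 into the field of pkey1 and 2 into the field of pkey2, then checks that the same AMR value shows up in the NT_PPC_PKEY note of the core file.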
Signed-off-by: Thiago Jung Bauermann <[email protected]>
---
tools/testing/selftests/powerpc/ptrace/Makefile | 2 +-
tools/testing/selftests/powerpc/ptrace/core-pkey.c | 438 ++++++++++++++++++++
2 files changed, 439 insertions(+), 1 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/ptrace/core-pkey.c
diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile b/tools/testing/selftests/powerpc/ptrace/Makefile
index fd896b2..ca25fda 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
TEST_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \
ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx ptrace-tm-vsx \
- ptrace-tm-spd-vsx ptrace-tm-spr ptrace-pkey
+ ptrace-tm-spd-vsx ptrace-tm-spr ptrace-pkey core-pkey
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
new file mode 100644
index 0000000..2328f8c
--- /dev/null
+++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
@@ -0,0 +1,438 @@
+/*
+ * Ptrace test for Memory Protection Key registers
+ *
+ * Copyright (C) 2015 Anshuman Khandual, IBM Corporation.
+ * Copyright (C) 2017 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include <limits.h>
+#include <semaphore.h>
+#include <linux/kernel.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include "ptrace.h"
+
+#ifndef __NR_pkey_alloc
+#define __NR_pkey_alloc 384
+#endif
+
+#ifndef __NR_pkey_free
+#define __NR_pkey_free 385
+#endif
+
+#ifndef NT_PPC_PKEY
+#define NT_PPC_PKEY 0x110
+#endif
+
+#ifndef PKEY_DISABLE_EXECUTE
+#define PKEY_DISABLE_EXECUTE 0x4
+#endif
+
+#define AMR_BITS_PER_PKEY 2
+#define PKEY_REG_BITS (sizeof(u64) * 8)
+#define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey + 1) * AMR_BITS_PER_PKEY))
+
+#define CORE_FILE_LIMIT (5 * 1024 * 1024) /* 5 MB should be enough */
+
+static const char core_pattern_file[] = "/proc/sys/kernel/core_pattern";
+
+static const char user_write[] = "[User Write (Running)]";
+static const char core_read_running[] = "[Core Read (Running)]";
+
+/* Information shared between the parent and the child. */
+struct shared_info {
+ /* AMR value the parent expects to read in the core file. */
+ unsigned long amr;
+
+ /* IAMR value the parent expects to read from the child. */
+ unsigned long iamr;
+
+ /* UAMOR value the parent expects to read from the child. */
+ unsigned long uamor;
+
+ /* When the child crashed. */
+ time_t core_time;
+};
+
+static int sys_pkey_alloc(unsigned long flags, unsigned long init_access_rights)
+{
+ return syscall(__NR_pkey_alloc, flags, init_access_rights);
+}
+
+static int sys_pkey_free(int pkey)
+{
+ return syscall(__NR_pkey_free, pkey);
+}
+
+static int increase_core_file_limit(void)
+{
+ struct rlimit rlim;
+ int ret;
+
+ ret = getrlimit(RLIMIT_CORE, &rlim);
+ FAIL_IF(ret);
+
+ if (rlim.rlim_cur != RLIM_INFINITY && rlim.rlim_cur < CORE_FILE_LIMIT) {
+ rlim.rlim_cur = CORE_FILE_LIMIT;
+
+ if (rlim.rlim_max != RLIM_INFINITY &&
+ rlim.rlim_max < CORE_FILE_LIMIT)
+ rlim.rlim_max = CORE_FILE_LIMIT;
+
+ ret = setrlimit(RLIMIT_CORE, &rlim);
+ FAIL_IF(ret);
+ }
+
+ ret = getrlimit(RLIMIT_FSIZE, &rlim);
+ FAIL_IF(ret);
+
+ if (rlim.rlim_cur != RLIM_INFINITY && rlim.rlim_cur < CORE_FILE_LIMIT) {
+ rlim.rlim_cur = CORE_FILE_LIMIT;
+
+ if (rlim.rlim_max != RLIM_INFINITY &&
+ rlim.rlim_max < CORE_FILE_LIMIT)
+ rlim.rlim_max = CORE_FILE_LIMIT;
+
+ ret = setrlimit(RLIMIT_FSIZE, &rlim);
+ FAIL_IF(ret);
+ }
+
+ return TEST_PASS;
+}
+
+static int child(struct shared_info *info)
+{
+ bool disable_execute = true;
+ int pkey1, pkey2, pkey3;
+ int *ptr, ret;
+
+ ret = increase_core_file_limit();
+ FAIL_IF(ret);
+
+ /* Get some pkeys so that we can change their bits in the AMR. */
+ pkey1 = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
+ if (pkey1 < 0) {
+ pkey1 = sys_pkey_alloc(0, 0);
+ FAIL_IF(pkey1 < 0);
+
+ disable_execute = false;
+ }
+
+ pkey2 = sys_pkey_alloc(0, 0);
+ FAIL_IF(pkey2 < 0);
+
+ pkey3 = sys_pkey_alloc(0, 0);
+ FAIL_IF(pkey3 < 0);
+
+ info->amr = 3ul << pkeyshift(pkey1) | 2ul << pkeyshift(pkey2);
+
+ if (disable_execute)
+ info->iamr = 1ul << pkeyshift(pkey1);
+ else
+ info->iamr = 0;
+
+ info->uamor = 3ul << pkeyshift(pkey1) | 3ul << pkeyshift(pkey2);
+
+ printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
+ user_write, info->amr, pkey1, pkey2, pkey3);
+
+ mtspr(SPRN_AMR, info->amr);
+
+ /*
+ * We won't use pkey3. This tests whether the kernel restores the UAMOR
+ * permissions after a key is freed.
+ */
+ sys_pkey_free(pkey3);
+
+ info->core_time = time(NULL);
+
+ /* Crash. */
+ ptr = 0;
+ *ptr = 1;
+
+ /* Shouldn't get here. */
+ FAIL_IF(true);
+
+ return TEST_FAIL;
+}
+
+/* Return file size if filename exists and passes sanity check, or TEST_FAIL if not. */
+static off_t try_core_file(const char *filename, struct shared_info *info,
+ pid_t pid)
+{
+ struct stat buf;
+ int ret;
+
+ ret = stat(filename, &buf);
+ if (ret == -1)
+ return TEST_FAIL;
+
+ /* Make sure we're not using a stale core file. */
+ return buf.st_mtime >= info->core_time ? buf.st_size : TEST_FAIL;
+}
+
+static Elf64_Nhdr *next_note(Elf64_Nhdr *nhdr)
+{
+ return (void *) nhdr + sizeof(*nhdr) +
+ __ALIGN_KERNEL(nhdr->n_namesz, 4) +
+ __ALIGN_KERNEL(nhdr->n_descsz, 4);
+}
+
+static int check_core_file(struct shared_info *info, Elf64_Ehdr *ehdr,
+ off_t core_size)
+{
+ unsigned long *regs;
+ Elf64_Phdr *phdr;
+ Elf64_Nhdr *nhdr;
+ size_t phdr_size;
+ void *p = ehdr, *note;
+ int ret;
+
+ ret = memcmp(ehdr->e_ident, ELFMAG, SELFMAG);
+ FAIL_IF(ret);
+
+ FAIL_IF(ehdr->e_type != ET_CORE);
+ FAIL_IF(ehdr->e_machine != EM_PPC64);
+ FAIL_IF(ehdr->e_phoff == 0 || ehdr->e_phnum == 0);
+
+ /*
+ * e_phnum is at most 65535 so calculating the size of the
+ * program header cannot overflow.
+ */
+ phdr_size = sizeof(*phdr) * ehdr->e_phnum;
+
+ /* Sanity check the program header table location. */
+ FAIL_IF(ehdr->e_phoff + phdr_size < ehdr->e_phoff);
+ FAIL_IF(ehdr->e_phoff + phdr_size > core_size);
+
+ /* Find the PT_NOTE segment. */
+ for (phdr = p + ehdr->e_phoff;
+ (void *) phdr < p + ehdr->e_phoff + phdr_size;
+ phdr += ehdr->e_phentsize)
+ if (phdr->p_type == PT_NOTE)
+ break;
+
+ FAIL_IF((void *) phdr >= p + ehdr->e_phoff + phdr_size);
+
+ /* Find the NT_PPC_PKEY note. */
+ for (nhdr = p + phdr->p_offset;
+ (void *) nhdr < p + phdr->p_offset + phdr->p_filesz;
+ nhdr = next_note(nhdr))
+ if (nhdr->n_type == NT_PPC_PKEY)
+ break;
+
+ FAIL_IF((void *) nhdr >= p + phdr->p_offset + phdr->p_filesz);
+ FAIL_IF(nhdr->n_descsz == 0);
+
+ p = nhdr;
+ note = p + sizeof(*nhdr) + __ALIGN_KERNEL(nhdr->n_namesz, 4);
+
+ regs = (unsigned long *) note;
+
+ printf("%-30s AMR: %016lx IAMR: %016lx UAMOR: %016lx\n",
+ core_read_running, regs[0], regs[1], regs[2]);
+
+ FAIL_IF(regs[0] != info->amr);
+ FAIL_IF(regs[1] != info->iamr);
+ FAIL_IF(regs[2] != info->uamor);
+
+ return TEST_PASS;
+}
+
+static int parent(struct shared_info *info, pid_t pid)
+{
+ char *filenames, *filename[3];
+ int fd, i, ret, status;
+ off_t core_size;
+ void *core;
+
+ ret = wait(&status);
+ if (ret != pid) {
+ printf("Child's exit status not captured\n");
+ return TEST_FAIL;
+ } else if (!WIFSIGNALED(status) || !WCOREDUMP(status)) {
+ printf("Child didn't dump core\n");
+ return TEST_FAIL;
+ }
+
+ /* Construct array of core file names to try. */
+
+ filename[0] = filenames = malloc(PATH_MAX);
+ if (!filenames) {
+ perror("Error allocating memory");
+ return TEST_FAIL;
+ }
+
+ ret = snprintf(filename[0], PATH_MAX, "core-pkey.%d", pid);
+ if (ret < 0 || ret >= PATH_MAX) {
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ filename[1] = filename[0] + ret + 1;
+ ret = snprintf(filename[1], PATH_MAX - ret - 1, "core.%d", pid);
+ if (ret < 0 || ret >= PATH_MAX - (filename[1] - filename[0])) {
+ ret = TEST_FAIL;
+ goto out;
+ }
+ filename[2] = "core";
+
+ for (i = 0; i < 3; i++) {
+ core_size = try_core_file(filename[i], info, pid);
+ if (core_size != TEST_FAIL)
+ break;
+ }
+
+ if (i == 3) {
+ printf("Couldn't find core file\n");
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ fd = open(filename[i], O_RDONLY);
+ if (fd == -1) {
+ perror("Error opening core file");
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ core = mmap(NULL, core_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ if (core == (void *) -1) {
+ perror("Error mmaping core file");
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ ret = check_core_file(info, core, core_size);
+
+ munmap(core, core_size);
+ close(fd);
+ unlink(filename[i]);
+
+ out:
+ free(filenames);
+
+ return ret;
+}
+
+static int write_core_pattern(const char *core_pattern)
+{
+ size_t len = strlen(core_pattern), ret;
+ FILE *f;
+
+ f = fopen(core_pattern_file, "w");
+ if (!f) {
+ perror("Error writing to core_pattern file");
+ return TEST_FAIL;
+ }
+
+ ret = fwrite(core_pattern, 1, len, f);
+ fclose(f);
+ if (ret != len) {
+ perror("Error writing to core_pattern file");
+ return TEST_FAIL;
+ }
+
+ return TEST_PASS;
+}
+
+static int setup_core_pattern(char **core_pattern_, bool *changed_)
+{
+ FILE *f;
+ char *core_pattern;
+ int ret;
+
+ core_pattern = malloc(PATH_MAX);
+ if (!core_pattern) {
+ perror("Error allocating memory");
+ return TEST_FAIL;
+ }
+
+ f = fopen(core_pattern_file, "r");
+ if (!f) {
+ perror("Error opening core_pattern file");
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ ret = fread(core_pattern, 1, PATH_MAX, f);
+ fclose(f);
+ if (!ret) {
+ perror("Error reading core_pattern file");
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ /* Check whether we can predict the name of the core file. */
+ if (!strcmp(core_pattern, "core") || !strcmp(core_pattern, "core.%p"))
+ *changed_ = false;
+ else {
+ ret = write_core_pattern("core-pkey.%p");
+ if (ret)
+ goto out;
+
+ *changed_ = true;
+ }
+
+ *core_pattern_ = core_pattern;
+ ret = TEST_PASS;
+
+ out:
+ if (ret)
+ free(core_pattern);
+
+ return ret;
+}
+
+static int core_pkey(void)
+{
+ char *core_pattern;
+ bool changed_core_pattern;
+ struct shared_info *info;
+ int shm_id;
+ int ret;
+ pid_t pid;
+
+ ret = setup_core_pattern(&core_pattern, &changed_core_pattern);
+ if (ret)
+ return ret;
+
+ shm_id = shmget(IPC_PRIVATE, sizeof(*info), 0777 | IPC_CREAT);
+ info = shmat(shm_id, NULL, 0);
+
+ pid = fork();
+ if (pid < 0) {
+ perror("fork() failed");
+ ret = TEST_FAIL;
+ } else if (pid == 0)
+ ret = child(info);
+ else
+ ret = parent(info, pid);
+
+ shmdt(info);
+
+ if (pid) {
+ shmctl(shm_id, IPC_RMID, NULL);
+
+ if (changed_core_pattern)
+ write_core_pattern(core_pattern);
+ }
+
+ free(core_pattern);
+
+ return ret;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(core_pkey, "core_pkey");
+}
--
1.7.1
From 1583204000372725963@xxx Sun Nov 05 05:49:52 +0000 2017
Since PowerPC and Intel both support memory protection keys, move
the documentation to an arch-neutral directory.
Signed-off-by: Ram Pai <[email protected]>
---
Documentation/vm/protection-keys.txt | 85 +++++++++++++++++++++++++++++++++
Documentation/x86/protection-keys.txt | 85 ---------------------------------
2 files changed, 85 insertions(+), 85 deletions(-)
create mode 100644 Documentation/vm/protection-keys.txt
delete mode 100644 Documentation/x86/protection-keys.txt
diff --git a/Documentation/vm/protection-keys.txt b/Documentation/vm/protection-keys.txt
new file mode 100644
index 0000000..fa46dcb
--- /dev/null
+++ b/Documentation/vm/protection-keys.txt
@@ -0,0 +1,85 @@
+Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
+which will be found on future Intel CPUs.
+
+Memory Protection Keys provides a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables
+when an application changes protection domains. It works by
+dedicating 4 previously ignored bits in each page table entry to a
+"protection key", giving 16 possible keys.
+
+There is also a new user-accessible register (PKRU) with two separate
+bits (Access Disable and Write Disable) for each key. Being a CPU
+register, PKRU is inherently thread-local, potentially giving each
+thread a different set of protections from every other thread.
+
+There are two new instructions (RDPKRU/WRPKRU) for reading and writing
+to the new register. The feature is only available in 64-bit mode,
+even though there is theoretically space in the PAE PTEs. These
+permissions are enforced on data access only and have no effect on
+instruction fetches.
+
+=========================== Syscalls ===========================
+
+There are 3 system calls which directly interact with pkeys:
+
+ int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
+ int pkey_free(int pkey);
+ int pkey_mprotect(unsigned long start, size_t len,
+ unsigned long prot, int pkey);
+
+Before a pkey can be used, it must first be allocated with
+pkey_alloc(). An application calls the WRPKRU instruction
+directly in order to change access permissions to memory covered
+with a key. In this example WRPKRU is wrapped by a C function
+called pkey_set().
+
+ int real_prot = PROT_READ|PROT_WRITE;
+ pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
+ ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
+ ... application runs here
+
+Now, if the application needs to update the data at 'ptr', it can
+gain access, do the update, then remove its write access:
+
+ pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
+ *ptr = foo; // assign something
+ pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
+
+Now when it frees the memory, it will also free the pkey since it
+is no longer in use:
+
+ munmap(ptr, PAGE_SIZE);
+ pkey_free(pkey);
+
+(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
+ An example implementation can be found in
+ tools/testing/selftests/x86/protection_keys.c)
+
+=========================== Behavior ===========================
+
+The kernel attempts to make protection keys consistent with the
+behavior of a plain mprotect(). For instance if you do this:
+
+ mprotect(ptr, size, PROT_NONE);
+ something(ptr);
+
+you can expect the same effects with protection keys when doing this:
+
+ pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
+ pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
+ something(ptr);
+
+That should be true whether something() is a direct access to 'ptr'
+like:
+
+ *ptr = foo;
+
+or when the kernel does the access on the application's behalf like
+with a read():
+
+ read(fd, ptr, 1);
+
+The kernel will send a SIGSEGV in both cases, but si_code will be set
+to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
+the plain mprotect() permissions are violated.
diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt
deleted file mode 100644
index fa46dcb..0000000
--- a/Documentation/x86/protection-keys.txt
+++ /dev/null
@@ -1,85 +0,0 @@
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
-which will be found on future Intel CPUs.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
-
-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register. The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs. These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
-
-=========================== Syscalls ===========================
-
-There are 3 system calls which directly interact with pkeys:
-
- int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
- int pkey_free(int pkey);
- int pkey_mprotect(unsigned long start, size_t len,
- unsigned long prot, int pkey);
-
-Before a pkey can be used, it must first be allocated with
-pkey_alloc(). An application calls the WRPKRU instruction
-directly in order to change access permissions to memory covered
-with a key. In this example WRPKRU is wrapped by a C function
-called pkey_set().
-
- int real_prot = PROT_READ|PROT_WRITE;
- pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
- ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
- ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
- ... application runs here
-
-Now, if the application needs to update the data at 'ptr', it can
-gain access, do the update, then remove its write access:
-
- pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
- *ptr = foo; // assign something
- pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
-
-Now when it frees the memory, it will also free the pkey since it
-is no longer in use:
-
- munmap(ptr, PAGE_SIZE);
- pkey_free(pkey);
-
-(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
- An example implementation can be found in
- tools/testing/selftests/x86/protection_keys.c)
-
-=========================== Behavior ===========================
-
-The kernel attempts to make protection keys consistent with the
-behavior of a plain mprotect(). For instance if you do this:
-
- mprotect(ptr, size, PROT_NONE);
- something(ptr);
-
-you can expect the same effects with protection keys when doing this:
-
- pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
- pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
- something(ptr);
-
-That should be true whether something() is a direct access to 'ptr'
-like:
-
- *ptr = foo;
-
-or when the kernel does the access on the application's behalf like
-with a read():
-
- read(fd, ptr, 1);
-
-The kernel will send a SIGSEGV in both cases, but si_code will be set
-to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
-the plain mprotect() permissions are violated.
--
1.7.1
From 1583223879942244405@xxx Sun Nov 05 11:05:51 +0000 2017
Introduce helper functions that initialize the bits in the AMR,
IAMR and UAMOR registers that correspond to a given pkey.
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/mm/pkeys.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 47 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index b6bdfdf..f3bf661 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -16,6 +16,10 @@
int pkeys_total; /* Total pkeys as per device tree */
u32 initial_allocation_mask; /* Bits set for reserved keys */
+#define AMR_BITS_PER_PKEY 2
+#define PKEY_REG_BITS (sizeof(u64)*8)
+#define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
+
void __init pkey_initialize(void)
{
int os_reserved, i;
@@ -97,3 +101,46 @@ static inline void write_uamor(u64 value)
{
mtspr(SPRN_UAMOR, value);
}
+
+static inline void init_amr(int pkey, u8 init_bits)
+{
+ u64 new_amr_bits = (((u64)init_bits & 0x3UL) << pkeyshift(pkey));
+ u64 old_amr = read_amr() & ~((u64)(0x3ul) << pkeyshift(pkey));
+
+ write_amr(old_amr | new_amr_bits);
+}
+
+static inline void init_iamr(int pkey, u8 init_bits)
+{
+ u64 new_iamr_bits = (((u64)init_bits & 0x1UL) << pkeyshift(pkey));
+ u64 old_iamr = read_iamr() & ~((u64)(0x1ul) << pkeyshift(pkey));
+
+ write_iamr(old_iamr | new_iamr_bits);
+}
+
+static void pkey_status_change(int pkey, bool enable)
+{
+ u64 old_uamor;
+
+ /* Reset the AMR and IAMR bits for this key */
+ init_amr(pkey, 0x0);
+ init_iamr(pkey, 0x0);
+
+ /* Enable/disable key */
+ old_uamor = read_uamor();
+ if (enable)
+ old_uamor |= (0x3ul << pkeyshift(pkey));
+ else
+ old_uamor &= ~(0x3ul << pkeyshift(pkey));
+ write_uamor(old_uamor);
+}
+
+void __arch_activate_pkey(int pkey)
+{
+ pkey_status_change(pkey, true);
+}
+
+void __arch_deactivate_pkey(int pkey)
+{
+ pkey_status_change(pkey, false);
+}
--
1.7.1
From 1583349717722418994@xxx Mon Nov 06 20:25:59 +0000 2017
From: Thiago Jung Bauermann <[email protected]>
Expose useful information for programs using memory protection keys.
Provide implementations for powerpc and x86.
On a powerpc system with pkeys support, here is what is shown:
$ head /sys/kernel/mm/protection_keys/*
==> /sys/kernel/mm/protection_keys/disable_access_supported <==
true
==> /sys/kernel/mm/protection_keys/disable_execute_supported <==
true
==> /sys/kernel/mm/protection_keys/disable_write_supported <==
true
==> /sys/kernel/mm/protection_keys/total_keys <==
31
==> /sys/kernel/mm/protection_keys/usable_keys <==
27
And on an x86 without pkeys support:
$ head /sys/kernel/mm/protection_keys/*
==> /sys/kernel/mm/protection_keys/disable_access_supported <==
false
==> /sys/kernel/mm/protection_keys/disable_execute_supported <==
false
==> /sys/kernel/mm/protection_keys/disable_write_supported <==
false
==> /sys/kernel/mm/protection_keys/total_keys <==
1
==> /sys/kernel/mm/protection_keys/usable_keys <==
0
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
---
arch/powerpc/include/asm/pkeys.h | 2 +
arch/powerpc/mm/pkeys.c | 24 ++++++++++
arch/x86/include/asm/mmu_context.h | 4 +-
arch/x86/include/asm/pkeys.h | 1 +
arch/x86/mm/pkeys.c | 9 ++++
include/linux/pkeys.h | 2 +-
mm/mprotect.c | 88 ++++++++++++++++++++++++++++++++++++
7 files changed, 128 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 333fb28..6d70b1a 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -237,6 +237,8 @@ static inline void pkey_mmu_values(int total_data, int total_execute)
pkeys_total = total_data;
}
+extern bool arch_supports_pkeys(int cap);
+extern unsigned int arch_usable_pkeys(void);
extern void thread_pkey_regs_save(struct thread_struct *thread);
extern void thread_pkey_regs_restore(struct thread_struct *new_thread,
struct thread_struct *old_thread);
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 2612f61..7e8468f 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -421,6 +421,30 @@ bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
return pkey_access_permitted(vma_pkey(vma), write, execute);
}
+unsigned int arch_usable_pkeys(void)
+{
+ unsigned int reserved;
+
+ if (static_branch_likely(&pkey_disabled))
+ return 0;
+
+ /* Reserve one more to account for the execute-only pkey. */
+ reserved = hweight32(initial_allocation_mask) + 1;
+
+ return pkeys_total > reserved ? pkeys_total - reserved : 0;
+}
+
+bool arch_supports_pkeys(int cap)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return false;
+
+ if (cap & PKEY_DISABLE_EXECUTE)
+ return pkey_execute_disable_supported;
+
+ return (cap & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
+}
+
long sys_pkey_modify(int pkey, unsigned long new_val)
{
bool ret;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 6699fc4..e3efabb 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -129,6 +129,8 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
+#define PKEY_INITIAL_ALLOCATION_MAP 1
+
static inline int init_new_context(struct task_struct *tsk,
struct mm_struct *mm)
{
@@ -138,7 +140,7 @@ static inline int init_new_context(struct task_struct *tsk,
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
/* pkey 0 is the default and always allocated */
- mm->context.pkey_allocation_map = 0x1;
+ mm->context.pkey_allocation_map = PKEY_INITIAL_ALLOCATION_MAP;
/* -1 means unallocated or invalid */
mm->context.execute_only_pkey = -1;
}
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index f6c287b..6807288 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -106,5 +106,6 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
extern void copy_init_pkru_to_fpregs(void);
+extern unsigned int arch_usable_pkeys(void);
#endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index d7bc0ee..3083a59 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -122,6 +122,15 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
return vma_pkey(vma);
}
+unsigned int arch_usable_pkeys(void)
+{
+ /* Reserve one more to account for the execute-only pkey. */
+ unsigned int reserved = (boot_cpu_has(X86_FEATURE_OSPKE) ?
+ hweight32(PKEY_INITIAL_ALLOCATION_MAP) : 0) + 1;
+
+ return arch_max_pkey() > reserved ? arch_max_pkey() - reserved : 0;
+}
+
#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
/*
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 3ca2e44..0784f20 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -11,6 +11,7 @@
#define arch_max_pkey() (1)
#define execute_only_pkey(mm) (0)
#define arch_override_mprotect_pkey(vma, prot, pkey) (0)
+#define arch_usable_pkeys() (0)
#define PKEY_DEDICATED_EXECUTE_ONLY 0
#define ARCH_VM_PKEY_FLAGS 0
@@ -43,7 +44,6 @@ static inline bool arch_pkeys_enabled(void)
static inline void copy_init_pkru_to_fpregs(void)
{
}
-
#endif /* ! CONFIG_ARCH_HAS_PKEYS */
#endif /* _LINUX_PKEYS_H */
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ec39f73..43a4584 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -568,4 +568,92 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
return ret;
}
+#ifdef CONFIG_SYSFS
+
+#define PKEYS_ATTR_RO(_name) \
+ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
+
+static ssize_t total_keys_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%u\n", arch_max_pkey());
+}
+PKEYS_ATTR_RO(total_keys);
+
+static ssize_t usable_keys_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%u\n", arch_usable_pkeys());
+}
+PKEYS_ATTR_RO(usable_keys);
+
+static ssize_t disable_access_supported_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ if (arch_pkeys_enabled()) {
+ strcpy(buf, "true\n");
+ return sizeof("true\n") - 1;
+ }
+
+ strcpy(buf, "false\n");
+ return sizeof("false\n") - 1;
+}
+PKEYS_ATTR_RO(disable_access_supported);
+
+static ssize_t disable_write_supported_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ if (arch_pkeys_enabled()) {
+ strcpy(buf, "true\n");
+ return sizeof("true\n") - 1;
+ }
+
+ strcpy(buf, "false\n");
+ return sizeof("false\n") - 1;
+}
+PKEYS_ATTR_RO(disable_write_supported);
+
+static ssize_t disable_execute_supported_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+#ifdef PKEY_DISABLE_EXECUTE
+ if (arch_supports_pkeys(PKEY_DISABLE_EXECUTE)) {
+ strcpy(buf, "true\n");
+ return sizeof("true\n") - 1;
+ }
+#endif
+
+ strcpy(buf, "false\n");
+ return sizeof("false\n") - 1;
+}
+PKEYS_ATTR_RO(disable_execute_supported);
+
+static struct attribute *pkeys_attrs[] = {
+ &total_keys_attr.attr,
+ &usable_keys_attr.attr,
+ &disable_access_supported_attr.attr,
+ &disable_write_supported_attr.attr,
+ &disable_execute_supported_attr.attr,
+ NULL,
+};
+
+static const struct attribute_group pkeys_attr_group = {
+ .attrs = pkeys_attrs,
+ .name = "protection_keys",
+};
+
+static int __init pkeys_sysfs_init(void)
+{
+ int err;
+
+ err = sysfs_create_group(mm_kobj, &pkeys_attr_group);
+
+ return err;
+}
+late_initcall(pkeys_sysfs_init);
+#endif /* CONFIG_SYSFS */
+
#endif /* CONFIG_ARCH_HAS_PKEYS */
--
1.7.1
From 1583153963123206198@xxx Sat Nov 04 16:34:33 +0000 2017
Finally, this patch lets a process allocate and free a protection
key by wiring up the pkey_alloc and pkey_free system calls on powerpc.
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/systbl.h | 2 ++
arch/powerpc/include/asm/unistd.h | 4 +---
arch/powerpc/include/uapi/asm/unistd.h | 2 ++
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 449912f..dea4a95 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -389,3 +389,5 @@
COMPAT_SYS_SPU(pwritev2)
SYSCALL(kexec_file_load)
SYSCALL(statx)
+SYSCALL(pkey_alloc)
+SYSCALL(pkey_free)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index 9ba11db..e0273bc 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,13 +12,11 @@
#include <uapi/asm/unistd.h>
-#define NR_syscalls 384
+#define NR_syscalls 386
#define __NR__exit __NR_exit
#define __IGNORE_pkey_mprotect
-#define __IGNORE_pkey_alloc
-#define __IGNORE_pkey_free
#ifndef __ASSEMBLY__
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index df8684f..5db4385 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -395,5 +395,7 @@
#define __NR_pwritev2 381
#define __NR_kexec_file_load 382
#define __NR_statx 383
+#define __NR_pkey_alloc 384
+#define __NR_pkey_free 385
#endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
--
1.7.1
From 1583236606232496868@xxx Sun Nov 05 14:28:08 +0000 2017
Add helper functions to handle the shadow pkey register.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/pkey-helpers.h | 27 ++++++++++++++++++++
tools/testing/selftests/vm/protection_keys.c | 34 ++++++++++++++++---------
2 files changed, 49 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index b03f7e5..d521f53 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -44,6 +44,33 @@
#define DEBUG_LEVEL 0
#endif
#define DPRINT_IN_SIGNAL_BUF_SIZE 4096
+
+static inline u32 pkey_to_shift(int pkey)
+{
+ return pkey * PKEY_BITS_PER_PKEY;
+}
+
+static inline pkey_reg_t reset_bits(int pkey, pkey_reg_t bits)
+{
+ u32 shift = pkey_to_shift(pkey);
+
+ return ~(bits << shift);
+}
+
+static inline pkey_reg_t left_shift_bits(int pkey, pkey_reg_t bits)
+{
+ u32 shift = pkey_to_shift(pkey);
+
+ return (bits << shift);
+}
+
+static inline pkey_reg_t right_shift_bits(int pkey, pkey_reg_t bits)
+{
+ u32 shift = pkey_to_shift(pkey);
+
+ return (bits >> shift);
+}
+
extern int dprint_in_signal;
extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
static inline void sigsafe_printf(const char *format, ...)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 2e8de01..8e2e277 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -374,7 +374,7 @@ u32 pkey_get(int pkey, unsigned long flags)
__func__, pkey, flags, 0, 0);
dprintf2("%s() raw pkey_reg: %x\n", __func__, pkey_reg);
- shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY));
+ shifted_pkey_reg = right_shift_bits(pkey, pkey_reg);
dprintf2("%s() shifted_pkey_reg: %x\n", __func__, shifted_pkey_reg);
masked_pkey_reg = shifted_pkey_reg & mask;
dprintf2("%s() masked pkey_reg: %x\n", __func__, masked_pkey_reg);
@@ -397,9 +397,9 @@ int pkey_set(int pkey, unsigned long rights, unsigned long flags)
/* copy old pkey_reg */
new_pkey_reg = old_pkey_reg;
/* mask out bits from pkey in old value: */
- new_pkey_reg &= ~(mask << (pkey * PKEY_BITS_PER_PKEY));
+ new_pkey_reg &= reset_bits(pkey, mask);
/* OR in new bits for pkey: */
- new_pkey_reg |= (rights << (pkey * PKEY_BITS_PER_PKEY));
+ new_pkey_reg |= left_shift_bits(pkey, rights);
__wrpkey_reg(new_pkey_reg);
@@ -430,7 +430,7 @@ void pkey_disable_set(int pkey, int flags)
ret = pkey_set(pkey, pkey_rights, syscall_flags);
assert(!ret);
/*pkey_reg and flags have the same format */
- shadow_pkey_reg |= flags << (pkey * 2);
+ shadow_pkey_reg |= left_shift_bits(pkey, flags);
dprintf1("%s(%d) shadow: 0x%016lx\n",
__func__, pkey, shadow_pkey_reg);
@@ -465,7 +465,7 @@ void pkey_disable_clear(int pkey, int flags)
ret = pkey_set(pkey, pkey_rights, 0);
/* pkey_reg and flags have the same format */
- shadow_pkey_reg &= ~(flags << (pkey * 2));
+ shadow_pkey_reg &= reset_bits(pkey, flags);
pkey_assert(ret >= 0);
pkey_rights = pkey_get(pkey, syscall_flags);
@@ -523,6 +523,21 @@ int sys_pkey_alloc(unsigned long flags, unsigned long init_val)
return ret;
}
+void pkey_setup_shadow(void)
+{
+ shadow_pkey_reg = __rdpkey_reg();
+}
+
+void pkey_reset_shadow(u32 key)
+{
+ shadow_pkey_reg &= reset_bits(key, 0x3);
+}
+
+void pkey_set_shadow(u32 key, u64 init_val)
+{
+ shadow_pkey_reg |= left_shift_bits(key, init_val);
+}
+
int alloc_pkey(void)
{
int ret;
@@ -540,7 +555,7 @@ int alloc_pkey(void)
shadow_pkey_reg);
if (ret) {
/* clear both the bits: */
- shadow_pkey_reg &= ~(0x3 << (ret * 2));
+ pkey_reset_shadow(ret);
dprintf4("%s()::%d, ret: %d pkey_reg: 0x%016lx "
"shadow: 0x%016lx\n",
__func__,
@@ -550,7 +565,7 @@ int alloc_pkey(void)
* move the new state in from init_val
* (remember, we cheated and init_val == pkey_reg format)
*/
- shadow_pkey_reg |= (init_val << (ret * 2));
+ pkey_set_shadow(ret, init_val);
}
dprintf4("%s()::%d, ret: %d pkey_reg: 0x%016lx shadow: 0x%016lx\n",
__func__, __LINE__, ret, __rdpkey_reg(),
@@ -1322,11 +1337,6 @@ void run_tests_once(void)
iteration_nr++;
}
-void pkey_setup_shadow(void)
-{
- shadow_pkey_reg = __rdpkey_reg();
-}
-
int main(void)
{
int nr_iterations = 22;
--
1.7.1
From 1583078694280622824@xxx Fri Nov 03 20:38:11 +0000 2017
Map the key protection bits of the vma to the pkey bits in
the PTE.
The PTE bits used for pkey are 3, 4, 5, 6 and 57. The first
four are the same four bits that were freed up earlier
in this patch series; without them this patch would not
be possible.
However, on a 4k kernel, bits 3 and 4 could not be freed up.
Hence we have to be satisfied with bits 5, 6 and 57 there.
Signed-off-by: Ram Pai <[email protected]>
---
arch/powerpc/include/asm/book3s/64/pgtable.h | 25 ++++++++++++++++++++++++-
arch/powerpc/include/asm/mman.h | 6 ++++++
arch/powerpc/include/asm/pkeys.h | 12 ++++++++++++
3 files changed, 42 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 9a677cd..4c1ee6e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -39,6 +39,7 @@
#define _RPAGE_RSV2 0x0800000000000000UL
#define _RPAGE_RSV3 0x0400000000000000UL
#define _RPAGE_RSV4 0x0200000000000000UL
+#define _RPAGE_RSV5 0x00040UL
#define _PAGE_PTE 0x4000000000000000UL /* distinguishes PTEs from pointers */
#define _PAGE_PRESENT 0x8000000000000000UL /* pte contains a translation */
@@ -58,6 +59,25 @@
/* Max physical address bit as per radix table */
#define _RPAGE_PA_MAX 57
+#ifdef CONFIG_PPC_MEM_KEYS
+#ifdef CONFIG_PPC_64K_PAGES
+#define H_PTE_PKEY_BIT0 _RPAGE_RSV1
+#define H_PTE_PKEY_BIT1 _RPAGE_RSV2
+#else /* CONFIG_PPC_64K_PAGES */
+#define H_PTE_PKEY_BIT0 0 /* _RPAGE_RSV1 is not available */
+#define H_PTE_PKEY_BIT1 0 /* _RPAGE_RSV2 is not available */
+#endif /* CONFIG_PPC_64K_PAGES */
+#define H_PTE_PKEY_BIT2 _RPAGE_RSV3
+#define H_PTE_PKEY_BIT3 _RPAGE_RSV4
+#define H_PTE_PKEY_BIT4 _RPAGE_RSV5
+#else /* CONFIG_PPC_MEM_KEYS */
+#define H_PTE_PKEY_BIT0 0
+#define H_PTE_PKEY_BIT1 0
+#define H_PTE_PKEY_BIT2 0
+#define H_PTE_PKEY_BIT3 0
+#define H_PTE_PKEY_BIT4 0
+#endif /* CONFIG_PPC_MEM_KEYS */
+
/*
* Max physical address bit we will use for now.
*
@@ -121,13 +141,16 @@
#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
_PAGE_ACCESSED | _PAGE_SPECIAL | _PAGE_PTE | \
_PAGE_SOFT_DIRTY)
+
+#define H_PTE_PKEY (H_PTE_PKEY_BIT0 | H_PTE_PKEY_BIT1 | H_PTE_PKEY_BIT2 | \
+ H_PTE_PKEY_BIT3 | H_PTE_PKEY_BIT4)
/*
* Mask of bits returned by pte_pgprot()
*/
#define PAGE_PROT_BITS (_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT | \
H_PAGE_4K_PFN | _PAGE_PRIVILEGED | _PAGE_ACCESSED | \
_PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY | _PAGE_EXEC | \
- _PAGE_SOFT_DIRTY)
+ _PAGE_SOFT_DIRTY | H_PTE_PKEY)
/*
* We define 2 sets of base prot bits, one for basic pages (ie,
* cacheable kernel and user pages) and one for non cacheable
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 2999478..07e3f54 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -33,7 +33,13 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
{
+#ifdef CONFIG_PPC_MEM_KEYS
+ return (vm_flags & VM_SAO) ?
+ __pgprot(_PAGE_SAO | vmflag_to_pte_pkey_bits(vm_flags)) :
+ __pgprot(0 | vmflag_to_pte_pkey_bits(vm_flags));
+#else
return (vm_flags & VM_SAO) ? __pgprot(_PAGE_SAO) : __pgprot(0);
+#endif
}
#define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 441bbf3..cfe61a9 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -52,6 +52,18 @@ static inline u64 pkey_to_vmflag_bits(u16 pkey)
return (((u64)pkey << VM_PKEY_SHIFT) & ARCH_VM_PKEY_FLAGS);
}
+static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return 0x0UL;
+
+ return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT4 : 0x0UL) |
+ ((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT3 : 0x0UL) |
+ ((vm_flags & VM_PKEY_BIT2) ? H_PTE_PKEY_BIT2 : 0x0UL) |
+ ((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT1 : 0x0UL) |
+ ((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT0 : 0x0UL));
+}
+
static inline int vma_pkey(struct vm_area_struct *vma)
{
if (static_branch_likely(&pkey_disabled))
--
1.7.1
From 1582968573385220927@xxx Thu Nov 02 15:27:52 +0000 2017
Arch-independent code calls arch_override_mprotect_pkey()
to obtain the pkey that best matches the requested protection.
This patch provides the powerpc implementation.
Signed-off-by: Ram Pai <[email protected]>
---
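The decision logic can be modelled in userspace as below. This is a
simplified sketch, not the kernel code: struct vma_model and its fields
are hypothetical stand-ins for the vma and mm state, and the execute-only
key is taken as already known rather than allocated on demand.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical model of the state the kernel reads from the vma and mm. */
struct vma_model {
	int prot;		/* current PROT_* bits of the mapping */
	int pkey;		/* key currently associated with the mapping */
	int exec_only_pkey;	/* the mm's execute-only key, or -1 if none */
};

static bool vma_is_pkey_exec_only(const struct vma_model *vma)
{
	if ((vma->prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) != PROT_EXEC)
		return false;
	return vma->pkey == vma->exec_only_pkey;
}

/* pkey == -1 means a plain mprotect() call, anything else is user supplied. */
static int override_mprotect_pkey(const struct vma_model *vma, int prot,
				  int pkey)
{
	/* Never override a key that came from pkey_mprotect(). */
	if (pkey != -1)
		return pkey;

	/* Leaving execute-only: fall back to the default key. */
	if (vma_is_pkey_exec_only(vma) && (prot & (PROT_READ | PROT_WRITE)))
		return 0;

	/* Entering execute-only: use the execute-only key if there is one. */
	if (prot == PROT_EXEC && vma->exec_only_pkey > 0)
		return vma->exec_only_pkey;

	/* Nothing to override. */
	return vma->pkey;
}
```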
arch/powerpc/include/asm/mmu_context.h | 5 ++++
arch/powerpc/include/asm/pkeys.h | 21 +++++++++++++++++-
arch/powerpc/mm/pkeys.c | 36 ++++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 4eccc2f..a83d540 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -149,6 +149,11 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
#define thread_pkey_regs_save(thread)
#define thread_pkey_regs_restore(new_thread, old_thread)
#define thread_pkey_regs_init(thread)
+
+static inline int vma_pkey(struct vm_area_struct *vma)
+{
+ return 0;
+}
#endif /* CONFIG_PPC_MEM_KEYS */
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 1bd41ef..441bbf3 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -52,6 +52,13 @@ static inline u64 pkey_to_vmflag_bits(u16 pkey)
return (((u64)pkey << VM_PKEY_SHIFT) & ARCH_VM_PKEY_FLAGS);
}
+static inline int vma_pkey(struct vm_area_struct *vma)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return 0;
+ return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
+}
+
#define arch_max_pkey() pkeys_total
#define pkey_alloc_mask(pkey) (0x1 << pkey)
@@ -148,10 +155,22 @@ static inline int execute_only_pkey(struct mm_struct *mm)
return __execute_only_pkey(mm);
}
+extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
+ int prot, int pkey);
static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
int prot, int pkey)
{
- return 0;
+ if (static_branch_likely(&pkey_disabled))
+ return 0;
+
+ /*
+ * Is this an mprotect_pkey() call? If so, never override the value that
+ * came from the user.
+ */
+ if (pkey != -1)
+ return pkey;
+
+ return __arch_override_mprotect_pkey(vma, prot, pkey);
}
extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 4d704ea..f1c6195 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -311,3 +311,39 @@ int __execute_only_pkey(struct mm_struct *mm)
mm->context.execute_only_pkey = execute_only_pkey;
return execute_only_pkey;
}
+
+static inline bool vma_is_pkey_exec_only(struct vm_area_struct *vma)
+{
+ /* Do this check first since the vm_flags should be hot */
+ if ((vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) != VM_EXEC)
+ return false;
+
+ return (vma_pkey(vma) == vma->vm_mm->context.execute_only_pkey);
+}
+
+/*
+ * This should only be called for *plain* mprotect calls.
+ */
+int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot,
+ int pkey)
+{
+ /*
+ * If the currently associated pkey is execute-only, but the requested
+ * protection requires read or write, move it back to the default pkey.
+ */
+ if (vma_is_pkey_exec_only(vma) && (prot & (PROT_READ | PROT_WRITE)))
+ return 0;
+
+ /*
+ * The requested protection is execute-only. Hence let's use an
+ * execute-only pkey.
+ */
+ if (prot == PROT_EXEC) {
+ pkey = execute_only_pkey(vma->vm_mm);
+ if (pkey > 0)
+ return pkey;
+ }
+
+ /* Nothing to override. */
+ return vma_pkey(vma);
+}
--
1.7.1
From 1583068527009256658@xxx Fri Nov 03 17:56:35 +0000 2017
When a key is freed, it is no longer in effect.
Clear the bits corresponding to the pkey in the shadow
register. Otherwise the register carries stale bits,
which can trigger false-positive asserts.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 384cc9a..2823d4d 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -582,6 +582,9 @@ int alloc_pkey(void)
int sys_pkey_free(unsigned long pkey)
{
int ret = syscall(SYS_pkey_free, pkey);
+
+ if (!ret)
+ shadow_pkey_reg &= reset_bits(pkey, PKEY_DISABLE_ACCESS);
dprintf1("%s(pkey=%ld) syscall ret: %d\n", __func__, pkey, ret);
return ret;
}
--
1.7.1
From 1582952801086096182@xxx Thu Nov 02 11:17:10 +0000 2017
powerpc has hardware support to disable execute on a pkey.
This patch enables the ability to create execute-disabled
keys.
Signed-off-by: Ram Pai <[email protected]>
---
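A rough userspace sketch of how the requested rights translate into AMR
and IAMR bits is given below. The PKEY_DISABLE_EXECUTE value is from this
patch; the AMR_*/IAMR_* bit values, struct key_bits and rights_to_bits()
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2
#define PKEY_DISABLE_EXECUTE	0x4

/* Assumed per-key bit values inside the AMR and IAMR fields. */
#define AMR_RD_BIT	0x1ULL
#define AMR_WR_BIT	0x2ULL
#define IAMR_EX_BIT	0x1ULL

/* Same intent as the BUILD_BUG_ON(): the three flags must not overlap. */
static_assert((PKEY_DISABLE_EXECUTE &
	       (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) == 0,
	      "PKEY_DISABLE_EXECUTE overlaps the generic flags");

/* Hypothetical container for the per-key bits of both registers. */
struct key_bits {
	uint64_t amr;
	uint64_t iamr;
};

/* Returns 0 on success, -1 if execute-disable is requested but unsupported. */
static int rights_to_bits(unsigned long init_val, int exec_disable_supported,
			  struct key_bits *out)
{
	out->amr = 0;
	out->iamr = 0;

	if (init_val & PKEY_DISABLE_EXECUTE) {
		if (!exec_disable_supported)
			return -1;
		out->iamr |= IAMR_EX_BIT;
	}

	if (init_val & PKEY_DISABLE_ACCESS)
		out->amr |= AMR_RD_BIT | AMR_WR_BIT;
	else if (init_val & PKEY_DISABLE_WRITE)
		out->amr |= AMR_WR_BIT;

	return 0;
}
```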
arch/powerpc/include/uapi/asm/mman.h | 6 ++++++
arch/powerpc/mm/pkeys.c | 16 ++++++++++++++++
2 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index e63bc37..65065ce 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -30,4 +30,10 @@
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+/* Override any generic PKEY permission defines */
+#define PKEY_DISABLE_EXECUTE 0x4
+#undef PKEY_ACCESS_MASK
+#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
+ PKEY_DISABLE_WRITE |\
+ PKEY_DISABLE_EXECUTE)
#endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 4a01c2f..3ddc13a 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -29,6 +29,14 @@ void __init pkey_initialize(void)
int os_reserved, i;
/*
+ * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
+ * generic defines for PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.
+ * Ensure that the bits are distinct.
+ */
+ BUILD_BUG_ON(PKEY_DISABLE_EXECUTE &
+ (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
+
+ /*
* Disable the pkey system till everything is in place. A subsequent
* patch will enable it.
*/
@@ -171,10 +179,18 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val)
{
u64 new_amr_bits = 0x0ul;
+ u64 new_iamr_bits = 0x0ul;
if (!is_pkey_enabled(pkey))
return -EINVAL;
+ if (init_val & PKEY_DISABLE_EXECUTE) {
+ if (!pkey_execute_disable_supported)
+ return -EINVAL;
+ new_iamr_bits |= IAMR_EX_BIT;
+ }
+ init_iamr(pkey, new_iamr_bits);
+
/* Set the bits we need in AMR: */
if (init_val & PKEY_DISABLE_ACCESS)
new_amr_bits |= AMR_RD_BIT | AMR_WR_BIT;
--
1.7.1
From 1586807082342490646@xxx Fri Dec 15 00:19:19 +0000 2017
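alloc_random_pkey() was allocating the same pkey every time,
so not all pkeys were getting tested. Fix it by seeding the
random number generator and by attempting to allocate every
possible key rather than just one.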
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/protection_keys.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 2823d4d..1a14027 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -24,6 +24,7 @@
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
+#include <time.h>
#include <sys/time.h>
#include <sys/syscall.h>
#include <string.h>
@@ -602,13 +603,15 @@ int alloc_random_pkey(void)
int alloced_pkeys[NR_PKEYS];
int nr_alloced = 0;
int random_index;
+
memset(alloced_pkeys, 0, sizeof(alloced_pkeys));
+ srand((unsigned int)time(NULL));
/* allocate every possible key and make a note of which ones we got */
max_nr_pkey_allocs = NR_PKEYS;
- max_nr_pkey_allocs = 1;
for (i = 0; i < max_nr_pkey_allocs; i++) {
int new_pkey = alloc_pkey();
+
if (new_pkey < 0)
break;
alloced_pkeys[nr_alloced++] = new_pkey;
@@ -624,13 +627,14 @@ int alloc_random_pkey(void)
/* go through the allocated ones that we did not want and free them */
for (i = 0; i < nr_alloced; i++) {
int free_ret;
+
if (!alloced_pkeys[i])
continue;
free_ret = sys_pkey_free(alloced_pkeys[i]);
pkey_assert(!free_ret);
}
- dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n", __func__,
- __LINE__, ret, __rdpkey_reg(), shadow_pkey_reg);
+ dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%016lx\n",
+ __func__, __LINE__, ret, __rdpkey_reg(), shadow_pkey_reg);
return ret;
}
--
1.7.1
From 1583039741698772803@xxx Fri Nov 03 10:19:03 +0000 2017
Clear the bits corresponding to a key in the AMR and IAMR
registers when the key is newly allocated/activated or is freed.
We don't want residual bits to make the hardware enforce
unintended behavior when the key is activated or freed.
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
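The clearing step amounts to masking out one key's field in the register.
A minimal sketch, assuming a 64-bit register with 2 bits per key and key 0
in the most significant bits; clear_pkey_bits() is a hypothetical name,
not a function from this series.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_REG_BITS		64
#define AMR_BITS_PER_PKEY	2
#define NR_PKEYS		(PKEY_REG_BITS / AMR_BITS_PER_PKEY)

/* Bit offset of a key's field; key 0 occupies the top two bits. */
static unsigned int pkeyshift(int pkey)
{
	return PKEY_REG_BITS - ((pkey + 1) * AMR_BITS_PER_PKEY);
}

/* Return reg with every bit belonging to pkey cleared. */
static uint64_t clear_pkey_bits(uint64_t reg, int pkey)
{
	uint64_t mask;

	assert(pkey >= 0 && pkey < NR_PKEYS);
	mask = (uint64_t)0x3 << pkeyshift(pkey);
	return reg & ~mask;
}
```

Applying the same mask on both activation and free ensures a recycled key
never inherits the rights left behind by its previous owner.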
arch/powerpc/include/asm/pkeys.h | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index e5deac7..0d00a54 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -69,6 +69,8 @@ static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
__mm_pkey_is_allocated(mm, pkey));
}
+extern void __arch_activate_pkey(int pkey);
+extern void __arch_deactivate_pkey(int pkey);
/*
* Returns a positive, 5-bit key on success, or -1 on failure.
* Relies on the mmap_sem to protect against concurrency in mm_pkey_alloc() and
@@ -96,6 +98,12 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
ret = ffz((u32)mm_pkey_allocation_map(mm));
__mm_pkey_allocated(mm, ret);
+
+ /*
+ * Enable the key in the hardware
+ */
+ if (ret > 0)
+ __arch_activate_pkey(ret);
return ret;
}
@@ -107,6 +115,10 @@ static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
if (!mm_pkey_is_allocated(mm, pkey))
return -EINVAL;
+ /*
+ * Disable the key in the hardware
+ */
+ __arch_deactivate_pkey(pkey);
__mm_pkey_free(mm, pkey);
return 0;
--
1.7.1
From 1583367088635555944@xxx Tue Nov 07 01:02:05 +0000 2017
Map the PTE protection key bits to the HPTE key protection bits,
while creating HPTE entries.
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
---
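For illustration, the bit-by-bit translation can be exercised in userspace
as sketched below. The HPTE_R_KEY_BIT* values are the ones added by this
patch; the H_PTE_PKEY_BIT* values assume a 64K-page kernel, and the
_RPAGE_RSV1 value in particular is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Linux PTE key bits (64K-page kernel; _RPAGE_RSV1 value is assumed). */
#define H_PTE_PKEY_BIT0	0x1000000000000000ULL
#define H_PTE_PKEY_BIT1	0x0800000000000000ULL
#define H_PTE_PKEY_BIT2	0x0400000000000000ULL
#define H_PTE_PKEY_BIT3	0x0200000000000000ULL
#define H_PTE_PKEY_BIT4	0x0000000000000040ULL

/* HPTE second-doubleword key bits, as defined in mmu-hash.h. */
#define HPTE_R_KEY_BIT0	0x2000000000000000ULL
#define HPTE_R_KEY_BIT1	0x1000000000000000ULL
#define HPTE_R_KEY_BIT2	0x0000000000000800ULL
#define HPTE_R_KEY_BIT3	0x0000000000000400ULL
#define HPTE_R_KEY_BIT4	0x0000000000000200ULL
#define HPTE_R_KEY	0x3000000000000e00ULL	/* KEY_HI | KEY_LO */

/* The five individual bits must cover exactly the HPTE key field. */
static_assert((HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1 | HPTE_R_KEY_BIT2 |
	       HPTE_R_KEY_BIT3 | HPTE_R_KEY_BIT4) == HPTE_R_KEY,
	      "HPTE key bits do not match HPTE_R_KEY");

static uint64_t pte_to_hpte_pkey_bits(uint64_t pteflags)
{
	return ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0) |
	       ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0) |
	       ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0) |
	       ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0) |
	       ((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0);
}
```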
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +++++
arch/powerpc/include/asm/mmu_context.h | 6 ++++++
arch/powerpc/include/asm/pkeys.h | 9 +++++++++
arch/powerpc/mm/hash_utils_64.c | 1 +
4 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 508275b..2e22357 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -90,6 +90,8 @@
#define HPTE_R_PP0 ASM_CONST(0x8000000000000000)
#define HPTE_R_TS ASM_CONST(0x4000000000000000)
#define HPTE_R_KEY_HI ASM_CONST(0x3000000000000000)
+#define HPTE_R_KEY_BIT0 ASM_CONST(0x2000000000000000)
+#define HPTE_R_KEY_BIT1 ASM_CONST(0x1000000000000000)
#define HPTE_R_RPN_SHIFT 12
#define HPTE_R_RPN ASM_CONST(0x0ffffffffffff000)
#define HPTE_R_RPN_3_0 ASM_CONST(0x01fffffffffff000)
@@ -104,6 +106,9 @@
#define HPTE_R_C ASM_CONST(0x0000000000000080)
#define HPTE_R_R ASM_CONST(0x0000000000000100)
#define HPTE_R_KEY_LO ASM_CONST(0x0000000000000e00)
+#define HPTE_R_KEY_BIT2 ASM_CONST(0x0000000000000800)
+#define HPTE_R_KEY_BIT3 ASM_CONST(0x0000000000000400)
+#define HPTE_R_KEY_BIT4 ASM_CONST(0x0000000000000200)
#define HPTE_R_KEY (HPTE_R_KEY_LO | HPTE_R_KEY_HI)
#define HPTE_V_1TB_SEG ASM_CONST(0x4000000000000000)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index a83d540..a557735 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -154,6 +154,12 @@ static inline int vma_pkey(struct vm_area_struct *vma)
{
return 0;
}
+
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+{
+ return 0x0UL;
+}
+
#endif /* CONFIG_PPC_MEM_KEYS */
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index cfe61a9..06a58fe 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -73,6 +73,15 @@ static inline int vma_pkey(struct vm_area_struct *vma)
#define arch_max_pkey() pkeys_total
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+{
+ return (((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
+ ((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL));
+}
+
#define pkey_alloc_mask(pkey) (0x1 << pkey)
#define mm_pkey_allocation_map(mm) (mm->context.pkey_allocation_map)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 1e74590..ddfc673 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -232,6 +232,7 @@ unsigned long htab_convert_pte_flags(unsigned long pteflags)
*/
rflags |= HPTE_R_M;
+ rflags |= pte_to_hpte_pkey_bits(pteflags);
return rflags;
}
--
1.7.1
From 1583314631491434192@xxx Mon Nov 06 11:08:18 +0000 2017
This patch provides the ability for a process to
associate a pkey with an address range. It wires up
the pkey_mprotect() system call on powerpc.
Signed-off-by: Ram Pai <[email protected]>
---
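From userspace, the new system call would typically be used together with
pkey_alloc() as sketched below. map_with_pkey() is a hypothetical helper
written for this note; it relies only on the raw syscall numbers and
reports failure through errno when the kernel or hardware lacks support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Map len bytes read/write, allocate a fresh key and tag the mapping
 * with it. Returns the key (> 0) and stores the address in *out, or
 * returns -1 with errno set.
 */
static int map_with_pkey(size_t len, void **out)
{
#if defined(SYS_pkey_alloc) && defined(SYS_pkey_mprotect) && \
	defined(SYS_pkey_free)
	int saved;
	long pkey;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	/* flags must be 0; initial access rights 0 means no restrictions */
	pkey = syscall(SYS_pkey_alloc, 0UL, 0UL);
	if (pkey < 0)
		goto fail_unmap;

	if (syscall(SYS_pkey_mprotect, p, len, PROT_READ | PROT_WRITE, pkey))
		goto fail_free;

	*out = p;
	return (int)pkey;

fail_free:
	saved = errno;
	syscall(SYS_pkey_free, pkey);
	errno = saved;
fail_unmap:
	saved = errno;
	munmap(p, len);
	errno = saved;
	return -1;
#else
	(void)len;
	(void)out;
	errno = ENOSYS;
	return -1;
#endif
}
```

On a kernel or CPU without protection key support the helper fails, so
callers should check the return value before relying on the key.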
arch/powerpc/include/asm/systbl.h | 1 +
arch/powerpc/include/asm/unistd.h | 4 +---
arch/powerpc/include/uapi/asm/unistd.h | 1 +
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index dea4a95..d61f9c9 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -391,3 +391,4 @@
SYSCALL(statx)
SYSCALL(pkey_alloc)
SYSCALL(pkey_free)
+SYSCALL(pkey_mprotect)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index e0273bc..daf1ba9 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,12 +12,10 @@
#include <uapi/asm/unistd.h>
-#define NR_syscalls 386
+#define NR_syscalls 387
#define __NR__exit __NR_exit
-#define __IGNORE_pkey_mprotect
-
#ifndef __ASSEMBLY__
#include <linux/types.h>
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 5db4385..389c36f 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -397,5 +397,6 @@
#define __NR_statx 383
#define __NR_pkey_alloc 384
#define __NR_pkey_free 385
+#define __NR_pkey_mprotect 386
#endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
--
1.7.1
From 1582977673485579310@xxx Thu Nov 02 17:52:30 +0000 2017
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
is enabled. Powerpc also needs these bits. Hence let's define the
VM_PKEY_BITx bits for any architecture that enables
CONFIG_ARCH_HAS_PKEYS.
Signed-off-by: Ram Pai <[email protected]>
---
fs/proc/task_mmu.c | 4 ++--
include/linux/mm.h | 9 +++++----
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6744bd7..677866e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -677,13 +677,13 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_MERGEABLE)] = "mg",
[ilog2(VM_UFFD_MISSING)]= "um",
[ilog2(VM_UFFD_WP)] = "uw",
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+#ifdef CONFIG_ARCH_HAS_PKEYS
/* These come out via ProtectionKey: */
[ilog2(VM_PKEY_BIT0)] = "",
[ilog2(VM_PKEY_BIT1)] = "",
[ilog2(VM_PKEY_BIT2)] = "",
[ilog2(VM_PKEY_BIT3)] = "",
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43edf65..2c5ea48 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -218,15 +218,16 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
#define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
#endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
-#if defined(CONFIG_X86)
-# define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */
-#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
+#ifdef CONFIG_ARCH_HAS_PKEYS
# define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
# define VM_PKEY_BIT0 VM_HIGH_ARCH_0 /* A protection key is a 4-bit value */
# define VM_PKEY_BIT1 VM_HIGH_ARCH_1
# define VM_PKEY_BIT2 VM_HIGH_ARCH_2
# define VM_PKEY_BIT3 VM_HIGH_ARCH_3
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#if defined(CONFIG_X86)
+# define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */
#elif defined(CONFIG_PPC)
# define VM_SAO VM_ARCH_1 /* Strong Access Ordering (powerpc) */
#elif defined(CONFIG_PARISC)
--
1.7.1
From 1585503254384630999@xxx Thu Nov 30 14:55:32 +0000 2017
Introduce two helper functions:
open_hugepage_file() opens the huge page file.
get_start_key() returns the first non-reserved key.
Signed-off-by: Ram Pai <[email protected]>
---
tools/testing/selftests/vm/pkey-helpers.h | 11 +++++++++++
tools/testing/selftests/vm/protection_keys.c | 6 +++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
index d521f53..30755be 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -301,3 +301,14 @@ static inline void __page_o_noops(void)
} \
} while (0)
#define raw_assert(cond) assert(cond)
+
+static inline int open_hugepage_file(int flag)
+{
+ return open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages",
+ flag);
+}
+
+static inline int get_start_key(void)
+{
+ return 1;
+}
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 1a14027..19ae991 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -809,7 +809,7 @@ void setup_hugetlbfs(void)
* Now go make sure that we got the pages and that they
* are 2M pages. Someone might have made 1G the default.
*/
- fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", O_RDONLY);
+ fd = open_hugepage_file(O_RDONLY);
if (fd < 0) {
perror("opening sysfs 2M hugetlb config");
return;
@@ -1087,10 +1087,10 @@ void test_kernel_gup_write_to_write_disabled_region(int *ptr, u16 pkey)
void test_pkey_syscalls_on_non_allocated_pkey(int *ptr, u16 pkey)
{
int err;
- int i;
+ int i = get_start_key();
/* Note: 0 is the default pkey, so don't mess with it */
- for (i = 1; i < NR_PKEYS; i++) {
+ for (; i < NR_PKEYS; i++) {
if (pkey == i)
continue;
--
1.7.1
From 1584955083550997535@xxx Fri Nov 24 13:42:35 +0000 2017
Arch-neutral code needs to know whether the architecture supports
protection keys in order to display the protection key in smaps. Hence
introduce arch_pkeys_enabled().
This patch also provides the x86 implementation of
arch_pkeys_enabled().
Signed-off-by: Ram Pai <[email protected]>
---
arch/x86/include/asm/pkeys.h | 1 +
arch/x86/kernel/fpu/xstate.c | 5 +++++
include/linux/pkeys.h | 5 +++++
3 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index a0ba1ff..f6c287b 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -6,6 +6,7 @@
extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
+extern bool arch_pkeys_enabled(void);
/*
* Try to dedicate one of the protection keys to be used as an
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f1d5476..a43db74 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -942,6 +942,11 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
return 0;
}
+
+bool arch_pkeys_enabled(void)
+{
+ return boot_cpu_has(X86_FEATURE_OSPKE);
+}
#endif /* ! CONFIG_ARCH_HAS_PKEYS */
/*
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 0794ca7..3ca2e44 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -35,6 +35,11 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
return 0;
}
+static inline bool arch_pkeys_enabled(void)
+{
+ return false;
+}
+
static inline void copy_init_pkru_to_fpregs(void)
{
}
--
1.7.1
From 1586132557529833925@xxx Thu Dec 07 13:38:02 +0000 2017
Arch-independent code expects the arch to map
a pkey into the vma's protection bit setting.
This patch provides that ability.
Signed-off-by: Ram Pai <[email protected]>
---
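The contiguity check added to pkey_initialize() can be tried out in
userspace as sketched below. It assumes VM_PKEY_SHIFT is 32 and a 5-bit
key field, and it uses the GCC/Clang builtins; mask_is_contiguous_from_bit0()
is a hypothetical helper that mirrors the BUILD_BUG_ON() condition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: five key bits starting at bit 32 of vm_flags. */
#define VM_PKEY_SHIFT		32
#define ARCH_VM_PKEY_FLAGS	(0x1fULL << VM_PKEY_SHIFT)

/*
 * A mask is a contiguous run of ones starting at bit 0 exactly when its
 * leading zeros plus its set bits add up to the word width.
 */
static int mask_is_contiguous_from_bit0(uint64_t mask)
{
	if (mask == 0)
		return 0;	/* __builtin_clzll(0) is undefined */
	return __builtin_clzll(mask) + __builtin_popcountll(mask) == 64;
}

/* Shifting the key into place is only valid if the check above holds. */
static uint64_t pkey_to_vmflag_bits(uint16_t pkey)
{
	return ((uint64_t)pkey << VM_PKEY_SHIFT) & ARCH_VM_PKEY_FLAGS;
}
```

A mask with a hole, such as 0x17, fails the check because its leading
zeros and set bits no longer sum to 64.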
arch/powerpc/include/asm/mman.h | 7 ++++++-
arch/powerpc/include/asm/pkeys.h | 11 +++++++++++
arch/powerpc/mm/pkeys.c | 8 ++++++++
3 files changed, 25 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 30922f6..2999478 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -13,6 +13,7 @@
#include <asm/cputable.h>
#include <linux/mm.h>
+#include <linux/pkeys.h>
#include <asm/cpu_has_feature.h>
/*
@@ -22,7 +23,11 @@
static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
unsigned long pkey)
{
- return (prot & PROT_SAO) ? VM_SAO : 0;
+#ifdef CONFIG_PPC_MEM_KEYS
+ return (((prot & PROT_SAO) ? VM_SAO : 0) | pkey_to_vmflag_bits(pkey));
+#else
+ return ((prot & PROT_SAO) ? VM_SAO : 0);
+#endif
}
#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 20d1f0e..1bd41ef 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -41,6 +41,17 @@
#define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
VM_PKEY_BIT3 | VM_PKEY_BIT4)
+/* Override any generic PKEY permission defines */
+#define PKEY_DISABLE_EXECUTE 0x4
+#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS | \
+ PKEY_DISABLE_WRITE | \
+ PKEY_DISABLE_EXECUTE)
+
+static inline u64 pkey_to_vmflag_bits(u16 pkey)
+{
+ return (((u64)pkey << VM_PKEY_SHIFT) & ARCH_VM_PKEY_FLAGS);
+}
+
#define arch_max_pkey() pkeys_total
#define pkey_alloc_mask(pkey) (0x1 << pkey)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 5da94fe..4d704ea 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -39,6 +39,14 @@ void __init pkey_initialize(void)
(PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
/*
+ * pkey_to_vmflag_bits() assumes that the pkey bits are contiguous
+ * in the vmaflag. Make sure that is really the case.
+ */
+ BUILD_BUG_ON(__builtin_clzl(ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) +
+ __builtin_popcountl(ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT)
+ != (sizeof(u64) * BITS_PER_BYTE));
+
+ /*
* Disable the pkey system till everything is in place. A subsequent
* patch will enable it.
*/
--
1.7.1
From 1586063073694284458@xxx Wed Dec 06 19:13:37 +0000 2017