2022-04-21 16:53:21

by Ira Weiny

Subject: [PATCH V10 00/44] PKS/PMEM: Add Stray Write Protection

From: Ira Weiny <[email protected]>

I'm looking for Intel acks on the series prior to submitting to maintainers.

Because I did not get a lot of feedback on the previous version, I've reworked
the order of the patches to lighten the review load.

I'd like to get comments from Peter and Dave on patches 1-24.

Patches 25-36 implement the PMEM use case. I'd like to get Dan to look at those.

Patches 37-44 implement the PKS tests which can be deferred if review time is
short.

Code wise there were no significant changes between v9 and v10. But a V10 was
required due to upstream changes/conflicts, one of which required dropping a
patch because a different fix landed upstream.

This series is now based on 5.18-rc3.


Changes for V10
Rebased to 5.18-rc3
Re-arranged the patch series into 3 sections
1-24 PKS core
25-36 PMEM use case
37-44 PKS core testing
Drop the irqentry_exit_cond_resched() fixup patch as that was fixed by
Mark Rutland in:
4624a14f4daa ("sched/preempt: Simplify irqentry_exit_cond_resched() callers")
Adjust irqentry_exit_cond_resched() changes based on Mark's fix
Fix test_pks cpu option processing
Move memremap code to memremap.h



PKS/PMEM Stray write protection
===============================

This series is broken into 2 parts.

1) Introduce Protection Key Supervisor (PKS), testing, and
documentation
2) Use PKS to protect PMEM from stray writes

Introduce Protection Key Supervisor (PKS) [Patches 1-24]
--------------------------------------------------------

PKS enables protections on 'domains' of supervisor pages, limiting supervisor
mode access to pages beyond the normal paging protections. PKS works in a
similar fashion to user space pkeys, PKU. As with PKU, supervisor pkeys are
checked in addition to normal paging protections, and page mappings are
assigned to a domain by setting a 4-bit pkey in the PTE of that mapping.

Unlike PKU, permissions are changed via an MSR update. This update avoids TLB
flushes, making it a more efficient way to alter protections than PTE updates.

Also, unlike PTE updates, PKS permission changes apply only to the current
processor. Therefore changing permissions affects only that thread and not
any other CPU/process. This allows protections to remain in place on other
CPUs for additional protection and isolation.
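
To illustrate the intended usage (a sketch only, using the pks_set_*()
helpers introduced later in this series and a hypothetical consumer key
named MY_PKS_KEY), a consumer opens a write window on the current thread
and closes it again when done:

	/*
	 * Illustrative sketch: MY_PKS_KEY stands in for a key allocated
	 * via PKS_NEW_KEY() whose default protection is Access Disabled.
	 */
	static void my_driver_write(void *dst, void *src, size_t len)
	{
		pks_set_readwrite(MY_PKS_KEY);	/* MSR update, no TLB flush */
		memcpy(dst, src, len);
		pks_set_noaccess(MY_PKS_KEY);	/* back to Access Disabled */
	}

Other CPUs continue to see MY_PKS_KEY as Access Disabled while this window
is open on the current thread.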

Even though PKS updates are thread local, XSAVE is not supported for the PKRS
MSR. Therefore this implementation saves and restores the MSR across context
switches and during exceptions in software. Nested exceptions are supported
by giving each exception a new PKS state.

For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections because PTEs naturally have a pkey value of 0.

Other keys (1-15) are statically allocated by kernel consumers when
configured. This is done by adding the appropriate PKS_NEW_KEY and
PKS_DECLARE_INIT_VALUE macros to pks-keys.h.
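
For example, the PMEM use case later in this series (patch 29) allocates and
initializes its key in pks-keys.h roughly as follows:

	#define PKS_KEY_PGMAP_PROTECTION \
		PKS_NEW_KEY(PKS_KEY_DEFAULT, CONFIG_DEVMAP_ACCESS_PROTECTION)
	#define PKS_KEY_PGMAP_INIT \
		PKS_DECLARE_INIT_VALUE(PKS_KEY_PGMAP_PROTECTION, AD, \
				       CONFIG_DEVMAP_ACCESS_PROTECTION)

The config predicate means a key is only consumed when the corresponding
consumer is actually configured.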

Two PKS consumers, PKS_TEST and PMEM stray write protection, are included in
this series. When the number of users grows larger, the sharing of keys will
need to be resolved depending on the needs of the users at that time. Many
methods have been contemplated, but the number of kernel users and use cases
envisioned is still quite small, much less than the 15 available keys.

To summarize, the following are key attributes of PKS.

1) Fast switching of permissions
1a) Prevents access without page table manipulations
1b) No TLB flushes required
2) Works on a per thread basis, thus allowing protections to be
preserved on threads which are not actively accessing data through
the mapping.

PKS is available with both 4 and 5 level paging. For this reason, and for
simplicity of implementation, the feature is restricted to x86_64.


Use PKS to protect PMEM from stray writes [Patches 25-36]
---------------------------------------------------------

DAX leverages the direct-map to enable 'struct page' services for PMEM. Given
that PMEM capacity may be an order of magnitude higher than that of System RAM,
it presents a large vulnerability surface to stray writes. Such a stray write
becomes a silent data corruption bug.

Stray pointers to System RAM may result in a crash or other undesirable
behavior which, while unfortunate, is usually recoverable with a reboot.
Stray writes to PMEM are permanent in nature and thus are more likely to result
in permanent user data loss. Given that PMEM access from the kernel is limited
to a constrained set of locations (PMEM driver, Filesystem-DAX, direct-I/O, and
any properly kmap'ed page), it is amenable to PKS protection.

Set up an infrastructure for extra device access protection. Then implement the
protection using the new Protection Keys Supervisor (PKS) on architectures
which support it.

Because PMEM pages are all associated with a struct dev_pagemap, and because
flags in struct page are a scarce resource, the flag marking memory as
protected is stored in struct dev_pagemap. All PMEM is protected by the same
pkey, so a single flag in each dev_pagemap is all that is needed to indicate
protection.
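
For example, the device-dax patch later in this series (patch 36) opts in at
memremap_pages() time with:

	if (pgmap_protection_available())
		pgmap->flags |= PGMAP_PROTECTION;
	addr = devm_memremap_pages(dev, pgmap);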

General access in the kernel is supported by modifying the kmap infrastructure,
which can detect if a page is PKS protected and enable access until the
corresponding unmap is called.

Because PKS is a thread local mechanism and because kmap was never really
intended to create a long term mapping, this implementation does not support
the kmap()/kunmap() calls. Calling kmap() on a PMEM protected page is allowed
but accessing that mapping will cause a fault.

Originally this series modified many of the kmap call sites to indicate they
were thread local.[1] An attempt to support kmap()[2] was also made. But now
that kmap_local_page() has been developed[3] and is in more widespread use,
kmap() can safely be left unsupported.
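
The expected access pattern for a protected PMEM page is therefore the
thread-local mapping sequence below (a sketch only; the PKS window is opened
by the mapping call and closed by the corresponding unmap):

	void *addr = kmap_local_page(page);	/* enables access if the page is
						 * part of a protected dev_pagemap */
	memcpy(addr, src, len);
	kunmap_local(addr);			/* access disabled again */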

How the fault is handled is configurable via a new module parameter
memremap.pks_fault_mode. Two modes are supported.

'relaxed' (default) -- WARN_ONCE, disable the protection and allow
access

'strict' -- prevent any unguarded access to a protected dev_pagemap
range
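
For example, the stricter behavior can be requested on the kernel command
line with:

	memremap.pks_fault_mode=strict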

This 'safety valve' has already proven useful during the development of this
feature.


[1] https://lore.kernel.org/lkml/[email protected]/

[2] https://lore.kernel.org/lkml/[email protected]/

[3] https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/


----------------------------------------------------------------------------
Changes for V9

Review and update all commit messages.
Update cover letter below

PKS Core
Separate user and supervisor pkey code in the headers
create linux/pks.h for supervisor calls
This facilitated making the pmem code more efficient
Completely rearchitect the test code
[After Dave Hansen and Rick Edgecombe found issues in the test
code it was easier to rearchitect the code completely
rather than attempt to fix it.]
Remove pks_test_callback in favor of using fault hooks
Fault hooks also isolate the fault callbacks from being
false positives if non-test consumers are running
Make additional PKS_TEST_RUN_ALL Kconfig option which is
mutually exclusive to any non-test PKS consumer
PKS_TEST_RUN_ALL takes over all pkey callbacks
Ensure that each test runs within its own context and is
mutually exclusive from running while any other test is
running.
Ensure test session and context memory is cleaned up on file
close
Use pr_debug() and dynamic debug for in kernel debug messages
Enhance test_pks selftest
Add the ability to run all tests not just the context
switch test
Standardize output [PASS][FAIL][SKIP]
Add '-d' option which enables dynamic debug to see the kernel
debug messages

Incorporate feedback from Rick Edgecombe
Update all pkey types to u8
Fix up test code barriers
Move patch declaring PKS_INIT_VALUE ahead of the patch which enables
PKS so that PKS_INIT_VALUE can be used when pks_setup() is
first created
From Dan Williams
Use macros instead of an enum for a pkey allocation scheme
which is predicated on the config options of consumers
This almost worked perfectly. It required a bit of
tweaking to be able to allocate all of the keys.

From Dave Hansen
Reposition some code to be near/similar to user pkeys
s/pks_write_current/x86_pkrs_load
s/pks_saved_pkrs/pkrs
Update Documentation
s/PKR_{RW,AD,WD}_KEY/PKR_{RW,AD,WD}_MASK
Consistently use lower case for pkey
Update commit messages
Add Acks

PMEM Stray Write
Building on the change to the pks_mk_*() function rename
s/pgmap_mk_*/pgmap_set_*/
s/dax_mk_*/dax_set_*/
From Dan Williams
Avoid adding new dax operations by teaching dax_device about pgmap
Remove pgmap_protection_flag_invalid() patch (Just let
kmap'ings fail)

Changes for V8

Feedback from Thomas
* clean up noinstr mess
* Fix static PKEY allocation mess
* Ensure all functions are consistently named.
* Split up patches to do 1 thing per patch
* pkey_update_pkval() implementation
* Streamline the use of pks_write_pkrs() by not disabling preemption
- Leave this to the callers who require it.
- Use documentation and lockdep to prevent errors
* Clean up commit messages to explain in detail _why_ each patch is
there.

Feedback from Dave H.
* Leave out pks_mk_readonly() as it is not used by the PMEM use case

Feedback from Peter Anvin
* Replace pks_abandon_pkey() with pks_update_exception()
This is an even greater simplification in that it no longer
attempts to shield users from faults. The main use case for
abandoning a key was to allow a system to continue running even
with an error; this should be a rare event, so the performance
should not be an issue.

* Simplify ARCH_ENABLE_SUPERVISOR_PKEYS

* Update PKS Test code
- Add default value test
- Split up the test code into patches which follow each feature
addition
- simplify test code processing
- ensure consistent reporting of errors.

* Ensure all entry points to the PKS code are protected by
cpu_feature_enabled(X86_FEATURE_PKS)
- At the same time make sure non-entry points or sub-functions to the
PKS code are not _unnecessarily_ protected by the feature check

* Update documentation
- Use kernel docs to place the docs with the code for easier internal
developer use

* Adjust the PMEM use cases for the core changes

* Split the PMEM patches up to be 1 change per patch and help clarify review

* Review all header files and remove those no longer needed

* Review/update/clarify all commit messages

Fenghua Yu (1):
mm/pkeys: Define PKS page table macros

Ira Weiny (42):
Documentation/protection-keys: Clean up documentation for User Space
pkeys
x86/pkeys: Clarify PKRU_AD_KEY macro
x86/pkeys: Make PKRU macros generic
x86/fpu: Refactor arch_set_user_pkey_access()
mm/pkeys: Add Kconfig options for PKS
x86/pkeys: Add PKS CPU feature bit
x86/fault: Adjust WARN_ON for pkey fault
Documentation/pkeys: Add initial PKS documentation
mm/pkeys: Provide for PKS key allocation
x86/pkeys: Enable PKS on cpus which support it
x86/pkeys: Introduce pks_write_pkrs()
x86/pkeys: Preserve the PKS MSR on context switch
mm/pkeys: Introduce pks_set_readwrite()
mm/pkeys: Introduce pks_set_noaccess()
x86/entry: Add auxiliary pt_regs space
entry: Pass pt_regs to irqentry_exit_cond_resched()
entry: Add calls for save/restore auxiliary pt_regs
x86/entry: Define arch_{save|restore}_auxiliary_pt_regs()
x86/pkeys: Preserve PKRS MSR across exceptions
x86/fault: Print PKS MSR on fault
mm/pkeys: Introduce pks_update_exception()
mm/pkeys: Add pks_available()
memremap_pages: Add Kconfig for DEVMAP_ACCESS_PROTECTION
memremap_pages: Introduce pgmap_protection_available()
memremap_pages: Introduce a PGMAP_PROTECTION flag
memremap_pages: Introduce devmap_protected()
memremap_pages: Reserve a PKS pkey for eventual use by PMEM
memremap_pages: Set PKS pkey in PTEs if requested
memremap_pages: Define pgmap_set_{readwrite|noaccess}() calls
memremap_pages: Add memremap.pks_fault_mode
kmap: Make kmap work for devmap protected pages
dax: Stray access protection for dax_direct_access()
nvdimm/pmem: Enable stray access protection
devdax: Enable stray access protection
mm/pkeys: PKS testing, add initial test code
x86/selftests: Add test_pks
mm/pkeys: PKS testing, add a fault call back
mm/pkeys: PKS testing, add pks_set_*() tests
mm/pkeys: PKS testing, test context switching
mm/pkeys: PKS testing, Add exception test
mm/pkeys: PKS testing, test pks_update_exception()
mm/pkeys: PKS testing, add test for all keys

Rick Edgecombe (1):
mm/pkeys: Introduce PKS fault callbacks

.../admin-guide/kernel-parameters.txt | 12 +
Documentation/core-api/protection-keys.rst | 130 ++-
arch/arm64/include/asm/preempt.h | 2 +-
arch/arm64/kernel/entry-common.c | 4 +-
arch/x86/Kconfig | 6 +
arch/x86/entry/calling.h | 20 +
arch/x86/entry/common.c | 2 +-
arch/x86/entry/entry_64.S | 22 +
arch/x86/entry/entry_64_compat.S | 6 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/entry-common.h | 15 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable_types.h | 22 +
arch/x86/include/asm/pkeys.h | 2 +
arch/x86/include/asm/pkeys_common.h | 18 +
arch/x86/include/asm/pkru.h | 20 +-
arch/x86/include/asm/pks.h | 46 ++
arch/x86/include/asm/processor.h | 15 +-
arch/x86/include/asm/ptrace.h | 21 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/asm-offsets_64.c | 15 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/dumpstack.c | 32 +-
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/head_64.S | 6 +
arch/x86/kernel/process_64.c | 3 +
arch/x86/mm/fault.c | 17 +-
arch/x86/mm/pkeys.c | 320 +++++++-
drivers/dax/device.c | 2 +
drivers/dax/super.c | 60 ++
drivers/md/dm-writecache.c | 8 +-
drivers/nvdimm/pmem.c | 26 +
fs/dax.c | 8 +
fs/fuse/virtio_fs.c | 2 +
include/linux/dax.h | 5 +
include/linux/entry-common.h | 24 +-
include/linux/highmem-internal.h | 6 +
include/linux/memremap.h | 73 ++
include/linux/pgtable.h | 4 +
include/linux/pks-keys.h | 93 +++
include/linux/pks.h | 73 ++
include/linux/sched.h | 7 +
include/uapi/asm-generic/mman-common.h | 1 +
init/init_task.c | 3 +
kernel/entry/common.c | 29 +-
kernel/sched/core.c | 40 +-
lib/Kconfig.debug | 33 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 755 ++++++++++++++++++
mm/Kconfig | 32 +
mm/memremap.c | 132 +++
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/test_pks.c | 514 ++++++++++++
55 files changed, 2618 insertions(+), 112 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 arch/x86/include/asm/pks.h
create mode 100644 include/linux/pks-keys.h
create mode 100644 include/linux/pks.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c


base-commit: b2d229d4ddb17db541098b83524d901257e93845
prerequisite-patch-id: a73f5ec8b3ecec9c95724106ccb5999c4f955b89
--
2.35.1


2022-04-21 20:57:47

by Ira Weiny

Subject: [PATCH V10 29/44] memremap_pages: Reserve a PKS pkey for eventual use by PMEM

From: Ira Weiny <[email protected]>

Reserve a pkey for use by the memmap facility and set the default
protections to Access Disabled.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V10
This patch now reserves a key before the PKS testing does. So
adjust for this being the only key at this point in the series.

Changes for V9
Adjust for new key allocation
From Dave Hansen
use pkey
---
include/linux/pks-keys.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h
index c914afecb2d3..4e63c8061e55 100644
--- a/include/linux/pks-keys.h
+++ b/include/linux/pks-keys.h
@@ -60,17 +60,22 @@

/* PKS_KEY_DEFAULT must be 0 */
#define PKS_KEY_DEFAULT 0
-#define PKS_KEY_MAX PKS_NEW_KEY(PKS_KEY_DEFAULT, 1)
+#define PKS_KEY_PGMAP_PROTECTION \
+ PKS_NEW_KEY(PKS_KEY_DEFAULT, CONFIG_DEVMAP_ACCESS_PROTECTION)
+#define PKS_KEY_MAX PKS_NEW_KEY(PKS_KEY_PGMAP_PROTECTION, 1)

/* PKS_KEY_DEFAULT_INIT must be RW */
#define PKS_KEY_DEFAULT_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_DEFAULT, RW, 1)
+#define PKS_KEY_PGMAP_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_PGMAP_PROTECTION, \
+ AD, CONFIG_DEVMAP_ACCESS_PROTECTION)

#define PKS_ALL_AD_MASK \
GENMASK(PKS_NUM_PKEYS * PKR_BITS_PER_PKEY, \
PKS_KEY_MAX * PKR_BITS_PER_PKEY)

#define PKS_INIT_VALUE ((PKS_ALL_AD & PKS_ALL_AD_MASK) | \
- PKS_KEY_DEFAULT_INIT \
+ PKS_KEY_DEFAULT_INIT | \
+ PKS_KEY_PGMAP_INIT \
)

#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
--
2.35.1

2022-04-21 22:31:37

by Ira Weiny

Subject: [PATCH V10 04/44] x86/fpu: Refactor arch_set_user_pkey_access()

From: Ira Weiny <[email protected]>

Both PKU and PKS update their register values in the same way. They can
therefore share the update code.

Define a helper, pkey_update_pkval(), which will be used to support both
Protection Key User (PKU) and the new Protection Key for Supervisor
(PKS) in subsequent patches.
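
For illustration only, a PKS-side caller added later in the series follows
the same read/modify/write pattern (the accessor names below are purely
hypothetical placeholders, not part of this patch):

	/* Hypothetical PKS usage sketch; accessors are placeholders */
	u32 pkrs = read_current_pkrs();

	pkrs = pkey_update_pkval(pkrs, pkey, PKEY_DISABLE_ACCESS);
	write_current_pkrs(pkrs);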

pkey_update_pkval() contributed by Thomas

Acked-by: Dave Hansen <[email protected]>
Co-developed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Update for V8:
From Rick Edgecombe
Change pkey type to u8
Replace the code Peter provided in update_pkey_reg() for
Thomas' pkey_update_pkval()
-- https://lore.kernel.org/lkml/[email protected]/
---
arch/x86/include/asm/pkeys.h | 2 ++
arch/x86/kernel/fpu/xstate.c | 22 ++++------------------
arch/x86/mm/pkeys.c | 16 ++++++++++++++++
3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 1d5f14aff5f6..26616cbe19e2 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -131,4 +131,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
}

+u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits);
+
#endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e525bfee7e07..ea9207b12863 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -984,8 +984,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val)
{
- u32 old_pkru, new_pkru_bits = 0;
- int pkey_shift;
+ u32 pkru;

/*
* This check implies XSAVE support. OSPKE only gets
@@ -1002,22 +1001,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
return -EINVAL;

- /* Set the bits needed in PKRU: */
- if (init_val & PKEY_DISABLE_ACCESS)
- new_pkru_bits |= PKR_AD_BIT;
- if (init_val & PKEY_DISABLE_WRITE)
- new_pkru_bits |= PKR_WD_BIT;
-
- /* Shift the bits in to the correct place in PKRU for pkey: */
- pkey_shift = pkey * PKR_BITS_PER_PKEY;
- new_pkru_bits <<= pkey_shift;
-
- /* Get old PKRU and mask off any old bits in place: */
- old_pkru = read_pkru();
- old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
-
- /* Write old part along with new part: */
- write_pkru(old_pkru | new_pkru_bits);
+ pkru = read_pkru();
+ pkru = pkey_update_pkval(pkru, pkey, init_val);
+ write_pkru(pkru);

return 0;
}
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e1527b4619e1..7c90b2188c5f 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -193,3 +193,19 @@ static __init int setup_init_pkru(char *opt)
return 1;
}
__setup("init_pkru=", setup_init_pkru);
+
+/*
+ * Kernel users use the same flags as user space:
+ * PKEY_DISABLE_ACCESS
+ * PKEY_DISABLE_WRITE
+ */
+u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits)
+{
+ int shift = pkey * PKR_BITS_PER_PKEY;
+
+ if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
+ accessbits &= PKEY_ACCESS_MASK;
+
+ pkval &= ~(PKEY_ACCESS_MASK << shift);
+ return pkval | accessbits << shift;
+}
--
2.35.1

2022-04-22 01:35:56

by Ira Weiny

Subject: [PATCH V10 17/44] x86/entry: Add auxiliary pt_regs space

From: Ira Weiny <[email protected]>

The PKRS MSR is not managed by XSAVE. Therefore, in order to preserve the
MSR across an exception, the current CPU MSR value needs to be saved
somewhere during the exception and restored when returning to the previous
context.

Two possible places for preserving this state were considered,
irqentry_state_t or pt_regs.[1] pt_regs was much more complicated and
was potentially fraught with unintended consequences.[2] However, Andy
Lutomirski came up with a way to hide additional values on the stack
which could be accessed as "extended_pt_regs".[3] This method allows any
function with current access to pt_regs to obtain access to the extra
information without expanding the use of irqentry_state_t and leaving
pt_regs intact for compatibility with outside tools like BPF.

Prepare the assembly code to add a hidden auxiliary pt_regs space. To
simplify, the assembly code only adds space on the stack as defined by
the C code which needs it. The use of this space is left to the C code
which is required to select ARCH_HAS_PTREGS_AUXILIARY to enable this
support.

Each nested exception gets another copy of this auxiliary space allowing
for any number of levels of exception handling.

Initially the space is left empty and results in no code changes because
ARCH_HAS_PTREGS_AUXILIARY is not set. Subsequent patches adding data to
pt_regs_auxiliary must set ARCH_HAS_PTREGS_AUXILIARY or a build failure
will occur. The use of ARCH_HAS_PTREGS_AUXILIARY also avoids the
introduction of 2 instructions (addq/subq) on every entry call when the
extra space is not needed.
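
For example, when the PKS patches later in this series preserve the PKRS MSR
across exceptions, the structure is expected to grow roughly like the sketch
below (field name illustrative; the exact layout is defined by that later
patch), with the PKS Kconfig selecting ARCH_HAS_PTREGS_AUXILIARY:

	struct pt_regs_auxiliary {
		u32 pkrs;
	};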

32bit is specifically excluded as the current consumer of this, PKS,
will not support 32bit either.

Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or
aided in its development.

[1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/[email protected]/#t
[3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com/

Cc: Dave Hansen <[email protected]>
Cc: Dan Williams <[email protected]>
Suggested-by: Dave Hansen <[email protected]>
Suggested-by: Dan Williams <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9:
Update commit message

Changes for V8:
Exclude 32bit
Introduce ARCH_HAS_PTREGS_AUXILIARY to optimize this away when
not needed.
From Thomas
s/EXTENDED_PT_REGS_SIZE/PT_REGS_AUX_SIZE
Fix up PTREGS_AUX_SIZE macro to be based on the
structures and used in assembly code via the
nifty asm-offset macros
Bound calls into c code with [PUSH|POP]_RTREGS_AUXILIARY
instead of using a macro 'call'
Split this patch out and put the PKS specific stuff in a
separate patch

Changes for V7:
Rebased to 5.14 entry code
declare write_pkrs() in pks.h
s/INIT_PKRS_VALUE/pkrs_init_value
Remove unnecessary INIT_PKRS_VALUE def
s/pkrs_save_set_irq/pkrs_save_irq/
The initial value for exceptions is best managed
completely within the pkey code.
---
arch/x86/Kconfig | 4 ++++
arch/x86/entry/calling.h | 20 ++++++++++++++++++++
arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++
arch/x86/entry/entry_64_compat.S | 6 ++++++
arch/x86/include/asm/ptrace.h | 18 ++++++++++++++++++
arch/x86/kernel/asm-offsets_64.c | 15 +++++++++++++++
arch/x86/kernel/head_64.S | 6 ++++++
7 files changed, 91 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c53deda2ea25..69e611d3b8ef 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1889,6 +1889,10 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS

If unsure, say y.

+config ARCH_HAS_PTREGS_AUXILIARY
+ depends on X86_64
+ bool
+
choice
prompt "TSX enable mode"
depends on CPU_SUP_INTEL
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index a4c061fb7c6e..d0ebf9b069c9 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -63,6 +63,26 @@ For 32-bit we have the following conventions - kernel is built with
* for assembly code:
*/

+
+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+
+.macro PUSH_PTREGS_AUXILIARY
+ /* add space for pt_regs_auxiliary */
+ subq $PTREGS_AUX_SIZE, %rsp
+.endm
+
+.macro POP_PTREGS_AUXILIARY
+ /* remove space for pt_regs_auxiliary */
+ addq $PTREGS_AUX_SIZE, %rsp
+.endm
+
+#else
+
+#define PUSH_PTREGS_AUXILIARY
+#define POP_PTREGS_AUXILIARY
+
+#endif
+
.macro PUSH_REGS rdx=%rdx rax=%rax save_ret=0
.if \save_ret
pushq %rsi /* pt_regs->si */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4faac48ebec5..5a037a56814d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -335,7 +335,9 @@ SYM_CODE_END(ret_from_fork)
movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
.endif

+ PUSH_PTREGS_AUXILIARY
call \cfunc
+ POP_PTREGS_AUXILIARY

jmp error_return
.endm
@@ -440,7 +442,9 @@ SYM_CODE_START(\asmsym)

movq %rsp, %rdi /* pt_regs pointer */

+ PUSH_PTREGS_AUXILIARY
call \cfunc
+ POP_PTREGS_AUXILIARY

jmp paranoid_exit

@@ -502,7 +506,9 @@ SYM_CODE_START(\asmsym)
* stack.
*/
movq %rsp, %rdi /* pt_regs pointer */
+ PUSH_PTREGS_AUXILIARY
call vc_switch_off_ist
+ POP_PTREGS_AUXILIARY
movq %rax, %rsp /* Switch to new stack */

UNWIND_HINT_REGS
@@ -513,7 +519,9 @@ SYM_CODE_START(\asmsym)

movq %rsp, %rdi /* pt_regs pointer */

+ PUSH_PTREGS_AUXILIARY
call kernel_\cfunc
+ POP_PTREGS_AUXILIARY

/*
* No need to switch back to the IST stack. The current stack is either
@@ -549,7 +557,9 @@ SYM_CODE_START(\asmsym)
movq %rsp, %rdi /* pt_regs pointer into first argument */
movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/
movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
+ PUSH_PTREGS_AUXILIARY
call \cfunc
+ POP_PTREGS_AUXILIARY

/* For some configurations \cfunc ends up being a noreturn. */
REACHABLE
@@ -802,7 +812,9 @@ SYM_CODE_START_LOCAL(exc_xen_hypervisor_callback)
movq %rdi, %rsp /* we don't return, adjust the stack frame */
UNWIND_HINT_REGS

+ PUSH_PTREGS_AUXILIARY
call xen_pv_evtchn_do_upcall
+ POP_PTREGS_AUXILIARY

jmp error_return
SYM_CODE_END(exc_xen_hypervisor_callback)
@@ -1003,7 +1015,9 @@ SYM_CODE_START_LOCAL(error_entry)
/* Put us onto the real thread stack. */
popq %r12 /* save return addr in %12 */
movq %rsp, %rdi /* arg0 = pt_regs pointer */
+ PUSH_PTREGS_AUXILIARY
call sync_regs
+ POP_PTREGS_AUXILIARY
movq %rax, %rsp /* switch stack */
ENCODE_FRAME_POINTER
pushq %r12
@@ -1059,7 +1073,9 @@ SYM_CODE_START_LOCAL(error_entry)
* as if we faulted immediately after IRET.
*/
mov %rsp, %rdi
+ PUSH_PTREGS_AUXILIARY
call fixup_bad_iret
+ POP_PTREGS_AUXILIARY
mov %rax, %rsp
jmp .Lerror_entry_from_usermode_after_swapgs
SYM_CODE_END(error_entry)
@@ -1166,7 +1182,9 @@ SYM_CODE_START(asm_exc_nmi)

movq %rsp, %rdi
movq $-1, %rsi
+ PUSH_PTREGS_AUXILIARY
call exc_nmi
+ POP_PTREGS_AUXILIARY

/*
* Return back to user mode. We must *not* do the normal exit
@@ -1202,6 +1220,8 @@ SYM_CODE_START(asm_exc_nmi)
* +---------------------------------------------------------+
* | pt_regs |
* +---------------------------------------------------------+
+ * | (Optionally) pt_regs_extended |
+ * +---------------------------------------------------------+
*
* The "original" frame is used by hardware. Before re-enabling
* NMIs, we need to be done with it, and we need to leave enough
@@ -1380,7 +1400,9 @@ end_repeat_nmi:

movq %rsp, %rdi
movq $-1, %rsi
+ PUSH_PTREGS_AUXILIARY
call exc_nmi
+ POP_PTREGS_AUXILIARY

/* Always restore stashed CR3 value (see paranoid_entry) */
RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 4fdb007cddbd..cf6c88eb384d 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -137,7 +137,9 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
.Lsysenter_flags_fixed:

movq %rsp, %rdi
+ PUSH_PTREGS_AUXILIARY
call do_SYSENTER_32
+ POP_PTREGS_AUXILIARY
/* XEN PV guests always use IRET path */
ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -257,7 +259,9 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
UNWIND_HINT_REGS

movq %rsp, %rdi
+ PUSH_PTREGS_AUXILIARY
call do_fast_syscall_32
+ POP_PTREGS_AUXILIARY
/* XEN PV guests always use IRET path */
ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -415,6 +419,8 @@ SYM_CODE_START(entry_INT80_compat)
cld

movq %rsp, %rdi
+ PUSH_PTREGS_AUXILIARY
call do_int80_syscall_32
+ POP_PTREGS_AUXILIARY
jmp swapgs_restore_regs_and_return_to_usermode
SYM_CODE_END(entry_INT80_compat)
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 4357e0f2cd5f..0889045b3a6f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -2,6 +2,7 @@
#ifndef _ASM_X86_PTRACE_H
#define _ASM_X86_PTRACE_H

+#include <linux/container_of.h>
#include <asm/segment.h>
#include <asm/page_types.h>
#include <uapi/asm/ptrace.h>
@@ -91,6 +92,23 @@ struct pt_regs {
/* top of stack page */
};

+/*
+ * NOTE: Features which add data to pt_regs_auxiliary must select
+ * ARCH_HAS_PTREGS_AUXILIARY. Failure to do so will result in a build failure.
+ */
+struct pt_regs_auxiliary {
+};
+
+struct pt_regs_extended {
+ struct pt_regs_auxiliary aux;
+ struct pt_regs pt_regs __aligned(8);
+};
+
+static inline struct pt_regs_extended *to_extended_pt_regs(struct pt_regs *regs)
+{
+ return container_of(regs, struct pt_regs_extended, pt_regs);
+}
+
#endif /* !__i386__ */

#ifdef CONFIG_PARAVIRT
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index b14533af7676..66f08ac3507a 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -4,6 +4,7 @@
#endif

#include <asm/ia32.h>
+#include <asm/ptrace.h>

#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_PARAVIRT_SPINLOCKS)
#include <asm/kvm_para.h>
@@ -60,5 +61,19 @@ int main(void)
DEFINE(stack_canary_offset, offsetof(struct fixed_percpu_data, stack_canary));
BLANK();
#endif
+
+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+ /* Size of Auxiliary pt_regs data */
+ DEFINE(PTREGS_AUX_SIZE, sizeof(struct pt_regs_extended) -
+ sizeof(struct pt_regs));
+#else
+ /*
+ * Adding data to struct pt_regs_auxiliary requires setting
+ * ARCH_HAS_PTREGS_AUXILIARY
+ */
+ BUILD_BUG_ON((sizeof(struct pt_regs_extended) -
+ sizeof(struct pt_regs)) != 0);
+#endif
+
return 0;
}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index b8e3019547a5..00bc3a74efb7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -341,8 +341,10 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb)
movq %rsp, %rdi
movq ORIG_RAX(%rsp), %rsi
movq initial_vc_handler(%rip), %rax
+ PUSH_PTREGS_AUXILIARY
ANNOTATE_RETPOLINE_SAFE
call *%rax
+ POP_PTREGS_AUXILIARY

/* Unwind pt_regs */
POP_REGS
@@ -421,7 +423,9 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
UNWIND_HINT_REGS

movq %rsp,%rdi /* RDI = pt_regs; RSI is already trapnr */
+ PUSH_PTREGS_AUXILIARY
call do_early_exception
+ POP_PTREGS_AUXILIARY

decl early_recursion_flag(%rip)
jmp restore_regs_and_return_to_kernel
@@ -448,7 +452,9 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
/* Call C handler */
movq %rsp, %rdi
movq ORIG_RAX(%rsp), %rsi
+ PUSH_PTREGS_AUXILIARY
call do_vc_no_ghcb
+ POP_PTREGS_AUXILIARY

/* Unwind pt_regs */
POP_REGS
--
2.35.1

2022-04-22 06:02:00

by Ira Weiny

Subject: [PATCH V10 25/44] memremap_pages: Add Kconfig for DEVMAP_ACCESS_PROTECTION

From: Ira Weiny <[email protected]>

The persistent memory (PMEM) driver uses the memremap_pages facility to
provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM
capacity may be orders of magnitude higher than that of System RAM, it
presents a large vulnerability surface to stray writes. Unlike stray
writes to System RAM, which may result in a crash or other undesirable
behavior, stray writes to PMEM additionally are more likely to result in
permanent data loss. Reboot is not a remediation for PMEM corruption
like it is for System RAM.

Given that PMEM access from the kernel is limited to a constrained set
of locations (PMEM driver, Filesystem-DAX, and direct-I/O to a DAX
page), it is amenable to supervisor pkey protection.

Add a Kconfig option to configure additional devmap protections using
PKS.

Only PMEM which is advertised to the memory subsystem needs this
protection. Therefore, the feature depends on NVDIMM_PFN.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V10
Rebased to latest

Changes for V9
Change this to enable arch pks consumer for mutual exclusion
with testing all pkeys
From Dan Williams
Default to no
Clean up commit message

Changes for V8
Split this out from
[PATCH V7 13/18] memremap_pages: Add access protection via supervisor Protection Keys (PKS)
---
mm/Kconfig | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 29c272974aa9..fe1752e6e76c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -797,6 +797,24 @@ config ZONE_DEVICE

If FS_DAX is enabled, then say Y.

+config DEVMAP_ACCESS_PROTECTION
+ bool "Access protection for memremap_pages()"
+ depends on NVDIMM_PFN
+ depends on ARCH_HAS_SUPERVISOR_PKEYS
+ select ARCH_ENABLE_PKS_CONSUMER
+ default n
+
+ help
+ Enable extra protections on device memory. This protects against
+ unintended access to devices such as a stray writes. This feature is
+ particularly useful to protect against corruption of persistent
+ memory.
+
+ This depends on architecture support of supervisor PKeys and has no
+ overhead if the architecture does not support them.
+
+ If you have persistent memory say 'Y'.
+
#
# Helpers to mirror range of the CPU page tables of a process into device page
# tables.
--
2.35.1

2022-04-22 07:58:45

by Ira Weiny

Subject: [PATCH V10 01/44] Documentation/protection-keys: Clean up documentation for User Space pkeys

From: Ira Weiny <[email protected]>

The documentation for user space pkeys was a bit dated, including things
such as Amazon and distribution testing information which are no longer
relevant.

Update the documentation. This also streamlines adding the Supervisor
pkey documentation later on.

Cc: "Moger, Babu" <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9:
use pkey
Change information on which CPU's have PKU
---
Documentation/core-api/protection-keys.rst | 44 +++++++++++-----------
1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..bf28ac0401f3 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,31 +4,29 @@
Memory Protection Keys
======================

-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
+Memory Protection Keys provide a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables when an
+application changes protection domains.
+
+Pkeys Userspace (PKU) is a feature which can be found on:
+ * Intel server CPUs, Skylake and later
+ * Intel client CPUs, Tiger Lake (11th Gen Core) and later
+ * Future AMD CPUs
+
+Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys.
+
+Protections for each key are defined with a per-CPU user-accessible register
+(PKRU). Each of these is a 32-bit register storing two bits (Access Disable
+and Write Disable) for each of 16 keys.
+
+Being a CPU register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register. The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs. These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
+There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
+register. The feature is only available in 64-bit mode, even though there is
+theoretically space in the PAE PTEs. These permissions are enforced on data
+access only and have no effect on instruction fetches.

Syscalls
========
--
2.35.1

2022-04-22 09:03:44

by Ira Weiny

Subject: [PATCH V10 20/44] x86/entry: Define arch_{save|restore}_auxiliary_pt_regs()

From: Ira Weiny <[email protected]>

The x86 architecture supports the new auxiliary pt_regs space if
ARCH_HAS_PTREGS_AUXILIARY is enabled.

Define the callbacks within the x86 code required by the core entry code
when this support is enabled.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V8
New patch
---
arch/x86/include/asm/entry-common.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 43184640b579..5fa5dd2d539c 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -95,4 +95,16 @@ static __always_inline void arch_exit_to_user_mode(void)
}
#define arch_exit_to_user_mode arch_exit_to_user_mode

+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+
+static inline void arch_save_aux_pt_regs(struct pt_regs *regs)
+{
+}
+
+static inline void arch_restore_aux_pt_regs(struct pt_regs *regs)
+{
+}
+
+#endif
+
#endif
--
2.35.1

2022-04-22 11:22:45

by Ira Weiny

Subject: [PATCH V10 10/44] x86/pkeys: Enable PKS on cpus which support it

From: Ira Weiny <[email protected]>

Protection Keys for Supervisor pages (PKS) enables fast, hardware-thread
specific manipulation of permission restrictions on supervisor page
mappings. It uses a supervisor-specific MSR to assign permissions to
the pkeys.

When PKS is configured and the CPU supports PKS, initialize the MSR and
enable the hardware.

Add asm/pks.h to store new internal functions and structures such as
pks_setup().

Co-developed-by: Fenghua Yu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V10
Update to latest master branch

Changes for V9
Reword commit message
Move this after the patch defining PKS_INIT_VALUE

Changes for V8
Move setup_pks() into this patch with a default of all access
for all pkeys.
From Thomas
s/setup_pks/pks_setup/
Update Change log to better reflect exactly what this patch does.
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pks.h | 15 +++++++++++++++
arch/x86/include/uapi/asm/processor-flags.h | 2 ++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/mm/pkeys.c | 17 +++++++++++++++++
5 files changed, 37 insertions(+)
create mode 100644 arch/x86/include/asm/pks.h

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ee15311b6be1..e8e33b5ed507 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -809,6 +809,7 @@

#define MSR_IA32_TSC_DEADLINE 0x000006E0

+#define MSR_IA32_PKRS 0x000006E1

#define MSR_TSX_FORCE_ABORT 0x0000010F

diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
new file mode 100644
index 000000000000..8180fc59790b
--- /dev/null
+++ b/arch/x86/include/asm/pks.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PKS_H
+#define _ASM_X86_PKS_H
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void pks_setup(void);
+
+#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+static inline void pks_setup(void) { }
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#endif /* _ASM_X86_PKS_H */
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index c47cc7f2feeb..21b7783885b3 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -132,6 +132,8 @@
#define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT)
#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement Technology */
#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_PKS_BIT 24 /* enable Protection Keys for Supervisor */
+#define X86_CR4_PKS _BITUL(X86_CR4_PKS_BIT)

/*
* x86-64 Task Priority Register, CR8
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e342ae4db3c4..4c0623783bd8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -60,6 +60,7 @@
#include <asm/uv/uv.h>
#include <asm/sigframe.h>
#include <asm/traps.h>
+#include <asm/pks.h>

#include "cpu.h"

@@ -1764,6 +1765,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
x86_init_rdrand(c);
setup_pku(c);
setup_cet(c);
+ pks_setup();

/*
* Clear/Set all flags overridden by options, need do it
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 7c90b2188c5f..f904376570f4 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -6,6 +6,7 @@
#include <linux/debugfs.h> /* debugfs_create_u32() */
#include <linux/mm_types.h> /* mm_struct, vma, etc... */
#include <linux/pkeys.h> /* PKEY_* */
+#include <linux/pks-keys.h>
#include <uapi/asm-generic/mman-common.h>

#include <asm/cpufeature.h> /* boot_cpu_has, ... */
@@ -209,3 +210,19 @@ u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits)
pkval &= ~(PKEY_ACCESS_MASK << shift);
return pkval | accessbits << shift;
}
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+/*
+ * PKS is independent of PKU and either or both may be supported on a CPU.
+ */
+void pks_setup(void)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_PKS))
+ return;
+
+ wrmsrl(MSR_IA32_PKRS, PKS_INIT_VALUE);
+ cr4_set_bits(X86_CR4_PKS);
+}
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
--
2.35.1

2022-04-22 14:11:13

by Ira Weiny

Subject: [PATCH V10 12/44] x86/pkeys: Introduce pks_write_pkrs()

From: Ira Weiny <[email protected]>

Writing to MSRs is inefficient. Even though the underlying PKS
register, MSR_IA32_PKRS, is not serializing, writing to the MSR should
still be avoided if possible, especially when updates are made in critical
paths such as the scheduler or the entry code.

Introduce pks_write_pkrs(). pks_write_pkrs() avoids writing
MSR_IA32_PKRS if the pkrs value has not changed for the current CPU.
Most of the callers are in non-preemptible code paths. Therefore,
avoid calling preempt_{disable,enable}() to protect the per-cpu cache
and instead rely on outer calls for this protection. Do the same with
the checks for X86_FEATURE_PKS.

On startup, while unlikely, the PKS_INIT_VALUE may be 0. This would
prevent pks_write_pkrs() from updating the MSR because of the initial
value of the per-cpu cache. Therefore, keep the MSR write in
pks_setup() to ensure the MSR is initialized at least one time.
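
As a sketch of the expected calling convention (illustrative only; the real
call sites arrive with the context switch and exception patches), a caller
already running with preemption disabled and with PKS known to be enabled
simply does:

	/* Illustrative caller: preemption disabled, X86_FEATURE_PKS checked */
	pks_write_pkrs(new_pkrs);

and the WRMSR is skipped whenever the per-cpu cached value already matches.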

Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
From Dave Hansen
Update commit message with a bit more detail about why
this optimization is needed
Update the code comments as well.

Changes for V8
From Thomas
Remove get/put_cpu_ptr() and make this a 'lower level
call. This makes it preemption unsafe but it is called
mostly where preemption is already disabled. Add this
as a predicate of the call and those calls which need to
can disable preemption.
Add lockdep assert for preemption
Ensure MSR gets written even if the PKS_INIT_VALUE is 0.
Completely re-write the commit message.
s/write_pkrs/pks_write_pkrs/
Split this off into a singular patch

Changes for V7
Create a dynamic pkrs_initial_value in early init code.
Clean up comments
Add comment to macro guard
---
arch/x86/mm/pkeys.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index f904376570f4..10521f1a292e 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -213,15 +213,56 @@ u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits)

#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS

+static DEFINE_PER_CPU(u32, pkrs_cache);
+
+/*
+ * pks_write_pkrs() - Write the pkrs of the current CPU
+ * @new_pkrs: New value to write to the current CPU register
+ *
+ * Optimizes the MSR writes by maintaining a per cpu cache.
+ *
+ * Context: must be called with preemption disabled
+ * Context: must only be called if PKS is enabled
+ *
+ * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
+ * serializing but still maintains ordering properties similar to WRPKRU.
+ * The current SDM section on PKRS needs updating but should be the same as
+ * that of WRPKRU. Quote from the WRPKRU text:
+ *
+ * WRPKRU will never execute transiently. Memory accesses
+ * affected by PKRU register will not execute (even transiently)
+ * until all prior executions of WRPKRU have completed execution
+ * and updated the PKRU register.
+ */
+static inline void pks_write_pkrs(u32 new_pkrs)
+{
+ u32 pkrs = __this_cpu_read(pkrs_cache);
+
+ lockdep_assert_preemption_disabled();
+
+ if (pkrs != new_pkrs) {
+ __this_cpu_write(pkrs_cache, new_pkrs);
+ wrmsrl(MSR_IA32_PKRS, new_pkrs);
+ }
+}
+
/*
* PKS is independent of PKU and either or both may be supported on a CPU.
+ *
+ * Context: must be called with preemption disabled
*/
void pks_setup(void)
{
if (!cpu_feature_enabled(X86_FEATURE_PKS))
return;

+ /*
+ * If the PKS_INIT_VALUE is 0 then pks_write_pkrs() will fail to
+ * initialize the MSR. Do a single write here to ensure the MSR is
+ * written at least one time.
+ */
wrmsrl(MSR_IA32_PKRS, PKS_INIT_VALUE);
+ pks_write_pkrs(PKS_INIT_VALUE);
cr4_set_bits(X86_CR4_PKS);
}

--
2.35.1

2022-04-22 14:25:52

by Ira Weiny

Subject: [PATCH V10 28/44] memremap_pages: Introduce devmap_protected()

From: Ira Weiny <[email protected]>

Consumers of protected dev_pagemaps can check the PGMAP_PROTECTION flag to
see if the devmap is protected. However, most contexts will have a struct
page rather than the dev_pagemap structure directly.

Define devmap_protected() to determine if a page is part of a
dev_pagemap mapping and if the page is protected by additional
protections.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V10
Move code from mm.h to memremap.h
Upstream separated memremap.h functionality from mm.h
dc90f0846df4 ("mm: don't include <linux/memremap.h> in <linux/mm.h>")
---
include/linux/memremap.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 02c415b1b278..6325f00096ec 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -223,6 +223,23 @@ static inline bool pgmap_protection_available(void)
return pks_available();
}

+DECLARE_STATIC_KEY_FALSE(dev_pgmap_protection_static_key);
+
+/*
+ * devmap_protected() requires a reference on the page to ensure there is no
+ * races with dev_pagemap tear down.
+ */
+static inline bool devmap_protected(struct page *page)
+{
+ if (!static_branch_unlikely(&dev_pgmap_protection_static_key))
+ return false;
+ if (!is_zone_device_page(page))
+ return false;
+ if (page->pgmap->flags & PGMAP_PROTECTION)
+ return true;
+ return false;
+}
+
#else

static inline bool pgmap_protection_available(void)
--
2.35.1

2022-04-22 17:53:10

by Ira Weiny

Subject: [PATCH V10 27/44] memremap_pages: Introduce a PGMAP_PROTECTION flag

From: Ira Weiny <[email protected]>

The persistent memory (PMEM) driver uses the memremap_pages facility to
provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM
capacity may be orders of magnitude higher than that of System RAM, it
presents a large vulnerability surface to stray writes. Unlike stray
writes to System RAM, which may result in a crash or other undesirable
behavior, stray writes to PMEM additionally are more likely to result in
permanent data loss. Reboot is not a remediation for PMEM corruption
like it is for System RAM.

Given that PMEM access from the kernel is limited to a constrained set
of locations (PMEM driver, Filesystem-DAX, and direct-I/O to a DAX
page), it is amenable to supervisor pkey protection.

Some systems which have configured DEVMAP_ACCESS_PROTECTION may not have
PMEM installed. Or the PMEM may not be mapped into the direct map. In
addition, some callers of memremap_pages() will not want the mapped
pages protected.

Define a new PGMAP flag to distinguish page maps which are protected.
Use this flag to enable runtime protection support. A static key is
used to optimize the runtime support.

Specifying this flag on a system which can't support protections will
fail. Callers are expected to check if protections are supported via
pgmap_protection_available(). An alternative was considered in which callers
specify the flag and then check whether the returned dev_pagemap object was
protected or not, but that was judged less efficient than a direct check
beforehand.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
Clean up commit message

Changes for V8
Split this out into its own patch
---
include/linux/memremap.h | 1 +
mm/memremap.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 7980d0db8617..02c415b1b278 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -83,6 +83,7 @@ struct dev_pagemap_ops {
};

#define PGMAP_ALTMAP_VALID (1 << 0)
+#define PGMAP_PROTECTION (1 << 1)

/**
* struct dev_pagemap - metadata for ZONE_DEVICE mappings
diff --git a/mm/memremap.c b/mm/memremap.c
index af0223605e69..4dfb3025cee3 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -62,6 +62,37 @@ static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
}
#endif /* CONFIG_FS_DAX */

+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+
+/*
+ * Note; all devices which have asked for protections share the same key. The
+ * key may, or may not, have been provided by the core. If not, protection
+ * will be disabled. The key acquisition is attempted when the first ZONE
+ * DEVICE requests it and freed when all zones have been unmapped.
+ *
+ * Also this must be EXPORT_SYMBOL rather than EXPORT_SYMBOL_GPL because it is
+ * intended to be used in the kmap API.
+ */
+DEFINE_STATIC_KEY_FALSE(dev_pgmap_protection_static_key);
+EXPORT_SYMBOL(dev_pgmap_protection_static_key);
+
+static void devmap_protection_enable(void)
+{
+ static_branch_inc(&dev_pgmap_protection_static_key);
+}
+
+static void devmap_protection_disable(void)
+{
+ static_branch_dec(&dev_pgmap_protection_static_key);
+}
+
+#else /* !CONFIG_DEVMAP_ACCESS_PROTECTION */
+
+static void devmap_protection_enable(void) { }
+static void devmap_protection_disable(void) { }
+
+#endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */
+
static void pgmap_array_delete(struct range *range)
{
xa_store_range(&pgmap_array, PHYS_PFN(range->start), PHYS_PFN(range->end),
@@ -148,6 +179,9 @@ void memunmap_pages(struct dev_pagemap *pgmap)

WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
devmap_managed_enable_put(pgmap);
+
+ if (pgmap->flags & PGMAP_PROTECTION)
+ devmap_protection_disable();
}
EXPORT_SYMBOL_GPL(memunmap_pages);

@@ -295,6 +329,12 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
return ERR_PTR(-EINVAL);

+ if (pgmap->flags & PGMAP_PROTECTION) {
+ if (!pgmap_protection_available())
+ return ERR_PTR(-EINVAL);
+ devmap_protection_enable();
+ }
+
switch (pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
--
2.35.1

2022-04-22 17:53:52

by Ira Weiny

Subject: [PATCH V10 07/44] x86/fault: Adjust WARN_ON for pkey fault

From: Ira Weiny <[email protected]>

Previously, if a protection key fault occurred on a kernel address, it
indicated something was wrong because user page mappings are not supposed to
be in the kernel address space.

With the addition of PKS, pkey faults may now happen on kernel mappings.

If PKS is enabled, avoid the warning in the fault path. Simplify the
comment.

Cc: Sean Christopherson <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
From Dave Hansen
Clarify the comment and commit message
---
arch/x86/mm/fault.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d0074c6ed31a..5599109d1124 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1148,11 +1148,11 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
unsigned long address)
{
/*
- * Protection keys exceptions only happen on user pages. We
- * have no user pages in the kernel portion of the address
- * space, so do not expect them here.
+ * PF_PF faults should only occur on kernel
+ * addresses when supervisor pkeys are enabled.
*/
- WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+ WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) &&
+ (hw_error_code & X86_PF_PK));

#ifdef CONFIG_X86_32
/*
--
2.35.1

2022-04-22 17:56:01

by Ira Weiny

Subject: [PATCH V10 43/44] mm/pkeys: PKS testing, test pks_update_exception()

From: Ira Weiny <[email protected]>

A common use case for the custom fault callbacks will be for the
callback to warn of the violation and relax the permissions rather than
crash the kernel. pks_update_exception() was added for this purpose.

Add a test which uses pks_update_exception() to clear the pkey
permissions. Verify that the permissions are changed in the interrupted
thread.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
Update the commit message
Clean up test name
Add test_pks support
s/pks_mk_*/pks_set_*/
Simplify the use of globals for the faults
From Rick Edgecombe
Use WRITE_ONCE to protect against races with the fault
handler
s/RUN_FAULT_ABANDON/RUN_FAULT_CALLBACK

Changes for V8
New test developed just to double check for regressions while
reworking the code.
---
lib/pks/pks_test.c | 60 ++++++++++++++++++++++++++
tools/testing/selftests/x86/test_pks.c | 5 ++-
2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
index 762f4a19cb7d..a9cd2a49abfa 100644
--- a/lib/pks/pks_test.c
+++ b/lib/pks/pks_test.c
@@ -49,6 +49,7 @@
#define ARM_CTX_SWITCH 2
#define CHECK_CTX_SWITCH 3
#define RUN_EXCEPTION 4
+#define RUN_EXCEPTION_UPDATE 5
#define RUN_CRASH_TEST 9

DECLARE_PER_CPU(u32, pkrs_cache);
@@ -64,6 +65,7 @@ struct pks_test_ctx {
void *test_page;
bool fault_seen;
bool validate_exp_handling;
+ bool validate_update_exp;
};

static bool check_pkey_val(u32 pk_reg, u8 pkey, u32 expected)
@@ -164,6 +166,16 @@ static void validate_exception(struct pks_test_ctx *ctx, u32 thread_pkrs)
}
}

+static bool handle_update_exception(struct pt_regs *regs, struct pks_test_ctx *ctx)
+{
+ pr_debug("Updating pkey %d during exception\n", ctx->pkey);
+
+ ctx->fault_seen = true;
+ pks_update_exception(regs, ctx->pkey, 0);
+
+ return true;
+}
+
/* Global data protected by test_run_lock */
struct pks_test_ctx *g_ctx_under_test;

@@ -190,6 +202,9 @@ bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address,
if (!g_ctx_under_test)
return false;

+ if (g_ctx_under_test->validate_update_exp)
+ return handle_update_exception(regs, g_ctx_under_test);
+
if (g_ctx_under_test->validate_exp_handling) {
validate_exception(g_ctx_under_test, pkrs);
/*
@@ -518,6 +533,47 @@ static void check_ctx_switch(struct pks_session_data *sd)
}
}

+static bool run_exception_update(struct pks_session_data *sd)
+{
+ struct pks_test_ctx *ctx;
+
+ ctx = alloc_ctx(PKS_KEY_TEST);
+ if (IS_ERR(ctx))
+ return false;
+
+ set_ctx_data(sd, ctx);
+
+ ctx->fault_seen = false;
+ ctx->validate_update_exp = true;
+ pks_set_noaccess(ctx->pkey);
+
+ set_context_for_fault(ctx);
+
+ /* fault */
+ memcpy(ctx->test_page, ctx->data, 8);
+
+ if (!ctx->fault_seen) {
+ pr_err("Failed to see the callback\n");
+ return false;
+ }
+
+ ctx->fault_seen = false;
+ ctx->validate_update_exp = false;
+
+ set_context_for_fault(ctx);
+
+ /* no fault */
+ memcpy(ctx->test_page, ctx->data, 8);
+
+ if (ctx->fault_seen) {
+ pr_err("Pkey %d failed to be set RD/WR in the callback\n",
+ ctx->pkey);
+ return false;
+ }
+
+ return true;
+}
+
static ssize_t pks_read_file(struct file *file, char __user *user_buf,
size_t count, loff_t *ppos)
{
@@ -584,6 +640,10 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf,
pr_debug("Exception checking\n");
sd->last_test_pass = run_exception_test(file->private_data);
break;
+ case RUN_EXCEPTION_UPDATE:
+ pr_debug("Fault clear test\n");
+ sd->last_test_pass = run_exception_update(file->private_data);
+ break;
default:
pr_debug("Unknown test\n");
sd->last_test_pass = false;
diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c
index c40035803e38..194c9dd9a211 100644
--- a/tools/testing/selftests/x86/test_pks.c
+++ b/tools/testing/selftests/x86/test_pks.c
@@ -36,6 +36,7 @@
#define ARM_CTX_SWITCH "2"
#define CHECK_CTX_SWITCH "3"
#define RUN_EXCEPTION "4"
+#define RUN_EXCEPTION_UPDATE "5"
#define RUN_CRASH_TEST "9"

time_t g_start_time;
@@ -63,6 +64,7 @@ enum {
TEST_SINGLE,
TEST_CTX_SWITCH,
TEST_EXCEPTION,
+ TEST_FAULT_CALLBACK,
MAX_TESTS,
} tests;

@@ -77,7 +79,8 @@ struct test_item {
{ "check_defaults", CHECK_DEFAULTS, do_simple_test },
{ "single", RUN_SINGLE, do_simple_test },
{ "context_switch", ARM_CTX_SWITCH, do_context_switch },
- { "exception", RUN_EXCEPTION, do_simple_test }
+ { "exception", RUN_EXCEPTION, do_simple_test },
+ { "exception_update", RUN_EXCEPTION_UPDATE, do_simple_test }
};

static char *get_test_name(int test_num)
--
2.35.1

2022-04-22 18:01:30

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 36/44] devdax: Enable stray access protection

From: Ira Weiny <[email protected]>

Device dax is primarily accessed through user space and kernel access is
controlled through the kmap interfaces.

Now that all valid kernel-initiated accesses to dax devices have been
accounted for, turn on PGMAP_PROTECTION for device dax.

Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
Add Review tag

Changes for V8
Rebase to 5.17-rc1
Use pgmap_protection_available()
s/PGMAP_PKEYS_PROTECT/PGMAP_PROTECTION/
---
drivers/dax/device.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5494d745ced5..045854ba3855 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -451,6 +451,8 @@ int dev_dax_probe(struct dev_dax *dev_dax)
if (dev_dax->align > PAGE_SIZE)
pgmap->vmemmap_shift =
order_base_2(dev_dax->align >> PAGE_SHIFT);
+ if (pgmap_protection_available())
+ pgmap->flags |= PGMAP_PROTECTION;
addr = devm_memremap_pages(dev, pgmap);
if (IS_ERR(addr))
return PTR_ERR(addr);
--
2.35.1

2022-04-22 18:18:40

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 35/44] nvdimm/pmem: Enable stray access protection

From: Ira Weiny <[email protected]>

The persistent memory (PMEM) driver uses the memremap_pages facility to
provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM
capacity may be orders of magnitude larger than System RAM, it presents
a large vulnerability surface to stray writes. Unlike stray writes to
System RAM, which may result in a crash or other undesirable behavior,
stray writes to PMEM are more likely to result in permanent data loss.
A reboot is not a remediation for PMEM corruption the way it is for
System RAM.

Now that all valid kernel accesses to PMEM have been annotated with
{__}pgmap_set_{readwrite,noaccess}(), PGMAP_PROTECTION is safe to enable
in the pmem layer.

Set PGMAP_PROTECTION if pgmap protections are available, and set the
pgmap property of the dax device for its use.

Internally, the pmem driver uses a cached virtual address,
pmem->virt_addr (pmem_addr). Call __pgmap_set_{readwrite,noaccess}()
directly when PGMAP_PROTECTION is active on those mappings.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
Remove the dax operations and pass the pgmap to the dax_device
for its use.
s/pgmap_mk_*/pgmap_set_*/
s/pmem_mk_*/pmem_set_*/

Changes for V8
Rebase to 5.17-rc1
Remove global param
Add internal structure which uses the pmem device and pgmap
device directly in the *_mk_*() calls.
Add pmem dax ops callbacks
Use pgmap_protection_available()
s/PGMAP_PKEY_PROTECT/PGMAP_PROTECTION
---
drivers/nvdimm/pmem.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..2c7b18da7974 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -138,6 +138,18 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
return BLK_STS_OK;
}

+static void pmem_set_readwrite(struct pmem_device *pmem)
+{
+ if (pmem->pgmap.flags & PGMAP_PROTECTION)
+ __pgmap_set_readwrite(&pmem->pgmap);
+}
+
+static void pmem_set_noaccess(struct pmem_device *pmem)
+{
+ if (pmem->pgmap.flags & PGMAP_PROTECTION)
+ __pgmap_set_noaccess(&pmem->pgmap);
+}
+
static blk_status_t pmem_do_read(struct pmem_device *pmem,
struct page *page, unsigned int page_off,
sector_t sector, unsigned int len)
@@ -149,7 +161,11 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
return BLK_STS_IOERR;

+ /* Enable direct use of pmem->virt_addr */
+ pmem_set_readwrite(pmem);
rc = read_pmem(page, page_off, pmem_addr, len);
+ pmem_set_noaccess(pmem);
+
flush_dcache_page(page);
return rc;
}
@@ -181,11 +197,15 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
* after clear poison.
*/
flush_dcache_page(page);
+
+ /* Enable direct use of pmem->virt_addr */
+ pmem_set_readwrite(pmem);
write_pmem(pmem_addr, page, page_off, len);
if (unlikely(bad_pmem)) {
rc = pmem_clear_poison(pmem, pmem_off, len);
write_pmem(pmem_addr, page, page_off, len);
}
+ pmem_set_noaccess(pmem);

return rc;
}
@@ -427,6 +447,8 @@ static int pmem_attach_disk(struct device *dev,
pmem->pfn_flags = PFN_DEV;
if (is_nd_pfn(dev)) {
pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
+ if (pgmap_protection_available())
+ pmem->pgmap.flags |= PGMAP_PROTECTION;
addr = devm_memremap_pages(dev, &pmem->pgmap);
pfn_sb = nd_pfn->pfn_sb;
pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
@@ -440,6 +462,8 @@ static int pmem_attach_disk(struct device *dev,
pmem->pgmap.range.end = res->end;
pmem->pgmap.nr_range = 1;
pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
+ if (pgmap_protection_available())
+ pmem->pgmap.flags |= PGMAP_PROTECTION;
addr = devm_memremap_pages(dev, &pmem->pgmap);
pmem->pfn_flags |= PFN_MAP;
bb_range = pmem->pgmap.range;
@@ -481,6 +505,8 @@ static int pmem_attach_disk(struct device *dev,
}
set_dax_nocache(dax_dev);
set_dax_nomc(dax_dev);
+ if (pmem->pgmap.flags & PGMAP_PROTECTION)
+ set_dax_pgmap(dax_dev, &pmem->pgmap);
if (is_nvdimm_sync(nd_region))
set_dax_synchronous(dax_dev);
rc = dax_add_host(dax_dev, disk);
--
2.35.1

2022-04-22 19:31:44

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 02/44] x86/pkeys: Clarify PKRU_AD_KEY macro

From: Ira Weiny <[email protected]>

When changing the PKRU_AD_KEY macro to be used for PKS the name came
into question.[1]

The intent of PKRU_AD_KEY is to set an initial value for the PKRU
register but that is just a mask value.

Clarify this by changing the name to PKRU_AD_MASK().
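
For example, with PKRU_AD_BIT == 0x1 and PKRU_BITS_PER_PKEY == 2, the
macro expands to a simple mask (the values below are for illustration
only):

    PKRU_AD_MASK(1)  == 0x1 << (1 * 2)   == 0x00000004
    PKRU_AD_MASK(15) == 0x1 << (15 * 2)  == 0x40000000

OR'ing the masks for keys 1-15 yields the default init_pkru_value of
0x55555554, i.e. Access Disable set for every key except key 0.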

NOTE the checkpatch errors are ignored for the init_pkru_value to align
the values in the code.

[1] https://lore.kernel.org/lkml/[email protected]/

Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
New Patch
---
arch/x86/mm/pkeys.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e44e938885b7..7418c367e328 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -110,7 +110,7 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
return vma_pkey(vma);
}

-#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
+#define PKRU_AD_MASK(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))

/*
* Make the default PKRU value (at execve() time) as restrictive
@@ -118,11 +118,14 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
* in the process's lifetime will not accidentally get access
* to data which is pkey-protected later on.
*/
-u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
- PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
- PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
- PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
- PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
+u32 init_pkru_value = PKRU_AD_MASK( 1) | PKRU_AD_MASK( 2) |
+ PKRU_AD_MASK( 3) | PKRU_AD_MASK( 4) |
+ PKRU_AD_MASK( 5) | PKRU_AD_MASK( 6) |
+ PKRU_AD_MASK( 7) | PKRU_AD_MASK( 8) |
+ PKRU_AD_MASK( 9) | PKRU_AD_MASK(10) |
+ PKRU_AD_MASK(11) | PKRU_AD_MASK(12) |
+ PKRU_AD_MASK(13) | PKRU_AD_MASK(14) |
+ PKRU_AD_MASK(15);

static ssize_t init_pkru_read_file(struct file *file, char __user *user_buf,
size_t count, loff_t *ppos)
--
2.35.1

2022-04-22 20:00:40

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 31/44] memremap_pages: Define pgmap_set_{readwrite|noaccess}() calls

From: Ira Weiny <[email protected]>

A thread that wants to access memory protected by PGMAP protections must
first enable access, and then disable access when it is done.

Introduce pgmap_set_{readwrite|noaccess}() for this purpose. The two
calls are intended to be used by the kmap API and take a struct page for
convenience. They determine if the page is protected and, if so,
perform the requested operation.

Toggling between Read/Write and No Access was chosen as it fits well
with the accessibility of a kmap'ed page. Discussions did occur
regarding making a finer grained mapping for Read Only but that is
something which can be added at a later date.

In addition, two lower level functions are exported. They take the
dev_pagemap object directly for internal consumers which already have
knowledge of the dev_pagemap.

All changes in the protections must be through the above calls. They
abstract the protection implementation (currently the PKS API) from
upper layer consumers.

The calls are made nestable by the use of a per task reference count.
This ensures that an inner call to re-enable protection does not 'break'
an outer access to the device memory which is still in progress.
Expansion of the task struct is unavoidable due to the desire to keep
kmap_local_page() non-atomic and migratable. The only other option
considered for tracking the reference count was a per-cpu variable;
however, that would make kmap_local_page() equivalent to kmap_atomic(),
which is undesirable.

Access to device memory during exceptions (#PF) is expected only from
user faults. Therefore there is no need to maintain the reference count
during exceptions.

NOTE: It is not anticipated that any code path will directly nest these
calls. For this reason multiple reviewers, including Dan and Thomas,
asked why this reference counting was needed at this level rather than
in a higher level call such as kmap_local_page(). The reason is that
pgmap_set_readwrite() can nest with kmap_{atomic,local_page}().
Therefore this reference counting is pushed to the lower level to ensure
that any combination of calls is nestable.
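
For illustration, a minimal sketch of the nesting the per task count has
to support, assuming the kmap_local_page() hooks added elsewhere in this
series call pgmap_set_{readwrite,noaccess}() (pg1/pg2 are protected
pgmap pages):

    void *a = kmap_local_page(pg1); /* __pgmap_set_readwrite(): count 0 -> 1 */
    void *b = kmap_local_page(pg2); /* count 1 -> 2, no extra PKRS update */
    memcpy(a, b, PAGE_SIZE);
    kunmap_local(b);                /* __pgmap_set_noaccess(): count 2 -> 1,
                                     * 'a' stays writable */
    kunmap_local(a);                /* count 1 -> 0, protection restored */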

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V10
Move code from mm.h to memremap.h
Upstream separated memremap.h functionality from mm.h
dc90f0846df4 ("mm: don't include <linux/memremap.h> in <linux/mm.h>")

Changes for V9
From Dan Williams
Update the commit message with details on why the thread
struct needs to be expanded.
Following on Dave Hansens suggestion for pks_mk
s/pgmap_mk_*/pgmap_set_*/

Changes for V8
Split these functions into their own patch.
This helps to clarify the commit message and usage.
---
include/linux/memremap.h | 35 +++++++++++++++++++++++++++++++++++
include/linux/sched.h | 7 +++++++
init/init_task.c | 3 +++
mm/memremap.c | 14 ++++++++++++++
4 files changed, 59 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 6325f00096ec..1012c6c4c664 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -240,8 +240,43 @@ static inline bool devmap_protected(struct page *page)
return false;
}

+void __pgmap_set_readwrite(struct dev_pagemap *pgmap);
+void __pgmap_set_noaccess(struct dev_pagemap *pgmap);
+
+static inline bool pgmap_check_pgmap_prot(struct page *page)
+{
+ if (!devmap_protected(page))
+ return false;
+
+ /*
+ * There is no known use case to change permissions in an irq for pgmap
+ * pages
+ */
+ lockdep_assert_in_irq();
+ return true;
+}
+
+static inline void pgmap_set_readwrite(struct page *page)
+{
+ if (!pgmap_check_pgmap_prot(page))
+ return;
+ __pgmap_set_readwrite(page->pgmap);
+}
+
+static inline void pgmap_set_noaccess(struct page *page)
+{
+ if (!pgmap_check_pgmap_prot(page))
+ return;
+ __pgmap_set_noaccess(page->pgmap);
+}
+
#else

+static inline void __pgmap_set_readwrite(struct dev_pagemap *pgmap) { }
+static inline void __pgmap_set_noaccess(struct dev_pagemap *pgmap) { }
+static inline void pgmap_set_readwrite(struct page *page) { }
+static inline void pgmap_set_noaccess(struct page *page) { }
+
static inline bool pgmap_protection_available(void)
{
return false;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..7da0d2a0ac74 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1498,6 +1498,13 @@ struct task_struct {
struct callback_head l1d_flush_kill;
#endif

+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+ /*
+ * NOTE: pgmap_prot_count is modified within a single thread of
+ * execution. So it does not need to be atomic_t.
+ */
+ u32 pgmap_prot_count;
+#endif
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f03511a..948b32cf8139 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -209,6 +209,9 @@ struct task_struct init_task
#ifdef CONFIG_SECCOMP_FILTER
.seccomp = { .filter_count = ATOMIC_INIT(0) },
#endif
+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+ .pgmap_prot_count = 0,
+#endif
};
EXPORT_SYMBOL(init_task);

diff --git a/mm/memremap.c b/mm/memremap.c
index 215ab9c51917..491bb49255ae 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -94,6 +94,20 @@ static void devmap_protection_disable(void)
static_branch_dec(&dev_pgmap_protection_static_key);
}

+void __pgmap_set_readwrite(struct dev_pagemap *pgmap)
+{
+ if (!current->pgmap_prot_count++)
+ pks_set_readwrite(PKS_KEY_PGMAP_PROTECTION);
+}
+EXPORT_SYMBOL_GPL(__pgmap_set_readwrite);
+
+void __pgmap_set_noaccess(struct dev_pagemap *pgmap)
+{
+ if (!--current->pgmap_prot_count)
+ pks_set_noaccess(PKS_KEY_PGMAP_PROTECTION);
+}
+EXPORT_SYMBOL_GPL(__pgmap_set_noaccess);
+
#else /* !CONFIG_DEVMAP_ACCESS_PROTECTION */

static void devmap_protection_enable(void) { }
--
2.35.1

2022-04-22 21:45:05

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 24/44] mm/pkeys: Add pks_available()

From: Ira Weiny <[email protected]>

If PKS is configured in the kernel but the CPU does not support PKS,
the PKS calls remain safe to execute; they simply provide no protection.
However, adding the overhead of these calls on CPUs which don't support
PKS is inefficient and best avoided.

Define pks_available() to allow users to check if PKS is enabled on the
current system.

The implementation of pks_available() is placed in the asm headers and
exposed via linux/pks.h so that consumers outside of the architecture
code still get an inline call to cpu_feature_enabled().
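
As a usage sketch, a consumer outside of arch code could gate its
optional setup on the call; PKS_KEY_MY_DRIVER below is illustrative and
not a key defined by this series:

    #include <linux/pks.h>

    if (pks_available())
            pks_set_noaccess(PKS_KEY_MY_DRIVER);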

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V9
Driven by a request by Dan Williams to make this static inline
Place this in pks.h to avoid header conflicts while
allowing for an optimized call to cpu_feature_enabled()

Changes for V8
s/pks_enabled/pks_available
---
Documentation/core-api/protection-keys.rst | 3 +++
arch/x86/include/asm/pks.h | 12 ++++++++++++
include/linux/pks.h | 8 ++++++++
3 files changed, 23 insertions(+)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index c5f0f5d39929..47bcb38fff4f 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -152,6 +152,9 @@ Changing permissions of individual keys
.. kernel-doc:: arch/x86/mm/pkeys.c
:identifiers: pks_update_exception

+.. kernel-doc:: arch/x86/include/asm/pks.h
+ :identifiers: pks_available
+
Overriding Default Fault Behavior
---------------------------------

diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index de67d5b5a2af..cab42aadea07 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -2,8 +2,20 @@
#ifndef _ASM_X86_PKS_H
#define _ASM_X86_PKS_H

+#include <asm/cpufeature.h>
+
#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS

+/**
+ * pks_available() - Is PKS available on this system
+ *
+ * Return: true if PKS is currently supported and enabled on this system.
+ */
+static inline bool pks_available(void)
+{
+ return cpu_feature_enabled(X86_FEATURE_PKS);
+}
+
void pks_setup(void);
void x86_pkrs_load(struct thread_struct *thread);
void pks_save_pt_regs(struct pt_regs *regs);
diff --git a/include/linux/pks.h b/include/linux/pks.h
index 2ea5fb57f2dc..151a3fda9de4 100644
--- a/include/linux/pks.h
+++ b/include/linux/pks.h
@@ -8,6 +8,9 @@

#include <uapi/asm-generic/mman-common.h>

+#include <asm/pks.h>
+
+bool pks_available(void);
void pks_update_protection(u8 pkey, u8 protection);
void pks_update_exception(struct pt_regs *regs, u8 pkey, u8 protection);

@@ -40,6 +43,11 @@ typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long address,

#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */

+static inline bool pks_available(void)
+{
+ return false;
+}
+
static inline void pks_set_noaccess(u8 pkey) {}
static inline void pks_set_readwrite(u8 pkey) {}
static inline void pks_update_exception(struct pt_regs *regs,
--
2.35.1

2022-04-22 22:07:24

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V10 26/44] memremap_pages: Introduce pgmap_protection_available()

From: Ira Weiny <[email protected]>

PMEM will flag additional dev_pagemap protection through (struct
dev_pagemap)->flags. However, it is more efficient to know whether that
protection is available before requesting it than to request it and have
the mapping fail.

Define pgmap_protection_available() to check if protection is available
prior to being requested. The name of pgmap_protection_available() was
specifically chosen to isolate the implementation of the protection from
higher level users.
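
A sketch of the intended check-before-request pattern, matching what the
later PMEM and device-dax patches in this series do:

    if (pgmap_protection_available())
            pgmap->flags |= PGMAP_PROTECTION;
    addr = devm_memremap_pages(dev, pgmap);
    if (IS_ERR(addr))
            return PTR_ERR(addr);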

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V10
Move code from mm.h to memremap.h
Upstream separated memremap.h functionality from mm.h
dc90f0846df4 ("mm: don't include <linux/memremap.h> in <linux/mm.h>")

Changes for V9
Clean up commit message
From Dan Williams
make call stack static inline throughout this call and
pks_available() such that callers calls
cpu_feature_enabled() directly

Changes for V8
Split this out to its own patch.
s/pgmap_protection_enabled/pgmap_protection_available
---
include/linux/memremap.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..7980d0db8617 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -6,6 +6,7 @@
#include <linux/range.h>
#include <linux/ioport.h>
#include <linux/percpu-refcount.h>
+#include <linux/pks.h>

struct resource;
struct device;
@@ -214,4 +215,20 @@ static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
percpu_ref_put(&pgmap->ref);
}

+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+
+static inline bool pgmap_protection_available(void)
+{
+ return pks_available();
+}
+
+#else
+
+static inline bool pgmap_protection_available(void)
+{
+ return false;
+}
+
+#endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */
+
#endif /* _LINUX_MEMREMAP_H_ */
--
2.35.1

Subject: [tip: x86/mm] Documentation/protection-keys: Clean up documentation for User Space pkeys

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: f8c1d4ca55177326adad1fdc6bf602423a507542
Gitweb: https://git.kernel.org/tip/f8c1d4ca55177326adad1fdc6bf602423a507542
Author: Ira Weiny <[email protected]>
AuthorDate: Tue, 19 Apr 2022 10:06:06 -07:00
Committer: Dave Hansen <[email protected]>
CommitterDate: Tue, 07 Jun 2022 16:06:22 -07:00

Documentation/protection-keys: Clean up documentation for User Space pkeys

The documentation for user space pkeys was a bit dated including things
such as Amazon and distribution testing information which is irrelevant
now.

Update the documentation. This also streamlines adding the Supervisor
pkey documentation later on.

Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Documentation/core-api/protection-keys.rst | 44 ++++++++++-----------
1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e7..bf28ac0 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,31 +4,29 @@
Memory Protection Keys
======================

-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
+Memory Protection Keys provide a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables when an
+application changes protection domains.
+
+Pkeys Userspace (PKU) is a feature which can be found on:
+ * Intel server CPUs, Skylake and later
+ * Intel client CPUs, Tiger Lake (11th Gen Core) and later
+ * Future AMD CPUs
+
+Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys.
+
+Protections for each key are defined with a per-CPU user-accessible register
+(PKRU). Each of these is a 32-bit register storing two bits (Access Disable
+and Write Disable) for each of 16 keys.
+
+Being a CPU register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register. The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs. These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
+There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
+register. The feature is only available in 64-bit mode, even though there is
+theoretically space in the PAE PTEs. These permissions are enforced on data
+access only and have no effect on instruction fetches.

Syscalls
========

Subject: [tip: x86/mm] x86/pkeys: Clarify PKRU_AD_KEY macro

The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 54ee1844047c1df015ab2679a4f55564a3aa1fa1
Gitweb: https://git.kernel.org/tip/54ee1844047c1df015ab2679a4f55564a3aa1fa1
Author: Ira Weiny <[email protected]>
AuthorDate: Tue, 19 Apr 2022 10:06:07 -07:00
Committer: Dave Hansen <[email protected]>
CommitterDate: Tue, 07 Jun 2022 16:06:33 -07:00

x86/pkeys: Clarify PKRU_AD_KEY macro

When changing the PKRU_AD_KEY macro to be used for PKS the name came
into question.[1]

The intent of PKRU_AD_KEY is to set an initial value for the PKRU
register but that is just a mask value.

Clarify this by changing the name to PKRU_AD_MASK().

NOTE the checkpatch errors are ignored for the init_pkru_value to align
the values in the code.

[1] https://lore.kernel.org/lkml/[email protected]/

Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/mm/pkeys.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e44e938..7418c36 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -110,7 +110,7 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
return vma_pkey(vma);
}

-#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
+#define PKRU_AD_MASK(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))

/*
* Make the default PKRU value (at execve() time) as restrictive
@@ -118,11 +118,14 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
* in the process's lifetime will not accidentally get access
* to data which is pkey-protected later on.
*/
-u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
- PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
- PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
- PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
- PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
+u32 init_pkru_value = PKRU_AD_MASK( 1) | PKRU_AD_MASK( 2) |
+ PKRU_AD_MASK( 3) | PKRU_AD_MASK( 4) |
+ PKRU_AD_MASK( 5) | PKRU_AD_MASK( 6) |
+ PKRU_AD_MASK( 7) | PKRU_AD_MASK( 8) |
+ PKRU_AD_MASK( 9) | PKRU_AD_MASK(10) |
+ PKRU_AD_MASK(11) | PKRU_AD_MASK(12) |
+ PKRU_AD_MASK(13) | PKRU_AD_MASK(14) |
+ PKRU_AD_MASK(15);

static ssize_t init_pkru_read_file(struct file *file, char __user *user_buf,
size_t count, loff_t *ppos)