2020-10-09 20:01:21

by Ira Weiny

Subject: [PATCH RFC V3 0/9] PKS: Add Protection Keys Supervisor (PKS) support RFC v3

From: Ira Weiny <[email protected]>

This RFC series has been reviewed by Dave Hansen.

Introduce a new page protection mechanism for supervisor pages, Protection Key
Supervisor (PKS).

Two use cases for PKS are being developed: trusted keys and PMEM. Trusted keys
is a newer use case which is still being explored. PMEM was submitted as part
of the RFC (v2) series[1]. However, since then it was found that some callers
of kmap() require a global implementation of PKS; specifically, some users of
kmap() expect mappings to be available to all kernel threads. While global use
of PKS is rare, it needs to be included for correctness. Unfortunately, the
kmap() updates required a large patch series to make the needed changes at the
various kmap() call sites, so that patch set has been split out. Because the
global PKS feature is only required for that use case, it will be deferred to
that set as well.[2] This patch set is being submitted as a precursor to both
of the use cases.

For an overview of the entire PKS ecosystem, a git tree including this series
and the 2 use cases can be found here:

https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3


PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to those pages beyond the normal paging protections. PKS works in
a similar fashion to user space pkeys, PKU. As with PKU, supervisor pkeys are
checked in addition to normal paging protections, and accesses or writes can be
disabled via an MSR update without TLB flushes when permissions change. Also
like PKU, a page mapping is assigned to a domain by setting pkey bits in the
page table entry for that mapping.

Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.

XSAVE is not supported for the PKRS MSR. Therefore the implementation
saves/restores the MSR across context switches and during exceptions. Nested
exceptions are supported by each exception getting a new PKS state.

For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.

Other keys (1-15) are allocated by an allocator, which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail, either because of key exhaustion or because PKS is not supported on the
arch and/or CPU instance.

The following are key attributes of PKS.

1) Fast switching of permissions
1a) Prevents access without page table manipulations
1b) No TLB flushes required
2) Works on a per thread basis

PKS is available with 4- and 5-level paging. Like PKU, it consumes 4 bits from
the PTE to store the pkey within the entry.


[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://github.com/weiny2/linux-kernel/commit/f10abb0f0d7b4e14f03fc8890313a5830cde1e49
and a testing patch
https://github.com/weiny2/linux-kernel/commit/2a8e0fc7654a7c69b243d628f63b01ff26a5a797


Fenghua Yu (3):
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Add PKS kernel API

Ira Weiny (6):
x86/pkeys: Create pkeys_common.h
x86/pks: Preserve the PKRS MSR on context switch
x86/entry: Pass irqentry_state_t by reference
x86/entry: Preserve PKRS MSR across exceptions
x86/fault: Report the PKRS state on fault
x86/pks: Add PKS test code

Documentation/core-api/protection-keys.rst | 102 ++-
arch/x86/Kconfig | 1 +
arch/x86/entry/common.c | 57 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/idtentry.h | 29 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 13 +-
arch/x86/include/asm/pgtable_types.h | 12 +
arch/x86/include/asm/pkeys.h | 15 +
arch/x86/include/asm/pkeys_common.h | 36 +
arch/x86/include/asm/processor.h | 13 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 17 +
arch/x86/kernel/cpu/mce/core.c | 4 +
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/kvm.c | 4 +-
arch/x86/kernel/nmi.c | 7 +-
arch/x86/kernel/process.c | 21 +
arch/x86/kernel/traps.c | 21 +-
arch/x86/mm/fault.c | 86 ++-
arch/x86/mm/pkeys.c | 188 +++++-
include/linux/entry-common.h | 19 +-
include/linux/pgtable.h | 4 +
include/linux/pkeys.h | 23 +-
kernel/entry/common.c | 28 +-
lib/Kconfig.debug | 12 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 690 ++++++++++++++++++++
mm/Kconfig | 2 +
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 65 ++
32 files changed, 1376 insertions(+), 128 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c

--
2.28.0.rc0.12.gb6a658bd00c9


2020-10-10 02:21:41

by Ira Weiny

Subject: [PATCH RFC V3 1/9] x86/pkeys: Create pkeys_common.h

From: Ira Weiny <[email protected]>

Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work in
similar fashions and can share common defines. Normally, these defines
would be put in asm/pkeys.h to be used internally and externally to the
arch code. However, the defines are required in pgtable.h and inclusion
of pkeys.h in that header creates complex dependencies which are best
resolved in a separate header.

Share these defines by moving them into a new header, changing their
names to reflect the common use, and including the header where needed.

Signed-off-by: Ira Weiny <[email protected]>

---
NOTE: The initialization of init_pkru_value causes checkpatch errors
because of the space after the '(' in the macros. We leave this as is
because it is more readable in this format, and it is existing code.
---
arch/x86/include/asm/pgtable.h | 13 ++++++-------
arch/x86/include/asm/pkeys.h | 2 ++
arch/x86/include/asm/pkeys_common.h | 11 +++++++++++
arch/x86/kernel/fpu/xstate.c | 8 ++++----
arch/x86/mm/pkeys.c | 14 ++++++--------
5 files changed, 29 insertions(+), 19 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b836138ce852..2576154be6cf 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1361,9 +1361,7 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
}
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */

-#define PKRU_AD_BIT 0x1
-#define PKRU_WD_BIT 0x2
-#define PKRU_BITS_PER_PKEY 2
+#include <asm/pkeys_common.h>

#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
extern u32 init_pkru_value;
@@ -1373,18 +1371,19 @@ extern u32 init_pkru_value;

static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
{
- int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
- return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits));
+ int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
+
+ return !(pkru & (PKR_AD_BIT << pkru_pkey_bits));
}

static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
{
- int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
+ int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
/*
* Access-disable disables writes too so we need to check
* both bits here.
*/
- return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits));
+ return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits));
}

static inline u16 pte_flags_pkey(unsigned long pte_flags)
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 2ff9b98812b7..f9feba80894b 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_PKEYS_H
#define _ASM_X86_PKEYS_H

+#include <asm/pkeys_common.h>
+
#define ARCH_DEFAULT_PKEY 0

/*
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
new file mode 100644
index 000000000000..a9f086f1e4b4
--- /dev/null
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PKEYS_INTERNAL_H
+#define _ASM_X86_PKEYS_INTERNAL_H
+
+#define PKR_AD_BIT 0x1
+#define PKR_WD_BIT 0x2
+#define PKR_BITS_PER_PKEY 2
+
+#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+
+#endif /*_ASM_X86_PKEYS_INTERNAL_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 038e19c0019e..b55cf3acd82a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -991,7 +991,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val)
{
u32 old_pkru;
- int pkey_shift = (pkey * PKRU_BITS_PER_PKEY);
+ int pkey_shift = (pkey * PKR_BITS_PER_PKEY);
u32 new_pkru_bits = 0;

/*
@@ -1010,16 +1010,16 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,

/* Set the bits we need in PKRU: */
if (init_val & PKEY_DISABLE_ACCESS)
- new_pkru_bits |= PKRU_AD_BIT;
+ new_pkru_bits |= PKR_AD_BIT;
if (init_val & PKEY_DISABLE_WRITE)
- new_pkru_bits |= PKRU_WD_BIT;
+ new_pkru_bits |= PKR_WD_BIT;

/* Shift the bits in to the correct place in PKRU for pkey: */
new_pkru_bits <<= pkey_shift;

/* Get old PKRU and mask off any old bits in place: */
old_pkru = read_pkru();
- old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift);
+ old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);

/* Write old part along with new part: */
write_pkru(old_pkru | new_pkru_bits);
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 8873ed1438a9..f5efb4007e74 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -111,19 +111,17 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
return vma_pkey(vma);
}

-#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
-
/*
* Make the default PKRU value (at execve() time) as restrictive
* as possible. This ensures that any threads clone()'d early
* in the process's lifetime will not accidentally get access
* to data which is pkey-protected later on.
*/
-u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
- PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
- PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
- PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
- PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
+u32 init_pkru_value = PKR_AD_KEY( 1) | PKR_AD_KEY( 2) | PKR_AD_KEY( 3) |
+ PKR_AD_KEY( 4) | PKR_AD_KEY( 5) | PKR_AD_KEY( 6) |
+ PKR_AD_KEY( 7) | PKR_AD_KEY( 8) | PKR_AD_KEY( 9) |
+ PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) |
+ PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15);

/*
* Called from the FPU code when creating a fresh set of FPU
@@ -173,7 +171,7 @@ static ssize_t init_pkru_write_file(struct file *file,
* up immediately if someone attempts to disable access
* or writes to pkey 0.
*/
- if (new_init_pkru & (PKRU_AD_BIT|PKRU_WD_BIT))
+ if (new_init_pkru & (PKR_AD_BIT|PKR_WD_BIT))
return -EINVAL;

WRITE_ONCE(init_pkru_value, new_init_pkru);
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-10 02:23:22

by Ira Weiny

Subject: [PATCH RFC V3 8/9] x86/fault: Report the PKRS state on fault

From: Ira Weiny <[email protected]>

When only user space pkeys are enabled, faulting within the kernel is an
unexpected condition which should never happen; therefore a WARN_ON was
added to the kernel fault handler to detect if it ever did. Now that
PKS can be enabled, this is no longer the case.

Report a pkey fault with a normal splat and add the PKRS state to the
fault splat text. Note that the PKS register is reset during an
exception, therefore the saved PKRS value from before the beginning of
the exception is passed down.

If PKS is not enabled, or not active, maintain the WARN_ON_ONCE() from
before.

Because each fault has its own state, the pkrs information will be
correctly reported even if a fault 'faults'.

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
arch/x86/mm/fault.c | 59 ++++++++++++++++++++++++++-------------------
1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e55bc4bff389..ee761c993f58 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -504,7 +504,8 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
}

static void
-show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+ irqentry_state_t *irq_state)
{
if (!oops_may_print())
return;
@@ -548,6 +549,11 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
(error_code & X86_PF_PK) ? "protection keys violation" :
"permissions violation");

+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+ if (irq_state && (error_code & X86_PF_PK))
+ pr_alert("PKRS: 0x%x\n", irq_state->pkrs);
+#endif
+
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
u16 ldtr, tr;
@@ -626,7 +632,8 @@ static void set_signal_archinfo(unsigned long address,

static noinline void
no_context(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, int signal, int si_code)
+ unsigned long address, int signal, int si_code,
+ irqentry_state_t *irq_state)
{
struct task_struct *tsk = current;
unsigned long flags;
@@ -732,7 +739,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
*/
flags = oops_begin();

- show_fault_oops(regs, error_code, address);
+ show_fault_oops(regs, error_code, address, irq_state);

if (task_stack_end_corrupted(tsk))
printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
@@ -785,7 +792,8 @@ static bool is_vsyscall_vaddr(unsigned long vaddr)

static void
__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, u32 pkey, int si_code)
+ unsigned long address, u32 pkey, int si_code,
+ irqentry_state_t *state)
{
struct task_struct *tsk = current;

@@ -832,14 +840,14 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
if (is_f00f_bug(regs, address))
return;

- no_context(regs, error_code, address, SIGSEGV, si_code);
+ no_context(regs, error_code, address, SIGSEGV, si_code, state);
}

static noinline void
bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
- unsigned long address)
+ unsigned long address, irqentry_state_t *state)
{
- __bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR);
+ __bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR, state);
}

static void
@@ -853,7 +861,7 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
*/
mmap_read_unlock(mm);

- __bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
+ __bad_area_nosemaphore(regs, error_code, address, pkey, si_code, NULL);
}

static noinline void
@@ -923,7 +931,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
{
/* Kernel mode? Handle exceptions or die: */
if (!(error_code & X86_PF_USER)) {
- no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+ no_context(regs, error_code, address, SIGBUS, BUS_ADRERR, NULL);
return;
}

@@ -957,7 +965,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
unsigned long address, vm_fault_t fault)
{
if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
- no_context(regs, error_code, address, 0, 0);
+ no_context(regs, error_code, address, 0, 0, NULL);
return;
}

@@ -965,7 +973,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
/* Kernel mode? Handle exceptions or die: */
if (!(error_code & X86_PF_USER)) {
no_context(regs, error_code, address,
- SIGSEGV, SEGV_MAPERR);
+ SIGSEGV, SEGV_MAPERR, NULL);
return;
}

@@ -980,7 +988,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
VM_FAULT_HWPOISON_LARGE))
do_sigbus(regs, error_code, address, fault);
else if (fault & VM_FAULT_SIGSEGV)
- bad_area_nosemaphore(regs, error_code, address);
+ bad_area_nosemaphore(regs, error_code, address, NULL);
else
BUG();
}
@@ -1148,14 +1156,15 @@ static int fault_in_kernel_space(unsigned long address)
*/
static void
do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
- unsigned long address)
+ unsigned long address, irqentry_state_t *irq_state)
{
/*
- * Protection keys exceptions only happen on user pages. We
- * have no user pages in the kernel portion of the address
- * space, so do not expect them here.
+ * If protection keys are not enabled for kernel space
+ * do not expect Pkey errors here.
*/
- WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_SUPERVISOR_PKEYS) ||
+ !cpu_feature_enabled(X86_FEATURE_PKS))
+ WARN_ON_ONCE(hw_error_code & X86_PF_PK);

#ifdef CONFIG_X86_32
/*
@@ -1204,7 +1213,7 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
* Don't take the mm semaphore here. If we fixup a prefetch
* fault we could otherwise deadlock:
*/
- bad_area_nosemaphore(regs, hw_error_code, address);
+ bad_area_nosemaphore(regs, hw_error_code, address, irq_state);
}
NOKPROBE_SYMBOL(do_kern_addr_fault);

@@ -1245,7 +1254,7 @@ void do_user_addr_fault(struct pt_regs *regs,
!(hw_error_code & X86_PF_USER) &&
!(regs->flags & X86_EFLAGS_AC)))
{
- bad_area_nosemaphore(regs, hw_error_code, address);
+ bad_area_nosemaphore(regs, hw_error_code, address, NULL);
return;
}

@@ -1254,7 +1263,7 @@ void do_user_addr_fault(struct pt_regs *regs,
* in a region with pagefaults disabled then we must not take the fault
*/
if (unlikely(faulthandler_disabled() || !mm)) {
- bad_area_nosemaphore(regs, hw_error_code, address);
+ bad_area_nosemaphore(regs, hw_error_code, address, NULL);
return;
}

@@ -1316,7 +1325,7 @@ void do_user_addr_fault(struct pt_regs *regs,
* Fault from code in kernel from
* which we do not expect faults.
*/
- bad_area_nosemaphore(regs, hw_error_code, address);
+ bad_area_nosemaphore(regs, hw_error_code, address, NULL);
return;
}
retry:
@@ -1375,7 +1384,7 @@ void do_user_addr_fault(struct pt_regs *regs,
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
no_context(regs, hw_error_code, address, SIGBUS,
- BUS_ADRERR);
+ BUS_ADRERR, NULL);
return;
}

@@ -1415,7 +1424,7 @@ trace_page_fault_entries(struct pt_regs *regs, unsigned long error_code,

static __always_inline void
handle_page_fault(struct pt_regs *regs, unsigned long error_code,
- unsigned long address)
+ unsigned long address, irqentry_state_t *irq_state)
{
trace_page_fault_entries(regs, error_code, address);

@@ -1424,7 +1433,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,

/* Was the fault on kernel-controlled part of the address space? */
if (unlikely(fault_in_kernel_space(address))) {
- do_kern_addr_fault(regs, error_code, address);
+ do_kern_addr_fault(regs, error_code, address, irq_state);
} else {
do_user_addr_fault(regs, error_code, address);
/*
@@ -1479,7 +1488,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
irqentry_enter(regs, &state);

instrumentation_begin();
- handle_page_fault(regs, error_code, address);
+ handle_page_fault(regs, error_code, address, &state);
instrumentation_end();

irqentry_exit(regs, &state);
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-10 03:49:33

by Ira Weiny

Subject: Re: [PATCH RFC V3 0/9] PKS: Add Protection Keys Supervisor (PKS) support RFC v3

On Fri, Oct 09, 2020 at 12:42:49PM -0700, 'Ira Weiny' wrote:
> From: Ira Weiny <[email protected]>
>
> This RFC series has been reviewed by Dave Hansen.
>
> Introduce a new page protection mechanism for supervisor pages, Protection Key
> Supervisor (PKS).
>
> Two use cases for PKS are being developed: trusted keys and PMEM.

RFC patch sets for these use cases have also been posted:

PMEM:
https://lore.kernel.org/lkml/[email protected]/

Trusted Keys:
https://lore.kernel.org/lkml/[email protected]/

Ira


2020-10-13 19:20:12

by Dave Hansen

Subject: Re: [PATCH RFC V3 1/9] x86/pkeys: Create pkeys_common.h

On 10/9/20 12:42 PM, [email protected] wrote:
> Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work
> in similar fashions and can share common defines.

Could we be a bit less abstract? PKS and PKU each have:
1. A single control register
2. The same number of keys
3. The same number of bits in the register per key
4. Access and Write disable in the same bit locations

That means that we can share all the macros that synthesize and
manipulate register values between the two features.

> +++ b/arch/x86/include/asm/pkeys_common.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_PKEYS_INTERNAL_H
> +#define _ASM_X86_PKEYS_INTERNAL_H
> +
> +#define PKR_AD_BIT 0x1
> +#define PKR_WD_BIT 0x2
> +#define PKR_BITS_PER_PKEY 2
> +
> +#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))

Now that this has moved away from its use-site, it's a bit less
self-documenting. Let's add a comment:

/*
* Generate an Access-Disable mask for the given pkey. Several of these
* can be OR'd together to generate pkey register values.
*/

Once that's in place, along with the updated changelog:

Reviewed-by: Dave Hansen <[email protected]>

2020-10-13 19:38:37

by Dave Hansen

Subject: Re: [PATCH RFC V3 8/9] x86/fault: Report the PKRS state on fault

> @@ -548,6 +549,11 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
> (error_code & X86_PF_PK) ? "protection keys violation" :
> "permissions violation");
>
> +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
> + if (irq_state && (error_code & X86_PF_PK))
> + pr_alert("PKRS: 0x%x\n", irq_state->pkrs);
> +#endif

This means everyone will see 'PKRS: 0x0', even if they're on non-PKS
hardware. I think I'd rather have this only show PKRS when we're on
cpu_feature_enabled(PKS) hardware.

...
> @@ -1148,14 +1156,15 @@ static int fault_in_kernel_space(unsigned long address)
> */
> static void
> do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> - unsigned long address)
> + unsigned long address, irqentry_state_t *irq_state)
> {
> /*
> - * Protection keys exceptions only happen on user pages. We
> - * have no user pages in the kernel portion of the address
> - * space, so do not expect them here.
> + * If protection keys are not enabled for kernel space
> + * do not expect Pkey errors here.
> */

Let's fix the double-negative:

/*
* PF_PK is only expected on kernel addresses when
* supervisor pkeys are enabled:
*/

> - WARN_ON_ONCE(hw_error_code & X86_PF_PK);
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_SUPERVISOR_PKEYS) ||
> + !cpu_feature_enabled(X86_FEATURE_PKS))
> + WARN_ON_ONCE(hw_error_code & X86_PF_PK);

Yeah, please stick X86_FEATURE_PKS in disabled-features so you can use
cpu_feature_enabled(X86_FEATURE_PKS) by itself here.

2020-10-13 19:48:51

by Ira Weiny

Subject: Re: [PATCH RFC V3 1/9] x86/pkeys: Create pkeys_common.h

On Tue, Oct 13, 2020 at 10:46:16AM -0700, Dave Hansen wrote:
> On 10/9/20 12:42 PM, [email protected] wrote:
> > Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work
> > in similar fashions and can share common defines.
>
> Could we be a bit less abstract? PKS and PKU each have:
> 1. A single control register
> 2. The same number of keys
> 3. The same number of bits in the register per key
> 4. Access and Write disable in the same bit locations
>
> That means that we can share all the macros that synthesize and
> manipulate register values between the two features.

Sure. Done.

>
> > +++ b/arch/x86/include/asm/pkeys_common.h
> > @@ -0,0 +1,11 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_X86_PKEYS_INTERNAL_H
> > +#define _ASM_X86_PKEYS_INTERNAL_H
> > +
> > +#define PKR_AD_BIT 0x1
> > +#define PKR_WD_BIT 0x2
> > +#define PKR_BITS_PER_PKEY 2
> > +
> > +#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
>
> Now that this has moved away from its use-site, it's a bit less
> self-documenting. Let's add a comment:
>
> /*
> * Generate an Access-Disable mask for the given pkey. Several of these
> * can be OR'd together to generate pkey register values.
> */

Fair enough. done.

>
> Once that's in place, along with the updated changelog:
>
> Reviewed-by: Dave Hansen <[email protected]>

Thanks,
Ira

2020-10-15 10:24:12

by Ira Weiny

Subject: Re: [PATCH RFC V3 8/9] x86/fault: Report the PKRS state on fault

On Tue, Oct 13, 2020 at 11:56:53AM -0700, Dave Hansen wrote:
> > @@ -548,6 +549,11 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
> > (error_code & X86_PF_PK) ? "protection keys violation" :
> > "permissions violation");
> >
> > +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
> > + if (irq_state && (error_code & X86_PF_PK))
> > + pr_alert("PKRS: 0x%x\n", irq_state->pkrs);
> > +#endif
>
> This means everyone will see 'PKRS: 0x0', even if they're on non-PKS
> hardware. I think I'd rather have this only show PKRS when we're on
> cpu_feature_enabled(PKS) hardware.

Good catch, thanks.

>
> ...
> > @@ -1148,14 +1156,15 @@ static int fault_in_kernel_space(unsigned long address)
> > */
> > static void
> > do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> > - unsigned long address)
> > + unsigned long address, irqentry_state_t *irq_state)
> > {
> > /*
> > - * Protection keys exceptions only happen on user pages. We
> > - * have no user pages in the kernel portion of the address
> > - * space, so do not expect them here.
> > + * If protection keys are not enabled for kernel space
> > + * do not expect Pkey errors here.
> > */
>
> Let's fix the double-negative:
>
> /*
> * PF_PK is only expected on kernel addresses when
> * supervisor pkeys are enabled:
> */

done. thanks.

>
> > - WARN_ON_ONCE(hw_error_code & X86_PF_PK);
> > + if (!IS_ENABLED(CONFIG_ARCH_HAS_SUPERVISOR_PKEYS) ||
> > + !cpu_feature_enabled(X86_FEATURE_PKS))
> > + WARN_ON_ONCE(hw_error_code & X86_PF_PK);
>
> Yeah, please stick X86_FEATURE_PKS in disabled-features so you can use
> cpu_feature_enabled(X86_FEATURE_PKS) by itself here.

done.

thanks,
Ira