2020-10-09 20:00:17

by Ira Weiny

Subject: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API

From: Fenghua Yu <[email protected]>

PKS allows kernel users to define domains of page mappings which have
additional protections beyond the paging protections.

Add an API to allocate, use, and free a protection key which identifies
such a domain. Export 5 new symbols: pks_key_alloc(), pks_mknoaccess(),
pks_mkread(), pks_mkrdwr(), and pks_key_free(). Add 2 new macros:
PAGE_KERNEL_PKEY(pkey) and _PAGE_PKEY(pkey).

Update the protection key documentation to cover pkeys on supervisor
pages.

Co-developed-by: Ira Weiny <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
Documentation/core-api/protection-keys.rst | 101 ++++++++++++++---
arch/x86/include/asm/pgtable_types.h | 12 ++
arch/x86/include/asm/pkeys.h | 11 ++
arch/x86/include/asm/pkeys_common.h | 4 +
arch/x86/mm/pkeys.c | 125 +++++++++++++++++++++
include/linux/pgtable.h | 4 +
include/linux/pkeys.h | 22 ++++
7 files changed, 261 insertions(+), 18 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..00a046a913e4 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,25 +4,33 @@
Memory Protection Keys
======================

-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
+when an application changes protection domains.
+
+Protection Keys for Userspace (PKU) is a feature found on Intel's Skylake
+"Scalable Processor" Server CPUs and later. It will also be available in
+future non-server Intel parts and future AMD processors.
+
+Future Intel processors will support Protection Keys for Supervisor pages
+(PKS).
+
+For anyone wishing to test or use user space pkeys, support is available in
+Amazon's EC2 C5 instances and is known to work there using an Ubuntu 17.04 image.
+
+pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys. User and Supervisor pages are
+treated separately.
+
+Protections for each page are controlled with per-CPU registers, one for each
+type of page (User and Supervisor). Each of these 32-bit registers stores two
+separate bits (Access Disable and Write Disable) for each key.
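+
+For a given key the two control bits sit at fixed positions in the register
+(the layout is the same for the user register and the supervisor MSR)::
+
+	Access Disable (AD) = bit (2 * pkey)
+	Write Disable  (WD) = bit (2 * pkey) + 1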

-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
+For Userspace the register is user-accessible (rdpkru/wrpkru). For
+Supervisor, the register (MSR_IA32_PKRS) is accessible only to the kernel.
+
+Because these are CPU registers, pkey protections are inherently thread-local,
+potentially giving each thread an independent set of protections from every
+other thread.

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register. The feature is only available in 64-bit mode,
@@ -30,8 +38,11 @@ even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.

-Syscalls
-========
+For kernel space, rdmsr/wrmsr are used to access the supervisor MSR
+(MSR_IA32_PKRS).
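+
+Conceptually the access looks like this (kernel users are expected to go
+through the pks_*() API described below rather than touching the MSR
+directly)::
+
+	u64 pkrs;
+
+	rdmsrl(MSR_IA32_PKRS, pkrs);	/* read the current supervisor rights */
+	wrmsrl(MSR_IA32_PKRS, pkrs);	/* install an updated value */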
+
+
+Syscalls for user space keys
+============================

There are 3 system calls which directly interact with pkeys::

@@ -98,3 +109,57 @@ with a read()::
The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
+
+
+Kernel API for PKS support
+==========================
+
+The following interface is used to allocate, use, and free a pkey which defines
+a 'protection domain' within the kernel. Setting a pkey value in a supervisor
+mapping adds that mapping to the protection domain::
+
+ int pks_key_alloc(const char * const pkey_user);
+ #define PAGE_KERNEL_PKEY(pkey)
+ #define _PAGE_PKEY(pkey)
+ void pks_mknoaccess(int pkey);
+ void pks_mkread(int pkey);
+ void pks_mkrdwr(int pkey);
+ void pks_key_free(int pkey);
+
+pks_key_alloc() allocates keys dynamically to allow better use of the limited
+key space.
+
+Callers of pks_key_alloc() _must_ be prepared for it to fail and take
+appropriate action, mainly because PKS may not be available on all
+architectures. Using the rest of the API without checking the return of
+pks_key_alloc() is undefined.
+
+Kernel users must set the pkey in the page table entries for the mappings they
+want to protect. This can be done with PAGE_KERNEL_PKEY() or _PAGE_PKEY().
+
+The pks_mk*() family of calls allows kernel users to change the protections
+for the domain identified by the specified pkey. Three states are available:
+pks_mknoaccess(), pks_mkread(), and pks_mkrdwr(), which set the access to
+none, read-only, and read/write respectively.
+
+Finally, pks_key_free() allows a user to return the key to the allocator for
+use by others.
+
+The interface maintains the no-access state (Access Disabled (AD=1), as set by
+pks_mknoaccess()) for all keys not currently allocated. Therefore, the user can
+depend on access being disabled when pks_key_alloc() returns a key, and should
+remove mappings from the domain (remove the pkey from the PTEs) prior to
+calling pks_key_free().
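+
+A minimal sketch of the expected flow is below. map_helper() and unmap_helper()
+are illustrative placeholders for whatever mechanism the caller uses to create
+and tear down a kernel mapping with a given pgprot_t; they are not part of this
+API::
+
+	static int pks_copy_example(void *src, size_t len)
+	{
+		int pkey;
+		void *addr;
+
+		pkey = pks_key_alloc("pks example");
+		if (pkey < 0)
+			return pkey;	/* PKS unavailable or keys exhausted */
+
+		/* Tag the mapping; the new key starts with access disabled */
+		addr = map_helper(len, PAGE_KERNEL_PKEY(pkey));
+
+		/* Open the domain only around the access */
+		pks_mkrdwr(pkey);
+		memcpy(addr, src, len);
+		pks_mknoaccess(pkey);
+
+		/* Remove the pkey from the PTEs before returning the key */
+		unmap_helper(addr);
+		pks_key_free(pkey);
+		return 0;
+	}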
+
+It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing
+but still maintains ordering properties similar to WRPKRU. Thus it is safe to
+immediately use a mapping when a pks_mk*() function returns.
+
+The current SDM section on PKRS needs updating, but the ordering semantics
+should be the same as those of WRPKRU. To quote from the WRPKRU text:
+
+ WRPKRU will never execute transiently. Memory accesses
+ affected by PKRU register will not execute (even transiently)
+ until all prior executions of WRPKRU have completed execution
+ and updated the PKRU register.
+
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 816b31c68550..c9fdfbdcbbfb 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -73,6 +73,12 @@
_PAGE_PKEY_BIT2 | \
_PAGE_PKEY_BIT3)

+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+#define _PAGE_PKEY(pkey) (_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0)
+#else
+#define _PAGE_PKEY(pkey) (_AT(pteval_t, 0))
+#endif
+
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
#else
@@ -229,6 +235,12 @@ enum page_cache_mode {
#define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO)
#define PAGE_KERNEL_IO_NOCACHE __pgprot_mask(__PAGE_KERNEL_IO_NOCACHE)

+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+#define PAGE_KERNEL_PKEY(pkey) __pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey))
+#else
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
#endif /* __ASSEMBLY__ */

/* xwr */
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 4526245b03e5..79952216474e 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -3,6 +3,7 @@
#define _ASM_X86_PKEYS_H

#include <asm/pkeys_common.h>
+#include <asm-generic/mman-common.h>

#define ARCH_DEFAULT_PKEY 0

@@ -138,4 +139,14 @@ static inline int vma_pkey(struct vm_area_struct *vma)

u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags);

+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+int pks_key_alloc(const char *const pkey_user);
+void pks_key_free(int pkey);
+
+void pks_mknoaccess(int pkey);
+void pks_mkread(int pkey);
+void pks_mkrdwr(int pkey);
+
+#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
+
#endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index 05781be33c14..40464c170522 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -22,6 +22,10 @@
PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \
PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15))

+/* PKS supports 16 keys. Key 0 is reserved for the kernel. */
+#define PKS_KERN_DEFAULT_KEY 0
+#define PKS_NUM_KEYS 16
+
#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
void write_pkrs(u32 new_pkrs);
#else
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 30f65dd3d0c5..1d9f451b4b78 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -3,6 +3,9 @@
* Intel Memory Protection Keys management
* Copyright (c) 2015, Intel Corporation.
*/
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/pkeys: " fmt
+
#include <linux/debugfs.h> /* debugfs_create_u32() */
#include <linux/mm_types.h> /* mm_struct, vma, etc... */
#include <linux/pkeys.h> /* PKEY_* */
@@ -229,6 +232,7 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)

return pk_reg;
}
+EXPORT_SYMBOL_GPL(update_pkey_val);

DEFINE_PER_CPU(u32, pkrs_cache);

@@ -257,3 +261,124 @@ void write_pkrs(u32 new_pkrs)
}
put_cpu_ptr(pkrs);
}
+EXPORT_SYMBOL_GPL(write_pkrs);
+
+/**
+ * Do not call this directly, see pks_mk*() below.
+ *
+ * @pkey: Key for the domain to change
+ * @protection: protection bits to be used
+ *
+ * Protection utilizes the same protection bits specified for User pkeys
+ * PKEY_DISABLE_ACCESS
+ * PKEY_DISABLE_WRITE
+ *
+ */
+static inline void pks_update_protection(int pkey, unsigned long protection)
+{
+ current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
+ pkey, protection);
+ preempt_disable();
+ write_pkrs(current->thread.saved_pkrs);
+ preempt_enable();
+}
+
+/**
+ * PKS access control functions
+ *
+ * Change the access of the domain specified by the pkey. These are global
+ * updates. They only affect the current running thread. It is undefined and
+ * a bug for users to call this without having allocated a pkey and using it as
+ * pkey here.
+ *
+ * pks_mknoaccess()
+ * Disable all access to the domain
+ * pks_mkread()
+ * Make the domain Read only
+ * pks_mkrdwr()
+ * Make the domain Read/Write
+ *
+ * @pkey the pkey for which the access should change.
+ *
+ */
+void pks_mknoaccess(int pkey)
+{
+ pks_update_protection(pkey, PKEY_DISABLE_ACCESS);
+}
+EXPORT_SYMBOL_GPL(pks_mknoaccess);
+
+void pks_mkread(int pkey)
+{
+ pks_update_protection(pkey, PKEY_DISABLE_WRITE);
+}
+EXPORT_SYMBOL_GPL(pks_mkread);
+
+void pks_mkrdwr(int pkey)
+{
+ pks_update_protection(pkey, 0);
+}
+EXPORT_SYMBOL_GPL(pks_mkrdwr);
+
+static const char pks_key_user0[] = "kernel";
+
+/* Store names of allocated keys for debug. Key 0 is reserved for the kernel. */
+static const char *pks_key_users[PKS_NUM_KEYS] = {
+ pks_key_user0
+};
+
+/*
+ * Each key is represented by a bit. Bit 0 is set for key 0 and reserved for
+ * its use. We use ulong for the bit operations but only 16 bits are used.
+ */
+static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY;
+
+/*
+ * pks_key_alloc - Allocate a PKS key
+ *
+ * @pkey_user: String stored for debugging of key exhaustion. The caller is
+ * responsible to maintain this memory until pks_key_free().
+ */
+int pks_key_alloc(const char * const pkey_user)
+{
+ int nr;
+
+ if (!cpu_feature_enabled(X86_FEATURE_PKS))
+ return -EINVAL;
+
+ while (1) {
+ nr = find_first_zero_bit(&pks_key_allocation_map, PKS_NUM_KEYS);
+ if (nr >= PKS_NUM_KEYS) {
+ pr_info("Cannot allocate supervisor key for %s.\n",
+ pkey_user);
+ return -ENOSPC;
+ }
+ if (!test_and_set_bit_lock(nr, &pks_key_allocation_map))
+ break;
+ }
+
+ /* for debugging key exhaustion */
+ pks_key_users[nr] = pkey_user;
+
+ return nr;
+}
+EXPORT_SYMBOL_GPL(pks_key_alloc);
+
+/*
+ * pks_key_free - Free a previously allocated PKS key
+ *
+ * @pkey: Key to be free'ed
+ */
+void pks_key_free(int pkey)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_PKS))
+ return;
+
+ if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY)
+ return;
+
+ /* Restore to default of no access */
+ pks_mknoaccess(pkey);
+ pks_key_users[pkey] = NULL;
+ __clear_bit(pkey, &pks_key_allocation_map);
+}
+EXPORT_SYMBOL_GPL(pks_key_free);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 90654cb63e9e..6900182d53ee 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1374,6 +1374,10 @@ static inline bool arch_has_pfn_modify_check(void)
# define PAGE_KERNEL_EXEC PAGE_KERNEL
#endif

+#ifndef PAGE_KERNEL_PKEY
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
/*
* Page Table Modification bits for pgtbl_mod_mask.
*
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba976048..cc3510cde64e 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -50,4 +50,26 @@ static inline void copy_init_pkru_to_fpregs(void)

#endif /* ! CONFIG_ARCH_HAS_PKEYS */

+#ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+static inline int pks_key_alloc(const char * const pkey_user)
+{
+ return -EINVAL;
+}
+static inline void pks_key_free(int pkey)
+{
+}
+static inline void pks_mknoaccess(int pkey)
+{
+ WARN_ON_ONCE(1);
+}
+static inline void pks_mkread(int pkey)
+{
+ WARN_ON_ONCE(1);
+}
+static inline void pks_mkrdwr(int pkey)
+{
+ WARN_ON_ONCE(1);
+}
+#endif /* ! CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
+
#endif /* _LINUX_PKEYS_H */
--
2.28.0.rc0.12.gb6a658bd00c9


2020-10-14 06:21:58

by Dave Hansen

Subject: Re: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API

> +static inline void pks_update_protection(int pkey, unsigned long protection)
> +{
> + current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
> + pkey, protection);
> + preempt_disable();
> + write_pkrs(current->thread.saved_pkrs);
> + preempt_enable();
> +}

Why does this need preempt count manipulation in addition to the
get/put_cpu_var() inside of write_pkrs()?

> +/**
> + * PKS access control functions
> + *
> + * Change the access of the domain specified by the pkey. These are global
> + * updates. They only affect the current running thread. It is undefined and
> + * a bug for users to call this without having allocated a pkey and using it as
> + * pkey here.
> + *
> + * pks_mknoaccess()
> + * Disable all access to the domain
> + * pks_mkread()
> + * Make the domain Read only
> + * pks_mkrdwr()
> + * Make the domain Read/Write
> + *
> + * @pkey the pkey for which the access should change.
> + *
> + */
> +void pks_mknoaccess(int pkey)
> +{
> + pks_update_protection(pkey, PKEY_DISABLE_ACCESS);
> +}
> +EXPORT_SYMBOL_GPL(pks_mknoaccess);

These are named like PTE manipulation functions, which is kinda weird.

What's wrong with: pks_disable_access(pkey) ?

> +void pks_mkread(int pkey)
> +{
> + pks_update_protection(pkey, PKEY_DISABLE_WRITE);
> +}
> +EXPORT_SYMBOL_GPL(pks_mkread);

I really don't like this name. It doesn't make readable, or even
read-only, *especially* if it was already access-disabled.

> +static const char pks_key_user0[] = "kernel";
> +
> +/* Store names of allocated keys for debug. Key 0 is reserved for the kernel. */
> +static const char *pks_key_users[PKS_NUM_KEYS] = {
> + pks_key_user0
> +};
> +
> +/*
> + * Each key is represented by a bit. Bit 0 is set for key 0 and reserved for
> + * its use. We use ulong for the bit operations but only 16 bits are used.
> + */
> +static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY;
> +
> +/*
> + * pks_key_alloc - Allocate a PKS key
> + *
> + * @pkey_user: String stored for debugging of key exhaustion. The caller is
> + * responsible to maintain this memory until pks_key_free().
> + */
> +int pks_key_alloc(const char * const pkey_user)
> +{
> + int nr;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_PKS))
> + return -EINVAL;

I'm not sure I like -EINVAL for this. I thought we returned -ENOSPC for
this case for user pkeys.

> + while (1) {
> + nr = find_first_zero_bit(&pks_key_allocation_map, PKS_NUM_KEYS);
> + if (nr >= PKS_NUM_KEYS) {
> + pr_info("Cannot allocate supervisor key for %s.\n",
> + pkey_user);
> + return -ENOSPC;
> + }
> + if (!test_and_set_bit_lock(nr, &pks_key_allocation_map))
> + break;
> + }
> +
> + /* for debugging key exhaustion */
> + pks_key_users[nr] = pkey_user;
> +
> + return nr;
> +}
> +EXPORT_SYMBOL_GPL(pks_key_alloc);
> +
> +/*
> + * pks_key_free - Free a previously allocated PKS key
> + *
> + * @pkey: Key to be free'ed
> + */
> +void pks_key_free(int pkey)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_PKS))
> + return;
> +
> + if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY)
> + return;

This seems worthy of a WARN_ON_ONCE() at least. It's essentially
corrupt data coming into a kernel API.

2020-10-15 03:30:49

by Ira Weiny

Subject: Re: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API

On Tue, Oct 13, 2020 at 11:43:57AM -0700, Dave Hansen wrote:
> > +static inline void pks_update_protection(int pkey, unsigned long protection)
> > +{
> > + current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
> > + pkey, protection);
> > + preempt_disable();
> > + write_pkrs(current->thread.saved_pkrs);
> > + preempt_enable();
> > +}
>
> Why does this need preempt count manipulation in addition to the
> get/put_cpu_var() inside of write_pkrs()?

This is a bug. The disable should be around the update_pkey_val().

>
> > +/**
> > + * PKS access control functions
> > + *
> > + * Change the access of the domain specified by the pkey. These are global
> > + * updates. They only affect the current running thread. It is undefined and
> > + * a bug for users to call this without having allocated a pkey and using it as
> > + * pkey here.
> > + *
> > + * pks_mknoaccess()
> > + * Disable all access to the domain
> > + * pks_mkread()
> > + * Make the domain Read only
> > + * pks_mkrdwr()
> > + * Make the domain Read/Write
> > + *
> > + * @pkey the pkey for which the access should change.
> > + *
> > + */
> > +void pks_mknoaccess(int pkey)
> > +{
> > + pks_update_protection(pkey, PKEY_DISABLE_ACCESS);
> > +}
> > +EXPORT_SYMBOL_GPL(pks_mknoaccess);
>
> These are named like PTE manipulation functions, which is kinda weird.
>
> What's wrong with: pks_disable_access(pkey) ?

Internal review suggested these names. I'm not dead set on them.

FWIW I would rather they not get too wordy.

I was trying to get some consistency with pks_mk*() as meaning PKS 'make' X.

To me 'disable' implies a state transition whereas 'make' implies we are
'setting' an absolute value. I think the latter is a better name. And 'make'
made more sense because 'set' is so overloaded IMHO.

>
> > +void pks_mkread(int pkey)
> > +{
> > + pks_update_protection(pkey, PKEY_DISABLE_WRITE);
> > +}
> > +EXPORT_SYMBOL_GPL(pks_mkread);
>
> I really don't like this name. It doesn't make readable, or even
> read-only, *especially* if it was already access-disabled.

Ok.

But it does make sense if going from access-disable to read, correct? I could
see this being better named pks_mkreadonly() so that going from RW to this
would make more sense. Especially after thinking about it above, 'read only'
needs to be in the name.

Before I change anything I'd like to get consensus on naming.

How about the following?

pks_mk_noaccess()
pks_mk_readonly()
pks_mk_readwrite()

?

>
> > +static const char pks_key_user0[] = "kernel";
> > +
> > +/* Store names of allocated keys for debug. Key 0 is reserved for the kernel. */
> > +static const char *pks_key_users[PKS_NUM_KEYS] = {
> > + pks_key_user0
> > +};
> > +
> > +/*
> > + * Each key is represented by a bit. Bit 0 is set for key 0 and reserved for
> > + * its use. We use ulong for the bit operations but only 16 bits are used.
> > + */
> > +static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY;
> > +
> > +/*
> > + * pks_key_alloc - Allocate a PKS key
> > + *
> > + * @pkey_user: String stored for debugging of key exhaustion. The caller is
> > + * responsible to maintain this memory until pks_key_free().
> > + */
> > +int pks_key_alloc(const char * const pkey_user)
> > +{
> > + int nr;
> > +
> > + if (!cpu_feature_enabled(X86_FEATURE_PKS))
> > + return -EINVAL;
>
> I'm not sure I like -EINVAL for this. I thought we returned -ENOSPC for
> this case for user pkeys.

-ENOTSUP?

I'm not really sure anyone will need to know the difference between the
platform not supporting the key vs running out of them. But they are 2
different error conditions.

>
> > + while (1) {
> > + nr = find_first_zero_bit(&pks_key_allocation_map, PKS_NUM_KEYS);
> > + if (nr >= PKS_NUM_KEYS) {
> > + pr_info("Cannot allocate supervisor key for %s.\n",
> > + pkey_user);
> > + return -ENOSPC;

We return -ENOSPC here when running out of keys.

> > + }
> > + if (!test_and_set_bit_lock(nr, &pks_key_allocation_map))
> > + break;
> > + }
> > +
> > + /* for debugging key exhaustion */
> > + pks_key_users[nr] = pkey_user;
> > +
> > + return nr;
> > +}
> > +EXPORT_SYMBOL_GPL(pks_key_alloc);
> > +
> > +/*
> > + * pks_key_free - Free a previously allocated PKS key
> > + *
> > + * @pkey: Key to be free'ed
> > + */
> > +void pks_key_free(int pkey)
> > +{
> > + if (!cpu_feature_enabled(X86_FEATURE_PKS))
> > + return;
> > +
> > + if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY)
> > + return;
>
> This seems worthy of a WARN_ON_ONCE() at least. It's essentially
> corrupt data coming into a kernel API.

Ok, Done,
Ira

2020-10-16 14:44:03

by Peter Zijlstra

Subject: Re: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API

On Fri, Oct 09, 2020 at 12:42:54PM -0700, [email protected] wrote:
> +static inline void pks_update_protection(int pkey, unsigned long protection)
> +{
> + current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
> + pkey, protection);
> + preempt_disable();
> + write_pkrs(current->thread.saved_pkrs);
> + preempt_enable();
> +}

write_pkrs() already disables preemption itself. Wrapping it in yet
another layer is useless.

2020-10-17 09:51:03

by Ira Weiny

Subject: Re: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API

On Fri, Oct 16, 2020 at 01:07:47PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 12:42:54PM -0700, [email protected] wrote:
> > +static inline void pks_update_protection(int pkey, unsigned long protection)
> > +{
> > + current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
> > + pkey, protection);
> > + preempt_disable();
> > + write_pkrs(current->thread.saved_pkrs);
> > + preempt_enable();
> > +}
>
> write_pkrs() already disables preemption itself. Wrapping it in yet
> another layer is useless.

I was thinking the update to saved_pkrs needed this protection as well and that
it should be included in the preemption disable. But that too is incorrect.

I've removed this preemption disable.
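
IOW the helper reduces to something like this (untested sketch):

static inline void pks_update_protection(int pkey, unsigned long protection)
{
	current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
						     pkey, protection);
	/* write_pkrs() handles disabling preemption itself */
	write_pkrs(current->thread.saved_pkrs);
}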

Thanks,
Ira