2018-11-07 16:58:37

by Christophe Leroy

Subject: [PATCH v1 1/6] powerpc/mm: Fix reporting of kernel execute faults on the 8xx

On the 8xx, no-execute is set via PPP bits in the PTE. Therefore
a no-exec fault generates DSISR_PROTFAULT error bits,
not DSISR_NOEXEC_OR_G.

This patch adds DSISR_PROTFAULT to the test mask.

Fixes: d3ca587404b3 ("powerpc/mm: Fix reporting of kernel execute faults")
Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/mm/fault.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 1697e903bbf2..50e5c790d11e 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -226,7 +226,9 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr,
static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
unsigned long address)
{
- if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) {
+ /* NX faults set DSISR_PROTFAULT on the 8xx, DSISR_NOEXEC_OR_G on others */
+ if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT |
+ DSISR_PROTFAULT))) {
printk_ratelimited(KERN_CRIT "kernel tried to execute"
" exec-protected page (%lx) -"
"exploit attempt? (uid: %d)\n",
--
2.13.3



2018-11-07 16:57:20

by Christophe Leroy

Subject: [RFC PATCH v1 2/6] powerpc: Add framework for Kernel Userspace Protection

This patch adds a skeleton for Kernel Userspace Protection
functionalities like Kernel Userspace Access Protection and
Kernel Userspace Execution Prevention.
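
As a sketch of where this skeleton ends up once the later patches in
the series are applied (assembled from patches 3/6 and 5/6 below, not a
literal excerpt of this patch):

void setup_kup(void)
{
	setup_kuep(disable_kuep);	/* patch 3/6: execution prevention */
	setup_kuap(disable_kuap);	/* patch 5/6: access protection */
}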

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/include/asm/mmu.h | 2 ++
arch/powerpc/kernel/setup_64.c | 1 +
arch/powerpc/mm/init-common.c | 4 ++++
arch/powerpc/mm/init_32.c | 2 ++
4 files changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index eb20eb3b8fb0..39c51dcba8f4 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -268,6 +268,8 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)
}
#endif /* CONFIG_PPC_MEM_KEYS */

+void setup_kup(void);
+
#endif /* !__ASSEMBLY__ */

/* The kernel use the constants below to index in the page sizes array.
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2a51e4cc8246..4e21b10ff38d 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -372,6 +372,7 @@ void __init early_setup(unsigned long dt_ptr)
*/
btext_map();
#endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */
+ setup_kup();
}

#ifdef CONFIG_SMP
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 2b656e67f2ea..e469814e8290 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -25,6 +25,10 @@
#include <asm/pgalloc.h>
#include <asm/pgtable.h>

+void setup_kup(void)
+{
+}
+
static void pgd_ctor(void *addr)
{
memset(addr, 0, PGD_TABLE_SIZE);
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 3e59e5d64b01..4012ee6f0754 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -182,6 +182,8 @@ void __init MMU_init(void)
btext_unmap();
#endif

+ setup_kup();
+
/* Shortly after that, the entire linear mapping will be available */
memblock_set_current_limit(lowmem_end_addr);
}
--
2.13.3


2018-11-07 16:57:25

by Christophe Leroy

Subject: [RFC PATCH v1 6/6] powerpc/8xx: Add Kernel Userspace Access Protection

This patch adds Kernel Userspace Access Protection on the 8xx.

When a page is RO or RW, it is set RO or RW for Key 0 and NA
for Key 1.

Up to now, the User group is defined with Key 0 for both User and
Supervisor.

By changing the group to Key 0 for User and Key 1 for Supervisor,
this patch prevents the Kernel from being able to access user data.

At exception entry, the kernel saves SPRN_MD_AP in the regs struct
and reapplies the protection. At exception exit, it restores SPRN_MD_AP
to the value it had on exception entry.

For the time being, the unused mq field of the pt_regs struct is used
for that.
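
An equivalent C-level view of the entry_32.S hunks below (the helper
names here are made up for illustration; the actual change is in
assembly):

static inline void kuap_save_and_lock(struct pt_regs *regs)
{
	regs->mq = mfspr(SPRN_MD_AP);	/* remember the state at exception entry */
	mtspr(SPRN_MD_AP, MD_APG_KUAP);	/* re-lock user access while in the kernel */
}

static inline void kuap_restore(struct pt_regs *regs)
{
	mtspr(SPRN_MD_AP, regs->mq);	/* put back what was saved at entry */
}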

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/include/asm/mmu-8xx.h | 7 +++++++
arch/powerpc/include/asm/nohash/32/pte-8xx.h | 23 +++++++++++++++++++++++
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/entry_32.S | 19 +++++++++++++++++++
arch/powerpc/kernel/process.c | 2 +-
arch/powerpc/mm/8xx_mmu.c | 10 ++++++++++
arch/powerpc/platforms/Kconfig.cputype | 1 +
7 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index 53dbf0788fce..bab8568def50 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_POWERPC_MMU_8XX_H_
#define _ASM_POWERPC_MMU_8XX_H_
+
/*
* PPC8xx support
*/
@@ -120,6 +121,12 @@
*/
#define MD_APG_INIT 0x44444444

+/*
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 10 (all accesses performed according to swapped page definition)
+ */
+#define MD_APG_KUAP 0x66666666
+
/* The effective page number register. When read, contains the information
* about the last instruction TLB miss. When MD_RPN is written, bits in
* this register are used to create the TLB entry.
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 6bfe041ef59d..f1ec7cf949d5 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -135,6 +135,29 @@ static inline pte_t pte_mkhuge(pte_t pte)
}

#define pte_mkhuge pte_mkhuge
+
+#ifdef CONFIG_PPC_KUAP
+static inline void lock_user_wr_access(void)
+{
+ mtspr(SPRN_MD_AP, MD_APG_KUAP);
+}
+
+static inline void unlock_user_wr_access(void)
+{
+ mtspr(SPRN_MD_AP, MD_APG_INIT);
+}
+
+static inline void lock_user_rd_access(void)
+{
+ mtspr(SPRN_MD_AP, MD_APG_KUAP);
+}
+
+static inline void unlock_user_rd_access(void)
+{
+ mtspr(SPRN_MD_AP, MD_APG_INIT);
+}
+#endif
+
#endif

#endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..da2f5d011ddb 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -320,6 +320,7 @@ int main(void)
*/
STACK_PT_REGS_OFFSET(_DEAR, dar);
STACK_PT_REGS_OFFSET(_ESR, dsisr);
+ STACK_PT_REGS_OFFSET(_MQ, mq);
#else /* CONFIG_PPC64 */
STACK_PT_REGS_OFFSET(SOFTE, softe);
STACK_PT_REGS_OFFSET(_PPR, ppr);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 77decded1175..7fd9dc16dd48 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -155,6 +155,13 @@ transfer_to_handler:
mfspr r2,SPRN_XER
stw r12,_CTR(r11)
stw r2,_XER(r11)
+#ifdef CONFIG_PPC_KUAP
+ mfspr r2, SPRN_MD_AP
+ stw r2, _MQ(r11)
+ lis r2, MD_APG_KUAP@h
+ ori r2, r2, MD_APG_KUAP@l
+ mtspr SPRN_MD_AP, r2
+#endif
mfspr r12,SPRN_SPRG_THREAD
addi r2,r12,-THREAD
tovirt(r2,r2) /* set r2 to current */
@@ -447,6 +454,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
mtlr r4
mtcr r5
lwz r7,_NIP(r1)
+#ifdef CONFIG_PPC_KUAP
+ lwz r2, _MQ(r1)
+ mtspr SPRN_MD_AP, r2
+#endif
lwz r2,GPR2(r1)
lwz r1,GPR1(r1)
#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
@@ -745,6 +756,10 @@ fast_exception_return:
mtcr r10
lwz r10,_LINK(r11)
mtlr r10
+#ifdef CONFIG_PPC_KUAP
+ lwz r10, _MQ(r11)
+ mtspr SPRN_MD_AP, r10
+#endif
REST_GPR(10, r11)
#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
mtspr SPRN_NRI, r0
@@ -997,6 +1012,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
.globl exc_exit_restart
exc_exit_restart:
lwz r12,_NIP(r1)
+#ifdef CONFIG_PPC_KUAP
+ lwz r10, _MQ(r1)
+ mtspr SPRN_MD_AP, r10
+#endif
#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
mtspr SPRN_NRI, r0
#endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 96f34730010f..bbec29bda4cb 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1768,7 +1768,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
regs->trap &= ~1UL;

#ifdef CONFIG_PPC32
- regs->mq = 0;
+ regs->mq = MD_APG_KUAP;
regs->nip = start;
regs->msr = MSR_USER;
#else
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 597d3bd2d9b5..acea296d4c76 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -204,3 +204,13 @@ void setup_kuep(bool disabled)

mtspr(SPRN_MI_AP, MI_APG_KUEP);
}
+
+void setup_kuap(bool disabled)
+{
+ pr_warn("Activating Kernel Userspace Access Protection\n");
+
+ if (disabled)
+ pr_warn("KUAP cannot be disabled yet on 8xx when compiled in\n");
+
+ mtspr(SPRN_MD_AP, MD_APG_KUAP);
+}
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index d1757cedf60b..a20669a9ec13 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -34,6 +34,7 @@ config PPC_8xx
select FSL_SOC
select SYS_SUPPORTS_HUGETLBFS
select PPC_HAVE_KUEP
+ select PPC_HAVE_KUAP

config 40x
bool "AMCC 40x"
--
2.13.3


2018-11-07 16:57:29

by Christophe Leroy

Subject: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection

This patch implements a framework for Kernel Userspace Access Protection.

Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and lock/unlock_user_rd/wr_access().

We separate read and write accesses because some subarches like
book3s32 might only support write access protection.
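
The contract, as applied throughout the uaccess/futex/checksum hunks
below, is to bracket each user access with the matching unlock/lock
pair; a sketch of the read case (illustrative name, mirroring the
raw_copy_from_user() hunk):

static inline unsigned long copy_from_user_sketch(void *to,
						  const void __user *from,
						  unsigned long n)
{
	unsigned long ret;

	unlock_user_rd_access();	/* no-op unless CONFIG_PPC_KUAP is selected */
	ret = __copy_tofrom_user((__force void __user *)to, from, n);
	lock_user_rd_access();
	return ret;
}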

Signed-off-by: Christophe Leroy <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
arch/powerpc/include/asm/futex.h | 4 +++
arch/powerpc/include/asm/mmu.h | 6 ++++
arch/powerpc/include/asm/uaccess.h | 44 ++++++++++++++++++++-----
arch/powerpc/lib/checksum_wrappers.c | 4 +++
arch/powerpc/mm/fault.c | 17 +++++++---
arch/powerpc/mm/init-common.c | 10 ++++++
arch/powerpc/platforms/Kconfig.cputype | 12 +++++++
8 files changed, 86 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1103549363bb..0d059b141ff8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2792,7 +2792,7 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings

- nosmap [X86]
+ nosmap [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.

diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..bd58be09f1e8 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
{
int oldval = 0, ret;

+ unlock_user_wr_access();
pagefault_disable();

switch (op) {
@@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
if (!ret)
*oval = oldval;

+ lock_user_wr_access();
return ret;
}

@@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

+ unlock_user_wr_access();
__asm__ __volatile__ (
PPC_ATOMIC_ENTRY_BARRIER
"1: lwarx %1,0,%3 # futex_atomic_cmpxchg_inatomic\n\
@@ -95,6 +98,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
: "cc", "memory");

*uval = prev;
+ lock_user_wr_access();
return ret;
}

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index cf92f2a813a8..5631a906af55 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -276,6 +276,12 @@ void setup_kuep(bool disabled);
static inline void setup_kuep(bool disabled) { }
#endif

+#ifdef CONFIG_PPC_KUAP
+void setup_kuap(bool disabled);
+#else
+static inline void setup_kuap(bool disabled) { }
+#endif
+
#endif /* !__ASSEMBLY__ */

/* The kernel use the constants below to index in the page sizes array.
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 15bea9a0f260..2f3625cbfcee 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/extable.h>
+#include <asm/pgtable.h>

/*
* The fs value determines whether argument validity checking should be
@@ -62,6 +63,13 @@ static inline int __access_ok(unsigned long addr, unsigned long size,

#endif

+#ifndef CONFIG_PPC_KUAP
+static inline void unlock_user_rd_access(void) { }
+static inline void lock_user_rd_access(void) { }
+static inline void unlock_user_wr_access(void) { }
+static inline void lock_user_wr_access(void) { }
+#endif
+
#define access_ok(type, addr, size) \
(__chk_user_ptr(addr), \
__access_ok((__force unsigned long)(addr), (size), get_fs()))
@@ -141,6 +149,7 @@ extern long __put_user_bad(void);
#define __put_user_size(x, ptr, size, retval) \
do { \
retval = 0; \
+ unlock_user_wr_access(); \
switch (size) { \
case 1: __put_user_asm(x, ptr, retval, "stb"); break; \
case 2: __put_user_asm(x, ptr, retval, "sth"); break; \
@@ -148,6 +157,7 @@ do { \
case 8: __put_user_asm2(x, ptr, retval); break; \
default: __put_user_bad(); \
} \
+ lock_user_wr_access(); \
} while (0)

#define __put_user_nocheck(x, ptr, size) \
@@ -240,6 +250,7 @@ do { \
__chk_user_ptr(ptr); \
if (size > sizeof(x)) \
(x) = __get_user_bad(); \
+ unlock_user_rd_access(); \
switch (size) { \
case 1: __get_user_asm(x, ptr, retval, "lbz"); break; \
case 2: __get_user_asm(x, ptr, retval, "lhz"); break; \
@@ -247,6 +258,7 @@ do { \
case 8: __get_user_asm2(x, ptr, retval); break; \
default: (x) = __get_user_bad(); \
} \
+ lock_user_rd_access(); \
} while (0)

/*
@@ -306,15 +318,20 @@ extern unsigned long __copy_tofrom_user(void __user *to,
static inline unsigned long
raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
{
- return __copy_tofrom_user(to, from, n);
+ unsigned long ret;
+ unlock_user_wr_access(); \
+ ret = __copy_tofrom_user(to, from, n); \
+ lock_user_wr_access(); \
+ return ret; \
}
#endif /* __powerpc64__ */

static inline unsigned long raw_copy_from_user(void *to,
const void __user *from, unsigned long n)
{
+ unsigned long ret;
if (__builtin_constant_p(n) && (n <= 8)) {
- unsigned long ret = 1;
+ ret = 1;

switch (n) {
case 1:
@@ -339,14 +356,18 @@ static inline unsigned long raw_copy_from_user(void *to,
}

barrier_nospec();
- return __copy_tofrom_user((__force void __user *)to, from, n);
+ unlock_user_rd_access();
+ ret = __copy_tofrom_user((__force void __user *)to, from, n);
+ lock_user_rd_access();
+ return ret;
}

static inline unsigned long raw_copy_to_user(void __user *to,
const void *from, unsigned long n)
{
+ unsigned long ret;
if (__builtin_constant_p(n) && (n <= 8)) {
- unsigned long ret = 1;
+ ret = 1;

switch (n) {
case 1:
@@ -366,17 +387,24 @@ static inline unsigned long raw_copy_to_user(void __user *to,
return 0;
}

- return __copy_tofrom_user(to, (__force const void __user *)from, n);
+ unlock_user_wr_access();
+ ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
+ lock_user_wr_access();
+ return ret;
}

extern unsigned long __clear_user(void __user *addr, unsigned long size);

static inline unsigned long clear_user(void __user *addr, unsigned long size)
{
+ unsigned long ret = size;
might_fault();
- if (likely(access_ok(VERIFY_WRITE, addr, size)))
- return __clear_user(addr, size);
- return size;
+ if (likely(access_ok(VERIFY_WRITE, addr, size))) {
+ unlock_user_wr_access();
+ ret = __clear_user(addr, size);
+ lock_user_wr_access();
+ }
+ return ret;
}

extern long strncpy_from_user(char *dst, const char __user *src, long count);
diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index a0cb63fb76a1..7dca421ecb53 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -28,6 +28,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
{
unsigned int csum;

+ unlock_user_rd_access();
might_sleep();

*err_ptr = 0;
@@ -60,6 +61,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
}

out:
+ lock_user_rd_access();
return (__force __wsum)csum;
}
EXPORT_SYMBOL(csum_and_copy_from_user);
@@ -69,6 +71,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
{
unsigned int csum;

+ unlock_user_wr_access();
might_sleep();

*err_ptr = 0;
@@ -97,6 +100,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
}

out:
+ lock_user_wr_access();
return (__force __wsum)csum;
}
EXPORT_SYMBOL(csum_and_copy_to_user);
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index e57bd46cf25b..7cea9d7fc5e8 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -223,9 +223,11 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr,
}

/* Is this a bad kernel fault ? */
-static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
+static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
unsigned long address)
{
+ int is_exec = TRAP(regs) == 0x400;
+
/* NX faults set DSISR_PROTFAULT on the 8xx, DSISR_NOEXEC_OR_G on others */
if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT |
DSISR_PROTFAULT))) {
@@ -236,7 +238,13 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
address, from_kuid(&init_user_ns,
current_uid()));
}
- return is_exec || (address >= TASK_SIZE);
+ if (!is_exec && address < TASK_SIZE && (error_code & DSISR_PROTFAULT) &&
+ !search_exception_tables(regs->nip))
+ printk_ratelimited(KERN_CRIT "Kernel attempted to access user"
+ " page (%lx) - exploit attempt? (uid: %d)\n",
+ address, from_kuid(&init_user_ns,
+ current_uid()));
+ return is_exec || (address >= TASK_SIZE) || !search_exception_tables(regs->nip);
}

static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
@@ -442,9 +450,10 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,

/*
* The kernel should never take an execute fault nor should it
- * take a page fault to a kernel address.
+ * take a page fault to a kernel address or a page fault to a user
+ * address outside of dedicated places
*/
- if (unlikely(!is_user && bad_kernel_fault(is_exec, error_code, address)))
+ if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address)))
return SIGSEGV;

/*
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 89c8bd58509e..1e12f936844e 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -26,6 +26,7 @@
#include <asm/pgtable.h>

static bool disable_kuep = !IS_ENABLED(CONFIG_PPC_KUEP);
+static bool disable_kuap = !IS_ENABLED(CONFIG_PPC_KUAP);

static int __init parse_nosmep(char *p)
{
@@ -35,9 +36,18 @@ static int __init parse_nosmep(char *p)
}
early_param("nosmep", parse_nosmep);

+static int __init parse_nosmap(char *p)
+{
+ disable_kuap = true;
+ pr_warn("Disabling Kernel Userspace Access Protection\n");
+ return 0;
+}
+early_param("nosmap", parse_nosmap);
+
void setup_kup(void)
{
setup_kuep(disable_kuep);
+ setup_kuap(disable_kuap);
}

static void pgd_ctor(void *addr)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index bbcae320324c..d1757cedf60b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -364,6 +364,18 @@ config PPC_KUEP

If you're unsure, say Y.

+config PPC_HAVE_KUAP
+ bool
+
+config PPC_KUAP
+ bool "Kernel Userspace Access Protection"
+ depends on PPC_HAVE_KUAP
+ default y
+ help
+ Enable support for Kernel Userspace Access Protection (KUAP)
+
+ If you're unsure, say Y.
+
config ARCH_ENABLE_HUGEPAGE_MIGRATION
def_bool y
depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
--
2.13.3


2018-11-07 16:58:07

by Christophe Leroy

Subject: [RFC PATCH v1 3/6] powerpc: Add skeleton for Kernel Userspace Execution Prevention

This patch adds a skeleton for Kernel Userspace Execution Prevention.

Subarches implementing it then have to select CONFIG_PPC_HAVE_KUEP
and provide a setup_kuep() function.

Signed-off-by: Christophe Leroy <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
arch/powerpc/include/asm/mmu.h | 6 ++++++
arch/powerpc/mm/fault.c | 3 ++-
arch/powerpc/mm/init-common.c | 11 +++++++++++
arch/powerpc/platforms/Kconfig.cputype | 12 ++++++++++++
5 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 81d1d5a74728..1103549363bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2796,7 +2796,7 @@
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.

- nosmep [X86]
+ nosmep [X86,PPC]
Disable SMEP (Supervisor Mode Execution Prevention)
even if it is supported by processor.

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 39c51dcba8f4..cf92f2a813a8 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -270,6 +270,12 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)

void setup_kup(void);

+#ifdef CONFIG_PPC_KUEP
+void setup_kuep(bool disabled);
+#else
+static inline void setup_kuep(bool disabled) { }
+#endif
+
#endif /* !__ASSEMBLY__ */

/* The kernel use the constants below to index in the page sizes array.
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 50e5c790d11e..e57bd46cf25b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -230,8 +230,9 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT |
DSISR_PROTFAULT))) {
printk_ratelimited(KERN_CRIT "kernel tried to execute"
- " exec-protected page (%lx) -"
+ " %s page (%lx) -"
"exploit attempt? (uid: %d)\n",
+ address >= TASK_SIZE ? "exec-protected" : "user",
address, from_kuid(&init_user_ns,
current_uid()));
}
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index e469814e8290..89c8bd58509e 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -25,8 +25,19 @@
#include <asm/pgalloc.h>
#include <asm/pgtable.h>

+static bool disable_kuep = !IS_ENABLED(CONFIG_PPC_KUEP);
+
+static int __init parse_nosmep(char *p)
+{
+ disable_kuep = true;
+ pr_warn("Disabling Kernel Userspace Execution Prevention\n");
+ return 0;
+}
+early_param("nosmep", parse_nosmep);
+
void setup_kup(void)
{
+ setup_kuep(disable_kuep);
}

static void pgd_ctor(void *addr)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index f4e2c5729374..70830cb3c18a 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -351,6 +351,18 @@ config PPC_RADIX_MMU_DEFAULT

If you're unsure, say Y.

+config PPC_HAVE_KUEP
+ bool
+
+config PPC_KUEP
+ bool "Kernel Userspace Execution Prevention"
+ depends on PPC_HAVE_KUEP
+ default y
+ help
+ Enable support for Kernel Userspace Execution Prevention (KUEP)
+
+ If you're unsure, say Y.
+
config ARCH_ENABLE_HUGEPAGE_MIGRATION
def_bool y
depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
--
2.13.3


2018-11-07 16:58:33

by Christophe Leroy

Subject: [RFC PATCH v1 4/6] powerpc/8xx: Add Kernel Userspace Execution Prevention

This patch adds Kernel Userspace Execution Prevention on the 8xx.

When a page is Executable, it is set Executable for Key 0 and NX
for Key 1.

Up to now, the User group is defined with Key 0 for both User and
Supervisor.

By changing the group to Key 0 for User and Key 1 for Supervisor,
this patch prevents the Kernel from being able to execute user code.
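
For reference, 0x66666666 packs sixteen 2-bit access protection group
entries alternating between 01 and 10, matching the comment in the hunk
below; a throwaway user-space snippet (not kernel code) to visualise it:

#include <stdio.h>

int main(void)
{
	unsigned int apg = 0x66666666;	/* MI_APG_KUEP (and MD_APG_KUAP in 6/6) */
	int i;

	for (i = 0; i < 16; i++) {
		unsigned int entry = (apg >> (30 - 2 * i)) & 3;

		/* even entries: 01 (page definition), odd entries: 10 (swapped) */
		printf("APG entry %2d = %u%u\n", i, entry >> 1, entry & 1);
	}
	return 0;
}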

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/include/asm/mmu-8xx.h | 6 ++++++
arch/powerpc/mm/8xx_mmu.c | 10 ++++++++++
arch/powerpc/platforms/Kconfig.cputype | 1 +
3 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index fa05aa566ece..53dbf0788fce 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -41,6 +41,12 @@
*/
#define MI_APG_INIT 0x44444444

+/*
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 10 (all accesses performed according to swapped page definition)
+ */
+#define MI_APG_KUEP 0x66666666
+
/* The effective page number register. When read, contains the information
* about the last instruction TLB miss. When MI_RPN is written, bits in
* this register are used to create the TLB entry.
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 01b7f5107c3a..597d3bd2d9b5 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -194,3 +194,13 @@ void flush_instruction_cache(void)
mtspr(SPRN_IC_CST, IDC_INVALL);
isync();
}
+
+void setup_kuep(bool disabled)
+{
+ if (disabled)
+ return;
+
+ pr_warn("Activating Kernel Userspace Execution Prevention\n");
+
+ mtspr(SPRN_MI_AP, MI_APG_KUEP);
+}
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 70830cb3c18a..bbcae320324c 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -33,6 +33,7 @@ config PPC_8xx
bool "Freescale 8xx"
select FSL_SOC
select SYS_SUPPORTS_HUGETLBFS
+ select PPC_HAVE_KUEP

config 40x
bool "AMCC 40x"
--
2.13.3


2018-11-07 17:10:52

by Christophe Leroy

Subject: Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection



On 07/11/2018 at 17:56, Christophe Leroy wrote:
> This patch implements a framework for Kernel Userspace Access Protection.
>
> Then subarches will have to possibility to provide their own implementation
> by providing setup_kuap(), and lock/unlock_user_rd/wr_access
>
> We separate read and write accesses because some subarches like
> book3s32 might only support write access protection.
>
> Signed-off-by: Christophe Leroy <[email protected]>

Note that many parts of this patch come from Russell's series. Thanks.


2018-11-19 02:45:13

by Russell Currey

Subject: Re: [RFC PATCH v1 3/6] powerpc: Add skeleton for Kernel Userspace Execution Prevention

On Wed, 2018-11-07 at 16:56 +0000, Christophe Leroy wrote:
> This patch adds a skeleton for Kernel Userspace Execution Prevention.
>
> Then subarches implementing it have to define CONFIG_PPC_HAVE_KUEP
> and provide setup_kuep() function.
>
> Signed-off-by: Christophe Leroy <[email protected]>

An open question (with nothing to do specifically with this patch):

For what reason would you ever disable execution prevention? Clearly
there must be something since "nosmep" is a thing, but I don't know why
we'd ever do it.

- Russell


2018-11-21 02:58:46

by Russell Currey

Subject: Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection

On Wed, 2018-11-07 at 16:56 +0000, Christophe Leroy wrote:
> This patch implements a framework for Kernel Userspace Access
> Protection.
>
> Then subarches will have to possibility to provide their own
> implementation
> by providing setup_kuap(), and lock/unlock_user_rd/wr_access
>
> We separate read and write accesses because some subarches like
> book3s32 might only support write access protection.
>
> Signed-off-by: Christophe Leroy <[email protected]>

Separating reads and writes does have a performance impact - I'm doing
some benchmarking to find out exactly how much - but at least for radix
it means we have to do a RMW instead of just a write. It does add some
amount of security, though.

The other issue I have is that you're just locking everything here
(like I was), and not doing anything different for just reads or
writes. In theory, wouldn't someone assume that they could (for
example) unlock reads, lock writes, then attempt to read? At which
point the read would fail, because the lock actually locks both.

I would think we either need to bundle read/write locking/unlocking
together, or only implement this on platforms that can do one at a
time, unless there's a cleaner way to handle this. Glancing at the
values you use for 8xx, this doesn't seem possible there, and it's a
definite performance hit for radix.

At the same time, as you say, it would suck for book3s32, which can
only do writes - but maybe we just do both at the same time, and if
it's implemented for that platform it could just warn at init that it
only applies to writes?

Curious for people's thoughts on this.

- Russell


2018-11-21 08:33:33

by Christophe Leroy

Subject: Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection



On 21/11/2018 at 03:26, Russell Currey wrote:
> On Wed, 2018-11-07 at 16:56 +0000, Christophe Leroy wrote:
>> This patch implements a framework for Kernel Userspace Access
>> Protection.
>>
>> Then subarches will have to possibility to provide their own
>> implementation
>> by providing setup_kuap(), and lock/unlock_user_rd/wr_access
>>
>> We separate read and write accesses because some subarches like
>> book3s32 might only support write access protection.
>>
>> Signed-off-by: Christophe Leroy <[email protected]>
>
> Separating read and writes does have a performance impact, I'm doing
> some benchmarking to find out exactly how much - but at least for radix
> it means we have to do a RMW instead of just a write. It does add some
> amount of security, though.
>
> The other issue I have is that you're just locking everything here
> (like I was), and not doing anything different for just reads or
> writes. In theory, wouldn't someone assume that they could (for
> example) unlock reads, lock writes, then attempt to read? At which
> point the read would fail, because the lock actually locks both.
>
> I would think we either need to bundle read/write locking/unlocking
> together, or only implement this on platforms that can do one at a
> time, unless there's a cleaner way to handle this. Glancing at the
> values you use for 8xx, this doesn't seem possible there, and it's a
> definite performance hit for radix.
>
> At the same time, as you say, it would suck for book3s32 that can only
> do writes, but maybe just doing both at the same time and if
> implemented for that platform it could just have a warning that it only
> applies to writes on init?

Well, I see your points. My idea was not to separate read and write
on platforms that can lock both. I think it is no problem to also
unlock writes when we are doing a read, so on platforms that can do
both, both should do the same.

The idea was to avoid spending time unlocking writes for doing a read on
platforms on which reads are not locked. And for platforms able to
independently unlock/lock reads and writes, if only unlocking reads can
improve performance it can be interesting as well.

For book3s/32, locking/unlocking will be done through the Kp/Ks bits in
the segment registers; the function won't be trivial as it may involve
more than one segment at a time. So I just wanted to avoid spending
time doing that for reads, as reads won't be protected. That may also
be the case on older book3s/64, may it not?

On Book3s/32, the page protection bits are as follows:

PP      Key 0   Key 1
00      RW      NA
01      RW      RO
10      RW      RW
11      RO      RO

So the idea is to encode user RW with PP01 (instead of PP10 today) and
user RO with PP11 (as done today), giving Key0 to user and Key1 to the
kernel (today both user and kernel have Key1). Then, when the kernel
needs to write, we change Ks to Key0 in the segment registers for the
involved segments.
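
As a rough sketch (not from any posted patch) of what such a segment
register walk could look like, assuming Ks is the 0x40000000 bit of the
segment register word and that each segment covers 256M:

static inline void kuap_update_segments(unsigned long addr,
					unsigned long size, int ks)
{
	unsigned long sr, end = addr + size;

	/* walk the 256M segments covering the user range */
	for (addr &= 0xf0000000; addr < end; addr += 0x10000000) {
		asm volatile("mfsrin %0, %1" : "=r" (sr) : "r" (addr));
		if (ks)
			sr |= 0x40000000;	/* Ks = 1: kernel uses Key 1 (locked) */
		else
			sr &= ~0x40000000;	/* Ks = 0: kernel uses Key 0 (unlocked) */
		asm volatile("mtsrin %0, %1" : : "r" (sr), "r" (addr));
	}
	asm volatile("isync" : : : "memory");
}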

I'm not sure there is any risk that someone nests unlocks/locks for
reads and unlocks/locks for writes, because the unlocks/locks are done
in very limited places.

Christophe


>
> Curious for people's thoughts on this.
>
> - Russell
>

2018-11-21 13:09:49

by Michael Ellerman

Subject: Re: [RFC PATCH v1 3/6] powerpc: Add skeleton for Kernel Userspace Execution Prevention

Russell Currey <[email protected]> writes:

> On Wed, 2018-11-07 at 16:56 +0000, Christophe Leroy wrote:
>> This patch adds a skeleton for Kernel Userspace Execution Prevention.
>>
>> Then subarches implementing it have to define CONFIG_PPC_HAVE_KUEP
>> and provide setup_kuep() function.
>>
>> Signed-off-by: Christophe Leroy <[email protected]>
>
> An open question (with nothing to do specifically with this patch):
>
> For what reason would you ever disable execution prevention? Clearly
> there must be something since "nosmep" is a thing, but I don't know why
> we'd ever do it.

Because depending on the implementation there might be a performance
overhead, and you may want to avoid that.

cheers

2018-11-22 11:45:02

by Russell Currey

Subject: Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection

On Wed, 2018-11-21 at 09:32 +0100, Christophe LEROY wrote:
>
> On 21/11/2018 at 03:26, Russell Currey wrote:
> > On Wed, 2018-11-07 at 16:56 +0000, Christophe Leroy wrote:
> > > This patch implements a framework for Kernel Userspace Access
> > > Protection.
> > >
> > > Then subarches will have to possibility to provide their own
> > > implementation
> > > by providing setup_kuap(), and lock/unlock_user_rd/wr_access
> > >
> > > We separate read and write accesses because some subarches like
> > > book3s32 might only support write access protection.
> > >
> > > Signed-off-by: Christophe Leroy <[email protected]>
> >
> > Separating read and writes does have a performance impact, I'm
> > doing
> > some benchmarking to find out exactly how much - but at least for
> > radix
> > it means we have to do a RMW instead of just a write. It does add
> > some
> > amount of security, though.
> >
> > The other issue I have is that you're just locking everything here
> > (like I was), and not doing anything different for just reads or
> > writes. In theory, wouldn't someone assume that they could (for
> > example) unlock reads, lock writes, then attempt to read? At which
> > point the read would fail, because the lock actually locks both.
> >
> > I would think we either need to bundle read/write locking/unlocking
> > together, or only implement this on platforms that can do one at a
> > time, unless there's a cleaner way to handle this. Glancing at the
> > values you use for 8xx, this doesn't seem possible there, and it's
> > a
> > definite performance hit for radix.
> >
> > At the same time, as you say, it would suck for book3s32 that can
> > only
> > do writes, but maybe just doing both at the same time and if
> > implemented for that platform it could just have a warning that it
> > only
> > applies to writes on init?
>
> Well, I see your points. My idea was not to separate read and write
> on platform that can lock both. I think it is no problem to also
> unlocking writes when we are doing a read, so on platforms that can
> do
> both I think both should do the same..
>
> The idea was to avoid spending time unlocking writes for doing a read
> on
> platforms on which reads are not locked. And for platforms able to
> independently unlock/lock reads and writes, if only unlocking reads
> can
> improve performance it can be interesting as well.
>
> For book3s/32, locking/unlocking will be done through Kp/Ks bits in
> segment registers, the function won't be trivial as it may involve
> more
> than one segment at a time. So I just wanted to avoid spending time
> doing that for reads as reads won't be protected. And may also be
> the
> case on older book3s/64, may not it ?
> On Book3s/32, the page protection bits are as follows:
>
> Key 0 1
> PP
> 00 RW NA
> 01 RW RO
> 10 RW RW
> 11 RO RO
>
> So the idea is to encode user RW with PP01 (instead of PP10 today)
> and
> user RO with PP11 (as done today), giving Key0 to user and Key1 to
> kernel (today both user and kernel have Key1). Then when kernel needs
> to
> write, we change Ks to Key0 in segment register for the involved
> segments.
>
> I'm not sure there is any risk that someone nests unlocks/locks for
> reads and unlocks/locks for writes, because the unlocks/locks are
> done
> in very limited places.

Yeah I don't think it's a risk since the scope is so limited, it just
needs to be clearly documented that locking/unlocking reads/writes
could have the side effect of covering the other. My concern is less
about a problem in practice and more about functions that don't do
exactly what their name says.

Another option is to again have a single lock/unlock function that
takes a bitmask (so read/write/both), which, being a single function,
might make it more obvious that it could lock/unlock everything - but
at this point I'm just bikeshedding.
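
A sketch of that bitmask variant (purely illustrative, not from any
posted patch), reusing the 8xx SPR from patch 6/6 where a single write
covers both directions anyway:

#define KUAP_READ	0x1
#define KUAP_WRITE	0x2

static inline void kuap_unlock(unsigned int dir)
{
	/* 8xx: 'dir' is ignored, SPRN_MD_AP covers reads and writes together */
	mtspr(SPRN_MD_AP, MD_APG_INIT);
}

static inline void kuap_lock(unsigned int dir)
{
	mtspr(SPRN_MD_AP, MD_APG_KUAP);
}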

Doing it this way should be fine, I'm just cautious that some future
developer might be caught off guard.

Planning on sending my series based on top of yours for radix today.

- Russell

>
> Christophe
>
>
> > Curious for people's thoughts on this.
> >
> > - Russell
> >


2018-11-28 10:06:48

by Christophe Leroy

Subject: Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection



On 22/11/2018 at 02:25, Russell Currey wrote:
> On Wed, 2018-11-21 at 09:32 +0100, Christophe LEROY wrote:
>>
>> On 21/11/2018 at 03:26, Russell Currey wrote:
>>> On Wed, 2018-11-07 at 16:56 +0000, Christophe Leroy wrote:
>>>> This patch implements a framework for Kernel Userspace Access
>>>> Protection.
>>>>
>>>> Then subarches will have to possibility to provide their own
>>>> implementation
>>>> by providing setup_kuap(), and lock/unlock_user_rd/wr_access
>>>>
>>>> We separate read and write accesses because some subarches like
>>>> book3s32 might only support write access protection.
>>>>
>>>> Signed-off-by: Christophe Leroy <[email protected]>
>>>
>>> Separating read and writes does have a performance impact, I'm
>>> doing
>>> some benchmarking to find out exactly how much - but at least for
>>> radix
>>> it means we have to do a RMW instead of just a write. It does add
>>> some
>>> amount of security, though.
>>>
>>> The other issue I have is that you're just locking everything here
>>> (like I was), and not doing anything different for just reads or
>>> writes. In theory, wouldn't someone assume that they could (for
>>> example) unlock reads, lock writes, then attempt to read? At which
>>> point the read would fail, because the lock actually locks both.
>>>
>>> I would think we either need to bundle read/write locking/unlocking
>>> together, or only implement this on platforms that can do one at a
>>> time, unless there's a cleaner way to handle this. Glancing at the
>>> values you use for 8xx, this doesn't seem possible there, and it's
>>> a
>>> definite performance hit for radix.
>>>
>>> At the same time, as you say, it would suck for book3s32 that can
>>> only
>>> do writes, but maybe just doing both at the same time and if
>>> implemented for that platform it could just have a warning that it
>>> only
>>> applies to writes on init?
>>
>> Well, I see your points. My idea was not to separate read and write
>> on platform that can lock both. I think it is no problem to also
>> unlocking writes when we are doing a read, so on platforms that can
>> do
>> both I think both should do the same..
>>
>> The idea was to avoid spending time unlocking writes for doing a read
>> on
>> platforms on which reads are not locked. And for platforms able to
>> independently unlock/lock reads and writes, if only unlocking reads
>> can
>> improve performance it can be interesting as well.
>>
>> For book3s/32, locking/unlocking will be done through Kp/Ks bits in
>> segment registers, the function won't be trivial as it may involve
>> more
>> than one segment at a time. So I just wanted to avoid spending time
>> doing that for reads as reads won't be protected. And may also be
>> the
>> case on older book3s/64, may not it ?
>> On Book3s/32, the page protection bits are as follows:
>>
>> Key 0 1
>> PP
>> 00 RW NA
>> 01 RW RO
>> 10 RW RW
>> 11 RO RO
>>
>> So the idea is to encode user RW with PP01 (instead of PP10 today)
>> and
>> user RO with PP11 (as done today), giving Key0 to user and Key1 to
>> kernel (today both user and kernel have Key1). Then when kernel needs
>> to
>> write, we change Ks to Key0 in segment register for the involved
>> segments.
>>
>> I'm not sure there is any risk that someone nests unlocks/locks for
>> reads and unlocks/locks for writes, because the unlocks/locks are
>> done
>> in very limited places.
>
> Yeah I don't think it's a risk since the scope is so limited, it just
> needs to be clearly documented that locking/unlocking reads/writes
> could have the side effect of covering the other. My concern is less
> about a problem in practice as much as functions that only don't
> exactly do what the function name says.
>
> Another option is to again have a single lock/unlock function that
> takes a bitmask (so read/write/both), which due to being a singular
> function might be a bit more obvious that it could lock/unlock
> everything, but at this point I'm just bikeshedding.

In order to support book3s/32, I needed to add arguments to the
unlock/lock functions, as the address is needed to identify the affected
segments.

Therefore, I changed it to single lock/unlock functions as you
suggested. These functions take 'to', 'from' and 'size' arguments. When
it is a read, 'to' is NULL. When it is a write, 'from' is NULL. When it
is a copy, neither is NULL.
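
A sketch of how a write-side caller would then look (the
unlock_user_access()/lock_user_access() signatures here are inferred
from this description, not copied from the v2 posting):

static inline unsigned long copy_to_user_sketch(void __user *to,
						const void *from,
						unsigned long n)
{
	unsigned long ret;

	unlock_user_access(to, NULL, n);	/* write to user: 'from' is NULL */
	ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
	lock_user_access(to, NULL, n);
	return ret;
}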

See RFC v2 for the details.

Christophe

>
> Doing it this way should be fine, I'm just cautious that some future
> developer might be caught off guard.
>
> Planning on sending my series based on top of yours for radix today.
>
> - Russell
>
>>
>> Christophe
>>
>>
>>> Curious for people's thoughts on this.
>>>
>>> - Russell
>>>