2014-10-04 03:17:22

by Leonid Yegoshin

[permalink] [raw]
Subject: [PATCH 0/3] MIPS executable stack protection

The following series implements executable stack protection on MIPS.

It sets up a per-thread 'VDSO' page and the appropriate TLB support.
The page is write-protected from user space and is maintained via its kernel VA.
MIPS FPU emulation is moved onto this new page, freeing the stack to be
execute-protected, as all data pages already are in the default setup during
ELF binary initialization. The actual protection is controlled by GLIBC; it
can now make the stack protected, as is done on other architectures, and
I learned today that the GLIBC team is ready for this.

Note: actual execute-protection depends on HW capability, of course.

This patch set is required for MIPS32/64 R2 emulation on the MIPS R6
architecture. Without it, 'ssh-keygen' crashes pretty quickly on an attempt
to execute an instruction on the stack.
---

Leonid Yegoshin (3):
MIPS: mips_flush_cache_range is added
MIPS: Setup an instruction emulation in VDSO protected page instead of user stack
MIPS: set stack/data protection as non-executable


arch/mips/include/asm/cacheflush.h | 3 +
arch/mips/include/asm/mmu.h | 2 +
arch/mips/include/asm/page.h | 2 -
arch/mips/include/asm/processor.h | 2 -
arch/mips/include/asm/switch_to.h | 12 ++++
arch/mips/include/asm/thread_info.h | 3 +
arch/mips/include/asm/tlbmisc.h | 1
arch/mips/include/asm/vdso.h | 2 +
arch/mips/kernel/process.c | 7 ++
arch/mips/kernel/vdso.c | 41 ++++++++++++-
arch/mips/math-emu/dsemul.c | 114 ++++++++++++++++++++++++++---------
arch/mips/mm/c-octeon.c | 8 ++
arch/mips/mm/c-r3k.c | 8 ++
arch/mips/mm/c-r4k.c | 43 +++++++++++++
arch/mips/mm/c-tx39.c | 9 +++
arch/mips/mm/cache.c | 4 +
arch/mips/mm/fault.c | 5 ++
arch/mips/mm/tlb-r4k.c | 39 ++++++++++++
18 files changed, 273 insertions(+), 32 deletions(-)

--
Signature


2014-10-04 03:17:33

by Leonid Yegoshin

[permalink] [raw]
Subject: [PATCH 1/3] MIPS: mips_flush_cache_range is added

A new function, mips_flush_data_cache_range(), is added.
It flushes the D-cache via a kernel VA and the I-cache via a user VA.
This is significant on systems with cache aliasing.
It can be used to flush a short sequence of newly written code to user
space, and is especially useful in ptrace() and dsemul().
Today ptrace() flushes a full page via flush_cache_page().
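
A minimal usage sketch (hypothetical caller and names): 'kaddr' is the
kernel VA through which the new code was written, 'uaddr' the user VA it
will later be executed from; only the touched bytes are flushed, not a
whole page.

static void write_user_insn(struct vm_area_struct *vma, struct page *page,
			    unsigned long uaddr, void *kaddr, u32 insn)
{
	/* write the instruction through the kernel mapping */
	memcpy(kaddr, &insn, sizeof(insn));
	/* write back the D-cache at the kernel VA and invalidate the
	   I-cache at the user VA - the case that matters with aliases */
	mips_flush_data_cache_range(vma, uaddr, page,
				    (unsigned long)kaddr, sizeof(insn));
}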

Signed-off-by: Leonid Yegoshin <[email protected]>
---
arch/mips/include/asm/cacheflush.h | 3 +++
arch/mips/mm/c-octeon.c | 8 +++++++
arch/mips/mm/c-r3k.c | 8 +++++++
arch/mips/mm/c-r4k.c | 43 ++++++++++++++++++++++++++++++++++++
arch/mips/mm/c-tx39.c | 9 ++++++++
arch/mips/mm/cache.c | 4 +++
6 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index e08381a..8305241 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -94,6 +94,9 @@ extern void (*flush_cache_sigtramp)(unsigned long addr);
extern void (*flush_icache_all)(void);
extern void (*local_flush_data_cache_page)(void * addr);
extern void (*flush_data_cache_page)(unsigned long addr);
+extern void (*mips_flush_data_cache_range)(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size);

/*
* This flag is used to indicate that the page pointed to by a pte
diff --git a/arch/mips/mm/c-octeon.c b/arch/mips/mm/c-octeon.c
index 05b1d7c..38349d1 100644
--- a/arch/mips/mm/c-octeon.c
+++ b/arch/mips/mm/c-octeon.c
@@ -178,6 +178,13 @@ static void octeon_flush_kernel_vmap_range(unsigned long vaddr, int size)
BUG();
}

+static void octeon_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size)
+{
+ octeon_flush_cache_page(vma, addr, page_to_pfn(page));
+}
+
/**
* Probe Octeon's caches
*
@@ -292,6 +299,7 @@ void octeon_cache_init(void)
flush_cache_sigtramp = octeon_flush_cache_sigtramp;
flush_icache_all = octeon_flush_icache_all;
flush_data_cache_page = octeon_flush_data_cache_page;
+ mips_flush_data_cache_range = octeon_flush_data_cache_range;
flush_icache_range = octeon_flush_icache_range;
local_flush_icache_range = local_octeon_flush_icache_range;

diff --git a/arch/mips/mm/c-r3k.c b/arch/mips/mm/c-r3k.c
index 135ec31..93b4810 100644
--- a/arch/mips/mm/c-r3k.c
+++ b/arch/mips/mm/c-r3k.c
@@ -273,6 +273,13 @@ static void r3k_flush_data_cache_page(unsigned long addr)
{
}

+static void r3k_mips_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size)
+{
+ r3k_flush_cache_page(vma, addr, page_to_pfn(page));
+}
+
static void r3k_flush_cache_sigtramp(unsigned long addr)
{
unsigned long flags;
@@ -322,6 +329,7 @@ void r3k_cache_init(void)
__flush_cache_all = r3k___flush_cache_all;
flush_cache_mm = r3k_flush_cache_mm;
flush_cache_range = r3k_flush_cache_range;
+ mips_flush_data_cache_range = r3k_mips_flush_data_cache_range;
flush_cache_page = r3k_flush_cache_page;
flush_icache_range = r3k_flush_icache_range;
local_flush_icache_range = r3k_flush_icache_range;
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index ad6ff7b..ee014e4 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -636,6 +636,48 @@ static void r4k_flush_data_cache_page(unsigned long addr)
r4k_on_each_cpu(local_r4k_flush_data_cache_page, (void *) addr);
}

+struct mips_flush_data_cache_range_args {
+ struct vm_area_struct *vma;
+ unsigned long vaddr;
+ unsigned long start;
+ unsigned long len;
+};
+
+static inline void local_r4k_mips_flush_data_cache_range(void *args)
+{
+ struct mips_flush_data_cache_range_args *f_args = args;
+ unsigned long vaddr = f_args->vaddr;
+ unsigned long start = f_args->start;
+ unsigned long len = f_args->len;
+ struct vm_area_struct * vma = f_args->vma;
+
+ blast_dcache_range(start, start + len);
+
+ if ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc) {
+ wmb();
+
+ /* vma is given for exec check only, mmap is current,
+ so - no non-current vma page flush, just user or kernel */
+ protected_blast_icache_range(vaddr, vaddr + len);
+ }
+}
+
+/* flush dirty kernel data and a corresponding user instructions (if needed).
+ used in copy_to_user_page() */
+static void r4k_mips_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long start,
+ unsigned long len)
+{
+ struct mips_flush_data_cache_range_args args;
+
+ args.vma = vma;
+ args.vaddr = vaddr;
+ args.start = start;
+ args.len = len;
+
+ r4k_on_each_cpu(local_r4k_mips_flush_data_cache_range, (void *)&args);
+}
+
struct flush_icache_range_args {
unsigned long start;
unsigned long end;
@@ -1656,6 +1698,7 @@ void r4k_cache_init(void)
flush_icache_all = r4k_flush_icache_all;
local_flush_data_cache_page = local_r4k_flush_data_cache_page;
flush_data_cache_page = r4k_flush_data_cache_page;
+ mips_flush_data_cache_range = r4k_mips_flush_data_cache_range;
flush_icache_range = r4k_flush_icache_range;
local_flush_icache_range = local_r4k_flush_icache_range;

diff --git a/arch/mips/mm/c-tx39.c b/arch/mips/mm/c-tx39.c
index 8d909db..9316e92 100644
--- a/arch/mips/mm/c-tx39.c
+++ b/arch/mips/mm/c-tx39.c
@@ -230,6 +230,13 @@ static void tx39_flush_data_cache_page(unsigned long addr)
tx39_blast_dcache_page(addr);
}

+static void local_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size)
+{
+ flush_cache_page(vma, addr, page_to_pfn(page));
+}
+
static void tx39_flush_icache_range(unsigned long start, unsigned long end)
{
if (end - start > dcache_size)
@@ -371,6 +378,7 @@ void tx39_cache_init(void)

flush_cache_sigtramp = (void *) tx39h_flush_icache_all;
local_flush_data_cache_page = (void *) tx39h_flush_icache_all;
+ mips_flush_data_cache_range = (void *) local_flush_data_cache_range;
flush_data_cache_page = (void *) tx39h_flush_icache_all;

_dma_cache_wback_inv = tx39h_dma_cache_wback_inv;
@@ -402,6 +410,7 @@ void tx39_cache_init(void)

flush_cache_sigtramp = tx39_flush_cache_sigtramp;
local_flush_data_cache_page = local_tx39_flush_data_cache_page;
+ mips_flush_data_cache_range = (void *) local_flush_data_cache_range;
flush_data_cache_page = tx39_flush_data_cache_page;

_dma_cache_wback_inv = tx39_dma_cache_wback_inv;
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 7e3ea77..b7bdd01 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -44,10 +44,14 @@ void (*__invalidate_kernel_vmap_range)(unsigned long vaddr, int size);
void (*flush_cache_sigtramp)(unsigned long addr);
void (*local_flush_data_cache_page)(void * addr);
void (*flush_data_cache_page)(unsigned long addr);
+void (*mips_flush_data_cache_range)(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size);
void (*flush_icache_all)(void);

EXPORT_SYMBOL_GPL(local_flush_data_cache_page);
EXPORT_SYMBOL(flush_data_cache_page);
+EXPORT_SYMBOL(mips_flush_data_cache_range);
EXPORT_SYMBOL(flush_icache_all);

#if defined(CONFIG_DMA_NONCOHERENT) || defined(CONFIG_DMA_MAYBE_COHERENT)

2014-10-04 03:17:42

by Leonid Yegoshin

[permalink] [raw]
Subject: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack

Historically, during FPU emulation MIPS runs the live BD-slot instruction
on the user stack. This was needed because it was the only way to correctly
handle branch exceptions with unknown COP2 instructions in the BD-slot. Now
there is an eXecute-Inhibit feature, and it is desirable to protect the
stack from execution for security reasons.
This patch moves FPU emulation from the stack area to a VDSO-located page
which is write-protected from application access. The VDSO page itself is
now per-thread, and its address and offset are stored in thread_info.
A small stack of emulation blocks is supported, because nested traps are
possible when MIPS R2 instruction emulation on MIPS32/64 R6 is mixed with
FPU emulation.

Explanation of the problem (current state, before this patch):

If I set an eXecute-disabled stack during ELF binary initialisation, GLIBC
may still change the stack protection at library load. If a library's
PT_GNU_STACK section marks the stack eXecute-disabled, everything is OK;
but if that section is simply missing, as is usual, GLIBC applies its own
default - an eXecute-enabled stack.
So ssh-keygen is built explicitly with an eXecute-disabled stack, but GLIBC
ignores that and sets the stack executable. Because of that, everything
works - FPU emulation and hacker tools alike.
However, if all *.so libraries carry an eXecute-disabled PT_GNU_STACK
section, GLIBC keeps the stack non-executable, but things then fail in FPU
emulation.

These two issues are bound together, and to fix GLIBC's incorrect behaviour
(ignoring ssh-keygen's no-exec intent) they must be split apart. So I made
the kernel's emulation protected and moved it out of the stack.
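
For reference, the PT_GNU_STACK rule described above can be sketched in a
few lines of userspace C (illustrative only; this mirrors the GLIBC
behaviour rather than kernel code):

#include <elf.h>
#include <stddef.h>

/* the stack becomes executable unless PT_GNU_STACK is present without
   PF_X; a missing header counts as executable - the GLIBC default
   criticised above */
static int wants_exec_stack(const Elf32_Phdr *phdr, size_t phnum)
{
	size_t i;

	for (i = 0; i < phnum; i++)
		if (phdr[i].p_type == PT_GNU_STACK)
			return (phdr[i].p_flags & PF_X) != 0;
	return 1;
}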

Signed-off-by: Leonid Yegoshin <[email protected]>
---
arch/mips/include/asm/mmu.h | 2 +
arch/mips/include/asm/processor.h | 2 -
arch/mips/include/asm/switch_to.h | 12 ++++
arch/mips/include/asm/thread_info.h | 3 +
arch/mips/include/asm/tlbmisc.h | 1
arch/mips/include/asm/vdso.h | 2 +
arch/mips/kernel/process.c | 7 ++
arch/mips/kernel/vdso.c | 41 ++++++++++++-
arch/mips/math-emu/dsemul.c | 114 ++++++++++++++++++++++++++---------
arch/mips/mm/fault.c | 5 ++
arch/mips/mm/tlb-r4k.c | 39 ++++++++++++
11 files changed, 197 insertions(+), 31 deletions(-)

diff --git a/arch/mips/include/asm/mmu.h b/arch/mips/include/asm/mmu.h
index c436138..96266b8 100644
--- a/arch/mips/include/asm/mmu.h
+++ b/arch/mips/include/asm/mmu.h
@@ -3,7 +3,9 @@

typedef struct {
unsigned long asid[NR_CPUS];
+ unsigned long vdso_asid[NR_CPUS];
void *vdso;
+ struct vm_area_struct *vdso_vma;
} mm_context_t;

#endif /* __ASM_MMU_H */
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 05f0843..c87b1da 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -40,7 +40,7 @@ extern unsigned int vced_count, vcei_count;
* A special page (the vdso) is mapped into all processes at the very
* top of the virtual memory space.
*/
-#define SPECIAL_PAGES_SIZE PAGE_SIZE
+#define SPECIAL_PAGES_SIZE (PAGE_SIZE * 2)

#ifdef CONFIG_32BIT
#ifdef CONFIG_KVM_GUEST
diff --git a/arch/mips/include/asm/switch_to.h b/arch/mips/include/asm/switch_to.h
index b928b6f..0968f51 100644
--- a/arch/mips/include/asm/switch_to.h
+++ b/arch/mips/include/asm/switch_to.h
@@ -17,6 +17,7 @@
#include <asm/dsp.h>
#include <asm/cop2.h>
#include <asm/msa.h>
+#include <asm/mmu_context.h>

struct task_struct;

@@ -104,6 +105,16 @@ do { \
disable_msa(); \
} while (0)

+static inline void flush_vdso_page(void)
+{
+ if (current->mm && cpu_context(raw_smp_processor_id(), current->mm) &&
+ (current->mm->context.vdso_asid[raw_smp_processor_id()] ==
+ cpu_asid(raw_smp_processor_id(), current->mm))) {
+ local_flush_tlb_page(current->mm->mmap, (unsigned long)current->mm->context.vdso);
+ current->mm->context.vdso_asid[raw_smp_processor_id()] = 0;
+ }
+}
+
#define finish_arch_switch(prev) \
do { \
u32 __c0_stat; \
@@ -118,6 +129,7 @@ do { \
__restore_dsp(current); \
if (cpu_has_userlocal) \
write_c0_userlocal(current_thread_info()->tp_value); \
+ flush_vdso_page(); \
__restore_watch(); \
} while (0)

diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 7de8658..8d16003 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -34,6 +34,8 @@ struct thread_info {
* 0x7fffffff for user-thead
* 0xffffffff for kernel-thread
*/
+ unsigned long vdso_offset;
+ struct page *vdso_page;
struct restart_block restart_block;
struct pt_regs *regs;
};
@@ -49,6 +51,7 @@ struct thread_info {
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
.addr_limit = KERNEL_DS, \
+ .vdso_page = NULL, \
.restart_block = { \
.fn = do_no_restart_syscall, \
}, \
diff --git a/arch/mips/include/asm/tlbmisc.h b/arch/mips/include/asm/tlbmisc.h
index 3a45228..abd7bf6 100644
--- a/arch/mips/include/asm/tlbmisc.h
+++ b/arch/mips/include/asm/tlbmisc.h
@@ -6,5 +6,6 @@
*/
extern void add_wired_entry(unsigned long entrylo0, unsigned long entrylo1,
unsigned long entryhi, unsigned long pagemask);
+int install_vdso_tlb(void);

#endif /* __ASM_TLBMISC_H */
diff --git a/arch/mips/include/asm/vdso.h b/arch/mips/include/asm/vdso.h
index cca56aa..baf3402 100644
--- a/arch/mips/include/asm/vdso.h
+++ b/arch/mips/include/asm/vdso.h
@@ -11,6 +11,8 @@

#include <linux/types.h>

+void mips_thread_vdso(struct thread_info *ti);
+void arch_release_thread_info(struct thread_info *info);

#ifdef CONFIG_32BIT
struct mips_vdso {
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 636b074..762738a 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -42,6 +42,7 @@
#include <asm/isadep.h>
#include <asm/inst.h>
#include <asm/stacktrace.h>
+#include <asm/vdso.h>

#ifdef CONFIG_HOTPLUG_CPU
void arch_cpu_idle_dead(void)
@@ -59,6 +60,8 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)
{
unsigned long status;

+ mips_thread_vdso(current_thread_info());
+
/* New thread loses kernel privileges. */
status = regs->cp0_status & ~(ST0_CU0|ST0_CU1|ST0_FR|KU_MASK);
status |= KU_USER;
@@ -75,6 +78,7 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)

void exit_thread(void)
{
+ arch_release_thread_info(current_thread_info());
}

void flush_thread(void)
@@ -103,6 +107,9 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,

preempt_enable();

+ ti->vdso_page = NULL;
+ mips_thread_vdso(ti);
+
/* set up new TSS. */
childregs = (struct pt_regs *) childksp - 1;
/* Put the stack after the struct pt_regs. */
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 0f1af58..7b31bba 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -19,6 +19,8 @@

#include <asm/vdso.h>
#include <asm/uasm.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>

/*
* Including <asm/unistd.h> would give use the 64-bit syscall numbers ...
@@ -88,14 +90,18 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)

ret = install_special_mapping(mm, addr, PAGE_SIZE,
VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ VM_MAYREAD|VM_MAYEXEC,
&vdso_page);

if (ret)
goto up_fail;

mm->context.vdso = (void *)addr;
+ /* if cache aliasing - use a different cache flush later */
+ if (cpu_has_rixi && cpu_has_dc_aliases)
+ mm->context.vdso_vma = find_vma(mm,addr);

+ mips_thread_vdso(current_thread_info());
up_fail:
up_write(&mm->mmap_sem);
return ret;
@@ -107,3 +113,36 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return "[vdso]";
return NULL;
}
+
+void mips_thread_vdso(struct thread_info *ti)
+{
+ struct page *vdso;
+ unsigned long addr;
+
+ if (cpu_has_rixi && ti->task->mm && !ti->vdso_page) {
+ vdso = alloc_page(GFP_USER);
+ if (!vdso)
+ return;
+ ti->vdso_page = vdso;
+ ti->vdso_offset = PAGE_SIZE;
+ addr = (unsigned long)page_address(vdso);
+ copy_page((void *)addr, (void *)page_address(vdso_page));
+ if (!cpu_has_ic_fills_f_dc)
+ flush_data_cache_page(addr);
+ /* any vma in mmap is used, just to get ASIDs back from mm */
+ local_flush_tlb_page(ti->task->mm->mmap,addr);
+ }
+}
+
+void arch_release_thread_info(struct thread_info *info)
+{
+ if (info->vdso_page) {
+ if (info->task->mm) {
+ /* any vma in mmap is used, just to get ASIDs */
+ local_flush_tlb_page(info->task->mm->mmap,(unsigned long)page_address(info->vdso_page));
+ info->task->mm->context.vdso_asid[smp_processor_id()] = 0;
+ }
+ __free_page(info->vdso_page);
+ info->vdso_page = NULL;
+ }
+}
diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c
index 4f514f3..274f9b7 100644
--- a/arch/mips/math-emu/dsemul.c
+++ b/arch/mips/math-emu/dsemul.c
@@ -3,6 +3,7 @@
#include <asm/fpu_emulator.h>
#include <asm/inst.h>
#include <asm/mipsregs.h>
+#include <asm/vdso.h>
#include <asm/uaccess.h>

#include "ieee754.h"
@@ -30,12 +31,15 @@ struct emuframe {
mips_instruction cookie;
unsigned long epc;
};
+/* round structure size to N*8 to force a fit two instructions in a single cache line */
+#define EMULFRAME_ROUNDED_SIZE ((sizeof(struct emuframe) + 0x7) & ~0x7)

int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
{
extern asmlinkage void handle_dsemulret(void);
struct emuframe __user *fr;
int err;
+ unsigned char *pg_addr;

if ((get_isa16_mode(regs->cp0_epc) && ((ir >> 16) == MM_NOP16)) ||
(ir == 0)) {
@@ -48,7 +52,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
pr_debug("dsemul %lx %lx\n", regs->cp0_epc, cpc);

/*
- * The strategy is to push the instruction onto the user stack
+ * The strategy is to push the instruction onto the user stack/VDSO page
* and put a trap after it which we can catch and jump to
* the required address any alternative apart from full
* instruction emulation!!.
@@ -65,37 +69,78 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
* handler (single entry point).
*/

- /* Ensure that the two instructions are in the same cache line */
- fr = (struct emuframe __user *)
- ((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
-
- /* Verify that the stack pointer is not competely insane */
- if (unlikely(!access_ok(VERIFY_WRITE, fr, sizeof(struct emuframe))))
- return SIGBUS;
-
- if (get_isa16_mode(regs->cp0_epc)) {
- err = __put_user(ir >> 16, (u16 __user *)(&fr->emul));
- err |= __put_user(ir & 0xffff, (u16 __user *)((long)(&fr->emul) + 2));
- err |= __put_user(BREAK_MATH >> 16, (u16 __user *)(&fr->badinst));
- err |= __put_user(BREAK_MATH & 0xffff, (u16 __user *)((long)(&fr->badinst) + 2));
+ if (current_thread_info()->vdso_page) {
+ /*
+ * Use VDSO page and fill structure via kernel VA,
+ * user write is disabled
+ */
+ pg_addr = (unsigned char *)page_address(current_thread_info()->vdso_page);
+ fr = (struct emuframe __user *)
+ (pg_addr + current_thread_info()->vdso_offset -
+ EMULFRAME_ROUNDED_SIZE);
+
+ /* verify that we don't overflow into trampoline areas */
+ if ((unsigned char *)fr < (unsigned char *)(((struct mips_vdso *)pg_addr) + 1)) {
+ MIPS_FPU_EMU_INC_STATS(errors);
+ return SIGBUS;
+ }
+
+ current_thread_info()->vdso_offset -= EMULFRAME_ROUNDED_SIZE;
+
+ if (get_isa16_mode(regs->cp0_epc)) {
+ *(u16 *)&fr->emul = (u16)(ir >> 16);
+ *((u16 *)(&fr->emul) + 1) = (u16)(ir & 0xffff);
+ } else {
+ fr->emul = ir;
+ fr->badinst = (mips_instruction)BREAK_MATH;
+ }
+ fr->cookie = (mips_instruction)BD_COOKIE;
+ fr->epc = cpc;
+
+ /* fill CP0_EPC with user VA */
+ regs->cp0_epc = ((unsigned long)(current->mm->context.vdso +
+ current_thread_info()->vdso_offset)) |
+ get_isa16_mode(regs->cp0_epc);
+ if (cpu_has_dc_aliases)
+ mips_flush_data_cache_range(current->mm->context.vdso_vma,
+ regs->cp0_epc, current_thread_info()->vdso_page,
+ (unsigned long)fr, sizeof(struct emuframe));
+ else
+ /* it is a less expensive on CPU with correct SYNCI */
+ flush_cache_sigtramp((unsigned long)fr);
} else {
- err = __put_user(ir, &fr->emul);
- err |= __put_user((mips_instruction)BREAK_MATH, &fr->badinst);
+ /* Ensure that the two instructions are in the same cache line */
+ fr = (struct emuframe __user *)
+ ((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
+
+ /* Verify that the stack pointer is not competely insane */
+ if (unlikely(!access_ok(VERIFY_WRITE, fr, sizeof(struct emuframe))))
+ return SIGBUS;
+
+ if (get_isa16_mode(regs->cp0_epc)) {
+ err = __put_user(ir >> 16, (u16 __user *)(&fr->emul));
+ err |= __put_user(ir & 0xffff, (u16 __user *)((long)(&fr->emul) + 2));
+ err |= __put_user(BREAK_MATH >> 16, (u16 __user *)(&fr->badinst));
+ err |= __put_user(BREAK_MATH & 0xffff, (u16 __user *)((long)(&fr->badinst) + 2));
+ } else {
+ err = __put_user(ir, &fr->emul);
+ err |= __put_user((mips_instruction)BREAK_MATH, &fr->badinst);
+ }
+
+ err |= __put_user((mips_instruction)BD_COOKIE, &fr->cookie);
+ err |= __put_user(cpc, &fr->epc);
+
+ if (unlikely(err)) {
+ MIPS_FPU_EMU_INC_STATS(errors);
+ return SIGBUS;
+ }
+
+ regs->cp0_epc = ((unsigned long) &fr->emul) |
+ get_isa16_mode(regs->cp0_epc);
+
+ flush_cache_sigtramp((unsigned long)&fr->badinst);
}

- err |= __put_user((mips_instruction)BD_COOKIE, &fr->cookie);
- err |= __put_user(cpc, &fr->epc);
-
- if (unlikely(err)) {
- MIPS_FPU_EMU_INC_STATS(errors);
- return SIGBUS;
- }
-
- regs->cp0_epc = ((unsigned long) &fr->emul) |
- get_isa16_mode(regs->cp0_epc);
-
- flush_cache_sigtramp((unsigned long)&fr->badinst);
-
return SIGILL; /* force out of emulation loop */
}

@@ -156,6 +201,17 @@ int do_dsemulret(struct pt_regs *xcp)
return 0;
}

+ if (current_thread_info()->vdso_page) {
+ /* restore VDSO stack level */
+ current_thread_info()->vdso_offset += EMULFRAME_ROUNDED_SIZE;
+ if (current_thread_info()->vdso_offset > PAGE_SIZE) {
+ /* This is not a good situation to be in */
+ current_thread_info()->vdso_offset -= EMULFRAME_ROUNDED_SIZE;
+ force_sig(SIGBUS, current);
+
+ return 0;
+ }
+ }
/* Set EPC to return to post-branch instruction */
xcp->cp0_epc = epc;

diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index becc42b..516045a 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -26,6 +26,7 @@
#include <asm/uaccess.h>
#include <asm/ptrace.h>
#include <asm/highmem.h> /* For VMALLOC_END */
+#include <asm/tlbmisc.h>
#include <linux/kdebug.h>

/*
@@ -138,6 +139,9 @@ good_area:
#endif
goto bad_area;
}
+ if (((address & PAGE_MASK) == (unsigned long)(mm->context.vdso)) &&
+ install_vdso_tlb())
+ goto up_return;
} else {
if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
goto bad_area;
@@ -186,6 +190,7 @@ good_area:
}
}

+up_return:
up_read(&mm->mmap_sem);
return;

diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index fa6ebd4..a857e21 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -350,6 +350,45 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
local_irq_restore(flags);
}

+int install_vdso_tlb(void)
+{
+ int tlbidx;
+ unsigned long flags;
+
+ if (!current_thread_info()->vdso_page)
+ return(0);
+
+ local_irq_save(flags);
+ write_c0_entryhi(((unsigned long)current->mm->context.vdso & (PAGE_MASK << 1)) |
+ cpu_asid(smp_processor_id(), current->mm));
+
+ mtc0_tlbw_hazard();
+ tlb_probe();
+ tlb_probe_hazard();
+ tlbidx = read_c0_index();
+#if defined(CONFIG_64BIT_PHYS_ADDR) && defined(CONFIG_CPU_MIPS32)
+ write_c0_entrylo0(pte_val(pfn_pte(
+ page_to_pfn(current_thread_info()->vdso_page),
+ __pgprot(_page_cachable_default|_PAGE_VALID)))>>32);
+#else
+ write_c0_entrylo0(pte_to_entrylo(pte_val(pfn_pte(
+ page_to_pfn(current_thread_info()->vdso_page),
+ __pgprot(_page_cachable_default|_PAGE_VALID)))));
+#endif
+ write_c0_entrylo1(0);
+ mtc0_tlbw_hazard();
+ if (tlbidx < 0)
+ tlb_write_random();
+ else
+ tlb_write_indexed();
+ tlbw_use_hazard();
+
+ current->mm->context.vdso_asid[smp_processor_id()] = cpu_asid(smp_processor_id(), current->mm);
+ local_irq_restore(flags);
+
+ return(1);
+}
+
void add_wired_entry(unsigned long entrylo0, unsigned long entrylo1,
unsigned long entryhi, unsigned long pagemask)
{

2014-10-04 03:17:49

by Leonid Yegoshin

[permalink] [raw]
Subject: [PATCH 3/3] MIPS: set stack/data protection as non-executable

This is the last step of the 3-patch series which moves FPU emulation out
of the stack into a protected area. It disables the default executable
stack.

Additionally, it makes the default data area protection non-executable.
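
For context: this single change also covers the stack because, unless an
architecture overrides it, include/linux/mm.h derives the stack defaults
from the data defaults (3.17-era definition, quoted from memory):

#ifndef VM_STACK_DEFAULT_FLAGS		/* arch can override this */
#define VM_STACK_DEFAULT_FLAGS	VM_DATA_DEFAULT_FLAGS
#endif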

Signed-off-by: Leonid Yegoshin <[email protected]>
---
arch/mips/include/asm/page.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index 3be8180..d49ba81 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -230,7 +230,7 @@ extern int __virt_addr_valid(const volatile void *kaddr);
#define virt_addr_valid(kaddr) \
__virt_addr_valid((const volatile void *) (kaddr))

-#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \
+#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | \
VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)

#define UNCAC_ADDR(addr) ((addr) - PAGE_OFFSET + UNCAC_BASE)

2014-10-04 08:23:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/3] MIPS executable stack protection

On Fri, Oct 03, 2014 at 08:17:14PM -0700, Leonid Yegoshin wrote:
> The following series implements an executable stack protection in MIPS.
>
> It sets up a per-thread 'VDSO' page and appropriate TLB support.

So traditionally we've always avoided per-thread pages like that. What
makes it worth it on MIPS?

2014-10-04 15:33:44

by Leonid Yegoshin

[permalink] [raw]
Subject: Re: [PATCH 0/3] MIPS executable stack protection

Peter Zijlstra wrote:

>> It sets up a per-thread 'VDSO' page and appropriate TLB support.

> So traditionally we've always avoided per-thread pages like that.
> What makes it worth it on MIPS?

MIPS has branch delay slots: the instruction after a branch is executed
before the branch takes effect. If the branch faults because the FPU is
unavailable, that instruction must be emulated along with the branch
itself. However, MIPS allows customisable coprocessor 2 instructions, and
it is impractical to emulate those, plus the large set of other traditional
MIPS instructions, inside the kernel.

So some per-thread space is needed into which the instruction can be
placed, enclosed by a trap back into the kernel, with execution temporarily
switched to it.

Currently that space is at the SP register (the user stack), but this
prevents marking the stack non-executable.

Handling another set of stacks (one per thread) in the common user map is
impractical because of management, scalability and performance
difficulties.
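
Concretely, each such per-thread block is the emuframe already used by
dsemul() (layout as in arch/mips/math-emu/dsemul.c; comments added here
for illustration):

struct emuframe {
	mips_instruction	emul;		/* the BD-slot instruction, run natively */
	mips_instruction	badinst;	/* BREAK_MATH: traps back into the kernel */
	mips_instruction	cookie;		/* BD_COOKIE: sanity-checked in do_dsemulret() */
	unsigned long		epc;		/* where to continue after the branch */
};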

2014-10-04 16:03:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/3] MIPS executable stack protection

On Sat, Oct 4, 2014 at 1:23 AM, Peter Zijlstra <[email protected]> wrote:
> On Fri, Oct 03, 2014 at 08:17:14PM -0700, Leonid Yegoshin wrote:
>> The following series implements an executable stack protection in MIPS.
>>
>> It sets up a per-thread 'VDSO' page and appropriate TLB support.
>
> So traditionally we've always avoided per-thread pages like that. What
> makes it worth it on MIPS?

Nothing makes it worth it on MIPS.

It may be easy to implement when you have all software-fill of TLB's,
but it's a mistake even then - because it means that you'll never be
able to do hardware TLB walkers.

And MIPS *does* have hardware TLB walkers, even if they are not
necessarily available everywhere.

So this is a horrible idea. Don't do it. Page tables need to be
per-process, not per thread, so that page tables can be shared.

Linus

2014-10-04 16:17:16

by Leonid Yegoshin

[permalink] [raw]
Subject: Re: [PATCH 0/3] MIPS executable stack protection

Linus, it works on CPUs with a hardware page table walker - the MIPS P5600,
aka Apache.

I was involved in the architecture development of the HTW and took care of
this case.


Linus Torvalds <[email protected]> wrote:

> [snip]

2014-10-04 16:28:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/3] MIPS executable stack protection

On Sat, Oct 4, 2014 at 9:17 AM, Leonid Yegoshin
<[email protected]> wrote:
> Linus, it works on CPU with hardware page table walker - MIPS P5600 aka Apache.
>
> I was involved in architecture development of HTW and took care of it.

Ok, as long as it works architecturally, per-thread TLB fills are fine by me.

Linus

2014-10-04 20:00:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack

On Fri, Oct 03, 2014 at 08:17:30PM -0700, Leonid Yegoshin wrote:

> --- a/arch/mips/include/asm/switch_to.h
> +++ b/arch/mips/include/asm/switch_to.h
> @@ -17,6 +17,7 @@
> #include <asm/dsp.h>
> #include <asm/cop2.h>
> #include <asm/msa.h>
> +#include <asm/mmu_context.h>
>
> struct task_struct;
>
> @@ -104,6 +105,16 @@ do { \
> disable_msa(); \
> } while (0)
>
> +static inline void flush_vdso_page(void)
> +{
> + if (current->mm && cpu_context(raw_smp_processor_id(), current->mm) &&
> + (current->mm->context.vdso_asid[raw_smp_processor_id()] ==
> + cpu_asid(raw_smp_processor_id(), current->mm))) {
> + local_flush_tlb_page(current->mm->mmap, (unsigned long)current->mm->context.vdso);
> + current->mm->context.vdso_asid[raw_smp_processor_id()] = 0;
> + }
> +}

Why raw_smp_processor_id(), and why evaluate it 3 times? Sure, compilers
can be expected to do some CSE, but something like:

int cpu = smp_processor_id();

if ( ... [cpu] ...)

is far more readable as well.
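
Applied to flush_vdso_page() that would give something like this (a sketch
only, same logic as the patch):

static inline void flush_vdso_page(void)
{
	int cpu = smp_processor_id();
	struct mm_struct *mm = current->mm;

	if (mm && cpu_context(cpu, mm) &&
	    mm->context.vdso_asid[cpu] == cpu_asid(cpu, mm)) {
		local_flush_tlb_page(mm->mmap,
				     (unsigned long)mm->context.vdso);
		mm->context.vdso_asid[cpu] = 0;
	}
}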

> +
> #define finish_arch_switch(prev) \
> do { \
> u32 __c0_stat; \
> @@ -118,6 +129,7 @@ do { \
> __restore_dsp(current); \
> if (cpu_has_userlocal) \
> write_c0_userlocal(current_thread_info()->tp_value); \
> + flush_vdso_page(); \
> __restore_watch(); \
> } while (0)

So what I didn't see is any talk about the cost of this. Surely a TLB
flush isn't exactly free.

2014-10-05 05:52:59

by Leonid Yegoshin

[permalink] [raw]
Subject: RE: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack

From: Peter Zijlstra [[email protected]]:

>On Fri, Oct 03, 2014 at 08:17:30PM -0700, Leonid Yegoshin wrote:

>> --- a/arch/mips/include/asm/switch_to.h
> >+++ b/arch/mips/include/asm/switch_to.h

>Why raw_smp_processor_id() and why evaluate it 3 times, sure compilers
>can be expected to do some CSE but something like:
>
> int cpu = smp_processor_id();
>
> if ( ... [cpu] ...)
>
>is far more readable as well.

Sure. But maybe it makes sense to use raw_smp_processor_id() because of the elevated preemption counter.


>> + flush_vdso_page(); \

>So what I didn't see is any talk about the cost of this. Surely a TLB
>flush isn't exactly free.

Well, flush_vdso_page() uses the local variant of the TLB page flush, which
is cheap in itself compared with a system-wide flush.
I also take precautions to flush only when the memory map matches, i.e. the
situation where one pthread of some map is replaced on the same CPU by
another pthread of the same map. So it flushes only after real use of that
map by the previous pthread.

However, some performance loss can be expected from killing the TLB entry.
In low-end cores with a small TLB array, we can expect the entry to be
kicked out anyway after a context switch.
In high-end cores we should expect an FPU to be present, so floating-point
emulation should be very rare (denormalized operands and the like).

The only question is the middle range: cores with a reasonably large TLB
(32 entries or more) but possibly no floating-point unit. However, the
emulation itself is very slow, and it is natural to expect the performance
degradation from floating-point emulation to far outweigh any penalty from
the early loss of a TLB entry.

2014-10-06 12:30:20

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack

On Fri, Oct 03, 2014 at 08:17:30PM -0700, Leonid Yegoshin wrote:
> Historically, during FPU emulation MIPS runs the live BD-slot instruction
> on the user stack. This was needed because it was the only way to correctly
> handle branch exceptions with unknown COP2 instructions in the BD-slot. Now
> there is an eXecute-Inhibit feature, and it is desirable to protect the
> stack from execution for security reasons.
> This patch moves FPU emulation from the stack area to a VDSO-located page
> which is write-protected from application access. The VDSO page itself is
> now per-thread, and its address and offset are stored in thread_info.
> A small stack of emulation blocks is supported, because nested traps are
> possible when MIPS R2 instruction emulation on MIPS32/64 R6 is mixed with
> FPU emulation.
> [...]
>
> Signed-off-by: Leonid Yegoshin <[email protected]>

Hi Leonid,

First some general questions: is there any reason to need the page used
to be at the same virtual address for each thread? I can't think of one,
and if that's the case then why not simply allocate a series of pages
per-thread via mmap_region or similar, on demand when a thread first
executes an FP branch instruction? That would of course consume some
more virtual address space, but would avoid the hassles of manually
prodding at the TLB, tracking ASIDs & flushing the caches. Perhaps the
shrinker interface could be used to allow freeing those pages if & when
it becomes necessary for long running threads.
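
As a rough sketch of that alternative - assuming the 3.17-era vm_mmap()
interface, with purely illustrative thread_info fields - the on-demand
allocation could look like:

static unsigned long alloc_emul_page(struct thread_info *ti)
{
	unsigned long addr;

	/* an ordinary per-thread user mapping, read+exec; the kernel
	   would write the frames through the backing struct page(s),
	   not through this user VA */
	addr = vm_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
		       MAP_PRIVATE | MAP_ANONYMOUS, 0);
	if (IS_ERR_VALUE(addr))
		return 0;

	ti->emul_page_addr = addr;	/* illustrative field */
	ti->emul_offset = PAGE_SIZE;	/* illustrative field */
	return addr;
}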

Also VDSO is really a misnomer throughout, as I've pointed out to you
before. I'm aware it's an existing misnomer, but it would be nice if
we could clear that up rather than repeat it...

> [snip]
> + if (current_thread_info()->vdso_page) {
> + /*
> + * Use VDSO page and fill structure via kernel VA,
> + * user write is disabled
> + */
> + pg_addr = (unsigned char *)page_address(current_thread_info()->vdso_page);
> + fr = (struct emuframe __user *)
> + (pg_addr + current_thread_info()->vdso_offset -
> + EMULFRAME_ROUNDED_SIZE);
> +
> + /* verify that we don't overflow into trampoline areas */
> + if ((unsigned char *)fr < (unsigned char *)(((struct mips_vdso *)pg_addr) + 1)) {
> + MIPS_FPU_EMU_INC_STATS(errors);
> + return SIGBUS;
> + }
> +
> + current_thread_info()->vdso_offset -= EMULFRAME_ROUNDED_SIZE;
> +
> + if (get_isa16_mode(regs->cp0_epc)) {
> + *(u16 *)&fr->emul = (u16)(ir >> 16);
> + *((u16 *)(&fr->emul) + 1) = (u16)(ir & 0xffff);

This microMIPS case doesn't set badinst, as I pointed out internally. I
think it would be much cleaner if you were to do something along the
lines of:

	struct emuframe __user _fr, *fr, *user_fr;

	if (current_thread_info()->vdso_page) {
		fr = user_fr = (struct emuframe __user *)
			(pg_addr + current_thread_info()->vdso_offset -
			 EMULFRAME_ROUNDED_SIZE);
	} else {
		fr = &_fr;
		user_fr = (struct emuframe __user *)
			((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
	}

	/* initialize frame */
	fr->emul = ...;
	fr->badinst = ...;

	if (fr != user_fr)
		copy_to_user(user_fr, fr, sizeof(*fr));

That way you don't duplicate the initialization of the struct emuframe,
and it's less likely you'll miss cases like the one above.

Or the on-stack case could be removed entirely.

Paul


2014-10-06 18:05:49

by David Daney

[permalink] [raw]
Subject: Re: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack

On 10/03/2014 08:17 PM, Leonid Yegoshin wrote:
> Historically, during FPU emulation MIPS runs the live BD-slot instruction
> on the user stack. This was needed because it was the only way to correctly
> handle branch exceptions with unknown COP2 instructions in the BD-slot. Now
> there is an eXecute-Inhibit feature, and it is desirable to protect the
> stack from execution for security reasons.
> This patch moves FPU emulation from the stack area to a VDSO-located page
> which is write-protected from application access. The VDSO page itself is
> now per-thread, and its address and offset are stored in thread_info.
> A small stack of emulation blocks is supported, because nested traps are
> possible when MIPS R2 instruction emulation on MIPS32/64 R6 is mixed with
> FPU emulation.
>

Can you explain how this per-thread mapping works?

I am especially interested in what happens when a different thread from the
one using the special mapping issues flush_tlb_mm() and invalidates the
TLBs on all CPUs. How does the TLB entry for the special mapping survive
this?

David Daney

2014-10-06 20:03:55

by Leonid Yegoshin

[permalink] [raw]
Subject: Re: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack

On 10/06/2014 11:05 AM, David Daney wrote:
> On 10/03/2014 08:17 PM, Leonid Yegoshin wrote:
>> [...]
>
> Can you explain how this per-thread mapping works?
>
> I am especially interested in what happens when a different thread from
> the one using the special mapping issues flush_tlb_mm() and invalidates
> the TLBs on all CPUs. How does the TLB entry for the special mapping
> survive this?
>
>
This patch works as long as install_special_mapping() doesn't change the
PTE itself but just installs a page-fault handler. That is the only hidden
dependency on common Linux code.

The MIPS code allocates a page (a copy of the standard 'VDSO' page), links
it to thread_info, and handles all allocation/deallocation/thread creation
via arch hooks. It does this only for threads which have a memory map, not
for kernel threads. And it does all of this only if the CPU has the RI/XI
capability - the HW execute-inhibit feature; otherwise things work as they
do today.

It still attaches the standard 'VDSO' page to the memory map for accounting
purposes, so /proc/.../maps shows a [vdso] page. However, the new
(per-thread) page is actually a shadow of it.

When a TLB refill happens, it loads an empty PTE, and the subsequent TLBL
(TLB load page fault) reaches the MIPS C code, which recognizes the 'VDSO'
address, asks install_vdso_tlb() to fill the TLB directly, and records the
ASID for this CPU in the memory map.

At process (read: thread) reschedule there is a check, done by comparing
ASIDs, whether some previous thread of the same memory map loaded that TLB
entry on this CPU. If so, local_flush_tlb_page() is called to eliminate the
entry, because it has the same ASID but may point to a different per-thread
page.

Because the PTE stays 0x00..00 and never changes, this cycle starts over
after the TLB entry is evicted for whatever reason - flush_tlb_mm(), some
other flush, or HW or SW replacement in the TLB array - but only if the
page is demanded again.

Now, the emulation part: a stack of emulation blocks grows down from the
top of the page. Each time an FPU instruction in a BD-slot is emulated, the
code fills a block through the page's kernel VA but sets the thread's EPC
to the user VA of that block. In case of cache aliasing it flushes the
caches via the two different addresses (the D-cache via the kernel VA and
the I-cache via the user VA), and the new function is needed to avoid the
huge performance loss of flush_cache_page(). In the absence of cache
aliasing it uses the regular flush_cache_sigtramp(), because on some
systems that can be much faster (via SYNCI).

The stack of emulation blocks is needed because I work on a MIPS32/64 R6
architecture kernel, where some removed MIPS R2 instructions must be
emulated, and re-entry into emulation may happen in some rare cases - the
FPU emulation and MIPS R2 emulation subsystems are separate pieces.


Note: after Peter Zijlstra's note about performance, I am thinking about
adding a check for the situation where the same single thread is
rescheduled again on the same CPU, and skipping the TLB flush in that case.
It just requires yet another array of process IDs or 'VDSO' pages - one
element per CPU - and I am weighing that against the scheduling interval.
Today the array is at most 8 elements on MIPS, but that can change in the
future. There is also the possibility of writing a special TLB flush
function which compares the TLB entry's address with the page address and
skips the eviction if the addresses match.
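
A minimal sketch of that check - the vdso_last_page[] array is hypothetical
(one element per CPU, recording whose page was installed last):

static inline void flush_vdso_page(void)
{
	int cpu = smp_processor_id();
	struct mm_struct *mm = current->mm;

	if (!mm || !cpu_context(cpu, mm))
		return;

	/* same thread back on the same CPU: its entry is still valid */
	if (mm->context.vdso_last_page[cpu] ==
	    current_thread_info()->vdso_page)
		return;

	if (mm->context.vdso_asid[cpu] == cpu_asid(cpu, mm)) {
		local_flush_tlb_page(mm->mmap,
				     (unsigned long)mm->context.vdso);
		mm->context.vdso_asid[cpu] = 0;
	}
	mm->context.vdso_last_page[cpu] = current_thread_info()->vdso_page;
}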

- Leonid.

2014-10-06 20:42:42

by Leonid Yegoshin

[permalink] [raw]
Subject: Re: [PATCH 2/3] MIPS: Setup an instruction emulation in VDSO protected page instead of user stack

On 10/06/2014 05:29 AM, Paul Burton wrote:

>>
>> First some general questions: is there any reason to need the page used
>> to be at the same virtual address for each thread? I can't think of one,
>> and if that's the case then why not simply allocate a series of pages
>> per-thread via mmap_region or similar, on demand when a thread first
>> executes an FP branch instruction? That would of course consume some
>> more virtual address space, but would avoid the hassles of manually
>> prodding at the TLB, tracking ASIDs & flushing the caches. Perhaps the
>> shrinker interface could be used to allow freeing those pages if & when
>> it becomes necessary for long running threads.
The only reason to have the same virtual address is to keep the mmap
accounting the same. The original 'VDSO' is present in the mmap of all
threads sharing the same memory map.

As for the other approach, I think it would be too much code to handle,
with too many implicit interlinks with common Linux code and GLIBC/bionic.

>>
>> Also VDSO is really a misnomer throughout, as I've pointed out to you
>> before. I'm aware it's an existing misnomer, but it would be nice if
>> we could clear that up rather than repeat it...
>>
Yes, I agree, but that is outside the scope of this patch. I think it
makes sense to rename the current stuff to something like 'emulation page'
right before whatever patch implements a real VDSO capability on MIPS.

>> + if (get_isa16_mode(regs->cp0_epc)) {
>> + *(u16 *)&fr->emul = (u16)(ir >> 16);
>> + *((u16 *)(&fr->emul) + 1) = (u16)(ir & 0xffff);
> This microMIPS case doesn't set badinst, as I pointed out internally.
Thank you, I missed it - maybe due to the long citation. I will add it.

> I
> think it would be much cleaner if you were to do something along the
> lines of:
>
I tried to keep it as close to the original code as possible, for easier
review. Even so, there are questions.

Your variant may be cleaner, but that can be a later style-change patch.

- Leonid.