Hi,
Ard and I have worked together to implement vmap stack support for
arm64. This supersedes our earlier vmap stack RFCs [0,1]. The git author
stats are a little misleading, as I've teased parts out into smaller
patches for review.
The series is based on our stack dump rework [2,3], which can be found
in the arm64/exception-stack branch [4] of my kernel.org repo. This
series can be found in the arm64/vmap-stack branch [5] of the same repo.
Since v1 [6]:
* Fix typos
* Update comments in entry assembly
* Dump exception context (and stacks) before regs
* Define safe adr_this_cpu for modules
On arm64, there is no double-fault exception, as software saves
exception context to the stack. An erroneous memory access taken during
exception handling results in a data abort, as with any other erroneous
memory access. To avoid taking these recursively, we must detect
overflow by checking the SP before we attempt to store any context to
the stack. Doing this efficiently requires a couple of tricks.
For a naturally aligned stack, bits THREAD_SHIFT-1:0 of a valid SP may
contain any arbitrary value:
0bXX .. 11111111111111
0bXX .. 11011001011100
0bXX .. 00000000000000
By aligning stacks to double their natural alignment, we know that the
THREAD_SHIFT bit of any valid SP must be zero:
0bXX .. 0 11111111111111
0bXX .. 0 11011001011100
0bXX .. 0 00000000000000
... while an overflow will result in this bit flipping, along with
(some) other high-order bits:
0bXX .. 0 00000000000000
< SP -= 1 >
0bXX .. 1 11111111111111
... and thus, we can detect overflows of up to THREAD_SIZE by testing
the THREAD_SHIFT bit of the SP value.
Provided we can get the SP into a general purpose register, we can
perform this test with a single TBNZ instruction. We don't have scratch
space to store a GPR, but we can (partially) swap the SP with a GPR
using arithmetic to perform the test:
add sp, sp, x0 // sp' = sp + x0
sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
tbnz x0, #THREAD_SHIFT, overflow_handler
sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
sub sp, sp, x0 // sp'' = sp' - x0'' = (sp + x0) - x0 = sp
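In rough C terms, the resulting check amounts to the following (an illustrative
sketch only, assuming the default THREAD_SHIFT of 14, i.e. 16K stacks aligned
to 32K; the real check is the TBNZ above, performed before any stack memory is
touched):

	#define THREAD_SHIFT	14
	#define THREAD_SIZE	(1UL << THREAD_SHIFT)

	static bool sp_overflowed(unsigned long sp)
	{
		/*
		 * With stacks aligned to (2 * THREAD_SIZE), any valid SP has
		 * bit THREAD_SHIFT clear; an overflow of up to THREAD_SIZE
		 * sets it.
		 */
		return sp & (1UL << THREAD_SHIFT);
	}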
This series implements this approach, along with the other requisite
changes required to make this work.
The SP test is performed for all exceptions, after the SP has been decremented
to make space for the exception registers, allowing the original exception
context to be preserved in its entirety. The tests themselves are folded into
the exception vectors, minimizing their impact.
To ensure that IRQ stack overflows are detected and handled, IRQ stacks
are now dynamically allocated, with guard pages.
I've given the series some light testing with LKDTM, Syzkaller, Vince
Weaver's perf fuzzer, and a few combinations of debug options. I haven't
compared performance of the entire series to a baseline kernel, but from
testing so far the cost of the SP test falls in the noise for a kernel
build workload on Cortex-A57.
Many thanks to Ard for putting up with my meddling, and also to Laura,
James, Catalin, and Will for comments and testing.
Thanks,
Mark.
[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/518368.html
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/518434.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/520705.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/521435.html
[4] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/exception-stack
[5] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/vmap-stack
[6] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-August/524179.html
Ard Biesheuvel (2):
arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP
arm64: assembler: allow adr_this_cpu to use the stack pointer
Mark Rutland (12):
arm64: remove __die()'s stack dump
fork: allow arch-override of VMAP stack alignment
arm64: factor out PAGE_* and CONT_* definitions
arm64: clean up THREAD_* definitions
arm64: clean up irq stack definitions
arm64: move SEGMENT_ALIGN to <asm/memory.h>
efi/arm64: add EFI_KIMG_ALIGN
arm64: factor out entry stack manipulation
arm64: use an irq stack pointer
arm64: add basic VMAP_STACK support
arm64: add on_accessible_stack()
arm64: add VMAP_STACK overflow detection
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/assembler.h | 8 +-
arch/arm64/include/asm/efi.h | 8 ++
arch/arm64/include/asm/irq.h | 25 ------
arch/arm64/include/asm/memory.h | 53 +++++++++++++
arch/arm64/include/asm/page-def.h | 34 +++++++++
arch/arm64/include/asm/page.h | 12 +--
arch/arm64/include/asm/processor.h | 2 +-
arch/arm64/include/asm/stacktrace.h | 60 ++++++++++++++-
arch/arm64/include/asm/thread_info.h | 10 +--
arch/arm64/kernel/entry.S | 121 ++++++++++++++++++++++++------
arch/arm64/kernel/irq.c | 40 +++++++++-
arch/arm64/kernel/ptrace.c | 1 +
arch/arm64/kernel/smp.c | 2 +-
arch/arm64/kernel/stacktrace.c | 7 +-
arch/arm64/kernel/traps.c | 44 ++++++++++-
arch/arm64/kernel/vmlinux.lds.S | 18 +----
drivers/firmware/efi/libstub/arm64-stub.c | 6 +-
kernel/fork.c | 5 +-
19 files changed, 353 insertions(+), 104 deletions(-)
create mode 100644 arch/arm64/include/asm/page-def.h
--
1.9.1
Our __die() implementation tries to dump the stack memory, in addition
to a backtrace, which is problematic.
For contemporary 16K stacks, this can be a lot of data, which can take a
long time to dump, and can push other useful context out of the kernel's
printk ringbuffer (and/or a user's scrollback buffer on an attached
console).
Additionally, the code implicitly assumes that the SP is on the task's
stack, and tries to dump everything between the SP and the highest task
stack address. When the SP points at an IRQ stack (or is corrupted),
this makes the kernel attempt to dump vast amounts of VA space. With
vmap'd stacks, this may result in erroneous accesses to peripherals.
This patch removes the memory dump, leaving us to rely on the backtrace,
and other means of dumping stack memory such as kdump.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/kernel/traps.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index c2a81bf..9633773 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -237,8 +237,6 @@ static int __die(const char *str, int err, struct pt_regs *regs)
end_of_stack(tsk));
if (!user_mode(regs)) {
- dump_mem(KERN_EMERG, "Stack: ", regs->sp,
- THREAD_SIZE + (unsigned long)task_stack_page(tsk));
dump_backtrace(regs, tsk);
dump_instr(KERN_EMERG, regs);
}
--
1.9.1
In some cases, an architecture might wish its stacks to be aligned to a
boundary larger than THREAD_SIZE. For example, using an alignment of
double THREAD_SIZE can allow for stack overflows smaller than
THREAD_SIZE to be detected by checking a single bit of the stack
pointer.
This patch allows architectures to override the alignment of VMAP'd
stacks, by defining THREAD_ALIGN. Where not defined, this defaults to
THREAD_SIZE, as is the case today.
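For example, an architecture could define something like the following in one
of its headers (this mirrors the arm64 definition added later in this series):

	#ifdef CONFIG_VMAP_STACK
	#define THREAD_ALIGN	(2 * THREAD_SIZE)
	#else
	#define THREAD_ALIGN	THREAD_SIZE
	#endif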
Signed-off-by: Mark Rutland <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
---
kernel/fork.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0..696d692 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -217,7 +217,10 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
return s->addr;
}
- stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
+#ifndef THREAD_ALIGN
+#define THREAD_ALIGN THREAD_SIZE
+#endif
+ stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
VMALLOC_START, VMALLOC_END,
THREADINFO_GFP,
PAGE_KERNEL,
--
1.9.1
From: Ard Biesheuvel <[email protected]>
For historical reasons, we leave the top 16 bytes of our task and IRQ
stacks unused, a practice used to ensure that the SP can always be
masked to find the base of the current stack (historically, where
thread_info could be found).
However, this is not necessary, as:
* When an exception is taken from a task stack, we decrement the SP by
S_FRAME_SIZE and stash the exception registers before we compare the
SP against the task stack. In such cases, the SP must be at least
S_FRAME_SIZE below the limit, and can be safely masked to determine
whether the task stack is in use.
* When transitioning to an IRQ stack, we'll place a dummy frame onto the
IRQ stack before enabling asynchronous exceptions, or executing code
we expect to trigger faults. Thus, if an exception is taken from the
IRQ stack, the SP must be at least 16 bytes below the limit.
* We no longer mask the SP to find the thread_info, which is now found
via sp_el0. Note that historically, the offset was critical to ensure
that cpu_switch_to() found the correct stack for new threads that
hadn't yet executed ret_from_fork().
Given that, this initial offset serves no purpose, and can be removed.
This brings us in-line with other architectures (e.g. x86) which do not
rely on this masking.
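For reference, the masking idiom that this offset existed to keep safe was
roughly of the form (illustrative only; arm64 no longer does this):

	/* derive the stack base (historically, thread_info) from the SP */
	base = sp & ~(THREAD_SIZE - 1);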
Signed-off-by: Ard Biesheuvel <[email protected]>
[Mark: rebase, kill THREAD_START_SP, commit msg additions]
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/irq.h | 5 ++---
arch/arm64/include/asm/processor.h | 2 +-
arch/arm64/include/asm/thread_info.h | 1 -
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/smp.c | 2 +-
5 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index 8ba89c4..1ebe202 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -2,7 +2,6 @@
#define __ASM_IRQ_H
#define IRQ_STACK_SIZE THREAD_SIZE
-#define IRQ_STACK_START_SP THREAD_START_SP
#ifndef __ASSEMBLER__
@@ -26,9 +25,9 @@ static inline int nr_legacy_irqs(void)
static inline bool on_irq_stack(unsigned long sp)
{
unsigned long low = (unsigned long)raw_cpu_ptr(irq_stack);
- unsigned long high = low + IRQ_STACK_START_SP;
+ unsigned long high = low + IRQ_STACK_SIZE;
- return (low <= sp && sp <= high);
+ return (low <= sp && sp < high);
}
static inline bool on_task_stack(struct task_struct *tsk, unsigned long sp)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 64c9e78..6687dd2 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -159,7 +159,7 @@ extern struct task_struct *cpu_switch_to(struct task_struct *prev,
struct task_struct *next);
#define task_pt_regs(p) \
- ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
+ ((struct pt_regs *)(THREAD_SIZE + task_stack_page(p)) - 1)
#define KSTK_EIP(tsk) ((unsigned long)task_pt_regs(tsk)->pc)
#define KSTK_ESP(tsk) user_stack_pointer(task_pt_regs(tsk))
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 46c3b93..b29ab0e 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -30,7 +30,6 @@
#endif
#define THREAD_SIZE 16384
-#define THREAD_START_SP (THREAD_SIZE - 16)
#ifndef __ASSEMBLY__
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 612a077..f31c7b2 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -272,7 +272,7 @@ alternative_else_nop_endif
cbnz x25, 9998f
adr_this_cpu x25, irq_stack, x26
- mov x26, #IRQ_STACK_START_SP
+ mov x26, #IRQ_STACK_SIZE
add x26, x25, x26
/* switch to the irq stack */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dc66e6e..f13ddb24 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -154,7 +154,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
* page tables.
*/
secondary_data.task = idle;
- secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
+ secondary_data.stack = task_stack_page(idle) + THREAD_SIZE;
update_cpu_boot_status(CPU_MMU_OFF);
__flush_dcache_area(&secondary_data, sizeof(secondary_data));
--
1.9.1
Some headers rely on PAGE_* definitions from <asm/page.h>, but cannot
include this due to potential circular includes. For example, a number
of definitions in <asm/memory.h> rely on PAGE_SHIFT, and <asm/page.h>
includes <asm/memory.h>.
This requires users of these definitions to include both headers, which
is fragile and error-prone.
This patch ameliorates matters by moving the basic definitions out to a
new header, <asm/page-def.h>. Both <asm/page.h> and <asm/memory.h> are
updated to include this, avoiding this fragility, and avoiding the
possibility of circular include dependencies.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/memory.h | 1 +
arch/arm64/include/asm/page-def.h | 34 ++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/page.h | 12 +-----------
3 files changed, 36 insertions(+), 11 deletions(-)
create mode 100644 arch/arm64/include/asm/page-def.h
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 32f827233..77d55dc 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -25,6 +25,7 @@
#include <linux/const.h>
#include <linux/types.h>
#include <asm/bug.h>
+#include <asm/page-def.h>
#include <asm/sizes.h>
/*
diff --git a/arch/arm64/include/asm/page-def.h b/arch/arm64/include/asm/page-def.h
new file mode 100644
index 0000000..01591a2
--- /dev/null
+++ b/arch/arm64/include/asm/page-def.h
@@ -0,0 +1,34 @@
+/*
+ * Based on arch/arm/include/asm/page.h
+ *
+ * Copyright (C) 1995-2003 Russell King
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PAGE_DEF_H
+#define __ASM_PAGE_DEF_H
+
+#include <linux/const.h>
+
+/* PAGE_SHIFT determines the page size */
+/* CONT_SHIFT determines the number of pages which can be tracked together */
+#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT
+#define CONT_SHIFT CONFIG_ARM64_CONT_SHIFT
+#define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
+#define PAGE_MASK (~(PAGE_SIZE-1))
+
+#define CONT_SIZE (_AC(1, UL) << (CONT_SHIFT + PAGE_SHIFT))
+#define CONT_MASK (~(CONT_SIZE-1))
+
+#endif /* __ASM_PAGE_DEF_H */
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 8472c6d..60d02c8 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -19,17 +19,7 @@
#ifndef __ASM_PAGE_H
#define __ASM_PAGE_H
-#include <linux/const.h>
-
-/* PAGE_SHIFT determines the page size */
-/* CONT_SHIFT determines the number of pages which can be tracked together */
-#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT
-#define CONT_SHIFT CONFIG_ARM64_CONT_SHIFT
-#define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
-#define PAGE_MASK (~(PAGE_SIZE-1))
-
-#define CONT_SIZE (_AC(1, UL) << (CONT_SHIFT + PAGE_SHIFT))
-#define CONT_MASK (~(CONT_SIZE-1))
+#include <asm/page-def.h>
#ifndef __ASSEMBLY__
--
1.9.1
Before we add yet another stack to the kernel, it would be nice to
ensure that we consistently organise stack definitions and related
helper functions.
This patch moves the basic IRQ stack definitions to <asm/memory.h> to
live with their task stack counterparts. Helpers used for unwinding are
moved into <asm/stacktrace.h>, where subsequent patches will add helpers
for other stacks. Includes are fixed up accordingly.
This patch is a pure refactoring -- there should be no functional
changes as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/irq.h | 24 ------------------------
arch/arm64/include/asm/memory.h | 2 ++
arch/arm64/include/asm/stacktrace.h | 25 ++++++++++++++++++++++++-
arch/arm64/kernel/ptrace.c | 1 +
4 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index 1ebe202..5e6f772 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -1,20 +1,12 @@
#ifndef __ASM_IRQ_H
#define __ASM_IRQ_H
-#define IRQ_STACK_SIZE THREAD_SIZE
-
#ifndef __ASSEMBLER__
-#include <linux/percpu.h>
-#include <linux/sched/task_stack.h>
-
#include <asm-generic/irq.h>
-#include <asm/thread_info.h>
struct pt_regs;
-DECLARE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], irq_stack);
-
extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
static inline int nr_legacy_irqs(void)
@@ -22,21 +14,5 @@ static inline int nr_legacy_irqs(void)
return 0;
}
-static inline bool on_irq_stack(unsigned long sp)
-{
- unsigned long low = (unsigned long)raw_cpu_ptr(irq_stack);
- unsigned long high = low + IRQ_STACK_SIZE;
-
- return (low <= sp && sp < high);
-}
-
-static inline bool on_task_stack(struct task_struct *tsk, unsigned long sp)
-{
- unsigned long low = (unsigned long)task_stack_page(tsk);
- unsigned long high = low + THREAD_SIZE;
-
- return (low <= sp && sp < high);
-}
-
#endif /* !__ASSEMBLER__ */
#endif
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 8ab4774..1fc2453 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -110,6 +110,8 @@
#define THREAD_SIZE (UL(1) << THREAD_SHIFT)
+#define IRQ_STACK_SIZE THREAD_SIZE
+
/*
* Memory types available.
*/
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 3bebab3..000e2418 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -16,7 +16,12 @@
#ifndef __ASM_STACKTRACE_H
#define __ASM_STACKTRACE_H
-struct task_struct;
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/sched/task_stack.h>
+
+#include <asm/memory.h>
+#include <asm/ptrace.h>
struct stackframe {
unsigned long fp;
@@ -31,4 +36,22 @@ extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);
+DECLARE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], irq_stack);
+
+static inline bool on_irq_stack(unsigned long sp)
+{
+ unsigned long low = (unsigned long)raw_cpu_ptr(irq_stack);
+ unsigned long high = low + IRQ_STACK_SIZE;
+
+ return (low <= sp && sp < high);
+}
+
+static inline bool on_task_stack(struct task_struct *tsk, unsigned long sp)
+{
+ unsigned long low = (unsigned long)task_stack_page(tsk);
+ unsigned long high = low + THREAD_SIZE;
+
+ return (low <= sp && sp < high);
+}
+
#endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index baf0838..a9f8715 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -42,6 +42,7 @@
#include <asm/compat.h>
#include <asm/debug-monitors.h>
#include <asm/pgtable.h>
+#include <asm/stacktrace.h>
#include <asm/syscall.h>
#include <asm/traps.h>
#include <asm/system_misc.h>
--
1.9.1
Currently we define SEGMENT_ALIGN directly in our vmlinux.lds.S.
This is unfortunate, as the EFI stub currently open-codes the same
number, and in future we'll want to fiddle with this.
This patch moves the definition to our <asm/memory.h>, where it can be
used by both vmlinux.lds.S and the EFI stub code.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/memory.h | 19 +++++++++++++++++++
arch/arm64/kernel/vmlinux.lds.S | 16 ----------------
2 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 1fc2453..7fa6ad4 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -113,6 +113,25 @@
#define IRQ_STACK_SIZE THREAD_SIZE
/*
+ * Alignment of kernel segments (e.g. .text, .data).
+ */
+#if defined(CONFIG_DEBUG_ALIGN_RODATA)
+/*
+ * 4 KB granule: 1 level 2 entry
+ * 16 KB granule: 128 level 3 entries, with contiguous bit
+ * 64 KB granule: 32 level 3 entries, with contiguous bit
+ */
+#define SEGMENT_ALIGN SZ_2M
+#else
+/*
+ * 4 KB granule: 16 level 3 entries, with contiguous bit
+ * 16 KB granule: 4 level 3 entries, without contiguous bit
+ * 64 KB granule: 1 level 3 entry
+ */
+#define SEGMENT_ALIGN SZ_64K
+#endif
+
+/*
* Memory types available.
*/
#define MT_DEVICE_nGnRnE 0
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 987a00e..7156538 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -72,22 +72,6 @@ PECOFF_FILE_ALIGNMENT = 0x200;
#define PECOFF_EDATA_PADDING
#endif
-#if defined(CONFIG_DEBUG_ALIGN_RODATA)
-/*
- * 4 KB granule: 1 level 2 entry
- * 16 KB granule: 128 level 3 entries, with contiguous bit
- * 64 KB granule: 32 level 3 entries, with contiguous bit
- */
-#define SEGMENT_ALIGN SZ_2M
-#else
-/*
- * 4 KB granule: 16 level 3 entries, with contiguous bit
- * 16 KB granule: 4 level 3 entries, without contiguous bit
- * 64 KB granule: 1 level 3 entry
- */
-#define SEGMENT_ALIGN SZ_64K
-#endif
-
SECTIONS
{
/*
--
1.9.1
We allocate our IRQ stacks using a percpu array. This allows us to generate our
IRQ stack pointers with adr_this_cpu, but bloats the kernel Image with the boot
CPU's IRQ stack. Additionally, these are packed with other percpu variables,
and aren't guaranteed to have guard pages.
When we enable VMAP_STACK we'll want to vmap our IRQ stacks also, in order to
provide guard pages and to permit more stringent alignment requirements. Doing
so will require that we use a percpu pointer to each IRQ stack, rather than
allocating a percpu IRQ stack in the kernel image.
This patch updates our IRQ stack code to use a percpu pointer to the base of
each IRQ stack. This will allow us to change the way the stack is allocated
with minimal changes elsewhere. In some cases we may try to backtrace before
the IRQ stack pointers are initialised, so on_irq_stack() is updated to account
for this.
In testing with cyclictest, there was no measurable difference between using
adr_this_cpu (for irq_stack) and ldr_this_cpu (for irq_stack_ptr) in the IRQ
entry path.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 7 +++++--
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/irq.c | 10 ++++++++++
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 000e2418..4c68d8a 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -36,13 +36,16 @@ extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);
-DECLARE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], irq_stack);
+DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);
static inline bool on_irq_stack(unsigned long sp)
{
- unsigned long low = (unsigned long)raw_cpu_ptr(irq_stack);
+ unsigned long low = (unsigned long)raw_cpu_read(irq_stack_ptr);
unsigned long high = low + IRQ_STACK_SIZE;
+ if (!low)
+ return false;
+
return (low <= sp && sp < high);
}
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 58eba94..5234886 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -276,7 +276,7 @@ alternative_else_nop_endif
and x25, x25, #~(THREAD_SIZE - 1)
cbnz x25, 9998f
- adr_this_cpu x25, irq_stack, x26
+ ldr_this_cpu x25, irq_stack_ptr, x26
mov x26, #IRQ_STACK_SIZE
add x26, x25, x26
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 2386b26..5141282 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -32,6 +32,7 @@
/* irq stack only needs to be 16 byte aligned - not IRQ_STACK_SIZE aligned. */
DEFINE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], irq_stack) __aligned(16);
+DEFINE_PER_CPU(unsigned long *, irq_stack_ptr);
int arch_show_interrupts(struct seq_file *p, int prec)
{
@@ -50,8 +51,17 @@ void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
handle_arch_irq = handle_irq;
}
+static void init_irq_stacks(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ per_cpu(irq_stack_ptr, cpu) = per_cpu(irq_stack, cpu);
+}
+
void __init init_IRQ(void)
{
+ init_irq_stacks();
irqchip_init();
if (!handle_arch_irq)
panic("No interrupt controller found.");
--
1.9.1
This patch enables arm64 to be built with vmap'd task and IRQ stacks.
As vmap'd stacks are mapped at page granularity, stacks must be a multiple of
PAGE_SIZE. This means that a 64K page kernel must use stacks of at least 64K in
size.
To minimize the increase in Image size, IRQ stacks are dynamically allocated at
boot time, rather than embedding the boot CPU's IRQ stack in the kernel image.
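Concretely, with the definitions below, the stack sizes and (VMAP_STACK)
alignments work out as follows:

	granule   PAGE_SHIFT   THREAD_SHIFT   THREAD_SIZE   THREAD_ALIGN
	4K        12           14             16K           32K
	16K       14           14             16K           32K
	64K       16           16             64K           128K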
This patch was co-authored by Ard Biesheuvel and Mark Rutland.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/efi.h | 7 ++++++-
arch/arm64/include/asm/memory.h | 23 ++++++++++++++++++++++-
arch/arm64/kernel/irq.c | 30 ++++++++++++++++++++++++++++--
arch/arm64/kernel/vmlinux.lds.S | 2 +-
5 files changed, 58 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index dfd9086..d66f9db 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -75,6 +75,7 @@ config ARM64
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+ select HAVE_ARCH_VMAP_STACK
select HAVE_ARM_SMCCC
select HAVE_EBPF_JIT
select HAVE_C_RECORDMCOUNT
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 0e8cc3b..2b1e5de 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -49,7 +49,12 @@
*/
#define EFI_FDT_ALIGN SZ_2M /* used by allocate_new_fdt_and_exit_boot() */
-#define EFI_KIMG_ALIGN SEGMENT_ALIGN
+/*
+ * In some configurations (e.g. VMAP_STACK && 64K pages), stacks built into the
+ * kernel need greater alignment than we require the segments to be padded to.
+ */
+#define EFI_KIMG_ALIGN \
+ (SEGMENT_ALIGN > THREAD_ALIGN ? SEGMENT_ALIGN : THREAD_ALIGN)
/* on arm64, the FDT may be located anywhere in system RAM */
static inline unsigned long efi_get_max_fdt_addr(unsigned long dram_base)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 7fa6ad4..c5cd2c5 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -102,7 +102,17 @@
#define KASAN_SHADOW_SIZE (0)
#endif
-#define THREAD_SHIFT 14
+#define MIN_THREAD_SHIFT 14
+
+/*
+ * VMAP'd stacks are allocated at page granularity, so we must ensure that such
+ * stacks are a multiple of page size.
+ */
+#if defined(CONFIG_VMAP_STACK) && (MIN_THREAD_SHIFT < PAGE_SHIFT)
+#define THREAD_SHIFT PAGE_SHIFT
+#else
+#define THREAD_SHIFT MIN_THREAD_SHIFT
+#endif
#if THREAD_SHIFT >= PAGE_SHIFT
#define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
@@ -110,6 +120,17 @@
#define THREAD_SIZE (UL(1) << THREAD_SHIFT)
+/*
+ * By aligning VMAP'd stacks to 2 * THREAD_SIZE, we can detect overflow by
+ * checking sp & (1 << THREAD_SHIFT), which we can do cheaply in the entry
+ * assembly.
+ */
+#ifdef CONFIG_VMAP_STACK
+#define THREAD_ALIGN (2 * THREAD_SIZE)
+#else
+#define THREAD_ALIGN THREAD_SIZE
+#endif
+
#define IRQ_STACK_SIZE THREAD_SIZE
/*
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 5141282..713561e 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -23,15 +23,15 @@
#include <linux/kernel_stat.h>
#include <linux/irq.h>
+#include <linux/memory.h>
#include <linux/smp.h>
#include <linux/init.h>
#include <linux/irqchip.h>
#include <linux/seq_file.h>
+#include <linux/vmalloc.h>
unsigned long irq_err_count;
-/* irq stack only needs to be 16 byte aligned - not IRQ_STACK_SIZE aligned. */
-DEFINE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], irq_stack) __aligned(16);
DEFINE_PER_CPU(unsigned long *, irq_stack_ptr);
int arch_show_interrupts(struct seq_file *p, int prec)
@@ -51,6 +51,31 @@ void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
handle_arch_irq = handle_irq;
}
+#ifdef CONFIG_VMAP_STACK
+static void init_irq_stacks(void)
+{
+ int cpu;
+ unsigned long *p;
+
+ for_each_possible_cpu(cpu) {
+ /*
+ * To ensure that VMAP'd stack overflow detection works
+ * correctly, the IRQ stacks need to have the same
+ * alignment as other stacks.
+ */
+ p = __vmalloc_node_range(IRQ_STACK_SIZE, THREAD_ALIGN,
+ VMALLOC_START, VMALLOC_END,
+ THREADINFO_GFP, PAGE_KERNEL,
+ 0, cpu_to_node(cpu),
+ __builtin_return_address(0));
+
+ per_cpu(irq_stack_ptr, cpu) = p;
+ }
+}
+#else
+/* irq stack only needs to be 16 byte aligned - not IRQ_STACK_SIZE aligned. */
+DEFINE_PER_CPU_ALIGNED(unsigned long [IRQ_STACK_SIZE/sizeof(long)], irq_stack);
+
static void init_irq_stacks(void)
{
int cpu;
@@ -58,6 +83,7 @@ static void init_irq_stacks(void)
for_each_possible_cpu(cpu)
per_cpu(irq_stack_ptr, cpu) = per_cpu(irq_stack, cpu);
}
+#endif
void __init init_IRQ(void)
{
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7156538..fe56c26 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -176,7 +176,7 @@ SECTIONS
_data = .;
_sdata = .;
- RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
+ RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
/*
* Data written with the MMU off but read with the MMU on requires
--
1.9.1
This patch adds stack overflow detection to arm64, usable when vmap'd stacks
are in use.
Overflow is detected in a small preamble executed for each exception entry,
which checks whether there is enough space on the current stack for the general
purpose registers to be saved. If there is not enough space, the overflow
handler is invoked on a per-cpu overflow stack. This approach preserves the
original exception information in ESR_EL1 (and where appropriate, FAR_EL1).
Task and IRQ stacks are aligned to double their size, enabling overflow to be
detected with a single bit test. For example, a 16K stack is aligned to 32K,
ensuring that bit 14 of the SP must be zero. On an overflow (or underflow),
this bit is flipped. Thus, overflow (of less than the size of the stack) can be
detected by testing whether this bit is set.
The overflow check is performed before any attempt is made to access the
stack, avoiding recursive faults (and the loss of exception information
these would entail). As logical operations cannot be performed on the SP
directly, the SP is temporarily swapped with a general purpose register
using arithmetic operations to enable the test to be performed.
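The check for whether we were already running on the overflow stack (see the
entry.S hunk below) boils down to the following in rough C terms (an
illustrative sketch only; names are for exposition):

	static bool already_on_overflow_stack(unsigned long ovf_top,
					      unsigned long old_sp)
	{
		/* true iff old_sp lies within OVERFLOW_STACK_SIZE below ovf_top */
		return ((ovf_top - old_sp) & ~(OVERFLOW_STACK_SIZE - 1UL)) == 0;
	}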
This gives us a useful error message on stack overflow, as can be triggered with
the LKDTM overflow test:
[ 305.388749] lkdtm: Performing direct entry OVERFLOW
[ 305.395444] Insufficient stack space to handle exception!
[ 305.395482] ESR: 0x96000047 -- DABT (current EL)
[ 305.399890] FAR: 0xffff00000a5e7f30
[ 305.401315] Task stack: [0xffff00000a5e8000..0xffff00000a5ec000]
[ 305.403815] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 305.407035] Overflow stack: [0xffff80003efce4e0..0xffff80003efcf4e0]
[ 305.409622] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5
[ 305.412785] Hardware name: linux,dummy-virt (DT)
[ 305.415756] task: ffff80003d051c00 task.stack: ffff00000a5e8000
[ 305.419221] PC is at recursive_loop+0x10/0x48
[ 305.421637] LR is at recursive_loop+0x38/0x48
[ 305.423768] pc : [<ffff00000859f330>] lr : [<ffff00000859f358>] pstate: 40000145
[ 305.428020] sp : ffff00000a5e7f50
[ 305.430469] x29: ffff00000a5e8350 x28: ffff80003d051c00
[ 305.433191] x27: ffff000008981000 x26: ffff000008f80400
[ 305.439012] x25: ffff00000a5ebeb8 x24: ffff00000a5ebeb8
[ 305.440369] x23: ffff000008f80138 x22: 0000000000000009
[ 305.442241] x21: ffff80003ce65000 x20: ffff000008f80188
[ 305.444552] x19: 0000000000000013 x18: 0000000000000006
[ 305.446032] x17: 0000ffffa2601280 x16: ffff0000081fe0b8
[ 305.448252] x15: ffff000008ff546d x14: 000000000047a4c8
[ 305.450246] x13: ffff000008ff7872 x12: 0000000005f5e0ff
[ 305.452953] x11: ffff000008ed2548 x10: 000000000005ee8d
[ 305.454824] x9 : ffff000008545380 x8 : ffff00000a5e8770
[ 305.457105] x7 : 1313131313131313 x6 : 00000000000000e1
[ 305.459285] x5 : 0000000000000000 x4 : 0000000000000000
[ 305.461781] x3 : 0000000000000000 x2 : 0000000000000400
[ 305.465119] x1 : 0000000000000013 x0 : 0000000000000012
[ 305.467724] Kernel panic - not syncing: kernel stack overflow
[ 305.470561] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5
[ 305.473325] Hardware name: linux,dummy-virt (DT)
[ 305.475070] Call trace:
[ 305.476116] [<ffff000008088ad8>] dump_backtrace+0x0/0x378
[ 305.478991] [<ffff000008088e64>] show_stack+0x14/0x20
[ 305.481237] [<ffff00000895a178>] dump_stack+0x98/0xb8
[ 305.483294] [<ffff0000080c3288>] panic+0x118/0x280
[ 305.485673] [<ffff0000080c2e9c>] nmi_panic+0x6c/0x70
[ 305.486216] [<ffff000008089710>] handle_bad_stack+0x118/0x128
[ 305.486612] Exception stack(0xffff80003efcf3a0 to 0xffff80003efcf4e0)
[ 305.487334] f3a0: 0000000000000012 0000000000000013 0000000000000400 0000000000000000
[ 305.488025] f3c0: 0000000000000000 0000000000000000 00000000000000e1 1313131313131313
[ 305.488908] f3e0: ffff00000a5e8770 ffff000008545380 000000000005ee8d ffff000008ed2548
[ 305.489403] f400: 0000000005f5e0ff ffff000008ff7872 000000000047a4c8 ffff000008ff546d
[ 305.489759] f420: ffff0000081fe0b8 0000ffffa2601280 0000000000000006 0000000000000013
[ 305.490256] f440: ffff000008f80188 ffff80003ce65000 0000000000000009 ffff000008f80138
[ 305.490683] f460: ffff00000a5ebeb8 ffff00000a5ebeb8 ffff000008f80400 ffff000008981000
[ 305.491051] f480: ffff80003d051c00 ffff00000a5e8350 ffff00000859f358 ffff00000a5e7f50
[ 305.491444] f4a0: ffff00000859f330 0000000040000145 0000000000000000 0000000000000000
[ 305.492008] f4c0: 0001000000000000 0000000000000000 ffff00000a5e8350 ffff00000859f330
[ 305.493063] [<ffff00000808205c>] __bad_stack+0x88/0x8c
[ 305.493396] [<ffff00000859f330>] recursive_loop+0x10/0x48
[ 305.493731] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494088] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494425] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494649] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494898] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.495205] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.495453] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.495708] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496000] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496302] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496644] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496894] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.497138] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.497325] [<ffff00000859f3dc>] lkdtm_OVERFLOW+0x14/0x20
[ 305.497506] [<ffff00000859f314>] lkdtm_do_action+0x1c/0x28
[ 305.497786] [<ffff00000859f178>] direct_entry+0xe0/0x170
[ 305.498095] [<ffff000008345568>] full_proxy_write+0x60/0xa8
[ 305.498387] [<ffff0000081fb7f4>] __vfs_write+0x1c/0x128
[ 305.498679] [<ffff0000081fcc68>] vfs_write+0xa0/0x1b0
[ 305.498926] [<ffff0000081fe0fc>] SyS_write+0x44/0xa0
[ 305.499182] Exception stack(0xffff00000a5ebec0 to 0xffff00000a5ec000)
[ 305.499429] bec0: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0
[ 305.499674] bee0: 574f4c465245564f 0000000000000000 0000000000000000 8000000080808080
[ 305.499904] bf00: 0000000000000040 0000000000000038 fefefeff1b4bc2ff 7f7f7f7f7f7fff7f
[ 305.500189] bf20: 0101010101010101 0000000000000000 000000000047a4c8 0000000000000038
[ 305.500712] bf40: 0000000000000000 0000ffffa2601280 0000ffffc63f6068 00000000004b5000
[ 305.501241] bf60: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0
[ 305.501791] bf80: 0000000000000020 0000000000000000 00000000004b5000 000000001c4cc458
[ 305.502314] bfa0: 0000000000000000 0000ffffc63f7950 000000000040a3c4 0000ffffc63f70e0
[ 305.502762] bfc0: 0000ffffa2601268 0000000080000000 0000000000000001 0000000000000040
[ 305.503207] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 305.503680] [<ffff000008082fb0>] el0_svc_naked+0x24/0x28
[ 305.504720] Kernel Offset: disabled
[ 305.505189] CPU features: 0x002082
[ 305.505473] Memory Limit: none
[ 305.506181] ---[ end Kernel panic - not syncing: kernel stack overflow
This patch was co-authored by Ard Biesheuvel and Mark Rutland.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/memory.h | 2 ++
arch/arm64/include/asm/stacktrace.h | 16 +++++++++
arch/arm64/kernel/entry.S | 70 +++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/traps.c | 39 +++++++++++++++++++++
4 files changed, 127 insertions(+)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index c5cd2c5..1a025b7 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -133,6 +133,8 @@
#define IRQ_STACK_SIZE THREAD_SIZE
+#define OVERFLOW_STACK_SIZE SZ_4K
+
/*
* Alignment of kernel segments (e.g. .text, .data).
*/
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 92ddb6d..6ad3077 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -57,6 +57,20 @@ static inline bool on_task_stack(struct task_struct *tsk, unsigned long sp)
return (low <= sp && sp < high);
}
+#ifdef CONFIG_VMAP_STACK
+DECLARE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack);
+
+static inline bool on_overflow_stack(unsigned long sp)
+{
+ unsigned long low = (unsigned long)raw_cpu_ptr(overflow_stack);
+ unsigned long high = low + OVERFLOW_STACK_SIZE;
+
+ return (low <= sp && sp < high);
+}
+#else
+static inline bool on_overflow_stack(unsigned long sp) { return false; }
+#endif
+
/*
* We can only safely access per-cpu stacks from current in a non-preemptible
* context.
@@ -69,6 +83,8 @@ static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp
return false;
if (on_irq_stack(sp))
return true;
+ if (on_overflow_stack(sp))
+ return true;
return false;
}
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 5234886..3ef6e22 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -72,6 +72,48 @@
.macro kernel_ventry label
.align 7
sub sp, sp, #S_FRAME_SIZE
+#ifdef CONFIG_VMAP_STACK
+ /*
+ * Test whether the SP has overflowed, without corrupting a GPR.
+ * Task and IRQ stacks are aligned to (1 << THREAD_SHIFT).
+ */
+ add sp, sp, x0 // sp' = sp + x0
+ sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
+ tbnz x0, #THREAD_SHIFT, 0f
+ sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
+ sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
+ b \label
+
+0:
+ /*
+ * Either we've just detected an overflow, or we've taken an exception
+ * while on the overflow stack. Either way, we won't return to
+ * userspace, and can clobber EL0 registers to free up GPRs.
+ */
+
+ /* Stash the original SP (minus S_FRAME_SIZE) in tpidr_el0. */
+ msr tpidr_el0, x0
+
+ /* Recover the original x0 value and stash it in tpidrro_el0 */
+ sub x0, sp, x0
+ msr tpidrro_el0, x0
+
+ /* Switch to the overflow stack */
+ adr_this_cpu sp, overflow_stack + OVERFLOW_STACK_SIZE, x0
+
+ /*
+ * Check whether we were already on the overflow stack. This may happen
+ * after panic() re-enables interrupts.
+ */
+ mrs x0, tpidr_el0 // sp of interrupted context
+ sub x0, sp, x0 // delta with top of overflow stack
+ tst x0, #~(OVERFLOW_STACK_SIZE - 1) // within range?
+ b.ne __bad_stack // no? -> bad stack pointer
+
+ /* We were already on the overflow stack. Restore sp/x0 and carry on. */
+ sub sp, sp, x0
+ mrs x0, tpidrro_el0
+#endif
b \label
.endm
@@ -352,6 +394,34 @@ ENTRY(vectors)
#endif
END(vectors)
+#ifdef CONFIG_VMAP_STACK
+ /*
+ * We detected an overflow in kernel_ventry, which switched to the
+ * overflow stack. Stash the exception regs, and head to our overflow
+ * handler.
+ */
+__bad_stack:
+ /* Restore the original x0 value */
+ mrs x0, tpidrro_el0
+
+ /*
+ * Store the original GPRs to the new stack. The original SP (minus
+ * S_FRAME_SIZE) was stashed in tpidr_el0 by kernel_ventry.
+ */
+ sub sp, sp, #S_FRAME_SIZE
+ kernel_entry 1
+ mrs x0, tpidr_el0
+ add x0, x0, #S_FRAME_SIZE
+ str x0, [sp, #S_SP]
+
+ /* Stash the regs for handle_bad_stack */
+ mov x0, sp
+
+ /* Time to die */
+ bl handle_bad_stack
+ ASM_BUG()
+#endif /* CONFIG_VMAP_STACK */
+
/*
* Invalid mode handlers
*/
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index d01c598..2d59180 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -32,6 +32,7 @@
#include <linux/sched/signal.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
+#include <linux/sizes.h>
#include <linux/syscalls.h>
#include <linux/mm_types.h>
@@ -41,6 +42,7 @@
#include <asm/esr.h>
#include <asm/insn.h>
#include <asm/traps.h>
+#include <asm/smp.h>
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
#include <asm/exception.h>
@@ -666,6 +668,43 @@ asmlinkage void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr)
force_sig_info(info.si_signo, &info, current);
}
+#ifdef CONFIG_VMAP_STACK
+
+DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack)
+ __aligned(16);
+
+asmlinkage void handle_bad_stack(struct pt_regs *regs)
+{
+ unsigned long tsk_stk = (unsigned long)current->stack;
+ unsigned long irq_stk = (unsigned long)this_cpu_read(irq_stack_ptr);
+ unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
+ unsigned int esr = read_sysreg(esr_el1);
+ unsigned long far = read_sysreg(far_el1);
+
+ console_verbose();
+ pr_emerg("Insufficient stack space to handle exception!");
+
+ pr_emerg("ESR: 0x%08x -- %s\n", esr, esr_get_class_string(esr));
+ pr_emerg("FAR: 0x%016lx\n", far);
+
+ pr_emerg("Task stack: [0x%016lx..0x%016lx]\n",
+ tsk_stk, tsk_stk + THREAD_SIZE);
+ pr_emerg("IRQ stack: [0x%016lx..0x%016lx]\n",
+ irq_stk, irq_stk + THREAD_SIZE);
+ pr_emerg("Overflow stack: [0x%016lx..0x%016lx]\n",
+ ovf_stk, ovf_stk + OVERFLOW_STACK_SIZE);
+
+ __show_regs(regs);
+
+ /*
+ * We use nmi_panic to limit the potential for recursive overflows, and
+ * to get a better stack trace.
+ */
+ nmi_panic(NULL, "kernel stack overflow");
+ cpu_park_loop();
+}
+#endif
+
void __pte_error(const char *file, int line, unsigned long val)
{
pr_err("%s:%d: bad pte %016lx.\n", file, line, val);
--
1.9.1
Both unwind_frame() and dump_backtrace() try to check whether a stack
address is sane to access, with very similar logic. Both will need
updating in order to handle overflow stacks.
Factor out this logic into a helper, so that we can avoid further
duplication when we add overflow stacks.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 16 ++++++++++++++++
arch/arm64/kernel/stacktrace.c | 7 +------
arch/arm64/kernel/traps.c | 3 +--
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 4c68d8a..92ddb6d 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -57,4 +57,20 @@ static inline bool on_task_stack(struct task_struct *tsk, unsigned long sp)
return (low <= sp && sp < high);
}
+/*
+ * We can only safely access per-cpu stacks from current in a non-preemptible
+ * context.
+ */
+static inline bool on_accessible_stack(struct task_struct *tsk, unsigned long sp)
+{
+ if (on_task_stack(tsk, sp))
+ return true;
+ if (tsk != current || preemptible())
+ return false;
+ if (on_irq_stack(sp))
+ return true;
+
+ return false;
+}
+
#endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 35588ca..3144584 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -50,12 +50,7 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
if (!tsk)
tsk = current;
- /*
- * Switching between stacks is valid when tracing current and in
- * non-preemptible context.
- */
- if (!(tsk == current && !preemptible() && on_irq_stack(fp)) &&
- !on_task_stack(tsk, fp))
+ if (!on_accessible_stack(tsk, fp))
return -EINVAL;
frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9633773..d01c598 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -193,8 +193,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
if (in_entry_text(frame.pc)) {
stack = frame.fp - offsetof(struct pt_regs, stackframe);
- if (on_task_stack(tsk, stack) ||
- (tsk == current && !preemptible() && on_irq_stack(stack)))
+ if (on_accessible_stack(tsk, stack))
dump_mem("", "Exception stack", stack,
stack + sizeof(struct pt_regs));
}
--
1.9.1
From: Ard Biesheuvel <[email protected]>
Given that adr_this_cpu already requires a temp register in addition
to the destination register, tweak the instruction sequence so that sp
may be used as well.
This will simplify switching to per-cpu stacks in subsequent patches. While
this limits the range of adr_this_cpu to +/- 4 GiB, we don't currently use
adr_this_cpu in modules, and this is not problematic for the main kernel image.
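For instance, a later patch in this series uses the SP directly as the
destination when switching to the per-cpu overflow stack:

	adr_this_cpu sp, overflow_stack + OVERFLOW_STACK_SIZE, x0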
Signed-off-by: Ard Biesheuvel <[email protected]>
[Mark: add more commit text]
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/assembler.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 610a420..2f2bd51 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -230,12 +230,18 @@
.endm
/*
- * @dst: Result of per_cpu(sym, smp_processor_id())
+ * @dst: Result of per_cpu(sym, smp_processor_id()), can be SP for
+ * non-module code
* @sym: The name of the per-cpu variable
* @tmp: scratch register
*/
.macro adr_this_cpu, dst, sym, tmp
+#ifndef MODULE
+ adrp \tmp, \sym
+ add \dst, \tmp, #:lo12:\sym
+#else
adr_l \dst, \sym
+#endif
mrs \tmp, tpidr_el1
add \dst, \dst, \tmp
.endm
--
1.9.1
In subsequent patches, we will detect stack overflow in our exception
entry code, by verifying the SP after it has been decremented to make
space for the exception regs.
This verification code is small, and we can minimize its impact by
placing it directly in the vectors. To avoid redundant modification of
the SP, we also need to move the initial decrement of the SP into the
vectors.
As a preparatory step, this patch introduces kernel_ventry, which
performs this decrement, and updates the entry code accordingly.
Subsequent patches will fold SP verification into kernel_ventry.
There should be no functional change as a result of this patch.
Signed-off-by: Ard Biesheuvel <[email protected]>
[Mark: turn into prep patch, expand commit msg]
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/kernel/entry.S | 47 ++++++++++++++++++++++++++---------------------
1 file changed, 26 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index f31c7b2..58eba94 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -69,8 +69,13 @@
#define BAD_FIQ 2
#define BAD_ERROR 3
- .macro kernel_entry, el, regsize = 64
+ .macro kernel_ventry label
+ .align 7
sub sp, sp, #S_FRAME_SIZE
+ b \label
+ .endm
+
+ .macro kernel_entry, el, regsize = 64
.if \regsize == 32
mov w0, w0 // zero upper 32 bits of x0
.endif
@@ -319,31 +324,31 @@ tsk .req x28 // current thread_info
.align 11
ENTRY(vectors)
- ventry el1_sync_invalid // Synchronous EL1t
- ventry el1_irq_invalid // IRQ EL1t
- ventry el1_fiq_invalid // FIQ EL1t
- ventry el1_error_invalid // Error EL1t
+ kernel_ventry el1_sync_invalid // Synchronous EL1t
+ kernel_ventry el1_irq_invalid // IRQ EL1t
+ kernel_ventry el1_fiq_invalid // FIQ EL1t
+ kernel_ventry el1_error_invalid // Error EL1t
- ventry el1_sync // Synchronous EL1h
- ventry el1_irq // IRQ EL1h
- ventry el1_fiq_invalid // FIQ EL1h
- ventry el1_error_invalid // Error EL1h
+ kernel_ventry el1_sync // Synchronous EL1h
+ kernel_ventry el1_irq // IRQ EL1h
+ kernel_ventry el1_fiq_invalid // FIQ EL1h
+ kernel_ventry el1_error_invalid // Error EL1h
- ventry el0_sync // Synchronous 64-bit EL0
- ventry el0_irq // IRQ 64-bit EL0
- ventry el0_fiq_invalid // FIQ 64-bit EL0
- ventry el0_error_invalid // Error 64-bit EL0
+ kernel_ventry el0_sync // Synchronous 64-bit EL0
+ kernel_ventry el0_irq // IRQ 64-bit EL0
+ kernel_ventry el0_fiq_invalid // FIQ 64-bit EL0
+ kernel_ventry el0_error_invalid // Error 64-bit EL0
#ifdef CONFIG_COMPAT
- ventry el0_sync_compat // Synchronous 32-bit EL0
- ventry el0_irq_compat // IRQ 32-bit EL0
- ventry el0_fiq_invalid_compat // FIQ 32-bit EL0
- ventry el0_error_invalid_compat // Error 32-bit EL0
+ kernel_ventry el0_sync_compat // Synchronous 32-bit EL0
+ kernel_ventry el0_irq_compat // IRQ 32-bit EL0
+ kernel_ventry el0_fiq_invalid_compat // FIQ 32-bit EL0
+ kernel_ventry el0_error_invalid_compat // Error 32-bit EL0
#else
- ventry el0_sync_invalid // Synchronous 32-bit EL0
- ventry el0_irq_invalid // IRQ 32-bit EL0
- ventry el0_fiq_invalid // FIQ 32-bit EL0
- ventry el0_error_invalid // Error 32-bit EL0
+ kernel_ventry el0_sync_invalid // Synchronous 32-bit EL0
+ kernel_ventry el0_irq_invalid // IRQ 32-bit EL0
+ kernel_ventry el0_fiq_invalid // FIQ 32-bit EL0
+ kernel_ventry el0_error_invalid // Error 32-bit EL0
#endif
END(vectors)
--
1.9.1
The EFI stub is intimately coupled with the kernel, and takes advantage
of this by relocating the kernel at a weaker alignment than the
documented boot protocol mandates.
However, it does so by assuming that it can align the kernel to the segment
alignment, and that this alignment is 64K. In subsequent patches, we'll have to
consider other details to determine this de-facto alignment constraint.
This patch adds a new EFI_KIMG_ALIGN definition that will track the
kernel's de-facto alignment requirements. Subsequent patches will modify
this as required.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/efi.h | 3 +++
drivers/firmware/efi/libstub/arm64-stub.c | 6 ++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 8f3043a..0e8cc3b 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -4,6 +4,7 @@
#include <asm/boot.h>
#include <asm/cpufeature.h>
#include <asm/io.h>
+#include <asm/memory.h>
#include <asm/mmu_context.h>
#include <asm/neon.h>
#include <asm/ptrace.h>
@@ -48,6 +49,8 @@
*/
#define EFI_FDT_ALIGN SZ_2M /* used by allocate_new_fdt_and_exit_boot() */
+#define EFI_KIMG_ALIGN SEGMENT_ALIGN
+
/* on arm64, the FDT may be located anywhere in system RAM */
static inline unsigned long efi_get_max_fdt_addr(unsigned long dram_base)
{
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index b4c2589..af6ae95 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -11,6 +11,7 @@
*/
#include <linux/efi.h>
#include <asm/efi.h>
+#include <asm/memory.h>
#include <asm/sections.h>
#include <asm/sysreg.h>
@@ -81,9 +82,10 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table_arg,
/*
* If CONFIG_DEBUG_ALIGN_RODATA is not set, produce a
* displacement in the interval [0, MIN_KIMG_ALIGN) that
- * is a multiple of the minimal segment alignment (SZ_64K)
+ * doesn't violate this kernel's de-facto alignment
+ * constraints.
*/
- u32 mask = (MIN_KIMG_ALIGN - 1) & ~(SZ_64K - 1);
+ u32 mask = (MIN_KIMG_ALIGN - 1) & ~(EFI_KIMG_ALIGN - 1);
u32 offset = !IS_ENABLED(CONFIG_DEBUG_ALIGN_RODATA) ?
(phys_seed >> 32) & mask : TEXT_OFFSET;
--
1.9.1
Currently we define THREAD_SIZE and THREAD_SIZE_ORDER separately, with
the latter dependent on particular CONFIG_ARM64_*K_PAGES definitions.
This is somewhat opaque, and will get in the way of future modifications
to THREAD_SIZE.
This patch cleans this up, defining both in terms of a common
THREAD_SHIFT, and using PAGE_SHIFT to calculate THREAD_SIZE_ORDER,
rather than using a number of definitions dependent on config symbols.
Subsequent patches will make use of this to alter the stack size used in
some configurations.
At the same time, these are moved into <asm/memory.h>, which will avoid
circular include issues in subsequent patches. To ensure that existing
code isn't adversely affected, <asm/thread_info.h> is updated to
transitively include these definitions.
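For example, with THREAD_SHIFT = 14 this yields THREAD_SIZE_ORDER = 2 for 4K
pages (PAGE_SHIFT = 12) and THREAD_SIZE_ORDER = 0 for 16K pages (PAGE_SHIFT =
14), matching the values previously hard-coded in <asm/thread_info.h>; for 64K
pages THREAD_SIZE_ORDER remains undefined, as before.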
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/memory.h | 8 ++++++++
arch/arm64/include/asm/thread_info.h | 9 +--------
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 77d55dc..8ab4774 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -102,6 +102,14 @@
#define KASAN_SHADOW_SIZE (0)
#endif
+#define THREAD_SHIFT 14
+
+#if THREAD_SHIFT >= PAGE_SHIFT
+#define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
+#endif
+
+#define THREAD_SIZE (UL(1) << THREAD_SHIFT)
+
/*
* Memory types available.
*/
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index b29ab0e..aa04b73 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -23,18 +23,11 @@
#include <linux/compiler.h>
-#ifdef CONFIG_ARM64_4K_PAGES
-#define THREAD_SIZE_ORDER 2
-#elif defined(CONFIG_ARM64_16K_PAGES)
-#define THREAD_SIZE_ORDER 0
-#endif
-
-#define THREAD_SIZE 16384
-
#ifndef __ASSEMBLY__
struct task_struct;
+#include <asm/memory.h>
#include <asm/stack_pointer.h>
#include <asm/types.h>
--
1.9.1
On Tue, Aug 15, 2017 at 01:50:35PM +0100, Mark Rutland wrote:
> Ard and I have worked together to implement vmap stack support for
> arm64. This supersedes our earlier vmap stack RFCs [0,1]. The git author
> stats are a little misleading, as I've teased parts out into smaller
> patches for review.
>
> The series is based on our stack dump rework [2,3], which can be found
> in the arm64/exception-stack branch [4] of my kernel.org repo. This
> series can be found in the arm64/vmap-stack branch [5] of the same repo.
>
> Since v1 [6]:
> * Fix typos
> * Update comments in entry assembly
> * Dump exception context (and stacks) before regs
> * Define safe adr_this_cpu for modules
Thanks:
Reviewed-by: Will Deacon <[email protected]>
for the series.
Will
On Tue, Aug 15, 2017 at 5:50 AM, Mark Rutland <[email protected]> wrote:
> In some cases, an architecture might wish its stacks to be aligned to a
> boundary larger than THREAD_SIZE. For example, using an alignment of
> double THREAD_SIZE can allow for stack overflows smaller than
> THREAD_SIZE to be detected by checking a single bit of the stack
> pointer.
>
> This patch allows architectures to override the alignment of VMAP'd
> stacks, by defining THREAD_ALIGN. Where not defined, this defaults to
> THREAD_SIZE, as is the case today.
This looks okay, but it might make sense to move that to a header file
so THREAD_ALIGN is always available.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: James Morse <[email protected]>
> Cc: Laura Abbott <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> ---
> kernel/fork.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 17921b0..696d692 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -217,7 +217,10 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> return s->addr;
> }
>
> - stack = __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
> +#ifndef THREAD_ALIGN
> +#define THREAD_ALIGN THREAD_SIZE
> +#endif
> + stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
> VMALLOC_START, VMALLOC_END,
> THREADINFO_GFP,
> PAGE_KERNEL,
> --
> 1.9.1
>
--
Andy Lutomirski
AMA Capital Management, LLC
On Tue, Aug 15, 2017 at 09:09:36AM -0700, Andy Lutomirski wrote:
> On Tue, Aug 15, 2017 at 5:50 AM, Mark Rutland <[email protected]> wrote:
> > In some cases, an architecture might wish its stacks to be aligned to a
> > boundary larger than THREAD_SIZE. For example, using an alignment of
> > double THREAD_SIZE can allow for stack overflows smaller than
> > THREAD_SIZE to be detected by checking a single bit of the stack
> > pointer.
> >
> > This patch allows architectures to override the alignment of VMAP'd
> > stacks, by defining THREAD_ALIGN. Where not defined, this defaults to
> > THREAD_SIZE, as is the case today.
>
> This looks okay, but it might make sense to move that to a header file
> so THREAD_ALIGN is always available.
I was a little worried about breaking things, since arches don't define
THREAD_SIZE in a consistent location.
Looking again, it looks like those are all transitively included into
each arch's <asm/thread_info.h>, so I think I can move this into
<linux/thread_info.h>, which'll have to be added to kernel/fork.c's
includes.
Are you happy with the below fixup?
Thanks,
Mark.
---->8----
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 250a276..905d769 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -38,6 +38,10 @@ enum {
#ifdef __KERNEL__
+#ifndef THREAD_ALIGN
+#define THREAD_ALIGN THREAD_SIZE
+#endif
+
#ifdef CONFIG_DEBUG_STACK_USAGE
# define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
__GFP_ZERO)
diff --git a/kernel/fork.c b/kernel/fork.c
index 696d692..f12882a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -88,6 +88,7 @@
#include <linux/sysctl.h>
#include <linux/kcov.h>
#include <linux/livepatch.h>
+#include <linux/thread_info.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -217,9 +218,6 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
return s->addr;
}
-#ifndef THREAD_ALIGN
-#define THREAD_ALIGN THREAD_SIZE
-#endif
stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
VMALLOC_START, VMALLOC_END,
THREADINFO_GFP,
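
For context, with the default moved into <linux/thread_info.h>, an
architecture only has to provide its own THREAD_ALIGN before that header is
included. A sketch of the kind of override arm64 wants for the single-bit
overflow check (illustrative only; the real definition is added by the arm64
patches later in this series):

#ifdef CONFIG_VMAP_STACK
/*
 * Align VMAP'd stacks to twice their size, so that an overflow of up to
 * THREAD_SIZE flips the THREAD_SHIFT bit of the SP and can be caught by
 * testing that single bit.
 */
#define THREAD_ALIGN		(2 * THREAD_SIZE)
#else
#define THREAD_ALIGN		THREAD_SIZE
#endif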
On Tue, Aug 15, 2017 at 9:30 AM, Mark Rutland <[email protected]> wrote:
> On Tue, Aug 15, 2017 at 09:09:36AM -0700, Andy Lutomirski wrote:
>> On Tue, Aug 15, 2017 at 5:50 AM, Mark Rutland <[email protected]> wrote:
>> > In some cases, an architecture might wish its stacks to be aligned to a
>> > boundary larger than THREAD_SIZE. For example, using an alignment of
>> > double THREAD_SIZE can allow for stack overflows smaller than
>> > THREAD_SIZE to be detected by checking a single bit of the stack
>> > pointer.
>> >
>> > This patch allows architectures to override the alignment of VMAP'd
>> > stacks, by defining THREAD_ALIGN. Where not defined, this defaults to
>> > THREAD_SIZE, as is the case today.
>>
>> This looks okay, but it might make sense to move that to a header file
>> so THREAD_ALIGN is always available.
>
> I was a little worried about breaking things, since arches don't define
> THREAD_SIZE in a consistent location.
>
> Looking again, it looks like those are all transitively included into
> each arch's <asm/thread_info.h>, so I think I can move this into
> <linux/thread_info.h>, which'll have to be added to kernel/fork.c's
> includes.
>
> Are you happy with the below fixup?
LGTM.
>
> Thanks,
> Mark.
>
> ---->8----
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index 250a276..905d769 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -38,6 +38,10 @@ enum {
>
> #ifdef __KERNEL__
>
> +#ifndef THREAD_ALIGN
> +#define THREAD_ALIGN THREAD_SIZE
> +#endif
> +
> #ifdef CONFIG_DEBUG_STACK_USAGE
> # define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \
> __GFP_ZERO)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 696d692..f12882a 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -88,6 +88,7 @@
> #include <linux/sysctl.h>
> #include <linux/kcov.h>
> #include <linux/livepatch.h>
> +#include <linux/thread_info.h>
>
> #include <asm/pgtable.h>
> #include <asm/pgalloc.h>
> @@ -217,9 +218,6 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> return s->addr;
> }
>
> -#ifndef THREAD_ALIGN
> -#define THREAD_ALIGN THREAD_SIZE
> -#endif
> stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
> VMALLOC_START, VMALLOC_END,
> THREADINFO_GFP,
>
--
Andy Lutomirski
AMA Capital Management, LLC
On Tue, Aug 15, 2017 at 09:33:18AM -0700, Andy Lutomirski wrote:
> On Tue, Aug 15, 2017 at 9:30 AM, Mark Rutland <[email protected]> wrote:
> > On Tue, Aug 15, 2017 at 09:09:36AM -0700, Andy Lutomirski wrote:
> >> This looks okay, but it might make sense to move that to a header file
> >> so THREAD_ALIGN is always available.
> >
> > I was a little worried about breaking things, since arches don't define
> > THREAD_SIZE in a consistent location.
> >
> > Looking again, it looks like those are all transitively included into
> > each arch's <asm/thread_info.h>, so I think I can move this into
> > <linux/thread_info.h>, which'll have to be added to kernel/fork.c's
> > includes.
> >
> > Are you happy with the below fixup?
>
> LGTM.
Cool. I've folded that in and pushed the updated branch.
Thanks,
Mark.
On Tue, Aug 15, 2017 at 05:39:55PM +0100, Mark Rutland wrote:
> On Tue, Aug 15, 2017 at 09:33:18AM -0700, Andy Lutomirski wrote:
> > On Tue, Aug 15, 2017 at 9:30 AM, Mark Rutland <[email protected]> wrote:
> > > On Tue, Aug 15, 2017 at 09:09:36AM -0700, Andy Lutomirski wrote:
>
> > >> This looks okay, but it might make sense to move that to a header file
> > >> so THREAD_ALIGN is always available.
> > >
> > > I was a little worried about breaking things, since arches don't define
> > > THREAD_SIZE in a consistent location.
> > >
> > > Looking again, it looks like those are all transitiviely included into
> > > each arch's <asm/thread_info.h>, so I think I can move this into
> > > <linux/thread_info.h>, which'll have to be added to kernel.fork.c's
> > > includes.
> > >
> > > Are you happy with the below fixup?
> >
> > LGTM.
>
> Cool. I've folded that in and pushed the updated branch.
I pulled the branch for 4.14. Thanks.
--
Catalin
On 08/15/2017 05:50 AM, Mark Rutland wrote:
> Hi,
>
> Ard and I have worked together to implement vmap stack support for
> arm64. This supersedes our earlier vmap stack RFCs [0,1]. The git author
> stats are a little misleading, as I've teased parts out into smaller
> patches for review.
>
> The series is based on our stack dump rework [2,3], which can be found
> in the arm64/exception-stack branch [4] of my kernel.org repo. This
> series can be found in the arm64/vmap-stack branch [5] of the same repo.
>
> Since v1 [6]:
> * Fix typos
> * Update comments in entry assembly
> * Dump exception context (and stacks) before regs
> * Define safe adr_this_cpu for modules
>
> On arm64, there is no double-fault exception, as software saves
> exception context to the stack. An erroneous memory access taken during
> exception handling results in a data abort, as with any other erroneous
> memory access. To avoid taking these recursively, we must detect
> overflow by checking the SP before we attempt to store any context to
> the stack. Doing this efficiently requires a couple of tricks.
>
> For a naturally aligned stack, bits THREAD_SHIFT-1:0 of a valid SP may
> contain any arbitrary value:
>
> 0bXX .. 11111111111111
> 0bXX .. 11011001011100
> 0bXX .. 00000000000000
>
> By aligning stacks to double their natural alignment, we know that the
> THREAD_SHIFT bit of any valid SP must be zero:
>
> 0bXX .. 0 11111111111111
> 0bXX .. 0 11011001011100
> 0bXX .. 0 00000000000000
>
> ... while an overflow will result in this bit flipping, along with
> (some) other high-order bits:
>
> 0bXX .. 0 00000000000000
> < SP -= 1 >
> 0bXX .. 1 11111111111111
>
> ... and thus, we can detect overflows of up to THREAD_SIZE by testing
> the THREAD_SHIFT bit of the SP value.
>
> Provided we can get the SP into a general purpose register, we can
> perform this test with a single TBNZ instruction. We don't have scratch
> space to store a GPR, but we can (partially) swap the SP with a GPR
> using arithmetic to perform the test:
>
> add sp, sp, x0 // sp' = sp + x0
> sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
> tbnz x0, #THREAD_SHIFT, overflow_handler
> sub x0, sp, x0 // sp' - x0' = (sp + x0) - sp = x0
> sub sp, sp, x0 // sp' - x0 = (sp + x0) - x0 = sp
>
> This series implements this approach, along with the other requisite
> changes required to make this work.
>
> The SP test is performed for all exceptions, after compensating for the
> size of the exception registers, allowing the original exception context
> to be preserved in entirety. The tests themselves are folded into the
> exception vectors, minimizing their impact.
>
> To ensure that IRQ stack overflows are detected and handled, IRQ stacks
> are now dynamically allocated, with guard pages.
>
> I've given the series some light testing with LKDTM, Syzkaller, Vince
> Weaver's perf fuzzer, and a few combinations of debug options. I haven't
> compared performance of the entire series to a baseline kernel, but from
> testing so far the cost of the SP test falls in the noise for a kernel
> build workload on Cortex-A57.
>
> Many thanks to Ard for putting up with my meddling, and also to Laura,
> James, Catalin, and Will for comments and testing.
>
> Thanks,
> Mark.
>
> [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/518368.html
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/518434.html
> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/520705.html
> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/521435.html
> [4] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/exception-stack
> [5] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/vmap-stack
> [6] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-August/524179.html
>
> Ard Biesheuvel (2):
> arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP
> arm64: assembler: allow adr_this_cpu to use the stack pointer
>
> Mark Rutland (12):
> arm64: remove __die()'s stack dump
> fork: allow arch-override of VMAP stack alignment
> arm64: factor out PAGE_* and CONT_* definitions
> arm64: clean up THREAD_* definitions
> arm64: clean up irq stack definitions
> arm64: move SEGMENT_ALIGN to <asm/memory.h>
> efi/arm64: add EFI_KIMG_ALIGN
> arm64: factor out entry stack manipulation
> arm64: use an irq stack pointer
> arm64: add basic VMAP_STACK support
> arm64: add on_accessible_stack()
> arm64: add VMAP_STACK overflow detection
>
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/assembler.h | 8 +-
> arch/arm64/include/asm/efi.h | 8 ++
> arch/arm64/include/asm/irq.h | 25 ------
> arch/arm64/include/asm/memory.h | 53 +++++++++++++
> arch/arm64/include/asm/page-def.h | 34 +++++++++
> arch/arm64/include/asm/page.h | 12 +--
> arch/arm64/include/asm/processor.h | 2 +-
> arch/arm64/include/asm/stacktrace.h | 60 ++++++++++++++-
> arch/arm64/include/asm/thread_info.h | 10 +--
> arch/arm64/kernel/entry.S | 121 ++++++++++++++++++++++++------
> arch/arm64/kernel/irq.c | 40 +++++++++-
> arch/arm64/kernel/ptrace.c | 1 +
> arch/arm64/kernel/smp.c | 2 +-
> arch/arm64/kernel/stacktrace.c | 7 +-
> arch/arm64/kernel/traps.c | 44 ++++++++++-
> arch/arm64/kernel/vmlinux.lds.S | 18 +----
> drivers/firmware/efi/libstub/arm64-stub.c | 6 +-
> kernel/fork.c | 5 +-
> 19 files changed, 353 insertions(+), 104 deletions(-)
> create mode 100644 arch/arm64/include/asm/page-def.h
>
Tested-by: Laura Abbott <[email protected]>
(I think I may be slightly late with this. Oh well.)
On Tue, Aug 15, 2017 at 10:18:20AM -0700, Laura Abbott wrote:
> On 08/15/2017 05:50 AM, Mark Rutland wrote:
> > Hi,
> >
> > Ard and I have worked together to implement vmap stack support for
> > arm64. This supersedes our earlier vmap stack RFCs [0,1]. The git author
> > stats are a little misleading, as I've teased parts out into smaller
> > patches for review.
> >
> > The series is based on our stack dump rework [2,3], which can be found
> > in the arm64/exception-stack branch [4] of my kernel.org repo. This
> > series can be found in the arm64/vmap-stack branch [5] of the same repo.
> Tested-by: Laura Abbott <[email protected]>
Thanks!
> (I think I may be slightly late with this. Oh well.)
On the chance that Catalin's happy to redo the pull, I've folded that
Tested-by in, and pushed the series.
Thanks,
Mark.
On Tue, Aug 15, 2017 at 06:39:14PM +0100, Mark Rutland wrote:
> On Tue, Aug 15, 2017 at 10:18:20AM -0700, Laura Abbott wrote:
> > On 08/15/2017 05:50 AM, Mark Rutland wrote:
> > > Hi,
> > >
> > > Ard and I have worked together to implement vmap stack support for
> > > arm64. This supersedes our earlier vmap stack RFCs [0,1]. The git author
> > > stats are a little misleading, as I've teased parts out into smaller
> > > patches for review.
> > >
> > > The series is based on our stack dump rework [2,3], which can be found
> > > in the arm64/exception-stack branch [4] of my kernel.org repo. This
> > > series can be found in the arm64/vmap-stack branch [5] of the same repo.
>
> > Tested-by: Laura Abbott <[email protected]>
>
> Thanks!
>
> > (I think I may be slightly late with this. Oh well.)
>
> On the chance that Catalin's happy to redo the pull, I've folded that
> Tested-by in, and pushed the series.
Pulled again as I haven't pushed it out yet (not until tomorrow, I'm
running some tests overnight).
--
Catalin
On Tue, Aug 15, 2017 at 10:44 AM, Catalin Marinas
<[email protected]> wrote:
> On Tue, Aug 15, 2017 at 06:39:14PM +0100, Mark Rutland wrote:
>> On Tue, Aug 15, 2017 at 10:18:20AM -0700, Laura Abbott wrote:
>> > On 08/15/2017 05:50 AM, Mark Rutland wrote:
>> > > Hi,
>> > >
>> > > Ard and I have worked together to implement vmap stack support for
>> > > arm64. This supersedes our earlier vmap stack RFCs [0,1]. The git author
>> > > stats are a little misleading, as I've teased parts out into smaller
>> > > patches for review.
>> > >
>> > > The series is based on our stack dump rework [2,3], which can be found
>> > > in the arm64/exception-stack branch [4] of my kernel.org repo. This
>> > > series can be found in the arm64/vmap-stack branch [5] of the same repo.
>>
>> > Tested-by: Laura Abbott <[email protected]>
>>
>> Thanks!
>>
>> > (I think I may be slightly late with this. Oh well.)
>>
>> On the chance that Catalin's happy to redo the pull, I've folded that
>> Tested-by in, and pushed the series.
>
> Pulled again as I haven't pushed it out yet (not until tomorrow, I'm
> running some tests overnight).
So awesome! Thanks everyone for the work, review, and testing. :)
I'll get the LKDTM tests for VMAP_STACK pushed to Greg.
-Kees
--
Kees Cook
Pixel Security