2020-04-25 10:05:10

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 00/11] ORC fixes

v2:
- Dropped three patches which technically weren't fixes. Will post them
later as part of another patch set with more improvements.
- Removed show_iret_regs() declaration [mbenes]
- Added Miroslav Reviewed-by, Linus Reported-by

v1 was here:
https://lkml.kernel.org/r/[email protected]

Jann Horn (1):
x86/entry/64: Fix unwind hints in rewind_stack_do_exit()

Josh Poimboeuf (9):
objtool: Fix stack offset tracking for indirect CFAs
x86/entry/64: Fix unwind hints in register clearing code
x86/entry/64: Fix unwind hints in kernel exit path
x86/entry/64: Fix unwind hints in __switch_to_asm()
x86/unwind/orc: Convert global variables to static
x86/unwind: Prevent false warnings for non-current tasks
x86/unwind/orc: Prevent unwinding before ORC initialization
x86/unwind/orc: Fix error path for bad ORC entry type
x86/unwind/orc: Fix premature unwind stoppage due to IRET frames

Miroslav Benes (1):
x86/unwind/orc: Don't skip the first frame for inactive tasks

arch/x86/entry/calling.h | 40 ++++++------
arch/x86/entry/entry_64.S | 14 ++---
arch/x86/include/asm/unwind.h | 2 +-
arch/x86/kernel/dumpstack_64.c | 3 +-
arch/x86/kernel/unwind_frame.c | 3 +
arch/x86/kernel/unwind_orc.c | 111 ++++++++++++++++++++++-----------
tools/objtool/check.c | 2 +-
7 files changed, 108 insertions(+), 67 deletions(-)

--
2.21.1


2020-04-25 10:05:13

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 01/11] objtool: Fix stack offset tracking for indirect CFAs

When the current frame address (CFA) is stored on the stack (i.e.,
cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack
offset when there are subsequent pushes or pops. This results in bad
ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the
previous stack pointer on the stack and does a subsequent push.

This fixes the following unwinder warning:

WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0

Fixes: 627fce14809b ("objtool: Add ORC unwind table generation")
Reported-by: Vince Weaver <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: Steven Rostedt <[email protected]>
Reported-by: Vegard Nossum <[email protected]>
Reported-by: Joe Mario <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
tools/objtool/check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 0d500767009b..0c732d586924 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1550,7 +1550,7 @@ static int update_cfi_state_regs(struct instruction *insn,
{
struct cfi_reg *cfa = &cfi->cfa;

- if (cfa->base != CFI_SP)
+ if (cfa->base != CFI_SP && cfa->base != CFI_SP_INDIRECT)
return 0;

/* push */
--
2.21.1

2020-04-25 10:05:19

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 03/11] x86/entry/64: Fix unwind hints in kernel exit path

In swapgs_restore_regs_and_return_to_usermode, after the stack is
switched to the trampoline stack, the existing UNWIND_HINT_REGS hint is
no longer valid, which can result in the following ORC unwinder warning:

WARNING: can't dereference registers at 000000003aeb0cdd for ip swapgs_restore_regs_and_return_to_usermode+0x93/0xa0

For full correctness, we could try to add complicated unwind hints so
the unwinder could continue to find the registers, but when when it's
this close to kernel exit, unwind hints aren't really needed anymore and
it's fine to just use an empty hint which tells the unwinder to stop.

For consistency, also move the UNWIND_HINT_EMPTY in
entry_SYSCALL_64_after_hwframe to a similar location.

Fixes: 3e3b9293d392 ("x86/entry/64: Return to userspace from the trampoline stack")
Reported-by: Vince Weaver <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: "Dr. David Alan Gilbert" <[email protected]>
Reported-by: Joe Mario <[email protected]>
Reported-by: Jann Horn <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/entry/entry_64.S | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0e9504fabe52..6b0d679efd6b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -249,7 +249,6 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
*/
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
- UNWIND_HINT_EMPTY
POP_REGS pop_rdi=0 skip_r11rcx=1

/*
@@ -258,6 +257,7 @@ syscall_return_via_sysret:
*/
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
+ UNWIND_HINT_EMPTY

pushq RSP-RDI(%rdi) /* RSP */
pushq (%rdi) /* RDI */
@@ -637,6 +637,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
*/
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
+ UNWIND_HINT_EMPTY

/* Copy the IRET frame to the trampoline stack. */
pushq 6*8(%rdi) /* SS */
--
2.21.1

2020-04-25 10:05:22

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 02/11] x86/entry/64: Fix unwind hints in register clearing code

The PUSH_AND_CLEAR_REGS macro zeroes each register immediately after
pushing it. If an NMI or exception hits after a register is cleared,
but before the UNWIND_HINT_REGS annotation, the ORC unwinder will
wrongly think the previous value of the register was zero. This can
confuse the unwinding process and cause it to exit early.

Because ORC is simpler than DWARF, there are a limited number of unwind
annotation states, so it's not possible to add an individual unwind hint
after each push/clear combination. Instead, the register clearing
instructions need to be consolidated and moved to after the
UNWIND_HINT_REGS annotation.

Fixes: 3f01daecd545 ("x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro")
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/entry/calling.h | 40 +++++++++++++++++++++-------------------
1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0789e13ece90..1c7f13bb6728 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -98,13 +98,6 @@ For 32-bit we have the following conventions - kernel is built with
#define SIZEOF_PTREGS 21*8

.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
- /*
- * Push registers and sanitize registers of values that a
- * speculation attack might otherwise want to exploit. The
- * lower registers are likely clobbered well before they
- * could be put to use in a speculative execution gadget.
- * Interleave XOR with PUSH for better uop scheduling:
- */
.if \save_ret
pushq %rsi /* pt_regs->si */
movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */
@@ -114,34 +107,43 @@ For 32-bit we have the following conventions - kernel is built with
pushq %rsi /* pt_regs->si */
.endif
pushq \rdx /* pt_regs->dx */
- xorl %edx, %edx /* nospec dx */
pushq %rcx /* pt_regs->cx */
- xorl %ecx, %ecx /* nospec cx */
pushq \rax /* pt_regs->ax */
pushq %r8 /* pt_regs->r8 */
- xorl %r8d, %r8d /* nospec r8 */
pushq %r9 /* pt_regs->r9 */
- xorl %r9d, %r9d /* nospec r9 */
pushq %r10 /* pt_regs->r10 */
- xorl %r10d, %r10d /* nospec r10 */
pushq %r11 /* pt_regs->r11 */
- xorl %r11d, %r11d /* nospec r11*/
pushq %rbx /* pt_regs->rbx */
- xorl %ebx, %ebx /* nospec rbx*/
pushq %rbp /* pt_regs->rbp */
- xorl %ebp, %ebp /* nospec rbp*/
pushq %r12 /* pt_regs->r12 */
- xorl %r12d, %r12d /* nospec r12*/
pushq %r13 /* pt_regs->r13 */
- xorl %r13d, %r13d /* nospec r13*/
pushq %r14 /* pt_regs->r14 */
- xorl %r14d, %r14d /* nospec r14*/
pushq %r15 /* pt_regs->r15 */
- xorl %r15d, %r15d /* nospec r15*/
UNWIND_HINT_REGS
+
.if \save_ret
pushq %rsi /* return address on top of stack */
.endif
+
+ /*
+ * Sanitize registers of values that a speculation attack might
+ * otherwise want to exploit. The lower registers are likely clobbered
+ * well before they could be put to use in a speculative execution
+ * gadget.
+ */
+ xorl %edx, %edx /* nospec dx */
+ xorl %ecx, %ecx /* nospec cx */
+ xorl %r8d, %r8d /* nospec r8 */
+ xorl %r9d, %r9d /* nospec r9 */
+ xorl %r10d, %r10d /* nospec r10 */
+ xorl %r11d, %r11d /* nospec r11 */
+ xorl %ebx, %ebx /* nospec rbx */
+ xorl %ebp, %ebp /* nospec rbp */
+ xorl %r12d, %r12d /* nospec r12 */
+ xorl %r13d, %r13d /* nospec r13 */
+ xorl %r14d, %r14d /* nospec r14 */
+ xorl %r15d, %r15d /* nospec r15 */
+
.endm

.macro POP_REGS pop_rdi=1 skip_r11rcx=0
--
2.21.1

2020-04-25 10:05:25

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 04/11] x86/entry/64: Fix unwind hints in __switch_to_asm()

UNWIND_HINT_FUNC has some limitations: specifically, it doesn't reset
all the registers to undefined. This causes objtool to get confused
about the RBP push in __switch_to_asm(), resulting in bad ORC data.

While __switch_to_asm() does do some stack magic, it's otherwise a
normal callable-from-C function, so just annotate it as a function,
which makes objtool happy and allows it to produces the correct hints
automatically.

Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations")
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/entry/entry_64.S | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 6b0d679efd6b..34a588950fe1 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -279,8 +279,7 @@ SYM_CODE_END(entry_SYSCALL_64)
* %rdi: prev task
* %rsi: next task
*/
-SYM_CODE_START(__switch_to_asm)
- UNWIND_HINT_FUNC
+SYM_FUNC_START(__switch_to_asm)
/*
* Save callee-saved registers
* This must match the order in inactive_task_frame
@@ -321,7 +320,7 @@ SYM_CODE_START(__switch_to_asm)
popq %rbp

jmp __switch_to
-SYM_CODE_END(__switch_to_asm)
+SYM_FUNC_END(__switch_to_asm)

/*
* A newly forked process directly context switches into this address.
--
2.21.1

2020-04-25 10:05:29

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 05/11] x86/entry/64: Fix unwind hints in rewind_stack_do_exit()

From: Jann Horn <[email protected]>

The leaq instruction in rewind_stack_do_exit moves the stack pointer
directly below the pt_regs at the top of the task stack before calling
do_exit(). Tell the unwinder to expect pt_regs.

Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations")
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/entry/entry_64.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 34a588950fe1..9fe0d5cad8e4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1739,7 +1739,7 @@ SYM_CODE_START(rewind_stack_do_exit)

movq PER_CPU_VAR(cpu_current_top_of_stack), %rax
leaq -PTREGS_SIZE(%rax), %rsp
- UNWIND_HINT_FUNC sp_offset=PTREGS_SIZE
+ UNWIND_HINT_REGS

call do_exit
SYM_CODE_END(rewind_stack_do_exit)
--
2.21.1

2020-04-25 10:05:31

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 08/11] x86/unwind/orc: Don't skip the first frame for inactive tasks

From: Miroslav Benes <[email protected]>

When unwinding an inactive task, the ORC unwinder skips the first frame
by default. If both the 'regs' and 'first_frame' parameters of
unwind_start() are NULL, 'state->sp' and 'first_frame' are later
initialized to the same value for an inactive task. Given there is a
"less than or equal to" comparison used at the end of __unwind_start()
for skipping stack frames, the first frame is skipped.

Drop the equal part of the comparison and make the behavior equivalent
to the frame pointer unwinder.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 45166fd50be3..e9f5a20c69c6 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -657,7 +657,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
/* Otherwise, skip ahead to the user-specified starting frame: */
while (!unwind_done(state) &&
(!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
- state->sp <= (unsigned long)first_frame))
+ state->sp < (unsigned long)first_frame))
unwind_next_frame(state);

return;
--
2.21.1

2020-04-25 10:05:41

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 06/11] x86/unwind/orc: Convert global variables to static

These variables aren't used outside of unwind_orc.c, make them static.

Also annotate some of them with '__ro_after_init', as applicable.

Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9cc182aa97e..64889da666f4 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -15,12 +15,12 @@ extern int __stop_orc_unwind_ip[];
extern struct orc_entry __start_orc_unwind[];
extern struct orc_entry __stop_orc_unwind[];

-static DEFINE_MUTEX(sort_mutex);
-int *cur_orc_ip_table = __start_orc_unwind_ip;
-struct orc_entry *cur_orc_table = __start_orc_unwind;
+static bool orc_init __ro_after_init;
+static unsigned int lookup_num_blocks __ro_after_init;

-unsigned int lookup_num_blocks;
-bool orc_init;
+static DEFINE_MUTEX(sort_mutex);
+static int *cur_orc_ip_table = __start_orc_unwind_ip;
+static struct orc_entry *cur_orc_table = __start_orc_unwind;

static inline unsigned long orc_ip(const int *ip)
{
--
2.21.1

2020-04-25 10:06:09

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 09/11] x86/unwind/orc: Prevent unwinding before ORC initialization

If the unwinder is called before the ORC data has been initialized,
orc_find() returns NULL, and it tries to fall back to using frame
pointers. This can cause some unexpected warnings during boot.

Move the 'orc_init' check from orc_find() to __unwind_init(), so that it
doesn't even try to unwind from an uninitialized state.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9f5a20c69c6..cb11567361cc 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -148,9 +148,6 @@ static struct orc_entry *orc_find(unsigned long ip)
{
static struct orc_entry *orc;

- if (!orc_init)
- return NULL;
-
if (ip == 0)
return &null_orc_entry;

@@ -591,6 +588,9 @@ EXPORT_SYMBOL_GPL(unwind_next_frame);
void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long *first_frame)
{
+ if (!orc_init)
+ goto done;
+
memset(state, 0, sizeof(*state));
state->task = task;

--
2.21.1

2020-04-25 10:07:09

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 07/11] x86/unwind: Prevent false warnings for non-current tasks

There's some daring kernel code out there which dumps the stack of
another task without first making sure the task is inactive. If the
task happens to be running while the unwinder is reading the stack,
unusual unwinder warnings can result.

There's no race-free way for the unwinder to know whether such a warning
is legitimate, so just disable unwinder warnings for all non-current
tasks.

Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/kernel/dumpstack_64.c | 3 ++-
arch/x86/kernel/unwind_frame.c | 3 +++
arch/x86/kernel/unwind_orc.c | 40 +++++++++++++++++++---------------
3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 87b97897a881..460ae7f66818 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -183,7 +183,8 @@ int get_stack_info(unsigned long *stack, struct task_struct *task,
*/
if (visit_mask) {
if (*visit_mask & (1UL << info->type)) {
- printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n", info->type);
+ if (task == current)
+ printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n", info->type);
goto unknown;
}
*visit_mask |= 1UL << info->type;
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index a224b5ab103f..54226110bc7f 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -344,6 +344,9 @@ bool unwind_next_frame(struct unwind_state *state)
if (IS_ENABLED(CONFIG_X86_32))
goto the_end;

+ if (state->task != current)
+ goto the_end;
+
if (state->regs) {
printk_deferred_once(KERN_WARNING
"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 64889da666f4..45166fd50be3 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -8,7 +8,13 @@
#include <asm/orc_lookup.h>

#define orc_warn(fmt, ...) \
- printk_deferred_once(KERN_WARNING pr_fmt("WARNING: " fmt), ##__VA_ARGS__)
+ printk_deferred_once(KERN_WARNING "WARNING: " fmt, ##__VA_ARGS__)
+
+#define orc_warn_current(args...) \
+({ \
+ if (state->task == current) \
+ orc_warn(args); \
+})

extern int __start_orc_unwind_ip[];
extern int __stop_orc_unwind_ip[];
@@ -446,8 +452,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_R10:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg R10 at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing R10 value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->r10;
@@ -455,8 +461,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_R13:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg R13 at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing R13 value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->r13;
@@ -464,8 +470,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_DI:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg DI at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing RDI value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->di;
@@ -473,15 +479,15 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_DX:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg DX at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing DX value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->dx;
break;

default:
- orc_warn("unknown SP base reg %d for ip %pB\n",
+ orc_warn("unknown SP base reg %d at %pB\n",
orc->sp_reg, (void *)state->ip);
goto err;
}
@@ -509,8 +515,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_TYPE_REGS:
if (!deref_stack_regs(state, sp, &state->ip, &state->sp)) {
- orc_warn("can't dereference registers at %p for ip %pB\n",
- (void *)sp, (void *)orig_ip);
+ orc_warn_current("can't access registers at %pB\n",
+ (void *)orig_ip);
goto err;
}

@@ -521,8 +527,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_TYPE_REGS_IRET:
if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
- orc_warn("can't dereference iret registers at %p for ip %pB\n",
- (void *)sp, (void *)orig_ip);
+ orc_warn_current("can't access iret registers at %pB\n",
+ (void *)orig_ip);
goto err;
}

@@ -532,7 +538,7 @@ bool unwind_next_frame(struct unwind_state *state)
break;

default:
- orc_warn("unknown .orc_unwind entry type %d for ip %pB\n",
+ orc_warn("unknown .orc_unwind entry type %d at %pB\n",
orc->type, (void *)orig_ip);
break;
}
@@ -564,8 +570,8 @@ bool unwind_next_frame(struct unwind_state *state)
if (state->stack_info.type == prev_type &&
on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) &&
state->sp <= prev_sp) {
- orc_warn("stack going in the wrong direction? ip=%pB\n",
- (void *)orig_ip);
+ orc_warn_current("stack going in the wrong direction? at %pB\n",
+ (void *)orig_ip);
goto err;
}

--
2.21.1

2020-04-25 10:11:06

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 10/11] x86/unwind/orc: Fix error path for bad ORC entry type

If the ORC entry type is unknown, nothing else can be done other than
reporting an error. Exit the function instead of breaking out of the
switch statement.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index cb11567361cc..33b80a7f998f 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -537,7 +537,7 @@ bool unwind_next_frame(struct unwind_state *state)
default:
orc_warn("unknown .orc_unwind entry type %d at %pB\n",
orc->type, (void *)orig_ip);
- break;
+ goto err;
}

/* Find BP: */
--
2.21.1

2020-04-25 10:11:06

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH v2 11/11] x86/unwind/orc: Fix premature unwind stoppage due to IRET frames

The following execution path is possible:

fsnotify()
[ realign the stack and store previous SP in R10 ]
<IRQ>
[ only IRET regs saved ]
common_interrupt()
interrupt_entry()
<NMI>
[ full pt_regs saved ]
...
[ unwind stack ]

When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers. So the unwind stops
prematurely.

However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.

Handle this case properly. When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.

Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers. For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between. So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
---
arch/x86/entry/entry_64.S | 4 +--
arch/x86/include/asm/unwind.h | 2 +-
arch/x86/kernel/unwind_orc.c | 51 +++++++++++++++++++++++++++--------
3 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9fe0d5cad8e4..3063aa9090f9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -511,7 +511,7 @@ SYM_CODE_END(spurious_entries_start)
* +----------------------------------------------------+
*/
SYM_CODE_START(interrupt_entry)
- UNWIND_HINT_FUNC
+ UNWIND_HINT_IRET_REGS offset=16
ASM_CLAC
cld

@@ -543,9 +543,9 @@ SYM_CODE_START(interrupt_entry)
pushq 5*8(%rdi) /* regs->eflags */
pushq 4*8(%rdi) /* regs->cs */
pushq 3*8(%rdi) /* regs->ip */
+ UNWIND_HINT_IRET_REGS
pushq 2*8(%rdi) /* regs->orig_ax */
pushq 8(%rdi) /* return address */
- UNWIND_HINT_FUNC

movq (%rdi), %rdi
jmp 2f
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 499578f7e6d7..70fc159ebe69 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -19,7 +19,7 @@ struct unwind_state {
#if defined(CONFIG_UNWINDER_ORC)
bool signal, full_regs;
unsigned long sp, bp, ip;
- struct pt_regs *regs;
+ struct pt_regs *regs, *prev_regs;
#elif defined(CONFIG_UNWINDER_FRAME_POINTER)
bool got_irq;
unsigned long *bp, *orig_sp, ip;
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 33b80a7f998f..0ebc11a8bb45 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -384,9 +384,38 @@ static bool deref_stack_iret_regs(struct unwind_state *state, unsigned long addr
return true;
}

+/*
+ * If state->regs is non-NULL, and points to a full pt_regs, just get the reg
+ * value from state->regs.
+ *
+ * Otherwise, if state->regs just points to IRET regs, and the previous frame
+ * had full regs, it's safe to get the value from the previous regs. This can
+ * happen when early/late IRQ entry code gets interrupted by an NMI.
+ */
+static bool get_reg(struct unwind_state *state, unsigned int reg_off,
+ unsigned long *val)
+{
+ unsigned int reg = reg_off/8;
+
+ if (!state->regs)
+ return false;
+
+ if (state->full_regs) {
+ *val = ((unsigned long *)state->regs)[reg];
+ return true;
+ }
+
+ if (state->prev_regs) {
+ *val = ((unsigned long *)state->prev_regs)[reg];
+ return true;
+ }
+
+ return false;
+}
+
bool unwind_next_frame(struct unwind_state *state)
{
- unsigned long ip_p, sp, orig_ip = state->ip, prev_sp = state->sp;
+ unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
enum stack_type prev_type = state->stack_info.type;
struct orc_entry *orc;
bool indirect = false;
@@ -448,39 +477,35 @@ bool unwind_next_frame(struct unwind_state *state)
break;

case ORC_REG_R10:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, r10), &sp)) {
orc_warn_current("missing R10 value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->r10;
break;

case ORC_REG_R13:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, r13), &sp)) {
orc_warn_current("missing R13 value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->r13;
break;

case ORC_REG_DI:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, di), &sp)) {
orc_warn_current("missing RDI value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->di;
break;

case ORC_REG_DX:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, dx), &sp)) {
orc_warn_current("missing DX value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->dx;
break;

default:
@@ -507,6 +532,7 @@ bool unwind_next_frame(struct unwind_state *state)

state->sp = sp;
state->regs = NULL;
+ state->prev_regs = NULL;
state->signal = false;
break;

@@ -518,6 +544,7 @@ bool unwind_next_frame(struct unwind_state *state)
}

state->regs = (struct pt_regs *)sp;
+ state->prev_regs = NULL;
state->full_regs = true;
state->signal = true;
break;
@@ -529,6 +556,8 @@ bool unwind_next_frame(struct unwind_state *state)
goto err;
}

+ if (state->full_regs)
+ state->prev_regs = state->regs;
state->regs = (void *)sp - IRET_FRAME_OFFSET;
state->full_regs = false;
state->signal = true;
@@ -543,8 +572,8 @@ bool unwind_next_frame(struct unwind_state *state)
/* Find BP: */
switch (orc->bp_reg) {
case ORC_REG_UNDEFINED:
- if (state->regs && state->full_regs)
- state->bp = state->regs->bp;
+ if (get_reg(state, offsetof(struct pt_regs, bp), &tmp))
+ state->bp = tmp;
break;

case ORC_REG_PREV_SP:
--
2.21.1

2020-04-25 10:15:14

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 09/11] x86/unwind/orc: Prevent unwinding before ORC initialization

On Sat, Apr 25, 2020 at 05:03:08AM -0500, Josh Poimboeuf wrote:
> If the unwinder is called before the ORC data has been initialized,
> orc_find() returns NULL, and it tries to fall back to using frame
> pointers. This can cause some unexpected warnings during boot.
>
> Move the 'orc_init' check from orc_find() to __unwind_init(), so that it
> doesn't even try to unwind from an uninitialized state.
>
> Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
> Signed-off-by: Josh Poimboeuf <[email protected]>
> Reviewed-by: Miroslav Benes <[email protected]>

I got a weird error when sending this one:

4.7.1 Error: too much mail from 10.10.114.29

If anybody didn't get it, let me know and I can bounce it.

> ---
> arch/x86/kernel/unwind_orc.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index e9f5a20c69c6..cb11567361cc 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -148,9 +148,6 @@ static struct orc_entry *orc_find(unsigned long ip)
> {
> static struct orc_entry *orc;
>
> - if (!orc_init)
> - return NULL;
> -
> if (ip == 0)
> return &null_orc_entry;
>
> @@ -591,6 +588,9 @@ EXPORT_SYMBOL_GPL(unwind_next_frame);
> void __unwind_start(struct unwind_state *state, struct task_struct *task,
> struct pt_regs *regs, unsigned long *first_frame)
> {
> + if (!orc_init)
> + goto done;
> +
> memset(state, 0, sizeof(*state));
> state->task = task;
>
> --
> 2.21.1
>

--
Josh

2020-04-25 10:28:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] ORC fixes


* Josh Poimboeuf <[email protected]> wrote:

> v2:
> - Dropped three patches which technically weren't fixes. Will post them
> later as part of another patch set with more improvements.
> - Removed show_iret_regs() declaration [mbenes]
> - Added Miroslav Reviewed-by, Linus Reported-by
>
> v1 was here:
> https://lkml.kernel.org/r/[email protected]
>
> Jann Horn (1):
> x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
>
> Josh Poimboeuf (9):
> objtool: Fix stack offset tracking for indirect CFAs
> x86/entry/64: Fix unwind hints in register clearing code
> x86/entry/64: Fix unwind hints in kernel exit path
> x86/entry/64: Fix unwind hints in __switch_to_asm()
> x86/unwind/orc: Convert global variables to static
> x86/unwind: Prevent false warnings for non-current tasks
> x86/unwind/orc: Prevent unwinding before ORC initialization
> x86/unwind/orc: Fix error path for bad ORC entry type
> x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
>
> Miroslav Benes (1):
> x86/unwind/orc: Don't skip the first frame for inactive tasks

Thanks for doing this. These ORC handling bugs IMHO look serious and
widespread enough to warrant x86/urgent treatment, and the v2 series is
fixes-only.

Any objections against targeting v5.7-rc3 with this, assuming that
there's no problems found during review and it passes about a week of
testing?

Thanks,

Ingo

2020-04-25 10:43:41

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] ORC fixes

On Sat, Apr 25, 2020 at 12:25:12PM +0200, Ingo Molnar wrote:
>
> * Josh Poimboeuf <[email protected]> wrote:
>
> > v2:
> > - Dropped three patches which technically weren't fixes. Will post them
> > later as part of another patch set with more improvements.
> > - Removed show_iret_regs() declaration [mbenes]
> > - Added Miroslav Reviewed-by, Linus Reported-by
> >
> > v1 was here:
> > https://lkml.kernel.org/r/[email protected]
> >
> > Jann Horn (1):
> > x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
> >
> > Josh Poimboeuf (9):
> > objtool: Fix stack offset tracking for indirect CFAs
> > x86/entry/64: Fix unwind hints in register clearing code
> > x86/entry/64: Fix unwind hints in kernel exit path
> > x86/entry/64: Fix unwind hints in __switch_to_asm()
> > x86/unwind/orc: Convert global variables to static
> > x86/unwind: Prevent false warnings for non-current tasks
> > x86/unwind/orc: Prevent unwinding before ORC initialization
> > x86/unwind/orc: Fix error path for bad ORC entry type
> > x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
> >
> > Miroslav Benes (1):
> > x86/unwind/orc: Don't skip the first frame for inactive tasks
>
> Thanks for doing this. These ORC handling bugs IMHO look serious and
> widespread enough to warrant x86/urgent treatment, and the v2 series is
> fixes-only.
>
> Any objections against targeting v5.7-rc3 with this, assuming that
> there's no problems found during review and it passes about a week of
> testing?

Hi Ingo,

Due to other distractions, I unfortunately have been sitting on some of
these fixes for several months -- notice some of the long Reported-by
chains :-/

They're good small localized fixes and I would agree it makes sense to
target x86/urgent.

Thanks!

--
Josh

Subject: [tip: x86/urgent] x86/unwind/orc: Prevent unwinding before ORC initialization

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 98d0c8ebf77e0ba7c54a9ae05ea588f0e9e3f46e
Gitweb: https://git.kernel.org/tip/98d0c8ebf77e0ba7c54a9ae05ea588f0e9e3f46e
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:08 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:29 +02:00

x86/unwind/orc: Prevent unwinding before ORC initialization

If the unwinder is called before the ORC data has been initialized,
orc_find() returns NULL, and it tries to fall back to using frame
pointers. This can cause some unexpected warnings during boot.

Move the 'orc_init' check from orc_find() to __unwind_init(), so that it
doesn't even try to unwind from an uninitialized state.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
---
arch/x86/kernel/unwind_orc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9f5a20..cb11567 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -148,9 +148,6 @@ static struct orc_entry *orc_find(unsigned long ip)
{
static struct orc_entry *orc;

- if (!orc_init)
- return NULL;
-
if (ip == 0)
return &null_orc_entry;

@@ -591,6 +588,9 @@ EXPORT_SYMBOL_GPL(unwind_next_frame);
void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long *first_frame)
{
+ if (!orc_init)
+ goto done;
+
memset(state, 0, sizeof(*state));
state->task = task;

Subject: [tip: x86/urgent] x86/unwind/orc: Convert global variables to static

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 153eb2223c794251b28400f3f74862e090d23f16
Gitweb: https://git.kernel.org/tip/153eb2223c794251b28400f3f74862e090d23f16
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:05 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:28 +02:00

x86/unwind/orc: Convert global variables to static

These variables aren't used outside of unwind_orc.c, make them static.

Also annotate some of them with '__ro_after_init', as applicable.

Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/43ae310bf7822b9862e571f36ae3474cfde8f301.1587808742.git.jpoimboe@redhat.com
---
arch/x86/kernel/unwind_orc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9cc182..64889da 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -15,12 +15,12 @@ extern int __stop_orc_unwind_ip[];
extern struct orc_entry __start_orc_unwind[];
extern struct orc_entry __stop_orc_unwind[];

-static DEFINE_MUTEX(sort_mutex);
-int *cur_orc_ip_table = __start_orc_unwind_ip;
-struct orc_entry *cur_orc_table = __start_orc_unwind;
+static bool orc_init __ro_after_init;
+static unsigned int lookup_num_blocks __ro_after_init;

-unsigned int lookup_num_blocks;
-bool orc_init;
+static DEFINE_MUTEX(sort_mutex);
+static int *cur_orc_ip_table = __start_orc_unwind_ip;
+static struct orc_entry *cur_orc_table = __start_orc_unwind;

static inline unsigned long orc_ip(const int *ip)
{

Subject: [tip: x86/urgent] x86/unwind: Prevent false warnings for non-current tasks

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: b08418b54831255a7e3700d6bf7dfc2bdae25cd7
Gitweb: https://git.kernel.org/tip/b08418b54831255a7e3700d6bf7dfc2bdae25cd7
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:06 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:28 +02:00

x86/unwind: Prevent false warnings for non-current tasks

There's some daring kernel code out there which dumps the stack of
another task without first making sure the task is inactive. If the
task happens to be running while the unwinder is reading the stack,
unusual unwinder warnings can result.

There's no race-free way for the unwinder to know whether such a warning
is legitimate, so just disable unwinder warnings for all non-current
tasks.

Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
---
arch/x86/kernel/dumpstack_64.c | 3 +-
arch/x86/kernel/unwind_frame.c | 3 ++-
arch/x86/kernel/unwind_orc.c | 40 ++++++++++++++++++---------------
3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 87b9789..460ae7f 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -183,7 +183,8 @@ recursion_check:
*/
if (visit_mask) {
if (*visit_mask & (1UL << info->type)) {
- printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n", info->type);
+ if (task == current)
+ printk_deferred_once(KERN_WARNING "WARNING: stack recursion on stack type %d\n", info->type);
goto unknown;
}
*visit_mask |= 1UL << info->type;
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index a224b5a..5422611 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -344,6 +344,9 @@ bad_address:
if (IS_ENABLED(CONFIG_X86_32))
goto the_end;

+ if (state->task != current)
+ goto the_end;
+
if (state->regs) {
printk_deferred_once(KERN_WARNING
"WARNING: kernel stack regs at %p in %s:%d has bad 'bp' value %p\n",
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 64889da..45166fd 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -8,7 +8,13 @@
#include <asm/orc_lookup.h>

#define orc_warn(fmt, ...) \
- printk_deferred_once(KERN_WARNING pr_fmt("WARNING: " fmt), ##__VA_ARGS__)
+ printk_deferred_once(KERN_WARNING "WARNING: " fmt, ##__VA_ARGS__)
+
+#define orc_warn_current(args...) \
+({ \
+ if (state->task == current) \
+ orc_warn(args); \
+})

extern int __start_orc_unwind_ip[];
extern int __stop_orc_unwind_ip[];
@@ -446,8 +452,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_R10:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg R10 at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing R10 value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->r10;
@@ -455,8 +461,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_R13:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg R13 at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing R13 value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->r13;
@@ -464,8 +470,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_DI:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg DI at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing RDI value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->di;
@@ -473,15 +479,15 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_REG_DX:
if (!state->regs || !state->full_regs) {
- orc_warn("missing regs for base reg DX at ip %pB\n",
- (void *)state->ip);
+ orc_warn_current("missing DX value at %pB\n",
+ (void *)state->ip);
goto err;
}
sp = state->regs->dx;
break;

default:
- orc_warn("unknown SP base reg %d for ip %pB\n",
+ orc_warn("unknown SP base reg %d at %pB\n",
orc->sp_reg, (void *)state->ip);
goto err;
}
@@ -509,8 +515,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_TYPE_REGS:
if (!deref_stack_regs(state, sp, &state->ip, &state->sp)) {
- orc_warn("can't dereference registers at %p for ip %pB\n",
- (void *)sp, (void *)orig_ip);
+ orc_warn_current("can't access registers at %pB\n",
+ (void *)orig_ip);
goto err;
}

@@ -521,8 +527,8 @@ bool unwind_next_frame(struct unwind_state *state)

case ORC_TYPE_REGS_IRET:
if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
- orc_warn("can't dereference iret registers at %p for ip %pB\n",
- (void *)sp, (void *)orig_ip);
+ orc_warn_current("can't access iret registers at %pB\n",
+ (void *)orig_ip);
goto err;
}

@@ -532,7 +538,7 @@ bool unwind_next_frame(struct unwind_state *state)
break;

default:
- orc_warn("unknown .orc_unwind entry type %d for ip %pB\n",
+ orc_warn("unknown .orc_unwind entry type %d at %pB\n",
orc->type, (void *)orig_ip);
break;
}
@@ -564,8 +570,8 @@ bool unwind_next_frame(struct unwind_state *state)
if (state->stack_info.type == prev_type &&
on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) &&
state->sp <= prev_sp) {
- orc_warn("stack going in the wrong direction? ip=%pB\n",
- (void *)orig_ip);
+ orc_warn_current("stack going in the wrong direction? at %pB\n",
+ (void *)orig_ip);
goto err;
}

Subject: [tip: x86/urgent] x86/entry/64: Fix unwind hints in kernel exit path

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 1fb143634a38095b641a3a21220774799772dc4c
Gitweb: https://git.kernel.org/tip/1fb143634a38095b641a3a21220774799772dc4c
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:02 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:27 +02:00

x86/entry/64: Fix unwind hints in kernel exit path

In swapgs_restore_regs_and_return_to_usermode, after the stack is
switched to the trampoline stack, the existing UNWIND_HINT_REGS hint is
no longer valid, which can result in the following ORC unwinder warning:

WARNING: can't dereference registers at 000000003aeb0cdd for ip swapgs_restore_regs_and_return_to_usermode+0x93/0xa0

For full correctness, we could try to add complicated unwind hints so
the unwinder could continue to find the registers, but when when it's
this close to kernel exit, unwind hints aren't really needed anymore and
it's fine to just use an empty hint which tells the unwinder to stop.

For consistency, also move the UNWIND_HINT_EMPTY in
entry_SYSCALL_64_after_hwframe to a similar location.

Fixes: 3e3b9293d392 ("x86/entry/64: Return to userspace from the trampoline stack")
Reported-by: Vince Weaver <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: Dr. David Alan Gilbert <[email protected]>
Reported-by: Joe Mario <[email protected]>
Reported-by: Jann Horn <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/60ea8f562987ed2d9ace2977502fe481c0d7c9a0.1587808742.git.jpoimboe@redhat.com
---
arch/x86/entry/entry_64.S | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0e9504f..6b0d679 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -249,7 +249,6 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
*/
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
- UNWIND_HINT_EMPTY
POP_REGS pop_rdi=0 skip_r11rcx=1

/*
@@ -258,6 +257,7 @@ syscall_return_via_sysret:
*/
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
+ UNWIND_HINT_EMPTY

pushq RSP-RDI(%rdi) /* RSP */
pushq (%rdi) /* RDI */
@@ -637,6 +637,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
*/
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
+ UNWIND_HINT_EMPTY

/* Copy the IRET frame to the trampoline stack. */
pushq 6*8(%rdi) /* SS */

Subject: [tip: x86/urgent] x86/entry/64: Fix unwind hints in __switch_to_asm()

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 96c64806b4bf35f5edb465cafa6cec490e424a30
Gitweb: https://git.kernel.org/tip/96c64806b4bf35f5edb465cafa6cec490e424a30
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:03 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:28 +02:00

x86/entry/64: Fix unwind hints in __switch_to_asm()

UNWIND_HINT_FUNC has some limitations: specifically, it doesn't reset
all the registers to undefined. This causes objtool to get confused
about the RBP push in __switch_to_asm(), resulting in bad ORC data.

While __switch_to_asm() does do some stack magic, it's otherwise a
normal callable-from-C function, so just annotate it as a function,
which makes objtool happy and allows it to produces the correct hints
automatically.

Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/03d0411920d10f7418f2e909210d8e9a3b2ab081.1587808742.git.jpoimboe@redhat.com
---
arch/x86/entry/entry_64.S | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 6b0d679..34a5889 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -279,8 +279,7 @@ SYM_CODE_END(entry_SYSCALL_64)
* %rdi: prev task
* %rsi: next task
*/
-SYM_CODE_START(__switch_to_asm)
- UNWIND_HINT_FUNC
+SYM_FUNC_START(__switch_to_asm)
/*
* Save callee-saved registers
* This must match the order in inactive_task_frame
@@ -321,7 +320,7 @@ SYM_CODE_START(__switch_to_asm)
popq %rbp

jmp __switch_to
-SYM_CODE_END(__switch_to_asm)
+SYM_FUNC_END(__switch_to_asm)

/*
* A newly forked process directly context switches into this address.

Subject: [tip: x86/urgent] x86/entry/64: Fix unwind hints in register clearing code

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 06a9750edcffa808494d56da939085c35904e618
Gitweb: https://git.kernel.org/tip/06a9750edcffa808494d56da939085c35904e618
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:01 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:27 +02:00

x86/entry/64: Fix unwind hints in register clearing code

The PUSH_AND_CLEAR_REGS macro zeroes each register immediately after
pushing it. If an NMI or exception hits after a register is cleared,
but before the UNWIND_HINT_REGS annotation, the ORC unwinder will
wrongly think the previous value of the register was zero. This can
confuse the unwinding process and cause it to exit early.

Because ORC is simpler than DWARF, there are a limited number of unwind
annotation states, so it's not possible to add an individual unwind hint
after each push/clear combination. Instead, the register clearing
instructions need to be consolidated and moved to after the
UNWIND_HINT_REGS annotation.

Fixes: 3f01daecd545 ("x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/68fd3d0bc92ae2d62ff7879d15d3684217d51f08.1587808742.git.jpoimboe@redhat.com
---
arch/x86/entry/calling.h | 40 ++++++++++++++++++++-------------------
1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0789e13..1c7f13b 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -98,13 +98,6 @@ For 32-bit we have the following conventions - kernel is built with
#define SIZEOF_PTREGS 21*8

.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
- /*
- * Push registers and sanitize registers of values that a
- * speculation attack might otherwise want to exploit. The
- * lower registers are likely clobbered well before they
- * could be put to use in a speculative execution gadget.
- * Interleave XOR with PUSH for better uop scheduling:
- */
.if \save_ret
pushq %rsi /* pt_regs->si */
movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */
@@ -114,34 +107,43 @@ For 32-bit we have the following conventions - kernel is built with
pushq %rsi /* pt_regs->si */
.endif
pushq \rdx /* pt_regs->dx */
- xorl %edx, %edx /* nospec dx */
pushq %rcx /* pt_regs->cx */
- xorl %ecx, %ecx /* nospec cx */
pushq \rax /* pt_regs->ax */
pushq %r8 /* pt_regs->r8 */
- xorl %r8d, %r8d /* nospec r8 */
pushq %r9 /* pt_regs->r9 */
- xorl %r9d, %r9d /* nospec r9 */
pushq %r10 /* pt_regs->r10 */
- xorl %r10d, %r10d /* nospec r10 */
pushq %r11 /* pt_regs->r11 */
- xorl %r11d, %r11d /* nospec r11*/
pushq %rbx /* pt_regs->rbx */
- xorl %ebx, %ebx /* nospec rbx*/
pushq %rbp /* pt_regs->rbp */
- xorl %ebp, %ebp /* nospec rbp*/
pushq %r12 /* pt_regs->r12 */
- xorl %r12d, %r12d /* nospec r12*/
pushq %r13 /* pt_regs->r13 */
- xorl %r13d, %r13d /* nospec r13*/
pushq %r14 /* pt_regs->r14 */
- xorl %r14d, %r14d /* nospec r14*/
pushq %r15 /* pt_regs->r15 */
- xorl %r15d, %r15d /* nospec r15*/
UNWIND_HINT_REGS
+
.if \save_ret
pushq %rsi /* return address on top of stack */
.endif
+
+ /*
+ * Sanitize registers of values that a speculation attack might
+ * otherwise want to exploit. The lower registers are likely clobbered
+ * well before they could be put to use in a speculative execution
+ * gadget.
+ */
+ xorl %edx, %edx /* nospec dx */
+ xorl %ecx, %ecx /* nospec cx */
+ xorl %r8d, %r8d /* nospec r8 */
+ xorl %r9d, %r9d /* nospec r9 */
+ xorl %r10d, %r10d /* nospec r10 */
+ xorl %r11d, %r11d /* nospec r11 */
+ xorl %ebx, %ebx /* nospec rbx */
+ xorl %ebp, %ebp /* nospec rbp */
+ xorl %r12d, %r12d /* nospec r12 */
+ xorl %r13d, %r13d /* nospec r13 */
+ xorl %r14d, %r14d /* nospec r14 */
+ xorl %r15d, %r15d /* nospec r15 */
+
.endm

.macro POP_REGS pop_rdi=1 skip_r11rcx=0

Subject: [tip: x86/urgent] x86/entry/64: Fix unwind hints in rewind_stack_do_exit()

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: f977df7b7ca45a4ac4b66d30a8931d0434c394b1
Gitweb: https://git.kernel.org/tip/f977df7b7ca45a4ac4b66d30a8931d0434c394b1
Author: Jann Horn <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:04 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:28 +02:00

x86/entry/64: Fix unwind hints in rewind_stack_do_exit()

The LEAQ instruction in rewind_stack_do_exit() moves the stack pointer
directly below the pt_regs at the top of the task stack before calling
do_exit(). Tell the unwinder to expect pt_regs.

Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/68c33e17ae5963854916a46f522624f8e1d264f2.1587808742.git.jpoimboe@redhat.com
---
arch/x86/entry/entry_64.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 34a5889..9fe0d5c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1739,7 +1739,7 @@ SYM_CODE_START(rewind_stack_do_exit)

movq PER_CPU_VAR(cpu_current_top_of_stack), %rax
leaq -PTREGS_SIZE(%rax), %rsp
- UNWIND_HINT_FUNC sp_offset=PTREGS_SIZE
+ UNWIND_HINT_REGS

call do_exit
SYM_CODE_END(rewind_stack_do_exit)

Subject: [tip: x86/urgent] x86/unwind/orc: Fix error path for bad ORC entry type

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: a0f81bf26888048100bf017fadf438a5bdffa8d8
Gitweb: https://git.kernel.org/tip/a0f81bf26888048100bf017fadf438a5bdffa8d8
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:06:13 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:29 +02:00

x86/unwind/orc: Fix error path for bad ORC entry type

If the ORC entry type is unknown, nothing else can be done other than
reporting an error. Exit the function instead of breaking out of the
switch statement.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
---
arch/x86/kernel/unwind_orc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index cb11567..33b80a7 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -537,7 +537,7 @@ bool unwind_next_frame(struct unwind_state *state)
default:
orc_warn("unknown .orc_unwind entry type %d at %pB\n",
orc->type, (void *)orig_ip);
- break;
+ goto err;
}

/* Find BP: */

Subject: [tip: x86/urgent] x86/unwind/orc: Don't skip the first frame for inactive tasks

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: f1d9a2abff66aa8156fbc1493abed468db63ea48
Gitweb: https://git.kernel.org/tip/f1d9a2abff66aa8156fbc1493abed468db63ea48
Author: Miroslav Benes <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:07 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:29 +02:00

x86/unwind/orc: Don't skip the first frame for inactive tasks

When unwinding an inactive task, the ORC unwinder skips the first frame
by default. If both the 'regs' and 'first_frame' parameters of
unwind_start() are NULL, 'state->sp' and 'first_frame' are later
initialized to the same value for an inactive task. Given there is a
"less than or equal to" comparison used at the end of __unwind_start()
for skipping stack frames, the first frame is skipped.

Drop the equal part of the comparison and make the behavior equivalent
to the frame pointer unwinder.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
---
arch/x86/kernel/unwind_orc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 45166fd..e9f5a20 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -657,7 +657,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
/* Otherwise, skip ahead to the user-specified starting frame: */
while (!unwind_done(state) &&
(!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
- state->sp <= (unsigned long)first_frame))
+ state->sp < (unsigned long)first_frame))
unwind_next_frame(state);

return;

Subject: [tip: x86/urgent] objtool: Fix stack offset tracking for indirect CFAs

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: d8dd25a461e4eec7190cb9d66616aceacc5110ad
Gitweb: https://git.kernel.org/tip/d8dd25a461e4eec7190cb9d66616aceacc5110ad
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:03:00 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:27 +02:00

objtool: Fix stack offset tracking for indirect CFAs

When the current frame address (CFA) is stored on the stack (i.e.,
cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack
offset when there are subsequent pushes or pops. This results in bad
ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the
previous stack pointer on the stack and does a subsequent push.

This fixes the following unwinder warning:

WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0

Fixes: 627fce14809b ("objtool: Add ORC unwind table generation")
Reported-by: Vince Weaver <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: Steven Rostedt <[email protected]>
Reported-by: Vegard Nossum <[email protected]>
Reported-by: Joe Mario <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/853d5d691b29e250333332f09b8e27410b2d9924.1587808742.git.jpoimboe@redhat.com
---
tools/objtool/check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 4b170fd..e718464 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1449,7 +1449,7 @@ static int update_insn_state_regs(struct instruction *insn, struct insn_state *s
struct cfi_reg *cfa = &state->cfa;
struct stack_op *op = &insn->stack_op;

- if (cfa->base != CFI_SP)
+ if (cfa->base != CFI_SP && cfa->base != CFI_SP_INDIRECT)
return 0;

/* push */

Subject: [tip: x86/urgent] x86/unwind/orc: Fix premature unwind stoppage due to IRET frames

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 81b67439d147677d844d492fcbd03712ea438f42
Gitweb: https://git.kernel.org/tip/81b67439d147677d844d492fcbd03712ea438f42
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Sat, 25 Apr 2020 05:06:14 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 25 Apr 2020 12:22:29 +02:00

x86/unwind/orc: Fix premature unwind stoppage due to IRET frames

The following execution path is possible:

fsnotify()
[ realign the stack and store previous SP in R10 ]
<IRQ>
[ only IRET regs saved ]
common_interrupt()
interrupt_entry()
<NMI>
[ full pt_regs saved ]
...
[ unwind stack ]

When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers. So the unwind stops
prematurely.

However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.

Handle this case properly. When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.

Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers. For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between. So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
---
arch/x86/entry/entry_64.S | 4 +--
arch/x86/include/asm/unwind.h | 2 +-
arch/x86/kernel/unwind_orc.c | 51 ++++++++++++++++++++++++++--------
3 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9fe0d5c..3063aa9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -511,7 +511,7 @@ SYM_CODE_END(spurious_entries_start)
* +----------------------------------------------------+
*/
SYM_CODE_START(interrupt_entry)
- UNWIND_HINT_FUNC
+ UNWIND_HINT_IRET_REGS offset=16
ASM_CLAC
cld

@@ -543,9 +543,9 @@ SYM_CODE_START(interrupt_entry)
pushq 5*8(%rdi) /* regs->eflags */
pushq 4*8(%rdi) /* regs->cs */
pushq 3*8(%rdi) /* regs->ip */
+ UNWIND_HINT_IRET_REGS
pushq 2*8(%rdi) /* regs->orig_ax */
pushq 8(%rdi) /* return address */
- UNWIND_HINT_FUNC

movq (%rdi), %rdi
jmp 2f
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 499578f..70fc159 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -19,7 +19,7 @@ struct unwind_state {
#if defined(CONFIG_UNWINDER_ORC)
bool signal, full_regs;
unsigned long sp, bp, ip;
- struct pt_regs *regs;
+ struct pt_regs *regs, *prev_regs;
#elif defined(CONFIG_UNWINDER_FRAME_POINTER)
bool got_irq;
unsigned long *bp, *orig_sp, ip;
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 33b80a7..0ebc11a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -384,9 +384,38 @@ static bool deref_stack_iret_regs(struct unwind_state *state, unsigned long addr
return true;
}

+/*
+ * If state->regs is non-NULL, and points to a full pt_regs, just get the reg
+ * value from state->regs.
+ *
+ * Otherwise, if state->regs just points to IRET regs, and the previous frame
+ * had full regs, it's safe to get the value from the previous regs. This can
+ * happen when early/late IRQ entry code gets interrupted by an NMI.
+ */
+static bool get_reg(struct unwind_state *state, unsigned int reg_off,
+ unsigned long *val)
+{
+ unsigned int reg = reg_off/8;
+
+ if (!state->regs)
+ return false;
+
+ if (state->full_regs) {
+ *val = ((unsigned long *)state->regs)[reg];
+ return true;
+ }
+
+ if (state->prev_regs) {
+ *val = ((unsigned long *)state->prev_regs)[reg];
+ return true;
+ }
+
+ return false;
+}
+
bool unwind_next_frame(struct unwind_state *state)
{
- unsigned long ip_p, sp, orig_ip = state->ip, prev_sp = state->sp;
+ unsigned long ip_p, sp, tmp, orig_ip = state->ip, prev_sp = state->sp;
enum stack_type prev_type = state->stack_info.type;
struct orc_entry *orc;
bool indirect = false;
@@ -448,39 +477,35 @@ bool unwind_next_frame(struct unwind_state *state)
break;

case ORC_REG_R10:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, r10), &sp)) {
orc_warn_current("missing R10 value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->r10;
break;

case ORC_REG_R13:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, r13), &sp)) {
orc_warn_current("missing R13 value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->r13;
break;

case ORC_REG_DI:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, di), &sp)) {
orc_warn_current("missing RDI value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->di;
break;

case ORC_REG_DX:
- if (!state->regs || !state->full_regs) {
+ if (!get_reg(state, offsetof(struct pt_regs, dx), &sp)) {
orc_warn_current("missing DX value at %pB\n",
(void *)state->ip);
goto err;
}
- sp = state->regs->dx;
break;

default:
@@ -507,6 +532,7 @@ bool unwind_next_frame(struct unwind_state *state)

state->sp = sp;
state->regs = NULL;
+ state->prev_regs = NULL;
state->signal = false;
break;

@@ -518,6 +544,7 @@ bool unwind_next_frame(struct unwind_state *state)
}

state->regs = (struct pt_regs *)sp;
+ state->prev_regs = NULL;
state->full_regs = true;
state->signal = true;
break;
@@ -529,6 +556,8 @@ bool unwind_next_frame(struct unwind_state *state)
goto err;
}

+ if (state->full_regs)
+ state->prev_regs = state->regs;
state->regs = (void *)sp - IRET_FRAME_OFFSET;
state->full_regs = false;
state->signal = true;
@@ -543,8 +572,8 @@ bool unwind_next_frame(struct unwind_state *state)
/* Find BP: */
switch (orc->bp_reg) {
case ORC_REG_UNDEFINED:
- if (state->regs && state->full_regs)
- state->bp = state->regs->bp;
+ if (get_reg(state, offsetof(struct pt_regs, bp), &tmp))
+ state->bp = tmp;
break;

case ORC_REG_PREV_SP:

2020-04-26 07:28:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] ORC fixes


* Josh Poimboeuf <[email protected]> wrote:

> > Any objections against targeting v5.7-rc3 with this, assuming that
> > there's no problems found during review and it passes about a week of
> > testing?
>
> Hi Ingo,
>
> Due to other distractions, I unfortunately have been sitting on some of
> these fixes for several months -- notice some of the long Reported-by
> chains :-/
>
> They're good small localized fixes and I would agree it makes sense to
> target x86/urgent.
>
> Thanks!

Ok, I've merged them for x86/urgent - with the small tweak to the merge
plan that it would be for -rc4 next week - the patches are too fresh for
-rc3 IMO.

Thanks,

Ingo

2024-02-28 23:33:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] x86/unwind/orc: Convert global variables to static

On Sat, 25 Apr 2020 05:03:05 -0500
Josh Poimboeuf <[email protected]> wrote:

> These variables aren't used outside of unwind_orc.c, make them static.
>
> Also annotate some of them with '__ro_after_init', as applicable.

So it appears that crash uses "lookup_num_blocks" to be able to do
back-traces with the ORC unwinder. But because it's now static, crash can no
longer do that.

Is it possible to make lookup_num_blocks global again?

/* Not static so that the crash utility can access it */
unsigned int lookup_num_blocks __ro_after_init;

-- Steve

>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> Reviewed-by: Miroslav Benes <[email protected]>
> ---
> arch/x86/kernel/unwind_orc.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index e9cc182aa97e..64889da666f4 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -15,12 +15,12 @@ extern int __stop_orc_unwind_ip[];
> extern struct orc_entry __start_orc_unwind[];
> extern struct orc_entry __stop_orc_unwind[];
>
> -static DEFINE_MUTEX(sort_mutex);
> -int *cur_orc_ip_table = __start_orc_unwind_ip;
> -struct orc_entry *cur_orc_table = __start_orc_unwind;
> +static bool orc_init __ro_after_init;
> +static unsigned int lookup_num_blocks __ro_after_init;
>
> -unsigned int lookup_num_blocks;
> -bool orc_init;
> +static DEFINE_MUTEX(sort_mutex);
> +static int *cur_orc_ip_table = __start_orc_unwind_ip;
> +static struct orc_entry *cur_orc_table = __start_orc_unwind;
>
> static inline unsigned long orc_ip(const int *ip)
> {


2024-02-29 00:03:18

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] x86/unwind/orc: Convert global variables to static

On Wed, Feb 28, 2024 at 06:35:07PM -0500, Steven Rostedt wrote:
> On Sat, 25 Apr 2020 05:03:05 -0500
> Josh Poimboeuf <[email protected]> wrote:
>
> > These variables aren't used outside of unwind_orc.c, make them static.
> >
> > Also annotate some of them with '__ro_after_init', as applicable.
>
> So it appears that crash uses "lookup_num_blocks" to be able to do
> back-traces with the ORC unwinder. But because it's now static, crash can no
> longer do that.

Hm, but why? Even a static variable has a known address.

--
Josh

2024-02-29 00:12:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] x86/unwind/orc: Convert global variables to static

On Wed, 28 Feb 2024 16:03:09 -0800
Josh Poimboeuf <[email protected]> wrote:

> On Wed, Feb 28, 2024 at 06:35:07PM -0500, Steven Rostedt wrote:
> > On Sat, 25 Apr 2020 05:03:05 -0500
> > Josh Poimboeuf <[email protected]> wrote:
> >
> > > These variables aren't used outside of unwind_orc.c, make them static.
> > >
> > > Also annotate some of them with '__ro_after_init', as applicable.
> >
> > So it appears that crash uses "lookup_num_blocks" to be able to do
> > back-traces with the ORC unwinder. But because it's now static, crash can no
> > longer do that.
>
> Hm, but why? Even a static variable has a known address.
>

I'm guessing because we don't have the full dwarf info?

-- Steve

2024-02-29 00:43:03

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] x86/unwind/orc: Convert global variables to static

On Wed, Feb 28, 2024 at 07:14:06PM -0500, Steven Rostedt wrote:
> On Wed, 28 Feb 2024 16:03:09 -0800
> Josh Poimboeuf <[email protected]> wrote:
>
> > On Wed, Feb 28, 2024 at 06:35:07PM -0500, Steven Rostedt wrote:
> > > On Sat, 25 Apr 2020 05:03:05 -0500
> > > Josh Poimboeuf <[email protected]> wrote:
> > >
> > > > These variables aren't used outside of unwind_orc.c, make them static.
> > > >
> > > > Also annotate some of them with '__ro_after_init', as applicable.
> > >
> > > So it appears that crash uses "lookup_num_blocks" to be able to do
> > > back-traces with the ORC unwinder. But because it's now static, crash can no
> > > longer do that.
> >
> > Hm, but why? Even a static variable has a known address.
> >
>
> I'm guessing because we don't have the full dwarf info?

DWARF isn't needed for that. Even the symbol table has it (as does
System.map). For both globals and statics.

--
Josh

2024-02-29 23:38:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] x86/unwind/orc: Convert global variables to static

On Wed, 28 Feb 2024 16:42:52 -0800
Josh Poimboeuf <[email protected]> wrote:

> DWARF isn't needed for that. Even the symbol table has it (as does
> System.map). For both globals and statics.

Interesting enough, it is reported that if lookup_num_blocks is global,
crash returns the proper value. If it is static, it just returns "1", which
is incorrect (the correct value is around 60K).

-- Steve

2024-03-18 15:56:30

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] x86/unwind/orc: Convert global variables to static

On Wed, 28 Feb 2024 16:42:52 -0800
Josh Poimboeuf <[email protected]> wrote:

> > I'm guessing because we don't have the full dwarf info?
>
> DWARF isn't needed for that. Even the symbol table has it (as does
> System.map). For both globals and statics.

It is in System.map, but I guess the real issue is that the compiler can
optimize it out. That is, the number is never set as it is static and the
compiler doesn't need to do anything to make it valid. To the compiler, the
number is just a local variable. I'm not exactly sure how it does that, as
it sets the value in one function and uses it in another. But clang appears
to not be setting it when it is static.

Either removing static or making it volatile makes it work again.

Thoughts?

-- Steve