Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751152AbdL3R5u (ORCPT ); Sat, 30 Dec 2017 12:57:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:53870 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750955AbdL3R5s (ORCPT ); Sat, 30 Dec 2017 12:57:48 -0500 Date: Sat, 30 Dec 2017 11:57:46 -0600 From: Josh Poimboeuf To: Alexander Tsoy Cc: Andy Lutomirski , Linus Torvalds , Toralf =?utf-8?Q?F=C3=B6rster?= , stable , Linux Kernel , the arch/x86 maintainers Subject: Re: 4.14.9 doesn't boot (regression) Message-ID: <20171230175746.5snazk5sem4ribk6@treble> References: <33249a35-7d6a-f0f3-5a98-e6474f9366e3@gmx.de> <7A0A9B37-20FF-4B17-B4F5-D8B999269FC4@amacapital.net> <20171230034947.6jgk5t7c7jrl6dwg@treble> <1514623513.2379.1.camel@tsoy.me> <20171230170946.ywnkxkeolwk3jm6w@treble> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20171230170946.ywnkxkeolwk3jm6w@treble> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Sat, 30 Dec 2017 17:57:48 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9230 Lines: 226 On Sat, Dec 30, 2017 at 11:09:46AM -0600, Josh Poimboeuf wrote: > On Sat, Dec 30, 2017 at 11:45:13AM +0300, Alexander Tsoy wrote: > > В Пт, 29/12/2017 в 21:49 -0600, Josh Poimboeuf пишет: > > > On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote: > > > > (Also, Josh, the oops code should have printed the contents of the > > > > struct pt_regs at the top of the DF stack.  Any idea why it > > > > didn't?) > > > > > > Looking at one of the dumps: > > > > > >   [  392.774879] NMI backtrace for cpu 0 > > >   [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo > > > #1 > > >   [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 > > >   [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000 > > >   [  392.774885] RIP: 0010:double_fault+0x0/0x30 > > >   [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086 > > >   [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX: > > > 00000000c0000101 > > >   [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI: > > > ffffffffff527f58 > > >   [  392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09: > > > 0000000000000000 > > >   [  392.774888] R10: 0000000000000000 R11: 0000000000000000 R12: > > > ffffffff816ae726 > > >   [  392.774888] R13: 0000000000000000 R14: 0000000000000000 R15: > > > 0000000000000000 > > >   [  392.774889] FS:  0000000000000000(0000) > > > GS:ffff88023fc00000(0000) knlGS:0000000000000000 > > >   [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >   [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4: > > > 00000000001606f0 > > >   [  392.774892] Call Trace: > > >   [  392.774894]  <#DF> > > >   [  392.774897]  do_double_fault+0xb/0x140 > > >   [  392.774898]   > > > > > > It should have at least printed the #DF iret frame registers, which I > > > recently added support for in "x86/unwinder: Handle stack overflows > > > more > > > gracefully", which is in both 4.14.9 and 4.15-rc5. > > > > > > I think the missing iret regs are due to a bug in > > > show_trace_log_lvl(), > > > where if the unwind starts with two regs frames in a row, the second > > > regs don't get printed. > > > > > > Alexander, would you mind reproducing again with the below patch?  It > > > should still fail, but this time it should hopefully show another > > > RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line. > > > > > > > Yes, it works: > > > > [   23.058064] NMI backtrace for cpu 2 > > [   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1 > > [   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > BIOS 1.10.2-1.fc27 04/01/2014 > > [   23.058074] RIP: 0010:double_fault+0x0/0x30 > > [   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086 > > [   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: > > 00000000c0000101 > > [   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: > > fffffe800005ff58 > > [   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09: > > 0000000000000000 > > [   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12: > > ffffffff92001426 > > [   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [   23.058083] FS:  0000000000000000(0000) GS:ffff96813fd00000(0000) > > knlGS:0000000000000000 > > [   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4: > > 00000000000406a0 > > [   23.058089] Call Trace: > > [   23.058101]  <#DF> > > [   23.058104] RIP: 0010:do_double_fault+0xb/0x140 > > [   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX: > > 0000000000000000 > > [   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: > > 00000000c0000101 > > [   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: > > fffffe800005ff58 > > [   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09: > > 0000000000000000 > > [   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12: > > ffffffff92001426 > > [   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [   23.058111]   > > [   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69 06 00 > > 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00 00 0f 1f 44 > > 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7 48 8b 74 24 78 48 > > That's better indeed, though still not quite right. It should have only > shown a subset of those registers. One more bug to fix there... Turns out my previous code to print iret frames was a bit ... misguided, to put it nicely. Not sure what I was smoking. Hopefully the below patch should fix it (in place of the previous patch). Would you mind testing again? diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index c1688c2d0a12..1f86e1b0a5cd 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -56,18 +56,27 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER) /* - * WARNING: The entire pt_regs may not be safe to dereference. In some cases, - * only the iret frame registers are accessible. Use with caution! + * If 'partial' returns true, only the iret frame registers are valid. */ -static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) +static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state, + bool *partial) { if (unwind_done(state)) return NULL; + if (partial) { +#ifdef CONFIG_UNWINDER_ORC + *partial = !state->full_regs; +#else + *partial = false; +#endif + } + return state->regs; } #else -static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) +static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state, + bool *partial) { return NULL; } diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 36b17e0febe8..6e49d2e0c243 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -76,12 +76,11 @@ void show_iret_regs(struct pt_regs *regs) regs->sp, regs->flags); } -static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) +static void show_regs_full_or_partial(struct pt_regs *regs, bool partial) { - if (on_stack(info, regs, sizeof(*regs))) + if (!partial) __show_regs(regs, 0); - else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET, - IRET_FRAME_SIZE)) { + else { /* * When an interrupt or exception occurs in entry code, the * full pt_regs might not have been saved yet. In that case @@ -98,11 +97,13 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, struct stack_info stack_info = {0}; unsigned long visit_mask = 0; int graph_idx = 0; + bool partial; printk("%sCall Trace:\n", log_lvl); unwind_start(&state, task, regs, stack); stack = stack ? : get_stack_pointer(task, regs); + regs = unwind_get_entry_regs(&state, &partial); /* * Iterate through the stacks, starting with the current stack pointer. @@ -120,7 +121,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, * - hardirq stack * - entry stack */ - for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { + for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { const char *stack_name; if (get_stack_info(stack, task, &stack_info, &visit_mask)) { @@ -140,7 +141,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, printk("%s <%s>\n", log_lvl, stack_name); if (regs) - show_regs_safe(&stack_info, regs); + show_regs_full_or_partial(regs, partial); /* * Scan the stack, printing any text addresses we find. At the @@ -164,7 +165,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* * Don't print regs->ip again if it was already printed - * by show_regs_safe() below. + * by show_regs_full_or_partial() below. */ if (regs && stack == ®s->ip) goto next; @@ -199,9 +200,9 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, unwind_next_frame(&state); /* if the frame has entry regs, print them */ - regs = unwind_get_entry_regs(&state); + regs = unwind_get_entry_regs(&state, &partial); if (regs) - show_regs_safe(&stack_info, regs); + show_regs_full_or_partial(regs, partial); } if (stack_name) diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 77835bc021c7..7dd0d2a0d142 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace, for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state); unwind_next_frame(&state)) { - regs = unwind_get_entry_regs(&state); + regs = unwind_get_entry_regs(&state, NULL); if (regs) { /* * Kernel mode registers on the stack indicate an