LinuxLists.cc - 4.14.9 doesn't boot (regression)

2017-12-29 21:02:40

On Fri, 29/12/2017 at 17:10 -0700, Andy Lutomirski wrote:
>
> Also, you wouldn't happen to be using Gentoo perchance? I already
> have two reports of a Gentoo system miscompiling the vDSO due to
> Gentoo enabling -fstack-check and GCC generating stack check code
> that is highly suboptimal, actively incorrect, and doesn't even
> manage to check the stack in a particularly helpful way.
>
> If this is indeed what's going on, I'm going to try to come up with a
> patch to outright fail the build on these buggy systems. We could
> probably fudge the build options to avoid the problem, but Gentoo
> really just needs to fix its toolchain.

You are right, it's due to -fstack-check being enabled in Gentoo's GCC
spec. Adding "-fstack-check=no" to KBUILD_CFLAGS fixed this problem for
me. =/

2017-12-30 01:34:47

Subject: Re: 4.14.9 doesn't boot (regression)

On Sat, 30 Dec 2017 11:57:46 -0600,
Josh Poimboeuf <[email protected]> wrote:

> On Sat, Dec 30, 2017 at 11:09:46AM -0600, Josh Poimboeuf wrote:
> > On Sat, Dec 30, 2017 at 11:45:13AM +0300, Alexander Tsoy wrote:
> > > > On Fri, 29/12/2017 at 21:49 -0600, Josh Poimboeuf wrote:
> > > > > On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote:
> > > > > (Also, Josh, the oops code should have printed the contents
> > > > > of the struct pt_regs at the top of the DF stack.  Any idea
> > > > > why it didn't?)
> > > >
> > > > Looking at one of the dumps:
> > > >
> > > > [  392.774879] NMI backtrace for cpu 0
> > > > > [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo #1
> > > > > [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > > > > [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
> > > > > [  392.774885] RIP: 0010:double_fault+0x0/0x30
> > > > > [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
> > > > > [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX: 00000000c0000101
> > > > > [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI: ffffffffff527f58
> > > > > [  392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > > > > [  392.774888] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff816ae726
> > > > > [  392.774888] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > > [  392.774889] FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
> > > > > [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4: 00000000001606f0
> > > > [  392.774892] Call Trace:
> > > > [  392.774894]  <#DF>
> > > > [  392.774897]  do_double_fault+0xb/0x140
> > > > [  392.774898]  </#DF>
> > > >
> > > > > It should have at least printed the #DF iret frame registers,
> > > > > which I recently added support for in "x86/unwinder: Handle
> > > > > stack overflows more gracefully", which is in both 4.14.9 and
> > > > > 4.15-rc5.
> > > > >
> > > > > I think the missing iret regs are due to a bug in
> > > > > show_trace_log_lvl(), where if the unwind starts with two regs
> > > > > frames in a row, the second regs don't get printed.
> > > >
> > > > Alexander, would you mind reproducing again with the below
> > > > patch?  It should still fail, but this time it should hopefully
> > > > show another RIP/RSP/EFLAGS instead of the
> > > > "do_double_fault+0xb/0x140" line.
> > >
> > > Yes, it works:
> > >
> > > [   23.058064] NMI backtrace for cpu 2
> > > [   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1
> > > > [   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
> > > > [   23.058074] RIP: 0010:double_fault+0x0/0x30
> > > > [   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
> > > > [   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: 00000000c0000101
> > > > [   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: fffffe800005ff58
> > > > [   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > > > [   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff92001426
> > > > [   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > [   23.058083] FS:  0000000000000000(0000) GS:ffff96813fd00000(0000) knlGS:0000000000000000
> > > > [   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4: 00000000000406a0
> > > > [   23.058089] Call Trace:
> > > > [   23.058101]  <#DF>
> > > > [   23.058104] RIP: 0010:do_double_fault+0xb/0x140
> > > > [   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> > > > [   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: 00000000c0000101
> > > > [   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: fffffe800005ff58
> > > > [   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > > > [   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff92001426
> > > > [   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > [   23.058111]  </#DF>
> > > > [   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69 06 00 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00 00 0f 1f 44 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7 48 8b 74 24 78 48
> >
> > That's better indeed, though still not quite right. It should have
> > only shown a subset of those registers. One more bug to fix
> > there...
>
> Turns out my previous code to print iret frames was a bit ...
> misguided, to put it nicely. Not sure what I was smoking.
>
> Hopefully the below patch should fix it (in place of the previous
> patch). Would you mind testing again?
>

With that patch I get:

[ 2.160017] NMI backtrace for cpu 0
[ 2.160017] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc5 #1
[ 2.160017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[ 2.160017] RIP: 0010:double_fault+0x0/0x30
[ 2.160017] RSP: 0000:fffffe8000007fd0 EFLAGS: 00010086
[ 2.160017] RAX: 00000000ffc00000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 2.160017] RDX: 00000000ffff8edc RSI: 0000000000000000 RDI: fffffe8000007f58
[ 2.160017] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2.160017] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa3c01426
[ 2.160017] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2.160017] FS: 0000000000000000(0000) GS:ffff8edcffc00000(0000) knlGS:0000000000000000
[ 2.160017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.160017] CR2: fffffe8000006f08 CR3: 000000007c153000 CR4: 00000000000006b0
[ 2.160017] Call Trace:
[ 2.160017] <#DF>
[ 2.160017] RIP: 0010:do_double_fault+0xb/0x140
[ 2.160017] RSP: 0000:fffffe8000006f18 EFLAGS: 00010086
[ 2.160017] </#DF>

--
Best regards,
Alexander Tsoy

2017-12-30 22:16:54

by Josh Poimboeuf

Subject: Re: 4.14.9 doesn't boot (regression)

On Sun, Dec 31, 2017 at 01:03:25AM +0300, Alexander Tsoy wrote:
> > Turns out my previous code to print iret frames was a bit ...
> > misguided, to put it nicely. Not sure what I was smoking.
> >
> > Hopefully the below patch should fix it (in place of the previous
> > patch). Would you mind testing again?
> >
>
> With that patch I get:
>
> [ 2.160017] NMI backtrace for cpu 0
> [ 2.160017] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc5 #1
> [ 2.160017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
> [ 2.160017] RIP: 0010:double_fault+0x0/0x30
> [ 2.160017] RSP: 0000:fffffe8000007fd0 EFLAGS: 00010086
> [ 2.160017] RAX: 00000000ffc00000 RBX: 0000000000000001 RCX: 00000000c0000101
> [ 2.160017] RDX: 00000000ffff8edc RSI: 0000000000000000 RDI: fffffe8000007f58
> [ 2.160017] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 2.160017] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa3c01426
> [ 2.160017] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 2.160017] FS: 0000000000000000(0000) GS:ffff8edcffc00000(0000) knlGS:0000000000000000
> [ 2.160017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.160017] CR2: fffffe8000006f08 CR3: 000000007c153000 CR4: 00000000000006b0
> [ 2.160017] Call Trace:
> [ 2.160017] <#DF>
> [ 2.160017] RIP: 0010:do_double_fault+0xb/0x140
> [ 2.160017] RSP: 0000:fffffe8000006f18 EFLAGS: 00010086
> [ 2.160017] </#DF>

Yes, that's more like it. I'll clean up the patches and submit them
soon. These nasty bugs are always a good testcase for the stack dump
code.

Thanks for testing!

--
Josh
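
Editor's note: the remark earlier in the thread that the dump "should
have only shown a subset of those registers" follows from what the
hardware saves. On x86-64 an exception pushes only SS, RSP, RFLAGS, CS
and RIP (plus an error code for some exceptions, #DF among them), so an
iret frame has no general-purpose registers to report. The sketch below
is a toy model of that distinction, not the kernel's code: the function
name format_regs, the dict keys and the iret_only flag are all made up
for illustration, and only three general-purpose registers are modelled.

```python
from typing import Dict, List


def format_regs(regs: Dict[str, int], symbol: str, iret_only: bool) -> List[str]:
    """Format a register frame roughly the way the dumps in this thread look.

    Hypothetical helper: 'regs' maps register names to values and
    'symbol' stands in for the symbolized instruction pointer.
    """
    # These two lines need only what the CPU itself pushed:
    # CS:RIP, SS:RSP and RFLAGS.
    lines = [
        f"RIP: {regs['cs']:04x}:{symbol}",
        f"RSP: {regs['ss']:04x}:{regs['sp']:016x} EFLAGS: {regs['flags']:08x}",
    ]
    if iret_only:
        # An iret frame holds nothing else, so stop here.
        return lines
    # A full register frame also has the general-purpose registers
    # (only three are modelled in this sketch).
    lines.append(
        f"RAX: {regs['ax']:016x} RBX: {regs['bx']:016x} RCX: {regs['cx']:016x}"
    )
    return lines


# Values taken from the final dump in this thread.
iret_frame = {"cs": 0x10, "ss": 0x0, "sp": 0xFFFFFE8000006F18, "flags": 0x00010086}
for line in format_regs(iret_frame, "do_double_fault+0xb/0x140", iret_only=True):
    print(line)
```

With iret_only=True the model emits just the RIP and RSP/EFLAGS lines,
which matches the shape of the <#DF> section in the last dump above.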