2017-12-29 11:14:38

by Toralf Förster

Subject: 4.14.9 doesn't boot (regression)

I can confirm now that this kernel breaks both a desktop (a ThinkPad T440s i5) and a headless server (i7-3930) setup. For the server the attached .config works fine, but switching from CONFIG_GENERIC_CPU to CONFIG_MCORE2 makes them hang at boot w/o any messages. Similar picture on the desktop.
Both are stable Gentoo Linux hardened systems.

This issue seems to exist in mainline too, probably visible with d120cd749 (stable) and 9aaefe7b59 (upstream).

--
Toralf
PGP C4EACDDE 0076E94E


Attachments:
.config (72.21 kB)

2017-12-29 13:33:42

by Sebastian Gottschall

Subject: Re: 4.14.9 doesn't boot (regression)

bootlog?

On 29.12.2017 at 12:14, Toralf Förster wrote:
> I can confirm now that this kernel breaks both a desktop (a ThinkPad T440s i5) and a headless server (i7-3930) setup. For the server the attached .config works fine, but switching from CONFIG_GENERIC_CPU to CONFIG_MCORE2 makes them hang at boot w/o any messages. Similar picture on the desktop.
> Both are stable Gentoo Linux hardened systems.
>
> This issue seems to exist in mainline too, probably visible with d120cd749 (stable) and 9aaefe7b59 (upstream).
>

--
Regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Stubenwaldallee 21a, 64625 Bensheim
Court of registration: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: [email protected]
Tel.: +496251-582650 / Fax: +496251-5826565

2017-12-29 13:38:58

by Toralf Förster

Subject: Re: 4.14.9 doesn't boot (regression)

On 12/29/2017 02:33 PM, Sebastian Gottschall wrote:
> bootlog?
>
Nothing in any logs; the hang happens very early in the boot process.


--
Toralf
PGP C4EACDDE 0076E94E

2017-12-29 15:29:39

by Andy Shevchenko

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 3:38 PM, Toralf Förster <[email protected]> wrote:
> On 12/29/2017 02:33 PM, Sebastian Gottschall wrote:
>> bootlog?
>>
> nothing in any logs, hang happens very early in the boot process

Does it have serial?

Does it use EFI?

You may try earlyprintk, either for the EFI case or with a legacy UART.
There is also support for PCI UARTs, though I have never really used it.


--
With Best Regards,
Andy Shevchenko

2017-12-29 15:48:58

by Alexander Tsoy

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, 29/12/2017 at 12:14 +0100, Toralf Förster wrote:
> I can confirm now that this kernel breaks both a desktop (a
> ThinkPad T440s i5) and a headless server (i7-3930) setup. For the
> server the attached .config works fine, but switching from
> CONFIG_GENERIC_CPU to CONFIG_MCORE2 makes them hang at boot w/o any
> messages. Similar picture on the desktop.

You most likely have the same problem as me:
https://lkml.org/lkml/2017/12/29/279

2017-12-29 15:53:03

by Toralf Förster

Subject: Re: 4.14.9 doesn't boot (regression)

On 12/29/2017 04:48 PM, Alexander Tsoy wrote:
> On Fri, 29/12/2017 at 12:14 +0100, Toralf Förster wrote:
>> I can confirm now that this kernel breaks both a desktop (a
>> ThinkPad T440s i5) and a headless server (i7-3930) setup. For the
>> server the attached .config works fine, but switching from
>> CONFIG_GENERIC_CPU to CONFIG_MCORE2 makes them hang at boot w/o any
>> messages. Similar picture on the desktop.
>
> You most likely have the same problem as me:
> https://lkml.org/lkml/2017/12/29/279
>

Indeed, I got a similar message on my ThinkPad too when I tried to bisect it:

>[ 21.776011] INFO: rcu_preempt detected stalls on CPUs/tasks:
>[ 21.777008] 0-...!: (0 ticks this GP) idle=c56/140000000000000/0 softirq=73/73 fqs=0
>[ 21.777008] (detected by 1, t=21002 jiffies, g=-255, c=-256, q=4)


--
Toralf
PGP C4EACDDE 0076E94E

2017-12-29 20:12:41

by Linus Torvalds

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 3:14 AM, Toralf Förster <[email protected]> wrote:
>
> For the server the attached .config works fine, but switching from
> CONFIG_GENERIC_CPU to CONFIG_MCORE2 makes them hang at boot w/o any
> messages. Similar picture on the desktop.

Ok, so there's another thread ("4.14.9 with CONFIG_MCORE2 fails to
boot") about this same thing, but one thing to try is to see if it's
just the

cflags-$(CONFIG_MCORE2) += \
$(call cc-option,-march=core2,$(call cc-option,-mtune=generic))

in arch/x86/Makefile that causes this.

The MCORE2 option does potentially have a few other effects (see
arch/x86/Kconfig.cpu), but the first one to check might be just that
compiler command line effect.

So if you can edit arch/x86/Makefile, and just make that say

cflags-$(CONFIG_MCORE2) += $(call cc-option,-mtune=generic)

instead, and see if that makes a difference, that would narrow down
the possible root cause of this problem.

Linus
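
For readers unfamiliar with Kbuild, the nested cc-option call quoted above picks the first flag the compiler accepts and falls back to the next one otherwise. The following is only a minimal Python sketch of that selection logic, not the kernel's implementation: the names compiler_accepts, cc_option and mcore2_cflags are made up for illustration, and the probe command line (compiling /dev/null with -Werror) is an assumption modelled loosely on how Kbuild tests flags.

```python
import subprocess


def compiler_accepts(flag, compiler="gcc"):
    """Return True if `compiler` exits successfully when given `flag`.

    Assumption: probing is done by compiling an empty C input.
    A missing compiler is treated as "flag not accepted".
    """
    try:
        result = subprocess.run(
            [compiler, "-Werror", flag, "-c", "-x", "c", "/dev/null", "-o", "/dev/null"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0


def cc_option(flag, fallback="", probe=compiler_accepts):
    """Return `flag` if the probe accepts it, otherwise `fallback`."""
    return flag if probe(flag) else fallback


def mcore2_cflags(probe=compiler_accepts):
    """Mirror the nested call: try -march=core2, else -mtune=generic, else nothing."""
    return cc_option(
        "-march=core2",
        fallback=cc_option("-mtune=generic", probe=probe),
        probe=probe,
    )


if __name__ == "__main__":
    # Prints whichever flag the locally installed compiler accepts (may be empty).
    print(mcore2_cflags())
```

The suggested experiment amounts to dropping the outer call so that only the -mtune=generic probe remains; passing a fake probe function makes it easy to see which flag each variant would select without needing a particular compiler installed.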

2017-12-29 20:21:35

by Ingo Molnar

Subject: Re: 4.14.9 doesn't boot (regression)


* Linus Torvalds <[email protected]> wrote:

> On Fri, Dec 29, 2017 at 3:14 AM, Toralf Förster <[email protected]> wrote:
> >
> > For the server the attached .config works fine, but switching from
> > CONFIG_GENERIC_CPU to CONFIG_MCORE2 makes them hang at boot w/o any
> > messages. Similar picture on the desktop.
>
> Ok, so there's another thread ("4.14.9 with CONFIG_MCORE2 fails to
> boot") about this same thing, but one thing to try is to see if it's
> just the
>
> cflags-$(CONFIG_MCORE2) += \
> $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
>
> in arch/x86/Makefile that causes this.
>
> The MCORE2 option does potentially have a few other effects (see
> arch/x86/Kconfig.cpu), but the first one to check might be just that
> compiler command line effect.
>
> So if you can edit arch/x86/Makefile, and just make that say
>
> cflags-$(CONFIG_MCORE2) += $(call cc-option,-mtune=generic)
>
> instead, and see if that makes a difference, that would narrow down
> the possible root cause of this problem.

Or, if it's more convenient, you can try Linus's suggestion by applying the patch
below.

Thanks,

Ingo

===========>

arch/x86/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3e73bc255e4e..1835752fffc9 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -127,8 +127,8 @@ else
cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)

- cflags-$(CONFIG_MCORE2) += \
- $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_MCORE2) += $(call cc-option,-mtune=generic)
+
cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)

2017-12-29 21:02:40

by Toralf Förster

Subject: Re: 4.14.9 doesn't boot (regression)

On 12/29/2017 09:12 PM, Linus Torvalds wrote:
> instead, and see if that makes a difference, that would narrow down
> the possible root cause of this problem.

Not on this ThinkPad T440s (I didn't test on the server with an i7-3930).

Boot stops just at:

tsc: Refined TSC clocksource calibration: 2494.225 MHz
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23f3ea95b09, max_idle_ns: 440795287034 ns

I changed the Makefile according to your suggestion:

~/devel/linux $ git diff
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3e73bc255e4e..fb695558821b 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -128,7 +128,7 @@ else
cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)

cflags-$(CONFIG_MCORE2) += \
- $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+ $(call cc-option,-mtune=generic)
cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)

~/devel/linux $ git describe
v4.15-rc5-114-g2758b3e3e630

This is an "Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz" with gcc-6.4

.config attached

--
Toralf
PGP C4EACDDE 0076E94E


Attachments:
.config (102.95 kB)

2017-12-29 21:17:50

by Linus Torvalds

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 1:02 PM, Toralf Förster <[email protected]> wrote:
> On 12/29/2017 09:12 PM, Linus Torvalds wrote:
>> instead, and see if that makes a difference, that would narrow down
>> the possible root cause of this problem.
>
> not at this ThinkPad T440s (didn't test at the server with an i7-3930).
>
> Boot stops just at:
>
> tsc: Refined TSC clocksource calibration: 2494.225 MHz
> clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23f3ea95b09, max_idle_ns: 440795287034 ns

Uhhuh. So for Alexander Tsoy, just getting rid of the -march=core2
fixed the boot.

But not for you.

Strange. It really looked like the exact same thing.

> This is a "Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz" with gcc-6.4

Yeah, other reporters of this have used gcc-6.4.0 too.

But there's been some muddying of the waters there too - changing
compilers has fixed it in some cases, but there's at least one
report that a kernel built with gcc-7.2.0 still had the issue (and
another that said it didn't).

But the MCORE2 was consistent for several people - including you.
Until this point.

Strange.

The only other thing (apart from the compiler flag) that MCORE2
results in is to enable

CONFIG_X86_INTEL_USERCOPY
CONFIG_X86_USE_PPRO_CHECKSUM
CONFIG_X86_P6_NOP

and the two first of those shouldn't even matter on x86-64, and I
don't see that last one making any difference either.

So because it looks so impossible that the "-march=core2" didn't make
a difference for you, I'll ask you to please double-check that you
actually booted into the right kernel.

Sorry for doubting you, but your report just broke the _one_
consistent thing we've seen about this bug.

Linus

2017-12-29 21:39:04

by Linus Torvalds

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 1:17 PM, Linus Torvalds
<[email protected]> wrote:
>
> Yeah, other reporters of this have used gcc-6.4.0 too.
>
> But there's been some muddying of the waters there too - changing
> compilers have fixed it for some cases, but there's at least one
> report that a kernel build with gcc-7.2.0 still had the issue (and
> another that said it didn't).

Side note: I'm not convinced that we will reliably catch a compiler
version change in our dependency analysis, so it's probably best to
"make clean" between switching compilers to make sure that you don't
have old object files with the old compiler.

> But the MCORE2 was consistent for several people - including you.
> Until this point.

.. and our build infrastructure definitely _should_ catch compiler
switch changes automatically and force a re-build.

Linus

2017-12-29 22:04:54

by Alexander Tsoy

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, 29/12/2017 at 13:39 -0800, Linus Torvalds wrote:
> On Fri, Dec 29, 2017 at 1:17 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Yeah, other reporters of this have used gcc-6.4.0 too.
> >
> > But there's been some muddying of the waters there too - changing
> > compilers have fixed it for some cases, but there's at least one
> > report that a kernel build with gcc-7.2.0 still had the issue (and
> > another that said it didn't).
>
> Side note: I'm not convinced that we will reliably catch a compiler
> version change in our dependency analysis, so it's probably best to
> "make clean" between switching compilers to make sure that you don't
> have old object files with the old compiler.

I did "make clean" after changing compiler flags.

2017-12-29 22:30:17

by Toralf Förster

Subject: Re: 4.14.9 doesn't boot (regression)

On 12/29/2017 10:17 PM, Linus Torvalds wrote:
> On Fri, Dec 29, 2017 at 1:02 PM, Toralf Förster <[email protected]> wrote:
>> On 12/29/2017 09:12 PM, Linus Torvalds wrote:
>>> instead, and see if that makes a difference, that would narrow down
>>> the possible root cause of this problem.
>>
>> not at this ThinkPad T440s (didn't test at the server with an i7-3930).
>>
>> Boot stops just at:
>>
>> tsc: Refined TSC clocksource calibration: 2494.225 MHz
>> clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23f3ea95b09, max_idle_ns: 440795287034 ns
>
> Uhhuh. So for Alexander Troy, just getting rid of the -march=core2
> fixed the boot.
>
> But not for you.
>
> Strange. It really looked like the exact same thing.
>
>> This is a "Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz" with gcc-6.4
>
> Yeah, other reporters of this have used gcc-6.4.0 too.
>
> But there's been some muddying of the waters there too - changing
> compilers have fixed it for some cases, but there's at least one
> report that a kernel build with gcc-7.2.0 still had the issue (and
> another that said it didn't).
>
> But the MCORE2 was consistent for several people - including you.
> Until this point.
>
> Strange.
>
> The only other thing (apart from the compiler flag) that MCORE2
> results in is to enable
>
> CONFIG_X86_INTEL_USERCOPY
> CONFIG_X86_USE_PPRO_CHECKSUM
> CONFIG_X86_P6_NOP
>
> and the two first of those shouldn't even matter on x86-64, and I
> don't see that last one making any difference either.
>
> So because it looks so impossible that the "-march=core2" didn't make
> a difference for you, I'll ask you to please double-check that you
> actually booted into the right kernel.
>
> Sorry for doubting you, but your report just broke the _one_
> consistent thing we've seen about this bug.
>
> Linus
>


I double-checked it.

The bad news: the issue is not solved by the changed cflags.
The good news: I eventually managed to compile a working config for my desktop (it works fine with 4.14.10 and the generic CPU setting) that has a higher screen resolution during boot.

So I ran "make distclean", followed by "sudo zcat /proc/config.gz > .config", changed the .config to use MCORE2 instead of GENERIC, and defined the string "-local" to ensure that the modules directory is really unique.
Then I ran "time make -j4 && sudo make modules_install && sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-0 && sudo grub-mkconfig -o /boot/grub/grub.cfg", booted, and took 3 photos, which I uploaded to [1]; look for IMG_*

[1] https://zwiebeltoralf.de/pub/


--
Toralf
PGP C4EACDDE 0076E94E

2017-12-29 22:54:03

by Linus Torvalds

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 2:30 PM, Toralf Förster <[email protected]> wrote:
>
> The bad news - the issue is not solved with the changed cflags.
> The good news - I could compile eventually a working config for my desktop (works fine with 4.14.10 with generic CPU) having a higher screen resolution during boot.
>
> So I made a "make distclean", followed by a "sudo zcat /proc/config.gz > .config", changed the .config to use MCORE2 instead of GENERIC and defined the string "-local" to ensure that the modules directory is really unique.
> Then I run "time make -j4 && sudo make modules_install && sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-0 && sudo grub-mkconfig -o /boot/grub/grub.cfg", booted and made 3 fotos which were uploaded to [1], look for IMG_*

Ok, so what does seem to be consistent for everybody is that
double-fault in the NMI backtrace.

So the fact that the NMI always hits on a double-fault does make me
suspect that it's an infinite stream of double-faults, and that is
presumably also what causes the RCU timeout.

And as I pointed out elsewhere (damn two threads), I think that it
would help to simply catch the *first* double-fault.

And I *think* that the only thing that can make a double-fault
silently be re-tried is the CONFIG_X86_ESPFIX64 case, so if you can
build a failing kernel with the CONFIG_X86_ESPFIX64 case disabled in
arch/x86/kernel/traps.c do_double_fault(), that would be interesting.

So just change the

#ifdef CONFIG_X86_ESPFIX64

into a

#if 0

and see if instead of the RCU stall after 20 seconds, you get an
immediate double fault error report instead?

I'm still entirely confused about why that MCORE2 would make _any_
difference what-so-ever, so this is all fishing for random clues in
the dark.

Linus

2017-12-29 23:14:14

by Toralf Förster

Subject: Re: 4.14.9 doesn't boot (regression)

On 12/29/2017 11:53 PM, Linus Torvalds wrote:
> So just change the
>
> #ifdef CONFIG_X86_ESPFIX64
>
> into a
>
> #if 0
>
> and see if instead of the RCU stall after 20 seconds, you get an
> immediate double fault error report instead?

Well, the 3 IMG_20171230_0008* photos should show the results: https://zwiebeltoralf.de/pub/



--
Toralf
PGP C4EACDDE 0076E94E

2017-12-30 00:10:40

by Andy Lutomirski

Subject: Re: 4.14.9 doesn't boot (regression)



> On Dec 29, 2017, at 3:53 PM, Linus Torvalds <[email protected]> wrote:
>
>> On Fri, Dec 29, 2017 at 2:30 PM, Toralf Förster <[email protected]> wrote:
>>
>> The bad news - the issue is not solved with the changed cflags.
>> The good news - I could compile eventually a working config for my desktop (works fine with 4.14.10 with generic CPU) having a higher screen resolution during boot.
>>
>> So I made a "make distclean", followed by a "sudo zcat /proc/config.gz > .config", changed the .config to use MCORE2 instead of GENERIC and defined the string "-local" to ensure that the modules directory is really unique.
>> Then I run "time make -j4 && sudo make modules_install && sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-0 && sudo grub-mkconfig -o /boot/grub/grub.cfg", booted and made 3 fotos which were uploaded to [1], look for IMG_*
>
> Ok, so what does seem to be consistent for everybody is that
> double-fault in the NMI backtrace.
>
> So the fact that the NMI always hits on a double-fault does make me
> suspect that it's a infinite stream of double-faults, and that is
> presumably also what causes the RCU timeout.
>
> And as I pointed out elsewhere (damn two threads), I think that it
> would help to simply catch the *first* double-fault.
>
> And I *think* that the only thing that can make a double-fault
> silently be re-tried is the CONFIG_X86_ESPFIX64 case, so if you can
> build a failing kernel with the CONFIG_X86_ESPFIX64 case disabled in
> arch/x86/kernel/traps.c do_double_fault(), that would be interesting.

Double faults use IST, so a double fault that double faults will effectively just start over rather than eventually running out of stack and triple faulting.

But check out the registers. We have RSP = ...28fd8 and CR2 = ...27f08. IOW the double fault stack is ...28000 - ...28fff and we're somehow getting a failed page fault a couple hundred bytes below the bottom of the IST stack. IOW, I think we're just stuck in a neverending loop of stack overflows.

(Also, Josh, the oops code should have printed the contents of the struct pt_regs at the top of the DF stack. Any idea why it didn't?)

Toralf, can you send the complete output of:

objdump -dr arch/x86/kernel/traps.o

From the build tree of a nonworking kernel?

Also, you wouldn't happen to be using Gentoo perchance? I already have two reports of a Gentoo system miscompiling the vDSO due to Gentoo enabling -fstack-check and GCC generating stack check code that is highly suboptimal, actively incorrect, and doesn't even manage to check the stack in a particularly helpful way.

If this is indeed what's going on, I'm going to try to come up with a patch to outright fail the build on these buggy systems. We could probably fudge the build options to avoid the problem, but Gentoo really just needs to fix its toolchain.
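
The address reasoning above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the truncated low bits quoted in the message (RSP = ...28fd8, CR2 = ...27f08) and assumes, as stated there, that the double-fault IST stack is the single 4 KiB page ...28000 - ...28fff. The helper name page_bounds is made up for illustration.

```python
PAGE_SIZE = 0x1000  # 4 KiB page, assumed size of the #DF IST stack


def page_bounds(addr):
    """Return (lowest, highest) address of the page containing `addr`."""
    base = addr & ~(PAGE_SIZE - 1)
    return base, base + PAGE_SIZE - 1


# Low bits only, exactly as quoted in the message; the high bits are elided there.
rsp = 0x28FD8  # stack pointer at double-fault entry
cr2 = 0x27F08  # faulting address recorded by the CPU

stack_lo, stack_hi = page_bounds(rsp)
bytes_below = stack_lo - cr2

print(hex(stack_lo), hex(stack_hi))  # 0x28000 0x28fff
print(bytes_below)                   # 248
```

So the faulting access lands 248 bytes below the lowest address of the IST stack page, which matches the "couple hundred bytes below the bottom" estimate.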

2017-12-30 01:00:04

by Linus Torvalds

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 4:10 PM, Andy Lutomirski <[email protected]> wrote:
>
> Double faults use IST, so a double fault that double faults will effectively just start over rather than eventually running out of stack and triple faulting.
>
> But check out the registers. We have RSP = ...28fd8 and CR2 = ...27f08.
> IOW the double fault stack is ...28000 - ...28fff and we're somehow getting
> a failed page fault a couple hundred bytes below the bottom of the IST stack.
> IOW, I think we're just stuck in a neverending loop of stack overflows.

Ahh, good catch. This feels like it might finally be explaining things.

> (Also, Josh, the oops code should have printed the contents of the struct pt_regs at the top of the DF stack. Any idea why it didn't?)
>
> Toralf, can you send the complete output of:
>
> objdump -dr arch/x86/kernel/traps.o
>
> From the build tree of a nonworking kernel?

Alexander made one of his failing kernels available earlier:

https://www.dropbox.com/s/yesupqgig3uxf73/linux-4.15-rc5%2B.tar.xz?dl=0

and yes, there's something seriously wrong there. Doing a disassembly
on "do_double_fault()" shows:

ffffffff8101bda0 <do_double_fault>:
ffffffff8101bda0: 41 54 push %r12
ffffffff8101bda2: 55 push %rbp
ffffffff8101bda3: 53 push %rbx
ffffffff8101bda4: 48 81 ec 20 10 00 00 sub $0x1020,%rsp
ffffffff8101bdab: 48 83 0c 24 00 orq $0x0,(%rsp)
ffffffff8101bdb0: 48 81 c4 20 10 00 00 add $0x1020,%rsp

WTF? That's bogus crap, and not ok in the kernel. Doing a stack probe
below the stack by subtracting 4128 from the stack pointer and then
oring it, and then resetting the stack pointer again is just crazy.
And it's definitely not ever going to work for the kernel that has a
limited stack.

So yes, it's a terminally broken compiler from hell. I assume gentoo
has applied some completely broken security patch to their compiler,
turning said compiler into complete garbage.

Doing some trivial grepping on the disassembly in that vmlinux file,
there's tons of those "let's probe more than a page below the stack"
issues. The biggest offset I found was 0x1400.

That one happened to be in do_sys_poll().

> Also, you wouldn't happen to be using Gentoo perchance?

Yes, several people involved are using gentoo. Maybe everybody.

> I already have two reports of a Gentoo system miscompiling the vDSO
> due to Gentoo enabling -fstack-check and GCC generating stack check
> code that is highly suboptimal, actively incorrect, and doesn't even
> manage to check the stack in a particularly helpful way.

Yes. Good. I think you root-caused it.

Good. I was not feeling so happy about this bug report, but now I can
firmly just blame the gentoo compiler for having some shit-for-brains
"feature".

Linus
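
The "trivial grepping" described above could look roughly like the following. This is a minimal sketch, not the command that was actually used: it assumes objdump's AT&T-syntax output and treats a `sub $N,%rsp` immediately followed by `orq $0x0,(%rsp)` as a stack probe. The function name find_stack_probes is made up for illustration, and the sample input is the do_double_fault() prologue quoted in the message.

```python
import re

PAGE_SIZE = 4096

# "sub $0x1020,%rsp" style stack adjustment; captures the hex immediate.
SUB_RSP = re.compile(r"\bsub\s+\$0x([0-9a-f]+),%rsp")
# The probe itself: a no-op OR that only touches the memory at (%rsp).
PROBE = re.compile(r"\borq\s+\$0x0,\(%rsp\)")


def find_stack_probes(lines):
    """Return the probe distances (in bytes) found in disassembly lines."""
    probes = []
    pending = None  # immediate of a 'sub ...,%rsp' seen on the previous line
    for line in lines:
        if pending is not None and PROBE.search(line):
            probes.append(pending)
        match = SUB_RSP.search(line)
        pending = int(match.group(1), 16) if match else None
    return probes


SAMPLE = """\
ffffffff8101bda0: 41 54 push %r12
ffffffff8101bda2: 55 push %rbp
ffffffff8101bda3: 53 push %rbx
ffffffff8101bda4: 48 81 ec 20 10 00 00 sub $0x1020,%rsp
ffffffff8101bdab: 48 83 0c 24 00 orq $0x0,(%rsp)
ffffffff8101bdb0: 48 81 c4 20 10 00 00 add $0x1020,%rsp
""".splitlines()

probes = find_stack_probes(SAMPLE)
print([hex(p) for p in probes])   # ['0x1020']
print(max(probes) > PAGE_SIZE)    # True
```

The single probe found in the sample reaches 0x1020 = 4128 bytes below the current stack pointer, which is more than one 4096-byte page.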

2017-12-30 01:14:05

by Alexander Tsoy

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, 29/12/2017 at 17:10 -0700, Andy Lutomirski wrote:
>
> Also, you wouldn't happen to be using Gentoo perchance?  I already
> have two reports of a Gentoo system miscompiling the vDSO due to
> Gentoo enabling -fstack-check and GCC generating stack check code
> that is highly suboptimal, actively incorrect, and doesn't even
> manage to check the stack in a particularly helpful way.
>
> If this is indeed what's going on, I'm going to try to come up with a
> patch to outright fail the build on these buggy systems.  We could
> probably fudge the build options to avoid the problem, but Gentoo
> really just needs fix its toolchain.

You are right, it's due to -fstack-check being enabled in Gentoo's gcc specs.
"-fstack-check=no" in KBUILD_CFLAGS fixed this problem for me. =/

2017-12-30 01:34:47

by Linus Torvalds

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 5:00 PM, Linus Torvalds
<[email protected]> wrote:
>
> Good. I was not feeling so happy about this bug report, but now I can
> firmly just blame the gentoo compiler for having some shit-for-brains
> "feature".

Looks like I can generate similar bad code with the F26 version of
gcc, it's just not enabled by default.

So all gentoo did was change the default options.

I suspect we should just add a

KBUILD_CFLAGS += $(call cc-option,-fno-stack-check,)

somewhere to the main Makefile, just to make sure.

Maybe like the appended?

Toralf, Alexander, does this make things JustWork(tm) for you?

Linus


Attachments:
patch.diff (575.00 B)

2017-12-30 03:49:51

by Josh Poimboeuf

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote:
> (Also, Josh, the oops code should have printed the contents of the
> struct pt_regs at the top of the DF stack. Any idea why it didn't?)

Looking at one of the dumps:

[ 392.774879] NMI backtrace for cpu 0
[ 392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo #1
[ 392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
[ 392.774885] RIP: 0010:double_fault+0x0/0x30
[ 392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
[ 392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI: ffffffffff527f58
[ 392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 392.774888] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff816ae726
[ 392.774888] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 392.774889] FS: 0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
[ 392.774889] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4: 00000000001606f0
[ 392.774892] Call Trace:
[ 392.774894] <#DF>
[ 392.774897] do_double_fault+0xb/0x140
[ 392.774898] </#DF>

It should have at least printed the #DF iret frame registers, which I
recently added support for in "x86/unwinder: Handle stack overflows more
gracefully", which is in both 4.14.9 and 4.15-rc5.

I think the missing iret regs are due to a bug in show_trace_log_lvl(),
where if the unwind starts with two regs frames in a row, the second
regs don't get printed.

Alexander, would you mind reproducing again with the below patch? It
should still fail, but this time it should hopefully show another
RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line.


diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 36b17e0febe8..39a320d077aa 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -103,6 +103,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,

unwind_start(&state, task, regs, stack);
stack = stack ? : get_stack_pointer(task, regs);
+ regs = unwind_get_entry_regs(&state);

/*
* Iterate through the stacks, starting with the current stack pointer.
@@ -120,7 +121,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
* - hardirq stack
* - entry stack
*/
- for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+ for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
const char *stack_name;

if (get_stack_info(stack, task, &stack_info, &visit_mask)) {

2017-12-30 08:33:03

by Toralf Förster

Subject: Re: 4.14.9 doesn't boot (regression)

On 12/30/2017 01:10 AM, Andy Lutomirski wrote:
> Toralf, can you send the complete output of:
>
> objdump -dr arch/x86/kernel/traps.o
>
> From the build tree of a nonworking kernel?

I attached it.

FWIW:

tfoerste@t44 ~/devel/linux $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/6.4.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-6.4.0/work/gcc-6.4.0/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/6.4.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/6.4.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/6.4.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/6.4.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.0/include/g++-v6 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/6.4.0/python --enable-languages=c,c++ --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo Hardened 6.4.0 p1.1' --enable-esp --enable-libstdcxx-time --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --disable-libmpx --enable-vtable-verify --enable-libvtv --disable-libquadmath --enable-lto --without-isl --disable-libsanitizer --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 6.4.0 (Gentoo Hardened 6.4.0 p1.1)

--
Toralf
PGP C4EACDDE 0076E94E


Attachments:
objdump.txt (46.12 kB)

2017-12-30 08:45:31

by Alexander Tsoy

Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, 29/12/2017 at 21:49 -0600, Josh Poimboeuf wrote:
> On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote:
> > (Also, Josh, the oops code should have printed the contents of the
> > struct pt_regs at the top of the DF stack.  Any idea why it
> > didn't?)
>
> Looking at one of the dumps:
>
>   [  392.774879] NMI backtrace for cpu 0
>   [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo
> #1
>   [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>   [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
>   [  392.774885] RIP: 0010:double_fault+0x0/0x30
>   [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
>   [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX:
> 00000000c0000101
>   [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI:
> ffffffffff527f58
>   [  392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
>   [  392.774888] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffffff816ae726
>   [  392.774888] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
>   [  392.774889] FS:  0000000000000000(0000)
> GS:ffff88023fc00000(0000) knlGS:0000000000000000
>   [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4:
> 00000000001606f0
>   [  392.774892] Call Trace:
>   [  392.774894]  <#DF>
>   [  392.774897]  do_double_fault+0xb/0x140
>   [  392.774898]  </#DF>
>
> It should have at least printed the #DF iret frame registers, which I
> recently added support for in "x86/unwinder: Handle stack overflows
> more
> gracefully", which is in both 4.14.9 and 4.15-rc5.
>
> I think the missing iret regs are due to a bug in
> show_trace_log_lvl(),
> where if the unwind starts with two regs frames in a row, the second
> regs don't get printed.
>
> Alexander, would you mind reproducing again with the below patch?  It
> should still fail, but this time it should hopefully show another
> RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line.
>

Yes, it works:

[   23.058064] NMI backtrace for cpu 2
[   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1
[   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[   23.058074] RIP: 0010:double_fault+0x0/0x30
[   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
[   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: 00000000c0000101
[   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: fffffe800005ff58
[   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff92001426
[   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   23.058083] FS:  0000000000000000(0000) GS:ffff96813fd00000(0000) knlGS:0000000000000000
[   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4: 00000000000406a0
[   23.058089] Call Trace:
[   23.058101]  <#DF>
[   23.058104] RIP: 0010:do_double_fault+0xb/0x140
[   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX: 00000000c0000101
[   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI: fffffe800005ff58
[   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff92001426
[   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   23.058111]  </#DF>
[   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69 06 00 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00 00 0f 1f 44 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7 48 8b 74 24 78 48

2017-12-30 09:14:21

by Alexander Tsoy

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, 29/12/2017 at 17:34 -0800, Linus Torvalds wrote:
> On Fri, Dec 29, 2017 at 5:00 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Good. I was not feeling so happy about this bug report, but now I can
> > firmly just blame the gentoo compiler for having some shit-for-brains
> > "feature".
>
> Looks like I can generate similar bad code with the F26 version of
> gcc, it's just not enabled by default.
>
> So all gentoo did was change the default options.

Yes, and only in the hardened profile, so most users don't have
-fstack-check enabled by default. :)

>
> I suspect we should just add a
>
>     KBUILD_CFLAGS  += $(call cc-option,-fno-stack-check,)
>
> somewhere to the main Makefile, just to make sure.
>
> Maybe like the appended?
>
> Toralf, Alexander, does this make things JustWork(tm) for you?

I can confirm that with your patch my gcc produces a working kernel.

2017-12-30 09:21:55

by Toralf Förster

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On 12/30/2017 10:14 AM, Alexander Tsoy wrote:
> Yes, and only in hardened profile, so most users don't have
> -fstack-check by default. :)
Indeed, I do run hardened Gentoo only.

--
Toralf
PGP C4EACDDE 0076E94E

2017-12-30 09:30:38

by Toralf Förster

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On 12/30/2017 04:49 AM, Josh Poimboeuf wrote:
> Alexander, would you mind reproducing again with the below patch? It
> should still fail, but this time it should hopefully show another
> RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line.

I applied that too on top of v4.15-rc5-114-g2758b3e3e630 (no other patches and no changes to CFLAGS or the like), ran make clean, then built and booted the kernel. It still hangs; the result is in [1].


[1] https://zwiebeltoralf.de/pub/IMG_20171230_102325.jpg

--
Toralf
PGP C4EACDDE 0076E94E

2017-12-30 10:02:44

by Jiri Kosina

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On Fri, 29 Dec 2017, Linus Torvalds wrote:

> Ok, so what does seem to be consistent for everybody is that
> double-fault in the NMI backtrace.
>
> So the fact that the NMI always hits on a double-fault does make me
> suspect that it's an infinite stream of double-faults, and that is
> presumably also what causes the RCU timeout.

As I've been fighting with recursive double faults lately (backporting PTI
to ancient kernels), I can tell you that this is not the symptom you'd be
seeing in such a case; a recursive double fault pretty quickly overflows
the interrupt stack and triple-faults.

--
Jiri Kosina
SUSE Labs

2017-12-30 12:58:32

by Toralf Förster

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On 12/30/2017 02:13 AM, Alexander Tsoy wrote:
> You are right, it's due to -fstack-check being enabled in Gentoo's gcc spec.
> "-fstack-check=no" in KBUILD_CFLAGS fixed this problem for me. =/

This made the issue go away:

diff --git a/Makefile b/Makefile
index ac8c441866b7..11a12947c550 100644
--- a/Makefile
+++ b/Makefile
@@ -414,7 +414,7 @@ LINUXINCLUDE := \

 KBUILD_AFLAGS := -D__ASSEMBLY__
 KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
-		-fno-strict-aliasing -fno-common -fshort-wchar \
+		-fno-strict-aliasing -fno-common -fshort-wchar -fstack-check=no \
 		-Werror-implicit-function-declaration \
 		-Wno-format-security \
 		-std=gnu89

But this doesn't address the root cause, right? So if the root cause is "Gentoo hardened GCC is broken", please just let me know. FWIW, I'm in #gentoo-dev on freenode.

--
Toralf
PGP C4EACDDE 0076E94E

2017-12-30 13:15:22

by Jiri Kosina

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On Sat, 30 Dec 2017, Toralf Förster wrote:

> This made the issue go away:
>
> diff --git a/Makefile b/Makefile
> index ac8c441866b7..11a12947c550 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -414,7 +414,7 @@ LINUXINCLUDE := \
>
>  KBUILD_AFLAGS := -D__ASSEMBLY__
>  KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
> -		-fno-strict-aliasing -fno-common -fshort-wchar \
> +		-fno-strict-aliasing -fno-common -fshort-wchar -fstack-check=no \
>  		-Werror-implicit-function-declaration \
>  		-Wno-format-security \
>  		-std=gnu89
>
> But this doesn't address the root cause, right? So if the root cause is
> "Gentoo hardened GCC is broken", please just let me know. FWIW, I'm
> in #gentoo-dev on freenode.

-fstack-check is never going to work properly for the kernel.

That option is purely for userspace; it assumes that all the logic around
the 'stack guard gap' and the auto-growing semantics is in place, which is
the case for the user stack VMA but definitely not for the kernel stack.

It's probably the "hardened" flavor of your distro trying to push
'-fstack-check' to everything it compiles, so I actually think the
Makefile patch, sanitizing CFLAGS by force-disabling -fstack-check, is
exactly what we should be doing.
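
As a rough cross-check of the above, the register values in the 4.15.0-rc5+
dump posted earlier in this thread can be compared directly. The sketch
below is only illustrative arithmetic: the helper names (page_base,
bytes_below, on_other_page) are made up for this example, and the 4 KiB
page size is an assumption. It shows that the reported CR2
(fffffe800005ef08) lies below the RSP seen at double_fault entry
(fffffe800005ffd0), on the next lower page. Whether that lower page is
unmapped is not stated in the dump; that part is an inference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the usual base page size on x86-64. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Values copied from the 4.15.0-rc5+ dump earlier in the thread. */
#define DF_ENTRY_RSP UINT64_C(0xfffffe800005ffd0) /* RSP at double_fault+0x0 */
#define DF_CR2       UINT64_C(0xfffffe800005ef08) /* reported faulting address */

/* Start address of the page that contains addr. */
static uint64_t page_base(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* How many bytes fault_addr lies below the stack pointer sp. */
static uint64_t bytes_below(uint64_t sp, uint64_t fault_addr)
{
	assert(fault_addr <= sp);	/* the stack grows down on x86 */
	return sp - fault_addr;
}

/* True when the faulting access is on a different page than sp. */
static bool on_other_page(uint64_t sp, uint64_t fault_addr)
{
	return page_base(sp) != page_base(fault_addr);
}
```

With the values above, bytes_below(DF_ENTRY_RSP, DF_CR2) evaluates to 4296
(0x10c8) and on_other_page(DF_ENTRY_RSP, DF_CR2) is true, which is at least
consistent with an access running off the bottom of a small exception stack.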

Thanks,

--
Jiri Kosina
SUSE Labs

2017-12-30 17:09:56

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On Sat, Dec 30, 2017 at 11:45:13AM +0300, Alexander Tsoy wrote:
> On Fri, 29/12/2017 at 21:49 -0600, Josh Poimboeuf wrote:
> > On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote:
> > > (Also, Josh, the oops code should have printed the contents of the
> > > struct pt_regs at the top of the DF stack.  Any idea why it
> > > didn't?)
> >
> > Looking at one of the dumps:
> >
> >   [  392.774879] NMI backtrace for cpu 0
> >   [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo
> > #1
> >   [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> >   [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
> >   [  392.774885] RIP: 0010:double_fault+0x0/0x30
> >   [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
> >   [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX:
> > 00000000c0000101
> >   [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI:
> > ffffffffff527f58
> >   [  392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > 0000000000000000
> >   [  392.774888] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffffff816ae726
> >   [  392.774888] R13: 0000000000000000 R14: 0000000000000000 R15:
> > 0000000000000000
> >   [  392.774889] FS:  0000000000000000(0000)
> > GS:ffff88023fc00000(0000) knlGS:0000000000000000
> >   [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4:
> > 00000000001606f0
> >   [  392.774892] Call Trace:
> >   [  392.774894]  <#DF>
> >   [  392.774897]  do_double_fault+0xb/0x140
> >   [  392.774898]  </#DF>
> >
> > It should have at least printed the #DF iret frame registers, which I
> > recently added support for in "x86/unwinder: Handle stack overflows more
> > gracefully", which is in both 4.14.9 and 4.15-rc5.
> >
> > I think the missing iret regs are due to a bug in show_trace_log_lvl(),
> > where if the unwind starts with two regs frames in a row, the second
> > regs don't get printed.
> >
> > Alexander, would you mind reproducing again with the below patch?  It
> > should still fail, but this time it should hopefully show another
> > RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line.
> >
>
> Yes, it works:
>
> [   23.058064] NMI backtrace for cpu 2
> [   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1
> [   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-1.fc27 04/01/2014
> [   23.058074] RIP: 0010:double_fault+0x0/0x30
> [   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
> [   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> 00000000c0000101
> [   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> fffffe800005ff58
> [   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffffff92001426
> [   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [   23.058083] FS:  0000000000000000(0000) GS:ffff96813fd00000(0000)
> knlGS:0000000000000000
> [   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4:
> 00000000000406a0
> [   23.058089] Call Trace:
> [   23.058101]  <#DF>
> [   23.058104] RIP: 0010:do_double_fault+0xb/0x140
> [   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX:
> 0000000000000000
> [   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> 00000000c0000101
> [   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> fffffe800005ff58
> [   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffffff92001426
> [   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [   23.058111]  </#DF>
> [   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69 06 00
> 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00 00 0f 1f 44
> 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7 48 8b 74 24 78 48

That's better indeed, though still not quite right. It should have only
shown a subset of those registers. One more bug to fix there...

--
Josh

2017-12-30 17:57:50

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On Sat, Dec 30, 2017 at 11:09:46AM -0600, Josh Poimboeuf wrote:
> On Sat, Dec 30, 2017 at 11:45:13AM +0300, Alexander Tsoy wrote:
> > On Fri, 29/12/2017 at 21:49 -0600, Josh Poimboeuf wrote:
> > > On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote:
> > > > (Also, Josh, the oops code should have printed the contents of the
> > > > struct pt_regs at the top of the DF stack.  Any idea why it
> > > > didn't?)
> > >
> > > Looking at one of the dumps:
> > >
> > >   [  392.774879] NMI backtrace for cpu 0
> > >   [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo
> > > #1
> > >   [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > >   [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
> > >   [  392.774885] RIP: 0010:double_fault+0x0/0x30
> > >   [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
> > >   [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX:
> > > 00000000c0000101
> > >   [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI:
> > > ffffffffff527f58
> > >   [  392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > > 0000000000000000
> > >   [  392.774888] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffffffff816ae726
> > >   [  392.774888] R13: 0000000000000000 R14: 0000000000000000 R15:
> > > 0000000000000000
> > >   [  392.774889] FS:  0000000000000000(0000)
> > > GS:ffff88023fc00000(0000) knlGS:0000000000000000
> > >   [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >   [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4:
> > > 00000000001606f0
> > >   [  392.774892] Call Trace:
> > >   [  392.774894]  <#DF>
> > >   [  392.774897]  do_double_fault+0xb/0x140
> > >   [  392.774898]  </#DF>
> > >
> > > It should have at least printed the #DF iret frame registers, which I
> > > recently added support for in "x86/unwinder: Handle stack overflows more
> > > gracefully", which is in both 4.14.9 and 4.15-rc5.
> > >
> > > I think the missing iret regs are due to a bug in show_trace_log_lvl(),
> > > where if the unwind starts with two regs frames in a row, the second
> > > regs don't get printed.
> > >
> > > Alexander, would you mind reproducing again with the below patch?  It
> > > should still fail, but this time it should hopefully show another
> > > RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line.
> > >
> >
> > Yes, it works:
> >
> > [   23.058064] NMI backtrace for cpu 2
> > [   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1
> > [   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 1.10.2-1.fc27 04/01/2014
> > [   23.058074] RIP: 0010:double_fault+0x0/0x30
> > [   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
> > [   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> > 00000000c0000101
> > [   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> > fffffe800005ff58
> > [   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > 0000000000000000
> > [   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffffff92001426
> > [   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15:
> > 0000000000000000
> > [   23.058083] FS:  0000000000000000(0000) GS:ffff96813fd00000(0000)
> > knlGS:0000000000000000
> > [   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4:
> > 00000000000406a0
> > [   23.058089] Call Trace:
> > [   23.058101]  <#DF>
> > [   23.058104] RIP: 0010:do_double_fault+0xb/0x140
> > [   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX:
> > 0000000000000000
> > [   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> > 00000000c0000101
> > [   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> > fffffe800005ff58
> > [   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > 0000000000000000
> > [   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffffff92001426
> > [   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15:
> > 0000000000000000
> > [   23.058111]  </#DF>
> > [   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69 06 00
> > 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00 00 0f 1f 44
> > 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7 48 8b 74 24 78 48
>
> That's better indeed, though still not quite right. It should have only
> shown a subset of those registers. One more bug to fix there...

Turns out my previous code to print iret frames was a bit ... misguided,
to put it nicely. Not sure what I was smoking.

Hopefully the below patch should fix it (in place of the previous
patch). Would you mind testing again?

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index c1688c2d0a12..1f86e1b0a5cd 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -56,18 +56,27 @@ void unwind_start(struct unwind_state *state, struct task_struct *task,

 #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
 /*
- * WARNING: The entire pt_regs may not be safe to dereference. In some cases,
- * only the iret frame registers are accessible. Use with caution!
+ * If 'partial' returns true, only the iret frame registers are valid.
  */
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+						    bool *partial)
 {
 	if (unwind_done(state))
 		return NULL;

+	if (partial) {
+#ifdef CONFIG_UNWINDER_ORC
+		*partial = !state->full_regs;
+#else
+		*partial = false;
+#endif
+	}
+
 	return state->regs;
 }
 #else
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+						    bool *partial)
 {
 	return NULL;
 }
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 36b17e0febe8..6e49d2e0c243 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -76,12 +76,11 @@ void show_iret_regs(struct pt_regs *regs)
 		regs->sp, regs->flags);
 }

-static void show_regs_safe(struct stack_info *info, struct pt_regs *regs)
+static void show_regs_full_or_partial(struct pt_regs *regs, bool partial)
 {
-	if (on_stack(info, regs, sizeof(*regs)))
+	if (!partial)
 		__show_regs(regs, 0);
-	else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET,
-			  IRET_FRAME_SIZE)) {
+	else {
 		/*
 		 * When an interrupt or exception occurs in entry code, the
 		 * full pt_regs might not have been saved yet. In that case
@@ -98,11 +97,13 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	struct stack_info stack_info = {0};
 	unsigned long visit_mask = 0;
 	int graph_idx = 0;
+	bool partial;

 	printk("%sCall Trace:\n", log_lvl);

 	unwind_start(&state, task, regs, stack);
 	stack = stack ? : get_stack_pointer(task, regs);
+	regs = unwind_get_entry_regs(&state, &partial);

 	/*
 	 * Iterate through the stacks, starting with the current stack pointer.
@@ -120,7 +121,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	 * - hardirq stack
 	 * - entry stack
 	 */
-	for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+	for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
 		const char *stack_name;

 		if (get_stack_info(stack, task, &stack_info, &visit_mask)) {
@@ -140,7 +141,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			printk("%s <%s>\n", log_lvl, stack_name);

 		if (regs)
-			show_regs_safe(&stack_info, regs);
+			show_regs_full_or_partial(regs, partial);

 		/*
 		 * Scan the stack, printing any text addresses we find. At the
@@ -164,7 +165,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,

 			/*
 			 * Don't print regs->ip again if it was already printed
-			 * by show_regs_safe() below.
+			 * by show_regs_full_or_partial() below.
 			 */
 			if (regs && stack == &regs->ip)
 				goto next;
@@ -199,9 +200,9 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unwind_next_frame(&state);

 			/* if the frame has entry regs, print them */
-			regs = unwind_get_entry_regs(&state);
+			regs = unwind_get_entry_regs(&state, &partial);
 			if (regs)
-				show_regs_safe(&stack_info, regs);
+				show_regs_full_or_partial(regs, partial);
 		}

 		if (stack_name)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 77835bc021c7..7dd0d2a0d142 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace,
 	for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
 	     unwind_next_frame(&state)) {

-		regs = unwind_get_entry_regs(&state);
+		regs = unwind_get_entry_regs(&state, NULL);
 		if (regs) {
 			/*
 			 * Kernel mode registers on the stack indicate an
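
The core of the patch above is an out-parameter: the unwinder tells the
caller whether the pt_regs it hands back are complete, rather than the
caller inferring that from stack bounds. A minimal userspace sketch of that
pattern follows. Everything prefixed mock_ is hypothetical and only stands
in for the kernel types; this is not the kernel code, just the same shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pt_regs, cut down to a few fields. */
struct mock_regs {
	unsigned long ip, sp, flags;	/* the iret frame part */
	unsigned long ax, bx;		/* valid only if the full frame was saved */
};

/* Hypothetical stand-in for struct unwind_state. */
struct mock_state {
	struct mock_regs *regs;		/* NULL if this frame has no entry regs */
	bool full_regs;			/* true if all of *regs is valid */
	bool done;			/* true once the unwind has finished */
};

/*
 * Same shape as the patched unwind_get_entry_regs(): return the regs and,
 * if the caller passed a non-NULL 'partial', report whether only the iret
 * frame registers may be used.
 */
static struct mock_regs *mock_get_entry_regs(struct mock_state *state,
					     bool *partial)
{
	if (state->done)
		return NULL;

	if (partial)
		*partial = !state->full_regs;

	return state->regs;
}

/* Print the full or the partial set; return how many registers were printed. */
static int mock_show_regs(const struct mock_regs *regs, bool partial)
{
	assert(regs != NULL);

	printf("RIP: %016lx RSP: %016lx EFLAGS: %08lx\n",
	       regs->ip, regs->sp, regs->flags);
	if (partial)
		return 3;

	printf("RAX: %016lx RBX: %016lx\n", regs->ax, regs->bx);
	return 5;
}
```

A caller that does not care about the distinction can pass NULL for
'partial', which mirrors the stacktrace.c hunk; a caller that prints
registers passes a bool and forwards it to the show function, as the
dumpstack.c hunks do.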

2017-12-30 22:03:44

by Alexander Tsoy

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On Sat, 30 Dec 2017 11:57:46 -0600
Josh Poimboeuf <[email protected]> wrote:

> On Sat, Dec 30, 2017 at 11:09:46AM -0600, Josh Poimboeuf wrote:
> > On Sat, Dec 30, 2017 at 11:45:13AM +0300, Alexander Tsoy wrote:
> > > On Fri, 29/12/2017 at 21:49 -0600, Josh Poimboeuf wrote:
> > > > On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski
> > > > wrote:
> > > > > (Also, Josh, the oops code should have printed the contents
> > > > > of the struct pt_regs at the top of the DF stack.  Any idea
> > > > > why it didn't?)
> > > >
> > > > Looking at one of the dumps:
> > > >
> > > >   [  392.774879] NMI backtrace for cpu 0
> > > > >   [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo #1
> > > > >   [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > > > >   [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
> > > > >   [  392.774885] RIP: 0010:double_fault+0x0/0x30
> > > >   [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
> > > >   [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001
> > > > RCX: 00000000c0000101
> > > >   [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000
> > > > RDI: ffffffffff527f58
> > > >   [  392.774887] RBP: 0000000000000000 R08: 0000000000000000
> > > > R09: 0000000000000000
> > > >   [  392.774888] R10: 0000000000000000 R11: 0000000000000000
> > > > R12: ffffffff816ae726
> > > >   [  392.774888] R13: 0000000000000000 R14: 0000000000000000
> > > > R15: 0000000000000000
> > > >   [  392.774889] FS:  0000000000000000(0000)
> > > > GS:ffff88023fc00000(0000) knlGS:0000000000000000
> > > > >   [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >   [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4: 00000000001606f0
> > > >   [  392.774892] Call Trace:
> > > >   [  392.774894]  <#DF>
> > > >   [  392.774897]  do_double_fault+0xb/0x140
> > > >   [  392.774898]  </#DF>
> > > >
> > > > It should have at least printed the #DF iret frame registers,
> > > > which I recently added support for in "x86/unwinder: Handle
> > > > stack overflows more
> > > > gracefully", which is in both 4.14.9 and 4.15-rc5.
> > > >
> > > > I think the missing iret regs are due to a bug in
> > > > show_trace_log_lvl(),
> > > > where if the unwind starts with two regs frames in a row, the
> > > > second regs don't get printed.
> > > >
> > > > Alexander, would you mind reproducing again with the below
> > > > patch?  It should still fail, but this time it should hopefully
> > > > show another RIP/RSP/EFLAGS instead of the
> > > > "do_double_fault+0xb/0x140" line.
> > >
> > > Yes, it works:
> > >
> > > [   23.058064] NMI backtrace for cpu 2
> > > [   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1
> > > [   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > > 1996), BIOS 1.10.2-1.fc27 04/01/2014
> > > [   23.058074] RIP: 0010:double_fault+0x0/0x30
> > > [   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
> > > [   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> > > 00000000c0000101
> > > [   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> > > fffffe800005ff58
> > > [   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffffffff92001426
> > > [   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15:
> > > 0000000000000000
> > > [   23.058083] FS:  0000000000000000(0000)
> > > GS:ffff96813fd00000(0000) knlGS:0000000000000000
> > > [   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4:
> > > 00000000000406a0
> > > [   23.058089] Call Trace:
> > > [   23.058101]  <#DF>
> > > [   23.058104] RIP: 0010:do_double_fault+0xb/0x140
> > > [   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086
> > > ORIG_RAX: 0000000000000000
> > > [   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> > > 00000000c0000101
> > > [   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> > > fffffe800005ff58
> > > [   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffffffff92001426
> > > [   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15:
> > > 0000000000000000
> > > [   23.058111]  </#DF>
> > > [   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69
> > > 06 00 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00
> > > 00 0f 1f 44 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7
> > > 48 8b 74 24 78 48
> >
> > That's better indeed, though still not quite right. It should have
> > only shown a subset of those registers. One more bug to fix
> > there...
>
> Turns out my previous code to print iret frames was a bit ...
> misguided, to put it nicely. Not sure what I was smoking.
>
> Hopefully the below patch should fix it (in place of the previous
> patch). Would you mind testing again?
>

With that patch I get:

[ 2.160017] NMI backtrace for cpu 0
[ 2.160017] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc5 #1
[ 2.160017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[ 2.160017] RIP: 0010:double_fault+0x0/0x30
[ 2.160017] RSP: 0000:fffffe8000007fd0 EFLAGS: 00010086
[ 2.160017] RAX: 00000000ffc00000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 2.160017] RDX: 00000000ffff8edc RSI: 0000000000000000 RDI: fffffe8000007f58
[ 2.160017] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2.160017] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa3c01426
[ 2.160017] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2.160017] FS: 0000000000000000(0000) GS:ffff8edcffc00000(0000) knlGS:0000000000000000
[ 2.160017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.160017] CR2: fffffe8000006f08 CR3: 000000007c153000 CR4: 00000000000006b0
[ 2.160017] Call Trace:
[ 2.160017] <#DF>
[ 2.160017] RIP: 0010:do_double_fault+0xb/0x140
[ 2.160017] RSP: 0000:fffffe8000006f18 EFLAGS: 00010086
[ 2.160017] </#DF>

--
Best regards,
Alexander Tsoy

2017-12-30 22:16:54

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: 4.14.9 doesn't boot (regression)

On Sun, Dec 31, 2017 at 01:03:25AM +0300, Alexander Tsoy wrote:
> > Turns out my previous code to print iret frames was a bit ...
> > misguided, to put it nicely. Not sure what I was smoking.
> >
> > Hopefully the below patch should fix it (in place of the previous
> > patch). Would you mind testing again?
> >
>
> With that patch I get:
>
> [ 2.160017] NMI backtrace for cpu 0
> [ 2.160017] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc5 #1
> [ 2.160017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
> [ 2.160017] RIP: 0010:double_fault+0x0/0x30
> [ 2.160017] RSP: 0000:fffffe8000007fd0 EFLAGS: 00010086
> [ 2.160017] RAX: 00000000ffc00000 RBX: 0000000000000001 RCX: 00000000c0000101
> [ 2.160017] RDX: 00000000ffff8edc RSI: 0000000000000000 RDI: fffffe8000007f58
> [ 2.160017] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 2.160017] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa3c01426
> [ 2.160017] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 2.160017] FS: 0000000000000000(0000) GS:ffff8edcffc00000(0000) knlGS:0000000000000000
> [ 2.160017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.160017] CR2: fffffe8000006f08 CR3: 000000007c153000 CR4: 00000000000006b0
> [ 2.160017] Call Trace:
> [ 2.160017] <#DF>
> [ 2.160017] RIP: 0010:do_double_fault+0xb/0x140
> [ 2.160017] RSP: 0000:fffffe8000006f18 EFLAGS: 00010086
> [ 2.160017] </#DF>

Yes, that's more like it. I'll clean up the patches and submit them
soon. These nasty bugs are always a good testcase for the stack dump
code.

Thanks for testing!

--
Josh