LinuxLists.cc - 4.15-rc6+ hang

2018-01-05 02:13:02

Subject: 4.15-rc6+ hang

I am seeing a hang running kernel 4.15-rc6+ on a vanilla VirtualBox VM.
(VirtualBox version 5.0.40)

I am running Linus' main git branch as of this commit, and my config is
attached.
    commit e1915c8195b38393005be9b74bfa6a3a367c83b3
    Merge: 00a5ae2 abb62c4
    Author: Linus Torvalds <[email protected]>
    Date:   Thu Jan 4 11:14:36 2018 -0800

My kernel command line:
BOOT_IMAGE=/vmlinuz-4.15.0-rc6+ root=/dev/mapper/ol-root ro
    crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap
    LANG=en_US.UTF-8 console=tty0 console=ttyS0,115200n8

When I boot this kernel, it hangs and falls back into dracut. Here's
the boot log immediately prior to the hang:
[    8.195533] Segment Routing with IPv6
[    8.195545] NET: Registered protocol family 17
[    8.420746] sched_clock: Marking stable (8420621166, 0)->(18132394441, -9711773275)
[    8.642271] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
[ OK ] Reached target System Initialization.
[ OK ] Started Show Plymouth Boot Screen.
[ OK ] Reached target Paths.
[ OK ] Reached target Basic System.
[ 145.607869] dracut-initqueue[314]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 146.185075] dracut-initqueue[314]: Warning: dracut-initqueue timeout - starting timeout scripts

And here's a snippet from dmesg in dracut:
dracut:/# dmesg | less
[    0.000000] Linux version 4.15.0-rc6+ (kernel@OracleL460) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-68
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.15.0-rc6+ root=/dev/mapper/ol-root ro crashkernel=auo8
[    0.000000] ------------[ cut here ]------------
[    0.000000] XSAVE consistency problem, dumping leaves
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:614 fpu__init_system_xstate+0x374/b
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc6+ #10
[    0.000000] RIP: 0010:fpu__init_system_xstate+0x374/0x8bb
[    0.000000] RSP: 0000:ffffffff82003de0 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[    0.000000] RAX: 0000000000000000 RBX: ffffffff82003e0c RCX: ffffffff82061868
[    0.000000] RDX: 0000000000000001 RSI: 0000000000000092 RDI: 0000000000000002
[    0.000000] RBP: 000000000000000a R08: 657661656c20676e R09: 0000000000000003
[    0.000000] R10: ffffffff8221dd00 R11: 69706d7564202c6d R12: 0000000000000100
[    0.000000] R13: 0000000000000340 R14: 0000000000000440 R15: ffffffff82003e00
[    0.000000] FS: 0000000000000000(0000) GS:ffffffff8254f000(0000) knlGS:0000000000000000
[    0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ffff8800000945a0 CR3: 00000000025e6000 CR4: 00000000000406a0
[    0.000000] Call Trace:
[    0.000000] ? fpu__init_system+0x224/0x27c
[    0.000000] ? intel_detect_tlb+0x480/0x4b0
[    0.000000] ? early_init_intel+0xe8/0x320
[    0.000000] ? setup_arch+0xb7/0xc8d
[    0.000000] ? setup_arch+0xb7/0xc8d
[    0.000000] ? start_kernel+0x69/0x4a6
[    0.000000] ? x86_family+0x5/0x20
[    0.000000] ? load_ucode_bsp+0x118/0x134
[    0.000000] ? secondary_startup_64+0xa5/0xb0
3d 2e 97 aa ff 00 74 05 e8 7b 56 ab fe 48 8b 35 20 b2
x50 with crng_init=0
[    0.000000] ---[ end trace 6ecf7087f2750797 ]---

I scraped the screen and have attached the kernel's dmesg as well.

Thanks.

Tom

Attachments:

dmesg (44.52 kB)
config-4.15-rc6+ (170.64 kB)

2018-01-05 02:23:01

by Linus Torvalds

Subject: Re: 4.15-rc6+ hang

On Thu, Jan 4, 2018 at 5:36 PM, Tom Hromatka <[email protected]> wrote:
> I am seeing a hang running kernel 4.15-rc6+ on a vanilla VirtualBox VM.
> (VirtualBox version 5.0.40)

Any chance of bisecting this?

I could imagine that all the stuff we now do for page table isolation
might confuse the VM.

> When I boot this kernel, it hangs and falls back into dracut. Here's
> the boot log immediately prior to the hang:

So a few questions:

(a) does it work with "pti=no" on the kernel command line

(b) what was the last kernel that worked? Is 4.15-rc5 fine, for example?

> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] XSAVE consistency problem, dumping leaves

I think this is a vbox issue, with virtualbox not exposing all the
xsave state, so that when the kernel adds up the xsave areas, the end
result doesn't match what the total size is reported to be.

I suspect you _should_ have gotten that before too, independently of the hang.

Linus
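
As a rough illustration of the size check described in the message above (a sketch, not the kernel's actual code): in the standard, non-compacted XSAVE layout, the buffer starts with a 512-byte legacy area and a 64-byte header, and each extended component sits at a fixed offset with a fixed size. Adding these up should give the total size the CPU reports; if a hypervisor reports a total that does not match the components it exposes, the two disagree. The function names and the mismatching value 960 below are made up for the example; the AVX offset and size (576, 256) are the architectural values.

```python
# Sketch of an XSAVE size consistency check (standard, non-compacted format).
# Function names are hypothetical; this is not the kernel's implementation.

LEGACY_AREA_SIZE = 512   # FXSAVE region holding x87 and SSE state
XSAVE_HEADER_SIZE = 64   # XSAVE header that follows the legacy region


def expected_xsave_size(components):
    """Return the buffer size implied by the enabled extended components.

    `components` is an iterable of (offset, size) pairs, one per enabled
    extended state component, as CPUID would report them.
    """
    end = LEGACY_AREA_SIZE + XSAVE_HEADER_SIZE
    for offset, size in components:
        # The buffer must reach the end of the furthest component.
        end = max(end, offset + size)
    return end


def is_consistent(components, reported_size):
    """True if the summed-up layout matches the reported total size."""
    return expected_xsave_size(components) == reported_size


avx = (576, 256)                      # AVX state: offset 576, size 256
print(expected_xsave_size([avx]))     # layout with x87 + SSE + AVX
print(is_consistent([avx], 832))      # total matches the components
print(is_consistent([avx], 960))      # made-up mismatching total
```

A mismatch of this kind is what would produce a warning like the one in the dmesg output, under the assumption that the hypervisor hides some components while still reporting the host's total size.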

2018-01-05 04:07:49

by Tom Hromatka

Subject: Re: 4.15-rc6+ hang

On 01/04/2018 07:22 PM, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 5:36 PM, Tom Hromatka <[email protected]> wrote:
>> I am seeing a hang running kernel 4.15-rc6+ on a vanilla VirtualBox VM.
>> (VirtualBox version 5.0.40)
> Any chance of bisecting this?
>
> I could imagine that all the stuff we now do for page table isolation
> might confuse the VM.

Yes, I can try and bisect this.

>> When I boot this kernel, it hangs and falls back into dracut. Here's
>> the boot log immediately prior to the hang:
> So a few questions:
>
> (a) does it work with "pti=no" on the kernel command line

pti=no also hung in the same fashion with the 4.15-rc6+
kernel.

>
> (b) what was the last kernel that worked? Is 4.15-rc5 fine, for example?

4.15-rc5 hung as well. I'll go further back and see what I
can find.

>> [ 0.000000] ------------[ cut here ]------------
>> [ 0.000000] XSAVE consistency problem, dumping leaves
> I think this is a vbox issue, with virtualbox not exposing all the
> xsave state, so that when the kernel adds up the xsave areas, the end
> result doesn't match what the total size is reported to be.

It seems probable that this is a VirtualBox issue. I was
able to boot my exact 4.15-rc6+ kernel in qemu-kvm v1.5.3
just fine.

>
> I suspect you _should_ have gotten that before too, independently of the hang.

4.15-rc5 also exhibits the xsave issue in VirtualBox.

Thanks.

Tom

>
> Linus
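
One way to compare what the two hypervisors expose to the guest, sketched here as an assumption about how one might investigate rather than something done in this thread, is to look at the XSAVE-related flags in /proc/cpuinfo inside each guest. The helper names below are hypothetical; the flag names are the ones Linux prints on x86.

```python
# Sketch: report which XSAVE-related CPU flags a guest sees.
# Helper names are hypothetical; input is the text of /proc/cpuinfo.

XSAVE_RELATED = ("xsave", "xsaveopt", "xsavec", "xsaves", "avx", "avx2")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def xsave_report(cpuinfo_text):
    """Map each XSAVE-related flag name to whether the guest advertises it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in XSAVE_RELATED}
```

Calling `xsave_report(open("/proc/cpuinfo").read())` in the VirtualBox guest and in the qemu-kvm guest and comparing the two results would show whether the set of advertised features differs between them.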

2018-01-07 10:51:00

by Christian Kujau

Subject: Re: 4.15-rc6+ hang

On Thu, 4 Jan 2018, Tom Hromatka wrote:
> > > [ 0.000000] ------------[ cut here ]------------
> > > [ 0.000000] XSAVE consistency problem, dumping leaves
> > I think this is a vbox issue, with virtualbox not exposing all the
> > xsave state, so that when the kernel adds up the xsave areas, the end
> > result doesn't match what the total size is reported to be.
>
> It seems probable that this is a VirtualBox issue. I was
> able to boot my exact 4.15-rc6+ kernel in qemu-kvm v1.5.3
> just fine.

This was discussed on vbox-dev back in May 2017 (see the whole thread for
more details):

https://www.virtualbox.org/pipermail/vbox-dev/2017-May/014466.html

Does that help?

Christian.
--
BOFH excuse #9:

doppler effect

2018-01-08 18:55:01

by Ingo Molnar

Subject: Re: 4.15-rc6+ hang

* Christian Kujau <[email protected]> wrote:

> On Thu, 4 Jan 2018, Tom Hromatka wrote:
> > > > [ 0.000000] ------------[ cut here ]------------
> > > > [ 0.000000] XSAVE consistency problem, dumping leaves
> > > I think this is a vbox issue, with virtualbox not exposing all the
> > > xsave state, so that when the kernel adds up the xsave areas, the end
> > > result doesn't match what the total size is reported to be.
> >
> > > It seems probable that this is a VirtualBox issue. I was
> > able to boot my exact 4.15-rc6+ kernel in qemu-kvm v1.5.3
> > just fine.
>
> This was discussed on vbox-dev back in May 2017 (see the whole thread for
> more details):
>
> https://www.virtualbox.org/pipermail/vbox-dev/2017-May/014466.html
>
> Does that help?

So, unless I've managed to confuse myself navigating that horrible email archive
web interface, the conclusion of that discussion appears to be a workaround: by
adding 'nofxsave' to the guest kernel boot line the warning+boot-hang goes away?

It would be nice to root cause this bug and offer a more permanent solution. We
can even apply VirtualBox quirks to the upstream kernel, within reason - i.e. as
long as it does not get overly ugly and does not impact other uses.

Thanks,

Ingo