2008-08-09 21:03:33

by Bruno Prémont

Subject: [2.6.27-rc2-git4] Kernel panic on VIA Ester+VIA CX700

Hi,

Trying out 2.6.27-rc2-git4 on a VIA Ester + CX700 based system, I
hit the following panic using a config obtained with make
oldconfig from a working 2.6.26 config.

Judging from the traces and from when it happens, it looks like it could be
libATA or SCSI related...


Note: the kernel is patched with the viafb patches sent yesterday (built in, [y])
and squashfs (as a module, <m>).

lspci:
00:00.0 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:0324] (rev 03)
00:00.1 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:1324]
00:00.2 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:2324]
00:00.3 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:3324]
00:00.4 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:4324]
00:00.7 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:7324]
00:01.0 PCI bridge [0604]: VIA Technologies, Inc. VT8237 PCI Bridge [1106:b198]
00:0f.0 IDE interface [0101]: VIA Technologies, Inc. Unknown device [1106:0581]
00:10.0 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 90)
00:10.1 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 90)
00:10.2 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 90)
00:10.4 USB Controller [0c03]: VIA Technologies, Inc. USB 2.0 [1106:3104] (rev 90)
00:11.0 ISA bridge [0601]: VIA Technologies, Inc. CX700 PCI to ISA Bridge [1106:8324]
00:11.7 Host bridge [0600]: VIA Technologies, Inc. CX700 Internal Module Bus [1106:324e]
00:13.0 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:324b]
00:13.1 PCI bridge [0604]: VIA Technologies, Inc. CX700 PCI to PCI Bridge [1106:324a]
01:00.0 VGA compatible controller [0300]: VIA Technologies, Inc. CX700M2 UniChrome PRO II Graphics [1106:3157] (rev 03)
02:08.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet [10ec:8169] (rev 10)
80:01.0 Audio device [0403]: VIA Technologies, Inc. VIA High Definition Audio Controller [1106:3288] (rev 10)

Attached are the boot messages as captured with netconsole, and the config file.

Bruno


Attachments:
(No filename) (2.14 kB)
venus-2.6.27-rc2.4.dmesg (23.11 kB)
config-2.6.27-rc2-git4 (52.46 kB)

2008-08-09 21:57:52

by Rafael J. Wysocki

Subject: Re: [2.6.27-rc2-git4] Kernel panic on VIA Ester+VIA CX700

On Saturday, 9 of August 2008, Bruno Prémont wrote:
> Hi,

Hi,

> Trying out 2.6.27-rc2-git4 on a VIA Ester + CX700 based system I
> experience the following panic using a config obtained with make
> oldconfig from working 2.6.26 config.
>
> Looking at the traces and when it happens it looks like it could be
> libATA or SCSI related...
>
>
> Note: the kernel is patched with viafb patches sent yesterday ([y]) and
> squashfs (<m>)

Could you retest without these two, please?

Thanks,
Rafael

2008-08-09 22:02:21

by Al Viro

Subject: Re: [2.6.27-rc2-git4] Kernel panic on VIA Ester+VIA CX700

On Sun, Aug 10, 2008 at 12:00:24AM +0200, Rafael J. Wysocki wrote:
> On Saturday, 9 of August 2008, Bruno Prémont wrote:
> > Hi,
>
> Hi,
>
> > Trying out 2.6.27-rc2-git4 on a VIA Ester + CX700 based system I
> > experience the following panic using a config obtained with make
> > oldconfig from working 2.6.26 config.
> >
> > Looking at the traces and when it happens it looks like it could be
> > libATA or SCSI related...
> >
> >
> > Note: the kernel is patched with viafb patches sent yesterday ([y]) and
> > squashfs (<m>)
>
> Could you retest without these two, please?

There's another fun candidate: lazy allocation of FPU state. Do you have
padlock-aes/padlock-sha/via-rng in use?

2008-08-09 22:13:50

by Bruno Prémont

Subject: Re: [2.6.27-rc2-git4] Kernel panic on VIA Ester+VIA CX700

On Sat, 09 August 2008 Al Viro <[email protected]> wrote:
> On Sun, Aug 10, 2008 at 12:00:24AM +0200, Rafael J. Wysocki wrote:
> > On Saturday, 9 of August 2008, Bruno Prémont wrote:
> > > Hi,
> >
> > Hi,
> >
> > > Trying out 2.6.27-rc2-git4 on a VIA Ester + CX700 based system I
> > > experience the following panic using a config obtained with make
> > > oldconfig from working 2.6.26 config.
> > >
> > > Looking at the traces and when it happens it looks like it could
> > > be libATA or SCSI related...
> > >
> > >
> > > Note: the kernel is patched with viafb patches sent yesterday
> > > ([y]) and squashfs (<m>)
> >
> > Could you retest without these two, please?
>
> There's another fun candidate: lazy allocation of FPU state. Do you
> have padlock-aes/padlock-sha/via-rng in use?
>
They are compiled into the kernel and initialize under 2.6.26.2 as well,
though I have no idea whether the kernel uses them beyond that...

The crash happens well before userspace gets started, so apart from
kernel-side users there should be nothing using them.

Tomorrow I will retest without each of these to determine which one is
the culprit:
- no crypto (padlock/via-rng)
- no viafb
- no squashfs (though it should be pretty harmless)

Bruno

2008-08-10 23:46:48

by H. Peter Anvin

Subject: Re: [2.6.27-rc2-git4] Kernel panic on VIA Ester+VIA CX700

Bruno Prémont wrote:
>
> Recompiling without viafb+squashfs patches makes the panic go away.
>
> So something from viafb or squashfs triggers the panic, or sets things up
> for something else to trigger it...
>

Of those, viafb seems by far the most likely. Could you try compiling
with only one or the other?

-hpa

2008-08-11 18:18:27

by Suresh Siddha

Subject: Re: [2.6.27-rc2-git4] Kernel panic on VIA Ester+VIA CX700

On Sun, Aug 10, 2008 at 04:39:23PM -0700, H. Peter Anvin wrote:
> Bruno Prémont wrote:
> >
> > Recompiling without viafb+squashfs patches makes the panic go away.
> >
> > So something from viafb or squashfs triggers the panic, or sets things up
> > for something else to trigger it...
> >
>
> Of those, viafb seems by far the most likely. Could you try compiling
> with only one or the other?

[ 5.010629] general protection fault: 0000 [#1]
[ 5.021782] Modules linked in:
[ 5.030227]
[ 5.030227] Pid: 3, comm: ksoftirqd/0 Not tainted (2.6.27-rc2-git4_nocrypto #1)
[ 5.030227] EIP: 0060:[<c01042f5>] EFLAGS: 00010046 CPU: 0
[ 5.030227] EIP is at math_state_restore+0x25/0x60
[ 5.030227] EAX: f781db3d EBX: f781d898 ECX: 00000000 EDX: 00000000
[ 5.030227] ESI: f781d000 EDI: f782c20c EBP: f781d840 ESP: f781d838
[ 5.030227] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 5.030227] Process ksoftirqd/0 (pid: 3, ti=f781d000 task=f782c6a0 task.ti=f782a000)
[ 5.030227] Stack: f782c000 f782c6a0 f781d890 c01038dd f782c000 00000000 f782c000 f782c6a0
[ 5.030227] f782c20c f781d890 f7807e00 0000007b 0000007b 00000000 ffffffff c0101e8f
[ 5.030227] 00000060 00010002 f782c8ac f782c000 c04d8180 c011fa50 f782afc0 c03cfa61
[ 5.030227] Call Trace:
[ 5.030227] [<c01038dd>] ? device_not_available+0x2d/0x32
[ 5.030227] [<c0101e8f>] ? __switch_to+0x2f/0x130

It got a GP fault because, in __switch_to(), we were doing unlazy_fpu() and
fxsave generated a DNA (device not available) fault, which shouldn't happen
unless we are hitting the VIA padlock instruction issue or something else.
math_state_restore() then found the task's math state pointer to be
0xf781db3d (EAX in the oops), and fxrstor raised a GP fault because that
pointer is not 16-byte aligned.

Interestingly, the EAX value is close to the stack pointer. The task's FP
area is dynamically allocated, so EAX definitely looks wrong here. I also see
that the config uses 4K stacks. Could some config option (viafb/squashfs?) be
causing something to go wrong with the kernel stack?

2008-08-11 20:33:35

by Bruno Prémont

Subject: Re: [2.6.27-rc2-git4] Kernel panic on VIA Ester+VIA CX700

On Mon, 11 August 2008 Suresh Siddha <[email protected]> wrote:
> On Sun, Aug 10, 2008 at 04:39:23PM -0700, H. Peter Anvin wrote:
> > Bruno Prémont wrote:
> > >
> > > Recompiling without viafb+squashfs patches makes the panic go
> > > away.
> > >
> > > So something from viafb or squashfs triggers the panic, or sets
> > > things up for something else to trigger it...
> > >
> >
> > Of those, viafb seems by far the most likely. Could you try
> > compiling with only one or the other?
>
> [ 5.010629] general protection fault: 0000 [#1]
> [ 5.021782] Modules linked in:
> [ 5.030227]
> [ 5.030227] Pid: 3, comm: ksoftirqd/0 Not tainted (2.6.27-rc2-git4_nocrypto #1)
> [ 5.030227] EIP: 0060:[<c01042f5>] EFLAGS: 00010046 CPU: 0
> [ 5.030227] EIP is at math_state_restore+0x25/0x60
> [ 5.030227] EAX: f781db3d EBX: f781d898 ECX: 00000000 EDX: 00000000
> [ 5.030227] ESI: f781d000 EDI: f782c20c EBP: f781d840 ESP: f781d838
> [ 5.030227] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 5.030227] Process ksoftirqd/0 (pid: 3, ti=f781d000 task=f782c6a0 task.ti=f782a000)
> [ 5.030227] Stack: f782c000 f782c6a0 f781d890 c01038dd f782c000 00000000 f782c000 f782c6a0
> [ 5.030227] f782c20c f781d890 f7807e00 0000007b 0000007b 00000000 ffffffff c0101e8f
> [ 5.030227] 00000060 00010002 f782c8ac f782c000 c04d8180 c011fa50 f782afc0 c03cfa61
> [ 5.030227] Call Trace:
> [ 5.030227] [<c01038dd>] ? device_not_available+0x2d/0x32
> [ 5.030227] [<c0101e8f>] ? __switch_to+0x2f/0x130
>
> It got a GP fault because, in __switch_to(), we were doing
> unlazy_fpu() and fxsave generated a DNA (device not available) fault,
> which shouldn't happen unless we are hitting the VIA padlock
> instruction issue or something else. math_state_restore() then found
> the task's math state pointer to be 0xf781db3d (EAX in the oops), and
> fxrstor raised a GP fault because that pointer is not 16-byte aligned.
>
> Interestingly, the EAX value is close to the stack pointer. The task's
> FP area is dynamically allocated, so EAX definitely looks wrong here.
> I also see that the config uses 4K stacks. Could some config option
> (viafb/squashfs?) be causing something to go wrong with the kernel
> stack?

That's quite possible...

I just recompiled with some more stack debugging enabled (which didn't
help), then disabled 4K stacks, and now the system boots...

Is there anything I forgot to enable to get a stack trace at the moment
the stack overflows, rather than at some random time later on?
I am also wondering why maximum stack usage is only printed for userspace
apps or kernel threads once init is running... Is stack usage not checked
earlier during the boot process?

Changes relative to the posted config:
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
-# CONFIG_DEBUG_STACK_USAGE is not set
+CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
# CONFIG_DEBUG_RODATA_TEST is not set
# CONFIG_DEBUG_NX_TEST is not set
-CONFIG_4KSTACKS=y
+# CONFIG_4KSTACKS is not set
CONFIG_DOUBLEFAULT=y
# CONFIG_MMIOTRACE is not set
CONFIG_IO_DELAY_TYPE_0X80=0

Attached are boot logs with 4K and 8K stacks, using the above config diff
(with CONFIG_4KSTACKS set accordingly).

Bruno


Attachments:
(No filename) (3.19 kB)
venus-2.6.27-rc2.4-stack4k (23.41 kB)
venus-2.6.27-rc2.4-stack8k (21.74 kB)