2006-02-06 12:22:35

by Neal Becker

Subject: 2.6.16-rc1 panic on startup (acpi)

Sorry, I meant 2.6.16-rc1 (not 2.6.12)

Neal Becker wrote:

> HP dv8000 notebook
> 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
>
> Here is a picture of some traceback
> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view



2006-02-07 22:57:41

by Andrew Morton

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

Neal Becker wrote:
>
> Sorry, I meant 2.6.16-rc1 (not 2.6.12)
>
> Neal Becker wrote:
>
> > HP dv8000 notebook
> > 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
> >
> > Here is a picture of some traceback
> > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view
>
>

It died in pci_mmcfg_read(). Greg, didn't a crash in there get fixed recently?

Neal, booting with `vga=extended' or similar will help prevent the oops
from scrolling off the display.

2006-02-08 00:10:41

by Andi Kleen

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Wednesday 08 February 2006 01:07, Greg KH wrote:

> > In the meantime, here's what I got..
> >
> > http://people.redhat.com/davej/DSC00148.JPG
>
> Andi, didn't your change for this function make it into Linus's tree?

Yes

See
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1de6bf33bc4601d856c286ad5c7d515468e24bbb

Workaround is pci=nommconf btw


-Andi

2006-02-07 23:31:21

by Dave Jones

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Tue, Feb 07, 2006 at 03:18:35PM -0800, Greg Kroah-Hartman wrote:
> On Tue, Feb 07, 2006 at 02:59:13PM -0800, Andrew Morton wrote:
> > Neal Becker wrote:
> > >
> > > Sorry, I meant 2.6.16-rc1 (not 2.6.12)
> > >
> > > Neal Becker wrote:
> > >
> > > > HP dv8000 notebook
> > > > 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
> > > >
> > > > Here is a picture of some traceback
> > > > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view
> > >
> > >
> >
> > It died in pci_mmcfg_read(). Greg, didn't a crash in there get fixed recently?
>
> Yes. Can you try 2.6.16-rc2? Is this an x86-64 machine?

I can hit this on my dv8000 too. It's still there in 2.6.16-rc2-git3.
I'm building a kernel with Randy's 'pause after printk' patch right now
to catch the top of the oops. It's enormous. Even with a 50-line display
and x86-64's dual-line backtrace, it scrolls off the top.

Dave

2006-02-07 23:32:35

by Neal Becker

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Tuesday 07 February 2006 6:18 pm, Greg KH wrote:
> On Tue, Feb 07, 2006 at 02:59:13PM -0800, Andrew Morton wrote:
> > Neal Becker wrote:
> > > Sorry, I meant 2.6.16-rc1 (not 2.6.12)
> > >
> > > Neal Becker wrote:
> > > > HP dv8000 notebook
> > > > 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
> > > >
> > > > Here is a picture of some traceback
> > > > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view
> >
> > It died in pci_mmcfg_read(). Greg, didn't a crash in there get fixed
> > recently?
>
> Yes. Can you try 2.6.16-rc2? Is this an x86-64 machine?
>
> thanks,
>

This is 2.6.16-rc2. Yes, this is an x86-64. I have a new picture; I put it
here:

http://nbecker.dyndns.org:8080/imgp0361.jpg

2006-02-08 00:07:31

by Greg KH

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Tue, Feb 07, 2006 at 06:40:43PM -0500, Dave Jones wrote:
> On Tue, Feb 07, 2006 at 03:35:31PM -0800, Randy.Dunlap wrote:
> > On Tue, 7 Feb 2006, Dave Jones wrote:
> >
> > > On Tue, Feb 07, 2006 at 03:18:35PM -0800, Greg Kroah-Hartman wrote:
> > > > On Tue, Feb 07, 2006 at 02:59:13PM -0800, Andrew Morton wrote:
> > > > > Neal Becker wrote:
> > > > > >
> > > > > > Sorry, I meant 2.6.16-rc1 (not 2.6.12)
> > > > > >
> > > > > > Neal Becker wrote:
> > > > > >
> > > > > > > HP dv8000 notebook
> > > > > > > 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
> > > > > > >
> > > > > > > Here is a picture of some traceback
> > > > > > > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view
> > > > > >
> > > > > >
> > > > >
> > > > > It died in pci_mmcfg_read(). Greg, didn't a crash in there get fixed recently?
> > > >
> > > > Yes. Can you try 2.6.16-rc2? Is this an x86-64 machine?
> > >
> > > I can hit this on my dv8000 too. It's still there in 2.6.16-rc2-git3.
> > > I'm building a kernel with Randy's 'pause after printk' patch right now
> > > to catch the top of the oops. It's enormous. Even with a 50-line display
> > > and x86-64's dual-line backtrace, it scrolls off the top.
> >
> > Just be patient. A boot can take a few minutes... ;)
>
> It doesn't get that far. What did bugger things up though was the NMI watchdog
> kicking in. I've thrown a touch_nmi_watchdog in the delay, and kicked off another build
> hoping for a cleaner dump.
>
> In the meantime, here's what I got..
>
> http://people.redhat.com/davej/DSC00148.JPG

Andi, didn't your change for this function make it into Linus's tree?

thanks,

greg k-h

2006-02-07 23:35:36

by Randy Dunlap

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Tue, 7 Feb 2006, Dave Jones wrote:

> On Tue, Feb 07, 2006 at 03:18:35PM -0800, Greg Kroah-Hartman wrote:
> > On Tue, Feb 07, 2006 at 02:59:13PM -0800, Andrew Morton wrote:
> > > Neal Becker wrote:
> > > >
> > > > Sorry, I meant 2.6.16-rc1 (not 2.6.12)
> > > >
> > > > Neal Becker wrote:
> > > >
> > > > > HP dv8000 notebook
> > > > > 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
> > > > >
> > > > > Here is a picture of some traceback
> > > > > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view
> > > >
> > > >
> > >
> > > It died in pci_mmcfg_read(). Greg, didn't a crash in there get fixed recently?
> >
> > Yes. Can you try 2.6.16-rc2? Is this an x86-64 machine?
>
> I can hit this on my dv8000 too. It's still there in 2.6.16-rc2-git3.
> I'm building a kernel with Randy's 'pause after printk' patch right now
> to catch the top of the oops. It's enormous. Even with a 50-line display
> and x86-64's dual-line backtrace, it scrolls off the top.

Just be patient. A boot can take a few minutes... ;)

--
~Randy

2006-02-07 23:40:55

by Dave Jones

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Tue, Feb 07, 2006 at 03:35:31PM -0800, Randy.Dunlap wrote:
> On Tue, 7 Feb 2006, Dave Jones wrote:
>
> > On Tue, Feb 07, 2006 at 03:18:35PM -0800, Greg Kroah-Hartman wrote:
> > > On Tue, Feb 07, 2006 at 02:59:13PM -0800, Andrew Morton wrote:
> > > > Neal Becker wrote:
> > > > >
> > > > > Sorry, I meant 2.6.16-rc1 (not 2.6.12)
> > > > >
> > > > > Neal Becker wrote:
> > > > >
> > > > > > HP dv8000 notebook
> > > > > > 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
> > > > > >
> > > > > > Here is a picture of some traceback
> > > > > > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view
> > > > >
> > > > >
> > > >
> > > > It died in pci_mmcfg_read(). Greg, didn't a crash in there get fixed recently?
> > >
> > > Yes. Can you try 2.6.16-rc2? Is this an x86-64 machine?
> >
> > I can hit this on my dv8000 too. It's still there in 2.6.16-rc2-git3.
> > I'm building a kernel with Randy's 'pause after printk' patch right now
> > to catch the top of the oops. It's enormous. Even with a 50-line display
> > and x86-64's dual-line backtrace, it scrolls off the top.
>
> Just be patient. A boot can take a few minutes... ;)

It doesn't get that far. What did bugger things up though was the NMI watchdog
kicking in. I've thrown a touch_nmi_watchdog in the delay, and kicked off another build
hoping for a cleaner dump.

In the meantime, here's what I got..

http://people.redhat.com/davej/DSC00148.JPG


Dave


2006-02-07 23:18:21

by Greg KH

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Tue, Feb 07, 2006 at 02:59:13PM -0800, Andrew Morton wrote:
> Neal Becker wrote:
> >
> > Sorry, I meant 2.6.16-rc1 (not 2.6.12)
> >
> > Neal Becker wrote:
> >
> > > HP dv8000 notebook
> > > 2.6.15 is fine, but 2.6.12-rc1 panics immediately on startup
> > >
> > > Here is a picture of some traceback
> > > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124152&action=view
> >
> >
>
> It died in pci_mmcfg_read(). Greg, didn't a crash in there get fixed recently?

Yes. Can you try 2.6.16-rc2? Is this an x86-64 machine?

thanks,

greg k-h

2006-02-08 01:25:56

by Neal Becker

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Tuesday 07 February 2006 7:10 pm, Andi Kleen wrote:
> On Wednesday 08 February 2006 01:07, Greg KH wrote:
> > > In the meantime, here's what I got..
> > >
> > > http://people.redhat.com/davej/DSC00148.JPG
> >
> > Andi, didn't your change for this function make it into Linus's tree?
>
> Yes
>
> See
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1de6bf33bc4601d856c286ad5c7d515468e24bbb
>
> Workaround is pci=nommconf btw
>

Yes! This patch worked.

2006-02-08 03:03:52

by Dave Jones

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Wed, Feb 08, 2006 at 01:10:06AM +0100, Andi Kleen wrote:
> On Wednesday 08 February 2006 01:07, Greg KH wrote:
>
> > > In the meantime, here's what I got..
> > >
> > > http://people.redhat.com/davej/DSC00148.JPG
> >
> > Andi, didn't your change for this function make it into Linus's tree?
>
> Yes
>
> See
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1de6bf33bc4601d856c286ad5c7d515468e24bbb
>
> Workaround is pci=nommconf btw

I'm puzzled. I'm still seeing this crash with latest -git which
has this patch (I just double checked the source I built).
The pci=nommconf workaround does indeed work though.

Dave

2006-02-08 07:58:58

by Andi Kleen

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Wednesday 08 February 2006 04:03, Dave Jones wrote:

> > Workaround is pci=nommconf btw
>
> I'm puzzled. I'm still seeing this crash with latest -git which
> has this patch (I just double checked the source I built).

That's surprising. Can you addr2line the exact address it's crashing on?

-Andi

2006-02-09 20:50:10

by Dave Jones

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Wed, Feb 08, 2006 at 08:55:05AM +0100, Andi Kleen wrote:
> > > Workaround is pci=nommconf btw
> > I'm puzzled. I'm still seeing this crash with latest -git which
> > has this patch (I just double checked the source I built).
>
> That's surprising. Can you addr2line the exact address it's crashing on?

Still there in today's git snapshot.
http://people.redhat.com/davej/dsc00150.jpg is the top of the oops.

Full traceback is:
acpi_os_derive_pci_id_2
acpi_os_derive_pci_id
acpi_ev_pci_config_region_setup
acpi_os_acquire_object
acpi_ev_pci_config_region_setup
acpi_ev_address_space_dispatch
cache_alloc_debugcheck_after
acpi_ex_access_region
acpi_ex_field_datum_io
acpi_os_acquire_object
acpi_ex_extract_from_field
acpi_ut_create_internal_object
acpi_ex_read_data_from_field
acpi_ex_resolve_node_to_value
acpi_ds_init_object_from_op
acpi_ex_resolve_to_value
acpi_ex_resolve_operands
acpi_ds_exec_end_op
acpi_ps_parse_loop
acpi_ps_parse_aml
acpi_ps_execute_pass
acpi_ps_execute_method
acpi_ns_evaluate_by_handle
acpi_ns_evaluate_relative
acpi_ut_evaluate_object
acpi_ut_execute_STA
acpi_ut_release_mutex
acpi_ns_get_device_callback
vsscanf
acpi_os_wait_semaphore
acpi_ns_get_device_callback
acpi_ns_walk_namespace
acpi_get_devices
find_pci_rootbridge
acpi_get_pci_rootbridge_handle
pci_acpi_find_root_bridge
acpi_platform_notify
device_add
pci_create_bus
pci_scan_bus_parented
pci_acpi_scan_root
acpi_pci_root_add
acpi_bus_driver_init
acpi_add_single_object
acpi_bus_scan
acpi_scan_init
acpi_event_init
init


Here's pci_mmcfg_read from that kernel:
0xffffffff802d6e34 <pci_mmcfg_read+0>: push %r15
0xffffffff802d6e36 <pci_mmcfg_read+2>: push %r14
0xffffffff802d6e38 <pci_mmcfg_read+4>: push %r13
0xffffffff802d6e3a <pci_mmcfg_read+6>: push %r12
0xffffffff802d6e3c <pci_mmcfg_read+8>: push %rbp
0xffffffff802d6e3d <pci_mmcfg_read+9>: push %rbx
0xffffffff802d6e3e <pci_mmcfg_read+10>: sub $0x8,%rsp
0xffffffff802d6e42 <pci_mmcfg_read+14>: mov %edi,%r15d
0xffffffff802d6e45 <pci_mmcfg_read+17>: mov %esi,%r14d
0xffffffff802d6e48 <pci_mmcfg_read+20>: mov %edx,%r12d
0xffffffff802d6e4b <pci_mmcfg_read+23>: mov %ecx,%ebp
0xffffffff802d6e4d <pci_mmcfg_read+25>: mov %r8d,%ebx
0xffffffff802d6e50 <pci_mmcfg_read+28>: mov %r9,%r13
0xffffffff802d6e53 <pci_mmcfg_read+31>: test %r9,%r9
0xffffffff802d6e56 <pci_mmcfg_read+34>: je 0xffffffff802d6e70 <pci_mmcfg_read+60>
0xffffffff802d6e58 <pci_mmcfg_read+36>: cmp $0xff,%esi
0xffffffff802d6e5e <pci_mmcfg_read+42>: ja 0xffffffff802d6e70 <pci_mmcfg_read+60>
0xffffffff802d6e60 <pci_mmcfg_read+44>: cmp $0xff,%edx
0xffffffff802d6e66 <pci_mmcfg_read+50>: ja 0xffffffff802d6e70 <pci_mmcfg_read+60>
0xffffffff802d6e68 <pci_mmcfg_read+52>: cmp $0xfff,%ecx
0xffffffff802d6e6e <pci_mmcfg_read+58>: jle 0xffffffff802d6e77 <pci_mmcfg_read+67>
0xffffffff802d6e70 <pci_mmcfg_read+60>: mov $0xffffffea,%eax
0xffffffff802d6e75 <pci_mmcfg_read+65>: jmp 0xffffffff802d6edf <pci_mmcfg_read+171>
0xffffffff802d6e77 <pci_mmcfg_read+67>: callq 0xffffffff802d6cf0 <pci_dev_base>
0xffffffff802d6e7c <pci_mmcfg_read+72>: mov %rax,%rdx
0xffffffff802d6e7f <pci_mmcfg_read+75>: test %rax,%rax
0xffffffff802d6e82 <pci_mmcfg_read+78>: jne 0xffffffff802d6ea5 <pci_mmcfg_read+113>
0xffffffff802d6e84 <pci_mmcfg_read+80>: mov %r13,%r9
0xffffffff802d6e87 <pci_mmcfg_read+83>: mov %ebx,%r8d
0xffffffff802d6e8a <pci_mmcfg_read+86>: mov %ebp,%ecx
0xffffffff802d6e8c <pci_mmcfg_read+88>: mov %r12d,%edx
0xffffffff802d6e8f <pci_mmcfg_read+91>: mov %r14d,%esi
0xffffffff802d6e92 <pci_mmcfg_read+94>: mov %r15d,%edi
0xffffffff802d6e95 <pci_mmcfg_read+97>: pop %rbx
0xffffffff802d6e96 <pci_mmcfg_read+98>: pop %rbx
0xffffffff802d6e97 <pci_mmcfg_read+99>: pop %rbp
0xffffffff802d6e98 <pci_mmcfg_read+100>: pop %r12
0xffffffff802d6e9a <pci_mmcfg_read+102>: pop %r13
0xffffffff802d6e9c <pci_mmcfg_read+104>: pop %r14
0xffffffff802d6e9e <pci_mmcfg_read+106>: pop %r15
0xffffffff802d6ea0 <pci_mmcfg_read+108>: jmpq 0xffffffff802d56a1 <pci_conf1_read>
0xffffffff802d6ea5 <pci_mmcfg_read+113>: cmp $0x2,%ebx
0xffffffff802d6ea8 <pci_mmcfg_read+116>: je 0xffffffff802d6ec1 <pci_mmcfg_read+141>
0xffffffff802d6eaa <pci_mmcfg_read+118>: cmp $0x4,%ebx
0xffffffff802d6ead <pci_mmcfg_read+121>: je 0xffffffff802d6ed0 <pci_mmcfg_read+156>
0xffffffff802d6eaf <pci_mmcfg_read+123>: dec %ebx
0xffffffff802d6eb1 <pci_mmcfg_read+125>: jne 0xffffffff802d6edd <pci_mmcfg_read+169>
0xffffffff802d6eb3 <pci_mmcfg_read+127>: movslq %ebp,%rax
0xffffffff802d6eb6 <pci_mmcfg_read+130>: lea (%rdx,%rax,1),%rax
0xffffffff802d6eba <pci_mmcfg_read+134>: mov (%rax),%al

We blew up here ^^^

0xffffffff802d6ebc <pci_mmcfg_read+136>: movzbl %al,%eax
0xffffffff802d6ebf <pci_mmcfg_read+139>: jmp 0xffffffff802d6ed9 <pci_mmcfg_read+165>
0xffffffff802d6ec1 <pci_mmcfg_read+141>: movslq %ebp,%rax
0xffffffff802d6ec4 <pci_mmcfg_read+144>: lea (%rdx,%rax,1),%rax
0xffffffff802d6ec8 <pci_mmcfg_read+148>: mov (%rax),%ax
0xffffffff802d6ecb <pci_mmcfg_read+151>: movzwl %ax,%eax
0xffffffff802d6ece <pci_mmcfg_read+154>: jmp 0xffffffff802d6ed9 <pci_mmcfg_read+165>
0xffffffff802d6ed0 <pci_mmcfg_read+156>: movslq %ebp,%rax
0xffffffff802d6ed3 <pci_mmcfg_read+159>: lea (%rdx,%rax,1),%rax
0xffffffff802d6ed7 <pci_mmcfg_read+163>: mov (%rax),%eax
0xffffffff802d6ed9 <pci_mmcfg_read+165>: mov %eax,0x0(%r13)
0xffffffff802d6edd <pci_mmcfg_read+169>: xor %eax,%eax
0xffffffff802d6edf <pci_mmcfg_read+171>: pop %r11
0xffffffff802d6ee1 <pci_mmcfg_read+173>: pop %rbx
0xffffffff802d6ee2 <pci_mmcfg_read+174>: pop %rbp
0xffffffff802d6ee3 <pci_mmcfg_read+175>: pop %r12
0xffffffff802d6ee5 <pci_mmcfg_read+177>: pop %r13
0xffffffff802d6ee7 <pci_mmcfg_read+179>: pop %r14
0xffffffff802d6ee9 <pci_mmcfg_read+181>: pop %r15
0xffffffff802d6eeb <pci_mmcfg_read+183>: retq
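For readers following the listing above, here is a rough user-space C rendering of what this pci_mmcfg_read appears to do, reconstructed from the disassembly rather than copied from the kernel source. The names mmcfg_read_sketch, pci_dev_base_stub, pci_conf1_read_stub and fake_cfg are hypothetical stand-ins, and the stub "maps" only bus 0. The shape is: reject a NULL value pointer or an out-of-range bus, devfn or reg with -EINVAL (the 0xffffffea constant), fall back to the type-1 read when pci_dev_base returns NULL, and otherwise read 1, 2 or 4 bytes at base + reg. The faulting instruction at +134 is the 1-byte load on that last path, so the base pointer was non-NULL but most likely not backed by usable memory; booting with pci=nommconf avoids this path because MMCONFIG is not used at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one function's 4 KiB extended config space. */
static uint8_t fake_cfg[4096];
static_assert(sizeof(fake_cfg) == 4096, "reg is limited to 0..0xfff");

/* Hypothetical stub for pci_dev_base(): returns the mapped config base for
 * (seg, bus, devfn), or NULL if no MMCONFIG window covers it. Only bus 0
 * is "mapped" here. */
static uint8_t *pci_dev_base_stub(unsigned int seg, unsigned int bus,
                                  unsigned int devfn)
{
    (void)seg;
    (void)devfn;
    return bus == 0 ? fake_cfg : NULL;
}

/* Hypothetical stub for the pci_conf1_read() fallback; it pretends no
 * device answers and returns all ones. */
static int pci_conf1_read_stub(unsigned int seg, unsigned int bus,
                               unsigned int devfn, int reg, int len,
                               uint32_t *value)
{
    (void)seg; (void)bus; (void)devfn; (void)reg; (void)len;
    *value = 0xffffffffu;
    return 0;
}

int mmcfg_read_sketch(unsigned int seg, unsigned int bus, unsigned int devfn,
                      int reg, int len, uint32_t *value)
{
    const uint8_t *base;

    /* +31..+65: NULL value or out-of-range bus/devfn/reg -> -EINVAL
     * (0xffffffea). The listing compares reg with a signed jle. */
    if (value == NULL || bus > 0xff || devfn > 0xff || reg > 0xfff)
        return -EINVAL;

    /* +67..+108: no MMCONFIG mapping -> tail call to the type-1 path. */
    base = pci_dev_base_stub(seg, bus, devfn);
    if (base == NULL)
        return pci_conf1_read_stub(seg, bus, devfn, reg, len, value);

    /* +113..+125: widths other than 1, 2 and 4 return 0, *value untouched. */
    if (len != 1 && len != 2 && len != 4)
        return 0;

    /* Sketch-only guard, not in the listing: keep the access in the buffer. */
    if (reg < 0 || reg > (int)sizeof(fake_cfg) - len)
        return -EINVAL;

    switch (len) {
    case 1:                      /* +127..+134: the byte load that oopsed */
        *value = base[reg];
        break;
    case 2: {
        uint16_t w;
        memcpy(&w, base + reg, sizeof w);
        *value = w;
        break;
    }
    case 4: {
        uint32_t d;
        memcpy(&d, base + reg, sizeof d);
        *value = d;
        break;
    }
    }
    return 0;                    /* +169: xor %eax,%eax, i.e. return 0 */
}
```

The one sketch-only addition is the bounds guard before the switch: the listing checks reg with a signed comparison, so it would not reject a negative offset, and a user-space buffer needs that guard to stay well-defined.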

Dave

2006-02-09 20:58:17

by Dave Jones

Subject: Re: 2.6.16-rc1 panic on startup (acpi)

On Thu, Feb 09, 2006 at 03:49:40PM -0500, Dave Jones wrote:
> On Wed, Feb 08, 2006 at 08:55:05AM +0100, Andi Kleen wrote:
> > > > Workaround is pci=nommconf btw
> > > I'm puzzled. I'm still seeing this crash with latest -git which
> > > has this patch (I just double checked the source I built).
> >
> > That's surprising. Can you addr2line the exact address it's crashing on?
>
> Still there in today's git snapshot.
> http://people.redhat.com/davej/dsc00150.jpg is the top of the oops.

Actually I think this is pilot error. I've been running a mispatched tree.

Dave