2009-12-15 16:34:36

by Peter Palfrader

Subject: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

Hi,

we tried to upgrade a couple of our ProLiant servers from 2.6.31.6 to
2.6.32.1.

On two of our DL385g1 servers we had problems booting 2.6.32.1: they
panicked.

One of them eventually booted correctly once we decided to log its
serial console output; that strategy did not work on the second
box.


[ 5.304749] BUG: unable to handle kernel NULL pointer dereference at 000000000000001f
..
[ 5.308739] Call Trace:
[ 5.308739] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 5.308739] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 5.308739] [<ffffffff8115121d>] create_dir+0x3d/0xc0
[ 5.308739] [<ffffffff81090af1>] ? autoremove_wake_function+0x11/0x40
[ 5.308739] [<ffffffff811512d4>] sysfs_create_dir+0x34/0x50
[ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
[ 5.308739] [<ffffffff8138e961>] kobject_add_internal+0xe1/0x1e0
[ 5.308739] [<ffffffff8138eb78>] kobject_add_varg+0x38/0x60
[ 5.308739] [<ffffffff8138ec15>] kobject_init_and_add+0x75/0x90
[ 5.308739] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
[ 5.308739] [<ffffffff8115082d>] ? sysfs_find_dirent+0x2d/0x40
[ 5.308739] [<ffffffff81150ec1>] ? sysfs_addrm_finish+0x21/0x250
[ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
[ 5.308739] [<ffffffff810e6fe4>] ? kmem_cache_alloc+0x84/0xc0
[ 5.308739] [<ffffffff814238d4>] bus_add_driver+0x94/0x260
[ 5.308739] [<ffffffff81424ed9>] driver_register+0x79/0x160
[ 5.308739] [<ffffffff815a28a3>] __hid_register_driver+0x43/0x80
[ 5.308739] [<ffffffff81a3d7ff>] ? gyration_init+0x0/0x1b
[ 5.308739] [<ffffffff81a3d818>] gyration_init+0x19/0x1b
[ 5.308739] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
[ 5.308739] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
[ 5.308739] [<ffffffff81036a0a>] child_rip+0xa/0x20
[ 5.308739] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
[ 5.308739] [<ffffffff81036a00>] ? child_rip+0x0/0x20

This trace is from the machine that reliably fails to boot.
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/ravel
hosts the complete serial console output.




What I caught on the second box, the one that eventually did boot, is
similar but not identical:
[ 19.028333] Call Trace:
[ 19.028333] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
[ 19.028333] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 19.028333] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 19.028333] [<ffffffff81150b17>] ? sysfs_add_one+0x27/0xd0
[ 19.028333] [<ffffffff81151bf7>] sysfs_do_create_link+0x87/0x160
[ 19.028333] [<ffffffff81151cee>] sysfs_create_link+0xe/0x10
[ 19.028333] [<ffffffff81422072>] device_add+0x272/0x730
[ 19.028333] [<ffffffff8139779e>] ? kvasprintf+0x6e/0x90
[ 19.028333] [<ffffffff81422549>] device_register+0x19/0x20
[ 19.028333] [<ffffffff8142262c>] device_create_vargs+0xdc/0xf0
[ 19.028333] [<ffffffff8142268b>] device_create+0x4b/0x50
[ 19.028333] [<ffffffff813e9702>] ? extract_entropy+0xe2/0x140
[ 19.028333] [<ffffffff813f573f>] misc_register+0xbf/0x180
[ 19.028333] [<ffffffff8107a4e0>] ? init_oops_id+0x0/0x40
[ 19.028333] [<ffffffff81a2626b>] ? pm_qos_power_init+0x0/0xe1
[ 19.028333] [<ffffffff81a262a3>] pm_qos_power_init+0x38/0xe1
[ 19.028333] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
[ 19.028333] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
[ 19.028333] [<ffffffff81036a0a>] child_rip+0xa/0x20
[ 19.028333] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
[ 19.028333] [<ffffffff81036a00>] ? child_rip+0x0/0x20

http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/klecker-bad
has the console output of that failed boot, and
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/klecker-good
has the output of a successful boot.

The config file can be found at
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/config-2.6.32.1-dsa-amd64


Cheers,
Peter
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/


2009-12-22 11:47:15

by Peter Palfrader

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

On Tue, 15 Dec 2009, Peter Palfrader wrote:

> we tried to upgrade a couple of our ProLiant servers from 2.6.31.6 to
> 2.6.32.1.
>
> On two of our DL385g1 servers we had problems booting 2.6.32.1: they
> panicked.

Several more machines do not boot .32 reliably. Is there anything I can try?


2009-12-22 12:04:20

by Andi Kleen

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

Peter Palfrader writes:


> [ 5.304749] BUG: unable to handle kernel NULL pointer dereference at 000000000000001f
> ..
> [ 5.308739] Call Trace:
> [ 5.308739] [<ffffffff810c3840>] kstrdup+0x40/0x70
> [ 5.308739] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
> [ 5.308739] [<ffffffff8115121d>] create_dir+0x3d/0xc0
> [ 5.308739] [<ffffffff81090af1>] ? autoremove_wake_function+0x11/0x40
> [ 5.308739] [<ffffffff811512d4>] sysfs_create_dir+0x34/0x50
> [ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
> [ 5.308739] [<ffffffff8138e961>] kobject_add_internal+0xe1/0x1e0
> [ 5.308739] [<ffffffff8138eb78>] kobject_add_varg+0x38/0x60
> [ 5.308739] [<ffffffff8138ec15>] kobject_init_and_add+0x75/0x90
> [ 5.308739] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
> [ 5.308739] [<ffffffff8115082d>] ? sysfs_find_dirent+0x2d/0x40
> [ 5.308739] [<ffffffff81150ec1>] ? sysfs_addrm_finish+0x21/0x250
> [ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
> [ 5.308739] [<ffffffff810e6fe4>] ? kmem_cache_alloc+0x84/0xc0
> [ 5.308739] [<ffffffff814238d4>] bus_add_driver+0x94/0x260
> [ 5.308739] [<ffffffff81424ed9>] driver_register+0x79/0x160
> [ 5.308739] [<ffffffff815a28a3>] __hid_register_driver+0x43/0x80
> [ 5.308739] [<ffffffff81a3d7ff>] ? gyration_init+0x0/0x1b
> [ 5.308739] [<ffffffff81a3d818>] gyration_init+0x19/0x1b

Seems to be caused by the "gyration driver", whatever that is. Do you
have such a USB device?

It could be some module mismatch; it looks suspicious,
and from a quick look the gyration driver does nothing bad
in that init path. Try a make clean and remove/rebuild/reinstall all the modules
on the target system.

If that doesn't help perhaps disable CONFIG_HID_GYRATION,
but from your other oops something more seems to be broken anyways.

> [ 5.308739] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
> [ 5.308739] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
> [ 5.308739] [<ffffffff81036a0a>] child_rip+0xa/0x20
> [ 5.308739] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
> [ 5.308739] [<ffffffff81036a00>] ? child_rip+0x0/0x20

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-22 18:33:57

by Peter Palfrader

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

On Tue, 22 Dec 2009, Andi Kleen wrote:

> Seems to be caused by the "gyration driver", whatever that is. Do you
> have such a USB device?

Doubtful.

> It could be some module mismatch; it looks suspicious,
> and from a quick look the gyration driver does nothing bad
> in that init path. Try a make clean and remove/rebuild/reinstall all the modules
> on the target system.
>
> If that doesn't help perhaps disable CONFIG_HID_GYRATION,
> but from your other oops something more seems to be broken anyways.

This is a static kernel - no module support. Anyway, I also tried
without CONFIG_USB_HID (which pulls in all the other HID_* things) but
no luck.


2009-12-22 18:42:21

by Andi Kleen

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

> This is a static kernel - no module support. Anyway, I also tried
> without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> no luck.

Try a make distclean + rebuild anyways.

-Andi

2009-12-22 18:58:06

by Peter Palfrader

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

On Tue, 22 Dec 2009, Peter Palfrader wrote:

> > If that doesn't help perhaps disable CONFIG_HID_GYRATION,
> > but from your other oops something more seems to be broken anyways.
>
> This is a static kernel - no module support. Anyway, I also tried
> without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> no luck.

However, disabling all of HID (CONFIG_HID_SUPPORT=n) makes the system
boot (Previously HID, HIDRAW and HID_SUPPORT were still enabled).


2009-12-22 19:01:46

by Peter Palfrader

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

On Tue, 22 Dec 2009, Andi Kleen wrote:

> > This is a static kernel - no module support. Anyway, I also tried
> > without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> > no luck.
>
> Try a make distclean + rebuild anyways.

I usually do. make-kpkg doesn't really like building from dirty
directories all that much.


2009-12-24 13:04:29

by Peter Palfrader

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

On Tue, 22 Dec 2009, Peter Palfrader wrote:

> On Tue, 22 Dec 2009, Peter Palfrader wrote:
>
> > > If that doesn't help perhaps disable CONFIG_HID_GYRATION,
> > > but from your other oops something more seems to be broken anyways.
> >
> > This is a static kernel - no module support. Anyway, I also tried
> > without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> > no luck.
>
> However, disabling all of HID (CONFIG_HID_SUPPORT=n) makes the system
> boot (Previously HID, HIDRAW and HID_SUPPORT were still enabled).

However, I still occasionally see panics on boot, though not as often or
as reproducibly. So far only on DL385 (Opteron) systems.

And all of the backtraces go through sysfs_new_dirent() near the top.
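
Comparing traces by hand gets error-prone once there are more than two of them. As a rough sketch (the helper names parse_trace and common_functions are hypothetical, and the regular expression assumes the x86_64 frame format shown in the traces above), a short Python script can pull out the reliable frames and list the functions every trace shares. Frames printed with a "?" are skipped, since the kernel marks stack entries that way when it cannot confirm they belong to the actual call chain.

```python
import re

# Assumed frame format, as in the traces above:
#   "[ 5.308739] [<ffffffff810c3840>] kstrdup+0x40/0x70"
# An optional "? " before the symbol marks an unreliable frame.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace(text):
    """Return (func, offset, size) for each reliable frame, innermost first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append(
                (m.group("func"), int(m.group("off"), 16), int(m.group("size"), 16))
            )
    return frames


def common_functions(traces):
    """Functions present as reliable frames in every trace.

    The result keeps the innermost-first order of the first trace.
    """
    parsed = [parse_trace(t) for t in traces]
    if not parsed:
        return []
    others = [{func for func, _, _ in p} for p in parsed[1:]]
    seen = set()
    result = []
    for func, _, _ in parsed[0]:
        if func not in seen and all(func in s for s in others):
            seen.add(func)
            result.append(func)
    return result
```

Fed the full ravel and klecker traces quoted earlier, the list should start with kstrdup and sysfs_new_dirent, followed only by the generic boot frames do_one_initcall, kernel_init and child_rip. In a frame such as kstrdup+0x40/0x70, the address lies 0x40 bytes into a function that is 0x70 bytes long.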

2009-12-26 17:12:08

by Andi Kleen

Subject: Re: 2.6.32.1: BUG and panic: unable to handle kernel NULL pointer dereference at 000000000000001f

On Thu, Dec 24, 2009 at 02:04:25PM +0100, Peter Palfrader wrote:
> On Tue, 22 Dec 2009, Peter Palfrader wrote:
>
> > On Tue, 22 Dec 2009, Peter Palfrader wrote:
> >
> > > > If that doesn't help perhaps disable CONFIG_HID_GYRATION,
> > > > but from your other oops something more seems to be broken anyways.
> > >
> > > This is a static kernel - no module support. Anyway, I also tried
> > > without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> > > no luck.
> >
> > However, disabling all of HID (CONFIG_HID_SUPPORT=n) makes the system
> > boot (Previously HID, HIDRAW and HID_SUPPORT were still enabled).

It's suspicious if you don't have such devices; that would
point to something being confused in the driver probing
layer.

>
> However, I still occasionally see panics on boot, though not as often or
> as reproducibly. So far only on DL385 (Opteron) systems.

Multiple systems and the same oopses?

>
> And all of the backtraces go through sysfs_new_dirent() near the top.

Please post full oopses.

-Andi
