2011-04-06 18:47:58

by Eric B Munson

Subject: 2.6.39-rc2 boot crash

I am seeing a boot crash on my machine with 2.6.39-rc2. I don't yet have
netconsole setup and I don't have serial so the best I can do at the moment
is a couple of pictures of the stack trace.

http://tinypic.com/r/2zsr19i/7

And

http://tinypic.com/r/2iw22dj/7

I will try and get a net console working for better output, is there
anything else that would help?

Thanks,
Eric



2011-04-06 18:52:39

by Dave Hansen

Subject: Re: 2.6.39-rc2 boot crash

On Wed, 2011-04-06 at 14:47 -0400, Eric B Munson wrote:
> I am seeing a boot crash on my machine with 2.6.39-rc2. I don't yet have
> netconsole setup and I don't have serial so the best I can do at the moment
> is a couple of pictures of the stack trace.
>
> http://tinypic.com/r/2zsr19i/7
>
> And
>
> http://tinypic.com/r/2iw22dj/7
>
> I will try and get a net console working for better output, is there
> anything else that would help?

This:

> 	boot_delay=	Milliseconds to delay each printk during boot.
> 			Values larger than 10 seconds (10000) are changed to
> 			no delay (0).
> 			Format: integer

sometimes gives you enough time to get something out of the system. You
might also want to try and boot the same kernel in a VM. It might just
be a weird configuration issue, or maybe even a lockdep bug. If it's
not hardware related, booting in a VM should help a lot.

-- Dave

2011-04-06 21:20:47

by Eric B Munson

Subject: Re: 2.6.39-rc2 boot crash

On Wed, 06 Apr 2011, Dave Hansen wrote:

> On Wed, 2011-04-06 at 14:47 -0400, Eric B Munson wrote:
> > I am seeing a boot crash on my machine with 2.6.39-rc2. I don't yet have
> > netconsole setup and I don't have serial so the best I can do at the moment
> > is a couple of pictures of the stack trace.
> >
> > http://tinypic.com/r/2zsr19i/7
> >
> > And
> >
> > http://tinypic.com/r/2iw22dj/7
> >
> > I will try and get a net console working for better output, is there
> > anything else that would help?
>
> This:
>
> > 	boot_delay=	Milliseconds to delay each printk during boot.
> > 			Values larger than 10 seconds (10000) are changed to
> > 			no delay (0).
> > 			Format: integer
>
> sometimes gives you enough time to get something out of the system. You
> might also want to try and boot the same kernel in a VM. It might just
> be a weird configuration issue, or maybe even a lockdep bug. If it's
> not hardware related, booting in a VM should help a lot.
>
> -- Dave
>

A bisect points at commit 04f482faf50535229a5a5c8d629cf963899f857c for the
first bad one. Unfortunately, I have not made netconsole work yet and the
hang is happening mostly right when X starts so I can't even see the console.
I will keep at the netconsole and see if I can get it functioning, also I will
try to boot this kernel in a VM and see if that helps.

Eric



2011-04-06 21:22:35

by David Miller

Subject: Re: 2.6.39-rc2 boot crash

From: Eric B Munson <[email protected]>
Date: Wed, 6 Apr 2011 17:20:41 -0400

> A bisect points at commit 04f482faf50535229a5a5c8d629cf963899f857c for the
> first bad one. Unfortunately, I have not made netconsole work yet and the
> hang is happening mostly right when X starts so I can't even see the console.
> I will keep at the netconsole and see if I can get it functioning, also I will
> try to boot this kernel in a VM and see if that helps.

Patrick, please help Eric so we can fix this bug.

Thanks.

2011-04-06 22:05:22

by Eric B Munson

Subject: Re: 2.6.39-rc2 boot crash

On Wed, 06 Apr 2011, David Miller wrote:

> From: Eric B Munson <[email protected]>
> Date: Wed, 6 Apr 2011 17:20:41 -0400
>
> > A bisect points at commit 04f482faf50535229a5a5c8d629cf963899f857c for the
> > first bad one. Unfortunately, I have not made netconsole work yet and the
> > hang is happening mostly right when X starts so I can't even see the console.
> > I will keep at the netconsole and see if I can get it functioning, also I will
> > try to boot this kernel in a VM and see if that helps.
>
> Patrick, please help Eric so we can fix this bug.
>
> Thanks.
>

I have a useful trace now from netconsole:

[ 18.029521] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1087
[ 18.029527] in_atomic(): 0, irqs_disabled(): 1, pid: 2018, name: cgrulesengd
[ 18.029693] BUG: unable to handle kernel paging request at 0000100000000000
[ 18.029730] IP: [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
[ 18.029756] PGD 0
[ 18.029768] Oops: 0002 [#1] SMP
[ 18.029790] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb10/10-0:1.0/bInterfaceClass
[ 18.029824] CPU 0
[ 18.029833] Modules linked in: kvm_intel kvm parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_realtek nfs lockd fscache auth_rpcgss nfs_acl sunrpc radeon deflate zlib_deflate ctr twofish_generic twofish_x86_64 twofish_common ttm camellia serpent drm_kms_helper snd_usb_audio blowfish cast5 snd_hda_intel drm des_generic snd_hda_codec snd_hwdep aesni_intel snd_usbmidi_lib cryptd aes_x86_64 aes_generic snd_pcm xcbc snd_seq_midi rmd160 snd_rawmidi sha512_generic sha256_generic uvcvideo snd_seq_midi_event sha1_generic snd_seq snd_timer crypto_null snd_seq_device snd af_key xhci_hcd i7core_edac videodev joydev psmouse edac_core v4l2_compat_ioctl32 w83627ehf soundcore serio_raw hwmon_vid snd_page_alloc max6650 hid_microsoft i2c_algo_bit lp parport asus_atk0110 usbhid hid firewire_ohci firewire_core crc_itu_t
[ 18.030424]
[ 18.030432] Pid: 2018, comm: cgrulesengd Not tainted 2.6.39-rc2+ #52 System manufacturer System Product Name/P6X58D PREMIUM
[ 18.030477] RIP: 0010:[<ffffffff814c3db8>] [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
[ 18.030510] RSP: 0018:ffff880326f03b28 EFLAGS: 00010002
[ 18.030528] RAX: 0000000000000286 RBX: ffff8803204c5100 RCX: 0000100000000000
[ 18.030552] RDX: ffff88031fe47200 RSI: ffff880326f03bf4 RDI: 0000000000000046
[ 18.030576] RBP: ffff880326f03bd8 R08: 0000000000000000 R09: 0000000000000000
[ 18.030599] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880327d6e928
[ 18.030623] R13: ffff880326f03b78 R14: ffff880326f03b90 R15: ffff880327d6e940
[ 18.030646] FS: 00007f3bf9173b20(0000) GS:ffff880331600000(0000) knlGS:0000000000000000
[ 18.030673] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.030693] CR2: 0000100000000000 CR3: 0000000326dda000 CR4: 00000000000006f0
[ 18.030716] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.030740] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 18.030763] Process cgrulesengd (pid: 2018, threadinfo ffff880326f02000, task ffff8803275aa300)
[ 18.030794] Stack:
[ 18.030803] ffff880300000000 ffff8803275aa338 ffff880327d6ebd0 ffff8803275aa300
[ 18.030839] 7fffffffffffffff ffff880326f03c74 ffff880326f03bf4 0000000000000001
[ 18.030872] ffff8803275aa300 ffff880327d6e940 00000000000001f7 0000000000000001
[ 18.030905] Call Trace:
[ 18.030916] [<ffffffff81009833>] ? native_sched_clock+0x13/0x60
[ 18.030936] [<ffffffff814c3f64>] skb_recv_datagram+0x24/0x30
[ 18.030956] [<ffffffff814f463c>] netlink_recvmsg+0x7c/0x430
[ 18.030975] [<ffffffff814bc185>] ? sock_update_classid+0x65/0x100
[ 18.030996] [<ffffffff814bc19d>] ? sock_update_classid+0x7d/0x100
[ 18.031016] [<ffffffff814bc1c0>] ? sock_update_classid+0xa0/0x100
[ 18.031037] [<ffffffff814b7c1d>] sock_recvmsg+0xfd/0x130
[ 18.031055] [<ffffffff81178af8>] ? set_fd_set+0x48/0x60
[ 18.031073] [<ffffffff8117a25b>] ? core_sys_select+0x26b/0x330
[ 18.031093] [<ffffffff8117a03d>] ? core_sys_select+0x4d/0x330
[ 18.031112] [<ffffffff8108cc05>] ? lock_release_holdtime+0x35/0x160
[ 18.031133] [<ffffffff814b7da1>] sys_recvfrom+0xf1/0x170
[ 18.031152] [<ffffffff815d40ba>] ? sysret_check+0x2e/0x69
[ 18.031171] [<ffffffff812f02de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 18.031193] [<ffffffff815d4082>] system_call_fastpath+0x16/0x1b
[ 18.031212] Code: 41 5d 41 5e 41 5f c9 c3 eb 01 90 ff 8b 38 01 00 00 48 8b 1a 48 8b 4a 08 48 c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 48 89 4b 08
[ 18.031494] 89 19 eb aa eb 01 90 48 8b 83 f0 03 00 00 48 89 85 70 ff ff
[ 18.031601] RIP [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
[ 18.031625] RSP <ffff880326f03b28>
[ 18.031637] CR2: 0000100000000000
[ 18.039388] ---[ end trace 0e3e016130139f1b ]---
[ 18.112703] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 18.112738] IP: [<ffffffff814befed>] skb_queue_tail+0x3d/0x60
[ 18.112763] PGD 0
[ 18.112775] Oops: 0002 [#2] SMP
[ 18.112796] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb10/10-0:1.0/bInterfaceClass
[ 18.112828] CPU 0
[ 18.112837] Modules linked in: kvm_intel kvm parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_realtek nfs lockd fscache auth_rpcgss nfs_acl sunrpc radeon deflate zlib_deflate ctr twofish_generic twofish_x86_64 twofish_common ttm camellia serpent drm_kms_helper snd_usb_audio blowfish cast5 snd_hda_intel drm des_generic snd_hda_codec snd_hwdep aesni_intel snd_usbmidi_lib cryptd aes_x86_64 aes_generic snd_pcm xcbc snd_seq_midi rmd160 snd_rawmidi sha512_generic sha256_generic uvcvideo snd_seq_midi_event sha1_generic snd_seq snd_timer crypto_null snd_seq_device snd af_key xhci_hcd i7core_edac videodev joydev psmouse edac_core v4l2_compat_ioctl32 w83627ehf soundcore serio_raw hwmon_vid snd_page_alloc max6650 hid_microsoft i2c_algo_bit lp parport asus_atk0110 usbhid hid firewire_ohci firewire_core crc_itu_t
[ 18.115476]
[ 18.117533] Pid: 2178, comm: 0dns-down Tainted: G D 2.6.39-rc2+ #52 System manufacturer System Product Name/P6X58D PREMIUM
[ 18.119646] RIP: 0010:[<ffffffff814befed>] [<ffffffff814befed>] skb_queue_tail+0x3d/0x60
[ 18.121757] RSP: 0018:ffff88032666bd08 EFLAGS: 00010096
[ 18.123845] RAX: 0000000000000282 RBX: ffff880327d6e928 RCX: 000000000acc7db8
[ 18.125948] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880327d6e940
[ 18.128046] RBP: ffff88032666bd28 R08: 0000000000000000 R09: 0000000000000001
[ 18.130171] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880327d6e940
[ 18.132281] R13: ffff880320929b00 R14: ffff880327d6e818 R15: ffff880327d6e800
[ 18.134388] FS: 0000000000000000(0000) GS:ffff880331600000(0000) knlGS:0000000000000000
[ 18.136498] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 18.138610] CR2: 0000000000000000 CR3: 0000000001a03000 CR4: 00000000000006f0
[ 18.140732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.142839] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 18.144953] Process 0dns-down (pid: 2178, threadinfo ffff88032666a000, task ffff880326f0a300)
[ 18.147057] Stack:
[ 18.149156] ffff88032666bd28 0000000000000000 ffff88032a7fa800 0000000000000000
[ 18.151256] ffff88032666bdb8 ffffffff814f4d12 0000000000000000 ffff880320929b00
[ 18.153365] ffff880327d6e84c ffff880320929bec 0000000026f0a300 0000000000000000
[ 18.155464] Call Trace:
[ 18.157539] [<ffffffff814f4d12>] netlink_broadcast_filtered+0x322/0x480
[ 18.159575] [<ffffffff814f4e8d>] netlink_broadcast+0x1d/0x20
[ 18.161568] [<ffffffff813a0223>] cn_netlink_send+0x1a3/0x1c0
[ 18.163515] [<ffffffff813a044a>] proc_exit_connector+0xda/0x100
[ 18.165538] [<ffffffff81055a08>] do_exit+0x1d8/0x870
[ 18.167428] [<ffffffff810570fe>] ? sys_wait4+0xae/0x100
[ 18.169287] [<ffffffff812f0354>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 18.171133] [<ffffffff810560fe>] do_group_exit+0x5e/0xd0
[ 18.172965] [<ffffffff81056187>] sys_exit_group+0x17/0x20
[ 18.174782] [<ffffffff815d4082>] system_call_fastpath+0x16/0x1b
[ 18.176600] Code: 6d f8 0f 1f 44 00 00 49 89 f5 48 89 fb 4c 8d 67 18 4c 89 e7 e8 65 c6 10 00 48 8b 53 08 4c 89 e7 49 89 5d 00 49 89 55 08 48 89 c6 <4c> 89 2a 4c 89 6b 08 ff 43 10 e8 54 cf 10 00 48 8b 5d e8 4c 8b
[ 18.178889] RIP [<ffffffff814befed>] skb_queue_tail+0x3d/0x60
[ 18.180925] RSP <ffff88032666bd08>
[ 18.182948] CR2: 0000000000000000
[ 18.184969] ---[ end trace 0e3e016130139f1c ]---
[ 18.184972] Fixing recursive fault but reboot is needed!

I haven't dug into it at all, but I am happy to help test potential fixes.

Eric



2011-04-07 11:06:19

by Patrick McHardy

Subject: Re: 2.6.39-rc2 boot crash

On 07.04.2011 00:05, Eric B Munson wrote:
> On Wed, 06 Apr 2011, David Miller wrote:
>
>> From: Eric B Munson <[email protected]>
>> Date: Wed, 6 Apr 2011 17:20:41 -0400
>>
>>> A bisect points at commit 04f482faf50535229a5a5c8d629cf963899f857c for the
>>> first bad one. Unfortunately, I have not made netconsole work yet and the
>>> hang is happening mostly right when X starts so I can't even see the console.
>>> I will keep at the netconsole and see if I can get it functioning, also I will
>>> try to boot this kernel in a VM and see if that helps.
>>
>> Patrick, please help Eric so we can fix this bug.
>>
>> Thanks.
>>
>
> I have a useful trace now from netconsole:
>
> [ 18.029521] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1087
> [ 18.029527] in_atomic(): 0, irqs_disabled(): 1, pid: 2018, name: cgrulesengd
> [ 18.029693] BUG: unable to handle kernel paging request at 0000100000000000
> [ 18.029730] IP: [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
> [ 18.029756] PGD 0
> [ 18.029768] Oops: 0002 [#1] SMP
> [ 18.029790] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb10/10-0:1.0/bInterfaceClass
> [ 18.029824] CPU 0
> [ 18.029833] Modules linked in: kvm_intel kvm parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_realtek nfs lockd fscache auth_rpcgss nfs_acl sunrpc radeon deflate zlib_deflate ctr twofish_generic twofish_x86_64 twofish_common ttm camellia serpent drm_kms_helper snd_usb_audio blowfish cast5 snd_hda_intel drm des_generic snd_hda_codec snd_hwdep aesni_intel snd_usbmidi_lib cryptd aes_x86_64 aes_generic snd_pcm xcbc snd_seq_midi rmd160 snd_rawmidi sha512_generic sha256_generic uvcvideo snd_seq_midi_event sha1_generic snd_seq snd_timer crypto_null snd_seq_device snd af_key xhci_hcd i7core_edac videodev joydev psmouse edac_core v4l2_compat_ioctl32 w83627ehf soundcore serio_raw hwmon_vid snd_page_alloc max6650 hid_microsoft i2c_algo_bit lp parport asus_atk0110 usbhid hid firewire_ohci firewire_core crc_itu_t
> [ 18.030424]
> [ 18.030432] Pid: 2018, comm: cgrulesengd Not tainted 2.6.39-rc2+ #52 System manufacturer System Product Name/P6X58D PREMIUM
> [ 18.030477] RIP: 0010:[<ffffffff814c3db8>] [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
>...
>
> I haven't dug into it at all, but I am happy to help test potential fixes.

I can't figure this out, the only thing that should have changed is the
time the initial PROC_CN_MCAST_LISTEN message is received. Apparently
at that point connector is not fully initialized yet. Please post your
config and the full boot log. Thanks.
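
For context on the message mentioned above: a process-events listener (presumably what cgrulesengd is doing in the trace) subscribes by sending a PROC_CN_MCAST_LISTEN request over a NETLINK_CONNECTOR socket. The following is a minimal sketch that only builds such a request in a byte buffer. The helper name build_proc_listen and the buffer handling are assumptions made for illustration, not code from cgrulesengd or from this thread; the structures and constants are the ones defined in the Linux UAPI headers.

```c
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: serialise a PROC_CN_MCAST_LISTEN request into buf.
 * Layout: struct nlmsghdr, then struct cn_msg, then the 4-byte operation.
 * Returns the total message length, or 0 if buf is too small.
 */
static size_t build_proc_listen(unsigned char *buf, size_t buflen, __u32 pid)
{
	enum proc_cn_mcast_op op = PROC_CN_MCAST_LISTEN;
	struct nlmsghdr nlh;
	struct cn_msg cn;
	size_t total = NLMSG_LENGTH(sizeof(cn) + sizeof(op));

	if (buflen < total)
		return 0;

	memset(buf, 0, total);

	/* Netlink header: one self-contained message from this process. */
	memset(&nlh, 0, sizeof(nlh));
	nlh.nlmsg_len = total;
	nlh.nlmsg_type = NLMSG_DONE;
	nlh.nlmsg_pid = pid;

	/* Connector header: address the process-events callback. */
	memset(&cn, 0, sizeof(cn));
	cn.id.idx = CN_IDX_PROC;
	cn.id.val = CN_VAL_PROC;
	cn.len = sizeof(op);

	memcpy(buf, &nlh, sizeof(nlh));
	memcpy(buf + NLMSG_HDRLEN, &cn, sizeof(cn));
	memcpy(buf + NLMSG_HDRLEN + sizeof(cn), &op, sizeof(op));
	return total;
}
```

A real listener would send this buffer on a socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR) socket bound to the CN_IDX_PROC group, which normally requires elevated privileges. In the kernel, receiving such a message is what leads to cn_rx_skb() and cn_call_callback() in the connector driver.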

2011-04-07 14:17:26

by Eric B Munson

Subject: Re: 2.6.39-rc2 boot crash

On Thu, 07 Apr 2011, Patrick McHardy wrote:

> On 07.04.2011 00:05, Eric B Munson wrote:
> > On Wed, 06 Apr 2011, David Miller wrote:
> >
> >> From: Eric B Munson <[email protected]>
> >> Date: Wed, 6 Apr 2011 17:20:41 -0400
> >>
> >>> A bisect points at commit 04f482faf50535229a5a5c8d629cf963899f857c for the
> >>> first bad one. Unfortunately, I have not made netconsole work yet and the
> >>> hang is happening mostly right when X starts so I can't even see the console.
> >>> I will keep at the netconsole and see if I can get it functioning, also I will
> >>> try to boot this kernel in a VM and see if that helps.
> >>
> >> Patrick, please help Eric so we can fix this bug.
> >>
> >> Thanks.
> >>
> >
> > I have a useful trace now from netconsole:
> >
> > [ 18.029521] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1087
> > [ 18.029527] in_atomic(): 0, irqs_disabled(): 1, pid: 2018, name: cgrulesengd
> > [ 18.029693] BUG: unable to handle kernel paging request at 0000100000000000
> > [ 18.029730] IP: [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
> > [ 18.029756] PGD 0
> > [ 18.029768] Oops: 0002 [#1] SMP
> > [ 18.029790] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb10/10-0:1.0/bInterfaceClass
> > [ 18.029824] CPU 0
> > [ 18.029833] Modules linked in: kvm_intel kvm parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_realtek nfs lockd fscache auth_rpcgss nfs_acl sunrpc radeon deflate zlib_deflate ctr twofish_generic twofish_x86_64 twofish_common ttm camellia serpent drm_kms_helper snd_usb_audio blowfish cast5 snd_hda_intel drm des_generic snd_hda_codec snd_hwdep aesni_intel snd_usbmidi_lib cryptd aes_x86_64 aes_generic snd_pcm xcbc snd_seq_midi rmd160 snd_rawmidi sha512_generic sha256_generic uvcvideo snd_seq_midi_event sha1_generic snd_seq snd_timer crypto_null snd_seq_device snd af_key xhci_hcd i7core_edac videodev joydev psmouse edac_core v4l2_compat_ioctl32 w83627ehf soundcore serio_raw hwmon_vid snd_page_alloc max6650 hid_microsoft i2c_algo_bit lp parport asus_atk0110 usbhid hid firewire_ohci firewire_core crc_itu_t
> > [ 18.030424]
> > [ 18.030432] Pid: 2018, comm: cgrulesengd Not tainted 2.6.39-rc2+ #52 System manufacturer System Product Name/P6X58D PREMIUM
> > [ 18.030477] RIP: 0010:[<ffffffff814c3db8>] [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
> >...
> >
> > I haven't dug into it at all, but I am happy to help test potential fixes.
>
> I can't figure this out, the only thing that should have changed is the
> time the initial PROC_CN_MCAST_LISTEN message is received. Apparently
> at that point connector is not fully initialized yet. Please post your
> config and the full boot log. Thanks.
>

I have attached both, let me know if you need anything else.

Eric



2011-04-11 21:07:57

by Eric B Munson

Subject: Re: 2.6.39-rc2 boot crash

On Thu, 07 Apr 2011, Patrick McHardy wrote:

> On 07.04.2011 00:05, Eric B Munson wrote:
> > On Wed, 06 Apr 2011, David Miller wrote:
> >
> >> From: Eric B Munson <[email protected]>
> >> Date: Wed, 6 Apr 2011 17:20:41 -0400
> >>
> >>> A bisect points at commit 04f482faf50535229a5a5c8d629cf963899f857c for the
> >>> first bad one. Unfortunately, I have not made netconsole work yet and the
> >>> hang is happening mostly right when X starts so I can't even see the console.
> >>> I will keep at the netconsole and see if I can get it functioning, also I will
> >>> try to boot this kernel in a VM and see if that helps.
> >>
> >> Patrick, please help Eric so we can fix this bug.
> >>
> >> Thanks.
> >>
> >
> > I have a useful trace now from netconsole:
> >
> > [ 18.029521] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1087
> > [ 18.029527] in_atomic(): 0, irqs_disabled(): 1, pid: 2018, name: cgrulesengd
> > [ 18.029693] BUG: unable to handle kernel paging request at 0000100000000000
> > [ 18.029730] IP: [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
> > [ 18.029756] PGD 0
> > [ 18.029768] Oops: 0002 [#1] SMP
> > [ 18.029790] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/usb10/10-0:1.0/bInterfaceClass
> > [ 18.029824] CPU 0
> > [ 18.029833] Modules linked in: kvm_intel kvm parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_realtek nfs lockd fscache auth_rpcgss nfs_acl sunrpc radeon deflate zlib_deflate ctr twofish_generic twofish_x86_64 twofish_common ttm camellia serpent drm_kms_helper snd_usb_audio blowfish cast5 snd_hda_intel drm des_generic snd_hda_codec snd_hwdep aesni_intel snd_usbmidi_lib cryptd aes_x86_64 aes_generic snd_pcm xcbc snd_seq_midi rmd160 snd_rawmidi sha512_generic sha256_generic uvcvideo snd_seq_midi_event sha1_generic snd_seq snd_timer crypto_null snd_seq_device snd af_key xhci_hcd i7core_edac videodev joydev psmouse edac_core v4l2_compat_ioctl32 w83627ehf soundcore serio_raw hwmon_vid snd_page_alloc max6650 hid_microsoft i2c_algo_bit lp parport asus_atk0110 usbhid hid firewire_ohci firewire_core crc_itu_t
> > [ 18.030424]
> > [ 18.030432] Pid: 2018, comm: cgrulesengd Not tainted 2.6.39-rc2+ #52 System manufacturer System Product Name/P6X58D PREMIUM
> > [ 18.030477] RIP: 0010:[<ffffffff814c3db8>] [<ffffffff814c3db8>] __skb_recv_datagram+0x128/0x2b0
> >...
> >
> > I haven't dug into it at all, but I am happy to help test potential fixes.
>
> I can't figure this out, the only thing that should have changed is the
> time the initial PROC_CN_MCAST_LISTEN message is received. Apparently
> at that point connector is not fully initialized yet. Please post your
> config and the full boot log. Thanks.
>

I am still seeing this on Linus' tree, is there anything more I can do to help
track the problem?

Thanks,
Eric



2011-04-11 22:07:03

by Evgeniy Polyakov

Subject: Re: 2.6.39-rc2 boot crash

Hi.

On Mon, Apr 11, 2011 at 05:07:47PM -0400, Eric B Munson ([email protected]) wrote:
> > I can't figure this out, the only thing that should have changed is the
> > time the initial PROC_CN_MCAST_LISTEN message is received. Apparently
> > at that point connector is not fully initialized yet. Please post your
> > config and the full boot log. Thanks.
> >
>
> I am still seeing this on Linus' tree, is there anything more I can do to help
> track the problem?

Patrick, do you need my assistance with this bug?

--
Evgeniy Polyakov

2011-04-12 12:50:13

by Patrick McHardy

Subject: Re: 2.6.39-rc2 boot crash

On 12.04.2011 00:06, Evgeniy Polyakov wrote:
> Hi.
>
> On Mon, Apr 11, 2011 at 05:07:47PM -0400, Eric B Munson ([email protected]) wrote:
>>> I can't figure this out, the only thing that should have changed is the
>>> time the initial PROC_CN_MCAST_LISTEN message is received. Apparently
>>> at that point connector is not fully initialized yet. Please post your
>>> config and the full boot log. Thanks.
>>>
>>
>> I am still seeing this on Linus' tree, is there anything more I can do to help
>> track the problem?

Sorry, I had a hardware failure, I'm back working on this now.

> Patrick, do you need my assistance with this bug?

Thanks, but I can now reproduce the problem myself, so I think I
should have a fix soon.

2011-04-12 15:40:11

by Patrick McHardy

Subject: Re: 2.6.39-rc2 boot crash

On 12.04.2011 14:49, Patrick McHardy wrote:
> On 12.04.2011 00:06, Evgeniy Polyakov wrote:
>> Hi.
>>
>> On Mon, Apr 11, 2011 at 05:07:47PM -0400, Eric B Munson ([email protected]) wrote:
>>>> I can't figure this out, the only thing that should have changed is the
>>>> time the initial PROC_CN_MCAST_LISTEN message is received. Apparently
>>>> at that point connector is not fully initialized yet. Please post your
>>>> config and the full boot log. Thanks.
>>>>
>>>
>>> I am still seeing this on Linus' tree, is there anything more I can do to help
>>> track the problem?
>
> Sorry, I had a hardware failure, I'm back working on this now.
>
>> Patrick, do you need my assistance with this bug?
>
> Thanks, but I can now reproduce the problem myself, so I think I
> should have a fix soon.

I think this patch should fix the problem. Eric, could you please
give it a try?




Attachments:
cn.diff (838.00 B)

2011-04-12 15:59:55

by Eric B Munson

Subject: Re: 2.6.39-rc2 boot crash

On Tue, 12 Apr 2011, Patrick McHardy wrote:

> On 12.04.2011 14:49, Patrick McHardy wrote:
> > On 12.04.2011 00:06, Evgeniy Polyakov wrote:
> >> Hi.
> >>
> >> On Mon, Apr 11, 2011 at 05:07:47PM -0400, Eric B Munson ([email protected]) wrote:
> >>>> I can't figure this out, the only thing that should have changed is the
> >>>> time the initial PROC_CN_MCAST_LISTEN message is received. Apparently
> >>>> at that point connector is not fully initialized yet. Please post your
> >>>> config and the full boot log. Thanks.
> >>>>
> >>>
> >>> I am still seeing this on Linus' tree, is there anything more I can do to help
> >>> track the problem?
> >
> > Sorry, I had a hardware failure, I'm back working on this now.
> >
> >> Patrick, do you need my assistance with this bug?
> >
> > Thanks, but I can now reproduce the problem myself, so I think I
> > should have a fix soon.
>
> I think this patch should fix the problem. Eric, could you please
> give it a try?

This has me up and running again, thanks!

Tested-by: Eric B Munson <[email protected]>
>
>
>

> commit ad676e0dbbe8658ce46e192f449689bf3011bdf5
> Author: Patrick McHardy <[email protected]>
> Date: Tue Apr 12 17:37:04 2011 +0200
>
> connector: fix skb double free in cn_rx_skb()
>
> When a skb is delivered to a registered callback, cn_call_callback()
> incorrectly returns -ENODEV after freeing the skb, causing cn_rx_skb()
> to free the skb a second time.
>
> Reported-by: Eric B Munson <[email protected]>
> Signed-off-by: Patrick McHardy <[email protected]>
>
> diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
> index d770058..219d88a 100644
> --- a/drivers/connector/connector.c
> +++ b/drivers/connector/connector.c
> @@ -142,6 +142,7 @@ static int cn_call_callback(struct sk_buff *skb)
>  		cbq->callback(msg, nsp);
>  		kfree_skb(skb);
>  		cn_queue_release_callback(cbq);
> +		err = 0;
>  	}
>  
>  	return err;
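
The ownership mistake described in the commit message can be modelled in plain userspace C. This is only a sketch under stated assumptions: fake_skb, fake_kfree_skb, model_call_callback and model_rx_skb are invented stand-ins, not kernel code, and the caller's "free on error" behaviour is taken from the commit message rather than from the driver source. It shows why returning -ENODEV after the buffer has already been freed makes the caller free it a second time, and how setting err = 0 avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct sk_buff: it only records how often it was freed. */
struct fake_skb {
	int free_count;
};

/* Stand-in for kfree_skb(): counts the free instead of releasing memory. */
static void fake_kfree_skb(struct fake_skb *skb)
{
	assert(skb != NULL);
	skb->free_count++;
}

/*
 * Model of cn_call_callback(): when a callback is registered, it runs the
 * callback and consumes (frees) the skb. Without the fix it still reports
 * -ENODEV, which tells the caller that the skb was NOT consumed.
 */
static int model_call_callback(struct fake_skb *skb, bool have_callback,
			       bool apply_fix)
{
	int err = -ENODEV;

	if (have_callback) {
		/* the registered callback would run here */
		fake_kfree_skb(skb);
		if (apply_fix)
			err = 0;	/* the one-line change from the patch */
	}

	return err;
}

/* Model of the caller: it frees the skb itself when delivery failed. */
static int model_rx_skb(struct fake_skb *skb, bool have_callback,
			bool apply_fix)
{
	int err = model_call_callback(skb, have_callback, apply_fix);

	if (err < 0)
		fake_kfree_skb(skb);

	return err;
}
```

Calling model_rx_skb() with have_callback set and apply_fix clear leaves free_count at 2, the double free; with apply_fix set it stays at 1. In the kernel a second kfree_skb() on the same buffer can corrupt the queue the skb sits on, which would be consistent with the faults in __skb_recv_datagram() and skb_queue_tail() in the traces above.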



2011-04-12 21:39:47

by David Miller

Subject: Re: 2.6.39-rc2 boot crash

From: Patrick McHardy <[email protected]>
Date: Tue, 12 Apr 2011 17:39:51 +0200

> I think this patch should fix the problem. Eric, could you please
> give it a try?

Applied, thanks everyone.